UNIVERSITY OF CALIFORNIA

Los Angeles

Prosodic Transfer: An Acoustic Study of L2 English vs. L2 Japanese

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Applied Linguistics

by

Motoko Ueyama

2000

© Copyright by Motoko Ueyama 2000

TABLE OF CONTENTS

Chapter 1: Introduction
  1.1. Language transfer: L1 background plays a role in second language learning
    1.1.1. Overview of language transfer theory
    1.1.2. Language transfer in L2 speech development
  1.2. Focus of the present study
    1.2.1. Prosodic transfer
    1.2.2. Bi-directional transfer
  1.3. Current view of
  1.4. Prosodic phenomena investigated in the present study
    1.4.1. Contrast between lexically accented and unaccented vowels
    1.4.2. Contrast between English tense vs. lax vowels and between Japanese long vs. short vowels
    1.4.3. Temporal organization across syllables
    1.4.4. Minimal unit of prosodic segmentation at the word level
  1.5. Major factors affecting the prosodic phenomena investigated
    1.5.1. structure
    1.5.2. types
    1.5.3. Phrase-final lengthening
    1.5.4. size and phrase size
  1.6. Structure of the present study
Chapter 2: Word Accent Production in L2 English and L2 Japanese
  2.1. Experiments 1 & 2: Word accent production in neutral declaratives
    2.1.1. Goal
    2.1.2. Word accent realization in L1 English vs. L1 Japanese
    2.1.3. Expected patterns in L2 Japanese and L2 English
    2.1.4. Speech materials
    2.1.5. Subjects
    2.1.6. Procedure
    2.1.7. Results of Experiment 1 (English)
    2.1.8. Results of Experiment 2 (Japanese)
    2.1.9. Discussion of Experiments 1 and 2
  2.2. Experiment 3: Word accent production after focus in L2 English
    2.2.1. Prosodic context of the target word in Experiment 1: Nuclear position
    2.2.2. Possible learning strategy in L2 English–L1 Japanese
    2.2.3. Context of the target word in Experiment 3: Post-nuclear position
    2.2.4. Expected patterns
    2.2.5. Subjects
    2.2.6. Speech materials
    2.2.7. Procedure
    2.2.8. Results of Experiment 3
    2.2.9. Discussion of Experiment 3
  2.3. Summary of Experiments 1-3

Chapter 3: Vowel Contrast in L2 English and L2 Japanese
  3.1. Vowel system of English and Japanese
  3.2. Characteristics of vowel contrast in English vs. Japanese
    3.2.1. Prosodic unit
    3.2.2. Phonetic duration contrast
    3.2.3. Vowel quality
    3.2.4. Duration vs. vowel quality in the production of vowel contrasts
  3.3. Problems
    3.3.1. L2 Japanese–L1 English
    3.3.2. L2 English–L1 Japanese
  3.4. Goal
  3.5. Method
    3.5.1. Speech materials
    3.5.2. Subjects
    3.5.3. Procedure
  3.6. Results of Experiment 4 (English)
    3.6.1. Duration
    3.6.2. Vowel quality
  3.7. Results of Experiment 5 (Japanese)
    3.7.1. Duration
    3.7.2. Vowel quality
  3.8. Discussion of Experiments 4 and 5
    3.8.1. Vowel contrast in L1 English vs. L1 Japanese
    3.8.2. Duration contrast in L2 English and L2 Japanese
    3.8.3. Quality contrast in L2 English and L2 Japanese
  3.9. Summary of Experiments 4 and 5
Chapter 4: Temporal Organization Across Syllables in L2 English and L2 Japanese
  4.1. English vs. Japanese timings
    4.1.1. Stress-foot and mora as basic timing units
    4.1.2. Factors characterizing different timing types
  4.2. Linguistic factors investigated
    4.2.1. Effect of parts of speech on temporal organization
    4.2.2. The lapse constraint and the culminativity requirement
  4.3. Experiment 6 (English)
    4.3.1. Expected patterns in L2 English
    4.3.2. Method
    4.3.3. Results of Experiment 6
    4.3.4. Discussion of Experiment 6
  4.4. Experiment 7 (Japanese)
    4.4.1. Expected patterns in L2 Japanese
    4.4.2. Method
    4.4.3. Results of Experiment 7
    4.4.4. Discussion of Experiment 7
  4.5. Summary of Experiments 6 and 7
Chapter 5: Awareness of L2 Syllable Structures

  5.1. English and Japanese syllable structures
  5.2. Research questions
  5.3. Method
    5.3.1. Subjects
    5.3.2. Materials
    5.3.3. Procedure
  5.4. Results
    5.4.1. L2 English segmentation
    5.4.2. L2 Japanese segmentation
  5.5. Discussion
    5.5.1. L1 vs. L2 word segmentation
    5.5.2. Word segmentation in beginning vs. advanced L2 speech
    5.5.3. Connection between awareness of L2 syllable structures and L2 segmentation production
  5.6. Summary
Chapter 6: Conclusion
References

LIST OF FIGURES

Figure 2-1 F0 means & standard deviations of stressed vs. unstressed vowels for L1 English
Figure 2-2 F0 means & standard deviations of stressed vs. unstressed vowels for Speakers AE1 and BE3
Figure 2-3 Average F0 ratio of English stressed/unstressed vowels
Figure 2-4 Duration means and standard deviations of stressed and unstressed vowels for Speaker NE1 (L1 English)
Figure 2-5 Average duration ratio of English stressed/unstressed vowels
Figure 2-6 Average F0 ratio of Japanese accented/unaccented vowels
Figure 2-7 Duration means & standard deviations of accented vs. unaccented vowels for L1 Japanese
Figure 2-8 Average duration ratio of Japanese accented/unaccented vowels
Figure 2-9 Realization of nuclear pitch accent in English
Figure 2-10 Realization of post-nuclear word stress in English
Figure 2-11 F0 means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position
Figure 2-12 F0 means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position
Figure 2-13 F0 means & standard deviations of stressed vs. unstressed vowels for beginning L2 English in post-nuclear position
Figure 2-14 Duration means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position
Figure 2-15 Duration means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position

Figure 3-1 Vowel system of English and Japanese
Figure 3-2 Duration means and standard deviations of tense /i/ and lax /I/ for Speaker NE1 (L1 English)
Figure 3-3 Average durational ratio of English tense/lax vowels for Speaker NE1 (L1 English)
Figure 3-4 Mean and standard deviation of duration ratio of English tense/lax vowels for BE1, BE3 and AE1 (L2 English)
Figure 3-5 Mean and standard deviation of duration ratio of English tense/lax vowels for AE2, AE3 and BE2 (L2 English)
Figure 3-6 Average duration ratios of English tense/lax vowels
Figure 3-7 /i/ and /I/ in the vowel space of L1 English speakers
Figure 3-8 /i/ and /I/ in the vowel space of beginning L2 English speakers
Figure 3-9 /i/ and /I/ in the vowel space of advanced L2 English speakers
Figure 3-10 Euclidean distance (c) between the tense /i/ token T and the lax /I/ token L
Figure 3-11 Euclidean distance between English tense /i/ and lax /I/
Figure 3-12 Duration means and standard deviations of short and long vowels for Speaker NJ1 (L1 Japanese)
Figure 3-13 Duration ratios of Japanese long/short vowels for Speaker NJ1 (L1 Japanese)
Figure 3-14 Average duration ratios of Japanese long/short vowels
Figure 3-15 /i/ and /ii/ in the vowel space of L1 Japanese speakers
Figure 3-16 Spectral contrast in L1 Japanese vs. L1 English in the high front region
Figure 3-17 /i/ and /ii/ in the vowel space of AJ1, AJ2 and AJ3 (advanced L2 Japanese) and NE1 (L1 English)
Figure 3-18 /i/ and /ii/ in the vowel space of beginning L2 Japanese speakers
Figure 3-19 Average Euclidean distance between Japanese long /ii/ to short /i/
Figure 3-20 Mean and standard deviation of Japanese long and short vowels for AJ2, AJ3, BJ2 (L2 Japanese) and NJ1 (L1 Japanese)
Figure 3-21 Mean and standard deviation of Japanese long and short vowels for AE2, AE3 (advanced L2 English) and NE1 (L1 English)

Figure 4-1 Mean duration & standard deviation of unstressed /o/ for L1 English speakers in Experiment 6
Figure 4-2 Mean duration & standard deviation of unstressed /o/ for advanced speakers of L2 English in Experiment 6
Figure 4-3a Mean duration & standard deviation of vowel /o/ of 4-mora unaccented sentence (okane-da) for L1 Japanese speakers
Figure 4-4a Mean duration & standard deviation of vowel /o/ of 4-mora unaccented sentence (okane-da) for one L1 and three L2 Japanese speakers
Figure 4-4b Mean duration & standard deviation of vowel /o/ of 5-mora unaccented sentence (tomodatSi-da) for one L1 and three L2 Japanese speakers
Figure 4-5a Waveform and of “tomodatSi-da” in BJ’s production
Figure 4-5b Waveform and pitch contour of “tomodatSi-da” in AJ2’s production
Figure 4-6 Mean duration and standard deviation of the vowel /o/ of “tomodatSi-da” for a L1 Japanese speaker (NJ4) and three L2 Japanese speakers

Figure 5-1 Syllable structures of English and Japanese in native speakers’ awareness
Figure 5-2 More examples of Japanese syllable structure
Figure 5-3 Average number of instances of the segmentation unit /no/ for English monosyllabic words
Figure 5-4 Percentage of native-English-like patterns produced by L1 English speakers and Japanese speakers of L2 English
Figure 5-5 Average number of occurrences of the segmentation unit /no/ as a function of the number of consonants in a syllable
Figure 5-6 Total number of occurrences of the segmentation unit /no/ for Japanese words

LIST OF TABLES

Table 2-1 Background information of L2 English speakers in Experiment 1
Table 2-2 Background information of L2 Japanese speakers in Experiment 2
Table 2-3 ANOVA results for F0 data of L1 and L2 English in Experiment 1
Table 2-4 ANOVA results for duration data of L1 English in Experiment 1
Table 2-5 ANOVA results for F0 data of L1 and L2 Japanese in Experiment 2
Table 2-6 ANOVA results for duration data of L1 Japanese in Experiment 2
Table 2-7 Summary of results of Experiment 1 (English)
Table 2-8 Summary of results of Experiment 2 (Japanese)
Table 2-9 F0 contrast in post-nuclear position
Table 2-10 Durational contrast in nuclear position
Table 2-11 Durational contrast in post-nuclear position
Table 2-12 Manipulation of F0 and duration in nuclear vs. post-nuclear
Table 2-13 F0 and duration (D.) contrasts in nuclear position for advanced L2 English
Table 2-14a F0 and duration (D.) contrasts in post-nuclear position for advanced L2 English
Table 2-14b F0 and duration contrasts in post-nuclear position for advanced L2 English

Table 3-1 English and Japanese vowel contrasts
Table 3-2 ANOVA results for duration data of L1 English in Experiment 4
Table 3-3 ANOVA results for duration data of L1 Japanese in Experiment 5
Table 3-4 Summary of L1 English and L1 Japanese patterns observed in Experiments 4 and 5
Table 3-5 Summary of duration contrast in L2 English and L2 Japanese vowels observed in Experiments 4 and 5
Table 3-6 Summary of quality contrast in L2 Japanese and L2 English vowels observed in Experiments 4 and 5

Table 4-1 Percentages of CV and V syllable types in English, Spanish and Japanese
Table 4-2 Background information of L2 English speakers in Experiment 6
Table 4-3 Test sentences in Experiment 6
Table 4-4 ANOVA results for L1 English speakers
Table 4-5 ANOVA results for advanced speakers of L2 English
Table 4-6 ANOVA results for phrase-medial moras in 4-mora and 5-mora sentences produced by L1 Japanese speakers
Table 4-7 Grouping of mora positions by L1 Japanese speakers
Table 4-8 Accent patterns in L1 and L2 Japanese
Table 4-9 ANOVA results for phrase-medial moras in 4-mora and 5-mora sentences produced by three L2 Japanese speakers
Table 4-10 Grouping of mora positions by one L1 and three L2 Japanese speakers
Table 4-11 Accent patterns in L1 and L2 Japanese

Table 5-1 Syllable structure in English and Japanese
Table 5-2 44 monosyllabic words used in the English phonological experiment
Table 5-3 24 words used in the Japanese phonological experiment
Table 5-4 Number of /no/ (representing the segmentation of English words by L1 English speakers and Japanese speakers of L2 English)
Table 5-5 Results of judgments on the segmentation of English words whose syllable structures are also possible in L1 Japanese
Table 5-6 Number of /no/ (representing the segmentation of Japanese words by L1 Japanese speakers and English speakers of L2 Japanese)
Table 5-7 Expected native patterns in the segmentation of Japanese words
Table 5-8 Number of /no/ (representing the segmentation of Japanese words containing short vs. long vowels)
Table 5-9 Average duration ratios of Japanese long/short vowels (based on the results of Experiment 4)
Table 5-10 Average duration ratios of Japanese long/short vowels

ACKNOWLEDGMENTS

This dissertation was finished thanks to the advice, encouragement and support of many people. First and foremost, great thanks go to my advisor, Sun-Ah Jun. Since I knocked on the door of her office six years ago, Sun-Ah has been encouraging, enthusiastic and supportive. She not only guided me in the right direction at every stage of my dissertation, but also provided me with hope and inspiration, which kept me going on this long and bumpy road.

I would also like to thank Patricia Keating, Marianne Celce-Murcia and Terry Au, who served on my dissertation committee. I owe special thanks to Patricia Keating. Pat taught me the basics of experimental phonetics, gave me generous help when I was working on my MA thesis and the proposal of this dissertation, invited me to work as a regular member in the UCLA Phonetics Lab, and gave me valuable professional and academic opportunities. I thank Marianne Celce-Murcia for her long-term help and warm care. Finally, I thank Terry Au for her helpful comments and also for giving me the opportunity to take part in her developmental psychology project.

I also thank Bruce Hayes, who guided me with great patience as the advisor of my MA thesis, which became a starting point of my dissertation work. More importantly, I learned from him to jump off the cliff with courage.

I thank all the speakers for their patience in participating in the experiments. They also volunteered insightful comments and observations from the language learner’s point of view. I also thank Tetsuya Sano, who generously offered his time to find and organize subjects for me in Tokyo, and Hideo and Nariko Imamura, who, with great generosity, hosted me at their place during my data collection in Tokyo.

I would like to thank the Department of Applied Linguistics, the Department of Linguistics, and the Rotary Foundation for supporting my education financially. I especially thank the Department of Applied Linguistics for providing me with a dissertation year fellowship.

It was a precious experience to spend many years working in the UCLA Phonetics Lab. I thank Peter Ladefoged, Ian Maddieson and Donca Steriade for their generosity and helpful comments on my work. Being together with fellow students in the lab made my long marathon comfortable and joyful: special thanks go to Adam Albright, Mary Baltazani, Taehong Cho, Matt Gordon, Tetsuo Harada, Kim Thomas, and Jie Zhang. I also thank Henry Tehrani for his technical help and for fun chats in his office.

I owe thanks to more people for their input, which helped to broaden the scope of this study: Jennifer Venditti for providing insightful comments as a front runner of Japanese phonetics and helpful observations as an L2 Japanese speaker, and also for her friendship; Kikuo Maekawa and Yoshinori Sagisaka for encouragement and inspiring discussions; Masayoshi Shibatani and Keiichi Tajima for helpful comments; Robert Port for encouragement; Takayuki Arai, Tina Cambier-Langeveld, Nick Campbell, Susan Guion, Tetsuo Harada, Yukari Hirata, Yuko Kondo, Haruo Kubozono, Kanae Nishi, Takashi Otake, Yoshinori Sagisaka, Yoshiho Shibuya, Teruhisa Uchida and Natasha Warner for letting me share their work.

I thank the friends I met in Los Angeles for sharing a lot of joyful times (often over good food). I especially thank Adam Albright, Ivano Caponigro, Taehong and Hye-Jeong Cho, Cathryn Donohue, Barry Griner, Chai-Shune Hsu, Paul Iverson, Sahyang Kim, Leah Knightly, Yuri Kusuyama, Emi Moria, Yasuyo Sawaki, Shigeko Sekine and Yoshiko Tomiyama (“Tokko”). I also would like to thank Federica, Roberto, Stefano Cracolici and Kazumasa (“Terra chan”) for their long-distance friendship. Thanks to Alan Mar and Dan Tauber for their long-term friendship, which saved me at some critical moments of my life. I also thank Cecile Fougeron for her never-changing sisterhood, and Stefano Vegnaduzzo for sharing a number of laughs and blues.

I thank my American parents, Jack and Elspeth Collins, and my Italian parents, Bruno and Carla Baroni, for their affection, sense of humor and intellectual curiosity. I owe great thanks to my family in Japan (my father, mother, brother and sister) for their long-term support and encouragement, and also for understanding me, who has been wandering away from home for many years.

Finally, I thank my husband, Marco Baroni, who has always been next to me to share good times and bad times, to put up with my never-changing difficulty with English articles and prepositions, and also to enjoy many wonderful things in life.

VITA

August 23, 1964 Born, Hiroshima City, Japan

August 1987 B.A., Education, University of Tottori, Tottori, Japan

1993-1994 Teaching Assistant, Department of East Asian Languages and Cultures, University of California, Los Angeles, Los Angeles, California

June 1995 M.A., Teaching English as a Second Language. (advisors: Profs. Bruce Hayes & Marianne Celce-Murcia) Department of Applied Linguistics/TESL, University of California, Los Angeles, Los Angeles, California

1995-1999 Teaching Assistant, Department of Linguistics, University of California, Los Angeles, Los Angeles, California

PUBLICATIONS

Ueyama, M. (1999). Durational reduction in L2 English produced by Japanese speakers. Proceedings of the 14th International Congress of Phonetic Sciences.
Ueyama, M. (1999). An experimental study of vowel duration in phrase-final contexts in Japanese. UCLA Working Papers in Phonetics 97.
Ueyama, M. & S.-A. Jun. (1998). Focus realization in Japanese English and Korean English intonation. In Hajime Hoji (ed.), Japanese and Korean Linguistics 7. CSLI, Stanford University Press.
Ueyama, M. (1997). Phonology and phonetics of L2 intonation: the case of Japanese English. Proceedings of the 5th European Speech Conference.
Ueyama, M. & S.-A. Jun. (1997). Focus realization in Japanese English and Korean English intonation. UCLA Working Papers in Phonetics 94.
Ueyama, M. (1996). Phrase-final lengthening and stress-timed shortening effects in the speech of native speakers and Japanese learners of English. Proceedings of the 4th International Conference on Spoken Language Processing. (also in UCLA Working Papers in Phonetics 92)
Ueyama, M. (1995). Phrase-Final Lengthening and Stress-Timed Shortening Effects in the Speech of Native Speakers and Japanese Learners of English. M.A. thesis, UCLA.

ABSTRACT OF THE DISSERTATION

Prosodic Transfer: An Acoustic Study of L2 English vs. L2 Japanese

by

Motoko Ueyama

Doctor of Philosophy in Applied Linguistics

University of California, Los Angeles, 2000

Professor Sun-Ah Jun, Chair

The effect of L1 characteristics on L2 speech has been investigated extensively at the segmental level. This dissertation investigates how L1 prosodic features affect L2 prosodic patterns in the production of the adult L2 speaker (i.e., prosodic transfer). Four speech types were analyzed: 1) L2 English produced by L1 Japanese speakers; 2) L2 Japanese produced by L1 English speakers; 3) L1 English; 4) L1 Japanese. This comparison is interesting, since Japanese and English are typologically very different in terms of their prosodic properties: e.g., English has stress accents, while Japanese has pitch accents; Japanese has a phonemic length contrast, while English does not; English is a stress-timed language, while Japanese is a mora-timed language. Seven phonetic experiments were conducted to investigate three prosodic phenomena: 1) the contrast between lexically accented and unaccented vowels (Experiments 1-3); 2) the contrast between English tense vs. lax vowels and between Japanese short vs. long vowels (Experiments 4 and 5); 3) the temporal organization across syllables (Experiments 6 and 7).

Prosody is phonetically realized by multiple acoustic correlates, which differ from language to language. In the analysis of the collected data, various correlates relevant to each of the three prosodic phenomena were analyzed, taking both phonological and phonetic aspects into account. Additionally, a survey testing phonological awareness of L2 syllable structure was conducted. The results of the survey were analyzed together with the results of the phonetic experiments. The results supported the following generalizations: first, the transfer patterns of L1 prosodic features in L2 prosody can vary greatly from correlate to correlate. Second, different transfer patterns in the learner’s production can be explained by a difference between L1 and L2 in terms of the phonological status of a relevant prosodic feature. Third, there is a systematic interaction between the prosodic and segmental levels in the transfer of L1 features in L2 speech development. Finally, an L2 speaker’s prosodic system does not necessarily develop in a parallel manner for different dimensions of prosody.


Chapter 1: Introduction

1.1. Language transfer: L1 background plays a major role in second language learning

1.1.1. Overview of language transfer theory

It is well known that when adult speakers learn to speak a foreign or second language (L2, henceforth), their pronunciation is commonly foreign-accented. One of the major factors causing an accent is the effect of first language (L1, henceforth) characteristics on L2 patterns, i.e., language transfer. The term transfer, as extensively used in the first half of the 20th century, refers to “the psychological process whereby prior learning is carried over into a new learning situation. The main claim with regard to transfer is that the learning of task A will affect the subsequent learning of task B” (Gass & Selinker 1994, p. 54). In the context of language learning, the oldest definition considers transfer to be the carryover of prior linguistic knowledge to an L2 context. If this is true, it is expected that any differences between the L1 and the L2 will create difficulties in L2 learning. This idea was generalized later (in the period from the 1940s to the 1960s) as the Contrastive Analysis Hypothesis (or the strong form of language transfer theory). The hypothesis was linked to behaviorist learning theory, in which it is assumed “that language is habit and that language learning involves the establishment of a new set of habits” (Gass & Selinker 1994, p. 60). The central notion of the Contrastive Analysis Hypothesis is stated by two advocates of the hypothesis, Lado (1957) and Weinreich (1953):

[T]hose elements that are similar to his native language will be simple for him [the learner], and those elements that are different will be difficult. (Lado 1957, p.2)

The greater the difference between the two systems, i.e., the more numerous the mutually exclusive forms and patterns in each, the greater is the learning problem and the potential area of interference... (Weinreich 1953, p.1)

The major claim of the Contrastive Analysis Hypothesis is that all L2 errors can be predicted by identifying differences between L1 and L2 forms and patterns. Systematic L1 effects on L2 learning have been studied by assuming that L2 linguistic patterns can be largely predicted on the basis of L1 characteristics, which transfer to L2 either positively or negatively. Positive transfer takes place when L1 habits facilitate L2 learning, while negative transfer occurs when L1 linguistic characteristics interfere with L2 learning. Contrastive analysis provides a way of comparing the phonological, morphological and syntactic systems of two languages. In contrastive studies, the following procedure is commonly followed in order to predict L2 errors:

(1) description (i.e., a formal description of the two languages is made)
(2) selection (i.e., certain items, which may be entire subsystems, such as the auxiliary system, are selected for comparison)
(3) comparison (i.e., the identification of areas of difference and similarity)
(4) prediction (i.e., identifying which areas are likely to cause errors)
(Ellis 1985, pp. 25-26)

The Contrastive Analysis Hypothesis became subject to empirical tests from the end of the 1960s, and a number of counterexamples to it were provided. Two main criticisms were presented: “[n]ot all actually occurring errors were predicted [by this hypothesis]; not all predicted errors occurred” (Gass & Selinker 1994, p. 65). The first criticism was based on the finding that many L2 errors are not attributable to L1 patterns, and that there can be similarities between L2 patterns produced by speakers of different L1s. For example, Gilbert & Orlovic (1975) found that no article appears in the oral discourse of beginning L2 German speakers regardless of whether their L1 has articles or not. The second criticism was based on the observation that some errors predicted by differences between the L1 and the L2 do not occur, as shown in Kleinmann’s (1977) study. He found that the progressive aspect, which is absent in Arabic, was learned early and well by native Arabic speakers learning English. He suggested, on the basis of this finding, that when something in L2 is very different from L1, there is a “novelty effect” which facilitates learning L2 patterns. A similar case was also reported in Best’s (1999) study of L2 perception: new sounds that are perceptually salient, e.g., Zulu clicks, can be identified correctly by non-native adult speakers.

A number of empirical results along these lines led to the approach that has been widely accepted in the field of applied linguistics for the past 10 or 15 years, in which language transfer is considered not as a mechanical carryover of L1 structures, but as a cognitive mechanism that underlies L2 acquisition. In the modified view, the L2 learner’s language is perceived not as a variation on L1 but as an autonomous linguistic system that dynamically changes under the influence of multiple factors including, but not limited to, characteristics of the learner’s L1.

1.1.2. Language transfer in L2 speech development

Although a large body of research on L2 acquisition has been conducted in the syntactic and morphological domains, there has also been a strong research interest in identifying L1 characteristics in ‘foreign accents’. In the 1950s, contrastive analyses of the production of L2 phonemes were conducted within the theoretical framework of structuralism (e.g., Weinreich 1953; Haugen 1956). The goal of such analyses was to find how L1 phonemes would interfere with L2 phonemes as produced by the language learner. However, it soon became apparent that the sole consideration of phonemic categories was inadequate to achieve a comprehensive understanding of how L1 characteristics affect the patterns of the L2 learner’s speech. Further research on L2 speech learning “indicated that the classic transfer hypothesis is an oversimplification, and pointed out the need for detailed phonetic investigations that are not subject to the ab initio data reduction of phoneme-based description” (Leather and James 1996, p. 278). For example, Brière (1968) trained native speakers of English to produce French, Arabic and Vietnamese phonemes varying in similarity to English phonemes. Results showed that a contrastive analysis of the phoneme inventories of L1 and L2 could not predict the observed patterns. Rather, the relative difficulty of learning different sounds could be explained by taking the phonetic level into account (see Leather and James 1996 for a review).

Further evidence for language transfer at the phonetic level was presented in Flege’s (1987) study. He investigated a case in which L2 learners do not have to learn new segmental contrasts since L1 and L2 have similar phones. In this context, the Contrastive Analysis Hypothesis predicts that L2 learners will not have any difficulties, since difficulties should only arise from differences between the L1 and the L2 in phonemic inventories. However, Flege found that L2 phones similar to L1 phones are not necessarily easy to acquire, since even experienced L2 speakers often retain L1 phonetic habits in their production of L2 phones. For example, /t/ is found in both French and English, but it is produced with a short-lag VOT (voice onset time) and dental place of articulation in French, and with a long-lag VOT and alveolar place of articulation in English. Flege’s (1987) data showed that experienced French speakers of L2 English produced English /t/ using phonetic characteristics of L1 French, and vice versa for the production of French /t/ in L2 French by L1 English speakers.

The aforementioned studies indicate that two sources of L1 speech need to be considered for a better understanding of the general effect of the L1 sound system on L2 speech development: the L1 phonological system, and the L1 phonetic patterns surfacing in the realization of the phonological system. More aspects of L2 speech and more combinations of the learner’s L1 and L2 have yet to be investigated from phonological and phonetic perspectives.

1.2. Focus of the present study

1.2.1. Prosodic transfer

The systematic effect of L1 characteristics on L2 speech has been investigated extensively at the segmental level (for reviews, see Flege 1987, 1995; Leather & James 1996). For a valid assessment of how L1 characteristics affect L2 speech patterns, it is also necessary to investigate the suprasegmental or prosodic aspect of language transfer phenomena. I will call the effect of L1 prosodic characteristics on the L2 speech system prosodic transfer. Prosody broadly refers to the phonological organization of individual sounds (i.e., segments) into higher-level constituents and also to the pattern of relative prominence within these constituents, which is cued by variation of F0, duration, amplitude and segment quality (adapted from Shattuck-Hufnagel & Turk 1996). For example, intonation and timing patterns are considered prosodic phenomena. There have been some instrumental studies on the production of L2 prosody, and in particular L2 intonation (e.g., Gårding 1981 for L2 French–L1 Swedish and Greek; Todaka 1990 for L2 English–L1 Japanese; Argyres 1996 for L2 English–L1 Greek; Ueyama & Jun 1998 for L2 English–L1 Korean or Japanese; Jun & Oh 2000 for L2 Korean–L1 English), and L2 timing (e.g., Levitt 1992 for L2 French–L1 English; Mochizuki-Sudo & Kiritani 1992, Ueyama 1995, Shibuya 1997 for L2 English–L1 Japanese; Anderson-Hsieh & Venkatagiri 1994 for L2 English–L1 Chinese; Uchida 1996 for L2 Japanese–L1 Chinese). Each of these studies examined a single prosodic phenomenon. I believe that the general mechanism of prosodic transfer can be better understood by considering multiple prosodic phenomena within the same study. The present study investigates three prosodic phenomena in L2 speech production: 1) the contrast between lexically accented and unaccented vowels; 2) the contrast between English tense vs. lax vowels and between Japanese short vs. long vowels; 3) the temporal organization across syllables.

1.2.2. Bi-directional transfer

Most past studies of L2 speech development examined one direction of language transfer (i.e., either transfer from language A to language B or transfer from language B to language A). Very few studies of L2 speech production have examined the two directions of language transfer within the same study (from language A to language B and from language B to language A). At the segmental level, Flege’s (1987) study on the production of /t/, mentioned earlier in this chapter, examined both L2 French produced by native English speakers and L2 English produced by native French speakers. However, as far as I know, no earlier study has investigated both directions of language transfer at the prosodic level of L2 speech production. The analysis of both directions of transfer (e.g., L1 Japanese to L2 English and L1 English to L2 Japanese) is expected to be more informative than the analysis of only one direction in investigating prosodic transfer. The present study will instrumentally investigate prosodic transfer in two types of L2 adult speech: L2 English produced by native speakers of Tokyo Japanese and L2 Japanese produced by native speakers of American English (henceforth, L2 English–L1 Japanese and L2 Japanese–L1 English, respectively). A comparison of these two L2 types should be interesting, since Japanese and English are very different in terms of their prosodic characteristics. This comparison of prosodic patterns in the two L2 types appears to be feasible, since the prosodic characteristics of L1 Japanese and L1 English are relatively well described under the same framework (relevant references are discussed later in this chapter).

1.3. Current view of prosody

Over the past 20 years, prosodic theory has been extensively developed. In the 1960s and 1970s, it was commonly assumed in generative grammar and psycholinguistic studies that the structural constituents of spoken utterances of a sentence correspond to those predicted by the syntax. Many studies showed that major acoustic phonetic phenomena, such as intonational boundaries and preboundary lengthening or pausing, tend to occur at major syntactic boundaries (e.g., Klatt 1975). However, the examination of large corpora of utterances actually produced by speakers showed notable discrepancies with the results predicted by syntax. Although prosodic structure is largely determined by syntactic structure, it has been found that the two structures are not isomorphic (see Shattuck-Hufnagel & Turk 1996, Fougeron 1999 for reviews; and Gee & Grosjean 1983, Ferreira 1992, Jun 1993, 1998 among others for experimental evidence). Furthermore, it was found that extra-syntactic factors influence the constituency of spoken utterances. Research in the 1980s and 1990s provided evidence for the claim that “speakers make active use of prosodic elements in the production of spoken utterances, and that systematic variations in the phonetic realization of phonemic segments and features depends at least in part on prosodic structure” (Shattuck-Hufnagel & Turk 1996, p. 225; also see Jun 1993; Fougeron & Keating 1998; Fougeron 1999; Keating et al. [to appear]).

Currently, the existence of prosodic structure as an independent component of language is widely recognized. Prosody can be analyzed at two levels:

• At the level of physical realization in the speech signal, in terms of acoustic patterns of F0, duration, amplitude, spectral tilt, and segmental reduction, and their articulatory correlates.

• And at the level of its utterance-structuring function (which determines its physical realization). Prosodic structure is “the organization structure of speech” and “a complex grammatical structure that must be parsed in its own right” (Beckman 1996). This structure is organized in prosodic constituents defined as “domains” in which particular prosodic phenomena are realized. These phenomena are considered as prosodic because they do not refer to segments, but to higher level constituents. In this sense, they are suprasegmental phenomena. (adapted from Shattuck-Hufnagel & Turk 1996 and Fougeron 1999)

The realization of prosodic structure can be analyzed in terms of two main types of organization of continuous speech: tonal organization (i.e., intonational structure) and temporal organization (i.e., rhythmic structure). Both types of organization are acoustically realized as suprasegmental phenomena, and they can be analyzed as the variation of physical correlates of speech. Tonal organization is acoustically realized as F0 variation, and temporal organization is realized as variation in duration. Both types of prosodic organization are largely determined by phonological categories, such as the position of word accent or stress (e.g., in English, lexically stressed syllables are longer in duration and higher in F0 than unstressed syllables) or by the distribution of segments (e.g., Japanese phonemically long segments are at least two times longer than short segments, and in Japanese poetry, long segments are counted as two beats, while short segments count as one beat). Just as languages differ in their segmental inventories, they also differ in their prosodic properties. This, of course, applies to English and Japanese. The present study specifically investigates the effect of L1 prosodic characteristics on L2 prosodic patterns in the production of two L2 types, i.e., L2 English–L1 Japanese and L2 Japanese–L1 English. The investigation focuses on the following three aspects of prosody: 1) the contrast between lexically accented and unaccented vowels; 2) the contrast between English tense vs. lax vowels and between Japanese short vs. long vowels; 3) the temporal organization across syllables. English and Japanese differ in all three of these aspects. First, English has a stress accent, while Japanese has a pitch accent. Second, Japanese has a phonemic length contrast, while English does not. Third, English is a stress-timed language, while Japanese is a mora-timed language. In the rest of this chapter, English and Japanese will be compared with respect to these three aspects and other important factors affecting prosodic structure.

1.4. Prosodic phenomena investigated in the present study

1.4.1. Contrast between lexically accented and unaccented vowels

Although they both possess lexically specified accents, English and Japanese differ in terms of how they manipulate the three acoustic correlates of accent: F0, duration and intensity. English word accent (i.e., stress) is cued by the combination of all three main correlates1: a significant change in F02, longer duration and greater amplitude (Fry 1955; Liberman 1960; Lea 1977; Nakatani, et al. 1982; Beckman 1986). In contrast, Japanese accented and unaccented syllables are reliably distinguished only by fundamental frequency (F0) differences, and not by duration or intensity (Han 1969; Weitzman 1969; Hoequist 1983a, 1983b; Sugito 1982a, 1990; Beckman 1986). Following Beckman (1986), if we define ‘stress’ as lexically specified prominence involving all three physical correlates, we can say that English lexical accents are stress accents, while Japanese lexical accents are non-stress accents. The acoustic difference between Japanese and English accents is paralleled by perceptual differences. Perception studies of English stress accents (Fry 1958; Lea 1977; Nakatani & Aston 1978; Beckman 1986) essentially agree on the point that “the listener relies on differences in (1) the length of the syllables, (2) the loudness of the syllables and (3) the pitch of the syllables...” (Fry 1958, pp. 127-128). On the other hand, Japanese speakers rely on pitch as the main cue in order to distinguish accented syllables from unaccented syllables (Beckman 1986). These studies have shown that the three prosodic correlates (F0, duration and intensity) are treated differently in Japanese and English in the production and perception of lexically accented and unaccented syllables. It is thus interesting to see how the two systems interact with each other in L2 speech development. The first three experiments of the present study will concern the two L2 types with respect to the manipulation of F0 and duration in the production of L2 word accent.

1 Another correlate of accent on which much less research has been conducted is vowel quality (Lehiste 1970). This correlate will not be included in our analysis of the production of L2 word accent.

2 Whether accent is associated with an increase or decrease of F0 depends on the types of pitch accents associated with stressed syllables. See the discussion about intonation in the section on control factors.

1.4.2. Contrast between English tense vs. lax vowels and between Japanese long vs. short vowels

In both English and Japanese, duration is used as an acoustic correlate of vowel contrast, and the distribution of vowels greatly conditions rhythmic organization. In English, both duration and vowel quality contribute significantly to the contrast between tense and lax vowels (e.g., meat /m i t/ vs. mitt /m I t/; pool /p u l/ vs. pull /p U l/). Tense vowels are longer in duration (e.g., /i/ is longer than /I/) and more peripheral in the vowel space (e.g., /i/ is more peripheral than /I/) than lax vowels (Peterson & Lehiste 1960; Hillenbrand et al. 1995). Japanese displays a phonemic contrast between short and long consonants and vowels (e.g., /ka' tt a/ ‘won’ vs. /ka' t a/ ‘shoulder’; /t oo 'ru/ ‘pass’ vs. /t o 'ru/ ‘catch’). Such a contrast can be characterized in both phonological and phonetic terms. Short and long segments are phonologically categorized by L1 Japanese speakers as monomoraic and bimoraic, respectively (the mora is an abstract timing unit of Japanese, as discussed in the following section), and the phonetic durations of short and long segments are systematically differentiated. However, unlike for English tense and lax vowels, there is no significant difference in vowel quality between Japanese short and long vowels (e.g., /i/ and /ii/). The comparison of vowel contrasts in English and Japanese shows that they greatly differ in terms of the phonemic status and phonetic treatment of duration and vowel quality, the two acoustic correlates of vowel contrast. For duration, while the Japanese contrasts are characterized in both phonological and phonetic terms, the English contrasts are only phonetic but not phonological. In terms of vowel quality, English tense vowels are more peripheral than their lax counterparts in similar regions of the vowel space. In that sense, we can say that vowel quality plays a phonemic role in the production of English contrasts.

However, unlike the case of English tense and lax vowels, there is no significant quality difference between Japanese short and long vowels. The consideration of both phonological and phonetic properties of vowel contrasts leads us to ask the following two questions. How do L1 vowel contrasts transfer at the phonetic level, given the difference between English and Japanese in terms of the phonological status of vowel contrasts? The goal of Experiments 4 and 5 is to answer these questions by investigating durational and spectral patterns of vowel contrasts in the production of L2 English–L1 Japanese and L2 Japanese–L1 English.

1.4.3. Temporal organization across syllables

Languages have been traditionally classified into different timing categories in terms of the fundamental units of timing or duration (for a summary, see Beckman 1992; Tajima 1998). English is classified as a stress-timed language, in which the fundamental isochronous unit of timing is the stress foot (Pike 1945; Abercrombie 1967). French is classified as a syllable-timed language in which the syllable is used as a basic timing unit. Finally, Japanese is classified as a mora-timed language having the mora as the basic timing unit (Jinbo 1927; Bloch 1950; Warner & Arai [submitted] for a review). However, many studies have later found that none of these timing units corresponds to a constant isochronous interval in the acoustic signal. They are rather abstract units without stable acoustic manifestations. Despite the absence of acoustic evidence, it has been proposed in recent psycholinguistic studies that the timing units play a crucial role in segmenting continuous speech when speech input is processed. Evidence from studies of adult speech processing (see Cutler 1996 for a review) suggests that English speakers do use the stress foot, French speakers the syllable, and Japanese speakers the mora as preferred units of segmentation. This indicates that the timing units are a part of grammar and psychologically real.

Previous studies also show that different timing types are characterized by other properties of speech. For example, it has been pointed out that the distribution of different types of syllable structures greatly affects the average durational difference between stressed and unstressed syllables. In stress-timed languages, more complex syllable structures tend to be found in stressed syllables, while simple structures (CV) occur in unstressed syllables. This difference results in a higher average number of segments in the stressed syllables, which in turn explains the higher average duration of the stressed syllables (see Fant, Kruckenberg & Nord 1991). The inventory of syllable types is more limited in syllable-timed languages like French, Italian or Spanish, and even more limited in mora-timed languages like Japanese. It has been found that differences between L1 and L2 syllable structure cause problems in L2 speech production. For example, when some L2 syllables are not legitimate in L1 syllable structures (typically, they have a more complex structure than what is allowed in L1), the beginning language learner phonologically reorganizes L2 syllables to adapt them to possible L1 syllable structures (see Broselow 1983 and Broselow & Park 1995 for case studies of epenthesis errors in L2 production). This reorganization of the structures of L2 syllables impacts L2 temporal organization. Crosslinguistic differences in timing patterns are also affected by the treatment of duration in lexical accent realization. In English, stressed syllables are longer than unstressed syllables, and consequently the durational patterns of utterances are strongly affected by the distribution of lexical stress. On the other hand, in Japanese, lexical accent does not affect the duration of syllables. Consequently, the durational patterns of utterances are independent of the distribution of lexical accent, but rather dependent on the distribution of phonemic short and long segments. In other words, in English, lexical accent properties and rhythmic organization are closely related. In Japanese, these two aspects are independent, and rhythmic organization largely depends on the phonemic distribution of short and long segments. Considering all these factors affecting temporal organization, Experiments 6 and 7 investigate the effect of L1 prosodic characteristics on temporal organization in L2 English–L1 Japanese and L2 Japanese–L1 English.

1.4.4. Minimal unit of prosodic segmentation at the word level

English and Japanese also differ in terms of the timing units used in segmenting continuous speech, as discussed earlier: the stress foot in English and the mora in Japanese. In the case of word segmentation, native speakers of English use the syllable as the minimal segmentation unit, while native speakers of Japanese use the mora. For example, the English word corn is not further segmented by native speakers of English, since this word is monosyllabic. The same word is borrowed and lexicalized as koon in Japanese. By employing the mora as the minimal segmentation unit, koon is further segmented into 3 moras (ko.o.n) by native Japanese speakers (note that the coda nasal is also counted as an independent mora). The difference between the two languages in terms of which phonological unit is employed as the minimal segmentation unit raises an important question: Do L2 speakers become aware of L2 prosodic word structure? To answer this question, a phonological survey was additionally conducted in the present study.
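The mora-based segmentation of koon described above can be illustrated with a minimal sketch. It is not part of the survey procedure of the present study; the function name split_moras and the simplified romanization rules (a vowel closes a mora, a bare vowel is its own mora, a coda nasal n and the first half of a geminate consonant each count as one mora) are assumptions made only for illustration, and the input is assumed to be a well-formed romanized Japanese word.

```python
VOWELS = set("aeiou")


def split_moras(word):
    """Split a romanized Japanese word into moras (simplified, hypothetical rules)."""
    moras = []
    i = 0
    n = len(word)
    while i < n:
        ch = word[i]
        if ch in VOWELS:
            # A bare vowel, including the second half of a long vowel, is one mora.
            moras.append(ch)
            i += 1
        elif ch == "n" and (i + 1 == n or word[i + 1] not in (VOWELS | {"y"})):
            # Coda (moraic) nasal: "n" not followed by a vowel or "y".
            moras.append(ch)
            i += 1
        elif i + 1 < n and word[i + 1] == ch:
            # First half of a geminate consonant counts as its own mora.
            moras.append(ch)
            i += 1
        else:
            # Onset consonant(s) plus the following vowel form one mora.
            j = i
            while j < n and word[j] not in VOWELS:
                j += 1
            moras.append(word[i:j + 1])
            i = j + 1
    return moras


print(split_moras("koon"))   # ['ko', 'o', 'n']
print(split_moras("katta"))  # ['ka', 't', 'ta']
```

Under these assumptions, koon yields three moras, whereas a syllable-based count would treat it as a single unit, which is the contrast the survey question turns on.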

1.5. Major factors affecting the prosodic phenomena investigated

In order to investigate prosodic transfer in L2 English and L2 Japanese, the present study examines three acoustic correlates of prosody, i.e., F0, duration and vowel quality. It is well known that the variation of these acoustic correlates in prosodic realization is influenced by both segmental effects and other prosodic factors. Previous studies report that phonetic realization of the physical correlates of suprasegmental features (e.g., F0, duration, intensity or vowel quality) is highly affected by multiple linguistic factors. In the seven phonetic experiments of the present study, the following major factors are controlled in order to strictly examine the three prosodic phenomena of interest: 1) intonational structure; 2) segment types; 3) phrase-final lengthening; 4) foot size and phrase size.

1.5.1. Intonational structure

As reviewed in the last section, the lexical specification of accents is phonetically realized by language-specific manipulations of the prosodic correlates: for English stress accents, this means a change in F0, longer duration and greater amplitude; for Japanese non-stress accents, there is only a change in F0. At the phrase and sentence level, in both languages, the values of the prosodic correlates can vary in different intonation patterns. This point will be explained for the case of English, adopting the phonological model of English intonation proposed by Pierrehumbert and her colleagues (Pierrehumbert 1980; Beckman & Pierrehumbert 1986). This model of English intonation can be summarized as follows (the summary is adapted from Ueyama & Jun 1998). Continuous F0 contours are analyzed as sequences of underlying H and L tones. These tones are categorized as one of three types: pitch accents; phrasal tones (or accents); boundary tones. The pitch accent is associated with the stressed syllable of the phrase, and by this association, the stressed syllable of a certain word receives pitch prominence. The boundary tone indicates the end of an Intonational Phrase (IP), which is the highest level of English prosodic organization. Finally, the phrasal tone indicates the end of an intermediate phrase, which is the second highest level below an IP, and it covers the space between the last pitch accent and the boundary tone in the IP. In English, there are six types of pitch accents (H*, L*, H+L*, H*+L, L+H*, L*+H), two types of phrasal tones (H-, L-), and two types of boundary tones (L%, H%). These three types of tones (pitch accent, phrasal tone, and boundary tone) are hierarchically organized, reflecting the hierarchical organization of an utterance.
In the intonational structure of English, one IP has at least one pitch accent, one phrasal tone and one boundary tone (and exactly one phrasal tone and one boundary tone if the IP is also one intermediate phrase), and one IP can have more than one intermediate phrase. Thus, the internal structure of an IP is formed by a sequence of pitch accents and phrasal tones. If there is more than one pitch accent in the intermediate phrase, the last pitch accent is the most prominent one in English, and it is labeled as the nuclear pitch accent. Intonation contours provide information about phrasing, which varies depending on multiple factors such as the types and numbers of pitch accents, syntactic structure, speech rate, and the presence/absence and locations of pragmatic focus (see Jun 1993, Shattuck-Hufnagel & Turk 1996 for reviews). Pierrehumbert’s model of English intonation has been applied to Japanese by Beckman & Pierrehumbert (1986), Pierrehumbert & Beckman (1988) and Venditti (1995, 2000). Japanese intonation patterns are also analyzed as sequences of H and L tones, and these tones are categorized into pitch accents, phrasal tones and boundary tones. The Japanese tone inventory is much smaller than the English one. According to The Japanese ToBI Labeling Guidelines (Venditti 1995), which provide a system for transcribing Tokyo Japanese, there is one type of pitch accent (H*+L), one type of phrasal tone (H-) and three types of final boundary tones (L%, H%, HL%; LH% is also included in Venditti 2000). As in English, there are two hierarchical levels of tonally defined phrases in Tokyo Japanese: accentual phrases (AP) < intonational phrases (IP). In Tokyo Japanese, pitch accents are associated with lexically accented syllables, and the intonation pattern of the AP depends on the distribution of lexical accents. Some words are lexically accented, and some are unaccented in Tokyo Japanese. Consequently, the AP containing an accented word is accented, and the AP containing an unaccented word is unaccented. Consider the contrast between the two accented forms kaki' (accent on the second mora) ‘fence-NOM’ and ka'ki (accent on the first mora) ‘oyster-NOM’ and the unaccented form kaki (no accent) ‘persimmon-NOM’:

Accented APs
    kaki ‘fence’        kaki'-ga    (lexical pitch accent H*+L specified on ki)
    kaki ‘oyster’       ka'ki-ga    (lexical pitch accent H*+L specified on ka)

Unaccented AP
    kaki ‘persimmon’    kaki-ga3    (no lexical pitch accent)

(adapted and modified from Shibuya 1997)

In this model, as summarized in Venditti (1995), words may group together into an AP delimited by two tones: an H- phrasal tone near the beginning of the AP and a final L% boundary tone marking its end. The H- phrasal tone is marked on both unaccented and accented APs, but it is not marked when H- is indistinguishable from the high tone of the lexical accent (H*+L) in an accented AP. The end of an AP is indicated by an L% boundary tone, which also serves to mark a falling tonal movement at the end of the IP when the end of the IP is also the end of an AP. As in English, different sequences of tone types directly affect F0 patterns in Japanese. For example, other things being equal, the accented AP has a higher F0 maximum than the unaccented AP (Beckman & Pierrehumbert 1986; Pierrehumbert & Beckman 1988). Also, different phrasing types affect the distribution of duration. Japanese tone types do not affect duration values (Homma 1981; Beckman 1986; Sugito 1982a, 1996), probably due to the strong immalleability of the mora duration. However, the IP-final syllable is often subject to phrase-final lengthening; consequently, this syllable is realized with much longer duration than other syllables within the same IP (see the following discussion on phrase-final lengthening).

Since duration and F0 are influenced by the tonal organization of phrases in a language-specific manner, intonation patterns will be matched across sentences and speakers in our experiments.
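The structure of the English IP summarized earlier in this section (one or more intermediate phrases, each closed by a phrasal tone, followed by a single boundary tone) can be sketched as a small well-formedness check over a sequence of tone labels. This is only an illustrative sketch: the function names are hypothetical, and the requirement that every intermediate phrase contain at least one pitch accent is an assumption added here, since the summary above states the minimum only for the IP as a whole.

```python
# English tone inventory as listed in the summary of the model above.
PITCH_ACCENTS = {"H*", "L*", "H+L*", "H*+L", "L+H*", "L*+H"}
PHRASAL_TONES = {"H-", "L-"}
BOUNDARY_TONES = {"L%", "H%"}


def is_wellformed_ip(tones):
    """Check a tone sequence against the sketch: (accent+ phrasal)+ boundary."""
    if not tones or tones[-1] not in BOUNDARY_TONES:
        return False
    accents_in_phrase = 0   # pitch accents seen in the current intermediate phrase
    phrase_closed = False   # True right after a phrasal tone closes a phrase
    for tone in tones[:-1]:
        if tone in PITCH_ACCENTS:
            accents_in_phrase += 1
            phrase_closed = False
        elif tone in PHRASAL_TONES:
            if accents_in_phrase == 0:
                return False  # assumption: each intermediate phrase needs an accent
            accents_in_phrase = 0
            phrase_closed = True
        else:
            return False      # unknown label or misplaced boundary tone
    return phrase_closed


def nuclear_accent_positions(tones):
    """Return the index of the last pitch accent in each intermediate phrase."""
    positions = []
    last_accent = None
    for i, tone in enumerate(tones):
        if tone in PITCH_ACCENTS:
            last_accent = i
        elif tone in PHRASAL_TONES and last_accent is not None:
            positions.append(last_accent)
            last_accent = None
    return positions


contour = ["H*", "H*", "L-", "L+H*", "H-", "H%"]
print(is_wellformed_ip(contour))          # True
print(nuclear_accent_positions(contour))  # [1, 3]
```

In the example contour, the IP contains two intermediate phrases, and the last pitch accent of each one is identified as nuclear.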

1.5.2. Segment types

3 -ga is the nominative marker in Japanese.

Each segment has intrinsic prosodic characteristics (see Lehiste 1970, Beckman 1986 for reviews). “Other things being equal, a higher vowel generally has a higher F0 than a lower vowel, and the effect has been found in enough languages that there must be a physical basis” (Beckman 1986, p. 129). Also, different vowels have different intrinsic durations. “Other things being equal, a low vowel will be longer than a high vowel” (Beckman 1986, p. 141; also see Lehiste 1970). This relationship between duration and vowel height has been shown for typologically different languages including English (e.g., Peterson & Lehiste 1960; Lehiste 1970; Umeda 1975; van Santen 1992) and Japanese (e.g., Han 1961; Nishimura 1979). A similar relationship has been found for intrinsic amplitude: “other things being equal, a low vowel has higher peak overall intensity than does a high vowel” (Beckman 1986, p. 142). This is also attested for various languages including English (e.g., Lehiste & Peterson 1959) and Japanese (Homma 1973). In addition, the acoustic patterns of segments vary greatly depending on the types of adjacent segments (for F0, see Beckman 1986; for duration, see van Santen 1992). Therefore, in the present study we will control for intrinsic and neighboring segment effects on the target vowels.

1.5.3. Phrase-final lengthening

Many if not all languages display phrase-final lengthening at the ends of prosodic units at some level. In English, it is reported that the amount of phrase-final lengthening reflects at least four levels of the prosodic hierarchy (Wightman et al. 1992): prosodic word < accentual phrase4 < intermediate phrase (ip) < intonational phrase (IP). In Japanese, a similar type of correspondence is found, but which prosodic levels correspond to a certain amount of phrase-final lengthening is still controversial. For example, Beckman & Pierrehumbert (1988) have found that both intermediate and intonational phrases are marked by phrase-final lengthening, but they recognize that this finding should be confirmed by perception tests. The results of Ueyama’s (1999) production experiment show significant effects only at the end of the IP, but not at the end of the intermediate phrase. It is also reported that the mora at the end of a breath group5 is longer, in a number of studies of Japanese duration patterns based on large-size speech corpora (Takeda et al. 1989; Kaiki et al. 1990; Campbell 1992; Kaiki & Sagisaka; Venditti & van Santen 1998). It was also reported that phrase-final lengthening spreads regressively to non-final syllables in Hebrew (Berkovits 1993) and Dutch (Cambier-Langeveld 2000). A similar effect on the penultimate syllable of a phrase has been observed in English (Ueyama 1995), so it seems that the distance between a target syllable and a boundary can affect duration at least in the case of English. In the present study, phrase-final lengthening and the regressive spreading of this effect were controlled by not putting the target vowels at the edges of prosodic units, and also by matching the number of segments and syllables between the target syllables and the boundary.

4 The existence of an accentual phrase in English was originally hypothesized by Beckman and Pierrehumbert (1986). However, a later study (Beckman & Edwards 1990) could not find any phonetic correlates of this prosodic unit in English. In Wightman et al. (1992), this term refers to a grouping of words within an intermediate phrase.

5 The breath group corresponds to the IP or the utterance.

1.5.4. Foot size and phrase size

As briefly discussed in earlier sections, it has been claimed that English stress-timing is characterized by the constant duration of the stress-foot. This implies that constituent segments are more compressed as the foot size increases. This shortening effect is called stress-timed shortening. Later studies have found that this does not generally occur (e.g., for experimental evidence, Nakatani et al. 1982, Ueyama 1995; for review, Lehiste 1977, Kawasaki 1983), but it is possible that some speakers have some tendency toward this pattern. Thus, in the present study the number of syllables between two stressed syllables (i.e., interstress interval) was kept constant in the neighborhood of the target syllables.

The size of the prosodic units also affects the properties of prosodic organization. For example, Jun (1993) says that prosodic phrasing is sensitive to multiple factors including “heaviness of the phrase, that is, to the number of syllables within the phrase” (p. 180). If the phrase size is too large, the phrase tends to break up into two smaller phrases. A similar effect was also found in Japanese phrasing (Kubozono 1993; Maekawa 1994). Phrase size also affects the duration of constituent segments. Earlier studies have reported that in Japanese vowels become shorter as phrase length increases (e.g., Homma 1982; Kaiki and Sagisaka 1992). Furthermore, Ueyama and Jun (1998) have shown that less proficient L2 speakers tend to divide utterances into smaller phrases. So it is expected that phrase size will influence both segmental duration and prosodic grouping of speech in both L1 and L2 speech. For this reason, in the present study target syllables were embedded in phrases of similar sizes.

1.6. Structure of the present study

In the present study, the data of seven production experiments are analyzed in order to investigate language transfer in the three prosodic domains discussed earlier:

Investigated prosodic phenomena:

1) the contrast between lexically accented and unaccented vowels (Experiments 1–3)

2) the contrast between English tense vs. lax vowels and between Japanese short vs. long vowels (Experiments 4 & 5)

3) temporal organization across syllables (Experiments 6 & 7)

Moreover, a phonological survey was organized to test whether speakers become aware of the minimal segmentation unit in L2. The results of the phonological survey will be analyzed together with the results of the production experiments.

Chapter 2: Word Accent Production in L2 English and L2 Japanese

2.1. Experiments 1 & 2: Word accent production in neutral declaratives

2.1.1. Goal

Two experiments were conducted in order to investigate the effect of L1 phonetic habits on L2 word accent production. Experiments 1 and 2 examined L2 English produced by native speakers of Tokyo Japanese and L2 Japanese produced by native speakers of American English, respectively, focusing on the acoustic patterns of word accent in neutral declarative sentences.

2.1.2. Word accent realization in L1 English vs. L1 Japanese

Both English and Japanese have word accent whose position is specified at the lexical level: stress for English (e.g., stress on the first syllable in MUsic and on the second syllable in beLOW) and pitch accent for Japanese (e.g., pitch accent on the first syllable in ha shi ‘chopsticks’ and on the second syllable in ha shi ‘bridge’). There are similarities and differences between the two languages in terms of the manipulation of acoustic correlates in word accent production (see Section 1.4.1. for a review). English stress and Japanese pitch accent are similar in the sense that F0 is an essential correlate in word accent production. In both languages, lexically accented syllables show higher F0 than unaccented syllables6. However, the two languages differ in terms of the manipulation of duration. In English, stressed syllables are longer than unstressed syllables (i.e., duration is an active

6 This is always true in Japanese, but true in English only if a stressed syllable receives no accent or does not receive a post-lexical low tone accent (i.e., only if it receives a high tone accent), as discussed in Section 1.5.1.

21 correlate in word stress production). In contrast, in Japanese, there is no systematic difference in duration between accented and unaccented syllables (i.e., duration is an inert correlate in pitch accent production). However, remember that length is a phonemic feature in Japanese: short and long segments contrast phonemically ( o ki ‘off shore’ vs. oo ki ‘(last name)’; ka t a ‘form’ vs. ka tt a ‘bought’). Therefore, it can be said that duration is active phonologically in Japanese even though not phonetically active in pitch accent production in the sense that there is no significant durational difference between lexically accented and unaccented syllables when the other factors are kept equal. The manipulation of F0 and duration in English and Japanese for word accent production in neutral declarative sentences can be summarized as follows (YES and NO indicate the active and inactive roles of these two acoustic correlates in word accent production, respectively):

               F0     Duration
L1 English     YES    YES
L1 Japanese    YES    NO

2.1.3. Expected patterns in L2 Japanese and L2 English

F0 patterns

The aforementioned comparison of L1 English and L1 Japanese in terms of the manipulation of F0 and duration leads us to expect the following L2 patterns at least in the initial stages of the development of L2 Japanese–L1 English and L2 English–L1 Japanese. The active role of F0 in L1 Japanese and L1 English probably facilitates the production of an F0 contrast between lexically accented and unaccented syllables in L2 English and L2 Japanese, respectively, i.e., phonetic habits are expected to positively transfer to L2 word production.

The effect of the active role of F0 in L1 Japanese was observed in previous studies of L2 English–L1 Japanese both in production and perception (for production, Shibuya 1997; for perception, see Beckman 1986, Watanabe 1987). It was found that F0 plays a dominant role in the production and perception of English stress by Japanese speakers, probably due to the transfer of L1 Japanese features. As far as I know, there is no study of L2 Japanese–L1 English from a production point of view, and only Beckman’s study (1986) has investigated the case of perception. Beckman compared monolingual Americans with Americans who had lived in Japan for at least one year in the past. Her results showed that exposure to authentic Japanese was positively correlated with the use of F0 as the main cue to the perception of Japanese accents. The case of production will be investigated in Experiment 2 of the present study.

Duration patterns

In order to learn the Japanese system, native speakers of English need to know how to suppress duration, i.e., an active correlate in their L1 accent system, in their L2 Japanese production. In contrast, native speakers of Japanese learning English need to learn how to activate duration, i.e., an inert correlate in their L1 accent system. The manipulation of F0 and duration in each L1 type and what needs to be learned to produce native-like patterns in each L2 type can be summarized in the following way:

               F0     Duration
L1 English     YES    YES   -> suppress in L2 Japanese
L1 Japanese    YES    NO    -> activate in L2 English
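The summary above can be read as a simple rule: compare the role of a correlate in the L1 with its role in the L2. The following is a minimal sketch of that rule, not part of the original study; the names `ACTIVE` and `learning_task` are hypothetical and introduced only for illustration.

```python
# Roles of each acoustic correlate in L1 word accent production,
# copied from the YES/NO summary in the text (True = active, False = inert).
ACTIVE = {
    "English": {"F0": True, "duration": True},
    "Japanese": {"F0": True, "duration": False},
}


def learning_task(l1, l2, correlate):
    """Return what a speaker of l1 has to do with a correlate when producing l2."""
    in_l1 = ACTIVE[l1][correlate]
    in_l2 = ACTIVE[l2][correlate]
    if in_l1 == in_l2:
        # Same role in both languages: the L1 habit can simply carry over.
        return "keep (positive transfer expected)"
    # Different roles: an active L1 correlate must be suppressed,
    # an inert one must be activated.
    return "suppress" if in_l1 else "activate"


for l1, l2 in [("English", "Japanese"), ("Japanese", "English")]:
    for correlate in ("F0", "duration"):
        print(f"L1 {l1} -> L2 {l2}, {correlate}: {learning_task(l1, l2, correlate)}")
```

Under these assumptions, only duration yields a learning task in either direction, which matches the expectation stated in the text.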

Ueyama’s (1996) study presented acoustic evidence of a positive correlation between the duration ratio of stressed/unstressed vowels and L2 proficiency levels in L2 English–L1 Japanese: i.e., more experienced Japanese speakers of L2 English showed a greater duration ratio of stressed/unstressed vowels. We expect that Experiment 1 of the present study will confirm this tendency in the production of English stress by Japanese speakers. As already mentioned, Beckman (1986) examined the perception of Japanese accents by Americans by assessing the relative perceptual salience of the four acoustic correlates of word accents (F0, duration, amplitude and vowel quality), in comparison with the perception of English stress by the same subjects. For L1 English, she found that all of the four acoustic parameters showed significant effects in the perception of English stress7. If we compare the use of the four acoustic correlates in L1 English and L2 Japanese–L1 English presented in Beckman’s data, we find that exposure to authentic Japanese tends to be positively correlated with the suppression of the use of duration in the perception of Japanese accent by English speakers. In Experiment 2, we will test whether a similar correlation is observed in the production of Japanese accent by English speakers.

2.1.4. Speech materials

Experiment 1 (English)

In this experiment, four pairs of English nouns and verbs were compared: CONtract vs. conTRACT; DIgest vs. diGEST; PERmit vs. perMIT; SUBject vs. subJECT. For each pair, the noun form has word stress on the first syllable, while the verb form has word stress on the second syllable, and the two forms are segmentally homophonous except for the vowel quality of the first vowel (the first vowel of the verb form is typically reduced)8. One pair (PERmit vs. perMIT) has a reduced second syllable in the noun.

7 This result agrees with the findings of Nakatani and Aston (1978).

8 The verb diGEST has two different pronunciations: [daIdZEst] and [dIdZEst]. All participants in Experiment 1 produced this verb with the former pronunciation, so, for our purposes, the noun form and the verb form of “digest” can be considered segmentally homophonous.

A context sentence and a frame sentence with the same target word were presented in order to elicit the expected stress patterns:

Context: I read Reader’s Digest.
Frame:   I said DIgest this time.

Context: Cows can digest grass.
Frame:   I said diGEST this time.

In this context, we expected nuclear accent (i.e., sentence stress) on the stressed syllable of the target word of the frame sentence. For each word pair, F0 and duration of the first vowels were measured and statistically analyzed to compare patterns in stressed vs. unstressed conditions:

        stressed    vs.   unstressed
e.g.    DIgest            diGEST
        CONtract          conTRACT

Experiment 2 (Japanese)

In this experiment, 3 pairs of segmentally homophonous Japanese words were compared: ki'ru ‘cut’ vs. kiru' ‘wear’; sa'too ‘(name)’ vs. sato'o ‘sugar’; to'Si ‘city’ vs. toSi' ‘age/year’. In each pair, the first form has a lexical pitch accent (word accent) on the first syllable, while the second form has a word accent on the second syllable. The target word was presented in the following frame sentence:

Sosite ___ to iimasu. (‘Next, I’ll say ___.’)

A context sentence was not needed in the Japanese experiment, since different words are spelled differently. For each word pair, F0 and duration of the first vowels were measured and statistically compared in the accented vs. unaccented condition:

        accented    vs.   unaccented
e.g.    ki'ru             kiru'
        sa'too            sato'o

2.1.5. Subjects

Experiment 1 (English)

The set of speakers for Experiment 1 included one control group and two experimental groups. The control group consisted of 4 native speakers of American English (NE1, NE2, NE3 and NE4). They were all male speakers except NE3. They were all UCLA undergraduate students at the time of recording. The two experimental groups consisted of 8 native speakers of Japanese learning English (L2 English speakers, henceforth): 4 speakers for each experimental group (AE1, AE2, AE3 and AE4 for the advanced learner group; BE1, BE2, BE3 and BE4 for the beginning learner group). The background information for all Japanese speakers of L2 English is summarized in Table 2-1.

Table 2-1: Background information of L2 English speakers in Experiment 1

       age  gender  years of      age of      age of beginning  duration of
                    residence     arrival in  of English        English
                    in the US     the US      instruction       instruction
AE1    31   female  6 years       25          12                13 years 11 months
AE2    28   female  7 years       21          13                13 years
AE3    24   male    7 years       17          13                9 years 6 months
AE4    29   female  11 years      18          13                10 years 3 months
BE1    21   male    0             ---         13                8 years
BE2    22   male    0             ---         13                8 years
BE3    22   male    0             ---         13                8 years
BE4    22   female  0             ---         12                10 years 9 months

The criterion used to select speakers for the two L2 speaker groups was the number of years of residence in the United States. The four advanced Japanese speakers of L2 English had been staying in the United States for more than 5 years, while the four beginning speakers had never stayed in English-speaking countries for more than 3 months (in Table 2-1, zero indicates that the speaker resided in an English-speaking country for less than 3 months). At the time of data collection, all four advanced speakers were UCLA students, while all four beginning speakers were college students in Tokyo, Japan. AE3 and BE1–3 were male speakers, and the others were female. All 8 Japanese speakers of L2 English speak the Tokyo dialect as their mother tongue9.

Experiment 2 (Japanese)

The control group for Experiment 2 consisted of 4 native speakers of Japanese (Tokyo dialect) who were college students in Tokyo, Japan (NJ1, NJ2, NJ3 and NJ4). They were all male speakers except NJ4. The experimental groups consisted of 7 native speakers of American English learning Japanese (L2 Japanese speakers, henceforth). As in Experiment 1, the number of years of residence in the country of the target language (here, Japan) was used to group L2 Japanese speakers into the advanced and beginning learner groups. The background information for all seven speakers of L2 Japanese is summarized in Table 2-2. The three advanced speakers of L2 Japanese (AJ1, AJ2 and AJ3) had been staying in Japan for at least 4 years, while the beginning speakers of L2 Japanese (BJ1, BJ2, BJ3 and BJ4) had never stayed in Japan for more than 3 months (in Table 2-2, zero indicates that the speaker’s stay in Japan was shorter than 3 months). AJ1, AJ2, BJ1, BJ2 and BJ4 were male speakers, while AJ3 and BJ3 were female speakers. All beginning speakers of L2 Japanese were undergraduates studying Japanese as a foreign language at UCLA.

9 It is important to keep the same dialectal background for Japanese speakers learning English, since Japanese dialects greatly differ in terms of tonal patterns.

Table 2-2: Background information of L2 Japanese speakers in Experiment 2

       age  gender  years of      age of      age of beginning  duration of
                    residence     arrival in  of Japanese       Japanese
                    in Japan      Japan       instruction       instruction
AJ1    37   male    11 years      26          25                3 years 9 months
AJ2    26   male    4 years       20          18                9 years
AJ3    30   female  4 years       26          19                5 years
BJ1    22   male    0             ---         19                2 years 6 months
BJ2    18   male    0             ---         14                4 years 7 months
BJ3    24   female  0             ---         14                6 years 6 months
BJ4    24   male    0             ---         21                2 years 6 months

Differences in learning backgrounds between L2 English and L2 Japanese

Even though the proficiency groups in each L2 type were determined on the basis of the same criterion (the years of residence), the learning background of the speakers of the two L2 types is not matched in terms of two factors which may significantly affect their performance in the experiments: the age at which they began to learn their L2, and the type of instruction they received in L2. All 8 Japanese speakers of L2 English in Experiment 1 started learning English at the age of 12 or 13 (see Table 2-1), continuing in junior and senior high schools. They went through curricula with a strong emphasis on grammar and reading comprehension, having teachers whose native language was Japanese, not English. In contrast, five out of the seven L2 Japanese speakers in Experiment 2 began to learn Japanese in college (the exceptions are BJ2 and BJ3, who started in junior high school). In addition to Japanese grammar, all seven L2 Japanese speakers have been learning speaking and listening skills from teachers whose mother tongue was Japanese. Thus, in summary, L2 English speakers in Experiment 1 began to learn their L2 earlier than L2 Japanese speakers in Experiment 2 did; however, L2 English speakers received less authentic input in their target language than L2 Japanese speakers. It would be ideal to avoid these differences, but realistically, it is hard to perfectly match the learning background of L2 English and L2 Japanese groups, since the social status of the two target languages is different in Japan and the United States. Still, it is important to be aware of the aforementioned differences when we later interpret the data collected.

2.1.6. Procedure

Recording

For each experiment, sentences with target words were mixed with foil sentences containing words different from the target words. Sentences in each reading of the list were pseudo-randomized in 10 different orders. The first reading was not analyzed. In the recording session, PsyScope was used to present sentences. One sentence was displayed on the computer screen at a time. The subjects were given sufficient time to practice the speech materials. They were asked to read sentences without hesitations or pauses in the middle. Data were recorded in the recording booth of the UCLA phonetics lab for the L1 English, advanced L2 English, and L2 Japanese groups, and in the recording room of the Meiji Gakuin University Information Center in Tokyo for the L1 Japanese and beginning L2 English groups.

Measurements

Collected data were digitized with Kay Elemetrics’s CSL at a 10 kHz sampling rate. Scicon’s PitchWorks was used to measure F0 and duration values. Tokens were not analyzed if:

• there were errors in accent or stress placement on a target word
• there were hesitations or pauses in the middle of the sentence
• there were exceptional sequences of phonological tones

Duration was measured for the first vowel of a target word, using waveforms and wide-band spectrograms. For the first syllable of con tract and per mit in Experiment 1, the duration of the whole syllable rhyme (nucleus vowel + coda consonant) was measured because of segmentation difficulties. F0 was measured at the center of the first syllable, using a pitch extraction display.
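As a rough illustration of the two measurements just described, the sketch below computes a segment duration from its boundaries and reads F0 at the temporal center of that segment. It is only a sketch under stated assumptions: the function names, the representation of the pitch track as (time, F0) samples, and the numeric values are all hypothetical, and the study itself took these measurements in PitchWorks rather than with code like this.

```python
def duration_ms(start_s, end_s):
    """Duration of a labeled segment in milliseconds, from boundaries in seconds."""
    return (end_s - start_s) * 1000.0


def f0_at_center(f0_track, start_s, end_s):
    """F0 (Hz) of the pitch sample closest to the temporal center of a segment.

    f0_track is assumed to be a list of (time_s, f0_hz) samples.
    """
    center = (start_s + end_s) / 2.0
    nearest = min(f0_track, key=lambda sample: abs(sample[0] - center))
    return nearest[1]


# Hypothetical segment boundaries and pitch samples, for illustration only.
track = [(0.10, 180.0), (0.15, 200.0), (0.20, 190.0)]
print(duration_ms(0.10, 0.20))
print(f0_at_center(track, 0.10, 0.20))
```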

Statistical analysis

Obtained values of F0 and duration were analyzed using two-factor ANOVAs and Scheffe’s post-hoc tests. The independent variables of the ANOVAs were word accent and word pair. The focus of Experiments 1 and 2 is on the effect of word accent on F0 and duration (stressed vs. unstressed conditions in Experiment 1; accented vs. unaccented conditions in Experiment 2). The effect of word pair was included in the ANOVAs in order to control for the variance generated by this factor.
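To make the logic of the test concrete, the following sketch computes the F ratio for the main effect of word accent in a two-factor design. It makes simplifying assumptions that the source does not state: a balanced design (the same number of tokens in every accent-by-pair cell) and an error term taken from within-cell variation. The function name `accent_f_ratio` and the example values are hypothetical, and the sketch is not a reconstruction of the statistical package used in the study.

```python
from statistics import mean


def accent_f_ratio(cells):
    """F ratio for the main effect of accent in a balanced two-factor design.

    cells maps (accent, word_pair) -> list of measurements; every combination
    must be present with the same number of tokens.
    Returns (F, df_effect, df_error).
    """
    accents = sorted({accent for accent, _ in cells})
    pairs = sorted({pair for _, pair in cells})
    n = len(next(iter(cells.values())))
    if any(len(values) != n for values in cells.values()):
        raise ValueError("this sketch assumes a balanced design")

    all_values = [x for values in cells.values() for x in values]
    grand_mean = mean(all_values)

    # Between-level variation for the accent factor.
    ss_accent = 0.0
    for accent in accents:
        level = [x for pair in pairs for x in cells[(accent, pair)]]
        ss_accent += len(level) * (mean(level) - grand_mean) ** 2

    # Within-cell (error) variation.
    ss_error = sum((x - mean(values)) ** 2
                   for values in cells.values() for x in values)

    df_effect = len(accents) - 1
    df_error = len(all_values) - len(accents) * len(pairs)
    f_ratio = (ss_accent / df_effect) / (ss_error / df_error)
    return f_ratio, df_effect, df_error


# Hypothetical values, for illustration only (not data from the study).
example = {
    ("stressed", "pair1"): [10, 12],
    ("unstressed", "pair1"): [6, 8],
    ("stressed", "pair2"): [11, 13],
    ("unstressed", "pair2"): [7, 9],
}
print(accent_f_ratio(example))
```

Including word pair as a second factor removes its variance from the error term, which is the role the text assigns to it.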

2.1.7. Results of Experiment 1 (English)

F0 patterns

For each L1 English speaker, the F0 mean and standard deviation of stressed and unstressed vowels are plotted in Figure 2-1. The results show that all four speakers of L1 English produced a greater F0 mean in the stressed condition than in the unstressed condition for all four tested English word pairs.

Figure 2-1: F0 means & standard deviations of stressed vs. unstressed vowels for L1 English

[Figure: panels NE1, NE2, NE3, NE4; y-axis: F0 (Hz); x-axis: word pair (contract, digest, permit, subject); bars: stressed vs. unstressed; * = significantly different at α = 0.01]

We conducted ANOVAs in order to investigate the effect of word accent (stressed vs. unstressed target vowels) on the F0 values of the tested vowels. The results showed that the effect of word accent was statistically significant for the data of every L1 English speaker (p < 0.0001). ANOVA results for each speaker are presented in Table 2-3. Scheffe’s post-hoc tests were conducted to test differences in F0 means between the stressed and unstressed vowel in each word pair. Word pairs showing significant differences are marked by an asterisk (α = 0.01) in Figure 2-1. If we examine the distribution of significant differences between stressed and unstressed means, we find two patterns across the four speakers. NE2 and NE4 show significantly higher F0 in the stressed condition for all 4 word pair types. NE1 and NE3 show this pattern for 2 out of 4 word pair types; however, stressed means are still greater in the two pairs showing no significant difference, indicating a trend for a greater mean in the stressed condition. These results suggest that in L1 English, overall, stressed vowels with a H* accent are significantly higher in F0 than unstressed vowels.

Table 2-3: ANOVA results for F0 data of L1 and L2 English in Experiment 1

L1 English                  Advanced L2 English          Beginning L2 English

NE1  F(1, 59) = 22.13       AE1  F(1, 62) = 1090.42      BE1  F(1, 61) = 1319.37
     p < .0001                   p < .0001                    p < .0001

NE2  F(1, 47) = 110.37      AE2  F(1, 62) = 1995.15      BE2  F(1, 51) = 208.03
     p < .0001                   p < .0001                    p < .0001

NE3  F(1, 42) = 27.69       AE3  F(1, 59) = 101.01       BE3  F(1, 63) = 151.31
     p < .0001                   p < .0001                    p < .0001

NE4  F(1, 53) = 86.76       AE4  F(1, 63) = 269.05       BE4  F(1, 57) = 581.60
     p < .0001                   p < .0001                    p < .0001

All Japanese speakers of L2 English showed consistently higher F0 for stressed vowels in all 4 word pairs. In Figure 2-2, F0 means and standard deviations are shown for representative speakers of the advanced and beginning L2 English groups (AE1 and BE3, respectively). The results of a series of ANOVAs showed a significant effect of word accent on the F0 values of the target vowels (p < 0.0001) in the F0 data of each L2 English speaker. ANOVA results for each speaker are presented in Table 2-3. According to the results of the Scheffe’s post-hoc tests (α = 0.01), the mean difference between stressed and unstressed vowels was always statistically significant. Word pairs showing significant differences are marked by an asterisk (α = 0.01) in Figure 2-2.

Figure 2-2: F0 means & standard deviations of stressed vs. unstressed vowels for Speakers AE1 and BE3

[Figure: panels AE1, BE3; y-axis: F0 (Hz); x-axis: word pair (contract, digest, permit, subject); bars: stressed vs. unstressed; * = significantly different at α = 0.01]

Additionally, F0 ratios of stressed to unstressed vowels were computed for each speaker’s data by dividing the F0 value of each stressed vowel token by that of the corresponding unstressed token for each repetition of each word pair type. For each speaker, the obtained ratio values were pooled across all word pairs and repetitions, and the mean ratio and standard deviation were computed. Results are summarized in Figure 2-3.
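The ratio procedure can be sketched as follows for a single speaker. This is a minimal illustration, not the original analysis: the token values are hypothetical, the function names are introduced here for convenience, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def accent_ratios(tokens):
    """Ratio of the stressed to the unstressed value for each paired token.

    tokens is a list of (word_pair, repetition, stressed_value, unstressed_value).
    """
    return [stressed / unstressed for _, _, stressed, unstressed in tokens]


def summarize(tokens):
    """Pool ratios across word pairs and repetitions; return (mean, SD)."""
    ratios = accent_ratios(tokens)
    return mean(ratios), stdev(ratios)  # sample SD: an assumption


# Hypothetical F0 values in Hz for one speaker, for illustration only.
speaker_tokens = [
    ("digest", 1, 200.0, 160.0),
    ("digest", 2, 210.0, 168.0),
    ("permit", 1, 180.0, 150.0),
]
mean_ratio, sd_ratio = summarize(speaker_tokens)
print(round(mean_ratio, 3), round(sd_ratio, 3))
```

A mean ratio above 1 indicates a higher value in the stressed condition, and a ratio of 1 indicates no contrast.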

Figure 2-3: Average F0 ratio of English stressed/unstressed vowels

[Figure: y-axis: F0 ratio; x-axis: speakers NE1–NE4 (L1 English), AE1–AE4 (advanced L2 English), BE1–BE4 (beginning L2 English)]

Figure 2-3 shows that L1 English speakers overall show smaller F0 ratios than beginning Japanese speakers of L2 English. We find three patterns in advanced L2 English data: AE3 shows a ratio in the range of L1 English ratios; AE4 shows a ratio in the range of ratios produced by beginning L2 English; finally, AE1 and AE2 show the greatest ratios. On the basis of these data, we can conclude that there is no systematic relation between L2 proficiency and the F0 ratios of stressed/unstressed vowels.

Duration patterns

All four L1 English speakers showed the same pattern: stressed vowels were consistently longer than unstressed vowels in all four word pairs. An example of this is shown in Figure 2-4, where means and standard deviations of stressed and unstressed vowels of a representative speaker (NE1) are presented.

Figure 2-4: Duration means and standard deviations of stressed and unstressed vowels for Speaker NE1 (L1 English)

[Figure: panel NE1; y-axis: duration (ms); x-axis: word pair (contract, digest, permit, subject); bars: stressed vs. unstressed; * = significantly different at α = 0.01]

The results of a series of ANOVAs showed that the effect of word accent on vowel duration was statistically significant for the data of every L1 English speaker (see Table 2-4 for ANOVA results for each speaker). According to the results of a series of Scheffe’s post-hoc tests (α = 0.01), the difference in duration of the stressed vs. unstressed condition was statistically significant for all 4 tested word pairs and all 4 L1 English speakers.

Table 2-4: ANOVA results for duration data of L1 English in Experiment 1

NE1  F(1, 61) = 156.15      NE3  F(1, 53) = 406
     p < .0001                   p < .0001

NE2  F(1, 59) = 470.19      NE4  F(1, 54) = 241.17
     p < .0001                   p < .0001

The durational contrast between stressed and unstressed vowels can be quantified by computing duration ratios of stressed to unstressed vowels. For all speakers, the method used to compute F0 ratios in Figure 2-3 was also used for duration ratios. Results are summarized in Figure 2-5.

Figure 2-5: Average duration ratio of English stressed/unstressed vowels

[Figure: y-axis: durational ratio, with a reference line at 1 (stressed = unstressed, no difference); x-axis: speakers NE1–NE4 (L1 English), AE1–AE4 (advanced L2 English), BE1–BE4 (beginning L2 English)]

For the four speakers of L1 English, the mean ratios approximately range from 1.6 (NE1) to 2.0 (NE2). In contrast, the ratios of beginning L2 English speakers are clustered around 1.0, which indicates that there is no durational difference between stressed and unstressed vowels. Advanced L2 English speakers show ratio magnitudes somewhere in the middle between L1 English and beginning L2 English. These results suggest the following generalization: advanced Japanese speakers of L2 English produce native-like duration patterns more reliably than beginning speakers, showing greater duration ratios of stressed/unstressed vowels. The results show two points regarding the effect of transfer. First, we found the expected effect of negative transfer from L1 Japanese in that beginning Japanese learners of English produced Japanese-like duration ratios of stressed/unstressed vowels. Second, there is a positive effect of learning, given that advanced Japanese learners show more native-like ratio magnitudes.

2.1.8. Results of Experiment 2 (Japanese)

F0 patterns

Both L1 and L2 Japanese speakers showed consistently higher F0 for accented vowels in all 3 tested word pairs. The results of a series of ANOVAs showed that the effect of word accent on the F0 values of the tested vowels was statistically significant in the data of every speaker (see Table 2-5 for ANOVA results for each speaker). According to the results of Scheffe’s post-hoc tests (α = 0.01), the mean difference between accented and unaccented vowels was always statistically significant for every speaker. As in the analysis of the Experiment 1 data, F0 ratios were computed for all speakers. For each speaker, the F0 value of each accented vowel token was divided by the value of the corresponding unaccented token for each repetition of each word pair type. For each speaker, the obtained ratio values were pooled across all word pairs and repetitions, and the mean ratio and standard deviation were computed. Results are summarized in Figure 2-6.

Table 2-5: ANOVA results for F0 data of L1 and L2 Japanese in Experiment 2

L1 Japanese                 Advanced L2 Japanese         Beginning L2 Japanese

NJ1  F(1, 45) = 241.26      AJ1  F(1, 32) = 133.53       BJ1  F(1, 30) = 98.2
     p < .0001                   p < .0001                    p < .0001

NJ2  F(1, 34) = 250.4       AJ2  F(1, 28) = 124.05       BJ2  F(1, 26) = 72.84
     p < .0001                   p < .0001                    p < .0001

NJ3  F(1, 41) = 104.14      AJ3  F(1, 59) = 122.35       BJ3  F(1, 33) = 122.95
     p < .0001                   p < .0001                    p < .0001

NJ4  F(1, 43) = 183.1                                    BJ4  F(1, 23) = 92.09
     p < .0001                                                p < .0001

Figure 2-6: Average F0 ratio of Japanese accented/unaccented vowels

[Figure: y-axis: F0 ratio; x-axis: speakers NJ1–NJ4 (L1 Japanese), AJ1–AJ3 (advanced L2 Japanese), BJ1–BJ4 (beginning L2 Japanese)]

The comparison of individual ratios across the three speaker groups shows no systematic correlation between L2 Japanese proficiency and F0 ratios, indicating that the F0 characteristics of L1 English facilitate L2 Japanese produced by L1 English speakers regardless of their L2 proficiency levels. This shows the effect of positive transfer.

Duration patterns

For each L1 Japanese speaker, duration means and standard deviations of accented and unaccented vowels are plotted in Figure 2-7.

Figure 2-7: Duration means & standard deviations of accented vs. unaccented vowels for L1 Japanese

[Figure: panels NJ1, NJ2, NJ3, NJ4; y-axis: duration (ms); x-axis: word pair (kiru, satoo, toshi); bars: accented vs. unaccented; * = significantly different at α = 0.01]

In Experiment 1, we found that all L1 English speakers produced stressed vowels as systematically longer than unstressed vowels, for all tested English word pairs. This is not the case in L1 Japanese production. In Figure 2-7, we can observe some cases in which the mean duration of accented vowels is greater than the mean of unaccented vowels, but this pattern is not consistent within the production of any L1 Japanese speaker. I conducted ANOVAs for the data of each L1 Japanese speaker, in order to examine the effect of word accent on vowel duration. The results showed no significant effect (α = 0.01) in the data of any of the four L1 Japanese speakers (see Table 2-6 for ANOVA results of each speaker). Scheffe’s post-hoc tests were also conducted in order to test the significance of the difference of mean duration between accented and unaccented position for each word pair and each L1 Japanese speaker (α = 0.01). In only one instance were the accented and unaccented means significantly different (i.e., the kiru pair in NJ4’s production), indicated by an asterisk in Figure 2-7. These results lead to the conclusion that in L1 Japanese (unlike in L1 English), duration does not play an essential role in differentiating the production of accented and unaccented monomoraic vowels.

Table 2-6: ANOVA results for duration data of L1 Japanese in Experiment 2

NJ1  F(1, 45) = .25         NJ3  F(1, 41) = 5.8
     p = .6195                   p = .0206

NJ2  F(1, 34) = 2.14        NJ4  F(1, 45) = 3.2
     p = .1527                   p = .0822

Following the same method for computing F0 ratios of accented to unaccented vowels which I used for Figure 2-6, duration ratios of accented to unaccented vowels were computed for all L1 and L2 Japanese speakers. For each speaker, obtained ratio values were pooled across all word pairs, and a mean ratio and standard deviation were computed. Results are summarized in Figure 2-8. In this figure, the absence of a significant durational contrast in L1 Japanese speech, which was already observed in Figure 2-7, is shown by the fact that the ratios of NJ1–4 cluster around 1.0.

Figure 2-8: Average duration ratio of Japanese accented/unaccented vowels

[Figure: y-axis: durational ratio, with a reference line at 1 (accented = unaccented, no difference); x-axis: speakers NJ1–NJ4 (L1 Japanese), AJ1–AJ3 (advanced L2 Japanese), BJ1–BJ4 (beginning L2 Japanese)]

The following duration patterns were observed for L2 Japanese. In the production of the seven speakers of L2 Japanese, the magnitudes of duration ratios fall roughly into three ranges. The ratios of AJ2, AJ3 and BJ2 fall into the range of L1 Japanese ratios; AJ1, BJ1 and BJ3 show the greatest duration ratios among all seven speakers of L2 Japanese; finally, BJ4 shows a ratio somewhere in the middle. The large ratios of AJ1, BJ1 and BJ3 confirm the effect of the transfer of the durational features of L1 English stress in their production of L2 Japanese accent. Notice that AJ1, an advanced speaker, shows the greatest duration ratio among all seven L2 Japanese speakers, and that BJ2, a beginning speaker, shows a ratio close to the range of L1 Japanese ratios. These results suggest that, unlike in the case of L2 English produced by native speakers of Japanese, there is no systematic correlation between duration ratios of accented/unaccented vowels and L2 proficiency levels in L2 Japanese produced by native speakers of English.

2.1.9. Discussion of Experiments 1 and 2

The results of Experiments 1 and 2 are summarized in Table 2-7 and Table 2-8, respectively, which outline how the F0 and duration correlates were manipulated in word accent production.

Table 2-7: Summary of results of Experiment 1 (English)

            L1 English                      L2 English–L1 Japanese

F0          stressed > unstressed           stressed > unstressed
            (significant difference         (significant difference
            with 4 exceptions)              for all cases)

duration    stressed > unstressed           positive correlation between
            (significant difference         L2 English proficiency and
            for all cases)                  durational contrasts

Table 2-8: Summary of results of Experiment 2 (Japanese)

            L1 Japanese                     L2 Japanese–L1 English

F0          accented > unaccented           accented > unaccented
            (significant difference         (significant difference
            for all cases)                  for all cases)

duration    accented = unaccented           no systematic correlation between
            (no significant difference      L2 Japanese proficiency and
            with one exception)             durational contrasts

Manipulation of F0 and duration in L1 English vs. L1 Japanese

Results of Experiments 1 and 2 show that in both L1 English and L1 Japanese the F0 correlate is actively manipulated in the production of word accent. However, we also find that an F0 contrast between accented and unaccented vowels is more consistently realized in L1 Japanese than in L1 English.

Interestingly, the absence of consistency in the F0 contrast in L1 English is compensated by a robust durational contrast. These differences between the two languages can possibly be explained in the following way. In Japanese, the distribution of lexical pitch accent greatly determines the tonal structure of the utterance, and there is only one type of pitch accent (H*+L). In this language, lexically accented syllables are phonologically specified with high tones (immediately followed by low tones) and phonetically realized with a higher F0 peak than unaccented syllables. In contrast, in English, the position of lexical stress specifies the possible landmark for pitch accents, but pitch accents can have either high or low tones. Therefore, if a word stress receives a pitch accent with a low tone (e.g., L*), a syllable with this word stress and a low tone accent is produced with lower F0 than an adjacent unstressed syllable. Importantly, even in this tonal context stressed syllables are longer than unstressed syllables10. Thus, it is reasonable to say that duration is a more stable correlate of word stress than F0. Considering all, the manipulation of F0 and duration for word accent production can be summarized for L1 English and L1 Japanese in the following manner:

• In English, both F0 and duration are actively manipulated, but duration plays a more stable role than F0 in order to realize the acoustic contrast of stressed and unstressed vowels.

• In Japanese, only F0 is significantly manipulated.

Development of F0 manipulation in L2 English and L2 Japanese

The analysis of F0 patterns in Experiments 1 and 2 showed that lexically accented vowels were consistently higher in pitch than unaccented vowels in both L2 English–L1 Japanese and L2 Japanese–L1 English regardless of L2 proficiency levels. This suggests that the active role of F0 in the L1 accent system positively transfers to both L2 English and L2 Japanese, as expected. Our L1 English data showed that, although F0 is actively manipulated, the F0 contrast is not as consistently realized as the durational contrast in L1 English. Interestingly, this characteristic of L1 English does not transfer to L2 Japanese produced by native speakers of English: all seven speakers of L2 Japanese in Experiment 2 realized significant F0 contrasts with good consistency. In other words, the F0 correlate, which is manipulated actively but not consistently in L1 English, is manipulated actively and consistently in the word accent production of L2 Japanese–L1 English.

The data of Experiment 1 showed that all eight Japanese speakers of L2 English produced English stress with significant F0 contrasts, as shown in Figure 2-3. There was no systematic relation between L2 proficiency and the use of F0. This confirms the results of previous studies (for production, Shibuya 1997; for perception, see Beckman 1986, Watanabe 1987), indicating that the F0 features of L1 Japanese largely facilitate the production of English stress.

As reviewed in Section 2.1.3, Beckman’s (1986) study examined the role of the F0 cue in the perception of L2 Japanese–L1 English. She found that exposure to authentic Japanese was positively correlated with the use of F0 as the main cue in the perception of Japanese accents by English speakers. Such a correlation, however, was not observed in the production data of Experiment 2: in their production of Japanese accents, all of the seven speakers of L2 Japanese used F0 as consistently as L1 Japanese speakers did, regardless of L2 proficiency levels.

The difference in results between Beckman’s perception study and the present study may be explained by the combination of the following two factors. First, the two studies examined Americans with different language backgrounds. In Beckman’s study, monolingual Americans with no knowledge of Japanese were compared with Americans speaking Japanese who had lived in Japan at least one year in the past. The present study examined two groups of Americans learning Japanese: Americans with at least 2.5 years of classroom knowledge of Japanese, who had never lived in Japan (the beginning group), and Americans who had lived in Japan at least four years (the advanced group). None of Beckman’s monolingual subjects had any exposure to authentic Japanese in the past, while all four Americans in the beginning group of the present study had listened to authentic Japanese in the classroom for at least 2.5 years (see Table 2-2 for their Japanese backgrounds). F0 is the dominant acoustic correlate of Japanese accent, carrying a significant degree of perceptual salience in L1 Japanese speech, and it is likely that a few years of exposure to authentic Japanese is sufficient for Americans to extract this feature of Japanese accents and actually realize it in their Japanese production. Second, the different results in the two studies may be simply attributed to a yet-to-be-identified difference between L2 perception and L2 production.

10 Also, a stressed syllable associated with a pitch accent with a low tone is still louder than an unstressed syllable (although no data on this is available from Experiment 1 of the present study).

Development of duration manipulation in L2 English and L2 Japanese

The results of Experiments 1 and 2 presented an asymmetrical pattern between the two L2 types in terms of the development of duration manipulation. For L2 English, advanced Japanese speakers showed native-like durational patterns more reliably than beginning speakers. However, for L2 Japanese there was no systematic difference in duration manipulation between advanced and beginning speakers.

This asymmetry can be explained in the following way. In L1 English, duration is only manipulated at the phonetic level: there is no phonemic length contrast, but stressed syllables are predictably longer than unstressed syllables. Since duration plays no phonemic role in L1 English, the manipulation of duration is only an unconscious phonetic habit for native speakers of English. Thus, English speakers learning Japanese are expected to have a hard time controlling duration when trying to suppress the durational contrast between accented and unaccented syllables in their L2 Japanese speech.

On the other hand, in L1 Japanese duration is not actively manipulated in word accent production; thus, there is no significant durational contrast between accented and unaccented syllables. However, duration is active at the phonological level in this language, in the sense that there is a phonemic length distinction between short and long segments (e.g., oki ‘offshore’ vs. ooki ‘(last name)’). Short and long segments are categorically perceived as one or two moras long by native speakers of Japanese. Therefore, it can be predicted that Japanese speakers learning English are sensitive to the duration cue and become able to control duration consciously as they gain more learning experience.

2.2. Experiment 3: Word accent production after focus in L2 English

2.2.1. Prosodic context of the target word in Experiment 1: Nuclear position

In Experiment 1, the target word was embedded in nuclear position, where the word stress is realized with the highest pitch peak of the entire utterance, as indicated by Figure 2-9. A stress in nuclear position is equivalent to what was labeled ‘sentence stress’ in the traditional description of English prosody. The term nuclear (pitch) accent is commonly employed in Intonational Phonology (Pierrehumbert 1980; Beckman and Pierrehumbert 1986; Ladd 1996). According to the results of Experiment 1, only advanced Japanese speakers of L2 English were able to produce a significant durational contrast between stressed and unstressed vowels in addition to an F0 contrast.

Figure 2-9: Realization of nuclear pitch accent in English

[Schematic pitch (F0) contour over the utterance "I said SUBject this time": the word stress of the target word, in nuclear position, carries the highest pitch peak.]

2.2.2. Possible learning strategy in L2 English–L1 Japanese

It is possible that advanced Japanese speakers of L2 English in Experiment 1 used the following strategy: they used an acoustic correlate which is already active in the production of L1 word accent (i.e., F0), in order to learn to control a correlate which is not active in L1 (i.e., duration). This hypothesis can be phrased in the following way:

Japanese learners of English first employ F0, an active correlate in their L1 Japanese accent system, in order to contrast lexically stressed and unstressed vowels in their L2 English production, and later learn to use duration, an inert correlate in their L1 system. By employing this strategy, advanced Japanese speakers of L2 English still use F0 as the major stress cue, and they lengthen vowels when they raise F0 (pitch) in the word stress context.

The main goal of Experiment 3 is to determine whether this learning strategy is employed in the production of English stress by native Japanese speakers.

2.2.3. Context of the target word in Experiment 3: Post-nuclear position

Whether Japanese speakers of L2 English employ the aforementioned learning strategy can be tested in a prosodic context where F0 and duration are not positively correlated in English stress realization. This condition is satisfied by post-nuclear position. In English, contrastive focus is assigned to items that are contrasted in a pragmatic context, as shown in Figure 2-10. In the pragmatic context in Figure 2-10, the subjects of the two sentences, Bob and I, are contrasted and attract sentence stress with the highest pitch peak. In American English, after a contrastive focus, the F0 opposition between stressed and unstressed syllables disappears (post-nuclear deaccentuation), but the durational contrast remains (see Huss 1978 and Ueyama & Jun 1998 for acoustic evidence). Therefore, in post-nuclear position, the phonetic contrast between stressed and unstressed syllables is still realized in terms of the duration correlate in American English, even though their F0 contrast is lost.

Figure 2-10: Realization of post-nuclear word stress in English

[Schematic pitch (F0) contour over "Bob didn’t say SUBject. I said SUBject this time": the contrastive focus is in nuclear position; the word stress of the target word SUBject is in post-nuclear position.]

In Japanese, there is a strong tendency toward the suppression of the F0 contrast between accented and unaccented syllables after a contrastive focus (Maekawa 1994). However, F0 suppression in post-focus position is not as complete in Japanese as in English, and there may be a trace of a contrast between lexically accented and unaccented syllables after a contrastive focus, even though the magnitude of the F0 contrast is less (Maekawa 1994). Thus, it can be said that English generally shows a greater degree of F0 suppression in post-focus position than Japanese.

2.2.4. Expected patterns

F0 patterns

Based on the findings of earlier studies, it can be predicted that in L1 English there will be no significant F0 contrast between lexically stressed and unstressed vowels after contrastive focus, because of post-focus deaccentuation. The study of Ueyama and Jun (1998) examined how post-focus deaccentuation is learned by Japanese learners of English. The results of their experiment showed that there is a positive correlation between L2 oral proficiency and the mastery of post-focus deaccentuation: i.e., more advanced Japanese learners are better at deleting pitch accents in order to realize a plateau-shaped F0 contour after a contrastive focus. This finding leads us to the following prediction: more proficient Japanese learners of English will show less F0 contrast between stressed and unstressed syllables in post-nuclear position.

Duration patterns

According to Huss's (1978) findings, the duration contrast between stressed and unstressed vowels in English is preserved after a contrastive focus within the same prosodic phrase, although the F0 contrast disappears. Thus, it can be predicted that in L1 English stressed vowels will be significantly longer than unstressed vowels even in post-nuclear position. The results of Experiment 1 in the present study show that increased exposure to the native input of a target language (i.e., English) can help Japanese speakers of L2 English to activate duration, which is an inert correlate in L1 Japanese, in English neutral declaratives. Since the help of F0 as a stress cue is not available in post-nuclear position, where the F0 contrast disappears, Japanese learners of English are expected to have more difficulty producing a durational contrast between stressed and unstressed vowels in this prosodic context.

2.2.5. Subjects

Four speakers of L1 English, four advanced Japanese speakers of L2 English, and three beginning Japanese speakers of L2 English participated in Experiment 3. All of these speakers had also participated in Experiment 1.

2.2.6. Speech materials

The same pairs of nouns and verbs used in Experiment 1 were used in Experiment 3: CONtract vs. conTRACT; DIgest vs. diGEST; PERmit vs. perMIT; SUBject vs. subJECT. A context sentence and frame sentences with the same target word were presented in order to elicit an expected prosodic pattern:

Context: I read Reader’s Digest.

Frame: MOTO didn’t say digest. I said digest this time.

Here it is expected that the two items pragmatically contrasted, MOTO and I, attract a nuclear pitch accent in each sentence and that the post-focus part of each sentence, including the target word digest, is deaccented. The first vowel of the target word in the second frame sentence (I said digest this time, in this example) was measured for F0 and duration values. Obtained values were analyzed by comparing stressed vs. unstressed conditions (e.g., DIgest vs. diGEST) in order to examine the effect of word stress on each of the two acoustic correlates of word accent.

2.2.7. Procedure

The same procedure used in Experiment 1 was used (see Section 2.1.6.).

2.2.8. Results of Experiment 3

F0 patterns

As discussed in Section 2.2.4, considering the findings of Huss (1978) and Ueyama & Jun (1998), we expected that L1 English speakers would show no significant F0 contrast between stressed and unstressed vowels after a contrastive focus. Figure 2-11 shows the F0 means and standard deviations of the first syllable of the target word in stressed and unstressed conditions for each L1 English speaker and each word pair. In Figure 2-11, a word pair showing a significant difference between the stressed and unstressed means in Scheffé’s post-hoc test is marked by an asterisk (α = 0.01). The four speakers of L1 English show no significant F0 contrast between stressed and unstressed vowels except for one case (the permit pair in NE3’s speech). This result confirms the expected pattern: in L1 English there is no F0 contrast between stressed and unstressed vowels in post-nuclear position.

Figure 2-11: F0 means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position

[Bar charts, one panel per speaker (NE1, NE2, NE3, NE4): F0 (Hz) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]

Figures 2-12 and 2-13 show F0 means and standard deviations for advanced and beginning L2 English, respectively. In Figure 2-12, we find two patterns among the four advanced Japanese speakers of L2 English. First, AE1, AE3 and AE4 made no F0 distinction between stressed and unstressed vowels in post-nuclear position, indicating that they were able to deaccent the word stress of the target word in a native-like manner. Second, AE2 made a significant difference for two word pairs (digest and permit), but not for the other two pairs (contract and subject).

Figure 2-12: F0 means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position

[Bar charts, one panel per speaker (AE1, AE2, AE3, AE4): F0 (Hz) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]

In Figure 2-13, two patterns are observed among the three beginning speakers of L2 English. First, BE1 and BE3 produced stressed vowels consistently higher than unstressed vowels for all 4 word pairs, indicating the absence of post-focus deaccentuation. Second, BE2 showed the same non-native-like pattern for permit and subject, but not for the remaining two pairs.

Figure 2-13: F0 means & standard deviations of stressed vs. unstressed vowels for beginning L2 English in post-nuclear position

[Bar charts, one panel per speaker (BE1, BE2, BE3): F0 (Hz) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]

A statistically significant difference between stressed and unstressed F0 means indicates the presence of a significant F0 contrast between stressed and unstressed vowels and the absence of post-nuclear deaccentuation; no significant difference indicates the realization of deaccentuation in post-nuclear position. The distribution of significant F0 contrasts is summarized for all L1 and L2 English speakers in Table 2-9. The native pattern is characterized by the absence of F0 contrast (i.e., post-nuclear deaccentuation), while the non-native pattern is characterized by the presence of a significant F0 contrast. In the table, statistically significant differences are indicated by a check mark, and cells showing non-native patterns are shaded (see footnote 11).

Table 2-9: F0 contrast in post-nuclear position

        contract   digest   permit   subject
BE1     √          √        √        √
BE2     no         no       √        √
BE3     √          √        √        √
AE1     no         no       no       no
AE2     no         √        √        no
AE3     no         no       no       no
AE4     no         no       no       no
NE1     no         no       no       no
NE2     no         no       no       no
NE3     no         no       √        no
NE4     no         no       no       no

√ = stressed vowel is significantly higher in F0 than unstressed vowel (α = 0.01)
shaded cell = expected non-native pattern

Observed L2 English patterns can be classified into three types:

• AE1, AE3 and AE4 suppressed the F0 contrast of stressed and unstressed vowels in post-nuclear position for all 4 tested word pairs; this indicates that these three advanced speakers of L2 English are able to realize native-like post-nuclear deaccentuation successfully.

• BE1 and BE3 preserved the F0 contrast of stressed and unstressed vowels for all 4 tested word pairs; this means that these two beginning speakers of L2 English do not deaccent post-nuclear stress at all.

• AE2 and BE2 suppressed the F0 contrast for two pairs, but not for the other two pairs; this indicates that these two speakers have begun to learn but have not mastered post-nuclear deaccentuation.

11 For F0 data, the presence of a significant difference in stressed and unstressed vowels is the expected non-native pattern. In contrast, for duration data, the absence of a significant difference is the expected non-native pattern. In order to show this difference between F0 and duration in terms of expected non-native patterns, I used both check marks and shaded cells.

These results show the following general pattern for L2 English data: more advanced learners tend to realize post-nuclear deaccentuation (i.e., suppress F0 contrast of stressed and unstressed vowels after a contrastive focus) more reliably. This confirms the findings of Ueyama & Jun (1998).
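The three-way classification above can be sketched as a short tally over Table 2-9. This is only an illustration, not part of the original analysis: the labels "full", "partial" and "none" and the function name are assumptions introduced here, and the True/False values are simply the check marks and "no" entries transcribed from the table for the seven L2 English speakers.

```python
# F0 contrasts in post-nuclear position, transcribed from Table 2-9.
# True = stressed vowel significantly higher in F0 than unstressed vowel.
WORD_PAIRS = ("contract", "digest", "permit", "subject")
F0_CONTRAST = {
    "BE1": (True, True, True, True),
    "BE2": (False, False, True, True),
    "BE3": (True, True, True, True),
    "AE1": (False, False, False, False),
    "AE2": (False, True, True, False),
    "AE3": (False, False, False, False),
    "AE4": (False, False, False, False),
}


def deaccentuation_type(contrasts):
    """Label a speaker by how many word pairs keep a significant F0 contrast.

    The labels are hypothetical names for the three types described in the text.
    """
    kept = sum(contrasts)
    if kept == 0:
        return "full"      # F0 contrast suppressed for every word pair
    if kept == len(contrasts):
        return "none"      # F0 contrast preserved for every word pair
    return "partial"       # F0 contrast suppressed for some pairs only


types = {speaker: deaccentuation_type(c) for speaker, c in F0_CONTRAST.items()}
for speaker, label in types.items():
    print(speaker, label)
```

Under these assumptions the tally groups AE1, AE3 and AE4 as "full", BE1 and BE3 as "none", and AE2 and BE2 as "partial", which mirrors the three types listed above.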

Duration patterns

The duration means and standard deviations of stressed and unstressed vowels in post-focus position are plotted for the four speakers of L1 English and for each word pair in Figure 2-14. Scheffé’s post-hoc tests were conducted to test the statistical significance of mean differences for each word pair (α = 0.01). A word pair showing a significant difference between stressed and unstressed means is indicated by an asterisk. In Experiment 1, it was shown that in nuclear position all 4 L1 English speakers produced a stressed vowel with significantly longer duration than an unstressed vowel for all 4 word pair types. For post-nuclear position, a similar relation is observed with one exception (the subject pair in NE1’s production), as shown in Figure 2-14.

In Section 2.2.4, we predicted that Japanese learners of English would have more difficulty producing a significant durational contrast between stressed and unstressed vowels in post-nuclear position, where the F0 contrast disappears, since they cannot use the help of F0 to activate duration. This prediction can be tested only if Japanese speakers of L2 English suppress the F0 contrast successfully. Since the F0 analysis of L2 English data for Experiment 3 has already shown that the beginning Japanese speakers of L2 English have more difficulty suppressing the F0 contrast after a contrastive focus than the advanced speakers of L2 English, we will examine only the data of advanced L2 English for duration patterns.

Figure 2-14: Duration means & standard deviations of stressed vs. unstressed vowels for L1 English in post-nuclear position

[Bar charts, one panel per speaker (NE1, NE2, NE3, NE4): duration (ms) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]

For the four Japanese speakers of advanced L2 English, means and standard deviations of stressed and unstressed vowels are plotted in Figure 2-15. Word pairs showing a significant difference by Scheffé’s post-hoc tests (α = 0.01) are marked by an asterisk. In Figure 2-14, we observed a largely uniform pattern across the four speakers of L1 English: stressed vowels were significantly longer in duration than unstressed vowels for the tested word pairs, with the one exception noted above. On the other hand, Figure 2-15 shows three different patterns among the four Japanese speakers of advanced L2 English. AE1 showed a native-like significant difference between stressed and unstressed vowels for all four word pairs; AE2 and AE4 showed a significant difference for three out of four pairs; AE3 showed no significant difference for any word pair.

Figure 2-15: Duration means & standard deviations of stressed vs. unstressed vowels for advanced L2 English in post-nuclear position

[Bar charts, one panel per speaker (AE1, AE2, AE3, AE4): duration (ms) of stressed vs. unstressed vowels for the word pairs contract, digest, permit and subject; * = significantly different at α = 0.01.]

In order to show the difference between L1 English and advanced L2 English patterns, the distribution of a significant durational contrast in post-nuclear position is summarized for L1 English and advanced L2 English in Table 2-11, following the same procedure used for summarizing the distribution of significant F0 contrasts in Table 2-9. Additionally, the distribution of durational contrast in nuclear position from the data of Experiment 1 is summarized in Table 2-10. In both tables, a check mark indicates an expected native English pattern: the mean of the stressed vowel is significantly longer than the mean of the unstressed vowel. Cells showing non-native patterns are shaded.

Table 2-10: Durational contrast in nuclear position

        contract   digest   permit   subject
AE1     √          √        √        √
AE2     √          no       √        √
AE3     √          √        √        √
AE4     √          √        √        √
NE1     √          √        √        √
NE2     √          √        √        √
NE3     √          √        √        √
NE4     √          √        √        √

Table 2-11: Durational contrast in post-nuclear position

        contract   digest   permit   subject
AE1     √          √        √        √
AE2     √          √        √        no
AE3     no         no       no       no
AE4     √          √        no       √
NE1     √          √        √        no
NE2     √          √        √        √
NE3     √          √        √        √
NE4     √          √        √        √

√ = stressed vowel is significantly longer than unstressed vowel (α = 0.01)
shaded cell = expected non-native pattern

The comparison of the two tables shows the following patterns for advanced L2 English:

• AE1 and AE2 did not show a difference in nuclear vs. post-nuclear positions in terms of the frequency of non-native-like patterns: none for AE1; one word pair for AE2.

• AE3 and AE4 produced more non-native-like patterns in post-nuclear position.

These patterns suggest that Japanese learners of English generally have more difficulty producing a durational contrast in post-nuclear than in nuclear position.
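The comparison of Tables 2-10 and 2-11 can be retraced with a small count of non-native-like cells per advanced speaker. The sketch below is illustrative only: the dictionary and function names are assumptions made here, and the values are the check marks (True) and "no" entries (False) transcribed from the two tables.

```python
# Durational contrasts for the advanced L2 English speakers.
# True = stressed vowel significantly longer than unstressed vowel (native-like).
# Column order: contract, digest, permit, subject.
NUCLEAR = {  # Table 2-10 (Experiment 1)
    "AE1": (True, True, True, True),
    "AE2": (True, False, True, True),
    "AE3": (True, True, True, True),
    "AE4": (True, True, True, True),
}
POST_NUCLEAR = {  # Table 2-11 (Experiment 3)
    "AE1": (True, True, True, True),
    "AE2": (True, True, True, False),
    "AE3": (False, False, False, False),
    "AE4": (True, True, False, True),
}


def non_native_count(contrasts):
    """Number of word pairs without a significant durational contrast."""
    return sum(1 for present in contrasts if not present)


# (nuclear, post-nuclear) counts of non-native-like patterns per speaker.
summary = {
    speaker: (non_native_count(NUCLEAR[speaker]),
              non_native_count(POST_NUCLEAR[speaker]))
    for speaker in NUCLEAR
}
for speaker, (nuclear, post_nuclear) in summary.items():
    print(speaker, nuclear, post_nuclear)
```

With these transcribed values, AE1 and AE2 have the same count in both positions (0 and 1, respectively), while AE3 and AE4 show more non-native-like patterns in post-nuclear position, as stated in the two bullet points.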

2.2.9. Discussion of Experiment 3

Possible learning strategy in L2 English–L1 Japanese

Results of Experiment 3 showed that in L1 English F0 and duration are manipulated independently in post-nuclear position: there is no F0 contrast but a significant duration contrast between stressed and unstressed syllables after contrastive focus. For L2 English, we found that advanced Japanese learners of English overall have a harder time producing a significant durational contrast in post-nuclear position than in nuclear position. In Section 2.2.2, we assumed the following learning strategy employed by native Japanese speakers in their production of English word stress:

Japanese learners of English first employ F0, an active correlate in their L1 Japanese accent system, in order to contrast lexically stressed and unstressed vowels in their L2 English production, and later learn to use duration, an inert correlate in their L1 system. By employing this strategy, advanced Japanese speakers of L2 English still use F0 as the major stress cue, and they lengthen vowels when they raise F0 (pitch) in the word stress context.

In order to test this assumption in a strict way, the suppression of the F0 contrast is a required condition. In this section, we will examine the interaction of F0 and duration patterns.

L1 English manipulation of F0 and duration

For L1 English, the manipulation of F0 and duration is summarized in Table 2-12, based on results of Experiments 1 and 3. In L1 English, in nuclear position, F0 and duration are significantly different in stressed and unstressed vowels. On the other hand, in post-nuclear position, the F0 contrast is lost, but a significant durational contrast is still preserved.

Table 2-12: Manipulation of F0 and duration in nuclear vs. post-nuclear positions

NUCLEAR (Exp. 1)        POST-NUCLEAR (Exp. 3)
F0      Duration        F0      Duration
√       √               no      √

√ = significant difference between stressed vs. unstressed means (stressed > unstressed; α = 0.01)
no = no significant difference

Distribution of F0 and duration contrasts in advanced L2 English

For advanced L2 English data, we also examined the distribution of F0 and duration contrasts in nuclear vs. post-nuclear position. We conducted Scheffé’s post-hoc tests for F0 and duration (α = 0.01) in order to test whether stressed and unstressed means were significantly distinguished for each word pair. For each position of the target word (nuclear or post-nuclear), we summarized the distribution of presence vs. absence of significant mean differences for both F0 and duration for each word pair for each speaker, then we pooled the 4 advanced Japanese speakers of L2 English. For each word position, there were 16 cases in total, since there were 4 advanced speakers of L2 English and 4 word pairs (4 speakers × 4 word pairs). Results are summarized in Tables 2-13 and 2-14a:

Table 2-13: F0 and duration (D.) contrasts in nuclear position for advanced L2 English

         native-like    non-native-like
         F0   D.        F0   D.     F0   D.     F0   D.
         √    √         no   √      no   no     √    no
cases    15             0           0           1

Table 2-14a: F0 and duration (D.) contrasts in post-nuclear position for advanced L2 English

         native-like    non-native-like
         F0   D.        F0   D.     F0   D.     F0   D.
         no   √         √    √      no   no     √    no
cases    9              2           5           0

√ = significant difference (stressed > unstressed; α = 0.01)
no = no significant difference

Difference between nuclear vs. post-nuclear position

The comparison of Tables 2-13 and 2-14a shows that there are 6 more instances of native-like manipulation of F0 and duration in nuclear than post-nuclear position. This indicates that advanced Japanese speakers of L2 English have more difficulty in controlling F0 and duration correlates in post-nuclear position. This confirms the prediction presented in Section 2.2.4.

Analysis of non-native-like patterns in post-nuclear position

Non-native patterns in post-nuclear position in Table 2-14a fall into 2 types:

Type 1: positive relation between the manipulation of F0 and duration

         F0   D.        F0   D.
         √    √         no   no

Type 2: negative relation between the manipulation of F0 and duration

         F0   D.
         √    no

Interestingly, there are more instances of Type 1 patterns than Type 2 patterns (7 vs. 0) in the post-nuclear data, as presented in Table 2-14b (the same data in Table 2-14a with emphasis on non-native-like patterns). This result brings support to the following hypothesis:

Japanese learners of English learn to activate duration with the help of F0 (i.e., lengthen vowel when F0 is high; keep vowel short when F0 is low) and only later learn to control duration and F0 independently.

Table 2-14b: F0 and duration contrasts in post-nuclear position for advanced L2 English

         native-like    non-native-like
         F0   D.        F0   D.     F0   D.     F0   D.
         no   √         √    √      no   no     √    no
cases    9              2           5           0
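The arithmetic behind the comparison of nuclear and post-nuclear positions and the Type 1 vs. Type 2 split can be checked with a few lines. This is a minimal sketch under the assumption that the pooled case counts reported in Tables 2-13 and 2-14a are taken as given; the variable names are hypothetical and introduced only for illustration.

```python
# Pooled cases from Tables 2-13 and 2-14a (4 advanced speakers x 4 word pairs).
# Keys are (F0 contrast present, duration contrast present).
NUCLEAR = {(True, True): 15, (False, True): 0, (False, False): 0, (True, False): 1}
POST_NUCLEAR = {(False, True): 9, (True, True): 2, (False, False): 5, (True, False): 0}

# Native-like targets: both contrasts in nuclear position,
# duration contrast only in post-nuclear position.
NATIVE_NUCLEAR = (True, True)
NATIVE_POST_NUCLEAR = (False, True)

# Difference in the number of native-like cases between the two positions.
native_gap = NUCLEAR[NATIVE_NUCLEAR] - POST_NUCLEAR[NATIVE_POST_NUCLEAR]

# Non-native-like post-nuclear cases, split by the relation between F0 and duration.
non_native = {k: n for k, n in POST_NUCLEAR.items() if k != NATIVE_POST_NUCLEAR}
type1 = sum(n for (f0, dur), n in non_native.items() if f0 == dur)       # positive relation
type2 = sum(n for (f0, dur), n in non_native.items() if f0 and not dur)  # negative relation

print(native_gap, type1, type2)
```

Run on the reported counts, this gives a gap of 6 native-like cases and 7 Type 1 vs. 0 Type 2 cases, the figures cited in the text.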

2.3. Summary of Experiments 1–3

In Experiments 1 and 2, we investigated how the manipulation of F0 and duration in the L1 system transfers to the production of L2 word accent in neutral declaratives. The findings of earlier studies show how F0 and duration are manipulated in word accent production of L1 English and L1 Japanese. F0 is actively used in both L1 English and L1 Japanese, while duration is actively used in L1 English, but not in L1 Japanese:

               F0     Duration
L1 English     YES    YES
L1 Japanese    YES    NO

These expected L1 patterns were overall supported by the results of Experiments 1 and 2. Our new finding is that in L1 English duration plays a more stable role in contrasting lexically stressed and unstressed vowels than F0, although both acoustic correlates are actively manipulated at least in neutral declaratives. Thus, the manipulation of F0 and duration observed in our L1 data can be summarized in the following way:

               F0           Duration
L1 English     YES    <<<   YES
L1 Japanese    YES          NO

Considering the L1 system, we originally expected that the active role of F0 in both L1 types would positively transfer to L2 production. The F0 analysis of our L2 data brought support to this prediction, showing that lexically accented vowels were significantly and consistently higher in F0 than unaccented vowels in both L2 English and L2 Japanese. For L2 duration patterns, we expected that what must be learned in the target language would be opposite in L2 Japanese and L2 English, since the role of duration is opposite in L1 English and L1 Japanese:

               F0     Duration
L1 English     YES    YES    (suppress in L2 Japanese)
L1 Japanese    YES    NO     (activate in L2 English)
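The learning tasks marked in the diagram above follow mechanically from comparing how each correlate is used in the learner's L1 and in the target language. The sketch below illustrates that reasoning only; the function name and the three task labels are assumptions made here, and the YES/NO values are those of the summary table for L1 English and L1 Japanese.

```python
# Active use of each acoustic correlate in L1 word accent production,
# as summarized in the text (True = YES, False = NO).
L1_USE = {
    "English": {"F0": True, "duration": True},
    "Japanese": {"F0": True, "duration": False},
}


def learning_task(l1, target, correlate):
    """Return what a learner must do with a correlate in the target language.

    Hypothetical labels: "transfer" (same status in both languages),
    "suppress" (active in L1 only), "activate" (active in target only).
    """
    used_in_l1 = L1_USE[l1][correlate]
    used_in_target = L1_USE[target][correlate]
    if used_in_l1 == used_in_target:
        return "transfer"
    return "suppress" if used_in_l1 else "activate"


for l1, target in (("English", "Japanese"), ("Japanese", "English")):
    for correlate in ("F0", "duration"):
        print(f"L1 {l1} -> L2 {target}: {correlate}: {learning_task(l1, target, correlate)}")
```

Under these assumptions F0 comes out as "transfer" in both directions, while duration must be suppressed by English speakers learning Japanese and activated by Japanese speakers learning English.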

In order to learn the duration patterns of their target language, native English speakers learning Japanese need to learn to suppress the duration contrast in their L2 Japanese, while native Japanese speakers learning English need to learn to activate duration in their L2 English. The analysis of L2 duration patterns in Experiments 1 and 2 presents an asymmetry between L2 English and L2 Japanese in terms of the relation between L2 proficiency and duration patterns. For L2 English, advanced Japanese learners reliably showed more native-like durational patterns than beginning learners; however, for L2 Japanese, there was no systematic difference in the manipulation of duration between advanced and beginning learners. This asymmetry in the development of duration manipulation in the two L2 types can possibly be explained by the different phonological status of duration in L1 English and L1 Japanese. This difference results in different sensitivity to the duration cue and also in a different ability to control duration consciously in L2 word accent production.

In Experiment 1, advanced Japanese speakers of L2 English were able to produce a significant durational contrast between stressed and unstressed vowels in addition to an F0 contrast, while beginning learners only produced the F0 contrast consistently. In order to explain this result, we suggested that advanced speakers of L2 English employ the following learning strategy: they start learning to lengthen vowels when they raise pitch in the nuclear-stress context of the target word. In order to test this assumption, in Experiment 3, we examined how Japanese speakers of L2 English would manipulate duration when the English target word was embedded in post-nuclear position, where the F0 contrast between stressed and unstressed syllables disappears but the durational contrast remains in L1 English. The comparison of advanced L2 English patterns in Experiments 1 and 3 and the detailed analysis of non-native-like patterns in Experiment 3 brought strong support to our hypothesis.

Three general points emerge from the results of Experiments 1–3. First, in the beginning stage of L2 development, L2 speakers tend to import L1 phonetic habits in L2 word accent production. However, they are also able to modify these L1 habits to simulate L2 patterns. Consider in this respect the case of F0 in the production of L2 Japanese accent by English speakers. F0 is an active correlate in both L1 English and L1 Japanese for word accent production, but the two languages are different in the sense that the F0 contrast is more robust and stable in L1 Japanese than in L1 English. The results of Experiment 2 showed that native English speakers, regardless of whether they are beginning or advanced learners of Japanese, produce a more robust and stable contrast of F0 in their L2 Japanese than in their L1 English. This result indicates that these learners not only are able to use F0 in Japanese accent production, since it is already active in their L1 production of word accent, but that they are able to modify the phonetic patterns present in their L1 (English) to produce the robust contrast of F0 in their L2 (Japanese).

Second, the asymmetry in the development of duration manipulation in the two L2 types suggests that we cannot predict L2 development patterns in a straightforward way on the basis of L1 phonetic habit. In order to explain the asymmetry, it is necessary to consider whether the manipulation of an acoustic correlate plays a distinctive role or not in the L1 system.

Finally, as a learning strategy, L2 speakers may use an acoustic correlate which is already active in the L1 system (e.g., F0 in L1 Japanese), in order to learn to control a correlate which is not active in L1 (e.g., duration in L1 Japanese).

Chapter 3: Vowel Contrast in L2 English and L2 Japanese

3.1. Vowel system of English and Japanese

The vowel system of L1 English (American English) is richer than the vowel system of L1 Japanese (Tokyo Japanese), as shown in Figure 3-1. American English has nine phonemic monophthongal vowels, i.e., [i, ɪ, ɛ, æ, ə/ʌ, u, ʊ, ɔ, ɑ] (see footnote 12), while Tokyo Japanese has five short vowels, i.e., [a, e, i, o, u] (Ladefoged 1993). In Japanese and most dialects of American English, these vowels are distinguished by height and backness.

Figure 3-1: Vowel system of English and Japanese

American English        Tokyo Japanese

i         u             i, ii       u, uu
ɪ         ʊ             e, ee       o, oo
    ə/ʌ
ɛ         ɔ
æ         ɑ                  a, aa

An important characteristic of the Japanese vowel system is that each short vowel has a long counterpart: [a, aa], [e, ee], [i, ii], [o, oo], [u, uu]. In other words, in Japanese short and long vowels are phonemically contrasted in every region of the vowel space defined on the basis of vowel height and backness (e.g., /biiru/ ‘beer’ vs. /biru/ ‘building’ in the high front region; /tooru/ ‘pass’ vs. /toru/ ‘catch’ in the mid back region).

12 The vowel inventory varies from variety to variety of English, and which vowel is classified as a monophthong or diphthong varies from study to study. In the present study, we adapted the vowel inventory (based on Midwestern American English) and classification of monophthongs and diphthongs presented in Ladefoged’s A Course in Phonetics (1993, pp. 80-84). Ladefoged grouped [i, ɪ, ɛ, æ, ə/ʌ, u, ʊ, ɔ, ɑ] as monophthongs, and [aɪ, aʊ, aɪ, aʊ, eɪ, oʊ, ju] as diphthongs.

While Japanese has a long-short vowel contrast in every spectral region of the vowel space, English has a vowel length contrast only in certain regions (e.g., meat /mit/ vs. mitt /mɪt/ in the high front region; pool /pul/ vs. pull /pʊl/ in the high back region; bat /bæt/ vs. bet /bɛt/ in the low front region). The role of vowel length differs in the two languages. In Japanese, vowel length is phonemic. On the other hand, in English vowel length is not phonemic but is one of the phonetic correlates of the tense-lax contrast (more peripheral vowels, such as [i], are classified as tense, whereas more central vowels, such as [ɪ], are classified as lax). Because of this difference, it is interesting to compare the long and short vowels in the production of L2 English by L1 Japanese speakers and L2 Japanese by L1 English speakers.

3.2. Characteristics of vowel length contrast in English vs. Japanese

In addition to the aforementioned difference in phonemic status, the English tense-lax contrast and the Japanese long-short contrast differ in three aspects that will be discussed in the following sections: appurtenance to prosodic unit, phonetic duration and vowel quality. The characteristics of each language with respect to these three aspects are summarized in Table 3-1.

Table 3-1: English and Japanese vowel contrasts

                              English (tense vs. lax)        Japanese (long vs. short)

prosodic unit                 SAME                           DIFFERENT
                              tense: monosyllabic            long: bimoraic (V.V)
                              lax: monosyllabic              short: monomoraic (V)

phonetic duration contrast    YES                            YES
                              [I] : [i] = 1 : 1.3            [i] : [ii] = 1 : 2~3

vowel quality contrast        YES                            YES?
                              tense: more peripheral         long: more peripheral?

3.2.1. Prosodic unit

As mentioned in Chapter 1, in the case of word segmentation, native speakers of English use the syllable as the minimal segmentation unit, while native speakers of Japanese use the mora. English speakers segment both tense and lax vowels into single sublexical prosodic units (= syllables). For example, both seat /sit/ and sit /sIt/ are treated as monosyllabic words by English speakers. Similarly, Japanese long vowels (but not short vowels) can be segmented by Japanese speakers into single syllables. However, the crucial difference between the two languages is that Japanese speakers further break long vowels into two subsyllabic units (= moras). For example, the word to-o-ru ‘pass’ contains three moras, but to-ru ‘catch’ contains only two moras. See Chapter 5 for further discussion of prosodic units in the speech segmentation of English and Japanese.

3.2.2. Phonetic duration contrast

English tense vs. lax vowels have different phonetic duration, although they do not contrast phonemically (no phonemic length distinction). I computed the durational ratio of /i/ to /I/ based on the duration data of male speech in Peterson & Lehiste (1960) and Hillenbrand et al. (1995). The results based on the data of the two studies are consistent: the high front tense vowel /i/ is about 30% longer in duration than the corresponding lax /I/.

The phonemic length contrast of Japanese vowels is acoustically realized with a greater durational ratio of long to short vowels than the English tense vs. lax contrast. In word-medial position, long vowels are 2.5 to 3 times longer than the corresponding short vowels (Han 1962). Thus, English and Japanese are phonetically similar in the sense that some vowels are characterized by a difference in duration, but this difference in phonetic duration is much greater in Japanese.

3.2.3. Vowel quality

Vowel quality plays an essential role in contrasting English tense and lax vowels. Tense vowels are systematically more peripheral in the vowel space than the corresponding lax vowels (e.g., /i/ is more front and higher than /I/) (see Hillenbrand et al. 1995 for data on contemporary American vowels). The study of Nishi et al. (1998) shows that there is a difference between English and Japanese in terms of the role of vowel quality in the production of tense-lax/long-short vowels. Nishi et al. measured the frequency of the first and second formants (F1 and F2, henceforth) of English tense and lax vowels, and they conducted a similar formant analysis for the Japanese long vs. short contrast. Their results showed that the spectral overlap between Japanese long vs. short vowels (e.g., /ii/ vs. /i/) is significantly larger than the one between English tense and lax vowels. This indicates that in Japanese, vowel quality plays a weaker role in acoustically separating long and short vowels than in English, where tense vs. lax vowels are well differentiated by vowel quality.

3.2.4. Duration vs. vowel quality in the production of vowel contrasts

The aforementioned review of differences between L1 English and L1 Japanese shows an interesting balance of the roles of two physical correlates, duration and vowel quality, in the vowel system of each language. Duration or length plays a weaker role in contrasting English tense and lax vowels than in contrasting Japanese long and short vowels. However, in the production of the English tense-lax contrast, the weaker role of duration is compensated for by the greater role of vowel quality. We find the reverse relation in the production of the Japanese long-short contrast, where duration plays a greater role than vowel quality. This difference between L1 English and L1 Japanese, together with the difference in the phonemic status of vowel length, makes it interesting to compare the production of L2 English by L1 Japanese speakers with the production of L2 Japanese by L1 English speakers.

3.3. Problems

3.3.1. L2 Japanese–L1 English

It is a well-known fact among Japanese language instructors that the phonemic length contrast of Japanese vowels is very difficult for native English speakers to learn in both production and perception in the initial stage of their L2 Japanese development. Japanese short vowels produced by native English speakers tend to be perceived as the corresponding long vowels by native Japanese speakers. Thus, for example, obasan ‘aunt’ in the production of Japanese by English speakers is often misunderstood as obaasan ‘grandmother’, and a similar confusion occurs in the contrast between shujin and shuujin:

obasan ‘aunt’        vs.   obaasan ‘grandmother’
shujin ‘husband’     vs.   shuujin ‘prisoner’

At this point, it is difficult to explain which Japanese vowel category is substituted with which English vowel category in the production of these minimal pairs. Do English speakers produce shuujin with their English tense /u/ and shujin with their lax /U/? If not, do they rather produce both Japanese long and short vowels in the same spectral region and differentiate them by duration? In any case, we believe that the observed durational ambiguity between Japanese long and short vowels in L2 Japanese–L1 English is the product of negative transfer from L1 English, which has no phonemic length.

An additional characteristic of English which may affect the production of Japanese vowels by English speakers is the dominant role of vowel quality in perceiving the English tense vs. lax contrast. Bohn and Flege (1990) examined the perception of the tense-lax contrast (/i/ vs. /I/ and /E/ vs. /æ/) by native English speakers in order to assess the relative salience of duration and vowel quality. Their results showed that native English speakers use vowel quality as the main cue to the vowel contrast and are not very sensitive to durational differences. This suggests that native English speakers are likely to have difficulties in perceiving the contrast of Japanese long vs. short vowels, which are significantly different only in duration, but not in vowel quality. The relatively dominant role of vowel quality (and the weaker status of duration) in the perception of the English vowel contrast is then expected to affect not only the perception but also the production of Japanese vowels by native English speakers.

3.3.2. L2 English–L1 Japanese

The prominent durational contrast between Japanese long and short vowels is likely to transfer to L2 English production, as shown in the following example:

mitt   vs.   meat
<->          <--->
1      :     2~2.5     (predicted vowel duration in inexperienced L2 English–L1 Japanese)

If beginning Japanese learners of English pronounce this pair, they are likely to directly import the durational characteristics of the Japanese vowel contrast, i.e., the Japanese durational ratio of long to short vowels (approximately 2~2.5).

Sugito (1982b) compared the durations of English /i/ and /I/ produced by native English speakers with those produced by native Japanese speakers. The results showed that the duration ratio of /i/ to /I/ is significantly larger in L2 English produced by Japanese speakers than in L1 English. The absence of a significant quality difference between Japanese long and short vowels should also transfer, as shown in the following example:

mitt   vs.   meat
[i]          [ii]      (predicted vowel quality in inexperienced L2 English–L1 Japanese)

The example shows the expected pattern of vowel quality in inexperienced L2 English produced by Japanese speakers. It is expected that the two vowel categories are differentiated by a significant difference in duration, but not in vowel quality, due to the transfer of the absence of a significant quality difference between Japanese short and long vowels. The transfer of spectral features of L1 Japanese in the production of the English /i/–/I/ contrast was observed by Sugito (1982b) in the study mentioned earlier.

3.4. Goal

Two experiments were conducted in order to investigate the effect of L1 phonetic habits on the realization of the vowel contrast in L2 English–L1 Japanese and L2 Japanese–L1 English. Experiments 4 and 5 examined L2 English produced by native speakers of Tokyo Japanese and L2 Japanese produced by native speakers of American English, respectively, in terms of the patterns of duration and vowel quality. In each experiment, L1, experienced L2 and inexperienced L2 speakers were compared in order to see possible developmental patterns.

Research questions

The comparison of the phonetic features of the vowel contrast in the two L1 types in Section 3.3 leads us to the following research questions:

Experiment 4 (L2 English–L1 Japanese)

• Do native Japanese speakers learn to weaken their prominent duration contrast in the production of English tense and lax vowels?

• Do they learn to produce a native-like significant contrast of vowel quality?

                  L1 Japanese             L2 English               change required in L2 English
Duration ratio    long : short = 2 : 1    tense : lax = 1.3 : 1    weaken contrast
Vowel quality     long ~ short            tense ≠ lax              enlarge contrast

Experiment 5 (L2 Japanese–L1 English)

• Do native English speakers learn to produce a native-like large duration contrast between Japanese short and long vowels?

• Do they learn to avoid producing a significant difference in vowel quality in the production of Japanese short and long vowels?

                  L1 English               L2 Japanese             change required in L2 Japanese
Duration ratio    tense : lax = 1.3 : 1    long : short = 2 : 1    enlarge contrast
Vowel quality     tense ≠ lax              long ~ short            weaken contrast

In order to answer these questions, we assessed the effect of vowel type (tense vs. lax for English and long vs. short for Japanese) on duration and vowel quality.

3.5. Method

3.5.1. Speech materials

In Experiment 4, four minimal pairs of English high front tense /i/ vs. lax /I/ were used: bead–bid, deep–dip, keen–kin and Pete–pit. The target word was presented in a frame sentence:

I said ___ next.

In Experiment 5, three minimal pairs of Japanese short vs. long vowels (/i/ vs. /ii/, /a/ vs. /aa/ and /o/ vs. /oo/) were used, with the target vowel in the first syllable (all words have a pitch accent on the first syllable):

biru ‘building’    biiru ‘beer’
kado ‘corner’      kaado ‘card’
toru ‘take’        tooru ‘pass’

The target word was presented in a frame sentence:

sosite ___ to iimasu. (‘Next I said ___.’)

3.5.2. Subjects

Four speakers of L1 English and three advanced and three beginning Japanese speakers of L2 English participated in Experiment 4 (English). All these speakers also participated in Experiments 1 and 3. Three speakers of L1 Japanese and three advanced and four beginning English speakers of L2 Japanese participated in Experiment 5 (Japanese). All these speakers also participated in Experiment 2. Refer to Section 2.1.5 for information on the language backgrounds of the L2 speakers. BE2, BE3, BE4, AE1, AE3 and AE2 in Experiments 1 and 3 correspond to BE1, BE2, BE3, AE1, AE2 and AE3 in Experiment 4, respectively. The seven speakers of L2 Japanese participated in both Experiments 2 and 5 and were coded in the same way.

3.5.3. Procedure

Recording

For each experiment, sentences with target words were mixed with foil sentences. Sentences in each reading of the list were pseudo-randomized in different orders. In the recording session, PsyScope was used to present sentences. One sentence was displayed on the computer screen at a time.

The subjects were given sufficient time to practice the speech materials. They were asked to read the sentences without hesitations or pauses in the middle. They read the sentence list 10 times. The first reading was not analyzed. Data were recorded in the recording booth of the UCLA phonetics lab for the L1 English, advanced L2 English and L2 Japanese groups, and in the recording room of Meiji Gakuin University Information Center in Tokyo for the L1 Japanese and beginning L2 English groups.

Measurements

The collected data were digitized with Kay Elemetrics’s CSL at a 10 kHz sampling rate. Scicon’s PitchWorks was used to measure duration and frequencies of the first and second formants (F1 and F2, henceforth). Frequencies of F1 and F2 are the physical correlates of vowel height and backness, respectively. Tokens were not analyzed if:

• there were hesitations or pauses in the middle of the sentence

• words were mispronounced

Duration of the first vowel of each target word was measured on waveforms and wide-band spectrograms. Formant frequencies were measured at the center of the first syllable, using LPC analysis.

Statistical Analysis

Obtained values of duration and formant frequencies (F1 and F2) were analyzed, using two-factor ANOVA and Scheffe’s post-hoc tests. The independent variables in the two-factor ANOVAs were vowel type and word pair type. The focus of Experiments 4 and 5 is on the effect of vowel type on duration and vowel quality (tense vs. lax conditions in Experiment 4, and long vs. short conditions in Experiment 5). The effect of word pair type was included in the ANOVAs in order to control for the variance generated by this factor.
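As an illustration of the two-factor design described above, the following Python sketch computes the F ratios for vowel type, word pair type and their interaction. It is not the software used in the present study: the function name two_way_anova, the data layout and the duration values are hypothetical, every vowel type x word pair cell is assumed to contain the same number of tokens (at least two), and p-values and Scheffe’s post-hoc tests are left out.

```python
from collections import defaultdict
from statistics import mean


def two_way_anova(observations):
    """Balanced two-factor ANOVA with replication.

    `observations` is a list of (vowel_type, word_pair, value) tuples.
    Returns {effect: (df_effect, df_error, F)}; p-values are not computed.
    """
    cells = defaultdict(list)
    for vowel, pair, value in observations:
        cells[(vowel, pair)].append(value)

    vowels = sorted({v for v, _ in cells})
    pairs = sorted({p for _, p in cells})
    n = len(next(iter(cells.values())))  # tokens per cell
    balanced = all(len(tokens) == n for tokens in cells.values())
    if len(cells) != len(vowels) * len(pairs) or not balanced or n < 2:
        raise ValueError("design must be fully crossed and balanced, n >= 2")

    grand = mean(value for _, _, value in observations)
    cell_mean = {key: mean(tokens) for key, tokens in cells.items()}
    vowel_mean = {v: mean(cell_mean[(v, p)] for p in pairs) for v in vowels}
    pair_mean = {p: mean(cell_mean[(v, p)] for v in vowels) for p in pairs}

    # Sums of squares for the two main effects, the interaction and error.
    ss_vowel = len(pairs) * n * sum((m - grand) ** 2 for m in vowel_mean.values())
    ss_pair = len(vowels) * n * sum((m - grand) ** 2 for m in pair_mean.values())
    ss_cells = n * sum((m - grand) ** 2 for m in cell_mean.values())
    ss_inter = ss_cells - ss_vowel - ss_pair
    ss_error = sum(
        (x - cell_mean[key]) ** 2 for key, tokens in cells.items() for x in tokens
    )

    df_vowel = len(vowels) - 1
    df_pair = len(pairs) - 1
    df_inter = df_vowel * df_pair
    df_error = len(vowels) * len(pairs) * (n - 1)
    ms_error = ss_error / df_error

    return {
        "vowel": (df_vowel, df_error, ss_vowel / df_vowel / ms_error),
        "word pair": (df_pair, df_error, ss_pair / df_pair / ms_error),
        "vowel*word pair": (df_inter, df_error, ss_inter / df_inter / ms_error),
    }


# Hypothetical durations (ms): 2 vowel types x 2 word pairs x 2 repetitions.
demo = [
    ("tense", "bVd", 120), ("tense", "bVd", 130),
    ("lax", "bVd", 90), ("lax", "bVd", 100),
    ("tense", "dVp", 100), ("tense", "dVp", 110),
    ("lax", "dVp", 70), ("lax", "dVp", 80),
]
result = two_way_anova(demo)
```

With eight word-pair cells (two vowel types by four word pairs) and nine analyzed readings per cell, the error term would have 2 x 4 x (9 - 1) = 64 degrees of freedom; smaller values arise when tokens are excluded, in which case the balanced-design assumption of this sketch no longer holds.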

3.6. Results of Experiment 4 (English)

3.6.1. Duration

L1 English pattern

All four L1 English speakers showed longer duration means for tense than lax vowels. A representative pattern is shown in Figure 3-2, in which the means and standard deviations of tense /i/ vs. lax /I/ in NE1’s production are compared for each word pair.

Figure 3-2: Duration means and standard deviations of tense /i/ and lax /I/ for Speaker NE1 (L1 English)

[Bar chart not reproduced. Speaker NE1; y-axis: mean duration (ms), 0 to 150; x-axis: word pair type (bVd = bead vs. bid, dVp = deep vs. dip, kVn = keen vs. kin, pVt = Pete vs. pit); bars: tense vs. lax; * = significantly different at α = 0.01, marked for all four word pairs.]

The results of a series of ANOVAs showed that the effect of vowel type (tense vs. lax vowels) on vowel duration was significant for the data of every L1 English speaker (p < 0.0001). None of the four L1 English speakers showed a significant interaction effect between vowel type and word pair type. See Table 3-2 for ANOVA results for each speaker (shaded cells indicate no significant effect). Also, according to the results of a series of Scheffe’s post-hoc tests (α = 0.01), the difference in duration means in the tense vs. lax vowels was statistically significant for all four tested word pairs for all four L1 English speakers. In Figure 3-2, word pairs showing significant differences are marked by an asterisk (α = 0.01).

Table 3-2: ANOVA results for duration data of L1 English in Experiment 4 (α = 0.01)

       vowel                   word pair              vowel*word pair

NE1    F(1, 51) = 52.53        F(3, 61) = 28          F(3, 61) = .76
       p < .0001               p < .0001              p = .5241

NE2    F(1, 61) = 151.89       F(1, 61) = 69.22       F(1, 61) = 1.15
       p < .0001               p < .0001              p = .3374

NE3    F(1, 64) = 89.51        F(1, 64) = 56.49       F(1, 61) = .577
       p < .0001               p < .0001              p = .6351

NE4    F(1, 64) = 105.13       F(1, 64) = 46.84       F(1, 64) = 2.5
       p < .0001               p < .0001              p = .0676

Additionally, duration ratios of tense to lax vowels were computed for each speaker by dividing the duration value of each tense token by that of the corresponding lax token for each repetition of each word pair type. The four L1 English speakers showed a similar pattern: the duration ratios of tense to lax vowels are about 1.3 (i.e., the tense /i/ is about 30% longer than the corresponding lax /I/). A representative pattern is shown in Figure 3-3.

Figure 3-3: Average durational ratio of English tense/lax vowels for Speaker NE1 (L1 English)

[Bar chart not reproduced. Speaker NE1; y-axis: duration ratio, 0 to 2; x-axis: word pair type (bVd = bead vs. bid, dVp = deep vs. dip, kVn = keen vs. kin, pVt = Pete vs. pit).]

L2 English patterns

Duration ratios of English tense/lax vowels for L2 English data are presented in Figures 3-4 and 3-5.

Figure 3-4: Mean and standard deviation of duration ratio of English tense/lax vowels for BE1, BE3 and AE1 (L2 English)

[Bar charts not reproduced. Panels: BE1, BE3, AE1; y-axis: duration ratio (tense/lax), 0 to 3; x-axis: word pair type (bVd = bead vs. bid, dVp = deep vs. dip, kVn = keen vs. kin, pVt = Pete vs. pit).]

Figure 3-5: Mean and standard deviation of duration ratio of English tense/lax vowels for AE2, AE3 and BE2 (L2 English)

[Bar charts not reproduced. Panels: AE2 (with a value label of 1.1), AE3 (with a value label of 1.0), BE2; y-axis: duration ratio (tense/lax), 0 to 3; x-axis: word pair type (bVd = bead vs. bid, dVp = deep vs. dip, kVn = keen vs. kin, pVt = Pete vs. pit).]

Three types of ratio patterns were observed in the data of L2 English produced by Japanese speakers.

• Two beginning L2 speakers (BE1 and BE3) and one advanced L2 speaker (AE1) showed durational ratios around or above 2.0, which is similar to the durational ratio of Japanese long to short vowels (2.0~2.5, according to Han 1962), as shown in Figure 3-4.

• Two advanced L2 speakers (AE2 and AE3) approximated L1-English-like ratios for some word pairs, but they showed overcompensation (i.e., no significant durational difference between tense vs. lax) for one pair, as shown in Figure 3-5.

• An L1-English-like pattern was also produced by one beginning L2 English speaker (BE2), as shown in Figure 3-5.

Additionally, for each speaker, obtained ratio values were pooled across all word pairs and repetitions, and the mean ratio and standard deviation were computed. Results are shown in Figure 3-6. The horizontal lines in the figure delimit the range of the average ratios for the four L1 English speakers. As this figure shows, the duration ratios of the four L1 English speakers are clustered around 1.3. This result is consistent with the ratio of tense /i/ to lax /I/, which we computed from the results of Peterson & Lehiste (1960) and Hillenbrand et al. (1995) in Section 3.2.2. AE1, BE1 and BE3 show average ratios larger than 2.0, similar to Japanese-like ratios, as also shown in Figure 3-4. AE2 and AE3 approximate a native-like ratio (about 1.3), but remember that they also overcompensate the duration contrast in some contexts, as shown in Figure 3-5.
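The ratio computation and pooling described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name pooled_duration_ratio, the dictionary layout and the duration values are hypothetical, and tense and lax tokens are assumed to be paired by repetition, so a word pair with an excluded token on one side is rejected rather than silently mismatched.

```python
from statistics import mean, stdev


def pooled_duration_ratio(tense_ms, lax_ms):
    """Return (mean, SD) of token-wise tense/lax duration ratios.

    `tense_ms` and `lax_ms` map each word pair type to a list of vowel
    durations in ms ordered by repetition, so that index k in both lists
    comes from the same reading of the sentence list.
    """
    ratios = []
    for pair, tense_tokens in tense_ms.items():
        lax_tokens = lax_ms[pair]
        if len(tense_tokens) != len(lax_tokens):
            raise ValueError(f"unpaired tokens for word pair {pair!r}")
        # One ratio per repetition: tense duration / lax duration.
        ratios.extend(t / l for t, l in zip(tense_tokens, lax_tokens))
    # Pool across all word pairs and repetitions.
    return mean(ratios), stdev(ratios)


# Hypothetical durations (ms) for one speaker, two repetitions per word.
tense = {"bVd": [120, 130], "dVp": [140, 130]}
lax = {"bVd": [100, 100], "dVp": [100, 100]}
ratio_mean, ratio_sd = pooled_duration_ratio(tense, lax)
```

A pooled mean near 1.3 would correspond to the L1 English pattern reported here, and a mean around or above 2.0 to the Japanese-like pattern.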

Figure 3-6: Average duration ratios of English tense/lax vowels

[Bar chart not reproduced. y-axis: duration ratio, 0 to 3; x-axis: speakers NE1, NE2, NE3, NE4 (L1 English) and AE1, AE2, AE3, BE1, BE2, BE3 (L2 English); horizontal lines mark the L1 English range.]

These results show two points regarding the effect of the transfer of L1 Japanese durational characteristics. First, we found the expected effect of negative transfer from L1 Japanese, in that three L2 English speakers (AE1, BE1 and BE3) showed Japanese-like durational ratios. Second, it is possible to learn to reduce the Japanese-like duration contrast, as shown by the fact that AE2, AE3 and BE2 successfully produced native-English-like ratios.

There is no systematic correlation between L2 proficiency and the duration ratio of L2 English tense /i/ to lax /I/, given that one advanced L2 English speaker (AE1) showed a Japanese-like ratio while one beginning speaker (BE2) showed a native-like ratio.

3.6.2. Vowel quality

In order to analyze vowel quality, we measured frequencies for F1 and F2. Obtained values were plotted, using UCLA Phonetics Lab’s Plot Formants software. In each speaker’s plot, the mean and two standard deviations of F1 and F2 frequencies are shown by the position of the phonetic symbol of each vowel category and an ellipse circling the symbol, respectively.

L1 English patterns

Results of the four L1 English speakers are presented in Figure 3-7. The four L1 English speakers showed very consistent patterns: tense /i/ is higher and more front than lax /I/. Also, there was no ellipse overlap between the two vowel types, indicating that in L1 English tense /i/ and lax /I/ are significantly distinguished by vowel quality. This confirms the findings of previous studies.

Figure 3-7: /i/ and /I/ in the vowel space of L1 English speakers

[Formant plots not reproduced. Panels: NE1, NE2, NE3, NE4; axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = tense; [I] = lax.]

L2 English patterns

As opposed to the L1 English speakers, the three beginning Japanese speakers of L2 English showed no clear separation of tense /i/ and lax /I/, with a large overlap of the ellipses of the two categories, as shown in Figure 3-8.

Figure 3-8: /i/ and /I/ in the vowel space of beginning L2 English speakers

[Formant plots not reproduced. Panels: BE1, BE2, BE3; axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = tense; [I] = lax.]

The three advanced speakers of L2 English showed a clear separation between the means of tense /i/ and lax /I/, with still some overlap of ellipses, as shown in Figure 3-9. This indicates that the advanced speakers of L2 English could distinguish the two vowel categories in terms of vowel quality, but not as consistently as L1 English speakers. Notice that the pattern of advanced L2 English is somewhere between L1 English and beginning L2 English patterns.

Figure 3-9: /i/ and /I/ in the vowel space of advanced L2 English speakers

[Formant plots not reproduced. Panels: AE1, AE2, AE3; axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = tense; [I] = lax.]

Euclidean distance data

The formant plots of individual speakers show three distinctive patterns, corresponding to the three speaker groups in terms of spectral separation between tense /i/ and lax /I/. In order to quantify degrees of spectral separation, I computed the Euclidean distance between tense and lax tokens of each minimal pair for each repetition of each speaker. The computation method is schematized in Figure 3-10. First, the F1 and F2 differences between the tense token T and the lax token L, i.e., the F1 and F2 distances between the two tokens, were computed (a and b in Figure 3-10); then the Euclidean distance between the tense and lax tokens (c in Figure 3-10) was computed for each repetition of each minimal pair (e.g., keen for /i/ vs. kin for /I/). For each speaker, the mean and standard deviation of the Euclidean distance were computed.

Figure 3-10: Euclidean distance (c) between the tense /i/ token T and the lax /I/ token L

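[Schematic not reproduced. The tense token T and the lax token L are plotted in the F1 frequency (Hz) x F2 frequency (Hz) plane; a and b are the F1 and F2 distances between the two tokens, and c is the straight-line distance between them.]

a = F1 of T - F1 of L
b = F2 of T - F2 of L
$c = \sqrt{a^{2} + b^{2}}$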
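The distance computation in Figure 3-10 can be written as a short Python function. This is a sketch only: the function name formant_distance and the formant values in the example are hypothetical and are not measurements from the present study.

```python
import math


def formant_distance(tense_token, lax_token):
    """Euclidean distance (Hz) between two tokens given as (F1, F2) in Hz."""
    a = tense_token[0] - lax_token[0]  # F1 of T - F1 of L
    b = tense_token[1] - lax_token[1]  # F2 of T - F2 of L
    return math.hypot(a, b)            # c = sqrt(a^2 + b^2)


# Hypothetical tokens of one minimal pair (e.g., keen vs. kin), one repetition.
T = (300.0, 2300.0)  # tense /i/: F1, F2 in Hz
L = (600.0, 1900.0)  # lax /I/: F1, F2 in Hz
c = formant_distance(T, L)
```

Because a and b are squared, the sign of each difference does not affect c; the distance treats F1 and F2 differences in Hz as equally weighted.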

The results are summarized in Figure 3-11. In this figure, greater distances mean that /i/ and /I/ are further separated from each other, indicating that the two vowel categories are differentiated by vowel quality to a larger extent. The three speaker groups are significantly differentiated by the magnitudes of Euclidean distances: L1 English > advanced L2 English > beginning L2 English. This indicates that more experienced Japanese speakers of L2 English could approximate native-English-like spectral separation between tense /i/ and lax /I/ more reliably, i.e., there is a positive correlation between L2 English proficiency and the vowel quality contrast between tense /i/ and lax /I/.

The results show two points regarding the effect of transfer. First, we found the expected effect of negative transfer from L1 Japanese in that beginning Japanese learners of English showed no spectral separation between English tense /i/ and lax /I/. Second, there is a positive effect of learning, given that advanced Japanese learners showed more native-like separation between the two vowel categories (even though their spectral separation is not as stable as the separation produced by L1 English speakers).

Figure 3-11: Euclidean distance between English tense /i/ and lax /I/

[Bar chart not reproduced. y-axis: Euclidean distance mean, 0 to 800; x-axis: speakers NE1, NE2, NE3, NE4 (L1 English), AE1, AE2, AE3 (advanced L2 English) and BE1, BE2, BE3 (beginning L2 English); the L1 English range and the L1 Japanese range are marked.]

3.7. Results of Experiment 5 (Japanese)

3.7.1. Duration

L1 Japanese pattern

All three L1 Japanese speakers showed longer duration means for long vowels. A representative pattern is shown in Figure 3-12, where the means of long and short vowels in NJ1’s production are compared for each word pair. The results of two-way ANOVAs showed that the effect of the vowel type (long vs. short vowels) on vowel duration was significant for the data of every L1 Japanese speaker (p < 0.0001). None of the three L1 Japanese speakers showed any significant interaction effects of the vowel type and the word pair type. See Table 3-3 for ANOVA results for each speaker (shaded cells indicate no significant effect). Also, according to the results of a series of Scheffe’s post-hoc tests (α = 0.01), the difference in duration means in the long vs. short condition was statistically significant for all three word pairs for all three L1 Japanese speakers. In Figure 3-12, word pairs showing significant differences are marked by an asterisk (α = 0.01).

Table 3-3: ANOVA results for duration data of L1 Japanese in Experiment 5 (α = 0.01)

       vowel                   word pair              vowel*word pair

NJ1    F(1, 43) = 689.19       F(1, 43) = 21.45       F(1, 43) = .35
       p < .0001               p < .0001              p = .7070

NJ2    F(1, 45) = 805.79       F(1, 45) = 15.04       F(1, 45) = 3.76
       p < .0001               p < .0001              p = .0309

NJ3    F(1, 48) = 1976.1       F(1, 48) = 19.82       F(1, 48) = 4.5
       p < .0001               p < .0001              p = .0156

Figure 3-12: Duration means and standard deviations of short and long vowels for Speaker NJ1 (L1 Japanese)

[Bar chart not reproduced. Speaker NJ1; y-axis: mean duration (ms), 0 to 300; x-axis: word pair (biru vs. biiru, kado vs. kaado, toru vs. tooru); bars: short vs. long; * = significantly different at α = 0.01, marked for all three word pairs.]

Additionally, duration ratios of long to short vowels were computed for each speaker’s data by dividing the duration value of each long token by that of the corresponding short token for each repetition of each word pair type. The three L1 Japanese speakers showed a similar pattern: the duration ratios of long to short vowels are about 2.0 to 2.5 (i.e., long vowels are 2.0 to 2.5 times as long as the corresponding short vowels in L1 Japanese production). A representative pattern is shown in Figure 3-13.

Figure 3-13: Duration ratios of Japanese long/short vowels for Speaker NJ1 (L1 Japanese)

[Bar chart not reproduced. Speaker NJ1; y-axis: duration ratio, 0 to 3; x-axis: word pair type (biru, kado, toru).]

L2 Japanese patterns

Overall, the seven L2 Japanese speakers also produced Japanese long vowels with significantly longer duration than the corresponding short vowels for all 3 word pairs. This pattern is observed in Figure 3-14, where the average duration ratios of Japanese long to short vowels are shown for all participants. The horizontal lines in the figure delimit the range of duration ratios produced by the three L1 Japanese speakers (approximately 2.0 to 2.3). A ratio of about 1.0 indicates that there is no durational difference between long and short vowels. This figure shows that the durational ratio of every L2 Japanese speaker is greater than 1.5, which means that long vowels were at least 1.5 times longer than the corresponding short vowels in L2 Japanese. This indicates that all seven L2 Japanese speakers showed duration ratios of Japanese long/short vowels larger than their L1 ratio of English tense/lax vowels (approximately 1.3).

Figure 3-14: Average duration ratios of Japanese long/short vowels

[Bar chart not reproduced. y-axis: duration ratio, 0 to 4; x-axis: speakers NJ1, NJ2, NJ3 (L1 Japanese) and AJ1, AJ2, AJ3, BJ1, BJ2, BJ3, BJ4 (L2 Japanese); horizontal lines mark the L1 Japanese range.]

By comparing L1 and L2 ratios, we can observe patterns in L2 Japanese:

• AJ1 and BJ1 showed durational ratios within the L1 Japanese range, i.e., the two speakers approximated the L1 Japanese pattern. BJ4 showed a ratio closer to the L1 Japanese range with a large standard deviation, which indicates that his approximation of the L1 Japanese pattern was not stable.

• BJ3 showed a duration ratio which was smaller than L1 Japanese ratios but still larger than L1 English ratios.

• Two advanced speakers (AJ2, AJ3) and one beginning speaker (BJ2) of L2 Japanese showed ratios much greater than the range of L1 Japanese ratios. This means that these three speakers exaggerated the durational contrast between Japanese long and short vowels (overcompensation effect).
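The three-way grouping above can be expressed as a simple comparison of each speaker’s average ratio with the L1 reference values. The Python sketch below is only illustrative: the function name classify_ratio and the category labels are hypothetical, the default bounds are the approximate values reported in this section (an L1 Japanese range of about 2.0 to 2.3 and an L1 English tense/lax ratio of about 1.3), and the variability of each speaker’s ratios is ignored.

```python
def classify_ratio(ratio, l1_japanese_range=(2.0, 2.3), l1_english_ratio=1.3):
    """Place a long/short duration ratio relative to the L1 reference values."""
    low, high = l1_japanese_range
    if ratio > high:
        return "exaggerated"        # larger than the L1 Japanese range
    if ratio >= low:
        return "L1-Japanese-like"   # within the L1 Japanese range
    if ratio > l1_english_ratio:
        return "intermediate"       # between L1 English and L1 Japanese
    return "L1-English-like"        # no larger than the L1 English ratio


# Hypothetical average ratios, not the values measured in Experiment 5.
labels = [classify_ratio(r) for r in (2.1, 1.7, 3.0)]
```

A fixed cut-off of this kind cannot capture a case such as BJ4, whose mean is close to the L1 Japanese range but whose standard deviation is large.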

These results show two points regarding the effect of the transfer of the durational characteristics of L1 English vowels. First, we did not find the expected effect of negative transfer from L1 English, given that all L2 Japanese speakers were able to produce the durational contrast between Japanese long and short vowels greater than their L1 durational contrast between English tense and lax vowels (i.e., about 1.3). However, some speakers exaggerated the contrast. Second, we found no major difference between advanced and beginning L2 Japanese.

3.7.2. Vowel quality

L1 Japanese pattern

The formant frequencies of Japanese long /ii/ and short /i/ for all three L1 Japanese speakers are shown in Figure 3-15. The results showed a consistent L1 Japanese pattern: the means of long /ii/ and short /i/ in L1 Japanese were close to each other with no ellipse separation. This is very different from the L1 English pattern, which, as we saw in Experiment 4, is characterized by a clear spectral separation between tense /i/ and lax /I/. This difference between the two languages clearly emerges from the comparison of the formant plots of NJ1 and NE1 in Figure 3-16 (both speakers are males). The absence of a clear separation between long /ii/ and short /i/ in NJ1’s vowel space indicates that vowel quality does not play any role in phonetically distinguishing the two vowel categories (at least in the high front region), unlike the case of L1 English.

Another important characteristic of L1 Japanese is illustrated by the comparison of NJ1’s L1 Japanese and NE1’s L1 English patterns in Figure 3-16. Both Japanese long /ii/ and short /i/ in L1 Japanese production are located in the spectral region of English tense /i/. This means that the vowel quality of Japanese /ii/ and /i/ is similar to the quality of English tense /i/.

Figure 3-15: /i/ and /ii/ in the vowel space of L1 Japanese speakers

[Formant plots not reproduced. Panels: NJ1, NJ2, NJ3; axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = long; [I] = short.]

Figure 3-16: Spectral contrast in L1 Japanese vs. L1 English in the high front region

[Formant plots not reproduced. Panels: NJ1 (Japanese; [i] = long; [I] = short) and NE1 (English; [i] = tense; [I] = lax); axes: F1 frequency (Hz) and F2 frequency (Hz).]

L2 Japanese patterns

The formant frequencies of Japanese long /ii/ and short /i/ were plotted for advanced and beginning speakers of L2 Japanese in Figures 3-17 and 3-18, respectively. In Figure 3-17, a pattern representative of L1 English is also shown.

Figure 3-17: /i/ and /ii/ in the vowel space of AJ1, AJ2 and AJ3 (advanced L2 Japanese) and NE1 (L1 English)

[Formant plots not reproduced. Panels: AJ1, AJ2, AJ3 ([i] = long; [I] = short) and, for comparison, NE1 (L1 English; [i] = tense; [I] = lax); axes: F1 frequency (Hz) and F2 frequency (Hz).]

In Figure 3-17, we find a consistent pattern across the three advanced speakers of L2 Japanese: long /ii/ and short /i/ are tightly clustered in the spectral region of L1 English tense /i/. This indicates that there is no significant difference between long /ii/ and short /i/ in vowel quality in the production of advanced L2 Japanese. The comparison of advanced L2 Japanese (AJ1–3) and L1 English (NE1) shows that all three advanced speakers of L2 Japanese selected the region of English tense /i/ and distinguished Japanese /ii/ and /i/ by varying duration.

Figure 3-18: /i/ and /ii/ in the vowel space of beginning L2 Japanese speakers
[Formant plots for BJ1, BJ2, BJ3 and BJ4; axes: F1 frequency (Hz) and F2 frequency (Hz); [i] = long, [I] = short]

Similarly, the spectral region of English tense /i/ was selected in the production of both Japanese long /ii/ and short /i/ by the four beginning speakers of L2 Japanese, as shown by the comparison of Figures 3-17 and 3-18. A difference between beginning and advanced L2 Japanese is that the areas of the ellipses are larger in beginning L2 Japanese, indicating that the vowel quality of /i/ and /ii/ varied to a greater extent in beginning L2 Japanese.

Euclidean distance data

To quantify these comparisons, I computed the Euclidean distance between Japanese long and short tokens for the minimal pair of /ii/ vs. /i/ for each speaker, following the same method used to compute the Euclidean distance between English tense and lax tokens in Experiment 4 (see Figure 3-10). Average Euclidean distances are shown for all speakers in Figure 3-19. The comparison of individual plots across L1 Japanese, advanced L2 Japanese and beginning L2 Japanese has already shown that none of the seven L2 Japanese speakers produced a quality contrast between Japanese long /ii/ and short /i/. This L2 Japanese pattern is also confirmed by Figure 3-19: the average distance for every L2 Japanese speaker is much smaller than the distances in the L1 English range, and it is located within the L1 Japanese range. This suggests that all seven speakers of L2 Japanese could approximate the L1 Japanese pattern. The comparison of individual plots in Figures 3-17 and 3-18 has shown that the vowel quality of /i/–/ii/ varied more in beginning L2 Japanese than in advanced L2 Japanese. This difference is also observed in Figure 3-19: the average Euclidean distances of beginning L2 Japanese tend to have taller error bars.
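The distance measure can be illustrated with a minimal sketch. The exact averaging scheme is not restated in this section, so the sketch assumes that each token is a point (F1, F2) in Hz and that the distance is averaged over every pairing of a long token with a short token. The function names and the formant values are hypothetical and serve only as illustration.

```python
import math
from statistics import mean


def euclidean_distance(token_a, token_b):
    """Distance in Hz between two (F1, F2) points in the vowel space."""
    return math.hypot(token_a[0] - token_b[0], token_a[1] - token_b[1])


def mean_pairwise_distance(long_tokens, short_tokens):
    """Average distance over every long/short pairing (an assumed scheme)."""
    return mean(
        euclidean_distance(long_tok, short_tok)
        for long_tok in long_tokens
        for short_tok in short_tokens
    )


# Hypothetical (F1, F2) values in Hz, invented for illustration only.
long_ii = [(300.0, 2300.0), (310.0, 2280.0)]
short_i = [(305.0, 2290.0), (315.0, 2270.0)]

print(round(mean_pairwise_distance(long_ii, short_i), 1))
```

Under this scheme, a small mean distance corresponds to overlapping long and short categories, and a large mean distance to a clear spectral separation.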

Figure 3-19: Average Euclidean distance between Japanese long /ii/ and short /i/
[Bar chart of mean Euclidean distance (scale 0 to 800) for NJ1–NJ3 (L1 Japanese), AJ1–AJ3 (advanced L2 Japanese) and BJ1–BJ4 (beginning L2 Japanese), with the L1 English range and the L1 Japanese range marked]

The results suggest three general patterns for L2 Japanese–L1 English. First, all seven L2 Japanese speakers could approximate the L1 Japanese pattern, given that there was no significant difference between long /ii/ and short /i/ in vowel quality. Second, native English speakers learning Japanese do not replace Japanese short /i/ with English lax /I/. They rather seem to choose the spectral region of tense /i/ and produce Japanese long and short high front vowels within that region. Finally, there was no evidence for the effect of negative transfer of L1 English spectral characteristics in L2 Japanese produced by English speakers.

3.8. Discussion of Experiments 4 and 5

3.8.1. Vowel contrast in L1 English vs. L1 Japanese

The L1 patterns emerging from the results of Experiments 4 and 5 are summarized in Table 3-4.

Table 3-4: Summary of L1 English and L1 Japanese patterns observed in Experiments 4 and 5

            L1 English                             L1 Japanese
duration    YES: tense > lax                       YES: long > short
contrast    (about 30% longer)                     (about 2.0 to 2.5 times longer)

quality     YES: tense is more peripheral than     NO: long /ii/ and short /i/ are
contrast    lax (clear separation between tense    produced in the spectral region
            and lax with no spectral overlap)      close to L1 English tense /i/

Japanese long and short vowels in the same spectral region are differentiated only by duration. Both long /ii/ and short /i/ are produced in the spectral region of English tense /i/. In contrast, English tense and lax vowels are differentiated by both duration and vowel quality. An important difference between the two L1 types is that the durational contrast of Japanese long to short vowels (about 2.0 to 2.5) is greater in magnitude than that of English tense to lax vowels (about 1.3). These patterns in Table 3-4 confirm the findings of previous studies, which were summarized in Table 3-1.
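The relation between the two ways of stating the duration contrast (a ratio such as 1.3, or "about 30% longer") can be shown with a short sketch. It assumes that the ratio is taken between the mean durations of the two categories; the function names and the durations are hypothetical and are not measurements from Experiments 4 and 5.

```python
from statistics import mean


def duration_ratio(longer_ms, shorter_ms):
    """Ratio of mean durations (long/short or tense/lax), an assumed definition."""
    return mean(longer_ms) / mean(shorter_ms)


def percent_longer(ratio):
    """Express a duration ratio as a percentage increase over the shorter vowel."""
    return (ratio - 1.0) * 100.0


# Hypothetical durations in ms, invented for illustration only.
tense_ms = [128.0, 132.0]
lax_ms = [98.0, 102.0]
long_ms = [215.0, 225.0]
short_ms = [97.0, 103.0]

english_ratio = duration_ratio(tense_ms, lax_ms)
japanese_ratio = duration_ratio(long_ms, short_ms)
print(f"tense/lax ratio: {english_ratio:.2f} ({percent_longer(english_ratio):.0f}% longer)")
print(f"long/short ratio: {japanese_ratio:.2f}")
```

A ratio near 1.0 would correspond to the neutralized contrast discussed later in this chapter as an overcompensation effect.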

3.8.2. Duration contrast in L2 English and L2 Japanese vowels

The patterns of duration contrast in L2 English and L2 Japanese are summarized in Table 3-5 and discussed in the sections that follow it.

Table 3-5: Summary of duration contrast in L2 English and L2 Japanese vowels observed in Experiments 4 and 5

Duration contrast, L2 English–L1 Japanese:

• negative transfer of the L1 Japanese contrast (greater duration contrast)

• however, possible to learn the L1 English duration contrast by weakening the L1 Japanese duration contrast (positive learning effect)

• tend to overweaken the duration contrast (overcompensation effect)

• no systematic correlation with L2 English proficiency

Duration contrast, L2 Japanese–L1 English:

• able to produce the L1 Japanese-like pattern by enhancing phonetic duration contrast in L1 English

• however, tend to exaggerate the duration contrast by over-lengthening a long vowel (overcompensation effect)

• no systematic correlation with L2 Japanese proficiency

Negative transfer of the L1 duration contrast

In Experiment 4, we found an effect of negative transfer of the L1 duration pattern in L2 English produced by Japanese speakers: some Japanese speakers of L2 English produced a duration ratio of English tense/lax vowels as large as the duration ratio of L1 Japanese long/short vowels. However, we have found no evidence of the negative transfer of the L1 English pattern in L2 Japanese produced by native English speakers: none of the L2 Japanese speakers produced a duration ratio of Japanese long/short vowels as small as the ratio of L1 English tense/lax vowels.

There is an asymmetry between the two L2 types in terms of the effect of the negative transfer of the L1 duration pattern. While there was some evidence for the negative transfer of the L1 Japanese pattern in L2 English, we did not find any evidence of the negative transfer of the L1 English pattern in L2 Japanese. This finding suggests that weakening the prominent phonemic duration contrast of L1 vowels (the case of L2 English produced by native Japanese speakers) may be more challenging than enhancing the L1 non-phonemic duration contrast (the case of L2 Japanese produced by native English speakers).

Another possibly relevant factor is that the Japanese kana alphabets provide English learners of Japanese with visual cues that help them acquire the distribution of phonemic length among Japanese vowels. Kana letters represent moras; thus, the moraic segments /Q/, /R/ and /N/ are represented by separate letters. The short and long vowels that were investigated in Experiment 5 are differentiated by the number of letters in the kana system. A significant effect of kana literacy acquisition on the awareness of speech segmentation units in young Japanese children was found by Inagaki, Hatano and Otake (2000). Their experiments tested the segmentation of words containing CVN, CVQ, CVV and CV by 4- to 6-year-old children. Results indicated that “the children’s conscious segmentation of words... developed from being a mixture of syllable- and mora-based to being predominantly mora-based as they learned to read kana letters” (Inagaki et al. 2000, p. 70). A similar effect on the segmentation of Japanese words by the native English speakers participating in Experiment 5 is possible, given that they had all acquired kana letters. This hypothesis needs to be tested in further studies, for example by comparing English speakers with kana literacy with English speakers without kana literacy (but equally fluent in Japanese) in terms of their segmentation of Japanese words.

Positive learning effect

In Section 3.4, the following research questions were asked regarding the learning of the duration characteristics of a target language, with respect to L2 English and L2 Japanese:

• Do native Japanese speakers learn to weaken their prominent phonemic duration contrast in the production of English tense and lax vowels?

• Do native English speakers learn to produce a native-like large duration contrast between Japanese short and long vowels?

The results of Experiments 4 and 5 have answered both questions, showing evidence of a positive learning effect in both L2 types: it is possible for native Japanese speakers to learn to approximate the native English contrast by weakening the prominent contrast in their L2 English production, and vice versa for L2 Japanese production by native English speakers.

Overcompensation effects

Two advanced Japanese speakers of L2 English (AE2 and AE3) successfully avoided producing the prominent duration contrast of L1 in L2 English production. However, in some contexts, they overweakened the duration contrast and eliminated the durational difference between English tense and lax vowels. This is an example of an overcompensation effect. Interestingly, the opposite direction of overcompensation was observed in L2 Japanese produced by native English speakers. Three speakers of L2 Japanese (AJ2, AJ3 and BJ2) consistently exaggerated the durational difference between Japanese long and short vowels. The means and standard deviations of Japanese long and short vowels are plotted for these three L2 Japanese speakers and one L1 Japanese speaker (NJ1) in Figure 3-20. The comparison of their patterns shows that the exaggerated contrast is due to overlengthening of the long vowels.

Figure 3-20: Mean and standard deviation of Japanese long and short vowels for AJ2, AJ3, BJ2 (L2 Japanese) and NJ1 (L1 Japanese)
[Four panels (AJ2, AJ3, BJ2, NJ1) showing mean duration in ms (scale 0 to 400) of short and long vowels by word pair type (biru, kado, toru)]

On the other hand, there is no single factor contributing to the aforementioned overweakening of the L2 English duration contrast in the production of AE2 and AE3. The individual plots of these two L2 English speakers and a plot of a native English speaker’s data are shown in Figure 3-21. AE2’s durational contrast between tense /i/ and lax /I/ disappeared in the Pete–pit pair. It is not possible to tell whether this pattern is due to a shortening of tense /i/ or to a lengthening of lax /I/. This is a case of genuine neutralization of the duration contrast between tense and lax vowels. On the other hand, the absence of duration contrast in AE3’s production of the bead–bid pair is due to an excessive lengthening of both tense /i/ and lax /I/. We also notice that the overall duration of both tense and lax vowels is much greater in the production of AE2 and AE3 than in the production of NE1. However, it is difficult to make a connection between this L2 pattern and the aforementioned overweakening of the L2 English duration contrast shown by AE2 and AE3.

Figure 3-21: Mean and standard deviation of English tense and lax vowels for AE2, AE3 (advanced L2 English) and NE1 (L1 English)
[Three panels (AE2, AE3, NE1) showing mean duration in ms (scale 0 to 250) of tense and lax vowels by word pair type: bVd (bead vs. bid), dVp (deep vs. dip), kVn (keen vs. kin), pVt (Pete vs. pit); * = significantly different at α = 0.01]

These speakers of L2 English and L2 Japanese avoided importing L1 duration patterns in their production of L2 vowels. However, the observed cases of overcompensation effect in both L2 types indicate how challenging it is to master the phonetic habits of the target language.

No correlation with L2 English proficiency

The data on L2 English in Experiment 4 showed no systematic correlation between duration patterns and L2 English proficiency. That was also the case for L2 Japanese in Experiment 5. These results suggest that there is no systematic developmental pattern in the acquisition of the L2 duration contrast in either L2 type.

3.8.3. Quality contrast in L2 English and L2 Japanese vowels

The patterns of vowel quality contrast in L2 English and L2 Japanese are summarized in the following table and discussed in the sections that follow it.

Table 3-6: Summary of quality contrast in L2 Japanese and L2 English vowels observed in Experiments 4 and 5

Quality contrast, L2 English–L1 Japanese:

• negative transfer of the L1 Japanese pattern (no lax category)

• able to learn to distinguish tense /i/ vs. lax /I/ by quality, but still not with a native-like degree of spectral separation

• positive correlation with L2 English proficiency

Quality contrast, L2 Japanese–L1 English:

• able to produce the L1 Japanese pattern (no strong evidence for negative transfer of the L1 English pattern)

• already have tense and lax categories in L1 English → choose the tense [i] area and just distinguish Japanese short and long vowels by duration

Transfer effect of L1 quality characteristics

For vowel quality, the results of Experiments 4 and 5 revealed an asymmetrical pattern between the two L2 types in terms of the transfer effect of the spectral characteristics of the L1 vowel contrast. In Experiment 4, we found an effect of negative transfer of the L1 Japanese pattern in the production of English tense /i/ and lax /I/ by native Japanese speakers: the three beginning Japanese learners of English produced both English tense /i/ and lax /I/ in the spectral region of Japanese /i/, which is close to English tense /i/, and differentiated the two categories only by duration. In contrast, we found no evidence of the negative transfer of the L1 English pattern in L2 Japanese produced by native English speakers. Both beginning and advanced speakers of L2 Japanese in Experiment 5 were able to produce the L1 Japanese pattern, i.e., to produce the target vowels only within the spectral region of English tense /i/, which is close to Japanese /i/–/ii/, and distinguish Japanese long and short vowels by duration.

The presence vs. absence of negative transfer of L1 patterns in L2 English and L2 Japanese, respectively, can be explained by the following difference between L1 Japanese and L1 English. In the high front region of the vowel space, there is no lax category in L1 Japanese, i.e., the region of English lax /I/ is never used. Thus, native Japanese speakers need to learn to use the English lax region and create a new vowel category in order to produce the quality contrast of English /i/–/I/. On the other hand, the spectral region of Japanese /i/–/ii/ is already present in the L1 English system (i.e., tense /i/), so there is no need to create a new vowel category in the production of L2 Japanese by native English speakers.

In principle, English tense /i/ and lax /I/ could have been mapped to Japanese long /ii/ and short /i/, respectively, in the production of L2 Japanese by native English speakers. However, in the data of Experiment 5, we did not find any case of this possible negative transfer pattern. Native English speakers map English /i/ onto both Japanese long /ii/ and short /i/ without using the region of English lax /I/. This pattern could be due to the positive transfer of the high sensitivity of native English speakers to spectral information. As mentioned in Section 3.3.1, Bohn and Flege (1990) found that native English speakers use spectral information as a dominant cue in the perceptual discrimination of the English tense-lax contrast in the front region. It is likely that, thanks to their high L1 spectral sensitivity, native English speakers realize that Japanese long and short vowels produced by L1 Japanese speakers are close to English tense /i/, and then produce the two vowel types as long and short versions of English tense /i/, respectively.

Learning effect

In Section 3.3, the following research questions were asked regarding the learning of the quality characteristics of the target language, with respect to L2 English and L2 Japanese.

• Do native Japanese speakers learn to produce a native-like significant contrast of tense-lax vowel quality?

• Do native English speakers learn to suppress a significant quality contrast in the production of Japanese short and long vowels?

As just discussed, opposite tasks are involved in approximating the pattern of the target language in the two L2 types. Native Japanese speakers learning English need to create a new vowel category in the spectral region of English lax /I/, while native English speakers learning Japanese need to avoid using the spectral region of English lax /I/ and produce Japanese long and short vowels in the region of English tense /i/. The results of Experiment 4 show evidence for the negative transfer of the L1 Japanese pattern into L2 English and also a positive learning effect in the patterns of advanced L2 English. In the L2 Japanese data, we do not find a single case in which an L2 Japanese speaker produced a Japanese vowel in the English lax /I/ region. Furthermore, comparing individual formant plots between advanced L2 English and L2 Japanese in Figures 3-8 and 3-17, we find that advanced speakers of L2 Japanese produce the native-like quality pattern more reliably than advanced speakers of L2 English, who can differentiate English tense /i/ and lax /I/ by quality, but still with some spectral overlap between tense /i/ and lax /I/.

These results lead to two general conclusions. First, in both L2 types, it is possible to learn the pattern of the target language. Second, however, it is more challenging for native Japanese speakers to produce a native-like quality contrast in the production of English tense and lax vowels than for native English speakers to avoid producing a significant quality contrast in the production of Japanese short and long vowels. Presumably, this is due to the fact that it is harder to create a new vowel category (lax /I/ in the case of L2 English–L1 Japanese) than it is to use a category already available in L1 in a different context (as in the case of L2 Japanese–L1 English). This hypothesis should be tested in further studies by comparing the two L2 types in the production of vowel categories in spectral regions other than the high front region considered in this chapter.

3.9. Summary of Experiments 4 and 5

In Experiments 4 and 5, we investigated how the patterns of duration and vowel quality in L1 vowel production transfer to vowel contrast in L2 English produced by Japanese speakers and L2 Japanese produced by English speakers, respectively. The findings of earlier studies were confirmed by our L1 data. L1 Japanese long and short vowels in the same spectral region are consistently distinguished only by duration, while L1 English tense and lax vowels are distinguished by both duration and vowel quality. The two languages differ in terms of the magnitude of the duration contrast: the ratio of Japanese long to short vowels is significantly greater than the ratio of English tense to lax vowels.

The duration results show that it is possible to learn the L2 pattern by learning to suppress the L1 duration pattern in the production of L2. However, L2 learners tend to exaggerate the duration pattern of their L2. This is the case for both L2 English and L2 Japanese. The absence vs. presence of negative transfer from L1 in L2 Japanese and L2 English, respectively, suggests that it may be more challenging to weaken a prominent L1 duration contrast (the case of L2 English–L1 Japanese) than to enhance an L1 contrast in order to approximate the duration contrast of L2 (the case of L2 Japanese–L1 English). This difference between L2 English–L1 Japanese and L2 Japanese–L1 English may reflect the difference between L1 Japanese and L1 English in terms of the phonemic status of the duration correlate: i.e., in L2 production, a duration contrast which is phonemic in L1 is harder to change or manipulate than a contrast which is not phonemic in L1. In other words, phonemic L1 duration patterns are more likely to negatively transfer to L2 production than duration patterns that are only phonetic.

The analysis of vowel quality showed strong evidence of the negative transfer of the L1 Japanese pattern in L2 English, but no evidence for the negative transfer of the L1 English pattern in L2 Japanese. This asymmetry can be explained in terms of a difference between the two types of L1 in the phonemic vowel system: L1 Japanese speakers learning English have to develop a new phonetic/phonological category, i.e., English lax /I/, and this is harder than the task faced by L1 English speakers learning Japanese, who already possess a category equivalent to Japanese /i/–/ii/, i.e., English tense /i/, so that they simply have to learn to produce long and short versions of this category.

Chapter 4: Temporal Organization Across Syllables in L2 English and L2 Japanese

4.1. English stress vs. Japanese mora timings

4.1.1. Stress-foot and mora as basic timing units

As mentioned in Section 1.4.3, languages have been traditionally classified into different timing categories, on the basis of the notion that temporal organization of speech is based on isochronous units of timing (for a summary, see Beckman 1992; Tajima 1998). English is classified as a language in which the fundamental unit of timing is the stress foot, i.e., a stress-timed language (Pike 1945; Abercrombie 1967). French is classified as a language in which the syllable is used as the basic timing unit, i.e., a syllable-timed language. Finally, Japanese is classified as a mora-timed language having the mora as a timing unit (see Jinbo 1927, Trubetzkoy 1939, Bloch 1950, and Hattori 1960 for early proposals of mora ; Warner and Arai [submitted] for a review).

In the theory of isochrony, the duration of each occurrence of the basic timing unit is assumed to be equal. In the case of English stress-timing, the duration of the interval between successive stresses (i.e., the interstress interval) is expected to be the same no matter how other factors may vary (e.g., the number of unstressed syllables, the segmental composition of the syllables forming the foot). If this theory is correct, there should be, for example, no difference in the duration of interstress intervals across the following three sentences (the relevant stressed syllables are marked by acute accents):

Send Ánn hóme. Send Ánna hóme. Send ánimals hóme.

However, “[m]any investigators, beginning with Class (1939), have measured interstress intervals in English, and all have shown that they are objectively longer as they contain more and more syllables (see Lehiste 1977; Kawasaki 1983 for a review)” (Dauer 1983, p. 52). A number of studies were also conducted to test the theory of isochrony with respect to Japanese mora-timing. “Because of the effects of inherent segmental duration and differences in moraic structure (non-CV morae), it is clear that morae are not completely isochronous in Japanese”, as pointed out by Warner and Arai (submitted, pp. 4-5). In an early experimental study of mora-timing, Han (1962) argued that the mora is a unit of duration in Japanese, that all morae are of approximately (but not exactly) equal length, and that this is achieved through durational compensation within a mora. The hypotheses proposed by Han (1962), the nearly equal duration of different morae and compensation, are the focus of most early work on mora-timing[13]. However, Warner and Arai, on the basis of a review of the major studies on this topic, conclude that the experimental results are too inconsistent to support the claim that moras have approximately the same length.
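The English isochrony prediction discussed above can be made concrete with a small sketch. Strict isochrony fixes the interstress interval, so the time available per syllable must shrink as syllables are added; a purely additive alternative lets the interval grow with each syllable. The syllable counts follow the three example sentences (Ann, Anna, animals); the function names and the two duration constants are hypothetical and are not measurements.

```python
def isochronous_syllable_budget(interval_ms, n_syllables):
    """Mean time each syllable could occupy if the interstress interval were fixed."""
    return interval_ms / n_syllables


def additive_interval(n_syllables, syllable_ms):
    """Interstress interval if every syllable simply kept the same duration."""
    return n_syllables * syllable_ms


# Number of syllables from the first marked stress up to the next stress.
feet = {"Ann": 1, "Anna": 2, "animals": 3}

FIXED_INTERVAL_MS = 400.0  # hypothetical value, not a measurement
FIXED_SYLLABLE_MS = 200.0  # hypothetical value, not a measurement

for word, n in feet.items():
    budget = isochronous_syllable_budget(FIXED_INTERVAL_MS, n)
    grown = additive_interval(n, FIXED_SYLLABLE_MS)
    print(f"{word}: {n} syllable(s), isochronous budget {budget:.0f} ms per syllable, "
          f"additive interval {grown:.0f} ms")
```

The measurements reported by Dauer (1983), in which intervals lengthen as syllables are added, are incompatible with the first model in its strict form.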

4.1.2. Factors characterizing different timing types

It has been proposed in later studies that different timing types are characterized not by different types of isochronous units, but rather by other properties of speech. For example, Dauer (1983) suggested that crosslinguistic differences in timing patterns are affected by the interaction of factors, such as the phonetic realization of lexical accent, degrees of or syllable structure. These factors are helpful in understanding differences between English and Japanese timing patterns. As discussed in Chapter 2 of the present study, English and Japanese greatly differ in terms of how duration is treated in the realization of lexical accent (see Section 2.1.2 for a review). In English, stressed syllables are longer than unstressed syllables, and consequently the durational patterns of utterances are strongly affected by the distribution of lexical stress. A stressed syllable is subject to lengthening, so that an unstressed syllable tends to be shorter in duration. On the other hand, in Japanese, lexical accent does not affect the duration of syllables. Consequently, there is no significant durational difference between accented and unaccented syllables in Japanese, and the durational patterns of utterances are mainly dependent on the distribution of phonemic short and long segments. In other words, while in English lexical accent properties and rhythmic organization are closely related, in Japanese, these two aspects are independent, and rhythmic organization depends largely on the phonemic distribution of short and long segments.

In stress-timed languages, more complex syllable structures tend to be found in stressed syllables, while simple structures (CV) occur in unstressed syllables. This difference results in a higher average number of segments in the stressed syllables, which contributes to the higher average duration of the stressed syllables (see Fant, Kruckenberg & Nord 1991). The inventory of syllable types is more limited in syllable-timed languages like French or Spanish, and even more limited in mora-timed languages like Japanese. The types of syllable structure frequently occurring in recorded speech in English (stress-timed), Spanish (syllable-timed) and Japanese (mora-timed) are presented in Table 4-1, which shows percentages of CV and V syllable types occurring in these languages. Over half of the syllables in Spanish and Japanese have a simple CV structure, while English shows a wider distribution among different types of syllables. The proportion of CV is higher in Japanese than in Spanish.

[13] After many counterexamples against the idea of inter-mora compensation as the basis of Japanese mora-isochrony were presented, a new definition of mora-timing was proposed by Port et al. (1987). Port et al. define mora-timing not as a tendency of all moras toward equal duration, but rather as predictability of word duration from the number of moras in the word. “Under this new definition, if one mora is shorter than average, the others are expected to be longer in order to maintain constant word duration, not shorter in order to match it... This version of the hypothesis predicts negative correlation both within and across mora boundaries” (Warner & Arai submitted, p. 14). A few experimental studies were conducted to test this hypothesis. Some studies (e.g., Bradlow et al. 1995; Sato 1995, 1996, 1998) presented experimental support for the claim of Port et al., while other studies did not (e.g., Campbell & Sagisaka 1991; Campbell 1991; Han 1994).
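The definition of mora-timing attributed to Port et al. (1987) in footnote 13, under which word duration is predictable from the number of moras in the word, can be pictured with a simple least-squares fit. This is only a sketch: the straight-line form is an assumption made here for illustration, the function names are hypothetical, and the durations are invented rather than taken from any of the studies cited.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_word_duration(n_moras, slope, intercept):
    """Predicted word duration in ms for a word with n_moras moras."""
    return slope * n_moras + intercept


# Hypothetical mora counts and word durations in ms, for illustration only.
mora_counts = [2, 3, 4, 5]
word_durations_ms = [300.0, 450.0, 600.0, 750.0]

slope, intercept = fit_line(mora_counts, word_durations_ms)
print(f"about {slope:.0f} ms per mora, intercept {intercept:.0f} ms")
print(f"predicted duration of a 6-mora word: {predict_word_duration(6, slope, intercept):.0f} ms")
```

On this view, the quality of the fit across words, rather than the equality of individual mora durations, is what would count as evidence for mora-timing.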

Table 4-1: Percentages of CV and V syllable types in English, Spanish and Japanese[14]

        English (Dauer 1983)    Spanish (Dauer 1983)    Japanese (Otake 1990)
CV      34%                     58%                     73%
V       8%                      6%                      10%
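Percentages of this kind can be obtained by tallying the consonant-vowel skeleton of each syllable in a syllabified text. The following is a minimal sketch of that tally, not the procedure used by Dauer (1983) or Otake (1990); the function name and the sample skeletons are hypothetical.

```python
from collections import Counter


def syllable_type_percentages(skeletons):
    """Percentage of each syllable type (e.g. 'CV', 'V', 'CVC') in a list."""
    counts = Counter(skeletons)
    total = len(skeletons)
    return {syl_type: 100.0 * count / total for syl_type, count in counts.items()}


# Hypothetical syllable skeletons from a syllabified text, for illustration only.
sample = ["CV", "CV", "CVC", "V", "CV", "CCVC", "CV", "CVC", "CV", "V"]

for syl_type, pct in sorted(syllable_type_percentages(sample).items()):
    print(f"{syl_type}: {pct:.0f}%")
```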

Dauer (1983) observed that syllables with more complicated structure tend to be stressed in a stress-timed language. “[T]here is a strong tendency for ‘heavy’ syllables (those containing many segments) to be stressed and ‘light’ syllables (those containing few segments) to be unstressed. That is, syllable structure and stress are more likely to reinforce each other in a stress-timed than in a syllable-timed language” (Dauer 1983, pp. 55-56). Thus, heaviness makes stressed syllables longer than unstressed syllables. Dauer also points out that, besides syllable structure, the segmental composition of syllables reinforces the difference between stressed and unstressed syllables in English. In the English text analyzed by Dauer (1983), most unstressed CV syllables (92%) were composed of a consonant plus /I/, /ə/ or /ɚ/, whereas most stressed CVs (83%) had /O/, /ou/, /E/ or /eI/ (all vowels which have longer inherent duration) as their nucleus. In Japanese, however, there is no difference in terms of segmental composition between accented and unaccented syllables.

[14] This table is adapted from Table 6 in Otake (1990).

4.2. Linguistic factors investigated

The temporal organization of speech is also affected by linguistic factors above the segmental and syllabic levels. In this chapter, we focus on the effects of two linguistic factors of this sort: 1) effect of parts of speech; 2) effect of the lapse constraint and culminativity requirement. The first and second factors will be examined in Experiments 6 and 7, respectively.

4.2.1. Effect of parts of speech on temporal organization

An additional difference between English and Japanese temporal organization concerns their morphosyntactic properties. In English, monosyllabic function words are typically treated as phonological clitics and realized in their weak forms. A clitic is a linguistic item which “exhibits behavior intermediate between that of a word and that of an affix. Typically, a clitic has the phonological form of a separate word, but cannot be stressed and is obliged to occupy a particular position in the sentence in which it is phonologically bound to an adjoining word, its host... Any process by which an independent word is reduced to a clitic is called cliticization” (Trask 1996, pp. 74-75). It is a major characteristic of English prepositions and articles that they may exhibit an extremely close phonological connection with the word that follows (Selkirk 1984). The phonetic consequence of the cliticization of English function words is that they are unstressed and durationally reduced, like the unstressed syllables of content words. Thus, in the following example, the preposition ‘in’ in the first sentence is cliticized and durationally reduced, with the next content word ‘act’ as its host (stressed syllables are capitalized), and the cliticized and reduced preposition tends to be spoken in temporal proximity to its host. Consequently, the two sentences are phonetically identical (Selkirk 1984, to appear)[15]:

We will SEE them in ACT THREE.
We will SEE them enACT THREE.

[15] The phonological treatment of the function word and the content word in prosodic organization has been controversial. For example, Selkirk’s (1984) position is substantially different from that of Nespor & Vogel (1987) and Hayes (1989). For discussion, see the review paper by Shattuck-Hufnagel & Turk (1996). In the present study, for consistency, I have adopted Selkirk’s approach, in which both a sequence of a function word and a content word (e.g., in ACT) and a multisyllabic content word (e.g., enACT) form the same type of prosodic unit (i.e., the prosodic word).

The cliticization and phonetic reduction of function words in English contribute to the formation of larger prosodic units. On the other hand, Japanese function morphemes are not independent words but bound morphemes. They are typically suffixed to content words. For example, in the following sentence, three function morphemes, -ga, -o, -ta, are bound to content words and cannot stand by themselves:

Taroo-ga    mikan-o          tabe-ta
name-NOM    tangerine-ACC    eat-PAST
‘Taroo ate tangerines’

Like English function words, Japanese function morphemes (case particles in this case) are not accented. However, unlike English function words, Japanese function morphemes are not subject to shortening. The results of Experiments 1 and 3 show that the durational reduction of the unstressed syllables of content words in English is challenging to learn for native speakers of Japanese. How will Japanese learners of English treat English monosyllabic function words in the rhythmic organization of their L2 English? Can they learn to cliticize and durationally reduce English function words? These questions will be answered in Experiment 6.

4.2.2. The lapse constraint and the culminativity requirement

In English, there is a tendency to avoid many consecutive unstressed/unaccented syllables. This phenomenon has been referred to as the “lapse constraint” (Selkirk 1983, p. 49). The lapse constraint affects the distribution of stress in English stress timing, where stresses obey a rhythmic tendency towards even spacing, so that they are neither too close together nor too far apart (e.g., see Liberman 1975; Selkirk 1984 for a review). At the phrase level, “a stressless syllable may serve as stressed when it is not adjacent to a stressed syllable (‘Promotion’)...” (Hayes 1984, p. 917). Moreover, English words are subject to the culminativity requirement. Culminativity requires that each content word have a single strongest syllable, bearing the main stress (Liberman & Prince 1977, p. 262; cited in Hayes 1995). Both the lapse constraint and the (word-level) culminativity requirement are language-specific characteristics. Indeed, in Japanese, neither the lapse constraint nor the culminativity requirement is enforced. Consequently, Japanese can have entire phrases without any pitch accent (Beckman 1992; Kubozono & Nakau 1998). Thus, the following sentence, which consists of a long sequence of unaccented syllables, is possible in Japanese:

watashi-wa   mukashi       kagoshima-no   inaka-kara         jyookyoo-shita
I-TOP        in the past   (place)-of     countryside-from   came to Tokyo
‘I came to Tokyo from the countryside of Kagoshima in the past’
(adapted from Kubozono & Nakanishi 1998, p. 22)

Do the lapse and culminativity constraints on English stress distribution transfer negatively to the L2 Japanese produced by native speakers of English? If native speakers of English have to produce long trains of unaccented syllables in Japanese, will they assign stress accents, due to the negative transfer of the lapse constraint? These questions will be answered in Experiment 7 by examining how Japanese phrases consisting of only unaccented syllables are treated by native speakers of English.

4.3. Experiment 6 (English)

4.3.1. Expected patterns in L2 English

In Section 4.2.1, we reviewed how the distribution of part of speech affects the temporal organization of English and Japanese. In English, monosyllabic function words are cliticized, and they are durationally reduced as much as the unstressed syllables of content words are. Thus, we expect the following L1 English pattern:

Expected L1 English pattern: In L1 English production, the duration of monosyllabic function words is not significantly different from that of the unstressed syllables of content words.

Grouping multiple words into a larger prosodic unit is one of the challenges in L2 prosodic development. How an utterance is grouped into prosodic units is generally called phrasing. It is easy to notice that less proficient learners have difficulties with phrasing. Previous studies showed that less proficient learners tend to put fewer words into one phrase (i.e., less proficient speech is more “choppy”). Ueyama and Jun (1998) investigated the phrasing patterns of Japanese speakers and Korean speakers of L2 English by analyzing tone types, and they found that less advanced speakers of L2 English produce smaller L2 prosodic units regardless of their L1 types: i.e., it is more challenging to produce longer prosodic units in L2 speech production. A similar correlation was found by Jun and Oh (2000), a study that investigated intonation patterns in L2 Korean produced by native English speakers. How is this difficulty in L2 prosodic development reflected in the temporal/durational aspects of L2 speech? In L1 English, as mentioned above, the cliticization of function words, which is phonetically cued by durational reduction and consequent temporal proximity to the content word that follows, contributes to the formation of larger prosodic units. Given that L2 English speakers have difficulties in producing large prosodic units, they are likely to fail to merge clitic function words into larger prosodic units with their hosts. If they do fail to treat function words as clitics, they will probably not reduce the vowels of such function words. Thus, we assume that the aforementioned general difficulty in forming larger L2 prosodic units must be reflected in difficulty with the cliticization and durational reduction of English function words. If this is correct, we expect that L2 English speakers will have a harder time learning the durational reduction of English function words than the durational reduction of an unstressed syllable within a content word. In other words, we expect the following order in the acquisition of durational reduction in L2 English produced by Japanese speakers: Japanese speakers of L2 English first learn to shorten unstressed syllables of content words and establish a durational contrast between stressed and unstressed syllables at the word level; they later learn to cliticize and durationally reduce function words, so that they can produce a prosodic unit larger than a word. Thus, we expect the following duration pattern in the production of L2 English by native Japanese speakers:

Expected L2 English pattern: In L2 English produced by less experienced Japanese speakers, monosyllabic function words are longer than the unstressed syllables of content words.¹⁶

4.3.2. Method

Subjects

Subjects for Experiment 6 were selected from the pool of candidates on the basis of fluency of reiterant speech, since some candidates showed significant difficulties even after a certain amount of practice. The final set of subjects included one control group and one experimental group. The control group consisted of three native speakers of American English: NE1, NE2 and NE3. NE1 was a male speaker, while NE2 and NE3 were female speakers. The experimental group consisted of 5 native speakers of Japanese who had been staying in the United States for more than 5 years (AE1, AE2, AE3, AE4 and AE5).

16 This hypothesis pertains only to those Japanese speakers of L2 English who have already learned to shorten unstressed vowels. Beginning speakers do not differentiate stressed and unstressed vowels by duration, as shown by the results of Experiment 1, so they will probably not differentiate monosyllabic function words from the unstressed syllables of content words, since they do not produce vowel reduction in either context.

According to the criterion used to determine proficiency levels in the present study, all of them can be classified as advanced learners: i.e., they have stayed in an English-speaking country for more than 3 years. We assume that they have already learned some vowel reduction, based on our finding in Experiment 1. All Japanese participants spoke the Tokyo dialect as their native tongue. AE1 and AE3 participated in other English experiments of the present study (Experiments 1, 3 and 4), while the other three speakers did not. AE1 in Experiment 6 is the same speaker classified as AE4 in Experiments 1 and 3 and AE3 in Experiment 4, while AE3 in Experiment 6 is the same speaker classified as AE2 in Experiments 1 and 3. Background information for all native Japanese speakers is presented in Table 4-2.

Table 4-2: Background information of L2 English speakers in Experiment 6

      age   gender   years of residence   age of arrival   age of beginning of   duration of
                     in the US            in the US        English instruction   English instruction
AE1   29    female   11 years             18               13                    10 years 3 months
AE2   32    female   9 years              22               13                    13 years
AE3   28    female   7 years              21               13                    13 years
AE4   23    female   5 years              18               13                    12 years
AE5   25    female   5 years              20               13                    8 years

Speech Materials

The corpus for Experiment 6 contained three pairs of test sentences. Sentences in each pair were identical in terms of Inter-Stress Interval (ISI), and they were different in terms of the context of the tested unstressed syllables (within a polysyllabic content word vs. in a monosyllabic function word). In order to see how the number of interstress syllables affects duration, the ISI size varied from 1 to 3. The three pairs of test sentences are listed in Table 4-3 (the acute mark indicates stress). The syllables which will be analyzed for duration patterns are underlined.

Table 4-3: Test sentences in Experiment 6

        unstressed syllable       monosyllabic
        in a content word         function word
ISI=1   Gó to géther              Sée the góvernor
ISI=2   Sée the phi lósopher      Bórrow a pénny
ISI=3   Gó to the phi lósopher    Húrry to the ládder

We controlled for the position of the tested syllables with respect to the beginning of the sentence as well as for syllable structure (i.e., all target syllables have a CV structure). In order to avoid epenthesis in L2 English production by native speakers of Japanese, we avoided using words with consonant clusters.

Recording

The six target sentences were mixed with foil sentences. Sentences in each reading of the list were pseudo-randomized in different orders. To avoid segmental effects on duration while preserving prosodic patterns, the participants were asked to replace each syllable of the text with the syllable /no/ (reiterant speech). For example, “Go together” had to be read as /NO noNOno/ (“NO” and “no” indicate stressed and unstressed syllables, respectively).
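The reiterant-speech substitution described above can be illustrated with a minimal Python sketch. It only shows the mapping from a stress pattern to the /no/ string that speakers were asked to read; the function name `reiterant` and the encoding of stress as booleans are assumptions made here for illustration and are not part of the experimental procedure.

```python
def reiterant(words, syllable="no"):
    """Render a stress-marked utterance as reiterant speech.

    `words` is a list of words; each word is a list of booleans, one per
    syllable, where True marks a stressed syllable (hypothetical encoding).
    Stressed syllables are written in capitals, as in the text.
    """
    rendered = []
    for word in words:
        rendered.append(
            "".join(syllable.upper() if stressed else syllable for stressed in word)
        )
    return " ".join(rendered)


# "Go together": Go (stressed) + to-GE-ther
go_together = [[True], [False, True, False]]
print(reiterant(go_together))  # NO noNOno
```

Because every syllable is replaced by the same segmental string, any durational difference between the rendered syllables can be attributed to prosody rather than to segmental content.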

The participants were given sufficient time to practice reiterant speech before the recording. We made sure that all our subjects were able to perform the task.¹⁷ The speakers had to read each sentence ten times. The first reading was not analyzed. Data were recorded in the recording booth of the UCLA phonetics lab.

17 The Japanese participants were also asked to produce a set of Japanese sentences using reiterant speech. None of them showed difficulties.

Speech data were collected in two different periods, using different methods to present the reading materials to speakers: during the first data collection period, speakers read sentences from sheets, whereas during the second period sentences were presented on a computer screen. We believe that duration patterns were not affected by this difference. NE3’s performance was recorded in two different sessions using the two different presentation methods. In the first session, she read a list of pseudo-randomized sentences from sheets. After one year, she participated in the second session and read sentences presented on the computer screen. There was no significant difference in the duration patterns produced by this speaker in the two sessions. Two L1 English speakers (NE1, NE2) and four Japanese speakers of L2 English (AE2–AE5) read the list of sentences from sheets. For the other two speakers (NE3 and AE1), PsyScope was used to present sentences. Sentences were displayed on the computer screen one at a time.

Measurement

The recorded data were converted from analog to digital at a 10 kHz sampling rate. We measured the duration of the unstressed vowel /o/ of the syllable no for each of the six tested conditions, using Kay Elemetrics’ Computerized Speech Laboratory (CSL). All the measurements were based on waveform analysis and wide-band spectrograms. The energy distribution was additionally inspected in the cases of difficult segmentation. Since segmental duration is affected by the prosodic structure of utterances, we checked that each speaker produced similar intonation patterns across tokens by inspecting the sequence of pitch accents, phrase tones and boundary tones.
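As a worked illustration of how a vowel duration follows from segment boundaries at the 10 kHz sampling rate mentioned above, consider the following minimal Python sketch. The boundary values are hypothetical and were chosen only to make the arithmetic easy to follow; they are not measurements from this study, and the actual segmentation was carried out by hand in CSL, not with code of this kind.

```python
from statistics import mean, stdev

SAMPLING_RATE_HZ = 10_000  # sampling rate stated in the text


def duration_ms(onset_sample, offset_sample, rate_hz=SAMPLING_RATE_HZ):
    """Convert a vowel's onset/offset sample indices to a duration in ms."""
    if offset_sample < onset_sample:
        raise ValueError("offset precedes onset")
    return (offset_sample - onset_sample) * 1000.0 / rate_hz


# Hypothetical segment boundaries (in samples) for three tokens of /o/.
boundaries = [(1200, 1950), (3000, 3820), (5100, 5890)]
durations = [duration_ms(on, off) for on, off in boundaries]

print(durations)         # [75.0, 82.0, 79.0]
print(mean(durations))   # mean duration in ms
print(stdev(durations))  # sample standard deviation in ms
```

At 10 kHz one sample corresponds to 0.1 ms, so the temporal resolution of the digitized signal is far finer than the durational differences of interest.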

Statistical Analysis

The obtained duration values were analyzed using two-factor ANOVAs and Scheffe’s post-hoc tests. The independent variables in the two-factor ANOVAs were the morphosyntactic context of unstressed syllables and the size of the ISI. The focus of Experiment 6 is on the effect of the morphosyntactic context of unstressed syllables on duration (content vs. function word conditions). The size of the ISI was included in the ANOVAs in order to control for the variance generated by this factor.
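The logic of a two-factor ANOVA of this kind can be sketched in a few lines of Python. The sketch below assumes a balanced, fully crossed design and uses only the standard library; the function name `two_way_anova` and all duration values are hypothetical, and the code computes F ratios and degrees of freedom only (no p-values and no post-hoc tests). It is not the statistical software used in the present study.

```python
from statistics import mean


def two_way_anova(cells):
    """Balanced two-factor ANOVA with interaction.

    `cells[(a, b)]` holds the observations for level `a` of factor A and
    level `b` of factor B. Returns {effect: (F, df_effect, df_error)}.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(a_levels) * len(b_levels) or any(
        len(v) != n for v in cells.values()
    ):
        raise ValueError("design must be balanced and fully crossed")

    grand = mean(x for v in cells.values() for x in v)
    a_mean = {a: mean(x for b in b_levels for x in cells[(a, b)]) for a in a_levels}
    b_mean = {b: mean(x for a in a_levels for x in cells[(a, b)]) for b in b_levels}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for the two main effects, the interaction and the error.
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
        for a in a_levels
        for b in b_levels
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_err = len(cells) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "A": (ss_a / df_a / ms_err, df_a, df_err),
        "B": (ss_b / df_b / ms_err, df_b, df_err),
        "A*B": (ss_ab / df_ab / ms_err, df_ab, df_err),
    }


# Hypothetical /o/ durations (ms): factor A = context, factor B = ISI size.
cells = {
    ("content", 1): [60, 64], ("function", 1): [62, 66],
    ("content", 2): [50, 54], ("function", 2): [52, 56],
    ("content", 3): [46, 50], ("function", 3): [48, 52],
}
result = two_way_anova(cells)
print(result["A"])    # context effect: (1.5, 1, 6)
print(result["B"])    # ISI effect: (26.0, 2, 6)
print(result["A*B"])  # interaction: (0.0, 2, 6)
```

In this toy data set the ISI factor accounts for most of the variance, which illustrates why it is included in the model: removing its contribution from the error term makes the test of the context effect more sensitive.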

4.3.3. Results of Experiment 6

L1 English patterns

In Figure 4-1, the mean duration and standard deviation of the unstressed vowel /o/ in the two morphosyntactic contexts are plotted for each speaker of L1 English. ANOVA results are presented in Table 4-4 (shaded cells represent significant effects).

Figure 4-1: Mean duration & standard deviation of unstressed /o/ for L1 English speakers in Experiment 6 (α = 0.01)
[One panel per speaker: NE1, NE2, NE3. x-axis: ISI size (ISI=1, ISI=2, ISI=3); y-axis: /o/ duration (ms); conditions: monosyllabic function word vs. unstressed syllable in content word.]

Table 4-4: ANOVA results for L1 English speakers (α = 0.01)

      context            ISI                context*ISI
NE1   F(1, 54) = .668    F(2, 54) = 8.67    F(2, 54) = .291
      p = .4172          p = .0005          p = .7483
NE2   F(1, 54) = 0.003   F(2, 54) = 9.537   F(2, 54) = 2.888
      p = .9559          p = .003           p = .0644
NE3   F(1, 40) = .998    F(2, 40) = .600    F(2, 40) = .444
      p = .3776          p = .4430          p = .6446

Figure 4-1 shows no systematic relation between the duration of the unstressed vowel /o/ and the morphosyntactic context of the vowel for any speaker of L1 English. The absence of a systematic relation between /o/ duration and the morphosyntactic context is confirmed by the results of the ANOVAs in Table 4-4: none of the L1 English speakers shows an effect of the morphosyntactic context of unstressed syllables. These results show that a monosyllabic function word and an unstressed syllable in a content word are treated in the same way in the temporal organization of L1 English, as expected.

L2 English patterns

Mean durations of unstressed /o/ as produced by advanced speakers of L2 English are presented in Figure 4-2. In this figure, we can see that AE2–AE5 showed noticeably greater means for the monosyllabic function word context. On the other hand, AE1 showed a much smaller difference between the two morphosyntactic contexts. These two patterns among the five Japanese speakers of L2 English are also illustrated by the ANOVA results presented in Table 4-5: in AE1’s data, the effect of morphosyntactic context on duration is not statistically significant, while it is significant in the data of the other four advanced speakers of L2 English (AE2–AE5).

Figure 4-2: Mean duration & standard deviation of unstressed /o/ for advanced speakers of L2 English in Experiment 6 (α = 0.01)
[One panel per speaker: AE1, AE2, AE3, AE4, AE5. x-axis: ISI size (ISI=1, ISI=2, ISI=3); y-axis: /o/ duration (ms); conditions: monosyllabic function word vs. unstressed syllable in content word; asterisks mark pairs with a significant difference.]

We additionally performed Scheffe’s post-hoc test for each speaker to find for which ISI size there was a significant context effect (i.e., function word vs. unstressed syllable of a content word). Results are presented in Figure 4-2 (word pairs showing significant differences are marked by an asterisk). In the production of AE4 and AE5, on average, vowel duration was significantly longer in monosyllabic function words than in unstressed syllables of polysyllabic content words for all ISIs. AE2 showed the same pattern for ISI=2 and ISI=3, and so did AE3 for ISI=1 and ISI=2. Only AE1 showed the L1-English-like pattern: there was no significant difference between the two contexts for any ISI.

Table 4-5: ANOVA results for advanced speakers of L2 English (α = 0.01)

      context             ISI                context*ISI
AE1   F(1, 48) = .138     F(2, 48) = 4.844   F(2, 48) = 1.044
      p = .7118           p = .0117          p = .3601
AE2   F(1, 54) = 24.699   F(2, 54) = 1.749   F(2, 54) = 2.375
      p < .0001           p = .1836          p = .1026
AE3   F(1, 54) = 18.403   F(2, 54) = 6.524   F(2, 54) = 3.75
      p < .0001           p = .0029          p = .0299
AE4   F(1, 54) = 45.486   F(2, 54) = 2.141   F(2, 54) = .447
      p < .0001           p = .1274          p = .6231
AE5   F(1, 54) = 81.946   F(2, 54) = 4.988   F(2, 54) = 3.488
      p < .0001           p = .01            p = .0376

4.3.4. Discussion of Experiment 6

Tendency toward less reduction for function words in L2 English

The following findings emerge from the results of Experiment 6:

• All L1 English speakers durationally neutralized monosyllabic function words and unstressed syllables of content words.

• On the other hand, in L2 English production by native Japanese speakers, monosyllabic function words are less reduced than unstressed syllables of content words (except for AE1, who showed a native-like pattern).

The observed tendency toward less reduction of a monosyllabic function word in L2 English–L1 Japanese suggests that Japanese learners of English tend to parse English monosyllabic function words not as phonological clitics, but as independent words or prosodic units carrying stress. This difficulty in the cliticization of a function word can be explained by either the negative transfer of L1 characteristics or a universal constraint on prosodic development. If the difficulty is due to L1 negative transfer, which feature(s) of L1 Japanese is relevant? One potential candidate is an L1 Japanese morphosyntactic characteristic, i.e., Japanese grammatical morphemes are suffixes, not words. Therefore, it may be cognitively challenging for native speakers of Japanese to cliticize English function words phonologically and reduce them durationally as much as they reduce unstressed syllables of content words. Japanese speakers’ difficulty in the cliticization of English function words could also be due to a general learning constraint on speech development. In her study of coarticulation in adults and children, Nittrouer (1994) investigated the production of the schwa syllable which preceded the target monosyllables (CV) in the carrier phrase, “it’s a (CV), Bob”. Her results showed that the children’s production of the schwa syllable was consistently longer than that of adults, although the production of schwa is simple in terms of gestural organization. Nittrouer explained this pattern in the following way:

To a great extent, the production of schwa in this case could be modeled with simple jaw lowering, followed by jaw raising. According to all other analyses, these children seemed capable of executing jaw movements with adult-like timing. Why then were their schwas so long? One possible explanation may be that these children were treating this function word more like a stressed syllable than the adults were. (Nittrouer 1994, p. 970)

Nittrouer’s finding echoes the aforementioned difficulty in the cliticization of English function words emerging from the results of Experiment 6. In this experiment, native speakers of Japanese replaced each syllable of short English sentences with the simple open syllable /no/. This syllable should not be difficult in terms of gestural organization, especially since only speakers who were fluent in reiterant speech were selected for the recording. The parallel difficulty of cliticizing function words between English-speaking children and Japanese learners of English suggests that it is cognitively challenging to group different words phonologically into a single prosodic unit.

4.4. Experiment 7 (Japanese)

4.4.1. Expected patterns in L2 Japanese

As mentioned in Section 4.2.2, Japanese can have a long train of unaccented moras. Also, as mentioned in Section 1.5.2, Japanese displays the effect of phrase-final lengthening at the end of the IP (Intonational Phrase). Considering these two factors, we expect the following pattern in the production of a Japanese sentence by L1 Japanese speakers when the sentence is phrased as one IP or AP (Accentual Phrase) with no phrase-medial phrase break.

Expected L1 Japanese pattern: In L1 Japanese production, there is no significant durational difference among non-sentence-final moras.

Since in English stresses tend to obey a rhythmic tendency towards even spacing and sequences of many unstressed syllables are avoided, native English speakers are likely to negatively transfer this characteristic of L1 English to their production of L2 Japanese. If this is correct, we expect the following pattern in L2 Japanese–L1 English:

Expected L2 Japanese pattern: Native speakers of English will tend to introduce accents in Japanese sentences consisting of unaccented moras. Accented moras in their L2 Japanese production are lengthened and produced with longer duration than unaccented moras.

4.4.2. Method

Subjects

The set of speakers for Experiment 7 included one control group and two experimental groups. The control group of this experiment consisted of 4 native speakers of Japanese (Tokyo dialect) who were college students in Tokyo, Japan (NJ1, NJ2, NJ3 and NJ4). They were female speakers except for NJ1. The experimental groups were made up of three native speakers of American English learning Japanese (L2 Japanese speakers, henceforth): two advanced speakers of L2 Japanese (AJ1, AJ2) and one beginning speaker of L2 Japanese (BJ). The three speakers of L2 Japanese also participated in Experiments 2 and 5. Refer to Section 2.4.1 for information on the learning backgrounds of L2 speakers. AJ1, AJ2 and BJ4 in Experiments 2 and 5 correspond to AJ1, AJ2 and BJ in Experiment 7, respectively.

Speech Materials

The corpus of Experiment 7 contained Japanese nouns without lexical pitch accent varying in length from 3 to 4 moras. The words were embedded in the following pro-drop structure ending with the inflectional morpheme -da, which corresponds to the copula in English:

[ X ] - da
‘(It) is X’

4-mora sentence         5-mora sentence
(3-mora noun + -da)     (4-mora noun + -da)
okane-da                tomodatSi-da
money-COP               friend-COP
‘It’s money’            ‘It’s a friend’

These two target words were selected from the list of words taught in the UCLA Japanese language program. A potential confounding factor is the variation of intonational patterns across speakers or across different repetitions of the same token for the same speaker. I use the pro-drop frame sentence in order to elicit the production of each sentence as one Accentual Phrase (AP). This will minimize the risk of intonation variation in the experiment. It has been pointed out that crosslinguistically sentences tend to break up into multiple tonal groups at syntactic boundaries, as the size of phrases increases. One boundary that is likely to cause a tonal break-up is that between a subject and a predicate. There is no boundary of this sort in the frame sentence of Experiment 7, which only retains the complement and the inflectional morpheme -da. Thus, the use of this structure is expected to minimize the risk of tonal breakups.

Recording

The two target sentences were mixed with foil sentences. Sentences in each reading of the list were pseudo-randomized in different orders. To avoid segmental effects on duration while preserving prosodic patterns, the participants were asked to replace each mora of the text with the mora /no/ (reiterant speech), as in Experiment 6 (English). For example, [tomodatSi-da] had to be read as /nonononono/.

In the recording session, PsyScope was used to present sentences. Sentences were displayed on the computer screen one at a time. The participants were given sufficient time to practice reiterant speech ahead of recording. We made sure that all our subjects were able to perform the task. None of them exhibited difficulties. The speakers had to read each sentence ten times. The first reading was not analyzed. Data were recorded in the recording booth of the UCLA phonetics lab for L2 Japanese, and in the recording room of Meiji Gakuin University Information Center in Tokyo for L1 Japanese.

Measurement

The recorded data were converted from analog to digital at a 10 kHz sampling rate. The data were analyzed for duration patterns, following the method used in Experiment 6. Before measurements, we made sure that each sentence was produced as one prosodic phrase (i.e., Accentual Phrase) with neither hesitation nor break in the middle of the sentence.

4.4.3. Results of Experiment 7 (Japanese)

L1 Japanese patterns

The mean duration and standard deviation of the vowel /o/ in each mora position of the 4-mora (okane-da) and 5-mora sentences (tomodatSi-da) are plotted for L1 Japanese speakers in Figures 4-3a, b, respectively. The following effects were consistently shown in both 4-mora and 5-mora sentences by all four speakers of L1 Japanese:

• The first mora is the shortest among all moras of the sentence.

• The last mora is the longest among all moras of the sentence.

Special properties of the initial and final positions of a prosodic phrase have been found in a number of studies, and they are known as edge effects (see Fougeron 1999 and Cambier-Langeveld 2000 for reviews of phrase-initial and -final effects, respectively). I consider the two systematic patterns presented above to be instances of these edge effects in Japanese. The longest duration of the last mora in a sentence/Intonational Phrase is due to sentence-final lengthening, which was also observed in earlier studies of Japanese (e.g., Ueyama 1999 for experimental evidence). The shortest duration of the first vowel can be interpreted as the result of a sentence-initial shortening effect. As far as I know, this effect has not been observed in earlier studies of Japanese.

Figure 4-3a: Mean duration & standard deviation of vowel /o/ of 4-mora unaccented sentence (okane-da) for L1 Japanese speakers
[One panel per speaker: NJ1, NJ2, NJ3, NJ4. x-axis: mora position (1 to 4); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the basis of ANOVAs and post-hoc tests.)

Figure 4-3b: Mean duration & standard deviation of vowel /o/ of 5-mora unaccented sentence (tomodatSi-da) for L1 Japanese speakers
[One panel per speaker: NJ1, NJ2, NJ3, NJ4. x-axis: mora position (1 to 5); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the basis of ANOVAs and post-hoc tests.)

There are two and three moras in the phrase-medial position of the 4-mora and 5-mora sentences, respectively, and these medial moras are not subject to edge effects. For each speaker, the statistical significance of a mean difference was tested across different medial mora positions in each sentence type, conducting a series of one-factor ANOVAs (tested mora positions were the 2nd vs. 3rd for the 4-mora sentence, and the 2nd vs. 3rd vs. 4th for the 5-mora sentence). The grouping patterns of phrase-medial moras are indicated by letters above the bars of the graphs in Figures 4-3a, b (i.e., a, b, c). ANOVA results are shown in Table 4-6.
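The one-factor ANOVA on phrase-medial moras can be sketched as follows. This is a minimal standard-library illustration, not the software used for the analysis; the function name `one_way_anova` and the duration values are hypothetical. With nine analyzed tokens in each of two mora positions, the sketch yields degrees of freedom of (1, 16).

```python
from statistics import mean


def one_way_anova(groups):
    """One-factor ANOVA.

    `groups` maps a label (here, a mora position) to a list of observations.
    Returns (F, df_between, df_within).
    """
    all_obs = [x for g in groups.values() for x in g]
    grand = mean(all_obs)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((x - mean(g)) ** 2 for g in groups.values() for x in g)
    df_between = len(groups) - 1
    df_within = len(all_obs) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical /o/ durations (ms) for the 2nd and 3rd moras, 9 tokens each.
medial = {
    "mora 2": [100, 104, 98, 102, 101, 99, 103, 97, 105],
    "mora 3": [96, 100, 94, 98, 97, 95, 99, 93, 101],
}
f_stat, df_between, df_within = one_way_anova(medial)
print(f_stat, df_between, df_within)  # F is approximately 9.6, with df (1, 16)
```

Whether a given F ratio counts as significant then depends on the chosen α level (0.01 in this study); the sketch leaves that lookup to a statistics package or an F table.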

Table 4-6: ANOVA results for phrase-medial moras in 4-mora and 5-mora sentences produced by L1 Japanese speakers (α = 0.01)

           NJ1               NJ2                NJ3                NJ4
4-mora     F(1, 16) = 1.75   F(1, 14) = 3.21    F(1, 16) = 9.61    F(1, 16) = 2.2
sentence   p = .2040         p = .095           p = .007           p = .1571
5-mora     F(2, 20) = 1.72   F(2, 24) = 21.09   F(2, 24) = 10.64   F(2, 24) = 1.36
sentence   p = .2059         p < .001           p = .0005          p = .2757

For the 4-mora sentence, mora position (2nd vs. 3rd moras) did not significantly affect the duration of the unaccented /o/ for any L1 Japanese speaker except NJ3. For the 5-mora sentence, there were significant effects of mora position (2nd vs. 3rd vs. 4th moras) for NJ2 and NJ3, but not for NJ1 and NJ4. For the data of NJ2 and NJ3, Scheffe’s post-hoc tests were additionally conducted in order to find how the three mora positions are distinguished by the duration of the vowel /o/. The grouping of the mora positions is summarized in Table 4-7, on the basis of the results of ANOVAs and Scheffe’s post-hoc tests.

Table 4-7: Grouping of mora positions by L1 Japanese speakers

           NJ1           NJ2         NJ3           NJ4
4-mora     ( 2, 3 )      ( 2, 3 )    2 > 3         ( 2, 3 )
sentence
5-mora     ( 2, 3, 4 )   4 > 2 > 3   ( 2, 4 ) > 3  ( 2, 3, 4 )
sentence

We find two major patterns in the L1 Japanese data. First, there is no difference across different mora positions: NJ1, NJ2 and NJ4 for the 4-mora sentence, and NJ1 and NJ4 for the 5-mora sentence. Second, duration is longer in the 2nd than in the 3rd position of the 4-mora sentence (NJ3), or in the 2nd and 4th positions than in the 3rd position of the 5-mora sentence (NJ2 and NJ3). Note that this relation is observed in the individual plots of all four speakers in Figure 4-3b. All these patterns suggest that the 3rd mora tends to be shorter than the 2nd and 4th moras.

L2 Japanese accent patterns

In L1 Japanese, no pitch accent was produced in either the 4-mora or the 5-mora sentence type. Accent patterns were transcribed for all L2 Japanese speakers, in order to see whether they could produce a native-like sequence of unaccented moras. Our way of transcribing accent patterns was mostly based on the traditional method of kokugogaku (traditional Japanese linguistics), in which it is assumed that each mora is associated with either an H or an L tone. In addition, a pitch accent, i.e., an H tone immediately followed by a sharp pitch fall, was indicated by H*L. Accent transcriptions of L2 Japanese are presented, along with the ones for L1 Japanese, in Table 4-8. The total number of tokens for each sentence type is 9 (except for the 5-mora sentence for BJ).

Table 4-8: Accent patterns in L1 and L2 Japanese

           L1 Japanese   AJ1             AJ2          BJ
4-mora     9 (LHHH)      9 (LHHH)        8 (LHHH)     9 (LH*LL)
sentence                                 1 (LLHH)
5-mora     9 (LHHHH)     9 (LLLLL)       9 (H*LLLL)   3 (LLH*LL)
sentence                 no phrasal H-                5 (H*LH*LL)

Speaker AJ1 reliably produced a sequence of unaccented moras for both sentence types. In contrast, BJ did not show any instance of an unaccented phrase: he placed an accent on the 2nd mora in all 9 repetitions of the 4-mora sentence; an accent on the 3rd mora in all 9 cases and an additional accent on the 1st mora in 5 out of 9 for the 5-mora sentence. Finally, AJ2 produced a native-like unaccented phrase in 8 out of 9 tokens of the 4-mora sentence, but he produced a phrase with a pitch accent on the first mora for the 5-mora sentence.

L2 Japanese vowel duration patterns

The mean duration and standard deviation of the vowel /o/ are shown for L2 Japanese speakers and a representative speaker of L1 Japanese (Speaker NJ4) for each mora position in the 4-mora sentence (okane-da) in Figure 4-4a. The same information is presented for the 5-mora sentence (tomodatSi-da) in Figure 4-4b.

Figure 4-4a: Mean duration & standard deviation of vowel /o/ of 4-mora unaccented sentence (okane-da) for one L1 and three L2 Japanese speakers
[One panel per speaker: NJ4, AJ1, AJ2, BJ. x-axis: mora position (1 to 4); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the basis of ANOVAs and post-hoc tests.)

Figure 4-4b: Mean duration & standard deviation of vowel /o/ of 5-mora unaccented sentence (tomodatSi-da) for one L1 and three L2 Japanese speakers
[One panel per speaker: NJ4, AJ1, AJ2, BJ. x-axis: mora position (1 to 5); y-axis: /o/ duration (ms).]
(Letters on the top of bar columns indicate the grouping of means on the basis of ANOVAs and post-hoc tests.)

As with the L1 Japanese data, a series of one-factor ANOVAs was conducted to look for differences between medial moras (not including the initial and final moras of each sentence type, which are expected to be subject to edge effects). ANOVA results are shown in Table 4-9. Scheffe’s post-hoc tests were additionally conducted for the data of the 5-mora sentence in order to find which of the three medial moras were significantly different. The grouping patterns of phrase-medial moras are indicated by letters above the bars of the graphs in Figures 4-4a, b (i.e., a, b, c), and also summarized in Table 4-10, on the basis of the results of ANOVAs and Scheffe’s post-hoc tests.

Table 4-9: ANOVA results for phrase-medial moras in 4-mora and 5-mora sentences produced by three L2 Japanese speakers (α = 0.01)

                   AJ1                 AJ2                 BJ
4-mora sentence    F(1, 16) = .048     F(1, 14) = 7.411    F(1, 16) = 23.36
                   p = .8298           p = .0165           p = .0002
5-mora sentence    F(2, 18) = 5.669    F(2, 24) = 2.957    F(2, 24) = 15.5
                   p = .0123           p = .0711           p < .0001
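The F ratios reported in Table 4-9 come from one-factor ANOVAs with mora position as the factor. As a minimal sketch of that computation (not the software used in the study), the following pure-Python function derives F and its degrees of freedom from grouped durations. The function name and the duration values are hypothetical and serve only as illustration. Under this scheme, F(1, 16) corresponds to two mora positions and 18 tokens in total.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor ANOVA.

    `groups` is a list of lists, one list of measurements per factor level
    (here: one list of /o/ durations per mora position).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual tokens around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical /o/ durations in ms for the 2nd and 3rd moras (NOT study data).
mora2 = [100.0, 110.0, 105.0]
mora3 = [80.0, 85.0, 90.0]
f_stat, df_b, df_w = one_way_anova_f([mora2, mora3])
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The p-value would then be read from the F distribution with the returned degrees of freedom and compared against α = 0.01.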

Table 4-10: Grouping of mora positions by one L1 and three L2 Japanese speakers

                   NJ4            AJ1            AJ2            BJ
4-mora sentence    ( 2, 3 )       ( 2, 3 )       ( 2, 3 )       2 > 3
5-mora sentence    ( 2, 3, 4 )    ( 2, 3, 4 )    ( 2, 3, 4 )    ( 2, 4 ) < 3

As shown in Figure 4-4a, for the 4-mora sentence (okane-da), AJ1 and AJ2 did not produce any significant mean difference between the 2nd and 3rd moras, while this difference was significant for BJ. The significantly longer duration of the second mora in BJ’s production is probably related to his consistent placement of an accent on the second mora, as mentioned earlier (see Table 4-8). For the 5-mora sentence (tomodatSi-da), AJ1 and AJ2 did not show any significant mean difference among the three phrase-medial moras. Speaker BJ produced the 3rd mora significantly longer than the 2nd and 4th moras, which did not differ from each other. Note that in his production the 3rd mora of the 5-mora sentence was consistently accented, like the 2nd mora of the 4-mora sentence.

4.4.4. Discussion of Experiment 7

L1 Japanese patterns

The results of Experiment 7 display the following major patterns for L1 Japanese:

• Two types of systematic prosodic effects on vowel duration were observed: phrase-initial shortening and phrase-final lengthening.

• As a general tendency, there is no significant durational difference among medial moras in L1 Japanese production. If there is a difference, the 2nd and 4th moras are longer than the 3rd mora.

Phrase-final lengthening and the absence of durational difference across medial moras confirm the expected patterns, which were presented in Section 4.4.1. However, the trend toward longer duration of the 2nd and 4th moras was not expected. There are three possible explanations for this tendency. The first explanation is that the observed pattern may be due to the effects of bimoraic foot structure and foot-final lengthening: a foot-final mora is longer than a foot-initial mora (e.g., oka#neda, tomo#datSi#da, where # marks a foot boundary). The presence of bimoraic foot structure has been proposed to explain phenomena in various types of Japanese word formation processes, such as compounding, blending and borrowing (e.g., Tateishi 1989; Ito 1990; Mester; Poser 1990; Kubozono 1995).
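The foot-based explanation can be made concrete with a small sketch. It assumes, purely for illustration, that moras are grouped into bimoraic feet from left to right and that a leftover mora does not close a foot; neither the parsing direction nor the function names below are taken from the cited literature. Under these assumptions, the foot-final moras of both test sentences are the 2nd and 4th, which are the positions that tended to be longer in the L1 Japanese data.

```python
def bimoraic_feet(moras):
    """Group moras into feet of two, left to right; a leftover mora stays alone."""
    return [moras[i:i + 2] for i in range(0, len(moras), 2)]


def foot_final_positions(moras):
    """Return the 1-based positions of moras that close a complete bimoraic foot."""
    return [i + 2 for i in range(0, len(moras) - 1, 2)]


okane_da = ["o", "ka", "ne", "da"]
tomodatSi_da = ["to", "mo", "da", "tSi", "da"]

for sentence in (okane_da, tomodatSi_da):
    print(bimoraic_feet(sentence), foot_final_positions(sentence))
```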

Alternatively, the longer duration of the 2nd and 4th moras could be caused by the combination of two effects: the effect of a phrasal H- tone on the second mora, and the spreading of sentence-final lengthening back to the second-to-last mora (i.e., the penult). The AP is the domain of lexical accent patterns in Japanese, as mentioned in Section 1.5.1, and the H- phrasal tone is linked with the second mora of an AP. Similarly, in Korean the phrasal H- tone of the AP is associated with either the second mora or the first two moras (depending on syllable types). Jun (1995) found that the second syllable (constantly associated with a phrasal H- tone) is the longest in both sentence-initial and sentence-medial APs. This lengthening effect of a phrasal H- on the 2nd syllable in the Korean AP seems to go in the same direction as the consistently longer duration of the 2nd mora in the Japanese AP that emerges from our L1 Japanese data.

Finally, the L1 Japanese data also showed that in the 5-mora sentence the 4th mora tends to be longer than both the 3rd mora and the 2nd mora. This could be due to the spreading of sentence-final lengthening back to the penultimate mora, i.e., the 4th mora in the 5-mora sentence. If this is correct, the 3rd mora of the 4-mora sentence, which is also the second-to-last mora, should be affected by the spreading of sentence-final lengthening, and consequently be lengthened. Indeed, this appears to be the case. This interpretation can also explain why the 3rd mora of the 4-mora sentence, which is also a penult, is not as short as the 3rd mora of the 5-mora sentence.

Of course, it is possible that all three effects discussed (the binary-foot constraint, phrasal H- lengthening and the spreading of final lengthening) affect the duration patterns of Japanese moras. None of these effects has been investigated in experimental studies (except for Kondo’s (1999) investigation of the binary-foot constraint on coarticulation). Further investigation is needed.

Production of consecutive unaccented moras in L2 Japanese–L1 English

In Experiment 7, we investigated the accent and duration patterns of two advanced and one beginning speakers of L2 Japanese (AJ1, AJ2 and BJ). Accent patterns produced by these speakers are shown along with those of three L1 Japanese speakers in Table 4-11 (adapted from Table 4-8).

Table 4-11: Accent patterns in L1 and L2 Japanese

L1 Japanese AJ1 AJ2 BJ

okane-da LHHH LHHH LHHH LH*LL LLHH

tomodatSi-da LHHHH LLLLL H*LLLL LLH*LL no phrasal H- H*LH*LL

BJ did not produce any native-like unaccented phrase. His consistent pattern was to place a pitch accent (H*) on the penult of each target noun (see footnote 18), i.e., okáne and tomodátSi, and an additional accent on the first mora of tomodatSi (i.e., tómodátSi). BJ’s systematic placement of accents on Japanese unaccented words can be explained by the negative transfer of the lapse constraint in L1 English, which militates against a sequence of many unaccented moras/syllables. This speaker’s preference for a penultimate accent may be due to the influence of English loanword phonology. It is well known that stress tends to be placed on the penultimate syllable when foreign words are borrowed into English. This stress pattern is systematic, at least, for loanwords from Japanese: e.g., karáte, teriyáki, harakíri and tSiraSizúSi. This pattern agrees with the accent pattern of the two target words of Experiment 7, okáne and tomodátSi, as produced by BJ.

While Speaker BJ systematically showed one type of non-native accent pattern, AJ1 was consistently able to produce a sequence of unaccented syllables for both sentence types (i.e., he did not produce a H tone immediately followed by a L tone). This speaker also placed a phrasal H- on the second mora of okane-da (LHH-H), which is a characteristic of Tokyo Japanese, but he did not produce H- for tomodatSi-da (LLLL-L instead of LHHH-H). AJ2 showed both native and non-native accent patterns. The mixture of native and non-native patterns in the data of these speakers shows that they have learned to produce a sequence of unaccented moras, but that their production of a native-like unaccented phrase (i.e., LHH-H) is not yet stable.
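The criteria used above can be restated operationally: a mora counts as accented when its H tone is immediately followed by a L tone, and a native-like unaccented phrase has an initial L followed by H on every remaining mora. The sketch below applies these two criteria to tone strings in the notation of Table 4-11. The function names are hypothetical, and the parsing (one tone symbol per mora, hyphens ignored) is an assumption made for illustration.

```python
import re


def parse_tones(pattern):
    """Split a tone string such as 'LH*LL' or 'LHH-H' into one tone per mora."""
    return re.findall(r"[HL]\*?", pattern)


def accented_positions(pattern):
    """Return 1-based positions of moras whose H tone is immediately followed by L."""
    tones = parse_tones(pattern)
    return [i + 1 for i in range(len(tones) - 1)
            if tones[i].startswith("H") and tones[i + 1] == "L"]


def is_native_like_unaccented(pattern):
    """True for an initial L followed by H on all later moras (e.g., LHHH)."""
    tones = parse_tones(pattern)
    return len(tones) >= 2 and tones == ["L"] + ["H"] * (len(tones) - 1)


for pattern in ["LHHH", "LLLLL", "H*LLLL", "LH*LL", "H*LH*LL"]:
    print(pattern, accented_positions(pattern), is_native_like_unaccented(pattern))
```

On this scheme LLLLL has no accent but is still not native-like, because the phrasal H- on the second mora is missing.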

Differences in accent patterns across beginning and advanced L2 Japanese

Two crucial differences are found between advanced and beginning speakers of L2 Japanese. The first difference is that the beginning speaker (BJ) could not produce a sequence of unaccented moras, while the advanced speakers (AJ1 and AJ2) could (even though AJ2’s production was not stable). The second difference between beginning and advanced speakers of L2 Japanese lies in which accent type is used: stress accent or non-stress accent. In Figures 4-5a and 4-5b, the waveforms and pitch contours of the reiterant speech of tomodatSi-da are shown for BJ and AJ2. The accent pattern of BJ’s production is H*LH*LL, while that of AJ2 is H*LLLL. In BJ’s production, the first and third moras receive a H tone immediately followed by a L tone (i.e., H*+L in Figure 4-5a), which is indicated by two pitch peaks aligned with these moras. Notice that these accented moras are longer in duration and greater in intensity than the other three unaccented moras, as shown in the waveform of Figure 4-5a. As mentioned in Chapters 1 and 2, English stress accent is characterized by higher pitch, longer duration and greater intensity, and all three of these acoustic features are manifested in the accents of BJ’s production. Thus, we can conclude that BJ uses English stress accent in his L2 Japanese production, i.e., this is a case of negative transfer of English stress accent to L2 Japanese.

18 Here we mean not a target sentence but a target word. Recall that the 4-mora and 5-mora target sentences of Experiment 7 (okane-da and tomodatSi-da) were formed by embedding the 3-mora and 4-mora target nouns okane and tomodatSi in a pro-drop structure ending with the inflectional morpheme -da.

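Figure 4-5a: Waveform and pitch contour of “tomodatSi-da” in BJ’s production

[Figure 4-5a: waveform and pitch contour of the reiterant utterance no no no no – no for speaker BJ, with two H*+L labels on the pitch contour; y-axis: pitch (Hz); x-axis: time (ms).]

Figure 4-5b: Waveform and pitch contour of “tomodatSi-da” in AJ2’s production

[Figure 4-5b: waveform and pitch contour of the reiterant utterance no no no no – no for speaker AJ2, with one H*+L label on the pitch contour; y-axis: pitch (Hz); x-axis: time (ms).]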

Figure 4-5b shows that AJ2 placed an accent on the first mora of the 5-mora sentence, as is indicated by the alignment of the pitch peak of the entire phrase with this mora. In AJ2’s production, the first accented mora is not as long and loud as the second mora, which is not accented. The acoustic contrast between the first accented and second unaccented moras is much smaller in AJ2’s than BJ’s production. All this suggests that while BJ uses English stress accent in his production of L2 Japanese, AJ2 does not.

Learning two properties of Japanese prosody: accent and duration patterns

The following duration patterns of phrase-medial moras and accent patterns of the 4-mora and 5-mora sentences were displayed by the three speakers of L2 Japanese in Experiment 7:

• BJ, a beginning speaker of L2 Japanese, made the penult of each target word significantly longer than the other medial moras (i.e., ka in okane-da and da in tomodatSi-da). The penult of both target nouns was always accented in his production.

• AJ1, who consistently produced a sequence of unaccented syllables, showed native-like duration patterns, i.e., he did not make any significant difference in duration across phrase-medial unaccented moras.

• AJ2 successfully produced a native-like unaccented pattern for okane-da, but he placed a pitch accent (H*) on the first mora of tomodatSi-da. In both sentence types, he did not place any accent on phrase-medial moras and did not make any significant difference in duration across phrase-medial moras.

What emerges from these results is that accent and duration patterns seem to develop separately in the acquisition of L2 Japanese prosody. This point can be made clear by re-examining the case of tomodatSi-da. Individual plots of the three speakers of L2 Japanese, adapted from Figure 4-4b, are annotated for accent patterns and shown in Figure 4-6. In this figure, the first mora is additionally included in the post-hoc analysis, in order to find how all non-final moras are differentiated by duration. The grouping of non-final moras is indicated by letters in the figure (i.e., a, b). As for accent patterns, the native-like pattern does not have any pitch accent (H*). While AJ1 produced a native-like unaccented pattern, AJ2 and BJ did not. AJ2 always placed a pitch accent on the first mora. BJ always placed an accent on the third mora and added another accent on the first mora for some tokens of tomodatSi-da. As for duration patterns, the baseline in L1 Japanese is that there is no significant difference across medial moras regardless of accent (see footnote 19). BJ produced the accented third mora with significantly longer duration than the other moras, which is a non-native-like pattern. AJ2, who did not produce a native-like accent pattern, showed a native-like duration pattern, like AJ1.

Figure 4-6: Mean duration and standard deviation of the vowel /o/ of “tomodatSi-da” for a L1 Japanese speaker (NJ4) and three L2 Japanese speakers (based on Figure 4-4b)

[Figure 4-6: four bar-chart panels. AJ2 and BJ are marked for pitch-accented (H*) moras; AJ1 and NJ4 are labeled “no pitch accent”. x-axis: mora position (1–5); y-axis: /o/ duration (ms). Letters above the bars indicate the grouping of means on the basis of ANOVAs and post-hoc tests.]

19 The data of L1 Japanese in Experiment 7 showed a systematic tendency toward the longer duration of the 2nd and 4th mora. However, note that this is not conditioned by the distribution of pitch accents since all test sentences of this experiment consisted of unaccented moras.

The distribution of native-like and non-native-like patterns of accent and duration in production by the three speakers of L2 Japanese can be summarized in the following way (positive and negative symbols indicate native-like and non-native-like patterns, respectively):

        L2 Japanese        L2 Japanese
        accent patterns    duration patterns
AJ1     +                  +
AJ2     −                  +
BJ      −                  −

These learning patterns suggest that native speakers of English can learn native-like Japanese duration patterns before accent patterns. However, more data from other L2 Japanese speakers whose native language is English is needed to make this suggestion a true generalization.

Transfer effects in L2 Japanese

The predicted pattern for L2 Japanese, as discussed in Section 4.4.1, is the following:

Native speakers of English will tend to introduce accents in a Japanese sentence consisting of unaccented moras. Accented moras in their L2 Japanese production are lengthened and produced with longer duration than unaccented moras.

The results of Experiment 7 show the strong effect of L1 English characteristics on both accent and duration patterns in the initial stage of learning Japanese by English speakers. A beginning speaker of L2 Japanese, BJ, consistently placed one or more accents on a sequence of Japanese unaccented moras, respecting the English lapse constraint and loanword phonology, and he produced those accents by employing the acoustic features of English stress accent. These results confirm the aforementioned predicted pattern of L2 Japanese–L1 English. The comparison of beginning and advanced speakers of L2 Japanese suggests a possible developmental path of accent and duration patterns: native-like duration patterns (i.e., no durational difference between phrase-medial moras) are learned before native-like accent patterns (i.e., the phrase-initial H tone on the second mora and no pitch accent on unaccented APs). At the moment, this tendency is unexplained.

Effects of instruction in Japanese as a foreign/second language

English speakers’ overall difficulty in learning a sequence of Japanese unaccented moras could be exacerbated by the absence of explicit and systematic instruction in Japanese accent patterns in Japanese classrooms. What emerged from interviews and from a survey of the L2 learning background of the English speakers who participated in the present study is that a majority of speakers had neither systematic instruction nor continuous practice in Japanese accent patterns. Indeed, this is a common situation in teaching Japanese as a foreign/second language. This low emphasis on the instruction of Japanese accent patterns may be due to the fact that the distinctive function of Japanese accent patterns is not as prominent as in tone languages such as Chinese or Thai (see Beckman 1986 for a review). The low distinctive function of Japanese accent is probably one of the causes of the rich variation in accent patterns across different Japanese dialects or across different generations within each region. In spite of remarkable differences in accent patterns, speakers of different Japanese varieties can understand each other. Perhaps because of this, teaching accent patterns appears to be a low priority in the curricula of Japanese as a foreign/second language.

4.5. Summary of Experiments 6 and 7

In Experiments 6 and 7, we investigated temporal organization across syllables in L2 English and L2 Japanese, respectively. In Experiment 6, we examined the morphosyntactic aspect of temporal organization, focusing on how Japanese speakers of L2 English learn to cliticize and reduce the duration of English function words. The results of this experiment showed that Japanese learners of English tend to produce English monosyllabic function words with longer duration than unstressed syllables of content words. This indicates that, in the temporal organization of their L2 English, function words are treated not as phonological clitics, but as independent words or prosodic units carrying their own stress. This tendency can be explained either by the negative transfer of L1 characteristics or by a universal learning constraint on producing function words as phonological clitics.

In Experiment 7, we investigated how English speakers of L2 Japanese treated Japanese phrases consisting of only unaccented moras, in order to find whether or not the lapse constraint in the L1 English timing system affects the production of L2 Japanese. The results of Experiment 7 showed that English speakers of L2 Japanese tend to insert accents in a train of unaccented moras and also realize those accents using the acoustic features of English stress, i.e., characteristics of L1 English timing are transferred to L2 Japanese. The comparison of beginning and advanced L2 Japanese suggests that it may be harder to learn accent patterns than duration patterns in the production of Japanese phrases. More data are needed to confirm this possibility.

Chapter 5: Awareness of L2 Syllable Structures

In Chapters 2-4 of the present study, I presented a series of phonetic experiments investigating how the L1 prosodic system affects the production of L2 prosody in various respects. An additional issue relevant to a better understanding of L2 prosodic development is how L2 speakers treat the syllable structures of their target language. Are L2 speakers aware of L2 syllable structure in the same way in which they are aware of L1 syllable structure? Does the phonological awareness of L2 syllable structure change over the time course of L2 speech development? How is the awareness of L2 syllable structure related to the phonetic patterns of L2 speech? The main goal of this chapter is to provide a partial answer to these questions.

5.1. English and Japanese syllable structure

Syllable complexity

Languages differ in terms of the possible types of syllable structure they allow. As mentioned earlier, English allows a more complex syllable structure than Japanese does, as illustrated in Table 5-1.

Table 5-1: Syllable structure in English and Japanese

                      English                  Japanese
syllable structure    (C)(C)(C)V(C)(C)(C)      (C)V({V, C}) (Kubozono 1989)

An English syllable can have a maximum of three consonants before and after the nuclear vowel (i.e., in the onset and the coda of the syllable, respectively), while a Japanese syllable allows a maximum of one consonant in each position. Furthermore, Japanese allows only a nasal stop or the first half of a long consonant (e.g., pan ‘bread’ and katta ‘bought’, respectively) in coda position, while English allows more types of consonants.

Awareness of internal syllable structures

English and Japanese speakers attribute different internal constituency to the syllables of their respective languages: Japanese speakers posit a subsyllabic unit, the mora, while English speakers are not aware of the presence of moras. This difference affects the way in which speakers segment words into minimal prosodic units. Native English speakers use the syllable as the minimal segmentation unit, while native Japanese speakers use the mora in word segmentation, as mentioned in Section 1.4.4. For example, the English word ten is not further segmented by native speakers of English, since this word is monosyllabic. The same word was borrowed and lexicalized in Japanese. Native Japanese speakers are also aware of the fact that this word is monosyllabic, but they further break it into two moras. This difference in the way in which syllable-internal constituents are perceived can be represented by the diagrams in Figure 5-1 (for Japanese, the kind of structure proposed by Kubozono (1989) is assumed; see footnote 20).

20 In phonological theories, the mora is used to represent syllable weight in order to account for the fact that some phonological phenomena are systematically constrained by how heavy syllables are. For example, in English, a CV with a lax vowel and a CV with a tense vowel or a diphthong are classified as light and heavy syllables, respectively. This classification is based on the fact that a CV with a lax vowel shows phonological behavior different from that of a CV with a tense vowel or a diphthong. For example, in English, a light CV by itself cannot form an independent word (*/bI/), while a heavy CV can form a word by itself (/bi/ and /baI/). The internal structures of the two types of syllables are distinguished by different representations at the moraic level: light syllables are represented as monomoraic and heavy syllables are represented as bimoraic. In the present chapter, we are dealing with speakers’ awareness of moras, and not with whether moras are active or not in phonological processes. Thus, for our purposes it makes sense to assume a representation of English syllables such as the one in Figure 5-1.

Figure 5-1: Syllable structures of English and Japanese in native speakers’ awareness

[Figure 5-1: two tree diagrams. English ‘ten’: word (ω) > syllable (σ) > segments t E n. Japanese ‘point’: word (ω) > syllable (σ) > two moras (µ µ) > segments t e n.]

In Figure 5-1, the internal representations of the English word /tEn/ and the Japanese word /ten/ ‘point’ are compared. The English word consists of three segments forming a CVC syllable. Similarly, the Japanese word consists of three segments forming a syllable, but the sequence can be treated either as one unit (one syllable) or as two (two moras) by native Japanese speakers: /ten/ → /ten/ or /te + n/. A vowel can constitute one syllable and one independent mora (e.g., /i/ ‘stomach’, /e/ ‘painting’ and /o/ ‘tail’), but an onset consonant cannot stand alone as either a syllable or a mora. On the other hand, a coda nasal forms an individual mora; it is called hatsuon or moraic nasal and is often transcribed as /N/ in Japanese phonology (see Vance 1987; Shibatani 1990; Tsujimura 1996). In addition to the coda nasal /N/, the second half of a long vowel and the first half of a long consonant also behave as individual moras in Japanese. They are called cho-on and sokuon and are represented by /R/ and /Q/ in Japanese phonology, respectively. Examples of these two moraic segments inside Japanese syllables are shown in Figure 5-2. The two Japanese words, /tookʲoo/ ‘Tokyo’ and /tatta/ ‘stood’, each consist of two syllables and are syllabified as /too-kʲoo/ and /tat-ta/, respectively. Each syllable of ‘Tokyo’ has a long vowel, /oo/, and the second half of this vowel is counted as an independent mora. Thus, this word contains four moras, /to-o-kʲo-o/. The word /tatta/ contains a long consonant /tt/ in the middle of the word. The first half of this long consonant (i.e., /Q/ or sokuon) is counted as one mora. Thus, this word is counted as three moras, /ta-t-ta/.
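The difference between syllable counting and mora counting can be illustrated with a short sketch. It assumes that a word is given as a structure string in the notation used in this chapter (C = onset consonant, V = short vowel or first half of a long vowel, R = second half of a long vowel, Q = first half of a long consonant, N = moraic nasal), and that every V is the nucleus of its own syllable. The latter is a simplification that ignores V-V sequences, and the function names are hypothetical.

```python
# Symbols that count as one mora each; an onset consonant (C) never does.
MORAIC_SYMBOLS = {"V", "R", "Q", "N"}


def count_moras(structure):
    """Count moras in a structure string such as 'CVRCVR'."""
    return sum(1 for symbol in structure if symbol in MORAIC_SYMBOLS)


def count_syllables(structure):
    """Count syllables, assuming each V is the nucleus of one syllable."""
    return structure.count("V")


examples = {
    "ten 'point'": "CVN",
    "Tokyo": "CVRCVR",
    "tatta 'stood'": "CVQCV",
}
for word, structure in examples.items():
    print(word, count_syllables(structure), count_moras(structure))
```

For the three words discussed above, the sketch yields one syllable and two moras for /ten/, two syllables and four moras for ‘Tokyo’, and two syllables and three moras for /tatta/.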

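Figure 5-2: More examples of Japanese syllable structure

[Figure 5-2: two tree diagrams with word (ω), syllable (σ), mora (µ) and segment tiers. ‘Tokyo’: segments t o o kʲ o o, two syllables of two moras each. ‘stood’: segments t a t t a, two syllables, the first with two moras and the second with one.]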

In addition to word segmentation, evidence for Japanese speakers’ awareness of the mora can be found in various linguistic phenomena. For example, the Japanese alphabet is mora-based, in the sense that each letter corresponds to one mora (i.e., the moraic nasal, the first half of a long consonant, the second half of a long vowel, and every sequence of an onset plus a short vowel or the first half of a long vowel are each represented by a single letter; see footnote 21). The meter of Japanese poetry (e.g., haiku) is based on mora counting, not on syllable counting. Word games played by Japanese children are based on mora counting. Note that there is no corresponding evidence for the existence of the mora in English (e.g., the English version of haiku is based on syllable counting, not mora counting).

5.2. Research questions

21 In recent psycholinguistic studies, it has been shown that the Japanese alphabet system, kana, plays a substantial role in developing the awareness of the mora in L1 Japanese acquisition. See Inagaki, Hatano and Otake (2000) for experimental evidence.

As shown in the review in the previous section, English and Japanese syllable structures show crucial differences in terms of two linguistic properties: 1) structural complexity (i.e., English allows more complex syllables with more consonants than Japanese) and 2) the absence vs. presence of the mora in speakers’ phonological awareness (i.e., the mora is present in Japanese, but not in English). Do language learners become aware of the characteristics of L2 syllable structure? More specifically, the following questions will be asked with respect to L2 English–L1 Japanese and L2 Japanese–L1 English:

• Do Japanese speakers of L2 English learn to segment English words into syllables without breaking English consonant clusters into moras/syllables?

• Do English speakers of L2 Japanese learn to segment Japanese words into moras (vs. syllables)?

In order to answer these questions, we conducted a phonological survey in which we asked participants for judgments on word segmentation.

5.3. Method

5.3.1. Subjects

Seven speakers of L1 English (NE1–7) and 3 advanced and 4 beginning Japanese speakers of L2 English (AE1–3 and BE1–4, respectively) participated in the English phonological survey. All Japanese speakers of L2 English except for BE4 also participated in Experiment 4, which investigated the production of English tense vs. lax vowels. Five speakers of L1 Japanese (NJ1–5) and 3 advanced and 4 beginning speakers of L2 Japanese (AJ1–3 and BJ1–4, respectively) participated in the Japanese phonological survey. All speakers of L2 Japanese also participated in Experiment 5, which investigated the production of Japanese long vs. short vowels.

5.3.2. Materials

The speech materials of the phonological survey were designed in order to test all possible syllable structures for English and Japanese. The English corpus consisted of 44 monosyllabic words; the Japanese corpus consisted of 24 words, varying in the number of moras from one to three. Target words used in Experiments 4 and 5 (which tested the production of L2 vowel contrasts) were included. The English and Japanese materials are listed in Tables 5-2 and 5-3, respectively. In both tables, the syllable structure of each test word is presented next to the word. In English, both diphthongs and tense vowels are represented as VV. In Japanese, a long vowel is represented as VR.

Table 5-2: 44 monosyllabic words used in the English phonological survey

word     structure    word     structure    word     structure    word       structure
I        VV           chit     CVC          kept     CVCC         tribes     CCVVCC
it       VC           dip      CVC          next     CVCC         troops     CCVVCC
eat      VVC          kin      CVC          beast    CVVCC        scree      CCCVV
eyes     VVC          pit      CVC          kind     CVVCC        strew      CCCVV
X        VCC          bid      CVC          liked    CVVCC        stray      CCCVV
iced     VVCC         deep     CVVC         least    CVVCC        split      CCCVC
east     VVCC         cheat    CVVC         true     CCVV         strict     CCCVCC
asked    VCCC         keen     CVVC         spit     CCVC         strength   CCCVCCC
say      CVV          bead     CVVC         speak    CCVVC        strive     CCCVVC
cry      CVV          Pete     CVVC         spoil    CCVVC        striped    CCCVVCC
few      CVV          tax      CVCC         stream   CCVVC        streams    CCCVVCC

Table 5-3: 24 words used in the Japanese phonological survey

word      structure    gloss          word       structure    gloss
e‘        V            ‘painting’     tat’ta     CVQCV        ‘stood’
ki‘       CV           ‘tree’         i‘i        VR           ‘good’
bi‘ru     CVCV         ‘building’     i‘i ko     VR (CV)      ‘good child’
ki‘ta     CVCV         ‘came’         a‘u        VV           ‘meet’
ka‘do     CVCV         ‘corner’       ki‘i       CVR          ‘key’
tSi‘zu    CVCV         ‘map’          ka‘a       CVR          ‘car’
to‘ru     CVCV         ‘take’         bi‘iru     CVRCV        ‘beer’
kutSi     CVCV         ‘mouth’        gi’taa     CVCVR        ‘guitar’
e‘n       VN           ‘yen’          tSi‘izu    CVRCV        ‘cheese’
se‘n      CVN          ‘thousand’     ka’ado     CVRCV        ‘card’
at‘ta     VQCV         ‘existed’      to’oru     CVRCV        ‘pass’
itta      VQCV         ‘said’         ko’on      CVRN         ‘corn’

5.3.3. Procedure

Data collection

The selected words were pseudo-randomized. PsyScope was used to present the words. One word was displayed on the computer screen at a time. Before the words to be analyzed were shown, three practice words were presented. In order to elicit judgments on word segmentation, we asked participants to replace each syllable/mora with the syllable no. The choice between syllable and mora was left open to each participant. For the Japanese survey, participants were asked to use only the syllable no, and not noo (with a long vowel), a glottal stop or the moraic nasal /N/. We required this in order to avoid ambiguities: if speakers parsed a word such as tooru ‘pass’ with the string noo-no, we could not tell whether they replaced the bimoraic syllable too with another bimoraic syllable, or whether they simply preserved the segment length of the first vowel of this word (tooru). On the other hand, if speakers substitute too with no-no, we can be sure that they count it as two moras. Segmentation judgments were recorded in the recording booth of the UCLA phonetics lab for the L1 English, advanced L2 English, L1 Japanese and L2 Japanese groups. The Japanese speakers of beginning L2 English were recorded in the recording room of Meiji Gakuin University Information Center in Tokyo.

Data analysis

For each participant, the collected data were transcribed. For English words, all phonetic variations of the syllable no (e.g., [noU], [no:] or [no]) were counted as one count of the same segmentation unit /no/. For Japanese words, only [no] with the short vowel /o/ was used by all participants, since they were instructed to do so. For every word, the number of no used was counted.
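The counting step described above can be sketched as follows. The sketch assumes that each response has been transcribed as a plain-text string in which the variants [noU], [no:] and [no] are written as noU, no: and no; this transcription format and the function name are assumptions made for illustration and are not the format used in the study. Every variant is normalized to one count of the segmentation unit /no/.

```python
import re

# One segmentation unit: "no", optionally followed by an offglide (U) or length mark (:).
NO_VARIANT = re.compile(r"no(?:U|:)?")


def count_no(transcription):
    """Count the instances of the segmentation unit /no/ in one transcribed response."""
    return len(NO_VARIANT.findall(transcription))


# Hypothetical transcriptions (NOT responses from the survey).
print(count_no("noU"))        # one unit, diphthongized variant
print(count_no("no no: no"))  # three units, one of them lengthened
print(count_no("no-no"))      # two units separated by a hyphen
```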

5.4. Results

5.4.1. L2 English segmentation

L1 English vs. L2 English patterns

For each participant, the counts of no produced in correspondence to all 44 English test words are presented in Table 5-4. Tense vowels and diphthongs are represented by VV, lax vowels by V.

Table 5-4: Number of /no/ (representing the segmentation of English words by L1 English speakers and Japanese speakers of L2 English)

                        L1 English                                L2 English
word        structure   NE1   NE2   NE3   NE4   NE5   NE6   NE7   AE1   AE2   AE3   BE1   BE2   BE3   BE4
I           VV          1     1     1     1     1     1     1     1     2     1     2     1     1     2
it          VC          1     1     1     1     1     1     1     1     2     1     2     2     2     2
eat         VVC         1     1     1     1     1     1     1     1     2     1     3     2     2     2
eyes        VVC         1     1     1     1     1     1     1     3     3     2     3     3     3     3
X           VCC         1     1     1     1     1     1     1     1     3     ---   4     3     2     3
iced        VVCC        1     2     1     1     1     1     1     3     3     3     4     3     3     3
east        VVCC        1     1     1     1     1     1     1     2     3     1     4     3     3     3
asked       VCCC        1     1     1     3     1     1     2     4     4     3     4     4     3     4
say         CVV         1     1     1     1     1     1     1     1     2     1     2     2     1     2
cry         CVV         1     1     1     1     1     1     1     2     3     2     3     3     2     2
few         CVV         1     1     1     1     1     1     1     1     1     1     2     1     1     1
chit        CVC         1     1     1     1     1     1     1     2     2     1     2     2     2     2
dip         CVC         1     1     1     1     1     1     1     2     3     1     2     2     2     2
kin         CVC         1     1     1     1     1     1     1     1     2     1     2     2     1     ---
pit         CVC         1     1     1     1     1     1     1     1     3     1     2     2     2     2
bid         CVC         1     1     1     1     1     1     1     1     3     1     3     2     2     2
deep        CVVC        1     1     1     1     1     1     1     2     2     1     3     2     2     2
cheat       CVVC        1     1     1     1     1     1     1     2     2     1     3     2     2     2
keen        CVVC        1     1     1     1     1     1     1     1     2     1     3     3     1     1
bead        CVVC        1     1     1     1     1     1     1     2     2     1     3     2     2     2
Pete        CVVC        1     1     1     1     1     1     1     2     2     2     3     2     2     2
tax         CVCC        1     1     1     1     1     1     1     2     3     1     3     2     3     3
kept        CVCC        1     1     1     1     1     1     2     3     3     3     3     3     3     3
next        CVCC        1     1     1     1     1     1     1     1     4     3     3     3     3     4
beast       CVVCC       1     1     1     1     1     1     2     3     3     3     4     3     3     3
kind        CVVCC       1     1     1     1     1     1     1     2     4     1     4     3     3     ---
liked       CVVCC       1     1     1     1     1     1     1     2     4     3     4     3     3     3
least       CVVCC       1     1     1     1     1     1     1     2     3     1     4     3     3     3
true        CCVV        1     1     1     1     1     1     1     2     1     1     3     2     2     2
spit        CCVC        1     1     1     1     1     1     1     2     3     3     4     3     3     3
speak       CCVVC       1     1     1     1     1     1     1     3     3     3     4     2     2     3
spoil       CCVVC       1     1     1     1     1     1     2     3     4     3     4     3     3     3
stream      CCVVC       1     1     1     1     1     1     1     2     4     3     5     3     3     3
tribes      CCVVCC      1     1     1     2     1     1     2     4     4     3     5     4     4     4
troops      CCVVCC      1     1     1     1     1     1     1     2     4     3     5     4     4     4
scree       CCCVV       1     1     1     1     1     1     1     2     3     2     4     3     3     3
strew       CCCVV       1     1     1     1     1     1     1     1     3     2     3     3     3     3
stray       CCCVV       1     1     1     1     1     1     1     1     3     3     4     3     3     4
split       CCCVC       1     1     1     1     1     1     1     1     5     3     4     3     4     4
strict      CCCVCC      1     1     1     1     1     1     1     2     5     3     5     3     5     5
strength    CCCVCCC     1     1     1     1     1     1     1     2     5     3     6     4     5     5
strive      CCCVVC      1     1     1     1     1     1     1     3     4     3     5     3     4     4
striped     CCCVVCC     1     1     1     2     1     1     2     2     5     3     6     5     5     ---
streams     CCCVVCC     1     1     1     1     1     1     1     2     5     3     6     5     5     4
TOTAL                   44    45    44    48    44    44    50    85    136   86    157   121   120   117
average/word            1     1.02  1     1.09  1     1     1.13  1.93  3.09  2     3.65  2.75  2.73  2.85

Since all test words are monosyllabic, we expect every L1 English speaker to replace each English word with one count of /no/. Shaded cells indicate non-native patterns, i.e., words produced with more than one /no/. The sum of the number of /no/ instances for all 44 words was computed for each participant; the results are presented in the second-to-last row of the table. Furthermore, an average /no/ count per word was computed for each participant by dividing the total number of /no/ counts by the total number of words (i.e., average count/word = total count/44 words). The results show that for the L1 English group the expected native pattern is dominant. There are 308 cases in total (7 speakers × 44 test words), and 298 cases follow the expected native pattern (a word is parsed as one count of /no/); the percentage of occurrence of the expected native pattern is more than 95%. In the L2 English data, on the other hand, native-like patterns occur in a minority of cases. This difference between L1 and L2 English patterns is also illustrated in Figure 5-3. In this figure, the average of /no/ counts per word from the last row of Table 5-4 is plotted for all participants. This graph shows that L1 English speakers generally treat English monosyllabic words as one count of the segmentation unit /no/, while Japanese speakers of L2 English tend to break down English monosyllabic words into multiple counts of /no/.
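The per-word averages and the share of native patterns can be checked with a few lines of arithmetic. The following is only an illustrative sketch: the totals are transcribed from the TOTAL row of Table 5-4 for four participants, it is assumed that each of them judged all 44 words, and the function name is hypothetical.

```python
# Totals transcribed from the TOTAL row of Table 5-4.
# Assumption: each listed participant responded to all 44 test words.
N_WORDS = 44
TOTALS = {"AE1": 85, "AE2": 136, "BE2": 121, "BE3": 120}


def average_count_per_word(total: int, n_words: int = N_WORDS) -> float:
    """Average /no/ count per word = total /no/ count / number of words."""
    return round(total / n_words, 2)


averages = {speaker: average_count_per_word(total) for speaker, total in TOTALS.items()}

# Share of L1 English cases following the one-/no/ pattern: 298 of 7 x 44 cases.
l1_cases = 7 * N_WORDS
l1_native_share = 100 * 298 / l1_cases

for speaker, average in averages.items():
    print(speaker, average)
print(f"L1 English native-pattern share: {l1_native_share:.1f}%")
```

The computed averages agree with the average/word row of Table 5-4 for these four participants.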

Figure 5-3: Average number of instances of the segmentation unit /no/ for English monosyllabic words

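[Bar chart. x-axis: participants, grouped as L1 English (NE1–NE7), advanced L2 English (AE1–AE3) and beginning L2 English (BE1–BE4); y-axis: average /no/ count per word (0–4). Bar values: NE1 1, NE2 1.02, NE3 1, NE4 1.09, NE5 1, NE6 1, NE7 1.13, AE1 1.93, AE2 3.09, AE3 2, BE1 3.65, BE2 2.75, BE3 2.73, BE4 2.85.]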

Beginning vs. advanced L2 English

The results of the English survey show the persistence of non-native patterns across all seven Japanese speakers of L2 English: monosyllabic English words tend to be parsed as multiple counts of /no/. Is there any systematic difference between advanced and beginning speakers of L2 English? Do advanced Japanese speakers of L2 English begin to perceive monosyllabic English words in a native-English-like manner? In order to answer these questions, we computed the percentage of cases showing a native-English-like pattern (i.e., an English monosyllabic word is parsed as one count of /no/) for the L2 English data of each speaker (Table 5-4), by dividing the number of native-English-like responses by the total number of responses. As a reference for the native English pattern, we carried out the same calculation for each of the L1 English speakers and computed the group average by pooling the averages of the single speakers. The results are shown in Figure 5-4.
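The percentage calculation described above can be sketched as follows. This is a minimal illustration, not the procedure actually used for Figure 5-4: the function name and the two response lists are hypothetical, and leaving missing responses out of the denominator is an assumption.

```python
from statistics import mean
from typing import Optional, Sequence


def native_like_percentage(counts: Sequence[Optional[int]]) -> float:
    """Percentage of responses in which a monosyllabic word received exactly one /no/.

    Missing responses (None) are excluded from the denominator (an assumption).
    """
    answered = [count for count in counts if count is not None]
    return 100 * sum(count == 1 for count in answered) / len(answered)


# Hypothetical responses, for illustration only (not data from the survey).
speaker_a = [1, 1, 2, 1]
speaker_b = [1, 2, None, 3]

per_speaker = [native_like_percentage(s) for s in (speaker_a, speaker_b)]
# Group value = mean of the single speakers' percentages.
group_average = mean(per_speaker)
print(per_speaker, group_average)
```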

Figure 5-4: Percentage of native-English-like patterns produced by L1 English speakers and Japanese speakers of L2 English

[Bar chart. x-axis: L1 English group average (NE), advanced L2 English speakers (AE1–AE3) and beginning L2 English speakers (BE1–BE4); y-axis: percentage of native-like judgments (0–100%).]

Two advanced speakers of L2 English, AE1 and AE3, showed larger percentages of native-like patterns than the other five speakers of L2 English, AE2 and BE1–4, although these two speakers’ percentages are still much lower than the average percentage for L1 English (96%). This suggests that it is possible to learn to perceive an English monosyllabic word as one count of /no/ to a certain extent, although it is difficult to master native-like segmentation. AE2 and BE1–4 showed low percentages, all within the same range (lower than 5%), and no beginning L2 speaker showed patterns that were more native-English-like than those of the three advanced L2 speakers. These results imply some positive correlation between how experienced or proficient Japanese speakers of L2 English are and how native-like their segmentation judgments become, although there is no systematic separation between the advanced and beginning L2 English groups.

Syllable or mora?

Which segmentation unit do Japanese speakers of L2 English employ to segment English monosyllabic words, the syllable or the mora? This question can be partially answered by analyzing the results for the following four words, whose syllable structures are also possible in Japanese:

Table 5-5: Results of judgments on the segmentation of English words whose syllable structures are also possible in L1 Japanese

word          syllable  L1 English                         L2 English
              structure NE1  NE2  NE3  NE4  NE5  NE6  NE7  AE1  AE2  AE3  BE1  BE2  BE3  BE4
I             VV        1    1    1    1    1    1    1    1    2    1    2    1    1    2
say           CVV       1    1    1    1    1    1    1    1    2    1    2    2    1    2
few           CVV       1    1    1    1    1    1    1    1    1    1    2    1    1    1
kin           CVC       1    1    1    1    1    1    1    1    2    1    2    2    1    ---

In L2 English speakers’ judgments, these words correspond to one count of /no/ in some cases, as in L1 English, and to two counts of /no/ in other cases. This difference can be explained in terms of whether or not the mora is used to segment monosyllabic English words. For example, AE1, AE3 and BE3 segmented say as /no/, while AE2, BE1, BE2 and BE4 segmented it as /no-no/. Two counts of /no/ are clear evidence that the mora is the unit employed to segment say (i.e., mora-based segmentation of say). A similar explanation applies to the other three words, I, few and kin. Among the seven Japanese speakers of L2 English, AE1, AE3 and BE3 treated all four words as one count of /no/, while BE1 treated all four words as two counts of /no/. AE2, BE2 and BE4 showed both types. Although two counts of /no/ indicate the use of mora-based segmentation for English monosyllabic words, one count of /no/ does not constitute solid evidence of the use of the syllable as a segmentation unit (i.e., syllable-based segmentation), due to a problem with the method used to collect speakers’ judgments in the English survey. Unlike in the Japanese survey, we did not ask participants to use only /no/ with the short vowel /o/. We believe that this is not a serious problem for the interpretation of the L1 English data. As shown in Table 5-5, we found not a single case of two counts of /no/ in the judgments of the seven L1 English speakers for the analyzed words (and, as shown in Table 5-4, which presents all 44 tested words, there was hardly any instance of two counts of /no/ in general). This suggests that one count of /no/ in the L1 English data means the use of a single unit, the syllable, for segmenting monosyllabic words. However, it is possible that Japanese speakers of L2 English occasionally used /noo/ instead of /no/ to represent two moras. Thus, we can only conclude that Japanese speakers of L2 English used mora-based segmentation in some cases, while L1 English speakers used only the syllable, in order to segment English monosyllabic words. More data collected with the method used for our Japanese survey are needed in order to determine whether L2 English speakers ever use the syllable to parse monosyllabic but bimoraic words.
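The grouping of the seven L2 English speakers described above can be reproduced mechanically. The sketch below is illustrative only: the counts are transcribed from Table 5-5 (words I, say, few, kin, in that order), None stands for the response shown as --- in the table, and the function name and the three labels are hypothetical.

```python
# /no/ counts for I, say, few, kin (in that order), transcribed from Table 5-5.
# None stands for the cell shown as '---' (treated here as a missing response).
RESPONSES = {
    "AE1": [1, 1, 1, 1],
    "AE2": [2, 2, 1, 2],
    "AE3": [1, 1, 1, 1],
    "BE1": [2, 2, 2, 2],
    "BE2": [1, 2, 1, 2],
    "BE3": [1, 1, 1, 1],
    "BE4": [2, 2, 1, None],
}


def response_pattern(counts):
    """Label a speaker by whether two-/no/ (mora-based) responses occur."""
    answered = {count for count in counts if count is not None}
    if answered == {1}:
        return "one count throughout"
    if answered == {2}:
        return "two counts throughout"
    return "mixed"


patterns = {speaker: response_pattern(counts) for speaker, counts in RESPONSES.items()}
for speaker, label in patterns.items():
    print(speaker, label)
```

Note that, for the reason given above, "one count throughout" cannot be read as evidence of syllable-based segmentation; only the two-count responses are informative.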

Treatment of consonant clusters in L2 English

The structural complexity of a syllable can be measured by the number of consonants within the syllable, without distinguishing between onset and coda. A close examination of the L2 data in Table 5-4 shows that Japanese speakers of L2 English tend to segment a more complex syllable into more counts of /no/. In order to show this correlation, the average count of /no/ was calculated for a representative speaker from each speaker group as a function of the number of consonants in a test word (e.g., 0 for I (VV), 6 for strength (CCCVCCC)). Results are shown in Figure 5-5. Three distinct patterns emerge across the three speakers. NE1, in the L1 English group, consistently categorized test words as monosyllabic regardless of how many consonants they contain. In contrast, BE1, in the beginning L2 English group, shows a clear positive correlation between the average of /no/ counts and the number of consonants in monosyllabic English words. This indicates that this speaker perceives a sequence of English consonants as a sequence of moras or syllables. Finally, AE1, in the advanced L2 group, shows an intermediate pattern.
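The calculation behind Figure 5-5 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and only a small subset of BE1's responses (as read from Tables 5-4 and 5-5) is used, so the resulting averages are not the values plotted in the figure.

```python
from collections import defaultdict
from statistics import mean

# (syllable structure, /no/ count) pairs for speaker BE1, as read from
# Tables 5-4 and 5-5. Only a subset of the 44 test words is listed.
BE1_RESPONSES = [
    ("VV", 2),       # I
    ("CVV", 2),      # say
    ("CVV", 2),      # few
    ("CVC", 2),      # kin
    ("CCCVCC", 5),   # strict
    ("CCCVVCC", 6),  # streams
    ("CCCVCCC", 6),  # strength
]


def consonant_count(structure: str) -> int:
    """Number of consonants in the syllable, onset and coda pooled."""
    return structure.count("C")


def average_by_consonants(responses):
    """Average /no/ count for each number of consonants."""
    grouped = defaultdict(list)
    for structure, count in responses:
        grouped[consonant_count(structure)].append(count)
    return {n: mean(counts) for n, counts in sorted(grouped.items())}


be1_averages = average_by_consonants(BE1_RESPONSES)
print(be1_averages)
```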

Figure 5-5: Average number of occurrences of the segmentation unit /no/ as a function of the number of consonants in a syllable

[Three panels, one per speaker: NE1 (L1 English), AE1 (advanced L2 English) and BE1 (beginning L2 English). x-axis: number of consonants in the syllable (0–6); y-axis: average count of /no/ (0–7).]

5.4.2. L2 Japanese segmentation

L1 Japanese vs. L2 Japanese patterns

For each participant, the number of occurrences of the segmentation unit /no/ for 15 words that include bimoraic syllables (i.e., syllables containing the moraic nasal /N/, a long vowel or a coda consonant) is presented in Table 5-6. As for the English survey, the total number of /no/ counts for each participant is presented in Table 5-6 and plotted in Figure 5-6. We expected that in native Japanese /no/ would be treated as a mora, not as a syllable. This is a legitimate assumption, since we asked participants to use only /no/ with the short vowel /o/. We expected the moraic nasal, the second half of a long vowel and the first half of a long consonant to be produced as one count of /no/, and this is indeed the pattern shown in the results of our Japanese survey. In Table 5-6, shaded cells indicate non-native patterns. The expected native patterns are presented in Table 5-7.

Table 5-6: Number of /no/ (representing the segmentation of Japanese words by L1 Japanese speakers and English speakers of L2 Japanese)

moraic   word    syllable  L1 Japanese              L2 Japanese
segment          structure NJ1  NJ2  NJ3  NJ4  NJ5  AJ1  AJ2  AJ3  BJ1  BJ2  BJ3  BJ4
/N/      en      VN        2    2    2    2    2    2    2    2    1    1    1    1
         sen     CVN       2    2    2    2    2    2    2    2    1    1    1    1
         koon    CVRN      3    3    3    3    3    3    3    3    3    1    2    1
/R/      ii      VR        2    2    2    2    2    2    2    2    2    1    1    1
         kaa     CVR       2    2    2    2    2    2    2    2    2    1    2    1
         kii     CVR       2    2    2    2    2    2    2    2    2    1    2    1
         hai     CVR       2    2    2    2    2    2    2    2    1    1    1    1
         gitaa   CVCVR     3    3    3    3    3    3    3    3    3    2    3    2
         biiru   CVRCV     3    3    3    3    3    3    3    3    3    2    3    2
         tooru   CVRCV     3    3    3    3    3    3    3    3    3    2    2    2
         kaado   CVRCV     3    3    3    3    3    3    3    3    3    2    3    2
         chiizu  CVRCV     3    3    3    3    3    3    3    3    3    2    3    2
/Q/      tatta   CVQCV     3    3    3    3    2    2    3    3    2    2    2    2
         atta    VQCV      3    3    3    3    2    2    3    3    2    2    2    2
         itta    VQCV      3    3    3    3    2    2    3    3    2    2    2    3
total                      41   41   41   41   38   38   41   41   35   24   32   25

Figure 5-6: Total number of occurrences of the segmentation unit /no/ for Japanese words

[Bar chart. x-axis: participants, grouped as L1 Japanese (NJ1–NJ5), advanced L2 Japanese (AJ1–AJ3) and beginning L2 Japanese (BJ1–BJ4); y-axis: total number of /no/ (0–40). Bar values: NJ1 41, NJ2 41, NJ3 41, NJ4 41, NJ5 38, AJ1 38, AJ2 41, AJ3 41, BJ1 35, BJ2 24, BJ3 32, BJ4 25.]

Table 5-7: Expected native patterns in the segmentation of Japanese words

word    syllable   mora
        structure  segmentation
en      VN         no-no
sen     CVN        no-no
koon    CVRN       no-no-no
ii      VR         no-no
kaa     CVR        no-no
kii     CVR        no-no
hai     CVR        no-no
gitaa   CVCVR      no-no-no
biiru   CVRCV      no-no-no
tooru   CVRCV      no-no-no
kaado   CVRCV      no-no-no
chiizu  CVRCV      no-no-no
tatta   CVQCV      no-no-no
atta    VQCV       no-no-no
itta    VQCV       no-no-no

We find that the expected native Japanese patterns are dominant in both the L1 Japanese and advanced L2 Japanese groups: four out of five L1 Japanese speakers and two out of three advanced speakers of L2 Japanese fully parsed the tested Japanese words into moras. For example, tooru ‘pass’ consists of one bimoraic syllable and one monomoraic syllable (i.e., too-ru). This word is parsed as /no-no-no/ (vs. /no-no/) when the mora is employed as a segmentation unit. In contrast, the four beginning speakers of L2 Japanese produced non-native patterns, i.e., syllable-based segmentation, in most cases. For example, sen ‘thousand’ consists of one bimoraic syllable, and it is parsed as /no/, not as /no-no/, indicating that the syllable is employed as a segmentation unit. Among the four beginning speakers of L2 Japanese, BJ2 and BJ4 produced non-native patterns (syllable-based segmentation) in almost all cases, while BJ1 and BJ3 mixed native (mora-based) and non-native (syllable-based) patterns. This difference between the three speaker groups (L1 Japanese, advanced L2 Japanese and beginning L2 Japanese) is reflected in the total count of occurrences of the segmentation unit, as shown in Table 5-6 and Figure 5-6: the total number of instances of the segmentation unit is larger in the L1 Japanese and advanced L2 Japanese groups than in the beginning L2 Japanese group. Mora-based segmentation yields a larger total number of occurrences of the segmentation unit /no/.
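The two segmentation strategies contrasted above can be made concrete with a short sketch. It is illustrative only: the function names are hypothetical, and it assumes the skeleton notation of Tables 5-6 and 5-7, in which every syllable contains exactly one V and the symbols N, R and Q belong to the preceding syllable.

```python
# Skeleton symbols as in Tables 5-6 and 5-7: C = onset consonant, V = vowel,
# N = moraic nasal, R = second half of a long vowel, Q = first half of a geminate.
MORAIC_SYMBOLS = {"V", "N", "R", "Q"}


def mora_count(skeleton: str) -> int:
    """Mora-based count: every V, N, R and Q is one mora; an onset C adds nothing."""
    return sum(symbol in MORAIC_SYMBOLS for symbol in skeleton)


def syllable_count(skeleton: str) -> int:
    """Syllable-based count: one syllable per V (N, R, Q join the preceding syllable)."""
    return skeleton.count("V")


def as_no_string(count: int) -> str:
    """Render a count as a /no/ sequence, e.g. 3 -> 'no-no-no'."""
    return "-".join(["no"] * count)


for word, skeleton in [("tooru", "CVRCV"), ("sen", "CVN"), ("koon", "CVRN"), ("atta", "VQCV")]:
    print(word, as_no_string(mora_count(skeleton)), as_no_string(syllable_count(skeleton)))
```

For tooru (CVRCV) this gives no-no-no under mora-based counting and no-no under syllable-based counting; for sen (CVN) it gives no-no and no, matching the two examples discussed in the text.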

Which moraic segment is hard to perceive as an independent mora?

In Table 5-6, test words are categorized on the basis of which of the three moraic segments is present in the bimoraic syllable: the moraic nasal /N/, the second half of a long vowel /R/ or the first half of a long consonant /Q/. The examination of the table shows that BJ1 and BJ3 produced more native-like patterns for test words with /R/ than for words with /N/ and /Q/. This indicates that L2 Japanese speakers begin to treat /R/ (the second half of a long vowel) as an individual mora before the other two types of moraic segments (/N/ and /Q/).

At this moment, it is difficult to explain why the second half of a Japanese long vowel /R/ is easier for native English speakers to perceive as an independent mora than the coda consonants /N/ and /Q/ (see footnote 22). However, we may find cues to explain the pattern in the crosslinguistic typology of syllable weight (see Gordon 1999 for a review). As Gordon’s survey shows, in languages with a metrical system sensitive to syllable weight, (C)VV with a long vowel is more likely to be treated as heavy (bimoraic) than (C)VC with a short vowel followed by a coda consonant. Thus, it is also possible that it is easier to learn to treat a (C)VV syllable as bimoraic than to learn to treat a (C)VC syllable as bimoraic.

22 With respect to /Q/, notice that the failure to parse it as a mora could be due to the fact that English speakers in general have a hard time producing long consonants. Consequently, it is possible that they do not perceive the first half of a geminate as part of the coda of the preceding syllable.

5.5. Discussion

5.5.1. L1 vs. L2 word segmentation

Are L2 speakers aware of L2 syllable structures in the same way in which they are aware of L1 syllable structures? I will try to answer this question by comparing L1 vs. L2 speakers’ judgments on word segmentation in English and Japanese.

Word segmentation in L1 Japanese and L1 English

The results of the English and Japanese phonological surveys confirmed the expected native patterns for each language. L1 English speakers segmented monosyllabic English words with one count of the segmentation unit /no/ in more than 95% of cases, regardless of how complex their syllable structure was. L1 Japanese speakers employed the mora in order to segment Japanese test words: syllables including moraic segments (/N/, /R/ or /Q/) were treated as two moras in almost all cases. This confirms that the syllable and the mora are used as units of word segmentation in L1 English and L1 Japanese, respectively.

The effects of L1 characteristics on L2 segmentation

Two major patterns emerge from the analysis of the L2 English data. The first pattern is that Japanese speakers of L2 English tend to divide single complex English syllables into multiple counts of /no/ by breaking up consonant clusters. This occurs when the structures of English syllables do not satisfy the constraints of Japanese syllable structure, where no consonant cluster other than /N.Q/ is allowed and only nasals or the first half of geminate consonants can occur in coda position. The structural simplicity of L1 Japanese syllables negatively transfers to the syllable structures of L2 English. The second major pattern in L2 English is the following: some Japanese speakers of L2 English use mora-based segmentation and reanalyze the structure of English syllables even though the structures of those English syllables are legitimate in Japanese, as shown in Table 5-5. This indicates that Japanese speakers of L2 English adopt L1 Japanese mora-based segmentation in their L2 English segmentation.

As for L2 Japanese segmentation, while advanced speakers of L2 Japanese showed mora-based segmentation (the consistent pattern in L1 Japanese), beginning speakers of L2 Japanese used syllable-based segmentation in most cases. This difference is due to a difference in the treatment of bimoraic syllables containing moraic segments: beginning speakers of L2 Japanese treated a bimoraic Japanese syllable as one count of /no/ (i.e., one syllable), while native Japanese speakers and advanced speakers of L2 Japanese treated it as two counts of /no/ (i.e., two moras). This indicates that beginning speakers of L2 Japanese tend not to treat moraic segments as individual moras in L2 Japanese word segmentation. However, this pattern needs to be interpreted with caution: it is not possible to conclude that beginning speakers of L2 Japanese are not aware of the presence of the mora. In the case of L2 English segmentation, the larger number of instances of the segmentation unit /no/ is a solid piece of evidence for the claim that Japanese speakers decompose an English syllable structure into multiple moras/syllables, due to the negative transfer of L1 Japanese characteristics. However, in the case of the Japanese data, the smaller number of counts of /no/ does not necessarily mean that beginning speakers of L2 Japanese use only the syllable without being aware of the mora. As Japanese has both moras and syllables, beginning L2 Japanese speakers could replace each syllable with /no/ even if they are aware of the existence of moras. The only thing we can be sure about concerns the awareness of the mora by advanced speakers of Japanese: their native-like judgments on Japanese word segmentation show that they have learned to analyze the internal structures of Japanese syllables with the use of the mora.

5.5.2. Word segmentation in beginning vs. advanced L2 speech

We have observed an asymmetry between the two L2 types in terms of the mastery of native-like segmentation by advanced L2 speakers. In both L2 English and L2 Japanese, beginning speakers showed the effect of their L1 segmentation patterns: e.g., the use of the mora and the decomposition of one English syllable into multiple moras/syllables in L2 English–L1 Japanese; the failure to treat moraic segments as individual moras in L2 Japanese–L1 English (at least at the surface level). However, while advanced speakers of L2 English show non-native patterns in their segmentation judgments, advanced speakers of L2 Japanese show the same patterns as L1 Japanese speakers do. This asymmetry may be due to the fact that learning a new category (the mora for English speakers) is easier than ignoring, suppressing or deleting sensitivity to a pre-existing category (the mora for Japanese speakers). Another possibly relevant factor is the difference between the English and Japanese writing systems. English speakers could be helped in acquiring the Japanese mora category by the Japanese alphabet, kana. In the kana system, each letter corresponds to one mora. It is not difficult to imagine that the Japanese alphabet consistently reminds English speakers of the existence of the mora (a similar effect was observed in L1 Japanese acquisition by Inagaki et al. (2000)). This type of visual help is not available for Japanese speakers learning English, since in English the writing system does not show the native-like syllabification of words.
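The correspondence between kana letters and moras can be illustrated with a short sketch. It is only an illustration under stated assumptions: the function name is hypothetical, the kana spellings are supplied here rather than taken from the survey materials, and the small glide and vowel letters (e.g., the small ょ in きょう), which combine with the preceding letter into a single mora, are treated as the exception to the one-letter-one-mora rule.

```python
# Small kana that combine with the preceding letter into one mora (assumed list).
# The small tsu (first half of a geminate) and the long-vowel mark are NOT in
# this set: each of them counts as a mora of its own.
COMBINING_SMALL_KANA = set("ゃゅょャュョぁぃぅぇぉァィゥェォ")


def kana_mora_count(word: str) -> int:
    """Count moras in a kana string: one per letter, except combining small kana."""
    return sum(letter not in COMBINING_SMALL_KANA for letter in word)


examples = {
    "ビール": "biiru",  # CVRCV: long-vowel mark counts as a mora
    "せん": "sen",      # CVN: moraic nasal counts as a mora
    "たった": "tatta",  # CVQCV: small tsu counts as a mora
    "きょう": "kyoo",   # not a survey word; shows a combining small kana
}
for kana, romanized in examples.items():
    print(romanized, kana_mora_count(kana))
```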

5.5.3. Connection between awareness of L2 syllable structures and L2 segmental production

How is the awareness of L2 syllable structures related to L2 segmental production? In order to answer this question, we consider one case for each L2 type.

L2 English–L1 Japanese: phonological awareness of English syllable structure vs. production of the duration contrast between English tense vs. lax vowels

As mentioned before, beginning Japanese speakers of L2 English have a hard time producing English words with consonant clusters or coda consonants (other than /N/ or /Q/). As a solution, they optimize complex English syllables by simplifying their structure. This strategy is used in Japanese loanword phonology as well. A classic example is McDonald, which is borrowed from L1 English and pronounced as /ma.ku.do.na.ru.do/ with vowel epenthesis. It is a well-known fact that many experienced Japanese speakers of L2 English eventually stop using vowel epenthesis and become able to produce English consonant clusters. The results of the English survey, however, show that even advanced speakers of L2 English (AE1, AE2 and AE3) are still not aware of English syllable structures and are under a strong influence from L1 Japanese syllable structure and mora-based segmentation. A similar discrepancy is found between the phonological awareness of L2 English syllable structure and the production of a duration contrast between English tense vs. lax vowels. One of the advanced speakers of L2 English in the English survey, AE2, also participated in Experiment 4, in which we investigated the production of English tense and lax vowels by Japanese speakers. In the English survey, AE2’s segmentation judgments showed the influence of L1 Japanese characteristics as strongly as those of beginning speakers of L2 English. If the phonological awareness of English syllable structures developed together with the production of English tense and lax vowels in L2 English development, we would expect Speaker AE2 to produce a duration ratio of English tense/lax vowels similar to the duration ratio of his L1 Japanese long/short vowels. However, this was not the case. AE2 successfully approximated the native-like duration ratio of English tense/lax vowels (see Figure 3-6 in Chapter 3), even though he still parsed English long vowels as bimoraic in the English survey (see Table 5-5). These discrepancies between the production of English segments and the phonological awareness of English syllable structures in L2 English–L1 Japanese are interesting, since they suggest that Japanese speakers are able to suppress L1 Japanese segmental patterns and can produce native-English-like segments even though they still retain Japanese moraic structures in their minds.

L2 Japanese–L1 English: phonological awareness of Japanese moras and production of the duration contrast between Japanese long vs. short vowels

In Experiment 5, the duration contrast between Japanese long vs. short vowels in L2 Japanese production by English speakers was investigated. The results of this experiment show that English speakers choose the spectral region of English tense /i/ and differentiate long and short vowels only by duration. Is there any systematic relation between duration contrast patterns and the awareness of the bimoraicity of Japanese long vowels? In order to answer this question, we summarize in Table 5-8 the judgments on the target words that were used both in Experiment 5 and in the Japanese survey presented in this chapter.

Table 5-8: Number of /no/ (representing the segmentation of Japanese words containing short vs. long vowels)

word    syllable   L1        L2 Japanese
        structure  Japanese  AJ1  AJ2  AJ3  BJ1  BJ2  BJ3  BJ4
biru    CVCV       2         2    2    2    2    2    2    2
biiru   CVRCV      3         3    3    3    3    2    3    2
chizu   CVCV       2         2    2    2    2    2    2    2
chiizu  CVRCV      3         3    3    3    3    2    3    2
kado    CVCV       2         2    2    2    2    2    2    2
kaado   CVRCV      3         3    3    3    3    2    3    2
toru    CVCV       2         2    2    2    2    2    2    2
tooru   CVRCV      3         3    3    3    3    2    2    2

To review the duration contrasts between Japanese long and short vowels which emerged from the results of Experiment 5, the average duration ratio of long/short vowels is summarized in Table 5-9, for three word pairs for each speaker. For the L1 Japanese data, the range of ratios across three speakers is presented. In both tables, non-native patterns are indicated by shaded cells.

Table 5-9: Average duration ratios of Japanese long/short vowels (based on the results of Experiment 5)

word pair        L1        L2 Japanese
                 Japanese  AJ1   AJ2   AJ3   BJ1   BJ2   BJ3   BJ4
biru vs. biiru   1.6~2.3   2.15  2.9   3.0   2.5   2.6   1.8   2.7
kado vs. kaado   2.1~2.5   2     2.9   4.0   2.3   2.7   1.7   1.7
toru vs. tooru   2.0~2.4   2.4   2.8   3.4   2.1   2.8   1.7   2.9

Table 5-10 summarizes the performance of each L2 Japanese speaker with respect to the following two aspects (based on the information presented in Tables 5-8 and 5-9, respectively): the segmentation of Japanese long vowels and the duration contrast between Japanese long vs. short vowels in his/her L2 Japanese production, as observed in Experiment 5.

Table 5-10: Segmentation of Japanese long vowels and long-short duration contrast for each speaker of L2 Japanese

speaker  segmentation of        long-short
         Japanese long vowels   duration contrast
AJ1      mora-based             native-like
AJ2      mora-based             too large
AJ3      mora-based             too large
BJ1      mora-based             native-like
BJ2      syllable-based         too large
BJ3      mora-/syllable-based   too small
BJ4      syllable-based         too large or too small

In Table 5-10, we observe three patterns in terms of the distribution of native-like and non-native-like patterns. AJ1 and BJ1 produced native-like patterns for both word segmentation and duration contrasts; AJ2 and AJ3 showed native-like word segmentation but excessively large duration contrasts; finally, BJ2, BJ3 and BJ4 showed neither native-like segmentation nor native-like duration contrasts. Thus, the only two speakers showing native-like duration contrasts for Japanese long/short vowels treated Japanese long vowels as bimoraic, while none of the three speakers of L2 Japanese who did not show mora-based segmentation (BJ2, BJ3 and BJ4) showed native-like duration contrasts. This suggests that the ability to segment Japanese long vowels as bimoraic is correlated with the ability to produce a native-like duration contrast between Japanese long and short vowels.
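One way to relate the ratios in Table 5-9 to the labels "too large" and "too small" is to compare each L2 ratio with the L1 Japanese range for the same word pair. The sketch below does only that, for two speakers. The comparison rule and all names are assumptions made for illustration; the text does not state how the per-speaker summary labels in Table 5-10 were derived from the individual word pairs.

```python
# L1 Japanese ranges and L2 ratios transcribed from Table 5-9.
L1_RANGES = {
    "biru/biiru": (1.6, 2.3),
    "kado/kaado": (2.1, 2.5),
    "toru/tooru": (2.0, 2.4),
}
L2_RATIOS = {
    "AJ2": {"biru/biiru": 2.9, "kado/kaado": 2.9, "toru/tooru": 2.8},
    "BJ4": {"biru/biiru": 2.7, "kado/kaado": 1.7, "toru/tooru": 2.9},
}


def classify_ratio(ratio, l1_range):
    """Compare one long/short duration ratio with the L1 range (assumed criterion)."""
    low, high = l1_range
    if ratio < low:
        return "too small"
    if ratio > high:
        return "too large"
    return "within L1 range"


labels = {
    speaker: {pair: classify_ratio(ratio, L1_RANGES[pair]) for pair, ratio in ratios.items()}
    for speaker, ratios in L2_RATIOS.items()
}
for speaker, by_pair in labels.items():
    print(speaker, by_pair)
```

Under this criterion all three of AJ2's ratios exceed the L1 range, while BJ4's ratios fall on both sides of it, in line with the entries for these two speakers in Table 5-10.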

5.6. Summary

In order to find out to what extent L2 speakers are aware of the syllable structures of their target language, we conducted a phonological survey for English and Japanese, in which participants were asked to segment words with the use of the segmentation unit /no/. The expected native pattern was observed in L1 speakers’ judgments on word segmentation: L1 English speakers treated monosyllabic English words as one syllable, while L1 Japanese speakers treated Japanese moraic segments as individual moras. Two major patterns were found in L2 English speakers’ judgments: first, the decomposition of one English syllable into multiple moras/syllables; second, the use of the mora for English word segmentation. These two patterns indicate that the characteristics of L1 Japanese syllable structure strongly affect the awareness of English syllable structures, even for advanced Japanese speakers of English. On the other hand, the L2 Japanese data showed that English speakers become aware of Japanese moraic segments (i.e., of the bimoraicity of Japanese complex syllables), and that there is a positive correlation between proficiency level and the phonological awareness of Japanese moraicity. The asymmetry between L2 English and L2 Japanese in the ability to become aware of the syllable structures of the target language can be explained by the hypothesis that learning a new prosodic category (the mora for English speakers learning Japanese) is less challenging than ignoring or suppressing a category existing in L1 (the mora for Japanese speakers learning English). Another possible factor behind this asymmetry is the difference between English and Japanese in terms of how they represent syllable structures in the writing system. The Japanese kana system, which is mora-based, may help English speakers become aware of mora-based parsing, while the English writing system does not offer Japanese speakers learning English an equivalent visual cue to English syllable structures.

Chapter 6: Conclusion

In the present study I conducted seven phonetic experiments and a phonological experiment in order to investigate how L1 prosodic characteristics affect the production of L2 prosodic patterns. I compared L2 English–L1 Japanese and L2 Japanese–L1 English. While the significance of the specific results of each experiment was discussed in previous chapters, I will conclude with some general remarks about aspects of the study of L2 prosody which emerge from the reported work, and I will indicate some directions that future research could explore.

First, past research on L2 speech development at the segmental level has shown that it is important to take phonetic details into account in order to achieve a better understanding of language transfer in L2 speech development (e.g., Brière 1968 or Flege 1987). The present study has demonstrated that this is also the case for research on the prosodic level.

Second, prosody is phonetically realized by multiple correlates, which differ from language to language. The present study has shown that it is important to investigate various correlates relevant to the prosodic phenomenon of interest in L2, since transfer patterns can vary greatly from correlate to correlate. Future research should analyze the entire set of relevant correlates for a comprehensive picture of L2 prosodic patterns (e.g., for accent, intensity and vowel quality should be included in addition to F0 and duration, the correlates that were examined in the present study).

Third, it is essential to consider whether a certain prosodic feature plays a phonological role in just one language, in both languages, or in neither language. The present study has shown that different transfer patterns in the learner’s production can be explained by a difference between L1 and L2 in terms of the phonological status of a relevant prosodic feature.

Fourth, the analysis of the contrast between English tense vs. lax vowels and between Japanese long vs. short vowels in L2 speech production has shown that there exists a systematic interaction of prosodic and segmental levels in the transfer of L1 features in L2 speech development.

Fifth, the comparison of the data from the phonological experiment with the results of the phonetic experiments has shown that L2 phonological prosodic units are not necessarily learned in parallel with the L2 phonetic patterns of prosodic phenomena. We have observed two types of interactions between these two aspects of L2 speech. In some cases, L2 phonetic patterns can be learned while retaining phonological awareness based on L1 prosodic units (e.g., advanced Japanese speakers of L2 English successfully produce a native-English-like duration contrast between English tense and lax vowels even though they still base their phonological segmentation of English words on Japanese moraic structures). In other cases, phonological awareness of L2 prosodic units appears to be correlated with (required for?) the acquisition of L2 phonetic patterns (e.g., all and only English speakers showing native-Japanese-like duration contrasts between Japanese long and short vowels also showed native-Japanese-like awareness of Japanese moras).

Finally, the investigation of multiple prosodic phenomena in the present study suggests that an L2 speaker’s prosodic system does not necessarily develop in a parallel manner for different dimensions of prosody. Some beginning learners are bound to L1 phonetic habits in the production of a certain prosodic phenomenon, but at the same time they show surprisingly native-like L2 patterns in the production of another prosodic phenomenon. Which factors affect the learner’s proficiency in any one prosodic dimension still needs to be clarified. Among the potential factors we hypothesized are the phonological status of a phonetic pattern, the relative importance of the relevant correlate in the production of that specific prosodic phenomenon, the learner’s perceptual sensitivity, and the type and amount of L2 input. To assess the relevance and relative importance of each factor is, of course, a great challenge for future research.

In the long term, it will be important for L2 prosody research to describe the L2 prosodic system of a speaker comprehensively by considering both tonal and temporal organization, and to collect this sort of data for more L2 speakers. Furthermore, we need to take the interaction between the segmental and prosodic levels and the interaction between the phonological and phonetic levels into account in the study of language transfer in L2 speech development. These efforts will help us not only in achieving a better understanding of L2 speech development, but also in establishing a better model of L2 speech development and accurate methods for diagnosing L2 oral proficiency.

To conclude, while many questions are of course still open and a considerable amount of further research has to be conducted on the relevant issues, the work reported here shows that the experimental study of L2 prosody can provide interesting insights into language transfer in L2 speech development.

REFERENCES

Anderson-Hsieh, J. & H. Venkatagiri. (1994). Syllable duration and pausing in the speech of Chinese ESL speakers. TESOL Quarterly 28 (4), 807-812.
Argyres, Z. (1996). The Cross-cultural Pragmatics of Intonation: the Case of Greek-English. MA thesis, UCLA.
Beckman, M. E. (1982). Segment duration and the ‘mora’ in Japanese. Phonetica 39, 113-135.
Beckman, M. E. (1986). Stress and Non-Stress Accent. Dordrecht: Foris Publications.
Beckman, M. E. (1992). Evidence for speech rhythms across languages. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (eds.), Speech Perception, Production and Linguistic Structure. Tokyo: Ohmsha.
Beckman, M. E. & J. B. Pierrehumbert. (1986). Intonational structure in Japanese and English. Phonology Yearbook 3, 255-309.
Beckman, M. E. (1996). The parsing of prosody. Language and Cognitive Processes (to appear).
Beckman, M. E. & J. Edwards. (1990). Lengthenings and shortenings and the nature of prosodic constituency. In J. Kingston & M. E. Beckman (eds.), Papers in Laboratory Phonology I: Between the Grammar and the Physics of Speech. Cambridge: Cambridge University Press.
Berkovits, R. (1993). Utterance-final lengthening and the duration of final-stop closures. Journal of Phonetics 21, 479-489.
Best, C. (1995). A direct realist view of cross-language speech perception. In W. Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research. Baltimore: York Press.

Bohn, O.-S. & J. E. Flege. (1990). Interlingual identification and the role of foreign language experience in L2 vowel perception. Applied Psycholinguistics 11, 131-58.
Bloch, B. (1950). Studies in colloquial Japanese IV: Phonemics. Language 26, 86-125.
Bradlow, A., R. Port & K. Tajima. (1995). The combined effects of prosodic variation on Japanese mora timing. Proceedings of International Congress of Phonetic Sciences 4, 344-347.
Broselow, E. (1983). Non-obvious transfer: On predicting epenthesis errors. In S. Gass & L. Selinker (eds.), Language Transfer in Language Learning, pp. 269-280. Rowley, MA: Newbury House.

Broselow, E. & H.-B. Park. (1995). Mora conservation in second language prosody. In J. Archibald (ed.), Phonological Acquisition and Phonological Theory. Hillsdale, NJ: Lawrence Erlbaum.
Cambier-Langeveld, T. (2000). Temporal Marking of Accents and Boundaries. Ph.D. dissertation, University of Amsterdam.
Campbell, N. (1991). A study of Japanese speech timing from the syllable perspective. Phonetic Society of Japan 3, 29-39.
Campbell, N. (1992). Segmental elasticity and timing in Japanese speech. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (eds.), Speech Perception, Production and Linguistic Structure. Tokyo: Ohmsha.
Campbell, N. & Y. Sagisaka. (1991). Moraic and syllable-level effects on speech timing. (Onsei taimingu ni mirareru moora to onsetsu no eikyou ni tsuite). Translation. Committee for Speech Research, Acoustical Society of Japan SP90-107, 35-40.
Class, A. (1939). The Rhythm of English Prose. Oxford: Basil Blackwell.
Dauer, R. (1983). Stress-timing and syllable-timing reanalyzed. Journal of Phonetics 11, 51-62.
Eckman, F. (1977). Markedness and the contrastive analysis hypothesis. Language Learning 27, 315-30.
Eckman, F. (1985). The markedness differential hypothesis: theory and applications. In B. Wheatley, A. Hastings, F. Eckman, L. Bell, G. Krukar & R. Rutkowski (eds.), Current Approaches to Second Language Acquisition: Proceedings of the 1984 University of Wisconsin-Milwaukee Linguistic Symposium, 3-21. Bloomington, IN: Indiana University Linguistics Club.
Ellis, R. (1985). Understanding Second Language Acquisition. Oxford University Press.
Fant, G., A. Kruckenberg & L. Nord (1991). Durational correlates of stress in Swedish, French and English. Journal of Phonetics 19, 351-365.

Fear, B., A. Cutler & S. Butterfield (1995). The strong/weak syllable distinction in English. Journal of the Acoustical Society of America 97 (3), 1893-1904.
Ferreira, F. (1993). Creation of prosody during sentence production. Psychological Review 100 (2), 233-253.
Flege, J. E. (1987). The production of “new” and “similar” phones in a foreign language: evidence for the effect of equivalence classification. Journal of Phonetics 15, 47-65.
Flege, J. E. (1988). The production and pronunciation of foreign language speech sounds. In H. Winitz (ed.), Human Communication and Its Disorders: A Review 2. Norwood, NJ: Ablex Publishing Corp.

Flege, J. E. (1989). An instrumental study of vowel reduction and stress placement in Spanish-accented English. Studies in Second Language Acquisition 11, 35-62.
Flege, J. E. (1992). Speech learning in a second language. In C. A. Ferguson, L. Menn & C. Stoel-Gammon (eds.), Phonological Development: Models, Research, Implications. Timonium, Maryland: York Press.
Flege, J. E. (1995). Second-language speech learning: Theory, findings, and problems. In W. Strange (ed.), Speech Perception and Linguistic Experience: Issues in Cross-Language Research. Timonium, Maryland: York Press.
Fougeron, C. (1999). Prosodically conditioned articulatory variations: A review. UCLA Working Papers in Phonetics 97, 1-73.
Fougeron, C. & P. Keating. (1997). Articulatory strengthening at edges of prosodic domain. Journal of the Acoustical Society of America 106(6), 3728-3740.
Fries, C. (1945). Teaching and learning English as a foreign language. Ann Arbor: University of Michigan Press.
Fry, D. B. (1955). Duration and intensity as physical correlates of linguistic stress. Journal of the Acoustical Society of America 27, 765-768.
Fry, D. B. (1958). Experiments in the perception of stress. Language and Speech 1, 126-152.
Gårding, E. (1981). Contrastive prosody: A model and its application. Studia Linguistica 35, 146-165.
Gass, S. (1996). The role of language transfer. In C. W. Ritchie & T. K. Bhatia (eds.), Handbook of Second Language Acquisition. San Diego: Academic Press.
Gass, S. & L. Selinker. (1994). Second Language Acquisition: An Introductory Course. Lawrence Erlbaum Associates.
Gee, J. P. & F. Grosjean. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology 15, 411-458.

Gordon, M. (1999). Stress and other Weight-Sensitive Phenomena: Phonetics, Phonology and Typology. Ph.D. dissertation, UCLA.
Han, M. (1961). Japanese Phonology: An Analysis Based on Sound Spectrograms. Ph.D. dissertation, University of Texas, Austin.
Han, M. (1962). The feature of duration in Japanese. Onsei no Kenkyuu 10, 65-80.
Han, M. (1962). Japanese Phonology: An Analysis Based upon Sound Spectrograms. Tokyo: Kenkyusha.
Han, M. (1992). The timing control of geminate and single stop consonants in Japanese: A challenge for nonnative speakers. Phonetica 49, 102-127.

Han, M. (1994). Acoustic manifestations of mora timing in Japanese. Journal of Acoustical Society of America 96, 73-82.
Haugen, E. (1956). Bilingualism in the Americas: A Bibliography and Research Guide. Baltimore: American Dialect Society.
Hayes, B. (1989). Compensatory lengthening in moraic phonology. Linguistic Inquiry 20, 253-306.
Hayes, B. (1984). Review Article: D. Attridge, The Rhythms of English Poetry. Language 60, 914-923.
Hayes, B. (1989). The Prosodic Hierarchy in Meter. In P. Kiparsky & G. Youmans (eds.), Phonetics and Phonology, Volume 1: Rhythm and Meter. San Diego: Academic Press.
Hayes, B. (1995). Metrical Stress Theory. The University of Chicago Press.
Hoequist, C., Jr. (1983a). Durational correlates of linguistic rhythm categories. Phonetica 40, 19-31.
Hoequist, C., Jr. (1983b). Syllable duration in stress-, syllable-, and mora-timed languages. Phonetica 40, 203-237.
Homma, Y. (1973). An acoustic study of Japanese vowels. Study of Sounds 16, 347-368.
Homma, Y. (1981). Durational relationship between Japanese stops and vowels. Journal of Phonetics 9, 273-281.
Huss, V. (1978). English word stress in the postnuclear position. Phonetica 35, 86-105.
Inagaki, K., G. Hatano & T. Otake. (2000). The effect of kana literacy acquisition on the speech segmentation unit used by Japanese young children. Journal of Experimental Child Psychology 75, 70-91.
Ito, J. (1990). Prosodic minimality in Japanese. In K. Deaton, M. Noske & M. Ziolkowski (eds.), CLS 26-II: Papers from the Parasession on the Syllable in Phonetics and Phonology, 213-239.
Jinbo, K. (1927). Kokugo no onseijou no tokushitsu [The top phonetic characteristics of Japanese]. In T. Shibata, Kitamura, H. Kindaichi (eds.), Nihon no gengogaku [Linguistics of Japan], pp. 5-15. Tokyo: Taishukan (reprinted in 1980).
Jun, S.-A. (1993). The Phonetics and Phonology of Korean Intonation. Ph.D. dissertation, Ohio State University. [published in 1996 by Garland, NY.]
Jun, S.-A. (1995). A phonetic study of stress in Korean. Poster presented at the 130th meeting of the Acoustical Society of America, St. Louis, MO.

Jun, S.-A. (1998). The Accentual Phrase in the Korean prosodic hierarchy. Phonology 15 (2), 189-226.
Jun, S.-A. & M. Oh. (2000). Acquisition of Second Language Intonation. Proceedings of the International Conference on Spoken Language Processing, Beijing, China.
Kaiki, N., Takeda, K., Sagisaka, Y., Katagiri, S., Umeda, T. and Kuwabara, H. (1990). A large-scale Japanese speech data base. Proceedings of the International Conference on Spoken Language Processing, 1.5, 17-20.
Kaiki, N. & Y. Sagisaka. (1992). The control of segmental duration in speech synthesis using statistical methods. In Y. Tohkura, E. Vatikiotis-Bateson, & Y. Sagisaka (eds.), Speech Perception, Production and Linguistic Structure. Tokyo: Ohmsha.
Kawasaki, H. (1983). Models and data on the temporal regulation of speech: Isochrony in Japanese and English. Journal of the Acoustical Society of Japan 39 (6), 389-397. (in Japanese).
Keating, P., T. Cho, C. Fougeron & C.-S. Hsu. (to appear). Domain-initial articulatory strengthening in four languages. Papers in Laboratory Phonology 6. Cambridge: Cambridge University Press.
Klatt, D. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics 3, 129-140.
Klatt, D. (1980). Software for a cascade/parallel formant synthesizer. Journal of the Acoustical Society of America 67, 971-95.
Kondo, Y. (1998). Prosodic constraint on V-to-V Coarticulation in Japanese. Proceedings of the International Conference on Spoken Language Processing, Sydney.
Kubozono, H. (1989). The mora and syllable structure in Japanese: Evidence from speech errors. Language and Speech 32(3), 249-278.

Kubozono, H. (1993). The Organization of Japanese Prosody. Tokyo: Kuroshio Publishers.
Kubozono, H. (1995). Gokeisei to Oninkoozoo [Word Formation and Phonological Structure]. Tokyo: Kuroshio Shuppan.

Kubozono, H. (1995). Perceptual evidence for the mora in Japanese. In B. Connell & A. Arvaniti (eds.), Phonology and Phonetic Evidence: Papers in Laboratory Phonology IV. Cambridge University Press.
Kubozono, H. & M. Nakau (1998). Oninkoozoo to akusento [Phonological Structure and Accent]. Tokyo: Kenkyusha.
Ingram, J. C. & S.-G. Park. (1997). Cross-language vowel perception and production by Japanese and Korean learners of English. Journal of Phonetics 25(3), 343-370.

Ladd, R. (1996). Intonational Phonology. Cambridge University Press.
Ladefoged, P. (1993). A Course in Phonetics. 3rd Edition. Orlando, FL: Harcourt Brace & Company.
Lado, R. (1957). Linguistics across cultures. Ann Arbor: University of Michigan Press.
Larsen-Freeman, D. & M. Long. (1991). An Introduction to Second Language Acquisition. Longman.
Leather, J. & A. James (1996). The acquisition of second language speech. In C. W. Ritchie & T. K. Bhatia (eds.), Handbook of Second Language Acquisition. San Diego: Academic Press.
Lehiste, I. (1970). Suprasegmentals. Cambridge, MA: MIT Press.
Lehiste, I. (1977). Isochrony reconsidered. Journal of Phonetics 5, 253-263.
Lehiste, I. & G. E. Peterson (1959). Vowel amplitude and phonemic stress in American English. Journal of the Acoustical Society of America 31, 428-435.
Levitt, A. (1992). Reiterant speech as a test of non-native speakers’ mastery of the timing of French. Journal of the Acoustical Society of America 90 (6), 3008-3018.
Liberman, M. (1975). The Intonational System of English. Ph.D. dissertation, MIT.
Liberman, M. & A. Prince. (1977). On stress and linguistic rhythm. Linguistic Inquiry 8, 249-336.
Maekawa, K. (1994). Is there ‘dephrasing’ of the accentual phrase in Japanese? Ohio State University Working Papers in Linguistics: Papers from the Linguistic Laboratory 44, 146-165.
Magnuson, J. S. & R. Akahane-Yamada. (1996). Acoustic correlates to the effects of talker variability on the perception of English /r/ and /l/ by Japanese listeners. In Proceedings of the Fourth International Congress on Spoken Language Processing (ICSLP 96). University of Delaware and A. I. du Pont Institute.

Mester, A. (1990). Patterns of truncation. Linguistic Inquiry 21(3), 478-485.
Mitsuya, F. & M. Sugito (1977). A study of the accentual effect on segmental and moraic duration in Japanese. Annual Bulletin of the Research Institute of Logopedics and Phoniatrics, University of Tokyo 12, 97-112.
Mochizuki-Sudo, M. & S. Kiritani. (1991). Production and perception of stress-related durational patterns in Japanese learners of English. Journal of Phonetics 19, 231-248.
Nakatani, L. H., K. D. O'Connor & C. H. Aston (1981). Prosodic aspects of American English rhythm. Phonetica 38, 84-106.

Nespor, M. & I. Vogel. (1986). Prosodic Phonology. Dordrecht: Foris Publications.
Otake, T. Gengo no rizumu to onsetsu koozoo [Rhythmic structure of Japanese and syllable structure]. IEICE Tech. Report 89, 55-61.
Peterson, G. E. & H. L. Barney. (1952). Control methods used in a study of vowels. Journal of the Acoustical Society of America 24, 175-184.
Peterson, G. E. & I. Lehiste. (1960). Duration of Syllable Nuclei in English. Journal of the Acoustical Society of America 32, 693-703.
Pierrehumbert, J. (1980). The Phonology and Phonetics of English Intonation. Ph.D. dissertation, MIT. [published 1987 by IULC, Bloomington: Indiana University Linguistics Club.]
Poser, W. (1990). Evidence for foot structure in Japanese. Language 66, 78-105.
Sagisaka, Y. (1999a). Nihongo onin no jikanchoo seigyo to chikaku [Japanese sound duration control and perception]. Gengo, 51-56.
Sagisaka, Y. (1999b). Koopasu beesu onsei goosei: Onsei kagaku chishiki ni motozuku goosei shisutemu koochiku gijutsu no shin paradaimu [Corpus-based speech synthesis: A new paradigm for synthesis system building based on the knowledge in speech science]. Proceedings of Meeting of the Acoustical Society of Japan, September-October, 197-200.
Sato, Y. (1993). The durations of syllable-final nasals and the mora hypothesis in Japanese. Phonetica 50, 44-67.
Sato, Y. (1995). The mora timing in Japanese: A positive linear correlation between the syllable count and word duration. Bulletin of Phonetic Society of Japan 209, 40-53.
Sato, Y. (1996). The moraic status of syllable-final nasals in Japanese. Bulletin of Phonetic Society of Japan 212, 67-75.
Selinker, L. (1972). Interlanguage. International Review of Applied Linguistics 10, 209-30.

Selkirk, E. (1983). Phonology and Syntax: The Relation between Sound and Structure. Cambridge, MA: The MIT Press.
Selkirk, E. (1986). On derived domains in sentence phonology. Phonology Yearbook 3, 371-405.
Selkirk, E. (to appear). The prosodic structure of function words. In J. Martin & K. Demuth (eds.), International Conference on Bootstrapping from Speech to Grammar in Early Acquisition, Brown University. Hillsdale, NJ: Lawrence Erlbaum.
Shattuck-Hufnagel, S. & A. Turk. (1996). A prosody tutorial for investigators of auditory sentence processing. Journal of Psycholinguistic Research 25(2), 193-247.

Shibatani, M. (1990). Languages of Japan. Cambridge University Press.
Shibuya, Y. (1997). Differences Between Native and Non-native Speakers’ Realization of Stress-Related Duration and Pitch Patterns in American English. A qualifying paper in Georgetown University Linguistics Department.
Strange, W., O.-S. Bohn, S. A. Trent, M. C. McNair & K. C. Bielec. Context and speaker effects in the perceptual assimilation of German vowels by American listeners. In Proceedings of the Fourth International Congress on Spoken Language Processing (ICSLP 96). University of Delaware and A. I. du Pont Institute.
Sugito, M. (1982a). Nihongo Akusento no Kenkyu. Tokyo: Sanseido Shuppan.
Sugito, M. (1982b). Eibeijin oyobi nihonjin no hatsuwa ni okeru [I] oyobi [i] no onkyooteki tokuchyoo. [republished in Sugito, M. Nihonjin no Eigo (1996). Osaka: Izumi Shoin.]
Sugito, M. (1996). Nihonjin no Eigo. Tokyo: Izumi Shuppan.
Takeda, K., Sagisaka, Y., and Kuwabara, H. (1989). On sentence-level factors governing segmental duration in Japanese. Journal of the Acoustical Society of America 86, 2081-2087.
Tateishi, K. (1989). Theoretical implications of the Japanese musicians’ language. WCCFL 8, 384-398.
Todaka, Y. (1990). An Error Analysis of Japanese Students’ Intonation and Its Prosodic Analysis. MA thesis, UCLA.
Trask, R. L. (1996). A Dictionary of Phonetics and Phonology. London: Routledge.
Trubetzkoy, N. S. (1958). Grundzüge der Phonologie. Göttingen: Vandenhoeck and Ruprecht.
Tsujimura, N. (1996). An Introduction to Japanese Linguistics. Oxford: Blackwell.
Turk, A. & J. R. Sawusch. The domain of accentual lengthening in American English. Journal of Phonetics 25(1), 25-42.

Uchida, T. (1996). Chuugokujin Nihongo Gakushuusha ni Okeru Chyooon Sokuon Hatsuon no Chyookakutekininchi no Tokuchyoo: Gaikokujin no tame no Nihongo Onseikyooiku ni Okeru Tokushyuhaku no Mondai ni Kansuru Chyookakuteki Kiso Kenkyuu. Ph.D. dissertation, Nagoya University.
Ueyama, M. (1995). Phrase-final Lengthening and Stress-timed Shortening Effects in the speech of Native Speakers and Japanese learners of English. Unpublished MA thesis, UCLA.
Ueyama, M. (1996). Phrase-final Lengthening and Stress-timed Shortening in the speech of Native Speakers and Japanese learners of English. Proceedings of the International Conference on Spoken Language Processing, Philadelphia.

Ueyama, M. & S.-A. Jun. (1998). Focus realization in Japanese English and Korean English intonation. Japanese/Korean Linguistics Vol. 7, CSLI/Stanford University Press.
Ueyama, M. (1999). An experimental study of vowel duration in phrase-final contexts in Japanese. UCLA Working Papers in Phonetics 97, 174-182.
Umeda, N. (1975). Vowel duration in American English. Journal of the Acoustical Society of America 58, 434-45.
van Santen, J. P. H. (1992). Contextual effects on vowel duration. Speech Communication 11, 513-546.
Vance, T. (1987). An Introduction to Japanese Phonology. SUNY Press.
Venditti, J. (1995). Japanese ToBI Labeling Guidelines. Manuscript, Ohio State University.
Venditti, J. (2000). Discourse Structure and Attentional Salience Effects on Japanese Intonation. Ph.D. dissertation, Ohio State University.
Venditti, J. & J. van Santen. (1998). Modeling segmental durations for Japanese text-to-speech synthesis. In Proceedings of the 3rd ESCA TTS Workshop, Jenolan Caves, Australia (CD-Rom version).
Vihman, M. M. (1996). Phonological Development: The Origins of Language in the Child. Oxford: Blackwell.
Warner, N. & T. Arai. (submitted for a review). Japanese mora-timing: A review.
Watanabe, K. (1987). Sentence stress perception by Japanese students. Journal of Phonetics 16, 181-186.
Weinreich, U. (1953). Languages in contact. The Hague: Mouton.
Weitzman, R. (1969). Japanese Accent: An Analysis Based on Acoustic-Phonetic Data. Ph.D. dissertation, University of Southern California.

Wightman, C. W., Shattuck-Hufnagel, S., Ostendorf, M., and Price, P. J. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America 92, 1707-17.
