PLAGIAT MERUPAKAN TINDAKAN TIDAK TERPUJI

THE CHARACTERISTICS AND INTELLIGIBILITY OF ENGLISH PLOSIVES PRODUCED BY JAVANESE SPEAKERS

A THESIS

Presented as a Partial Fulfillment of the Requirements to Obtain the Magister Humaniora (M.Hum.) Degree in English Language Studies

Vincentius Tangguh Atyanto Nugroho

Student Number: 136332044

THE GRADUATE PROGRAM IN ENGLISH LANGUAGE STUDIES

SANATA DHARMA UNIVERSITY

YOGYAKARTA

2016


ACKNOWLEDGEMENT

First of all, allow me to express my gratitude to the Lord for all of His blessings to everyone. Have faith in Him for He will never let you go astray.

My admiration goes to all faculty members of English Language Studies. Their actions are exemplary. They have been considerate and helpful, and I was lucky to find myself in their good hands. The English Language Studies program was my second home for almost three years, and a home is not a home without Pak Mul’s warm smile every time we ran into each other.

Last but not least, I thank my family for being there for me. My wife, Christina Yulianti, has been wonderful – as always. Ara, Marend, and Rama – our kids – have been our biggest motivators. Though they wear us out on a daily basis, every single day spent with them is worth it.


TABLE OF CONTENTS

TITLE PAGE ………………………………………………………………………………. i

APPROVAL PAGE ………………………………………………………………………... ii

DEFENSE APPROVAL PAGE …………………………………………………………… iii

STATEMENT OF WORK ORIGINALITY ………………………………………………. iv

LEMBAR PERNYATAAN UNTUK PERSETUJUAN PUBLIKASI KARYA ILMIAH ... v

ACKNOWLEDGEMENT …………………………………………………………………. vi

TABLE OF CONTENTS …………………………………………………………………... vii

LIST OF TABLES …………………………………………………………………………. xi

LIST OF FIGURES ………………………………………………………………………... xii

LIST OF APPENDICES …………………………………………………………………… xiii

LIST OF ABBREVIATIONS ……………………………………………………………… xiv

ABSTRACT ………………………………………………………………………………... xv

ABSTRAK …………………………………………………………………………………. xvi

CHAPTER I INTRODUCTION …………………………………………………………… 1

A. Background of the Study …………………………………………………………... 1

B. Problem Limitation ………………………………………………………………… 5

C. Problem Formulation ………………………………………………………………. 6

D. Research Goals …………………………………………………………………...... 7

E. Benefits of the Study ……………………………………………………………….. 8

CHAPTER II THEORETICAL BACKGROUND ………………………………………… 9

A. Theoretical Reviews ……………………………………………………………...... 9


1. Language …………………………………………………………………… 1

2. Speech Organs ………………………………………………………………... 11

3. Speech Production ……………………………………………………………. 12

a. Speech Sounds Production as Proposed by Giegerich ……………………... 13

1) The Initiation Process …………………………………………………... 14

2) The Phonation Process …………………………………………………. 15

a) The Larynx ………………………………………………………….. 16

b) Modification to Airstream by the Larynx…………………………… 17

3) The Oro-Nasal Process …………………………………………………. 21

4) The Articulation Process ……………………………………………….. 22

a) The Vocal Tract ……………………………………………………... 22

b) Modification to Airstream by the Vocal Tract……………………… 23

5) The Interaction of the Four Sequential Events …………………………. 25

b. Speech Production Mechanism as Proposed by Ladefoged and Johnson ….. 26

1) The Airstream Process ………………………………………………… 26

2) The Phonation Process …………………………………………………. 28

a) Phonation of Javanese Plosives ……………………………………... 33

b) Phonation of English Plosives ………………………………………. 37

c) Distribution of Voicing ……………………………………………… 42

3) The Articulatory Process …..…………………………………………… 48

a) ……………………………………………… 48

b) ………………………………………………… 49


4. Brunelle’s Arguments Regarding Realization of Phonemic Contrast between Javanese Lax and Tense Plosives …………………………………………….. 52

5. Comparison and Contrast of English – Javanese Plosives …………………… 58

6. Intelligibility ………………………………………………………………….. 61

B. Theoretical Framework …………………………………………………………. 66

CHAPTER III RESEARCH METHODOLOGY ……………………………………….. 67

A. Data of the Study and Data Source …………………………………………… 67

B. Approach ……………………………………………………………………… 69

C. Method of the Study …………………………………………………………….. 69

1. Data Collection ………...…………………………………………………….. 70

2. Data Analysis ………………………………………………………………… 73

CHAPTER IV RESULTS AND DISCUSSION ……………………………………… 77

A. Results ………………………………………………………………………… 77

1. Onset F1 frequency …………………………………………………………... 77

2. VOT length …………………………………………………………………... 80

3. Duration of voiced sound preceding word-final plosives ……………………... 82

B. Discussion ………………………………………………………………………. 83

1. Parameters in the study ………………………………………………………. 83

2. Javanese informants ………………………………………………………….. 86

3. The characteristics of English plosives produced by Javanese informants ….. 90

a. English plosive in word-initial position ……………………………………. 91

1) Reduction in F1 frequency ……………………………………………... 92


2) Perceived aspiration ……………………………………………………. 98

3) Prevoicing ……………………………………………………………… 124

b. Duration of voiced sound preceding word-final plosives ………………….. 133

4. The plausibility of plosives produced by Javanese informants to be recognized as English ………………………………………………………… 140

CHAPTER V CONCLUSIONS AND RECOMMENDATIONS ……………………… 146

A. Conclusions …………………………………………………………………….. 146

B. Recommendations ………………………………………………………………. 148

REFERENCES …………………………………………………………………………. 150

APPENDICES …………………………………………………………………………. 154


LIST OF TABLES

Table 1. Ladefoged and Maddieson’s continuum of phonation types ………………. 32

Table 2. English-Javanese Plosives …………………………………………………. 60

Table 3. Range of F1 frequencies ………………………………………………….... 78

Table 4. Comparison of F1 frequencies at the onset of following word-initial voiceless-voiced plosives …………………………………………………... 79

Table 5. Comparison of VOT lengths of voiceless-voiced plosives ………………... 82

Table 6. Comparison of duration of voiced sound preceding final plosives ………... 83

Table 7. Description of F1 frequency of vowel following voiced plosive as compared to F1 of the same vowel following the corresponding voiceless plosive ..………….…………………………………………………………. 93

Table 8. Comparison of VOT length of word-initial voiceless-voiced plosives ……. 99

Table 9. Word pairs reflecting expected realization of contrast between voiced-voiceless plosive in word-initial position ………………………………….. 111

Table 10. Differences in duration of vowel occurring before word-final plosives …… 134


LIST OF FIGURES

Figure 1. [b̥ɑ] – [pɑ] ………………………………………………………………. 55

Figure 2. [d̪̥ɑ] – [t̪ɑ] ………………………………………………………………. 56

Figure 3. [ɖ̥ɔ] – [ʈɔ] ………………………………………………………………. 57

Figure 4. [g̥ɑ] – [kɑ] ………………………………………………………………. 57

Figure 5. Range of intelligibility levels …………………………………………. 68

Figure 6. VOT lengths of voiceless and voiced plosives ………………………… 81

Figure 7. Comparison of formants frequency reduction ………………………… 97

Figure 8. VOT of word-initial /p/ …………………………………………………. 116

Figure 9. VOT of word-initial /t/ …………………………………………………. 117

Figure 10. VOT of word-initial /k/ …………………………………………………. 119

Figure 11. The pair pull – bull by Mita ……………………………………………. 127

Figure 12. Gap in duration of vowel preceding word-final bilabials ………………. 139

Figure 13. Empirical perception of degree of intelligibility by Kristen and Mum …. 143

Figure 14. Mean values of intelligibility as perceived by Kristen ………………….. 144


LIST OF APPENDICES

Appendix 1. Initially-positioned /p/ and /b/ ……………………………………… 154

Appendix 2. Group 3: Finally-positioned /p/ and /b/ ……………………………. 155

Appendix 3. Group 4: Initially-positioned /t/ and /d/ …………………………….. 157

Appendix 4. Group 6: Finally-positioned /t/ and /d/ ……………………………… 159

Appendix 5. Group 7: Initially-positioned /k/ and /g/ ……………………………. 161

Appendix 6. Group 9: Finally-positioned /k/ and /g/ ……………………………… 162

Appendix 7. Gap in duration of vowel preceding final alveolars ………………… 164

Appendix 8. Gap in duration of vowel preceding final velars ……………………. 164

Appendix 9. General impression of levels of intelligibility as perceived by Kristen and Mum ……………………………………………………………… 164

Appendix 10. Levels of intelligibility as perceived by Kristen ……………………… 165

Appendix 11. Mita’s TOEFL Certificate …………………………………………….. 165

Appendix 12. Yudi’s TOEFL Certificate …………………………………………….. 166

Appendix 13. Doni’s TOEFL Certificate …………………………………………….. 166

Appendix 14. Nurul’s TOEFL Certificate ………………………..…………………. 166

Appendix 15. Adi’s TOEFL Certificate ………………………..…………………….. 167

Appendix 16. Nana’s TOEFL Certificate ………………………..…………………... 167


LIST OF ABBREVIATIONS

F1 : First Formant

VOT : Voice Onset Time


ABSTRACT

Vincentius Tangguh Atyanto Nugroho. 2016. The Characteristics and Intelligibility of English Plosives Produced by Javanese Speakers, Yogyakarta: The Graduate Program in English Language Studies, Sanata Dharma University.

The production of English sounds as an L2 is affected to some extent by transfer from the phonological rules of the L1 already firmly established in the speaker. In the case of Javanese speakers, their production of English indicates transfer from Javanese, a language of which they have already acquired native mastery. The study attempted to understand the acoustic characteristics of English plosives produced by Javanese speakers. The acoustic characteristics were described using three parameters: onset F1 frequency, length of VOT, and duration of the voiced sound preceding a word-final plosive. The study also attempted to understand the plausibility of words carrying plosives produced by Javanese informants being perceived as English. The study involved six Javanese informants with different previous training in, exposure to, and practice with English. Two raters who speak English as their first language were invited to give their perception of the degree of intelligibility of the words produced by the informants. The informants read word pairs from a list, and their recorded voices were emailed to the raters, who gave their perception of the plausibility of the utterances being perceived as English. The study revealed that English plosives produced by Javanese speakers were characterized by a reduction in F1 frequency following an English voiced plosive in most observed tokens. The study found that only 22.2% of all tokens featuring word-initial /p/ and /b/ were aspirated. Also, 52.4% of all tokens featuring word-initial /t/ were aspirated, while only 21.4% of tokens featuring word-initial /d/ were aspirated. Of all tokens featuring word-initial /k/, 86.1% were aspirated, and 94.4% of all tokens featuring word-initial /g/ were also aspirated. Data from the study showed inconsistencies in the duration of the voiced sound preceding a word-final plosive. Measurements in the study revealed that female informants variably started vibration of the vocal folds before the release of a voiced plosive in different tokens. These characteristics correlate with the raters’ perception of English plosives produced by Javanese speakers. Among the plosives in word-initial position observed in the study, raters found bilabials the hardest to recognize and alveolars the easiest. Raters also considered plosives in word-final position harder to recognize than plosives in word-initial position.

Keywords: VOT, F1, lax, tense, register


ABSTRAK

Vincentius Tangguh Atyanto Nugroho. 2016. The Characteristics and Intelligibility of English Plosives Produced by Javanese Speakers, Yogyakarta: Program Pascasarjana Kajian Bahasa Inggris, Universitas Sanata Dharma.

Adalah sebuah fakta bahwa produksi ujaran bahasa Inggris sebagai L2 sedikit banyak dipengaruhi oleh transfer aturan fonologis L1 yang sudah kokoh dikuasai pembuat ujaran. Dalam kasus penutur Jawa, produksi ujaran bahasa Inggris mereka menunjukkan transfer dari bahasa Jawa, yang telah dikuasai para penutur Jawa sebagai bahasa ibu. Penelitian ini berusaha memahami karakteristik akustik plosif bahasa Inggris yang dihasilkan oleh penutur Jawa. Karakteristik akustik dijelaskan menggunakan tiga parameter, frekuensi F1 onset, panjang VOT, dan durasi suara vokal yang muncul sebelum plosif di akhir kata. Penelitian ini juga berusaha untuk memahami plausibilitas ujaran yang mengandung plosif bahasa Inggris yang dihasilkan oleh informan Jawa. Penelitian ini melibatkan enam informan Jawa yang pernah menjalani pelatihan, paparan, dan praktek berbahasa Inggris yang berbeda-beda. Dua penutur bahasa Inggris diundang untuk menjadi penilai dengan memberikan persepsi terhadap tingkat kejelasan kata yang dihasilkan oleh informan dalam penelitian ini. Informan membacakan serangkaian pasangan kata dan suara mereka direkam dan diemailkan ke penilai yang selanjutnya memberikan persepsi mereka tentang plausibilitas pasangan kata sebagai ujaran berbahasa Inggris. Hasil penelitian menunjukkan bahwa karakteristik plosif bahasa Inggris yang diproduksi oleh penutur Jawa ditandai dengan penurunan frekuensi F1 onset yang mengikuti plosif bahasa Inggris bersuara di sebagian besar token yang diamati. Penelitian ini menemukan bahwa hanya 22,2% dari semua token yang menampilkan /p/ dan /b/ di awal ujaran yang mengalami aspirasi. Selanjutnya 52,4% dari semua token yang menampilkan /t/ di awal ujaran mengalami aspirasi dan hanya 21,4% dari token menampilkan /d/ di awal ujaran mengalami aspirasi. 86,1% dari semua token yang menampilkan /k/ di awal ujaran mengalami aspirasi, sementara 94,4% dari semua token yang menampilkan /g/ di awal ujaran mengalami aspirasi. 
Data dalam penelitian ini menunjukkan inkonsistensi dengan durasi suara vokal yang muncul sebelum plosif di akhir kata. Hasil pengukuran dalam penelitian ini mengungkapkan bahwa informan perempuan secara acak memulai getaran pita suara sebelum plosif bersuara dilepaskan. Karakteristik ini berkorelasi dengan persepsi penilai terhadap plosif bahasa Inggris yang diproduksi oleh penutur Jawa. Di antara semua plosif yang muncul di awal ujaran yang diamati dalam penelitian ini, penilai menganggap bahwa bilabial adalah yang paling sulit untuk dikenali dan alveolar adalah yang paling mudah. Penilai juga menyatakan bahwa plosif yang muncul di akhir ujaran lebih sulit dikenali daripada plosif yang muncul di awal ujaran.

Kata kunci: VOT, F1, lax, tense, register


CHAPTER I

INTRODUCTION

A. Background of the Study

In 2010 a Tanzanian and thirteen Indonesian graduate students sat in a class throughout a semester. Although English was not the first language (L1) of any student in the class, at times they exchanged emails and text messages in English among themselves, and communication went unimpeded. Each of them understood with ease what the others meant. This was not the case, however, when they engaged in verbal communication. Despite the fact that everyone in the class demonstrated fluency in oral English, the Indonesian students found it difficult to understand what the Tanzanian student said. Similarly, the Tanzanian student often had to strain his ears to catch the utterances of the Indonesian students. They displayed a similar mastery of English grammar, but their accents were so different that they seemed to be speaking two different languages. This phenomenon concurs with a claim by Escudero (2005, p. 1) that second language (L2) learners are characterized by their ‘foreign-accented speech’, which is a result of difficulties in mastering second language sounds.

Unlike bilinguals who acquire two (or more) languages at the same time during their childhood, L2 learners acquire skills in a new language after they have mastered their first language. In this process, termed sequential bilingualism, L2 learners start to learn a new language after they have achieved native competence in a first language (Fromkin, Rodman and Hyams 2011, p. 357). It is not rare for L2 learners to demonstrate flawless mastery of L2 grammar. Yet, as regards pronunciation, Collins and Mees (2003, p. 186) and Fromkin, Rodman and Hyams (2011, p. 361) agree that most L2 learners are bound to fall short of achieving native-like competence.

As evidenced by the development of ‘foreign-accented speech’, efforts to master a second language are often marred by problems in mastering L2 pronunciation. There are variations in the production of L2 sounds because of the influence of native languages (McMahon 2002, p. 97) in the form of interference from the phonemes, phonological rules, or syllable structures of the L1 (Fromkin, Rodman, and Hyams 2011, p. 363), which is also called negative transfer from L1 to L2 (Crystal 2008, p. 249). Accents influenced by negative transfer produce speech that varies from ‘standard’ English.

Behrens and Sperling (2010) assert that an accent is only variation at the pronunciation level and that different varieties of a language are equally effective means of communication. Though what is perceived as the ‘standard’ is just another variety of the language, many see ‘standard’ pronunciation as the ‘normal’ way to speak the language (pp. 16-17). Thus, failure to produce ‘standard’ or ‘unaccented’ pronunciation can be seen as a failure to produce formal and educated utterances (p. 15). Although everyone who speaks English speaks ‘with an accent’ (McCully 2009, p. 7), the ‘standard’ pronunciation offers a ‘high market value’ in places that require formality, such as schools and workplaces (Behrens and Sperling 2010, p. 17).

Additionally, production of English sounds that varies from the ‘standard’ pronunciation may cause irritation or amusement on the part of listeners, and even a breakdown of intelligibility (Collins and Mees 2003, pp. 186-187). In other words, less “accented” English sounds are expected to improve comprehensibility and acceptability. A good test bed for the significance of English sounds with a reduced likelihood of irritation, amusement, or disruption to intelligibility is an interaction between speakers who speak different Englishes. The earlier anecdote exemplifies this situation. While all students in the story speak the same language, they produce English with distinct “foreign accents”. The L1s of the respective speakers influence the English sounds they produce, and this hampers communication. To the Tanzanian student, whose mother tongue is Swahili, English is a second language. His Indonesian classmates, on the other hand, speak English that is used primarily as a foreign language in Indonesia.

As a foreign language in Indonesia, English has no significant number of first or second language speakers there (Seargeant 2012, p. 151). Therefore, most Indonesians learning English have inevitably already acquired mastery of their L1. This clearly takes its toll on the English sounds produced by Indonesian speakers: their pronunciation of English sounds is “foreign-accented”.

Any attempt to reduce the ‘foreign accent’ in the speech of English-learning Indonesians will require identification of the exact L1 whose phonemes and phonological rules influence their production of English sounds. Because of the multiethnic and multilingual nature of Indonesia, it is not easy to single out one language in Indonesia that is responsible for L1 influence on the production of English sounds by Indonesian speakers. Since Javanese is the language spoken by the majority of people residing in Yogyakarta, where the study was conducted, its phonemes and phonological rules were suspected to influence the production of English plosives by local people.


Yogyakarta Special Region is home to Javanese language and culture.

It is a province whose inhabitants speak Javanese as their L1. Javanese, the ethnic language of the Javanese people, is the mother tongue of people living in most parts of Java, certain areas in Sumatera and Kalimantan, Suriname, New Caledonia, and along the western coast of Johor (Wedhawati, Nurlina, Setiyanto, Sukesti, Marsono, and Baryadi 2010, p. 1). The study involved Javanese-speaking informants living in Yogyakarta city and its neighboring districts. Although it is not uncommon for people in the area to speak Indonesian on a daily basis, they speak Indonesian with an accent clearly different from, for example, the accent of local people in Jakarta or Flores. This hints at the influence of the prevailing local language, Javanese, on the Indonesian language. In other words, Javanese phonology and phonological rules influence the Indonesian that people speak locally. As Poedjosoedarmo (1982, p. 7) indicates, the influence of Javanese on Indonesian has come directly from the speech of Javanese people using Indonesian. The accent with which people in Yogyakarta speak reflects such direct influence. Therefore, the Javanese language can be a good starting point for discussing the negative transfer that contributes to accented English speech in the study.

In general, the study is an attempt to understand the acoustic characteristics of English plosives produced by speakers who have already acquired native mastery of Javanese phonology and phonological rules. The acoustic characteristics are described using the following parameters: onset F1 frequency, VOT length, and duration of the voiced sound occurring immediately before a final plosive. The study also seeks to reveal the intelligibility of English plosives produced by the informants in the study, who are all Javanese speakers. Following the assertion by Collins and Mees (2013, pp. 215-216) about significant errors that may hamper intelligibility, the study focuses on the oral members of stops, the plosives. These are produced by completely closing the articulators so that the airstream cannot escape through the mouth (Ladefoged and Johnson 2011, p. 14). These consonants are among the most common sounds occurring in different languages. In fact, about 98 percent of the world's languages have voiceless stops (Ladefoged 2001, p. 140). The study aims to reveal how Javanese plosives affect the pronunciation of English plosives. The informants of the present study were six Javanese adult learners of English who resided in Yogyakarta at the time of data collection. While all informants showed different degrees of influence of Javanese phonological rules, one person had the most marked Javanese accent in his pronunciation of English words and another was an English teacher who had been intensively and extensively trained in English. Three people, one Tanzanian and two North Americans, were invited to give their empirical perception of the intelligibility of the words produced by all six Javanese informants. While the Tanzanian national did not respond, the two US citizens, who speak English as their first language, agreed to participate in the study.
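As an illustration of the VOT parameter named above, the following minimal sketch computes voice onset time as the interval between the release of a plosive and the onset of vocal fold vibration. It is not the measurement procedure used in the study. The function names, the example time stamps, and the 30 ms boundary between short-lag and long-lag (aspirated) tokens are all assumptions made for illustration only.

```python
def voice_onset_time_ms(release_s: float, voicing_onset_s: float) -> float:
    """Return VOT in milliseconds.

    VOT is the time of voicing onset minus the time of the plosive release.
    A negative value means voicing started before the release (prevoicing).
    """
    return (voicing_onset_s - release_s) * 1000.0


def classify_vot(vot_ms: float, long_lag_threshold_ms: float = 30.0) -> str:
    """Label a token by its VOT.

    The 30 ms default threshold is a hypothetical, illustrative boundary,
    not a value taken from the study.
    """
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms >= long_lag_threshold_ms:
        return "long lag (aspirated)"
    return "short lag (unaspirated)"


if __name__ == "__main__":
    # Hypothetical time stamps (in seconds) read off a waveform or spectrogram.
    tokens = {
        "token A": (0.100, 0.165),  # voicing starts well after the release
        "token B": (0.100, 0.110),  # voicing starts shortly after the release
        "token C": (0.100, 0.080),  # voicing starts before the release
    }
    for label, (release, voicing) in tokens.items():
        vot = voice_onset_time_ms(release, voicing)
        print(f"{label}: VOT = {vot:.1f} ms, {classify_vot(vot)}")
```

In this sketch a negative VOT corresponds to prevoicing, while a long positive VOT corresponds to what a listener is likely to perceive as aspiration.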

B. Problem Limitation

On significant errors that may lead to a breakdown of intelligibility, Collins and Mees (2013, pp. 215-216) have claimed that Indonesian people have trouble with aspiration and final voiced-voiceless stops when speaking English. Following this claim, the study does not attempt to discuss all possible speech sounds. Instead, the study focuses only on oral stops, or plosives, known and discussed in earlier works. Due to financial and time constraints, this phonetic study limits itself to plosives occurring in word-initial and word-final positions. Plosives occurring within a cluster are outside the scope of the present study. The study also leaves out two sets of plosives. It omits the glottal stop because its status as an English phoneme is questionable (Ladefoged and Johnson 2011, p. 38), and it omits the subtype of plosives which are released more gradually than usual (McMahon 2006, p. 43), the affricates.

C. Problem Formulation

To produce any speech sound, no matter how varied, humans use their articulators to modify a moving airstream as it travels through the vocal tract. Different articulatory modifications result in different speech sounds. The modified airstream leaving the speaker’s mouth bears a particular pattern that reflects the articulatory gestures that produced it (Ashby 2011, p. 52). In turn, this modified, pattern-bearing, moving airstream triggers pressure fluctuations in the surrounding air (Johnson 2003, p. 3). The pressure fluctuations strike a hearer's eardrum and cause it to move along with the fluctuations. The auditory system of the hearer transforms these movements into neural impulses. The hearer's brain recognizes the pattern of neural impulses and translates it back into the sound produced by the speaker. The different speech sounds that the hearer perceives relate to different articulatory modifications within the speaker’s vocal tract. In other words, the speech sound the hearer perceives under normal conditions (the hearer has no hearing impairment, and disrupting sounds or elements such as static, rain, or storms are minimal) reflects how the articulators are used to modify the airstream.

To understand how a speaker of a certain variety of English perceives English plosive sounds produced by a native speaker of Javanese, there is a need to reveal what modification to the airstream happens within the vocal tract of a native speaker of Javanese. To arrive there, it is first important to understand the production of Javanese and English plosives by their respective native speakers. A comparison and contrast should be made to see how plosives in Javanese and English are similar and in what manner they are different.

Second, after Javanese and English plosives have been compared and contrasted, an attempt should be made to find out in what respects Javanese plosives influence the production of English plosives by native speakers of Javanese. Lastly, there is a need to discover how such influence may affect intelligibility.

To summarize, the study aims to answer the following questions:

1. What are the acoustic characteristics of English plosives produced by speakers of Javanese as a native language?

2. How plausible are plosive-carrying English words produced by Javanese speakers to be perceived as English?

D. Research Goals

To answer the questions presented above, the study first sought to discover the acoustic characteristics of English plosives produced by speakers who had already acquired native mastery of Javanese phonological rules. To attain that, it was necessary to understand the similarities and differences between Javanese and English plosives. After this comparison and contrast was made, English plosives produced by the informants were analyzed to identify features of Javanese plosives that may affect the production of English plosives.


Second, the study aimed to discover the plausibility of plosive-carrying words spoken by Javanese speakers to be perceived as English. The two English-speaking raters gave their perception regarding the intelligibility level of English words carrying word-initial, word-medial, and word-final plosives. The words were spoken by Javanese informants in the study.

E. Benefits of the Study

After reading this work, learners of English who happen to be native Javanese speakers are expected to understand how to fine-tune their production of English plosives. The idea is not to attain a close imitation of how speakers of English as a first language sound their plosives. Instead, native Javanese speakers who learn English are expected to be able to improve the intelligibility of the words that they produce. Though the difference between the two motives can be subtle, something can be said about the latter. Native Javanese speakers who learn English and read this work will have a clear idea about two things. Firstly, they will understand what is needed to produce plosive-carrying English words more clearly, and thus be able to make themselves better understood. Secondly, they will be aware that, since the phonetic influence of Javanese plosives is to a certain degree inescapable, a perfect imitation of speakers of English as a first language is not a goal that they need to attain.


CHAPTER II

THEORETICAL BACKGROUND

A. Theoretical Reviews

Every day we hear sounds. There are sounds of cars whooshing, birds chirping, or clocks ticking. These sounds may continue for hours at a variety of frequencies and volumes, yet they may go unnoticed unless a meaning is assigned to them. The sound of a car honking in the driveway, for example, may send a significant message to a person who has been waiting for a pick-up, but it may mean nothing to a random bystander. The same honking sound may mean one thing to one person, but something else, or nothing at all, to another person. This is different from sounds that have been arbitrarily laden with meaning, the sounds of a language (Fromkin, Rodman, and Hyams 2011, p. 37). Speakers of the same language generally have a similar idea about the possible meanings of a word in their language.

1. Language

It is hard to imagine how we would do things in life without language. We order our coffee, ask for less sugar, and leave a comment on a friend’s Facebook page while pondering the wittiest thing to say, all using language. We experience and use language all the time. Even when we talk to ourselves, we use language in our minds. We are so accustomed to language that we often take it for granted and simply define language as a way for us to communicate.

However, to say that for humans a language is a means of communicating is a short-sighted opinion, because animals obviously do communicate. This is especially true during their mating season. They are capable of signaling their presence and attracting the opposite sex using sound, color, and odor. A pride of lions is able to orchestrate a coordinated ambush, and a group of killer whales can work together to bring down their marked prey. The fact that we do not have an exhaustive account of animal communication does not justify the claim that only humans know how to communicate. Having observed bees, Mangum (2010, p. 272) points out that there are features of human language present in some aspects of the bee communication system. This, of course, may raise the question of whether our more developed brain and speech organs mean nothing because we are not different from animals. To circumvent the situation, what we can do is formulate a definition of human language that sets our species apart from other creatures.

Bussmann (2006) argues that human language is a means of expressing and receiving thoughts and ideas, of teaching, and of passing on wisdom (p. 627). She also specifically suggests that language is a phonetic-based system to encode and transmit objects and events which may not exist at the moment or locus of speaking (p. 62). Language, in her view, consists of meaningful and systematic sounds produced by humans. This fits Davenport and Hannahs’ claim (2010, p. 7) that the most frequent medium through which most of us experience language is sound. Jackson (2002, p. 1) concurs that it is human speech that most of us experience as language most of the time.

The language that most of us experience most frequently, human speech, is a series of noises created inside the throat, mouth, and nasal passages that flows rapidly out from the mouth and sometimes the nose (Akmajian, Demers, Farmer, and Harnish 2001, p. 66). The throat, mouth, nasal passages, and the body parts inside them that are used for speech sound production are called speech organs. In producing speech sounds, we make gestures or movements of the tongue and the lips to form particular sounds (Ladefoged and Johnson 2011, p. 2). We also use other body parts in speech production. For example, we contract and relax the muscles controlling the lower lip so that we can spread or round the lips to change the quality of vowels. We make these movements so quickly and precisely that our speech organs are considered more complex than those of other mammals (Mullany and Stockwell 2010, p. 2). Humans would experience completely different speech sounds if their speech organs were not this complex.

2. Speech Organs

It is interesting to examine what is implied by the statement that human speech organs are more complex than those of other mammals. To start with, let us observe what we can do with these organs. Humans have been exploiting these organs to produce language. They use specific parts of the mouth, the articulators, to shape different sounds. Variation in the position and movement of the tongue and lips – the most important articulators (Mullany and Stockwell 2010, p. 4) – results in a variety of sounds.

Interestingly, our speech organs are not designed specifically for speaking. Instead, their main functions are related to the respiratory system and the processing of food (Gussenhoven and Jacobs 2011, p. 15). We use our mouth to channel food into our body, our teeth to grind food, our tongue to taste and swallow food, and our lungs to pump air in and out of our body so that we can stay alive. Yet, we can use these organs for another purpose, one that is equally important but not their natural one. We use these body parts to make speech sounds that other people can hear.

3. Speech Production

Speech sounds need to be audible so that they can be heard and recognized. Because speech sounds lose energy as they travel through air (Johnson 2003, p. 4), we need to produce them loudly and clearly enough, and within the frequency range that humans can hear. Fischer (2001, p. 11) claims that an average 15-year-old person can hear sounds within the range of 30 to 18,000 hertz at the loudness and closeness of normal conversation. Frequencies below or above that range are not only hard to hear but also difficult to produce using human speech organs.

To produce any kind of sound we need a moving body of air, or airstream. To produce speech sounds, however, we need to ensure that an airstream is available whenever we need it. To ensure this, we use our own respiratory system to initiate the airstream.

Since the respiratory system is what enables breathing, talking is a form of modified breathing. When we talk we continue to exchange oxygen and carbon dioxide as in normal breathing, but we alter our normal breathing rhythm. We exhale a much greater volume of air and deliberately impede the flow of air in the throat and mouth. In quiet breathing – when we breathe normally without talking – the phases of inhalation and exhalation are more or less equal in length. However, when we talk we inhale more briefly and exhale for longer. The longer the utterance we produce in a single attempt, the longer the phase of exhalation. As a result we breathe out three to four times as much air as during normal breathing.

Not only do we modify the volume of air for breathing, but we also alter the flow of air when speaking. We breathe less frequently and obstruct the flow of air in the throat and mouth when we speak (McMahon 2002, pp. 24-25; Akmajian et al. 2001, p. 67). What is marvelous is that when we speak we do not cause ourselves discomfort while we continue to modify the volume and direction of the airstream using various parts of our respiratory system, an observation with which Davenport and Hannahs concur (2010, p. 7).

Davenport and Hannahs go on to explain that, in addition to modifying the volume and direction of the airstream, we also set the airstream – our speech – vibrating, which disturbs the surrounding air. The air vibrates along with the speech sounds and forms sound waves. The sound waves spread through the air and reach the hearer’s ears. The brain of the hearer processes the sound waves and interprets them as human speech.

Our sense of hearing and our brain may receive and interpret different sounds at the same time. For example, a person without hearing impairment hears a passing car, a buzzing washing machine, and a dialog on TV simultaneously. This means that all of these sounds equally disturb the surrounding air. Disturbed air ripples and travels to a person’s ears as sound waves. It is the person’s brain that filters the sound waves and interprets those carrying human speech as speech.

a. Speech Sounds Production as Proposed by Giegerich

Sound waves transmitting human speech, Giegerich (1992, pp. 1-7) comments, start from the lungs, flow upwards through the trachea and vocal tract, and undergo modifications along the way. Speech sounds, Giegerich continues, are the products of the interaction among four different processes. To produce speech, first, a stream of air is pushed out of the lungs. Second, the airstream flows through the larynx and undergoes modifications. Third, the airstream may go through either the oral cavity or the nasal cavity, and each route has different consequences. Finally, the airstream is modified further by the position and movement of anatomical structures of the vocal tract.

This view is shared by other linguists, including Akmajian et al. (2001, pp. 66-71) and Odden (2005, p. 13). The latter adds that the geometry of the vocal tract further modifies the airstream by emphasizing certain frequencies while suppressing others. Presented below are the four sequential and interacting events in speech sound production, as suggested by Giegerich (1992, pp. 1-7).

1) The Initiation Process

The first event in speech sound production, according to Giegerich (1992, pp. 1-7), happens when a stream of air is pushed out of the lungs. The longer we talk, the more air is pushed out (McMahon 2002, p. 24). In fact, when we speak, we let out three to four times more air than when we breathe normally without talking (Akmajian et al. 2001, p. 67). Breathing out necessitates breathing in. Giegerich (1992, p. 1) and Akmajian et al. (2001, p. 67) agree that to breathe in, we contract our diaphragm, a sheet of muscular tissue that separates the chest cavity from the abdominal region. The contracted diaphragm flattens out and, as a result, the chest cavity expands. An increase in the size of the chest cavity can also be achieved by lifting our ribs through contraction of the intercostal muscles (Giegerich 1992, p. 1), a group of muscles between the ribs in the rib cage (McMahon 2002, p. 25). Our lungs, as they are attached to the walls of the chest cavity, expand along with the chest cavity. As the lungs expand, air flows in.

The air keeps filling the lungs until inhalation is complete. After inhalation, the chest cavity relaxes and the lungs begin to shrink again (Akmajian et al. 2001, pp. 67-68). The air is discharged until exhalation is complete.

During quiet breathing – breathing without talking – the air comes out quickly at first and slows down toward the end. This is similar to what happens after we inflate a bicycle tire using a pump. As soon as we pull the pump hose off the tire, we hear a loud hissing sound that quickly fades. If this happened when we talked, our speech would be loud at the beginning of an utterance and become quieter toward the end.

Luckily, we can avoid sounding like a released pump hose by modifying our breathing cycle. We adapt our breathing cycle to suit the needs of speech (Akmajian et al. 2001, p. 67). When we talk, the muscles of the diaphragm and the rib cage continue to be active to make sure that the lungs do not empty too rapidly. As a result, we have more control over the dynamics of our speech.

2) The Phonation Process

The second event in speech sound production, according to Giegerich (1992, pp. 2-3), takes place at the larynx. The air travels into and out of the lungs through a pipe-like structure called the trachea. This is an important access pipe that forms part of the air passage from the nose to the lungs. The trachea is equipped with its own safety valve to block intrusion by foreign objects during swallowing (Gussenhoven and Jacobs 2011, p. 16). Such a valve is needed because the passage from the lungs to the nose intersects the food passage from the mouth to the stomach.

Ladefoged (2001, p. 99) argues that this is not without good reason. When we need to run quickly, we need more oxygen. Because the two passages cross each other, simply breathing with our mouth open will increase oxygen intake. These intersecting air and food passages, Ladefoged (2001, p. 99) warns, come with a risk. Food may accidentally enter the wrong passage instead of travelling down the esophagus or food passage. It is the task of the epiglottis, a small flap which closes over the larynx during swallowing, to direct food into the esophagus (Crystal 2008, p. 171).

a) The Larynx

The trachea itself, as previously mentioned, is equipped with a safety valve, the larynx. The trachea is a cartilage casing (Giegerich 1992, p. 2) constructed from a series of superimposed, incomplete rings of cartilage (Ashby 2011, p. 15). The very top cartilage, Ashby continues, is a complete ring known as the cricoid cartilage. This cartilage provides attachment for three cartilages that are responsible for opening and closing the airway. Those cartilages, the thyroid cartilage and the two arytenoid cartilages, are found above the cricoid cartilage, across from each other. From a linguistic point of view, these three are the important cartilages that make up the larynx (Ashby 2011, p. 16).

The thyroid cartilage forms the front wall of the larynx. Its outer part is easily seen, especially in men, as the Adam’s apple (Giegerich 1992, p. 2). Its inner surface, however, is more important. Ashby (2011, p. 16) explains that the thyroid cartilage provides the anchor point for the front ends of two muscular bands of tissue. While the front ends of these bands are secured to a single cartilage, the back ends are anchored to two separate cartilages that form the rear wall of the larynx, the arytenoid cartilages. The arytenoid cartilages, Ashby (2011, p. 16) explains, are a matching pair of small cartilages resting on the cricoid cartilage ring, across from the thyroid cartilage. These two muscular bands of tissue, Gussenhoven and Jacobs state (2011, p. 16), can be drawn tightly together and act as a valve that shuts off the trachea. Gussenhoven and Jacobs add that the main function of these two bands is to seal off the trachea when we chew food, preventing saliva, food particles, or other foreign objects from dropping into the lungs. However, it is not their role in swallowing that gives them their name. Instead, they are called the vocal folds because of the major role they play in speech production. It is at the larynx that the airstream coming from the lungs is modified.

b) Modification to Airstream by the Larynx

When we produce speech, we modify the airstream flowing from the lungs by exposing it to a controlled resistance at the larynx. We create different resistances by varying the position of the folds. Being elastic and flexible, the folds can be pulled apart or drawn closer together. This varies the glottis, the aperture or opening between the folds.

Variation in resistance can also be made by applying different tension to the folds. When more tension is given to the folds, their frequency of vibration goes up resulting in higher perceived pitch. Difference in perceived pitch is also a result of variation in the size of the folds. Larger folds will result in lower pitch. This explains why the pitch of voices of most adult males – who possess relatively bigger vocal folds – is relatively lower than that of females and children.

Vaissiere (1997, pp. 115-116) asserts that there are four ways to control the larynx, each resulting in modifications to the vocal folds. First, the vocal folds are modified by varying the subglottal (subglottic in Vaissiere’s term) pressure and the pressure across the glottis. Heightened subglottal pressure, according to Vaissiere, helps to produce focus and prominence. If subglottal pressure is too low, vibration of the vocal folds will fail to occur (Ladefoged and Maddieson 1996, p. 49).

Second, the length of the vocal folds can be directly controlled. The more tension is given to the folds, the longer the folds become and the higher the pitch goes. On the contrary, when less tension is given to the folds or when the folds are shortened, their frequency of vibration goes down resulting in lower perceived pitch.

Third, the glottis can be finely adjusted. Muscles tied to the arytenoid cartilages move the cartilages in various directions. These movements cause the vocal folds to be in open (abducted) or closed (adducted) position (Ashby 2011, p. 16).

Fourth, the whole larynx can be raised or lowered. Honda, Hirai, Masaki, and Shimada (1999) observe that as the larynx moves vertically up and down, it also rotates, which modifies the angle between the cricoid and thyroid cartilages. Since the ends of the vocal folds are anchored to the thyroid cartilage and the arytenoid cartilages (which sit on the cricoid cartilage), changes in the relative position of the cricoid and thyroid cartilages affect the degree of stretching of the vocal folds. The lower the larynx goes, the larger the rotation of the cricoid cartilage, and the more relaxed the vocal folds become. Shortened or relaxed vocal folds produce lower pitch (F0). In addition, upward movement of the larynx decreases the volume of the supraglottal cavity (Vaissiere 1997, pp. 115-116), and downward movement of the larynx expands the cavity. The increase or decrease of the cavity between the glottis and the point of maximum constriction between the tongue and the palate (Brunelle 2010, pp. 9-10) modifies the acoustic output of the vocal tract (Odden 2005, p. 13). The relationship between the acoustic output of the vocal tract and the vocal folds is comparable to that between the resonator of a guitar and its strings.

When the strings are played, they vibrate and produce sound. The vibration of the strings causes the air inside the resonator to vibrate along with them at a frequency similar to the frequency of the strings. This resonance amplifies the sound originally produced by the strings and makes it louder.

The vocal folds are to the vocal tract what the guitar strings are to the resonator. When the vocal folds vibrate, they produce sound at a certain frequency – a value that can be measured physically (Collins and Mees 2013, p. 33). The frequency at which the vocal folds vibrate is called the fundamental frequency or F0 (Gussenhoven and Jacobs 2011, p. 18), which is also known as pitch (Ladefoged 2001, p. 75). Changes in the frequency of vocal fold vibration affect intonation and tone (Collins and Mees 2013, p. 33).

As the vocal folds vibrate, the vibration sets the air inside the vocal tract resonating. The vocal tract resonates in a particular fashion depending on the shape of the tract (Davenport and Hannahs 2010, p. 61). The shape of the human vocal tract is so complex that it enables multiple resonance frequencies to be produced at once. For example, the bodies of air behind and in front of a raised tongue may vibrate at different frequencies (Ladefoged 2001, p. 33). These resonance frequencies combine to form only a few dominant frequency groups called formants. In other words, formants are the dominant resonance frequencies of the vocal tract (Ladefoged 2001, p. 33), whose frequencies are above the fundamental frequency or F0 (Davenport and Hannahs 2010, p. 61). Manipulation of the vocal tract enables prominence of certain formant frequencies and suppression of others (Odden 2005, p. 13).

Different arrangements of formants produce different vowels, which makes formants the identifying characteristics of vowels (Ashby 2011, p. 94). The description of a vowel is based on three main formants whose frequencies are above the fundamental frequency: the first formant (F1), the second formant (F2), and the third formant (F3) (Crystal 2008, p. 196). The first formant (F1) is the lowest resonance (Ladefoged 2001, p. 33), while F2 and F3 are higher. These formants are associated with vowels because their pattern is typical across speakers of the same language (Davenport and Hannahs 2010, pp. 61-62). Comparing the frequencies at which F1, F2, and F3 occur enables a systematic description of the physical properties of vowels (Odden 2005, p. 10). Although their actual frequencies may vary, the pattern of F1, F2, and F3 remains consistent from speaker to speaker (Davenport and Hannahs 2010, p. 61). Each vowel in a language has typical formant values, and a similar vowel in another language may have different typical values. A vowel produced with formant values different from the typical values in a language may be perceived as ‘foreign-accented’ (Davenport and Hannahs 2010, p. 71). To attain the typical formant values, the shape and size of the vocal tract cavity need to be modified (Ladefoged 2001, p. 33).

As in a stringed musical instrument, the bigger the resonator, the lower the resonance of the vibration of the strings. The resonance of a bass is lower than that of a violin. This means that modifying the resonator will change the resonance frequencies. By analogy, changes to the shape and size of the vocal tract cavity will result in modifications to formant frequencies. The longer or bigger the cavity of the vocal tract, the lower the formants (Odden 2005, p. 13), and vice versa. When a speaker lowers his or her larynx, s/he creates a longer supraglottal cavity, which lowers the resonance frequency. To conclude, vertical movement of the larynx can modify the pitch (F0) and the formants because such movement changes the length of the vocal folds and the size of the vocal tract.
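The inverse relation between cavity length and resonance frequency can be illustrated with a small calculation. The sketch below is not taken from the sources cited above; it assumes the common textbook idealisation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c / 4L. The function name, the speed of sound (35,000 cm/s), and the tube lengths are illustrative assumptions only.

```python
# Minimal sketch, assuming a uniform tube closed at one end (glottis)
# and open at the other (lips). Real vocal tracts are not uniform tubes,
# so these numbers are illustrative, not measured formant values.

SPEED_OF_SOUND_CM_PER_S = 35000.0  # assumed approximate value


def tube_resonances(length_cm, count=3, speed=SPEED_OF_SOUND_CM_PER_S):
    """Return the first `count` resonance frequencies (Hz) of the tube.

    For a quarter-wavelength resonator the n-th resonance is
    (2n - 1) * c / (4 * L).
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * speed / (4.0 * length_cm) for n in range(1, count + 1)]


# A hypothetical 17.5 cm tract versus one lengthened to 20 cm,
# as might happen when the larynx is lowered.
shorter = tube_resonances(17.5)
longer = tube_resonances(20.0)
print(shorter)
print(longer)
```

Under these assumptions every resonance of the longer tube is lower than the corresponding resonance of the shorter one, which is the pattern the paragraph above describes for a lowered larynx.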

The exploitation of the vocal folds to create sound – phonation – is crucial in speech production. Phonation modifies the airstream from the lungs to create the sound that becomes a basis for speech (Gussenhoven and Jacobs 2011, p. 17). The vocal folds play a central role in phonation because, as previously mentioned, they vibrate. To be more precise, they can be set to vibrate.

The vocal folds vibrate because of the pressure from air pushed out of the lungs. The air presses forward through the glottis and blows the folds apart. Because air pushed through a narrow gap – in this case the glottis – undergoes a drop in pressure, the reduced air pressure creates suction that acts on the folds. This suction effect is known as the Bernoulli Effect (Ashby 2011, p. 20). It is a physical effect which causes pressure to drop to a minimum at points where there is a high flow of gases or liquids (Gussenhoven and Jacobs 2011, pp. 17-18). The suction causes the folds to collapse together rapidly, ready to be blown apart again. This cycle of quick, repeated closing of the folds sets the air in the throat and mouth into vibration. As a result, a sound wave is produced, and this is what we perceive as the sound of the voice (Ladefoged 2001, pp. 19-21), also called normal or chest voice (Ashby 2011, pp. 18-20).
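The pressure drop described above can be written out using Bernoulli's principle. What follows is a sketch under idealised assumptions that do not come from the cited sources: steady, incompressible, frictionless airflow, with subscript 1 standing for the airway just below the glottis and subscript 2 for the glottal constriction. Here A is cross-sectional area, v is air speed, p is pressure, and ρ is air density.

```latex
\begin{align}
A_1 v_1 &= A_2 v_2 && \text{(continuity)} \\
p_1 + \tfrac{1}{2}\rho v_1^2 &= p_2 + \tfrac{1}{2}\rho v_2^2 && \text{(Bernoulli)} \\
p_1 - p_2 &= \tfrac{1}{2}\rho v_1^2\left[\left(\frac{A_1}{A_2}\right)^2 - 1\right] > 0
  && \text{when } A_2 < A_1
\end{align}
```

Because the glottal opening is narrower than the airway below it, the air speeds up as it passes through and the pressure there falls, which is the suction that helps to draw the folds back together.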

3) The Oro-Nasal Process

In the third event of speech sound production, the airstream coming through the glottis may go into either the nasal or the oral cavity (Giegerich 1992, p. 3). For most sounds in English (McMahon 2002, pp. 26-27) and Javanese (Wedhawati et al. 2010, p. 96), the airstream from the lungs is pushed up through the trachea into the larynx. When the velum or soft palate is pressed against the back of the pharynx (Giegerich 1992, p. 5), the airstream goes further into the mouth and undergoes modifications before it is released through the lips. Because the modified airstream finds its outlet through the mouth area (Crystal 2008, p. 343), the sound produced this way is called an oral sound. When the soft palate is lowered, Crystal (2008, p. 320) adds, the airstream escapes audibly through the nose. Speech sounds articulated this way are called nasal. Because all plosives are oral sounds, a discussion on nasal sounds is not pursued further.

4) The Articulation Process

The airstream emerging from the glottis receives more modification along the vocal tract (Akmajian et al. 2001, p. 70). The sound wave that we hear as human voice is not merely a result of phonation at the larynx. There are more modifications to the airstream that happen within the vocal tract, an air passage above the larynx.

a) The Vocal Tract

The vocal tract is the region in which the speech sounds of human language are produced. Our vocal tract consists of the oral tract and the nasal tract. The oral tract comprises the inside of the pharynx and the mouth. The pharynx is the vertical part of the air passage that spans from the larynx to the velum. The front wall of the pharynx, as Gussenhoven and Jacobs (2011, p. 23) state, is actually the root of the tongue. While the oral tract is the space within the mouth and the pharynx, the nasal tract, as the name suggests, is the area within the nose.

Ladefoged and Johnson (2011, p. 4) explain that the design of the vocal tract provides two outlets for the airstream coming from the larynx. The tract allows air to flow out through the nose or the mouth. This is made possible by the movement of a flap at the back of our mouth – the velum – that can be lowered or raised to direct the flow of air pushed from the lungs. When the velum or soft palate is raised, it presses against the back wall of the pharynx, creating a velic closure that shuts off the nasal tract. This gesture forces the airstream to travel through the only outlet temporarily available within the vocal tract, the mouth. Conversely, when it is lowered, it obstructs the oral tract and directs the airflow to travel through the nose.

b) Modification to Airstream by the Vocal Tract

Ladefoged and Johnson (2011, p. 4) argue that the shape of the vocal tract is important in speech production. This is consistent with similar claims in earlier works by Ladefoged (2001, p. 2) and Odden (2005, p. 13), which state that the shape of the vocal tract determines acoustic output. By varying the shape of the vocal tract, certain sound wave frequencies are emphasized while others are suppressed. Such modifications are made possible by the position and movement of parts of the vocal tract termed articulators. These articulators are anatomical structures of the vocal tract that can be used to form sounds. Placing an articulator in a particular position changes the shape of the vocal tract. The airstream going through the vocal tract undergoes modification as the configuration of articulators within the vocal tract shifts (much as different ways of squeezing a water hose will vary how the water discharges).

Humans have a number of articulators. While emphasizing the prominent role of the tongue and lips as articulators, Mullany and Stockwell (2010, p. 4) add the teeth, alveolar ridge, hard palate, soft palate or velum, uvula, and vocal folds to the list of human articulators. None of the listed articulators is found within the nasal tract. Though the airstream may be modified when the air flows through the nose, all the above articulators are located within the oral tract. This includes the velum, which is responsible for sending the airstream to the nose. In terms of their location along the oral tract, different articulators form the lower and the upper surfaces of the oral tract.

Forming the lower surface of the vocal tract are the active articulators. The name points to the nature of these anatomical structures; they are highly mobile and active. One of the active articulators is the part that you protrude the most when you pout, the lower lip. Of the two soft edges at the opening of the mouth, the lower lip is more active and movable than its upper counterpart. Playing an equally important role as an active articulator is the tongue. For convenience of discussion in many works, the tongue is divided into a number of sections. Though the division does not imply clear-cut boundaries between sections, it indicates the different regions of the tongue: the tip, blade (the zone immediately behind the tip), front (the part opposite the hard palate), back (the part opposite the soft palate), and root. Together, the front and the back of the tongue are referred to as the body (Davenport and Hannahs 2010, p. 11) or the dorsum (Gussenhoven and Jacobs 2011, p. 23). While the lower lip and the tongue are agile, the articulators forming the upper surface of the vocal tract are more passive.

Davenport and Hannahs (2010, pp. 11-12) and Ladefoged and Johnson (2011, pp. 8-9) describe the passive articulators as the non-movable parts, which include the upper lip, the upper teeth, the roof of the mouth, and the back wall of the pharynx. The roof of the mouth is divided further into smaller regions: the alveolar ridge (a small bony protuberance behind the front teeth), the hard palate (the hard bony area behind the alveolar ridge), the soft palate or velum (the soft fleshy part behind the hard palate), and the uvula (the lower end of the soft palate). It is not surprising that most articulatory contacts and the most drastic changes to the shape of the vocal tract happen inside the mouth (Gussenhoven and Jacobs 2011, p. 23), because most of the passive articulators are found inside it.

5) The Interaction of the Four Sequential Events

To produce speech, the active articulators move towards the passive articulators. By varying the position of the tongue or modifying the shape of the lips, sections of the vocal tract are expanded or contracted. This varies the resonances when air flows through the tract (Odden 2005, p. 13). As a result, the body of air from the lungs – which has been modified at the larynx prior to entering the vocal tract – flows out of the tract with varied resonances. This airflow causes disturbances in the surrounding air, which travel to a hearer’s ears as sound waves.

In each of the hearer's ears, a narrow tube runs from the outer ear to the inner area of the ear. Down this narrow tube, or auditory passage, is a thin membrane, the eardrum. When sound waves initiated by speech sounds travel down the auditory passage, the waves cause the eardrum to move with them. The eardrum moves back and forth as the waves come and go. The vibration of the eardrum is transmitted to the liquid in the inner ear, which vibrates along with the movement of the eardrum. The vibration of the liquid stimulates nerves connecting the liquid to the auditory sensation area of the brain. The stimulated nerves transmit the vibration to the brain. Finally, the brain translates the vibration back into sounds again, which we understand as the sensation of hearing speech sounds (Ladefoged 1996, pp. 1-2).

b. Speech Production Mechanism as Proposed by Ladefoged and Johnson

Similar to Giegerich, Ladefoged and Johnson propose four interrelating but separate processes in speech production mechanism. In general, description of speech sound production by Ladefoged and Johnson (2011, pp. 5-6) concurs with what has been proposed by Giegerich (1992, pp. 1-7). Ladefoged and Johnson, however, offer more updated information concerning mechanism to produce human speech and provide more explanation regarding types of airstream mechanism.

Ladefoged and Johnson (2011, pp. 5-6) propose a straightforward and inclusive description of airstream mechanism that leaves room for a discussion on speech sounds produced with ingressive airstream mechanism. They divide speech production mechanism into four main components: the airstream process, the phonation process, the oro-nasal process, and the articulatory process. A discussion on oro-nasal process, however, will not be carried out further in the study since plosives are oral sounds.

1) The Airstream Process

Crystal (2008, p. 18) states that the airstream mechanism is a physiological process that provides an energy source for speech sound production. In other words, an airstream mechanism is needed to generate the energy to produce sound waves that carry human speech sounds. Crystal asserts that the energy for speech sound production in the majority of languages is generated by the outward movement of a body of air initiated by the lungs. The outward movement of air out of the lungs, up the trachea, and through the glottis is known as egressive airstream (Giegerich 1992, p. 1). In the sound production of most languages, the lungs (hence pulmonic) push air out of the body, hence the name pulmonic egressive airstream (McMahon 2002, p. 25).

Akmajian et al. (2001, p. 67) pinpoint three interacting processes required for the production of sound waves that carry human speech sounds: a process at the lungs, a modification process at the larynx, and a further modification process by the articulators. While this description holds true for most languages in the world, Akmajian et al. (2001, pp. 67-71) fail to address a specific airstream mechanism in the production of certain sounds in a handful of languages such as Zulu (South Africa) and Sindhi (India). A number of sounds in these languages are produced with a particular airstream mechanism that sucks air into the body instead of out (Davenport and Hannahs 2010, p. 8).

Akmajian et al. in their work seem to neglect this particular airstream mechanism in which the air moves inward instead of outward. Giegerich (1992, p. 1) and Ladefoged and Johnson (2011, p. 140), on the contrary, argue for the existence of a mechanism in which the energy for speech sound production is created by the inward movement of a body of air. The inward flow of air, also known as ingressive airstream, is initiated by sucking the air within the mouth down through the glottis (Ladefoged and Johnson 2011, p. 145) or by making gestures that involve moving the glottis vertically (Davenport and Hannahs 2010, p. 20).

For an obvious reason, Giegerich (1992, pp. 1-7), Ladefoged and Johnson (2011, pp. 5-6), and McMahon (2002, p. 25) put more emphasis on the pulmonic egressive mechanism when elaborating the airstream process. Pulmonic egressive is the basic airstream mechanism that is used in every single language (Ashby 2011, p. 18). However, in their work Ladefoged and Johnson specifically mention that, to provide the force for speech, it is possible for the air to be sucked in (ingressive) in addition to being pushed out (egressive). Ladefoged and Johnson (2011, pp. 140-146) also provide a description of speech sounds produced with an ingressive airstream mechanism.

Leaving aside the omission of the ingressive mechanism in the description given by Akmajian et al. (2001, pp. 67-71), the notion of egressive and ingressive is only important when discussing speech sounds using the glottalic airstream, since they can be egressive or ingressive. Speech sounds using the pulmonic airstream, on the other hand, are always egressive, while the velaric airstream is ingressive in every case (Bickford and Floyd 2006, p. 2). In view of the fact that speech sounds in both Javanese and English are produced using the pulmonic airstream, there is no reason to pursue a further discussion on the ingressive airstream mechanism. As a result, any following discussion is based on the egressive pulmonic airstream viewpoint only.

2) The Phonation Process

The phonation process describes the actions of the vocal folds. Since the study focuses on the phonation of plosives in Javanese and English, special attention is given to how plosives are produced in English and Javanese only. While references to plosives in languages in general are made from time to time, plosives in other languages are not discussed here.

Most languages have two or more contrasting series of plosives. To contrast these series, languages vary the settings of the larynx. One of the variations in the laryngeal setting is the mode of laryngeal action (Ladefoged and Maddieson 1996, p. 47). Since such action involves very complex vocal fold activity within the larynx (Ladefoged and Maddieson 1996, p. 49), simplification is needed to construct a mental image of phonation types and the mechanism of the larynx in general.

There are different types of phonation or ways in which the vocal folds are exploited to produce sound used as a basis for speech. The most important type of phonation is voice (Gussenhoven and Jacobs 2011, p. 17). It is produced when the arytenoid cartilages, by muscular contractions, bring together the vocal folds. When subglottal air pressure is strong enough to pass through the glottis, it causes the vocal folds to vibrate (Davenport and Hannahs 2010, pp. 9-10). The vibrating vocal folds, explain Ladefoged and Johnson (2011, p. 7), chop up the subglottal airstream and create pulses of air pressure that are alternately high and low.

Ladefoged and Maddieson (1996, p. 49) assert that any state of the glottis in which vibration occurs is a form of voicing. It implies that there are different states of the glottis in which vibration is possible. On the other hand, there are also states of the glottis in which voicing is disabled. Although a subglottal airstream is present, voicing is not possible when the vocal folds are too far apart (open glottis). In this setting, the subglottal airstream can flow unhindered without causing vibration. The resulting sounds are known as voiceless. Voicing is also disabled when there is glottal closure. This happens when the vocal folds are pressed together, causing a stoppage of air. Since the airstream cannot pass through, no vibration can occur. The resulting sound is a glottal sound (McMahon 2002, p. 33).

Since languages exploit different states of the glottis (Davenport and Hannahs 2010, p. 10), Ladefoged and Maddieson (1996, p. 49) suggest a continuum of phonation types in terms of modes of vibration of the glottis. In the more recent article by Gordon and Ladefoged (2001, p. 1), different phonation types are arranged according to the aperture between the arytenoid cartilages (see Table 1). At one extreme end of the continuum is the voiceless state, in which the arytenoid cartilages are the furthest apart (Gordon and Ladefoged 2001, p. 1) and the vocal folds do not vibrate because they are too far apart. At the opposite extreme is glottal closure, the state in which the arytenoid cartilages and the vocal folds are pressed together so firmly that airflow cannot pass. The configurations of the vocal folds at both extremes disable voicing.

Between the two extreme ends are varying degrees of glottal aperture that enable voicing. Moving from the voiceless end of the continuum toward the opposite end, the first phonation type is breathy voice. It is the most open configuration of the vocal folds that still enables vibration. The arytenoid cartilages are open but the vocal folds are drawn together without appreciable contact (Ladefoged and Maddieson 1996, p. 48). The airflow escapes continuously between the open arytenoid cartilages and sets the vocal folds vibrating (Ashby 2011, pp. 25-26).

The phonation type next to breathy voice is slack voice. Ladefoged and Maddieson (1996) state that the vocal folds vibrate more loosely than in modal voice (p. 48), with a slightly increased glottal aperture (p. 63). They add that there is a slightly higher rate of airflow than in modal voice (p. 48). Having observed this slightly higher rate of airflow accompanying slack voice, Dudas (1976, p. 118) names stops produced with this kind of phonation “heavy”, while Adisasmito-Smith (1999, p. 2) labels them “breathy”.

Alongside slack voice on the continuum is modal voice. Ashby (2011, p. 18) argues that normal or modal voice is how a speaker sounds most of the time when s/he is speaking. There are regular vibrations of the vocal folds, she continues, within the speaker's normal range. Modal voice results when the arytenoid cartilages and vocal folds are drawn together, but not so firmly that subglottal airflow cannot pass through and set the folds vibrating. Ladefoged and Maddieson (1996, pp. 48-50), however, suggest that the arytenoid cartilages are neither pulled apart nor pushed together.

The next phonation type along the continuum is stiff voice. Ladefoged and Maddieson (1996, p. 48) report that the vocal folds vibrate more stiffly than in modal voice. They argue that a contraction of the vocalis muscles may account for the action of the vocal folds (p. 55). They also observe a slightly lower rate of airflow than in modal voice. Since a series of slack stops contrasts with stiff ones in Javanese, and stiff voice involves a lower rate of airflow than slack voice, Dudas (1976, p. 118) terms Javanese stops produced with this type of phonation “light”, while Adisasmito-Smith (1999, p. 2) calls them “clear”. How Javanese contrasts slack and stiff stops is discussed at greater length later.

The last phonation type along the continuum in which voicing is still possible is creaky voice. It is the most constricted setting of the vocal folds that still enables vibration. In this state, the arytenoid cartilages and vocal folds are closed. While the arytenoid cartilages are tightly closed, the vocal folds are not pressed together so firmly that they cannot vibrate. They vibrate very slowly near the point where they are anchored to the thyroid cartilage, and normally along the rest of their length (Ashby 2011, pp. 24-25). Finally, at the extreme end of the continuum is glottal closure.
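The ordering of phonation types described above can be summarized in a small sketch. The names PHONATION_CONTINUUM, voicing_enabled, and more_open are hypothetical labels introduced only for illustration; the ordering and the claim that voicing is possible only in the five intermediate settings follow the description of the continuum given here.

```python
# Phonation types ordered from the most open to the most closed glottal
# setting, following the continuum as described in the text.
PHONATION_CONTINUUM = [
    "open glottis",     # voiceless: folds too far apart to vibrate
    "breathy voice",
    "slack voice",
    "modal voice",
    "stiff voice",
    "creaky voice",
    "glottal closure",  # folds pressed together: airflow is stopped
]


def voicing_enabled(phonation: str) -> bool:
    """Return True for the intermediate settings in which the folds can vibrate."""
    index = PHONATION_CONTINUUM.index(phonation)
    return 0 < index < len(PHONATION_CONTINUUM) - 1


def more_open(first: str, second: str) -> bool:
    """Return True if `first` lies closer to the open-glottis end than `second`."""
    return PHONATION_CONTINUUM.index(first) < PHONATION_CONTINUUM.index(second)


if __name__ == "__main__":
    for name in PHONATION_CONTINUUM:
        print(f"{name}: voicing {'enabled' if voicing_enabled(name) else 'disabled'}")
```

The list treats the continuum as strictly ordered, which is a simplification: as noted in the discussion of its limitations, each term covers a range of voice qualities.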

Ladefoged and Maddieson’s continuum provides a workable framework, but this idea, proposed two decades ago, has its own limitations. As they admit themselves, the distinction between stiff voice and creaky voice is often difficult to draw. It is also difficult to tell how much muscular activity is needed to produce stiff voice as distinct from modal voice (p. 55). What can be said is that the glottal aperture for slack voice is greater than for stiff voice and that the arytenoid cartilages remain tight during the closure for stiff voice stops (p. 63). In a later work, Ladefoged (2001, p. 130) affirms that terms such as breathy or creaky voice are not precise, since each term on the continuum covers a range of voice qualities.

Table 1. Ladefoged and Maddieson’s continuum of phonation types

| Phonation type | Voicing | Configuration | Resulting sound or label |
|---|---|---|---|
| Open glottis | No voicing | The arytenoid cartilages are open. The vocal folds are too far apart to vibrate. | It produces voiceless sound. |
| Breathy voice | Voicing is enabled | The arytenoid cartilages are open. The vocal folds are drawn together without appreciable contact. | It produces breathy sound. |
| Slack voice | Voicing is enabled | The vocal folds vibrate more loosely than in modal voice. There is a slightly higher rate of airflow than in modal voice. | “Breathy”; Heavy |
| Modal voice | Voicing is enabled | The arytenoid cartilages are neither pulled apart nor pushed together. The vocal folds are drawn together, but not tightly. | It produces normal sound. |
| Stiff voice | Voicing is enabled | The vocal folds vibrate more stiffly than in modal voice. There is a slightly lower rate of airflow than in modal voice. | Clear; Light |
| Creaky voice | Voicing is enabled | The arytenoid cartilages are closed firmly. The vocal folds are drawn together, but not tightly. | It produces creaky sound. |
| Glottal closure | No voicing | The arytenoid cartilages are closed firmly. The vocal folds are drawn together tightly. | It produces glottal stop. |

Ladefoged and Maddieson devoted three to four pages each to stiff voice and slack voice, covering what was known at the time about these two phonation types. There are gaps in the information offered, however, which make it difficult to compare stiff voice and slack voice with the rest of the phonation types.

Table 1 shows that the descriptions of stiff voice and slack voice do not fall easily into the same categories as those of other phonation types. This suggests that two decades ago studies on stiff voice and slack voice were not as thorough as investigations into other types of phonation. More recent studies (Brunelle 2010, p. 7, Nothofer 2006, p. 560, and Adisasmito-Smith 1999, p. 2) show that all Javanese stops, whether produced with stiff voice or slack voice, are voiceless in most environments. This brings the positions of stiff voice and slack voice along the continuum of phonation types into question. Ladefoged and Maddieson (1996, p. 99) acknowledge that it can be difficult to categorize variance in phonation type neatly. Wary of the possible problem with the addition of stiff voice and slack voice to the continuum, Ladefoged and Maddieson suggest a rearrangement of contrasts and open the possibility that Javanese contrasts modal and slack voice stops instead.

a) Phonation of Javanese Plosives

Wedhawati et al. (2010, p. 73) list 11 plosives among a total of 23 Javanese consonants. Plosives, especially the voiceless ones, are very common across different languages. Almost 92% of languages feature a plain voiceless plosive series. A language which contrasts only two series of stops typically has a plain voiceless/voiced contrast (Maddieson 1984, pp. 27-28). Javanese likewise has a phonemic contrast between two series of plosives. Brunelle (2010, p. 7) claims that Javanese contrasts a series of plain voiceless stops with a series of voiceless stops associated with a certain group of acoustic properties. In her study, Thurgood (2004, p. 280) emphasizes that the properties that separate one series of Javanese plosives from the other exist as a cluster instead of a single distinctive feature. Brunelle continues that the opposition between the two sets is acoustically subtle and is achieved through, among others, the vertical movement of the larynx. This supports the suggestion by Ladefoged and Maddieson (1996, p. 64) that the lowered F1 associated with the production of slack-voiced stops indicates larynx lowering.

In his report, Brunelle (2010, p. 8) uses the terms ‘tense’ and ‘lax’ for the two series of Javanese stops to signal their opposition, which involves, among others, changes in pitch height and F1 frequency. It should be understood that the discussion of Javanese stops is noted for idiosyncratic terminology devised by individual researchers to draw attention to the subtle opposition between the two series. Other researchers refer to the contrast as a series of ‘stiff-voiced’ vs. a series of ‘slack-voiced’ stops (Ladefoged and Maddieson 1996, p. 63), ‘clear’ vs. ‘breathy’ stops (Adisasmito-Smith 1999, p. 2), ‘light’ vs. ‘heavy’ stops (Dudas 1976, p. 118), and ‘tense’ vs. ‘lax’ stops (Pennington 2005, p. 25). The study itself uses the terms ‘tense’ and ‘lax’, suggested by Brunelle and Pennington, to refer to the opposition between the two series of Javanese stops.

Wedhawati et al. (2010, pp. 73-74) transcribe the series of ‘tense’ stops as ‘voiceless’ and the series of ‘lax’ stops as ‘voiced’. However, labelling one series of Javanese plosives (oral stops) as ‘voiced’ and the other as ‘voiceless’ is insufficient (Dudas 1968, p. 6). The voiced-voiceless distinction is an ineffective acoustic property for contrasting the phonation of Javanese plosives, as Javanese does not use voicing to distinguish one series of plosives from the other. In fact, both series are voiceless (Nothofer 2009, p. 560) in any position within a word except after nasals (Brunelle 2010, p. 7). In place of voicing, the distinctive role is taken by acoustic properties collectively labelled as register (Brunelle 2010, p. 18). It is the difference in register that contrasts the series of Javanese tense plosives with the series of lax ones. Brunelle (2010, p. 20) suggests that the acoustic properties associated with tense plosives be termed high register and those associated with lax plosives low register. Register refers to variations in the length, thickness and tension of the vocal folds (Crystal 2008, p. 409). Such variations typically affect pitch height, an effect that Ladefoged and Maddieson (1996, p. 64) observe when they note that F0 is lowered in a vowel following a Javanese lax plosive.

Pitch variation, or tone, is used in some languages to distinguish words. Languages in which word meanings or grammatical categories depend on pitch level are known as tone languages (Crystal 2008, p. 486). At the most basic level, tone languages are categorized by the nature of the tone that is used. One group requires tones with a dynamic or changing pitch profile. These are known as contour tone languages. Another group requires the syllable to reach a certain pitch height. These are known as register tone languages (Ashby 2011, p. 174). In the former group, the direction in which the pitches move is important. In the latter group, the critical feature is the relative height of the syllabic pitches (Crystal 2008, p. 409). The absolute pitch that speakers of a tone language attain is not important (Fromkin, Rodman, and Hyams 2011, p. 213). Syllables have to reach a certain pitch height (Gussenhoven 2004, p. 26) which is relatively higher or lower in comparison to other pitches (Fromkin, Rodman, and Hyams 2011, p. 214).

Many tone languages have just a binary level contrast, high and low. They have different ways to vary the contrast between levels. In one of the variations, tones at one level are combined with breathy voice and contrasted with another level in which tones are combined with creaky voice. This variation is known as register (Gussenhoven 2004, pp. 26-27). Pennington adds that, to a lesser degree, lax voice has the same characteristics as breathy voice (2005, p. 24), and tense voice is considered a weaker form of creaky voice. The two voices are contrastive (2005, p. 25).

Brunelle (2010, p. 18) has proposed that the contrast between the two series of Javanese voiceless plosives is realized through the use of two different registers: Javanese tense plosives are accompanied by high register and Javanese lax plosives by low register. High register prescribes a shorter VOT and low register a longer VOT (Brunelle 2010, p. 9) with slight aspiration (Wedhawati et al. 2010, pp. 77, 83, and 92). The use of VOT length to contrast two opposing series of plosives is not uncommon. In languages such as Thai, Korean and Burmese, length of VOT is contrastive (Ashby 2011, p. 217).

In addition, the use of high register implies that the vowel following tense plosives should be acoustically sharp and clear (Horne 1961, p. xxix), produced with a higher pitch (F0) at the onset (Ladefoged and Maddieson 1996, p. 64) and a higher F1 frequency (Thurgood 2004, p. 286). By contrast, the use of low register requires the vowel following lax plosives to be acoustically fuzzy and murmured (Horne 1961, p. xxix), produced with a lower pitch (F0) at the onset (Ladefoged and Maddieson 1996, p. 64) and a lower F1 frequency (Thurgood 2004, p. 286).
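The cluster of cues associated with the two registers can be illustrated with a minimal sketch. It compares two measured tokens and counts how many of the three cues named above (shorter VOT, higher F0 at vowel onset, higher F1) point to the first token being the tense, high-register member of the pair. The class PlosiveToken, the function tense_cue_count, and all numeric values are hypothetical and serve only as illustration; they are not measurements from this or any cited study.

```python
from dataclasses import dataclass


@dataclass
class PlosiveToken:
    """Measurements for one plosive-plus-vowel token (hypothetical fields)."""
    vot_ms: float        # voice onset time of the plosive
    f0_onset_hz: float   # pitch at the onset of the following vowel
    f1_onset_hz: float   # first formant at the onset of the following vowel


def tense_cue_count(first: PlosiveToken, second: PlosiveToken) -> int:
    """Count the cues (0 to 3) suggesting `first` is tense relative to `second`.

    High register is described as shorter VOT, higher F0 and higher F1;
    the comparison is relative, not based on absolute thresholds.
    """
    cues = 0
    if first.vot_ms < second.vot_ms:
        cues += 1
    if first.f0_onset_hz > second.f0_onset_hz:
        cues += 1
    if first.f1_onset_hz > second.f1_onset_hz:
        cues += 1
    return cues


if __name__ == "__main__":
    # Invented values for illustration only.
    token_a = PlosiveToken(vot_ms=10.0, f0_onset_hz=200.0, f1_onset_hz=700.0)
    token_b = PlosiveToken(vot_ms=25.0, f0_onset_hz=180.0, f1_onset_hz=600.0)
    print(tense_cue_count(token_a, token_b))
```

Counting cues rather than relying on a single one reflects the point that the properties separating the two series exist as a cluster instead of a single distinctive feature.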

That the occurrence of Javanese plosives in a word is accompanied by register fits the assertion by Matisoff (1973, p. 87) that Javanese, along with other western Austronesian languages, has acquired a register system. Though Javanese is not a tone language (Wedhawati et al. 2010, p. 102), Matisoff (1973, p. 87) assumes that it acquired the system through close cultural contact with tone languages. People have migrated and roamed the region for centuries, providing abundant opportunities for language contact. That the phonation of Javanese plosives is affected by a register system adds a further dimension to its comparison with the phonation of English plosives.

b) Phonation of English Plosives

English plosives may be either voiceless or voiced (Davenport and Hannahs 2010, p. 19 and McMahon 2002, p. 28), but McMahon (2006, p. 364) argues that the settings of the vocal folds may take far more variations than merely voiceless and voiced. For example, phonation of English sounds with creaky voice is common in many accents of English. Some English speakers are accustomed to keeping the larynx in a relatively high position and others in a mid or low position. However, these settings may be linked to a particular regional and/or social accent of English, or to an idiosyncrasy (McMahon 2006, p. 376). In general, voice quality or pitch is not used in English to distinguish words (Ladefoged 2001, p. 130). This brings us back to the idea of voiceless and voiced sounds. These opposing states of the glottis have been used by linguists to describe English consonant sounds, in addition to the description of place and manner of articulation (Mullany and Stockwell 2010, p. 4).

However, in reality the opposition between the set of voiceless plosives and the set of voiced plosives is not simply that there is voicing during stop closure in one set and none in the other. When plosives are used word-initially and word-finally, most speakers of English produce little voicing during stop closure (how stops are produced will be discussed later). As a result, the word-initial plosives [p] and [b] in pie and buy are essentially voiceless (Ladefoged and Johnson 2011, p. 57).

What makes the initial plosives in pie and buy different is aspiration, the small burst of air following the release of the lip closure in pie but not in buy (McMahon 2002, p. 18). A stop is said to be aspirated when a voiceless stop such as [p] in the initial position of a stressed syllable (Fromkin, Rodman, and Hyams 2011, p. 240) is produced with a puff of air, an acoustic effect of a delay in the onset of voicing (Collins and Mees 2013, p. 87). The aspiration is a result of the setting of the vocal folds, which remain open for a brief period after the release of the stop (Fromkin, Rodman, and Hyams 2011, p. 571). Because the vocal folds are open for a very short time, there is a period of voicelessness before the voicing of the vowel following the stop. The aspiration of [p] in pie comes as a slightly noisy interval before the vowel starts (Ladefoged 2001, p. 119). This interval is a significant delay between the release of /p/ in pie and the beginning or onset of vocal-fold vibration for the two-sound vowel or diphthong [aɪ] (Davenport and Hannahs 2010, p. 69).

The time relation between the moment at which the vocal folds begin to vibrate – the onset of vocal-fold vibration – and the release of the plosive is called VOT or Voice Onset Time (Crystal 2008, p. 155). VOT is a feature associated with stops that can be viewed on spectrograms (Davenport and Hannahs 2010, p. 69). The VOT has a positive value when the onset of vocal-fold vibration is later than the plosive release, as evidenced in the aspiration of /p/ at the beginning of pie. The common values for aspirated plosives are between VOT +50 milliseconds and VOT +80 milliseconds. The value +50 milliseconds means that the VOT is positive and the onset of voicing is 50 milliseconds after plosive release (Gussenhoven and Jacobs 2011, p. 18). Ladefoged gives an earlier onset of voicing for typical aspirated plosives, at around VOT +30 milliseconds (Ladefoged 2001, p. 18).

A VOT in the region of 0-20 milliseconds is hardly perceived as aspiration (Ladefoged and Johnson 2011, p. 151). To illustrate, Ladefoged and Johnson (2011, p. 154) refer to the unaspirated voiceless plosive in Hindi, whose VOT is only 20 milliseconds, a delay of voicing that is not long enough to be perceived as aspiration. Since plosives with a VOT of +20 milliseconds or less are not perceived as aspirated, Ashby and Maidment (2005, p. 183) add that a hearer who speaks English as a first language does not hear a VOT of 0-20 milliseconds as aspiration and perceives a plosive produced with such a VOT as voiced. Thus, it can be inferred that a VOT of less than 20 milliseconds – only one-fiftieth of a second – is negligible: it is too short to be heard as aspiration in English.

When voicing begins not after the release of a stop but before it, during the plosive closure, it is said that the VOT has a negative value. Prevoicing or negative VOT happens when the onset of vocal fold vibration precedes the plosive release. It is visible on a spectrogram as low-energy periodicity (Kong 2009, p. 18). Prevoicing is uncommon in English since there is little or no voicing during closure for English plosives (Ladefoged and Johnson 2011, p. 151). However, a recent study shows that prevoicing prevails in Southern American English. After investigating nine monolingual speakers of American English who were born and raised in Mississippi, Hunnicutt and Morris (2016, p. 217) provide evidence that prevoicing is not uncommon in Southern American English. They conclude that their subjects contrast a series of voiced plosives with prevoicing and a series of aspirated voiceless plosives in word-initial position (Hunnicutt and Morris 2016, p. 223). It means that the vocal folds already vibrate before /b/ in bay is released, and vibration of the vocal folds is delayed after the release of /p/ in pay. Some, however, argue that it is difficult to sustain vocal fold vibration during obstruction of the vocal cavity.

Vaissiere (1997, p. 118) and Simon (2010, p. 12) argue that vibration of the vocal folds during stop closure is difficult to do. This is because the air pressure behind blockage inside the oral cavity can be too high for the body of air from the lungs to press forward through the glottis. Yet, there are strategies to enable vibration of the vocal folds.

One of the strategies is decompression of the space inside the oral cavity. Kong (2009, p. 18) claims that the air pressure inside the oral cavity should be reduced to allow movement of air through the glottis. This is done by keeping a port running through the nose open before the release of the blockage inside the oral cavity. As a result, some of the air escapes through the port – which is heard as a nasal sound – and this reduces the pressure within the oral cavity. The reduced supraglottal pressure permits transglottal airflow. This, Kong continues, is evidenced in the prenasalization of a voiced stop. The prenasalized English word the as spoken by certain Javanese people is an example of prevoicing. The effect of prevoicing is heard as an additional hum of a nasal sound, as in [nd̪ə].

Another strategy is prolonging the build-up of air pressure above the glottis within the oral cavity. A rapid increase of supraglottal pressure within the oral cavity is not favorable for vocal fold vibration. To enable vibration, the rise in air pressure should be moderated. Simon (2010, p. 12) maintains that, to slow the build-up of air pressure, the cavity above the glottis is expanded by lowering the larynx. This expansion gives additional space for air to fill. As a result, the sharp rise in supraglottal pressure is slowed, allowing the body of air initiated in the lungs to travel through the glottis.

When vibration of vocal folds begins simultaneously with the release of plosive, it is said that the VOT is zero. Plosive with a zero VOT is evidenced in English words such as book where voicing of /ʊ/ begins immediately after the release of /b/.
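The three cases of VOT described above (positive, negative, and zero) can be expressed as a short calculation. The sketch below is illustrative only: the function names are hypothetical, the timestamps are assumed to come from hand-labelled points on a waveform or spectrogram, and the 20-millisecond boundary is the perceptual threshold cited above for English listeners. The label for values above 20 milliseconds is an assumption, since the sources quoted give different typical values for aspirated plosives (around +30 or +50 to +80 milliseconds).

```python
def voice_onset_time_ms(release_ms: float, voicing_onset_ms: float) -> float:
    """VOT is the onset of vocal-fold vibration relative to the plosive release."""
    return voicing_onset_ms - release_ms


def classify_vot(vot_ms: float, short_lag_max_ms: float = 20.0) -> str:
    """Sort a VOT value into the cases discussed in the text.

    `short_lag_max_ms` defaults to the 20 ms threshold below which a delay
    is reported as not being heard as aspiration in English.
    """
    if vot_ms < 0:
        return "negative VOT (prevoicing)"
    if vot_ms == 0:
        return "zero VOT"
    if vot_ms <= short_lag_max_ms:
        return "short positive VOT (not heard as aspiration)"
    # Assumption: anything above the threshold is only a candidate for
    # aspiration; typical aspirated values are cited as higher.
    return "long positive VOT (candidate for aspiration)"


if __name__ == "__main__":
    # Invented timestamps (in ms) for illustration only.
    print(classify_vot(voice_onset_time_ms(release_ms=100.0, voicing_onset_ms=160.0)))
    print(classify_vot(voice_onset_time_ms(release_ms=100.0, voicing_onset_ms=40.0)))
    print(classify_vot(voice_onset_time_ms(release_ms=100.0, voicing_onset_ms=100.0)))
```

Keeping the threshold as a parameter makes it easy to apply a different boundary for another language or listener group, since the 20-millisecond figure is tied to perception by speakers of English.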

Unlike English, aspiration following plosives in Javanese does not signal the distinction between two sets of plosives. Word-initial Javanese tense and lax plosives – which are both voiceless stops – do not rely solely on aspiration to be perceived as different. As Pennington (2006, p. 3) argues, aspiration accompanying lax and tense plosives is acoustically negligible. That aspiration is not a feature contrasting tense and lax plosives is not surprising. In a language such as Korean which contrasts three series of voiceless stops, aspiration is a feature separating the first two series, lax and tense voiceless stops, from the third series, aspirated voiceless stops (Kim and Duanmu 2004, p. 59).

Aspiration that follows English plosives, on the other hand, is a feature of most English accents (Collins and Mees 2013, p. 88). Aspirated English plosives, especially those occurring initially in a stressed syllable, help listeners perceive the word itself. Speakers of English as a native language are more likely to recognize [p] in pie by its aspiration, and [b] in buy by lack of aspiration, rather than relying on voicing (McMahon 2002, p. 26). To put it differently, the position of a plosive may influence the nature of the production of the plosive itself and its neighboring voiced sound.

c) Distribution of Voicing

Ashby (2011, p. 120) states that in many languages sounds that are transcribed using IPA as voiced, because of certain phonetic environments, can be formed without vocal fold vibration. Similarly, sounds marked in IPA as voiceless can in fact be produced with vibration of vocal fold. This should warn us against oversimplification that phonation of a plosive will promptly fall into one, consistent setting of vocal folds anytime the plosive is produced and anywhere it occurs in an utterance. A plosive which is defined as voiced by the IPA, for example, may be produced as voiceless.

Vaissiere (1997, p. 118) and Simon (2010, p. 12) offer an explanation for the tendency of plosives – and other obstruents – to devoice completely or partially. Obstruents (plosives belong to this class of speech sounds) are produced by creating an obstruction within the oral cavity above the glottis. Because of the transient supraglottal blockage, the mass of air from the lungs quickly collects inside the oral cavity, inducing a quick rise in supraglottal air pressure. The high supraglottal pressure reduces or stops the airflow through the glottis (Vaissiere 1997, p. 118). Without enough airflow through the glottis, it is aerodynamically difficult for the vocal folds to vibrate (Simon 2010, p. 12). Thus, it is natural that plosives, as well as other obstruents, are devoiced. When a normally voiced sound undergoes devoicing, it is produced with less voice or without voice at all. This phenomenon is usually triggered by a certain phonetic environment (Crystal 2008, pp. 514-515). It implies that plosives, whether voiced or voiceless, are in fact produced as voiceless in many cases. Referring to Javanese plosives, Brunelle (2010, p. 7) provides support for this conclusion. He states that Javanese plosives are only fully voiced when they occur after nasals. There is more perceptible vocal fold vibration in the production of [b̥] in mbakar [mbɑkɑr] ‘to burn’, in which the voicing of [b̥] is induced by the preceding nasal [m] (Adisasmito-Smith 1999, p. 1). Likewise, English plosives (and other English obstruents) are only fully voiced when they occur medially in a fully voiced environment (Ashby 2011, p. 121 and McMahon 2002, p. 26). The vocal folds vibrate throughout the production of [d] in my daddy [maɪ dædi].

Yet, when English plosives are not surrounded by voiced sounds, they are partially or fully devoiced. Devoicing happens in this kind of environment because the plosives occur immediately adjacent to silence or a voiceless consonant (Ashby 2011, p. 121), or to a pause or voiceless sounds (McMahon 2002, p. 26).

Utterance-initial and utterance-final are positions where an English plosive is next to silence or voiceless sounds. As a result, a plosive in those positions may not be fully voiced or not voiced at all.

Because both voiced and voiceless plosives become devoiced in utterance-initial position, speakers of English as a native language recognize the voiced plosive in initial position of a stressed syllable because of lack of aspiration, not voicing. In other words, aspiration or lack of it helps listeners identify an initially positioned plosive in a stressed syllable (Fromkin, Rodman and Hyams 2011, p. 249).

Correspondingly, both voiced and voiceless plosives in utterance-final position also undergo devoicing. Speakers of English as a native language identify the plosive in word-final position by the length of the voiced sound preceding the plosive (Ashby 2011, p. 121). A voiced plosive that occurs after a voiced phoneme (a liquid, nasal or vowel) causes the preceding voiced sound to lengthen (Davenport and Hannahs 2010, p. 24). In other words, in a syllable closed by a voiced plosive, the vowel preceding the plosive is fully articulated. On the other hand, when a voiceless plosive comes after a voiced phoneme, pre-fortis clipping occurs (Ashby 2011, p. 122). The normal length of the voiced phoneme is clipped when the sound occurs before a voiceless plosive. The vowel [ɔ:] is fully long in ode, but it is appreciably shorter in duration in oat (Ashby 2011, p. 106). The difference in duration of [ɔ:] helps listeners to identify the plosive [d] in ode and the plosive [t] in oat.

Significantly longer duration of a voiced phoneme occurring before a word-final voiced plosive helps speakers of English as a native language recognize the plosive.

Consider the following:

(1) a. tip [tɪp] b. dip [dɪp]

c. latter [lætɚ] d. ladder [lædɚ]

e. bit [bɪt] f. bid [bɪd]

Because of a small delay between the release of the plosive and the voicing of the vowel [ɪ], during which a small burst of air rushes out, speakers of English as a native language identify the initial plosive in (1a) as [t]; they identify the initial plosive in (1b) as [d] because of the prompt voicing of the vowel [ɪ], without aspiration, that follows the plosive.

The voiced plosive is fully voiced in (1d) because it occurs medially in the word, surrounded by voiced sounds [æ] and [ə]. The end of sound [æ] in (1c) is devoiced in anticipation of the forthcoming [t]. In turn, the voiceless sound [t] is partially voiced in anticipation of the voiced vowel [ə]. However, it should be noted that the distinction between alveolar plosives [t] and [d] occurring intervocalically – immediately preceded and immediately followed by a vowel – as in (1c) and (1d) is neutralized in many North American and Northern Irish accents of English. The intervocalic [t] and [d] are replaced by a voiced alveolar flap [ɾ] (Davenport and Hannahs 2010, p. 26).

The final plosives in (1e) and (1f) are recognizable because of difference in duration of the preceding vowel [ɪ]. The vowel [ɪ] is significantly longer in bid (1f) than in bit (1e). The vowel [ɪ] in bid is fully long while the same vowel in bit is shorter because of pre-fortis clipping.

It can be inferred that speakers of English as a native language recognize a plosive by examining the voiced sound immediately adjacent to the plosive itself.

The case is similar in Javanese. Thurgood (2004, p. 281) concludes that the phonation of Javanese plosives is evident in the vowel, not in the consonants themselves. Among the important acoustic features that clearly show the difference between lax and tense plosives, Thurgood states, are the fundamental frequency (F0) and the formant frequencies of the vowel following Javanese plosives. Yet the difference in formant frequencies, not the fundamental frequency, is the most consistent feature characterizing the tense-lax voice opposition (Thurgood 2004, p. 290).

Investigation by Ladefoged and Maddieson (1996, p. 64) confirms that lax plosives show a lowered F1. Difference in the F1 frequency of a vowel occurring after lax and tense plosives is also suggested by Brunelle (2010, p. 9).

Brunelle (2010, p. 8) asserts that the realization of the lax-tense contrast is systematic through VOT length distinction, difference in F1 values at the vowel onset, and vowel quality changes. Having observed the vowels /ɑ/ and /ɔ/ occurring after Javanese lax-tense plosives, Thurgood (2004, p. 291) reports that /ɑ/ and /ɔ/ coming after a lax plosive have a significantly lower F1. It can also be inferred that the realization of the lax-tense contrast can be examined from its effect on the voiced sound following plosives (Thurgood 2004, p. 280), much as with English plosives.

Plosives occurring word-finally, however, are a different matter.

While English contrasts voiced and voiceless plosives in final position, Javanese does not. The distinction between Javanese lax and tense plosives is suspended when they occur in word-final position. As Dudas points out (1976, p. 118), both series of Javanese plosives may occur word-initially and medially, but only tense plosives occur word-finally. Wedhawati et al. (2010, pp. 77-78, pp. 83-84, and p. 93) concur that the absence of the lax-tense distinction is due to neutralization, a phenomenon in which the regular contrast between two series of phonemes is lost in a particular environment (McMahon 2006, p. 59). Because the contrast is suspended, only one series of phonemes prevails in that context. Neutralization of otherwise contrasting phonemes in final position is not unique to Javanese; it is also evident in German and Dutch (Ashby 2011, pp. 121-122). In English, aspiration is neutralized when a plosive follows initial [s] (Davenport and Hannahs 2010, p. 23).

Since no contrast is possible in this environment, only one series of plosives may occur. Thus, there is the word spill but not *sbill (McMahon 2006, p. 59).

In Javanese, the final phoneme in the word abad ‘century’ is the tense plosive [t̪] instead of [d̪], although the orthography indicates otherwise. Consider this:

(3) a. pick [pɪk] b. pig [pɪg]

c. utek [ut̪ək] ‘brain’ d. ndableg [nd̪ɑb̥lək] ‘stubborn’

e. lepat [ləpɑt̪] ‘mistake’ f. murid [murɪt̪] ‘student’

g. ganep [g̥ɑnəp] ‘even’ h. rebab [rəb̥ɑp] ‘a musical instrument’

Unlike English, in which the choice between the final voiceless sound [k] in (3a) and the final voiced sound [g] in (3b) contrasts meaning, Javanese allows only one series of plosives in word-final position. Since the lax-tense contrast is neutralized in that position, it is only the tense plosive [k] that prevails in both (3c) and (3d).

Similarly, because lax plosives do not normally occur word-finally, the tense plosives [t̪] and [p] are in the final position in (3e) and (3f), and in (3g) and (3h), respectively, regardless of the orthography.
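The word-final neutralization illustrated in (3c) to (3h) can be sketched as a lookup from the final letter of the spelling to the tense plosive that surfaces. The mapping `FINAL_TENSE` and the function `final_plosive` are hypothetical names introduced for illustration. The sketch covers only the spellings in the examples above (and abad), is not a full grapheme-to-phoneme system, and does not model other realizations of final consonants such as the glottal stop.

```python
# Assumed mapping from a word-final plosive letter to the tense plosive
# transcribed for the examples in (3). Lax and tense spellings share one outcome.
FINAL_TENSE = {
    "b": "p", "p": "p",
    "d": "t̪", "t": "t̪",
    "g": "k", "k": "k",
}


def final_plosive(word):
    """Return the tense plosive expected word-finally, or None if the word
    does not end in one of the plosive letters covered by FINAL_TENSE."""
    if not word:
        return None
    return FINAL_TENSE.get(word[-1].lower())


for word in ["abad", "utek", "ndableg", "lepat", "murid", "ganep", "rebab"]:
    print(word, "->", final_plosive(word))
```

Pairs such as murid and lepat, or ndableg and utek, map to the same final plosive, reflecting the loss of the lax-tense contrast in this position.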

The discussion on VOT earlier also describes how the VOT of English plosives compares to that of Javanese plosives. With a VOT of 50-60 milliseconds for /k/ and slightly shorter for /t/ and /p/ (Ladefoged 2001, p. 120), the period of voicelessness of word-initial English voiceless plosives is much longer than that of any Javanese plosive. The above examples demonstrate that the delay between the release of a lax plosive and the onset of voicing is only around 20 ms (between 14 and 28 ms, to be exact), which is not long enough to be perceived as aspiration.

This may have led Pennington (2006, p. 3) to conclude that aspiration following lax-tense plosives is negligible. Yet the existing gap in VOT length, albeit a narrow one, coupled with the breathy-clear voice quality associated with lax-tense plosives, may be perceived as a contrast between ‘slight aspiration’ and ‘non-aspiration’.

Collins and Mees (2013, p. 88) offer an explanation of the opposition between ‘slight aspiration’ and ‘non-aspiration’ of word-initial plosives in non-aspiration languages. English, being a language in which aspiration is a feature of voiceless stops occurring initially in a stressed syllable (Fromkin, Rodman, and Hyams 2011, p. 240), is of course an aspiration language. Javanese, on the other hand, is a non-aspiration language.

In non-aspiration languages such as Javanese, explain Collins and Mees (2013, p. 88), the articulators make a slightly different gesture when making a closure for opposing sets of plosives. The articulators form a tight air pocket inside the vocal cavity during the closure for plain voiceless (tense) plosives before they briskly release the compressed air. The quick release of air pressure from a tight vocal tract, which implies a brief VOT, can be perceived as acoustically clear. In contrast, the articulators make a looser closure in producing plosives with slight aspiration. The air pocket inside the vocal tract is not tightly sealed; the compressed air leaks out somewhat and is released more slowly. This results in a somewhat breathier release of the plosive with a relatively longer VOT.

3) The Articulatory Process

Speech sounds vary in the way the airstream is affected as it flows from the lungs up and out of the mouth and nose. To vary the sounds, the airstream is manipulated by the action of active and passive articulators. The articulatory process describes the active and passive articulators involved in the articulation of plosives and their position relative to each other.

a) Manner of Articulation

To produce plosives, a velic closure is created (Davenport and Hannahs 2010, p. 11) by raising the soft palate or velum so that the air is prevented from escaping through the nasal tract (Ladefoged and Johnson 2011, p. 14). Since the air finds its way out through the oral tract, plosives are oral, rather than nasal, sounds. The moment the velum is lifted, the active and passive articulators make contact and completely block the airstream in the oral cavity for a brief period (McMahon 2006, p. 28). While the active and passive articulators touch for tens of milliseconds and create a blockage (Fromkin, Rodman, and Hyams 2011, p. 201), the air from the lungs is compressed behind the blockage. The air pressure continues to build up until the blockage is abruptly released. The quick release causes a very brief friction when the air suddenly bursts outwards. This very brief friction, though it is not usually heard as friction, adds the popping quality to the plosive release (Gussenhoven and Jacobs 2011, p. 28).

b) Place of Articulation

The place of articulation of a plosive is where the short period of air blockage occurs within the vocal tract (Fromkin, Rodman, and Hyams 2011, p. 195). It states the relative position of the highest point of the active articulator and the passive articulator (Davenport and Hannahs 2010, p. 13). The active articulator is mostly the tongue, while the passive ones vary, which gives rise to the different labels used to designate plosives in English and Javanese. Ladefoged and Johnson (2011, pp. 8-13), Fromkin, Rodman, and Hyams (2011, p. 201), and Davenport and Hannahs (2010, p. 13) offer explanations of the different places of articulation.

English plosives consist of nine phonemes, [p], [b], [t], [d], [k], [g], [tʃ], [dʒ], and [Ɂ], three of which are not the focus of the study. Discussion of the plosives with a more gradual release, [tʃ] and [dʒ], is therefore put aside. Similarly, there will be no further discussion of the glottal stop. Although the glottal stop is found in certain varieties of English, it only occurs in initial position before vowels in American English and between vowels in London Cockney. Because of its limited distribution, its status as a consonant phoneme comes under question (Ladefoged and Johnson 2011, p. 38). In the study, attention will be given to the first six English phonemes, [p], [b], [t], [d], [k], and [g].

What makes [p], [t], and [k] similar is that in the IPA they are voiceless, while [b], [d], and [g] are voiced. The term bilabial is given to [p] and [b] because the blockage of the airstream is created by the complete closure of the lips (bi = two, labia = lips). The active articulator is the lower lip and the passive one is the upper lip. The name alveolar plosive is given to [t] and [d] because the complete closure happens at the alveolar ridge. These sounds are produced by moving the tongue tip or blade (active articulator) up towards the alveolar ridge (passive articulator). Lastly, [k] and [g] are velar plosives because the blockage takes place at the velum or soft palate. To produce these sounds, the back of the tongue (active articulator) is raised to touch the velum (passive articulator).

Javanese, on the other hand, has eleven plosives: [p], [b̥], [t̪], [d̥̪], [ʈ], [ɖ̥], [c], [Ɉ̥], [k], [g̥], and [Ɂ] (Nothofer 2009, p. 560). Since English affricates, which are palatal sounds, are not the focus of the study, the Javanese palatal phonemes [c] and [Ɉ̥] are not discussed further. As additional information, [c] and [Ɉ] are alternative ways to transcribe [tʃ] and [dʒ]. The use of a single letter is meant to highlight the fact that each affricate is a single unit (Ladefoged and Johnson 2011, p. 38). Similarly, since the English glottal stop will not be pursued, there will be no further discussion of the Javanese glottal stop, which occurs only in syllable-final or word-final position and, in a certain phonetic environment, word-initially (Wedhawati et al. 2010, p. 95). As a result, the discussion will be about the remaining eight Javanese plosives, [p], [b̥], [t̪], [d̥̪], [ʈ], [ɖ̥], [k], and [g̥].

The phonemes [p], [t̪], [ʈ], and [k] are tense plosives associated with high register, while the lax plosives [b̥], [d̥̪], [ɖ̥], and [g̥] are associated with low register. Like their English counterparts, the phonemes [p] and [b̥] are bilabial because the air blockage occurs briefly at the lips. The lower lip (active articulator) makes contact with the upper lip (passive articulator) to create the closure. When the tongue tip or blade (active articulator) touches the upper teeth, the sounds produced are the dental phonemes [t̪] and [d̥̪]. Poedjosoedarmo, in a lecture, offers the name post-dental for these plosives because the closure is created by bringing the tip of the tongue to a region slightly behind the upper teeth but still in front of the alveolar ridge.

The next set, [ʈ] and [ɖ̥], is called retroflex. Unlike English, Javanese makes a phonemic distinction between dental and retroflex plosives (Nothofer 2009, p. 560). The Javanese word [put̪u] means ‘grandchild/grandchildren’, but when the medial dental phoneme is substituted with a retroflex consonant, the utterance becomes [puʈu], which is a type of rice cake. In English, on the other hand, retroflex articulation is not used on a regular basis, but it is found in other Englishes (Ashby 2011, p. 57) such as Indian English. For speakers of Southern Standard British English and General American, the consonant /r/ is commonly realized as [ɹ], which is described as a retroflex (McMahon 2006, p. 32) or post-alveolar approximant (Collins and Mees 2013, p. 95). As an additional note, Javanese only has the alveolar trill [r] (Wedhawati et al. 2010, p. 96), a consonant which is rather uncommon for speakers of English (McMahon 2006, p. 29).

To produce retroflex plosives, the tongue tip (active articulator) curls backwards and touches an area immediately behind the alveolar ridge (passive articulator). Because these phonemes are produced using the tip of the tongue (apical) and an area somewhere in the hard palate, another name for retroflex is apico-palatal (Ashby 2011, p. 36), which Wedhawati et al. (2010, p. 74) use to describe this set of phonemes.

The velar plosives [k] and [g̥], apart from the registers associated with them, are produced in a fashion similar to their counterparts in English. The complete closure happens at the velum or soft palate (passive articulator) when the tongue back (active articulator) makes brief contact with it.

The description above concerns the places of articulation of the plosives discussed in the study. It shows that most Javanese plosives share somewhat similar places of articulation with their English counterparts. In terms of phonation, however, there is a major difference.

The phonemic contrast between the two series of English plosives lies in voicing. One set of English plosives is voiced while the other is voiceless. Accepting voicing as the phonemic contrast between the two sets of Javanese plosives (Wedhawati et al. 2010, pp. 73-74) would certainly make comparison between Javanese and English plosives much simpler, as awkward differences in phonation between Javanese and English plosives would be neatly trimmed. However, this view overlooks the fact that all Javanese plosives are voiceless in any position (Nothofer 2009, p. 560). They are voiced only when they occur after a nasal, in which case the voicing is induced by the preceding nasal (Adisasmito-Smith 1999, p. 1). Brunelle, therefore, asserts that the realization of the phonemic contrast between Javanese lax and tense plosives does not lie in voicing. Instead, it manifests systematically in a register system (2010, p. 19).

4. Brunelle’s Arguments Regarding Realization of Phonemic Contrast between Javanese Lax and Tense Plosives

Brunelle confirms the systematic contrast between lax-tense plosives (2010, pp. 8-9) and claims that the Javanese lax-tense contrast should be treated as a variant of register contrast (2010, p. 19). Again, Brunelle (2010, pp. 8-9) maintains that the Javanese lax-tense contrast is realized in differences in VOT length, voice quality, reduction of formant energy, and decrease of pitch frequency. These translate into two opposing registers, high register vs. low register, along with their respective acoustic properties.

First, a lax plosive has a longer VOT than its tense counterpart at the same place of articulation; e.g. the VOT of the lax bilabial plosive is longer than the VOT of the tense bilabial plosive. Second, a lax plosive is released with a breathier voice quality than the tense plosive at the same place of articulation, which is also evident in the voiced sound that immediately follows the plosive; e.g. the lax dental plosive and the immediately succeeding voiced sound have a breathier quality than the tense dental plosive and the immediately succeeding voiced sound. Indeed, this register contrast, high and low, is also evident in the voiced sound following plosives (Thurgood 2004, p. 280).

Third, as a result of formant energy reduction, there is a decrease in the F1 value following a lax plosive. In other words, the F1 of the voiced sound that immediately follows a lax plosive is lower than the F1 of the voiced sound that immediately follows a tense plosive. For example, the F1 of /ɑ/ immediately following lax bilabial /b/ in a word-initial syllable ba- is lower than the F1 of /ɑ/ immediately following tense bilabial /p/ in a word-initial syllable pa-. Fourth, the pitch at the onset of the voiced sound following a lax plosive is also reduced. This means that the F0 value at the onset of a vowel immediately occurring after a lax plosive is lower than the F0 value at the onset of a vowel immediately occurring after a tense plosive. However, Thurgood (2004, p. 290) argues that comparison of F1 values correlates more consistently with the lax-tense opposition than comparison of F0 values. Therefore, the F0 value is not the focus of the study. Moreover, Javanese breathiness is less easy to observe on a waveform and spectrogram than VOT, F1, and F0 values. As a result, the opposition between breathy and clear qualities is not discussed in the study. Thus, two values remain and become the focus of the study: the phonemic contrast between lax-tense plosives is observed in two acoustic features, i.e. VOT length and F1 value.

Consider the following minimal pairs of Javanese words spoken by a speaker of Javanese as the first language:

(2) a. bayu [b̥ɑju] ‘wind’ b. payu [pɑju] ‘sold out’

c. damu [d̥̪ɑmu] ‘blow’ d. tamu [t̪ɑmu] ‘guest’

e. dhodhog [ɖ̥ɔɖ̥ɔɁ] ‘with knees bent’ f. thothok [ʈɔʈɔɁ] ‘cover’

g. gali [g̥ɑli] ‘thug’ h. kali [kɑli] ‘river’

Occurring in word-initial position in (2a), (2c), (2e), and (2g) are the lax plosives [b̥], [d̥̪], [ɖ̥], and [g̥], respectively. Contrasted with these consonants are the tense plosives [p] in (2b), [t̪] in (2d), [ʈ] in (2f), and [k] in (2h). Though all examples featuring the Javanese lax-tense opposition are two-syllable words, attention is only given to the first syllable, where the opposition occurs. To isolate the contrasting lax-tense plosives, the second syllable in each word has been cut off in Praat, and the cropped sounds are presented in the form of waveforms and spectrograms below.

There are four sets of waveforms and spectrograms, Figure 1, Figure 2, Figure 3, and Figure 4, that show the contrast between (2a) – (2b), (2c) – (2d), (2e) – (2f), and (2g) – (2h), respectively. In each figure, the left waveform and spectrogram show the syllable with an initial lax plosive and the right diagrams display the syllable with an initial tense plosive. The two acoustic features that speakers of Javanese as the first language use to distinguish lax from tense plosives, the length of VOT and the first formant (F1) of the following vowel, are shown in each figure.

Figure 1. [b̥ɑ] – [pɑ]

Figure 1 above exhibits the contrast between the initial syllable [b̥ɑ] in (2a) ‘bayu’ [b̥ɑju] (diagrams on the left) and [pɑ] in (2b) ‘payu’ [pɑju] (diagrams on the right). The VOT of lax [b̥] in this sample, though not long enough to be perceived as aspiration, is 6 ms (milliseconds) longer than that of the tense bilabial plosive in the same set, [p], which is 13 ms. How these consonants affect the succeeding vowel [ɑ] differently is indicated by the value of F1. The mean F1 at the mid-point of the vowel [ɑ] that follows the lax plosive [b̥] is 860.5 Hz. This value is lower than the mean F1 of the vowel [ɑ] occurring after [p], which is 912.2 Hz.

Figure 2. [d̥̪ɑ] – [t̪ɑ]

Similarly, Figure 2 above shows that a longer VOT and a lower F1 on the following vowel are acoustic features associated with Javanese lax plosives. The figure presents the waveform and spectrogram of the first syllable in (2c), [d̥̪ɑ], and the first syllable in (2d), [t̪ɑ]. The VOT of the lax consonant [d̥̪] in the sample is 14 milliseconds, which is almost twice as long as that of the tense plosive [t̪] (8 milliseconds). The mean F1 of the vowel [ɑ] coming after [d̥̪] is 853.4 Hz. It is slightly lower than the value of the same vowel occurring after the tense plosive [t̪], which is 864.4 Hz.

Figure 3 below, which demonstrates the contrast between the initial syllables of the words in (2e) and (2f), shows that the mean F1 of [ɔ] is 603.9 Hz after [ɖ̥] and 916.7 Hz after [ʈ]. The VOT of the lax plosive [ɖ̥] is 14 ms, while that of [ʈ] is 6 ms shorter. Again, Figure 3 shows that the VOT of a retroflex lax plosive is longer than that of a retroflex tense plosive, and that the F1 of the vowel following a lax plosive is lower as well.

Figure 3. [ɖ̥ɔ] – [ʈɔ]

The first syllables of pair (2g) – (2h) are contrasted in Figure 4 (below).

Figure 4. [g̥ɑ] – [kɑ]

As is the case with the previous sets, the consonant with the longer VOT is the lax plosive [g̥]. The VOT of the tense plosive [k] is 22 milliseconds, which is 6 milliseconds shorter. The mean F1 of the vowel [ɑ] coming after the tense consonant is higher, which matches the behavior of the post-plosive vowels in the previous sets. The mean F1 of [ɑ] after [k] is 860.9 Hz, which is higher than the value of the same vowel following [g̥], which is 757.2 Hz.

These examples affirm that a longer VOT and a lower F1 of the post-plosive vowel feature prominently in lax plosives. The above examples fit the assertion by Brunelle (2010, p. 8) that the realization of the lax-tense contrast is systematic through VOT length distinction, difference in F1 values at the vowel onset, and vowel quality changes. Having observed the vowels /ɑ/, /ɔ/, and /u/ occurring after Javanese lax-tense plosives, Thurgood (2004, p. 291) reports that /ɑ/ and /ɔ/ coming after a lax plosive have a significantly lower F1. The vowel /u/ occurring after a lax plosive, on the other hand, is characterized by a higher F1. Visible in the waveforms and spectrograms created in Praat, VOT length and mean F1 at the mid-point of the vowel following a plosive are the two features used in the study to show Javanese phonological rules at work.
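The measurements reported for Figures 1 to 4 can be tabulated and checked in a few lines. The sketch below is an illustration only, not the procedure used in the study: the names `PAIRS` and `contrast` are hypothetical, and three of the VOT values are not stated directly in the text but derived from the stated 6 ms differences (lax bilabial 13 + 6 = 19 ms, tense retroflex 14 - 6 = 8 ms, lax velar 22 + 6 = 28 ms).

```python
# (VOT in ms, mean F1 in Hz) for each lax-tense pair, as reported for Figures 1-4.
# Derived values (see the paragraph above): lax bilabial VOT, tense retroflex VOT,
# and lax velar VOT.
PAIRS = {
    "bilabial":  {"lax": (19, 860.5), "tense": (13, 912.2)},
    "dental":    {"lax": (14, 853.4), "tense": (8, 864.4)},
    "retroflex": {"lax": (14, 603.9), "tense": (8, 916.7)},
    "velar":     {"lax": (28, 757.2), "tense": (22, 860.9)},
}


def contrast(pair):
    """Return (VOT difference in ms, F1 difference in Hz), lax minus tense."""
    lax_vot, lax_f1 = pair["lax"]
    tense_vot, tense_f1 = pair["tense"]
    return lax_vot - tense_vot, round(lax_f1 - tense_f1, 1)


for place, pair in PAIRS.items():
    vot_diff, f1_diff = contrast(pair)
    print(f"{place:<10} VOT {vot_diff:+d} ms, F1 {f1_diff:+.1f} Hz")
```

In all four pairs the VOT difference is positive and the F1 difference is negative, which is the pattern described above: the lax plosive has the longer VOT and the following vowel has the lower F1. The size of the F1 difference varies widely, from about 11 Hz for the dental pair to over 300 Hz for the retroflex pair.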

5. Comparison and Contrast of English – Javanese Plosives

To investigate the characteristics of English plosives produced by Javanese speakers, it is necessary to map the similarities and differences of plosives in both languages. Table 2 below summarizes findings from previous studies and functions as a tool to compare and contrast English and Javanese plosives. The table also serves as a basis to identify English plosives that may cause pronunciation problems for speakers of Javanese as a native language, to analyze possible influence from Javanese phonological rules, and to help explain the intelligibility of English words carrying plosives. In the last case, English raters are asked to rate the plausibility of the words being perceived as English. The table follows the suggestion by Ladefoged and Johnson (2011, p. 5) that the description of a phoneme be given in terms of the airstream process, the phonation process, the oro-nasal process, and the articulatory process. However, because speech production in both Javanese and English is pulmonic egressive, comparison in terms of the airstream process is unnecessary. The fact that the study focuses on plosives, a group of oral stops, also renders a special column delineating the oro-nasal process unneeded.

The letter ‘A’ in Table 2 below means that a plosive in this group occurs immediately adjacent to silence or a voiceless environment and is produced without voicing. After the release of the plosive, there is a delay in the voicing of the following vowel, which is perceived as aspiration. The aspiration following this plosive is a feature that helps listeners identify the phoneme itself.

The letter ‘B’ in Table 2 below indicates that a plosive in this group occurs immediately adjacent to silence or a voiceless environment. As a result, there is only partial voicing or no voicing at all. There is no delay in the voicing of the following vowel, resulting in no aspiration. The lack of aspiration helps listeners identify the phoneme itself.

A plosive in group ‘C’ in Table 2 below is produced with an open glottis, which results in no voicing. Partial voicing is possible when the phoneme occurs amidst voiced sounds. In contrast, a plosive in group ‘D’ is produced with a narrowed glottis and is fully voiced amidst voiced sounds.

A consonant in group ‘E’ is voiceless, produced with an open glottis. Because of that, it takes a relatively strong degree of muscular effort and breath force (fortis) to articulate the phoneme. When it occurs in final position, it causes pre-fortis clipping of the preceding voiced sound. As a result, the preceding voiced sound becomes shorter in duration.

Table 2. English-Javanese Plosives

English Plosives

IPA | Phonation | Word-initially | Word-medially | Word-finally | Place of articulation | Active articulator | Passive articulator
[p] | voiceless | A | C | E | bilabial | lower lip | upper lip
[b] | voiced | B | D | F | bilabial | lower lip | upper lip
[t] | voiceless | A | C | E | alveolar | tongue tip or blade | alveolar ridge
[d] | voiced | B | D | F | alveolar | tongue tip or blade | alveolar ridge
[k] | voiceless | A | C | E | velar | tongue back | velum
[g] | voiced | B | D | F | velar | tongue back | velum

Javanese Plosives

IPA | Phonation | Word-initially | Word-medially | Word-finally | Place of articulation | Active articulator | Passive articulator
[p] | voiceless | G | I | J | bilabial | lower lip | upper lip
[b̥] | voiceless | H | I | N/A | bilabial | lower lip | upper lip
[t̪] | voiceless | G | I | J | dental | tongue tip or blade | upper teeth
[d̥̪] | voiceless | H | I | N/A | dental | tongue tip or blade | upper teeth
[ʈ] | voiceless | G | I | J | retroflex | curled tongue tip | hard palate
[ɖ̥] | voiceless | H | I | N/A | retroflex | curled tongue tip | hard palate
[k] | voiceless | G | I | J | velar | tongue back | velum
[g̥] | voiceless | H | I | N/A | velar | tongue back | velum


Because a plosive in group ‘F’ occurs immediately adjacent to silence or a voiceless environment, voicing is absent or only partial. Any voiced sound occurring before the phoneme is produced with its full duration.

A phoneme in group ‘G’ is voiceless. The plosive is identifiable because the delay between the release of the phoneme and the voicing of the following vowel is shorter and the F1 value of the following vowel is higher, with modal voice quality (high register). Although group ‘H’ plosives are also voiceless, they are identifiable because the delay between the release of the phoneme and the voicing of the following vowel is longer. The F1 value of the following vowel is lower, with breathy voice quality (low register).

The phoneme in group ‘I’ is voiceless. When it occurs after a nasal, voicing may happen because of the influence of the nasal sound, which is voiced by nature. The phoneme is identifiable because of the register associated with it, a term that covers VOT length, F1 value of the following vowel, pitch height at the onset of the following vowel, and voice quality. Only voiceless tense plosives occur in group ‘J’ since the contrast is neutralized. The aforementioned groups classify Javanese and English plosives according to their features and how they interact with surrounding sounds. These Javanese and English plosives may influence, or be influenced by, the environment in which they occur in different ways. This may affect perceived intelligibility when a speaker of English as the first language produces Javanese utterances or vice versa.
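The grouping in Table 2 can also be expressed as a small lookup structure, which makes it easy to compare an English plosive with a Javanese one in the same word position. This is a sketch for illustration: the names `SERIES`, `MEMBERS`, and `group` are hypothetical, and the Javanese plosives are keyed by their orthographic spellings (th and dh for the retroflex pair, as in thothok and dhodhog) rather than by IPA symbols, which is a choice made here for convenience.

```python
# Group letters from Table 2 for word-initial, word-medial, and word-final position.
# None stands for the "N/A" cells: lax plosives do not occur word-finally.
POSITIONS = ("initial", "medial", "final")

SERIES = {
    ("english", "voiceless"): ("A", "C", "E"),
    ("english", "voiced"): ("B", "D", "F"),
    ("javanese", "tense"): ("G", "I", "J"),
    ("javanese", "lax"): ("H", "I", None),
}

# Assumed keys: plain letters for English, Javanese orthography for Javanese.
MEMBERS = {
    ("english", "voiceless"): {"p", "t", "k"},
    ("english", "voiced"): {"b", "d", "g"},
    ("javanese", "tense"): {"p", "t", "th", "k"},
    ("javanese", "lax"): {"b", "d", "dh", "g"},
}


def group(language, plosive, position):
    """Look up the Table 2 group letter, or None where the table has N/A."""
    for (lang, series), members in MEMBERS.items():
        if lang == language and plosive in members:
            return SERIES[(lang, series)][POSITIONS.index(position)]
    raise KeyError(f"{plosive!r} is not a {language} plosive in Table 2")


for position in POSITIONS:
    print(position, group("english", "b", position), group("javanese", "b", position))
```

The lookup makes the asymmetry in final position explicit: English keeps two groups there (E and F), while Javanese has only group J, and the lax series has no entry.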

6. Intelligibility

Along with Arabic, French, Greek, Sanskrit, and Spanish, English is among the most widely distributed languages in the world. One of the consequences of having a wide geographic distribution and a large number of speakers is the diverse range of peoples who use English (Nelson 2011, p. 2). Users of English and their local cultures have been shaping the language into its current diversified forms, e.g. Indian English, Singapore English, Nigerian English, Irish English, and South African English. In other words, English has undergone localization across the globe. At the same time, the language has also been globalized, as people who do not share a common linguistic and cultural background opt to use a form of English to communicate among themselves (Siemund, Davydova, and Maier 2012, p. 1).

When a language operates on a global level, an issue may arise concerning intelligibility. The question is whether the extent of the language's spread and the diversity of linguistic forms resulting from that spread create problems with understanding the different varieties of English that develop (Seargeant 2012, p. 5). There is a need to know whether one’s English will serve the purpose of communicating with other users of English who are not from one's immediate neighborhood, circle, region, or nation (Nelson 2011, p. 3). English has been diversified into so many forms that they are not always mutually intelligible (Siemund, Davydova, and Maier 2012, p. 2).

Intelligibility, however, is a general notion encompassing different degrees of problems, argue Kachru and Nelson (2011, p. 66). They add that without considering the social nature and basis of language, the notion of intelligibility fails to be meaningful. The degree of intelligibility, they continue, should be considered in a particular context of participants and situation: who the speakers are, where they are, when the conversation takes place, why they have the conversation, and so on. Nelson (2011, p. 5) adds that without considering the situation, almost nothing that a speaker says is communicative. He continues that discussing intelligibility without making reference to participants and other relevant aspects of the context of situation is unfeasible (Nelson 2011, p. 7).

As an approach to intelligibility, Kachru and Nelson (2011, pp. 67-68) and Nelson (2011, p. 28) cite Smith, who analyzes the issue of general intelligibility into three components: intelligibility, comprehensibility, and interpretability.

Intelligibility refers to the level of sound, which is the limited, technical sense of intelligibility (Kachru and Nelson 2011, p. 67). It covers the features of phonetics and phonology needed to recognize utterances. It involves the ability to parse utterances into recognizable or plausible words. When a person is able to recognize the sounds of the words “din-dins”, “turf”, “bookie”, or “tiffin” and believes that these words are English, intelligibility has been reached. The intelligibility of the utterances is said to be high when the person is able to recognize them as plausible English words based on their phonetic and phonological features, even though the person may not understand their meaning. In such a case, the person only needs to ask for clarification, and the entire exchange does not have to break down (Kachru and Nelson 2011, p. 67). The next level, comprehensibility, comprises assigning meaning to utterances, which is close to the idea of ‘understanding’ the meaning of utterances. Interpretability is the most complex level of communication, and the most important one. Speakers are not only able to identify phonetic and phonological features and recognize meaning, they are also capable of noticing the purpose of an utterance (Kachru and Nelson 2011, p. 68). Since the study is concerned only with how speakers of a geographically distant variety of English are able to recognize utterances produced by informants as English words, a discussion of comprehensibility and interpretability will not be pursued.

It is natural that we expect utterances in a language to sound the way we are used to hearing them (Nelson 2011, p. 22). For utterances to be called “English”, they need to be recognizably English enough to be effective with other English users. In the case of adult learners of English who have already acquired native competence in their first language (L1), “foreign-accented” pronunciation is inescapable (Fromkin, Rodman and Hyams 2011, p. 361). Their errors in pronunciation are often caused by transfer of the phonemes, phonological rules, or syllable structures of their first language (Fromkin, Rodman and Hyams 2011, p. 363). Production of English as a second language (L2) that is influenced by L1 is closely related to habit formation. The muscular movements that have been tuned to the production of L1 operate automatically in the production of English as L2 (Jenkins 2000, pp. 32-33). To produce sounds that are phonetically very different from those in the L1, Jenkins adds, the articulators have to make new gestures, which can be difficult. Conversely, sounds in L2 that are perceived as to some extent similar to those in L1 are categorized in terms of L1, Jenkins concludes. Problems with the influence of L1 features, however, do not always lead to an unresolvable situation. When a speaker of English as L2 produces English sounds differently from how a listener expects them to sound, some listeners rely on both linguistic and real-world contexts to apprehend what has just been said (Nelson 2011, p. 29). Moreover, when the listener spends more time with such speakers and becomes more comfortable and familiar with the way they produce the sounds of English as L2, the listener will, in most cases, eventually acquire open-mindedness and gain access to greater intelligibility. As Nelson (2011, p. 3) puts it, exposure and practice lead to more familiarity, which in the end lowers one’s intelligibility threshold. With more exposure, the sounds of English as L2 are easier to process and speech becomes more intelligible. Exposure and training, as it turns out, work both ways. Practicing English as a means of communication between people whose first languages are different, Jenkins (2014, p. 2) argues, may improve the ability to understand speakers of English from different first languages, as well as the ability to adjust one's own way of speaking so as to accommodate English-speaking interlocutors from a range of L1 backgrounds (Jenkins 2014, p. 97). She believes that exposure and practice help lower one’s intelligibility threshold and make one’s own utterances more accessible to others. In the case of English speakers whose L1 is Javanese, exposure and practice give them greater access to the intelligibility of utterances and a greater ability to adjust their own utterances so as to accommodate listeners whose L1 may not be Javanese. To see whether one speaker is more intelligible than another, a comparison is needed. To see the degree of intelligibility of an utterance in English as L2, Nelson (2011, p. 7) argues, one needs to consider participants and contexts.

To follow Kachru and Nelson (2011, p. 66), a discussion of the degree of intelligibility is only meaningful when the particular context of participants and situation is considered. As influence from L1 is inescapable, different L1s will affect the production of English as L2 differently, sometimes to the point of unintelligibility (Siemund, Davydova, and Maier 2012, p. 2). Unintelligibility increases as the gap between the interlocutors’ L1s widens (Jenkins 2000, p. 20). Therefore, to yield a meaningful degree of intelligibility, the comparison of English sounds produced as a second language is made among people who share the same situation, i.e. who are affected by the same L1 features. The degree of intelligibility is a comparison of how well each individual in the same group produces sounds that are recognizably “English”. Undoubtedly, each person has their own situation that may affect their production of English sounds. Previous training, exposure to English sounds, and experience in using English to communicate with speakers of English as a first language are likely to contribute to differences in the degree of intelligibility.

B. Theoretical Framework

It is expected that the characteristics of English plosives produced by Javanese speakers result from the influence of Javanese phonological rules. Such influence may be evidenced in different environments. First, a word-initial English voiced plosive has a longer VOT, and the F1 of the following vowel is lower. Second, a word-initial English voiceless plosive has a shorter VOT, and the F1 of the following vowel is higher. Third, words with a final English voiceless plosive are not necessarily shorter in duration, because pre-fortis clipping does not occur. Fourth, the influence of Javanese phonological rules may to some extent affect how plausibly words produced by speakers of Javanese as a first language are perceived as English.

As mentioned previously, a word-initial English voiceless plosive needs to have a VOT longer than 20 milliseconds to be perceived as aspirated. It was also stated that a voiced sound preceding an English voiced plosive needs to be articulated long enough to be perceived as fully voiced.


CHAPTER III

RESEARCH METHODOLOGY

A. Data of the Study and Data Source

The raw data collected for the study were recordings of English words in which plosives at different places of articulation occur in initial, medial, and final positions. The words, printed on paper and presented as groups of minimal pairs, were adapted from J. D. O'Connor’s Better English Pronunciation. The words were read aloud by speakers of Javanese as a first language and recorded on a cellphone, a Samsung Galaxy ACE 3 GT S7270. The cellphone was handheld and kept at a distance of 10-15 centimeters from the informant’s mouth. Each informant’s production was recorded into a single audio file. Six audio files were made and saved in MPEG-4 AAC LC format at a sampling rate of 48,000 Hz. Each audio file ran for around 3 minutes and was small enough to be sent as an email attachment for intelligibility measurement.

To document the degree of intelligibility, a range of intelligibility levels was set. The range, resembling a ruler, bears 30 notches along its length (see figure 5). These notches give raters flexibility in expressing how plausibly an utterance made by an informant can be identified as an English word. The first group of notches at the extreme left represents varying degrees of complete unintelligibility. The next six notches to the right represent different degrees of unintelligibility in most cases. The group of notches in the middle represents varying degrees of adequate intelligibility. The next notches to the right represent varying degrees of intelligibility in most cases. The last group of notches at the extreme right represents different degrees of complete intelligibility.
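As a rough illustration of how a mark on the ruler maps to one of the five bands described above, the sketch below assumes that the 30 notches are split into five equal groups of six. Only the second group is explicitly described as having six notches, so the equal split, the band labels, and the function name `band_for_notch` are assumptions made for this example rather than part of the instrument.

```python
# Hypothetical mapping from a ruler notch (1-30) to an intelligibility band.
# Assumption: five equal bands of six notches each, counted from the left.
BANDS = [
    "complete unintelligibility",
    "unintelligibility in most cases",
    "adequate intelligibility",
    "intelligibility in most cases",
    "complete intelligibility",
]


def band_for_notch(notch: int) -> str:
    """Return the band that a notch number (1 = far left) falls into."""
    if not 1 <= notch <= 30:
        raise ValueError("notch must be between 1 and 30")
    return BANDS[(notch - 1) // 6]
```

Under this assumption, a mark on notch 15 falls in the middle band, and any two marks inside the same band still differ in degree, which is the flexibility the ruler is meant to give raters.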


The raters first gave their perception of each informant's degree of intelligibility as a general impression. Later, one of the raters, Kristen, gave her perception of the degree of intelligibility of each group of minimal pairs produced by an individual informant.

Figure 5. Range of intelligibility levels

The fact that there is no nativized variety of English in Indonesia (Wee 2010, p. 109) means that there is no sizeable number of people in the country whose first language is English. Yet Javanese people who are fluent in English do exist, and those readily available for the study, whose good command of English was evidenced by their high TOEFL scores, were awardees of a BAPPENAS scholarship. At the time the data were taken, they were attending intensive English training at Pusat Pelatihan Bahasa, Fakultas Ilmu Budaya, Universitas Gadjah Mada. The compulsory training was an attempt to boost their IELTS scores prior to attending overseas universities. The five informants who agreed to participate in the study, Mita, Yudi, Doni, Nurul, and Adi, were civil servants in their early thirties. The sixth informant, Nana, was a teacher at Pusat Pelatihan Bahasa, Fakultas Ilmu Budaya, Universitas Gadjah Mada. She had already obtained her master's degree in English literature, and her previous education was in English literature as well.

Two raters, residents of California, were asked to provide their empirical impression of the degree of intelligibility of the English sounds produced by the informants. This does not suggest an endorsement of the view that American English, or any other variety of English, is more “standard” or “acceptable” than others. The raters were chosen because they speak English as their first language and were readily available. The raters, Kristen and Mum, agreed to participate at short notice. However, as Mum was preparing for her relocation when the study was conducted, her involvement was not as intensive as that of Kristen, her daughter. Still, Mum managed to give her general impression of the degree of intelligibility of all six informants.

B. Approach

In the investigation of the characteristics of English plosives produced by Javanese speakers, data for the study were treated as phonetic units. Acoustic features were observed in Praat, a scientific software package developed by Paul Boersma and David Weenink of the University of Amsterdam. The software, designed specifically for doing phonetics by computer, is available for download free of charge.

The study was quantitative in nature, since it aimed to quantify how Javanese phonological rules influence the production and intelligibility of English words carrying plosives as spoken by Javanese informants. The degree of intelligibility in the study expresses how plausibly the words can be perceived as English. The study sought to make predictions about Javanese speakers attempting to produce English plosives.

C. Method of the Study

The study was conducted in June 2015 in Yogyakarta. Five informants whose TOEFL scores were between 590 and 630 and one English trainer were invited to participate in the study. They were three males (Yudi, Doni, and Adi) and three females (Mita, Nurul, and Nana). Their voices were recorded in the library of Pusat Pelatihan Bahasa, Fakultas Ilmu Budaya, Universitas Gadjah Mada, Yogyakarta.

1. Data Collection

Data were taken from recorded words purposefully selected for the study. The words were spoken by Javanese males and females who resided in Yogyakarta at the time the study was conducted. All informants were a readily available source of data and were capable of using English to communicate, as shown by their TOEFL scores. Informants were asked to read the groups of minimal pairs shown below. Only the target words (in boldface) were used and processed for the study. There were fifteen groups of words. Nine groups featured the different environments in which plosives occur: bilabials occurring word-initially (group 1), word-medially (group 2), and word-finally (group 3); alveolars occurring word-initially (group 4), word-medially (group 5), and word-finally (group 6); and velars occurring word-initially (group 7), word-medially (group 8), and word-finally (group 9). Six further groups were presented to the informants, but the sounds produced were not analyzed: palatals occurring word-initially (group 10), word-medially (group 11), and word-finally (group 12); nasals occurring word-finally (group 13) and word-medially (group 14); and syllabic nasals (group 15).

Group 1

peak beak pull bull pit bit pride bride pack back port bought park bark plays blaze

Group 2

happy shabby repel rebel (verb) supper rubber simple symbol paper labor apply oblige

Group 3

rip rib cap cab rope robe tripe tribe tap tab wrap grab

Group 4

two do torn dawn ten den tie die ton done town down tune dune twin dwindle

Group 5

writer rider wetting wedding latter ladder water warder whitish widish putting pudding

Group 6

bet bed heart hard late laid sight side set said brought broad

Group 7

cave gave card guard curl girl could good cap gap coal goal class glass crow grow

Group 8

licking digging lacking lagging weaker eager thicker bigger market target ankle angle


Group 9

pick pig dock dog back bag lock log lake plague broke rogue

Group 10

chin gin choke joke cheer jeer chain Jane choice Joyce chest jest

Group 11

riches ridges batches badges catching cadging watching lodging fetching edging kitchen pigeon

Group 12

rich ridge catch cadge search surge H age fetch edge watch lodge

Group 13

him limp room lump one send soon fond lamb lamp game games tin sent mine sons

Group 14

lambs lamp hums hump send sent sins since joined joint complained complaint

Group 15

person reason fashion occasion even often region kitchen

2. Data Analysis

To begin analyzing the data, the sound files needed to be converted, since Praat does not read the MPEG-4 AAC format in which they were recorded. A third-party application was installed and used to convert the files from their original mp4 format to wav. Each file, now in wav format, was opened in Praat, and its two waveforms (because the sound was still in stereo) and spectrogram were displayed. Since not all word groups were analyzed, the parts that were not needed were removed, including interfering sounds such as coughing or a chair being dragged. After removal of the unnecessary parts, the stereo files were converted to mono so that there was only one waveform instead of two for each file. Each file was then renamed and ready for analysis.
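The stereo-to-mono step was carried out with existing tools, and the text does not say how the two channels were combined. Purely as an illustration of what such a conversion involves, the sketch below averages the left and right samples of interleaved 16-bit audio. The function name `downmix_to_mono` and the averaging rule are assumptions made for this example.

```python
from array import array


def downmix_to_mono(interleaved: array) -> array:
    """Average interleaved left/right 16-bit samples into one channel.

    Assumes signed 16-bit samples stored as L, R, L, R, ...
    """
    if len(interleaved) % 2 != 0:
        raise ValueError("stereo data must contain an even number of samples")
    left = interleaved[0::2]
    right = interleaved[1::2]
    # The integer average of two 16-bit values stays inside the 16-bit range.
    return array("h", ((l + r) // 2 for l, r in zip(left, right)))
```

Averaging rather than keeping a single channel means that sound captured on either channel contributes to the single waveform that is later inspected.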

Values of VOT, F1, and duration of voiced sound were taken from the waveform and spectrogram of the newly edited file. Since the distinctive feature of an initially positioned Javanese lax plosive is the low register associated with it, the VOT and F1 values of English words produced by each informant were compared to investigate the influence of Javanese phonological rules. The VOT of an English voiced plosive was checked to see whether it was longer than the VOT of the English voiceless plosive produced at the same place of articulation. Similarly, the vowel following an English voiced plosive was observed to check whether its F1 was lower than that of the same vowel following the English voiceless plosive produced at the same place of articulation. The duration of the voiced sound preceding a final plosive was also observed. The value was taken from English words produced by the informants to understand their strategy for accentuating the difference between English voiced and voiceless plosives in word-final position. Onset F1 frequencies were taken from a location within the voiced sound where the frequencies had already stabilized or remained stable.

To measure Voice Onset Time, it was necessary to zoom into the waveform until the beginning of voicing was visually clear. After two points in time, the start of the plosive release and the beginning of voicing, had been determined, the interval between them was measured as the VOT. Voicing that began before the release of the plosive closure was recorded as negative VOT.

To obtain F1 at the onset, the cursor was positioned at the point in time already determined as the beginning of voicing. The cursor was then moved to the right, to the point in time at which the cycles of the voiced sound began to stabilize. The option “move cursor by…” within the “Select” menu came in handy for moving the cursor to the right by exactly 10 milliseconds, which ensured uniform and precise cursor placement.
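The two cursor-based measurements above reduce to simple arithmetic on time stamps. The following is a minimal sketch, assuming the release and voicing-onset times have already been read off the waveform in seconds; the function names and the example time stamps are hypothetical and are not taken from the recordings.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """Voice Onset Time in milliseconds.

    Positive when voicing starts after the plosive release, negative when
    voicing starts before it.
    """
    return (voicing_onset_s - release_s) * 1000.0


def f1_sample_time_s(voicing_onset_s: float, offset_ms: float = 10.0) -> float:
    """Time at which onset F1 is read: a fixed offset after voicing begins."""
    return voicing_onset_s + offset_ms / 1000.0


# Hypothetical time stamps (in seconds) read off a waveform.
print(vot_ms(release_s=1.250, voicing_onset_s=1.285))  # about 35 ms
print(vot_ms(release_s=2.100, voicing_onset_s=2.060))  # about -40 ms
print(f1_sample_time_s(voicing_onset_s=1.285))         # about 1.295 s
```

Keeping the sign of the difference, rather than its absolute value, is what allows voicing that precedes the release to be recorded as negative VOT.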

Data on word-final plosives were taken by selecting the whole word and zooming in to show only the voicing. The beginning and end of voicing were determined, and the interval between them was measured to obtain the duration of the voiced sound preceding the plosive.

One of the important measurements in the study was the comparison of the VOT length of English voiceless plosives with that of English voiced plosives produced by the Javanese informants. To compare, the VOT length of /b/, /d/, and /g/ was subtracted from the VOT length of /p/, /t/, and /k/ respectively. A positive result would indicate that the VOT of a voiceless plosive is longer than that of the voiced plosive at the same place of articulation, and a negative result would signify the opposite. While a positive result would better fit what a speaker of English as a first language expects as a feature of /p/, /t/, and /k/ occurring initially in a stressed syllable, a negative result would indicate a tendency to follow the Javanese phonological rule that /p/, /t̪/, /ʈ/, and /k/ are associated with high register, which sets a shorter VOT for tense plosives and a higher F1 frequency for the vowel following them.

The comparison of the length of the voiced sound preceding a plosive in final position was another important measurement in the study. The length of the voiced sound preceding /p/, /t/, and /k/ in final position was subtracted from the length of the voiced sound occurring before word-final /b/, /d/, and /g/. A positive result would show that the voiced sound preceding a voiced plosive is longer than that preceding the voiceless plosive at the same place of articulation, and a negative result would mean the opposite. A positive result would better fit what a speaker of English as a first language expects to hear from a voiced sound occurring before the word-final voiced plosives /b/, /d/, and /g/. A negative result would neither confirm nor deny a tendency to follow Javanese phonological rules, since the opposition between tense and lax plosives is neutralized in word-final position.
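The two subtractions described above differ in their direction: the VOT comparison takes voiceless minus voiced, whereas the duration comparison takes voiced minus voiceless. The sketch below makes the two directions explicit and turns the sign of each result into a "longer" or "shorter" label. The function names and the sample values are hypothetical, and the "equal" label for a zero difference is an assumption, since the text does not say how ties were treated.

```python
def vot_contrast_ms(voiceless_vot_ms: float, voiced_vot_ms: float) -> float:
    """VOT of initial /p t k/ minus VOT of initial /b d g/ for one word pair."""
    return voiceless_vot_ms - voiced_vot_ms


def duration_contrast_ms(before_voiced_ms: float, before_voiceless_ms: float) -> float:
    """Voiced-sound duration before final /b d g/ minus that before final /p t k/."""
    return before_voiced_ms - before_voiceless_ms


def label(difference_ms: float) -> str:
    """Turn the sign of a difference into 'longer', 'shorter', or 'equal'."""
    if difference_ms > 0:
        return "longer"
    if difference_ms < 0:
        return "shorter"
    return "equal"  # assumption: ties are not discussed in the text


# Hypothetical measurements in milliseconds.
print(label(vot_contrast_ms(45.0, 12.0)))         # longer
print(label(duration_contrast_ms(180.0, 210.0)))  # shorter
```

In both comparisons a positive difference, labelled "longer", is the outcome that matches what a speaker of English as a first language expects.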

Comparison of F1 frequencies occurring at the onset of voicing was made to supplement data on VOT length. Observation of the length of voiced sound preceding a plosive in word-final position was necessary to understand what strategies the informants used to accentuate the difference between English voiced and voiceless plosives in word-final position.

Meanwhile, the sound files were emailed to Kristen and Mum in California. The two native English speakers listened to the files on a computer, filled out the sheets documenting the degree of intelligibility, and gave their empirical perception of the English sounds they were listening to. They placed marks on the measuring ruler to record how intelligible a given informant was compared to the rest and how intelligible his or her pronunciation of a certain group of words was in comparison to the rest of the word groups. The recorded degrees of perceived intelligibility, along with comments from the two English speakers, were emailed back for further analysis.


CHAPTER IV

RESULTS AND DISCUSSION

A. Results

Two parameters were used to describe the characteristics of English plosives in word-initial position. The values of onset F1 and VOT were used to describe the characteristics of word-initial /p/, /b/, /t/, /d/, /k/, and /g/ produced by the Javanese informants. One parameter was used to characterize English plosives in word-final position. The duration of the voiced sound preceding the final plosive was used to describe the characteristics of word-final /p/, /b/, /t/, /d/, /k/, and /g/ produced by the Javanese informants.

The recorded words were analyzed in Praat, and the measurements of onset F1 frequency, VOT length, and duration of the vowel preceding a plosive in final position are presented below. Table 3 below lists the F1 values of the voiced sound following the voiced and voiceless plosives observed in the study. The F1 values were taken at the onset of the voiced sound, 10 milliseconds after voicing began.

1. Onset F1 frequency

From an acoustic point of view, word-initial English plosives produced by the Javanese informants were characterized by a higher onset F1 frequency in the vowel immediately following /p/, /t/, and /k/ and a reduced onset F1 frequency in the vowel immediately following /b/, /d/, and /g/ in most tokens (see table 3 below).

The Javanese informants tended to produce /i:/ in peak and beak and /ɪ/ in pit and bit identically, as a shortened /i/. At the same time, they exhibited an inclination to realize English /æ/ in pack, back, cap, and gap as /e/. They were also inclined to realize English /ʊ/ and /u:/ as a shortened /u/. Female informants also tended to produce a wider range of F1 frequencies than males.

Table 3. Range of F1 frequencies

F1 ranges in Hertz

Following bilabial plosives
Vowel   Male, after /p/   Male, after /b/   Female, after /p/   Female, after /b/
/i/     351.1 - 427.3     309.3 - 413.2     379.4 - 757.9       326.7 - 458.6
/u/     427.7 - 571.5     444.7 - 591.5     418.7 - 575.1       411.9 - 507.9
/e/     637.2 - 736.3     500.5 - 519.5     749.3 - 886.6       494.9 - 640.8
/ɔ:/    586.5 - 643.0     540.0 - 649.1     489.7 - 602.2       432.7 - 615.7
/ɑ/     710.0 - 850.5     555.1 - 974.8     694.2 - 755.0       461.5 - 563.7

Following alveolar plosives
Vowel   Male, after /t/   Male, after /d/   Female, after /t/   Female, after /d/
/u/     350.4 - 466.2     318.9 - 438.8     308.5 - 447.5       302.8 - 394.6
/ɔ:/    656.4 - 734.9     504.7 - 646.4     562.1 - 815.7       493.2 - 564.0
/e/     597.9 - 718.6     424.9 - 513.5     615.0 - 800.4       442.6 - 526.2
/aɪ/    654.8 - 741.1     466.2 - 563.0     347.5 - 857.0       521.8 - 606.7
/ʌ/     589.4 - 748.0     490.9 - 575.4     683.7 - 839.2       539.0 - 634.2
/aʊ/    696.1 - 772.7     513.6 - 600.5     762.1 - 825.3       494.6 - 539.1

Following velar plosives
Vowel   Male, after /k/   Male, after /g/   Female, after /k/   Female, after /g/
/eɪ/    379.8 - 464.9     374.9 - 410.3     454.7 - 610.5       386.1 - 440.7
/ɑ/     710.7 - 758.3     453.5 - 583.0     589.5 - 861.6       441.1 - 561.0
/ɜ:/    476.4 - 591.2     426.1 - 521.1     517.4 - 707.4       430.0 - 488.6
/u/     464.3 - 480.0     379.2 - 545.7     340.8 - 513.3       356.0 - 398.4
/e/     579.0 - 629.2     425.4 - 489.4     557.6 - 755.6       404.9 - 486.2
/əʊ/    509.6 - 569.4     451.9 - 510.8     449.9 - 749.9       369.0 - 460.8

Instances of an increase in F1 frequency, instead of a decrease, occurred sporadically in a small number of tokens. Such increases could have been incidental rather than intentional. Occurrences of reduced F1 frequency in the vowel immediately following a voiced plosive, however, were more consistent. Table 4 below shows in more detail the consistency of F1 reduction when a vowel occurs immediately after /b/, /d/, and /g/.

Table 4 below shows that the F1 frequency of Mita’s vowel in peak, measured 10 milliseconds after the start of voicing of /i:/, was 455 Hertz. The F1 frequency of the same vowel in beak at the onset (10 milliseconds after the start of voicing of /i:/) was lower, at 399.5 Hertz. Mita’s tendency to produce a lower F1 frequency in a vowel following a voiced plosive was consistent across all other tokens. The table indicates that Mita was the only informant who consistently reduced the onset F1 frequency of the voiced sound occurring after voiced plosives.

Table 4. Comparison of F1 frequencies at the onset of vowel following word-initial voiceless-voiced plosives*

Word pair       Mita          Yudi          Doni          Nurul         Adi           Nana
                /p/    /b/    /p/    /b/    /p/    /b/    /p/    /b/    /p/    /b/    /p/    /b/
peak - beak     455.0  399.5  393.4  329.9  387.8  345.9  494.6  368.9  400.9  387.5  757.9  454.8
pull - bull     575.1  507.9  427.7  569.2  463.7  444.7  551.4  455.1  571.5  591.7  418.7  411.9
pit - bit       432.6  384.7  390.1  309.3  351.1  356.2  493.1  458.6  427.3  413.2  379.4  326.7
pack - back     886.6  494.9  726.3  519.5  637.2  500.5  749.3  640.8  736.3  538.3  840.1  506.4
port - bought   602.2  586.2  648.9  559.7  643.0  540.0  489.7  615.7  586.5  646.1  562.0  432.7
park - bark     722.1  527.1  850.5  974.8  710.0  555.1  694.2  461.5  764.0  707.5  775.0  563.7
                /t/    /d/    /t/    /d/    /t/    /d/    /t/    /d/    /t/    /d/    /t/    /d/
two - do        447.5  391.8  443.5  428.6  350.4  423.2  369.0  371.2  466.2  318.9  388.0  302.8
torn - dawn     562.1  545.8  656.7  646.4  656.4  504.7  815.7  493.2  734.9  621.3  614.1  564.0
ten - den       615.0  526.2  718.6  478.5  597.9  424.9  800.4  497.6  643.5  513.5  735.9  442.6
tie - die       823.4  521.8  719.6  563.0  654.8  466.2  347.5  606.7  741.1  547.2  857.0  576.8
tone - done     839.2  552.4  664.1  575.4  589.4  490.9  819.6  539.0  748.0  550.2  683.7  634.2
town - down     762.1  539.1  772.7  587.1  699.7  513.6  811.3  494.6  696.1  600.5  825.3  514.2
tune - dune     432.1  390.2  432.5  415.2  352.6  377.9  406.2  394.6  415.7  438.8  308.5  371.8
                /k/    /g/    /k/    /g/    /k/    /g/    /k/    /g/    /k/    /g/    /k/    /g/
cave - gave     469.9  386.1  464.9  410.3  379.8  374.9  610.5  440.7  408.8  396.8  454.7  392.1
card - guard    680.9  561.0  758.3  453.5  722.1  583.0  861.6  530.5  710.7  541.0  589.5  441.1
curl - girl     545.7  480.7  476.4  434.8  532.4  426.1  707.4  488.6  591.2  521.1  517.4  430.0
could - good    454.0  398.4  480.0  414.4  472.5  379.2  513.3  390.2  464.3  545.7  340.8  356.0
cap - gap       650.8  404.9  629.2  480.2  592.0  425.4  755.6  486.2  579.0  489.4  557.6  411.2
coal - goal     531.4  432.4  562.6  634.8  509.6  451.9  749.9  460.8  569.4  510.8  449.9  369.0

*in Hertz

Nevertheless, the other informants also tended to reduce the F1 frequency of a vowel occurring after a voiced plosive. Yudi produced the vowel in peak with a higher onset F1 frequency than in beak: 393.4 Hertz in peak and 329.9 Hertz in beak. Similarly, Doni articulated the vowel in peak with a higher onset F1 frequency than in beak: 387.8 Hertz in peak and 345.9 Hertz in beak. Likewise, the onset F1 frequency of Nurul’s vowel was higher in peak, at 494.6 Hertz, than in beak, at 368.9 Hertz. Adi also produced the vowel with a higher onset F1 frequency in peak than in beak: 400.9 Hertz in peak and 387.5 Hertz in beak. Similar to Adi, Nana produced the vowel in peak with a higher onset F1 frequency than in beak: 757.9 Hertz in peak and 454.8 Hertz in beak.

The results of the F1 frequency measurement showed that Yudi, Doni, Nurul, Adi, and Nana reduced the onset F1 frequency of the voiced sound occurring after voiced plosives in most tokens observed in the study. In conclusion, English voiced plosives produced by the Javanese informants were characterized by a reduction in the F1 frequency of the voiced sound immediately following the plosives.
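The consistency claim above can be checked by counting, for each informant, the word pairs in which onset F1 is lower after the voiced plosive than after the voiceless one. The sketch below does this for the six bilabial pairs only, using the values reported in Table 4. The variable and function names are assumptions made for this illustration, and the count is a simple tally rather than a statistical test.

```python
INFORMANTS = ["Mita", "Yudi", "Doni", "Nurul", "Adi", "Nana"]

# Onset F1 in Hz as (after voiceless, after voiced), one tuple per informant
# in the order above, copied from the bilabial rows of Table 4.
BILABIAL_ROWS = {
    "peak - beak": [(455.0, 399.5), (393.4, 329.9), (387.8, 345.9),
                    (494.6, 368.9), (400.9, 387.5), (757.9, 454.8)],
    "pull - bull": [(575.1, 507.9), (427.7, 569.2), (463.7, 444.7),
                    (551.4, 455.1), (571.5, 591.7), (418.7, 411.9)],
    "pit - bit": [(432.6, 384.7), (390.1, 309.3), (351.1, 356.2),
                  (493.1, 458.6), (427.3, 413.2), (379.4, 326.7)],
    "pack - back": [(886.6, 494.9), (726.3, 519.5), (637.2, 500.5),
                    (749.3, 640.8), (736.3, 538.3), (840.1, 506.4)],
    "port - bought": [(602.2, 586.2), (648.9, 559.7), (643.0, 540.0),
                      (489.7, 615.7), (586.5, 646.1), (562.0, 432.7)],
    "park - bark": [(722.1, 527.1), (850.5, 974.8), (710.0, 555.1),
                    (694.2, 461.5), (764.0, 707.5), (775.0, 563.7)],
}


def count_lower_after_voiced(rows):
    """Count, per informant, the pairs where F1 is lower after the voiced plosive."""
    counts = dict.fromkeys(INFORMANTS, 0)
    for pairs in rows.values():
        for name, (after_voiceless, after_voiced) in zip(INFORMANTS, pairs):
            if after_voiced < after_voiceless:
                counts[name] += 1
    return counts


for name, n in count_lower_after_voiced(BILABIAL_ROWS).items():
    print(f"{name}: {n} of {len(BILABIAL_ROWS)} pairs")
```

Extending the dictionary with the alveolar and velar rows of Table 4 would allow the same tally over all nineteen word pairs.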

2. VOT length

From an acoustic viewpoint, voiceless English plosives occurring in word-initial position were characterized by problems with aspiration. In most tokens featuring word-initial /p/ observed in the study, the VOT values were shorter than 20 milliseconds, which hampers perceived aspiration. Figure 6 below shows that only 22.2% of all tokens featuring /p/ were aspirated. Figure 6 also shows that 52.4% of all tokens featuring word-initial /t/ were aspirated, and 86.1% of tokens featuring word-initial /k/ were produced with perceivable aspiration. However, there were potential problems concerning the acoustic identification of word-initial velars produced by the Javanese informants. Figure 6 shows that almost all tokens featuring word-initial /g/ in the study (94.4%) were produced with very long VOT values, making them easily perceivable as aspirated.

Statistically speaking, if perceived aspiration were the only cue for identifying English plosives in initial position as produced by the informants, there would be a better chance of recognizing English alveolars than bilabials and velars. More than 50% of all tokens with word-initial /t/ were produced with perceivable aspiration, whereas only 21.4% of all tokens featuring word-initial /d/ were aspirated.
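The percentages above follow from applying the 20-millisecond criterion to each token and counting the tokens that exceed it. The following is a minimal sketch of that calculation. The VOT values in the example are invented for illustration and are not measurements from the study, and the names `ASPIRATION_THRESHOLD_MS` and `percent_aspirated` are hypothetical.

```python
ASPIRATION_THRESHOLD_MS = 20.0  # VOT must be longer than this to count as aspirated


def percent_aspirated(vots_ms):
    """Percentage of tokens whose VOT exceeds the aspiration threshold."""
    if not vots_ms:
        raise ValueError("at least one token is required")
    aspirated = sum(1 for vot in vots_ms if vot > ASPIRATION_THRESHOLD_MS)
    return 100.0 * aspirated / len(vots_ms)


# Invented VOT values in milliseconds for six tokens of one plosive.
example_tokens = [8.0, 12.5, 25.0, 15.0, 31.0, 18.0]
print(round(percent_aspirated(example_tokens), 1))  # 33.3
```

A token with a negative VOT falls below the threshold and is therefore counted as unaspirated in this sketch.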

Figure 6. VOT lengths of voiceless and voiced plosives (bar chart showing the percentage of aspirated and unaspirated tokens, on a scale from 0% to 100%, for /p/, /t/, /k/, /b/, /d/, and /g/)

Presented in table 5 below is a comparison of the VOT lengths of voiceless and voiced plosives at the same place of articulation. The conclusion of the comparison is given under each informant's name. The statement “longer” means that the VOT of the word-initial voiceless plosive is longer than that of the word-initial voiced plosive in the same word pair. Likewise, the statement “shorter” signifies that the VOT of the word-initial voiceless plosive is shorter than that of the word-initial voiced plosive.

Table 5 shows that Mita consistently made the VOT of all her voiceless plosives longer than that of the voiced ones. Yudi, by contrast, almost consistently produced his word-initial voiceless plosives with a VOT shorter than that of the voiced plosives. Of the nineteen word pairs observed in the study, Yudi made the VOT of the voiceless plosive longer than the VOT of the voiced plosive in only three. Doni, Nurul, Adi, and Nana were inconsistent in making the VOT of the voiceless plosive longer than the VOT of the voiced plosive.


Table 5. Comparison of VOT lengths of voiceless and voiced plosives

Word pair         Mita     Yudi     Doni     Nurul    Adi      Nana
peak - beak       longer   shorter  shorter  shorter  shorter  shorter
pull - bull       longer   shorter  longer   shorter  shorter  shorter
pit - bit         longer   shorter  shorter  shorter  shorter  longer
pack - back       longer   shorter  shorter  shorter  shorter  longer
port - bought     longer   longer   shorter  longer   shorter  longer
park - bark       longer   shorter  shorter  longer   shorter  longer

two - do          longer   shorter  longer   longer   shorter  shorter
torn - dawn       longer   shorter  longer   longer   longer   longer
ten - den         longer   longer   shorter  longer   shorter  longer
tie - die         longer   shorter  longer   longer   longer   longer
tone - done       longer   shorter  shorter  longer   shorter  shorter
town - down       longer   shorter  longer   shorter  shorter  shorter
tune - dune       longer   shorter  longer   longer   longer   longer

cave - gave       longer   shorter  shorter  longer   shorter  shorter
card - guard      longer   longer   longer   shorter  shorter  longer
curl - girl       longer   shorter  longer   longer   shorter  longer
could - good      longer   shorter  longer   shorter  shorter  longer
cap - gap         longer   shorter  longer   shorter  shorter  longer
coal - goal       longer   shorter  longer   longer   longer   longer
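The pattern in Table 5 can be summarized by counting the “longer” entries in each informant's column. The sketch below encodes each row of the table as a six-letter string, one letter per informant, and tallies the columns. The encoding and the function name `count_longer` are choices made for this illustration.

```python
INFORMANTS = ["Mita", "Yudi", "Doni", "Nurul", "Adi", "Nana"]

# One string per word pair in Table 5, one letter per informant in the order
# above: "L" = voiceless VOT longer, "S" = voiceless VOT shorter.
TABLE_5 = [
    "LSSSSS", "LSLSSS", "LSSSSL", "LSSSSL", "LLSLSL", "LSSLSL",             # bilabials
    "LSLLSS", "LSLLLL", "LLSLSL", "LSLLLL", "LSSLSS", "LSLSSS", "LSLLLL",   # alveolars
    "LSSLSS", "LLLSSL", "LSLLSL", "LSLSSL", "LSLSSL", "LSLLLL",             # velars
]


def count_longer(rows):
    """Count, per informant, the word pairs marked 'longer'."""
    counts = dict.fromkeys(INFORMANTS, 0)
    for row in rows:
        for name, mark in zip(INFORMANTS, row):
            if mark == "L":
                counts[name] += 1
    return counts


counts = count_longer(TABLE_5)
print(counts["Mita"], counts["Yudi"])  # 19 3
```

The tally reproduces the two figures stated in the text: Mita's voiceless VOT is longer in all nineteen pairs, and Yudi's in three.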

3. Duration of voiced sound preceding word-final plosive

From an acoustic point of view, word-final English plosives produced by the Javanese informants were characterized by inconsistencies in the duration of the voiced sound preceding a plosive in final position. Presented in table 6 below is the comparison of the duration of the voiced sound occurring before word-final plosives. The statement “longer” means that the voiced sound coming immediately before a word-final voiced plosive is longer in duration than the same voiced sound when it precedes a word-final voiceless plosive. The statement “shorter”, on the other hand, indicates that the voiced sound immediately preceding a word-final voiced plosive is shorter in duration than the same voiced sound when it occurs immediately before a word-final voiceless plosive.

Table 6 shows that Mita invariably made the voiced sound occurring immediately before a word-final voiced plosive longer than the voiced sound immediately preceding a word-final voiceless plosive. This held in all of the observed tokens she produced.

Table 6. Comparison of duration of voiced sound preceding final plosives

Word pair         Mita     Yudi     Doni     Nurul    Adi      Nana

Final Bilabial
rip - rib         longer   shorter  shorter  longer   longer   longer
rope - robe       longer   shorter  shorter  longer   shorter  longer
tap - tab         longer   shorter  longer   longer   shorter  shorter
cap - cab         longer   shorter  longer   longer   longer   shorter
tripe - tribe     longer   shorter  longer   longer   longer   shorter

Final Alveolar
bet - bed         longer   shorter  shorter  shorter  longer   longer
late - laid       longer   longer   longer   longer   longer   longer
set - said        longer   longer   longer   longer   longer   longer
heart - hard      longer   shorter  longer   longer   longer   shorter
sight - side      longer   longer   longer   longer   longer   shorter
brought - broad   longer   longer   shorter  shorter  shorter  shorter

Final Velar
pick - pig        longer   longer   shorter  longer   shorter  longer
back - bag        longer   longer   longer   longer   shorter  shorter
dock - dog        longer   shorter  longer   longer   shorter  longer
lock - log        longer   shorter  shorter  longer   longer   longer

The rest of the informants, however, were inconsistent. They made the voiced sound immediately preceding a word-final voiced plosive longer than the voiced sound immediately before a word-final voiceless plosive in some tokens and shorter in others.
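The consistency pattern in Table 6 can also be checked with a simple tally. The following Python sketch is an illustration added for this purpose and is not part of the original analysis; the variable and function names are assumptions, and the labels are transcribed from Table 6 in row order.

```python
# Labels transcribed from Table 6 in row order (rip-rib ... lock-log):
# "L" = longer, "S" = shorter. Each value groups the five bilabial,
# six alveolar, and four velar pairs. All names here are illustrative.
TABLE_6 = {
    "Mita":  "LLLLL" "LLLLLL" "LLLL",
    "Yudi":  "SSSSS" "SLLSLL" "LLSS",
    "Doni":  "SSLLL" "SLLLLS" "SLLS",
    "Nurul": "LLLLL" "SLLLLS" "LLLL",
    "Adi":   "LSSLL" "LLLLLS" "SSSL",
    "Nana":  "LLSSS" "LLLSSS" "LSLL",
}


def count_longer(labels):
    """Number of word pairs labelled 'longer' for one informant."""
    return labels.count("L")


def is_consistent(labels):
    """True when every word pair carries the same label."""
    return len(set(labels)) == 1


longer_counts = {name: count_longer(labels) for name, labels in TABLE_6.items()}
consistent = [name for name, labels in TABLE_6.items() if is_consistent(labels)]

for name, n in longer_counts.items():
    print(f"{name}: {n} of {len(TABLE_6[name])} pairs longer")
print("Consistent informants:", consistent)
```

In this tally only Mita carries the same label across all 15 word pairs; every other informant mixes “longer” and “shorter”.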

B. Discussion

1. Parameters in the study

The acoustic characteristics of English plosives produced by the Javanese informants were described using the following parameters: VOT length, onset F1 frequency, and duration of the voiced sound occurring immediately before a final plosive. The first parameter, VOT length, plays a major role in the phonemic contrast between English voiced and voiceless plosives in word-initial position. In English, the phonemic contrast between word-initial plosives is realized as a difference in VOT length. When the VOT of an English plosive is long enough, the delay between the plosive release and the start of voicing is perceived as aspiration by an English hearer. Perceived aspiration signals to an English hearer that the word-initial plosive is voiceless. For a plosive to be perceived as aspirated, its VOT should be longer than 20 milliseconds. By contrast, a word-initial voiced plosive is indicated by a very short, almost zero, VOT. A VOT of less than 20 milliseconds is not perceived as aspiration by speakers of English as a first language. It can be inferred, therefore, that the VOT of an English voiced plosive in word-initial position is shorter than that of a voiceless one in the same environment. The phonemic contrast between word-initial Javanese plosives, on the other hand, does not rely solely on a comparison of VOT length. In fact, one realization of the contrast between lax and tense plosives is a reduced, instead of increased, VOT for the “voiceless” Javanese plosive. The VOT of a Javanese “voiced” plosive in word-initial position is longer than that of the “voiceless” one in the same environment. The so-called Javanese “voiceless” oral stops are termed tense plosives and their “voiced” counterparts are termed lax plosives in the study. The reason for the special naming is that all Javanese plosives are in fact voiceless in all positions except immediately after a nasal.

Because voicing is not distinctive, there must be one or more other features that contrast lax and tense plosives. As Brunelle (2010) argues, the phonemic contrast between lax and tense plosives is realized in two opposing registers, high versus low. Each register includes a group of acoustic features. Thus, the opposition between lax and tense plosives is realized in two opposing groups of features. Matisoff (1973) argues that the adoption of a register system by a non-tonal language such as Javanese is due to the high mobility of people in the region, which enabled language contact in the past.

Lax and tense plosives are identifiable by the features associated with them. The features that contrast lax and tense plosives include VOT length, reduction in formant energy and pitch frequency, and a change in voice quality. The VOT of lax plosives is longer than that of tense plosives. The F1 value of a vowel immediately following a lax plosive is lower than the F1 value of the same vowel immediately following a tense plosive. The F0 frequency at the onset of a vowel immediately following a lax plosive is lower than the F0 frequency at the onset of the same vowel immediately following a tense plosive. Also, lax plosives are produced with breathier voice quality while tense plosives are articulated with clearer voice quality. Except for the change in voice quality, these features (VOT length, reduction in formant energy, and decrease in pitch frequency) are easily seen on a waveform and spectrogram. That is the reason a change in voice quality is not the focus of the study. Difference in VOT is discernible by measuring its length, reduction in formant energy is observable by comparing F1 values, and decrease in pitch frequency is detectable by comparing F0 frequencies. Because Thurgood (2004) argues that F1 is a more consistent feature contrasting lax and tense plosives than F0, the latter is not the focus of the study.

Thus, the investigation into the characteristics of word-initial English plosives produced by the Javanese informants has relied on the measurement of VOT length and the comparison of F1 values. Understanding the characteristics of word-final plosives produced by the Javanese informants, however, requires a different outlook.

Wedhawati et al. (2010) have suggested that the opposition between lax and tense plosives is suspended in word-final position. Since the opposition is neutralized, the only oral stops occurring in word-final position are Javanese tense plosives. English, on the other hand, contrasts two series of plosives in word-final position. The contrast is realized in the duration of the voiced sound preceding the word-final plosive. A word-final English voiced plosive is recognized by the full duration of the voiced sound occurring immediately before the plosive itself. A word-final voiceless plosive, on the other hand, is marked by pre-fortis clipping, that is, a shortened voiced sound resulting from its position immediately before a voiceless plosive. An English hearer relies on the duration of the voiced sound occurring immediately before a final plosive to identify whether the plosive is voiced or voiceless. The fact that a contrast between two series of plosives in word-final position exists in English but not in Javanese may pose challenges to English speakers whose first language is Javanese. The challenge lies in how the informants in the study enunciate English plosives in word-final position.

2. Javanese informants

All informants had something in common: their Javanese mother tongue, residency in Yogyakarta and its neighboring areas, and a high TOEFL score. They, however, had varied exposure to and practice with English.

Mita, a female informant, was born and raised in Surakarta, a city 60 kilometers away from Yogyakarta. She had her primary and secondary education in Surakarta and attended a state university in the same city. Javanese is her mother tongue. She speaks the language to family members and friends and only uses Indonesian with someone who cannot speak Javanese. She describes her Indonesian as “styled” on the accent of Jakarta, the city where she has worked as a civil servant since graduating from university. She, however, reverts to “Javanese-accented Indonesian” when speaking Indonesian with someone who clearly has a Javanese accent.

Mita claimed that she had never taken a special English program, let alone a pronunciation training session, prior to attending the intensive English program at PPB UGM. She did not start to learn English formally until she was in junior high school, and she scored 590 in TOEFL at around the time the study was conducted. Although she rarely used English in everyday communication, she often participated in meetings in which English was the working language. Her position in the office required her to use English when meeting with representatives of NGOs or with other foreign visitors.

She watched TV series in English almost every day and English movies twice a month. She read online articles in English four to five times a week and enjoyed listening to English songs. She stated that singing English songs helped her improve her pronunciation.

Yudi, whose TOEFL score was remarkably high (603), was born in Yogyakarta and had his entire education in the city. Being a native speaker of Javanese, he spoke the language fluently and used it when conversing with family members and friends. He claimed that his friends considered his Indonesian and English highly ‘Javanese-accented’. He also admitted to having a strong Javanese accent. In fact, Yudi was purposefully selected to participate in the study because of his broad Javanese accent, the strongest among all informants, and his high TOEFL score.

Yudi did not learn English formally until he attended junior high school. He, however, never used English outside the class and never attended a special English program other than the English lessons he received as part of his formal education. He enjoyed listening to English songs and watching sports programs presented in English on TV, such as UFC (mixed martial arts) and football matches. When asked about his remarkable achievement in English as evidenced by his high TOEFL score, he attributed his vast vocabulary and advanced reading skills to his hobby as a teenager. He was a keen gamer in high school and devoted much of his time to gaming. He read a lot of game walkthroughs (careful explanations of the details of a game) to help him complete long and complicated games. Words used in the walkthroughs, he claimed, enriched his vocabulary considerably. In addition, the thorough explanations presented in the long sentences typical of game walkthroughs helped him improve his mastery of English grammar.

Doni scored 610 in TOEFL. He hailed from Banyuwangi, a city almost 570 km away from Yogyakarta. He had all of his education in Banyuwangi until he attended a university in Yogyakarta. A native speaker of Javanese, he spoke Javanese to school friends and family but occasionally spoke Indonesian with his children. He also resorted to Indonesian when talking to non-Javanese people he met at the university. He described his Indonesian as ‘Javanese-accented’.

His formal English lessons did not start until he attended junior high school, and he rarely used English either at home or at work. He claimed that his only training in English prior to the training at Pusat Bahasa UGM was the English lessons that formed part of his formal education. He had never attended a special training in English. He also stated that he read a lot of English texts. He read The Jakarta Post online almost every morning and he enjoyed sports news.

Nurul was born in Sukoharjo, an area 60 km from Yogyakarta. She resided in Bantul, a district of the Yogyakarta Special Region, at the time of the study. She spent her childhood and adolescence in Sukoharjo before she attended university in Yogyakarta. She spoke Javanese most of the time with friends and family members and rarely used English at work. She sometimes spoke Indonesian with people from work, with an accent typical of Javanese.

Her English lessons started when she was in junior high school as part of her formal education. She scored 630 in TOEFL and she explained that it was partly due to her love for the language. She claimed that she loved learning English. She read a lot and she enjoyed watching videos on YouTube. She watched TV series in English and read online news. She visited the Huffington Post regularly. Moreover, she received intensive training in English sometime in 2014.

Adi hailed from the same city as Mita, Surakarta. His TOEFL score at the time of the study was 597. He had primary and secondary education in Surakarta before continuing to a university in Semarang. He was born into a Javanese family and had spoken Javanese since birth. He used the language when conversing with friends and family members. With the exception of formal events, Javanese had been his primary language on most occasions.

He started to learn English as part of his formal education when he attended junior high school. He claimed that during that time he had special training in English in addition to his formal lessons in English. He stated that he listened to English songs every day and read online articles for work- or study-related references.

Nana achieved the highest TOEFL score, 640, among all informants. She was born in Magelang, a city 40 km to the north of Yogyakarta, and attended primary and secondary education there. She pursued higher education in Yogyakarta and obtained a master's degree in English. She had special training in pronunciation as part of her university education. She had, without doubt, the most intensive and extensive training in English among all informants.

She was born into a Javanese family and spoke Javanese since birth. She used the language when conversing with friends and family. When she was in junior and senior high school, she sometimes used Indonesian with school friends. She mainly used Indonesian when she had a conversation with people she met at the university.

She spoke Indonesian with a Javanese accent so reduced that, as she claimed, people would not believe that Javanese was her mother tongue. Since teaching English was her profession, she always used English at work. She also used English at home because there were usually foreign students staying at her house.

The accounts above indicate that the informants did not all share the same level of exposure to and practice with English. Differences in exposure and practice, as it turns out, have implications for the production of English plosives and the degree of intelligibility. Longer exposure to and practice with English benefited the informants, as more exposure and practice enhanced the intelligibility of the English words they produced.

3. The characteristics of English plosives produced by Javanese informants

To understand the characteristics of English plosives produced by the Javanese informants in the study, the discussion is divided into two parts. The first part discusses how Javanese phonological rules influence the production of word-initial English plosives, together with two parameters in the study, VOT length and F1 frequency. The second part considers the production of word-final English plosives by the informants, together with the last parameter in the study, the duration of the voiced sound preceding a word-final plosive.

a. English Plosive in Word-Initial Position

To reveal how Javanese phonological rules may affect the production of word-initial English plosives and what attempts the Javanese informants made to enunciate the difference between word-initial voiced and voiceless plosives, an understanding of how such rules work is pivotal. It is especially necessary to understand how speakers of Javanese as a first language distinguish between the two contrasting plosive groups, tense and lax.

The distinctive feature separating Javanese lax plosives from tense plosives is not a single feature. In fact, it is a group of features called a register. The low register comprises the features associated with Javanese lax plosives. Another group of features, the high register, is associated with tense plosives. Speakers of Javanese as a first language distinguish a word-initial lax plosive from a tense plosive based on its longer VOT, breathier voice quality, and how the immediately following voiced sound is produced. The succeeding voiced sound is produced with reduced pitch frequency and reduced formant energy. These reductions are observable from measurements of the succeeding voiced sound (or vowel). The F0 and F1 values of the succeeding vowel are lower when the vowel immediately follows a lax plosive than when it follows a tense plosive. Thus, the F0 and F1 values of a voiced sound occurring immediately after a lax bilabial, dental, retroflex, or velar plosive are lower than the F0 and F1 values of the same voiced sound occurring immediately after the corresponding tense plosive. Since the investigation by Thurgood (2004, p. 290) shows that reduction in formant energy is a more consistent marker of the contrast between lax and tense plosives than reduction in pitch, the study used the value of the first formant (F1) instead of F0 to show the influence of Javanese phonological rules on the production of English plosives.

1) Reduction in F1 frequency

While all informants reduced the F1 energy of a voiced sound following a voiced plosive and produced the same voiced sound following a voiceless plosive with a higher F1 frequency in nearly all observed word pairs, they, with the exception of Mita, sporadically did the opposite with a small group of tokens. As shown in Table 7 below, most informants variably produced /ʊ/, /ɪ/, /ɑ/, /ɑɪ/, and /u:/ following /b/ and /d/ in bull, bit, bark, do, die, and dune with a higher F1 frequency, while they produced the same voiced sounds /ʊ/, /ɪ/, /ɑ/, /ɑɪ/, and /u:/ succeeding /p/ and /t/ in pull, pit, park, two, tie, and tune with a lower F1 frequency.

Table 7 below describes the onset F1 frequency of the vowel following a word-initial plosive. The F1 frequency was taken at 10 milliseconds after the start of voicing. The label “reduced” means that the onset F1 frequency of the vowel occurring immediately after a word-initial voiced plosive is lower than that of the same vowel following the word-initial voiceless plosive. Likewise, the label “increased” signifies that the onset F1 frequency of the vowel immediately following a word-initial voiced plosive is higher than that of the same vowel occurring immediately after the word-initial voiceless plosive.

Yudi and Adi produced /ʊ/ in bull with a higher F1 frequency than in pull. Although Yudi produced voiced sounds immediately succeeding a voiced plosive with a reduced frequency in most tokens, exceptions existed in three word pairs. In addition to /ʊ/ in bull /bʊl/, Yudi also produced /ɑ/ in bark /bɑ:rk/ and /oɑ/ in goal /goɑl/ with an increased frequency, as demonstrated by the raised F1 frequency.

Table 7. Description of F1 frequency of vowel following voiced plosive as compared to F1 of the same vowel following the corresponding voiceless plosive

Word Pairs      Mita       Yudi       Doni       Nurul      Adi        Nana
peak - beak     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
pull - bull     Reduced    Increased  Reduced    Reduced    Increased  Reduced
pit - bit       Reduced    Reduced    Increased  Reduced    Reduced    Reduced
pack - back     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
port - bought   Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
park - bark     Reduced    Increased  Reduced    Reduced    Reduced    Reduced
two - do        Reduced    Reduced    Increased  Increased  Reduced    Reduced
torn - dawn     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
ten - den       Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
tie - die       Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
tone - done     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
town - down     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
tune - dune     Reduced    Reduced    Increased  Reduced    Increased  Increased
cave - gave     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
card - guard    Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
curl - girl     Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
could - good    Reduced    Reduced    Reduced    Reduced    Increased  Increased
cap - gap       Reduced    Reduced    Reduced    Reduced    Reduced    Reduced
coal - goal     Reduced    Increased  Reduced    Reduced    Reduced    Reduced

As previously stated, Adi also produced /ʊ/ in bull with an increased F1 frequency. Further, he articulated two more tokens in which the vowel was produced with a raised F1 frequency when following a voiced plosive. Adi produced /ʊ/ in good as well as /u:/ in dune with an increased frequency and /ʊ/ in could and /u:/ in tune with a reduced frequency. In total, Adi raised the F1 frequency of the vowel following a voiced plosive and reduced the F1 frequency of the same vowel following a voiceless plosive in three word pairs. In most word pairs, Adi produced the vowel following a voiced plosive with a reduction in F1 energy.

Similar to Adi, Doni also articulated the vowel in dune with F1 frequency higher than in tune. Again, similar to other male informants in the study, he articulated the vowel following a voiced plosive with an increased F1 frequency and a voiceless plosive with a reduced F1 frequency in three word pairs only. Doni produced /u:/ in both dune and do and /ɪ/ in bit with a higher F1 frequency than in tune, two, and pit. In the rest of the tokens observed in the study, Doni produced vowel with a reduced F1 frequency when the vowel was subsequent to a voiced plosive and an increased F1 energy when following a voiceless plosive.

All male informants (Yudi, Doni, and Adi) reduced the F1 frequency of the vowel following a voiced plosive and increased the F1 frequency following a voiceless plosive except in a small number of word pairs. Each male informant made an increase, instead of a reduction, in the F1 frequency of the vowel following a voiced plosive in three word pairs only. The male informants did not share the exact same word pairs in which the post-voiced-plosive vowel received an increase in F1 frequency, except for /ʊ/ in bull and /u:/ in dune. These two were the only instances in which two male informants shared a vowel receiving an increase in F1 frequency.

All female informants (Mita, Nurul, and Nana), on the other hand, increased the F1 frequency of a vowel following a voiced plosive less often. In other words, female informants in the study reduced the F1 value of the vowel subsequent to a voiced plosive more often than male informants. Mita always, without fail, reduced the F1 value of the voiced sound immediately following a voiced plosive. While Mita reduced the F1 frequency in 100% of tokens, Nurul increased, instead of reduced, the F1 value of the vowel following a voiced plosive in one token only. She produced /u:/ in do with an F1 value higher than in two. She, however, produced the vowel succeeding a voiced plosive with a reduction in F1 frequency in the rest of the tokens.

Nana, similar to other informants, tended to reduce F1 frequency of the vowel subsequent to a voiced plosive. Yet, she made an increase in F1 frequency of a vowel following a voiced plosive in two tokens. Nana produced /ʊ/ in good and /u:/ in dune with an increased frequency and /ʊ/ in could and /u:/ in tune with a reduced frequency.

Female informants did not share among themselves a token in which there was an increase in the F1 frequency of a vowel succeeding a voiced plosive. However, when they did share a word pair, they shared it with male informants. Both Nurul and Doni increased the F1 frequency of /u:/ in do and reduced the F1 frequency of the same vowel in two. Nana, as well as Adi, increased the F1 frequency of /ʊ/ in good. Nana also made an increase in the F1 frequency of /u:/ in dune, which was also done by Doni and Adi.

With very few exceptions, all Javanese informants tended to reduce the F1 frequency of a vowel immediately following /b/, /d/, or /g/ and increase the F1 frequency of a vowel succeeding /p/, /t/, or /k/. Such a tendency is consistent with the realization of the contrast between the two opposing Javanese plosive series, lax and tense. In Javanese, the vowel occurring immediately after /b/, /d̪/, /ɖ/, and /g/ is produced with a reduced F1 frequency and after /p/, /t̪/, /ʈ/, and /k/ with an increased F1 frequency.

Exceptions existed, though. There were tokens in which two or more informants produced certain vowels immediately succeeding /b/, /d/, and /g/ in an unusual fashion; they increased, instead of reduced, the F1 frequency of particular vowels following a voiced plosive. This was especially true of /ʊ/ and /u:/ immediately following /b/, /d/, and /g/. The vowel /ʊ/ occurring immediately after /b/ in bull and after /g/ in good was produced with an increased F1 frequency by two informants. The vowel /u:/ immediately subsequent to /d/ in do was produced with an increased F1 frequency by two informants and /u:/ in dune by three informants.

These particular exceptions are consistent with findings in Javanese reported by Thurgood (2004, p. 289). In her study, Thurgood confirms that the Javanese vowel /u/ occurring immediately after a lax plosive (/b/, /d̪/, /ɖ/, or /g/) is produced with increased, instead of reduced, F1 and F2 frequencies. On the other hand, the same Javanese vowel /u/ succeeding a tense plosive is produced with reduced F1 and F2 frequencies. Moreover, Thurgood reports that the peculiarity of /u/ in Javanese is found in both male and female informants. In our study, an increased, instead of reduced, F1 frequency of /ʊ/ and /u:/ immediately after /b/, /d/, or /g/ was likewise found in both male and female informants: in tokens produced by all male informants (Yudi, Doni, and Adi) and by two female informants (Nurul and Nana). Again, all informants in the study in general tended to reduce the F1 frequency of the vowel following voiced plosives.

Figure 7 below displays the comparison of F1 frequency reduction in all tokens observed in the study. The red bars indicate the percentages of F1 frequency reduction of the vowel following voiced plosives. The purple bars illustrate the percentages of F1 frequency increase of the vowel following voiced plosives. The stacked bar chart in Figure 7 shows that both male and female informants reduced the F1 frequency of the vowel immediately succeeding /b/, /d/, and /g/ in most tokens.

[Stacked bar chart. Vertical axis: percentage of tokens, 0% to 100%. Horizontal axis: informants (Mita, Yudi, Doni, Nurul, Adi, Nana). Series: Increased F1, Reduced F1.]

Figure 7. Comparison of formant frequency reduction

The figure also shows that female informants reduced the F1 frequency of the vowel immediately succeeding a voiced plosive more often than their male counterparts. Mita produced the vowel following a voiced plosive with a reduction in F1 frequency in all tokens (100%). Nurul and Nana reduced the F1 value of the vowel immediately subsequent to a voiced plosive in 18 tokens (94.7%) and 17 tokens (89.5%), respectively.

Male informants, on the other hand, made a reduction in F1 frequency less frequently than female informants. Each male informant reduced the F1 value in 16 tokens (84.2%). Each male informant made an increase in the F1 value of the vowel following a voiced plosive at the same rate, in three of the 19 observed tokens. Thus, female informants in the study were more likely than male informants to articulate the vowel occurring immediately after a voiced plosive with a reduction in F1 value.
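The percentages reported above follow directly from the counts in Table 7. As a check on the arithmetic, the short Python sketch below recomputes them; it is an illustration rather than the procedure used in the study, and the names are assumptions.

```python
TOTAL_PAIRS = 19  # number of word pairs listed in Table 7

# Number of word pairs marked "Increased" per informant, counted from Table 7.
INCREASED = {"Mita": 0, "Yudi": 3, "Doni": 3, "Nurul": 1, "Adi": 3, "Nana": 2}


def reduction_rate(increased, total=TOTAL_PAIRS):
    """Percentage of pairs with reduced F1 after the voiced plosive (one decimal)."""
    return round(100 * (total - increased) / total, 1)


rates = {name: reduction_rate(n) for name, n in INCREASED.items()}

for name, rate in rates.items():
    print(f"{name}: {TOTAL_PAIRS - INCREASED[name]} of {TOTAL_PAIRS} pairs reduced ({rate}%)")
```

The computed rates match the figures given in the text: 100% for Mita, 94.7% for Nurul, 89.5% for Nana, and 84.2% for each male informant.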


2) Perceived aspiration

As previously described, the feature that separates Javanese tense from lax plosives in word-initial position is the high register. This register includes clearer voice quality, raised formant energy evident in the higher F1 value at the onset of the post-plosive vowel, and a shorter VOT for tense plosives. More can be said about the latter. Javanese phonological rules prescribe that word-initial tense plosives /p/, /t̪/, /ʈ/, and /k/ should have a shorter VOT than /b/, /d̪/, /ɖ/, and /g/ in the same position. This particular setting works in direct contradiction to English aspiration, a feature of initially positioned voiceless plosives in a stressed syllable. Aspiration in English requires that voiceless plosives /p/, /t/, and /k/ in initial position within a stressed syllable have a longer VOT than /b/, /d/, and /g/ in the same position. The VOT of English voiceless plosives /p/, /t/, and /k/ in word-initial position should be more than 20 milliseconds, while the VOT of English voiced /b/, /d/, and /g/ in the same position should be shorter, almost zero.

To investigate the influence of Javanese phonological rules, particularly how the register system governing the production of Javanese plosives influences the production of English plosives by Javanese speakers, the VOT lengths of the plosives that the informants produced should be compared. A longer VOT for a word-initial voiceless plosive than for the word-initial voiced plosive in the same pair reflects observance of aspiration in English, while a shorter VOT for a word-initial voiceless plosive than for the word-initial voiced plosive in the same pair indicates influence from the Javanese register system.
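The comparison described above can be expressed as a simple rule applied to each word pair. The Python sketch below is only an illustration of that rule; the function name and the labels “English-like”, “Javanese-like”, and “no difference” are assumptions introduced here, and the VOT values are the bilabial tokens of two informants as listed in Table 8.

```python
def classify_pair(vot_voiceless_ms, vot_voiced_ms):
    """Compare the VOT of the voiceless and voiced plosive in one word pair."""
    if vot_voiceless_ms > vot_voiced_ms:
        return "English-like"    # voiceless plosive has the longer VOT
    if vot_voiceless_ms < vot_voiced_ms:
        return "Javanese-like"   # voiceless plosive has the shorter VOT
    return "no difference"       # equal VOT: the rule gives no indication


# (voiceless VOT, voiced VOT) in milliseconds, bilabial pairs from Table 8.
BILABIAL_VOT = {
    "Mita": {"peak-beak": (11, 8), "pull-bull": (15, -87), "pit-bit": (39, 11),
             "pack-back": (11, -44), "port-bought": (66, 13), "park-bark": (41, -63)},
    "Yudi": {"peak-beak": (9, 14), "pull-bull": (5, 22), "pit-bit": (12, 16),
             "pack-back": (14, 18), "port-bought": (13, 11), "park-bark": (19, 19)},
}

patterns = {
    name: {pair: classify_pair(*vots) for pair, vots in pairs.items()}
    for name, pairs in BILABIAL_VOT.items()
}

for name, result in patterns.items():
    print(name, result)
```

The rule compares only the direction of the difference; it does not say whether the difference is large enough to be perceived, which is a separate question.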

Table 8 below compares the VOT length of word-initial voiceless and voiced plosives investigated in the study. Because a word-initial English voiceless plosive in a stressed syllable is aspirated, speakers of English as a first language produce such a plosive with a VOT longer than that of a voiced plosive. Data taken from the informants, however, showed mixed results.

Table 8. Comparison of VOT length of word-initial voiceless-voiced plosives*

Word Pairs         Mita      Yudi      Doni      Nurul      Adi       Nana
                  /p/  /b/  /p/  /b/  /p/  /b/  /p/  /b/  /p/  /b/  /p/  /b/
peak - beak        11    8    9   14   11   16   13   24   12   13    6    8
pull - bull        15  -87    5   22   25   22   25   40   13   32   28   31
pit - bit          39   11   12   16   10   21   16   23   10   13    0  -24
pack - back        11  -44   14   18   12   12   10   14   11   14   10    0
port - bought      66   13   13   11   13   13   13  -14   17   17   22    9
park - bark        41  -63   19   19   15   18   30  -56   16   19    7    6
                  /t/  /d/  /t/  /d/  /t/  /d/  /t/  /d/  /t/  /d/  /t/  /d/
two - do           83   17   14   17   63   34   85   25   10   18   12   17
torn - dawn        39   12    9   13   61   17   56  -83   22   15   14   12
ten - den          21   14   38   13   11   23   15   13   12   18   16    9
tie - die          42   14   15   19   15   12   95   14   20   17    8    7
tone - done        52   12   19   21   14   18   65    9   11   14    4    7
town - down        40   12   16   21   41   13   56   94   11   16    9   11
tune - dune        69   23   19   23   87   16   93   23   88   18   86   19
                  /k/  /g/  /k/  /g/  /k/  /g/  /k/  /g/  /k/  /g/  /k/  /g/
cave - gave        56   28   15   32   49   66   95   39   30   49   29   31
card - guard       69   35   29   18   66   50    0   44   16   30   71   36
curl - girl        63   33   27   59   50   31   71   59   22   25   65   52
could - good       67   37   19   35   80   44   42   49   24   41   47   32
cap - gap          58   19   15   25   48   41   28   30   28   31   44   22
coal - goal        72   30   31   53   72   52   88   45   30   26   80   45
*in milliseconds

Some informants were marked by their inconsistency. They produced word-initial voiceless plosives with a VOT shorter than that of voiced plosives in certain word pairs (which differed from informant to informant) and longer in other pairs. Other informants, however, exhibited an inclination to consistently prolong the VOT of word-initial voiceless plosives and shorten the VOT of word-initial voiced plosives. As a result, their VOT of word-initial voiceless plosives was generally longer than that of word-initial voiced plosives. By contrast, there were those who tended to do the opposite: curtailing the VOT of word-initial voiceless plosives and extending the VOT of word-initial voiced plosives. As a result, their VOT of word-initial voiceless plosives was generally shorter than that of word-initial voiced plosives.

Mita was an informant showing high consistency in the production of word-initial English plosives. In fact, she was the only informant in the study who produced all word-initial /p/, /t/, and /k/ with a VOT longer than that of word-initial /b/, /d/, and /g/. Yet, the difference in VOT length among different pairs of tokens varied widely. There were word pairs spoken by Mita that were unlikely to be perceived as different words because the difference in VOT between the voiced and the voiceless plosive was too small to be significant. Mita’s VOT of /p/ in peak was +11 milliseconds while the VOT of /b/ in beak was +8 milliseconds. The difference between the two VOTs was negligible because it was only 3 milliseconds. Such an insignificant difference in VOT length between peak and beak produced by Mita may lead an English hearer to perceive the two words as the same. In another case, Mita produced ten with a VOT of 21 milliseconds, a value that can be said to barely lend aspiration to the word-initial voiceless alveolar /t/. However, the VOT of the corresponding voiced alveolar /d/ in den (14 milliseconds) was only 7 milliseconds shorter, which may add difficulty in perceiving Mita’s ten and den as two distinct words.

With the exception of three voiceless bilabial tokens, Mita produced most of her word-initial voiceless bilabials and all of her word-initial voiceless alveolars and velars with VOT values above 20 milliseconds. The VOT values of most of Mita’s word-initial voiceless bilabials were between 39 and 66 milliseconds, all of her word-initial voiceless alveolars between 21 and 83 milliseconds, and all of her voiceless velars between 56 and 72 milliseconds (table 8 above). In other words, she managed to impart aspiration to most of the word-initial /p/ and all of the word-initial /t/ and /k/ observed in the study.

There were three tokens produced by Mita in which the VOT of the word-initial voiceless bilabial was lower than 20 milliseconds. Her articulation of peak (VOT 11 milliseconds), pull (VOT 15 milliseconds), and pack (VOT 11 milliseconds) was unlikely to be perceived as aspirated since the VOT values of the voiceless bilabial in these tokens were shorter than 20 milliseconds.

While the VOT values of Mita’s word-initial voiced and voiceless velars were in fact all long (most above 20 milliseconds), the VOT values of her word-initial voiceless velars were much longer than those of her voiced velars. However, articulated in isolation, Mita’s word-initial voiced velars could have been identified as slightly aspirated since their VOT values were between 19 and 37 milliseconds.

Similar to Mita, Yudi was in general consistent in producing his word-initial plosives. His consistency, however, did not lie in producing word-initial voiceless plosives with a VOT longer than that of voiced plosives. On the contrary, he almost always produced word-initial /p/, /t/, and /k/ with a VOT shorter than that of word-initial /b/, /d/, and /g/. The VOT values of Yudi’s word-initial /p/ in all tokens were between 5 and 19 milliseconds and those of /b/ between 11 and 22 milliseconds. With the exception of the pair ten – den, the VOT values of Yudi’s word-initial /t/ were between 9 and 19 milliseconds and those of /d/ between 13 and 23 milliseconds. The VOT values of Yudi’s word-initial /k/ in all tokens were between 15 and 31 milliseconds and those of /g/ between 18 and 59 milliseconds (table 8 above).

This is in direct contradiction to the realization of contrast between word-initial English voiced and voiceless plosives. In English, the VOT of an initially-positioned voiceless plosive in a stressed syllable should be at least 20 milliseconds and should be longer than the VOT of its voiced counterpart to enable a hearer to recognize the plosive as voiceless. The VOTs of Yudi’s voiceless plosives were shorter than those of their voiced counterparts. By comparison, the relatively longer VOT of his intended-to-be voiced plosive rendered the plosive prone to misidentification as the voiceless one. Moreover, Yudi articulated most of the word-initial plosives that were meant to be voiceless with a VOT of less than 20 milliseconds. Except for his production of ten, card, curl, and coal, Yudi’s initially-positioned “voiceless” plosives were likely to be perceived as voiced because the VOT was too short to be recognized as aspiration. As a matter of fact, there were certain word pairs in which the VOT of the word-initial “voiced” plosive was not only longer than that of the “voiceless” counterpart but also exceeded 20 milliseconds. As a result, his word-initial “voiced” plosives in certain tokens could have been misidentified as voiceless because of their longer VOT and, more importantly, because the VOT was long enough to be perceived as aspiration.

Yudi produced bull with a VOT of 22 milliseconds while he articulated pull with a VOT of merely 5 milliseconds. The word bull could have been mistakenly recognized as pull because its VOT was long enough to be perceived as the aspiration accompanying an initially-positioned voiceless bilabial. At the same time, the VOT of pull was too short to be perceived as an aspirated /p/. In the same way, the VOT of the word-initial plosive in Yudi’s down, which was 21 milliseconds, was long enough to be perceived as aspiration of /t/. Yudi’s town, on the other hand, was produced with a VOT unlikely to be perceived as aspiration, 16 milliseconds.
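The reasoning behind these predicted confusions can be sketched as a simple threshold rule. The snippet below is a deliberately simplified illustration, assuming that a word-initial plosive is heard as voiceless whenever its VOT reaches the 20-millisecond aspiration threshold cited above; actual perception depends on more cues than VOT alone, and the function name likely_percept is hypothetical.

```python
# Simplifying assumption: VOT alone decides the percept, with the
# 20 ms aspiration threshold used in this section as the boundary.
ASPIRATION_THRESHOLD_MS = 20


def likely_percept(vot_ms):
    """Return the voicing category an English hearer might assign."""
    return "voiceless" if vot_ms >= ASPIRATION_THRESHOLD_MS else "voiced"


# Yudi's VOT values (ms) for the four tokens discussed, from table 8.
yudi_tokens = {"pull": 5, "bull": 22, "town": 16, "down": 21}

for word, vot in yudi_tokens.items():
    print(f"{word}: VOT {vot} ms, initial plosive likely heard as {likely_percept(vot)}")
```

Under this rule, Yudi’s bull and down fall on the voiceless side of the threshold while his pull and town fall on the voiced side, which is the reversal described above.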

There were tokens indicating that to a certain degree Yudi produced otherwise contrasting groups of plosives as similar. The VOT of Yudi’s pit was 12 milliseconds and that of his bit was 16 milliseconds. Apart from the fact that pit was produced without perceivable aspiration, his pit and bit could have been perceived as the same word since the difference between the VOTs was merely 4 milliseconds. Yudi also produced pack (VOT 14 milliseconds) and back (VOT 18 milliseconds) with a difference in VOTs of 4 milliseconds. Such insignificant differences in VOT length existed in most of Yudi’s productions of word-initial bilabials and alveolars. In fact, Yudi articulated park and bark with exactly the same VOT length, 19 milliseconds.

Yudi, however, produced word-initial velars with a significant gap in VOT length between word-initial /k/ (VOT values between 15 and 31 milliseconds) and /g/ (VOT values between 18 and 59 milliseconds). Yet the VOT of /g/ was longer than that of /k/ in most tokens observed in the study.

Yudi’s tendency to lengthen the VOT of word-initial /b/, /d/, and /g/ so that their VOTs became longer than those of word-initial /p/, /t/, and /k/ is in agreement with the realization of contrast between Javanese tense and lax plosives. In Javanese, the VOTs of lax /b/, /d̪/, /ɖ/, and /g/ are longer than those of tense /p/, /t̪/, /ʈ/, and /k/. In addition, his pronunciation of two-syllable English words was marked by equal stress on each syllable (while the words in question were not the focus of the study, they were among the words presented to informants to read). He also consistently realized /r/ as a trill, and his pronunciation of the pair coal [koʊl] and goal [goʊl] was [koɑl] and [goɑl].

He, however, managed to accentuate the contrast between ten and den in a fashion that agrees with English pronunciation patterns. The VOT of his voiceless alveolar /t/ in ten was 38 milliseconds, which could be perceived as aspiration by an English hearer. The rest of the informants, on the other hand, produced word-initial /t/ in ten with a VOT of about 20 milliseconds or less; their voiceless alveolar in ten could have been perceived as unaspirated. Yudi also produced /d/ in den with a VOT of less than 20 milliseconds. His VOT of the voiced alveolar in den was 13 milliseconds to be exact, which makes the plosive perceivable as voiced.

Thus, Yudi outperformed the other informants in the production of ten and den. He knew how to accentuate the difference between the sounds of the two words. This might relate to the extent of his familiarity with the words. The word ten is everyday vocabulary related to number, and den is a familiar word for fans of role-playing games, the kind of games to which Yudi has had ample exposure.

Unlike Yudi, Doni produced fewer word-initial voiceless plosives with shorter VOTs. Aside from Doni’s peak (VOT 11 milliseconds), pit (VOT 10 milliseconds), pack (VOT 12 milliseconds), port (VOT 13 milliseconds), park (VOT 15 milliseconds), ten (VOT 11 milliseconds), tone (VOT 14 milliseconds), and cave (VOT 49 milliseconds), which were articulated with VOTs of the word-initial plosive no longer than those of the word-initial plosive in beak (VOT 16 milliseconds), bit (VOT 21 milliseconds), back (VOT 12 milliseconds), bought (VOT 13 milliseconds), bark (VOT 18 milliseconds), den (VOT 23 milliseconds), done (VOT 18 milliseconds), and gave (VOT 66 milliseconds), he produced more than half of the tokens observed in the study with the VOT of word-initial voiceless plosives longer than that of word-initial voiced plosives (table 8 above).

He produced the word-initial plosives in pack and back with the same VOT length, 12 milliseconds. Because the VOT value in both tokens was under 20 milliseconds, it may lead to the acoustic effect of an unaspirated plosive. As a consequence, an English hearer may perceive both utterances as the same word with a voiced bilabial in initial position. Doni’s production of port and bought was a similar case. He produced the word-initial plosives in both tokens with a VOT value of 13 milliseconds. An English hearer may find it difficult to recognize his port as a different word from bought because both word-initial bilabials were perceivable as voiced, with exactly the same VOT and the same effect on the surrounding sounds.

In an almost similar case, Doni articulated pull with a VOT value of 25 milliseconds. The VOT was long enough to be perceivable as aspiration of a word-initial voiceless plosive. However, the VOT length of the word-initial bilabial in bull was 22 milliseconds, which was also a value above 20 milliseconds. With a gap between the two VOTs of only 3 milliseconds, the very slim difference in VOT length may lead an English hearer to recognize both pull and bull as the same utterance.

Doni’s production of tie and die was another example of this situation. The VOT of the “voiceless” alveolar in tie was only 15 milliseconds while that in die was 12 milliseconds. Both VOTs were shorter than 20 milliseconds and the gap between them was only 3 milliseconds. Thus, Doni’s tie and die could be misidentified as the same utterance.

While Doni was inclined to produce more than half of the observed tokens with the VOT of the word-initial voiceless plosive longer than that of the word-initial voiced plosive, he articulated /g/ in all tokens with a VOT long enough to be perceived as an aspirated velar (between 31 and 66 milliseconds). Except for the VOT of /g/ in gave, all instances of /g/ were produced with VOTs shorter than those of the corresponding /k/, but these VOTs might still signal aspiration, since Doni produced word-initial /g/ in each token with a noticeably long VOT. The VOTs of Doni’s cave, card, curl, could, cap, and coal were 49 milliseconds, 66 milliseconds, 50 milliseconds, 80 milliseconds, 48 milliseconds, and 72 milliseconds, respectively. The VOT values of Doni’s word-initial voiceless velars were long enough to be perceivable as aspiration of /k/. However, since the VOTs of Doni’s gave (VOT 66 milliseconds), guard (VOT 50 milliseconds), girl (VOT 31 milliseconds), good (VOT 44 milliseconds), gap (VOT 41 milliseconds), and goal (VOT 52 milliseconds) were also long (table 8 above), an English hearer may perceive the long VOTs of Doni’s word-initial /g/ as aspiration as well. As a result, an English hearer may recognize Doni’s cave and gave, or any other pair with a velar in initial position, as both aspirated, except that one of them was produced with a longer delay between the release of the plosive and the voicing of the vowel.

Doni’s case concerning word-initial velars was not unique. Other informants also tended to produce their word-initial voiced velars with long VOTs, ranging from slightly under 20 milliseconds to about 60 milliseconds. Doni, however, produced word-initial /g/ with the longest VOT: in one of the observed tokens, his VOT of /g/ reached 66 milliseconds. As a comparison, the VOT of English aspirated /k/ as produced by its native speakers, while it is certainly longer than that of unaspirated /g/, spans between 50 and 60 milliseconds (Ladefoged 2001, p. 120). Thus, Doni’s long VOT in gave may be considered exceptionally long by an English hearer.

Similar to Doni, Nurul made the VOT of word-initial voiceless plosives longer than that of word-initial voiced plosives in more than half of all tokens observed in the study. In less than half of all tokens, Nurul produced word-initial voiceless plosives with shorter VOTs. Her word-initial plosives in peak (VOT 13 milliseconds), pull (VOT 25 milliseconds), pit (VOT 16 milliseconds), pack (VOT 10 milliseconds), town (VOT 56 milliseconds), card (VOT 0 milliseconds), could (VOT 42 milliseconds), and cap (VOT 28 milliseconds) were articulated with VOT values shorter than those in beak (VOT 24 milliseconds), bull (VOT 40 milliseconds), bit (VOT 23 milliseconds), back (VOT 14 milliseconds), down (VOT 94 milliseconds), guard (VOT 44 milliseconds), good (VOT 49 milliseconds), and gap (VOT 30 milliseconds).

There were pairs in which the difference in VOT length was too small to be noticeable. The VOT of /p/ in pack was 10 milliseconds, which was hardly perceivable as aspiration, while the VOT of /b/ in back was 14 milliseconds. The gap in VOTs was only 4 milliseconds, making the difference between the contrasting plosives in initial position in pack and back negligible. Similarly, the word-initial plosive in Nurul’s could was produced with a VOT of 42 milliseconds. This VOT indeed marked the plosive occurring word-initially as aspirated; therefore it was perceivable as a voiceless velar. However, the word-initial velar in good was articulated with a VOT of 49 milliseconds, which was also long enough to be recognized as aspiration. In addition, the gap between the two VOTs was only 7 milliseconds. With such a slim gap, the word-initial velars in Nurul’s could and good were not only perceivable as both aspirated, but the words themselves were also potentially perceivable as the same. Nurul’s pair of card and guard was a more extreme case of a longer VOT of the “voiced” plosive. The VOT of /k/ in card was zero, making the velar perceivable as voiced. Meanwhile, the VOT of /g/ in guard was 44 milliseconds, which was recognizable as the aspiration of a voiceless velar. As a result, an English hearer might misidentify Nurul’s card as guard and vice versa.

She, however, managed to make the VOT of her word-initial voiceless plosives longer in the rest of the observed tokens. Among the word pairs in which voiceless plosives were produced with a longer VOT, the VOTs of Nurul’s port and ten were under 20 milliseconds. Such short VOTs might make it harder for an English hearer to perceive /p/ in port and /t/ in ten as aspirated. In fact, Nurul’s den was articulated with a VOT only 2 milliseconds shorter than that of her ten. As a result, an English hearer might misidentify Nurul’s ten and den as the same word.

Similar to Nurul’s, Adi’s production of ten and den was marked by a very thin difference in VOT length. Adi articulated /t/ in ten with a VOT of 12 milliseconds and /d/ in den with a VOT 6 milliseconds longer (see table 8 above). A longer VOT of the word-initial “voiced” alveolar than that of the word-initial “voiceless” alveolar, as in Adi’s den and ten, was not rare among the tokens produced by Adi. As a matter of fact, Adi produced most of the tokens observed in the study with the VOT of word-initial voiceless plosives shorter than that of word-initial voiced plosives.

Among the word pairs in which Adi managed to produce the initially-positioned voiceless plosive with a VOT longer than that of the word-initial voiced plosive were tie – die and coal – goal. The VOT of Adi’s tie was 20 milliseconds, which was barely perceivable as aspiration. The VOT of die, on the other hand, was 17 milliseconds, only 3 milliseconds shorter. The small difference in VOT length might lead an English hearer to recognize Adi’s tie and die as the same word. Similarly, Adi articulated /k/ in coal with a VOT of 30 milliseconds. While this VOT may be perceivable as aspiration of /k/, the VOT of /g/ in goal was 26 milliseconds, only 4 milliseconds shorter. Again, the small gap between the two VOTs might add to the difficulty of recognizing coal and goal as distinct words. Adi’s port and bought were a slightly different case. The VOTs of /p/ in port and /b/ in bought were both 17 milliseconds. This value might not only hamper perceived aspiration but also lead an English hearer to perceive both as the same word. Adi’s tendency to make the VOT of /p/, /t/, and /k/ in initial position shorter than that of word-initial /b/, /d/, and /g/ was consistent with the phonological rules of Javanese concerning the realization of contrast between tense and lax plosives. In Javanese, the VOT of word-initial /p/, /t̪/, /ʈ/, and /k/ is shorter than that of word-initial /b/, /d̪/, /ɖ/, and /g/ in the same environment.

Different from Adi, Nana was more consistent in her production of initially-positioned plosives. In most word pairs she produced the word-initial voiceless plosive with a VOT longer than that of the voiced plosive. Yet she occasionally produced the voiceless plosive in initial position with a shorter VOT. She articulated peak (VOT 6 milliseconds), pull (VOT 28 milliseconds), two (VOT 12 milliseconds), tone (VOT 4 milliseconds), town (VOT 9 milliseconds), and cave (VOT 29 milliseconds) with VOTs of the plosive in word-initial position shorter than those of the word-initial plosive in beak (VOT 8 milliseconds), bull (VOT 31 milliseconds), do (VOT 17 milliseconds), done (VOT 7 milliseconds), down (VOT 11 milliseconds), and gave (VOT 31 milliseconds). The VOTs of /p/ in peak and pull, /t/ in two, tone, and town, and /k/ in cave were all at most 5 milliseconds shorter than those of /b/ in beak and bull, /d/ in do, done, and down, and /g/ in gave. In these word pairs, both voiced and voiceless plosives in word-initial position were produced with almost the same VOT length, which might make it more challenging to recognize each word in the said pairs.

Among Nana’s productions of word pairs in which the VOT of the voiceless plosive was longer than that of the voiced plosive, there were tokens in which the VOT of the “voiceless” plosive in initial position was not long enough to be perceivable as aspiration. Nana produced /p/ in pack with a VOT of only 10 milliseconds. This VOT was longer than that of /b/ in back, which was zero, but it was still too short to be perceivable as an aspirated /p/. In other cases, the VOT of an initially-positioned “voiceless” plosive was not only too short to be recognizable as aspiration but also only slightly longer than that of the voiced plosive.

The VOT of /p/ in park was 7 milliseconds while that of /b/ in bark was 6 milliseconds. The almost identical VOTs might lead an English hearer to recognize both words as the same. Likewise, the VOTs of /t/ in torn, ten, and tie were 14 milliseconds, 16 milliseconds, and 8 milliseconds, respectively. These VOT values were insufficient to render perceived aspiration. Meanwhile, the VOTs of /d/ in dawn, den, and die, in the same order, were 12 milliseconds, 9 milliseconds, and 7 milliseconds. This shows that the VOT values of /t/ were only slightly larger than those of /d/. An English hearer, therefore, might find the difference in VOT values negligible. In other words, the short VOTs of /t/ and the only slightly shorter VOTs of /d/ might lead an English hearer to recognize all word-initial plosives in the above words as voiced alveolars.

In another case, the difference between the word-initial voiced and voiceless plosives was more straightforward. The word-initial bilabial in port was perceivable as an aspirated voiceless /p/ with a VOT of 22 milliseconds. On the other hand, the word-initial plosive in bought was recognizable as a voiced bilabial because the plosive was produced with a short VOT, only 9 milliseconds. Similarly, the VOT of /t/ in Nana’s tune was noticeably long, 86 milliseconds. With such a long VOT, an English hearer might readily recognize the word-initial plosive as a voiceless alveolar. The VOT of Nana’s /d/ in dune, on the other hand, was less than 20 milliseconds, which might help an English hearer to perceive it as a voiced alveolar.

To determine which tokens produced by the informants mirror the expected contrast between English voiced and voiceless plosives in initial position, clear criteria need to be set. The criteria should filter out tokens that might not agree with what an English hearer expects from word-initial plosives. In this section, two criteria have been set to select word pairs produced by the informants in the study. First, the VOT of the voiceless plosive in word-initial position should be longer than that of the word-initial voiced plosive. Second, the minimum VOT value of the word-initial voiceless plosive should be 20 milliseconds to render perceivable aspiration.

Table 9 below shows the percentages of selected word pairs – of all pairs with the same type of plosive – which might agree with what an English hearer would expect from English plosives in word-initial position.

Table 9. Word pairs reflecting expected realization of contrast between voiced-voiceless plosive in word-initial position*

Word-initial plosives    Mita     Yudi     Doni    Nurul      Adi     Nana
Bilabial                   50        0    16.67    16.67        0    16.67
Alveolar                  100    14.29    57.14    71.43    42.86    14.29
Velar                     100    16.67    83.33       50    16.67    83.33
*in per cent
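The selection procedure can be sketched in Python as follows. This is a minimal illustration rather than the procedure actually used in the study: it assumes the VOT values are transcribed from table 8 as (voiceless, voiced) pairs and reads the second criterion inclusively (a VOT of exactly 20 milliseconds counts as aspirated). The names TABLE_8, PLACES, meets_criteria, and percentage_meeting are illustrative.

```python
# VOT values in milliseconds from table 8 as (voiceless, voiced) pairs,
# in table order: 6 bilabial, 7 alveolar, and 6 velar word pairs.
TABLE_8 = {
    "Mita": [(11, 8), (15, -87), (39, 11), (11, -44), (66, 13), (41, -63),
             (83, 17), (39, 12), (21, 14), (42, 14), (52, 12), (40, 12), (69, 23),
             (56, 28), (69, 35), (63, 33), (67, 37), (58, 19), (72, 30)],
    "Yudi": [(9, 14), (5, 22), (12, 16), (14, 18), (13, 11), (19, 19),
             (14, 17), (9, 13), (38, 13), (15, 19), (19, 21), (16, 21), (19, 23),
             (15, 32), (29, 18), (27, 59), (19, 35), (15, 25), (31, 53)],
    "Doni": [(11, 16), (25, 22), (10, 21), (12, 12), (13, 13), (15, 18),
             (63, 34), (61, 17), (11, 23), (15, 12), (14, 18), (41, 13), (87, 16),
             (49, 66), (66, 50), (50, 31), (80, 44), (48, 41), (72, 52)],
    "Nurul": [(13, 24), (25, 40), (16, 23), (10, 14), (13, -14), (30, -56),
              (85, 25), (56, -83), (15, 13), (95, 14), (65, 9), (56, 94), (93, 23),
              (95, 39), (0, 44), (71, 59), (42, 49), (28, 30), (88, 45)],
    "Adi": [(12, 13), (13, 32), (10, 13), (11, 14), (17, 17), (16, 19),
            (10, 18), (22, 15), (12, 18), (20, 17), (11, 14), (11, 16), (88, 18),
            (30, 49), (16, 30), (22, 25), (24, 41), (28, 31), (30, 26)],
    "Nana": [(6, 8), (28, 31), (0, -24), (10, 0), (22, 9), (7, 6),
             (12, 17), (14, 12), (16, 9), (8, 7), (4, 7), (9, 11), (86, 19),
             (29, 31), (71, 36), (65, 52), (47, 32), (44, 22), (80, 45)],
}

# Positions of each place of articulation within an informant's list.
PLACES = {"Bilabial": slice(0, 6), "Alveolar": slice(6, 13), "Velar": slice(13, 19)}

MIN_ASPIRATION_MS = 20  # second criterion, read inclusively (assumption)


def meets_criteria(voiceless, voiced):
    """Criterion 1: voiceless VOT longer than voiced; criterion 2: >= 20 ms."""
    return voiceless > voiced and voiceless >= MIN_ASPIRATION_MS


def percentage_meeting(pairs):
    """Percentage of word pairs that satisfy both criteria."""
    hits = sum(1 for voiceless, voiced in pairs if meets_criteria(voiceless, voiced))
    return round(100 * hits / len(pairs), 2)


table_9 = {
    place: {name: percentage_meeting(pairs[span]) for name, pairs in TABLE_8.items()}
    for place, span in PLACES.items()
}

for place, row in table_9.items():
    print(place, row)
```

The inclusive reading of the 20-millisecond threshold is a choice made for this sketch; a strict reading (more than 20 milliseconds) would exclude Adi’s tie, whose VOT is exactly 20 milliseconds.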


The table shows that Mita managed to produce only half of all word-initial bilabial pairs in a way that meets the two criteria set above (50%). Yet she was able to produce all word-initial alveolars (100%) and all word-initial velars (100%) with the VOT of voiceless plosives longer than that of their voiced counterparts and spanning more than 20 milliseconds. Table 9 indicates that producing word-initial bilabials that fit the above criteria posed more of a challenge to Mita than producing word-initial alveolars or velars.

Yudi, on the other hand, failed to produce any word-initial bilabials that agree with the criteria above. He could only meet the criteria in one out of seven word pairs (14.29%) featuring a word-initial alveolar and one out of six word pairs (16.67%) featuring an initially-positioned velar. This shows that, for Yudi, producing word-initial bilabials that fit the above criteria was more challenging than producing word-initial alveolars or velars.

Doni could only produce one out of six word pairs (16.67%) featuring initially-positioned bilabials that agrees with the above criteria. He could produce more pairs that fit the criteria when an alveolar was in initial position (57.14%). He performed best when a velar was in initial position (83.33%). As with Mita and Yudi, table 9 shows that Doni had more difficulty producing word-initial bilabials that meet the above criteria.

While Nurul could only produce one out of six word pairs featuring word-initial bilabials that fits the criteria, most of her productions of word-initial alveolars met the criteria (71.43%). Half of her word-initial velars (50%) fit the criteria. Table 9 shows that Nurul had more difficulty producing word-initial bilabials that meet the above criteria than word-initial alveolars or velars.

Adi failed to produce any word-initial bilabials that meet the above criteria. This means that none of Adi’s voiceless bilabials had a VOT longer than that of its voiced counterpart and none of his VOTs of /p/ reached 20 milliseconds. He, however, performed better and produced more word pairs that fit the criteria above when an alveolar was initially positioned (42.86%). He produced one out of six word pairs (16.67%) in which the initially-positioned velar agrees with the criteria above. Table 9 shows that producing word-initial bilabials that agree with the above criteria was more challenging for Adi than producing word-initial alveolars or velars that meet the same criteria.

Nana managed to produce one out of six word pairs (16.67%) in which word-initial bilabials meet the above criteria, and one out of seven word pairs (14.29%) in which word-initial alveolars agree with the same criteria. Most of the word-initial velars that Nana produced fit the criteria above, which means that the VOT of her voiceless velar in most cases was longer than that of the voiced velar and most of her VOTs of /k/ were longer than 20 milliseconds. Much like the other informants, Nana had more difficulty producing word-initial bilabials and alveolars that agree with the above criteria than producing word-initial velars that meet the same criteria.

Again, the two criteria adopted for the selection of word pairs that reflect the expected contrast between initially-positioned English voiceless and voiced plosives are that the VOT of voiceless plosives should be longer than that of their voiced counterparts and that the VOT of the word-initial voiceless plosive should be at least 20 milliseconds to impart aspiration. The numbers of word pairs that meet the criteria have been displayed in table 9 above, and a consistent pattern has been found across informants concerning bilabials in initial position. Table 9 indicates that the informants in the study generally had more difficulty producing initially-positioned bilabials that agree with the criteria than producing initially-positioned alveolars and velars that meet the same criteria. In other words, it was more challenging for the informants to articulate word-initial bilabials in such a way that the VOT of /p/ was longer than that of /b/ and reached 20 milliseconds to enable perceivable aspiration. Likewise, it was less challenging for them to produce word-initial alveolars or velars in such a way that the VOT of /t/ was longer than that of /d/, or that of /k/ longer than that of /g/, and that the VOT of /t/ and /k/ reached 20 milliseconds.

A trend revealed itself when the VOTs of word-initial voiceless plosives as produced by all informants in the study were compared. Figure 8 below displays the comparison of VOTs of the word-initial voiceless bilabial as produced by all informants. Lined up on the horizontal axis are the observed words in which /p/ occurs in initial position. Along the vertical axis are the VOT values. Figure 8 displays what has been revealed by table 9 above: Mita produced a VOT of /p/ long enough to be perceivable as aspiration in half of all observed tokens; Yudi and Adi were unable to produce any word-initial /p/ with perceivable aspiration because all of their VOTs were under 20 milliseconds; and Doni, Nurul, and Nana each produced only a single token of /p/ that met both criteria. However, the figure also reveals a tendency shared by all informants concerning their production of pack.

First, it should be noted that all informants exhibited an inclination to produce /p/ in pack immediately followed by /e/, a vowel quality different from what a speaker of English as a first language might expect to hear in pack. In English, /p/ in pack is followed immediately by /æ/. However, all Javanese informants in the study tended to realize English /æ/ as /e/, and this is not without reason. While the vowel /æ/ is not available in Javanese, the vowel /e/ is (Wedhawati et al. 2010, pp. 66-73). All Javanese informants in the study tended to produce the more familiar sound /e/ instead of /æ/, which agrees with the observation by Collins and Mees (2013, p. 217) that speakers of Indonesian as a first language tend to produce /e/ as a substitute for /æ/.

Second, apart from the realization of /æ/ as /e/, all informants also tended to produce pack in an almost identical fashion. In figure 8, the six lines representing the six informants in the study are in close proximity to each other at the word pack but not at the rest of the words along the horizontal axis. This means that all informants were inclined to articulate /p/ in pack with almost the same VOT value, but not in peak, pit, park, pull, and port, in which /p/ also occurs in initial position (see figure 8 below). All informants in the study articulated the word-initial bilabial in pack with VOT values falling within a narrow range. They began the voicing of the vowel /e/ in pack roughly 10 to 14 milliseconds after the release of the bilabial /p/. This, however, was not repeated with the rest of the words.

Figure 8 below shows that each informant performed differently with /p/ followed by /i:/ in peak, /p/ followed by /ɪ/ in pit, /p/ followed by /ɑ:/ in park, /p/ followed by /ʊ/ in pull, and /p/ followed by /ɔ:/ in port. That all informants tended to produce /e/ after a word-initial voiceless plosive with almost the same VOT value of the plosive is also observable in figure 9.

Figure 8. VOT of word-initial /p/ (line chart; horizontal axis: peak, pit, pack, park, pull, port; vertical axis: VOT values from 0 to 70; one line each for Mita, Yudi, Doni, Nurul, Adi, and Nana)

Figure 9 below displays the comparison of VOTs of the word-initial voiceless alveolar as produced by all informants. Arranged along the horizontal axis are the observed words in which /t/ occurs in initial position. The values along the vertical axis are the VOTs of word-initial /t/. Similar to figure 8 above, this figure demonstrates what has been expressed in table 9 above. Figure 9 below shows that Mita produced word-initial /t/ with a VOT long enough to be perceivable as aspiration in all observed tokens, that both Yudi and Nana produced word-initial /t/ with perceivable aspiration in one token only, and that Adi, Doni, and Nurul produced 3, 4, and 5 tokens, respectively, out of 7 observed tokens in which /t/ was produced with perceivable aspiration.

More importantly, figure 9 below shows that the six lines representing the six informants in the study come into close proximity to each other at the word ten. This means that all informants tended to produce /t/ followed by /e/ in ten with almost the same VOT values. Their VOT values of word-initial /t/ in ten were in the range of 11 to 38 milliseconds.

Figure 9. VOT of word-initial /t/ (line chart; horizontal axis: two, ten, tone, tune, torn, tie, town; vertical axis: VOT values from 0 to 100; one line each for Mita, Yudi, Doni, Nurul, Adi, and Nana)

Though the range from 11 milliseconds to 38 milliseconds seems wide (the gap between the two values is 27), it is relatively narrow when the gaps between VOT values of word-initial /t/ in the rest of the observed words are considered. In fact, the gap between the shortest and the longest VOT values of /t/ in ten is the narrowest compared to the gaps between VOT values of /t/ in the other words observed in the study.

Informants articulated two with VOT values ranging from 10 milliseconds to 85 milliseconds (the gap between the two values is 75) and torn with VOT values ranging from 9 milliseconds to 61 milliseconds (the gap is 52). They produced tie with VOT values ranging from 8 milliseconds to 95 milliseconds (the gap is 87), tone with VOT values from 4 milliseconds to 65 milliseconds (the gap is 61), town with VOT values from 9 milliseconds to 56 milliseconds (the gap is 47), and tune with VOT values ranging from 19 milliseconds to 93 milliseconds (the gap is 74).
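The gaps discussed here are simply the difference between the longest and the shortest VOT across the six informants for each word. A minimal sketch in Python, assuming the /t/ values are transcribed from table 8 in the order Mita, Yudi, Doni, Nurul, Adi, Nana (VOT_T and spread are illustrative names):

```python
# VOT of word-initial /t/ in milliseconds, transcribed from table 8.
# Each list is ordered: Mita, Yudi, Doni, Nurul, Adi, Nana.
VOT_T = {
    "two":  [83, 14, 63, 85, 10, 12],
    "torn": [39, 9, 61, 56, 22, 14],
    "ten":  [21, 38, 11, 15, 12, 16],
    "tie":  [42, 15, 15, 95, 20, 8],
    "tone": [52, 19, 14, 65, 11, 4],
    "town": [40, 16, 41, 56, 11, 9],
    "tune": [69, 19, 87, 93, 88, 86],
}

# Gap between the longest and the shortest VOT for each word.
spread = {word: max(values) - min(values) for word, values in VOT_T.items()}

# The word whose VOT values cluster most tightly across informants.
narrowest = min(spread, key=spread.get)

for word, gap in sorted(spread.items(), key=lambda item: item[1]):
    print(f"{word}: shortest={min(VOT_T[word])}, longest={max(VOT_T[word])}, gap={gap}")
print("narrowest gap:", narrowest)
```

The range is a coarse measure of dispersion because it depends only on the two extreme values; it is used here only because it is the measure the discussion itself relies on.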


The fact that the VOT values of word-initial /t/ in ten, as produced by all informants, cluster within a relatively narrow band means that all informants produced word-initial /t/ immediately followed by the vowel /e/ in ten with similar VOT values. All informants in the study began voicing the vowel /e/ 11 to 38 milliseconds after the release of the voiceless plosive /t/ in ten. Each informant, however, produced /t/ in the rest of the words differently. They did not articulate /t/ with similar VOT values when the voiceless plosive was followed by /u:/ in two, by /oʊ/ in tone, by /u:/ in tune, by /ɔ:/ in torn, by /aɪ/ in tie, and by /aʊ/ in town.

Meanwhile, figure 10 below compares the VOTs of the word-initial voiceless velar as produced by all informants. Shown along the horizontal axis are the words observed in the study in which the voiceless velar occurs in initial position. The vertical axis shows the VOT values of word-initial /k/.

Figure 10 shows that the six lines representing the six informants in the study come close together at the word cap. While all informants tended to realize /æ/ in cap as /e/, they were also inclined to produce the word-initial /k/ in cap with VOT values that were relatively close to each other.

Compared with the VOT values of /k/ in the other observed words, the VOT values of /k/ in cap were relatively close to each other. The shortest VOT of word-initial /k/ in cap was 15 milliseconds and the longest was 58 milliseconds. The gap between the two values is 43 milliseconds, the smallest among the gaps in VOT length for the words starting with /k/ in the study.

[Line graph: VOT values (vertical axis, 0.0 to 100.0) for the words cave, curl, cap, card, could, and coal (horizontal axis), with one line per informant: Mita, Yudi, Doni, Nurul, Adi, and Nana.]

Figure 10. VOT of word-initial /k/

In the study, informants produced cave with VOT values ranging from 15 to 95 milliseconds (a gap of 80), card with VOT values between 0 and 71 milliseconds (a gap of 71), and curl with VOT values from 22 to 71 milliseconds (a gap of 49). They articulated could with VOT values ranging from 19 to 80 milliseconds (a gap of 61) and coal with VOT values between 30 and 88 milliseconds (a gap of 58). Thus, the 43-millisecond gap found for /k/ in cap is the narrowest gap between the shortest and the longest VOT values among the words with word-initial /k/.

This means that, in terms of VOT values, all informants tended to produce /ke-/ in cap in a relatively similar way. They began voicing the vowel /e/ after a relatively similar interval following the release of /k/ in cap. On the other hand, the informants made more varied articulatory gestures when they produced /keɪ/- in cave, /kɑ:/- in card, /kɜ:/- in curl, /kʊ/- in could, and /koʊ/- in coal.

To sum up, figure 8 demonstrates that all informants were inclined to produce /pe/- in pack in a similar fashion. They started voicing the vowel /e/ after almost the same period of time, between 10 and 14 milliseconds, following the release of the voiceless bilabial /p/. They uniformly left only a short interval between the release of the plosive /p/ and the onset of the vowel /e/, so that the plosive was perceived as unaspirated. They did not, however, produce the other words in the group uniformly: the informants did not produce /pi:/- in peak, /pɪ/- in pit, /pɑ:/- in park, /pʊ/- in pull, and /pɔ:/- in port with similar VOT values.

Figure 9 also shows that all informants tended to produce /te-/ in ten in a similar way. They tended to begin voicing the vowel /e/ after roughly the same period of time, between 11 and 38 milliseconds, following the release of the voiceless alveolar /t/. Most of them left only a short interval after the release of the plosive /t/ before voicing the vowel /e/. Yet the informants did not articulate /tu:/ in two, /toʊ/- in tone, /tu:/- in tune, /tɔ:/- in torn, /taɪ/ in tie, and /taʊ/- in town with similar VOT values.

The graphs in figure 8 and figure 9 both show that the six lines representing the six informants in the study converge in the same area, which corresponds to the VOT values of word-initial voiceless plosives immediately followed by the vowel /e/. The area in which the lines concentrate indicates a short span of time between the release of the voiceless plosive and the voicing of the vowel /e/. Both figures demonstrate that all informants used similar articulatory gestures in producing /pe/- and /te/-, allowing only a short interval between the release of the word-initial voiceless plosive and the voicing of the vowel /e/.

Though it is not as convincing as the graphs in figures 8 and 9, the graph in figure 10 also shows that all informants share the same tendency concerning the production of /e/ following a word-initial voiceless velar. The six lines representing the six informants in the study are closer to each other at /ke-/ than at any other sequence in which the voiceless velar is in initial position.

Concerning the word-initial voiceless bilabial and alveolar, this leads to the conclusion that all Javanese informants in the study produced Javanese /e/ in realizing both English /æ/ and English /e/. The study shows that all informants tended to automatically activate Javanese phonological rules and produce L1 sounds in the L2. Collins and Mees (2013, p. 217) have indicated that speakers of Indonesian are inclined to realize English /æ/, a sound that is absent from both Indonesian and Javanese, as /e/, a sound that is familiar to speakers of Javanese as a first language because it occurs in both Indonesian and Javanese (Wedhawati et al. 2010, pp. 66-73). Because the informants perceived English /æ/ and Javanese /e/ as similar, they categorized both sounds as the same and produced them according to the phonological rules of their first language, Javanese. It also follows that all informants recognized the similarity between English /e/ and Javanese /e/ and, as a result, English /e/ was highly likely to be produced as Javanese /e/. This concurs with the claim by Jenkins (2000, pp. 32-33) that if a speaker recognizes a degree of similarity between a sound in the L1 and a sound in the L2, the speaker tends to categorize the two sounds as the same and to produce the L2 sound in the way he or she usually produces the L1 sound.

All informants in the study had already acquired and mastered Javanese as their first language. Their muscular movements for language production had already been set to the production of Javanese sounds. As a result, their articulatory gestures for producing Javanese sounds were already automatic. In the study, automatic activation of sound production that complies with Javanese phonological rules occurred when the informants tried to articulate the English vowels /æ/ and /e/ and produced them as Javanese /e/. The automatic activation of muscular movements typical of the production of Javanese sounds affected the production of sounds in the immediate vicinity of /æ/ and /e/. This affected how the informants produced English /pæ/- and English /te/-: they tended to produce English /pæ/- in pack and English /te/- in ten with short VOT values.

Following Jenkins (2000, pp. 32-33), all informants identified a degree of similarity between English /pæ/- in pack and Javanese /pe/-, and between English /te/- in ten and Javanese /te/-. They categorized English /pæ/- and Javanese /pe/-, as well as English /te/- and Javanese /te/-, as the same and automatically produced the sounds in the way that was more familiar to them. They automatically activated muscular habits already set to the production of Javanese sounds to produce /pe/- in pack and /te/- in ten. Since pack and ten were produced using articulatory gestures set to produce Javanese sounds, all Javanese informants produced the word-initial plosives in pack and ten in the same way they would produce the Javanese tense bilabial and tense alveolar followed by the Javanese vowel /e/. As a result, the VOT values of word-initial /p/ and /t/ that the informants produced might follow the typical VOT values of Javanese tense plosives, which are short. This is supported by the graphs in figure 8 and figure 9, which show that the VOT values of all informants were short and close to each other.

The tendency of all informants to produce /pe/- in pack and /te/- in ten in a manner reflecting Javanese phonological rules did not extend to the rest of the observed words in which the voiceless bilabial and alveolar are also in initial position. The informants did not articulate /pi:/- in peak, /pɪ/- in pit, /pɑ:/- in park, /pʊ/- in pull, /pɔ:/- in port, /tu:/ in two, /toʊ/- in tone, /tu:/- in tune, /tɔ:/- in torn, /taɪ/ in tie, and /taʊ/- in town with similar VOT values. Some of the informants might show an unmistakable presence of Javanese phonological rules in the production of /pi:/-, /pɪ/-, /pɑ:/-, /pʊ/-, /pɔ:/-, /tu:/, /toʊ/-, /tu:/-, /tɔ:/-, /taɪ/, and /taʊ/-, some might show more inclination to comply with English phonological rules, and still others might be able to control their articulatory gestures and comply with English phonological rules in certain tokens only, releasing that control in other tokens and reverting to Javanese phonological rules.

Production of a word-initial voiceless plosive that conforms to English phonological rules will give the plosive a VOT value long enough to be perceived as aspiration. On the other hand, articulation of a word-initial voiceless plosive that conforms to Javanese phonological rules will give the plosive a brief VOT, which should be shorter than the VOT of its voiced counterpart. The highly varied VOT values of /pi:/-, /pɪ/-, /pɑ:/-, /pʊ/-, /pɔ:/-, /tu:/, /toʊ/-, /tu:/-, /tɔ:/-, /taɪ/, and /taʊ/- in the study show that the informants did not comply with a single, common set of phonological rules and, as a result, did not produce the word-initial plosives with uniform VOT values. In contrast, the highly similar VOT values of /pe/- and /te/- in the study suggest that all informants complied with a single, common set of phonological rules, namely those of Javanese. They automatically activated the muscular movements suitable for producing sounds that conform to Javanese phonological rules and produced the word-initial plosives with uniform VOT values. At the same time, this also shows that the realization of English /æ/ and English /e/ as Javanese /e/, together with the automatic production of word-initial English /p/ and /t/ in the Javanese manner, was more prevalent than in the other combinations in the study. In other words, production of /pe/- and /te/- conforming to Javanese phonological rules was more prevalent than production of /pi:/-, /pɪ/-, /pɑ:/-, /pʊ/-, /pɔ:/-, /tu:/, /toʊ/-, /tu:/-, /tɔ:/-, /taɪ/, and /taʊ/- following Javanese phonological rules.

3) Prevoicing

There were tokens in which Mita produced certain pairs of word-initial bilabials with a stark contrast in VOT values. The VOT gap between certain tokens involving word-initial bilabials was large, exceeding 100 milliseconds.

The tokens pull (VOT +15 milliseconds) and bull (VOT -87 milliseconds) by Mita, for example, showed a wide difference in VOT length. The wide gap, however, was achieved through a strategy unusual in English, one that did not reflect what an English hearer would naturally expect from a contrast between word-initial voiceless and voiced plosives. The plosive /p/ in pull was unlikely to be perceived as aspirated since its VOT was less than 20 milliseconds. On the other hand, /b/ in bull had a negative VOT value. The occurrence of VOT with a negative value, known as prevoicing, is not very common in English. The VOT of /b/ in bull was -87 milliseconds, which brought the difference between the two VOT values to 102 milliseconds. Mita also produced prevoiced /b/ in back with a VOT of -44 milliseconds. The plosive /p/ in pack, however, was unlikely to be perceived as aspirated since its VOT was only +11 milliseconds. The widest gap in VOT length (104 milliseconds) was found in the two tokens park and bark produced by Mita. There was indeed perceivable aspiration of /p/ in park, with a VOT of +41 milliseconds, but with prevoiced /b/ in bark (VOT -63 milliseconds), the difference in VOT length between the two was appreciably large. The only other informant who produced /b/ in bark with a negative VOT was Nurul.

Nurul started voicing 56 milliseconds before the release of the bilabial plosive in bark, giving the VOT of /b/ a negative value. As was the case with Mita, Nurul's aspiration of the word-initial voiceless bilabial in park was identifiable from the VOT of /p/, which was +30 milliseconds. Prevoicing also occurred when Nurul produced the word-initial voiced bilabial in bought, with a VOT of -14 milliseconds. The word-initial /p/ in port, however, was unlikely to be recognized as aspirated since its VOT was only +13 milliseconds. For the record, native speakers of English recognize aspiration when the VOT is more than 20 milliseconds. In addition, Nurul was the only informant who produced a prevoiced voiced alveolar in the study. She produced an aspirated voiceless alveolar in town with a VOT of +56 milliseconds and a voiced alveolar in dawn with a VOT of -83 milliseconds. The difference in VOT length between the two tokens was large, 139 milliseconds. Nurul's prevoicing of /d/ before /ɔ/ in dawn was the only occurrence of a prevoiced voiced alveolar in the study. In fact, it was the only prevoicing of a non-bilabial plosive in the study.

The third informant who produced a word-initial voiced plosive with prevoicing was Nana. Voicing was detected during the closure of the word-initial voiced bilabial immediately preceding /ɪ/. The VOT of /b/ in bit was -24 milliseconds. On the other hand, she produced the voiceless bilabial in pit in a fashion suitable for a word-initial English voiced plosive: the /p/ in pit had a zero VOT.

That the prevoicing observed in the study occurred only in voiced plosives agrees with the study by Hunnicutt and Morris (2016, p. 215), who report that all prevoiced plosives in Spanish and Russian are voiced. This relates to the point in time at which voicing starts.

Voicing associated with an English voiced plosive commonly starts with or shortly after the release of the plosive. As a result, English voiced plosives are often not actually voiced during the closure, or are only partially voiced. However, when voicing starts earlier, during the plosive closure rather than after the plosive release, the voiced plosive becomes fully voiced (Kong, 2009, p. 18). English hearers identify fully-voiced plosives as voiced. Kong (2009, p. 18) states that fully-voiced plosives have been shown to be perceived as voiced plosives, as is the case in South American English. This explains why prevoicing occurs with voiced plosives only.

Commencing voicing before the plosive release, at the release, or shortly after the release (less than 20 milliseconds after it) will cause the plosive to be perceived as voiced. Starting the voicing more than 20 milliseconds after the plosive release will result in the plosive being perceived as voiceless. A plosive that is intended to be voiceless but is produced with voicing that starts too early, less than 20 milliseconds after or even before the plosive release, is not generally recognized as voiceless. It will in fact be identified as a voiced plosive.
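The perceptual boundary described above can be expressed as a simple rule. The following Python sketch is only an illustration under stated assumptions: it applies the 20-millisecond boundary mentioned in the text to a handful of VOT values reported for the informants, and it assumes that a VOT of exactly 20 milliseconds is grouped with the voiced category, a case the text does not specify. The function and variable names are hypothetical.

```python
ASPIRATION_THRESHOLD_MS = 20  # boundary stated in the text


def perceived_category(vot_ms, threshold=ASPIRATION_THRESHOLD_MS):
    """Label a word-initial plosive by its VOT, following the 20 ms boundary.

    Assumption: a VOT of exactly `threshold` is grouped with "voiced",
    because the text leaves that boundary case unspecified.
    """
    return "voiceless" if vot_ms > threshold else "voiced"


def is_prevoiced(vot_ms):
    """A negative VOT means that voicing started before the release."""
    return vot_ms < 0


# VOT values (ms) reported in the text for some tokens.
TOKENS = {
    ("Mita", "pull"): 15,
    ("Mita", "bull"): -87,
    ("Mita", "park"): 41,
    ("Mita", "bark"): -63,
    ("Nurul", "town"): 56,
    ("Nurul", "dawn"): -83,
    ("Nana", "pit"): 0,
    ("Nana", "bit"): -24,
}

if __name__ == "__main__":
    for (speaker, word), vot in TOKENS.items():
        note = " (prevoiced)" if is_prevoiced(vot) else ""
        print(f"{speaker:5s} {word:5s} {vot:4d} ms -> {perceived_category(vot)}{note}")
```

Under this rule, Mita's pull (+15 milliseconds) falls on the voiced side of the boundary even though a voiceless plosive was intended, which is the kind of mismatch the paragraph above describes.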


Early voicing, as opposed to the delayed voicing that may produce aspiration, gives VOT a negative value and adds low-frequency acoustic energy before the plosive release. This effect is visible on a spectrogram as a bar of low-frequency energy. Figure 11 below shows the bar of low-frequency energy appearing before the release of /b/ in bull. As shown in the spectrogram at the bottom of figure 11, there is a dark horizontal bar that extends before the release of the bilabial plosive in bull.

[Two panels, labelled /pull/ and /bull/, each with a spectrogram at the bottom.]

Figure 11. The pair pull – bull by Mita

In the study, the occurrence of prevoicing was sporadic. Mita produced a fully-voiced bilabial immediately before /ʊ/, /e/ (a mispronunciation of the vowel /æ/), and /ɑ:/ in bull, back, and bark and not elsewhere. Nurul's prevoicing was likewise confined to three tokens: she produced prevoiced word-initial voiced plosives in bark, bought, and dawn. The word bark was the only word in the study that two informants, Mita and Nurul, both produced with a prevoiced bilabial. The word dawn as spoken by Nurul was the only token in the study that was produced with a prevoiced alveolar, which makes Nurul the only informant who produced a fully-voiced alveolar, preceding /ɔ/ in dawn. As a matter of fact, /d/ in this token was the only non-bilabial plosive produced with prevoicing in the study. Nana produced only a single token with prevoicing, the word-initial voiced bilabial in bit.

That Mita, Nurul, and Nana did not start voicing during the closure of any other voiced plosives suggests that prevoicing in the study was sporadic rather than systematic. Given this irregularity, it is suspected that the informants did not purposefully produce voiced plosives with a negative VOT. Instead, it is probable that the prevoicing observed in the study was accidental and that it is related to larynx lowering.

Javanese phonological rules dictate that lax plosives (/b/, /d̪/, /ɖ/, and /g/) and the vowel occurring immediately after them should be produced with, among other things, lowered F0 and F1 values at the vowel onset. One of the strategies for achieving such an acoustic effect, as Brunelle (2010, p. 21) and Ladefoged and Maddieson (1996, p. 64) assert, is changing the height of the larynx. Motivated by the need to accentuate the difference between word-initial voiced and voiceless plosives, informants may have spontaneously followed Javanese phonological rules. This instinctive conformity to Javanese phonological rules arises because the muscular movements controlling the speech organs are automatic and already tuned to the production of L1 sounds. Thus, certain informants in the study might have inadvertently lowered the larynx to achieve lowered F0 and F1.

A lowered larynx causes the cricoid cartilage to make a rotating movement that relaxes the vocal folds. What happens to the vocal folds is akin to what happens to a guitar string. The string of a guitar is attached to a tuning peg that can be rotated to tighten or slacken the string. The more tension is applied to the string, i.e. the more the string is tightened, the higher the pitch it produces. Conversely, if the string is slackened, the pitch it produces goes lower. Similarly, relaxed vocal folds are reflected acoustically in a lower F0. A lowered larynx, however, does more than lower F0.

A lowered larynx also means an expansion of the vocal tract, and an expanded vocal tract provides a bigger resonating chamber. By analogy, the vocal tract is similar to the wooden body of a guitar. The hollow body of a guitar resonates with the sound created by the vibration of the string. The bigger the body of a guitar, the lower its resonance frequencies. Likewise, lower resonance frequencies of the vocal tract are created by expanding the vocal tract. In other words, a reduction of resonance frequencies, the first formant (F1) included, is achieved by an expansion of the vocal tract, which is created by lowering the larynx.
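The relation between the size of the resonating chamber and its resonance frequencies can be illustrated with the textbook model of a uniform tube that is closed at one end (the glottis) and open at the other (the lips), whose resonances are (2n - 1)c / 4L for tube length L and speed of sound c. The Python sketch below is only an idealisation under that assumption and is not the method used in this study; the speed of sound and the two tract lengths are hypothetical values chosen for illustration, not measurements from the informants.

```python
SPEED_OF_SOUND_M_S = 350.0  # assumed speed of sound in warm, moist air


def tube_resonances(length_m, count=3, c=SPEED_OF_SOUND_M_S):
    """Resonances (Hz) of a uniform tube closed at one end: (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


if __name__ == "__main__":
    # Hypothetical tract lengths: neutral larynx vs. larynx lowered by 1 cm.
    for label, length in (("neutral", 0.17), ("lowered", 0.18)):
        f1, f2, f3 = tube_resonances(length)
        print(f"{label}: F1={f1:.0f} Hz, F2={f2:.0f} Hz, F3={f3:.0f} Hz")
```

In this simplified model every resonance falls as the tube lengthens, which is consistent with the claim that a lowered larynx reduces F1. Real vocal tracts are not uniform tubes, so the absolute values are not comparable with the measured formants.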

Thus, not only does a lowered larynx reduce F0, it also lowers F1. Since the larger chamber created by a lowered larynx produces a lower resonance frequency, a vowel produced with a lowered larynx will have a reduced F1 value. This agrees with the data taken from the informants in the study. The F1 values of /ɑ:/ following the fully-voiced bilabial plosive in bark by Mita and Nurul were 527.1 Hz and 461.5 Hz, respectively (see table 3 above). These values were lower than the F1 values of the same vowel following the voiceless bilabial plosive in park by the same informants, which were 722.1 Hz and 694.2 Hz, respectively. The F1 value of /ʊ/ in bull produced by Mita was 507.9 Hz, which was lower than the F1 value of the same vowel following the voiceless bilabial plosive in pull, 575.1 Hz. Mita also produced /e/ (a realization of /æ/) in back with an F1 value lower than that of the same vowel following the voiceless bilabial plosive in pack: the F1 of /e/ was 494.9 Hz in back and 886.6 Hz in pack. The F1 value of /ɪ/ in bit produced by Nana was 326.7 Hz, lower than the F1 value of the same vowel following the voiceless bilabial plosive in pit, which was 379.4 Hz. The vowel /ɔ:/ in dawn as produced by Nurul, the only token in the study with a prevoiced alveolar plosive, also exhibited a lower F1 value: 493.2 Hz in dawn against 815.7 Hz in torn. The only exception was /ɔ:/ after the bilabial plosives in the pair port – bought produced by Nurul. The F1 value of /ɔ:/ in bought was 615.7 Hz, which was higher than that in port, 489.7 Hz. While no explanation can be offered for the peculiarity of /ɔ:/ in bought produced by Nurul, most of these findings concur with the suggestion by Ladefoged and Maddieson (1996, p. 64) that a lowered F1 value indicates a lowered larynx.
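The F1 comparisons above can be checked mechanically. The following Python sketch is a minimal illustration that uses only the F1 values quoted in this section; the names F1_PAIRS, f1_drop, and exceptions are hypothetical and do not come from the study.

```python
# (informant, voiceless-initial word, F1 in Hz, prevoiced word, F1 in Hz),
# using the values quoted in the text.
F1_PAIRS = [
    ("Mita", "park", 722.1, "bark", 527.1),
    ("Nurul", "park", 694.2, "bark", 461.5),
    ("Mita", "pull", 575.1, "bull", 507.9),
    ("Mita", "pack", 886.6, "back", 494.9),
    ("Nana", "pit", 379.4, "bit", 326.7),
    ("Nurul", "torn", 815.7, "dawn", 493.2),
    ("Nurul", "port", 489.7, "bought", 615.7),
]


def f1_drop(pair):
    """F1 after the voiceless plosive minus F1 after the prevoiced plosive (Hz)."""
    _, _, f1_voiceless, _, f1_prevoiced = pair
    return round(f1_voiceless - f1_prevoiced, 1)


def exceptions(pairs):
    """Pairs in which F1 is not lower after the prevoiced plosive."""
    return [(p[0], p[1], p[3]) for p in pairs if f1_drop(p) <= 0]


if __name__ == "__main__":
    for pair in F1_PAIRS:
        print(f"{pair[0]:5s} {pair[1]:5s} - {pair[3]:6s} F1 drop: {f1_drop(pair):7.1f} Hz")
    print("Exceptions:", exceptions(F1_PAIRS))
```

A positive drop means that F1 was lower after the prevoiced plosive; the only pair returned by exceptions is Nurul's port – bought.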

At the same time, the expanded cavity created by larynx lowering has a side effect. Because the larynx is positioned lower than normal, the cavity above the larynx expands. A bigger cavity above the larynx potentially causes supraglottal air pressure to take longer to rise, because the air needs to fill a bigger space. The slower rise in supraglottal pressure provides a favorable condition for airflow through the glottis (Brunelle 2010, p. 11). If the glottis is neither abducted nor stiffened, the airflow through the glottis may become strong enough to set the vocal folds vibrating while the oral cavity is still closed. As a result, vibration of the vocal folds happens during the plosive closure, which is perceived as prevoicing.

Prevoicing in the study occurred in seven tokens only: six tokens with a prevoiced bilabial and one with a prevoiced alveolar. The finding shows that the chance of accidental prevoicing was greater for voiced bilabials than for voiced alveolars. This concurs with the finding by Simon (2010, p. 13) that prevoicing is more likely to happen in the production of voiced labials than of voiced alveolars. This, Simon claims, relates to the relative size of the oral cavity that is formed during the production of the plosive. To produce bilabials, the air is blocked at the lips. To produce alveolars, on the other hand, the blockage is created at the alveolar ridge. Different locations of the blockage entail different sizes of oral cavity. The oral cavity for bilabials spans from the glottis to where the lips close together, and the oral cavity for alveolars runs from the glottis to the alveolar ridge with which the tongue makes contact. Because the lips are anterior to the alveolar ridge, the oral cavity for the production of bilabials is bigger than that for the production of alveolars: it gains the extra space extending from the alveolar ridge to the lips. Simon argues that since the space above the glottis is bigger during the production of labial plosives, the air pressure inside the oral cavity takes longer to rise. The slower increase in air pressure leaves a window for vibration of the vocal folds. In contrast, the air pressure during the production of alveolars rises more quickly within a smaller oral cavity, and a rapid increase in air pressure within the oral cavity is not favorable for vocal fold vibration. As a result, the production of bilabials facilitates vibration of the vocal folds better than the production of alveolars. Because the vocal folds are more likely to vibrate during the production of bilabials, prevoicing is more likely in the production of these plosives than in the production of alveolars.

The accidental prevoicing in the study occurred only among the female informants (Mita, Nurul, and Nana). None of the male informants (Yudi, Doni, and Adi) started voicing during the plosive closure. Since prevoicing in the study is suspected to stem from larynx lowering, this implies that larynx lowering among the male informants may not have been conducive to transglottal airflow. The supraglottal pressure in males may increase too rapidly to allow air to pass through the glottis and set the vocal folds vibrating. It is suspected that the larynx of the male informants did not move downwards far enough to create a cavity above the glottis spacious enough to slow the rise in supraglottal pressure. Brunelle (2010, p. 14) reports that vertical displacement of the larynx is indeed larger in females than in males. When the larynx of a female moves up and down, Brunelle observes, it travels a longer distance than that of a male. Because the larynx of females reaches extremes that the larynx of males does not, there is more space above the glottis in females. As a result, supraglottal pressure takes longer to rise in females, and airflow through the glottis is relatively easier to achieve. Provided that the vocal folds are neither wide open nor stiffened, vibration of the vocal folds is more likely to occur during the closure in the oral cavity in females than in males. This may explain why prevoicing tended to occur among the female informants in the study.

The above conclusion agrees with the previous conclusion regarding the reduction in F1 frequency. It was concluded in that section that female informants in the study were more likely than male informants to articulate the vowel immediately following a voiced plosive with a reduction in F1 value. This means that female informants were more consistent in lowering the larynx to achieve a reduction in F1 frequency. Since larynx lowering was more consistent among female informants and females tend to lower the larynx to extremes that males cannot achieve, accidental prevoicing was more likely to occur among female informants.

b. Duration of voiced sound preceding word-final plosives

The contrast between English voiced and voiceless plosives in word-final position is realized in the relative duration of the vowel preceding the word-final plosive. In the process known as pre-fortis clipping, a vowel that occurs before a voiceless plosive is shortened, whereas it has full length word-finally, before voiced plosives, and before nasals and /l/ (Collins and Mees 2013, p. 58). In other words, the vowel before /p/, /t/, and /k/ is shortened in duration, but it is fully long when occurring before /b/, /d/, and /g/. An English hearer identifies a word-final plosive by attending to the vowel occurring before the plosive. If the vowel is shortened, the word-final plosive is voiceless, and if the vowel is fully long, the word-final plosive is voiced.

Meanwhile, the opposition between high register and low register ceases to exist when Javanese plosives occur in word-final position. This is a result of the neutralization of the opposition between tense and lax plosives word-finally. Since the opposition is neutralized, there is no need for differing registers to mark a contrast between word-final tense and lax plosives. What occurs in word-final position is one of the tense plosives: /p/, /t̪/, /ʈ/, or /k/. Such neutralization may leave a speaker of Javanese as a first language without inherent strategies for realizing the contrast between the two opposing groups of plosives in word-final position.

Table 10 below displays the differences in the duration of the vowel occurring before word-final plosives. Listed across the top of the table are the informants in the study, and in the left-most column are the word pairs in which a word-final voiceless plosive is contrasted with its voiced counterpart. The figures in the cells under each informant indicate the gap between the duration of a vowel before a voiced plosive and before a voiceless one. These figures are derived by subtracting the duration of a vowel occurring immediately before /p/, /t/, or /k/ from the duration of the same vowel preceding /b/, /d/, or /g/, respectively.

Table 10. Differences in duration of vowel occurring before word-final plosives*

Word Pairs            Mita    Yudi    Doni   Nurul     Adi    Nana
Final Bilabial
rip – rib             28.0   -17.0    -4.0    16.0    10.0    17.0
rope – robe           56.0   -36.0   -20.0    49.0   -12.0    19.0
tap – tab             22.0    -1.0     4.0    56.0   -11.0    -6.0
cap – cab             50.0   -21.0     1.0    46.0     2.0   -17.0
tripe – tribe         48.0    -7.0    13.0    35.0    29.0    -6.0
Final Alveolar
bet – bed             49.0    -4.0    -1.0   -21.0    21.0    45.0
late – laid           69.0    63.0     8.0     6.0    22.0    70.0
set – said            22.0    17.0    49.0     8.0    12.0    68.0
heart – hard          77.0     0.0    45.0    23.0     9.0   -48.0
sight – side          73.0    27.0     9.0    13.0    25.0   -13.0
brought – broad        5.0    16.0  -105.0   -57.0   -24.0   -16.0
Final Velar
pick – pig            36.0     3.0    -8.0    45.0   -11.0    42.0
back – bag            26.0     7.0     2.0    39.0    -4.0    -2.0
dock – dog            58.0   -10.0     1.0    36.0    -1.0    17.0
lock – log            71.0   -19.0    -5.0     6.0     7.0    43.0
*in milliseconds

A positive value means that the duration of a vowel before a voiced plosive is longer than its duration before a voiceless plosive. Conversely, a negative value indicates that the duration of a vowel before a voiced plosive is shorter than its duration before a voiceless plosive.
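The per-informant patterns in Table 10 can be tallied programmatically. The following Python sketch is a minimal illustration that copies the values of Table 10; the function summarize and the 10-millisecond cut-off for a "negligible" difference are assumptions made for illustration and are not thresholds used in the study.

```python
INFORMANTS = ["Mita", "Yudi", "Doni", "Nurul", "Adi", "Nana"]

# Table 10: vowel duration before the voiced plosive minus duration before
# the voiceless plosive (ms); one row per word pair, columns as in INFORMANTS.
TABLE_10 = {
    "rip-rib": [28, -17, -4, 16, 10, 17],
    "rope-robe": [56, -36, -20, 49, -12, 19],
    "tap-tab": [22, -1, 4, 56, -11, -6],
    "cap-cab": [50, -21, 1, 46, 2, -17],
    "tripe-tribe": [48, -7, 13, 35, 29, -6],
    "bet-bed": [49, -4, -1, -21, 21, 45],
    "late-laid": [69, 63, 8, 6, 22, 70],
    "set-said": [22, 17, 49, 8, 12, 68],
    "heart-hard": [77, 0, 45, 23, 9, -48],
    "sight-side": [73, 27, 9, 13, 25, -13],
    "brought-broad": [5, 16, -105, -57, -24, -16],
    "pick-pig": [36, 3, -8, 45, -11, 42],
    "back-bag": [26, 7, 2, 39, -4, -2],
    "dock-dog": [58, -10, 1, 36, -1, 17],
    "lock-log": [71, -19, -5, 6, 7, 43],
}


def summarize(informant, negligible_ms=10):
    """Count longer, shorter, and near-equal vowel durations for one informant.

    `negligible_ms` is an illustrative cut-off, not a threshold from the study.
    """
    col = INFORMANTS.index(informant)
    values = [row[col] for row in TABLE_10.values()]
    return {
        "longer_before_voiced": sum(v > 0 for v in values),
        "shorter_before_voiced": sum(v < 0 for v in values),
        "negligible": sum(abs(v) < negligible_ms for v in values),
    }


if __name__ == "__main__":
    for name in INFORMANTS:
        print(name, summarize(name))
```

The counts returned for each informant give a quick overview of how often the vowel was longer before the voiced plosive, as an English hearer would expect, and how often the difference was small.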

Table 10 above shows that, in all tokens, Mita produced the vowel before word-final voiced plosives with a longer duration than the vowel before word-final voiceless plosives. In tokens such as heart – hard, sight – side, and lock – log, the differences in vowel duration were wide, more than 70 milliseconds. The gap in the duration of the vowel /ɔ:/ in brought and broad, however, was only 5 milliseconds. This gap was too small to yield a perceivable difference in the duration of /ɔ:/ before /t/ in brought and before /d/ in broad.

Yudi, on the other hand, produced the vowel before word-final voiced plosives with a shorter duration than the vowel before word-final voiceless plosives in more than half of all tokens. The gap in the duration of the vowel /e/ (a realization of the English vowel /æ/) in tap and tab was only 1 millisecond, making it difficult to perceive the difference in duration between /e/ before /p/ in tap and /e/ before /b/ in tab. Similarly, the gaps in the duration of vowels before word-final plosives in tripe – tribe (gap 7 milliseconds), bet – bed (gap 4 milliseconds), heart – hard (gap 0 milliseconds), pick – pig (gap 3 milliseconds), and back – bag (gap 7 milliseconds) were all very small and thus negligible. An English hearer who relies on the relative duration of the vowel before a word-final plosive to identify the plosive as voiced or voiceless might find the differences in Yudi's tripe – tribe, bet – bed, heart – hard, pick – pig, and back – bag hard to discern. The English hearer may find it difficult to recognize the words because the duration of the vowel preceding the word-final voiceless plosive in these tokens was only slightly different from the duration of the same vowel immediately before the word-final voiced plosive.

In more than half of all tokens observed in the study, Doni made the duration of the vowel preceding a word-final voiced plosive longer than that of the same vowel before a word-final voiceless plosive. He, however, produced rip – rib (gap 4 milliseconds), tap – tab (gap 4 milliseconds), cap – cab (gap 1 millisecond), bet – bed (gap 1 millisecond), late – laid (gap 8 milliseconds), sight – side (gap 9 milliseconds), pick – pig (gap 8 milliseconds), back – bag (gap 2 milliseconds), dock – dog (gap 1 millisecond), and lock – log (gap 5 milliseconds) with narrow gaps, between 1 and 9 milliseconds, between the duration of the vowel occurring immediately before word-final voiced plosives and before word-final voiceless plosives. An English hearer may find it difficult to discern the differences between Doni’s rip – rib, tap – tab, cap – cab, bet – bed, late – laid, sight – side, pick – pig, back – bag, dock – dog, and lock – log because the duration of the vowel occurring immediately before word-final voiced plosives was only slightly different from the duration of the same vowel occurring immediately before word-final voiceless plosives.
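The selection of narrow gaps described above can be reproduced with a simple filter on absolute gap size. This is a hedged sketch only: the 10-millisecond cutoff is an assumption chosen to match the "between 1 and 9 milliseconds" range mentioned in the text, not a perceptual threshold established by the study, and the names `NARROW_GAP_MS` and `narrow_pairs` are hypothetical. The gap values are Doni's column in Table 10.

```python
# Assumed cutoff for illustration only; the study reports narrow gaps of 1-9 ms.
NARROW_GAP_MS = 10.0

# Doni's gaps (ms) from Table 10, keyed by word pair.
doni_gaps = {
    "rip-rib": -4.0, "rope-robe": -20.0, "tap-tab": 4.0,
    "cap-cab": 1.0, "tripe-tribe": 13.0,
    "bet-bed": -1.0, "late-laid": 8.0, "set-said": 49.0,
    "heart-hard": 45.0, "sight-side": 9.0, "brought-broad": -105.0,
    "pick-pig": -8.0, "back-bag": 2.0, "dock-dog": 1.0, "lock-log": -5.0,
}


def narrow_pairs(gaps, cutoff_ms=NARROW_GAP_MS):
    """Return the word pairs whose absolute gap is below the cutoff."""
    return [pair for pair, gap in gaps.items() if abs(gap) < cutoff_ms]


doni_narrow = narrow_pairs(doni_gaps)
print(len(doni_narrow), doni_narrow)
```

The filter uses the absolute value because the text reports gap sizes without regard to direction.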

Moreover, Doni produced the vowel /ɔ:/ before /t/ in brought longer by 105 milliseconds than the vowel /ɔ:/ before /d/ in broad. As a result, an English hearer might perceive Doni’s broad as brought because the vowel in broad was much shorter in duration, which made the word-final alveolar in broad perceivable as /t/.

Similar to Doni, in more than half of all tokens observed in the study Nurul made the duration of the vowel before word-final voiced plosives longer than the duration of the same vowel before word-final voiceless plosives. She, however, produced late – laid (gap 6 milliseconds), set – said (gap 8 milliseconds), and lock – log (gap 6 milliseconds) with only narrow differences in vowel duration. The duration of /eɪ/ in late, /e/ in set, and /ɔ/ in lock was only slightly different from the duration of /eɪ/ in laid, /e/ in said, and /ɔ/ in log (table 10 above). An English hearer might find it difficult to distinguish between the word-final voiced and voiceless plosives in these word pairs. As a result, the English hearer might not be able to discern the differences in the pairs late – laid, set – said, and lock – log that Nurul produced.

In more than half of all tokens observed in the study, Adi was also able to make the duration of the vowel before word-final voiced plosives longer than the duration of the same vowel before word-final voiceless plosives. Adi's production of cap – cab (gap 2 milliseconds), heart – hard (gap 9 milliseconds), back – bag (gap 4 milliseconds), dock – dog (gap 1 millisecond), and lock – log (gap 7 milliseconds), however, shows that he did not make perceivable differences in vowel duration (table 10 above). The duration of vowels occurring before voiced bilabial, alveolar, and velar plosives in these pairs was only slightly different from the duration of the same vowels preceding voiceless bilabial, alveolar, and velar plosives. As a result, an English hearer might recognize the two words in each of the pairs that Adi produced (cap – cab, heart – hard, back – bag, dock – dog, and lock – log) as the same.

In more than half of all tokens observed in the study, Nana also made the duration of the vowel preceding a word-final voiced plosive longer than that of the same vowel before a word-final voiceless plosive. However, unlike Yudi, Doni, Nurul, or Adi, she produced fewer word pairs showing negligible differences in vowel duration. She produced only three word pairs with narrow differences in vowel duration: tap – tab (gap 6 milliseconds), tripe – tribe (gap 6 milliseconds), and back – bag (gap 2 milliseconds). She produced /e/ (her realization of /æ/) in tap, /aɪ/ in tripe, and /e/ (again a realization of /æ/) in back with only a slight difference in duration from /e/ in tab, /aɪ/ in tribe, and /e/ in bag.

The results shown in table 10 demonstrate that, in terms of the duration of the voiced sound preceding a word-final plosive, the Javanese informants were inconsistent. In some of the tokens the duration of the voiced sound occurring immediately before a final voiced plosive was longer or insignificantly longer than that of the same voiced sound preceding a voiceless plosive, but in other tokens it was the reverse.

This indicates that the way the informants accentuated the contrast between voiced and voiceless plosives in final position did not reflect what an English hearer might expect from the realization of the contrast between the two opposing plosive groups.

The failure to make the vowel preceding a word-final voiced plosive significantly longer than the same vowel before a word-final voiceless plosive, so that pre-fortis clipping becomes perceivable, reflects the inability of the informants in the study to realize the contrast between word-final voiced and voiceless plosives successfully. At the same time, the absence of the ability to accentuate the contrast between the two opposing groups of plosives in word-final position indicates that the contrast between the two groups in word-final position is also absent in their L1. Indeed, the contrast between the two opposing plosives in Javanese is neutralized.
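The inconsistency summarized above can be made concrete by counting, for each informant, how many of the 15 word pairs in Table 10 show a longer, shorter, or equal vowel before the voiced plosive. The snippet below is an illustrative sketch, not the analysis carried out in the study; the names `TABLE_10_GAPS` and `count_directions` are hypothetical, and the values are copied from Table 10 in row order (bilabial, alveolar, velar).

```python
# Gaps (ms) from Table 10, listed per informant in the row order of the table.
TABLE_10_GAPS = {
    "Mita":  [28, 56, 22, 50, 48, 49, 69, 22, 77, 73, 5, 36, 26, 58, 71],
    "Yudi":  [-17, -36, -1, -21, -7, -4, 63, 17, 0, 27, 16, 3, 7, -10, -19],
    "Doni":  [-4, -20, 4, 1, 13, -1, 8, 49, 45, 9, -105, -8, 2, 1, -5],
    "Nurul": [16, 49, 56, 46, 35, -21, 6, 8, 23, 13, -57, 45, 39, 36, 6],
    "Adi":   [10, -12, -11, 2, 29, 21, 22, 12, 9, 25, -24, -11, -4, -1, 7],
    "Nana":  [17, 19, -6, -17, -6, 45, 70, 68, -48, -13, -16, 42, -2, 17, 43],
}


def count_directions(gaps):
    """Return (longer, shorter, equal) counts for a list of gaps."""
    longer = sum(1 for gap in gaps if gap > 0)
    shorter = sum(1 for gap in gaps if gap < 0)
    equal = sum(1 for gap in gaps if gap == 0)
    return longer, shorter, equal


direction_counts = {
    name: count_directions(gaps) for name, gaps in TABLE_10_GAPS.items()
}

for name, (longer, shorter, equal) in direction_counts.items():
    print(f"{name}: longer={longer}, shorter={shorter}, equal={equal}")
```

Only the direction of each gap is counted here; whether a positive gap is large enough to be perceived is a separate question.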

Figure 12 below compares the gaps in duration of the vowel preceding word-final bilabials. Along the horizontal axis are the word pairs observed in the study, and along the vertical axis are the gaps between the duration of the vowel preceding final /b/ and the duration of the same vowel occurring before final /p/.

The six lines in the graph in figure 12 below represent the six informants in the study. The graph shows that the only point at which most lines lie close to each other is the pair tap – tab, in which /e/ occurs medially. Again, informants in the study tended to realize /æ/ coming immediately after the alveolar in tap and tab as /e/. The close proximity indicates that most informants tended to use a similar strategy in contrasting /e/ in tap and /e/ in tab. They produced /e/ in tap and in tab in such a way that the gaps between the duration of /e/ in tap and the duration of /e/ in tab were almost the same.

[Line graph. Horizontal axis: word pairs rip – rib, rope – robe, tap – tab, cap – cab, tripe – tribe. Vertical axis: gap in vowel duration, from -60.0 to 80.0. One line per informant: Mita, Yudi, Doni, Nurul, Adi, Nana.]

Figure 12. Gap in duration of vowel preceding word-final bilabials

This means that in the production of /e/ in tap and tab, most informants adopted a similar behavior. This finding concurs with the previous claim that the Javanese informants in the study tended to realize English /æ/ as /e/ and were inclined to produce English /æ/ and English /e/ in the manner in which they would produce Javanese /e/.

Informants automatically activate muscular movements already tuned for the production of Javanese sounds. They also tended to produce the word-initial voiceless alveolar occurring together with /e/ in tap and tab in the manner in which they would produce Javanese sounds. As a result, most VOT values of the voiceless alveolar in tap and tab, the quality of /e/, and the duration of /e/ became almost similar.

However, this was not the case with the pair cap – cab, in which /e/ also occurs in medial position. Informants did not produce the contrast in the duration of /e/ occurring immediately after the voiceless velar in a similar way. The difference between the duration of /e/ between /k/ and /p/ and the duration of /e/ between /k/ and /b/ varied widely among informants. This means that informants did not adopt a similar strategy to articulate /e/ occurring immediately after the voiceless velar in cap and cab. In terms of differences in the duration of /e/ and how the sounds surrounding /e/ were produced, informants in the study tended to produce /tæ/- in tap and tab in an almost similar fashion. As argued previously, informants tended to produce /tæ/- in tap and tab as Javanese /te/-. This was also the case with /pæ/- in pack (see figure 6) and /te/- in ten (see figure 7). Informants tended to realize English /pæ/- as Javanese /pe/- and English /te/- as Javanese /te/-, but the realization of English /kæ/- in cap as Javanese /ke/- was not convincing (see figure 8). As shown by figure 10, informants in the study did not produce /kæ/- in cap and cab in an almost similar way. Thus, figure 6, figure 7, and figure 10 indicate that the realization of English /æ/ and English /e/, together with the voiceless bilabial and voiceless alveolar immediately preceding them in /pæ/-, /tæ/-, and /te/-, as Javanese /pe/- and Javanese /te/- is more prevalent than the realization of English /kæ/- as Javanese /ke/-.

4. The plausibility of plosives produced by Javanese informants to be recognized as English

In addition to the production of plosives, there were issues concerning the realization of vowel contrasts, stress pattern, and the production of trilled /r/. The Javanese informants were inclined to realize English /æ/ as /e/ in back and bag. They also tended to produce /e/ with formant values typical of Javanese. Because a given vowel has typical formant values in one language and a similar vowel in another language may have different typical values, a vowel produced with formant values different from the typical values in English may be perceived as ‘foreign-accented’ (Davenport and Hannahs 2010, p. 71). The ability to produce formant values that are close to the typical values an English hearer expects helps the raters to recognize a vowel as a plausible English vowel. Conversely, the production of vowels with formant values typical of Javanese may lead a hearer to perceive the vowel as Javanese-accented. The informants also tended to realize /ɹ/ as a trill in broke and rogue. These two words, although they were not studied, were spoken by informants and presented to raters along with the rest of the words observed in the study. Altogether, these issues with vowel contrasts, stress pattern, and trilled /r/ may affect the level of perceived intelligibility.

The study limited the notion of intelligibility to its technical aspect. Following Kachru and Nelson (2011, p.67), the notion of intelligibility covers the features of phonetics and phonology needed to make utterances recognizable or plausible. Thus, Kristen and Mum, the raters in the study, provided their perceptions of how plausible the plosive-carrying words produced by the Javanese informants were as English words.

Both Kristen and Mum, in their email, admitted that word-final plosive is the hardest for the informants to produce and the hardest for the raters themselves to recognize. This is not surprising since opposition between two sets of plosives in word-final position is neutralized in Javanese.

Kristen and Mum also provided what they perceived as the degree of intelligibility in general. They compared the informants to one another, and the result is shown in figure 13 below. The two lines in the graph represent the two speakers of English as the first language in the study. The yellow line represents Kristen’s perception of the degree of intelligibility of each informant, and the orange line indicates Mum’s perception of the degree of intelligibility of each informant in the study.

Figure 13 shows that Kristen and Mum are in agreement regarding their perception of the degree of intelligibility of Yudi’s production of English sounds. Both raters considered that most sounds produced by Yudi were not plausible English sounds. Yudi made the VOT values of his word-initial voiced plosives longer than those of his word-initial voiceless plosives in most tokens he produced. He was also inclined to produce unaspirated word-initial voiceless plosives.

In addition, Yudi’s pronunciation may carry different types of errors. Some of the vowels Yudi articulated were produced with a quality closer to that of Javanese vowels. He also tended to realize /ɹ/ as a trill, and he placed equal stress on both syllables in two-syllable words. Yudi’s multiple errors agree with the claim by Collins and Mees (2013, p. 215) that, in most cases, errors in pronunciation cannot be attributed to a single source. Collins and Mees add that, apart from problems with the contrast between voiced and voiceless plosives, there can also be problems with vowel contrasts and stress. This means that Yudi’s problems with the production of English vowels and stress pattern have contributed to the degree of unintelligibility of his production of English in general.

By contrast, Nana’s long exposure to English and years of practice with the language could have helped her in the realization of vowel contrasts and stress. Attention to vowel contrasts, stress pattern, and the realization of /ɹ/ augmented Nana’s perceived level of intelligibility. In fact, both Kristen and Mum had the same opinion regarding the production of English words by Nana, the only English-teaching informant in the study. Nana’s past experience with English proved to benefit her. She produced the utterances that were most recognizable as plausible English sounds to Kristen and Mum.

Figure 13. Empirical perception of degree of intelligibility by Kristen and Mum

Figure 13 shows Kristen and Mum’s differing opinions regarding Mita. Although Mita made the VOT values of her word-initial voiceless plosives longer than those of her word-initial voiced plosives in all tokens she produced, she was unable to reach Nana’s degree of intelligibility in Mum’s perception. Kristen, however, spent two years teaching English in Yogyakarta and has become more familiar with the ways Javanese people produce English sounds. Her intelligibility threshold has lowered, and she has more access to the intelligibility of Mita’s utterances. As a result, Kristen found Mita as intelligible as Nana.

In the study, Kristen, one of the intelligibility raters, gave her perception of the English words each informant produced. She rated the production of plosives according to the place of articulation and the environment in which the plosive occurred. Figure 14 below shows the mean values of perceived intelligibility of the English plosives produced by the Javanese informants.

Different plosives in different environments pose different challenges in terms of intelligibility. Kristen found that the word-final plosives the Javanese informants produced were more challenging to recognize than the word-initial plosives. Among plosives occurring in initial position, Kristen indicated that word-initial bilabials were the hardest to recognize. Word-initial alveolars produced by informants, on the other hand, were completely intelligible to her and the easiest to recognize. As figure 14 below displays, Kristen found word-final velars the hardest to recognize among the plosives in that environment.

Figure 14. Mean Values of Intelligibility as Perceived by Kristen

What is shown in figure 14 agrees with the characteristics of English plosives produced by the Javanese informants stated previously. English plosives produced by the Javanese informants were characterized by, among other things, problems with the aspiration of voiceless plosives. It has been shown previously that English voiceless bilabials in initial position were mostly unaspirated in the study. Without perceivable aspiration, an English hearer may find it difficult to recognize bilabials in word-initial position.

It has also been shown that most English voiceless velars in initial position were aspirated in the study, and so were most English voiced velars. This may cause problems in identifying which velar is in initial position. Word-initial voiceless alveolars, however, were produced with perceivable aspiration in more than half of the cases, while voiced alveolars were hardly aspirated. This may help an English hearer to correctly identify word-initial alveolars.

English plosives produced by Javanese informants were also characterized by a highly inconsistent duration of the voiced sound preceding a plosive in final position. Such inconsistency may result in difficulties in recognizing word-final plosives.

Indeed, figure 14 supports the claim regarding the characteristics of English plosives produced by the Javanese informants that the study suggests. Kristen stated that utterances with word-initial bilabials produced by informants were mostly intelligible while some others were not. Kristen also indicated that word-initial velars were barely regarded as completely intelligible. She, however, agreed that word-initial alveolars produced by informants were completely intelligible.

As a result of highly inconsistent duration of voiced sound preceding plosive in final position, Kristen claimed that word-final English plosives produced by informants were mostly intelligible while certain utterances were not.


CHAPTER V

CONCLUSIONS AND RECOMMENDATIONS

A. Conclusions

The characteristics of English plosives produced by Javanese informants reflect transfer from Javanese phonological rules. The fact that Javanese phonological rules dictate a shorter VOT for initially-positioned tense plosives and a longer VOT for lax plosives in the same position has put some speakers of Javanese as the first language at a disadvantage. They tend to produce their initially-positioned English voiceless plosives with a shorter VOT, which results in unperceivable aspiration following the English voiceless plosive. Informants who can manage not only to produce a word-initial voiceless plosive with a longer VOT value than its voiced counterpart but also to stretch the VOT of the word-initial voiceless plosive to more than 20 milliseconds, so as to impart the effect of aspiration, may produce more plausible English words.
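The two conditions named above (a voiceless VOT longer than the voiced counterpart, and a voiceless VOT of more than 20 milliseconds) can be written as a small check. This is a sketch under stated assumptions, not the labelling procedure of the study: the function name `describe_voiceless_plosive` and the constant `ASPIRATION_VOT_MS` are hypothetical, and the sample VOT values are those listed for pit and bit in Appendix 1.

```python
# Threshold taken from the text: VOT of more than 20 ms imparts aspiration.
ASPIRATION_VOT_MS = 20.0


def describe_voiceless_plosive(voiceless_vot_ms, voiced_vot_ms):
    """Check the two conditions for a more plausible word-initial voiceless plosive."""
    return {
        "longer": voiceless_vot_ms > voiced_vot_ms,
        "aspirated": voiceless_vot_ms > ASPIRATION_VOT_MS,
    }


# VOT values (ms) for pit and bit, copied from Appendix 1: (pit, bit).
pit_bit_vot = {
    "Mita": (39.0, 11.0),
    "Yudi": (12.0, 16.0),
    "Doni": (10.0, 21.0),
    "Nurul": (16.0, 23.0),
    "Adi": (10.0, 13.0),
    "Nana": (0.0, -24.0),
}

notes = {
    name: describe_voiceless_plosive(pit_ms, bit_ms)
    for name, (pit_ms, bit_ms) in pit_bit_vot.items()
}

for name, note in notes.items():
    print(name, note)
```

A negative VOT for the voiced plosive, as in Nana's bit, represents prevoicing, so her /p/ counts as longer even though it falls short of the aspiration threshold.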

Female informants irregularly produced prevoiced bilabials and alveolars in the study. This relates to the larynx lowering that all informants performed in the study. However, female informants were found to be more consistent in lowering their larynx than male informants. Female informants lowered their larynx to extremes that male informants could not reach, resulting in more space within the oral cavity, which is conducive to prevoicing.

Larynx lowering is the most prevalent influence of Javanese phonological rules. All informants lowered their larynx in most tokens. The realization of both English /æ/ and English /e/ as Javanese /e/ and the production of English /pæ/- and /te/- as Javanese /pe/- and /te/- are also prevalent in the study.

While raters expected the vowel occurring immediately before a voiced plosive to be full in length and to become shorter when preceding a voiceless plosive as an effect of pre-fortis clipping, they found the words produced by informants carrying a plosive in final position hard to recognize. This is a result of the neutralization of the opposition of word-final plosives in Javanese. In some of the tokens informants were unable to accentuate the difference between word-final voiced and voiceless plosives.

The study shows that problems with perceived intelligibility do not stem from a single source. There were other factors contributing to the level of plausibility of words to be perceived as English. While certain informants produced utterances with a higher degree of intelligibility, others displayed various extents of transfer from Javanese morphology, phonemes, and phonological rules. An English hearer may find such transfer an obstacle hampering access to greater intelligibility.

Devotion of an equal length of time to each syllable regardless of stress reflects transfer from Javanese morphology. English and Javanese have different stress patterns. English is a stress-timed language, which means that only certain syllables in an English word are louder, somewhat higher in pitch, and rather longer in duration (Fromkin, Rodman and Hyams 2011, p. 212). All syllables in Javanese, on the other hand, have roughly the same stress. They have more or less equal loudness, length, and pitch. The difference in duration between stressed and unstressed syllables in Javanese is relatively small (Heuven and Zanten 1994, pp. 210-211). A speaker of Javanese as the first language, because of the influence of Javanese phonological rules, may fail to put stress where an English native speaker would. For the same reason, such a speaker may apply equal stress to all syllables in an English word. As a result, the produced English word may become Javanese-accented (Fromkin, Rodman and Hyams 2011, p. 213).

Realization of English /ɹ/ as trill /r/ is an example of transfer from Javanese phonemes. Production of vowels with quality that differs from what an English hearer expects may also be attributable to influence of Javanese vowel sounds.

Similarly, failure to contrast /æ/ from /e/ may stem from the fact that /æ/ does not exist in Javanese. Lastly, a longer VOT given to voiced plosives in word-initial position is an example of transfer from Javanese phonological rules.

In addition, how the informants used their passive and active articulators in making plosives may also contribute to the degree of intelligibility. The counterpart of the English voiceless alveolar, for example, is not alveolar in Javanese. In fact, Javanese has two distinct sounds that might seem similar to English /t/, namely /t̪/ and /ʈ/. These two sounds do not involve the alveolar ridge. Instead, they are produced by bringing the tip of the tongue into contact with the upper teeth for /t̪/ and with the hard palate for /ʈ/.

The severity of intelligibility problems may also reflect a lack of practice and exposure. The case of Nana suggests that ample practice and exposure help improve the plausibility of utterances to be perceived as English.

B. Recommendations

Further research is needed into the opposition between vowels immediately following tense and lax plosives. Such research is necessary to reveal the peculiarity of the formant frequencies of Javanese vowels such as /u/ when they immediately follow the opposing tense and lax plosives. More research is also necessary to understand the different strategies that Javanese speakers take to accentuate the contrast between word-final voiced and voiceless plosives. Such studies may be useful for practical use in a language class.

Future researchers are also advised about the use of together with vowels in their instruments, as these phonemes can be difficult to separate clearly from the vowel because they tend to blend with the vowel visually in the spectrogram. Therefore, monophthongs are preferable unless it is necessary to do otherwise.


REFERENCES

Adisasmito-Smith, Niken. Acoustic characteristics of the Javanese breathy/clear phonation contrast. Paper presented at the Austronesian Formal Linguistics Association VI, U Toronto. 18 April 1999.

Akmajian, A., R.A. Demers, A.K. Farmer, & R.M. Harnish. Linguistics: an introduction to language and communication. 5th ed. Cambridge: The MIT Press. 2001.

Ashby, Michael, and John Maidment. Introducing Phonetic Science. Cambridge: Cambridge UP. 2005.

Ashby, Patricia. Understanding Phonetics. London: Hodder Education. 2011.

Baugh, Albert C., and Thomas Cable. A History of the English Language. 5th ed. London: Routledge. 2002.

Behrens, Susan J., and Rebecca L. Sperling. "Language Variation: Students and Teachers Reflect on Accents and Dialects" in Susan J. Behrens & Judith A. Parker (eds.). Language in the real world, 11-26. London: Routledge. 2010.

Bickford, Anita C., and Rick Floyd. Articulatory Phonetics: Tools for analyzing the world's languages. 4th ed. Dallas: SIL International. 2006.

Brunelle, Marc. “The role of larynx height in the Javanese tense ~ lax stop contrast”. In Raphael Mercado, Eric Potsdam, and Lisa Demena Travis (eds.) Austronesian and Theoretical Linguistics, 7-24. Amsterdam: John Benjamins. 2010.

Bussmann, Hadumod. Routledge Dictionary of Language and Linguistics. Trans. Gregory Trauth & Kerstin Kazzazi. New York: Routledge, 2006.

Clark, Ross. "Austronesian Languages". In Bernard Comrie (ed.) The World’s Major Languages. 2nd ed., 819-832. London: Routledge. 2009.

Chen, Yudong. A Comparison of Spanish Produced by Chinese L2 Learners and Native Speakers-an Acoustic Phonetics Approach. Diss. University of Illinois at Urbana-Champaign, 2007. Web.

Crystal, David. A Dictionary of Linguistics and Phonetics. 6th ed. Malden: Blackwell Publishing. 2008.

Collins, Beverley, and Inger M. Mees. Practical Phonetics and Phonology. 3rd ed. London: Routledge. 2013.

Davenport, Mike, and S. J. Hannahs. Introducing Phonetics & Phonology. 3rd ed. London: Hodder Education. 2010.


Escudero, Paola. Linguistic Perception and Second Language Acquisition: Explaining the attainment of optimal phonological categorization. Utrecht: LOT. 2005.

Finegan, Edward. "English". In Bernard Comrie (ed.) The World’s Major Languages. 2nd ed. 59-85. London: Routledge. 2009.

Fischer, Steven R. A history of language. London: Reaktion Books Ltd. 2001.

Fromkin, Victoria, Robert Rodman, and Nina Hyams. Introduction to Language. 9th ed. Boston: Wadsworth, Cengage Learning. 2011.

Gelderen, Elly van. A history of the English language. Amsterdam: John Benjamins Publishing Company. 2006.

Giegerich, Heinz J. English Phonology: an introduction. Cambridge: Cambridge UP. 1992.

Gussenhoven, Carlos. The Phonology of Tone and Intonation. Cambridge: Cambridge UP. 2004.

Gussenhoven, Carlos, and Haike Jacobs. Understanding Phonology. 3rd ed. Bernard Comrie and Greville Corbett (eds.). London: Hodder Education. 2011.

Honda, K., Hirai, H., Masaki, S., & Shimada, Y. Role of vertical larynx movement and cervical lordosis in F0 control. Language and Speech, 42, 401-11. 1999. Retrieved from http://search.proquest.com/docview/213733447?accountid=25704. Web. 15 Feb. 2016.

Horne, Elinore C. Beginning Javanese. New Haven: Yale UP. 1961.

Jackson, Howard. Grammar and Vocabulary: A Resource Book for Students. London: Routledge. 2002.

Hunnicutt, Leigh, and Paul A. Morris. "Prevoicing and Aspiration in Southern American English." University of Pennsylvania Working Papers in Linguistics Volume 22 Issue 1 Proceedings of the 39th Annual Penn Linguistics Conference (1-1-2016): 213-224.

Jenkins, Jennifer. The Phonology of English as an International Language. Oxford: Oxford UP, 2000.

Johnson, Keith. Acoustic and auditory phonetics. Oxford: Blackwell Publishing Ltd. 2003.

Kachru, Yamuna, and Cecil L. Nelson. World Englishes in Asian Contexts. Hong Kong: Hong Kong UP. 2006.

Kim, Mi-Ryoung, and San Duanmu. "“Tense” and “Lax” stops in Korean". Journal of East Asian Linguistics 13 (2004): 59–104.

Kong, Eun Jong. The Development of Phonation-type Contrasts in Plosives: Cross-linguistic Perspectives. Diss. The Ohio State University, 2009.

Ladefoged, Peter. A Course in Phonetics. 4th ed. Boston: Heinle & Heinle. 2001.

Ladefoged, Peter. Elements of Acoustic Phonetics. Chicago: The University of Chicago Press. 1996.

Ladefoged, Peter. Vowels and Consonants: An Introduction to the Sounds of Languages. Massachusetts: Blackwell Publishers Inc. 2001.

Ladefoged, Peter, and Ian Maddieson. The Sounds of the World's Languages. Oxford: Blackwell Publishers Ltd. 1996.

Ladefoged, Peter, and Keith Johnson. A Course in Phonetics. 6th ed. Boston: Wadsworth, Cengage Learning. 2011.

Mangum, Wyatt A. "Animal Communication: The “Language” of Honey Bees" in Behrens, Susan J., and Judith A. Parker (eds.). Language in the real world 255-273. London: Routledge. 2010.

McCully, Chris. The Sound Structure of English: An Introduction. Cambridge: Cambridge UP. 2009.

McMahon, April. An Introduction to English Phonology. Edinburgh: Edinburgh UP. 2002.

McMahon, Michael K. C. "English Phonetics" in Bas Aarts & April McMahon (eds.). The handbook of English linguistics 359-381. Malden: Blackwell Publishing Ltd. 2006.

Maddieson, Ian. Patterns of sounds. Cambridge: Cambridge UP. 1984.

Mullany, Louise, and Peter Stockwell. Introducing English Language: a resource book for students. New York: Routledge. 2010.

Nelson, Cecil L. Intelligibility in World Englishes: Theory and Application. New York: Routledge. 2011.

Nothofer, B. "Javanese". In Brown, Keith and Sarah Ogilvie (eds.). Concise Encyclopedia of Languages of the World. Oxford: Elsevier Ltd. 2009.

Oakes, Michael P. "Javanese". In Bernard Comrie (ed.) The World’s Major Languages. 2nd ed. 819-832. London: Routledge. 2009.

O'Connor, J. D. Better English Pronunciation. Cambridge: Cambridge UP. 1980.

Odden, David. Introducing Phonology. Cambridge: Cambridge UP. 2005.

Pennington, Mark. The Phonetics and Phonology of Glottal Manner Features. Diss. Indiana University, 2006. Web. "The Phonetics and Phonology of Glottal Manner Features."

Poedjosoedarmo, Soepomo. Javanese Influence on Indonesian. Canberra: Pacific Linguistics D 38. 1982.

Poedjosoedarmo, Soepomo. Course lecture. Sanata Dharma University. Yogyakarta, circa November 2010.

Radford, A., M. Atkinson, D. Britain, H. Clahsen, and A. Spencer. Linguistics: An Introduction. 2nd ed. Cambridge: Cambridge UP. 2009.

Seargeant, Phillip. Exploring World Englishes: Language in a Global Context. Oxon: Routledge. 2012.

Schoterman, J. An introduction to old Javanese Sanskrit dictionaries and grammars. In: Bijdragen tot de Taal-, Land- en Volkenkunde 137 (1981), no: 4, Leiden, 419-442 downloaded from http://www.kitlv-journals.nl. Web. 22 Feb. 2016.

Siemund, Peter, Julia Davydova, and Georg Maier. The Amazing World of Englishes: A Practical Introduction. Berlin: De Gruyter Mouton. 2012.

Simon, Ellen. Voicing in Contrast: Acquiring a Second Language Laryngeal System. Ghent: Academia Press. 2010.

Thurgood, Ela. "Phonation types in Javanese". Oceanic Linguistics 43 (2004): 279-297.

Vaissiere, Jacqueline. Phonological use of the larynx: a tutorial. Larynx 97, 1994, Marseille, France. pp.115-126, 1997.

Wedhawati, W.E.S. Nurlina, E. Setiyanto, R. Sukesti, Marsono, and I.P. Baryadi. Tata Bahasa Jawa Mutakhir. Yogyakarta: Penerbit Kanisius. 2010.

APPENDICES

Appendix 1. Group 1: Initially-positioned /p/ and /b/

Informants  peak                          beak                          Note on /p/
            VOT (ms)   F1 (Hz)   F2 (Hz)  VOT (ms)   F1 (Hz)   F2 (Hz)  VOT Length  Aspiration
Mita            11.0     455.0    2392.0       8.0     399.5    2361.3  Longer
Yudi             9.0     393.4    2273.3      14.0     329.9    2025.4
Doni            11.0     387.8    2119.9      16.0     345.9    2157.4
Nurul           13.0     494.6    2975.2      24.0     368.9    2719.1
Adi             12.0     400.9    2136.8      13.0     387.5    2045.7
Nana             6.0     757.9    2119.7       8.0     454.8    2155.8

Informants  pit                           bit                           Note on /p/
            VOT (ms)   F1 (Hz)   F2 (Hz)  VOT (ms)   F1 (Hz)   F2 (Hz)  VOT Length  Aspiration
Mita            39.0     432.6    2809.8      11.0     384.7    2329.5  Longer      Aspirated
Yudi            12.0     390.1    2391.9      16.0     309.3    2141.9
Doni            10.0     351.1    2257.6      21.0     356.2    2122.1
Nurul           16.0     493.1    2392.6      23.0     458.6    2409.9
Adi             10.0     427.3    2058.1      13.0     413.2    1953.8
Nana             0.0     379.4    2182.6     -24.0     326.7    2288.2  Longer

Informants  pack                          back                          Note on /p/
            VOT (ms)   F1 (Hz)   F2 (Hz)  VOT (ms)   F1 (Hz)   F2 (Hz)  VOT Length  Aspiration
Mita            11.0     886.6    1767.9     -44.0     494.9    1771.0  Longer
Yudi            14.0     726.3    1598.4      18.0     519.5    1859.7
Doni            12.0     637.2    1655.7      12.0     500.5    1856.0
Nurul           10.0     749.3    1734.8      14.0     640.8    1923.7
Adi             11.0     736.3    1544.6      14.0     538.3    1617.4
Nana            10.0     840.1    1688.6       0.0     506.4    1809.1  Longer

Informants  park                          bark                          Note on /p/
            VOT (ms)   F1 (Hz)   F2 (Hz)  VOT (ms)   F1 (Hz)   F2 (Hz)  VOT Length  Aspiration
Mita            41.0     722.1    1008.1     -63.0     527.1    1060.1  Longer      Aspirated
Yudi            19.0     850.5    1236.1      19.0     974.8    1946.2
Doni            15.0     710.0    1159.5      18.0     555.1    1130.4
Nurul           30.0     694.2    1105.9     -56.0     461.5     999.8  Longer      Aspirated
Adi             16.0     764.0    1235.8      19.0     707.5    1194.7
Nana             7.0     775.0    1024.9       6.0     563.7    1170.7  Longer

| Informant | pull VOT (ms) | pull F1 (Hz) | pull F2 (Hz) | bull VOT (ms) | bull F1 (Hz) | bull F2 (Hz) | Note on /p/: VOT length | Note on /p/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 15.0 | 575.1 | 957.6 | -87.0 | 507.9 | 1142.3 | Longer | |
| Yudi | 5.0 | 427.7 | 713.3 | 22.0 | 569.2 | 1854.1 | | |
| Doni | 25.0 | 463.7 | 1247.5 | 22.0 | 444.7 | 1039.6 | Longer | Aspirated |
| Nurul | 25.0 | 551.4 | 993.0 | 40.0 | 455.1 | 950.4 | | Aspirated |
| Adi | 13.0 | 571.5 | 2648.2 | 32.0 | 591.7 | 2287.5 | | |
| Nana | 28.0 | 418.7 | 1066.5 | 31.0 | 411.9 | 993.7 | | Aspirated |

| Informant | port VOT (ms) | port F1 (Hz) | port F2 (Hz) | bought VOT (ms) | bought F1 (Hz) | bought F2 (Hz) | Note on /p/: VOT length | Note on /p/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 66.0 | 602.2 | 901.9 | 13.0 | 586.2 | 1012.9 | Longer | Aspirated |
| Yudi | 13.0 | 648.9 | 1032.6 | 11.0 | 559.7 | 1159.5 | Longer | |
| Doni | 13.0 | 643.0 | 911.5 | 13.0 | 540.0 | 1228.1 | | |
| Nurul | 13.0 | 489.7 | 870.7 | -14.0 | 615.7 | 1240.4 | Longer | |
| Adi | 17.0 | 586.5 | 725.0 | 17.0 | 646.1 | 975.7 | | |
| Nana | 22.0 | 562.0 | 751.6 | 9.0 | 432.7 | 985.2 | Longer | Aspirated |
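As a reading aid for the "VOT length" notes in Appendix 1, the following minimal Python sketch compares the /p/ and /b/ VOT values for the pair peak / beak. It assumes that "Longer" marks informants whose /p/ VOT exceeds their /b/ VOT; that reading is inferred from the table values and is not stated in the appendix. The names `vot_ms` and `has_longer_voiceless_vot` are hypothetical and used only for illustration.

```python
# VOT values in ms for the pair peak / beak, copied from Appendix 1.
# Each tuple is (VOT of /p/ in "peak", VOT of /b/ in "beak").
vot_ms = {
    "Mita": (11.0, 8.0),
    "Yudi": (9.0, 14.0),
    "Doni": (11.0, 16.0),
    "Nurul": (13.0, 24.0),
    "Adi": (12.0, 13.0),
    "Nana": (6.0, 8.0),
}


def has_longer_voiceless_vot(voiceless_ms: float, voiced_ms: float) -> bool:
    """Assumed reading of the 'Longer' note: /p/ VOT strictly exceeds /b/ VOT."""
    return voiceless_ms > voiced_ms


# Map each informant to whether the assumed 'Longer' condition holds.
longer = {
    name: has_longer_voiceless_vot(p_vot, b_vot)
    for name, (p_vot, b_vot) in vot_ms.items()
}

for name, flag in longer.items():
    print(f"{name}: {'Longer' if flag else '-'}")
```

For this pair the condition holds only for Mita, which agrees with the single "Longer" note recorded for peak / beak.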

Appendix 2. Group 3: Finally-positioned /p/ and /b/

| Informant | rip: preceding voiced sound (ms) | rip: F1 at plosive closure (Hz) | rip: F2 at plosive closure (Hz) | rip: intensity at plosive closure (dB) | rib: preceding voiced sound (ms) | rib: F1 at plosive closure (Hz) | rib: F2 at plosive closure (Hz) | rib: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 91.0 | 394.8 | 1857.4 | 76.4 | 119.0 | 411.8 | 1814.8 | 78.5 |
| Yudi | 77.0 | 415.1 | 1689.9 | 78.9 | 60.0 | 427.1 | 2155.6 | 79.3 |
| Doni | 67.0 | 389.5 | 1871.7 | 73.8 | 63.0 | 403.0 | 1671.2 | 81.2 |
| Nurul | 97.0 | 479.4 | 1988.5 | 76.3 | 113.0 | 463.3 | 1556.8 | 82.1 |
| Adi | 93.0 | 405.2 | 2077.2 | 76.0 | 103.0 | 349.2 | 1898.3 | 79.1 |
| Nana | 63.0 | 422.8 | 2011.0 | 80.7 | 80.0 | 385.7 | 1856.5 | 78.4 |

| Informant | rope: preceding voiced sound (ms) | rope: F1 at plosive closure (Hz) | rope: F2 at plosive closure (Hz) | rope: intensity at plosive closure (dB) | robe: preceding voiced sound (ms) | robe: F1 at plosive closure (Hz) | robe: F2 at plosive closure (Hz) | robe: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 81.0 | 465.4 | 1024.3 | 79.5 | 137.0 | 435.2 | 1072.0 | 79.0 |
| Yudi | 120.0 | 463.3 | 786.5 | 80.7 | 84.0 | 671.7 | 679.3 | 77.0 |
| Doni | 125.0 | 644.4 | 2242.1 | 67.2 | 105.0 | 686.2 | 1137.4 | 69.3 |
| Nurul | 81.0 | 495.7 | 1033.8 | 76.8 | 130.0 | 539.0 | 1099.0 | 78.4 |
| Adi | 108.0 | 518.8 | 628.1 | 76.6 | 96.0 | 381.2 | 891.1 | 80.1 |
| Nana | 105.0 | 561.7 | 1222.0 | 79.6 | 124.0 | 493.6 | 975.0 | 77.6 |

| Informant | tap: preceding voiced sound (ms) | tap: F1 at plosive closure (Hz) | tap: F2 at plosive closure (Hz) | tap: intensity at plosive closure (dB) | tab: preceding voiced sound (ms) | tab: F1 at plosive closure (Hz) | tab: F2 at plosive closure (Hz) | tab: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 130.0 | 576.2 | 1716.7 | 74.0 | 152.0 | 564.7 | 1733.7 | 78.0 |
| Yudi | 98.0 | 601.3 | 1735.7 | 75.8 | 97.0 | 672.2 | 1785.3 | 74.4 |
| Doni | 99.0 | 593.2 | 1384.1 | 76.2 | 103.0 | 528.1 | 1381.2 | 73.2 |
| Nurul | 102.0 | 737.2 | 1840.6 | 73.2 | 158.0 | 581.3 | 1724.8 | 77.6 |
| Adi | 100.0 | 616.8 | 1675.6 | 72.7 | 89.0 | 502.8 | 1568.7 | 74.8 |
| Nana | 116.0 | 753.3 | 1506.5 | 76.0 | 110.0 | 750.8 | 1545.2 | 71.6 |

| Informant | cap: preceding voiced sound (ms) | cap: F1 at plosive closure (Hz) | cap: F2 at plosive closure (Hz) | cap: intensity at plosive closure (dB) | cab: preceding voiced sound (ms) | cab: F1 at plosive closure (Hz) | cab: F2 at plosive closure (Hz) | cab: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 121.0 | 747.1 | 1686.7 | 74.6 | 171.0 | 577.5 | 1634.2 | 75.9 |
| Yudi | 111.0 | 612.1 | 1675.9 | 75.6 | 90.0 | 683.7 | 1731.0 | 75.9 |
| Doni | 108.0 | 597.0 | 1311.4 | 73.1 | 109.0 | 539.4 | 1385.8 | 74.4 |
| Nurul | 110.0 | 854.9 | 1769.8 | 76.9 | 156.0 | 536.9 | 1715.5 | 77.5 |
| Adi | 91.0 | 542.1 | 1566.5 | 73.4 | 93.0 | 556.9 | 1682.3 | 73.6 |
| Nana | 135.0 | 781.4 | 1473.5 | 75.8 | 118.0 | 803.7 | 1573.3 | 72.8 |

| Informant | tripe: preceding voiced sound (ms) | tripe: F1 at plosive closure (Hz) | tripe: F2 at plosive closure (Hz) | tripe: intensity at plosive closure (dB) | tribe: preceding voiced sound (ms) | tribe: F1 at plosive closure (Hz) | tribe: F2 at plosive closure (Hz) | tribe: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 146.0 | 443.0 | 1820.0 | 75.9 | 194.0 | 374.4 | 1885.3 | 74.2 |
| Yudi | 122.0 | 475.5 | 1835.6 | 70.4 | 115.0 | 578.1 | 1658.1 | 77.9 |
| Doni | 151.0 | 514.1 | 1588.2 | 72.0 | 164.0 | 442.6 | 1844.0 | 67.7 |
| Nurul | 162.0 | 609.5 | 1879.1 | 72.9 | 197.0 | 423.8 | 1505.6 | 77.0 |
| Adi | 151.0 | 543.0 | 1792.8 | 69.9 | 180.0 | 444.2 | 1605.0 | 72.2 |
| Nana | 163.0 | 316.4 | 1536.6 | 76.7 | 157.0 | 436.3 | 1563.0 | 68.4 |

Appendix 3. Group 4: Initially-positioned /t/ and /d/

| Informant | two VOT (ms) | two F1 (Hz) | two F2 (Hz) | do VOT (ms) | do F1 (Hz) | do F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 83.0 | 447.5 | 1874.7 | 17.0 | 391.8 | 2088.6 | Longer VOT | Aspirated |
| Yudi | 14.0 | 443.5 | 1328.5 | 17.0 | 428.6 | 1638.4 | | |
| Doni | 63.0 | 350.4 | 1571.0 | 34.0 | 423.2 | 1945.0 | Longer VOT | Aspirated |
| Nurul | 85.0 | 369.0 | 2519.6 | 25.0 | 371.2 | 1929.0 | Longer VOT | Aspirated |
| Adi | 10.0 | 466.2 | 1569.0 | 18.0 | 318.9 | 1705.8 | | |
| Nana | 12.0 | 388.0 | 1399.6 | 17.0 | 302.8 | 2042.4 | | |

| Informant | ten VOT (ms) | ten F1 (Hz) | ten F2 (Hz) | den VOT (ms) | den F1 (Hz) | den F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 21.0 | 615.0 | 1938.0 | 14.0 | 526.2 | 2114.8 | Longer VOT | Aspirated |
| Yudi | 38.0 | 718.6 | 1775.5 | 13.0 | 478.5 | 1780.7 | Longer VOT | Aspirated |
| Doni | 11.0 | 597.9 | 1668.3 | 23.0 | 424.9 | 1989.6 | | |
| Nurul | 15.0 | 800.4 | 2258.4 | 13.0 | 497.6 | 2265.6 | Longer VOT | |
| Adi | 12.0 | 643.5 | 1697.8 | 18.0 | 513.5 | 1805.4 | | |
| Nana | 16.0 | 735.9 | 1824.3 | 9.0 | 442.6 | 2369.1 | Longer VOT | |

| Informant | tone VOT (ms) | tone F1 (Hz) | tone F2 (Hz) | done VOT (ms) | done F1 (Hz) | done F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 52.0 | 839.2 | 1160.5 | 12.0 | 552.4 | 1803.4 | Longer VOT | Aspirated |
| Yudi | 19.0 | 664.1 | 1273.7 | 21.0 | 575.4 | 1552.0 | | |
| Doni | 14.0 | 589.4 | 1255.2 | 18.0 | 490.9 | 1702.3 | | |
| Nurul | 65.0 | 819.6 | 1486.3 | 9.0 | 539.0 | 1950.1 | Longer VOT | Aspirated |
| Adi | 11.0 | 748.0 | 1188.2 | 14.0 | 550.2 | 1513.7 | | |
| Nana | 4.0 | 683.7 | 1165.7 | 7.0 | 634.2 | 1532.0 | | |

| Informant | tune VOT (ms) | tune F1 (Hz) | tune F2 (Hz) | dune VOT (ms) | dune F1 (Hz) | dune F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 69.0 | 432.1 | 2113.8 | 23.0 | 390.2 | 2330.4 | Longer VOT | Aspirated |
| Yudi | 19.0 | 432.5 | 1057.0 | 23.0 | 415.2 | 1344.2 | | |
| Doni | 87.0 | 352.6 | 1468.0 | 16.0 | 377.9 | 1748.4 | Longer VOT | Aspirated |
| Nurul | 93.0 | 406.2 | 2304.1 | 23.0 | 394.6 | 2340.5 | Longer VOT | Aspirated |
| Adi | 88.0 | 415.7 | 1874.7 | 18.0 | 438.8 | 1873.9 | Longer VOT | Aspirated |
| Nana | 86.0 | 308.5 | 1906.5 | 19.0 | 371.8 | 1907.9 | Longer VOT | Aspirated |

| Informant | torn VOT (ms) | torn F1 (Hz) | torn F2 (Hz) | dawn VOT (ms) | dawn F1 (Hz) | dawn F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 39.0 | 562.1 | 1232.8 | 12.0 | 545.8 | 1513.7 | Longer VOT | Aspirated |
| Yudi | 9.0 | 656.7 | 1202.9 | 13.0 | 646.4 | 1526.5 | | |
| Doni | 61.0 | 656.4 | 1133.8 | 17.0 | 504.7 | 1752.5 | Longer VOT | Aspirated |
| Nurul | 56.0 | 815.7 | 1225.2 | -83.0 | 493.2 | 2341.4 | Longer VOT | Aspirated |
| Adi | 22.0 | 734.9 | 1155.1 | 15.0 | 621.3 | 1550.3 | Longer VOT | Aspirated |
| Nana | 14.0 | 614.1 | 1097.9 | 12.0 | 564.0 | 1461.9 | Longer VOT | |

| Informant | tie VOT (ms) | tie F1 (Hz) | tie F2 (Hz) | die VOT (ms) | die F1 (Hz) | die F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 42.0 | 823.4 | 1355.6 | 14.0 | 521.8 | 1784.3 | Longer VOT | Aspirated |
| Yudi | 15.0 | 719.6 | 1426.6 | 19.0 | 563.0 | 1783.4 | | |
| Doni | 15.0 | 654.8 | 1380.7 | 12.0 | 466.2 | 1811.9 | Longer VOT | |
| Nurul | 95.0 | 347.5 | 2803.0 | 14.0 | 606.7 | 2132.6 | Longer VOT | Aspirated |
| Adi | 20.0 | 741.1 | 1418.2 | 17.0 | 547.2 | 1689.0 | Longer VOT | |
| Nana | 8.0 | 857.0 | 1590.9 | 7.0 | 576.8 | 2212.6 | Longer VOT | |

| Informant | town VOT (ms) | town F1 (Hz) | town F2 (Hz) | down VOT (ms) | down F1 (Hz) | down F2 (Hz) | Note on /t/: VOT length | Note on /t/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 40.0 | 762.1 | 1484.5 | 12.0 | 539.1 | 1866.4 | Longer VOT | Aspirated |
| Yudi | 16.0 | 772.7 | 1443.4 | 21.0 | 587.1 | 1491.0 | | |
| Doni | 41.0 | 699.7 | 1375.2 | 13.0 | 513.6 | 1781.4 | Longer VOT | Aspirated |
| Nurul | 56.0 | 811.3 | 1235.9 | 94.0 | 494.6 | 2343.2 | | Aspirated |
| Adi | 11.0 | 696.1 | 1220.8 | 16.0 | 600.5 | 1614.1 | | |
| Nana | 9.0 | 825.3 | 1679.3 | 11.0 | 514.2 | 2217.5 | | |

Appendix 4. Group 6: Finally-positioned /t/ and /d/

| Informant | bet: preceding voiced sound (ms) | bet: F1 at plosive closure (Hz) | bet: F2 at plosive closure (Hz) | bet: intensity at plosive closure (dB) | bed: preceding voiced sound (ms) | bed: F1 at plosive closure (Hz) | bed: F2 at plosive closure (Hz) | bed: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 120.0 | 478.0 | 1750.7 | 75.5 | 169.0 | 540.0 | 1865.3 | 76.1 |
| Yudi | 100.0 | 419.4 | 1747.5 | 78.7 | 96.0 | 495.2 | 1700.9 | 78.4 |
| Doni | 126.0 | 451.6 | 1622.8 | 71.2 | 125.0 | 445.3 | 1691.0 | 76.2 |
| Nurul | 129.0 | 637.4 | 2082.5 | 73.8 | 108.0 | 518.6 | 2279.0 | 76.7 |
| Adi | 97.0 | 511.4 | 1678.7 | 77.3 | 118.0 | 459.3 | 1644.5 | 78.5 |
| Nana | 130.0 | 585.6 | 1723.1 | 77.7 | 175.0 | 534.7 | 1973.7 | 65.6 |

| Informant | late: preceding voiced sound (ms) | late: F1 at plosive closure (Hz) | late: F2 at plosive closure (Hz) | late: intensity at plosive closure (dB) | laid: preceding voiced sound (ms) | laid: F1 at plosive closure (Hz) | laid: F2 at plosive closure (Hz) | laid: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 161.0 | 403.9 | 2532.6 | 72.6 | 230.0 | 420.4 | 2377.2 | 73.8 |
| Yudi | 111.0 | 374.7 | 1862.5 | 75.5 | 174.0 | 462.2 | 1878.6 | 72.3 |
| Doni | 97.0 | 382.2 | 1966.7 | 74.9 | 105.0 | 372.6 | 2067.9 | 74.6 |
| Nurul | 114.0 | 580.9 | 2262.5 | 74.6 | 120.0 | 447.0 | 2694.6 | 72.6 |
| Adi | 83.0 | 398.7 | 1882.2 | 77.7 | 105.0 | 371.4 | 2006.2 | 77.2 |
| Nana | 194.0 | 540.0 | 1941.4 | 78.8 | 264.0 | 437.8 | 2326.3 | 72.3 |

| Informant | set: preceding voiced sound (ms) | set: F1 at plosive closure (Hz) | set: F2 at plosive closure (Hz) | set: intensity at plosive closure (dB) | said: preceding voiced sound (ms) | said: F1 at plosive closure (Hz) | said: F2 at plosive closure (Hz) | said: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 120.0 | 536.6 | 1758.8 | 75.4 | 142.0 | 537.7 | 1837.5 | 76.0 |
| Yudi | 90.0 | 491.8 | 1452.5 | 73.8 | 107.0 | 439.8 | 1978.4 | 79.5 |
| Doni | 97.0 | 552.1 | 1518.1 | 75.5 | 146.0 | 362.5 | 1961.0 | 70.2 |
| Nurul | 118.0 | 694.5 | 1942.1 | 74.3 | 126.0 | 674.2 | 1971.9 | 73.8 |
| Adi | 97.0 | 514.6 | 1551.7 | 74.8 | 109.0 | 432.9 | 1845.3 | 74.8 |
| Nana | 118.0 | 626.5 | 1663.8 | 76.6 | 186.0 | 418.5 | 1974.4 | 69.7 |

| Informant | heart: preceding voiced sound (ms) | heart: F1 at plosive closure (Hz) | heart: F2 at plosive closure (Hz) | heart: intensity at plosive closure (dB) | hard: preceding voiced sound (ms) | hard: F1 at plosive closure (Hz) | hard: F2 at plosive closure (Hz) | hard: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 156.0 | 471.4 | 1846.1 | 73.1 | 233.0 | 443.6 | 1802.8 | 72.0 |
| Yudi | 115.0 | 363.6 | 1732.7 | 75.7 | 115.0 | 565.6 | 1570.8 | 77.7 |
| Doni | 105.0 | 541.0 | 1349.1 | 79.1 | 150.0 | 562.8 | 1499.5 | 75.2 |
| Nurul | 143.0 | 614.7 | 1647.9 | 74.8 | 166.0 | 710.4 | 1600.8 | 72.1 |
| Adi | 112.0 | 657.4 | 1482.4 | 76.9 | 121.0 | 526.1 | 1478.7 | 75.7 |
| Nana | 155.0 | 388.9 | 1806.1 | 76.1 | 107.0 | 487.9 | 1801.7 | 72.1 |

| Informant | sight: preceding voiced sound (ms) | sight: F1 at plosive closure (Hz) | sight: F2 at plosive closure (Hz) | sight: intensity at plosive closure (dB) | side: preceding voiced sound (ms) | side: F1 at plosive closure (Hz) | side: F2 at plosive closure (Hz) | side: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 167.0 | 389.1 | 1825.7 | 71.4 | 240.0 | 423.4 | 2120.0 | 74.1 |
| Yudi | 154.0 | 452.2 | 1781.7 | 73.4 | 181.0 | 472.1 | 1849.8 | 77.2 |
| Doni | 174.0 | 438.0 | 1872.6 | 71.2 | 183.0 | 439.7 | 1791.1 | 69.9 |
| Nurul | 175.0 | 433.4 | 2370.0 | 74.6 | 188.0 | 457.1 | 2545.7 | 67.2 |
| Adi | 138.0 | 497.4 | 1890.5 | 72.8 | 163.0 | 385.6 | 1842.6 | 75.7 |
| Nana | 223.0 | 377.3 | 2309.7 | 74.7 | 210.0 | 383.6 | 2389.7 | 68.0 |

| Informant | brought: preceding voiced sound (ms) | brought: F1 at plosive closure (Hz) | brought: F2 at plosive closure (Hz) | brought: intensity at plosive closure (dB) | broad: preceding voiced sound (ms) | broad: F1 at plosive closure (Hz) | broad: F2 at plosive closure (Hz) | broad: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 177.0 | 497.0 | 1511.1 | 76.3 | 182.0 | 454.8 | 1373.7 | 74.4 |
| Yudi | 156.0 | 360.6 | 1581.1 | 74.5 | 172.0 | 478.4 | 1745.5 | 72.2 |
| Doni | 250.0 | 481.8 | 1345.0 | 73.7 | 145.0 | 496.0 | 1368.2 | 70.0 |
| Nurul | 207.0 | 477.2 | 1716.9 | 77.7 | 150.0 | 561.6 | 1210.1 | 77.2 |
| Adi | 164.0 | 488.6 | 1346.7 | 76.2 | 140.0 | 446.6 | 2538.1 | 79.9 |
| Nana | 201.0 | 480.7 | 1628.2 | 75.4 | 185.0 | 594.9 | 1581.0 | 66.3 |

Appendix 5. Group 7: Initially-positioned /k/ and /g/

| Informant | cave VOT (ms) | cave F1 (Hz) | cave F2 (Hz) | gave VOT (ms) | gave F1 (Hz) | gave F2 (Hz) | Note on /k/: VOT length | Note on /k/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 56.0 | 469.9 | 2212.7 | 28.0 | 386.1 | 2450.9 | Longer VOT | Aspirated |
| Yudi | 15.0 | 464.9 | 2057.1 | 32.0 | 410.3 | 2109.1 | | |
| Doni | 49.0 | 379.8 | 2121.1 | 66.0 | 374.9 | 2258.2 | | Aspirated |
| Nurul | 95.0 | 610.5 | 2889.2 | 39.0 | 440.7 | 2826.8 | Longer VOT | Aspirated |
| Adi | 30.0 | 408.8 | 2145.5 | 49.0 | 396.8 | 2047.3 | | Aspirated |
| Nana | 29.0 | 454.7 | 2619.3 | 31.0 | 392.1 | 2666.6 | | Aspirated |

| Informant | curl VOT (ms) | curl F1 (Hz) | curl F2 (Hz) | girl VOT (ms) | girl F1 (Hz) | girl F2 (Hz) | Note on /k/: VOT length | Note on /k/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 63.0 | 545.7 | 1455.1 | 33.0 | 480.7 | 1463.5 | Longer VOT | Aspirated |
| Yudi | 27.0 | 476.4 | 984.7 | 59.0 | 434.8 | 1691.7 | | Aspirated |
| Doni | 50.0 | 532.4 | 1333.4 | 31.0 | 426.1 | 1562.4 | Longer VOT | Aspirated |
| Nurul | 71.0 | 707.4 | 1154.6 | 59.0 | 488.6 | 1321.2 | Longer VOT | Aspirated |
| Adi | 22.0 | 591.2 | 1396.5 | 25.0 | 521.1 | 1534.6 | | Aspirated |
| Nana | 65.0 | 517.4 | 1548.7 | 52.0 | 430.0 | 1557.5 | Longer VOT | Aspirated |

| Informant | cap VOT (ms) | cap F1 (Hz) | cap F2 (Hz) | gap VOT (ms) | gap F1 (Hz) | gap F2 (Hz) | Note on /k/: VOT length | Note on /k/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 58.0 | 650.8 | 1857.9 | 19.0 | 404.9 | 2442.2 | Longer VOT | Aspirated |
| Yudi | 15.0 | 629.2 | 1861.7 | 25.0 | 480.2 | 1897.3 | | |
| Doni | 48.0 | 592.0 | 1917.7 | 41.0 | 425.4 | 2155.5 | Longer VOT | Aspirated |
| Nurul | 28.0 | 755.6 | 2490.7 | 30.0 | 486.2 | 2687.0 | | Aspirated |
| Adi | 28.0 | 579.0 | 1939.6 | 31.0 | 489.4 | 2006.4 | | Aspirated |
| Nana | 44.0 | 557.6 | 1991.0 | 22.0 | 411.2 | 2576.7 | Longer VOT | Aspirated |

| Informant | card VOT (ms) | card F1 (Hz) | card F2 (Hz) | guard VOT (ms) | guard F1 (Hz) | guard F2 (Hz) | Note on /k/: VOT length | Note on /k/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 69.0 | 680.9 | 1065.3 | 35.0 | 561.0 | 1436.5 | Longer VOT | Aspirated |
| Yudi | 29.0 | 758.3 | 1324.0 | 18.0 | 453.5 | 1373.9 | Longer VOT | Aspirated |
| Doni | 66.0 | 722.1 | 1372.4 | 50.0 | 583.0 | 1330.7 | Longer VOT | Aspirated |
| Nurul | 0.0 | 861.6 | 1222.6 | 44.0 | 530.5 | 1247.8 | | |
| Adi | 16.0 | 710.7 | 1477.1 | 30.0 | 541.0 | 1525.7 | | |
| Nana | 71.0 | 589.5 | 1280.6 | 36.0 | 441.1 | 1781.0 | Longer VOT | Aspirated |

| Informant | could VOT (ms) | could F1 (Hz) | could F2 (Hz) | good VOT (ms) | good F1 (Hz) | good F2 (Hz) | Note on /k/: VOT length | Note on /k/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 67.0 | 454.0 | 947.7 | 37.0 | 398.4 | 1230.5 | Longer VOT | Aspirated |
| Yudi | 19.0 | 480.0 | 865.3 | 35.0 | 414.4 | 1415.6 | | |
| Doni | 80.0 | 472.5 | 2387.8 | 44.0 | 379.2 | 1382.2 | Longer VOT | Aspirated |
| Nurul | 42.0 | 513.3 | 1004.2 | 49.0 | 390.2 | 1136.2 | | Aspirated |
| Adi | 24.0 | 464.3 | 668.4 | 41.0 | 545.7 | 1882.6 | | Aspirated |
| Nana | 47.0 | 340.8 | 956.5 | 32.0 | 356.0 | 1466.4 | Longer VOT | Aspirated |

| Informant | coal VOT (ms) | coal F1 (Hz) | coal F2 (Hz) | goal VOT (ms) | goal F1 (Hz) | goal F2 (Hz) | Note on /k/: VOT length | Note on /k/: aspiration |
|---|---|---|---|---|---|---|---|---|
| Mita | 72.0 | 531.4 | 869.1 | 30.0 | 432.4 | 983.8 | Longer VOT | Aspirated |
| Yudi | 31.0 | 562.6 | 764.6 | 53.0 | 634.8 | 742.6 | | Aspirated |
| Doni | 72.0 | 509.6 | 842.8 | 52.0 | 451.9 | 1088.8 | Longer VOT | Aspirated |
| Nurul | 88.0 | 749.9 | 897.7 | 45.0 | 460.8 | 908.2 | Longer VOT | Aspirated |
| Adi | 30.0 | 569.4 | 2089.6 | 26.0 | 510.8 | 969.6 | Longer VOT | Aspirated |
| Nana | 80.0 | 449.9 | 832.7 | 45.0 | 369.0 | 1107.1 | Longer VOT | Aspirated |

Appendix 6. Group 9: Finally-positioned /k/ and /g/

| Informant | pick: preceding voiced sound (ms) | pick: F1 at plosive closure (Hz) | pick: F2 at plosive closure (Hz) | pick: intensity at plosive closure (dB) | pig: preceding voiced sound (ms) | pig: F1 at plosive closure (Hz) | pig: F2 at plosive closure (Hz) | pig: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 84.0 | 353.7 | 2212.3 | 72.2 | 120.0 | 396.1 | 2007.2 | 69.2 |
| Yudi | 54.0 | 369.4 | 2333.7 | 66.3 | 57.0 | 439.8 | 2231.4 | 66.6 |
| Doni | 62.0 | 363.3 | 2359.3 | 70.0 | 54.0 | 423.9 | 2427.1 | 70.2 |
| Nurul | 81.0 | 529.6 | 2824.5 | 73.0 | 126.0 | 317.1 | 2526.4 | 75.8 |
| Adi | 82.0 | 389.5 | 2160.9 | 75.9 | 71.0 | 355.2 | 2149.2 | 72.5 |
| Nana | 51.0 | 324.7 | 2759.8 | 77.6 | 93.0 | 367.2 | 2564.6 | 70.7 |

| Informant | back: preceding voiced sound (ms) | back: F1 at plosive closure (Hz) | back: F2 at plosive closure (Hz) | back: intensity at plosive closure (dB) | bag: preceding voiced sound (ms) | bag: F1 at plosive closure (Hz) | bag: F2 at plosive closure (Hz) | bag: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 147.0 | 473.6 | 2274.3 | 74.3 | 173.0 | 504.8 | 2181.9 | 74.9 |
| Yudi | 99.0 | 435.4 | 2172.8 | 71.4 | 106.0 | 471.8 | 1935.1 | 69.7 |
| Doni | 113.0 | 474.6 | 2081.1 | 74.0 | 115.0 | 443.2 | 2031.2 | 74.5 |
| Nurul | 121.0 | 623.4 | 2175.4 | 74.2 | 160.0 | 506.6 | 2222.2 | 74.6 |
| Adi | 112.0 | 484.0 | 1995.8 | 73.3 | 108.0 | 436.3 | 1990.9 | 71.8 |
| Nana | 188.0 | 503.7 | 2284.7 | 72.1 | 186.0 | 352.0 | 2044.3 | 67.5 |

| Informant | dock: preceding voiced sound (ms) | dock: F1 at plosive closure (Hz) | dock: F2 at plosive closure (Hz) | dock: intensity at plosive closure (dB) | dog: preceding voiced sound (ms) | dog: F1 at plosive closure (Hz) | dog: F2 at plosive closure (Hz) | dog: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 154.0 | 965.8 | 2713.7 | 73.0 | 212.0 | 497.2 | 1307.0 | 75.6 |
| Yudi | 130.0 | 947.4 | 3092.0 | 66.7 | 120.0 | 550.6 | 1331.8 | 70.6 |
| Doni | 132.0 | 557.8 | 1364.5 | 65.7 | 133.0 | 504.8 | 1576.1 | 68.6 |
| Nurul | 150.0 | 611.7 | 971.9 | 74.2 | 186.0 | 720.3 | 989.8 | 69.3 |
| Adi | 116.0 | 598.4 | 1021.0 | 76.4 | 115.0 | 642.3 | 1097.7 | 77.9 |
| Nana | 187.0 | 495.4 | 983.7 | 72.9 | 204.0 | 513.7 | 957.6 | 76.0 |

| Informant | lock: preceding voiced sound (ms) | lock: F1 at plosive closure (Hz) | lock: F2 at plosive closure (Hz) | lock: intensity at plosive closure (dB) | log: preceding voiced sound (ms) | log: F1 at plosive closure (Hz) | log: F2 at plosive closure (Hz) | log: intensity at plosive closure (dB) |
|---|---|---|---|---|---|---|---|---|
| Mita | 88.0 | 574.2 | 1026.5 | 76.9 | 159.0 | 583.0 | 1494.8 | 72.7 |
| Yudi | 128.0 | 820.0 | 2056.4 | 74.2 | 109.0 | 884.0 | 1869.8 | 72.9 |
| Doni | 103.0 | 878.2 | 2350.3 | 66.4 | 98.0 | 578.3 | 1273.6 | 75.9 |
| Nurul | 104.0 | 803.9 | 1145.7 | 76.5 | 110.0 | 481.1 | 1093.0 | 73.8 |
| Adi | 91.0 | 604.1 | 1074.4 | 73.3 | 98.0 | 495.0 | 1037.3 | 75.9 |
| Nana | 101.0 | 611.0 | 962.4 | 76.7 | 144.0 | 1177.4 | 2776.4 | 68.6 |

Appendix 7. Gap in duration of vowel preceding final alveolars

[Figure not reproduced. Vertical axis: -120.0 to 100.0 in steps of 20.0. Word pairs: bet - bed, late - laid, set - said, heart - hard, sight - side, brought - broad. Series, one per informant: Mita, Yudi, Doni, Nurul, Adi, Nana.]
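The gap plotted in this figure can be recomputed from the durations listed in Appendix 4. The following is a minimal Python sketch for the pair bet / bed, assuming the gap is the duration of the voiced sound before final /d/ minus the duration before final /t/. The appendix does not state the direction of subtraction; this choice is an assumption that fits the axis range of the figure. The names `durations_ms` and `gaps_ms` are hypothetical.

```python
# Duration (ms) of the voiced sound preceding the final plosive,
# copied from Appendix 4 for the pair bet / bed.
# Each tuple is (duration in "bet", duration in "bed").
durations_ms = {
    "Mita": (120.0, 169.0),
    "Yudi": (100.0, 96.0),
    "Doni": (126.0, 125.0),
    "Nurul": (129.0, 108.0),
    "Adi": (97.0, 118.0),
    "Nana": (130.0, 175.0),
}

# Assumed definition of the gap: voiced-final minus voiceless-final duration.
# A positive value means the vowel is longer before the voiced plosive.
gaps_ms = {
    name: before_voiced - before_voiceless
    for name, (before_voiceless, before_voiced) in durations_ms.items()
}

for name, gap in gaps_ms.items():
    print(f"{name}: {gap:+.1f} ms")
```

Under this assumption, Mita, Adi, and Nana show a positive gap for bet / bed, while Yudi, Doni, and Nurul show a small or negative one.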

Appendix 8. Gap in duration of vowel preceding final velars

[Figure not reproduced. Vertical axis: -30.0 to 80.0 in steps of 10.0. Word pairs: pick - pig, back - bag, lock - log, dock - dog. Series, one per informant: Mita, Yudi, Doni, Nurul, Adi, Nana.]

Appendix 9. General impression of levels of intelligibility as perceived by Kristen and Mum

| Informant | Kristen | Mum |
|---|---|---|
| Mita | 26 | 20 |
| Yudi | 6 | 9 |
| Doni | 15 | 15 |
| Nurul | 17 | 13 |
| Adi | 12 | 15 |
| Nana | 26 | 26 |

Appendix 10. Levels of intelligibility as perceived by Kristen

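| Word Group | Contrast | Mita | Yudi | Doni | Nurul | Adi | Nana |
|---|---|---|---|---|---|---|---|
| 1 | Initial bilabial | 27 | 10 | 25 | 25 | 25 | 26 |
| 2 | Medial bilabial | 27 | 20 | 29 | 29 | 30 | 30 |
| 3 | Final bilabial | 28 | 5 | 23 | 29 | 28 | 25 |
| 4 | Initial alveolar | 30 | 9 | 30 | 28 | 28 | 30 |
| 5 | Medial alveolar | 30 | 9 | 20 | 28 | 28 | 30 |
| 6 | Final alveolar | 30 | 9 | 23 | 23 | 24 | 28 |
| 7 | Initial velar | 29 | 10 | 30 | 24 | 25 | 30 |
| 8 | Medial velar | 27 | 20 | 30 | 29 | 29 | 30 |
| 9 | Final velar | 26 | 3 | 19 | 22 | 22 | 26 |
| 10 | Initial palatal | 30 | 4 | 25 | 23 | 29 | 29 |
| 11 | Medial palatal | 26 | 10 | 20 | 28 | 26 | 30 |
| 12 | Final palatal | 25 | 5 | 19 | 28 | 25 | 29 |
| 13 | Initial nasal | 29 | 10 | 17 | 27 | 24 | 25 |
| 14 | Medial nasal | 28 | 10 | 15 | 26 | 22 | 23 |
| 15 | Final nasal | 30 | 21 | 30 | 30 | 30 | 30 |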

Appendix 11. Mita’s TOEFL Certificate

Appendix 12. Yudi’s TOEFL Certificate

Appendix 13. Doni’s TOEFL Certificate

Appendix 14. Nurul’s TOEFL Certificate

Appendix 15. Adi’s TOEFL Certificate

Appendix 16. Nana’s TOEFL Certificate