
Journal of the Phonetic Society of Japan (音声研究), Vol. 22 No. 2, August 2018 (Heisei 30), pp. 131–140

Special Feature: Research Note

Voice Onset Time of Word-Initial Stops and Affricates in Khalkha Mongolian

Naoki UETA*

モンゴル語ハルハ方言の語頭破裂音・破擦音のVOT

SUMMARY: Khalkha Mongolian has two types of obstruents, which are transliterated as 〈b, d, g, dz, ǰ〉 and 〈p, t, k, ts, č〉, respectively; however, it is not clear whether the feature distinguishing them is voicing or aspiration. This production study examines the distribution of VOT values for the two series of word-initial obstruents in Khalkha Mongolian. The data show that the VOTs of 〈b, d, g, dz, ǰ〉 are generally positive and that there are no phonetic grounds for viewing /g/ and /ɢ/ as voiced consonants, although they are phonologically regarded as voiced.

Key words: Khalkha Mongolian, voice onset time, obstruent, , speech production

1. Introduction

In Khalkha Mongolian (henceforth just Mongolian), widely spoken in Mongolia and also called Standard Mongolian, there are two contrastive series of stops and affricates (obstruents)1). In Cyrillic letters2) and their common transliterations, the two series of obstruents are rendered as shown in Table 1 (Sanders and Bat-Ireedüi 1999, pp. 3–4, Shiotani and Prevjav 2001, p. 2, Yamakoshi 2012, p. 19).

The alphabetical transcription seems to imply that this contrast is one of voicing. However, it is still not clear precisely what phonetic or acoustic features distinguish these two types of sounds.

In this study, I focus on voice onset time (VOT) of Mongolian word-initial obstruents. VOT, which is an acoustic feature proposed by Lisker and Abramson (1964), is the time interval between the articulatory release of the stop and the onset of vocal fold vibration. VOT takes a positive value if the vocal fold vibration starts after the articulatory release and, conversely, takes a negative value if voicing precedes the release. Lisker and Abramson (1964) examine 11 languages and show that the stop categories fall into three VOT ranges: one from about −125 ms to −75 ms (voiced unaspirated stop), one from 0 to +25 ms (voiceless unaspirated stop), and one from about +60 ms to +100 ms (voiceless aspirated stop).

It is well known that the features of voicing and aspiration for syllable-initial stops can be specified quite well only by VOT, and a number of cross-linguistic studies have been conducted to show the validity of VOT for doing so (Shimizu 1996, Cho and Ladefoged 1999, Takada 2011; among others). However, few reports are available on VOT values of Mongolian obstruents.

My aim is to examine the distribution of VOT values of each obstruent in Mongolian with a speech production experiment. In the following section, I provide an overview of previous studies and point out some remaining problems. Section 3 describes the current production experiment. In section 4, I present the results of the experiment and show that (i) VOT of 〈b, d, g, dz, ǰ〉 generally takes positive values, (ii) VOT of velar and uvular stops is longer than that of bilabial and dental stops, (iii) there is no phonetic ground to view /g/ and /ɢ/ as voiced consonants, although they are phonologically regarded as voiced, and (iv) young Mongolian speakers distinguish 〈k〉 [kʰ] from 〈g〉 [q] phonetically, though some previous studies state that 〈k〉 is replaced by [x, ɡ, ɢ]. Section 5 is the conclusion of this paper.

Table 1 Cyrillic letters for Mongolian obstruents and their transliterations.

п 〈p〉    т 〈t〉    ц 〈ts〉    ч 〈ch〉    к 〈k〉
б 〈b〉    д 〈d〉    з 〈z〉     ж 〈j〉     г 〈g〉
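The definition of VOT and the three ranges reported by Lisker and Abramson (1964) can be sketched in a few lines of Python. This is only an illustration: the function names `vot_ms` and `classify_vot` are hypothetical, the time stamps are assumed to be in seconds, and treating the approximate range limits as hard cut-offs is a simplification that the source does not make.

```python
def vot_ms(release_s: float, voicing_onset_s: float) -> float:
    """VOT in ms: positive if voicing starts after the release, negative if before."""
    return (voicing_onset_s - release_s) * 1000.0


def classify_vot(vot: float) -> str:
    """Map a VOT value (ms) to the approximate ranges of Lisker and Abramson (1964).

    The boundaries are the approximate figures quoted in the text, used here
    as hard limits purely for illustration.
    """
    if -125.0 <= vot <= -75.0:
        return "voiced unaspirated"
    if 0.0 <= vot <= 25.0:
        return "voiceless unaspirated"
    if 60.0 <= vot <= 100.0:
        return "voiceless aspirated"
    return "outside the three reported ranges"


# Hypothetical token: release at 0.100 s, voicing onset at 0.110 s.
example = vot_ms(0.100, 0.110)
print(round(example, 1), classify_vot(example))
```

Values that fall between the ranges are deliberately left unclassified, since the ranges describe where categories cluster rather than a partition of the whole VOT scale.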

* Graduate School of Language and Culture, Osaka University(大阪大学言語文化研究科)

— 131 — 特集「有声性の対立に関する音声と音韻」

Table 2 Mongolian stops and affricates (Svantesson and Karlsson 2012, p. 453)3).

                        Labial   Dental   Post-alveolar   Velar   Uvular
Aspirated stop          pʰ       tʰ
Unaspirated stop        p        t
Voiced stop                                               g       ɢ
Aspirated affricate              ʦʰ       ʧʰ
Unaspirated affricate            ʦ        ʧ

2. Previous Studies of Mongolian Obstruents

As shown in section 1, the two contrastive series of obstruents in Mongolian are usually rendered with symbols for voiceless and voiced consonants in Cyrillic letters and their transliterations. In phonological analyses, the terms strong - weak (čanga - sul in Mongolian), fortis - lenis, and tense - lax are traditionally used for these two series of obstruents, and many researchers have described the basic phonetic characteristics of each sound (Stuart and Haltod 1957, Luvsanvandan 1964, Poppe 1970, Tsoloo 1976, Möömöö 1979, for example). However, there is little consensus on which distinctive feature distinguishes these two series. For example, Poppe (1970) and Janhunen (2012) found that Mongolian “tense” obstruents are voiceless aspirated consonants and “lax” obstruents are voiceless unaspirated consonants, while Tsoloo (1976) and Sambuudorj (2012) regard “strong” and “weak” obstruents to be voiceless and voiced, respectively (for more details, see Svantesson et al. 2005, pp. 220–221).

Recently, Svantesson et al. (2005), Karlsson and Svantesson (2011, 2012), and Svantesson and Karlsson (2012) have claimed that the distinctive feature which distinguishes these two series of obstruents in Mongolian is aspiration: post-aspiration word-initially and pre-aspiration in the other positions. Table 2 shows the phonological system for the stops and affricates proposed in Svantesson and Karlsson (2012).

Svantesson and Karlsson (2012) presented some acoustic data, including on VOT; according to their measurements, mean VOT values for word-initial /tʰ–t/, /ʦʰ–ʦ/ and /ʧʰ–ʧ/ are those shown in Table 3.

Table 3 Mean VOT duration (in ms) (Svantesson and Karlsson 2012, p. 456).

Speaker   /tʰ-/   /t-/   p-values
BB        57      22     p<.05
DD        58      11     p<.001
XB        40      23     p<.01

Speaker   /ʦʰ-/   /ʦ-/   p-values
BB        102     47     p<.01
DD        88      57     p<.01
XB        76      49     p≥.05

Speaker   /ʧʰ-/   /ʧ-/   p-values
BB        85      58     p<.05
DD        108     60     p≥.05
XB        70      49     p≥.05

The acoustic data seem to support their interpretations in Table 2. However, there are some remaining issues.

First, the distribution of VOT values is not clear, since Svantesson and Karlsson report only mean values. It is well known that VOT of voiced obstruents can take both negative and positive values in some languages; for example, Kent and Read (1992, p. 108) show that VOT of voiced stops in English ranges from about −20 ms to about +20 ms. In Japanese, according to Takada (2011, pp. 70–71), VOT of word-initial voiced stops is distributed from −320 ms to +74 ms and there are two peaks of distribution; one around −80 to −60 ms and the other around 0 to +10 ms. In Mongolian, however, it is not clear whether VOT of unaspirated (or voiced) obstruents can take negative values or whether it always takes positive values.

The second, and the more crucial, problem is that the data are limited to dental stops, dental affricates, and postalveolar affricates. In other words, labial, velar, and uvular stops were not studied by Svantesson and Karlsson (2012). This limitation may have been unavoidable because the studies in question have tried to examine acoustic characteristics of obstruents in all positions, and only dental stops, dental affricates, and postalveolar affricates frequently occur in all positions in Mongolian. It is known, however, that VOT generally varies among places of articulation. For example, Kent and Read (1992, p. 114) show that bilabials have the shortest VOTs and velars have the longest. Cho and Ladefoged (1999) analyze the VOT values in 18 languages and report that “velar stops have the longest VOTs in all of the 13 languages that do not have contrasts between velar and uvular stops; and in the remaining five languages either velars or uvulars have the longest VOT” (p. 218). Taking these facts into consideration, it is necessary to examine the VOT values of all kinds of obstruents in Mongolian in order to explore the phonological nature of this contrast.

In particular, the acoustic data for velar and uvular stops need to be analyzed. Although Svantesson et al. (2005, p. 12) state that Mongolian velar and uvular stops are often voiced [ɡ, ɢ] and transcribe them as voiced stops /g, ɢ/, they present no acoustic evidence for this interpretation4). In addition, as shown in Table 2, they claim that the velar and uvular stops only have voiced series, that is, there is no phonemic voicing contrast in velar and uvular positions. However, the letter к 〈k〉 is present in the orthography, as shown in Table 1, and к 〈k〉 occurs in loan-words. Svantesson et al. (2005) state the following with respect to the pronunciation of 〈k〉 in loan-words:

(1) a. It is difficult to decide exactly which borrowed sounds have become regular Mongolian phonemes, since an individual’s pronunciation of loan-words depends on his knowledge of the donor language. Thus, those who know Russian well may pronounce [k] and [f] when they occur in loans, but it is common to substitute [x] and [pʰ] for them. (Svantesson et al. 2005, p. 30)
    b. Russian k, which does not occur in Mongolian, is sometimes retained, but may also be changed to x or g/ɢ. (Svantesson et al. 2005, p. 31)

However, Svantesson et al. (2005) and Svantesson and Karlsson (2012) regard 〈p〉 as a marginal phoneme (/pʰ/), which occurs only in loan-words and some onomatopoeia, whereas they do not regard 〈k〉 in loan-words as even a marginal phoneme. This is presumably because 〈p〉 is usually pronounced as [pʰ] and not changed to other sounds (for example, [p]), while 〈k〉 is, in their view, often changed to the native sounds [x, ɡ, ɢ], or alternatively, because 〈p〉 occurs in onomatopoeia, which indeed constitutes a native-origin word class, while 〈k〉 does not5). However, as mentioned above, Svantesson et al. (2005) acknowledge that Russian /k/ is sometimes retained. This means that 〈k〉 can be pronounced as unaspirated [k], as with Russian /k/. Furthermore, in fact, quite a few speakers, especially younger ones, do seem to pronounce 〈k〉 as [kʰ]. If this is the case and the VOT range of this sound differs from that of 〈g〉, it follows that there is a phonetic contrast between 〈k〉 and 〈g〉. In addition, if 〈k〉 is seldom replaced by the native sounds [x, ɡ, ɢ], it seems that there is room for discussion on whether this sound should be acknowledged as a phoneme /k/, even if it occurs only in loan-words, that is, not even in onomatopoeia.

To summarize, the following questions still remain open regarding “the voicing contrast” in Mongolian:

(2) a. What are the distributions of the VOT values of each obstruent?
    b. Do the VOT values vary substantially among places of articulation?
    c. Can the observation that the velar and uvular stops are phonetically voiced be confirmed by actual VOT data?
    d. Is it true that the borrowed 〈k〉 is pronounced either as [x, ɢ, ɡ] (i.e., substitution with the native phones) or [k] (like Russian /k/)?

The purpose of this study is to address these questions by carrying out a speech production experiment. In this article, I focus on VOT of word-initial obstruents; VOT in the other positions will be discussed in another paper.

In what follows, I use the transliterated letters shown in Table 1, except that Cyrillic letters з, ч, and ж are transliterated as 〈dz, č, ǰ〉 respectively, and use the terms “voiced” for 〈b, d, g, dz, ǰ〉 and “voiceless” for 〈p, t, k, ts, č〉, following the transliteration system.

3. Experiment

3.1 Target Words

The target items are 20 words with a word-initial voiceless obstruent and 20 words with a word-initial voiced obstruent, as shown in Table 4. The target words with word-initial 〈p-〉 and 〈k-〉, and some other words are loan-words from Russian and English.

〈g〉 is generally pronounced as a uvular stop [ɢ] in words with /a, ɔ, ʊ/ and as a velar stop [ɡ] in words with /e, ɵ, u/; this realization pattern is related to vowel harmony. In other words, [ɡ] and [ɢ] can be interpreted as allophones for the phoneme /g/ in word-initial position6); since the target words contain /a/, this phoneme is realized as [ɢ]. It is true, strictly speaking, that 〈k〉 and 〈g〉 [ɢ] in this experiment differ in place of articulation, but it is plausible to assume that these consonants pair with each other before the vowel /a/. In addition, it is predicted that the difference in VOT


Table 4 Target words.

〈p〉    park ‘park’              par ‘central heating’    patent ‘patent’            paspɔrt ‘passport’
〈b〉    bars ‘tiger’             bar ‘bar’                batalgaa ‘proof’           baatar ‘hero’

〈t〉    tal ‘steppe’             taax ‘to guess’          talbai ‘field’             taarax ‘to fit’
〈d〉    dal ‘shoulder-blade’     daax ‘to bear’           dalbaa ‘flag’              daarax ‘to get cold’

〈k〉    kart ‘card’              kass ‘cashier’           kanɔn ‘copy’               kamer ‘camera’
〈g〉    gardz ‘loss’             gaadz ‘gas’              gadzar ‘place’             gaixaš ‘surprise’

〈ts〉   tsam ‘mask’              tsaas ‘paper’            tsalgix ‘to splash’        tsaatan ‘dukha’*
〈dz〉   dzam ‘road’              dzaal ‘hall’             dzalgix ‘to swallow’       dzaawar ‘instruction’

〈č〉    čats ‘limb’              čiig ‘dampness’          čarmaix ‘to endeavor’      čʊʊlgan ‘meeting’
〈ǰ〉    ǰad ‘spear’              ǰiix ‘to stretch’        ǰargal ‘happiness’         ǰʊʊlčin ‘traveler’

*dukha: community of nomadic reindeer herders

between the velar stop [ɡ] and the uvular stop [ɢ] is small, if any, in view of the fact that these sounds are allophones for the phoneme /g/ (see Cho and Ladefoged 1999, pp. 221–222, for the difference in VOT between velars and uvulars in languages that contrast these two types of sounds). For the above reasons, I regard 〈k〉 and 〈g〉 as a pair in this study.

3.2 Method

The participants in the experiment were 9 native speakers of Mongolian (4 males and 5 females). The age of the speakers ranged from 17 to 20 years old. They were all undergraduate students at the Mongolian University of Science and Technology. They had all learned English and Japanese. Some participants had learned Russian for one year or two years, but did not speak Russian.

The target words were read in carrier sentences (3a, b).

(3) a. — gedeg nʲ juu we? ‘What is —?’
    b. bi — geǰ xelsen. ‘I said —’

The target words occur in sentence-initial position in (3a), and are preceded by a vowel in (3b). These two carrier sentences were prepared to address the possibility that VOT values could be affected by whether the obstruent was preceded by a vowel or not; however, analyses of the data show that there was little difference between the two conditions. Thus, in this study, I pooled the data from these two environments.

Three participants also read the words in isolation, before reading (3a, b). These speakers read each target word three times: in isolation, in (3a), and in (3b), while the other participants read each target word twice: in (3a) and in (3b). The target words and the carrier sentences were written in Cyrillic orthography. The target words were arranged randomly and displayed on a computer one at a time with the two carrier sentences. This procedure was repeated three times per participant, and the order of the stimuli was randomized for each trial. As a result, the number of recorded tokens was as shown in Table 5.

All data were recorded with a portable recorder (ZOOM H4n [WAV, 44.1 kHz / 16bit]) and a head-mounted condenser microphone (AKG C520). The recordings were conducted in a quiet room at the Mongolian University of Science and Technology.

3.3 Analysis

Recorded material was analyzed with Praat (Boersma and Weenink 2012); VOT was measured from stop release to the beginning of vocal fold vibration. Illustrative waveforms and spectrograms for stops and affricates are shown in Figures 1 and 2, respectively.

There were some cases in which VOT could not be measured due to devoicing of the vowel following the obstruent or to fricativization of the obstruent7). These data were excluded from analysis. The number of tokens excluded and analyzed in this experiment is shown in Table 6.

4. Results and Discussion

4.1 VOT Distribution

First, I show the distribution and the mean values of VOT for each obstruent. Figures 3–7 are histograms representing the VOT distribution of 〈b/p〉, 〈d/t〉, 〈g/k〉, 〈dz/ts〉 and 〈ǰ/č〉, respectively. These histograms are based on the total VOT data. The mean VOT values for each stop and affricate in each participant


Table 5 The number of recorded tokens.

Participant   Words   Carrier sentence               Repetitions   Number
AS            40      in isolation, (3a) and (3b)    3             360
NS            40      in isolation, (3a) and (3b)    3             360
TG            40      in isolation, (3a) and (3b)    3             360
BE            40      (3a) and (3b)                  3             240
BG            40      (3a) and (3b)                  3             240
GM            40      (3a) and (3b)                  3             240
JB            40      (3a) and (3b)                  3             240
NE            40      (3a) and (3b)                  3             240
ZZ            40      (3a) and (3b)                  3             240

Sum                                                                2,520
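The totals in Table 5 follow from the design: 40 words, three repetitions, and either three reading conditions (isolation plus two carrier sentences) or two (carrier sentences only). The short Python check below reproduces the counts; the variable names are illustrative and not part of the original study.

```python
WORDS = 40        # target words per participant (Table 4)
REPETITIONS = 3   # the word list was read three times

# Number of reading conditions per participant, as listed in Table 5.
conditions = {
    "AS": 3, "NS": 3, "TG": 3,                  # isolation + two carrier sentences
    "BE": 2, "BG": 2, "GM": 2, "JB": 2, "NE": 2, "ZZ": 2,  # carrier sentences only
}

recorded = {p: WORDS * c * REPETITIONS for p, c in conditions.items()}
total_recorded = sum(recorded.values())

print(recorded["AS"], recorded["BE"], total_recorded)
```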

Figure 1 Illustrative waveforms and spectrograms for〈 b〉 (bar ‘bar’) and〈 p〉 (par ‘central heating’).

Figure 2 Illustrative waveforms and spectrograms for 〈dz〉 (dzam ‘road’) and 〈ts〉 (tsam ‘mask’).


Table 6 The number of excluded and analyzed tokens.

              Recorded   Excluded tokens                        Analyzed
Participant   tokens     〈p〉   〈b〉   〈g〉   Others   Sum      tokens
AS            360        0      0      1      0        1        359
NS            360        9      11     0      0        20       340
TG            360        1      2      2      0        5        355
BE            240        0      0      0      0        0        240
BG            240        1      1      0      0        2        238
GM            240        0      1      2      0        3        237
JB            240        2      6      1      1*       10       230
NE            240        0      1      0      0        1        239
ZZ            240        5      1      1      0        7        233
Sum           2,520      18     23     7      1        49       2,471

* The excluded sound was〈 t〉.
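As a consistency check on Table 6, the analyzed token counts can be recomputed as recorded tokens minus exclusions. The sketch below simply transcribes the table into Python dictionaries (names are illustrative) and recomputes the row and column sums.

```python
# Recorded tokens per participant (Table 5 / Table 6).
recorded = {"AS": 360, "NS": 360, "TG": 360, "BE": 240, "BG": 240,
            "GM": 240, "JB": 240, "NE": 240, "ZZ": 240}

# Excluded tokens per participant: (p, b, g, others), transcribed from Table 6.
excluded = {
    "AS": (0, 0, 1, 0), "NS": (9, 11, 0, 0), "TG": (1, 2, 2, 0),
    "BE": (0, 0, 0, 0), "BG": (1, 1, 0, 0), "GM": (0, 1, 2, 0),
    "JB": (2, 6, 1, 1), "NE": (0, 1, 0, 0), "ZZ": (5, 1, 1, 0),
}

analyzed = {p: recorded[p] - sum(excluded[p]) for p in recorded}
excluded_by_sound = [sum(col) for col in zip(*excluded.values())]

print(analyzed)
print(excluded_by_sound, sum(excluded_by_sound), sum(analyzed.values()))
```

Most exclusions involve 〈p〉 and 〈b〉, which fits the note that devoicing mainly affected patent and batalgaa.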

Figure 3 VOT distribution of 〈b〉–〈p〉.

Figure 4 VOT distribution of 〈d〉–〈t〉.

Figure 5 VOT distribution of 〈g〉–〈k〉.

Figure 6 VOT distribution of 〈dz〉–〈ts〉.

Figure 7 VOT distribution of 〈ǰ〉–〈č〉.

are shown in Tables 7 and 8, respectively. The p-values in Tables 7 and 8 show the results of unpaired t-tests per speaker, and thus indicate whether the differences in VOT between voiced and voiceless consonants are statistically significant or not8). The independent variable is “voice” (voiced or voiceless) and the dependent variable is the VOT value. The statistical analysis was carried out with Microsoft Excel 2010.

As seen in the figures, on the whole the observed VOT values are larger than 0 ms, though there are a few


Table 7 Mean VOT values with standard deviations (in parentheses) for stops (in ms).

              Bilabial                              Dental                                Uvular / Velar
Participant   〈b〉          〈p〉          p-value    〈d〉          〈t〉          p-value    〈g〉          〈k〉          p-value
AS            9.3 (1.8)     43.7 (15.9)   p<.01      9.3 (1.7)     64.0 (17.7)   p<.01      31.1 (12.9)   60.4 (10.2)   p<.01
NS            9.0 (12.6)    42.9 (11.6)   p<.01      9.1 (3.0)     55.1 (12.2)   p<.01      21.6 (9.5)    59.5 (9.5)    p<.01
TG            16.4 (7.2)    43.8 (19.2)   p<.01      11.7 (3.4)    41.4 (13.0)   p<.01      27.9 (10.3)   54.0 (9.7)    p<.01
BE            10.2 (4.0)    61.7 (30.2)   p<.01      8.0 (1.6)     84.7 (19.5)   p<.01      15.0 (5.0)    65.9 (16.4)   p<.01
BG            14.3 (4.9)    43.4 (16.7)   p<.01      14.9 (2.9)    48.0 (17.3)   p<.01      26.7 (10.1)   53.5 (11.7)   p<.01
GM            3.2 (27.4)    36.6 (15.4)   p<.01      5.4 (28.5)    42.5 (12.9)   p<.01      27.1 (19.9)   48.8 (21.0)   p<.01
JB            2.9 (28.0)    58.4 (20.6)   p<.01      9.6 (14.4)    58.2 (18.9)   p<.01      36.5 (13.7)   60.9 (14.1)   p<.01
NE            8.9 (1.5)     61.3 (20.1)   p<.01      9.2 (1.5)     73.4 (19.3)   p<.01      35.0 (16.8)   89.7 (17.8)   p<.01
ZZ            10.5 (3.9)    54.3 (17.1)   p<.01      9.4 (2.4)     55.4 (12.6)   p<.01      20.5 (6.4)    62.5 (12.5)   p<.01
Average       9.9 (13.7)    48.8 (20.8)   p<.01      9.7 (10.4)    57.4 (20.6)   p<.01      26.8 (13.7)   61.2 (17.1)   p<.01

exceptions. It seems safe to conclude that phonetically, voiceless obstruents are voiceless aspirated consonants and voiced obstruents are voiceless unaspirated consonants.

Figures 3 and 4 show that VOT values of 〈b〉 and 〈d〉 distribute around 0 to +20 ms. This means that phonetically these consonants are voiceless unaspirated consonants. In contrast, VOT values of 〈p〉 and 〈t〉 are longer than those of 〈b〉 and 〈d〉, and therefore 〈p〉 and 〈t〉 should be characterized as voiceless aspirated consonants. As shown in Table 7, the differences in VOT between 〈b, d〉 and 〈p, t〉 are statistically significant in each participant.

Figure 5 shows that 〈g〉 and 〈k〉 differ from each other in their distribution of VOT, in the same fashion as 〈b–p〉 and 〈d–t〉. All participants pronounced 〈k〉 with longer VOT than 〈g〉 and, again, the difference in VOT between 〈g〉 and 〈k〉 is statistically significant in each participant, as shown in Table 7. The mean VOT values of 〈g〉 and 〈k〉 are 26.8 ms and 61.1 ms respectively. This means that 〈g〉 is phonetically a voiceless unaspirated stop [q], while 〈k〉 is pronounced as a voiceless aspirated stop [kʰ]. Based on these observations, 〈g〉 [q] and 〈k〉 [kʰ] are obviously distinguished from each other at least phonetically.

It is clear from the above results that 〈k〉 was pronounced neither as [ɡ, ɢ] nor [k], but as [kʰ]. In addition, no tokens of 〈k〉 in this experiment were pronounced as [x]. In other words, the borrowed 〈k〉 was always pronounced as [kʰ] and never changed to the native sounds [x, ɡ, ɢ], nor retained as Russian [k], at least in the experiment here. This result is quite different from the description in Svantesson et al. (2005, pp. 30–31) (see (1) in Section 2). This issue is discussed in Section 4.3.

VOT distribution of affricates is shown in Figures 6 and 7. VOT values of affricates are generally longer than those of stops, mainly because affricates include the frication interval after stop release.


Table 8 Mean VOT values with standard deviations (in parentheses) for affricates (in ms).

              Dental                                Post-alveolar
Participant   〈dz〉         〈ts〉          p-value    〈ǰ〉          〈č〉           p-value
AS            45.9 (8.8)    85.7 (15.2)    p<.01      44.9 (13.6)   87.7 (23.2)    p<.01
NS            37.3 (7.1)    68.8 (13.2)    p<.01      42.4 (9.1)    76.1 (16.7)    p<.01
TG            44.8 (7.4)    68.9 (11.8)    p<.01      42.0 (9.8)    65.6 (12.4)    p<.01
BE            43.9 (9.7)    102.0 (18.6)   p<.01      45.2 (11.6)   91.2 (21.1)    p<.01
BG            53.7 (7.0)    90.0 (16.5)    p<.01      50.9 (13.8)   84.4 (18.6)    p<.01
GM            36.6 (32.4)   71.6 (12.6)    p<.01      42.4 (19.6)   81.0 (19.3)    p<.01
JB            41.6 (6.5)    89.1 (16.9)    p<.01      40.9 (10.2)   77.0 (16.8)    p<.01
NE            82.6 (15.7)   153.2 (31.9)   p<.01      52.1 (13.2)   132.8 (36.3)   p<.01
ZZ            42.5 (7.0)    85.3 (14.4)    p<.01      33.3 (9.1)    68.7 (15.7)    p<.01
Average       46.9 (18.1)   88.2 (29.2)    p<.01      43.7 (13.3)   83.7 (27.3)    p<.01
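The per-speaker p-values in Tables 7 and 8 come from unpaired t-tests. As a rough illustration of the size of the effect, the sketch below computes an unpaired t statistic from the summary statistics for one cell of Table 7 (participant AS, 〈b〉 vs. 〈p〉). Several points are assumptions rather than facts from the paper: the source does not say whether equal variances were assumed, so Welch's unequal-variance form is used here; the reported standard deviations are treated as sample standard deviations; and n = 36 per consonant is derived from the design (4 words × 3 reading conditions × 3 repetitions, with no 〈b〉 or 〈p〉 exclusions for AS in Table 6). The function name `welch_t` is hypothetical.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unpaired t statistic and approximate degrees of freedom
    computed from group means, sample SDs, and group sizes."""
    v1 = sd1 ** 2 / n1   # squared standard error of group 1
    v2 = sd2 ** 2 / n2   # squared standard error of group 2
    t = (mean2 - mean1) / math.sqrt(v1 + v2)
    # Welch–Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Participant AS in Table 7: <b> 9.3 (1.8) ms vs. <p> 43.7 (15.9) ms, n = 36 each (derived).
t_stat, df = welch_t(9.3, 1.8, 36, 43.7, 15.9, 36)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A t statistic of this magnitude lies far beyond the two-tailed critical value for p = .01 at these degrees of freedom (roughly 2.7), which is in line with the p<.01 reported in the table.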

4.2 Difference in VOT by Place of Articulation

Next, I examine the difference in VOT values among different places of articulation. Figure 8 shows the mean VOT values of each consonant.

Figure 8 Mean VOT values of each consonant.

As Figure 8 shows, mean VOT values vary not only between manners of articulation (stop or affricate), but also among places of articulation (see also Figures 3–5). As for voiceless consonants, VOT is longest for 〈k〉, and shortest for 〈p〉. According to the Tukey–Kramer method (with place of articulation as the independent variable and VOT as the dependent variable), the difference between 〈p〉 and 〈t〉 and that between 〈p〉 and 〈k〉 are statistically significant (p<.01), while the difference between 〈t〉 and 〈k〉 is not (n.s.). As for voiced consonants, VOT is significantly longer for 〈g〉 than for 〈b〉 and 〈d〉 (Tukey–Kramer: p<.01), but 〈b〉 and 〈d〉 are comparable (Tukey–Kramer: n.s.)9).

These tendencies accord well with the observation by Kent and Read (1992, p. 114) and Cho and Ladefoged (1999, p. 218): velars (and uvulars) have the longest VOT. This result suggests that the remark by Svantesson et al. (2005, p. 12) that the uvular stop /ɢ/ (and probably, also the velar stop /g/) is phonetically voiced is disconfirmed by the current VOT data; instead, /g/ and /ɢ/ are better characterized as voiceless consonants.
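For readers unfamiliar with the Tukey–Kramer method, its standard textbook form is given below. The notation is assumed here and is not taken from the paper: \bar{y}_i and n_i are the mean VOT and number of tokens for place of articulation i, MS_W is the within-group mean square from the one-way ANOVA, k is the number of places compared, and N is the total number of tokens.

```latex
q_{ij} \;=\; \frac{\lvert \bar{y}_i - \bar{y}_j \rvert}
                  {\sqrt{\dfrac{MS_W}{2}\left(\dfrac{1}{n_i} + \dfrac{1}{n_j}\right)}},
\qquad
\text{places } i \text{ and } j \text{ differ significantly if } q_{ij} > q_{\alpha;\,k,\,N-k}
```

Here q_{\alpha; k, N-k} is the critical value of the studentized range distribution. The 1/n_i + 1/n_j term is what allows unequal group sizes, which matters in this study because the number of analyzed tokens differs across consonants after exclusions.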


4.3 Phonological Implications of the Phonetic Realization of 〈k〉

As shown in section 4.1, 〈k〉 and 〈g〉 differ from each other in their distribution of VOT, and 〈k〉 in loan-words was always pronounced as [kʰ]. This means that there was a phonetic contrast between 〈k〉 and 〈g〉, and 〈k〉 was never replaced by native sounds such as [x, ɡ, ɢ]. This behavior of 〈k〉 in loan-words is similar to that of 〈p〉, which occurs only in loan-words (and some onomatopoeia) and is pronounced as [pʰ]. If /pʰ/ is regarded as a (marginal) phoneme, it seems to be reasonable to accept the existence of a (marginal) phoneme /k/ and a phonemic contrast between /k/ and /g/.

It needs to be noted, however, that the participants in this experiment were all relatively young speakers (from 17 to 20 years old), while the investigation in Svantesson et al. (2005) was conducted in 1990 and the ages of the informants were 21, 26, and 36 years at that time. The difference between the present study and Svantesson et al. (2005) is likely to be ascribed to the generational difference. In addition, the participants in this experiment had learned English, which has a phonemic contrast between /k/ and /g/. In general, the young in Mongolia learn English at school and have more or less knowledge of English sounds. It is therefore predicted that young Mongolian speakers generally have a phonemic contrast between /k/ and /g/. On the other hand, speakers who have little or no knowledge of English (or Russian) sounds may not have this contrast. In addition, Van Alphen and Smits (2004) and Ringen and Kulikov (2012) point out that the degree of voicing can be influenced by a second language. In order to clarify the influence of other languages or of speakers’ generation on the phonological system in Mongolian, extensive sociolinguistic research is needed. This issue is beyond the scope of this study. The conclusion here is that young Mongolian speakers distinguish 〈k〉 [kʰ] from 〈g〉 [q] phonetically.

5. Conclusion

This study examined the VOT values and VOT distribution of word-initial obstruents in Mongolian, in order to answer the following questions.

(4) (=(2))
    a. What are the distributions of the VOT values of each obstruent?
    b. Do the VOT values vary substantially among places of articulation?
    c. Can the observation that the velar and uvular stops are phonetically voiced be confirmed by actual VOT data?
    d. Is it true that the borrowed 〈k〉 is pronounced either as [x, ɢ, ɡ] (i.e., substitution with the native phones) or [k] (like Russian /k/)?

The answers to these questions are as follows:

(5) a. VOTs of 〈b, d, g, dz, ǰ〉 are generally positive, and thus these obstruents should be phonetically regarded as voiceless unaspirated consonants. VOTs of 〈p, t, k, ts, č〉 are larger than those of 〈b, d, g, dz, ǰ〉, and thus 〈p, t, k, ts, č〉 should be phonetically characterized as voiceless aspirated consonants.
    b. VOT values vary by place of articulation. VOTs of velar and uvular stops are larger than those of bilabial and dental stops, the same tendency pointed out by Kent and Read (1992).
    c. There is no phonetic reason to consider the velar and uvular stops as voiced consonants from the perspective of the VOT data.
    d. Contrary to the previous descriptions, 〈k〉 is pronounced neither as [x, ɢ, ɡ] nor [k], but as [kʰ], in contrast with 〈g〉 [q], at least among young speakers.

These results support Svantesson and Karlsson’s claim that the distinctive feature that distinguishes the two series of obstruents in Mongolian is aspiration. However, the acoustic characteristics need to be further explored before that conclusion can be drawn firmly, especially from the perspective of perception. No perceptual investigation, as far as I know, has been done so far on the phonemic contrast of the two series of obstruents in Mongolian; thus, perceptual experiments as well as more extensive research on speech production remain to be done to clarify what distinctive features lie behind this contrast.

Acknowledgements

I thank two anonymous reviewers for their helpful and constructive comments. My thanks also go to the students and teachers at the Mongolian University of Science and Technology for their cooperation in the experiment. This research was supported by JSPS KAKENHI Grant Number 17J06051.


Notes

1) Mongolian fricatives (/s/ and /x/) do not have voiced phonemic counterparts. In this paper, the term “obstruent” thus includes only stops and affricates.
2) The Mongolian orthographic system uses Cyrillic letters.
3) In this table, palatalized consonants (/bʲ/, for example) have been omitted because they are not related to this study.
4) It is true, however, that the velar and uvular stops function phonologically as voiced consonants /g, ɢ/ from the perspectives of phonotactics and sonority (Svantesson et al. 2005, pp. 65–68).
5) This possibility was pointed out by one of the reviewers.
6) In morpheme-final position, both [ɡ] and [ɢ] can occur in words with /a, ɔ, ʊ/ and there is a phonemic contrast between /g/ and /ɢ/. These are distinguished in orthography by absence or presence of a mute vowel (for example, 〈bag〉 /bag/ ‘team’ vs. 〈baga〉 /baɢ/ ‘small’).
7) In this experiment, devoicing mainly occurred in the target words patent ‘patent’ and batalgaa ‘proof’. The reason is that the vowel in the initial syllable is followed by 〈t〉, which has preaspiration.
8) Unpaired rather than paired t-tests were used, since some data were excluded from the analysis, and the number of data points for a voiced consonant is not necessarily the same as that for the paired voiceless consonant.
9) Here again, the statistical analyses were carried out with Microsoft Excel 2010.

References

Boersma, P. and D. Weenink (2012) Praat: Doing phonetics by computer (Version 5.3.23). http://www.praat.org/
Cho, T. and P. Ladefoged (1999) “Variation and universals in VOT: Evidence from 18 languages.” Journal of Phonetics 27(2), 207–229.
Janhunen, J. A. (2012) Mongolian. Amsterdam: John Benjamins Publishing Company.
Karlsson, A. M. and J. Svantesson (2011) “Preaspiration in Mongolian dialects: Acoustic properties of contrastive stops.” Paper Presented at The 10th Seoul International Altaistic Conference, 125–140.
Karlsson, A. M. and J. Svantesson (2012) “Aspiration of stops in Altaic languages: An acoustic study.” Altai Hakpo 22, 205–222.
Kent, R. D. and C. Read (1992) The acoustic analysis of speech. San Diego: Singular Publishing Group.
Lisker, L. and A. S. Abramson (1964) “A cross-language study of voicing in initial stops: Acoustical measurements.” Word 20, 384–422.
Luvsanvandan, Š. (1964) “The Khalkha-Mongolian phonemic system.” Acta Orientalia Academiae Scientiarum Hungaricae 17, 175–185.
Möömöö, S. (1979) Mongol xelnii awian züi [ in Mongolian]. Ulaanbaatar: Ulsiin Xewleliin Gazar.
Poppe, N. (1970) Mongolian language handbook. Washington, D. C.: Center for Applied Linguistics.
Ringen, C. and V. Kulikov (2012) “Voicing in Russian stops: Cross-linguistic implications.” Journal of Slavic Linguistics 20(2), 269–286.
Sambuudorj, O. (2012) Mongol xelnii ügiin duudlagiin toli [Pronouncing dictionary of Mongolian]. Ulaanbaatar: Monsudal Xewleliin Gazar.
Sanders, A. J. K. and J. Bat-Ireedüi (1999) Colloquial Mongolian: The complete course for beginners. London: Routledge.
Shimizu, K. (1996) A cross-language study of voicing contrasts of stop consonants in Asian languages. Tokyo: Seibido.
Shiotani, S. and E. Prevjav (2001) Shokyuu mongorugo [Mongolian for beginners]. Tokyo: Daigaku Shorin.
Stuart, D. G. and M. M. Haltod (1957) “The phonology of the word in modern standard Mongolian.” Word 13, 65–99.
Svantesson, J. and A. M. Karlsson (2012) “Preaspiration in modern and old Mongolian.” Suomalais-ugrilaisen Seuran Toimituksia 264, 453–464.
Svantesson, J., A. Tsendina, A. M. Karlsson and V. Franzén (2005) The phonology of Mongolian. Oxford: Oxford University Press.
Takada, M. (2011) Nihongo no gotoo heisaon no kenkyuu: VOT no kyoojiteki bumpu to tsuujiteki henka [Research on word-initial stops in Japanese: Synchronic distribution and diachronic changes of VOT]. Tokyo: Kurosio Publishers.
Tsoloo, J. (1976) Orchin tsagiin Mongol xelnii awia züi [Phonetics in modern Mongolian]. Ulaanbaatar: Mongolian Academy of Science.
Van Alphen, P. M. and R. Smits (2004) “Acoustical and perceptual analysis of the voicing distribution in Dutch initial plosives: The role of prevoicing.” Journal of Phonetics 32(4), 455–491.
Yamakoshi, Y. (2012) Kuwashiku wakaru mongorugo bumpoo (CD tsuki) [Understandable Mongolian grammar (with CD)]. Tokyo: Hakusuisha.

(Received Apr. 5, 2018, Accepted Aug. 14, 2018)
