Voicing and Geminacy in Japanese: an Acoustic and Perceptual Study*
Shigeto Kawahara
University of Massachusetts, Amherst

* I am thankful to José Benki, Kathryn Flack, Ben Gelbart, Joe Pater, Chris Potts, John McCarthy, Caren Rotello and Taka Shinya for their comments and suggestions on this project. I would like to thank especially John Kingston for his support in virtually every aspect of the studies reported here. Finally, I wish to express my gratitude to all the Japanese speakers who participated in the experiments reported below. All remaining errors are mine.

1. Introduction

Maintaining voicing in obstruents is articulatorily challenging. During obstruent closure, intraoral air pressure rises quickly, and as a consequence it becomes difficult to maintain a transglottal air pressure drop sufficient to produce voicing. This difficulty is more severe in geminates, which have long closures (Hayes and Steriade 2004; Jaeger 1978; Ohala 1983; Westbury 1979). Reflecting this articulatory difficulty, Japanese historically allowed no voiced geminates.

Various alternations support this distributional restriction. Coda nasalization in mimetic gemination, as in (1b), applies to avoid voiced geminates (Kuroda 1965; Itô and Mester 1999). A root-final vowel in Sino-Japanese is syncopated in compounds, with subsequent place assimilation of the root-final consonant to the following consonant, as shown in (2a) (Itô and Mester 1996); however, such syncope is blocked when it would result in a voiced geminate, as in (2b):

(1) a. /tapu+µ+ri/ → [tappuri] *[tampuri] ‘a lot of’
    b. /zabu+µ+ri/ → [zamburi] *[zabburi] ‘splashing sound’

(2) a. /hatu+kaku/ → [hakkaku] *[hatsukaku] ‘revelation’
    b. /hatu+gen/ → [hatsugen] *[haggen] ‘remarks’

Although a restriction against voiced geminates in Japanese is clearly motivated, voiced geminates do occur in recent loanwords (McCawley 1968; Itô and Mester 1999, among others). A word-final consonant in the donor language is geminated and followed by an epenthetic vowel, even when this results in a voiced geminate, as in ‘dog’ borrowed as [doggu] (Lovins 1973; Katayama 1998; Shirai 1999; Takagi and Mann 1994). Such a voiced geminate minimally contrasts with a voiceless geminate; minimal pairs like [kiddo] ‘kid’ and [kitto] ‘kit’ show that voicing is indeed phonemic for geminates in the loanword phonology.

Yet, as mentioned above, voicing in geminates is aerodynamically challenging, and there is evidence that voicing in singletons and voicing in geminates behave differently in Japanese phonology: Nishimura (2003) and Kawahara (2005) show that voiced geminates, but not voiced singletons, devoice when they co-occur with another voiced obstruent. In other words, only voicing in geminates can be lost in response to the OCP(voi), which prohibits more than one voiced obstruent within a stem (e.g. Itô and Mester 2003).

(3) Voicing in geminates can optionally be lost in response to the OCP(voi)
    gebberusu ~ gepperusu ‘Göbbels (proper name)’
    guddo ~ gutto ‘good’
    beddo ~ betto ‘bed’
    doggu ~ dokku ‘dog’
    baggu ~ bakku ‘bag’

(4) Voicing in singletons is not lost
    bagii ‘buggy’        bogii ‘bogey’
    bobu ‘Bob’           doguma ‘dogma’
    dagu ‘Doug’          daibu ‘dive’
    giga ‘giga- (10^9)’  gaburieru ‘Gabriel’

To account for this asymmetry, following the P-Map hypothesis (Steriade 2001), Kawahara (2005) hypothesizes that voicing in geminates is more easily lost because it is harder to hear. Cross-linguistically, contrasts that are signaled by weaker cues are more prone to phonological neutralization (Hura et al. 1992; Jun 2004; Kohler 1990).
For example, preconsonantal consonants are at a disadvantage in signaling their place: they lack the CV transitions that provide primary cues for place distinctions, and they are often unreleased, which further weakens place cues (see Jun 2004 and references cited therein). As is well known, preconsonantal consonants undergo place neutralization much more often than prevocalic consonants. Kawahara (2005) applies the same logic to explain the contrast between (3) and (4). He hypothesizes that voicing is harder to detect in geminates and is therefore more prone to phonological neutralization. More concretely, the [atta]~[adda] contrast is less reliably perceived than the [ata]~[ada] contrast, so a change from [adda] to [atta] would have little perceptual impact, while a change from [ada] to [ata] would be more perceptually conspicuous. In other words, neutralizing voicing in geminates is regarded as “perceptually tolerated articulatory simplification” (Kohler 1990): since voicing in geminates is hard to perceive, its loss does not have a large perceptual consequence and is hence tolerated. Just as preconsonantal consonants are more likely to undergo place neutralization than prevocalic consonants, perceptually weak voicing in geminates is more easily lost than more robustly cued voicing in singletons. See Kawahara (2005) on why the loss of voicing in (3) cannot be purely due to the articulatory difficulty of voicing in geminates.

This paper reports phonetic studies that aim to verify the hypothesis that voicing in geminates is less reliably perceived than voicing in singletons. As little is known about voicing cues in Japanese voiced geminates, I began with an acoustic experiment that identified a set of acoustic cues that distinguish voiceless and voiced consonants. The primary aim of this experiment was to see whether such cues manifest themselves differently in singletons and geminates, and if so, in what ways.
In other words, the experiment looked for acoustic evidence bearing on whether voicing is harder to detect in geminates. The results of this experiment show that some cues are indeed weakened in geminates, which might lead to higher confusability of voicing in geminates.

With these observations in mind, the second experiment more directly tested the core hypothesis of this paper, which is that voicing is harder to detect in geminates than in singletons. In order to replicate as closely as possible the natural environments in which Japanese listeners hear voiced geminates, the natural tokens recorded in the first experiment were used. In the experiment, Japanese speakers identified the presence (or absence) of voicing in a noisy environment. The result clearly shows that voicing is hard to perceive in geminates, while voicing in singletons is accurately perceived.

In summary, the two experiments reported in this paper show the following points:

(5) a) Some phonetic correlates of voicing are weakened in geminates.
    b) Some phonetic differences are enhanced in vowels next to geminates.
    c) Voicing in geminates is not well perceived.

2. Experiment I: Acoustics of voicing and geminacy

The first experiment was designed to investigate the following three questions:

(6) 1. What are the phonetic cues that signal voicing in Japanese?
    2. How do such cues differ between singletons and geminates?
    3. Do geminates have a disadvantage in signaling voicing?

2.1. Methods

2.1.1. The speakers and recording

Three native speakers of Japanese were recruited at the University of Massachusetts, Amherst. They were all female and in their mid-twenties. Informed consent was obtained from each speaker in accordance with the University of Massachusetts human research subjects guidelines. The dialects the subjects spoke were Shizuoka Japanese (Speaker E), Tokyo Japanese (Speaker T) and Hiroshima Japanese (Speaker W).
The frame sentence used in the experiment was in Standard (Tokyo) Japanese, and the subjects were asked to read the sentences in Standard Japanese as well. They were all paid for their time. The speech was recorded through a microphone (MicroMic II C420 by AKG) onto a CD recorder (TASCAM CD RW-700) in a sound-attenuated booth at the University of Massachusetts. The recorded tokens were then digitized at a 22.050 kHz sampling rate with 16-bit quantization. Including short breaks between repetitions, the recording session lasted about 45 minutes.

2.1.2. The stimuli

The stimuli consisted of 36 words, which were mostly nonce words.1 In addition, 36 nonce words were added as fillers. The target words were all disyllabic: the first consonant was [k], the second consonant was the target ([p], [t], [k], [pp], [tt], [kk], [b], [d], [g], [bb], [dd], [gg]), and three different vowels ([a], [e], [o]) were used for both the first syllable and the second syllable (henceforth V1 and V2, respectively); some examples are kappa, kaba, kege, kokko, kodo. The speakers were asked to pronounce these tokens with an HL tonal contour, which is the default pattern in loanword and nonce word pronunciation. Each word was written on a card in katakana orthography, which is conventionally used for loanwords. This was because voiced geminates are found only in loanwords. Six repetitions of each set were recorded, with a short break between repetitions. The order of the stimuli was randomized after each repetition.

In order to elicit natural utterances and avoid domain-edge strengthening effects on target words (e.g. Fougeron and Keating 1997), the stimuli were embedded in the following frame sentence:

(7) jyaa  ____  de    onegai
    then  ____  with  please
    ‘Please, (do something) with ___. (casual register)’

In order to avoid hyper-articulation of the materials in an experimental environment, the speakers were encouraged to produce sentences in a natural speech style.
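For concreteness, the design of the target stimuli in Section 2.1.2 can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure actually used to prepare the cards: it assumes that V1 and V2 are identical within a word (as in the cited examples kappa, kaba, kege, kokko, kodo, and as the count 12 consonants x 3 vowels = 36 suggests), it works with romanized forms rather than katakana, and it omits the 36 filler words, whose forms are not listed here. The function names and the fixed random seed are hypothetical.

```python
import itertools
import random

# Target consonants and vowels as listed in Section 2.1.2.
TARGETS = ["p", "t", "k", "pp", "tt", "kk", "b", "d", "g", "bb", "dd", "gg"]
VOWELS = ["a", "e", "o"]

# Frame sentence (7), romanized; "{}" marks the slot for the target word.
FRAME = "jyaa {} de onegai"


def build_targets():
    """Return the 36 target words of the form k-V-C-V.

    Assumption: V1 and V2 are the same vowel, which yields 12 x 3 = 36 words.
    """
    return ["k" + v + c + v for c, v in itertools.product(TARGETS, VOWELS)]


def build_session(words, repetitions=6, seed=0):
    """Return one list of framed sentences per repetition.

    The order is reshuffled independently for each repetition; the seed is a
    hypothetical choice that only makes the sketch reproducible.
    """
    rng = random.Random(seed)
    session = []
    for _ in range(repetitions):
        block = list(words)
        rng.shuffle(block)
        session.append([FRAME.format(w) for w in block])
    return session


if __name__ == "__main__":
    targets = build_targets()
    session = build_session(targets)
    print(len(targets), "target words,", len(session), "repetitions")
    print(session[0][:3])
```

Each repetition contains every target word exactly once, so six repetitions give 216 target utterances per speaker before fillers are added.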
Specifically, they were asked to imagine a situation where they were preparing a party and they wanted