Spectral Measures for Sibilant Fricatives of English, Japanese, and Mandarin Chinese
Total Page:16
File Type:pdf, Size:1020Kb
SPECTRAL MEASURES FOR SIBILANT FRICATIVES OF ENGLISH, JAPANESE, AND MANDARIN CHINESE Fangfang Li a, Jan Edwards b, & Mary Beckman a aDepartment of Linguistics, OSU & bDepartment of Communicative Disorders, UW {cathy|mbeckman}@ling.ohio-state.edu, [email protected] ABSTRACT English /s/ and / Ѐ/ can be differentiated by the Most acoustic studies of sibilant fricatives focus on spectral properties of the frication itself [1, 8]. As languages that have a place distinction like the mentioned earlier, English /s/ and / Ѐ/ contrast in English distinction between coronal alveolar /s/ place, with the narrowest lingual constriction being and coronal post-alveolar / Ѐ/. Much less attention made more backward in the oral cavity for / Ѐ/ than has been paid to languages such as Japanese, for /s/. The longer front cavity in producing / Ѐ/ where the contrast involves tongue posture as lowers the overall frequency for the major energy much as position. That is, the Japanese sibilant that concentration in the fricative spectrum. The length an alveolopalatal fricative difference is further enhanced by lip protrusion in ,/נ / contrasts with /s/ is that has a “palatalized” tongue shape (a bunched /Ѐ/, so that it has been consistently observed that predorsum). This paper describes measures that there is more low-frequency energy for the / Ѐ/ can be calculated from the fricative interval alone, spectrum and more high-frequency energy for /s/ which we applied both to the place distinction of [8, 16]. English and the “palatalization” or posture This generalization about a difference in energy distinction of Japanese. The measures were further tested on Mandarin Chinese, a language that has a distribution between /s/ and / Ѐ/ can be captured three-way contrast in sibilant fricatives contrasting effectively by the centroid frequency, the first in both tongue position and posture. moment or mean frequency when the power spectrum is treated as a probability distribution [3, Keywords: Spectral analysis, Sibilant fricatives 9, 15]. This is a measure that negatively correlates with the length of the front resonating cavity, and 1. INTRODUCTION thus roughly describes where the constriction is Although both English and Japanese have a 2-way made relative to the length of the oral cavity. contrast in sibilant fricatives, they contrast in This interpretation of the centroid as primarily different articulatory aspects. In English, dental or reflecting front-cavity resonances fails in fricatives alveolar /s/ contrasts with coronal post-alveolar /S/ such as the Mandarin retroflex / ß/. The short slack in tongue position, with the narrowest constriction constriction used in producing / ß/ allows coupling for / S/ being further back in the oral cavity than of the front cavity with the back cavity and that for /s/, whereas in Japanese, dental /s/ therefore creates a low-frequency prominence in primarily in the fricative spectrum [17]. If the centroid is to be /נ / contrasts with alveolopalatal tongue posture, with the front of the tongue body calculated over the whole spectrum, the low being bunched up towards the palate in producing centroid value reflects the back cavity resonances .but not in the production of /s/ [7, 10, 18]. as much as it does the front cavity resonances ,/נ/ Therefore, this measure needs to be refined in Mandarin Chinese is a language that has a 3-way order to assess front cavity resonances in the contrast among dental/alveolar /s/, alveolopalatal sibilant fricatives of Mandarin Chinese. To assess a tongue posture distinction, such as /נ / and retroflex postalveolar / ß/, where ,/נ/ contrasts with /s/ and / ß/ in posture, and / ß/ that in Japanese, Polish, or Mandarin Chinese, in place [11, 18]. several researchers [4, 7] have proposed using the /נ / contrasts with /s/ and Extensive research has been done on languages F2 frequency taken at the onset of the vowel that have a place distinction in sibilant fricatives, following the fricative to index the length of the especially English. It has been suggested that back cavity in the fricative. A “palatalized posture” creates a good-quality loud-speakers connected to the audio /נ / in producing alveolopalatal fricative long palatal channel and a shorter back cavity, port of an IBM Thinkpad notebook computer, which yields higher onset F2 frequency than along with a picture prompt presented on the otherwise. However, this onset F2 frequency computer screen. All productions were recorded measure varies much more with the following using a Marantz 660 Flashcard recorder. vowel than do measures taken from the fricative 2.2. Developing Spectral Measures interval, which makes it less appropriate for cross-linguistic comparison. Also, it cannot be a Halle & Stevens [7] examine acoustic properties of and retroflex /נ / reliable measure for the Tokyo dialect of Japanese, the Polish alveolopalatal fricative where vowels are frequently devoiced or even fricative / ß/, and point out that the two fricatives deleted [13], as shown in Fig. 1. Therefore, in this differ in the presence or absence of a spectral peak paper, we explore better measures that can be in the F2 region of the fricative noise spectrum. calculated from the fricative interval alone, to see This is because the “palatalized” posture in whether any could be applied both to the place forms a long palatal channel, which /נ / producing contrast between the two coronal fricatives of English and of Mandarin, and to the posture effectively prevents acoustic coupling to the back contrast between coronal fricatives and the cavity, and results in little resonance in the F2 alveolopalatal fricative of Japanese and Mandarin. region of the fricative spectrum. By contrast, the short constriction made by the retroflex fricative / ß/ Figure 1: Production of sakana ‘fish’ by a male adult speaker of Tokyo Japanese, with the first [a] devoiced. yields a spectral peak in the F2 region owing to the considerable acoustic coupling with the back 5000 cavity. Halle and Stevens locate the F2 region at 4000 around 1500 Hz, the corresponding F2 of a neutral 3000 vowel as produced by typical American white male speakers [2]. Their results agree with those of 2000 Fujisaki & Kunisaki [5], who model the acoustics 1000 of voiceless fricatives in Japanese, and show that at a point around /נ / adding a zero in the model for 0 ./from /s /נ / s a k a n a 1500 Hz effectively differentiates 0.3 0.6 In this study, two spectral parameters are time (seconds) proposed. Both measures are taken from Bark spectra calculated over 20 ms windows from the 2. METHODS middle of the fricative noise interval. The first is the Amplitude Ratio (“ampRatio” hereafter), which 2.1. Materials and Data collection is the difference in dB between the amplitude of The materials were lists of words beginning with the most prominent peak and the F2 amplitude. the target consonants followed by one of the five This measure is designed to assess the degree of vowels /a/, /i/, /o/, /e/ or /u/, if allowed by the “palatalization” — i.e., the tongue posture the value of ,/נ / phonotactics of the language. For example, difference. For alveolopalatal Japanese /s/ was not elicited in the context of /i/, ampRatio is predicted to be larger due to the lack since it is phonotactically illegal. Also, the English /i/, of resonance in the F2 region, whereas that of / ß/ is /e/, and /u/ contexts included both the tense and the expected to be much smaller due to the coupling lax vowels. There were three word targets for each between the front and the back cavities. vowel context. Thirty speakers aged from 18 to 30 This “ampRatio” is similar to the relative years were recruited for this study, with all U. S. amplitude measure developed by McGowan & English speakers recorded in Columbus, Ohio, all Nittrouer [14] for English children’s productions. Japanese speakers in Tokyo, and all Mandarin However, our “ampRatio” differs from their speakers in Songyuan, China, to ensure dialect relative amplitude measure in how we identify the homogeneity. For each language, there were 5 F2 region. McGowan and Nittrouer measured an males and 5 females. Productions were elicited in a observable F2 peak in the spectrum, with the F2 word-repetition task, with each word prompted by peak being usually prominent in children’s speech. a digitized audio stimulus played through In this paper, since what we look for is the the oral cavity, and also that the tongue shape is amplitude of an F2 anti-formant, it is hard to locate apical, which allows much coupling with the back are separated /נ / it directly from the spectrum. Instead, the cavity. At the same time, /s/ and /נ/-/frequency region of F2 was calculated by taking jointly by the two parameters, mimicking the /s the 3/5 ratio of the F3 frequency of the back vowel contrast pattern in Japanese. The LDA again /a/ for each speaker. This approximates the F2 for a correctly identified 92% of the tokens. theoretical neutral vowel, which Japanese and Mandarin Chinese do not have. 4. DISCUSSION The second measure is a refinement of a standard measure for the place distinction. It is the Our results showed that ampRatio and CentHigh centroid frequency calculated for the frequency can effectively capture the differences among band above this “F2 region” (CentHigh hereafter). sibilant fricatives that contrast in two different Since spectral energy in the low frequency range aspects of articulatory gestures across the three can reflect a coupled back cavity resonance, languages. These parameters also showed robust calculating the centroid only in the high frequency invariance across vowel contexts and speakers, as band makes it a more precise measure of front demonstrated by the fairly clear separation cavity resonance. More specifically, for example, between fricatives in each of the panels in Fig.