1 an Acoustic Analysis of Labialization of Coronal Nasal Consonants in American English by Elisabeth A. Hon B.S.E., Electrical E
Total Page:16
File Type:pdf, Size:1020Kb
An Acoustic Analysis of Labialization of Coronal Nasal Consonants in American English by Elisabeth A. Hon B.S.E., Electrical Engineering Princeton University, 2003 Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology September 2005 © 2005 Massachusetts Institute of Technology All rights reserved Signature of Author………………………………………………………………………… Department of Electrical Engineering and Computer Science August 31, 2005 Certified by………………………………………………………………………………… Kenneth N. Stevens Clarence J. LeBel Professor of Electrical Engineering Thesis Supervisor Accepted by………………………………………………………………………………... Prof. A.C. Smith Chair, Committee on Graduate Students 1 An Acoustic Analysis of Labialization of Coronal Nasal Consonants in American English by Elisabeth A. Hon Submitted to the Department of Electrical Engineering and Computer Science on August 31, 2005 in Partial Fulfillment of the Requirements for the Degree of Master of Science in Electrical Engineering and Computer Science ABSTRACT A challenge for speech recognition models is to account for the variation between natural connected speech forms and the canonical forms of the lexicon. This study focuses on one particular sound change common in conversational speech, in which word-final coronal nasal consonants undergo place assimilation toward following word-initial labial consonants. Formant frequency measurements were taken from words ending with coronal nasal consonants in potentially assimilating sentence contexts, and identical words ending in labial nasal consonants, across vowel contexts. The frequency of the second formant at vowel offset and during nasal closure was found to be sufficient to discriminate between underlying forms. There was evidence that even strongly- assimilated coronal segments differ on the basis of these cues from their pure labial counterparts. It is hypothesized that listeners can use these acoustic cues to uncover the intended place of articulation of assimilated segments, without recourse to phonological inference or sentence context. Thesis Supervisor: Kenneth N. Stevens Title: Clarence J. LeBel Professor of Electrical Engineering 2 Table of Contents Title Page ………………………………………………………………………….. 1 Abstract ……………………………………………………………………………. 2 Table of Contents …………………………………………………………………. 3 List of Tables ……………………………………………………………………… 4 List of Figures ……………………………………………………………………... 4 Acknowledgements ………………………………………………………………... 5 1. Introduction 1.1 Place Assimilation …………………………………………………..... 6 1.2 Susceptibility of Coronal Nasals ……………………………………... 8 1.3 Perceptual Studies …………………………………………………... 10 1.4 Articulatory Studies ………………………………………………… 13 1.5 Acoustic Studies …………………………………………………...... 14 1.5.1 Acoustic Cues for Place of Articulation ………………...... 14 1.5.2 Recent Assimilation Studies ……………………………… 16 1.5.3 Current Study ……………………………………………... 17 2. Method 2.1 Recordings ………………………………………………………….. 19 2.2 Measurements ………………………………………………………. 22 3. Analysis and Results 3.1 Cues for Place of Articulation ………………………………………. 26 3.2 Analysis by Degree of Assimilation ………………………………... 36 4. Discussion 4.1 Problems Encountered ……………………………………………… 42 4.2 General Findings ……………………………………………………. 44 5. Conclusions and Future Work ……………………………………………….. 46 Appendices Appendix 1: Sentences Used in Recordings, Part 1 …………………….. 48 Appendix 2: Words Used in Recordings, Part 2 ………………………... 50 References ……………………………………………………………………….. 51 3 List of Tables Table 1: Average F2 frequencies for coronal and labial underlying forms …….. 30 Table 2: Average changes in F2 frequency from vowel midpoint ……………… 31 Table 3: Average F3 frequencies for coronal and labial underlying forms …….. 32 Table 4: Average changes in F3 frequency from vowel midpoint ……………… 33 Table 5: Frequency of place assimilation by speaker …………………………... 37 Table 6: F2 frequencies for speaker MA by assimilation score ………………… 38 Table 7: F2 frequencies for speaker LS by assimilation score …………………. 39 Table 8: F2 frequencies for speaker TH by assimilation score …………………. 40 Table 9: F2 frequencies for speaker KH by assimilation score ………………… 41 List of Figures Fig. 1: Position of DFT Hamming window …………………………………….. 22 Fig. 2: Comparison of spectrograms for “bead” and “bean” …………………… 24 Fig. 3: Spectrograms showing measurement points ……………………………. 28 Fig. 4: Plots of F2 frequencies at vowel offset and nasal closure ………………. 34 Fig. 5: Bar graphs of F2 frequencies with standard deviations …………………. 35 4 Acknowledgements I would like to thank my thesis supervisor, Prof. Ken Stevens, for his guidance and patience in my preparation of this study and this thesis. Many thanks also to Arlene Wint, an invaluable resource in the office, and to all of our office-mates. The helpful spirit and collegiality of the Speech Communication Group at MIT are a direct effect of Prof. Stevens’ kindness and openness, and are a great gift. I would like to thank Dr. David Gow, Jr. for sharing his work and ideas with me, and my subjects, whose patient contributions have made this study possible. I would also like to thank Yoko Saikachi and Tamas Bohm for keeping me company during long hours of measurements. Finally, I would like to thank Kevin Hunt for his companionship and patience during all my endeavors, and my family for their constant support. 5 1. Introduction 1.1 Place Assimilation An ever-present obstacle to the success of speech recognition models is the vast variation between canonical word forms and natural connected speech. One widespread source of variation has been termed “assimilation”, a process by which neighboring segments in connected speech become more similar to each other. In English, the assimilation of a word-final coronal segment to the place of articulation of a following labial or velar segment is common. For example, when the phrase “green beans” is spoken in casual speech, the /n/ in “green” may be produced with a labial closure similar to that of the /b/ in “beans”, making the phrase to some extent resemble “greem beans”. Such alteration in the surface form of spoken words can pose great difficulties to recognition systems which rely on comparison with canonical forms in a lexicon. Various theories as to the causes of assimilation have led to differing conclusions about the ability of the perceptual system to successfully recognize the intended forms. Roughly, these theories may be described in two groups: phonology-based theories and articulation-based theories. Many phonology-based theories of assimilation support the concept of “underspecification” in lexical representation. Under the theory of underspecification, only distinctive feature specifications which are not predictable from context are included in the abstract canonical word forms in the lexicon. Unmarked or default feature values are not specified in the underlying forms, but they are filled in during production if the feature nodes are still empty. Before they are filled in, they are free to assimilate the feature values of neighboring segments. When assimilated forms are presented to the perceptual system, they are successfully recognized as the underlying forms, since they do not mismatch the unspecified features. Lahiri and Marslen-Wilson (1991) claim to have demonstrated the reality of underspecification by examining vowel nasality in Bengali and English. Since vowel 6 nasality is predictable and non-distinctive in English, they reason that the feature [nasal] is never specified for English vowels. In Bengali, vowel nasality is distinctive, and nasal vowels must be specified as [nasal]. Their experiments confirm that, when hearing nasalized vowels, English and Bengali speakers treat them differently, conforming to the differences in specification in their lexicons. When hearing oral vowels, however, English and Bengali speakers treat them the same. Lahiri and Marslen-Wilson interpret this result as evidence that [oral] is the universal default feature value for vowels, and is therefore unspecified in both languages. Oral vowels are then free to assimilate the nasality of a following nasal segment without mismatching the underlying representation. Marslen-Wilson et al (1995) apply the theory of underspecification to their treatment of place assimilation in English. Since only coronal consonants regularly assimilate to the place of articulation of a following consonant, [coronal] is widely assumed by phonologists to be the universal default unspecified place feature (Avery & Rice, 1989; Paradis & Prunet, 1989). Since coronals are unspecified for place of articulation, they may freely assimilate to following labial or velar consonants. Labials and velars are specified for place of articulation and may not assimilate the place of another consonant. When the perceptual system is presented with a surface labial or velar consonant, the input is of course compatible with the labial or velar underlying form, but it is also compatible with the coronal underlying form, since it does not mismatch an unspecified feature. Accordingly, Marslen-Wilson et al (1995) found that listeners treat labial and velar surface forms, which could be derived from labial, velar, or assimilated coronal underlying forms, as more ambiguous than pure coronal surface forms, which could not be derived from assimilated labial or velar forms. The application of underspecification theories to place assimilation