Classification of Stop Consonant Place of Articulation
Total Page:16
File Type:pdf, Size:1020Kb
Classification of stop consonant place of articulation by Atiwong Suchato B.Eng., Chulalongkorn University (1998) S.M., Massachusetts Institute of Technology (2000) Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of M ASSACHUSETTS INSTITUTE Doctor of Philosophy OF TECHNOLOGY at the JUL 2 6 2004 Massachusetts Institute of Technology LIBRARIES June 2004 @ Massachusetts Institute of Technology 2004. All rights reserved. Author /Department of Electric ngineering and Computer Science April 27t', 2004 Certified by Kenneth N. Stevens ClarenweJ. LeBel Professor of Electrical Engineering an4 Professor of Healtl-,Sciences & Technology Thepigj$upervisor Accepted by Arthur C. Smith Chairman, Departmental Committee on Graduate Students BARKER This page is intentionally left blank. 2 Classification of Stop Consonant Place of Articulation by Atiwong Suchato Submitted to the Department of Electrical Engineering and Computer Science on April 27 th, 2004, in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electrical Engineering and Computer Science Abstract One of the approaches to automatic speech recognition is a distinctive feature-based speech recognition system, in which each of the underlying word segments is represented with a set of distinctive features. This thesis presents a study concerning acoustic attributes used for identifying the place of articulation features for stop consonant segments. The acoustic attributes are selected so that they capture the information relevant to place identification, including amplitude and energy of release bursts, formant movements of adjacent vowels, spectra of noises after the releases, and some temporal cues. An experimental procedure for examining the relative importance of these acoustic attributes for identifying stop place is developed. The ability of each attribute to separate the three places is evaluated by the classification error based on the distributions of its values for the three places, and another quantifier based on F-ratio. These two quantifiers generally agree and show how well each individual attribute separates the three places. Combinations of non-redundant attributes are used for the place classifications based on Mahalanobis distance. When stops contain release bursts, the classification accuracies are better than 90%. It was also shown that voicing and vowel frontness contexts lead to a better classification accuracy of stops in some contexts. When stops are located between two vowels, information on the formant structures in the vowels on both sides can be combined. Such combination yielded the best classification accuracy of 95.5%. By using appropriate methods for stops in different contexts, an overall classification accuracy of 92.1% is achieved. Linear discriminant function analysis is used to address the relative importance of these attributes when combinations are used. Their discriminating abilities and the ranking of their relative importance to the classifications in different vowel and voicing contexts are reported. The overall findings are that attributes relating to the burst spectrum in relation to the vowel contribute most effectively, while attributes relating to formant transition are somewhat less effective. The approach used in this study can be applied to different classes of sounds, as well as stops in different noise environments. Thesis supervisor: Professor Kenneth Noble Stevens Title: Clarence J. LeBel Professor of Electrical Engineering and Professor of Health Sciences and Technology 3 This page is intentionally left blank. 4 Acknowledgement It takes more than determination and hard work for the completion of this thesis. Supports I received from many people through out my years here at MIT undeniably played an important role. My deepest gratitude clearly goes to Ken Stevens, my thesis supervisor and my teacher. His genuine interest in the field keeps me motivated, while his kindness and understanding always help me go through hard times. Along with Ken, I would like to thank Jim Glass and Michael Kenstowicz for the time they spent on reading several drafts of this thesis and their valuable comments. I would like to specially thank Janet Slifka for sharing her technical knowledge, as well as her study and working experience, and Arlene Wint for her help with many administrative matters. I could hardly name anything I have accomplished during my doctoral program without their help. I also would like to thank all of the staffs in the Speech Communication Group, including Stefanie Shattuck-Hufnagel, Sharon Manuel, Melanie Matthies, Mark Tiede, and Seth Hall, as well as all of the students in the group, especially Neira Hajro, Xuemin Chi, and Xiaomin Mou. My student life at MIT would have been much more difficult without so many good friends around me. Although it is not possible to name all of them in this space, I would like to extend my gratitude to them all and wish them all the best. I would like to thank my parents who are always supportive. Realizing how much they want me to be successful is a major drive for me. Also, I would like to thank Mai for never letting me give up, and for always standing by me. Finally, I would like to thank Anandha Mahidol Foundation for giving me the opportunity to pursue the doctoral degree here at MIT. This work has been supported in part by NIH grant number DC 02978 5 This page is intentionally left blank. 6 Table of Contents Chapter 1 Introduction............................................................................................ 17 1.1 M otivation ..................................................................................................... 17 1.2 Distinctive feature-based Speech Recognition System.................................. 19 1.3 An Approach to Distinctive Feature-based speech recognition.................... 20 1.4 Literature Review.......................................................................................... 25 1.5 Thesis Goals.................................................................................................. 29 1.6 Thesis Outline ................................................................................................ 29 Chapter 2 Acoustic Properties of Stop Consonants............................................... 31 2.1 The Production of Stop Consonants ............................................................. 31 2.2 Unaspirated Labial Stop Consonants ........................................................... 33 2.3 Unaspirated Alveolar Stop Consonants ......................................................... 34 2.4 Unaspirated Velar Stop Consonants ............................................................. 34 2.5 Aspirated Stop Consonants ........................................................................... 36 2.6 Chapter Summary .......................................................................................... 36 Chapter 3 Acoustic Attribute Analysis.................................................................. 37 3.1 SP D atabase .................................................................................................. 38 3.2 Acoustic Attribute Extraction ...................................................................... 40 3.2.1 Averaged Power Spectrum ........................................................................ 40 3.2.2 Voicing Onsets and Offsets ...................................................................... 41 3.2.3 Measurement of Formant Tracks ............................................................. 41 3.3 Acoustic Attribute Description ...................................................................... 43 3.3.1 Attributes Describing Spectral Shape of the Release Burst...................... 43 3.3.2 Attributes Describing the Formant Frequencies ........................................ 48 3.3.3 Attributes Describing the spectral shape between the release burst and the voicing onset of the following vowel.................................................................... 50 3.3.4 Attributes Describing Some Temporal Cues ............................................ 50 3.4 Statistical Analysis of Individual Attributes ................................................. 52 3 .4 .1 R esu lts........................................................................................................... 5 7 3.4.2 Comparison of Each Acoustic Attribute's Discriminating Property ...... 93 3.4.3 Correlation Analysis ................................................................................. 96 3.5 Chapter Summary .......................................................................................... 98 Chapter 4 Classification Experiments ..................................................................... 101 4.1 Classification Experiment Framework ........................................................... 102 4.1.1 Acoustic Attribute Selection....................................................................... 102 4.1.2 Classification Result Evaluation................................................................. 104 4.1.3 Statistical Classifier .................................................................................... 105 4.1.4 Classification Context................................................................................. 106 4.2 LOOCV Classification Results for Stops Containing Release Burst.............. 107 7 4.3 LOOCV Classification