Relating Optical Speech to Speech Acoustics and Visual Speech Perception

UNIVERSITY OF CALIFORNIA
Los Angeles

Relating Optical Speech to Speech Acoustics and Visual Speech Perception

A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering

by

Jintao Jiang

2003

© Copyright by Jintao Jiang 2003

The dissertation of Jintao Jiang is approved.

Kung Yao
Lieven Vandenberghe
Patricia A. Keating
Lynne E. Bernstein
Abeer Alwan, Committee Chair

University of California, Los Angeles
2003

Table of Contents

Chapter 1. Introduction
  1.1. Audio-Visual Speech Processing
  1.2. How to Examine the Relationship between Data Sets
  1.3. The Relationship between Articulatory Movements and Speech Acoustics
  1.4. The Relationship between Visual Speech Perception and Physical Measures
  1.5. Outline of This Dissertation

Chapter 2. Data Collection and Pre-Processing
  2.1. Introduction
  2.2. Background
  2.3. Recording a Database for the Correlation Analysis
    2.3.1. Talkers
    2.3.2. Materials
    2.3.3. Recording Facilities
    2.3.4. Placement of EMA Pellets and Qualisys™ Retro-Reflectors
    2.3.5. Synchronization
    2.3.6. Recording Procedure
  2.4. Recording a Database for the Perceptual Similarity Analysis
  2.5. Perceptual Experiments
    2.5.1. Participants
    2.5.2. Video Presentations
    2.5.3. Procedure
  2.6. Conditioning the Data
    2.6.1. Audio
    2.6.2. Optical Data
    2.6.3. EMA Data
    2.6.4. All Three Data Streams
  2.7. Summary of Physical Measures and Perceptual Data
    2.7.1. Physical Measures
    2.7.2. Perceptual Data
  2.8. Summary

Chapter 3. Multilinear Regression, Multidimensional Scaling, Hierarchical Clustering Analysis, and Phoneme Equivalence Classes
  3.1. Introduction
  3.2. Multilinear Regression
    3.2.1. Mean Subtraction
    3.2.2. Multilinear Regression
    3.2.3. Jackknife Procedure
    3.2.4. Goodness of Fit
    3.2.5. Maximum Correlation Criterion Estimation
  3.3. Phi-Square Transformation
  3.4. Multidimensional Scaling
  3.5. Hierarchical Clustering Analysis and Phoneme Equivalence Classes
  3.6. Summary

Chapter 4. On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics
  4.1. Introduction
  4.2. Background
  4.3. Analysis of CV Syllables
    4.3.1. Consonants: Place and Manner of Articulation
    4.3.2. Syllable-Dependent Predictions
    4.3.3. Discussion
  4.4. Examining the Relationships between Data Streams for Sentences
    4.4.1. Analysis
    4.4.2. Results
    4.4.3. Discussion
  4.5. Prediction Using Reduced Data Sets
    4.5.1. Analysis
    4.5.2. Results
    4.5.3. Discussion
  4.6. Predicting Face Movements from Speech Acoustics Using Spectral Dynamics
    4.6.1. Analysis
    4.6.2. Results of Correlation Analysis Using Dynamical Information
    4.6.3. Discussion
  4.7. Summary

Chapter 5. The Relationship between Visual Speech Perception and Physical Measures
  5.1. Introduction
  5.2. Background
  5.3. Method
    5.3.1. Analyses of Perceptual Data
    5.3.2. 3-D Optical Signal Analyses
    5.3.3. Consonant Classification, Traditional Visemes, and Phoneme Equivalence Classes
    5.3.4. Analysis Approach
  5.4. Results and Discussion
    5.4.1. Overall Visual Perception Results
    5.4.2. Predicting Visual Perceptual Measures from Physical Measures
  5.5. Multidimensional Scaling Analysis
  5.6. Phoneme Equivalence Class (PEC) Analysis
  5.7. General Discussion
  5.8. Summary

Chapter 6. Examining the Correlations between Face Movements and Speech Acoustics Using Mutual Information Faces
  6.1. Introduction
