Relating Optical Speech to Speech Acoustics and Visual Speech Perception
Total Page:16
File Type:pdf, Size:1020Kb
UNIVERSITY OF CALIFORNIA Los Angeles Relating Optical Speech to Speech Acoustics and Visual Speech Perception A dissertation submitted in partial satisfaction of the requirements for the degree Doctor of Philosophy in Electrical Engineering by Jintao Jiang 2003 i © Copyright by Jintao Jiang 2003 ii The dissertation of Jintao Jiang is approved. Kung Yao Lieven Vandenberghe Patricia A. Keating Lynne E. Bernstein Abeer Alwan, Committee Chair University of California, Los Angeles 2003 ii Table of Contents Chapter 1. Introduction ............................................................................................... 1 1.1. Audio-Visual Speech Processing ............................................................................ 1 1.2. How to Examine the Relationship between Data Sets ............................................ 4 1.3. The Relationship between Articulatory Movements and Speech Acoustics........... 5 1.4. The Relationship between Visual Speech Perception and Physical Measures ....... 9 1.5. Outline of This Dissertation .................................................................................. 15 Chapter 2. Data Collection and Pre-Processing ...................................................... 17 2.1. Introduction ........................................................................................................... 17 2.2. Background ........................................................................................................... 18 2.3. Recording a Database for the Correlation Analysis.............................................. 19 2.3.1. Talkers................................................................................................... 19 2.3.2. Materials................................................................................................ 20 2.3.3. Recording Facilities...............................................................................21 2.3.4. Placement of EMA Pellets and QualisysTM Retro-Reflectors............... 24 2.3.5. Synchronization..................................................................................... 27 2.3.6. Recording Procedure............................................................................. 28 2.4. Recording a Database for the Perceptual Similarity Analysis .............................. 29 2.5. Perceptual Experiments......................................................................................... 30 2.5.1. Participants............................................................................................ 30 2.5.2. Video Presentations............................................................................... 30 2.5.3. Procedure............................................................................................... 31 2.6. Conditioning the Data ........................................................................................... 32 2.6.1. Audio..................................................................................................... 32 2.6.2. Optical Data........................................................................................... 34 2.6.3. EMA Data............................................................................................. 37 2.6.4. All Three Data Streams......................................................................... 39 2.7. Summary of Physical Measures and Perceptual Data........................................... 39 2.7.1. Physical Measures................................................................................. 39 2.7.2. Perceptual Data ..................................................................................... 42 2.8. Summary ............................................................................................................... 44 Chapter 3. Multilinear Regression, Multidimensional Scaling, Hierarchical Clustering Analysis, and Phoneme Equivalence Classes.............................................45 iii 3.1. Introduction ........................................................................................................... 45 3.2. Multilinear Regression .......................................................................................... 45 3.2.1. Mean Subtraction.................................................................................. 45 3.2.2. Multilinear Regression .......................................................................... 46 3.2.3. Jackknife Procedure .............................................................................. 47 3.2.4. Goodness of Fit ..................................................................................... 48 3.2.5. Maximum Correlation Criterion Estimation .........................................49 3.3. Phi-Square Transformation ................................................................................... 52 3.4. Multidimensional Scaling .....................................................................................55 3.5. Hierarchical Clustering Analysis and Phoneme Equivalence Classes.................. 59 3.6. Summary ............................................................................................................... 63 Chapter 4. On the Relationship between Face Movements, Tongue Movements, and Speech Acoustics ...................................................................................................... 64 4.1. Introduction ........................................................................................................... 64 4.2. Background ........................................................................................................... 64 4.3. Analysis of CV Syllables ...................................................................................... 67 4.3.1. Consonants: Place and Manner of Articulation..................................... 67 4.3.2. Syllable-Dependent Predictions ............................................................ 67 4.3.3. Discussion ............................................................................................. 72 4.4. Examining the Relationships between Data Streams for Sentences ..................... 75 4.4.1. Analysis................................................................................................. 75 4.4.2. Results ...................................................................................................76 4.4.3. Discussion ............................................................................................. 77 4.5. Prediction Using Reduced Data Sets..................................................................... 80 4.5.1. Analysis................................................................................................. 80 4.5.2. Results ...................................................................................................81 4.5.3. Discussion ............................................................................................. 82 4.6. Predicting Face Movements from Speech Acoustics Using Spectral Dynamics.. 83 4.6.1. Analysis................................................................................................. 83 4.6.2. Results of Correlation Analysis Using Dynamical Information ........... 86 4.6.3. Discussion ............................................................................................. 90 4.7. Summary ............................................................................................................... 90 Chapter 5. The Relationship between Visual Speech Perception and Physical Measures…… .................................................................................................................. 93 5.1. Introduction ........................................................................................................... 93 iv 5.2. Background ........................................................................................................... 93 5.3. Method .................................................................................................................. 97 5.3.1. Analyses of Perceptual Data ................................................................. 97 5.3.2. 3-D Optical Signal Analyses................................................................. 98 5.3.3. Consonant Classification, Traditional Visemes, and Phoneme Equivalence Classes ............................................................................................ 101 5.3.4. Analysis Approach .............................................................................. 102 5.4. Results and Discussion........................................................................................ 105 5.4.1. Overall Visual Perception Results ...................................................... 105 5.4.2. Predicting Visual Perceptual Measures from Physical Measures ....... 106 5.5. Multidimensional Scaling Analysis .................................................................... 110 5.6. Phoneme Equivalence Class (PEC) Analysis...................................................... 116 5.7. General Discussion.............................................................................................. 120 5.8. Summary ............................................................................................................. 125 Chapter 6. Examining the Correlations between Face Movements and Speech Acoustics Using Mutual Information Faces................................................................ 126 6.1. Introduction