![Speech Sounds of American English](https://data.docslib.org/img/3a60ab92a6e30910dab9bd827208bcff-1.webp)
Speech Sounds of American English

6.345 Automatic Speech Recognition: Speech Sounds

• There are over 40 speech sounds in American English, which can be organized by their basic manner of production

| Manner Class | Number |
|---|---|
| Vowels | 18 |
| Fricatives | 8 |
| Stops | 6 |
| Nasals | 3 |
| Semivowels | 4 |
| Affricates | 2 |
| Aspirant | 1 |

• Vowels, glides, and consonants differ in degree of constriction
• Sonorant consonants have no pressure build-up at the constriction
• Nasal consonants lower the velum, allowing airflow in the nasal cavity
• Continuant consonants do not block airflow in the oral cavity

Vowel Production

• No significant constriction in the vocal tract
• Usually produced with periodic excitation
• Acoustic characteristics depend on the position of the jaw, tongue, and lips

[Figure: panels labeled [i], [æ], [ɑ], [u]]

Vowels of American English

• There are approximately 18 vowels in American English, made up of monophthongs, diphthongs, and reduced vowels (schwas)

| Symbol | Label | Example |
|---|---|---|
| /iʸ/ | iy | beat |
| /ɪ/ | ih | bit |
| /eʸ/ | ey | bait |
| /ɛ/ | eh | bet |
| /æ/ | ae | bat |
| /ɑ/ | aa | Bob |
| /ɔ/ | ao | bought |
| /ʌ/ | ah | but |
| /oʷ/ | ow | boat |
| /ʊ/ | uh | book |
| /u/ | uw | boot |
| /ɝ/ | er | Bert |
| /aʸ/ | ay | bite |
| /ɔʸ/ | oy | Boyd |
| /aʷ/ | aw | bout |
| [ə] | ax | about |
| [ɨ] | ix | roses |
| [ɚ] | axr | butter |

• They are often described by the articulatory features: High/Low, Front/Back, Retroflexed, Rounded, and Tense/Lax

Spectrograms of the Cardinal Vowels

[Figure: four panels (beet /biʸt/, bat /bæt/, bott /bɑt/, boot /but/), each showing zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), a wide-band spectrogram (0 to 8 kHz), and the waveform against time (seconds)]

Vowel Formant Averages

• Vowels are often characterized by the lower three formants
• High/Low is correlated with the first formant, F1
• Front/Back is correlated with the second formant, F2
• Retroflexion is marked by a low third formant, F3

[Figure: average frequency (Hz, 0 to 3500) of F1, F2, and F3 for each vowel; one panel for female speakers, one for male speakers]

Vowel Durations

• Each vowel has a different intrinsic duration
• Schwas have distinctly shorter durations (50 ms)
• /ɪ, ɛ, ʌ, ʊ/ are the shortest monophthongs
• Context can greatly influence vowel duration

[Figure: average duration (ms, 0 to 250) for each vowel; one panel for female speakers, one for male speakers]

Rob's Happy Little Vowel Chart

"So inaccurate, yet so useful."

[Figure: vowels placed on an F1/F2 plane. F1 increases from HIGH through MID to LOW; F2 increases from BACK to FRONT.]

• TENSE = towards edges; tends to be longer
• LAX = towards center; tends to be shorter
• SCHWAS: Plain [ə], About [əbaʷt]; Front [ɨ], Roses [roʷzɨz]; Retroflex [ɚ], Forever [fɚɛvɚ]

Fricative Production

• Turbulence produced at a narrow constriction
• Constriction position determines acoustic characteristics
• Can be produced with periodic excitation

[Figure: panels labeled [f], [θ], [s], [ʃ]]

Fricatives of American English

• There are 8 fricatives in American English
• Four places of articulation: Labio-Dental (Labial), Interdental (Dental), Alveolar, and Palato-Alveolar (Palatal)
• They are often described by the features Voiced/Unvoiced, or Strident/Non-Strident (constriction behind alveolar ridge)

| Type | Unvoiced | Label | Example | Voiced | Label | Example |
|---|---|---|---|---|---|---|
| Labial | /f/ | f | fee | /v/ | v | v |
| Dental | /θ/ | th | thief | /ð/ | dh | thee |
| Alveolar | /s/ | s | see | /z/ | z | z |
| Palatal | /ʃ/ | sh | she | /ʒ/ | zh | Gigi |

Spectrograms of Unvoiced Fricatives

[Figure: four panels (fee /fiʸ/, thief /θiʸf/, see /siʸ/, she /ʃiʸ/), each showing zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), a wide-band spectrogram (0 to 8 kHz), and the waveform against time (seconds)]

Fricative Energy

[Figure: probability density of average total energy for NON-STRIDENT and STRIDENT fricatives]

Strident fricatives tend to be stronger than non-strident fricatives.

Fricative Durations

[Figure: probability density of duration for UNVOICED and VOICED fricatives]

Voiced fricatives tend to be shorter than unvoiced fricatives.

Examples of Fricative Voicing Contrast

[Figure: four panels (sue /su/, zoo /zu/, face /feʸs/, phase /feʸz/), each showing zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), a wide-band spectrogram (0 to 8 kHz), and the waveform against time (seconds)]

Rob's Friendly Little Consonant Chart

"Somewhat more accurate, yet somewhat less useful."

Rows give the manner of articulation and columns the place of articulation. Within each pair, the first symbol is unvoiced and the second is voiced.

| Manner | Labial | Dental | Alveolar | Palatal | Velar |
|---|---|---|---|---|---|
| Stop | p b | | t d | | k g |
| Fricative, weak (non-strident) | f v | θ ð | | | |
| Fricative, strong (strident) | | | s z | ʃ ʒ | |
| Nasal | m | | n | | ŋ |

The Semi-vowels:
• y is like an extreme i
• w is like an extreme u
• l is like an extreme o
• r is like an extreme ɝ

The Odds and Ends:
• h (unvoiced h)
• ɦ (voiced h)
• ɾ (flap)
• ɾ̃ (nasalized flap)
• ʔ (glottal stop)

The Affricates:
• č is like t + ʃ
• ǰ is like d + ʒ

What is this word?

[Figure: a single unlabeled utterance (0.0 to 1.1 seconds) showing zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), a wide-band spectrogram (0 to 8 kHz), and the waveform]

Stop Production

• Complete closure in the vocal tract, pressure build-up
• Sudden release of the constriction, turbulence noise
• Can have periodic excitation during closure

[Figure: panels labeled [b], [d], [g]]

Stops of American English

• There are 6 stop consonants in American English
• Three places of articulation: Labial, Alveolar, and Velar
• Each place of articulation has a voiced and unvoiced stop

| Type | Voiced | Label | Example | Unvoiced | Label | Example |
|---|---|---|---|---|---|---|
| Labial | /b/ | b | bought | /p/ | p | pot |
| Alveolar | /d/ | d | dot | /t/ | t | tot |
| Velar | /g/ | g | got | /k/ | k | cot |

• Unvoiced stops are typically aspirated
• Voiced stops usually exhibit a "voice-bar" during closure
• Information about formant transitions and release is useful for classification

Spectrograms of Unvoiced Stops

[Figure: three panels, each showing zero crossing rate (kHz), total energy (dB), energy from 125 Hz to 750 Hz (dB), and a wide-band spectrogram against time (seconds)]
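The spectrogram figures in these slides each carry three per-frame tracks: zero crossing rate, total energy, and energy from 125 Hz to 750 Hz. The slides do not say how those tracks were computed, so the following is only a minimal sketch of one plausible way to derive them with NumPy. The 16 kHz sample rate, 25 ms Hann-windowed frames, 10 ms hop, the function names, and the two synthetic test signals (a periodic "vowel-like" tone complex and a white-noise "fricative-like" signal) are all assumptions made for illustration.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Slice a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zero_crossing_rate(frames, sample_rate):
    """Sign changes per second within each frame."""
    signs = np.signbit(frames)
    crossings = np.sum(signs[:, 1:] != signs[:, :-1], axis=1)
    return crossings / (frames.shape[1] / sample_rate)

def band_energy_db(frames, sample_rate, f_lo=None, f_hi=None, floor=1e-12):
    """Energy in dB per frame, optionally restricted to [f_lo, f_hi] Hz."""
    window = np.hanning(frames.shape[1])
    power = np.abs(np.fft.rfft(frames * window, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frames.shape[1], d=1.0 / sample_rate)
    mask = np.ones(freqs.shape, dtype=bool)
    if f_lo is not None:
        mask &= freqs >= f_lo
    if f_hi is not None:
        mask &= freqs <= f_hi
    return 10.0 * np.log10(power[:, mask].sum(axis=1) + floor)

# Assumed analysis settings (not taken from the slides).
sample_rate = 16000
frame_len = 400   # 25 ms
hop = 160         # 10 ms

t = np.arange(sample_rate) / sample_rate  # one second of samples

# Periodic, low-frequency signal standing in for a vowel.
vowel_like = sum(a * np.sin(2 * np.pi * f * t)
                 for a, f in [(1.0, 150), (0.5, 300), (0.3, 450), (0.2, 600)])

# White noise standing in for the turbulence of an unvoiced fricative.
rng = np.random.default_rng(0)
fricative_like = rng.standard_normal(sample_rate)

frames_vowel = frame_signal(vowel_like, frame_len, hop)
frames_noise = frame_signal(fricative_like, frame_len, hop)

zcr_vowel = zero_crossing_rate(frames_vowel, sample_rate)
zcr_noise = zero_crossing_rate(frames_noise, sample_rate)

total_vowel = band_energy_db(frames_vowel, sample_rate)
total_noise = band_energy_db(frames_noise, sample_rate)
low_vowel = band_energy_db(frames_vowel, sample_rate, 125, 750)
low_noise = band_energy_db(frames_noise, sample_rate, 125, 750)

print("mean ZCR (1/s):       vowel-like %.0f, noise %.0f"
      % (zcr_vowel.mean(), zcr_noise.mean()))
print("low band minus total: vowel-like %.1f dB, noise %.1f dB"
      % ((low_vowel - total_vowel).mean(), (low_noise - total_noise).mean()))
```

On these synthetic inputs the noise shows a far higher zero crossing rate, while the periodic signal keeps most of its energy inside the 125 Hz to 750 Hz band. That is the kind of contrast the three tracks are meant to expose; real speech will be less clear-cut.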