HCS 7367 Speech Perception
Dr. Peter Assmann
Fall 2010

Course web page
• http://www.utdallas.edu/~assmann/hcs6367
  – Course information
  – Lecture notes
  – Speech demos
  – Assigned readings
  – Additional resources in speech & hearing

Course materials
http://www.utdallas.edu/~assmann/hcs6367/readings
• No required text; all readings online
• Background reading: http://www.speechandhearing.net/library/speech_science.php
• Recommended books:
  – Kent, R.D. & Read, C. (2001). The Acoustic Analysis of Speech. (Singular).
  – Stevens, K.N. (1999). Acoustic Phonetics (Current Studies in Linguistics). M.I.T. Press.

Course requirements
• Class presentations (15%)
• Written reports on class presentations (15%)
• Midterm take-home exam (20%)
• Term paper (50%)

Class presentations and reports
• Pick two broad topics from the field of speech perception. For each topic, pick a suitable (peer-reviewed) paper from the readings web page or from available journals. Your job is to present a brief (10-15 minute) summary of the paper to the class and initiate/lead discussion of the paper, then prepare a written report.

Suggested Topics
• Speech acoustics
• Vowel production and perception
• Consonant production and perception
• Suprasegmentals and prosody
• Speech perception in noise
• Auditory grouping and segregation
• Speech perception and hearing loss
• Cochlear implants and speech coding
• Development of speech perception
• Second language acquisition
• Audiovisual speech perception
• Neural coding of speech
• Models of speech perception

Finding papers
• PubMed search engine: http://www.ncbi.nlm.nih.gov/entrez/
• Journal of the Acoustical Society of America: http://scitation.aip.org/jasa/

The evolution of speech: a comparative review
W. Tecumseh Fitch, Trends in Cognitive Sciences 4(7), July 2000
[Figure: primate vocal tract in orangutan, chimpanzee and human; labels: air sac, tongue body, larynx]

Source-filter theory of speech production
• Lungs: power supply
• Vibrating vocal folds: oscillator
• Vocal tract: resonator
• Output sound

Human vocal tract
[Figure]

Organs of speech
• Lungs: apply pressure to generate air stream (power supply)
• Larynx: air forced through the glottis, a small opening between the vocal folds (sound source)
• Vocal tract: pharynx, oral and nasal cavities serve as complex resonators (filter)

Acoustics of speech
● Articulation
● Phonation

Source-Filter Theory
[Figure from Fitch, W.T. (2000). Trends in Cognitive Sciences]

Audio demo: the source signal
• Source signal for an adult male voice
• Source signal for an adult female voice
• Source signal for a 10-year-old child

Vocal fold oscillation
• One-mass model
  – Air flow through the glottis during the closing phase travels at the same speed because of inertia, producing lowered air pressure above the glottis.
• Three-mass model
  – One large mass (representing the thyroarytenoid muscle) and two smaller masses, M1 and M2 (representing the vocal fold surface). All three masses are connected by springs and damping constants.
Source: http://www.ncvs.org/ncvs/tutorials/voiceprod/tutorial/model.html

Source-Filter Theory: Vowels
G. Fant (1960). Acoustic Theory of Speech Production
• Linear systems theory
• Assumptions: (1) linearity (2) time-invariance
• Vowels can be decomposed into two primary components: a source (input signal) and a filter (modulates the input).
• Time domain version: U(t) * T(t) * R(t) = P(t), where * denotes convolution
• Frequency domain version: U(f) · T(f) · R(f) = P(f)
• U is the source, T the vocal tract filter, R the radiation characteristic and P the output sound.
[Figure: amplitude vs. frequency]

Source properties
In voiced sounds the glottal source spectrum contains a series of lines called harmonics. The lowest one is called the fundamental frequency (F0).
[Figure: amplitude spectrum with F0 marked; relative amplitude (dB) vs. frequency (Hz), 0 to 1000 Hz]

Demo: harmonic synthesis
• Additive harmonic synthesis: vowel /i/
• Cumulative sum of harmonics: vowel /i/
• Additive synthesis: “wheel”
• Cumulative sum of partials

Filter properties
The vocal tract resonances (called formants) produce peaks in the spectrum envelope. Formants are labelled F1, F2, F3, ... in order of increasing frequency.
[Figure: amplitude spectrum of /i/ with superimposed LPC spectral envelope and F1 to F4 marked; amplitude in dB vs. frequency (kHz)]

Source-filter theory
[Figure: source, filter and radiation combine to give the output sound for the vowel /A/; amplitude vs. frequency. Source: J. Hillenbrand]
[Figure: Robert Mannell, Macquarie University, http://clas.mq.edu.au/phonetics/phonetics/vowelartic/vowel_artic.gif]
[Further source-filter figures. Source: J. Hillenbrand]
[Figure: /s/ and /S/ SF models overlaid]

Speech terminology…
• Fundamental frequency (F0): lowest frequency component in voiced speech sounds, linked to vocal fold vibration.
• Formants: resonances of the vocal tract.
[Figure: amplitude vs. frequency, with F0 and a formant marked]

Source properties: Pitch
Fundamental frequency (F0) is determined by the rate of vocal fold vibration, and is responsible for the perceived voice pitch.

F0 can be removed by filtering (as in telephone circuits) and the pitch remains the same. This is the problem of the missing fundamental, one of the oldest problems in hearing science.

Pitch is determined by the frequency pattern of the harmonics (or their equivalent in the time domain, the periodicities in the waveform).
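The missing-fundamental point above can be checked numerically. The following is a minimal Python sketch, not part of the original slides (which use Matlab as a calculator). It builds a tone from harmonics 2 to 5 of 200 Hz, so there is no component at 200 Hz itself, and then looks for the repetition rate of the waveform with a simple autocorrelation. The function names, the 8000 Hz sampling rate, the 200 Hz F0 and the 100 to 400 Hz search range are illustrative assumptions, and autocorrelation is used here only as one convenient way to measure periodicity, not as a model of how listeners extract pitch.

```python
import math

def harmonic_complex(f0_hz, harmonic_numbers, fs_hz, n_samples):
    """Sum of equal-amplitude cosines at the given integer multiples of f0_hz."""
    return [
        sum(math.cos(2 * math.pi * h * f0_hz * n / fs_hz) for h in harmonic_numbers)
        for n in range(n_samples)
    ]

def component_amplitude(signal, freq_hz, fs_hz):
    """Amplitude of the sinusoidal component at freq_hz (single-bin DFT)."""
    re = sum(x * math.cos(2 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * freq_hz * i / fs_hz) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / len(signal)

def periodicity_hz(signal, fs_hz, min_hz=100.0, max_hz=400.0):
    """Repetition rate taken from the largest autocorrelation value
    among lags corresponding to rates between min_hz and max_hz."""
    min_lag = int(fs_hz / max_hz)
    max_lag = int(fs_hz / min_hz)

    def autocorr(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=autocorr)
    return fs_hz / best_lag

FS = 8000      # sampling rate in Hz (assumed for illustration)
F0 = 200.0     # fundamental frequency in Hz (assumed for illustration)

# Harmonics 2..5 only: the fundamental itself is absent from the spectrum.
signal = harmonic_complex(F0, [2, 3, 4, 5], FS, 400)

amp_at_f0 = component_amplitude(signal, F0, FS)
amp_at_2f0 = component_amplitude(signal, 2 * F0, FS)
estimated = periodicity_hz(signal, FS)

print(f"Amplitude at F0 (200 Hz):  {amp_at_f0:.6f}")
print(f"Amplitude at 2*F0 (400 Hz): {amp_at_2f0:.6f}")
print(f"Waveform repetition rate:   {estimated:.1f} Hz")
```

The spectrum has essentially no energy at 200 Hz, yet the waveform still repeats every 5 ms, which is the periodicity cue described above.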
Harmonicity and Periodicity
• Period: regularly repeating pattern in the waveform
• Harmonic: regularly repeating peak in the amplitude spectrum
• Period duration T0 = 6 ms, so F0 = 1 / T0 = 1000 / 6 ≈ 167 Hz
• Harmonics are integer multiples of F0 and are evenly spaced in frequency
[Figure: waveform and amplitude spectrum; amplitude (dB) vs. frequency (kHz), 0 to 2.5 kHz]

Voicing irregularities
• Shimmer: variation in amplitude from one cycle to the next.
• Jitter: variation in frequency (period duration) from one cycle to the next.
• Breathy voice is associated with a glottal waveform with a steeper roll-off than modal voice. As a result there is less energy in the higher harmonics (steeper slope in the spectrum).

Vocal tract properties
Resonating tube model
– approximation for neutral vowel (schwa), [ə]
– closed at one end (glottis); open at the other (lips)
– uniform cross-sectional area
– curvature is relatively unimportant
[Figure: uniform tube model (schwa), with the glottis at one end, the lips at the other, and length L]

Vocal tract model
Quarter-wave resonator for /ə/: Fn = (2n – 1) c / (4L)
– Fn is the frequency of formant n in Hz
– c is the velocity of sound in air (about 35000 cm/sec)
– L is the length of the vocal tract (17.5 cm for an adult male)
– F1 = (2(1) – 1) * 35000 / (4 * 17.5) = 500 Hz
– F2 = (2(2) – 1) * 35000 / (4 * 17.5) = 1500 Hz
– F3 = (2(3) – 1) * 35000 / (4 * 17.5) = 2500 Hz

Acoustic vowel space
[Figure: second formant, F2 frequency (Hz), plotted against first formant, F1 frequency (Hz), for i “heed”, ə, ɑ “hod” and u “who’d”]

Helium speech
The speed of sound in a helium/oxygen mixture at 20°C is about 93000 cm/s, compared to 35000 cm/s in air. This increases the resonance frequencies but has relatively little effect on F0. In helium speech, the formants are shifted up but the pitch stays the same.

Helium speech
Using Matlab as a calculator, find the frequencies of F1, F2 and F3 for a 17.5 cm vocal tract producing the vowel /ə/ in a helium/air mixture (velocity c ≈ 93000 cm/s).
Fn = (2n – 1) c / (4L)
F1 = (2(1) – 1) * 93000 / (4 * 17.5) =
F2 = (2(2) – 1) * 93000 / (4 * 17.5) =
F3 = (2(3) – 1) * 93000 / (4 * 17.5) =

Helium speech: audio demos
– Speech in air
– Speech in helium
– Pitch in air
– Pitch in helium
http://phys.unsw.edu.au/phys_about/PHYSICS!/SPEECH_HELIUM/speech.html

Speech in air
[Figure: spectrogram; frequency (kHz) vs. time (ms)]

Speech in helium
[Figure: spectrogram; frequency (kHz) vs. time (ms)]

Perturbation Theory
The first formant (F1) frequency is lowered by a constriction in the front half of the vocal tract (/u/ and /i/), and raised when the constriction is in the back of the vocal tract, as in /A/.
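The quarter-wave formula from the vocal tract model slides, and the F1 perturbation rule just stated, can both be tried out in a few lines. This is a hedged Python sketch, not the course's own code (the slides suggest Matlab as a calculator). The tube length and the two sound speeds are taken from the slides. The function names and the 0-to-1 position scale are hypothetical. The cosine sensitivity function is an assumption brought in from the standard textbook treatment of small perturbations of a uniform tube, and it does not appear in the slides.

```python
import math

def quarter_wave_formants(length_cm, speed_cm_per_s, count=3):
    """Resonances of a uniform tube closed at the glottis and open at the lips:
    Fn = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_cm_per_s / (4 * length_cm)
            for n in range(1, count + 1)]

def constriction_effect(formant_number, position):
    """Relative shift of formant n for a small constriction at `position`
    (0.0 = glottis, 1.0 = lips). Positive means raised, negative lowered.

    ASSUMPTION (not from the slides): textbook sensitivity function for a
    uniform tube, cos((2n - 1) * pi * position).
    """
    return math.cos((2 * formant_number - 1) * math.pi * position)

LENGTH_CM = 17.5     # adult male vocal tract length (from the slides)
C_AIR = 35000.0      # speed of sound in air, cm/s (from the slides)
C_HELIUM = 93000.0   # speed of sound in the helium mixture, cm/s (from the slides)

air = quarter_wave_formants(LENGTH_CM, C_AIR)
helium = quarter_wave_formants(LENGTH_CM, C_HELIUM)
for n, (f_air, f_he) in enumerate(zip(air, helium), start=1):
    print(f"F{n}: {f_air:.0f} Hz in air, {f_he:.0f} Hz in the helium mixture")

# Direction of the F1 shift for a constriction in the back vs. the front half.
for label, position in (("back half (0.25)", 0.25), ("front half (0.75)", 0.75)):
    effect = constriction_effect(1, position)
    direction = "raised" if effect > 0 else "lowered"
    print(f"F1 with a constriction in the {label}: {direction}")
```

The air values reproduce the 500, 1500 and 2500 Hz worked out on the slide. Every formant scales by the same factor, the ratio of the two sound speeds, and F0 does not enter the formula at all, which is consistent with the formants shifting up while the pitch stays the same.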
[Figure: delta F1 as a function of constriction location, from glottis to lips]

Perturbation Theory
The second formant (F2) is lowered by a constriction near the lips or just above the pharynx; in /u/ both of these regions are constricted. F2 is raised when the constriction is behind the lips and teeth, as in the vowel /i/.

Perturbation Theory
The third formant (F3) is lowered by a constriction at the lips or at the back of the mouth or in the upper pharynx. This occurs in /r/ and /r/-colored vowels like American