Formants, source-filter theory, and related acoustic concepts
Ling 205 Fall 2013
Where do formants come from?
From frequency to wavelength
wavelength = speed of sound / frequency
(for speech purposes, wavelength is typically measured in centimeters)
speed of sound = 340 m/s = 34,000 cm/s
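As a quick check of the relation between frequency and wavelength, here is a minimal Python sketch. The function name and the sample frequencies are only illustrative; the speed of sound is the 34,000 cm/s figure given above.

```python
# Speed of sound as used in these notes: 340 m/s = 34,000 cm/s.
SPEED_OF_SOUND_CM_PER_S = 34000

def wavelength_cm(frequency_hz):
    """Wavelength in centimeters of a sound with the given frequency in Hz."""
    return SPEED_OF_SOUND_CM_PER_S / frequency_hz

# A few sample frequencies (chosen only for illustration).
for frequency in (100, 500, 1000):
    print(frequency, "Hz ->", wavelength_cm(frequency), "cm")
```

The higher the frequency, the shorter the wavelength: a 500 Hz component is 68 cm long, a 1000 Hz component only 34 cm.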
[Figure: one cycle of a wave, with the wavelength marked off in quarters (1/4, 2/4, 3/4, 4/4); the point of maximum amplitude within the cycle and the end of the cycle are labelled]
So the peak (point of maximum amplitude) occurs at ¼ of the wavelength.
If the sound is emitted from a uniform tube whose length is exactly ¼ of the wavelength, the sound will emerge from the tube at its loudest.
[Figure: a uniform tube with the sound source at one end and the mouth of the tube at the other]
If instead the sound has a somewhat shorter or longer wavelength, the sound coming out of the tube is weaker.
Resonance
● In speech, the soundwave from the glottal source is not a simple sine wave, but a complex waveform, with multiple component frequencies (i.e. the fundamental and all the harmonics)
● Imagine the supralaryngeal vocal tract as a uniform tube of length L. The component frequencies that emerge from that tube at maximum strength are those for which L is ¼, ¾, 5/4, etc. of the wavelength (each of these fractions ends at a point of maximum amplitude, positive or negative), i.e. those with wavelengths 4L, 4L/3, 4L/5, etc. These are the resonant wavelengths.
Resonant frequencies
● If frequency F corresponds to wavelength 4L, the other resonant frequencies are 3F, 5F, 7F, etc.
● Non-resonant frequency components of the source sound will be damped to a greater or lesser degree depending on how far they are from the resonant frequencies.
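A tube that is closed at the source end and open at the mouth resonates when its length is an odd number of quarter wavelengths, so its resonant frequencies are odd multiples of the lowest one. The sketch below works this out in Python. The 17 cm tube length is an assumption chosen to give round numbers (it is not from these notes), and the function name is illustrative.

```python
# Speed of sound as used in these notes: 340 m/s = 34,000 cm/s.
SPEED_OF_SOUND_CM_PER_S = 34000

def tube_resonances_hz(length_cm, count=3):
    """First `count` resonant frequencies of a uniform quarter-wavelength tube."""
    # Lowest resonance: the tube is 1/4 of the wavelength, so wavelength = 4 * L.
    lowest = SPEED_OF_SOUND_CM_PER_S / (4 * length_cm)
    # Higher resonances: the tube is 3/4, 5/4, ... of the wavelength.
    return [(2 * n - 1) * lowest for n in range(1, count + 1)]

# Assumed vocal tract length of 17 cm (illustrative only).
print(tube_resonances_hz(17))
```

For a 17 cm tube this gives 500, 1500 and 2500 Hz, which is in the region of the formants of a neutral, schwa-like vowel.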
Where formants come from
● Formants are the regions at and around the resonant frequencies.
● In these regions, the harmonics emerge at maximum strength; in all other regions they are damped.
3 tube model
● The vocal tract is not a uniform tube. But the effects of tongue body position on the formants can be modelled (to a close approximation) as a sequence of three tubes.
[Figure: three-tube model, labelled: laryngeal source, tube 1, tube 2, tube 3, lips, tongue body]
● Depending on the size of each tube (determined by placement of the tongue body), particular formant frequencies emerge.
Formants
● Only the first 3 or 4 formants are relevant for linguistic phonetics.
– (Higher formants are useful for identifying speaker voices, but not for identifying what's being said.)
● F1 (first formant) frequency is inversely correlated with tongue body height: high F1 = low vowel.
● F2 (second formant) frequency is directly correlated with tongue body advancement: high F2 = front vowel.
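To see the two correlations at work, the sketch below ranks four vowels by F1 and by F2. The formant values are illustrative figures in the range often reported for adult male speakers of American English; they are an assumption for this example and do not come from these notes.

```python
# Illustrative (F1, F2) values in Hz; assumed, not taken from these notes.
VOWELS = {
    "i": (270, 2290),
    "æ": (660, 1720),
    "ɑ": (730, 1090),
    "u": (300, 870),
}

def high_to_low(vowels):
    """Order vowels from highest to lowest tongue body: low F1 = high vowel."""
    return sorted(vowels, key=lambda v: vowels[v][0])

def front_to_back(vowels):
    """Order vowels from most front to most back: high F2 = front vowel."""
    return sorted(vowels, key=lambda v: vowels[v][1], reverse=True)

print("high to low:  ", high_to_low(VOWELS))
print("front to back:", front_to_back(VOWELS))
```

With these values, [i] and [u] come out as the high vowels and [æ] and [ɑ] as the low ones, while [i] and [æ] come out as front and [ɑ] and [u] as back.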
Source-filter theory
● The sound emerging from the vocal tract (in voiced sounds anyway) can be thought of as the product of two things:
– The sound source, i.e. glottal pulses (determines the F0 and the frequencies of the harmonics)
– The sound filter, i.e. the supralaryngeal vocal tract (depending on how it's shaped at any given moment, it boosts the amplitude of harmonics near resonant frequencies and damps harmonics elsewhere, resulting in some pattern of formants)
Glottal source
● If there were no filter on top of it, a spectrum of the glottal source would show near-linear decrease in amplitude of the harmonics as frequency increases.
Filter spectrum
Typical F1, F2, F3 values for an adult male speaker saying [æ]
Product of source and filter
● Results in the actual spectrum that we can observe in Praat
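The product of source and filter can be sketched numerically. Everything specific in the code below is an assumption made for illustration: the source amplitude falling as 1/k² (roughly 12 dB per octave), the simple resonance formula used for the filter, the formant centres and bandwidths, and the F0 of 100 Hz. It is a toy model, not the analysis that Praat performs.

```python
import math

def source_amplitude(harmonic_number):
    """Assumed glottal source: amplitude of harmonic k falls as 1 / k^2."""
    return 1.0 / harmonic_number ** 2

def filter_gain(freq_hz, formants):
    """Gain of the filter at freq_hz, with one simple resonance peak per formant.

    formants is a list of (centre_hz, bandwidth_hz) pairs (assumed values).
    """
    gain = 1.0
    for centre_hz, bandwidth_hz in formants:
        ratio = freq_hz / centre_hz
        damping = ratio * bandwidth_hz / centre_hz
        gain *= 1.0 / math.sqrt((1.0 - ratio ** 2) ** 2 + damping ** 2)
    return gain

def output_spectrum(f0_hz, formants, max_hz=3000):
    """Amplitude of each harmonic after filtering: source times filter."""
    spectrum = {}
    k = 1
    while k * f0_hz <= max_hz:
        freq = k * f0_hz
        spectrum[freq] = source_amplitude(k) * filter_gain(freq, formants)
        k += 1
    return spectrum

# Assumed formants as (centre Hz, bandwidth Hz), and an assumed F0 of 100 Hz.
FORMANTS = [(500, 80), (1500, 100), (2500, 120)]
spectrum = output_spectrum(100, FORMANTS)
for freq in (400, 500, 600):
    print(freq, "Hz:", round(spectrum[freq], 3))
```

In this toy spectrum the 500 Hz harmonic comes out stronger than its neighbours at 400 and 600 Hz, even though the source alone is weaker at 500 Hz than at 400 Hz: the filter has put a formant peak there.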
Source-filter independence
[Figure: same source spectrum as the previous slide, but a different filter]
Likewise, we could change the source (F0 and harmonic frequencies) without changing the filter. The precise harmonics would then be in different places, but the location of the formants would be the same.
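A small numerical sketch of this independence: keep one filter fixed and pass two different sources through it. As before, the 1/k² source roll-off, the resonance formula, the formant values and the two F0 values are all assumptions made for illustration.

```python
import math

# Assumed filter: (centre Hz, bandwidth Hz) for F1, F2, F3.
FORMANTS = [(500, 80), (1500, 100), (2500, 120)]

def harmonic_amplitude(k, f0_hz):
    """Source (assumed 1 / k^2 roll-off) times filter (one resonance per formant)."""
    freq = k * f0_hz
    amplitude = 1.0 / k ** 2
    for centre_hz, bandwidth_hz in FORMANTS:
        ratio = freq / centre_hz
        damping = ratio * bandwidth_hz / centre_hz
        amplitude /= math.sqrt((1.0 - ratio ** 2) ** 2 + damping ** 2)
    return amplitude

def strongest_harmonic(f0_hz, low_hz, high_hz):
    """Frequency of the strongest harmonic between low_hz and high_hz."""
    ks = [k for k in range(1, high_hz // f0_hz + 1) if low_hz <= k * f0_hz <= high_hz]
    return max(ks, key=lambda k: harmonic_amplitude(k, f0_hz)) * f0_hz

# Same filter, two different sources (F0 = 100 Hz and F0 = 130 Hz).
for f0 in (100, 130):
    print("F0 =", f0, "Hz -> strongest harmonic near F1:", strongest_harmonic(f0, 300, 800), "Hz")
```

With F0 = 100 Hz the strongest harmonic in the F1 region is at 500 Hz; with F0 = 130 Hz it is at 520 Hz. The harmonics are in different places, but in both cases the strongest one sits on the same (assumed) 500 Hz formant.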
Aperiodic spectra
● Though an aperiodic sound (e.g. a voiceless fricative) has no F0, it may still be stronger in certain frequency regions.
– Compare the sounds of [θ, s, ʃ]
– Fricative spectrum is primarily determined by the size of the cavity in front of the point of constriction:
● the larger the cavity, the lower the centre of energy
● Go through HW 2