
Formants, source-filter theory, and related acoustic concepts

Ling 205 Fall 2013

Where do formants come from?

From frequency to wavelength

wavelength = speed of sound / frequency

(for speech purposes, wavelength is typically measured in centimeters)

speed of sound = 340 m/s = 34000 cm/s
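As a quick check of the relation above, here is a minimal Python sketch. The function name `wavelength_cm` and the sample frequencies are illustrative choices, not part of the course materials; the speed of sound is the 34000 cm/s value given above.

```python
SPEED_OF_SOUND_CM_PER_S = 34000  # 340 m/s, the value given above

def wavelength_cm(frequency_hz):
    """Return the wavelength in centimeters of a sound at frequency_hz."""
    return SPEED_OF_SOUND_CM_PER_S / frequency_hz

# Illustrative frequencies (chosen for this example only)
for f in (100, 500, 1000):
    print(f, "Hz ->", wavelength_cm(f), "cm")
```

Higher frequencies give shorter wavelengths: doubling the frequency halves the wavelength.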

[Figure: one cycle of a wave, with the wavelength divided at 1/4, 2/4, 3/4, 4/4; labels mark the point of maximum amplitude within the cycle and the end of the cycle]

So the peak (point of maximum amplitude) occurs at ¼ of the wavelength.

If the sound is emitted from a uniform tube whose length is exactly ¼ of the wavelength, the sound will emerge from the tube at its loudest.

[Figure: a tube with the sound source at one end and the mouth of the tube at the other]

If the sound has a shorter or longer wavelength, the sound coming out of the tube is weaker.

Resonance

● In speech, the sound wave from the glottal source is not a simple sine wave, but a complex waveform with multiple component frequencies (i.e. the fundamental and all the harmonics).

● Imagine the supralaryngeal vocal tract as a uniform tube of length L. The component frequencies that emerge from that tube at maximum strength are those with corresponding wavelengths 4L, 4L/3, 4L/5, etc. (so that L is ¼, ¾, 5/4, etc. of the wavelength). These are the resonant wavelengths.

Resonant frequencies

● If frequency F corresponds to wavelength 4L (the tube is ¼ of the wavelength), the other resonant frequencies are 3F, 5F, 7F, etc.

● Non-resonant frequency components of the source sound will be damped to a greater or lesser degree depending on how far they are from the resonant frequencies.
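The resonances of a uniform tube can be worked out numerically. The sketch below assumes the standard quarter-wavelength model (tube closed at the glottis, open at the lips), in which the resonances fall at odd multiples of the lowest one. The 17 cm tube length is an assumed, illustrative value for an adult vocal tract, and the function name is hypothetical.

```python
SPEED_OF_SOUND_CM_PER_S = 34000  # 340 m/s

def tube_resonances_hz(tube_length_cm, count=3):
    """Resonant frequencies of a uniform tube closed at one end and open
    at the other: odd multiples of c / (4 * L)."""
    lowest = SPEED_OF_SOUND_CM_PER_S / (4 * tube_length_cm)
    return [(2 * n - 1) * lowest for n in range(1, count + 1)]

# Assumed tube length of 17 cm (illustrative, not a measured value)
print(tube_resonances_hz(17.0))  # [500.0, 1500.0, 2500.0]
```

A shorter tube raises all of the resonances together, and a longer tube lowers them.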

Where formants come from

● Formants are the regions at and around the resonant frequencies.

● In these regions, the harmonics emerge at maximum strength; in all other regions they are damped.

3 tube model

● The vocal tract is not a uniform tube. But the effects of tongue body position on the formants can be modelled (to a close approximation) as a sequence of three tubes

[Figure: three-tube model of the vocal tract, with labels: laryngeal source, tube 1, tube 2, tube 3, tongue body, lips]

● Depending on the size of each tube (determined by placement of the tongue body), particular formant frequencies emerge.

Formants

● Only the first 3 or 4 formants are linguistically relevant.

– (Higher formants are useful for identifying speaker voices, but not for identifying what's being said.)

● F1 (first formant) frequency is inversely correlated with tongue body height: high F1 = low vowel.

● F2 (second formant) frequency is directly correlated with tongue body advancement: high F2 = front vowel.

Source-filter theory

● The sound emerging from the vocal tract (in voiced sounds, anyway) can be thought of as the product of 2 things:

– The sound source, i.e. glottal pulses (determines the F0 and the frequencies of the harmonics)

– The sound filter, i.e. the supralaryngeal vocal tract (depending on how it's shaped at any given moment, it boosts the amplitude of harmonics near the resonant frequencies and damps harmonics elsewhere, resulting in some pattern of formants)

Glottal source

● If there were no filter on top of it, a spectrum of the glottal source would show a near-linear decrease in the amplitude of the harmonics as frequency increases.

Filter spectrum

Typical F1, F2, F3 values for an adult male speaker saying [æ]

Product of source and filter

● Results in the actual spectrum that we can observe in Praat
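The combination of source and filter can be sketched in a few lines of Python. Everything numeric here is an assumption made for illustration: the -12 dB per octave source tilt, the bump-shaped toy filter, its 100 Hz bandwidth and 20 dB peak, and the formant values (700, 1700, 2500 Hz), which are hypothetical and not read off the filter spectrum slide. Multiplying amplitudes corresponds to adding levels in dB, which is what the sketch does.

```python
import math

def source_level_db(harmonic_hz, f0_hz, slope_db_per_octave=-12.0):
    """Level of one source harmonic relative to the fundamental
    (assumed spectral tilt, falling steadily with frequency)."""
    return slope_db_per_octave * math.log2(harmonic_hz / f0_hz)

def filter_gain_db(freq_hz, formants_hz, bandwidth_hz=100.0, peak_db=20.0):
    """Toy filter: each formant gives a boost that fades with distance."""
    gain = 0.0
    for formant in formants_hz:
        distance = (freq_hz - formant) / bandwidth_hz
        gain = max(gain, peak_db / (1.0 + distance ** 2))
    return gain

def output_spectrum(f0_hz, formants_hz, max_hz=3000):
    """Return (frequency, level in dB) for each harmonic up to max_hz."""
    spectrum = []
    n = 1
    while n * f0_hz <= max_hz:
        freq = n * f0_hz
        # Product of amplitudes = sum of dB levels
        level = source_level_db(freq, f0_hz) + filter_gain_db(freq, formants_hz)
        spectrum.append((freq, level))
        n += 1
    return spectrum

# Hypothetical formant values, for illustration only
for freq, level in output_spectrum(100, (700, 1700, 2500)):
    print(freq, round(level, 1))
```

In the printed levels, the harmonics nearest each assumed formant stand out above their neighbours, while the overall downward tilt of the source is still visible.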

Source-filter independence

[Figure: same source spectrum as the previous slide, but different filter]

Likewise, we could change the source (F0 and harmonic frequencies) without changing the filter. The precise harmonics would then be in different places, but the location of the formants would be the same.
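A small sketch can illustrate this independence. The formant values and the two F0 values below are hypothetical, and the sketch assumes the filter boosts most strongly whichever harmonic lies closest to each formant.

```python
def harmonics(f0_hz, max_hz=3000):
    """Harmonic frequencies of the source: whole-number multiples of F0."""
    return [n * f0_hz for n in range(1, int(max_hz // f0_hz) + 1)]

def nearest_harmonic(formant_hz, f0_hz):
    """Harmonic closest to a formant (assumed to be the one boosted most)."""
    return min(harmonics(f0_hz), key=lambda h: abs(h - formant_hz))

FORMANTS_HZ = (700, 1700, 2500)  # hypothetical filter, the same for both sources

for f0 in (100, 130):  # two illustrative F0 values
    print(f0, [nearest_harmonic(f, f0) for f in FORMANTS_HZ])
```

With F0 = 100 Hz the strongest harmonics sit at 700, 1700 and 2500 Hz; with F0 = 130 Hz they sit at 650, 1690 and 2470 Hz. The individual harmonics move, but the peaks stay in the same formant regions.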

Aperiodic spectra

● Though an aperiodic sound (e.g. a voiceless fricative) has no F0, it may still be stronger in certain frequency regions.

– Compare sounds of [θ, s, ʃ]

– Fricative spectrum is primarily determined by the size of the cavity in front of the point of constriction:

● the larger the cavity, the lower the centre of energy

● Go through HW 2