A. Acoustic Theory and Modeling of the Vocal Tract
by H.W. Strube, Drittes Physikalisches Institut, Universität Göttingen

A.1 Introduction

This appendix is intended for readers who want to inform themselves about the mathematical treatment of vocal-tract acoustics and about its modeling in the time and frequency domains. Apart from providing a fundamental understanding, this is required for all applications and investigations concerned with the relationship between geometric and acoustic properties of the vocal tract, such as articulatory synthesis, determination of the tract shape from acoustic quantities, inverse filtering, etc.

Historically, the formants of speech were conjectured to be resonances of cavities in the vocal tract. In the case of a narrow constriction at or near the lips, such as for the vowel [u], the volume of the tract can be considered a Helmholtz resonator (the glottis is assumed almost closed). However, this can only explain the first formant. Also, the constriction, if any, is usually situated farther back. Then the tract may be roughly approximated as a cascade of two resonators, accounting for two formants. But all these approximations by discrete cavities proved unrealistic. Thus researchers have now adopted a more reasonable description of the vocal tract as a nonuniform acoustical transmission line. This can explain an infinite number of resonances, of which, however, only the first 2 to 4 are of phonetic importance.

Depending on the kind of sound, the tube system has different topology:
• for vowel-like sounds, pharynx and mouth form one tube;
• for nasalized vowels, the tube is branched, with transmission from pharynx through mouth and nose;
• for nasal consonants, transmission is through pharynx and nose, with the closed mouth tract as a "shunt" line.
The situation becomes even more complicated for plosive and fricative consonants, where one must further take into account different places and kinds of excitation. Instead of, or in addition to, glottal oscillation, there is a turbulent-noise source at any narrow constriction, and for plosives, a sudden pressure release after opening a closure.

In this appendix, we will present a fundamental description of a one-dimensional tube in the time and frequency domains, show the connection between tube shape and formants, and present methods for time-domain modeling of sound propagation as well as frequency-domain computation of transfer functions and impedances. The "inverse problem" of how to estimate the tract shape from acoustical data will also be discussed briefly.

A.2 Acoustics of a Hard-Walled, Lossless Tube

To keep the formulas simple, we will present the fundamental equations for the hard-walled, lossless tube only, but initially allowing a time-varying tube shape. The more general case will only be described in the frequency domain for a time-invariant tube shape. In addition to the well-known representation by pressure and volume velocity, less familiar representations will be introduced that are useful for modeling and computation or show analogies to other fields of physics.

A.2.1 Field Equations

The acoustic field equations are derived from the Navier-Stokes flow equations by linearization, assuming that the flow velocity is small compared to the speed of sound, $c$. Furthermore, the non-zero average (dc) air flow in the vocal tract is neglected (its effects on the formants would be of second order only). But keep in mind that at narrow constrictions, nonlinear effects can become important, e.g. when whistling or in the glottis. Additional approximations are:
(1) The curved vocal tract is treated like a straight tube.
(2) The waves propagate one-dimensionally along the tube axis $x$ and are approximately plane. This requires that the slope of the tube walls be small.
(3) No higher modes with nodes over the cross-section are taken into account [they are removed by the integrals in (A.1) below]. In the vocal tract, higher modes cannot propagate below about 4 kHz.

Thus the tube is entirely described by its "area function" $A(x,t)$. Let $y, z$ be the coordinates in the cross-section plane. The appropriate field quantities are the alternating (ac) pressure averaged over the cross-section, $p(x,t)$, and the volume velocity $q(x,t)$, defined as
\[ p(x,t) = \frac{1}{A(x,t)} \iint_A p_3(x,y,z,t)\,dy\,dz, \qquad q(x,t) = \iint_A v_x(x,y,z,t)\,dy\,dz. \tag{A.1} \]
Here $p_3$ is the three-dimensional ac pressure field and $v_x$ the $x$ component of the velocity field. The ac density $\varrho(x,t)$ is defined analogously to $p(x,t)$ and proportional to it (state equation); the constant average density will be denoted by $\varrho_0$. The motion is then described by "Newton's law", the (one-dimensional) continuity equation, and the state equation:
\[ \varrho_0\,(q/A)^{\cdot} = -p', \tag{A.2} \]
\[ (\varrho A)^{\cdot} + \varrho_0 \dot{A} = -\varrho_0\, q', \tag{A.3} \]
\[ p = c^2 \varrho. \tag{A.4} \]
Here, a dot denotes $\partial/\partial t$, a prime means $\partial/\partial x$. The constant $c$ in the proportionality factor $c^2$ of (A.4) will in fact turn out to be the speed of sound (phase velocity). The second term in (A.3) represents a flow source due to the motion of the tube walls. Since in the vocal tract these move too slowly to generate audible sound, this term will henceforth be neglected. Then the last two equations can be combined into
\[ \frac{(pA)^{\cdot}}{\varrho_0 c^2} = -q'. \tag{A.5} \]
Obviously, the two field equations (A.2), (A.5) are of a form analogous to those of a lossless electrical transmission line, if $p$ is identified with voltage and $q$ with current and
\[ L' = \varrho_0/A, \qquad C' = A/(\varrho_0 c^2) \tag{A.6} \]
correspond to an inductance and capacitance density, respectively. These are not independent, since $L'C' = c^{-2}$ is constant.
Thus the tube may be completely described by the characteristic impedance:
\[ Z = \sqrt{L'/C'} = \varrho_0 c/A; \tag{A.7} \]
then $L' = Z/c$, $C' = 1/(Zc)$, and for any derivative or variation "$\delta$" we have
\[ \delta A/A = \delta C'/C' = -\delta L'/L' = -\delta Z/Z. \tag{A.8} \]
Equations (A.2), (A.5) are rewritten as
\[ -p' = (L'q)^{\cdot} = c^{-1}(qZ)^{\cdot}, \tag{A.9a} \]
\[ -q' = (C'p)^{\cdot} = c^{-1}(p/Z)^{\cdot}. \tag{A.9b} \]
The infinitesimal transmission-line element is then that shown in Fig. A.1. Note that in the time-varying case, $L'$ contains a quasi-resistive and $C'$ a quasi-conductive component, since $(L'q)^{\cdot} = L'\dot{q} + \dot{L}'q$, etc.

Fig. A.1. Electrical equivalent circuit of the infinitesimal lossless transmission-line element (elements labeled $L'\,dx$ and $C'\,dx$)

The energy balance of the tube can be derived by multiplying (A.9a) and (A.9b) with $q$ and $p$, respectively, and adding them, leading to the continuity equation
\[ (w_p + w_q)^{\cdot} + P' = (w_p - w_q)\,\dot{Z}/Z, \tag{A.10} \]
\[ w_p = C'p^2/2, \qquad w_q = L'q^2/2, \qquad P = pq, \]
where $w_p$ and $w_q$ are the potential and kinetic energy densities and $P$ is the power (energy flow). The right-hand side represents an energy source density due to work of the moving walls against the radiation pressure in the tube. This term vanishes in the time-invariant case, so that energy is then conserved.

Note that all equations are invariant under the duality transformation
\[ p \leftrightarrow q, \qquad L' \leftrightarrow C', \qquad Z \leftrightarrow 1/Z. \tag{A.11} \]
Another familiar representation uses the velocity potential $\Phi$, related to $p$ and $q$ according to
\[ p = \varrho_0 \dot{\Phi}, \qquad q = -A\Phi'. \tag{A.12} \]
These equations imply (A.2) or (A.9a) automatically. Inserting them in (A.5) now yields a second-order wave equation, namely Webster's horn equation, generalized for time-varying $A$:
\[ (A\Phi')' = c^{-2}(A\dot{\Phi})^{\cdot}, \tag{A.13} \]
which has a completely symmetric form in the space and time derivatives. Another similar form with $A$ replaced by $1/A$, again corresponding to the duality transformation (A.11), can be obtained by using the volume displacement $\int q\,dt$ instead of $\Phi$.
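The statement that energy is conserved in the time-invariant case can be checked numerically. The following is a minimal sketch, not a method from this appendix: it integrates (A.9a), (A.9b) with a standard staggered-grid leapfrog scheme for a tube closed at both ends ($q = 0$, hence $P = 0$ there) and monitors a discrete analogue of $\int (w_p + w_q)\,dx$. The area function `area`, the values of `RHO0` and `C`, the grid sizes, and the closed-end boundary conditions are all assumptions chosen for illustration.

```python
import numpy as np

RHO0 = 1.2   # average air density in kg/m^3 (assumed illustrative value)
C = 350.0    # speed of sound in m/s (assumed illustrative value)


def area(x, length):
    """Hypothetical smooth area function A(x) in m^2 (not taken from the text)."""
    return 3e-4 * (1.0 + 0.5 * np.cos(2.0 * np.pi * x / length))


def simulate_closed_tube(n_cells=170, length=0.17, cfl=0.5, n_steps=2000):
    """Leapfrog integration of (A.9a), (A.9b), time-invariant tube, q = 0 at both ends.

    Returns the discrete total energy recorded at every time step.
    """
    dx = length / n_cells
    dt = cfl * dx / C                          # time step from the Courant number
    x_p = (np.arange(n_cells) + 0.5) * dx      # pressure points (cell centers)
    x_q = np.arange(1, n_cells) * dx           # interior flow points (cell faces)
    cap = area(x_p, length) / (RHO0 * C**2)    # C' = A / (rho0 c^2), see (A.6)
    ind = RHO0 / area(x_q, length)             # L' = rho0 / A, see (A.6)

    # Initial state: Gaussian pressure pulse, no flow (q is stored at t = -dt/2).
    p = np.exp(-((x_p - 0.5 * length) / (0.1 * length)) ** 2)
    q_old = np.zeros(n_cells - 1)
    energy = np.empty(n_steps)
    for n in range(n_steps):
        # (A.9a) with constant L':  L' dq/dt = -dp/dx
        q_new = q_old - dt / (ind * dx) * np.diff(p)
        # Discrete analogue of the integral of w_p + w_q = C'p^2/2 + L'q^2/2;
        # q^2 is replaced by the product of the two neighbouring half-step values.
        energy[n] = 0.5 * dx * (np.sum(cap * p**2) + np.sum(ind * q_new * q_old))
        # (A.9b) with constant C':  C' dp/dt = -dq/dx, with q = 0 at the closed ends
        q_full = np.concatenate(([0.0], q_new, [0.0]))
        p = p - dt / (cap * dx) * np.diff(q_full)
        q_old = q_new
    return energy


energy = simulate_closed_tube()
drift = (energy.max() - energy.min()) / energy[0]
print(f"relative energy variation: {drift:.2e}")
```

For this particular scheme the monitored quantity is conserved up to rounding error, so the printed variation should be tiny. A time-varying area function would make the right-hand side of (A.10) non-zero and is outside the scope of this sketch.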
When $A$ and thus $L'$, $C'$, $Z$ are constant in time, Webster's horn equation can be written in its familiar form, not only for velocity potential and volume displacement but also for pressure and volume velocity themselves:
\[ p'' + (A'/A)\,p' - c^{-2}\ddot{p} = 0, \tag{A.14} \]
\[ q'' - (A'/A)\,q' - c^{-2}\ddot{q} = 0. \tag{A.15} \]

Other Representations.

a) Square root of energy density. By replacing $p$ and $q$ with the corresponding square-root-of-energy-density components,
\[ \psi = p\sqrt{C'/2}, \qquad \varphi = q\sqrt{L'/2}, \tag{A.16} \]
the field equations (A.9a), (A.9b) now read
\[ c^{-1}(Z^{1/2}\varphi)^{\cdot} = -(Z^{1/2}\psi)', \qquad c^{-1}(Z^{-1/2}\psi)^{\cdot} = -(Z^{-1/2}\varphi)', \tag{A.17} \]
or equivalently,
\[ c^{-1}(\dot{\varphi} + Q\varphi) = -\psi' - W\psi, \qquad c^{-1}(\dot{\psi} - Q\psi) = -\varphi' + W\varphi, \tag{A.18a} \]
with
\[ Q = \dot{Z}/2Z = -\dot{A}/2A, \qquad W = Z'/2Z = -A'/2A. \tag{A.18b} \]
In the time-invariant case ($Q = 0$), this representation removes the first-order derivatives of the fields from the Webster horn equations (A.14), (A.15), yielding Schrödinger (or rather, Klein-Gordon) type equations instead:
\[ -\psi'' + V_p\psi + c^{-2}\ddot{\psi} = 0, \tag{A.19} \]
\[ -\varphi'' + V_q\varphi + c^{-2}\ddot{\varphi} = 0, \tag{A.20} \]
with potentials
\[ V_p = (\sqrt{A})''/\sqrt{A} = W^2 - W', \qquad V_q = (1/\sqrt{A})''\,\sqrt{A} = W^2 + W'. \tag{A.21} \]
This representation may be useful for eigenvalue problems and inverse problems. In the quantum-mechanical terminology, $V_p$ and $V_q$ have the form of supersymmetric partner potentials with $W$ as superpotential.

b) Wave quantities. As will become apparent in Sect. A.3 below, it is often advantageous to transform $p$ and $q$ to an equivalent representation by "right" and "left" traveling waves, whose dimension may be pressure, volume velocity, or square root of power.