Synthesis of Breathy, Normal, and Pressed Phonation Using a Two-Mass Model with a Triangular Glottis
Total Page:16
File Type:pdf, Size:1020Kb
Synthesis of breathy, normal, and pressed phonation using a two-mass model with a triangular glottis Peter Birkholz, Bernd J. Kroger,¨ Christiane Neuschaefer-Rube Clinic of Phoniatrics, Pedaudiology, and Communication Disorders, University Hospital Aachen and RWTH Aachen University, Aachen, Germany [email protected], [email protected], [email protected] Abstract qualities. A previous vocal fold model with inclined mass ele- ments somewhat similar to ours was presented by Childers [9], Two-mass models of the vocal folds and their variants are valu- but it was more simplified and not designed for the simula- able tools for voice synthesis and analysis, but are not able to tion of voice qualities. The proposed model is introduced in produce breathy voice qualities. The produced voice qualities Sec. 2. Section 3 describes the synthesis of vowels for different usually lie between normal and pressed. The reason for this degrees of abduction with both the classical TMM [1] and the property is that the mass elements are aligned parallel to the new TMM. Section 4 compares the performance of both models dorso-ventral axis. Thereby, the glottis always closes simulta- with respect to the voice quality of synthesized vowels on the neously along the entire length of the vocal folds. For breathy kinematic, acoustic, and perceptual level. phonation, however, the closure happens rather gradual. This article introduces a modified two-mass model with mass ele- ments that are inclined with respect to the dorso-ventral axis as 2. Proposed two-mass model a function of the degree of abduction. In this way, the closing 2.1. Mechanics phase of the glottis becomes progressively more gradual when the degree of abduction is increased. This model is able to pro- Each vocal fold is represented by two mass elements that duce the continuum of voice qualities from pressed over normal are connected to a fixed reference frame with springs ki and to breathy voices. dampers ri (i = 1, 2 for the lower and upper mass, respec- Index Terms: Vocal fold model, triangular glottis, voice quality tively) and coupled to each other with an additional spring kc (Fig. 1). We assume symmetry with respect to the midsagit- 1. Introduction tal plane. In the pre-phonatory rest position, the displacements of the masses at the posterior end (at z = 0) are given by Low-dimensional lumped-mass models of the vocal folds (e.g. xrest1(0) and xrest2(0). When xresti(0) 0, the displace- [1, 2, 3]) are able reproduce many properties of phonation, like ments decrease linearly towards zero at the≥ anterior commis- self-sustained oscillations over a wide frequency range, differ- sure, so that the pre-phonatory shape of the glottis becomes tri- ent voice registers, and the phase differences between the upper angular, i.e. xresti(z) = xresti(0)(1 z/l) for z > 0, where and lower margins of the vocal folds. However, the simula- l is the length of the vocal folds. In− the following, we use tion of breathy voice qualities was recognized as problematic the shorthand notation xresti xresti(0). Let x1 and x2 de- with this class of models [4]. During breathy phonation, the note the time-varying horizontal≡ displacements of the masses. vocal folds open and close more gradually than during modal Then, the half-width of the glottis along the dorso-ventral z- phonation, and the glottis does often not close entirely during axis is given by wi(z) = max 0,xresti(1 z/l)+ xi and a vibration cycle [5]. Previous low-dimensional lumped-mass the glottal areas between the lower{ and upper− mass pairs} are models cannot account for these properties, because their mass l Ai = 2 Rz=0 wi(z)dz. Figure 1b) and c) illustrate the shape of elements are aligned parallel to the dorso-ventral axis, so that the glottis for different time-varying displacements but the same the opening and closing of the vocal folds always happens si- pre-phonatory rest displacements. When the rest displacement multaneously along the entire length. This gives the synthetic xresti < 0, i.e. when the vocal folds are strongly adducted, then voice usually a pressed or normal voice quality. Vocal fold Ai = max 0, 2l(xresti + xi) , as in the classical TMM. models with multiple masses along the dorso-ventral dimension The equations{ of motion for} each of the masses are (e.g. [6]) can in principle account for gradual and incomplete ∗ closure, but at the expense of considerably increased complex- F1 = m1x¨1 + r1x˙ 1 + k1x1 + kcol1 1(x1 + xrest1) ity. Another class of models capable to simulate gradual clo- +kc(x1 x2) (1) sures and breathy voices are geometric models [7, 8]. However, − ∗ they are not self-oscillating and therefore less realistic from a F2 = m2x¨2 + r2x˙ 2 + k2x2 + kcol2 2(x2 + xrest2) physiological point of view. +kc(x2 x1), (2) In this study, we present a new two-mass model (TMM) − with mass elements that are inclined with respect to the dorso- where i are the time-varying relative portions of the length l, ventral axis as a function of the degree of abduction. This makes where the left and right masses are in contact (0 i 1, ∗ ≤ ≤ the opening and closing of the vocal folds progressively more cf. Fig. 1b and c), and xresti are the rest displacements in ∗ ∗ gradual with increasing abduction and results in incomplete clo- the middle of these portions along the z-axis (at z1 and z2 in sure at high degrees of abduction. This allows to synthesize dif- Fig. 1c). kcol1 and kcol2 are the spring constants of the ad- ferent degrees of breathiness besides pressed and normal voice ditional springs that repel the left and right vocal folds during a) Vocalfoldsinrestposition b) Wideopenvocalfoldsduring c)Partlyclosedvocalfoldsduring anoscillationcycle anoscillationcycle (topview) (topview) z Anteriorcommissure z * z2 d out kr, 2 2 z * l 1 m2 k d2 c d m 1 1 0 x2 kr1, 1 x * * xrest2 xrest1 x x (z) x (0) x1 d rest1,2 rest1,2 in Inletregion Uppermass Lowermass á1=á2=0 á1=0.7 á2=0.23 Figure 1: (a) Pseudo-3D view of the model. (b,c) Top view of the model for a wide open and a partly closed glottis during an oscillation cycle with the same pre-phonatory rest displacements. The dotted lines show the vocal fold margins in the rest position, i.e. xrest1,2(z). collision. For simplicity, we use linear springs in our model, be- For the digital simulations, Eqs. 1 and 2 were approximated by cause the nonlinear spring characteristics of the classical model a finite difference scheme analog to [1] to obtain x1 and x2 at a have a relatively little effect on the oscillations according to [10, rate of 44100 Hz. p. 916]. The external forces are 2.2. Aerodynamic-acoustic model F1 = P1d1lopen1 + 0.25 (Psub + P1)dinl (3) ⋅ F2 = P2d2lopen2 + 0.25 (P2 + Psupra)doutl, (4) The model of the vocal folds was implemented in the frame- ⋅ work of the articulatory speech synthesizer VocalTractLab where lopen1 and lopen2 are the lengths of the open partitions (www.vocaltractlab.de). The synthesizer approximates between the upper and lower mass pairs (0 lopeni l), i.e. ≤ ≤ the trachea, the glottis, and the vocal tract as a series of abut- the partitions where the masses are not in contact. d1, d2, din, ting cylindrical tube sections with variable lengths. Two tube and dout are explained in Tab. 1. Psub, P1, P2, and Psupra sections with the time-varying lengths d1 and d2 and areas A1 denote the subglottal pressure, the pressures between the lower and A2 represent the glottis. The aerodynamic-acoustic simu- and upper masses, and the supraglottal pressure, respectively. lation is based on a transmission-line representation of the tube The second terms on the right-hand side of Eqs. 3 and 4 are system [11, 12]. The simulation assumes a Bernoulli-type flow the hinge moments on the lower and upper masses due to the from the subglottal region to the glottis section with the mini- mean pressures in the inlet and outlet regions. The classical mum diameter and flow detachement without dynamic pressure TMM neglects these forces, but we consider it as more realistic recovery at the exit of this section. This differs from the orig- to include them like e.g. [2]. inal assumptions by Ishizaka and Flanagan [1] and conforms with more recent findings about the pressure distribution in the glottis [13]. A dipole noise source injects white noise with an Table 1: Mechanical parameters of the two-mass model. Refer amplitude proportional to the squared Reynolds number of the to the main text for q, 1, and 2. glottal flow right above the glottis to simulate aspiration noise. Parameter Symbol Value Unit Vocal fold length l 1.3 √q cm ⋅ 3. Simulation experiments Lower mass thickness d1 0.25/√q cm Upper mass thickness d2 0.05/√q cm At the physiological level, pressed, normal, and breathy voice Lower mass m1 0.125/q g qualities mainly differ in terms of the degree of glottal abduc- Upper mass m2 0.025/q g tion (and hence glottal rest area), which is greatest for breathy Lower spring constant k1 80 q N/m ⋅ voice, least for pressed voice, and somewhere in between for Upper spring constant k2 8 q N/m ⋅ 2 normal voice. We examined for both the classical TMM and Coupling spring constant kc 25 q N/m ⋅ the new TMM to what extend these models can reproduce this Lower collision spring cons. kcol1 240 q N/m ⋅ relationship between voice qualities and degrees of abduction Upper collision spring cons.