
Journal of Phonetics (1997) 25, 405-419

Lip-pellet positions during vowels and labial consonants

John R. Westbury and Michiko Hashi

Waisman Center and Department of Communicative Disorders, University of Wisconsin-Madison, 1500 Highland Avenue, Madison, WI 53705-2280, U.S.A.

Received 15th July 1996, and in revised form 8th April 1997

Sagittal-plane movements of small markers attached to the upper and lower lips were analyzed for ten speakers of American English and seven speakers of Japanese. Each speaker produced simple utterances containing vowels and labial consonants. The data were analyzed to better understand: (1) patterns of pellet motions associated with labial production; (2) pellet positions at discrete, acoustically defined moments during selected speech sounds; and (3) the relationship between midline separation between the lip surfaces and inter-lip-pellet distance. Results from the study provide qualitative information about the dynamics of labial gestures for consonants involving lip closure. The data also indicate that the English and Japanese speakers positioned and moved their lips in generally similar ways during the test sounds analyzed. Finally, results suggest that plausible estimates of midline inter-lip separation can be derived from the trajectories of two pellets, one on each lip, as long as the possibility of lip-body deformation is taken into account. © 1997 Academic Press Limited

1. Introduction

Houde (1968) introduced the phrase “point-parameterized” to refer to records of speech movement that take the form of trajectories of discrete, point-like “markers” (e.g., pellets, coils, light-emitting diodes, reflecting disks, or even bony landmarks) on an articulator. Several modern techniques for studying speech movement provide data of this type for accessible articulators such as the tongue, lips, jaw, and soft palate. For example, the X-ray microbeam (e.g., Macchi, 1988), electromagnetic articulometry (e.g., Perkell, Matthies, Svirsky & Jordan, 1993), and certain opto-electrical techniques (e.g., Boyce, 1990) have all been used to examine and describe speech-related motions of the lips in terms of sagittal-plane coordinates of the centers of small markers, one firmly attached in the midline at the vermilion border of each lip. The lips pucker and spread and rise and fall during speech, as talkers shape the inter-lip cavity to modify the spectrum of sound radiating from the mouth. In the best of all worlds, a complete understanding of these actions might be inferred from a sagittal-plane representation of the motions of two fleshpoints. However, more must be known about the relationship between lip-marker positions, and the nature and degree of labial constrictions, before broad inferences about lip action can be drawn.

Footnote: A recent report by Ramsey, Munhall, Gracco & Ostry (1996) describes three-dimensional fleshpoint-kinematic data, recorded from a single talker, using multiple lip markers tracked by means of an optoelectrical system. Those data provide a more complete view of the actions of the lips than the more conventional two-marker, sagittal-plane view considered in this report. However, inferences about size and shape of the lip opening, from both types of data, will probably be constrained in similar ways.

0095-4470/97/040405+15 $25.00/0/jp970050 © 1997 Academic Press Limited

A handful of direct-imaging studies of lip function (e.g., Fujimura, 1961; Fromkin, 1964; Linker, 1982; Abry & Boë, 1986) provide information about time changes in size and shape of the lip opening during speech. However, these studies do not tell us what discrete point motions reveal about labial articulation. An indirect way to begin to address this question is to examine trajectories of lip markers during speech events that limit lip opening. The labial consonants /p b m/ are events of this type. During their closures, no air exits the mouth, because the lips are fully together and form a tight seal. For this reason, we might expect the positions of lip markers, and the distances between them, to vary little during consonantal closure intervals.

The analysis of lip-marker kinematics described in this report was designed, in part, to address these simple expectations, and in general, to provide an improved understanding of two broader topics. The first relates to how midline labial fleshpoints move as lip closures are formed and released for the labial consonants /p b m/. The second relates to the question of whether reliable conclusions about labial constrictions are possible from something so simple as the sagittal-plane trajectories of one marker on each lip.

Figure 1. A stylized tracing of a midsagittal section of the vocal tract. Hypothetical pellets are represented by small solid circles “attached” to the outlines of both lips. Letter symbols and lines are defined in the text.

A practical benefit associated with a better understanding of the articulatory significance of lip-marker positions has to do with representing vocal tract postures for vowels. It is common to think of the vocal tract as a flexible tube, and to define the articulatory posture at any moment during speech in terms of an area function which represents the cross-sectional area of the tube as a function of its length, from the glottis to the lips. A plausible labial termination can easily be “attached” to a tube approximation of the vocal tract if length and degree of the lip constriction are simply related to the measured positions of upper and lower-lip markers. A sketch shown in Fig. 1 suggests a scheme for doing this. To a first approximation, constriction length might be represented by the length of a line such as A, drawn tangent to the lower-most edges of the maxillary incisors, and perpendicular to the segment connecting upper and lower-lip marker positions. Constriction degree reflected by the midsagittal separation between the lip surfaces, analogous to B, might then be represented by the distance C between the two markers, minus some reference measure of the lips' combined thickness. A “candidate” reference measure might be some distance between markers when the opposing lip surfaces are in contact (e.g., during closure for a labial stop).
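The constriction-degree scheme suggested by Fig. 1 can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the function names are hypothetical, coordinates are assumed to be (x, y) pairs in mm in the sagittal plane, and clamping negative estimates to zero is an added assumption (the text only proposes subtracting a reference thickness from the inter-marker distance C).

```python
import math


def pellet_distance(ul, ll):
    """Distance C (mm) between upper-lip and lower-lip markers.

    ul, ll: (x, y) sagittal-plane coordinates in mm.
    """
    return math.hypot(ul[0] - ll[0], ul[1] - ll[1])


def estimated_lip_separation(ul, ll, reference_distance):
    """Estimate the midsagittal separation between lip surfaces (analogous to B).

    reference_distance: inter-marker distance (mm) measured while the lip
    surfaces are in contact, e.g. during closure for a labial stop.
    Negative estimates are clamped to zero (an assumption of this sketch).
    """
    return max(0.0, pellet_distance(ul, ll) - reference_distance)


# Hypothetical marker positions: the markers are 25 mm apart, and the
# reference (closed-lip) distance is 17 mm, so the estimate is 8 mm.
separation = estimated_lip_separation((12.0, 6.0), (5.0, -18.0), 17.0)
```

The estimate is only as good as the reference distance, which is one reason to examine how inter-marker distance behaves during closure before relying on it.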

2. Methods and materials

Acoustic and labial fleshpoint kinematic data from a sample of 10 speakers of American English, and seven speakers of Japanese, were analyzed. Data from these speakers were not collected specifically for this analysis. Instead, they were available from existing corpora, in which the sound pressure wave had been sampled 21,739 times/s, while each of the speakers produced single examples of /p b m/ in isolated /ɑ_ɑ/ frames, with primary stress on the second vowel; and single, isolated examples of the five vowels /i ɛ ɑ o u/. During each brief speech task, sagittal-plane positions of upper-lip (UL) and lower-lip (LL) markers (gold pellets, 2.5 mm in diameter, attached in the midline at the vermilion border) were recorded at 40 and 80 times/s, respectively, using the X-ray microbeam (XRMB) system at the University of Wisconsin. Pellet positions were expressed relative to cranial axes, defined relative to each speaker's maxillary occlusal plane (MaxOP) and central maxillary incisors (CMI), according to conventions described elsewhere (Westbury, 1994a). Materials for the English speakers were drawn from the publicly available XRMB Database (Westbury, 1994b) [speakers E29 (female), E31(f), E34(f), E35(f), E41 (male), E44(m), E53(m), E54(f), E59(m), E61(m)]. Materials for the Japanese speakers (three males and four females) were drawn from a smaller, independently collected data set. For the Japanese dataset, utterances containing /p b m/ embedded in /ɑ_ɑ/ frames were written in katakana, and read aloud as meaningless words with primary accent on the second /ɑ/. Each isolated vowel production, also read aloud, was prompted by a single katakana letter corresponding to the sound.

Footnote: Throughout the text of this report, the mid-front, low-back, and high-back vowels of Japanese are transcribed phonemically as /ɛ ɑ u/, respectively. These symbols are correct for the English vowels, but possibly misleading for Japanese. Sources cited by Vance (1987) suggest that the mid-front vowel in Japanese may be phonetically “midway” between [e] and [ɛ]; the low-back vowel midway between [a] and [ɑ]; and the high-back vowel closer to [ɯ]. In Fig. 5 the symbols e and a are used to represent the mid-front and low-back vowels produced by both sets of speakers. This usage is due to graphical limitations imposed by plotting software.

Footnote: The XRMB system sometimes fails to track one or more pellets for some or all time samples spanning a speech task. Examples of /m b/ were lost to mistracking for one Japanese speaker (J5). Consequently, only 49 VCV utterances were available for analysis, rather than the 51 that would be expected given one example each of /p b m/ in an [ɑ_ɑ] frame, for seventeen speakers. Two time samples of UL pellet position were also lost to mistracking for one English speaker (E61), during the release gesture for /m/. However, sufficient residual information was available in the latter case to approximate the two missing time samples in the trajectory, using a simple linear interpolation scheme.

Footnote: Identification numbers for English speakers are the same as in the publicly available XRMB Speech Production Database, while the speech tasks correspond to vcv and vowel records from that corpus. Waveforms analyzed in the current study were processed and filtered in a slightly different way from those in the public release of the XRMB corpus, though these differences have no material effect on relative magnitudes of positional measures. Materials from these XRMB Database speakers, tasks, and/or waveforms may be used for similar or other purposes, in future work by other investigators, and it may become important to know which of the relevant materials were included in the current analysis. The 17 English and Japanese speakers included in this study are the same group also described in an independent analysis of vowel configurations (Hashi, Westbury & Honda, 1994).

Speakers in both samples were healthy and neurologically normal, with no evidence of speech pathology. Moreover, none was younger than 18 years, and all were native speakers of their language group. All English speakers had spent their linguistically formative years as residents of Wisconsin, and were essentially monolingual. Six of the seven Japanese speakers spoke a Tokyo dialect, while one (J5) spoke an Ibaragi dialect. J5 was included in the Japanese sample because his vowel formant frequencies were essentially like those of the Tokyo-dialect speakers. All Japanese speakers were nominally bilingual, but variably proficient in English. In general, data from the two groups were elicited, recorded, and processed according to procedures described in a handbook for the XRMB Database (Westbury, 1994b).

Several measurements were made from each VCV token, and from each isolated vowel, produced by each speaker. Measurements from representative VCV tokens are illustrated in Figs. 2 and 3. For each consonant in VCV tokens (e.g., Fig. 2), abrupt spectral changes in the acoustic wave associated with closure and release moments were identified and marked as t# (closure) and t (release), respectively (corresponding to vertical lines in Fig. 2, headed by unfilled circles 1 and 2). A time of minimum distance (t*) was marked at the moment when the separation between pellets was smallest during the [t#, t] interval. Moments corresponding to local maxima in pellet speed preceding t# and following t (e.g., Fig. 3) were also identified for both pellets. In each isolated vowel, an acoustically defined “steady-state” moment (t) (Hashi, Westbury & Honda, 1994) was identified as the midpoint of the interval within which neither F1 nor F2 changed more than 40 Hz over any 20 ms interval. Sagittal-plane coordinates for both lip pellets, and the distance between them (hereafter, D), were determined at the four moments {t#, t, t*, t} (closure, release, minimum distance, and vowel steady state). Speed values for both pellets were determined at each of their respective speed-maximum moments.

Footnote: Following conventions that are customary in mechanics, the term speed is used to refer to the scalar magnitude of the velocity vector (dx/dt, dy/dt) that can be defined at each position sample for each pellet.

Figure 2. Representative data from /ɑmɑ/, produced by one English speaker (E29). Time histories are shown for lip-pellet separation D (top panel); lip-pellet coordinates (e.g., ULx, lower right); and the sound pressure wave. Sagittal-plane trajectories of upper and lower-lip pellets are shown in the lower left panel. Vertical lines intersecting time histories represent “event” times associated with judged moments of consonantal closure and release (t# and t, respectively), and the time of minimum distance between labial pellets (t*).

Figure 3. The sound pressure wave, and corresponding time histories of upper and lower-lip pellet speed, during /ɑpɑ/ produced by one English speaker (E31). All histories are intersected by vertical lines corresponding to the moments t# and t. Shorter, separately labeled vertical lines intersect the UL and LL speed histories, to indicate the times of occurrence of local maxima in pellet speed, just before t# and just after t.

For reference purposes, first and second formant frequencies (F1 and F2) were measured at the steady-state moment t during each isolated vowel, using an LPC-based formant tracking algorithm in Cspeech (Milenkovic & Read, 1992), with the number of LPC coefficients set to 24. The formant-tracking measurements were supplemented by independent spectrographic measurements, involving bandwidths of 300 and 500 Hz for male and female speakers, respectively. Measurements based upon formant tracks were accepted when differences between measurements were within 30 Hz for F1, and 60 Hz for F2. Spectrographic measurements of F1 and/or F2 were accepted when measurements differed by more than the relevant criterion. Scatterplots of F1 and F2 frequencies at the steady-state moment are shown in Fig. 4, for both speaker samples.
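One way to read the steady-state criterion for isolated vowels (the midpoint of the interval within which neither formant changes by more than 40 Hz over any 20 ms) is sketched below. This is an interpretation under stated assumptions, not the authors' implementation: the function name is hypothetical, formant tracks are assumed to be sampled on a common time base, "change" is taken as the range (maximum minus minimum) within each 20 ms window, and the longest qualifying stretch is chosen when several exist.

```python
def steady_state_midpoint(times_ms, f1, f2, window_ms=20.0, max_change_hz=40.0):
    """Return the midpoint (ms) of the longest stretch in which neither
    formant track varies by more than max_change_hz within any window_ms.

    times_ms, f1, f2: equal-length sequences (time in ms, formants in Hz).
    Returns None if no full window satisfies the criterion.
    """
    n = len(times_ms)

    def window_end(i):
        # Index of the last sample within window_ms of sample i, or None if
        # the window is incomplete or either formant changes too much.
        j = i
        while j + 1 < n and times_ms[j + 1] - times_ms[i] <= window_ms:
            j += 1
        if times_ms[j] - times_ms[i] < window_ms:
            return None
        for track in (f1, f2):
            segment = track[i:j + 1]
            if max(segment) - min(segment) > max_change_hz:
                return None
        return j

    best = None  # (duration, start_time, end_time)
    i = 0
    while i < n:
        end = window_end(i)
        if end is None:
            i += 1
            continue
        start, last_end = i, end
        i += 1
        # Extend the stretch while consecutive windows remain stable.
        while i < n:
            end = window_end(i)
            if end is None:
                break
            last_end = end
            i += 1
        duration = times_ms[last_end] - times_ms[start]
        if best is None or duration > best[0]:
            best = (duration, times_ms[start], times_ms[last_end])

    if best is None:
        return None
    return (best[1] + best[2]) / 2.0


# Synthetic tracks sampled every 5 ms: F1 rises, holds at 700 Hz, then falls.
times = list(range(0, 105, 5))
f1_track = [300, 400, 500, 600, 700] + [700] * 12 + [650, 550, 450, 350]
f2_track = [1200] * 21
midpoint = steady_state_midpoint(times, f1_track, f2_track)
```

In this synthetic case the stable stretch runs from 20 ms to 80 ms, so the steady-state moment falls at 50 ms.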

3. Results

3.1. Lip-pellet motions accompanying labial closure and release for /p b m/

Close inspection of data from all speakers revealed no systematic differences in pellet movements associated with the three different labial consonants, and relatively few differences in lip-pellet motions produced by English and Japanese speakers. During all utterances containing /p b m/, both pellets moved toward one another before t#, and away from one another after t. No speaker moved only one lip when making these sounds. At least one lip pellet also continued to move throughout each closure produced by each speaker.

Movements of the UL pellet during the interval surrounding [t#, t] usually covered distances no larger than 5 mm, and tended to be about twice as large in the y-direction (normal to the maxillary occlusal plane) as in the x-direction (parallel to the occlusal plane). UL trajectories were sometimes looping and partly elliptical, though on the whole there was no stereotyped movement pattern. Movements of the LL pellet during the interval surrounding [t#, t] followed paths that were initially upward and slightly forward, and then downward and rearward, covering distances of roughly 20 mm from or toward positional extrema associated with the adjacent, low back vowel /ɑ/. In general, LL trajectories were either line-like, in which the approach and retreat segments, toward and away from the local positional maximum at t*, fell along roughly the same path; or they were shaped something like an inverted letter Y, V, or U, in which the approach path was often steeper than, and always forward of, the retreat path. As a rule, the LL pellet began moving toward closure, synchronously in the x and y directions, about 75 to 100 ms before t#. In the majority of cases, both coordinate histories tended to reach their local maxima at about the same time, rarely more than 75 ms after t#. Then, the pellet moved smoothly down and back, through the release moment.

Footnote: The term history is used to refer to any time-series record of a measured or derived quantity (e.g., the location of a pellet in either the x or y direction of the sagittal plane; or the speed of a pellet, as it moves along its respective sagittal-plane trajectory). The term trajectory is reserved to refer to the path traced by a pellet moving in a plane.

Figure 4. Scatterplots of frequencies of first and second formants (F1 and F2, respectively) at the steady-state moment t, for English and Japanese speaker samples. The vowels /i ɛ ɑ o u/ are represented by unfilled circles, upright triangles, squares, inverted triangles, and diamonds, respectively. Identification numbers for speakers are shown inside their respective symbols.

An inflection occurred at about t# in the majority of UL and LL trajectories. This inflection is too small to be seen clearly in the UL trajectory shown in the lower left panel of Fig. 2, but corresponds to subtle “direction” changes in the UL x- and y-coordinate histories in the vicinity of t#, in the lower right panel of the same figure. The closure-related inflection in the LL trajectory most often corresponded to a local change in the slope of the pellet's y-coordinate history immediately before and after t# (cf. the lower right panel of Fig. 2). The “bend” or “knee” near the top of the ascending leg of the LL trajectory, beginning at about the unfilled circle 1 marking t#, illustrates the phenomenon.

Both pellets speed up as they approach t#; slow down during the [t#, t*] interval; and then speed up again as they move toward and through the moment t. This stereotyped pattern is illustrated in Fig. 3, in which local maxima in speed histories derived from both the UL and LL trajectories occur shortly before t#, and shortly after t. Descriptive statistics for times of occurrence of these maxima, relative to t# and t for both pellets, and computed across sounds, talkers and language groups, are shown in Table I. On average, the local maximum in UL speed prior to t# occurred slightly before the local maximum in LL speed also preceding that moment. Conversely, the local maximum in UL speed after t occurred slightly later than the local maximum in LL speed also occurring after that moment.

The UL and LL pellet speeds at their respective speed-maximum moments, averaged across sounds and talkers but within each language group, are shown in Table II. Several generalizations within and across groups are possible from these data, though it is important not to overstate their significance in view of the small sample sizes. The strongest of these generalizations relates to the fact that speeds for the UL pellet were noticeably lower than for the LL pellet, at either speed maximum. Other generalizations include the fact that: (1) pellet speeds were greater among Japanese than English talkers, at each measurement time except the LL opening maximum after t (cf. Table II); (2) among English talkers, the maximum LL speed prior to t# was systematically lower than the maximum LL speed after t (cf. Sussman, MacNeilage & Hanson, 1973), while the UL pellet speeds at the two moments were about the same; and (3) among the Japanese talkers, maximum pellet speeds were higher before t# than after t, though the relevant distributions were overlapping.
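The speed histories and the timing of their local maxima relative to closure and release could be derived along the following lines. This is a minimal sketch under assumptions that go beyond the text: function names are hypothetical, velocity is approximated by central differences (the differentiation method is not stated in the source), and the time of minimum pellet separation t* is used to separate the closing peak from the opening peak, since opening-speed maxima can precede the acoustic release (cf. the ranges in Table I).

```python
import math


def speed_history(xs, ys, dt_s):
    """Speed (mm/s) at each sample: magnitude of the velocity (dx/dt, dy/dt).

    Central differences are used inside the record and one-sided differences
    at its ends (an assumption of this sketch). dt_s is the sample period in s.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired position samples")
    speeds = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        span = (hi - lo) * dt_s
        speeds.append(math.hypot((xs[hi] - xs[lo]) / span,
                                 (ys[hi] - ys[lo]) / span))
    return speeds


def local_maxima(values):
    """Indices of interior samples that are local maxima."""
    return [i for i in range(1, len(values) - 1)
            if values[i - 1] <= values[i] > values[i + 1]]


def speed_peak_times(times_ms, speeds, t_close_ms, t_min_ms, t_release_ms):
    """Times of the closing and opening speed maxima, relative to closure
    and release respectively (ms; negative means the peak came first).

    The closing peak is the last local maximum at or before t_min_ms; the
    opening peak is the first local maximum after it.
    """
    peaks = local_maxima(speeds)
    before = [times_ms[i] for i in peaks if times_ms[i] <= t_min_ms]
    after = [times_ms[i] for i in peaks if times_ms[i] > t_min_ms]
    closing = before[-1] - t_close_ms if before else None
    opening = after[0] - t_release_ms if after else None
    return closing, opening


# Synthetic speed history sampled every 10 ms, with closure at 40 ms,
# minimum pellet separation at 50 ms, and release at 60 ms.
times = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
speeds = [10, 50, 120, 90, 30, 10, 20, 80, 150, 100]
closing_re_closure, opening_re_release = speed_peak_times(times, speeds, 40, 50, 60)
```

With these synthetic values the closing peak falls 20 ms before closure and the opening peak 20 ms after release, the same sign convention as in Table I.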

TABLE I. Statistics for relative times of occurrence of local maxima in UL and LL speeds. Times of the speed maxima before closure and after release are expressed relative to t# and t, respectively. Thus, a negative number indicates that the speed maximum occurred before t# (or t). Conversely, a positive number indicates that the speed maximum followed t# (or t). Data are pooled across consonants, speakers, and language groups. Measures are in ms

Pellet         Speed maximum before closure, re t#     Speed maximum after release, re t
               Mean    SD    Range                      Mean    SD    Range
UL (n = 49)    -28     13    (-62, -7)                  26      26    (-28, 72)
LL (n = 49)    -21     12    (-56, 0)                   14      9     (-3, 36)

TABLE II. Statistics for the magnitudes of (maximum) pellet speed before t#, and after t. Measures are in mm/s

Pellet, group             Closing speed (before t#)      Opening speed (after t)
                          Mean    SD    Range             Mean    SD    Range
UL, Japanese (n = 19)     61      21    (25, 96)          45      16    (21, 77)
UL, English (n = 30)      34      14    (15, 74)          33      12    (15, 75)
LL, Japanese (n = 19)     194     47    (106, 300)        178     29    (118, 238)
LL, English (n = 30)      140     44    (61, 246)         180     51    (103, 281)

3.2. Lip-pellet positions at t#, t, t*, and the vowel steady state

Scatterplots of average lip-pellet positions (computed across speakers, within each language sample), at closure-related moments for consonants (t#, t, and t*), and “steady-state” moments for vowels (t), are shown in Fig. 5. Up, toward the top of the head (along lines perpendicular to the maxillary occlusal plane, MaxOP), is toward the top of each panel in the figure, while forward, in the direction toward the face (expressed relative to the central maxillary incisors, CMI), is toward the right. Average sagittal-plane pellet positions for vowels are indicated by filled, labeled circles, while those for consonants are indicated by labeled, unfilled symbols (squares and circles).

Footnote: Speakers' lips differ in size and shape. Morphological differences may account for certain between-group differences in data described in this report. For example, apparent group differences in linear slopes and y intercepts relating LL pellet-position coordinates, shown in Fig. 5, might just as easily represent a “morphological artifact” associated with the specific speakers included in either group, as a global distinction between the two languages. Similarly, the greater pellet separations for Japanese than English speakers, at the steady-state moment for matched vowels (cf. Table III), were probably due to a general group difference in lip thickness. When morphological differences between talkers are extreme, and/or the directions and relative magnitudes of variable effects covary with morphology, some procedure for speaker normalization may be necessary before sample statistics can be calculated across speakers. Nothing about data summarized in this report suggested any strong argument for normalization. Consequently, data from individual speakers were merely “added together” before statistical calculations.

Figure 5. Average pellet positions, computed across speakers within groups, at {t#, t, t*, t}. Open squares (e.g., labeled pc, bc, mc, br and mr) indicate pellet positions at closure and release for /p b m/. Open circles (labeled mn) indicate average position at t*, computed across all three consonants. Average positions at that moment differed so little among the three consonants that it is impractical to use separate labeled symbols. Solid circles (labeled i, e, a, o, and u, for /i ɛ ɑ o u/, respectively) indicate average positions at the vowel steady-state moment.

Several generalizations can be drawn from data illustrated in Fig. 5. One simple rule of thumb is that average UL position was about the same for all three consonants, at t#, t, and t*, within each speaker sample. This result is illustrated by the cluster of overlapping positions, and (appropriately) illegible labels, for the unfilled squares and circles in the upper (UL) portions of the right and left panels of the figure. Within each language sample, average LL position was also about the same for different consonants measured at the same moment, but differed according to the moment at which measurements were made, being highest at t* (indicated by a single unfilled circle, labeled mn); somewhat lower at t# (the higher triple of labeled, unfilled squares); and lower still at t (in each panel, the lower triple of labeled, unfilled squares). Because the lips were separated during vowels, average UL positions were higher for vowels than for consonants and average LL positions were generally lower for vowels than consonants, though average LL position for /u/, within both language samples, fell within the lower portion of the range of LL positions observed during the consonants.

Data plotted in Fig. 5 also show that the UL pellet was systematically further forward at the steady-state moment for /o u/ than for /i ɛ ɑ/, in both language samples. For the LL pellet, within both speaker samples, sounds and moments associated with the highest positions (e.g., /u/) also exhibited relatively anterior positions. The highest LL position, at t*, was also the most forward. The lowest LL position, at the steady-state moment for /ɑ/, was also the one furthest back. The high, positive correlation coefficients, shown in the same illustration, indicate that average x and y coordinates for the LL pellet were strongly related within language groups.
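The relationship between average LL x and y coordinates, summarized in Fig. 5 by correlation coefficients and (per the footnote on morphology) linear slopes and y intercepts, can be computed with an ordinary least-squares fit. The sketch below is illustrative only: the function name is hypothetical, and the coordinates are synthetic placeholders rather than values read from Fig. 5, which are not tabulated in the text.

```python
import math


def fit_line_and_r(xs, ys):
    """Least-squares slope and intercept of y on x, plus Pearson's r."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Synthetic LL positions (mm): more anterior x values go with higher y values.
ll_x = [1.0, 2.0, 3.0, 4.0]
ll_y = [3.0, 5.0, 7.0, 9.0]
slope, intercept, r = fit_line_and_r(ll_x, ll_y)
```

Because slope and intercept depend on each speaker's lip morphology, a group difference in these parameters does not by itself indicate a difference between languages, as the footnote cautions.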

3.3. Lip-pellet separations during consonants and vowels

Fig. 2 includes the derived history of inter-lip-pellet distance D, shown above the speech wave. The intersections between vertical lines indicating closure and release moments t# and t, and D(t), are indicated by short horizontal lines, to emphasize lip-pellet separation at both moments.

Two facts about D during /p b m/ were notable. The first is that the distance between upper and lower-lip pellet positions was never constant during the closure interval for any of these consonants. Instead, D always decreased for a period after t#, and then began to increase again as t was approached. Across speakers and consonants, the decrease in pellet separation between t# and t* averaged 3.1 mm (σ = 1.3 mm; range 0.7 to 6.6 mm). The decrease in D after t# was about the same for different consonants, but somewhat larger among Japanese than English speakers. Across speakers and consonants, the time to minimum D (i.e., t* − t#) averaged 51 ms (σ = 20 ms; range 13 to 106 ms), and was about the same for different consonants and speaker samples. The amount by which D decreased after closure, and the time to minimum D, were moderately strongly correlated (r = 0.55) across sounds, talkers, and language samples.

A second notable fact about D is that it was usually greater at t than at t# (by 1.5 mm on average, across speakers and consonants). The difference D(t) − D(t#) was positive in 43 of 49 comparisons, ranged between −0.8 and 4.2 mm, and “belonged” largely to a difference in LL pellet position.

TABLE III. Pellet separations (in mm) by speaker, for isolated vowels at the steady-state moment t. The column labeled AvDA, described in the text, refers to average pellet separation at t#, computed across /p b m/, for each speaker

Speaker   /i/     /ɛ/     /ɑ/     /o/     /u/     AvDA
E29       25.2    34      36.2    22.6    18.6    16.9
E31       26.5    30.5    38.2    22.1    20.5    15.6
E34       30.6    33.6    35.5    22.9    21.4    18.4
E35       22.4    23.9    31.4    22.2    20.6    15.5
E41       31.8    36.2    40.1    30.2    30.1    19.4
E44       24.9    28.2    27.9    21.2    21.9    17.5
E53       30.6    38.1    42.2    34.5    25.3    20.2
E54       20.6    23.1    24.8    20.5    17.8    15.5
E59       25.5    28      27      20.9    19.7    18.5
E61       23.2    26      30.9    23.2    20.3    20
J1        35.1    35.5    37.9    33.2    27.4    22.8
J2        33.8    36.6    41      29.3    25.5    20.8
J3        26.3    32.3    33.4    30      25.9    22.2
J4        38.8    39.6    48.8    43      29.6    20.2
J5        31.7    36.4    35.7    26      26.5    23.8
J6        24.7    27.5    30.3    29.9    23      18.5
J7        24.7    34      36.6    34      32.8    25

Lip-pellet separations at the vowel steady-state moment are shown for each speaker in Table III. D was smallest at that moment for /u/ produced by 15 of 17 talkers, and largest for /ɑ/ for 14 of 17 talkers. In general, D was about 5 mm greater among the Japanese than English speakers, at the steady-state moment for each of the five vowels. A composite measure of pellet separation at t# (hereafter, AvDA), averaged across /p b m/ for each speaker, is indicated in the rightmost column. The significance of this measure will be discussed in a subsequent section.
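The statement that D was roughly 5 mm greater for the Japanese than the English speakers can be checked with simple arithmetic on the Table III values. The sketch below copies the vowel columns from the table (AvDA is omitted); the dictionary layout, ASCII vowel keys, and function names are conveniences introduced here, and the column assignment assumes the order /i/, /ɛ/, /ɑ/, /o/, /u/.

```python
# Pellet separations (mm) at the vowel steady state, from Table III.
# Tuple order: /i/, /ɛ/, /ɑ/, /o/, /u/.
TABLE_III = {
    "E29": (25.2, 34.0, 36.2, 22.6, 18.6),
    "E31": (26.5, 30.5, 38.2, 22.1, 20.5),
    "E34": (30.6, 33.6, 35.5, 22.9, 21.4),
    "E35": (22.4, 23.9, 31.4, 22.2, 20.6),
    "E41": (31.8, 36.2, 40.1, 30.2, 30.1),
    "E44": (24.9, 28.2, 27.9, 21.2, 21.9),
    "E53": (30.6, 38.1, 42.2, 34.5, 25.3),
    "E54": (20.6, 23.1, 24.8, 20.5, 17.8),
    "E59": (25.5, 28.0, 27.0, 20.9, 19.7),
    "E61": (23.2, 26.0, 30.9, 23.2, 20.3),
    "J1": (35.1, 35.5, 37.9, 33.2, 27.4),
    "J2": (33.8, 36.6, 41.0, 29.3, 25.5),
    "J3": (26.3, 32.3, 33.4, 30.0, 25.9),
    "J4": (38.8, 39.6, 48.8, 43.0, 29.6),
    "J5": (31.7, 36.4, 35.7, 26.0, 26.5),
    "J6": (24.7, 27.5, 30.3, 29.9, 23.0),
    "J7": (24.7, 34.0, 36.6, 34.0, 32.8),
}

# ASCII stand-ins for /i ɛ ɑ o u/ (a naming choice of this sketch).
VOWELS = ("i", "E", "A", "o", "u")


def group_means(prefix):
    """Mean separation per vowel for speakers whose ID starts with prefix."""
    rows = [vals for spk, vals in TABLE_III.items() if spk.startswith(prefix)]
    return {v: sum(row[k] for row in rows) / len(rows)
            for k, v in enumerate(VOWELS)}


def japanese_minus_english():
    """Per-vowel difference between group means (mm)."""
    english, japanese = group_means("E"), group_means("J")
    return {v: japanese[v] - english[v] for v in VOWELS}


differences = japanese_minus_english()
mean_difference = sum(differences.values()) / len(differences)
```

On these values the per-vowel differences range from a little over 4 mm to about 8 mm, with an average near 5.4 mm.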

4. Discussion

4.1. Inferences about articulatory dynamics of labial gestures

Patterns among the positions and motions of lip pellets, accompanying vowels and consonants described in this report, prompt qualitative insights and interesting specula- tions about the dynamics of labial gestures. For example, the small but systematic inflections in pellet trajectories at about t# for most instances of intervocalic /p b m/ suggest that the forces bringing the lips together are large enough, and timed in such a way as to deform the lips’ shape when they contact one another (cf. Fujimura, 1990). This straightforward inference accounts for two facts established by the data. The first relates to changes in the inter-lip-pellet distance D that always occur after t#.D will decrease between t# and t * if the lips compress as their own inertial forces, and diminishing muscular (closing) forces, dissipate into the soft tissue (cf. Folkins & Abbs, 1975). Conversely, D will increase between t * and t as ‘‘opening’’ forces accelerate the lips apart from a compressed state. The fact that D is systematically larger at t than t# may be due to a vertical stretch of one or both lips as they are drawn apart, deformed by opening forces and/or ‘‘held together’’ by tension generated between their contacting surfaces (cf. Schulman, 1989). It is probably significant that the UL pellet often seemed to follow the LL pellet downward during the release gesture after t * (cf. the ULy history in Fig. 2), as the LL pellet moved down from its own local, mid-closure extremum. A second fact likely due to deformation of the lips’ shapes relates to changes in the direction of pellet motion associated with inflections in lip-pellet trajectories just after t#. Total lip volume must remain constant. Pressing the lips together should cause them to deform either laterally (normal to the sagittal plane, which cannot be seen in microbeam data), and/or horizontally, in the x dimension of the sagittal plane. 
In this horizontal dimension, the lips cannot distend rearward, since both are bounded behind by buccal surfaces of the teeth. Thus, any horizontal deformation should occur in the space in front of the lips, and would appear as small ‘‘protrusive’’ movements, like those that can be seen during the [t#,t *] interval in the x-coordinate histories for UL and LL Fig. 2. The fact that inflections in UL and LL trajectories routinely occurred at about t#, where this moment was judged independently from spectral changes in the acoustic wave (e.g., changes involving rapid reductions in energy across the spectrum), supports the idea that the spectral changes themselves reflect discontinuities in lip kinematics. It is curious that compression during the [t#,t *] interval seems to be largely restricted to the lower lip. The position of the UL pellet at t * (cf. Figs. 2 and 5) is not much different than at other moments during consonantal closures. In contrast, the position of the LL pellet at t * is usually upward and forward of its position at t#. This difference in position of the two pellets suggests that the intrinsic mechanical properties of the lower lip, and/or the control forces applied to its position, may be significantly different from those affecting the upper lip. Recent comments by Honda, Kurita, Kakita & Maeda (1995) suggest a basis for such a difference. The timing of local maxima in UL and LL speed histories, relative to t# and t (cf. Fig. 3), provide information about the timing of forces affecting the lips. The data show, for example, that local maxima in pellet speeds before t# occurred within a narrow ! temporal window, at most 50 ms wide, and centered about 20 ms with respect to t#.In short, lip pellets do not begin decelerating until very shortly before labial closure. 
The relatively tight “temporal coupling” between deceleration and t# suggests that deceleration may be driven more by increasing contact than by any muscular forces which could slow the lips’ approach. From this point of view, lip movements toward closure for /p b m/ are probably not delicate acts. Instead, speakers can accelerate the lips toward one another relatively coarsely, and take advantage of the fact that their collision and compression will create closure. Restoring forces arising from compression may even contribute toward release. At some time after t#, lip pellet speeds would be expected to increase, as restoring forces arising from compression, assisted by muscular “opening” forces, accelerate the lips apart. It is interesting to learn that the release acceleration ends, on average, no more than 25 ms after t.

A negative result in the current data that may refine some views about labial gesture dynamics is that no systematic differences were observed among pellet movements for /p b m/. Across talkers, pellet trajectories for the three consonants did not differ in extent or shape. Moreover, the timing and magnitude of local maxima in approach and release speeds, averaged across talkers, were indistinguishable by consonant. The strong similarity in lip movements for /p b m/ is consistent with observations by Browman & Goldstein (1986, p. 233), but contrary to a report by Fujimura (1961), based upon an analysis of high-speed (240 frames/s) stroboscopic data recorded from one talker, and to a report by Sussman et al. (1973), based upon strain-gauge data from five talkers. Results from Fujimura’s study, for example, revealed differences in lip configurations just before the release moment, and damped oscillations after release, for /p b/ but not /m/. Fujimura (op. cit., p. 236) attributed these differences to an “overpressure behind the obstruction” for /p b/, and suggested that this pressure represents a significant mechanism for effecting consonantal release. Sussman et al. (1973) also proposed an aerodynamic argument to account for sound-related differences in their electromyographic and kinematic data. The fact that no differences in fleshpoint kinematics among /p b m/ were found in this study casts doubt on the influence of intra-oral pressure, for the current sample of talkers and speech tasks. It is important to remember, however, that comparisons of the current results with those from other studies are complicated by differences in transduction and sampling methods.
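
The speed-history reasoning in this section (locating the last local maximum in pellet speed before the closure moment t#) can be illustrated with a minimal sketch. It is not the procedure used in the study: the function names, the central-difference estimate of velocity, the 10 ms sampling interval, and the synthetic lower-lip trajectory are all assumptions made for illustration.

```python
import math

def speed_history(xs, ys, dt_ms):
    """Pellet speed (mm/ms) at interior samples, by central differences.

    xs, ys are sagittal-plane pellet coordinates in mm, sampled every dt_ms.
    The returned list has len(xs) - 2 entries; entry k belongs to sample k + 1.
    """
    speeds = []
    for i in range(1, len(xs) - 1):
        vx = (xs[i + 1] - xs[i - 1]) / (2 * dt_ms)
        vy = (ys[i + 1] - ys[i - 1]) / (2 * dt_ms)
        speeds.append(math.hypot(vx, vy))
    return speeds

def last_peak_offset(speeds, dt_ms, t_closure_ms):
    """Time (ms) of the last local speed maximum before t_closure_ms,
    expressed relative to t_closure_ms (negative = before closure).
    Returns None if no local maximum precedes closure."""
    offset = None
    for k in range(1, len(speeds) - 1):
        t = (k + 1) * dt_ms  # speeds[k] belongs to sample k + 1
        if t >= t_closure_ms:
            break
        if speeds[k] > speeds[k - 1] and speeds[k] >= speeds[k + 1]:
            offset = t - t_closure_ms
    return offset

# Hypothetical lower-lip pellet rising toward closure (values are invented).
DT_MS = 10.0                       # assumed sampling interval
ll_x = [0.0] * 9                   # no horizontal motion in this toy example
ll_y = [0.0, 1.0, 3.0, 7.0, 10.0, 11.0, 11.5, 11.5, 11.5]
T_CLOSURE_MS = 50.0                # assumed acoustically defined closure time

ll_speed = speed_history(ll_x, ll_y, DT_MS)
peak_offset = last_peak_offset(ll_speed, DT_MS, T_CLOSURE_MS)
print(peak_offset)
```

In this toy trajectory the pellet is still accelerating until shortly before the assumed closure time, which is the pattern the discussion attributes to contact-driven deceleration. Real data would normally be smoothed before differentiation; that step is omitted here.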

4.2. Implications of pellet positions and separations for lip openings during vowels

Speakers use the lips in specific and distinctive ways to create differences between some speech sounds. In English, the rounded vowels /o u/ are accompanied by protrusion of the lips (Perkell, 1969; Linker, 1982), and narrowing of the height and width of the opening between them (Fromkin, 1964). The relatively narrow average separations and protruded positions of lip pellets at t for /o u/, shown in the left half of Fig. 5 for English speakers, are expected. Among Japanese speakers, however, comparable pellet positions for /u/ and /o/, and especially the protruded position of the UL pellet for /u/, shown in the right half of Fig. 5, are surprising. In descriptive phonetic accounts (cf. Vance, 1987, p. 10), Japanese /o/ is often said to be the only vowel of the language “that involves active lip rounding,” while /u/ “is commonly described as unrounded.” Hattori (1950), for example, generally considered the labial feature of Japanese Tokyo-dialect /u/ to be “spread” (with corners of the mouth retracted). The fact that Japanese speakers described in this report were bilingual in English may account for their unexpected lip protrusions during /u/. An alternate view is that their /u/ protrusion reflects hyper-articulation associated with laboratory speech.

Considered together, position data for the two pellets and speaker groups imply that rounding/protruding gestures for vowels, and closing gestures for consonants, may differ in kind for the upper lip, but not for the lower lip. Data summarized graphically in Fig. 5 show, for example, that the UL pellet was lowered for labial consonant closures without simply being pushed further forward, and moved forward for rounded vowels without simply being lowered. In contrast, LL pellet positions generally fell along a single line, with forward positions also higher and rearward positions also lower. Thus, these data suggest that the kinematic working space of the middle section of the lower lip, assisted by any contribution from the lower jaw on which it rides, may be chiefly one-dimensional. Macchi (1988) reached a similar conclusion about the lower lip, from additional data from two other speakers.

The distances separating UL and LL positions at different measurement times provide useful hints about the size of the lip opening, though data summarized in Fig. 5 suggest caution in using inter-pellet distances to infer lip surface separation. Note, for example, that there is no one inter-lip-pellet distance that corresponds to labial closure. Any of several inter-pellet distances can be taken to indicate that the lips are closed. One implication of this fact is that there is no unambiguous way to estimate mid-line surface separation between the lips when they are apart (e.g., during a vowel), merely by subtracting some closure-related reference value from the distance between the pellets at some other moment of interest. The composite measure of pellet separation at t#, AvDA, included in the rightmost column of Table III, is one of several “candidate” reference values for representing the combined mid-line thickness of the lips when the distance between them is zero.
Other candidate values that could be considered include some index of inter-pellet separation at t * (probably too small), or at t (probably too large). Working estimates of mid-line lip separation for /i 2"o u/, derived by subtracting AvDA from the observed pellet separation at t for each talker and vowel, averaged 9.0, 12.5, 15.7, 7.7 and 4.5 mm, respectively, across the sample of 17 talkers described in Table III. These estimates are broadly in line with direct mid-line surface-separation measurements reported by Fromkin (1964). This fact suggests that plausible separation estimates from lip pellet data are possible, provided that a reference value corresponding to labial closure is carefully selected.

Interpreting pellet-position data in terms that are “larger” than the pellets themselves, e.g., in terms that refer either (1) to positions of the lips as a whole, the tongue, soft palate, or jaw, or (2) to the geometry of constrictions formed between these mobile articulators and vocal tract boundaries, is a problem common to all point-parameterized speech kinematic data. Solving even the first of these interpretative problems, for deformable bodies like the tongue, soft palate, and lips, whose anatomic partitions are not completely understood, and which may in fact not be discontinuous, is difficult (cf. Fujimura, 1990). It is possible that very many measurement points, supplemented by high-resolution images, are necessary for general descriptions of soft body motions. However, multiple, closely-spaced measurement points are impractical for techniques such as the X-ray microbeam method, or electromagnetic articulometry. Highly accurate data supplied by these techniques must be supplemented by other information if we wish to extrapolate from fleshpoints to articulators, or to time-variations in vocal tract constrictions.
Certain fine details about labial fleshpoint motions that accompany intervocalic consonants and isolated vowels, produced by different speakers, enhance an appreciation of the interpretative problems posed by point-parameterized data, and expose some of the limitations we must bear in mind when we try to use such data to understand the actions and control of the lips during speech.
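
The reference-subtraction estimate of mid-line lip separation discussed in Section 4.2 can be sketched as follows. This is an illustrative sketch only. The function names and all coordinates are hypothetical, the closure reference is assumed here to be a simple mean of inter-pellet distances at closure onset (the exact composition of AvDA is given in Table III and is not reproduced), and clamping negative estimates to zero is a choice made for this example rather than something stated in the text.

```python
import math

def inter_pellet_distance(ul, ll):
    """Straight-line distance (mm) between UL and LL pellets, each given
    as an (x, y) pair of sagittal-plane coordinates."""
    return math.hypot(ul[0] - ll[0], ul[1] - ll[1])

def closure_reference(closure_samples):
    """Mean inter-pellet distance over (UL, LL) pairs measured at closure
    onset. Assumed stand-in for a composite reference such as AvDA."""
    distances = [inter_pellet_distance(ul, ll) for ul, ll in closure_samples]
    return sum(distances) / len(distances)

def estimated_separation(ul, ll, reference):
    """Estimated mid-line surface separation (mm): inter-pellet distance
    minus the closure reference, clamped at zero (assumption)."""
    return max(0.0, inter_pellet_distance(ul, ll) - reference)

# Invented pellet positions (mm) at closure onset for three consonant tokens.
closures = [
    ((0.0, 10.0), (0.0, 0.0)),
    ((0.0, 12.0), (0.0, 0.0)),
    ((0.0, 11.0), (0.0, 0.0)),
]
reference = closure_reference(closures)

# Invented pellet positions (mm) at the measurement time in a vowel.
vowel_ul, vowel_ll = (0.0, 20.0), (0.0, 0.0)
separation = estimated_separation(vowel_ul, vowel_ll, reference)
print(reference, separation)
```

Because the three closure tokens in this example already differ by 2 mm in inter-pellet distance, the choice of reference shifts the vowel estimate by a comparable amount, which is the ambiguity the text warns about.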

Research support was provided by USPHS Grant DC00820, and a collaborative research agreement between the University of Wisconsin-Madison and ATR Human Information Processing Research Laboratories, Kyoto, Japan. Constructive comments from Jim Dembowski, Kiyoshi Honda, Osamu Fujimura, Patrice Beddor, and an anonymous reviewer are gratefully acknowledged.

References

Abry, C. & Boë, L. (1986) “Laws” for lips, Speech Communications, 5, 97–104
Browman, C. P. & Goldstein, L. M. (1986) Towards an articulatory phonology, Phonology Yearbook, 3, 219–252
Boyce, S. (1990) Coarticulatory organization for lip rounding in Turkish and English, Journal of the Acoustical Society of America, 88, 2584–2595
Folkins, J. & Abbs, J. (1975) Lip and jaw motor control during speech: responses to resistive loading of the lower jaw, Journal of Speech and Hearing Research, 18, 207–220
Fromkin, V. (1964) Lip positions in American English vowels, Language and Speech, 7, 215–225
Fujimura, O. (1961) Bilabial stop and nasal consonants: a motion picture study and its acoustical implications, Journal of Speech and Hearing Research, 4, 233–247
Fujimura, O. (1990) Articulatory perspectives of speech organization. In Speech Production and Speech Modeling (W. J. Hardcastle & A. Marchal, eds) Kluwer Academic Publishers: Dordrecht, The Netherlands, 324–342
Hattori, S. (1950) Onseigaku. Tokyo, Japan, Iwanami Shoten
Hashi, M., Westbury, J. & Honda, K. (1994) Articulatory and acoustic variability of vowels in Japanese and English, Journal of the Acoustical Society of America, 95, 2820
Honda, K., Kurita, T., Kakita, Y. & Maeda, S. (1995) Physiology of the lips and modeling of lip gestures, Journal of Phonetics, 23, 243–254
Houde, R. A. (1968) A study of tongue body motion during selected speech sounds. Speech Communications Research Laboratory Monograph No. 2: Santa Barbara, California
Linker, W. (1982) Articulatory and acoustic correlates of labial activity in vowels: a cross-linguistic study, UCLA Working Papers in Phonetics, 56
Macchi, M. (1988) Labial articulation patterns associated with segmental features and syllable structure in English, Phonetica, 45, 109–121
Milenkovic, P. H. & Read, C. (1992) Cspeech Version 4 User’s Manual. Madison, WI
Perkell, J. S. (1969) Physiology of speech production. MIT Press: Cambridge, MA
Perkell, J., Matthies, M., Svirsky, M. & Jordan, M. (1993) Trading relations between tongue-body raising and lip rounding in production of the vowel /u/: a pilot “motor-equivalence” study, Journal of the Acoustical Society of America, 93, 2948–2961
Ramsey, J. O., Munhall, K. G., Gracco, V. L. & Ostry, D. J. (1996) Functional analysis of lip motion, Journal of the Acoustical Society of America, 99, 3718–3727
Schulman, R. (1989) Articulatory dynamics of loud and normal speech, Journal of the Acoustical Society of America, 85, 295–312
Sussman, H., MacNeilage, P. & Hanson, R. (1973) Labial and mandibular dynamics during the production of bilabial consonants: preliminary observations, Journal of Speech and Hearing Research, 16, 397–420
Vance, T. (1987) An introduction to Japanese phonology. Albany, State University of New York
Westbury, J. (1994a) On coordinate systems and the representation of articulatory movements, Journal of the Acoustical Society of America, 95, 2271–2273
Westbury, J. (1994b) X-ray microbeam speech production database user’s handbook. Madison, WI
