A Kinematic Analysis of Prosodic Structure in Speech and Manual Gestures

A Kinematic Analysis of Prosodic Structure in Speech and Manual Gestures

A KINEMATIC ANALYSIS OF PROSODIC STRUCTURE IN SPEECH AND MANUAL GESTURES Jelena Krivokapic,1, 2 Mark K. Tiede, 2 Martha E. Tyrone 2,3 1University of Michigan, 2Haskins Laboratories, 3Long Island University Brooklyn [email protected], [email protected], [email protected] ABSTRACT and body gesturing is not well understood, however. The first goal of this study is to address some basic Two experiments examining the effects of prosodic questions regarding this relationship. We present structure on the kinematic properties of speech and two experiments. Experiment 1 examines the effect manual gestures are presented. Experiment 1 of prosodic boundaries, stress, and their interaction investigated the effects of prosodic boundaries, on oral, intonation and manual gestures. Experiment stress and their interaction on manual, oral, and 2 examines the effects of varying degrees of intonation gestures (their duration and coordination). prominence (deaccented, narrow focus, broad focus, Experiment 2 investigated the effects of different contrastive focus) on these gestures. Working within types of prominence (deaccented, narrow focus, the framework of Articulatory Phonology ([3 ff.]), broad focus, contrastive focus) on oral constriction, these two experiments allow us to examine 1) the intonation and manual gestures (duration and effect of prosodic structure (boundaries and coordination). We recorded speech audio, vocal tract prominence) on the duration of vocal tract gestures gestures (using electromagnetic articulometry) and and manual gesturing (specifically pointing gestures) manual movements (using motion capture). To the and 2) the coordination of oral gestures, manual best of our knowledge, this is the first study to gestures and intonation gestures, and how prosodic examine kinematic properties of body movement structure affects this coordination. We test the and vocal tract gestures concurrently. Preliminary hypothesis that prosodic control extends beyond the results focus on the effects of prosodic structure on vocal tract ([21, 26]). Under this hypothesis, for gesture duration and show that 1) manual and oral question 1, pointing movements are expected to be gestures are longer phrase-initially than phrase- affected by prosodic structure and to show phrase- medially and 2) manual and oral gestures lengthen initial temporal lengthening which increases with under phrase-level prominence. [Supported by NIH]. boundary strength, in parallel with oral constriction gestures. The effects of prominence are less clear, Keywords: prosody; speech production; manual but we expect that overall there will be an increase gestures; articulatory gestures; prosodic lengthening. in gesturing duration from the deaccented condition to contrastive focus (as has been observed in [9, 12, 1. INTRODUCTION 19]). This question is the focus of the current study. Further analyses to test the second question are Prosodic structure is marked by temporal and tonal ongoing. The prediction is that manual gestures are properties. At boundaries, acoustic segments become coordinated with intonation and/or oral constriction longer and articulatory gestures become larger, gestures in-phase or anti-phase, but that they will not longer, and less overlapped (e.g., [4, 6, 15, 25]). The affect how vowel and consonant gestures are effects are graded, such that higher boundaries show coordinated with each other, since manual gestures more lengthening (e.g., [4, 6, 8, 25]). The overall are not part of the lexical representation of the word effects of prominence are less well understood, but ([16, 20]). prominent syllables also exhibit lengthening (e.g., To address these questions, kinematic data are [7, 10, 13, 19]). Tonally, prosodic structure is needed for both vocal tract gestures and manual marked by rising and falling pitch on prominent gesturing. For this purpose, we recorded speech and syllables and phrase-finally. body movement concurrently, using electromagnetic In addition to acoustic and articulatory articulometry (EMA) and motion capture. To the manifestations of prosodic structure, there is best of our knowledge, this is the first study to evidence of a relationship between prosodic record the kinematics of vocal tract gestures and structure and body movement (gesturing). Thus body movements simultaneously; accordingly, a gesturing is timed to prominent syllables (e.g., [11, second goal of the study is to demonstrate the 18, 21, 22, 23, 26]), and this timing can be feasibility of the data collection method. influenced by prosodic boundaries ([14]). The exact nature of the relationship between prosodic structure 2. METHODS Table 2: Context questions for Experiment 2. 2.1. Stimuli and participants Condition Context deaccented Does Lenny want to see Bob? The Experiment 1 stimuli consisted of six sentences broad What is going on? varying the phrase-initial boundary (word boundary, narrow Who does Anna want to see? ip—intermediate phrase boundary, IP—Intonation contrastive Does Anna want to see Mary? Phrase boundary) and stress position (the first or second syllable of disyllabic target words). To keep the segments in the two stress conditions identical, 2.2. Data collection two nonce words were used (the names MIma and miMA, with stress on the first and on the second The audio signal, gestures of the vocal tract, and syllable, respectively). The sentences were read body movement were recorded concurrently. Vocal from a computer screen twelve times for a total of tract gestures were recorded using an 72 sentences (3 boundary x 2 stress x 12 repetitions). electromagnetic articulometer (EMA; WAVE, The sentences were semi-randomized in blocks of Northern Digital), at 100Hz. These data were six sentences. Table 1 lists stimuli for the target collected synchronously with the audio signal, which word MIma (stress on the first syllable). The was sampled at 22025Hz ([2]). Body movement was sentences for the condition with stress on the second acquired separately using a motion capture system syllable were identical except that the target word (Vicon; Oxford, UK) which includes 6 infrared was miMA. Participants were asked to point to the sensitive cameras and a visible-light camera, appropriate picture of a doll (named either miMA or supporting collection of 3D movement data MIma) while reading the target word. synchronized with video both at a sampling rate of 100Hz ([24]). EMA sensors were placed on the Table 1: Stimuli for Experiment 1 for the target tongue tip, tongue body, tongue dorsum (see Figure word MIma. The boundary is before MIma. 1), on the lower incisors (for jaw movement) and three reference sensors were placed on upper Cond Sentence incisors, and left and right mastoid processes (to ition correct for head movement). Twenty-five motion 1. There are other things. I saw MIma being capture markers were placed on the lips, eyebrows, word stolen in broad daylight by a cop. face, arms and hands, including one marker on each 2. ip Mary would like to see Shaw, MIma, index finger, taped to the participants’ skin or to Beebee, and Ann while she is here. their clothes (Figure 1). In post-processing, data 3. IP There are other things I saw. MIma being from the concurrently recorded audio, EMA, Vicon stolen was the most surprising one. and video streams were temporally aligned through cross-correlation of head movement reference data, Experiment 2 consisted of four sentences that and trajectories of head-mounted sensors and varied the type of prominence on the target word markers were converted to a coordinate system (deaccented, narrow focus, broad focus, contrastive centered on the upper incisors and aligned with the focus). They were repeated twelve times, for a total speaker’s occlusal plane. Importantly, the EMA and of 48 sentences (4 conditions x 12 repetitions). The motion capture systems allow for unrestricted sentences were semi-randomized in blocks of four movement, which is necessary for body gesturing sentences. The target word in all sentences was Bob. and a natural conversational setting. The sentence was always “Anna wants to see Bob. In the morning if possible”. To elicit the appropriate Figure 1: Gesture tracking. EMA sensors on the prominence, the sentence was placed in a question- left and motion capture markers on the right. answer context (the questions are given in Table 2; see [19] for a similar procedure). Participants were asked to point to the picture representing Bob while reading the target word. The data collected in these experiments are part of a larger study. Two native speakers of American English participated; they were paid for their participation and naïve as to the study’s purpose. Participants were seated with a clear view of a and closing movement. Focusing here on the effects monitor (for stimulus presentation) and a of prosodic structure on gesture duration, the confederate co-speaker (Figure 2). During the following variables are of interest for the LA and experiment, sentences appeared on the monitor; pointing gestures: 1) duration of the constriction participants read them and pointed with their closing movement (from onset to maximum dominant index finger at pictures as they read the constriction), 2) duration of the constriction opening associated target words. A green paper dot was movement (from maximum constriction to gesture affixed close to the participant’s knee to serve as the offset). For the pointing gesture, the closing resting position for the pointing finger (similar to movement is the movement towards

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    5 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us