
Phonetic Transcription

What is phonetic transcription?
• Sequence of symbols representing the successive consonants and vowels in a stretch of speech.
• What are consonants and vowels?
  • Sets of gestures that can potentially distinguish words from one another in a language.
• So a phonetic transcription is a representation of the sequence of gestures that compose an utterance.

Transcription Types
• Broad Phonemic
  • Each phoneme is a symbol for a contrastive gesture or a set (combination) of gestures.
  • The order of phonemes symbolizes contrastive aspects of gestural sequencing.
  • Two utterances that differ in at least one phoneme symbol or in one ordering are contrastive: if they differ only in that one symbol, then they are a minimal set (minimal pair), e.g., /pæt/, /bæt/.
  • Represented in slashes, e.g., /pæt/.
• Narrow Phonetic
  • Annotates non-contrastive details of the gestural sequence.
  • Two transcriptions that differ in at least one symbol represent utterances that may or may not contrast.
  • Represented in square brackets, e.g., [pʰæt].
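The minimal-pair criterion above can be sketched in a few lines of Python. This is only an illustration: the function name `is_minimal_pair` and the list-of-symbols representation are assumptions made for the example, not part of the lecture.

```python
def is_minimal_pair(a, b):
    """Return True if two phonemic transcriptions, given as lists of
    phoneme symbols, have the same length and differ in exactly one position."""
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


# Broad (phonemic) transcriptions as lists of symbols.
pat = ["p", "æ", "t"]
bat = ["b", "æ", "t"]
tap = ["t", "æ", "p"]

print(is_minimal_pair(pat, bat))  # True
print(is_minimal_pair(pat, tap))  # False (two symbols differ)
```

Note that /pæt/ and /tæp/ are still contrastive (they differ in ordering); they are simply not a minimal pair under this check.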

IPA symbols

Consonant phonemes: gesture composition
(constrictors: lips, tip, body, velum, glottis)

p    lips: bilabial stop; glottis: opening
b    lips: bilabial stop
m    lips: bilabial stop; velum: opening
t    tip: alveolar stop; glottis: opening
d    tip: alveolar stop
n    tip: alveolar stop; velum: opening
k    body: velar stop; glottis: opening
g    body: velar stop
ŋ    body: velar stop; velum: opening
f    lips: labiodental fricative; glottis: opening
v    lips: labiodental fricative
θ    tip: dental fricative; glottis: opening
ð    tip: dental fricative
s    tip: alveolar fricative; glottis: opening
z    tip: alveolar fricative
ʃ    tip: palatoalveolar fricative; glottis: opening
ʒ    tip: palatoalveolar fricative
l    tip: alveolar stop; body: uvular
ɹ    lips: bilabial approximant; tip: palatoalveolar approximant; body: pharyngeal approximant
w    lips: bilabial approximant; body: uvular approximant
j    body: palatal approximant
h    glottis: opening
tʃ   tip: alveolar stop+fricative; glottis: opening
dʒ   tip: alveolar stop+fricative

Consonant Chart

Vowels
• A single symbol is used for distinctive combinations of tongue and lip gestures for vowels.
  • e.g., /but/
• Diphthongs have two symbols: one for each tongue gesture or tongue-lip combination.
  • e.g., /baɪt/
• Diphthongs can be treated as single phonemes (like /tʃ/, /dʒ/ among the consonants).

Vowel Chart
[Figure: vowel chart, with American and British vowels marked]

Narrow Phonetic Transcription
• Annotates non-contrastive details of the gestural sequence.
• What kind of details?
• Free Variation
  • The variants are not distinctive; they cannot differentiate words.
  • e.g., presence vs. absence of a release gesture at the end of a word:
    • [mætʰ] (TT release gesture) vs. [mæt˺] (no TT release gesture)
• Allophonic Variation
  • Allophones of the same phoneme are determined by context.
  • The allophones are in complementary distribution: they appear in distinct sets of contexts, so they cannot create a minimal pair.
  • [bæ̃n] (nasalized vowel) vs. [bæd] (oral vowel)

Source of Allophony in Time
• Gestural overlap
  • Gestures have intrinsic duration in time during which the constriction motor task is carried out.
  • The time course of gestures constituting a sequence of segments may overlap.
  • Allophonic variation can result from the overlap of a gesture task with different neighboring gestures and the resulting articulatory and acoustic consequences.
• Gesture reduction
  • In certain contexts, the amount of time a motor task is allowed to be active may be too short for it to reach its goal: the target state is not achieved.

Organization of gestures in time
• The consonant and vowel gestures that form a word are each active for a fixed interval in time, which may vary for different gestures.
• The multiple gestures associated with a given consonant are not necessarily synchronous with each other.
• Speech is not a sequence of synchronous gesture bundles.
• How do we discover the timing patterns (temporal structure) of gestures?
[Figure: gestural scores for “mad” and “ban” along a time axis, with labels VEL wide, TT clo, TB wide pharyngeal, LIPS clo]

Finding Gestures in Time
• To find when a gesture is active in time, examine the movements of the constricting device that forms the constriction for that gesture.
• When it begins to move towards the gesture’s constriction target, this is the moment of gestural activation.
• When it begins to move away from the gesture’s constriction target, this is the moment of gestural deactivation.

Gesture Activation Times
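The activation and deactivation criteria from "Finding Gestures in Time" can be sketched as a simple scan over a constrictor trajectory. This is a minimal illustration under stated assumptions: the trajectory is a list of aperture samples (smaller means closer to the constriction target), the movement threshold is an arbitrary value, and the function name `find_activation` and the sample data are hypothetical.

```python
def find_activation(times, aperture, threshold=0.5):
    """Return (activation_time, deactivation_time) for one closing gesture.

    aperture: constriction aperture samples (smaller = closer to target).
    threshold: smallest change per sample counted as movement (assumed value).
    """
    activation = None
    deactivation = None
    for i in range(1, len(aperture)):
        change = aperture[i] - aperture[i - 1]
        if activation is None and change < -threshold:
            # The constrictor begins to move towards the target.
            activation = times[i - 1]
        elif activation is not None and change > threshold:
            # The constrictor begins to move away from the target.
            deactivation = times[i - 1]
            break
    return activation, deactivation


# Made-up tongue tip aperture samples on an arbitrary time axis.
times = [400, 450, 500, 550, 600, 650, 700, 750]
tt_aperture = [10.0, 10.0, 7.0, 3.0, 0.5, 0.5, 4.0, 9.0]

print(find_activation(times, tt_aperture))  # (450, 650)
```

Real movement data are noisy, so a practical version would smooth the trajectory and use a velocity criterion rather than raw sample differences; the sketch only shows the logic of the two criteria.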

[Figure: “two back”. Panels: AUDIO, U Lip, L Lip, LA, TT, TTCD, TB, TBCD. Time axis from 400 to 1200.]

Gestures for initial C and V begin at roughly the same time

Coordination of gestures in Time

• Gestures do not float freely but are coordinated to one another in specific ways.
• Organization of gestures into segments is not a good predictor of coordination type.
• Gestures belonging to a sequence of segments may be coordinated synchronously (e.g., initial consonant and vowel).
• Gestures belonging to a single segment may be coordinated sequentially (e.g., velic and oral gestures of a final nasal).

[Figure, from Byrd et al. (2009), Journal of Phonetics 37, 97–110: velic and tongue tip panels for “bow know” (onset) and “bone oh” (coda)]
Fig. 2. Samples showing the marked time points for gesture onset (vertical line 1) and peak (vertical line 2) and the target plateau interval (rectangle). Velum aperture is in the top panel of each figure and tongue tip aperture in the bottom panel for each individual example.
Fig. 3. A sample schematizing the dependent variables of velum aperture displacement (VELDISP) and velum–tongue tip lag (LAG). A negative LAG is shown in this example.
Fig. 4. Pooled results showing mean LAG for onsets, juncture geminates, and codas.

Gestural Scores
• Temporal organization of gestures
• Time along horizontal dimension
• Boxes represent intervals of time during which gestures are active in the vocal tract.
• Gestures of oral constrictors, velum, glottis are displayed on different rows, e.g., “bad”:

• Labels on the boxes indicate the constriction degree (and location) of the gesture.
• Redundant (non-contrastive) gestures are left out to simplify display:
  • Consonant release gestures
  • Glottal narrowing (for voicing)
  • Velum closing (for non-nasal phonemes)

Contrast among gestural scores
• Differences in gestural scores that can count as different words:
  • presence or absence of particular gestures
  • gestures' values of CD and CL
  • qualitative organization of gestures in time

Presence or absence of gestures: compositionality
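Contrast by presence or absence of gestures can be sketched by treating each word's gestural score as a set of (tier, label) pairs. This is an illustration only: the set representation, the `SCORES` table, and the function name are assumptions made for the example, and the labels are borrowed from the figure labels in these slides (LIPS clo, TT clo, TB wide pharyngeal, VEL wide, GLOTTIS wide).

```python
# Each gesture is a (tier, label) pair. Redundant gestures (release,
# glottal narrowing, velum closing) are left out, as in the slides.
VOWEL = ("TB", "wide pharyngeal")  # label taken from the "mad"/"ban" figure

SCORES = {
    "bad": {("LIPS", "clo"), VOWEL, ("TT", "clo")},
    "pad": {("LIPS", "clo"), ("GLOTTIS", "wide"), VOWEL, ("TT", "clo")},
    "ban": {("LIPS", "clo"), VOWEL, ("TT", "clo"), ("VELUM", "wide")},
    "pan": {("LIPS", "clo"), ("GLOTTIS", "wide"), VOWEL, ("TT", "clo"),
            ("VELUM", "wide")},
}


def gesture_difference(word_a, word_b):
    """Gestures present in exactly one of the two scores."""
    return SCORES[word_a] ^ SCORES[word_b]


print(gesture_difference("bad", "pad"))  # {('GLOTTIS', 'wide')}
print(gesture_difference("bad", "ban"))  # {('VELUM', 'wide')}
print(SCORES["pan"] == SCORES["pad"] | SCORES["ban"])  # True
```

A plain set ignores timing, so it cannot express the third kind of contrast (organization in time): “bad” and “dab” would come out identical in this representation.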

[Figure: gestural scores for “bad” vs. “pad”, “ban” vs. “pan”, and “tan” vs. “Ann”]

Contrast: gestures' values of CD and CL
[Figure: gestural scores for “sad” vs. “shad”]

Contrast: organization in time
[Figure: gestural scores for “bad” vs. “dab”]

Allophonic Variation and Overlap
• Allophonic variation that is due to how gestural activations overlap is implicit in the gestural score.
  • Aspiration of word-initial stops
  • Nasalization of vowels before final nasal consonants

Aspiration of initial voiceless stops

[Figure: gestural scores with labels “palatal wide” and “opening”]

Allophony: voiceless stops and clusters
• Voiceless stops are aspirated when they are single word-initial consonants.
• Approximants are at least partially voiceless following initial voiceless stops.
• Voiceless stops are unaspirated following /s/ at the beginning of a word.
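The three statements above can be sketched as a small rewrite function from a phonemic onset to a narrow one. This is a simplified illustration, not the lecture's own formalism: the function name `narrow_onset`, the segment inventories, and the list-of-symbols input are assumptions made for the example.

```python
VOICELESS_STOPS = {"p", "t", "k"}
APPROXIMANTS = {"l", "ɹ", "w", "j"}   # assumed inventory for this sketch
VOICELESS_MARK = "\u0325"             # combining ring below (voiceless diacritic)


def narrow_onset(phonemes):
    """Apply the word-initial voiceless stop allophony to one word.

    phonemes: list of phoneme symbols, e.g. ["p", "eɪ", "d"].
    Only these rules are modeled; all other allophony is ignored.
    """
    result = list(phonemes)
    if not result or result[0] not in VOICELESS_STOPS:
        # Covers /#sp.../: the stop is not word-initial, so nothing is added.
        return result
    if len(result) > 1 and result[1] in APPROXIMANTS:
        # Approximant after an initial voiceless stop: marked voiceless.
        result[1] = result[1] + VOICELESS_MARK
    else:
        # Single word-initial voiceless stop: aspirated.
        result[0] = result[0] + "ʰ"
    return result


print(narrow_onset(["p", "eɪ", "d"]))       # ['pʰ', 'eɪ', 'd']
print(narrow_onset(["s", "p", "eɪ", "d"]))  # ['s', 'p', 'eɪ', 'd']
```

In gestural terms, all three outputs reflect a single glottal opening gesture in the onset; the symbolic rules here only mimic its consequences.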

Principle 1: Glottal Gestures in onset
English allows only one glottal opening gesture in onset.

Aspiration in /#p…/, but not /#sp…/
/p eɪ d/     /s p eɪ d/
[pʰ eɪ d]    [s p eɪ d]

Allophonic Voicing Variation in English
"paid"      "spade"
"prayed"    "sprayed"

Back to Allophonic Variation

French
• Voiceless stops are always unaspirated:
  • peine “pain”
  • pleine “full”
  • spa “spa”
  • splendide “splendid”
• Glottal opening gesture is shorter in French: equal in duration to closure.
[Figure: English “two” vs. French “tous”]

English: Nasalization of vowels before nasals

[Figure: gestural scores with labels “palatal wide” and “opening”]

Gestural Score
• “bun”: velum lowering gesture for final nasal precedes oral constriction.
[Figure: rtMRI frames, “Yvonne” (v ɑ n)]

English vs French
• In French, the velum gesture for a final nasal is synchronous with the oral constriction.
[Figure: rtMRI frames, “bun” vs. “bonne” (ɑ n, p ɑ n)]

Captions and observations from the rtMRI study shown on these slides:
Figure 1: French nasal vowel production: pan [pɑ̃] ‘pane’. Frame 288: initial labial stop; f291: tongue body retraction and initiation of velum lowering; f293: tongue body at target vowel posture, velum fully open.
Figure 2: French coda nasal consonant production: panne [pan] ‘failure’. Frame 198: initial labial stop; f201: velum remains raised during nuclear vowel; f204: velic lowering during coda alveolar nasal stop.
Figure 3: Coda nasal consonant production in English: Yvonne [iː.vɔ̃n]. Frame 144: initial labial fricative of target syllable; f147: velum fully open as tongue body achieves vowel target; f150: velum in maximally lowered position during coda alveolar nasal stop.
• French nasal vowel (Fig. 1): velum lowering commences soon after the release of the initial labial; by the time the tongue body achieves its target posture, the velum is fully lowered, and it remains open throughout the vowel.
• French coda nasal (Fig. 2): the velum remains raised throughout the preconsonantal vowel, then lowers as the tongue tip moves towards its alveolar target.
• English coda nasal (Fig. 3): velum lowering commences soon after the release of the initial labial, and the velum is already fully open by the time the tongue body achieves its target posture, well before the tongue tip achieves the alveolar target of the coda consonant.

Velarization of /l/

• English /l/ is described as “dark” or “velarized” in coda, and “brighter” (not velarized) in onset.

• The gestures in the two positions are in fact very similar, but the timing is different.

• In coda, the retraction of the TB occurs first and contributes to the “velarized” percept.

• The pattern is very similar to that for nasals.
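The onset/coda timing difference described for /l/ and for nasals can be sketched as a lag between the onsets of a consonant's two gestures. This is only an illustration: the function name, the tolerance value, the sign convention (negative lag means the wider constriction leads), and the time values are assumptions for the example, not measurements or definitions from the cited study.

```python
def classify_coordination(wide_start, narrow_start, tolerance=20):
    """Classify the relative timing of two gestures of one consonant.

    wide_start: onset time of the wider constriction
                (e.g. velum lowering, TB retraction).
    narrow_start: onset time of the narrower constriction (e.g. TT closure).
    tolerance: largest lag still counted as synchronous (assumed value).
    """
    lag = wide_start - narrow_start  # negative: wider gesture starts first
    if abs(lag) <= tolerance:
        return lag, "synchronous"
    if lag < 0:
        return lag, "sequential, wider constriction leading"
    return lag, "sequential, narrower constriction leading"


# Made-up onset times on an arbitrary time axis.
print(classify_coordination(500, 505))  # (-5, 'synchronous')
print(classify_coordination(400, 520))  # (-120, 'sequential, wider constriction leading')
```

With these hypothetical numbers, the first call resembles an onset consonant and the second a coda consonant.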

Principle 2: Coordination in onset vs coda in English
• Onset: all gestures composing a C begin synchronously.
• Coda: gestures composing a C can be sequential, with the wider constriction leading.

Allophonic variation due to reduction

• Flapping in American English:
  • The durations of coronal stops and their laryngeal opening gestures "shrink" in time between stressed and unstressed vowels, and the stops become approximants or “flaps”.
  • “latest” [leɪɾɪst]
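The context for flapping can be sketched as a symbolic rule over segments annotated with stress. This is a simplification for illustration: the lecture describes flapping as gestural reduction in time, whereas the hypothetical function `apply_flapping` below only checks the segmental context (a coronal stop between a stressed and an unstressed vowel).

```python
def apply_flapping(segments):
    """Replace /t/ or /d/ with a flap between a stressed and an unstressed vowel.

    segments: list of (symbol, stress) pairs, where stress is "stressed" or
    "unstressed" for vowels and None for consonants.
    Returns the list of narrow-transcription symbols.
    """
    output = [symbol for symbol, _ in segments]
    for i in range(1, len(segments) - 1):
        symbol = segments[i][0]
        before = segments[i - 1][1]
        after = segments[i + 1][1]
        if symbol in {"t", "d"} and before == "stressed" and after == "unstressed":
            output[i] = "ɾ"
    return output


latest = [("l", None), ("eɪ", "stressed"), ("t", None),
          ("ɪ", "unstressed"), ("s", None), ("t", None)]

print("".join(apply_flapping(latest)))  # leɪɾɪst
```

The word-final /t/ of “latest” is left alone because it is not followed by an unstressed vowel.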

[Figure: gestural scores for “Tim takes” [tʰɪm] [tʰeɪks] and “latest” [leɪɾɪst]]

Contextual variation across words
• The narrow transcription of the same word varies as a function of the following word.
• Examples: place assimilation
  • “miss it” [mɪs]
  • “miss you” [mɪʃ]
  • “I’m sure I’m gonna miss you”
• What is going on here?
  • Results from overlap of the final consonant gestures of one word with the initial consonant of the next.

Change in Gestural overlap: Synthesis

[Figure: synthesized gestural scores, SLOW and FAST. Rows in each: VELUM (WIDE), LIPS (STOP), TT (ALV FR), TB (PALATAL NAR, PAL NAR, VELAR NAR), GLOTTIS (WIDE).]

Nasal Place Assimilation
• Final /n/ is sometimes assimilated to the place of a following labial or dorsal stop:
  • “can be” [kæ̃nbi] slow vs. [kæ̃mbi] fast

Nasal Assimilation: Synthesis
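One way to picture the slow vs. fast difference in “can be” is to measure how much of the final /n/'s TT closure interval is overlapped by the following /b/'s LIPS closure. The sketch below is illustrative only: the function name `overlap_fraction` and all interval values are made up, and it assumes that the fast version differs from the slow one only in the amount of overlap.

```python
def overlap_fraction(first, second):
    """Fraction of the first gesture's interval overlapped by the second.

    first, second: (start, end) activation intervals on a shared time axis.
    """
    start = max(first[0], second[0])
    end = min(first[1], second[1])
    overlap = max(0.0, end - start)
    return overlap / (first[1] - first[0])


# Made-up activation intervals: TT closure of /n/, LIPS closure of /b/.
slow = {"TT": (300, 400), "LIPS": (380, 480)}
fast = {"TT": (300, 400), "LIPS": (310, 410)}

print(overlap_fraction(slow["TT"], slow["LIPS"]))  # 0.2
print(overlap_fraction(fast["TT"], fast["LIPS"]))  # 0.9
```

When the lip closure covers most of the tongue tip closure, as in the hypothetical fast case, the alveolar closure has little acoustic consequence of its own, which is consistent with the [kæ̃mbi] transcription.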

[Figure: synthesized gestural scores for “can be” SLOW and “can be” FAST. Rows in each: VELUM (WIDE), LIPS (LAB STOP), TT (ALV ST), TB (PAL WIDE, PAL NAR, VEL ST), GLOTTIS (WIDE).]

Nasal Assimilation to a following coronal
• “ten times” [tɛ̃n] vs. “ten things” [tɛ̃n̪]
• Overlap of the alveolar nasal and the dental fricative results in blending of the two TT gestures.

… sound produced by a given gesture to change (if you listen very carefully). The difficulty we have in perceiving this is probably due to the fact that we perceive the underlying contrastive gestural units, not their context-dependent consequences.

"ten times" vs. "ten themes"

Overlap of two gestures that attempt to control the same constrictor causes blending of the two gestures.
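Blending can be pictured numerically as a compromise between the two gestures' targets for the shared constrictor. The sketch below assumes a weighted average, which is one common way to model blending but is not stated in these slides; the function name, the weights, and the constriction-location values on an arbitrary scale are all hypothetical.

```python
def blend_targets(target_a, target_b, weight_a=0.5):
    """Weighted average of two competing targets for the same constrictor.

    weight_a: relative strength of gesture A, between 0 and 1
              (equal strength by default; an assumption of this sketch).
    """
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be between 0 and 1")
    return weight_a * target_a + (1.0 - weight_a) * target_b


# Hypothetical TT constriction locations on an arbitrary front-to-back scale.
ALVEOLAR = 10.0  # /n/ in "ten"
DENTAL = 4.0     # dental fricative in "things"

print(blend_targets(ALVEOLAR, DENTAL))        # 7.0
print(blend_targets(ALVEOLAR, DENTAL, 0.25))  # 5.5
```

A narrow transcription such as [n̪] would correspond to an outcome close to the dental target, that is, a low weight for the alveolar gesture in this sketch.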

MRI evidence for blending

“shorten this” [n̪] [ð]

“open every” [n]