Tongue Shape and Airstream Contrasts in N|uu Clicks: Predictable Information is Phonologically Active

Amanda L. Miller, Cornell University, [email protected]

Outline of Presentation

• Introduction - Phonetic questions raised by click phonology

• Methods and Benefits of High-speed ultrasound w/ software mixing

• Results – Tongue shape & C-V coarticulation

• Conclusions

Statement of Problem

Clicks are produced with two constrictions, one anterior and one posterior. Phonetic and phonological studies have all asserted that only the anterior constriction varies by click type. However, differences in the anterior constriction do not straightforwardly account for the phonological patterns seen.

N|uu Lingual and Mixed Airstream Segments

LINGUAL

            Labio-uvular   Denti-pharyngeal   Alveo-uvular (Central)   Alveo-uvular (Lateral)   Palato-pharyngeal
Stop        ʘ              ǀ                  ǃ                        ǁ                        ǂ
Nasal

LINGUO-PULMONIC

            Labio-uvular   Denti-pharyngeal   Alveo-uvular (Central)   Alveo-uvular (Lateral)   Palato-pharyngeal
Stop        ʘ͡q             ǀ͡q                 ǃ͡q                       ǁ͡q                       ǂ͡q
Affricate

N|uu Click Phonology: The Back Constraint

[Figure: CVV / CVVCV roots. Lexical frequency (0-200) of roots with front vowels vs. back vowels, by airstream (pulmonic stops, lingual stops, linguo-pulmonic stops) and place of the initial consonant.]

• Anterior place does not predict the phonological patterns.

• Posterior place does predict the phonological pattern:
  – Uvular-i sequences do not occur.
  – Upper pharyngeal-i sequences do occur.

Hypotheses

• POSTERIOR PLACE OF CLICKS ACCOUNTS FOR BVC PATTERNS.

• CLICKS PREVIOUSLY TRANSCRIBED AS ‘VELAR’ vs. ‘UVULAR’ DIFFER ONLY IN AIRSTREAM.

N|UU AIRSTREAM CONTRASTS

[Figure: waveforms of dental, central alveolar, lateral alveolar and palatal clicks. Left column, LINGUAL STOPS (time axis 0-0.075 s): a lingual burst. Right column, LINGUO-PULMONIC STOPS (time axis 0-0.1 s): a lingual burst followed by a pulmonic burst; these are contour segments in airstream.]

[Figure: Phases for Stops and Clicks (3 Speakers). Duration (in seconds, 0.00-0.15) of closure, anterior release, posterior release and VOT for [k] and for the lingual and linguo-pulmonic clicks.]

Question

Do N|uu lingual (so-called VELAR) and linguo-pulmonic (so-called UVULAR) clicks also differ in the place of the pulmonic portion of the segment, as claimed by Traill (1985) and Ladefoged and Traill (1984) for similar segments in ǃXóõ?

Methods

New Architecture

• High-speed ultrasound is needed to investigate the dynamics of click consonants and their coarticulation with following vowels.
• Since we can’t capture audio in the ultrasound machine, we need software mixing to obtain both high speed and accurate articulatory-acoustic alignment.

DICOM Data Path / Post-Data-Collection Software Mixing

• High frame rate DICOM path with perfect fidelity to the ultrasound machine
  – Today’s data: 51 fps (Logiqbook)
  – Next machine: 100-160 fps
• Software mixing (Adobe Premiere Pro)
  – Powerful, precise, flexible
• Redundant hardware-mixed audio / video path (software mixing aid)
• Video camera path for Palataglossatron spatial correction, with audio

[Figure: New System Diagram. Components: ultrasound machine (Logiqbook), microphone, video camera, Canopus Twin 100 (audio and video). Connections: Ethernet cable (DICOM), 1/8 in stereo cable, SVGA cable, IEEE 1394 (FireWire).]

Spatial Correction: Helmet with Palataglossatron

• Articulate Instruments’ ultrasound helmet with Palataglossatron spatial correction (Mielke et al. 2005)
  – Greatly reduces tongue / palate position uncertainty
    • Typically corrects 2-4 mm within 5-10 frames.
  – Particularly required for the integrity of high frame rate, extended-time data
    • Increased temporal alignment and visibility precision would be less valuable without it.

Ultrasound helmet results

• The ultrasound helmet limited movement to about 2-3 mm for two of the three speakers, and to a bit more for the third speaker.

• The larger amount of movement for the third speaker is due to the helmet fitting that speaker poorly.

Offset in mm by word and speaker

[Figure: place of articulation offset (mm, 0.00-7.00) for speakers AK, GS and KE, by word: Palate, !uu 'camelthorn acacia', !qui 'ashes', =uuke 'fly', =quu 'neck'.]

Tongue Height Offset in mm by word and speaker

[Figure: tongue height offset (mm, 0.00-7.00) for speakers AK, GS and KE, by word: Palate, !uu 'camelthorn acacia', !qui 'ashes', =uuke 'fly', =quu 'neck'.]

Rotation angle in degrees by word and speaker

[Figure: rotation angle (degrees, 0.00-0.90) for speakers AK, GS and KE, by word: Palate, !uu 'camelthorn acacia', !qui 'ashes', =uuke 'fly', =quu 'neck'.]

High speed, greater visibility

• 50 fps implies 20 ms resolution instead of the 33 ms of 30 fps video
  – 33 ms is long for some events
  – 100 fps yields 10 ms resolution
• To-the-frame alignment along 9 seconds of speech
  – Removes audio / ultrasound mapping concerns
  – Opens up the study of individual token differences and other precision issues
• Adobe tools make observing 20 ms differences easy
  – Higher spatial image quality
  – 51 fps frame clarity (see demo)

Resolution of spatio-temporal inaccuracies of video-based ultrasound

(ref. Alan Wrench and Jim Scobbie*)

• Many artifacts and distortions documented by Wrench and Scobbie (2007) are removed by this method.
• No external video is used in the final data.
• DICOM preserves the internal cine loop quality at any frame rate.
  – B-mode sweep preserved

[Images: 51 fps DICOM frame vs. 30 fps video frame.]

* Notice the artifact due to multiple scan times in the same external video image.
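The frame-rate figures quoted in this talk follow from simple arithmetic: one frame lasts 1000 / fps milliseconds. The short Python sketch below reproduces them. The function names are illustrative assumptions and are not part of the recording setup described here.

```python
def frame_period_ms(fps: float) -> float:
    """Return the duration of one frame in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_in(duration_s: float, fps: float) -> int:
    """Return the number of whole frames captured in duration_s seconds."""
    return int(duration_s * fps)


if __name__ == "__main__":
    # Frame rates mentioned in the slides: 30 fps video, 50/51 fps DICOM,
    # and the 100-160 fps range expected from the next machine.
    for fps in (30, 50, 51, 100, 160):
        print(f"{fps:>3} fps: {frame_period_ms(fps):.1f} ms per frame")
    # A 9-second cine loop at 51 fps.
    print(frames_in(9, 51), "frames in a 9 s loop at 51 fps")
```

At 30 fps a frame lasts about 33 ms, which is why short events such as a click release can fall between frames; at 100 fps the same event is sampled roughly three times as often.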

Dynamic Alveolar Click [!] Release

To-the-single-frame temporal alignment with high frame rate DICOM

• DICOM: perfect cine loops at any speed
• Hardware-mixed audio / 30 fps cine loops
• Align audio (Adobe slide tool) with the DICOM cine loop
  – Mixed ultrasound as guide
  – Linguistic knowledge along the entire 9 s loop

Frame Mapping Demo

• Adobe "to the frame" alignment

Visual Estimate*: frame alignment accuracy for all independent events along a timeline is better than the most accurate event

Assumed: the time alignment error is a dependent event.

• During 9 seconds of speech there are multiple independent events, but the alignment precision of all of them is dependent and equal.
• Each token has a known probability of being accurate "to a frame" by observation.
• Probability theory for dependent events (each alignment error) predicts:
  – Given that event B has happened (and A is dependent on B), the probability that A will happen is greater.
  – P{A|B} = P{A ∩ B} / P{B}
  – Given that token B is aligned "within a frame", the probability that token A will be aligned as well is greater than without B.
  – This implies the potential for "probable" sub-frame alignment, by looking at all events on the timeline.

*Resmi Gupta, Cornell Statistical Consulting Unit [email protected]
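As a worked illustration of the conditional-probability formula above, the Python sketch below computes P{A|B} from P{A ∩ B} and P{B}. All probabilities in it are hypothetical values chosen for illustration, not measurements from this study, and the function name is an assumption.

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """Return P(A|B) = P(A and B) / P(B)."""
    if not 0.0 < p_b <= 1.0:
        raise ValueError("P(B) must be in (0, 1]")
    if not 0.0 <= p_a_and_b <= p_b:
        raise ValueError("P(A and B) must be in [0, P(B)]")
    return p_a_and_b / p_b


# Hypothetical values: each token alone is aligned "to a frame" with
# probability 0.80, and both tokens are aligned together with probability
# 0.72 (independent errors would give 0.80 * 0.80 = 0.64).
p_a = 0.80
p_b = 0.80
p_a_and_b = 0.72

p_a_given_b = conditional_probability(p_a_and_b, p_b)

if __name__ == "__main__":
    print(f"P(A)   = {p_a:.2f}")
    print(f"P(A|B) = {p_a_given_b:.2f}")
```

With these assumed numbers, knowing that token B is aligned raises the probability that token A is aligned above its unconditional value, which is the pattern the slide describes for alignment errors shared along one timeline.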

[Figure: error bands A, B and C.]

Palataglossatron Data Prep: All Frames Aligned (except palate)

• Palate (LB): N/A
• 51 Hz (LB)
• 30 Hz (LT)
• Video cam (HV)

Results

Head Correction results

• In data uncorrected for head movement, [k] and [q] do not look much different.

• In data corrected for head movement, [q] clearly has a constriction farther back than that of [k].

• The [!] back constriction is more like that of [q] than that of [k].

Alveolar Click [!] Correction

[Figure: tongue traces BEFORE correction (trace[, 1, 1] vs. trace[, 2, 1]) and AFTER correction (new.trace[, 1, 1] vs. new.trace[, 2, 1]). Legend: k, C Clo, u, q.]

Before correction, [!] is more similar to [k]. After correction, [!] is more similar to [q]. The similarity between [q] and [!] arises from tongue root shape.

Alveolar Click [!] correction

[Figure: tongue traces BEFORE correction (trace[, 1, 1] vs. trace[, 2, 1]) and AFTER correction (new.trace[, 1, 1] vs. new.trace[, 2, 1]). Legend: k, C Clo, u, q.]

In this case, the difference is not as large, but [!] still becomes more similar to [q] after correction than before correction. We see that the hump of the [u] 40 ms after [!] spans only the width of the posterior constriction in [!], not that of both constrictions.

Correction effects: Palatal click

• Correction for head movement allows us to see tongue root raising implicit in the palatal click more clearly.

• Tongue root raising captures the similarity of [ǂ] to the production of [u] described by Esling (2005).

Palatal Click [ǂ] Correction

[Figure: tongue traces BEFORE correction (trace[, 1, 1] vs. trace[, 2, 1]) and AFTER correction (new.trace[, 1, 1] vs. new.trace[, 2, 1]). Legend: k, C Clo, u, q.]

The palatal click posterior constriction is farther back in the corrected plot than in the uncorrected plot. The palatal click posterior tongue is raised.

C-V Coarticulation

[Figure: tongue traces BEFORE correction (trace[, 1, 1] vs. trace[, 2, 1]) and AFTER correction (new.trace[, 1, 1] vs. new.trace[, 2, 1]). Legend: k, C Clo, u, q.]

We can see clear coarticulation between [ǂ] and [u]. The [u] 40 ms following [ǂ] has a wide hump that spans the width of the two humps in the palatal click.

C-V Coarticulation

• These data show that there are differences in the timing and manner of the release of the anterior constriction in the alveolar [!] and palatal [ǂ] clicks.

• In [!], the anterior constriction is released quickly, and the tip of the tongue is low in the mouth. Therefore the [u] following the click only maintains the shape of the posterior constriction of the click.

• In [ǂ], the anterior constriction has a slow laminal release, and the [u] following the click has characteristics of both the anterior and posterior constrictions. That is, the hump of the [u] spans the width of both click constrictions.

Palatal Linguo-pulmonic stops

Tongue shape in [u] following [ǂ͡q] displays even greater coarticulation than is found with [ǂu] sequences. Here we see two peaks in the [u].

Alveolar Linguo-pulmonic stops

[Figure: tongue traces BEFORE correction (trace[, 1, 1] vs. trace[, 2, 1]) and AFTER correction (new.trace[, 1, 1] vs. new.trace[, 2, 1]). Legend: k, C Clo, C Rel, u, q.]

[k] and [q] are more different after correction. We see more retraction in the alveolar click after correction.

Conclusions

• Alveolar and palatal clicks differ in tongue body and tongue root shape, as well as the place of the posterior constriction. Posterior place does not predict the phonological patterns. We must distinguish between TONGUE ROOT RETRACTION vs. TONGUE ROOT RAISING (Esling 2005), or TONGUE BODY and TONGUE ROOT SHAPE.

• Clicks previously transcribed as being “velar” vs. “uvular” clicks do not display a difference in posterior place. Thus, the so-called uvular clicks must be represented as contour segments on the airstream dimension.

• This provides evidence for airstream being a fourth descriptor (in addition to place, manner and voicing) needed to describe consonants.

Conclusions (Cont’d)

• High speed ultrasound with software mixing allows us to investigate dynamic segments like click releases and processes like C-V coarticulation, and to investigate issues of articulatory to acoustic alignment.

• Results here have shown that the anterior release of [!] is faster than that of [ǂ]. Thus, we see a wider hump in a following [u] in a post-[ǂ] context than in a post-[!] context.

Conclusions (cont’d)

• Although tongue body shape is to some extent predictable from anterior tongue shape, the phonological pattern is only phonetically grounded if we state it in terms of tongue body or tongue root shape / dynamics of posterior release.

• Clicks provide evidence that predictable phonetic information is phonologically active.

References

Esling, J. (2005). There Are No Back Vowels: The Laryngeal Articulator Model. Canadian Journal of Linguistics 50, 13-44.

Ladefoged, P. and Traill, A. (1984). Linguistic phonetic description of clicks. Language 60, 1-20.

Lindblom, B. and Sundberg, J. (1971). Acoustical Consequences of Lip, Tongue, Jaw and Larynx Movement. JASA 50, 4 (pt. 2), 1166-1179.

Mielke, J., Baker, A., Archangeli, D. and Racy, S. (2005). Palatron: A Technique for Aligning Ultrasound Images of the Tongue and Palate. Coyote Papers: Working Papers in Linguistics, Linguistic Theory 14, Siddiqi, D. and Tucker, B., eds., 96-107. (Available online at http://coyotepapers.sbs.arizona.edu/CPXIV/CPXIV%20p96-107%20Mielke-Baker-Archangeli-Racy.pdf)

Miller, Brugman, Sands, Exter, Namaseb and Collins (2007). Differences in Airstream and Posterior Place of Articulation among Nǀuu Lingual Stops. JIPA.

References (cont’d)

Sagey, E. (1986). Representation of Features and Relations in Non-Linear Phonology. MIT Dissertation in Linguistics.

Stone, M., Epstein, M. and Iskarous, K. (2006). Functional segments in tongue movement. Clinical Linguistics & Phonetics 18, 507-521.

Traill, A. (1985). Phonetic and phonological studies of ǃXóõ Bushman (Quellen zur Khoisan-Forschung 1). Hamburg: Helmut Buske Verlag.

Wrench, A. and Scobbie, J. (2007). Spatio-temporal inaccuracies of video-based ultrasound images of the tongue.

Acknowledgements

The research on N|uu was made possible by a National Science Foundation grant “Collaborative Research: Descriptive and Theoretical Studies of N|u” (BCS-0236735/BCS-0236795, PIs: Amanda Miller-Ockhuizen, Christopher Collins and Bonny Sands).

I would like to thank collaborators Johanna Brugman, Chris Collins, Mats Exter, Khalil Iskarous, Levi Namaseb and Bonny Sands.

Acknowledgements (Cont.)

I would also like to thank our N|uu consultants Khais Brow, Geelmeid Esau, Anna Kassie, Hannie Koerant, Hanna Koper, Andreis Olyn, |Una Rooi and Griet Seekoeie, as well as our translators Collin Louw, Willem Damarah, Gerhardus Damarah and Gertuida Sauls for their patience and good humor during our sometimes tedious recording sessions.