<<

Click reduction in fluent : a semi-automated analysis of Mangetti Dune !Xung

Amanda Miller Micha Elsner Department of Linguistics Department of Linguistics The Ohio State University The Ohio State University [email protected] [email protected]

Abstract ter of Gravity (COG) and Maximum Burst Ampli- tude (MBA), differentiate the clicks in clear pro- We compare click production in fluent ductions. We extend this analysis to fluent, natu- speech to previously analyzed clear pro- ralistic speech in a corpus of folktales (Augumes ductions in the Namibian Kx’a language et al., 2011). Using a semi-automated rule-based Mangetti Dune !Xung. Using a rule-based method to locate the clicks in the acoustic data, software system, we extract clicks from we are able to inexpensively align a large enough recorded folktales, with click detection ac- portion of the corpus for acoustic analysis. We curacy about 65% f-score for one story- show that the cues identified by Miller and Shah teller, reducing manual annotation time by (2009) are less effective in differentiating clicks two thirds; we believe similar methods in running speech, providing quantitative evidence will be effective for other loud, short con- that this rare class of is subject to pho- sonants like ejectives. We use linear dis- netic reduction. Finally, we provide an analysis criminant analysis to show that the four of the acoustic cues which differentiate the clicks, click types of !Xung are harder to differ- showing that the best cues for discriminating pro- entiate in the folktales than in clear pro- ductions vary among speakers, but that in general ductions, and conduct a feature analysis spectral cues work better than measures of ampli- which suggests that rapid production ob- tude. Overall, our results demonstrate that clicks, scures some acoustic cues to click identity. which are known for being unique in their loud- An analysis of a second storyteller sug- ness, are not always so loud, and that even sounds gests that clicks can also be phonetically that are known for their loudness undergo reduc- reduced due to language attrition. We ar- tion just like other speech sounds. gue that analysis of fluent speech, espe- cially where it can be semi-automated, is 2 Background an important addition to analysis of clear productions in understanding the phonol- 2.1 Click Burst Amplitude ogy of endangered languages. It has long been noted that clicks are louder than 1 Introduction pulmonic stop consonants. Ladefoged and Traill (1994) note that clicks in !Xo´o˜ often have a peak- We compare click production in fluent speech to-peak voltage ratio that is more than twice that to previously analyzed clear productions in the of the onset of the following (about a 6 dB Namibian Kx’a language Mangetti Dune !Xung difference in intensity), which Traill (1997) com- (hereafter !Xung). This language contains the four pares to Greenberg (1993) description of pulmonic coronal click types recognized by the IPA (Associ- stops as typically “. . . 40 dB less intense than the ation, 2006). Most content words contain an initial following vowel.” This property of clicks should click, making clicks an important marker for lex- make them easy to recognize automatically, even ical identity and useful for marking the beginning with relatively unsophisticated methods. While Li of words for speech processing (Beckman, 2013). and Loizou (2008) have shown that low amplitude Miller and Shah (2009) show that temporal cues, pulmonic are degraded in noisy speech burst duration (BD) and rise time to peak inten- environments, clicks might be expected to differ sity in the click burst (RT); and spectral cues, Cen- in this regard due to their typically high amplitude

107 Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages, pages 107–115, Honolulu, Hawai‘i, USA, March 6–7, 2017. c 2017 Association for Computational Linguistics click Alveolar, Lateral Palatal Dental and acoustically based phonological features have IPA Sym !, { =/ | been proposed to capture this contrast. There are several acoustic cues that differenti- Table 1: Intensity of noisebursts for !Xo´o˜ clicks ate stops vs. . Burst duration, rise time to (loudest to quietest) (Traill, 1997). peak amplitude and frication duration differences are all part and parcel of the manner contrast. Ka- bursts. gaya (1978) quantified the burst duration differ- Previous work on click amplitudes has also ences among Naro clicks, and showed that bilabial noted a large degree of variability, which could [ò], dental [|] and lateral [{] click types have long make both click recognition and differentation of burst durations, while alveolar [!] and palatal [=/] different click types more difficult. Traill (1997) click types exhibit short burst durations. Sands provides an intensity scale as in Table 21, but (1990), Johnson (1993) and Ladefoged and Traill states that there is a great degree of variability. (1994) showed that there are similar differences in Miller-Ockhuizen (2003) shows similar results for Xhosa and !Xo´o˜ clicks. In addition to measuring Ju|’hoansi, and also comments on the great degree click burst durations, Ladefoged and Traill (1994) of variability. Traill and Vossen (1997) note that measured Rise Time to Peak amplitude in the click there is a large degree of variability in the ampli- bursts, following Howell and Rosen (1983), who tude of click bursts. None of these studies has nu- showed that this measure differentiates pulmonic merically quantified the variability or determined from affricates. Ladefoged and Traill to what degree it makes clicks confusable with showed that the alveolar and types in non-clicks or with each other. !Xo´o˜ exhibit short rise times, while the bilabial, Traill (1997) argues that non-pulmonic stop dental and types exhibit longer rise consonants are enhanced versions of pulmonic times to peak amplitude. Johnson proposed the stops, given the high amplitude bursts that are typi- feature [+/-noisy], focusing on the acoustic prop- cally louder than the following vowel in clicks and erties of the releases, and Ladefoged and Traill in intermediate intensity bursts found in ejectives, proposed the feature as [+/-abrupt] to describe this building on Stevens and Keyser (1989)’s theory of phonological contrast in terms of the speed of the enhancement. The theory suggests that anterior release. clicks should be easier to identify in the acoustic In Mangetti Dune !Xung, these click features signal than pulmonic consonants. However, clicks were studied by Miller and Shah (2009), who with lower amplitude should also result in lower show that the palatal click burst duration preced- perceptibility in human speech recognition and ing [u] in Mangetti Dune !Xung exhibits inter- lower identification in automatic speech recogni- speaker variation. One of the four speakers’ pro- tion. What remains unknown from this work is ductions that they studied exhibited longer burst whether or how often low-amplitude clicks are ac- durations for the palatal click type, suggesting tually produced. that this speaker released the click less abruptly. Miller (to appear) explicitly compared the realiza- 2.2 Click Burst Duration tion of clicks preceding [i] and [u], showing that Click amplitudes are useful cues for extracting the palatal click type in Mangetti Dune !Xung has clicks from the speech stream, but in order to dif- two . It is non-affricated (and thus pre- ferentiate between click types, listeners must also sumably abruptly released) preceding [u] as in the attend to other features. Previous work agrees other languages, but has a period of palatalization that these features indicate different manners of (palatal frication noise) following the click burst articulation, although they differ in their theoret- preceding [i]. Miller transcribes the palatalized al- > ical account of the underlying phonological con- lophone as [=/C]. trasts. Beach (1938) referred to this difference as vs. implosive. Trubetzkoy (1969) 2.3 Click Discrimination recouched the manner contrast among clicks as vs. . Both articulatory based We know of one prior study using acoustic fea- tures for click discrimination. Fulop et al. (2004) 1Traill’s scale also includes the [ò], which is on average less intense than the others. This click does not applied discriminant analysis to the four coronal occur in !Xung. clicks in the Bantu language Yeyi on the basis

108 of the four spectral moments of the anterior click clicks to more leisurely anterior releases, that lead bursts, and showed that the classification for the to frication. They suggest that the affrication of laterals and palatals were much worse than the the abruptly released alveolar and palatal clicks classification results for the alveolars and dentals. make them less perceptually distinct from the af- The alveolar clicks only displayed a 2.6% error fricated dental and lateral clicks, and that full click rate, and the dental clicks an error rate of 24%, loss would then resolve the perceptual ambiguity while the lateral and palatal clicks displayed an among the two classes of clicks. error rate of 93% and 67% respectively. The er- Conversational reduction of clicks, meanwhile, ror rates given here represent measurements from is motivated not by language-wide change but by isolated productions. In the present study, we give general articulatory concerns. Miller et al. (2007) similar results for isolated productions in !Xung provide qualitative evidence that nasal clicks have and compare these to results for productions from a stronger and longer duration of nasal voicing in fluent speech. Like Fulop et al. (2004), we find rel- their closures in weaker prosodic positions. Mar- atively high degrees of confusion among the dif- quard et al. (2015) compared acoustic properties ferent clicks. of voiceless oral plosives and clicks in three dif- ferent phrasal positions (Initial, Medial and Final) 2.4 Click Reduction in N|uu spontaneous speech. Their quantitative re- sults showed that while the duration of pulmonic Previous work on click reduction has distin- stop closures got shorter from initial, to medial, guished two situations in which clicks are weak- to final position, the clicks were shortest in initial ened: as an intermediate stage leading to click position, and lengthened in medial and final po- loss throughout a language, and as a prosodic phe- sitions. The clicks only showed reduction effects nomenon in ordinary speech. We find evidence of for Center of Gravity (lower COG values in phrase both these phenomena in our corpus data. Here, medial and phrase final positions, compared with we review some prior work on them. phrase initial position), and in the acoustic energy Traill and Vossen (1997) quantify a stage of level (degree of voicing) before the release burst, “click weakening”, which they claim is an inter- which is highest in phrase-final position, lower in mediate stage before click loss (the change from medial position, and lowest in phrase-final posi- a click consonant to a ). They tion. Neither study investigated the effects of re- describe click weakening as a process of acous- duction on the distinguishability of clicks. tic attenuation that effects only the abruptly re- leased clicks [!] and [=/]. They compare the 3 Materials same click types in !Xo´o,˜ a language that has not yet been described as undergoing any click loss, The corpus used in the current study consists of and G{ana, a Khoe language where many of the three folktales told by two different speakers, to- alveolar clicks have been lost. They point out taling about 45 minutes of speech (Augumes et al., 2 that the weakened G{ana clicks are noisier, and 2011) . One story, Lion and Hare, was told by one have more diffuse spectra, than the strong !Xo´o˜ of the two oldest living speakers in Mangetti Dune, clicks of the same type. They quantify the am- Namibia, Muyoto Kazungu (MK). Two additional plitudes of the clicks in the different languages stories, Iguana (BG1) and Lion and Frog(BG2) by providing difference measures of click inten- were told by Benjamin Niwe Gumi, (BG) who was sity based on the peak amplitude of the click mi- a bit younger, but still a highly respected elder in nus the peak amplitude of the following vowel, the community. Our click identification tool does which provides a scale of click amplitude rela- not require a transcript. The acoustic analyses of tive to the vowel across the different languages. extracted clicks do require an orthographic tran- Further, they provide palataograms of some of script, since our tool assigns each detected click the strong, and weakened clicks, which show that to its correct phonetic category by aligning the de- “weakened articulations have larger cavities and tections to the transcript. Two of the stories have this is a result of reduction in the degree of tongue existing ELAN transcripts in the archive, and the contact.” They describe this weakening as a pro- clicks of the third were transcribed by the first au- cess of articulatory undershoot. They attribute the 2https://elar.soas.ac.uk/Collection/ noisiness of the anterior releases in the weakened MPI178567

109 thor. mum Amplitude in the Burst to the Amplitude at The laboratory data used was a set of words 20 ms into the vowel. These features were used recorded in a frame sentence that were previously in a previous study (Miller and Shah, 2009) and analyzed by Miller and Shah (2009) and Miller (to shown to separate !Xung clicks preceding [u]. We appear). use the same dataset of 248 click tokens studied by Miller and Shah (2009), extracted from single 4 Click Detection content words produced in the focused position of a frame sentence, and compare the results to We present a simple rule-based tool imple- those for 197 clicks extracted from the folktales. mented using the acoustic analysis program Praat The Miller and Shah (2009) dataset includes COG (Boersma and Weenink, 2016) to automatically values only for clicks produced before [u], so we detect clicks in the audio stream. This method is restrict our analyses of the folktales to the clicks intended to locate clicks as a general class; we produced before non-low back [u] and [o] discuss the problem of separating the clicks by to make the two sets as comparable as possible. type below (Sec. 5). Because clicks are relatively The [u] data from Miller and Shah (2009) are all short in duration and high in amplitude, the tool bimoraic monosyllabic words containing the long searches the acoustic signal in 1 ms frames. vowel [u ], though they vary in terms of and At each frame, a potential click is detected if : type. In the texts, both monosyllabic bi- the raw signal amplitude exceeds 0.3 Pascal and moraic and bisyllabic words with two short vow- the Center of Gravity exceeds 1500 Hz. If the re- els occur. The vowels following the clicks in the gion of consecutive frames which passes these fil- monosyllabic words in the stories are either a long ters has a duration less than 20 ms, it is labeled monophthong like [u ] or [o ], or are one of the as a click. For MK, the center of gravity cutoff is : : diphthongs that commences with a non-low back changed to 1000 Hz and the durations allowed to vowel: [ui, oe, oa]. In CVCV words, both vowels extend to 25 ms. (These parameters were tuned are short. All laryngeal release properties (voiced, impressionistically.) aspirated, glottalized) of clicks and vowels with We explored a few other measurements for non-modal qualities were included, as these identifying clicks. A relative measurement of am- don’t effect the vowel quality (only the voice qual- plitude (checking that the frame has higher am- ity of the vowel). Both nasal and oral clicks are plitude than the one 15 ms back) improves preci- also included. Uvularized clicks were excluded, sion but at the expense of recall. Since we hand- as were epiglottalized vowels, as these affect the corrected the output of our tool, we opted to em- vowel quality, and it is unknown, but possible, that phasize recall (it is easier for a human analyst to they might affect the C.O.G. of the click bursts. reject click proposals than to find clicks that the tool has not marked). We also attempted to reject For the detection of clicks, and for the acoustic short vowel sounds by checking for detectable for- analysis of detected clicks, we measured the Rise mants within the high-amplitude region, but this Time to Peak Amplitude (RT) in the burst as the proved unreliable. duration from the onset of the click burst to the Following click detection with the tool, a hu- maximum RMS amplitude during the click burst man analyst corrected all three transcripts. This proper (transient, not including separate frication process took less than 1 hour for BG, who con- 2 noise or aspiration noise that follows the tran- sistently produced his clicks at higher amplitudes, sient), following Ladefoged and Traill (1994). The but 2-3 hours for MK, who varied his click ampli- click burst duration was measured as the duration tudes more widely. The corrected transcripts are of the transient itself. The center of gravity was used as a gold standard for evaluating the tool’s measured using the standard Praat measure on a stand-alone performance. 22,050 Hz spectrum that was calculated using a Hanning window. The relative burst amplitude 5 Acoustic Analysis was measured as the maximum RMS amplitude We compute 4 acoustic features known to differ- found in the click burst (release of the anterior entiate coronal clicks: Burst Duration (BD), Rise constriction) divided by the RMS amplitude of the Time to Peak Amplitude in the Burst (RT), Cen- following vowel at a point 20 ms from the start ter of Gravity (COG) and the Ratio of the Maxi- of the vowel. The 20 ms point was chosen as it

110 Transcript Clicks Prec Rec F Dataset N Clicks Disc. acc MK 1 250 38.1 17.6 24.1 Lab 248 75 BG 1 180 66.8 61.6 64.2 Folktales 197 54 BG 2 202 65.3 70.8 67.9 Lab (spkr JF) 75 87 All 632 59.6 47.2 52.7 Lab (spkr MA) 75 84 Lab (spkr TK) 74 92 Table 2: Number of clicks and detection results for Folktales (spkr BG) 142 73 three transcripts. Folktales (spkr MK) 55 56

Table 3: Linear discriminant analysis accuracies was far enough into the vowel to allow the vowel (leave-one-out) on folktale and laboratory clicks. to reach a higher amplitude, but contained com- pletely within the first mora of the vowel. This assured that the vowel being measured was [u] or using the four acoustic features from Miller and [o] in all cases. Shah (2009), which were shown to differentiate among the four click types in clear productions. 5.1 Results Here, we show that they are much less effective Our evaluation (Table 2) scores a system- for fluent speech, suggesting that clicks, like other annotated click as correct if it occurs within 10 speech sounds, are reduced in fluent speech, blur- ms of a true click. (Small variations in this num- ring the primary acoustic cues that distinguish be- ber affect the result relatively little, since click tween them. bursts are typically short, distinctive events.) On 6.1 Features the two transcripts of speaker BG, results are rel- atively good (precision around 65, recall between Burst duration and Rise time to peak amplitude are 60-70), enabling rapid post-correction by a human both acoustic correlates of , analyst. Performance is much poorer for MK (pre- indicating the click’s degree of frication. Longer cision 38, recall 17) and post-correction took over burst durations and rise times to peak amplitude four times as long. both indicate more frication, while affricates have Precision errors for BNG generally corre- an immediate high-amplitude burst right after the sponded to other short, loud speech sounds: coro- release of the initial constriction. The relative burst amplitude reflects the size of the cavity and nal ejectives: [ts’], [Ù’] and the highest amplitude part of [i] vowels. Errors for MK were more var- the abruptness of the release burst. The fourth ied; MK produced many quieter click bursts which acoustic attribute that was measured, Center of were less distinct from the surrounding speech, Gravity (COG), correlates with the the size of the and it was harder to set cutoffs that would distin- lingual cavity of the click, and therefore is deter- guish the clicks from pulmonic stops and vowel mined by the of both constric- sounds. See Figure 1 for example spectrograms. tions. We believe these very low-amplitude clicks are 6.2 Discriminant Analysis a consequence of MK’s linguistic background, a Using linear discriminant analysis in the R pack- possibility we return to in more detail below (Sec. age MASS (Venables and Ripley, 2013), we find 7). that these features indeed differentiate clicks in 6 Acoustic Analysis of Clicks the single-word lab productions, but are less di- agnostic in fluent speech. Accuracies (Table 3) Once the clicks have been extracted, we conduct are computed with leave-one-out cross-validation. an acoustic analysis of the four click types. The The lab speech clicks are classified with 75% ac- previous section focused on the task of distin- curacy, while performance on the folktale clicks is guishing clicks from other sounds as an engineer- reduced to 54%. This gap is exaggerated by the ing application. Here, we build a model to dis- poor discriminability of clicks produced by MK, criminate between the four click types, in order to whose atypically quiet clicks were also difficult understand how much information they contribute to detect. However, a similar result can be ob- for lexical identification in real speech process- tained by comparing individual speakers. When a ing. We conduct a linear discriminant analysis model is learned for each speaker individually, the

111 104 104 8000 8000 6000 6000 4000 4000 2000 2000 Frequency (Hz) Frequency (Hz) 0 0 i n!Ae te ko n!un te n!ae khe

! ! !

0 0.1 0.2 0.3 0.4 0.5 0.6 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 Time (S) Time (S)

Figure 1: Spectrograms of the in the word n!a‚e´ “lion” produced by BG (left) and MK (right) showing the difference in burst amplitudes. ¨ three lab speakers’ clicks are discriminated with and TK but neither of the others. Interestingly, 84-92% accuracy, while the folktale speaker BG’s MK’s atypical clicks are classified mainly based clicks are discriminated with only 73% accuracy. on their duration, a cue which was uninformative Thus, although intra-speaker variability decreases for the rest of the dataset. A small ablation anal- the accuracy of the classifier in both settings, it is ysis on BNG and MK’s data tells the same story; still clear that the folktale clicks are harder to dis- COG is responsible for most of the classification criminate overall. performance for BNG (70% with COG alone vs A visual explanation of the result is shown in 73% with all features). For MK, it is less use- Figure 2, where we plot the RT vs COG (the two ful but still captures over half of classifier perfor- most discriminative features for these speakers) mance (35% vs 56%). for clicks from the folktale storyteller BG versus one laboratory elicitation speaker. The laboratory 7 Discussion clicks show a clear separation among all four click We can infer from the evidence provided that types. Among the folktale clicks, only the dental !Xung clicks are subject to phonetic reduction in [|] is cleanly separable from the others. fluent speech. The primary temporal and spectral An examination of the learned discriminant cues for click identification become highly vari- functions shows the relative importance of the four able and less informative in rapid production. Lis- acoustic cues. Each discriminant function is a lin- teners presumably use top-down information like ear combination of the cues; in our data, the first lexical context to make up for increased confus- discriminant function captures most of the vari- ability. Thus, !Xung clicks behave much like other ance between the clicks for all speakers except speech sounds in rapid production, despite their MK, whose clicks were poorly classified to begin canonical loudness, which makes them stand out with. Table 4 shows the coefficients of the first from the speech stream in clear speech. discriminant function for several datasets. For the Although clicks in fluent speech are harder to other speakers, COG is the most discriminative discriminate from one another, our results do sup- property of clicks, but the second-most discrim- port the widespread idea that clicks as a class are inative function varies among speakers. Ampli- easy to pick out of the speech stream, at least for tude is a good cue for two of the laboratory speak- speakers who produce them in the canonical way. ers, MA and TK, but not for JF or the folktale Despite relying on a few features and hand-tuned speaker BG; rise time is also a good cue for MA threshold parameters, our click detection script

112 Speaker BNG (Story data, [o,u]) Speaker JF (Lab data, [u]) 6000 6000

Click Click 4000 4000 ! ! / / // // 2000 = 2000 = Center of Gravity (Hz) Center of Gravity (Hz) Center of Gravity

0 0 0.00 0.01 0.02 0.03 0.00 0.01 0.02 0.03 Rise Time (S) Rise Time (S)

Figure 2: Folktale and laboratory clicks: RT vs. COG

% var Rise T COG Dur Max amp Spkr Feats Acc Lab 92 0.41 1.63 0.03 -0.33 BNG COG only 70 Lab (JF) 98 0.53 3.72 -0.25 -0.09 BNG COG/rise 70 Lab (MA) 88 1.12 1.89 -0.17 -1.70 BNG all 73 Lab (TK) 98 1.53 5.28 -0.03 -1.11 MK COG only 35 Folk (BNG) 96 0.26 1.45 0.35 0.40 MK COG/rise 47 Folk (MK) 67 0.03 -0.40 1.38 0.52 MK all 56

Table 4: Left: Percent of variance captured and coefficients of four features in the first discriminant function learned in different datasets. Right: Ablation results for two speakers. was able to automate enough of the acoustic analy- language endangerment (Traill and Vossen, 1997). sis to save a substantial amount of transcriber time It seems, therefore, that MK’s clicks represent an and effort. We expect that other non-pulmonic initial stage of language loss and replacement with consonants like ejectives could also be detected Bantu, which was reversed for the younger gener- with similar methods. These results are encourag- ation. BG’s speech represents this revitalization of ing for corpus research in endangered languages. !Xung and its replacement of Bantu as the prestige language in the community. The extremely low accuracy for click detections in the speech of MK should qualify this conclu- The implication for speech technology and cor- sion. There are a few possible reasons why MK’s pus research is that detection methods may vary clicks are much lower in amplitude, harder to de- in their accuracy from community to community. tect and harder to discriminate than those of the Methods developed for robust language commu- other speakers. First, MK is the oldest speaker in nities may need to be recalibrated when work- the dataset (in his 70s). Second, MK spent a large ing with severely endangered languages or fea- portion of his life in Angola living among speak- tures undergoing rapid change. Within a single ers of an unknown Bantu language which did not community, however, the accuracy of a tuned de- include clicks. This Bantu language was an impor- tector might serve as a measure of language loss tant mode of communication for much of his life, by quantifying the degree to which the target seg- and indeed, he occasionally code-switched into it ments have been lost. during the storytelling session. Perhaps because of Our results reveal new facts about the discrimi- this L2 background, MK produced some phone- native features for clicks. For example, although mic click consonants as pulmonic stops (primarily Traill (1997) provided a scale of click burst in- [k,g] for [!,{], and [c,] for [|, =/]), and produced tensity, shown in (1) above, the variability of the extremely variable amplitudes for many of the oth- amplitude of the alveolar click bursts relative to ers. the following vowel is so high, that it is clear that Similar variability in click production is re- the amplitude alone can not be very useful in dis- ported in langugage as a stage in click loss due to criminating the four click types. As mentioned

113 above, the relative perceptual weighting of the two et al., 2016; Bird et al., 2014) in training a full- temporal measures (click burst duration and rise scale ASR system, or to bootstrap a learning-based time to peak amplitude) is completely unknown landmark recognition system (Hasegawa-Johnson for clicks. Comparing our results to Fulop et al. et al., 2005). (2004) Yeyi results, we can conclude that a com- bination of manner cues and place of articulation Acknowledgments cues results in much better discriminability. Of We thank Mehdi Reza Ghola Lalani, Muyoto course, we can not rule out the contribution of Kazungu and Benjamin Niwe Gumi. This work click reduction / loss to the poorer discriminability was funded by ELDP SG0123 to the first author seen in the Yeyi results. and NSF 1422987 to the second author. Machine learning can indicate how much in- formation about click identity is carried by each of these cues, but this does not necessarily reveal References which cues are important to human listeners. For International Phonetic Association. 2006. The inter- instance, English and affricates are also national phonetic alphabet (revised to 2005) [chart]. differentiated by duration and rise time to peak page 135. amplitude (Howell and Rosen, 1983). Early stud- ies assumed that Rise Time was the main acous- Christine Augumes, Amanda Miller, Levi Namaseb, and David Prata. 2011. Mangetti Dune !Xung sto- tic feature of importance. However, Castleman ries: In !Xung and English. The Ohio State Uni- (1997) showed that frication duration differences versity, the University of Namibia, and the Mangetti among the English contrast are more perceptually Dune Traditional Authority. Deposited at ELAR. relevant than the rise time differences. While our Douglas Martyn Beach. 1938. The of the results imply that COG is the most informative Hottentot language. W. Heffer & Sons, ltd. criterion for click identity, further perceptual ex- periments could tell whether this matches listen- Jill N Beckman. 2013. Positional faithfulness: an Op- timality Theoretic treatment of phonological asym- ers’ actual perceptual weightings. Of course, four metries. Routledge. manually selected features and a linear classifier do not tell the whole story of click discriminabil- Steven Bird, Lauren Gawne, Katie Gelbart, and Isaac ity. A more sophisticated model (King and Tay- McAlister. 2014. Collecting bilingual audio in re- mote indigenous communities. In COLING, pages lor, 2000) could discover features directly from the 1015–1024. acoustic signal; however, we believe our features acceptably represent the major categories of cues. Paul Boersma and David Weenink. 2016. Praat: do- ing phonetics by computer. Version 6.0.20 from http://praat.org. 8 Conclusion Wendy Ann Castleman. 1997. Integrated perceptual Results suggest that phonetic studies of endan- properties of the [+/-] distinction in frica- gered languages must consider both clean produc- tives and affricates. tions and naturalistic speech corpora. It is impor- Sean A Fulop, Peter Ladefoged, Fang Liu, and Rainer tant to discover not only the phonemic inventory Vossen. 2004. Yeyi clicks: Acoustic description and of the language and the canonical landmarks that analysis. Phonetica, 60(4):231–260. allow listeners to recognize speech sounds in clear Steve Greenberg. 1993. Speech processing: Auditory speech, but also the range of phonetic variability models. In R.E. Asher and S.M.Y. Simpson, editors, displayed in fluent speech. In this study, investi- Pergammon Encyclopedia of Language and Linguis- gation of led to the conclusion tics, Vol. 8, pages 4206–4227. that the scale of click burst intensity is not very Mark Hasegawa-Johnson, James Baker, Sarah Borys, useful in distinguishing clicks, since the amplitude Ken Chen, Emily Coogan, Steven Greenberg, Amit of alveolar click bursts is so variable. In study- Juneja, Katrin Kirchhoff, Karen Livescu, Srividya ing natural data, rule-based extraction of particu- Mohan, et al. 2005. Landmark-based speech recog- lar segments may offer a low-cost alternative to nition: Report of the 2004 Johns Hopkins summer workshop. In Acoustics, Speech, and Signal Pro- developing a full ASR system for a language with cessing, 2005. Proceedings.(ICASSP’05). IEEE In- little available data. The processed data could be ternational Conference on, volume 1, pages I–213. used to supplement non-expert annotations (Liu IEEE.

114 Peter Howell and Stuart Rosen. 1983. Produc- Kenneth N Stevens and Samuel Jay Keyser. 1989. Pri- tion and perception of rise time in the voiceless mary features and their enhancement in consonants. affricate/fricative distinction. The Journal of the Language, pages 81–106. Acoustical Society of America, 73(3):976–984. Anthony Traill and R Vossen. 1997. Keith Johnson. 1993. Acoustic and auditory analyses in the : new data on click loss of Xhosa clicks and pulmonics. UCLA Working Pa- and click replacement. Journal of African languages pers in Phonetics, 83:33–45. and linguistics, 18(1):21–56.

Ryohei Kagaya. 1978. Soundspectrographic analy- Anthony Traill. 1997. Linguistic phonetic features sis of Naron clicks: A preliminary report. Annual for clicks: Articulatory, acoustic and perceptual evi- Bulletin of Institute of Logopedics and Phoniatrics, dence. African linguistics at the crossroads: Papers 12:113–125. from Kwaluseni, pages 99–117. Nikolai S. Trubetzkoy. 1969. Principles of Phonology. Simon King and Paul Taylor. 2000. Detection of University of California Press. phonological features in continuous speech using neural networks. Computer Speech and Language, William N Venables and Brian D Ripley. 2013. Mod- 14(4):333–353. ern applied statistics with S-PLUS. Springer Sci- ence & Business Media. Peter Ladefoged and Anthony Traill. 1994. Clicks and their accompaniments. Journal of Phonetics, 22(1):33–64.

Ning Li and Philipos C Loizou. 2008. The contribu- tion of consonants and acoustic landmarks to speech recognition in noise. The Journal of the Acoustical Society of America, 124(6):3947–3958.

Chunxi Liu, Preethi Jyothi, Hao Tang, Vimal Manohar, Rose Sloan, Tyler Kekona, Mark Hasegawa- Johnson, and Sanjeev Khudanpur. 2016. Adapt- ing ASR for under-resourced languages using mis- matched transcriptions. In Acoustics, Speech and Signal Processing (ICASSP), 2016 IEEE Interna- tional Conference on, pages 5840–5844. IEEE.

Carina Marquard, Oliver Niebuhr, and Alena Witzlack- Makarevich. 2015. Phonetic reduction of clicks– evidence from N|uu. In Proceedings of the Interna- tional Congress of Phonetic Sciences (ICPhS).

Amanda Miller and Sheena Shah. 2009. The acoustics of Mangetti Dune !Xung clicks. In Proceedings of INTERSPEECH, pages 2283–2286.

Amanda L Miller, Johanna Brugman, Bonny Sands, Levi Namaseb, Mats Exter, and Chris Collins. 2007. The sounds of N|uu: Place and airstream contrasts. Working Papers of the Cornell Phonetics Labora- tory, 16:101–160.

Amanda Miller-Ockhuizen. 2003. The Phonetics and Phonology of : A Case Study from Ju|’hoansi. Outstanding Dissertations in Linguistics Series. Routledge.

Amanda Miller. to appear. Palatal click allophony in Mangetti Dune !Xung: Implications for sound change. Journal of the International Phonetic As- sociation.

Bonny Sands. 1990. Some of the acoustic characteris- tics of Xhosa clicks. UCLA Working Papers in Pho- netics, (74):96.

115