Analysis of Nasal Consonants Using Perceptual Linear Prediction
Total Page:16
File Type:pdf, Size:1020Kb
Analysis of nasal consonants using perceptual linear prediction YingyongQi Departmentof Speech and Hearing Sciences and Department of E!ectrical and Computer Engineering, Universityof •4rizona, Tucson, •4 rizona 85721 RobertA. Fox Diuisionof Speech and Hearing Science& Ohio State Unioersit•, Columbus, Ohio 43210 (Received31 May 1991;accepted for publication19 November 1991 ) Until recently,speech analysis techniques have been built aroundthe all-polelinear predictive model.This studyexamines the effectivenessof usingthe perceptuallinear predictive method for analyzingnasal consonants. Six speakers(three men and three women) produced300 CV syllableswith initial nasalconsonants/m/and/n/. A threshold-basedboundary detection algorithmwas developed to extractnasal segments from the CV contexts.Poles of a fifth-order perceptuallinear predictivemodel were calculated and the frequencyof the secondpole was usedto characterizethe placeof articulationof nasalconsonants. Results indicated that the frequencyfor the secondtransformed pole was significantly lower for/m/than for/n/and wasindependent of factorssuch as vowelcontext and genderof the speaker.A nasal identificationrate of 86% wasobtained based on the frequencyof the secondpole. The useof the perceptuallinear predictive method may thusovercome some difficulties associated with analyzingnasal consonants. PACS numbers:43.71.Es, 43.71.Cq, 43.72.Ar INTRODUCTION salconsonants and measured the frequency of theantireson- The progressivephonetic refinement of broadcategories anceusing the methodof analysisby synthesis.This study of speechsounds (stop, fricative, and sonorant) is animpor- provided,for the first time, valuableexperimental evidence tant step in implementingspeech understanding and/or on the predictedacoustic characteristics of nasal conson- speechrecognition systems (Reddy, 1976); however, ants.As pointedout by the author,however, "the procedure knowledgeabout the acousticcues that allow for reliable ... is by no meanssimple or mechanical... and a machine extractionof phoneticcharacteristics of speechfrom the couldhardly replacethe humanoperator at the stageof this acousticwaveform is limited (Harrington, 1988). For exam- study." ple, robustfeatures that enableautomatic within-class dif- Kurowski and Blumstein(1987) reportedthat the place ferentiation of nasal consonants have not been well estab- of articulationof nasalconsonants in CV syllablescould be lished. differentiatedby thepattern of spectralenergy change in the The acousticstructure of nasal consonantshas long vicinityof theoral releaseonce the spectra were transformed beenpredicted by the acoustictheory of speechproduction usingan algorithmbased on the psychoacousticcharacteris- (Fant, 1960;Fujimura, 1962; Flanagan,1972). The pres- ticsof humanauditory system. Utilizing criticalband analy- enceof a side-branchingresonator (the blockedoral cavity) sis based on the Bark scale (Zwicker, 1970), Kurowski and will introducean antiresonance(zero) in the spectrumof Blumsteinexamined the spectraof two glottalpulses of the nasal consonants.Theoretically, the antiresonancecan be nasalmurmur immediatelypreceding oral releaseof the na- usedto identifythe placeof articulationof nasalconsonants salstop closure and the firsttwo glottalpulses after the nasal becausethe frequencyof the antiresonanceis determinedby release.The spectralenergy increase was reported to belarg- the dimensionof the side-branchingresonator. For example, er between 5-7 Bark than between 11-14 Bark for the nasal thefrequency of theantiresonance should be lower for/m/ /m/; whereasthe spectral energy increase was reported to be than for/n/because the blockedoral cavityattached to the larger between11-14 Bark than between5-7 Bark for the pharyngeal-nasalpassage is longerfor/m/than for/ru t. nasal/n/. Although the auditory-basedspectral transfor- Such differences,however, are difficult to detect using con- mation provides a reasonablealternative for analyzing the ventionaltechniques of spectralanalysis. The presenceof the nasalconsonants, these features were inevitablycontext de- spectralzero introducesnonlinear equations to parametric pendentbecause spectral information of the followingvowel methodsof spectralanalysis (Kay, 1987). The harmonic was an important part of the measurement.In addition, structure,the absence of a prominentspectral valley, and the there wasa certaindegree of arbitrarinessin the selectionof significantlydamped high-frequency energy in nasalspectra the frequencyrange for comparingenergy increase around are the sources of some of the difficulties associated with nasalrelease since the selectionwas neithersystematically analyzingnasal consonants using nonparametric (periodo- optimizedgiven the transformedspectrum nor directlypre- gram) methods. scribedby theory. Fujimura (1962) examinedthe acoustic structure of ha- Hermansky(1990) proposeda new approach,called 1718 J. Acoust.Soc. Am. 91 (3), March1992 0001-4966/92/031718-09500.80 (• 1992Acoustical Society of America 1718 Downloaded 21 Sep 2011 to 128.146.89.68. Redistribution subject to ASA license or copyright; see http://asadl.org/journals/doc/ASALIB-home/info/terms.jsp perceptuallinear predictive (PLP) method,for thespectral cyand bandwidth of oneequivalent spectral peak. The real transformationbased on characteristicsof the auditorysys- pole specifiesthe overall spectraltilt of the transformed tem.While the underlying principles of thespectral transfor- spectrum.Because nasal spectra are dominatedby only one mationwas similar to thosepreviously reported (Kewley- low-frequencypole (250-300 Hz) (Fujimura, 1962;Fuji- Port and Luce, 1984; Kurowski and Blumstein, 1987), the mura arid Lindqvist, 1971), the 5th-order PLP model is finalall-pole modeling of thetransformed spectrum parame- adoptedhere for the analysis.However, the realpole is not tricallyspecified the spectrumand, thus,simplified the ex- consideredin the finalanalysis because spectral tilt caneasi- tractionof acousticcharacteristics of speechsignals. Her- ly be influencedby factorssuch as recording conditions and mansky (1990) demonstrated that a fifth-order PLP glottal characteristicsof a particular speakerand is often spectraltransformation improved the performanceof a tem- consideredas a phoneticallynonessential factor (Klatt, plate-basedspeech recognition system. 1982). In light of the newlyproposed PLP methodof spectral analysis,the presentstudy was undertakento determine A. Subjects and recordings whether acoustic characteristics of nasal consonants could Sixnative speakers of AmericanEnglish (three men and be found that were independentof the following vowel con- three women) providedthe speechsamples. Each speaker text in a CV syllableand to evaluatewhether place of articu- producedten C¾ syllableswith initial nasalconsonants/m/ lation couldbe automaticallyidentified on the basisof these and/n/followed by five differentvowels/i/,/a•/,/o/,/a/, features.The paper is organizedas follows. SectionI in- and/u/. Eachsyllable was repeated five times. The syllables cludesa brief rationalefor usingthe PLP method and the wererandomly ordered. The recordingswere made in a quiet implementationdetails of the PLP analysis.Section II con- room with backgroundnoise at about 30 dB (SPL). The tainsa descriptionof thecontext-independent acoustic char- microphonewas placed approximately10 cm from the acteristicsof nasalconsonants revealed by the PLP analysis mouth of the speaker.Recorded tokens were low-passfil- and the results of nasal identification based on these features. tered (f• = 4.5 kHz) and digitizedat a samplingrate of 10 Section III is the discussion and conclusion. kHz with a 16-bitquantization level. I. METHOD B. Nasal segment selection The integrationof spectralenergy as part of the audi- A 25.6-mssegment of the nasalconsonant before the tory-basedspectral transformation provides a uniqueadvan- oral releaseinto the followingvowel (i.e., within the nasal tage in analyzingthe spectralcharacteristics of nasalcon- murmur) was selectedfor the PLP analysis.The boundary sonants. As stated earlier, the absenceof a prominent betweenthe nasalconsonant and the following vowel was spectralvalley and the reducedspectral energy in the mid- to automaticallylocated from the acousticwaveform. The ex- high-frequencyranges make it difficultto accuratelylocate act algorithmis asfollows. The signalwas bandpass filtered the center frequencyof the antiresonanceand to use this ( 1-3.5 k Hz, fourth-order Butterworth filter) to enhancethe frequencyto distinguishnasal consonants. There are, how- energydiscontinuity of the boundarysince energy for the ever, less-intensivespectral perturbations at differentfre- nasalmurmur is mostlyin the frequencyrange below about quencyranges for differentnasal consonants due to the side- 800 Hz (Mermelstein,1977). The upperfrequency limit for branch (oral) coupling. For example, there is a relatively the bandpassfilter wasused to eliminateunpredictable high- broad energyreduction in the mid-frequency(1000-2000 frequencyinfluences. The boundarywas determined using a Hz) rangewithin the nasalmurmur of nasal/n/that is not short-term amplitude signal calculated from the filtered asapparent for the nasal/m/. When a nonuniformintegra- waveform.A hammingwindow, with a windowsize of 10ms tion of poweris undertakenas part of the spectraltransfor- anda windowstep of 2 ms,was used in the calculations.The mation, someotherwise unreliable energy concentrations short-termamplitude was calculated as the absolutesum of are expectedto