<<

applied sciences

Article Electroglottographic Analysis of the in Young Male Role of Kunqu

Li Dong 1,* and Jiangping Kong 2

1 School of Humanities, Shenzhen University, Shenzhen 518060, China 2 Department of Chinese and Literature, Peking University, Beijing 100871, China; [email protected] * Correspondence: [email protected]; Tel.: +86-1581-182-5016

Abstract: The types used in the young male role in Kunqu Opera were investigated. Two national young male role singers volunteered as the subjects. Each singer performed three voice conditions: , stage , and lyrics. Three electroglottogram parameters, the fundamental frequency, contact quotient, and speed quotient, were analyzed. Electroglottogram parameters were different between voice conditions. Five phonation types were found by clustering analysis in singing and stage speech: (1) , (2) high adduction , (3) modal voice, (4) untrained , and (5) high adduction falsetto. The proportion of each phonation type was not identical in singing and stage speech. The relationship between phonation type and pitch was multiple to one in the low pitch range, and one to one in the high pitch range. The pressure levels were related to the phonation types. Five phonation types, instead of only the two phonation types (modal voice and falsetto) that are identified in traditional Kunqu Opera singing theory, were concomitantly used in the young male role’s artistic voices. These phonation types were more similar  to those of the young female roles than to those of the other male roles in the Kunqu Opera. 

Citation: Dong, L.; Kong, J. Keywords: young male role; singing; stage speech; electroglottography Electroglottographic Analysis of the Voice in Young Male Role of Kunqu Opera. Appl. Sci. 2021, 11, 3930. https://doi.org/10.3390/ 1. Introduction app11093930 1.1. Young Male Role in Kunqu Opera The Kunqu Opera is a traditional opera in China and inscribed in United Nations Academic Editor: Michael Döllinger Educational, Scientific, and Cultural Organization’s list of “oral masterpiece and intangible heritage of humanity”. It has been handed down orally since the middle of the sixteenth Received: 19 March 2021 century and is revered as the ancestor of all Chinese . There are at least 10 artists in a Accepted: 23 April 2021 Published: 26 April 2021 theatrical troupe, Jing (colorful face role), Guansheng (hat role), Jinsheng (kerchief role), Laosheng (aged male role), Fumo (second aged male role), Zhengdan (middle-aged woman

Publisher’s Note: MDPI stays neutral role), Guimendan (young woman role), Liudan (young girl role), Fuchou (second clown with regard to jurisdictional claims in role), and Xiaochou (clown role). Their voice mirror the ages, genders, characters, published maps and institutional affil- and identities of the various personages. iations. Both Guansheng and Jinsheng belong to young male (YM) roles. A Guansheng performer, whose voice quality has been described as “broad and bright” having “a heavy oral resonance”, acts as a young king or a gifted scholar. Jinsheng performers often act in love stories, and his voice quality has been described as “brighter and lyrical” [1]. A YM singer can adjust his voice to play the part of either Guansheng or Jinsheng. From Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. the perspective of pitch range, a YM’s singing is similar to a Western , while the This article is an open access article stage speech is similar to a Western [2]. In the traditional singing theory of Kunqu distributed under the terms and Opera, a YM role uses modal voice in the low pitch range and falsetto in the high pitch conditions of the Creative Commons range to recite and sing. The of a YM role was not traditionally fixed. It was from Attribution (CC BY) license (https:// B3 to #F4, which depended on the ages and identities of the personages. The younger the creativecommons.org/licenses/by/ personage was, the lower the passaggio that was adopted [3]. However, the voice timbre 4.0/). in the low pitch range differs from that of the speech, and the falsetto deviates from both

Appl. Sci. 2021, 11, 3930. https://doi.org/10.3390/app11093930 https://www.mdpi.com/journal/applsci Appl. Sci. 2021, 11, 3930 2 of 12

the Western operatic tradition and untrained falsetto, which have been well described in previous research [4–8]. To reveal how the voice can be used in dramatic contexts to create the character of an ancient Chinese young man, it is necessary to investigate the details of the phonation type in scientific terms. The present study focuses on the phonation types from the perspective of electroglottogram (EGG) parameters and investigates (1) the distribution differences of parameters among three conditions, namely, singing, stage speech, and reading lyrics; (2) the phonation types used in singing and stage speech; and (3) the relationships between the pitch and phonation types.

1.2. EGG Analysis Electroglottography is a non-invasive technique to measure variations in the contact area between the two vocal folds as a function of time. It is related complementarily to the glottal air flow [9]. The peaks in the derivative of the EGG signal correspond to the closing and opening events of the vocal folds [10–12]. A model [9] that pinpoints certain landmarks and their relation to the glottal airflow pulse during normal voicing is widely used to speculate the movement and position of the vocal folds during phonation [12]. Clinically meaningful alterations of vocal fold status and behavior also have been reported to result in different geometric characteristics in the EGG waveform [13]. Different phonation settings result in different phonation types and characteristic EGG shapes [4,14,15]. The production of modal voice is carried out with moderate adductive tension, medial compression, and longitudinal tension in the lower pitch range of a speaker. In a considerably higher pitch range, when the of the vocal fold is made stiff and less mobile, which often is accompanied by slightly abducted folds, a falsetto is produced. Unlike modal voice and falsetto, breathiness is a modificatory setting of vocal folds. The notion of breathy voice involves a type of phonation which can be produced over a very wide range of air flows [4]. Three EGG parameters, namely, fundamental frequency (fo), contact quotient (CQ), and speed quotient (SQ), have been found to be associated with phonation types [2,5,16]. The CQ is defined as the ratio between the contact phase of the EGG signal and the fundamental period, as can be seen in Figure1. The CQ and the closed quotient derived from the glottal flow are not necessarily equal, since transglottal airflow may occur during incomplete glottal closure [17]. For EGG waveforms with a single peak, a high CQ is typically related to a pressed voice, while a low one is commonly observed for breathy voice. However, this is not true for a double-peak EGG signal. The SQ was originally defined for glottal flow as the duration of the opening phase divided by the duration of the closing phase [18]. A high SQ corresponds to faster glottal contact and indicates that the voice has more high-frequency energy [19]. For EGG, the SQ is the ratio between the decontacting phase and the contacting phase (see Figure1) in the EGG [ 5]. The SQ is related to the convergence (along with vertical phasing) in the [20] and the degree of tension of vocal fold; the greater the tension (such as falsetto) is, the closer to 100% the SQ will be. In several Chinese dialects and minority , compared with modal voice, the special vocal fry showed a lower fo, smaller CQ, and larger SQ; the breathy voice showed a lower fo, smaller CQ, and smaller SQ; the pressed voice showed a lower fo, larger CQ, and larger SQ; and the high-pitched voice showed a higher fo, larger CQ, and smaller SQ [5]. In order to correctly calculate the EGG parameters, three methods have been previously applied in research to pinpoint the moments of glottal contact and of loss of glottal contact [21–25]. The criterion-level method [23] showed better applicability for double-peak EGG signals [14], weak signals with a large high-frequency noise component, and other special cases, and was more appropriate for the analysis of the complex EGG signals of YMs. Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 12 Appl. Sci. 2021, 11, 3930 3 of 12

FigureFigure 1. 1. The landmarkslandmarks in in the the EGG EGG signal. signal. 2. Materials and Methods 2. Materials and Methods Two male national actors of Kunqu Opera (YM1 and YM2 for short) volunteered as subjects.Two Themale ages national of YM1 actors and YM2 of Kunqu were 45 Opera and 27 (YM1 years, and respectively, YM2 for when short) the volunteered signals as subjects.were collected. The ages Their of professional YM1 and YM2 experiences were 45 were and 27 27 and years, 9 years, respectively, respectively. when Neither the of signals werethem collected. smoked or Their had a historyprofessional of vocal experiences cord disease. were Both 27 singers and sang9 years, four respectively. songs (16 min), Neither ofrecited them a smoked section of or stage had speecha history (2 min) of vocal as they cord would disease. on stage Both after singers a warm-up sang exercisefour songs (16 min),for about recited 1 min, a section and read of the stage lyrics speech of the songs (2 min) and as stage they speech would (3 min) on stage in a modal after voice.a warm-up Two of the songs were for Jinsheng and were the South song, and the other two songs were exercise for about 1 min, and read the lyrics of the songs and stage speech (3 min) in a for Guansheng and were the North song. modalBoth voice. singers Two were of the recorded songs in were a quiet for living Jinsheng room, and of a sizewere of the about South4 × 5song,× 3 m and3. Au- the other twodio wassongs picked were up for by Guansheng a Sony Electret and condenser were the microphone North song. placed off axis at a measured distanceBoth of singers 15 cm fromwere the recorded mouth. in Sound a quiet pressure living level room, (SPL) of a calibration size of about was 4 carried × 5 × 3 outm3. Audio wasby recording picked up a 1000-Hzby a Sony tone, Electret the SPL condenser of which was microphone measured atplaced the recording off axis microphoneat a measured dis- tanceby means of 15 of cm aTES-52 from the sound mouth. level Sound meter (TESpressure Electrical level Electronic, (SPL) calibration Corp., Taipei, was carried China). out by recordingThe EGG signala 1000-Hz was collectedtone, the bySPL an of EGG which system was (Electroglottographmeasured at the recording Model 6103; microphone Kay, by meansMontvale, of a NJ,TES-52 USA). sound The signals level meter were simultaneously (TES Electrical recorded Electronic, and Corp digitized., Taipei, on 16 China). bits The at a sampling frequency of 20 kHz and recorded on dual channel wav files into ML880 EGG signal was collected by an EGG system (Electroglottograph Model 6103; Kay, PowerLab system. The equivalent sound levels for reading lyrics, singing, and stage speech Montvale,were 73, 89, NJ, and USA). 92 dB The (A) forsignals YM1 were and 73, simultaneo 88, and 93usly dB (A) recorded for YM2, and respectively digitized [ 26on]. 16 bits at a samplingThe signals frequency were divided of 20 kHz into and characters. recorded The on audio dual and channel EGG signals wav files were into analyzed ML880 Pow- erLabcharacter system. by character The equivalent using the sound VoiceLab levels 1.0 for [27 re] automatically.ading lyrics, singing, The SPL and values stage were speech ex- were 73,tracted 89, and from 92 the dB audio (A) for signals YM1 and and calibrated 73, 88, an tod 0.3 93 m dB in order(A) for to makeYM2, anrespectively easier comparison [26]. withThe the previoussignals were study. divided The contacting into characters. and the decontacting The audio and events EGG were signals approximated were analyzed characterusing the by commonly character used using 35% the of theVoiceLab EGG amplitude 1.0 [27] criterionautomatically. [16,23,28 The], which SPL isvalues ad- were extractedvantageous from when the vocal audio adduction signals and is to calibrated be detected. to Three 0.3 m phonatory in order to parameters make an wereeasier com- calculated: (1) f , (2) EGG CQ (abbreviated as CQ), and (3) EGG SQ (abbreviated as SQ). parison with theo previous study. The contacting and the decontacting events were ap- The extracted data from each character were stratified by sampling into 30 data, since the proximatedduration of theusing character the commonly varied a lot. used The song35% andof the the stageEGG speechamplitude did not criterion contain [16,23,28], all whichspeech is , advantageous but did containwhen vocal most vowelsadduction of the is to language. be detected. The sample Three sizephonatory in EGG param- eterscycles were was calculated: 12,150 (YM1 (1)0s singing), fo, (2) EGG 4350 CQ (YM1 (abbreviated0s stage speech), as CQ), 11,400 and (YM2 (3) EGG0s singing), SQ (abbreviated and as3870 SQ). (YM2 The0s extracted stage speech). data Statistical from each analyses characte werer were completed stratified using by SPSS sampling 18. Since into the 30 data, sincedata didthe notduration comply of with the the character normal distribution, varied a lot. and The the testsong of homogeneityand the stage of variancesspeech did not containwas significant all speech (p < sounds, 0.05), and but the did distributions contain most of the EGG parameters of the language. were compared The sample by size inMann–Whitney EGG cycles was U tests. 12,150 (YM1′s singing), 4350 (YM1′s stage speech), 11,400 (YM2′s sing- Previous research indicated that the long-term-average spectrum of a YM role showed ing), and 3870 (YM2′s stage speech). Statistical analyses were completed using SPSS 18. a large standard deviation, which implies a great variation of voice timbre [2]. To classify Sincethe phonation the data did types, not the comply EGG parameterswith the normal were clustered distribution, by k-means and the method test of [homogeneity16,29]. ofWhen variances clustering was in significant three dimensions, (p < 0.05), at least and two the clusters distributions in each dimensionof the EGG are parameters needed to were compareddifferentiate by between Mann–Whitney high and lowU tests. values. Hence, if the parameters vary independently of onePrevious another, research then at leastindicated eight centroids that the may long-term-average be needed. On thespectrum one hand, of ifa the YM role showedparameters a large of the standard two centroids deviation, in the clusteringwhich implies results a are great very variation close, they of are voice considered timbre [2]. To classifyto be of thethe same phonation phonation types, type. the On theEGG other para hand,meters the phonationwere clustered types were by determinedk-means method [16,29].consulting When both clustering parameter propertiesin three dimensions, and characteristics at least of thetwo waveform. clusters in For each the purposedimension are of discussing waveforms, the modal voice from reading lyrics was presumed as the neutral needed to differentiate between high and low values. Hence, if the parameters vary inde- pendently of one another, then at least eight centroids may be needed. On the one hand, if the parameters of the two centroids in the clustering results are very close, they are considered to be of the same phonation type. On the other hand, the phonation types were determined consulting both parameter properties and characteristics of the waveform. For the purpose of discussing waveforms, the modal voice from reading lyrics was presumed Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 12

as the neutral phonation state. The phonation types were determined on the basis of the Appl. Sci. 2021, 11, 3930 relationships between their EGG parameters and modal voice’s EGG4 of 12 parameters.

3. Results phonation state. The phonation types were determined on the basis of the relationships 3.1.between Distributions their EGG parameters of EGG Parameters and modal voice’s EGG parameters.

3. ResultsThe distributions of fo for different conditions of each singer are listed in Figure 2. The3.1. Distributions fo was transformed of EGG Parameters into semitones (re 55 Hz). The fo of reading lyrics was significantly

lowerThe than distributions singing of andfo for stage different speech conditions (p < of 0.05), each singer but arethe listed difference in Figure 2between. The the fos of singing andfo was stage transformed speech into was semitones significant (re 55 Hz).only The forf oYM2.of reading For lyricsboth was singers: significantly reading lyrics presented lower than singing and stage speech (p < 0.05), but the difference between the fos of singing theand most stage speech concentrated was significant distribution only for YM2. (which For both me singers:ant the reading smallest lyrics presented interquartile range), while stagethe most speech concentrated showed distribution the widest; (which the meant distribution the smallest interquartile of stage speech range), whileincluded that of singing andstage partly speech showedoverlapped the widest; with the reading distribution lyrics; of stage however, speech included there thatwas of nearly singing no overlap between and partly overlapped with reading lyrics; however, there was nearly no overlap between thethe distributionsdistributions of singing of singing and reading and lyrics. reading lyrics.

Figure 2. Box plot of fo. Figure 2. Box plot of fo. The CQ were also alike between singers (see Figure3). For both singers, the median of theThe CQ was CQ lowest were in also singing alike and highestbetween in reading singers lyrics (see (p < Figure 0.05). From 3). theFor distance both singers, the median ofbetween the CQ the was first andlowest third in quartile singing locations, and highest which revealed in reading the distribution lyrics (p width < 0.05). of From the distance typical data (the 50% data around the distribution center), in order of most concentrated, betweenthe CQ distribution the first was: and reading third lyrics, quartile singing, locations, and stage which speech. Withrevealed respect the to the distribution width of typicalsame condition, data (the the distribution50% data widtharound of typical the distribu data in singingtion center), and stage in speech order was of most concentrated, thesimilar CQ between distribution singers; however, was: reading it was not inlyrics, reading sing lyrics.ing, Regarding and stage the length speech. of the With respect to the whiskers as the reflection of the phonation diversity, for both singers, singing and stage samespeech condition, showed greater the CQ distribution diversity than width reading lyrics.of typical For YM1, data the in CQ singing diversity and of stage speech was similarstage speech between was slightly singers; greater however, than that of it singing, was not while in the reading opposite lyrics. was true Regarding for YM2, the length of the whiskerssince he adopted as the more reflection voices with of large the CQs. phonation diversity, for both singers, singing and stage Figure4 shows the distributions of SQ for different singers and conditions. Most of the speechSQ values showed were above greater 100%, which CQ implies diversity that the than decontacting reading phase lyrics. was longer For thanYM1, the the CQ diversity of stagecontacting speech phase. was For both slightly YM1 and greater YM2: the than median that SQ ofof reading singing, lyrics while was remarkably the opposite was true for YM2,larger thansince that he of adopted singing and more stage speechvoices (p with< 0.05); large the median CQs. SQ of stage speech was a little smaller than singing (p < 0.05); the median was closer to the first quartile than to the third one, for both singing and stage speech. From the perspective of the distribution width of typical data, for both singers, singing showed the most concentrated distribution and stage speech showed the widest. In regard to the same condition, YM1 presented a similar typical data distribution width of reading lyrics with YM2; for YM1, the typical data distribution width of singing and stage speech were both wider than YM2. As for the phonation diversity, which is reflected by the length of the whiskers, singing showed a little greater SQ diversity than stage speech.

Figure 3. Box plot of CQ. Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 12

as the neutral phonation state. The phonation types were determined on the basis of the relationships between their EGG parameters and modal voice’s EGG parameters.

3. Results 3.1. Distributions of EGG Parameters

The distributions of fo for different conditions of each singer are listed in Figure 2. The fo was transformed into semitones (re 55 Hz). The fo of reading lyrics was significantly lower than singing and stage speech (p < 0.05), but the difference between the fos of singing and stage speech was significant only for YM2. For both singers: reading lyrics presented the most concentrated distribution (which meant the smallest interquartile range), while stage speech showed the widest; the distribution of stage speech included that of singing and partly overlapped with reading lyrics; however, there was nearly no overlap between the distributions of singing and reading lyrics.

Figure 2. Box plot of fo.

The CQ were also alike between singers (see Figure 3). For both singers, the median of the CQ was lowest in singing and highest in reading lyrics (p < 0.05). From the distance between the first and third quartile locations, which revealed the distribution width of typical data (the 50% data around the distribution center), in order of most concentrated, the CQ distribution was: reading lyrics, singing, and stage speech. With respect to the same condition, the distribution width of typical data in singing and stage speech was similar between singers; however, it was not in reading lyrics. Regarding the length of the whiskers as the reflection of the phonation diversity, for both singers, singing and stage speech showed greater CQ diversity than reading lyrics. For YM1, the CQ diversity of Appl. Sci.Appl. 2021 Sci., 112021, x, 11 FOR, 3930 PEER REVIEWstage speech was slightly greater than that of singing, while the 5opposite of 12 was true 5for of 12

YM2, since he adopted more voices with large CQs.

Figure 4 shows the distributions of SQ for different singers and conditions. Most of the SQ values were above 100%, which implies that the decontacting phase was longer than the contacting phase. For both YM1 and YM2: the median SQ of reading lyrics was remarkably larger than that of singing and stage speech (p < 0.05); the median SQ of stage speech was a little smaller than singing (p < 0.05); the median was closer to the first quartile than to the third one, for both singing and stage speech. From the perspective of the dis- tribution width of typical data, for both singers, singing showed the most concentrated distribution and stage speech showed the widest. In regard to the same condition, YM1 presented a similar typical data distribution width of reading lyrics with YM2; for YM1, the typical data distribution width of singing and stage speech were both wider than YM2. As for the phonation diversity, which is reflected by the length of the whiskers, singing FigureshowedFigure 3. Box3. Boxa plot little plot of CQ. greater of CQ. SQ diversity than stage speech.

Appl. Sci. 2021, 11, x FOR PEER REVIEW 5 of 12

Figure 4 shows the distributions of SQ for different singers and conditions. Most of the SQ values were above 100%, which implies that the decontacting phase was longer than the contacting phase. For both YM1 and YM2: the median SQ of reading lyrics was remarkably larger than that of singing and stage speech (p < 0.05); the median SQ of stage speech was a little smaller than singing (p < 0.05); the median was closer to the first quartile than to the third one, for both singing and stage speech. From the perspective of the dis- tribution width of typical data, for both singers, singing showed the most concentrated distribution and stage speech showed the widest. In regard to the same condition, YM1 presented a similar typical data distribution width of reading lyrics with YM2; for YM1, the typical data distribution width of singing and stage speech were both wider than YM2. As for the phonation diversity, which is reflected by the length of the whiskers, singing showed a little greater SQ diversity than stage speech.

FigureFigure 4. Box4. Box plot plot of SQ. of SQ. 3.2. Phonation Types

3.2.The Phonation EGG waveforms Types were more complicated than those in modal speech. Thus, the Figurephonation 4. Box plot of SQ. types were determined by consulting both the parameters and characteristics of 3.2.the Phonation waveform. TypesThe EGG The waveforms typical waveforms were can more be seen complicated in Figure5. The than waveforms those werein modal all speech. Thus, the The EGG waveforms were more complicated than those in modal speech. Thus, the phonationselectedphonation types were from determined YM’stypes by consulti singing wereng both and thedetermined parameters stage and speech, characteristics by except consulti for Figureng 5bothc, which the was parameters a typical and characteristics of waveformthe waveform. The of typical the waveforms colorful can be face seen in role Figure [ 285. The]. waveforms The spectra were all of some audio clips which corresponded selectedof fromthe YM’s waveform. singing and stage speech, The except typicalfor Figure 5c, which waveforms was a typical can be seen in Figure 5. The waveforms were all waveformto the of EGGthe colorful waveforms face role [28]. The in spectra Figure of some5 areaudio shownclips which incorre- Figure6. spondedselected to the EGG waveforms from in FigureYM’s 5 are shown singing in Figure 6. and stage speech, except for Figure 5c, which was a typical waveform of the colorful face role [28]. The spectra of some audio clips which corre- sponded to the EGG waveforms in Figure 5 are shown in Figure 6.

(a) (b) (c) (d)

(e) (f) (g) (h) Figure 5. Eight typical EGG waveforms: (a) modal voice, (b) high adduction modal voice, (c) pressedFigure voice, ( 5.d) untrainedEight falsetto, typical (e) high EGG adduction waveforms: falsetto, (f–h) breathy (a) voice. modal voice, (b) high adduction modal voice, (c) pressed voice, (d) untrained falsetto, (e) high adduction falsetto, (f–h) breathy voice.

(a) (b) (c) (d)

(e) (f) (g) (h) Figure 5. Eight typical EGG waveforms: (a) modal voice, (b) high adduction modal voice, (c) pressed voice, (d) untrained falsetto, (e) high adduction falsetto, (f–h) breathy voice. Appl. Sci. 2021,, 11,, 3930x FOR PEER REVIEW 66 of of 12

(a) (b)

(c) (d)

Figure 6.6. Spectra ofof modalmodal voice, voice, high high adduction adduction modal modal voice, voice, breathy breathy voice, voice, untrained untrained falsetto, falsetto, and and high high adduction adduction falsetto. fal- (setto.a) Spectra (a) Spectra of modal of modal voice and voice high and adduction high adduction modal voice; modal (b voice;) spectra (b) ofspectra falsetto of and falsetto high and adduction high adduction falsetto; ( cfalsetto;) spectrum (c) ofspectrum breathy of voice breathy with voice double with peaks; double (d) peaks; spectrum (d) ofspectrum breathy of voice breathy with voice gathered with double gathered peaks. double peaks.

The waveform in Figure 55aa waswas aa modalmodal voice,voice, sincesince itit showedshowed aa similarsimilar fo,, CQ, CQ, SQ, and shape to the EGG parameters of YM10′s reading. The voice which had a similar EGG waveform asas thatthat inin Figure Figure5b 5b sounded sounded much much brighter brighter and and clearer clearer than than the modalthe modal voice, voice, but butnot asnot tense as tense as the as pressedthe pressed voice voice in other in other Kunqu Kunqu male rolesmale [roles28] (see [28] Figure (see Figure5c). The 5c). CQ The of CQthis of kind this of kind voice of was voice significantly was significantly larger than larg thater than of modal that of voice. modal In addition,voice. In addition, as shown as in shownFigure6 ina, theFigure high-frequency 6a, the high-frequency partials showed partials more showed energy more than modalenergy voice. than Bothmodal of voice. these Bothindicated of these that indicated the degree that of the posterior degree glottal of posterior adduction glottal increased, adduction which increased, was based which on was the basedresults on of the a previous results of research a previous in which research the experimentin which the was experiment documented was by documented simultaneous by simultaneouslaryngeal videostroboscopy, laryngeal vide electroglottography,ostroboscopy, electroglottogr and by capturingaphy, and the acousticby capturing data. [30the]. acousticMeanwhile, data. the [30]. CQ Meanwhile, was significantly the CQ smaller was significantly than that ofsmaller the pressed than that voice of inthe the pressed same voicepitch range,in the same which pitch is around range, 70% which for is the around other 70% male for roles the in other the Kunqumale roles opera in the [28 ].Kunqu Thus, operait was [28]. named Thus, modal it was voice named with modal a high voice degree with a of high posterior degree glottal of posterior adduction, glottal oradduc- high tion,adduction or high modal adduction voice. modal The waveform voice. The in waveformFigure5d showedin Figurea 5d significantly showed a significantly smaller CQ smallerthan modal CQ voicethan modal and a SQvoice close and to a 100%. SQ close Thus, to it 100%. was categorized Thus, it was as categorized untrained falsetto. as un- trainedThe waveform falsetto. in The Figure waveform5e showed in Figure similar 5e geometric showed similar characteristics geometric with characteristics that in Figure with5d thatin the in contactingFigure 5d in phase, the contacting but a larger phase, CQ. bu Itt was a larger named CQ. as It falsetto was named with as high falsetto degree with of highposterior degree glottal of posterior adduction, glottal or high adduction, adduction or falsetto.high adduction As shown falsetto. in Figure As 6shownb, the spectrum in Figure 6b,of high the spectrum adduction of falsetto high adduction had more falsetto high-frequency had more energy high-frequency than that of energy untrained than falsetto. that of Appl. Sci. 2021, 11, 3930 7 of 12

The waveforms in Figure5f–h were all classified as breathy voice, although they had different shapes and parameter characteristics. In this case, low adductive tension and weak medial compression made the vocal folds never fully come together. There was a continuous glottal leakage with some audible frication noise compared with the modal voice [4]. The waveform in Figure5f was a typical breathy voice, since it showed a smaller CQ than modal voice and a SQ larger than 100. The waveform in Figure5g exhibited larger CQ and SQ than that in Figure5d. A second peak, which occurred in the decontacting phase, suggested that the anterior and posterior parts of the vocal folds vibrated relatively independently [16,29,31]. Thus, it was another EGG signal pattern of breathy voice. A second peak in EGG may result in a sub- in the spectrum (see Figure6c). The waveform in Figure5h showed a similar CQ with the high adduction modal voice and a similar SQ with the untrained falsetto. However, the shape of the EGG signal could be considered as the combination of a main and a second peak. The spectrum is similar to that of Figure5g (Figure6c,d). The higher were replaced by aspiration noise in Figure6c,d, which suggested the occurrence of the turbulence noise. Thus, it was also identified as breathy voice. More evidence for identifying these voices as breathy voice were found in the spectra if we ignored the contribution to the spectra. The level difference between the fundamental component in the spectrum H1 and the second- amplitude A2 of a / / was 6.4 dB lower for breathy voice than for modal voice. The first-formant bandwidthU of the breathy / / was 170Hz, and that for the modal / / was only 76 Hz. U U 3.3. Clustering Analysis To determine the vibration modes of the vocal folds, all three EGG parameters needed to be taken into consideration. Table1 illustrates the cluster centroids of EGG parameters and the percentage of data around the centroid of the two YM singers. A combination of the singer, the condition (S stood for singing, SS stood for stage speech), w and the number of the clustering centroid, such as “YM1_S_1” (the first clustering centroid of YM10s singing parameters), was used to refer to each clustering centroid.

Table 1. EGG parameters and corresponding phonation type for each clustering centroid and the percentage of the data around the centroid. (P: percentage; PT: phonation types; m: modal voice; M: high adduction modal voice; B: breathy voice; f: untrained falsetto; F: high adduction falsetto).

Singing Stage Speech Singer No. fo/Semi CQ/% SQ/% P/% PT No. fo/Semi CQ/% SQ/% P/% PT 1 25 43 318 10 m 1 22 57 414 7 M 2 26 33 216 15 B 2 23 28 184 9 B 3 27 46 166 7 m 3 24 41 299 10 m 4 29 50 269 6 M 4 27 53 297 15 M YM1 5 29 25 115 18 f 5 29 59 90 5 B 6 32 36 139 11 B 6 31 47 210 13 m 7 34 31 105 21 f 7 34 35 104 18 f 8 38 39 126 11 F 8 40 44 118 22 F 1 25 39 258 11 m 1 21 37 289 9 B 2 25 35 173 19 B 2 22 25 146 10 B 3 29 25 115 21 f 3 24 49 335 7 M 4 31 58 289 3 M 4 26 36 200 15 B YM2 5 32 58 99 8 B 5 27 53 132 8 B 6 34 36 148 10 B 6 28 47 240 16 m 7 34 28 112 19 f 7 37 28 109 18 f 8 39 38 124 9 F 8 41 35 111 17 F

As shown in Table1, there were eight effective clustering centroids for each singer in each condition. Up to 84% of the clustering centroids was distributed below the maximum Appl. Sci. 2021, 11, 3930 8 of 12

pitch of the YM roles’ passaggio. On the other hand, in the high pitch range (above B4), the vibrated stably since there was only one clustering centroid. The vibration mode of the vocal fold did not show a one-to-one correspondence with pitch. The fo of YM1_S_4 and 5, YM2_S_1 and 2, and YM2_S_6 and 7 were same. However, the CQ and SQ of them showed great diversity. On the adjacent pitches, the CQ or SQ was also different. The proportions of the clustering centroids varied a lot. The smallest was 3% and the largest was 22%. In most cases, if the fo of two clustering centroids was close, the proportions of them displayed a relatively large difference. Between singers, the clustering centroids appeared on different pitches in both singing and stage speech. Some clustering centroids presented the same or similar parameter characteristics, such as YM1_S_5 and YM2_S_3, and YM1_S_8 and YM2_S_8. The other cen- troids showed more obvious differences, such as YM1_SS_4 and YM2_SS_5, and YM1_S_6 and YM2_S_5. For both singers, multi differences were observed between conditions. The clustering centroids were located in different pitch ranges, especially for YM2. There were five clustering centroids in the pitch range of 29~34 in YM20s singing, while there was none for his stage speech. Even if the clustering centroids showed the same fo, the CQ and SQ of some clustering centers were different, such as YM1_S_5 and YM1_SS_5. Eight clustering centroids did not equal eight phonation types. Regarding the influence of the fo on the clustering result, if two clustering centroids showed a similar CQ and SQ but a different fo, they may be or may not be the same phonation type, which depended on the similarity degree between their EGG waveforms. Thus, combining the analysis for Figure5 , the phonation type of each clustering centroid can be determined; see the last column of each part in Table1. For YM1 and YM2 0s singing, the proportion of phonation type, from high to low, was untrained falsetto, breathy voice, modal voice, high adduction falsetto, and high adduction modal voice. The phonation types used were slightly different between singers in stage speech. YM1 used a higher percentage of high adduction modal voice, modal voice, and high adduction falsetto and less breathy voice and untrained falsetto than in singing. For YM2, compared with singing, a higher percentage of breathy voice, high adduction modal voice, modal voice, high adduction falsetto, and less untrained falsetto were employed in stage speech. A higher degree of posterior glottal adduction made the voice have stronger energy in stage speech than in singing, which was verified by previous research [26]. Taken together, high adduction modal voice, modal voice, breathy voice, and un- trained falsetto were used in the low pitch range (fo from 25 to 34 for singing and from 21 to 37 for stage speech), while high adduction falsetto was used in the high pitch range (fo above 38 for singing and above 40 for stage speech), as shown in Figure7. Abundant phonation types in the low pitch range diversified the voice, while high adduction falsetto made the voice timbre unify in the high pitch range. Modal voice, breathy voice, untrained falsetto, and high adduction falsetto were also adopted in different proportions by two young female roles in Kunqu Opera [16,29]. More modal voice and the use of high adduc- tion modal voice made the voice of the YM role manlier. However, YM’s voice was much gentler than those of the colorful face role and the old man role, since they used pressed voice [28].

3.4. SPL Analysis

The medians of SPL were different between the phonation types and fos, as shown in Figure8. The lowest SPL was found in breathy voice for each singer and condition, and the highest was observed in high adduction falsetto. In breathy voices, EGG waveforms with a single peak (Figure5f,h) showed a low SPL, while EGG waveforms with double peaks (Figure5g) showed a higher SPL, which was close to the SPL of the singer’s modal voice. The SPL of untrained falsetto was lower than that of high adduction falsetto. Except for the SPL of YM20s singing, the SPL of untrained falsetto was higher than breathy voice in the proximate pitch range. For YM1, the SPL of high adduction modal voice was higher than that of modal voice. However, it was not true for YM2. From the above comparison, the Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 12

Appl. Sci. 2021, 11, 3930 9 of 12

SPL was related to the phonation types, and contact or adduction increased the SPL Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 12 in most cases. Between conditions, the median SPL of breathy voice was larger in singing. (a) (b) For other phonation types, in most cases, the median SPL was larger in stage speech. Figure 7. Three-dimensional diagrams of the clustering centroids: (a) the parameters of YM1, and (b) the parameters of YM2. m: modal; M: high adduction modal; B: breathy; f: untrained falsetto; F: high adduction falsetto.

3.4. SPL Analysis

The medians of SPL were different between the phonation types and fos, as shown in Figure 8. The lowest SPL was found in breathy voice for each singer and condition, and the highest was observed in high adduction falsetto. In breathy voices, EGG waveforms with a single peak (Figure 5f,h) showed a low SPL, while EGG waveforms with double peaks (Fig- ure 5g) showed a higher SPL, which was close to the SPL of the singer’s modal voice. The SPL of untrained falsetto was lower than that of high adduction falsetto. Except for the SPL of YM2′s singing, the SPL of untrained falsetto was higher than breathy voice in the proxi- mate pitch range. For YM1, the SPL of high adduction modal voice was higher than that of modal voice. However, it was not true for YM2. From the above comparison, the SPL was (arelated) to the phonation types, and extra contact or adduction(b) increased the SPL in most cases. Between conditions, the median SPL of breathy voice was larger in singing. For other FigureFigure 7.7.Three-dimensional Three-dimensional diagramsdiagrams ofof thethe clusteringclustering centroids:centroids: ((aa)) thethe parametersparameters ofof YM1,YM1, andand ((bb)) thethe parametersparameters ofof phonation types, in most cases, the median SPL was larger in stage speech. YM2.YM2. m:m: modal;modal; M:M: highhigh adductionadduction modal;modal; B:B: breathy;breathy; f:f: untraineduntrained falsetto;falsetto; F:F: highhigh adductionadduction falsetto.falsetto.

3.4. SPL Analysis

The medians of SPL were different between the phonation types and fos, as shown in Figure 8. The lowest SPL was found in breathy voice for each singer and condition, and the highest was observed in high adduction falsetto. In breathy voices, EGG waveforms with a single peak (Figure 5f,h) showed a low SPL, while EGG waveforms with double peaks (Fig- ure 5g) showed a higher SPL, which was close to the SPL of the singer’s modal voice. The SPL of untrained falsetto was lower than that of high adduction falsetto. Except for the SPL of YM2′s singing, the SPL of untrained falsetto was higher than breathy voice in the proxi- mate pitch range. For YM1, the SPL of high adduction modal voice was higher than that of modal voice. However, it was not true for YM2. From the above comparison, the SPL was related to the phonation types, and extra contact or adduction increased the SPL in most cases. Between conditions, the median SPL of breathy voice was larger in singing. For other phonation types, in most cases, the median SPL was larger in stage speech.

(a) (b)

FigureFigure 8.8. The relationship betweenbetween ffoos and SPL medians for each cluster: (a)) the parameters ofof YM1,YM1, andand ((b)) thethe parametersparameters ofof YM2.YM2. m: modal; M: high adduction modal; B: breathy; f:f: untrained falsetto; F:F: highhigh adductionadduction falsetto.falsetto. TheThe blueblue marksmarks representrepresent singing singing and and the the red red ones ones represent represent the the stage stage speech. speech.

Taking all SPL and fo data into consideration, for most phonation types, the SPLs of singing showed more concentrated distributions (which corresponded to smaller interquar- tile range) than those of stage speech. Pearson tests were done between the fo and SPL values. R2 was 0.41, 0.44, 0.30, and 0.66 for YM10s and YM20s singing and stage speech, respectively. The wide distribution range of fo in each pitch range resulted in the low R2 values. It also led to the observation of multiple phonation types in the same pitch range, such as YM1_S_4 and 5, which showed the same fo but contrasting SPLs.

4. Discussion

Several phonation types were reported in this research. The determinations on the (a) (b) natures of the phonation types were based on three considerations. Firstly, the reference Figure 8. The relationship betweenEGG signal fos and was SPL selected medians fromfor each each cluster: singer’s (a) the reading, parameters since of theYM1, parameters and (b) the ofparameters modal voice of YM2. m: modal; M: high adduction modal; B: breathy; f: untrained falsetto; F: high adduction falsetto. The blue marks represent singing and the red ones represent the stage speech. Appl. Sci. 2021, 11, 3930 10 of 12

were different between singers. YM1 showed a higher CQ than YM2 in reading. The CQ of YM10s stage speech was lower than that of his own reading, but similar to the CQ of YM20s reading (see Figure3). However, YM1 0s stage speech sounded different from YM20s reading. Thus, setting the singer’s own reading as the reference was important. Secondly, the CQ and SQ of certain phonation types varied with the fo; thus, the situation that the vocal folds were elongated as the fo was rising should be considered while judging the phonation types. Previous research showed that the CQ had a positive correlation with the fo in modal voice, and the SQ had a negative one [5]. The comparison of parameters need to be confined to the same pitch range. Thirdly, in the case of special phonation types, not only the EGG parameters, but also the shapes of the EGG signals and the spectrum characteristics should be taken into consideration. The EGG features can be explained with the geometric and kinematic parameters. The waveforms in Figure5 can be explained by the combinations of four features, which are the pulse widening that occurred from adduction, the peak skewing caused by increased convergence in the glottis, the skirt bulging linked to medial surface bulging on the vocal folds, and the shirt ramping produced by vertical phasing [20]. Then, the waveform that showed a similar CQ can be explained to have different vocal fold vibration modes, such as the waveform of Figure5b,h, Figure5e,f. In addition, the SPL, instead of the amplitude of EGG signal, could be a supplement to the contact or adduction degree of the vocal folds. It could be used in the analysis of EGG signals with low signal-to-noise ratios [12]. Some double-peak EGG signals occurred in YM’s singing and stage speech, such as those in Figure5g. Double-peak EGGs were also reported in the singing and stage speech of young woman and young girl roles [16,29]. The second peak in their EGG signals appeared in the contacting phase, was conspicuous, and visibly influenced the values of CQ and SQ. On the contrary, the second peak of YM’s EGG signal was easily ignored. Even so, the second peak corresponded to the vibration of some part of the vocal folds. It was necessary to point it out. However, neither the criterion-level method [23] nor the differential of EGG signal method [21,22] can solve this problem. New methods for double-peak EGG signal need to be introduced. One possible way is using the contact quotient by integration [32]. It could avoid the abrupt change of CQ when the second peak showed a similar level with the criterion, and the neglect of the second peak when its level was lower than the criterion. Another possible way is separating the second peak from the signal on the basis of the phase diversity. This was achieved by simultaneously analyzing multiple cycles of signals instead of analyzing one cycle at a time. Then, the higher CQ only corresponded to the more pressed voice, but not to the result of the second peak of the breathy voice. The phonation types used by YM singers were multitudinous and also observed in other types of singing voices. Above the passaggio, the voice had more high-frequency energy and a larger CQ than classic falsetto. It was more like the falsetto of a trained Western male singer [24,33]. Thus, the traditional term falsetto was loose. In and below the pitch range of passaggio, the phonation types were dramatically different from the term “modal voice” in the traditional singing theory of Kunqu Opera [1]. The singers varied phonation types not only with pitch, but also with conditions and personages. On the one hand, the extensive use of high adduction modal voice, breathy voice, and untrained falsetto made the voice timbre rich and various. On the other hand, the usage environment of each phonation type needs to be studied further. Some phonation types were found in similar pitch range, such as the untrained falsetto and the breathy voice in YM20s singing, as can be seen in Figure8. The position of a word in a sentence and the of the sentence might be the reason for the adoption of a phonation type. Similar EGG signals of some phonation types can also be found in the singing voice of tenor. High adduction modal voice and high adduction falsetto showed a similar CQ and SQ with mezza voce and falsetto of tenor, respectively [33], though high adduction modal voice was used in lower pitch range than mezza voce. Appl. Sci. 2021, 11, 3930 11 of 12

5. Conclusions The present study explored the phonation types used by YM roles in Kunqu Opera on the basis of the EGG parameters and the characteristics of EGG waveforms. The results showed that it is inaccurate to describe a YM role’s voice in the traditional terms modal voice and falsetto. The acoustic effects of YM role’s singing and stage speech were formed by complicated patterns of phonation types, which were breathy voice, modal voice, high adduction modal voice, untrained falsetto, and high adduction falsetto. The phonation types of YM were more similar to young female roles than to the other male roles in Kunqu Opera. This study showed the complexity of analyzing the EGG signals of artistic voices. On the one hand, new method need to be introduced to solve the one-to-many problem between the parameters and the waveforms. On the other hand, the contributing factor of the second peak in the EGG waveform needs to be backed up by the other research that uses visible techniques, such as the laryngeal videostroboscopy, which can improve the credibility of the present conclusions and form a paradigm for double-peak EGG signal analyzing.

Author Contributions: Conceptualization, L.D.; methodology, L.D. and J.K.; software, J.K.; investiga- tion, L.D.; resources, L.D. and J.K.; Writing—Original draft preparation, L.D.; Writing—Review and Editing, L.D.; funding acquisition, L.D. All authors have read and agreed to the published version of the manuscript. Funding: This research was funded by the National Office for Philosophy and Social Sciences, grant number 16CYY047. Institutional Review Board Statement: Not applicable. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study. Data Availability Statement: Not applicable. Acknowledgments: We are grateful to the voice experts for their gentle participation in this investi- gation. Thanks to Bradley Phillips for the language modification. Conflicts of Interest: The authors declare no conflict of interest.

References 1. Wu, X. Dictionary of Chinese Kunqu Opera; Nanjing University Press: Nanjing, China, 2002; pp. 565–568. 2. Dong, L.; Kong, J.; Sundberg, J. Long-term-average spectrum characteristics of Kunqu opera singers’ speaking, singing and stage speech. Logoped. Phoniatr. Vocol. 2014, 39, 72–80. [CrossRef] 3. Tian, S. Research on Kunqu Opera Singing Art; Zhejiang University Press: Hangzhou, China, 2013; pp. 6–18. 4. Laver, J. The Phonetic Description of Voice Quality; Cambridge University Press: London, UK, 1980. 5. Kong, J. On Language Phonation; Central Nationalities University Press: Beijing, China, 2001; pp. 159–189. 6. Fant, G. Speech Acoustics and ; Springer Netherlands: Berlin, Germany, 2005. 7. Sundberg, J. The Science of the Singing Voice; Northern Illinois University: DeKalb, IL, USA, 1987. 8. Welch, G.F.; Sergeant, D.C.; Maccurtain, F. Xeroradiographic-electrolaryngographic analysis of male vocal registers. J. Voice 1989, 3, 244–256. [CrossRef] 9. Rothenberg, M. Some relations between glottal air flow and vocal fold contact area. In Proceedings of the Conference on the Assessment of Vocal Pathology; American Speech and Hearing Association, Ed.; American Speech and Hearing Association: Rockville, MD, USA, 1981; pp. 88–96. 10. Baer, T.; Löfqvist, A.; McGarr, N.S. Laryngeal vibrations: A comparison between high-speed filming and glottographic techniques. J. Acoust. Soc. Am. 1983, 74, 1304–1308. [CrossRef][PubMed] 11. Childers, D.G.; Naik, J.M.; Larar, J.N.; Krishnamurthy, A.K.; Moore, G.P. Electroglottography, speech, and ultra-high speed cinematography. In Vocal Fold Physiology and Biophysics of Voice; Titze, I.R., Scherer, R., Eds.; Denver Center of Performing Arts: Denver, CO, USA, 1983; pp. 202–220. 12. Herbst, C.T.; Schutte, H.K.; Bowling, D.L.; Svec, J.G. Comparing Chalk With Cheese—The EGG Contact Quotient Is Only a Limited Surrogate of the Closed Quotient. J. Voice 2017, 31, 401–409. [CrossRef] 13. Baken, R.J.; Orlikoff, R.F. Clinical Measurement of Speaking and Voice; Delmar Learning: New York, NY, USA, 2000. 14. Yoshinaga, I.; Kong, J. Laryngeal vibratory behavior in traditional Noh singing. Tsinghua Sci. Technol. 2012, 17, 94–103. [CrossRef] Appl. Sci. 2021, 11, 3930 12 of 12

15. Esling, J.H. A laryngographic investigation of phonation type and laryngeal configurations. Work. Pap. Linguist. Circle Univ. Vic. 1983, 14–36. 16. Dong, L.; Kong, J. Voice characteristics of young Girl Role in Kunqu Opera. J. Voice 2019, 33, e945.e19–e945.e25. [CrossRef] [PubMed] 17. Herbst, C.; Ternström, S. A comparison of different methods to measure the EGG contact quotient. Logoped. Phoniatr. Vocol. 2006, 31, 126–138. [CrossRef] 18. Timcke, R.; Von, L.H.; Moore, P. Laryngeal vibrations: Measurements of the glottic wave: Part II–Physiologic variations. Ama. Arch. Otolaryngol. 1959, 68, 438–444. [CrossRef] 19. Fant, G. Studies of the voice source. Acta Acust. 1988, 13, 135–146. 20. Titze, I.R. Interpretation of the electroglottographic signal. J. Voice 1990, 4, 1–9. [CrossRef] 21. Childers, D.G.; Hicks, D.M.; Moore, G.P.; Alsaka, Y.A. A model for vocal fold vibratory motion, contact area, and the electroglot- togram. J. Acoust. Soc. Am. 1986, 80, 1309–1320. [CrossRef] 22. Henrich, N.; D’Alessandro, C.; Doval, B.; Castellengo, M. On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation. J. Acoust. Soc. Am. 2004, 115, 1321–1332. [CrossRef][PubMed] 23. Rothenberg, M.; Mahshie, J.J. Monitoring vocal fold abduction through vocal fold contact area. J. Speech Hear. Res. 1988, 31, 338–351. [CrossRef][PubMed] 24. Howard, D.M.; Lindsey, G.A.; Allen, B. Toward the quantification of vocal efficiency. J. Voice 1990, 4, 205–212. [CrossRef] 25. Howard, D.M. Variation of electrolaryngographically derived closed quotient for trained and untrained adult female singers. J. Voice 1995, 9, 163–172. [CrossRef] 26. Dong, L.; Sundberg, J.; Kong, J. Loudness and pitch of Kunqu Opera. J. Voice 2014, 28, 14–19. [CrossRef][PubMed] 27. Kong, J. VoiceLab 1.0; National Copyright Administration of the People’s Republic of China: Beijing, China, 2012. 28. Dong, L.; Sundberg, J.; Kong, J. Formant and voice source properties in two male Kunqu Opera roles: A pilot study. Folia. Phoniatr. Logo. 2014, 65, 294–302. [CrossRef] 29. Dong, L. Voice characteristics of young woman role in Kunqu Opera. Essays Linguist. 2016, 54, 292–304. 30. Herbst, C.T.; Howard, D.; Schlömicher-Thier, J. Using electroglottographic real-time feedback to control posterior glottal adduction during phonation. J. Voice 2010, 24, 72–85. [CrossRef] 31. Kong, J. Laryngeal Dynamics and Physiological Models; Peking University Press: Beijing, China, 2007. 32. Ternström, S. Normalized time-domain parameters for electroglottographic waveforms. J. Acoust. Soc. Am. 2019, 146, EL65–EL70. [CrossRef][PubMed] 33. Mayr, A. Investigating the voce faringea: Physiological and acoustic characteristics of the bel canto tenor’s forgotten singing practice. J. Voice 2017, 31, 255.e13–255.e23. [CrossRef][PubMed]