Evaluation of Segmentation Approaches and Constriction Degree Correlates for Spirant Approximant Consonants

Mauricio A. Figueroa Candia, Bronwen G. Evans
Speech, Hearing and Phonetic Sciences, UCL (University College London)
[email protected], [email protected]

ABSTRACT

The segmentation of Spanish spirant approximant consonants poses several methodological challenges given the gradual transition from these consonants to neighbouring segments. Most studies to date employ manual segmentation approaches, although these methods are highly subjective and time-consuming. Alternative automated approaches have used the minimum intensity in the consonant and maxima from the surrounding segments as landmarks for relative intensity measurements.

Multinomial Logistic Regression (MLR) analyses were used to assess whether acoustic measurements obtained from manual and automated segmentation approaches are able to predict the level of a known categorical outcome variable (phoneme category). A third analysis was included to explore the predictive capabilities of data from several constriction degree correlates, based on relative intensity or spectral energy measurements.

Results showed a slight advantage for the automated segmentation method. As for constriction degree correlates, some measurements (Intensity Ratio and Intensity Difference A) were unable to fully predict the outcome categorical variable.

Keywords: segmentation, approximants, constriction correlates, MLR

1. INTRODUCTION

Spirant approximants [β̞], [ð̞] or [ɣ̞] are found in complementary distribution with other allophones of /b, d̪, g/ in all dialects of Spanish [1,11,15]. Although there is considerable variation between dialects (see for example [3,6]), a general property of these consonants is that they display a gradual transition to neighbouring segments, which makes segmentation difficult [18] and thus hampers the extraction of acoustic data.

Despite this methodological problem, several studies to date provide acoustic measurements for these consonants, normally resorting to manual segmentation (e.g., [9,10]). While it is not always clear how manual segmentation has been executed (e.g., [4,7]), most previous studies report using auditory and visual cues from waveforms, spectrograms, intensity and formant contours to identify the most likely location of the approximant consonant [4,6,9].

A somewhat different approach consists in the automated identification of the minimum intensity in the consonant and of maximum intensity values in the surrounding segments [4,8,13]. Several intensity measurements based on these intensity landmarks have been calculated, usually as correlates of the consonant's degree of constriction (e.g., [8,9,14]).

So far, with the exception of intensity-related measurements, there have been no attempts to extract other types of acoustic measurements from points of maximum and minimum intensity in consonants such as [β̞], [ð̞] or [ɣ̞] and their surrounding segments. Yet, if the intensity minimum in a spirant approximant is assumed to be the point where the consonant's properties are best exemplified, additional acoustic measurements can be extracted at this point, thus circumventing the subjectivity associated with manual segmentation. Along the same lines, no comparisons of datasets originating from different segmentation methods have been carried out for spirant approximants yet.

In the case of acoustic constriction degree correlates, data for intensity ratios, intensity differences and maximum intensity change velocity have been correlated with electromagnetic articulometry data for the constriction of /b/ using linear regression. Results showed that intensity ratios without low-pass filtering were best correlated with the articulatory domain [14]. The effect of the preceding segment on the constriction degree of /d̪/ has also been evaluated, using intensity differences, spectral tilt and maximum intensity change velocity, via mixed-effects models in regression analyses. The results showed that maximum velocity was most likely the best predictor [8].

This study uses Multinomial Logistic Regression (MLR) to assess how well acoustic measurements obtained through two different segmentation approaches (manual and automated) are able to predict the levels of the categorical outcome variable phoneme category (levels: /b/, /d̪/ and /g/) [5]. The choice of this particular outcome variable was made under the assumption that the acoustic information that distinguishes spirant approximants originating from these three phonological units is present in the measurements undertaken, provided that the segmentation method is adequate.

The same approach using MLR will be employed to assess the predictive capabilities of several acoustic constriction degree correlates, most of them relative intensity measurements.

2. METHODS AND PROCEDURES

2.1. Measurements

Acoustic measurements were extracted from a Chilean Spanish corpus containing 3 speech styles from 10 native male and female adult speakers (n = 4948: /g/ 33%, /b/ 36%, /d̪/ 31%). All data were extracted, and some processes automated, using Praat [2].

The extraction of acoustic parameters via manual segmentation was aided by visual cues from the waveform and spectrogram, and from formant and intensity curves. These cues were used to identify likely boundaries inside each consonant's transitions to neighbouring segments. Intensity, F1 and F2 were measured as mean values from the inner 50% of each token. For the automated approach, intensity, F1 and F2 values were extracted from the points of minimum intensity inside the consonant.

Additionally, the following acoustic constriction degree correlates were calculated:

• Intensity Ratio (IntRatio): Consonant minimum intensity divided by the following segment's maximum intensity [3,14].
• Intensity Difference A (IntDiff_A): Consonant minimum intensity subtracted from the following segment's maximum intensity [8,14].
• Intensity Difference B (IntDiff_B): Consonant minimum intensity subtracted from the preceding segment's maximum intensity [12].
• Maximum Velocity (MaxVel): Maximum intensity from first differences (0.001 ms interval) between consonant intensity minimum and following segment's intensity maximum [8,9,14]. For first differences at n intensity samples I_i:

$$\Delta I_i = I_i - I_{i-1} \tag{1}$$

Maximum Velocity is defined as:

$$\mathrm{MaxVel} = \max_{i=1,\dots,n} \Delta I_i \tag{2}$$

• Minimum Velocity (MinVel): Minimum intensity from first differences (0.001 ms interval) between consonant intensity minimum and following segment's intensity maximum [9] (see Equation 3).

$$\mathrm{MinVel} = \min_{i=1,\dots,n} \Delta I_i \tag{3}$$

• Spectral Tilt (SpecTilt): Spectral energy difference between a low 50-500 Hz band and a high 500-5000 Hz band [8,17], measured in this case between the consonant intensity minimum and the following intensity maximum (see Equation 4).

$$\mathrm{lfb}(50\text{-}500\,\mathrm{Hz}) - \mathrm{hfb}(500\text{-}5000\,\mathrm{Hz}) \tag{4}$$

2.2. Analyses

Three MLR analyses were conducted in R [16] (package: mlogit). The reference level was defined as /g/ in order to obtain more conservative results, given that /b/ and /d̪/ have been shown to be acoustically more similar to each other [6].

The analysis for the manual approach data included intensity, F1 and F2 as main factors. The same main factors were included for the dataset generated via the automated approach. Finally, the analysis for the constriction degree correlates included IntRatio, IntDiff_A, IntDiff_B, MaxVel, MinVel and SpecTilt as main factors.

No multicollinearity was found for the analyses pertaining to the segmentation approaches; however, high correlations were found for some constriction degree correlates (IntRatio against IntDiff_A: r(4946) = -.993; IntRatio against MaxVel: r(4946) = -.859; MaxVel against IntDiff_A: r(4946) = .851).

3. RESULTS

3.1. Manual approach

The results of the MLR analysis for the manual segmentation approach dataset show that intensity, F1 and F2 are able to predict /b/ instead of the reference level /g/ with statistical significance (see Table 1). Only F1 and F2 are able to predict /d̪/ instead of the reference level with statistical significance.

| | /b/ Estimate (SE) | /b/ 95% CI lower (2.5%) | /b/ Odds ratio | /b/ 95% CI upper (97.5%) | /d/ Estimate (SE) | /d/ 95% CI lower (2.5%) | /d/ Odds ratio | /d/ 95% CI upper (97.5%) |
|---|---|---|---|---|---|---|---|---|
| Intercept | -2.24(0.33)*** | 0.06 | 0.11 | 0.20 | -2.22(0.33)*** | 1.00 | 1.00 | 1.00 |
| Intensity | 0.03(0.01)*** | 0.06 | 0.11 | 0.21 | 0.01(0.01) | 1.00 | 1.00 | 1.00 |
| F1 | 0.00(0.00)*** | 1.02 | 1.03 | 1.04 | 0.00(0.00)*** | 1.00 | 1.00 | 1.00 |
| F2 | 0.00(0.00)* | 1.00 | 1.01 | 1.02 | 0.00(0.00)*** | 1.00 | 1.00 | 1.00 |

Table 1: MLR results for the manual segmentation approach. Non-significant main effects have been highlighted. Significance codes: ***, p < 0.001; **, p < 0.01; *, p < 0.05. Log-Likelihood (unexplained variability): -5334.9; McFadden R2 (effect size): 0.017299; Likelihood ratio test (significant variability explained by model): 187.83, p < 0.001.

3.2. Automated approach

The MLR results for the dataset generated through the automated approach (see Table 2) showed that all the main factors were able to predict both the /b/ and /d̪/ levels instead of the reference level, with statistical significance.

| | /b/ Estimate (SE) | /b/ 95% CI lower (2.5%) | /b/ Odds ratio | /b/ 95% CI upper (97.5%) | /d/ Estimate (SE) | /d/ 95% CI lower (2.5%) | /d/ Odds ratio | /d/ 95% CI upper (97.5%) |
|---|---|---|---|---|---|---|---|---|
| Intercept | -2.09(0.29)*** | 0.07 | 0.12 | 0.22 | -1.93(0.29)*** | 0.08 | 0.15 | 0.26 |
| Intensity | 0.03(0.01)*** | 1.02 | 1.03 | 1.04 | 0.02(0.01)** | 1.00 | 1.02 | 1.03 |
| F1 | 0.00(0.00)*** | 1.00 | 1.00 | 1.00 | 0.00(0.00)*** | 1.00 | 1.00 | 1.00 |
| F2 | 0.00(0.00)* | 1.00 | 1.00 | 1.00 | 0.00(0.00)*** | 1.00 | 1.00 | 1.00 |

Table 2: MLR results for the automated segmentation approach. Significance codes: ***, p < 0.001; **, p < 0.01; *, p < 0.05. Log-Likelihood: -5348.2; McFadden R2: 0.014856; Likelihood ratio test: 161.3, p < 0.001.

| | /b/ Estimate (SE) | /b/ 95% CI lower (2.5%) | /b/ Odds ratio | /b/ 95% CI upper (97.5%) | /d/ Estimate (SE) | /d/ 95% CI lower (2.5%) | /d/ Odds ratio | /d/ 95% CI upper (97.5%) |
|---|---|---|---|---|---|---|---|---|
| Intercept | 5.35(3.19). | | | | | | | |
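As an illustration of the intensity-based constriction degree correlates defined in Section 2.1, the following is a minimal sketch in Python, not the Praat procedure used in the study. It assumes that each segment's intensity contour is already available as a list of dB values sampled at a constant interval; the names `Correlates` and `constriction_correlates` are hypothetical. Velocities are returned in dB per sample, so converting them to a rate per time unit would require dividing by the sampling interval. Spectral Tilt is omitted because it needs band energies rather than an intensity contour.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Correlates:
    int_ratio: float   # consonant minimum / following maximum
    int_diff_a: float  # following maximum - consonant minimum
    int_diff_b: float  # preceding maximum - consonant minimum
    max_vel: float     # largest first difference (Equation 2)
    min_vel: float     # smallest first difference (Equation 3)


def constriction_correlates(
    preceding: Sequence[float],
    consonant: Sequence[float],
    following: Sequence[float],
) -> Correlates:
    """Compute intensity-based correlates from three intensity contours (dB).

    Assumption: the three sequences are contiguous in time and share one
    constant sampling interval.
    """
    if min(len(preceding), len(consonant), len(following)) == 0:
        raise ValueError("all three intensity contours must be non-empty")

    # Landmarks: minimum inside the consonant, maxima in the neighbours.
    c_idx = min(range(len(consonant)), key=lambda i: consonant[i])
    f_idx = max(range(len(following)), key=lambda i: following[i])
    c_min = consonant[c_idx]
    f_max = following[f_idx]
    p_max = max(preceding)

    # Samples from the consonant minimum up to the following maximum.
    span = list(consonant[c_idx:]) + list(following[: f_idx + 1])

    # First differences, Equation 1: delta_i = I_i - I_(i-1).
    deltas = [span[i] - span[i - 1] for i in range(1, len(span))]

    return Correlates(
        int_ratio=c_min / f_max,
        int_diff_a=f_max - c_min,
        int_diff_b=p_max - c_min,
        max_vel=max(deltas),
        min_vel=min(deltas),
    )


# Toy contours (invented values, for illustration only).
example = constriction_correlates(
    preceding=[60.0, 62.0, 61.0],
    consonant=[58.0, 55.0, 54.0, 56.0],
    following=[59.0, 63.0, 62.0],
)
print(example)
```

In this sketch the span for the velocity measures starts at the consonant's intensity minimum and ends at the following segment's intensity maximum, which is one reading of the definitions above; a different windowing choice would change MaxVel and MinVel but not the ratio or the two differences.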
