ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

By MOHAMED ALI AL-KHAIRY

A DISSERTATION PRESENTED TO THE GRADUATE SCHOOL OF THE UNIVERSITY OF FLORIDA IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

UNIVERSITY OF FLORIDA

2005

Copyright 2005 by Mohamed Ali Al-Khairy

To my father, who did not live to see the fruit of his work.

ACKNOWLEDGMENTS

After finishing writing this dissertation on a rainy summer night, I decided not to bother with a lengthy acknowledgment section. After all, I was the one who wrote it. Well, leaving ego and false pride aside, this work could not have been done without the help of many.

First and foremost, thanks go to The Almighty GOD for His guidance and blessings, without which graduate school would have been a worse nightmare.

My gratitude goes also to my wonderful supervisor and mentor, Dr. Ratree Wayland, whose dedication to her students, teaching, and research is beyond the highest expectations. Without her help, guidance, constant encouragement, and support, this work would not have been possible. Members of my supervisory committee (Dr. Gillian Lord and Dr. Caroline Wiltshire from Linguistics, and Dr. Rahul Shrivastav from Communication Sciences and Disorders) were of the utmost help in the process of finishing this work.

My stay in Gainesville introduced me to many people. Most were nice and cheerful, and some one could definitely live without. I will skip the latter group to save space. However, among the nice and wonderful people I got to know during this journey are the students, faculty, and staff of the Linguistics Department, who were of tremendous help both personally and academically. My special thanks and gratitude go also to Dr. Aida Bamia and Dr. Haig Der-Houssikian from the Department of African and Asian Languages and Literatures. Their supervision, friendship, and encouragement went far beyond the responsibilities of mentors to those of parents. For that I will be eternally grateful.
I also would like to thank my study partners, Yousef Al-Dlaigan, who was unjustly forced to change his career, and AbdulWaheed Al-Saadi, who was brave enough to finish his Ph.D. I regret to say that I am still unclear about the process of gene transformation in strawberry and citrus. I hope, though, that you learned from me how to read a spectrogram. I tried my best.

Now is the fun part: thanking my friends in the lab. Listed in chronological order of their liberation from school are Rebecca Hill, Jodi Bray, Philip Monahan, Sang-Hee Yeon, HeeNam Park, Victor Prieto, and Manjula Shinge. Yet to feel the wonderful breeze outside the Turlington basement are my great friends Andrea Dallas, Bin Li, and Priyankoo Sarmah. I thank them for all the cheerful moments and laughs we shared at the University of Florida. Although life may take us along different routes, our friendship is eternal.

Although they are in a different time zone, I thank my friends on the west coast and across the Atlantic for their great advice and emotional support, without which long nights would definitely have been longer. I will send them my phone bills later. I am sure that I left out some names; to those unintentionally missed I extend my apologies and sincere thanks.

The acoustic analyses in this dissertation were carried out in a timely manner thanks to the existence of the wonderful free PRAAT program and the abundant help and suggestions from its authors and the PRAAT user community. Also, I was extremely fortunate to escape the nightmare of typesetting with the popular-but-not-really-friendly commercial software. I thank Ron Smith for making his ufthesis LaTeX class freely available.

Across oceans and continents, the prayers and encouragement of my parents and siblings were a driving force and endless motivation to finish and join them back home. Although God had other plans for my father and older brother, I am sure they are proud of what their prayers from high above have accomplished.

Finally, words fall short in describing my gratitude toward my wife, Nadaa, and my kids, Faisal and Farah. They have suffered through this dissertation almost as much as I have; maybe even more. Through the many nights I spent at the lab, they have shown endless patience, love, and understanding. I truly cannot imagine having gone through this process without such amazing love and support.

Parts of this work were supported by a McLaughlin Dissertation Fellowship from the College of Liberal Arts and Sciences, University of Florida.

TABLE OF CONTENTS

ACKNOWLEDGMENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT

CHAPTER
1 INTRODUCTION
2 LITERATURE REVIEW
2.1 Introduction
2.2 Fricative Production
2.3 Acoustic Cues to Fricative Place of Articulation
2.3.1 Amplitude Cues
2.3.2 Duration Cues
2.3.3 Spectral Cues
2.3.4 Formant Transition Cues
2.4 Studies of Arabic Fricatives
3 METHODOLOGY
3.1 Data Collection
3.1.1 Participants
3.1.2 Materials
3.1.3 Recording
3.2 Data Analysis
3.2.1 Segmentation of Speech
3.2.2 Acoustic Analyses
3.3 Statistical Analyses
4 AMPLITUDE AND DURATION
4.1 Amplitude Measurements
4.1.1 Normalized Frication Noise RMS Amplitude
4.1.2 Relative Amplitude of Frication Noise
4.2 Temporal Measurements
4.2.1 Absolute Duration of Frication Noise
4.2.2 Normalized Duration of Frication Noise
5 SPECTRAL MEASUREMENTS
5.1 Spectral Peak Location
5.2 Spectral Moments
5.2.1 Spectral Mean
5.2.2 Spectral Variance
5.2.3 Spectral Skewness
5.2.4 Spectral Kurtosis
6 FORMANT TRANSITION
6.1 Second Formant (F2) at Transition
6.2 Locus Equation
7 STATISTICAL CLASSIFICATION OF FRICATIVES
7.1 Discriminant Function Analysis
7.2 Classification Accuracy of DFA
7.3 Classification Power of Predictors
7.4 Classification Results
8 GENERAL DISCUSSION
8.1 Temporal Measurement
8.2 Amplitude Measurement
8.3 Spectral Measurement
8.4 Transition Information
8.5 Discriminant Analysis
8.6 Conclusion

REFERENCES
BIOGRAPHICAL SKETCH

LIST OF TABLES

1–1 Arabic Fricatives
4–1 Relative Amplitude: Context
4–2 Mean Relative Amplitude
5–1 Spectral Peak Location
5–2 Spectral Moments
5–3 Spectral Skewness: Significant Contrasts for Voiced Fricatives
5–4 Spectral Skewness: Significant Contrasts for Voiceless Fricatives
6–1 Second Formant at Transition
6–2 Locus Equation: Slope and y-intercept
7–1 Prior Probabilities for Group Membership
7–2 Variance Accounted for by DFA Functions
7–3 Overall Voiceless Classification
7–4 Cross-Validated Classification Results
7–5 Overall Voiced Classification
7–6 Cross-Validated Voiced Classification
7–7 Overall Voiceless Classification
7–8 Cross-Validated Voiceless Classification

LIST OF FIGURES

3–1 Example of Segmentation
3–2 Segmentation of /ʕ/
3–3 Hamming vs. Kaiser Window
3–4 Duration
4–1 Frication Noise RMS Amplitude
4–2 Frication Noise RMS Amplitude: Vowel Context
4–3 Frication Noise RMS Amplitude: Place and Voicing
4–4 Relative Amplitude
4–5 Relative Amplitude: Place and Voicing
4–6 Relative Amplitude: Place and Short Vowels
4–7 Relative Amplitude: Place and Long Vowels
4–8 Relative Amplitude: Voicing and Short Vowels
4–9 Relative Amplitude: Voicing and Long Vowels
4–10 Fricative Duration: Place and Voicing
4–11 Fricative Duration: Place and Voicing Interactions
4–12 Fricative Duration: Vowel Context
4–13 Normalized Frication Noise: Place and Voicing
4–14 Normalized Fricative Duration: Place and Voicing Interactions
4–15 Normalized Frication Noise: Vowel Context
5–1 Spectral Peak Location: Place and Voicing
5–2 Spectral Peak Location: Place × Voicing Interaction
5–3 Spectral Peak Location: Place × Vowels
5–4 Spectral Peak Location: Place × Short Vowel Interaction
5–5 Spectral Peak Location: Place × Long Vowel Interaction
5–6 Spectral Mean: Place and Voicing
5–7 Spectral Mean:
5–8 Spectral Mean: Place × Voicing Interaction
5–9 Spectral Mean: Vowel
5–10 Spectral Variance: Place and Voicing
5–11 Spectral Variance: Place × Voicing Interaction
5–12 Spectral Variance: Vowel
5–13 Spectral Skewness: Place and Voicing
5–14 Spectral Skewness: Voice
5–15 Spectral Skewness: Place × Voicing Interaction
5–16 Spectral Skewness: Vowel
5–17 Spectral Kurtosis: Place and Voicing
5–18 Spectral Kurtosis: Voicing
5–19 Spectral Kurtosis: Place × Voice Interaction
5–20 Spectral Kurtosis: Vowel
6–1 Second Formant: Place × Voicing Interaction
6–2 Second Formant: Vowel Context
6–3 Locus Equation
7–1 Discrimination Plane
7–2 Discrimination Plane by Voicing

Abstract of Dissertation Presented to the Graduate School of the University of Florida in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy

ACOUSTIC CHARACTERISTICS OF ARABIC FRICATIVES

By

Mohamed Ali Al-Khairy

August 2005

Chair: Ratree Wayland
Major Department: Linguistics

The acoustic characteristics of fricatives were investigated with the aim of finding invariant cues that classify fricatives into their places of articulation. However, such invariant cues are hard to identify because of the long-noticed problem of variability in the acoustic signal: both intrinsic and extrinsic sources of variability in the speech signal lead to a defective match between a signal and its percept. Nevertheless, this variability can be circumvented by using appropriate analysis methods. The 13 fricatives of Modern Standard Arabic (/f, θ, ð, ðˤ, s, sˤ, z, ʃ, χ, ʁ, ħ, ʕ, h/) were elicited from 8 adult male speakers in 6 vowel contexts (/i, i:, a, a:, u, u:/). The acoustic cues investigated included amplitude measurements (normalized and relative frication noise amplitude), spectral measurements (spectral peak location and spectral moments), temporal measurements (absolute and normalized frication noise duration), and formant information at the fricative-vowel transition (F2 at vowel onset and locus equations). For the most part, Arabic fricatives showed patterns similar to those reported for similar fricatives in other languages (e.g., English, Spanish, Portuguese). A discriminant function analysis showed that, among all the cues investigated, spectral mean, skewness, second formant at vowel onset, normalized RMS amplitude, relative amplitude, and spectral peak location contributed most to overall classification, with a success rate of 83.2%. When voicing was specified in the model, the correct classification rate increased to 92.9% for voiced and 93.5% for voiceless fricatives.

CHAPTER 1
INTRODUCTION

Since the early years of speech research, studies (using various models and methods) have focused on finding the properties that distinguish among naturally produced speech sounds. Many such studies investigated the properties of the acoustic signal through which sound is transmitted from speaker to hearer. However, the task is complicated by the long-noticed problem of variability in the acoustic signal, which results in a defective match between a signal and its percept (Liberman, Cooper, Shankweiler, and Studdert-Kennedy 1967). The production mechanism of speech sounds, particularly fricatives, involves intrinsic sources of variability arising from changes in the shape of the vocal tract and the rate of airflow (Strevens 1960; Tjaden and Turner 1997). Variability in the speech signal also arises from extrinsic sources, including speaker age (Pentz, Gilbert, and Zawadzki 1979), vocal tract size (Hughes and Halle 1956), speaking rate (Nittrouer 1995), and linguistic context (Tabain 2001). Variability in speech is also often the result of a combination of these factors. Notwithstanding the variability found in the speech signal, numerous studies (Stevens 1985; Behrens and Blumstein 1988a,b; Forrest, Weismer, Milenkovic, and Dougall 1988; Sussman, McCaffrey, and Matthews 1991; Hedrick and Ohde 1993; Jongman, Wayland, and Wong 2000; Abdelatty Ali, Van der Spiegel, and Mueller 2001; Nissen 2003) have found invariant cues in the speech signal when the appropriate analyses are carried out. Along this line of research, our study investigated the defining properties of fricative sounds as produced in Modern Standard Arabic (MSA).


We used Arabic fricatives for three equally important reasons. First, the articulatory space of Arabic fricatives spans most of the places of articulation in the vocal tract, starting from the lips and ending at the glottis. Second, unlike most of the languages used in acoustic studies of fricatives, Arabic has two features that serve a phonemic distinction: pharyngeal coarticulation and segment length. Specifically, a phonemic distinction exists between the plain fricatives /ð/ and /s/ and their pharyngealized counterparts /ðˤ/ and /sˤ/. Furthermore, although governed by some phonological distribution rules, consonant and vowel length in Arabic are phonemic. Third, studies on the acoustic characteristics of fricatives have been conducted predominantly with reference to English fricatives. Given the phonetic status of Arabic and the gap in the literature due to the lack of Arabic-related research, our study is theoretically and empirically important. Our findings will contribute generally to the way fricative production is viewed and specifically to the way languages differ in that respect. Further, such findings will aid speech synthesis and parsing software for the less-understood, yet important, Arabic language. As mentioned, both consonant and vowel length are phonemic in Arabic. However, to compare and contrast the performance of the cues used in our study with those reported in the literature for other languages, we examined only vowel-length variation. The inventory of Arabic fricatives is shown in Table 1–1. Arabic has 11 fricatives, with only 4 pairs in voicing contrast. In addition, the voiced dental and voiceless alveolar fricatives each have a pharyngealized counterpart. The voiced post-alveolar fricative /ʒ/ was excluded, since it was articulated in most of the elicited data as the affricate /dʒ/.
Studies of Standard Arabic and Arabic dialectology suggest that /ʒ/ is realized as /ʒ/, /dʒ/, /g/, or /j/, depending on the geographical region in which Arabic is spoken (Kaye 1972).

Table 1–1. Place of articulation of Arabic fricatives

            Labiodental  Dental  Alveolar  Post-alveolar  Uvular  Pharyngeal  Glottal
Voiceless   f            θ       s         ʃ              χ       ħ           h
Voiced      -            ð       z         -              ʁ       ʕ           -

Note: /ð/ and /s/ have pharyngealized counterparts, /ðˤ/ and /sˤ/.

Both local (static) and global (dynamic) cues have been shown to participate in the identification of (English) fricatives. Specifically, three main acoustic features have been examined in research aimed at distinguishing fricatives: the spectral properties of the frication noise, the relation between the frequency characteristics of the frication noise and those of the vowel, and the duration of the frication noise. Our study aimed to describe the acoustic characteristics of Arabic fricatives using many of the acoustic measurements used in related studies, with specific interest in finding cues that differentiate between plain and pharyngealized fricatives. Our study also aimed to determine whether phonemic differences in vowel length affect the acoustic cues measured. Our data were elicited from 8 adult male speakers (mean age = 20) who had no history of hearing or speech impairments and who had limited experience with English as a second language. The cues investigated were amplitude measurements (normalized and relative frication noise amplitude), spectral measurements (spectral peak location and spectral moments), temporal measurements (absolute and normalized frication noise duration), and formant information at the fricative-vowel transition (F2 at vowel onset and locus equations). Normalized amplitude is defined here as the ratio, expressed as a difference in dB, between the average RMS amplitude of three consecutive pitch periods at the point of maximum vowel amplitude and the RMS amplitude of the entire frication noise. Relative amplitude, on the other hand, is defined as the amplitude of the frication noise relative to the vowel amplitude measured in specific frequency regions. Spectral peak location relates the fricative place of articulation to the frequency location of the energy maximum in the frication noise.
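Several of the cues listed above can be computed directly from a digitized fricative-vowel token. The sketch below is a minimal, hypothetical illustration in plain Python (no signal-processing libraries), assuming that the frication noise and three vowel pitch periods have already been segmented into sample lists. It computes normalized RMS amplitude (assuming the vowel level is subtracted from the fricative level and that the three per-period dB values are averaged), a Hamming-windowed power spectrum via a naive DFT, spectral peak location, and the four spectral moments defined in the paragraph that follows (assuming power weighting and excess kurtosis, so that a normal distribution scores 0). All function names are illustrative and are not taken from PRAAT or any other tool.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def rms_db(samples: Sequence[float]) -> float:
    """RMS amplitude of a segment in dB re 1.0 (10*log10 of the mean square)."""
    mean_square = sum(x * x for x in samples) / len(samples)
    return 10.0 * math.log10(mean_square)


def normalized_amplitude_db(frication: Sequence[float],
                            vowel_periods: Sequence[Sequence[float]]) -> float:
    """Frication RMS (dB) minus the mean RMS (dB) of three vowel pitch periods.

    Assumption: the periods are taken at the point of maximum vowel amplitude,
    and their per-period dB values are averaged before subtraction.
    """
    if len(vowel_periods) != 3:
        raise ValueError("expected three consecutive pitch periods")
    vowel_db = sum(rms_db(p) for p in vowel_periods) / 3.0
    return rms_db(frication) - vowel_db


def power_spectrum(samples: Sequence[float],
                   sample_rate: float) -> Tuple[List[float], List[float]]:
    """Hamming-windowed power spectrum from 0 Hz to Nyquist.

    Uses a naive O(n^2) DFT for transparency; real analyses would use an FFT.
    """
    n = len(samples)
    windowed = [s * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
                for i, s in enumerate(samples)]
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(windowed))
        freqs.append(k * sample_rate / n)
        power.append(abs(acc) ** 2)
    return freqs, power


def spectral_peak_hz(freqs: Sequence[float], power: Sequence[float]) -> float:
    """Frequency of the spectral energy maximum."""
    return freqs[max(range(len(power)), key=lambda i: power[i])]


def spectral_moments(freqs: Sequence[float],
                     power: Sequence[float]) -> Tuple[float, float, float, float]:
    """Mean (Hz), variance (Hz^2), skewness, and excess kurtosis.

    The normalized power values are treated as a probability distribution
    over frequency (assumption: power, not magnitude or dB, weighting).
    """
    total = sum(power)
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs, weights))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs, weights))
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs, weights)) / var ** 1.5
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs, weights)) / var ** 2 - 3.0
    return mean, var, skew, kurt


# Hypothetical usage: a 1000 Hz tone sampled at 8000 Hz (64 samples).
tone = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
freqs, power = power_spectrum(tone, 8000)
print(spectral_peak_hz(freqs, power))  # 1000.0
```

With real frication noise, a positive skewness from `spectral_moments` would indicate energy concentrated in the lower frequencies with a tail toward the higher ones, matching the interpretation of skewness used in this study.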
Spectral moments analysis is a statistical approach that treats FFT spectra as probability distributions from which the first four moments (mean, variance, skewness, and kurtosis) are calculated. Spectral mean refers to the average frequency around which energy is concentrated, and variance to the spread of energy around that mean. Skewness, on the other hand, is a measure of spectral tilt, that is, of the asymmetry of the distribution: positive skewness indicates that energy is concentrated in the lower frequencies with a tail toward the higher frequencies, and negative skewness indicates the reverse. Kurtosis is an indicator of the peakedness of the distribution. Formant transitions were assessed using locus equations, which relate the second formant frequency at vowel onset (F2onset) to that at the vowel midpoint (F2vowel) by a straight-line fit whose slope and y-intercept are used to characterize place of articulation. Along with reporting how each of the acoustic measures mentioned above differentiates among the places of fricative articulation, we used a statistical method (discriminant function analysis) to find the most parsimonious combination of acoustic cues that distinguishes among the places of fricative articulation, and to assess the contribution of each selected cue to the overall classification of fricatives into their places of articulation.

CHAPTER 2
LITERATURE REVIEW

2.1 Introduction

In this chapter we review the literature on the acoustic characteristics that have been shown to be effective in differentiating among fricative places of articulation and voicing in the world's languages. Because certain fricative contrasts in Standard Arabic (e.g., pharyngealized vs. non-pharyngealized) do not occur in most other languages, we also discuss whether these acoustic cues are likely to be effective in differentiating acoustically among Standard Arabic fricatives.

2.2 Fricative Production

Fricative production is best described in terms of the source-filter theory of speech production (Fant 1960). According to that theory, speech can be modeled as the result of two independent components: a source signal (which could be the glottal source, or noise generated at a constriction in the vocal tract) and a filter (reflecting the resonances of the vocal tract cavities downstream from the glottis or from the constriction). The basic mechanism of fricative production is the formation of turbulence in the airflow at a point in the oral cavity. To generate such turbulence, a steady airflow with a velocity above a critical value¹ passes through a narrow constriction in the oral cavity and forms a jet that mixes with the surrounding air in the vicinity of the constriction to generate eddies. These eddies, which are random velocity fluctuations in the airflow, act as the source of frication noise (Stevens 1971).

¹ This critical value is given by the Reynolds number (Re), a dimensionless quantity that relates the constriction size to the volume velocity needed to produce turbulence in the air. For speech, turbulence arises when Re > 1800 (Kent and Read 2002).

Depending on the nature of the constriction, frication noise can also be generated at either an obstacle or a wall (Shadle 1990). According to Shadle, an obstacle source refers to fricatives in which sound is generated primarily at a rigid body perpendicular to the airflow. An example is the production of the voiceless alveolar and voiceless post-alveolar fricatives (/s, ʃ/): the upper and lower teeth, respectively, act as the spoiler for the airflow. Such sources are characterized by a maximum source amplitude for a given velocity. On the other hand, a wall source occurs when sound is generated primarily along a rigid body parallel to the airflow. The spectra of sounds generated by a wall source, like the voiceless and voiced velar fricatives (/x, ɣ/), are characterized by a flat, broad peak with less amplitude than sounds from obstacle sources (Shadle 1990). Vibration of the vocal folds adds a further source in voiced fricative production. Whatever the source, the resulting turbulence is then modified by the resonance characteristics of the vocal tract (the filter). The spectrum of the output reflects the transfer function of the vocal tract, which in turn depends on 1) the natural frequencies of the cavities anterior to the constriction (poles), 2) the radiation characteristics of the sound leaving the mouth, and 3) the resonant frequencies of the posterior cavity (zeros). For fricatives, the vocal tract is tightly constricted, and hence the coupling between the front and back cavities is small (Johnson 1997). Therefore, the transfer function of the vocal tract for fricatives depends largely on the resonances of the front cavity. The nth resonance of the front cavity can be calculated using Equation (2–1), where c is the speed of sound and l is the length of the cavity (here, the cavity in front of the constriction).
When strong coupling occurs between the front and back cavities, such as when the "constriction is gradually tapered" (Kent and Read 2002, p. 43), the resonances of the back cavity are calculated using Equation (2–2). Resonances of the back and front cavities sharing the same frequency and bandwidth cancel each other out.

$$f_n^{\text{front}} = \frac{(2n - 1)\,c}{4l} \tag{2–1}$$

$$f_n^{\text{back}} = \frac{n\,c}{2l} \tag{2–2}$$
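Equations (2–1) and (2–2) are simple enough to evaluate directly. The sketch below assumes a speed of sound of 35,000 cm/s (a common approximation for warm, humid vocal-tract air) and uses hypothetical cavity lengths; it only illustrates how cavity length maps onto resonance frequency and does not model any particular Arabic fricative.

```python
from typing import List

# Assumed value: approximate speed of sound in warm, humid air, in cm/s.
SPEED_OF_SOUND_CM_S = 35_000.0


def front_cavity_resonances(length_cm: float, count: int,
                            c: float = SPEED_OF_SOUND_CM_S) -> List[float]:
    """Equation (2-1): quarter-wavelength resonances (2n - 1)c / 4l, in Hz."""
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, count + 1)]


def back_cavity_resonances(length_cm: float, count: int,
                           c: float = SPEED_OF_SOUND_CM_S) -> List[float]:
    """Equation (2-2): half-wavelength resonances nc / 2l, in Hz."""
    return [n * c / (2.0 * length_cm) for n in range(1, count + 1)]


# Hypothetical cavity lengths, chosen only for illustration.
print(front_cavity_resonances(2.0, 3))  # [4375.0, 13125.0, 21875.0]
print(back_cavity_resonances(14.0, 3))  # [1250.0, 2500.0, 3750.0]
```

Because l appears in the denominator of both equations, halving a cavity's length doubles every one of its resonances.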

2.3 Acoustic Cues to Fricative Place of Articulation

Both local (static) and global (dynamic) cues have been shown to participate, to different degrees, in the identification of (English) fricatives. The three main acoustic cues of most interest in the literature on fricatives are the amplitude and spectral properties of the frication noise, the relationship between the frequency characteristics of the frication noise and those of the vowel, and the role of frication noise duration in distinguishing fricative place and voicing.

2.3.1 Amplitude Cues

2.3.1.1 Frication amplitude

Most studies of frication noise amplitude have focused on (English) voiceless fricatives and found similar results: sibilants (/s, z, ʃ, ʒ/) have higher amplitude than nonsibilants (/f, v, θ, ð/), with no differences within each class. This difference in amplitude between sibilants and nonsibilants is predictable from the aerodynamics of producing these fricatives. For example, to examine fricative production mechanisms, Shadle (1985) used a mechanical model in which constriction area, length, and location could be varied and an obstacle could be added or removed. Based on the spectra produced with this model, Shadle (1985) concluded that the lower teeth act as an obstacle some 3 cm downstream from the constriction that serves as the noise source. This configuration increases the turbulence of the airflow, which in turn increases sibilant amplitude. Nonsibilant fricatives, on the other hand, have no such obstacle, resulting in much lower energy levels. The difference between sibilant and nonsibilant fricatives in frication amplitude was also found to have auditory salience. McCasland (1979) studied the role of amplitude as a perceptual cue to fricative place of articulation. He cross-spliced naturally spoken syllables of English /f, θ, s, ʃ/ and /i/ such that the fricative part of /si/ and /ʃi/ was spliced onto the vocalic part of both /fi/ and /θi/. The overall amplitude of the spliced-in frication noise was adjusted to the intensity of the original nonsibilant fricative by reducing the /s, ʃ/ amplitude to that of /f/ and /θ/. The resulting fricative-vowel syllables were heard as /fi/ and /θi/ when the vocalic part came from an original /fi/ and /θi/, respectively. These findings led McCasland to conclude that the low amplitude of nonsibilant fricatives was used as a perceptual cue to distinguish them from the sibilants /s, ʃ/.
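The amplitude manipulation in McCasland's study amounts to rescaling one frication segment so that its overall level matches that of another. A minimal sketch follows, assuming that "overall amplitude" means RMS amplitude over the whole noise segment; the source does not specify exactly how levels were measured or matched, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude of a segment."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def match_rms(samples: Sequence[float], target_rms: float) -> List[float]:
    """Scale a segment so that its RMS equals target_rms.

    Assumption: level matching is done on whole-segment RMS, e.g., a sibilant
    noise attenuated to the level of an original nonsibilant noise.
    """
    gain = target_rms / rms(samples)
    return [gain * x for x in samples]


def gain_db(samples: Sequence[float], target_rms: float) -> float:
    """Level change applied by match_rms, in dB (negative = attenuation)."""
    return 20.0 * math.log10(target_rms / rms(samples))


# Hypothetical segments: a louder "sibilant" noise and a quieter "nonsibilant" one.
sibilant = [0.4, -0.4, 0.4, -0.4]
nonsibilant = [0.1, -0.1, 0.1, -0.1]
print(gain_db(sibilant, rms(nonsibilant)))  # about -12.04 dB
```

Only the frication segment is rescaled; the vocalic portion is left untouched, which is precisely why the cross-spliced tokens still carry the transition information of the original syllable.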
However, because of the cross-splicing method used, it is not clear whether the results can be attributed solely to the reduction of /s, ʃ/ amplitude. In fact, Behrens and Blumstein (1988a) pointed out that the results of McCasland's method are not conclusive, since the method mismatches information from the frication noise and the vocalic transition. Specifically, it is not clear whether listeners were using the reduced noise amplitude of the sibilants as a cue for nonsibilants, or using transitional information in the original vocalic part of the nonsibilant to judge the token to be /f, θ/. Listeners might have been using either cue, or both, and the cross-splicing methodology offers no way of telling which. One way to remedy the shortcomings of the cross-splicing method is to use synthetic speech. Gurlekian (1981) used synthetic /sa, fa/ syllables in which the frequency and amplitude of the vowel were kept constant, in order to test whether the distinction between sibilant and nonsibilant fricatives could be based solely on differences in noise amplitude. For the fricatives, the center frequency of the noise was fixed at 4500 Hz, while its amplitude was varied relative to the fixed vowel amplitude. This center frequency was similar to the range in which /s/ was correctly identified 90% of the time by Argentine Spanish listeners (Manrique and Massone 1979), and within the range described for English /s/ (Heinz and Stevens 1961). An identification test with 6 Argentine Spanish and 6 English listeners showed that both groups assigned a /fa/ percept to tokens with low noise amplitude and a /sa/ percept to those with high noise amplitude. Behrens and Blumstein (1988a) also investigated the role of fricative noise amplitude in distinguishing place of articulation among fricatives.
Behrens and Blumstein altered the amplitude of the frication part of CV syllables, with C being one of /f, θ, s, ʃ/, while preserving the vocalic part of the utterance. They raised the noise amplitude of /f, θ/ to that of /s, ʃ/ and, conversely, lowered the noise amplitude of /s, ʃ/ to that of /f, θ/, without substituting or changing the vocalic part of the utterance. They found, contrary to previous studies, that the overall amplitude of the fricative noise relative to that of the following vowel does not constitute the primary cue for the sibilant/nonsibilant distinction. Therefore, Behrens and Blumstein called for an integration of the spectral properties and amplitude characteristics of fricatives in order to successfully discriminate among their places of articulation. Another way to capture classification information in frication noise amplitude is to measure the root-mean-square (RMS) amplitude of the fricative noise normalized relative to the vowel. Jongman et al. (2000) used this method in their large-scale study of English fricatives. Among the many measures used to characterize fricatives, Jongman et al. measured the difference between the average RMS amplitude (in dB) of three consecutive pitch periods at the point of maximum vowel amplitude and the RMS amplitude of the entire frication noise. Results were derived from 20 native speakers of English (10 female and 10 male). The speakers produced all 8 English fricatives in the onset of CVC syllables, with the rhyme consisting of each of six vowels /i, e, æ, ɑ, o, u/ followed by /p/. The authors found that this "normalized RMS amplitude" can differentiate among all four places of fricative articulation in English, with voiced fricatives having a smaller amplitude than their voiceless counterparts. The integration of fricative and vowel amplitude as a way of normalization has also been used for automatic recognition of continuous speech. Abdelatty Ali et al. (2001) used the Maximum Normalized Spectral Slope (MNSS), which relates the spectral slope of the frication noise spectrum to the maximum total energy in the utterance, thus capturing the spectral shape and amplitude of the fricative, as well as vowel amplitude, in one quantity. It differs from Jongman and colleagues' normalized amplitude in two ways: first, it uses peak amplitude instead of RMS amplitude for the vowel and the fricative; and second, it uses only the strongest peak of the fricative (as opposed to the whole frication noise) and normalizes it in relation to the strongest peak of the vowel (as opposed to the average of three pitch periods). For MNSS, a statistically determined threshold (0.01 for voiced and 0.02 for voiceless fricatives) is used to classify a fricative as nonsibilant if its MNSS falls below the threshold and as sibilant if it is above it. Using these criteria, Abdelatty Ali et al. obtained 94% recognition accuracy for sibilant vs. nonsibilant fricatives. No further information was given on using MNSS to classify fricatives within these classes.

2.3.1.2 Relative amplitude

Since amplitude cues from the frication noise and spectral cues of the vocalic part of a syllable depend on each other (Behrens and Blumstein 1988a; Jongman et al. 2000), changes in amplitude might carry more perceptual weight if the frequency range over which such changes occur is taken into consideration. Such an integration was presented by Stevens and Blumstein (1981) as an invariant property of speech production. They showed theoretically that different amplitude changes occurring at the consonant-vowel boundary in certain frequency ranges are related to articulatory mechanisms associated with particular places in the vocal tract. Therefore, listeners might use these relational values as a cue to a consonant's place of production. To test this claim, Stevens (1985) synthesized sibilant/nonsibilant and anterior/nonanterior continua in which the frication noise amplitude in certain frequency ranges was changed gradually from one stimulus to the next. Listeners' judgments shifted abruptly from /θ/ to /s/ when the amplitude of the frication noise in the fifth and sixth formant frequency regions (F5 and F6) was increased relative to the amplitude in the same frequency regions at vowel onset. On the other hand, listeners identified the consonant as /s/ rather than /ʃ/ when the frication noise amplitude in the F3 region, relative to the F3 amplitude of the vowel, rose at the transition, and as /ʃ/ when it fell. These findings led Stevens to hypothesize that the vowel is used as an "anchor against which the spectrum of the fricative noise is judged or evaluated" (Stevens 1985, p. 249). Other researchers have tested the robustness of this feature in different contexts. Hedrick and Ohde (1993) examined the effect of frication duration and vowel context on relative amplitude and whether such changes would affect the perception of fricative place of articulation.
This was done by varying the amplitude of the fricative relative to the vowel-onset amplitude at F3 and F5 for the /s/-/ʃ/ and /s/-/θ/ contrasts, respectively. Frication duration and vowel context were also varied. Ten adult listeners with no history of speech or hearing disorders, who correctly identified (with 70% accuracy) the end points of the /s/-/ʃ/ and /s/-/θ/ continua, were asked to identify each stimulus as one member of the contrastive pairs above. For the /s/-/ʃ/ contrast, listeners gave more /s/ responses to stimuli with lower relative amplitude and more /ʃ/ responses to stimuli with higher relative amplitude. These findings held across the different vowel and duration conditions and agreed with those obtained by Stevens (1985).
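Relative amplitude, as manipulated by Stevens (1985) and Hedrick and Ohde (1993), compares the frication noise and the vowel onset within the same formant region (F3 for /s/-/ʃ/, F5 for /s/-/θ/). The sketch below assumes both segments have already been converted to power spectra (lists of frequencies and powers), takes the strongest component within a band around the formant frequency, and reports frication minus vowel in dB. The 250 Hz band half-width and the use of the in-band maximum are illustrative assumptions, not parameters taken from those studies.

```python
import math
from typing import Sequence


def band_peak_db(freqs: Sequence[float], power: Sequence[float],
                 lo_hz: float, hi_hz: float) -> float:
    """Level (dB) of the strongest spectral component between lo_hz and hi_hz."""
    in_band = [p for f, p in zip(freqs, power) if lo_hz <= f <= hi_hz]
    if not in_band:
        raise ValueError("no spectral components in the requested band")
    return 10.0 * math.log10(max(in_band))


def relative_amplitude_db(fric_freqs: Sequence[float], fric_power: Sequence[float],
                          vowel_freqs: Sequence[float], vowel_power: Sequence[float],
                          formant_hz: float, half_width_hz: float = 250.0) -> float:
    """Frication level minus vowel-onset level in a band centered on a formant.

    Negative values mean the noise is weaker than the vowel in that region,
    i.e., the level rises at the transition into the vowel.
    Assumption: half_width_hz = 250 is a hypothetical default.
    """
    lo, hi = formant_hz - half_width_hz, formant_hz + half_width_hz
    return (band_peak_db(fric_freqs, fric_power, lo, hi)
            - band_peak_db(vowel_freqs, vowel_power, lo, hi))


# Hypothetical spectra sampled at the same frequencies around an F3 of 2500 Hz.
freqs = [2250.0, 2500.0, 2750.0]
print(relative_amplitude_db(freqs, [1.0, 100.0, 1.0],
                            freqs, [1.0, 1000.0, 1.0], formant_hz=2500.0))  # approximately -10.0
```

Under the pattern reported above for the /s/-/ʃ/ contrast, lower values in the F3 region would be expected to favor /s/ responses and higher values /ʃ/ responses.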

Furthermore, the additional post-fricative vowel contexts in Hedrick and Ohde’s study influenced only the magnitude of the relative amplitude effect for a given contrast. Hedrick and Ohde claim that relative amplitude is used as a primary invariant cue, since listeners used relative amplitude information more effectively than the context-dependent formant transitions. To further test this assumption, Hedrick and Ohde (1993) also varied the appropriate formant transitions of the contrasts above along a continuum while keeping the relative amplitude fixed across all stimuli. The hypothesis was that if relative amplitude were indeed a primary cue, then variation in formant transitions would not affect the identification of members of the contrasting pair. Their findings indicate that for the /s/-/S/ contrast, formant transitions did affect the identification of at least the end points of the continua. For the /s/-/T/ contrast, formant transitions had a negligible effect on the identification of the two fricatives, even at the boundary points. Taken together, these findings indicate that relative amplitude is part of a primary cue to fricative place of articulation, and that its role is more salient when the contrast involves a sibilant and a nonsibilant fricative. Hedrick and Ohde’s (1993) findings also suggest that formant transitions do influence the perception of fricative place of articulation, at least among sibilants. However, a trading relationship seems to exist between the two cues when some factor prevents the effective use of one of them. Hedrick (1997) found that listeners with sensorineural hearing loss relied more on relative amplitude than on formant transition information in discriminating between English /s/ and /f/, whereas listeners with normal hearing showed the opposite preference.
This was the case even when the formant transition information was presented at a level audible to the listeners with sensorineural hearing loss. So far, relative amplitude has been shown to differentiate only between sibilants and nonsibilants as classes, with the exception of the Jongman et al. (2000) study, which found that relative amplitude, as defined by Hedrick and Ohde (1993), also differentiates among all four places of fricative articulation in English.

2.3.2 Duration Cues

Fricative duration measures have been used in previous research mainly to differentiate between sibilants and nonsibilants and to assess the voicing of fricatives. One such study was conducted by Behrens and Blumstein (1988b), who recorded three native speakers of English producing each of the four English voiceless fricatives /f, T, s, S/ followed by one of the five vowels /i, e, a, o, u/. They found that the sibilants /s, S/ were longer than the nonsibilants /f, T/, with an average difference of 33 ms, and that there were no significant duration differences between members of the same class. The vowel effect was minimal and was found only among the nonsibilant fricatives. Similar results were obtained by Pirello, Blumstein, and Kurowski (1997), who also found that alveolar fricatives were longer on average than labiodental fricatives in English. Jongman (1989) questioned the importance of frication noise duration as a cue for fricative identification. He found that listeners can identify fricatives from a fraction of their frication noise: in a perception test, listeners needed as little as the initial 50 ms of the frication noise of a naturally produced fricative-vowel syllable to classify fricatives successfully. Although cues such as amplitude or spectral properties localized in the initial part of the frication noise may have been used here, such results undermine the significance of absolute duration values in classifying fricatives. Moreover, temporal features of speech can vary with speaking rate. When frication noise duration was normalized to compensate for this, by taking the ratio of fricative duration to word duration, Jongman et al. (2000) found a significant difference among all places of fricative articulation with the exception of the labiodental-interdental contrast.
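The normalization used by Jongman et al. (2000) is a simple ratio. A minimal sketch follows, assuming segment boundaries have already been labeled (here in milliseconds); the function name and the boundary values are hypothetical. Under the simplifying assumption that a faster speaking rate compresses the whole word uniformly, the ratio stays the same while the absolute fricative duration shrinks.

```python
def normalized_duration(fric_start, fric_end, word_start, word_end):
    """Ratio of fricative duration to carrier-word duration (any consistent time unit)."""
    fric_dur = fric_end - fric_start
    word_dur = word_end - word_start
    if not 0 < fric_dur <= word_dur:
        raise ValueError("expected 0 < fricative duration <= word duration")
    return fric_dur / word_dur

# Hypothetical boundaries (ms) for one token at a normal rate and the same token
# uniformly compressed to 80% of its length at a faster rate.
normal = normalized_duration(100, 250, 100, 600)   # 150 / 500
fast = normalized_duration(80, 200, 80, 480)       # 120 / 400
```

Both calls give 0.3, although the absolute fricative duration drops from 150 ms to 120 ms.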

Frication noise duration has also been used to assess the voicing distinction between fricatives sharing the same place of articulation. Cole and Cooper (1975) examined the role of frication noise duration in the perception of fricative voicing. They found that shortening the frication noise of voiceless fricatives in syllable-initial position shifted their perception toward their voiced counterparts. They also noted that in syllable-final position, the duration of the frication noise relative to that of the preceding vowel becomes the cue to fricative voicing (voiced fricatives being shorter than voiceless ones). Similar findings were obtained by Manrique and Massone (1981) for the Spanish fricatives /B, f, D, s, S, Z, x, G/ in three conditions: in isolation, in CV syllables, and in CVCV words. Noise duration was significantly shorter for voiced than for voiceless fricatives in all three conditions. However, of these fricatives, only /S, Z/ and /x, G/ are homorganic, while the other two pairs do not share the same place of articulation (Baum and Blumstein 1987). Therefore, the temporal differences reported by Manrique and Massone might have been due to factors other than voicing since, as mentioned previously, durational differences exist between fricatives that share the same voicing but differ in place of articulation (Behrens and Blumstein 1988b). Nevertheless, Baum and Blumstein’s own experiments showed that syllable-initial voiceless English fricatives in citation form are longer than their voiced counterparts, although they noted considerable overlap between the duration distributions of voiced and voiceless fricatives at all places studied. Using connected speech, Crystal and House (1988) also found that, on average, voiceless fricatives in word-initial position are longer than voiced fricatives.
As in Baum and Blumstein’s results, there was a considerable amount of overlap between the duration distributions of voiced and voiceless fricatives in connected speech. Again, the use of duration per se as the sole cue for fricative voicing was questioned by Jongman (1989), who found that identification of fricative voicing was accurate (83%) even when only 20 ms of frication noise was presented. Jongman et al. (2000), however, used a relative measure of duration to quantify its use as a cue for fricative voicing. Normalized fricative noise duration (defined as the ratio of fricative duration to that of the carrier word) was significantly longer for voiceless than for voiced fricatives. They also found that this difference is more apparent in nonsibilant than in sibilant fricatives.

2.3.3 Spectral Cues

In addition to amplitude and duration, spectral properties of the frication noise have been investigated as cues to fricative place of articulation. Among the spectral properties previously studied are spectral peak location and spectral moments.

2.3.3.1 Spectral peak location

One of the early attempts to relate fricative place of articulation to the frequency location of the energy maximum in the frication noise was the study by Hughes and Halle (1956). In this study, gated 50 ms windows of the frication noise were used to produce spectra of the English fricatives /f, v, s, z, S, Z/. The fricative spectra revealed that, for some speakers, the spectrum of voiced fricatives had a strong energy component below 700 Hz, which was absent in the same region for voiceless fricatives. However, this finding was not consistent across speakers. Based on this inconsistency, and on the similarity between the spectra of homorganic voiced and voiceless fricatives above 1 kHz, Hughes and Halle ruled out spectral prominence as a basis for the voicing distinction among fricatives. The place distinction, on the other hand, was found to be related, to a certain extent, to the location of the most prominent spectral peak. Hughes and Halle found that /f, v/ had a relatively flat spectrum below 10 kHz, whereas spectral prominence was observed for /S, Z/ in the 2-4 kHz region and for /s, z/ in the region above 4 kHz. They also found that the exact location of the peak for each fricative was lower for males and higher for females. Based on these observations, Hughes and Halle concluded that the size and shape of the resonance chamber in front of the fricative’s point of constriction determine the location of the energy maximum in frication noise spectra. Specifically, they reported that the length of the vocal tract from the point of constriction to the lips was inversely related to the frequency of the spectral peak. Thus, the frequency of the spectral peak increases as the point of articulation moves closer to the lips. Such observations are consistent with the predictions of the source-filter theory of speech production presented in section 2.2.
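The inverse relation between front-cavity length and peak frequency can be illustrated with the textbook quarter-wavelength approximation. In this approximation the cavity in front of the constriction is treated as a uniform tube closed at the constriction and open at the lips, so its resonances fall at f_n = (2n - 1)c / 4L. This is a simplification assumed here for illustration: it ignores cavity shape, the teeth, and lip rounding. The cavity lengths below are hypothetical values, not measurements.

```python
SPEED_OF_SOUND_CM_PER_S = 35_000.0  # approximate value for warm, humid air in the vocal tract

def front_cavity_resonances(length_cm, n_modes=2, c=SPEED_OF_SOUND_CM_PER_S):
    """Quarter-wavelength resonances (Hz) of a tube closed at one end, open at the other."""
    if length_cm <= 0:
        raise ValueError("cavity length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_modes + 1)]

# Hypothetical front-cavity lengths: shortening the cavity raises every resonance.
for length in (3.0, 2.0, 1.0):
    first = front_cavity_resonances(length)[0]
    print(f"{length:.1f} cm -> lowest resonance {first:.0f} Hz")
```

With these assumed lengths, the lowest resonance rises from about 2917 Hz (3 cm) to 4375 Hz (2 cm) and 8750 Hz (1 cm). This is the same direction as the lower peak of /S, Z/ relative to /s, z/ reported above.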
Strevens (1960) also examined the use of spectral prominence to differentiate between fricatives by studying the front (/F, f, T/), mid (/s, S, ç/), and back (/x, X, h/) voiceless fricatives produced by subjects with professional training in phonetics. Based on average line spectra, Strevens found that the front fricatives were characterized by low intensity and smooth, unpatterned spectra; the mid fricatives by high intensity with significant spectral peaks around 3.5 kHz; and the back fricatives by medium intensity and a marked formant-like structure with peaks around 1.5 kHz. The results reported above for front and mid fricatives were also shown to be perceptually valid (Heinz and Stevens 1961). Using a synthesized continuum of white noise with spectral peaks in ranges representative of those found in /S, ç, s, f, T/, Heinz and Stevens found that participants consistently shifted their identification of the fricative from /S/ to /ç/ to /s/ to /f, T/ as the resonance peak frequency increased, with no distinction made between /f/ and /T/.
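A minimal way to estimate spectral peak location is to take the frequency of the largest bin in the magnitude spectrum of a windowed frame. The sketch below assumes a single Hamming-windowed frame and a plain FFT. The optional f_min cut-off, which can exclude low-frequency voicing energy, is a hypothetical parameter and not a setting taken from the studies reviewed here. The bin spacing is fs/N, so a longer window gives finer frequency resolution. This is the trade-off Jongman et al. (2000) appeal to when justifying a larger window.

```python
import numpy as np

def spectral_peak_hz(frame, fs, f_min=0.0):
    """Frequency (Hz) of the largest magnitude bin at or above f_min in a Hamming-windowed frame."""
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    keep = freqs >= f_min
    return float(freqs[keep][np.argmax(magnitude[keep])])

# Synthetic 50 ms frame at 20 kHz: a weak 1 kHz component and a strong 4.5 kHz one.
fs = 20_000
t = np.arange(1000) / fs
frame = 0.2 * np.sin(2 * np.pi * 1000 * t) + 1.0 * np.sin(2 * np.pi * 4500 * t)
peak = spectral_peak_hz(frame, fs)          # close to 4500 Hz
resolution = fs / len(frame)                # bin spacing: 20 Hz for a 50 ms window
```

If a strong low-frequency component dominated the frame instead, raising f_min above it would make the function report the higher peak.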

Similar properties have also been found for Spanish fricatives. Manrique and Massone (1981) found that Spanish /s/, /f/, and /T/ have spectral peak values comparable to those reported for English fricatives by Hughes and Halle (1956). Furthermore, they reported that the spectral energy of /x/ is concentrated in a low, narrow frequency band continuous with the F2 of the following vowel, and that the spectral energy of /ç/ is concentrated in a low band continuous with the F3 of the following vowel. Manrique and Massone (1981) also examined the identification of a subset of Spanish fricatives to see whether changes in spectral peak location would change the way Spanish speakers perceive fricatives. They synthesized 9 cascade stimuli from the middle 500 ms of each of a set of deliberately lengthened /f, s, S, x/, using a set of low- and high-pass filters so that only certain spectral zones were present in each stimulus. The unfiltered fricatives had recognition scores ranging from 95% for /f/ and /s/ to 100% for /S/ and /x/. For the filtered fricatives, spectral peak location was found to carry the perceptual load for the identification of /s/, /S/, and /x/, whereas the diffuse spectrum of /f/ was believed to be what made it identifiable. Other studies of English fricatives confirmed that spectral peak location can separate sibilants from nonsibilants as a class, but can distinguish among individual places of articulation only within the sibilants. For example, Behrens and Blumstein (1988b) found that for English voiceless fricatives, major spectral peaks were apparent within 3.5-5 kHz for /s/ and within 2.5-3.5 kHz for /S/. On the other hand, /f/ and /T/ appeared flat, with a diffuse spread of energy from 1.8 to 8.5 kHz and a good deal of variability in their spectral shape. The same pattern has also been observed across age groups. Pentz et al.
(1979), for example, compared the spectral properties of English fricatives (/f, v, s, z, S, Z/) produced by preadolescent children with those reported for adults. As reported for adults elsewhere, they found the same relationship between energy localization and constriction point. However, the values obtained from the children in their study were higher than those obtained for male and female adult speakers in the studies mentioned above. This difference was attributed largely to differences in vocal tract length: adult male speakers have the longest vocal tracts and the lowest vocal tract resonances, children have the shortest vocal tracts and the highest resonances, and adult female speakers fall between the two groups. In another study, Nissen (2003) investigated, among other metrics, the spectral peak location of voiceless English fricatives produced by male and female speakers of four different age groups. For the fricatives in the study, he found that “the spectral peak decreased as a function of increased speaker age” (Nissen 2003, p. 139). Besides being age and gender dependent, spectral peak location has also been found to be vowel dependent (Mann and Repp 1980; Soli 1981) and highly variable for speakers with neuromotor dysfunction (Chen and Stevens 2001) because of their lack of control over the articulatory muscles. However, in contrast to all the studies mentioned above, Jongman et al. (2000) found that, across all (male and female) speakers and vowel contexts, all four places of fricative articulation in English differed significantly from each other in spectral peak location. Further, they found that spectral peak location reliably differentiated between /T/ and /D/ and between /f/ and /v/. The researchers justified the larger analysis window they adopted, compared with other studies, as a way to obtain better frequency resolution at the expense of temporal resolution.
They argue that such a compromise is advantageous because frication noise is relatively stationary. In summary, the spectral peak location of fricatives rises in frequency as the constriction moves closer to the open end of the vocal tract. Also, the spectrum of back fricatives shows a formant-like structure similar to that of the following vowel. Both generalizations can be accounted for by the source-filter theory of speech production. Fricatives are characterized by turbulent airflow through a narrow constriction in the oral cavity, with the portion of the vocal tract in front of the constriction effectively becoming the resonating chamber. For long and narrow constrictions, such as those of fricatives, the acoustic theory of speech production predicts that the only resonances present in the spectrum are those of the cavity in front of the constriction, because the cavity behind the constriction is not acoustically coupled to it (Heinz and Stevens 1961). The size of the resonating cavity is therefore inversely correlated with the frequency of the most prominent spectral peak (Hughes and Halle 1956). As a result, fricatives produced at or behind the alveolar region are characterized by a well-defined spectrum with peaks around 2.5-3.5 kHz for /S, Z/ and 3.5-5 kHz for /s, z/. However, because the area in front of the constriction is very small, fricatives produced in the labial or labiodental area are characterized by a flat spectrum with energy diffusely spread between 1.5 and 8.5 kHz. Since nonsibilant production creates a cavity very close to the open end of the vocal tract, different degrees of rounding (Shadle, Mair, and Carter 1996) and the additional turbulence produced by the air stream hitting the teeth (Strevens 1960; Behrens and Blumstein 1988a) introduce a great deal of variability in the location of the energy concentration.
On the other hand, sibilants usually have a clearly defined spectral peak location. However, for speakers with limited precision in placing the constriction (Chen and Stevens 2001), such variability exists for sibilants as well.

2.3.3.2 Spectral moments

Spectral moments analysis is another metric that has been used for fricative identification. Unlike spectral peak location analysis, this statistical approach captures both local (mean frequency and variance) and global (skewness and kurtosis) aspects of fricative spectra. The spectral mean refers to the average frequency of energy concentration, and the variance to its range. Skewness is a measure of spectral tilt that indicates where energy is concentrated. Positive skewness indicates a negative spectral tilt, with energy concentrated in the lower frequencies. Negative skewness indicates a positive tilt, with energy concentrated in the higher frequencies (Jongman et al. 2000). Kurtosis is an indicator of the peakedness of the distribution. One of the early applications of spectral moments to classify speech sounds was the study by Forrest et al. (1988) of English obstruents. For the fricatives in that study, Forrest et al. computed a series of Fast Fourier Transforms (FFTs) using a 20 ms analysis window with a 10 ms step size, starting at fricative onset and continuing through three pitch periods into the vowel. The FFT spectra were then treated as random probability distributions from which the first four moments (mean, variance, skewness, and kurtosis) were calculated. The spectral moments obtained on both linear and Bark scales were entered into a discriminant function analysis to classify voiceless fricatives according to their place of articulation. Classification scores on both scales were good for the sibilants /s/ and /S/ (85% and 95%, respectively). The nonsibilants, on the other hand, were less accurately classified using any moment on either scale (58% for /T/ and 75% for /f/). Subsequent implementations of spectral moment analysis have extended or replicated Forrest et al.’s approach with some modifications.
Tomiak’s (1990) study of English voiceless fricatives, for example, used a different analysis window length (100 ms) placed at different locations within the frication noise. As in previous research, spectral moments were successful in classifying the sibilants and /h/. For the nonsibilants, the most useful spectral information was found in the transition portion of the frication. Additionally, in contrast to Forrest et al., Tomiak found an advantage for the linearly derived moment profiles over the Bark-scaled ones.
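The moment computation described above can be sketched as follows. The power spectrum of one windowed frame is normalized to sum to one and treated as a probability distribution over frequency. Several details are assumptions of this sketch and vary across studies: the use of a power rather than a magnitude spectrum, the Hamming window, and reporting kurtosis as excess kurtosis (normal distribution = 0). It works on a linear frequency scale only and does not reconstruct any particular study's settings.

```python
import numpy as np

def spectral_moments(frame, fs):
    """Return (mean Hz, variance Hz^2, skewness, excess kurtosis) of a frame's power spectrum."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = power / power.sum()                                  # spectrum as a probability distribution
    mean = np.sum(freqs * p)                                 # 1st moment: spectral mean
    var = np.sum((freqs - mean) ** 2 * p)                    # 2nd moment: spread around the mean
    sd = np.sqrt(var)
    skew = np.sum((freqs - mean) ** 3 * p) / sd ** 3         # 3rd moment: asymmetry (tilt)
    kurt = np.sum((freqs - mean) ** 4 * p) / sd ** 4 - 3.0   # 4th moment: peakedness
    return mean, var, skew, kurt

# Synthetic 40 ms frames at 22.05 kHz (25 Hz bins): energy weighted toward low vs high frequencies.
fs = 22_050
t = np.arange(882) / fs
low_heavy = 1.0 * np.sin(2 * np.pi * 1000 * t) + 0.3 * np.sin(2 * np.pi * 6000 * t)
high_heavy = 0.3 * np.sin(2 * np.pi * 1000 * t) + 1.0 * np.sin(2 * np.pi * 6000 * t)
low_moments = spectral_moments(low_heavy, fs)
high_moments = spectral_moments(high_heavy, fs)
```

As the definitions above predict, the frame weighted toward low frequencies gets a lower mean and positive skewness. The frame weighted toward high frequencies gets a higher mean and negative skewness.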

Spectral moments were also used by Shadle et al. (1996) to classify voiced and voiceless English fricatives. The study involved spectral moments measured from discrete Fourier transform (DFT) analyses performed at different locations within the frication noise and over different frequency ranges. They found that spectral moments provided some information about fricative production but did not reliably discriminate between the different places of articulation. Furthermore, their results indicated that spectral moments are sensitive to the frequency range of the analysis but not to the analysis position within the fricative. Similar results were also obtained for children (Nittrouer, Studdert-Kennedy, and McGowan 1989; Nittrouer 1995). The use of spectral moments as a tool to distinguish between /s/ and /S/ has also been extended to atypical speech and found to be reliable. Tjaden and Turner (1997), for example, compared spectral moments obtained from speakers with amyotrophic lateral sclerosis (ALS) and from healthy controls matched for age and gender, and found that the first moment was significantly lower for the ALS group. Tjaden and Turner suggested two possible causes for the low mean values among the ALS speakers. One is difficulty in forming the degree of constriction required to produce frication. The other is a weaker subglottal sound source caused by the respiratory muscle weakness common among ALS speakers. The studies mentioned so far demonstrate that spectral moments can distinguish sibilants from nonsibilants as a class, but can reliably distinguish individual places of articulation only among sibilants. In contrast, Jongman et al. (2000) found that spectral moments captured the differences between all four places of fricative articulation in English. Jongman et al.’s study, however, differs from other studies in two respects. It calculated moments from a 40 ms FFT analysis window placed at four different locations in the frication noise (onset, middle, end, and transition into the vowel). It also used a larger and more representative sample of speakers and tokens (2880 tokens from 20 speakers) than other studies. Across moments and window locations, variance and skewness at onset and transition were the most robust classifiers of all four places. Also, on average, variance effectively distinguished between voiced and voiceless fricatives, with the former having greater variance.

2.3.4 Formant Transition Cues

2.3.4.1 Second formant at transition

Early research on formant transitions focused on the perceptual usefulness of such information in classifying speech sounds. For example, Harris (1958) recorded the English fricatives /f, v, T, D, s, z, S, Z/, each followed by one of the vowels /i, e, o, u/. She then spliced and recombined the vocalic and frication portions of all CV combinations. Listeners correctly identified sibilant fricatives regardless of the source of the cross-spliced vocalic part; frication noise alone was sufficient for the correct identification of sibilants. Among the nonsibilants, on the other hand, a correct identification as /f, v/ occurred only when the vocalic part matched (i.e., came from an /f, v/ syllable), and as /T, D/ with mismatching vocalic parts. Based on these identification patterns, Harris suggested that fricative perception occurs in two consecutive stages. In the first stage, cues from the frication noise alone determine whether the fricative is a sibilant or a nonsibilant. If it is a sibilant, cues from the frication noise alone then differentiate among the sibilant fricatives. If it is a nonsibilant, formant transition information is used for the within-class classification. As with the cross-splicing methods mentioned previously (section 2.3.1.1), this method does not prevent dynamic coarticulatory information from coloring the precut vowel and/or fricative. It is not clear, therefore, that the results obtained can be attributed solely to the mismatching vocalic part of the cross-spliced signal. To overcome this problem, Heinz and Stevens (1961) synthesized stimuli consisting of white noise with varying frequency peaks, similar to the peaks found in English fricatives, followed by four synthetic formant transition values.
Listeners were instructed to label these stimuli as one of the four voiceless English fricatives /f, T, s, S/. Based on the identification scores, the researchers concluded that /f/ is distinguished from /T/ on the basis of the F2 transition into the following vowel, while formant transitions had no apparent effect on the distinction between /s/ and /S/. These findings support those of Harris (1958) and were obtained with more controlled stimuli. The role of formant transitions, however, was not found to be as crucial in other studies. LaRiviere, Winitz, and Herriman (1975) presented the fricative noise in its entirety in a perceptual test and obtained high recognition scores for /s, S/, lower scores for /f/, and poor scores for /T/. More importantly, including the vocalic information for the /f, T/ tokens did not significantly increase their recognition. Other studies (Manrique and Massone 1981; Jongman 1989) found similar results using different methods. The perceptual experiments mentioned thus far used a forced-choice technique that might have biased participants’ responses. For that reason, Manrique and Massone (1981) used a tape-splicing paradigm to study the effect of formant transitions on the perception of Spanish fricatives by Spanish listeners. They constructed their stimuli by cutting CV syllables into their frication and vowel parts. Listeners were asked to identify the fricative when presented with the frication noise alone, and to freely guess the sound that preceded the vowel when presented with the vocalic part. In the latter case, most tokens (85% of responses) were judged to have been preceded by a stop sharing the place of articulation of the spliced fricative. Spanish fricatives with no homorganic stop were perceived as /t/, with the exception of /f/, which was perceived as /p/ 50% of the time.
The same listeners were able to identify the fricative accurately from the frication part alone in all cases except /x/ and /G/. Another study likewise found that formant transitions were not crucial for the correct identification of fricatives (Jongman 1989). In Jongman’s (1989) perceptual experiment on English fricatives, listeners identified 92% of the fricatives correctly from the frication noise part of fricative-vowel syllables alone. More importantly, identification accuracy did not increase significantly when the entire fricative-vowel syllable was presented. As with the results obtained from synthetic speech, measures of formant transitions in naturally produced fricatives are also conflicting. Wilde and Huang (1991), for example, measured F2 at vowel onset for the fricatives of a single male speaker and found that the F2 value did not systematically differentiate between /f/ and /T/. In another study, however, Wilde (1993) found that transitional information, as measured by the F2 value at the fricative-vowel boundary, can be used to identify fricative place of articulation. Her measurements of two speakers showed that as the place of constriction moves back in the vocal tract, the value of F2 systematically increases and its range becomes smaller.

2.3.4.2 Locus equations

Locus equations provide a way to quantify the role of formant transitions in identifying fricative place of articulation by relating the second formant frequency at vowel onset (F2onset) to that at the vowel midpoint (F2vowel). Locus equations are straight-line regression fits to the data points obtained by plotting F2 transition onsets on the y axis against the corresponding F2 values of the vowel nuclei on the x axis; the fit yields a slope and a y-intercept. This metric has been used primarily to classify English stops (Lindblom 1963; Sussman et al. 1991), and only recently has it been applied to fricatives. Fowler (1994) investigated the use of locus equations as cues to place of articulation across different manners of articulation, including the fricatives /v, D, z, Z/ as spoken by five male and five female speakers of English. Fowler found that the locus equations (in terms of slope and y-intercept) of a homorganic stop and fricative were significantly different, while those of a stop and a fricative with different places of articulation were not significantly different. Nevertheless, locus equations were able to differentiate between members sharing the same manner of articulation. The slopes for the fricatives /v, D, z, Z/, for example, were significantly different (0.73, 0.50, 0.42, and 0.34, respectively). In another study, Sussman (1994) investigated the use of locus equations to classify consonants across manners of articulation (stops, fricatives, and nasals). In contrast to Fowler (1994), he found that fricatives were not distinguishable on the basis of their locus equation slopes; only /v/ had a distinctive slope. Results of other studies of English fricatives were similar to those of Sussman (1994). For example, in their large-scale study of English fricatives, Jongman et al. (2000) calculated the slope and y-intercept for all English fricatives in six vowel environments. Specifically, Jongman and colleagues measured F2onset and F2vowel from a 23.3 ms full Hamming window placed at the onset and midpoint of the vowel, respectively, which was the same method used in the studies mentioned above. Similar to Sussman (1994), Jongman et al. (2000) found that only the slope value for /f, v/ was significantly different, and that the y-intercepts were distinct only for /f, v/ and /S, Z/. Locus equations are of particular interest here since they have been shown to hold across languages (Sussman, Hoemeke, and Ahmed 1993), gender (Sussman et al. 1991), speaking style (Krull 1989), and speaking rate (Sussman, Fruchter, Hilbert, and Sirosh 1998).
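Fitting a locus equation is ordinary least-squares linear regression of F2onset on F2vowel. The sketch below uses NumPy's polyfit. The F2 values are invented so that they lie exactly on a line, purely to show the arithmetic; they are not measurements from any study cited here. In the usual interpretation, a slope near 1 means F2 at onset closely follows the vowel (strong coarticulation). A slope near 0 means the onset stays roughly fixed whatever the vowel (resistance to coarticulation), the reasoning Yeou (1997) applies to pharyngealized fricatives.

```python
import numpy as np

def locus_equation(f2_vowel, f2_onset):
    """Fit F2onset = slope * F2vowel + intercept; return (slope, intercept, Pearson r)."""
    x = np.asarray(f2_vowel, dtype=float)
    y = np.asarray(f2_onset, dtype=float)
    if x.shape != y.shape or x.size < 3:
        raise ValueError("need matching F2 arrays with at least three tokens")
    slope, intercept = np.polyfit(x, y, 1)   # degree-1 fit, highest power first
    r = np.corrcoef(x, y)[0, 1]
    return float(slope), float(intercept), float(r)

# Invented F2 values (Hz) for one fricative across five vowel contexts,
# constructed to lie on F2onset = 0.5 * F2vowel + 800.
f2_vowel = [800, 1200, 1600, 2000, 2400]
f2_onset = [1200, 1400, 1600, 1800, 2000]
slope, intercept, r = locus_equation(f2_vowel, f2_onset)  # about 0.5, 800, 1.0
```

With real tokens the points scatter around the line, and r indicates how well a single straight line describes them.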

2.4 Studies of Arabic Fricatives

The use of acoustic cues to distinguish between the different fricatives of Arabic has been underinvestigated. Furthermore, the very few studies dealing with the acoustic characteristics of Arabic fricatives (see below) have been concerned predominantly with a single acoustic feature, not with the way multiple cues can be integrated to distinguish among fricative places of articulation. While some of the cues mentioned above distinguish English fricatives with relatively good accuracy, using the same cues to classify Arabic fricatives requires taking into account acoustic characteristics particular to Arabic. For example, unlike English, Arabic uses durational differences in both vowels and consonants for phonemic distinctions. It is of interest, therefore, to see how this durational property affects the voicing and place classification of Arabic fricatives. Another interesting feature of Arabic is the existence of coarticulated (pharyngealized) fricatives that are phonemically distinct from their plain counterparts. Because of their double articulation, pharyngealized fricatives are expected in our study to show two patterns of peaks, emerging at the middle and near the end of the frication. It therefore seems necessary to use a second analysis window at the end of the frication noise, placed so that its right shoulder is aligned with the end of the frication noise. The two window locations are also motivated by studies of spectral peak location showing that high-frequency peaks are more likely to emerge at the middle and end of the frication noise (Behrens and Blumstein 1988b). Finally, the frequency of the most prominent peak of the pharyngealized fricatives is expected to be lower than that of their plain counterparts because of the acoustic coupling resulting from coarticulation.
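The two window positions proposed above can be computed directly from the labeled frication boundaries. In this sketch everything is in samples. The 40 ms window length (882 samples at 22.05 kHz) is an assumption borrowed for illustration from the window size reported for Jongman et al. (2000), not a value specified for the Arabic data. The function name and the token boundaries are hypothetical.

```python
def window_starts(fric_start, fric_end, win_len):
    """Start samples of a window centred on the frication midpoint and of one whose
    right edge coincides with the end of the frication noise."""
    if fric_end - fric_start < win_len:
        raise ValueError("frication noise is shorter than the analysis window")
    middle = (fric_start + fric_end - win_len) // 2   # centred on the midpoint
    end = fric_end - win_len                           # right shoulder at frication offset
    return {"middle": middle, "end": end}

# Hypothetical token: frication from 200 ms to 400 ms at 22.05 kHz, 40 ms windows.
fs = 22_050
win_len = fs * 40 // 1000                  # 882 samples
starts = window_starts(fs * 200 // 1000, fs * 400 // 1000, win_len)
# A frame for analysis would then be signal[starts["end"]:starts["end"] + win_len].
```

For this token the middle window starts at sample 6174 (centred on sample 6615, the frication midpoint). The end window starts at sample 7938, so it finishes exactly at the frication offset (sample 8820).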
Spectral moments seem to be another promising technique for classifying Arabic fricatives, provided that analysis windows of appropriate size and location are used. In a study of fricatives in Cairo Arabic, Norlin (1983) found that /s, sQ, z, zQ/ are characterized by a sharp peak in the higher frequencies, and that the peaks of /sQ, zQ/ are broader than those of /s, z/. Norlin used center of gravity (COG) and dispersion to quantify the location of the peak and the spread of energy around it, respectively. It therefore seems that a combination of spectral mean, variance, and skewness would differentiate between pharyngealized and plain fricatives. Formant transition information has been investigated in relation to fricatives articulated at the back of the oral cavity. For example, El-Halees (1985) found that F1 at the transition differentiates between uvular and pharyngeal fricatives, with the former having a lower F1. He also found that listeners can differentiate between the two classes based on this feature alone. The perceptual salience of F1onset was also demonstrated by Alwan (1989), who used synthetic speech to test the discrimination between the voiced pharyngeal fricative /Q/ and /X/. She found that the higher F1onset of the pharyngeal was essential to the distinction, while F2onset was not. The relation between back articulation and high F1 has also been attested for vowels following such sounds. Zawaydeh (1997) found that F1 at the middle of the vowel was raised when the vowel was preceded by /sQ/, /è/, or the glottal /h/, as compared to non-gutturals. In addition to the first and second formants at the transition, locus equations have also been used as a classification metric for Arabic. The first such attempt was part of a cross-linguistic study of locus equations as a cue to stop place of articulation. Sussman et al. (1993) recorded the voiced stops /b, d, dQ, g/ as produced by three speakers of the Cairene dialect of Arabic.
They found that both slope and y-intercept differed significantly in almost all comparisons, the exceptions being the slopes of /d/ and /dQ/ and the y-intercepts of /b/ and /g/. The second study was conducted by Yeou (1997), who elicited both stops and fricatives from nine Moroccan subjects. Yeou found that y-intercept and slope distinguished most fricative pairs; however, neither slope nor y-intercept distinguished /S/ from /è/ or /f/ from /X/. More importantly, locus equation slopes grouped the pharyngealized fricatives (/DQ, sQ/) together as a distinct class, whose distinctly low y-intercepts and flat slopes set them apart from their non-pharyngealized counterparts and from the other fricatives. Yeou argued that, unlike their plain counterparts, pharyngealized fricatives resist the articulatory effects of the following vowel because of their double articulation. Instead, they exert their own coarticulatory effect on the following vowel by raising its F1 and lowering its F2. This change in F2, relative to plain fricatives, makes the slope flatter and the intercept lower.

To summarize, several acoustic cues related to spectral, temporal, and amplitude information in the speech signal have been used in different languages to classify fricatives by place of articulation. These cues, individually and collectively, have served to distinguish between different places/classes of fricatives in English. However, their use in classifying Arabic fricatives has received little attention. In our study we examine how each of the spectral, temporal, and amplitude characteristics discussed in Section 2.3 serves, alone and collectively, to distinguish between places of articulation of Arabic fricatives. Of particular importance to our study is whether the acoustic cues found effective for fricative classification in other languages are affected by the vowel length differences present in Arabic, and whether such cues distinguish between plain and pharyngealized fricatives. The following chapter describes how these cues were investigated and any modifications made to the measurement techniques.
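A locus equation of the kind used by Sussman et al. (1993) and Yeou (1997) is an ordinary least-squares line relating F2 at vowel onset (y-axis) to F2 at the vowel midpoint (x-axis), so its slope k and y-intercept c can be computed directly. The sketch below is a minimal illustration assuming NumPy; the function name and the F2 values in the usage example are hypothetical, not data from any of the studies above.

```python
import numpy as np

def locus_equation(f2_midpoint_hz, f2_onset_hz):
    """Fit F2_onset = k * F2_midpoint + c by least squares.

    Returns (k, c): the locus-equation slope and y-intercept (Hz).
    """
    x = np.asarray(f2_midpoint_hz, dtype=float)
    y = np.asarray(f2_onset_hz, dtype=float)
    if x.shape != y.shape or x.size < 2:
        raise ValueError("need matching F2 arrays with at least two points")
    k, c = np.polyfit(x, y, deg=1)  # polyfit returns the highest power first
    return k, c

# Hypothetical F2 values (Hz) for one speaker and place, one point per vowel.
f2_mid = [2200.0, 1100.0, 1400.0]
f2_onset = [1900.0, 1350.0, 1500.0]
k, c = locus_equation(f2_mid, f2_onset)
print(f"slope k = {k:.2f}, intercept c = {c:.0f} Hz")
```

A slope near 1 means that onset F2 closely tracks the following vowel, while a flatter slope, as Yeou reported for the pharyngealized fricatives, means that onset F2 varies less with the vowel.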
CHAPTER 3 METHODOLOGY

Several spectral, amplitude, and temporal measurements have been used in previous research to describe the acoustic cues that characterize fricatives in different languages. The current study investigated Arabic fricatives to find such acoustic cues. This chapter describes how the speech samples were elicited, recorded, and analyzed. For most of the acoustic analyses, this research followed the procedures commonly used to study fricatives in English, as illustrated in Jongman et al. (2000). Certain modifications were applied to further investigate characteristics particular to Arabic. All coding and data analysis were carried out using the PRAAT software (Boersma and Weenink 2004) and a set of scripts developed by the author at the phonetics lab of the University of Florida.

3.1 Data Collection

3.1.1 Participants

Eight adult male speakers of Modern Standard Arabic (MSA) were recruited for our study from the general undergraduate student population of King Saud University1. The mean age of the participants was 20 years. None had any history of hearing or speech impairments, and all had very limited experience with English as a second language. Participants were given class credit by their instructors for participating in the study.

1 King Saud University, Riyadh, Saudi Arabia


3.1.2 Materials

In Arabic there is a gap between MSA and its vernacular varieties: Arabic is a traditional example of diglossia, in which two varieties of a language fulfill different communicative functions (Ferguson 1959). Although all participants were fluent speakers of MSA, additional care was taken in eliciting the speech material to ensure that they stayed within the target MSA register. Fricatives were therefore elicited using screen-prompted speech in conjunction with prerecorded audio prompts. A trained phonetician, who is also a fluent speaker of MSA, produced CVC syllables in which the initial consonant was an MSA fricative /f, T, D, DQ, s, sQ, z, S, X, K, è, Q, h/, followed by each of the six vowels /i, i:, a, a:, u, u:/. The final consonant was always /t/. Each resulting word was repeated three times, yielding a total of 234 audio prompts (13 fricatives × 6 vowels × 3 repetitions). The recorded prompts were then edited to equal length (≈ 1 second) by adding silence at the end where needed. The written prompts were constructed using fully vowelled Arabic orthography on a white background. The participants were instructed to produce the word presented in the carrier phrase “qul marratajn” (say twice), with the audio prompt functioning only as a reference. The prompts were presented in random order in blocks of 39 words, with breaks between blocks. Before the recording of each participant, a practice session with 10 words presented in two blocks was conducted to familiarize the participant with the task.

3.1.3 Recording

The recording was carried out using the facilities of the Computer & Electronics Research Institute at KACST2. Two adjacent sound-attenuated booths with a monitoring window between them hosted the data collection process.

2 King Abdulaziz City for Science and Technology, Riyadh, Saudi Arabia.

In one booth, a PC running Microsoft PowerPoint presented the synchronized audio and written production prompts via an LCD screen affixed to the outside of the monitoring window of the other booth. The text was shown on the LCD screen while the synchronized audio prompt was fed through headphones (Sennheiser Noisegard mobile HDC 451). A Kay Elemetrics CSL (Computer Speech Lab) model 4300B connected to another PC was used for in-line recording of the participants’ utterances. Anti-aliasing filtering is carried out automatically during data capture by the CSL external module. All recordings were made at a 22.05 kHz sampling rate with 16-bit quantization. The participant’s production of the word in the carrier phrase was captured using a low-impedance, unidirectional head-worn dynamic microphone (SHURE SM10A) positioned about 20 mm to the left of the participant’s mouth to prevent turbulent airflow from impinging directly on the microphone. Each word remained on the screen for 4 seconds before the next word was shown. If a participant did not produce the word in the allotted time, or if a mispronunciation occurred, the author stopped the recording and presented that word again. Each block was saved to a separate sound file for easy manipulation. The resulting sound files were then transferred into PRAAT for segmentation and further analyses.

3.2 Data Analysis

3.2.1 Segmentation of Speech

Both a wide-band spectrogram and a waveform display were used to segment the recorded material into the monosyllabic words containing the test fricatives. For each token, four points were identified on the waveform: the beginning of frication, the offset of the fricative/beginning of the vowel, the end of the vowel, and the end of the word. For all of these points, the nearest zero-crossing was always used. Fricative onset was taken to be the point in time at which high-frequency energy appeared on the spectrogram and/or a significant increase in zero-crossing rate occurred. The offset of a voiceless fricative was taken to be the point of minimum intensity preceding the periodicity of the vowel. For voiced fricatives, the offset was taken to be the zero-crossing of the pulse preceding the earliest pitch period exhibiting a change in the waveform from that seen throughout the initial frication (Jongman et al. 2000). The vowel offset was taken to be the end of periodicity, while the end of the segmented token was taken to be the onset of the stop burst release. Figure 3–1 shows an example of these points. The time indices of the segmentation points were written to a PRAAT TextGrid file, which makes it easier to handle the signal independently of the segmentation data and labels.

Figure 3–1. Example of segmentation (labeled points: fricative onset, fricative offset, vowel offset, stop release).

The only exception to the general rules above was the voiced pharyngeal fricative /Q/, for which it was difficult to localize the fricative-vowel boundary visually. The pharyngeal fricative /Q/ is known to have a formant-like structure continuous with that of the following vowel, with the lowest frequency of the fricative matching that of the second formant of the following vowel (Johnson 1997). Therefore, the frication offset for /Q/ was taken to be the point at which an upward intensity shift occurred relative to the intensity at the fricative onset. This point marks the shift from the low intensity of the frication noise towards the higher intensity of the vocalic part. Figure 3–2 shows an example of the segmentation of /Q/. Because /è/ and /h/ are not voiced during frication, this modification of the segmentation criteria was not necessary for either of them.

Figure 3–2. Segmentation of /Q/ (labeled points: fricative onset, fricative offset, vowel offset, stop release). The dotted line shows the intensity level.
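Snapping each boundary to the nearest zero-crossing, as described above, can be sketched as follows. This is an illustration in Python/NumPy under stated assumptions, not the author's PRAAT script: it finds sign changes between adjacent samples and linearly interpolates the crossing time, which is one common way to locate a zero-crossing. The function name is hypothetical.

```python
import numpy as np

def nearest_zero_crossing(samples, sample_rate, time_s):
    """Return the zero-crossing time (s) closest to time_s.

    A crossing is counted between samples k and k+1 when they lie on
    opposite sides of zero (a sample exactly at zero counts as
    non-positive); its time is found by linear interpolation.
    """
    s = np.asarray(samples, dtype=float)
    nonpositive = s <= 0.0
    k = np.nonzero(nonpositive[:-1] != nonpositive[1:])[0]
    if k.size == 0:
        raise ValueError("signal has no zero-crossings")
    frac = s[k] / (s[k] - s[k + 1])          # fraction of a sample past k
    crossing_times = (k + frac) / sample_rate
    return float(crossing_times[np.argmin(np.abs(crossing_times - time_s))])

# Usage: a 100 Hz sine crosses zero every 5 ms.
sr = 10_000
t = np.arange(0, 0.1, 1 / sr)
wave = np.sin(2 * np.pi * 100 * t)
print(nearest_zero_crossing(wave, sr, 0.0123))
```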

3.2.2 Acoustic Analyses

All measurements described below were obtained using PRAAT scripts written by the author. All measurements were then entered into a MySQL database for later querying and statistical analyses. For spectral analyses based on the fast Fourier transform (FFT), a double-Kaiser window was used. A window is a weighting function applied to the time-domain data to reduce the spectral leakage associated with finite-duration signals. The weighting function peaks in the middle of the analysis frame and tapers to near zero at its edges, thus reducing the effects of the discontinuities at the frame boundaries; in the frequency domain, the window's transform consists of a main lobe and a series of side lobes. The ideal window is one with a narrow main lobe and low sidelobes (Harris 1978). However, there is a tradeoff between these two characteristics: narrowing the main lobe raises the sidelobe levels, and vice versa. Traditionally, Hamming and Hann windows have been used for spectral analyses in speech research. In our study, however, the Kaiser window, which offers a better compromise, is used. The Kaiser window is the best approximation to a Gaussian window given a certain ratio between physical length and effective length. More precisely, when weighting is used, a Kaiser window of double physical length is applied to the signal (Boersma and Weenink 2004). This window produces a bandwidth similar to that of a Hamming window of comparable effective width. However, a Hamming window has sidelobes of about −42 dB on each side of the main lobe, whereas these windowing artifacts are at a level of −190 dB for the Kaiser window (Figure 3–3). Most speech analysis software uses a Hamming (or Hann) window because evaluating a Kaiser window as explained above is slower by a factor of two, since the analysis is performed on twice as many samples per frame. With modern computers, this speed cost is minimal, hence our adoption of this weighting function.

Figure 3–3. Two window functions. A) The 0.1-second Hamming window. B) The 0.2-second Kaiser window. (Both panels plot sound pressure level (dB) against frequency (Hz) over 980–1020 Hz, with the main lobe and side lobes labeled.)
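The main-lobe/sidelobe tradeoff described above can be checked numerically. The sketch below, assuming NumPy, measures the highest sidelobe of a window's zero-padded spectrum relative to its main-lobe peak. It uses NumPy's standard `np.hamming` and `np.kaiser` functions; the Kaiser shape parameter beta = 14 is an arbitrary illustrative choice, and this is not a reimplementation of PRAAT's double-length Kaiser window, so it should not be expected to reproduce the −190 dB figure.

```python
import numpy as np

def peak_sidelobe_db(window, pad_factor=64):
    """Highest sidelobe level (dB relative to the main-lobe peak) of a window."""
    n_fft = len(window) * pad_factor        # zero-padding samples the spectrum finely
    magnitude = np.abs(np.fft.rfft(window, n_fft))
    # Bin 0 (DC) is the main-lobe peak for a non-negative window.
    magnitude_db = 20 * np.log10(magnitude / magnitude[0] + np.finfo(float).tiny)
    k = 1
    while k < len(magnitude) - 1 and magnitude[k + 1] < magnitude[k]:
        k += 1                              # walk down the main lobe to its first null
    return magnitude_db[k:].max()           # loudest point beyond the main lobe

hamming = np.hamming(1024)
kaiser = np.kaiser(1024, 14.0)              # beta = 14 chosen only for illustration
print(f"Hamming: {peak_sidelobe_db(hamming):.1f} dB")
print(f"Kaiser (beta = 14): {peak_sidelobe_db(kaiser):.1f} dB")
```

Raising beta pushes the Kaiser sidelobes further down at the cost of a wider main lobe, which is the same tradeoff described in the text.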

Pre-emphasis was applied to each spectral analysis interval to correct for the −6 dB per octave falloff in the spectrum of voiced speech. This falloff results from the −12 dB per octave slope of the glottal excitation source combined with the +6 dB per octave rise due to radiation at the lips. With pre-emphasis applied, the flattened spectrum is a function of the vocal tract alone. Pre-emphasis was applied as described in the PRAAT manual, as a filter that changes each sample x_j of the sound (except x_1), starting from the last sample, according to Equation (3–1), where Δt is the sampling period of the sound and F is the frequency above which the change is applied. In our study α was set to 0.98 and F to 50 Hz. The pre-emphasis filter was applied to the signal before windowing.

$$\alpha = \exp(-2\pi F \Delta t), \qquad x_j = x_j - \alpha\, x_{j-1} \tag{3–1}$$
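Equation (3–1) can be sketched in Python as below; the function names and the NumPy implementation are illustrative assumptions, not PRAAT's source code. Because the filter runs from the last sample backwards, every sample is updated using the original value of its predecessor, which is what the vectorized form here computes. With F = 50 Hz and the 22.05 kHz sampling rate used in this study, Equation (3–1) gives α ≈ 0.986.

```python
import math
import numpy as np

def preemphasis_alpha(from_frequency_hz, sample_rate_hz):
    """alpha = exp(-2 * pi * F * dt), with dt = 1 / sample rate (Equation 3-1)."""
    return math.exp(-2.0 * math.pi * from_frequency_hz / sample_rate_hz)

def pre_emphasize(samples, sample_rate_hz, from_frequency_hz=50.0):
    """Return x_j - alpha * x_{j-1} for every sample after the first."""
    x = np.asarray(samples, dtype=float)
    alpha = preemphasis_alpha(from_frequency_hz, sample_rate_hz)
    y = x.copy()                 # the first sample is left unchanged
    # Uses the ORIGINAL previous sample, as the back-to-front in-place update does.
    y[1:] = x[1:] - alpha * x[:-1]
    return y

alpha = preemphasis_alpha(50.0, 22_050.0)
print(f"alpha = {alpha:.4f}")
```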

3.2.2.1 Duration

Three temporal measurements were extracted based on the segmentation criteria described above: fricative, vowel, and word duration. Since different tokens of the same fricative had different stop burst durations, word duration was measured from fricative onset to the point where the release of the stop burst is visible on the spectrogram (Figure 3–4).

Figure 3–4. Duration measurements (fricative, vowel, and word intervals).
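Given the four boundary times from Section 3.2.1, the three durations are simple differences. A minimal sketch, with a hypothetical `SegmentedToken` container standing in for the values read from a TextGrid:

```python
from dataclasses import dataclass

@dataclass
class SegmentedToken:
    """Hypothetical container for the four boundary times (s) of one token."""
    fricative_onset: float
    fricative_offset: float   # also the vowel onset
    vowel_offset: float
    stop_release: float

    def durations(self):
        times = [self.fricative_onset, self.fricative_offset,
                 self.vowel_offset, self.stop_release]
        if any(b <= a for a, b in zip(times, times[1:])):
            raise ValueError("boundary times must be strictly increasing")
        return {
            "fricative": self.fricative_offset - self.fricative_onset,
            "vowel": self.vowel_offset - self.fricative_offset,
            # Word duration ends at the burst release, excluding the burst itself.
            "word": self.stop_release - self.fricative_onset,
        }

token = SegmentedToken(0.10, 0.25, 0.45, 0.52)
print(token.durations())
```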

3.2.2.2 Spectral Moments

Spectral moment measurements were modeled after those of Forrest et al. (1988), with the window-length modification employed by Jongman et al. (2000). After pre-emphasis was applied to the signal, FFT spectra were calculated at four different locations in the fricative with a 40 ms double-Kaiser window. The first three windows were aligned so that the first covered the initial 40 ms of the fricative, the second the middle 40 ms, and the third the final 40 ms of frication noise. The fourth window was centered over the fricative-vowel boundary so that it covered 20 ms of each, capturing any transitional information. Depending on the length of the frication noise, the analysis windows may overlap. Following Forrest et al. (1988), each FFT was treated as a random probability distribution from which the first four moments (mean, variance, skewness, and kurtosis) were calculated. Only moments from linear spectra were calculated, since previous research on fricatives (Jongman et al. 2000) reported no substantial difference between linear and Bark-transformed spectra. PRAAT computes the first moment (center of gravity) as in Equation (3–2), where S(f) is the complex spectrum, f is the frequency, and the denominator is the energy. The quantity p was set to 2 in order to weight the average frequency by the power spectrum (rather than by the absolute spectrum).

$$f_c = \frac{\int_0^\infty f\,|S(f)|^p\,df}{\int_0^\infty |S(f)|^p\,df} \tag{3–2}$$

The other three moments were first calculated using Equation (3–3), where n denotes the nth moment and f_c is the center of gravity from Equation (3–2). To normalize skewness with respect to different levels of variance, the result of Equation (3–3) with n = 3 was divided by the second moment raised to the power 1.5. Likewise, to normalize kurtosis, the result of Equation (3–3) with n = 4 was divided by the square of the second moment, and 3 was then subtracted (Forrest et al. 1988).

$$\mu_n = \frac{\int_0^\infty (f - f_c)^n\,|S(f)|^p\,df}{\int_0^\infty |S(f)|^p\,df} \tag{3–3}$$
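The window placement and moment calculations of this subsection can be sketched as follows. The sketch assumes NumPy and a spectrum sampled on a uniform frequency grid, so the df terms in Equations (3–2) and (3–3) cancel and the integrals become weighted sums. The function names are hypothetical, and the Kaiser window with beta = 8 in the usage lines is only a stand-in for PRAAT's double-Kaiser window.

```python
import numpy as np

def moment_windows(fricative_onset_s, fricative_offset_s, length_s=0.040):
    """(start, end) times of the four 40 ms analysis windows."""
    mid = (fricative_onset_s + fricative_offset_s) / 2.0
    half = length_s / 2.0
    return [
        (fricative_onset_s, fricative_onset_s + length_s),        # initial 40 ms
        (mid - half, mid + half),                                  # middle 40 ms
        (fricative_offset_s - length_s, fricative_offset_s),      # final 40 ms
        (fricative_offset_s - half, fricative_offset_s + half),   # 20 ms each side of boundary
    ]

def spectral_moments(freqs_hz, spectrum, p=2.0):
    """Mean, variance, normalized skewness, and excess kurtosis of |S(f)|^p."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    weights = np.abs(np.asarray(spectrum)) ** p
    weights = weights / weights.sum()          # discrete stand-in for the denominators
    mean = np.sum(freqs_hz * weights)          # Equation (3-2): center of gravity

    def central(n):                            # Equation (3-3)
        return np.sum((freqs_hz - mean) ** n * weights)

    variance = central(2)
    skewness = central(3) / variance ** 1.5
    kurtosis = central(4) / variance ** 2 - 3.0
    return mean, variance, skewness, kurtosis

# Usage sketch: windowed 40 ms segment -> FFT -> moments.
sr = 22_050
segment = np.random.default_rng(1).standard_normal(int(0.040 * sr))
spectrum = np.fft.rfft(segment * np.kaiser(len(segment), 8.0))
freqs = np.fft.rfftfreq(len(segment), d=1 / sr)
print(spectral_moments(freqs, spectrum))
```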

3.2.2.3 RMS Amplitude

Root-Mean-Square (RMS) amplitude in dB was measured over the entire frication noise. Since different speakers and recording sessions may result in different intensities, raw amplitude measures cannot be compared across speakers. Fricative amplitude was therefore normalized using the method described by Behrens and Blumstein (1988b): the average RMS amplitude (in dB) of three consecutive pitch periods at the point of maximum vowel amplitude was subtracted from the RMS amplitude of the entire frication noise. PRAAT gives RMS amplitude in pascals; these values were converted to dB using Equation (3–4).

$$\text{RMS Amplitude}_{\text{dB}} = 20 \times \log_{10}\left(\frac{\text{Amplitude}_{\text{pascal}}}{2 \times 10^{-5}}\right) \tag{3–4}$$
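The conversion in Equation (3–4) and the Behrens and Blumstein normalization can be sketched as follows. The function names are hypothetical, and averaging the three pitch periods in dB (rather than averaging pascals first) is an assumption based on the wording "average RMS amplitude (in dB)" above.

```python
import math

REFERENCE_PA = 2e-5  # 20 micropascals, the reference pressure in Equation (3-4)

def pascal_to_db(rms_pa):
    """Equation (3-4): RMS amplitude in pascals -> dB."""
    if rms_pa <= 0:
        raise ValueError("RMS amplitude must be positive")
    return 20.0 * math.log10(rms_pa / REFERENCE_PA)

def normalized_frication_rms_db(frication_rms_pa, pitch_period_rms_pa):
    """Frication RMS (dB) minus the mean dB RMS of three consecutive pitch periods."""
    if len(pitch_period_rms_pa) != 3:
        raise ValueError("expected three pitch periods")
    vowel_db = sum(pascal_to_db(a) for a in pitch_period_rms_pa) / 3.0
    return pascal_to_db(frication_rms_pa) - vowel_db

# Hypothetical values: 0.002 Pa frication noise vs. 0.02 Pa vowel pitch periods.
print(normalized_frication_rms_db(0.002, [0.02, 0.02, 0.02]))
```

A negative result, as in all the normalized values reported in Chapter 4, means that the frication noise is weaker than the vowel peak.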

3.2.2.4 Spectral Peak Location

The spectral peak location of the fricative was estimated using a 40 ms double-Kaiser window positioned over the middle of the frication noise. The analysis window was set this large to obtain better frequency resolution (Jongman et al. 2000). Another window was placed at the end of the frication noise such that its right shoulder was aligned with the end of the frication noise. The two window locations were used because studies of spectral peak location have demonstrated that high-frequency peaks are more likely to emerge at the middle and end of frication noise (Behrens and Blumstein 1988a). Further, as explained in Section 2.3.3.1, two patterns of peaks are anticipated for the pharyngealized fricatives because of their coarticulatory nature: one at the middle of the frication noise and the other at its end. After pre-emphasis and windowing were applied, an FFT spectrum was derived. A PRAAT script searched each spectrum to find the highest-amplitude peak and its associated frequency. As before, the amplitude was converted into dB using Equation (3–4).

3.2.2.5 Relative Amplitude

Relative amplitude was measured as described in Hedrick and Ohde (1993) and later in Jongman et al. (2000), with one additional modification. An FFT spectrum was derived at vowel onset with a 23.3 ms double-Kaiser window. The mean values of the first six formants in the windowed selection were estimated from the FFT spectrum. Each spectrum was then filtered using a band-pass Hann filter to isolate the regions of the second, third, and fifth formants, based on the mean values obtained above. Each region spanned from the mean frequency of the target formant to half the distance to each of the two adjacent formants. The upper and lower limits of such a region are given schematically in Equation (3–5).

$$\max F_i = \text{mean}F_i + \frac{\text{mean}F_{i+1} - \text{mean}F_i}{2}, \qquad \min F_i = \text{mean}F_i - \frac{\text{mean}F_i - \text{mean}F_{i-1}}{2} \tag{3–5}$$
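The region limits, the peak search, and the ±128 Hz fricative band can be combined into a short sketch. It assumes NumPy, dB spectra already computed on a shared frequency grid, and region limits that follow the verbal definition above (half the distance to each adjacent formant). The function names, the 0-based formant indexing, and the toy spectra are illustrative assumptions rather than the author's PRAAT scripts. The same peak-picking function also illustrates the spectral peak search of Section 3.2.2.4.

```python
import numpy as np

def formant_region(mean_formants_hz, i):
    """Limits (Hz) of the region around formant i (0-based: 1 = F2, 2 = F3, 4 = F5)."""
    f = mean_formants_hz
    if not 0 < i < len(f) - 1:
        raise ValueError("formant i needs a neighbour on each side")
    lower = f[i] - (f[i] - f[i - 1]) / 2.0
    upper = f[i] + (f[i + 1] - f[i]) / 2.0
    return lower, upper

def peak_in_band(freqs_hz, spectrum_db, lo_hz, hi_hz):
    """Frequency and level of the highest spectral value between lo_hz and hi_hz."""
    freqs_hz = np.asarray(freqs_hz, dtype=float)
    spectrum_db = np.asarray(spectrum_db, dtype=float)
    in_band = (freqs_hz >= lo_hz) & (freqs_hz <= hi_hz)
    if not in_band.any():
        raise ValueError("no spectral bins in band")
    k = np.argmax(np.where(in_band, spectrum_db, -np.inf))
    return freqs_hz[k], spectrum_db[k]

def relative_amplitude_db(freqs_hz, vowel_db, fricative_db, mean_formants_hz, i,
                          half_width_hz=128.0):
    """Fricative peak minus vowel peak (dB) in the region of formant i."""
    lo, hi = formant_region(mean_formants_hz, i)
    vowel_peak_hz, vowel_level = peak_in_band(freqs_hz, vowel_db, lo, hi)
    _, fricative_level = peak_in_band(freqs_hz, fricative_db,
                                      vowel_peak_hz - half_width_hz,
                                      vowel_peak_hz + half_width_hz)
    return fricative_level - vowel_level

# Toy spectra (dB) on a 10 Hz grid: a vowel peak at 2500 Hz, a fricative peak at 2550 Hz.
freqs = np.arange(0.0, 11025.0, 10.0)
vowel = np.zeros_like(freqs)
vowel[freqs == 2500.0] = 60.0
fric = np.zeros_like(freqs)
fric[freqs == 2550.0] = 45.0
formants = [500.0, 1500.0, 2500.0, 3500.0, 4500.0, 5500.0]
print(relative_amplitude_db(freqs, vowel, fric, formants, i=2))  # F3 region
```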

A PRAAT script searched each frequency region of the spectrum to find its spectral peak and associated amplitude, as described in Section 3.2.2.4 above. As in previous research on (English) fricatives, the spectral peak in the F5 region was used for the non-sibilant fricatives /f, T, D/, and the spectral peak in the F3 region for the sibilant fricatives /s, z, S/. For the remaining fricatives (/X, K, è, Q, h, sQ, DQ/), the spectral peak in the F2 region was used. Another FFT spectrum was derived at the middle of the frication noise and filtered into frequency regions based on the frequencies of the amplitude peaks in the F2, F3, and F5 regions of the vowel. Each region extended 128 Hz on either side of the corresponding vowel peak frequency. The amplitude of the spectral peak in each region was measured using the same procedure outlined above for the vowel. Relative amplitude was then defined for each frequency region as the ratio between fricative amplitude and vowel amplitude in that frequency range; on a log (dB) scale, this ratio is expressed as the difference between the two values.

3.2.2.6 Locus Equations

Following previous research on locus equations (for example Sussman et al. 1991, 1993; Fowler 1994; Sussman 1994; Yeou 1997; Govindarajan 1998; Jongman 1998; Jongman et al. 2000; Tabain 2002), locus equation coefficients were derived from scatterplots of F2 values measured at vowel onset and at vowel nucleus for each combination of speaker and place of articulation. Specifically, the second formant at vowel onset and at the middle of the vowel was estimated using the formant tracking procedure implemented in PRAAT. The sound was first resampled to 10 kHz and then pre-emphasized using the algorithm in Equation (3–1). After a Gaussian-like window of 25 ms was applied to the signal, the LPC coefficients were calculated for each analysis window using the algorithm by Burg, as given in Anderson (1978) and Press, Flannery, Teukolsky, and Vetterling (1992). For each speaker and place combination, linear regression fits were applied to scatterplots with F2 averaged across all vowel contexts. Each scatterplot had F2 measured at the onset of the vowel on the y-axis and F2 measured at the midpoint of the vowel on the x-axis. The coefficients of each regression line (the slope ‘k’ and the y-intercept ‘c’) were taken to be the terms of the locus equation.

3.2.2.7 F2 at Transition

The second formant at the transition was also measured from the first window (at vowel onset) used to derive F2 for the locus equations above.

3.3 Statistical Analyses

In addition to the descriptive statistics for the acoustic measures above, significant differences between places of articulation on these measures were assessed using appropriate Analysis of Variance (ANOVA) methods. All reported statistics were calculated from data points aggregated across the three repetitions for each speaker. Discriminant function analysis (DFA) was used to measure the contribution of different cues to the classification of fricatives into their respective classes. The DFA procedure reduces the physical space built by the extracted cues into subspaces corresponding to the sound classes under consideration (Jassem 1979). This classification method works by first forming vectors of the metrics described above. Recall that each cue above, except for the locus equations, represents the value of a single feature at a given point in time. Therefore, each token can be represented as a combination of values (a vector) from all these cues, and all the tokens are then represented as points, defined by their respective vectors, in a multidimensional space whose dimensionality depends on the number of parameters in use. The goal of DFA is to find the number and combination of parameters that provide the best classification accuracy of tokens into their predefined classes. This process involves calculating three types of probabilities: the probability of observing a particular parameter value p given class t (P[p | t]), the prior probability of class t in the data (P[t]), and the probability of observing a specific value p of the parameter (P[p]). All these probabilities are calculated from training data to predict the class membership of an unknown token in the testing data using Bayes' theorem (Equation 3–6). The value P[t | p] is the probability that an unknown token belongs to class t given a value of parameter p (Harrington and Cassidy 1999).

$$P[t \mid p] = \frac{P[p \mid t]\,P[t]}{P[p]} \tag{3–6}$$
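The Bayesian rule in Equation (3–6), combined with the leave-one-speaker-out procedure described below, can be sketched as follows. The sketch assumes Gaussian class-conditional densities P[p | t] with a covariance matrix shared by all classes, which is the standard linear discriminant analysis model; it is an illustration in NumPy, not the statistics package used in the study, and the function names and synthetic cue values are hypothetical. The denominator P[p] is the same for every class, so it can be dropped when comparing classes.

```python
import numpy as np

def fit_lda(X, y):
    """Estimate class means, priors P[t], and a pooled (shared) covariance."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    priors = np.array([np.mean(y == c) for c in classes])
    residuals = X - means[np.searchsorted(classes, y)]
    pooled_cov = residuals.T @ residuals / (len(X) - len(classes))
    return classes, means, priors, np.linalg.inv(pooled_cov)

def predict(model, X):
    """Pick the class maximizing log P[p | t] + log P[t] (P[p] is common to all)."""
    classes, means, priors, inv_cov = model
    # Gaussian log-likelihood with a shared covariance, dropping class-independent terms.
    scores = (X @ inv_cov @ means.T
              - 0.5 * np.sum(means @ inv_cov * means, axis=1)
              + np.log(priors))
    return classes[np.argmax(scores, axis=1)]

def leave_one_speaker_out(X, y, speakers):
    """Train on all speakers but one, test on the held-out speaker, repeat."""
    correct = 0
    for s in np.unique(speakers):
        held_out = speakers == s
        model = fit_lda(X[~held_out], y[~held_out])
        correct += np.sum(predict(model, X[held_out]) == y[held_out])
    return correct / len(y)

# Synthetic two-cue data: 4 speakers x 3 fricative classes x 10 tokens.
rng = np.random.default_rng(0)
centres = {"f": (0.0, 0.0), "s": (5.0, 0.0), "S": (0.0, 5.0)}  # made-up cue values
X = np.vstack([rng.normal(c, 0.5, size=(10, 2)) for _ in range(4) for c in centres.values()])
y = np.array([lab for _ in range(4) for lab in centres for _ in range(10)])
speakers = np.repeat(np.arange(4), 30)
print(leave_one_speaker_out(X, y, speakers))
```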

An unknown token is then classified as belonging to class A (t_a) rather than class B (t_b) if the condition P[p | t_a] P[t_a] > P[p | t_b] P[t_b] is satisfied (Harrington and Cassidy 1999). The traditional way of applying this method to fricative classification (see, for example, Shadle and Mair 1996; Tabain 1998; Jongman et al. 2000; Nissen 2003) uses tokens from all but one speaker as the training data and tokens from the remaining speaker as the testing data. The process is repeated so that each speaker serves in turn as the testing data. The DFA procedure produces a classification accuracy score along with a set of coefficients that represent the contribution of each parameter to the classification.

CHAPTER 4 AMPLITUDE AND DURATION

This chapter reports the results of the amplitude and duration measurements. These results were derived from a three-way ANOVA with place of articulation, voicing, and vowel context as between-subjects factors. Post hoc tests of significant effects were adjusted for multiple comparisons using the Bonferroni method. All data were aggregated across the three repetitions of each speaker prior to any statistical analysis.

4.1 Amplitude Measurements

4.1.1 Normalized Frication Noise RMS Amplitude

Normalized frication RMS amplitude was calculated as the difference between the frication noise RMS amplitude and the average RMS amplitude of three consecutive pitch periods at the point of maximum vowel amplitude. A three-way ANOVA with normalized frication noise RMS amplitude as the dependent variable and place of articulation, voicing, and vowel context as between-subjects factors revealed a significant main effect of Place [F (8, 561) = 75.241, p < 0.001; η2 = 0.518]. Because Arabic lacks a voicing contrast at some places of fricative articulation (labiodental, post-alveolar, and glottal), differences within voiceless fricatives and within voiced fricatives are interpreted separately. For both voiced and voiceless fricatives, subsequent Bonferroni post hoc tests showed that plain fricatives and their pharyngealized counterparts (/D - DQ/ and /s - sQ/) did not differ in normalized RMS amplitude (mean normalized RMS values are reported in Figure 4–1). However, with the exception of the contrast between the voiced alveolar and uvular fricatives (/z - K/), normalized RMS amplitude significantly (p < 0.0001) distinguished all places of voiced fricative articulation. Within voiceless fricatives, the nonsibilant fricatives /f, T/ had the lowest normalized RMS amplitude (−23.94 and −22.50 dB, respectively). While the values for /f/ and /T/ were not statistically different from each other, both were significantly lower than those of all other voiceless fricatives. No differences were obtained among /s, S, h/ or between /X, è/. All other contrasts were significant (Figure 4–1).
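The η2 values reported in this chapter appear to be partial eta squared: each can be recovered from its F ratio and degrees of freedom as η2 = F × df_effect / (F × df_effect + df_error). For example, the Place effect above gives 75.241 × 8 / (75.241 × 8 + 561) ≈ 0.518. A small helper (hypothetical name) for checking reported values:

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio: SS_effect / (SS_effect + SS_error)."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    # F = (SS_effect / df_effect) / (SS_error / df_error), so
    # SS_effect / SS_error = F * df_effect / df_error.
    return f_value * df_effect / (f_value * df_effect + df_error)

print(round(partial_eta_squared(75.241, 8, 561), 3))  # Place main effect -> 0.518
```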

Place of articulation        Voiceless   Voiced
Labiodental                  −23.94      –
Dental                       −22.50      −17.26
Pharyngealized Dental        –           −18.15
Alveolar                     −15.38      −14.53
Pharyngealized Alveolar      −16.55      –
Post-Alveolar                −14.40      –
Uvular                       −20.17      −13.66
Pharyngeal                   −19.09      −7.52
Glottal                      −14.01      –

Figure 4–1. Mean frication noise normalized RMS amplitude (dB) by place of articulation and voicing.

There was also a significant main effect of Vowel context [F (5, 561) = 16.185, p < 0.001; η2 = 0.126]. For short vowels, normalized frication RMS amplitude tended to decrease as the vowel context changed from /i/ to /u/ to /a/, with means of −16.51 dB, −17.03 dB, and −17.81 dB, respectively. The same pattern was observed with long vowels (/i:/ to /u:/ to /a:/, with means of −14.30 dB, −16 dB, and −18.58 dB, respectively). However, post hoc tests showed statistically significant vowel context differences with long vowels only (p = 0.004 for the /i: - u:/ contrast and p < 0.001 for all other contrasts). Additionally, as can be seen from Figure 4–2, when each short vowel is compared with its long counterpart, only the front long vowel /i:/ resulted in a significantly (p < 0.001) higher normalized frication RMS amplitude than its short counterpart /i/.

Figure 4–2. Mean frication noise normalized RMS amplitude (dB) by vowel context (/i/, /u/, /a/; short vs. long vowels).

Finally, a significant main effect of Voicing [F (1, 561) = 315.204, p < 0.001; η2 = 0.36] was also found. Normalized RMS amplitude of voiced fricatives (mean = −14.22 dB) was greater than that of voiceless fricatives (mean = −18.26 dB). In addition to this main effect, there was a significant Place by Voicing interaction [F (3, 561) = 41.9, p < 0.001; η2 = 0.183]. As can be seen in Figure 4–3, Bonferroni post hoc tests showed that the significant difference in normalized frication RMS amplitude between voiced and voiceless fricatives noted above was not present for the alveolar fricatives /s, z/.

Figure 4–3. Mean frication noise normalized RMS amplitude (dB) as a function of place of articulation (dental, alveolar, uvular, pharyngeal) and voicing.

4.1.2 Relative Amplitude of Frication Noise

Relative amplitude is defined here as the ratio between the amplitude of a specific frequency region (F5 for /f, T, D/, F3 for /s, z, S/, and F2 for /X, K, sQ, DQ, è, Q, h/) measured at the frication noise midpoint and the amplitude of the corresponding frequency region measured at vowel onset. Results of a three-way ANOVA (place × voice × vowel) with relative amplitude as the dependent variable showed a significant main effect of Place [F (8, 561) = 104.525, p < 0.001; η2 = 0.598]. In general, the relative amplitude of a fricative becomes greater as the place of articulation advances towards the lips (Figure 4–4). The only notable exception was the post-alveolar fricative /S/, the only fricative in which the frication amplitude measured in the F3 region was greater than the amplitude of the same frequency region at the following vowel onset (i.e., giving a relative amplitude above zero). Collapsed across voicing, differences in relative amplitude between all places of fricative articulation were significant, except for all pairwise comparisons among three places: alveolar /s, z/, pharyngeal /è, Q/, and glottal /h/. However, since the voicing contrast is not present at all places, Bonferroni post hoc tests carried out separately on voiced and voiceless fricatives showed a different pattern. Within voiced fricatives, the relative amplitude of the pharyngealized dental fricative /DQ/ was significantly lower than those of all other voiced fricatives, while those of the alveolar /z/, dental /D/, and uvular /K/ fricatives were not statistically different from one another. Furthermore, the difference in relative amplitude between /D/ and /Q/ was not significant. All other contrasts between voiced fricatives were significant (Figure 4–4). Within voiceless fricatives, relative amplitude differentiated /f/ (−5.22 dB) and /T/ (−5.45 dB) from all other fricatives; however, no significant difference was observed between these two nonsibilant fricatives. Relative amplitude also differentiated between all other voiceless fricatives, with the exception of the /s/–/è/, /s/–/h/, and /è/–/h/ contrasts. There was also a significant main effect of Vowel context [F (5, 561) = 11.642, p < 0.001; η2 = 0.094].
However, Bonferroni post hoc tests showed that this main effect could be attributed solely to the long vowel contexts. Specifically, the relative amplitude of fricatives followed by the high back vowel /u:/ (mean = −11.31 dB) was significantly higher (p < 0.0001) than that of fricatives preceding any other vowel except /i:/, which has the same height and length as /u:/. Another source of this main effect was the significantly lower (p < 0.016) relative amplitude of fricatives preceding the low vowel /a:/ (mean = −17.02 dB) compared with the other long vowels. Furthermore, there was a general trend for a short vowel to result in a lower relative amplitude than its long counterpart, with only the /u/–/u:/ contrast reaching significance (p < 0.05). Mean values for the relative amplitude of fricatives in different vowel contexts are presented in Table 4–1, where significant differences are marked with ∗.

Place of articulation        Voiceless   Voiced
Labiodental                  −5.22       –
Dental                       −5.45       −14.95
Pharyngealized Dental        –           −28.03
Alveolar                     −15.76      −16.28
Pharyngealized Alveolar      −31.23      –
Post-Alveolar                0.90        –
Uvular                       −22.66      −20.05
Pharyngeal                   −17.32      −11.78
Glottal                      −14.27      –

Figure 4–4. Mean relative amplitude (dB) of fricatives by place of articulation and voicing.

Table 4–1. Relative amplitude (dB) in different vowel contexts. Means are arranged in descending order.

Vowel  Mean     /i/   /u/   /a/   /i:/  /u:/  /a:/
/u:/   −11.31    ∗     ∗     ∗                 ∗
/i:/   −13.85                ∗                 ∗
/i/    −16.17                            ∗
/u/    −16.33                            ∗
/a:/   −17.02                      ∗     ∗
/a/    −18.61                      ∗     ∗

∗ significant difference at p < 0.05

The ANOVA also revealed a significant Place by Voicing interaction [F (3, 561) = 20.834, p < 0.001; η2 = 0.10]. Bonferroni post hoc tests showed that only the differences between voiceless and voiced dental fricatives /T, D/ (9.5 dB) and between voiceless and voiced pharyngeal fricatives /è, Q/ (−5.5 dB) were significant (Figure 4–5). However, no main effect of voicing was obtained.

A Place by Vowel context interaction was also significant [F (40, 561) = 4.101, p < 0.001; η2 = 0.226]. Multiple one-way ANOVAs, with Bonferroni post hoc tests corrected for multiple comparisons, were conducted for each place of articulation, with vowel contexts separated into long and short vowels. The results of these ANOVAs showed that for long vowels, the significant increase in relative amplitude before /u:/ mentioned above was present only for labiodental (/f/) (mean = 5.34 dB) and alveolar (/s, z/) (mean = −6.37 dB) fricatives. In addition, the relative amplitude of the pharyngealized alveolar (/sQ/) in the context of the low vowel /a:/ was significantly lower (mean = −38.21 dB) than in the context of the high vowels /i:/ (mean = −21.36 dB) and /u:/ (mean = −22.54 dB). Furthermore, unlike the absence of differences between long vowels of the same height observed above, the relative amplitude of the glottal fricative (/h/) in the context of /i:/ (mean = −10.21 dB) was significantly higher than in the context of /u:/ (mean = −20 dB) (Figure 4–7). As for short vowels, a similar pattern of significant differences was obtained. Specifically, the relative amplitude of labiodental (/f/) and alveolar (/s, z/) fricatives was significantly higher in the context of /u/ (mean = −1.31 and −10.64 dB respectively) than in the context of either /i/ (mean = −9.77 and −21.58 dB respectively) or /a/ (mean = −9.83 and −20.79 dB respectively). Moreover, the relative amplitude of the pharyngealized alveolar (/sQ/) in the context of the low vowel /a/ (mean = −39.07 dB) was significantly lower only than in the context of the high vowel /i/ (mean = −28.02 dB) (Figure 4–6). Mean values for relative amplitude of fricatives in different vowel contexts are also presented in Table (4–2).

Finally, a Vowel context by Voicing interaction was also found to be significant [F (5, 561) = 4.574, p < 0.001; η2 = 0.039]. Bonferroni post hoc tests were carried out on long and short vowels separately. In general, the relative amplitude of voiceless fricatives in a given vowel context was higher than that of voiced fricatives in the same context (Figure 4–8 and Figure 4–9); however, this difference was significant only with /i:/ (mean = −10.80 dB for voiceless and −18.71 dB for voiced).

Figure 4–5. Relative amplitude as a function of Place and Voicing.
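The η2 values reported with each F ratio in this chapter are consistent with partial eta squared, which can be recovered from F and its degrees of freedom. The text does not name the effect-size measure, so the sketch below assumes that identity and checks it against the three interactions just reported.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio and its degrees of freedom.

    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# (label, F, df_effect, df_error, eta^2 as reported in the text)
reported = [
    ("Place x Voicing", 20.834, 3, 561, 0.10),
    ("Place x Vowel context", 4.101, 40, 561, 0.226),
    ("Vowel context x Voicing", 4.574, 5, 561, 0.039),
]

for label, f, df1, df2, eta_reported in reported:
    eta = partial_eta_squared(f, df1, df2)
    print(f"{label}: computed {eta:.3f}, reported {eta_reported}")
```

The same function reproduces the other effect sizes in this chapter (for example, 0.417 for the main effect of Place on duration, F (8, 561) = 50.092), which supports reading η2 as partial η2 throughout.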

Figure 4–6. Relative amplitude (dB) as a function of place of articulation and short vowels.

Figure 4–7. Relative amplitude (dB) as a function of place of articulation and long vowels.

/a/ /i/ /u/ short long short long short long Voiced -18.88 -15.22Voiced -14.49 -21.54 -9.36 -18.67 -15.85 -9.83 -15.88 -6.91Voiced -21.31 -22.28 -22.71 -18.44 -18.45 -15.10 -22.58 -20.15 Voiced -12.98 -12.05 -14.65 -10.58 -10.78 -9.66 Voiceless -7.13 -5.26Voiceless -21.62 -6.51 -17.87 0.87 -11.46 -5.84Voiceless -7.55 -16.67 -19.30 -7.12 -16.52 -18.49 -29.88 -22.51 -27.48 -22.90 Voiceless -10.60 -7.04 -24.63 -21.55 -19.76 -20.35 Table 4–2. Mean relative amplitude of frication noise. Dental Uvular Glottal Voiceless -13.30 -10.21 -18.09 -20.00 -12.20 -11.80 Alveolar Pharyngeal Labiodental Voiceless -9.77 -7.12 -1.31 5.34 -9.83 -8.64 Post-Alveolar Voiceless -2.09 -1.05 3.73 7.96 -3.16 0.01 Pharyngealized Dental Voiced -26.30 -24.91 -28.53 -26.76 -32.02 -29.67 Pharyngealized Alveolar Voiceless -28.02 -21.36 -38.21 -22.54 -39.07 -38.21

Figure 4–8. Relative amplitude (dB) as a function of voicing and vowel context (short vowels).

Figure 4–9. Relative amplitude (dB) as a function of voicing and vowel context (long vowels).

4.2 Temporal Measurements

Two measures of fricative noise duration are reported here: absolute fricative duration and normalized fricative duration. For the latter, the ratio of fricative duration to word duration was calculated to normalize for differences in speaking rate. For each measure, a three-way ANOVA (place × voice × vowel context) was carried out, and subsequent post hoc tests were corrected for multiple comparisons using the Bonferroni method.

4.2.1 Absolute Duration of Frication Noise

A three-way ANOVA (place × voice × vowel context) with the duration of the frication noise as the dependent variable revealed a main effect of Place [F (8, 561) = 50.092, p < 0.001; η2 = 0.417]; the grand mean frication noise duration was 117.99 ms. Mean durations of frication noise as a function of place of articulation and voicing are presented in Figure 4–10. Averaged across voicing and vowel context, pharyngealized dental /DQ/ and glottal /h/ had the shortest durations, with means of 86.47 and 98.55 ms respectively. Due to the well-known effect of voicing on segmental duration (Cole and Cooper 1975; Manrique and Massone 1981; Baum and Blumstein 1987; Behrens and Blumstein 1988b; Crystal and House 1988; Pirello et al. 1997, among others), two sets of comparisons were made, one for voiced and the other for voiceless fricatives. Among voiced fricatives, alveolar /z/ was significantly longer than all other voiced fricatives, with a mean duration of 110.21 ms. No other differences among voiced fricatives reached significance (p < 0.05). On the other hand, contrasts within voiceless fricatives revealed that glottal /h/, with a mean duration of 98.55 ms, was significantly shorter than all other voiceless fricatives. Although no significant difference between the nonsibilants was observed, each of the nonsibilants /f/ and /T/ (127.86 and 131.68 ms respectively) was significantly shorter than each of the sibilants /s/, /sQ/, and /S/. Additionally, alveolar /s/ and its pharyngealized counterpart /sQ/ (means = 149.86 and 149.70 ms) were significantly longer than all other voiceless fricatives excluding /S/. As in the case of voiced fricatives, no significant differences were found among voiceless labiodental, dental, uvular, and pharyngeal fricatives or between pharyngealized fricatives and their plain counterparts (/sQ-s/).
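The Bonferroni-corrected post hoc comparisons above can be sketched as follows. The statistics package is not named in this section, so this assumes the usual procedure: multiply each raw pairwise p-value by the number of comparisons m in the family (equivalently, test each at α/m) and cap the result at 1. The family sizes assume that all pairwise contrasts among the eight voiceless places and among the five voiced places of Figure 4–10 each form one family; the p-values are hypothetical.

```python
from math import comb


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-adjust a family of p-values: multiply by family size, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def pairwise_family_size(n_groups: int) -> int:
    """Number of pairwise comparisons among n_groups levels: n(n-1)/2."""
    return comb(n_groups, 2)


# Voiceless places: /f, T, s, sQ, S, X, è, h/ (8 levels)
# Voiced places:    /D, DQ, z, K, Q/          (5 levels)
alpha = 0.05
for label, n_places in (("voiceless", 8), ("voiced", 5)):
    m = pairwise_family_size(n_places)
    print(f"{label}: {m} comparisons, per-comparison threshold {alpha / m:.4f}")

# Hypothetical raw p-values, for illustration only
print(bonferroni_adjust([0.001, 0.004, 0.03]))
```

Splitting the comparisons by voicing, as done here, keeps each family smaller (28 and 10 contrasts rather than 78 for all 13 fricatives), so each test is held to a less severe threshold.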

Figure 4–10. Absolute frication noise duration as a function of place and voice averaged across all vowel contexts and speakers.

Place of Articulation       Voiceless (ms)   Voiced (ms)
Labiodental                 127.86           –
Dental                      131.68           91.36
Pharyngealized Dental       –                86.47
Alveolar                    149.86           110.21
Pharyngealized Alveolar     149.70           –
Post-Alveolar               142.59           –
Uvular                      138.59           88.39
Pharyngeal                  134.84           83.82
Glottal                     98.55            –

Also, as expected, a main effect of Voicing was found [F (1, 561) = 721.75, p < 0.001; η2 = 0.563], with voiceless fricatives (mean 134.21 ms) being significantly longer than voiced fricatives (mean 92.05 ms). A Place by Voice interaction was also significant [F (3, 561) = 3.327, p < 0.05; η2 = 0.017]. Subsequent Bonferroni post hoc tests showed that the voicing difference was significant at all places of articulation with a voicing contrast (Figure 4–11). The interaction therefore probably stems from variation in the magnitude of the duration difference between voiced and voiceless fricatives at a given place: as is apparent from Figure 4–11, this difference was greater for uvular and pharyngeal than for dental and alveolar fricatives.
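Because Figure 4–10 gives per-place means alongside the voicing means reported above, the two can be cross-checked. Assuming equal numbers of tokens per place (so that the voicing mean is the unweighted mean of the place means), the sketch below reproduces the reported voiceless and voiced means and computes the per-place voicing differences behind the Place by Voice interaction.

```python
from statistics import mean

# Mean absolute frication noise duration (ms) by place and voicing, from Figure 4-10
voiceless = {
    "labiodental": 127.86, "dental": 131.68, "alveolar": 149.86,
    "pharyngealized alveolar": 149.70, "post-alveolar": 142.59,
    "uvular": 138.59, "pharyngeal": 134.84, "glottal": 98.55,
}
voiced = {
    "dental": 91.36, "pharyngealized dental": 86.47, "alveolar": 110.21,
    "uvular": 88.39, "pharyngeal": 83.82,
}

# Unweighted means over places (assumes a balanced design)
print(f"voiceless mean: {mean(voiceless.values()):.2f} ms")
print(f"voiced mean:    {mean(voiced.values()):.2f} ms")

# Voiceless minus voiced duration at each place that has a voicing contrast
for place in ("dental", "alveolar", "uvular", "pharyngeal"):
    diff = voiceless[place] - voiced[place]
    print(f"{place:>10}: {diff:.2f} ms")
```

The differences come out at about 50 ms for uvular and pharyngeal fricatives against about 40 ms for dental and alveolar ones, which is the pattern shown in Figure 4–11.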

Figure 4–11. Mean absolute frication noise duration for places with a voicing contrast.

Finally, a main effect of Vowel context [F (5, 561) = 4.708, p < 0.001; η2 = 0.04] was significant. However, post hoc tests showed that frication noise durations measured in the context of vowels of the same length did not differ significantly from each other. Instead, the main effect stemmed from the significantly longer duration of fricatives in the context of /i:/ (mean 123.25 ms) compared with all short vowels, and the significantly longer duration of frication noise in the context of /u:/ (mean 122.80 ms) compared with /a, u/ (Figure 4–12).

Figure 4–12. Mean absolute frication noise duration in different vowel contexts.

4.2.2 Normalized Duration of Frication Noise

Normalized frication noise duration is defined here as the ratio of fricative duration to word duration. As can be seen from Figure 4–13, normalized frication noise duration followed a pattern similar to the one observed for absolute duration. Specifically, averaged across voicing and vowel context, pharyngealized dental /DQ/ and glottal /h/ had the shortest normalized durations, with means of 0.27 and 0.31 respectively. The results of the three-way ANOVA revealed a main effect of Place [F (8, 561) = 49.82, p < 0.001; η2 = 0.415]. With comparisons separated according to voicing, Bonferroni post hoc tests showed, as was the case with absolute duration, that /z/ (mean 0.34) was significantly longer than all other voiced fricatives. No significant differences were observed among voiced dental, uvular, and pharyngeal fricatives or between the pharyngealized dental and its plain counterpart (i.e., /DQ - D/). As for contrasts within voiceless fricatives, glottal /h/, with a mean normalized duration of 0.307, was significantly shorter than all other voiceless fricatives. Moreover, voiceless alveolar /s/ was significantly longer than all other voiceless fricatives except the post-alveolar and pharyngealized alveolar fricatives /S, sQ/, which in turn were significantly longer than the labiodental, pharyngeal, and glottal fricatives /f, è, h/. No other difference among voiceless fricatives reached significance (p < 0.05).
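The normalization divides out overall speaking rate: if a speaker slows down uniformly, fricative and word durations scale together and their ratio stays the same. The minimal sketch below illustrates this; the `Token` record and its field names are hypothetical, and the durations are invented for illustration rather than taken from the recordings.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """One recorded word (hypothetical record layout)."""
    fricative: str
    fricative_ms: float  # duration of the frication noise
    word_ms: float       # duration of the whole word


def normalized_duration(token: Token) -> float:
    """Frication noise duration divided by word duration (unitless)."""
    if token.word_ms <= 0:
        raise ValueError("word duration must be positive")
    return token.fricative_ms / token.word_ms


# Invented durations: the same word produced at a slower and a faster rate
slow = Token("s", fricative_ms=180.0, word_ms=450.0)
fast = Token("s", fricative_ms=120.0, word_ms=300.0)
print(normalized_duration(slow), normalized_duration(fast))
```

Both tokens give a ratio of 0.4 despite a 60-ms difference in absolute frication duration.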

Figure 4–13. Mean normalized frication noise duration as a function of place and voice averaged across all vowel contexts and speakers.

Place of Articulation       Voiceless   Voiced
Labiodental                 0.375       –
Dental                      0.370       0.284
Pharyngealized Dental       –           0.266
Alveolar                    0.412       0.335
Pharyngealized Alveolar     0.405       –
Post-Alveolar               0.401       –
Uvular                      0.388       0.276
Pharyngeal                  0.379       0.263
Glottal                     0.307       –

The effect of Voicing on normalized fricative duration was also significant [F (1, 561) = 724.74, p < 0.001; η2 = 0.564]. Averaged across other conditions, voiced fricatives had significantly shorter normalized durations (mean = 0.29) than voiceless fricatives (mean = 0.38). In addition, there was a significant Place by Voicing interaction [F (3, 561) = 7.079, p < 0.001; η2 = 0.036], and subsequent Bonferroni post hoc tests showed that the voicing difference was greater for uvular and pharyngeal than for dental and alveolar fricatives (Figure 4–14).

Figure 4–14. Mean normalized frication noise duration for places with a voicing contrast.

Finally, as shown in Figure 4–15, normalized frication noise duration was significantly affected by Vowel context [F (5, 561) = 8.862, p < 0.001; η2 = 0.073]. However, Bonferroni post hoc tests suggested that this effect was confined to contrasts involving long vowels. Specifically, while no significant differences were observed among short vowels, normalized frication noise duration was significantly shorter (mean = 0.32) in the context of /a:/ than in the context of all other vowels. On the other hand, fricatives preceding /i:/ had a significantly longer normalized duration (mean 0.35) than those preceding the other long vowels.

Figure 4–15. Mean normalized frication noise duration in different vowel contexts.

CHAPTER 5 SPECTRAL MEASUREMENTS

5.1 Spectral Peak Location

This chapter reports the results of the spectral measurements, which include spectral peak location (the frequency region of the energy maximum in the frication noise) and spectral moments (mean, variance, skewness, and kurtosis). As mentioned in Section (3.2.2.4), spectral peak frequencies were measured at the center as well as at the end of the frication noise. First, mean spectral peak location obtained from the two locations was used as the dependent variable in a one-way ANOVA to test for the effect of analysis window location. The ANOVA showed a main effect for Window Location [F (1, 1246) = 1022.9, p < 0.001; η2 = 0.451]. Mean spectral peak location measured at the middle of the frication noise (4323 Hz) was higher than when measured at the end of the frication noise. However, a three-way ANOVA (place × vowel × voicing) with spectral peak measured at the end of the frication noise as the dependent variable showed no significant effect for place. Therefore, only the results of measurements derived from the middle of the frication noise are reported in detail below. Table 5–1 presents the mean frequency of spectral peak location obtained from a 40-ms Kaiser window placed at the middle of the frication noise of all fricatives in different vowel contexts, averaged across speakers and repetitions.

Results of a three-way ANOVA (place × vowel × voicing) with spectral peak measured at the middle of the frication noise as the dependent variable revealed a main effect for Place [F (8, 561) = 143.402, p < 0.001; η2 = 0.672]. The general trend, when averaged across speakers and vowel context, is that the frequency of the peak decreases as the place of articulation moves backwards in the oral cavity. Since a voicing contrast is not present at some places of fricative articulation in Arabic, Bonferroni post hoc tests for the simple main effect of place were conducted separately for voiced and voiceless fricatives; that is, differences within voiceless fricatives and within voiced fricatives are interpreted separately. Mean frequencies of the spectral peak of fricatives separated by place and voicing are presented in Figure (5–1). Among voiceless fricatives, three homogeneous groups of fricatives articulated at adjacent places emerged, with differences in spectral peak location significant only for contrasts between members of different groups. The first group included labiodental, dental, and alveolar fricatives (/f, T, s/); the second included post-alveolar and uvular fricatives (/S, X/); and the third consisted of pharyngeal and glottal fricatives (/è, h/). As for voiced fricatives, only the difference between /K/ and /Q/ was not significant. In addition, no significant difference was observed between plain fricatives and their pharyngealized counterparts (/D - DQ/ or /s - sQ/). Another main effect was observed for Voicing [F (1, 561) = 152.388, p < 0.001; η2 = 0.214], in which the frequency of spectral peak location for voiceless fricatives (mean = 4957 Hz) was significantly greater than that of voiced fricatives (mean = 3279 Hz). However, a significant Place by Voicing interaction [F (3, 562) = 26.48, p < 0.001; η2 = 0.124] and subsequent Bonferroni post hoc comparisons within places that have a voicing contrast showed that the difference between voiceless and voiced fricatives was not significant for alveolar fricatives (/s, z/). Also, as is apparent from Figure (5–2), the difference was most prominent for the nonsibilant dental fricatives (/T, D/).
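The peak measurement described above (a 40-ms Kaiser window placed at the middle of the frication noise, with the peak taken as the frequency of maximum spectral amplitude) can be sketched as follows. This is a sketch under stated assumptions, not the analysis software actually used: the sampling rate (22050 Hz), the Kaiser shape parameter (β = 6), and an FFT length equal to the window length are all hypothetical choices, and a synthetic tone stands in for a labelled fricative.

```python
import numpy as np


def spectral_peak_hz(signal: np.ndarray, fs: float, onset_s: float, offset_s: float,
                     win_ms: float = 40.0, beta: float = 6.0) -> float:
    """Frequency (Hz) of the largest spectral amplitude in a Kaiser-windowed
    slice centred on the midpoint of the frication noise.

    beta (Kaiser shape) and the FFT length (= window length) are assumptions.
    """
    n = int(round(fs * win_ms / 1000.0))                  # window length in samples
    centre = int(round(fs * (onset_s + offset_s) / 2.0))  # midpoint of the noise
    start = centre - n // 2
    frame = signal[start:start + n]
    if start < 0 or len(frame) < n:
        raise ValueError("analysis window falls outside the signal")
    magnitude = np.abs(np.fft.rfft(frame * np.kaiser(n, beta)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return float(freqs[np.argmax(magnitude)])


# Synthetic check: a 4 kHz tone standing in for a fricative from 50 to 250 ms
fs = 22050.0
t = np.arange(int(0.3 * fs)) / fs
x = np.sin(2 * np.pi * 4000.0 * t)
print(spectral_peak_hz(x, fs, onset_s=0.05, offset_s=0.25))
```

With a 40-ms window the frequency resolution is 1/0.04 s = 25 Hz, so peak locations are only defined to within one 25-Hz bin unless the FFT is zero-padded.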
A main effect for Vowel context was also significant [F (5, 561) = 8.473, p < 0.001; η2 = 0.07]. While no significant differences between vowels differing only in length were present (Figure 5–3), subsequent post hoc tests adjusted for multiple comparisons using the Bonferroni method showed that the frequency of spectral peak location measured in the context of either /u/ or /u:/ was significantly lower than that measured in the context of either /i/ or /i:/. Moreover, the spectral peak of fricatives preceding /u/ was significantly lower in frequency than in the context of all other vowels, except /u:/ as noted above.

/a/ /i/ /u/ short long short long short long Voiced 6720 8079 5228 5283 7124 7237 Voiced 1872 2153Voiced 1414 763 1368 1139 2186 2104 640 641 900 1162 Voiced 4115 5838 2559 3823 2942 1788 Voiceless 8016 7686 5583 5801 7389 7270 Voiceless 3206 3238Voiceless 3927 2493 3398 2545 3323 2651 3767 2414 2203 2298 Voiceless 7686 8275 7426 7513 7879 7248 Dental Uvular Glottal Voiceless 2243 2363 935 1149 1776 2042 Alveolar Pharyngeal Labiodental Voiceless 8144 7210 7031 6241 7613 7940 Post-Alveolar Voiceless 3486 3690 3327 3668 3348 3495 Pharyngealized Dental Voiced 3413 4249 3101 2767 3702 4047 Pharyngealized Alveolar Voiceless 7135 6875 4738 6147 6972 7137 Table 5–1. Mean frequency (Hz) of amplitude peak as measured at the middle of frication noise.

Figure 5–1. Mean spectral peak location as a function of place and voicing.

Place of Articulation       Voiceless (Hz)   Voiced (Hz)
Labiodental                 7363             –
Dental                      7671             3511
Pharyngealized Dental       –                3547
Alveolar                    6958             6612
Pharyngealized Alveolar     6501             –
Post-Alveolar               3502             –
Uvular                      3476             1850
Pharyngeal                  2434             874
Glottal                     1751             –

Figure 5–2. Place of articulation and voicing interaction for spectral peak location.
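As with duration, the voicing means for spectral peak location can be cross-checked against the per-place values in Figure 5–1. Assuming equal cell sizes (unweighted means over places), the sketch below reproduces the reported means of 4957 Hz (voiceless) and 3279 Hz (voiced) and computes the voicing difference at each place with a contrast.

```python
from statistics import mean

# Mean spectral peak location (Hz) by place and voicing, from Figure 5-1
voiceless = {"labiodental": 7363, "dental": 7671, "alveolar": 6958,
             "pharyngealized alveolar": 6501, "post-alveolar": 3502,
             "uvular": 3476, "pharyngeal": 2434, "glottal": 1751}
voiced = {"dental": 3511, "pharyngealized dental": 3547, "alveolar": 6612,
          "uvular": 1850, "pharyngeal": 874}

# Unweighted means over places (assumes a balanced design)
print(round(mean(voiceless.values())), round(mean(voiced.values())))

# Voiceless minus voiced peak frequency at each place with a voicing contrast
diffs = {p: voiceless[p] - voiced[p] for p in ("dental", "alveolar", "uvular", "pharyngeal")}
print(diffs)
print("largest voicing difference:", max(diffs, key=diffs.get))
```

The dental contrast gives by far the largest difference (4160 Hz) and the alveolar contrast the smallest (346 Hz), in line with the interaction described for Figure (5–2).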

Figure 5–3. Frequency of spectral peak location in different vowel contexts.

A significant Place by Vowel context interaction [F (40, 561) = 1.441, p < 0.05; η2 = 0.093] and subsequent Bonferroni post hoc tests showed that the effect of vowel context mentioned above was confined to alveolar and glottal fricatives. As is apparent from Figure (5–4) and Figure (5–5), both /u/ and /u:/ resulted in a significantly lower frequency of spectral peak location in alveolar fricatives than all other vowels. In the case of the glottal fricative /h/, the short high back vowel /u/ (mean = 935 Hz) produced a significantly lower spectral peak frequency only when compared with /i/ and /i:/ (means = 2243 Hz and 2363 Hz respectively). Although the spectral peak of /sQ/ in the context of /u/ was about 2396 Hz lower in frequency than in the context of /a, i/, this difference was only marginally significant (p = 0.051).

Figure 5–4. Mean frequency of spectral peak location as a function of place and short vowels.

5.2 Spectral Moments

The first four statistical moments were computed from three 40-ms windows located at the onset, middle, and offset of the frication noise, and from a 40-ms window centered at the fricative offset to capture any transitional information into the vowel. In this section, two analyses are presented for each moment. First, to capture the general trend of the spectral moments, separate one-way ANOVAs were conducted for place and voice with the moments pooled across window locations as dependent variables. Additionally, a preliminary one-way ANOVA testing for differences between moments computed at different windows showed a main effect of window location for all moments. Therefore, separate three-way ANOVAs (place × vowel × voicing) with subsequent Bonferroni post hoc tests were also conducted for each combination of moment and window location. A summary of the spectral moments collapsed across speakers, vowel contexts, and window locations is presented in Table (5–2).

Figure 5–5. Mean frequency of spectral peak location as a function of place and long vowels.

5.2.1 Spectral Mean
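The spectral mean examined in this subsection is the first of the four moments computed from each 40-ms window. A common way to obtain them, and the one assumed in this sketch, is to treat the window's power spectrum as a probability distribution over frequency. The window taper (Hamming), the sampling rate, and the exact placement of the onset, offset, and transition windows are hypothetical choices not specified in this section. Variance is divided by 10^6, giving the "MHz" unit used in Table (5–2), and kurtosis is returned as excess kurtosis (0 for a normal distribution).

```python
import numpy as np


def frame_at(signal: np.ndarray, fs: float, start_s: float, win_s: float = 0.040) -> np.ndarray:
    """Slice a window of win_s seconds starting at start_s."""
    start = int(round(start_s * fs))
    n = int(round(win_s * fs))
    frame = signal[start:start + n]
    if start < 0 or len(frame) < n:
        raise ValueError("window falls outside the signal")
    return frame


def window_starts(onset_s: float, offset_s: float, win_s: float = 0.040) -> dict[str, float]:
    """Start times of the four analysis windows (assumed placement)."""
    mid = (onset_s + offset_s) / 2.0
    return {
        "onset": onset_s,                      # first 40 ms of the noise
        "middle": mid - win_s / 2.0,           # centred on the noise midpoint
        "offset": offset_s - win_s,            # last 40 ms of the noise
        "transition": offset_s - win_s / 2.0,  # centred on the fricative-vowel boundary
    }


def spectral_moments(frame: np.ndarray, fs: float) -> tuple[float, float, float, float]:
    """Mean (Hz), variance (MHz), skewness and excess kurtosis of one frame.

    The power spectrum is normalised to sum to 1 and treated as a
    probability distribution over frequency.
    """
    power = np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2  # assumed Hamming taper
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = power / power.sum()
    m1 = np.sum(freqs * p)
    var = np.sum((freqs - m1) ** 2 * p)
    skew = np.sum((freqs - m1) ** 3 * p) / var ** 1.5
    kurt = np.sum((freqs - m1) ** 4 * p) / var ** 2 - 3.0
    return float(m1), float(var / 1e6), float(skew), float(kurt)


# Synthetic 4 kHz tone standing in for a labelled fricative from 50 to 250 ms
fs = 22050.0
t = np.arange(int(0.3 * fs)) / fs
x = np.sin(2 * np.pi * 4000.0 * t)
for name, start in window_starts(0.05, 0.25).items():
    m1, var_mhz, skew, kurt = spectral_moments(frame_at(x, fs, start), fs)
    print(f"{name:>10}: mean {m1:.0f} Hz, variance {var_mhz:.4f} MHz")
```

A pure tone concentrates the spectrum in one place, so its mean sits at the tone frequency and its variance is near zero. Broadband frication noise, by contrast, spreads energy widely, which is why variance separates the diffuse nonsibilants from the more compact sibilants.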

One-way ANOVAs for place and voicing were carried out with spectral mean measurements pooled across the four window locations as the dependent variable. The ANOVA revealed a main effect for Place of articulation [F (8, 2487) = 210.567, p < 0.001; η2 = 0.403]. Subsequent Bonferroni post hoc tests were conducted for voiceless and voiced fricatives separately. For voiced fricatives, spectral mean was highest for alveolar /z/ (5935 Hz) and lowest for pharyngeal /Q/ (1547 Hz). Differences in spectral mean for all contrasts within voiced fricatives were significant, with the exception of the contrast between plain dental /D/ and its pharyngealized counterpart /DQ/. As for voiceless fricatives, alveolar /s/ had the highest spectral mean (5546 Hz), while glottal /h/ had the lowest (2513 Hz). Also, with the exception of the nonsibilants (/f, T/), spectral mean tends to decrease as the place of articulation moves towards the back of the mouth. Additionally, as was the case for spectral peak location (Section 5.1), three groups of fricatives articulated at adjacent places (/f, T, s, sQ/, /S, X/, and /è, h/) showed no statistically significant within-group differences; only comparisons involving members of different groups were significant. The only exception to this general observation was in the first group, in which the contrast between labiodental /f/ (4802 Hz) and alveolar /s/ (5546 Hz) was significant. A main effect was also obtained for Voicing [F (1, 2494) = 59.025, p < 0.001; η2 = 0.023]. Collapsed across all speakers, places, and vowel contexts, voiceless fricatives had higher spectral means (4181 Hz) than voiced fricatives (3557 Hz).

As mentioned above, values for spectral mean measured at different window locations differed significantly [F (3, 2492) = 326.978, p < 0.001; η2 = 0.28]. Therefore, separate three-way ANOVAs (place × vowel × voicing) were carried out for spectral mean at each window location. There was a main effect for place of articulation at all window locations, with η2 values of 0.736 (window 1), 0.830 (window 2), 0.790 (window 3), and 0.602 (window 4). This range of η2 values indicates that spectral information measured at these windows contributed to varying degrees to the separation of fricatives by place of articulation. This observation was confirmed by post hoc tests performed on voiced and voiceless fricatives separately. For voiced fricatives, across all windows, alveolar /z/ had the highest spectral mean while pharyngeal /Q/ had the lowest. Additionally, spectral mean distinguished between all places of voiced fricatives in all windows, with the exception of the contrast between /D/ and /DQ/ in the first three windows and between any combination of /K/, /Q/, and /DQ/ in the fourth window (Figure 5–6). On the other hand, differences between voiceless fricatives in spectral mean measured at different windows were not as categorical as in the case of voiced fricatives. Nevertheless, as noted above, three clusters of fricatives articulated at adjacent places (/f, T, s, sQ/, /S, X/, and /è, h/) emerged as distinct groups with no significant within-group differences in spectral mean measured at the second (middle) and third (offset) windows. However, all comparisons between members of different groups were significant, with spectral mean decreasing as the articulation moved backwards in the mouth (Figure 5–6). Furthermore, spectral mean measured at the first (onset) window significantly differentiated between all places, with the exception of all possible contrasts involving /T, s, sQ/ and the contrast between /è/ and /h/. At the fourth (transitional) window, only alveolar /s/ was significantly different from all other voiceless fricatives. Moreover, at the onset and transitional windows, the differences observed elsewhere between /f/ and /T/ were not significant (Figure 5–6).

There was also a main effect for Voicing in all four windows. As can be seen from Figure (5–7), spectral mean for voiceless fricatives was significantly higher than for voiced fricatives in the first three windows and significantly lower at the last (transitional) window. Additionally, a significant Place by Voicing interaction (Figure 5–8) revealed that alveolar fricatives /s, z/ did not differ significantly in spectral mean in any but the fourth window, at which the /s - z/ contrast was the only one reaching significance (p < 0.05). Finally, there was a main effect for Vowel context at all four windows. Spectral mean was highest for fricatives preceding /i/ and /i:/, and lowest for fricatives preceding either /u/ or /u:/. Pairwise comparisons for the different vowel contexts at each window showed that the difference between either of the high front vowels (/i, i:/) and either of /u/ and /u:/ was significant at all window locations. Additionally, the spectral mean of fricatives in the context of /i, i:/ was significantly higher than in the context of /a, a:/ at the fourth (transitional) window (Figure 5–9).

0.93 0.89 5.23 11.74 1.15 0.74 6.48 1.51 2.96 2.10 2.38 13.69 0.45 0.19 1.57 2.34 0.65 1.79 2.25 0.69 0.70 0.84 1.33 -0.06 6.91 5.26 4.38 1.46 4.39 5.97 7.45 3.61 (MHz) 4633 6.45 5740 4.83 3024 4.39 2034 1.96 (Hz) Voiced 3999 Voiced 5935 Voiced 2396 Voiced 1547 Voiceless 5266 5.99 0.25 0.72 Voiceless 5546 4.39 0.44 1.05 Voiceless 3652 4.40 1.36 3.97 Voiceless 2522 2.45 2.42 9.79 Dental Uvular Glottal Voiceless 2513 4.43 1.76 4.56 Alveolar Pharyngeal Table 5–2. Spectral moments for place and voice averaged across all window locations. Labiodental Voiceless 4802 Post-Alveolar Voiceless 3888 Place Spectral Mean Variance Skewness Kurtosis of Articulation Pharyngealized Dental Voiced 3910 Pharyngealized Alveolar Voiceless 5257

5.2.2 Spectral Variance

One-way ANOVAs for Place and Voice were conducted with spectral variance averaged across all window locations. A main effect for Place of articulation was obtained [F (8, 2487) = 206.936, p < 0.001; η2 = 0.399], with the lowest variance observed for sibilants and back articulated fricatives while the highest variance was observed for nonsibilants. Table (5–2) shows mean variance values for all fricatives measured in Megahertz (MHz). Bonferroni post hoc tests showed that 75

7000 C C

6000 C

5000 BG B G A G 4000 C B Spectral Mean (Hz)

3000 E E Spectral Mean (Hz) B Place of Articulation E A Labiodental 2000 G B Dental N N NE N C Alveolar D Post-Alveolar E Uvular N Pharyngeal BC Pharyngealized 7000 A G H Dental Pharyngealized C H 6000 BC H Alveolar H B A M Glottal 5000 A DE D D 4000 E E B

Spectral Mean3000 (Hz) N MN C

Spectral Mean (Hz) M NM H BD 2000 M AEN

onsetonset middlemiddle offseto!set transitiontransition WindowWindow Location Location Figure 5–6. Spectral mean (Hz) averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless. 76

6000

Voiced 5000 Voiceless

4000

3000

Spectral Mean (Hz) 2000

1000

0 onset middle offset transition 1 Window2 Location3 4 Window Location Figure 5–7. Spectral mean (Hz) averaged across place and vowel contexts for each window as a function of voicing. 77

8000 8000

voiced voicceless 6000 6000

4000 4000 Spectral Mean (Hz)

Spectral Mean (Hz) 2000 2000

0 0 Dental Alveolar Uvular Pharyngeal Dental Alveolar Uvular Pharyngeal AB

8000 8000

6000 6000

4000 4000

Spectral Mean (Hz) 2000 2000 Spectral Mean (Hz)

0 0 Dental Alveolar Uvular Pharyngeal Dental Alveolar Uvular Pharyngeal CD

Figure 5–8. Place of articulation and voicing interaction for spectral mean at four window locations. A) onset, B) middle, C) offset, and D) transition. 78

6000 6000 short long

4000 4000

2000 2000 Spectral Mean (Hz) Spectral Mean (Hz)

0 0 / i / / u / / a / / i / / u / / a / AB

6000 6000

4000 4000

2000 2000 Spectral Mean (Hz) Spectral Mean (Hz)

0 0 / i / / u / / a / / i / / u / / a / CD

Figure 5–9. Spectral mean as a function of vowel context at four window locations. A) onset, B) middle, C) offset, and D) transition. 79 within voiced fricatives, spectral variance did not differentiate between plain dental (/D/) and its pharyngealized counterpart (/DQ/). However, all other comparisons within voiced fricatives were significant (p < 0.001). As for voiceless fricatives, spectral variance for the nonsibilants /f, T/ was significantly higher than those of all other places. However, spectral variance for the /f/ and /T/ themselves was not significantly different. Moreover, spectral variance for /S/ and /è/ was significantly lower than that of all other places. Another main effect was observed for Voicing [F (1, 2494) = 39.778, p < 0.001; η2 = 0.016] with voiced fricatives having higher variance (5.09 MHz) than voiceless fricatives (4.45 MHz). Since a one-way ANOVA showed that overall spectral variance differed significantly as a function of Window Location [F (3, 2492) = 33.742, p < 0.001; η2 = 0.04], multiple three-way ANOVAs (place × vowel × voicing) were carried out for spectral variance at each window location. The ANOVAs revealed a main effect for Place of Articulation [F (8, 561) = 104.502 (onset), 98.597 (middle), 137.024 (offset), 55.05 (transition); p < 0.001; η2 = 0.6 (onset), 0.58 (middle), 0.66 (offset), 0.44 (transition)]. As apparent from Figure (5–10), for both voiced and voiceless fricatives, nonsibilants (/f, T, D, DQ/) had the highest variance while pharyngeal fricatives (/è, Q/) had the lowest variance. Pairwise comparisons within voiced fricatives showed that only the difference between /D - DQ/ was not significant at all windows. With the exception of the /D - DQ/ contrast, spectral variance differentiated between all places of articulation within voiced fricatives at all window locations. On the other hand, spectral variance did not differentiate between voiceless fricatives in the same manner as it did with voiced fricatives. 
Specifically, spectral variance was able to distinguish between any combination of voiceless fricatives either at the second or the third window (Figure 5–10). The only exceptions are the expected lack of difference between /s, sQ/ and the insignificant difference between /h, sQ/ at all windows. Additionally, as with voiced 80 fricatives, nonsibilant fricatives (/f, T/) had significantly higher variance than all other voiceless fricatives in at least three of the four analysis windows. As mentioned previously, a main effect of Voicing was observed with the overall spectral variance. However, ANOVA’s conducted for individual windows revealed that such effect was only present at the second (middle) window [F (1, 561) = 9.973, p < 0.001; η2 = 0.017] with the expected increase in variance for voiced fricatives (5.4 MHz compared to 4.5 MHz for voiceless fricatives). Nevertheless, a significant Place by Voicing interaction was present at all analysis windows. Bonferroni post hoc tests showed that the increase in spectral variance for voiced fricatives as compared to voiceless fricatives was significant only for dentals (/T, D/) at the second window; and for alveolars (/s, z/) at fourth window. Another source of the interaction, as can be seen from Figure (5–11), is due to an increase in spectral variance for voiceless, rather than voiced, pharyngeal fricatives. Such an increase, and subsequent shift in the voicing effect, was present at all windows but significant only at the fricative-vowel boundary (windows three and four). There was also a main effect for Vowel context (p < 0.0001) in all but the first analysis window. 
Post hoc tests revealed two sources for this effect: first, spectral variance was significantly higher for fricatives preceding /u/ or /u:/ than for fricatives preceding all other vowels in the second (middle) and third (offset) windows (Figures 5–12A and B); and second, the variance of fricatives preceding /i/ and /i:/ was significantly higher than that of fricatives preceding /a/ or /a:/ in the fourth window (Figure 5–12C).

5.2.3 Spectral Skewness

A one-way ANOVA for spectral skewness across all window locations showed a significant main effect for Place [F (8, 2487) = 137.975, p < 0.001; η2 = 0.31], with skewness ranging from 2.34 for pharyngeal (/è, Q/) to 0.19 for alveolar fricatives (/s, z/). Subsequent Bonferroni post hoc tests indicated that for both voiced and

Figure 5–10. Spectral variance (MHz) averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.

Figure 5–11. Place of articulation and voicing interaction for spectral variance at four window locations. A) onset, B) middle, C) offset, and D) transition.

Figure 5–12. Spectral variance as a function of vowel context at three window locations. A) middle, B) offset, and C) transition.

voiceless fricatives, skewness did not differentiate between plain fricatives and their pharyngealized counterparts (/D - DQ, s - sQ/). Apart from this exception, however, all voiced fricatives were significantly different from each other in terms of skewness (means are reported in Table (5–2)). Within voiceless fricatives, skewness significantly differentiated between the nonsibilants /f/ and /T/ (0.7 and 0.25, respectively). However, skewness did not distinguish the nonsibilants from either /s/ or /sQ/, nor /S/ from /X/. All other voiceless fricatives were significantly different from each other in terms of spectral skewness. The effect of voicing on spectral skewness was not significant (p = 0.67). Because skewness differed significantly across windows [F (3, 2492) = 145.382, p < 0.001; η2 = 0.15], separate three-way ANOVAs (place × vowel × voicing) were conducted for spectral skewness at each window location. A main effect for Place was obtained at all window locations. With the exception of the /D - DQ/ contrast, pairwise comparisons showed that all voiced fricatives were significantly different from each other in terms of spectral skewness at the second (middle) and third (offset) windows (Figure 5–13). Pharyngeal /Q/ had the highest skewness, indicating a concentration of energy at lower frequencies than for all other voiced fricatives, while the negative skewness obtained for /z/ indicates a concentration of energy at higher frequencies. Interestingly, the difference in skewness between dental and pharyngealized dental (/D - DQ/) reached significance (p = 0.008) only at the fourth window, located at the fricative-vowel transition (Table 5–3).
The lack of a significant difference between plain fricatives and their pharyngealized counterparts was also present for the voiceless fricatives /s - sQ/ at all window locations. As can be seen in Table (5–4), skewness differentiated between all voiceless fricatives in at least two windows, with the notable exception of the /S - h/ contrast, which was significant only at the fourth window (transition). If the number of places distinguished in terms of skewness differences at a given window is used as an indicator of that window’s distinctive spectral information, windows placed at the middle and offset of the frication noise were more successful in distinguishing between voiceless fricatives than the others (Tables 5–3 and 5–4).

Figure 5–13. Spectral skewness averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.
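The reading of skewness used in this section (positive values for energy concentrated at lower frequencies, negative values for energy concentrated at higher frequencies) follows from treating each window’s power spectrum as a probability distribution over frequency. The minimal Python sketch below computes the four spectral moments in that way. It is an illustration under stated assumptions, not the analysis software used in this study: the function name `spectral_moments` is hypothetical, the input is assumed to be an already computed power spectrum, kurtosis is assumed to be reported as excess kurtosis (0 for a normal distribution), and the toy spectra are invented.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    Each power value is divided by the total power so the spectrum can be
    treated as a probability distribution over frequency; the moments are
    then the ordinary moments of that distribution.
    """
    if not freqs_hz or len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must be non-empty and equal in length")
    total = sum(power)
    if total <= 0:
        raise ValueError("total power must be positive")
    weights = [p / total for p in power]  # normalized so they sum to 1

    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    if variance == 0:
        raise ValueError("spectrum has zero variance")
    sd = math.sqrt(variance)
    skewness = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, weights)) / sd ** 3
    # Assumption: kurtosis is expressed as excess kurtosis (normal distribution = 0).
    fourth = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, weights))
    kurtosis = fourth / variance ** 2 - 3.0
    return mean, variance, skewness, kurtosis


# Hypothetical toy spectra sampled at five frequencies (Hz).
freqs = [1000, 2000, 3000, 4000, 5000]
_, _, skew_low, _ = spectral_moments(freqs, [10, 5, 2, 1, 1])   # energy concentrated low
_, _, skew_high, _ = spectral_moments(freqs, [1, 1, 2, 5, 10])  # energy concentrated high
print(f"low-weighted skewness:  {skew_low:+.2f}")
print(f"high-weighted skewness: {skew_high:+.2f}")
```

The spectrum weighted toward the low end yields positive skewness and its mirror image yields negative skewness of the same magnitude, which is the pattern used above to interpret /Q/ and /z/.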

Table 5–3. Window locations at which a difference between voiced fricatives in terms of spectral skewness is significant (1 = onset, 2 = middle, 3 = offset, 4 = transition).

         /D/        /z/        /K/        /Q/
/z/      1 2 3 4
/K/      1 2 3 4    1 2 3 4
/Q/      1 2 3 4    1 2 3 4    φ 2 3 φ
/DQ/     φ φ φ 4    1 2 3 4    1 2 3 φ    1 2 3 φ

φ indicates absence of significant differences

Table 5–4. Window locations at which a difference between voiceless fricatives in terms of spectral skewness is significant (1 = onset, 2 = middle, 3 = offset, 4 = transition).

         /f/        /T/        /s/        /S/        /X/        /è/        /sQ/
/T/      1 φ φ 4
/s/      φ 2 φ 4    1 φ φ 4
/S/      1 2 3 4    1 2 3 φ    1 2 3 φ
/X/      1 2 3 φ    1 2 3 4    1 2 3 4    φ φ 3 4
/è/      1 2 3 φ    1 2 3 4    1 2 3 4    1 2 3 4    1 2 3 φ
/sQ/     φ 2 φ 4    1 2 3 φ    φ φ φ φ    1 2 3 φ    φ 2 φ 4    1 2 3 4
/h/      1 2 3 φ    1 2 3 4    1 2 3 4    φ φ φ 4    φ 2 3 φ    1 2 3 φ    1 2 3 4

φ indicates absence of significant differences

Although the effect of voicing was not significant for overall skewness, a main effect for Voicing was obtained at all but the third (offset) window. At both the frication onset and middle windows, voiceless fricatives had significantly (p < 0.001) lower skewness than voiced fricatives, while skewness measured at the fricative-vowel transition was significantly (p < 0.0001) higher for voiceless than for voiced fricatives (Figure 5–14). A Place by Voicing interaction was also significant at all but the last (transition) window. In general, the reduction in skewness for voiceless relative to voiced fricatives noted in the main effect above was reversed for alveolar and pharyngeal fricatives in the first three windows, and for all fricatives in the fourth window (Figure 5–15). However, this increase in skewness for voiceless fricatives was significant (p < 0.05) only for alveolar fricatives at the fourth (transition) window.

Figure 5–14. Spectral skewness averaged across place and vowel contexts for each window as a function of voicing.

The ANOVAs also revealed a main effect of Vowel context at all window locations. The magnitude of the effect became larger as the window moved closer to the vowel (η2 = 0.028 at frication mid-point, 0.037 at frication offset and 0.31 at the fricative-vowel transition). As illustrated in Figure (5–16) and confirmed by the associated Bonferroni post hoc tests, this effect is attributable to a significant decrease in fricative skewness in the context of short /i/ and long /i:/. Specifically, long /i:/ resulted in significantly lower skewness than long /u:/ in all but the second window, while short /i/ resulted in significantly lower skewness than short /u/ in the first and fourth windows. Additionally, differences between high front and low front vowels (/i, i:/ and /a, a:/) were significant only at the transition window (Figure 5–16D).

Figure 5–15. Place of articulation and voicing interaction for spectral skewness at four window locations. A) onset, B) middle, C) offset, and D) transition.

Figure 5–16. Spectral skewness as a function of vowel context at four window locations. A) onset, B) middle, C) offset, and D) transition.

5.2.4 Spectral Kurtosis

One-way ANOVAs testing for effects of place and voicing, with spectral kurtosis measured across the four windows as the dependent variable, revealed a main effect of Place [F (8, 2487) = 99.567, p < 0.001; η2 = 0.24]. Bonferroni post hoc tests conducted on voiced fricatives showed that only the kurtosis values of uvular /K/ (6.5) and pharyngeal /Q/ (13.7) were significantly higher than those of all other voiced fricatives. Within voiceless fricatives, kurtosis significantly differentiated between the nonsibilants /f/ and /T/, with means of 2.96 and 0.72, respectively. Moreover, the kurtosis of pharyngeal /è/ (9.8) was significantly higher than that of all other voiceless fricatives. The ANOVA also revealed a main effect of Voicing [F (1, 2494) = 22.922, p < 0.001; η2 = 0.01], in which voiceless fricatives had significantly lower kurtosis than voiced fricatives (means of 3.376 and 4.83, respectively). A one-way ANOVA showed that kurtosis differed significantly as a function of Window location [F (3, 2492) = 67.968, p < 0.001; η2 = 0.076], with the fourth (transition) window registering the highest kurtosis values. Therefore, separate three-way ANOVAs (place × vowel × voicing) were conducted for spectral kurtosis at each window location. The results showed a main effect of Place at all window locations. With the exception of the fourth window, the magnitude of the effect became larger as the window advanced towards the fricative-vowel boundary (η2 for the first three windows was 0.34, 0.46 and 0.51, respectively). Subsequent Bonferroni post hoc tests at each window were carried out for voiced and voiceless fricatives separately (Figure 5–17). Within voiced fricatives, none of the contrasts among /D, DQ, z/ was significant at any window, with the exception of the /DQ - z/ contrast, which reached significance (p < 0.05) at the fourth window only.
Moreover, while the kurtosis of pharyngeal /Q/ was significantly higher than that of uvular /K/ in all but the last (transition) window, each of the two fricatives had significantly higher (p < 0.01) kurtosis than all other voiced fricatives in the first and third windows. A similar pattern was observed for voiceless fricatives. Specifically, the voiceless pharyngeal fricative /è/ had significantly higher kurtosis than all other voiceless fricatives in the second (mean = 11.6) and third (mean = 10.8) analysis windows. Also, as was the case with the /D - DQ/ contrast, no difference was obtained between plain alveolar /s/ and its pharyngealized counterpart /sQ/ at any window. Additionally, while the kurtosis of glottal /h/ was significantly lower than that of pharyngeal /è/ at all windows, it was significantly higher than the kurtosis of /S/ in the fourth window and significantly higher than that of all other remaining voiceless fricatives in the second and third windows (Figure 5–17). A main effect of Voicing was also obtained at all but the fourth window. Similar to the effect observed for overall kurtosis, voiceless fricatives in these windows had significantly lower kurtosis than voiced fricatives (Figure 5–18). The size of this effect was rather small and dipped at the middle window (η2 for the first three windows was 0.05, 0.03 and 0.06, respectively). Moreover, a Place by Voicing interaction was also significant at the first three windows. As suggested by the corresponding post hoc tests shown in Figure (5–19), the effect of voicing was significant (p < 0.05) for uvulars /K, X/ at frication onset, for pharyngeals /è, Q/ at the middle of the frication noise, and for both uvular and pharyngeal places of articulation at frication offset. Finally, the effect of vowel context was observed only at the edges of the frication noise: at frication onset [F (5, 561) = 3.068, p < 0.001; η2 = 0.03] and at the transition into the vowel [F (5, 561) = 17.406, p < 0.001; η2 = 0.134]. Subsequent Bonferroni post hoc tests carried out at these windows showed that the main effect stems from a significant decrease in kurtosis for fricatives preceding /i:/, compared only to /u/, at the onset window (Figure 5–20A), and from a greater decrease in kurtosis for fricatives preceding short /i/ and long /i:/ than for all other vowels at the transition window (Figure 5–20B). The difference between long /i:/ and long /u:/ was marginally significant (p = 0.056) at the onset window.

Figure 5–17. Spectral kurtosis averaged across vowel contexts for each window as a function of place of articulation. A) voiced. B) voiceless.

Figure 5–18. Spectral kurtosis averaged across place and vowel contexts for each window as a function of voicing.

Figure 5–19. Place of articulation and voicing interaction for spectral kurtosis at three window locations. A) onset, B) middle, and C) offset.

Figure 5–20. Spectral kurtosis as a function of vowel context at two window locations: A) onset and B) transition.

CHAPTER 6
FORMANT TRANSITION

This chapter reports on acoustic measurements related to spectral information at the fricative-vowel transition that might help distinguish between the different places of fricative articulation. The first measurement reported is the frequency of the second formant (F2), measured in Hertz from a 25-ms Kaiser window placed at the vowel onset. The second consists of the coefficients of regression lines fitted to scatterplots of F2 at the vowel’s onset (y-axis) against F2 at its mid-point (x-axis), derived for each place and speaker and averaged across voicing and vowel context.

6.1 Second Formant (F2) at Transition

Table (6–1) presents the F2 values at the onset of the vowel for each place of articulation and voicing, averaged across speakers and vowel context. A three-way ANOVA (place × voicing × vowel) showed a significant main effect for Place of articulation [F (8, 561) = 97.988, p < 0.0001; η2 = 0.58]. Subsequent post hoc tests were carried out separately on voiced and voiceless fricatives. For both voiced and voiceless fricatives, pharyngealized fricatives (/DQ/: 1164 Hz and /sQ/: 1288 Hz) had significantly lower F2 frequencies than their plain counterparts (/D/: 1603 Hz and /s/: 1636 Hz). In fact, within voiced fricatives, /DQ/ had a significantly lower frequency than all other voiced fricatives with the exception of uvular /K/. Apart from its non-significant contrast with /DQ/, voiced uvular /K/ also had a significantly lower F2 frequency (1171 Hz) than all other voiced fricatives. No other contrasts within voiced fricatives were statistically significant. A similar pattern was observed within voiceless fricatives. Specifically, as was the case for voiced fricatives, there was no significant difference between pharyngealized and uvular fricatives (/sQ - X/ in this case), nor between dental and alveolar fricatives (/T - s/). Moreover, the F2 frequencies of both pharyngeal /è/ and glottal /h/ did not differ significantly from those of /f/, /T/ and /s/ (means are reported in Table (6–1)). Additionally, no significant difference was obtained between uvular and pharyngeal fricatives (/X - è/). All other contrasts between voiceless fricatives were significant (p < 0.05 within nonsibilants and p < 0.0001 for other contrasts).

Table 6–1. Mean values of F2 (Hz) at transition averaged across speakers and vowel context as a function of place and voicing.

Place of Articulation      Voicing      F2 at transition (Hz)   Mean (Hz)
Labiodental                Voiceless    1496
Dental                     Voiced       1603                    1602
                           Voiceless    1602
Alveolar                   Voiced       1633                    1634
                           Voiceless    1636
Post-Alveolar              Voiceless    1742
Uvular                     Voiced       1171                    1248
                           Voiceless    1325
Pharyngeal                 Voiced       1555                    1572
                           Voiceless    1589
Pharyngealized Dental      Voiced       1164
Pharyngealized Alveolar    Voiceless    1288
Glottal                    Voiceless    1565

The ANOVA also revealed a main effect of Voicing [F (1, 561) = 9.145, p < 0.005; η2 = 0.016], with voiceless fricatives registering higher F2 frequencies than voiced fricatives (means of 1530 and 1425 Hz, respectively). However, a significant Place by Voicing interaction [F (3, 561) = 5.337, p < 0.002; η2 = 0.028] and subsequent Bonferroni post hoc tests (Figure 6–1) showed that this effect was limited to uvular fricatives.
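The voicing means quoted above (1530 Hz for voiceless and 1425 Hz for voiced fricatives) can be reproduced from the cell means in Table 6–1 by averaging the place values within each voicing level. The short Python sketch below does this as a consistency check. The dictionary and function names are hypothetical, and the sketch assumes that every place-by-voicing cell contributes the same number of tokens (Table 7–1 implies 48 cases per fricative), since only then does the unweighted mean of the cell means equal the token-level mean.

```python
# Cell means of F2 (Hz) at vowel onset, copied from Table 6-1.
F2_ONSET_HZ = {
    ("labiodental", "voiceless"): 1496,
    ("dental", "voiced"): 1603,
    ("dental", "voiceless"): 1602,
    ("alveolar", "voiced"): 1633,
    ("alveolar", "voiceless"): 1636,
    ("post-alveolar", "voiceless"): 1742,
    ("uvular", "voiced"): 1171,
    ("uvular", "voiceless"): 1325,
    ("pharyngeal", "voiced"): 1555,
    ("pharyngeal", "voiceless"): 1589,
    ("pharyngealized dental", "voiced"): 1164,
    ("pharyngealized alveolar", "voiceless"): 1288,
    ("glottal", "voiceless"): 1565,
}


def voicing_mean(table, voicing):
    """Unweighted mean of the place-level cell means for one voicing level.

    Assumption: each cell holds the same number of tokens, so this equals
    the token-level mean for that voicing level.
    """
    cells = [hz for (_, v), hz in table.items() if v == voicing]
    return sum(cells) / len(cells)


print(f"voiceless: {voicing_mean(F2_ONSET_HZ, 'voiceless'):.0f} Hz")
print(f"voiced:    {voicing_mean(F2_ONSET_HZ, 'voiced'):.0f} Hz")
```

Both values round to the means reported in the text, which suggests the voicing effect was computed over balanced cells rather than over unequal token counts.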

Figure 6–1. Place of articulation and voicing interaction for F2 (Hz) measured at vowel onset.

There was also a main effect of Vowel context [F (5, 561) = 221.237, p < 0.0001; η2 = 0.66]. As expected, F2 measured at the onset of the high front vowels /i, i:/ (mean frequencies of 1708 and 1919 Hz, respectively) was significantly higher than for all other vowels (p < 0.0001). Also, the F2 frequencies of the back vowels /u, u:/ (means of 1209 and 1259 Hz, respectively) were significantly lower than those of all other vowel contexts (p < 0.0001). The mean F2 frequency at vowel onset was 1435 Hz for /a/ and 1409 Hz for /a:/. The effect of vowel length on F2 frequency was not significant except for the /i - i:/ contrast, for which the long vowel had a higher F2 frequency.

Figure 6–2. F2 (Hz) measured at vowel onset as a function of vowel context.

6.2 Locus Equation

Locus equation coefficients for every place of articulation were obtained for each of the eight speakers in our study (8 speakers × 9 places of articulation). Specifically, a linear regression line was fitted to scatterplots of F2 values averaged across all vowel contexts. Each scatterplot had F2 measured at the onset of the vowel on the y-axis and F2 measured at the mid-point of the vowel on the x-axis. The coefficients of each regression line (the slope ‘k’ and the y-intercept ‘c’) were taken to be the terms of the locus equation. An example plot is presented in Figure (6–3).
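The locus-equation terms are the slope and intercept of an ordinary least-squares line through the (F2 mid-point, F2 onset) pairs. The Python sketch below shows that fit in closed form. The function name is hypothetical, and the input points are invented for illustration: they lie exactly on the example line of Figure 6–3 and do not reproduce the study’s data.

```python
def locus_equation(f2_mid_hz, f2_onset_hz):
    """Fit F2_onset = k * F2_mid + c by ordinary least squares; return (k, c)."""
    n = len(f2_mid_hz)
    if n != len(f2_onset_hz) or n < 2:
        raise ValueError("need at least two paired F2 measurements")
    mean_x = sum(f2_mid_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_mid_hz)
    if sxx == 0:
        raise ValueError("F2 mid-point values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_mid_hz, f2_onset_hz))
    k = sxy / sxx            # slope
    c = mean_y - k * mean_x  # y-intercept (Hz)
    return k, c


# Hypothetical points placed exactly on the example line of Figure 6-3.
mid = [900.0, 1200.0, 1500.0, 1800.0, 2100.0]
onset = [0.5837 * x + 666.25 for x in mid]
k, c = locus_equation(mid, onset)
print(f"k = {k:.4f}, c = {c:.2f} Hz")
```

Because the toy points lie on y = 0.5837x + 666.25, the fit returns those coefficients up to floating-point rounding. With real data, one fit per speaker and place yields 8 × 9 = 72 (k, c) pairs, which is consistent with the error degrees of freedom of 63 (72 observations minus 9 groups) in the one-way ANOVAs on slope and y-intercept.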

Figure 6–3. An example of a scatterplot to derive coefficients of locus equation. F2 frequency (Hz) at vowel onset (y-axis) is plotted against F2 frequency (Hz) at vowel mid-point (x-axis), with the fitted line y = kx + c (here, y = 0.5837x + 666.25).

Table (6–2) presents mean slope and y-intercept values for each place of articulation averaged across vowel contexts. A one-way ANOVA for slope showed a main effect for Place of Articulation [F (8, 63) = 15.092, p < 0.001; η2 = 0.66]. Pharyngealized fricatives had the lowest slopes (0.168 for /DQ/ and 0.399 for /sQ/), while glottal /h/ had the highest (mean slope of 0.924). Post hoc tests revealed that the slope for pharyngealized dental /DQ/ was significantly different from those of all plain (non-pharyngealized) fricatives. Furthermore, the high slope of /h/ was significantly different from those of all other fricatives with the exception of the uvular fricatives /X, K/. The slope of pharyngealized alveolar /sQ/ was significantly different only from those of the uvular fricatives. No other contrasts were significant. A one-way ANOVA for y-intercept also revealed a main effect for place [F (8, 63) = 10.313, p < 0.001; η2 = 0.57]. Glottal /h/ and the uvular fricatives /X, K/ had the lowest y-intercept values (160 and 289 Hz, respectively), while the highest y-intercept value was observed for the post-alveolar fricative /S/ (956 Hz). Although no significant differences between the y-intercepts of /h/ and /X, K/ were observed, Bonferroni post hoc tests showed that the y-intercept for /h/ was significantly lower than those of all other places of articulation apart from the uvulars. Additionally, the y-intercept values for the uvular fricatives were significantly lower than those of all other places of articulation with the exception of labiodental and pharyngeal fricatives (/f/ and /Q, è/). No other significant differences were obtained.

Table 6–2. Mean slope and y-intercept values for each place of articulation averaged across vowel contexts.

Place of Articulation      Slope    y-intercept (Hz)
Labiodental                0.565    652
Dental                     0.507    825
Alveolar                   0.451    930
Post-Alveolar              0.502    956
Uvular                     0.692    289
Pharyngeal                 0.579    665
Pharyngealized Dental      0.168    938
Pharyngealized Alveolar    0.399    751
Glottal                    0.925    160

CHAPTER 7
STATISTICAL CLASSIFICATION OF FRICATIVES

Discriminant Function Analysis (DFA) was used to determine the most parsimonious way to distinguish among the different places of articulation using the acoustic cues investigated in our study (descriptive DFA). Furthermore, DFA was used here to assess the contribution of each selected cue to the overall classification of fricatives into their places of articulation. Also, to get a more realistic indication of the use of these cues in distinguishing unknown tokens, a cross-validation method was used with the obtained discriminant functions (predictive DFA). All acoustic variables investigated in our study were used in the DFA procedure with the exception of locus equations, since they do not reflect measures of single tokens but rather the coefficients of linear regression fits on aggregated data points representing places of articulation for each speaker.

7.1 Discriminant Function Analysis

Discriminant function analysis is a statistical procedure that classifies tokens into two or more mutually exclusive a priori groups (i.e., places of articulation) using a set of predictors (i.e., acoustic cues) (Klecka 1980; Hair, Anderson, and Tatham 1987; Stevens 2002). A discriminant function consists of a linear combination of one or more variables that maximizes the distance (i.e., differences) between the groups being classified. In our study, for both descriptive and predictive DFA, predictors were entered into the analysis using a stepwise method in which, at each step, only the predictor that minimized Wilks’ Lambda (Λ) statistic, also known as the U-statistic, was entered. The criterion for entry was set at p = 0.05 and that for removal at p = 0.10. Also, since the levels of the dependent variable (i.e., places of articulation) have unequal numbers of cases due to the lack of a voicing contrast at some places, the prior probabilities for group membership were calculated from the group sizes (Table 7–1).
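Wilks’ Lambda, the entry criterion described above, is the ratio of the determinant of the pooled within-group scatter matrix to the determinant of the total scatter matrix for the predictors currently in the model; it ranges from 0 to 1, and smaller values mean better group separation. The Python sketch below shows one forward step of the selection: it picks the candidate predictor whose addition gives the smallest Lambda. It is a simplified illustration under assumptions, not the statistics package used in this study: the helper names are hypothetical, the toy data are invented, and the F-to-enter (p = 0.05) and F-to-remove (p = 0.10) tests are omitted.

```python
def _det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for j in range(col, n):
                a[r][j] -= factor * a[col][j]
    return det


def _sscp(rows, cols):
    """Sums of squares and cross-products (deviations from the mean) for cols."""
    means = [sum(r[j] for r in rows) / len(rows) for j in cols]
    return [[sum((r[p] - means[i]) * (r[q] - means[j]) for r in rows)
             for j, q in enumerate(cols)]
            for i, p in enumerate(cols)]


def wilks_lambda(X, labels, cols):
    """Wilks' Lambda = det(W) / det(T) for the predictor columns in cols."""
    total = _sscp(X, cols)
    within = [[0.0] * len(cols) for _ in cols]
    for group in set(labels):
        s = _sscp([x for x, lab in zip(X, labels) if lab == group], cols)
        for i in range(len(cols)):
            for j in range(len(cols)):
                within[i][j] += s[i][j]
    return _det(within) / _det(total)


def forward_step(X, labels, selected):
    """Index of the unselected predictor whose entry minimizes Wilks' Lambda."""
    candidates = [j for j in range(len(X[0])) if j not in selected]
    return min(candidates, key=lambda j: wilks_lambda(X, labels, selected + [j]))


# Hypothetical toy data: predictor 0 separates the groups, predictor 1 does not.
X = [[0.0, 1.0], [0.1, 0.0], [0.2, 1.0], [1.0, 0.0], [1.1, 1.0], [1.2, 0.0]]
labels = ["a", "a", "a", "b", "b", "b"]
first = forward_step(X, labels, [])
print(first, round(wilks_lambda(X, labels, [first]), 3))
```

Adding a predictor can never increase Lambda, which is why a stepwise procedure also needs a significance test on the change in Lambda before admitting a variable; that test is the part left out of this sketch.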

Table 7–1. Prior probabilities for group membership

Place                      Prior    Cases Used in Analysis
Labiodental                0.077    48
Dental                     0.154    96
Alveolar                   0.154    96
Post-Alveolar              0.077    48
Uvular                     0.154    96
Pharyngeal                 0.154    96
Pharyngealized Dental      0.077    48
Pharyngealized Alveolar    0.077    48
Glottal                    0.077    48
Total                      1        624

The number of discriminant functions obtained by the DFA procedure is the smaller of g − 1, where g is the number of groups, and k, where k is the number of predictors. In our study, the nine groups yielded eight discriminant functions, all of which were significant (p < 0.001). Table (7–2) shows the percentage of variance accounted for by each of the eight functions. Although all functions were significant, we limited our interpretation to the first three, since they contributed the most to the cumulative variance, as inferred from their eigenvalues and associated canonical correlations (Table 7–2).

7.2 Classification Accuracy of DFA

Before interpreting the classification results obtained from the DFA procedure, the validity and accuracy of the current model were assessed. For any classification method, some percentage of correct classifications can be attributed to chance alone. Therefore, for the current classification model derived from DFA to be valid, it needs to classify cases better than a chance-based classification would. Since the group sizes in our study are unequal, chance-level classification was determined using two criteria: the proportional chance criterion (Cpro) and the maximum chance criterion (MCC) (Hair et al. 1987). The proportional chance criterion is the average probability of correct classification calculated over all group sizes, while the MCC is the percentage of the total sample represented by the largest group. Given the total number of cases and groups in our study, MCC was estimated to be 15.4% and Cpro to be 12.4%. However, both measures serve only as subjective reference points for model accuracy; in fact, there is no general consensus on how high classification accuracy should be relative to chance. Hair et al. (1987) suggest that it should be at least one fourth greater than classification by chance. Accordingly, the current model should achieve an overall classification rate higher than 19.25% (1.25 × MCC) to be valid. The proportional and maximum chance criteria were calculated as in Equations (7–1) and (7–2), respectively, where N = total number of cases, g = number of groups, n_i = number of cases in group i, and n_gmax = number of cases in the largest group.

Table 7–2. The amount of the variance accounted for by each of the functions calculated by the DFA.

Function   Eigenvalue   % of Variance   Cumulative %   Canonical Correlation
1          5.224        43.0            43.0           0.916
2          3.651        30.1            73.1           0.886
3          1.894        15.6            88.7           0.809
4          0.470         3.9            92.5           0.566
5          0.387         3.2            95.7           0.528
6          0.244         2.0            97.7           0.443
7          0.177         1.5            99.2           0.388
8          0.098         0.8           100.0           0.298
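The columns of Table 7–2 are linked by two standard relations: a function’s share of the discriminating variance is its eigenvalue divided by the sum of all eigenvalues, and its canonical correlation is the square root of λ / (1 + λ). The Python sketch below recomputes both from the reported eigenvalues as a consistency check; the function names are hypothetical. The first three functions reproduce the table’s percentages and correlations to the reported precision, and the remaining rows agree to within rounding of the eigenvalues.

```python
import math

# Eigenvalues of the eight discriminant functions, copied from Table 7-2.
EIGENVALUES = [5.224, 3.651, 1.894, 0.470, 0.387, 0.244, 0.177, 0.098]


def variance_shares(eigenvalues):
    """Percentage of the total discriminating variance carried by each function."""
    total = sum(eigenvalues)
    return [100.0 * ev / total for ev in eigenvalues]


def canonical_correlations(eigenvalues):
    """Canonical correlation of each function: sqrt(eigenvalue / (1 + eigenvalue))."""
    return [math.sqrt(ev / (1.0 + ev)) for ev in eigenvalues]


shares = variance_shares(EIGENVALUES)
cumulative = [sum(shares[: i + 1]) for i in range(len(shares))]
correlations = canonical_correlations(EIGENVALUES)
for i, (pct, cum, r) in enumerate(zip(shares, cumulative, correlations), start=1):
    print(f"function {i}: {pct:5.1f}%  cumulative {cum:5.1f}%  r = {r:.3f}")
```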

$$C_{pro} = 100 \times \sum_{i=1}^{g} \left( \frac{n_i}{N} \right)^2 \tag{7–1}$$

$$MCC = 100 \times \frac{n_{g_{max}}}{N} \tag{7–2}$$
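Equations (7–1) and (7–2) can be applied directly to the group sizes in Table 7–1. The Python sketch below does so, together with the size-proportional prior probabilities described in Section 7.1; the dictionary and function names are hypothetical. Applied to Table 7–1, it gives about 12.4% for the proportional chance criterion and 15.4% for the maximum chance criterion, matching the values reported above.

```python
# Cases per place of articulation, copied from Table 7-1.
GROUP_SIZES = {
    "labiodental": 48,
    "dental": 96,
    "alveolar": 96,
    "post-alveolar": 48,
    "uvular": 96,
    "pharyngeal": 96,
    "pharyngealized dental": 48,
    "pharyngealized alveolar": 48,
    "glottal": 48,
}


def priors(sizes):
    """Prior probability of each group, proportional to its size."""
    n_total = sum(sizes.values())
    return {group: n / n_total for group, n in sizes.items()}


def proportional_chance(sizes):
    """Equation (7-1): C_pro = 100 * sum over groups of (n_i / N) ** 2."""
    return 100.0 * sum(p ** 2 for p in priors(sizes).values())


def maximum_chance(sizes):
    """Equation (7-2): MCC = 100 * (size of the largest group) / N."""
    return 100.0 * max(sizes.values()) / sum(sizes.values())


print(f"C_pro = {proportional_chance(GROUP_SIZES):.1f}%")
print(f"MCC   = {maximum_chance(GROUP_SIZES):.1f}%")
```

Because C_pro weights each group by its own prior, it falls below the MCC whenever group sizes differ, which is why the MCC serves as the stricter of the two reference points here.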

It is important to note that both the proportional and maximum chance criteria are subjective in nature. To circumvent this issue, Press’s Q statistic (Equation 7–3) was used as an additional measure of model accuracy. The significance of Press’s Q statistic is assessed against a chi-square (χ2) distribution with one degree of freedom. This value is calculated below for both sets of classification results (descriptive and predictive DFA). The value n_correct in Equation (7–3) denotes the number of correctly classified cases.

$$Q = \frac{\left[ N - (n_{correct} \times g) \right]^2}{N \, (g - 1)} \tag{7–3}$$

7.3 Classification Power of Predictors

The standardized canonical function coefficients indicate the partial contribution of each variable to the discriminant function(s), controlling for the other independent variables entered in the equation, and are used to assess each independent variable’s unique contribution to the discriminant function (Klecka 1980; Hair et al. 1987). Based on these coefficients, spectral mean (frication noise onset, middle, and offset), skewness (onset and offset of frication, and transition into the vowel), second formant at vowel onset, normalized RMS amplitude and spectral peak location were identified as the variables contributing most to the overall classification.

7.4 Classification Results

As mentioned above, the first goal of the DFA in our study was to determine the degree to which the acoustic cues investigated here would successfully classify fricatives. To that end, DFA revealed that 83.2% of the original grouped cases were correctly classified into their respective places of articulation using discriminant functions derived from the acoustic measurements investigated in our study. Furthermore, when the data were split into voiced and voiceless subgroups, the overall classification accuracy was 92.9% for voiced and 93.5% for voiceless fricatives. These classification rates exceeded both the maximum chance and the proportional chance criteria. Additionally, Press’s Q statistic (Q = 17.99) was significant at p < 0.0001. Therefore, it can be concluded that the model investigated was valid. In general, three groups can be identified on a two-dimensional discrimination plane (Figure 7–1 and Figure 7–2). A leave-one-out (also known as jackknife) classification procedure was also used to cross-validate the discriminant functions derived above. In this procedure, the data were split into two sets: discriminant functions obtained from all but one speaker (training set) were used to classify the cases of the remaining speaker (testing set). The procedure was repeated until each speaker had been included in the testing phase, and the overall performance of the discriminant functions was taken to be the score averaged across all speakers. An overall correct classification rate of 79.3% was obtained using this cross-validation method. When voicing was specified in the model, cross-validated correct classification rates of 87.9% and 89.8% were obtained for voiced and voiceless fricatives, respectively. Both procedures satisfy the criteria mentioned in Section (7.2) for model validity (Cpro, MCC and Press’s Q). The confusion matrices presented in Tables (7–3) to (7–8) show the percentage of predicted class membership in terms of fricative place of articulation. Cells on the diagonal represent correct classification rates, while the other cells represent misclassification rates. Generally speaking, DFA clustered the nine places of fricative articulation into three groups: nonsibilants (/f, T, D, DQ/), sibilants (/s, sQ, z, S/) and back-articulated fricatives (/K, X, è, Q, h/), with misclassifications rarely crossing the boundaries of these groups. This was true even when fricatives were partitioned according to voicing.
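The cross-validation described above holds out one speaker at a time rather than one token, so it is a leave-one-speaker-out scheme. The Python sketch below shows only that loop. The classifier is a hypothetical stand-in (a nearest-centroid rule), not discriminant function analysis; the function names and toy data are illustrative, and any fit/predict pair could be plugged in. As in the text, the overall score is the mean of the per-speaker percentages.

```python
from collections import defaultdict


def fit_centroids(X, labels):
    """Hypothetical stand-in classifier: the mean feature vector of each class."""
    sums = {}
    counts = defaultdict(int)
    for x, lab in zip(X, labels):
        if lab not in sums:
            sums[lab] = [0.0] * len(x)
        sums[lab] = [s + v for s, v in zip(sums[lab], x)]
        counts[lab] += 1
    return {lab: [s / counts[lab] for s in vec] for lab, vec in sums.items()}


def predict_centroid(model, x):
    """Assign x to the class with the nearest centroid (squared Euclidean distance)."""
    return min(model, key=lambda lab: sum((m - v) ** 2 for m, v in zip(model[lab], x)))


def leave_one_speaker_out(X, labels, speakers, fit, predict):
    """Hold out each speaker in turn, train on the rest, and average the scores.

    Returns the mean of the per-speaker percentages of correctly classified tokens.
    """
    scores = []
    for held_out in sorted(set(speakers)):
        train = [(x, lab) for x, lab, s in zip(X, labels, speakers) if s != held_out]
        test = [(x, lab) for x, lab, s in zip(X, labels, speakers) if s == held_out]
        model = fit([x for x, _ in train], [lab for _, lab in train])
        correct = sum(predict(model, x) == lab for x, lab in test)
        scores.append(100.0 * correct / len(test))
    return sum(scores) / len(scores)


# Hypothetical toy data: one feature, two places, three speakers.
X = [[0.0], [1.0], [0.1], [1.1], [0.2], [0.9]]
labels = ["s", "S", "s", "S", "s", "S"]
speakers = [1, 1, 2, 2, 3, 3]
print(leave_one_speaker_out(X, labels, speakers, fit_centroids, predict_centroid))
```

Holding out whole speakers keeps every test token unseen, together with its speaker’s idiosyncrasies, which is why this kind of score is usually lower than the resubstitution rate, as with the 79.3% and 83.2% reported above.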

Table 7–3. Overall classification results of all fricatives (percentage of cases; rows: actual place, columns: predicted group membership).

Place     /f/   /T, D/  /DQ/  /s, z/  /sQ/  /S/   /X, K/  /è, Q/  /h/
/f/       88    10      0     0       0     2     0       0       0
/T, D/    6     76      7     0       0     0     3       0       7
/DQ/      0     2       88    2       0     0     6       0       2
/s, z/    0     2       0     89      6     2     1       0       0
/sQ/      0     0       0     17      83    0     0       0       0
/S/       0     0       0     0       0     98    2       0       0
/X, K/    0     2       6     0       0     2     72      10      7
/è, Q/    0     0       0     0       0     0     5       87      8
/h/       0     0       2     0       0     0     8       10      79

Table 7–4. Cross-validated classification results of all fricatives (percentage of cases; rows: actual place, columns: predicted group membership).

Place     /f/   /T, D/  /DQ/  /s, z/  /sQ/  /S/   /X, K/  /è, Q/  /h/
/f/       79    17      0     0       0     2     2       0       0
/T, D/    8     72      8     0       0     0     3       0       8
/DQ/      0     6       77    2       0     0     10      0       4
/s, z/    0     2       0     84      9     3     1       0       0
/sQ/      2     0       0     15      83    0     0       0       0
/S/       0     0       0     0       0     98    2.1     0       0
/X, K/    0     2       7     0       0     2     70      12      7
/è, Q/    0     0       0     0       0     1     7       82      9
/h/       0     0       2     0       0     0     8       13      77

Figure 7–1. Discrimination plane for all fricatives (cases plotted on the discrimination plane and coded by predicted group: labiodental, dental, alveolar, post-alveolar, uvular, pharyngeal, pharyngealized dental, pharyngealized alveolar, glottal).

Table 7–5. Overall classification results of voiced fricatives.
Rows: actual place; columns: predicted group membership (%).

Place   /D/     /DQ/    /z/     /K/     /Q/
/D/     89.6    8.3     0       2.1     0
/DQ/    8.3     87.5    0       4.2     0
/z/     0       0       100     0       0
/K/     6.3     4.2     0       89.6    0
/Q/     0       0       0       2.1     97.9

Table 7–6. Cross-validated classification results of voiced fricatives.
Rows: actual place; columns: predicted group membership (%).

Place   /D/     /DQ/    /z/     /K/     /Q/
/D/     83.3    8.3     2.1     6.3     0
/DQ/    14.6    75      0       10.4    0
/z/     0       0       100     0       0
/K/     6.3     6.3     0       83.3    4.2
/Q/     0       0       0       2.1     97.9

Table 7–7. Overall classification results of voiceless fricatives.
Rows: actual place; columns: predicted group membership (%).

Place   /f/     /T/     /s/     /sQ/    /S/     /X/     /è/     /h/
/f/     79.2    16.7    0       0       2.1     2.1     0       0
/T/     8.3     91.7    0       0       0       0       0       0
/s/     0       2.1     87.5    8.3     2.1     0       0       0
/sQ/    0       0       18.8    81.3    0       0       0       0
/S/     0       0       0       0       100     0       0       0
/X/     0       2.1     0       0       2.1     91.7    4.2     0
/è/     0       0       0       0       0       6.3     93.8    0
/h/     0       0       0       0       0       0       6.3     93.8

Table 7–8. Cross-validated classification results of voiceless fricatives.
Rows: actual place; columns: predicted group membership (%).

Place   /f/     /T/     /s/     /sQ/    /S/     /X/     /è/     /h/
/f/     83.3    12.5    0       0       2.1     2.1     0       0
/T/     6.3     93.8    0       0       0       0       0       0
/s/     0       0       91.7    8.3     0       0       0       0
/sQ/    0       0       10.4    89.6    0       0       0       0
/S/     0       0       0       0       100     0       0       0
/X/     0       0       0       0       0       97.9    2.1     0
/è/     0       0       0       0       0       2.1     97.9    0
/h/     0       0       0       0       0       0       6.3     93.8
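The overall correct classification rate of a model can be read off a confusion matrix as the mean of its diagonal, weighted by the number of tokens behind each row. The Python sketch below applies this to Table 7–5 and to the diagonal of Table 7–3. Because the tables do not give token counts, it assumes that every fricative contributed the same number of tokens, so that merged rows such as /T, D/ in Table 7–3 receive weight 2; the function names are illustrative.

```python
def diagonal(table):
    """Correct-classification cells (actual == predicted) of a square table."""
    return [table[i][i] for i in range(len(table))]


def correct_rate(diag, weights=None):
    """Weighted mean of diagonal percentages; equal weights by default."""
    if weights is None:
        weights = [1] * len(diag)
    if len(weights) != len(diag):
        raise ValueError("one weight per row is required")
    return sum(w * d for w, d in zip(weights, diag)) / sum(weights)


# Table 7-5: overall classification of voiced fricatives (percent).
# Rows and columns: /D/, /DQ/, /z/, /K/, /Q/.
TABLE_7_5 = [
    [89.6, 8.3, 0, 2.1, 0],
    [8.3, 87.5, 0, 4.2, 0],
    [0, 0, 100, 0, 0],
    [6.3, 4.2, 0, 89.6, 0],
    [0, 0, 0, 2.1, 97.9],
]

# Diagonal of Table 7-3 (all fricatives). Rows /T, D/, /s, z/, /X, K/ and
# /è, Q/ each pool two fricatives, so they get weight 2 (assumption: equal
# token counts per fricative).
DIAG_7_3 = [88, 76, 88, 89, 83, 98, 72, 87, 79]
WEIGHTS_7_3 = [1, 2, 1, 2, 1, 1, 2, 2, 1]

if __name__ == "__main__":
    print(f"Table 7-5: {correct_rate(diagonal(TABLE_7_5)):.1f}%")
    print(f"Table 7-3: {correct_rate(DIAG_7_3, WEIGHTS_7_3):.1f}%")
```

This yields 92.9% for Table 7–5 and about 83.4% for Table 7–3. The latter is close to the 83.2% overall rate reported for the full model; the small difference may reflect rounding of the table entries or unequal token counts.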

[Figure 7–2 plot: predicted group membership in the discriminant-function plane; panel A, voiced fricatives; panel B, voiceless fricatives. Groups: Labiodental, Dental, Alveolar, Post-Alveolar, Uvular, Pharyngeal, Pharyngealized Dental, Pharyngealized Alveolar, Glottal.]

Figure 7–2. Discrimination plane for voiced and voiceless fricatives. A) voiced. B) voiceless.

CHAPTER 8 GENERAL DISCUSSION

Several acoustic measurements were investigated in our study with the aim of describing the acoustic characteristics of fricatives as produced by native speakers of Arabic. Arabic was chosen for three reasons. First, fricative articulation in Arabic spans most places of articulation in the vocal tract, from the lips to the glottis. Second, Arabic has a phonemic distinction between certain plain fricatives (/D/ and /s/) and their pharyngealized counterparts (/DQ/ and /sQ/), as well as between short and long vowels (/i - i:, u - u:, a - a:/). Third, most studies of the acoustic characteristics of fricatives have focused on English. Our study therefore set out to describe the acoustic characteristics of Arabic fricatives using many of the measurements investigated in related studies, with particular interest in cues that differentiate plain from pharyngealized fricatives. The cues investigated were amplitude measurements (relative and normalized frication noise amplitude), spectral measurements (spectral peak location and spectral moments), temporal measurements (absolute and normalized frication noise duration), and formant information at the fricative-vowel transition (F2 at vowel onset and locus equations). In addition to reporting these cues, we attempted to classify fricatives into their places of articulation by statistical modeling (discriminant function analysis) with an optimal combination of these measurements.


8.1 Temporal Measurement

The findings of the present study on the effect of place of articulation on frication noise duration agree with previous research (Behrens and Blumstein 1988b; Jongman 1989; Pirello et al. 1997): the overall absolute frication noise duration of sibilant fricatives (mean 138.09 ms) was longer than that of nonsibilants (mean 109.34 ms). The longer duration of sibilants can be attributed to the greater articulatory effort needed to force air through the narrow constriction required for sibilant articulation. Additionally, the frication noise duration of voiceless fricatives (mean 134.21 ms) was on average longer than that of voiced fricatives (mean 92.05 ms). This effect of voicing has also been found in previous studies of English (Cole and Cooper 1975; Baum and Blumstein 1987; Crystal and House 1988; Fox, Nissen, McGory, and Rosenbauer 2001; Nissen 2003) and Spanish fricatives (Manrique and Massone 1981). The shorter duration of voiced fricatives can be attributed in part to the reduced airflow caused by higher glottal impedance during voicing. Contrary to previous research (Nissen 2003), our study found no effect of vowel context among vowels of the same length. However, fricatives were significantly longer when followed by long high vowels (/i:, u:/) than when followed by their short counterparts (/i/ and /u/, respectively). Similar results for sibilant/nonsibilant duration and for voicing were obtained when fricative duration was normalized relative to word duration. However, a different pattern of vowel context effects emerged with normalized frication duration. Specifically, among long vowels, the high vowels (/i:, u:/) induced a longer normalized frication duration than the low vowel /a:/.
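The vowel-context effect on normalized duration follows from how the measure is built. The Python sketch below assumes that normalization is the ratio of fricative duration to word duration (the text says only that duration was normalized relative to word duration), and the durations are hypothetical numbers chosen for illustration, not measurements from this study. It shows that a fricative of unchanged absolute duration receives a smaller normalized value when the following vowel, and hence the word, is longer.

```python
def normalized_duration(fricative_ms: float, word_ms: float) -> float:
    """Fricative duration as a proportion of word duration (assumed ratio form)."""
    if word_ms <= 0 or fricative_ms < 0:
        raise ValueError("durations must be non-negative and word duration positive")
    if fricative_ms > word_ms:
        raise ValueError("a fricative cannot be longer than the word containing it")
    return fricative_ms / word_ms


# Hypothetical tokens with the same 130 ms fricative: the word containing the
# intrinsically longer low vowel /a:/ is longer overall.
before_high = normalized_duration(130.0, 400.0)  # e.g., fricative + /i:/
before_low = normalized_duration(130.0, 450.0)   # e.g., fricative + /a:/

if __name__ == "__main__":
    print(f"{before_high:.3f} vs {before_low:.3f}")
```

With these illustrative values the normalized duration drops from 0.325 to about 0.289 even though the fricative itself is unchanged.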
Additionally, the normalized frication noise duration of fricatives was longer preceding the front vowel /i:/ than preceding the back vowel /u:/. Such vowel context effects are not surprising once intrinsic differences in vowel duration are taken into account. Vowel duration has been shown to correlate with the degree of jaw lowering involved in its production, such that the lower the vowel, the longer its duration (Fant 1960; Lindblom 1967; Beckman 1986). Because frication duration was normalized relative to word duration, an intrinsically longer vowel such as /a:/ lengthens the word and thereby lowers the normalized duration of the preceding fricative.

8.2 Amplitude Measurement

Both normalized frication noise amplitude and relative amplitude were investigated in our study. Normalized frication RMS amplitude was defined as the difference between the RMS amplitude of the frication noise and the average RMS amplitude of three consecutive pitch periods at the point of maximum vowel amplitude. Consistent with previous research, this measure differentiated nonsibilants (/f, T, D, DQ/) as a class from sibilant fricatives (/s, sQ, z, S/) but failed to distinguish fricatives within each class. Although Jongman et al.'s (2000) study of English fricatives found that noise amplitude differentiated fricatives within sibilants and within nonsibilants, other research on frication noise amplitude (Strevens 1960; Heinz and Stevens 1961; Manrique and Massone 1979; Behrens and Blumstein 1988a) reported that frication noise amplitude distinguished sibilant from nonsibilant fricatives but could not distinguish fricatives within either class. The lower normalized RMS amplitude of nonsibilants relative to sibilants was expected given the intrinsic amplitude of the two classes. Specifically, as explained in Section 8.1, sibilant articulation involves greater articulatory effort to force air through the narrow constriction it requires, which increases noise amplitude. The same reasoning can be used to explain the lower frication noise RMS amplitude of voiceless fricatives (mean −14.22 dB) as compared to their voiced counterparts (mean −18.26 dB). An additional factor is the presence of two sources of acoustic energy during the production of voiced fricatives: the energy from glottal vibration, added to the energy from frication at the oral constriction, increases the overall RMS amplitude of voiced fricatives.
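The normalized amplitude measure defined above can be sketched in Python as follows. The sketch assumes that amplitudes are converted to dB as 20·log10(RMS), that the three pitch periods are supplied as sample lists already cut at period boundaries, and that their linear RMS values are averaged before conversion to dB; none of these implementation details is specified in the text, and the signals are synthetic.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a non-empty sequence of samples."""
    if not samples:
        raise ValueError("empty segment")
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def to_db(amplitude: float) -> float:
    """Convert a linear amplitude to dB (re 1.0)."""
    return 20.0 * math.log10(amplitude)


def normalized_rms_db(frication, pitch_periods):
    """Frication RMS (dB) minus vowel RMS (dB).

    `pitch_periods` holds three consecutive pitch periods taken at the point of
    maximum vowel amplitude; their linear RMS values are averaged (assumption)
    before conversion to dB.
    """
    if len(pitch_periods) != 3:
        raise ValueError("expected three consecutive pitch periods")
    vowel_rms = sum(rms(p) for p in pitch_periods) / 3.0
    return to_db(rms(frication)) - to_db(vowel_rms)


# Synthetic signals: one period of a unit-amplitude sine (RMS = 1/sqrt(2))
# and a "frication" segment of constant magnitude 0.1 (RMS = 0.1).
N = 100
sine_period = [math.sin(2 * math.pi * k / N) for k in range(N)]
noise = [0.1 if k % 2 else -0.1 for k in range(1000)]

if __name__ == "__main__":
    print(round(normalized_rms_db(noise, [sine_period] * 3), 2))
```

For these synthetic signals the result is about −17 dB (−20 dB for the noise minus roughly −3 dB for the sine). As in the study's measure, a louder vowel makes the value more negative.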
Not surprisingly, normalized frication noise RMS amplitude also increased with vowel height. Recall that frication noise RMS amplitude is normalized by subtracting the vowel RMS amplitude, so when the intrinsic vowel amplitude increases, the normalized frication noise RMS amplitude decreases. Intrinsic vowel amplitude, in turn, depends on the degree of openness (height) of the vowel. In the articulation of /a(:)/, the oral cavity is wide open, giving rise to an acoustic waveform of intrinsically higher amplitude (Lehiste and Peterson 1959; Beckman 1986); the opposite is true of high vowels. Interestingly, intrinsic vowel amplitude, like intrinsic duration (see above), led to significant differences in overall frication noise RMS amplitude only when comparisons were confined to long vowels. Previous research on relative amplitude has generally examined the perceptual effect of this cue on distinguishing places of articulation, with Jongman et al. (2000) as the only notable exception. Our study found relative amplitude to be a reliable acoustic cue that differentiates among some, but not all, places of fricative articulation. Moreover, the trend in our data paralleled values previously reported in the literature (Hedrick and Ohde 1993; Jongman et al. 2000). Specifically, the voiceless post-alveolar fricative (/S/, mean = 0.9 dB) had the greatest relative amplitude, indicating a stronger concentration of energy above the F3 region. Furthermore, in line with the findings of Jongman et al. (2000), nonsibilants, especially voiceless ones, had the lowest relative amplitude. More importantly, the pharyngealized fricatives /DQ/ and /sQ/ had significantly lower relative amplitude than their plain counterparts.
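Relative amplitude compares the fricative's level with the vowel's level in the same formant region, so anything that raises the vowel's level there lowers the measure. The Python sketch below assumes that spectra are given as lists of (frequency in Hz, level in dB) pairs and takes the maximum level within a hypothetical ±200 Hz band around the formant. The band width, the peak-picking rule and the synthetic values are illustrative choices, not the procedure specified in this study.

```python
def peak_in_band(spectrum, center_hz, half_width_hz=200.0):
    """Highest level (dB) among (freq_hz, level_db) bins within the band."""
    levels = [db for f, db in spectrum if abs(f - center_hz) <= half_width_hz]
    if not levels:
        raise ValueError("no spectral bins fall inside the band")
    return max(levels)


def relative_amplitude(fricative_spec, vowel_spec, formant_hz, half_width_hz=200.0):
    """Fricative level minus vowel level (dB) in the same formant region."""
    return (peak_in_band(fricative_spec, formant_hz, half_width_hz)
            - peak_in_band(vowel_spec, formant_hz, half_width_hz))


# Synthetic spectra around a vowel F2 of 1200 Hz.
fricative = [(1000.0, 40.0), (1200.0, 42.0), (1400.0, 41.0)]
vowel = [(1000.0, 55.0), (1200.0, 60.0), (1400.0, 58.0)]
louder_vowel = [(f, db + 3.0) for f, db in vowel]  # vowel 3 dB stronger in the band

baseline = relative_amplitude(fricative, vowel, 1200.0)
with_louder_vowel = relative_amplitude(fricative, louder_vowel, 1200.0)

if __name__ == "__main__":
    print(baseline, with_louder_vowel)
```

With the synthetic spectra, raising the vowel by 3 dB in the band lowers the relative amplitude from −18 dB to −21 dB. This is the mechanism invoked in this section for the pharyngealized fricatives.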

The difference in relative amplitude between plain and pharyngealized fricatives can be attributed to the lowering of the vowel's F2 frequency caused by pharyngealization (Stevens 1998), together with the increase in amplitude associated with it. Recall that, for pharyngealized fricatives, relative amplitude was defined as the difference between the fricative's and the vowel's amplitude in the F2 region. Therefore, an increase in vowel amplitude at that frequency lowers the relative amplitude value. There was also an effect of vowel context parallel to that obtained for normalized frication noise RMS amplitude, and, as before, it is related to the vowels' intrinsic amplitude: relative amplitude measured for fricatives preceding the low vowel /a:/ was significantly lower than for those preceding the high vowels /i:, u:/, owing to the inherently higher amplitude of /a:/.

8.3 Spectral Measurement

As in previous studies (Hughes and Halle 1956; Strevens 1960; Manrique and Massone 1981; Behrens and Blumstein 1988b; Jongman et al. 2000), the spectral peak location of fricatives decreased as the place of articulation moved back in the vocal tract. Furthermore, in line with previous research, spectral peak location distinguished nonsibilant from sibilant fricatives, the only exception being the similar values obtained for /s/ and the voiceless nonsibilants /f, T/. Although spectral peak location distinguished post-alveolar /S/ from alveolar /s, z/, it failed to distinguish among nonsibilants. Moreover, plain and pharyngealized fricatives did not differ in the frequency of the amplitude peak measured at the midpoint of the frication noise. Of interest, however, is that three mutually exclusive regions of fricative place of articulation can be identified on the basis of spectral peak location. For voiceless fricatives, the first group includes fricatives articulated at or anterior to the alveolar ridge, the second includes post-alveolar and uvular fricatives, and the third consists of pharyngeal and glottal fricatives. For voiced fricatives, the groups followed the more traditional division into nonsibilants, sibilants and back-articulated fricatives. Spectral peak location was not affected by vowel length but by vowel rounding: the rounded vowel /u/ induced a lower spectral peak location than the unrounded vowels /i, a/. Spectral moments (spectral mean, variance, kurtosis and skewness) were estimated from four windows centered at frication noise onset, midpoint, offset and the transition into the vowel. Although somewhat lower, which can be attributed to the male population from which the data were sampled, the average spectral mean values in our study were consistent with those reported for similar fricatives by Jongman et al. (2000) and Nissen (2003): alveolar fricatives had the highest spectral mean, and pharyngeal and glottal fricatives the lowest. Furthermore, spectral mean averaged across all windows distinguished all places of articulation among voiced fricatives and, as with spectral peak location, identified three mutually exclusive groups of voiceless fricatives (/f, T, s, sQ/, /S, X/ and /è, h/). This classification ability of spectral mean, for both voiced and voiceless fricatives, was present at the second (frication noise midpoint) and third (frication noise offset) windows. Voiceless fricatives also had higher spectral means than voiced fricatives in the first three windows, whereas the effect was reversed when the vocalic part (transition window) was used to measure spectral mean. As with spectral peak location, vowel context influenced spectral mean in all four windows, with the rounded vowels /u(:)/ inducing lower spectral means for the fricatives. Of particular interest is that a significant difference between plain and pharyngealized fricatives emerged only when the fricative's transition into the vowel was used to derive spectral mean values, in part because of the effect of pharyngealization on the vocalic part of that window. As mentioned above, the general pattern of the spectral mean values paralleled that of Jongman et al. (2000). Despite this similarity, spectral mean in our study separated fricatives into their places of articulation most effectively at the frication midpoint and offset, whereas in Jongman et al. it did so at the onset and transition windows. The results for the second spectral moment (variance) paralleled those for spectral mean and were very similar to the values reported by Nissen (2003). No direct comparison could be made with the variance values of Jongman et al. (2000), since that study averaged values across voicing. However, as in both studies, the spectral variance of sibilants was significantly lower than that of nonsibilants in the first three windows for voiceless fricatives and in all windows for voiced fricatives. Nevertheless, no differences were found within nonsibilant fricatives. Jongman et al. (2000) reported similar results for all but the second window. Another finding consistent with previous research is the lower variance of voiceless fricatives compared with voiced fricatives (4.5 MHz and 5.4 MHz, respectively) at the middle of the frication noise. Although variance distinguished many places of fricative articulation, at none of the four analysis windows did it distinguish statistically between plain and pharyngealized fricatives, or between fricatives in vowel contexts differing in length.
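The spectral peak and the four spectral moments discussed in this section can be computed from the power spectrum of a single analysis window, as sketched below in Python. The sketch follows the common practice of treating the normalized power spectrum as a probability distribution over frequency (the moments approach of Forrest et al. 1988). Reporting excess kurtosis (kurtosis minus 3) is our assumed convention, the toy spectra are not data from this study, and the windowing and FFT steps are omitted.

```python
import math


def spectral_peak(freqs, power):
    """Frequency (Hz) of the bin with the greatest power."""
    if not power or len(freqs) != len(power):
        raise ValueError("freqs and power must be non-empty and the same length")
    return freqs[power.index(max(power))]


def spectral_moments(freqs, power):
    """Spectral mean (Hz), variance (Hz^2), skewness and excess kurtosis.

    The power spectrum is normalized to sum to 1 and treated as a probability
    distribution over frequency.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum must contain positive energy")
    p = [x / total for x in power]
    mean = sum(f * w for f, w in zip(freqs, p))
    var = sum((f - mean) ** 2 * w for f, w in zip(freqs, p))
    if var == 0:
        raise ValueError("energy in a single bin has undefined skewness/kurtosis")
    sd = math.sqrt(var)
    skew = sum((f - mean) ** 3 * w for f, w in zip(freqs, p)) / sd ** 3
    kurt = sum((f - mean) ** 4 * w for f, w in zip(freqs, p)) / sd ** 4 - 3.0
    return mean, var, skew, kurt


# Toy spectra: one symmetric, one with energy concentrated at low frequencies
# and a tail toward high frequencies.
FREQS = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0]
symmetric = spectral_moments(FREQS, [1.0, 2.0, 4.0, 2.0, 1.0])
low_heavy = spectral_moments(FREQS, [4.0, 3.0, 2.0, 1.0, 0.5])

if __name__ == "__main__":
    print("peak:", spectral_peak(FREQS, [4.0, 3.0, 2.0, 1.0, 0.5]))
    print("symmetric:", [round(v, 3) for v in symmetric])
    print("low-heavy skewness:", round(low_heavy[2], 3))
```

For the symmetric toy spectrum the mean is 3000 Hz, the skewness is 0 and the excess kurtosis is −0.5. Shifting energy toward the low end makes the skewness positive, the pattern reported in this study for pharyngeal and glottal fricatives.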
Skewness did not differentiate plain fricatives from their pharyngealized counterparts at any window location. However, skewness measured at the second and third windows differentiated all voiced fricatives. With the exception of alveolar /z/, which had the only negatively skewed distribution among voiced fricatives, skewness was positive and increased as the place of articulation moved back in the vocal tract. For voiceless fricatives, skewness at the second analysis window distinguished sibilants from nonsibilants, as well as fricatives within the sibilant class. In general, alveolar fricatives had the lowest skewness, indicating a concentration of energy at higher frequencies, whereas pharyngeal and glottal fricatives had energy concentrated at lower frequencies. Although more places of articulation were investigated here than in either Jongman et al. (2000) or Nissen (2003), our results are in general agreement with both studies for alveolar and post-alveolar fricatives. Our study also agrees with Jongman et al. in that skewness increases substantially at the fricative-vowel transition owing to “the predominance of low-frequency over high-frequency energy as the vowel begins” (Jongman et al. 2000, p. 1257). The effect of vowel context became more pronounced in this transition window, particularly with the rounded vowels /u, u:/ and their inherently lower formant frequencies. Kurtosis has been used in the literature as a measure of the peakedness of the spectral distribution. In our study, kurtosis in the first three windows was substantially higher for the pharyngeal fricatives /è, Q/ than for all other fricatives. Furthermore, the peakedness of alveolar fricatives observed elsewhere in the literature (Tomiak 1990; Jongman et al. 2000; Nissen 2003) was not observed in our results.

8.4 Transition Information

Formant transitions at the fricative-vowel boundary were investigated using two measures: the second formant at the transition, and locus equations. The F2 results were consistent with the predictions of the Source-Filter theory of speech production. Specifically, the F2 values of pharyngealized fricatives were significantly lower than those of their plain counterparts. As mentioned previously, such values are expected given the lowering of the second formant in pharyngeal coarticulation (Stevens 1998). Also of interest was the finding that, among the back-articulated fricatives, only the uvular fricatives had F2 values similar to those of the pharyngealized fricatives, that is, significantly lower than those of sibilants and nonsibilants.
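A locus equation is a linear regression of F2 at vowel onset on F2 in the vowel, fitted across vowel contexts for a single consonant; its slope and y-intercept are the two parameters compared in this section. The Python sketch below fits it by ordinary least squares. Using the vowel-midpoint F2 as the predictor follows common practice in the locus-equation literature (e.g., Sussman et al. 1991) and is an assumption here, since this chapter does not restate the measurement point; the values are synthetic.

```python
def locus_equation(f2_vowel, f2_onset):
    """Ordinary least-squares fit of F2_onset = slope * F2_vowel + intercept.

    Each pair is one token (vowel context) of the same consonant; values in Hz.
    """
    n = len(f2_vowel)
    if n != len(f2_onset) or n < 2:
        raise ValueError("need at least two paired F2 measurements")
    mean_x = sum(f2_vowel) / n
    mean_y = sum(f2_onset) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel)
    if sxx == 0:
        raise ValueError("F2 in the vowel must vary across contexts")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(f2_vowel, f2_onset))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic tokens lying exactly on F2_onset = 0.5 * F2_vowel + 800 Hz.
slope, intercept = locus_equation([1000.0, 1500.0, 2200.0], [1300.0, 1550.0, 1900.0])

if __name__ == "__main__":
    print(f"slope = {slope:.2f}, y-intercept = {intercept:.0f} Hz")
```

A slope near 1 means that onset F2 closely tracks the vowel (strong coarticulation), whereas a slope near 0 means that onset F2 stays roughly fixed whatever the vowel. This is why the slope is commonly read as an index of the degree of coarticulation (cf. Yeou 1997).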

The similar grouping of uvular and pharyngealized fricatives suggests similar articulatory processes in their production. The reasoning behind this interpretation is twofold: first, F2 values are inversely related to the height of the tongue; and second, the secondary constriction involved in the production of /DQ, sQ/ is located higher than the constriction of the plain pharyngeal fricatives (Al-Ani 1970; McCarthy 1994; Ladefoged and Maddieson 1996). Therefore, the fact that pharyngealized and uvular fricatives shared F2 properties distinct from those of all other fricatives supports McCarthy's (1994) proposal to call the coarticulated emphatics of Arabic “uvularized” rather than “pharyngealized”. However, this generalization should be treated with caution, since whether emphatics are realized as uvularized or pharyngealized depends on the dialect of Arabic (Keating 1988; Zawaydeh 1997; Watson 1999). In general, neither the slope nor the y-intercept of the locus equations distinguished among all places of fricative articulation. However, both measures set apart the uvular and glottal fricatives /X, K, h/ as a group with a higher slope and a lower y-intercept than all other fricatives. More importantly, and in contrast to the findings of Yeou (1997), the y-intercept of pharyngealized fricatives did not differ from that of their plain counterparts, and only the slope of /DQ/ differed from that of /D/.

8.5 Discriminant Analysis

All acoustic cues except the locus equations were entered into a discriminant function analysis to identify the cues contributing most to the classification of fricatives by place of articulation. The variables contributing most to the overall classification were spectral mean (at frication noise onset, midpoint, and offset), skewness (at frication onset and offset and at the transition into the vowel), F2 at vowel onset, normalized RMS amplitude, and spectral peak location, which together yielded an overall correct classification rate of 83.2%. When voicing was specified in the model, the correct classification rate increased to 92.9% for voiced and 93.5% for voiceless fricatives. If the pattern of misclassifications is taken into account, however, the fricatives cluster into three groups, namely nonsibilants, sibilants and gutturals, with the pharyngealized fricatives grouped with their plain counterparts in the same natural class.

8.6 Conclusion

Our study investigated the acoustic characteristics of Arabic fricatives. Results for most of the cues were consistent with those of previous research on fricatives in other languages. Among the cues investigated, spectral measures were the most efficient in distinguishing among places of fricative articulation. Further research should examine the perceptual reality of the acoustic cues investigated in this study and how changes in these cues affect the perception of fricative place of articulation.

REFERENCES

Abdelatty Ali, A. M., J. Van der Spiegel, and P. Mueller (2001). Acoustic-phonetic features for the automatic classification of fricatives. J Acoust Soc Am 109 (5 Pt 1), 2217–2235.
Al-Ani, S. H. (1970). Arabic Phonology. The Hague; Paris: Mouton.
Alwan, A. (1989). Perceptual cues for place of articulation for the voiced pharyngeal and uvular consonants. J Acoust Soc Am 86 (2), 549–556.
Anderson, N. (1978). On the calculation of filter coefficients for maximum entropy spectral analysis. In D. G. Childers (Ed.), Modern spectrum analysis, pp. 252–255. New York, NY: IEEE Press.
Baum, S. R. and S. E. Blumstein (1987). Preliminary observations on the use of duration as a cue to syllable-initial fricative consonant voicing in English. J Acoust Soc Am 82 (3), 1073–1077.
Beckman, M. E. (1986). Stress and non-stress accent. Dordrecht, Holland: Foris.
Behrens, S. and S. E. Blumstein (1988a). Acoustic characteristics of English voiceless fricatives: a descriptive analysis. J Phonetics 16, 295–298.
Behrens, S. and S. E. Blumstein (1988b). On the role of the amplitude of the fricative noise in the perception of place of articulation in voiceless fricative consonants. J Acoust Soc Am 84 (3), 861–867.
Boersma, P. and D. Weenink (2004). Praat: a system for doing phonetics by computer. Amsterdam: Institute of Phonetic Sciences of the University of Amsterdam.
Chen, H. and K. N. Stevens (2001). An acoustical study of the fricative /s/ in the speech of individuals with dysarthria. J Speech Lang Hear Res 44 (6), 1300–1314.
Cole, R. A. and W. E. Cooper (1975). Perception of voicing in English affricates and fricatives. J Acoust Soc Am 58 (6), 1280–1287.
Crystal, T. and A. House (1988). Segmental durations in connected-speech signals: Current results. J Acoust Soc Am 83, 1553–1573.
El-Halees, Y. (1985). The role of F1 in the place-of-articulation distinction in Arabic. J Phonetics 13 (3), 287–298.
Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.


Ferguson, C. A. (1959). Diglossia. Word 15, 325–340.
Forrest, K., G. Weismer, P. Milenkovic, and R. N. Dougall (1988). Statistical analysis of word-initial voiceless obstruents: preliminary data. J Acoust Soc Am 84 (1), 115–123.
Fowler, C. A. (1994). Invariants, specifiers, cues: An investigation of locus equations as information for place of articulation. Perception & Psychophysics 55, 597–611.
Fox, R. A., S. Nissen, J. McGory, and K. Rosenbauer (2001). Age-related changes in the acoustic characteristics of voiceless English fricatives. J Acoust Soc Am 110, 2704.
Govindarajan, K. (1998). Listeners’ perceptual mapping of locus equations and variability. Behav Brain Sci 21 (2), 266–267.
Gurlekian, J. A. (1981). Recognition of the Spanish fricatives /s/ and /f/. J Acoust Soc Am 70 (6), 1624–1627.
Hair, J., R. Anderson, and R. Tatham (1987). Multivariate data analysis with readings. New York, NY: MacMillan.
Harrington, J. and S. Cassidy (1999). Techniques in Speech Acoustics. Norwell, MA: Kluwer Academic Publishers.
Harris, F. J. (1978). On the use of windows for harmonic analysis with the discrete Fourier transform. Proceedings of the IEEE 66, 51–83.
Harris, K. S. (1958). Cues for the discrimination of American English fricatives in spoken syllables. Lang Speech 1, 1–7.
Hedrick, M. (1997). Effect of acoustic cues on labeling fricatives and affricates. J Speech Lang Hear Res 40 (4), 925–938.
Hedrick, M. S. and R. N. Ohde (1993). Effect of relative amplitude of frication on perception of place of articulation. J Acoust Soc Am 94 (4), 2005–2027.
Heinz, J. M. and K. N. Stevens (1961). On the properties of voiceless fricative consonants. J Acoust Soc Am 33, 589–596.
Hughes, G. W. and M. Halle (1956). Spectral properties of fricative consonants. J Acoust Soc Am 28, 303–310.
Jassem, W. (1979). Classification of fricative spectra using statistical discriminant functions. In B. Lindblom and S. Öhman (Eds.), Frontiers of Speech Research. London: Academic Press.
Johnson, K. (1997). Acoustic and Auditory Phonetics. Oxford: Blackwell.

Jongman, A. (1989). Duration of fricative noise required for identification of English fricatives. J Acoust Soc Am 85, 1718–1725.
Jongman, A. (1998). Are locus equations sufficient or necessary for obstruent perception? Behav Brain Sci 21 (2), 271–272.
Jongman, A., R. Wayland, and S. Wong (2000). Acoustic characteristics of English fricatives. J Acoust Soc Am 108 (3 Pt 1), 1252–1263.
Kaye, A. S. (1972). Arabic /ž/: A synchronic and diachronic study. Linguistics 79, 31–63.
Keating, P. (1988). A Survey of Phonological Features. Bloomington, IN: Indiana University Linguistics Club.
Kent, R. D. and C. Read (2002). The Acoustic Analysis of Speech. San Diego: Singular Publishing Group.
Klecka, W. (1980). Discriminant Analysis. London: Sage.
Krull, D. (1989). Second formant locus pattern and consonant-vowel coarticulation in spontaneous speech. Perilus 10, 87–108.
Ladefoged, P. and I. Maddieson (1996). The sounds of the world’s languages. Oxford: Blackwell.
LaRiviere, C., H. Winitz, and F. Herriman (1975). The distribution of perceptual cues in English prevocalic fricatives. J Speech Hear Res 18, 613–622.
Lehiste, I. and G. Peterson (1959). Vowel amplitude and phonemic stress in American English. J Acoust Soc Am 31, 428–435.
Liberman, A. M., F. S. Cooper, D. P. Shankweiler, and M. Studdert-Kennedy (1967). Perception of the speech code. Psychol Review 74 (6), 431–461.
Lindblom, B. (1963). A spectrographic study of vowel reduction. J Acoust Soc Am 35, 1773–1781.
Lindblom, B. (1967). Vowel duration and a model of lip mandible coordination. STL-QPSR 8 (4), 1–29.
Mann, V. A. and B. H. Repp (1980). Influence of vocalic context on perception of the [s] - [sh] distinction. Perception & Psychophysics 28, 213–228.
Manrique, A. M. and M. I. Massone (1979). On the identification of Argentine Spanish voiceless fricatives. In Proceedings of the Ninth International Congress of Phonetic Sciences, Volume 1, Copenhagen, Denmark, pp. 237.
Manrique, A. M. and M. I. Massone (1981). Acoustic analysis and perception of Spanish fricative consonants. J Acoust Soc Am 69 (4), 1145–1153.

McCarthy, J. (1994). The phonetics and phonology of Semitic pharyngeals. In P. Keating (Ed.), Papers in laboratory phonology 3: Phonological structure and phonetic form, pp. 191–233. Cambridge: Cambridge University Press.
McCasland, G. P. (1979). Noise intensity and spectrum cues for spoken fricatives. J Acoust Soc Am Suppl 165, S78–79.
Nissen, S. (2003). An acoustic analysis of voiceless obstruents produced by adults and typically developing children. Ph. D. thesis, Ohio State University, Columbus, OH.
Nittrouer, S. (1995). Children learn separate aspects of speech production at different rates: evidence from spectral moments. J Acoust Soc Am 97 (1), 520–530.
Nittrouer, S., M. Studdert-Kennedy, and R. McGowan (1989). The emergence of phonetic segments: evidence from the spectral structure of fricative-vowel syllables spoken by children and adults. J Speech Hear Res 32, 120–132.
Norlin, K. (1983). Acoustic analysis of fricatives in Cairo Arabic. Working Papers, Phonetics Laboratory, Lund University 25, 113–137.
Pentz, A., H. R. Gilbert, and P. Zawadzki (1979). Spectral properties of fricative consonants in children. J Acoust Soc Am 66 (6), 1891–1893.
Pirello, K., S. E. Blumstein, and K. Kurowski (1997). The characteristics of voicing in syllable-initial fricatives in American English. J Acoust Soc Am 101 (6), 3754–3765.
Press, W. H., B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling (1992). Numerical recipes in C: the art of scientific computing. Cambridge: Cambridge University Press.
Shadle, C., S. J. Mair, and J. N. Carter (1996). Acoustic characteristics of the front fricatives [f, v, T, D]. In Proceedings of ETRW - 4th Speech Production Seminar, Autrans, France, pp. 193–169.
Shadle, C. H. (1985). The acoustics of fricative consonants. Ph. D. thesis, M.I.T., Cambridge, MA.
Shadle, C. H. (1990). Articulatory-acoustic relationships in fricative consonants. In W. J. Hardcastle and A. Marchal (Eds.), Speech Production and speech modelling, pp. 187–209. Dordrecht, Netherlands: Kluwer Academic Publishers.
Shadle, C. H. and S. J. Mair (1996, October). Quantifying spectral characteristics of fricatives. In Proceedings of the Fourth International Conference on Spoken Language Processing, Volume 3, Philadelphia, PA, pp. 1521–1524.

Soli, S. D. (1981). Second formants in fricatives: acoustic consequences of fricative-vowel coarticulation. J Acoust Soc Am 70 (4), 976–984.
Stevens, J. (2002). Applied multivariate statistics for the social sciences. Mahwah, NJ: Erlbaum.
Stevens, K. N. (1971). Airflow and turbulence noise for fricative and stop consonants: Static considerations. J Acoust Soc Am 50, 1182–1192.
Stevens, K. N. (1985). Evidence for the role of acoustic boundaries in the perception of speech sounds. In V. Fromkin (Ed.), Phonetic Linguistics, pp. 243–256. New York, NY: Academic Press.
Stevens, K. N. (1998). Acoustic Phonetics. Cambridge, MA: MIT Press.
Stevens, K. N. and S. E. Blumstein (1981). The search for invariant acoustic correlates of phonetic features. In P. D. Eimas and J. L. Miller (Eds.), Perspectives on the Study of Speech. Hillsdale, NJ: Erlbaum.
Strevens, P. (1960). Spectra of fricative noise in human speech. Lang Speech 3, 32–49.
Sussman, H. M. (1994). The phonological reality of locus equations across manner class distinctions: Preliminary observations. Phonetica 51, 119–131.
Sussman, H. M., D. Fruchter, J. Hilbert, and J. Sirosh (1998). Linear correlates in the speech signal: the orderly output constraint. Behav Brain Sci 21 (2), 241–299.
Sussman, H. M., K. A. Hoemeke, and F. S. Ahmed (1993). A cross-linguistic investigation of locus equations as a phonetic descriptor for place of articulation. J Acoust Soc Am 94 (3 Pt 1), 1256–1268.
Sussman, H. M., H. A. McCaffrey, and S. A. Matthews (1991). An investigation of locus equations as a source of relational invariance for stop place categorization. J Acoust Soc Am 90, 1309–1325.
Tabain, M. (1998). Non-sibilant fricatives in English: spectral information above 10 kHz. Phonetica 55 (3), 107–130.
Tabain, M. (2001). Variability in fricative production and spectra: implications for the hyper- and hypo- and quantal theories of speech production. Lang Speech 44 (Pt 1), 57–94.
Tabain, M. (2002). Voiceless consonants and locus equations: a comparison with electropalatographic data on coarticulation. Phonetica 59 (1), 20–37.
Tjaden, K. and G. S. Turner (1997). Spectral properties of fricatives in amyotrophic lateral sclerosis. J Speech Lang Hear Res 40 (6), 1358–1372.

Tomiak, G. R. (1990). An acoustic and perceptual analysis of the spectral moments invariant with voiceless fricative obstruents. Ph. D. thesis, State University of New York, Buffalo, NY.
Watson, J. C. (1999). The directionality of emphasis spread in Arabic. Linguistic Inquiry 30, 289–300.
Wilde, L. (1993). Inferring articulatory movements from acoustic properties at fricative-vowel boundaries. J Acoust Soc Am 94, 1881.
Wilde, L. F. and C. B. Huang (1991). Acoustic properties at fricative-vowel boundaries in American English. In Proceedings of the 12th International Congress of Phonetic Sciences, Aix-en-Provence, pp. 394–401.
Yeou, M. (1997). Locus equations and the degree of coarticulation of Arabic consonants. Phonetica 54, 187–202.
Zawaydeh, B. A. (1997). An acoustic analysis of spread in Ammani-Jordanian Arabic. Studies in the Linguistic Sciences 27 (1), 185–200.

BIOGRAPHICAL SKETCH

Mohamed Ali Al-Khairy was born in Makkah, Saudi Arabia. He attended Umm Al-Qura University, where he earned his B.A. in English Literature and Linguistics. He began graduate study in linguistics at the University of Florida in Fall 1998, completed an M.A. in linguistics in Fall 2000, and then embarked on a Ph.D. in linguistics. During his studies, he taught for the Department of African and Asian Languages and Literature from 1999 to 2004. In 2002 he received an Alec Courtelis Award for Exceptional International Students and a College of Liberal Arts and Sciences Award for International Student with Outstanding Academic Achievement. He was also awarded a McLaughlin Dissertation Fellowship in Spring 2005.
