HCS 7367 Speech Perception
Dr. Peter Assmann
Fall 2014

Phonetics

• Anatomical, acoustic, neural properties
• Production and perception
• Focuses on continuous/physical properties
• Basic unit: phone (= speech sound)
• Narrow transcription: [pʰ]

Phonology

• Abstract representations mediate between the speech stream and its meaningful interpretation
• Discrete entities (phonemes, ...)
• Basic unit: phoneme (smallest contrastive unit of a language)
• Broad transcription: /p/
• Phonemes: minimum meaning-differentiating units in a language
• Phonemic inventory: list of the phonemes of a given language
• American English has ~12 vowel phonemes, 3 diphthongs, ~24 consonants

Phonology

• Minimal pairs test: criterion for deciding whether two sounds belong to the same phonemic category.
• If two words or phrases in a given language differ in only one phonological segment (e.g. English “pin” and “bin”) but are perceived as distinct words, then that sound contrast involves two distinct phonemes (/p/ and /b/).
• Allophones: variants (pronunciations) of a given phoneme
• In English, the “p” sound in “pot” is aspirated (pronounced [pʰɔt]) while the “p” in “spot” is unaspirated (pronounced [spɔt]).
• [pʰ] and [p] are allophones of the phoneme /p/
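The minimal pairs test can be sketched in a few lines of Python. This is an illustration only: the tiny lexicon, its broad transcriptions, and the function names are assumptions made for the example, not part of the course material.

```python
from itertools import combinations

# Hypothetical toy lexicon: word -> broad (phonemic) transcription,
# one list element per segment. Entries are illustrative only.
LEXICON = {
    "pin": ["p", "ɪ", "n"],
    "bin": ["b", "ɪ", "n"],
    "pit": ["p", "ɪ", "t"],
    "spot": ["s", "p", "ɔ", "t"],
}

def differing_positions(a, b):
    """Return the positions at which two equal-length transcriptions differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

def minimal_pairs(lexicon):
    """Yield (word1, word2, (seg1, seg2)) for words differing in exactly one segment."""
    for (w1, t1), (w2, t2) in combinations(lexicon.items(), 2):
        if len(t1) != len(t2):
            continue  # different number of segments: not a minimal pair
        diff = differing_positions(t1, t2)
        if len(diff) == 1:
            i = diff[0]
            yield w1, w2, (t1[i], t2[i])

pairs = list(minimal_pairs(LEXICON))
for w1, w2, (s1, s2) in pairs:
    print(f"{w1} / {w2}: /{s1}/ vs /{s2}/")
```

Because the lexicon is transcribed broadly, the aspirated and unaspirated variants of /p/ share one symbol and can never form a minimal pair here, which is consistent with treating them as allophones.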

Phonology

• Jakobson, Fant and Halle (1961). Preliminaries to Speech Analysis.
  - Distinctive feature theory: universal set of (binary) features, combining articulation and perception
• Chomsky and Halle (1968). The Sound Pattern of English.
  - Phonological representations as sequences of segments (phonemes) composed of distinctive features.
  - Two levels: underlying abstract representation and surface phonetic representation

Perception

• What is a consonant?
• Consonants are sounds produced with a major constriction somewhere in the vocal tract.

English Consonants

Manner       Bilabial    Labiodental Dental      Alveolar    Palatal     Velar       Glottal
Voicing      -   +       -   +       -   +       -   +       -   +       -   +       -
Stop         p   b                               t   d                   k   g
Nasal            m                                   n                       ŋ
Fricative                f   v       θ   ð       s   z       ʃ   ʒ                   h
Affricate                                                    tʃ  ʤ
Approximant      (w)                                 r           j           w
Lateral                                              l

Ladefoged, P. (1993). A Course in Phonetics. 3rd Ed. Harcourt Brace: Fort Worth.

Consonant production

• American English has 24 consonant phonemes
• The standard phonetic classification system uses three main features to classify the articulation of consonants:
  - Place of articulation
  - Manner of articulation
  - Voicing

Consonant classification

• Place of articulation
  - bilabial, labiodental, dental, alveolar, palatal, velar, glottal
• Manner of articulation
  - stop, nasal, fricative, affricate, approximant, lateral
• Voicing
  - voiced, voiceless

Stop consonants in English

             Place of articulation
Voicing      bilabial  alveolar  velar
voiced       /ba/      /da/      /ga/
voiceless    /pa/      /ta/      /ka/

Manner of articulation = stop consonants

Consonant production

• Speech production theory
  - Source-filter concept still applies
  - Relatively narrow constriction or closure
  - Secondary sound source in the upper vocal tract

Source-filter theory for consonants

• Consonants: sounds produced with a major constriction somewhere in the vocal tract.
• Source-filter theory still applies, but there are complications:
  1. secondary sound source within the vocal tract
  2. secondary source is aperiodic, turbulent noise (frication) rather than quasi-periodic like the glottal source
  3. secondary source may combine with the glottal source to produce mixed excitation (voiced fricatives)

[Figure: Acoustic tube perturbation and vowel formants. Source: Stevens (1999, p. 286)]

[Figure: Filter properties of stop consonants: F2 frequency (kHz) plotted against F3 frequency (kHz), with regions for labial, alveolar, and velar stops. After Stevens (1997)]

[Figure: Area functions: cylindrical tube approximations of the vocal tract]

Source-filter theory for consonants

• Obstruents (fricatives, affricates, stops)
• Sonorants (nasals, liquids, glides)
• In obstruents, the constriction can divide the resonating vocal tract into two separate cavities.
  - coupled vs. uncoupled
  - interactions with the source
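As a rough illustration of the uncoupled case, the cavity in front of a constriction can be approximated as a uniform tube closed at the constriction and open at the lips, whose resonances are odd multiples of c / 4L. The sketch below rests on that simplifying assumption; the speed of sound and the cavity lengths are hypothetical values chosen for the example, not measurements from the slides.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air

def quarter_wave_resonances(length_cm, n_resonances=3):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other."""
    if length_cm <= 0:
        raise ValueError("tube length must be positive")
    return [(2 * k - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for k in range(1, n_resonances + 1)]

# Hypothetical front-cavity lengths, for illustration only.
for label, length_cm in [("short front cavity", 2.0), ("longer front cavity", 5.0)]:
    lowest = quarter_wave_resonances(length_cm)[0]
    print(f"{label} ({length_cm} cm): lowest resonance = {lowest:.0f} Hz")
```

Under these assumptions a shorter front cavity gives a higher lowest resonance, which is one way to think about why noise spectra differ with the place of the constriction.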

[Figure: Schematic model for fricative /s/, showing the glottis, back cavity, anterior constriction, jet, front cavity, obstruction (maxillary incisors), eddies, and lips. After Kent, Dembowski & Lass (1996)]

[Figure: Voiceless unaspirated stop consonant [pɑ]; horizontal axis: Time (ms)]

Place of articulation

• What are the acoustic cues for place of articulation (i.e., what acoustic properties do listeners use to distinguish /p/, /t/, /k/ (and /b/, /d/, /g/) from each other)?
  1. Frequency of burst
  2. Formant transitions (F2 and F3) from the burst to the vowel

Acoustic cues for place of articulation in stops

• Burst cues
  - Release burst, frication noise, aspiration noise
  - Spectral shape
  - Static vs. dynamic
• Formant transition cues
  - F2, F3 onset vs. target (vowel) frequency

Formant transitions

• During the closure for a stop consonant the vocal tract is completely closed, and no sound is emitted. At the moment of release the resonances change rapidly. These changes are called formant transitions.

Formant transitions

• The first formant (F1) always increases in frequency following the release of a stop closure. F2 and F3 increase or decrease, depending on the place of constriction and the flanking vowels.

Burst cues

• The burst is a brief acoustic transient that occurs at the moment of consonant release.
  - Burst intensity is higher for voiceless stops than for voiced stops
  - Burst frequency varies as a function of place

[Figure: Waveform with the burst and the voicing onset marked]

Burst cues

• Is consonant place information available when the burst is removed?

Stop consonants (from Kent and Read, 1992)

Syllable-initial prevocalic stop, /pɑ/:
• Closure (stop gap)
• Release (noise burst): aspirated or unaspirated
• Transition (formant transitions)

[Figure: Spectrograms of an alveolar stop (/ata/, /ada/) and a velar stop (/aka/, /aga/), with closure (stop gap), release (noise burst), and formant transitions marked; axes: Time (ms) vs. Frequency (kHz). Alveolar: high F2, high F3. Velar: high F2, low F3.]

[Figure: Spectrogram of a bilabial stop (/apa/, /aba/), with closure (stop gap), release (noise burst), and formant transitions marked; axes: Time (ms) vs. Frequency (kHz). Bilabial: low F2, low F3.]

Stop consonants (from Kent and Read, 1992)

Syllable-final postvocalic stop:
• Transition (formant transitions)
• Closure (stop gap)
• Release (noise burst) or no release (no burst)

[Figure: Spectrograms of a syllable-final alveolar stop (/ad/) and a syllable-final bilabial stop (/ab/); axes: Time (ms) vs. Frequency (kHz). Alveolar: high F2, high F3. Bilabial: low F2, low F3.]

Burst cues: /ba/ and /pa/

[Figure: Spectrograms of /ba/ and /pa/; axes: Time (ms) vs. Frequency (kHz)]

• Bilabials: Burst energy is concentrated at low frequencies

Burst cues: /da/ and /ta/

[Figure: Spectrograms of /da/ and /ta/; axes: Time (ms) vs. Frequency (kHz)]

• Alveolars: Burst energy is concentrated at high frequencies

Burst cues: /ga/ and /ka/

[Figure: Spectrograms of /ga/ and /ka/; axes: Time (ms) vs. Frequency (kHz)]

• Velars: Burst energy is concentrated in the mid range


Invariance problem for stop consonants

[Figure: Liberman, Delattre, and Cooper (1952). Stimuli: noise bursts followed by transitionless vowels. Burst frequency (kHz) is plotted against the vowel, with the majority response (p, t, or k) shown for each combination. Cooper, Delattre, Liberman, Borst & Gerstman (1952), JASA 24, 597-606]

Invariance problem for stop consonants

[Figure: Transition + vowel without burst]

Categorical Perception of stop consonants

• Equal acoustic changes produce unequal auditory percepts
• Place of articulation of stops: /b/ vs /d/ vs /g/

[Figure: Identification (labeling) function and discrimination function along a /b/-/d/-/g/ continuum, with the phoneme boundary at 50%. Liberman, Harris, Hoffman, and Griffith (1957), Journal of Experimental Psychology 54, 358-368]

Categorical Perception

1. Identification function shows steep slope (cross-over) between categories
2. Good between-category discrimination (near perfect) when the members of the pair belong to different categories
3. Poor within-category discrimination (near chance) when they are perceived to belong to the same category

Segmentation problem for stops

• Where are the segments?

[Figure: Spectrogram of "bag" /bæg/ showing F1, F2, and F3 over time]

[Figure: Same noise, different consonant: a 1400 Hz noise burst in "pea" and "ka". Different transition, same consonant: F1 and F2 patterns for "dee" and "da".]

Motor Theory

• Invariance problem
• Segmentation problem
• Categorical perception
• Mapping from continuous to discrete
• Encodedness of speech
• Units of perception (segments, syllables, ...)
• Linguistic units, and articulation

K.N. Stevens

• Quantal theory
  - Certain sounds are favored in the languages of the world because their acoustic properties can be produced with a wide range of articulations.

[Figure: Acoustic consequences plotted against articulatory variations]

K.N. Stevens

• Distinctive feature theory
  - syllables /bi/, /du/, …
  - segments /b/, /d/, …
  - features [+voiced], [-rounded], …

K.N. Stevens JASA 2002

• Acoustic Landmarks
  - An early stage in the processing of speech identifies acoustic landmarks in the signal such as syllable nuclei and acoustic discontinuities corresponding to consonantal closures and releases.
  - Landmarks point to regions of the signal where a more detailed phonetic analysis is carried out.
• Lexical access from features
  - The acoustic signal in the vicinity of these landmarks is processed by a set of modules, each of which identifies a phonetic feature.
  - From these landmarks and features, lexical hypotheses are evaluated.
  - This is done using analysis-by-synthesis, in which a word sequence is hypothesized, a possible pattern of features from this sequence is internally synthesized, and the synthesized pattern is tested for a match against an acoustically derived pattern.

Static spectral shape model

• Blumstein and Stevens (1979, 1980) invariance hypothesis (static spectral shape model)
• The shape of the spectrum, sampled at the time of burst release, provides invariant cues specifying the place of articulation for the English stop consonants.

Burst spectrum

Step 1: Find consonant burst onset

Temporal window

Step 2: Position a 25.6 ms half-Hamming window

Acoustic landmarks

Step 3: Compute spectrum of windowed segment

[Figure: LPC spectrum and FFT spectrum of the windowed segment; axes: Frequency (kHz) vs. Amplitude in dB]
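Steps 2 and 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the original analysis procedure: the burst onset (Step 1) is taken as given, the half-Hamming window is assumed to be the decaying half of a Hamming window (largest weight at burst onset), the 10 kHz sampling rate and the synthetic damped sinusoid standing in for a burst are made up for the example, and only a plain DFT magnitude spectrum is computed (no LPC smoothing).

```python
import cmath
import math

def half_hamming(n_samples):
    """Decaying half of a Hamming window: weight 1.0 at sample 0, 0.08 at the end."""
    return [0.54 + 0.46 * math.cos(math.pi * n / (n_samples - 1))
            for n in range(n_samples)]

def magnitude_spectrum_db(frame):
    """Naive DFT magnitude in dB for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(20.0 * math.log10(abs(acc) + 1e-12))
    return spectrum

def burst_spectrum(signal, burst_onset, sample_rate, window_ms=25.6):
    """Window the signal from burst onset and return (frequencies in Hz, dB values)."""
    n = int(round(window_ms * 1e-3 * sample_rate))
    segment = signal[burst_onset:burst_onset + n]
    if len(segment) < n:
        raise ValueError("signal too short for the requested window")
    frame = [s * w for s, w in zip(segment, half_hamming(n))]
    db = magnitude_spectrum_db(frame)
    freqs = [k * sample_rate / n for k in range(len(db))]
    return freqs, db

# Synthetic stand-in for a burst: silence, then a damped 3 kHz sinusoid.
sample_rate = 10000  # Hz, assumed
onset = 100          # sample index of the burst onset, assumed known
burst = [math.exp(-i / (0.005 * sample_rate))
         * math.sin(2 * math.pi * 3000 * i / sample_rate) for i in range(400)]
signal = [0.0] * onset + burst

freqs, db = burst_spectrum(signal, onset, sample_rate)
peak_hz = freqs[db.index(max(db))]
print(f"spectral peak near {peak_hz:.0f} Hz")
```

At the assumed 10 kHz sampling rate, a 25.6 ms window is 256 samples, a power of two, which would also suit an FFT implementation.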

Static spectral shape model

[Figure: Burst spectra of syllable-initial stop consonants: /ba/ (diffuse falling), /da/ (diffuse rising), /ga/ (compact); axes: Frequency (kHz) vs. Amplitude in dB]

Static spectral shape model

• Predictions:
  1) spectral shape for a given place of articulation is the same for different talkers and in different phonetic contexts.
  2) acoustic modifications that distort other aspects of the syllable but preserve the spectral shape near the burst do not affect consonant identity.

Dynamic properties

• Kewley-Port (1983) hypothesized that the identification of stop consonants is based on time-varying changes in the spectrum from the onset of the burst into the transition region.

Locus Equations

• Delattre et al. (1954) described the locus as the frequency location of F2 extrapolated back in time to the consonant release.

[Figure: F2 transitions for /i/ and /u/ extrapolated back to a common locus; labels: locus, onset of voiced formant transition, onset of vowel target]

Problems with the locus concept

• No physical energy exists at the time/frequency position of the locus; actual extrapolation of the formant transition may lead to a change in consonant identity.

Locus equations

• Sussman et al. (1991, 1993) proposed that locus equations provide invariant relational cues for the perception of place of articulation in stop consonants.
• Locus equations describe the relationship between the frequency of F2 at burst onset and F2 in the vowel.
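A locus equation is usually written as a straight line, F2 onset = slope × F2 vowel + intercept, fitted across vowel contexts for one consonant. The sketch below fits that line by ordinary least squares. The function name and the F2 values are hypothetical, invented for the example; they are not data from Sussman et al.

```python
def fit_locus_equation(f2_vowel_hz, f2_onset_hz):
    """Least-squares fit of F2 onset on F2 vowel; returns (slope, intercept)."""
    n = len(f2_vowel_hz)
    if n != len(f2_onset_hz) or n < 2:
        raise ValueError("need two equal-length lists with at least two tokens")
    mean_x = sum(f2_vowel_hz) / n
    mean_y = sum(f2_onset_hz) / n
    sxx = sum((x - mean_x) ** 2 for x in f2_vowel_hz)
    if sxx == 0:
        raise ValueError("F2 vowel values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(f2_vowel_hz, f2_onset_hz))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up F2 values (Hz) for one consonant across five vowel contexts.
f2_vowel = [2300.0, 1900.0, 1500.0, 1200.0, 900.0]
f2_onset = [2050.0, 1720.0, 1410.0, 1150.0, 930.0]

slope, intercept = fit_locus_equation(f2_vowel, f2_onset)
print(f"F2onset = {slope:.2f} * F2vowel + {intercept:.0f} Hz")
```

In this framing, the slope and intercept of the fitted line, rather than any single F2 value, serve as the relational description of a place of articulation.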

[Figure: Locus equations: spectrograms of "ba", "da", and "ga"; axes: Time (ms) vs. Frequency (kHz). Sussman (1991)]

Smits, ten Bosch, and Collier (1996)

• Gross cues
  - spectral features that are distributed across frequency or time
  - overall spectral shape or its relative change over time
• Detailed cues
  - features that are narrowly localized in frequency or time
  - center frequency of prominent spectral peaks and their dynamic transitions.

Cues for voicing

• VOT: voice onset time
  - Short lag (0-20 ms) = voiced sounds /b,d,g/
  - Long lag (80-100 ms) = unvoiced sounds /p,t,k/
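VOT is the interval from the release burst to the onset of voicing, so it can be computed from two time stamps and then sorted into lag categories. The sketch below is illustrative only: the function names are hypothetical, and the 30 ms boundary between short and long lag is an assumed value placed between the two ranges quoted above, not a figure from the slides.

```python
def voice_onset_time_ms(burst_time_ms, voicing_onset_ms):
    """VOT = voicing onset minus burst release; negative when voicing leads the burst."""
    return voicing_onset_ms - burst_time_ms

def vot_category(vot_ms, boundary_ms=30.0):
    """Classify a VOT value; boundary_ms is an assumed short/long-lag boundary."""
    if vot_ms < 0:
        return "prevoiced"
    if vot_ms < boundary_ms:
        return "short lag"
    return "long lag"

# Hypothetical time stamps (ms) read off a waveform or spectrogram.
for burst, voicing in [(100.0, 110.0), (100.0, 190.0), (100.0, 20.0)]:
    vot = voice_onset_time_ms(burst, voicing)
    print(f"VOT = {vot:+.0f} ms -> {vot_category(vot)}")
```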

[Figure: Spectrograms contrasting short-lag and long-lag VOT: "bi" (short lag) vs. "pi" (long lag), "di" (short lag) vs. "ti" (long lag), "gi" (short lag) vs. "ki" (long lag), and "aga" vs. "aka"; axes: Time (ms) vs. Frequency (kHz)]

VOT in Dutch

[Figure: Prevoiced and unaspirated stops: /bɛn/ and /pɛn/]

Voicing

• Lisker and Abramson (1970) studied voice onset time (VOT) in several languages, some with 2 voicing categories (e.g., English, Spanish, Cantonese) and others with 3 or 4 voicing categories (e.g. Thai, Hindi, Korean)
  - Prevoiced
  - Voiced
  - Voiceless aspirated
  - Voiceless unaspirated

VOT in an English-Dutch bilingual child

• E. Simon (2010). Child L2 development: A longitudinal case study on Voice Onset Times in word-initial stops. J. Child Lang. 37, 159–173.

Voicing

• Identification functions for English VOT continuum.
• Frequency histograms of measured VOT in English stops.

Voicing

• Identification functions for Thai VOT continuum.
• Frequency histograms of measured VOT in Thai stops.

Identification and discrimination of English VOT continuum

[Figure: Hypothetical data: identification function and discrimination function; axes: Voice onset time (ms), from -150 to 150, vs. Percent /ba/ responses, from 0 to 100]

Voicing cues in stop consonants

[Figure: Spectrograms of /ada/ and /ata/ annotated with six voicing cues; axes: Time (ms) vs. Frequency (kHz)]

1. Voice onset time
2. F1 cutback
3. Burst amplitude
4. Fundamental frequency
5. Vowel duration
6. Prevoicing

Categorical Perception

• Relationship between VOT and identification functions is nonlinear, with steep transitions and a clearly defined boundary.
• Discrimination functions show a peak near the phoneme boundary; near chance levels elsewhere.
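The link between a steep identification function and a discrimination peak at the boundary can be illustrated numerically. The sketch below makes several assumptions that are not in the slides: the identification function is modelled as a logistic curve with a hypothetical 25 ms boundary and slope, and discrimination is predicted from labels alone using the simplified two-category ABX form P(correct) = 0.5 × (1 + (p1 − p2)²), in which chance is 0.5.

```python
import math

def prob_ba(vot_ms, boundary_ms=25.0, slope_per_ms=0.4):
    """Logistic identification function: probability of a /ba/ response."""
    return 1.0 / (1.0 + math.exp(slope_per_ms * (vot_ms - boundary_ms)))

def predicted_discrimination(p1, p2):
    """Label-based ABX prediction for two categories; 0.5 is chance."""
    return 0.5 * (1.0 + (p1 - p2) ** 2)

# Hypothetical VOT continuum in 10 ms steps.
vot_steps = [0, 10, 20, 30, 40, 50]
ident = [prob_ba(v) for v in vot_steps]

# Predicted discrimination for each pair of adjacent stimuli.
discrim = [predicted_discrimination(a, b) for a, b in zip(ident, ident[1:])]

for (v1, v2), d in zip(zip(vot_steps, vot_steps[1:]), discrim):
    print(f"{v1:>2} vs {v2:>2} ms: predicted proportion correct = {d:.2f}")
```

With these assumed parameters, only the pair that straddles the boundary (20 vs 30 ms) is predicted to be discriminated well above chance, even though every pair differs by the same 10 ms.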
