Speech Perception Is ‘Special’ – I.E
Total Page:16
File Type:pdf, Size:1020Kb
24.963 Linguistic Phonetics Perception - cues Discrimination Identification 100 /b/ /d/ /g/ 0 -6 0 +8 -6 0 +8 HC HC 100 Percent correct 0 Percent /b.d.g./ -6 0 +8 -6 0 +8 PG PG 100 0 -6 0 +8 -6 0 +8 DL DL Image by MIT OCW. Adapted from Liberman, A. M. "Some characteristics of perception in the speech mode." Perception and its Disorders 48 (1970): 238-254. And Liberman, A. M. "Discrimination in speech and nonspeech modes." Cognitive Psychology 2 (1970): 131-157. 1 • Reading for next week: Johnson ch. 9 • Assignments: – Acoustics assignment 4, due 11/3 – Talk to me about a paper topic. 2 Speech perception • The problem faced by the listener: To extract meaning from the acoustic signal. • This involves the recognition of words, which in turn involves discriminating the segmental contrasts of a language. • Much phonetic research in speech perception has been directed toward identifying the perceptual cues that listeners use. 3 Perceptual Cues • Production studies can reveal differences between minimally contrasting words, e.g. aspirated and unaspirated stops in Mandarin differ in VOT. – Are listeners sensitive to these differences in speech perception? • Most direct test of perceptual significance of an acoustic property: manipulate the acoustic property synthetically (or by editing, resynthesis) and see if perceptual response is affected. – E.g. vary VOT in synthetic CV syllables (or by editing natural utterances), and have listeners classify the syllables as voiced or voiceless. 4 VOT as a cue to stop voicing and aspiration contrasts - Lisker & Abramson (1970) • Synthesized a VOT continuum for /Ta/ syllables. – Three steady state formants for [a] – Formant transitions for labial, coronal, dorsal stop places (3 continua) – 37 VOT variants -150 ms to +150 ms • Negative VOT = voicing bar during stop closure • Positive VOT filled with aspiration noise • 10 ms steps, except 5 ms steps from -10 to +50. • Stimuli presented to – 5 speakers of Latin American Spanish – 12 speakers of American English – 8 speakers of Thai • Subjects classified stimuli using orthographic labels (forced choice). – e.g. ‘ba’, ‘pa’ 5 VOT - Lisker & Abramson (1970) • Histograms show frequencies of different VOT ranges from a production study. • Lines are identification functions for voiced (dashed) and voiceless (solid). • Spanish contrasts voiced vs. voiceless unaspirated [b vs. p] Courtesy of Arthur Abramson and Leigh Lisker. Used with permission. Source: Lisker, Leigh, and Arthur S. Abramson. "The voicing dimension: Some experiments in comparative phonetics." In Proceedings of the 6th international congress of phonetic sciences, pp. 563-567. Academia Prague, 1970. 6 VOT - Lisker & Abramson (1970) • In initial position, English usually contrasts voiceless unaspirated with aspirated [p, pʰ], with the occasional voiced stop [b~p]. • Place affects VOT in production and interpretation of VOT in perception Velar > alveolar > labial Courtesy of Arthur Abramson and Leigh Lisker. Used with permission. Source: Lisker, Leigh, and Arthur S. Abramson. "The voicing dimension: Some experiments in comparative phonetics." In Proceedings of the 6th international congress of phonetic sciences, pp. 563-567. Academia Prague, 1970. 7 VOT - Lisker & Abramson (1970) • In initial position, Thai contrasts voiced, voiceless unaspirated and aspirated stops [b, p, pʰ]. Courtesy of Arthur Abramson and Leigh Lisker. Used with permission. Source: Lisker, Leigh, and Arthur S. Abramson. "The voicing dimension: Some experiments in comparative phonetics." In Proceedings of the 6th international congress of phonetic sciences, pp. 563-567. Academia Prague, 1970. 8 Excursus: Categorical perception • Strict categorical perception is said to occur where discrimination performance is limited by identification performance, i.e. listeners only have access to category labels, so stimuli can only be distinguished if they are identified as belonging to different categories. • Tested in two stages: – Identification of a synthetic continuum – Discrimination of stimuli from the continuum 9 Categorical perception • E.g. Liberman (1970) place of articulation F2 transition continuum, b-d-g. 2500 +9 +8 +7 +6 2000 +5 +4 +3 +2 +1 0 1500 -1 -2 -3 (Hz) -4 -5 -6 1000 500 0 0 100 200 Time (msec) Image by MIT OCW. Adapted from Liberman, A. M. "Some characteristics of perception in the speech mode." Perception and its Disorders 48 (1970): 238-254.And Liberman, A. M. "Discrimination in speech and nonspeech modes." Cognitive Psychology 2 (1970): 131-157. 10 Categorical perception • Identification: Subjects identify stimuli as b, d, g • Discrimination: Subjects are presented with pairs of stimuli and asked to judge whether they are the same or different. Discrimination Identification 100 • Relatively abrupt /b/ /d/ /g/ transitions in 0 -6 0 +8 -6 0 +8 identification functions. HC HC 100 • Peaks in discrimination Percent correct function at the category 0 Percent /b.d.g./ -6 0 +8 -6 0 +8 boundary PG PG 100 0 -6 0 +8 -6 0 +8 DL DL Image by MIT OCW. Adapted from Liberman, A. M. "Some characteristics of perception in the speech mode." Perception and its Disorders 48 (1970): 238-254. And Liberman, A. M. "Discrimination in speech and nonspeech modes." Cognitive Psychology 2 (1970): 131-157. 11 Categorical perception • Discrimination has never been found to be precisely predictable from identification - Discrimination is always better than predicted. • More loosely, categorical perception is sometimes said to be exhibited where there is a discrimination peak at the category boundary determined by identification, even if the relationship is not precisely as predicted by strict categorical perception. • A sharp transition in the 'identification function' for a stimulus continuum is not categorical perception in any technical sense. 12 Why is categorical perception significant? • The (loose) categorical perception pattern contrasts with the pattern observed in psychophysical experiments using non- speech stimuli: "Typically, nonspeech stimuli that vary acoustically along a single continuum are perceived continuously, resulting in discrimination functions that are monotonic with the physical scale" (Luce and Pisoni, p.31). • This contrast was used by Liberman and others to argue that speech perception is ‘special’ – i.e. it uses special mechanisms, not the general mechanisms of non-speech auditory perception. 13 Why is categorical perception significant? • Vowels are not usually perceived categorically, even in the loose sense (Luce and Pisoni and refs there). • The argument for specialness from categorical perception has been weakened by: –Evidence for categorical perception of non-speech sounds (noise- buzz, Miller et al 1976). –Evidence that Chinchillas perceive a VOT continuum categorically (Kuhl and Miller 1975). • It has been argued that perception is actually essentially continuous, with categorical effects arising from a categorical decision (natural or experimentally imposed) (e.g. Massaro). 14 Modeling categorical perception • ‘Noisy’ continuous perception plus a decision criterion can account for the shape of identification functions ga ka © Source Unknown. All rights reserved. This content is excluded from our Creative Commons license. For more information, see https://ocw.mit.edu/help/faq-fair-use/. • Doesn’t explain discrimination peak at boundry. 15 Modeling categorical perception • The occurrence of discrimination peaks at category boundaries follows from a Bayesian model of discrimination (Feldman et al 2009). • When presented with two stimuli in a discrimination task, the listener is trying to locate the stimuli on a perceptual dimension (e.g. VOT). • Given the presence of noise in the signal (and perceptual process), the perceived VOT of a stimulus may differ from the actual VOT. • Listener has to estimate the most likely VOT for that stimulus given the perceived value, and their knowledge of the prior probabilities of different values of VOT. – Best estimate of VOT is shifted towards values with higher probability. 16 Modeling categorical perception • VOT values close to the category boundary have a low probability, so perceived VOT values near the boundary are shifted towards the category centers, resulting in better discrimination Courtesy of Arthur Abramson and Leigh Lisker. Used with permission. Source: Lisker, Leigh, and Arthur S. Abramson. "The voicing dimension: Some experiments in comparative phonetics." In Proceedings of the 6th international congress of phonetic sciences, pp. 563-567. Academia Prague, 1970. 17 Cues • Perception experiments based on synthetic/edited speech have been used to establish the perceptual significance of a wide variety of acoustic correlates of linguistic contrasts. 18 Cues to consonant contrasts • Place cues (Wright, Frisch and Pisoni 1999) Fricative noise Stop release burst F3 F2 F1 a t a a aas F2 Transitions a n a a l a Nasal pole and zero Relative spacing of F2 and F3 Image by MIT OCW. Adapted from Wright, R., S. Frisch, D. B. Pisoni. "Speech Perception." In Wiley Encyclopedia of Electrical and Electronics Engineering, Vol. 20. New York, NY: John Wiley and Sons, 1999, pp. 175-195. 19 Cues to consonant contrasts • Manner cues (Wright, Frisch and Pisoni 1999) Stop release burst Abruptness and degree of attenuation F3 F2 F1 a t a a aas Slope of formant Presence of formant Nasalization transitions structure of vowel a n a a l a Nasal pole and zero Image by MIT OCW. Adapted from Wright, R., S. Frisch, D. B. Pisoni. "Speech Perception." In Wiley Encyclopedia of Electrical and Electronics