Effects of Talker Variability on Speechreading

Perception & Psychophysics
2000, 62 (7), 1405-1412

DEBORAH A. YAKEL, LAWRENCE D. ROSENBLUM, and MICHELLE A. FORTIER
University of California, Riverside, California

The effects of talker variability on visual speech perception were tested by having subjects speechread sentences from either single-talker or mixed-talker sentence lists. Results revealed that changes in talker from trial to trial decreased speechreading performance. To help determine whether this decrement was due to talker change, and not a change in superficial characteristics of the stimuli, Experiment 2 tested speechreading from visual stimuli whose images were tinted by a single color, or mixed colors. Results revealed that the mixed-color lists did not inhibit speechreading performance relative to the single-color lists. These results are analogous to findings in the auditory speech literature and suggest that, like auditory speech, visual speech operations include a resource-demanding component that is influenced by talker variability.

This research was supported by NSF Grant SBR-9617047 awarded to L.D.R. We gratefully acknowledge the assistance of Elizabeth Alberto, Mike Gordon, Sheila Kirby, Anjani Panchal, Julia Rogenski, and Christi Royster, as well as the helpful comments of two anonymous reviewers and the UCR cognitive science group. D.A.Y. is currently in the Department of Psychology at Orange Coast College. M.A.F. is currently in the Department of Psychology at San Diego State University. Correspondence should be addressed to L. D. Rosenblum, Department of Psychology, University of California, Riverside, CA 92521 (e-mail: rosenblu@citrus.ucr.edu).

Copyright 2000 Psychonomic Society, Inc.

The relationship between speaker and speech recognition has been explored extensively in the last 40 years. Many theories of auditory speech perception propose that speech input undergoes a normalization process in which talker-specific attributes are extracted and discarded, leaving the phonetic material needed for the perception of speech segments (Halle, 1985; K. Johnson, 1990). The concept that this talker normalization process might also extend to audiovisual speech recognition is implicit in other speech perception theories (Fowler, 1986; Liberman & Mattingly, 1985; McClelland & Elman, 1986). However, no research has addressed this question for visual speech. By borrowing the recent methods used in the auditory speech literature (Mullennix, Pisoni, & Martin, 1989; Sommers, Nygaard, & Pisoni, 1994), the present study examines the degree to which visual talker normalization influences speechreading (lipreading).

The issue of visual speech normalization would seem important for two reasons. First, there is accumulating evidence that visual speech perception is an important component of the general speech perception process. While it is clear that speechreading can be useful for the hearing impaired, it is also known that visual speech is used by individuals with good hearing when they are faced with a noisy environment (e.g., MacLeod & Summerfield, 1987), speech with a heavy foreign accent, or speech conveying complicated subject matter (Reisberg, McLean, & Goldfield, 1987). There is also evidence that access to visual speech is necessary for normal speech development (Mills, 1987). Finally, in one experimental context using discrepant audiovisual speech syllables, integration of visual speech was shown to be automatic and mandatory for most subjects (McGurk & MacDonald, 1976).

The second reason visual speech normalization would seem an important issue is that it bears on a general theoretical question in cognitive science. The question of modularity (see, e.g., Fodor, 1983), or whether particular cognitive functions exhibit behavioral and anatomical specialization, has been central to modern cognitive science. Among other characteristics, modules are considered to be informationally encapsulated in that they have access to only the information/processes needed for their particular function. Interestingly, two of the prototypical modules cited by theorists are those for speech/language and face perception (e.g., Ellis, 1989; Fodor, 1983; Liberman & Mattingly, 1985). In this sense, visual speech perception would seem to pose a particularly interesting theoretical problem: Are processes enlisted for visual speech perception associated with speech, face, or both functions? The modular characteristic of information encapsulation would seem to suggest that all language recognition, including visual speech perception, would discard talker-specific facial properties. Testing the influences of talker/face normalization on visual speech perception can help address this question. We now turn to the auditory speech normalization literature in order to borrow conceptual and methodological tools.

One way the effects of auditory speech normalization have been examined is through measuring the processing costs incurred during listening to multiple- versus single-talker stimuli. It is known that vowel and consonant stimuli are easier to identify when spoken by a single talker than when the talker changes from trial to trial (e.g., Strange, Verbrugge, Shankweiler, & Edman, 1976; Verbrugge, Strange, Shankweiler, & Edman, 1976). Analogous multiple-talker effects have been observed for latencies in vowel categorizing and matching (Summerfield and Haggard, 1973). With regard to more complex stimuli, Creelman (1957) found that listeners identify words embedded in noise less accurately when they are spoken by multiple versus single talkers. Mullennix et al. (1989) replicated these findings using a larger set of words both embedded in noise and in the clear. In one experiment, 11 subjects received a list of 68 words produced by one of the talkers, while another 11 subjects received a list of 68 words derived from 15 talkers. Both word lists were presented against varying degrees of white noise. Word identification was more accurate for the single- than for the mixed-talker list. Concerned that the effects might be attributable to the degraded nature of the stimuli, Mullennix et al. conducted a second experiment with undegraded words and measured response latency for a naming task. Performance for both identification and latency measures were significantly worse for the mixed- than for the single-talker list.

Mullennix and his colleagues (1989) offered two explanations for these results. First, the effects of talker variability could be due to speaker normalization processes operating at a very early stage in the acoustic-phonetic analysis. The normalized output would then be passed on to higher level language processes without talker-specific information. On the basis of this account, one would expect processing costs due to talker variability to occur early in speech perception but not during higher level processing.

Alternatively, talker variability may affect performance because talker-specific features are retained for some time, rather than discarded. Retaining such talker-specific features for mixed-talker lists would incur a greater processing cost than it would for single-talker lists. Potentially, talker-specific features from a previous item could produce interference when a subsequent item with different talker-specific features is perceived (Mullennix et al., 1989). From this account, the effects of talker-specific dimensions could appear for higher level functioning.

[...]

To summarize, a good amount of recent evidence not only supports Mullennix et al.'s (1989) suggestion that talker-specific information can be retained during phonetic processing, but also suggests that talker-specific information can facilitate speech recognition. In this sense, any "normalization" process that might occur would not completely discard talker-specific information. More generally, recent evidence suggests that the functions of phonetic recognition and voice identification are not as independent as once thought (e.g., Halle, 1985; K. Johnson, 1990). In fact, Remez, Fellowes, and Rubin (1997) have proposed that both functions could use similar acoustic primitives, a contention very different from those of traditional theories of speech and speaker perception (e.g., Pollack, Pickett, & Sumby, 1954; Van Lancker, Kreiman, & Emmorey, 1985).

While research on the effects of auditory talker normalization is abundant, there seems to be little analogous research in the visual speech literature. It is known that speakers vary widely in their visible speech movements, which bears on how difficult they are to speechread (Demorest & Bernstein, 1992; Kricos & Lesner, 1982; Montgomery, Walden, & Prosek, 1987). However, it is not known which, if any, talker-specific facial information might be discarded during visual speech perception. Potentially, a visual speech normalization process would strip away phonetically irrelevant information about the face such as eye color, skin tone, and featural information beyond the mouth. The end product of this normalization might be the retention of only phonetically relevant dimensions, including positions and movements of the lips, tongue, and jaw. Presumably, this process of normalization would take some time, so that speechreading from a multiple-talker list would be more difficult than that from a single-speaker list.

The following experiments were designed to examine the effects of visual speech "normalization"
