Audiovisual Sentence Recognition Not Predicted by Susceptibility to the McGurk Effect
Atten Percept Psychophys, DOI 10.3758/s13414-016-1238-9

Audiovisual sentence recognition not predicted by susceptibility to the McGurk effect

Kristin J. Van Engen 1,2,3 · Zilong Xie 1 · Bharath Chandrasekaran 1

© The Psychonomic Society, Inc. 2016

* Bharath Chandrasekaran, [email protected]
1 Department of Communication Sciences and Disorders, The University of Texas at Austin, Austin, TX, USA
2 Department of Linguistics, The University of Texas at Austin, Austin, TX, USA
3 Present address: Department of Psychological and Brain Sciences, Washington University, St. Louis, MO, USA

Abstract

In noisy situations, visual information plays a critical role in the success of speech communication: listeners are better able to understand speech when they can see the speaker. Visual influence on auditory speech perception is also observed in the McGurk effect, in which discrepant visual information alters listeners' auditory perception of a spoken syllable. When hearing /ba/ while seeing a person saying /ga/, for example, listeners may report hearing /da/. Because these two phenomena have been assumed to arise from a common integration mechanism, the McGurk effect has often been used as a measure of audiovisual integration in speech perception. In this study, we test whether this assumed relationship exists within individual listeners. We measured participants' susceptibility to the McGurk illusion as well as their ability to identify sentences in noise across a range of signal-to-noise ratios in audio-only and audiovisual modalities. Our results do not show a relationship between listeners' McGurk susceptibility and their ability to use visual cues to understand spoken sentences in noise, suggesting that McGurk susceptibility may not be a valid measure of audiovisual integration in everyday speech processing.

Keywords: Audiovisual speech perception · Speech perception in noise · McGurk effect

Introduction

A brief foray into the literature on audiovisual (AV) speech perception reveals a common rhetorical approach, in which authors begin with the general claim that visual information influences speech perception and cite two effects as evidence: the McGurk effect (McGurk & MacDonald, 1976) and the intelligibility benefit garnered by listeners when they can see speakers' faces (Sumby & Pollack, 1954). (For examples of papers that begin this way, see Altieri et al., 2011; Anderson et al., 2009; Colin et al., 2005; Grant et al., 1998; Magnotti et al., 2015; Massaro et al., 1993; Nahorna et al., 2012; Norrix et al., 2007; Ronquest et al., 2010; Rosenblum et al., 1997; Ross et al., 2007; Saalasti et al., 2011; Sams et al., 1998; Sekiyama, 1997; Sekiyama et al., 2003; Strand et al., 2014; van Wassenhove et al., 2007.) Both effects have been replicated many times and unquestionably show the influence of visual input on speech perception.

It is often assumed, then, that these two phenomena arise from a common audiovisual integration mechanism. As a result, the McGurk effect (i.e., auditory misperception of a spoken syllable when it is presented with incongruent visual information) has often been used as a measure of auditory-visual integration in speech perception. Van Wassenhove et al. (2007), for example, define AV speech integration as having occurred when "a unitary integrated percept emerges as the result of the integration of clearly differing auditory and visual informational content," and therefore use the McGurk illusion to "quantify the degree of integration that has taken place" (p. 598). Alsius et al. (2007) similarly define the degree of AV integration as "the prevalence of the McGurk effect" (p. 400). Moreover, a number of studies investigating AV speech perception in clinical populations (e.g., individuals with schizophrenia (Pearl et al., 2009), children with amblyopia (Burgmeier et al., 2015), and individuals with Asperger syndrome (Saalasti et al., 2011)) have used the McGurk effect as their primary dependent measure of AV speech processing.

Despite the popularity of this approach, it remains to be convincingly demonstrated that an individual's susceptibility to the McGurk illusion relates to their ability to take advantage of visual information during everyday speech processing. There is evidence that McGurk susceptibility relates (weakly) to lip-reading ability under some task and scoring conditions (Strand et al., 2014), with better lip readers being slightly more susceptible to the McGurk effect. There is also evidence linking lip-reading ability to AV speech perception (Grant et al., 1998). The connection between McGurk susceptibility and AV speech perception was directly investigated in one study of older adults with acquired hearing loss (Grant & Seitz, 1998), with equivocal results: there was a correlation between McGurk susceptibility and visual enhancement for sentence recognition (r = .46), but McGurk susceptibility did not contribute significantly to a regression model predicting visual enhancement. The relationship between McGurk susceptibility and the use of visual information during speech perception therefore remains unclear.

Here we present a within-subjects study of young adults with normal hearing in which we assess McGurk susceptibility and AV sentence recognition across a range of noise levels and types. If susceptibility to the McGurk effect reflects an AV integration process that is relevant to everyday speech comprehension in noise, then we expect listeners who are more susceptible to the illusion to show greater speech intelligibility gains when visual information is available for sentence recognition. If, on the other hand, different mechanisms mediate the use of auditory and visual information in the McGurk task and during everyday speech perception, then no such relationship is predicted. Such a finding would cast doubt on the utility of the McGurk task as a measure of AV speech perception.

Method

Participants

Thirty-nine healthy young adults (18-29 years; mean age = 21.03 years) were recruited from the Austin, TX, USA community. All participants were native speakers of American English and reported no history of speech, language, or hearing problems. Their hearing was screened to ensure thresholds ≤25 dB hearing level (HL) at 1,000, 2,000, and 4,000 Hz for each ear, and their vision was normal or corrected-to-normal. Participants were compensated in accordance with a protocol approved by the University of Texas Institutional Review Board.

McGurk task

Stimuli. The stimuli were identical to those in Experiment 2 of Mallick et al. (2015). They consisted of two types of AV syllables: McGurk incongruent syllables (auditory /ba/ + visual /ga/) and congruent syllables (/ba/, /da/, and /ga/). The McGurk syllables were created using video recordings from eight native English speakers (four females, four males). A different female speaker recorded the three congruent syllables.[1]

[1] The unimodal intelligibility of these stimuli was tested by Mallick et al. (2015). Auditory stimuli were identified with 97% accuracy (SD = 4%), and visual stimuli were identified with 80% accuracy (SD = 10%).

Procedure. The task was administered using E-Prime 2.0 software (Schneider et al., 2002). Auditory stimuli were presented binaurally at a comfortable level using Sennheiser HD-280 Pro headphones, and visual stimuli were presented on a computer screen. A fixation cross was displayed for 500 ms prior to each stimulus. Following Mallick et al. (2015), participants were instructed to report the syllable they heard in each trial from the set /ba/, /da/, /ga/, and /tha/. The 11 stimuli (eight McGurk and three congruent) were each presented ten times. The presentation of these 110 stimuli was randomized and self-paced.

McGurk susceptibility. Responses to the McGurk incongruent stimuli were used to measure listeners' susceptibility to the McGurk effect. As in Mallick et al. (2015), responses of either /da/ or /tha/ were coded as McGurk fusion percepts.

Speech perception in noise task

Target speech stimuli. A young adult male speaker of American English produced 80 simple sentences, each containing four keywords (e.g., "The gray mouse ate the cheese") (Van Engen et al., 2012). Sentences were used for this task (rather than syllables) because our interest is in the relationship between McGurk susceptibility and the processing of running speech.

Maskers. Two maskers, equated for RMS amplitude, were generated to create speech-in-noise stimuli: speech-shaped noise (SSN) filtered to match the long-term average spectrum of the target speech, and two-talker babble consisting of two male voices. The two maskers were included to assess listeners' ability to take advantage of visual cues in different types of challenging listening environments. SSN renders portions of the target speech signal inaudible to listeners (i.e., energetic masking), while two-talker babble can also interfere with target speech identification by creating confusion and/or distraction not accounted for by the physical properties of the speech and noise (i.e., informational masking). Visual information has been shown to be more helpful to listeners when the masker is composed of other voices (Helfer & Freyman, 2005).

Results

McGurk susceptibility

Figure 1 shows the distribution of McGurk susceptibility scores. As in Mallick et al. (2015), scores ranged from 0% to 100% and were skewed to the right.

Keyword identification in noise

Average keyword intelligibility across
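The fusion-scoring rule described in the Method (responses of /da/ or /tha/ to the incongruent auditory /ba/ + visual /ga/ trials are coded as McGurk percepts, and susceptibility is the percentage of such responses across the 80 McGurk trials) can be sketched as follows. This is an illustrative sketch, not the authors' analysis code; the function name and response coding are ours.

```python
def mcgurk_susceptibility(responses):
    """Percentage of McGurk trials eliciting a fusion percept.

    `responses` is the list of syllables a listener reported on the
    incongruent (auditory /ba/ + visual /ga/) trials; following
    Mallick et al. (2015), /da/ and /tha/ count as fusion percepts.
    """
    fusion = {"da", "tha"}
    n_fusion = sum(1 for r in responses if r in fusion)
    return 100.0 * n_fusion / len(responses)

# Hypothetical listener: fusion percepts on 52 of 80 McGurk trials
responses = ["da"] * 40 + ["tha"] * 12 + ["ba"] * 28
print(mcgurk_susceptibility(responses))  # 65.0
```

Scores computed this way fall on the 0-100% scale reported in the Results.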
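The masker construction involves two routine signal-level operations: equating the two maskers for RMS amplitude, and mixing target and masker at a chosen signal-to-noise ratio to create the speech-in-noise stimuli. A minimal sketch of the arithmetic, using plain Python lists in place of audio buffers (the helper names are ours, not the authors'):

```python
import math

def rms(x):
    """Root-mean-square amplitude of a signal."""
    return math.sqrt(sum(s * s for s in x) / len(x))

def scale_to_rms(x, target_rms):
    """Rescale a signal so its RMS equals target_rms."""
    gain = target_rms / rms(x)
    return [s * gain for s in x]

def mix_at_snr(speech, masker, snr_db):
    """Add masker to speech at the requested SNR (in dB).

    The masker is rescaled so that
    20 * log10(rms(speech) / rms(masker)) == snr_db.
    """
    masker = scale_to_rms(masker, rms(speech) / (10 ** (snr_db / 20)))
    return [s + m for s, m in zip(speech, masker)]
```

Equating the two maskers amounts to, e.g., `babble = scale_to_rms(babble, rms(ssn))`; lowering `snr_db` in `mix_at_snr` then yields progressively noisier stimuli.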