
9 Perceptual Effects of Cross-Modal Stimulation: Ventriloquism and the Freezing Phenomenon

JEAN VROOMEN AND BEATRICE DE GELDER

Introduction

For readers of a book on multimodal perception, it probably comes as no surprise that most events in real life consist of perceptual inputs in more than one modality and that sensory modalities may influence each other. For example, seeing a speaker provides not only auditory information, conveyed in what is said, but also visual information, conveyed through the movements of the lips, face, and body, as well as visual cues about the origin of the sound. Most handbooks on cognitive psychology pay comparatively little attention to this multimodal state of affairs, and the different senses (seeing, hearing, smell, taste, touch) are treated as distinct and separate modules with little or no interaction. It is becoming increasingly clear, however, that when the different senses receive correlated input about the same external object or event, information is often combined by the perceptual system to yield a multimodally determined percept.

An important issue is to characterize such multisensory interactions and their cross-modal effects. There are at least three different notions at stake here. One is that information is processed in a hierarchical and strictly feed-forward fashion. In this view, information from different sensory modalities converges in a multimodal representation in a feed-forward way. For example, in the fuzzy logic model of perception (Massaro, 1998), degrees of support for different alternatives from each modality (say, audition and vision) are determined and then combined to give an overall degree of support. Information is propagated in a strictly feed-forward fashion so that higher-order multimodal representations do not affect lower-order sensory-specific representations. There is thus no cross-talk between the sensory modalities such that, say, vision affects early processing stages of audition or vice versa. Cross-modal interactions in feed-forward models take place only at or beyond multimodal stages. An alternative possibility is that multimodal representations send feedback to primary sensory levels (e.g., Driver & Spence, 2000). In this view, higher-order multimodal levels can affect sensory levels. Vision might thus affect audition, but only via multimodal representations. Alternatively, it may also be the case that cross-modal interactions take place without multimodal representations. For example, it may be that the senses access each other directly from their sensory-specific systems (e.g., Ettlinger & Wilson, 1990). Vision might then affect audition without the involvement of a multimodal representation (see, e.g., Falchier, Clavagnier, Barone, & Kennedy, 2002, for recent neuroanatomical evidence showing that there are projections from primary auditory cortex to the visual area V1).

The role of feedback in sensory processing has been debated for a long time (e.g., the interactive activation model of reading by Rumelhart and McClelland, 1982). However, as far as cross-modal effects are concerned, there is at present no clear empirical evidence that allows distinguishing among feed-forward, feedback, and direct-access models. Feed-forward models predict that early sensory processing levels should be autonomous and unaffected by higher-order processing levels, whereas feedback or direct-access models would, in principle, allow that vision affects auditory processes, or vice versa. Although this theoretical distinction seems straightforward, empirical demonstration in favor of one or another alternative has proved difficult to obtain. One of the main problems is to find measures that are sufficiently unambiguous and that can be taken as pure indices of an auditory or visual sensory process.

Among the minimal requirements for stating that a cross-modal effect has perceptual consequences at early sensory stages, the phenomenon should at least be (1) robust, (2) not explainable as a strategic effect, and (3) not occurring at response-related processing stages. If one assumes stagewise processing, with sensation coming before attention (e.g., the "late selection" view

of attention), one might also want to argue that (4) cross-modal effects should be pre-attentive. If these minimal criteria are met, it becomes at least likely that cross-modal interactions occur at early perceptual processing stages, and thus that models that allow access to primary processing levels (i.e., feedback or direct-access models) better describe the phenomenon.

In our work on cross-modal perception, we investigated the extent to which such minimal criteria apply to some cases of audiovisual perception. One case concerns a situation in which vision affects the localization of a sound (the ventriloquism effect), the other a situation in which an abrupt sound affects visual processing of a rapidly presented visual stimulus (the freezing phenomenon). In Chapter 36 of this volume, we describe the case of cross-modal interactions in affect perception. Each of these phenomena we consider to be based on cross-modal interactions affecting early levels of perception.

Vision affecting sound localization: The ventriloquism effect

Presenting synchronous auditory and visual information in slightly separate locations creates the illusion that the sound is coming from the direction of the visual stimulus. Although the effect is smaller, shifting of the visual percept in the direction of the sound has, at least in some studies, also been observed (Bertelson & Radeau, 1981). The auditory shift is usually measured by asking subjects to localize the sound by pointing or by fixating the eyes on the apparent location of the sound. When localization responses are compared with a control condition (e.g., a condition in which the sound is presented in isolation without a visual stimulus), a shift of a few degrees in the direction of the visual stimulus is usually observed. Such an effect, produced by audiovisual spatial conflict, is called the ventriloquism effect, because one of the most spectacular everyday examples is the illusion created by performing ventriloquists that the speech they produce without visible facial movements comes from a puppet they agitate in synchrony with the speech.

A standard explanation of the ventriloquism effect is the following: when auditory and visual stimuli occur in close temporal and spatial proximity, the perceptual system assumes that a single event has occurred. The perceptual system then tries to reduce the conflict between the location of the visual and auditory data, because there is an a priori constraint that an object or event can have only one location (e.g., Bedford, 1999). Shifting the auditory location in the direction of the visual event rather than the other way around would seem to be ecologically useful, because spatial resolution in the visual modality is better than in the auditory one.

However, there are also other, more trivial explanations of the ventriloquism effect. One alternative is similar to Stroop task interference: when two conflicting stimuli are presented together, such as when the word blue is written in red ink, there is competition at the level of response selection rather than at a perceptual level per se. Stroop-like response competition may also be at play in the ventriloquism effect. In that case, there would be no real attraction between sound and vision, but the ventriloquist illusion would be derived from the fact that subjects sometimes point to the visual stimulus instead of the sound by mistake. Strategic or cognitive factors may also play a role. For example, a subject may wonder why sounds and lights are presented from different locations, and then adopt a postperceptual response strategy that satisfies the experimenter's ideas of the task (Bertelson, 1999). Of course, these possibilities are not exclusive, and the experimenter should find ways to check or circumvent them.

In our research, we dealt with these and other aspects of the ventriloquist situation in the hope of showing that the apparent location of a sound is indeed shifted at a perceptual level of auditory space perception. More specifically, we asked whether a ventriloquism effect can be observed (1) when subjects are explicitly trained to ignore the visual distracter; (2) when cognitive strategies of the subject to respond in a particular way can be excluded; (3) when the visual distracter is not attended, either endogenously or exogenously; and (4) when the visual distracter is not seen consciously. We also asked (5) whether the ventriloquism effect as such is possibly a pre-attentive phenomenon.

1. A visual distracter cannot be ignored. In a typical ventriloquism situation, subjects are asked to locate a sound while ignoring a visual distracter. Typically, subjects remain unaware of how well they perform during the experiment and how well they succeed in obeying instructions. In one of our experiments, though, we asked whether it is possible to train subjects explicitly to ignore the visual distracter (Vroomen, Bertelson, & de Gelder, 1998). If despite training it is impossible to ignore the visual distracter, this speaks to the robustness of the effect. Subjects were trained to discriminate among sequences of tones that emanated either from a central location only or from alternating locations, in which case two speakers located next to a computer screen emitted the tones. With no visual input, this same/different location task was very easy, because the difference between the central and lateral locations was clearly noticeable. However, the task was much more difficult when, in synchrony with the tones, light flashes

PERCEPTUAL CONSEQUENCES OF MULTIPLE SENSORY SYSTEMS

were alternated left and right on a computer screen. This condition created the strong impression that sounds from the central location now alternated between left and right, presumably because the light flashes attracted the apparent location of the sounds. We then tried to train subjects to discriminate centrally presented but ventriloquized sounds from sounds that alternated physically between the left and right. Subjects were instructed to ignore the lights as much as possible (but without closing their eyes), and they received corrective feedback after each trial. The results were that the larger the separation between the lights, the more false alarms occurred (subjects responded "alternating sound" to a centrally presented ventriloquized sound), presumably because the farther apart the lights were, the farther apart was the perceived location of the sounds. Moreover, although feedback was provided after each trial, performance did not improve in the course of the experiment. Instructions and feedback thus could not overcome the effect of the visual distracter on sound localization, which indicates that the ventriloquism effect is indeed very robust.

2. A ventriloquism effect is obtained even when cognitive strategies can be excluded. When subjects are asked to point to the location of auditory stimuli while ignoring spatially discrepant visual distracters, they may be aware of the spatial discrepancy and adjust their response accordingly. The visual bias one obtains may then reflect postperceptual decisions rather than genuine perceptual effects. However, contamination by strategies can be prevented when ventriloquism effects are studied via a staircase procedure or as an aftereffect.

Bertelson and Aschersleben (1998) were the first to apply the staircase procedure in the ventriloquism situation. The advantage of a staircase procedure is that it is not transparent, and the effects are therefore more likely to reflect genuine perceptual processes. In the staircase procedure used by Bertelson and Aschersleben, subjects had to judge the apparent origin of a stereophonically controlled sound as left or right of a median reference point. Unknown to the subjects, the location of the sound was changed as a function of their judgment, following the principle of the psychophysical staircase. After a "left" judgment, the next sound on the same staircase was moved one step to the right, and vice versa. A staircase started with sounds coming from an extreme left or an extreme right position. At that stage, correct responses were generally given on each successive trial, so that the target sounds moved progressively toward the center. Then, at some point, response reversals (responses different from the preceding one on the same staircase) began to occur. Thereafter, the subjects were no longer certain about the location of the sound. The location at which these response reversals began to occur was the dependent variable. In the study by Bertelson and Aschersleben, sounds were delivered together with a visual distracter, a light-emitting diode (LED) flash, in a central location. When the LED flash was synchronized with the sound, response reversal occurred earlier than when the light was desynchronized from the sound. Apparently, the synchronized LED flashes attracted the apparent location of the sound toward its central location, so that response reversal occurred earlier on the staircase. Similar results have now been reported by Caclin, Soto-Faraco, Kingstone, and Spence (2002) showing that a centrally located tactile stimulus attracts a peripheral sound toward the middle. It is important to note that there is no way subjects can figure out a response strategy that might lead to this result, because once response reversal begins to occur, subjects no longer know whether a sound belongs to a left or a right staircase. A conscious response strategy in this situation is thus extremely unlikely to account for the effect.

Conscious strategies are also unlikely to play a role when the effect of presenting auditory and visual stimuli at separate locations is measured as an aftereffect. The aftereffect is a shift in the apparent location of unimodally presented acoustic stimuli consequent on exposure to synchronous but spatially disparate auditory-visual stimulus pairs. Initial studies used prisms to shift the relative locations of visual and auditory stimuli (Canon, 1970; Radeau & Bertelson, 1974). Participants localized acoustic targets before and after a period of adaptation. During the adaptation phase, there was a mismatch between the spatial locations of acoustic and visual stimuli. Typically, between pre- and posttest a shift of about 1 to 4 degrees was found in the direction of the visual attractor. Presumably, the spatial conflict between auditory and visual data during the adaptation phase was resolved by recalibration of the perceptual system, and this alteration lasted long enough to be detected as aftereffects. It should be noted that aftereffects are measured by comparing unimodal pointing to a sound before and after an adaptation phase. Stroop-like response competition between the auditory target and visual distracter during the test situation thus plays no role, because the test sound is presented without a visual distracter. Moreover, aftereffects are usually obtained when the spatial discrepancy between auditory and visual stimuli is so small that subjects do not even notice the separation (Frissen, Bertelson, Vroomen, & de Gelder, 2003; Radeau & Bertelson, 1974; Vroomen et al., 2003; Woods & Recanzone, Chap. 3, this volume). Aftereffects can therefore be interpreted as true perceptual recalibration effects (e.g., Radeau, 1994).
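
The logic of the staircase procedure described under point 2 can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the procedure or analysis of Bertelson and Aschersleben (1998): the observer model (a percept pulled toward a central flash at position 0 by a weight `attraction`, plus Gaussian noise), the step size, the starting position, and all function names are hypothetical choices made for illustration. The sketch runs staircases that start at the far left, moves the sound one step right after each "left" response and one step left after each "right" response, and records the physical position at which the first response reversal occurs.

```python
import random
from statistics import mean


def first_reversal(start, step, attraction, noise_sd, rng, max_trials=200):
    """Run one staircase and return the physical sound position at which
    the first response reversal occurs (negative = left of the reference)."""
    position = start
    previous = None
    for _ in range(max_trials):
        # Assumed observer model (not from the chapter): the percept is
        # pulled toward a central flash at 0 and perturbed by Gaussian noise.
        perceived = (1.0 - attraction) * position + rng.gauss(0.0, noise_sd)
        response = "left" if perceived < 0 else "right"
        # A reversal is a response that differs from the preceding one
        # on the same staircase.
        if previous is not None and response != previous:
            return position
        previous = response
        # After "left" the next sound moves one step right, and vice versa.
        position += step if response == "left" else -step
    # No reversal within max_trials: return the last position reached.
    return position


def mean_first_reversal(attraction, runs=2000, seed=0):
    """Average first-reversal position over many left-starting staircases."""
    rng = random.Random(seed)
    return mean(
        first_reversal(-10.0, 1.0, attraction, 2.0, rng) for _ in range(runs)
    )


# attraction=0.5 stands in for a synchronized flash, 0.0 for a desynchronized one.
sync_mean = mean_first_reversal(attraction=0.5)
desync_mean = mean_first_reversal(attraction=0.0)
print(f"synchronized flash:   mean first reversal at {sync_mean:.2f}")
print(f"desynchronized flash: mean first reversal at {desync_mean:.2f}")
```

Under these assumptions, pulling the percept toward the center makes the simulated observer uncertain while the sound is still physically far to the left, so reversals begin farther from the center, which corresponds to reversals occurring "earlier" on the staircase.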

3. Attention toward the visual distracter is not needed to obtain a ventriloquism effect. A relevant question is whether attention plays a role in the ventriloquism effect. One could argue, as Treisman and Gelade (1980) have done in feature-integration theory for the visual modality, that focused attention might be the glue that combines features across modalities. Could it be, then, that when a sound and visual distracter are attended, an integrated cross-modal event is perceived, but when they are unattended, two separate events are perceived that do not interact? If so, one might predict that a visual distracter would have a stronger effect on the apparent location of a sound when it is focused upon.

We considered the possibility that ventriloquism indeed requires or is modulated by this kind of focused attention (Bertelson, Vroomen, de Gelder, & Driver, 2000). The subjects' task was to localize trains of tones while monitoring visual events on a computer screen. On experimental trials, a bright square appeared on the left or the right of the screen in exact synchrony with the tones. No square appeared on control trials. The attentional manipulation consisted of having subjects monitor either the center of the display, in which case the attractor square was in the visual periphery, or the lateral square itself for occasional occurrences of a catch stimulus (a very small diamond that could only be detected when in the fovea). The attentional hypothesis predicts that the attraction of the apparent location of the sound by the square would be stronger with attention focused on the attractor square than with attention focused on the center. In fact, though, equal degrees of attraction were obtained in the two attention conditions. Focused attention thus did not modulate the ventriloquist effect.

However, the effect of attention might have been small and overruled by the bottom-up information from the laterally presented visual square. What would happen when the bottom-up information was more ambiguous? Would an effect of attention then appear? In a second experiment, we used bilateral squares that were flashed in synchrony with the sound so as to provide competing visual attractors. When the two squares were of equal size, auditory localization was unaffected by which side participants monitored for visual targets, but when one square was larger than the other, auditory localization was reliably attracted toward the bigger square, again regardless of where visual monitoring was required. This led to the conclusion that the ventriloquism effect largely reflects automatic sensory interactions, with little or no role for focused attention.

In discussing how attention might influence ventriloquism, though, one must distinguish several senses in which the term attention is used. One may attend to one sensory modality rather than another regardless of location (Spence & Driver, 1997a), or one may attend to one particular location rather than another regardless of modality. Furthermore, in the literature on spatial attention, two different means of the allocation of attention are generally distinguished. First, there is an endogenous process by which attention can be moved voluntarily. Second, there is an automatic or exogenous mechanism by which attention is reoriented automatically to stimuli in the environment with some special features. The study by Bertelson, Vroomen, et al. (2000) manipulated endogenous attention by asking subjects to focus on one or the other location. Yet it may have been the case that the visual distracter received a certain amount of exogenous attention independent of where the subject was focusing. For that reason, one might ask whether capture of exogenous attention by the visual distracter is essential to affect the perceived location of a sound.

To investigate this possibility, we tried to create a situation in which exogenous attention was captured in one direction, whereas the apparent location of a sound was ventriloquized in the other direction (Vroomen, Bertelson, & de Gelder, 2001a). Our choice was influenced by earlier data showing that attention can be captured by a visual item differing substantially in one or several attributes (e.g., color, form, orientation, shape) from a set of identical items among which it is displayed (e.g., Treisman & Gelade, 1980). The unique item is sometimes called the singleton, and its influence on attention is referred to as the singleton effect. If ventriloquism is mediated by exogenous attention, presenting a sound in synchrony with a display that contains a singleton should shift the apparent location of the sound toward the singleton. Consequently, finding a singleton that would not shift the location of a sound in its direction would provide evidence that exogenous attention can be dissociated from ventriloquism.

We used a psychophysical staircase procedure as in Bertelson and Aschersleben (1998). The occurrence of visual bias was examined by presenting a display in synchrony with the sound. We tried to shift the apparent location of the sound in the opposite direction of the singleton by using a display that consisted of four horizontally aligned squares: two big squares on one side, and a big square and a small square (the singleton) on the other side (Fig. 9.1). The singleton was either in the far left or in the far right position. A visual bias dependent on the position of the singleton should manifest itself at the level of the locations at which reversals begin to occur on the staircases for the two visual displays. If, for instance, the apparent location of the sound were attracted toward the singleton, reversals

would first occur at locations more to the left for the display with the singleton on the right than for the display with the singleton on the left.

FIGURE 9.1 An example of one of the displays used by Vroomen, Bertelson, and de Gelder (2001a). Subjects saw four squares, one (the singleton) smaller than the others. When the display was flashed on a computer screen, subjects heard a stereophonically controlled sound whose location had to be judged as left or right of the median fixation cross. The results showed that the apparent location of the sound was shifted in the direction of the two big squares, and not toward the singleton. On control trials, it was found that the singleton attracted visual attention. The direction in which a sound was ventriloquized was thus dissociated from where exogenous attention was captured.

The results of this experiment were very straightforward. The apparent origin of the sound was not shifted toward the singleton but in the opposite direction, toward the two big squares. Apparently, the two big squares on one side of the display were attracting the apparent origin of the sound more strongly than the small and big square at the other side. Thus, the attractor size effect that we previously obtained (Bertelson, Vroomen, de Gelder, & Driver, 2000) occurred with the present visual display as well. This result thus suggested that attraction of a sound was not mediated through exogenous attention capture. However, before that conclusion could be drawn, it was necessary to check that the visual display had the capacity to attract attention toward the singleton. We therefore ran a control experiment in which the principle was to measure the attention-attracting capacity of the small square through its effect on the discrimination of targets presented elsewhere in the display. In the singleton condition, participants were shown the previously used display with the three big squares and the small one. A target letter X or O, calling for a choice reaction, was displayed in the most peripheral big square opposite the singleton. In the control condition, the display consisted of four equally sized big squares. Discrimination performance was worse in the singleton condition than in the control condition, thus showing that attention was attracted away from the target letter and toward the singleton. Nevertheless, one might still argue that a singleton in the sound localization task did not capture attention because subjects were paying attention to audition, not vision. In a third experiment, we therefore randomized sound localization trials with visual X/O discrimination trials so that subjects did not know in advance which task they had to perform. When subjects saw an X or an O, they pressed as fast as possible a corresponding key; otherwise, when no letter was detected, they decided whether the sound had come from the left or right of the central reference. With this mixed design, results were still exactly as before: attention was attracted toward the singleton while the sound was shifted away from the singleton. Strategic differences between an auditory and a visual task were thus unlikely to explain the result. Rather, we demonstrated a dissociation between ventriloquism and exogenous attention: the apparent location of the sound was shifted toward the two big squares (or the "center of gravity" of the visual display), while the singleton attracted exogenous attention. The findings from the studies concerning the role of exogenous attention together with those of the earlier one showing the independence of ventriloquism from the direction of endogenous attention (Bertelson, Vroomen, et al., 2000) thus support the conclusion that ventriloquism is not affected by the direction of attention.

4. The ventriloquism effect is still obtained when the visual distracter is not seen consciously. The conclusion that attention is not needed to obtain a ventriloquism effect is further corroborated by our work on patients with unilateral visual neglect (Bertelson, Pavani, Ladavas, Vroomen, & de Gelder, 2000). The neglect syndrome is usually interpreted as an attentional deficit and reflected in a reduced capacity to report stimuli in the contralateral side (usually the left). Previously, it had been reported that ventriloquism could improve the attentional deficit. Soroker, Calamaro, and Myslobodsky (1995) showed that inferior identification of syllables delivered through a loudspeaker on the left (auditory neglect) could be improved when the same stimuli on the left were administered in the presence of a fictitious loudspeaker on the right. The authors attributed this improvement to a ventriloquism effect, even though their setting was very different from the usual ventriloquism situation. That is, their visual stimulus was stationary, whereas typically the onset and offset of an auditory and visual stimulus are synchronized. The

effect was therefore probably mediated by higher-order knowledge about the fact that sounds can be delivered through loudspeakers.

In our research, we used the more typical ventriloquism situation (a light and a sound presented simultaneously) and asked whether a visual stimulus that remains undetected because it is presented in the neglected field nevertheless shifts the apparent location of the sound toward its location. This may occur because although perceptual awareness is compromised in neglect, much perceptual processing can still proceed unconsciously for the affected side. Our results were clear: patients with left visual neglect consistently failed to detect a stimulus presented in their left visual field; nevertheless, their pointing to a sound was shifted in the direction of the visual stimulus. This is thus another demonstration that ventriloquism is not dependent on attention or even awareness of the visual distracter.

5. The ventriloquism effect is a pre-attentive phenomenon. The previous studies led us to conclude that cross-modal interactions take place without the need of attention. This stage is presumably one concerned with the initial analysis of the spatial scene (Bertelson, 1994). The presumption receives additional support from the findings by Driver (1996) in which the visual bias of auditory location was measured in the classic "cocktail party" situation through its effect in facilitating the focusing of attention on one of two simultaneous spoken messages. Subjects found the shadowing task easier when the apparent location of the target sound was attracted away from the distracter by a moving face. This result thus implies that focused attention operates on a representation of the external scene that has already been spatially reorganized by cross-modal interactions.

We asked whether a similar cross-modal reorganization of external space occurs when exogenous rather than focused attention is at stake (Vroomen, Bertelson, & de Gelder, 2001b). To do so, we used the orthogonal cross-modal cuing task introduced by Spence and Driver (1997b). In this task, participants are asked to judge the elevation (up vs. down, regardless of whether it is on the left or right of fixation) of peripheral targets in either audition, vision, or touch following an uninformative cue in either one of these modalities. In general, cuing effects (i.e., faster responses when the cue is on the same side as the target) have been found across all modalities, except that visual cues do not affect responses to auditory targets (Driver & Spence, 1998; but see McDonald, Teder-Sälejärvi, Heraldez, & Hillyard, 2001; Spence, 2001). This then opens an intriguing possibility: What happens with an auditory cue whose veridical location is in the center but whose apparent location is ventriloquized toward a simultaneous light in the periphery? Can such a ventriloquized cue affect responses to auditory targets? The ventriloquized cue consisted of a tone presented from an invisible central speaker synchronized with a visual cue presented on the left or right. Depending on the stimulus onset asynchrony (SOA; 100, 300, or 500 ms), a target sound (white noise bursts) was delivered with equal probabilities from one of the four target speakers. Subjects made a speeded decision about whether the target had been delivered through one of the upper speakers or one of the lower speakers. Results showed that visual cues had no effect on auditory target detection (see also Spence & Driver, 1997b). More important, ventriloquized cues had no cuing effect at 100 ms SOA, but the facilitatory effect appeared at 300 and 500 ms SOA. This suggests that a ventriloquized cue directed auditory exogenous attention to the perceived rather than the physical auditory location, implying that the cross-modal interaction between vision and audition reorganized space on which auditory exogenous attention operates. Spence and Driver (2000) reported similar cuing effects (with a somewhat different time course) in their study of ventriloquized cues in the vertical dimension. They showed that a visual cue presented above or below fixation led to a vertical shift of auditory attention when it was paired with a tone presented at fixation. Attentional capture can thus be directed to the apparent location of a ventriloquized sound, suggesting that cross-modal integration precedes or at least co-occurs with reflexive shifts of covert attention.

To summarize the case of ventriloquism, the apparent location of a sound can be shifted in the direction of a visual stimulus that is synchronized with the sound. It is unlikely that this robust effect can be explained solely by voluntary response strategies, as it is obtained with a psychophysical staircase procedure and can be observed as an aftereffect. The effect is even obtained when the visual distracter is not seen consciously, as in patients with hemineglect. Moreover, the ventriloquism effect does not require attention because it is not affected by whether a visual distracter is focused on or not, and the direction of ventriloquism can be dissociated from where visual attention is captured. In fact, the ventriloquism effect may be a pre-attentive phenomenon, because auditory attention can be captured at the ventriloquized location of a sound. Taken together, these observations indicate that the ventriloquism effect is perceptually "real."

Sound affecting vision: The freezing phenomenon

Recently we described a case of sound affecting vision (Vroomen & de Gelder, 2000). The basic phenomenon

146 PERCEPTUAL CONSEQUENCES OF MULTIPLE SENSORY SYSTEMS

can be described as follows: when subjects are shown a rapidly changing visual display, an abrupt sound may "freeze" the display with which the sound is synchronized. Perceptually, the display appears brighter or appears to be shown for a longer period of time. We described this phenomenon against the background of scene analysis.

Scene analysis refers to the concept that information reaching the sense organs is parsed into objects and events. In vision, scene analysis succeeds despite partial occlusion of one object by the other, the presence of shadows extending across object boundaries, and deformations of the retinal image produced by moving objects. Vision is not the only modality in which object segregation occurs. Auditory object segregation has also been demonstrated (Bregman, 1990). It occurs, for instance, when a sequence of alternating high- and low-frequency tones is played at a certain rate. When the frequency difference between the tones is small, or when the tones are played at a slow rate, listeners are able to follow the entire sequence of tones. But at bigger frequency differences or higher rates, the sequence splits into two streams, one high and one low in pitch. Although it is possible to shift attention between the two streams, it is difficult to report the order of the tones in the entire sequence. Auditory stream segregation, like apparent motion in vision, appears to follow Korte's third law (Korte, 1915). When the difference in frequency between the tones increases, stream segregation occurs at longer SOAs.

Bregman (1990) described a number of Gestalt principles for auditory scene analysis in which he stressed the resemblance between audition and vision, since principles of perceptual organization such as similarity (in volume, timbre, spatial location), good continuation, and common fate seem to play similar roles in the two modalities. Such a correspondence between principles of visual and auditory organization raises the question of whether the perceptual system utilizes information from one sensory modality to organize the perceptual array in the other modality. In other words, is scene analysis itself a cross-modal phenomenon?

Previously, O'Leary and Rhodes (1984) showed that perceptual segmentation in one modality could influence the concomitant segmentation in another modality. They used a display of six dots, three high and three low. The dots were displayed one by one, alternating between the high and low positions and moving from left to right. At slow rates, a single dot appeared to move up and down, while at faster rates two dots were seen as moving horizontally, one above the other. A sequence that was perceived as two dots caused a concurrent auditory sequence to be perceived as two tones as well at a rate that would yield a single perceptual object when the accompanying visual sequence was perceived as a single object. The number of objects seen thus influenced the number of objects heard. They also found the opposite influence from audition to vision. Segmentation in one modality thus affected segmentation in the other modality. To us, though, it was not clear whether the cross-modal effect was truly perceptual, or whether it occurred because participants deliberately changed their interpretation of the sounds and dots. It is well known that there is a broad range of rates and tones at which listeners can hear, at will, one or two streams (van Noorden, 1975). O'Leary and Rhodes presented ambiguous sequences, and this raises the possibility that a cross-modal influence was found because perceivers changed their interpretation of the sounds and dots, but the perception may have been the same. For example, participants under the impression of hearing two streams instead of one may have inferred that in vision there should also be two streams instead of one. Such a conscious strategy would explain the observations of the cross-modal influence without the need for a direct perceptual link between audition and vision.

We pursued this question in a study that led us to observe the freezing phenomenon. We first tried to determine whether the freezing of the display to which an abrupt tone is synchronized is a perceptually genuine effect or not. Previously, Stein, London, Wilkinson, & Price (1996) had shown, with normal subjects, that a sound enhances the perceived visual intensity of a stimulus. The latter seemed to be a close analogue of the freezing phenomenon we wanted to create. However, Stein et al. used a somewhat indirect measure of visual intensity (a visual analogue scale; participants judged the intensity of a light by rotating a dial), and they could not find an enhancement by a sound when the visual stimulus was presented subthreshold. It was therefore unclear whether their effect was truly perceptual rather than postperceptual. In our experiments, we tried to avoid this difficulty by using a more direct estimate of visual persistence by measuring speeded performance on a detection task. Participants saw a 4 × 4 matrix of flickering dots that was created by rapidly presenting four different displays, each containing four dots in quasi-random positions (Fig. 9.2). Each display on its own was difficult to see, because it was shown only briefly and was immediately followed by a mask. One of the four displays contained a target to be detected. The target consisted of four dots that made up a diamond in the upper left, upper right, lower left, or lower right corner of the matrix. The task of the participants was to detect the position of the diamond as fast and as accurately as possible. We investigated whether the detectability of

the target could be improved by the presentation of an abrupt sound together with the target. The tones were delivered through a loudspeaker under the monitor. Participants in the experimental condition heard a high tone at the target display and a low tone at the other four-dot displays (the distracters). In the control condition, participants heard only low tones. The idea was that the high tone in the sequence of tones segregated from the low tones, and that under the experimental circumstances it would increase the detectability of the target display. The results indeed showed that the target was detected more easily when presentation of the target was synchronized with presentation of the high tone. Subjects were faster and more accurate when a high tone was presented at target onset. Did the high tone simply act as a warning signal, giving subjects information about when to expect the target? In a second experiment we controlled for this possibility and synchronized the high tone with a distracter display that immediately preceded presentation of the target. Participants were informed about the temporal relation between high tone and target display and thus knew that the target would be presented immediately after the high tone. Yet, although the participants were now also given a cue about when to expect the target, their performance actually worsened. As reported by the participants, it is probable that the high tone contributed to higher visibility of the distracter display with which it was synchronized, thereby increasing interference.

FIGURE 9.2 A simplified representation of a stimulus sequence used in Vroomen and de Gelder (2000). Big squares represent the dots shown at time t; small squares were actually not presented to the viewers but are in the figure to show the position of the dots within the 4 × 4 matrix. The four-dot displays were shown for 97 ms each. Each display was immediately followed by a mask (the full matrix of 16 dots) for 97 ms, followed by a dark blank screen for 60 ms. The target display (in this example, the diamond in the upper left corner whose position had to be detected) was presented at t3. The sequence of the four-dot displays was repeated without interruption until a response was given. Tones (97 ms in duration) were synchronized with the onset of the four-dot displays. Results showed that when a tone was presented at t3 that segregated, target detection was enhanced, presumably because the visibility of the target display was increased. When a segregating tone was presented at t2, target detection became worse because the visual distracter at t2 caused more interference. There was no enhancement when the tone at t3 did not segregate. The visibility of a display was thus increased when synchronized with an abrupt tone.

However, the most important result was that we could show that the perceptual organization of the tone sequence determined the cross-modal enhancement. Our introspective observation was that visual detection was improved only when the high tone was segregated from the tone sequence. In our next experiment we prevented segregation of the high tone by making it part of the beginning of the well-known tune "Frère Jacques." When subjects heard repetitively a low-middle-high-low tone sequence while seeing the target on the third high tone, there was no enhancement of the visual display. Thus, the perceptual organization of the tone in the sequence increased the visibility of the target display rather than the high tone per se, showing that cross-modal interactions can occur at the level of scene analysis.

How to qualify the nature of cross-modal interactions?

We have argued that ventriloquism and the freezing phenomenon are two examples of intersensory interactions with consequences at perceptual processing levels (see also Vroomen & de Gelder, 2003, for audiovisual interactions between moving stimuli). They may therefore be likely candidates for showing that cross-modal interactions can affect primary sensory levels. There is now also some preliminary neurophysiological evidence showing that brain areas that are usually considered to be unimodal can be affected by input from different modalities. For example, with functional magnetic resonance imaging, Calvert, Campbell, and Brammer (2000) found that lipreading could affect primary auditory cortex. In a similar vein, Macaluso, Frith, and Driver (2000) showed that a tactile cue could enhance neural responses to a visual target in visual cortex. Giard and Peronnet (1999) also reported that tones synchronized with a visual stimulus affected event-related potentials (ERPs) in visual cortex. Pourtois, de Gelder, Vroomen, Rossion, and Crommelink (2000) found early modulation of auditory

ERPs when facial expressions of emotions were presented with auditory sentence fragments, the prosodic content of which was congruent with or incongruent with the face. When the face-voice pairs were congruent, there was a bigger auditory N1 component at around 110 ms than when they were incongruent. All these findings are in accordance with the idea that there is feedback from multimodal levels to unimodal levels of perception or with the notion that sensory modalities access each other directly.

Such cross-talk between primary sensory areas may also be related to the fact that the subjective experience of cross-modal interaction affects the target modality. In the McGurk effect (McGurk & MacDonald, 1976), visual information provided by lipreading changes the way a sound is heard. In the ventriloquism situation, when a sound is presented with a spatially conflicting light, the location of the sound is changed. With emotions, when a fearful voice is shown with a happy face, the voice sounds happier (de Gelder & Vroomen, 2000). There are other examples of such qualitative changes: For example, when a single flash of light is accompanied by multiple beeps, the light is seen as multiple flashes (Shams, Kamitani, & Shimojo, 2000). These multimodally determined percepts thus have the unimodal qualia of the sensory input from the primary modality, and this may be due to the existence of backprojections to the primary sensory areas.

Yet this is not to say that a percept resulting from cross-modal interactions is, in all its relevant aspects, equivalent to its unimodal counterpart. For example, is a ventriloquized sound in all its perceptual and neurophysiological relevant dimensions the same as a sound played from the direction from where the ventriloquized sound was perceived? For McGurk-like stimulus combinations (i.e., hearing /ba/ and seeing /ga/), the auditory component can be dissociated from the perceived component, as the contrast effect in selective speech adaptation is driven mainly by the auditory stimulus, and not by the perceived aspect of the audiovisual stimulus combination (Roberts & Summerfield, 1981; but see Bertelson, Vroomen, & de Gelder, 2003). For other cross-modal phenomena such as ventriloquism, a number of intriguing questions remain about the extent to which the illusory percept is distinguishable, or not, from its veridical counterpart.

REFERENCES

Bedford, F. L. (1999). Keeping perception accurate. Trends in Cognitive Sciences, 2, 4-11.
Bertelson, P. (1994). The cognitive architecture behind auditory-visual interaction in scene analysis and speech identification. Cahiers de Psychologie Cognitive, 13, 69-75.
Bertelson, P. (1999). Ventriloquism: A case of cross-modal perceptual grouping. In G. Aschersleben, T. Bachmann, & J. Müsseler (Eds.), Cognitive contributions to the perception of spatial and temporal events (pp. 347-362). Amsterdam: Elsevier.
Bertelson, P., & Aschersleben, G. (1998). Automatic visual bias of auditory location. Psychonomic Bulletin and Review, 5, 482-489.
Bertelson, P., Pavani, F., Ladavas, E., Vroomen, J., & de Gelder, B. (2000). Ventriloquism in patients with unilateral visual neglect. Neuropsychologia, 38, 1634-1642.
Bertelson, P., & Radeau, M. (1981). Cross-modal bias and perceptual fusion with auditory-visual spatial discordance. Perception & Psychophysics, 29, 578-584.
Bertelson, P., Vroomen, J., & de Gelder, B. (2003). Visual recalibration of auditory speech identification. Psychological Science.
Bertelson, P., Vroomen, J., de Gelder, B., & Driver, J. (2000). The ventriloquist effect does not depend on the direction of deliberate visual attention. Perception & Psychophysics, 62, 321-332.
Bregman, A. S. (1990). Auditory scene analysis. Cambridge, MA: MIT Press.
Caclin, A., Soto-Faraco, S., Kingstone, A., & Spence, C. (2002). Tactile "capture" of audition. Perception & Psychophysics, 64, 616-630.
Calvert, G. A., Campbell, R., & Brammer, M. J. (2000). Evidence from functional magnetic resonance imaging of cross-modal binding in the human heteromodal cortex. Current Biology, 10, 649-657.
Canon, L. K. (1970). Intermodality inconsistency of input and directed attention as determinants of the nature of adaptation. Journal of Experimental Psychology, 84, 141-147.
de Gelder, B., & Vroomen, J. (2000). The perception of emotions by ear and by eye. Cognition and Emotion, 14, 289-311.
Driver, J. (1996). Enhancement of selective listening by illusory mislocalization of speech sounds due to lip-reading. Nature, 381, 66-68.
Driver, J., & Spence, C. (1998). Attention and the cross-modal construction of space. Trends in Cognitive Sciences, 2, 254-262.
Driver, J., & Spence, C. (2000). Multisensory perception: Beyond modularity and convergence. Current Biology, 10, 731-735.
Ettlinger, G., & Wilson, W. A. (1990). Cross-modal performance: Behavioural processes, phylogenetic considerations and neural mechanisms. Behavioural Brain Research, 40, 169-192.
Falchier, A., Renaud, L., Barone, P., & Kennedy, H. (2001). Extensive projections from the primary auditory cortex and polysensory area STP to peripheral area V1 in the macaque. The Journal of Neuroscience, 27, 511-521.
Falchier, A., Clavagnier, S., Barone, P., & Kennedy, H. (2002). Anatomical evidence of multimodal integration in primate striate cortex. The Journal of Neuroscience, 22, 5749-5759.
Frissen, I., Bertelson, P., Vroomen, J., & de Gelder, B. (2003). The aftereffects of ventriloquism: Are they sound frequency specific? Acta Psychologica, 113, 315-327.
Giard, M. H., & Peronnet, F. (1999). Auditory-visual integration during multimodal object recognition in humans: A behavioural and electrophysiological study. Journal of Cognitive Neuroscience, 11, 473-490.
Korte, A. (1915). Kinematoscopische Untersuchungen [Kinematoscopic research]. Zeitschrift für Psychologie der Sinnesorgane, 72, 193-296.

Macaluso, E., Frith, C., & Driver, J. (2000). Modulation of human visual cortex by cross-modal spatial attention. Science, 289, 1206-1208.
Massaro, D. W. (1998). Perceiving talking faces: From speech perception to a behavioral principle. Cambridge, MA: MIT Press.
McDonald, J. J., Teder-Sälejärvi, W. A., Heraldez, D., & Hillyard, S. A. (2001). Electrophysiological evidence for the "missing link" in cross-modal attention. Canadian Journal of Psychology, 55, 143-151.
McGurk, H., & MacDonald, J. (1976). Hearing lips and seeing voices. Nature, 264, 746-748.
O'Leary, A., & Rhodes, G. (1984). Cross-modal effects on visual and auditory object perception. Perception & Psychophysics, 35, 565-569.
Pourtois, G., de Gelder, B., Vroomen, J., Rossion, B., & Crommelink, M. (2000). The time-course of intermodal binding between seeing and hearing affective information. NeuroReport, 11, 1329-1333.
Radeau, M. (1994). Auditory-visual spatial interaction and modularity. Cahiers de Psychologie Cognitive, 13, 3-51.
Radeau, M., & Bertelson, P. (1974). The after-effects of ventriloquism. The Quarterly Journal of Experimental Psychology, 26, 63-71.
Roberts, M., & Summerfield, Q. (1981). Audiovisual presentation demonstrates that selective attention to speech is purely auditory. Perception & Psychophysics, 30, 309-314.
Rumelhart, D., & McClelland, J. (1982). An interactive activation model of context effects in letter perception: Part 2. The contextual enhancement effect and some tests and extensions of the model. Psychological Review, 89, 60-94.
Shams, L., Kamitani, Y., & Shimojo, S. (2000). What you see is what you hear. Nature, 408, 708.
Soroker, N., Calamaro, N., & Myslobodsky, M. S. (1995). Ventriloquism reinstates responsiveness to auditory stimuli in the "ignored" space in patients with hemispatial neglect. Journal of Clinical and Experimental Neuropsychology, 17, 243-255.
Spence, C. (2001). Cross-modal attentional capture: A controversy resolved? In C. Folk & B. Gibson (Eds.), Attention, distraction and action: Multiple perspectives on attentional capture (pp. 231-262). Amsterdam: Elsevier.
Spence, C., & Driver, J. (1997a). On measuring selective attention to a specific sensory modality. Perception & Psychophysics, 59, 389-403.
Spence, C., & Driver, J. (1997b). Audiovisual links in exogenous covert spatial attention. Perception & Psychophysics, 59, 1-22.
Spence, C., & Driver, J. (2000). Attracting attention to the illusory location of a sound: Reflexive cross-modal orienting and ventriloquism. NeuroReport, 11, 2057-2061.
Stein, B. E., London, N., Wilkinson, L. K., & Price, D. D. (1996). Enhancement of perceived visual intensity by auditory stimuli: A psychophysical analysis. Journal of Cognitive Neuroscience, 8, 497-506.
Treisman, A. M., & Gelade, G. (1980). A feature-integration theory of attention. Cognitive Psychology, 5, 109-137.
van Noorden, L. P. A. S. (1975). Temporal coherence in the perception of tone sequences. Unpublished doctoral dissertation, Technische Hogeschool Eindhoven, The Netherlands.
Vroomen, J., Bertelson, P., & de Gelder, B. (1998). A visual influence in the discrimination of auditory location. Proceedings of the International Conference on Auditory-Visual Speech Processing (AVSP '98) (pp. 131-135). Terrigal-Sydney, Australia: Causal Productions.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001a). The ventriloquist effect does not depend on the direction of automatic visual attention. Perception & Psychophysics, 63, 651-659.
Vroomen, J., Bertelson, P., & de Gelder, B. (2001b). Directing spatial attention towards the illusory location of a ventriloquized sound. Acta Psychologica, 108, 21-33.
Vroomen, J., Bertelson, P., Frissen, I., & de Gelder, B. (2003). A spatial gradient in the ventriloquism after-effect. Unpublished manuscript.
Vroomen, J., & de Gelder, B. (2000). Sound enhances visual perception: Cross-modal effects of auditory organisation on vision. Journal of Experimental Psychology: Human Perception and Performance, 26, 1583-1590.
Vroomen, J., & de Gelder, B. (2003). Visual motion influences the contingent auditory motion aftereffect. Psychological Science, 14, 357-361.
