COMMENTARY

Animal awareness: The (un)binding of multisensory cues in decision making by animals

Ron Hoy* Department of Neurobiology and Behavior, 215 Mudd Hall, Cornell University, Ithaca, NY 14850; and Department of Environmental Science, Policy, and Management, University of California, Berkeley, CA 94720

Each day, we perceive the world unfolding before us, and we never give a thought to having to integrate the separate sights and sounds of everyday life. They are effortlessly composed in our brain as the successive moments of our conscious lives. Cognitive neuroscientists know that the neurosensory mechanisms responsible for our seamless perception are astonishing (1). However, the perceptual unity of our world can "break" as a result of neurological disorders (2) and, less dramatically, when we experience sensory illusions (3, 4). How our brain keeps the many trains of sensory information "on track and in time" in perceptual space/time is fascinating. Yet humans are not the only animals on Earth that confront a myriad of sights and sounds and have to make adaptive sense of them. Ethologists and behavioral ecologists have shown that animals interacting in small groups or large societies constantly make behavioral decisions, for example, whether to court and then mate or reject, or whether to challenge and then fight or flee. These decisions are deeply consequential for an individual because of the forces of natural and sexual selection (5). Such interactions are laden with sensory cues and information from multiple modalities that the individual must translate into actions that make adaptive sense.

Understanding what composes the perceptual world of animals is more challenging than in self/human studies, because investigating humans is largely a matter of a subject's ability to report "what happens" in plain speech, often while her/his brain is being scanned (6). The perceptual world of animals must instead be inferred from their reactions to experimental manipulation, which are mostly measured by their movements. Thus, the study by Narins et al. in this issue of PNAS (7) will be welcome to ethologists, comparative psychologists, and cognitive neuroscientists. Narins et al. designed instrumentally ingenious experiments to dissect the aggressive territorial behavior of poison-dart frogs (Epipedobates femoralis) and framed their findings in the context of cognitive and human psychophysics. Theirs is an important contribution to discussions about "animal awareness" (8).

Narins et al. (7) worked in Guyana, in the Amazonian rain forest. There, male poison-dart frogs vigorously hold and defend territories against other conspecific males. The territorial males produce vocal and repetitive advertisement calls that attract females for mating (9). The diurnally active males can be seen as well as heard in the act of calling. A singing frog is easily recognized on sight/site by his conspicuously inflated and pulsating vocal sac, which when fully inflated is nearly half the size of the frog itself. The species-specific call is also easily recognized: it consists of four loud, high-pitched notes delivered in glissando. When a male intrudes upon the territory of a calling resident male, he is quickly approached and attacked by the resident (9). Narins et al. ingeniously used what they called an "electromechanical model frog," herein called "robofrog." Robofrog is an accurately sculpted and painted silicone model of a male poison-dart frog, posed in singing position next to a small wooden log that is wired with a small hidden loudspeaker from which prerecorded calls were broadcast. Full functionality was conferred on the model frog by giving it a "vocal sac," an ultrathin, flexible latex membrane pouch that could be inflated and pulsated to the size and shape of a real vocal sac. Broadcasts from the loudspeaker, as well as inflation of the vocal sac, were remotely controlled by investigators who observed all encounters on live-action video. Narins et al. first identified the territorial boundaries of frogs on their field site, then placed their robofrog within a territory and recorded the reactions of the resident frog.

I take pains to describe the experimental setup because the ingenuity of the robotic device and the ability to videotape behavior under entirely natural conditions were essential to revealing the full play of behavioral acts that unfolded in the rain forest, which might not be as robust if the experiment were performed in the sterile conditions of a laboratory. Narins et al. (7) discovered that broadcasting the calling sound of another male from within the territory of a territorial male was sufficient to attract the territory-holding male to the loudspeaker. Once the territorial male was attracted to the sound of the call, the sight of the silicone robofrog motivated him to touch and explore the model. However, exploration did not escalate into a full-blown aggressive attack unless the model was "calling," simulated by broadcasting song through its nearby loudspeaker and pulsating its inflated latex "vocal sac." Thus, Narins et al. fractionated the aggressive behavior of a poison-dart frog by separately dissociating the model's vocal signals from its visual signals: sound alone was sufficient to elicit attraction and exploration; the sight alone of an accurately painted model frog would also elicit exploration and touching; but only the combination of the sight of a calling frog with the sound of its call elicited full aggression, an attack upon the model. These observations clearly illustrate the synergistic and modulatory roles of separate sensory modalities in a naturally multimodal signal. They also show the importance of ecological context in a behavior as complex as territorial defense. There is a growing appreciation of multimodal signaling in the communication of animals, ranging from invertebrates (10, 11) to vertebrates (12, 13).

In the present study, Narins et al. (7) went further in dissociating acoustic from visual cues to assess temporal and spatial factors that influence perceptual coherence in multimodal displays. Again, aggressive behavior served as the behavioral assay. First, time delays were introduced between the visual sign (the inflated, pulsating vocal sac) and the acoustic sign (onset and duration of the advertisement call) to explore the ability of the frogs to (dis)integrate time cues between modalities. Second, the experimenters dissociated the visual sign of the "singing" robofrog from the actual location of the sound source by placing a second, remote loudspeaker at various distances away from the model. This permitted Narins et al. to quantitatively assess the ability of frogs to (dis)integrate spatial distance cues between modalities.

To investigate temporal integration, Narins et al. (7) systematically desynchronized the amplitude-modulated visual sign of the calling act (the inflation–deflation cycles of the robofrog's pulsating "vocal sac") from the auditory sign (the amplitude-modulated features of the call itself). The degree of desynchronization ranged from partial synchrony to complete decoupling. Live territorial male frogs were attracted to the immediate locale of the call regardless of whether the calls were synchronized with the vocal sac movements of the robofrog. The stereotyped species-specific calls function as long-distance attraction signals; visual cues play no role in attraction. However, once a male has been attracted to the site/sight of the acoustically broadcasting robofrog, the amount of time he remains in the vicinity depends largely on whether the model's visual and auditory cues are synchronized and/or overlapping. When the two cues were completely decoupled, the frog left the site much sooner, seemingly "losing interest." The male was provoked to attack the model, however, if the visual stimulus was decoupled from the auditory stimulus by less than half a second; longer desynchronized intervals were not provocative. Thus, the sound of a calling male compels investigation, but the sight of a noncalling male is apparently not threatening enough to evoke aggressive behavior unless the two stimuli coincide within a half-second time window. The curious male escalated to full aggression only at the sight and sound of a rival male in the complete act of calling.

To test for spatial integration, Narins et al. (7) added an external loudspeaker to their experiment in addition to the speaker built into the model frog/log setup. This allowed them to separate the actual acoustic location of the call from the visual stimulus of the robofrog exercising its vocal sac as though calling. By systematically varying the distance between the robofrog and the external speaker (the displacement) and observing the effect on the aggressive behavior of the resident male frog, the investigators determined the limits of a male's ability to spatially integrate discordant sensory cues. As before, the resident frog was attracted to sound and investigated both the external loudspeaker and the robofrog. However, a frog's behavioral reaction to the model depended on the displacement distance. For small spatial displacements between the model and the external speaker (defined as 2–12 cm), many physical contacts and attacks (>75%) were made upon the model. For large displacements (25–50 cm), however, the model was touched or attacked in only 25% of trials. Moreover, the males spent more time in the vicinity of the robofrog when the displacements were small. Apparently, to an inquisitive male frog, the more coherent its perception of a full-blown singing frog on its territory, the more compelling that perception is, and the more likely the male is to attack.

Narins et al. (7) interpret their findings in terms of a perceptual "binding" problem. In cognitive neuroscience, perceptual binding is usually framed within the context of a single modality, typically mammalian vision (14, 15). Multiple visual streams diverge from the primary visual cortex, V1, and extract different features of the visual scene: movement and stereopsis, color and texture, and specially "labeled" features such as faces (15). How the brain combines these disparate visual processing streams into a coherent and unitary visual percept has yet to be fully understood; however it is achieved, binding of multiple information streams must occur within the spatial and temporal constraints of the brain. Narins et al. frame their results as a multimodal binding problem, but the implications for perceptual coherence are similar. They manipulated the temporal and spatial constraints of binding visual to auditory cues and were able to "break" the coherence of perception (where the visual location of a vocalizing frog no longer "matched" the location of the sound source) in both space and time. Such inferences about the coherence of multimodal percepts in animals are hard to make, and they can rarely be tested as directly as demonstrated here.

Narins et al. (7) also relate their findings to human psychoacoustics by bridging them to "a number of human studies (that) have shown that visual cues can modulate the apparent location of auditory cues. This is clearly seen in the 'spatial ventriloquist effect'. . . ." They also cite recent studies in humans that demonstrate temporal spatial ventriloquism effects. These illusions are perhaps more familiar to readers "of a certain age," for whom television performances by vaudevillians working with dummies (think Edgar Bergen and Charlie McCarthy) are memorable. Bergen "sold" the illusion that his dummy, Charlie, was speaking by moving the dummy's lips as Bergen himself spoke Charlie's dialogue; crucially for the illusion, Bergen did not move his own lips. Accomplished ventriloquists were said to be able to "throw their voice," a testament to how compelling the illusion could be. Linguistic studies of human speech indicate that where sound and visual cues appear to be in conflict, vision tends to dominate (16). Few comparable animal studies exist, which makes the study by Narins et al. especially welcome.

The poison-dart frogs also deal with a disparity of spatial cues in a way consistent with visual dominance, but only up to certain spatial limits, beyond which the auditory stimulus dominates, or "captures," the frogs [figure 3 and table 3 of Narins et al. (7)].

This study by Narins et al. (7) is rich in possibilities for inferring important perceptual mechanisms, such as selective attention, sensory binding, sensory dominance, and multimodal interactions, all of which are lively issues in human cognition and perception studies but are just emerging in comparative animal studies. Narins et al. show once again how important it is to perform experimental tests of animal perception and cognition within natural settings, echoing the precepts of the late James J. Gibson (17), who, in his own studies of human perception over half a century ago, pointed out the importance of the "ecological validity" of experimental settings and the design of salient stimulus situations. The present study makes very clear that this is no less true for understanding the perceptual world of animals.

See companion article on page 2425.
*E-mail: [email protected].
© 2005 by The National Academy of Sciences of the USA
www.pnas.org/cgi/doi/10.1073/pnas.0500093102 | PNAS | February 15, 2005 | vol. 102 | no. 7 | 2267–2268

1. Bolhuis, J., ed. (2000) Brain, Perception, Memory (Oxford Univ. Press, New York).
2. Sacks, O. (1985) The Man Who Mistook His Wife for a Hat and Other Clinical Tales (Simon & Schuster, New York).
3. Zihl, J., Von Cramon, D. & Mai, N. (1983) Brain 106, 313–340.
4. Purves, D. & Lotto, B. (2003) Why We See What We Do: An Empirical Theory of Vision (Sinauer, Sunderland, MA).
5. Krebs, J. R. & Davies, N. B. (1993) An Introduction to Behavioural Ecology (Blackwell, London).
6. Cabeza, R. & Kingstone, A. (2001) Handbook of Functional Neuroimaging of Cognition (MIT Press, Cambridge, MA).
7. Narins, P. M., Grabul, D. S., Soma, K. K., Gaucher, P. & Hödl, W. (2005) Proc. Natl. Acad. Sci. USA 102, 2425–2429.
8. Griffin, D. (2001) Animal Minds (Univ. of Chicago Press, Chicago).
9. Narins, P. M., Hödl, W. & Grabul, D. S. (2003) Proc. Natl. Acad. Sci. USA 100, 577–580.
10. Hebets, E. A. & Papaj, D. R. (2005) Behav. Ecol. Sociobiol. 57, 197–294.
11. Elias, D., Hebets, E., Hoy, R. & Mason, A. (2005) Anim. Behav., in press.
12. Partan, S. & Marler, P. (1999) Science 283, 1272–1273.
13. Patricelli, G. L., Uy, J. A. C., Walsh, G. & Borgia, G. (2002) Nature 415, 279–280.
14. Engel, A. K. & Singer, W. (2001) Trends Cognit. Sci. 5, 16–25.
15. Purves, D. (2002) Neuroscience (Sinauer, Sunderland, MA).
16. McGurk, H. & MacDonald, J. (1976) Nature 264, 746–748.
17. Gibson, J. J. (1950) The Perception of the Visual World (Houghton Mifflin, Boston).
