Visual Scene Perception
Galley: Article - 00632 Level 1

Helene Intraub, University of Delaware, Newark, Delaware, USA

CONTENTS
Introduction
Scene comprehension and its influence on object detection
Transsaccadic memory
Perception of 'gist' versus details
Boundary extension

When studying a scene, viewers can make as many as three or four eye fixations per second. A single fixation is typically enough to allow comprehension of a scene. What is remembered, however, is not a photographic replica, but a more abstract mental representation that captures the 'gist' and general layout of the scene along with a limited amount of detail.

INTRODUCTION

The visual world exists all around us, but physiological constraints prevent us from seeing it all at once. Eye movements (called saccades) shift the position of the eye as quickly as three or four times per second. Vision is suppressed during each saccade, and then resumes during the next eye fixation (the period of time that the eye pauses to receive visual input). It has been postulated that the mental representation of a scene that is maintained across a saccade ('transsaccadic memory') is surprisingly sparse. This information, in conjunction with expectations about upcoming layout, provides a means for integrating successive views into a coherent representation of a scene.

SCENE COMPREHENSION AND ITS INFLUENCE ON OBJECT DETECTION

Perception occurs so quickly and automatically that individuals cannot discern the amount of time they need to understand a scene. Are multiple eye fixations necessary for scene perception, or are we able to perceive the meaning of a scene right away, based on the first eye fixation? One way to address this question is to see if viewers can understand unrelated scenes presented in rapid succession at a rate that mimics the rapid pace at which we normally make eye fixations (e.g. three per second). The reason for using unrelated scenes is to allow an assessment of what can be gleaned from a single glimpse without any bias from prior context. Immediately after viewing a rapid sequence like this, viewers participate in a recognition test. The pictures they just viewed are mixed with new pictures and are slowly presented one at a time. The viewers' task is to indicate which pictures they remember seeing before. Following rapid presentation, memory for the previously seen pictures is rather poor. For example, when 16 pictures were presented in this way, moments later, viewers were only able to recognize about 40 to 50 per cent in the memory test. One explanation is that they were only able to perceive about 40 to 50 per cent of the scenes that had flashed by and those were the scenes they remembered. However, contrary to this plausible explanation, many viewers adamantly claimed that they had indeed momentarily perceived most of the scenes but that the onslaught of new scenes interfered with their ability to remember what they had momentarily grasped.

To determine whether viewers were good at momentarily grasping the meaning of a scene under these conditions, it was necessary to design a task that could tap into the early stages of scene processing at a point before forgetting is likely to take place. Subjects were shown the same rapid sequences as in the previous experiments, but in this case a brief description of one of the pictures (e.g. 'a road with cars') was provided in advance. They were instructed to look for a picture matching that description and to immediately press a response key as soon as they saw it. In some experiments the description was fairly specific (as in the example 'a road with cars') and in others it was very broad and general (e.g. 'a type of furniture'). Because the detection task required them to respond immediately (without waiting until the end of the sequence), this minimized the likelihood of forgetting. At speeds as fast as three and four pictures per second, the ability to correctly detect a scene based on a description was excellent, usually reaching about 90 per cent correct or better. This performance far exceeded the 40 to 50 per cent correct obtained from subjects who took the recognition test. These results indicate that scene perception is very rapid: a single fixation is enough to allow the viewer to identify its meaning.

Scenes, however, usually contain multiple objects. There are two ways that the identification process might proceed: (a) first, objects are identified, and then viewers begin to understand what kind of scene they are looking at, or (b) first, the scene's general meaning and layout are perceived holistically, and then identification of specific objects follows. Although a controversial topic, many studies suggest the second alternative. In a series of experiments, outline drawings of scenes were each briefly presented (e.g. 50 to 150 milliseconds) one at a time. Prior to the presentation of each scene, viewers were provided with the name of an object (e.g. fire hydrant), and immediately after each presentation a dot appeared on the screen at the location previously filled by one of the objects in the scene. Viewers had to say whether or not the object named at the outset (in this example, a fire hydrant) had been present at that location. In some cases the object did fit with the general meaning of the scene (e.g. a fire hydrant in a street scene) and in others it did not (e.g. a fire hydrant in a kitchen scene). If objects are identified before the scene's meaning is grasped, then the ability to detect the object should be unaffected by whether or not it fits with the context of the scene. However, contrary to this possibility, the object's relation to the scene as a whole did affect the speed and accuracy of the response. Other relations of the object to the scene, such as whether its relative size did or did not fit the scene, or its location in the scene was plausible or implausible, had a similar effect on the response. This suggests that we may grasp the meaning and general layout of a scene rapidly enough to affect our ability to detect specific objects in the scene.

A similar conclusion has been drawn from other experiments in which eye movements were monitored. Pictures were presented for relatively long intervals that allowed viewers to make many eye fixations. Prior to the picture's onset, viewers fixated the centre of the screen. Frequently, the first saccade brought the eyes to an object that didn't belong, and subjects maintained longer fixation times on that object, as if they were trying to understand it. Again, this suggests that a scene's meaning and layout is understood very rapidly in processing, occurring prior to identification of all its objects. How does this information become integrated with new information obtained from the next eye fixation?

TRANSSACCADIC MEMORY

Initially, many researchers who studied the relation between eye movements and cognition thought that after each eye fixation on a scene, a detailed sensory record of the fixated region was stored in a very short-term memory system called an 'integrative buffer'. Within the buffer, information from a new fixation would be integrated (i.e. knitted together) with the stored information from the previous fixation. By integrating the details of successive views, the system was thought to provide the viewer with a seamless, detailed perception of the world, like piecing together parts of a jigsaw puzzle. However, research on memory for briefly glimpsed scenes, and research on reading and eye movements, has led to a different interpretation about how views are integrated across saccades. The idea is that our perception of a detailed, continuous world is to some degree illusory. Instead it is proposed that 'transsaccadic memory' (memory for information that is maintained across a saccade) includes the scene's meaning, along with the general layout, and only some detail.

Transsaccadic memory allows the visual system to relate the information obtained from one eye fixation to the contents of the next, but not by integrating detailed photograph-like views. It is somewhat surprising to think that our perception of the world is built up from relatively sparse information, but there is a lot of evidence to support this claim. You can get a sense of this yourself if you look at a complex visual scene (e.g. a bookshelf with books of different sizes and colors) and then close your eyes and recount all the details. You will probably find that your visual memory is missing a lot of information. Of course, the claim is best supported by controlled experiments that carefully address the viewer's ability to detect changes.

Various types of change detection experiments have been conducted in which a change is made in a display at exactly the same time that the eyes make a saccade. Using computer technology, a viewer's eyes are monitored, and when a saccade is launched, a change is made before the eyes land again and begin the next fixation. Therefore, the change in the scene takes place while vision is suppressed. When the eyes land, the change has already occurred. In these situations, many kinds of changes go unnoticed both in scenes and in text. For example, in one case, researchers presented sentences in an AlTeRnAtInG cAsE on a computer
to this perplexing question.