Chapter 3. Visual Attention and Visual Awareness
Section I. Anatomy and physiology of the human visual system.

Rufin VanRullen (1) and Christof Koch (2)

1. CNRS Centre de Recherche Cerveau et Cognition, 133 Route de Narbonne, 31062 Toulouse Cedex (France)
2. California Institute of Technology, Division of Biology and Division of Engineering and Applied Sciences, MC 139-74, Pasadena CA 91125 (USA)

Intuitively, vision appears to be an easy process: effortless, almost automatic, and so efficient that a single glance at a complex scene is sufficient to produce immediate awareness of its entire structure and elements. Unfortunately, much of this is a grand illusion (O’Regan & Noë, 2001). Proper manipulations reveal that many aspects of the visual scene, often essential ones, can go entirely unnoticed. For example, human subjects will often fail to notice an unexpected but quite large stimulus flashed right at the center of gaze during a psychophysical experiment (a phenomenon known as “inattentional blindness”, Mack and Rock, 1998). In more natural environments, observers can fail to notice the appearance or disappearance of a large object (Fig 3.1; Rensink et al, 1997; O’Regan et al, 1999), the change in identity of the person they are conversing with (Simons and Levin, 1998), or the passage of a gorilla in the middle of a ball game (Simons and Chabris, 1999). As a group, these visual failures are referred to as “change blindness” (Rensink, 2002). Such limitations are rarely experienced directly in real life, except as one of the main instruments of magicians' tricks; yet they shape much of our visual perception.

Fig 3.1. An example of change blindness. The two pictures, differing by an aspect unknown in advance (change of size, color, position, or even disappearance of an object, as shown here), are presented successively, always separated by a blank frame. This is repeated in a cycle.
Typically, an observer requires many repetitions in order to notice the change, even when this change is substantial.

These peculiar phenomena in fact reflect the limited capacity of an ingredient essential to much of visual perception: attention. While the retina potentially embraces the entire scene, attention can only focus on one or a few elements at a time, and thus facilitate their perception, their recognition, or their memorization for later recall. This is not to say that perception cannot exist outside the focus of attention, as will be seen in the last section. Motion-induced blindness (Bonneh et al, 2001), flash-suppression (Wolfe, 1984) and binocular rivalry (Blake and Logothetis, 2002) are other examples of visual phenomena where the withdrawal of focal attention is likely to be critical. Before addressing the nature and the role of attention, it is equally important to understand what can be done in the absence of attention. This depends, in part, on the overall structure and organization of visual cortex.

Fig 3.2. Visual cortical hierarchy. At least two functional streams can be identified as originating in primary visual cortex (V1). The ventral “what” pathway runs through V4 into infero-temporal cortex (right), while the dorsal “where” pathway comprises areas V3, MT and MST, ending within parietal cortex (left). Adapted from Felleman and Van Essen (1991).

3.1. CORTICAL HIERARCHIES AND PROCESSING STREAMS

3.1.1. Hierarchical organization

As detailed in the previous chapter, the three dozen cortical areas that constitute visual cortex are not randomly interconnected but display a specific pattern of organization. The laminar distribution of cortical projection neurons and axonal termination zones permits the observant neuro-anatomist to define forward, feedback and lateral cortico-cortical connections (Rockland & Pandya, 1979; Bullier et al, 1984).
In visual cortex, each area can thus be assigned a position within a hierarchy comprising at least a dozen levels (Felleman and Van Essen, 1991; Van Essen et al, 1992). Functionally, in the ventral pathway, the hierarchy (which is non-unique) is best described as a sequence of feature-selective neuronal populations of increasing complexity (Barlow, 1972); each level “explicitly” represents a particular feature dimension (e.g. color, orientation), with high-level concepts and categories being “explicitly” represented in higher-level areas (e.g. inferior and medial temporal cortex). By “explicit”, we mean that the firing of a certain population of neurons can be directly related to the presence of this aspect or element within the visual scene. For example, direction of movement can be explicitly represented by certain neurons or cortical columns in area MT (Newsome et al, 1989). Patients with lesions in and around MT can show a selective loss of movement perception (Zihl et al, 1983). One can say that MT constitutes an “essential node” (Zeki, 2001) for direction of movement. There is probably a direct relation between the clinical concept of “essential node” and the neurophysiological concept of “explicit coding” (e.g. columnar representation). We believe that both concepts will prove very useful for describing neuronal coding and representation. To better understand the primate cortical visual system, whose organization is rather complex (Fig 3.2), it is convenient to separate it into two distinct functional streams.

3.1.2. What and Where pathways

It was primarily on the basis of lesion studies in macaque monkeys that Leslie Ungerleider and Mortimer Mishkin (Ungerleider & Mishkin, 1982) concluded that the visual system comprises two ensembles of cortical areas with complementary functions (Morel & Bullier, 1990).
These experiments in macaques were informed by various neurological deficits observed in humans following specific lesions: impairments in the perception of space (e.g. neglect) after lesions of parietal cortex (Driver and Mattingley, 1998), and impairments in color (achromatopsia) or shape (agnosia) perception following lesions of temporal lobe areas (Humphreys & Riddoch, 1987). Upon lesioning ventral areas of the temporal lobe, Ungerleider and Mishkin (1982) observed that monkeys could find and manipulate objects but not discriminate between them on the basis of their shape; lesions of dorsal areas of the parietal lobe yielded the opposite pattern of results: shape discrimination was preserved, but spatial processing was greatly impaired. They postulated that the “ventral” stream was primarily concerned with “what”-like information (i.e. the identity of objects in the scene), while the “dorsal” stream had to do with “where”-like information (i.e. the spatial location and movement of objects). Since then, a similar distinction has been demonstrated in humans using PET and fMRI techniques (Ungerleider & Haxby, 1994).

Nevertheless, the separation between these two streams is not absolute. First, there are significant connections between areas of the ventral and dorsal streams (Morel and Bullier, 1990; Baizer et al, 1991). Second, there exist cortical areas in an intermediate position between the temporal and parietal lobes (in particular around the superior temporal sulcus) that cannot be easily classified (e.g. Karnath, 2001).

The case of patient D.F., who suffered from diffuse bilateral lesions affecting lateral extra-striate areas 18 and 19 (preventing ventral, but not dorsal pathway activation in this patient), initiated a different, though non-exclusive interpretation of this dichotomy. D.F. could not report the orientation (e.g.
horizontal, vertical) of a slot made on the front of a box; yet when asked to “post” her hand through the slot, she would move it and orient it in perfect accordance with the orientation that she “could not” perceive. It seemed as if the patient’s visually guided movements could make use of information of which she was not explicitly aware. Milner and Goodale (1995) proposed that the correct distinction was in fact between a “what” pathway for perception (ventral) and a “how” pathway for action (dorsal): the latter could access limited shape information, but not deliver it to the observer’s awareness.

Deriving from these theories, the “what” ventral stream has been associated with the contents of consciousness. The “where” dorsal stream is thought to be involved in spatial cognition, and in particular the guidance of eye and attentional mechanisms, as will be described in section 3.4.

It is within the ventral stream that the hierarchical “feature extraction” functional organization is the most apparent. Among other things, neurons in V1 extract information about the orientation of bars and edges (Hubel & Wiesel, 1968); neurons in V2 respond to illusory contours in addition to real ones (Von der Heydt et al, 1984); in V4, to simple geometric patterns and shapes (Gallant et al, 1993; Ghose & Ts’o, 1997); in posterior infero-temporal cortex, to common object parts or features (Tanaka, 1996); in anterior infero-temporal cortex, to more complex categories of objects such as faces or animals (Perrett et al, 1982; Logothetis & Sheinberg, 1996; Vogels, 1999). The human fusiform gyrus is the homologue of monkey infero-temporal cortex, with strong face-selective responses obtained in electrophysiological recordings in the monkey (Allison et al, 1999) and fMRI in humans (Kanwisher et al, 1997), as well as responses to other types of objects (although this point is the subject of much ongoing debate; Chao et al, 1999; Gauthier et al, 2000; Tarr & Cheng, 2003).
In the human medial temporal lobe (MTL), one step higher in the cortical hierarchy, electrophysiological recordings in epileptic patients have revealed that single neurons can respond selectively to individual images, celebrities and natural categories such as animals or cars (Kreiman et al, 2000). It seems evident that, computationally, neurons at a given stage in the hierarchy can build their selectivity by pooling together the outputs of neurons selective to simpler features at preceding levels. This powerful “feed-forward” representation scheme is applied rather successfully by many state-of-the-art object recognition neural network models (e.g. Fukushima & Miyake, 1982; Riesenhuber & Poggio, 1999).
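
The pooling idea described above can be illustrated with a deliberately tiny Python sketch. It is not taken from the cited models: the two bar templates, the 5x5 binary images, and every function name (feature_map, pool, cross_unit, make_image) are hypothetical choices made only for illustration. Using a maximum for pooling and a minimum for the conjunction are likewise assumptions. A first stage matches simple oriented templates at every position, a second stage pools each feature over positions, and a third stage combines the two pooled features into a unit selective for a more complex pattern (a cross).

```python
# Toy sketch of feed-forward feature pooling. Every name, filter, and image
# here is a hypothetical illustration, not code from the cited models.

# Stage 1 templates: 3x3 detectors for a horizontal and a vertical bar.
HORIZONTAL = [[-1, -1, -1], [2, 2, 2], [-1, -1, -1]]
VERTICAL = [[-1, 2, -1], [-1, 2, -1], [-1, 2, -1]]


def feature_map(image, template):
    """Stage 1: rectified template match at every valid image position."""
    size = len(template)
    responses = []
    for r in range(len(image) - size + 1):
        row = []
        for c in range(len(image[0]) - size + 1):
            match = sum(image[r + i][c + j] * template[i][j]
                        for i in range(size) for j in range(size))
            row.append(max(match, 0))  # responses are clipped at zero
        responses.append(row)
    return responses


def pool(responses):
    """Stage 2: maximum over positions, giving a position-tolerant response."""
    return max(max(row) for row in responses)


def cross_unit(image):
    """Stage 3: AND-like conjunction (minimum) of the two pooled features."""
    h = pool(feature_map(image, HORIZONTAL))
    v = pool(feature_map(image, VERTICAL))
    return min(h, v)


def make_image(bar_row=None, bar_col=None, size=5):
    """Binary image with an optional full-width row bar and/or column bar."""
    return [[1 if (r == bar_row or c == bar_col) else 0 for c in range(size)]
            for r in range(size)]


if __name__ == "__main__":
    examples = [("horizontal bar", make_image(bar_row=2)),
                ("shifted bar", make_image(bar_row=1)),
                ("cross", make_image(bar_row=2, bar_col=2))]
    for name, img in examples:
        print(name,
              pool(feature_map(img, HORIZONTAL)),
              pool(feature_map(img, VERTICAL)),
              cross_unit(img))
```

In this toy setting, the pooled horizontal response is the same whether the bar sits in row 1 or row 2, and the conjunction unit stays silent for a single bar but responds to the cross. The sketch only mirrors the qualitative point of the text, that selectivity for a complex pattern can be assembled from simpler, position-tolerant features; it makes no claim about how real neurons or the cited models implement pooling.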