Proc. Natl. Acad. Sci. USA Vol. 92, pp. 2433-2440, March 1995 Review Visual Thomas D. Albright* and Gene R. Stoner The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, CA 92037

ABSTRACT The primate visual mo- manifold functions performed by the pri- tion system performs numerous functions mate motion processing subsystem. We essential for survival in a dynamic visual will begin by briefly reviewing the origin world. Prominent among these functions and substance of the contemporary view is the ability to recover and represent the that visual motion is processed by a basic trajectories of objects in a form that fa- and well-defined neural system. cilitates behavioral responses to those .t ~/ / movements. The first step toward this The Motion Processing Apparatus goal, which consists of detecting the dis- placement of retinal image features, has Elementism-the reductive interpretation D~~~~~ / been studied for many years in both psy- of perception as merely the association of chophysical and neurobiological experi- a set of basic sensory elements-was a .4 1 ments. Evidence indicates that achieve- dominant theme in mid-19th century per- ment of this step is computationally ceptual . According to this in- straightforward and occurs at the earliest fluential doctrine, motion was not thought cortical stage. The second step involves to be directly sensed but rather indirectly the selective integration of retinal motion inferred from the association of light sen- FIG. 1. Neuronal directional selectivity, as first observed by Hubel and Wiesel (4) in signals according to the object of origin. sations at different points in space and primary (area V1) of rhesus mon- Realization ofthis step is computationally time. That is to say, motion was not key. (Left) The neuronal receptive field is in- demanding, as the solution is formally un- thought to be directly sensed but rather dicated by broken rectangles. The visual stim- derconstrained. It must rely-by defini- indirectly inferred. This view of motion ulus was moved in each of 14 directions (rows tion-upon utilization of retinal cues that perception began to fade in the latter part A-G; opposing directions indicated by arrows) are indicative of the spatial relationships of the 19th century as its explanatory through receptive field. (Right) Recorded within and between objects in the visual limitations became widely recognized. In traces of cellular activity are shown in which the scene. Psychophysical experiments have particular, the immediacy and directness horizontal axis represents time (2 s per trace), documented this dependence and suggested and each vertical line represents an action of the motion sense were suggested by potential. This responds most strongly mechanisms by which it may be achieved. early reports of two motion "illusions": (i) to motion up and to the right (row D). Re- Neurophysiological experiments have pro- the nonveridical experience of motion that printed with permission from ref. 4 (copyright vided evidence for a neural substrate that follows prolonged exposure to real mo- 1968, The Physiological Society, London). may underlie this selective motion signal tion, a phenomenon known as the motion integration. Together they paint a coherent after-effect or waterfall illusion (1); and within an anatomically segregated path- portrait of the means by which retinal im- (ii) the experience of motion induced by way extending from retina through several age motion gives rise to our perceptual light stimulation at discrete spatial and cortical stages (Fig. 2; see ref. 7 for re- experience of moving objects. temporal intervals (2), a phenomenon view). This pathway is commonly known known as apparent motion. The critical as the "magnocellular" or "M" stream Motion of an image on one's retina re- feature of both is the fact that movement and is typically contrasted with the ana- flects a change in one's visual environ- can be perceived without the observer tomically parallel and functionally com- ment. From the point of view of survival, having detected changes in the "primary" plementary "parvocellular" or "P" the ability to detect such motion is one of positional cues, a fact that led Wertheimer stream. The M stream originates with a the most important functions performed (3) to insist on the validity of motion as a morphologically distinct subset (Pa) of by the primate . Motion per- sensation unto itself, not uniquely reduc- retinal ganglion cells, which project ception has been a subject of study for ible to other sensory events-a view that uniquely to the magnocellular laminae of many decades in psychophysical experi- has carried us through the present day. the thalamic lateral geniculate nucleus ments. More recently, it has also been a Although the primacy of the motion (LGN). The magnocellular output as- central focus of both neurobiological and sense was thus commonly accepted by the cends cortically to area Vl and continues computational studies. In consequence, it time neurophysiological techniques be- through several successive cortical visual is a system for which we now have a unique came readily available, the foundation for areas including, most notably, the middle and critical convergence of information. our present understanding was set with temporal visual area, which is commonly the an neural known as area MT (or V5). Indeed, visual motion processing is argu- discovery of explicit repre- First described in the Dub- ably the most well-understood sensory sentation of motion in the form of cells macaque by for in ner and Zeki (8), area MT is a small subsystem in the primate brain. that exhibit selectivity the direction zone that By comparison with the state of our which an image feature moves across the visuotopically organized (9-12) knowledge, the scope of this review is retina (Fig. 1). In primates, this property fairly circumscribed. For example we will of directional selectivity is first seen at the Abbreviations: 1-, 2-, and 3-D, one-, two-, and three-dimensional; area MT, middle temporal consider only the primate visual system level ofprimaryvisual cortex (area V1) (4, visual area; IOC, intersection of constraints; and, specifically, the geniculo-striate- 5). These motion-sensitive Vl F/B, foreground/background assignment. extrastriate pathways. Our discussion will constitute part of a larger functional sub- *To whom reprint requests should be ad- also be limited to but a small sweep of the system, as evinced by the fact that they lie dressed. 2433 Downloaded by guest on September 30, 2021 2434 Review: Albright and Stoner Proc. NatL Acad ScL USA 92 (1995)

lies along the posterior extent of the lower parsing retinal image features into the related functions listed above has tradi- bank of the superior temporal sulcus (Fig. objects that are present in the visual scene tionally been cast in terms of the proper- 2) and is a recipient of ascending projec- ("image segmentation") (18,23,24). One of ties of retinal image features. As the pat- tions from areas Vi, V2, and V3 (12, 13). the most important and intuitively obvious tern of light falling upon the retina Directional selectivity is by far the most goals of the motion processing apparatus is, undergoes displacement in space and distinctive feature of MT and the property of course, to recover and represent the time, the task of the motion detector is to that has drawn persistent interest since its trajectories of real-world objects in a form detect the spatio-temporal continuity of behavioral responses to discovery. By striking contrast to sur- that will facilitate image features. Posed in this manner, those movements (25). It is this latter func- rounding cortical areas, including others there are really two facets to the problem. tion that we will focus on in the remainder of the M stream, some 95% of MT neu- The first pertains to the definition of of our review. rons exhibit marked directional selectivity "features" and the second addresses the The problem of representing object mo- "matching" of the simplest form (i.e., selectivity for tion might, upon first consideration, seem mechanism responsible for motion along a linear path in the frontal to be among the most direct and accessible these features in space-time. plane) in combination with a conspicuous of motion-related functions, possibly be- Input to Motion Detectors: Retinal Im- absence of selectivity for form or color cause it is a salient, common, and appar- age Cues and Motion Correspondence (10, 14, 15). It is the preeminence of these ently effortless part of conscious experi- Tokens. What is it, precisely, that is properties that led Zeki and others to the ence. As we shall see, however, it is nor- matched over space and time? The ques- supposition that area MT is a principal mally a difficult and underconstrained tion highlights the fact that motion detec- component of the neural apparatus for computation and one that is necessarily tion is, by its very nature, dependent upon motion processing. dependent upon other sources of informa- prior or coincident detection of contrast in tion about the structure of the visual the retinal image. The contrast elements Functions of the Motion Processing scene. This problem of motion processing potentially available as matching primi- Apparatus can be conveniently divided into two sub- tives are of various physical types (lumi- problems, which we will term "detection" nance, chrominance, texture, etc.) and to Visual motion is a source of information and "interpretation." Solutions these levels of abstraction (e.g., local gradients, problems are thought to be computed at that can serve many functions for a be- edges, and more complex object charac- having animal. These functions include (i) sequential stages in a hierarchical process, teristics). Historically, most attempts to and they have been tentatively identified identify motion detection mechanisms establishing the three-dimensional (3-D) with specific neuronal populations in the structure of a visual scene (16-18), (ii) primate brain. have blurred issues of matching primitives guiding balance and postural control (19, with the matching operation itself. As a 20), (iii) estimating the direction of the Motion Detection result, many of the subtypes of motion observer's own path of motion (21) and detectors that have been proposed to exist his/her time to collision with objects or The problem of visual motion detection, (see below) are defined by contrast type or surfaces in the environment (22), and (iv) which is common to all of the motion- level of abstraction rather than on the basis of the matching algorithm. MST We will first consider more closely the A level of abstraction at which spatio- temporal matches are made. At the sim- plest level, matching could occur prior to the encoding of any of the complex fea- tures-such as edges-that characterize our perception of natural scenes. The fact that a strong motion percept can arise from stimuli lacking any coherent spatial structure [such as random-dot cinemato- grams (26)] appears to support the idea (24, 27). On the other hand, the demon- stration that motion is more likely to be B seen between spatio-temporally displaced edges of the same orientation than be- tween edges of differing orientation (28) suggests that more complex image "to- kens" are matched by the motion system. The apparent contradiction between these psychophysical findings has traditionally been held as evidence for two motion detection subsystems: (i) "short-range" system that simply detects spatio-temporal FIG. 2. (A) Lateral view of macaque brain showing location of striate cortex (V1) and some continuity of local luminant energy over extrastriate visual areas. Sulci have been partially opened (shaded regions). Indicated borders of small displacements and (ii) a "long- visual areas (dashed lines) are approximate. EC, external calcarine sulcus; IO, inferior occipital range" system that is sensitive to higher- sulcus; IP, intraparietal sulcus; LA, lateral sulcus; LU, lunate sulcus; PO, parieto-occipital sulcus; order stimulus attributes and operates ST, superior temporal sulcus. (B) Anatomical connectivity diagram emphasizing hierarchical over a larger spatio-temporal range. organization and parallel processing streams along the geniculo-striate-extrastriate pathway. The neurophysiological data bearing on Except where indicated by arrows, connections are known to be reciprocal. Not all known the issue of matching primitives are less components of magnocellular (unshaded) and parvocellular (shaded) pathways are shown. RGC, layer; LGN, lateral geniculate nucleus of the thalamus; M, magnocellular dichotomous. For most directionally se- subdivisions; PI and P2, parvocellular subdivisions; MT, middle temporal; MST, medial superior lective neurons, these primitives are use- temporal; FST, fundus superior temporal; PP, posterior parietal cortex; VIP, ventral intraparietal; fully defined by the other stimulus at- STP, superior temporal polysensory. Reprinted with permission from ref. 6 (copyright 1993, tributes for which cells express sensitivity. Elsevier Science, Amsterdam). In area Vl, this means that the matching Downloaded by guest on September 30, 2021 Review: Albright and Stoner Proc. Natl. Acad. Sci. USA 92 (1995) 2435

token typically consists of a one-dimen- abstraction that happens to define a mov- Input sional (1-D) image contour of a particular ing image feature. It thus seems plausible receptive I0I2 3 orientation (4, 5). These early motion that the neural circuitry underlying mo- fields: detectors thus employ a set of matching tion detection may-like perception and primitives that is richer than that com- the responses of some neurons-be form- monly associated with the "low-level" cue invariant. While verification of this Neural short-range system-a conclusion that prediction must await more detailed anal- mechanism: seems at variance with the traditional yses of neuronal circuitry, the computa- short-range/long-range dichotomy. Addi- tional principles outlined below should be tional cause to question this view comes considered in this broad context. from recent psychophysical experiments Several detailed models have been pro- Output (directionally _ r1 that fail to find any sharp distinction be- posed to account for the computations selective) 1 > 30 tween the matching primitives important carried out by motion detectors (e.g., refs. receptive L 9* -1 --i for small vs. large spatio-temporal dis- 42-47). The earliest complete model was fields: placements (29). developed nearly 40 years ago by Hassen- FIG. 3. Simple spatio-temporal comparator Cavanagh and Mather (29) offer an stein and Reichardt (42) to explain the for directional motion detection by neurons, alternative dichotomy of motion-detec- behavioral sensitivity to visual motion ex- based upon computational principles of Has- tion subsystems founded on the type of hibited by winged arthropods. According senstein and Reichardt (42) and neurophysio- image contrast (rather than feature com- to this "correlation model," motion is logical observations of Barlow and Levick (48). plexity per se) that defines a moving fea- simply sensed by computing the product The neural circuit (Middle) consists of a repeat- ture. Specifically, they suggest a division of the outputs of two receptors possessing ing sequence of "input" neurons 1-3 with spa- tially adjacent receptive fields (Top). Each in- between mechanisms sensitive to "first- luminance sensitivity profiles that are dis- put neuron possesses both an excitatory con- order" vs. "second-order" image varia- placed in space and time. Subsequent neu- nection (+) that extends directly and a time- tion. First-order refers to variation along rophysiological studies of motion detec- delayed (At) inhibitory connection (-) that the primary dimensions of luminance or tion in the rabbit retina (48) demonstrated extends laterally and asymmetrically to neurons color, whereas second-order image con- the existence of a similar correlation-type at the motion detection stage (Ml, M2). These trast includes variation along "secondary" operation, based upon delayed inhibition motion-sensitive neurons operate as NOT- dimensions, such as texture, binocular dis- extending laterally from one receptor's AND gates, and the lateral temporally delayed parity, or luminance contrast modulation. output to that of a spatially adjacent re- connection confers upon them the property of Support for this dichotomy comes from ceptor (Fig. 3). When a visual stimulus directional selectivity. Thus, leftward motion (receptive field stimulation sequence 3,2,1) psychophysical (e.g., ref. 30) and neuro- moves in the appropriate direction, the yields a direct excitatory input to each motion physiological (e.g., ref. 31) evidence sug- flow of inhibition through the circuit par- detector in turn. Under these conditions, the gesting that the motion of first-order im- allels the spatial displacement of the stim- lateral inhibitory connections have no func- age features is detected independently of ulus, thereby nulling any potential recep- tional effect. By contrast, rightward motion that for features defined by second-order tor output. For other directions, the inhi- triggers an inhibitory signal that propagates cues. bition is less marked or nonexistent. The laterally in parallel with the displacement of the Regardless ofwhether different types of result is neuronal activity that varies as a input activity. Each motion detector thus re- image variation are extracted by indepen- function of direction of stimulus motion sponds more strongly to leftward than to right- dent mechanisms, we predict that at some (i.e., directional selectivity). Similar evi- ward motion across its receptive field (Bottom). level there should exist motion-sensitive dence applies to the mechanisms for mo- neurons that represent the displacement tion detection in other mammalian sys- perpendicular to the preferred orienta- of a feature, whether it is defined by color, tems (e.g., ref. 49). tion. All other image motion is effectively luminance, or texture. This prediction of "Motion energy" models (44, 45) em- invisible. Because it is 1-D, the output form-cue invariance (32, 33) is based on phasize the processing of motion in spatio- provided by each motion detector is thus the observation that the velocity of an temporal frequency domain and offer an ambiguous with respect to the 2-D mo- object is physically unrelated to the cue alternative to correlation models. Al- tions of the retinal image. A secondary that distinguishes that object in the retinal though the computations that underlie processing stage is needed to integrate image. Perceptual form-cue invariance for motion energy and correlation models are 1-D motion signals and thereby interpret moving features has been demonstrated formally equivalent to one another (43), the visual scene events that gave rise to for luminance and chrominance (34, 35) as the two strategies do suggest somewhat them. well as a variety of second-order cues, such different neural implementations. The re- as texture (36, 37) and stereoscopic dis- sults of recent neurophysiological experi- Motion Interpretation parity (38). A potential neuronal substrate ments appear to support the motion en- for form-cue invariant perception has ergy form of implementation (50). The problem of visual motion interpreta- been found in area MT. In addition to its 1-D vs. Two-Dimensional (2-D) Motion tion is that ofrecovering the trajectories of well-known sensitivity to conventional lu- Signals. The retinal surface is 2-D. Ac- visual scene elements (objects) from the minance-defined motion (10, 14, 15), cordingly, the trajectory of a moving ret- retinal image information provided by the many cells in this cortical area also re- inal image (cast by a moving object) re- motion detection stage-the sort of rep- spond to the motion of chromatically de- quires a 2-D vector for adequate descrip- resentation that both forms our conscious fined stimuli (39-41) as well as motion of tion. The motion signals rendered by the experience of motion and enables us to second-order cues (32). The functional detection processes we have described are, interact effectively with a dynamic envi- utility of such an arrangement is quite however, inherently 1-D. Specifically, the ronment. While we have seen that it is a clear: the system gains more uniform sen- Vl neurons that are thought to constitute computationally straightforward matter sitivity to motion over the broad spectrum the motion-detection stage are individu- to detect the displacement of retinal im- of cues that are characteristic of our visual ally sensitive to image-contrast gradients age features, the subsequent representa- world. along a single dimension-i.e., they are tion of real object motions is greatly in- Computations Underlying Motion De- orientation selective. Each neuron of this convenienced by the fact that such mo- tection. Spatio-temporal comparison is sort can only detect image motion along tions are not uniquely evident from the fundamental to the detection of motion the dimension for which it possesses con- dynamic patterns of light in the retinal whatever the type of image cue or level of trast gradient sensitivity-along the axis image. The reason for this is quite clear Downloaded by guest on September 30, 2021 2436 Review: Albright and Stoner Proc. NatL Acad Sci. USA 92 (1995) and not unexpected: Determination of monkey, the occluding branch, and the ble for selective integration of motion object motion in a 3-D world from a 2-D shadow. As we have seen, however, each signals. projected image is a facet of the inverse primary motion detector merely renders a The Mechanics of Integration. Compu- problem of optics, for which no unique signal indicating the presence of a moving tational considerations. Consider the sim- solution exists (51). feature (an oriented contour, for exam- ple case of a single moving object with a In principle, knowledge of the physical ple) at a particular location in the retinal many-faceted boundary contour (Fig. 6). rules by which reflected lights mix in the image. Moving objects are typically, as in According to the preceding arguments, formation of the retinal image should this example, formed from an amalgam of early detectors represent, in a piece-wise facilitate recovery of the scene elements such motion correspondence tokens, fashion, the motion of retinal features that gave rise to the image. That the which implies that motion interpretation along limited regions of the contour. Be- primate visual system has "knowledge" of depends upon integration of primary mo- cause the motion signal rendered by each such rules is implied by the very fact that tion signals (55). Moreover, the integra- is inherently 1-D, the true 2-D trajectory we are able to segment complex retinal tion process must be selective, utilizing of the pattern is only knowable by inte- images into component objects. Retinal image segmentation cues to link retinal grating information from multiple detec- image features that are, according to these motion signals according to the object of tors. Formally speaking, true pattern mo- rules, indicative of specific object interre- origin (33). tion can be determined from the intersec- lationships are appropriately termed im- The mechanics of integration and the tion of constraints provided by two or age segmentation cues. While the impor- means bywhich selectivity is imposed have more 1-D motion signals (Fig. 6)-that is, tance of such cues for object recognition become amenable to study using visual the single 2-D direction and speed that is has long been appreciated (52), recent stimuli known as "plaid patterns" (56, 57), consistent with the evidence provided by evidence suggests that they are also crucial which are formed by spatial superposition each 1-D signal (47, 57). Although the to the interpretation of motion (33, 52- of two drifting periodic gratings (Fig. 5). mechanism remains to be determined, 54). Plaids provide a simple laboratory ana- one can readily envision a circuit whereby As illustration ofthis point, consider the logue to natural conditions that give rise a neuron representing a specific 2-D pat- image in Fig. 4. That is, of course, a single to overlapping moving contours in the tern trajectory is supported by an ensem- frame from a dynamic sequence, which is retinal image; as happens, for example, ble of neurons providing appropriate 1-D meant to convey some of the complexities when one moving object passes in front of motion signals (15, 55, 57). Accordingly, of natural scenes. The goal of this motion another. Their utility derives from the fact simultaneous activation of two or more subsystem is to represent the independent that under some conditions the two com- inputs would engage a unique pattern trajectories of the scene elements: the ponent gratings are seen to drift "non- motion detector. coherently" past one another, whereas Psychophysics. The hypothesis that the under other conditions, the gratings form integration mechanism actually exploits a rigid 2-D pattern that moves "coherent- the IOC rule has been lent support by ly" in a single direction. These robust several psychophysical studies employing perceptual phenomena thus serve as a plaid patterns (e.g., refs. 57-59). Wilson model for the selective integration of mo- and colleagues (e.g., ref. 60) have ob- tion signals. By examining the character of served, however, that some plaid config- the coherent motion percept, it may be urations ("type II" plaids, which unlike possible to infer something about the in- the example in Fig. 5 possess component tegration mechanism itself. Likewise, by directions that both lie to one side of the exploring the image conditions that lead pattern direction) lead to a percept that to the vastly dissimilar coherent and non- differs from the IOC prediction but is coherent motion percepts, it may be pos- approximated by a vector-sum of 1-D sible to discover the mechanisms that gov- component motions (see ref. 61 for re- ern selectivity. Finally, by employing plaid view).Apriori, the general utility ofvector stimuli in neurophysiological experi- summation as an approach to the motion ments, it should be possible to reveal the signal integration problem would appear neuronal structures and events responsi- questionable, as (in contrast to IOC) it c a b 4 Pattern motion + OR FIG. 4. (Top) Single frame from dynamic d sequence illustrating complexity of motion in- Vr terpretation in natural scenes. (Middle) The motion detection stage extracts local 1-D com- ponents of velocity for retinal image features. Individually, these motion measurements are ambiguous with respect to motions of the ob- jects from which they arise. (Bottom) The task Component motion of the motion interpretation stage is to selec- tively combine detection-stage measurements FIG. 5. Moving plaid patterns are produced by superimposition of two drifting periodic according to the object of origin. The desired gratings (a and b) (56, 57). The resultant percept is either that of a coherently moving 2-D pattern outcome is a representation of the independent (c) or two 1-D gratings sliding past one another (d), depending upon a variety of stimulus motions of objects. parameters. Downloaded by guest on September 30, 2021 Review: Albright and Stoner Proc. NatL Acad Sci. USA 92 (1995) 2437

respond best when the entire plaid pattern our experience of complex dynamic moves in the preferred direction reflect scenes, the underlying mechanism neces- integration of component motions and are sarily exploits retinal image segmentation termed "pattern neurons." cues, which by definition reflect the spa- As anticipated, neurons in area Vl are tial interrelationships (adjacency or super- of component type by this criterion (55), position vs. contiguity) between surfaces as are the majority of neurons ("60%) in in the visual scene (33). By simulating the area MT. A subset of MT neurons optical conditions of a ubiquitous real- (-25%), however, are of the more inter- world scene, two overlapping moving sur- esting pattern type (Fig. 7; refs. 55, 63, and faces, moving plaid patterns have afforded 64). Neurophysiological analysis has thus a simple and direct means to evaluate the FIG. 6. Intersection of constraints (IOC) identified specific neural events that cor- expected role of image segmentation cues solution to motion signal integration. When respond to the outcome of the predicted in motion integration. examined locally (as implied by circles), the computational process and may underlie As indicated previously (Fig. 5), there apparent motion of a 1-D image feature em- the perceptual integration of motion sig- are conditions under which the compo- bedded in a 2-D pattern is inherently ambigu- nals. nents of a plaid pattern are perceived to ous but physically constrained to a "family" of 2-D image motions (as suggested by the vectors Selectivity of Integration. Computa- slide noncoherently across one another. that terminate on the broken constraint lines). tional considerations. We come closer to For example, noncoherence becomes The IOC from two or more 1-D motion mea- the problem of motion interpretation more likely if the components differ sig- surements defines the 2-D velocity of the pat- when we evaluate the complexities of se- nificantly along a particular stimulus di- tern (bold rightward arrow). lective motion signal integration. Con- mension, such as luminance contrast, spa- sider the case in which two moving objects tial frequency (55, 57), or color (65, 66). does not guarantee a veridical solution. have trajectories such that their projec- These results traditionally have been in- The nonveridical percept associated with tions overlap in the retinal image. Here, terpreted as manifestations of channel- type II plaids may be rare, however, when contrary to the single object case, motion specific integration mechanisms, in which viewing natural scenes because of the interpretation depends not simply upon dissimilar components are processed by more uniform distributions of compo- integrating motion signals. Rather, it is independent neural channels (55). While nents possessed by most natural moving also conditional upon independently inte- such proposals are valid and focus our objects. Moreover, as Wilson (61) and grating only those signals common to each attention on the mechanisms potentially others (62) have argued, this limitation object (such that, for example, motion responsible for selective integration, they can be corrected by utilizing information signals arising from the shadow in Fig. 4 generally fail to address the functional given by second-order motion cues are not pooled with those arising from the utility of this process. We therefore advo- present in plaid patterns. In any event, the monkey). Posed as a generic matter of cate a complementary view in which se- psychophysical evidence accumulated combining retinal motion signals, the lective integration is interpreted in a thus far can arguably be accommodated by problem has no unique solution. This not- broad functional light: featural dissimilar- integration schemes based on either IOC withstanding, the primate visual system ity may be regarded as a source of infor- or vector-sum architectures. It thus seems clearly does pretty well. To achieve the mation that fosters image segmentation, that final resolution of this controversy selective motion integration suggested by with the corresponding effects on motion must await anatomical or physiological studies of functional connectivity between motion processing stages. GRATING |COMPONENTCOMPREDICTIONS: PLA--PLAID . We have seen that the TUNINGGRANINGCUVSCURVE PATTERN TUTUNING--CURVESIDGCRE motion detection stage can be identified with directionally selective neurons in area Vi. A goal of recent neurophysiolog- ical experiments has been to identify the level of the cortical hierarchy at which the integration of these primary motion sig- nals takes place (55, 63, 64). It was pre- dicted that this secondary stage would be recognizable by the emergence of neurons that exhibit selectivity for the direction of motion of 2-D patterns rather than for the motion of the oriented features that make up those patterns (55). This prediction can be tested by comparing the directional tuning elicited by a 1-D stimulus, such as a single grating, with that elicited by a plaid pattern, composed from two such gratings. The preferred direction of mo- tion is determined for each neuron in a FIG. 7. Data from two MT neurons representing "component" (Upper) and "pattern" (Lower) conventional fashion by using the single stages of motion processing. Direction tuning curves were acquired using a drifting sine-wave grating drifting grating. When stimulated with (Left) or a perceptually coherent plaid pattern (Right). Responses elicited by each stimulus type, plaid patterns, neurons that respond best moving in each of 16 directions, are plotted in polar format. The radial axis represents response per second the angle represents direction of motion, and the small when either component moves in the pre- amplitude [spikes (s/s)], polar central circle represents spontaneous activity level. Both cells exhibit a single peak in grating tuning are definition ferred direction by preinte- curve. From these curves, responses to plaid patterns were predicted (55) (Center). Component gration and are termed "component neu- predictions reflect sensitivity to both oriented gratings in the plaid pattern. Pattern predictions reflect rons." This is the expected behavior of the sensitivity to composite appearance of the plaid. By definition, behavior of component neuron motion detection stage, which we consid- conforms to component prediction, while that of pattern neuron conforms to pattern prediction. ered above. By contrast, neurons that Adapted from ref. 63; printed with permission (copyright 1989, Springer, Berlin). Downloaded by guest on September 30, 2021 2438 Review: Albright and Stoner Proc. NatL AcatL ScL USA 92 (1995) coherence born out of this process (33). latter conditions of transparency are char- illustrated in Fig. 9a is only physically con- This "image segmentation hypothesis" acteristic of shadows. Because the reflec- sistent with transparency if region A repre- has a far-reaching implications, which can tances of both surfaces can vary indepen- sents the intersection of two foreground be explored empirically through the intro- dently, as can the foreground attenuation features, and region D represents the back- duction of image segmentation cues to factor, there exists a broad range of lumi- ground across which the features move. If, plaid patterns. nance configurations compatible with on the other hand, we were told that region Psychophysics. There are many manip- transparency/occlusion. The primate vi- D represents the foreground intersection ulations that can be used to elicit image sual system typically recognizes-in- and region A represents the background, we segmentation. Ofparticular interest in this stantly and effortlessly-configurations would be forced to conclude that the pattern context are cues that promote perceptual within this range as having resulted from is incompatible with transparency/occlu- depth ordering of spatially overlapping the depth-ordering conditions that would sion because the intersection luminance is features. Perhaps the most obvious cue in normally give rise to them. brighter than is physically possible. this class is binocular disparity. As pre- Stoner et al. (53) incorporated these Whether region A or D happens to be dicted by the image segmentation hypoth- rules for transparency/occlusion in the foreground in the real world is, of course, esis, plaids composed of two gratings that construction of plaid patterns (Fig. 8). By impossible to determine from these reti- lie in different stereoscopically defined varying the luminance of a single (repeat- nal images. Fortunately the primate visual depth planes are perceived to move non- ing) subregion of the plaid, it was possible system furnishes default perceptual inter- coherently (54, 67). to form retinal image conditions that were pretations by using other sources of infor- It is also well known that depth ordering either physically consistent or inconsistent, mation and probabilistic rules. For exam- can be elicited by luminance-based cues with one component grating overlapping ple, smaller image features are generally for transparency and opaque occlusion. another. As predicted, human subjects were interpreted (all else being equal) as fore- The "rules" by which these cues operate very likely to perceive noncoherent motion ground (23). Thus, the default interpreta- have been extensively documented (e.g., of the individual components when the tion for the image in Fig. 9a holds region ref. 68), and they are largely dictated by plaid was configured for transparency or A to be foreground. By simply reversing the physics of light mixture in optical occlusion. By contrast, coherent motion of the relative sizes of regions A and D, the projection. Briefly, the intensity of light the plaid became the dominant percept opposite percept ensues (Fig. 9b). Stoner corresponding to the region at which two when the luminance relationships were and Albright (unpublished data) used surfaces overlap will normally be equal to physically incompatible with transparency/ these and other means to manipulate F/B the sum of (i) light reflected directly from occlusion. assignment in plaid patterns. Because per- the foreground surface and (ii) light re- Stoner and Albright (unpublished data) ceptual transparency depends upon F/B, flected from the background surface and extended this result with a demonstration they predicted that motion coherence subsequently attenuated as it passes of the additional constraints on motion would be markedly influenced by F/B through the foreground surface. If, for coherence contributed by pictorial cues manipulations. In accordance with these example, foreground attenuation is com- for image segmentation. Specifically, they predictions, human subjects were found plete, the background luminance contri- exploited the fact that perceptual trans- most likely to report noncoherent motion bution is nil and the "intersection lumi- parency/occlusion, as manipulated by lu- when the presumptive foreground inter- nance" is simply equal to the foreground minance relationships, is inherently de- section was compatible with transparen- luminance. This is the unique case of pendent upon establishing which image cy/occlusion. occlusion. Alternatively, the foreground features are interpreted by the observer as These results and others of a similar surface may attenuate incompletely and foreground and which are interpreted as nature (54, 66, 71, 72) are consistent with possess no reflectance of its own. The background. For example, the plaid unit the proposal that image segmentation cues influence the perceptual integration of motion signals. The functional utility (indeed, necessity) of such a scheme is clear: Survival in a dynamic environment is critically dependent upon the ability to distinguish motions of occlusive or trans- parent objects, as well as shadows, from the motions of the surfaces over which they move. However, in the light of cur- rent theories of motion detection, these c X d e transparency phenomena are truly puz- zling, and the mechanism behind them has become a subject of much debate. Mechanism. One type of mechanism that has merited serious consideration is derived from the fact that transparency manipulations introduce moving lumi- t~~~Da nance gradients to which primary motion FIG. 8. Procedures for creating transparent plaids are derived from rules of light mixture in detectors are known to be sensitive (refs. formation of retinal image (68). (Upper) Each plaid (a) can be viewed as a tesselated image 62 and 73; unpublished data). It is thus composed of four repeating subregions (b) identified as A, B, C, and D. Region D is normally conceivable that transparent plaid pat- seen as background (because of its larger size). Regions B and C are seen as narrow overlapping terns simply elicit greater activity from surfaces and remaining region D as their intersection. Perceptual transparency can be manipu- primary motion detectors tuned to the lated by varying the luminance of region A, while luminances of regions B, C, and D remain pattern direction than do nontransparent constant. (Lower) Icons depict luminance configuration for three examples, along with an if one accepts the existence indication of dominant percept. For one of these conditions (d), luminance of region A is chosen plaids. Indeed, to be consistent with transparency, yielding a percept of noncoherent motion. For the other two of certain nonlinearities in the encoding of conditions illustrated, luminance of region A is either too dark (c) or too bright (e) to be consistent retinal image contrast (70), it can be with transparency. These nontransparent plaids generally elicit a percept of coherent pattern shown that the magnitude of the lumi- motion. Adapted from ref. 64; printed with permission (copyright 1992, Macmillan, London). nance gradient moving in the pattern di- Downloaded by guest on September 30, 2021 Review: Albright and Stoner Proc. NatL Acad. ScL USA 92 (1995) 2439 perceptual assignment of the component neuron are shown in Fig. 10. When stim- intersection feature (region A in Fig. 8a) ulated using nontransparent/perceptually as either "intrinsic" (a variation in surface coherent plaids, responses were strongest D _ reflectance) or "extrinsic" (an incidental when the plaid moved in the cell's pre- consequence of overlap in the formation ferred direction. As described above (Fig. of the retinal image) to the plaid pattern 7Lower), this style of selectivity is char- (71). While this conclusion carries impor- acteristic of pattern-type neurons. When tant functional implications, it begs the stimulated using the transparent and per- question of mechanism. One promising ceptually noncoherent plaid, however, this approach to this problem involves the use cell's behavior underwent a marked trans- of a "feature selection" module (74). The formation: The pattern response dropped function of such a mechanism is to choose while component responses became ele- from among the available motion signals vated. The resultant bilobed directional FIG. 9. Schematic depiction of one of several those that are most likely to be indicative tuning curve is characteristic of compo- means used by Stoner and Albright (unpublished data) to manipulate percept of foreground and of true pattern motion. When imple- nent type neurons (Fig. 7 Upper). In other background in plaid patterns. Plaids were con- mented as a neural network, this selective words, when the stimulus was configured structed as tessellatedversions ofthese basic units faculty can be acquired through training, to elicit a percept of coherent pattern (see Fig. 8). (a Upper) The largest region D is and it effectively enables the motion inte- motion, the cell appeared to represent typically seen as background, and with that con- gration stage to distinguish between mo- that motion. By contrast, when the stim- straint, luminance configuration is compatible tion signals arising from intrinsic vs. ex- ulus was designed to elicit a percept of two with transparency. (a Lower) As predicted, hu- trinsic sources (provided that the model components sliding past one another, the man subjects report a percept of component has "learned" the relevant set of "rules"). cell appeared to represent those indepen- motion. (b Upper) By simply reversing the relative sizes of regions A and D, while leaving the Indeed, simulations of perceptual motion dent motions. Similar behavior was ob- luminance of each region unaltered, the domi- coherence performed with a feature se- served for the majority of MT neurons nant percept becomes such that region D is now lection model (74) yield results that are studied. seen as foreground. With that constraint, region strikingly consistent with the original The plasticity expressed by such neu- D is not consistent with transparency (it is too transparency results of Stoner et al. (53). rons stands as a potential neural substrate bright). (b Lower) Predictably, subjects report a Whether this approach can be made to for the selective quality of the perceptual percept of 2-D pattern motion. These fore- generalize across the range of image fac- experience. The circuitry responsible for ground/background (F/B) manipulations do not tors known to influence perceptual mo- this neuronal phenomenon is yet un- significantly alter the relative magnitude of lumi- tion coherence remains to be seen. nance gradients moving in pattern and compo- known. Possibilities include enhanced syn- nent directions (70). Neurophysiology. As we have seen, stud- aptic efficacy for inputs to the pattern ies of cellular response properties have motion stage that reflect motion of reli- rection is approximately minimal when revealed a population of neurons in cor- able or intrinsic features. Such proposals the plaid in Fig. 8a is configured for tical visual area MT that may be respon- appear to be compatible with the various transparency (refs. 62 and 73; unpublished sible for perceptual motion coherence (55, mechanisms that have been proposed for data). In other words, the strength of the 63). Because these pattern-type neurons integration itself (i.e., IOC, vector sum- coherent pattern motion percept can be are believed to play a significant role in mation). They are, moreover, in line with roughly accounted for by the magnitude the motion signal integration process, the feature selection model mentioned of the luminance gradient moving in the Stoner and Albright (64) hypothesized above (74), and they could be imple- pattern direction. that neural responsivity would be altered mented by altering the temporal syn- The computational appeal of this sim- by the same stimulus attributes known to chrony of active component inputs (75). ple explanation is undercut, however, by influence the selectivity of perceptual mo- Much more neurophysiological data are the lack of generality it affords. Consider, tion coherence. To test this hypothesis, needed, however, to weigh these various for example, the aforementioned interac- perceptual motion coherence was manip- possibilities. tive effects of F/B and transparency on ulated in a manner identical to the earlier perceptual motion coherence. In this case, psychophysical study (53), such that plaid Concluding Remarks altering the relative sizes of pattern sub- patterns were either compatible or incom- regions can trigger a radical change in the patible with transparency. Data obtained Nearly 50 years ago, and not long after the observer's state of perceptual coherence. from a typical directionally selective MT advent of neurophysiological techniques, However, the relative size manipulations do not significantly alter the fraction of 90 U) luminant energy drifting in the pattern L_ 4 80- direction (ref. 62; unpublished data). In Nontransparent ': other words, the result is incompatible (D 30 - 70- with the predictions of the luminance gra- a) 20 60 dient hypothesis. Similarly, the reported ~j10- Transparent (I, 50- conjoint influences of binocular disparity o 0 tutu-1003-2 and luminance cues for transparency (54) -10 40 expose a dissociation between the pres- cc:a) O 90 t80 270 360 -90 0 ence of luminance gradients and motion Grating direction, PEattern direction, Nontransparent coherence. Finally, and perhaps most re- degrees degrees vealingly, perceptual transparency and motion coherence also parallel spontane- FIG. 10. Neural correlates of perceptual motion signal integration in cortical visual area MT. (Left) Directional tuning for single drifting grating. (Center and Right) Responses to coherent ous ("metastable") reversals of F/B that (nontransparent) and noncoherent (transparent) plaids. When stimulated with coherent plaid occur in the absence of any modifications patterns, response (-, A) was maximal when the pattern moved in the neuron's preferred direction to the retinal stimulus (unpublished data). (00 in Center). When stimulated with noncoherent plaid patterns, however, responses (n) were It would thus seem that the only com- maximal when either component moved in the preferred direction (±67.5° in Center). Adapted mon determinant of motion coherence is from ref. 64; printed with permission (copyright 1992, Macmillan, London). Downloaded by guest on September 30, 2021 2440 Review: Albright and Stoner Proc. NatL Acad Sci. USA 92 (1995) Donald Hebb (69) observed that "the 1. Addams, R. (1834) Philos. Mag. 5, 373. 39. Dobkins, K. R. & Albright, T. D. (1994) J. Neu- and chart 2. Exner, S. (1875) Arch. Ges. Physiol. 11, 403-432. rosci. 14, 4854-4870. psychologist neurophysiologist 3. Wertheimer, M. (1961) in Classics in Psychology, 40. Saito, H., Tanaka, K., Isono, H., Yasuda, M. & the same bay." While one sometimes has ed. Shipley, T. (Philosophical Library, New Mikami, A. (1989) Exp. Brain Res. 75, 1-14. the feeling that they sail silently past one York), pp. 1032-1089 (originally published in 41. Gegenfurtner, K. R., Kiper, D. C., Beusmans, another in the night, scarcely has Hebb's 1912). J. M. H., Carandini, M., Zaidi, Q. & Movshon, dictum proven more profitable than in the 4. Hubel, D. H. & Wiesel, T. N. (1968) J. Physiol. J. A. (1994) Visual Neurosci. 11, 455-466. (London) 195, 215-243. 42. Hassenstein, B. & Reichardt, W. (1956) Chlo- experimental study of the neural bases of 5. Schiller, P. H., Finlay, B. L. & Volman, S. F. rophanus Z. Naturforsch Teil B 11, 513-524. visual motion perception. As we have tried (1976) J. Neurophysiol. 39, 1288-1319. 43. Van Santen, J. P. H. & Sperling, G. (1985) J. Opt. 6. Albright, T. D. (1993) in Visual Motion and Its Soc. Am. A 2, 300-320. to convey, attempts to understand motion 44. Adelson, E. H. & Bergen, J. R. (1985)1J Opt. Soc. perception necessarily involve the insepa- Role in the Stabilization ofGaze, eds. Miles, F. A. Am. A 2, 284-299. rable issues of phenomenology, function, & Wallman, J. (Elsevier, Amsterdam), pp. 177- 45. Watson, A. B. & Ahumada, A. J. (1985) J. Opt. 201. Soc. Am. A 2, 322-342. and mechanism, and they benefit most 7. Livingstone, M. S. & Hubel, D. H. (1988) Science 46. Marr, D. C. & Ullman, S. (1981) Proc. R. Soc. when guided by the convergent lights of 240, 740-749. London B 211, 151-180. computation, physiology, anatomy, per- 8. Dubner, R. & Zeki, S. M. (1971) Brain Res. 35, 47. Fennema, C. L. & Thompson, W. B. (1979) Com- ception, and behavior. In measure of suc- 528-532. put. Graphics Image Process. 9, 301-315. 9. Allman, J. M. & Kaas, J. H. (1971) Brain Res. 31, 48. Barlow, H. B. & Levick, R. W. (1965) J Physiol. cess, we have identified computational 85-105. (London) 173, 477-504. goals of the system, linked them to specific 10. Zeki, S. M. (1974) J. Physiol. (London) 236, 49. Ganz, L. & Felder, L. (1984) J. Neurophysiol. 51, loci in a distributed and hierarchically 549-573. 294-324. 11. Gattass, R. & Gross, C. G. (1981) J. Neuro- 50. Emerson, R. C., Bergen, J. R. & Adelson, E. H. organized neural system, and documented physiol. 46, 621-638. (1992) Vision Res. 32, 203-218. their functional significance in a real- 12. VanEssen, D. C., Maunsell, J. H. R. & Bixby, 51. Horn, B. K. P. (1986) Robot Vision (MIT Press, world sensory/behavioral context. None- J. L. (1981) J. Comp. Neurol. 199, 293-326. Cambridge, MA). theless, it should be evident that many 13. Ungerleider, L. G. & Mishkin, M. (1979) J. 52. Nakayama, K. & Shimojo, S. (1990) Cold Spring Comp. Neurol. 188, 347-366. Harbor Symp. Quant. Biol. 55, 911-924. basic questions remain unanswered, pos- 14. Maunsell, J. H. R. & Van Essen, D. C. (1983) J. 53. Stoner, G. R., Albright, T. D. & Ramachandran, ing formidable technical and conceptual Neurophysiol. 49, 1127-1147. V. S. (1990) Nature (London) 344, 153-155. challenges to modern neuroscience. 15. Albright, T. D. (1984)J. Neurophysiol. 52, 1106- 54. Trueswell, J. C. & Hayhoe, M. M. (1993) Vision Finally, one of the most important les- 1130. Res. 33, 313-328. 16. von Helmholtz, H. (1924) Physiological Optics, 55. Movshon, J. A., Adelson, E. H., Gizzi, M. & sons from this analysis of visual motion Vol. 3 [English translation by Southall, J. P. C. Newsome, W. T. (1985) in Study Group on Pat- perception concerns the critical role of from (1909) Handbuch derPhysiologischen Optik tern Recognition Mechanisms, eds. Chagas, C., Hamburg, 3rd. Gattass, R. & Gross, C. G. (Pontifica Academia context in neurophysiological investiga- (Voss, Germany), Ed.]. Scientiarum, Vatican City), pp. 117-151. tions. To see the motion of a retinal image 17. Wallach, H. & O'Connell, D. N. (1953) J. Exp. 56. De Valois, K. K., De Valois, R. L. & Yund, E. W. feature (in the sense that we have consid- Psychol. 45, 205-217. (1979) J Physiol. (London) 291, 483-505. 18. Nakayama, K. & Loomis, J. M. (1974) Perception 57. Adelson, E. H. & Movshon, J. A. (1982) Nature ered it) means to interpret the real-world 3, 63-80. (London) 300, 523-525. events that gave rise to it, and that inter- 19. Gibson, J. J. (1950) The Perception of the Visual 58. Welch, L. (1989) Nature (London) 337, 734-736. pretation is quite impossible without con- World (Houghton Mifflin, Boston). 59. Burke, D. & Wenderoth, P. (1993) Vision Res. 33, text. The point hardly needs definition, as 20. Lee, D. N. (1980) Philos. Trans. R Soc. London 343-350. B 290, 169-179. 60. Ferrera, V. P. & Wilson, H. R. (1990) Vision Res. it has been a linchpin of experimental 21. Koenderink, J. J. (1986) Vision Res. 26, 161-180. 30, 273-287. psychology for well over 100 years (e.g., 22. Lee, D. N. (1976) Perception 5, 437-457. 61. Wilson, H. R. (1994) in Visual Detection of Mo- ref. 23). Nonetheless, for a variety of 23. Koffka, K. (1935) Principles ofGestalt Psychology tion, eds. Smith, A. T. & Snowden, R. J. (Aca- (Routledge & Kegan Paul, London). demic, London), in press. reasons, it is a point that has been largely 24. Braddick, 0. (1974) Vision Res. 14, 519-527. 62. Stoner, G. R. & Albright, T. D. (1994) in Visual neglected in neurophysiological studies of 25. Gibson, J. J. (1979) The Ecological Approach to Detection of Motion, eds. Smith, A. T. G. & vision. The reviewed studies of motion (Houghton Mifflin, Boston). Snowdon, R. J. (Academic, London), in press. signal integration suggest that neuronal as 26. Julesz, B. (1971) Foundations of Cyclopean Per- 63. Rodman, H. R. & Albright, T. D. (1989) Exp. ception (Univ. of Chicago Press, Chicago). Brain Res. 75, 53-64. well as perceptual response to a moving 27. Anstis, S. M. (1970) Vision Res. 10, 1411-1430. 64. Stoner, G. R. & Albright, T. D. (1992) Nature retinal image feature is critically depen- 28. Ullman, S. (1979) The Interpretation of Visual (London) 358, 412-414. dent upon contextual factors that influ- Motion (MIT Press, Cambridge, MA). 65. Krauskopf, J. & Farell, B. (1990) Nature (Lon- ence image interpretation and are unre- 29. Cavanagh, P. & Mather, G. (1989) Spatial Vision don) 348, 328-331. 4, 103-129. 66. Kooi, F. L., De Valois, K. K. & Switkes, E. (1992) lated to motionper se. Only by adopting a 30. Ledgeway, T. & Smith, A. T. (1994) Vision Res. Perception 21, 583-598. contextual approach will it be possible for 34, 2727-2740. 67. Adelson, E. H. & Movshon, J. A. (1984) J. Opt. 31. Zhou, Y. X. & Baker, C. L. Jr. (1993) Science Soc. Am. A 1, 1266. neurophysiologists and psychologists to 68. Metelli, F. (1974) Sci. Am. 230, 91-95. jointly chart the many murky waters of 261, 98-101. 69. Hebb, D. 0. (1949) The Organization ofBehavior perception. 32. Albright, T. D. (1992) Science 255, 1141-1143. (Wiley, New York). 33. Stoner, G. R. & Albright, T. D. (1993) J. Cognit. 70. MacLeod, D. I. A., Williams, D. R. & Makous, Neurosci. 5, 129-149. W. (1992) Vision Res. 32, 347-363. We thank J. Costanza for superb technical 34. Cavanagh, P. & Anstis, S. M. (1991) Vision Res. 71. Shimojo, S., Silverman, G. H. & Nakayama, K. assistance. G. Buracas, G. Carman, L. Croner, 31, 2109-2148. (1989) Vision Res. 29, 619-626. and R. Duncan provided helpful comments on 35. Dobkins, K. R. & Albright, T. D. (1993) Vision 72. Vallortigara, G. & Bressan, P. (1991) Vision Res. the manuscript. This work was supported by Res. 33, 1019-1036. 31, 1967-1978. grants from the National Institute of Mental 36. Ramachandran, V. S., Rao, V. M. & Vidyasagar, 73. Noest, A. J. & van den Berg, A. V. (1993) Spatial Health and the National Institute. G.R.S. T. R. (1973) Vision Res. 13, 1399-1401. Vision 7, 125-147. Eye 37. Chubb, C. & Sperling, G. (1988) J. Opt. Soc. Am. 74. Nowlan, S. J. & Sejnowski, T. J. (1994) J Neu- was partially supported by a Research Fellow- A 5, 1986-2006. rosci., in press. ship from the McDonnell-Pew Center for Cog- 38. Julesz, B. & Payne, R. A. (1968) Vision Res. 8, 75. Gray, C. M., Koenig, P., Engel, A. K. & Singer, nitive Neuroscience at San Diego. 433-444. W. (1989) Nature (London) 338, 334-337. Downloaded by guest on September 30, 2021