Figure-Ground Segregation, Computational Neural Models of

Arash Yazdanbakhsh (1) and Ennio Mingolla (2)
(1) Computational Neuroscience and Vision Lab, Department of Psychological and Brain Sciences, Graduate Program for Neuroscience (GPN), Center for Systems Neuroscience (CSN), Center for Research in Sensory Communications and Neural Technology (CReSCNT), Boston University, Boston, MA, USA
(2) Department of Communication Sciences and Disorders, Bouvé College of Health Sciences, Northeastern University, Boston, MA, USA

Definition

Figure-ground segregation refers to the capacity of a visual system to rapidly and reliably pick out, for greater visual analysis, attention, or awareness, or in preparation for motor action, a region of the visual field (figure) that is distinct from the combined areas of all the rest of the visual field (ground). The “figure” region is often, but not necessarily, bounded by a single closed visual contour, and the figure region is often said to “own” the boundary between it and any adjacent regions. The figure region is generally experienced as “in front of” (along lines of sight) surfaces of objects that are in the ground region. Once figure-ground segmentation is achieved, the figure region often delineates a zone of additional attentive processing.

Detailed Description

The Importance of Figure-Ground Segregation

Visual function in animals has broadly increased in complexity and competence across eons of evolution, with figure-ground perception being among the later and more intriguing achievements of relatively sophisticated visual species. The importance of figure-ground perception can be seen by considering its advantages to animals that have the capability, as compared to the visual limitations of animals whose visual systems do not support figure-ground perception (Land and Nilsson 2002). For example, the single-celled euglena can use its eyespot and flagellum to orient its locomotion in a light field, but it cannot take account of the edges of individual objects in the sense of figure-ground for purposes of steering. Arrays of visual receptors that sample different directions of the visual field, such as the compound eyes of insects or mammalian retinas, are needed for computing figure-ground relations. Successful figure-ground segregation can facilitate several subsequent visual tasks, such as object recognition or guidance of locomotion toward a goal or around an obstacle.

© Springer Science+Business Media, LLC, part of Springer Nature 2019 D. Jaeger, R. Jung (eds.), Encyclopedia of Computational Neuroscience, https://doi.org/10.1007/978-1-4614-7320-6_100660-1

Some species of animals can perform figure-ground perception only in the presence of visual motion, while others can also do so in static environments. It must be noted that even when the environment outside an animal is perfectly still, the animal itself may be making voluntary movements at a macroscale or involuntary movements (eye tremor or microsaccades) that can help to support figure-ground segregation.

Finally, as we will see in the case of human (or, more generally, primate) figure-ground perception, the space-variant arrangement of receptors on the retina and the associated “cortical magnification” that concentrates the brain’s processing resources on the central region of the visual field are major factors both in achieving and in exploiting figure-ground perception. Voluntary eye movements to an approximate centroid of a relatively homogeneous visual region can enhance the visual system’s capacity to delineate an associated figure-ground boundary and thus to perform figure-ground segregation at all. Conversely, as long as the eyes are foveating a “figure” region, a high proportion of cortical tissue is devoted to analysis of that region, courtesy of cortical magnification, thus facilitating object recognition or discrimination between two similar object categories. The ability to deploy foveal vision while performing figure-ground segregation can thus be viewed as a “force multiplier” that enables detailed scrutiny at relatively low metabolic cost (eye movements), rather than requiring a reorientation of the entire head or an actual approach of the entire body toward an environmental region.

Elementary Cues for Figure-Ground Segregation

In visual scenes with little structure, separation of figure from ground may appear to be an easy or even trivial task. For example, a small dark spot on an otherwise homogeneous light background is often said to “pop out” from its background in the context of a visual search task (e.g., “Find the dark spot,” Treisman and Gelade 1980). Such isolated, salient figural properties “call attention to themselves” and are effective as lures that attract saccades. A consensus of recent decades holds that regions of higher-than-average (relative to the rest of the scene) activation of certain featural qualities (e.g., of color, contrast, motion, or orientation) in visually topographic “feature maps” make such regions likely targets for attention or for overt eye movements (Wolfe and Horowitz 2017).

An important class of figure-ground phenomena occurs in cluttered visual scenes, where the achievement of figure-ground segregation can amount to the breaking of camouflage. Gibson et al. (1969) demonstrated vivid figure-ground segregation in random dot kinematograms, displays involving some combination of static and moving randomly distributed dots that are analogous in the time/motion domain to random dot stereograms. In random dot stereograms, the two eyes receive related, but different, images, typically where a region of dots (the figure) is displaced in one eye’s view relative to the other eye’s view, causing a relative disparity of stimulated retinal locations that is different from the disparities of other corresponding dots in the two eyes (which form the ground).

The importance of motion to figure-ground perception of ecological scenes has been evident since, at least, the seminal paper by Lettvin et al. (1959), “What the frog’s eye tells the frog’s brain,” whose authors noted that frogs of the studied species would starve in the presence of freshly killed flies spread on the ground before them. Nonetheless, humans and certain other species have the capacity to perform figure-ground segregation for many complex static scenes, including pictures, for which endogenous eye movements convey no additional information, as would be the case for views of actual three-dimensional environments.

The Scope of the Present Article

This article focuses on computational models of human or primate figure-ground segregation. We indicate whether models are intended to address moving or static scenes, or both, and whether they rely on binocular vision. Reasons for this focus include (1) that primate visual systems have the greatest complexity and competency among known biological systems and (2) that many machine vision systems are designed with the intent of matching if not exceeding human performance in important visual tasks. While many computer vision algorithms are designed without reference to animal examples, it makes sense to study natural visual systems for inspiration in those domains for which machine performance continues to lag, as is the case with figure-ground segregation in cluttered scenes.

What Is Border-Ownership and How Is It Related to Figure-Ground Perception?

A “figure” object and its borders occlude the objects of the background and their borders. Figure 1a shows the famous painting by Leonardo da Vinci: Mona Lisa occludes the trees, river, bushes, and other objects in the background. The foreground borders in Fig. 1a obviously “belong” to the object (i.e., Mona Lisa) in front, which is the figure. The shape of the borders is often informative about the identity of the depicted object (a human), which is said to “own” the border; thus figure-ground perception can be an important step on the way to object recognition. Note however that Evans and Treisman (2005) have shown that people can perform animal detection in natural scenes with great success even though they are unable to localize the animal, so figure-ground perception cannot be said to be a required step in object recognition. Rather, the localization of a figure’s borders and proper assignment of the ownership of associated borders are logical requirements of figure-ground segregation.

Border-Ownership Is Independent from the Polarity of Contrast Across Borders

A clever artist can devise displays in which the ownership of a border is ambiguous, leading to bistable percepts. See Fig. 1b, the Rubin (doctoral dissertation 1915) faces/vase. While one consciously experiences the faces against a black background, the borders separating the white and black areas are the face borders. Note that flipping the contrast of Fig. 1b and making the faces black and the vase white (Fig. 1c) does not change the relationship between the figure-ground separation and border-ownership, i.e., when the faces are seen against the uniform background, the borders between the black and white areas belong to the faces, and the same holds for the vase.

A good deal of attention in introductory textbooks is devoted to bistable displays (Fig. 1b–c) when discussing figure-ground perception. Although such displays help to make an important point about the relationship between figure-ground relations and border-ownership, they are a potentially misleading starting point relative to figure-ground perception in natural scenes. In bistable displays, volitional manipulation by the viewer of semantic categories such as face or vase can “drive” a resulting percept. The bistability of labeling border-ownership is not typical of most normal perceptual settings, which are closer to the depiction of Fig. 1a. Note also that finding boundaries is trivial in binary-valued displays with solid, connected regions, such as Fig. 1b and c, as opposed to conditions of, for example, animal camouflage (Fig. 2).

Psychophysical Antecedents of Recent Computational Neuroscience of Figure-Ground Perception

The second half of the twentieth century saw a great deal of human psychophysics devoted to the study of texture segregation, visual segmentation and grouping, and visual search. Linking these studies, many of which sought to find fundamental visual “elements” of perception, was the goal of characterizing how, out of the “blooming, buzzing, confusion” of time-varying visual scenes, our visual systems are able to focus on relatively coherent objects of interest (figures). The texture segmentation and grouping studies characterized conditions under which perceptually continuously connected boundaries between regions could form, even in the absence of continuous differences in luminance or contrast along those boundaries (such as is evident in Fig. 2). Thus, the question of “ownership” of boundaries was often not raised. Visual search studies, pioneered by Anne Treisman in the 1980s (Treisman and Gelade 1980) and developed since by many, most notably Jeremy Wolfe (Wolfe and Horowitz 2017),

Figure-Ground Segregation, Computational Neural Models of, Fig. 1 Examples of 2D images helping formulate the concept of figure-ground segregation and border ownership, irrespective of some visual attributes such as edge contrast. (a) Mona Lisa by Leonardo da Vinci. (Adapted from https://en.wikipedia.org/wiki/Mona_Lisa under the public domain), (b) Rubin faces/vase. (Taken from https://commons.wikimedia.org/wiki/File:Multistability.svg and is released under the public domain by Alan De Smet), and (c) its contrast inverted version

focused on describing conditions under which a region of a scene would “pop out” and thus be treated as a “figure” for subsequent processing, with the rest of the display acting as “ground.” When conditions for pop-out were not in play in a given display, search could be “guided” by top-down expectations (Wolfe and Horowitz 2017), or a number of candidate regions (Grossberg et al. 1994) might be evaluated in order of likelihood of containing the object being searched for. In scenes containing an object of search, the finding of that object, whether quickly or slowly, inevitably is accompanied by the phenomenal experience of the visible regions of that object as “figure.”

Figure-Ground Segregation, Computational Neural Models of, Fig. 2 Figure-ground segregation during camouflage, when the figure and background have similar textures and their average luminances are close to each other. Such scenes can pose another level of challenge to computational models of figure-ground segregation, as the figure borders cannot be readily detected by the lining up of the units detecting edge contrast, as tiny edge contrasts with different orientations are almost uniformly distributed. (Adapted from pixabay.com; photo is released under CC0 Creative Commons)

A series of psychophysical studies led in turn to related paradigms involving brain imaging or electrophysiological recordings in primates, which have advanced our understanding of figure-ground perception tremendously.

Finally, an important thing to note about figure-ground perception is that, unlike the examples discussed to this point, in daily life it does not generally occur in a flat plane, such as in a picture. When we look at a spoon or cup that we are about to reach for, the surface of the object is extended and often curved in three dimensions. Tyler and Kontsevich (1995) introduced the important concept of an “attentional shroud,” a fuzzy manifold in perceptual space that envelops the surfaces of attended objects. The shroud delineates both the limits of focal attention and the tendency of attention to involuntarily “spread” along the visible surfaces of an attended figure, rather than into the rest of the scene (the ground). Indeed, the delineation of attentional resources for visual processing after the determination of figure-ground segregation can be viewed as an important functional reason for engaging in figure-ground segregation in the first place.

Brain Areas Involved in Figure-Ground Perception

Although figure-ground segregation is fundamental to vision, how the visual system performs it is not well understood. Coding of border-ownership starts early in the primate visual stream. A direct link between visual figure-ground perception and the responses of certain single neurons has, however, been established in the early visual system.

These cell responses may require the simultaneous activation of parts of visual areas V1, V2, and V4 (Fig. 3) acting as a functional network (Layton et al. 2012), whose neurons are monosynaptically connected to each other across visual areas (Fig. 3b). Such connections are termed inter-areal. On the other hand, neural connections within each area (i.e., V1, V2, etc.) are called intra-areal. Zhou et al. (2000) and Ko and von der Heydt (2018) have found that about half of sampled cells from primate visual areas V2 and V4 preferentially respond to borders whose ownership is related to figure-ground relations (Fig. 4). These cells are referred to as border-ownership cells, or B-cells, and have “side-of-figure” selectivity that indicates a neurophysiological correlate of percepts observed in Fig. 1b–c, namely, that a border is owned by the region on one side of that border (for example, the vase) or by the region on the other side (the face), but not both.

Temporal Dynamics of Border-Ownership Neural Response

The border-ownership-related neural response can emerge within 25 msec after the B-neuron’s response onset. In V2 and V4, the response differences were found to be nearly independent of figure size up to the limit set by the size of the display (21°) that Zhou et al. (2000) used (Fig. 4). Because such a response depends on the processing of an image region that is at least as large as the figure and thus requires the transmission of information over some distance in the cortex, these short delays constrain the functional network underlying the B-cell response and the biological plausibility of neural models.

The short delay of the emergence of border-ownership signals constrains the contribution of inter-areal and intra-areal connections in a biologically plausible neural model that can perform figure-ground segregation (Layton et al. 2012). This is because inter-areal connections are myelinated and transfer neural signals much faster than the intra-areal ones, and therefore, the small delays observed in border-ownership signals demand fast inter-areal connections. Researchers have proposed that B-cells access global information either intra-areally, i.e., by lateral connections within a single visual cortical area, such as V2 (Zhaoping 2005), or inter-areally, i.e., where cells with larger receptive fields (e.g., V2) communicate contextual information about the scene via feedback projections to visual areas whose cells have smaller receptive fields (e.g., V1; see the diagram in Fig. 3b). Intra-areal and inter-areal axonal conduction velocities have been estimated

Figure-Ground Segregation, Computational Neural Models of, Fig. 3 Brain areas at some distance can be axonally connected, such as inter-areal connections between visual areas V1, V2, and V4. Inter-areal connections via myelinated axons are much faster than within-area (intra-areal, unmyelinated) connections. As a result, during a stream of visual inputs, multiple brain areas can be activated within a short period of time, forming a multiple-area network for processing the input. (a) Lateral view of the left hemisphere of a monkey brain, roughly showing the locations of V1, V2, and V4. (Adapted from Wurtz (2015) under the Creative Commons Attribution 4.0 International license). (b) A simple schema showing inter-areal and mutual connections between areas V1, V2, and V4, highlighting the fast inter-areal connections that can generate a functional multiple-area visual network

Figure-Ground Segregation, Computational Neural Models of, Fig. 4 The von der Heydt group discovered the presence of border-ownership neurons. The neuron’s receptive field is shown by a small circle/ellipse, which is over the figure edge. Local contrast (light/dark) is the same across each column and over the classic receptive field of the neuron, yet the neuron shows a preference for a figure to the left (1st row in a and b) over a figure to the right (2nd row in a and b). Such cells, therefore, encode border-ownership, and in this particular case, left-side ownership. (Adapted from von der Heydt (2015) under CC BY License)

to be 0.3 and 3.5 m/s in early visual areas, respectively (Bullier 2001). Hence, inter-areal connections can be an order of magnitude faster than intra-areal connections for propagating information across the visual field. B-cell responses to 3° squares did not differ in latency compared to those to an 8° square, which is consistent with the use of inter-areal connections, but not intra-areal connections, to propagate contextual figure-ground information. Although a variable amount of time is required to propagate information about a figure within a single cortical area, transmitting the information to another area with large receptive field cells could afford a roughly fixed delay irrespective of the figure size in the visual field (Layton et al. 2012). Hence, it appears that connections within a single cortical area alone could not plausibly account for the fast global scene integration that is observed in B-cell border-ownership responses (Layton et al. 2012), but see Zhaoping (2005), who argues otherwise.
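The latency argument above can be made concrete with a back-of-the-envelope calculation. The sketch below is only illustrative: the two conduction velocities are the ones quoted from Bullier (2001), but the cortical magnification value (`MM_PER_DEG`), the inter-areal path length (`INTER_AREAL_PATH_MM`), and the assumption that a lateral signal must cross the full width of the figure are hypothetical simplifications introduced here, not values taken from the cited studies. In particular, real cortical magnification varies with eccentricity and is not a single constant.

```python
# Conduction velocities quoted in the text (Bullier 2001), in m/s.
INTRA_AREAL_M_PER_S = 0.3  # lateral connections within one cortical area
INTER_AREAL_M_PER_S = 3.5  # feedforward/feedback connections between areas

# ASSUMPTIONS for illustration only (not from the source):
MM_PER_DEG = 2.0            # hypothetical constant cortical magnification (mm/deg)
INTER_AREAL_PATH_MM = 20.0  # hypothetical one-way axonal path between two areas


def delay_ms(distance_mm: float, velocity_m_per_s: float) -> float:
    """Conduction delay in ms. Millimetres divided by m/s equals milliseconds."""
    return distance_mm / velocity_m_per_s


def intra_areal_delay_ms(figure_size_deg: float) -> float:
    """Lateral route: the signal is assumed to cross the figure's full cortical width."""
    return delay_ms(figure_size_deg * MM_PER_DEG, INTRA_AREAL_M_PER_S)


def inter_areal_delay_ms() -> float:
    """Feedback route: one trip up and one trip down, independent of figure size."""
    return delay_ms(2.0 * INTER_AREAL_PATH_MM, INTER_AREAL_M_PER_S)


if __name__ == "__main__":
    for size_deg in (3.0, 8.0, 21.0):
        print(
            f"{size_deg:5.1f} deg figure: "
            f"intra-areal {intra_areal_delay_ms(size_deg):6.1f} ms, "
            f"inter-areal {inter_areal_delay_ms():6.1f} ms"
        )
```

Under these assumed numbers the lateral-route delay grows in proportion to figure size, whereas the feedback-route delay stays fixed, which is the qualitative pattern the text uses to argue for inter-areal propagation. The absolute values depend entirely on the hypothetical constants and should not be read as estimates.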

The immediately preceding discussion is an example of an emerging paradigm shift in understanding of cortical visual processes. Most physiological studies related to cell responses to luminance-based and hue-based edges in the primate visual system in the past half-century have followed the path established by Hubel and Wiesel (1962) and largely focused on the function of individual areas or subpopulations of cells within a visual area. Topics of such studies include how different cells in the primary visual cortex respond to luminance-based edges, to the light-dark polarity of contrast, and to variations of edge orientation. It instead appears that the cortex can solve complex problems with networks that span multiple areas. The visual system may thus rapidly recruit an assembly of cortical areas to determine border-ownership in figure-ground segregation, a single emergent function. Neuroanatomical evidence indicates that early visual areas such as LGN, V1, V2, and V4 are massively interconnected with numerous feedforward and feedback connections (Sincich and Horton 2005, see Fig. 3b). Feedforward connections are believed to quickly propagate sensory visual information to cortical areas further up the visual hierarchy to serve a rich perception of the visual scene. Feedback projections are often said to play a modulatory role with respect to bottom-up sensory visual signals by increasing the gain of neuronal responses in attended regions and performing contextual integration. To date, few studies have hypothesized that feedback projections play a crucial, as opposed to supplementary, role in the functions of early visual cortices. It appears that the simultaneous activation of multiple areas early in the visual system not only performs modular functions that are later combined but can also collectively solve problems that individual cortical areas cannot solve alone (Layton et al. 2012).

A Teardrop Model of Figure-Ground Segregation and an Edge-Integration Model

Layton et al. (2014) presented a model of figure-ground segregation in which inter-areal feedback plays a crucial role in disambiguating a figure’s interior and exterior. The model processes are based on primate receptive field scatter (jitter in receptive field center locations) and variation in receptive field sizes to generate a code for inside versus outside of a visible figure. Feedforward projections from V4 signal the curvature of object boundaries (curved contour cells), and feedback projections from visual temporal areas group neurons with different receptive field sizes and receptive field center locations in a teardrop geometry. Neurons sensitive to convex contours, which respond more when centered on a figure, balance feedforward information and feedback from higher visual areas. The model produces maximum activity along the medial axis of figures irrespective of the concavities and convexities of the figure contour. The model’s feedback mechanism, balanced with feedforward signals, is crucial for figure-ground segregation and localization (Fig. 5).

Another computational approach to solving the figure-ground problem for static scenes is given by Grossberg (1993, 1994). This neural model involves feedforward and feedback interactions between cortical units that represent boundaries and surfaces at multiple spatial scales. The assignment of border-ownership drives interactions in which signals for boundaries that are nearer to the viewer suppress signals for farther-away boundaries in a way that supports featural visibility (lightness) of the figure region while allowing aligned segments of far boundaries to complete and support amodal completion of partially occluded surfaces. Note that, while differing in important ways from the Layton et al. (2012, 2014) models, the Grossberg (1993, 1994) approach introduced the idea of cooperation and competition among representations in multiple cortical areas as a key requirement of figure-ground segregation.

A contrasting approach to modeling figure-ground segregation is taken by Kogo et al. (2010), who view figure-ground perception as a nonlinear differentiation/integration process that simultaneously accounts for border-ownership, lightness, and depth perception. A crucial step in this model is called “Integration for Surface Completion,” where estimates of depth in a featurally homogeneous region’s interior are

Figure-Ground Segregation, Computational Neural Models of, Fig. 5 Localizing a C-shaped (crescent moon shaped) figure is difficult from the perspective of a single neuron’s convex receptive field (shown as dotted black outlines in the top and bottom rows of the figure). The middle row shows two stimulus shapes: a concave C shape in column (a) and an ellipse in column (b). Note that the shape in (a) is darker and has a greater contrast with the white background, leading to higher activation per unit area of a neuron’s receptive field. Thus the degree of activation of the two neurons in the third row can be approximately the same. Localization is a different matter, however. The figure centroid is shown by two red cross symbols, +, whereas the centers of the two neurons’ receptive fields are indicated by blue crosses. Localization is poor in case (a). The teardrop model of Layton et al. (2014) offers an inter-areal network to resolve figure localization in cases like (a). Note that the areas chosen for investigation in the study described in Fig. 4 are also “C-shaped,” where the C is a block shape with right-angle corners. (Legend in the figure: receptive field outline; receptive field centroid, +; figure region; centroid of the blue figure region, +)

“filled-in” through values that are computationally propagated from signals from the region’s edges. This integration step differs from the Layton et al. (2014) approach, which depends on estimates of depth from multiple receptive fields in a higher visual area whose centroids are distributed (“jittered”) across the region’s area. Integration is a computation that would seem to “naturally” occur within a visual area, and further investigation of comparisons between classes of models employing integration versus inter-areal feedback is needed to better understand figure-ground segregation.

Neural Responses to Texture-Defined Static Figures

Lamme (1995) is an important monkey single-unit recording study involving displays in which figure-ground segregation is easily perceived (by humans) through differences in the orientation of texture elements. Although the receptive field of certain neurons in early visual areas of the monkey lies entirely within the “figure” texture region, the neurons exhibit an enhanced firing rate compared to when the monkeys were presented with a display containing a uniform texture of the neuron’s preferred orientation throughout the display. The interior enhancement effect persists when the edges of the square are 8–10° long, and the modulation occurs after an 80–100 ms latency from the onset of the stimulus, which suggests that feedback from neurons with larger receptive fields (RFs) may be involved. A temporal analysis indicates that neural activity related to the edges of the figure emerges first, following a short latency; interior enhancement then occurs in the “late component” of the response (Roelfsema et al. 2002). A neuron that shows interior enhancement continues to fire at an elevated rate when the RF is centered at different positions within the texture-defined figure, and the firing rate drops when the RF is centered on the background. Neurons in V2 demonstrate a greater degree of interior enhancement compared to those in V1, and the magnitude of the interior enhancement response is greatest in V4.

Motion-Based Figure-Ground Segregation

Fishes, frogs, moths, and snakes demonstrate adaptations in the appearance of their bodies to hide from predators. Through camouflage, the markings on the prey’s body cause the animal to be grouped with, rather than stand out from, the surroundings. This strategy is effective as long as the animal does not move, because many predators and humans cannot easily detect stationary objects that resemble their surroundings in texture, color, and luminance. However, when a previously invisible animal breaks camouflage by sudden motion, humans rapidly perceive a figure at a different depth from the surroundings, even if the texture is statistically identical. When a figure moves in front of a similarly textured background, it is said to produce kinetic occlusion (Gibson et al. 1969; Kaplan 1969). No reliable luminance contrast exists between the figure and background; there is only the relative motion between the texture patterns separated by a kinetically defined edge (kinetic edge). How do humans perceive the figure at a different depth than the background (figure-ground segregation), despite the figure and background possessing patterns with statistically identical luminance? This is a demanding scene processing task for the primate visual system, which goes beyond simple contrast processing and is likely to involve brain areas involved in motion processing (the middle temporal area, MT, among others) as well as form processing (including V4 and probably other visual areas) to extract the moving form from visual motion cues scattered within the camouflaged scene. In this regard Layton and Yazdanbakhsh (2015) have suggested an interconnected network model of areas V1-V2-V4-MT, in which MT and V4, both monosynaptically connected to V2, process the motion and form signals, respectively. Another possible approach to motion-based figure-ground segregation is described by Barnes and Mingolla (2013), whereby motion onset and offset signals generated by accretion or deletion of texture elements via occlusion act as key gating signals for figure-ground segregation. Note that it is quite possible that the primate visual system has evolved multiple mechanisms, with partially overlapping competencies, to address the important needs of figure-ground segregation.

Need for Feedback Among Brain Areas

As noted in the section “Temporal Dynamics of Border-Ownership Neural Response,” the primate visual system can rapidly recruit several cortical areas in the service of determining figure-ground relations. One way to investigate the mechanisms underlying the context-sensitive aspects of perceptual organization is the use of images that create bistable perception, in which the perceptual interpretations of the image keep alternating over time while the physical input image is kept constant. One of the most famous images to induce perceptual bistability involving figure-ground organization is Rubin’s “face-or-vase” image (Fig. 1b–c): The perceptual interpretation keeps alternating between “two faces” on the sides and a “vase” in the center, while the competing area is perceived as a part of the background. Because the perceptual switch in this case is specifically linked to the reversal of the figure-ground organization, we can call this phenomenon bistable figure-ground organization. Rubin presented his dissertation in 1915, with one of the key points being that when two neighboring fields share a border, one will be seen as figure against the other as the background and that the percept of a figure is dominated by one of the fields against the other sharing that border. Pitts et al. (2011) used a Rubin faces/vase display to investigate the dynamics of figure-ground processing via EEG with humans and found that reorganization (faces to vase or back) could occur on a timescale of 200 msec or less, which allows for signals to propagate in both feedforward and feedback directions across visual areas.

Another example of bistable images is the Necker cube (Fig. 6), in which the front side of the cube swings between the top and bottom squares. As with the Rubin vase, such perceptual bistability can be temporally modulated by attention, i.e., with a bit of practice, one can “will” the top or bottom square closer at a given time. Similarly, in the Rubin faces/vase case, one can make the face or the vase the figure by attention at a given time. Such demonstrations further support the claim that figure-ground perception is a complex process that involves both top-down and bottom-up interactions in the primate brain. The degree of facility of such voluntary attentional switches has been exploited as a potential marker for progression of Parkinson’s disease (Diaz et al. 2015a, b).

Figure-Ground Segregation, Computational Neural Models of, Fig. 6 The Necker cube is a classic example of bistable figures. Sometimes the lower square appears in front and sometimes in the back. The switch is instantaneous, showing our visual brain exhausts possible 3D alternatives which result in the same 2D projection

“2D” Versus “3D” Displays: Figure-Ground Segregation and Stereopsis

Depth and separating a figure from background are tightly linked. A figure is generally closer in depth than the background. This raises the question of whether a figure can be purely defined and segregated from a ground area by binocular depth, with no reliance on contrast differences along an edge. Béla Julesz in 1959 generated pairs of random dot images for which, when each image was shown to a separate eye, a portion of the dots (e.g., forming a square) was seen at a different depth from the rest (Julesz 1971). Note that figure-ground segregation is thereby achieved, with the resulting stereoscopic border being owned by the nearer figure region. Closing either eye (or looking at just one half of the pair) abolishes the percept of the circle as a figure against its background (Fig. 7). This is an example of pure stereopsis involvement (and nothing more) in forming a figure against its background. Here again, as in the case of the camouflaged animal, there is no simple contrast between the figure and its background, and the figure and its borders are depth based. Are there single-cell data available for such figure-ground segregation to be used for a neural modeling approach?

Von der Heydt et al. (2000) investigated the representation of stereoscopic surfaces and edges in monkey visual cortex and whether such neural responses carry information about the figure borders and surfaces. Two key points were discovered in their work: first, if a figure like a square is generated by a random dot stereogram in which there is no first-order (contrast-based) edge, the V1 cells respond to the stereoscopically defined figure surface rather than to its edges; second, V2 cells respond to the stereoscopically defined edges of the figure rather than to its surface.

Conclusion

For most naturally occurring scenes, figure-ground perception is a quick, automatic, seemingly effortless, and successful process, with certain naturally occurring static images or humanly engineered camouflage being notable exceptions.

Figure-Ground Segregation, Computational Neural Models of, Fig. 7 Random Dot Stereogram. Looking at each half-pair doesn’t result in seeing a particular pattern, yet, if the left and right eyes see only the left and right half-pairs, respectively, a circle in the middle appears farther. This example shows that while there is no visual border in each half-pair, additional cues can create a pattern with clear borders. Additional cues can be stereopsis (here), motion, or large-scale grouping of figure elements. (Figure is adapted from https://commons.wikimedia.org/wiki/File:Mh_stereogramm_randomdot.png under Creative Commons Attribution-Share Alike 3.0 Unported)
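The construction behind a random-dot stereogram such as the one in Fig. 7 can be illustrated with a short Python sketch. This is not Julesz’s original procedure, nor the method used to produce the figure; the function names, parameter names, and default values are assumptions chosen for illustration, and a square patch is used instead of a circle for simplicity. Whether the patch appears nearer or farther than the surround depends on the sign of the shift and on how the pair is viewed.

```python
import numpy as np


def random_dot_stereogram(size=128, patch=48, disparity=4, seed=0):
    """Return (left, right) binary dot images forming a random-dot stereogram.

    Hypothetical helper: a central square patch of dots is copied into the
    right image shifted horizontally by `disparity` pixels. The strip that the
    shift uncovers is refilled with fresh random dots, so neither image on its
    own contains a contrast-defined edge.
    """
    lo = (size - patch) // 2
    hi = lo + patch  # the patch occupies rows and columns lo..hi-1
    assert 0 < disparity and hi + disparity <= size, "shifted patch must fit"

    rng = np.random.default_rng(seed)
    left = rng.integers(0, 2, size=(size, size), dtype=np.uint8)
    right = left.copy()

    # Shift the patch to the right in the right-eye image.
    right[lo:hi, lo + disparity:hi + disparity] = left[lo:hi, lo:hi]
    # Refill the uncovered strip with new, uncorrelated dots.
    right[lo:hi, lo:lo + disparity] = rng.integers(
        0, 2, size=(patch, disparity), dtype=np.uint8
    )
    return left, right


def match_fraction(left, right, shift, rows, cols):
    """Fraction of dots in a window of `left` that equal the dots of `right`
    at the same rows but displaced by `shift` columns."""
    r0, r1 = rows
    c0, c1 = cols
    return float(
        np.mean(left[r0:r1, c0:c1] == right[r0:r1, c0 + shift:c1 + shift])
    )
```

With the default parameters the patch spans rows and columns 40 to 87. Calling `match_fraction(left, right, 4, (40, 88), (40, 88))` compares the patch at the correct disparity and finds a perfect match, whereas the same call with a shift of 0 should give agreement near chance level. In this toy setting the figure is recoverable only by comparing the two images, which mirrors the point of the caption above.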

Even in cases of successful camouflage, however, our visual experience is often one of a figure-ground segregation, just not the segregation that camouflage is designed to hide. Indeed, it is difficult to engineer a visual display that does not support figure-ground organization; doing so requires a ganzfeld (a completely homogeneous light field) or very shallow gradients of luminance throughout the visual field. In contrast, if existing neural models capable of figure-ground segregation were to be challenged with a wide gamut of natural scenes, all of today’s models would suffer multiple and systematic failures, because existing neural models have been designed for a very small subset of such scenes. The diversity of the cortical networks and associated computational strategies that our visual system uses to perform figure-ground segregation is partially revealed by the variety of visual illusions, but the known class of figure/ground illusions is a particularly small and nonrepresentative subset of natural scenes. Given that our visual system performs figure-ground segregation with all-neural machinery that is also capable of many other visual processes, understanding the neural processing underlying figure-ground segregation can indeed be a difficult undertaking. It is only in recent years that “border-ownership” has been considered a “feature” for the human visual system and hence worthy of study through single-unit neural recording or brain imaging. Much remains to be done.

One of the characteristics of visual neurons is their receptive fields, which are derived from the rich connectivity between neurons in different visual areas. We have not reached a sufficiently detailed understanding of the spatial and temporal characterization of neuronal networks that perform the sophisticated task of figure-ground segregation within a few tens of milliseconds. Progress in neuroanatomy and connectomics is increasingly revealing the geometry of available connections in brain networks, but a functional understanding of physiology remains constrained by several factors. One is the inherent spatial and temporal limits of each recording modality, be it single or multiunit recording, EEG, MEG, fMRI, or other modality. Another factor is that the structures of current computational neural models are far simpler than their biological counterparts, as is virtually required of any progressive attempt at model-building. A satisfactory level of understanding of figure-ground segregation awaits further integrated cycles of experimentation and model development.

Before closing, we note that this article has not touched on two topics that a reader might wonder about. The first is deep learning or deep convolutional networks. Recent successes of this class of modeling architectures in object recognition are impressive in their accuracy, outperforming humans in many tasks for which a sufficient database of accurately labeled exemplars has been compiled to train the artificial neural network. Such networks have also enjoyed increasing successes in boundary segmentation and semantic segmentation, in the latter case assigning each pixel in an image the name of a category (e.g., “grass,” “road,” “sky,” etc.) that most likely is its source in an imaged scene.

Accurately finding all boundaries in a scene or semantically labeling all pixels in an image is, however, almost the exact opposite of figure-ground segregation. Animals have not evolved with the luxury of virtually limitless computational power or billions of computing cycles available before interacting with objects in their world. As noted at the start of this article, the significance of human figure-ground segregation is intimately tied to the machinery of primate vision, which must balance the fine resolution of the fovea with a wide field of view, creating the tradeoffs of eye movements and cortical magnification. Our default mode of visual scanning is to focus on one object at a time for a few hundred milliseconds, taking in “at a glance,” or perhaps with a handful of eye movements, the important characteristics of each “figure” that we scrutinize in turn. It is thus not reasonable to expect computational approaches that treat large images as translationally uniform arrays of pixels to be informative about human figure-ground segregation.

With this said, any machine vision algorithm that is designed specifically to segment an image into two regions, with one completely surrounded by another, can be said to be addressing the figure-ground problem in some way. A particularly relevant example is that of Sarti et al. (2000), which uses a “seed” location (analogous to a point of foveation) to drive completion of illusory boundaries through an algorithm that uses partial differential equations and level-set methods. Such approaches, however, can be better characterized as machine algorithms “inspired by” primate neuroscience than as computational models “of” primate visual function, which is the focus of the present article.

The other topic is neural synchrony, which has been proposed as a coding strategy underlying perceptual binding between disparate regions of the visual field (Singer 1999). Dong et al. (2008) have reported that synchrony depends on the border-ownership selectivity of the neuron pairs being recorded. Neural synchrony has become a bit of a “field unto itself,” with controversies about methods and meaning of results that are not at all related to figure-ground perception. Moreover, even if it were conclusively shown that, during figure-ground perception, neurons coding a figure fire synchronously while those coding the ground fire in anti-phase with the neurons coding the figure but in synchrony with each other, a key question would remain: How did each member of each population get recruited to fire when it did, as opposed to some other time? In other words, what functional receptive field properties underlie the “sorting” of parts of the visual field into figure and ground? This article has focused on the latter questions.

Cross-References

▶ Attentional Pop-Out
▶ Attentional Shroud
▶ Border-Ownership

References

Barnes T, Mingolla E (2013) A neural model of visual figure-ground segregation from kinetic occlusion. Neural Netw 37:141–164. https://doi.org/10.1016/j.neunet.2012.09.011. Epub 2012 Oct 6
Bullier J (2001) Integrated model of visual processing. Brain Res Rev 36(2–3):96–107
Diaz-Santos M, Cao B, Mauro SA, Yazdanbakhsh A, Neargarder S, Cronin-Golomb A (2015a) Effect of visual cues on the resolution of perceptual ambiguity in Parkinson’s disease and normal aging. J Int Neuropsychol Soc 21:1–10
Diaz-Santos M, Cao B, Yazdanbakhsh A, Norton DJ, Neargarder S, Cronin-Golomb A (2015b) Perceptual, cognitive, and personality rigidity in Parkinson’s disease. Neuropsychologia 69:183–193
Dong Y, Mihalas S, Qiu F, von der Heydt R, Niebur E (2008) Synchrony and the binding problem in macaque visual cortex. J Vis 8(7):30.1–30.16. https://doi.org/10.1167/8.7.30
Evans KK, Treisman A (2005) Perception of objects in natural scenes: is it really attention free? J Exp Psychol Hum Percept Perform 31(6):1476–1492. https://doi.org/10.1037/0096-1523.31.6.1476
Gibson JJ, Kaplan GA, Horace N, Reynolds JR, Wheeler K (1969) The change from visible to invisible: a study of optical transitions. Percept Psychophys 5(2):113–116
Grossberg S (1993) A solution of the figure-ground problem for biological vision. Neural Netw 6(4):463–483. https://doi.org/10.1016/S0893-6080(05)80052-8
Grossberg S (1994) 3-D vision and figure-ground separation by visual cortex. Percept Psychophys 55(1):48–121. https://doi.org/10.3758/BF03206880
Grossberg S, Mingolla E, Ross WD (1994) A neural theory of attentive visual search: interactions of visual, spatial, and object representations. Psychol Rev 101(3):470–489
Hubel DH, Wiesel TN (1962) Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. J Physiol 160:106–154
Julesz B (1971) Foundations of cyclopean perception. The University of Chicago Press, Chicago. ISBN 0-226-41527-9
Kaplan GA (1969) Kinetic disruption of optical texture: the perception of depth at an edge. Percept Psychophys 6(4):193–198. https://doi.org/10.3758/BF03207015
Ko HK, von der Heydt R (2018) Figure-ground organization in the visual cortex: does meaning matter? J Neurophysiol 119(1):160–176. https://doi.org/10.1152/jn.00131.2017
Kogo N, Strecha C, Van Gool L, Wagemans J (2010) Surface construction by a 2-D differentiation-integration process: a neurocomputational model for perceived border ownership, depth, and lightness in Kanizsa figures. Psychol Rev 117(2):406–439. https://doi.org/10.1037/a0019076
Lamme V (1995) The neurophysiology of figure ground segregation in primary visual-cortex. J Neurosci 15:1605–1615
Land MF, Nilsson DE (2002) Animal eyes. Oxford University Press, Oxford
Layton OW, Yazdanbakhsh A (2015) A neural model of border-ownership from kinetic occlusion. Vis Res 106:64–80
Layton OW, Mingolla E, Yazdanbakhsh A (2012) Dynamic coding of border-ownership in visual cortex. J Vis 12:8. https://doi.org/10.1167/12.13.8
Layton OW, Mingolla E, Yazdanbakhsh A (2014) Neural dynamics of feedforward and feedback processing in figure-ground segregation. Front Psychol 5:972. https://doi.org/10.3389/fpsyg.2014.00972
Lettvin JY, Maturana HR, McCulloch WS, Pitts WH (1959) What the frog’s eye tells the frog’s brain. Proc Inst Radio Eng NY 47:1940–1951
Pitts MA, Martínez A, Brewer JB, Hillyard SA (2011) Early stages of figure–ground segregation during perception of the face–vase. J Cogn Neurosci 23(4):880–895. https://doi.org/10.1162/jocn.2010.21438
Roelfsema PR, Lamme V, Spekreijse H, Bosch H (2002) Figure-ground segregation in a recurrent network architecture. J Cogn Neurosci 14:525–537. https://doi.org/10.1162/08989290260045756
Rubin E (1915) Synsoplevede figurer (visually experienced figures). Doctoral dissertation
Sarti A, Malladi R, Sethian JA (2000) Subjective surfaces: a method for completing missing boundaries. Proc Natl Acad Sci U S A 97(12):6258–6263
Sincich LC, Horton JC (2005) The circuitry of V1 and V2: integration of color, form, and motion. Annu Rev Neurosci 28:303–326
Singer W (1999) Neuronal synchrony: a versatile code for the definition of relations? Neuron 24:49–65
Treisman A, Gelade G (1980) A feature integration theory of attention. Cogn Psychol 12:97–136
Tyler CW, Kontsevich LL (1995) Mechanisms of stereoscopic processing: stereoattention and surface perception in depth reconstruction. Perception 24(1):127–154
von der Heydt R (2015) Figure–ground organization and the emergence of proto-objects in the visual cortex. Front Psychol 6:1695. https://doi.org/10.3389/fpsyg.2015.01695
von der Heydt R, Zhou H, Friedman HS (2000) Representation of stereoscopic edges in monkey visual cortex. Vis Res 40:1955–1967. https://doi.org/10.1016/S0042-6989(00)00044-4
Wolfe JM, Horowitz TS (2017) Five factors that guide attention in visual search. Nat Hum Behav 1:0058. https://doi.org/10.1038/s41562-017-0058
Wurtz RH (2015) Using perturbations to identify the brain circuits underlying active vision. Philos Trans R Soc B 370:20140205. http://rstb.royalsocietypublishing.org/content/370/1677/20140205
Zhaoping L (2005) Border ownership from intracortical interactions in visual area V2. Neuron 47(1):143–153
Zhou H, Friedman HS, von der Heydt R (2000) Coding of border ownership in monkey visual cortex. J Neurosci 20:6594–6611