Synergy of Spatial Frequency and Orientation Bandwidth in Texture Segregation
Journal of Vision (2021) 21(2):5, 1–25

Cordula Hunt, Department of Psychology, Methods Section, Johannes Gutenberg-Universität, Mainz, Germany
Günter Meinhardt, Department of Psychology, Methods Section, Johannes Gutenberg-Universität, Mainz, Germany

Defining target textures by increased bandwidths in spatial frequency and orientation, we observed strong cue combination effects in a combined texture figure detection and discrimination task. Performance for double-cue targets was better than predicted by independent processing of either cue and even better than predicted from linear cue integration. Application of a texture-processing model revealed that the oversummative cue combination effect is captured by calculating a low-level summary statistic (CEm), which describes the differential contrast energy to target and reference textures, from multiple scales and orientations, and integrating this statistic across channels with a winner-take-all rule. Modeling detection performance using a signal detection theory framework showed that the observers’ sensitivity to single-cue and double-cue texture targets, measured in d′ units, could be reproduced with plausible settings for filter and noise parameters. These results challenge models assuming separate channeling of elementary features and their later integration, since oversummative cue combination effects appear as an inherent property of local energy mechanisms, at least for spatial frequency and orientation bandwidth-modulated textures.

Citation: Hunt, C., & Meinhardt, G. (2021). Synergy of spatial frequency and orientation bandwidth in texture segregation. Journal of Vision, 21(2):5, 1–25, https://doi.org/10.1167/jov.21.2.5.
Received May 26, 2020; published February 9, 2021. ISSN 1534-7362. Copyright 2021 The Authors.
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.

Introduction

Humans categorize visual scenes rapidly (Fei-Fei et al., 2007; Schyns & Oliva, 1994), judging at a glance whether its content is natural or man made (Greene & Oliva, 2009a, 2009b; Loschky et al., 2007; Loschky & Larson, 2008; Oliva & Torralba, 2001). Also discriminating scene objects at the basic level of categorization seemingly works within the first 100 ms of processing without allocation of spatial attention (Grill-Spector & Kanwisher, 2005; Hershler & Hochstein, 2005; Rousselet et al., 2005; Thorpe et al., 1996). Rapid and preattentive scene and object categorization indicates that observers rely on cues provided by earlier stages of image analysis (Baddeley, 1997; Groen et al., 2013; VanRullen, 2006; Wichmann et al., 2006).

Likewise, rapid and preattentive discrimination of “textures,” that is, artificial images with spatial regularity, also indicates involvement of early analysis mechanisms. First studies suggested that identity of the power spectrum decides whether two textures can be discriminated only with detailed scrutiny or immediately and preattentively. However, later studies found several counter examples, demonstrating that there are textures with equal power spectra, and even with equal third-order statistics, which segregate at a glance (Julesz et al., 1973, 1978). These results suggested that a Fourier-like image description that could be extracted from the responses of orientation and spatial frequency tuned cell populations found in striate cortex (Campbell et al., 1969; Daugman, 1980; Hubel & Wiesel, 1968) might only be an initial step in texture segregation (Sagi, 1995; Victor et al., 1995).

In one class of feedforward processing models, the output of orientation and spatial frequency selective filters is squared or full-wave-rectified to form a local energy measure (Landy, 2013; Lin & Wilson, 1996). This basic operation not only marks local contrast and luminance differences but also identifies texture regions that differ in arbitrary feature modulations by translating them into different local energy distributions (Bergen & Adelson, 1988; Landy & Bergen, 1991). Due to its simplicity, the model has widely been used to describe human sensitivity to feature modulation across space (Arsenault et al., 1999; Bergen & Adelson, 1988; Caelli, 1982; Prins & Kingdom, 2006; Rubenstein & Sagi, 1990; Sagi, 1995). It has also been used in recent attempts to explain the rapid extraction of a scene “gist” (Groen et al., 2013). In most implementations, there is a first filtering stage, followed by local energy computation, which again is followed by a secondary, larger-scale filtering stage, which enhances pooled regional differences in feature modulation. Models of this class are referred to as filter-rectify-filter (FRF) models (Landy & Bergen, 1991; Landy, 2013; Landy & Oruc, 2002; Prins & Kingdom, 2006). FRF models can predict the close covariation of detection performance with degree of texture modulation (Landy & Bergen, 1991), the salience of texture boundaries (Malik & Perona, 1990; Landy & Bergen, 1991), and several asymmetries in human texture segregation (Rubenstein & Sagi, 1990). However, as pointed out by Prins and Kingdom (2006), direct evidence for the existence of local energy-based texture mechanisms is rare.

Malik and Perona (1990) claimed that a complete model of texture perception should satisfy three criteria: (1) It should be biologically plausible, (2) it should be general enough to be tested on any gray-scale image, and (3) it should yield a quantitative match with psychophysical data. Local energy-based models satisfy the first criterion of biological plausibility as they use first-order filters with weighting characteristics resembling the receptive field profiles of cortical simple cells in V1 (Daugman, 1980, 1985), which were found to be tuned to specific orientations and spatial scales (Campbell et al., 1969; Hubel & Wiesel, 1965). Further, V2 cells were shown to respond to locus and orientation of texture boundaries but were unresponsive to the luminance profiles of texture elements (von der Heydt et al., 1984), a behavior that closely matches the non-Fourier second-order squaring and pooling operation (Lin & Wilson, 1996). Thus, FRF mechanisms can be viewed as modeling processing chains of V1 and V2 cells. Local energy-based models also easily satisfy the second criterion as they can take an arbitrary gray-scale image as input.

The third criterion states that a test for local energy-based texture mechanisms requires a psychophysical benchmark. A strong challenge for feedforward local energy-based models would be prediction of the “feature synergy effect” arising in the discrimination of texture regions defined by feature modulation in two dimensions (Meinhardt & Persike, 2003; Meinhardt et al., 2004, 2006; Kida et al., 2011; Straube et al., 2010). Authors showed that sensitivity to target texture regions that varied from the surround in orientation and spatial frequency was much higher than predicted from the assumption of detecting texture modulation along either dimension independently. Such a multi cue advantage has been termed “synergy” […] parallel (Kubovy & Cohen, 2001). Such models have been proposed to describe “pop out” in visual search, whereby each module signals spatiotemporal gradients for the feature it analyzes (Treisman & Gelade, 1980; Treisman, 1988). Local energy-based models are at odds with this notion, since their local nonlinearity marks any regional texture change regardless of its specific featural origin. Hence, in local energy-based models, there are no feature modules built from the outputs of the primary analyzers.

Predicting the feature synergy effect for textures jointly modulated in two features from local energy-based mechanisms would be strong evidence for this model class, since there are no free parameters for weighting how texture modulations from two features combine. Support for local energy-based mechanisms would be even stronger if the specific nature of the texture stimuli makes architectures with separate feature modules no likely candidates for explaining the synergy effect, even with additional assumptions for the way how module outputs are integrated.

In this sense, the present study provides evidence for local energy-based mechanisms in human texture segregation. We devised stimuli with no likely double-cue advantage, using textures with no feature contrast but different orientation and spatial frequency bandwidths. In these textures, orientation and spatial frequency differences of target and reference textures arise in regions of the parameter space where practically no overlap of the response characteristics of primary filters can be expected. However, we observed a strong double-cue benefit for these stimuli, which was at least as strong as the synergy effect for textures with orientation and spatial frequency contrast (Meinhardt et al., 2006; Persike & Meinhardt, 2008). Analyzing the textures with a plausible local energy model and combining the space-average responses with a simple max-rule replicated the synergy effect for detection, while no further assumptions or free parameters entered. The findings suggest that nonlinear saliency enhancement for jointly modulated textures is an inherent property of local energy mechanisms, which offers a parsimonious and straightforward explanation of the feature synergy effect in texture segregation.
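
For orientation, the general idea of a local energy analysis with a max-rule across channels can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the model used in this study: the Gabor parameterization, the kernel size, the filter bandwidth, the use of quadrature pairs, and all function names (gabor_pair, filter_fft, local_energy, max_rule_response) are hypothetical choices, and simple space averaging within the target and reference regions stands in for a second, larger-scale filtering stage.

```python
import numpy as np

def gabor_pair(size, freq, theta, sigma):
    """Even and odd Gabor kernels (hypothetical parameterization)."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)      # axis of modulation
    envelope = np.exp(-(x ** 2 + y ** 2) / (2 * sigma ** 2))
    even = envelope * np.cos(2 * np.pi * freq * xr)
    odd = envelope * np.sin(2 * np.pi * freq * xr)
    return even - even.mean(), odd                  # remove DC from even kernel

def filter_fft(image, kernel):
    """First-stage linear filtering as circular convolution via the FFT."""
    padded = np.zeros(image.shape, dtype=float)
    kh, kw = kernel.shape
    padded[:kh, :kw] = kernel
    # Move the kernel centre to the origin so the output is not shifted.
    padded = np.roll(padded, (-(kh // 2), -(kw // 2)), axis=(0, 1))
    spectrum = np.fft.fft2(image) * np.fft.fft2(padded)
    return np.real(np.fft.ifft2(spectrum))

def local_energy(image, freq, theta, sigma=4.0, size=25):
    """Squaring nonlinearity applied to a quadrature pair of filter outputs."""
    even, odd = gabor_pair(size, freq, theta, sigma)
    return filter_fft(image, even) ** 2 + filter_fft(image, odd) ** 2

def max_rule_response(image, target_mask, freqs, thetas):
    """Space-averaged energy difference (target minus reference) per channel,
    combined across all frequency/orientation channels with a max-rule."""
    diffs = []
    for f in freqs:
        for t in thetas:
            energy = local_energy(image, f, t)
            diffs.append(energy[target_mask].mean() - energy[~target_mask].mean())
    return float(max(diffs))
```

With a boolean target_mask marking the target region, max_rule_response returns a single scalar for an image. Because the squaring step responds to any regional change in filter output, no feature-specific module enters the computation, which is the property of this model class emphasized above.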