bioRxiv preprint doi: https://doi.org/10.1101/2020.04.23.057620; this version posted April 23, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under a CC-BY 4.0 International license.

Predictive coding as a unifying principle for explaining a broad range of brightness phenomena

Alejandro Lerer1Y*, Hans Supèr2, Matthias S. Keil1,2Y

1 Departament de Cognició, Desenvolupament i Psicologia de l'Educació, Faculty of Psychology, University of Barcelona, Barcelona, Spain
2 Institut de Neurociències, Universitat de Barcelona, Barcelona, Spain
3 Institut de Recerca Pediàtrica Hospital Sant Joan de Déu, Barcelona, Spain
4 Catalan Institute for Advanced Studies (ICREA), Barcelona, Spain

Y These authors contributed equally to this work.
* [email protected]

Abstract

The visual system is highly sensitive to spatial context when encoding luminance patterns. This context sensitivity has inspired many proposed neural mechanisms for explaining the perception of luminance (brightness). Here we propose a novel computational model for estimating the brightness of many visual illusions. We hypothesize that many aspects of brightness can be explained by a predictive coding mechanism, which reduces redundancy in edge representations on the one hand, while enhancing non-redundant activity on the other (response equalization). Response equalization is implemented with a dynamic filtering process, which adapts to each input image. Dynamic filtering is applied to the responses of complex cells in order to build a gain control map. The gain control map then acts on simple cell responses before they are used to create a brightness map via activity propagation. Our approach successfully predicts many challenging visual illusions, including contrast effects, assimilation, and reverse contrast.
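The gain-control pipeline summarized above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' implementation: local mean activity stands in for the "prediction", the smoothing scale `sigma` and the divisive gain form are hypothetical choices, and the complex-cell responses are random placeholders.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def equalize_responses(complex_responses, sigma=4.0, eps=1e-6):
    """Sketch of response equalization: build a gain-control map that
    attenuates locally redundant (predictable) complex-cell activity
    and passes non-redundant activity relatively intact."""
    # Local mean activity serves as the "prediction" of each response.
    local_mean = gaussian_filter(complex_responses, sigma)
    # Divisive gain: strong, repetitive activity gets a small gain,
    # activity in sparsely activated regions gets a large gain.
    gain = 1.0 / (local_mean + eps)
    gain /= gain.max()  # normalize gains to [0, 1]
    return gain

# The gain map then multiplies simple-cell responses before the
# brightness map is created via activity propagation (filling-in).
rng = np.random.default_rng(0)
responses = rng.random((64, 64))   # placeholder complex-cell responses
gain = equalize_responses(responses)
modulated = gain * responses
```
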
Author summary

We hardly notice that what we see is often different from the physical world "outside" of the brain. This means that the visual experience that the brain actively constructs may differ from the actual physical properties of objects in the world. In this work, we propose a hypothesis about how the visual system of the brain may construct a representation of achromatic images. Since this process is not unambiguous, we sometimes notice "errors" in our perception, which cause visual illusions. The challenge for theorists, therefore, is to propose computational principles that recreate a large number of visual illusions and to explain why they occur. Notably, our proposed mechanism explains a broader set of visual illusions than any previously published proposal. We achieved this by trying to suppress predictable information. For example, if an image contains repetitive structures, then these structures are predictable and would be suppressed. In this way, non-predictable structures stand out. Predictive coding mechanisms act as early as the retina (which enhances luminance changes but suppresses uniform regions of luminance), and our computational model holds that this principle also acts at the next stage of the visual system, where representations of perceived luminance (brightness) are created.
Introduction

Visual perception is relative rather than absolute; the visual system (VS) computes the perceptual attributes of a visual target not only based on its physical properties, but also by considering information from the region surrounding the target (its context). For example, it is possible to induce different kinds of effects by modifying the context, such that the brightness of a target is contrasted (increasing brightness differences) or assimilated (decreasing brightness differences) with respect to its adjacent surround. Variants of these effects give rise to a myriad of visual illusions, which are of great utility for building hypotheses about computational mechanisms or perceptual rules for brightness perception.

At first sight it seems that contrast effects, such as simultaneous brightness contrast (SBC), can be explained by lateral inhibition between a target (center) and its context (surround). However, activity related to brightness contrast possibly does not occur before V1, although the receptive fields of retinal ganglion cells are consistent with lateral inhibition [1].

Unlike contrast, brightness assimilation pulls a target's brightness towards that of its immediate context, and therefore cannot be explained by mechanisms based on plain lateral inhibition. In fact, the neural mechanisms involved in generating brightness appear to be more intricate, and only a few computational proposals have been made so far for explaining assimilation effects. Note that any (computational) proposal could only be deemed successful if it explained contrast and assimilation effects at the same time. For example, the simplest (low-level) explanation for assimilation is spatial lowpass filtering, but it would predict brightness assimilation also for contrast displays.
Similarly, [2] suggested a retinal model based on a second-order adaptation mechanism induced by double-opponent receptive fields; it predicted assimilation, but in turn failed to predict contrast displays. With respect to mid- or higher-level processing, [3] suggested a pattern-specific inhibition mechanism acting in the visual cortex, which inhibits regularly arranged patterns of a visual stimulus.

Anchoring theory is a rule-based mechanism for segregating the contributions of illumination and reflectance in brightness, and thus for mapping luminance to perceived gray levels [4]. Accordingly, a visual image is first divided into one or more perceptual frameworks (a framework is a set of surfaces that are grouped together). Within each framework, the highest luminance value is anchored at perceived white, and smaller luminance values are mapped to gray levels according to their luminance ratio with the anchor. Note that "white" may not be the only possible anchor. For example, the lowest luminance may be anchored at "black", or the mean luminance may be anchored at the Eigengrau level. Although anchoring theory is successful in explaining the perception of reflectance for simple displays, it is not clear how its rules apply to natural images [5,6]. Furthermore, rather than specifying computational mechanisms, anchoring theory (but also White's suggestion) represents only a phenomenological proposal, and it is not clear how its empirically defined set of rules could be related to any specific information processing strategy in the visual system (but see e.g., [7]).

Another fairly popular hypothesis holds that contrast and assimilation effects are related to the contrast sensitivity function of the visual system [8].
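The highest-luminance anchoring rule described above admits a simple worked example. The following sketch (our illustration, with hypothetical luminance values, not taken from [4]) maps a framework's luminances to gray levels by their ratio to the white anchor:

```python
def anchor_to_white(luminances):
    """Anchoring rule: the highest luminance in a perceptual framework
    is anchored at perceived white (1.0); smaller luminances map to
    gray levels by their ratio to that anchor."""
    anchor = max(luminances)  # anchored at perceived white
    return [lum / anchor for lum in luminances]

# Hypothetical framework of three surfaces (luminance in cd/m^2):
surfaces = [90.0, 45.0, 9.0]
print(anchor_to_white(surfaces))  # -> [1.0, 0.5, 0.1]
```

Under this rule, doubling every luminance in the framework leaves the perceived gray levels unchanged, which is why anchoring separates reflectance from overall illumination.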
In one way or another, this idea forms the basis of many computational brightness models, which decompose a visual image according to its spatial frequency content (multi-scale models). Typically, multi-scale models adjust the response amplitudes of spatial-frequency selective filters such that the resulting distribution (response amplitude versus spatial frequency) follows the shape of the contrast sensitivity function. Contrast and assimilation effects are then predicted to occur as a consequence of this adjustment. Specifically, [9,10] proposed a multi-scale model which represents a luminance image with oriented difference of Gaussians (ODOG) filters. The crucial mechanism in the ODOG model is (local) contrast normalization for adjusting the spatial frequency representation. In agreement with [8], [9] observed that contrast and assimilation displays are distinguished according to their respective content in spatial frequencies: luminance patterns with lower spatial frequencies than the encoding filters exhibit a loss of low spatial frequency information in their brightness representation and produce contrast as a consequence. Assimilation, on the other hand, is generated when the high spatial frequencies of a luminance pattern exceed the highest spatial frequencies that can be represented with the encoding filters. Although the ODOG model predicts SBC and White's effect, it cannot account for more sophisticated visual illusions such as the Dungeon Illusion and the Benary Cross.
Both of the latter cannot be reproduced by building a brightness representation with oriented filters that are confined to a specific spatial frequency range. Several extensions of the ODOG model have been proposed in order to account for a larger set of brightness illusions [11].
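An oriented difference-of-Gaussians filter of the kind used by the ODOG model can be sketched as follows. This is a schematic construction under assumed parameters (`size`, `sigma`, `elongation` are our illustrative choices, not the values of [9,10]): an isotropic center Gaussian minus a surround Gaussian elongated along the preferred orientation, balanced so the filter is insensitive to uniform luminance.

```python
import numpy as np

def odog_filter(size=64, sigma=4.0, orientation=0.0, elongation=2.0):
    """One oriented difference-of-Gaussians (ODOG-style) filter:
    center Gaussian minus a surround Gaussian elongated along the
    preferred orientation; both lobes are normalized to unit mass,
    so the filter gives zero response to uniform input."""
    y, x = np.mgrid[-size // 2:size // 2, -size // 2:size // 2].astype(float)
    # Rotate coordinates into the preferred orientation.
    xr = x * np.cos(orientation) + y * np.sin(orientation)
    yr = -x * np.sin(orientation) + y * np.cos(orientation)
    center = np.exp(-(xr**2 + yr**2) / (2 * sigma**2))
    surround = np.exp(-(xr**2 / (2 * (elongation * sigma)**2)
                        + yr**2 / (2 * sigma**2)))
    return center / center.sum() - surround / surround.sum()

# A bank of such filters over several orientations and spatial scales
# would encode the image; ODOG then normalizes the pooled response of
# each orientation channel (local contrast normalization).
kernel = odog_filter(orientation=np.pi / 4)
```

The zero-sum construction is what makes the filter respond to luminance structure rather than to mean luminance, mirroring the band-limited encoding discussed above.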