Normative theory of visual receptive fields

Tony Lindeberg Computational Brain Science Lab, Department of Computational Science and Technology, KTH Royal Institute of Technology, SE-100 44 Stockholm, Sweden. Email: [email protected]

Abstract—This article gives an overview of a normative x_2

computational theory of visual receptive fields. It is de- . . . . . scribed how idealized functional models of early spatial,

spatio-chromatic and spatio-temporal receptive fields can be . . . . . derived in an axiomatic way based on structural properties of the environment in combination with assumptions about the internal structure of a vision system to guarantee . . . . . consistent handling of image representations over multiple spatial and temporal scales. Interestingly, this theory leads . . . . . to predictions about visual receptive field shapes with qual- itatively very good similarity to biological receptive fields measured in the , the LGN and the primary visual x_1 cortex (V1) of mammals. Keywords—Receptive field, Functional model, Gaussian Fig. 1. A receptive field is traditionally defined as a region in derivative, Scale covariance, Affine covariance, Galilean the visual field for which a visual sensor/neuron/operator responds to covariance, Temporal causality, Illumination invariance, visual stimuli. This figure shows a set of partially overlapping receptive Retina, LGN, Primary , Simple , Double- fields over the spatial domain with all the receptive fields having the opponent cell, Vision. same spatial extent. More generally, one can conceive distributions of receptive fields over space or space-time with the receptive fields of different size, different shape and orientation in space as well I.INTRODUCTION as different directions in space-time, where adjacent receptive fields may also have significantly larger relative overlap than shown in this When reaches a visual sensor such as the retina, schematic illustration. In this work, we focus on a functional description of linear receptive fields, concerning how a neuron responds to visual the information necessary to infer properties about the stimuli over image space regarding spatial receptive fields or over joint surrounding world is not contained in the measurement space-time regarding spatio-temporal receptive fields. of image intensity at a single point, but from the relations between intensity values at different points. A main reason for this is that the incoming light constitutes an indirect fields “ought to” respond to visual data? source of information depending on the interaction be- Initially, such a problem might be regarded as in- tween geometric and material properties of objects in the tractable unless the question can be further specified. It is, surrounding world and on external illumination sources. however, possible to address this problem systematically Another fundamental reason why cues to the surrounding using approaches that have been developed in the area of world need to be collected over regions in the visual known as scale-space theory (Iijima [6]; field as opposed to at single image points is that the Witkin [7]; Koenderink [8]; Koenderink and van Doorn measurement process by itself requires the accumulation [9], [10]; Lindeberg [11], [12], [13], [14]; Florack [15]; of energy over non-infinitesimal support regions over Sporring et al. [16]; Weickert et al. [17]; ter Haar Romeny space and time. Such a region in the visual field for which [18]). A paradigm that has been developed in this field a neuron responds to visual stimuli is traditionally referred is to impose structural constraints on the first stages to as a receptive field (Hubel and Wiesel [1], [2], [3]) of visual processing that reflect symmetry properties of (see Figure 1). In this work, we focus on a functional the environment. Interestingly, it turns out to be possible description of receptive fields, regarding how a neuron to substantially reduce the class of permissible image with a purely spatial receptive field responds to visual operations from such arguments. arXiv:1701.06333v4 [q-bio.NC] 16 Feb 2017 stimuli over image space, and regarding how a neuron The subject of this article is to describe how structural with a spatio-temporal receptive field responds to visual requirements on the first stages of visual processing as stimuli over space and time (DeAngelis et al. [4], [5]). formulated in scale-space theory can be used for deriving If one considers the theoretical and computational idealized functional models of visual receptive fields and problem of designing a vision system that is going to implications of how these theoretical results can be used make use of incoming reflected light to infer properties of when modelling biological vision. A main theoretical the surrounding world, one may ask what types of image argument is that idealized functional models for linear operations should be performed on the image data. Would receptive fields can be derived by necessity given a small any type of image operation be reasonable? Specifically set of symmetry requirements that reflect properties of regarding the notion of receptive fields one may ask what the world that one may naturally require an idealized types of receptive field profiles would be reasonable. Is vision system to be adapted to. In this respect, the it possible to derive a theoretical model of how receptive treatment bears similarities to approaches in theoretical physics, where symmetry properties are often used as can be described by diffusion equations and are therefore main arguments in the formulation of physical theories of suitable for implementation on a biological architecture, the world. The treatment that will follow will be general since the computations can be expressed in terms of com- in the sense that spatial, spatio-chromatic and spatio- munications between neighbouring computational units, temporal receptive fields are encompassed by the same where either a single computational unit or a group of unified theory. computational units may be interpreted as corresponding This paper gives a condensed summary of a more gen- to a neuron or a group of neurons. Specifically, computa- eral theoretical framework for receptive fields derived and tional models involving diffusion equations arise in mean presented in [13], [19], [20], [21] and in turn developed to field theory for approximating the computations that are enable a consistent handling of receptive field responses performed by populations of neurons (Omurtag et al. [33]; in terms of provable covariance or invariance properties Mattia and Guidic [34]; Faugeras et al. [35]). under natural image transformations (see Figure 2). In relation to the early publications on this topic [13], [19], A. Structure of this article [20], this paper presents an improved version of that This paper is organized as follows: Section II gives theory leading to an improved model for the temporal an overview of and motivation to the assumptions that smoothing operation for the specific case of a time- the theory is based on. A set of structural requirements causal image domain [21], where the future cannot be is formulated to capture the effect of natural image accessed and the receptive fields have to be solely based transformations onto the illumination field that reaches on information from the present moment and a compact the retina and to guarantee internal consistency between buffer of the past. Specifically, this paper presents the image representations that are computed from receptive improved axiomatic structure on a compact form more field responses over multiple spatial and temporal scales. easy to access compared to the original publications and Section III describes linear receptive families that arise also encompassing the better time-causal model. as consequences of these assumptions for the cases of either a purely spatial domain or a joint spatio-temporal external illumination domain. The issue of how to perform relative normal- ization between receptive field responses over multiple spatial and temporal scales is treated, so as to enable comparisons between receptive field responses at differ- ent spatial and temporal scales. We also show how the influence of illumination transformations and exposure control mechanisms on the receptive field responses can be handled, by describing invariance properties obtained by applying the derived linear receptive fields over a viewing distance logarithmically transformed intensity domain. viewing direction Section IV shows examples of how spatial, spatio- relative motion position in 3-D chromatic and spatio-temporal receptive fields in the spatial sampling orientation in 3-D temporal sampling motion in 3-D retina, the LGN and the primary visual cortex can be well modelled by the derived receptive field families. Fig. 2. Basic factors that influence the formation of images for an Section V gives relations to previous work, including eye with a two-dimensional retina that observes objects in the three- conceptual and theoretical comparisons to previous use of dimensional world. In addition to the position, the orientation and the Gabor models of receptive fields, approaches for learning motion of the object in 3-D, the perspective projection onto the retina is affected by the viewing distance, the viewing direction and the relative receptive fields from image data and previous applications motion of the eye in relation to the object, the spatial and the temporal of a logarithmic transformation of the image intensities. sampling characteristics of the neurons in the retina as well the usually Finally, Section VI summarizes some of the main results. unknown external illumination field in relation to the geometry of the scene and the observer. II.ASSUMPTIONS UNDERLYING THE THEORY: It will be shown that the presented framework leads to STRUCTURALREQUIREMENTS predictions of receptive field profiles in good agreement In the following, we shall describe a set of structural with receptive measurements reported in the literature requirements that can be stated concerning: (i) spatial (Hubel and Wiesel [1], [2], [3]; DeAngelis et al. [4], geometry, (ii) spatio-temporal geometry, (iii) the image [5]; Conway and Livingstone [22]; Johnson et al. [23]). measurement process with its close relationship to the Specifically, explicit phenomenological models will be notion of scale, (iv) internal representations of image given of LGN neurons and simple cells in V1 and will be data that are to be computed by a general purpose vision compared to related models in terms of Gabor functions system and (v) the parameterization of image intensity (Marceljaˇ [24]; Jones and Palmer [25], [26]; Ringach [27], with regard to the influence of illumination variations. [28]), differences of Gaussians (Rodieck [29]) and Gaus- For modelling the image formation process, we will at sian derivatives (Koenderink and van Doorn [9]; Young any point on the retina approximate the spherical retina by [30]; Young et al. [31], [32]). Notably, the evolution a perspective projection onto the tangent plane of the reti- properties of the receptive field profiles in this model nal surface at that image point, below represented as the image plane. Additionally, we will approximate the possi- This genericity property is closely related to the basic bly non-linear geometric transformations regarding spatial property of the mammalian vision system, that the compu- and spatio-temporal geometry by local linearizations at tations performed in the retina, the LGN and the primary every image point, and corresponding to the derivative of visual cortex provide general purpose output that is used the possibly non-linear transformation. In these ways, the as input to higher-level visual areas. theoretical analysis can be substantially simplified, while b) Shift invariance: To ensure that the visual inter- still enabling accurate modelling of essential functional pretation of an object should be the same irrespective properties of receptive fields in relation to the effects of of its position in the image plane, we assume that the natural image transformations as arising from interactions first processing stages should be shift invariant, so that if with the environment. an object is moved a distance ∆x = (∆x1, ∆x2) in the image plane, the receptive field response should remain A. Static image data over a spatial domain on a similar form while shifted with the same distance. In the following, we will describe a theoretical model Formally, this requirement can be stated that the family for the computational function of applying visual recep- of image operators Ts should commute with the shift tive fields to local image patterns. operator defined by S∆x(f)(x) = f(x − ∆x): For time-independent data f over a two-dimensional spatial image domain, we would like to define a family Ts (S∆xf) = S∆x (Tsf) . (4) of image representations L(·; s) over a possibly multi- In other words, if we shift the input by a translation and dimensional scale parameter s, where the internal image then apply the receptive field operator Ts, the result should representations L(·; s) are computed by applying some be similar as applying the receptive field operator to the parameterized family of image operators Ts to the image original input and then shifting the result. data f: c) Convolution structure: Together, the assumptions L(·; s) = Ts f(·). (1) about linearity and shift-invariance imply that Ts will convolution Specifically, we will assume that the family of image correspond to a operator [36]. This implies L operators T should satisfy: that the representation can be computed from the image s data f by convolution with some parameterized family of a) Linearity: For the earliest processing stages to convolution kernels T (·; s): make as few irreversible decisions as possible, we assume that they should be linear L(·; s) = T (·; s) ∗ f. (5)

Ts(a1f1 + a2f2) = a1Tsf1 + b1Tsf2. (2) d) Semi-group structure over spatial scales: To en- sure that the transformation from any finer scale s to Specifically, linearity implies that any particular scale- 1 any coarser scale s + s should be of the same form for space properties (to be detailed below) that we derive for 1 2 any s > 0 (a requirement of algebraic closedness), we the zero-order image representation L will transfer to any 2 assume that the result of convolving two kernels T (·; s1) spatial derivative Lxα1 xα2 of L, so that 1 2 and T (·; s2) from the family with each other should be a L α1 α2 (·; s) = ∂ α1 α2 (Ts f) = Ts(∂ α1 α2 f). (3) kernel within the same family of kernels and with added x1 x2 x1 x2 x1 x2 parameter values T (·; s1 + s2): In this sense, the assumption of linearity reflects the requirement of a lack of bias to particular types of image T (·; s1) ∗ T (·; s2) = T (·; s1 + s2). (6) structures, with the underlying aim that the processing This assumption specifically implies that the represen- performed in the first processing stages should be generic, tation L(·; s ) at a coarse scale s can be computed to be used as input for a large variety of visual tasks. By 2 2 from the representation L(·; s ) at a finer scale s < s the assumption of linearity, local image structures that 1 1 2 by a convolution operation of the same form (5) as the are captured by e.g. first- or second-order derivatives will transformation from the original image data f while using be treated in a structurally similar manner, which would the difference in scale levels s − s as the parameter not necessarily be the case if the first local neighbourhood 2 1 processing stage of the first layer of receptive fields would L(·; s2) = T (·; s2 − s1) ∗ L(·; s1). (7) instead be genuinely non-linear.1 This property does in turn imply that if we are able to 1Note, however, that the assumption about linearity of some first derive specific properties of the family of transformations layers of receptive fields does, however, not exclude the possibility of Ts (to be detailed below), then these properties will not defining later stage non-linear receptive fields that operate on the output from the linear receptive fields, such as the computations performed by only hold for the transformation from the original image complex cells in the primary visual cortex. Neither does this assumption data f to the representations L(·; s) at coarser scales, but of linearity exclude the possibility of transforming the raw image also between any pair of scale levels s2 > s1, with the intensities by a pointwise non-linear mapping function prior to the application of linear receptive fields based on processing over local aim that image representations at coarser scales should neighbourhoods. In Section III-D it will be specifically shown that a be possible to regard as simplifications of corresponding pointwise logarithmic transformation of the image intensities prior to the image representations at finer scales. application of linear receptive fields has theoretical advantages in terms of invariance properties of derivative-based receptive field responses In terms of mathematical concepts, this form of alge- under local multiplicative illumination transformations. braic structure is referred to as a semi-group structure over spatial scales increase to coarser scales s > s0. Similarly, if a point is a local minimum point in the image plane, then the value Ts Ts = Ts +s . (8) 1 2 1 2 at this minimum point L(x0; s0) must not decrease to e) Scale covariance under spatial scaling transfor- coarser scales s > s0. mations: If a visual observer looks at the same object Formally, this requirement that new structures should from different distances, we would like the internal rep- not be created from finer to coarser scales, can be for- resentations derived from the receptive field responses to malized into the requirement of non-enhancement of local be sufficiently similar, so that the object can be recognized extrema, which implies that if at some scale s0 a point as the same object while appearing with a different size x0 is a local maximum (minimum) for the mapping from on the retina. Specifically, it is thereby natural to require x to L(x; s0), then (see Figure 3): that the receptive field responses should be of a similar • (∂sL)(x; s) ≤ 0 at any spatial maximum, form while resized in the image plane. • (∂sL)(x; s) ≥ 0 at any spatial minimum. This corresponds to a requirement of spatial scale This condition implies a strong condition on the class of covariance under uniform scaling transformations of the possible smoothing kernels T (·; s). 0 spatial domain x = Ss x:

0 0 0 z L (x ; s ) = L(x; s) ⇔ TSs(s) Ss f = Ss Ts f (9) 0 to hold for some transformation s = Ss(s) of the scale x parameter s. f) Affine covariance under affine transformations: If a visual observer looks at the same local surface patch Fig. 3. The requirement of non-enhancement of local extrema is a way of restricting the class of possible image operations by formalizing the from two different viewing directions, then the local notion that new image structures must not be created with increasing surface patch may be deformed in different ways onto the scale, by requiring that the value at a local maximum must not increase different views and with different amounts of perspective and that the value at a local minimum must not decrease from finer to coarser scales s. foreshortening from the different viewing directions. If we approximate the local deformations caused by the perspective mapping by local affine transformations, then B. Time-dependent image data over space-time the transformation between the two differently deformed To model the computational function of spatio-temporal views of the local surface patch can in turn be described receptive fields in time-dependent image patterns, we do by a composed local affine transformation x0 = A x. If we for a time-dependent spatio-temporal domain first inherit are to use receptive field responses as a basis for higher the structural requirements regarding a spatial domain and level visual operations, it is natural to require that the complement the spatial scale parameter s by a temporal receptive field response of an affine deformed image patch scale parameter τ. In addition, we assume: should remain on a similar form while being reshaped by a) Scale covariance under temporal scaling trans- a corresponding affine transformation. formations: If a similar type of spatio-temporal event This corresponds to a requirement of affine covariance f(x, t) occurs at different speeds, faster or slower, it is under general affine transformations x0 = A x: natural to require that the receptive field responses should be of a similar form, while occurring correspondingly 0 0 0 L (x ; s ) = L(x; s) ⇔ TA(s) A f = ATs f (10) faster or slower. to hold for some transformation s0 = A(s) of the scale This corresponds to a requirement of temporal scale parameter. covariance under a temporal scaling transformation of the temporal domain t0 = S t: g) Non-creation of new structure with increasing τ scale: If we apply the family of transformations Ts for L0(x0, t0; s0, τ 0) = L(x, t; s, τ) ⇔ computing representations at coarser scales from repre- T S f = S T f (11) sentations at finer scales according to (1) and (7), there Sτ (s,τ) τ τ s,τ 0 0 could be a potential risk that the family of transforma- to hold for some transformation (s , τ ) = Sτ (s, τ) of the tions could amplify spurious structures in the input to spatio-temporal scale parameters (s, τ). produce macroscopic amplifications in the representations b) Galilean covariance under Galilean transforma- at coarser scales that do not directly correspond to sim- tions: If an observer looks at the same object in the world plifications of corresponding structures in the original for different relative motions v = (v1, v2) between the image data. To prevent such undesirable phenomena from object and the observer, it is natural to require that the occurring, we require that local spurious structures must internal representations of the object should be sufficiently not be amplified and express this condition in terms of similar so as to enable a coherent of the the evolution properties over scales at local maxima and object under different relative motions relative to the minima in the image intensities as smoothed by the family observer. Specifically, we may require that the receptive of convolution kernels T (·; s): If a point x0 for some field responses under relative motions should remain on scale s0 is a local maximum point in the image plane, the same form while being transformed in a corresponding then the value at this maximum point L(x0; s0) must not way as the relative motion pattern. If we at any point in space-time locally linearize the smoothing property over temporal scales for the temporal possibly non-linear motion pattern x(t) = (x1(t), x2(t)) smoothing kernel over temporal scales by a local Galilean transformation x0 = x+v t over space- time L(·; τ2) = h(·; τ1 7→ τ2) ∗ L(·; τ1), (16) 0 0 0 0 0 f = Gv f ⇔ f (x , t ) = f(x, t) with x = x+v t, (12) where the temporal kernels h(t; τ) should for any triplets then the requirement of guaranteeing a consistent visual of temporal scale values and temporal delays τ1, τ2 and interpretation under different relative motions between the τ3 obey the transitive property object and the observer can be stated as a requirement of Galilean covariance: h(·; τ1 7→ τ2) ∗ h(·; τ2 7→ τ3) = h(·; τ1 7→ τ3). (17)

0 0 0 0 0 L (x , t ; s , τ ) = L(x, t; s, τ) ⇔ This weaker assumption of a cascade smoothing property

TGv (s,τ) Gv f = Gv Ts,τ f (13) (16) still ensures that a representation at a coarser tempo- ral scale τ2 should with a corresponding requirement of to hold for some transformation Gv(s, τ) of the spatio- an accompanying simplifying condition on the family of temporal scale parameters (s, τ). kernels h (to be detailed below) constitute a simplification c) Semi-group structure over temporal scales in the of the representation at a finer temporal scale τ1, while not case of a non-causal temporal domain: To ensure that the implying as hard constraints as a semi-group structure. representations between different spatio-temporal scale e) Non-enhancement of local space-time extrema in levels (s , τ ) and (s , τ ) should be sufficiently well- 1 1 2 2 the case of a non-causal temporal domain: In the case behaved internally, we will make use of different types of of a non-causal temporal domain, we again build on the assumptions depending on whether the temporal domain notion of non-enhancement of local extrema to guarantee is regarded as time-causal or non-causal. Over a time- that the representations at coarser spatio-temporal scales causal temporal domain, the future cannot be accessed, should constitute true simplifications of corresponding which is the basic condition for real-time visual percep- representations at finer scales Over a spatio-temporal tion by a biological . Over a non-causal temporal domain, we do, however, state the requirement in terms of domain, the temporal kernels may extend to the relative local extrema over joint space-time instead of over local future in relation to any pre-recorded time moment, which extrema over image space. If a point (x , t ) for some is sometimes used as a conceptual simplification when 0 0 scale (s , τ ) is a local maximum point over space-time, analysing pre-recorded time-dependent data although not 0 0 then the value at this maximum point L(x , t ; s , τ ) at all realistic in a real-world setting. 0 0 0 0 must not increase to coarser scales (s, τ) ≥ (s0, τ0). Sim- For the case of a non-causal temporal domain, we make ilarly, if a point is a local minimum point over space-time, use of a similar type of semi-group property (8) as for- then the value at this minimum point L(x0, t0; s0, τ0) mulated over a purely spatial domain, while extending the must not decrease to coarser scales (s, τ) ≥ (s0, τ0). semi-group property over both the spatial scale parameter Formally, this requirement of non-creation of new s and the temporal scale parameter τ: structure from finer to coarser spatio-temporal scales, can be stated as follows: If at some scale (s0, τ0) a point Ts1,τ1 Ts2,τ2 = Ts1+s2,τ1+τ2 . (14) (x0, t0) is a local maximum (minimum) for the mapping In analogy with the case of a purely spatial domain, this from (x, t) to L(x, t; s0, τ0), then requirement guarantees that the transformation from any • α (∂sL)(x, t; s, τ) + β (∂τ L)(x, t; s, τ) ≤ 0 at any finer spatio-temporal scale level (s1, τ1) to any coarser spatio-temporal maximum spatio-temporal scale level (s2, τ2) ≥ (s1, τ1) will always • α (∂sL)(x, t; s, τ) + β (∂τ L)(x, t; s, τ) ≥ 0 at any be of the same form (algebraic closedness) spatio-temporal minimum L(·, ·; s , τ ) = T L(·, ·; s , τ ). (15) should hold in any positive spatio-temporal direction 2 2 s2−s1,τ2−τ1 1 1 defined from any non-negative linear combinations of α Specifically, this assumption implies that if we are able to and β. This condition implies a strong condition on the establish desirable properties of the family of transforma- class of possible smoothing kernels T (·, ·; s, τ). tions Ts,τ (to be detailed below), then these relations hold f) Non-creation of new local extrema or zero- between any pair of spatio-temporal scale levels (s1, τ1) crossings for a purely temporal signal in the case of a and (s2, τ2) with (s2, τ2) ≥ (s1, τ1). non-causal temporal domain: In the case of a time-causal d) Cascade structure over temporal scales in the temporal domain, we do instead state a requirement for case of a time-causal temporal domain: Since it can be purely temporal signals, based on the cascade smoothing shown that the assumption of a semi-group structure over property (16). We require that for a purely temporal signal temporal scales leads to undesirable temporal dynamics f(t), the transformation from a finer temporal scale τ1 to in terms of e.g. longer temporal delays for a time-causal a coarser temporal scale τ2 must not increase the number temporal domain [37, Appendix A], we do for a time- of local extrema or the number of zero-crossings in the causal temporal domain instead assume a weaker cascade signal. III.IDEALIZED FAMILIES g(x1, x2; s) A. Spatial image domain 5 Based on the above assumptions in Section II-A, it can 0 be shown [13] that when complemented with certain regu- larity assumptions in terms of Sobolev norms, they imply -5 that a spatial scale-space representation L as determined -5 0 5

gx1 (x1, x2; s) gx2 (x1, x2; s) by these must satisfy a diffusion equation of the form

15 15

10 10 1 T T ∂sL = ∇ (Σ0∇L) − δ ∇L (18) 5 5 2 0

0 0

!5 !5 for some positive semi-definite covariance matrix Σ0 !10 !10 and some translation vector δ0. In terms of convolution

!15 !15 !15 !10 !5 0 5 10 15 !15 !10 !5 0 5 10 15 kernels, this corresponds to Gaussian kernels of the form gx1x1 (x1, x2; s) gx1x2 (x1, x2; s) gx2x2 (x1, x2; s) 15 15 15 1 T −1 −(x−δs) Σs (x−δs)/2 10 10 10 g(x;Σ , δ ) = √ e , s s N/2 5 5 5 (2π) det Σs

0 0 0 (19)

!5 !5 !5 which for a given Σs = s Σ0 and a given δs = s δ0 !10 !10 !10 satisfy (18). If we additionally require these kernels to

!15 !15 !15 !15 !10 !5 0 5 10 15 !15 !10 !5 0 5 10 15 !15 !10 !5 0 5 10 15 be mirror symmetric through the origin, then we obtain Fig. 4. Spatial receptive fields formed by the 2-D Gaussian kernel affine Gaussian kernels with its partial derivatives up to order two. The corresponding family of receptive fields is closed under translations, rotations and scaling 1 T −1 g(x; Σ) = √ e−x Σ x/2. (20) transformations, meaning that if the underlying image is subject to a set N/2 of such image transformations then it will always be possible to find (2π) det Σ some possibly other receptive field such that the receptive field responses of the original image and the transformed image can be matched. Their spatial derivatives constitute a canonical family for expressing receptive fields over a spatial domain that can

g(x;Σ1) g(x;Σ2) g(x;Σ3) be summarized on the form

5 5 5 T (x; s, Σ) = g(x; s Σ). (21)

0 0 0 Incorporating the fact that spatial derivatives of these ker- nels are also compatible with the assumptions underlying -5 -5 -5

-5 0 5 -5 0 5 -5 0 5

gϕ1 (x;Σ1) gϕ2 (x;Σ2) gϕ3 (x;Σ3)

5 5 5

0 0 0

-5 -5 -5

-5 0 5 -5 0 5 -5 0 5

gϕ1ϕ1 (x;Σ1) gϕ2ϕ2 (x;Σ2) gϕ3ϕ3 (x;Σ3)

5 5 5

0 0 0

-5 -5 -5

-5 0 5 -5 0 5 -5 0 5 Fig. 5. Spatial receptive fields formed by affine Gaussian kernels and directional derivatives of these, here using three different covariance matrices Σ1, Σ2 and Σ3 corresponding to the directions θ1 = π/6, θ2 = π/3 and θ3 = 2π/3 of the major eigendirection of the covariance matrix and with first- and second-order directional derivatives com- puted in the corresponding orthogonal directions ϕ1, ϕ2 and ϕ3. The corresponding family of receptive fields is closed under general affine transformations of the spatial domain, including translations, rotations, Fig. 6. Distribution of affine Gaussian receptive fields corresponding to scaling transformations and perspective foreshortening (although this a uniform distribution on a hemisphere regarding zero-order smoothing figure only illustrates variabilities in the orientation of the filter, thereby kernels. In the most idealized version of the theory, one can think of disregarding variations in both the size and the degree of elongation). all affine receptive fields with their directional derivatives in preferred This closedness property implies that receptive field responses computed directions aligned to the eigendirections of the covariance matrix Σ as from different views of a smooth local surface patch can be perfectly being present at any position in the image domain. When restricted to matched, if the transformation between the two views can be modelled a limited number of receptive fields in an actual implementation, there as a local affine transformation. is also an issue of distributing a fixed number of receptive fields over the spatial coordinates x = (x1, x2) and the filter parameters s and Σ. this theory, this does specifically for the case of a two- operations over space and time, this leads to zero-order dimensional spatial image domain lead to spatial receptive spatio-temporal receptive fields of the form [13], [19]: fields that can be compactly summarized on the form T (x1, x2, t; s, τ; v, Σ) = g(x1−v1t, x2−v2t; s, Σ) h(t; τ). m1 m2 (23) Tϕm1 ⊥ϕm2 (x1, x2; s, Σ) = ∂ϕ ∂⊥ϕ (g(x1, x2; sΣ)) , (22) After combining that result with the results from cor- where responding theoretical analysis for a time-causal spatio- temporal domain in [13], [21], the resulting spatio- • x = (x1, x2) denote the spatial coordinates, • s denotes the spatial scale, temporal derivative kernels constituting the spatio- • Σ denotes a spatial covariance matrix determining temporal extension of the spatial receptive field model the shape of a spatial affine Gaussian kernel, (22) can be reparametrised and summarized on the fol- lowing form (see [13], [19], [20], [21]): • m1 and m2 denote orders of spatial differentiation, • ∂ϕ = cos ϕ ∂x + sin ϕ ∂x , ∂⊥ϕ = sin ϕ ∂x − 1 2 1 Tϕm1 ⊥ϕm2 t¯n (x1, x2, t; s, τ; v, Σ) cos ϕ ∂ denote spatial directional derivative opera- x2 = ∂m1 ∂m2 ∂n (g(x − v t, x − v t; s, Σ) h(t; τ)) , tors in two orthogonal directions ϕ and ⊥ϕ aligned ϕ ⊥ϕ t¯ 1 1 2 2 (24) with the eigenvectors of the covariance matrix Σ, 1 −xT Σ−1x/2s • g(x; s, Σ) = √ e is an affine where 2πs det Σ Gaussian kernel with its size determined by the • x = (x1, x2) denote the spatial coordinates, spatial scale parameter s and its shape by the spatial • t denotes time, covariance matrix Σ. • s denotes the spatial scale, Figure 4 and Figure 5 show examples of spatial receptive • τ denotes the temporal scale, T fields from this family up to second order of spatial • v = (v1, v2) denotes a local image velocity, differentiation. Figure 4 shows partial derivatives of the • Σ denotes a spatial covariance matrix determining Gaussian kernel for the specific case when the covariance the shape of a spatial affine Gaussian kernel, matrix Σ is restricted to a unit matrix and the Gaussian • m1 and m2 denote orders of spatial differentiation, kernel thereby becomes rotationally symmetric. The re- • n denotes the order of temporal differentiation, sulting family of receptive fields is closed under scaling • ∂ϕ = cos ϕ ∂x1 + sin ϕ ∂x2 and ∂⊥ϕ = sin ϕ ∂x1 − transformations over the spatial domain, implying that if cos ϕ ∂x2 denote spatial directional derivative opera- an object is seen from different distances to the observer, tors in two orthogonal directions ϕ and ⊥ϕ aligned then it will always be possible to find a transformation of with the eigenvectors of the covariance matrix Σ, the scale parameter s between the two image domains so • ∂t¯ = v1∂x1 + v2∂x2 + ∂t is a velocity-adapted that the receptive field responses computed from the two temporal derivative operator aligned to the direction T image domains can be matched. Figure 5 shows examples of the local image velocity v = (v1, v2) , 1 −xT Σ−1x/2s • g(x; s, Σ) = √ e is an affine of affine Gaussian receptive fields for covariance matrices 2πs det Σ Σ that do not correspond to rescaled copies of the unit Gaussian kernel with its size determined by the matrix. The resulting full family of affine Gaussian deriva- spatial scale parameter s and its shape determined tive kernels is closed under general affine transformations, by the spatial covariance matrix Σ, implying that for two different perspective views of a local • g(x1 − v1t, x2 − v2t; s, Σ) denotes a spatial affine smooth surface patch, it will always be possible to find Gaussian kernel that moves with image velocity v = a transformation of the covariance matrices Σ between (v1, v2) in space-time and the two domains so that the receptive field responses can • h(t; τ) is a temporal smoothing kernel over time be matched, if the transformation between the two image corresponding√ to a Gaussian kernel h(t; τ) = domains is approximated by a local affine transformation. g(t; τ) = 1/ 2πτ exp(−t2/2τ) in the case of non- In the most idealized version of the theory, one should causal time or a cascade of first-order integrators or think of receptive fields for all combinations of filter equivalently truncated exponential kernels coupled in parameters as being present at every image point, as cascade h(t; τ) = hcomposed(·; µ) according to (26) illustrated in Figure 6 concerning affine Gaussian recep- over a time-causal temporal domain. tive fields over different orientations in image space and This family of spatio-temporal scale-space kernels can be different eccentricities. seen as a canonical family of linear receptive fields over a spatio-temporal domain. B. Spatio-temporal image domain For the case of a time-causal temporal domain, the Over a non-causal spatio-temporal domain, correspond- result states that truncated exponential kernels of the form ing arguments as in Section III-A lead to a similar  1 e−t/µk t ≥ 0 h (t; µ ) = µk (25) form of diffusion equation as in Equation (18), while exp k 0 t < 0 expressed over the joint space-time domain p = (x, t) coupled in cascade constitute the natural temporal and with δ interpreted as a local drift velocity. After 0 smoothing kernels. These do in turn lead to a composed splitting the composed affine Gaussian spatio-temporal temporal convolution kernel of the form smoothing kernel corresponding to (19) while expressed K over the joint space-time domain into separate smoothing hcomposed(·; µ) = ∗k=1hexp(·; µk) (26) and corresponding to a set of first-order integrators cou- −T (x, t; s, τ) pled in cascade (see Figure 7).

. .

f_in f_out

Tx(x, t; s, τ) Tt(x, t; s, τ, δ) . .

Fig. 7. Electric wiring diagram consisting of a set of resistors and capacitors that emulate a series of first-order integrators coupled in cascade, if we regard the time-varying voltage fin as representing the time varying input signal and the resulting output voltage fout as representing the time varying output signal at a coarser temporal scale. According to the theory of temporal scale-space kernels for one- Txx(x, t; s, τ) Txt(x, t; s, τ) Ttt(x, t; s, τ) dimensional signals (Lindeberg [38], [21]; Lindeberg and Fagerstrom¨ [39]), the corresponding equivalent truncated exponential kernels are the only primitive temporal smoothing kernels that guarantee both temporal causality and non-creation of local extrema (alternatively zero-crossings) with increasing temporal scale.

Two natural ways of distributing the discrete time constants µk over temporal scales are studied in detail in [21], [37] corresponding to either a uniform or a Fig. 8. Space-time separable kernels Txmtn (x, t; s, τ) = logarithmic distribution in terms of the composed variance ∂xmtn (g(x; s) h(t; τ)) up to order two obtained as the composition of Gaussian kernels over the spatial domain x and a cascade of truncated K exponential kernels over the temporal domain t with a logarithmic X 2 distribution of the intermediate temporal scale levels that approximates τK = µk. (27) √ the time-causal limit kernel (s = 1, τ = 1, K = 7, c = 2). k=1 The corresponding family of spatio-temporal receptive fields is closed Specifically, it is shown in [21] that in the case of a under spatial scaling transformations as well as under temporal scaling transformations for temporal scaling factors that are integer powers logarithmic distribution of the discrete temporal scale of the distribution parameter c of the temporal smoothing kernel. levels, it is possible to consider an infinite number of (Horizontal axis: space x. Vertical axis: time t.) temporal scale levels that cluster infinitely dense near zero temporal scale −T (x, t; s, τ, v) τ τ τ ... 0 , 0 , 0 , τ , c2τ , c4τ , c6τ ,... (28) c6 c4 c2 0 0 0 0 so that a scale-invariant time-causal limit kernel Ψ(t; τ, c) can be defined obeying self-similarity and scale covariance over temporal scales and with a Fourier transform of the form Tx(x, t; s, τ, v) Tt(x, t; s, τ, δ) ∞ Y 1 Ψ(ˆ ω; τ, c) = √ √ . (29) −k 2 k=1 1 + i c c − 1 τ ω Figure 8 and Figure 9 show spatio-temporal kernels over a 1+1-dimensional spatio-temporal domain using approx- imations of the time-causal limit kernel for temporal smoothing over the temporal domain and the Gaussian Txx(x, t; s, τ, v) Txt(x, t; s, τ, v) Ttt(x, t; s, τ, v) kernel for spatial smoothing over the spatial domain. Figure 8 shows space-time separable receptive fields corresponding to image velocity v = 0, whereas Figure 9 shows unseparable velocity-adapted receptive fields cor- responding to a non-zero image velocity v 6= 0. The family of space-time separable receptive fields for zero image velocities is closed under spatial scaling transformations for arbitrary spatial scaling factors as well Fig. 9. Velocity-adapted spatio-temporal kernels as for temporal scaling transformations with temporal Txmtn (x, t; s, τ, v) = ∂xmtn (g(x − vt; s) h(t; τ)) up to scaling factors that are integer powers of the distribution order two obtained as the composition of Gaussian kernels over the spatial domain x and a cascade of truncated exponential kernels parameter c of the time-causal limit kernel. The full over the temporal domain t with a logarithmic distribution of the family of velocity-adapted receptive fields for general intermediate temporal scale levels that approximates√ the time-causal non-zero image velocities is additionally closed under limit kernel (s = 1, τ = 1, K = 7, c = 2, v = 0.5). In addition to spatial and temporal scaling transformations, the corresponding family of receptive fields is also closed under Galilean transformations. (Horizontal axis: space x. Vertical axis: time t.) Galilean transformations, corresponding to variations in given by the relative motion between the objects in the world and the observer. Given that the full families of receptive Tϕm1 ⊥ϕm2 t¯n,norm(x1, x2, t; s, τ; v, Σ) m γ /2 fields are explicitly represented in the vision system, this m1γs/2 2 s nγτ /2 = sϕ s⊥ϕ τ means that it will be possible to perfectly match receptive ∂m1 ∂m2 ∂n (g(x − v t, x − v t; s, Σ) h(t; τ)) , field responses computed under the following types of ϕ ⊥ϕ t¯ 1 1 2 2 (31) natural image transformations: (i) objects of different size in the image domain as arising from e.g. viewing the where γs and γτ denote the spatial and temporal scale same object from different distances, (ii) spatio-temporal normalization parameters of γ-normalized derivatives events that occur with different speed, faster or slower, and specifically the choice γs = 1 and γτ = 1 and (iii) objects and spatio-temporal that are viewed with leads to maximum scale invariance in the sense that different relative motions between the objects/event and the magnitude response of the spatio-temporal recep- the visual observer. tive field will be invariant under independent scaling If additionally the spatial smoothing is performed over transformations of the spatial and the temporal domains 0 0 0 the full family of spatial covariance matrices Σ, then (x1, x2, t ) = (Ss x1,Ss x2,Sτ t), provided that both receptive field responses can also be matched (iv) between the spatial and temporal scale levels are appropriately 0 0 0 2 2 2 different views of the same smooth local surface patch. matched (sϕ, s⊥ϕ, τ ) = (Ss sϕ,Ss s⊥ϕ,Sτ τ). c) Scale-normalized spatial receptive fields in the C. Scale normalisation of spatial and spatio-temporal case of a time-causal spatio-temporal domain: For the receptive fields case of a time-causal spatio-temporal domain, where the When computing receptive field responses over multi- temporal smoothing operation in the spatio-temporal re- ple spatial and temporal scales, there is an issue about how ceptive field model is performed by truncated exponential the receptive field responses should be normalized so as kernels coupled in cascade h(t; τ) = hcomposed(·; µ) to enable appropriate comparisons between receptive field (26), the corresponding scale-normalized spatio-temporal responses at different scales. Issues of scale normalisation derivative kernel corresponding to the spatio-temporal of the derivative based receptive fields defined from scale- receptive field model (24) is given by space operations are treated in [40], [41], [42] regarding T m1 m2 ¯n (x1, x2, t; s, τ; v, Σ) spatial receptive fields and in [21], [37], [43] regarding ϕ ⊥ϕ t ,norm m1γs/2 m2γs/2 spatio-temporal receptive fields. = sϕ s⊥ϕ αn,γτ (τ) m1 m2 n a) Scale-normalized spatial receptive fields: Let sϕ ∂ϕ ∂⊥ϕ∂t¯ (g(x1 − v1t, x2 − v2t; s, Σ) h(t; τ)) , and s⊥ϕ denote the eigenvalues of the composed affine (32) covariance matrix s Σ in the spatial receptive field model where γ and γ denote the spatial and and temporal scale (22) and let ∂ϕ and ∂⊥ϕ denote directional derivative op- s τ erators along the corresponding eigendirections. Then, the normalization parameters of γ-normalized derivatives and scale-normalized spatial derivative kernel corresponding αn,γτ (τ) is the temporal scale normalization factor, which to the receptive field model (22) is given by for the case of variance-based normalization is given by

nγτ /2 αn,γτ (τ) = τ (33) Tϕm1 ⊥ϕm2 ,norm(x1, x2; s, Σ) = m γ /2 m1γs/2 2 s m1 m2 in agreement with (31) while for the case of Lp- sϕ s⊥ϕ ∂ϕ ∂⊥ϕ (g(x1, x2; sΣ)) , (30) normalization it is given by [21, Equation (76)] where γs denotes the spatial scale normalization parame- Gn,γτ ter of γ-normalized derivatives and specifically the choice αn,γτ (τ) = , (34) khtn (·; τ)kp γs = 1 leads to maximum scale invariance in the sense that the magnitude response of the spatial receptive field with Gn,γτ denoting the Lp-norm of the nth order scale- will be covariant under uniform spatial scaling trans- normalized derivative of a non-causal Gaussian temporal 0 0 formations (x1, x2) = (Ss x1,Ss x2), provided that the kernel with scale normalization parameter γτ . In the 0 0 spatial scale levels are appropriately matched (sϕ, s⊥ϕ) = specific case when the temporal smoothing is performed 2 2 (Ss sϕ,Ss s⊥ϕ). using the scale-invariant limit kernel (29), the magnitude b) Scale-normalized spatial receptive fields in the response will for the maximally scale invariant choice case of a non-causal spatio-temporal domain: For of scale normalization parameters γs = 1 and γτ = 1 the case of a non-causal spatio-temporal domain, be invariant under independent scaling transformations 0 0 0 where the temporal smoothing operation in the spatio- of the spatial and the temporal domains (x1, x2, t ) = temporal receptive field model is performed by a (Ss x1,Ss x2,Sτ t) for general spatial scaling factors Ss j non-causal Gaussian√ temporal kernel h(t; τ) = and for temporal scaling factors Sτ = c that are g(t; τ) = 1/ 2πτ exp(−t2/2τ), the scale-normalized integer powers of the distribution parameter c of the spatio-temporal derivative kernel corresponding to the scale-invariant limit kernel, provided that both the spa- spatio-temporal receptive field model (24) is with corre- tial and temporal scale levels are appropriately matched 0 0 0 2 2 2 sponding notation regarding the spatial domain as in (30) (sϕ, s⊥ϕ, τ ) = (Ss sϕ,Ss s⊥ϕ,Sτ τ). D. Invariance to local multiplicative illumination varia- be different for different points in the world as tions or variations in exposure parameters mapped to corresponding image coordinates (x1, x2) The treatment so far has been concerned with modelling over time t, ˜ π d receptive fields under natural geometric image transfor- (iii) Ccam(f(t)) = 4 f represents the possibly time- mations, modelled as local scaling transformations, local dependent internal camera parameters with the ratio ˜ affine transformations and local Galilean transformations f = f/d referred to as the effective f-number, where representing the essential dimensions in the variability d denotes the diameter of the and f the focal of a local linearization of the perspective mapping from distance, and 2 2 a local surface patch in the world to the tangent plane (iv) V (x1, x2) = −2 log(1 + x1 + x2) represents a of the retina. A complementary issue concerns how to geometric natural vignetting effect corresponding to 4 model receptive field responses under variations in the the factor log cos (φ) for a planar image plane, with external illumination and under variations in the internal φ denoting the angle between the viewing direction exposure mechanisms of the eye that adapt the diameter (x1, x2, f) and the surface normal (0, 0, 1) of the of the pupil and the sensitivity of the photoreceptors image plane. This vignetting term disappears for a to the external illumination. In this section, we will spherical camera model. present a solution for this problem regarding the subset From the structure of Equation (37) we can note that for of intensity transformations that can be modelled as local any non-zero order of spatial differentiation with at least multiplicative intensity transformations. either m1 > 0 or m2 > 0, the influence of the internal ˜ To obtain theoretically well-founded handling of image camera parameters in Ccam(f(t)) will disappear because data under illumination variations, it is natural to represent of the spatial differentiation with respect to x1 or x2, and the image data on a logarithmic luminosity scale so will the effects of any other multiplicative exposure control mechanism. Furthermore, for any multiplicative f(x , x , t) ∼ log I(x , x , t). (35) 0 1 2 1 2 illumination variation i (x1, x2) = C i(x1, x2), where Specifically, receptive field responses that are computed C is a scalar constant, the logarithmic luminosity will 0 from such a logarithmic parameterization of the image be transformed as log i (x1, x2) = log C + log i(x1, x2), luminosities can be interpreted physically as a super- which implies that the dependency on C will disappear position of relative variations of surface structure and after spatial or temporal differentiation. illumination variations. Let us assume: (i) a perspective Thus, given that the image measurements are performed camera model extended with (ii) a thin circular lens for on a logarithmic brightness scale, the spatio-temporal gathering incoming light from different directions and receptive field responses will be automatically invariant (iii) a Lambertian illumination model extended with (iv) a under local multiplicative illumination variations as well spatially varying albedo factor for modelling the light that as under local multiplicative variations in the exposure is reflected from surface patterns in the world. Then, it parameters of the retina and the eye. can be shown [19, Section 2.3] that a spatio-temporal receptive field response IV. COMPUTATIONAL MODELLING OF BIOLOGICAL RECEPTIVE FIELDS Lϕm1 ⊥ϕm2 t¯n (·, ·; s, τ) = ∂ϕm1 ⊥ϕm2 t¯n Ts,τ f(·, ·) (36) In two comprehensive reviews, DeAngelis et al. [4], of the image data f, where T represents the spatio- s,τ [5] present overviews of spatial and temporal response temporal smoothing operator (here corresponding to a properties of (classical) receptive fields in the central spatio-temporal smoothing kernel of the form (23)) can visual pathways. Specifically, the authors point out the be expressed as limitations of defining receptive fields in the spatial

Lϕm1 ⊥ϕm2 t¯n (x1, x2, t; s, τ) = domain only and emphasize the need to characterize  receptive fields in the joint space-time domain, to describe = ∂ϕm1 ⊥ϕm2 t¯n Ts,τ log ρ(x1, x2, t) + log i(x1, x2, t) how a neuron processes the visual image. Conway and ˜  Livingstone [22] and Johnson et al. [23] show results of + log Ccam(f(t)) + V (x1, x2) , corresponding investigations concerning spatio-chromatic (37) receptive fields. where In the following, we will describe how the above (i) ρ(x1, x2, t) is a spatially dependent albedo factor derived idealized functional models of linear receptive that reflects properties of surfaces of objects in the fields can be used for modelling the spatial, spatio- environment with the implicit understanding that this chromatic and spatio-temporal response properties of bi- entity may in general refer to points on different ological receptive fields. Indeed, it will be shown that the surfaces in the world depending on the viewing di- derived idealized functional models lead to predictions of rection and thus the (possibly time-dependent) image receptive field profiles that are qualitatively very similar position (x(t), y(t)), to all the receptive field types presented in (DeAngelis et (ii) i(x1, x2, t) denotes a spatially dependent illumina- al. [4], [5]) and schematic simplifications of most of the tion field with the implicit understanding that the receptive fields shown in (Conway and Livingstone [22]) amount of incoming light on different surfaces may and (Johnson et al. [23]). hxxt(x, t; s, τ) −hxxtt(x, t; s, τ)

Fig. 12. Spatio-chromatic receptive field response of a double- opponent neuron as reported by Conway and Livingstone [22, Fig- ure 2, Page 10831], with the colour channels L, M and S essentially corresponding to red, green and blue, respectively. (From these L, M Fig. 10. Computational modelling of space-time separable receptive and S colour channels, corresponding red/green and yellow/blue colour- field profiles in the lateral geniculate nucleus (LGN) as reported by opponent channels can be formed from the differences between L to M DeAngelis et al. [4] using idealized spatio-temporal receptive fields and between L+M to S.) of the form T (x, t; s, τ) = ∂xm ∂tn (g(; s) h(t; τ)) according to Equation (24) with the temporal smoothing function h(t; τ) modelled as a cascade of first-order integrators/truncated exponential kernels of the form (26): (left) a “non-lagged cell” modelled using first-order temporal derivatives, (right) a “lagged cell” modelled√ using second-order√ temporal derivatives. Parameter values with σx = s and σt = τ: (a) hxxt: σx = 0.5 degrees, σt = 40 ms. (b) hxxtt: σx = 0.6 degrees, σt = 60 ms. (Horizontal dimension: space x. Vertical dimension: time t.)

∇2g(x, y; s)

1.5

1.0

0.5

0.0

-0.5

-1.0

-1.5 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 Fig. 13. Idealized spatio-chromatic receptive fields over the spatial Fig. 11. Computational modelling the spatial component of receptive domain corresponding to the application of the Laplacian operator fields in the LGN using the Laplacian of the Gaussian: (left) Receptive to positive and negative red/green and yellow/blue colour opponent fields in the LGN have approximately circular center-surround responses channels, respectively. These receptive fields can be seen as idealized in the spatial domain, as reported by DeAngelis et al. [4]. (right) models of the spatial component of double-opponent spatio-chromatic In terms of Gaussian derivatives, this spatial response profile can be receptive fields in the LGN. modelled by the Laplacian of the Gaussian ∇2g(x, y; s) = (x2 + 2 3 2 2 √ y − 2s)/(2πs ) exp(−(x + y )/2s), here with σs = s = 0.6 in units of degrees of visual angle. defined as a cell for which the second lobe dominates (Figure 10(right)). A. Spatial and spatio-temporal receptive fields in the When using a time-causal temporal smoothing kernel, LGN the first peak of a first-order temporal derivative will be Regarding visual receptive fields in the lateral genic- strongest, whereas the second peak of a second-order ulate nucleus (LGN), DeAngelis et al. [4], [5] report temporal derivative will be strongest (see [21, Figure 2]). that most neurons (i) have approximately circular center- Thus, according to this theory, non-lagged LGN cells can surround organization in the spatial domain and that be seen as corresponding to first-order time-causal tem- (ii) most of the receptive fields are separable in space- poral derivatives, whereas lagged LGN cells can be seen time. There are two main classes of temporal responses as corresponding to second-order time-causal temporal for such cells: (i) a “non-lagged cell” is defined as derivatives. a cell for which the first temporal lobe is the largest The spatial response, on the other hand, shows a one (Figure 10(left)), whereas (ii) a “lagged cell” is high similarity to a Laplacian of a Gaussian, leading to an idealized receptive field model of the form [19, ∂xg(x, y; Σ) Equation (108)] 2

1

0 hLGN (x, y, t; s, τ) = ±(∂xx+∂yy) g(x, y; s) ∂tn h(t; τ). (38) -1 Figure 11 shows a comparison between the spatial com- -2 ponent of a receptive field in the LGN with a Laplacian of -2 -1 0 1 2 the Gaussian. This model can also be used for modelling Fig. 14. Computational modelling of a receptive field profile over spatial on-center/off-surround and off-center/on-surround the spatial domain in the primary visual cortex (V1) as reported by receptive fields in the retina. Figure 10 shows results DeAngelis et al. [4], [5] using affine Gaussian derivatives: (middle) Receptive field profile of a over image intensities as of modelling space-time separable receptive fields in the reconstructed from cell recordings, with positive weights represented LGN in this way, using a cascade of truncated exponential as green and negative weights by red. (left) Stylized simplification of kernels of the form (26) for temporal smoothing over the the receptive field shape. (right) Idealized model of the receptive field from a first-order directional derivative of an affine Gaussian kernel temporal domain. ∂ g(x, y; Σ) = ∂ g(x, y; λ , λ ) according to (20), here with x √ x xp y Regarding the spatial domain, the model in terms of σx = λx = 0.45 and σy = λy = 1.4 in units of degrees of visual angle, and with positive weights with respect to image intensities spatial Laplacians of Gaussians (∂xx + ∂yy) g(x, y; s) is closely related to differences of Gaussians, which have represented by white and negative values by violet. previously been shown to constitute a good approximation ∂⊥ϕg(x, y; Σ) of the spatial variation of receptive fields in the retina and the LGN (Rodieck [29]). This property follows from the fact that the rotationally symmetric Gaussian satisfies the isotropic diffusion equation

1 ∇2L(x, y; s) = ∂ L(x, y; s) 2 s L(x, y; s + ∆s) − L(x, y; s) ≈ Fig. 15. Computational modelling of a spatio-chromatic receptive ∆s field profile over the spatial domain for a double-opponent simple DOG(x, y; s, ∆s) cell in the primary visual cortex (V1) as measured by Johnson et al. = , (39) [23] using affine Gaussian derivatives over a colour-opponent channel: ∆s (left) Responses to L-cones corresponding to long wavelength red cones, with positive weights represented by red and negative weights which implies that differences of Gaussians can be in- by blue. (middle) Responses to M-cones corresponding to medium terpreted as approximations of derivatives over scale and wavelength green cones, with positive weights represented by red and negative weights by blue. (right) Idealized model of the receptive field hence to Laplacian responses. Conceptually, this implies from a first-order directional derivative of an affine Gaussian kernel very good agreement with the spatial component of the ∂⊥ϕg(x, y; Σ) according√ to (20) over a red-green√ colour-opponent LGN model (38) in terms of Laplacians of Gaussians. channel for σ1 = λ1 = 0.6 and σ2 = λ2 = 0.2 in units of et al. degrees of visual angle, α = 67 degrees and with positive weights for More recently, Bonin [44] have found that LGN the red-green colour-opponent channel represented by red and negative responses in cats are well described by difference-of- values by green. Gaussians and temporal smoothing complemented by a non-linear gain control mechanism (not modelled here). colour-opponent channels,

   1 1 1    f 3 3 3 R B. Double-opponent spatio-chromatic receptive fields in 1 1  u  =  2 − 2 0   G  , (40) the LGN 1 1 v 2 2 −1 B In a study of spatio-chromatic response properties respectively, we obtain equivalent spatio-chromatic of V1 neurons in the alert macaque monkey, Conway receptive fields corresponding to red-center/green- and Livingstone [22] describe receptive fields with ap- surround, green-center/red-surround, yellow-center/blue- proximately circular red/green and yellow/blue colour- surround or blue-center/yellow-surround, respectively, as opponent response properties over the spatio-chromatic shown in Figure 13 and corresponding to the following domain, see Figure 12. Such cells are referred to as spatio-chromatic receptive field model applied to the double-opponent cells, since they simultaneously compute RGB channels both spatial and chromatic opponency. According to Con- way and Livingstone [22], this cell type can be regarded as hdouble−opponent(x, y; s) = the first layer of spatially opponent colour computations.  1 − 1 0  ± (∂ + ∂ ) g(x, y; s) 2 2 . (41) If we, motivated by the previous application of Lapla- xx yy 1 1 −1 cian of Gaussian functions to model rotationally sym- 2 2 metric on-center/off-surround and off-center/on-surround In this respect, these spatio-chromatic receptive fields can receptive fields in the LGN (38), apply the Laplacian be used as an idealized model for the spatio-chromatic of the Gaussian operator to red/green and yellow/blue response properties for double-opponent cells. hxt(x, t; s, τ) −hxxt(x, t; s, τ)

hxx(x, t; s, τ, v) −hxxx(x, t; s, τ, v)

Fig. 16. Computational modelling of simple cells in the primary visual cortex (V1) based on neural cell recordings reported by DeAngelis et al. [4] and using idealized spatio-temporal receptive fields of the form T (x, t; s, τ, v) = ∂xm ∂tn (g(x − vt; s) h(t; τ)) according to Equation (24) and with the temporal smoothing function h(t; τ) modelled as a cascade of first-order integrators/truncated exponential kernels of the form (26): (upper part) Separable receptive fields corresponding to mixed derivatives of first- or second-order derivatives over space with first-order derivatives over time. (lower√ part) Inseparable√ velocity-adapted receptive fields corresponding to second- or third-order derivatives over space. Parameter values with σx = s and σt = τ: (a) hxt: σx = 0.6 degrees, σt = 60 ms. (b) hxxt: σx = 0.6 degrees, σt = 80 ms. (c) hxx: σx = 0.7 degrees, σt = 50 ms, v = 0.007 degrees/ms. (d) hxxx: σx = 0.5 degrees, σt = 80 ms, v = 0.004 degrees/ms. (Horizontal axis: Space x in degrees of visual angle. Vertical axis: Time t in ms.) C. Spatial, spatio-chromatic and spatio-temporal recep- model provides a better parameterization of the spatio- tive fields in V1 temporal receptive field model over a non-causal spatio- Concerning the neurons in the primary visual cortex temporal domain based on the Gaussian spatio-temporal (V1), DeAngelis et al. [4], [5] describe that their receptive scale-space concept. fields are generally different from the receptive fields in This model or earlier versions of it has in turn been the LGN in the sense that they are (i) oriented in the exploited for modelling of biological receptive fields by spatial domain and (ii) sensitive to specific ve- Lowe [47], May and Georgeson [48], Hesse and George- et al. locities. Cells (iii) for which there are precisely localized son [49], Georgeson [50], Wallis and Georgeson “on” and “off” subregions with (iv) spatial summation [51], Hansen and Neumann [52], Wang and Spratling et al. within each subregion, (v) spatial antagonism between [53], Mahmoodi [54], [55] and Pei [56]. on- and off-subregions and (vi) whose visual responses A. Relations to modelling by Gabor functions to stationary or moving spots can be predicted from the Gabor functions [57] spatial subregions are referred to as simple cells (Hubel and Wiesel [1], [2], [3]). G(x; s, ω) = e−iωx g(x; s), (42) Figure 14 shows an example of the spatial dependency have been frequently used for modelling spatial recep- of a simple cell, that can be well modelled by a first-order tive fields (Marceljaˇ [24]; Jones and Palmer [25], [26]; affine Gaussian derivative over image intensities. Fig- Ringach [27], [28]) motivated by their property of min- ure 15 shows corresponding results for a color-opponent imizing the uncertainty relation. This motivation can, receptive field of a simple cell in V1, that can be modelled however, be questioned on both theoretical and empiri- as a first-order affine Gaussian spatio-chromatic derivative cal grounds. Stork and Wilson [58] argue that (i) only over an R-G colour-opponent channel. complex-valued Gabor functions that cannot describe Figure 16 shows the result of modelling the spatio- single receptive field minimize the uncertainty relation, temporal receptive fields of simple cells in V1 in this (ii) the real functions that minimize this relation are way, using the general idealized model of spatio-temporal Gaussian derivatives rather than Gabor functions and receptive fields in Equation (24) in combination with a (iii) comparisons among Gabor and alternative fits to temporal smoothing kernel obtained by coupling a set both psychophysical and physiological data have shown of truncated exponential kernels in cascade (26). The that in many cases other functions (including Gaussian results in the upper part show space-time separable spatio- derivatives) provide better fits than Gabor functions do. temporal receptive fields corresponding to zero image Conceptually, the ripples of the Gabor functions, which velocity v = 0, and corresponding to either first- or are given by complex sine waves, are related to the second-order spatial derivatives over image space in com- ripples of Gaussian derivatives, which are given by bination with first-order temporal derivatives over time. Hermite functions. A Gabor function, however, requires The results in the lower part show inseparable spatio- the specification of a scale parameter and a frequency, temporal receptive fields corresponding to non-zero image whereas a Gaussian derivative requires a scale parameter velocities and based on either second- or third-order and the order of differentiation (per spatial and temporal spatial derivatives over image space. dimension). With the Gaussian derivative model, receptive As can be seen from these figures, the proposed ide- fields of different orders can be mutually related by alized receptive field models do quite well reproduce derivative operations, and be computed from each other the qualitative shape of the neurophysiologically recorded by nearest-neighbour operations. The zero-order receptive biological receptive fields. fields as well as the derivative based receptive fields can be modelled by diffusion equations, and can therefore V. RELATIONSTOPREVIOUSWORK be implemented by computations between neighbouring Young [30] has shown how spatial receptive fields in computational units. cats and monkeys can be well modelled by Gaussian In relation to invariance properties, the family of affine derivatives up to order four. Young et al. [31], [32] have Gaussian kernels is closed under affine image deforma- also shown how spatio-temporal receptive fields can be tions, whereas the family of Gabor functions obtained modelled by Gaussian derivatives over a spatio-temporal by multiplying rotationally symmetric Gaussians with domain, corresponding to the Gaussian spatio-temporal sine and cosine waves is not closed under affine im- concept described here, although with a different type age deformations. This means that it is not possible to of parameterization; see also [45], [46] for our closely compute truly affine invariant image representations from related earlier work. The normative theory for visual such Gabor functions. Instead, given a pair of images receptive fields presented in [13], [19], [20], [21] and here that are related by a non-uniform image deformation, does first of all provide additional theoretical foundation the lack of affine covariance implies that there will be for Young’s spatial modelling work based on Koenderink a systematic bias in image representations derived from and van Doorn’s theory [8], [9] and does additionally such Gabor functions, corresponding to the difference provide a conceptual extension to a time-causal spatio- between the backprojected Gabor functions in the two temporal domain that takes into explicit account the fact image domains. If using receptive profiles defined from that the future cannot be accessed. Additionally, our directional derivatives of affine Gaussian kernels, it will on the other hand be possible to compute affine invari- outside world and the transformations that occur when a ant image representations [20], in turn providing better three-dimensional world is projected to a two-dimensional internal consistency between receptive field responses image domain. computed from different views of objects in the world. C. Logarithmic brightness scale With regard to invariance to multiplicative illumination variations, the even cosine component of a Gabor function The notion of a logarithmic brightness scale goes back does in general not have its integral equal to zero, which to the Greek astronomer Hipparchus, who constructed a means that the illumination invariant properties under subjective scale for the brightness of stars in six steps multiplicative illumination variations or exposure control labelled “1 . . . 6”, where the brightest stars were said to mechanisms described in Section III-D do not hold for be of the first magnitude (m = 1), while the faintest stars Gabor functions. near the limits of human perception were of the sixth mag- In this respect, the Gaussian derivative model is sim- nitude. Later, when quantitative physical measurements pler, it can be related to image measurements by dif- were made possible of the intensities of different stars, ferential geometry, be derived axiomatically from sym- it was noted that Hipparchus subjective scale did indeed metry principles, be computed from a minimal set of correspond to a logarithmic scale. In astronomy today, connections and allows for provable invariance properties the apparent brightness of stars is still measured on a under locally linearized image deformations (affine trans- logarithmic scale, although extended over a much wider formations) as well as local multiplicative illumination span of intensity values. A logarithmic transformation of variations and exposure control mechanisms. image intensities is also used in the retinex theory (Land [68], [69]). B. Relations to approaches for learning receptive fields In psychophysics, the Weber-Fechner law attempts to from natural image statistics describe the relationship between the physical magnitude Work has also been performed on learning receptive and the perceived intensity of stimuli. This law states that field properties and visual models from the statistics the ratio of an increment threshold ∆I for a just notice- of natural image data (Field [59]; van der Schaaf and able difference in relation to the background intensity I van Hateren [60]; Olshausen and Field [61]; Rao and is constant over large ranges of magnitude variations [70, Pages 671–672] Ballard [62]; Simoncelli and Olshausen [63]; Geisler [64]; ∆I Hyvarinen¨ et al. [65]; Lorincz¨ [66]) and been shown to = k, (43) I lead to the formation of similar receptive fields as found in biological vision. The proposed theory of receptive fields where the constant k is referred to as the Weber ratio. The can be seen as describing basic physical constraints under theoretical analysis of invariance properties of a logarith- which a learning based method for the development of mic brightness scale under multiplicative transformations receptive fields will operate and the solutions to which of the illumination field as well as multiplicative exposure an optimal adaptive system may converge to, if exposed control mechanisms in Section III-D is in excellent agree- to a sufficiently large and representative set of natural ment with these psychophysical findings. If one considers image data (see Figure 17). an adaptive image exposure mechanism in the retina that Field [59] as well as Doi and Lewicki [67] have adapts the diameter of the pupil and the sensitivity of the described how ”natural images are not random, instead photopigments such that relative range variability in the they exhibit statistical regularities” and have used such signal divided by the mean illumination is held constant statistical regularities for constraining the properties of (43) (see e.g. Peli [71]), then such an adaptation mecha- receptive fields. The theory presented in this paper can nism can be seen as implementing an approximation of be seen as a theory at a higher level of abstraction, in the derivative of a logarithmic transformation dz terms of basic principles that reflect properties of the d(log z) = . (44) environment that in turn determine properties of the image z data, without need for explicitly constructing specific For a strictly positive entity z, there are also information statistical models for the image statistics. Specifically, the theoretic arguments to regard log z as a default parameter- proposed theory can be used for explaining why the above ization (Jaynes [72]). This property is essentially related mentioned statistical models lead to qualitatively similar to the fact that the ratio dz/z then becomes a dimension- types of receptive fields as the idealized functional models less integration measure. A general recommendation of of receptive fields obtained from our theory. care should, however, be taken when using such reason- An interesting observation that can be made from the ing based on dimensionality arguments, since important similarities between the receptive field families derived phenomena could be missed, e.g., in the presence of by necessity from the assumptions and the receptive hidden variables. The physical modelling of the effect of profiles found by cell recordings in biological vision, illumination variations on receptive field measurements is that receptive fields in the retina, LGN and V1 of in Section III-D provides a formal justification for using higher mammals are very close to ideal in view of a logarithmic brightness scale in this context as well as the stated structural requirements/symmetry properties. an additional contribution of showing how the receptive In this sense, biological vision can be seen as having field measurements can be related to inherent physical adapted very well to the transformation properties of the properties of object surfaces in the environment. Fig. 17. Two structurally different ways of deriving receptive field shapes for a vision system intended to infer properties of the world by either biological or artificial visual perception. (top row) A traditional model for learning receptive fields shapes consists of collecting real-world image data from the environment, and then applying learning algorithms possibly in combination with evolution over multiple generations of the organism that the vision system is a part of. (bottom row) With the normative theory for receptive fields presented in this paper, a short-cut is made in the sense that the derivation of receptive field shapes starts from structural properties of the world (corresponding to symmetry properties in theoretical physics) from which receptive field shapes are constrained by theoretical mathematical inference.

VI.SUMMARY about receptive field profiles in good agreement with Neurophysiological cell recordings have shown that receptive fields found by cell recordings in biological mammalian vision has developed receptive fields that are vision. Specifically, idealized functional models have been tuned to different sizes and orientations in the image presented for space-time separable receptive fields in the domain as well as to different image velocities in space- retina and the LGN and for both space-time separable and time. A main message of this article has been to show non-separable simple cells in the primary visual cortex that it is possible to derive such families of receptive field (V1). profiles by necessity, given a set of structural requirements on the first stages of visual processing as formalized The qualitatively very good agreement between the into the notion of an idealized vision system, and whose predicted receptive field profiles from the normative ax- functionality is determined by set of mathematical and iomatic theory with the receptive field profiles found physical assumptions (see Figure 17). by neurophysiological measurements indicates that the These structural requirements reflect structural proper- earliest receptive fields in higher mammal vision have ties of the world for the receptive fields to be compatible reached a state that can be seen as very close to ideal with natural image transformations including: (i) varia- in view of the stated structural requirements/symmetry tions in the sizes of objects in the world, (ii) variations properties. In this sense, biological vision can be seen as in the viewing distance, (iii) variations in the viewing having adapted very well to the transformation properties direction, (iv) variations in the relative motion between of the outside world and to the transformations that occur objects in the world and the observer, (v) variations in when a three-dimensional world is projected onto a two- the speed by which temporal events occur and (vi) local dimensional image domain. multiplicative illumination variations or multiplicative ex- posure control mechanisms, which are natural to adapt to Compared to more common approaches of learning for a vision system that is to interact with the world in receptive field profiles from natural image statistics, the a successful manner. In a competition between different proposed framework makes it possible to derive the , adaptation to these properties may constitute shapes of idealized receptive fields without any need an evolutionary advantage. for training data. The proposed framework for invariance The presented theoretical model provides a normative and covariance properties also adds explanatory value by theory for deriving functional models of linear receptive showing that the families of receptive profiles tuned to fields based on Gaussian derivatives and closely related different orientations in space and image velocities in operators. Specifically, the proposed theory can explain space-time that can be observed in biological vision can the different shapes of receptive field profiles that are be explained from the requirement that the receptive fields found in biological vision from a requirement that the should be covariant under basic image transformations to should be able to compute covariant re- enable true invariance properties. If the underlying recep- ceptive field responses under the natural types of image tive fields would not be covariant, then there would be transformations that occur in the environment, to enable a systematic bias in the visual operations, corresponding the computation of invariant representations for percep- to the amount of mismatch between the backprojected tion at higher levels [20]. receptive fields. The presented theory leads to a computational frame- work for defining spatial and spatio-temporal receptive Corresponding types of arguments applied to the area fields from visual data with the attractive properties that: of hearing, lead to the formulation of a normative theory (i) the receptive field profiles can be derived by neces- of auditory receptive fields (Lindeberg and Friberg [73], sity from first principles and (ii) it leads to predictions [74]). ACKNOWLEDGMENTS [26] ——, “An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex,” J. of Neurophysiology, The support from the Swedish Research Council (Grant vol. 58, pp. 1233–1258, 1987. Number 2014-4083) is gratefully acknowledged. [27] D. L. Ringach, “Spatial structure and symmetry of simple-cell receptive fields in macaque primary visual cortex,” Journal of Neurophysiology, vol. 88, pp. 455–463, 2002. REFERENCES [28] ——, “Mapping receptive fields in primary visual cortex,” Journal of Physiology, vol. 558, no. 3, pp. 717–728, 2004. [1] D. H. Hubel and T. N. Wiesel, “Receptive fields of single neurones [29] R. W. Rodieck, “Quantitative analysis of cat in the cat’s striate cortex,” J Physiol, vol. 147, pp. 226–238, 1959. response to visual stimuli,” Vision Research, vol. 5, no. 11, pp. [2] ——, “Receptive fields, binocular interaction and functional archi- 583–601, 1965. tecture in the cat’s visual cortex,” J Physiol, vol. 160, pp. 106–154, [30] R. A. Young, “The Gaussian derivative model for spatial vision: 1962. I. Retinal mechanisms,” Spatial Vision, vol. 2, pp. 273–293, 1987. [3] ——, Brain and Visual Perception: The Story of a 25-Year [31] R. A. Young, R. M. Lesperance, and W. W. Meyer, “The Gaussian Collaboration. Oxford University Press, 2005. derivative model for spatio-temporal vision: I. Cortical model,” [4] G. C. DeAngelis, I. Ohzawa, and R. D. Freeman, “Receptive field Spatial Vision, vol. 14, no. 3, 4, pp. 261–319, 2001. dynamics in the central visual pathways,” Trends in Neuroscience, [32] R. A. Young and R. M. Lesperance, “The Gaussian derivative vol. 18, no. 10, pp. 451–457, 1995. model for spatio-temporal vision: II. Cortical data,” Spatial Vision, [5] G. C. DeAngelis and A. Anzai, “A modern view of the classical vol. 14, no. 3, 4, pp. 321–389, 2001. receptive field: Linear and non-linear spatio-temporal processing [33] A. Omurtag, B. W. Knight, and L. Sirovich, “On the simulation by V1 neurons,” in The Visual Neurosciences, L. M. Chalupa and of large populations of neurons,” Journal of Computational Neu- J. S. Werner, Eds. MIT Press, 2004, vol. 1, pp. 704–719. roscience, vol. 8, pp. 51–63, 2000. [6] T. Iijima, “Observation theory of two-dimensional visual patterns,” [34] M. Mattia and P. D. Guidice, “Population dynamics of interacting Papers of Technical Group on Automata and Automatic Control, spiking neurons,” Physics Review E, vol. 66, no. 5, p. 051917, IECE, Japan, Tech. Rep., 1962. 2002. [7] A. P. Witkin, “Scale-space filtering,” in Proc. 8th Int. Joint Conf. [35] O. Faugeras, J. Toubol, and B. Cessac, “A constructive mean-field Art. Intell., Karlsruhe, Germany, Aug. 1983, pp. 1019–1022. analysis of multi-population neural networks with random synaptic [8] J. J. Koenderink, “The structure of images,” Biological Cybernet- weights and stochastic inputs,” Frontiers in Computational Neuro- ics, vol. 50, pp. 363–370, 1984. science, vol. 3, no. 1, p. 10.3389/neuro.10.001.2009, 2009. [9] J. J. Koenderink and A. J. van Doorn, “Representation of local [36] I. I. Hirschmann and D. V. Widder, The Convolution Transform. geometry in the visual system,” Biological Cybernetics, vol. 55, Princeton, New Jersey: Princeton University Press, 1955. pp. 367–375, 1987. [37] T. Lindeberg, “Temporal scale selection in time-causal scale [10] ——, “Generic neighborhood operators,” IEEE Trans. Pattern space,” Journal of Mathematical Imaging and Vision, 2017, Analysis and Machine Intell., vol. 14, no. 6, pp. 597–605, Jun. doi:10.1007/s10851-016-0691-3. 1992. [38] ——, “Scale-space for discrete signals,” IEEE Trans. Pattern [11] T. Lindeberg, Scale-Space Theory in Computer Vision. Springer, Analysis and Machine Intell., vol. 12, no. 3, pp. 234–254, Mar. 1993. 1990. [12] ——, “Scale-space theory: A basic tool for analysing struc- [39] T. Lindeberg and D. Fagerstrom,¨ “Scale-space with causal time di- tures at different scales,” Journal of Applied Statistics, rection,” in Proc. European Conf. on Computer Vision (ECCV’96), vol. 21, no. 2, pp. 225–270, 1994, also available from ser. Springer LNCS, vol. 1064, Cambridge, UK, Apr. 1996, pp. http://www.csc.kth.se/∼tony/abstracts/Lin94-SI-abstract.html. 229–240. [13] ——, “Generalized Gaussian scale-space axiomatics comprising [40] T. Lindeberg, “Feature detection with automatic scale selection,” linear scale-space, affine scale-space and spatio-temporal scale- International Journal of Computer Vision, vol. 30, no. 2, pp. 77– space,” Journal of Mathematical Imaging and Vision, vol. 40, 116, 1998. no. 1, pp. 36–81, 2011. [41] ——, “Edge detection and ridge detection with automatic scale [14] ——, “Generalized axiomatic scale-space theory,” in Advances in selection,” International Journal of Computer Vision, vol. 30, Imaging and Electron Physics, P. Hawkes, Ed. Elsevier, 2013, no. 2, pp. 117–154, 1998. vol. 178, pp. 1–96. [42] ——, “Scale selection,” in Computer Vision: A Reference Guide, [15] L. M. J. Florack, Image Structure, ser. Series in Mathematical K. Ikeuchi, Ed. Springer, 2014, pp. 701–713. Imaging and Vision. Springer, 1997. [43] ——, “Spatio-temporal scale selection in video data,” In prepara- [16] J. Sporring, M. Nielsen, L. Florack, and P. Johansen, Eds., Gaus- tion, 2017. sian Scale-Space Theory: Proc. PhD School on Scale-Space The- [44] V. Bonin, V. Mante, and M. Carandini, “The suppressive field ory, ser. Series in Mathematical Imaging and Vision. Copenhagen, of neurons in the lateral geniculate nucleus,” The Journal of Denmark: Springer, 1997. Neuroscience, vol. 25, no. 47, pp. 10 844–10 856, 2005. [17] J. Weickert, S. Ishikawa, and A. Imiya, “Linear scale-space has [45] T. Lindeberg, “Linear spatio-temporal scale-space,” in Proc. Inter- first been proposed in Japan,” Journal of Mathematical Imaging national Conference on Scale-Space Theory in Computer Vision and Vision, vol. 10, no. 3, pp. 237–252, 1999. (Scale-Space’97), ser. Springer LNCS, B. M. ter Haar Romeny, [18] B. ter Haar Romeny, Front-End Vision and Multi-Scale Image L. M. J. Florack, J. J. Koenderink, and M. A. Viergever, Eds., Analysis. Springer, 2003. vol. 1252. Utrecht, The Netherlands: Springer, Jul. 1997, pp. [19] T. Lindeberg, “A computational theory of visual receptive fields,” 113–127. Biological Cybernetics, vol. 107, no. 6, pp. 589–635, 2013. [46] ——, “Linear spatio-temporal scale-space,” Dept. of [20] ——, “Invariance of visual operations at the level of receptive Numerical Analysis and Computer Science, KTH, Tech. fields,” PLOS ONE, vol. 8, no. 7, p. e66990, 2013. Rep. ISRN KTH/NA/P--01/22--SE, Nov. 2001, available from [21] ——, “Time-causal and time-recursive spatio-temporal receptive http://www.csc.kth.se/cvap/abstracts/cvap257.html. fields,” Journal of Mathematical Imaging and Vision, vol. 55, no. 1, [47] D. G. Lowe, “Towards a computational model for object recogni- pp. 50–88, 2016. tion in IT cortex,” in Biologically Motivated Computer Vision, ser. [22] B. R. Conway and M. S. Livingstone, “Spatial and temporal Springer LNCS, vol. 1811. Springer, 2000, pp. 20–31. properties of cone signals in alert macaque primary visual cortex,” [48] K. A. May and M. A. Georgeson, “Blurred edges look faint, and Journal of Neuroscience, vol. 26, no. 42, pp. 10 826–10 846, 2006. faint edges look sharp: The effect of a gradient threshold in a [23] E. N. Johnson, M. J. Hawken, and R. Shapley, “The orientation multi-scale edge coding model,” Vision Research, vol. 47, no. 13, selectivity of color-responsive neurons in Macaque V1,” The pp. 1705–1720, 2007. Journal of Neuroscience, vol. 28, no. 32, pp. 8096–8106, 2008. [49] G. S. Hesse and M. A. Georgeson, “Edges and bars: where do [24] S. Marcelja, “Mathematical description of the responses of simple people see features in 1-D images?” Vision Research, vol. 45, no. 4, cortical cells,” Journal of Optical Society of America, vol. 70, pp. 507–525, 2005. no. 11, pp. 1297–1300, 1980. [50] M. A. Georgeson, K. A. May, T. C. A. Freeman, and G. S. Hesse, [25] J. Jones and L. Palmer, “The two-dimensional spatial structure of “From filters to features: Scale-space analysis of edge and blur simple receptive fields in cat striate cortex,” J. of Neurophysiology, coding in human vision,” Journal of Vision, vol. 7, no. 13, pp. 7–, vol. 58, pp. 1187–1211, 1987. 2007. [51] S. A. Wallis and M. A. Georgeson, “Mach edges: Local features predicted by 3rd derivative spatial filtering,” Vision Research, vol. 49, no. 14, pp. 1886–1893, 2009. [52] T. Hansen and H. Neumann, “A recurrent model of contour integration in primary visual cortex,” Journal of Vision, vol. 8, no. 8, pp. 8–, 2008. [53] Q. Wang and M. W. Spratling, “Contour detection in colour images using a neurophysiologically inspired model,” Cognitive Computation, vol. 8, no. 6, pp. 1027–1035, 2016. [54] S. Mahmoodi, “Linear neural circuitry model for visual receptive fields,” Journal of Mathematical Imaging and Vision, vol. 54, no. 2, pp. 1–24, 2016. [55] ——, “Nonlinearity in simple and complex cells in early biological visual systems,” Journal of Mathematical Imaging and Vision, pp. 1–10, 2017. [56] Z.-J. Pei, G.-X. Gao, B. Hao, Q.-L. Qiao, and H.-J. Ai, “A cascade model of information processing and encoding for retinal prosthesis,” Neural Regeneration Research, vol. 11, no. 4, p. 646, 2016. [57] D. Gabor, “Theory of communication,” J. of the IEE, vol. 93, pp. 429–457, 1946. [58] D. G. Stork and H. R. Wilson, “Do Gabor functions provide appropriate descriptions of visual cortical receptive fields,” Journal of Optical Society of America, vol. 7, no. 8, pp. 1362–1373, 1990. [59] D. J. Field, “Relations between the statistics of natural images and the response properties of cortical cells,” Journal of Optical Society of America, vol. 4, pp. 2379–2394, 1987. [60] A. van der Schaaf and J. H. van Hateren, “Modelling the power spectra of natural images: Statistics and information,” Vision Research, vol. 36, no. 17, pp. 2759–2770, 1996. [61] B. A. Olshausen and D. J. Field, “Emergence of simple-cell receptive field properties by learning a sparse code for natural images,” Journal of Optical Society of America, vol. 381, pp. 607– 609, 1996. [62] R. P. N. Rao and D. H. Ballard, “Development of localized oriented receptive fields by learning a translation-invariant code for natural images,” Computation in Neural Systems, vol. 9, no. 2, pp. 219– 234, 1998. [63] E. P. Simoncelli and B. A. Olshausen, “Natural image statistics and neural representations,” Annual Review of Neuroscience, vol. 24, pp. 1193–1216, 2001. [64] W. S. Geisler, “Visual perception and the statistical properties of natural scenes,” Annual Review of Psychology, vol. 59, pp. 10.1– 10.26, 2008. [65] A. Hyvarinen,¨ J. Hurri, and P. O. Hoyer, Natural Image Statistics: A Probabilistic Approach to Early Computational Vision, ser. Computational Imaging and Vision. Springer, 2009. [66] A. Lorincz,¨ Z. Palotal, and G. Szirtes, “Efficient sparse cod- ing in early sensory processing: Lessons from signal recovery,” PLOS Computational Biology, vol. 8(3), no. e1002372, 2012, 10.1371/journal.pcbi.1002372. [67] E. Doi and M. S. Lewicki, “Relations between the statistical regularities of natural images and the response properties of the early visual system,” in Japanese Cognitive Science Society: Sig P&P, Kyoto University, 2005, pp. 1–8. [68] E. H. Land, “The retinex theory of colour vision,” Proc. Royal Institution of Great Britain, vol. 57, pp. 23–58, 1974. [69] ——, “Recent advances in retinex theory,” Vision Research, vol. 26, no. 1, pp. 7–21, 1986. [70] S. E. Palmer, Vision Science: Photons to Phenomenology. MIT Press, 1999, first Edition. [71] E. Peli, “Contrast in complex images,” Journal of the Optical Society of America (JOSA A), vol. 7, no. 10, pp. 2032–2040, 1990. [72] E. T. Jaynes, “Prior probabilities,” Trans. on Systems Science and Cybernetics, vol. 4, no. 3, pp. 227–241, 1968. [73] T. Lindeberg and A. Friberg, “Idealized computational models of auditory receptive fields,” PLOS ONE, vol. 10, no. 3, pp. e0 119 032:1–58, 2015. [74] ——, “Scale-space theory for auditory signals,” in Proc. Scale- Space and Variational Methods for Computer Vision (SSVM 2015), ser. Lecture Notes in Computer Science, vol. 9087. Springer, 2015, pp. 3–15.