Biological Vision

Keith May
Department of Optometry and Visual Science, City University London
www.keithmay.org

What happens in biological vision?

• patterns of light enter the eyes
• the brain extracts useful information
• generates appropriate behaviour
• generates conscious perceptual experience

Perception is correlated with physical reality

• our senses tell us about the physical world
• but there’s a distinction between the physical reality and the perceptual state that it creates

Goethe vs Newton

• Goethe criticised Newton’s ideas about the nature of light
• But his criticisms rest on a failure to understand the distinction between physical light and our perception of it (Opticks, pp. 108-109)

What is colour?

• monochromatic light with wavelength around 580 nm looks yellow
• but so does a mixture of “red” and “green” light
• the only thing that all yellow things have in common is that they look yellow to us
• Colour is a perceptual state

Simultaneous contrast

• suggests the visual system is using intensity gradients across the image, rather than raw light intensities

Gradient domain

Simultaneous contrast - Adelson’s version

http://web.mit.edu/persci/people/adelson/checkershadow_illusion.html

Simultaneous contrast - Lotto’s version

http://www.lottolab.org/

Illusions of colour

http://www.lottolab.org/

Hermann grid

Figures: John Tyndall; Ludimar Hermann; Chladni patterns

Figures: Vasarely, Tlinko (1955); Vasarely, Eridan II (1956)

Scintillating grid: Schrauf, Lingelbach & Wist (1997)

Hermann grid

“An explanation of this phenomenon by simultaneous contrast is easy. The apparent brightness of each point on the white grid depends on the amount of black which exists in a certain area around it. If one assumes the diameter of this area to be larger than the width of the white stripes, then each point on the intersections receives in its surround less black than any other point on the white stripes; its brightness will thus be less enhanced by contrast and must therefore appear darker” (Hermann, 1870)

The eye

Figure label: ganglion cells

Recording from ganglion cells

Haldan K. Hartline (Hartline, 1938); Ragnar Granit (Granit, 1948)

Figure labels: Dendrites (input); Axon (output); “spikes”

Receptive field

• Ganglion cells get their input from a small, local patch of the retina, called the “receptive field”
• Receptive field of a neuron is “the region of the retina which must be illuminated in order to obtain a response” (Hartline, 1938)
• Nowadays, more often defined in terms of visual space, rather than surface of retina
• Distances expressed in terms of the visual angle θ subtended at the eye
• For neurons in early stages of visual system, receptive field of each neuron has a fixed location within the visual field
• Neurons that are close to each other in the eye/brain have receptive fields that are close to each other in visual space – “retinotopic” organisation

Ganglion cell receptive field structure

• Classic early study on ganglion cell receptive fields in cats was by Kuffler (1953)
• Discovered that they have a centre-surround structure
• Receptive field size increases with increasing eccentricity, i.e. small in the centre of the visual field, and large in the periphery (Hubel & Wiesel, 1960)
• In darkness, the antagonistic surround disappears, and central region grows (Barlow, Fitzhugh & Kuffler, 1957; Enroth-Cugell & Robson, 1966)

Baumgartner (1960)

• Proposed explanation of Hermann grid illusion based on ganglion cell receptive fields
• For on-centre cells, more inhibition at junctions than middle of the stripes
• If perceived brightness is determined by activation of on-centre cells, then junctions should appear dark
• Accounts for a number of characteristics of Hermann grid illusion (Spillmann, 1994)
  • In the centre of the visual field, receptive fields are small, and whole receptive field will be illuminated, whether it is at the junctions or the middle of a stripe, so there should be no illusory dark patches in central vision
  • Illusion weakens with decreasing brightness, as predicted from disappearance of inhibitory surround in darkness (Wist, 1976; Troscianko, 1982)

Problem with Baumgartner’s explanation (1)

Figure from Geier, Bernáth, Hudák & Séra (2008)
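Baumgartner's account can be checked numerically with a toy on-centre cell. The sketch below is only an illustration under assumed values: the grid geometry, the difference-of-Gaussians receptive field and the two sigma values are hypothetical choices, not parameters from the lecture. It compares the response of the same receptive field placed at a junction and at the middle of a stripe.

```python
import numpy as np

def grid_image(size=201, period=50, bar_half_width=5):
    """White bars (1.0) on black (0.0); bar centres lie on multiples of `period`."""
    coords = np.arange(size)
    # Distance from each pixel to the nearest bar centre line.
    dist = np.abs((coords + period // 2) % period - period // 2)
    on_bar = dist <= bar_half_width
    # A pixel is white if it lies on a horizontal bar or on a vertical bar.
    return (on_bar[:, None] | on_bar[None, :]).astype(float)

def on_centre_response(image, row, col, sigma_centre=2.0, sigma_surround=6.0):
    """Response of an assumed difference-of-Gaussians on-centre cell at (row, col)."""
    rows, cols = np.indices(image.shape)
    d2 = (rows - row) ** 2 + (cols - col) ** 2
    centre = np.exp(-d2 / (2 * sigma_centre ** 2))
    surround = np.exp(-d2 / (2 * sigma_surround ** 2))
    # Excitatory centre minus inhibitory surround, each normalised to unit sum.
    kernel = centre / centre.sum() - surround / surround.sum()
    return float((image * kernel).sum())

image = grid_image()
junction = on_centre_response(image, 100, 100)  # where two white bars cross
stripe = on_centre_response(image, 100, 125)    # middle of a white bar

print(f"junction response: {junction:.3f}")
print(f"stripe response:   {stripe:.3f}")
```

With these assumed sizes the surround is wider than the white bars, so the junction receives more inhibition and the response there is lower than in the stripe, as the explanation requires.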

Problem with Baumgartner’s explanation (2)

The illusion goes away if the lines are wiggly, even though the effects on centre-surround receptive fields would be very similar (Geier, Bernáth, Hudák & Séra, 2008)

Newton’s colour circle

CIE chromaticity diagram

• Choose any three “primary” colours R, G, and B.
• Make any colour within the triangle by mixing different amounts of the primaries
• If R, G, & B are the primaries of your computer monitor, then the triangle represents the “gamut” of all the colours the monitor can produce
• Led to trichromatic colour theory, that human vision has three basic types of colour receptor
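The triangle rule can be made concrete with a small calculation. This is a minimal sketch that assumes the sRGB primaries as the example monitor (the slides do not name a monitor); `mixing_weights` and `in_gamut` are hypothetical helper names. A chromaticity is inside the gamut when it can be written as a mixture of the three primaries with no negative weight.

```python
import numpy as np

# Assumed example monitor: CIE 1931 (x, y) chromaticities of the sRGB primaries.
PRIMARIES = np.array([
    [0.64, 0.33],  # R
    [0.30, 0.60],  # G
    [0.15, 0.06],  # B
])

def mixing_weights(xy, primaries=PRIMARIES):
    """Solve w_R*R + w_G*G + w_B*B = xy subject to w_R + w_G + w_B = 1."""
    a = np.vstack([primaries.T, np.ones(3)])
    b = np.array([xy[0], xy[1], 1.0])
    return np.linalg.solve(a, b)

def in_gamut(xy, primaries=PRIMARIES):
    """A chromaticity is inside the triangle when no weight is negative."""
    return bool(np.all(mixing_weights(xy, primaries) >= 0))

white = (0.3127, 0.3290)           # D65 white point
saturated_green = (0.10, 0.80)     # lies outside the sRGB triangle

print(mixing_weights(white), in_gamut(white))
print(mixing_weights(saturated_green), in_gamut(saturated_green))
```

The weights here are proportions in the chromaticity diagram, not luminances; a negative weight means the colour would need a negative amount of one primary, which a monitor cannot produce.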

Trichromatic colour theory

Figures: Thomas Young; Rosetta Stone

“Now, as it is almost impossible to conceive each sensitive point of the retina to contain an infinite number of particles, each capable of vibrating in perfect unison with every possible undulation, it becomes necessary to suppose the number limited, for instance, to the three principal colours” (Thomas Young)

• 3 types of “cone” photoreceptor in retina, with peak sensitivity to long (L), medium (M), or short (S) wavelengths
• Only properly confirmed in 1983! (Dartnall, Bowmaker & Mollon, 1983)

Hering’s colour opponent theory

Complementary colours

Spanish castle illusion

http://www.johnsadowski.com/big_spanish_castle.php

Lilac Chaser

http://www.michaelbach.de/ot/col_lilacChaser/

Young’s Trichromatic colour theory vs Hering’s opponent colour theory

They’re both right!

Figure: cone photoreceptors (S, M, L) feed ganglion cells carrying Red-Green, Yellow-Blue and Luminance signals

What’s going on?

• Mapping from cone responses to colour opponent channels is a linear transformation
• Essentially a rotation of 3D coordinate axes: (S, M, L) → (L–M, L+M–S, L+M)
• S, M, and L responses are all highly correlated: inefficient use of channel bandwidth, repeating same information in multiple channels
• L–M, L+M–S, L+M responses are uncorrelated, making better use of channel bandwidth: transmit more information for the same data rate (Buchsbaum & Gottschalk, 1983)
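The idea that rotating the coordinate axes removes the correlation between channels can be sketched with simulated data. Everything numerical here is an assumption: the cone responses are toy signals sharing one common component, and the rotation is found by an eigen-decomposition of their covariance (a principal components analysis) rather than taken from measured cone sensitivities.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Toy cone responses (assumed): one shared signal plus a little independent
# variation per cone type, which makes S, M and L highly correlated.
shared = rng.normal(size=n)
cones = np.stack([
    shared + 0.3 * rng.normal(size=n),  # S
    shared + 0.1 * rng.normal(size=n),  # M
    shared + 0.1 * rng.normal(size=n),  # L
])
cone_corr = np.corrcoef(cones)

# The eigenvectors of the covariance matrix form an orthogonal matrix,
# i.e. a rotation of the axes (possibly combined with a reflection).
cov = np.cov(cones)
eigvals, eigvecs = np.linalg.eigh(cov)

# Project the mean-subtracted cone responses onto the new axes.
channels = eigvecs.T @ (cones - cones.mean(axis=1, keepdims=True))
channel_corr = np.corrcoef(channels)

np.set_printoptions(precision=3, suppress=True)
print("cone correlations:\n", cone_corr)
print("channel correlations:\n", channel_corr)
print("new axes (columns, in S, M, L coordinates):\n", eigvecs)
```

In this toy case the axis with the largest variance weights all three cones with the same sign, like a luminance channel, while the other axes are differences between cone types.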

Efficient coding by ganglion cells

• Retina contains around 2-4 million “cone” photoreceptors (used in daylight) and around 40-80 million “rods” (used in dark conditions) (Jonas, Schneider & Naumann, 1992)
• Optic nerve contains only about 1 million nerve fibres (Ogden & Miller, 1966)
• Eye needs to perform image compression – the problem of sending the retinal image down the optic nerve to the brain is similar to sending images over the internet
• Need to reduce data rate, without losing too much information
• Transformation of colour signals is one way of doing this
• Shape of receptive fields also plays a role ...

Ganglion cells as linear filters

• As well as on-centre/off-centre, we can also distinguish between linear and nonlinear ganglion cells (Enroth-Cugell & Robson, 1966)
  • X-cells: linear
  • Y-cells: nonlinear
• A classic study, very poorly received by reviewers, who recommended that the details of the methods should be omitted because “it is very unlikely that anyone will wish to follow so closely the same procedure” (see Alpern, 1984)
• In fact, their methods became the dominant paradigm for the next few decades

X-cells (linear)

• Perform weighted average of light intensity within receptive field
• Weighting function is called the “impulse response function”, or “receptive field profile” – gives response to small spots of light
• An array of these units, with receptive fields centred on different image locations, is called a linear filter
• Receptive field is called the kernel of the filter

Figure panels: Light adapted; Dark adapted

Measuring frequency response function

• Another way of characterising a linear system is in terms of the phase and amplitude of its response to sine waves of different frequency
• For characterising a visual mechanism, we use sine wave gratings, with spatial frequency measured in cycles per degree of visual angle

Figure: high contrast grating, with visual angle θ
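The description of an array of receptive fields as a linear filter corresponds to a convolution. The sketch below uses a one-dimensional difference-of-Gaussians kernel as a stand-in receptive field profile; the sigma values and the balanced (zero-sum) kernel are assumptions made for illustration.

```python
import numpy as np

def dog_kernel(sigma_centre=1.0, sigma_surround=3.0, half_width=12):
    """Assumed centre-surround receptive field profile (the filter kernel)."""
    x = np.arange(-half_width, half_width + 1)
    centre = np.exp(-x ** 2 / (2 * sigma_centre ** 2))
    surround = np.exp(-x ** 2 / (2 * sigma_surround ** 2))
    # Unit-sum centre minus unit-sum surround, so the weights sum to zero.
    return centre / centre.sum() - surround / surround.sum()

def linear_filter(luminance, kernel):
    """Each output sample is a weighted average of the input under the kernel."""
    return np.convolve(luminance, kernel, mode="valid")

kernel = dog_kernel()
uniform = np.full(100, 5.0)
edge = np.concatenate([np.full(50, 1.0), np.full(50, 5.0)])

uniform_out = linear_filter(uniform, kernel)
edge_out = linear_filter(edge, kernel)

# Linearity: the response to a sum of inputs is the sum of the responses.
sum_out = linear_filter(uniform + edge, kernel)

print("largest response to uniform field:", np.abs(uniform_out).max())
print("largest response to edge:", np.abs(edge_out).max())
```

With a balanced kernel the array is silent for uniform illumination and responds only where the luminance changes.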

Contrast sensitivity function (CSF) of X-cells

• Enroth-Cugell & Robson (1966) used electrodes to record from X-cells in the cat retina, while they displayed sine wave gratings
• Contrast is the variation in luminance as a proportion of the mean luminance (figure: low contrast grating)
• For each spatial frequency they found the contrast, C, that gave a just-detectable neural response
• Contrast sensitivity at that spatial frequency is then 1/C
• Effect of brightness level:
  • Light adapted – band-pass filter
  • Dark adapted – low-pass filter
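The definitions above can be written out in a few lines. This is a sketch under assumptions: the receptive field is modelled as a difference of Gaussians, whose amplitude response is known in closed form, and all sigma values and the surround weight are made-up illustrative numbers, not fits to the Enroth-Cugell & Robson data.

```python
import numpy as np

def sine_grating(x_deg, spatial_freq, mean_luminance, contrast):
    """Luminance profile of a grating; spatial_freq is in cycles per degree."""
    return mean_luminance * (1 + contrast * np.sin(2 * np.pi * spatial_freq * x_deg))

def contrast_of(luminance):
    """Luminance variation (amplitude) as a proportion of the mean luminance."""
    return (luminance.max() - luminance.min()) / (2 * luminance.mean())

def contrast_sensitivity(threshold_contrast):
    """Sensitivity is the reciprocal of the just-detectable contrast C."""
    return 1.0 / threshold_contrast

def gaussian_gain(freq, sigma):
    """Amplitude response of a unit-area Gaussian with standard deviation sigma."""
    return np.exp(-2 * (np.pi * sigma * freq) ** 2)

def dog_gain(freq, sigma_centre, sigma_surround, surround_weight):
    """Amplitude response of an assumed centre-minus-surround receptive field."""
    return gaussian_gain(freq, sigma_centre) - surround_weight * gaussian_gain(freq, sigma_surround)

x = np.linspace(0.0, 2.0, 2000, endpoint=False)   # degrees of visual angle
grating = sine_grating(x, spatial_freq=3.0, mean_luminance=50.0, contrast=0.2)
measured_contrast = contrast_of(grating)

freqs = np.linspace(0.0, 4.0, 401)                # cycles per degree
light_adapted = dog_gain(freqs, 0.1, 0.4, 0.9)    # centre with antagonistic surround
dark_adapted = dog_gain(freqs, 0.2, 0.4, 0.0)     # surround gone, larger centre

print("contrast of grating:", round(float(measured_contrast), 3))
print("sensitivity for threshold C = 0.02:", contrast_sensitivity(0.02))
print("light-adapted peak (c/deg):", freqs[np.argmax(light_adapted)])
print("dark-adapted peak (c/deg):", freqs[np.argmax(dark_adapted)])
```

The light-adapted curve peaks at a non-zero spatial frequency (band-pass), while the dark-adapted curve is highest at zero frequency and falls from there (low-pass).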

Efficient coding by X-cells

• Shannon (1948): $\text{information} = \tfrac{1}{2}\log_2\!\left(\frac{\text{output power}}{\text{noise power}}\right)$
• Diminishing return of information with increasing output power
• Given fixed energy budget, it’s better to “spend” that energy boosting lower-power signal components than boosting already-high components
• Optimal for all signal components to have equal output power – “whitening”

Efficient coding by X-cells: bright conditions

• Images have 1/f spectrum (amplitude inversely proportional to spatial frequency)
• To whiten the image, sensitivity needs to be proportional to spatial frequency
• For very low signal-to-noise ratios, the components contain mostly noise, and it’s a waste of energy to transmit them
• So beyond a certain spatial frequency, sensitivity should start to fall
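The argument for whitening can be followed with a small worked example. The sketch assumes the information formula from the slide, half the base-2 log of output power over noise power, with an arbitrary noise power of 1 and a power budget of 10 units shared between two signal components; these numbers are illustrative only.

```python
import numpy as np

def information_bits(output_power, noise_power=1.0):
    """Information carried by one component, as on the slide (Shannon, 1948)."""
    return 0.5 * np.log2(output_power / noise_power)

# Diminishing returns: each doubling of output power adds the same half bit.
gain_low = information_bits(2.0) - information_bits(1.0)
gain_high = information_bits(200.0) - information_bits(100.0)

# Fixed budget shared between two components A and B (assumed budget of 10).
budget = 10.0
share_a = np.linspace(0.5, 9.5, 181)
total_bits = information_bits(share_a) + information_bits(budget - share_a)
best_share = share_a[np.argmax(total_bits)]

# Whitening a 1/f amplitude spectrum: gain proportional to frequency.
freqs = np.array([1.0, 2.0, 4.0, 8.0])
image_amplitude = 1.0 / freqs
whitened_amplitude = freqs * image_amplitude

print("bits gained by doubling:", gain_low, gain_high)
print("best power for component A:", best_share)
print("whitened amplitudes:", whitened_amplitude)
```

Total information is greatest when the two components receive equal power, which is the sense in which equalising output power across components is optimal.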

Efficient coding by X-cells: effect of brightness level (Enroth-Cugell & Robson, 1966)

Figure panels: Bright conditions; Dark conditions

• Whitening (in bright conditions) decorrelates responses of neighbouring neurons, to increase information capacity
• Blurring (in dark conditions) performs spatial averaging to reduce noise
• See Atick & Redlich (1990) and Srinivasan, Laughlin & Dubs (1982)

Efficient coding pioneers

• Simon Laughlin – Professor of Neurobiology, University of Cambridge
• Joseph Atick – Executive Vice President and Chief Strategic Officer of L-1 Identity Solutions

Simple model visual neuron

Figure: Pattern of light → Linear receptive field → Nonlinear transduction

Efficient signal transduction in the fly retina

• Need to maximize the amount of information that the retinal neurons tell us about the pattern of light entering the eye
• This can be done by maximizing the entropy (or unpredictability) of the neural signal (Shannon, 1948)
• And we maximize entropy by having a flat distribution of neural responses (any deviation from a flat distribution makes the response more predictable, and so the entropy is lower)
• We achieve histogram equalization by putting the neural response through a transducer (response function) that has the shape of the integral of the probability distribution of contrasts in the environment (see Dayan & Abbott textbook, chapter 4)
• Laughlin (1981, 1983) showed that this is what happens in the fly’s retina
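Histogram equalization as described above can be sketched directly: the transducer is the cumulative distribution of the contrasts the neuron encounters. The contrast samples below are drawn from an assumed normal distribution purely as a stand-in for natural-scene measurements, and `response` is a hypothetical name for the transducer.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for contrasts measured in the environment (assumed distribution).
contrasts = rng.normal(loc=0.0, scale=0.4, size=20_000)
sorted_contrasts = np.sort(contrasts)

def response(contrast):
    """Transducer: empirical cumulative probability of the contrast (0 to 1)."""
    rank = np.searchsorted(sorted_contrasts, contrast, side="right")
    return rank / sorted_contrasts.size

responses = response(contrasts)

# Histogram of the inputs is peaked; histogram of the responses is flat.
input_counts, _ = np.histogram(contrasts, bins=10)
response_counts, _ = np.histogram(responses, bins=10, range=(0.0, 1.0))

print("input counts:   ", input_counts)
print("response counts:", response_counts)
```

Because the response function is steepest where contrasts are most common, every response level ends up being used about equally often.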

Laughlin (1981, 1983)

• Measured frequency of occurrence of different contrast levels in the natural habitat of the fly. Contrast is (I − Ib)/Ib, where Ib is the background intensity.
• Retinal response function maximizes neuron’s information capacity

Receptive fields at later stages of processing

Hubel & Wiesel (1962)

Figure: LGN and V1 receptive fields; × = excitatory region, U = inhibitory region

Neurons in primary visual cortex (V1)

• Two types: simple (linear) cells and complex (nonlinear) cells
• Similar dichotomy to X and Y cells in the cat retina
• Simple cell receptive fields can be modelled as Gaussian derivatives
• Can be considered kernels of Gaussian derivative filters, which simultaneously blur the image and take derivative
• For each point in the visual field, there are receptive fields with many different orientations and scales (i.e. sizes)
• What are they for?
• Respond very well to edges
• Edge detection?

Basics of Gaussian filters

• Scale = size
• Gaussian function with standard deviation σ is said to have a scale of σ.
• To make a Gaussian edge with scale (blur) of σ
  • take a step-edge
  • filter with a Gaussian kernel with scale σ
  • (figure: step-edge * Gaussian kernel = Gaussian edge)
• To simultaneously blur a stimulus and take the nth derivative, filter it with a kernel that is the nth derivative of a Gaussian
  • (figure: stimulus * Gaussian derivative kernel = blurred derivative)

Figure from Georgeson, May, Freeman & Hesse (2007)
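The two recipes for Gaussian filters can be tried out numerically. In this sketch the scale of 4 samples, the kernel truncation at 5σ and the helper name `gaussian_kernel` are assumptions; only the 0th and 1st derivative kernels are included.

```python
import numpy as np

def gaussian_kernel(sigma, order=0):
    """Sampled Gaussian (order 0) or its first derivative (order 1)."""
    half = int(np.ceil(5 * sigma))
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    g /= g.sum()
    if order == 0:
        return g
    if order == 1:
        return -x / sigma ** 2 * g
    raise ValueError("only orders 0 and 1 are sketched here")

sigma = 4.0
x = np.arange(-100, 101, dtype=float)
step = np.heaviside(x, 0.5)                      # step-edge centred on x = 0

# Gaussian edge: a step-edge filtered with a Gaussian kernel of scale sigma.
gaussian_edge = np.convolve(step, gaussian_kernel(sigma), mode="same")

# Blur and differentiate in one pass with a Gaussian derivative kernel ...
one_pass = np.convolve(step, gaussian_kernel(sigma, order=1), mode="same")
# ... or in two passes: blur first, then take a numerical derivative.
two_pass = np.gradient(gaussian_edge)

interior = slice(30, -30)                        # avoid zero-padding artefacts
peak_position = x[interior][np.argmax(one_pass[interior])]
print("largest difference:", np.abs(one_pass[interior] - two_pass[interior]).max())
print("peak of filtered edge at x =", peak_position)
```

The two routes agree closely away from the ends of the signal, and the derivative filter's output peaks at the edge location, which is why such kernels respond well to edges.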

Edge detection in biological vision

• Marr’s (1976) proposal: Raw primal sketch
  • symbolic representation of primitive features, e.g. edges
  • edges are “intensity changes”, i.e. peaks of luminance gradient

• To find an edge ...

• ... look for peaks in the 1st derivative (Canny, 1986) ...

• ... or zero-crossings in the 2nd derivative (Marr & Hildreth, 1980)

Laplacian of Gaussian filter kernel

• similar to LGN receptive field
• why not use a pure derivative operator?
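The two ways of locating an edge can be compared on a single blurred edge. This is a minimal one-dimensional sketch, not Canny's or Marr & Hildreth's full algorithms: the edge position (120.5) and blur (6 samples) are assumed values, derivatives are plain finite differences, and weak zero-crossings are discarded with an assumed gradient threshold.

```python
import numpy as np
from math import erf

def gaussian_edge(x, position, blur):
    """Blurred luminance edge rising from 0 to 1 around `position`."""
    return np.array([0.5 * (1 + erf((xi - position) / (blur * np.sqrt(2)))) for xi in x])

x = np.arange(0, 200, dtype=float)
luminance = gaussian_edge(x, position=120.5, blur=6.0)

first = np.gradient(luminance)    # 1st derivative (luminance gradient)
second = np.gradient(first)       # 2nd derivative

# Method 1: peak of the 1st derivative.
peak_index = int(np.argmax(np.abs(first)))

# Method 2: zero-crossings of the 2nd derivative, i.e. sign changes between
# neighbouring samples, keeping only those where the gradient is strong.
crossings = np.where(second[:-1] * second[1:] < 0)[0]
strong = crossings[np.abs(first[crossings]) > 0.5 * np.abs(first).max()]

print("gradient peak at sample:", peak_index)
print("zero-crossing between samples:", [(int(i), int(i) + 1) for i in strong])
```

Both methods point at the same place, because the second derivative changes sign exactly where the gradient reaches its peak.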

Edges have a range of scales, sharp to blurred

• blur tells us about
  • depth
  • shadows
  • surface curvature
• for edge maps to capture most of the information in the image, blur coding is probably essential (Elder, 1999)
• Initially ignored by computer vision researchers
  • “Natural scenes usually consist of objects with sharply defined surfaces.¹” (Hueckel, 1971, Journal of the Association for Computing Machinery, 18, 113-125)
  • “¹Exceptions may occur in a scene containing patches of fog or in a picture containing objects out of focus.”
• Marr: most images contain blurred edges
• Explains need for receptive fields with several different scales

Figure from Elder & Zucker (1998)

Marr & Hildreth’s approach to multiple scales

• Detected zero-crossings in output of 2nd-derivative (Laplacian of Gaussian) filters with different scales (sizes)

• “Spatial coincidence assumption” – genuine edge gives rise to zero-crossings at same location in a range of filters with different scales

• Not true!

• When zero-crossings don’t superimpose, “the larger channels must be ignored, and the description formed solely from small channels of which the zero-crossings do superimpose” (Marr & Hildreth, 1980, p. 204)

• So, the large-scale channels are ignored, unless they agree precisely with the small-scale channels, in which case they’re redundant

• Not a good solution

Do humans detect edges with 2nd derivative?

• If edge processing starts off with 2nd derivative operation, adding a linear ramp to a stimulus should make no difference to the output
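The claim that a 2nd derivative operator is blind to an added linear ramp can be verified directly. The sketch assumes a Gaussian edge with a blur of 8 samples, a ramp slope of 0.01 per sample and a filter scale of 6 samples; the sampled kernel has its mean subtracted so that its weights sum to exactly zero, which is a choice made here for numerical cleanliness.

```python
import numpy as np

def second_derivative_kernel(sigma):
    """Sampled 2nd derivative of a Gaussian, adjusted to sum to zero."""
    half = int(np.ceil(5 * sigma))
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2))
    kernel = (x ** 2 / sigma ** 4 - 1 / sigma ** 2) * g
    return kernel - kernel.mean()

x = np.arange(-150, 151, dtype=float)

# Gaussian edge built as the running sum of a Gaussian (assumed blur of 8).
edge = np.cumsum(np.exp(-x ** 2 / (2 * 8.0 ** 2)))
edge /= edge[-1]
ramp = 0.01 * x

kernel = second_derivative_kernel(6.0)
out_edge = np.convolve(edge, kernel, mode="valid")
out_edge_plus_ramp = np.convolve(edge + ramp, kernel, mode="valid")

# For comparison, a 1st derivative is shifted everywhere by the ramp's slope.
gradient_shift = np.gradient(edge + ramp) - np.gradient(edge)

print("largest change in 2nd-derivative output:",
      np.abs(out_edge_plus_ramp - out_edge).max())
print("change in 1st derivative:", gradient_shift[:3])
```

The 2nd-derivative output is unchanged by the ramp, so any model that starts with such an operator predicts that the ramp should not alter what is seen.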

Effect of added ramp on perceived blur

• May & Georgeson (2007b) investigated this psychophysically, by adding a linear ramp to a fixed blurred edge, and then adjusting a reference edge until it looked as blurred as the edge + ramp stimulus

• Big effect of ramp rules out the idea that the first stage of edge processing is a 2nd derivative operator, as proposed in most previous models of edge coding in biological vision (e.g. Marr & Hildreth, 1980; Watt & Morgan, 1985; Kingdom & Moulden, 1992; Georgeson, 1992)

For edge detection, we need to ...

• Find local peaks of luminance gradient

• Measure the blur of each edge

• Detect sharp edges with small-scale operator and detect blurred edges with large-scale operator

Figure from Elder & Zucker (1998)

Lindeberg’s (1996, 1998b) algorithm

• Scale–space theory (Witkin, 1983; Koenderink, 1984)
• Scale–space representation contains a dimension representing filter scale, σ, in addition to the two dimensions of the image
• Processing steps: image → apply Gaussian 1st derivative operator with scale σ (equivalently, apply Gaussian receptive field with scale σ, then find gradient) → gradient → multiply by σ^0.5
• Detect edges by looking for peaks in scale–space
• Position of peak along “spatial position” dimension gives spatial position of edge
• Position of peak along “scale” dimension gives the blur of the edge
• Sharp edges detected with small-scale operator; blurred edges detected with large-scale operator

Figure axes: spatial position (horizontal); scale (σ) (vertical)
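The scale–space peak search can be sketched in one spatial dimension. This follows the steps listed above (Gaussian 1st derivative operator, gradient magnitude multiplied by σ^0.5, then a search for the peak over position and scale), but the test edge, the grid of scales and the helper names are assumptions, and it is not Lindeberg's full two-dimensional algorithm.

```python
import numpy as np
from math import erf

def gaussian_derivative_kernel(sigma):
    """Sampled 1st derivative of a unit-area Gaussian with scale sigma."""
    half = int(np.ceil(6 * sigma))
    x = np.arange(-half, half + 1, dtype=float)
    g = np.exp(-x ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
    return -x / sigma ** 2 * g

def scale_space_gradient(luminance, scales, gamma=0.5):
    """Rows are scales, columns are positions: sigma**gamma times |gradient|."""
    rows = [sigma ** gamma * np.abs(np.convolve(luminance,
                                                gaussian_derivative_kernel(sigma),
                                                mode="same"))
            for sigma in scales]
    return np.array(rows)

# Assumed test stimulus: a Gaussian edge at x = 20 with a blur of 4 samples.
x = np.arange(-300, 301, dtype=float)
true_position, true_blur = 20.0, 4.0
luminance = np.array([0.5 * (1 + erf((xi - true_position) / (true_blur * np.sqrt(2))))
                      for xi in x])

scales = np.arange(1.0, 10.5, 0.5)
response = scale_space_gradient(luminance, scales)

margin = 80                                   # ignore zero-padding artefacts
interior = response[:, margin:-margin]
scale_idx, pos_idx = np.unravel_index(np.argmax(interior), interior.shape)
estimated_blur = scales[scale_idx]
estimated_position = x[margin + pos_idx]

print("estimated position:", estimated_position)
print("estimated blur:", estimated_blur)
```

For a Gaussian edge of blur b, the normalised response at the edge is proportional to σ^0.5 / sqrt(b² + σ²), which is largest when σ = b, so the scale at the peak reads out the blur of the edge.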

Predicting human performance

• Lindeberg’s algorithm does quite well at predicting perceived blur of edges (Georgeson, May, Freeman & Hesse, 2007)
• But performance is better predicted by a similar algorithm that looks for peaks in the 3rd derivative, rather than the 1st (Georgeson et al., 2007; May & Georgeson, 2007b)
• This 3rd-derivative based algorithm also explains the effect of adding a linear ramp

What have we learned?

• Some cells perform linear summation across the image, and their receptive fields act like the kernel of a linear filter
• Retinal ganglion cells have centre-surround receptive fields
• These cells transform the cone responses into colour-opponent channels
• Contrast response function of retinal neurons has the shape of the cumulative probability distribution of contrast levels
• All of the above maximize the efficiency of information transfer from eye to brain
• Visual cortex more concerned with interpreting the image, e.g. edge detection
• Receptive fields in V1 are elongated, and respond well to edges
• An edge coding model that uses V1-style receptive fields to construct a scale-space representation accurately predicts perceived blur of edges

Things are not always as they seem