
A Tutorial Essay on Spatial Filtering and Spatial Vision

Tim S Meese, Neurosciences, School of Life and Health Sciences, Aston University

Feb 2009: minor stylistic modifications and technical corrections

March 2009: referencing tidied up

To follow: final round-up sections and detailed references.


1. Introduction

To the layperson, vision is all about ‘seeing’—we open our eyes and effortlessly we see the world ‘out there’. We can recognize objects and people, we can interact with them in sensible and useful ways and we can navigate our way about the environment, rarely bumping into things. Of course, the only information that we are really able to act upon is that which is encoded as neural activity by our nervous system. These visual codes, the essence of all of our perceptions, are constructed from information arriving from each of two directions. Image data arrive ‘bottom-up’ from our sensory apparatus, and knowledge-based rules and inferences arrive ‘top-down’ from memory. These two routes to visual perception achieve very different jobs. The first route is well suited to providing a descriptive account of the retinal image while the second route allows these descriptions to be elaborated and interpreted. This chapter focuses primarily on one aspect of the first of these routes: the processing and encoding of spatial information in the two-dimensional retinal image.

One of the major success stories in understanding the human brain has been the exploration of the bottom-up processes used in vision, sometimes referred to as early vision. But how do vision scientists get in to explore it? Our visual world is a very private one, not available for external scrutiny by vision scientists in any obvious way. Fortunately, such scrutiny is not necessary and, according to some, perhaps not even helpful. The point is made by David Marr’s consideration of the lowly housefly (Marr, 1982). Although we cannot know what it is like to be a housefly, we do not have to suppose that a housefly has an explicit visual representation of the world around it in order to study its vision. Perhaps the housefly just computes a few simple but immediately useful visually guided parameters. For instance, when the computation of ‘rate of image expansion’ reaches a certain value, this means that a surface is approaching and the ‘undercarriage down, prepare to land’ signal should be triggered. On this view, it makes little sense to ask how the visual world might ‘look’ to a housefly. However, by carefully manipulating artificial visual stimuli that mimic the fly’s normal visual environment it is possible to investigate its visual apparatus by observing its behavioural responses (e.g. whether it prepares to land).

A similar philosophy has been applied to understanding early vision in humans. The central tenet is that the visual system is a signal processor. It is treated as a black box, with a two-dimensional spatial signal as input and a ‘neural image’ (an organized distribution of neural activity) as output (see Figure 1), upon which behavioural decisions can be made. To learn about signal processing in the early visual system we need to know about the neural image. Visual psychophysics attempts to do this in the laboratory (see Figure 2) by assessing the behavioural responses to visual stimuli (e.g. pressing one of two buttons), usually made by trained observers. With this technique the system is necessarily treated as a whole, though as we shall see, careful thought and experimentation can reveal something of the nuts and bolts of the visual system’s inner workings. In neurophysiology, single- and multiple-cell recordings (see Figure 3) give us more direct access to the neural response, but allow us to look only at isolated fragments of the system at any one time.
[Figure 1 diagram: an input image (grey levels) passes through a black box (early vision) to produce an output image (neural activity).]

Fig 1. The black box approach to vision.


Together, neurophysiology and psychophysics have converged on the view that early spatial vision consists of a set of spatial filters. The evidence for this view and the reasons why vision might work this way form the basis of the second part of the chapter, but first we need some suitable building blocks. We begin by exploring some formal concepts of spatial filtering.

Fig 2. A psychophysics laboratory.

Fig 3. Direct recordings of visual neurons.

2. Filtering in the Fourier Domain

In most basic terms, a filter is a device that receives something as input and passes on some of its input as output. For example, a sieve might receive various grades of gravel as input, retaining the largest stones and passing on only the small chippings as output. In image processing, the input and output are both images, but what aspect of the image might be selectively passed on or filtered out? One very useful way of approaching this is from the Fourier domain. Essentially, a Fourier component can be thought of as a sine-wave grating1 of some particular orientation, spatial frequency, amplitude and spatial phase. It turns out, perhaps astonishingly, that all images can be thought of as a set of Fourier components that when added together recreate the original image. An example of an image and the amplitude spectrum of its Fourier transform (FT) are shown in Figure 4. In Figure 4b, the grey levels indicate the amplitudes of Fourier components (sine-wave gratings) within a map called Fourier space. This space is most conveniently expressed in terms of polar coordinates, where the argument (angle) indicates the orientation of a Fourier component and the modulus (distance from the origin) indicates its spatial frequency. Unfortunately, the gentle gradients of the grey levels in Figure 4b make it difficult to appreciate its structure, so a second method of plotting the Fourier transform is shown in Figure 4c. Here, quantization and colour enhancement reveal contours representing Fourier components of similar amplitudes. In this case, those with the greater amplitudes (coloured red and yellow) are those closer to the origin (centre of the image). But the point here is that images contain Fourier energy distributed across Fourier space (i.e. they have many Fourier components with different orientations and spatial frequencies) and that spatial filters are selective for different regions of Fourier space, passing some Fourier components and stopping (filtering out) others, just like the sieve. We will consider why it might be a good idea to do this later on (in Section 6), but for now, let us just consider the different types of filters that we might construct.

1 A sine-wave grating is a stimulus in which luminance is modulated sinusoidally, producing an image that looks like a set of blurred stripes. The experimenter can control the width of the stripes (spatial frequency), their orientation, spatial phase (how they line up with the centre of the display screen) and their contrast (the light levels for the light and dark bars relative to mean luminance). Sine-wave gratings and related stimuli are widely used in vision science (partly because they are the fundamental building blocks of all images), and examples can be seen by looking ahead to Figures 12 and 13.


Fig 4. Image and its Fourier transform. a) The original image. b) Amplitude spectrum of the Fourier transform. c) Colour enhanced version of (b). The colours represent high to low amplitudes as follows: red, orange, yellow, green, blue, purple, black.

A filter that is selective for the circular region of Fourier space shown in Figure 5a (known as the filter’s pass-band) would pass only low spatial frequencies (i.e. those close to the origin) but would not care about their orientations. Such a filter is known as a low-pass isotropic filter and an example of how it transforms an image is shown in Figure 6a. (For now, you can ignore the small insets in the upper left corner of Figure 6; these will be explained later in Section 3). The output image contains only the Fourier components from the input image that are within the filter’s pass-band. In other words, it contains low spatial frequencies but no high spatial frequencies at any orientation—we say they have been filtered out. Other interesting filters are those that pass only a band of spatial frequencies at any orientation (band-pass isotropic filters; Figure 5b) and bands of spatial frequencies at only specific orientations (oriented band-pass filters; Figure 5c)2. Note that for all three of the filters in Figure 5, the Fourier components to which each filter is most responsive are those shaded red. So, for the filter in Figure 5b for example, the red ring represents a single spatial frequency at any orientation. The results of applying these filters to the image in Figure 4a are shown in Figure 6. (See ahead to Fig 13 for another example of oriented band-pass filtering). With the aid of a computer and appropriate image processing software, performing filtering operations in the Fourier domain is very straightforward. All we have to do is multiply the Fourier transform of the input image with the Fourier representation of the filter (sometimes called the filter’s modulation transfer function [MTF]), the details for which are described in Box 1. The three MTFs in Figure 5 have been multiplied by the Fourier transform of the image in Figure 4, to produce the Fourier transforms (Figure 7) of three different output images. The final step of generating the output images simply involves computing their inverse Fourier transforms. This is done by adding up spatial representations of all the many Fourier components (sine-wave gratings) in the Fourier transform. This is how the three filtered images in Figure 6 were generated.

2 For mathematical reasons each Fourier component appears twice in the Fourier domain, one being a rotation of the other through 180°. Intuitively, this makes good sense when you realize that if you rotate a cosine-phase sine-wave grating through 180°, you get back to where you started. So, at first sight, in Fig 5c it looks as though the filter is sensitive to two different regions of Fourier space, but one of them is in fact just a copy of the other rotated through 180°.
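To make the Fourier route concrete, here is a minimal sketch in Python (using numpy; the random test image, the Gaussian MTF profile and its cut-off value are illustrative choices, not the filters used for the figures):

```python
import numpy as np

# A stand-in input image (grey levels in the range 0..1).
rng = np.random.default_rng(0)
image = rng.random((256, 256))

# Build an isotropic low-pass MTF over Fourier space. In polar terms,
# sensitivity depends only on distance from the origin (spatial
# frequency), not on orientation. The Gaussian profile is arbitrary.
fy = np.fft.fftfreq(image.shape[0])              # cycles per pixel
fx = np.fft.fftfreq(image.shape[1])
radius = np.sqrt(fx[np.newaxis, :]**2 + fy[:, np.newaxis]**2)
mtf = np.exp(-(radius / 0.05)**2)                # attenuates high frequencies

# Fourier route: multiply the image's Fourier transform by the MTF,
# then inverse transform to obtain the filtered output image.
output = np.real(np.fft.ifft2(np.fft.fft2(image) * mtf))
```

Because this MTF is real and symmetric (a cosine-phase filter imposing no phase shifts), the inverse transform is real up to rounding error, which is why only the real part is kept.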


Fig 5. Modulation transfer functions of three filters. a) low-pass isotropic filter (e.g. the eye’s optics). b) band-pass isotropic filter (e.g. retina and LGN). c) band-pass vertical filter (e.g. cortex). The colours represent high to low sensitivities as follows: red, orange, yellow, green, blue, purple, black.


Fig 6. a,b,c. Effects of applying the three filters in Fig 5 to the image in Fig 4a.


Fig 7. a,b,c. Fourier transforms of the filtered images in Fig 6. Note that the amplitude scale in (c) has been amplified a little to help reveal the spectrum’s structure.


------BOX 1 Multiplication of two images

Filtering can be achieved by multiplying a filter’s MTF with the Fourier transform of the input image to generate the Fourier transform of the output image. The first step is to think of the amplitude spectrum of the image’s Fourier transform (Figures 4 & 7) and the filter’s MTF as images: they just contain regions of different shades of grey on a dark background. (Though in most of the figures in this chapter, colour has been used to achieve visual enhancement). Now, to multiply any two images together, all that is needed is to convert the image’s grey levels to numbers. It is convenient to associate black with zero, white with one, and intermediate grey levels (or colours) with intermediate values. We can now think of each image as a two-dimensional array of neighbouring points with each entry being represented by a number (a grey level). To calculate the product of the two images all we do is work out the product of the two numbers at each of the corresponding points in the two images. Note that because the numbers representing the MTF are never greater than one, the numbers in the output can never be greater than the corresponding numbers in the input. For this reason, filters are often said to attenuate the input image.

------

2.1 Preferred stimuli and bandwidths

[Figure 8 graphs: sensitivity plotted against orientation (deg, linear axis) and against spatial frequency (c.deg-1, log axis); preferred orientation = 90°, preferred spatial frequency = 10 c.deg-1; orientation bandwidth (θbw) = 40°; spatial frequency bandwidth (fbw) = log2(14/5.7) = 1.3 octaves.]

Fig 8. Filter bandwidths. a) MTF of band-pass horizontal filter. b) Orientation bandwidth. c) Spatial frequency bandwidth. Note that (a) and (b) are plotted on linear scales, whereas in (c) a log-axis is used.

A convenient way of summarizing a filter’s characteristic (its MTF) is in terms of i) the spatial frequency and orientation that produce the maximum response (i.e. the greatest output), referred to as the preferred orientation and preferred spatial frequency and ii) the range of Fourier components to which it responds, referred to as its bandwidth. There are several different conventions for describing bandwidth, but the one considered here is the filter’s full-width at half-height. Figure 8a is the MTF of an oriented band-pass filter (note that the orientation and spatial frequency are different from those in Figure 5c). The white circle in Figure 8a has its centre at the origin of Fourier space. This means that all of the points on this circle have the same spatial frequency but different orientations. The circle also passes through the point in the MTF that represents the preferred spatial frequency and orientation for this filter (the red spot). This is the Fourier component that would be least affected (least attenuated) by the filter. Sometimes we say that the filter is tuned to this spatial frequency and orientation. In Figure 8b a one-dimensional plot of the filter’s MTF is shown. The ordinate represents amplitude of the MTF, and when presented this way it is often referred to as sensitivity (or, sometimes, gain). The abscissa represents the contour in Fourier space mapped out by an arc from the circle in Figure 8a, indicating ‘orientation’. The horizontal line in Figure 8b shows the width of the plot at half of the filter’s maximum sensitivity, indicating that this filter has an orientation bandwidth of 40°. (This is also shown by the wedge-shape in Figure 8a.) Figure 8c illustrates a similar idea for spatial frequency. Here the abscissa is the contour through the MTF given by a straight line that passes through the origin and the filter’s preferred spatial frequency (see Figure 8a). It is common to plot spatial frequency on a log-axis, and this is what is done in Fig 8c (i.e. equal intervals along the axis represent equal frequency ratios). Furthermore, it is also common to report spatial frequency bandwidth in terms of octaves (an octave is a relative dimension; see Box 2). Spatial frequency bandwidth is determined in the same way as for orientation, and Figure 8c shows that for our filter it is 1.3 octaves.
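As a concrete check of these definitions, the sketch below recovers the 1.3-octave figure from a handful of hypothetical sensitivity samples along the spatial frequency contour of Figure 8c:

```python
import numpy as np

# Hypothetical samples of a filter's sensitivity along the spatial
# frequency contour of its MTF (cf. Figure 8c); peak normalised to 1.
freqs = np.array([2.0, 4.0, 5.7, 10.0, 14.0, 20.0])   # c/deg
sens = np.array([0.05, 0.20, 0.50, 1.00, 0.50, 0.10])

# Full-width at half-height: where sensitivity crosses 0.5 on the
# rising and falling flanks (linear interpolation between samples).
low = np.interp(0.5, sens[:4], freqs[:4])               # 5.7 c/deg
high = np.interp(0.5, sens[3:][::-1], freqs[3:][::-1])  # 14 c/deg

# Bandwidth in octaves (see Box 2): log2 of the frequency ratio.
print(np.log2(high / low))   # ~1.3 octaves
```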


Fig 9 a,b,c. Spatial filter-elements for the spatial filters shown in Fig 5. For these filters, the spatial filter-element is also the filter’s pointspread function. Note that in all cases, the figures have been enlarged for clarity.

------BOX 2 Octaves

In the frequency domain, an octave is simply a multiple of two (or a half, depending upon whether you are moving up or down the scale). For example, as we move up a piano keyboard in octave steps we are doubling the pitch of the notes. For Fourier components, a spatial frequency of 8 c.deg-1 is two octaves higher than one of 2 c.deg-1 because we double once to get from 2 to 4, and then again to get from 4 to 8. Formally, the difference between two frequencies in octaves is given by Log2(H/L), where H is the higher frequency and L is the lower frequency. (And as every high school child knows, Log2(x) = Log10(x)/Log10(2).)

------

2.2 Spatial phase

We have seen that spatial filters are selective for particular orientations and spatial frequencies. Another dimension along which spatial filters are selective is spatial phase. The phase selectivity of a spatial filter is represented by the phase spectrum (not shown) of the Fourier representation of the filter. This is a little more difficult to think about and the details are beyond the scope of this chapter. However, in brief, a cosine-phase filter will impose no phase shift on its output whereas a sine-phase filter will shift the phase of the Fourier components within its pass-band by a phase angle of 90°. As all of the filters that we consider here are cosine-phase filters, we can safely ignore the phase spectrum for our present purposes.

2.3 The eye’s optics as a filter

The eye’s optics are not perfect but degrade the image by blurring it (amongst other things). This process can be characterized in terms of the MTF of the optics and can be measured experimentally. Recall from above that:

OPamp = MTF × IPamp,

where OPamp and IPamp are the amplitude spectra of the output and input images respectively, and the MTF is the Fourier representation of the filter. This rearranges to give:

MTF = OPamp/IPamp. (1)

This means that with a carefully chosen input image we should be able to calculate the MTF (the amount by which each and every Fourier component is attenuated) for the eye’s optics. Specifically, because the MTF describes the attenuating effects of the filter on all Fourier components, we need a stimulus that contains all Fourier components so that OPamp/IPamp is a complete description of the filter. It is perhaps not immediately obvious what such an image might look like, so for now we will consider something slightly different but which is exactly equivalent for the filters considered in this chapter. Instead of presenting all of the Fourier components (sine-wave gratings) at the same time, we will consider presenting them sequentially. Of course, to do this for all Fourier components would take forever, but a good approximation can be achieved by selecting a restricted set of test gratings and sampling at, say, every 10° in orientation and every 0.5 octaves in spatial frequency. So, we need lots of sine-wave gratings at different orientations and spatial frequencies, each of unit amplitude, and we need to measure the attenuating effects of the eye’s optics for each of these gratings. We can do this by using a modified ophthalmoscope (a device for looking into people’s eyes) to inspect the retinal image that is produced for each of the test gratings. With the help of a photo-diode it is possible to measure the light levels at the peaks and troughs of the gratings that appear in the retinal images and deduce the amplitude of each grating’s image. Of course, we must remember to adjust our calculations to allow for the fact that the light has passed through the eye’s optics twice (once on the way in, to produce the retinal image, and once on the way out, to produce the image that is being measured by our photo-diode).

Now, because the amplitude spectrum of each grating in IPamp was unity, it follows from Equation 1 that an estimate of the MTF is given directly from our measurements of OPamp. Specifically, to generate our estimate we first draw some axes that represent Fourier space. We now create a picture of the MTF by using the spatial frequency and orientation of each grating from our experiment to index Fourier space and write into the picture at that point an intensity (grey level) that represents the amplitude of the grating in the retinal image. For a typical observer without astigmatism, this would produce a picture (after colour enhancement) that looks something like that in Figure 5a. (In fact, the level of optical blur has been exaggerated considerably here for the purpose of illustration and, in practice, the picture would be much more grainy than shown because of the practical limits imposed on the sampling of Fourier space described above.) All this means that the eye’s optics can be thought of as an isotropic low-pass filter. Note that what we have achieved here is a characterization of the eye’s optics (the MTF) that allows us to deduce how any image would be filtered, just by observing the way in which a set of sine-wave gratings is attenuated. Actually, as we shall see in Section 3, there is a second and more straightforward method that we can use to achieve exactly the same thing.
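The grating-by-grating logic is easy to mimic in simulation. In this sketch the eye’s optics are stood in for by a Gaussian blur (an assumption for illustration only), and the attenuation of each unit-amplitude test grating provides one sample of the MTF; because the filter is isotropic, one orientation suffices:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

size = 256  # pixels; a single row suffices for an isotropic filter

def attenuation(cycles, blur_sigma=2.0):
    """Pass a unit-amplitude grating through simulated 'optics'
    (a Gaussian blur) and return the amplitude of the grating that
    comes out. With unit input amplitude, Equation 1 says this is
    the MTF at that spatial frequency."""
    x = np.arange(size)
    grating = np.sin(2 * np.pi * cycles * x / size)   # unit amplitude
    blurred = gaussian_filter(grating, blur_sigma, mode="wrap")
    return (blurred.max() - blurred.min()) / 2        # peak-to-trough / 2

# Sample Fourier space at a handful of spatial frequencies: the
# transmitted amplitude falls as frequency rises, i.e. a low-pass
# characteristic like Figure 5a.
for cycles in [2, 4, 8, 16, 32, 64]:
    print(cycles, round(attenuation(cycles), 3))
```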

2.3.1 Anti-aliasing

We have seen above that the eye’s optics filter out high spatial frequencies from the input. At first this might seem like bad news because it follows that lots of spatial information out there in the real world never makes it into our neural representations. (It has been claimed that this is where the fairies live, which is why you never see them!) However, it turns out that the fidelity of the optics is very well matched to the fidelity of subsequent neural processing. It is well known to engineers that if an image is under-sampled, spurious low frequency Fourier components called aliases will be introduced (e.g. see Reference 7 in the reading list). Aliased components are undesirable because they do not convey useful information about the outside world; they just create distracting clutter. In vision, the sampling density is determined by the spacing of receptors in the retinal mosaic, which are positioned such that spatial frequencies higher than about 50 or 60 c/deg would result in aliasing (this cut-off point is referred to as the Nyquist frequency). Fortunately, the MTF of the eye’s optics ensures that spatial frequencies this high never reach the retina. This means the eye’s optics can be thought of as an anti-aliasing filter. This is a valuable operation and very similar to the sort of filtering that is performed by sound engineers when making digital recordings. In this case, the analogue input signal (e.g. music) might contain audio frequencies that are higher than half the sampling frequency used for making CDs (and also too high to be heard). However, unless they are filtered out using a low-pass anti-aliasing filter, these high frequencies would produce low frequency aliases, which would find their way onto the CD, producing unpleasant auditory interference. See Meese (2002) for further details.
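The audio analogy can be demonstrated in a few lines (hypothetical numbers): sampling a sinusoid that lies above the Nyquist limit produces samples indistinguishable from those of a genuine low frequency.

```python
import numpy as np

fs = 100.0                       # sampling frequency (samples per second)
t = np.arange(0, 1, 1 / fs)      # sample times
f_signal = 70.0                  # above the Nyquist limit of fs/2 = 50

samples = np.sin(2 * np.pi * f_signal * t)

# The samples are indistinguishable from a genuine 30 Hz sinusoid
# (the alias frequency is |70 - 100| = 30), apart from an inversion.
alias = -np.sin(2 * np.pi * 30.0 * t)
print(np.allclose(samples, alias, atol=1e-9))   # True
```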

3.0 Filtering in the spatial domain

In the discussion above we have considered filtering from a Fourier perspective by characterizing a filter in terms of its MTF. An alternative and equally useful characterization is possible in the spatial domain. Such representations go by several names including the pointspread function, the spatial filter-element, the receptive field, the convolution kernel, the convolution mask, the impulse response and the weighting-function. The preferred term depends upon several things including the derivation, the application, and the background of the author! The terms used in this chapter will be the first three in the respective contexts of i) empirical estimates from point-source stimuli, ii) applications in image processing and iii) their presence in biological visual systems. Inevitably, however, there will be some overlap between the contexts and therefore the terminology.

Spatial domain representations of the three filters from Figure 5 are shown in Figure 9. In the case of the eye’s optics (Figure 9a) it is very straightforward to get an empirical estimate of this function. Suppose that we repeat the experiment described previously using the modified ophthalmoscope, but this time we use just a single point of bright light as the input image and record the retinal image that it produces. If we now adjust our record of this image to allow for the fact that the light has passed through the eye’s optics twice, we have a representation of what is known as the pointspread function (Figure 9a). This is a description of how a filter (in this case the eye’s optics) distributes a point of light received as input. Because all images can be thought of as just a set of spatially distributed points of light, we can mimic the effects of our spatial filter by replacing each light point in the input image with a copy of the pointspread function multiplied by a number (between zero and one) that represents the light intensity of the image at that point. Summing together all of the weighted and overlapping pointspread functions then creates the output image. Although very different in approach, this is exactly equivalent to calculating the filter output through the Fourier route as described earlier. Note then, that this time we achieved a characterization of the eye’s optics (the pointspread function) that allows us to deduce how any image would be filtered, just by observing the way in which a single point of light is blurred.

In fact, it turns out that this method is very closely related to the earlier method using gratings. The problem that we sidestepped before was the creation of a stimulus containing all Fourier components. Perhaps surprisingly, however, our single point of light is just such a stimulus. It is the image that you get when you inverse Fourier transform an amplitude spectrum where the whole of Fourier space is set to unit amplitude (and cosine phase). This leads us to an important insight. Because IPamp in Equation 1 is unity for all Fourier components in our single point of light, it follows that for this stimulus the MTF is equal to the amplitude spectrum of the output image. In other words, the MTF is equal to the Fourier transform of the filter’s pointspread function. In fact, the MTFs in Figure 5 were generated by calculating the Fourier transforms of the pointspread functions in Figure 9.

Another important spatial domain process that, for our purposes, is equivalent to the one described above, is convolution. In this case, spatial filtering is implemented by a set of identical spatial filter-elements (such as those shown in Figure 9) centred over each spatial location in the input image. (More commonly, in image processing applications, the process is implemented serially with a single filter-element that is moved sequentially over the entire image.) To create the output image, the ‘response’ of each filter-element is calculated as follows. First, we need to realize that a spatial filter-element can itself be thought of as an image; it too is just represented by a set of grey levels. In this case, however, it is common to assign a value of 1 to white, -1 to black, and values between these extremes to intermediate grey levels, with zero being mid-grey. Filter-elements are usually only a small fraction of the size of the image with which they are being convolved and the filter-elements in Figure 9 have been enlarged for clarity; the actual sizes of the filter-elements that were used in the convolutions are shown as the insets (upper left corners) in Figure 6. The next stage is to multiply the small image patch that describes each filter-element with the patch of input image that lies beneath it (recall that multiplication of images is described in Box 1).
Next, we add up all of the values that resulted from the multiplications to give a single number for each filter-element3. This number is the response of the filter-element and is written into the output image in a position corresponding with the centre position of the filter-element. (The entire process is summarized in Figure 10). The output image that is produced by this process is sometimes called a convolved image and is identical to that which would be generated using either of the other methods of filtering described earlier (for cosine phase filters).
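A direct transcription of this multiply-and-sum recipe might look as follows (a sketch; practical code would use a library routine such as scipy.signal.convolve2d, and border handling is ignored by letting the output be slightly smaller than the input):

```python
import numpy as np

def element_response(image, element, row, col):
    """Response of one filter-element with its top-left corner at
    (row, col): multiply the element with the image patch beneath
    it, point by point, then sum (see Box 1)."""
    h, w = element.shape
    return np.sum(image[row:row + h, col:col + w] * element)

def convolve(image, element):
    """Slide the filter-element over the image, writing each
    response into the corresponding point of the output image."""
    rows = image.shape[0] - element.shape[0] + 1
    cols = image.shape[1] - element.shape[1] + 1
    output = np.empty((rows, cols))
    for r in range(rows):
        for c in range(cols):
            output[r, c] = element_response(image, element, r, c)
    return output
```

Strictly, sliding the element without flipping it computes cross-correlation rather than convolution, but for the cosine-phase (symmetric) filter-elements of this chapter the two operations are identical (see Section 3.1).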

[Figure 10 diagram: an input image is convolved with the filter-element of a band-pass vertical filter; at each position the filter-element and the underlying image patch are multiplied and summed to produce a single grey level in the output image.]

Fig 10. The process of convolution for just a single point in the image. By repeating the process for different positions of the filter-element, an entire output image can be generated. The figure shows the process partially complete.

3 For technical reasons, this number is often scaled by a constant that depends upon the size of the filter-element.


3.1 Relation between the pointspread function and the filter-element

As we have seen, it is most straightforward to think of optical systems performing filtering in terms of the pointspread function. However, as we shall see, at later stages of the visual system where filtering is performed by visual neurons it is much more natural to think in terms of convolution and the filter-element. Fortunately, there is a very close formal relation between a filter’s pointspread function and its filter-element. It turns out that one can be generated from the other by reflections across the x-axis and the y-axis (see Reference 5 in the reading list for a good illustration of why this is so). More fortunate still, for the cosine-phase filters considered in this chapter, these reflections have no effect (this is revealed from inspection of Figure 9) and so the pointspread function and the spatial filter-element are exactly the same.
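In code, the relation is just a pair of flips, and for a symmetric element the flips change nothing (a sketch with an arbitrary centre-surround element):

```python
import numpy as np

# An arbitrary symmetric (cosine-phase) centre-surround element.
x = np.linspace(-3, 3, 15)
xx, yy = np.meshgrid(x, x)
element = np.exp(-(xx**2 + yy**2)) - 0.5 * np.exp(-(xx**2 + yy**2) / 4)

# The pointspread function is the filter-element reflected across
# both the x-axis and the y-axis.
pointspread = np.flip(np.flip(element, axis=0), axis=1)

print(np.allclose(element, pointspread))   # True: the flips change nothing
```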

3.2 Positive and negative lobes

In Figure 9a, the spatial filter-element contains only a positive lobe, denoted by the circle in Figure 11. In effect, when it is used to perform convolutions it is performing a local spatial averaging of light in the image so it is perhaps not surprising that this results in a blurring of the image (Figure 6a). Thought of this way, it is easy to see why this filter will not pass high spatial frequencies. When the light and dark bars of a grating are very fine they are blurred together by the filter. For example, in Figure 11, it doesn’t matter where the filter-element is placed, its response (output) will be very much the same—an average mid-grey level. Note also that changing the orientation of the grating won’t make any difference to the response of the filter because of the circular shape of the filter-element. Because of the shape of this filter-element, isotropic filters are sometimes called circular filters.


Fig 11. Low-pass filter-element and high spatial frequency grating.


The spatial structures of the other two filter-elements in Figure 9 are more complicated and include negative lobes (denoted by grey levels darker than mid grey) as well as positive lobes. As we have just seen, the positive lobes result in the attenuation of high spatial frequencies, but if a (cosine-phase) filter is to attenuate low spatial frequencies (as do the band-pass filters of Figures 5b & c) then its filter-element will have negative lobes. The reason for this is outlined intuitively in Figure 12 where we consider the response of a filter-element placed over a light bar of a grating at three different spatial frequencies. In Figure 12b, the spatial frequency of the grating is close to optimal—a light bar falls in the positive lobe and dark bars fall in the negative surround. In this case, the centre region produces a positive response and the absence of light in the negative region results in very little negative contribution, so the net response is positive and strong. In Figure 12c the response is weak because both the centre and the surround blur together the fine bars of the stimulus. In Figure 12a, the response is also weak because even though the positive lobe responds to the light region, the light also stimulates the negative surround and the two contributions cancel each other out. Thus, we have seen that this filter-element does not respond to low or high spatial frequencies, but does respond to mid spatial frequencies. This confirms the band-pass characteristic that we have seen revealed already by its MTF (Figure 5b). Again, because the filter-element is circular, changing the orientation of a grating has no effect on its response, just as we would expect for an isotropic filter.


Fig 12. Isotropic band-pass filter-element and gratings. a) Low spatial frequency grating. b) Mid spatial frequency grating. c) High spatial frequency grating.
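The same intuition can be checked numerically. Below, a centre-surround filter-element is modelled as a difference of Gaussians (a common choice, though the chapter does not commit to a particular shape) and probed with gratings of low, mid and high spatial frequency; only the mid frequency produces a strong response:

```python
import numpy as np

x = np.linspace(-4, 4, 81)
xx, yy = np.meshgrid(x, x)
r2 = xx**2 + yy**2

# Positive centre minus a broader negative surround, balanced so that
# the element gives (approximately) zero response to uniform light.
element = np.exp(-r2) - np.exp(-r2 / 4) / 4

for freq in [0.05, 0.3, 2.0]:                 # cycles per unit: low, mid, high
    grating = np.cos(2 * np.pi * freq * xx)   # light bar over the centre
    print(freq, round(float(np.sum(element * grating)), 1))
# Weak responses at low and high frequencies, a strong response at the
# mid frequency: the band-pass characteristic of Figure 5b.
```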

3.3 Preferred stimuli and bandwidths in the spatial domain

A filter’s preferred stimulus and bandwidth are actually given most directly in the Fourier domain representation, as we saw in Figure 8, because, essentially, these terms are Fourier descriptions. However, manipulating these parameters has predictable consequences for the filter-element. The most obvious one is orientation: a filter’s preferred orientation is given by the orientation of the spatial structure of its filter-element. In Fig 9c, for example, this is vertical. A filter’s preferred spatial frequency depends upon the widths of the positive and negative lobes. To a first approximation, the widths of these lobes indicate the widths of the light and dark bars contained in the sine-wave grating that is preferred by the filter. Consequently, halving all the spatial dimensions of a filter-element will double the filter’s preferred spatial frequency.


Bandwidths are a little trickier to think about but in brief, decreasing orientation bandwidth increases the height of the spatial filter-element, and decreasing the spatial frequency bandwidth increases the number of positive and negative lobes in the filter-element. In essence, to decrease a filter’s bandwidth all you have to do is more closely match the spatial structure of the filter-element to the spatial structure of the filter’s preferred sine-wave grating. Thus, in the limit, a filter with infinitesimally narrow orientation and spatial frequency bandwidths would have a filter-element that was exactly like a sine-wave grating and of infinite spatial extent.
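These relations are conveniently embodied in the Gabor function (a sinusoid windowed by a Gaussian envelope), which is widely used as a model filter-element; the parameter values below are arbitrary:

```python
import numpy as np

def gabor(half_size, freq, theta_deg, sigma):
    """Cosine-phase Gabor filter-element on a square grid.
    freq:      preferred spatial frequency (cycles per pixel)
    theta_deg: preferred orientation (0 gives vertical bars here;
               orientation conventions vary between authors)
    sigma:     Gaussian envelope size in pixels; a larger envelope
               holds more lobes and so narrows the spatial frequency
               and orientation bandwidths"""
    y, x = np.mgrid[-half_size:half_size + 1, -half_size:half_size + 1]
    theta = np.deg2rad(theta_deg)
    u = x * np.cos(theta) + y * np.sin(theta)   # axis across the bars
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.cos(2 * np.pi * freq * u)   # preferred grating

# Halving all spatial dimensions (frequency doubled, envelope halved)
# doubles the preferred spatial frequency but leaves the octave
# bandwidth unchanged.
coarse = gabor(32, freq=0.05, theta_deg=0, sigma=8.0)
fine = gabor(32, freq=0.10, theta_deg=0, sigma=4.0)
```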

4. Relation between filtering in the spatial and Fourier domains

In the previous sections we have seen how filtering can be understood from the perspectives of both the Fourier domain and the spatial domain. The beauty of the interrelation between these approaches through the Fourier transform is summarized in Figure 13. The bottom row illustrates the Fourier route and shows that we can generate the Fourier transform of a filtered (output) image by multiplying the filter’s MTF with the Fourier transform of the original (input) image. Alternatively, as shown by the top route, we can filter an image by convolving it with the filter’s filter-element. And for the filters considered here (see Section 3.1) this is given by the inverse Fourier transform of the filter’s MTF.

[Figure 13 diagram. Spatial domain (top route): the input image, convolved with the filter-element, produces the output image. Fourier domain (bottom route): the FT of the input image, multiplied by the modulation transfer function, produces the FT of the output image. Forward and inverse Fourier transforms link the corresponding items in the two domains.]

Fig 13. Relation between filtering in the Fourier domain and the spatial domain. Note that strictly speaking, the inverse Fourier transform of the MTF is the pointspread function but that for the cosine-phase filters considered in this tutorial, the pointspread function is identical to the filter- element.
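The equivalence asserted by Figure 13 can be verified numerically (a sketch; wrap-around borders are used so that the two routes are exactly comparable):

```python
import numpy as np
from scipy.ndimage import convolve

rng = np.random.default_rng(1)
image = rng.random((64, 64))

# A small symmetric (cosine-phase) filter-element.
element = np.array([[1., 2., 1.],
                    [2., 4., 2.],
                    [1., 2., 1.]]) / 16.0

# Spatial route: convolve the image with the filter-element.
spatial = convolve(image, element, mode="wrap")

# Fourier route: embed the element in an image-sized array with its
# centre at (0, 0), transform both, multiply, inverse transform.
kernel = np.zeros_like(image)
kernel[:3, :3] = element
kernel = np.roll(kernel, (-1, -1), axis=(0, 1))
fourier = np.real(np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)))

print(np.allclose(spatial, fourier))   # True: the two routes agree
```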

5. The contrast sensitivity function (CSF)

In Section 2.3 we saw how experiments have allowed us to characterize the first stage of filtering performed by the optics of the eye. We shall now turn to the overall filtering properties of the whole visual system. This has been characterized psychophysically by the contrast sensitivity function (CSF), which provides an estimate of the MTF for human vision4. The technique involves measuring contrast detection thresholds for sine-wave gratings over a range of spatial frequencies. The contrast detection threshold is the lowest contrast at which a stimulus can be just detected by an observer, and can be measured using what is known as a two-interval forced-choice technique (2IFC). In this technique a single experimental trial consists of two temporal intervals (two brief stimulus presentations separated in time), each signaled by an auditory beep. One of the intervals, chosen at random, contains a test grating and the other contains no stimulus, just a blank display with the same mean luminance as that in the test interval. The observer has to decide which interval contained the test stimulus and indicate their response by pressing one of two buttons. If the observer were able to see the stimulus then their response would be correct, whereas if they were not, they would have to guess and would be correct with a probability of 0.5. By performing many trials at a range of stimulus contrasts it is possible to generate a psychometric function such as that shown in Fig 14. This is a plot of the percentage of correct responses as a function of stimulus contrast and can be used to estimate the contrast detection threshold. This is often treated as the contrast level at which observers were correct on 75% of trials, estimated by fitting a smooth curve through the data. Thus, for the example shown in Fig 14, the detection threshold is a contrast of 1%, sometimes written ct = 0.01. As we shall see, when plotting the CSF the results are typically expressed in terms of sensitivity, which is reciprocally related to contrast detection threshold: sensitivity = 1/ct. Thus, for our example, sensitivity equals 100.

4 This involves several assumptions, the details of which are beyond the scope of this chapter.
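The logic of 2IFC is easy to simulate. In the sketch below the observer’s internal response is modelled as stimulus contrast plus Gaussian noise (a standard signal-detection idealization rather than anything claimed in this chapter; the noise level is arbitrary), and the observer picks the interval with the larger response:

```python
import numpy as np

rng = np.random.default_rng(2)
noise_sd = 0.5   # internal noise, in the same arbitrary units as contrast

def percent_correct(contrast, n_trials=10_000):
    """Simulate 2IFC trials: one interval contains signal plus noise,
    the other noise alone; the observer chooses the larger response."""
    test = contrast + rng.normal(0, noise_sd, n_trials)
    blank = rng.normal(0, noise_sd, n_trials)
    return 100 * np.mean(test > blank)

for c in [0.0, 0.25, 0.5, 1.0, 2.0]:
    print(c, percent_correct(c))
# At zero contrast the observer guesses (about 50% correct); performance
# rises smoothly towards 100%, and threshold can be read off at 75%.
```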

[Figure 14 graph: percent correct (25–100%, with the guess rate of 50% marked) plotted against stimulus contrast (0.25–4%, log axis).]

Fig 14. Psychometric function for detecting a sine-wave grating in a psychophysical 2IFC experiment.

[Figure 15 graph (‘Typical CSF’): contrast sensitivity (1–1000, left axis) and the equivalent contrast (1–0.001, right axis) plotted against spatial frequency (0.1–100 c.deg-1), log axes.]

Fig 15. The contrast sensitivity function (CSF).

By measuring psychometric functions for vertical sine-wave gratings at different spatial frequencies, sensitivities can be plotted as a function of spatial frequency. In our present context, Campbell and Robson (1968) were the first to do this, and the sort of results that they found are shown in Fig 15. Note that the axes are similar to those that were introduced earlier in our discussion of spatial filters (see Figure 8c) though here, the ordinate is also shown as a log axis. The filter characterization is not as complete as those shown in Figure 5 because the effects of orientation have not been considered, but for a single orientation the CSF provides a good illustration of the spatial frequency tuning of the human visual system. For example, at high spatial frequencies a grating cannot be detected (it just looks like a uniform grey field) no matter how high its contrast is. We should not be surprised by this attenuation at high spatial frequencies because of what we have learned already about the filtering performed by the optics. But why should there be attenuation at low spatial frequencies? The optics do not do this! As we know from Section 3.2, in order to have this band-pass characteristic the filter must have a filter-element with at least one negative lobe. How might the human visual system implement this filtering, and where within the nervous system might it take place?
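Before moving on, the threshold-to-sensitivity conversion can be made concrete; the thresholds below are hypothetical values chosen to mimic the shape of Figure 15:

```python
import numpy as np

# Hypothetical contrast detection thresholds (as proportions, not %)
# for gratings at a range of spatial frequencies (c/deg).
freqs = np.array([0.3, 1.0, 3.0, 10.0, 30.0])
thresholds = np.array([0.02, 0.005, 0.002, 0.01, 0.2])

sensitivity = 1 / thresholds
for f, s in zip(freqs, sensitivity):
    print(f"{f:5.1f} c/deg   sensitivity {s:5.0f}")
# Sensitivity peaks at mid spatial frequencies and falls away at both
# low and high frequencies: the band-pass shape of the CSF.
```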

6. The retina and lateral geniculate nucleus

Before vision scientists had even begun to think of early vision in terms of spatial filters, valuable physiological evidence had already begun to accrue. The evidence came from the single-cell recording technique used by physiologists. In brief, this surgical technique involves inserting a micro-electrode into (or close to) a visual neuron and observing the cell’s response to conventional visual stimulation. In the 1950s, Kuffler recorded from retinal ganglion cells in the cat and found that these cells were responsive to stimulation in approximately circular regions on the retina, different cells preferring stimulation from different regions (Kuffler, 1953). This region is known as the cell’s receptive field. Crucially, however, Kuffler’s receptive fields were found to contain distinct excitatory and inhibitory subregions. So, for example, compared with the cell’s response rate in the absence of stimulation, it might have increased its response if a bright spot were placed in the centre of its receptive field, but decreased its response if a bright spot were placed towards the boundary of its receptive field. For obvious reasons, such cells are often referred to as on-centre cells. It is possible to plot a detailed map of a cell’s receptive field by using grey levels to indicate whether stimulation has caused its response rate to increase (between mid grey and white) or decrease (between mid grey and black). When this is done the results look remarkably similar to the spatial filter-element shown in Fig 9b. In other words, the receptive fields of retinal ganglion cells can be thought of as the spatial filter-elements of an isotropic band-pass filter that cover the retina, with neighbouring cells having overlapping receptive fields centred on adjacent locations in the retina. Recordings made in the lateral geniculate nucleus (LGN) prompt similar conclusions about the organization and character of the visual neurons found there. A more detailed account of receptive fields and their stimulation is provided in Box 3.

Because the receptive fields of retinal ganglion cells and LGN cells can be thought of as filter-elements, it is an obvious next step to think of the filtered image in similar terms. Quite simply, the response rate of each cell represents the response of each filter-element. Put another way, the grey levels of neighbouring pixels in the filtered image are represented by the response rates of visual neurons whose receptive fields are centred over neighbouring points in the original image (the retinal image). Note though, unlike the retinal image, this neural image is not really an image at all but a spatially distributed set of neural responses. However, some care is needed over how we interpret the different grey levels. Recall that in Section 3.0 we learned that it is conventional to use negative numbers to represent the negative (e.g. black) regions in the filter-element. This makes sense in terms of neural receptive fields because these regions are in fact inhibitory. However, a moment’s thought should reveal that the consequence of this is that regions of the output image could become negative. For example, if the image were entirely black in the positive region of a filter-element and entirely white in the negative region of the filter-element, then the response would be negative. The regions whose grey level is darker than the mid-grey level represent these ‘negative’ responses in Figures 6b and 6c.
This poses something of a problem because neurons cannot fire negatively and so there is a danger that information will be lost from the filtered image. This problem is handled in two different ways by biological vision. We have already met the solution used in the retina and LGN. Here, a constant response rate (spontaneous discharge) is added to the output so that negative numbers represent a decrease in response rate relative to the normal (unstimulated) rate of activity. This is very similar to the solution used by image processing software, where both positive and negative numbers are represented by a continuum of grey levels with mid-grey representing zero. However, cortical neurons (next section) are silent when unstimulated (they have no spontaneous discharge) and so vision must adopt another solution to the problem. In fact, the trick is very simple and requires only that we should think of filter-elements in terms of pairs of receptive fields having lobes of opposite polarity. So, for example, the filter-element in Figure 9b might be represented by a neuron whose receptive field looks exactly like that in Figure 9b, plus a second neuron whose receptive field is the same shape but contains a central inhibitory region flanked by neighbouring excitatory regions (i.e. all of the lobes have opposite signs). Although both neurons can only respond positively, the second neuron is actually carrying information about the negative response of the filter. In other words, these neurons would be the ones that respond in the dark regions of Figures 6b and 6c. So long as the visual system is able to keep track of this, by knowing the polarity of its neurons’ receptive fields for example, filtered information will not be lost. This solution is also employed in the retina and LGN, providing something of a belt and braces approach at this level of the visual system.
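This opposite-polarity trick amounts to half-wave rectifying the filter output into two channels, from which the signed response can always be recovered; a minimal sketch:

```python
import numpy as np

# Signed responses of a band-pass filter at five image locations.
response = np.array([0.8, -0.3, 0.0, -1.2, 0.5])

# Two populations of neurons, each able to fire only positively:
on_channel = np.maximum(response, 0)    # receptive field as in Fig 9b
off_channel = np.maximum(-response, 0)  # lobes of opposite sign

# No information is lost: the signed response can be recovered.
print(np.array_equal(response, on_channel - off_channel))   # True
```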


------BOX 3 Effects of stimulating visual neurons with small spots of light

Consider a retinal ganglion cell’s receptive field containing an excitatory (positive) centre and an inhibitory (negative) surround, and stimuli made from light and dark spots placed on a mid-grey background. In the presence of the mid-grey background alone, the cell will respond at a background level known as spontaneous discharge. If a light spot is placed in the excitatory region, then the cell is excited and the response increases, whereas if a light spot is placed in the inhibitory surround, then the cell is antagonized and the response decreases. On the other hand, if a dark spot is placed in the excitatory centre, then this is equivalent to removing some light (the mid-grey level that the dark spot now covers) from the centre. Consequently, the cell will be less excited and the response of the cell will decrease. Finally, if a dark spot is placed in the surround, this is equivalent to removing some light from the surround. Light in the surround normally has an inhibitory effect, so its removal will cause the cell’s response to increase. These manipulations are summarized in Table 1, where, from left to right, ‘+’ indicates an excitatory region, an increase in light and an increase in response, and ‘-’ indicates an inhibitory region, a decrease in light and a decrease in response. Note that the contents of this table can be readily constructed using the rules of multiplication. For example, in the second row, moving from left to right, a positive number multiplied by a negative number gives a negative number.

Table 1

Region  Light  Response
+       +      +
+       -      -
-       +      -
-       -      +

------

7. Primary visual cortex

The story so far is this. The eye’s optics are a low-pass filter, but the CSF indicates that the MTF of the visual system is band-pass. This is consistent with the band-pass characteristics of the isotropic spatial filters that we find implemented by visual neurons in the retina and LGN. We will now learn that in primary visual cortex the filtering process becomes even more elaborate. For example, the CSF does not represent just a single band-pass filter with a broad bandwidth, but encompasses many spatial filters tuned for different spatial frequencies and different orientations.

7.1 Adaptation

The first evidence for us to consider is psychophysical and involves a procedure known as adaptation. In brief, prolonged visual stimulation is thought to fatigue visual neurons, reducing their sensitivity and making them less responsive for a short period of time after the adapting stimulus is removed. Because the post-adaptation fatigue causes the visual system to behave slightly differently from normal, it results in what are known as adaptation aftereffects. Blakemore and Campbell (1969) used this technique to learn about the details of filtering in human vision. They adapted to a high contrast sine-wave grating for several minutes and then measured the CSF. Their results were similar to those of Campbell and Robson, except that they found that a notch had been cut out of the CSF around the spatial frequency of the grating to which they had adapted. In other words, as shown in Figure 16, adaptation to a grating of 8 c.deg-1 increased the detection thresholds (reduced sensitivity) for gratings at and around this spatial frequency but had no effect on the detectability of much lower and much higher spatial frequencies. Adapting to different spatial frequencies resulted in notches being cut out of different locations in the CSF but always close to the spatial frequency to which the observer had adapted. The strong implication is that vision contains filters tuned for different spatial frequencies and that adaptation desensitizes the filters that are most responsive to the adapting stimulus. Thus, the idea emerged that the CSF represents the combined sensitivities of filters tuned to a range of different spatial frequencies (see Figure 17).

In subsequent experiments, spatial frequency has been kept constant and sensitivity has been measured as a function of orientation. At moderate spatial frequencies this produces a flat function: vision is no more or less sensitive to any one orientation than to any other. Adaptation aftereffects have then been measured in a similar way as above. In this case, adapting to a vertical grating reduces sensitivity to gratings with orientations close to vertical, but has no effect, for example, on the detectability of a horizontal grating. The implication here is that the spatial filters revealed by this technique are also tuned for orientation. This also puts a constraint on the locus of the adaptation aftereffect. Recall from above that we know from physiology and anatomy that up to the stage of the LGN, filters are isotropic, so these adaptation results cannot be reflecting the properties of the visual neurons encountered up to that stage. The oriented filters must be at a later stage, the earliest one possible being primary visual cortex. Finally, the breadths of the adaptation aftereffects measured in these kinds of experiments have been used to estimate the bandwidths of the underlying spatial filters. Estimates vary a little, but something close to 1.5 octaves and ±20° is probably about right.

Fig 16. Adaptation and the CSF. The curve shows the unadapted CSF. The open symbols show the shape of the CSF after adapting to a sine-wave grating with a spatial frequency of 8 c.deg-1. Replotted from Blakemore and Campbell (1969).

[Figure 17 graph: sensitivity plotted against spatial frequency; the broad CSF forms the envelope of the MTFs of multiple, more narrowly tuned filters.]

Fig 17. The CSF and the MTFs of multiple spatial filters.
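Figure 17’s proposal, that the CSF is the envelope of many more narrowly tuned filters, can be sketched as follows (the log-Gaussian tuning curves and all parameter values are illustrative assumptions):

```python
import numpy as np

freqs = np.logspace(-1, 2, 200)   # 0.1 to 100 c/deg

def filter_mtf(f, preferred, peak, bw_octaves=1.5):
    """Illustrative log-Gaussian spatial frequency tuning curve with
    the stated full-width at half-height (in octaves)."""
    sigma = bw_octaves / 2.355    # convert full-width to standard deviation
    return peak * np.exp(-0.5 * (np.log2(f / preferred) / sigma) ** 2)

# A bank of filters tuned to different spatial frequencies, with peak
# sensitivities chosen so that their envelope mimics a typical CSF.
preferred = [0.5, 1, 2, 4, 8, 16, 32]
peaks = [30, 80, 150, 120, 60, 15, 3]

bank = np.array([filter_mtf(freqs, p, k) for p, k in zip(preferred, peaks)])
csf = bank.max(axis=0)   # the CSF as the envelope of the filter MTFs

print(round(float(freqs[np.argmax(csf)]), 1))   # envelope peak, ~2 c/deg
```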


7.2 Summation

We first encountered the idea of a detection threshold in Section 5; it is the lowest stimulus contrast that can be reliably detected by an observer, typically defined as the 75% correct point on a psychometric function measured using 2IFC (see Figure 14). We have seen that filtering can attenuate the contrast of a test stimulus (e.g. at high spatial frequencies), making it difficult to detect, and that adaptation can desensitize spatial filters, making stimuli even more difficult to detect, but what is it that is actually doing the detecting? Central to the summation paradigm is the idea that a stimulus is detected when the filtered contrast of the stimulus exceeds the detection threshold of an individual filter-element. The idea is sketched in Figure 18 and shows a grating stimulus presented as input to a filtering stage. In the figure, just a single filter-element is shown, though we might conceive of different regions of the image being processed by additional filter-elements. Crucially, the output of the filter-element passes through a second stage. Here, the response must be greater than some criterion level (the detection threshold) before it can be passed on to subsequent processing stages. This sequence of a filter-element followed by a detection threshold is an example of what is sometimes referred to as a detecting mechanism. If no stimulus information is passed on because the response of the filter-element was less than the detection threshold, then the stimulus cannot be detected. Or put another way, for a stimulus to be detected, the detecting mechanism must respond. This is known as high threshold theory, and while sophisticated psychophysical experimentation has shown that it is wrong in detail, for our present purposes it is a very useful aid to thinking.

[Figure 18 diagram: a stimulus (input image) feeds stage 1 (filter-element), whose response feeds stage 2 (detection threshold) and then stage 3 (decision); if the output is greater than zero the stimulus is detected.]

Fig 18. A detection mechanism. If the filter-element has a similar orientation and spatial frequency to the grating stimulus then it will respond in proportion to the stimulus contrast (stage 1). However, an output nonlinearity (stage 2) means that this response is only passed on to later processing stages if it is greater than some internal level, often referred to as the detection threshold. If this happens the stimulus is detected (stage 3). The scheme is based on high-threshold theory, which is known to be wrong in detail but serves as a useful tool for thinking.

Let’s now consider the details of the summation paradigm. In this psychophysical technique, which was widely used by Graham and others in the 1970s and 1980s (see Graham, 1989), sensitivity is measured first for one stimulus, let’s call it component A, then for a second stimulus, let’s call it component B, and finally for a compound stimulus made from a combination of the two components. If the two components (typically small patches of sine-wave grating) both stimulate the same detecting mechanism then it should be easier to detect the compound stimulus than either of its components in isolation, because both components will help the detecting mechanism reach its detection threshold. If, on the other hand, the stimulus components stimulate completely different mechanisms, then summation could not occur within the detecting mechanism and sensitivity to the compound stimulus (when expressed in terms of its component contrasts) should be similar to that for its components.

For our present purposes there are two important results, the first of which provides further evidence for the view that vision uses oriented filters. Consider a case when component A and component B are vertical and horizontal gratings respectively, and have the same spatial frequency. First let’s suppose that the detection threshold for each of the two components when presented in isolation is a contrast of 1%. Now let’s halve the contrast of each of these components (to 0.5%) so that they are each below their own detection threshold. If we add these components together we will produce a compound stimulus with vertical and horizontal components and an overall contrast of 1% (because the contrasts of the two components add arithmetically). The isotropic filters in the retina and LGN would respond to both components in the compound stimulus because these filters are not fussy about the orientations of the stimulus components. Consequently, they would ‘see’ a stimulus with a contrast of 1%. Now, if observers are able to access visual information at this level of the nervous system (in other words, if these filter-elements are the detecting mechanisms) then the compound stimulus should be just as detectable as the two individual components because their contrasts are the same. Interestingly, however, this is not what happens. It turns out that the compound stimulus must have approximately double the contrast required for detecting either of the individual components (i.e. nearly 2%). In other words, the compound stimulus is not detected until at least one of its components reaches its own detection threshold. This means that the filter-elements in the retina and LGN cannot be thought of as detecting mechanisms. Instead, contrast detection must be mediated by orientation tuned spatial filters at a subsequent processing stage, presumably primary visual cortex. In this case, a vertical filter would respond to our component A but not component B, and a horizontal filter would respond to our component B but not component A. Because neither filter responds to both components, little or no summation of contrast is seen. Similar experiments have also been performed where the two components have the same orientation but different spatial frequencies. Once again, so long as the spatial frequencies differ by a factor of around 3 or more, little or no summation is found, suggesting detecting mechanisms tuned to different spatial frequencies.

A second important result relates to the implementation of visual spatial filtering, but first we need to consider the stochastic nature of threshold vision. Although not mentioned earlier, one conception of high threshold theory supposes that the location of the detection threshold fluctuates slightly from moment to moment. Equivalently (and probably closer to the truth5), one can conceive of the threshold being fixed and random noise (sometimes positive, sometimes negative) being added to the signal after the filter-element (stage 1) but before the nonlinearity (stage 2) in Figure 18. Either way, this explains why the psychometric function for detecting a stimulus is s-shaped. A constant low contrast stimulus will exceed the detection threshold on some trials but not on others, meaning that the percentage of trials on which a stimulus is detected can be somewhere between chance (50% in 2IFC) and 100% (see Figure 14). Now consider our working hypotheses: 1) vision implements spatial filtering by convolving the retinal image with a set of oriented spatial filter-elements (see Section 3); 2) spatial filter-elements serve as detecting mechanisms, allowing a stimulus to be detected when the activity in one of them exceeds its detection threshold (see Figure 18). If these ideas are correct, then increasing the size of a stimulus should increase the number of filter-elements that could usefully be used to detect the stimulus. For example, in Figure 18 only a single detecting mechanism is shown, but if filters have lots of filter-elements, each ‘looking’ at different regions of the image, then there would be lots of independent detecting mechanisms that could be used to detect the stimulus. It follows that detection thresholds should be lower for large grating stimuli (many cycles of a sine-wave) than small grating stimuli (few cycles of a sine-wave) because increasing the number of detecting mechanisms increases the probability that at least one of them will exceed its detection threshold. This is often referred to as probability summation.
It turns out that it is possible to make good quantitative predictions of how probability summation should improve sensitivity as stimulus size increases, and these predictions provide a remarkably good account of much of the experimental data on the issue (see Graham, 1989). Similar arguments have also been made for the experiments described above, where thresholds for a single component are compared with those for two (or more) components with different orientations or spatial frequencies. Even if the components do not stimulate the same detecting mechanism, there should still be a small improvement in performance (in terms of the threshold contrast of a single component) owing to probability summation. And this is exactly what is found experimentally: in the example involving components A and B above, the detection threshold for each of the components is reduced from 1% to about 0.85%, just as probability summation predicts.
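One common route to that quantitative prediction is Quick's pooling formula, in which detection probabilities from independent mechanisms combine through a Weibull psychometric function. The sketch below assumes a psychometric slope of β = 4, a typical empirical value but an assumption here; it predicts a component threshold of about 0.84%, close to the ~0.85% quoted above.

```python
# Quick pooling: with a Weibull psychometric function of slope beta, the
# detection probability for independent mechanisms combines as
# P = 1 - 2**(-sum_i (c / alpha_i)**beta).  For two equally sensitive
# mechanisms, the compound threshold per component is alpha * 2**(-1/beta).
beta = 4.0    # assumed psychometric slope; empirical estimates are ~3-4
alpha = 1.0   # single-component detection threshold (% contrast)

print(f"predicted component threshold: {alpha * 2 ** (-1 / beta):.2f}%")  # ~0.84%
```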

7.3 Single-cell recordings
The psychophysical evidence outlined in the previous subsections is very much consistent with evidence from single-cell recordings. For example, unlike those in the retina and the LGN, the receptive fields of cortical cells are oriented and contain multiple parallel lobes (Hubel and Wiesel, 1959). Just like those in the retina and LGN, however, the receptive fields of cortical cells tile the visual field. This is exactly what we would expect if cortical visual neurons were the filter-elements for band-pass, oriented filters (see Figure 9c). Visual neurons have also been characterized in the Fourier domain. For example, Maffei & Fiorentini (1973) used sine-wave gratings as stimuli and found that the range of spatial frequencies that cells responded to was narrower in the LGN than in the retina, and narrower still in the cortex. In other words, spatial frequency bandwidths become narrower as we move up the primary visual pathway from retina to cortex. In general, the estimates of spatial frequency and orientation bandwidths of cortical cells are consistent with the conclusions from psychophysics.

If it is appropriate to think of vision as containing spatial filters, then we know from Section 4 that the Fourier transform of the receptive field (filter-element or point-spread function) should predict the MTF. In a particularly elegant type of experiment, several different groups of physiologists have investigated this by measuring the response characteristics of visual neurons in both the spatial domain and the Fourier domain. In the first case, receptive fields were measured using small spots and lines as stimuli, and in the second case, the neuron’s MTF was measured using sine-wave gratings as stimuli. It turns out that the data gathered in one domain provide good predictions of the data gathered in the other domain through the Fourier transform, just as the theory predicts.
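The logic of those experiments can be illustrated with an idealized receptive field. In the sketch below, the Gabor profile, patch size and preferred frequency are all assumptions, not the physiologists' actual measurements; the point is that the magnitude of the receptive field's Fourier transform yields the predicted MTF, which is band-pass and peaks at the preferred spatial frequency.

```python
import numpy as np

n = 256                      # patch width in pixels (hypothetical)
x, y = np.meshgrid(np.arange(n) - n / 2, np.arange(n) - n / 2)

# An idealized oriented receptive field: a vertical Gabor preferring
# 8 cycles per image, under a circular Gaussian envelope.
sf = 8 / n                   # preferred spatial frequency, cycles per pixel
rf = np.exp(-(x**2 + y**2) / (2 * 12.0**2)) * np.cos(2 * np.pi * sf * x)

# The magnitude of the receptive field's Fourier transform is the predicted MTF.
mtf = np.abs(np.fft.fftshift(np.fft.fft2(rf)))
freqs = np.fft.fftshift(np.fft.fftfreq(n))      # cycles per pixel
_, fx = np.unravel_index(np.argmax(mtf), mtf.shape)
print(f"predicted MTF peaks at {abs(freqs[fx]) * n:.0f} cycles/image (expected 8)")
```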

7.4 The story so far
The picture that has emerged is of a sequence of filtering stages starting at the eye and becoming increasingly refined as we move towards visual cortex. At the eye’s optics, filtering is low-pass and isotropic. At the retina and LGN, filtering becomes band-pass but remains isotropic. At the level of the cortex, spatial frequency bandwidths become narrower and filtering is oriented. At this stage a detection threshold is also imposed (this is probably a simplification), allowing us to think of the filter-elements as detecting mechanisms.

Fig 19. Filters in vision. a) Covering the Fourier domain: the pass-bands of the filters. b) Covering the spatial domain: multiple overlapping filter-elements at each spatial location. Note that this figure ignores the dimension of spatial phase (see Section 2.2).

Within the cortex, visual neurons represent numerous spatial filters with different preferred orientations and spatial frequencies. Figure 19a is an idealization of how these filters cover Fourier space. Note that in this figure, Fourier space is represented on polar axes and the radial spatial frequency axis is linear. Each filter is represented by a circle that describes its half-sensitivity contour (see Figure 8). In this example, orientation bandwidths and spatial frequency bandwidths (in octaves) are the same for all filters, though in vision this is probably not strictly true. Evidence from both psychophysics and neurophysiology (not reviewed here) suggests that both orientation and spatial frequency bandwidths vary quite considerably but, on average, become slightly narrower with an increase in the filter’s preferred spatial frequency. In other words, the filters that process fine-grain visual information are also the ones that are most narrowly tuned.

A summary of vision’s cortical filters in the spatial domain is shown in Figure 19b. Here the axes represent Cartesian space across the retina in linear units. At each location on the retina there are filter-elements selective for a large number of different spatial frequencies and orientations, with overlapping receptive fields. In this idealization, the figure supposes that the same set of filter-elements exists at each retinal location, which is what is required if each filter were to cover the entire retina. In fact, this is something of a simplification, and much evidence (not reviewed here) indicates that filter-elements for high spatial frequencies exist only at the central region of the retina, where most of our attention for fine-grain spatial information (e.g. printed text) is directed. The consequence of this is that finely detailed spatial information is not represented in early vision outside the centre of the visual field. This represents a substantial saving in neural hardware and neural computation.
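A toy version of the filter bank idealized in Figure 19 can be written down directly. In this sketch every parameter (filter count, octave spacing, 30-deg orientation steps, envelope size) is an illustrative assumption; the point is simply that scaling the envelope with the period keeps octave bandwidth constant across spatial frequency.

```python
import numpy as np

def gabor(n, sf, ori_deg, sigma):
    # One filter-element: a cosine carrier at spatial frequency sf
    # (cycles/pixel) and orientation ori_deg, under a Gaussian envelope.
    x, y = np.meshgrid(np.arange(n) - n / 2, np.arange(n) - n / 2)
    theta = np.deg2rad(ori_deg)
    u = x * np.cos(theta) + y * np.sin(theta)
    return np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * sf * u)

n = 128
# Preferred spatial frequencies one octave apart; the envelope scales with
# the period so that bandwidth in octaves is constant across the bank.
bank = {(sf_cpi, ori): gabor(n, sf_cpi / n, ori, sigma=0.4 * n / sf_cpi)
        for sf_cpi in (4, 8, 16, 32)        # cycles per image
        for ori in range(0, 180, 30)}       # preferred orientations (deg)
print(f"{len(bank)} filter-elements centred on a single retinal location")
```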

8 Why use spatial filters in vision?
In Sections 3 to 5 we considered psychophysical and neurophysiological evidence for multiple spatial filters in biological vision. But why should vision work this way? What do all those filters achieve? One obvious answer is that different visual information is conveyed at different spatial scales. For example, consider the cat in Figure 20. At one level of analysis we might conclude that the contour orientation midway along the cat’s back is approximately horizontal, and that this indicates the boundary of the cat’s body. This would be useful to know about if the visual task were to identify the location and perhaps the identity of the object. On the other hand, around the same general region the orientations of the raised hairs are close to vertical. This too is useful to know about, particularly if the task were to judge the emotional state of the cat! Conveniently, filters at different spatial scales would best convey these very different orientations. For example, vertical filters tuned to mid and low spatial frequencies would blur together the individual representations of the hairs and fail to see this detail. However, this blurring would be useful in a low-frequency horizontal filter, which could be used to detect the contour of the cat’s back. At much higher spatial frequencies, the cat’s hairs would no longer blur together because multiple hairs would not fall within common lobes of the spatial filter-elements. So, at higher spatial frequencies, vertical filters would provide the useful information about the cat’s hairs.
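A toy simulation makes the point. The sketch below uses a synthetic stand-in for the cat (all sizes, frequencies and filter parameters are assumptions): an image containing a coarse horizontal boundary carrying fine vertical striations. The low-frequency horizontal filter recovers the boundary, while the high-frequency vertical filter recovers the striations.

```python
import numpy as np

def gabor_filter(img, sf_cpi, ori_deg, sigma):
    # Convolve img with an even-symmetric Gabor via the Fourier domain.
    m = img.shape[0]
    x, y = np.meshgrid(np.arange(m) - m / 2, np.arange(m) - m / 2)
    theta = np.deg2rad(ori_deg)
    u = x * np.cos(theta) + y * np.sin(theta)
    g = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * sf_cpi / m * u)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(np.fft.ifftshift(g))))

# A coarse horizontal boundary (the 'back') with fine vertical
# striations (the 'hairs') superimposed on its lower half.
n = 128
x, y = np.meshgrid(np.arange(n), np.arange(n))
image = (y > n // 2).astype(float) \
        + 0.2 * np.sin(2 * np.pi * 16 * x / n) * (y > n // 2)

coarse = gabor_filter(image, sf_cpi=2, ori_deg=90, sigma=24)  # low SF, horizontal
fine = gabor_filter(image, sf_cpi=16, ori_deg=0, sigma=3)     # high SF, vertical
print(f"coarse horizontal filter, peak response: {np.abs(coarse).max():.2f}")
print(f"fine vertical filter, peak response:     {np.abs(fine).max():.2f}")
```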

Fig 20. A cat. Vertical high spatial frequency filter-elements would be needed to represent the hairs on the cat’s back, and a horizontal low spatial frequency filter-element could represent the horizontal contour of the cat’s back.

Other points are illustrated in the band-pass filtered images in Figures 6b & c. Here, the boundaries between light and dark typically correspond to object boundaries in the original image, illustrating how spatial filters can perform edge detection. (Of course, as we have just described, different edges are found at different spatial scales.) Note that this is a particularly efficient way of encoding images because it emphasizes the interesting information about the boundaries of objects and patterns but results in no neural response (mid grey level) for less informative regions where image luminance is constant. In Figure 6b, edges are detected at all orientations, but in Figure 6c, only those close to vertical are highlighted. Clearly, different information is contained at different orientations, and filters tuned to different orientations are best placed to represent it. For example, in Figure 6c, the vertical filter preserves information about the boundary of the nose, but conveys little information about the mouth, for which a horizontal filter would be much better suited (see Figure 13). This could help the visual system to encode and recognize faces because filter-elements with different orientations would be activated at different positions within the image.

We have seen that different filters can be used to pick out different features of the image. More precise information about local image structure might then be extracted from a detailed analysis of the oriented filter-elements at a particular region of the image. We consider the details of how this might be achieved next.
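Before moving on, the edge-detection point can be made concrete with a one-dimensional sketch (the parameter values are arbitrary assumptions): a balanced centre-surround weighting function gives zero output wherever luminance is constant, and a localized response at a boundary.

```python
import numpy as np

# A balanced difference-of-Gaussians (DoG) weighting function: the centre
# and surround integrals cancel, so the output is zero for any uniform field.
x = np.arange(-20.0, 20.1, 1.0)
dog = 0.5 * np.exp(-0.5 * (x / 2) ** 2) - 0.25 * np.exp(-0.5 * (x / 4) ** 2)

luminance = np.concatenate([np.zeros(100), np.ones(100)])   # a step edge
response = np.convolve(luminance, dog, mode="same")

print(f"response in a uniform region: {response[50]:+.3f}")   # ~0: 'mid grey'
print(f"peak response near the edge:  {np.abs(response[80:120]).max():+.3f}")
```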

-20- Spatial Filters/Meese/Nov 2000

8.1 Population coding
The first thing to realize is that the response of a single visual neuron of the type considered in this chapter cannot provide unambiguous information about the stimulus that evoked it. The following argument is idealized, but makes our point. Consider a neuron with a receptive field like that shown in Figure 9c. It might respond at, say, 30 pulses per second to a grating with a contrast of 5% at its preferred orientation and spatial frequency. However, as we know from Figure 8, it would also respond to a grating at some other orientation, so long as that orientation was within its pass-band. So, let’s suppose that we change the orientation of the grating to 20 degrees. For our filter-element, sensitivity will fall to half its previous value, so the response rate will drop to 15 pulses per second. However, if we were now to increase the contrast by a factor of two, this would bring the response rate back to 30 pulses per second. So, a response rate of 30 pulses per second could indicate a low-contrast stimulus at the preferred orientation, a high-contrast stimulus oriented 20 deg from the preferred orientation, or any number of appropriate combinations of contrast and orientation. Clearly, the response of a single filter-element cannot be the way that vision encodes stimulus orientation.

The solution to the problem is to consider the distribution of activity across a range of filter-elements that each look at the same region of the image but through receptive fields with different orientations. Figure 21 shows the pattern of activity that we would get in response to a vertical grating at two different contrasts. Although the firing rate of each individual neuron changes with stimulus contrast, the shape of the distribution across the population of neurons does not. The peak of the distribution, for example, indicates the orientation of the stimulus. This is known as a population code, and a psychophysical phenomenon known as the tilt aftereffect (TAE) suggests that this is in fact the way that orientation is encoded in human vision.
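The ambiguity, and its resolution by the population, can be captured in a few lines. The sketch below assumes Gaussian tuning curves whose gain scales with contrast, with the bandwidth chosen so that sensitivity halves 20 deg from the preferred orientation, matching the example above.

```python
import numpy as np

preferred = np.arange(-90, 91, 10)   # preferred orientations of the units (deg)

def population_response(stim_ori, contrast, bandwidth=17.0):
    # Gaussian tuning; the bandwidth is chosen so sensitivity falls to ~half
    # at 20 deg from the preferred orientation, as in the example above.
    return contrast * np.exp(-0.5 * ((preferred - stim_ori) / bandwidth) ** 2)

weak = population_response(stim_ori=0, contrast=5)       # 5% vertical grating
strong = population_response(stim_ori=20, contrast=10)   # 10% grating at 20 deg

# The unit preferring vertical fires at about the same rate in both cases...
print(weak[preferred == 0], strong[preferred == 0])      # ambiguous single unit
# ...but the population peak still signals the correct orientation.
print(preferred[np.argmax(weak)], preferred[np.argmax(strong)])
```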

8.1.1 The tilt aftereffect
If an observer adapts to a slightly tilted grating (oriented at, say, 15 deg from vertical), a subsequently presented vertical test grating appears tilted in a direction opposite from that of the adapter. (The perceived orientation of the test stimulus is assessed by the observer matching it with a comparison grating placed in some unadapted region of the visual field.) As the perceived orientation of the grating is different from the physical orientation of the test image on the retina, the visual system must have made an orientation coding error. This phenomenon is known as the tilt aftereffect and has been explained as follows.

Fig 21. A population code for orientation. Responses of a population of filter-elements (plotted against the preferred orientation of their receptive fields) to a vertical grating at high and low contrast.

Figure 21 shows the distribution of activity that would be expected in a population code for a vertical grating. The peak of the distribution is for a vertical filter-element, and so this is the orientation that is seen. Now suppose that the observer adapts to a grating oriented at 15 deg. The population response to the adapter is shown in Figure 22a; the peak response is for a filter-element oriented at 15 deg, and during adaptation this is the orientation that is seen. Now, recall from Section 5.1 that adaptation fatigues visual neurons according to their level of excitation by the adapter. This leads to the distribution of fatigue illustrated in Figure 22b. The adapter is now removed and a vertical grating is presented. Normally this stimulus would produce the response distribution shown in Figure 22b, but because of the fatigue, the response distribution is distorted. This is shown in Figure 22c and was calculated simply by summing the expected response distribution with the fatigue distribution (Figure 22b). Because the peak of this distribution is oriented anticlockwise from vertical, the stimulus appears tilted anticlockwise, even though the retinal image is vertical.
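This account amounts to a simple computation, sketched below (the Gaussian tuning and the fatigue gain of 0.5 are illustrative assumptions): adding a negative fatigue distribution centred on the adapter to the expected response to the vertical test shifts the population peak several degrees anticlockwise.

```python
import numpy as np

preferred = np.arange(-90.0, 90.1, 1.0)   # preferred orientations (deg)

def tuning(stim_ori, bandwidth=17.0):
    return np.exp(-0.5 * ((preferred - stim_ori) / bandwidth) ** 2)

fatigue = -0.5 * tuning(15.0)   # negative dip centred on the adapter (Fig 22b);
                                # the gain of 0.5 is an arbitrary assumption
test = tuning(0.0)              # expected response to the vertical test
after = test + fatigue          # distorted distribution (Fig 22c)

print(f"peak before adaptation: {preferred[np.argmax(test)]:+.0f} deg")
print(f"peak after adaptation:  {preferred[np.argmax(after)]:+.0f} deg")  # anticlockwise
```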


Fig 22. The tilt aftereffect. a) Clockwise tilted adapting stimulus and population response. b) Vertical test stimulus, fatigue distribution and expected response to the test stimulus. c) Perceived orientation and actual response distribution calculated from the sum of the two distributions in (b).

8.2 Off-channel looking
In the previous subsection we learned that perceived orientation is determined by the distribution of activity across a population of filter-elements tuned to different orientations but looking at the same region of the image. But what is the fidelity of this code? This has been assessed psychophysically using orientation discrimination. Typically, an observer might be presented with two high-contrast gratings of slightly different orientations, and the observer’s task is to indicate which of the two gratings is oriented most clockwise. In this type of experiment, the smallest change in orientation that observers can reliably detect is called the orientation discrimination threshold. Experimenters have found this threshold to be about 0.5 deg, which is one twelfth of the angle swept by the second hand on your watch when it ticks once. This is impressive performance, and all the more so when we recall that the orientation bandwidths of vision’s filters are as broad as ±20 deg (i.e. 40 deg). We see the implication of this in Figure 23. Figure 23a shows the distribution of activity across a population of oriented filter-elements for two gratings, one oriented at -1° and the other at +1°, and Figure 23b shows the absolute difference in these responses. Clearly, the most responsive filter-elements (i.e. those tuned to orientations close to vertical) could not be used to perform this discrimination because the small changes in stimulus orientation result in negligible changes in their output. So how does vision manage to do so well? A possible answer is revealed by noticing that some filter-elements do change their responses quite markedly, but that they have preferred orientations that are rather remote from the orientations of the test gratings; in Figure 23b, these are the filter-elements preferring stimulus orientations around -20 deg and +20 deg.
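The sketch below reproduces the logic of Figure 23 under an assumed Gaussian tuning curve: the absolute difference between the population responses to the two gratings is zero for the unit tuned to vertical and greatest for units tuned roughly one bandwidth away (about ±17 deg with the width assumed here, in line with the ±20 deg flanks described above).

```python
import numpy as np

preferred = np.arange(-40.0, 40.1, 1.0)   # preferred orientations (deg)

def response(stim_ori, bandwidth=17.0):
    return np.exp(-0.5 * ((preferred - stim_ori) / bandwidth) ** 2)

# Absolute difference between the responses to +1 deg and -1 deg gratings,
# as in Figure 23b.
diff = np.abs(response(+1.0) - response(-1.0))
print(f"least informative unit: tuned to {preferred[np.argmin(diff)]:+.0f} deg")
print(f"most informative units: tuned about ±{abs(preferred[np.argmax(diff)]):.0f} deg away")
```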


Fig 23. Off-channel looking. a) Population responses for two gratings whose orientations differ by 2°. This orientation difference is four times greater than the threshold measured experimentally, but serves to illustrate the point. b) Absolute difference between the population responses for the two stimuli in (a).

But does vision actually take advantage of this? Regan and Beverley (1985) developed another cunning use for the adaptation paradigm to address this question. They reasoned that if vision uses filter-elements with preferred orientations that are remote from the orientations being discriminated, then task performance should be disrupted most severely by adapting to those remote orientations. This is exactly what they found: orientation discrimination thresholds were raised most when the difference between the orientation of their test gratings and the orientation of their adapter was about 20°. This result provides a nice illustration of how vision can work quite counter-intuitively. Without the observer even having to think about it, the visual system is smart enough to know that when making fine judgements about changes in orientation (as might happen when threading a needle, for example), the important filters for the task (sometimes called channels) are those tuned to orientations remote from those that are most active. For this reason, the strategy is sometimes referred to as off-channel looking.


Selected References

Blakemore, C & Campbell, F. W. (1969) On the existence of neurons in the human visual system selectively sensitive to the orientation and size of retinal images. Journal of Physiology, 203, 237-260.

Campbell, F. W. & Robson, J. G. (1968) Application of Fourier analysis to the visibility of gratings. Journal of Physiology, 197, 551-566.

Graham, N. (1989) Visual Pattern Analyzers. Oxford: Oxford University Press.

Hubel, D. H. & Wiesel, T. N. (1959) Receptive fields of single neurones in the cat’s striate cortex. Journal of Physiology, 148, 574-591.

Kuffler, S. W. (1953) Discharge patterns and functional organization of mammalian retina. Journal of Neurophysiology, 16, 37-68.

Maffei, L. & Fiorentini, A. (1973) The visual cortex as a spatial frequency analyser. Vision Research, 13, 1255-1267.

Marr, D. (1982) Vision. New York: Freeman.

Meese, T. S. (2002) Spatial vision. In D. Roberts (Ed.), Signals and Perception: The Fundamentals of Human Sensation. New York: Palgrave Macmillan, pp 171-183.

Regan, D. & Beverley, K. I. (1985) Postadaptation orientation discrimination. Journal of the Optical Society of America A, 2, 147-155.
