A Tutorial Essay on Spatial Filtering and Spatial Vision
Spatial Filters/Meese/Nov 2000

Tim S Meese
Neurosciences, School of Life and Health Sciences, Aston University

Feb 2009: minor stylistic modifications and technical corrections
March 2009: referencing tidied up
To follow: final round-up sections and detailed references added.

1. Introduction

To the layperson, vision is all about 'seeing': we open our eyes and effortlessly we see the world 'out there'. We can recognize objects and people, we can interact with them in sensible and useful ways, and we can navigate our way about the environment, rarely bumping into things. Of course, the only information that we are really able to act upon is that which is encoded as neural activity by our nervous system. These visual codes, the essence of all of our perceptions, are constructed from information arriving from each of two directions. Image data arrive 'bottom-up' from our sensory apparatus, and knowledge-based rules and inferences arrive 'top-down' from memory. These two routes to visual perception achieve very different jobs. The first route is well suited to providing a descriptive account of the retinal image, while the second allows these descriptions to be elaborated and interpreted. This chapter focuses primarily on one aspect of the first of these routes: the processing and encoding of spatial information in the two-dimensional retinal image.

One of the major success stories in understanding the human brain has been the exploration of the bottom-up processes used in vision, sometimes referred to as early vision. But how do vision scientists get in to explore it? Our visual world is a very private one, not available for external scrutiny by vision scientists in any obvious way. Fortunately, such scrutiny is not necessary and, according to some, it is perhaps not even helpful to think of vision this way.
The point is made by David Marr's consideration of the lowly housefly (Marr, 1982). Although we cannot know what it is like to be a housefly, we do not have to suppose that a housefly has an explicit visual representation of the world around it in order to study its visual system. Perhaps the housefly just computes a few simple but immediately useful visually guided parameters. For instance, when the computed 'rate of image expansion' reaches a certain value, this means that a surface is approaching and the 'undercarriage down, prepare to land' signal should be triggered. On this view, it makes little sense to ask how the visual world might 'look' to a housefly. However, by carefully manipulating artificial visual stimuli that mimic the fly's normal visual environment, it is possible to investigate its visual apparatus by observing its behavioural responses (e.g. whether it prepares to land).

A similar philosophy has been applied to understanding early vision in humans. The central tenet is that the visual system is a signal processor. It is treated as a black box, with a two-dimensional spatial signal as input and a 'neural image' (an organized distribution of neural activity) as output (see Figure 1), upon which behavioural decisions can be made. To learn about signal processing in the early visual system, we need to know about the neural image. Visual psychophysics attempts to do this in the laboratory (see Figure 2) by assessing behavioural responses to visual stimuli (e.g. pressing one of two buttons), usually made by trained observers. With this technique the system is necessarily treated as a whole, though as we shall see, careful thought and experimentation can hint at the nuts and bolts of the visual system's inner workings. In neurophysiology, single- and multiple-cell recordings (see Figure 3) give us more direct access to the neural response, but allow us to look only at isolated fragments of the system at any one time.
Fig 1. The black box approach to vision: an input image (grey levels) enters the black box of early vision, and an output image (neural activity) emerges.

Together, neurophysiology and psychophysics have converged on the view that early spatial vision consists of a set of spatial filters. The evidence for this view, and the reasons why vision might work this way, form the basis of the second part of the chapter, but first we need some suitable building blocks. We begin by exploring some formal concepts of spatial filtering.

Fig 2. A psychophysics laboratory.

Fig 3. Direct recordings of visual neurons.

2. Filtering in the Fourier Domain

In the most basic terms, a filter is a device that receives something as input and passes on some of that input as output. For example, a sieve might receive various grades of gravel as input, retaining the largest stones and passing on only the small chippings as output. In image processing, the input and output are both images, but what aspect of the image might be selectively passed on or filtered out? One very useful way of approaching this is from the Fourier domain. Essentially, a Fourier component can be thought of as a sine-wave grating (see footnote 1) of some particular orientation, spatial frequency, amplitude and spatial phase. It turns out, perhaps astonishingly, that all images can be thought of as a set of Fourier components that, when added together, recreate the original image. An example of an image and the amplitude spectrum of its Fourier transform (FT) are shown in Figure 4. In Figure 4b, the grey levels indicate the amplitudes of Fourier components (sine-wave gratings) within a map called Fourier space. This space is most conveniently expressed in terms of polar coordinates, where the argument (angle) indicates the orientation of a Fourier component and the modulus (distance from the origin) indicates its spatial frequency.

Footnote 1: A sine-wave grating is a stimulus in which luminance is modulated sinusoidally, producing an image that looks like a set of blurred stripes. The experimenter can control the width of the stripes (spatial frequency), their orientation, their spatial phase (how they line up with the centre of the display screen) and their contrast (the light levels of the light and dark bars relative to mean luminance). Sine-wave gratings and related stimuli are widely used in vision science (partly because they are the fundamental building blocks of all images), and examples can be seen by looking ahead to Figures 12 and 13.

Unfortunately, the gentle gradients of the grey levels in Figure 4b make it difficult to appreciate its structure, so a second method of plotting the Fourier transform is shown in Figure 4c. Here, quantization and colour enhancement reveal contours representing Fourier components of similar amplitudes. In this case, those with the greatest amplitudes (coloured red and yellow) are those closest to the origin (the centre of the image). The point here is that images contain Fourier energy distributed across Fourier space (i.e. they have many Fourier components with different orientations and spatial frequencies) and that spatial filters are selective for different regions of Fourier space, passing some Fourier components and stopping (filtering out) others, just like the sieve. We will consider why it might be a good idea to do this later on (in Section 6), but for now, let us just consider the different types of filter that we might construct.

Fig 4. Image and its Fourier transform. a) The original image. b) Amplitude spectrum of the Fourier transform. c) Colour-enhanced version of (b). The colours represent high to low amplitudes as follows: red, orange, yellow, green, blue, purple, black.
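The grating parameters in the footnote and the amplitude spectrum of Figure 4b translate directly into a few lines of code. The sketch below is an illustration of ours, not part of the original chapter; all function names and parameter values are our own choices. It builds a sine-wave grating with NumPy and computes its centred amplitude spectrum, then checks that the spectrum contains a peak whose distance from the origin (the modulus, in polar terms) equals the grating's spatial frequency.

```python
import numpy as np

def make_grating(size=128, cycles=8, orientation_deg=0.0, phase=0.0, contrast=1.0):
    """Sine-wave grating: luminance modulated sinusoidally about a mean of 0.5."""
    y, x = np.mgrid[0:size, 0:size] / size              # coordinates in [0, 1)
    theta = np.deg2rad(orientation_deg)
    # The orientation rotates the modulation axis; 'cycles' sets spatial frequency
    # in cycles per image; 'phase' shifts the bars; 'contrast' scales the modulation.
    ramp = x * np.cos(theta) + y * np.sin(theta)
    return 0.5 + 0.5 * contrast * np.sin(2 * np.pi * cycles * ramp + phase)

def amplitude_spectrum(img):
    """Centred amplitude spectrum: |FT|, with the zero-frequency origin shifted
    to the middle of the map, as in conventional plots like Figure 4b."""
    return np.abs(np.fft.fftshift(np.fft.fft2(img - img.mean())))

grating = make_grating(cycles=8, orientation_deg=0.0)
spec = amplitude_spectrum(grating)

# A grating is a single Fourier component (plus its mirror image), so the
# spectrum holds a pair of peaks at a distance of 'cycles' from the centre.
peak = np.unravel_index(np.argmax(spec), spec.shape)
centre = (grating.shape[0] // 2, grating.shape[1] // 2)
radius = np.hypot(peak[0] - centre[0], peak[1] - centre[1])
print(round(radius))   # → 8
```

With the origin shifted to the centre, the polar description in the text falls out directly: the peak's modulus gives the grating's spatial frequency, and its argument gives the grating's orientation.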
A filter that is selective for the circular region of Fourier space shown in Figure 5a (known as the filter's pass-band) would pass only low spatial frequencies (i.e. those close to the origin) but would not care about their orientations. Such a filter is known as a low-pass isotropic filter, and an example of how it transforms an image is shown in Figure 6a. (For now, you can ignore the small insets in the upper left corner of Figure 6; these will be explained later, in Section 3.) The output image contains only the Fourier components from the input image that are within the filter's pass-band. In other words, it contains low spatial frequencies but no high spatial frequencies at any orientation; we say the latter have been filtered out. Other interesting filters are those that pass only a band of spatial frequencies at any orientation (band-pass isotropic filters; Figure 5b) and bands of spatial frequencies at only specific orientations (oriented band-pass filters; Figure 5c)2. Note that for all three of the filters in Figure 5, the Fourier components for which the filter is most responsive are those shaded red. So, for the filter in Figure 5b for example, the red ring represents a single spatial frequency at any orientation. The results of applying these filters to the image in Figure 4a are shown in Figure 6. (See ahead to Figure 13 for another example of oriented band-pass filtering.)

With the aid of a computer and appropriate image-processing software, performing filtering operations in the Fourier domain is very straightforward. All we have to do is multiply the Fourier transform of the input image by the Fourier representation of the filter (sometimes called the filter's modulation transfer function [MTF]); the details are described in Box 1. The three MTFs in Figure 5 have been multiplied by the Fourier transform of the image in Figure 4 to produce the Fourier transforms (Figure 7) of three different output images.
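The multiply-in-the-Fourier-domain recipe can be made concrete in a few lines. The sketch below is our own minimal illustration (it is not the chapter's Box 1, and all function names are assumptions): it builds ideal, hard-edged MTFs of the three kinds in Figure 5, selecting regions of Fourier space by modulus (spatial frequency) and, for the oriented filter, also by argument (orientation), and applies each by pointwise multiplication of the image's Fourier transform. Real filters would soften these pass-bands with smooth roll-offs.

```python
import numpy as np

def isotropic_mtf(shape, low, high):
    """Pass-band: spatial frequencies with modulus in [low, high) cycles/image at
    any orientation -- a disc (low-pass) or an annulus (band-pass) in Fourier space."""
    cy, cx = shape[0] // 2, shape[1] // 2
    y, x = np.ogrid[0:shape[0], 0:shape[1]]
    radius = np.hypot(y - cy, x - cx)                  # modulus: spatial frequency
    return ((radius >= low) & (radius < high)).astype(float)

def oriented_mtf(shape, low, high, ori_deg, half_bw_deg):
    """As above, but also selective for orientation (the argument in polar terms)."""
    cy, cx = shape[0] // 2, shape[1] // 2
    y, x = np.ogrid[0:shape[0], 0:shape[1]]
    radius = np.hypot(y - cy, x - cx)
    angle = np.rad2deg(np.arctan2(y - cy, x - cx)) % 180.0
    d = np.abs(angle - ori_deg % 180.0)
    d = np.minimum(d, 180.0 - d)                       # wrap-around angular distance
    return ((radius >= low) & (radius < high) & (d < half_bw_deg)).astype(float)

def filter_image(img, mtf):
    """Multiply the image's FT by the filter's MTF, then transform back."""
    ft = np.fft.fftshift(np.fft.fft2(img))
    return np.real(np.fft.ifft2(np.fft.ifftshift(ft * mtf)))

rng = np.random.default_rng(0)
img = rng.random((128, 128))                           # stand-in input image
low_pass  = filter_image(img, isotropic_mtf(img.shape, 0, 8))          # cf. Fig 5a
band_pass = filter_image(img, isotropic_mtf(img.shape, 8, 16))         # cf. Fig 5b
oriented  = filter_image(img, oriented_mtf(img.shape, 8, 16, 90, 15))  # cf. Fig 5c
```

Because filtering is linear, complementary pass-bands sum back to the original image, which makes a handy sanity check for any MTF implementation.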