Principles of Image Representation in Visual Cortex
In: The Visual Neurosciences, L.M. Chalupa, J.S. Werner, Eds. MIT Press, 2003: 1603-1615.

Bruno A. Olshausen
Center for Neuroscience, UC Davis
[email protected]

1 Introduction

The visual cortex is responsible for most of our conscious perception of the visual world, yet we remain largely ignorant of the principles underlying its function despite progress on many fronts of neuroscience. The principal reason for this is not a lack of data, but rather the absence of a solid theoretical framework for motivating experiments and interpreting findings. The situation may be likened to trying to understand how birds fly without knowledge of the principles of aerodynamics: no amount of experimentation or measurements made on the bird itself will reveal the secret. The key experiment—measuring air pressure above and below the wing as air is passed over it—would not seem obvious were it not for a theory suggesting why the pressures might be different.

But neuroscience can not simply turn to mathematics or engineering for a set of principles that will elucidate the function of the cortex. Indeed, engineers and mathematicians themselves have had little success in emulating even the most elementary aspects of intelligence or perceptual capabilities, despite much effort devoted to the problem over the past 40 years. The lack of progress here is especially striking considering the fact that the past two decades alone have seen a 1000-fold increase in computer power (in terms of computer speed and memory capacity), while the actual “intelligence” of computers has improved only moderately by comparison.

The problem: pattern analysis

Although seldom recognized by either side, both neuroscientists and engineers are faced with a common problem—the problem of pattern analysis, or how to extract structure contained in complex data. Neuroscientists are interested in understanding how the cortex extracts certain properties of the visual environment—surfaces, objects, textures, motion, etc.—from the data stream coming from the retina. Similarly, engineers are interested in designing algorithms capable of extracting structure contained in images or sound—for example, to identify and label parts within the body from medical imaging data. These problems are at their core one and the same, and progress in one domain will likely lead to new insights in the other.

The key difficulty faced by both sides is that the core principles of pattern analysis are not well understood. No amount of experimentation or technological tinkering alone can overcome this obstacle. Rather, it demands that we devote our efforts to advancing new theories of pattern analysis, and to directing experimental efforts toward testing these theories.

In recent years, a theoretical framework for how pattern analysis is done by the visual cortex has begun to emerge. The theory has its roots in ideas proposed more than 40 years ago by Attneave and Barlow, and it has been made more concrete in recent years through a combination of efforts in engineering, mathematics, and computational neuroscience. The essential idea is that the visual cortex contains a probabilistic model of images, and that the activities of neurons represent images in terms of this model. Rather than focussing on what features of “the stimulus” are represented by neurons, the emphasis of this approach is on discovering a good featural description of images of the natural environment, using probabilistic models, and then relating this description to the response properties of visual neurons.

In this chapter, I will focus on recent work that has attempted to understand image representation in area V1 in terms of a probabilistic model of natural scenes. Section 2 will first provide an overview of the probabilistic approach and its relation to theories of redundancy reduction and sparse coding. Section 3 will then describe how this framework has been used to model the structure of natural images, and section 4 will discuss the relation between these models and the response properties of V1 neurons. Finally, in section 5 I will discuss some of the experimental implications of this framework, alternative theories, and the prospects for extending this approach to higher cortical areas.

2 Probabilistic models

The job of visual perception is to infer properties of the environment from data coming from the sensory receptors. What makes this such a difficult problem is that we are trying to recover information about the three-dimensional world from a two-dimensional sensor array. This process is loaded with uncertainty due to the fact that the light intensity arriving at any point on the retina arises from a combination of lighting properties, surface reflectance properties, and surface shape (Adelson & Pentland 1996). There is no unique solution for determining these properties of the environment from photoreceptor activities; rather, some environments provide a more probable explanation of the data than others based on our knowledge of how the world is structured and how images are formed. Visual perception is thus essentially a problem of probabilistic inference.

In order to do probabilistic inference on images, two things are needed:

1. A model for how a given state of the environment (E) gives rise to a particular state of activity on the receptors (the observable data, D). This model essentially describes the process of image formation and can be characterized probabilistically (to account for uncertainties such as noise) using the conditional distribution P(D|E).

2. A model for the prior probability of the state of the environment. This expresses our knowledge of how the world is structured—which properties of the environment are more probable than others—and is characterized by the distribution P(E).

From these two quantities, one can make inferences about the environment by computing the posterior distribution, P(E|D), which specifies the relative probabilities of different states of the environment given the data. It is computed by combining P(E) together with P(D|E) according to Bayes' rule:

    P(E|D) ∝ P(D|E) P(E)    (1)

This simple equation provides a mathematical formulation of the essential problem faced by the cortex. By itself, it does not provide all the answers to how the cortex works. But it does provide a guiding principle from which we can begin filling in the details.
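To make equation (1) concrete, the following minimal sketch (an illustration added here, not taken from the chapter; the candidate states, prior, and likelihood values are invented) computes the posterior over a small discrete set of environmental states, recovering the normalization constant hidden by the proportionality sign:

```python
import numpy as np

# Toy illustration of Bayes' rule (Eq. 1): P(E|D) is proportional to P(D|E) P(E).
# Three hypothetical "states of the environment" for one observed image patch;
# the numbers below are invented for illustration only.
states = ["shadow edge", "reflectance edge", "uniform surface"]
prior = np.array([0.30, 0.20, 0.50])       # P(E): how probable each state is a priori
likelihood = np.array([0.60, 0.55, 0.05])  # P(D|E): how well each state explains the data D

posterior = prior * likelihood             # unnormalized posterior (right-hand side of Eq. 1)
posterior /= posterior.sum()               # normalize so the posterior sums to one

for state, p in zip(states, posterior):
    print(f"P(E = {state!r} | D) = {p:.3f}")
```

In this toy setting the posterior simply re-weights the prior by how well each candidate state explains the data, which is all that equation (1) asserts.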
Redundancy reduction

The general idea of “perception as probabilistic inference” is by no means new, and in fact it goes back at least to Helmholtz (1867/1962). Attneave (1954) later pointed out that there could be a formal relationship between the statistical properties of images and certain aspects of visual perception. The notion was then put into concrete mathematical and neurobiological terms by Barlow (1961, 1989), who proposed a self-organizing strategy for sensory nervous systems based on the principle of redundancy reduction—i.e., the idea that neurons should encode information in such a way as to minimize statistical dependencies among them. Barlow reasoned that a statistically independent representation of the environment would make it possible to store information about prior probabilities, since the joint probability distribution of a particular state x could be easily calculated from the product of probabilities of each component x_i: P(x) = ∏_i P(x_i). It also has the advantage of making efficient use of neural resources in transmitting information, since it does not duplicate information in different neurons.

The first strides in quantitatively testing the theory of redundancy reduction came from the work of Simon Laughlin and M.V. Srinivasan. They measured both the histograms and spatial correlations of image pixels in the natural visual environment of flies, and then used this knowledge to make quantitative predictions about the response properties of neurons in early stages of the visual system (Laughlin 1981; Srinivasan et al., 1982). They showed that the contrast response function of bipolar cells in the fly's eye performs histogram equalization (so that all output values are equally likely), and that lateral inhibition among these neurons serves to decorrelate their responses for natural scenes, confirming two predictions of the redundancy reduction hypothesis.
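As an illustration of the histogram-equalization prediction (a sketch using synthetic contrast statistics, not the chapter's data or analysis), the snippet below builds the response function as the empirical cumulative distribution of input contrasts; passing each contrast through this function yields output values that occur equally often:

```python
import numpy as np

# Sketch of the histogram-equalization idea: the response function that uses all
# output levels equally often is the cumulative distribution function (CDF) of
# the input contrasts.
rng = np.random.default_rng(0)
contrasts = rng.laplace(loc=0.0, scale=0.2, size=100_000)  # stand-in for natural contrast samples

# Empirical CDF evaluated at each sample (its normalized rank).
order = np.argsort(contrasts)
response = np.empty_like(contrasts)
response[order] = np.arange(1, contrasts.size + 1) / contrasts.size

# The "responses" are now approximately uniform: each decile holds ~10% of samples.
counts, _ = np.histogram(response, bins=10, range=(0.0, 1.0))
print(counts / contrasts.size)
```

Any other monotonic nonlinearity would over-use some output levels and under-use others, wasting part of a neuron's limited response range; matching the response function to the input distribution avoids this.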
Another advance was made ten years later by Atick (1992) and van Hateren (1992; 1993), who formulated a theory of coding in the retina based on whitening the power spectrum of natural images in space and time. Since it had been shown by Field (1987) that natural scenes possess a characteristic 1/f^2 spatial power spectrum, they reasoned that the optimal decorrelating filter should attempt to whiten the power spectrum up to the point where noise power is low relative to the signal (since a whitening filter applied beyond that point would mainly amplify noise).

In the cortex, however, the representation is expanded: the number of neurons far exceeds the number of inputs (approximately 25:1 in cat area 17—inferred from Beaulieu & Colonnier (1983) and Peters & Yilmaz (1993)). If the bandwidth per axon is about the same, then the unavoidable conclusion is that redundancy is being increased in the cortex, since the total amount of information can not increase (Field, 1994). The expansion here is especially striking given the evidence for wiring length being minimized in many parts of the nervous system (Cherniak, 1995; Koulakov & Chklovskii 2001). So what is being gained by spending extra neural resources in this way?

First, it must be recognized that the real goal of sensory representation is to model the redundancy in images, not necessarily to reduce it (Barlow, 2001). What we really want is a meaningful representation—something that captures the causes of images, or what's “out there” in the environment.