Bayes and the Simplicity Principle in Perception

Rutgers Center for Cognitive Science Technical Report #80, 2004

Jacob Feldman
Dept. of Psychology, Center for Cognitive Science, Rutgers University

Author note: I am grateful to Rajesh Kasturirangan, Whitman Richards, Manish Singh, Matthew Stone, and Josh Tenenbaum for helpful discussions and comments. This research was supported in part by NSF SBR-9875175. Please direct correspondence to Jacob Feldman, Dept. of Psychology & Center for Cognitive Science, Rutgers University - New Brunswick, Busch Campus, Piscataway, New Jersey, 08854, or by e-mail at [email protected].

Abstract: Two governing principles of perceptual inference, the Likelihood principle and the Simplicity principle, have been much discussed by perceptual theorists, and often placed in opposition. Recently, Chater (1996) has argued that the two principles are actually consistent in that their decisions tend to agree asymptotically. This article seeks to relate in a more mathematically direct way what is arguably the most plausible version of a likelihood theory, Bayesian inference, with a recently-proposed generalized formal minimum principle (the maximum-depth rule of Minimal Model theory). Assuming qualitative information on the part of the observer, and equal prior probabilities among all competing hypotheses, maximizing the Bayesian posterior probability turns out to be mathematically equivalent to choosing the maximum-depth interpretation from a hierarchical space of possible scene interpretations. That is, the maximum-depth rule is analytically equivalent to Bayes with a particular choice of priors. Thus this version of the Simplicity principle, as well as “full-blown Bayes,” each constitute distinct points in a well-defined continuum of possible perceptual decision rules. In a very literal mathematical sense, the observer’s position in this continuum—and, consequently, the perceptual decision rule it employs—reflect the nature of its tacit assumptions about the environment.

Simplicity vs. Likelihood principles in Perception

A recurrent theme in the study of human visual perception is the idea that the visual system selects the simplest interpretation consistent with the visual image—sometimes referred to as the Simplicity principle, sometimes by the Gestalt term Prägnanz, and sometimes as the minimum principle. The principle has taken many forms, from a relatively vague preference for the maximization of “regularity” (Kanizsa, 1979), to more concrete systems in which the image is described in some fixed coding language, and the interpretation whose code is of minimal length is selected (Hochberg & McAlister, 1953; Leeuwenberg, 1971; Buffart, Leeuwenberg, & Restle, 1981). Many phenomena of visual perception seem to be explained at least in part by appeals to the minimum principle, in one form or another. Yet at the same time many authors have been troubled over the motivation or justification of the principle (Hatfield & Epstein, 1985), paralleling an analogous debate about the rationale of Occam’s razor in the selection of scientific theories (Quine, 1965; Sober, 1975).

Another well-known principle of perceptual inference, sometimes held up in opposition to the Simplicity principle, is the Likelihood principle: choose the interpretation most likely to be true. The rationale behind this idea seems relatively self-evident, in that it is clearly desirable (say, from an evolutionary point of view) for the organism to achieve veridical percepts of the world. Yet the mere statement of the principle begs the question of how the visual system actually determines the relative likelihood of various candidate interpretations, and hence it has not been clear exactly how the Likelihood principle might translate into concrete computational procedures.
Historically, the minimum principle and the likelihood principle have usually been regarded as competitors, or at least roughly contradictory (Perkins, 1976; Hatfield & Epstein, 1985; Leeuwenberg & Boselie, 1988; van der Helm, 2000). Recently, however, Chater (1996), using mathematical arguments paralleling those from Minimum Description Length (MDL) theory (Rissanen, 1989), has shown that the two principles can be regarded as equivalent. Under very general assumptions, the visual interpretation whose description is of minimum length is, in fact, the one that is most likely to be correct in an objective sense. This remarkable demonstration combines the most appealing aspects of both principles, giving the minimum principle a clear rationale (namely, veridicality) while suggesting a criterion by which the most likely interpretation can be identified (namely, minimum length).

Nevertheless, Chater’s demonstration is pitched at a very high level of abstraction. The argument connecting probability to complexity is based on the notion of Kolmogorov complexity (the length of the shortest computer program that could generate a given string). The beauty of the mathematics surrounding Kolmogorov complexity is that it doesn’t depend on details of the coding language used—all so-called universal codes give approximately the same complexity value. But the flip side of this same universality is that while one can make general statements about the Kolmogorov complexity of a given string, one never knows the specific value—the length of the actual shortest program; in fact it is uncomputable in general (see Schöning & Pruim, 1998 for a simple proof of this). The agreement between the probability of an interpretation and its complexity (like all statements about Kolmogorov complexity) is necessarily asymptotic: they tend to match in the limit as the number of stimulus elements grows infinitely large. But for any given stimulus and any given coding language, the disagreement can be arbitrarily large, and thus potentially overshadow the agreement.[1] The exact discrepancy for realistic stimuli depends on the coding language, meaning that different coding languages may in practice achieve the most veridical conclusion with extremely different degrees of success.
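To make this language-dependence concrete, here is a toy sketch (my own illustration, not an example from the paper): two hypothetical coding languages for binary stimulus strings, a literal code that spells out every element and a simple run-length code that exploits repetition, can assign very different code lengths to the same stimulus. Which description counts as shortest for a particular, finite stimulus therefore depends on the language in which the descriptions are written.

```python
# Toy illustration (not from the paper): code length depends on the coding language.
# Two hypothetical "coding languages" for binary stimulus strings:
#   literal     -- one symbol per stimulus element
#   run-length  -- each maximal run encoded as (bit, run length)
from itertools import groupby

def literal_code_length(stimulus: str) -> int:
    """Length of a verbatim code: one symbol per element."""
    return len(stimulus)

def run_length_code_length(stimulus: str) -> int:
    """Length of a simple run-length code: one symbol for the bit plus the digits of the run count."""
    runs = [(bit, len(list(group))) for bit, group in groupby(stimulus)]
    return sum(1 + len(str(count)) for _, count in runs)

if __name__ == "__main__":
    regular = "0" * 500                     # highly regular stimulus: run-length code is far shorter
    irregular = "0110100110010110" * 2      # irregular stimulus: run-length code is longer than literal
    for s in (regular, irregular):
        print(literal_code_length(s), run_length_code_length(s))
```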
Hence Chater’s argument, while persuasive in an abstract sense, leaves open narrower—but crucial—questions such as the nature of the actual coding language used by the visual system, and the exact form of the associated minimization rule. The current paper seeks to demonstrate a stronger and more specific connection between a particular minimization rule recently proposed (the maximum-depth rule of Minimal Model theory; see Feldman, 1997c, 1997b, 1999, 2003) and probabilistically optimal inference, i.e. Bayesian theory. The conditions invoked in this rule are less general than in Chater’s formulation, but are important in a wide variety of perceptual situations, especially those involving perceptual organization, grouping, and the inference of three-dimensionality. Hence, while consistent with Chater’s general conclusion, the connections between minimum principles and Bayes described below give a more concrete account of why perceptually realistic simplicity minimization tends to yield the true state of the world.

Bayesian formulation

Bayesian theory is a particularly attractive formulation of the likelihood principle in perception (for examples see Bülthoff & Yuille, 1991; Feldman, 2001; Knill & Richards, 1996; Landy, Maloney, Johnston, & Young, 1995). Its attractiveness stems largely from the fact that it provides provably optimal inferences under conditions of uncertainty (see Jaynes, 1957/1988 for an especially lucid demonstration of …).

Bayesian theory selects the hypothesis with the largest posterior probability, i.e. the conditional probability of the hypothesis given the data. In the context of visual perception, the data is the visual image I and the hypotheses are the various scene interpretations among which the observer will choose. In what follows I will assume an image I chosen from an image space \mathcal{I}, and a finite set[2] S = {S_1, S_2, ..., S_n} of distinct candidate interpretations, i.e. categories of distal scenes. Each interpretation S_i has an associated likelihood function p(I|S_i) indicating how likely a given possible image is under that hypothesis; and each scene occurs with a certain scalar prior probability p(S_i). The priors must sum to unity (∑_i p(S_i) = 1), and each likelihood function integrates to unity over \mathcal{I} (∫_\mathcal{I} p(I|S_i) dI = 1).

By Bayes’ rule, given image I, the posterior probability that the interpretation S_i is correct is

    p(S_i \mid I) = \frac{p(S_i)\, p(I \mid S_i)}{\sum_j p(S_j)\, p(I \mid S_j)}.    (1)

Because the denominator is the same for all interpretations, the winning interpretation will be the S_i that maximizes the numerator

    p(S_i)\, p(I \mid S_i),    (2)

i.e. the product of the prior and the likelihood. (Notice this depends both on the likelihood of the given image under S_i, as well as the prior probability p(S_i) that S_i was true before the image was observed.) Another definition that will be important below is the support of an interpretation S_i, defined as the region of \mathcal{I} where S_i’s likelihood is non-zero[3] and denoted σ(S_i),

    \sigma(S_i) = \{ I \in \mathcal{I} \mid p(I \mid S_i) > 0 \}.    (3)

The support of a scene model S_i is the set of images that could have been produced by it.

A fully Bayesian observer would select an interpretation by first computing the product (2) for all S_i, and then selecting the largest.[4] Bayesian theory has at times been criticized for requiring this global maximization, which may involve a great deal of computation.

Footnotes

[1] Technically, the agreement between two complexity measures is bounded by a constant that does not depend on the stimulus. But the size of the constant depends on the amount of information needed to specify the design of a complete Turing machine, enabling one universal Turing machine to simulate another. Because Turing machines, or Turing-equivalent computing systems such as human brains, can be arbitrarily large and complex, the constant difference between coding lengths for two machines can be arbitrarily large.
[2] We will not generally require that S be finite, but it is more notationally convenient to assume so.
[3] The support of S can alternatively be defined as the region where p(I|S) > ε for some arbitrarily small number ε; this makes no difference in the ensuing theory.
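As a concrete sketch of the Bayesian selection rule in Eqs. (1)–(3) above (my own illustration with hypothetical toy numbers, not code from the paper), the fragment below represents a small discretized image space, gives each candidate interpretation S_i a prior p(S_i) and a likelihood table p(I|S_i), computes the posterior of Eq. (1), picks the interpretation that maximizes the numerator of Eq. (2), and enumerates the support σ(S_i) of Eq. (3).

```python
# Minimal sketch of a fully Bayesian observer over a finite, discretized image
# space. All names and numbers are hypothetical toy values for illustration.
from typing import Dict, List

# Hypothetical image space {"img_a", "img_b", "img_c"} and two candidate
# interpretations S1, S2, each with a prior p(S_i) and likelihoods p(I | S_i).
PRIORS: Dict[str, float] = {"S1": 0.5, "S2": 0.5}            # sums to 1
LIKELIHOODS: Dict[str, Dict[str, float]] = {
    "S1": {"img_a": 0.8, "img_b": 0.2, "img_c": 0.0},        # each row sums to 1
    "S2": {"img_a": 0.3, "img_b": 0.3, "img_c": 0.4},
}

def posterior(image: str) -> Dict[str, float]:
    """Posterior p(S_i | I) of Eq. (1): prior times likelihood, normalized."""
    unnormalized = {s: PRIORS[s] * LIKELIHOODS[s][image] for s in PRIORS}
    z = sum(unnormalized.values())                           # denominator of Eq. (1)
    return {s: v / z for s, v in unnormalized.items()}

def map_interpretation(image: str) -> str:
    """MAP choice: the S_i maximizing the numerator p(S_i) p(I | S_i) of Eq. (2)."""
    return max(PRIORS, key=lambda s: PRIORS[s] * LIKELIHOODS[s][image])

def support(interpretation: str) -> List[str]:
    """Support sigma(S_i) of Eq. (3): images with non-zero likelihood under S_i."""
    return [img for img, p in LIKELIHOODS[interpretation].items() if p > 0.0]

if __name__ == "__main__":
    print(posterior("img_a"))           # {'S1': 0.727..., 'S2': 0.272...}
    print(map_interpretation("img_a"))  # 'S1'
    print(support("S1"))                # ['img_a', 'img_b']: img_c lies outside sigma(S1)
```

Even in this toy setting the observer scores every candidate interpretation before choosing, which is exactly the kind of global maximization noted above as a point of criticism.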
