The Simplicity Principle in Perception and Cognition Jacob Feldman*

Overview

The simplicity principle, traditionally referred to as Occam’s razor, is the idea that simpler explanations of observations should be preferred to more complex ones. In recent decades the principle has been clarified via the incorporation of modern notions of computation and probability, allowing a more precise understanding of how exactly complexity minimization facilitates inference. The simplicity principle has found many applications in modern cognitive science, in contexts as diverse as perception, categorization, reasoning, and neuroscience. In all these areas, the common idea is that the mind seeks the simplest available interpretation of observations, or, more precisely, that it balances a bias toward simplicity with a somewhat opposed constraint to choose models consistent with perceptual or cognitive observations. This brief tutorial surveys some of the uses of the simplicity principle across cognitive science, emphasizing how complexity minimization in a number of forms has been incorporated into probabilistic models of inference.

© 2016 Wiley Periodicals, Inc. WIREs Cognitive Science (wires.wiley.com/cogsci), Volume 7, September/October 2016.

How to cite this article: WIREs Cogn Sci 2016, 7:330–340. doi: 10.1002/wcs.1406

*Correspondence to: [email protected]
Department of Psychology, Center for Cognitive Science, Rutgers University, New Brunswick, NJ, USA
Conflict of interest: The author has declared no conflicts of interest for this article.

OCCAM’S RAZOR

The principle of simplicity or parsimony (broadly, the idea that simpler explanations of observations should be preferred to more complex ones) is conventionally attributed to William of Occam, after whom it is traditionally referred to as Occam’s razor (note a). Since then philosophers of science have adopted a bias toward simpler explanations as a foundational principle of inference, guiding the selection of hypotheses whenever multiple hypotheses are consistent with data, as is nearly always the case.

But why should simpler theories be preferred? Practicing scientists have generally assumed it is because they are actually more likely to be correct. But it has never been clear exactly why this should be so. Hume’s principle of “Uniformity of Nature” suggests that simpler theories are preferable because they make a good match for a highly regular, lawful world. Conversely, some philosophers have assumed that the bias toward simplicity is an essentially aesthetic preference, akin to the elegance or beauty that mathematicians prize in theorems, conveying no particular claim to correctness (see Ref 2). Simpler theories were seen as more manageable, more comprehensible, and more testable (Ref 3), but not necessarily more truthful. In the 20th century, some authors (see Refs 4,5) began to argue that simpler theories were, in fact, more likely to be true, but until recently the precise connection between simplicity and truth remained, at best, extremely unclear. To understand the connection, we need at the very least a more precise definition of simplicity. Such a definition only arrived in the last few decades.

MATHEMATICAL DEFINITIONS OF SIMPLICITY AND COMPLEXITY

Historically it has generally been assumed that simplicity and complexity were inherently subjective notions, impervious to clear, rigorous, or universal definitions. A theory that is simple in one method of expression may appear complex in another, implying that simplicity lies “in the eye of the beholder.” A notorious example was the competition between Julian Schwinger’s elaborate formalization of quantum electrodynamics and Richard Feynman’s apparently simpler account (based on “funny little diagrams”), which, notwithstanding their apparent difference in complexity, were eventually shown by Freeman Dyson to be equivalent (Ref 6). Cases like this appeared to imply that complexity depends on the chosen method of expression, and thus that no quantification of complexity could be universal.

Kolmogorov Complexity

This all changed in the 1960s with the introduction of a principled and convincing mathematical definition of complexity now known as Kolmogorov complexity or algorithmic complexity. Introduced in slightly different forms by Solomonoff (Ref 7), Kolmogorov (Ref 8), and Chaitin (Ref 9), the main idea is that the complexity of a string of characters reflects its degree of incompressibility, as measured by the length of a computer program required to faithfully express the string (see Ref 10). Simple strings are those that can be expressed by brief computer programs, and complex strings are those that cannot.

The idea is usually formalized by considering a string S (a sequence of symbols), and then considering the length (in symbols) of the shortest computer program capable of generating it. For example, the loop (in pseudocode) “for i = 1 to 1000: print ‘1’” prints a string of 1000 characters, but the program itself contains fewer than 30 characters; this string is highly compressible. By contrast a “typical” random string of 1000 characters (e.g., “39383827262226…”) can’t be compressed in this way, although it can be expressed by a program that is itself about 1000 characters long (e.g., “Print ‘9486390348473969683…’,” which has 1008 characters). More generally, a string that contains regularities or patterns, of any form that can be expressed in the computer language, can be faithfully reproduced by a short program that takes advantage of these patterns, while a relatively complex or “random” string cannot be similarly compressed. This measure of complexity is inherently capped at approximately the length of the original string, because any string, no matter how irregular, can be reproduced exactly simply by quoting it verbatim as in the example above. In this view, simplicity is essentially compressibility.

Critically, Kolmogorov complexity is universal in the sense that it does not depend “very much” on the computer language in which the program is written. The caveat “very much” has a very precise meaning here, which derives from the fact that any computer language can be translated into any other computer language. Turing (Ref 11) had demonstrated the existence of computers (now referred to as universal Turing machines) that can, in a well-defined sense, carry out any concretely specifiable algorithm. In modern terminology, we can think of them as computer languages that are general enough to express any computable function, including, critically, to “simulate” other computer languages.

Assume a string S that can be expressed by computer language L_1 in K_1(S) steps, meaning that its Kolmogorov complexity is at least as low as K_1(S). Assume that language L_1 can itself be expressed in some other language L_2 in a finite number |L_2| of steps; in modern terms, |L_2| is the length of an interpreter for language L_1 written in language L_2. It follows fairly immediately that string S can be expressed in language L_2 in |L_2| + K_1(S) steps, meaning that the complexity of S in language L_2 (i.e., K_2(S)) is at least as low as |L_2| + K_1(S), because that is how many steps it takes to express L_1 in L_2 and then express S in L_1. In other words, the expression of the string in the first language can be translated into the second language, with a general cap on the number of steps required by the process of translation. The “translation component” |L_2| may be very large, if the computer languages are very different, but it is finite and, critically, it does not depend on the length of the string. This means that as strings get longer and longer, the translation component of their complexity matters less and less, and in this sense, their complexity is asymptotically independent of the programming language. For this reason, the Kolmogorov complexity K(S) of a string S is usually thought of as a universal measure of its inherent complexity or randomness.

Note that the actual value of K(S) is uncomputable (note b). For long strings, it can be approximated by effective string-compression algorithms such as Lempel-Ziv (see Ref 10), implemented in the common utility gzip (Ref 14), meaning that the Kolmogorov complexity of a long string S is approximately the length, in characters, of the gzipped version of S. For shorter strings, such approximations are in principle less reliable, though recent work by Gauvrit et al. (Ref 15) has for the first time provided practical techniques for estimating the Kolmogorov complexity of short strings, opening an intriguing research avenue for evaluating the role of complexity in psychological models.

Information-Theoretic Description Length

Another important approach to the quantification of complexity was initiated by Shannon (Ref 16). Shannon showed that in a set of messages m_1, m_2, … which occur with probability p_1, p_2, …, each message m_i conveys information given by -log p_i, which quantifies the degree of “surprise” or unexpectedness entailed by the message. Consequently, if one seeks to convey a set of messages in the fewest symbols possible, one should adopt a coding language in which each message m_i is assigned a code of length approximately -log p_i symbols. Such a procedure will minimize the expected total code length; that is, it achieves the most compressed expression possible. As a result, the quantity -log p_i is sometimes referred to as the Description Length (DL). The DL of a message is in effect a measure of complexity, because it quantifies how many symbols are required to express m_i in a maximally compressed code.

[…] well. This perfectly encapsulates Einstein’s (perhaps apocryphal) quip that our theories should be “as simple as possible, but no simpler.”

Bayesian Inference

The intimate relationship between Occam’s razor and rational probabilistic inference was probably first pointed out by Jeffreys (Ref 4), one of the principal developers of the modern conception of Bayesian inference. Jeffreys argued in some detail that the simplest interpretation was indeed the most likely one, and in particular advocated adopting priors that […]
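
As a rough illustration of the two complexity measures described in this section, the following Python sketch (not part of the article) approximates Kolmogorov complexity by compressed length and computes Shannon description lengths. It rests on several assumptions: the helper names `compressed_length` and `description_length` are hypothetical, the standard-library `zlib` module stands in for gzip (both use the DEFLATE format, which combines Lempel-Ziv matching with Huffman coding), the "random" string is only pseudorandom with a fixed seed, and logarithms are taken base 2 so that description lengths are in bits.

```python
import math
import random
import zlib


def compressed_length(s: str) -> int:
    """Bytes needed by DEFLATE at maximum compression level.

    This is only a crude, computable stand-in for K(S), which is itself
    uncomputable; it behaves sensibly only for fairly long strings.
    """
    return len(zlib.compress(s.encode("ascii"), 9))


def description_length(p: float) -> float:
    """Shannon description length -log2(p), in bits (base 2 is an assumption)."""
    return -math.log2(p)


# A highly regular string: 1000 repetitions of the character '1'.
regular = "1" * 1000

# A pseudorandom string of 1000 decimal digits (fixed seed for repeatability).
rng = random.Random(0)
irregular = "".join(rng.choice("0123456789") for _ in range(1000))

print("regular  :", compressed_length(regular), "bytes")
print("irregular:", compressed_length(irregular), "bytes")

# Description lengths for four messages with probabilities 1/2, 1/4, 1/8, 1/8.
probabilities = [0.5, 0.25, 0.125, 0.125]
lengths = [description_length(p) for p in probabilities]
expected = sum(p * dl for p, dl in zip(probabilities, lengths))
print("description lengths (bits):", lengths)
print("expected code length (bits):", expected)
```

Under these assumptions the regular string should compress to a small fraction of the size of the irregular one, and rarer messages receive longer codes. The irregular string still compresses to fewer than 1000 bytes because each digit carries only about 3.3 bits of information, while an ASCII character occupies 8 bits.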
