
Synthese (2018) 195:2367–2386 https://doi.org/10.1007/s11229-017-1373-4

S.I.: PREDICTIVE BRAINS

Predictive perceptual systems

Nico Orlandi1

Received: 5 July 2016 / Accepted: 11 March 2017 / Published online: 20 March 2017 © Springer Science+Business Media Dordrecht 2017

Abstract This article attempts to clarify the commitments of a predictive coding approach to perception. After summarizing predictive coding theory, the article addresses two questions. Is a predictive coding perceptual system also a Bayesian system? Is it a Kantian system? The article shows that the answer to these questions is negative.

Keywords Predictive coding · Bayes · Perception · Kant · Expectations · Error · Free-energy

What does a predictive coding approach to perception (or PCP) commit us to? This article suggests that predictive coding is primarily a theory of information transmission. It is a theory that, in cognitive science, is geared toward explaining how perceptual and sensory parts of the brain transmit information in a way that is efficient for the guidance of action. The article questions some additional commitments that the theory is assumed to have. The focus is on two such commitments. First, although PCP and Bayesian accounts of the mind and brain are usually presented hand in hand, I suggest that predictive coding and Bayesianism do not necessarily go together. A predictive coding perceptual system need not also be a Bayesian system. Second, some proponents of predictive coding interpret the theory as putting a Kantian and intellectualist gloss on mental processes (Gładziejewski 2016; Hohwy 2013). I explain, instead, that predictive coding is fully compatible with embedded and ecological accounts of cognition. Why focus on these two themes? Doing so promises to deliver a better understanding of PCP. But the themes are also interesting from a historical point of view. PCP was initially developed in cognitive science in the field of active vision, and it was thought to be good news for proponents of ecological and embodied understandings of perception (Rao and Ballard 1999). It is a curious development that it would be taken up by proponents of more intellectualist accounts. This article questions this type of uptake. In Sect. 1, I present what a predictive coding framework is, and describe how it may apply to perception. In Sect. 2, I argue that predictive perceptual systems are not necessarily Bayesian systems. In Sect. 3, I discuss the relationship between predictive coding and intellectualism, showing that the posits of predictive coding are in line with an ecological way of understanding the perceptual process.

This article develops themes from Orlandi (2014). I thank Jona Vance and Martin Thomson-Jones for assistance in understanding Bayesian decision theory. Needless to say, any mistake present in the paper is my own. I also thank two anonymous referees selected by Synthese for constructive criticism on an earlier version of this article.

Nico Orlandi [email protected]

1 UC Santa Cruz, Santa Cruz, CA, USA

1 Predictive coding and predictive perception

Suppose that two people make a plan to meet in a certain location at a certain time. Once they make the plan, they decide that they will communicate prior to the meeting only if a problem arises. In doing this, the protagonists employ an efficient technique for transmitting information. They assume the meeting will take place, and communicate only if a change is needed. In other words, they communicate only if there is error in the original plan. This style of communication is the idea behind predictive or efficient coding. Predictive coding was originally a label for an information-transmission strategy employed, for example, in television (Harrison 1952; Oliver 1952). Television requires the continuous transmission of images. One of the problems faced by engineers in the image transmission field is the overload of the transmission medium. Transmission channels have limited capacity. To avoid overload, a strategy is to have a code that transmits information in a way that gets rid of redundancy. Consider transmitting a "still" image. We can either have a signal that transmits the still twice, risking overload, or we can have a signal that transmits the still once, and then communicates that there is no change in the image from the previous transmission (Harrison 1952, p. 766). Similarly, in transmitting a changing picture, we could have a signal that carries information about every aspect of the changing picture, or a signal that only carries information about what changes from the initial image. This type of signal does not encode regions of the changed picture that are highly predictable given the past. This would be a predictive coding signal. A predictive code informs about error or what is unexpected, rather than what is present (Clark 2013; Shi and Sun 1999). Error, expectation and prediction are thought to be three key concepts of this approach.1
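The difference between the two kinds of signal can be made concrete with a minimal sketch. The code below is purely illustrative—the frames, pixel values, and function names are invented rather than drawn from Harrison's or Oliver's actual schemes—but it shows the basic idea: the encoder transmits only the error relative to a prediction (here, simply the previous frame), so an unchanged "still" costs almost nothing to send.

```python
# Minimal sketch of predictive (delta) coding for a stream of frames.
# The "prediction" is simply the previous frame; only the error is transmitted.
# Frames and pixel values are invented for illustration.

def encode(frames):
    """Yield, for each frame, its error relative to the predicted (previous) frame."""
    prediction = [0] * len(frames[0])            # no expectation before the first frame
    for frame in frames:
        error = [actual - expected for actual, expected in zip(frame, prediction)]
        yield error                              # a repeated "still" yields an all-zero error
        prediction = frame                       # the expectation for the next frame

def decode(errors):
    """Reconstruct the frames by adding each error back onto the running prediction."""
    prediction = None
    for error in errors:
        if prediction is None:
            prediction = error
        else:
            prediction = [p + e for p, e in zip(prediction, error)]
        yield prediction

frames = [[5, 5, 7], [5, 5, 7], [5, 6, 7]]       # the second frame is an unchanged "still"
encoded = list(encode(frames))
print(encoded[1])                                # [0, 0, 0] -- only the error is sent
assert list(decode(encoded)) == frames           # the receiver recovers the original frames
```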

1 In this first sketch of the theory, I treat expectations and predictions as separate elements. Depending on the specific example, however, there may be no difference between the two (see the discussion of predictive coding in lateral and temporal inhibition below). In Sect. 3, following the insight of an anonymous referee, I put this distinction into question.

When an image is transmitted, an expectation is created. The expectation is that the subsequent image will be roughly similar to the previous image. This expectation may in turn create a prediction of what the following pixel values in the incoming signal should be. The prediction is then either confirmed or disconfirmed by the incoming signal, which serves as an error signal. Starting in the 1980s, the predictive coding idea was applied to understanding the response behavior of neurons in preliminary stages of sensory processing—for example, in retinas and LGNs of flies, cats and monkeys (Barlow 1981; Dan et al. 1996; Srinivasan et al. 1982). Neurons in early sensory areas display both spatial lateral inhibition and temporal inhibition. Inhibition means that neurons can exhibit behavior that is suppressive, rather than excitatory. They can inhibit neighboring neurons, and they can self-inhibit. An example of spatial lateral inhibition is "center-surround" antagonism, where neurons in a given retinal layer are organized in such a way that the center is excitatory while the surround is inhibitory. Because of this arrangement, the behavior of excitatory neurons is not simply a function of the amount of light that hits them, but also a function of the inhibitory activity of nearby neurons. Similarly, in temporal inhibition (or self-inhibition) neurons have a "phasic" response profile. They may have an initial excitatory response, followed by a neutral or inhibitory response. A constant intensity of light may evoke virtually no sustained response other than the response due to the initial exposure. A flash of light, by contrast, may evoke an initial excitatory response, followed by an inhibitory response (Srinivasan et al. 1982, p. 438). Several theories have been put forward to explain lateral and temporal inhibition. One hypothesis is that inhibition removes redundancy by reducing the response range of neurons, thus also enhancing their efficiency in transmitting information (Barlow 1961). Predictive coding is a framework used to spell out this idea. Lateral inhibition can be seen as the realization of a predictive code. In center-surround antagonism, surrounding neurons that have been exposed to natural images generate a statistical estimate of the intensity expected at a particular point in the middle. By subtracting this best estimate (through inhibition) from the signal actually entering at the center, the center neurons communicate if there is something unexpected. In this way, redundant information is suppressed and neurons instantiate an efficient code. In temporal inhibition, a constant intensity light generates an initial response, but then virtually no response. Neurons "predict" the intensity coming in based on how they have already been excited. They subsequently stop responding unless there is a change in intensity—an error. Predictive coding is then introduced as a theory of information transmission that makes sense of the response behavior of some sensory neurons in vertebrates. Why make it into a more general idea about brains and cognitive activity? This is not always made explicit, but a number of factors likely contributed. Initially, the theory was used to understand the feedback connections between layered neurons (Rao and Ballard 1999). As described so far, lateral inhibition involves the horizontal interplay of nearby neuronal elements. Inhibitory interactions, however, can also occur "top-down".
The visual cortex is organized in layers of neurons that often have reciprocal connections. Neurons at higher levels (for instance in V2) may send predictions down to neurons in lower levels (V1). The feedback connections serve as (inhibitory) predictions. The feedforward connections from V1 to V2 serve as error signals. Since not just the visual cortex, but the cerebral cortex in general is hierarchically structured, we can see how this theory can serve as a story of how the whole brain processes information (Rao and Ballard 1999, p. 85). Later, and perhaps more importantly, predictive coding insights were used to understand the effective control of action (Clark 2016, ch. 4). One example is skilled activity. A major puzzle in understanding how skilled activity takes place stems from information transmission problems. Such problems arise because of the limitations of nerves and synapses. These limitations produce signaling delays that would seem to impede fluid motion. In skilled (fast) reaching, for example, the brain has to receive, and respond to, a stream of proprioceptive information concerning the position and trajectory of one's arm and hand, as well as information about the location of objects (Clark and Toribio 1994, p. 402). The problem is that, for fast motions, there are signaling delays both in proprioceptive feedback coming from nerve endings, and in the sensory systems that inform the brain about states of the world. Yet we seem capable of reaching objects, generally successfully, even when the movement is fast. Predictive coding is put to use to understand how this happens. The idea is that the brain uses a "forward model"—a neural network whose interconnected units model our arm–hand apparatus. The network emulates the interplay between arm–hand parameters, and it provides mock feedback in place of the absent proprioceptive feedback. In this way, skilled reaching can happen despite the limits in the real-time transmission of information. If we understand the emulator as an expectation concerning the projected position of the body, and the mock feedback as a prediction concerning what the proprioceptive feedback should be, then we have the elements of predictive coding applied to understanding the guidance of action. Actual proprioceptive feedback, in this picture, has primarily the function of signaling error—that is, of signaling if there is a discrepancy between the mock feedback generated by the emulator, and the actual feedback. The other factor that likely contributed to making predictive coding into a more general theory about brains is its connection with the "free-energy minimization" framework (Clark 2016, p. 305). A predictive code is a code that signals error. Presumably, the brain uses a predictive code to minimize error. In the guidance of action, when actual proprioceptive information indicates that there is error in the mock feedback, the brain responds by adjusting the grasping movement so as to get rid of the error. Error reduction is part of what allows smooth grasping. The thought is that a brain that uses a predictive code is also a brain that strives to minimize error. Now, error amounts to "free-energy" understood as an information theoretic notion. In its information theoretic meaning, free-energy is (roughly) the discrepancy between a prediction and the data.
Minimizing free-energy, in this sense, is presumably something that any self-organizing biological system that exchanges information with the environment does to preserve its own physical integrity (Clark 2016, p. 305; Friston 2009, p. 295; Friston et al. 2006, p. 71).

Fig. 1 Shaded dots typically seen as convex. Based on Ramachandran (1988)

By minimizing information-theoretic free-energy, biological systems resist a tendency to disorder. They do so either by adjusting their expectations, or by producing action that alters the world to agree with the expectations (Friston 2009).2 Predictive coding is then a theory of information transmission that has been embedded into a larger theory about how brains and biological systems adjust to the environment. Still, it is unclear why it should be applied to understanding specific cognitive processes. For one thing, it is an empirical question whether the brain always uses a predictive code to transmit information. For another, it is also a question whether the brain, and biological systems more generally, always, or even often, try to reduce error. Admittedly, the free-energy minimization framework is sometimes described as a series of heuristics for how biological systems might function, rather than as a rigorous treatment (Friston and Stephan 2007, p. 418). As a theory of more specific processes, it has encountered resistance. It is not clear, for example, that free-energy minimization, understood as the minimization of the mismatch between prediction and data, is a useful idea in explaining intentional action. In performing intentional actions, we are often driven by parochial aims, such as getting a drink. Aims of this kind are central to explaining what we do, yet they do not seem to be equivalent to the general aim of reducing error. In a similar vein, it is legitimate to wonder whether predictive coding is fit to explain isolated cognitive processes, such as perception and reasoning. In what follows, I leave these questions aside and describe how perception may be predictive. My aim in this article is not so much to dispute the truth of predictive coding as a theory of cognition, but to question whether the theory has additional Bayesian and intellectualist commitments.

1.1 Predictive perception

In perception, the brain has to pick up on what is present distally given the stimulation at the sensory receptors, for example, at the retina. Consider the configuration in Fig. 1.

2 There is also a notion of free-energy that comes from thermodynamics. In thermodynamics, free energy stands for the measure of energy "available to do useful work" (Clark 2016, p. 17; Friston et al. 2006, p. 71). The link between information-theoretic and thermodynamic free-energy is only mathematical. Both notions appeal to the same probabilistic foundations. It is the information-theoretic notion that is at play in free-energy minimization theory (Friston and Stephan 2007, p. 420).

Fig. 2 Simplified illustration of predictive coding in vision. The sensory stimulus, in this case, is a pattern of light projected on the retina

This configuration is typically perceived as consisting of convex half-spheres illuminated from above. How does the brain form this perception from a pattern of light? In so far as this can be regarded as an information-processing problem, we can apply predictive coding to it. In a predictive coding framework, this problem is solved in roughly the following manner. The perceptual system has been trained in our environment, and it is skewed to exhibit certain cerebral configurations. These configurations constitute expectations. Proponents of PCP tend to view these as expectations concerning what is present distally—for example, expectations that convex elements are present. Expectations of this kind generate predictions concerning what should be present at the retina if the expectations were correct. The predictions, in turn, can be seen as "mock sensory states" that are matched against actual sensory states—that is, against the sensory stimulation that actually hits the eyes. If the predictions encounter no error, then the expectations are confirmed. If there is substantial error (some error may be ignored), the expectations are changed—or, alternatively, the environment is changed to fit the expectations. Because in this kind of story expectations generate predictions, predictive coding is said to put forward a generative model of perceptual activity. Figure 2 is a schematic illustration of the whole process. This is a simplified rendition of PCP. One important complication is given by the fact that predictive processing is typically understood as hierarchical. In the brain, an expectation is implemented in a state of a neural network that "issues a prediction" concerning the lower neural state, which in turn issues a prediction concerning the state below. Issuing a prediction, in this context, is presumably just checking whether the neurons at the level below the expectation are configured as they should be, given the level above. This process is iterated down to the retinal level.3 Despite this complication, this preliminary picture highlights the central role of expectation, prediction and error in the process. Still, as described, the predictive coding approach is pretty general. It can be regarded as a useful framework that pertains to how information is transmitted in the brain. Further questions remain concerning—among other things—what expectations and predictions are in this context, how exactly they are generated and how error is taken into account. In what follows, I consider some ways in which PCP has been further spelled out. I suggest that the framework is general enough to leave a number of open options.
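The loop just described can be caricatured in a few lines of code. The sketch below is illustrative only: the toy generative mapping, the single numerical "expectation", and the learning rate are invented placeholders rather than posits of any particular PCP model. It shows the basic cycle, though: an expectation generates a predicted sensory state, the residual error is computed, and the expectation is adjusted until the error is small.

```python
# Caricature of one level of predictive processing: an expectation generates a
# prediction of the sensory input, and the mismatch (error) is used to revise it.
# The generative mapping and all numbers are invented for illustration.

def predicted_input(expectation):
    # toy generative model: a single expected value projects onto two sensory channels
    return [2.0 * expectation, 0.5 * expectation]

def settle(expectation, sensory_input, rate=0.1, steps=50):
    """Nudge the expectation until its prediction roughly matches the sensory input."""
    for _ in range(steps):
        prediction = predicted_input(expectation)
        error = [s - p for s, p in zip(sensory_input, prediction)]
        # move the expectation in the direction that shrinks the prediction error
        expectation += rate * (2.0 * error[0] + 0.5 * error[1])
    return expectation

print(settle(expectation=0.0, sensory_input=[2.0, 0.5]))  # converges close to 1.0
```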

2 Predictive coding and Bayes

While the main problem behind predictive coding is efficiency in transmitting information, one of the main problems behind Bayesian ideas is action under uncertainty. Bayesian decision theory is a mathematical framework that models decision-making in uncertain circumstances. Despite this divergence in underlying motivations, predictive coding views are sometimes presented as equivalent to Bayesian views. The tendency is to suppose that, if we think of predictive coding in terms of free-energy, or error minimization, then the way error is minimized is by performing Bayesian operations (Friston 2009, p. 294). That this is not an obvious move is suggested by the historical fact that an early proponent of the idea that the brain reduces free-energy was also not a fan of thinking of brains as performing Bayesian inferences. The Gestalt psychologist Wolfgang Köhler thought that the brain is a dynamic, physical system that converges towards an equilibrium of minimum energy. Yet Köhler rejected Helmholtz's idea of brains as inference machines (Köhler 1920). So is a predictive perceptual system also thereby a Bayesian system? Answering this question is hard mainly because it hinges on having a more specific sense of what a Bayesian system is. No doubt there may be liberal conceptions of Bayesianism that would force one to accept that any system that reduces error is Bayesian. If, for example, we take a Bayesian system to be just a system that computes its best guesses using a generative model and taking into account the current sensory evidence (Clark 2016, p. 303), then it seems that many predictive coding systems will also be Bayesian. Similarly, if we conceive of a Bayesian system as any system that reduces uncertainty, and then we equate uncertainty with error, then any system that uses predictive coding for error minimization will also be Bayesian.

3 There are other ways in which the sketch of PCP just outlined is simplified. For example, I described the visual apparatus as an isolated process, while there are certainly influences from other modalities, and from action. I do not think that these simplifications perniciously affect what is argued in this article.

Many formulations of the Bayesian idea, however, make some reference to Bayes' theorem (Jacobs and Kruschke 2011; Rescorla 2015). A system is Bayesian if it is usefully described as approximately conforming to Bayes' theorem. There are different ways of understanding the meaning and import of Bayes' theorem. In this article, we can understand Bayes' theorem as defining optimal—and in fact rational—performance on a task. In standard examples, Bayes' theorem describes a principle concerning beliefs. It is a principle satisfied by the credences—or degrees of belief—of rational agents in uncertain conditions. Suppose that an agent holds a number of beliefs concerning the grass being wet, the presence of clouds, the presence of rain and the presence of active sprinklers (Jacobs and Kruschke 2011). These beliefs—or "hypotheses"—are expressed by propositions. At any given time, the agent has degrees of confidence in each proposition she holds. This prior confidence is expressed by a number between 0 and 1. This is the prior probability of a belief or hypothesis, or simply the prior. At any given time, an agent also has conditional credences for one proposition on another. These can be seen as credences that concern the evidentiary relations between propositions—that is, the relations between, say, the presence of rain and the wetness of grass. For any two propositions, P and Q, it will be true of a rational agent that the probability of P given Q conforms to Bayes' theorem. This means that the agent will hold that the probability of P given Q depends on the prior probability of P, on its likelihood—the probability of Q given P—and on the prior probability of Q. The probability of the grass being wet given that it is raining, for example, is relatively high, perhaps higher than the probability that the grass is wet when it is simply cloudy. Bayes' theorem formally says: p(P|Q) = p(Q|P)p(P) ÷ p(Q), where p(P|Q) is the posterior probability of hypothesis P. A rational agent faced with uncertain circumstances has credences that conform to Bayes' theorem. When the agent acquires a new belief or a new piece of evidence—if she is rational—she will also update all of her credences in accordance with Bayes' rule. This rule states that the agent's new credence in a belief equals the posterior probability that belief had in light of the evidence just acquired. If, for example, the agent wants to explain why the grass is wet, and she observes that the sprinkler is on, she will update all of her credences to reflect this fact. This involves updating her credence in the sprinklers being on, but also updating all of her conditional credences. She will probably increase her confidence that it was the sprinkler that caused the grass to be wet, and decrease her confidence that rain was present. The fact that all credences are updated is important, as we will see later.
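For a worked instance with purely illustrative numbers (not drawn from any of the studies cited here): suppose the agent's prior in rain is .3, her prior in the grass being wet is .35, and her likelihood of wet grass given rain is .9. Bayes' theorem then fixes her conditional credence at p(rain|wet grass) = p(wet grass|rain)p(rain) ÷ p(wet grass) = (.9 × .3) ÷ .35 ≈ .77. If she then observes that the grass is indeed wet, Bayes' rule has her adopt roughly .77 as her new credence in rain, and she adjusts her remaining credences accordingly.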

2.1 Bayesian perception

Why apply this type of picture to the perceptual case? Proponents of Bayesian accounts of perception tend to stress that perceptual systems operate in noisy conditions. Perception is presumed to operate in conditions of uncertainty. Sensory stimulation is said to be compatible with many distal causes, with patterns of light at the retina being congruent with many visual elements, and auditory waves being similarly reconcilable with a variety of sounds.

Fig. 3 Shaded dots typically seen as concave. Based on Ramachandran (1988)

We do not, however, perceive the world in a constantly shifting way. We typically perceive the environment in a stable way. Bayesian ideas are brought in to explain how this happens (Hohwy et al. 2008; Palmer 1999; Rock 1983). Recall the configuration in Fig. 1. It is typically perceived as consisting of convex half-spheres illuminated from above. The configuration, however, is also compatible with the presence of concave holes illuminated from below (Ramachandran 1988). You may be able to see the spheres as holes on your own. Figure 3 helps by showing that rotating the configuration makes the central elements appear concave. Given that the stimulus produced by Fig. 1 is compatible with different percepts, the question is why we typically see it in just one way. In a Bayesian framework, this question is answered by reference to Bayes' theorem. For simplicity, we can think of the visual system as holding just two hypotheses concerning what is present: the hypothesis that convex half-spheres illuminated from above are present (let's call this hypothesis S) and the hypothesis that concave holes illuminated from below are present (let's call this hypothesis C). Both hypotheses have, from the start, a non-zero chance of being true. The evidence E1 that the system acquires at a certain time is a pattern of light on the retina—the pattern that in Fig. 1 is presumably compatible with different hypotheses. In the standard Bayesian picture, prior to the acquisition of new evidence, the visual system assigns probability to the initial hypotheses S and C, and to the potential pieces of evidence coming in: E1, E2, E3. For example, the system may assign a prior probability of .6 to hypothesis S, .45 to hypothesis C, and .5 to E1. The system also has various conditional credences that conform to Bayes' theorem, for example:

p(S|E1) = p(E1|S)p(S) ÷ p(E1)
p(C|E1) = p(E1|C)p(C) ÷ p(E1)

These credences depend, for their numerical value, on the likelihoods. If we presume that p(E1|S) is equal to .7, then, in the first credence, the probability of S given E1 is .84—that is, hypothesis S has an 84% chance of being right given E1. When new retinal stimulation comes in, the visual system updates. Suppose the evidence coming in is, in fact, E1. This evidence makes the posterior for hypothesis S higher than for other hypotheses. Hypothesis S—that convex half-spheres are present—is then selected.4

4 In some Bayesian models of perception, the selection of the percept is made in accord with expected utility maximization, which involves calculating the costs and benefits associated with accepting the hypothesis with the highest posterior. In others, like the one I just described, it is made simply as a function of the posterior probability (Maloney et al. 2009; Howe et al. 2006, p. 2; Mamassian et al. 2002, pp. 20–21; Mamassian and Landy 1998).

Interestingly, in our example involving Fig. 1, E1 is compatible with both hypotheses S and C. The likelihood of the two hypotheses is roughly similar: p(E1|S) ≈ p(E1|C) = .7. What breaks the tie, so to speak, is the value of the priors. Since the prior probability of hypothesis C is only .45, its posterior is .63. Hypothesis C has "only" a 63% chance of being right given E1. So the priors do substantive work in this context. Priors are probabilities that are independent of the evidence. In perceptual theory, they are thought to be assumptions about the environment, and there is disagreement concerning their ontogeny—that is, whether they are innately specified or learnt from experience (Beierholm et al. 2009, p. 7; Brainard 2009, p. 25; Clark 2013, footnote xxxvi; Hohwy et al. 2008, pp. 3–4; Mamassian et al. 2002, p. 13). Human vision assumes, for example, that light comes from a single source and from above. It tends to presume that things are illuminated by something overhead, like the sun (Mamassian et al. 2002; Ramachandran 1988; Rescorla 2013; Stone 2011, p. 179). Moreover—although this is still controversial—perceptual systems presented with specific shapes are said to prefer convexity to concavity.5 This makes it the case that hypothesis S has a higher prior probability than C. When the evidence does not make a substantial difference to the likelihoods of the two hypotheses, the priors favor hypothesis S. If, conversely, the likelihood of C were higher—for instance, .95—then the evidence would play a corrective function in spite of the priors. In this case, hypothesis C would have a higher chance of being right (85.5%) and of being selected as the percept. As in predictive coding accounts of perception, prior expectations play a central role and the incoming signal serves mostly as a corrector. This may be one reason for associating predictive coding with a Bayesian picture. It is not clear, however, that the similarities extend much further, as the next subsection argues.
5 For discussion of this prior, see Stone (2011), p. 179.
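For concreteness, the arithmetic above can be reproduced in a few lines (a sketch only, using the illustrative values from the example rather than estimates from any actual perceptual model):

```python
# Posteriors for the convex (S) and concave (C) hypotheses, with the
# illustrative values from the example: p(S) = .6, p(C) = .45, p(E1) = .5.
def posterior(likelihood, prior, evidence):
    return likelihood * prior / evidence

print(posterior(0.7, 0.60, 0.5))   # p(S|E1) is about .84
print(posterior(0.7, 0.45, 0.5))   # p(C|E1) is about .63
print(posterior(0.95, 0.45, 0.5))  # about .855, if the likelihood of C were .95 instead
```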

2.2 The space between prediction and Bayes

While the expectations of predictive coding have a clear analogue in Bayesian theory as high-level hypotheses, it is not obvious how likelihoods fit into the predictive coding framework. Are they used to calculate the prediction—that is, the mock sensation that is then matched against the actual sensation? Or are they calculated after the new sensory evidence comes in? If likelihoods have no clear analogue in predictive coding, then some Bayesian systems are not predictive coding systems. Conversely, it is far from clear that a predictive coding system must be Bayesian. In the standard Bayesian story, there are many more variables and much more complexity than what is required for a system to simply use a predictive code and to reduce error. One striking difference is the fact that a Bayesian agent maintains and updates probabilities for all possible values of her beliefs or hypotheses.

The values must ultimately sum up to 1. This is an important feature of Bayesian models that is regarded as central to explaining certain aspects of perceptual learning (Kruschke 2008; Jacobs and Kruschke 2011). A Bayesian system is one that reasons "by exoneration". If one suspect confesses, an unaffiliated suspect is exonerated. If the sprinkler is on, it is unlikely to be raining, and likely that the sprinkler is responsible for the grass being wet. This feature of Bayes' theory predicts that Bayesian systems exhibit "backward blocking" (Kruschke 2008; Jacobs and Kruschke 2011). Backward blocking is the following phenomenon. Suppose that in stage one of an experiment, an agent is repeatedly exposed to two cues C1 and C2 (raining and sprinklers), followed by an outcome O (wet grass). The agent learns that each cue is at least partly predictive of the outcome. In stage two, the agent is repeatedly exposed to C1 followed by O. C2 does not appear in stage two. After stage two, the agent acts as if there is a strong association between C1 and O. If the agent also acts as if there is a diminished associative strength between C2 and O despite the fact that C2 did not appear in stage two, then the agent acts like a Bayesian agent. The associative strength of the second cue has apparently been blocked, or diminished, by the subsequent learning of the first cue. For a perceptual example, think of a visual system trained on two types of stimuli—two types of retinal patterns—that are associated with one distal visual object. In phase one, both stimuli give rise to the perception of the object. In phase two, only one stimulus is present and the association between the stimulus and the distal object is enhanced. For example, the speed at which the object is perceived given the stimulus in question is faster. If, after phase two, the visual system were to have a diminished association between the second stimulus and the object, then the system would be Bayesian. But what if the visual system failed to do that? What if the second stimulus continued to be associated with the object similarly to how it was associated with it after stage one? In that case, the visual system would be a system that reduces error, but not in a Bayesian way. It would perhaps be a Rescorla–Wagner system that reduces error by changing a single associative strength—the strength between the first stimulus and the object (Kruschke 2008). Similarly, if an agent increases her confidence that rain caused the grass to be wet while, at the same time, keeping constant the confidence that the sprinkler may also have been on, then the agent is non-Bayesian. The agent is still reducing error by thinking that it was probably the rain that caused the grass to be wet. Yet she is not reducing error in a Bayesian way. This suggests that a predictive coding system that reduces error is not necessarily a Bayesian system. Bayesian systems have levels of complexity that predictive coding systems need not have. Not all error-reducing systems are optimal or rational. Now, we have been taking a fairly standard interpretation of Bayesian theory.
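To make the contrast concrete, here is a minimal sketch. It is illustrative only: the Bayesian learner is deliberately simplified to a posterior over four causal hypotheses, and all numbers (priors, likelihoods, learning rate) are invented. The point it displays is the one made above: a learner that redistributes probability over all hypotheses exhibits backward blocking, while a Rescorla–Wagner learner that only adjusts the weights of the cues actually presented does not.

```python
# Backward-blocking design: stage one pairs cues C1 and C2 with the outcome;
# stage two pairs C1 alone with the outcome. All numbers are invented.

# Bayesian learner: a posterior over which cues cause the outcome.
belief = {"C1": 0.25, "C2": 0.25, "both": 0.25, "neither": 0.25}
CAUSES = {"C1": {"C1"}, "C2": {"C2"}, "both": {"C1", "C2"}, "neither": set()}

def observe(belief, cues):
    """Update the posterior after the outcome follows the presented cues."""
    unnorm = {h: p * (0.9 if CAUSES[h] & cues else 0.1) for h, p in belief.items()}
    total = sum(unnorm.values())
    return {h: p / total for h, p in unnorm.items()}

for _ in range(5):                                  # stage one: C1 + C2 -> outcome
    belief = observe(belief, {"C1", "C2"})
c2_before = belief["C2"] + belief["both"]
for _ in range(5):                                  # stage two: C1 alone -> outcome
    belief = observe(belief, {"C1"})
c2_after = belief["C2"] + belief["both"]
print(c2_after < c2_before)                         # True: C2 is "exonerated" (backward blocking)

# Rescorla-Wagner learner: one weight per cue, adjusted only when the cue is shown.
weights = {"C1": 0.0, "C2": 0.0}

def rw_trial(cues, outcome=1.0, rate=0.3):
    error = outcome - sum(weights[c] for c in cues)
    for c in cues:
        weights[c] += rate * error                  # absent cues are left untouched

for _ in range(5):
    rw_trial({"C1", "C2"})
c2_weight = weights["C2"]
for _ in range(5):
    rw_trial({"C1"})
print(weights["C2"] == c2_weight)                   # True: no backward blocking
```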
Perhaps we should not consider this standard theory, as it is quite idealized. It may be a good idea, for the purpose of explaining cognitive activity, to remove some of the complexity. In accounts of perception, for instance, it may be good to think that the visual system does not go to the trouble of figuring out lots of priors for the sensory evidence, and lots of likelihoods, prior to receiving new retinal information. This would conserve some cognitive resources.

Indeed, it seems certain that the conformity to Bayes' rule has to be seen as approximate (Clark 2016; Jacobs and Kruschke 2011). Due to cognitive capacity limitations, such as limits on the size of working memory and on the quantity of attentional resources, actual Bayesian systems are coarse and incomplete. Still, the question remains of how much of the Bayesian machinery we can do without, while keeping the core of the theory. One of the main criticisms of Bayesianism is that it is an unfalsifiable "just so" story. It can be arbitrarily altered so that the data fits the theory (Bowers and Davis 2012). For our purposes, the risk is that, by stripping the Bayesian framework of some components, we may end up with something that is coextensive with a predictive coding approach, but that is not genuinely Bayesian. This is the risk we face when proponents of Bayesian accounts of perception hardly mention the role of updating (Rescorla 2015). The thought seems to be that perception could be Bayesian simply in virtue of percepts being derived as a function of the posterior probability of a perceptual hypothesis, with no need to think that the perceptual system also updates in accordance with Bayes' rule. If, in getting rid of updating, we got rid of the capacity to explain backward blocking, then it seems that we would give up a distinctive feature of Bayesianism. Even then, however, we could ask whether a predictive coding system is necessarily a "minimal" Bayesian system of this kind. The answer seems again to be negative. It may be that a predictive perceptual system reduces error by calculating the posterior of an expectation. It may also be that it does not. In perception, the system could try to reduce error by reproducing from memory the last sensory stimulation that a given percept produced, and then checking if that stimulation is present at the sensory receptors. What counts as reducing error is wider than what counts as reducing uncertainty in a Bayesian way. We can appreciate this fact by reflecting on what would falsify the two theories. Presumably, a non-predictive coding system is a system in which either information is not transmitted in the form of error, or it is so transmitted but the system is not trying to reduce error. A non-Bayesian system, by contrast, is one that more specifically does not conform to Bayes' theorem. If this is true, then there is some distance between accepting PCP and accepting a Bayesian approach to perception. In the next section, we ask, instead, about the philosophical commitments of predictive coding theory.

3 Predictive coding and Kant

What kind of picture of the perceptual process do we get from accepting PCP? The tendency has been to presume that predictive coding aligns with a Kantian and intellectualist way of understanding perception. Two elements of predictive coding suggest a Kantian gloss. One is the stress on expectations. Expectations are seen to drive what we perceive—a position that is vaguely reminiscent of Kant's adage that intuition without concepts is blind. The other is the idea, stemming in particular from pairing predictive coding with Bayesian theory, that perception involves a top-down inference. Perception involves a process of hypothesis formation and confirmation where percepts are inferred from prior experience and from current evidence. Both of these claims seem to clash with an anti-intellectualist and ecological approach to perception. Let's consider the two claims and their merits in turn.6

3.1 Expectation-driven perception?

The stress on expectations appears often in standard presentations of predictive coding views of perception. Indeed, some proponents of this approach seem to want to reconceptualize the importance of the incoming input. Perception is driven "from the top", by prior knowledge of the world. Incoming sensory signals are responsible mostly for detecting error. Gładziejewski, for example, writes:

PCT [Predictive Coding Theory] presents us with a view of perception as a Kantian in spirit, "spontaneous" interpretive activity, and not a process of passively building up percepts from inputs. In fact, on a popular formulation of PCT, the bottom-up signal that is propagated up the processing hierarchy does not encode environmental stimuli, but only signifies the size of the discrepancy between the predicted and actual input. On such a view, although the world itself surely affects perception—after all, the size of the bottom-up error signal is partly dependent on sensory stimulation—its influence is merely corrective and consists in indirectly "motivating" the system, if needed, to revise its perceptual hypotheses. (Gładziejewski 2016, p. 16)

Hohwy similarly writes:

Perception is more actively engaged in making sense of the world than is commonly thought. And yet it is characterized by curious passivity. Our perceptual relation to the world is robustly guided by the offerings of the sensory inputs. And yet the relation is indirect and marked by a somewhat disconcerting fragility. The sensory input to the brain does not shape perception directly: sensory input is better and more perplexingly characterized as feedback to the queries issued by the brain. Our expectations drive what we perceive and how we integrate the perceived aspects of the world, but the world puts limits on what our expectations can get away with. (Hohwy 2013, p. 2)

As these passages attest, the concept of expectation is central to PCP. The passages also downplay the guiding role of perceptual inputs. However, by everyone's lights, predictive perceptual processes result from "the delicate dance between top-down and bottom-up" influences (Clark 2016).

6 A third way in which predictive coding can be given an intellectualist gloss is by understanding "error" as the mismatch between how we represent the world to be and how the world is. In this interpretation, reducing this mismatch is tantamount to getting to truth. As far as I can see, this is not a mandatory reading of "error" in predictive coding. "Error", in this context, is the discrepancy between the information already transmitted and the information coming in, quite independently of how this information matches what is in the world. The information may concern what is useful for the system to know, rather than what is true. In other words, a system that uses predictive coding may reduce error in the sense of reducing biological disadvantage and enhancing fit. For reasons of space, I do not investigate this aspect of predictive coding any further in the present article.

The bottom-up influences are not merely corrective. In many instances, they drive perception. Indeed, it is ultimately very difficult to judge what drives what in predictive coding accounts. Recall that predictive coding is a strategy for transmitting information in which an expectation is first established, and then the incoming signal plays a corrective function. Establishing the expectation is a process that is partly driven from the bottom. Expectations do not come out of nowhere. In untrained neural networks that are often taken to instantiate predictive codes, the incoming stimulation (typically from a training set) is used to train the network to display certain neural configurations in the presence of certain stimuli. This happens during what is called "supervised learning". If we want a network to discern line segments, then it is the presence of line segments, and of what they produce at the network's interface, that helps establish what particular configuration the network should be in. In trained neural networks that are already skewed to display certain arrangements, the incoming signal plays the role of telling the system what specific configuration stored in memory to prefer. Indeed, this is true even in conditions of high noise. Take the case of Fig. 1, in which the stimulus is compatible with both the perception of something convex and the perception of something concave. A trained network may be predisposed to prefer convexity a priori—that is, it may have a setting that makes it go for convexity in situations of noise. Still, the incoming signal has to tell the system that we are indeed in conditions of noise, and that the system should go with its own settings. In this case, the bottom-up input determines what specific configuration to display in tandem with pre-set preferences. Saying that it merely has a corrective function is misguided. Stressing the importance of expectations too much in effect overlooks how proponents of predictive coding conceive of their models. Predictive—and even Bayesian—accounts of perception are usually paired with the study of the statistical regularities occurring in the environment. Consider the following example: neurons in low visual areas of cats and monkeys respond optimally to small line segments, but they reduce or stop their response when the line segment surpasses a certain length (Rao and Ballard 1999, p. 79). Why should a neuron that responds to a stimulus stop responding when the same stimulus extends in a certain way? If we think of the visual cortex as trained on natural images, then we can understand this effect.
In natural images, short bars seldom occur in isolation. They are usually part of a longer bar that extends to neighboring neurons. When the stimulus properties in a neuron's receptive field match the stimulus properties in the surrounding region, little response is evoked from the error-detecting neurons because the "surround" can predict the "center". On the other hand, when the stimulus occurs in isolation, such a prediction fails, eliciting a relatively large response (Rao and Ballard 1999, p. 84). In explaining the response behavior of neurons in low visual areas of cats and monkeys, Rao and Ballard appeal to predictive coding and to the properties of the stimulus. It is because small segments—and the stimulus they produce—seldom appear in isolation that the neurons behave as they do. In fact, it is because of the properties of the stimulus that neurons could develop an effective strategy at all.

Similarly, Bayesian accounts of perception are often paired with Natural Scene Statistics (NSS) (Stocker and Simoncelli 2006; Hosoya et al. 2005). Natural scene statistics is an approach that originates in physics, and that has enjoyed some recent fortune due to the improved ability of computers to parse and analyze images of natural scenes (Geisler 2008, p. 169). One of the fundamental ideas of NSS is to use statistical tools to study not what goes on inside the head, but rather what goes on outside. NSS is interested both in what is more likely present in our environment, and in the relationship between what is in the world and the stimulus it produces. For example, by calculating what type of projection convex elements produce on the retina, NSS can predict whether an element will be seen as convex or concave (Yang and Purves 2004; Geisler et al. 2001). In this framework, a retinal projection gives rise to the perception of something convex primarily because of what comes from the bottom—that is, what type of retinal excitation convex things produce. High-level expectations may also have a role, but only in tandem with the bottom-up signal. Accepting this point amounts to recognizing, in ecological spirit, that a predictive coding approach needs to be grounded in a study both of the statistics of natural scenes and of the kind of bottom-up input they produce. Indeed, accepting this point also means recognizing that the stress on uncertainty and noise in the study of perception is overblown. It gives us one more reason to keep PCP and Bayesian approaches separate, in so far as the latter are introduced with the presupposition that perceptual circumstances are highly uncertain. The stimulus for perception is not (always) ambiguous and not (always) noisy. Expectations can drive perception in any meaningful sense only when they have already been established, and when they have been selected by the incoming stimulation. In this first respect, a predictive coding system is not necessarily—and not even plausibly—a Kantian system.7

3.2 Inference and representation

The second element of the intellectualist gloss on predictive coding is the stress on inferences in perception. Proponents of PCP identify Helmholtz as their predecessor. Hohwy et al. write:

There is growing support of the idea that the brain is an inference machine, or hypothesis tester, which approaches sensory data using principles similar to those that govern the interrogation of scientific data. In this view, perception is a type of unconscious inference. (…) This view goes back at least to (Helmholtz 1867/1925) and has been expressed with increasing finesse since that time (Gregory 1966; MacKay 1956; Neisser 1967; Rock 1983). More recently it has been proposed that this intuitive idea can be captured in terms of hierarchical Bayesian inference, using generative models with predictive coding or free-energy minimization. (Hohwy et al. 2008, p. 2)

7 An anonymous referee rightly points out that work in Bayesian perception often adds Gaussian noise to the perceptual signal. This addition seems to be un-ecological.

Motivated by Kantian concerns, Hermann von Helmholtz is considered the first proponent of constructivism—the position that perception consists in an unconscious inference (Helmholtz 1867/1925; Rock 1983). In Helmholtz, as in Bayesian accounts of perception, perceptual inferences consist in the testing of perceptual hypotheses, which are typically regarded as representational states that stand for distal objects and properties. The inferential insight was also developed in early computational models of perceptual activity, but it is reasonable to wonder about the continuity with Helmholtzian ideas. In classical computational theories, perceptual inferences are "bottom-up" rather than Bayesian (Fodor 1983; Marr 1982). The inferences resemble more traditional transitions from premises to conclusions. They do not consist in the testing of hypotheses that conform to Bayes' theorem.8 What is common to both types of inferences is that they involve representational states. In classical computations, the visual system is thought to form representations of, for example, lines and edges, in response to representations of certain light discontinuities at the retina, by using assumptions about lines and edges as middle premises. Similarly, a Bayesian perceptual system forms representations of convex elements in response to sensory inputs, in virtue of certain assumptions concerning the frequency of convexity in our world. These high-level representations are then checked against representations of certain light patterns in early visual areas. The primary alternative to this kind of inferential position—in both its Bayesian and bottom-up versions—is ecology (Gibson 1966). While constructivist theories view perception as a representationally mediated relation to the world, theories that favor ecology understand perception as providing direct access to the environment. Ecological approaches admit that perception involves neurological activity that is often complex. However, according to ecologists, such activity is not representational, and not inferential. The activity does not resemble reasoning, but simple attunement to the world. Are the kinds of processes introduced by predictive coding representational and inferential in nature? This issue hinges on whether we need to view expectations, predictions and error signals as representational elements. I think that we do not, and that in fact some aspects of predictive coding suggest that its elements are non-representational. To appreciate this point, we need to spend a few words talking about representation. Representations are internal, unobservable states that are distinctive of cognizers. They are one of the elements that single out psychological systems from purely biological and chemical systems. Representations carry information about environmental states of affairs. By so doing, they have content or accuracy conditions. Like premises, they can be accurate or inaccurate and they have a certain syntax. Furthermore, representations steer behavior. They are introduced precisely to explain the behavior of intelligent creatures. Reflection on the notion of representation was once centered on giving naturalistically acceptable conditions for content, and for misrepresentation. More recently, the focus has somewhat shifted toward giving a non-trivializing characterization of what mental representations are.

8 Thanks to an anonymous referee for raising this point.

The worry is that naturalistic views of content based on function and/or on causal covariance—as well as notions of representation coming from —run the risk of promoting a notion of representation that is too liberal. They run the risk of confusing mere causal mediation—which simple detectors can do—with representation, which is a distinctive psychological capacity (Ramsey 2007). One of the central insights in this area is that carrying information is not sufficient for representation. Mere detectors, or trackers, can carry information in some naturalistic sense, without being representations. Detectors are typically understood as states that monitor proximal conditions and that cause something else to happen. As such, they are mere intermediaries that are found in many biological and chemical systems. What is important about detectors is that they are active in a restricted domain, and they do not model distal or absent conditions. The behavior of a system that uses only detectors is mostly viewable as controlled by the world. What the system does can be explained by appeal to present environmental situations. Magnetosomes in bacteria and cells in the human immune system that respond to damage to the skin are examples of detectors of this kind. Representations, by contrast, guide action and are "detachable" (Clark and Toribio 1994; Gładziejewski 2016; Grush 1997; Ramsey 2007). They display some independence from what is around. There are different ways of spelling out what the detachment of representations amounts to. In some theories, this idea is understood in terms of "off-line" use. A GPS, for example, is a representational device that guides action "off-line". One's ongoing decisions concerning where to go are dictated by the GPS. They depend on the contents of the GPS that serves as a stand-in. The decisions are not directly controlled by the world (Gładziejewski 2016, p. 10). In other proposals, the detachment of representations is spelled out in terms of content. Representations allow us to coordinate with what is absent because they stand for what is not present to the sensory organs (Orlandi 2014). Regardless of how the idea of detachability is fleshed out, representations are typically understood as unobservable structures that are capable of misrepresenting, that guide action, and that are detachable in one of the ways just described (Clark and Toribio 1994; Gładziejewski 2016). Are the elements of PCP necessarily representational? That the answer to this question is negative can be seen, first, by noticing that predictive coding models have been offered for a variety of phenomena, some of which plausibly involve representational and psychological states, and some of which do not. We have predictive coding accounts of the brain, but we also have predictive coding models of the purely physiological activity of retinal ganglion cells. Such activity plausibly involves no representations. Retinal cells can act as if they are predicting sensory states without genuinely doing so. They simply inhibit neighboring cells. In this context, talk of "expectations" and "predictions" is purely metaphorical. When we apply the predictive coding framework to more sophisticated mental processes, such as perception, the situation does not substantially change.
The high-level perceptual expectations that are formed in response to sensory states are typically regarded as representations. This is because, as also suggested by Fig. 2, the expecta- 123 2384 Synthese (2018) 195:2367–2386 tions are thought to stand for distal conditions. They guide action, they can misrepresent and, depending on how we understand detachability, they have some sort of indepen- dence from what is present in the environment. In the case of Fig. 1, they stand for something absent, since they stand for convex elements when there is nothing convex in the two-dimensional image. We can wonder, however, whether this is a mandatory way of understanding expec- tations. In my intellectualist-friendly presentation of predictive coding, I have been treating expectations and predictions as two separate elements. Expectations concern environmental or bodily elements, while predictions concern internal, neuronal states of the brain—for example, sensory states, or proprioceptive states. But is this faithful to the letter of predictive coding?9 May expectations be just predictions, namely pre- dictions of what is to be found at the neural level below the present level? High-level states of predictive systems, in this reading, do not concern what is to be expected out there. They concern what is to be expected in the brain. Such states are not rep- resentations. They are causal mediators between the level above, and the level below them. They may carry information—in the sense of causally or statistically covarying with environmental parameters—but this is both quite irrelevant to the function they perform, and not sufficient to regard them as representations.10 This is not to say that such states cannot be seen as representations. We might, in fact, have to regard expectations as representations. But that is because of other reasons— for example, because it would otherwise be impossible to explain how perceivers move in the world. The representational status of high-level expectations is not dictated by a predictive coding model by itself. What about the other components of predictive coding? Early sensory states presum- ably carry information about what is in the environment, in particular about proximal conditions, such as the characteristics of light reflected by objects. But even so, such states are not properly detached from their causal antecedents, and they do not steer what the subject does. Areas of the retina, the optic nerve or the primary visual cortex that carry information about light patterns, gradients and other elements of the prox- imal input do not guide person-level activity. They merely track what is present and activate higher areas of perceptual processing. We reach a similar conclusion if we consider error signals. The initial tendency may be to suppose that the error signal is a well-determined and informative signal— telling the brain what aspects of the current expectation are wrong. This, however, is both not mandatory and not plausible. There is little evidence for an independent channel of error signals in the brain (Clark 2013). Low-level perceptual stages appear to communicate the kind of error present by gradually attempting to reconfigure the higher level neuronal state. Error signals understood in this way are both not driving action—unless we question-beggingly assume, as Gładziejewski (2016) seems to do, that action is error reduction—and not sufficiently detached from present conditions. 
Indeed, as in the case of high-level expectations, the signals are not even about the conditions present to the perceiver. Error signals are about things internal to the brain. They inform the brain of the need to adjust its own states so as to reach an error-free equilibrium.
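The idea that error signals serve only to reconfigure a higher-level internal state can also be given a toy illustration. The sketch below is my own, written loosely in the spirit of Rao and Ballard (1999) rather than as an implementation of any particular model; the generative matrix G, the step size, and the number of iterations are arbitrary choices made for the example. A higher-level state r produces a prediction of the lower-level input, and the resulting error is used only to nudge r toward an error-free equilibrium.

# A toy error-driven adjustment loop (my own sketch, loosely in the spirit of
# Rao and Ballard 1999; not an implementation of any particular model).
# A "high-level" state r generates a prediction of the lower-level input x
# through a fixed matrix G; the error (x - G r) only nudges r, step by step.

import numpy as np

rng = np.random.default_rng(0)
G = rng.normal(size=(8, 3))             # fixed generative weights (assumed given)
x = G @ np.array([1.0, -0.5, 2.0])      # the lower-level input to be "explained"

r = np.zeros(3)                          # initial high-level state
step_size = 0.02
for _ in range(1000):
    prediction = G @ r                   # "mock sensation" sent to the level below
    error = x - prediction               # error signal: an internal mismatch
    r = r + step_size * (G.T @ error)    # gradually reconfigure the higher level

print(np.round(r, 2))                    # r settles near [1.0, -0.5, 2.0]
print(round(float(np.linalg.norm(x - G @ r)), 4))   # residual error is near zero

Both the prediction and the error figure here only as quantities passed between levels; the loop's job is exhausted by settling the internal state, whether or not we add that r stands for anything in the environment.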

Predictions, or mock sensations, are similar to error signals in this respect. They are presumably produced by the high-level expectations in order to be matched against the incoming data, and they concern only the brain's internal states. Predictions are not aimed at steering action, nor at doing so in the absence of the sensory stimulation they are supposed to match. They are states produced for checking the neuronal level below them. This exhausts their function.

If this is true, then we have reason to doubt the intellectualist gloss put on predictive coding accounts of perception. Such accounts introduce features that are compatible with an ecological interpretation. The machinery they posit is easily seen as non-representational.

9 Thanks to an anonymous referee for raising this point.
10 Thanks also to Casey O'Callaghan for bringing up this point.

4 Conclusion

In this article, I tried to clarify the commitments of a predictive coding approach to perception. After summarizing what I take a predictive coding theory to hold, I addressed two questions. Is a predictive coding perceptual system also a Bayesian system? Is it a Kantian system? I argued that the answer to these questions is negative.

References

Barlow, H. B. (1961). Possible principles underlying the transformations of sensory messages. In W. A. Rosenblith (Ed.), Sensory communication. Cambridge: MIT Press.
Barlow, H. B. (1981). The Ferrier Lecture, 1980: Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London B: Biological Sciences, 212(1186), 1–34.
Beierholm, U. R., Quartz, S. R., & Shams, L. (2009). Bayesian priors are encoded independently from likelihoods in human multisensory perception. Journal of Vision, 9(5), 23.
Bowers, J. S., & Davis, C. J. (2012). Bayesian just-so stories in psychology and neuroscience. Psychological Bulletin, 138(3), 389.
Brainard, D. H. (2009). Bayesian approaches to color vision. The visual (Vol. 4). http://color.psych.upenn.edu/brainard/papers/BayesColorReview.pdf.
Clark, A. (2013). Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences, 36(3), 181–204.
Clark, A. (2016). Surfing uncertainty: Prediction, action and the embodied mind. Oxford: Oxford University Press.
Clark, A., & Toribio, J. (1994). Doing without representing? Synthese, 101, 401–431.
Dan, Y., Atick, J. J., & Reid, R. C. (1996). Efficient coding of natural scenes in the lateral geniculate nucleus: Experimental test of a computational theory. The Journal of Neuroscience, 16(10), 3351–3362.
Fodor, J. A. (1983). The modularity of mind: An essay on faculty psychology. Cambridge: MIT Press.
Friston, K. (2009). The free-energy principle: A rough guide to the brain? Trends in Cognitive Sciences, 13(7), 293–301.
Friston, K., Kilner, J., & Harrison, L. (2006). A free energy principle for the brain. Journal of Physiology-Paris, 100(1), 70–87.
Friston, K. J., & Stephan, K. E. (2007). Free-energy and the brain. Synthese, 159(3), 417–458.
Geisler, W. (2008). Visual perception and the statistical properties of natural scenes. Annual Review of Psychology, 59, 167–192.
Geisler, W., Perry, J., Super, B., Gallogly, D., et al. (2001). Edge co-occurrence in natural images predicts contour grouping performance. Vision Research, 41(6), 711–724.
Gibson, J. (1966). The senses considered as perceptual systems. Boston: Houghton Mifflin.
Gładziejewski, P. (2016). Predictive coding and representationalism. Synthese, 193(2), 559–582.
Gregory, R. (1966). The intelligent eye. New York: McGraw-Hill.
Grush, R. (1997). The architecture of representation. Philosophical Psychology, 10(1), 5–23.
Harrison, C. W. (1952). Experiments with linear prediction in television. Bell System Technical Journal, 31(4), 764–783.
Helmholtz, H. von (1867/1925). Treatise on physiological optics (Vol. 3). New York: Courier Dover Publications.
Hohwy, J. (2013). The predictive mind. Oxford: Oxford University Press.
Hohwy, J., Roepstorff, A., & Friston, K. (2008). Predictive coding explains binocular rivalry: An epistemological review. Cognition, 108(3), 687–701.
Hosoya, T., Baccus, S. A., & Meister, M. (2005). Dynamic predictive coding by the retina. Nature, 436(7047), 71–77.
Howe, C. Q., Beau Lotto, R., & Purves, D. (2006). Comparison of Bayesian and empirical ranking approaches to visual perception. Journal of Theoretical Biology, 241(4), 866–875.
Jacobs, R. A., & Kruschke, J. K. (2011). Bayesian learning theory applied to human cognition. Wiley Interdisciplinary Reviews: Cognitive Science, 2(1), 8–21.
Köhler, W. (1920). Physical Gestalten at rest and in steady state: A natural-philosophical investigation. In A source book of Gestalt psychology. London: Routledge. (Reprinted from Die physischen Gestalten in Ruhe und im stationären Zustand: Eine naturphilosophische Untersuchung, by W. Köhler, 1920, Braunschweig, Germany: Friedr. Vieweg und Sohn.)
Kruschke, J. K. (2008). Bayesian approaches to associative learning: From passive to active learning. Learning and Behavior, 36(3), 210–226.
MacKay, D. M. (1956). The epistemological problem for automata. In C. E. Shannon & J. McCarthy (Eds.), Automata studies (pp. 235–251). Princeton: Princeton University Press.
Maloney, L. T., Mamassian, P., et al. (2009). Bayesian decision theory as a model of human visual perception: Testing Bayesian transfer. Visual Neuroscience, 26(1), 147–155.
Mamassian, P., & Landy, M. S. (1998). Observer biases in the 3D interpretation of line drawings. Vision Research, 38(18), 2817–2832.
Mamassian, P., Landy, M., & Maloney, L. T. (2002). Bayesian modelling of visual perception. In R. P. N. Rao, B. A. Olshausen, & M. S. Lewicki (Eds.), Probabilistic models of the brain: Perception and neural function (pp. 13–36). Cambridge: MIT Press.
Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: Henry Holt and Co.
Neisser, U. (1967). Cognitive psychology. Englewood Cliffs, NJ: Prentice Hall.
Oliver, B. (1952). Efficient coding. Bell System Technical Journal, 31(4), 724–750.
Orlandi, N. (2014). The innocent eye: Why vision is not a cognitive process. Oxford: Oxford University Press.
Palmer, S. E. (1999). Vision science: Photons to phenomenology (Vol. 1). Cambridge: MIT Press.
Ramachandran, V. S. (1988). Perceiving shape from shading. Scientific American, 259(2), 76–83.
Ramsey, W. M. (2007). Representation reconsidered. Cambridge: Cambridge University Press.
Rao, R. P., & Ballard, D. H. (1999). Predictive coding in the visual cortex: A functional interpretation of some extra-classical receptive-field effects. Nature Neuroscience, 2(1), 79–87.
Rescorla, M. (2013). Bayesian perceptual psychology. In M. Matthen (Ed.), The Oxford handbook of the philosophy of perception. Oxford: Oxford University Press.
Rescorla, M. (2015). Review of Nico Orlandi's The innocent eye. Notre Dame Philosophical Reviews. http://ndpr.nd.edu/news/the-innocent-eye-why-vision-is-not-a-cognitive-process/.
Rock, I. (1983). The logic of perception. Cambridge: MIT Press.
Shi, Y. Q., & Sun, H. (1999). Image and video compression for multimedia engineering: Fundamentals, algorithms, and standards. Boca Raton: CRC Press.
Srinivasan, M. V., Laughlin, S. B., & Dubs, A. (1982). Predictive coding: A fresh view of inhibition in the retina. Proceedings of the Royal Society of London B: Biological Sciences, 216(1205), 427–459.
Stocker, A. A., & Simoncelli, E. P. (2006). Noise characteristics and prior expectations in human visual speed perception. Nature Neuroscience, 9(4), 578–585.
Stone, J. V. (2011). Footprints sticking out of the sand (Part II): Children's Bayesian priors for shape and lighting direction. Perception, 40(2), 175–190.
Yang, Z., & Purves, D. (2004). The statistical structure of natural light patterns determines perceived light intensity. Proceedings of the National Academy of Sciences of the United States of America, 101(23), 8745–8750.
