The Perception-Cognition Border: A Case for Architectural Division*

E. J. Green, MIT

Penultimate draft (forthcoming, The Philosophical Review)

Abstract: A venerable view holds that a border between perception and cognition is built into our cognitive architecture, and that this imposes limits on the way information can flow between them. While the deliverances of perception are freely available for use in reasoning and inference, there are strict constraints on information flow in the opposite direction. Despite its plausibility, this approach to the perception-cognition border has faced criticism in recent years. This paper develops an updated version of the architectural approach, which I call the dimension restriction hypothesis (DRH). According to DRH, perceptual processes are constrained to compute over a bounded range of dimensions, while cognitive processes are not. This view allows that perception is cognitively penetrable, but places strict limits on the varieties of penetration that can occur. I argue that DRH enjoys both theoretical and empirical support. I also defend the view against several objections.

1. Introduction

Many philosophical debates presuppose that there is a border between perception and cognition.

This assumption is salient in discussions of the epistemic role of perceptual experience, the rich/thin debate about perceptual content, and the role of perception in fixing demonstrative reference. But is the assumption correct? If so, how should the perception-cognition border be characterized?

A venerable view holds that the perception-cognition border is marked by constraints on the way information flows between them. On this approach, the mind's cognitive architecture contains a stable information-processing boundary between perceptual processes and cognitive processes. This boundary ensures that while the outputs of perceptual processing are freely available for use in cognition, there are strict constraints on information flow in the opposite direction (Fodor 1983, 2000; Pylyshyn 1999). Call this the architectural approach to the perception-cognition border.1

Recently it has been suggested that the architectural approach is out of date and in need of replacement. Clark (2013), for instance, claims that pervasive top-down effects render the border between perception and cognition "fuzzy, perhaps even vanishing" (190). Some have even proposed eliminating the perception/cognition distinction from our theoretical vocabulary (Clark 2013; Lupyan 2015b, 2017; Shea 2014). Others have instead opted for non-architectural theories of the perception-cognition border. Block (forthcoming) suggests that the perception-cognition border should be drawn by appeal to differences in representational format. Beck (2018) and Phillips (2017) suggest that perceptual states, unlike cognitive states, function to be stimulus-dependent. Ogilvie and Carruthers (2015) propose replacing the architectural approach with a neurofunctional characterization on which the visual system consists of "the set of brain-mechanisms specialized for the processing of signals originating from the retina" (722).

* Thanks to Alex Byrne, John Morrison, Tyler Wilson, and two reviewers for this journal for helpful comments on earlier drafts of this paper. I am also grateful to Jonathan Cohen, Steven Gross, Eric Mandelbaum, and Jake Quilty-Dunn for extended discussion of these issues. Portions of this material were presented at the 2018 Eastern APA Meeting, the University of Illinois-Chicago, and Princeton University. I thank the audiences on these occasions for their valuable feedback.

1 See also Quilty-Dunn (2018) for this terminology.

In contrast to these views, this paper develops and defends an updated version of the architectural approach. I’ll argue that we can take seriously the abundance of evidence in favor of top-down effects on perception while also preserving the insight that there is a deep architectural boundary between perceptual and cognitive processes.

The most familiar version of the architectural approach holds that perception is cognitively impenetrable (Pylyshyn 1999). Roughly, perceptual processing is immune from influences of the agent's beliefs, desires, or intentions.2 A number of challenges to this view (and, in my opinion, the most empirically secure among them) have relied on pre-cueing effects, in which instructions to expect or covertly attend to an upcoming stimulus alter perceptual processing of that stimulus in a way that makes sense given the instructions.3 Thus, cues to expect or attend to specific colors (Störmer & Alvarez 2014), orientations (Kok et al. 2012), motion directions (Serences & Boynton 2007), and shapes (Stokes et al. 2009) result in systematic changes to the responses of visual brain areas that process information about these dimensions.

2 The cognitive impenetrability thesis is obviously related to the view that perceptual systems are modular while cognitive systems are not (Fodor 1983). However, because modularity incorporates further characteristics beyond architectural constraints on information flow (e.g., domain-specificity, innateness, and dedicated neural substrate), I'll restrict my attention to the impenetrability thesis.

3 For challenges of this sort, see Block (2016), Clark (2016), Lupyan (2015a, 2017), Masrour et al. (2015), Ogilvie and Carruthers (2015), Teufel and Nanay (2017), and Vetter and Newen (2014).

There is a lively debate about whether such effects constitute cognitive penetration. (The issue is subtle, and I’ll consider it in detail in section 5.) Suppose for now, however, that they do. Is the architectural approach to the perception-cognition border thereby defunct? In what follows, I’ll argue that we can preserve the architectural approach while permitting constrained forms of cognitive penetration. On the account I’ll offer, perceptual processes are marked by dimension restriction, while cognitive processes are not. Individual perceptual processes are constrained to compute over a bounded range of dimensions, and this range cannot be modified through cognitive influence. For example, a perceptual process might be constrained to compute only over the dimensions of brightness, orientation, and motion. If so, then the agent cannot change this fact, no matter her desires or beliefs. My aims, then, are threefold: (i) to articulate the dimension restriction hypothesis in more detail, (ii) to defend the hypothesis against several pressing challenges, and (iii) to argue that the view has advantages over its main rivals.

The structure of the paper is as follows. Section 2 presents the dimension restriction hypothesis (henceforth DRH) and argues that the view preserves a key idea motivating earlier versions of the architectural approach—namely, that cognition is isotropic, while perception is not.

Section 3 considers and replies to several challenges to DRH. Section 4 argues that DRH enjoys empirical advantages over competing views that seek to eliminate information-flow constraints at the perception-cognition border. Section 5 contrasts DRH with cognitive impenetrability, and argues that while pre-cueing effects are compatible with dimension restriction, they raise problems for impenetrability that cannot be finessed through standard maneuvers. Section 6 concludes.

Three qualifications before we get started. First, I am concerned to characterize the border between perception and cognition. I won't be offering necessary and sufficient conditions for being perceptual (or cognitive). As Beck (2018, 322) observes, these are different projects. An adequate theory of the perception-cognition border needs to provide criteria for determining whether a state is perceptual or cognitive provided that we are antecedently confident that it is one or the other. I suggest that dimension restriction plausibly plays this role, but I don't claim that it is sufficient for being perceptual, since there may be systems that are neither perceptual nor cognitive that also exhibit dimension restriction (e.g., specialized motor or affective systems).

Second, my defense of DRH will center on examples drawn from vision. Nonetheless, it is my hope that the view will generalize to the other senses. Moreover, if the view is right, then it should also characterize multisensory processes like those responsible for the ventriloquism effect and other cross-modal illusions. I don’t doubt that these cases present unique theoretical challenges, but their exploration must be left for another time.

Third, I allow that there may be some borderline cases of processes that are neither determinately perceptual nor determinately cognitive. But surely almost every psychological category has borderline cases, so why should perception and cognition be any different? The real challenge is to significantly limit the range of borderline cases so that the perception-cognition distinction remains theoretically important and useful. The border can’t be so fuzzy that it vanishes. I aim to supply a view that delivers this result.

2. The Dimension Restriction Hypothesis

2.1. Selection, Modulation, and Enrichment

Potential cognitive influences on perception can be classified according to the differences that they make to perceptual processing.4 I’ll distinguish among three kinds of differences that cognition might make to perception. I’ll call these selection, modulation, and enrichment effects.

4 I’ll understand cognitive states to include both doxastic states (beliefs and expectations) and conative states (desires, intentions, goals).

A selection effect makes a difference to which information—objects, features, or relations—get passed to a perceptual process. More precisely, cognition exerts a selection effect on perceptual process P if a cognitive state influences which information (including information computed by other perceptual processes) serves as input to P. Selection effects are central to Wayne Wu's (2017) recent arguments for cognitive penetration. Wu highlights examples in which intentions to saccade to a particular target result in information concerning that target being passed to the visual operations that drive eye movements (Chelazzi et al. 2001). He describes the effect as follows: "Attention…serves as a gate within the visual system, filtering one object for further processing—namely, that object on which the subject intends to act" (22). Other cases of cognitively induced selection may include intentionally prioritizing objects for visual tracking (Scholl 2009) or maintenance in visual short-term memory (Luck & Vogel 1997; Woodman & Vogel 2008).

A modulation effect makes a difference to the perceptual representation of feature values along a fixed dimension or set of dimensions (e.g., color, size, distance, contrast, etc.) that a perceptual process already has at its disposal. More precisely, cognition exerts a modulation effect on perception at time t if (i) a perceptual process P can already compute over dimensions D1…Dn prior to t, and (ii) at t, a cognitive state influences the value or range of values that P computes over for one or more of D1…Dn.

I won't try to give sufficient conditions for the grouping of features into a dimension. However, I'll assume the following necessary condition: If an object determinately has one value along a dimension, then, as a matter of empirical necessity, it can't determinately have a distinct value along the same dimension. Thus, features along the same dimension must empirically exclude one another. If an object is determinately red, it can't be determinately green, and so on. But this is only a necessary condition. Two features may empirically exclude one another without belonging to the same dimension. Being a human being empirically excludes being a triangle, but these properties do not fall on the same dimension.

Most of the dimensions I'll discuss are quantitative. By this, I mean that values along the dimension can be naturally ordered, and degrees of difference among them can be measured numerically (Eddon 2013). Most low-level visible dimensions, like size, orientation, brightness, speed, and saturation, are quantitative, and there is strong reason to believe that the visual system is sensitive to their ordering and closeness relations. I also allow for categorical dimensions—dimensions that admit of multiple values that empirically exclude one another, but whose values can't be compared quantitatively. One example might be the dimension facial expression. This dimension has multiple values (sad face, happy face, etc.), but these may not be naturally orderable and distances among them may not be numerically measurable.5 Some might find this permissive use of "dimension" unnatural and will prefer to substitute a different term (e.g., "determinable," "variable," or "genus"). If so, I have no objection.6

I've said that a modulation effect alters the perceptual representation of feature values along a fixed dimension. For categorical dimensions, this would affect which category along the dimension an object is perceptually represented as falling under (e.g., angry face vs. fearful face). For quantitative dimensions, on the other hand, there are two kinds of modulation effects. One involves altering the determinacy of perceptually represented values along a dimension. If cognition exerts this sort of influence, then it alters the range of feature values that a perceptual process represents, and perhaps also probability assignments in this range, thereby altering precision. The second type involves altering the magnitude of perceptually represented values along a dimension. For example, cognition might alter the intensity of perceived brightness or contrast.

5 Two qualifications. First, it shouldn't be taken for granted that the facial expression dimension is non-quantitative. It is possible that perception is sensitive to a measurable numerical ordering of faces from fearful to angry—a face space. Webster and MacLeod (2011) argue that vision encodes facial properties in terms of their numerical distance from a neutral face (e.g., a face halfway between angry and fearful). If so, then perceptible facial properties may fall along quantitative dimensions after all. Second, even when dimensions only admit of discrete values, they often possess some basic geometrical structure. Specifically, some categorical values may naturally fall between others, and some pairs of categorical values may be closer together (more similar) than others (see Gärdenfors (2000: 7-17) for discussion).

6 Perceptible dimensions have received extensive attention in quality space theory (see Clark 1993; Rosenthal 2010). The central idea in this literature is that perceptible dimensions form a multidimensional quality space, and that distances in this space underlie relations of phenomenal perceptual similarity and discriminability. The ultimate goal is usually to explain facts about perceptual phenomenal character in non-qualitative, structural terms.

Voluntarily directed feature-based attention (FBA) provides evidence for cognitively induced modulation, and is central to some of Ned Block’s (2016; forthcoming) recent arguments for cognitive penetration. Here is one example. Ling et al. (2009) had participants perform a motion direction discrimination task in which they saw a group of dots with a coherent direction of global motion, and were asked to report whether the dots were moving clockwise or counterclockwise relative to a reference direction. Ling et al. compared participants’ discrimination performance under two conditions: an attention condition, where a specific direction was cued prior to the stimulus presentation, and a neutral condition, where there was no such cue. They found significantly better discrimination performance in the former case. Ling et al. argue that cognitively directed FBA produced a sharpened or more precise visual representation of motion direction, enabling better discrimination ability.7 If this is correct, then cognitive states can, via FBA, modulate visual processes that encode motion direction.
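To illustrate the kind of sharpening at issue, here is a minimal sketch of the feature-similarity gain mechanism mentioned in note 7. The code is my own toy model with invented tuning parameters; it is not drawn from Ling et al. or from this paper.

```python
import numpy as np

# Toy model: a population of direction-tuned units with Gaussian tuning curves.
# Feature-based attention multiplies each unit's response by a gain that grows
# with the unit's similarity to the attended direction (feature-similarity gain).

preferred = np.linspace(0, 180, 37)   # preferred directions (deg) of the units

def responses(stimulus, attended=None, width=30.0, gain_strength=0.3):
    r = np.exp(-0.5 * ((preferred - stimulus) / width) ** 2)        # baseline tuning
    if attended is not None:
        similarity = np.exp(-0.5 * ((preferred - attended) / width) ** 2)
        r = r * (1 + gain_strength * (2 * similarity - 1))          # boost similar, suppress dissimilar
    return r

def spread(r):
    # Width of the normalized population profile; a narrower profile models a
    # more precise (determinate) representation of motion direction.
    w = r / r.sum()
    mean = (w * preferred).sum()
    return np.sqrt((w * (preferred - mean) ** 2).sum())

print(spread(responses(90.0)), spread(responses(90.0, attended=90.0)))
# The attended profile is narrower: a modulation of values along a dimension the
# process already computes over, not the addition of a new dimension.
```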

While modulation alters the values that a perceptual process represents for a fixed set of dimensions, an enrichment effect would instead alter the dimensions a perceptual process can represent. More precisely, cognition exerts an enrichment effect on perception if (i) prior to time t, a perceptual process P can compute only over dimensions D1…Dn, and (ii) at t, a cognitive state enables P to compute over a value or range of values for dimension Di ∉ D1…Dn. Suppose, for example, that a visual process starts out having only the dimensions of size and orientation at its disposal, but a desire causes it to represent a color. If so, then cognition has enriched the representational resources of the process by expanding the range of dimensions it can compute over.

7 Such effects may be explained by the feature-similarity gain principle, according to which FBA amplifies the responses of visual neurons tuned to the attended feature while suppressing the responses of neurons tuned to unattended features (Martinez-Trujillo & Treue 2004; Treue 2015). The result is a sharpened population-level response. This is plausibly an effect on the contents represented by certain visual processes. The representation of motion direction across a visual neural population becomes more precise or determinate as a result of the person's choice of which feature to attend to.

Certain arguments for rich content in perception appeal to forms of cognitive penetration that would seem to require enrichment. For example, Siegel (2010) proposes that cognitive recognitional capacities can infuse perception with representational resources that it did not possess before, and she suggests (2010, 10-11) that the most plausible story about how this might happen involves cognitive penetration. In the most familiar example, a subject acquires a cognitive ability to recognize pine trees by sight, which in turn expands the representational repertoire of perception to include the property of being a pine tree. On the assumption that pine tree is not a value along any dimension that was perceptually representable prior to cognitive penetration, this would qualify as an enrichment effect.

I don’t claim that the three-way taxonomy of selection, modulation, and enrichment is exhaustive, but I think it is comprehensive enough to include most of the effects that cognition has been claimed to exert on perception.

2.2. Dimension Restriction

The primary claims of the dimension restriction hypothesis (henceforth DRH) are, first, that cognition cannot enrich the class of dimensions that particular perceptual processes can compute over—i.e., perceptual processes are dimensionally restricted—and, second, that dimension restriction holds of perceptual processes but not of cognitive processes.

DRH encompasses a more general view of the style of computational processing employed in perception. Let's say that a system S exhibits dimension restriction just in case there is an analysis A of S into a set of processes P1…Pn, such that:

(1) A is a functional analysis of the sort found in contemporary cognitive science. In particular, A is both natural and appropriately fine-grained.

(2) Cognitive architecture constrains each of P1…Pn to compute over and output values for a bounded class of dimensions.

(3) Cognition cannot modify these constraints through any internal psychological process. That is, the class of dimensions that any Pi ∈ P1…Pn can compute over is fixed by the agent's cognitive architecture, and cognitive states like beliefs, desires, and intentions cannot expand or reduce this range through any internal psychological process.

DRH says that perception exhibits dimension restriction, while cognition does not. Individual perceptual processes are architecturally constrained to compute over a bounded class of dimensions, and cognition cannot change this fact through any internal mechanism. Cognitive processes, on the other hand, are not architecturally constrained to compute over a bounded class of dimensions. They do not satisfy (2). Since (3) presupposes that there are constraints of the sort specified in (2), they don’t meet (3) either. Accordingly, there are constraints on information flow from cognition to perception that do not apply within cognition, and this grounds an architectural boundary between perception and cognition.

Four clarifications are in order. First, when I speak of "perceptual processes," I intend to pick out standing, repeatable computational capacities of perceptual systems, rather than particular exercises of these capacities (compare Schellenberg 2018, ch. 2). The same process (e.g., edge detection) can be executed over and over again. Functional analyses partition perceptual capacities into subcapacities (Marr 1982; Cummins 1983). Each of these subcapacities is a process that computes a function from inputs to outputs, and the outputs of some of these processes serve as inputs to others. Outputs of edge-detection serve as inputs to figure-ground organization, and outputs of brightness-detection serve as inputs to motion-detection. DRH is thus, in the first instance, a theory of the difference between perceptual processes and cognitive processes. The programmed execution of many perceptual processes underlies exercises of our capacity to perceive. I don't hold that the execution of a perceptual process is sufficient for perception, understood as a personal-level psychological state (e.g., Phillips 2018). Rather, the idea is that perception results from the operation of perceptual processes, which are marked by dimension restriction. Perception occurs roughly when these processes generate psychological states attributable to the whole person.

Second, when I require that the analysis be natural, I mean that it must not involve any arbitrary groupings. For instance, the analysis should not churn out a process consisting of the visual computation of color and orientation properties, but nothing else. These are distinct computational tasks that cognitive scientists model independently. There may be a superordinate perceptual process that includes both color and orientation processing, but there is, as far as we know, no superordinate process that includes only color and orientation processing. Natural candidates for the analysis would include contour integration (Geisler et al. 2001), shape parsing (Singh & Hoffman 2001), and computing motion from low-level luminance signals (Lu & Sperling 2001). Note, however, that while the analysis must be natural, I do not require it to be uniquely privileged. I doubt whether there is a single "right" way to divide up perception. Nonetheless, surely there are wrong ways to divide it up. (1) is intended to rule these out.

Third, when I require that the analysis be appropriately fine-grained, I mean that it should be as fine-grained as our best cognitive science suggests, but no more. For example, cognitive scientists treat ventral vision (the “what”-stream) and dorsal vision (the “where”-stream) as distinct visual processes, but they also make much finer-grained distinctions within these categories. Thus, a functional analysis that bottomed out with the ventral/dorsal division would not faithfully reflect what cognitive science tells us about the computational structure of the visual system.8

8 The fine-grainedness requirement is important for another reason. Suppose that cognition directs an early visual process (e.g., edge detection) to represent a categorical dimension like animate/inanimate, which it was unable to represent beforehand. Moreover, suppose that later visual processes can represent this dimension even in the absence of cognitive intervention. In this case, cognition enriches a particular visual process, even though it doesn’t enrich vision as a whole. I intend DRH to rule out this type of influence. The way to do so is to require that legitimate analyses be fine-grained.

Fourth, I do not claim that the range of dimensions a given perceptual process can compute over is innately fixed or immutable over time. The claim is rather that cognition cannot alter this range through an internal psychological process. It is possible, for instance, that perceptual learning can expand the class of dimensions that a perceptual process can compute over (e.g., Schyns et al. 1998), and it is also possible that the direction of perceptual learning is to some extent under cognitive control. But perceptual learning depends on having the right series of external inputs to a perceptual system. It is not mediated solely by an internal psychological process.9 Similar remarks hold, I suggest, for any changes in perceptible dimensions that occur either during normal development or upon the restoration of vision later in life.

2.3. Further Clarifying DRH

According to DRH, perceptual processes are architecturally constrained to compute over a bounded range of dimensions, and cognition cannot modify these constraints. But what exactly does it mean for perceptual processes to be constrained in this way?

To understand and evaluate DRH, it is important to distinguish three questions. First, there is the question of which dimensions a psychological process does compute over on a particular occasion. Second, there is the question of which dimensions the process can compute over at a given time, given the agent's cognitive architecture at that time. And third, there is the question of what dimensions the process could compute over if the agent's cognitive architecture were modified in the right sort of way (e.g., through the right perceptual learning regimen). When I speak of restrictions on the dimensions a process can compute over, I have in mind restrictions of the second type—restrictions on the dimensions a process can compute over, holding cognitive architecture fixed.

This distinction is subtle, but an example should help to clarify it. Suppose that we have an assembly line of sorting mechanisms that takes physical objects as input (fig. 1). The first mechanism is a size sorter. It divides inputs into two categories: big and small. Now suppose that, for whatever reason, big and small things are relevant for different purposes. Given how they are used, we need to know the temperature (but not weight) of the big things and the weight (but not temperature) of the small things. So the big things are fed into a temperature sorter, while the small things are fed into a weight sorter. The temperature sorter divides its inputs into cold and hot, while the weight sorter divides its inputs into heavy and light. Thus, the assembly line has four possible outputs: big+cold, big+hot, small+heavy, and small+light.

9 It may also be possible through extended training to expand the class of dimensions that a process takes as input from other processes. This sort of effect has been observed in multimodal perception (Ernst 2007).

Insert figure 1 about here

Suppose that on a particular occasion the assembly line receives a small heavy thing as input.

The object will be classified as small by the size sorter, and then sent to the weight sorter, which will produce the small+heavy output. Now consider the question of which dimensions the assembly line does take into account on this occasion. The answer is clear: It classifies its input by size and weight alone. The temperature sorter simply isn’t activated. But we can also ask a second question: Given how the assembly line is currently structured (i.e., the constraints imposed by its present architecture), which dimensions can it take into account? Now the answer is different: It can classify objects according to size, weight, and temperature. (It just doesn’t do all three at the same time.)

Finally, we might want to ask a third question: Which dimensions could the assembly line take into account if we allowed its architecture to be freely modified? This question is difficult to answer, but at any rate, it is clearly a different question from the first two.

We can ask the same three questions about processes within the assembly line. Consider the temperature sorter. When the assembly line receives a small object as input, what dimensions does the temperature sorter take into account? None. The process is simply not activated. But it nevertheless can classify objects by temperature given the way the system is structured. Holding the system's internal architecture fixed, it would have done so had the initial input been big. Again, heaven knows which dimensions the temperature sorter could process if we allowed it to be freely modified, but the question is clearly different from the first two.
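To make the toy example concrete, here is a minimal sketch of the assembly line in code. It is purely illustrative (the thresholds and dimension names are my own); the point is only that each sorter, and the routing between sorters, is fixed by the system's architecture.

```python
# Each sorter is a process restricted to a single dimension; the routing between
# sorters is fixed. Thresholds and units are arbitrary illustrative choices.

def size_sorter(obj):
    return "big" if obj["size"] > 10 else "small"

def temperature_sorter(obj):
    return "hot" if obj["temperature"] > 50 else "cold"

def weight_sorter(obj):
    return "heavy" if obj["weight"] > 5 else "light"

def assembly_line(obj):
    size = size_sorter(obj)
    # Fixed architecture: big things go to the temperature sorter, small things
    # to the weight sorter. No sorter can be handed a new dimension on the fly.
    if size == "big":
        return size + "+" + temperature_sorter(obj)
    return size + "+" + weight_sorter(obj)

print(assembly_line({"size": 3, "weight": 8, "temperature": 70}))   # -> small+heavy
# On this run the line computes over size and weight only (the temperature sorter
# is never activated), yet holding the architecture fixed it can compute over
# temperature: it would have, had the input been big. What it cannot do is acquire
# a new dimension (say, color) through any directive internal to the system.
```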

I have emphasized the distinction among these three questions because it will be important when we evaluate potential challenges to DRH in section 3. The key point is this: DRH requires constraints on which dimensions a process can compute over, holding fixed the facts about cognitive architecture. The assembly line exhibits such constraints. So, I argue, does perception. DRH is not a view about the dimensions a process does compute over on a particular occasion, nor does it impose specific restrictions on how the architecture of perception might be modified over time.

Now let’s apply DRH to a real-world example. Consider the process of computing shape from shading, illustrated in fig. 2 (see Mamassian et al. 2002; Rescorla 2015). Suppose the process works as follows. The visual system encodes prior probabilities for a class of convexity values and a class of light source locations. It also encodes a likelihood function specifying how likely a particular pattern of retinal illumination is for each {convexity, source location} combination. From these, it computes a posterior distribution over possible convexities, given a particular retinal illumination, and selects a particular value from this distribution according to a decision rule (e.g., selecting the mode or mean of the posterior).10 The upshot is that certain distributions of illumination cause a percept of convex dots, while others cause a percept of concave dots.

Insert figure 2 about here

10 The example is meant only to be illustrative. I take no stand on whether a strongly realist interpretation of Bayesian models is correct (see Block 2018).
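As a rough illustration of the computation just described, here is a sketch in code. The priors, likelihoods, and the coarse two-value discretization are invented for the illustration; they are not drawn from Mamassian et al. (2002), and (per note 10) nothing here presupposes a realist reading of Bayesian models.

```python
# Illustrative shape-from-shading computation: a posterior over convexity given an
# illumination pattern, marginalizing over light-source location. All numbers are
# invented for the sketch.

convexities = ["convex", "concave"]
light_sources = ["above", "below"]

prior_convexity = {"convex": 0.6, "concave": 0.4}   # prior over convexity values
prior_light = {"above": 0.8, "below": 0.2}          # "light-from-above" prior

# Likelihood of each illumination pattern for each {convexity, source location} pair.
likelihood = {
    ("convex", "above"):  {"top-bright": 0.9, "bottom-bright": 0.1},
    ("convex", "below"):  {"top-bright": 0.2, "bottom-bright": 0.8},
    ("concave", "above"): {"top-bright": 0.1, "bottom-bright": 0.9},
    ("concave", "below"): {"top-bright": 0.8, "bottom-bright": 0.2},
}

def posterior_convexity(illumination):
    unnormalized = {
        c: sum(prior_convexity[c] * prior_light[s] * likelihood[(c, s)][illumination]
               for s in light_sources)
        for c in convexities
    }
    z = sum(unnormalized.values())
    return {c: v / z for c, v in unnormalized.items()}

post = posterior_convexity("top-bright")
percept = max(post, key=post.get)      # decision rule: select the most probable value
print(post, "->", percept)
# The process computes over convexity, light-source location, and illumination, and
# nothing else: there is no term for, e.g., the property of being in Beatrice's office.
```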

DRH suggests that the shape-from-shading process can only compute over these dimensions (convexity, source location, and illumination), and no others. Suppose, for example, that Beatrice believes that it is highly likely that she will encounter convex shapes when she is in her office, and believes herself to be in her office. If she could, at will, modify the shape-from-shading process by adding a new term specifying the likelihood of convexity conditional on being located in her office, this would be a form of enrichment. DRH precludes this kind of effect. It is a fact about Beatrice's cognitive architecture that the shape-from-shading process can compute over the dimensions of convexity, light source location, retinal illumination, and these alone. Because the property being in Beatrice's office is not a value along any of these dimensions, her shape-from-shading process can't compute over it.

I believe that DRH fits comfortably with standard practice in vision science. Vision scientists routinely individuate perceptual processes according to the dimensions to which they are sensitive. Consider, for instance, visual motion perception. Motion perception appears to involve at least three functionally separable processes: first-order, second-order, and third-order motion (Lu & Sperling 2001; Nishida 2011). Importantly, the main reason this functional division was proposed is that each process is sensitive to a different range of cues or dimensions. Roughly, the first-order motion system computes motion from absolute luminance signals, the second-order motion system instead uses contrast, and the third-order motion system uses overall salience.11

It is not trivial to claim that perceptual processes are dimensionally restricted. This is because it is easy to think of psychological processes that are not dimensionally restricted. Consider, for instance, a process dedicated to performing modus ponens (Fodor 2000, 60-62). The process responds to any inputs of the form {(P → Q), P} and produces Q as output. While the process is sensitive only to inputs that meet these formal parameters, the process is not dimensionally restricted. "P" and "Q" may involve any dimensions whatsoever. Thus, DRH is not trivially true.12

11 To stave off a potential worry: "Overall salience" itself—at least as indexed by visual search—appears to be determined by a restricted range of dimensions (Wolfe 2015), so third-order motion is not a problem for DRH.
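By way of contrast with the dimensionally restricted sorters above, a toy modus ponens process (my own sketch) is restricted only by logical form; its inputs may concern any dimensions whatsoever.

```python
# A process restricted by form, not by dimension: given "if P then Q" and P, output Q.

def modus_ponens(conditional, antecedent):
    p, q = conditional          # conditional is a pair (P, Q) representing "if P then Q"
    return q if antecedent == p else None

print(modus_ponens(("it is raining", "the streets are wet"), "it is raining"))
print(modus_ponens(("the dots are convex", "the light source is above"), "the dots are convex"))
# Nothing in the process's architecture bounds the dimensions that P and Q concern.
```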

DRH is also not equivalent to the claim that perception can only represent perceptible dimensions. We can distinguish two notions of “perceptible dimension”. A relativized notion would classify dimensions as perceptible or imperceptible relative to a perceiver at a time. Roughly, D is perceptible for S at t just in case S is able to perceptually represent values along D at t, given the architecture of her perceptual system. A non-relativized notion would classify dimensions as perceptible or imperceptible full stop. D is perceptible just in case it’s possible for some agent, at some time, to perceptually represent values along D.

DRH says that there are constraints on the dimensions representable by particular perceptual processes of particular agents at particular times. Since it is not a view about what could be perceptually represented in principle, it is not equivalent to the claim that perception can only represent perceptible properties in the non-relativized sense. Indeed, it’s consistent with DRH that there are no well-defined limits on the dimensions that could be perceptually represented by some agent at some time (e.g., given the right evolutionary or ontogenetic history).

DRH is also not equivalent to the claim that perception can only represent perceptible dimensions in the relativized sense. DRH says that particular perceptual processes are dimensionally restricted. If so, then many perceptual processes should be unable to represent dimensions that other perceptual processes can represent. I discuss examples of this sort in section 4, where I argue that processes involved in visual search are unable to represent certain features like cross-shaped intersections and T-junctions. These features are perceptible in the relativized sense, but not all perceptual processes can represent them. Thus, DRH does not merely place constraints on the dimensions that perception as a whole can represent (for an agent at a time). It constrains the dimensions that particular processes within perception can represent.

12 There are important parallels between DRH and Burnston and Cohen's (2015) recent proposal that psychological modules should be understood as functionally separable systems that respond only to a delimitable range of inputs. However, the two are not equivalent for two reasons. First, Burnston and Cohen do not put forward their position as a theory of the perception-cognition border, instead leaving open the question of whether there are perceptual mechanisms that are not modular in their sense. Second, DRH is stronger than the claim that perceptual processes respond to a delimitable range of inputs. It is a specific claim about how this range should be delimited. The hypothetical modus ponens system only responds to inputs meeting certain formal parameters, and in this sense its possible inputs can be delimited (Sperber 2002; Barrett & Kurzban 2006), but it is not dimensionally restricted.

Thus, DRH isn’t equivalent to the claim that we can only perceive what is perceptible. This shouldn’t be so surprising. After all, we can only cognize what is cognizable, but this doesn’t entail that cognitive processes are dimensionally restricted. DRH is an empirical claim about the style of computational processing employed in perception. Individual perceptual processes are subject to architectural constraints on the dimensions that they can compute over. As a result, they can be affected by outside systems only in highly constrained ways.

Before moving on, I should also clarify the relation between DRH and the rich/thin debate about perceptual content. Rich theorists claim that perception represents both low-level properties (e.g., color, shape, and motion) and high-level properties (e.g., causation and natural kinds), while thin theorists claim that perception represents only low-level properties (see Siegel & Byrne 2016 for a debate). In section 3, I'll consider whether beliefs about high-level properties can enrich perception. I'll argue for a negative answer. For now, however, I want to emphasize that it is no commitment of DRH that perceptual content is thin. DRH claims only that if perceptual content is rich, then it does not become rich through enrichment. This is consistent with rich contents being conferred through either hardwired mechanisms or diachronic perceptual learning. But what should a proponent of DRH say about the pine tree case mentioned earlier? There are two options.13 One could either hold that the pine tree expert comes to perceptually represent the property pine tree through diachronic perceptual learning, or hold that she does not represent the property at all, and instead gets better at attending to clusters of values along dimensions that were already representable by her perceptual system. I'll remain neutral between these options.

13 Three options, if we include the option that these cases involve no perceptual change at all.

2.4. Dimension Restriction and Isotropy

I suggest that DRH, though weaker than full-fledged impenetrability, accommodates many of the principal insights that made the impenetrability thesis plausible. Most importantly, the view entails that perceptual processes are non-isotropic. If, as many have argued, cognitive processes are isotropic, then there is a key computational difference between perception and cognition.

To say that a psychological process is isotropic is to say that, in principle, it has access to any of one's beliefs, desires, intentions, and so on during normal functioning (Fodor 1983; Fodor 2008, 115). The clearest example would be a process that can initiate an unbounded search through long-term memory. Fodor (1983, 105-119) argued that cognitive processes must be isotropic if we are to explain the holistic character of belief fixation: When forming a new belief, your process of deliberation must in principle have access to anything you know. With proper time and motivation, there are no limits on the range of background knowledge that your deliberation may call to mind.

The isotropic character of cognition is perhaps most evident in our ability to comprehend metaphor. Thus, Chiappe (2000) writes:

[T]he human capacity for metaphor…seems to require a mechanism capable of full or massive integration… [M]etaphors involve bringing together concepts from semantically distant domains, as when we compare crime to disease, evolution to a lottery, lawyers to sharks, education to stairways, rage to a volcano, and so on. Indeed, there does not seem to be a limit to the domains that we can bring together to create metaphors. (151-152)

Metaphor underscores our capacity for quick integration of knowledge from widely disparate domains. This capacity speaks strongly in favor of isotropy (for other arguments in support of isotropy, see Currie & Sterelny 2000, 149-150; Samuels 2006, 47-48). As I'll understand it, the claim that cognition is isotropic is not a normative claim about the standards of good reasoning. It is not the claim that to reason well, you should have access to practically anything you know. Rather, it is a descriptive claim about accessibility in practice (cf. Fodor 1983, 105). Cognitive architecture places no fixed constraints on the range of information that you can access during normal reasoning.

Fodor argued that perceptual processes are not like this. They lack freewheeling access to our background beliefs, and they do not search laboriously through long-term memory during normal functioning, no matter how much time you give them to operate. I believe that this insight remains powerful.

If DRH is right, then perceptual processes must be non-isotropic. The reason is that the vast majority of information in long-term memory would be unusable for a given perceptual process. The only cognitive states that can potentially affect a given perceptual process P are those that (a) have implications for the set of dimensions that P can compute over, which (b) the agent knows about.

Thus, DRH entails what many have construed as a key computational difference between perceptual and cognitive processes.

Firestone and Scholl (2016a), in their recent defense of cognitive impenetrability, have also touted the non-isotropy of vision science models:

Today’s vision science has essentially worked out how low-level complex motion is perceived and processed by the brain, with elegant models of such processes accounting for extraordinary proportions of variance in motion processing – and this success has come without factoring in morality, hunger, or language (etc.). Similarly, such factors are entirely missing from contemporary vision science textbooks. (2)

Firestone and Scholl go on to suggest that if visual processes do have access to information about morality, hunger, and language, then standard models in vision science are “scandalously incomplete” (3). In other words, perception science has been successful, and this success has come in part by treating perceptual processes as non-isotropic.

If DRH is correct, then there is a good reason why successful models in vision science cast perceptual processes as non-isotropic. Outside factors influence perceptual processing only by selecting or modulating estimates for the fixed range of variables that successful computational models already include.14 So if DRH is right, then these models are not "scandalously" incomplete, because they don't omit any terms that they ought to include.

Thus, DRH entails that perception is non-isotropic. Assuming that cognition is isotropic, DRH preserves Fodor's insight that there are important computational differences between perception and cognition. Of course, this motivation for DRH requires accepting that Fodor was right that cognitive processes are isotropic. One might worry that if a massively modular view of the mind is right, then this undercuts the motivation for DRH. In section 3.3, however, I'll argue that DRH remains a plausible way to draw the perception-cognition border even if Fodor was wrong about cognition in the ways that massive modularists contend.

3. Challenges to Dimension Restriction

There are two main ways to challenge DRH. On the perception side, one might argue that some or all perceptual processes are not dimensionally restricted. On the cognition side, one might argue that some or all cognitive processes are dimensionally restricted. I discuss perception-side challenges in sections 3.1 and 3.2. I discuss cognition-side challenges in section 3.3. Section 3.4 addresses a more abstract concern about whether DRH is falsifiable.

3.1. Impoverished and Crowded Displays

Perception-side challenges come in two broad types. First, one might argue that DRH's architectural framework is simply wrongheaded. Perceptual processes are unconstrained in the dimensions that they can represent. If so, then there is no question of cognition modifying these constraints, since there are no constraints to begin with. I'll discuss this unconstrained alternative in section 4 and argue that it faces serious problems, so I delay discussion of the issue until then. A second challenge alleges that there are, in general, constraints on which dimensions perceptual processes can compute over, but our beliefs and desires can sometimes modify these constraints (e.g., when sensory input is degraded or highly ambiguous). I consider versions of the second challenge in this subsection and the next.15

14 We can also think of dimension restriction as imposing implicit conditional independence assumptions. Suppose that a process computes the value of dimension F on the basis of dimensions G and H, and that it is architecturally constrained to compute over these dimensions alone. Then the process in effect assumes that, conditional on the values of G and H, the value of F is probabilistically independent of any other variables. Other variables can thus be ignored.

One recipe for a counterexample to DRH would be this: Before time t, a perceptual process P cannot compute over high-level category F. At t, the subject receives a hint that there is an object in her surroundings belonging to category F. Cognition then directs P to "look" for the F. If P ends up computing over F as a result of this directive, then it has been enriched through an internal psychological mechanism, and DRH is false. I will discuss two types of cases that might be thought to work this way. In this subsection, I consider the influence of high-level hints on object detection in contexts where detection is difficult. For instance, the context might be informationally degraded, very brief, or crowded with distractors. In the next subsection, I'll consider bistable figures.

Insert figures 3a and 3b about here

Figures 3a and 3b contain degraded images of items from familiar categories (see Lupyan 2017 for more examples). A commonplace experience with these images is that the objects are initially difficult to differentiate, but significantly easier following a clue to the object's category. The Dalmatian is easier to see when you are told that one is depicted. Consistent with this, Stein and Peelen (2015) found that category-level hints helped subjects to discriminate low-contrast objects in noisy, degraded displays. This held for a variety of basic-level categories, including people, cats, chairs, cups, and tools. Similar results have been found for superordinate categories. For example, Stein and Peelen (2017) found that when subjects were cued to expect animals, they were better at detecting the presence of animals in masked 33 ms displays (as indexed by the d' detection sensitivity measure). These effects were category-specific at least to some degree. Cues to expect animals improved the detection of animals, but not of vehicles.

15 There is a further wrinkle. On the perception side, I'll only be considering objections that allege that cognition can enrich or expand the dimensions available to a perceptual process. Strictly speaking, however, DRH would also be false if cognition can reduce the range of dimensions accessible to a perceptual process. (Thanks to a reviewer for pointing this out.) Since I am unaware of any cases that have been thought to work this way, I will focus only on enrichment challenges, not reduction challenges.
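For readers unfamiliar with the measure, d' separates detection sensitivity from response bias. Here is a minimal sketch; the hit and false-alarm rates below are invented for illustration, not Stein and Peelen's data.

```python
from scipy.stats import norm

# Signal detection theory: d' = z(hit rate) - z(false-alarm rate).

def d_prime(hit_rate, false_alarm_rate):
    return norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)

cued   = d_prime(hit_rate=0.80, false_alarm_rate=0.20)   # illustrative "animal cue" condition
uncued = d_prime(hit_rate=0.65, false_alarm_rate=0.25)   # illustrative neutral condition
print(round(cued, 2), round(uncued, 2))
# A higher d' in the cued condition indicates greater sensitivity to the presence of
# an animal, over and above any shift in the subject's willingness to say "present".
```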

Other studies reinforce the idea that advance knowledge about an object’s category can aid object detection. Lupyan and Ward (2013) examined this using continuous flash suppression. One eye was shown a series of high-contrast noise patterns on each trial. The other eye was shown a coherent image of an object on 50% of trials, and no object on the remaining trials. Subjects were given either a valid or invalid auditory cue before the stimuli were presented. Valid cues foretold the category of the object to be presented (e.g., “kangaroo” followed by a picture of a kangaroo), while invalid cues did not. The participants were simply asked to report whether they had seen an object.

Lupyan and Ward found that detection was significantly faster and more accurate following valid cues relative to invalid cues (conditions with no cue at all fell in the middle). They concluded that linguistic cues to an object's category can "boost an otherwise unseen object into visual awareness" (see Pinto et al. 2015 for similar results).

Do these results show that beliefs about high-level categories can enrich early perceptual processes involved in object detection? I’ll argue, first, that they surely do not show this, and second, that there is recent evidence against the view that high-level cues exert their effects via enrichment.

To make the case that a perceptual process has been enriched with high-level category F at time t, two things must be shown: (1) that the process cannot compute over F before t; and (2) that it does compute over F at t.

While (1) may be open to doubt,16 I'll grant it for present purposes. It's plausible that visual processes are generally unable to compute over most high-level categories. I contend, however, that the current evidence provides no reason to accept (2). Beliefs about a high-level category F might lead early vision to prioritize F things without causing early vision to compute over F. Such beliefs may instead exert their influence by prioritizing the diagnostic features of F for further analysis.

Diagnostic features of a category are features that raise the probability of category membership. Having four legs raises the probability of being a dog, while having wheels raises the probability of being a car. The idea, then, is that beliefs about F activate tacit knowledge about the diagnostic features of F. Then, perhaps through feature-based attention, diagnostic features of F that are detected in early vision are preferentially passed to perceptual segmentation and grouping processes. This would be a selection effect, not an enrichment effect.
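As a toy illustration of how this might work, the sketch below ranks detectable low-level features by how much they raise the probability of the target category and selects the top-ranked ones for feature-based attention. The probabilities and the weighting scheme are my own invention, not a proposal from the paper or the studies cited.

```python
# Toy diagnostic-feature selection: category knowledge is cashed out as a ranking
# over low-level features; attention then selects the most diagnostic features.
# No perceptual process needs to compute over the category itself.

p_category = 0.2                      # invented baseline probability of "animal"
p_category_given_feature = {          # invented conditional probabilities
    "curvy contour": 0.55,
    "elongated main part": 0.45,
    "high symmetry": 0.40,
    "boxy contour": 0.05,
}

def diagnosticity(feature):
    # How much the feature raises the probability of category membership.
    return p_category_given_feature[feature] / p_category

ranked = sorted(p_category_given_feature, key=diagnosticity, reverse=True)
attended = [f for f in ranked if diagnosticity(f) > 1.0][:2]
print(attended)   # -> ['curvy contour', 'elongated main part']
# Passing these features to segmentation and grouping is a selection effect; the
# early visual process never represents the category animal.
```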

The diagnostic-features approach is empirically plausible. Consider again Stein and Peelen's (2017) finding that cues to expect animals led to improved visual detection of animals. I'll simply assume that this was an effect on perception, and not merely on post-perceptual categorization (although see Firestone & Scholl (2016a, 17) for reservations about this sort of inference). Cognition influenced perceptual processing so that members of the animal category visually popped out, while other objects did not. On its face, this is one of the hardest cases for the diagnostic-features approach. Animals vary widely in shape, size, color, texture, and so on. What features, short of animal itself, could cognition select in order to facilitate detection of animals but not other objects?

16 Some have argued that early vision automatically computes high-level categories like animal, car, and human without cognitive influence (Mandelbaum 2017). If so, then it is false that early visual processes are unable to compute over such categories. Mandelbaum cites work within the rapid serial visual presentation (RSVP) paradigm showing that participants can reliably detect items belonging to a particular basic-level category even with presentation times as short as 13 ms followed by masks (see Potter et al. 2014). He argues that the best interpretation of these results is that basic-level categorization happens during early vision without top-down feedback. However, Mandelbaum’s interpretation of this data is controversial. Block (forthcoming) argues that the RSVP evidence does not show that basic-level categorization is really achieved within 13 ms because further processing may ensue between the removal of the stimulus and the subject’s response. In any case, I’ll remain neutral on whether early vision ever represents high-level categories. I claim, however, that the most plausible account of how high-level hints affect object detection does not presuppose this view.

Some recent studies have shed light on this issue. It turns out that animals do differ fairly systematically from other kinds of objects, like minerals and artifacts. They are, on average, more rounded and symmetric, and they tend to have an obvious 'main part'—e.g., an elongated torso (Schmidt et al. 2017).17 A recent study highlights the importance of diagnostic shape and texture features in animal detection. Long et al. (2017) first used a visual search task to confirm that animals are more quickly detected when surrounded by non-animals (in this case, artifacts) than when surrounded by other animals. Next, the authors manipulated the images used in the first experiment to create 'texforms', which preserve some shape and texture information but are no longer recognizable at the basic level—e.g., as dogs, snakes, etc. (fig. 4). Remarkably, Long et al. found that animals were still easier to find among artifacts than among other animals even when all the images had been reduced to unfamiliar texforms. This result strongly suggests that the original search advantage was due largely if not wholly to differences in the low-level shape or texture features that were preserved in the unfamiliar texforms.18 These are candidates for the kinds of diagnostic features of animals that cognition might select (via feature-based attention) to enable rapid animal detection.

Insert figure 4 about here

This underscores an important point. The diagnostic-features approach can't be ruled out from the armchair. Its assessment demands careful analysis of the distribution of low- and mid-level features throughout the relevant category. Even when a category seems hopelessly heterogeneous, it can turn out that its members typically share relatively simple shape or texture features. If so, then cognition can facilitate detection of the relevant category by attentionally selecting some of these features for further processing. To repeat: This is a selection effect, not an enrichment effect.

17 Of course, these aren't the only diagnostic features that might be used in animal detection. More local features may matter as well. For example, subjects are both faster and more accurate in detecting an animal when at least one eye, a mouth, or four limbs are present in the image (Delorme et al. 2010).

18 Which features are these? Long et al. observed that the animal texforms tended to be "curvy," while the artifact texforms were more "boxy" (see also Levin et al. 2001). In a further analysis they confirmed that subjects were faster to detect an animal texform among artifact texforms when there was a larger curvature difference between them. However, this did not fully account for the search advantage, suggesting that additional features were relevant.

I’ve argued that the diagnostic-features approach provides an empirically viable alternative to the enrichment hypothesis. But we can do more than point to a burden of proof. In fact, there is evidence against the enrichment hypothesis as an account of our ability to rapidly detect high-level category members in crowded or impoverished displays.

If DRH is correct, then the possible effects of high-level category hints are constrained by the fixed representational architecture of perception. Specifically, high-level hints can affect a given perceptual process only by filtering through the architecturally fixed dimensions that the process can compute over. If, on the other hand, cognition enriches perception following a high-level hint, then cognition alters the representational architecture of perception. It expands the range of dimensions that one or more perceptual processes can compute over. Thus, one way to adjudicate the dispute between the hypotheses is to ascertain whether high-level hints exert their influence within the bounds of the preexisting representational architecture of the visual system (i.e., its representational architecture prior to the high-level hint).

In a recent study, Cohen et al. (2017) put this question to the test. Cohen et al. exploited the idea that changes to the representational architecture of perception should be associated with changes in the similarity relations to which perceptual processes are sensitive. If a process is only able to represent the dimensions of brightness, contrast, and orientation, then it will only be sensitive to similarities along those dimensions. Thus, if a dimension—say, size—is added to the process, then the process should become sensitive to new similarities. For instance, two objects that are initially treated as dissimilar due to differences in brightness, contrast, and orientation could come to be treated as similar if they are alike in size. Likewise, if a perceptual process is enriched with high-level category F, then this should selectively enhance the perceptual similarity of Fs beyond what could have been predicted before enrichment. Cairn Terriers and Great Danes are dissimilar in texture, color, size, and most other perceptible properties, but alike as regards doghood. Enrichment with the category dog should thus enhance their perceptual similarity (see also Byrne 2009, 449-450).19

The Cohen et al. study had two stages. In the first stage, participants were placed in an fMRI scanner and sequentially shown items drawn from eight categories, including bodies, buildings, cars, cats, and chairs. From this, they derived measures of cross-category neural similarity: the voxel-by-voxel correlation in visual brain regions between activity in response to category F and activity in response to category G. In the second stage, a different group of participants performed a visual search task in which they were cued to find objects belonging to one category among distractors of one of the other categories (e.g., find the body amidst a set of chairs). All the objects were presented in black-and-white with low contrast, making detection more difficult. Note that if high-level category hints can produce enrichment, then this is the sort of case where we would expect to observe it: Subjects are cued to expect a target belonging to a particular category, and they must differentiate and locate this target as quickly as possible in a context that is crowded and informationally impoverished. Furthermore, visual search provides a good way to gauge perceptual similarity, because it is well known that search for a target becomes progressively more difficult as target/distractor similarity increases (Duncan & Humphreys 1989; Wolfe 2015; see also section 4).
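To make the logic of the comparison concrete, here is a minimal sketch of the kind of analysis involved. The voxel patterns, category pairs, and reaction times below are toy values of my own, not data from the study; the point is only that one can ask whether neural similarity measured without a category hint predicts search difficulty measured with one.

```python
# A toy sketch of the comparison at issue: does neural similarity between two
# categories (measured without a category hint) predict how hard it is to find
# one category among the other (measured with a hint)? All values are invented.
import numpy as np

def neural_similarity(pattern_f, pattern_g):
    """Voxel-by-voxel correlation between responses to categories F and G."""
    return np.corrcoef(pattern_f, pattern_g)[0, 1]

patterns = {"bodies": np.array([1.0, 0.2, 0.5, 0.9]),   # hypothetical voxel responses
            "chairs": np.array([0.3, 0.9, 0.4, 0.1]),
            "cats":   np.array([0.9, 0.3, 0.6, 0.8])}
pairs = [("bodies", "chairs"), ("bodies", "cats"), ("chairs", "cats")]
search_rt = np.array([0.62, 0.95, 0.70])                 # hypothetical search times (s)

similarities = np.array([neural_similarity(patterns[f], patterns[g]) for f, g in pairs])
print(np.corrcoef(similarities, search_rt)[0, 1])        # a positive value would fit the reported result
```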

Cohen et al. found that ease of search for one category among distractors of another category was reliably predicted by the neural similarity measure calculated in the first stage of the study. This relationship was especially strong in ventral occipital cortex, where the correlation between neural similarity and visual search reaction time was 0.77. This suggests that perceptual similarity following a high-level category hint (indexed by ease of visual search) is a direct function of perceptual similarity without a high-level hint (indexed by neural similarity). This fits nicely with

19 The idea that perceptual similarity between two items is a function of their distance in a similarity space structured by perceptible dimensions is also familiar from work on quality space theory (e.g., Clark 1993; Rosenthal 2010).

DRH, because it suggests that when high-level hints aid object detection in crowded displays, they do not restructure perceptual similarity space by injecting new dimensions or categories. Rather, their influence goes by way of the dimensions to which various perceptual processes are already attuned. Cohen et al. conclude: “[R]esponses across higher-level visual cortex reflect a stable architecture of object representation that is a primary bottleneck for many visual behaviors (e.g., categorization, visual search, working memory, visual awareness, etc.)” (400).

To sum up: High-level category hints can produce top-down effects on object detection in impoverished or crowded displays. Nonetheless, there is compelling evidence that these phenomena do not alter the stable representational architecture of perception. Rather, they appear to operate within the bounds of dimension restriction.

3.2. Bistability and Seeing-As

I’ve considered the role of high-level hints in rapidly detecting category instances within crowded or impoverished displays. I’ve argued that such effects probably do not involve enrichment. However, bistable figures might be argued to pose a deeper problem for DRH.20 Consider figure 5a, due to Bugelski and Alampay (1961). The figure can be seen in two ways—either as a rat or as an old man’s face. It is possible to shift between these interpretations without moving the eyes. The consensus view among psychologists is that cognitive states can influence (though not fully control) these shifts. If so, doesn’t cognition enrich perception? We first see the figure as a rat, and then we see it as a man.21 This is a different category that doesn’t correspond to a value along any of the dimensions represented previously. A new dimension is represented thanks to cognitive influence.

20 Thanks to a reviewer for raising this issue. 21 A thin theorist about perceptual content might object that perception only represents thin counterparts of face and rat (e.g., complex shape gestalts roughly coextensive with these categories). However, I’ll put this issue aside because my treatment of bistability applies regardless of whether we accept the rich or the thin view.

Insert figures 5a-5c about here

I contend that DRH accommodates cognitive influences on the perception of bistable figures. Recall the distinction between (i) the dimensions a process does compute over on a particular occasion, and (ii) the dimensions a process can compute over given the agent’s cognitive architecture.

DRH imposes restrictions of the second type. However, it is compatible with DRH that cognitive processes can affect (i)—the dimensions a perceptual process does compute over in a particular case.

The most obvious way is by affecting whether the process is active at all.

The basic model that I’ll adopt to explain cognitive influences on bistable perception is not new. However, it is empirically plausible, and it is important to show how the model fits within the DRH framework. Before getting to the rat-man figure, let’s consider two more familiar examples: the Necker cube (fig. 5b) and Rubin’s face-vase (fig. 5c). These figures are also bistable. You can see fig. 5b either as a cube pointing down and to the left, or as a cube pointing up and to the right. You can see fig. 5c either as a single vase or as a pair of faces looking at each other. We have some voluntary control over how we see these figures. You can, at least to some extent, intentionally shift interpretations or “hold” an interpretation for a time (Leopold & Logothetis 1999; Long & Toppino 2004). How does this happen?

There is evidence that cognitively induced shifts of the Necker cube and the face-vase depend on allocation of spatial attention. Hold your eyes fixed at a point in the center of fig. 5b. If you covertly attend to the vertex marked A, you tend to see the figure pointing down-and-left, and if you attend to the vertex marked B, you tend to see it pointing up-and-right (Peterson & Gibson 1991; Meng & Tong 2004). Likewise, when you attend to the vase region of fig. 5c, you tend to see the vase as figure, while attending to either of the face regions has the opposite effect (Baylis & Driver 1995). These observations are buttressed by neurophysiological data.22 Thus, cognition plausibly induces shifts in the perception of ambiguous figures via the allocation of attention.

Presumably, it has this effect by selecting or prioritizing certain locations or features of the figure for further analysis. In the case of the Necker cube, focusing your attention on one corner leads the visual system to locate that corner in the foreground. It tends to appear closer than the unattended corners, and this affects the subsequent processing of 3-D shape.

Here is how I propose to explain cognitive effects on the perception of bistable figures.

When a figure permits multiple high-level interpretations, cognition can, through the allocation of attention, privilege one of these interpretations by prioritizing certain features or locations for further perceptual analysis. When the features diagnostic of one category are selected, higher-level processes attuned to that category are preferentially activated.23 But this is a selection effect, not an enrichment effect. To see why, consider a modified version of the earlier assembly line (fig. 6).

Suppose that the assembly line can receive multiple inputs at once, and the size sorter classifies all of them. However, the temperature and weight sorters are mutually inhibitory—only one can be active at a time. Which one is dominant depends on which output from the size sorter is selected for further analysis. If an item classified as big is selected, the temperature sorter wins out. If an item classified as small is selected, the weight sorter wins out. One way for an outside mechanism to influence the final output of the assembly line is to bias the selection process. Causing a big object to be selected will lead the final output to be classified by temperature, while causing a small object to be selected will lead the final output to be classified by weight. However, neither effect would violate

22 The neural regions activated while exerting voluntary control over the perception of ambiguous figures overlap extensively with those responsible for directing spatial attention. Both tasks heavily recruit the superior parietal lobule and intraparietal sulcus (Slotnick & Yantis 2005; Sterzer et al. 2009). Moreover, individual differences in alternation rate for bistable stimuli are correlated with differences in superior parietal lobe structure—a region known to be involved in spatial attention—and disrupting superior parietal lobe activity through transcranial magnetic stimulation results in decreased alternation rate (Kanai et al. 2010). 23 If one prefers a thin view of perceptual content, essentially the same story can be told. It’s just that the relevant high- level process must be understood as representing a thin, “recognitionally coextensive” correlate of the relevant high-level category (Block 2014).

dimension restriction, either for the system as a whole or for processes within the system. The temperature sorter classifies by temperature, the weight sorter by weight, etc., and the system as a whole classifies only by size, weight, and temperature. Outside influences on selection do not modify these architectural constraints.

Insert figure 6 about here
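For concreteness, here is a minimal sketch of the modified assembly line just described. The numeric attributes, thresholds, and selection rules are illustrative choices of mine; the point is that biasing selection changes which sorter ends up classifying the final output without changing what any sorter, or the system as a whole, is able to classify.

```python
# A toy version of the modified assembly line. The size sorter classifies every
# input; which of the mutually inhibitory sorters runs next depends on which
# size-classified item an outside mechanism selects. Attributes and thresholds
# are arbitrary illustrative values.

def size_sorter(item):
    return "big" if item["size"] > 10 else "small"

def temperature_sorter(item):
    return "hot" if item["temp"] > 50 else "cold"

def weight_sorter(item):
    return "heavy" if item["weight"] > 5 else "light"

def assembly_line(items, select):
    """`select` is the outside influence: it picks one size-classified item for
    further analysis. Selection determines which later sorter is active, but each
    sorter still classifies only along its own fixed dimension."""
    classified = [(item, size_sorter(item)) for item in items]
    chosen, size_label = select(classified)
    if size_label == "big":
        return size_label, temperature_sorter(chosen)   # temperature sorter wins out
    return size_label, weight_sorter(chosen)            # weight sorter wins out

items = [{"size": 12, "temp": 70, "weight": 8}, {"size": 4, "temp": 20, "weight": 2}]
# Biasing selection toward big items yields a temperature classification...
print(assembly_line(items, select=lambda cs: max(cs, key=lambda c: c[0]["size"])))  # ('big', 'hot')
# ...while biasing it toward small items yields a weight classification.
print(assembly_line(items, select=lambda cs: min(cs, key=lambda c: c[0]["size"])))  # ('small', 'light')
```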

I contend that the same is true of perception. As in the assembly line, an outside mechanism (cognition) selects some of the outputs of early vision for further processing, and this affects which high-level interpretation dominates at a time. I’ve discussed evidence for the first part of the story (attentional selection). If the cases are genuinely analogous, then we should also expect different high-level perceptual processes to predominate depending on which high-level interpretation one perceives (analogous to activity of the temperature sorter versus the weight sorter). There is evidence for this as well. For example, in an fMRI study, Andrews et al. (2002) examined neural activity during perception of Rubin’s face-vase figure. They discovered that activity in the fusiform face area (FFA), a high-level visual area selectively attuned to faces (Kanwisher & Yovel 2006), is greater during perception of faces than during perception of a vase (see also Wang et al. 2017). Parallel findings have been reported in the case of binocular rivalry (Tong et al. 1998).24

Let’s now extend the model to the rat-man image (fig. 5a). I conjecture that when you attend to the region corresponding to the man’s eyes or glasses, this promotes the old man percept. When you attend to the regions corresponding to the rat’s eyes or mouth, this promotes the rat percept.

This is just a hypothesis, of course, but it is an empirically reasonable one. Moreover, there is strong

24 I should note that there is some evidence for high-level unconscious processing of the region of a figure-ground display perceived as background (Cacciamani et al. 2014). Strictly speaking, then, attentional selection influences which high-level interpretation dominates over another in bistable perception and makes it into conscious experience, but this doesn’t always completely turn off those high-level processes responsible for the non-dominant interpretation.

evidence that eyes are among the most important features in face perception and recognition (Hills et al. 2011), so it is plausible that attending to the eyes would preferentially activate high-level processes attuned to faces (e.g., those subserved by the FFA). But this is a selection effect.

Cognition affects which objects or features get selected for further processing, and thus affects which high-level processes dominate at a time. But there is no reason to think that this enriches the dimensions computable by either early vision or high-level vision.

This account may generalize to certain degraded images containing objects that suddenly “pop into place,” like the Dalmatian image (figure 3a). Studies suggest that there are low-level diagnostic features in the Dalmatian image (mainly texture differences) that aid our ability to detect the figure. When these are absent, finding the Dalmatian is considerably more difficult (van Tonder & Ejima 2000). I conjecture that when cognition aids in detecting the Dalmatian, it does so by attentionally selecting low-level features diagnostic of dogs (or perhaps mid-sized four-legged animals in general). Once selected, these features are prioritized by the processes responsible for object differentiation. When these processes decide that an object is present, the Dalmatian pops into place. But this is a selection effect, not an enrichment effect.

3.3. Dimension Restriction and Cognition

I now consider cognition-side challenges to DRH. DRH claims that perception is dimensionally restricted while cognition is not. Cognition-side challenges allege that this thesis is false because at least some cognitive processes are dimensionally restricted. I’ll focus on two specific challenges: the objection from massive modularity and the objection from visual imagery.

Many have claimed that we can hold beliefs that are inaccessible to central cognition.

Consider, for example, the much-discussed case of the avowed egalitarian who harbors implicit prejudiced attitudes.25 These attitudes have some effects on her behavior and automatic associations,

25 The example is merely illustrative. I take no stance on whether such cases occur.

but they are not generally available for reasoning and inference. Suppose that there are implicit beliefs of this sort, and that they are genuinely inaccessible to most cognitive processes. If so, then it’s not true that every cognitive process has access to literally everything we believe.

While these cases may present problems for the view that cognition is perfectly isotropic, they do not threaten the view that cognition is dimensionally unrestricted. For even if there are certain beliefs that a given cognitive process can’t access, this doesn’t show that the range of beliefs it can access is marked by dimension restriction. Consider again the ability to grasp metaphor. Given the kind of freewheeling conceptual integration involved in this ability, there seem to be no principled limits on the range of dimensions that the metaphor comprehension process can represent. Crucially, these considerations are just as powerful even if we assume that there are some implicit beliefs buried elsewhere in the mind that the process cannot access. The argument relies only on the observation that the range of beliefs it can access is sufficiently wide-ranging that the process is unlikely to be dimensionally restricted.

Massively modular views of the mind may seem to pose a more serious challenge to DRH.

Massive modularists have suggested that Fodor’s picture of central cognition is misguided in two respects (e.g., Carruthers 2006a, 2006b; Sperber 2002). First, they contend that cognition is not a single domain-general processing system, but rather is composed of a collection of functionally dissociable, domain-specific processors (‘modules’), such as systems for mind-reading, social exchange, and mate selection. Second, they contend that these processors do not ever search through the entirety of one’s knowledge database. Rather, they use fast and frugal search strategies that quickly retrieve a small subset of the information the agent knows.

We can put the issue of domain-generality versus domain-specificity aside for now. We need not assume that cognition is a single domain-general processing mechanism in order to leverage DRH into a theory of the perception-cognition border. All that matters is that cognitive processes are not dimensionally restricted. Because it is possible for a system to be dedicated to processing information about a particular domain but to lack strict limits on the class of dimensions it can compute over, it could be true both that cognitive systems are domain-specific and that they are dimensionally unrestricted.

The more important question is whether we can still use dimension restriction to draw the perception-cognition border if, like perceptual processes, cognitive processes are also starkly limited in the range of information that they can take into account. I believe we can.

If cognitive systems are limited in the range of information that they can access, how are they limited? One option is that they are encapsulated in Fodor’s strict sense—they lack access to information stored in other cognitive systems. However, this would be an implausible view. For one thing, systems of social exchange and mate selection would surely need to share information with one another, and both would need to access information from the mind-reading system. At minimum, these systems must receive inputs from one another.

Carruthers (2006b, 57-59)—a leading defender of massive modularity—provides the most developed alternative answer. He distinguishes “narrow-scope” from “wide-scope” encapsulation. A system S is narrow-scope encapsulated if most of the information in the mind is such that S cannot access that information during on-line processing. S is wide-scope encapsulated if S cannot access most of the information in the mind during on-line processing. According to Carruthers, cognitive modules are task-specific processing systems that are encapsulated in only the wide-scope sense.26

It is likely that many cognitive systems are wide-scope encapsulated. After all, this criterion requires just that a system is incapable of computing over the entirety of one’s knowledge database in a single processing episode. Carruthers illustrates wide-scope encapsulation using fast and frugal

26 Sperber and Wilson (2002) also significantly weaken the notion of modularity. They characterize a module as “a domain- or task-specific autonomous computational mechanism” (9). Like Carruthers (see below), they also suggest that modules are characterized by fast and frugal heuristic strategies.

heuristics. For instance, Gigerenzer and colleagues’ (1999) recognition heuristic states that when deciding which of two items scores higher along a criterion, if one of the items is recognized while the other is not, you should choose the recognized item. Thus, suppose you are deciding which of two German cities has a higher population. If only one of the cities is recognized, then that one should be selected. An agent following the recognition heuristic can make a quick decision about relative population size without searching exhaustively through everything they know.
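For concreteness, here is a minimal sketch of the recognition heuristic just described. The set of recognized cities is an illustrative placeholder; notice that nothing in the heuristic itself places limits on what kinds of items, or how many dimensions, recognition may range over.

```python
# A minimal sketch of the recognition heuristic. The recognized set is an
# arbitrary illustrative stand-in for whatever the agent happens to recognize.

RECOGNIZED = {"Berlin", "Munich", "Hamburg"}

def recognition_heuristic(option_a, option_b):
    """If exactly one option is recognized, choose it; otherwise the heuristic is
    silent and some other strategy must be used. The search is frugal (one
    membership check per option), yet nothing limits what the options are or
    which dimensions recognition of them may depend on."""
    a_known, b_known = option_a in RECOGNIZED, option_b in RECOGNIZED
    if a_known and not b_known:
        return option_a
    if b_known and not a_known:
        return option_b
    return None

print(recognition_heuristic("Munich", "Herne"))    # 'Munich'
print(recognition_heuristic("Berlin", "Hamburg"))  # None: both recognized
```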

It is an open question whether all cognitive systems are wide-scope encapsulated. But even if they are, this does not undermine DRH. The reason is that wide-scope encapsulation does not entail dimension restriction. A system that uses only fast and frugal search strategies counts as wide-scope encapsulated, but it needn’t be dimensionally restricted. For example, there need be no limits on the dimensions accessible to a system using the recognition heuristic, since there are no limits to the dimensions that can be recognized. Likewise, consider Carruthers’ proposed practical reasoning module, which he describes as follows:

This [system] takes as initial input whatever is currently the strongest desire, for P. It then queries the various belief-generating modules, while also conducting a targeted search of long-term memory, looking for beliefs of the form Q ⊃ P. If it receives one as input, or if it finds one from its own search of memory, it consults a database of action schemata, to see if Q is something doable here and now. If it is, it goes ahead and does it. If it isn’t, it initiates a further search for beliefs of the form R ⊃ Q, and so on. If it has gone more than n conditionals deep without success, or if it has searched for the right sort of conditional belief without finding one for more than some specified time t, then it stops and moves to the next strongest desire. (2006b, 57-58)

This system is offered as an example of wide-scope encapsulation since it uses a fast and frugal search strategy, and so could never access all the information in one’s memory database in a single processing episode. But the system is clearly not dimensionally restricted. It takes as input whatever is the currently strongest desire, and there are no fixed limits on the range of dimensions our desires may represent.27
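A schematic rendering of the quoted procedure may help to make its structure explicit. The data structures and the depth limit are simplifications of mine, and the time limit t is omitted; Carruthers does not specify an implementation. Notice that nothing restricts what the antecedents and consequents of the conditional beliefs may concern.

```python
# A schematic rendering of the quoted practical reasoning module. Beliefs are
# stored as (antecedent, consequent) conditionals, desires as goal strings, and
# the search gives up after max_depth conditionals (the time limit t in the
# original description is omitted here). All names are illustrative.

def practical_reasoning(desires, conditional_beliefs, doable, max_depth=3):
    """Work through desires from strongest to weakest, chaining backwards through
    beliefs of the form Q -> P until a doable action is found."""
    for goal in desires:
        frontier, depth = [goal], 0
        while frontier and depth < max_depth:
            next_frontier = []
            for p in frontier:
                for antecedent, consequent in conditional_beliefs:
                    if consequent == p:
                        if antecedent in doable:       # consult the action schemata
                            return antecedent          # do it
                        next_frontier.append(antecedent)
            frontier, depth = next_frontier, depth + 1
    return None                                        # no desire could be acted on

beliefs = [("boil water", "make tea"), ("turn on kettle", "boil water")]
print(practical_reasoning(["make tea"], beliefs, doable={"turn on kettle"}))  # 'turn on kettle'
```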

However, even if massive modularity (in the permissive sense of ‘module’ these views employ) is consistent with DRH, particular cognitive processes may still be thought to present difficulties for DRH. Cognitive processes are sometimes modeled in a way that can make them appear dimensionally restricted—their mapping from inputs to outputs is cast as a computation over a restricted range of variables (examples might include moral cognition, mind-reading, and mental arithmetic). While I cannot examine such cases in detail,28 let me offer a preemptive strategy. The fact that a process typically uses a limited class of dimensions does not entail that the process is

27 Similar remarks hold for the version of massive modularity defended in Sperber (2002). Sperber considers a general- purpose modus ponens device such as the one described in section 2.3, and concludes that it is “a perfect example of a module.” The syntactic constraints on its inputs and outputs, says Sperber, place constraints on the representational content it can process (it can process conditional propositions, but not conjunctive or disjunctive propositions), securing the result that it is encapsulated. Regardless of the strength of this argument, it is clear for reasons noted above that a modus ponens device would not be dimensionally restricted. Thus, Sperber’s notion of modularity cannot entail dimension restriction. Others have argued that cognitive systems are modular simply because there are formal constraints on their acceptable inputs. For instance, Barrett and Kurzban (2006, 634) contend that a modus ponens processor would be modular because it only responds to inputs with the right syntactic properties. This is an even greater departure from dimension restriction, which places constraints on content, not format. Thus, far from threatening DRH, Barrett and Kurzban’s brand of massive modularity is totally silent on whether cognitive processes are dimensionally restricted. 28 I should, however, say something about the case of cheater detection, since it is the most familiar putative case of cognitive modularity. Originally proposed by Cosmides and Tooby (e.g., Cosmides & Tooby 2005), the cheater detection mechanism is a system shaped by evolution to detect the presence of cheaters in social exchange. It is selectively activated by situations in which a person is obligated to meet some requirement (e.g., paying money or meeting a minimum age limit) in order to receive a benefit (e.g., money, services, or alcohol). The cheater-detection computation outputs verdicts about which categories of people are the potential cheaters in a given social exchange scenario— namely, those who fail to meet the requirement and those that have received the benefit. These verdicts allow us to efficiently identify cheaters (they include anyone who falls in both categories). Critically, it may appear at first blush that the cheater detection mechanism could be dimensionally restricted. In particular, one might suggest that the mechanism is constrained to compute over a restricted range of variables like benefits, requirements, and agents. However, further reflection reveals that this is not so. To deliver verdicts that actually enable cheater detection, the cheater detection mechanism cannot only compute over abstract categories like benefits and requirements. Rather, it needs to produce and output representations of the specific benefits and requirements operative in a given exchange context. More concretely, if you are in a situation where the relevant benefit is obtaining alcohol and the relevant requirement is being at least 21 years of age, the cheater detection mechanism needs to output representations of these specific categories, marked as the ones that need to be checked to identify possible cheaters. Otherwise, its output would be of no use in actually finding cheaters. It is no good simply to be told that the cheaters in a given scenario are those people who may have received a benefit without passing a requirement. 
To actually detect the cheaters, one needs to know what the relevant benefits and requirements are, since these define the categories of people who need to be checked in that context. Thus, the cheater detection mechanism must be able to generate representations of whatever things could qualify as benefits or requirements in a given system of social exchange. But it is highly implausible that this range is dimensionally restricted. The class of things or quantities that could count as benefits in some case of social exchange is essentially boundless.

architecturally constrained to compute over those dimensions alone. As Pylyshyn (1984) observed, regularities in psychological processing can be due either to cognitive architecture or to what the agent believes. If an agent believes that only certain dimensions are relevant to a cognitive task, then she will typically consult these alone when performing the task. This doesn’t show that the process used to perform the task is dimensionally restricted. To show this, it would need to be shown that the agent couldn’t help but consult these dimensions alone when performing the task via that process, even if she believes that other dimensions are relevant. In section 4, I’ll highlight cases in which perceptual processes (namely, those involved in visual search and texture segregation) are constrained in just this way.

To sum up: Even if cognitive processes are modular in the sense suggested by massive modularity theorists, this doesn’t show that cognition is dimensionally restricted. Even on these views, it remains plausible that there are deep computational differences between perception and cognition. Thus, although DRH comports with Fodor’s view of central cognition as a single, domain-general mechanism, it is not reliant on this view.

One more cognition-side challenge should be considered. There is evidence that cognitive processes sometimes recruit perceptual brain regions. The most familiar example is imagery. When you conjure up a visual mental image of a scene, some of the same regions of visual cortex are activated as when you see the scene (Kosslyn et al. 1995), and these areas exhibit retinotopic organization during imagery just as they do during perception (Slotnick et al. 2005). But we use mental imagery for reasoning and decision-making. Does this show that the very same processes are used during both perception and cognition? If so, then either some perceptual processes are dimensionally unrestricted or some cognitive processes are dimensionally restricted.

The answer is no. Let’s grant that when visual brain regions are recruited for cognitive tasks, they implement the same representations that they do during perceptual processing. Crucially, the fact that some of the same representations are used during both perception and cognition does not show that these representations participate in the same computational processes. When we use imagery for cognitive tasks like reasoning and inference, I contend that the processes in which our mental images participate are not dimensionally restricted. Consider using a visual image of your bedroom for the purpose of deciding what color carpet to purchase. There are no fixed limits on the range of dimensions or categories you might consult. (Do you plan to paint your walls in the coming months? How will the carpet look next to the dresser you’ve recently ordered?) Thus, the fact that perceptual brain regions are activated during some cognitive tasks does not show that those regions participate in the same processes. Such phenomena do not refute DRH.

3.4. Vacuousness and Falsifiability

A final challenge alleges that DRH is unfalsifiable because there are too few constraints on what counts as a dimension. Suppose that cognition leads a perceptual process P to compute over some feature F, which it wouldn’t have computed over were it not for this cognitive influence. To determine whether this creates a problem for DRH, we need to know whether F falls along any of the dimensions that P was already able to compute over. The worry is that if we are always free to construe F as a value along one of the dimensions P previously computed over, then DRH is unfalsifiable. It can be stretched to accommodate any bit of evidence.

Fortunately, this worry can be addressed. Recall that an important constraint on dimensions is that their values empirically exclude one another. If something determinately exemplifies one value for a dimension, then as a matter of empirical necessity it does not determinately exemplify any of the others. This suggests a necessary condition for modulation, and a corresponding sufficient condition for enrichment. Suppose that cognition leads a visual process P to compute over feature F, which it hasn’t computed over previously. For this influence to be merely modulatory, it must be the case that there is some other feature G that P was previously able to represent, such that F and G exclude one another. If, on the other hand, something’s having F is compatible with its having any of the features that P has represented previously, then the evidence suggests that P has been enriched. For example, if P is initially constrained to represent values for orientation and motion, but cognition causes it to compute over a saturation value, then it has been enriched. Thus, it isn’t true that we can simply reinterpret the dimensions that a process computes over in order to turn any candidate case of enrichment into a mere case of modulation. We can’t just add new values for a dimension, since the putative new values will usually fail to exclude the old ones.
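Put schematically (this is my gloss on the condition just stated, where Rep<t(P, G) says that P was able to represent G prior to the cognitive influence and Excl(F, G) says that F and G empirically exclude one another):

\[
\mathrm{Mod}(P,F) \rightarrow \exists G\,[\mathrm{Rep}_{<t}(P,G) \wedge \mathrm{Excl}(F,G)], \qquad \forall G\,[\mathrm{Rep}_{<t}(P,G) \rightarrow \neg\mathrm{Excl}(F,G)] \rightarrow \mathrm{Enr}(P,F).
\]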

However, features can empirically exclude one another without falling along the same dimension. Being an apple arguably excludes being two-dimensional, but if a process is initially constrained to compute over 2-D shapes, and cognition leads it to compute over the property of being an apple, this looks like a clear-cut case of enrichment. Fortunately, there are ways to convincingly demonstrate enrichment that don’t require establishing non-exclusion.

Suppose that prior to time t, a visual process P has computed only over quantitative dimensions. (Recall that values along these dimensions can be naturally ordered, and amounts of difference between them can be numerically measured.) If, at t, cognition leads P to compute over a feature that fails to stand in the appropriate ordering and closeness relations with any of the features that P previously represented, then the most plausible conclusion is that cognition has enriched P.29

So, for processes tuned to quantitative dimensions, there are good ways to identify enrichment.

I suspect that the most difficult cases to legislate will involve perceptual processes (if any) that represent categorical dimensions. For instance, suppose that prior to time t, a process has computed over a small class of facial expressions, like fearful and angry (Butler et al. 2008; Block

29 Although this is the most plausible conclusion, it is not strictly entailed. In general, empirical data will only directly tell us which dimensions a process has computed over before a given time. To evaluate DRH, we must infer what the process was able to compute over before that time. However, if a perceptual process never computes over dimension D before time t (and the subject’s environment is not impoverished—she has come across objects with values along D), then this provides compelling evidence that the process was architecturally constrained from computing over D before t.

2014). If cognition leads it, at t, to compute over happy, has the process been enriched, or merely modulated? That depends on how we construe the dimension that the process initially represented.

If we construe it as facial expression, then the process has been merely modulated, but if we construe it as negative facial expression, then the evidence supports enrichment. Perhaps there are good ways to answer this question. However, in the interests of falsifiability, I’m willing to count any example of this sort as a case of enrichment. So if cognition leads a perceptual process to compute over a value along a categorical dimension that it hasn’t represented previously, then we should say that it has been enriched. This is not problematic, since it just makes DRH easier to falsify.

Thus, given a background functional analysis of vision into constituent processes, we can detect cases of enrichment by reference to independent rules governing the grouping of features into dimensions. If cognition leads a perceptual process to compute over a feature that cannot be dimensionally grouped with any of the features that the process was previously able to compute over, then the process has been enriched, and DRH is false.

4. Dimension Restriction and Unconstrained Predictive Processing

I’ve now explained DRH, offered some initial motivation for it, and defended it against some objections. However, the hypothesis must ultimately be judged by its empirical success relative to competing models. In this section, I’ll argue that DRH enjoys advantages over views that seek to abolish architectural constraints on information flow between cognition and perception. In the next section, I’ll argue that DRH avoids the empirical problems that afflict cognitive impenetrability.

Predictive processing approaches hold that perception is guided at each level by top-down predictions. More precisely, each level of perceptual processing receives sensory inputs from lower levels. These are compared against signals from higher levels that convey predicted sensory inputs, generated on the basis of the system’s current ‘best guess’ about their external-world causes. Mismatches between predicted and received sensory inputs are called “prediction errors.” Ascending connections allow precision-weighted prediction errors to be sent upward. These encode the respects in which actual inputs differ from predicted inputs, weighted in some way by their relevance to the agent’s current task. Residual prediction errors (i.e., those aspects of the incoming signal that were not successfully predicted) are then used to revise the hypotheses about the external world that generated the predictions (for details, see den Ouden et al. 2012; Hohwy 2013; Clark 2013, 2015).
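As a bare-bones illustration of these mechanics, consider a single level tracking one scalar quantity, with precision-weighted error correction. This is a schematic sketch of my own, not any particular theorist's model; the function and the numbers are purely illustrative.

```python
# A single-level, scalar sketch of precision-weighted error correction. The
# function and the numbers are illustrative only.

def update_estimate(prior_estimate, sensory_input, prior_precision, sensory_precision):
    """Revise the current 'best guess' by the precision-weighted prediction error."""
    prediction_error = sensory_input - prior_estimate                   # predicted vs. received input
    weight = sensory_precision / (sensory_precision + prior_precision)  # how much the error is trusted
    return prior_estimate + weight * prediction_error                   # residual error revises the guess

# A precise prior barely moves in the face of imprecise input...
print(update_estimate(prior_estimate=10.0, sensory_input=14.0,
                      prior_precision=4.0, sensory_precision=1.0))   # ~10.8
# ...while an uncertain prior defers to precise input.
print(update_estimate(prior_estimate=10.0, sensory_input=14.0,
                      prior_precision=0.5, sensory_precision=4.0))   # ~13.6
```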

Some predictive processing theorists have sought to eliminate architectural information-flow constraints between perception and cognition (see, in particular, Lupyan 2015a, 2017; Lupyan & Clark 2015). The recipe for such a view involves the basic predictive processing framework together with the tenet that any cognitive predictions can be transmitted to any stage of perceptual processing, provided that the predictions bear on the processing occurring at that stage. These theorists construe perception as a highly flexible mechanism that can be reshaped through prediction to meet the agent’s goals. Thus, Gary Lupyan writes: “There is no gatekeeper deciding how far down a cognitive state should penetrate perceptual processes. In evolving to minimize prediction error neural systems naturally end up incorporating whatever sources of knowledge, at whatever level, to lower global prediction error” (2015a, 551, emphasis added; see also Lupyan & Clark 2015, 282). And, elsewhere: “If allowing information from another modality, prior experience, expectations, knowledge, beliefs, etc., lowers global prediction error, then such information will be used to guide processing at the lower levels” (Lupyan 2015a, 550, emphasis added). Clark (2015, 200-201) quotes the latter passage approvingly.

I’ll call this the unconstrained predictive processing (UPP) approach. The approach is “unconstrained” because it claims that there are no restrictions beyond prediction error minimization on the ways that cognitive predictions can affect perceptual processing.30 If this is indeed the only fixed principle governing psychological information flow, and it applies throughout the entire processing stream (from early sensory processing all the way to reasoning and inference), then the view rejects any architectural border between perception and cognition.

I believe that UPP faces difficulties, and that these difficulties highlight the virtues of DRH.

I’ll argue (i) that if UPP is right, then we should expect to find cognitively induced enrichment of certain visual processes, but (ii) that there are salient examples where this fails to occur, even when it would be predictively useful. By contrast, (iii) this evidence fits nicely with DRH.

If information flow from cognition to perception is governed solely by prediction error minimization, then we should expect a perceptual process to be enriched when (and only when) enrichment would help to reduce prediction error. There are clear hypothetical cases in which this would be true. Suppose that a process computes distal size, and that up to the present time it has taken into account only distance and visual angle. Clearly, there are cases where prediction error would be reduced if the process took additional dimensions into account. Suppose, for example, that participants are placed in a context where they know that yellow objects are likely to be large, while red objects are likely to be small. Then the process should be enriched with information about color.

Visual search and texture segregation provide two cases in which, were prediction error minimization the sole principle governing information flow from cognition to perception, we should expect to observe enrichment. I’ll now argue that since we do not observe enrichment in these cases, this raises problems for the UPP approach. However, the cases confirm the predictions of DRH.

30 Two qualifications: First, some attracted to UPP might object to my way of characterizing the view, since they reject any distinction between perception and cognition (see Macpherson 2017). However, one could accept UPP while still affirming a distinction between perception and cognition. It’s just that any such distinction could not be drawn via constraints on information flow. Second, one might be worried that the foregoing characterization of UPP overlooks the hierarchical character of predictive processing architectures: Predictions from one level are only directly provided to the level just below. If so, then perhaps cognitive predictions cannot be sent to just any stage of perceptual processing—only to those levels just below the relevant cognitive processing level. This restriction is noticeably absent from some UPP theorists’ explicit claims about cognitive penetration (e.g., the Lupyan quotes above). In any case, I’ll argue below that the hierarchical twist cannot rescue UPP from the objections raised here.

Visual search for targets among distractors is traditionally classified as either “efficient” or “inefficient.” When search is efficient, this means that the time it takes to find the target is nearly independent of the number of distractors. When search is inefficient, this means that it takes progressively longer to find the target as the distractor set size increases. Target search efficiency is characterized by search slope—the amount of additional time it takes to identify the target for each further distractor introduced.
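Search performance is standardly summarized by plotting reaction time against set size and fitting a line (the slope values mentioned here are rough, conventional benchmarks from the search literature rather than figures from any particular study):

\[
\mathrm{RT}(N) = \mathrm{RT}_0 + s \cdot N,
\]

where N is the number of items in the display, RT_0 is a baseline response time, and s is the search slope. Efficient (“pop-out”) searches have slopes near 0 ms per item, whereas inefficient searches typically show slopes on the order of 20-40 ms per item on target-present trials.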

Some dimensions—like brightness, orientation, and size—enable efficient search. A black object among a set of white objects can be found quickly and independently of the number of white objects. Likewise for a horizontal line among vertical lines, or a large square among small squares (Wolfe & Horowitz 2004, 2017). Other characteristics do not enable efficient search. The most familiar example involves absences. Search for a normal circle embedded among circles with lines protruding from them is not efficient, while search for a circle with a protrusion among normal circles is efficient (Treisman & Gormican 1988). In the former case, the target can be differentiated from the distractors only by the absence of a feature (a protruding line).

A leading model of visual search, the Guided Search Model (Wolfe 2007, 2015; see also Olivers et al. 2011), explains these effects among others. The Guided Search Model holds that focal spatial attention is needed to identify a search target. Focal attention is directed to items on the basis of a priority map. An item’s level of activation in this map is a function of (1) its bottom-up salience (a function of how different it is from other nearby items along a fixed range of dimensions), (2) top-down weighting, which prioritizes features of the item that the subject is searching for, and (3) random noise. Top-down weighting is a kind of feature-based attention (also sometimes called the “search template”), and it is sensitive to the subject’s goals and expectations. If the subject is looking for a red horizontal line among blue horizontal lines and red vertical lines (a “conjunction” search), then the activation of each item will be determined partly by how different the item is from its neighbors as regards color, orientation, size, etc., but also, because of top-down weighting, by how similar its features are to the desired target’s features (red and horizontal).
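The following sketch makes this structure explicit. The particular dimensions, the salience and weighting functions, and the noise level are illustrative stand-ins of mine rather than the model's actual equations; what matters is that both salience and top-down weighting are computed only over a fixed list of guiding dimensions.

```python
# A schematic sketch of priority-map activation in the spirit of the Guided
# Search Model. The dimensions, functions, and noise level are illustrative
# placeholders, not the model's actual equations.
import random

GUIDING_DIMENSIONS = ("color", "orientation", "size")  # fixed by the architecture

def bottom_up_salience(item, neighbors):
    """Local contrast: how different the item is from nearby items, per guiding dimension."""
    return sum(abs(item[d] - sum(n[d] for n in neighbors) / len(neighbors))
               for d in GUIDING_DIMENSIONS)

def top_down_weight(item, template):
    """Feature-based weighting: similarity to the sought target's features, but only
    along the guiding dimensions; a feature like a +-shaped intersection cannot enter here."""
    return -sum(abs(item[d] - template[d]) for d in GUIDING_DIMENSIONS if d in template)

def activation(item, neighbors, template):
    return bottom_up_salience(item, neighbors) + top_down_weight(item, template) + random.gauss(0.0, 0.1)

# Example: a red horizontal target (color=1, orientation=0) among blue vertical distractors.
template = {"color": 1.0, "orientation": 0.0}
target = {"color": 1.0, "orientation": 0.0, "size": 0.5}
distractors = [{"color": 0.0, "orientation": 1.0, "size": 0.5}] * 3
print(activation(target, distractors, template))  # high salience and no top-down penalty
```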

Because salience is a function of an item’s degree of difference from its neighbors, the Guided Search Model explains why search efficiency increases as target-distractor similarity decreases (Duncan & Humphreys 1989). Moreover, items that share more of the target’s features will tend to receive higher activation due to top-down weighting, which explains why conjunction searches can be relatively efficient, although not as efficient as simple feature searches. Finally, because we can place top-down weighting on features, but not on absences of features, the model explains why absences cannot be found efficiently.

The crucial point is that the visual processes that construct the priority map are dimensionally restricted. Only certain feature dimensions play a role in determining an item’s level of activation in this map, and we cannot change this, even when doing so would reduce overall prediction error. Consider search for targets defined by a +-shaped intersection. This is a simple perceptible feature, and we have little difficulty describing or imagining it. Nevertheless, the feature is unavailable for guiding visual search, even when subjects are told beforehand that this is what the target will look like. Wolfe and DiMase (2003) found that search for a +-shape was highly inefficient once other low-level features, such as symmetry and number of line terminators, were controlled (see fig. 7a). Moreover, “brick wall” stimuli with cross-intersections fail to pop out from those with T-junctions, and vice versa (fig. 7b).

Insert figures 7a and 7b here

This is not the only striking search inefficiency. We are also unable to search efficiently for a T-junction among L-junctions (Wolfe & Horowitz 2004), for a bouncing rhythm with a unique period (Li et al. 2014), for abrupt changes in color without changes in luminance (Theeuwes 1995), or for particular material types among others, such as a fur surface among stone surfaces (Wolfe & Myers 2010).31 Note that in each of these cases, we are able to perceive and think about the features that distinguish the target from the distractors. We just can’t use expectations about these features to optimally guide focal attention to the target.32

Texture segregation is another process that looks to be dimensionally restricted. We visually segregate certain textures rapidly and effortlessly, while others require careful inspection. In figure 8, regions containing randomly oriented L’s are effortlessly segregated from those containing randomly oriented X’s, and vice versa (top half), but regions containing randomly oriented L’s are not effortlessly segregated from those containing randomly oriented T’s, or vice versa (bottom half).

Subjectively, there seem to be perceptible boundaries between the former two types of regions, but not between the latter two, even though we can easily tell the L and T regions apart when we inspect them more closely. Evidence suggests that the ability to segregate textures of the latter sort remains

31 In a recent review article, Wolfe and Horowitz (2017, 3) offer the following list of attributes that most likely cannot guide visual search: “intersection, optic flow, color change, 3D volumes (for example, geons), luminosity, material type, scene category, duration, stare-in-crowd, biological motion, your name, threat, semantic category (animal, artefact, and so on), blur, visual rhythm, [and] animacy/chasing.” As a reviewer points out, there remains controversy about which features can guide efficient search. One particularly contentious case is alphanumeric identity, which I discuss further in note 32. At any rate, it is clear that the dimensionally restricted account of visual search is empirically vulnerable. I suppose further evidence could convince us that visual search is not dimensionally restricted, and this would undercut a key piece of evidence for DRH. But I regard this as a feature, not a bug. Any theory of the perception-cognition border that is responsive to vision science should be hostage to empirical fortune. If the scientific models that motivate the theory are discredited, then this should undercut the support for the theory. 32 A caveat: The features that enable efficient visual search may be somewhat malleable over time. Certain over-learned categories (e.g., letters or numerals) may become search-guiding features after extensive experience. Studies have found that we are quicker to identify an item when it appears against a background of familiar distractors, such as Ns, than when it appears among unfamiliar distractors, such as inverted N’s (Malinowski & Hubner 2001). Other studies have found that cues to classify stimuli into alphanumeric categories increase search efficiency (Lupyan & Spivey 2008). It is unclear, however, whether letters and numbers really become new search-guiding features, or perceivers instead become better at assigning top-down weights to preexisting search-guiding features that are diagnostic of the relevant letter or numerical category (Wolfe 2015). In any case, no matter how this issue is resolved, the findings are consistent with DRH. The evidence suggests that perceptual learning effects on visual search take significant time to unfold, and require controlled patterns of inputs to the visual system. For instance, one study explicitly testing the ability to form new search-guiding features found that it took 5,000-7,000 trials over a 4-6 day period to start finding the feature efficiently, and even then the learning benefits were initially location-specific (Sigman & Gilbert 2000). Thus, while it may be possible through extended practice to get the process of priority map construction to take new dimensions or categories into account, this mechanism is not wholly internal—rather, it relies on externally mediated perceptual learning.

slow and difficult, even when the same texture elements are repeated over many trials (Bergen & Julesz 1983; Treisman & Paterson 1984; Rosenholtz 2015). Again, we can predict what the differentiating features will be from one trial to the next, but we can’t use this prediction to direct vision to find the texture boundaries. DRH helps us understand why this should be: There is a fixed class of dimensions, and a fixed computation over them, responsible for texture segregation. The X and L regions are very different along one or more of these dimensions (or with respect to summary statistics computed from them—see Rosenholtz et al. 2012), while the L and T regions are not.

Thus, texture segregation can rapidly distinguish X-regions from L-regions, but can’t rapidly distinguish L-regions from T-regions.

Insert figure 8 about here

There is an ongoing investigation into which dimensions texture segregation computations are sensitive to.33 Contrary to some earlier theories (Treisman 1985), the dimensions that enable rapid texture segregation are not quite the same as those that enable efficient visual search (Wolfe 1992). Critically, however, we can already appreciate that there are restrictions on the dimensions that texture processing can take into account, and that these constrain the ways that our expectations can guide texture segregation. Simply knowing that texture regions will be defined by differences in a particular feature isn’t enough to get vision to segregate textures that way.

How well does the UPP approach handle the salient limits to top-down influences on visual search and texture segregation? Prima facie, these cases are very puzzling for the view. In visual

33 Some views of texture segregation posit that the visual system is sensitive to certain basic features of texture elements called “textons,” and that textured regions are differentiated on the basis of differences in textons (Julesz 1981). Other views hold that we compute a fixed set of summary statistics (e.g., luminance autocorrelation) over larger pooling regions and segregate textures on the basis of these statistics (Rosenholtz et al. 2012; see Rosenholtz 2015 for review). The differences between these approaches do not affect the central point that constraints on texture segregation are plausibly explained by architectural constraints on the dimensions (and the computations over these dimensions) used to differentiate texture regions.

search, the agent has the goal of locating the target, and in texture segregation, she has the goal of locating texture boundaries. In both cases the agent possesses knowledge (e.g., the target will look like this) that, if used, would both reduce prediction error and help her to attain these goals. Consider repeated visual search for a +-shaped intersection. It would be highly useful, and would significantly reduce prediction error, for vision to use information about this feature during priority map construction. If the information were used, then the target would systematically receive higher activation, and prediction error about target location would be reduced. Likewise, if information about L’s and T’s were made available to texture segregation processes, then prediction error about texture boundary locations would be reduced. But these things don’t happen. Or, at least, they don’t happen in an optimal way, given what the perceiver knows. DRH, however, has a ready explanation: These effects can’t happen because the processes of texture segregation and priority map construction are dimensionally restricted.

I’ll consider three replies to the foregoing argument. Each of them alleges that predictive processing theorists can accommodate the above evidence without conceding that visual search and texture segregation are dimensionally restricted.

The first two replies trade on the hierarchical character of predictive processing models (see Clark 2015, 30; Drayson 2017; Orlandi & Lee 2018). Proponents of predictive processing depict perception as organized into levels. Each level contains both representation units, which supply predictions to lower levels, and error units, which encode the discrepancy between the level’s current input and the predictions it received from above. It is also often noted that levels earlier in the processing hierarchy tend to process fine-grained features, while those later on process more abstract features. One might argue that this hierarchical organization already places certain constraints on the kinds of cognitive penetration we should expect to observe, and that these constraints are sufficient to explain why visual search and texture segregation processes can’t be freely reshaped by cognition.

There are two ways this maneuver could work. First, it might be suggested that the processes of priority map construction and texture segregation are too early in the predictive processing hierarchy to be influenced by cognitive expectations. In particular, if these processes are separated from cognitive expectations by too many computational levels, this could effectively insulate them from cognitive influence (e.g., Hohwy 2013, 124-126). Drayson (2017) explains the relevant idea as follows:

[W]hen it is true that Level A+1 causally influences Level A, and Level A causally influences Level A–1, we need not expect it to be true that Level A+1 causally influences Level A–1. The further apart the levels in the hierarchy are, the less likely there is to be causal influence from the higher level to the lower level. In this way, we can accept that each level in the predictive hierarchy is causally influenced by (i.e. gets its priors from) the level above, without having to accept that each level in the hierarchy causally influences all the levels below it, or that each level is causally influenced by all the levels above it.

Applied to the present case, the proposal would be that the processes of priority map construction and texture segregation are separated by too many processing levels to be influenced by cognition.

There are two problems with this reply. First, note that the reply already concedes the point that, contra UPP, there are fixed architectural constraints on information flow from cognition to perception, and these act as a “gatekeeper” on cognitive penetration, pace Lupyan (2015a). For it requires that cognitive predictions can only penetrate some distance down the perceptual processing hierarchy. The second and more serious problem, however, is that the suggestion is simply implausible in the visual search case. The processes that contribute to constructing the search priority map are influenced by cognitive predictions, and advocates of predictive processing have themselves emphasized this. Advance knowledge about certain features can help us to find search targets more quickly. Priority map construction is thus not insulated from our goals and expectations. The process is not too low-level for cognitive influence.

A second reply also trades on the hierarchical character of predictive processing models.

One might suggest that although processes of priority map construction and texture segregation can take cognitive predictions into account, they only do so when those predictions are framed at the right level of precision. When predictions fail to be used efficiently, this is because the features they concern are either too abstract or too fine-grained for the process that would need to use them.

Thus, the reason that cognition fails to fine-tune visual search and texture segregation is not because these processes are dimensionally restricted, but rather because they are dedicated to processing features at a fixed level of precision.

The first rebuttal mentioned above also applies here. But even putting that issue aside, the reply won’t work. Consider again the failure of expectations about +-intersections and T-junctions to enrich priority map construction. There is independent evidence that these features are not too abstract for this level of vision. It’s widely agreed that other early visual processes (namely, those involved in scene parsing) recover these features, since they are important for computing occlusion and transparency (Anderson 1997). Furthermore, there is evidence that occlusion relations help to determine pop-out in visual search (Davis & Driver 1998), indicating that there are visual processes earlier than priority map construction that can encode +-intersections and T-junctions. Thus, the features in question are not too abstract (and nor are they too complex) for early vision. But perhaps they are too fine-grained? It is difficult to see why. Note that intersection and junction categories abstract away from precise metric details like length, size, orientation, and so on.

The third reply does not appeal to hierarchical organization. Rather, it suggests that although using information about +-intersections or T-junctions would reduce local prediction error, it would require increasing global prediction error—e.g., prediction errors associated with other visual processes, or with subsequent cases of search guidance.34 I think that this is the most promising strategy, but as stated it is unacceptably ad hoc. The objector owes us an explanation of why the top-down weighting of certain features, but not others, during priority map construction necessitates increases in overall prediction error. Note that if UPP applies to early scene parsing, then there can be no general predictive cost to processing intersections and T-junctions. Thus, it must be that selecting these features increases overall prediction error specifically in the context of search guidance.

34 Compare Lupyan’s (2015a, 559) treatment of the Müller-Lyer illusion (also Lupyan & Clark 2015, 281-282).

To sum up: There is strong evidence that the architecture of the visual system places constraints on which dimensions (and, hence, which features) are available for guiding visual search and texture segregation. The dimensions computed over during these processes are limited, and they cannot be enriched through cognitive influence. This provides compelling support for DRH, and also raises problems for views that would have us eliminate information-flow constraints on the perception-cognition border.
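
The shape of this constraint can be put in quasi-computational terms. The sketch below is only illustrative, loosely inspired by guided-search-style models; the dimension list, weights, and combination rule are my assumptions, not a claim about how visual search is actually implemented. The point it makes is the one just argued for: cognition may reweight the dimensions the priority map already computes over, but a weight assigned to a dimension outside the fixed set has no channel through which to act.

```python
import numpy as np

# Illustrative sketch of dimension restriction in priority-map construction.
# The dimension list and weighting scheme are assumptions for illustration.

SEARCH_DIMENSIONS = ("color", "orientation", "size", "motion")  # fixed by architecture

def priority_map(feature_contrast, goal_weights):
    """Combine local feature-contrast maps into a single priority map.

    feature_contrast: dict mapping dimension name -> 2D array of local contrast
    goal_weights: dict mapping dimension name -> top-down weight
    Weights for dimensions outside SEARCH_DIMENSIONS are simply ignored:
    that is the dimension restriction.
    """
    shape = next(iter(feature_contrast.values())).shape
    priority = np.zeros(shape)
    for dim in SEARCH_DIMENSIONS:
        weight = goal_weights.get(dim, 1.0)
        priority += weight * feature_contrast[dim]
    return priority

# Example: knowing the target is red lets cognition boost the color dimension,
# but a weight placed on "t_junction" is inert because that dimension is not
# among those the process computes over.
contrast = {dim: np.random.rand(4, 4) for dim in SEARCH_DIMENSIONS}
weights = {"color": 3.0, "t_junction": 3.0}
print(priority_map(contrast, weights).round(2))
```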

5. Dimension Restriction and Cognitive Impenetrability

The previous section considered views on which cognitive penetration is wholly unconstrained, and found them problematic. I’ll now shift to the other end of the spectrum.

I needn’t refute the cognitive impenetrability thesis to establish DRH. Indeed, it’s compatible with DRH that cognition never affects perception. Nonetheless, the significance of DRH hinges partly on whether enrichment is a uniquely disallowed type of cognitive penetration. If enrichment were not unique in this way, there would be no reason to emphasize dimension restriction over other sorts of information-flow constraints in characterizing the perception-cognition border. In the current section, I’ll argue that there are compelling reasons to treat certain pre-cueing effects as cases of penetration. The impenetrability thesis is thus in serious doubt. As such, those attracted to the idea that there is an architectural division between perception and cognition should welcome a different characterization of this division, which DRH provides.

I won’t explore how things play out for every incarnation of the impenetrability thesis.35 Rather, I’ll start with a fairly permissive notion of cognitive penetration, due to Dustin Stokes (2013), and then turn to a much more restrictive one, due to Steven Gross (2017). I’ll argue in section 5.1 that on the permissive notion, pre-cueing effects provide clear evidence for cognitive penetration. I’ll argue in section 5.2 that the more restrictive notion is inappropriate in the current context because it divorces the question of whether perception is penetrable from the question of whether there are architectural constraints on information flow from cognition to perception. Nonetheless, I’ll argue that there is emerging evidence that perception is penetrable even by these more stringent standards.

5.1. Permissive Penetration and Pre-Cueing Effects

Stokes (2013) is primarily concerned with the penetration of perceptual experience, but his definition can be leveraged into a characterization of cognitive penetration of perception more generally. Stokes writes: “A perceptual experience E is cognitively penetrated if and only if (1) E is causally dependent upon some cognitive state C and (2) the causal link between E and C is internal and mental.” Applying this idea, we can say that a perceptual state or process P is cognitively penetrated if and only if (1) P is causally influenced by some cognitive state or process C and (2) the causal link between P and C is internal and mental. Call this notion permissive penetration. The notion is permissive because it requires only an internal, psychological causal chain connecting a cognitive state and a perceptual state or process. No conditions are placed on the type of causal chain, or on the epistemic relation between the content of the penetrating state and that of the penetrated state.

Do pre-cueing effects provide evidence for permissive penetration? To show this, we must show (i) that the effects are effects on perception rather than perceptual judgment, and (ii) that the effects are not produced merely by changing the inputs to perception. There is compelling evidence in support of both (i) and (ii). Although a number of studies would suffice to make the point, I’ll just discuss one (see Treue 2015 for review).

35 For various formulations, see Pylyshyn (1999), Macpherson (2012), Siegel (2012), Stokes (2013, 2015), Wu (2017), Gross (2017), Burnston (2017), and Block (forthcoming).

Patzwahl and Treue (2009) recorded from neurons in area MT of macaque monkeys while they viewed two superimposed dot clusters moving in opposite directions. While maintaining fixation, the monkeys were cued to attend to just one of the motion directions. (The cued direction varied from one trial to the next.) Patzwahl and Treue recorded the activity of neurons optimally tuned to one of the superimposed directions. They found that these neurons fired on average 32% more vigorously when attention was allocated to their preferred motion direction than when it was allocated to the opposite direction. In a neutral condition where neither direction was attended, responses fell in between: 19% below the attend-preferred condition, and 11% above the attend-opposite condition. Because retinal inputs were the same across these conditions (monkeys had to fixate the same point on each trial), these differences cannot be due to changes in retinal input. (They also cannot be due to changes in spatial attention, because the dot clusters were precisely superimposed.) Furthermore, MT is a well-known visual processing area, and its perceptual functions are well understood (Born & Bradley 2005), so these are clearly effects on perception, and not merely on perceptual judgment. Thus, Patzwahl and Treue’s findings at least appear to meet Stokes’s permissive criteria for cognitive penetration.

One way to reply to this type of case would be to appeal instead to a definition of cognitive penetration that rules out attentional effects by stipulation (see, e.g., Siegel 2012). But as tempting as it may be, the move is question-begging. The proponent of impenetrability must give us principled reasons for treating attention as an obstacle to cognitive penetration, rather than as a mechanism of cognitive penetration (see Mole 2015; Wu 2017).

A different reply alleges that although attention does not change retinal inputs, attentional effects should still be dismissed because they merely change the inputs to the visual system (see Pylyshyn 1999), or perhaps to particular visual processes. Firestone and Scholl (2016a) write:

In many…cases, changing what we see by selectively attending to a different object or feature…seems importantly similar to changing what we see by moving our eyes (or turning the lights off). In both cases, we are changing the input to mechanisms of visual perception, which may then still operate inflexibly given that input. (13)

This argument claims, first, that selection effects (i.e., changing the inputs to a visual process) should not count as cognitive penetration, and second, that attentional influences on perception (or at least the majority of them) are mere selection effects. Although I am unconvinced of the first claim, I will put it aside, because there is a clear problem with the second.

The problem is that Firestone and Scholl’s maneuver is only successful if it generalizes to all cognitively driven attentional influences. But it doesn’t. Attention may either select among candidate inputs to a perceptual process or modulate the information computed over by a process (or both). Changes in the determinacy or intensity of perceptually represented features should not be treated as mere selection effects. These interactions reshape the information processed in perception, rather than simply choosing which objects and features to process further.
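
The contrast can be made vivid with a toy example; the representation and the gain value below are illustrative assumptions, not a model of any particular experiment. Selection decides which candidate inputs get passed on at all, whereas modulation passes everything on but alters the strength (or precision) with which attended features are represented.

```python
# Toy contrast between two roles attention might play. The numbers and the
# item representation are illustrative assumptions only.

def select(items, attended):
    """Selection: choose which candidate inputs are passed on for further
    processing; the surviving representations themselves are untouched."""
    return [item for item in items if item["label"] in attended]

def modulate(items, attended, gain=1.3):
    """Modulation: every item is passed on, but the represented intensity
    (or precision) of attended items is boosted."""
    return [
        {**item, "response": item["response"] * (gain if item["label"] in attended else 1.0)}
        for item in items
    ]

stimuli = [{"label": "leftward motion", "response": 10.0},
           {"label": "rightward motion", "response": 10.0}]

print(select(stimuli, {"leftward motion"}))    # the rightward cluster is dropped
print(modulate(stimuli, {"leftward motion"}))  # both clusters kept; one enhanced
```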

Furthermore, note that if modulatory effects were assimilated to selection effects and dismissed on that basis, the reply would overgenerate. Many of the proposed cases of penetration that impenetrability theorists have been most concerned about involve modulatory effects. See, for example, the lively disputes over whether desires modulate distance perception (Balcetis & Dunning 2010; Durgin et al. 2011), and whether bioenergetic factors modulate slant perception (Proffitt 2006; Durgin et al. 2009; Firestone 2013).

It might be suggested that apparent modulatory effects at mid- and high-level visual areas result from first changing the inputs to the earliest stages of the visual system. However, our best evidence suggests that this is not the way endogenous attention works. Rather, attentional influences are propagated downward through the visual hierarchy—affecting higher areas first, then lower areas (Buffalo et al. 2010).36

Thus, modulatory pre-cueing effects seem to meet the permissive criteria for cognitive penetration. They are initiated by cognitive states, they exert clear effects on perceptual processing, and they do so via an internal psychological mechanism.

5.2. Restrictive Penetration and Pre-Cueing Effects

Gross (2017) has recently presented a more compelling case for excluding attention-mediated effects, which relies on a more restrictive notion of cognitive penetration.

36 Firestone and Scholl offer some additional arguments for excluding attentional effects. I’ll flag two of them here. First, they argue that attention-mediated effects are not suitably dependent on the contents of cognitive states. They write:

The influence of attention…is completely independent of your reason for attending that way. (…) Attention may enhance what you see regardless of the reasons that led you to deploy attention in that way, and even whether you attended voluntarily or through involuntary attentional capture…. Put differently, such attentional (or light-turning-off) effects may be occasioned by a relevant intention or belief, but they are not sensitive to the content of that intention or belief. (2016a, 14)

However, these considerations do not convincingly show that attentional effects on perception are not sensitive to the contents of the cognitive states that initiate them. After all, attentional effects are obviously sensitive to the contents of cognitive states in the sense of depending counterfactually on these contents. Suppose my desire to find my red shirt leads me to attend to all the red items in my visual field, which in turn boosts the precision with which these items are represented. If I had instead desired to find my second favorite shirt, I would have attended differently and instead modulated my perception of the blue items. It seems that what Firestone and Scholl really want to argue is that attentional effects are indirect. When a cognitive state influences perception via attention, the following will be true: Had the cognitive state been different while attention allocation remained the same, the same perceptual effect would have resulted. Thus, attention acts as a mediator between cognition and perception. If this is the idea, then Firestone and Scholl are appealing to one of Gross’s (2017) more restrictive conditions for cognitive penetration. I’ll discuss these below.

Firestone and Scholl provide a further argument for dismissing FBA effects in their replies to commentators:

[A]ttending to features, rather than locations, may not be analogous to moving one’s eyes, but it is importantly analogous to seeing through a tinted lens—merely increasing sensitivity to certain features rather than others. (…) This is what we mean when we speak of attention as constraining input: attention acts as a “filter” that “selects” the information for downstream visual processing, which may itself be impervious to cognitive influence. (2016b, 62)

Two points should be noted about this. First, there may indeed be respects in which FBA is analogous to looking through a tinted lens, but it is a further question whether these respects are sufficient to dismiss FBA as a case of penetration. By analogy, if desires influence size perception, then desiring an object may be analogous to looking at it through a magnifying glass, but this doesn’t show that desires fail to penetrate vision. Second, F&S suggest that FBA merely acts as a selective filter for downstream processing. However, this claim is incorrect for reasons already noted. FBA plays more than a selective role. It modulates either the determinacy or intensity of perceptual representations, and modulation should not be assimilated to selection.

The restrictive conception says that for an influence of cognition on perception to count as penetration, the influence must be both direct and semantically coherent.37 A cognitive state C exerts a direct influence on perceptual state P when C influences P, and not by way of any mediating psychological state. C’s influence on P is semantically coherent (in Gross’s sense) when C does not merely causally affect P, but also provides an epistemic reason for P. For this to be the case, the content of C must stand in an inferential (logical or probabilistic) relation to the content of P.38 Gross writes:

It is not enough that the contents of early vision states be sensitive to those of the cognitive states in the weaker sense of depending counterfactually, statistically, or in a law-like manner upon them. They must also do so in virtue of early vision itself operating over the cognitive states in a manner that mirrors a rational relation. (2017, 3)

Gross points out that Pylyshyn (1999), in several passages, seems to have had something like the restrictive notion of penetration in mind.39

Using the restrictive notion, Gross poses a dilemma for the claim that cognitively driven attention is a mechanism of cognitive penetration. According to Gross, when cognition causes an attentional effect on perception, it does so by way of an attentional command—roughly, “Allocate attention to x.” This command is either a cognitive representation or a perceptual representation. If it is cognitive, then, Gross argues, the cognitive influence is not semantically coherent. This is because the attentional command is an imperative, and thus cannot bear the right inferential relation to the resulting perceptual state. If, on the other hand, the command is perceptual, then, Gross argues, the influence is not direct. This is because the perceptual representation (the command) that is directly affected by cognition is not the sort of state that is a candidate for being penetrated. Gross writes: “Though it is perhaps (if we deny it cognitive status) a representational state in perception, it is not itself a perceptual state, at least in the sense of a state whose function is to represent the here and now” (2017, 6).

37 Gross actually favors a pluralist picture on which different notions of cognitive penetration are appropriate for different theoretical projects. However, for present purposes I am only interested in his most restrictive notion, since it furnishes the strongest defense of impenetrability against pre-cueing effects.
38 Burnston’s (2017, 3651) ‘computation’ condition on cognitive penetration articulates a similar idea.
39 Nonetheless, the correct interpretation of Pylyshyn (1999) is somewhat unclear. Although Pylyshyn does at certain points seem to endorse a definition of cognitive penetration along Gross’s lines (Pylyshyn 1999, 365, footnote 3), many of the putative cases of penetration that he is most concerned to rule out (e.g., effects of value or desire on size perception) don’t satisfy this definition.

I’ll argue below that there are pre-cueing effects that escape this dilemma. However, before doing so it is worth asking whether the restrictive notion is even appropriate for settling the issue of whether there is an architectural boundary between perception and cognition.

One might object to both the directness and semantic coherence requirements. As regards directness, suppose that perception and cognition compute in formally distinct languages that have the same expressive power. While cognition can freely transmit information to perception, it can only do so by way of a third system—a compiler—that functions to translate sentences in the language of thought into sentences in the language of perception. Thus, while cognitive influences on perception are systematic and pervasive, they are all indirect. Should this sort of case really not qualify as cognitive penetration?
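
A toy rendering of this scenario may help; every name in it is hypothetical, and the point is purely structural. Each message from cognition to perception is routed through a mediating translation step, so the influence is systematic and unrestricted in content, yet never direct.

```python
# Hypothetical illustration of the compiler scenario: cognition never writes
# directly into perception; every message passes through a translator.

def compile_to_perceptual_code(thought):
    """Stand-in for the imagined compiler: translate a sentence in the
    language of thought into a sentence in the language of perception."""
    return "<perceptual-code: {}>".format(thought)

def perceive(stimulus, compiled_message=None):
    """Perception consumes the stimulus plus, optionally, a compiled message;
    it has no access to cognitive states themselves."""
    extra = " + " + compiled_message if compiled_message else ""
    return "percept({}{})".format(stimulus, extra)

belief = "the shirt I am looking for is red"
print(perceive("visual array", compile_to_perceptual_code(belief)))
```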

It might be replied, however, that for the purposes of assessing whether there is an architectural boundary between perception and cognition, it is fair to impose a directness requirement (see Quilty-Dunn 2019). Why? Because if perception can access information from cognition only by accessing information from a mediating system, then perhaps this already suffices to show that there is an architectural constraint on information flow from cognition to perception—messages must always be channeled through an intermediary. At minimum, top-down effects are surely most worrisome for the impenetrability thesis if they meet the directness constraint.

Doubts about the semantic coherence requirement are harder to assuage. One problem is that the requirement makes the category of cognitive penetration so narrow that it excludes, by definition, many proposed cases of penetration. Consider again the New Look claim that desiring an object makes it look larger. Desiring an object does not provide an epistemic reason for concluding that it is larger. Accordingly, there is no rational relation between the cognitive state and the penetrated perceptual state (see also Stokes 2013).

There is, though, a more serious worry with the semantic coherence requirement: The requirement threatens to divorce the impenetrability thesis from the architectural issue we are concerned with. The question of whether there is an architectural boundary between perceptual and cognitive systems is, in part, the question of whether there are fixed constraints on the range of information within cognition that perception can access during processing. But if we adopt the semantic coherence requirement, then perception could qualify as impenetrable despite having wholly unconstrained access to information in cognition. Why? Because the semantic coherence requirement concerns how perception puts cognitive information to use (is the information used rationally or irrationally?), and not the range of cognitive information it puts to use. While interesting in its own right, the former issue is not immediately relevant to whether there are architecturally fixed restrictions on perceptual access to cognitive information. Thus, the semantic coherence requirement may be too strong.

In any event, recent evidence suggests that there are pre-cueing effects that avoid Gross’s dilemma.40 To avoid it, we need a top-down effect on perception in which (i) the penetrating cognitive state is the right sort of state to serve as an epistemic reason for the resulting perceptual state, and (ii) the state’s influence on perception does not go by way of an attentional command (or any other non-doxastic state). Recent work on feature-based expectation suggests that there are top-down influences of this sort.

The distinction between feature-based attention and feature-based expectation is motivated by the observation that a feature’s relevance to an agent’s ongoing activities can depart from the agent’s beliefs about that feature’s probability (Summerfield & Egner 2016). We attend to features when they are germane to an ongoing task. If I am searching for my red shirt, then I will preferentially attend to red items (if any) in my visual field. I do this irrespective of my beliefs about the probability of redness in my environment. I attend as I do simply because redness is relevant to my current aims, and I think there is some chance (however small) that my red shirt is nearby.

40 To be clear: Gross (2017) is concerned only with attention. He does not argue against treating other pre-cueing effects as cognitive penetration. Gross (2017, 10) also accepts that feature-based expectation is distinct from FBA (see below), but doesn’t say whether he thinks it counts as cognitive penetration.

In many pre-cueing studies, the cued feature is both task-relevant and expected, making it difficult to disentangle FBA from feature-based expectation. (This is true of the Ling et al. (2009) study described earlier.) However, recent studies have dissociated a feature’s task-relevance from its probability of occurrence. This is important because if expectations about a feature influence perceptual processing even when the feature is task-irrelevant and unattended, then there are plausibly cases where a cognitive state suited to provide an epistemic reason (viz., an expectation) affects perception, and does so without mediation of an attentional command. An important fMRI study by Kok et al. (2012) supplies evidence of just this sort.

Kok et al. presented participants with an auditory cue that predicted the orientation of an upcoming grating stimulus with 75% accuracy. The participants’ task was to compare either the orientation or the contrast of this grating with that of a second grating presented shortly afterward (see fig. 9). Participants knew before the trial began which of these two tasks they would be performing. Kok et al. measured BOLD activity in V1, where the majority of visual neurons are sensitive to both orientation and contrast. Critically, they found that orientation expectations influenced V1 activity regardless of whether orientation was task-relevant (i.e., the subject had to make a comparative judgment about orientation) or task-irrelevant (i.e., the subject instead had to make a comparative judgment about contrast).

Expectation had two main effects. First, V1 activity was lower overall when the grating’s orientation was expected than when it was unexpected. Second, multi-voxel pattern analysis (MVPA) revealed that the representation of orientation was significantly sharpened when the grating’s orientation was expected: Roughly, a grating’s orientation could be more reliably decoded from V1 activity when its orientation was expected than when it was unexpected. Further analysis suggested that these two effects were likely caused by the systematic suppression of voxels tuned away from the expected orientation value. Kok et al. summarize their results as follows:

Whereas expectation leads to suppressed responses in V1, it concurrently enhances the amount of information about the orientation of the stimulus. Crucially, this pattern of results is exactly what is predicted by the “sharpening” hypothesis of expectation, in which bottom-up sensory signals that are incongruent with prior expectations are relatively suppressed. (268)

To emphasize, expectations sharpened the coding of orientation in V1 regardless of whether orientation was task-relevant (attended) or task-irrelevant (not attended). Thus, the Kok et al. study suggests that expectations can influence the visual processing of orientation without mediation by feature-based attention.41
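
To see how suppressing units tuned away from the expected value could simultaneously lower overall activity and improve decoding, consider the following toy simulation. Everything in it (the baseline, the tuning curves, the suppression profile, the winner-take-all readout) is an illustrative assumption of mine rather than anything from Kok et al.'s analysis; it shows only that the qualitative pattern they report is internally coherent.

```python
import numpy as np

# Toy simulation of the "sharpening" idea (illustrative assumptions only).
# Units are tuned to orientations; an expectation suppresses units tuned
# away from the expected orientation.

rng = np.random.default_rng(0)
preferred = np.linspace(0, 180, 60, endpoint=False)  # preferred orientations

def circular_distance(a, b):
    d = np.abs(a - b) % 180
    return np.minimum(d, 180 - d)

def population_response(stimulus, expected=None, baseline=0.5, noise_sd=0.6, width=20.0):
    """Baseline plus noisy Gaussian tuning; if an expectation is given,
    responses of units tuned away from the expected orientation are scaled down."""
    resp = baseline + np.exp(-circular_distance(preferred, stimulus) ** 2 / (2 * width ** 2))
    resp = resp + rng.normal(0.0, noise_sd, preferred.size)
    if expected is not None:
        gain = 0.4 + 0.6 * np.exp(-circular_distance(preferred, expected) ** 2 / (2 * width ** 2))
        resp = resp * gain  # suppression of off-expectation units
    return resp

def decode(resp):
    """Crude readout: the preferred orientation of the most active unit."""
    return preferred[np.argmax(resp)]

stimulus = 45.0
unexpected = [population_response(stimulus) for _ in range(1000)]
expected = [population_response(stimulus, expected=45.0) for _ in range(1000)]

print("mean activity (unexpected, expected):",
      round(float(np.mean(unexpected)), 3), round(float(np.mean(expected)), 3))
print("mean decoding error in degrees (unexpected, expected):",
      round(float(np.mean([circular_distance(decode(r), stimulus) for r in unexpected])), 1),
      round(float(np.mean([circular_distance(decode(r), stimulus) for r in expected])), 1))
```

On these assumptions, the expected condition yields both a lower mean response and a smaller decoding error, mirroring the two effects reported above.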

Insert figure 9 about here

It is also noteworthy that when orientation was task-relevant, expectations enhanced subjects’ perceptual discrimination performance. Specifically, when judging orientation, subjects were more sensitive to small differences between the two gratings if the first grating’s orientation was expected than if it was unexpected. The fact that feature-based expectation both affected the activation of visual brain regions selective for orientation and had a significant influence on discrimination ability is particularly important. A common complaint against appeals to descending neural pathways to support cognitive penetration is that we cannot be sure what functional role signals along these pathways are serving (e.g., Pylyshyn 1999; Firestone & Scholl 2016a, 4). Because visual brain regions serve other functions beyond perceptual processing (e.g., visual imagery), we can’t be sure that an effect on visual brain regions is also an effect on perception. However, the Kok et al. behavioral evidence suggests that top-down signals sent to visual brain areas enhanced sensitivity to fine-grained differences in orientation within those areas. This is an effect on the perceptual processing of orientation: The visual system encodes orientations with greater precision when they are expected than when they are not expected, and this improves discrimination. Thus, the significance of the study is not just that it illustrates a top-down pathway in the brain, but that the physiological effect has a coherent upshot for perceptual processing.42

41 It might be objected that we cannot be sure that expectation effects really bypassed attention in this study. What if orientation values were simply attended more when they were expected than when they were unexpected, irrespective of whether orientation was task-relevant? One reason to doubt this is that the effects of feature-based expectation were distinct from those standardly exerted by attention. While attention and expectation both produce sharpened neural representation (and increased perceptual sensitivity), attention tends to increase overall activity, while expectation without attention reduces it (Kok et al. 2011; Kumar et al. 2017). Furthermore, other work suggests that while attention and expectation both increase visual sensitivity to certain features, they do so in different ways: Expectation has its most pronounced effect on sensitivity when signal-to-noise ratio is low, while the reverse is true for attention (Wyart et al. 2012).

Feature-based expectation seems to furnish a compelling case for cognitive penetration, even granting the appropriateness of Gross’s restrictive criteria. Expectations guide perceptual processing much as epistemic reasons would. Here is how it works: The belief that F (e.g., a particular orientation) is probable in the current context leads perceptual computations to treat F as more probable than they would have otherwise. This is implemented through suppression of those perceptual circuits that respond to non-F, leading F-responsive units to have higher activation (other things being equal) than non-F-responsive units. Assuming that the relative strength of population response to two features corresponds to how probable perceptual processes take those two features to be, this is a semantically coherent effect of cognition on perception. Beliefs about probability help to guide the perceptual assignment of probability. Thus, feature-based expectation provides compelling evidence for a direct, semantically coherent influence of cognition on perception.

42 The study is important for one more reason. Some have attempted to explain pre-cueing effects on visual processing by appeal to bottom-up perceptual priming either within a trial (as a result of the cue) or from one trial to the next (as a result of the stimulus perceived on the previous trial) (Theeuwes 2013). However, two features of the Kok et al. (2012) study preclude this explanation. First, the study employed an arbitrary cue (tone pitch) that had no obvious perceptual similarity to the cued feature (grating orientation). Second, the cued and presented orientation changed unpredictably from trial to trial. Thus, the results can’t be explained by within- or between-trial perceptual priming.
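
One way to make this explicit, on my reconstruction rather than in any formalism the authors themselves offer, is in broadly Bayesian terms: the expectation supplies a prior over feature values that combines multiplicatively with the sensory evidence, and the suppression of non-F circuits is what implementing that combination looks like at the population level.

```latex
% Schematic reconstruction (my gloss, not the authors' formalism): the
% cognitive expectation supplies the prior; the stimulus supplies the likelihood.
p_{\mathrm{perc}}(F \mid e) \;\propto\; \underbrace{p_{\mathrm{cog}}(F)}_{\text{expectation}} \;\times\; \underbrace{p(e \mid F)}_{\text{sensory evidence}}
```

On this reading, the content of the expectation stands in just the sort of probabilistic relation to the content of the resulting perceptual state that the semantic coherence condition demands.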

The argument is not decisive. One might insist that feature-based expectation, like attention, is always indirect. Perhaps, for instance, expectations can influence perceptual processing only by way of “expectation commands” (e.g., expect F). But this claim would be unmotivated without independent evidence for such intermediate states. The argument also relies on a particular content-level interpretation of the activation patterns observed by Kok et al. (2012)—namely, that they correspond to (at least implicit) probability assignments.43 Perhaps a devotee of impenetrability will defend a different interpretation. However, the content-level interpretation is buttressed by the relation between physiological and behavioral results. The proposal that feature-based expectations influence probability assignments in V1 nicely explains why subjects were more sensitive to fine-grained differences in orientation when making judgments about that dimension.44 In any case, I emphasize again that I don’t need to decisively refute the impenetrability thesis. I contend only that we should have serious doubts about it, and that this motivates the transition to DRH.

Fortunately, the foregoing findings are perfectly consistent with DRH. The finding that expectations can modulate the representation of orientation in V1 is not surprising (we already knew that V1 represents orientation). Likewise for top-down influences on motion representation in MT (Serences & Boynton 2007; Patzwahl & Treue 2009), color representation in V4 (Bichot et al. 2005), and shape representation in lateral occipital cortex (Stokes et al. 2009). But not anything goes. Cognition affects perception only within the bounds of background representational architecture.

43 See Ritchie, Kaplan, and Klein (2017) for concerns about the use of MVPA to infer representational contents.
44 A further concern is that a later study by Kumar et al. (2017) actually found decreased classification accuracy for expected stimuli versus unexpected stimuli in the inferotemporal cortex (IT) of macaque monkeys. However, as Kumar et al. observe, these studies differed in a number of respects (Kumar et al. 2017, 1452; see also de Lange et al. 2018). It is reasonable to suppose that expectations might exert different effects on sensory processing at different levels of the cortical hierarchy (V1 in the case of Kok et al., IT in the case of Kumar et al.) and depending on the type of property that is expected (grating orientation in the case of Kok et al., complex familiar shapes in the case of Kumar et al.).

One might complain that the effects I’ve adduced in this section are unexciting—or at least not revolutionary. But that is precisely the point. DRH explains why such cases don’t require sweeping changes to how we think about the mind’s architecture. We can maintain an architectural division between perception and cognition even if information flows between them in both directions. What we need are constraints on the manner of information flow. This is what DRH provides.

6. Conclusion

This paper has defended an architectural approach to the perception-cognition border. According to the Dimension Restriction Hypothesis (DRH), perceptual processes are architecturally constrained to compute over a limited class of dimensions, while cognitive processes are not. This places a filter on any top-down influence on perception, and marks an architectural boundary between perception and cognition—a penetrable boundary, but a boundary nonetheless. I’ve also argued that DRH enjoys empirical advantages over its main rivals. The view comports with the salient limits on cognitive penetration in visual search and texture segregation, while unconstrained predictive processing approaches do not. Moreover, the view accommodates the substantial evidence for top-down influences produced by feature-based attention and feature-based expectation, while the cognitive impenetrability thesis does not.

If DRH is correct, then it is possible to uphold a perception-cognition border even if top-down effects on perception are pervasive. The view also promises to improve our understanding of the limits of perception by adjudicating controversial cases. For example, there has been extensive debate concerning whether capacities to apprehend biologically important categories like causation, agency, facial expressions, and numerosity count as perceptual or cognitive (Block 2014; Briscoe 2015; Burge 2014; Carey 2009; Rips 2011; Shea 2014; Siegel 2010). If DRH is right, then the answer depends on whether the processes that produce representations of these categories are dimensionally restricted. If so, then we should conclude that they are perceptual. In fact, those who contend that causation and animacy are genuinely perceived often emphasize that the apprehension of these properties depends on a relatively small, bounded class of cue variables. Changes in these variables result in predictable changes in representation of the high-level property (Scholl & Gao 2013; Kominsky et al. 2017). DRH helps us see why these sorts of arguments have merit.

There is an element of bootstrapping here. We support a theory of the perception-cognition border by appeal to the uncontroversial cases, and then we use the theory to help adjudicate the controversial ones. Naturally, if someone thinks that there are uncontroversial cases of perceptual processing that don’t meet DRH’s criteria, then they will object to using the view as an arbiter in this way. But there are other reasons one might balk. One might accept dimension restriction as a necessary condition for perception, but hold that other features (e.g., automaticity or stimulus-dependence) are also required for a process to count as perceptual. If so, then verifying that a process is dimension-restricted would not immediately show that it is perceptual.45 I am open to the possibility that further conditions may need to be added to DRH. However, I predict that dimension restriction will remain at the core of any successor view, since DRH specifies minimal conditions for architectural division: constraints on information flow from cognition to perception. In any case, it seems best to start minimal and add further conditions only if they are absolutely needed.46

45 Recall from section 1, however, that I do not intend DRH to provide sufficient conditions for being perceptual. Rather, the aim is to supply criteria for determining whether a state is perceptual or cognitive provided that we are antecedently confident that it is one or the other. Nonetheless, one might contend that dimension restriction is not sufficient even for this more modest purpose.
46 But what if it should turn out, on further investigation, that dimension restriction is not even necessary for perception? Then DRH would obviously fall short of its bold ambitions, but this wouldn’t render the view insignificant or uninteresting. For it would still be plausible that dimension restriction is a prototypical marker of the perceptual, and prototypical markers can still be useful in adjudicating controversial cases (e.g., Block forthcoming, ch. 2). Thus, even if you think DRH has counterexamples, you may still find important work for its criteria.

Finally, the relations between DRH and various other views of the perception-cognition border remain to be explored. It has been claimed, for instance, that perception and cognition differ in their representational format: Perhaps allowing a few exceptions, perceptual representations are iconic while cognitive representations are discursive (Block forthcoming; Burge 2010; although see Green & Quilty-Dunn 2017). While it’s possible that DRH will turn out to be extensionally compatible with a format-based conception of the perception-cognition border, there is no a priori reason to expect this, since dimension restriction is compatible with either iconic or discursive format. Another recent proposal is that perceptual states function to be stimulus-dependent, while cognitive states do not (Beck 2018). Again, DRH is not necessarily inconsistent with this view, but it would be a genuine discovery if dimensionally restricted processes aligned cleanly with stimulus-dependent processes. If the views do diverge, then their advantages must be weighed. Perhaps one approach is simply superior to the others, or perhaps a type of pluralism about the perception-cognition border is warranted. Nonetheless, these issues must be left for another time.

References

Anderson, Barton L. 1997. “A theory of illusory lightness and transparency in monocular and binocular images: The role of contour junctions.” Perception 26, no. 4: 419-453. Andrews, Timothy J., Denis Schluppeck, Dave Homfray, Paul Matthews, and Colin Blakemore. 2002. “Activity in the fusiform gyrus predicts conscious perception of Rubin’s vase-face illusion.” NeuroImage 17: 890-901. Balcetis, Emily, and David Dunning. 2010. “Wishful seeing: More desired objects are seen as closer.” Psychological Science 21, no. 1: 147-152. Barrett, H. Clark, and Robert Kurzban. 2006. “Modularity in cognition: Framing the debate. Psychological Review 113, no. 3: 628-647. Beck, Jacob. 2018. “Marking the perception–cognition boundary: The criterion of stimulus- dependence.” Australasian Journal of 96, no. 2: 319-334. Bergen, James R., and Bela Julesz. 1983. “Parallel versus serial processing in rapid pattern discrimination.” Nature 303, no. 5919: 696-698. Bichot, Narcisse P., Andrew F. Rossi, and Robert Desimone. 2005. “Parallel and serial neural mechanisms for visual search in macaque area V4.” Science 308, no. 5721: 529-534. Block, Ned. 2014. “Seeing-as in the light of vision science.” Philosophy and Phenomenological Research 89, no. 3: 560-572. Block, Ned. 2016. “Tweaking the concepts of perception and cognition.” Behavioral and Brain Sciences 39: 21-22. Block, Ned. 2016. “The Anna Karenina principle and skepticism about unconscious perception.” Philosophy and Phenomenological Research 93, no. 2: 452-459. Block, Ned. 2018. “If perception is probabilistic, why does it not seem probabilistic?” Philosophical Transactions of the Royal Society B 373: 20170341. Block, Ned. forthcoming. The Border Between Seeing and Thinking. Cambridge, MA: MIT Press.

62 Born, Richard T., and David C. Bradley. 2005. “Structure and function of visual area MT.” Annual Review of 28: 157-189. Briscoe, Robert. 2015. “Cognitive penetration and the reach of phenomenal content.” In J. Zeimbekis and A. Raftopoulos (eds.), The Cognitive Penetrability of Perception, 174-199. Oxford: Oxford University Press. Buffalo, Elizabeth A., Pascal Fries, Rogier Landman, Hualou Liang, and Robert Desimone. 2010. “A backward progression of attentional effects in the ventral stream.” Proceedings of the National Academy of Sciences 107, no. 1: 361-365. Bugelski, B. R., and Delia A. Alampay. 1961. “The role of frequency in developing perceptual sets.” Canadian Journal of Psychology 14, no. 4: 205-211. Burge, Tyler. 2010. Origins of Objectivity. Oxford: Oxford University Press. Burge, Tyler. 2014. “Reply to Block: Adaptation and the upper border of perception.” Philosophy and Phenomenological Research 89, no. 3: 573-583. Burnston, Daniel C. 2017. “Cognitive penetration and the cognition-perception interface.” Synthese 194, no. 9: 3645-3668. Burnston, Daniel C., and Cohen, Jonathan. 2015. “Perceptual integration, modularity, and cognitive penetration.” In J. Zeimbekis and A. Raftopoulos (eds.), The Cognitive Penetrability of Perception: New Philosophical Perspectives, 123-143. Oxford: Oxford University Press. Butler, Andrea, Ipek Oruc, Christopher J. Fox, and Jason J.S. Barton. 2008. “Factors contributing to the adaptation aftereffects of facial expression.” Brain Research 1191: 116-126. Byrne, Alex. 2009. “Experience and content.” The Philosophical Quarterly 59, no. 236: 429-451. Cacciamani, Laura, Andrew J. Mojica, Joseph L. Sanguinetti, and Mary A. Peterson. 2014. “Semantic access occurs outside of awareness for the ground side of a figure.” Attention, Perception, & Psychophysics 76: 2531-2547. Carey, Susan. 2009. The Origin of Concepts. Oxford: Oxford University Press. Carruthers, Peter. 2006a. “Simple heuristics meet massive modularity.” In P. Carruthers, S. Laurens, and S. Stich (eds.), The Innate Mind: Culture and Cognition, 181-198. Oxford: OUP. Carruthers, Peter. 2006b. The Architecture of the Mind. Oxford: Oxford University Press. Chelazzi, Leonardo, Earl K. Miller, John Duncan, and Robert Desimone. 2001. “Responses of neurons in macaque area V4 during memory-guided visual search.” Cerebral Cortex 11, no. 8: 761-772. Chiappe, Dan L. 2000. “Metaphor, modularity, and the evolution of conceptual integration.” Metaphor and Symbol 15, no. 3: 137-158. Clark, Andy. 2013. “Whatever next? Predictive , situated agents, and the future of cognitive science.” Behavioral and Brain Sciences 36: 181-204. Clark, Andy. (2015). Surfing Uncertainty: Prediction, Action, and the Embodied Mind. Oxford: Oxford University Press. Clark, Andy. 2016. “Attention alters predictive processing.” Behavioral and Brain Sciences 39: 23-24. Clark, Austen. 1993. Sensory Qualities. Oxford: Oxford University Press. Cohen, Michael A., George A. Alvarez, Ken Nakayama, and Talia Konkle. 2017. “Visual search for object categories is predicted by the representational architecture of high-level visual cortex.” Journal of 117: 388-402. Cosmides, Leda, and John Tooby. 2005. “Neurocognitive adaptations designed for social exchange.” In D. M. Buss (ed.), Handbook of Evolutionary Psychology, 584-627. Hoboken, NJ: Wiley. Cummins, Robert. C. 1983. The Nature of Psychological Explanation. Cambridge, MA: MIT Press. Currie, Gregory, and Sterelny, Kim. 2000. 
“How to think about the modularity of mind-reading.” The Philosophical Quarterly 50, no. 199: 145-160. Davis, Greg, and Jon Driver. 1998. “Kanizsa subjective figures can act as occluding surfaces at

63 parallel stages of visual search.” Journal of Experimental Psychology: Human Perception and Performance 24, no. 1: 169-184. de Lange, Floris P., Micha Heilbron, and Peter Kok. 2018. “How do expectations shape perception?” Trends in Cognitive Sciences 22, no. 9: 764-779. Delorme, Arnaud, Ghislaine Richard, and Michele Fabre-Thorpe. 2010. “Key visual features for rapid categorization of animals in natural scenes.” Frontiers in Psychology 1: 1-13. Den Ouden, Hanneke EM, Peter Kok, and Floris P. De Lange. 2012. “How prediction errors shape perception, attention, and motivation.” Frontiers in Psychology 3: 1-12. Drayson, Zoe. 2017. “Modularity and the predictive mind.” Philosophy and Predictive Processing. Frankfurt am Main: MIND Group. DOI: 10.15502/9783958573130. Duncan, John, and Glyn W. Humphreys. 1989. “Visual search and stimulus similarity.” Psychological Review 96, no. 3: 433-458. Durgin, Frank H., Jodie A. Baird, Mark Greenburg, Robert Russell, Kevin Shaughnessy, and Scott Waymouth. 2009. “Who is being deceived? The experimental demands of wearing a backpack.” Psychonomic Bulletin & Review 16, no. 5: 964-969. Durgin, Frank H., Dinah DeWald, Stephanie Lechich, Zhi Li, and Zachary Ontiveros. 2011. “Action and motivation: Measuring perception or strategies?” Psychonomic Bulletin & Review 18, no. 6: 1077-1082. Eddon, Maya. 2013. “Quantitative properties.” Philosophy Compass 8, no. 7: 633-645. Ernst, Marc O. 2007. “Learning to integrate arbitrary signals from vision and touch.” Journal of Vision 7, no. 5: 1-14. Evans, Karla K., and Anne Treisman. 2005. “Perception of objects in natural scenes: is it really attention free?” Journal of Experimental Psychology: Human Perception and Performance 31, no. 6: 1476-1492. Firestone, Chaz. 2013. “How “paternalistic” is spatial perception? Why wearing a heavy backpack doesn’t—and couldn’t—make hills look steeper.” Perspectives on Psychological Science 8, no. 4: 455-473. Firestone, Chaz, and Scholl, Brian J. 2016a. “Cognition does not affect perception: Evaluating the evidence for “top-down” effects.” Behavioral and Brain Sciences 39: 1-19. Firestone, Chaz, and Scholl, Brian J. 2016b. “Seeing and thinking: Foundational issues and empirical horizons.” Behavioral and Brain Sciences 39: 53-67. Fodor, Jerry A. 1983. The Modularity of Mind. Cambridge, MA: MIT Press. Fodor, Jerry A. 2000. The Mind Doesn’t Work That Way. Cambridge, MA: MIT Press. Fodor, Jerry A. 2008. LOT 2: The Language of Thought Revisited. Oxford: Oxford University Press. Gärdenfors, Peter. 2000. Conceptual Spaces: The Geometry of Thought. Cambridge, MA: MIT Press. Geisler, Wilson S., Jeffrey S. Perry, B. J. Super, and D. P. Gallogly. 2001. “Edge co-occurrence in natural images predicts contour grouping performance.” Vision Research 41, no. 6: 711-724. Gigerenzer, Gerd, Peter M. Todd, and the ABC Research Group. 1999. Simple Heuristics That Make Us Smart. Oxford: Oxford University Press. Green, E. J., and Jake Quilty-Dunn. 2017. “What is an object file?” The British Journal for the Philosophy of Science. DOI: axx055. Gross, Steven. 2017. “Cognitive penetration and attention.” Frontiers in Psychology 8: 1-12. Hills, Peter J., David A. Ross, and Michael B. Lewis. 2011. Attention misplaced: The role of diagnostic features in the face-inversion effect. Journal of Experimental Psychology: Human Perception and Performance 37, no. 5: 1396-1406. Hohwy, Jakob. 2013. The Predictive Mind. Oxford: Oxford University Press. Hsiao, Jhih-Yun, Yi-Chuan Chen, Charles Spence, and Su-Ling Yeh. 2012. 
“Assessing the effects of

64 audiovisual semantic congruency on the perception of a bistable figure.” and Cognition 21, no. 2: 775-787. Julesz, Bela. 1981. “Textons, the elements of texture perception, and their interactions.” Nature 290, no. 5802: 91-97. Kanai, Ryota, Bahador Bahrami, and Geraint Rees. 2010. “Human parietal cortex structure predicts individual differences in perceptual rivalry.” Current Biology 20, no. 18: 1626-1630. Kanwisher, Nancy, and Galit Yovel. 2006. “The fusiform face area: A cortical region specialized for the perception of faces.” Philosophical Transactions of the Royal Society B 361: 2109-2128. Kleffner, Dorothy A., and V. S. Ramachandran. 1992. “On the perception of shape from shading.” Perception & Psychophysics 52, no. 1: 18-36. Kok, Peter, Janneke FM Jehee, and Floris P. De Lange. 2012. “Less is more: expectation sharpens representations in the primary visual cortex.” Neuron 75, no. 2: 265-270. Kok, Peter, Dobromir Rahnev, Janneke FM Jehee, Hakwan C. Lau, and Floris P. De Lange. 2011. “Attention reverses the effect of prediction in silencing sensory signals.” Cerebral Cortex 22, no. 9: 2197-2206. Kominsky, Jonathan F., Brent Strickland, Annie E. Wertz, Claudia Elsner, Karen Wynn, and Frank C. Keil. 2017. “Categories and constraints in causal perception.” Psychological Science 28, no. 11: 1649-1662. Kosslyn, Stephen M., William L. Thompson, Irene J. Kim, and Nathaniel M. Alpert. 1995. “Topographical representations of mental images in primary visual cortex.” Nature 378: 496- 498. Kumar, Susheel, Peter Kaposvari, and Rufin Vogels. 2017. “Encoding of predictable and unpredictable stimuli by inferior temporal cortical neurons.” Journal of 29, no. 8: 1445-1454. Leopold, David A., & Nikos K. Logothetis. 1999. “Multistable phenomena: Changing views in perception.” Trends in Cognitive Sciences 3, no. 7: 254-264. Levin, Daniel T., Yukari Takarae, Andrew G. Miner, and Frank Keil. 2001. “Efficient visual search by category: Specifying the features that mark the difference between artifacts and animals in preattentive vision.” Perception & Psychophysics 63, no. 4: 676-697. Li, Hui, Yan Bao, Ernst Pöppel, and Yi-Huang Su. 2014. “A unique visual rhythm does not pop out.” Cognitive Processing 15, no. 1: 93-97. Ling, Sam, Taosheng Liu, and Marisa Carrasco. 2009. “How spatial and feature-based attention affect the gain and tuning of population responses.” Vision Research 49, no. 10: 1194-1204. Long, Bria, Viola S. Störmer, & George A. Alvarez. 2017. “Mid-level perceptual features contain early cues to animacy.” Journal of Vision 17, no. 6: 20, 1-20. Long, Gerald M., & Thomas C. Toppino. 2004. “Enduring interest in perceptual ambiguity: Alternating views of reversible figures.” Psychological Bulletin 130, no. 5: 748-768. Lu, Zhong-Lin, and George Sperling. 2001. “Three-systems theory of human visual motion perception: review and update.” Journal of the Optical Society of America A 18, no. 9: 2331-2370. Luck, Steven J., and Edward K. Vogel. 1997. “The capacity of visual working memory for features and conjunctions.” Nature 390, no. 6657: 279-281. Lupyan, Gary. 2015a. “Cognitive penetrability of perception in the age of prediction: Predictive systems are penetrable systems.” Review of Philosophy and Psychology 6, no. 4: 547-569. Lupyan, Gary. 2015b. “Reply to Macpherson: Further illustrations of the cognitive penetrability of perception.” Review of Philosophy and Psychology 6, no. 4: 585-589. Lupyan, Gary. 2017. 
“Changing what you see by changing what you know: The role of attention.” Frontiers in Psychology 8: 1-15. Lupyan, Gary, and Clark, Andy. 2015. “Words and the world: Predictive coding and the language-

65 perception-cognition interface.” Current Directions in Psychological Science 24, no. 4: 279-284. Lupyan, Gary, and Michael J. Spivey. 2008. “Perceptual processing is facilitated by ascribing meaning to novel stimuli.” Current Biology 18, no. 10: R410-R412. Lupyan, Gary, and Emily J. Ward. 2013. “Language can boost otherwise unseen objects into visual awareness.” Proceedings of the National Academy of Sciences 110, no. 35: 14196-14201. Macpherson, Fiona. 2012. “Cognitive penetration and colour experience: Rethinking the issue in light of an indirect mechanism.” Philosophy and Phenomenological Research, 84, no. 1: 24-62. Macpherson, Fiona. 2017. “The relationship between cognitive penetration and predictive coding.” Consciousness and Cognition 47: 6-16. Malinowski, Peter, and Ronald Hübner. “The effect of familiarity on visual-search performance: Evidence for learned basic features.” Perception & Psychophysics 63, no. 3: 458-463. Mamassian, Pascal, Michael Landy, and Laurence T. Maloney. 2002. “Bayesian modelling of visual perception.” In R. P. N. Rao, B. A. Olshausen, and M. S. Lewicki (eds.), Probabilistic Models of the Brain: Perception and Neural Function, 13-36. Cambridge, MA: MIT Press. Mandelbaum, Eric. 2017. “Seeing and conceptualizing: Modularity and the shallow contents of perception.” Philosophy and Phenomenological Research. DOI: 10.1111/phpr.12368. Marr, David. 1982. Vision: A Computational Investigation into the Human Representation and Processing of Visual Information. San Francisco, CA: W.H. Freeman. Martinez-Trujillo, Julio C., and Stefan Treue. 2004. “Feature-based attention increases the selectivity of population responses in primate visual cortex.” Current Biology 14, no. 9: 744-751. Masrour, Farid, Gregory Nirchberg, Michael Schon, Jason Leardi, and Emily Barrett. 2015. “Revisiting the empirical case against perceptual modularity.” Frontiers in Psychology 6: 1676. Meng, Ming, and Frank Tong. 2004. “Can attention selectively bias bistable perception? Differences between binocular rivalry and ambiguous figures.” Journal of Vision 4: 539-551. Mole, Christopher. 2015. “Attention and cognitive penetration.” In J. Zeimbekis and A. Raftopoulos (eds.), The Cognitive Penetrability of Perception: New Philosophical Perspectives, 218-238. Oxford: Onnnexford University Press. Nishida, Shin'ya. 2011. “Advancement of motion psychophysics: Review 2001-2010.” Journal of Vision 11, no. 5: 1-53. Ogilvie, Ryan, and Peter Carruthers. 2015. “Opening up vision: The case against encapsulation.” Review of Philosophy and Psychology 7, no. 4: 721-742. Olivers, Christian NL, Judith Peters, Roos Houtkamp, and Pieter R. Roelfsema. 2011. “Different states in visual working memory: When it guides attention and when it does not.” Trends in Cognitive Sciences 15, no. 7: 327-334. Orlandi, Nico, and Geoff Lee. 2018. “How radical is predictive processing?” In M. Colombo, E. Irvine, and M. Stapleton (eds.), and His Critics. Oxford: Oxford University Press. Patzwahl, Dieter R., and Stefan Treue. 2009. “Combining spatial and feature-based attention within the receptive field of MT neurons.” Vision Research 49, no. 10: 1188-1193. Peterson, Mary A., & Bradley S. Gibson. 1991. “Directing spatial attention within an object: Altering the functional equivalence of shape description. Journal of Experimental Psychology: Human Perception and Performance 17, no. 1: 170-182. Phillips, Ben. 2017. “The shifting border between perception and cognition.” Noûs. DOI: 10.1111/nous.12218. Phillips, Ian. 2018. 
“Unconscious perception reconsidered.” 59, no. 4: 471-514. Pinto, Yair, Simon van Gaal, Floris P. de Lange, Victor AF Lamme, and Anil K. Seth. 2015. “Expectations accelerate entry of visual stimuli into awareness.” Journal of Vision 15: 1-15. Potter, Mary C., Brad Wyble, Carl Erick Hagmann, and Emily S. McCourt. 2014. “Detecting meaning in RSVP at 13 ms per picture.” Attention, Perception, & Psychophysics 76: 270-279.

Pylyshyn, Zenon W. 1984. Computation and Cognition. Cambridge, MA: MIT Press.
Pylyshyn, Zenon W. 1999. “Is vision continuous with cognition? The case for cognitive impenetrability of visual perception.” Behavioral and Brain Sciences 22, no. 3: 341-365.
Quilty-Dunn, Jake. 2018. “Perceptual pluralism.” Noûs. DOI: 10.1111/nous.12285.
Quilty-Dunn, Jake. 2019. “Attention and encapsulation.” Mind & Language. DOI: 10.1111/mila.12242.
Rescorla, Michael. 2015. “Bayesian perceptual psychology.” In M. Matthen (ed.), The Oxford Handbook of Philosophy of Perception, 694-716. Oxford: Oxford University Press.
Rips, Lance J. 2011. “Causation from perception.” Perspectives on Psychological Science 6, no. 1: 77-97.
Ritchie, J. Brendan, David Michael Kaplan, and Colin Klein. 2017. “Decoding the brain: Neural representation and the limits of multivariate pattern analysis in cognitive neuroscience.” The British Journal for the Philosophy of Science. DOI: axx023.
Rosenholtz, Ruth, Jie Huang, Alvin Raj, Benjamin J. Balas, and Livia Ilie. 2012. “A summary statistic representation in peripheral vision explains visual search.” Journal of Vision 12, no. 4: 14, 1-17.
Rosenholtz, Ruth. 2015. “Texture perception.” In J. Wagemans (ed.), The Oxford Handbook of Perceptual Organization, 167-186. Oxford: Oxford University Press.
Rosenthal, David. 2010. “How to think about mental qualities.” Philosophical Issues 20: 368-393.
Samuels, Richard. 2006. “Is the human mind massively modular?” In R. Stainton (ed.), Contemporary Debates in Cognitive Science, 37-56. Oxford: Blackwell.
Schellenberg, Susanna. 2018. The Unity of Perception: Content, Consciousness, Evidence. Oxford: Oxford University Press.
Schmidt, Filipp, Mathias Hegele, and Roland W. Fleming. 2017. “Perceiving animacy from shape.” Journal of Vision 17, no. 11: 10, 1-15.
Scholl, Brian J. 2009. “What have we learned about attention from multiple object tracking (and vice versa)?” In D. Dedrick and L. Trick (eds.), Computation, Cognition, and Pylyshyn, 49-78. Cambridge, MA: MIT Press.
Scholl, Brian J., and Tao Gao. 2013. “Perceiving animacy and intentionality: Visual processing or higher-level judgment?” In M. D. Rutherford and V. A. Kuhlmeier (eds.), Social Perception: Detection and Interpretation of Animacy, Agency, and Intention. Cambridge, MA: MIT Press.
Schyns, Philippe G., Robert L. Goldstone, and Jean-Pierre Thibaut. 1998. “The development of features in object concepts.” Behavioral and Brain Sciences 21, no. 1: 1-17.
Serences, John T., and Geoffrey M. Boynton. 2007. “Feature-based attentional modulations in the absence of direct visual stimulation.” Neuron 55, no. 2: 301-312.
Shea, Nicholas. 2014. “Distinguishing top-down from bottom-up effects.” In D. Stokes and M. Matthen (eds.), Perception and Its Modalities, 73-91. Oxford: Oxford University Press.
Siegel, Susanna. 2010. The Contents of Visual Experience. Oxford: Oxford University Press.
Siegel, Susanna. 2012. “Cognitive penetrability and perceptual justification.” Noûs 46, no. 2: 201-222.
Siegel, Susanna, and Alex Byrne. 2016. “Rich or thin?” In B. Nanay (ed.), Contemporary Debates in the Philosophy of Perception. New York: Routledge.
Sigman, Mariano, and Charles D. Gilbert. 2000. “Learning to find a shape.” Nature Neuroscience 3, no. 3: 264-269.
Singh, Manish, and Donald D. Hoffman. 2001. “Part-based representations of visual shape and implications for visual cognition.” Advances in Psychology 130: 401-459.
Slotnick, Scott D., William L. Thompson, and Stephen M. Kosslyn. 2005. “Visual mental imagery induces retinotopically organized activation of early visual areas.” Cerebral Cortex 15, no. 10: 1570-1583.

Slotnick, Scott D., and Steven Yantis. 2005. “Common neural substrates for the control and effects of visual attention and perceptual bistability.” Cognitive Brain Research 24, no. 1: 97-108.
Sperber, Dan. 2002. “In defense of massive modularity.” In I. Dupoux (ed.), Language, Brain, and Cognitive Development, 47-57. Cambridge, MA: MIT Press.
Sperber, Dan, and Deirdre Wilson. 2002. “Pragmatics, modularity, and mind-reading.” Mind & Language 17: 3-23.
Stein, Timo, and Marius V. Peelen. 2015. “Content-specific expectations enhance stimulus detectability by increasing perceptual sensitivity.” Journal of Experimental Psychology: General 144, no. 6: 1089-1104.
Stein, Timo, and Marius V. Peelen. 2017. “Object detection in natural scenes: Independent effects of spatial and category-based attention.” Attention, Perception, & Psychophysics 79, no. 3: 738-752.
Sterzer, Philipp, Andreas Kleinschmidt, and Geraint Rees. 2009. “The neural bases of multistable perception.” Trends in Cognitive Sciences 13, no. 7: 310-318.
Stokes, Dustin. 2013. “Cognitive penetrability of perception.” Philosophy Compass 8, no. 7: 646-663.
Stokes, Dustin. 2015. “Towards a consequentialist understanding of cognitive penetration.” In J. Zeimbekis and A. Raftopoulos (eds.), The Cognitive Penetrability of Perception: New Philosophical Perspectives, 75-100. Oxford: Oxford University Press.
Stokes, Mark, Russell Thompson, Anna C. Nobre, and John Duncan. 2009. “Shape-specific preparatory activity mediates attention to targets in human visual cortex.” Proceedings of the National Academy of Sciences 106, no. 46: 19569-19574.
Störmer, Viola S., and George A. Alvarez. 2014. “Feature-based attention elicits surround suppression in feature space.” Current Biology 24, no. 17: 1985-1988.
Summerfield, Christopher, and Tobias Egner. 2016. “Feature-based attention and feature-based expectation.” Trends in Cognitive Sciences 20, no. 6: 401-404.
Teufel, Christoph, and Bence Nanay. 2017. “How to (and how not to) think about top-down influences on visual perception.” Consciousness and Cognition 47: 17-25.
Theeuwes, Jan. 1995. “Abrupt luminance change pops out; abrupt color change does not.” Perception & Psychophysics 57, no. 5: 637-644.
Theeuwes, Jan. 2013. “Feature-based attention: It is all bottom-up priming.” Philosophical Transactions of the Royal Society B 368: 20130055.
Tong, Frank, Ken Nakayama, J. Thomas Vaughan, and Nancy Kanwisher. 1998. “Binocular rivalry and visual awareness in human extrastriate cortex.” Neuron 21, no. 4: 753-759.
Treisman, Anne. 1985. “Preattentive processing in vision.” Computer Vision, Graphics, and Image Processing 31, no. 2: 156-177.
Treisman, Anne, and Stephen Gormican. 1988. “Feature analysis in early vision: Evidence from search asymmetries.” Psychological Review 95, no. 1: 15-48.
Treisman, Anne, and Randolph Paterson. 1984. “Emergent features, attention, and object perception.” Journal of Experimental Psychology: Human Perception and Performance 10, no. 1: 12-31.
Treue, Stefan. 2015. “Object- and feature-based attention: Monkey physiology.” In A. C. Nobre and S. Kastner (eds.), The Oxford Handbook of Attention, 601-619. Oxford: Oxford University Press.
van Tonder, Gert J., and Yoshimichi Ejima. 2000. “Bottom-up clues in target finding: Why a Dalmatian may be mistaken for an elephant.” Perception 29: 149-157.
Vetter, Petra, and Albert Newen. 2014. “Varieties of cognitive penetration in visual perception.” Consciousness and Cognition 27: 62-75.
Webster, Michael A., and Donald I. A. MacLeod. 2011. “Visual adaptation and face perception.” Philosophical Transactions of the Royal Society B: Biological Sciences 366, no. 1571: 1702-1725.
Wolfe, Jeremy M. 1992. ““Effortless” texture segmentation and “parallel” visual search are not the same thing.” Vision Research 32, no. 4: 757-763.
Wolfe, Jeremy M. 2007. “Guided search 4.0: Current progress with a model of visual search.” In W. D. Gray (ed.), Integrated Models of Cognitive Systems, 99-119. Oxford: Oxford University Press.
Wolfe, Jeremy M. 2015. “Approaches to visual search: Feature integration theory and guided search.” In A. C. Nobre and S. Kastner (eds.), The Oxford Handbook of Attention, 11-55. Oxford: Oxford University Press.
Wolfe, Jeremy M., and Jennifer S. DiMase. 2003. “Do intersections serve as basic features in visual search?” Perception 32, no. 6: 645-656.
Wolfe, Jeremy M., and Todd S. Horowitz. 2004. “What attributes guide the deployment of visual attention and how do they do it?” Nature Reviews Neuroscience 5, no. 6: 495-501.
Wolfe, Jeremy M., and Todd S. Horowitz. 2017. “Five factors that guide attention in visual search.” Nature Human Behaviour 1, no. 3: 0058.
Wolfe, Jeremy M., and Loretta Myers. 2010. “Fur in the midst of the waters: Visual search for material type is inefficient.” Journal of Vision 10, no. 9: 1-9.
Woodman, Geoffrey F., and Edward K. Vogel. 2008. “Selective storage and maintenance of an object’s features in visual working memory.” Psychonomic Bulletin & Review 15, no. 1: 223-229.
Wu, Wayne. 2017. “Shaking up the mind’s ground floor: The cognitive penetration of visual attention.” The Journal of Philosophy 114, no. 1: 5-32.
Wyart, Valentin, Anna Christina Nobre, and Christopher Summerfield. 2012. “Dissociable prior influences of signal probability and relevance on visual contrast sensitivity.” Proceedings of the National Academy of Sciences 109, no. 9: 3593-3598.


Figure 1. Assembly line. Objects enter the assembly line and are sorted by size. The big objects are passed to the temperature sorter, while the small objects are passed to the weight sorter. The whole process has four possible outputs.
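
The routing structure of Figure 1 can be made fully explicit with a short computational sketch. The following Python fragment is purely illustrative: the figure fixes only the branching pattern (sort by size first, then by temperature or by weight, yielding four possible outputs), so the item fields, threshold values, and output labels used below are hypothetical placeholders rather than anything specified in the text.

```python
from dataclasses import dataclass


@dataclass
class Item:
    size: float         # arbitrary units (hypothetical)
    temperature: float  # degrees Celsius (hypothetical)
    weight: float       # kilograms (hypothetical)


# Placeholder cutoffs; Figure 1 specifies the branching, not numerical thresholds.
SIZE_CUTOFF = 10.0
TEMP_CUTOFF = 50.0
WEIGHT_CUTOFF = 5.0


def sort_item(item: Item) -> str:
    """Route an item through the two-stage assembly line depicted in Figure 1.

    Stage 1 sorts by size. Big items are passed to the temperature sorter,
    small items to the weight sorter. Each branch splits once more, so the
    whole process has exactly four possible outputs.
    """
    if item.size >= SIZE_CUTOFF:
        # Big objects: sorted by temperature.
        return "big-hot" if item.temperature >= TEMP_CUTOFF else "big-cold"
    # Small objects: sorted by weight.
    return "small-heavy" if item.weight >= WEIGHT_CUTOFF else "small-light"


if __name__ == "__main__":
    print(sort_item(Item(size=12.0, temperature=80.0, weight=3.0)))  # big-hot
    print(sort_item(Item(size=4.0, temperature=20.0, weight=7.0)))   # small-heavy
```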

Figure 2. The dots that are brighter toward the top tend to appear convex, while those brighter toward the bottom tend to appear concave. The standard explanation is that the visual system assumes a higher probability for light source locations above the perceiver. Source: Kleffner & Ramachandran (1992). Reproduced with permission of Springer Nature.
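
The caption’s appeal to “a higher probability for light source locations above the perceiver” invites a simple Bayesian reading, on which an overhead-light prior turns a bright-on-top shading gradient into evidence for convexity. The toy Python sketch below illustrates that inference. The 0.9 prior, the flat prior over shape, and the simplified likelihood are assumptions introduced for illustration; they are not parameters from Kleffner & Ramachandran (1992).

```python
# Toy Bayesian gloss on the effect in Figure 2: an overhead-light prior makes a
# bright-on-top shading gradient evidence for convexity. All numbers are
# illustrative assumptions, not estimates from the original study.

P_LIGHT_ABOVE = 0.9                        # hypothetical prior on overhead light
P_SHAPE = {"convex": 0.5, "concave": 0.5}  # flat prior over surface shape


def likelihood(shading: str, shape: str) -> float:
    """P(shading | shape), marginalizing over the light-source location.

    In this simplified model, a convex surface is bright on top exactly when
    the light is above it, and a concave surface is bright on top exactly when
    the light is below it.
    """
    if shape == "convex":
        return P_LIGHT_ABOVE if shading == "bright_on_top" else 1 - P_LIGHT_ABOVE
    return 1 - P_LIGHT_ABOVE if shading == "bright_on_top" else P_LIGHT_ABOVE


def posterior(shading: str) -> dict:
    """Posterior over shape given the observed shading gradient (Bayes' rule)."""
    unnormalized = {s: likelihood(shading, s) * P_SHAPE[s] for s in P_SHAPE}
    total = sum(unnormalized.values())
    return {s: p / total for s, p in unnormalized.items()}


print(posterior("bright_on_top"))     # convexity favored (0.9 vs. 0.1)
print(posterior("bright_on_bottom"))  # concavity favored (0.1 vs. 0.9)
```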


Figure 3a. Image by R. C. James. Figure 3b. Source: Lupyan (2017). Reproduced under the terms of the Creative Commons Attribution License.

Figure 4. Original and texform images used by Long et al. (2017). Source: Long et al. (2017). Reproduced under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs license.


Figure 5a (left): Rat-man image from Bugelski and Alampay (1961). Figure 5b (middle): Necker cube. Figure 5c (right): Rubin’s face-vase image.

Figure 6. Modified assembly line.

Figure 7a (left) and 7b (right). Search for +-shaped intersections is inefficient. Source: Wolfe and DiMase (2003). Reproduced with permission of SAGE Publications.

Figure 8. Regions with randomly oriented L’s and X’s are effortlessly segmented from one another (top half), while regions with randomly oriented L’s and T’s are not. Source: Bergen & Julesz (1983). Reprinted with permission of Springer Nature.

Figure 9. Overview of a trial in the Kok et al. (2012) study. An auditory cue predicted the likely orientation of the first of two black-and-white striped gratings with 75% reliability. The second grating differed slightly from the first in both orientation and contrast. At the end of a trial, participants performed either an orientation discrimination task (judge whether the second grating was rotated clockwise or counterclockwise relative to the first) or a contrast discrimination task (judge whether the second grating had higher or lower contrast than the first).
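
For readers who want the trial structure spelled out, the following Python sketch generates trials with the design described above. Only the 75% cue validity is taken from the caption; the tone-to-orientation mapping, the baseline contrast, and the sizes of the orientation and contrast offsets are hypothetical placeholders rather than the parameters used by Kok et al. (2012).

```python
import random

# Schematic sketch of the trial structure in Figure 9. Only the 75% cue
# validity comes from the caption; every other value is a placeholder.

CUE_VALIDITY = 0.75
CUED_ORIENTATIONS = {"low_tone": 45.0, "high_tone": 135.0}  # hypothetical mapping


def make_trial(rng: random.Random) -> dict:
    """Generate one trial: auditory cue, two gratings, and a task assignment."""
    cue = rng.choice(sorted(CUED_ORIENTATIONS))
    predicted = CUED_ORIENTATIONS[cue]

    # The cue predicts the first grating's orientation with 75% reliability.
    first_orientation = predicted if rng.random() < CUE_VALIDITY else 180.0 - predicted
    first_contrast = 0.7  # placeholder baseline contrast

    # The second grating differs slightly from the first in both respects.
    second_orientation = first_orientation + rng.choice([-3.0, 3.0])
    second_contrast = first_contrast + rng.choice([-0.05, 0.05])

    # At the end of the trial, participants perform one of two discrimination tasks.
    task = rng.choice(["orientation", "contrast"])
    return {
        "cue": cue,
        "first": (first_orientation, first_contrast),
        "second": (second_orientation, second_contrast),
        "task": task,
        "cue_valid": first_orientation == predicted,
    }


rng = random.Random(0)
trials = [make_trial(rng) for _ in range(400)]
print(sum(t["cue_valid"] for t in trials) / len(trials))  # roughly 0.75
```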
