Normalized

Jeremy Goodman and Bernhard Salow

Draft of November 13, 2020

Curious how much he weighs, Bjorn steps on an inexpensive digital scale. He discovers that it reads 175.4 pounds, but this isn't all that he learns. He also learns something about how much he weighs. This knowledge has limits. Even if Bjorn weighs exactly 175.4 pounds, he doesn't learn anything so precise. The scale isn't reliable enough for that, as the fluctuations in its rightmost digits show. Bjorn learns only that his weight is in a certain interval around 175.4 pounds. The size of this interval depends on the workings of the scale. It also depends on Bjorn's weight, since Bjorn cannot learn what isn't true. While the details of this dependence of knowledge on weight are controversial, some systematic patterns are clear. For example, if the scale overestimates his weight, then anything Bjorn learns is something he would still have learned if the scale had overestimated his weight by less.1

What explains these contours of Bjorn's knowledge? Here is our account.2 Not all situations in which the scale reads 175.4 are equally normal. Some are less normal than others, when they involve a greater disparity between Bjorn's weight and the scale's reading. If such a situation is sufficiently less normal than the one Bjorn is actually in, then he knows that he is not in that situation. And if such a situation either is not much less normal than any other, or is at least as normal as the situation he is actually in, then for all he knows he is in that situation. As we will show below, this account offers a simple explanation of Bjorn's knowledge and its limits.

Bjorn's case illustrates a much more general phenomenon. In theorizing about what people know, it is often productive to factor their knowledge into

1Following Williamson (2011, 2013a, 2014, forthcoming), a number of authors have offered formal models of cases like this; see Cohen and Comesaña (2013), Goodman (2013), Weatherson (2013), Stalnaker (2015), Carter (2019), Dutant and Rosenkranz (forthcoming) and Carter and Goldstein (forthcoming), all of whom agree on the above claims about Bjorn's knowledge. Later in this paper we will also consider inductive knowledge of a more statistical character, which has also been a topic of recent interest; see Dorr et al. (2014), Bacon (2014, forthcoming), Smith (2017, 2018a) and Goodman and Salow (2018).

2Related ideas about knowledge and normality can be found in Stalnaker (2006, 2015, 2019), Goodman (2013), Greco (2014), Dutant (2016), Goodman and Salow (2018), Carter (2019), Littlejohn and Dutant (2020), Beddor and Pavese (forthcoming), Carter and Goldstein (forthcoming), and Loets (ms). Smith (2010, 2016, 2017, 2018a,b) develops related ideas about justified belief, a topic we turn to in the second half of this paper. Connections between normality and belief have also been very influential in the literature on non-monotonic reasoning; see, e.g., Kraus et al. (1990) and Makinson (1993, §3), and references therein.

two components: knowledge of some starting points (such as instrument readings, memories, perceptual appearances, and other background certainties) and inductive knowledge that goes beyond these starting points. Call these starting points a person's evidence. Inductive knowledge is then knowledge that goes beyond one's evidence.3

This paper offers a general framework for theorizing about inductive knowledge in terms of the comparative normality of situations compatible with one's evidence. Section 1 presents the framework and shows how it can account for the basic contours of Bjorn's knowledge. Section 2 situates the framework relative to other approaches in the literature. Section 3 shows how different models of cases like Bjorn's that one finds in the literature correspond to different combinations of assumptions about the structure of comparative normality and about how comparative normality determines what people know. We then defend the framework against worries about knowing too little or too much when circumstances are normal in some respects but not others (section 4), explain how it can represent belief as well as knowledge (section 5), show how it can be extended to study the dynamics of knowledge and belief in response to new evidence (section 6), and compare the results with existing theories of belief-revision (section 7). Finally, we apply the framework to study our knowledge about the outcomes of chancy processes and inductive knowledge of laws of nature (section 8), and argue that the striking dynamics of such knowledge demand a rethinking of theories of belief-revision (section 9). Section 10 concludes.

Our aim is to make a case not only for the normality framework, but also for the broader methodology of using the tools of epistemic logic to theorize about induction. While there is an immense literature on induction deploying sophisticated formal methods, very little of this work is concerned with knowledge (as opposed to belief or probability). No doubt this is in part because the inherent fallibility of induction has led some to question whether inductive knowledge is even possible. For example, when articulating the "old" (traditional) problem of induction, Goodman (1955, p. 61) disparagingly writes that "obviously the genuine problem cannot be one of attaining unattainable knowledge or of accounting for knowledge that we do not in fact have". But this is the wrong reaction. As elsewhere in epistemology, a better methodology is to bracket sweeping skeptical impulses and see whether a stable picture of human knowledge can emerge. Viewed in this experimental spirit, the models described in the course of this paper are of independent interest. For whatever one thinks about the framework that suggested them to us, such models encode strong predictions about the contours of our inductive knowledge, and thereby display the fruitfulness of viewing cases of induction through the lens of epistemic logic.

3Nothing of substance hangs on our use of the word "evidence". Our point can be accepted by those who, like Williamson (2000), think that all knowledge is evidence; they will simply chafe at our use of "evidence" to characterize the operative notion of starting points. (By contrast, Bacon (2014) and Zardini (2017) do argue that ordinary inductive knowledge presents a substantive challenge for Williamson's view.) Nor are we claiming that inductive knowledge involves drawing inferences from these starting points: we take no stand on the psychological mechanisms underlying inductive knowledge. We are also friendly to the idea that the operative notion of "evidence"/"starting points" is vague, context-sensitive, and/or interest-relative.

1 The Normality Framework

This section lays out the basic framework linking knowledge, evidence, and comparative normality. This framework will form the common core of three more specific proposals we will consider in section 3.

We will proceed on the familiar idealizing assumption that a person's knowledge can be characterized by the set of situations that, for all they know, they are in – the set of situations epistemically accessible for them – and that a person's evidence is characterized by the set of situations that, for all their evidence implies, they are in – the set of situations that are evidentially accessible for them.4 We write R_K(w) for the set of situations epistemically accessible from w and R_E(w) for the set of situations evidentially accessible from w. In addition, following Goodman and Salow (2018), we appeal to two relations of comparative normality: that of one situation being at least as normal as another (which we take to be reflexive and transitive, and write as ≽), and that of one situation being sufficiently more normal than another (which we take to be asymmetric and to imply being at least as normal, and write as ≫). We assume that these two relations can be chained together: if w1 is at least as normal as w2, w2 is sufficiently more normal than w3, and w3 is at least as normal as w4, then w1 is sufficiently more normal than w4 (w1 ≽ w2 ≫ w3 ≽ w4 ⇒ w1 ≫ w4).

Against this backdrop, let the normality framework be the conjunction of the following five claims, which we will gloss below:5

factivity: w is epistemically accessible from w.

w ∈ R_K(w)

evidential knowledge: If v is not evidentially accessible from w, then v is not epistemically accessible from w.

R_K(w) ⊆ R_E(w)

inductive knowledge: If w is sufficiently more normal than v, then v is not epistemically accessible from w.

w ≫ v ⇒ v ∉ R_K(w)

4We talk of "situations" rather than "worlds" since, as will become clear, the relata of epistemic accessibility take a stand on what time it is, whereas possible worlds are often understood as corresponding to possibly maximally strong eternal truths about the course of history (although see Dorr and Goodman (forthcoming) for reasons to resist this "historicist" conception of worlds). In this respect, situations are more like the "centered worlds" of Lewis (1979a) and the "cases" of Williamson (2000). But unlike those notions, we are not here assuming that situations that agree on the course of history and on what time it is can differ further in which person they concern. Our use of "situation" is also different from the one found in the literature on situation semantics (cf. Kratzer (2019)).

5Appendix A collects principles and defined terms that occur repeatedly.

inductive limits: If v is evidentially accessible from w and no v′ evidentially accessible from w is sufficiently more normal than v, then v is epistemically accessible from w.

(v ∈ R_E(w) ∧ ∀u ∈ R_E(w) ¬(u ≫ v)) ⇒ v ∈ R_K(w)

convexity: If v is evidentially accessible from w and is at least as normal as some v′ epistemically accessible from w, then v is epistemically accessible from w.

(v ∈ R_E(w) ∧ ∃u ∈ R_K(w)(v ≽ u)) ⇒ v ∈ R_K(w)

factivity says that everything we know is true. evidential knowledge says that our knowledge includes our evidence. inductive knowledge says that any possibility sufficiently less normal than the one we are in is one that we know we are not in; it thus says that certain facts about comparative normality suffice for our knowledge to go beyond our evidence. By contrast, inductive limits says that certain patterns of comparative normality are necessary for our knowledge to go beyond our evidence: in order to know that an evidential possibility does not obtain, some other evidential possibility must be sufficiently more normal than it. Finally, convexity says that, to the extent that knowledge can go beyond our evidence, its contours must follow those of comparative normality: every evidential possibility at least as normal as some epistemic possibility must itself be an epistemic possibility.6

To see the framework in action, let us apply it to Bjorn. We assume that his evidence implies that the scale reads 175.4 pounds but does not imply anything about how much he weighs; we also assume that his evidence is the same in all situations compatible with his evidence, so that we may restrict our attention to such situations, all of which will be evidentially accessible from each other. We represent these situations by real numbers, each corresponding to a situation in which Bjorn weighs that many pounds and the scale reads 175.4; for simplicity, we ignore all non-weight differences between situations. Supposing that Bjorn actually weighs α pounds, we make three assumptions about these situations' comparative normality:

1. A situation in which the scale overestimates Bjorn's weight by a given amount is at least as normal as one in which the scale overestimates his weight by a greater amount (and likewise for underestimation).

(x < y ≤ 175.4 or x > y ≥ 175.4) ⇒ y ≽ x

2. The actual situation is sufficiently more normal than some situations in which the scale overestimates Bjorn's weight (and likewise for underestimation).

∃x(x < 175.4 ∧ α ≫ x) and ∃y(y > 175.4 ∧ α ≫ y)

6In requiring this, convexity ensures that certain Gettier-style justified true beliefs fail to count as knowledge, as discussed in section 5.

3. The situation in which the scale's reading is perfectly accurate is not sufficiently more normal than every situation in which it overestimates Bjorn's weight (and likewise for underestimation).

∃x(x < 175.4 ∧ ¬(175.4 ≫ x)) and ∃y(y > 175.4 ∧ ¬(175.4 ≫ y))

Combined with these assumptions, the normality framework implies most of the advertised features of Bjorn's knowledge of his weight. By (2), α is sufficiently more normal than some situations in which the scale overestimates his weight; by (1), and the chaining principle, α is also sufficiently more normal than any situation in which the scale overestimates his weight by more. So, by inductive knowledge, Bjorn's knowledge puts a lower bound on his weight. Similar reasoning shows that his knowledge also puts an upper bound on his weight. convexity – together with (1) – ensures that his knowledge cannot go beyond the placing of such upper and lower bounds, so that the weights compatible with his knowledge form an interval. factivity ensures that this interval always contains α. (1) ensures that the situation in which Bjorn weighs 175.4 is at least as normal as any other; (3) thus ensures that there are some situations involving over- and under-estimation that are not sufficiently less normal than any situation. So inductive limits ensures that the interval representing Bjorn's knowledge also contains 175.4, and some values on either side of 175.4.

These predictions are notable because the normality framework and (1)-(3) leave open many other questions about the structure of Bjorn's knowledge, questions about which there is disagreement in the literature. This is both because (1)-(3) don't settle all questions about the comparative normality of the situations in question, and because the normality framework doesn't settle all questions about what a person knows given their evidence and facts about situations' comparative normality. In section 3, we will look at how both the normality framework and the constraints on normality (1)-(3) can be elaborated to deliver more specific epistemological predictions, including our earlier claim that decreasing the extremity of the scale's error should not decrease what Bjorn knows. But before delving into those details, we want to consider some broader features of the normality framework, and how they compare to other accounts of inductive knowledge that may be more familiar.
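Since models like Bjorn's restrict attention to a fixed set of evidential possibilities, the five principles can be checked mechanically on finite models. The following sketch is our own illustration rather than anything in the text: situations are arbitrary Python objects, geq(w, v) stands in for w ≽ v, suff(w, v) for w ≫ v, and R_E and R_K are candidate accessibility functions returning sets of situations.

    def satisfies_framework(situations, R_E, R_K, geq, suff):
        # Test the five principles of the normality framework on a finite model.
        for w in situations:
            E, K = R_E(w), R_K(w)
            if w not in K:                                  # factivity
                return False
            if not K <= E:                                  # evidential knowledge
                return False
            if any(suff(w, v) for v in K):                  # inductive knowledge
                return False
            for v in E:
                # inductive limits: v may only be ruled out if some evidential
                # possibility is sufficiently more normal than it
                if all(not suff(u, v) for u in E) and v not in K:
                    return False
                # convexity: anything at least as normal as an epistemic
                # possibility is itself an epistemic possibility
                if any(geq(v, u) for u in K) and v not in K:
                    return False
        return True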

2 Situating the Framework

One useful reference point for the present project is Lewis's (1996) well-known theory of knowledge. Like the normality framework, Lewis's theory employs a notion of evidential accessibility (he refers to this as one situation being 'uneliminated' by the evidence at another) and endorses evidential knowledge. And his various 'rules' for what we can 'properly ignore' play roles analogous to the normality framework's other principles. The 'rule of actuality' (that one can never properly ignore one's actual situation) is equivalent to factivity. The rules of 'attention' and 'belief' play a role similar to inductive limits: they set certain limits on inductive knowledge, guaranteeing that certain situations (those that

we are attending to, and those that the person in question has a substantial credence in) are epistemically accessible whenever they are evidentially accessible. The rules of 'method', 'reliability', and 'conservatism' play a role analogous to inductive knowledge: they ensure the possibility of inductive knowledge, by maintaining that certain situations (those in which perception, testimony, memory, statistical inferences, inference to the best explanation, or other ordinary belief-forming processes go wrong) are not epistemically accessible (unless one of the other rules requires them to be).7 And his 'rule of similarity' plays a somewhat similar role to convexity, telling us that some situations being epistemically accessible guarantees that others are too.

Another helpful reference point may be a theory which, inspired by theories of knowledge as belief that is safe from error, maintains that we know whatever our evidence safely indicates. We can make this idea precise by appealing to the notion of two situations' being sufficiently close to one another, and saying that v is epistemically accessible from w if and only if it is both evidentially accessible from w and the two situations are sufficiently close.8

All three of these accounts approach their topic at a high level of abstraction. None of the views just sketched mention that in order to know something you need to believe it – or even have the conceptual resources required to entertain it, or even be capable of thought at all. Consequently, they also do not capture the fact that you fail to know when you believe only for bad reasons. And since they characterize your knowledge in terms of what is true in all epistemically accessible situations, they implausibly predict that you know a proposition only if you also know every proposition that it necessarily implies. But while a full theory of inductive knowledge would address these important issues, there is much that can be said while abstracting away from them. Moreover, the main extant strategies that safety-based theorists have used to address these problems can be co-opted within the normality framework fairly straightforwardly.9 So we

7Salow (2016, especially footnote 9) offers a slightly different reading, on which the possibility of inductive knowledge on Lewis's account is driven by a knowledge default: a possibility is not epistemically accessible unless one of the rules requires it to be, which makes the rules of 'method', 'reliability', and 'conservatism' redundant. We discuss such knowledge-defaults, which similarly make inductive knowledge redundant, in section 3.

8Sosa (1999) and Williamson (2000) endorse safety as a necessary condition on knowledge; later, Williamson (2009) also treats it as sufficient. The theory discussed here departs from these views because it doesn't consider whether a belief is safe, but instead focuses on what some evidence safely indicates. Bacon (2014, forthcoming) uses a theory roughly like the one sketched here to theorize about the kinds of cases we discuss in section 8; assuming that 'sufficiently close' and 'similar' are interchangeable, it is also equivalent to the result of simplifying Lewis's (1996) theory by dropping all rules except those of 'actuality' and 'similarity'. Smith (2016, ch. 7) also discusses the notion of what some evidence safely indicates in the context of a theory of justification.

9Central to these strategies is the reification of beliefs, understood as particular mental states, so that two people have different beliefs even if they believe the same propositions. Instead of simply considering which believed propositions are true in all close possibilities in which we believe them, we now consider which beliefs are states of believing truly in all close possibilities in which we have them. For example, Hawthorne and Lasonen-Aarnio (2009) propose that a belief amounts to knowledge just in case it could not easily have been had without being a belief in a true proposition, and that a proposition is known just in case it is the content of some belief that amounts to knowledge. This proposal allows that you can fail to know p even though you believe it and it is true in all sufficiently close evidential possibilities, since it might be that all of your beliefs that have p as their content could easily have had a different content that was false. For example, even if the content of a true belief expressed by "That's Bill" is necessarily true, it might be that the belief could easily have had a different (and necessarily false) content if you could easily have misidentified someone else as Bill. An exactly parallel strategy is available to the normality theorist. Start with a characterization of epistemic accessibility in terms of evidence and normality (see section 3), and then say that a belief amounts to knowledge just in case in all accessible situations in which you have that belief it is a belief in something true, and then re-characterize the propositions we know as those propositions which are contents of beliefs that amount to knowledge. Another safety-theoretic strategy, developed by Williamson (2009), appeals not to different possibilities in which the very same belief has a different content, but instead to belief-world pairs as relata of closeness. A belief b amounts to knowledge in a world w just in case every pair ⟨b′, v⟩ sufficiently close to ⟨b, w⟩ is such that, at v, b′ is a belief with a true content. This strategy is meant to accommodate the fact that true beliefs formed on the basis of unreliable testimony or uninformed guesses need not constitute knowledge even when they concern non-contingent truths of logic and mathematics. Again, a parallel strategy is available for the normality theorist. Just as safety theorists can replace closeness relations among worlds with closeness relations among belief-world pairs, we can replace normality relations between situations with normality relations between belief-situation pairs, and likewise for evidential accessibility (which we might think of in this context as the beliefs being formed in the same way). inductive knowledge should then be replaced with the claim that if all ⟨b′, v⟩ evidentially accessible from ⟨b, w⟩ such that b′ fails to have a true content at v are sufficiently less normal than ⟨b, w⟩, then b amounts to knowledge in w; similar reformulations are available for the other principles.

will set these issues aside in what follows.

Even at this high level of abstraction and idealization, there are important differences between the normality framework and its alternatives. Perhaps the most obvious difference from Lewis's account is that the normality framework does not immediately build in any context-sensitivity in knowledge attributions.10 But another, to us more important, difference is that where Lewis's rules feature a large grab bag of different notions, the principles of the normality framework feature only the relations of comparative normality and evidential accessibility. This makes the normality framework significantly more parsimonious, and hence more amenable to systematic investigation.11

By contrast, the simple safety theory sketched above is at least as parsimonious and tractable as the normality framework. Both theories are also naturally developed using a similarly non-reductive methodology. Williamson (2009, pp. 9-10), for example, explicitly rejects the possibility of giving an account of 'closeness' which does not itself presuppose epistemic notions such as knowledge. Rather, following Lewis's (1973) attitude to the notion of comparative similarity central to his theory of counterfactuals, Williamson maintains that the relevant notion is to be understood in terms of its role in his theory. We will adopt a similar attitude to comparative normality. While we think the terminology of "normality" provides some helpful guidance in fixing ideas, we hold that the operative notions of comparative normality should ultimately be understood primarily in terms of their role in the normality framework.

10We are open to the idea that both evidential accessibility and the comparative normality relations depend on context – see footnote 3, section 4, and Authors (ms e). But this is more analogous to the idea (implicit in Lewis's (1996, p.556) talk of salient resemblance) that the respects of similarity featuring in the 'rule of similarity' depend on context than to the central contextualist feature of Lewis's account, the 'rule of attention'.

11This is not to say that the Lewisian framework is completely unamenable to formal investigation; see e.g. Holliday (2015) and Salow (2016).

Fortunately, an account of closeness and comparative normality in non-epistemic terms is not needed to distinguish the safety theory from the normality framework. We have seen how the normality framework makes some fairly substantive predictions about Bjorn given only rather minimal assumptions about comparative normality. The simple safety theory is less promising here. For whatever two situations' being sufficiently close amounts to, being sufficiently close is a symmetric relation. So (assuming that all situations in which the scale reads 175.4 pounds are evidentially accessible from each other), the simple safety theory predicts that epistemic accessibility is a symmetric relation on the relevant situations. And this prediction is undesirable: if in v Bjorn does in fact weigh approximately 175.4 pounds while in w he weighs much less, we want v to be epistemically accessible from w (since we may suppose that, even if the scale overestimated his weight significantly, Bjorn wouldn't be able to tell this from the reading) but not vice versa (since Bjorn should learn a fair bit about his weight when the scale is working well).12

We should pause here to draw some connections to Williamson (2000). How does his safety-based approach (chapter 5) fit with his emphasis on asymmetries in epistemic accessibility (chapter 8)? The answer is that the asymmetries of epistemic accessibility he discusses all arise from asymmetries of evidential accessibility (chapter 9).13 Of course, proponents of a safety-based epistemology might reasonably complain that our simple safety theory is oversimple, and go on to articulate some non-symmetric notion of one possibility being nearby another. Still, it is striking that the normality framework has a clearly non-symmetric (but transitive) relation at its core, while the notions of modal distance that one finds in the safety literature strongly suggest a symmetric (and non-transitive) relation; even if not decisive, this fact is a prima facie reason to find the normality framework more promising.
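The symmetry point lends itself to a toy calculation. The sketch below is ours, with made-up numbers (a closeness radius, and a margin constant anticipating the models of section 3): closeness-based accessibility between two fixed evidential possibilities is symmetric by construction, while the asymmetric ≫ lets the accurate situation exclude the wildly inaccurate one without being excluded by it.

    READING = 175.4

    def close(x, y, radius=10.0):
        # hypothetical symmetric closeness between weight-situations
        return abs(x - y) <= radius

    def suff(x, y, margin=2.0):
        # x is sufficiently more normal than y: at least `margin` pounds
        # closer to the scale's reading
        return abs(y - READING) - abs(x - READING) >= margin

    v, w = 175.5, 168.0      # scale roughly accurate in v, far off in w

    # Safety-style accessibility is symmetric whatever the radius:
    assert close(v, w) == close(w, v)

    # Normality-style accessibility (v accessible from w iff w is not
    # sufficiently more normal than v) is not:
    assert not suff(w, v)    # v is epistemically accessible from w ...
    assert suff(v, w)        # ... but w is not accessible from v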

12Compare Magidor’s (2018) argument that, if knowledge is belief that is safe from error, then safely envatted brains-in-vats can know that they are envatted (although, unlike us, Magidor does not treat this as a reductio of such views), and Bacon’s (2014) view that, if a fair coin is going to land heads 1000 times in a row, then you’re in a position to know now that it won’t land heads approximately half of the time. Compare also Beddor and Pavese (forthcoming), who argue that normality-based accounts make better predictions than closeness-based ones about Pritchard’s (2012) Temp, who forms beliefs about the temperature based on a broken thermometer but has a guardian angel who always adjusts the temperature to match Temp’s beliefs. 13This attribution isn’t based on Williamson’s equation of evidence and knowledge, which we think marks a non-substantive terminological difference between us; see note 3. Our point is rather that his discussion of skepticism contains little discussion of the sorts of cases we will be focused on below. Williamson (2011, 2013a, 2014, forthcoming) has discussed such cases more recently, and while he does not use the language of “normality”, many of his models are easily situated within the normality framework (and not the safety theory), as we will see. All of this is to say that our misgivings about the simple safety theory are not directed at any explicit position of Williamson’s, but only at one way of developing some of his ideas.

Another important high-level feature of the normality framework is again helpfully illustrated by comparison with Lewis's theory of counterfactuals. Unlike the three-place relation of comparative similarity that figures in his theory (that of v being at least as similar to w as u is to w), the relations that figure in the normality framework are binary: a situation v is at least as normal as (or sufficiently more normal than) another situation u simpliciter, not merely from the perspective of some third situation w. Formally, it would be straightforward to introduce Lewis-style relativization to third situations; we would simply modify the framework so that (i) "normal_w" is substituted for "normal" in inductive knowledge, inductive limits, and convexity, and (ii) the structural constraints on normality relations are required, for all w, to hold for being at least as/sufficiently more normal_w. But we resist this sort of flexibility because it makes the resulting framework significantly less predictive.14

One might, following Carter (2019) and Williamson (forthcoming), object that what is normal is clearly a contingent matter, and hence should be modeled in a situation-relative way. But this notion of normality is different from ours. They are theorizing about what is normally the case – a property of propositions. This may well be a contingent matter, depending, for example, on contingent facts about what is and isn't typical. But our discussion is conducted in terms of relations of comparative normality between situations. And situations take a stand on what is and isn't typical. So while it may be contingent whether it is normal for Anna to be up after midnight, owing in part to contingency in her sleeping habits, it does not follow that it is contingent whether a situation in which Anna goes to bed late as usual is more normal than one in which Anna goes to bed early despite typically going to bed late.

To avoid misunderstanding, we should emphasize that we are not here suggesting any analysis of comparative normality in terms of typicality. To reiterate: we think that comparative normality is best understood through its role in the theory of inductive knowledge developed here. For some readers "plausibility" will have more apt connotations than "normality" does – we use "normality"

14Smith (2016, ch. 6.4) also considers whether normality should be relativized in this way, but denies that doing so would affect his framework's predictions on the issues he is interested in. We think he is only partially correct. In particular, Smith (2016, ch. 7.2) and (2018b) is interested in how beliefs should change in response to new evidence. And the considerations that lead him to consider relativizing comparative normality relations to situations also motivate relativizing comparative normality to times. Since belief-revision in response to new evidence involves the passage of time, relativizing comparative normality to times would undermine Smith's predictions on these issues. (This point is slightly obscured by the fact that, while Smith's informal discussion often concerns the dynamics of belief in response to new evidence, his official system regiments such claims as ones about what various bodies of evidence support right now. But in moving back and forth between such characterizations Smith is clearly assuming that what a body of evidence supports does not change over time; given Smith's theory of evidential support, this requires that comparative normality relations do not change over time either.)

in part in keeping with the relevant literature.15,16 It's also worth noting that, since we are concerned with relations of comparative normality between situations, rather than with normality as a property of propositions, there can be no straightforward link between normality and probability, since each situation may have zero probability of occurring. Indeed, as we will discuss in section 8, even in applications where it is reasonable to assign a pair of situations equal positive probability, one may be sufficiently more normal than the other.17

The contingency of normality-relevant factors does highlight an important feature of our assumption that there is only one evidential possibility in which Bjorn has any given weight. In making this idealization, we are treating his evidence as settling facts about the quality of his scale. A more realistic model would have different possibilities that agree on Bjorn's weight and the scale's reading but disagree on its reliability, and as a result differ in their comparative normality to other situations. Such a model would yield a similar picture of Bjorn's knowledge of his weight, provided that he knows that the scale is not

15Loets (ms) argues that English constructions using the word "normal" are a poor guide to the epistemologically relevant notion of normality. We are also neutral on the question whether the present notion of normality is the same as the one Yalcin (2016) appeals to in his semantics for what he calls "modalities of normality". Note that, if Dorr et al. (ms) are right, then the English construction "at least as normal as" satisfies the principle comparability discussed below, whereas the relations we are concerned with are sufficiently theoretical ones that whether they satisfy that principle is independent of the logic of comparative constructions in English; compare their discussion of the logician's "at least as strong as", which uncontroversially admits incomparabilities.

16While a number of epistemologists appeal to notions of normality in something like the way we do (see note 2), epistemologists have also invoked "normality" in less related ways. Leplin (2007) gives a reliabilist theory of justified belief, and uses a notion of normality to characterize reliable belief-forming processes as those that "would not produce or sustain false beliefs under normal conditions." So normality plays a similar role in his analysis of reliability as it does in our theory of knowledge (or, more accurately, in a variant of our theory that reifies belief-states as discussed in footnote 9). By contrast, Goldman (1986) and Graham (2012, 2017) appeal to normality to qualify the relevance of the reliability of a belief-forming process to a belief's being justified. For Goldman, the process need only be reliable in "normal worlds" (which he characterizes in a way quite alien to our approach, as worlds largely in keeping with the body of the believer's beliefs). For Graham, what matters is that being reliable is part of the function of our belief-forming mechanisms, where the notion of function is understood in terms of the explanatory history of the process. ("In all possible circumstances C, a belief is prima facie justified in C iff based on a normally functioning process that has the etiological function of reliably producing true beliefs (so that it is reliable in normal conditions when functioning normally).") Millikan (1984) is an influential precedent. Also drawing on her work on proper functioning, Ball (2013) claims that "knowledge is normal belief" – "in normal cases of belief formation, one believes that p only if one knows that p". Peet and Pitcovski (2018) adopt the same slogan but develop it in a different way: like Goldman and Graham, they use normality to qualify a familiar externalist condition on knowledge, in this case safety from error rather than reliability. They hold that for a belief to be knowledge it is not enough that it be safe (which they claim "normal beliefs" are) – it also needs to be safe in the "normal" (i.e., characteristic) way. Notice that none of these views identify normal conditions as those in which the beliefs in question must be true in order to amount to knowledge or to be justified. In yet another vein, Horvath and Nado (forthcoming) defend the view that normal beliefs amount to knowledge, in the colloquial sense of "normal" studied by Bear and Knobe (2017) that is a blend of what is typical and what is ideal.

17We develop a non-straightforward reduction of normality to evidential probability, which exploits the kind of context-sensitivity discussed in section 4, in Authors (ms e).

too unreliable. We will discuss the status of these sorts of idealizations at greater length in section 6.

Finally, for the purposes of this paper, we will also make the following strong idealizing assumption about evidence:

partitionality: Evidential accessibility is an equivalence relation.

This is a natural idealization in a case like Bjorn's, where we can think of one situation as evidentially accessible from another if and only if the scale gives the same reading in both situations. Lewis (1996) and Stalnaker (2015, 2019) endorse partitionality quite generally, deriving it from a substantive account of evidential accessibility as sameness of (some aspect of) the relevant person's "internal" state. We are not endorsing any such account. In particular, we want to leave open that, in ordinary circumstances, our evidence can include facts about our immediate surroundings (such as scale readings); and this conflicts with partitionality, and especially with the symmetry of evidential accessibility, for the kinds of reasons articulated by McDowell (1982, 1995) and Williamson (2000, chapter 8). But even if partitionality does not always hold, it is still a reasonable idealization when thinking about the structure of our purely inductive knowledge. People are not in fact infallible or omniscient about the readings on their scales; but pretending that they are does little harm when we are interested in studying how knowing those readings enables them to learn how much they weigh. We will leave examination of how the normality framework behaves when partitionality fails for another occasion.18

3 The Normality Landscape

We saw above that the normality framework, when combined with the natural assumptions (1)-(3), makes a number of desired predictions about the structure of Bjorn's inductive knowledge: his knowledge about his weight will be characterized by an interval which includes his actual weight, the displayed weight, and some values on either side of the displayed weight. But other aspects of Bjorn's knowledge remain unexplained, such as the fact that in situations where the scale over-/underestimates his weight by less than it actually does, his knowledge should be characterized by an interval contained in the one that actually characterizes it. In this section we will consider different concrete proposals about the comparative normality of situations in which Bjorn has different weights and the scale reads 175.4 pounds, and how these proposals interact with three different elaborations of the normality framework to deliver more specific epistemological predictions, highlighting points of contact with the existing literature along the way.19 Each of the three elaborations of the normality

18See Authors (ms c), where we discuss puzzles that arise when evidential and inductive considerations push in opposite directions – e.g., when things appear so abnormal that it is tempting to think we can have inductive knowledge that we are hallucinating.

19Appendix B contains a list of diagrams illustrating these different proposals about comparative normality and their consequences for epistemic accessibility given the three elaborations of the normality framework discussed below.

framework fully determines what a person knows in terms of their evidence and facts about comparative normality – these stronger theories will be our focus in the remainder of the paper.

The simplest proposal about the comparative normality of the situations involving Bjorn is a binary normal/abnormal taxonomy. The normal situations are those corresponding to values in a certain interval that includes α as well as some values on either side of 175.4. All normal situations are equally normal, all abnormal situations are equally normal, and every normal situation is sufficiently more normal than every abnormal situation. The normality framework therefore implies that, in abnormal situations, Bjorn knows only what is entailed by his evidence, while in normal situations (such as the actual one), he knows what is entailed by his evidence together with the fact that conditions are normal. Greco (2014) is naturally read as proposing such a view.

The main problem with this view is that it grossly oversimplifies the contours of inductive knowledge. In effect, it claims that there is a special proposition that conditions are normal, and we get to know this proposition whenever it is true, but there is no other inductive knowledge.20 But inductive knowledge admits finer gradations than this proposal allows. For example, it seems possible that there be three situations in each of which you have the same evidence but in no two of which you have the same knowledge. The reading on Bjorn's scale could have a middling error, so that he knows less about his weight than he would had that reading been perfectly accurate, but more about his weight than he would had the scale erred even more extremely.21

A natural response to this objection is to replace the normal/abnormal dichotomy with a more fine-grained classification of Bjorn's evidential possibilities in terms of levels of normality. More precisely, we assume that the most normal situations correspond to an interval containing values on either side of 175.4, and that for any two evidential possibilities, they are either equally normal or one is sufficiently more normal than the other. Together with the constraints (1)-(3), the normality framework again suffices to uniquely determine what Bjorn knows: his epistemic possibilities are precisely those evidential possibilities that are at least as normal as his actual situation, yielding a model broadly similar to Stalnaker's (2015) treatment of similar examples. More generally (as we will discuss shortly), the normality framework uniquely determines what one knows when the following two conditions are met:

comparability: If v is evidentially accessible from w, then w and v are comparable.

v ∈ R_E(w) ⇒ (v ≽ w ∨ w ≽ v)

20Compare also Hume, who was traditionally read as claiming that the problem of justifying induction reduces to the problem of justifying belief in a single proposition: the "principle that instances, of which we have had no experience, must resemble those, of which we have had experience, and that the course of nature continues always uniformly the same" (1739, 1.3.6.4).

21Greco (2014, footnote 35) sketches a potential response to this worry, which appeals to resources not represented in the current framework. See Bird and Pettigrew (forthcoming) and Carter (2019) for discussion.

collapse: If v is more normal than w, then v is sufficiently more normal than w.

(v ≽ w ∧ ¬(w ≽ v)) ⇒ v ≫ w

(Two situations are comparable if one is at least as normal as the other, and a situation is more normal than another if it is at least as normal as it but not vice versa.)

In the absence of comparability and collapse, the normality framework does not suffice to determine everything about what Bjorn does and doesn't know.

Consider first comparability. Suppose we modify the 'levels of normality' treatment of the case by supposing that situations that are sufficiently less normal than 175.4 are comparable if and only if they are either both ones in which the scale overestimates Bjorn's weight or are both ones in which it underestimates his weight. Now consider a situation w+ sufficiently less normal than 175.4 in which the scale overestimates Bjorn's weight, and another situation w− sufficiently less normal than 175.4 in which the scale underestimates Bjorn's weight. The normality framework does not settle whether w− is epistemically accessible from w+. One might deny that it is. On this proposal, Bjorn weighing less than the scale reads can lead to more epistemic possibilities of him weighing less than the scale reads than there would be had he weighed as much as the scale reads, but can't lead to more epistemic possibilities of him weighing more than the scale reads. Cohen and Comesaña's (2013) treatment of similar examples makes a prediction like this. But one could also go the other way. For all the normality framework says, every evidentially accessible situation w− incomparable with w+ is also epistemically accessible from it. Absent further stipulations, Bjorn weighing sufficiently less than the scale indicates may generate further, or even complete ignorance concerning possibilities in which he weighs more than the scale indicates.22

Our preferred treatments of Bjorn's case maintain comparability but reject collapse. Here is the simplest version of such a model. We assume that, for any two situations x and y, x is at least as normal as y just in case x is at least as close to 175.4 as y is, and that x is sufficiently more normal than y just in case x is at least c pounds closer to 175.4 than y is (i.e., that |y−175.4|−|x−175.4| ≥ c), for some positive constant c. Now consider a pair of situations x and y such that |y−175.4| ≥ c > |y−175.4|−|x−175.4| > 0, so that the situation in which Bjorn weighs y pounds is sufficiently less normal than the situation in which he weighs 175.4 pounds, and is less normal but not sufficiently less normal than the situation in which he weighs x pounds. The normality framework says nothing about whether y should be epistemically accessible from x. (Since it is sufficiently less normal than 175.4, denying that it is epistemically accessible does not violate

22To rule out such extreme ignorance without rejecting the idea that incomparabilities beget ignorance, one could adopt a more attenuated account of incomparability in Bjorn’s case. For example, one could count overestimations as less normal than underestimations only when the latter are more extreme by a factor of z; one possible implementation of this would be to say that x is at least as normal as y just in case either (i) x and y are both equally normal to 175.4, or (ii) x is in the closed interval with endpoints y and 175.4, or (iii) |x−175.4| < z×|y−175.4|.

inductive limits; since it is not sufficiently less normal than x, holding that it is epistemically accessible does not violate inductive knowledge.) Denying that such situations are epistemically accessible yields a model of Bjorn's case which – like Cohen and Comesaña's (2013) and Goodman and Salow's (2018, section 5) treatments of similar examples – allows that Bjorn can sometimes know that he doesn't weigh less than x pounds when he actually weighs exactly x pounds (what Williamson (2013b) derides as "cliff-edge knowledge"). Alternatively, one might hold that such situations are epistemically accessible – i.e., that any weight within c pounds of Bjorn's actual weight is one that, for all he knows, is his weight. More generally, the normality framework is compatible with the principle:

margins: If v is evidentially accessible from w and comparable with w but not sufficiently less normal than w, then v is epistemically accessible from w.

(v ∈ R_E(w) ∧ (v ≽ w ∨ w ≽ v) ∧ ¬(w ≫ v)) ⇒ v ∈ R_K(w)

Combined with the aforementioned normality relations, margins yields a treatment of Bjorn's case similar to Williamson's (2011, 2013a, 2014) and Goodman's (2013) treatments of similar examples. This is notable because it shows that such models can be motivated without appeal to the principle that knowledge requires belief that is safe from error (which is how Williamson motivates them).23

We are now in a position to outline the three elaborations of the normality framework that we will be exploring in the remainder of this paper.

k-normality: Epistemic accessibility is the most restrictive relation compatible with factivity, inductive limits, and convexity.

km-normality: Epistemic accessibility is the most restrictive relation compatible with margins, inductive limits, and convexity.

i-normality: Epistemic accessibility is the most inclusive relation compatible with evidential knowledge and inductive knowledge.

The "k" indicates that, except when one of the principles mentioned implies otherwise, there is a knowledge default: epistemic accessibility relates as few situations as possible, and hence people get to know as much as possible. By contrast, the "i" indicates that, except when one of the principles mentioned

23These models also predict violations of the 'KK' thesis, according to which, if you know that p, then you know that you know that p. This prediction is notable because normality-theoretic approaches to knowledge have been used to defend KK by Greco (2014), Stalnaker (2015, 2019), and Goodman and Salow (2018) (although the latter can also be seen as an argument against KK, following Dorr et al. (2014), given the amount of bullet-biting they show is needed to maintain it). Their defenses are versions of the k-normality elaboration of the normality framework discussed below, which does not imply margins. An assessment of the prospects for KK in the context of the normality framework must wait for another occasion. See Authors (ms d), the main result of which is that, without partitionality, k-normality does not imply KK, even given the analogous EE principle about evidence.

implies otherwise, there is an ignorance default: epistemic accessibility relates as many situations as possible, and hence people know as little as possible.

It is straightforward to verify that, given the structural constraints on comparative normality, k-, km- and i-normality correspond to the following three hypotheses about epistemic accessibility (and are thus well-defined):

k: R_K(w) = {v ∈ R_E(w) : v ≽ w} ∪ {v ∈ R_E(w) : ∀u ∈ R_E(w) ¬(u ≫ v)}

km: R_K(w) = {v ∈ R_E(w) : ∃u ∈ R_E(w)(w ≽ u ∧ ¬(w ≫ u) ∧ v ≽ u)} ∪ {v ∈ R_E(w) : ∀u ∈ R_E(w) ¬(u ≫ v)}

i: R_K(w) = {v ∈ R_E(w) : ¬(w ≫ v)}

It is also easy to verify that k-, km- and i-normality all imply the normality framework. k-normality has people know as much as is possible given that framework; i-normality has people know as little as is possible given the framework; and km-normality is a natural intermediate view because:

• km-normality and i-normality are equivalent given comparability.

• k-normality and km-normality are equivalent given collapse.

Since i-normality and k-normality diverge as far as is possible within the bounds of the normality framework, this also establishes our earlier claim that, given both comparability and collapse, the normality framework itself fully determines what one knows.24

Our investigation of the normality framework will be guided by the hypothesis that it should be developed in one of these three ways. This disjunctive hypothesis already non-trivially strengthens the normality framework. For k-, km- and i-normality all imply, given our minimal assumptions (1)-(3) about the comparative normality of Bjorn's situations, that he would not have known less than he actually does had the scale over- or underestimated his weight by less than it actually does. This is because all three versions of the normality framework imply that, if you have the same evidence in w and v and w is at least as normal as v, then you know at least as much in w as you do in v (R_E(w) = R_E(v) ∧ w ≽ v ⇒ R_K(w) ⊆ R_K(v)). This principle is not a consequence of the normality framework. For all the framework says, k-normality correctly describes what is epistemically accessible from some situations while i-normality correctly describes what is epistemically accessible from others; such a hybrid treatment can lead to violations of the present principle in any of the collapse or comparability violating models of Bjorn described above.
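The three characterizations are directly executable. The following sketch is our own illustration, not the authors': it implements k-, km-, and i-normality as set comprehensions, instantiates them on a discretized version of the preferred margin model (weights measured in tenths of a pound, with a made-up margin constant C), and spot-checks the monotonicity principle just stated.

    def R_K_k(w, R_E, geq, suff):
        # k-normality: everything at least as normal as w, plus everything
        # not sufficiently less normal than any evidential possibility.
        E = R_E(w)
        unbeaten = {v for v in E if all(not suff(u, v) for u in E)}
        return {v for v in E if geq(v, w)} | unbeaten

    def R_K_km(w, R_E, geq, suff):
        # km-normality: also everything at least as normal as some evidential
        # possibility u within w's margin (w at least as normal as u, but not
        # sufficiently more normal than u).
        E = R_E(w)
        unbeaten = {v for v in E if all(not suff(u, v) for u in E)}
        margin = {v for v in E
                  if any(geq(w, u) and not suff(w, u) and geq(v, u) for u in E)}
        return margin | unbeaten

    def R_K_i(w, R_E, geq, suff):
        # i-normality: everything not sufficiently less normal than w.
        return {v for v in R_E(w) if not suff(w, v)}

    # Discretized margin model (weights in tenths of a pound; C is made up):
    READING, C = 1754, 20
    GRID = range(READING - 80, READING + 81)   # 167.4 to 183.4 pounds

    def geq(x, y):    # x at least as close to the reading as y
        return abs(x - READING) <= abs(y - READING)

    def suff(x, y):   # x at least C tenths of a pound closer than y
        return abs(y - READING) - abs(x - READING) >= C

    def R_E(w):       # a single evidence cell: the scale reads 175.4
        return set(GRID)

    # If Bjorn weighs 176.0 pounds, km-normality leaves open exactly the
    # weights strictly within |176.0 - 175.4| + C of the reading:
    K = R_K_km(1760, R_E, geq, suff)
    assert (min(K), max(K)) == (1729, 1779)    # i.e. 172.9 to 177.9 pounds

    # Spot-check of the monotonicity claim, for all three elaborations:
    for R_K in (R_K_k, R_K_km, R_K_i):
        know = {w: R_K(w, R_E, geq, suff) for w in GRID}
        assert all(know[w] <= know[v] for w in GRID for v in GRID if geq(w, v))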

24i-normality also implies that knowledge factors into independent inductive and evidential components: R_K(w) = R_E(w) ∩ {v : ¬(w ≫ v)}. In this way it resembles the Lewisian and closeness-based theories discussed in section 2; see also Xu and Chen (2018). This is not true for k- and km-normality, since these views make inductive limits a substantive constraint rather than a consequence of an ignorance default, and evidential accessibility figures in that principle.

4 Skepticism

A common worry about normality-theoretic approaches to knowledge is that they leave us with too little knowledge when circumstances are abnormal.25 We raised a version of this worry in the previous section to argue against a binary normal/abnormal taxonomy. But there is a subtler version of the worry which cannot be resolved simply by admitting that normality relations have more structure than a mere dichotomy.

The worry we have in mind arises for views that accept comparability or i-normality. Either principle implies that, whenever two situations w and v are evidentially accessible from one another, one must be epistemically accessible from the other.26 But there are reasons to think that a sufficiently anti-skeptical account of inductive knowledge should resist this claim. Suppose that, in addition to weighing himself on a scale, Bjorn also takes his temperature with a thermometer. Even if the scale grossly misreports his weight, this needn't undermine his knowledge of his temperature when the thermometer gives an accurate reading; and even if the thermometer grossly misreports his temperature, this needn't undermine his knowledge of his weight when the scale gives an accurate reading. So there seem to be a pair of situations that are evidentially but not epistemically accessible from each other: one in which the scale is in error and the thermometer is accurate, and another in which the thermometer is in error and the scale is accurate. But this possibility is ruled out both by comparability and by i-normality.

The most straightforward response to this observation is to simply reject both comparability and i-normality. More generally, the idea might be that, when there are two dimensions along which we can assess the normality of some situations (e.g., the accuracy of the scale and the accuracy of the thermometer), for w to be at least as normal as v is for it to be at least as normal on both dimensions, and for w to be sufficiently more normal than v is for it to be at least as normal and to be sufficiently more normal on at least one dimension. For concreteness, suppose that along each dimension situations can be either

25See e.g. Carter (2019) and Bird and Pettigrew (forthcoming).

26Given partitionality, this in turn implies that epistemic accessibility is connected: if both v and u are epistemically accessible from w, then one is epistemically accessible from the other. This means that the H axiom of modal logic (also sometimes known as the .3 axiom, since it yields the modal logic S4.3 when added to the modal logic S4) holds for knowledge: K(Kφ → ψ) ∨ K(Kψ → φ). And this axiom in turn is enough to generate arguably untoward skeptical results. Suppose your evidence is compatible with things being sufficiently abnormal in two independent respects (w), or in only the first respect (v), or in only the second respect (u). Then the H axiom is incompatible with the claim that you could know, when things are only abnormal in the first respect, that they are not abnormal in the second respect, and vice versa. For the axiom entails that, in w, either you know that, if you know you're not in w or v, then you're in w or u, or you know that, if you know you're in w or u, then you're not in w or v. The first disjunct is incompatible with the claim about what you know in u, and the second disjunct is incompatible with the claim about what you know in v. Of course, if these were the only three situations compatible with your evidence, the same result would follow from inductive limits. But for all we have said there is a fourth situation compatible with your evidence which is sufficiently more normal than both v and u (in which things are normal in both respects). See also Stalnaker (2006, section 6) for related discussion.

ordinary (0), odd (1), or bizarre (2). Ordinary is sufficiently more normal than bizarre; odd is in between, but neither sufficiently less normal than ordinary nor sufficiently more normal than bizarre. We can model this idea with nine situations all evidentially accessible from each other, depicted below.

                        (2,2)

                  (1,2)       (2,1)

            (0,2)       (1,1)       (2,0)

                  (0,1)       (1,0)

                        (0,0)

[The dotted arrows and dotted double-arrows of the original diagram, which depict the comparative normality relations just described, are not recoverable in this transcription; see the pictorial conventions below.]

(Pictorial conventions are as follows. Every depicted situation is evidentially accessible from every other. A dotted arrow from w to v represents that w is at least as normal as v; a dotted double-arrow from w to v represents that w is sufficiently more normal than v. Reflexive relations of at least as normal as are not depicted; neither are relations of at least as normal as and sufficiently more normal than that can be inferred from the transitivity of these relations and the chaining principle connecting them.)

comparability clearly fails in this model since, for example, neither of (0,1) and (1,0) is at least as normal as the other. And i-normality indeed has the skeptical consequence that (2,0) is epistemically accessible from (0,1), meaning that things being at all out of the ordinary in one respect prevents us from knowing that they aren't bizarre in an unrelated respect – even if, in that respect, things are in fact perfectly ordinary. But neither k- nor km-normality makes this skeptical prediction. In fact, both k- and km-normality entail that none of (2,0), (2,1), or (2,2) are epistemically accessible even from (0,2) – so even when things are in fact bizarre along one dimension, you can still know that they are not bizarre along the other.
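These claims about the nine-situation model can be verified mechanically. The sketch below is again ours, not the authors', encoding the two-dimensional normality relations and the three accessibility functions from section 3 (here with all nine situations evidentially accessible from each other).

    from itertools import product

    SITS = set(product(range(3), repeat=2))   # ordinary=0, odd=1, bizarre=2

    def geq(w, v):    # at least as normal on both dimensions
        return w[0] <= v[0] and w[1] <= v[1]

    def suff(w, v):   # at least as normal, and ordinary-vs-bizarre somewhere
        return geq(w, v) and any(w[i] == 0 and v[i] == 2 for i in (0, 1))

    def R_K_k(w):     # k-normality
        unbeaten = {v for v in SITS if all(not suff(u, v) for u in SITS)}
        return {v for v in SITS if geq(v, w)} | unbeaten

    def R_K_km(w):    # km-normality
        unbeaten = {v for v in SITS if all(not suff(u, v) for u in SITS)}
        margin = {v for v in SITS
                  if any(geq(w, u) and not suff(w, u) and geq(v, u) for u in SITS)}
        return margin | unbeaten

    def R_K_i(w):     # i-normality
        return {v for v in SITS if not suff(w, v)}

    # i-normality is skeptical: oddness in the second dimension leaves open
    # bizarreness in the first ...
    assert (2, 0) in R_K_i((0, 1))

    # ... but under k- and km-normality, even bizarreness in one dimension
    # leaves us knowing that the other dimension is not bizarre:
    assert all((2, j) not in R_K_k((0, 2)) for j in range(3))
    assert all((2, j) not in R_K_km((0, 2)) for j in range(3))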

(Two diagrams: the epistemic accessibility relations among (0, 2), (1, 2), (1, 1), (2, 0), (2, 1), and (2, 2), drawn according to the conventions described below.)

(Pictorial conventions are as follows. The diagrams represent what is epistemically accessible given the facts about normality and evidence described and pictured above. Each diagram only represents what is epistemically accessible from the situation(s) printed in bold. Triple arrows indicate what is accessible under all three of k-, km-, and i-normality; double arrows indicate what is accessible under both km- and i-normality; and single arrows indicate what is accessible under i-normality only.) However, cases involving multiple independent dimensions of abnormality also raise a concern in the other direction: that the normality framework can allow you to know too much. Our assumption that you know everything that is true in all epistemically accessible scenarios implies:

knowledge conjunction: If you know p and you know q, then you know p and q. But this principle might be thought to generate too much knowledge in models like the one above. Suppose that our situation is ordinary along a large number of different dimensions. The anti-sceptical judgements we appealed to above then suggest that, for each of those dimensions, you know that things are not bizarre along that dimension; and this is also what we would predict given either k- or km-normality in a high-dimensional generalization of the two-dimensional model depicted above. By knowledge conjunction, it follows that you know that things are not bizarre along any of these dimensions. But when there are enough such dimensions, this may not seem like the sort of thing one can know. The worry is often formulated in terms of probability, emphasizing the unlikeliness – either objectively, or in view of your evidence – of things being non-bizarre along each of very many dimensions. But it can also be put in the language of normality: things being non-bizarre along each of very many dimensions might itself seem strikingly abnormal, and hence not something we can have inductive knowledge of. This general worry about knowledge conjunction is not specific to the normality framework.27 But the normality framework does suggest a natural contextualist response to it, which simultaneously disarms the aforementioned skeptical threat posed by comparability and i-normality. Both problems arise in cases that are naturally thought of as involving multiple dimensions of abnormality. A natural response, then, is that the comparative normality of two situations should be understood as relative to these dimensions. As a result, “know” expresses different relations depending on which dimension or dimensions are salient in the context of utterance. Here is an example of how this idea might be fleshed out. Consider three possible contexts in which we might be characterizing what the subject in the two-dimensional model depicted above “knows”. In c1, only the first dimension of abnormality is salient; in c2, only the second is; and in c3, both are. The normality-relative-to-c1 of a situation coincides with how normal it is along the

27See especially Hawthorne and Lasonen-Aarnio (2009) and Williamson (2009) for discussion in the context of safety-based accounts of knowledge.

first dimension, as in the diagram below. This structure obeys comparability. And i-normality now predicts that (2, 0) is not epistemically accessible from (0, 1) relative to c1, so that, when things are odd along the second dimension, you can still know relative to c1 that they are not bizarre along the first. Assuming that normality-relative-to-c2 works analogously, you can also know relative to c2 that things are not bizarre along the second dimension even when they are odd along the first. The anti-skeptical pattern of “knowledge”-ascriptions can thus be accommodated by assuming a shift in context.

(2, 2) ↔ (2, 1) ↔ (2, 0)

(1, 2) ↔ (1, 1) ↔ (1, 0)

(0, 2) ↔ (0, 1) ↔ (0, 0)

(Situations in the same row are equally normal relative to c1; each row is at least as normal as the rows above it, and the bottom row is sufficiently more normal than the top row.)
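For illustration, the c1-relative order can be sketched in the same style; the final check renders i-normality as ruling out exactly the situations sufficiently less normal than the actual one, which is our gloss for the purposes of the example rather than a definition drawn from the text.

```python
# Normality-relative-to-c1 coincides with normality along the first
# dimension: (a, b) is at least as normal as (a', b') iff a <= a'.
SITUATIONS = [(a, b) for a in range(3) for b in range(3)]

def at_least_c1(w, v):
    return w[0] <= v[0]

def suff_more_c1(w, v):
    # Only ordinary (0) is sufficiently more normal than bizarre (2).
    return (w[0], v[0]) == (0, 2)

# comparability holds relative to c1:
print(all(at_least_c1(w, v) or at_least_c1(v, w)
          for w in SITUATIONS for v in SITUATIONS))   # True

# Our gloss on i-normality: v is accessible from w iff v is not
# sufficiently less normal than w. Then (2, 0) is not accessible
# from (0, 1) relative to c1:
print(not suff_more_c1((0, 1), (2, 0)))               # False
```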

What about normality-relative-to-c3? Different options are possible, including the one depicted originally. An interesting alternative proposal holds that (0, 0), the situation in which things are maximally normal along both dimensions and which is thus amongst the most normal situations relative to both c1 and c2, may not be maximally normal relative to c3. This corresponds to the idea, voiced above, that it seems normal for there to be a little abnormality. A toy implementation of this idea is depicted in the diagram below. In this model the most normal situations are (0, 1) and (1, 0), and other situations are more abnormal the further they depart from these two situations. It might then turn out that (2, 0) and (0, 2) are not sufficiently less normal than any other situation, so that one cannot know relative to c3 that nothing bizarre is happening in either respect. Since one cannot know this relative to c1 or c2 either, we then respect the idea that this would constitute too much knowledge. This is consistent with knowledge conjunction, since although the relevant proposition is the conjunction of two propositions each of which can be known in some context, there is no single context in which we can know both that things are non-bizarre in the first respect and that they are non-bizarre in the second respect.28

28Carter and Goldstein (forthcoming) develop a similar idea. Their view is roughly that, in addition to the ‘first-order’ dimensions of abnormality, there is a further ‘second-order’ dimension representing how much abnormality is present across the first-order dimensions, and that situations which are maximally normal along every first-order dimension are (when there are enough dimensions) abnormal along this second-order dimension. However, Carter and Goldstein don’t appeal to context-sensitivity, and also don’t try to generate overall normality orderings in terms of these various dimensions. Instead, they propose that an evidentially accessible situation is epistemically accessible just in case it is not sufficiently less normal along any dimension, including the second-order one. This allows them to construct models in which one can know that things are not ordinary along every first-order dimension. However, this proposal does not address the worry about “knowing too much”, since it still implies that for any set of first-order dimensions, however numerous, when things are ordinary along all of those dimensions, you can know that things are non-bizarre along all of them.

                 (2, 2)

          (1, 2) ↔ (2, 1)

(0, 2) ↔ (2, 0) ↔ (1, 1) ↔ (0, 0)

          (0, 1) ↔ (1, 0)

(Situations in the same row are equally normal relative to c3; lower rows are at least as normal as higher ones.)

Having introduced the idea that normality relations might be context-sensitive in this way, we will now set it aside. The important point is that the possibility of such context-sensitivity demonstrates that the normality framework has a natural way of responding to the skeptical threat posed by comparability and i-normality that doubles as a way of defusing the threat of too much knowledge posed by knowledge conjunction.29 The sceptical threat arising from comparability and i-normality is thus not a conclusive objection to these principles, and so their consequences remain worth exploring. We should also reiterate that endorsing this form of contextualism isn’t required to stop the normality framework from making skeptical predictions: we can instead reject both comparability and i-normality, and reconcile ourselves to the surprising predictions of knowledge conjunction in some other way.

5 Belief

k-, km- and i-normality each fully characterize what a person knows without any explicit reference to what they believe. This is quite striking, since it is widely held that belief is necessary for knowledge. Might evidential knowledge fail for someone who doesn’t believe everything entailed by their evidence? And might inductive knowledge fail for someone who doesn’t disregard every situation sufficiently less normal than the one they are actually in? We will sidestep these questions by treating the framework as a model of people who believe everything they are entitled to believe. There are two natural approaches for using the normality framework to model what such people believe.30 The first approach identifies belief with the possibility of knowledge

29A parallel response is much less natural for safety-based theories of knowledge, since whether or not an error possibility is safely far removed from actuality is not naturally understood as relative to the subject matter we are considering a person’s beliefs about. 30We here follow the tradition in epistemology of thinking of belief, roughly, as the “internal component” of knowledge. But we agree with Hawthorne et al. (2016) and Holguín (ms) that “believe” in English expresses an attitude considerably less demanding than this. “Be sure” is arguably closer to the mark for present purposes. While there may be cases of knowing without being sure, such as Radford’s (1966) unconfident examinee, these seem to turn on failures of partitionality, and so are beyond the scope of this paper. (E.g., assuming the examinee does know, he does so evidentially, by remembering, and his evidence does not entail that he

(call this k-belief): a person k-believes that p if and only if they do not know that they do not know that p.31 The second approach defines belief directly in terms of evidence and comparative normality (call this e-belief). Say that v is doxastically accessible from w just in case v is evidentially accessible from w and not sufficiently less normal than any u evidentially accessible from w. A person e-believes that p in w if and only if p is true in every situation that is doxastically accessible from w.32 So defined, e-belief can exhibit some pathological behavior if there can be infinite chains of sufficiently greater normality (for example, there may then be no situations doxastically accessible from a given situation). To rule this out, we will assume:

foundation: If v is evidentially accessible from w, then there is some u that is doxastically accessible from w and at least as normal as v. In appendix C we discuss how to modify the definition of e-belief in a way that generalizes smoothly to cases (if there are any) where foundation fails. Both the k-belief and e-belief accounts deliver the basic facts about knowledge and (justified) belief: knowledge entails true belief, but not vice versa. In particular, convexity ensures that Gettier-style justified true beliefs fail to amount to knowledge. Consider, following Russell (1949), a case where you come across a clock which you reasonably assume is working, but which has in fact stopped. You see that it reads 12pm; and, as it happens, it is in fact 12pm. Your belief that it is 12pm is both true and justified, yet it isn’t knowledge. And convexity explains why. For there will be evidentially accessible situations in which the clock is stopped at 12pm and it is 1pm which are at least as normal as the actual situation in which the clock is stopped at 12pm and it is 12pm; since the latter is epistemically accessible by factivity, the former is epistemically accessible by convexity, and so you do not know that it is 12pm. partitionality and foundation together imply that both e-belief and k-belief supervene on your evidence. Since partitionality implies that people always know what their evidence is, it follows that, in both senses of belief, people always know what they do and do not believe. so remembers.) We discuss such cases in Authors (ms c). 31This account of belief in terms of knowledge was first suggested by Lenzen (1978), and has been taken up by Stalnaker (2006), Rosenkranz (2018), and Dutant (forthcoming). The equivalence of belief and possible knowledge is also a feature of the aforementioned models of Williamson (2013a) and Goodman (2013). 32Goodman and Salow (2018) characterize belief in this way. Smith’s (2016) theory of justification is closely related; but comparing it to the present notion of e-belief is complicated by the fact that he deploys only the relation of one situation being at least as normal as another, and not also the notion of one situation being sufficiently more normal than another. Perhaps the most natural way of mapping between his framework and ours is to treat him as presupposing collapse; this makes his theory of justification isomorphic to the present account of e-belief (or, more accurately, to the account discussed in appendix C) given comparability, which is presupposed in his framework. However, Smith also considers a way of defining something like a sufficiently more normal than relation in terms of his at least as normal as relation and then reformulating his characterization of justification in terms of this defined notion; see Smith (2016, chapters 5.2 and 8). The resulting theory coincides with the present one given comparability.
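As an illustration of the e-belief clause, here is a sketch (ours) applying it to the nine-situation model of section 4, in which all nine situations are evidentially accessible from one another.

```python
# Doxastic accessibility as defined above: v is doxastically accessible
# just in case it is in the evidence and not sufficiently less normal
# than any situation in the evidence.
SITUATIONS = [(a, b) for a in range(3) for b in range(3)]

def suff_more_normal(w, v):
    # The product order of section 4.
    return (w[0] <= v[0] and w[1] <= v[1]
            and any((w[i], v[i]) == (0, 2) for i in range(2)))

def doxastic(evidence):
    return {v for v in evidence
            if not any(suff_more_normal(u, v) for u in evidence)}

print(sorted(doxastic(set(SITUATIONS))))
# [(0, 0), (0, 1), (1, 0), (1, 1)]: what is e-believed is exactly that
# things are not bizarre in either respect.
```

Note that foundation holds trivially here, since the set of situations is finite.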

Moreover, against the background of foundation and partitionality, either one of k-normality and comparability implies that k-belief and e-belief coincide. Without k-normality or comparability, k-belief still approximates e-belief, in the sense that you e-believe something if and only if it is entailed by the totality of things you k-believe. But in other respects the two notions may come apart in striking ways. In particular, without k-normality or comparability, there can be failures of the principle:33 k-belief conjunction: If you k-believe p and you k-believe q, then you k-believe p and q. (By contrast, the analogous principle for e-belief always holds.) Consider a model (depicted below) with four situations, w, v, u, and t. Each is evidentially accessible from any other; w is sufficiently more normal than u, insufficiently more normal than t, and incomparable with v; while v is sufficiently more normal than t and insufficiently more normal than u. Now suppose margins is true. Then for all you know you are in w, in which case you know you’re not in u; and for all you know you are in v, in which case you know you’re not in t; but you know that you don’t know that you are neither in u nor in t, since in every situation margins forces one or the other of them to be epistemically accessible.

u       t
⇑ ↗ ↖ ⇑
w       v

(Double arrows run from the sufficiently more normal situation to the sufficiently less normal one; the single diagonal arrows, from w to t and from v to u, indicate at least as normal as.)

Here is one reason to think that such structures are a genuine possibility. In our initial model in section 4, with nine situations that vary in normality along two dimensions, (0, 1), (1, 0), (2, 1), and (1, 2) stand in the same relations of comparative normality to each other as w, v, u, t do above. So it’s natural to think that a case with this four-situation structure could arise from circumstances originally modeled by the nine-situation structure after receiving evidence that eliminates all but those four situations. Note that, given km-normality and partitionality, failures of k-belief conjunction must involve failures of collapse, as in the above example. But given i-normality there may be failures of k-belief conjunction without failures of collapse; for example, by modifying the above model so that w is incomparable with t and v is incomparable with u, as depicted below.

u       t
⇑       ⇑
w       v

(Double arrows only: w and v are now incomparable with t and u respectively.)

33Authors (ms c) discuss counterexamples to k-belief conjunction that can arise in the setting of k-normality from failures of partitionality.

Again, we can argue that this structure depicts a genuine possibility by showing how it can be embedded in an independently motivated larger structure. Suppose there are three relevant dimensions of abnormality, and that matters can be either normal (0) or abnormal (1) along each dimension, with normal sufficiently more normal than abnormal, as depicted below. For example, suppose you have DNA evidence that a bear is a grizzly, but for all your evidence implies it might have any combination of three traits that are characteristic of grizzlies (shoulder humps, a concave face profile, and long and light claws). Supposing that the absence of any one of these traits on its own can be the basis for inductive knowledge that a bear is not a grizzly, it seems, conversely, that having evidential knowledge that a bear is a grizzly can be the basis for inductive knowledge in advance of observation that it will have each of these traits. As discussed in section 4, this pattern of judgments is naturally vindicated using the model depicted below (with one dimension for each trait). And within this model, (0, 0, 1), (1, 0, 0), (0, 1, 1) and (1, 1, 0) stand in the same relations of comparative normality to each other as w, v, u, t do above. So it’s natural to think that a case with that four-situation structure could arise from circumstances originally modeled by this eight-situation structure after receiving evidence that eliminates all but those four situations – e.g., in this case, the evidence that the bear has either shoulder humps or long and light claws, but not both. (We will revisit this example in section 7. This mode of argument is not airtight, for reasons we discuss in section 9.)

                  (1, 1, 1)

    (0, 1, 1)     (1, 0, 1)     (1, 1, 0)

    (0, 0, 1)     (0, 1, 0)     (1, 0, 0)

                  (0, 0, 0)

(Double arrows, running upward between states that differ in a single coordinate, indicate sufficiently more normal than; relations that follow by transitivity are not depicted.)
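The embedding claim can be checked mechanically. The following sketch (ours) verifies that the four states stand in the required relations under the three-dimensional product order, with 0 (normal) sufficiently more normal than 1 (abnormal) along each dimension.

```python
def at_least(w, v):
    return all(a <= b for a, b in zip(w, v))

def suff_more(w, v):
    # At least as normal on every dimension and sufficiently more
    # normal (0 vs. 1) on at least one.
    return at_least(w, v) and any(a < b for a, b in zip(w, v))

w, v, u, t = (0, 0, 1), (1, 0, 0), (0, 1, 1), (1, 1, 0)

print(suff_more(w, u), suff_more(v, t))   # True True
print(at_least(w, t) or at_least(t, w))   # False: w and t incomparable
print(at_least(v, u) or at_least(u, v))   # False: v and u incomparable
```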

What should we make of failures of k-belief conjunction, assuming they are genuine? Following Makinson (1965), many have found it plausible that one can believe each of a large number of claims – those contained in a particular history book, for example – without believing their conjunction. Is this a consideration in favor of modeling belief as k-belief rather than e-belief?34 We don’t think so. This is because the considerations that make it seem plausible

34Working in a normality-based framework, Littlejohn and Dutant (2020) argue for k-belief (and similar analyses of belief) over e-belief on roughly this basis, though they give no formal model for how k-belief conjunction fails. Carter and Goldstein (forthcoming) develop a theory of knowledge that is a close cousin to km-normality, and show in detail that this leads to failures of k-belief conjunction in some preface-like situations; they too treat this as a selling point for an account of belief as k-belief.

that such cases are ones in which we do not believe the conjunction of everything we believe make it equally plausible that we do not know the conjunction of everything we know, as discussed in section 4. The two claims should stand or fall together: any account of how belief fails to be closed under conjunction in such cases should imply that knowledge can also fail to be closed under conjunction in such cases, and so must involve a departure from the normality framework itself (since the framework implies knowledge conjunction). For this reason, we don’t think that failures of k-belief conjunction are the right diagnosis of what is going on when we are reluctant to attribute belief in the conjunction of each of very many independent claims we separately believe. (A more promising diagnosis, we think, would invoke the context-sensitivity of comparative normality sketched in section 4.)

6 Dynamics

Let us now consider how to use the normality framework to model the dynamics of knowledge and belief – that is, how knowledge and belief change when we get new evidence. A tempting thought is that getting more evidence is a matter of fewer situations becoming evidentially accessible. But ordinary processes of evidence-gathering – like weighing yourself – aren’t like this. Before you’ve weighed yourself, your evidence (ordinarily) will entail that you haven’t weighed yourself yet; after you’ve weighed yourself, your evidence will entail that you have weighed yourself. So no situation evidentially accessible after you weigh yourself was evidentially accessible before you weighed yourself. Part of what is going on is that evidential accessibility encodes not only facts about what your evidence entails about your environment, but also facts about what your evidence entails about what your evidence entails. Ordinarily, when we think about ending up with more evidence than we started with, what we mean is that our later evidence entails more about a given subject matter than our earlier evidence did. This will be our guiding idea here. To implement this idea, we model situations as having additional structure, representing how things are with respect to a given subject matter and what your evidence implies concerning that subject matter. Formally, we start with a set of ‘states’, each of which corresponds to a complete specification of how things are with respect to the relevant subject-matter. Situations are then ordered pairs ⟨x, Y⟩, where x is the actual state of the world, and Y is the set of states of the world compatible with your evidence. Not all such pairs are possible situations; at a minimum, x must be a member of Y, since the truth must be compatible with your evidence. We then stipulate that ⟨x′, Y′⟩ is evidentially accessible from ⟨x, Y⟩ just in case Y = Y′ – that is, just in case these situations agree on what evidence you have about the state of the world.35 In our running example of Bjorn weighing himself, we can treat states of the world as encoding facts

35Note that this stipulation implies partitionality, which we noted earlier is controversial as a general principle. It is not clear how the current strategy for modelling dynamics could be extended to cases in which partitionality fails.

about how much he weighs and what the scale reads when he steps on it. (We’re here assuming that the state of the world corresponds to an eternal proposition, and so doesn’t change as you acquire new evidence.) Since situations settle the truth values of propositions and also determine a state of the world, we can associate every proposition p with the set p∗ of states x such that, for some Y, p is true in ⟨x, Y⟩. This allows us to make precise our earlier remarks about what an agent knows/believes/has evidence about concerning the state of the world: a person knows(/believes/has evidence entailing) that the state of the world is a member of Y just in case, for some proposition p, Y = p∗ and the person knows(/believes/has evidence entailing) p. Finally, let us say that v is the result of discovering p in w just in case, for some state x and sets of states Y and Y′, w = ⟨x, Y⟩, v = ⟨x, Y′⟩, and Y′ = Y ∩ p∗. This notion of discovery allows us to make precise our earlier dynamic locutions about evidence accumulation, by allowing us to model acquiring a new piece of evidence p as a change in one’s situation.36 To see this machinery in action, consider a slightly more complicated version of the Bjorn case. Suppose that, instead of stepping on the scale only once, he steps on it twice.37 States can then be represented by triples ⟨x, y₁, y₂⟩, where x is Bjorn’s actual weight, y₁ is the weight indicated by the scale on the first weighing, and y₂ is the reading indicated on the second weighing. For each state s, there are then three situations. In the first, Bjorn’s evidence is characterized by the set of all states; in the second, his evidence is characterized by the set of all states that agree with s about the first reading of the scale; in the third, his evidence is characterized by the set of all states that agree with s about both readings of the scale. We want to know how his beliefs and knowledge evolve as he moves between these situations. A natural strategy for thinking about the normality relations between these situations is to treat them as parasitic on normality relations between their component states. Let us say that the state ⟨x, y₁, y₂⟩ is at least as normal as the state ⟨x′, y₁′, y₂′⟩ just in case (x − y₁)² + (x − y₂)² ≤ (x′ − y₁′)² + (x′ − y₂′)², and sufficiently more normal just in case (x − y₁)² + (x − y₂)² + c² < (x′ − y₁′)² + (x′ − y₂′)² for some positive constant c. A situation ⟨x, Y⟩ is at least as normal as (/sufficiently more normal than) a situation ⟨x′, Y′⟩ just in case x is at least as normal as (/sufficiently more normal than) x′. We will give some reasons why this is a natural model below. These normality relations satisfy comparability and foundation, so k-belief and e-belief coincide, and we may simply speak about what Bjorn believes. The model implies that, before stepping on the scale, Bjorn believes nothing about his weight. Once he steps on the scale for the first time, and sees that it

36While we think the present framework is conceptually cleaner, we suspect that much of the recent work in dynamic epistemic logic – where learning is modeled as a change in accessibility relations – can be recast in our setting by way of mutual connections with theories of belief-revision discussed in the next section; see van Benthem (2007). 37Goldstein and Hawthorne (ms) explore a range of strategies for modeling similar cases in the context of a normality-based epistemology.

reads y₁, he believes that he weighs y₁ ± c pounds.38 Once he steps on the scale for the second time, and sees that it reads y₂, he believes that he weighs (y₁ + y₂)/2 ± c/√2 pounds. If both readings match Bjorn’s actual weight (i.e., if x = y₁ = y₂), then his beliefs amount to knowledge before weighing himself, after weighing himself once, and after weighing himself a second time. Given k-normality, the same is true as long as the scale’s errors are not too large (i.e., so long as (x − y₁)² + (x − y₂)² ≤ c²). By contrast, given margins, any error in either of the scale’s readings implies that (i) after weighing himself once, Bjorn has some beliefs about how much he weighs that are not knowledge, (ii) even before stepping on the scale, he has some beliefs about how far apart the two readings will be that do not amount to knowledge, and (iii) after weighing himself twice, all of his beliefs amount to knowledge if and only if his true weight is the average of the two readings (i.e., if x = (y₁ + y₂)/2). To see why these predictions are attractive, suppose that the errors in the scale are the result of Gaussian noise. That is, whenever Bjorn weighs himself, the chances of different magnitudes of error in the scale’s reading are characterized by a ‘bell curve’ with center 0 (i.e., with overestimates as likely as corresponding underestimates). We assume that almost all of the area under the curve is within the interval of radius c centered around 0, with only a very small proportion of the area remaining in the distribution’s tails on either side. The scale’s error on the second weighing is probabilistically independent of its error on the first weighing. So the chances of different pairs of errors for Bjorn’s first and second weighings are characterized by a bivariate normal distribution (a ‘bell curve’ in two dimensions), one dimension corresponding to the error in the first reading and the other to the error in the second. Although any given error has probability 0 of occurring, different errors are associated with different probability densities (corresponding to the height of the bell curve at that point), and in this way it makes sense to take ratios of the chances of different (pairs of) readings. Here are two relevant mathematical facts. The first is that the distribution characterizing the average of the two errors is also a bell curve, but one that is more compact, by a factor of 1/√2. The second is that, for any pairs of readings y₁, y₂ and z₁, z₂ that have the same average and any weights x and x′, the ratio of the probability of the scale reading y₁ and then y₂ when Bjorn weighs x pounds to the probability of it reading y₁ and then y₂ when Bjorn weighs x′ pounds is the same as the ratio of the probability of it reading z₁ and then z₂ when Bjorn weighs x pounds to the probability of it reading z₁ and then z₂ when Bjorn weighs x′ pounds. Intuitively, this means that, if all we care about is the bearing of a pair of readings on the question of how much Bjorn weighs, then all that matters is the average of those readings. Since these averages are characterized by the same kind of distribution as the one characterizing a single reading, except scaled by a factor of 1/√2, it makes sense that what Bjorn believes (and, when all goes as well as possible, knows) about how much he weighs after weighing himself twice should correspond to an interval of weights centered on

38We use ‘a ± b’ to refer to the range of values in the interval [a − b, a + b].

the average of the two readings that is narrower by a factor of 1/√2 than the interval characterizing his beliefs about his weight after a single reading.39 Note that although all that matters for what Bjorn believes about his weight is the average of the two readings, the disparity between the readings is not epistemically irrelevant. Greater disparities between the readings are increasingly abnormal; as a result, Bjorn will believe that the readings will not differ by more than √2·c unless he discovers otherwise. Of course, the errors on real scales are not characterized by truly Gaussian noise (scales cannot give negative readings after all, and Bjorn’s digital scale gives discrete readings to the nearest tenth of a pound). Many scales’ errors are not characterized by even approximately unbiased Gaussian noise (for example, if the errors are partially the result of miscalibration in addition to random noise). And most people weighing themselves do not know how error-prone their scales are, or the nature of this error, or the number of times they will decide to weigh themselves. And even if the model is a reasonable one on the assumption that Bjorn’s evidence somehow settles all of these questions, we haven’t given a general recipe for how to model knowledge in terms of the statistics relating readings of measurement devices and the values of the quantities they measure. Nonetheless, we do think these statistical facts help to support the model, and the verdicts it delivers.40 Having gone some way towards motivating the normality structure described above, let’s return to the predictions it makes about Bjorn’s knowledge and beliefs. We already noted that, given margins, any error in either of the scale’s readings implies that, after weighing himself once, not everything Bjorn believes about how much he weighs amounts to knowledge. Notably, this is true even if the first reading exactly matches his actual weight (so that only the second reading involves any error). This phenomenon arises even given k-normality provided the second reading’s error is sufficiently large: Bjorn knows less about his weight after one perfectly accurate reading in ⟨x, x, y⟩ than he does in ⟨x, x, x⟩ when |x − y| > c. It is worth pausing on this prediction, since it means that what Bjorn knows depends not only on his weight and the readings of the scale so far, but also on what the scale will read in the future. Another striking prediction of the model given k-normality is that Bjorn can lose knowledge about how much he weighs. Suppose the state is ⟨x, x, y⟩ where (2 − √2)c < y − x < c. After seeing the first reading of the scale, Bjorn knows that he weighs x ± c pounds; after seeing both readings, he knows about his weight only that it is (x + y)/2 ± c/√2 pounds. And x + c < (x + y)/2 + c/√2, so after the second weighing he no longer knows that he weighs no more than x + c pounds, despite having known this previously. That knowledge about the state of the world can be destroyed by acquiring new evidence is, in this model, a distinctive prediction of k-normality.

39All of the above facts carry over to an n-observation generalization of the example, with √n in place of √2 after n observations. The squaring of errors in the definition of the normality relations is in order that Bjorn’s beliefs evolve in line with the relevant statistical facts. 40In Authors (ms e) we show how to derive this model from plausible assumptions about evidential probabilities given a probabilistic account of comparative normality.

margins implies that there is no knowledge lost, since Bjorn will know less to begin with. In particular, it implies that after seeing the first measurement Bjorn knows only that he weighs x ± √((y − x)² + c²) pounds, which is something he continues to know after weighing himself again. But all versions of the normality framework predict that the second reading will require Bjorn to give up some of his beliefs about his weight, since they all predict that he will go from believing that he weighs x ± c pounds to believing that he weighs (x + y)/2 ± c/√2 pounds, and x + c < (x + y)/2 + c/√2. This is notable because this change in Bjorn’s beliefs is the result of discovering that the scale reads y the second time. And, since x < y < x + c, it was compatible with what Bjorn believed after weighing himself the first time that the scale would read y the second time. The model therefore predicts that belief-revision is non-monotonic: discovering something compatible with your beliefs can occasion giving up some of those beliefs. We will consider further examples of this phenomenon below.
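For readers who want to experiment with these predictions, here is a sketch of the two-reading model on a discretized state space; the grid, the value of c, and the particular readings are our choices for illustration. It recovers the belief intervals derived above, including the non-monotonic revision.

```python
# States are triples (x, y1, y2); lower summed squared error is more
# normal, and a state is sufficiently less normal than another when
# its summed squared error exceeds the other's by more than c**2.
C = 2.0
GRID = [round(0.1 * i, 1) for i in range(1720, 1790)]  # 172.0 .. 178.9

def badness(state):
    x, y1, y2 = state
    return (x - y1) ** 2 + (x - y2) ** 2

def e_belief(evidence):
    best = min(map(badness, evidence))
    return {s for s in evidence if badness(s) <= best + C ** 2}

# Evidence after discovering that the first reading is 175.4:
one = {(x, 175.4, y2) for x in GRID for y2 in GRID}
xs = sorted({s[0] for s in e_belief(one)})
print(xs[0], xs[-1])   # 173.4 177.4, i.e. 175.4 +/- c

# ... and after further discovering that the second reading is 177.0:
two = {s for s in one if s[2] == 177.0}
xs = sorted({s[0] for s in e_belief(two)})
print(xs[0], xs[-1])   # 174.8 177.6, roughly 176.2 +/- c/sqrt(2)
```

Since the second reading 177.0 was compatible with what was believed after the first, and the new upper bound 177.6 exceeds the old one of 177.4, the sketch exhibits the non-monotonic revision described in the text.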

7 Belief-Revision

How do the two-dimensional models described in the previous section compare to existing models of belief-revision in the literature? To facilitate this comparison, we begin with two observations. The first is that, given partitionality and foundation, belief (however characterized) supervenes on evidence: if two situations agree on what your evidence is, then they agree on what you believe. This means that, when thinking about belief-revision, we can reduce a situation to its evidence component, i.e. to a set of states. The second observation is that, in the above model of Bjorn, normality relations between situations are parasitic on normality relations between states, in the following sense: evidence neutrality: For all situations ⟨x, Y⟩ and ⟨x′, Y′⟩, if x = x′, then ⟨x, Y⟩ is as normal as ⟨x′, Y′⟩. In this section and the next we will restrict our attention to models that satisfy this constraint. Given the reducibility of situations to their evidential components (as far as belief is concerned), and our assumption of evidence neutrality, the relevant structure of our two-dimensional models is essentially that of a set of sets of states, corresponding to possible bodies of evidence, and relations of comparative normality on the states themselves. For every body of evidence, what you believe given that evidence is characterized by a non-empty subset of your evidence, the doxastic possibilities given that evidence. The dynamic questions we are interested in concern how the beliefs held relative to one set of states (the initial evidence) compare to those held relative to a subset of those states (the evidence after discovering something new). These simplified structures are easy to compare with standard models of belief-revision using plausibility orders.41

41These models were first described by Grove (1988). See Caie (2019) and Lin (2019) for recent surveys helpfully focused on the issues discussed below.

consist of a set of worlds (corresponding to our states) and a relation of one world being at least as plausible as another, which is assumed to be reflexive, transitive, complete (i.e., no incomparabilities), and well-founded (i.e., no infinite chains of greater plausibility). Belief is defined as truth in all doxastically accessible worlds. Initially, w is doxastically accessible if and only if no world v is more plausible than w; and for any consistent p, w is doxastically accessible after learning p if and only if p is true at w and no world v at which p is true is more plausible than w. This definition is essentially the same as our definition of e-belief (where the initial state is one with trivial evidence). So these plausibility-order models correspond exactly to the aforementioned structure in our two-dimensional models in the special case where every non-empty set of states is a possible body of evidence, and comparative normality obeys collapse (in which case there is only one normality order) and comparability (in which case it is a complete order), in addition to foundation (in which case it is well-founded, given that every non-empty set of states is a possible body of evidence) and evidence neutrality. This correspondence makes it routine to establish that, when evidence neutrality, foundation, collapse, and comparability all hold, our framework for modeling belief-revision obeys the AGM postulates wherever it is defined.42 However, unlike standard approaches like the one in terms of plausibility orders described above, we do not assume that, as long as p is not a contradiction, it makes sense to ask what happens when you discover p. For example, for any situation in which your evidence entails not-p, there is no situation that is the result of discovering p. This difference means that our framework faces no problem of iterated belief-revision. That problem, which has dominated the post-AGM literature, arises from a desire to make sense of “learning” something inconsistent with something you previously “learned”. In our setting, you can never discover something inconsistent with what you previously discovered, since anything discovered must be true. Indeed, not only will everything you discover be consistent with anything you later discover, but it will also continue to be known in every situation that results from further discoveries.43 More substantial departures from AGM arise in models that violate either comparability or (as in our model of Bjorn) collapse. In particular, the AGM axioms entail the principle:

42See Alchourrón et al. (1985), which has become the dominant theory of belief-revision. Grove (1988) identified the corresponding constraints on plausibility orders. While foundation does not imply that being sufficiently more normal is well-founded when not all non-empty sets of states are possible bodies of evidence, this subtlety does not affect the correspondence with AGM. 43This does not mean that, once you discover something, you will know it forever. For our notion of discovery is logical, not temporal (although we sometimes use temporal idioms to describe it for ease of exposition). Being in w sometime before you are in v neither implies nor is implied by the existence of any p such that discovering p in w results in v. In fact, when we forget things, a temporally prior situation w may be the result of discovering p in a later situation v. Note that discovering p is not a one-to-one operation: there can be distinct situations w and w′ such that v is the result of discovering p in both of them. So there is no well-defined operation of “forgetting p”. This brings out another difference with the literature on belief-revision, which usually postulates such an operation called “contraction”.

preservation: If you believe p and don’t believe not-q, then you still believe p after discovering q. And we’ve just seen that this principle fails in our collapse-violating model of Bjorn.44,45 Lin (2019) notes that preservation can fail in plausibility-order models if comparative plausibility fails to be a complete order (i.e., if there are a pair of worlds neither of which is at least as plausible as the other).46 The situation is more delicate in our setting, since as discussed in section 5, failures of comparability can lead e-belief and k-belief to come apart by generating failures of k-belief conjunction. For simplicity, let us interpret the operative notion of belief as e-belief in what follows. We can then illustrate the potential failure of preservation with the four-state structure depicted below.47

a       c
⇑       ⇑
b       d

(Double arrows run from the sufficiently more normal state to the sufficiently less normal one; there are no other comparisons.)

The possible bodies of evidence are {a, b, c, d}, {b, c}, and {a, d}. Given the evidence {a, b, c, d}, you believe the state to be either b or d, which is consistent with it being one of a or d. But you give up this belief when you discover that it is one of a or d: for all you believe after this discovery, the state is a. This non-monotonic revision strikes us as plausible in the Grizzly-bear example used to motivate structures like this in section 5. Having discovered that the bear has either the characteristic shoulder humps or the characteristic light claws, but not both, you still believe that it has the characteristic concave face profile. But suppose you discover further that it also has only one of the concave face profile and the shoulder hump. Arguably, on at least some ways of fleshing out the example, the correct reaction is not to conclude that it has both the concave face profile and the light claws but lacks the shoulder hump, as preservation would require, but rather to suspend judgement on all three characteristic features. Aside: while our focus is now on the revision of beliefs, it’s worth mentioning the knowledge-analogue of preservation, namely

44Compare Smith (2018b, footnote 17), who notes in passing that preservation (his ‘Rational Monotonicity’) isn’t valid on the version of his theory discussed in footnote 32, which defines something like our sufficiently more normal than relation in terms of an at least as normal as relation, thus in effect allowing for failures of collapse. 45Hall (1999), Lin and Kelly (2012), and Shear and Fitelson (2019) all describe potential counterexamples to preservation, which could be naturally modelled in our framework as arising from failures of collapse. These authors respond instead by proposing probability-threshold models of belief that invalidate preservation; its failure in those models turns on the fact that it’s possible for the probability of p to exceed t and the probability of not-q to fall below t without the probability of p given q exceeding t. 46Lin also uses an example from Stalnaker (1994) (analogous to the Grizzly-bear example from section 5) to argue that comparative plausibility needn’t be a complete order.

no defeat: If you know p, then you still know p after discovering q. We have already seen that, given k-normality, this fails in our model of Bjorn. By contrast, given i-normality, it holds in any model satisfying evidence neutrality. Given km-normality and evidence neutrality, it can fail only in models that invalidate comparability. For example, in the model just discussed, you know in ⟨d, {a, b, c, d}⟩ that the state is either b or d; but you no longer know (or believe) this after discovering that the state is one of a or d and thus ‘moving’ to ⟨d, {a, d}⟩. Another interesting feature of this structure is that it predicts failures of a natural principle about non-belief: anticipation: If you wouldn’t believe p after discovering q and you wouldn’t believe p after discovering not-q, then you don’t believe p. Recall that, given evidence {a, b, c, d}, you believe that the state is either b or d. But you would give up this belief if you were to discover that the state is either a or d; and you would also give it up if you were to discover that the state is either b or c. Applied to the Grizzly example: you believe that the mystery Grizzly has a concave face profile; you would stop believing this if you were to discover that he has only one of the concave face profile and the shoulder hump; you would also stop believing this if you were to discover that he has only one of the concave face profile and the light claws; and you have evidential knowledge that exactly one of those two claims is true. These principles are diagnostically helpful in comparing the present framework to theories of belief-revision weaker than AGM. For example, if we assume evidence neutrality and foundation but neither comparability nor collapse, the resulting theory of discovery mirrors the belief-revision theory P+ discussed by Lin (2019) in which neither preservation nor anticipation holds. The same is true if we impose collapse, since it can be shown that any pattern of doxastic accessibility that can arise through failures of collapse can arise instead through failures of comparability.48 By contrast, imposing comparability but not collapse yields a theory stronger than P+ but weaker than AGM, which validates anticipation but not preservation.49 In the next section we will show how models of this kind arise naturally in thinking about inductive knowledge about objective chances and laws of nature.
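These failures can be exhibited mechanically. The following sketch (ours) computes the e-believed states for each body of evidence in the four-state model above.

```python
# b is sufficiently more normal than a, and d than c; there are no
# other comparisons.
SUFF_MORE = {('b', 'a'), ('d', 'c')}

def believed(evidence):
    # States in the evidence not sufficiently less normal than any
    # other state in the evidence.
    return {s for s in evidence
            if not any((t, s) in SUFF_MORE for t in evidence)}

p = {'b', 'd'}   # the proposition that the state is either b or d

print(sorted(believed({'a', 'b', 'c', 'd'})))   # ['b', 'd']: p believed
print(believed({'a', 'd'}) <= p)    # False: preservation fails
print(believed({'b', 'c'}) <= p)    # False: anticipation fails too
```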

8 Chance

We will now discuss two cases from Dorr, Goodman, and Hawthorne (2014) involving repeated, objectively chancy processes. The first case is analyzed by 48The two four-world models depicted in section 5 partially illustrate this fact; see also note 67. Note that models of collapse without comparability arise naturally if we understand being sufficiently more normal as being more normal on all admissible ‘precisifications’ of comparative normality (in the sense of Fine (1975)), with each precisification corresponding to a complete order that respects all determinate facts about comparative normality. 49Compare Kraus et al. (1990) on systems of non-monotonic logic where Rational Monotony fails but Negation Rationality holds; see note 63 for these principles.

Goodman and Salow (2018) in a way that provides another illustration of the modeling strategy from the previous section. The second provides a natural model of a key form of inductive knowledge, namely knowledge about the laws of nature or the objective chances obtained by observing the events they govern; it also illustrates that – pace Smith (2018a) – there are strong reasons to deny that outcomes that have equal chances of occurring (e.g. the different outcomes of a fair coin flip) are always equally normal. Here is the first case:

Flipping for Heads
A coin flipper will flip a fair coin until it lands heads. Then he will flip no more.

Suppose you have evidential knowledge that this is the setup. The coin will in fact land heads on the first flip. Dorr et al. argue that, on pain of skepticism, in this case you will know that the coin will not be flipped 1000 times. Goodman and Salow (2018) agree, and suggest (in effect) the following model. The states50 are s₀, s₁, . . . , corresponding to the number of times the coin lands tails. For every set of states Y and state x ∈ Y, there is a situation ⟨x, Y⟩. As with the case of Bjorn considered above, normality relations between situations are treated as parasitic on the normality relations between their underlying states, thereby validating evidence neutrality. sᵢ is at least as normal as sⱼ just in case i ≤ j, and sᵢ is sufficiently more normal than sⱼ just in case i + c ≤ j, where c is “the largest real number such that a .5ᶜ chance still qualifies as substantial” (p. 187). As in the aforementioned model of Bjorn, these relations satisfy comparability and foundation, so e-belief and k-belief again coincide. Moreover, belief-revision is again predicted to be non-monotonic; and, given k-normality, knowledge can be defeated by new evidence. For example, suppose c = 10. Then when your evidence is trivial, you believe that the coin will be flipped at most 10 times; and, given k-normality, this belief will amount to knowledge assuming it is true. If you then discover that the coin lands tails the first time it is flipped (which is compatible with what you previously believed), you will no longer believe (and hence no longer know, assuming you knew before) that the coin will be flipped at most 10 times – now you will believe only that the coin will be flipped at most 11 times.51 Again, we think this non-monotonic belief-revision is exactly the right prediction, and that it is an advantage of the present framework that it can easily model it. If you’re watching the experiment unfold and it is still in progress, then what you believe about how many future flips there will be until the coin lands

50Here and throughout, we ignore the possibility of the experiment going on forever. 51Given km- or i-normality, this does not amount to knowledge defeat, since if the coin really does land tails the first time it is flipped (as it must if you are to discover that it did), your belief that it would be flipped at most 10 times was never knowledge to begin with. It’s worth noting that Flipping for Heads closely resembles the surprise exam paradox, and the choice of whether or not to accept margins is analogous to the choice between diagnosing the surprise exam paradox as involving a failure of the KK thesis, as Williamson (2000, chapter 6) does, or a failure of no defeat, as Kripke (2011) does; see also Holliday (2016), who suggests that which diagnosis is appropriate may depend on the details of the case.

heads should not change. This is particularly compelling in a slightly modified case, where instead of flipping a fair coin until it lands heads, we are watching a radioactive atom and waiting for it to decay. On pain of skepticism, we think we should allow that ordinary people have some non-trivial beliefs about such subject matters. After all, many processes we set in motion in daily life (such as detonating thermonuclear weapons) require the actualization of some such decay events. But having beliefs about how long (from the present moment) it will be before the atom decays that change as we wait for it to decay smacks of a gambler’s fallacy: the fact that it hasn’t decayed yet doesn’t make it in any sense ‘overdue’, and this constancy in the objective chances should be matched by a constancy in the shape of our beliefs about the future. This way of thinking about what it is reasonable to believe about the outcomes of chancy processes sets our account apart from that of Martin Smith (2016, 2018a), the most systematically developed normality-based epistemology in the literature. Smith writes that “Given that a coin is fair, it would be equally normal for it to be flipped and land heads and for it to be flipped and land tails” (2018a, p. 731).52 Whether this marks a disagreement with us is a delicate question, since Smith operates with only one relation of comparative normality where we have two. His notion of one situation being more normal than another plays a role in his theory of justified belief analogous to our notion of one situation being sufficiently more normal than another. We agree with Smith that the outcome of a flip of a fair coin cannot by itself make for a sufficient difference in the normality of two situations, and the model just described respects this judgment.53 Where we disagree with Smith is over the stronger principle that the outcome of a flip of a fair coin cannot by itself make for any difference in the normality of two situations. (Smith’s commitment to this stronger principle is evident from the fact that he thinks that you shouldn’t believe anything about the outcome of the experiment beyond what is entailed by your evidence.) Intuitively, the point of disagreement is that we think small differences in normality can add up to a sufficient difference in normality, and that s₀, . . . , s₁₀₀₀ are a case in point. Note that our view is not driven by a pretheoretic judgment to the effect that s₀ is sufficiently more normal than s₁₀₀₀. The operative notions of normality, as we are understanding them, are too theoretical for such judgments to be very probative. Rather, our view is motivated by the fact that the hypothesized

52He continues: “Arguably, this is part of the analysis of what it means for a coin to be ‘fair’.” In this respect, his view is reminiscent of Lewis’s (1980) account of objective chance in terms of its role in a theory of rational degrees of belief. But the view is rather less developed – after all, presumably one adequate analysis of what it is for a coin to be fair is for it to have a .5 chance of landing on each side when flipped, but Smith’s remarks do not point to any general way of assigning numbers in the unit interval to the outcomes of chancy processes. 53Compare Goodman and Salow’s (2018) principle that “no possible world-history in which e obtains, where e is the event of some fair coin being flipped, is far more normal than every such history in which e results in a tails outcome (and vice versa)” (p. 195). Although they accept this as a general principle about coin flips, they do not accept a more general principle (their “Chance-Normality Link”, p. 193) about propositions with chance .5, for reasons similar to those that lead Hawthorne and Lasonen-Aarnio (2009) and Williamson (2009) to reject a related “High Chance – Close Possibility” bridge-principle.

normality relations make attractive epistemic predictions. Smith’s view, by contrast, is driven in part by an independent conception of normality: “To describe an event or a situation as ‘abnormal’ . . . is to mark [it] out . . . as something that would require special explanation if it were to occur or come about” (Smith, 2018b, p. 3). We agree with Smith that striking outcomes of repeated coin flips sometimes have no special explanation (i.e. none that wouldn’t be shared by more mundane outcomes), and hence don’t require special explanation. We thus reject the supposed connection between being sufficiently less normal and requiring a special explanation. It’s worth noting, however, that while striking outcomes needn’t require a special explanation, they may nonetheless ‘cry out’ for one (to use White’s (2005) expression). Rejecting Smith’s particular proposal may thus still be compatible with linking the abnormality of an event to the extent to which it ‘cries out’ for special explanation, and thus also with Harman’s (1965) idea that all reasonable inductive inferences are a kind of ‘inference to the best explanation’.54 We can sharpen our disagreement with Smith concerning knowledge of coin flips by considering a second example from Dorr et al. (2014). They write (and we agree) that “Surely, if you could ever learn anything non-trivial about objective chances, you could learn that a certain double-headed coin is not fair by flipping it repeatedly, seeing it land heads each time, and eventually inferring that it is not fair”. So consider:

Headed for Heads
You find a coin and, rather than inspecting it, you resolve to flip it 100 times and record whether it lands heads or tails. In fact, the coin is double-headed.

Surely it should be possible to learn that the coin is double-headed (or, at least, not fair) after seeing it land heads all 100 times you flip it in this case. As Bacon (2014) points out, this case seems like a simplified model for how we learn scientific laws: we observe a particular outcome a large number of times and eventually conclude that the regularity we see is due to an underlying law, not random chance. So denying that one can learn that the coin is double-headed by flipping it repeatedly comes dangerously close to all-out scepticism about induction. Fortunately, it is easy to give normality-based models which allow for such learning. We could, for example, say that, amongst situations in which the coin is fair, a situation in which it lands heads n times is at least as normal as one in which it lands heads m times just in case |50 − n| ≤ |50 − m| and sufficiently more normal just in case |50 − n| + c ≤ |50 − m| for some sufficiently large c (assumed to be less than 50), and the situation in which the coin is double-headed is as normal as any situation in which it is fair and lands heads exactly 50 times.55 This model predicts that, after seeing the coin land

54Smith is well aware of the different ways of connecting explanation and induction, and in Smith (2017) explicitly opts for the more demanding one in the case of patterns in the outcomes of coin flips. 55Natural generalizations of this model would also explain, for example, how one can come to know how many sides of a die are painted red and how many are painted green by observing

heads enough times, you can know that it is double-headed. But, of course, it violates Smith’s desideratum that two situations are equally normal whenever they differ only in how a fair coin lands when it is flipped. It is, moreover, hard to see how Smith’s desideratum could be squared with our anti-sceptical claim that you can eventually learn that the coin isn’t fair. For suppose, following Smith, that (i) the case can be modeled in a way that respects evidence neutrality,56 and (ii) two situations are equally normal whenever they differ only in how a fair coin lands when it is flipped. For all you know before flipping it, the coin is fair. So there is at least one epistemically accessible situation in which the coin is fair. So that situation is not sufficiently less normal than the actual situation. By (ii), that situation is as normal as the one where the coin is fair and will land heads every time. So when your evidence is trivial, the situation in which the coin is double-headed is not sufficiently more normal than the one where it is fair and will land heads every time. By (i), this remains unchanged after discovering that the coin lands heads every time. So even after that discovery, you won’t know that the coin isn’t fair.57 A different worry about Smith’s idea that we cannot have any inductive knowledge of the patterns of outcomes of the flips of fair coins is that it threatens the possibility of gaining inductive knowledge about measurable magnitudes using scientific instruments. For the approximately Gaussian distributions that usually characterize instruments’ errors are not a coincidence. Rather, they arise (by the Central Limit Theorem) from the fact that these errors result from adding up the impacts of many independent chancy processes that occur in the measurement process. This suggests that we can view the error in Bjorn’s scale, for example, as relevantly like the heads/tails disparity among many independent coin flips. If this is right, then Smith’s view implies that Bjorn lacks the kind of inductive knowledge that we have been assuming that he has.58
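The models of this section can be explored in the same computational style as before. Here, as an illustration, is a sketch (ours) of the Flipping for Heads model with c = 10, truncating the state space at 1000 tails; it reproduces the non-monotonic revision described at the start of this section.

```python
# State i: the coin lands tails i times before landing heads, so there
# are i + 1 flips in total; i is sufficiently less normal than j just
# in case j + c <= i.
C = 10
STATES = set(range(1001))   # truncated at 1000 tails

def believed(evidence):
    best = min(evidence)
    return {i for i in evidence if i < best + C}

# With trivial evidence, you believe the coin will be flipped at most
# 10 times:
print(max(believed(STATES)) + 1)                          # 10

# After discovering that the first flip lands tails, you believe only
# that it will be flipped at most 11 times:
print(max(believed({i for i in STATES if i >= 1})) + 1)   # 11
```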

9 Evidence Neutrality

All of the two-dimensional models we have considered so far have satisfied evidence neutrality. This section presents a case that is naturally described by two-dimensional models which violate that principle, and draws out some striking predictions for the dynamics of knowledge and belief. For example, we will argue that the case exhibits an analogue for inductive knowledge of what Pryor (2000) calls “dogmatism” about perceptual justification: actually getting some evidence can allow you to know p even though you cannot know in advance that, if you will get that evidence, then p is true. We highlight some methodological implications for theories of belief-revision.

56 evidence neutrality is in effect a presupposition of Smith’s framework, for the same reason as it is for plausibility-order models of belief-revision (as discussed in section 7), since the two classes of models are isomorphic.
57 In the next section we consider a way of preserving Smith’s desideratum by rejecting (i).
58 For further discussion of this perspective on instrument-based knowledge in a normality-theoretic context, see Goodman (2013).

The case combines features of the two coin-flipping examples from the previous section. As a warm-up, suppose you decide to flip 100 fair coins at once. For the kind of reasons discussed in connection with Headed for Heads, we think that, ordinarily, you can know that they will not all land heads. Now consider iterating this procedure in the following way:

Flipping for All Heads
A coin flipper will simultaneously flip 100 fair coins until they all simultaneously land heads. Then he will flip no more.

As in the non-iterated warm-up, we think that – given evidential knowledge of this setup – you can know that the coins won’t all land heads the first time.59 And as in Flipping for Heads, we think that, if you watch the experiment unfold, then as long as it is still ongoing, what you believe about how many more trials it will take for the experiment to come to an end remains the same.

These minimal assumptions conflict with evidence neutrality given a natural two-dimensional description of the relevant situations, which identifies states with outcomes of the experiment. For let $n$ be the smallest number such that it is compatible with your beliefs that there will be at most $n$ total trials. By hypothesis, $n > 1$. Now suppose you watch the first trial, and see that at least one of the coins lands tails. Since what you believe about how many more trials it will take for the experiment to come to an end remains the same, $n + 1$ is now the smallest number such that it is compatible with what you believe that there will be at most that many total trials. Let $s_i$ be the state in which the experiment ends after $i$ trials, let $S = \{s_1, s_2, \dots\}$, and let $x$ be the number of trials there will actually be. At the beginning of the experiment, you are in situation $\langle s_x, S\rangle$; since it is compatible with what you believe that you are in $\langle s_n, S\rangle$, there must be no state $s \in S$ such that $\langle s_n, S\rangle$ is sufficiently less normal than $\langle s, S\rangle$. After the first trial, you are in situation $\langle s_x, S\setminus\{s_1\}\rangle$; since it is not compatible with what you believe that you are in $\langle s_n, S\setminus\{s_1\}\rangle$, there must be some state $s \in S\setminus\{s_1\}$ such that $\langle s_n, S\setminus\{s_1\}\rangle$ is sufficiently less normal than $\langle s, S\setminus\{s_1\}\rangle$. Since $s_n$ is sufficiently less normal than $s$ when paired with your new evidence but not when paired with your old evidence, we have a violation of evidence neutrality.

We can make a similar argument using less artificial examples. Consider what you know about how long it will be before your car breaks down. Let $n$ be the least number such that, for all you now know, your car will have broken down that many days from now; and let $n'$ be the least number such that, for all you will know $n$ days from now, your car will have broken down that many days from then. Plausibly, $1 < n < n + n'$: you now know that your car will hold up for a non-trivial number of days, and after seeing it hold up that long, you’ll then know that it won’t break down immediately.

59 As mentioned earlier, radioactive decay has a similar large-scale probabilistic structure. There too we are attracted to an anti-skeptical view. Imagine two physicists setting up a Geiger counter to detect the decay of an atom with a half-life of one decade. One says to the other “Let’s take a break for lunch; we can finish this later – it won’t have decayed yet”. We think that such a physicist could speak from knowledge. But the argument in the main text does not hinge on such judgments.

Natural two-dimensional descriptions of this case will involve violations of evidence neutrality, for the same reason that our two-dimensional description of Flipping for All Heads does. Note that this argument does not assume that $n = n'$; to the contrary, presumably $n > n'$, since after $n$ days you can expect your car to be closer to breaking down than you could before.

These violations of evidence neutrality have some striking implications for the epistemology of induction. Consider the following principle from Dorr et al. (2014):60

inductive anti-dogmatism
If q together with what you know doesn’t entail p, then you won’t know p after discovering q.

This principle fails given our minimal assumptions about Flipping for All Heads. To see why, notice that we get a violation of the even weaker principle:

no learning from what you already know
If you know q but don’t know p, then you won’t know p after discovering q.

This is because you can know at the start of the experiment that the coins won’t all land heads on the first trial, and you don’t know that they won’t all land heads on the $n$th trial. But, after discovering that they don’t all land heads on the first trial, you will know that they won’t all land heads on the $n$th trial.61

The case also involves failures of preservation, for the same reason that Flipping for Heads does. But now the relevant discovery – that the experiment won’t end immediately – is also something you initially believe. So we also get a violation of the weaker principle:

weak preservation
If you believe q and believe p, then you’ll still believe p after discovering q.

60 They call this principle ‘inferential anti-dogmatism’. Compare also Bacon’s (forthcoming) principle ‘What You See is What You Know’.
61 A complication: if you are watching the experiment unfold, then when you learn that at least one coin lands tails you also see that a particular coin lands tails, which is not something you already knew. Does this show that the above argument is an artifact of an overly coarse-grained model of states? We don’t think so. For it is easy to imagine a version of the case where you aren’t observing the experiment yourself but are merely informed about whether it is still ongoing. (Compare waiting for a Geiger counter to indicate that a radioactive atom has decayed.) More importantly, making states more fine-grained to reflect which coins land tails does not block our argument against evidence neutrality and inductive anti-dogmatism; it merely makes it more complicated to state. Nor would it prevent violations of the following principle, which although stronger than no learning from what you already know is similarly compelling: If you know $q_1 \lor \dots \lor q_n$ but don’t know p, then either you wouldn’t know p after discovering $q_1$ or ... or you wouldn’t know p after discovering $q_n$.

If we are right, then, this case involves much more extreme departures from orthodox theories of belief-revision than are usually considered.62,63 To drive this point home, consider a different weakening of preservation, which to our knowledge has gone unchallenged in the belief-revision literature:

no reversal
If you believe p and don’t believe not-q, then you won’t believe not-p after discovering q.

This principle fails too. Let $m$ be the least number such that, at the start of the experiment, you believe the experiment will end in the first $m$ trials. So at the start of the experiment you do not believe that the experiment will end in the first $m - 1$ trials. Suppose you then discover that the experiment is ongoing after $m - 1$ trials. Just as you initially believed that it wouldn’t end on the first trial, you will now believe that it won’t end on the $m$th trial, and hence that it won’t end in the first $m$ trials.64

These counterexamples have important methodological implications for theories of belief-revision. As Lin (2019) cautions, theories of belief-revision are often presented so abstractly that it is difficult to know what they predict about concrete cases. We have tried to make progress on this front by showing how to interpret the machinery of belief-revision using models of knowledge and belief. But this required the non-trivial idealization that we can model the relevant situations two-dimensionally. This idealization is not always reasonable: it fails whenever partitionality is not a reasonable idealization, and even when it is, there still may be no reasonable way of factoring situations into a state of the world and your evidence about the state of the world. So from our perspective, we shouldn’t expect a theory of belief-revision to deliver exceptionless generalizations.

62 For an explicit defense of the principle that updating with a proposition you already believe leaves your beliefs unchanged, see Leitgeb (2014). Note that just as Flipping for Heads yields counterexamples to no defeat given k-normality, a parallel argument shows that Flipping for All Heads, given k-normality, yields counterexamples to the weakening of no defeat that results from replacing “believe” with “know” in weak preservation.
63 The case also presents counterexamples to widely held principles of non-monotonic reasoning (the names of which are taken from Kraus et al. (1990)) if we interpret $p \mid\!\sim q$ as the claim that, given total evidence p, you believe q; compare Makinson and Gärdenfors (1991). The failure of the (belief-analogue of) no learning from what you already know corresponds to a failure of Cut ($p \mid\!\sim q$ and $p \land q \mid\!\sim r \Rightarrow p \mid\!\sim r$), and the failure of weak preservation to the failure of Cautious Monotony ($p \mid\!\sim q$ and $p \mid\!\sim r \Rightarrow p \land q \mid\!\sim r$). Hawthorne and Makinson’s (2007) even weaker principle Very Cautious Monotony ($p \mid\!\sim q \land r \Rightarrow p \land q \mid\!\sim r$) is violated too – notably, since, unlike Cautious Monotony, it is probabilistically valid ($\Pr(q \land r \mid p) \le \Pr(r \mid p \land q)$). preservation corresponds to Rational Monotony ($p \not\mid\!\sim \lnot q$ and $p \land q \not\mid\!\sim r \Rightarrow p \not\mid\!\sim r$), anticipation to Negation Rationality ($p \land r \not\mid\!\sim q$ and $p \land \lnot r \not\mid\!\sim q \Rightarrow p \not\mid\!\sim q$), and the (belief-analogue of) inductive anti-dogmatism to S ($p \land q \mid\!\sim r \Rightarrow p \mid\!\sim q \to r$). If we are right then all of these principles fail on the present interpretation.
64 It is also instructive to consider the principle that results from weakening preservation in both of the above ways, namely:

weak no reversal
If you believe p and believe q, then you won’t believe not-p after discovering q.

Flipping for All Heads does not suggest any counterexamples to this principle. For it is natural to think that, for some $0 < n < m$, before each trial what you believe is that it will take somewhere between $n$ and $m$ more trials for the experiment to end. If this is right, there won’t be any counterexamples to weak no reversal as you watch the experiment unfold.

Rather, we should be looking for generalizations that hold given natural idealizations, and for features of cases that make these idealizations reasonable. Exploring these connections is an important topic for future work.65

Before concluding we should tie up a loose end. In section 8 we argued that, assuming evidence neutrality (as Smith tacitly does), his principle that it is equally normal for a fair coin to be flipped and land heads as for it to be flipped and land tails conflicts with our anti-skeptical claim that, having flipped a coin 100 times and seen it land heads every time, you can come to know that it is double-headed. But if we reject evidence neutrality, Smith’s principle needn’t lead to skepticism. Perhaps when paired with trivial evidence, the state where the coin is fair and lands heads all 100 times is just as normal as any other state in which the coin is fair, and not sufficiently less normal than the state where the coin is double-headed; but, when paired with the evidence that the coin landed heads all 100 times, the state in which it is also fair is sufficiently less normal than the one in which it is double-headed.66,67

65 Authors (ms e) make some progress on this front. For example, we identify conditions under which we can assume the following weakening of evidence neutrality:

$\succeq$-neutrality
$\langle x, Y\rangle$ is at least as normal as $\langle x', Y\rangle$ if and only if $\langle x, Y'\rangle$ is at least as normal as $\langle x', Y'\rangle$.

This principle implies no reversal, and so must fail in Flipping for All Heads.
66 The probabilistic account of normality explored in Authors (ms e) naturally generates models like this, although it also predicts that these models characterize our knowledge only in certain contexts.
67 There are also interesting connections between evidence neutrality and collapse. Recall from the end of section 7 that, assuming evidence neutrality and foundation, any model in which collapse fails can be transformed into one with the same pattern in doxastic accessibility in which collapse holds, provided we introduce compensating violations of comparability. Here we will show something similar: assuming k-normality and foundation, any model in which collapse fails can be transformed into one with the same pattern in epistemic accessibility in which collapse holds, without introducing any new incomparabilities, provided we introduce compensating violations of evidence neutrality. Let $\succeq$ and $\gg$ be the original normality relations. For $w = \langle x, Y\rangle$ and $v = \langle x', Y'\rangle$, say that $w \succeq' v$ just in case either (i) w is doxastically accessible given Y and v isn’t, (ii) neither w nor v is doxastically accessible given Y, and $w \succeq v$, or (iii) both w and v are doxastically accessible given Y. And say that $w \gg' v$ just in case either (i) w is doxastically accessible given Y and v isn’t, or (ii) neither w nor v is doxastically accessible given Y, and $w \succeq v$ but $v \not\succeq w$. So defined, $\succeq'$ and $\gg'$ obey all the structural constraints we have imposed on the comparative normality relations as well as collapse; and using them to replace $\succeq$ and $\gg$ yields a model in which all the same epistemic accessibility relations obtain. But evidence neutrality may radically fail for the ‘collapsed’ normality relations $\succeq'$ and $\gg'$, even if it held for the original relations $\succeq$ and $\gg$. Note that no analogous strategy is available given margins. Intuitively, this is because, given k-normality, the only sufficiently-more-normal facts that matter are ones involving the most normal possibilities compatible with the various bodies of evidence.
Given margins, by contrast, all sufficiently-more-normal facts can matter for determining epistemic accessibility. (For a similar reason, the result mentioned in section 7 does not extend to epistemic accessibility, since at-least-as-normal facts involving doxastically inaccessible situations matter for epistemic accessibility given convexity.) Moreover, even assuming k-normality, we think these models are somewhat artificial, since they are most naturally characterized by ‘collapsing’ models in which collapse fails, and so seem to display a tacit recognition of a distinction between mere greater normality and sufficiently greater normality.
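Pulling the threads of this section together, the following toy sketch (entirely our illustration: the abnormality function, margin, and truncation are invented only to deliver the verdicts argued for above) gives a two-dimensional model of Flipping for All Heads on which both evidence neutrality and no learning from what you already know fail:

```python
# A toy two-dimensional model of Flipping for All Heads.
# States: s_i = 'the experiment ends after trial i', truncated at N.
# The evidence after watching j unsuccessful trials is {s_i : i > j}.

N, C = 1000, 4  # truncation and the sufficient-normality margin (arbitrary)

def abnormality(i, j):
    """Normality of the pair <s_i, {s_k : k > j}>: what matters is how many
    MORE trials remain (i - j), so normality depends on the evidence as
    well as the state -- exactly the failure of evidence neutrality."""
    return max(0, 5 - (i - j))   # ending within the next few trials is abnormal

def known_not(i, j, actual):
    """Is s_i ruled out, k-normality style, given evidence {s_k : k > j}?
    v stays accessible iff v is at least as normal as the actual situation,
    or no compatible situation is sufficiently more normal than v."""
    a_v, a_w = abnormality(i, j), abnormality(actual, j)
    if a_v <= a_w:
        return False
    return any(abnormality(u, j) + C <= a_v for u in range(j + 1, N + 1))

actual = 100                                 # the experiment in fact ends on trial 100
print(abnormality(2, 0), abnormality(2, 1))  # 3 4: same state, different normality
print(known_not(1, 0, actual))               # True:  you know it won't end on trial 1
print(known_not(2, 0, actual))               # False: you don't know it won't end on trial 2
print(known_not(2, 1, actual))               # True:  after trial 1 you do know -- a failure
                                             # of 'no learning from what you already know'
```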

10 Conclusion

Compared to theories of inductive belief and credence, theories of inductive knowledge are in their infancy. Of course, theories of inductive belief can be paired with general accounts of when beliefs amount to knowledge, and such accounts have been around since antiquity. But these pairings won’t be theories of inductive knowledge of the kind we have in mind. For we can expect them to be intractable when it comes to the questions we have been exploring. For example, imagine combining a Bayesian-conditioning theory of inductive credence, a Lockean reduction of belief to high credence, and a reliably-true-belief analysis of knowledge. What general principles will knowledge obey on such an account? And how should we model what a person knows in the sort of cases we have been investigating? This heterogeneous conjunction offers little guidance.

The normality framework is different. Its ambition is to do for inductive knowledge what Lewis (1973) did for counterfactual dependence. It provides a unified framework in which alternative theories of inductive knowledge can be productively compared, and in which general principles about inductive knowledge can be derived from structural conditions on comparative normality, just as Lewis showed how different principles of counterfactual logic can be derived from structural conditions on worlds’ comparative similarity.

More importantly, just as Lewis’s framework gave philosophers new tools to think systematically about patterns of counterfactual dependence as they relate to causation, dispositions, laws of nature, and so on, the normality framework suggests natural models of what people know in a range of independently interesting cases. In particular, we have seen how it can be used to build attractive models of knowledge gained through multiple readings of error-prone instruments, knowledge about complex patterns that can be broken down into independent chancy events, knowledge of laws of nature, and knowledge pooled across multiple inductive domains. In other work we apply it to a broader range of cases, and argue that at least in many contexts independent characterizations of comparative normality in terms of evidential probability are available, which can be used to derive the formal models discussed in the second half of this paper.68

Finally, just as Lewis theorizes directly about counterfactuals, rather than analyzing them in terms of other modal notions like laws of nature, the normality framework characterizes knowledge directly, rather than analyzing it in terms of belief. At least at the present level of idealization, inductive knowledge may do more to illuminate inductive belief than the other way around.

68 Authors (ms b) applies the framework to motivate a distinctive solution to the lottery paradox; Authors (ms c) uses it to analyze puzzles about belief and knowledge arising from failures of partitionality; Authors (ms a) argues that the framework compares favorably to safety-based approaches when applied to paradigm cases of inductive knowledge in statistical physics, such as the dynamics of a gas in a box and the trajectory of a particle subject to Brownian motion. The probabilistic reduction of normality is developed in Authors (ms e); compare Lewis (1979b), who argues that in many contexts comparative similarity can be independently characterized.

References

Carlos Alchourrón, Peter Gärdenfors, and David Makinson. On the logic of theory change: Partial meet contraction and revision functions. Journal of Symbolic Logic, 50:510–530, 1985.
Authors. Paper on inductive knowledge in statistical physics. ms a.
Authors. Paper on induction and lotteries. ms b.
Authors. Paper on belief when partitionality fails. ms c.
Authors. Paper on KK when partitionality fails. ms d.
Authors. Paper on a contextualist reduction of normality to probability. ms e.
Andrew Bacon. Giving your knowledge half a chance. Philosophical Studies, 171:373–397, 2014.
Andrew Bacon. Inductive knowledge. Noûs, forthcoming.
Brian Ball. Knowledge is normal belief. Analysis, 73:69–76, 2013.
Adam Bear and Joshua Knobe. Normality: Part descriptive, part normative. Cognition, 167:25–37, 2017.
Bob Beddor and Carlotta Pavese. Modal virtue epistemology. Philosophy and Phenomenological Research, forthcoming.
Alexander Bird and Richard Pettigrew. Internalism, externalism, and the KK principle. Erkenntnis, forthcoming.
Michael Caie. Doxastic logic. In Richard Pettigrew and Jonathan Weisberg, editors, The Open Handbook of Formal Epistemology. The PhilPapers Foundation, 2019.
Sam Carter. Higher order ignorance inside the margins. Philosophical Studies, 176:1789–1806, 2019.
Sam Carter and Simon Goldstein. The normality of error. Philosophical Studies, forthcoming.
Stewart Cohen and Juan Comesaña. Williamson on Gettier cases and epistemic logic. Inquiry, 56:15–29, 2013.
Cian Dorr and Jeremy Goodman. Diamonds are forever. Noûs, forthcoming.
Cian Dorr, Jeremy Goodman, and John Hawthorne. Knowing against the odds. Philosophical Studies, 170:277–287, 2014.
Cian Dorr, Jacob Nebel, and Jake Zuehl. The case for comparability. ms.
Julien Dutant. How to be an infallibilist. Philosophical Issues, 26:148–171, 2016.
Julien Dutant. Knowledge-first evidentialism about rationality. In Fabian Dorsch and Julien Dutant, editors, The New Evil Demon Problem. Oxford UP, forthcoming.
Julien Dutant and Sven Rosenkranz. Inexact knowledge 2.0. Inquiry, forthcoming.
Kit Fine. Vagueness, truth, and logic. Synthese, 30:265–300, 1975.
Alvin Goldman. Epistemology and Cognition. Harvard UP, 1986.
Simon Goldstein and John Hawthorne. Knowledge from multiple appearances. ms.
Jeremy Goodman. Inexact knowledge without improbable knowing. Inquiry, 56:30–53, 2013.
Jeremy Goodman and Bernhard Salow. Taking a chance on KK. Philosophical Studies, 175:183–196, 2018.
Nelson Goodman. Fact, Fiction, and Forecast. Harvard UP, 1955.
Peter Graham. Epistemic entitlement. Noûs, 46:449–482, 2012.
Peter Graham. Normal circumstances reliabilism: Goldman on reliability and justified belief. Philosophical Topics, 45:33–61, 2017.
Daniel Greco. Could KK be OK? Journal of Philosophy, 111:169–197, 2014.
Adam Grove. Two modellings for theory change. Journal of Philosophical Logic, 17:157–170, 1988.
Ned Hall. How to set a surprise exam. Mind, 108:647–703, 1999.
Gilbert Harman. The inference to the best explanation. Philosophical Review, 74:88–95, 1965.
James Hawthorne and David Makinson. The quantitative/qualitative watershed for rules of uncertain inference. Studia Logica, 86:247–297, 2007.
John Hawthorne and Maria Lasonen-Aarnio. Knowledge and objective chance. In Patrick Greenough and Duncan Pritchard, editors, Williamson on Knowledge. Oxford UP, 2009.
John Hawthorne, Daniel Rothschild, and Levi Spectre. Belief is weak. Philosophical Studies, 173:1393–1404, 2016.
Ben Holguín. Thinking, guessing, and believing. ms.
Wesley Holliday. Epistemic closure and epistemic logic I: Relevant alternatives and subjunctivism. Journal of Philosophical Logic, 44:1–62, 2015.
Wesley Holliday. Simplifying the surprise exam. UC Berkeley Working Paper in Philosophy, 2016.
Joachim Horvath and Jennifer Nado. Knowledge and normality. Synthese, forthcoming.
David Hume. A Treatise of Human Nature. Oxford UP, 1739.
Angelika Kratzer. Conditional necessity and possibility. In Semantics from Different Points of View. Springer, 1979.
Angelika Kratzer. Situations in natural language semantics. In The Stanford Encyclopedia of Philosophy (Summer 2019 Edition). 2019.
Sarit Kraus, Daniel Lehmann, and Menachem Magidor. Nonmonotonic reasoning, preferential models, and cumulative logics. Artificial Intelligence, 44:167–207, 1990.
Saul Kripke. On two paradoxes of knowledge. In Philosophical Troubles. Oxford UP, 2011.
Hannes Leitgeb. The review paradox: On the diachronic costs of not closing rational belief under conjunction. Noûs, 48:781–793, 2014.
Wolfgang Lenzen. Recent Work in Epistemic Logic. Acta Philosophica Fennica, 1978.
Jarrett Leplin. In defense of reliabilism. Philosophical Studies, 134:31–42, 2007.
David Lewis. Counterfactuals. Blackwell, 1973.
David Lewis. Attitudes de dicto and de se. Philosophical Review, 88:513–543, 1979a.
David Lewis. Counterfactual dependence and time’s arrow. Noûs, 13:455–476, 1979b.
David Lewis. A subjectivist’s guide to objective chance. In William Harper, Robert Stalnaker, and Glenn Pearce, editors, Ifs. D. Reidel, 1980.
David Lewis. Ordering semantics and premise semantics for counterfactuals. Journal of Philosophical Logic, 10:217–234, 1981.
David Lewis. Elusive knowledge. Australasian Journal of Philosophy, 74:549–567, 1996.
Hanti Lin. Belief revision theory. In Richard Pettigrew and Jonathan Weisberg, editors, The Open Handbook of Formal Epistemology. The PhilPapers Foundation, 2019.
Hanti Lin and Kevin Kelly. Propositional reasoning that tracks probabilistic reasoning. Journal of Philosophical Logic, 41:957–981, 2012.
Clayton Littlejohn and Julien Dutant. Justification, knowledge, and normality. Philosophical Studies, 177:1593–1609, 2020.
Annina Loets. Choice points for a theory of normality. ms.
Ofra Magidor. How both you and the brain in a vat can know whether or not you are envatted. Aristotelian Society Supplementary Volume, 92:151–181, 2018.
David Makinson. The paradox of the preface. Analysis, 25:205–207, 1965.
David Makinson. Five faces of minimality. Studia Logica, 52:339–379, 1993.
David Makinson and Peter Gärdenfors. Relations between the logic of theory change and nonmonotonic logic. In André Fuhrmann and Michael Morreau, editors, The Logic of Theory Change. Springer, 1991.
John McDowell. Criteria, defeasibility and knowledge. Proceedings of the British Academy, 68:455–479, 1982.
John McDowell. Knowledge and the internal. Philosophy and Phenomenological Research, 55:877–893, 1995.
Ruth Millikan. Naturalist reflections on knowledge. Pacific Philosophical Quarterly, 65:315–334, 1984.
Andrew Peet and Eli Pitcovski. Normal knowledge: Towards an explanation-based theory of knowledge. Journal of Philosophy, 115:141–157, 2018.
John Pollock. The “possible worlds” analysis of counterfactuals. Philosophical Studies, 29:469–476, 1976.
Duncan Pritchard. Anti-luck virtue epistemology. Journal of Philosophy, 109:247–279, 2012.
James Pryor. The sceptic and the dogmatist. Noûs, 34:517–549, 2000.
Colin Radford. Knowledge: by examples. Analysis, 27:1–11, 1966.
Sven Rosenkranz. The structure of justification. Mind, 127:309–338, 2018.
Bertrand Russell. Human Knowledge: Its Scope and its Limits. Allen & Unwin, 1949.
Bernhard Salow. Lewis on iterated knowledge. Philosophical Studies, 173:1571–1590, 2016.
Ted Shear and Branden Fitelson. Two approaches to belief-revision. Erkenntnis, 84:487–518, 2019.
Martin Smith. What else justification could be. Noûs, 44:10–31, 2010.
Martin Smith. Between Probability and Certainty: What Justifies Belief. Oxford UP, 2016.
Martin Smith. Why throwing 92 heads in a row is not surprising. Philosophers’ Imprint, 17:1–8, 2017.
Martin Smith. Coin trials. Canadian Journal of Philosophy, 48:726–741, 2018a.
Martin Smith. The logic of epistemic justification. Synthese, 195:3857–3875, 2018b.
Ernest Sosa. How to defeat opposition to Moore. Philosophical Perspectives, 13:141–154, 1999.
Robert Stalnaker. What is a non-monotonic consequence relation? Fundamenta Informaticae, 21:7–21, 1994.
Robert Stalnaker. On logics of knowledge and belief. Philosophical Studies, 128:169–199, 2006.
Robert Stalnaker. Luminosity and the KK thesis. In S. Goldberg, editor, Externalism, Self-Knowledge, and Skepticism. Cambridge UP, 2015.
Robert Stalnaker. Contextualism and the logic of knowledge. In Knowledge and Conditionals. Oxford UP, 2019.
Johan van Benthem. Dynamic logic for belief revision. Journal of Applied Non-Classical Logics, 17:129–155, 2007.
Brian Weatherson. Margins and errors. Inquiry, 56:63–76, 2013.
Roger White. Explanation as a guide to induction. Philosophers’ Imprint, 5:1–29, 2005.
Timothy Williamson. Knowledge and its Limits. Oxford UP, 2000.
Timothy Williamson. Probability and danger. The Amherst Lecture in Philosophy, 4:1–35, 2009.
Timothy Williamson. Improbable knowing. In Trent Dougherty, editor, Evidentialism and its Discontents. Oxford UP, 2011.
Timothy Williamson. Gettier cases in epistemic logic. Inquiry, 56:1–14, 2013a.
Timothy Williamson. Response to Cohen, Comesaña, Goodman, Nagel, and Weatherson on Gettier cases in epistemic logic. Inquiry, 56:77–96, 2013b.
Timothy Williamson. Very improbable knowing. Erkenntnis, 79:971–999, 2014.
Timothy Williamson. The KK principle and rotational symmetry. Analytic Philosophy, forthcoming.
Zhaoqing Xu and Bo Chen. Epistemic logic with evidence and relevant alternatives. In Hans van Ditmarsch and Gabriel Sandu, editors, Jaakko Hintikka on Knowledge and Game-Theoretical Semantics. Springer, 2018.
Seth Yalcin. Modalities of normality. In Deontic Modality. Oxford UP, 2016.
Elia Zardini. K * E. Philosophy and Phenomenological Research, 94:540–557, 2017.

A Glossary

Some principles or defined notions are used repeatedly. They are reprinted here for ease of reference.

Section 1 defines the normality framework as the conjunction of the following:

structural principles
$\succeq$ (‘is at least as normal as’) is reflexive and transitive; $\gg$ (‘is sufficiently more normal than’) is asymmetric; $w \gg v \Rightarrow w \succeq v$; and $w_1 \succeq w_2 \gg w_3 \succeq w_4 \Rightarrow w_1 \gg w_4$.

factivity
w is epistemically accessible from w.

$w \in R_K(w)$

evidential knowledge
If v is not evidentially accessible from w, then v is not epistemically accessible from w.

$R_K(w) \subseteq R_E(w)$

inductive knowledge
If w is sufficiently more normal than v, then v is not epistemically accessible from w.

$w \gg v \Rightarrow v \notin R_K(w)$

inductive limits
If v is evidentially accessible from w and no $v'$ evidentially accessible from w is sufficiently more normal than v, then v is epistemically accessible from w.

$(v \in R_E(w) \land \forall u \in R_E(w)\,(u \not\gg v)) \Rightarrow v \in R_K(w)$

convexity
If v is evidentially accessible from w and is at least as normal as some $v'$ epistemically accessible from w, then v is epistemically accessible from w.

$(v \in R_E(w) \land \exists u \in R_K(w)\,(v \succeq u)) \Rightarrow v \in R_K(w)$
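For finite models, these structural constraints can be checked mechanically. The following is a sketch of such a checker (a convenience of ours, not part of the framework), with the two relations represented as explicit sets of ordered pairs:

```python
# A brute-force checker for the structural principles, for finite models:
# geq  = {(w, v) : w is at least as normal as v}
# suff = {(w, v) : w is sufficiently more normal than v}

def satisfies_structural_principles(worlds, geq, suff):
    reflexive  = all((w, w) in geq for w in worlds)
    transitive = all((a, c) in geq
                     for (a, b) in geq for (b2, c) in geq if b2 == b)
    asymmetric = all((v, w) not in suff for (w, v) in suff)
    entailment = suff <= geq  # w >> v implies w >= v
    chaining   = all((w1, w4) in suff  # w1 >= w2 >> w3 >= w4 implies w1 >> w4
                     for (w1, w2) in geq
                     for (x, w3) in suff if x == w2
                     for (y, w4) in geq if y == w3)
    return all([reflexive, transitive, asymmetric, entailment, chaining])
```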

Section 2 introduces a strong assumption about evidence, which we treat as an idealizing assumption for the remainder of the paper:

partitionality
Evidential accessibility is an equivalence relation.

Section 3 considers various strengthenings of the normality framework. Say that two situations are comparable if one is at least as normal as the other, and that one situation is more normal than another if it is at least as normal as the other but not vice versa. The main additions discussed are then:

comparability
If v is evidentially accessible from w, then w and v are comparable.

$v \in R_E(w) \Rightarrow (v \succeq w \lor w \succeq v)$

collapse
If v is more normal than w, then v is sufficiently more normal than w.

$(v \succeq w \land w \not\succeq v) \Rightarrow v \gg w$

margins
If v is evidentially accessible from w and comparable with w but not sufficiently less normal than w, then v is epistemically accessible from w.

$(v \in R_E(w) \land (v \succeq w \lor w \succeq v) \land w \not\gg v) \Rightarrow v \in R_K(w)$

k-normality
Epistemic accessibility is the most restrictive relation compatible with factivity, inductive limits, and convexity.

$R_K(w) = \{v \in R_E(w) : v \succeq w\} \cup \{v \in R_E(w) : \forall u \in R_E(w)\,(u \not\gg v)\}$

km-normality
Epistemic accessibility is the most restrictive relation compatible with margins, inductive limits, and convexity.

$R_K(w) = \{v \in R_E(w) : \exists u \in R_E(w)\,(w \succeq u \land w \not\gg u \land v \succeq u)\} \cup \{v \in R_E(w) : \forall u \in R_E(w)\,(u \not\gg v)\}$

i-normality
Epistemic accessibility is the most inclusive relation compatible with evidential knowledge and inductive knowledge.

$R_K(w) = \{v \in R_E(w) : w \not\gg v\}$
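For finite models, the three definitions can be transcribed directly. In the sketch below (ours; the helper names are assumptions), geq and suff encode $\succeq$ and $\gg$, and R_E maps a situation to the set of situations evidentially accessible from it:

```python
# Direct transcriptions of k-, km-, and i-normality for finite models.

def R_K_k(w, R_E, geq, suff):
    """k-normality."""
    E = R_E(w)
    unexcelled = {v for v in E if all(not suff(u, v) for u in E)}
    return {v for v in E if geq(v, w)} | unexcelled

def R_K_km(w, R_E, geq, suff):
    """km-normality."""
    E = R_E(w)
    unexcelled = {v for v in E if all(not suff(u, v) for u in E)}
    witnessed = {v for v in E
                 if any(geq(w, u) and not suff(w, u) and geq(v, u) for u in E)}
    return witnessed | unexcelled

def R_K_i(w, R_E, geq, suff):
    """i-normality."""
    return {v for v in R_E(w) if not suff(w, v)}
```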

Section 5 compares two notions of belief. A person e-believes that p if and only if p is true in every evidentially accessible possibility which is not sufficiently less normal than any other evidentially accessible possibility. A person k-believes that p if and only if they do not know that they do not know that p.

To study the dynamics of knowledge and belief, section 6 introduces two-dimensional models that treat situations as state/evidence pairs, where a body of evidence is a set of states. We say that $\langle x', Y'\rangle$ is the result of discovering p in $\langle x, Y\rangle$ if and only if $x = x'$ and $Y'$ is the set of states in $Y$ that are compatible with p (i.e., the set of states z such that, for some Z, $\langle z, Z\rangle$ is a situation in which p is true – note that only some state/set-of-states pairs are situations).
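A sketch of the discovery operation, under our illustrative assumptions that propositions are represented as predicates on situations and that the set of genuine situations is given explicitly:

```python
# 'Discovering p' in a two-dimensional model, following the definition above.
# Situations are (state, frozenset-of-states) pairs; only some such pairs
# are genuine situations, so they are passed in explicitly.

def discover(p, situation, situations):
    x, Y = situation
    # Keep just the states in Y compatible with p: those z such that p is
    # true in SOME genuine situation whose state is z.
    Y_new = frozenset(z for z in Y
                      if any(p(s) for s in situations if s[0] == z))
    return (x, Y_new)
```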

We used this formalism to give models of repeated weighing (section 6) and of repeated coin-flipping (section 8) that, among other things, invalidate:

preservation
If you believe p and don’t believe not-q, then you still believe p after discovering q.

In section 9, we gave a model of a more complicated coin-flipping setup that invalidates further principles, including:

evidence neutrality
For all situations $\langle x, Y\rangle$ and $\langle x', Y'\rangle$, if $x = x'$, then $\langle x, Y\rangle$ is as normal as $\langle x', Y'\rangle$.

no learning from what you already know
If you know q but don’t know p, then you won’t know p after discovering q.

B Diagrams of Bjorn

In section 3 we considered a range of competing hypotheses about the comparative normality of the situations in which Bjorn has different weights and the scale reads 175.4 pounds. Here we present diagrams illustrating these hypotheses and their consequences for epistemic accessibility given k-normality, km-normality, and i-normality. A new pictorial convention is that situations within the same box are all equally normal (i.e. each is at least as normal as the other); an arrow from one box to another thus means that the relevant relation holds between every situation in the first box and every situation in the second box. Aside from this, the conventions are as in the main text. (For normality diagrams: every depicted situation is evidentially accessible from every other. A dotted arrow from w to v represents that w is at least as normal as v; a dotted double arrow from w to v represents that w is sufficiently more normal than v. Reflexive relations of at least as normal as are not depicted; neither are relations of at least as normal as and sufficiently more normal than that can be inferred from the transitivity of these relations and the chaining principle connecting them. For epistemic accessibility diagrams: we only represent what is epistemically accessible from the situations printed in bold; triple arrows indicate what is accessible from there under all three of k-normality, km-normality, and i-normality; double arrows indicate what is accessible under both km-normality and i-normality; and single arrows indicate what is accessible under i-normality only.)

The first hypothesis discussed is a simple binary model. This might look as follows:

[Diagram: the simple binary model. Normality ordering (left) and epistemic accessibility (right) over situations in which the scale reads 175.4 and Bjorn’s weight ranges over 170.4–180.4 pounds.]

The second hypothesis discussed goes beyond the simple binary model by introducing ‘levels of normality’. This might look as follows:

[Diagram: the ‘levels of normality’ model over the same range of weights; normality ordering (left) and epistemic accessibility (right).]

Note that it’s not essential to this hypothesis (as we formulated it) that ‘levels’ other than the most normal one contain multiple values of over- and under-estimation; so it might also be elaborated as follows:

[Diagram: an elaboration of the ‘levels of normality’ model on which each level other than the most normal one contains a single value of over- or under-estimation.]

The third hypothesis expands on the ‘levels of normality’ by rejecting comparability between situations involving over- and underestimation. This might look as follows (though again there is nothing in this hypothesis that requires ‘levels’ other than the most normal one to contain multiple values):

[Diagram: the third hypothesis, on which situations involving over-estimation and situations involving under-estimation are incomparable.]

Our preferred hypothesis maintains that abnormality always increases with the distance between displayed and actual value, thus rejecting collapse but maintaining comparability:

[Diagram: our preferred hypothesis, on which abnormality strictly increases with the distance between displayed and actual value.]

A related idea (suggested by Goodman (2013)) would similarly reject collapse while maintaining comparability, but would insist that there is a ‘safe haven’ of (maximal) normality, within which things are equally normal:

[Diagram: the ‘safe haven’ variant, with a region of maximal normality within which situations are equally normal.]

A final option (not discussed in the main text but particularly illustrative of the three-way distinction between k-, km-, and i-normality) combines features of the fourth and fifth proposals to reject both collapse and comparability:

[Diagram: the final option, rejecting both collapse and comparability.]

In footnote 22 we mention that one might also want over- and under-estimation to be only partially comparable. The main effect of this is to slightly rein in some of i-normality’s more sceptical implications. For example, when applied to the final proposal, it could yield something like the following:

[Diagram: the final option modified so that over- and under-estimation are partially comparable.]
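In place of the diagrams, here is a sketch of our preferred hypothesis in code (the margin M, the integer encoding of weights, and the use of i-normality to compute accessibility are illustrative choices of ours):

```python
# Our preferred hypothesis about Bjorn, with the reading fixed at 175.4:
# abnormality strictly increases with the distance between the displayed
# and the actual weight. Weights are encoded in tenths of a pound to
# avoid floating-point boundary effects.

READING, M = 1754, 30                          # 175.4 lb; margin of 3 lb

weights = list(range(1704, 1805, 10))          # 170.4, 171.4, ..., 180.4 lb

def abnormality(w):
    return abs(w - READING)

def suff(w, v):                                # w sufficiently more normal than v
    return abnormality(w) + M <= abnormality(v)

def accessible_i(actual):                      # i-normality, for simplicity
    return [v / 10 for v in weights if not suff(actual, v)]

print(accessible_i(1754))  # [173.4, 174.4, 175.4, 176.4, 177.4]
print(accessible_i(1774))  # a wider interval: the larger the scale's error,
                           # the less Bjorn can rule out
```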

C Belief without Foundation

In section 5 we made the assumption

foundation
If v is evidentially accessible from w, then there is some u that is doxastically accessible from w and at least as normal as v.

where v is doxastically accessible from w just in case v is evidentially accessible from w and not sufficiently less normal than any u evidentially accessible from w. This was for convenience: it allows for a simple definition of what a person e-believes as what is true in all doxastically accessible situations. In the absence of foundation, that definition behaves poorly. Suppose, for example, that foundation fails because the evidential possibilities form an infinite chain of sufficiently increasing normality. Then for every evidential possibility, there will be another that is sufficiently more normal; so no possibility will be doxastically accessible. As a result, the person will e-believe all propositions, including contradictions.

$\cdots \gg w_3 \gg w_2 \gg w_1$

Fortunately, there is a standard way of generalizing this definition of e-belief, which coincides with that definition given foundation and gives sensible verdicts if foundation fails. This revised definition says that a person e-believes p in w just in case there is a set of evidential possibilities in which p is true that (i) contains every evidential possibility that is at least as normal as any of its members, and (ii) is such that every evidential possibility not in it is sufficiently less normal than one of its members.69 When foundation holds, this coincides with the earlier definition, since the set of doxastically accessible situations will then uniquely satisfy these conditions. But when foundation fails, this definition is superior; for example, it does not predict that people will e-believe every proposition in the above example. (Note that, even on this revised definition, it remains true that a situation is doxastically accessible if and only if it is compatible with everything you e-believe.)

Even with this improved definition of e-belief, there are still puzzles arising from failures of foundation. One is that knowledge may fail to imply e-belief on this new definition. Suppose there is an infinite chain $v_1, v_2, \dots$, where each $v_i$ is sufficiently more normal than any $v_j$ preceding it, and also a possibility w incomparable with any member of that chain. Assuming that all of the possibilities are evidentially accessible at w, k-normality and km-normality predict that, at w, you know that you are in w without e-believing that you are. Knowledge only approximately entails e-belief, in the sense that, if you know something, it is entailed by the totality of propositions you e-believe.

$\cdots \gg v_2 \gg v_1$
$w$ (incomparable with every member of the chain)

Here is a case that might illustrate this structure. $v_1, v_2, \dots$ correspond to worlds of ever increasing finite value in which God exists; since it is always strange for God to create a less valuable world than he could have done, each $v_i$ is sufficiently more normal than its predecessors. w is a situation in which God doesn’t exist, which is incomparable with situations in which he does. The prediction is then that, at w, you know, but do not e-believe, that there is no God.

One reaction to this disconnect between knowledge and e-belief would be to revise how we define knowledge in terms of epistemic accessibility; we might say that you know p if and only if it is both true in every epistemically accessible situation and something that you e-believe. (Note that, even on this revised definition, it remains true that a situation is epistemically accessible if and only if it is compatible with everything you know.) One motivation for this response would be that e-belief is necessary for belief and that belief is necessary for knowledge. A different motivation, which does not go by way of judgments about belief, is a desire to validate the principle:

strong inductive limits
p is known only if every evidentially accessible not-p-situation is sufficiently less normal than some evidentially accessible p-situation that is not sufficiently less normal than any evidentially accessible not-p-situation.

69 foundation is similar to the ‘limit assumption’ in discussions of the semantics of conditionals. The above definition adapts Lewis’s “fully neutral” truth conditions for counterfactuals (1981, p. 230); building on Pollock (1976) and Kratzer (1979), it is intended to presuppose neither the limit assumption nor that the relevant relation between worlds obeys an analogue of comparability.

This principle is a mouthful, but here is the basic idea. inductive limits says that an evidential possibility can be ruled out only by another that is sufficiently more normal than it. strong inductive limits essentially adds that ruling out not-p possibilities in this way only suffices for knowing p if the p-possibilities that are doing the ruling out are not themselves ruled out by further not-p possibilities. strong inductive limits implies inductive limits, and is implied by inductive limits given foundation and the original definition of knowledge as truth in all epistemically accessible situations. But strong inductive limits is not implied by the normality framework on this original definition: for example, at w in the above model, you can know that you are not in any of the $v_i$ that have even subscripts. However, it is implied by the normality framework if knowledge is instead defined as being true in all epistemically accessible situations and being e-believed.

Whether we redefine knowledge in this way also makes a difference to the structure of k-belief. partitionality and foundation together entail the following principle:

k-belief consistency
If you k-believe p, you do not k-believe not-p.

But if we stick with the original definition and accept either k-normality or km-normality, this principle can fail when foundation and comparability fail together. Suppose we have two incomparable infinite chains of ever increasing sufficient normality, $v_1, v_2, \dots$ and $u_1, u_2, \dots$, as well as a possibility w which is less normal than any of the other possibilities; and suppose that any of these possibilities is evidentially accessible from any other. Given either k-normality or km-normality, it follows that, if you are in the first chain, you know that you are, and if you are in the second chain, you know that you are. At w, it’s consistent with what you know that you are in either chain, and so you both k-believe that you are in the first chain and k-believe that you are in the second chain (and hence not in the first chain).

$\cdots \gg v_2 \gg v_1 \gg w$
$\cdots \gg u_2 \gg u_1 \gg w$
(with the $v_i$ and the $u_j$ mutually incomparable)

By contrast, if we redefine knowledge so that it requires e-belief, we get no such counterexamples: partitionality now becomes sufficient for k-belief consistency. It will also be sufficient to ensure that k-belief approximates e-belief (in the sense discussed in section 5), and that they coincide given k-normality.

However, partitionality and comparability are no longer enough to ensure the coincidence of e-belief and k-belief in the absence of foundation. This is illustrated by the case depicted below, in which you e-believe that you are in one of the w-situations, even though, assuming margins, you know that you do not know this.

[Diagram: two infinite chains of v-situations and w-situations, arranged so that you e-believe that you are in one of the w-situations while, given margins, knowing that you do not know this.]
