
Probability, Interpretations of1

1 I thank Branden Fitelson for extremely helpful comments.

1. INTRODUCTION

'Interpreting probability' is a commonly used but misleading name for a worthy enterprise. The so-called 'interpretations of probability' would be better called 'analyses of various concepts of probability', and 'interpreting probability' is the task of providing such analyses. Normally, we speak of interpreting a formal system, that is, attaching familiar meanings to the primitive terms in its axioms and theorems, usually with an eye to turning them into true statements. However, there is no single formal system that is 'probability', but rather a host of such systems. To be sure, Kolmogorov's axiomatization, which we will present shortly, has achieved the status of orthodoxy, and it is typically what philosophers have in mind when they think of 'probability theory'. Nevertheless, several of the leading 'interpretations of probability' fail to satisfy all of Kolmogorov's axioms, yet they have not lost their title for that. Moreover, various other quantities that have nothing to do with probability do satisfy Kolmogorov's axioms, and thus are interpretations of it in a strict sense: mass, length, area, volume (each suitably normalized), and indeed anything that falls under the scope of measure theory. Nobody ever seriously considers these to be 'interpretations of probability', however, because they do not play the right role in our conceptual apparatus. Instead, we will be concerned here with various probability-like concepts that purportedly do. Be all that as it may, we will follow common usage and drop the cringing scare quotes in our survey of what philosophers have taken to be the chief interpretations of probability.

Whatever we call it, the project of finding such interpretations is an important one. Probability is virtually ubiquitous. It plays a role in almost every branch of science. It finds its way, moreover, into much of philosophy. In epistemology, the philosophy of mind, and cognitive science, we see states of opinion being modeled by subjective probability functions, and changes of opinion being modeled by the updating of such functions. Since probability theory is central to decision theory and game theory, it has ramifications for various theories in ethics and political philosophy. It figures prominently in such staples of metaphysics as causation and laws of nature. It appears again in the philosophy of science, in the analysis of the confirmation of theories and of scientific explanation, and in the philosophy of specific scientific theories, such as quantum mechanics, statistical mechanics, and genetics. It can even take center stage in the philosophy of logic, the philosophy of language, and the philosophy of religion. Thus, problems in the foundations of probability bear at least indirectly, and sometimes directly, upon central scientific and philosophical concerns. The interpretation of probability is arguably the most important such foundational problem.

2. KOLMOGOROV'S PROBABILITY CALCULUS

Probability theory was inspired by games of chance in 17th century France and inaugurated by the Fermat-Pascal correspondence. However, its axiomatization had to wait until Kolmogorov's classic book (1933). Let Ω be a non-empty set ('the universal set'). A sigma-field (or sigma-algebra) on Ω is a set F of subsets of Ω that has Ω as a

member, and that is closed under complementation (with respect to Ω) and countable union. Let P be a function from F to the real numbers obeying:

1. (Non-negativity) P(A) ≥ 0 for all A ∈ F.
2. (Normalization) P(Ω) = 1.
3. (Finite additivity) P(A ∪ B) = P(A) + P(B) for all A, B ∈ F such that A ∩ B = ∅.

Call P a probability function, and (Ω, F, P) a probability space. We could instead attach probabilities to members of a collection S of sentences of a formal language, closed under truth-functional combinations, with the following counterpart axiomatization:

I. P(A) ≥ 0 for all A ∈ S.
II. If T is a logical truth (in classical logic), then P(T) = 1.
III. P(A ∨ B) = P(A) + P(B) for all A ∈ S and B ∈ S such that A and B are logically incompatible.

It is controversial whether we should strengthen finite additivity, as Kolmogorov does:

3′. (Countable additivity) If A1, A2, A3, … is a countable collection of (pairwise) disjoint sets, each ∈ F, then P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …

The conditional probability of A given B is then taken to be given by the ratio of unconditional probabilities:

P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0.

There are other axiomatizations that give up normalization; that give up countable additivity, and even additivity; that allow probabilities to take infinitesimal values (positive, but smaller than every positive real number); that allow probabilities to be vague (interval-valued, or more generally sets of numerical values); and that take conditional probability to be primitive. For now, however, when we speak of 'the probability calculus', we will mean Kolmogorov's approach. Given certain probabilities as inputs, the axioms and theorems allow us to compute various further probabilities. However, apart from the assignment of 1 to the universal set and 0 to the empty set, they are silent regarding the initial assignment of probabilities. For guidance with that, we need to turn to the interpretations of probability.

3. CRITERIA OF ADEQUACY

First, however, let us list some criteria of adequacy for such interpretations. We begin by following Salmon (19xx, 64):

Admissibility. We say that an interpretation of a formal system is admissible if the meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements. A fundamental requirement for probability concepts is to satisfy the mathematical relations specified by the calculus of probability…

Ascertainability. This criterion requires that there be some method by which, in principle at least, we can ascertain values of probabilities. It merely expresses the fact that a concept of probability will be useless if it is impossible in principle to find out what the probabilities are…

Applicability. The force of this criterion is best expressed in Bishop Butler's famous aphorism, "Probability is the very guide of life."…

It might seem that the criterion of admissibility goes without saying: 'interpretations' of the probability calculus that assigned to P the interpretation 'the number of hairs on the head of' or 'the political persuasion of' would obviously not even be in the running, because they would render the axioms and theorems so obviously false. The word 'interpretation' is often used in such a way that 'admissible interpretation' is a pleonasm. Yet it turns out that the criterion is non-trivial, and indeed if taken seriously would rule out several of the leading interpretations of probability! As we will see, some of them fail to satisfy countable additivity; for others (certain propensity interpretations) the status of at least some of the axioms is unclear; and one of them (unconstrained subjectivism) violates all of the axioms. Nevertheless, we regard them as genuine candidates. It should be remembered, moreover, that Kolmogorov's is just one of many possible axiomatizations, and there is not universal agreement on which is 'best' (whatever that might mean). Thus, there is no such thing as admissibility tout court, but rather admissibility with respect to this or that axiomatization. It would be unfortunate if, perhaps out of an overdeveloped regard for history, one felt obliged to reject any interpretation that did not obey the letter of Kolmogorov's laws, and that was thus 'inadmissible'. Indeed, Salmon's preferred axiomatization differs from Kolmogorov's.2 In any case, if we found an inadmissible interpretation that did a wonderful job of meeting the criteria of ascertainability and applicability, then we should surely embrace it. So let us turn to those criteria.

It is a little unclear in the ascertainability criterion just what "in principle" amounts to, though perhaps some latitude here is all to the good. Understood charitably, and to avoid trivializing it, it presumably excludes omniscience. On the other hand, understanding it in a way acceptable to a strict empiricist or a verificationist may be too restrictive. 'Probability' is apparently, among other things, a modal concept, plausibly outrunning that which actually occurs, let alone that which is actually sensed.

Most of the work will be done by the applicability criterion. We must say more (as Salmon indeed does) about what sort of a guide to life probability is supposed to be. Mass, length, area and volume are all useful concepts, and they are 'guides to life' in various ways (think how critical distance judgments can be to survival); moreover, they are admissible and ascertainable, so presumably it is the applicability criterion that will rule them out. Perhaps it is best to think of applicability as a cluster of criteria, each of which is supposed to capture something of probability's distinctive conceptual roles; moreover, we should not require that all of them be met. They include:

Non-triviality: an interpretation should make non-extreme probabilities at least a conceptual possibility. For example, suppose that we interpret 'P' as the truth function: it assigns the value 1 to all true sentences, and 0 to all false sentences. Then trivially, all the axioms come out true, so this interpretation is admissible. We would hardly count it as an adequate interpretation of probability, however, and so we need to exclude it. It is essential to probability that, at least in principle, it can take intermediate values. All of the interpretations that we will present meet this criterion, so we will discuss it no more.

Applicability to frequencies: an interpretation should render perspicuous the relationship between probabilities and (long-run) frequencies. Among other things, it should make clear why, by and large, more probable events occur more frequently than less probable events.

Applicability to rational belief: an interpretation should clarify the role that probabilities play in constraining the credences of rational agents. Among other things, knowing that one event is more probable than another, a rational agent will be more confident about the occurrence of the former event.

Applicability to ampliative inference: an interpretation will score bonus points if it illuminates the distinction between 'good' and 'bad' ampliative inferences, while explicating why both fall short of deductive inference.

The next criterion may be redundant, given our list so far, but including it will do no harm:

Applicability to science: an interpretation should illuminate paradigmatic uses of probability in science (for example, in quantum mechanics and statistical mechanics).

Perhaps there are further metaphysical desiderata that we might impose on interpretations. For example, there appear to be connections between probability and modality. Events with positive probability can happen, even if they don't. Some authors (e.g. Carnap 19xx, Jackson 19xx) also insist on the converse condition that only events with positive probability can happen, although this is more controversial. Be that as it may, our list is already long enough to help in our assessment of the leading interpretations on the market.

2 It turns out that the axiomatization that Salmon gives (p. 59) is inconsistent, and thus that by his lights no interpretation could be admissible. His axiom A2 states: "If A is a subclass of B, P(A, B) = 1" (read this as 'the probability of B, given A, equals 1'). Let I be the empty class; then for all B, P(I, B) = 1. But his A3 states: "If B and C are mutually exclusive, P(A, B ∪ C) = P(A, B) + P(A, C)." Then for any X, P(I, X ∪ ¬X) = P(I, X) + P(I, ¬X) = 1 + 1 = 2, which contradicts his normalization axiom A1. This problem is easily remedied (simply add the qualification in A2 that A is non-empty), but it is instructive. It suggests that we ought not take the admissibility criterion too seriously. After all, Salmon's subsequent discussion of the merits and demerits of the various interpretations, as judged by the ascertainability and applicability criteria, still stands, and that is where the real interest lies.

4. THE MAIN INTERPRETATIONS

4.1 CLASSICAL PROBABILITY

The classical interpretation owes its name to its early and august pedigree. Championed by Laplace, and found even in the works of Pascal, Bernoulli and Leibniz, it assigns probabilities in the absence of any evidence, or in the presence of symmetrically balanced evidence. The guiding idea is that in such circumstances, probability is shared equally among all the possible outcomes, so that the classical probability of an event is simply the fraction of the total number of possibilities in which the event occurs. It seems especially well suited to games of chance, which by their very design create such circumstances—for example, the classical probability of a fair die landing with an even number showing up is 3/6. It is often presupposed (usually tacitly) in textbook probability puzzles.

Here is a classic statement of the classical interpretation by Laplace (1814):

The theory of chance consists in reducing all the events of the same kind to a certain number of cases equally possible, that is to say, to such as we may be equally undecided about in regard to their existence, and in determining the number of cases favorable to the event whose probability is sought. The ratio of this number to that of all the cases possible is the measure of this probability, which is thus simply a fraction whose numerator is the number of favorable cases and whose denominator is the number of all the cases possible.

There are numerous questions to be asked about this formulation. When are events of the same kind? Intuitively, 'heads' and 'tails' are equally likely outcomes of tossing a fair coin; but if their kind is 'ways the coin could land', then 'edge' should presumably be counted alongside them. Is the "certain number of cases" always finite? Laplace's talk of "the ratio of this number to that of all the cases possible" suggests that it is. What, then, of probabilities in infinite spaces? Apparently, irrational-valued probabilities such as 1/√2 are automatically eliminated, and thus the truth of a theory such as quantum mechanics that posits them is ruled out a priori. Who are "we", who "may be equally undecided"? Different people may be equally undecided about different things (as it might be, Joe is that way regarding the sixth decimal place of pi, while Jo is not). It seems, then, that Laplace's is best regarded as a quasi-subjectivist interpretation, a statement of the probability assignment of a rational agent in an epistemically neutral position. But then the proposal risks sounding empty: for what is it for an agent to be "equally undecided" about a set of cases, other than assigning them equal probability?

This brings us to one of the key objections to Laplace's account. Central to it is the notion of "equally possible" cases. But either this is a category mistake (for 'possibility' does not come in degrees), or it is circular (for what is meant is really 'equally probable'). The notion is finessed by the so-called 'principle of indifference', a coinage due to Keynes. It states that whenever there is no evidence favoring one possibility over another, then they have the same probability.

Enter Bertrand's paradoxes. They all turn on alternative parametrizations of a given problem that are non-linearly related to each other. The following example (adapted from van Fraassen 1989) nicely illustrates how Bertrand-style paradoxes work. A factory produces cubes with side-length between 0 and 1 foot; what is the probability that a randomly chosen cube has side-length between 0 and 1/2 a foot? The tempting answer is 1/2, as we imagine a process of production that is uniformly distributed over side-length. But the question could have been given an equivalent restatement: A factory produces cubes with face-area between 0 and 1 square foot; what is the probability that a randomly chosen cube has face-area between 0 and 1/4 square feet? Now the tempting answer is 1/4, as we imagine a process of production that is uniformly distributed over face-area. And it could have been restated equivalently again: A factory produces cubes with volume between 0 and 1 cubic foot; what is the probability that a randomly chosen cube has volume between 0 and 1/8 cubic feet? Now the tempting answer is 1/8, as we imagine a process of production that is uniformly distributed over volume. What, then, is the probability of the event in question?
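To make the conflict vivid, here is a minimal Python sketch; the three uniform production processes are just the three tempting assumptions described above, and the sampling is merely illustrative.

```python
import random

def prob_side_at_most_half(parametrization, trials=100_000):
    """Estimate P(side-length <= 1/2 foot) when the factory's output is
    uniformly distributed over the given parametrization."""
    hits = 0
    for _ in range(trials):
        u = random.random()     # uniform draw on [0, 1]
        if parametrization == "side":
            side = u            # uniform over side-length
        elif parametrization == "area":
            side = u ** 0.5     # uniform over face-area: side = sqrt(area)
        elif parametrization == "volume":
            side = u ** (1/3)   # uniform over volume: side = volume^(1/3)
        else:
            raise ValueError(parametrization)
        if side <= 0.5:
            hits += 1
    return hits / trials

for p in ("side", "area", "volume"):
    print(p, round(prob_side_at_most_half(p), 2))
# Expected (analytic) values: 1/2, 1/4, 1/8; one and the same event,
# three different 'indifferent' answers.
```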

It is sometimes said that Bertrand-style paradoxes involve different ways of carving up the space of possibilities, or words to that effect. That is incorrect. In the cube example (and in general), the space of possibilities is exactly the same in each case. For example, the very same possibility is variously labeled 'the cube's side-length is 1/3', 'the cube's faces have area 1/9' and 'the cube's volume is 1/27'. By contrast, a genuine case of carving up the space of possibilities in different ways is this: the possible outcomes of a die toss are {1, 2, 3, 4, 5, 6}, or alternatively {1, not-1}. The classical theory is then precariously poised to deliver conflicting values for the probability of the die landing 1, namely 1/6 and 1/2. What the Bertrand-style paradoxes do involve (unlike the die case) are different ways of 'equally weighting' the very same set of possibilities.

It is worth pointing out that the paradoxes always involve infinite—indeed, uncountable—probability spaces. But going back to Laplace's formulation, and our initial critique of it, it is unclear whether the classical theory even applies in such cases. However one enumerates the sets of possibilities in question, it cannot be by straightforward counting. (Presumably it is by integration.) It seems, then, that Bertrand's classic 'refutations' of the classical interpretation—supposedly showing that it is inconsistent—involve exactly those cases where the interpretation, at least as originally formulated, fails to apply. Of course, that only drives home the point that the interpretation is incomplete; but that is a different point. It is best to think of Bertrand's paradoxes as posing difficulty for the principle of indifference, which can be separated from Laplace's particular formulation.

Classical probabilities are only finitely additive (de Finetti 1974). This is not to say that they violate countable additivity, for the two sides of the countable additivity equation, which involve infinite collections of events, are always undefined for Laplacean classical probabilities. It would be more careful to say that classical probabilities fail to satisfy countable additivity—but presumably that still implies that they are not admissible with respect to Kolmogorov's axioms. They are ascertainable, assuming that the space of possibilities can be determined in principle. They bear a relationship to the credences of rational agents; the concern is that the relationship is vacuous, and that rather than constraining the credences of a rational agent in an epistemically neutral position, they merely record them.

Without supplementation, the classical theory makes no contact with frequency information. However the coin happens to land in a sequence of trials, the possible outcomes remain the same. Indeed, even if we have strong empirical evidence that the coin is biased towards heads with probability, say, 0.6, it is hard to see how the unadorned classical theory can accommodate this fact—for what now are the ten possibilities, six of which are favorable to heads? Laplace does supplement the theory with his Rule of Succession: "Thus we find that an event having occurred successively any number of times, the probability that it will happen again the next time is equal to this number increased by unity divided by the same number, increased by two units." That is:

Pr(success on the (N+1)st trial | N consecutive successes) = (N+1)/(N+2).

Thus, inductive learning is possible.
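As a quick numerical illustration, here is a minimal sketch of the Rule of Succession (the run lengths are arbitrary):

```python
from fractions import Fraction

def rule_of_succession(n_successes: int) -> Fraction:
    """Laplace's Rule of Succession: probability of success on the next
    trial, given n consecutive successes (and no failures) so far."""
    return Fraction(n_successes + 1, n_successes + 2)

for n in (0, 1, 10, 100):
    print(n, rule_of_succession(n))  # 1/2, 2/3, 11/12, 101/102
```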
We must ask, however, whether such learning can be captured once and for all by such a simple formula, the same for all domains. We will return to this question when we discuss the logical interpretation below.

Science apparently invokes at various points probabilities that look classical. Bose-Einstein statistics, Fermi-Dirac statistics, and Maxwell-Boltzmann statistics each arise by considering the ways in which particles can be assigned to states, and then applying the principle of indifference to different subdivisions of the set of alternatives. The trouble is that Bose-Einstein statistics apply to some particles (e.g. photons) and not to others, Fermi-Dirac statistics apply to different particles (e.g. electrons), and Maxwell-Boltzmann statistics do not apply to any known particles. None of this can be determined a priori, as the classical interpretation would have it. Moreover, the classical theory purports to yield probability assignments in the face of ignorance. But as Fine (1973) writes: "If we are truly ignorant about a set of alternatives, then we are also ignorant about combinations of alternatives and about subdivisions of alternatives. However, the principle of indifference when applied to alternatives, or their combinations, or their subdivisions, yields different probability assignments" (170). This brings us to one of the chief points of controversy regarding the classical interpretation. Critics accuse the principle of indifference of extracting information from ignorance. Proponents reply that it rather codifies the way in which such ignorance should be epistemically managed—for anything other than an equal assignment of probabilities would represent the possession of some knowledge. Critics counter-reply that in a state of complete ignorance, it is better to assign vague probabilities (perhaps vague over the entire [0, 1] interval), or to eschew the assignment of probabilities altogether.
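To see how the three statistics arise from applying indifference to different case-sets, consider a minimal sketch; the two-particle, three-state setup is an arbitrary illustrative choice.

```python
from itertools import product, combinations_with_replacement, combinations

particles, states = 2, 3

# Maxwell-Boltzmann: distinguishable particles; every assignment of particles
# to states counts as a separate equiprobable case.
mb = list(product(range(states), repeat=particles))

# Bose-Einstein: indistinguishable particles; only occupation numbers matter,
# and any number of particles may share a state.
be = list(combinations_with_replacement(range(states), particles))

# Fermi-Dirac: indistinguishable particles, at most one per state.
fd = list(combinations(range(states), particles))

for name, cases in (("Maxwell-Boltzmann", mb),
                    ("Bose-Einstein", be),
                    ("Fermi-Dirac", fd)):
    both_in_state_0 = sum(1 for c in cases if all(s == 0 for s in c))
    print(f"{name}: {len(cases)} equiprobable cases, "
          f"P(both particles in state 0) = {both_in_state_0}/{len(cases)}")
# 9, 6, and 3 cases respectively; indifference over different case-sets
# yields different probabilities (1/9, 1/6, 0).
```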

4.2 LOGICAL PROBABILITY

Logical theories of probability retain the classical interpretation's guiding idea that probabilities can be determined a priori by an examination of the space of possibilities. However, they generalize it in two important ways: the possibilities may be assigned unequal weights, and probabilities can be computed whatever the evidence may be, symmetrically balanced or not. Indeed, the logical interpretation, in its various guises, seeks to codify in full generality the degree of support or confirmation that a piece of evidence E confers upon a given hypothesis H, which we may write as c(H, E). In doing so, it can be regarded also as generalizing deductive logic and its notion of implication to a complete theory of inference equipped with the notion of 'degree of implication' that relates E to H. It is often called the theory of 'inductive logic', though this is another misnomer: there is no requirement that E be in any sense 'inductive' evidence for H. 'Non-deductive logic' would be a better name, although even that overlooks the fact that deductive logic's relations of implication and incompatibility are also accommodated as extreme cases in which the confirmation function takes the values 1 and 0 respectively. Nevertheless, what is significant is that the logical interpretation provides a framework for induction.

Early proponents of logical probability include Keynes (1921), W. E. Johnson (1932), and Jeffreys (1939). However, by far the most systematic study of logical probability was by Carnap. His formulation of logical probability begins with the construction of a formal language. In (1950) he considers a class of very simple languages consisting of a finite number of logically independent monadic predicates (naming properties) applied to countably many individual constants (naming individuals) or variables, and the usual logical connectives. The strongest (consistent) statements that can be made in a given language describe all of the individuals in as much detail as the expressive power of the language allows. They are conjunctions of complete descriptions of each individual, each description itself a conjunction containing exactly one occurrence (negated or unnegated) of each predicate of the language. Call these strongest statements state descriptions. Any measure m(–) over the state descriptions automatically extends to a measure over all sentences, since each sentence is equivalent to a disjunction of state descriptions; m in turn induces a confirmation function c(–, –):

c(h, e) = m(h & e) / m(e).

There are obviously infinitely many candidates for m, and hence c, even for very simple languages. Carnap argues for his favored measure "m*" by insisting that the only thing that significantly distinguishes individuals from one another is some qualitative difference, not just a difference in labeling. A structure description is a maximal set of state descriptions, each of which can be obtained from another by some permutation of the individual names. m* assigns each structure description equal measure, which in turn is divided equally among its constituent state descriptions. It gives greater weight to homogeneous state descriptions than to heterogeneous ones, thus 'rewarding' uniformity among the individuals in accordance with putatively reasonable inductive practice. It can be shown that the induced c* allows inductive learning from experience—as, annoyingly, do infinitely many other candidate confirmation functions.
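To make m* and c* concrete, here is a minimal sketch for a toy language with two individuals and a single predicate F (an illustrative choice, not one of Carnap's own examples), checking that the induced c* 'learns from experience':

```python
from fractions import Fraction
from itertools import product
from collections import defaultdict

# Toy language (assumed for illustration): individuals a, b and one predicate F.
individuals = ["a", "b"]
state_descriptions = list(product([True, False], repeat=len(individuals)))

# Two state descriptions belong to the same structure description iff one can
# be obtained from the other by permuting individuals; with one predicate,
# that means they have the same number of F-individuals.
structures = defaultdict(list)
for sd in state_descriptions:
    structures[sum(sd)].append(sd)

# m*: equal measure for each structure description, divided equally among
# its constituent state descriptions.
m_star = {}
for members in structures.values():
    for sd in members:
        m_star[sd] = Fraction(1, len(structures)) / len(members)

def m(event):
    """Measure of a sentence, represented as the set of state descriptions
    (possible worlds) in which it is true."""
    return sum(m_star[sd] for sd in event)

def c_star(h, e):
    """Confirmation c*(h, e) = m*(h & e) / m*(e)."""
    return m(h & e) / m(e)

Fa = {sd for sd in state_descriptions if sd[0]}   # 'a is F'
Fb = {sd for sd in state_descriptions if sd[1]}   # 'b is F'
print(m(Fb))           # prior measure of 'b is F': 1/2
print(c_star(Fb, Fa))  # after observing 'a is F': 2/3, i.e. learning from experience
```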
Carnap claims that c* nevertheless stands out for being simple and natural. He later generalizes his confirmation function to a continuum of functions cλ. Define a family of predicates to be a set of predicates such that, for each individual, exactly one member of the set applies, and consider first-order languages containing a finite number of families. Carnap (1963) focuses on the special case of a language containing only one-place predicates. He lays down a host of axioms concerning the confirmation function c, including those induced by the probability calculus itself, various axioms of symmetry (for example, that c(h, e) remains unchanged under permutations of individuals, and of predicates of any family), and axioms that guarantee undogmatic inductive learning, and long-run convergence to relative frequencies. They imply that, for a family {Pj}, j = 1, ..., k, k > 2:

cλ(individual s+1 is Pj, sj of the first s individuals are Pj) = (sj + λ/k) / (s + λ),

where λ is a positive real number. The higher the value of λ, the less impact evidence has: induction from what is observed becomes progressively more swamped by a classical-style equal assignment to each of the k possibilities regarding individual s + 1. (A sketch at the end of this subsection illustrates the effect.) The problem remains: what is the correct setting of λ, or said another way, how 'inductive' should the confirmation function be? Also, it turns out that for any such setting, a universal statement in an infinite universe always receives zero confirmation, no matter what the (finite) evidence. Many find this counterintuitive, since laws of nature with infinitely many instances can apparently be confirmed. Earman (1992) discusses prospects for avoiding the unwelcome result.

Significantly, Carnap's various axioms of symmetry are hardly logical truths. More seriously, we cannot impose further symmetry constraints that are seemingly just as plausible as Carnap's, on pain of inconsistency—see Fine (1973, p. 202). Goodman taught us: that the future will resemble the past in some respect is trivial; that it will resemble the past in all respects is contradictory. And we may continue: that a probability assignment can be made to respect some symmetry is trivial; that one can be made to respect all symmetries is contradictory. This threatens the whole program of logical probability. Another Goodmanian lesson is that inductive logic must be sensitive to the meanings of predicates, strongly suggesting that a purely syntactic approach such as Carnap's is doomed. Scott and Krauss (1966) use model theory in their formulation of logical probability for richer and more realistic languages than Carnap's. Still, finding a canonical language seems to many to be a pipe dream, at least if we want to analyze the "logical probability" of any argument of real interest—either in science, or in everyday life.

Logical probabilities are admissible, and, assuming a rich enough language, ascertainable. They fail on the applicability criteria in very much the same way that the classical theory did, and for very much the same reasons. In connection with the 'applicability to science' criterion, a further point, due to Lakatos, is worth noting. The degree of confirmation of a hypothesis depends on the language in which the hypothesis is stated and over which the confirmation function is defined.
But scientific progress often brings with it a change in scientific language (for example, the addition of new predicates and the deletion of old ones), and such a change will bring with it a change in the corresponding c-values. Thus, the growth of science may overthrow any particular confirmation theory. There is something of the snake eating its own tail here, since logical probability was supposed to explicate the confirmation of scientific theories.
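As promised above, here is a minimal sketch of the λ-continuum formula, with arbitrary evidence numbers, showing how larger λ pulls the value away from the observed frequency and toward the classical-style equal assignment:

```python
from fractions import Fraction

def c_lambda(s_j: int, s: int, k: int, lam: Fraction) -> Fraction:
    """Carnap's lambda-continuum: confirmation that individual s+1 has
    attribute P_j, given that s_j of the first s individuals have P_j,
    for a family of k attributes."""
    return (s_j + lam / k) / (s + lam)

# Illustrative evidence: 8 of the first 10 individuals are P_j, with k = 2.
# lambda = 0 is included only as the limiting 'straight rule' case.
for lam in (Fraction(0), Fraction(2), Fraction(10), Fraction(100)):
    print(lam, c_lambda(8, 10, 2, lam))
# 4/5, 3/4, 13/20, 29/55: as lambda grows, the value is pulled from the
# observed frequency toward the equal assignment 1/k = 1/2.
```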

4.3 SUBJECTIVE PROBABILITY

We may characterize subjectivism (also known as personalism and subjective Bayesianism) with the slogan: 'Probability is degree of belief'. We identify probabilities with degrees of confidence, sometimes known as credences, of suitable agents. Thus, we really have many interpretations of probability here, as many as there are doxastic states of suitable agents: we have Aaron's degrees of belief, Abel's degrees of belief, Abigail's degrees of belief, …, or better still, Aaron's degrees of belief-at-time-t1, Aaron's degrees of belief-at-time-t2, Abel's degrees of belief-at-time-t1, … Of course, we must ask what makes an agent 'suitable'. Unconstrained subjectivism places no constraints on the agents—anyone goes, and hence anything goes. Various studies by psychologists (e.g. Kahneman and Tversky 19xx) show that people commonly violate the usual probability calculus in spectacular ways. We clearly do not have here an admissible interpretation (with respect to any probability calculus), since there is no limit to what agents might assign—negative probabilities, infinite probabilities, probabilities that are not additive, and so on. Indeed, one wonders what would make them assignments of probabilities, as opposed to just utterances of numbers. More interesting, however, is the claim that the suitable agents must be, in a strong sense, rational. Various subjectivists want to assimilate probability to logic, regarding probability as the logic of partial belief. A rational agent is required to be logically consistent, now taken in a broad sense. These subjectivists argue that this implies that the agent obeys the axioms of probability (with at least finite additivity), and that subjectivism is thus (to this extent) admissible. But before we can present this argument, we must say more about what degrees of belief are.

The betting interpretation

Subjective probabilities, in turn, are traditionally analyzed in terms of betting behavior. Here is a classic statement by de Finetti (19xx):

Let us suppose that an individual is obliged to evaluate the rate p at which he would be ready to exchange the possession of an arbitrary sum S (positive or negative) dependent on the occurrence of a given event E, for the possession of the sum pS; we will say by definition that this number p is the measure of the degree of probability attributed by the individual considered to the event E, or, more simply, that p is the probability of E (according to the individual considered; this specification can be implicit if there is no ambiguity). (1980, p. 62)

This boils down to the following analysis: Your degree of belief in E is p iff p units is the price at which you would buy or sell a bet that pays 1 unit if E, 0 if not E.
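On the usual expected-value reading of this analysis, p is the price at which such a bet has zero expected net gain for an agent whose degree of belief in E is p; here is a minimal sketch with illustrative numbers:

```python
def expected_gain(degree_of_belief: float, price: float, stake: float = 1.0) -> float:
    """Expected net gain from buying, at the given price, a bet that pays
    `stake` if E occurs and 0 otherwise, for an agent whose degree of
    belief in E is `degree_of_belief`."""
    return degree_of_belief * (stake - price) + (1 - degree_of_belief) * (0 - price)

p = 0.7  # illustrative degree of belief in E
for price in (0.6, 0.7, 0.8):
    print(price, round(expected_gain(p, price), 2))
# Positive below p, zero at p, negative above p: p is the agent's fair price.
```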

De Finetti presupposes that, for any E, there is exactly one such price. This presupposition may fail. There may be no such price—you may refuse to bet on E at all (perhaps unless coerced, in which case your genuine opinion about E may not be revealed), or your selling price may differ from your buying price, as may occur if your probability for E is vague. There may be more than one such price—you may find a range of such prices acceptable, as may also occur if your probability for E is vague. For now, however, let us waive these concerns, and turn to an argument that uses the betting interpretation purportedly to show that rational degrees of belief must conform to the probability calculus (with at least finite additivity).

The Dutch Book argument

A Dutch book is a series of bets, each of which the agent regards as fair, but which collectively guarantee her loss, however the world turns out. De Finetti (1937) proves that if your subjective probabilities violate the probability calculus, then you are susceptible to a Dutch book. For example, suppose that you violate the additivity axiom by assigning P(A ∨ B) < P(A) + P(B), where A and B are mutually exclusive. Then a bookie could buy from you a bet on A ∨ B for P(A ∨ B), and sell you bets on A and B individually for P(A) and P(B) respectively. He pockets an initial profit of P(A) + P(B) − P(A ∨ B), and retains it whatever happens. Equally important, and often neglected, is the converse theorem: if your subjective probabilities conform to the probability calculus, then no Dutch book can be made against you (Kemeny 1955); your probability assignments are then said to be coherent. In a nutshell, conformity to the probability calculus is necessary and sufficient for coherence.3
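Here is a minimal numeric sketch of the exploit just described, with illustrative incoherent prices for mutually exclusive A and B:

```python
# Illustrative incoherent prices for mutually exclusive A and B:
p_A, p_B, p_AorB = 0.3, 0.3, 0.5   # violates P(A or B) = P(A) + P(B)

# The bookie buys the agent's bet on (A or B) at her price, and sells her
# bets on A and on B at her prices. Each bet pays 1 unit if it wins.
def bookie_profit(a_occurs: bool, b_occurs: bool) -> float:
    payout_on_AorB = 1.0 if (a_occurs or b_occurs) else 0.0
    payout_on_A = 1.0 if a_occurs else 0.0
    payout_on_B = 1.0 if b_occurs else 0.0
    # Bookie receives p_A + p_B up front and the (A or B) payout he bought;
    # he pays p_AorB up front and pays out on the A and B bets he sold.
    return (p_A + p_B - p_AorB) + payout_on_AorB - payout_on_A - payout_on_B

for world in [(False, False), (True, False), (False, True)]:  # A, B exclusive
    print(world, round(bookie_profit(*world), 2))
# Profit is 0.1 in every possible world: a guaranteed loss for the agent.
```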

3 Some authors simply define 'coherence' as conformity to the probability calculus.

Problems with the betting interpretation

But let us return to the betting interpretation. We have here an operational definition of subjective probability, and indeed it inherits some of the difficulties of operationalism in general, and of behaviorism in particular. For example, you may have reason to misrepresent your true opinion, or to feign having opinions that in fact you lack, by making the relevant bets (perhaps to exploit an incoherence in someone else's betting prices). Moreover, as Ramsey (19xx) points out, placing the very bet may alter your state of opinion. Trivially, it does so regarding matters involving the bet itself (e.g., you suddenly increase your probability that you have just placed a bet). Less trivially, placing the bet may change the world, and hence your opinions, in other ways (betting at high stakes on the proposition 'I will sleep well tonight' may suddenly turn you into an insomniac). And then the bet may concern an event such that, were it to occur, you would no longer value the pay-off the same way. (During the August 11, 1999 solar eclipse in the UK, a man placed a bet that would have paid a million pounds if the world came to an end.) These problems stem largely from taking literally the notion of entering into a bet on E, with its corresponding payoffs; yet as it stands, it is unclear how else we are supposed to understand the analysis. The problems are avoided by identifying your degree of belief in a proposition with the betting price you regard as fair, whether or not you enter into such a bet; see Howson and Urbach (19xx). Still, the fair price of a bet on E appears to measure the wrong quantity: not your probability that E will be the case, but rather your probability that E will be the case and that the prize will be paid, which may be rather less—for example, if E is unverifiable. Weatherson (19xx) argues that this commits proponents of the betting interpretation to an underlying intuitionistic (as opposed to classical) logic. This in turn raises the question: if subjectivism is supposed to be an interpretation of the probability calculus, how exactly should the sentential version of the normalization axiom, with its reference to 'logical truth', be understood? Likewise, what exactly is the additivity axiom, with its reference to 'logical incompatibility', that is supposedly interpreted by the subjectivist? Absent answers to these questions, it is premature to claim that subjectivism is an admissible interpretation of Kolmogorov's axioms (understood sententially).

De Finetti speaks of "an arbitrary sum" as the prize of the bet on E. It is just as well that he does not speak of dollar or pound or lire amounts, as some subjectivists do, as this would hold the analysis—again, taken literally—hostage (trivially) to the existence of these monetary systems, and (more interestingly) to their corresponding 'graininess'. Dollars, for instance, can be divided no more finely than units of 1/100, so any probability measurement using a dollar as prize would be imprecise beyond the second decimal place, and propositions that ought to receive different probabilities would wind up getting the same (e.g., a logical contradiction and 'a fair coin lands heads 8 times in a row'). Still, there are difficulties with de Finetti's "arbitrary sums". Whatever they are, they had better be infinitely divisible, or else the problem of imprecision will still arise.
More significantly, if utility is not a linear function of such sums, then the size of the prize will make a difference to the putative probability: winning a dollar means more to a pauper than it does to Bill Gates, and this may be reflected in their betting behaviors in ways that have nothing to do with their genuine probability assignments.

De Finetti responds to this problem by suggesting that the prizes be kept small; that, however, only creates the opposite problem that agents may be reluctant to bother about trifles, as Ramsey points out. Better, then, to let the prizes be measured in utilities: after all, utility is infinitely divisible, and utility is a linear function of utility.

Probabilities and utilities

Utilities (desirabilities) of outcomes, their probabilities, and rational preferences are all intimately linked. The Port Royal Logic showed how utilities and probabilities together determine rational preferences; de Finetti's betting interpretation derives probabilities from utilities and rational preferences; von Neumann and Morgenstern (1944) derive utilities from probabilities and rational preferences. And most remarkably, Ramsey (1926) derives both probabilities and utilities from rational preferences alone. First, he defines a proposition to be ethically neutral—relative to an agent and an outcome—if the agent is indifferent between having that outcome when the proposition is true and when it is false. The idea is that the agent doesn't care about the ethically neutral proposition as such—it is a means to an end that he might care about, but it has no intrinsic value. Now, there is a simple test for determining whether, for a given agent, an ethically neutral proposition N has probability 1/2. Suppose that the agent prefers A to B. Then N has probability 1/2 iff the agent is indifferent between the gambles:

A if N, B if not;
B if N, A if not.

Ramsey assumes that it does not matter what the candidates for A and B are. We may assign arbitrarily to A and B any two real numbers u(A) and u(B) such that u(A) > u(B), thought of as the desirabilities of A and B respectively. Given various assumptions about the richness of the preference space, and certain 'consistency assumptions', he can define a real-valued utility function of the outcomes A, B, etc.—in fact, various such functions will represent the agent's preferences. He is then able to define equality of differences in utility for any outcomes over which the agent has preferences. It turns out that ratios of utility-differences are invariant—the same whichever representative utility function we choose. This fact allows Ramsey to define degrees of belief as ratios of such differences. For example, suppose the agent is indifferent between A and the gamble "B if X, C otherwise". Then her degree of belief in X, P(X), is given by:

P(X) = (u(A) − u(C)) / (u(B) − u(C)).

(A numerical illustration appears at the end of this subsection.) Ramsey shows that degrees of belief so derived obey the probability calculus (with finite additivity). He calls what results "the logic of partial belief", and indeed he opens his essay with the words "In this essay the Theory of Probability is taken as a branch of logic...". Ramsey avoids some of the objections to the betting interpretation, but not all of them. Notably, the essential appeal to gambles again raises the concern that the wrong quantities are being measured. And his account has new difficulties. It is unclear what facts about agents fix their preference rankings. It is also dubious that consistency requires one to have a set of preferences as rich as Ramsey requires, or that one can find ethically neutral propositions of probability 1/2. This places strain on Ramsey's claim to assimilate probability theory to logic.

Savage (1954) likewise derives probabilities and utilities from preferences among options that are constrained by certain putative 'consistency' principles. For a given set of such preferences, he generates a class of utility functions, each an affine transformation of the other, and a unique probability function; together these are said to 'represent' the agent's preferences. Jeffrey (1966) refines the method further. The result is a theory of decision according to which rational choice maximizes 'expected utility', a certain probability-weighted average of utilities. Some of the difficulties with the behavioristic betting analysis of degrees of belief can now be resolved by moving to an analysis of degrees of belief that is functionalist in spirit. According to Lewis (1986a, 1994), an agent's degrees of belief are represented by the probability function belonging to a utility function/probability function pair that best rationalizes her behavioral dispositions, rationality being given a decision-theoretic analysis. It is a striking fact that agents with preferences that satisfy such-and-such conditions can be represented by a utility/probability function pair. However, this falls short of the claim that they must be so represented. Said another way, the fact that there exists a probabilistic representation of a suitable agent does not preclude there also existing non-probabilistic representations of the very same agent. Indeed, Zynda (2001) shows that this is the case. The question thus arises: why should the probabilistic representation be favored over the non-probabilistic?

There is another deep issue that underlies all of these accounts of subjective probability. They all presuppose the existence of necessary connections between desire-like states and belief-like states, rendered explicit in the connections between preferences and probabilities. In response, one might insist that such connections are at best contingent, and indeed can easily be imagined to be absent. Think of an idealized Zen Buddhist monk, devoid of any preferences, who dispassionately surveys the world before him, forming beliefs but no desires. Once desires enter the picture, they may also have unwanted consequences. For example, how does one separate an agent's enjoyment or disdain for gambling from the value she places on the gamble itself? As Ramsey puts it, "The difficulty is like that of separating two different co-operating forces" (xx). One might 'overvalue' a gamble because it provides insurance against an unwanted outcome, or because of its desirable consequences.
(The manager of a basketball team might secretly bet against his team at an inflated price for the first reason, or, in a public show of bravado, on his team at an inflated price for the second reason.) There is no telling how many forces might enter into the overall desirability of a gamble, only one of which is the probabilistic force that we seek.

The betting interpretation apparently makes subjective probabilities ascertainable. It is unclear that the derivation of them from preferences does—for it is unclear that an agent's full set of preferences is ascertainable (even to himself or herself!). Here a lot of weight may need to be placed on the 'in principle' qualification in that criterion. The expected utility representation makes it virtually analytic that an agent should be guided by probabilities—after all, the probabilities are her own, and they are fed into the formula for expected utility in order to determine what it is rational for her to do. But do they function as a good guide to life? Subjectivists in the style of de Finetti recognize no rational constraints on subjective probabilities beyond conformity to the probability calculus (and a certain rule for how probabilities change under the impact of new evidence known as 'conditionalizing'). This permissiveness licenses doxastic states that we would normally call crazy. Thus, you could assign probability 0.999 to this sentence ruling the universe, while upholding subjectivism—provided, of course, that you assign probability 0.001 to this sentence not ruling the universe, and that your other probability assignments all conform to the probability calculus. Such probabilistic coherence plays much the same role for degrees of belief that consistency plays for ordinary, all-or-nothing beliefs. What the subjectivist lacks is an analogue of truth, some yardstick for distinguishing the 'veridical' probability assignments from the rest (such as the 0.999 one above). To the extent that truth is an indispensable guide to life, it seems that the subjectivist needs something more.
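As promised above, here is a minimal numeric sketch of Ramsey's ratio definition of degree of belief, with arbitrary utility numbers:

```python
def ramsey_degree_of_belief(u_A: float, u_B: float, u_C: float) -> float:
    """Degree of belief in X for an agent who is indifferent between
    getting A for sure and the gamble 'B if X, C otherwise'."""
    return (u_A - u_C) / (u_B - u_C)

# Arbitrary utilities on some representative scale: u(C)=0, u(B)=10, u(A)=7.
p = ramsey_degree_of_belief(u_A=7.0, u_B=10.0, u_C=0.0)
print(p)  # 0.7

# Sanity check: with this degree of belief, the gamble's expected utility
# equals u(A), which is exactly what the agent's indifference requires.
print(p * 10.0 + (1 - p) * 0.0)  # 7.0

# Ratios of utility-differences are invariant under positive affine rescaling
# (here u' = 2u + 5), so the derived degree of belief does not depend on
# which representative utility function is chosen.
print(ramsey_degree_of_belief(2*7.0 + 5, 2*10.0 + 5, 2*0.0 + 5))  # 0.7
```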

4.4 FREQUENCY INTERPRETATIONS

Gamblers, actuaries and scientists have long understood that relative frequencies bear an intimate relationship to probabilities. Frequency interpretations posit the most intimate relationship of all: identity. Thus, we might identify the probability of 'heads' on a certain coin with the number of heads in a suitable sequence of tosses of the coin, divided by the total number of tosses. A simple version of frequentism, which we will call finite frequentism, attaches probabilities to events or attributes in a finite reference class in such a straightforward manner: the probability of an attribute A in a finite reference class B is the relative frequency of actual occurrences of A within B. Thus, finite frequentism bears certain structural similarities to the classical interpretation, insofar as it gives equal weight to each of a set of events, simply counting how many of them are 'favorable' as a proportion of the total. The crucial difference, however, is that where the classical interpretation counted all the possible outcomes of a given experiment, finite frequentism counts actual outcomes. It is thus congenial to those with empiricist scruples. It was developed by Venn (1876), who, in his discussion of the proportion of births of males and females, concludes: "probability is nothing but that proportion" (p. 84, his emphasis). Other notable finite frequentists include Russell (19xx) and Braithwaite (19xx).

Finite frequentism gives another operational definition of probability, and its problems begin there. For example, just as we want to allow that our thermometers could be ill-calibrated, and could thus give misleading measurements of temperature, so we want to allow that our 'measurements' of probabilities via frequencies could be misleading, as when a fair coin lands heads 9 out of 10 times. More than that, it seems to be built into the very notion of probability that such misleading results can arise. Indeed, in many cases, misleading results are guaranteed. Starting with a degenerate case: according to the finite frequentist, a coin that is never tossed, and that thus yields no actual outcomes whatsoever, lacks a probability for heads altogether; yet a coin that is never measured does not thereby lack a diameter. Perhaps even more troubling, a coin that is tossed exactly once yields a relative frequency of heads of either 0 or 1, whatever its bias. Famous enough to merit a name of its own, this is the so-called 'problem of the single case'. In fact, many events are most naturally regarded as not merely unrepeated, but in a strong sense unrepeatable—the 2000 presidential election, the final game of the 2001 NBA play-offs, the Civil War, Kennedy's assassination, certain events in the very early history of the universe. Nonetheless, it seems natural to think of non-extreme probabilities attaching to some, and perhaps all, of them. Worse still, some cosmologists regard it as a genuinely chancy matter whether our universe is open or closed (apparently certain quantum fluctuations could, in principle, tip it one way or the other), yet whatever it is, it is 'single-case' in the strongest possible sense. The problem of the single case is particularly striking, but we really have a sequence of related problems: 'the problem of the double case', 'the problem of the triple case' ...
Every coin that is tossed exactly twice can yield only the relative frequencies 0, 1/2 and 1, whatever its bias… A finite reference class of size n, however large n is, can only produce relative frequencies at a certain level of 'grain', namely 1/n. Among other things, this rules out irrational probabilities; yet our best physical theories say otherwise. Furthermore, there is a sense in which any of these problems can be transformed into the problem of the single case. Suppose that we toss a coin a thousand times. We can regard this as a single trial of a thousand-tosses-of-the-coin experiment. Yet we do not want to be committed to saying that that experiment yields its actual result with probability 1.

The problem of the single case is that the finite frequentist fails to see intermediate probabilities in various places where others are inclined to see them. There is also the converse problem: the frequentist sees intermediate probabilities in various places where others are inclined not to. Our world has myriad different entities, with myriad different attributes. We can group them into still more sets of objects, and then ask with which relative frequencies various attributes occur in these sets. Many such relative frequencies will be intermediate; the finite frequentist automatically identifies them with intermediate probabilities. But it would seem that whether or not they are genuine probabilities, as opposed to mere tallies, depends on the case at hand. Bare ratios of attributes among sets of disparate objects may lack the sort of modal force that one might expect from probabilities. I belong to the reference class consisting of myself, the Eiffel Tower, the southernmost sandcastle on Santa Monica Beach, and Mt Everest. Two of these four objects are less than 7 ft tall, a relative frequency of 1/2; moreover, we could easily extend this class, preserving this relative frequency (or, equally easily, not). Yet it would be odd to say that my probability of being less than 7 ft tall, relative to this reference class, is 1/2, even though it is perfectly acceptable (if uninteresting) to say that 1/2 of the objects in the reference class are less than 7 ft tall.

Frequentists (notably Venn, and Reichenbach 19xx among others), partly in response to some of the problems above, have gone on to consider infinite reference classes, identifying probabilities with limiting relative frequencies of events or attributes therein. Thus, we require an infinite sequence of trials in order to define such probabilities. But what if the actual world does not provide an infinite sequence of trials of a given experiment? In that case, some frequentists (e.g., Venn, Reichenbach) identify probability with a hypothetical or counterfactual limiting relative frequency. We are to imagine hypothetical infinite extensions of an actual sequence of trials; probabilities are then what the limiting relative frequencies would be if the sequence were so extended. Note that at this point we have left empiricism behind. A modal element has been injected into frequentism with this invocation of a counterfactual; moreover, the counterfactual may involve a radical departure from the way things actually are, one that may even require the breaking of laws of nature. (Think what it would take for the coin in my pocket,

which has only been tossed once, to be tossed infinitely many times—never wearing out, and never running short of people willing to toss it!)

Limiting relative frequencies, we have seen, must be relativized to a sequence of trials. Herein lies another difficulty. Consider an infinite sequence of the results of tossing a coin, as it might be H, T, H, H, H, T, H, T, T, … Suppose for definiteness that the corresponding relative frequency sequence for heads, which begins 1/1, 1/2, 2/3, 3/4, 4/5, 4/6, 5/7, 5/8, 5/9, …, converges to 1/2. By suitably reordering these results, we can make the sequence converge to any value in [0, 1] that we like. (If this is not obvious, consider how the relative frequency of even numbers among positive integers, which intuitively 'should' converge to 1/2, can instead be made to converge to 1/4 by reordering the integers with the even numbers in every fourth place, as follows: 1, 3, 5, 2, 7, 9, 11, 4, 13, 15, 17, 6, …; a sketch at the end of this subsection computes exactly this.) To be sure, there may be something natural about the ordering of the tosses as given—for example, it may be their temporal ordering. But there may be more than one natural ordering. Imagine the tosses taking place on a train that shunts backwards and forwards on tracks that are oriented west-east. Then the spatial ordering of the results from west to east could look very different. Why should one ordering be privileged over others?

A well-known objection to any version of frequentism is that relative frequencies must be relativized to a reference class. Consider a probability concerning myself that I care about—say, my probability of living to age 80. I belong to the class of males, the class of non-smokers, the class of philosophy professors who have two vowels in their surname, … Presumably the relative frequency of those who live to age 80 varies across (most of) these reference classes. What, then, is my probability of living to age 80? It seems that there is no single frequentist answer. Instead, there is my probability-qua-male, my probability-qua-non-smoker, and so on. This is an example of the so-called reference class problem for frequentism (although it can be argued that analogues of the problem arise for the other interpretations as well). As we have seen in the previous paragraph, the problem is only compounded for limiting relative frequencies: probabilities must be relativized not merely to a reference class, but to a sequence within the reference class. We might call this the reference sequence problem.

The beginnings of a solution to this problem would be to restrict our attention to sequences of a certain kind, those with certain desirable properties. For example, there are sequences for which the limiting relative frequency of a given attribute does not exist; Reichenbach thus excludes such sequences. Von Mises gives us a more thoroughgoing restriction to what he calls collectives—hypothetical infinite sequences of attributes (possible outcomes) of specified experiments that meet certain requirements. Call a place-selection an effectively specifiable method of selecting indices of members of the sequence, such that the selection or not of the index i depends at most on the first i − 1 attributes. The axioms are:

Axiom of Convergence: the limiting relative frequency of any attribute exists.

Axiom of Randomness: the limiting relative frequency of each attribute in a collective ω is the same in any infinite subsequence of ω which is determined by a place selection.

[The axiom of randomness implies the axiom of convergence.]
The probability of an attribute A, relative to a collective ω, is then defined as the limiting relative frequency of A in ω. Note that a constant sequence such as H, H, H, …, in which the limiting relative frequency is the same in any infinite subsequence, trivially satisfies the axiom of randomness—a perhaps unwelcome result. Collectives are abstract mathematical objects that are not empirically instantiated, but that are nonetheless posited by von Mises to explain the stabilities of relative frequencies in the behavior of actual sequences of outcomes of a repeatable random experiment. Church (1940) renders precise the notion of a place selection as a recursive function. Nevertheless, the reference sequence problem remains: probabilities must always be relativized to a collective, and for a given attribute such as 'heads' there are infinitely many. Von Mises embraces this consequence, insisting that the notion of probability only makes sense relative to a collective. In particular, he regards single-case probabilities as "nonsense". Some critics believe that rather than solving the problem of the single case, this merely ignores it.

Let us take stock. Finite relative frequencies only arise where there are finite sequences of trials. They are, of course, finitely additive; however, they fail to satisfy countable additivity, since neither side of the equation is defined (another point of analogy to classical probabilities). Limiting relative frequencies violate countable additivity (Birkhoff 1940, Ch. XII, §6; de Finetti 1972, §5.22). Indeed, the domain of definition of limiting relative frequency is not even a field (de Finetti 1972, §5.8). So relative frequencies do not provide an admissible interpretation of Kolmogorov's axioms. Finite frequentism has no trouble meeting the ascertainability criterion, as finite relative frequencies are in principle easily determined. The same cannot be said of limiting relative frequencies. On the contrary, any finite sequence of trials (which, after all, is all we ever see) puts literally no constraint on the limit of an infinite sequence; still less does an actual finite sequence put any constraint on the limit of an infinite hypothetical sequence, however fast and loose we play with the notion of 'in principle' in the ascertainability criterion.

Offhand, it might seem that the frequentist interpretations resoundingly meet the applicability to frequencies criterion. Finite frequentism meets it all too well, while limiting relative frequentism meets it in the wrong way. If anything, finite frequentism makes the connection between probabilities and frequencies too tight, as we have already observed. A fair coin that is tossed a million times is very unlikely to land heads exactly half the time; one that is tossed a million and one times is even less likely to do so! Limiting relative frequentism fails to connect probabilities with finite frequencies. It connects them with limiting relative frequencies, of course, but again too tightly: for even in infinite sequences, the two can come apart. (A fair coin could land heads forever, even if it is highly unlikely to do so.) To be sure, science has much interest in finite frequencies, and indeed working with them is much of the business of statistics. Whether it is interested in highly idealized, hypothetical extensions of actual sequences, and relative frequencies therein, is another matter.
The applicability to rational opinion criterion goes much the same way: it is clear that rational opinion is guided by finite frequency information, unclear that it is guided by information about the limits of hypothetical infinite sequences.

4.5 PROPENSITY INTERPRETATIONS

Like the frequency interpretations, propensity interpretations locate probability 'in the world' rather than in our heads or in logical abstractions. Probability is thought of as a physical propensity, or disposition, or tendency of a given type of physical situation to yield an outcome of a certain kind, or to yield a long-run relative frequency of such an outcome. This view was motivated by the desire to make sense of single-case probabilities, such as 'the probability that this radium atom decays in 1600 years is 1/2'. Indeed, Popper (1957) advocates propensity as an account of such quantum mechanical probabilities.

Popper develops the theory further in (1959a). For him, a probability p of an outcome of a certain type is a propensity of a repeatable experiment to produce outcomes of that type with limiting relative frequency p. For instance, when we say that a coin has probability 1/2 of landing heads when tossed, we mean that we have a repeatable experimental set-up—the tossing set-up—which has a propensity to produce a sequence of outcomes in which the limiting relative frequency of heads is 1/2 (a toy simulation of this picture is sketched at the end of this passage). According to some critics, with its heavy reliance on limiting relative frequency, this position risks collapsing into von Mises-style frequentism.

Giere (1973), on the other hand, explicitly allows single-case propensities, with no mention of frequencies: the probability of an outcome on a given occasion is just the propensity of that particular experimental set-up to produce that outcome on that occasion. This, however, creates the opposite problem to Popper's: how, then, do we get the desired connection between probabilities and frequencies?

It is thus useful to follow Gillies (2000) in distinguishing long-run propensity theories and single-case propensity theories: "A long-run propensity theory is one in which propensities are associated with repeatable conditions, and are regarded as propensities to produce in a long series of repetitions of these conditions frequencies which are approximately equal to the probabilities. A single-case propensity theory is one in which propensities are regarded as propensities to produce a particular result on a specific occasion." (822). Gillies offers a long-run (though not infinitely long-run) propensity theory; Miller (19xx) and Fetzer (19xx) offer single-case propensity theories.

It seems that those theories that tie propensities to frequencies do not provide an admissible interpretation of the probability calculus, for the same reasons that relative frequencies do not. It is prima facie unclear whether single-case propensity theories obey the probability calculus or not. To be sure, one can stipulate that they do so, perhaps using that stipulation as part of the implicit definition of propensities. Still, it remains to be shown that there really are such things—stipulating what a witch is does not suffice to show that witches exist. Indeed, to claim, as Popper does, that an experimental arrangement has a tendency to produce a given limiting relative frequency of a particular outcome, presupposes a kind of stability or uniformity in the workings of that arrangement (for the limit would not exist in a suitably unstable arrangement). But this is the sort of 'uniformity of nature' presupposition that Hume argued could not be known either a priori or empirically.
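The long-run picture lends itself to a simple simulation. The Python sketch below assumes (and this is precisely the sort of assumption at issue in the surrounding discussion) that the tossing set-up behaves like independent trials with a fixed chance of heads; on that assumption the running relative frequency settles down near that chance. The function and parameter names are illustrative only.

```python
# Toy simulation of the long-run propensity picture, assuming the set-up
# behaves like independent tosses with a fixed chance of heads (here 0.5).
# The stabilization of the running relative frequency is what the limit
# theorems discussed next are about.
import random

def running_frequency_of_heads(n_tosses, chance_of_heads=0.5, seed=0):
    """Relative frequency of heads after each toss, as a list."""
    rng = random.Random(seed)
    heads, freqs = 0, []
    for k in range(1, n_tosses + 1):
        heads += rng.random() < chance_of_heads
        freqs.append(heads / k)
    return freqs

freqs = running_frequency_of_heads(100_000)
for k in (10, 100, 1_000, 10_000, 100_000):
    print(k, round(freqs[k - 1], 4))   # drifts toward 0.5 as k grows
```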
Now, appeals can be made to limit theorems—so-called 'laws of large numbers'—whose content is roughly that under suitable conditions, such limiting relative frequencies almost certainly exist, and equal the single-case propensities. Still, these theorems make assumptions (e.g., that the trials are independent and identically distributed) whose truth again cannot be known, and must merely be postulated.

Part of the problem here, say critics, is that we do not know enough about what propensities are to adjudicate these issues. There is some property of this coin-tossing arrangement such that this coin would land heads with a certain limiting frequency, say, or with a certain long-run frequency (approximately). But as Hitchcock (forthcoming) points out, "calling this property a 'propensity' of a certain strength does little to indicate just what this property is." Said another way, propensity accounts are accused of giving empty accounts of probability, à la Molière's 'dormitive virtue'. Similarly, Gillies objects to single-case propensities on the grounds that statements about them are untestable, and that they are "metaphysical rather than scientific" (825). Some might level the same charge even against long-run propensities, which are supposedly distinct from the testable relative frequencies. This suggests that the propensity account has difficulty meeting the applicability to science criterion.

Some propensity theorists (e.g. Giere) liken propensities to physical magnitudes such as electrical charge that are the province of science. But Hitchcock observes that the analogy is misleading. We can only determine the general properties of charge—that it comes in two varieties, that like charges repel, and so on—by empirical investigation. What investigation, however, could tell us whether or not propensities are non-negative, normalized and additive?

More promising, perhaps, is the idea that propensities are to play certain theoretical roles, and that these roles place constraints on the way propensities must behave, and hence on what they could be (in the style of the Ramsey/Lewis/'Canberra plan' approach to theoretical terms). The trouble here is that these roles may pull in opposite directions, overconstraining the problem. The first role, according to some, constrains them to obey the probability calculus (with finite additivity); the second role, according to others, constrains them to violate it.

On the one hand, propensities are said to constrain rational credences in a way codified by the so-called 'Principle of Direct Probability', refined and made famous to philosophers by David Lewis (19xx) under the name 'the Principal Principle'. Roughly, the principle is that rational credences strive to 'track' propensities, so that if a rational agent knows the propensity of a given outcome, her degree of belief in that outcome will be the same. More generally, where 'Cr' is the subjective probability function of a rational agent, and 'Pr' is the propensity function, for any X,

(*) Cr(X | Pr(X) = x) = x.

For example, my degree of belief that this coin toss lands heads, given that its propensity of landing heads is 3/4, is 3/4. (*) underpins an argument that whatever they are, propensities must obey the usual probability calculus (with finite additivity); after all, it is argued, rational credences, which are guided by them, do. On the other hand, Humphreys (19xx) gives an influential argument that propensities do not obey the probability calculus.
The idea is that the probability calculus implies Bayes' theorem, which allows us to 'invert' a conditional probability:

P(A|B) = P(B|A)P(A) / P(B), provided P(B) > 0.

Yet at least some propensities seem to be measures of 'causal tendencies', and much as the causal relation is asymmetric, so these propensities supposedly do not invert. Suppose we have a test for an illness that occasionally gives false positives and false negatives. A given sick patient may have a (non-trivial) propensity to give a positive test result, but it apparently makes no sense to say that a given positive test result has a (non-trivial) propensity to have come from a sick patient (a worked numerical version of the inversion appears at the end of this subsection). 'Humphreys' paradox', as it is known, has prompted Fetzer and Nute (19xx) to offer a "probabilistic causal calculus" which looks quite different to Kolmogorov's calculus. Thus, we have an argument that whatever they are, propensities must not obey the usual probability calculus.4

Perhaps all this shows that the notion of 'propensity' bifurcates: on the one hand, there are propensities that bear an intimate connection to relative frequencies and rational credences, and that obey the probability calculus (with finite additivity); on the other hand, there are causal propensities that behave rather differently. In that case, there would be still more interpretations of probability than have previously been recognized.
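For concreteness, here is the inversion applied to the test example, with illustrative numbers that are not from the text: a 1% base rate of illness, a 90% chance that a sick patient tests positive, and a 5% chance that a well patient does. The formal inversion is straightforward; Humphreys' worry concerns its propensity reading, not the arithmetic.

```python
# Bayes' theorem applied to the illness-test example, with hypothetical
# numbers (base rate, sensitivity and false-positive rate are assumptions).
def prob_sick_given_positive(base_rate, p_pos_given_sick, p_pos_given_well):
    """P(sick | positive), computed via Bayes' theorem and total probability."""
    p_positive = p_pos_given_sick * base_rate + p_pos_given_well * (1 - base_rate)
    return p_pos_given_sick * base_rate / p_positive

print(round(prob_sick_given_positive(0.01, 0.9, 0.05), 3))   # ≈ 0.154
```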

5. CONCLUSION: FUTURE PROSPECTS?

It should be clear from the foregoing that there is still much work to be done regarding the interpretation of probability. Each interpretation that we have canvassed seems to capture some crucial insight into it, yet falls short of doing complete justice to it. Perhaps the full story about probability is something of a patchwork, with partially overlapping pieces. In that sense, the above interpretations might be regarded as complementary, although to be sure each may need some further refinement.

My bet, for what it is worth, is that we will retain at least three distinct notions of probability: one quasi-logical, one objective, and one subjective. There are already signs of the rehabilitation of classical and logical probability, and in particular the principle of indifference, by authors such as Stove (1986), Bartha (forthcoming), Festa (1993) and Maher (2000, forthcoming). Relevant here may also be advances in complexity theory (see Fine 1973, Li and Vitanyi 1997). These theories have already proved to be fruitful in the study of randomness (Kolmogorov 19xx, Martin-Löf 19xx), which obviously is intimately related to the notion of probability. Refinements of our understanding of randomness, in turn, should have a bearing on the frequency interpretations (recall von Mises' appeal to randomness in his definition of 'collective'), and on propensity accounts (especially those that make explicit ties to frequencies). Given the supposed connection between propensities and causation adumbrated in the previous section, powerful causal modeling techniques developed by authors such as Pearl (2000) and Spirtes, Glymour and Scheines (1993) may also play a role here.

An outgrowth of frequentism is Lewis' (1986, 1994) account of chance. It runs roughly as follows. The laws of nature are those regularities that are theorems of the best theory: the true theory of the universe that best balances simplicity, strength, and likelihood (that is, the probability of the actual course of history, given the theory). If any of the laws are probabilistic, then the chances are whatever these laws say they are. Now, it is somewhat unclear exactly what 'simplicity' and 'strength' consist in, and exactly how they are to be balanced. Perhaps insights from statistics may be helpful here: approaches to statistical model selection, and in particular the 'curve-fitting' problem, that attempt to codify simplicity and its trade-off with strength—e.g., the Akaike Information Criterion (see Forster and Sober 1994), the Bayesian Information Criterion (see Kieseppä forthcoming), Minimum Description Length theory (see Rissanen 1999) and Minimum Message Length theory (see Wallace and Dowe 1999).

We may expect that further criteria of adequacy for subjective probabilities will be developed—perhaps refinements of 'scoring rules' (Winkler 1996), and more generally, candidates for playing a role for subjective probability analogous to the role that truth plays for belief. Here we may come full circle. For belief is answerable both to logic and to objective facts. A refined account of degrees of belief may be answerable both to a refined quasi-logical and a refined objective notion of probability. Well may we say that probability is a guide to life; but the task of understanding exactly how and why it is so has still to be completed, and will surely prove to be a guide to future theorizing about it.

Alan Hájek
California Institute of Technology

4 It should be noted that Gillies argues that Humphreys' paradox does not force non-Kolmogorovian propensities on us.

REFERENCES [to be continued]

Bertrand, J. (1889): Calcul des Probabilités, 1st edition, Gauthier-Villars.
Carnap, Rudolf (1950): Logical Foundations of Probability, University of Chicago Press.
Carnap, Rudolf (1963): "Replies and Systematic Expositions", in P. A. Schilpp (ed.), The Philosophy of Rudolf Carnap, Open Court, La Salle, Ill., 966-998.
De Finetti, Bruno (1937): "Foresight: Its Logical Laws, Its Subjective Sources", translated in Kyburg and Smokler (1964).
Fine, Terrence (1973): Theories of Probability, Academic Press.
Gaifman, Haim (1988): "A Theory of Higher Order Probabilities", in Brian Skyrms and William L. Harper (eds.), Causation, Chance, and Credence, Kluwer.
Giere, R. N. (1973): "Objective Single-Case Probabilities and the Foundations of Statistics", in P. Suppes et al. (eds.), Logic, Methodology and Philosophy of Science IV, North Holland, 467-483.
Gillies, Donald (2000): "Varieties of Propensity", British Journal for the Philosophy of Science 51, 807-835.
Goldstein, Michael (1983): "The Prevision of a Prevision", Journal of the American Statistical Association 77, 822-830.
Goodman, Nelson (1983): Fact, Fiction, and Forecast, 4th ed., Harvard University Press.
Hájek, Alan (forthcoming): "What Conditional Probability Could Not Be", Synthese.
Hild (forthcoming): Introduction to The Concept of Probability: A Reader, MIT Press.
Jaynes, E. T. (1968): "Prior Probabilities", Institute of Electrical and Electronics Engineers Transactions on Systems Science and Cybernetics SSC-4, 227-241.
Jeffreys, Harold (1939): Theory of Probability; reprinted in Oxford Classics in the Physical Sciences series, Oxford University Press, 1998.
Johnson, W. E. (1932): "Probability: The Deductive and Inductive Problems", Mind 49, 409-423.
Keynes, J. M. (1921): Treatise on Probability, Macmillan, London; reprinted 1962, Harper and Row, New York.
Laplace, Pierre Simon de (1814): Essai Philosophique sur les Probabilités, Paris; translated into English as A Philosophical Essay on Probabilities, New York, 1952.
Lewis, David (1980): "A Subjectivist's Guide to Objective Chance", in R. C. Jeffrey (ed.), Studies in Inductive Logic and Probability, Vol. II, University of California Press, 263-293; reprinted in Philosophical Papers, Volume II, Oxford University Press.
Lewis, David (1994b): "Humean Supervenience Debugged", Mind 103, 473-490.

Popper, Karl (1959a): "The Propensity Interpretation of Probability", British Journal for the Philosophy of Science 10, 25-42.
Ramsey, F. P. (1926): "Truth and Probability", in Foundations of Mathematics and other Essays; reprinted in Kyburg and Smokler (1964), and in D. H. Mellor (ed.), Philosophical Papers, Cambridge University Press, Cambridge, 1990.
Reichenbach, Hans (1949): The Theory of Probability, University of California Press.
Rosenkrantz, R. D. (1981): Foundations and Applications of Inductive Probability, Ridgeview Publishing Company.
Savage, L. J. (1954): The Foundations of Statistics, John Wiley, New York.
Skyrms, Brian (1994): "Bayesian Projectibility", in D. Stalker (ed.), Grue!, Open Court, 241-262.
Sober, Elliott (2000): Philosophy of Biology, 2nd ed., Westview Press.
Strevens, Michael (forthcoming): Bigger than Chaos, Harvard University Press.
van Fraassen, Bas (1984): "Belief and the Will", Journal of Philosophy 81, 235-256.
van Fraassen, Bas (1989): Laws and Symmetry, Clarendon Press, Oxford.
van Fraassen, Bas (1995): "Belief and the Problem of Ulysses and the Sirens", Philosophical Studies 77, 7-37.
Venn, John (1876): The Logic of Chance, 2nd ed., Macmillan and Co.
von Mises, Richard (1957): Probability, Statistics and Truth, revised English edition, New York.
