R. A. Fisher's Fiducial Argument and Bayes' Theorem

Teddy Seidenfeld

Statistical Science, Vol. 7, No. 3. (Aug., 1992), pp. 358-368.

Stable URL: http://links.jstor.org/sici?sici=0883-4237%28199208%297%3A3%3C358%3ARAFFAA%3E2.0.CO%3B2-G

Statistical Science is currently published by the Institute of Mathematical Statistics.


Teddy Seidenfeld is Professor in the Departments of Philosophy and Statistics, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213.

1. INTRODUCTION

In celebration of the 100th anniversary of Fisher's birth, I want to raise the subject of fiducial inference for our reflection. Shortly after Fisher's death in 1962, my teacher and friend, Henry Kyburg, addressed a conference on fiducial probability. I find it appropriate to begin with some of Kyburg's (1963) remarks:

    I am a logician and a philosopher; I have not studied statistics for very long, and so I still very quickly get out of my depth in a discussion of the technicalities of statistics. But I think it is important, none the less, for people whose interests lie in the area of inference as such to do the best they can in reacting to - and in having an action upon - current work in that particular kind of inference called "statistical." That this interaction is difficult for both parties is the more reason for attempting it. (p. 938)

My purpose in this essay is to try to assist that interaction by focusing on the rather vague inference pattern known as the "fiducial argument." I hope to show that, though it really is untenable as a logical argument, nonetheless, it illuminates several key foundational issues for understanding Fisher's disputes over the status of Bayes' theorem and thereby some of the continuing debates on the differences between so-called orthodox and Bayesian statistics.

Begin with the frank question: What is fiducial probability? The difficulty in answering simply is that there are too many responses to choose from. As is well known, Fisher's style was to offer heuristic examples of fiducial arguments and, too quickly, to propose (different) formal rules to generalize the examples. By contrast, among those who have attempted to reconstruct fiducial inference according to a well-expressed theory, one must single out the following five, in chronological order: Jeffreys (1961), Fraser (1961), Dempster (1963), Kyburg (1963, 1974) and Hacking (1965). So, instead of beginning with a formal definition, let us try to find out what fiducial inference is supposed to accomplish and then see whether, in fact, any answer to our question can be accepted. That is, first we shall determine whether the goals Fisher set for fiducial probability are even mutually consistent.

A convenient starting place for our investigation is the distinction between direct and inverse inference (or direct and inverse probability). These terms date back at least as far as Venn and appear often in those passages where Fisher attempts to explicate fiducial probability, for the reason I will give below. As a first approximation, the difference between direct and inverse probability is the difference between conditional probability for a specific (observable) event D given a statistical hypothesis S, p(D|S), and conditional probability of a statistical hypothesis given the evidence of sample data, p(S|D). For example, an instance of a direct probability statement is, "The probability is 0.5 of 'heads' on the next flip of this coin, given that it is a fair coin." Here the conditioning proposition may be understood as supplying the statistical hypothesis S that there is a binomial model (a hypothetical population) for flips with this coin, θ = 0.5, and the event D is that the next flip lands "heads." An inverse probability statement is, "The probability is 0.4 that the coin is fair, given that 4 of 7 flips land 'heads'." Direct and inverse inference, then, denote those principles of inductive logic which determine or, at least, constrain the probability values of the direct and inverse probability statements. They would explain the probability values 0.5 and 0.4, assuming those values are inductively valid.

A maxim for direct inference which is commonplace, in so far as any inductive rule can be so described, is what Hacking (1965, p. 165) labels the Frequency Principle. It states loosely that, regarding the direct probability p(D|S), provided all one knows (of relevance) about the event D is that it is an instance of the statistical "law" S, the direct conditional probability for event D is the value specified in the statistical law S. If all we know about the next flip of this coin is that it is a flip of a fair coin, then the direct probability is 0.5 that it lands "heads" on the next flip.

To this extent, direct probability is less problematic than inverse probability. There is no counterpart to the Frequency Principle for inverse probability. At best, p(S|D) may be determined by appeal to Bayes' theorem, p(S|D) ∝ p(D|S)p(S): an inverse inference which involves both a direct probability p(D|S) and a prior probability p(S). It is here that Fisher sought relief from inverse inference through fiducial probability for, as we will see in the next section, his idea was to reduce inverse to direct probability by fiducial inference. Thus, we have to explore the limits of direct inference in order to appreciate fiducial probability.
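To fix ideas, here is a minimal Python sketch (standard library only) of the two kinds of statement for the coin example above. The rival hypothesis θ = 0.7 and the 50-50 prior are assumptions supplied for the illustration; the text's value of 0.4 presupposes some such prior, which the Frequency Principle alone does not deliver.

```python
from math import comb

# Direct probability: p(D | S) for D = "4 of 7 flips land heads" under the
# binomial hypothesis S: theta = 0.5, and under an assumed rival S': theta = 0.7.
def binom_pmf(k, n, theta):
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

p_D_fair = binom_pmf(4, 7, 0.5)   # direct probability under "fair coin"
p_D_bias = binom_pmf(4, 7, 0.7)   # direct probability under the biased rival

# Inverse probability needs a prior p(S); a 50-50 prior is assumed here.
prior_fair = 0.5
posterior_fair = (p_D_fair * prior_fair) / (
    p_D_fair * prior_fair + p_D_bias * (1 - prior_fair))
print(posterior_fair)  # ~0.55 with this assumed prior; another prior gives 0.4
```

The point of the sketch is only that the inverse statement is undetermined until a prior is put in by hand, which is exactly the step Fisher hoped fiducial probability would avoid.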
Of course, rarely do we find our background information so conveniently arranged as posited by the Frequency Principle. I may know where the coin was minted, or who flipped it last, in addition to knowing it is fair. Are these relevant considerations about the next flip, given the statistical hypothesis (S) that the coin is fair? I may know that the next flip of this coin is an instance of a different statistical law, S', as well as an instance of S. It may be a flip of a coin drawn from urn U (θ = 0.6), where half the coins in U are fair and the other half biased (θ = 0.7) for "heads." By the Frequency Principle, if all I know about the next flip is that it is an instance of S', the direct probability that it lands "heads" on the next flip is 0.6, p(D|S') = 0.6. What if I know that the next flip is an instance of both laws S and S'? (H. Reichenbach, an advocate of a limiting frequency interpretation of probability, called this the problem of the reference class for single events.)

Invariably, these difficulties are resolved by some version of a Total Evidence principle. However, competing versions do not lead to the same results. For instance, the usual account of Neyman-Pearson statistical tests is in terms of direct probabilities of type 1 and type 2 errors. Likewise, the usual interpretation of confidence levels is as direct probabilities that the confidence interval covers the (unknown) parameter. These familiar probability claims are given in terms of direct inference based on a privileged reference set of repeated trials. But how is the "repeated trial" so privileged? I. Levi argues (1980, Section 17.2) that the so-called orthodox theory involves a limitation, imposed before the trial, on the evidence that is available after the trial. He expresses this through a distinction between features of the data used as evidence (for direct probability statements), in contrast with features of the data that are used merely as input to a statistical routine. (A similar distinction, useful for reconciling "orthodox" and Bayesian methods, is created using I. J. Good's, 1971, Statistician's Stooge.) Which features of the data are to serve (post-trial) as statistical evidence for direct inference are identified (pretrial) by the pragmatics of the problem context, for example, by the repetitions which arise in a particular quality control setting. Only those statistical laws S valid for this distinguished set of repeated trials may be used. For example, as in the Buehler-Feddersen (1963) problem (cf. Section 3, below), it may be that conditional on the sample falling in a recognizable subset of the sample space, the coverage probability for a particular confidence interval is bounded away from the announced confidence level. However, because the extra information about the sample is not accepted as evidence, this analysis does not invalidate the unconditional confidence level statement. The Neyman-Pearson direct inference about the confidence level is pegged to the reference set of repeated trials through a (pretrial) constraint on which features of the statistical sample count as evidence.

Whereas the Neyman-Pearson resolution to direct inference is grounded on a pragmatic choice of evidential considerations, Fisher offers an epistemological criterion. His criterion (reported below) is a balance of what we know and what we don't know. Fisher incorporates his views about direct inference into what he calls a semantics for probability. The following is a succinct expression both of Fisher's dissatisfaction with the limiting frequency interpretation of probability and a sketch of his positive theory.

Remark

In the quotation below, Fisher alludes to an aggregate or hypothetical population of, for example, throws of a fair die to suggest what the binomial magnitude θ = 1/6 is about; to wit, one-sixth of the hypothetical population of throws are "aces." I find Fisher's talk of hypothetical populations of value only as a metaphor. It facilitates the additional language of subpopulations and relevant reference sets, which are helpful in framing the real problems with direct inference. In a different setting - statistical estimation - Fisher (1973) uses the concept of a hypothetical population to justify his criterion of Fisher Consistency (see Seidenfeld, 1992).

    Indeed, I believe that a rather simple semantic confusion may be indicated as relevant to the issues discussed, as soon as consideration is given to the meaning that the word probability must have to anyone so much practically interested as a gambler, who, for example, stands to gain or lose money, in the event of an ace being thrown with a single die. To such a man the information supplied by a familiar mathematical statement such as:

    "If a aces are thrown in n trials, the probability that the difference in absolute value between a/n and 1/6 shall exceed any positive value ε, however small, shall tend to zero as the number n is increased indefinitely,"

    will seem not merely remote, but also incomplete and lacking in definiteness in its application to the particular throw in which he is interested. Indeed, by itself it says nothing about that throw. . . . Before the limiting ratio of the whole set can be accepted as applicable to a particular throw, a second condition must be satisfied, namely that before the die is cast no such subset can be recognized. This is a necessary and sufficient condition for the applicability of the limiting ratio of the entire aggregate of possible future throws as the probability of any one particular throw. On this condition we may think of a particular throw, or of a succession of throws, as a random sample from the aggregate, which is in this sense subjectively homogeneous and without recognizable stratification.

    This fundamental requirement for the applicability to individual cases of the concept of classical probability shows clearly the role both of well-specified ignorance and of specific knowledge in a typical probability statement. . . . The knowledge required for such a statement refers to a well-defined aggregate, or population of possibilities within which the limiting frequency ratio must be exactly known. The necessary ignorance is specified by our inability to discriminate any of the different sub-aggregates having different frequency ratios, such as must always exist. (pp. 34-36)

We have here the ingredients for Fisher's solution to the direct inference problem. The knowledge he refers to is nothing more than knowledge of some statistical law S' (e.g., that the coin is chosen from the urn U) which applies to the particular event of interest, D (e.g., that the next flip lands "heads"). Then, in order to apply the statistical law S' to instance D, as in the Frequency Principle, one must be ignorant of competing laws which apply to the particular D. Unfortunately, Fisher gives only rules-of-thumb, not a formal account of when laws compete. For example, suppose we know that D belongs also to a proper subpopulation of the statistical law S' and that law S (with different statistics) applies to this subpopulation. Then knowledge of S prevents D from being a random event with respect to law S'. But knowledge of S' does not prevent D from being a random event with respect to law S. Accordingly, if you know that the next is a flip of a fair coin (θ = 0.5), that information prevents it from being a random flip of a coin chosen from the urn U (θ = 0.6), since S embeds the next flip in a relevant subpopulation with respect to the statistical law S'. Roughly put, only half of the flips from U are flips of a fair coin. But the situation between S and S' is asymmetric. Knowledge about the larger population pertaining to S' does not provide relevant information about the next flip once you know (S) it is a fair coin which is tossed.

The lack of a formal theory about "relevant" subpopulations makes Fisher's semantics for probability seriously incomplete. What shall we say about a case where we know that a specific event is an instance of a statistical law S* whose hypothetical population is narrower than that for the law S we intend to use, but we don't know the statistics for this competing law? What if we know this is a flip of a fair coin (S), but also it is a flip of a fair coin which stays up for more than 3 seconds (S*) - and we don't know the statistics for the subpopulation of such fair coin flips? Does Fisher's theory prohibit the direct probability of 0.5 that this flip of the fair coin lands "heads" merely because we know (also) that it stays up for more than 3 seconds, though we have no statistical information about such extended flips?

There are two clear options. Fisher's program may be completed by adding clauses on direct inference which impose the onus for proof: either (A) on the challenges to the claim that D is not a random event under the statistical law S or (B) on the defense of the claim that D is a random event under the statistical law S. Under (A), merely knowing that the coin flip lasts more than 3 seconds, alone, does not prevent the direct probability based on the statistical hypothesis that it is a fair coin. That is, unless one knows that such extended coin flips have different statistics for landing "heads," the direct inference p(D|S) = 0.5 stands. Under (B), unless one knows that the subpopulation of extended flips of a fair coin has the same statistics of landing "heads" as the larger population of fair coin flips, then the extra information unseats the direct inference based on the larger population of all fair coin flips. As we shall see, Fisher's fiducial argument appears to rely on policy (A): additional data are not relevant to a direct inference unless you know the rival statistical law governing the subpopulation created by the added constraints. (See Kyburg's work for a systematic development of this strategy, to include the important cases where the agent has only partial [inequality] information about population statistics.)

2. A SKETCH OF THE UNIVARIATE FIDUCIAL ARGUMENT

According to Fisher, fiducial probability is special only by its genesis, not by its content. (He says this in numerous places, e.g., Fisher, 1973, p. 59.) That is, whatever we call fiducial probability must satisfy the mathematical calculus of probability, be that a countably additive or finitely additive (normed) measure. The uniqueness of fiducial probability is, supposedly, that it provides statements of inverse probability without admitting into the inference any (unwarranted) "prior" probability for statistical hypotheses, that is, without relying on Bayes' theorem to derive inverse probability from direct probability and prior probability. More accurately, fiducial inference attempts to derive inverse probability in the absence of statistically based prior probability. As Fisher (1973, p. 59) expresses it, a precondition for fiducial inference is that there is insufficient background knowledge to determine an initial (or "prior") value for probability about unknown parameters by direct inference using, say, a hyperpopulation. By contrast, for example, a hyperpopulation is available in genetics when knowledge of the F0 genotypes provides a direct probability "prior" for the F1 genotype. Then, that prior may be used in Bayes' theorem with evidence of an F2 phenotype, to determine an inverse probability about the F1 genotypes. In this respect, the ignorance of "priors" for fiducial inference is identical to Neyman's assumption when he proposes confidence intervals as a replacement for inverse inference. However, unlike the situation with confidence intervals, by Fisher's own claim, fiducial probability is to satisfy the same formal conditions as any other probability statement. In other words, it is fair to ask that fiducial probability satisfies the probability calculus for conditional probability, which entails there exists some Bayesian model with a "prior," whose posterior probability duplicates the fiducial probability. Of course, Fisher thought such models were mere mathematical niceties. Their prior probability is no more than a pretense to (imaginary) statistical information about a hyperpopulation for the parameter.

The following is a succinct statement of Fisher's (1973) views on the topic of such hyperpopulations for a "prior."

    It should, in general, be borne in mind that the population of parametric values, having the fiducial distribution inferred from any particular sample, does not, of course, concern any population of populations from which that sampled might have been in reality chosen at random, for the evidence available concerns one population only, and tells us nothing more of any parent population that might lie behind it. Being concerned with probability, not with history, the fiducial argument, when available, shows that the information provided by the sample about this one population is logically equivalent to the information, which we might alternatively have possessed, that it had been chosen at random from an aggregate specified by the fiducial probability distribution. (pp. 124-125)

Rather than fiducial probability "being concerned with history," to use Fisher's polemical prose, his claim that fiducial probability is probability served to justify its use in Bayes' theorem for a wide range of inference problems: inference problems involving data from different experiments, problems involving nuisance parameters and problems of prediction. I return to this theme in Section 3, below.

What, then, is fiducial inference? How may one infer a statement of inverse probability without conceding Bayes' method of using a prior probability and Bayes' theorem? The short answer is as follows: reduce inverse inference to direct inference by manipulating the relevance conditions for applying the Frequency Principle to probability statements for a pivotal variable. Hacking (1965) and Kyburg (1963) made this logic eminently clear. Half a century ago, Jeffreys sketched the Bayesian model for fiducial inference. And almost 35 years ago, Lindley (1958) used that model to call Fisher (1973) on his bold assertion that fiducial probability was, after all, no different in "logical content" than "probability derived by different methods of reasoning" (p. 59). Specifically, Lindley questions whether fiducial probability is no different in logical content from inverse probability derived by Bayes' method. In the following rational reconstruction of fiducial inference, I try to clarify how close fiducial probability comes to that ideal.

Let us rehearse a familiar illustration of fiducial reasoning.

EXAMPLE 2.1. Consider the random variable x which, according to our background statistical knowledge, S, follows a Normal, N(μ, 1) distribution, but where we plead ignorance about the "unknown" mean μ. Fisher might say that, about the next observation x, we know only that it belongs to a "hypothetical population" of entities whose x-values are normally distributed with unit variance. Prior to observing x, given S we have no statistical basis for assigning a probability to statements about μ; for example, "p(-1 ≤ μ ≤ 1)" is undefined for Fisher. (Expressed in other words, we do not know statistics for a hyperpopulation of normal populations containing this one.) Of course, given a statistical hypothesis about μ, S0: μ = 0, direct probability about x follows by the Frequency Principle, for example, p(-1 ≤ x ≤ 1 | S0) ≈ 0.68.

Define the pivotal variable v = (x - μ). It seems noncontroversial to claim v follows a standard Normal N(0, 1) distribution, Φ(·), independent of the unknown mean μ. (But see the remark, below.) Then, equally evident is direct probability about the next value v (corresponding to the next value x). That is, from the background statistical assumption, S, that v ~ N(0, 1), by the Frequency Principle we assert p(-1 ≤ v ≤ 1 | S) ≈ 0.68.

Fiducial inference rests on the claim that the same direct probability statements obtain for the pivotal before and after observing the sample data. Fisher asserts that, for example, p(-1 ≤ v ≤ 1 | S) = p(-1 ≤ v ≤ 1 | S, x) = 0.68. Fiducial inference takes knowledge of the observed random variable to be irrelevant to direct inference about the pivotal variable. Then, given the datum, for example, x = 7, since the proposition "-1 ≤ v ≤ 1" is equivalent to the proposition "6 ≤ μ ≤ 8," we conclude p(6 ≤ μ ≤ 8 | S, x = 7) ≈ 0.68. According to fiducial inference, inverse probability about the parameter μ given datum x is reduced to direct probability about the pivotal variable v, all in the absence of a statistically based "prior" for μ.
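A minimal numerical rendering of Example 2.1, assuming NumPy and SciPy are available, makes the fiducial step explicit: the N(0, 1) probability assigned to the pivotal before the data is transferred, after observing x = 7, to the interval of μ-values that the pivotal statement determines.

```python
from scipy.stats import norm

# Before the data: direct probability for the pivotal v = x - mu.
p_direct = norm.cdf(1) - norm.cdf(-1)          # P(-1 <= v <= 1) ~ 0.683

# Fiducial step: after observing x = 7, the same N(0,1) law is still applied
# to v, so "-1 <= v <= 1" is read as "6 <= mu <= 8"; fiducially mu ~ N(7, 1).
x = 7.0
p_fiducial = norm.cdf(x - 6) - norm.cdf(x - 8)  # P(6 <= mu <= 8 | x = 7)
print(p_direct, p_fiducial)                     # both ~0.683
```

The arithmetic is trivial by design; the entire content of the argument lies in the claim that observing x is irrelevant to the direct probability for v.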

Remark

Probability about the pivotal v is trivial for a subjectivist . . .

. . . matter of logic, the pivotal V_F has a truncated range, 0 < V_F . . .

In outline: suppose the data are summarized by a pair of statistics (g, h), with δ the parameter of interest and ζ a nuisance parameter, and suppose the likelihood factors as p(g, h | δ, ζ) = p(g | δ; ζ, h) p(h | ζ). The second factor supports fiducial inference for ζ from h, p(ζ | h), on the claim that, in the absence of knowledge of δ, h summarizes all the relevant evidence about ζ. (The claim makes sense, I believe, only in connection with the step-by-step method, which affords a Bayesian check for the coherence of the claim.) Last, suppose that the first factor supports fiducial inference for δ from g, given ζ and h, p(δ | g; ζ, h). Then these terms may be combined, using Bayes' theorem, to yield

    p(δ | g, h) = ∫ p(δ | g; ζ, h) p(ζ | h) dζ.

This is Fisher's "step-by-step" method for solving the infamous Behrens-Fisher problem.

EXAMPLE 3.2 (Behrens-Fisher problem). Let x_i be i.i.d. N(μ_x, σ_x²) (i = 1, ..., n). Likewise, let y_i be i.i.d. N(μ_y, σ_y²) (i = 1, ..., n). All four parameters are unknown, but we are interested in the difference in means: δ = (μ_x - μ_y). The variances (σ_x², σ_y²) are nuisance factors. Define ζ = σ_x²/σ_y², the population variance ratio, and let z = s_x²/s_y², the sample variance ratio. Last, define the quantity d', a suitably studentized version of the observed difference in sample means. Then, given ζ, there is a simple fiducial inference from d' to δ, yielding p(δ | d', ζ), as p(d' | δ, ζ) is a Student's t (with 2n - 2 d.f.), centered on the parameter of interest, δ. Fisher uses a fiducial inference from z to ζ, yielding p(ζ | z), as p(z | ζ) has Fisher's F distribution. (Here is where Fisher assumes z is sufficient for ζ in the absence of knowledge of δ.) Then these fiducial probabilities are combined, using Bayes' theorem:

    p(δ | data) = ∫ p(δ | d', ζ) p(ζ | z) dζ.
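The integration over ζ can be imitated by Monte Carlo. The sketch below (assuming NumPy; the sample summaries are invented for the purpose) uses a standard equivalent representation of the resulting Behrens-Fisher fiducial distribution: each mean gets its own marginal fiducial Student's t distribution, and δ is their difference. Sampling ζ from p(ζ | z) and then δ from p(δ | d', ζ) would reproduce the same distribution.

```python
import numpy as np
rng = np.random.default_rng(0)

# Hypothetical summary data, assumed for illustration only.
n = 10
xbar, sx = 5.2, 1.1   # sample mean and s.d. of the x's
ybar, sy = 4.1, 2.3   # sample mean and s.d. of the y's

# Fiducial sampling for delta = mu_x - mu_y: each mean is marginally a
# rescaled Student's t centered at its sample mean; the difference has the
# Behrens-Fisher fiducial distribution that the step-by-step method delivers.
draws = 100_000
mu_x = xbar - (sx / np.sqrt(n)) * rng.standard_t(n - 1, draws)
mu_y = ybar - (sy / np.sqrt(n)) * rng.standard_t(n - 1, draws)
delta = mu_x - mu_y

print(np.quantile(delta, [0.025, 0.5, 0.975]))  # a 95% fiducial interval
```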

It is important to understand that there can be no "direct" fiducial argument duplicating this inference about δ (Linnik, 1963). That is, to appreciate the Bayesian aspects of the Behrens-Fisher solution, where Bayes' theorem is used to integrate out the nuisance parameter ζ, let us contrast it with the "step-by-step" fiducial method for inference about an unknown Normal mean, μ, when σ is a nuisance parameter.

EXAMPLE 3.3 (Student's t-distribution as a fiducial probability). Let x_i be i.i.d. N(μ, σ²), with both parameters unknown, but with μ, alone, the parameter of interest. The two sample statistics (x̄, s²) are jointly sufficient for the two parameters. Recall the likelihood for the data factors as follows:

    p(x̄, s² | μ, σ²) = p(x̄ | μ, σ²) p(s² | σ²).

The second term, p(s² | σ²), supports fiducial inference about the nuisance factor σ², given s². Fiducially, p(σ² | s²) treats σ² inversely proportional to a χ² distribution (with n - 1 d.f.). The first term, p(x̄ | μ, σ²), supports fiducial inference about the parameter of interest μ, given x̄ and σ². Fiducially, p(μ | x̄, σ²) is normal N(x̄, σ²/n). These fiducial probabilities may be used in Bayes' theorem to solve for the marginal, inverse probability for the parameter of interest, just as in the Behrens-Fisher problem:

    p(μ | x̄, s²) = ∫ p(μ | x̄, σ²) p(σ² | s²) dσ².

This yields the familiar Student's t-distribution (n - 1 d.f.) as a fiducial probability for μ. However, unlike the Behrens-Fisher distribution for δ, the t-distribution may be derived in a "direct" fiducial distribution using the pivotal variable t = √n(μ - x̄)/s, which of course has Student's t-distribution (with n - 1 d.f.).
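Here, too, the two fiducial steps can be imitated by Monte Carlo (assuming NumPy; the sample summaries are invented): draw σ² from its fiducial distribution given s², then μ from N(x̄, σ²/n), and check the marginal against the direct t-pivotal answer.

```python
import numpy as np
rng = np.random.default_rng(1)

# Hypothetical data summaries, assumed for illustration.
n, xbar, s2 = 10, 5.2, 1.21

# Step 1: fiducial sigma^2 given s^2, using (n-1)s^2/sigma^2 ~ chi^2(n-1).
draws = 200_000
sigma2 = (n - 1) * s2 / rng.chisquare(n - 1, draws)

# Step 2: fiducial mu given xbar and sigma^2: mu ~ N(xbar, sigma^2/n).
mu = rng.normal(xbar, np.sqrt(sigma2 / n))

# The marginal p(mu | xbar, s^2) should match Student's t (n-1 d.f.),
# centered at xbar and scaled by s/sqrt(n).
t_equiv = xbar - np.sqrt(s2 / n) * rng.standard_t(n - 1, draws)
print(np.quantile(mu, [0.05, 0.5, 0.95]))
print(np.quantile(t_equiv, [0.05, 0.5, 0.95]))  # close agreement
```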

Wrongly, I believe, Fisher (1973) asserts that the "direct" argument here is available as a simple, fiducial inference. He says,

    It will be recognized that "Student's" distribution allows of induction of the fiducial type, for the inequality

        μ < x̄ - ts/√n

    will be satisfied with just half the probability for which t is tabulated, if t is positive, and with the complement of this value if t is negative. The reference set for which this probability statement holds is that of the values of μ, x̄ and s corresponding to the same sample, for all samples of a given size of all normal populations. Since x̄ and s are jointly sufficient for estimation, and knowledge of μ and σ a priori is absent, there is no possibility of recognizing any subset of cases, within the general set, for which any different value of the probability should hold. The unknown parameter μ has therefore a frequency distribution a posteriori defined by "Student's" distribution. (pp. 82-84)

In describing the "step-by-step" method, using Bayes' theorem to solve the joint distribution for (μ, σ) and then integrating out the nuisance parameter σ, Fisher (1973) writes,

    The rigorous step-by-step demonstration of the bivariate distribution by the fiducial argument would in fact consist of the establishment of the second factor giving the distribution of σ given s, disregarding the other parameter, μ, and then of finding the first factor as the distribution of μ given x̄ and σ. Several writers have adduced instances in which, when the formal requirements of the fiducial argument are ignored, the results of the projection of frequency elements using artificially constructed pivotal quantities may be inconsistent. When the fiducial argument itself is applicable, there can be no such inconsistency.

    It will be noticed that in this simultaneous distribution μ and σ² are not distributed independently. Integration with respect to either variable yields the unconditional distribution of the other, and these are naturally obtainable by direct application of the fiducial argument, namely that

        √N(μ - x̄)/s

    is distributed as is t for (N - 1) degrees of freedom, while

        S/σ²

    is distributed as is χ² for (N - 1) degrees of freedom. The distribution of any chosen function of μ and σ² can equally be obtained. (pp. 123-124)

Alas, Fisher's conclusion about the "direct" argument is unwarranted. Using the t-pivotal to shortcut the step-by-step argument does not produce an instance of fiducial reasoning where direct probability about the t-pivotal is unaffected by observed, sample information. True, (x̄, s) are jointly sufficient for (μ, σ). But that premise is logically inadequate for Fisher's conclusion about the impossibility of a recognizable reference set for the t-pivotal. There is a recognizable subpopulation having statistics that conflict with the t-distribution, as we see next.

EXAMPLE 3.3 Continued (Buehler-Feddersen problem). Let n = 2, so we have two (i.i.d.) observations from N(μ, σ²). Trivially, there is the direct probability

(3.1)    p(-1 ≤ t ≤ 1 | μ, σ²) = 0.5

for each pair (μ, σ²). Likewise, the fiducial "marginal" t-probability (1 d.f.) satisfies

(3.2)    p(-1 ≤ t ≤ 1 | x₁, x₂) = 0.5

for all samples (x₁, x₂). Define the statistic u = |x₁ + x₂| / |x₁ - x₂|. Then, as R. J. Buehler and A. P. Feddersen proved (1963), within a year of Fisher's death,

(3.3)    p(-1 ≤ t ≤ 1 | u ≤ 1.5, μ, σ²) ≥ 0.5181

for each pair (μ, σ²). If the observed sample satisfies (u ≤ 1.5), doesn't the inequality (3.3) point us to a subpopulation with statistics different from that for the t-distribution (on 1 d.f.), summarized in (3.2)? Given (u ≤ 1.5), is not the fiducial step for the t-pivotal invalid? Isn't the evidence (u ≤ 1.5) relevant to direct inference about t: p(t) ≠ p(t | u ≤ 1.5)?
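The recognizable subset is easy to exhibit by simulation (assuming NumPy; the particular pair (μ, σ) is arbitrary, since (3.3) holds for every pair): unconditionally the t-pivotal statement holds with relative frequency near 0.5, but conditional on u ≤ 1.5 the coverage frequency is visibly larger.

```python
import numpy as np
rng = np.random.default_rng(2)

mu, sigma = 3.0, 2.0                          # assumed, any fixed pair works
x = rng.normal(mu, sigma, size=(1_000_000, 2))
xbar = x.mean(axis=1)
s = np.abs(x[:, 0] - x[:, 1]) / np.sqrt(2)    # sample s.d. when n = 2
t = np.sqrt(2) * (xbar - mu) / s              # t-pivotal on 1 d.f.
u = np.abs(x.sum(axis=1)) / np.abs(x[:, 0] - x[:, 1])

covered = np.abs(t) <= 1
print(covered.mean())               # ~0.50, as in (3.1)
print(covered[u <= 1.5].mean())     # above 0.50 on the subset, as in (3.3)
```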
Let me propose a way out of this dilemma for fiducial probability, but at the expense of the "direct" argument. As Fisher asserts in the second of the conflicting quotations above, the step-by-step method is the rigorous demonstration of fiducial inference for problems involving several parameters. That is, to avoid paradoxes from relevant reference sets, joint fiducial inference has to be related to marginal fiducial inference through Bayes' theorem. Thus, Student's t-distribution is the marginal, fiducial probability for the unknown mean μ. But that fiducial probability is not derived by a "direct" fiducial argument involving the t-pivotal.

The proposal to base fiducial inference on the step-by-step method is incomplete as a resolution of the dilemma without some explanation of Buehler and Feddersen's relevant reference-set "paradox." The answer I propose has no basis in any of Fisher's writings I am aware of. In fact, as a mathematical rather than statistical response, it is likely Fisher would have objected to it just as he vehemently objected to what he felt were excessively mathematical theories of statistics. Nonetheless, what follows is that explanation which makes the most sense to me.

The Buehler-Feddersen phenomenon does not produce a contradiction with the step-by-step fiducial argument, I propose, because fiducial inference uses a finitely and not necessarily countably additive theory of probability. Recall, by the late 1930s, Jeffreys (1961, Section 3.4, pp. 137-147) had shown that the Bayesian model for fiducial probability in two-parameter N(μ, σ²) inference uses the "improper" prior density, dμ dσ/σ. An improper prior assigns equal magnitudes of "support" to each of a countably infinite partition. For example, the uniform, Lebesgue density for a real-valued quantity μ assigns equal magnitudes to each unit interval. But there can be no countably additive probability which duplicates this feat. Only (purely) finitely additive probabilities satisfy the condition that each unit interval for μ has equal prior probability. Of course, there are different ways of representing the improper prior that mask this feature: it can be thought of as a sigma-finite measure (Renyi, 1955), or as a limit of so-called "proper" countably additive probabilities (Lindley, 1969). Regardless of how it is described, the improper prior behaves like the finitely additive probability it is! (See also the discussion by Levi, 1980, pp. 125-131, and by Kadane et al., 1986.)

The reason for emphasizing this point is that, unlike countably additive probability, finitely additive probability must admit what de Finetti called "nonconglomerability."

DEFINITION (Dubins, 1975). Say that probability p is conglomerable in an infinite partition π = {h_a} if, for each bounded random variable x, provided k₁ ≤ E_p[x | h_a] ≤ k₂ (for each element of π), then k₁ ≤ E_p[x] ≤ k₂, where "E_p[·]" denotes expectation with respect to p.

As Dubins (1975) has shown, conglomerability of p in a partition π is equivalent to p's disintegrability in π.

However, Schervish, Seidenfeld and Kadane (1984) have demonstrated that each finitely additive probability that is not countably additive suffers nonconglomerability of probability (for indicator functions) in some countable partition. Hence, Bayesian models that rely on improper priors can display nonconglomerability without being inconsistent.

How does this relate to the Buehler-Feddersen "paradox"? With respect to Jeffreys' Bayesian model for fiducial inference, consider the conditional probability p(-1 ≤ t ≤ 1 | u ≤ 1.5). If this conditional probability is conglomerable in the (two-dimensional) partition of the unknown parameters, then (3.3) entails

    p(-1 ≤ t ≤ 1 | u ≤ 1.5) ≥ 0.5181.

If the conditional probability is conglomerable in the (two-dimensional) partition of the observed random variables, then (3.2) entails

    p(-1 ≤ t ≤ 1 | u ≤ 1.5) = 0.5.

Together, the conditional conglomerability assumptions make an inconsistent pair. Given (u ≤ 1.5), p experiences nonconglomerability in (at least) one of these two partitions. Suppose we opt to make p(· | u ≤ 1.5) conglomerable in the partition of the data, as we may with Jeffreys' model. Then there is no warrant for using the relevant subset argument with the t-pivotal, based on direct (conditional) probabilities of the kind in (3.3). Moreover, under the same model, we may have p(·) unconditionally conglomerable in both partitions! (See Heath and Sudderth, 1978. Thus the event (u ≤ 1.5) has prior probability 0 in Jeffreys' model.) In short, there is a consistent formulation of fiducial inference which saves the step-by-step method, leading to the marginal fiducial t-distribution for μ and which, at the same time, places no weight on the existence of relevant subsets for direct inference in the form (3.3). What more than that can be asked in defense of Fisher's theory against the Buehler-Feddersen "paradox"?

Remark

To the best of my knowledge, the closest Fisher comes to recognizing the finitely additive nature of fiducial probability is in one of his discussions of the Behrens-Fisher significance test (Fisher, 1973, p. 100). There he observes that, given the null-hypothesis, the announced significance level for his test may be greater than the so-called "coverage" probability at each value of the unknown (nuisance parameter) variance ratio. Only at the limiting variance ratios (of 0 and ∞) is the coverage probability equal to the significance level, being smaller for each point in the nuisance parameter space. Hence, only with an improper prior over the nuisance parameter, which agglutinates all its mass at the two endpoints of the parameter space, can the average coverage probability equal the announced significance level.

Fiducial Prediction

Prediction offers a third variety of fiducial inference supported by using fiducial probabilities in Bayes' theorem. Suppose we observe x₁ ~ N(μ, 1) and we want to predict an (independent) x₂ ~ N(μ, 1), from the same hypothetical population. Of course, there is the "direct" pivotal argument: let y = (x₂ - x₁) and y ~ N(0, 2). However, if we are to incorporate these predictions with our fiducial inferences about μ, then the following use of Bayes' theorem provides the so-called "rigorous" argument. The joint likelihood factors: p(x₁, x₂ | μ) = p(x₁ | μ) p(x₂ | μ). Bayes' theorem leads to the result

    p(x₂ | x₁) ∝ ∫ p(x₂ | μ) p(μ | x₁) dμ.

Use the fiducial probability p(μ | x₁) in this consequence of Bayes' theorem, that is, where the conditional probability p(μ | x₁) is as μ ~ N(x₁, 1). The result is in agreement with the "direct" pivotal conclusion. Fisher (1960) gives the same analysis for a more complicated case of normal prediction when both μ and σ² are unknown. That problem involves the joint fiducial posterior for (μ, σ²) given an observed sample which, then, is integrated out to yield the fiducial prediction for a second, independent sample from the same population.
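A short simulation, assuming NumPy and an invented observation x₁, confirms that the "rigorous" route through the fiducial p(μ | x₁) reproduces the "direct" pivotal answer x₂ ~ N(x₁, 2).

```python
import numpy as np
rng = np.random.default_rng(3)

x1 = 2.4   # the single observation; value assumed for illustration

# "Rigorous" route: draw mu from its fiducial distribution N(x1, 1), then
# draw the prediction x2 from N(mu, 1), as in the Bayes'-theorem formula.
draws = 500_000
mu = rng.normal(x1, 1.0, draws)
x2 = rng.normal(mu, 1.0)

# "Direct" pivotal route: y = x2 - x1 ~ N(0, 2), so x2 ~ N(x1, 2).
print(x2.mean(), x2.var())   # ~2.4 and ~2.0, matching the direct argument
```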
4. CANONICAL PIVOTALS

When may the fiducial argument be applied to a pivotal variable? When may the observed data be irrelevant to direct inference about a pivotal? If univariate fiducial probability using the c.d.f. pivotal has a Bayesian model, that is, if -∂F/∂θ coheres with Bayes' theorem, Lindley (1958) answered the question. Specifically, within a fixed exponential family, combine a fiducial probability induced by datum x₁ for unknown θ with the likelihood for independent datum x₂ given θ, using Bayes' theorem, to obtain a posterior probability p₁(θ | x₁, x₂). The probability p₁ agrees with p₂(θ | y), a direct fiducial probability for θ given y (where y is a sufficient statistic for the composite data) provided the problem admits a transformation to a location parameter.

In this section, I want to suggest a Fisherian justification for a restriction on pivotal variables that insures the minimal coherence of fiducial probability afforded by Lindley's result. Assume a one-dimensional continuous, parametric model for datum x, with density f(x | θ).

Let v = g(x, θ) be a pivotal, that is, a variable whose distribution is determinate without knowing the parameter θ. If knowledge of x is to be irrelevant to direct probability for v, and for this to induce a well-defined fiducial density on θ, then (Tukey, 1957) the following three are necessary: (i) that v has the same range for each possible value of x; (ii) that v is 1-1, with single valued inverse; and (iii) this inverse is continuous with continuous derivatives. Tukey calls a pivotal smoothly invertible when it satisfies these requirements. Smooth invertibility is equivalent to conditions that Fisher (1973, pp. 73-74 and p. 179) called for. Evidently, Example 2.2 involves a failure of clause (i), which helps to explain why the fiducial argument does not apply there.

Smooth invertibility of a pivotal does not suffice for coherence of fiducial inference, however, as illustrated by Lindley's (1958, p. 229) counterexample or Good's (1965, Appendix A). With motivation to follow in due course, define pivotal v to be canonical provided it is smoothly invertible and its distribution p(v) is the same as the conditional distribution p(x | θ*) of the observable x for some value of the unknown parameter θ = θ*.

In advance of proposing a justification for this added constraint on pivotal variables, examine the following illustrations of canonical pivotals for several of the fiducial inferences used elsewhere in this essay. In each case, Tukey's condition of smooth invertibility is easily verified.

EXAMPLE 4.1 (Normal mean). Let x ~ N(μ, 1). Define pivotal v = (x - μ), so v ~ N(0, 1). That is, v is canonical for μ* = 0.

EXAMPLE 4.2 (Exponential distribution). Let c.d.f. F(x, θ) = 1 - exp{-xθ} for positive x and θ. Define pivotal v = xθ, so v is exponential with parameter 1.

Note, too, that if μ is known the pivotal for fiducial inference about σ² is not s²/σ² since, then, s² is not a sufficient statistic of the data for σ². Instead, if μ is known, the fiducial inference uses the (sufficient) sample sum-of-squares around μ.

There are two justifications for the condition that pivotals be canonical, a Bayesian reason and a Fisherian reason. First, regarding Bayesian models for fiducial inference, in the setting considered by Lindley canonical pivotals are coherent. That is, with canonical pivotals in the exponential family, the inference problem can be transformed to a location parameter. (See Seidenfeld, 1979, Appendix 9.3.) Second, fiducial inference with canonical pivotals supplies a link between (Fisherian) tests of significance and fiducial probability.

As early as 1930, in his first attempt at fiducial inference involving the correlation of a bivariate normal population, Fisher proposed "fiducial intervals" as sets of unrejected null-hypotheses (Fisher, 1930, p. 533). Significance tests remain as popular in applied statistics as they are enigmatic to theoreticians. Nonetheless, since fiducial inference with canonical pivotals reduces to inference for a location parameter, there is a simple tie among fiducial probability, likelihood and significance levels.

Suppose, in location form, significance levels are determined by a so-called probabilistic discrepancy ranking - to use the language of H. Cramér (1946). That is, call a sample outcome o₁ more discrepant with the (null) hypothesis H_θ₀ than sample outcome o₂ if and only if p(o₂ | θ₀) > p(o₁ | θ₀).
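The canonical property in Example 4.2 can be checked by simulation (assuming NumPy): whatever the true θ, the pivotal v = xθ has the Exponential(1) law, which is the law of the observable itself at θ* = 1.

```python
import numpy as np
rng = np.random.default_rng(4)

# v = x * theta for the model F(x, theta) = 1 - exp(-x * theta) is pivotal:
# its distribution is Exponential(1) for every theta, i.e., the distribution
# of the observable x at theta* = 1, making v canonical in the text's sense.
for theta in (0.5, 1.0, 4.0):
    x = rng.exponential(1.0 / theta, 200_000)  # numpy takes scale = 1/theta
    v = x * theta
    print(theta, v.mean(), v.var())            # ~1 and ~1 for every theta
```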

EXAMPLE 4.5 (Normal, fiducial intervals). Let x ~ N(μ, 1), with canonical pivotal v = (x - μ), and so v ~ N(0, 1). Under the probabilistic discrepancy ranking for outcomes, the 0.05 significance level for hypothesis H: μ = k is (approximately) the set {x: |x - k| > 2}. Likewise, a prediction about v, using the same index of discrepancy, is p{-2 ≤ v ≤ 2} = 0.95. The fiducial prediction about v, then, is p{-2 ≤ v ≤ 2 | x} = 0.95. But this fiducial prediction yields a 0.95 "fiducial interval" of μ values coinciding with the set of values {μ: x - 2 ≤ μ ≤ x + 2} of hypotheses which are not significant at the 0.05 level (or less). As is well known, this interval also results from a likelihood ratio test. In fact, this agreement between fiducial intervals and significance tests requires, only, that "fiducial prediction" for the canonical pivotal be based on the same discrepancy index as is used to determine significance levels for the statistical hypotheses. In particular, the probabilistic discrepancy index is useful, in addition, for establishing the tie to likelihood ratio tests for location parameters.

5. NON-BAYESIAN ASPECTS OF FIDUCIAL INFERENCE AND CONCLUSION

The fiducial argument displays its non-Bayesian character through reliance on the sample space of possible observations to locate its Bayesian model. That is, the prior for fiducial inference may depend upon which component of the likelihood is used to drive the fiducial argument.

EXAMPLE 5.1 (Inconsistent fiducial inferences using Bayes' theorem). Let x ~ N(μ, 1) and, independently, let y ~ N(ν, 1), where μ = ν³. Such variety of data might arise by using different measurement techniques for the same (theoretical) unknown parameter. However, because μ and ν are not linearly related, there is no real-valued sufficient statistic for the pair (x, y) - they are minimally sufficient by themselves - and Lindley's result does not apply.

The joint likelihood factors are as follows:

    p(x, y | μ) = p(x | μ) p(y | ν(μ)),  where ν(μ) = μ^(1/3),

so there is the opportunity for using Bayes' theorem with a fiducial probability based on (either) one of these factors:

    p₁(μ | x, y) ∝ p(y | ν(μ)) p_F(μ | x)    [fiducial p_F(μ | x) from the pivotal (x - μ)]

and

    p₂(μ | x, y) ∝ p(x | μ) p_F(μ | y)    [fiducial p_F(μ | y) induced from ν ~ N(y, 1) via μ = ν³].

However, contrary to Bayes' theorem, the inverse fiducial probability, p(μ | x, y), depends upon which factor of the likelihood is used for fiducial inference. This is readily understood in terms of Jeffreys' Bayesian model for the fiducial argument. When we create p(μ | y) by fiducial reasoning, we use the canonical pivotal (y - ν) whose Bayesian model requires a uniform ("improper") prior over ν. When, instead, we create p(μ | x) by fiducial reasoning, we use the canonical pivotal (x - μ) whose Bayesian model requires a uniform ("improper") prior over μ. Because μ and ν are nonlinear transformations of the same quantity, it is impossible to have a uniform distribution simultaneously over both.

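The inconsistency in Example 5.1 can be exhibited numerically. The sketch below, assuming NumPy and SciPy and an invented data pair (x, y) = (1, 1), computes p(μ > 1 | x, y) under each of the two fiducial routes; the answers differ, although Bayes' theorem applied with any single prior would return one number.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

x_obs, y_obs = 1.0, 1.0   # assumed data for illustration

def post1(mu):            # route 1, unnormalized: p(y|nu(mu)) * fiducial N(x,1)
    return norm.pdf(y_obs, np.cbrt(mu)) * norm.pdf(mu, x_obs)

def post2_nu(nu):         # route 2 in the nu scale: p(x|nu^3) * fiducial N(y,1)
    return norm.pdf(x_obs, nu**3) * norm.pdf(nu, y_obs)

# Compare P(mu > 1) under each route (P(mu > 1) = P(nu > 1) for route 2).
z1 = quad(post1, -30, 30)[0]
z2 = quad(post2_nu, -5, 5)[0]
print(quad(post1, 1, 30)[0] / z1)     # route 1
print(quad(post2_nu, 1, 5)[0] / z2)   # route 2: a different value
```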
This fault in fiducial inference, then, is yet another version of a very old problem with Laplace's Principle of Insufficient Reason. If "ignorance" over a set of possibilities is to be represented by a uniform probability, to capture the symmetry which "ignorance" entails, then we get mutually inconsistent representations of the same state of "ignorance" merely through an equivalent reparameterization of the set of possibilities. Both μ and ν parameterize the same set of possibilities. Through the lens of Jeffreys' Bayesian model, we see where the fiducial step leads Fisher's assumption of prior ignorance into the paradoxes of Insufficient Reason. That is to say, making the data irrelevant to direct probability for the pivotal conflicts with Fisher's claim that there is no prior probability over the parameter. The contradiction is not that some "prior" is required for a Bayesian model of fiducial inference. Rather, it is using fiducial probability in Bayes' theorem that leads to a contradiction about which "prior" represents the same state of ignorance.

To be fair to Fisher, this problem is not his alone. Example 5.1 is a challenging exercise for a wide variety of (what Savage called) "necessitarian" theories: theories that try to find privileged distributions to represent "ignorance." The Example 5.1 applies to Jeffreys' (1961, Section 3.10) theory of Invariance, which uses Information theory to fix symmetries preserved in a prior. It applies to Fraser's (1968) group theoretic, Structural Inference. And it applies to Jaynes' (1983) program of Maximum Entropy.

In his 1957 paper, "The Underworld of Probability," Fisher proposes a modified fiducial argument with inequalities in place of equalities of probabilities, for example, fiducial conclusions of the form p(θ ≥ 0) > 0.5 to replace statements like p(θ ≥ 0) = 0.5. This idea relates to current research using sets of probabilities, rather than a single probability, to represent an inductive conclusion. Can ignorance be depicted by a large set of prior probabilities? Explicit connection of this approach with fiducial inference is found in A. P. Dempster's (1966) work and in H. E. Kyburg's (1961, 1974) novel theory. Perhaps it is premature to say we have seen the end of the fiducial idea?!

In the conference on fiducial probability (from which I quoted Kyburg's remarks to begin this paper), Savage (1963) wrote:

    The aim of fiducial probability . . . seems to be what I term "making the Bayesian omelette without breaking the Bayesian eggs." (p. 926)

In that sense, fiducial probability is impossible. You cannot reduce inverse to direct inference. As with many great intellectual contributions, what is of lasting value is what we learn trying to understand Fisher's insights on fiducial probability. His solution to the Behrens-Fisher problem, for example, was a brilliant treatment of nuisance parameters using Bayes' theorem. His instincts about "recognizable subpopulations" led to work on so-called relevant reference sets, a subject of continuing research even from the "orthodox" point of view. And, exploration of multiparameter fiducial inference helped to expose puzzles of improper priors - an area still ripe with controversy (Seidenfeld, 1982). In this sense, "the fiducial argument is 'learning from Fisher'" (Savage, 1963, p. 926). So interpreted, it certainly remains a valuable addition to the statistical lore.

ACKNOWLEDGMENTS

This paper is adapted from an invited address to a commemorative session of the AAAS held February 16, 1990 in New Orleans, Louisiana. I am indebted to my friend and teacher Isaac Levi for many lively discussions about Fisher's work. Levi (1980) provides insightful views on fiducial probability and related themes about direct inference. J. Kadane's advice has improved this written version of my address.

REFERENCES

BUEHLER, R. J. and FEDDERSEN, A. P. (1963). Note on a conditional property of Student's t. Ann. Math. Statist. 34 1098-1100.
CRAMÉR, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press.
DEMPSTER, A. P. (1963). On direct probabilities. J. Roy. Statist. Soc. Ser. B 25 100-110.
DEMPSTER, A. P. (1966). New methods for reasoning towards posterior distributions based on sample data. Ann. Math. Statist. 37 355-374.
DUBINS, L. (1975). Finitely additive conditional probabilities, conglomerability and disintegrations. Ann. Probab. 3 89-99.
FISHER, R. A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society 26 528-535.
FISHER, R. A. (1957). The underworld of probability. Sankhyā 18 201-210.
FISHER, R. A. (1960). On some extensions of Bayesian inference proposed by Mr. Lindley. J. Roy. Statist. Soc. Ser. B 22 299-301.
FISHER, R. A. (1973). Statistical Methods and Scientific Inference, 3rd ed. Hafner, New York.
FRASER, D. A. S. (1961). The fiducial method and invariance. Biometrika 48 261-280.
FRASER, D. A. S. (1968). The Structure of Inference. Wiley, New York.
GOOD, I. J. (1965). The Estimation of Probabilities. MIT Press.
GOOD, I. J. (1971). Twenty seven principles of rationality. In Foundations of Statistical Inference (V. P. Godambe and D. A. Sprott, eds.) 124-127. Holt, Rinehart and Winston, Toronto.
HACKING, I. (1965). Logic of Statistical Inference. Cambridge Univ. Press.
HEATH, D. and SUDDERTH, W. (1978). On finitely additive priors, coherence, and extended admissibility. Ann. Statist. 6 333-345.
JAYNES, E. T. (1983). Papers on Probability, Statistics and Statistical Physics (R. Rosenkrantz, ed.). Reidel, Dordrecht.
JEFFREYS, H. (1961). Theory of Probability, 3rd ed. Oxford Univ. Press.
KADANE, J. B., SCHERVISH, M. J. and SEIDENFELD, T. (1986). Statistical implications of finitely additive probability. In Bayesian Inference and Decision Techniques (P. Goel and A. Zellner, eds.) 59-76. Elsevier, New York.
KYBURG, H. E. (1961). Probability and the Logic of Rational Belief. Wesleyan Univ. Press.
KYBURG, H. E. (1963). Logical and fiducial probability. Bull. Inst. Internat. Statist. 40 884-901, 938-939.
KYBURG, H. E. (1974). Logical Foundations of Statistical Inference. Reidel, Boston.
LEVI, I. (1980). The Enterprise of Knowledge. MIT Press.
LINDLEY, D. V. (1958). Fiducial distributions and Bayes' theorem. J. Roy. Statist. Soc. Ser. B 20 102-107.
LINDLEY, D. V. (1969). Introduction to Probability and Statistics from a Bayesian Viewpoint. Cambridge Univ. Press.
LINNIK, Y. V. (1963). On the Behrens-Fisher problem. Bull. Inst. Internat. Statist. 40 833-841.
RÉNYI, A. (1955). On a new axiomatic theory of probability. Acta Math. Hungar. 6 285-335.
SAVAGE, L. J. (1963). Discussion. Bull. Inst. Internat. Statist. 40 925-927.
SCHERVISH, M. J., SEIDENFELD, T. and KADANE, J. B. (1984). The extent of non-conglomerability of finitely additive probabilities. Z. Wahrsch. Verw. Gebiete 66 205-226.
SEIDENFELD, T. (1979). Philosophical Problems of Statistical Inference. Reidel, Dordrecht.
SEIDENFELD, T. (1982). Paradoxes of conglomerability and fiducial inference. In Proceedings of the 6th International Congress on Logic, Methodology and Philosophy of Science (J. Łoś and H. Pfeiffer, eds.) 395-412. North-Holland, Amsterdam.
SEIDENFELD, T. (1992). R. A. Fisher on the design of experiments and statistical estimation. In The Founders of Evolutionary Genetics (S. Sarkar, ed.). Kluwer, Dordrecht. In press.
TUKEY, J. W. (1957). Some examples with fiducial relevance. Ann. Math. Statist. 28 687-695.
