arXiv:1302.7021v3 [stat.ME] 3 Nov 2014 nteBrbu ruetfrthe Mayo G. Principle for Deborah Likelihood Argument Strong Birnbaum the On 04 o.2,N.2 227–239 2, DOI: No. 29, Vol. 2014, Science Statistical and 10.1214/14-STS472 [email protected] ilasHl,Bakbr,Vrii 46,UAe-mail: USA 24061, Virginia Blacksburg, Major Hall, 235 Tech, Williams Virginia Philosophy, of Department

eoa .My sPoesro Philosophy, of Professor is Mayo G. Deborah c ttsia Science article Statistical original the the of by reprint published electronic an is This ern iesfo h rgnli aiainand pagination in detail. original typographic the from differs reprint nttt fMteaia Mathematical of Institute 1 icse in Discussed 10.1214/14-STS475 10.1214/13-STS457 . 10.1214/14-STS470 , nttt fMteaia Statistics Mathematical of Institute , 10.1214/14-STS473 u fti ae.I atclr foutcomes if (SLP particular, principle In likelihood paper. strong this the of as cus known result principle feature This a distribution. of relevant the is unitntos uhas such notions, quentist Abstract. frlvn eeiin.TeWPsy htoc ti nw wh known is it once that says WCP restrictin The justify repetitions. relevant to of (WCP) Principle Conditionality Weak iments rbblt models probability o all for neec about inference ocnie xeiet te hnteoepromdt prod to requ performed not one [ does Cox the this David than argue data. other we experiments observed, consider one to the than other comes rdcdtemaueet h sesetsol ei terms in be should of assessment properties the measurement, the produced n tog,smln hoy ucec,wa conditional weak sufficiency, theory, sampling strong), Suc and phrases: SP. and and words WCP ho Key the SLP]. show entails both we [WCP holding refute SLP], while also entails SLP cases the SP) violate and may Alt [(WCP data argument. i that Birnbaum’s article purports of this critique argument of and would so goal clarification The and this new distributions. mixtures, But a sampling of (SP). of case sufficiency use the the as in principle WCP a the controversial applying from follow to Assoc. Statist. 04 o.2,N.2 227–239 2, No. 29, Vol. 2014, eone at rejoinder ; θ E outcomes , 2014 , 1 nesnilcmoeto neec ae nfmla fre- familiar on based inference of component essential An and , E 10.1214/14-STS471 10.1214/14-STS482 i 57 h upiiguso fAlnBrbu’ [ Birnbaum’s Allan of upshot surprising The . θ E , lhuhsc iltosse rmcnieigout- considering from stem violations such Although . 10.1214/14-STS474 2 n.Mt.Statist. Math. Ann. 16)2936 rueti htteSPappears SLP the that is argument 269–306] (1962) f bt ihukonparameter unknown with (both x 1 ∗ ( · ) and f , 2 inamzto,lklho rnil (weak principle likelihood Birnbaumization, p ( This . vle,sgicneadcndnelevels, confidence and significance -values, y · ,te vnthough even then ), ∗ , . a aedffrn mlctosfran for implications different have may in 1 L olw rmpicpe htfeunitsam- frequentist that accept: theorists principles the but pling that from SLP, show follows the to SLP violate purports argument on Birnbaum so the and levels significance ain fsaitc.Ntol oalo h famil- the of notions, all error-probability do only frequentist foun- Not iar the statistics. for of result controversial, dations if held significant, been a long as has (SLP) principle likelihood strong 29 ti ayt e h inamsagmn o the for argument Birnbaum’s why see to easy is It 15)3732 rpssthe proposes 357–372] (1958) hoyadpatc,ntbyteNeyman– statistical the notably modern practice, of and theory body incompatible main is the with principle likelihood The x ∗ 1 f and 1 .INTRODUCTION 1. ( x θ ∗ aedifferent have ) y ; θ ∗ nviolations in s = ) rmexper- from oprovide to s h space the g ity. og his hough cf ,tefo- the ), .Amer. J. preclude 2 c the uce ( ich fthe of r us ire y ∗ un- ; E θ w h ) i p -values, 2 D. G. MAYO

Pearson theory of hypothesis testing and While conditioning on the instrument actually of confidence intervals, and incompatible used seems obviously correct, nothing precludes the in general even with such well-known con- Neyman–Pearson theory from choosing the proce- cepts as standard error of an estimate and dure “which is best on the average over both ex- significance level. [Birnbaum (1968), page periments” in Emix [Lehmann and Romano (2005), 300.] page 394]. They ask the following: “for a given test The incompatibility, in a nutshell, is that on the or confidence procedure, should probabilities such SLP, once the data x are given, outcomes other than as level, power, and confidence coefficient be calcu- x are irrelevant to the evidential import of x. “[I]t lated conditionally, given the experiment that has is clear that reporting significance levels violates the been selected, or unconditionally?” They suggest LP [SLP], since significance levels involve averaging that “[t]he answer cannot be found within the model over sample points other than just the observed x.” but depends on the context” (ibid). The WCP gives [Berger and Wolpert (1988), page 105.] a rationale for using the conditional appraisal in the context of informative parametric inference. 1.1 The SLP and a Frequentist Principle of Evidence (FEV) 1.2 What Must Logically Be Shown Birnbaum, while responsible for this famous ar- However, the upshot of the SLP is to claim that gument, rejected the SLP because “the likelihood the sampling theorist must go all the way, as it were, concept cannot be construed so as to allow useful given a parametric model. If she restricts attention appraisal, and thereby possible control, of probabili- to the experiment producing the data in the mix- ties of erroneous interpretations” [Birnbaum (1969), ture experiment, then she is led to consider just the page 128]. That is, he thought the SLP at odds with data and not the sample space, once the data are in a fundamental frequentist principle of evidence. hand. While the argument has been stated in various Frequentist Principle of Evidence (general): Draw- forms, the surprising upshot of all versions is that ing inferences from data requires considering the rel- the SLP appears to follow from applying the WCP evant error probabilities associated with the under- in the case of mixture experiments, and so uncon- lying data generating process. troversial a notion as sufficiency (SP). “Within the David Cox intended the central principle invoked context of what can be called classical frequency- in Birnbaum’s argument, the Weak Conditionality based statistical inference, Birnbaum (1962) argued Principle (WCP), as one important way to justify that the conditionality and sufficiency principles im- restricting the space of repetitions that are rele- ply the [strong] likelihood principle” [Evans, Fraser vant for informative inference. Implicit in this goal and Monette (1986), page 182]. is that the role of the sampling distribution for Since the challenge is for a sampling theorist who informative inference is not merely to ensure low holds the WCP, it is obligatory to consider whether error rates in repeated applications of a method, and how such a sampling theorist can meet it. While but to avoid misleading inferences in the case at the WCP is not itself a theorem in a formal system, hand [Mayo (1996); Mayo and Spanos (2006, 2011); Mayo and Cox (2010)]. Birnbaum’s argument purports that the following is To refer to the most familiar example, the WCP a theorem: says that if a parameter of interest θ could be mea- [(WCP and SP) entails SLP]. sured by two instruments, one more precise then the other, and a randomizer that is utterly irrel- If true, any data instantiating both WCP and SP evant to θ is used to decide which instrument to could not also violate the SLP, on pain of logical use, then, once it is known which experiment was contradiction. We will show how data may violate run and its outcome given, the inference should be the SLP while still adhering to both the WCP and assessed using the behavior of the instrument actu- SP. Such cases also refute [WCP entails SLP], mak- ally used. The convex combination of the two in- ing our argument applicable to attempts to weaken struments, linked via the randomizer, defines a mix- or remove the SP. Violating SLP may be written as ture experiment, Emix. According to the WCP, one not-SLP. should condition on the known experiment, even if We follow the formulations of the Birnbaum argu- an unconditional assessment improves the long-run ment given in Berger and Wolpert (1988), Birnbaum performance [Cox and Hinkley (1974), pages 96–97]. (1962), Casella and Berger (2002) and Cox (1977). ON THE BIRNBAUM ARGUMENT FOR THE STRONG LIKELIHOOD PRINCIPLE 3

The current analysis clarifies and fills in important happen with subjective Bayesianism. . . . the objec- gaps of an earlier discussion in Mayo (2010), Mayo tive Bayesian responds that objectivity can only and Cox (2011), and lets us cut through a fascinat- be defined relative to a frame of reference, and ing and complex literature. The puzzle is solved by this frame needs to include the goal of the analy- adequately stating the WCP and keeping the mean- sis.” [Berger (2006), page 394.] By contrast, Savage ing of terms consistent, as they must be in an argu- stressed: ment built on a series of identities. According to Bayes’s theorem, P (x θ)... | 1.3 Does It Matter? constitutes the entire evidence of the ex- periment . . . [I]f y is the datum of some On the face of it, current day uses of sampling other experiment, and if it happens that theory statistics do not seem in need of going back P (x θ) and P (y θ) are proportional func- 50 years to tackle a foundational argument. This tions| of θ (that| is, constant multiples of may be so, but only if it is correct to assume that each other), then each of the two data the Birnbaum argument is flawed somewhere. Sam- x and y have exactly the same thing to pling theorists who feel unconvinced by some of the say about the value of θ. [Savage (1962a), machinations of the argument must admit some dis- page 17, using θ for his λ and P for P r.] comfort at the lack of resolution of the paradox. If one cannot show the relevance of error probabilities 2. NOTATION AND SKETCH OF and sampling distributions to inferences once the BIRNBAUM’S ARGUMENT data are in hand, then the uses of frequentist sam- pling theory, and resampling methods, for inference 2.1 Points of Notation and Interpretation purposes rest on shaky foundations. Birnbaum focuses on informative inference about The SLP is deemed of sufficient importance to be a parameter θ in a given model M, and we retain included in textbooks on statistics, along with a ver- that context. The argument calls for a general term sion of Birnbaum’s argument that we will consider: to abbreviate: the inference implication from exper- It is not uncommon to see statistics texts iment E and result z, where E is an experiment argue that in frequentist theory one is involving the observation of Z with a given distri- faced with the following dilemma: either bution f(z; θ) and a model M. We use the following: to deny the appropriateness of condition- InfrE[z]: the parametric statistical infer- ing on the precision of the tool chosen ence from a given or known (E, z). by the toss of a coin, or else to embrace the strong likelihood principle, which en- (We prefer “given” to “known” to avoid refer- tails that frequentist sampling distribu- ence to psychology.) We assume relevant features of tions are irrelevant to inference once the model M are embedded in the full statement of ex- data are obtained. This is a false dilemma. periment E. An inference method indicates how to . . . The ‘dilemma’ argument is therefore compute the informative parametric inference from an illusion. [Cox and Mayo (2010), page (E, z). Let 298.] (E, z) InfrE[z]: an informative paramet- ric inference⇒ about θ from given (E, z) is If we are correct, this refutes a position that is Infr generally presented as settled in current texts. But to be computed by means of E[z]. the illusion is not so easy to dispel, thus this paper. The principles of interest turn on cases where (E, z) Perhaps, too, our discussion will illuminate a point is given, and we reserve “ ” for such cases. The ab- ⇒ of agreement between sampling theorists and con- breviation InfrE[z], first developed in Cox and Mayo temporary nonsubjective Bayesians who concede (2010), could allude to any parametric inference ac- they “have to live with some violations of the likeli- count; we use it here to allow ready identification hood and stopping rule principles” [Ghosh, Delam- of the particular experiment E and its associated pady and Sumanta (2006), page 148], since their sampling distribution, whatever it happens to be. Infr prior probability distributions are influenced by the Emix (z) is always understood as using the convex sampling distribution. “This, of course, does not combination over the elements of the mixture. 4 D. G. MAYO

Assertions about how inference “is to be com- 2.3 Sufficiency Principle (Weak Likelihood puted given (E, z)” are intended to reflect the prin- Principle) ciples of evidence that arise in Birnbaum’s argu- ment, whether mathematical or based on intuitive, For informative inference about θ in E, if TE is a philosophical considerations about evidence. This is (minimal) sufficient statistic for E, the Sufficiency important because Birnbaum emphasizes that the Principle asserts the following: WCP is “not necessary on mathematical grounds SP: If TE(z1)= TE(z2), then InfrE[z1]= alone, but it seems to be supported compellingly InfrE[z2]. by considerations . . . concerning the nature of evi- dential meaning” of data when drawing parametric That is, since inference within the model is to be computed using the value of TE( ) and its sampling statistical inferences [Birnbaum (1962), page 280]. · In using “=” we follow the common notation even distribution, identical values of TE have identical though WCP is actually telling us when z1 and z2 inference implications, within the stipulated model. should be deemed inferentially equivalent for the as- Nothing in our argument will turn on the minimality sociated inference. requirement, although it is common. Infr By noncontradiction, for any (E, z), E[z] = 2.3.1 Model checking. An essential part of the InfrE[z]. So to apply a given inference implication statements of the principles SP, WCP and SLP is means its inference directive is used and not some that the validity of the model is granted as ade- competing directive at the same time. Two outcomes quately representing the experimental conditions at z1 and z2 will be said to have the same inference im- hand [Birnbaum (1962), page 280]. Thus, accounts plications in E, and so are inferentially equivalent that adhere to the SLP are not thereby prevented within E, whenever InfrE[z1]= InfrE[z2]. from analyzing features of the data, such as residu- 2.2 The Strong Likelihood Principle: SLP als, in checking the validity of the The principle under dispute, the SLP, asserts the itself. There is some ambiguity on this point in inferential equivalence of outcomes from distinct ex- Casella and Berger (2002): periments E1 and E2. It is a universal if-then claim: Most model checking is, necessarily, based SLP: For any two experiments E1 and on statistics other than a sufficient statis- E2 with different probability models f1( ), tic. For example, it is common practice to · f2( ) but with the same unknown parame- examine residuals from a model. . . Such a · ∗ ∗ ter θ, if outcomes x and y (from E1 and practice immediately violates the Suffi- E2, resp.) give rise to proportional likeli- ciency Principle, since the residuals are ∗ ∗ hood functions (f1(x ; θ)= cf2(y ; θ) for not based on sufficient statistics. (Of all θ, for c a positive constant), then x∗ course such a practice directly violates and y∗ should be inferentially equivalent the [strong] LP also.) [Casella and Berger for any inference concerning parameter θ. (2002), pages 295–296.] A shorthand for the entire antecedent is that They warn that before considering the SLP and ∗ ∗ ∗ (E1, x ) is an SLP pair with (E2, y ), or just x WCP, “we must be comfortable with the model” ∗ and y form an SLP pair (from E1, E2 ). Assum- [ibid, page 296]. It seems to us more accurate to re- { } ing all the SLP stipulations, for example, that θ is gard the principles as inapplicable, rather than vi- a shared parameter (about which inferences are to olated, when the adequacy of the relevant model is be concerned), we have the following: lacking. Applying a principle will always be relative ∗ ∗ SLP: If (E1, x ) and (E2, y ) form an SLP to the associated experimental model. ∗ ∗ InfrE InfrE pair, then 1 [x ]= 2 [y ]. 2.3.2 Can two become one? The SP is sometimes Experimental pairs E1 and E2 involve observing called the weak likelihood principle, limited as it is random variables X and Y, respectively. Thus, (E2, to a single experiment E, with its sampling distri- ∗ ∗ ∗ y ) or just y asserts “E2 is performed and y ob- bution. This suggests that if an arbitrary SLP pair, ∗ ∗ ∗ Infr 1 2 served,” so we may abbreviate E2 [(E2, y )] as (E , x ) and (E , y ), could be viewed as resulting Infr ∗ ∗ E2 [y ]. Likewise for x . A generic z is used when from a single experiment (e.g., by a mixture), then needed. perhaps they could become inferentially equivalent ON THE BIRNBAUM ARGUMENT FOR THE STRONG LIKELIHOOD PRINCIPLE 5 using SP. This will be part of Birnbaum’s argument, But Birnbaum does not stop there. Having con- but is neatly embedded in his larger gambit to which structed the hypothetical experiment EB , we are to we now turn. use the WCP to condition back down to the known 2.4 Birnbaumization: Key Gambit in Birnbaum’s experiment E2. But this will not produce the SLP Argument as we now show. The larger gambit of Birnbaum’s argument may 2.5 Why Appeal to Hypothetical Mixtures? be dubbed Birnbaumization. An experiment has ∗ Before turning to that, we address a possible been run, label it as E2, and y observed. Suppose, query: why suppose the argument makes any appeal for the parametric inference at hand, that y∗ has an ∗ to a hypothetical mixture? (See also Section 5.1.) SLP pair x in a distinct experiment E1. Birnbaum’s The reason is this: The SLP does not refer to mix- task is to show the two are evidentially equivalent, tures. It is a universal generalization claiming to as the SLP requires. hold for an arbitrary SLP pair. But we have no We are to imagine that performing E2 was the objection to imagining [as Birnbaum does (1962), result of flipping a fair coin (or some other random- page 284] a universe of all of the possible SLP pairs, izer given as irrelevant to θ) to decide whether to where each pair has resulted from a θ-irrelevant ran- run E1 or E2. Cox terms this the “enlarged exper- ∗ domizer (for the given context). Then, when y is iment” [Cox (1978), page 54], EB. We are then to ∗ observed, we pluck the relevant pair and construct define a statistic TB that stipulates that if (E2, y ) ∗ TB. Our question is this: why should the inference is observed, its SLP pair x in the unperformed ex- ∗ implication from y be obtained by reference to periment is reported; ∗ InfrE [y ], the convex combination? Birnbaum does ∗ ∗ ∗ B (E1, x ), if (E1, x ) or (E2, y ), not stop at [B], but appeals to the WCP. Note the TB(Ei, Zi)= (Ei, zi), otherwise. WCP is based on the outcome y∗ being given. Birnbaum’s argument focuses on the first case and ours will as well. 3. SLP VIOLATION PAIRS Following our simplifying notation, whenever E2 ∗ ∗ Birnbaum’s argument is of central interest when is performed and Y = y observed, and y is seen to we have SLP violations. We may characterize an admit an SLP pair, then label its particular SLP pair ∗ SLP violation as any inferential context where the (E1, x ). Any problems of nonuniqueness in identi- antecedent of the SLP is true and the consequent is fying SLP pairs are put to one side, and Birnbaum ∗ false: does not consider them. Thus, when (E2, y ) is ob- ∗ ∗ ∗ served, TB reports it as (E1, x ). This yields the SLP violation: (E1, x ) and (E2, y ) form ∗ ∗ Birnbaum experiment, EB , with its statistic TB. We an SLP pair, but InfrE [x ] = InfrE [y ]. 1 6 2 abbreviate the inference (about θ) in EB as An SLP pair that violates the SLP will be called Infr ∗ EB [y ]. an SLP violation pair (from E1, E2, resp.). ∗ The inference implication (about θ) in EB from y It is not always emphasized that whether (and under Birnbaumization is how) an inference method violates the SLP depends ∗ ∗ on the type of inference to be made, even within (E2, y ) InfrE [x ], ⇒ B an account that allows SLP violations. One cannot where the computation in EB is always a convex just look at the data, but must also consider the combination over E1 and E2. But also, inference. For example, there may be no SLP viola- ∗ ∗ tion if the focus is on point against point hypotheses, (E1, x ) InfrE [x ]. B whereas in computing a statistical significance prob- ⇒ ∗ ∗ It follows that, within EB , x and y are inferen- ability under a null hypothesis there may be. “Signif- tially equivalent. Call this claim icance testing of a hypothesis. . . is viewed by many ∗ ∗ as a crucial element of statistics, yet it provides a [B] : InfrEB [x ]= InfrEB [y ]. startling and practically serious example of conflict The argument is to hold for any SLP pair. Now with the [SLP].” [Berger and Wolpert (1988), pages [B] does not yet reach the SLP which requires 104–105.] The following is a dramatic example that Infr ∗ Infr ∗ E1 [x ]= E2 [y ]. often arises in this context. 6 D. G. MAYO

3.1 Fixed versus Sequential Sampling For the sampling theorist, to report a 1.96 stan- dard deviation difference known to have come from Suppose X and Y are samples from distinct ex- 2 optional stopping, just the same as if the sample size periments E1 and E2, both distributed as N(θ,σ ), had been fixed, is to discard relevant information with σ2 identical and known, and p-values are to be for inferring inconsistency with the null, while “ac- calculated for the null hypothesis H0: θ = 0 against cording to any approach that is in accord with the H1: θ = 0. 6 strong likelihood principle, the fact that this partic- In E2 the sampling rule is to continue sampling 1 n ular stopping rule has been used is irrelevant.” [Cox until yn > cα = 1.96σ/√n, where yn = n i=1 yi. In and Hinkley (1974), page 51.]3 The actual p-value E1, the sample size n is fixed and α = 0.P05. will depend of course on when it stops. We empha- In order to arrive at the SLP pair, we have to size that our argument does not turn on accepting consider the particular outcome observed. Suppose a frequentist principle of evidence (FEV), but these that E2 is run and is first able to stop with n = 169 ∗ considerations are useful both to motivate and un- trials. Denote this result as y . A choice for its SLP derstand the core principle of Birnbaum’s argument, ∗ √ pair x would be (E1, 1.96σ/ 169), and the SLP the WCP. violation is the fact that the p-values associated with ∗ ∗ x and y differ. 4. THE WEAK CONDITIONALITY PRINCIPLE 3.2 Frequentist Evidence in the Case of (WCP) Significance Tests ∗ ∗ From Section 2.4 we have [B] InfrEB [x ]= InfrEB [y ] “[S]topping ‘when the data looks good’ since the inference implication is by the constructed TB. How might Birnbaum move from [B] to the SLP, can be a serious error when combined ∗ ∗ with frequentist measures of evidence. For for an arbitrary pair x and y ? instance, if one used the stopping rule There are two possibilities. One would be to insist [above]. . . but analyzed the data as if a informative inference ignore or be insensitive to sam- fixed sample had been taken, one could pling distributions. But since we know that SLP vi- guarantee arbitrarily strong frequentist olations result because of the difference in sampling distributions, to simply deny them would obviously ‘significance’ against H0 . . . .” [Berger and Wolpert (1988), page 77.] render his argument circular (or else irrelevant for sampling theory). We assume Birnbaum does not From their perspective, the problem is with the intend his argument to be circular and Birnbaum use of frequentist significance. For a detailed discus- relies on further steps to which we now turn. sion in favor of the irrelevance of this stopping rule, 4.1 Mixture (Emix): Two Instruments of see Berger and Wolpert (1988), pages 74–88. For Different Precisions [Cox (1958)] sampling theorists, by contrast, this example “taken in the context of examining consistency with θ = 0, The crucial principle of inference on which Birn- is enough to refute the strong likelihood principle” baum’s argument rests is the weak conditionality [Cox (1978), page 54], since, with probability 1, it principle (WCP), intended to indicate the relevant will stop with a ‘nominally’ significant result even sampling distribution in the case of certain mix- though θ = 0. It contradicts what Cox and Hinkley ture experiments. The famous example to which we call “the weak repeated sampling principle” [Cox already alluded, “is now usually called the ‘weigh- and Hinkley (1974), page 51]. More generally, the ing machine example,’ which draws attention to the frequentist principle of evidence (FEV) would re- need for conditioning, at least in certain types of gard small p-values as misleading if they result from problems” [Reid (1992), page 582]. a procedure that readily generates small p-values We flip a fair coin to decide which of two instru- 2 under H0. ments, E1 or E2, to use in observing a Normally dis- tributed random sample Z to make inferences about 2Mayo and Cox (2010), page 254: 3 FEV: y is (strong) evidence against H0, if and only if, were Analogous situations occur without optional stopping, as H0 a correct description of the mechanism generating y, then, with selecting a data-dependent, maximally likely, alternative with high probability this would have resulted in a less dis- [Cox and Hinkley (1974), Example 2.4.1, page 51]. See also cordant result than is exemplified by y. Mayo and Kruse (2001). ON THE BIRNBAUM ARGUMENT FOR THE STRONG LIKELIHOOD PRINCIPLE 7

6 mean θ. E1 has variance of 1, while that of E2 is 10 . There are three sampling distributions, and the We limit ourselves to mixtures of two experiments. WCP says the relevant one to use whenever Infr Infr In testing a null hypothesis such as θ = 0, the same Emix [zi] = Ei [zi] is the one known to have gen- z measurement would correspond to a much smaller erated the6 result [Birnbaum (1962), page 280]. In p-value were it to have come from E1 rather than other cases the WCP would make no difference. from E2: denote them as p1(z) and p2(z), respec- tively. The overall (or unconditional) significance 4.3 The WCP and Its Corollaries level of the mixture Emix is the convex combina- We can give a general statement of the WCP as tion of the p-values: [p1(z)+ p2(z)]/2. This would follows: give a misleading report of how precise or stringent A mixture Emix selects between E1 and E2, using the actual experimental measurement is [Cox and a θ-irrelevant process, and it is given that (Ei, zi) Mayo (2010), page 296]. [See Example 4.6, Cox and results, i = 1, 2. WCP directs the inference implica- Hinkley (1974), pages 95–96; Birnbaum (1962), page tion. Knowing we are mapping an outcome from a 280.] mixture, there is no need to repeat the first com- Suppose that we know we have observed a mea- ponent of (Emix, zi), so it is dropped except when a surement from E2 with its much larger variance: reminder seems useful: The unconditional test says that we can assign this a higher level of significance (i) Condition to obtain relevance: than we ordinarily do, because if we were (Emix, zi) InfrEi [(Emix, zi)] = InfrEi (zi). to repeat the experiment, we might sam- ⇒ ple some quite different distribution. But In words, zi arose from Emix but the inference this fact seems irrelevant to the interpre- implication is based on Ei. tation of an observation which we know (ii) Eschew unconditional formulations: came from a distribution [with the larger (Emix, zi) ; InfrE [zi], variance]. [Cox (1958), page 361.] mix The WCP says simply: once it is known which whenever the unconditional treatment yields a Ei has produced z, the p-value or other inferential different inference implication, assessment should be made with reference to the ex- that is, whenever InfrE [zi] = InfrE [zi]. periment actually run. mix 6 i Note. InfrE [zi] which abbreviates 4.2 Weak Conditionality Principle (WCP) in the mix InfrE mix i Weighing Machine Example mix [(E , z )] asserts that the inference impli- cation uses the convex combination of the relevant We first state the WCP in relation to this exam- pair of experiments. ple. We are given (Emix, zi), that is, (Ei, zi) results We now highlight some points for reference. from mixture experiment Emix. WCP exhorts us to 4.3.1 WCP makes a difference. The cases of in- condition to be relevant to the experiment actually terest here are where applying WCP would alter producing the outcome. This is an example of what the unconditional implication. In these cases WCP Cox terms “conditioning for relevance.” makes a difference. WCP: Given (Emix, zi), condition on the Note that (ii) blocks computing the inference im- Ei producing the result Infr plication from (Emix, zi) as Emix [zi], whenever InfrE i InfrE i (Emix, zi) InfrE [(Emix, zi)] mix [z ] = i [z ] for i = 1, 2. Here E1, E2 and ⇒ i 6 Infr Emix would correspond to three sampling distribu- = pi(z)= Ei [zi]. tions. Do not use the unconditional formulation WCP requires the experiment and its outcome to ; Infr be given or known: If it is given only that z came (Emix, zi) Emix [(Emix, zi)] from E1 or E2, and not which, then WCP does not = [p1(z)+ p2(z)]/2. authorize (i). In fact, we would wish to block such The concern is that an inference implication. For instance,

InfrE mix i 1 2 i mix [(E , z )] = [p (z)+ p (z)]/2 = p (z). (E1 or E2, z) ; InfrE [z]. 6 1 8 D. G. MAYO

Point on notation: The use of “ ” is for a given WCP involves an inequivalence as well as an equiv- outcome. We may allow it to be used⇒ without am- alence. The WCP prescribes conditioning on the ex- biguity when only a disjunction is given, because periment known to have produced the data, and while E1 entails (E1 or E2), the converse does not not the other way around. It is their inequivalence hold. So no erroneous substitution into an inference that gives Cox’s WCP its normative proscriptive Infr implication would follow. force. To assume the WCP identifies Emix [zi] and InfrE [zi] leads to trouble. (We return to this in Sec- 4.3.2 Irrelevant augmentation: Keep irrelevant i tion 7.) facts irrelevant (Irrel). Another way to view the However, there is an equivalence in WCP (i). Fur- WCP is to see it as exhorting us to keep what is ir- ther, once the outcome is given, the addition of θ- relevant to the sampling behavior of the experiment irrelevant features about the selection of the exper- performed irrelevant (to the inference implication). iment performed are to remain irrelevant to the in- Consider Birnbaum’s (1969), page 119, idea that a ference implication: “trivial” but harmless addition to any given exper- imental result z might be to toss a fair coin and InfrEi [(Emix, zi)] = InfrEi [(Ei & Irrel, zi)]. augment z with a report of heads or tails (where Both are the same as InfrE [zi]. While claiming that this is irrelevant to the original model). Note the i z came from a mixture, even knowing it came from similarity to attempts to get an exact significance a nonmixture, may seem unsettling, we grant it for level in discrete tests, by allowing borderline out- purposes of making out Birnbaum’s argument. By comes to be declared significant or not (at the given (Irrel), it cannot alter the inference implication un- level) according to the outcome of a coin toss. The der Ei. WCP, of course, eschews this. But there is a crucial ambiguity to avoid. It is a harmless addition only 5. BIRNBAUM’S SLP ARGUMENT if it remains harmless to the inference implication. If it is allowed to alter the test result, it is scarcely 5.1 Birnbaumization and the WCP harmless. What does the WCP entail as regards Birnbau- A holder of the WCP may stipulate that a given mization? Now WCP refers to mixtures, but is the zi can always be augmented with the result of a θ- Birnbaum experiment EB a mixture experiment? irrelevant randomizer, provided that it remains ir- Not really. One cannot perform the following: Toss relevant to the inference implication about θ in Ei. a fair coin (or other θ-irrelevant randomizer). If it We can abbreviate this irrelevant augmentation of a lands heads, perform an experiment E2 that yields given result zi as a conjunction: (Ei & Irrel), ∗ a member of an SLP pair y ; if tails, observe an ex-

(Irrel): InfrEi [(Ei & Irrel, zi)] = InfrEi [zi], periment that yields the other member of the SLP i = 1, 2. pair x∗. We do not know what outcome would have resulted from the unperformed experiment, much We illuminate this in the next subsection. less that it would be an outcome with a propor- 4.3.3 Is the WCP an equivalence? “It was the tional likelihood to the observed y∗. There is a sin- adoption of an unqualified equivalence formulation gle experiment, and it is stipulated we know which of conditionality, and related concepts, which led, and what its outcome was. Some have described in my 1962 paper, to the monster of the likelihood the Birnbaum experiment as unperformable, or at axiom” [Birnbaum (1975), page 263]. He admits the most a “mathematical mixture” rather than an “ex- contrast with “the one-sided form to which applica- perimental mixture” [Kalbfleisch (1975), pages 252– tions” had been restricted [Birnbaum (1969), page 253]. Birnbaum himself calls it a “hypothetical” 139, note 11]. The question of whether the WCP is mixture [Birnbaum (1962), page 284]. a proper equivalence relation, holding in both direc- While a holder of the WCP may simply deny its tions, is one of the most central issues in the argu- general applicability in hypothetical experiments, ment. But what would be alleged to be equivalent? given that Birnbaum’s argument has stood for over Obviously not the unconditional and the condi- fifty years, we wish to give it maximal mileage. Birn- tional inference implications: the WCP makes a dif- baumization may be “performed” in the sense that ∗ ∗ ference just when they are inequivalent, that is, TB can be defined for any SLP pair x , y . Refer when InfrE [zi] = InfrE [zi]. Our answer is that the back to the hypothetical universe of SLP pairs, each mix 6 i ON THE BIRNBAUM ARGUMENT FOR THE STRONG LIKELIHOOD PRINCIPLE 9 imagined to have been generated from a θ-irrelevant 5.3 Refuting the Supposition that [(SP and ∗ mixture (Section 2.5). When we observe y we pluck WCP) entails SLP] the x∗ companion needed for the argument. In short, we can Birnbaumize an experimental result: Con- We can uphold both (1) and (2), while at the same time holding the following: structing statistic TB with the derived experiment EB is the “performance.” But what cannot shift in Infr ∗ Infr ∗ (3) E1 [x ] = E2 [y ]. i 6 the argument is the stipulation that E be given or ∗ ∗ known (as noted in Section 4.3.1), that i be fixed. Specifically, any case where x and y is an SLP Nor can the meaning of “given z∗” shift through the violation pair is a case where (3) is true. Since when- argument, if it is to be sound. ever (3) holds we have a counterexample to the Given z∗, the WCP precludes Birnbaumizing. On SLP generalization, this demonstrates that SP and the other hand, if the reported z∗ was the value WCP and not-SLP are logically consistent. Thus, of TB, then we are given only the disjunction, pre- so are WCP and not-SLP. This refutes the supposi- cluding the computation relevant for i fixed (Sec- tion that [(SP and WCP) entails SLP] and also any tion 4.3.1). Let us consider the components of Birn- purported derivation of SLP from WCP alone.4 baum’s argument. SP is not blocked in (1). The SP is always relative 5.2 Birnbaum’s Argument to a model, here EB . We have the following: ∗ ∗ ∗ ∗ (E2, y ) is given (and it has an SLP pair x ). The x and y are SLP pairs in EB, and Infr ∗ Infr ∗ question is to its inferential import. Birnbaum will EB [x ]= EB [y ] (i.e., [B] holds). seek to show that One may allow different contexts to dictate whether Infr ∗ Infr ∗ E2 [y ]= E1 [x ]. or not to condition [i.e., whether to apply (1) or (2)], ∗ but we know of no inference account that permits, The value of TB is (E1, x ). Birnbaumization maps let alone requires, self-contradictions. By noncontra- outcomes into hypothetical mixtures EB : diction, for any (E, z), InfrE[z]= InfrE[z]. (“ ” is (1) If the inference implication is by the stipulations a function from outcomes to inference implications,⇒ of EB, and z = z, for any z.) ∗ ∗ ∗ (E2, y ) InfrE [x ]= InfrE [y ]. Upholding and applying. This recalls our points ⇒ B B ∗ in Section 2.1. Applying a rule means following its Likewise for (E1, x ). TB is a sufficient statistic inference directive. We may uphold the if-then stip- for EB (the conditional distribution of Z given ulations in (1) and (2), but to apply their competing TB is independent of θ). implications in a single case is self-contradictory. (2) If the inference implication is by WCP, Arguing from a self-contradiction is unsound. ∗ Infr ∗ (E2, y ) ; EB [y ], The slogan that anything follows from a self- rather contradiction G and not-G is true, since for any ∗ ∗ claim C, the following is a logical truth: If G then (E2, y ) InfrE [y ] ⇒ 2 (if not-G then C). Two applications of modus po- and nens yield C. One can also derive not-C! But since ∗ ∗ G and its denial cannot be simultaneously true, (E1, x ) InfrE [x ]. ⇒ 1 any such argument is unsound. (A sound argument Following the inference implication according to must have true premises and be logically valid.) We EB in (1) is at odds with what the WCP stipu- know Birnbaum was not intending to argue from ∗ lates in (2). Given y , Birnbaumization directs us- a self-contradiction, but this may inadvertently oc- ing the convex combination over the components of cur. TB; WCP eschews doing so. We will not get Infr ∗ Infr ∗ 4By allowing applications of Birnbaumization and appro- E1 [x ]= E2 [y ]. priate choices of the irrelevant randomization probabilities, The SLP only seems to follow by the erroneous SP can be weakened to “mathematical equivalence,” or even identity: (with compounded mixtures) omitted so that WCP would en- Infr ∗ Infr ∗ tail SLP. See Birnbaum (1972) and Evans, Fraser and Monette EB [zi ]= Ei [zi ] for i = 1, 2. (1986). 10 D. G. MAYO

5.4 What if the SLP Pair Arose from an Actual tion 2.5). However, the key point of the WCP is to Mixture? deny that this fact should alter the inference im- ∗ ∗ ∗ plication from the known y . To insist it should is What if the SLP pair x , y arose from a genuine, to deny the WCP. Granted, WCP sought to identify and not a Birnbaumized, mixture. (Consider fixed the relevant sampling distribution for inference from versus sequential sampling, Section 3.1. Suppose E1 ∗ a specified type of mixture, and a known y , but fixes n at 169, the coin flip says perform E2, and it it is Birnbaum who purports to give an argument happens to stop at n = 169.) We may allow that an that is relevant for a sampling theorist and for “ap- unconditional formulation may be defined so that proaches which are independent of this [Bayes’] prin- Infr ∗ Infr ∗ Emix [x ]= Emix [y ]. ciple” [Birnbaum (1962), page 283]. Its implications for sampling theory is why it was dubbed “a land- But WCP eschews the unconditional formulation; mark in statistics” [Savage (1962b), page 307]. it says condition on the experiment known to have Let us look at the two statements about inference produced zi: ∗ implications from a given (E2, y ), applying (1) and ∗ ∗ (Emix, zi ) InfrE [zi ], i = 1, 2. (2) in Section 5.2: ⇒ i ∗ ∗ ∗ Infr ∗ Any SLP violation pair x , y remains one: (E2, y ) EB [x ], ∗ ∗ ⇒ InfrE [x ]= InfrE [y ]. ∗ ∗ 1 2 (E2, y ) InfrE [y ]. 6 ⇒ 2 6. DISCUSSION Can both be applied in exactly the same model with the same given z? The answer is yes, so long as the We think a fresh look at this venerable argument WCP happens to make no difference: is warranted. Wearing a logician’s spectacles and en- Infr ∗ Infr ∗ tering the debate outside of the thorny issues from EB [zi ]= Ei [zi ], i = 1, 2. decades ago may be an advantage. Now the SLP must be applicable to an arbitrary It must be remembered that the onus is not on SLP pair. However, to assume that (1) and (2) can someone who questions if the SLP follows from SP be consistently applied for any x∗, y∗ pair would be and WCP to provide suitable principles of evidence, to assume no SLP violations are possible, which re- however desirable it might be to have them. The ally would render Birnbaum’s argument circular. So onus is on Birnbaum to show that for any given y∗, ∗ from Section 5.3, the choices are to regard Birn- a member of an SLP pair with x , with different baum’s argument as unsound (arguing from a con- probability models f1( ), f2( ), that he will be able · · ∗ ∗ tradiction) or circular (assuming what it purports to derive from SP and WCP, that x and y would to prove). Neither is satisfactory. We are left with have the identical inference implications concerning competing inference implications and no way to get shared parameter θ. We have shown that SLP viola- to the SLP. There is evidence Birnbaum saw the gap tions do not entail renouncing either the SP or the in his argument (Birnbaum (1972)), and in the end WCP. he held the SLP only restricted to (predesignated) It is no rescue of Birnbaum’s argument that a sam- point against point hypotheses.5 pling theorist wants principles in addition to the It is not SP and WCP that conflict; the conflict WCP to direct the relevant sampling distribution comes from WCP together with Birnbaumization— for inference; indeed, Cox has given others. It was understood as both invoking the hypothetical mix- to make the application of the WCP in his argu- ture and erasing the information as to which ex- ment as plausible as possible to sampling theorists periment the data came. If one Birnbaumizes, one that Birnbaum begins with the type of mixture in cannot at the same time uphold the “keep irrele- Cox’s (1958) famous example of instruments E1, E2 vants irrelevant” (Irrel) stipulation of the WCP. So with different precisions. for any given (E, z) one must choose, and the an- We do not assume sampling theory, but employ a swer is straightforward for a holder of the WCP. To formulation that avoids ruling it out in advance. The failure of Birnbaum’s argument to reach the SLP re- 5This alone would not oust all sampling distributions. Birn- lies only on a correct understanding of the WCP. We ∗ baum’s argument, even were it able to get a foothold, would may grant that for any y its SLP pair could occur have to apply further rounds of conditioning to arrive at the in repetitions (and may even be out there as in Sec- data alone. ON THE BIRNBAUM ARGUMENT FOR THE STRONG LIKELIHOOD PRINCIPLE 11 paraphrase Cox’s (1958), page 361, objection to un- . . . are contrary to the intentions of the principles, conditional tests: as judged by the relevant supporting and motivat- ing examples. From this viewpoint we can state that Birnbaumization says that we can assign ∗ the intentions of S and C do not imply L.” [Where y a different level of significance than we ordinarily do, because one may identify an S, C and L are our SP, WCP and SLP.] Like Durbin ∗ and Kalbfleisch, they offer a choice of modifications SLP pair x and construct statistic TB . But this fact seems irrelevant to the in- of the principles to block the SLP. These are highly terpretation of an observation which we insightful and interesting; we agree that they high- light a need to be clear on the experimental model know came from E2. To conceal the index, and use the convex combination, would at hand. Still, it is preferable to state the WCP so give a distorted assessment of statistical as to reflect these “intentions,” without which it is significance. robbed of its function. The problem stems from mis- Infr Infr taking WCP as the equivalence Emix [z]= Ei [z] 7. RELATION TO OTHER CRITICISMS OF (whether the mixture is hypothetical or actual). BIRNBAUM This is at odds with the WCP. The puzzle is solved by adequately stating the WCP. Aside from that, A number of critical discussions of the Birnbaum we need only keep the meaning of terms consistent argument and the SLP exist. While space makes through the argument. it impossible to discuss them here, we believe the We emphasize that we are neither rejecting the SP current analysis cuts through this extremely com- nor claiming that it breaks down, even in the spe- plex literature. Take, for example, the most well- cial case EB. The sufficiency of TB within EB , as a known criticisms by Durbin (1970) and Kalbfleish mathematical concept, holds: the value of TB “suf- (1975), discussed in the excellent paper by Evans, ∗ fices” for InfrE [y ], the inference from the associ- Fraser and Monette (1986). Allowing that any y∗ B ated convex combination. Whether reference to hy- may be viewed as having arisen from Birnbaum’s pothetical mixture EB is relevant for inference from mathematical mixture, they consider the proper or- ∗ given y is a distinct question. For an alternative der of application of the principles. If we condition criticism see Evans (2013). on the given experiment first, Kalbfleish’s revised sufficiency principle is inapplicable, so Birnbaum’s 8. CONCLUDING REMARKS argument fails. On the other hand, Durbin argues, if we reduce to the minimal sufficient statistic first, An essential component of informative inference then his revised principle of conditionality cannot be for sampling theorists is the relevant sampling dis- applied. Again Birnbaum’s argument fails. So either tribution: it is not a separate assessment of perfor- way it fails. mance, but part of the necessary ingredients of in- Unfortunately, the idea that one must revise the formative inference. It is this feature that enables initial principles in order to block SLP allows down- sampling theory to have SLP violations (e.g., in sig- playing or dismissing these objections as tanta- nificance testing contexts). Any such SLP violation, mount to denying SLP at any cost (please see the 6 according to Birnbaum’s argument, prevents adher- references ). We can achieve what they wish to ing to both SP and WCP. We have shown that SLP show, without altering principles, and from WCP ∗ violations do not preclude WCP and SP. alone. Given y , WCP blocks Birnbaumization; The SLP does not refer to mixtures. But sup- ∗ ∗ given y has been Birnbaumized, the WCP pre- posing that (E2, y ) is given, Birnbaum asks us to cludes conditioning. consider that y∗ could also have resulted from a θ- We agree with Evans, Fraser and Monette (1986), irrelevant mixture that selects between E1, E2. The page 193, “that Birnbaum’s use of [the principles] WCP says this piece of information should be irrele- ∗ vant for computing the inference from (E2, y ) once 6In addition to the authors cited in the manuscript, see es- ∗ ∗ given. That is, InfrEi [(Emix, y )] = InfrEi [y ], i = pecially comments by Savage, Cornfield, Bross, Pratt, Demp- ∗ ∗ 1, 2. It follows that if InfrE [x ] = InfrE [y ], the two ster et al. (1962) on Birnbaum. For later discussions, see 1 6 2 ∗ Barndorff-Nielsen (1975), Berger (1986), Berger and Wolpert remain unequal after the recognition that y could (1988), Birnbaum (1970a, 1970b), Dawid (1986), Savage have come from the mixture. What was an SLP vi- (1970) and references therein. olation remains one. 12 D. G. MAYO

Given y∗, the WCP says do not Birnbaumize. One from anonymous referees, from Jean Miller and is free to do so, but not to simultaneously claim Larry Wasserman. My understanding of Birnbaum to hold the WCP in relation to the given y∗, on was greatly facilitated by philosopher of science, pain of logical contradiction. If one does choose to Ronald Giere, who worked with Birnbaum. I’m also Birnbaumize, and to construct TB , admittedly the grateful for his gift of some of Birnbaum’s original ∗ known outcome y yields the same value of TB as materials and notes. ∗ would x . Using the sample space of EB yields [B]: Infr ∗ Infr ∗ EB [x ]= EB [y ]. This is based on the con- REFERENCES vex combination of the two experiments and differs ∗ ∗ Barndorff-Nielsen, O. (1975). Comments on paper by J. from both InfrE [x ] and InfrE [y ]. So again, any 1 2 D. Kalbfleisch. Biometrika 62 261–262. SLP violation remains. Granted, if only the value Berger, J. O. Infr (1986). Discussion on a paper by Evans et of TB is given, using EB may be appropriate. al. [On principles and arguments to likelihood]. Canad. J. For then we are given only the disjunction: either Statist. 14 195–196. ∗ ∗ (E1, x ) or (E2, y ). In that case, one is barred from Berger, J. O. (2006). The case for objective Bayesian anal- 1 using the implication from either individual Ei. A ysis. Bayesian Anal. 385–402. MR2221271 Berger, J. O. and Wolpert, R. L. (1988). The Likeli- holder of WCP might put it this way: once (E, z) is hood Principle, 2nd ed. Lecture Notes—Monograph Series given, whether E arose from a θ-irrelevant mixture 6. IMS, Hayward, CA. or was fixed all along should not matter to the in- Birnbaum, A. (1962). On the foundations of statistical in- ference, but whether a result was Birnbaumized or ference. J. Amer. Statist. Assoc. 57 269–306. Reprinted not should, and does, matter. in Breakthroughs in Statistics 1 (S. Kotz and N. Johnson, There is no logical contradiction in holding that if eds.) 478–518. Springer, New York. Birnbaum, A. (1968). Likelihood. In International Encyclo- data are analyzed one way (using the convex com- pedia of the Social Sciences 9 299–301. Macmillan and the bination in EB), a given answer results, and if ana- Free Press, New York. lyzed another way (via WCP), one gets quite a dif- Birnbaum, A. (1969). Concepts of statistical evidence. ferent result. One may consistently apply both the In Philosophy, Science, and Method: Essays in Honor of Ernest Nagel (S. Morgenbesser, P. Suppes and EB and the WCP directives to the same result, in M. G. White, eds.) 112–143. St. Martin’s Press, New the same experimental model, only in cases where ∗ ∗ York. WCP makes no difference. To claim for any x , y , Birnbaum, A. (1970a). Statistical methods in scientific infer- the WCP never makes a difference, however, would ence. Nature 225 1033. assume that there can be no SLP violations, which Birnbaum, A. (1970b). On Durbin’s modified principle of 65 would make the argument circular.7 Another pos- conditionality. J. Amer. Statist. Assoc. 402–403. Birnbaum, A. (1972). More on concepts of statistical evi- sibility would be to hold, as Birnbaum ultimately dence. J. Amer. Statist. Assoc. 67 858–861. MR0365793 did, that the SLP is “clearly plausible” [Birnbaum Birnbaum, A. (1975). Comments on paper by (1968), page 301] only in “the severely restricted J. D. Kalbfleisch. Biometrika 62 262–264. case of a parameter space of just two points” where Casella, G. and Berger, R. L. (2002). Statistical Inference, these are predesignated [Birnbaum (1969), page 2nd ed. Duxbury Press, Belmont, CA. Cox, D. R. 128]. But that is to relinquish the general result. (1958). Some problems connected with statistical inference. Ann. Math. Statist. 29 357–372. MR0094890 Cox, D. R. (1977). The role of significance tests. Scand. J. ACKNOWLEDGMENTS Stat. 4 49–70. MR0448666 Cox, D. R. (1978). Foundations of statistical inference: I am extremely grateful to David Cox and Aris The case for eclecticism. Aust. N. Z. J. Stat. 20 43–59. Spanos for numerous discussions, corrections and MR0501453 joint work over many years on this and related foun- Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statis- dational issues. I appreciate the careful queries and tics. Chapman & Hall, London. MR0370837 detailed suggested improvements on earlier drafts Cox, D. R. and Mayo, D. G. (2010). Objectivity and con- ditionality in . In Error and Inference: Recent Exchanges on Experimental Reasoning, Reliability, 7His argument would then follow the pattern: If there are and the Objectivity and Rationality of Science (G. Mayo SLP violations, then there are no SLP violations. Note that and A. Spanos, eds.) 276–304. Cambridge Univ. Press, (V implies not-V) is not a logical contradiction. It is logically Cambridge. equivalent to not-V. Then, Birnbaum’s argument is equiva- Dawid, A. P. (1986). Discussion on a paper by Evans et lent to not-V: denying that x∗, y∗ can give rise to an SLP al. [On principles and arguments to likelihood]. Canad. J. violation. That would render it circular. Statist. 14 196–197. ON THE BIRNBAUM ARGUMENT FOR THE STRONG LIKELIHOOD PRINCIPLE 13

Durbin, J. (1970). On Birnbaum’s theorem on the relation Mayo, D. G. and Cox, D. R. (2011). Statistical scientist between sufficiency, conditionality and likelihood. J. Amer. meets a philosopher of science: A conversation. In Ratio- Statist. Assoc. 65 395–398. nality, Markets and Morals: Studies at the Intersection of Evans, M. (2013). What does the proof of Birnbaum’s theo- Philosophy and Economics 2 (D. G. Mayo, A. Spanos rem prove? Unpublished manuscript. and K. W. Staley, eds.) (Special Topic: Statistical Sci- Evans, M. J., Fraser, D. A. S. and Monette, G. (1986). ence and Philosophy of Science: Where do (should) They On principles and arguments to likelihood. Canad. J. Meet in 2011 and Beyond?) (October 18) 103–114. Frank- Statist. 14 181–199. MR0859631 furt School, Frankfurt. Ghosh, J. K., Delampady, M. and Samanta, T. (2006). Mayo, D. G. and Kruse, M. (2001). Principles of infer- An Introduction to Bayesian Analysis. Theory and Meth- ence and their consequences. In Foundations of Bayesian- ods. Springer Texts in Statistics. Springer, New York. ism (D. Corfield and J. Williamson, eds.) 24 381–403. MR2247439 Applied Logic. Kluwer Academic Publishers, Dordrecht. Kalbfleisch, J. D. (1975). Sufficiency and conditionality. Mayo, D. G. and Spanos, A. (2006). Severe testing as a ba- Biometrika 62 251–268. MR0386075 sic concept in a Neyman–Pearson philosophy of induction. Lehmann, E. L. and Romano, J. P. (2005). Testing Sta- British J. Philos. Sci. 57 323–357. MR2249183 tistical Hypotheses, 3rd ed. Springer Texts in Statistics. Mayo, D. G. and Spanos, A. (2011). Error statistics. In Springer, New York. MR2135927 Philosophy of Statistics 7 (P. S. Bandyopadhyay and M. Mayo, D. G. (1996). Error and the Growth of Experimental R. Forster, eds.) 152–198. Handbook of the Philosophy of Knowledge. Univ. Chicago Press, Chicago, IL. Science. Elsevier, Amsterdam. Mayo, D. G. (2010). An error in the argument from con- Reid, N. (1992). Introduction to Fraser (1966) structural ditionality and sufficiency to the likelihood principle. In probability and a generalization. In Breakthroughs in Error and Inference: Recent Exchanges on Experimental Statistics (S. Kotz and N. L. Johnson, eds.) 579–586. Reasoning, Reliability, and the Objectivity and Rationality Springer Series in Statistics. Springer, New York. of Science (D. G. Mayo and A. Spanos, eds.) 305–314. Savage, L. J., ed. (1962a). The Foundations of Statistical Cambridge Univ. Press, Cambridge. Inference: A Discussion. Methuen, London. Mayo, D. G. and Cox, D. R. (2010). Frequentist statis- Savage, L. J. (1962b). Discussion on a paper by A. Birn- tics as a theory of inductive inference. In Error and Infer- baum [On the foundations of statistical inference]. J. Amer. ence: Recent Exchanges on Experimental Reasoning, Reli- Statist. Assoc. 57 307–308. ability, and the Objectivity and Rationality of Science (D. Savage, L. J. (1970). Comments on a weakened principle of G. Mayo and A. Spanos, eds.) 247–274. Cambridge Univ. conditionality. J. Amer. Statist. Assoc. 65 (329) 399–401. Press, Cambridge. First published in The Second Erich Savage, L. J., Barnard, G., Cornfield, J., Bross, I., L. Lehmann Symposium: Optimality 49 (2006) (J. Rojo, Box, G. E. P., Good, I. J., Lindley, D. V. et al. (1962). ed.) 77–97. Lecture Notes—Monograph Series. IMS, Beach- On the foundations of statistical inference: Discussion. J. wood, OH. Amer. Statist. Assoc. 57 307–326.