
The Statistician (2000) 49, Part 3, pp. 293-337

The philosophy of statistics

Dennis V. Lindley
Minehead, UK

[Received June 1999]

Summary. This paper puts forward an overall view of statistics. It is argued that statistics is the study of uncertainty. The many demonstrations that uncertainties can only combine according to the rules of the probability calculus are summarized. The conclusion is that statistical inference is firmly based on probability alone. Progress is therefore dependent on the construction of a probability model; methods for doing this are considered. It is argued that the probabilities are personal. The roles of likelihood and exchangeability are explained. Inference is only of value if it can be used, so the extension to decision analysis, incorporating utility, is related to risk and to the use of statistics in science and law. The paper has been written in the hope that it will be intelligible to all who are interested in statistics.

Keywords: Conglomerability; Data analysis; Decision analysis; Exchangeability; Law; Likelihood; Models; Personal probability; Risk; Scientific method; Utility

1. Introduction

Instead of discussing a specific problem, this paper provides an overview within which most statistical issues can be considered. 'Philosophy' in the title is used in the sense of 'The study of the general principles of some particular branch of knowledge, experience or activity' (Onions, 1956). The word has recently acquired a reputation for being concerned solely with abstract issues, divorced from reality. My intention here is to avoid excessive abstraction and to deal with practical matters concerned with our subject. If the practitioner who reads this paper does not feel that the study has benefited them, my writing will have failed in one of its endeavours. The paper tries to develop a way of looking at statistics that will help us, as statisticians, to develop better a sound approach to any problem which we might encounter. Technical matters have largely been avoided, not because they are not important, but in the interests of focusing on a clear understanding of how a statistical situation can be studied. At some places, matters of detail have been omitted to highlight the key idea. For example, probability densities have been used without an explicit mention of the dominating measure to which they refer.

The paper begins by recognizing that statistical issues concern uncertainty, going on to argue that uncertainty can only be measured by probability. This conclusion enables a systematic account of inference, based on the probability calculus, to be developed, which is shown to be different from some conventional accounts. The likelihood principle follows from the basic role played by probability. The role of data analysis in the construction of probability models and the nature of models are next discussed. The development leads to a method of making decisions, and the nature of risk is considered. Scientific method, and its application to some legal issues, are explained within the probabilistic framework. The conclusion is that we have here a satisfactory general set of statistical procedures whose implementation should improve statistical practice.

Address for correspondence: Dennis V. Lindley, "Woodstock", Quay Lane, Minehead, Somerset, TA24 5QU, UK.
E-mail: [email protected]

The philosophy here presented places more emphasis on model construction than on formal inference. In this it agrees with much recent opinion. A reason for this change of emphasis is that formal inference is a systematic procedure within the calculus of probability. Model construction, by contrast, cannot be so systematic.

The paper arose out of my experiences at the Sixth Valencia Conference on Bayesian Statistics, held in June 1998 (Bernardo et al., 1999). Although I was impressed by the overall quality of the papers and the substantial advances made, many participants did not seem to me fully to appreciate the Bayesian philosophy. This paper is an attempt to describe my version of that philosophy. It is a reflection of 50 years' statistical experience and a personal change from a frequentist, through objective Bayes, to the subjective attitude presented here. No attempt has been made to analyse in detail alternative philosophies, only to indicate where their conclusions differ from those developed here and to contrast the resulting practical methods.

2. Statistics

To discuss the philosophy of statistics, it is necessary to be reasonably clear what it is the philosophy of, not in the sense of a precise definition, so that this is 'in', that is 'out', but merely to be able to perceive its outlines. The suggestion here is that statistics is the study of uncertainty (Savage, 1977): that statisticians are experts in handling uncertainty. They have developed tools, like standard errors and significance levels, that measure the uncertainties that we might reasonably feel. A check of how well this description of our subject agrees with what we actually do can be performed by looking at the four series of journals published by the Royal Statistical Society in 1997. These embrace issues as diverse as social accounting and stable laws. Apart from a few exceptions, like the algorithms section (which has subsequently been abandoned) and a paper on education, all the papers deal either directly with uncertainty or with features, like stable laws, which arise in problems that exhibit uncertainty. Support for this view of our subject is provided by the fact that statistics plays a greater role in topics that have variability, giving rise to uncertainty, as an essential ingredient, than in more precise subjects. Agriculture, for example, enjoys a close association with statistics, whereas physics does not. Notice that it is only the manipulation of uncertainty that interests us. We are not concerned with the matter that is uncertain. Thus we do not study the mechanism of rain; only whether it will rain. This places statistics in a curious situation in that we are, as practitioners, dependent on others. The forecast of rain will be dependent on both meteorologist and statistician. Only as theoreticians can we exist alone. Even there we suffer if we remain too divorced from the science. The term 'client' will be used in reference to the person, e.g. scientist or lawyer, who encounters uncertainty in their field of study.

The philosophical position adopted here is that statistics is essentially the study of uncertainty and that the statistician's role is to assist workers in other fields, the clients, who encounter uncertainty in their work. In practice, there is a restriction in that statistics is ordinarily associated with data; and it is the link between the uncertainty, or variability, in the data and that in the topic itself that has occupied statisticians. Some writers even restrict the data to be frequency data, capable of near-identical repetition. Uncertainty, away from data, has rarely been of statistical interest. Statisticians do not have a monopoly of studies of uncertainty. Probabilists discuss how randomness in one part of a system affects other parts. Thus the model for a stochastic process provides predictions about the data that the process will provide. The passage from process to data is clear; it is when we attempt a reversal and go from data to process that difficulties appear. This paper is mainly devoted to this last phase, commonly called inference, and the action that it might generate.

Notice that uncertainty is everywhere, not just in science or even in data. It provides a motivation for some aspects of theology (Bartholomew, 1988). Therefore, the recognition of statistics as uncertainty would imply an extensive role for statisticians. If a philosophical position can be developed that embraces all uncertainty, it will provide an important advance in our understanding of the world. At the moment it would be presumptuous to claim so much.

3. Uncertainty

Acceptance that statistics is the study of uncertainty implies that it is necessary to investigate the phenomenon. A scientific approach would mean the measurement of uncertainty; for, to follow Kelvin, it is only by associating numbers with any scientific concept that the concept can be properly understood. The reason for measurement is not just to make more precise the notion that we are more uncertain about the stock-market than about the sun rising tomorrow, but to be able to combine uncertainties. Only exceptionally is there one element of uncertainty in a problem; more realistically there are several. In the collection of data, there is uncertainty in the sampling unit, and then in the number reported in the sampling. In an archetypal statistical problem, there is uncertainty in both data and parameter. The central problem is therefore the combination of uncertainties. Now the rules for the combination of numbers are especially simple. Furthermore, numbers combine in two ways, addition and multiplication, so leading to a richness of ideas. We want to measure uncertainties in order to combine them. A politician said that he preferred adverbs to numbers. Unfortunately it is difficult to combine adverbs.

How is this measurement to be achieved? All measurement is based on a comparison with a standard. For length we refer to the orange-red line of the krypton-86 isotope. The key role of comparisons means that there are no absolutes in the world of measurement. This is a point to which we shall return in Section 11. It is therefore necessary to find a standard for uncertainty. Several have been suggested but the simplest is historically the first, namely games of chance. These provided the first uncertainties to be studied systematically. Let us therefore use as our standard a simple game.

Consider before you an urn containing a known number N of balls that are as nearly identical as modern engineering can make them. Suppose that one ball is drawn at random from the urn. For this to make sense, it is needful to define randomness. Imagine that the balls are numbered consecutively from 1 to N and suppose that, at no cost to you, you were offered a prize if ball 57 were drawn. Suppose, alternatively, that you were offered the same prize if ball 12 were drawn. If you are indifferent between the two propositions and, in extension, between any two numbers between 1 and N, then, for you, the ball is drawn at random. Notice that the definition of randomness is subjective; it depends on you. What is random for one person may not be random for another. We shall return to this aspect in Section 8.

Having said what is meant by the drawing of a ball at random, forget the numbers and suppose that R of the balls are red and the remainder white, the colouring not affecting your opinion of randomness. Consider the uncertain event that the ball, withdrawn at random, is red. The suggestion is that this provides a standard for uncertainty and that the measure is R/N, the proportion of red balls in the urn. There is nothing profound here, being just a variant of the assumption on which games of chance are based. Now pass to any event, or proposition, which can either happen or not, be true or false. It is proposed to measure your uncertainty associated with the event happening by comparison with the standard. If you think that the event is just as uncertain as the random drawing of a red ball from an urn containing N balls, of which R are red, then the event has uncertainty R/N for you. R and N are for you to choose. For given N, it is easy to see that there cannot be more than one such R. There is now a measure of uncertainty for any event or proposition.
Before proceeding, let us consider the measurement process carefully. A serious assumption has been made that the concept of uncertainty can be isolated from other features. In discussing randomness, it was useful to compare a prize given under different circumstances, ball 57 or ball 12. Rewards will not necessarily work in the comparison of an event with a standard. For example, suppose that the event whose uncertainty is being assessed is the explosion of a nuclear weapon, within 50 miles of you, next year. Then a prize of £10 000, say, will be valued differently when a red ball appears from when it will be when fleeing from the radiation. The measurement process just described assumes that you can isolate the uncertainty of the nuclear explosion from its unpleasant consequences. For the moment we shall make the assumption, returning to it in Section 17 to show that, in a sense, it is not important. Ramsey (1926), whose work will be discussed in Section 4, introduced the concept of an 'ethically neutral event' for which the comparison with the urn presents fewer difficulties. Nuclear bombs are not ethically neutral.

In contrast, notice an assumption that has not been made. For any event, including the nuclear bomb, it has not been assumed that you can do the measurement to determine R (and N, but that only reflects the precision in your assessment of the uncertainty). Rather, we assume that you would wish to do it, were you to know how. All that is assumed of any measurement process is that it is reasonable, not that it can easily be done. Because you do not know how to measure the distance to our moon, it does not follow that you do not believe in the existence of a distance to it. Scientists have spent much effort on the accurate determination of length because they were convinced that the concept of distance made sense in terms of krypton light. Similarly, it seems reasonable to attempt the measurement of uncertainty.

4. Uncertainty and probability

It has been noted that a prime reason for the measurement of uncertainties is to be able to combine them, so let us see how the method suggested accomplishes this end. Suppose that, of the N balls in the urn, R are red, B are blue and the remainder white. Then the uncertainty that is associated with the withdrawal of a coloured ball is (R + B)/N = R/N + B/N, the sum of the uncertainties associated with red, and with blue, balls. The same result will obtain for any two exclusive events whose uncertainties are respectively R/N and B/N and we have an addition rule for your uncertainties of exclusive events.

Next, suppose again that R balls are red and the remaining N − R white; at the same time, S are spotted and the remaining N − S plain. The urn then contains four types of ball, of which one type is both spotted and red, of which the number is say T. Then the uncertainty associated with the withdrawal of a spotted red ball is T/N, which is equal to R/N × T/R, the product of the uncertainty of a red ball and that of spotted balls among the red. Again the same result will apply for any two events being compared with coloured and with spotted balls and we have a product rule for uncertainties.

The addition and product rules just obtained, together with the convexity rule that the measurement R/N always lies in the (convex) unit interval, are the defining rules of probability, at least for a finite number of events (see Section 5). The conclusion is therefore that the measurements of uncertainty can be described by the calculus of probability. In the reverse direction, the rules of probability reduce simply to the rules governing proportions. Incidentally, this helps to explain why frequentist arguments are often so useful: the combination of uncertainties can be studied by proportions, or frequencies, in a group, here of balls. The mathematical basis of probability is very simple and it is perhaps surprising that it yields complicated and useful results.

The conclusions are that statisticians are concerned with uncertainty and that uncertainty can be measured by probability. The sketchy demonstration here may not be thought adequate for such an important conclusion, so let us look at other approaches. Historically, uncertainty has been associated with games of chance and gambling. Hence one way of measuring uncertainty is through the gambles that depend on it. The willingness to stake s on an event to win w if the event occurs is, in effect, a measure of the uncertainty of the event expressed through the odds (against) of w/s to 1. The combination of events is now replaced by collections of gambles. To fix ideas, contemplate the situation in horse-racing. With two horses running in the same race, bets on them separately may be considered as well as a bet on either of them winning. With horses in different races, the event of both winning may be used. Using combinations like these, the concept of what in Britain we term a Dutch book can be employed. A series of bets at specified odds is said to constitute a Dutch book if it is possible to place a set of stakes in such a way that one is sure to come out overall with an increase in assets. A bookmaker never states odds for which a Dutch book may be made. It is easy to show (de Finetti, 1974, 1975) that avoidance of a Dutch book is equivalent to the probabilities, derived from the odds, obeying the convexity, addition and multiplication rules of probability already mentioned. For odds o, the corresponding probability is p = (1 + o)⁻¹. In summary, the use of odds, combined with the impossibility of a Dutch book, leads back to probability as before.
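As an aside not in the original text, the Dutch book idea can be made concrete with a small sketch. Suppose a bookmaker quotes odds against three exclusive and exhaustive outcomes whose implied probabilities p = (1 + o)⁻¹ sum to less than 1 (the particular odds below are invented for the illustration); staking in proportion to those implied probabilities then guarantees the same positive profit whichever outcome occurs.

```python
# Illustrative sketch (not from the paper): a Dutch book against a bookmaker
# whose quoted odds imply probabilities summing to less than 1.

def implied_probability(odds_against):
    """p = 1 / (1 + o) for odds (against) of o to 1."""
    return 1.0 / (1.0 + odds_against)

# Hypothetical odds against three exclusive, exhaustive outcomes of a race.
odds = [1.0, 3.0, 9.0]                                   # evens, 3 to 1, 9 to 1
probs = [implied_probability(o) for o in odds]
print("implied probabilities:", probs, "sum =", sum(probs))   # sum = 0.85 < 1

# Stake on each outcome in proportion to its implied probability.
stakes = probs
for winner in range(len(odds)):
    # Winnings on the winner minus the stakes lost on the other outcomes.
    net = stakes[winner] * odds[winner] - sum(s for i, s in enumerate(stakes) if i != winner)
    print(f"outcome {winner} wins: net gain = {net:.3f}")     # 0.15 every time
```

The guaranteed gain equals 1 minus the sum of the implied probabilities, which is why coherent odds cannot allow it.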
de Finetti introduced another approach. Suppose that you are required to state your numerical uncertainty for an event, and you are told that you will be scored by an amount S(x, E) if you state x, where E = 1 or E = 0 if the event is subsequently shown to be true or false respectively. For different events, the scores are to be added. A possible penalty score used by him is S(x, E) = (x − E)², but, aside from a few scores that lead to ridiculous conclusions, any function will do. Then if you use x as some function of probability, i.e. as a function of something that obeys those three rules of probability, you will be sure, in the sense of obtaining a smaller penalty score, to do better than someone who acts in another way. Which function depends on S. This approach is attractive because it is operational and has been used to train weather forecasters. Generally it provides an empirical check on the quality of your probability assessments and hence a test of your abilities as a statistician (see Section 9). The important point here is that again it leads to probability. We next study a third group of methods that again compels the use of probability to describe uncertainty.

The Royal Statistical Society, together with many other statistical groups, was originally set up to gather and publish data. This agrees with our argument about uncertainty because part of the purpose behind the data gathering was a reduction in uncertainty. It remains an essential part of statistical activity today and most Governments have statistical offices whose function is the acquisition and presentation of statistics. It did not take long before statisticians wondered how the data might best be used and modern statistical inference was born. Then statisticians watched as the inferences were used as the basis of decisions, or actions, and began, with others, to concern themselves with decision-making. It is now explained how this leads again to probability.

The first person to make the step to decision-making seems to have been Ramsey (1926). He asked the simple question 'how should we make decisions in the face of uncertainty?'. He made some apparently reasonable assumptions and from them deduced theorems. The theorem that concerns us here is that which says that the uncertainty measures should obey the rules of the probability calculus. So we are back to probability again. Ramsey's work was unappreciated until Savage (1954) asked the same question and from different, but related, assumptions came up with the same result, namely that it has to be probability. Savage also established a link with de Finetti's ideas. Since then, many others have explored the field with similar results. My personal favourite among presentations that have full rigour is that of DeGroot (1970), chapter 6. An excellent recent presentation is Bernardo and Smith (1994).

5. Probability

The conclusion is that measurements of uncertainty must obey the rules of the probability calculus. Other rules, like those of fuzzy logic or possibility theory, dependent on maxima and minima, rather than sums and products, are out. So are some rules used by statisticians; see Section 6. All these derivations, whether based on balls in urns, gambles, scoring rules or decision-making, are based on assumptions. Since these assumptions imply such important results, it is proper that they are examined with great care. Unfortunately, the great majority of statisticians do not do this. Some deny the central result about probability, without exploring the reasons for it.
It is not the purpose of this paper to provide a rigorous account but let us look at one assumption because of its later implications. As with all the assumptions, it is intended to be self-evident and that you would feel foolish if you were to be caught violating it. It is based on a primitive notion of one event being more likely than another. ('Likely' is not used in its technical sense, but as part of normal English usage.) We write 'A is more likely than B' as A > B. The assumption is that if A₁ and A₂ are exclusive, and the same is true of B₁ and B₂, then Aᵢ > Bᵢ (i = 1, 2) imply A₁ ∪ A₂ > B₁ ∪ B₂. (A₁ ∪ A₂ means the event that is true whenever either A₁ or A₂ is true.) We might feel unhappy if we thought that the next person to pass through a door was more likely to be a non-white female, A₁, than a non-white male, B₁, that a white female, A₂, is more likely than a white male, B₂, yet a female, A₁ ∪ A₂, was less likely than a male, B₁ ∪ B₂. The developments outlined above start from assumptions, or axioms, of this character. The important point is that they all lead to probability being the only satisfactory expression of uncertainty.

The last sentence is not strictly true. Some writers have considered the axioms carefully and produced objections. A fine critique is Walley (1991), who went on to construct a system that uses a pair of numbers, called upper and lower probabilities, in place of the single probability. The result is a more complicated system. My position is that the complication seems unnecessary. I have yet to meet a situation in which the probability approach appears to be inadequate and where the inadequacy can be fixed by employing upper and lower values. The pair is supposed to deal with the precision of probability assertions; yet probability alone contains a measure of its own precision. I believe in simplicity; provided that it works, the simpler is to be preferred over the complicated, essentially Occam's razor.

With the conclusion that uncertainty is only satisfactorily described by probability, it is convenient to state formally the three rules, or axioms, of the probability calculus. Probability depends on two elements: the uncertain event and the conditions under which you are considering it. In the extraction of balls from an urn, your probability for red depends on the condition that the ball is drawn at random. We write p(A|B) for your probability of A when you know, or are assuming, B to be true, and we speak of your probability of A, given B. The rules are as follows.

(a) Rule 1 (convexity): for all A and B, 0 ≤ p(A|B) ≤ 1 and p(A|A) = 1.
(b) Rule 2 (addition): if A and B are exclusive, given C, p(A ∪ B|C) = p(A|C) + p(B|C).
(c) Rule 3 (multiplication): for all A, B and C, p(AB|C) = p(A|BC) p(B|C).

(Here AB means the event that occurs if, and only if, both A and B occur.) The convexity rule is sometimes strengthened to include p(A|B) = 1 only if A is a logical consequence of B. This addition is called Cromwell's rule.

There is one point about the addition rule that appears to be merely a mathematical nicety but in fact has important practical consequences to be exhibited in Section 8. With the three rules as stated above, it is easy to extend the addition rule for two, to any finite number of events. None of the many approaches already discussed lead to the rule's holding for an infinity of events. It is usual to suppose that it does so hold because of the undesirable results that follow without it. This can be made explicit either by simply restating the addition rule, or by adding a fourth rule which, together with the three above, leads to addition for an infinity of exclusive events. My personal preference is for the latter and to add the following.

(d) Rule 4 (conglomerability): if {Bₙ} is a partition, possibly infinite, of C and p(A|Bₙ C) = k, the same value for all n, then p(A|C) = k.

(It is easy to verify that rule 4 follows from rules 1-3 when the partition is finite. The definition is due to de Finetti.) Conglomerability is in the spirit of a class of rules known as 'sure things'. Roughly, if whatever happens (whatever Bₙ) your belief is k, then your belief is k unconditionally. The assumption described earlier in this section is in the same spirit. Some statisticians appear to be conglomerable only when it suits them: hence the practical connection to be studied in Section 8. Note that the rules of probability are here not stated as axioms in the manner found in texts on probability. They are deductions, apart from rule 4, from other, more basic, assumptions.
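The parenthetical claim that rule 4 follows from rules 1-3 for a finite partition can be checked in two lines; the derivation below is added here for completeness (it is not spelt out in the original) and uses only the addition and multiplication rules, with B₁, ..., Bₘ a finite partition of C.

```latex
% Finite-partition case of rule 4, from rules 2 and 3.
% Assume B_1, ..., B_m partition C and p(A | B_n C) = k for every n.
\begin{align*}
p(A \mid C) &= \sum_{n=1}^{m} p(A B_n \mid C)
    && \text{addition rule: the events } A B_n \text{ are exclusive and exhaust } A \text{ given } C,\\
  &= \sum_{n=1}^{m} p(A \mid B_n C)\, p(B_n \mid C)
    && \text{multiplication rule},\\
  &= k \sum_{n=1}^{m} p(B_n \mid C) = k
    && \text{since the } B_n \text{ partition } C.
\end{align*}
```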

6. Significance and confidence

The reaction of many statisticians to the assertion that they should use probability will be to say that they do it already, and that the developments here described do nothing more than give a little cachet to what is already being done. The journals are full of probabilities: normal and binomial distributions abound; the exponential family is everywhere. It might even be claimed that no other measure of uncertainty is used: few, if any, statisticians embrace fuzzy logic. Yet this is not true; statisticians do use measures of uncertainty that do not combine according to the rules of the probability calculus.

Consider a hypothesis H, that a medical treatment is ineffectual, or that a specific social factor does not influence crime levels. The physician, or sociologist, is uncertain about H, and data are collected in the hope of removing, or at least reducing, the uncertainty. A statistician called in to advise on the uncertainty aspect may recommend that the client uses, as a measure of uncertainty, a tail area, significance level, with H as the null hypothesis. That is, assuming that H is true, the probability of the observed, or more extreme, data is calculated. This is a measure of the credence that can be put on H; the smaller the probability, the smaller is the credence. This usage flies in the face of the arguments above which assert that uncertainty about H needs to be measured by a probability for H. A significance level is not such a probability. The distinction can be expressed starkly:

significance level: the probability of some aspect of the data, given H is true;
probability: your probability of H, given the data.

The prosecutor's fallacy is well known in legal circles. It consists in confusing p(A|B) with p(B|A), two values which are only rarely the same. The distinction between significance levels and probability is almost the prosecutor's fallacy: 'almost' because although B, in the prosecutor form, may be equated with H, the data are treated differently. Probability uses A as data. Adherents of significance levels soon recognized that they could not use just the data but had to include 'more extreme' data in the form of the tail of a distribution. As Jeffreys (1961) put it: the level includes data which might have happened but did not.

From hypothesis testing, let us pass to point estimation. A parameter θ might be the effect of the medical treatment, or the influence of the social factor on crime. Again θ is uncertain, data might be collected and a statistician consulted (hopefully not in that order). The statistician will typically recommend a confidence interval. The development above based on measured uncertainty will use a probability density for θ, and perhaps an interval of that density. Again we have a contrast similar to the prosecutor's fallacy:

confidence: probability that the interval includes θ;
probability: probability that θ is included in the interval.

The former is a probability statement about the interval, given θ; the latter about θ, given the data. Practitioners frequently confuse the two. More important than the confusion is the fact that neither significance levels nor statements of confidence combine according to the rules of the probability calculus.

Does the confusion matter? At a theoretical level, it certainly does, because the use of any measure that does not combine according to the rules of the probability calculus will ultimately violate some of the basic assumptions that were intended to be self-evident and to cause embarrassment if violated. At a practical level, it is not so clear and it is necessary to spend a while explaining the practical implications. Statisticians tend to study problems in isolation, with the result that combinations of statements are not needed, and it is in the combinations that difficulties can arise, as was seen in the colour-sex example in Section 5. For example, it is rarely possible to make a Dutch book against statements of significance levels. Some common estimators are known to be inadmissible. The clearest example of an important violation occurs with the relationship between a significance level and the sample size n on which it is based. The interpretation of 'significant at 5%' depends on n, whereas a probability of 5% always means the same. Statisticians have paid inadequate attention to the relationships between statements that they make and the sample sizes on which they are based. There are theoretical reasons (Berger and Delampady, 1987) for thinking that it is too easy to obtain 5% significance. If so, many experiments raise false hopes of a beneficial effect that does not truly exist.

Individual statistical statements, made in isolation, may not be objectionable; the trouble lies in their combinations. For example, confidence intervals for a single parameter are usually acceptable but, with many parameters, they are not. Even the ubiquitous sample mean for a normal distribution is unsound in high dimensions. In an experiment with several treatments, individual tests are fine but multiple comparisons present problems. Scientific truth is established by combining the results of many experiments: yet meta-analysis is a difficult area for statistics. How do you combine several data sets concerning the same hypothesis, each with its own significance level? The conclusions from two Student t-tests on means μ₁ and μ₂ do not cohere with the bivariate T-test for (μ₁, μ₂) (Healy, 1969). In contrast, the view adopted here easily takes the margins of the latter to provide the former.

The conclusion that probability is the only measure of uncertainty is therefore not just a pat on the back but strikes at many of the basic statistical activities. Savage developed his ideas in an attempt to justify statistical practice. He surprised himself by destroying some aspects of it. Let us therefore pass from disagreements to the constructive ideas that flow from the appreciation of the basic role of probability in statistics.
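To make the sample-size point concrete, here is a small numerical sketch added for illustration; it is not part of the paper. It compares a fixed two-sided significance level with a posterior probability for a point null hypothesis about a normal mean, under an assumed N(0, τ²) distribution for the mean given the alternative and prior probability 1/2 on each hypothesis. As n grows with the z-statistic held at 1.96, the same '5% significance' corresponds to ever larger posterior probabilities for the null.

```python
# Sketch: 'significant at 5%' versus a posterior probability for the null.
# Model: x_1..x_n ~ N(theta, 1); H0: theta = 0; under H1, theta ~ N(0, tau^2).
# These modelling choices are illustrative assumptions, not the paper's.
import math

def posterior_prob_H0(z, n, tau=1.0, prior_H0=0.5):
    """Posterior probability of H0 given z = sqrt(n) * xbar and sample size n."""
    # Bayes factor BF01 = p(data | H0) / p(data | H1) for the normal model above.
    bf01 = math.sqrt(1 + n * tau**2) * math.exp(-0.5 * z**2 * n * tau**2 / (1 + n * tau**2))
    odds = prior_H0 / (1 - prior_H0) * bf01
    return odds / (1 + odds)

z = 1.96                          # two-sided significance level of about 5% at every n
for n in (10, 100, 1000, 10000):
    print(n, round(posterior_prob_H0(z, n), 3))
# The significance level stays at 5% while p(H0 | data) grows towards 1 as n increases.
```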

7. Inference

The formulation that has served statistics well throughout this century is based on the data having, for each value of a parameter, a probability distribution. This accords with the idea that the uncertainty in the data needs to be described probabilistically. It is desired to learn something about the parameter from the data. Generally not every aspect of the parameter is of interest, so write it as (θ, α) where we wish to learn about θ with α as a nuisance, to use the technical term. Denoting the data by x, the formulation introduces p(x|θ, α), the probability of x given θ and α. A simple example would have a normal distribution of mean θ and variance α, but the formulation embraces many complicated cases. This handles the uncertainty in the data to everyone's satisfaction.

The parameter is also uncertain. Indeed, it is that uncertainty that is the statistician's main concern. The recipe says that it also should be described by a probability p(θ, α). In so doing, we depart from the conventional attitude. It is often said that the parameters are assumed to be random quantities. This is not so. It is the axioms that are assumed, from which the randomness property is deduced. With both probabilities available, the probability calculus can be invoked to evaluate the revised uncertainty in the light of the data:

p(θ, α|x) ∝ p(x|θ, α) p(θ, α),    (1)

the constant of proportionality dependent only on x, not the parameters. Since α is not of interest, it can be eliminated, again by the probability calculus, to give

p(θ|x) = ∫ p(θ, α|x) dα.    (2)

Equation (1) is the product rule; equation (2) the addition rule. Together they solve the problem of inference, or, better, they provide a framework for its solution. Equation (1) is Bayes's theorem and, by a historical accident, it has given its name to the whole approach, which is termed Bayesian. This perhaps unfortunate terminology is accompanied by some other which is even worse. p(θ) is often called the prior distribution, p(θ|x) the posterior. These are unfortunate because prior and posterior are relative terms, referring to the data. Today's posterior is tomorrow's prior. The terms are so engrained that their complete avoidance is almost impossible.

Let us summarize the position reached.

(a) Statistics is the study of uncertainty.
(b) Uncertainty should be measured by probability.
(c) Data uncertainty is so measured, conditional on the parameters.
(d) Parameter uncertainty is similarly measured by probability.
(e) Inference is performed within the probability calculus, mainly by equations (1) and (2).

Points (a) and (b) have been discussed and (c) is generally accepted. Point (e) follows from (a)-(d). The main protest against the Bayesian position has been to point (d). It is therefore considered next.
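Before turning to that protest, a minimal computational sketch of equations (1) and (2) may help fix ideas; it is added here as an illustration and is not part of the paper. For normally distributed data with uncertain mean θ and variance α, the joint posterior is evaluated on a grid, normalized, and the nuisance parameter α summed out to give p(θ|x). The data, the priors and the grids are all illustrative assumptions.

```python
# Sketch of equations (1) and (2) by brute-force numerical integration.
# Data x_i ~ N(theta, alpha); independent illustrative priors:
# theta ~ N(0, 10^2), alpha ~ Exponential(rate 0.1). All choices are assumptions.
import math

x = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]                 # made-up data

def log_likelihood(theta, alpha):
    return sum(-0.5 * math.log(2 * math.pi * alpha) - (xi - theta)**2 / (2 * alpha) for xi in x)

def log_prior(theta, alpha):
    return -theta**2 / (2 * 10**2) - 0.1 * alpha

thetas = [i * 0.05 for i in range(-100, 101)]      # grid for theta
alphas = [0.05 + i * 0.05 for i in range(200)]     # grid for alpha > 0

# Equation (1): joint posterior up to a constant, then normalized over the grid.
joint = {(t, a): math.exp(log_likelihood(t, a) + log_prior(t, a)) for t in thetas for a in alphas}
total = sum(joint.values())

# Equation (2): marginal posterior for theta, summing (integrating) over alpha.
marginal_theta = {t: sum(joint[(t, a)] for a in alphas) / total for t in thetas}
print("posterior mode of theta:", max(marginal_theta, key=marginal_theta.get))
```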

8. Subjectivity

At the basic level from which we started, it is clear that one person's uncertainties are ordinarily different from another's. There may be agreement over well-conducted games of chance, so that they may be used as a standard, but on many issues, even in science, there can be disagreement. As this is being written, scientists disagree on some issues concerning genetically modified food. It might therefore be sensible to reflect the subjectivity in the notation. The preferred way to do this is to include the concept of a person's knowledge at the time that the probability judgment is made. Denoting that by K, a better notation than p(θ) is p(θ|K), the probability of θ given K, changing to p(θ|x, K) on acquiring the data. This notation is valuable because it emphasizes the fact that probability is always conditional (see Section 5). It depends on two arguments: the element whose uncertainty is being described and the knowledge on which that uncertainty is based. The omission of the conditioning argument often leads to confusion. The distinction between prior and posterior is better described by emphasizing the different conditions under which the probability of the parameter is being assessed.

It has been suggested that two people with the same knowledge should have the same uncertainties, and therefore the same probabilities. It is called the necessary view. There are two difficulties with this attitude. First, it is difficult to say what is meant by two people having the same knowledge, and also hard to realize in practice. The second point is that, if the probability does necessarily follow, it should be possible to evaluate it without reference to a person. So far no-one has managed the evaluation in an entirely satisfactory way. One way is through the concept of ignorance. If a state of knowledge I was identified, that described the lack of knowledge about θ, and that p(θ|I) was defined, then p(θ|K), for any K, could be calculated by Bayes's theorem (1) on updating I to K.

Unfortunately attempts to do this ordinarily lead to a conflict with conglomerability. For example, suppose that θ is known only to assume positive integer values and you are otherwise ignorant about θ. Then the usual concept of ignorance means all values are equally probable: p(θ = i|I) = c for all i. The addition rule for n exclusive events {θ = i} gives probability nc, which by convexity cannot exceed 1 for any n, so c = 0. Now partition the positive integers into sets of three, each containing two odd, and one even, value: Aₙ = (4n − 3, 4n − 1, 2n) will do. If E is the event that θ is even, p(E|Aₙ) = 1/3 and by conglomerability p(E) = 1/3. Another partition, each set with two even and one odd value, say Bₙ = (4n − 2, 4n, 2n − 1), has p(E|Bₙ) = 2/3 and hence p(E) = 2/3, in contradiction with the previous result. By the suitable selection of a partition, p(E) can assume any value in the unit interval. Such a distribution is said to be improper. Unfortunately most attempts to produce p(θ|I) by a necessary argument lead to impropriety which, in addition to violating conglomerability, leads to other types of unsatisfactory behaviour. See, for example, Dawid et al. (1973). The necessary view was first examined in detail by Jeffreys (1961). Bernardo (1999) and others have made real progress but the issue is still unresolved. Here the view will be taken that probability is an expression by a person with specified knowledge about an uncertain quantity. The person will be referred to as 'you'. p(A|B) is your belief about A when you know B.
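Returning to the impropriety example above, a quick computational check of the two partitions may help; it is an added illustration and not part of the paper. The sets Aₙ = {4n − 3, 4n − 1, 2n} and Bₙ = {4n − 2, 4n, 2n − 1} each cover every positive integer exactly once, with one even value in three in each Aₙ and two in three in each Bₙ, which is what drives the contradictory values 1/3 and 2/3.

```python
# Check that the A_n and the B_n each partition the positive integers and
# count the even values per set.
N = 1000                                   # how many sets of each family to inspect

A = [{4*n - 3, 4*n - 1, 2*n} for n in range(1, N + 1)]
B = [{4*n - 2, 4*n, 2*n - 1} for n in range(1, N + 1)]

for name, family in (("A", A), ("B", B)):
    union = set().union(*family)
    disjoint = sum(len(s) for s in family) == len(union)      # no integer appears twice
    covers = set(range(1, 2*N + 1)) <= union                  # first 2N integers all covered
    evens_per_set = {sum(1 for k in s if k % 2 == 0) for s in family}
    print(name, "disjoint:", disjoint, "covers 1..2N:", covers,
          "even values per set:", evens_per_set)
# A: one even value out of three in every set; B: two out of three.
```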
In this personal view, it is incorrect to refer to the probability; only to yours. It is as important to state the conditions as it is the uncertain event. Returning to the example of θ taking positive integer values, as soon as you understand the meaning of the object to which the Greek letter refers, you are, because of that understanding, no longer ignorant. Then p(θ = i) = aᵢ with Σᵢ aᵢ = 1. Some statistical research is vitiated by a confusion between the Greek letter and the reality that it represents. I challenge anyone to produce a real quantity about which they are truly ignorant. A further consideration is that a computer could not produce one example of θ. Since p(θ ≤ n|I) = nc = 0, p(θ > n) = 1, so θ must surely be larger than any value that you care to name, or the computer can handle.

A further matter requires attention. It is common to use the same concept and notation when part of the conditioning event is unknown but assumed to be true. For example, all statisticians use p(x|θ, α) as above, or, more accurately, p(x|θ, α, K). Here they do not know the value of the parameter. What is being expressed is uncertainty about x, if the parameters were to have values θ and α (and they had knowledge K). It is not necessary to distinguish between supposition and fact in this context and the notation p(A|B) is adequate.

The philosophical position is that your personal uncertainty is expressed through your probability of an uncertain quantity, given your state of knowledge, real or assumed. This is termed the subjective, or personal, attitude to probability.

Many people, especially in scientific matters, think that their statements are objective, expressed through the probability, and are alarmed by the intrusion of subjectivity. Their alarm can be alleviated by considering reality and how that reality is reflected in the probability calculus. We discuss in the context of science but the approach applies generally. Law provides another example. Suppose that θ is the scientific quantity of interest. (In criminal law, θ = 1 or θ = 0 according to whether the defendant did, or did not, commit the crime.) Initially the scientist will know little about θ, because the relevant knowledge base K is small, and two scientists will have different opinions, expressed through probabilities p(θ|K). Experiments will be conducted, data x obtained and their probabilities updated to p(θ|x, K) in the way already described. It can be demonstrated (see Edwards et al. (1963)) that under circumstances that typically obtain, as the amount of data increases, the disparate views will converge, typically to where θ is known, or at least determined with considerable precision. This is what is observed in practice: whereas initially scientists vary in their views and discuss, sometimes vigorously, among themselves, they eventually come to agreement. As someone said, the apparent objectivity is really a consensus. There is therefore good agreement here between scientific practice and the Bayesian paradigm. There are cases where almost all agree on a probability. These will be discussed when we consider exchangeability in Section 14.

It is now possible to see why there has been a reluctance to accept point (d), the use of a probability distribution to describe parameter uncertainty. It is because the essential subjectivity has not been recognized. With little data, p(θ, α) varies among subjects: as the data increase, consensus is reached. Notice that p(x|θ, α) is also subjective. This is openly recognized when two statisticians employ different models in their analysis of the same data set.
We shall return to these points when the role of models is treated in Sections 9 and 11.
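The convergence of initially disparate opinions mentioned above can be seen in a toy computation, added here as an illustration and not taken from the paper: two analysts hold different beta priors for a binomial proportion θ, and as binomial data accumulate their posterior means, obtained by conjugacy, come together.

```python
# Two different beta priors for a binomial proportion converge as data accumulate.
# Priors, data-generating value and sample sizes are illustrative assumptions.
import random

random.seed(1)
true_theta = 0.3
priors = {"analyst 1": (1.0, 1.0),       # uniform prior
          "analyst 2": (20.0, 5.0)}      # prior strongly favouring large theta

successes, trials = 0, 0
for n in (0, 10, 100, 1000, 10000):
    while trials < n:                    # extend the same data sequence
        successes += random.random() < true_theta
        trials += 1
    # Beta(a, b) prior with s successes in t trials has posterior mean (a + s)/(a + b + t).
    means = {name: (a + successes) / (a + b + trials) for name, (a, b) in priors.items()}
    print(trials, {k: round(v, 3) for k, v in means.items()})
# With few data the posterior means differ markedly; with many they nearly agree.
```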

9. Models

The topic of models has been carefully discussed by Draper (1995) from the Bayesian viewpoint and that paper should be consulted for a more detailed account than that provided here. The philosophical position developed here is that uncertainty should be described solely in terms of your probability. The implementation of this idea requires the construction of probability distributions for all the uncertain elements in the reality being studied. The complete probability specification will be called a (probability) model, though the terminology differs from that ordinarily used in statistics, in a way to be described later. It also differs from model as used in science. The statistician's task is to construct a model for the uncertain world under study. Having done this, the probability calculus enables the specific aspects of interest to have their uncertainties computed on the knowledge that is available. There are therefore two aspects to our study: the construction of the model and the analysis of that model. The latter is essentially automatic; in principle it can be done on a machine. The former requires close contact with reality. To paraphrase and exaggerate de Finetti, think when constructing the model; with it, do not think but leave it to the computer. We repeat the point already made that, in doing this, the subject, whose probabilities are being sought, the 'you' in the language adopted here, is not the statistician, but the client, often a scientist who has asked for statistical advice. The statistician's task is to articulate the scientist's uncertainties in the language of probability, and then to compute with the numbers found. A model is merely your reflection of reality and, like probability, it describes neither you nor the world, but only a relationship between you and that world. It is unsound to refer to the true model. One time that this usage can be excused is when most people are agreed on the model. Thus the model of the heights of fathers and sons as bivariate normal might reasonably be described as true.

What uncertainties are there in a typical scenario? The fundamental problem of inference and induction is to use past data to predict future data. Extensive observations on the motions of heavenly bodies enable their future positions to be calculated. Clinical studies on a drug allow a doctor to give a prognosis for a patient for whom the drug is prescribed. Sometimes the uncertain data are in the past, not the future. A historian will use what evidence he has to assess what might have happened where records are missing. A court of criminal law enquires about what had happened on the basis of later evidence. We shall, however, use the temporal image, with past data x being used to infer future data y (as x comes before y in the alphabet). In this view, the task is to assess p(y|x, K). In the interests of clarity, the background knowledge, fixed throughout the treatment, will be omitted from the notation and we write p(y|x).

One possibility is to try to assess p(y|x) directly. This is usually difficult, though it may be thought of as the basis of the apprenticeship system. Here an apprentice would sit at the master's feet and absorb the data x. With years of such experience, the apprentice could infer what would be likely to happen when he worked on his own. Successive observation on the use of ash in the construction of a wheel would enable him to employ ash for his own wheel. There is, however, a better way to proceed and that is to study the connections between x and y, and the mechanisms that operate.
Newton's laws enable the tides to be calculated. Materials science assists in the design and construction of a wheel. Most modern inference can be expressed through a parameter θ that reflects the connection between the two sets of data. Extending the conversation, strictly within the probability calculus, to include θ,

p(y|x) = ∫ p(y|θ, x) p(θ|x) dθ.

It is usual to suppose that, once θ is known, the past data are irrelevant. In probability language, given θ, x and y are independent. Combining this fact with the determination of p(θ|x) by Bayes's theorem, we have

p(y|x) = ∫ p(y|θ) p(x|θ) p(θ) dθ / ∫ p(x|θ) p(θ) dθ.

Now the task is to assess the uncertainty p(θ) about the parameter and the two data uncertainties, p(x|θ) and p(y|θ), given the parameter. Often the inference can stop at p(θ|x), leaving others to insert p(y|θ). This might happen with the drug example above, where the doctor would need to know the distribution of efficacy θ and then assess p(y|θ) for the individual patient.

Although much inference is rightly expressed in terms of the evaluation of p(θ|x), there is an important advantage in contemplating p(y|x). The advantage accrues from the fact that y will eventually be observed; the doctor will see what happens to the patient. The parameter is not usually observed. The uncertainty of θ often remains; that of y disappears. This feature enables the effectiveness of the inference to be displayed by using a scoring rule in an extended version of that described in Section 4. If the inference is p(y|x) and y is subsequently observed to be y₀, a score function S{y₀, p(·|x)} describes how good the inference was, so that the client and the statistician have their competences assessed. The method has been used in meteorology, e.g. in forecasting tomorrow's rainfall. Such methods are not readily available for p(θ|x). One of the criticisms that has been levelled against significance levels is that little study has been made of how many hypotheses, rejected at 5%, have subsequently been shown to be true. There is no reason to think that it is 5%. Theory suggests that it should be much higher and that significance is too easily attained. A weather forecaster who predicted rain on only 5% of days, when it subsequently rained on 20%, would not be highly esteemed. The best way to assess the quality of inferences is to check p(θ|x) through the data probabilities p(y|x) that they generate.

As previously mentioned, it is usually necessary to introduce nuisance parameters α, in addition to θ, to describe adequately the connection between x and y, and to establish the independence between them, given (θ, α). In the drug example, α might involve features of the individual patient. Nuisance parameters impose formidable problems for some forms of inference, like those based solely on likelihood, but, in principle, are easily handled within the probability framework by passing from the joint distribution of (θ, α) to the marginal for θ: equation (2).

The introduction of parameters reduces the construction of a model to providing p(x|θ, α) and p(θ, α). p(y|θ, α) also arises but its assessment is similar to that for x and need not be separately discussed. Here we see the distinction between our use of 'model' and that commonly adopted, where only the data distribution, given the parameters, is included. Our definition includes the distribution of the parameters, since they form an important part of the uncertainty that is present. Most of the current literature on models therefore concerns the data and is discussed in the next section. For the moment, we just repeat the point made earlier that even p(x|θ, α) is subjective. A common reason for wrongly thinking that it is objective lies in the fact that there is often more public information on the data than on the parameters, and we saw in Section 8 that, with increased information, people tend to approach agreement.

Why go through the ritual of determining p(x|θ, α) and p(θ, α), and then calculating p(θ|x)? If p(θ, α) can be assessed, why not assess p(θ|x) directly and avoid some complications? To use terminology that I do not like: if your prior can be assessed directly, why not your posterior?
Part of the answer lies in the information that is typically available about the data density, but the desire for coherence is the major reason. A set of uncertainty statements is said to be coherent if they satisfy the rules of the probability calculus. Thus, the pair of statements p(A|B) = 0.7 and p(A|¬B) = 0.4 does not cohere with the pair p(B|A) = 0.5 and p(B|¬A) = 0.3. (Here ¬B denotes the complement of B.) Think of A as a statement about data x and B as a statement about parameter θ. The first pair refers to uncertainties in the data and coheres with the first parameter statement, p(B|A) = 0.5, for data A. (Take p(B) = 0.4/1.1 = 0.36.) But all three do not cohere with the second parameter statement for data ¬A, that p(B|¬A) = 0.3. With p(B) = 0.36, the coherent value is 0.22. The standard procedure ensures that you are prepared for any values of the data, A or ¬A, and the final inferences about B will collectively make sense.
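The arithmetic behind these numbers is easy to reproduce; the short check below is an added illustration, not part of the paper. It recovers p(B) from the first three statements and then computes the coherent value of p(B|¬A) by Bayes's theorem.

```python
# Coherence check for the numerical example in the text.
p_A_given_B, p_A_given_notB = 0.7, 0.4        # the two data statements
p_B_given_A = 0.5                             # the first parameter statement

# p(B|A) = 0.5 says AB and A(not B) are equally probable, so
# 0.7 * p(B) = 0.4 * (1 - p(B)), giving p(B) = 0.4 / 1.1.
p_B = 0.4 / 1.1
print("p(B) =", round(p_B, 2))                # 0.36, as in the text

# Coherent value of p(B | not A) by Bayes's theorem.
p_notA_given_B = 1 - p_A_given_B
p_notA_given_notB = 1 - p_A_given_notB
p_notA = p_notA_given_B * p_B + p_notA_given_notB * (1 - p_B)
p_B_given_notA = p_notA_given_B * p_B / p_notA
print("coherent p(B|~A) =", round(p_B_given_notA, 2))   # 0.22, not the stated 0.3
```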

10. Data analysis

Much statistical work is not concerned with a mathematical system, whether frequentist or Bayesian, but operates at a less sophisticated level. When faced with a new set of data, a statistician will 'play around' with them, an activity called (exploratory) data analysis. Elementary calculations will be made; simple graphs will be plotted. Several valuable ideas have been developed for 'playing', such as histograms and box plots. We argue that this is an essential, important and worthwhile activity that fits sensibly into the philosophy. The view adopted here is that data analysis assists in the formulation of a model and is an activity that precedes the formal probability calculations that are needed for inference. The argument developed so far in this paper has demonstrated the need for probability. Data analysis puts flesh onto this mathematical skeleton. The only novelty that we add to conventional data analysis is the recognition that its final conclusions should be in terms of probability and should embrace parameters as well as data. In the language of the last section, the conclusions of data analysis should cohere.

The fundamental concept behind the measurement of uncertainty was the comparison with a standard. Such comparisons are often difficult and there is a need to find some replacement. We do not measure length by using krypton light, the standard, but employ other methods. Data analysis and the concept of coherence is such a replacement. Suppose that you need to assess a single probability; then all you have to guide you is the necessity that the value lies between 0 and 1. In contrast, suppose that the need is to assess several probabilities of related events or quantities, when the whole of the rich calculus of probabilities is available to help you in your assessments. In the example that concluded Section 9, you might have reached the four values given there, but considerations of coherence would force you to alter at least one of them. Coherence acts like geometry in the measurement of distance; it forces several measurements to obey the system. We have seen how this happens in replacing p(y|x) by p(x|θ, α) and p(θ, α). Let us consider this and its relationship with data analysis, considering first the data density p(x|θ, α).

A familiar and useful tool here is the histogram and modern variants like stem-and-leaf plots. These help to determine whether a normal density might be appropriate, or whether some richer family is required. If the data consist of two, or more, quantities x = (w, z), then a plot of z against w will help to assess the regression of z on w and hence p(z|w, θ, α). These devices involve the concept of repeated observations, e.g. to construct the histogram. We shall return to this point in discussion of the concept of exchangeability in Section 14.

There are issues here that have not always been recognized. You are making an uncertainty statement, p(x|θ, α), for a quantity x, which, with the data available, is for you certain. Moreover you are doing it with real thought, in the form of data analysis about x. It is strange only to use uncertainty (probability) for the only certain quantity present. Furthermore, suppose that t(x) describes the aspects of the data that you have considered, the histogram or the regression. Then the result of the data analysis is really p{x|θ, α, t(x)}; you are conditioning on t(x).
For example, you might say that x is normal with mean θ and variance α, but only after seeing t(x), or equivalently doing the data analysis. This may lead to spurious precision in the subsequent calculations. One way to proceed would be to construct the model without looking at the data. Indeed, this is necessary when designing the experiment (Section 16). The construction could only come in close consultation with the client and would involve larger models than are currently used. Perhaps data analysis can be regarded as approximate inference, clearing out the grosser aspects of the larger model that are not needed in the operational, smaller model. Another point is that p(x|θ, α), say in the form of a histogram, is only exhibited for one value of (θ, α), namely the uncertain value that holds there. The data contain little to suggest that, even if x ~ N(θ₀, α₀) there, it is N(θ, α) in situations unobserved. There is a case therefore for making models as big as your computing power will accommodate, to allow for non-normality and general parameter values. The size of a model is discussed in Section 11. Notice that the difficulties raised in the last two paragraphs are as relevant to the frequentist as they are to the Bayesian.

The assessment problem is different when it comes to the parameter density because there is often no repetition and the familiar tools of data analysis are no longer available. Furthermore, in handling the data density, several standard models are readily available, e.g. the exponential family and methods built around GLIM. These models have primarily been designed for ease of analysis through the possession of special properties like sufficient statistics of fixed low dimensionality, though they have the difficulty that outliers are not easily accommodated. These constraints have been imposed partly through limitations of computer capacity but more importantly because, within the frequency approach, there are no general principles and a new model may require the introduction of new ideas. Modern computational techniques lessen the first difficulty and Bayesian methods, with their ubiquitous use of the probability calculus, remove the second entirely; the object is always to calculate p(θ|x). We shall return to this point in Section 15.

Few standard models are available for the parameter density, essentially limited to the densities that are conjugate to the member of the exponential family chosen for the data density. The frequentist chant is 'where did you get that prior?'. It is not a silly gibe; there are serious difficulties but they are partly caused by a failure to link theory and practice. I have often seen the stupid question posed 'what is an appropriate prior for the variance σ² of a normal (data) density?'. It is stupid because σ is just a Greek letter. To find the parameter density, it is essential to go beyond the alphabet and to investigate the reality behind σ². What is it the variance of? What range of values does the client think it has? Recall that the statistician's task is to express the uncertainty of you, the client, in probability terms. A sensible form of question might be 'what is your opinion about the variability of systolic blood pressure in healthy, middle-aged males in England?'. But, even with careful regard for practice, it would be stupid to deny the existence of very real, and largely unexplored, problems here. This is especially true when, as in most cases, the parameter space has high dimensionality. We are lacking in methods of appreciating multivariate densities.
(This is true of data as well as parameters.) Physicists did not deny Newton's laws because several of the ideas that he introduced were difficult to measure. No, they said that the laws made sense, they work where we can measure, so let us develop better methods of measurement. Similar considerations apply to probability. A neglected area of statistical research is the expression of multivariate opinion in terms of probability, where independence is invoked too often, on grounds of simplicity, ignoring reality. It is not often recognized that the notion of independence, since it involves probability, is also conditional. The mantra that x₁, x₂, ..., xₙ forming a random sample are independent is ridiculous when they are used to infer xₙ₊₁. They are independent, given θ.

It is sometimes argued that data analysis can make no contribution to the assessment of a distribution for the parameter because it involves looking at the data, whereas what is needed is a distribution prior to the data. This is countered by the observation that we all use data to suggest something and then consider what our attitude to it was without the data. You see a sequence of 0s and 1s and notice few, but long, runs. Could the sequence be Markov instead of exchangeable as you had anticipated? You think about reasons for the dependence and, having decided that a Markov chain is possible, think about its value. Had you seen 1s only when the order was prime, you would fail to find reasons and accept the extraordinary thing that has happened.
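As a concrete, and entirely hypothetical, illustration of the kind of elicitation urged above, suppose the client, asked about the variability of systolic blood pressure in healthy middle-aged males, judges the standard deviation to be most likely about 12 mmHg and almost certainly between 8 and 18 mmHg. The sketch below turns that judgment into a lognormal prior for σ; the numbers and the choice of a lognormal family are assumptions made for the example, not recommendations from the paper.

```python
# Turning an elicited range for a standard deviation into a lognormal prior.
# Client's (hypothetical) judgment: sigma is about 12 mmHg and with roughly
# 95% certainty lies between 8 and 18 mmHg.
import math

median, low, high = 12.0, 8.0, 18.0

mu = math.log(median)                                   # lognormal location
# Use the wider of the two log-distances to the stated limits for the scale.
s = max(mu - math.log(low), math.log(high) - mu) / 1.96

def lognormal_quantile(p):
    """Quantile of the fitted lognormal, inverting the normal CDF by bisection."""
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if 0.5 * (1 + math.erf(mid / math.sqrt(2))) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(mu + s * (lo + hi) / 2)

print("fitted mu, s:", round(mu, 3), round(s, 3))
print("implied 2.5% and 97.5% points:", round(lognormal_quantile(0.025), 1),
      round(lognormal_quantile(0.975), 1))              # close to the stated 8 and 18
```

Feeding the implied quantiles back to the client, and revising until they are content, is one simple way of linking the Greek letter to the reality it represents.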

11. Models again

A model is a probabilistic description of a client's situation, whose assessment is helped by data analysis and exploration of the client's present understanding. Several problems remain, of which one is the size of the model. Should you include extra quantities, besides x, as covariates? Should the parameters increase in number to offer greater flexibility, replacing a normal distribution by a Student's t, say? Savage once gave the wise advice that a model should be as big as an elephant. Indeed, the ideal Bayesian has one model embracing everything: what has been termed a world view. Such a model is impractical and you must be content with a small world embracing your immediate interests. But how small should it be? Really small worlds have the advantage of simplicity and the possibility of obtaining many results, but they have the disadvantage that they may not capture your understanding of reality so that p(y | x) based on them may have a high penalty score. Compromise is called for, but always choose the largest model that your computational powers will tolerate. One successful strategy is to use a large model and to determine, through robustness studies, what aspects of the model seriously affect your final conclusion. Those that do not can be ignored and some reduction in size achieved.

It is valuable to think about the relationships between the small world selected and the larger worlds that contain it. In England it is current practice to publish league tables of schools' performances that use only examination results. Many contend that this is a ridiculously small world and that other quantities, like the performance of pupils at admission, should be included. It is sometimes said that, in our approach, a smaller world cannot fit into a larger one and that, if the former is found to be inadequate, it is necessary to start afresh. This is not so; the apprehension arises through a failure to appreciate the conditional nature of probability. Here is an example. Suppose that your model is that x ~ N(θ, α). Then, in full, you are describing p(x | θ, α, N, K), where N denotes normality. In words, knowing K and supposing normality, x has mean θ and variance α. If the presence of outliers suggests an extension to Student's t, so that x ~ t(θ, α, ν) with index ν, then the two models cohere, the former having the restriction, or condition, that ν = ∞. In contemplating t, you will already have considered large values of ν. Typically, in passing from a small to a large model, the former will correspond to the latter under conditions and the smaller is embedded in the larger. Occasionally, this appears not to be so. For example, one model may say that x is normal; the other that log(x) is. One way out of this difficulty is to introduce a series of transformations with x and log(x) as two members, as suggested by Box and Cox (1964). If this is not possible and you are genuinely uncertain whether model M1 or model M2 obtains, then describe your uncertainty by probability, producing a model that has M1 with probability γ and M2 with probability 1 − γ. Part of the inferential problem will be the passage from γ to p(M1 | x). This is a problem that has been discussed (O'Hagan, 1995), and where impropriety is best avoided and conglomerability assumed.

Large models have been criticized because they can sometimes appear to produce unsatisfactory results in comparison with smaller models.
For example, in considering the regression of one quantity on many others, you are urged not to include too many regressor variables, because to do so leads to overfitting. This undesirable feature comes about through the use of frequentist methods. A theorem within the Bayesian paradigm shows that the phenomenon cannot arise with a coherent analysis, essentially because maximization over a subset cannot exceed that over the full set. The issue is connected with conglomerability (Section 5) because the method of fitting that is ordinarily used, least squares, is equivalent to a Bayesian argument using an improper prior, namely a uniform distribution over the space of the regression parameters. This does not cause offence when the dimension of the space is low, but causes increasing difficulties as it grows (Stein, 1956), and hence the overfitting.

Statisticians have, over the years, developed a collection of standard models, some of which are so routine that computer packages for their implementation exist. Although these, when modified from their frequentist form to provide a coherent analysis, are indubitably valuable, they should never replace your careful construction of a model from the practical realities. We repeat the important advice to think in constructing the model: once that has been done, leave everything to the probability calculus. An illustration is provided by the inconvenient phenomenon of non-response in sample surveys. Here it is important to think about the mechanisms that gave rise to the lack of response, and to model them. Some models in the literature do not flow from any real understanding of why the data are incomplete, and they are therefore suspect. The client's reality must be modelled in probability terms.

The suggestion has often been made that it is possible to test the adequacy of a model, without the specification of alternatives, and methods for doing this have been developed (Box, 1980). We argue that the rejection of a model is not a reality except in comparison with an alternative that appears better. The reason lies in the nature of probability which is essentially a comparative measure. The Bayesian world is a comparative world in which there are no absolutes. The point will emerge again in decision analysis in Section 15, where you decide to do something, not because it is good, but because it is better than anything else that you can think of. People who refuse to vote in an election on the grounds that no candidate meets their requirements miss the point that, with the limited availability of candidates, you should choose the one whom you think is best, even if awful.

12. Optimality

The position has been reached that the practical uncertainties should be described by probabilities, incorporated into your model and then manipulated according to the rules of the probability calculus. We now consider the implications that the manipulations within that calculus have on statistical methods, especially in contrast with frequentist procedures, thereby extending the discussion of significance tests and confidence intervals in Section 6. It is sometimes said, by those who use Bayes estimates or tests, that all the Bayesian approach does is to add a prior to the frequentist paradigm. A prior is introduced merely as a device for constructing a procedure, that is then investigated within the frequentist framework, ignoring the ladder of the prior by which the procedure was discovered. This is untrue: the adoption of the full Bayesian paradigm entails a drastic change in the way that you think about statistical methods.
A large amount of effort has been put into the derivation of optimum tests and estimates. This is evident on the theoretical side where the splendid scholarly books of Lehmann (1983, 1986) are largely devoted to methods of finding good estimates and tests respectively. Again, more informally, in data analysis, reasons are advanced for using one procedure rather than another, as when trimmed means are rightly said to be better than raw means in the presence of outliers. Let us therefore look at inference, in the sense of saying something about a parameter θ, given data x, in the presence of nuisance parameters α. The frequentist may seek the best point estimate, confidence interval or significance test for θ. A remarkable, and largely unrecognized, fact is that, within the Bayesian paradigm, all the optimality problems vanish; a whole industry disappears. How can this be? Consider the recipe. It is to calculate p(θ | x, K), the density for the parameter of interest given the data and background knowledge. This density is a complete description of your current understanding of θ. There is nothing more to be said. It is an estimate: your only estimate. Integrated over a set H, it provides your entire understanding of whether H is true. There is nothing better than p(θ | x, K). It is unique; the only candidate. Consider the case of the trimmed means just mentioned. If the model incorporates simple normality, the density for θ is approximately normal about x̄, the sample mean. However, suppose that normality is replaced by Student's t (with two nuisance parameters, spread and degrees of freedom); then the density for θ will be centred, not on x̄, but on what is essentially a trimmed mean. In other words, the estimate arises inevitably and not because of optimality considerations; a small numerical sketch of this effect is given at the end of this section.

The Bayesian's unique estimate, the posterior distribution, depends on the prior, so there is some similarity between the Bayesian and the frequentist who uses a prior to construct their optimum estimates. (The class of good frequentist procedures is the Bayes class.) The real difference is that the frequentist will use different criteria, like the error rate, rather than coherence, to judge the quality of the resulting procedure. This is discussed further in Section 16.
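Here is the promised sketch. It is not part of the argument above: the data, the fixed scale and degrees of freedom, and the flat prior on a grid are all simplifying assumptions, made so that the posterior centres can be computed in a few lines.

# With a deliberately contaminated sample, the posterior for the location theta
# under a normal likelihood is centred near the raw mean, while under a
# Student's t likelihood it moves towards the bulk of the data, much as a
# trimmed mean would. Scale and degrees of freedom are fixed here for brevity
# rather than integrated out as nuisance parameters.
import math

data = [9.8, 10.2, 10.1, 9.9, 10.0, 25.0]    # the last value is an outlier
scale, dof = 1.0, 3.0                        # fixed for the illustration

def log_normal_lik(theta):
    return sum(-0.5 * ((x - theta) / scale) ** 2 for x in data)

def log_t_lik(theta):
    return sum(-0.5 * (dof + 1) * math.log(1 + ((x - theta) / scale) ** 2 / dof)
               for x in data)

def posterior_mean(log_lik):
    # Flat prior over a wide grid; normalize and take the mean of theta.
    grid = [i / 100 for i in range(500, 3000)]      # theta from 5.00 to 29.99
    w = [math.exp(log_lik(t)) for t in grid]
    total = sum(w)
    return sum(t * wi for t, wi in zip(grid, w)) / total

print("raw mean                 :", sum(data) / len(data))
print("posterior mean, normal   :", posterior_mean(log_normal_lik))
print("posterior mean, Student t:", posterior_mean(log_t_lik))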

13. The likelihood principle

We have seen that parametric inference is made by calculating

p(θ | x) = p(x | θ) p(θ) / ∫ p(x | θ) p(θ) dθ.    (3)

Consider p(x | θ) as a function of two quantities, x and θ. As a function of x, for any fixed θ, p(· | θ) is a probability density, namely it is positive and integrates, over x, to 1. As a function of θ, for any fixed x, p(x | ·) is positive but does not usually integrate to 1. It is called the likelihood of θ for the fixed x. It is immediate from equation (3) that the only contribution that the data make to inference is through the likelihood function for the observed x. This is the likelihood principle, that values of x, other than that observed, play no role in inference. A valuable reference is Berger and Wolpert (1988). This fact has important recognized consequences. Whenever in inference an integration takes place over values of x, the principle is violated and the resulting procedure may cease to be coherent. Unbiased estimates and tail area significance tests are among the casualties. The likelihood function therefore plays a more important role in Bayesian statistics than it does in the frequentist form, yet likelihood alone is not adequate for inference but needs to be tempered by the parameter distribution. Uncertainty must be described by probability, not likelihood. Before enlarging on this remark, it is important to be clear what is meant by likelihood. If a model with data x has been developed with parameters (θ, α), then p(x | θ, α) as a function of (θ, α), for the fixed observed value of x, is undoubtedly the likelihood function. However, inference in equation (3) does not involve the entire likelihood function, but only its integral

p(x | θ) = ∫ p(x | θ, α) p(α | θ) dα.    (4)

We refer to this as the likelihood of θ but the terminology is not always accepted. The reason is clear: its construction involves one aspect, p(α | θ), of the parameter density, p(θ, α), which latter is not admitted to the frequentist or likelihood schools. In neither school is there general agreement about what constitutes the likelihood function for a parameter θ of interest in the presence of a nuisance parameter α. There are at least a dozen candidates in the literature. For example, in addition to the integrated form in equation (4), there is p(x | θ, α̂), where α̂ is the value that makes p(x | θ, α), over α, a maximum. The plethora of candidates reflects the impossibility of any satisfactory definition that avoids the intrusion of probabilities for the parameters.

The reason for likelihood being, on its own, inadequate is that, unlike probability, it is not additive. If A and B are two exclusive sets, then p(A ∪ B) = p(A) + p(B), omitting the conditions, whereas it is not true that l(A ∪ B) = l(A) + l(B) for a likelihood function l(·). Since the properties used as axioms in the development of inference, e.g. in the work of Savage, lead to additivity, any violation may lead to some violation of the axioms. This happens with likelihood. In Section 5 we had an example involving colour and sex, which was expressed in terms of the informal concept of one event being more likely than another. In fact, the example holds when 'likely' is used in the technical sense as defined here. Likelihood is an essential ingredient in the inference recipe but it cannot be the only one.

Notice that the likelihood principle only applies to inference, i.e. to calculations once the data have been observed. Before then, e.g. in some aspects of model choice, in the design of experiments or in decision analysis generally, a consideration of several possible data values is essential (see Section 16).
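A small numerical sketch of the integrated likelihood (4) may help; all its ingredients are invented for illustration. The data density is normal with mean θ, the parameter of interest, and standard deviation α, the nuisance; α is given a crude three-point conditional density and is removed by the addition rule.

# Equation (4) by direct summation: p(x | theta) = sum_alpha p(x | theta, alpha) p(alpha | theta).
import math

x_obs = [1.2, 0.7, 1.9, 1.4]                 # observed data (illustrative)
alpha_support = [0.5, 1.0, 2.0]              # possible values of the nuisance alpha
p_alpha = [0.25, 0.50, 0.25]                 # p(alpha | theta), assumed free of theta here

def density(x, theta, alpha):
    return math.exp(-0.5 * ((x - theta) / alpha) ** 2) / (alpha * math.sqrt(2 * math.pi))

def likelihood_full(theta, alpha):
    prod = 1.0
    for x in x_obs:
        prod *= density(x, theta, alpha)
    return prod

def likelihood_integrated(theta):
    return sum(likelihood_full(theta, a) * pa for a, pa in zip(alpha_support, p_alpha))

for theta in (0.0, 0.5, 1.0, 1.3, 1.5, 2.0):
    print(f"theta = {theta:3.1f}   integrated likelihood = {likelihood_integrated(theta):.6f}")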

14. Frequentist concepts

Ever since the 1920s, statistics has been dominated by the frequentist approach and has, by any sensible criterion, been successful; yet we have seen that it clashes with the coherent view in apparently serious ways. How can this be? Our explanation is that there is a property, shared by both views, that links them more closely than the material so far presented here might suggest. The link is the concept of exchangeability. A sequence (x1, x2, ..., xn) of uncertain quantities is, for you, exchangeable under conditions K if your joint probability distribution, given K, is invariant under a permutation of the suffixes. For example, p(x1 = 3, x2 = 5 | K) = p(x2 = 3, x1 = 5 | K) on permuting 1 and 2. An infinite sequence is exchangeable if every finite subsequence is so judged. The roles of 'you' and K have been mentioned to emphasize that exchangeability is a subjective judgment and that you may change your opinion if the conditions change. If you judge a sequence to be (infinitely) exchangeable, then your probability structure for the sequence is equivalent to introducing a parameter, ψ say, such that, given ψ, the members of the sequence are independent and identically distributed (IID). As the parameter is uncertain, you will have a probability distribution for it. This result is due to de Finetti (1974, 1975). Ordinarily ψ will consist of elements (θ, α) of which θ is of interest and α is nuisance. Consequently exchangeability imposes the structure used above but with the addition that the data x have the particular form of IID components. Furthermore, ψ is related to frequency properties of the sequence. Thus, in the simple case where xi is either 0 or 1, the Bernoulli sequence, ψ is the limiting proportion of them that are 1. Consequently, a Bayesian who makes the exchangeability judgment is effectively making the same judgment about data as a frequentist, but with the addition of a probability specification for the parameter.

The concept of IID observations has dominated statistics in this century. Even when obviously inappropriate, as in the study of time series, the modelling uses IID as a basis. For example, x_t − θx_{t−1} may be supposed IID for some θ, leading to a linear, autoregressive, first-order process. Within the IID assumption, frequency ideas are apposite, some even within the Bayesian canon, so there has developed a belief that uncertainty and probability are therefore based on frequency. Some statistics texts only deal with IID data and therefore restrict the range of statistical activities. Their examples will come from experimental science, where repetition is basic, and not from law, where it is not. Frequency, however, is not adequate because there is ordinarily no repetition of parameters; they have unique unknown values. Consequently the confusion between frequency and probability has denied the frequentist the opportunity of using probability for parameter uncertainty, with the result that it has been necessary for them to develop incoherent concepts like confidence intervals.

The use of frequency concepts outside exchangeability leads to another difficulty. Frequentists often support their arguments by saying that they are justified 'in the long run', to which the coherent response is 'what long run?'.
For example, a confidence interval (see Section 6) will cover the true value a proportion 1 − α of times in the long run. To make sense of this it is necessary to embed the particular case of data x into a sequence of similar data sets: which sequence? what is similar? The classic example is a data set consisting of r successes in n trials, judged to be Bernoulli. In the sequence do we fix n, or fix r or some other feature of the observed data? It matters. Bayesians provide an answer for the single situation, whereas frequentists often need to embed the situation into a sequence of situations.

The restriction of probability to frequency can lead to misrepresentations. Here is an example, concerning the determination of physical constants, such as the gravitational constant G. It is common and reasonable to suppose that the measurements made at one place and time are exchangeable and unbiased, each having expectation G. It is reasonable to use their mean as the current estimate of G. Some rejection of outliers may be needed before this is done. The uncertainty attached to this estimate is found by taking s², equal to the average of the squared deviations from the mean, and quoting a standard error of s/√n, where n is the number of measurements. This leads to confidence limits for G. Experience shows that the more recent estimates usually lie outside the confidence limits of earlier estimates. In other words, the limits were too narrow. A scoring rule for estimators of G would produce a large penalty score. The reason is that the measurements are actually biased. Since the amount of the bias is not amenable to frequency ideas, it is ignored. The Bayesian approach would have a distribution for the bias and would use as a prior for G the posterior from the last estimate, possibly adjusted for any modifications in the measurement process. Often standard errors are too small because only the exchangeable component of uncertainty is considered. Similar mistakes can arise with the predictions of future numbers of cases of acquired immune deficiency syndrome. They can ignore changes in personal behaviour or Government policy, changes that are not amenable to frequentist analysis.
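De Finetti's result can be illustrated with the smallest possible example. In the sketch below the parameter ψ, the limiting proportion of 1s, is given a two-point distribution (the values and weights are illustrative assumptions); given ψ the trials are IID, yet marginally they are exchangeable and plainly dependent.

# A Bernoulli sequence built from de Finetti's representation: mix IID sequences
# over a distribution for psi, then inspect the marginal probabilities.
psi_values = [0.2, 0.8]       # possible limiting proportions of 1s (assumed)
psi_probs = [0.5, 0.5]        # your probabilities for them (assumed)

def p_sequence(bits):
    """Marginal probability of a 0-1 sequence: sum over psi of the IID probability."""
    total = 0.0
    for psi, w in zip(psi_values, psi_probs):
        prob = 1.0
        for b in bits:
            prob *= psi if b == 1 else 1 - psi
        total += w * prob
    return total

p1 = p_sequence([1])                       # P(x1 = 1), equal to P(x2 = 1) by exchangeability
p11 = p_sequence([1, 1])                   # P(x1 = 1, x2 = 1)
print("P(x2 = 1)          =", p1)          # 0.5
print("P(x2 = 1 | x1 = 1) =", p11 / p1)    # 0.68: dependent marginally, IID given psi
# Exchangeability: any ordering of the same 0s and 1s has the same probability.
for seq in ([1, 0, 0], [0, 1, 0], [0, 0, 1]):
    print(seq, round(p_sequence(seq), 10))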

15. Decision analysis

It has been noted how statistics began with the collection and presentation of data, and then extended to include the treatment of the data and the process which we now call inference. There is a further stage beyond that, namely the use of data, and the inferences drawn from them, to reach a decision and to initiate action. In my view, statisticians have a real contribution to make to decision analysis and should extend their data collection and inference to include action. The methods of Ramsey and Savage have demonstrated how the foundations can be presented through decision analysis. The extension to include action can be better understood if we ask what is the purpose of an inference that consists in calculating p(y | x) for future data y, conditional on past data x. An example cited was a doctor who had data on a drug and wished to infer what might happen to a patient given the drug. The example involves a decision, namely which drug to prescribe, another drug possibly leading to a different inference for y. We argue, following Ramsey, that an inference is only of value if it is capable of being used to initiate action. Partial knowledge that cannot be used is of little value. Even in its parametric form, p(θ | x) will only be worthwhile if it can be incorporated into actions that involve the uncertain θ. Marx was right: the point is not just to understand the world (inference) but also to change it (action). Let us see how this can be done in the Bayesian view.

The structure used by Savage and others is to formulate a list of possible decisions d that might be taken. The uncertainty is captured in a quantity (or parameter) θ. The pair (d, θ) is termed a consequence, describing what will happen if you take decision d when the parameter has value θ. We have seen how the uncertainty in θ needs to be described by a probability distribution p(θ). This will be conditional on your state of knowledge, which is omitted from the notation. It may also depend on the decision, as in the case where the decisions are to invest in advertising, or not, and θ is next year's sales. We therefore write p(θ | d). The foundational argument goes on to show that the merits of the consequence (d, θ) can be described by a real number u(d, θ), termed the utility of the consequence. One consequence is preferred to another if it has the higher utility. If these utilities are constructed in a sensible way, the best decision is that which maximizes your expected utility

∫ u(d, θ) p(θ | d) dθ.

The addition of a utility function for consequences, combined with the probability description of uncertainty, leads to a solution to the decision problem. Utility has to be described with care. It is not merely a measure of worth, but a measure of worth on a probability scale. If the best consequence has utility 1 and the worst utility 0, then consequence (d, θ) has utility u(d, θ) if you (notice the subjective element) are indifferent between (a) the consequence for sure and (b) a chance u(d, θ) of the best (and 1 − u(d, θ) of the worst). It is this probability construction that enables the expectation to emerge as the only relevant criterion for the choice of decision. Utility embraces all aspects of the consequence. For example, if one outcome of a gamble is a win of £100, its utility includes not only an increase in monetary assets but also the thrill of the gamble. Some analyses, based solely on money, are defective because of their limited view of utility.

Notice that, just as p(θ) is not the statistician's uncertainty, but rather the client's, so the utility is that of the decision maker. The statistician's role is to articulate the client's preferences in the form of a utility function, just as it is to express their uncertainty through probability. Notice also that the analysis supposes that there is only one decision maker, the 'you' of our text, though 'you' may be several individuals forming a group, making a collective decision. None of the arguments given here apply to the case of two, or more, decision makers who do not have a common purpose, or may even be in conflict. This is an important limitation on maximized expected utility.

One topic that statisticians have often considered their own, at least since the brilliant work of Fisher (1935), is the design of experiments. This is a decision problem and fits neatly into the principles just enunciated. Let e be a member of a class of possible experiments from which one must be selected. Let x denote data that might arise from such an experiment. The experimentation presumably has some purpose, expressed by the selection of a (terminal) decision d. As usual, denote the uncertain element by θ. (A similar analysis applies when the inference is for future data y.) The final consequence of experimentation and action is (e, x, d, θ) to which you attach a utility u(e, x, d, θ). The expected utility is

∫ u(e, x, d, θ) p(θ | e, x, d) dθ,    (5)

the uncertainty being conditional on all the other ingredients. The optimum decision is that d which maximizes expression (5). Denote the maximum value so obtained by ū(e, x). The expectation of this is

∫ ū(e, x) p(x | e) dx,    (6)

since x is the only uncertain element at this stage, the uncertainty being clearly dependent on the experiment e. A final maximization of expression (6) provides the optimum experimental design. Notice the simplicity of the principles that are involved here, even though the technical manipulations may be formidable. There is a temporal sequence that alternates between taking an expectation over the quantities that are uncertain and maximizing over the decisions that are available. Each uncertainty must be evaluated conditionally on all that is known then. The utility is attached to the final outcome, other (expected) utilities, like ū(e, x), being derived therefrom. This provides a formal framework for the design of experiments.

It would appear to be a sensible criticism of the method just outlined that many experiments are not conducted with a terminal decision in mind but merely to gather information about θ. This aspect can be accommodated by extending the interpretation of a decision. Information about θ depends on your uncertainty about θ expressed, as always, by probability. So let the decision d be to select the relevant density, here p(θ | e, x). A utility function can then be constructed. Often it is reasonable to suppose u additive in the sense that

u(e, x, d, θ) = u(e, x) + u(d, θ),    (7)

the first term involving the experimental cost and the second the terminal consequences. Notice the connection between u(d, θ) and the scoring rules suggested in Section 9. Here u(d, θ) may be thought of as a reward score attached to decision d to announce p(θ | e, x) when the parameter has value θ0. The usual measure of the information provided by p(θ) is Shannon's,

∫ p(θ) log{p(θ)} dθ.

The language of decision analysis has been used by Neyman and others in connection with hypothesis testing, where they speak of the decisions to accept, and to reject, the hypothesis H. There are cases where acceptance and rejection can legitimately be thought of as action, as with the rejection of a batch of items. Equally there are other cases where we could calculate p(H | x, K) as an inference about H on data x and knowledge K. The latter form may, as in the last paragraph, be thought of as a decision. Both forms are valid and useful for different purposes. Our philosophy accommodates both views and it is for you to consider how to model the reality before you. An important feature of the Bayesian paradigm is its ability to encompass a wide variety of situations using a few basic principles.

Some writers, in discussing hypothesis testing, have argued that there are many different cases. For example, some may really involve action; some are purely inferential. Other cases have been described, ending up, as with likelihood, in a plethora of situations and great complexity. The Bayesian view is that these are all covered by the general principles and that the differences perceived are differences in the probability and utility structures. Some folk love complexity for it hides inadequacies and even errors.
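The alternation between expectation and maximization in expressions (5) and (6) can be made concrete with a deliberately tiny sketch. It is not from the paper: the binary parameter, the two candidate experiments and all the numbers are illustrative assumptions, and the experimental cost is additive in the sense of equation (7).

# For each experiment e and each possible datum x, maximize expected utility over
# the terminal decisions d (expression (5)); average the maxima over x to score
# the experiment (expression (6)); then pick the best experiment.
thetas = [0, 1]
prior = {0: 0.7, 1: 0.3}                      # p(theta), assumed
utility = {(0, 0): 1.0, (0, 1): 0.0,          # u(d, theta) for terminal decision d
           (1, 0): 0.2, (1, 1): 1.0}
# p(x = 1 | theta, e): 'good' is more informative than 'cheap' but costs more.
experiments = {"cheap": {"cost": 0.00, "lik": {0: 0.4, 1: 0.6}},
               "good":  {"cost": 0.05, "lik": {0: 0.1, 1: 0.9}}}

def preposterior_value(e):
    spec = experiments[e]
    value = 0.0
    for x in (0, 1):
        # Joint p(x, theta | e), the marginal p(x | e) and the posterior p(theta | e, x).
        joint = {t: prior[t] * (spec["lik"][t] if x == 1 else 1 - spec["lik"][t])
                 for t in thetas}
        px = sum(joint.values())
        post = {t: joint[t] / px for t in thetas}
        # Expression (5): the best terminal decision for this datum.
        best = max(sum(utility[(d, t)] * post[t] for t in thetas) for d in (0, 1))
        value += px * best                    # expression (6): expectation over x
    return value - spec["cost"]               # additive cost, as in equation (7)

for e in experiments:
    print(e, round(preposterior_value(e), 4))
print("optimum experiment:", max(experiments, key=preposterior_value))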

16. Likelihood principle (again)

In Section 13 it was seen how the likelihood principle is basic for inference, yet denied by many frequentist notions. The principle ceases to apply when experimental design is part of the decision analysis, essentially because of the integration over x involved in expression (6). At the initial stage, where you are considering which experiment to perform, the data, conditional on any experiment selected, are uncertain for you. This uncertainty is expressed through p(x | e) and is eliminated by the operation of expectation in expression (6). In conducting an inference, or in making a terminal decision, you know the value of x, for the data are available. Consequently it is unnecessary to consider other data values and the likelihood is all that is needed. When it is a question of experimental design, the data are surely not available and all possibilities must be contemplated. This contrast between pre- and post-data emphasizes the importance of the conditions when you face uncertainty. Probability is a function of two arguments, not one.

Just how the consideration of the experiment can involve one form of integration over x used by frequentists, namely error rates, can be seen as follows. Denote by d*(e, x) that decision which maximizes the expected utility (5). The expectation over x, expression (6), can then be written

∫∫ u{e, x, d*(e, x), θ} p{θ | e, x, d*(e, x)} dθ p(x | e) dx.

Now

p{θ | e, x, d*(e, x)} = p(θ | e, x),

since the addition of d*, a function of e and x, adds no further condition. The latter probability is p(x | e, θ) p(θ | e)/p(x | e). Inserting this value into the expectation and reversing the orders of integration, we have

∫ [ ∫ u{e, x, d*(e, x), θ} p(x | e, θ) dx ] p(θ | e) dθ,

where the inner integral exposes the frequentist integration over x. For a fixed experiment, and with a utility that does not directly involve x, the relevant integral is

∫ u{d*(e, x), θ} p(x | e, θ) dx.

With two decisions and 0-1 utility, we immediately have ∫ p(x | e, θ) dx over a subset of the sample space and the familiar errors of the two kinds.

The occurrence of error rates leads to some confusion because they are often treated as the quantities to be controlled, and therefore occupy a primary position in decision analysis, whereas our primary consideration lies in the utility structure. Once the utility structure has been imposed, the errors will look after themselves. However, a consideration of different errors may lead to undesirable changes in the utility structure. The Bayesian view is that the utilities, not the errors, are the invariants of the analysis. For example, to design an experiment to achieve prescribed error rates may be incoherent. The prescription should instead specify utilities.
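A minimal sketch, under assumed numbers, of the remark that the errors look after themselves: with two decisions and 0-1 utility the coherent rule is simply to choose the more probable hypothesis, and the error rates of the two kinds are then consequences of that rule rather than quantities fixed in advance.

# Binomial data; H says the success probability is p0, not-H says it is p1.
from math import comb

n = 10                                   # sample size (illustrative)
p0, p1 = 0.3, 0.6                        # success probabilities under H and not-H
prior_H = 0.5                            # p(H), assumed

def lik(r, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

accept_H = []                            # data values for which 'accept H' is optimal
for r in range(n + 1):
    post_H = prior_H * lik(r, p0) / (prior_H * lik(r, p0) + (1 - prior_H) * lik(r, p1))
    if post_H > 0.5:                     # 0-1 utility: choose the more probable hypothesis
        accept_H.append(r)

type_I = sum(lik(r, p0) for r in range(n + 1) if r not in accept_H)   # reject H when H true
type_II = sum(lik(r, p1) for r in accept_H)                            # accept H when H false
print("accept H for r in", accept_H)
print("implied error rates:", round(type_I, 4), round(type_II, 4))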

17. Risk

Risk is a term which we have not used. It has been defined (Duckworth, 1998) as 'the potential for exposure to uncertain events, the occurrence of which would have undesirable consequences'. The definition recognizes the two elements in what we have called decision analysis, the uncertainty and the utility, though Duckworth, in common with most statisticians, emphasized the loss, or undesirability, rather than the gain, the utility. The change is linguistic. Risk is therefore dependent on two arguments and our foundational presentation in Section 3 is dependent on the separation of them into uncertainty and worth. Yet it is common, as Duckworth does, to quote a measure of risk as a single number, so denying the separation. Thus the risk associated with a 1000-mile flight is 1.7 in suitable units. This is defensible for the following reason. The optimum decision maximizes expected utility which, for data x, is proportional to

∫ u(d, θ) p(θ) p(x | θ) dθ

and may be written as a weighted likelihood

∫ w(d, θ) p(x | θ) dθ,

where w(d, θ) = u(d, θ) p(θ). The analysis, for given data, and hence for a given likelihood, does not depend separately on the utility and probability, the two corner-stones of the philosophy, but only on their product. To put it in another way, if you were to watch a coherent person acting (as distinct from expressing his thoughts) you would not, on the basis of the observed actions, be able to separate the two elements; only the weight function might be determined.

Nevertheless there are several reasons for separating utility from probability. The most important is the need for inference, i.e. for a sound appreciation of the world without reference to action. The philosophy says that this is had through your probability structure for the world. In inference, manipulations take place entirely within the probability calculus, which therefore becomes separated from utility. There are people who argue that inference, in the form of pure science, is unsatisfactory when isolated from its applications in technology. What is undoubtedly important is that inference should be in a suitable form for decision-making, not an activity that is isolated from application. We have seen how Bayesian inference is perfectly adapted for this purpose. It will be seen in Section 19 how some aspects of the law separate inference from decision.

Another reason for the separation lies in the desirability of communication between people, between different 'yous'. Take the example of the 1000-mile flight cited above. Part of the calculation rests on the observed accident rate for aircraft. Another part rests on the consequences of the flight. You may react differently to these two elements. For example, it is known that for elderly people there is an increased risk of circulatory problems due to sitting for hours in cramped seats, and therefore you may evaluate your accident rate differently from that suggested purely by the accident rate for aircraft. In contrast, a healthy, middle-aged executive, travelling in more comfort in first class, may accept the accident statistics but have a different utility because of the importance of the meeting to which he is bound. These considerations suggest that the accident rate and consequences of an accident be kept separate because you may be able to use one element but not the other, whereas the weight function alone would be more difficult to use.
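The claim that, for given data, only the weight function matters can be checked directly. In the sketch below (all numbers are invented for illustration) two observers hold different probabilities and different utilities whose product w(d, θ) = u(d, θ) p(θ) agrees, and they are therefore driven to the same decision.

# Two observers, same weight function, hence the same optimum decision.
thetas = [0, 1]
lik = {0: 0.2, 1: 0.9}                                   # p(x | theta) for the observed x

observer_A = {"p": {0: 0.5, 1: 0.5},
              "u": {("go", 0): 0.2, ("go", 1): 1.0, ("stay", 0): 0.6, ("stay", 1): 0.6}}
observer_B = {"p": {0: 0.25, 1: 0.75},                   # a different prior...
              "u": {("go", 0): 0.4, ("go", 1): 2.0 / 3, ("stay", 0): 1.2, ("stay", 1): 0.4}}
# ...but the products agree, e.g. w("go", 0): 0.2 * 0.5 = 0.4 * 0.25.

def best_decision(obs):
    def score(d):
        # Expected utility proportional to sum over theta of u * p * likelihood.
        return sum(obs["u"][(d, t)] * obs["p"][t] * lik[t] for t in thetas)
    return max(("go", "stay"), key=score), {d: round(score(d), 3) for d in ("go", "stay")}

print("A:", best_decision(observer_A))
print("B:", best_decision(observer_B))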

18. Science

Karl Pearson said 'The unity of all science consists alone in its method, not in its material' (Pearson, 1892). It is not true to say that physics is science whereas literature is not. There are times when a physicist makes a leap of the imagination like an artist. Analyses of word counts can help to identify the author of an anonymous piece of literature. Scientific method is certainly much more important in physics than in literature, but it has the potentiality to be used in any discipline. Of what then does the method consist? There is an enormous literature devoted to answering this question and it is presumptuous of me to claim to have the answer. But I do believe that statisticians, in their deep study of the collection and analyses of data have, perhaps unwittingly, uncovered the answer and it lies in the philosophy presented here. Experimentation, with its production of data, is an essential ingredient of scientific method, so the connection between statistics and science is not surprising. In this view, the scientific method consists in expressing your view of your uncertain world in terms of probability, performing experiments to obtain data, and using that data to update your probability and hence your view of the world. Although the emphasis in this updating is ordinarily put on Bayes, effectively the product rule, the elimination of the ubiquitous nuisance parameters by the addition rule (2) is also important. As we have seen, the design of the experiment is also amenable to statistical treatment. Scientific method consists of a sequence alternating between reasoning and experimentation. As explained in Section 8, each scientist is a 'you' with their own beliefs which are brought into harmony through the accumulation of data. It is this consensus that is objective science.

Objections have been made to this simple view on the grounds that scientists do not act in the way described in the last paragraph. They even do tail area significance tests. The response to the objection is that our philosophy is normative, not descriptive. It is not the intention to describe how scientists behave but how they would wish to behave if only they knew how. The probability calculus provides the 'how'. An impediment affecting 'how' is the lack of good methods of assessing probabilities when no exchangeability assumption is available to guide you. This is ordinarily described as determining your prior, but in reality it is wider than that. Some attacks on science are truly attacks on how scientists behave, that is, on the descriptive aspect. Often they are valid. Such attacks would become less cogent if they dealt with the normative aspect. Scientists are human. Real scientists are affected by extraneous conditions. One would hope that a scientist working for a multinational company and another employed by an environmental agency differ only in their probabilities and would update accordingly. One suspects that other issues intervene. It is my hope that a Bayesian approach would help to expose any biases or fallacies in either of the protagonists' arguments.

19. Criminal law

There are two reasons for including this section on criminal law: first because of my own interest in forensic science; second because of the conviction that this interest has engendered that some important aspects of the law are amenable to the scientific method as described in the last section. These aspects concern the trial process, where there is uncertainty about the defendant's guilt, uncertainty that is subsequently tempered by data, in the form of evidence, hopefully to reach a consensus about the guilt. Clearly this fits into the paradigm developed here. Lawyers do not have a monopoly on the discovery of the truth; scientists have been doing it successfully for centuries. There are aspects of the law, like the writing of a law, to which the scientific method has little to contribute. However, the courts are not just concerned with guilt; they need to pass sentence. The law has separated these two functions, just as we have. They can be recognized as inference and decision-making respectively.

The defendant in a court of law is either truly guilty G or not guilty ¬G. The guilt is uncertain and so should be described by a probability p(G). (The background knowledge is omitted from the notation.) Data, in the form of evidence E, are produced and the probability updated. Since there are only two possibilities, G or ¬G, it is convenient to work in terms of odds (on), o(G) = p(G)/p(¬G), when Bayes's theorem reads

o(G | E) = o(G) p(E | G) / p(E | ¬G),

involving multiplication of the original odds by the likelihood ratio. Evidence often involves nuisance parameters but, in principle, these can be eliminated in the usual way by the addition rule. They will often enter into p(E | ¬G) because there might be several ways in which the crime could have been committed, other than by the defendant. As the trial proceeds, further evidence is introduced and successive multiplications by likelihood ratios determine the final odds. A difficulty here is that successive pieces of evidence may not be independent, either given G or given ¬G. So far this method has mainly been used successfully for scientific evidence, like bloodstains and DNA (Aitken and Stoney, 1991). Its applicability in general depends on satisfactory methods of probability assessment. It has the potential advantage of helping the court to combine disparate types of evidence for, as remarked in Section 3, the principal merit of measurement lies in its ability to meld several uncertainties into one.

The law agrees with the philosophy in separating inference from decision. It even allows different evidence to be admitted into the two processes. For example, previous convictions may be used in sentencing (decision) but not always in assessing guilt (inference). Expected utility analysis includes a theorem to the effect that cost-free information is always expected to increase the utility. This suggests that the only reason for not admitting evidence should be on grounds of cost (Eggleston, 1983).

The part of the trial process that, at present, results in the judgment guilty, or not guilty, should, in our view, be replaced by the calculation of odds o(G | E), where E is now the totality of all admitted evidence. On this view, the jury should not make a firm statement of guilt, or not, but state their final odds, or probability, of guilt. At least this provides a more flexible and informative communication. More importantly, it provides the judge with the information that he needs for sentencing. If d is a possible decision, about gaol or a fine, then the expected utility of d is

u(d, G) p(G | E) + u(d, ¬G) p(¬G | E).

The optimum sentence is that d which maximizes this expectation. The utilities here will reflect society's evaluation of the merits of different sentences for the guilty person, and the seriousness of false imprisonment. We are a long way from the implementation of these ideas but even now they can guide us into sensible procedures and avoid incoherent ones.
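A minimal sketch of the two stages, with invented numbers throughout: the odds of guilt are updated by successive likelihood ratios, on the simplifying assumption that the pieces of evidence are independent given G and given ¬G, and the sentence is then chosen by maximizing expected utility.

# Inference: posterior odds from prior odds and likelihood ratios.
prior_odds = 1 / 1000                      # o(G) before any evidence, assumed
likelihood_ratios = [50.0, 12.0, 3.0]      # p(E_i | G) / p(E_i | not G) for each item, assumed

odds = prior_odds
for lr in likelihood_ratios:
    odds *= lr                             # successive multiplication by likelihood ratios
p_guilt = odds / (1 + odds)                # convert the final odds to a probability
print("posterior probability of guilt:", round(p_guilt, 3))

# Decision: u(d, G) and u(d, not G) reflect society's evaluation of each sentence
# for a guilty and for an innocent defendant (values invented for illustration).
utilities = {"acquit":   {"G": 0.3, "notG": 1.0},
             "fine":     {"G": 0.7, "notG": 0.6},
             "imprison": {"G": 1.0, "notG": 0.0}}

def expected_utility(d):
    return utilities[d]["G"] * p_guilt + utilities[d]["notG"] * (1 - p_guilt)

for d in utilities:
    print(d, round(expected_utility(d), 3))
print("optimum decision:", max(utilities, key=expected_utility))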

20. Conclusions

The philosophy of statistics presented here has three fundamental tenets: first, that uncertainty should be described by probabilities; second, that consequences should have their merits described by utilities; third, that the optimum decision combines the probabilities and utilities by calculating expected utility and then maximizing that. If these are accepted, then the first task of a statistician is to develop a (probability) model to embrace the client's interests and uncertainties. It will include the data and any parameters that are judged necessary. Once accomplished, the mechanics of the calculus take over and the required inference is made. If decisions are involved, the model needs to be extended to include utilities, followed by another mechanical operation of maximizing expected utility. One attractive feature is that the whole procedure is well defined and there is little need for ad hoc assumptions. There is, however, a considerable need for approximation. To carry out this scheme for the large world is impossible. It is essential to use a small world, which introduces simplification but often causes distortion. Even the mechanics of calculation need numerical approximations. Both these issues have been considered in the literature, whether frequentist or Bayesian, and substantial progress has been made. Where a real difficulty arises is in the construction of the model. Many valuable techniques have been introduced but, because of the frequentist emphasis in past work, there is a real gap in our appreciation of how to assess probabilities, of how to express our uncertainties in the requisite form. My view is that the most important statistical research topic as we enter the new millennium is the development of sensible methods of probability assessment. This will require co-operation with numerate experimental psychologists and much experimental work. A colleague put it neatly, though with some exaggeration: 'There are no problems left in statistics except the assessment of probability'. It is curious that the typical expert in probability knows nothing about, and has no interest in, assessment.

The adoption of the position outlined in this paper would result in a widening of the statistician's remit to include decision-making, as well as data collection, model construction and inference. Yet it also involves a restriction in their activity that has not been adequately recognized. Statisticians are not masters in their own house. Their task is to help the client to handle the uncertainty that they encounter. The 'you' of the analysis is the client, not the statistician. Our journals, and perhaps our practice, have been too divorced from the client's requirements. In this I have been as guilty as any. But at least the theoretician has developed methods. Your task is to put them to good use.

References

Aitken, C. G. G. and Stoney, D. A. (1991) The Use of Statistics in Forensic Science. Chichester: Horwood.
Bartholomew, D. J. (1988) Probability, statistics and theology (with discussion). J. R. Statist. Soc. A, 151, 137-178.
Berger, J. O. and Delampady, M. (1987) Testing precise hypotheses (with discussion). Statist. Sci., 2, 317-352.
Berger, J. O. and Wolpert, R. L. (1988) The Likelihood Principle. Hayward: Institute of Mathematical Statistics.
Bernardo, J. M. (1999) Nested hypothesis testing: the Bayesian reference criterion. In Bayesian Statistics 6 (eds J. M. Bernardo, J. O. Berger, A. P. Dawid and A. F. M. Smith). Oxford: Clarendon.
Bernardo, J. M., Berger, J. O., Dawid, A. P. and Smith, A. F. M. (eds) (1999) Bayesian Statistics 6. Oxford: Clarendon.
Bernardo, J. M. and Smith, A. F. M. (1994) Bayesian Theory. Chichester: Wiley.
Box, G. E. P. (1980) Sampling and Bayes' inference in scientific modelling (with discussion). J. R. Statist. Soc. A, 143, 383-430.
Box, G. E. P. and Cox, D. R. (1964) An analysis of transformations (with discussion). J. R. Statist. Soc. B, 26, 211-252.
Dawid, A. P., Stone, M. and Zidek, J. V. (1973) Marginalization paradoxes in Bayesian and structural inference (with discussion). J. R. Statist. Soc. B, 35, 189-233.

DeGroot, M. H. (1970) Optimal Statistical Decisions. New York: McGraw-Hill.
Draper, D. (1995) Assessment and propagation of model uncertainty (with discussion). J. R. Statist. Soc. B, 57, 45-98.
Duckworth, F. (1998) The quantification of risk. RSS News, 26, no. 2, 10-12.
Edwards, W. L., Lindman, H. and Savage, L. J. (1963) Bayesian statistical inference for psychological research. Psychol. Rev., 70, 193-242.
Eggleston, R. (1983) Evidence, Proof and Probability. London: Weidenfeld and Nicolson.
de Finetti, B. (1974) Theory of Probability, vol. 1. Chichester: Wiley.
de Finetti, B. (1975) Theory of Probability, vol. 2. Chichester: Wiley.
Fisher, R. A. (1935) The Design of Experiments. Edinburgh: Oliver and Boyd.
Healy, M. J. R. (1969) Rao's paradox concerning multivariate tests of significance. Biometrics, 25, 411-413.
Jeffreys, H. (1961) Theory of Probability. Oxford: Clarendon.
Lehmann, E. L. (1983) Theory of Point Estimation. New York: Wiley.
Lehmann, E. L. (1986) Testing Statistical Hypotheses. New York: Wiley.
O'Hagan, A. (1995) Fractional Bayes factors for model comparison (with discussion). J. R. Statist. Soc. B, 57, 99-138.
Onions, C. T. (ed.) (1956) The Shorter Oxford English Dictionary. Oxford: Clarendon.
Pearson, K. (1892) The Grammar of Science. London: Black.
Ramsey, F. P. (1926) Truth and probability. In The Foundations of Mathematics and Other Logical Essays (ed. R. B. Braithwaite), pp. 156-198. London: Kegan Paul.
Savage, L. J. (1954) The Foundations of Statistics. New York: Wiley.
Savage, L. J. (1977) The shifting foundations of statistics. In Logic, Laws and Life: Some Philosophical Complications (ed. R. G. Colodny), pp. 3-18. Pittsburgh: Pittsburgh University Press.
Stein, C. (1956) Inadmissibility of the usual estimation of the mean of a multivariate normal distribution. In Proc. 3rd Berkeley Symp. Mathematical Statistics and Probability (eds J. Neyman and E. L. Scott), vol. 1, pp. 197-206. Berkeley: University of California Press.
Walley, P. (1991) Statistical Reasoning with Imprecise Probabilities. London: Chapman and Hall.

Comments on the paper by Lindley Peter Armitage(Wallingford) Dennis Lindleyhas writtenso frequently,and so persuasively,about the principles of Bayesianstatistics, thatwe scarcelyexpect to findnew insightsin yet anothersuch paper.The presentpaper shows how wrongsuch a priorjudgment would be. Lindley'sconcern is withthe verynature of statistics,and his argumentunfolds clearly, seamlessly and relentlessly.Those of us who cannotaccompany him to theend of hisjoumey mustconsider very carefully where we need to dismount;otherwise we shall findourselves unwittinglyat thebus terminus,without a returnticket. I wrote'those of us' because theremust be manywho, like me, sympathizewith much of theBayesian approachbut are unwillingto discarda frequentisttradition which appears to have servedthem well. It is worthtrying to enquirewhy this should be so. One possibility,of course,is thatour reluctanceis, at least in part,a manifestationof inertia,demonstrating a lack of courageor understanding.I must leave thatfor others to judge. I think,though, that there are sounderreasons for withholding full supportfor theBayesian position. Lindley and I came to statisticsduring the 1940s, at a timewhen the subjectwas dominatedby the Fisherianrevolution. During the 19th centuryinverse probability had co-existed uneasilywith frequentist methods by theuse of flatpriors, standard errors and normalapproximations, resultsbeing interpretable by eithermode of reasoning,albeit with occasional lack of clarity.Fisher had, it seemed,cleared the air by disposingof theneed forinverse probability. Philosophical disputes, such as thosewith Neyman and E. S. Pearson,took place withinthe frequentistschool, although Jeffreys and a fewother pioneers maintained and developedthe Laplace-Bayes framework.To manyof us enteringthe fieldat thattime it wouldhave seemedbizarre to overtumsuch a powerfulbody of ideas. It is greatlyto thecredit of Lindley,and of a fewof his contemporarieslike Good and Savage, thatthey recognized the possibilitythat, as theymight have putit, the Emperor had no clothes. The greatmerit of the Fisherianrevolution, apart from the sheerrichness of the applicablemethods, was the abilityto summarize,and to draw conclusions from,experimental and observationaldata withoutreference to priorbeliefs. An experimentalscientist needs to reporthis or her findings,and to state a range of possible hypotheseswith which these findingsare consistent.The scientistwill un- doubtedlyhave prejudicesand hunches,but the reportingof these should not be a primaryaim of the investigation.Consider, for instance, one of themajor achievements of medical statisticsin thelast half- century-thefirst study of Doll and Hill (1950) on smokingand lung cancer.They certainlyhad prior hunches,e.g. thatair pollutionwas more likelythan smoking to cause lung cancer,but it would have served no purpose to quantifythese beliefs and to enterthem into the calculationsthrough Bayes's 320 Commentson the Paper by Lindley theorem.There were indeed important uncertainties, about possible biases in the choice of controlsand aboutthe possible existence of confoundingfactors. But theway to deal withthem was to considereach in turn,by scrupulousargument rather than by assigningprobabilities to differentmodels. That is an arbitraryexample, but one thatcould be replicatedby studiesin a wide varietyof appliedfields. 
This is notto denythe importance of priorbeliefs in theweighing up of evidence,or especiallyin the planningof futurestudies, but rather to cast doubton theneed to quantifythese beliefs with any degree of precision.As Curnow(1999) remarked,in connectionwith studies on passive smoking, 'Bayesian concepts must be fundamentalto our way of thinking.However, quantifyingand combiningin any formalway the evidenceon mechanismswith that from the epidemiologicalstudy are,in myview, impossible.' To return,then, to Lindley'somnibus, I findthat I shouldhave dismountedby stage (b) in the list of stages (a)-(e) in Section 7. I believe that thereare many instancesof uncertaintywhich are best approachedby discussionand furtherinvestigation and whichdo notlend themselves to measurementby probability.In thatway I am absolved fromthe laterneed to dismountat stage (d). On furtherthought, though,I should perhapshave dismountedat (a), where,we are told, 'Statisticsis the studyof un- '.This seems to claim too much forstatistics, as indeed the authorrecognizes at the end of Section2. I am surprisedthat he abandonsthe more traditional identification of statisticswith the study of groupsof numericalobservations. Uncertainty still comes intothe picture, by way of unexplainedor 'random'variation, but thismore modest view of our subjectputs the emphasison frequentistvariation ratherthan the more ambitious Bayesian world view. Frequentistsare accustomedto receivinggenerous amounts of criticismfrom Bayesians about their incoherentpractices. (Incidentally, I am not at all surethat a littleincoherency is not a good thing.As Durbin (1987) implied,to look at a problemfrom irreconcilable points of view may generateuseful insight.)Significance tests come in forespecially hard knocks. Thus (Section 6), 'The interpretationof "significantat 5%" depends on n'. Well, it depends what you mean by 'interpretation'.In a rather obvious sense,the definitionis independentof n. The possibleresults are arrangedin some meaningful orderand rankedby theirprobability on thenull hypothesis. Irrespective of thesample size, a resultthat is significantat 5% comes beyondthe appropriate percentile of thenull distribution.What the objection means is that,on a particularformulation of priorprobabilities for the non-nullhypothesis, the Bayes factorcomparing the two hypotheseswill varywith n (Lindley,1957). However,there is no reasonwhy the non-nullpriors should be the same for differentn. Large sample sizes are appropriatefor the detectionof small differences,and we mightexpect the non-null prior to be moreconcentrated towards the null forlarge rather than small n. Withan adjustmentof thissort the phenomenon disappears (Cox and Hinkley(1947), section10.5). Lindleysometimes seems to underestimatethe abilityof frequentistmethods to cope withcomplex- ities.For instance(Section 6), 'meta-analysisis a difficultarea forstatistics'. As he says,this is merely thetask of combiningthe results of manyexperiments. It has come ratherlate to some disciplines,such as clinicaltrials, because previouslyit was unusualto findmany replicated studies that were sufficiently similarto be combinedsensibly. But a frequentistanalysis, combining estimates and not significance levels,is straightforward.In an earliergeneration it was a standardexercise in agriculturaltrials (Yates and Cochran,1938) and in bioassay(Finney (1978), chapter14). 
Again,he asserts(Section 14) thatfrequentist analysis is 'unable to cope' withthe effectsof changes in personalbehaviour or Governmentpolicy on predictionsof futurenumbers of cases of acquired immunedeficiency syndrome. Yet the effectsof changes in sexual practicesand of improvementsin therapycan be estimatedby experimentor observationand builtinto the models thatare used forsuch projections.It seems betterto approachthese problems by specificenquiries about each effectthan to imposedistributions of bias determinedby subjectivejudgments. Lindley'sforceful presentation has led me to respondmore robustly than I had expected.I respectthe intellectualrigour of modernBayesianism, and I acknowledgethe influence that it has had in reminding statisticiansthat ancillary information is important,whether or not it is includedin a formalanalysis of the currentdata. In particularfor decisions and verdictsa Bayesianapproach seems essential,although here again a fullyquantitative analysis may be unnecessary.However, I cannot agree that for the reportingof typicalscientific studies a Bayesiananalysis is mandatory.I have no objectionsto thosewho wishto followthat route, provided that they can make theirconclusions comprehensible to theirreaders, but I wish to reservethe rightto presentmy own conclusionsin a differentway. Unfortunately,such eclecticismis unlikelyto findmuch support among the Bayesian community. Commentson the Paper by Lindley 321 M. J. R. Healy (Harpenden) In thispaper Dennis Lindleysums up theexperience of 50 years' advocacyof thesubjective approach to statisticalreasoning. It has been a longhaul; he mustat timeshave feltsympathy for Bishop Berkeley,of whose argumentsit was said thatthey admitted no refutationbut carriedno conviction.Today though, thanksvery largely to workby him and his students,Bayesian methodologyis widelystudied and the Royal StatisticalSociety's journals routinelycarry papers in whichBayesian techniques are employed. Yet it remainstrue that the impactof thismethodology on thevast amountof statisticalanalysis that is publishedin the scientificliterature is essentiallynegligible. Almost every paper that I see as statistical adviserto a prominentmedical journal contains the sentence 'Values of p less than0.05 wereregarded as significant'or its equivalent,and non-statisticalreferees regularly criticize submitted papers for an absenceof powercalculations or of adjustmentsof significancelevels to allow formultiple comparisons. If the Fisher-Neyman-Pearsonparadigm (Healy, 1999) is demonstrablyunsatisfactory, as Professor Lindleyclaims to show,then large quantities of researchdata are beingwrongly interpreted and a highly unsatisfactorysituation exists. It seemsto me thatthere may be morethan one reasonfor this. The first,and probablythe least important,consists of weaknessesin thetheoretical underpinning of Bayesianmethods. One of theserelates to therepresentation of ignorance,an area in whichmuch work has been done. (It may be thatwe are nevertruly ignorant, but thereare meritsin enquiringwhat we shouldbelieve if we had been so.) Walley(1996) is particularlyrelevant here, and I mustconfess that I foundthe contributions to thediscussion by ProfessorLindley and some of his colleaguesunconvincing. Anotherissue stemsfrom the fact that the bulk of statisticalwork is actuallydone by non-statisticians. 
This is as it shouldbe; one of theresponsibilities of thestatistical profession (one thatis notalways lived up to) is thatof makingits insightsavailable to researchworkers in all fields.Medical studentstoday are exposed to statisticalteaching in theirpreclinical years and later to text-books(not all writtenby statisticians)which lay down the standardapproach of t, x2, r, Wilcoxonand the rest.If theywish to publishthe results of research,they are liable to be encouraged,not to say compelled,to quote p-values and confidencelimits-the paradigmthat I have referredto is in full possession of the field. As statisticianswe may come to follow ProfessorLindley and to agree among ourselvesthat new and incompatibletechniques must replace it, but how are we to explain this to our clients? Are we to apologize and suggestthat they must unlearn all thatwe have been teachingthem for many decades, that theymust abandon their favourite computer packages? And, if we do, will theylisten to us? But the most severeproblem, I suggest,is essentiallya matterof psychology.Mankind in general longsfor certainty, and therise of naturalscience has been seen as a way of obtainingcertain knowledge. We statisticianshave pointedout thatcomplete certainty is unobtainable,but we have maintainedthat the degreeof uncertaintyis quantifiableand objective(Schwartz, 1994)-we can be certainabout how uncertainwe shouldbe. If we now insiston the personalsubjective nature of uncertainty,how do we wish scientiststo behave when theypresent their results? Are the conclusionsto be precededby the rubric'In our opinion',with the implied parenthesis '(but you don'thave to agreewith us)'? We cannot even fall back on the objectivityof the data, since (as ProfessorLindley has pointedout to me) the likelihoodfunction itself depends on a model which is itselfsubjectively chosen. It may be thatthe inclusionof such a rubricwould show a certaindegree of realism we can all rememberpapers to which our reactionwas 'I don'tbelieve a wordof it'. But it mustbe admittedthat it will not be welcomedby thescientific community as a whole,let alone by thegeneral public. Dennis Lindley'spaper, like so manyof his previouscontributions, raises innumerabletopics that are worthyof deep thoughtand discussion.There is no escaping the fact thatstatistics, unlike most disciplines,demands philosophical investigation. As practitionerswe owe him a debt of gratitudefor persuadingus, unwillingas we may be, thatsuch investigationsmust be pursuedand forlaying down one paththat needs to be followed.

D. R. Cox (NuffieldCollege, Oxford) It is 50 years since Dennis Lindleyand I became colleagues at the StatisticalLaboratory, Cambridge. Since thenI have read mostif notall of his workon thefoundations of statisticsalways with admiration for its intellectualand verbal clarityand vigour.The presentpaper is no exception.It sets out with persuasivenessand aplomb the personalisticapproach to uncertaintyand individualdecision-making. The ideas describedare an importantpart of modernstatistical thinking. A key issue,though, is whether theyare the all-embracingbasis of at least themore formal part of statisticsor are to be takenas some partof an eclecticapproach as I, and surelymany others, have patientlyargued; see, forexample, Cox (1978, 1995, 1997) and Cox and Hinkley(1974). 322 Commentson the Paper by Lindley On one pointI believe thatwe are in totalagreement. Bayesian in thiscontext does notmean merely relyingon a formalapplication of Bayes's theoremto produce inferences.Many of the applications involvingflat priors or hyperpriorsin a small numberof dimensionscan be regardedessentially as a combinationof empiricalBayes methodsand a technicaldevice to produce sensible approximate confidenceintervals implemented, for example, by Markovchain Monte Carlo methods.Provided that the relativelyflat priors are not in a large numberof dimensions,such investigationsseem philosophi- cally fairlyneutral. Dennis Lindley's view is much more radical. It is the predominanceof the constructiveuse of personalisticprobability to synthesizevarious kinds of information,including that directlyprovided by data, intoa comprehensiveassessment of totaluncertainty, preferably leading to a decision analysis.Flat priorshave no role except occasionallyas an approximation.The terminology 'Bayesian' is unfortunate,but I supposethat we are stuckwith it. Why is thisunsatisfactory as the primarybasis forour subject?In tryingto discuss this in a brief contributionone is in the difficultynot merelyof soundingmore belligerentthan is the intentionbut even more seriouslyof not havingthe last word! Also the paper is rich in specificdetail on which commentis reallydesirable. A major attractionof the personalisticview is thatit aims to addressuncertainty that is not directly based on statisticaldata, in the narrowsense of thatterm. Clearly much uncertainty is of thisbroader kind. Yet when we come to specificissues I believe thata snag in the theoryemerges. To take an examplethat concerns me at themoment: what is the evidencethat the signalsfrom mobile telephones or transmissionbase stationsare a majorhealth hazard? Because such telephonesare relativelynew and thelatency period for the development of, say, brain tumours is longthe direct epidemiological evidence is slender;we relylargely on the interpretationof animal and cellular studiesand to some extenton theoreticalcalculations about the energylevels thatare needed to induce certainchanges. What is the probabilitythat conclusions drawn from such indirectstudies have relevancefor human health? Now I can elicitwhat my personal probability actually is at themoment, at least approximately.But thatis not the issue. I wantto know what my personalprobability ought to be, partlybecause I want to behave sensiblyand muchmore importantly because I am involvedin thewriting of a reportwhich wants to be generallyconvincing. 
I come to the conclusion that my personal probability is of little interest to me and of no interest whatever to anyone else unless it is based on serious and so far as feasible explicit information. For example, how often have very broadly comparable laboratory studies been misleading as regards human health? How distant are the laboratory studies from a direct process affecting health? The issue is not to elicit how much weight I actually put on such considerations but how much I ought to put. Now of course in the personalistic approach having (good) information is better than having none but the point is that in my view the personalistic probability is virtually worthless for reasoned discussion unless it is based on information, often directly or indirectly of a broadly frequentist kind. The personalistic approach as usually presented is in danger of putting the cart before the horse. I hope that Dennis Lindley will comment on this. Is the issue in effect a nuance of interpretation or, as I tend to think, a point of principle?

Another way of saying this is that we can put broadly three requirements on a theory of the more formal parts of statistics: that it embraces as much as possible in a single approach, that it leads to internally consistent (coherent) consequences and that it meshes well with the real world (calibration). Now the personalistic approach scores extremely well on the first two points. My difficulty is that I put very large, indeed almost total, weight on the third. If there were to be a choice between working self-consistently and being in accord with the real world, and of course we would like to do both, then I prefer the latter. The frequency-based approach attempts, often rather crudely, to put that first.

Take a simple situation in Mendelian genetics in which some probabilities of 1/2 and 1/4 arise. Are they approximate representations of some biological phenomena that were going on long before anyone investigated them or are they to be interpreted as essentially the convergence of your personalistic probability in the face of a large amount of information? The second view is interesting but, to my mind, the former is the preferred interpretation and the essential reason why the probabilities are important. This is under a philosophical position that is close to naive realism; there is a real world out there which it is our task to investigate and which shows certain regularities captured, in this case, by biological constants.

This leads to the conclusion that the elicitation of priors is generally useful mainly in situations where there is a large amount of information, possibly of a relatively informal kind, which it is required to use and which it is not practicable to analyse in detail. An informal summary by experts into a prior distribution may then be a fruitful approach. It carries the danger, however, that the experts may be wrong and treating their opinion as equivalent to explicit empirical data has hazards. In any case settling issues by appeal to supposed authority, while sometimes unavoidable, is in principle bad. It is, of course, also possible that data are wrong, i.e. seriously defective, but this is open to direct investigation.
I understandDennis Lindley'sirritation at the cry 'wheredid theprior come from?'.I hope thatit is clear thatmy objectionis ratherdifferent: why should I be interestedin someoneelse's priorand why shouldanyone else be interestedin mine?There is a parallelquestion: where did themodel forthe data- generatingprocess come from?This is no trivialmatter, especially in subjectslike economics.Here any sort of repetitionis very hypotheticaland, althoughsome economistsassert a solid theoretical knowledgebase, this seems decidedlyshaky. The reason forbeing interestedin models is, however, clear.They are an imperfectbut hopefully reasoned attempt to capturethe essence of some aspectof the real physical,biological or social worldand are in principleempirically at least partlytestable. If we havea reasonablyfruitful representation then in principleeveryone is or shouldbe interestedin it. The need for personaljudgment, perhaps supremely in scientificresearch, is not in dispute.The formalizationof thismay be instructivein some situations.A centralissue concernsthe role of statistical methods(not the role of statisticianswhich is a differentmatter). I see thatrole as primarilythe provision of a basis formathematical representation of physical random phenomena and forpublic discourse about uncertainty. The Bayesianformalism is an elegantrepresentation of the evolutionof uncertaintyas increasingly moreinformation arises. In its simplestform it is concernedwith combining two sourcesof information. But one of the generalprinciples in the combinationof evidence is not to mergeinconsistent 'data'. Consistencyhas usually to be interpretedin a probabilisticsense. Thereforewe should face the possibilitythat the data and the otherassessment (the prior) are inconsistent.(I am, of course,aware of theargument that no possibilityshould be excludeda prioribut I cannotsee thatas a satisfactoryway out.) Of course it may be thatthe data are flawedor being misinterpreted.But at least in principle somethinglike a significancetest seems unavoidable.I knowthat in principlewe can reservea small portionof priorprobability to put on unexpectedpossibilities but surelywe need also to representwhat theseare and thismay be totallyunknown. A complexset of data may show entirelyunanticipated but importantfeatures. This is connectedwith the matter of temporalcoherency on whichcomments would be welcome. Are thenp-values needed? It is interestingthat for 50 years statisticianswriting from a broadly frequentistperspective have criticizedthe overuseof significancetests (Yates, 1950). Indeed in some fields,notably epidemiology, conclusions are now primarilypresented via approximateconfidence limits whichcould be regardedas an approximatespecification of a likelihoodfunction, if thatpoint of view were preferred.But in principleit seems essentialto have a way of saying'the data underimmediate analysisare inconsistentwith the representation suggested'. Now I agree withDennis Lindleythat it is necessaryto have some idea of an alternativebut not that it is necessaryto formulateit probabilistically: desirablemaybe, but necessary no. For examplewe maytest for linearity without an explicitidea of the formof non-linearitythat is appropriate.If theneed ariseswe maythen have to formulatespecific new modelsbut not otherwise. The constructionof overviews(so-called meta-analyses) via treatingp-values as uncertaintymeasures is clearlya poor procedureif estimatedeffects and measuresof precisionare available on a comparable scale. 
But so also would be overviews in which measures of the degree of belief in some hypothesis were the only evidence available.

It seems to be a fundamental assumption of the personalistic theory that all probabilities are comparable. Moreover, so far as I understand it, we are not allowed to attach measures of precision to probabilities. They are as they are. A probability of 1/2 elicited from unspecified and flimsy information is the same as a probability based on a massive high quality database. Those based on very little information are unstable under perturbations of the information set but that is all. This relates to the previous point and to the usefulness of such measures for communication.

Forecasting of acquired immune deficiency syndrome (AIDS) is mentioned as an exemplar of model uncertainty. Now the initial report on AIDS in the UK discussed sources of uncertainty and stated explicitly that model uncertainty was the major source. Indeed the message was rammed home by a front cover which showed several quite different forecasts as curves against time. Would it have helped to put probabilities on the different models and to have produced an overall assessment? I suppose that it is a matter of judgment but it seems to me that this would have been a confusing and misleading thing to do and would have hidden rather than clarified the issues involved. To put the point gently, the idea that only Bayesians are concerned about model uncertainty is wrong.

Dennis Lindley puts decision-making as a primary objective. Now I agree that such questions as why is this issue being studied and what are the consequences of such and such conclusions must always be considered whatever the field of study. At the same time I have rarely found quantitative decision analysis useful although I certainly accept this as a limitation of personal experience and imagination. For example, in the AIDS predictions mentioned above, the summary of the forecasts into a recommended planning basis was based on an informal decision analysis based on the qualitative idea that it was better to overpredict, leading to an overprovision of resources, than to underpredict, leading to a shortfall. It would have been difficult to put this quantitatively other than as a very artificial exercise via a series of sensitivity analyses. In most of the applications that I see, the role of statistical analysis is, in any case, to provide a base for informed public discussion.

Over the design of experiments, I do not see that as primarily the preserve of statisticians and it is important that most experiments are done to add to the pool of public knowledge of a subject and therefore should, for their interpretation at least, not be too strongly tied to the priors of the investigator. With a different interpretation of the word public this applies to industrial experiments also.

I do not understand the comment that theory makes a prediction about the proportion of hypotheses rejected at the 5% level in a significance test that are in fact false. How can theory possibly show anything of the sort? They may all be false or all (approximately) true depending on what we chose to investigate. I agree that it is the case that many assessments of uncertainty underestimate the error involved but this is for a variety of empirical reasons, the use of models ignoring certain components of variance or biases (which statisticians surely do not in general ignore), real instabilities in the effects under investigation and so on.
My attitudemay be partlya reflectionof a lack of masteryof currentcomputational procedures but I am deeplysceptical of the advice of Savage to take models as complexas we can handle.This seems a recipefor overelaboration and forthe abandonmentof an importantfeature of good statisticalanalyses, namelytransparency, the ability to see thepathways between the data and theconclusions. I agree thatprediction is underemphasizedin many treatmentsof statisticsand thatthe test of a representationof data is its abilityto predictnew observationsor aspectsof theoriginal data notused in analysis. But this does not mean thatprediction is necessarilythe rightfinal objective. We are not interestedin estimatingthe velocity of lightto predictthe next measurement on it. In conclusion,and not directlya commenton the paper,I wantto object to the practiceof labelling people as Bayesian or frequentist(or any other'ist'). I wantto be both and can see no reason fornot being,although if pushed,as I have made clear,I regardthe frequentistview as primary,for most if not virtuallyall the applicationswith which I happento have been involved.I hope thatby combiningthis view witha highregard for the present paper I am notcommitting an ultimatesin: incoherency.

J. Nelder (Imperial College of Science, Technology and Medicine, London)
Recently (Nelder, 1999) I have argued that statistics should be called statistical science, and that probability theory should be called statistical mathematics (not mathematical statistics). I think that Professor Lindley's paper should be called the philosophy of statistical mathematics, and within it there is little that I disagree with. However, my interest is in the philosophy of statistical science, which I regard as different. Statistical science is not just about the study of uncertainty, but rather deals with inferences about scientific theories from uncertain data. An important quality about theories is that they are essentially open ended; at any time someone may come along and produce a new theory outside the current set. This contrasts with probability, where to calculate a specific probability it is necessary to have a bounded universe of possibilities over which the probabilities are defined. When there is intrinsic open-endedness it is not enough to have a residual class of all the theories that I have not thought of yet. The best that we can do is to express relative likelihoods of different parameter values, without any implication that one of them is true. Although Lindley stresses that probabilities are conditional I do not think that this copes with the open-endedness problem.

I follow Fisher in distinguishing between inferences about specific events, such as that it will rain here tomorrow, and inferences about theories. For inferences about events, Lindley's analysis is persuasive; if I were a businessman trying to reach a decision on whether to invest a million pounds in a project, I would act very much as he suggests. In analysing data relative to one or more scientific theories, I would wish to present what is objective and not to mix this with subjective probabilities which are derived from my priors. If the experimenter whom I am working with wishes to combine likelihoods with his own set of weights based on his (doubtless more extensive) knowledge then he is at liberty to do so; it is not my job to do it for him. However, if he wishes to communicate the results to other scientists, it would be better, in my view, to stay with the objective part. (This paragraph is heavily dependent on ideas of George Barnard.)

General ideas like exchangeability and coherence are fine in themselves, but problems arise when we try to apply them to data from the real world. In particular when combining information from several data sets we can assume exchangeability, but the data themselves may strongly suggest that this assumption is not true. Similarly we can be coherent and wrong, because the world is not as assumed by Lindley. I find the procedures of scientific inference to be more complex than those defined in the paper. These latter fall into the class of 'wouldn't it be nice if', i.e. would it not be nice if the philosophy of statistical mathematics sufficed for scientific inference. I do not think that it does.

A. P. Dawid (UniversityCollege London) It is a real pleasureto commenton this paper. Dennis Lindley has been one of the most significant influenceson myprofessional life, and his wordsare alwaysworth reading carefully and takingto heart. It is in no way a criticismto say thatI recognize,in the currentwork, ideas thatDennis has been promotingthroughout the 30 yearsand morethat I have knownhim-these thingsare stillworth saying, perhapsnow morethan ever. For thosewho wishto read moreof his penetratingand thought-provoking analyses,I particularlyrecommend Lindley (1971) and Lindley(1978), whichcontain some fascinating and educationalexamples of the differencesbetween the frequentistand the Bayesian approachesto problemsand clearlypoint up the logical difficultiesthat can arise when we do not conformto the principlesof Bayesiancoherence.

A case for Sherlock Bayes?
A recent and very important real example of this has arisen in the area of forensic identification, a problem area to which Lindley made some important early contributions (Lindley, 1977). We are asked to compare two cases.

The following details are common to each case. A murder has been committed, and a DNA profile, which can be assumed to be that of the murderer, has been obtained from blood at the scene of the crime. A suspect has been apprehended, and a DNA profile obtained from his blood. The two profiles match perfectly. The probability of this event, if the suspect is innocent, is some small number P; a realistic value might be P = 10^-6. (It is assumed that a match is certain if he is guilty. Then smaller values of the 'match probability' P may reasonably be taken as expressing stronger evidence against the suspect.) There is no other directly relevant evidence.

The difference between the two cases is that in case 1 the suspect was picked up at random, for completely unrelated reasons, and, on being tested, was found to match the DNA from the scene of the crime, whereas in case 2 a search was made through a computer database containing the DNA profiles of a large number N (perhaps N = 10000) of individuals, and the suspect (and no-one else in the database) was found to match.

The question to be addressed is 'In which of these two cases is the evidence against the suspect stronger?'. (Note that this question relates only to the strength of the evidence; we are not concerned with the possibility that the prior probabilities in the two cases might reasonably be different.)

The defence counsel argues, with mathematical correctness, that, because of the 'multiple testing' that has taken place in case 2, the probability of finding a (single) match in the database, if the true murderer is not included in it, is around NP (the probability of finding more than one match is entirely negligible). Since this match probability NP for case 2 is very substantially larger than the match probability P for case 1, that means that the evidence against the suspect in case 2 is very much weaker.

The prosecution counsel points out that, in case 2, one consequence of the search was to eliminate the other N - 1 individuals in the database as possible alternative suspects, thus increasing the strength of the evidence against the suspect, albeit by a typically negligible amount, above that for case 1.

Readers may like to assess whether they are intuitive Bayesians or intuitive frequentists by deciding which of these two arguments they prefer. Although both are based on probability arguments, only one of them is in accordance with the coherent use of probability to measure and manipulate uncertainty. Instead of identifying which this is, I shall just give a hint: consider the extreme case that the database contains records on everyone in the population.

For further reading on this problem, see Stockmarr (1999) and Donnelly and Friedman (1999); for more general application of coherent Bayesian reasoning to forensic identification problems, see Balding and Donnelly (1995) and Dawid and Mortera (1996, 1998).

Thomas Bayes in the 21st century
I share Lindley's view that, much as the tremendous recent expansion of interest in Bayesian statistics is to be welcomed and admired, its emphasis on computational aspects can sometimes stand in the way of a fuller understanding and appreciation of the Bayesian approach. It was the deep logical and philosophical conundra that beset the making of inductive inferences from data that attracted me into statistics in the first place and have exercised me ever since.
But I have alwaysbeen disappointedthat so few otherstatisticians seem to share my view of statisticsas 'applied philosophyof science', and even thatsmall numberseems to be dwindlingfast. On the positiveside, thereare increasingnumbers of researchersin artificialintelligence and machine learning who are taking foundationalissues extremelyseriously and are conductingsome very originaland importantwork. It is ironic that,as statisticiansdevote more of theireffort to computing,so computerscientists are applyingthemselves to statisticallogic. When I was startingout, Bayesian computationof any complexitywas essentiallyimpossible. We could handle a few simple normal,binomial and Poisson models, and that was it. Whateverits philosophicalcredentials, a common and valid criticismof Bayesianismin those days was its sheer impracticability.Indeed, when I was engaged in organizingthe firstmeeting on 'Practical Bayesian statistics'(sponsored by whatwas thenstill the Instituteof Statisticians)in Cambridgein 1982, it was stillpossible foran eminentstatistician to writeto the Institute'snewsletter suggesting that this was 'a contradictionin terms':an extremeand biased judgment,perhaps, but witha grainof truth.So, as we could notcompute, we had to devoteourselves instead to foundationalissues. How thingshave changed! Withthe availabilityof fastcomputers and sophisticatedcomputational techniquessuch as Markov chain Monte Carlo sampling,Bayesians can now constructand analyse realisticmodels of a degree of complexitywhich leaves most classical statisticiansfar behind.This power and versatilityis itself a very strongargument for doing statisticsthe Bayesian way-far stronger,perhaps, than deep considerationof thelogic of inference.But it would be sad if thispractical success were at the expenseof a clear understandingof whatwe are doing,and whywe are doing it. Whatis theprincipal distinction between Bayesian and classical statistics?It is thatBayesian statistics is fundamentallyboring. There is so littleto do: just specifythe model and the prior,and turnthe Bayesian handle. There is no room for clever tricksor an alphabeticcornucopia of definitionsand optimalitycriteria. I have heard people who should know betteruse this 'dullness' as an argument againstBayesianism. One mightas well complainthat Newton's dynamics, being based on threesimple laws of motionand one of gravitation,is a poor substitutefor the richness of Ptolemy'sepicyclic system. Nevertheless,the Ptolemaictemptation is difficultto resistand is apparentin much neo-Bayesian work,which struggleshard to escape fromthe restrictiveconfines of the fullycoherent subjectivist Bayesianparadigm, dreaming up insteadits own new and clevertricks. I regardthis as a seriouslywrong direction.All my experienceteaches me thatit is invariablymore fruitful,and leads to deeperinsights and betterdata analyses,to explorethe consequences of beinga 'thoroughlyboring Bayesian'. Withouta clear appreciationof what being coherententails, and the guidance thata strictBayesian framework supplies, it is all too easy to fall into erroneousand misguidedways of formulatingproblems and analysingdata.
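To see the arithmetic behind the database-search comparison above, here is a minimal numerical sketch, not part of Dawid's comment: it assumes a closed population of M equally likely potential culprits (M is an invented illustrative value; P and N are the figures quoted in the example) and compares the posterior odds of guilt in the two cases.

# Hypothetical numerical illustration of the database-search example.
# P and N are taken from the comment; M and the uniform-prior model are
# assumptions made only for this sketch.

P = 1e-6        # match probability for an innocent individual
N = 10_000      # number of profiles in the searched database
M = 1_000_000   # assumed size of the population of potential culprits

prior_odds = 1.0 / (M - 1)   # uniform prior: odds that a named person is guilty

# Case 1: a single suspect is tested and found to match.
#   P(match | guilty) = 1,  P(match | innocent) = P
lr_case1 = 1.0 / P
post_odds1 = prior_odds * lr_case1

# Case 2: the database is searched; the suspect and no one else matches.
#   P(evidence | guilty)   = (1 - P)**(N - 1)
#   P(evidence | innocent) = ((M - N)/(M - 1)) * P * (1 - P)**(N - 1)
#   (if the true culprit were among the other database members, a second
#    match would be certain); the (1 - P)**(N - 1) factors cancel.
lr_case2 = 1.0 / (((M - N) / (M - 1)) * P)
post_odds2 = prior_odds * lr_case2

for label, odds in [("case 1", post_odds1), ("case 2", post_odds2)]:
    print(f"{label}: posterior probability of guilt = {odds / (1 + odds):.4f}")

# With these numbers both posteriors are close to 0.5 and nearly equal,
# with case 2 very slightly the higher; setting N = M, as in the hint about
# a database covering the whole population, drives the case 2 posterior to 1.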

J. F. C. Kingman (University of Bristol)
This paper is of great importance. If 'philosophy' is read as 'general principles', the author is laying down the general principle that the output from any statistical analysis should consist of a number of probability statements. These are subjective in the sense that they depend on assumptions made by the analyst and stated in the report, and another analyst with different prejudices will produce different conclusions. I use the word 'analyst' rather than 'statistician' because the argument, if it is valid at all, may apply not just to statistical method but to any reported research in which uncertainty plays a part. Thus Professor Lindley is calling for a revolution in the way that research in general is carried out and reported, and is doing so on the basis of very simple arguments of coherence. If we do not follow his advice, he can make money systematically from us by asking us to bet on our conclusions.

I first encountered the clarity and deceptive simplicity of Professor Lindley's exposition as a Cambridge freshman listening enthralled to his introductory course on statistics. Much of what he taught us then he would now recant, but the way in which the complexities of an uncertain world were fitted into an elegant and convincing theory was deeply impressive. Perhaps mathematicians select themselves by this desire to reduce chaos to order and only learn by experience that the real world takes its revenge.

The most common reason for scepticism about the Bayesian approach is the apparently arbitrary nature of the prior distribution p(θ), but I worry even more about the 'model' p(x|θ), which so many statisticians, Bayesian or otherwise, seem to take for granted. Just what evidence do we need to convince us that a particular model, with a particular meaning for the parameter θ, is or is not appropriate to a particular problem?

Special aspects of this question have of course been studied in theoretical terms, but in practice many statisticians make a conventional choice, often based on mathematical or computational convenience. This habit seems to me to be based on a feeling that, although modelling is difficult and controversial, the probability calculus at least is a firm foundation that need not be questioned. Mathematicians since Kolmogorov have connived by presenting the mathematics of probability as following irresistibly from the general theory of measure and integration, but the internal consistency of the mathematics is no guarantee that it applies to any real situation. Philosophers warn of the dangers of attaching firm meaning to any probability statement about the world, and the fact that such statements are undeniably useful to (for instance) the designers of telephone systems should not lead us to an uncritical reliance on what is in the end only a collection of mathematical tautologies.

One example must suffice. We teach our students that two events are (statistically) independent when the probability that they both occur is the product of their probabilities. We then forget the adverb, and assume that, if we cannot see any causal link between two events, the multiplication law must apply. At the level of constructing a plausible model, this is a reasonable procedure, so long as the model is then tested. But how do we test the sort of assertion that is made about the safety of nuclear power stations, that the probability of disaster is 10^-N, where N is some very large number? The assertion is based on many applications of the 'multiplication law', ignoring the fact that the justification of the law is inherently circular.
So probability statements are dangerous currency even before we try to infer them from dirty data. We must distrust the prophets who can sum up all the complexities in a few simple formulae. But such scepticism does not absolve statisticians from asking what the general principles of their subject are. If we do not accept Professor Lindley's prescription, what alternative do we have?
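Kingman's warning about the multiplication law can be made concrete with a small sketch; the numbers and the common-cause mixture below are invented for the illustration and are not taken from his comment. A rare common cause leaves each component's marginal failure probability essentially unchanged, yet makes the naive product rule wrong by several orders of magnitude.

# Hypothetical illustration of the multiplication-law point; all numbers
# are invented for this sketch.

n_barriers = 6       # nominally unrelated safety barriers
p_fail = 1e-2        # marginal failure probability of each barrier

# Naive 'multiplication law' answer, assuming independence:
p_product = p_fail ** n_barriers

# Same marginals, but with a rare common cause (probability 0.001) under
# which every barrier fails with probability 0.5; otherwise barriers fail
# independently with probability q chosen to keep the marginal at p_fail.
p_shock = 1e-3
q = (p_fail - p_shock * 0.5) / (1 - p_shock)   # about 0.0095
p_joint = p_shock * 0.5 ** n_barriers + (1 - p_shock) * q ** n_barriers

print(f"product rule      : {p_product:.2e}")   # 1.0e-12
print(f"with common cause : {p_joint:.2e}")     # about 1.6e-05, vastly larger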

David J. Bartholomew (Stoke Ash)
It is a pleasure to comment on this lucid and authoritative exposition of subjective Bayesianism. There is much in the paper with which I whole-heartedly agree, but I shall focus on the points of difference. I agree that uncertainty and variability are at the heart of statistics. Unlike the author, however, I regard variability as the more fundamental. Data analysis then comes first and does not have to be justified later as a tool for model selection. Uncertainty arises naturally, but secondarily, when we need to think about p(y|x) or p(θ|x).

My main point concerns modelling. Debates on inference have often treated the model as given and so focused on the prior distribution. How the model is chosen is much more important for the philosophical foundations of statistics. Lindley argues for the largest possible model. What happens when we push this to the limit and try to imagine a truly global model for the whole cosmos? In the beginning the model's x would have to include literally everything that could be observed. At that stage K, the background knowledge, is an empty set. How could we then assign a prior without any background knowledge? If this is impossible, how does the journey to knowledge ever start? But if we allow that, somehow or other, it did start what matters now is how we proceed; just how large does the 'world' of the model have to be? How do we cope with the fact that different models may have the same observational consequences? Why should two scientists ever agree, no matter how much data they have in common, if they are operating with different, but equally well-supported, models? In any case, any realistic world model is underdetermined and so certainty is beyond our reach.

Experimental science attempts to solve the 'small' world problem by controlling all extraneous variables. R. A. Fisher took that idea further by using randomization to ensure that the extraneous effects were controlled on average. It is a weakness of the subjective Bayesian philosophy that it has no place for randomization and thus no way of making valid unconditional inferences in a small world. Instead it is left to flounder on the slippery slope of what looks suspiciously like an infinite regress.

Finally, it is the personal focus of the philosophy which makes me most uneasy. The pursuit of knowledge (inference and decision-making) is a collective as well as an individual activity. It is not, essentially, about what it is rational for you or me to believe and do, but about what claim there is on us all, collectively, to believe something or to act in a particular way. Lindley recognizes this distinction in decision-making but it is equally relevant in inference. Inference is conditional on the model and without agreement on where the journey starts there is no guarantee that we shall all arrive at the same destination.

Attractive though it is, the author's world of discourse seems too small and self-contained to be the last word.

A. O'Hagan (Universityof Sheffield) I congratulateDennis Lindleyfor his elegantlywritten paper, that so lucidlycovers an enormousrange of fundamentaltopics. I particularlyliked the sectionon 'data analysis'in Section 10. The idea, thatwe should conditionon whateversummaries t(x) of the data have been used in buildingor checkingthe model,is a real insight.It clearlycovers the case wherewe use partof the data as a 'trainingsample', reservingthe remainingdata forconfirmatory analysis, and so linksto the use of partialBayes factors (O'Hagan, 1995). It will, of course,be moredifficult to apply followingmore loosely structured'data analysis'. I also applaud the emphasis,in the finalsection, on the need forresearch into methods of assessing (or eliciting)probability distributions. Lindleymisses an opportunity,however, to showhow theBayesian approach clarifies the concept of a nuisance parameter.He says, at several points, that it may be 'necessary' to introducenuisance parametersa in additionto theparameters of interest0. In whatsense is this'necessary'? Lindleyintroduces, in Section9, the exampleof a doctorneeding to give a prognosisy fora patient based on observationsx fromprevious patients. He says thatthis could be done simplyby assessingthe predictivedistribution p(ylx), but thatthis is 'usually difficult'.He thenasserts that 'a betterway to proceed... is to studythe connections between x and y,and themechanisms that operate'. This argument onlyworks if we recognizethe limitations of practicalprobability assessment. To assess p(ylx) directlyis notjust 'difficult'but likely to be veryinaccurate, whereas constructing it indirectlyvia otherassessments (a model) and the laws of probabilityis bothmore accurate and moredefensible to others.In O'Hagan (1988) I developthis idea of 'elaboration'as a fundamentaltool of probability measurement. Taking this view, nuisance parametersare desirable (if not absolutely 'necessary') to achieve sufficientlyaccurate and defensibleassessments; we can be confidentof assessingp(ylx, a, 0) but not of assessingp(ylx, 0). Finally,I recognizethat in such a concise surveyas this it is necessaryto make some judicious simplifications,but thereare a fewplaces whereLindley risks damaging his argumentby being sim- plisticrather than simplified.

(a) The example in Section 8 of non-conglomerability when ignorance is represented by an improper uniform distribution is not a good one, because the conditional probabilities are not defined. The conditioning events have zero probability.
(b) At the end of Section 8, it is not true that consensus will be reached if, as stated in the next sentence, we admit subjectivity about the likelihood.
(c) At the end of Section 11, the possibility of tactical voting is overlooked. One's utility is not simply a matter of the number of votes cast for the 'best' candidate.
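As a minimal sketch of the 'elaboration' idea in O'Hagan's comment, and using an invented beta-binomial setting that is not in the paper, the predictive distribution p(y|x) can be built from an assessed sampling model and a posterior for the parameter rather than being assessed directly.

# Hypothetical beta-binomial illustration of building p(y | x) by
# 'elaboration' through a parameter theta; the prior and the data are
# invented for this sketch.
import numpy as np
from math import comb

a0, b0 = 1.0, 1.0           # assumed Beta(1, 1) prior for theta
successes, trials = 7, 10   # invented data x
a, b = a0 + successes, b0 + trials - successes   # posterior is Beta(a, b)

# Predictive probability of y successes in m future trials, obtained by
# averaging the binomial likelihood over posterior draws of theta.
m = 5
rng = np.random.default_rng(1)
theta = rng.beta(a, b, size=100_000)
predictive = [
    np.mean(comb(m, y) * theta**y * (1 - theta)**(m - y)) for y in range(m + 1)
]

for y, p in enumerate(predictive):
    print(f"p(y = {y} | x) = {p:.3f}")
# Each ingredient (binomial model, beta posterior) is comparatively easy to
# assess and to defend; the predictive distribution then follows from the
# probability calculus instead of being guessed at directly.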

David J. Hand (Imperial College ofScience, Technologyand Medicine,London) I would like to congratulateProfessor Lindley on a masterfullyclear expositionof the Bayesian perspective.The argumentsthat he presentsfor adopting this approach to inferenceare difficultto refute. Nevertheless,I must take issue with the paper,and my disagreementbegins with the title.What is describedin thepaper is a philosophyof statisticalinference, not a philosophyof statistics.As such,it ignoresmuch which should properly be regardedas withinthe orbit of statistics. Lindleysuggests, in Section 2, thatstatistics is the studyof uncertainty.This is certainlyone of the mostimportant aspects of statistics,perhaps the largest part, but it does notdefine it. At thevery least, it leaves out description,summarization and simplificationwhen the data are notuncertain and the aim is not inference,as arises, for example,when the completepopulation is available foranalysis. Would ProfessorLindley claim thatdata analytictools such as multidimensionalscaling and biplotsare not statisticaltools? Would he claim thatthe clusteringof chemicalmolecules, when data are available for the entirepopulation of a given familyof molecules,is not statistics?Would he claim thatclustering microarraygene expressiondata is notstatistics? This would not be a serious issue if it were merelya matterof terminology.But it is not. It goes Commentson the Paper by Lindley 329 furtherthan this and has implicationsfor how the disciplineof statisticsas a whole is perceived.In my view,the narrowview of statisticswhich it implieshas contributedto the factthat other data analytic disciplineshave grownup and adoptedsubject-matter, kudos and resourceswhich are moreappropriately regardedas belongingto statistics.Again, this would notmatter if it were onlya questionof hurtpride. But, again,there is moreinvolved. In particular,it meansthat the elegant tools forhandling uncertainty whichhave been developedby statisticianshave notalways been adoptedby othersconcerned with data analysis,and have thereforenot been appliedto problemswhich could benefitfrom them. For example, database technologistshave not always appreciatedthe need for inferentialmethods (a case in point beingthe analysisof supermarkettransaction data, where the discoveryof a relationshipin thedata has been takenat face value, withno explicitlyarticulated notion of an underlyingpopulation from which the data were drawn).A second exampleis the case of fuzzylogic. The underlyinglogic herehas not been uniquelydefined, and thearea certainlylacks theelegant rigour of theBayesian inferential strategy describedby ProfessorLindley. But the methodsnow attracta huge following.This followingtends to come fromthe computationaldisciplines where,because of the narrowview of statisticsdescribed above, statisticalinferential methods have not been adopted as fundamental.A thirdexample is computationallearning theory, which began by assumingthat the classes in supervisedclassification problemswere, in principleat least,perfectly separable and has onlyrecently begun to strugglewith the morerealistic non-separable case whichstatisticians take for granted. Sometimesthere is a convergence artificialneural nets are a most importantrecent example, and recursivepartitioning tree methods,developed in parallel by the statisticaland machine learning communities,are another.When thishappens significant synergy can resultfrom the integrationof the differentperspectives. 
It is a pity,and detrimentalto the rate of scientificprogress, that a period of separationhas to existat all. One issue on which I would welcome ProfessorLindley's commentsis the issue of what I call 'problemuncertainty'. The inferentialstrategy outlined in the paper capturesmodel uncertaintyand samplinguncertainty, but real problemsoften have an extralayer of uncertainty,in thatthe question that the researcheris tryingto answeris not preciselydefined. An obvious illustrationlies in the need to operationalizemeasurements: in physicswe mayhave a good idea thatour measuring instruments match our conceptualdefinition of a variable,but in manyother domains things are notso clear cut. Our model maypredict a good outcomeif we measurea responsein one way,but whatif thereis a disagreement aboutthe best way to measurethe response? An extremeexample would be quality-of-lifemeasurement. Similarly,in a clinical trial,the responseto a treatmentmay be measuredin differentways. And in classificationproblems, for example, it is not always clear how to weightthe relativecosts of the differentkinds of misclassification.How shouldwe takeinto account this kind of uncertainty? On a minorpoint, if I disagreewith Professor Lindley about the scope of statistics,I perhapsdisagree withhim even more about the scope of literature.Analyses of word countsmay 'help to identifythe authorof an anonymouspiece of literature'(Section 18), but theydo not say anythingabout literature per se. I would like to end on a noteof agreement.Lindley remarks, in his finalparagraph, that 'Our journals ... have been too divorcedfrom the client's requirements'. This seems too painfullyto be thecase. The focus seems to be increasinglyon narrowtechnical advance into increasinglyspecialized areas, with greatermerit being awarded to workwhich is moreabstract and moredivorced from the realities of data. Statisticshas enoughof an image problemto overcome,without our gratuitouslyaggravating it. I am painfullyreminded of Ronald Reagan's remark,that 'Economists are people who see somethingwork in practiceand wonderif it would workin theory'.I would hate statisticiansto be tarredwith the same brush.

George Barnard (Colchester)
Space does not permit listing the many points of disagreement, and some points of agreement, between me and my friend Professor Lindley. My central objection to probability as the sole measure of uncertainty is the rule Pr(H) + Pr(not H) = 1. If H is a statistical hypothesis that is relevant to a given data set E it must specify the probability Pr(E|H) of E. But the mere assertion that H is false leaves Pr(E|not H) wholly unspecified. It is only when given a particular model, i.e. a specified collection M of hypotheses, that we are entitled to equate 'not H' with 'some other hypothesis in M'. Our model may be wrong, and the primary function of traditional p-values is to point to this possibility. If M is wrong, in repeated experimentation p will shrink to 0.

Given that M is accepted, our statistical problem becomes that of weighting the evidence for any one H in M against that for any other H' in M. This is done by calculating the likelihood ratio

W = L(H versus H'|E) = Pr(E|H)/Pr(E|H').

In any long series of judgments between pairs H and H', if we choose H rather than H' when L exceeds w and choose H' rather than H when L is less than 1/w, leaving our choice undetermined when L falls between 1/w and w, correct choices will outnumber incorrect choices by at least w:1. For important choices we might fix w = 100, insisting on more data when W falls between 1/100 and 100. For less important choices we might be willing to take w = 20. The more important our choice, the more data we may need to collect.

Likelihoods cannot always be added. But if with data E giving L(α, β|E) we are interested in α but not in β then, provided that the data themselves are reasonably informative about β, adding L(α, β) over β values is permissible as a reasonable approximation. Nowadays desk-top computers allow us to overview L(α, β) and it is easy to see whether, for the data to hand, such an approximation is permissible. Fisher never had a desk-top computer. But in every edition of Statistical Methods for Research Workers he said that likelihood was the measure of credibility for inferences. To keep to statistical methods that were actually usable in his day he had to overstress p-values. I am sure that Fisher would change his mind today, and I hope that Professor Lindley may be persuaded to do the same.
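Barnard's weighing rule is easy to illustrate; the sketch below uses invented binomial data and two fully specified hypotheses, and applies the thresholds w = 20 and w = 100 that he mentions.

# Hypothetical illustration of the rule W = Pr(E|H)/Pr(E|H'); the data and
# the two hypothesised success probabilities are invented for this sketch.
from math import comb

def binom_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

successes, trials = 16, 20
p_H, p_Hprime = 0.8, 0.5     # H and H' each fully specify Pr(E | .)

W = binom_pmf(successes, trials, p_H) / binom_pmf(successes, trials, p_Hprime)
print(f"likelihood ratio W = {W:.1f}")

for w in (20, 100):
    if W >= w:
        verdict = "choose H"
    elif W <= 1 / w:
        verdict = "choose H'"
    else:
        verdict = "collect more data"
    print(f"threshold w = {w:3d}: {verdict}")
# With these invented data W is about 47: decisive at w = 20 but, for an
# important choice with w = 100, more data would be needed.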

Brad Efron (Stanford University)
The likelihood principle seems to be one of those ideas that is rigorously verifiable and yet wrong. My difficulty is that the principle rules out many of our most useful data analytic tools without providing workable substitutes. Here is a bootstrap story, not entirely apocryphal, to illustrate the point.

A medical researcher investigating a new type of abdominal surgery collected the following data on the post-operative hospital stay, in days, for 23 patients:

1 2 3 3 3 3 4 4 4 4 5 5 5 5 6 6 7 7 8 9 10 16 29.

[Fig. 1. 2000 bootstrap replications of the 10% trimmed mean for the hospital stay data; the dotted histogram shows the Bayesian bootstrap.]

Following a referee's advice she had summarized the data with its 10% trimmed mean, 5.35, but wanted some formula for the estimate's accuracy.

To help to answer her question I drew 2000 independent bootstrap samples, each comprising 23 draws with replacement from the data above, and for each sample computed the 10% trimmed mean. The histogram of the 2000 bootstrap trimmed means is shown in Fig. 1. From it I calculated a bootstrap standard error of 0.87 and a nonparametric 90% bootstrap approximate confidence interval [4.42, 7.25]. It was interesting to notice that the interval extended twice as far above as below the point estimate, reflecting the long right-hand tail of the bootstrap histogram.

This is exactly the kind of calculation that is ruled out by the likelihood principle; it relies on hypothetical data sets different from the data that are actually observed and does so in a particularly flagrant way. Some of the bootstrap samples put more than 10% sample weight on the two largest observations, 16 and 29, giving them influence on the 10% trimmed mean that they do not exert in the original data set. As a matter of fact this effect accounts for most of the long right-hand tail in the histogram.

It is not as though Bayesian theory does not have anything to say about this example. We might put an uninformative Dirichlet prior on the class of probability distributions entirely supported on the 23 observed values, as suggested in Rubin (1981) and in chapter 10 of Efron (1982), and calculate the posterior distribution of the population 10% trimmed mean. Interestingly, this posterior distribution agrees closely with the bootstrap histogram; I have indicated it by the dotted histogram in Fig. 1. But of course this is not a genuine Bayesian analysis; it is empirical Bayesian, using the data to guide the formation of the 'prior' distribution. A well-known quote of Professor Lindley says that nothing is less Bayesian than empirical Bayes analysis. Does he still feel this way?

My feeling is that good frequentist procedures are often carrying out something like an 'objective' Bayesian analysis, as suggested in Efron (1993), and that maybe this hints at a useful connection between the realities of data analysis and the philosophic cogency of the Bayesian argument.
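Efron's calculation is straightforward to reproduce in outline. The sketch below bootstraps the 10% trimmed mean of the 23 hospital-stay values; the trimming and interval conventions are assumptions, so the figures need not match the 5.35, 0.87 and [4.42, 7.25] quoted above exactly.

# Sketch of the bootstrap example: 2000 resamples of the 10% trimmed mean
# for the 23 hospital-stay values. Trimming and interval conventions are
# assumed, so results only approximate those reported in the comment.
import numpy as np

stay = np.array([1, 2, 3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5,
                 6, 6, 7, 7, 8, 9, 10, 16, 29])

def trimmed_mean(x, prop=0.10):
    """Mean after dropping prop of the observations from each end."""
    x = np.sort(x)
    k = int(prop * len(x))          # here k = 2 of the 23 observations
    return x[k:len(x) - k].mean()

rng = np.random.default_rng(0)
B = 2000
boot = np.array([trimmed_mean(rng.choice(stay, size=stay.size, replace=True))
                 for _ in range(B)])

print(f"trimmed mean           : {trimmed_mean(stay):.2f}")
print(f"bootstrap std error    : {boot.std(ddof=1):.2f}")
lo, hi = np.percentile(boot, [5, 95])
print(f"percentile 90% interval: [{lo:.2f}, {hi:.2f}]")
# The long right-hand tail of the bootstrap distribution comes from
# resamples that put extra weight on the two largest observations, 16 and
# 29, as described in the comment.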

D. A. Sprott (Centro de Investigación en Matemáticas, Guanajuato)
Some of my doubts about the application of the ideas in this paper to scientific inference are listed briefly below.

(a) This paper relegates statistical and scientific inference to a branch (probability) of pure mathematics, where inferences are deductive statements of implication: if H1 then H2. This can say nothing about whether there is reproducible objective empirical evidence for H1 or H2, as is required by a scientific inference. Scientific inference transcends pure mathematics.
(b) In particular, Bayes's theorem (1) requires that all possibilities H1, H2, ..., Hk be specified in advance, along with their prior probabilities. Any new, hitherto unthought of hypothesis or concept H will necessarily have zero prior probability. From Bayes's theorem, H will then always have zero posterior probability no matter how strong the empirical evidence in favour of H.
(c) This demonstrates the necessity of presenting the likelihood function to summarize the objective experimental evidence. But what if the likelihood function flatly contradicts the prior distribution, leading to a posterior distribution that flatly contradicts both the prior distribution and the likelihood function? Surely contradictory, conflicting, items cannot merely be routinely combined.
(d) Likelihoods are point functions whereas probabilities are set functions. Likelihood therefore measures the relative plausibility of two specific values, θ':θ", of a continuous parameter θ. The probability, however, of each specific value, θ' and θ", is 0. Probability measures the uncertainty of intervals. The practical value of using likelihood supplemented by probability (if possible) to measure uncertainty is illustrated by Diaz-Frances and Sprott (2000).
(e) I do not see how the injection of subjective beliefs into experimental evidence (Section 8) can be justified. Beliefs are necessary in designing experiments. To inject them into the analysis of the objective data could lead to proof by assumption or belief, or to the combination of contradictory items as in (c) above. The beliefs may just be plainly wrong and should be rejected, not combined, however 'incoherent' this would be. In any case the likelihood should be presented separately as a summary of the empirical evidence, uncontaminated by beliefs. As put by Bernard (1957), page 23,

'I consider it, therefore, an absolute principle that experiments must always be devised in view of a preconceived idea .... As for noting the results of an experiment, ... I posit it similarly as a principle that we must here, as always, observe without a preconceived idea.'

Ian W. Evett (ForensicScience Service,London) I am gratefulfor this opportunity to commenton thepaper by ProfessorLindley. He and I have known each otherfor over 25 yearsand it would be difficultfor me to exaggeratethe effectthat he has had on mythinking. His view, like that of the majorityof statisticians,is that of the mathematician.I am, firstand foremost,a scientist indeed, a forensicscientist. My perspectiveis quite differentand mightbe considerediconoclastic, renegade even, to readersof thisjournal. It is appropriatethat Dennis shouldmention that there appears not to be a strongassociation between statisticsand physics.I tookmy first degree in physicsand myintroduction to statisticscame in myfirst year.It was a two-hourlecture on 'errorsof observationand theirtreatment'; the lecturerwas so proud of it thathe recordedhimself for future use. I understoodnone of it. My secondyear included a course on statisticsfrom a real live academic statistician.I am not exaggeratingwhen I say thatI foundit completelymystifying. Now clearly,I had to do somethingin mypractical experiments to indicatethe extentof the uncertaintiesin any estimatesor otherinferences that I drewfrom my observations.But thatwas not reallya problem,because it soon became clear thatmy supervisors,to say nothingof my fellowstudents, understood no moreof thefiner points of statisticsthan I did! A good amountof fudging to roundoff one's experimentalreports was quiteenough to satisfythe most scrupulous demonstrator. Later by thattime a practisingforensic scientist I was sufficientlyfortunate to studystatistics full timein a post-graduatecourse at Cardiff.Since then,I have spentmost of mytime working with forensic scientistsof all disciplineson mattersof inferenceand statistics.A large proportionof my time is devotedto training withthe emphasison new entrantsto the ForensicScience Service.And whatdo I find?Most graduatescientists have learnedlittle of statisticsother than a dislikeof the subject.Even moreimportantly they have no understandingof probability.A forensicscientist spends his or hertime dealing with uncertainty.In courts,terms such as 'probable', 'unlikely' and 'random' are everyday currency.I have thoughtfor a longtime that if a forensicscientist is indeeda genuinescientist then he or she shouldunderstand probability. Yet it is myimpression that many statisticians are ratheruncomfortable with the notion of probability. Certainly,there is plentyof talk about long run frequencies but what about the probabilitiesof real world problems?Is it not a fundamentalweakness that most texts and teaching schemes present conditionalprobabilities as somethingspecial? Even such a delightfultext as Chance Rules (Everitt, 1999) talks quite happilyabout 'probability'before introducing 'conditional probability', almost as a new concept,in chapter7. All probabilitiesare conditional.There is no such thingas an 'unconditional probability'.A probabilityis withoutmeaning unless we specifythe information and assumptionsthat it is based on. Whereasthese assertionsare to me obvious,I sense thatmany statisticians would dispute them. Much progresshas been made towardsestablishing the principlesof forensicscience throughthe Bayesian paradigm.There is littledisagreement among Bayesians and frequentistalike thatBayes's theoremprovides a logical model forthe processing of evidencein courtsof law. 
I have heardclassical statisticianswho have venturedinto the field say thingslike 'I have nothingat all againstBayesian methods indeed,I use them myselfwhen they are appropriate'.But this is the cry of the toolkit statistician a statisticianwho lacks a philosophy.In the world of science here lies the distinction betweenthe scientist and thetechnician. The new technologyof DNA profilinghas caughtthe headlinesand broughtmany statisticians into thefield. Much of thishas been highlybeneficial to thepursuit but there have been severaleccentricities, arisingfrom the classical view,that have confusedrather than illuminated. An example of misguided statisticalthinking is thenotion of significancetesting for Hardy-Weinberg equilibrium (HWE). We all knowthat the conditions for HWE cannotexist in thereal worldbut what do we do whena new locus is implementedfor DNA profiling?We set out to testthe null hypothesisthat HWE is true-even though we knowthat it is patentlyfalse! Whydo we play thesesilly games? They do nothelp science and they do nothelp the advancement of statisticsas a scientificdiscipline. My view of statistics,as a scientist,is thatthere is the Bayesian paradigmand thereis everything else a hotchpotchof significancetesting and confidenceintervals that is at best peripheralto the scientificmethod. Yet this is whatis taughtto scienceundergraduates and most,like me, are mystifiedby it. In his concludingparagraph, Dennis says thatstatisticians 'have been too divorcedfrom the client's Commentson the Paper by Lindley 333 requirements'.Here is my requestas a client:in the future,I requirethat all new science graduates shouldunderstand probability. And I am nottalking about coin tossingand long runfrequencies: I am talkingabout probability as thefoundation of logical scientificinference.

Author's response As explainedat the end of Section 1, thispaper began as a reprimandto my fellowBayesians fornot being sufficientlyBayesian, but it ended up by being a statementof myunderstanding of the Bayesian view. To many,this view acts like a red rag to a bull and I am most appreciativeof the factthat the discussantshave notbeen bullishbut have broughtforward reasoned and sensiblearguments that carry weightand deserverespect. Limitations of space preventevery point from being discussed and omission does notimply lack of interestor dismissalas unimportant. Manyof thediscussants are reluctantto abandonfrequentist ideas and I agreewith Armitage that this is notjust inertia,though the difficulty all of us experiencein admittingwe werewrong must play a part. There are at least two solid reasons forthis: frequencytechniques have enjoyedmany successes and, throughthe concept of exchangeability,share many ideas withthe Bayesian paradigm. Against this there is the considerationthat almost everyfrequentist technique has been shown to be flawed,the flaws arisingbecause of the lack of a coherentunderpinning that can only come throughprobability, not as frequency,but as belief.A second considerationis that,unlike the frequency paradigm with its extensive collectionof specializedmethods, the coherent view providesa constructivemethod of formulatingand solvingany and everyuncertainty problem of yours.30 years ago I thoughtthat the statisticalcom- munitywould appreciatethe flaws,but I was wrong.My hope now is thatthe constructiveflowering of the coherentapproach will convince,if not statisticians,scientists, who increasinglyshow awarenessof thepower of methodsbased on probability,e.g. in animalbreeding (Gianola, 2000). The paper,stripped to its bare essentials,amounts to sayingthat probability is thekey to all situations involvinguncertainty. What it does not tell us is how the probabilityis to be assessed. I have been surprisedthat, although the rules are so simple,their implementation can oftenbe so difficult.Dawid providesa strikingexample that caused me muchanguish when it was firstencountered in the 1970s. Kingman'sexample of independenceis sound and pertinent.I findit illuminating,when seeing yet anotherintroductory text on statistics,to see whetherthe definition of independenceis correct;often it is not. Some elementarytexts do not even mentionthe notionexplicitly and conditionalprobability is rarelymentioned. No wondermany of thesetexts are so poor. In his perceptivecomments, Cox may not have appreciatedmy view of the relationshipbetween a statisticianand theirclient. It is not the statisticianwho has the probabilitiesbut the client; the statistician'stask is to articulatethe client's uncertainties and utilitiesin termsof theprobability calculus, these being 'based on serious and, so far as feasible,explicit information' that the client has. This informationmay be based directlyon data but oftenit uses deep understandingof physicaland other mechanismsthat are unfamiliarto the statistician.The idea of a statisticianstarting from a position almostof ignoranceabout mobile telephonesand updatingby Bayes's theorem,using whatthe expert says, is not how I perceivethe process. The clienthas informedviews and it is these thatneed to be quantifiedand, if necessary,modified to be made coherent.In the mobile telephonesexample there are presumablymany clientsincluding the manufacturersand environmentalistswith opposingconcerns. 
Although ideas exhibited in the paper do not directly apply to groups in opposition, and I do not know of any method that does in generality, they can assist, especially in exhibiting strange utility functions. It is interesting that, having expressed doubts about probabilities in the mobile telephone study, Cox concludes that 'the elicitation of priors is generally useful mainly in situations where there is a large amount of information, possibly of a relatively informal kind, which it is required to use and which it is not practicable to analyse in detail'. Is not this a fair description of the study? Incidentally, the statistician's fondness for frequency data should not blind them to information to be had from a scientific appreciation of the underlying physical mechanism.

I agree with Cox that personalistic probability should be based on information; that is why it is always conditional on that information. But I do not see how he can claim that 'confidence limits ... could be regarded as an approximate specification of a likelihood function'. Observing r successes in n trials, a likelihood function can be found but not a confidence limit, because the sample space is undefined. Cox suggests that we can test a hypothesis without an alternative in mind. Yes, but whether the test will be any good depends on the alternative.

My emphasis, supported by Evett, on the conditional nature of probability, that it depends on two arguments, not one, has not been fully appreciated. For example, if hypotheses H1, H2, ..., Hn are contemplated, with H their union, then all probabilities will be conditional on H, so the addition of Hn+1 will only necessitate a change in the conditions to the union of H and Hn+1. This meets Barnard's objection about not-H, Nelder's point about likelihood and Sprott's point (b). Incidentally, it is interesting, though not unexpected, that, apart from Barnard, no-one attempts to demolish the arguments of Sections 1-6 leading to what has been called 'the inevitability of probability', and it is only when they mount the bus, in Armitage's happy analogy, that doubts enter. The proof of the pudding may be in the eating but the recipe counts also.

The identification of statistics with uncertainty has worried many, even though I explained that it was 'not in the sense of a precise definition', so Hand is allowed his exemptions, though even there it is well to recognize that, even with a complete population, a summary introduces uncertainty and the quality of a summary is judged by how little uncertainty it leaves behind. To answer Armitage, the reasons for choosing uncertainty as primary, rather than variability in the data, are first that it is the uncertainty of the parameter (or the defendant's guilt) that is primary and the data are aids to assessing it, and second that data need something more before they will be of value. In amplification of the second point, it is possible to have two 2 x 2 x 2 tables, each with a control factor, another of effect and the last a confounding factor, both with the same numbers, in which the conclusions about the effect of the control are completely opposed (Lindley and Novick, 1981); a minimal numerical sketch of this reversal is given below. Statistical packages that analyse data without context are unsound.
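To make the reversal concrete, here is a minimal sketch in Python. The counts are standard Simpson-type figures chosen purely for illustration and are not claimed to be those of Lindley and Novick (1981): within each level of the confounding factor the treated group does worse, yet pooled over that factor it appears to do better, which is exactly why a package that ignores context can mislead.

    # Minimal Simpson-type sketch: the same 2 x 2 x 2 counts support opposite
    # conclusions about the control factor depending on whether the confounding
    # factor is attended to.  All numbers are illustrative assumptions.
    counts = {
        # (treated?, subgroup): (recovered, not recovered)
        (True,  'male'):   (18, 12),
        (False, 'male'):   ( 7,  3),
        (True,  'female'): ( 2,  8),
        (False, 'female'): ( 9, 21),
    }

    def recovery_rate(treated, subgroups):
        recovered = sum(counts[(treated, g)][0] for g in subgroups)
        total = sum(sum(counts[(treated, g)]) for g in subgroups)
        return recovered / total

    for g in ('male', 'female'):
        print(g, recovery_rate(True, [g]), recovery_rate(False, [g]))
    # Within each subgroup the treated rate is lower (0.6 v. 0.7, 0.2 v. 0.3) ...
    print('pooled', recovery_rate(True, ['male', 'female']),
          recovery_rate(False, ['male', 'female']))
    # ... yet pooled over subgroups the treated rate is higher (0.5 v. 0.4).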
The suggestion of an eclectic approach to statistics, incorporating the best of various approaches, has been made. I would, with Evett, call it unprincipled. Why do adherents of the likelihood approach, part of this eclecticism, continue with their upwards of 12 varieties of likelihood, all designed in an attempt to overcome the failure of likelihood to be additive, a requirement easily seen to be essential to any measure of uncertainty? There is only one principle: probability. Why use a pejorative term like sin to describe incoherence? These eclectic people do not like principles, as is evident from their failure to consider them, instead concentrating on their perceptions of what happens when they are applied, often falsely. Efron worries about the likelihood principle, which is not surprising when the bootstrap has no likelihood about which to have a principle. The Bayesian view embraces the whole world, which is overambitious, and has to be reduced to small worlds, whereas the frequentist view restricts attention to a population. The bootstrap goes to the extreme and operates within the sample, eschewing reference to outside aspects and using ad hoc methods, like trimmed means, discussed in Section 12 within a coherent framework. Readers who are not already familiar with it might like to read the balanced discussion of the bootstrap in Young (1994) and, in particular, Schervish's remark that 'we should think about the problem'.

O'Hagan may be wrong when he says, in his point (b), that a consensus will not be reached if the likelihood is subjective. With two hypotheses and exchangeable data, your log-odds change, on receipt of data, by the addition of the sum of your log-likelihood ratios. Provided that the expectation of the ratios is positive under one hypothesis and negative under the other, the correct conclusion will be reached and hence consensus; a small numerical sketch is given below.

Bartholomew worries about this consensus in a wider context. I do not know how we get started; perhaps it is all wired in, as Chomsky suggests grammar is. Interesting as this point is, it does not matter in practice because, when we are faced with quantities in a problem, they make some sense to us and therefore we have some knowledge of them. I may be unduly optimistic but I feel that if two people are each separately coherent, a big assumption, then a coherent appreciation of theory and experience will ultimately lead to agreement. It happens in science, though not in politics or religion, but are they coherent? A small correction to Bartholomew: Bayes does recognize what I have called a haphazard design (Lindley, 1982), and a convenient way to produce one is by randomization (after checking that the random Latin square is not Knut-Vik).

Hand raises the important question for our Society of why 'the elegant tools for handling uncertainty which have been developed by statisticians have not always been adopted'. One reason may be that some statisticians, myself included, have not immersed themselves sufficiently in the circumstances surrounding the data. Efron provides an example when, apart from telling us that the numbers refer to hospital stays after surgery, he just treats the 23 numbers as elements in the calculation: they might equally have referred to geophysics for all the bootstrap cares. We need to show more respect than we do for our clients. When I suggested this at a meeting recently, some members of the audience laughed and mentioned cranks who believed in alternative medicine as people who do not deserve respect. Ought we not to try to help all who come to us to express their ideas in terms of probability, to help them to become coherent and to respond more sensibly to data?
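Returning to the consensus point in the reply to O'Hagan above, here is a minimal sketch; the hypotheses, chances and priors are all assumptions made purely for illustration. Two analysts hold different priors and somewhat different subjective likelihoods, yet for each the expected log-likelihood ratio is positive under H1 and negative under H2, so the accumulating data drive both to the correct hypothesis.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.random(1000) < 0.6          # exchangeable Bernoulli data; H1 is true here

    # Two analysts, each with their own (subjective) chances under H1 and H2 and
    # their own prior log-odds for H1 against H2; all numbers are illustrative.
    analysts = {
        'sceptic':  {'p1': 0.60, 'p2': 0.40, 'prior_log_odds': np.log(1 / 99)},
        'believer': {'p1': 0.65, 'p2': 0.45, 'prior_log_odds': np.log(9)},
    }
    for name, a in analysts.items():
        # Each observation adds a log-likelihood ratio; Bayes in log-odds form.
        llr = np.where(x, np.log(a['p1'] / a['p2']),
                          np.log((1 - a['p1']) / (1 - a['p2'])))
        log_odds = a['prior_log_odds'] + llr.sum()
        print(name, 'posterior P(H1) =', round(float(1 / (1 + np.exp(-log_odds))), 4))
    # For both analysts the expected log-likelihood ratio per observation is
    # positive under H1 and negative under H2, so the summed ratios swamp the
    # fixed prior term and both posteriors end up close to 1: consensus on H1.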
Another reason for suspicion of statistics is that some of our methods sound, and are, absurd. How many practitioners understand a confidence interval as coverage of a fixed value, rather than as a statement about a parameter? Cox defends the statistical analysis of acquired immune deficiency syndrome (AIDS) progression. My point here, perhaps not clearly expressed, for which I apologize, is that a frequentist approach can only lead to standard errors for estimates that use the frequentist variation that is present in the data. It cannot incorporate into its prediction other types of uncertainty. For example, it may happen that, impressed by media emphasis on AIDS, the public may act more cautiously in their sexual activities and, as a result, the incidence would decrease. This sort of judgment is outside the frequency canon and, although competent, dedicated frequentists will search for ways around the limitation, the Bayesian approach naturally incorporates both forms of uncertainty. In this connection, what is the objection to attaching probabilities to frequentist models to provide an overall Bayesian model? Hoeting et al. (1999) provide a good account. I find Cox's notion of prediction too narrow. Scientists want to estimate the velocity of light, not to predict a future measurement of velocity but to predict realities that depend on it, e.g. the time taken for the light from a star to reach us. In p(y|x), y is not necessarily a repetition of x, but merely related to x; the relationship often being made explicit, as O'Hagan says, through a parameter. Of course, frequentists do not like prediction because it is so difficult within their paradigm, involving contortions akin to those with confidence intervals.

Several discussants raise the important issue of inconsistency, e.g. between a prior, to use the unfortunate term, and the likelihood. In this case, data have arisen which were unexpected. There are several possibilities. One is that an inspection of the data or discussion with a colleague reveals a possibility that you had not contemplated, in which case you may add the quantity (as Hn+1 above) and continue. Another is that the truly astonishing has occurred, just as it ought to occasionally, and you continue. A third possibility is that you selected probabilities that were insufficiently dispersed. There is some evidence that the psychological process involved in probability assessment can lead to over-confidence in your knowledge. Sometimes it can easily be corrected by using a long-tailed distribution, such as t with low degrees of freedom, in place of a normal distribution, when the combination of prior and inconsistent likelihood leads to a reasonable compromise but with enhanced dispersion (Lindley, 1983); a small numerical sketch is given below. To repeat the point made in the penultimate paragraph of my paper, we are woefully ignorant about the assessment of probabilities and a concerted research effort in this field is important. Healy is right to draw attention to the psychology of the problem.

Cox raises the nature of the probabilities arising in Mendelian genetics. I would like to reserve the word 'probability' to refer to beliefs. Genetics, and similarly quantum mechanics, use 'chances' which are, as Cox would prefer, related to biological phenomena and arise from exchangeability judgments discussed in Section 14, chance playing the role of the parameter there. The older terminology was direct and inverse probabilities. The distinction becomes useful when you wish to consider the simple concept of your probability of a chance, whereas probability of a probability is, in the philosophy, unsound.
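Returning to the remark above about inconsistency between prior and likelihood, here is a minimal numerical sketch with made-up numbers. With a normal prior the posterior is the usual precision-weighted compromise whatever the degree of conflict; with a t prior on low degrees of freedom the conflicting prior is partly discounted, the posterior moves nearer the data and its dispersion is enhanced, in the spirit of Lindley (1983).

    import numpy as np
    from scipy import stats

    # Prior centred at 0; a single observation x = 4 with unit standard error,
    # i.e. data the prior made rather unlikely.  All numbers are illustrative.
    theta = np.linspace(-10.0, 20.0, 6001)
    lik = stats.norm.pdf(4.0, loc=theta, scale=1.0)

    def posterior_summary(prior_pdf):
        post = prior_pdf * lik
        post /= post.sum()                      # normalize on the uniform grid
        mean = (theta * post).sum()
        sd = np.sqrt(((theta - mean) ** 2 * post).sum())
        return round(float(mean), 2), round(float(sd), 2)

    print('normal(0, 1) prior:', posterior_summary(stats.norm.pdf(theta, 0.0, 1.0)))
    print('t(2) prior        :', posterior_summary(stats.t.pdf(theta, df=2)))
    # The normal prior forces the rigid half-way compromise (mean 2, s.d. about
    # 0.71) however discordant the data; the long-tailed t(2) prior gives a
    # posterior noticeably nearer the observation and with a larger spread.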
I do not understand Cox's remark about calibration. When others raise the issue they ordinarily refer to the long-run frequency, whereas Bayesians live around the present, not long runs, and continually adjust by our beloved theorem, responding in a way that is distinct from the frequentist. This is illustrated by their use of present utility functions rather than error frequencies. His claim that frequentists mesh better than Bayesians with the real world seems wrong to me.

O'Hagan is right in claiming that my discussion of nuisance parameters is deficient and I agree with him that the inclusion of extra quantities, that are not of immediate interest, is fundamental to the assessment of probabilities. It is a case of the larger model being simpler and hence more communicative. However, I do not agree with his comment on the conglomerability example, for it is the conditional probabilities that are the tangible concepts. A uniform distribution on the integers only makes sense when it means uniform in any finite set.

Whenever there is a discussion about the Bayesian view, someone is sure to bring out the remark about being 'coherent but wrong' and Nelder does not disappoint. You are never wrong on the evidence that you have, when expressing your beliefs coherently. To appreciate this, try to give a definition of 'wrong'. Of course additional evidence may demonstrate that you were wrong, but Bayesians can deal with that, either by changing the conditions, as when you learn that an event on which you have conditioned is false, or by updating by the theorem. Wrong you may often be with hindsight, but even frequentists, or likelihood enthusiasts, have that property also.

I do agree with Dawid that 'Bayesian statistics is fundamentally boring'. A copy of this paper was sent to the person who has the fullest understanding of the subjectivist view of anyone I know (Lad, 1996), and his principal comment was that it was boring. My initial reaction was of disappointment, even fury, but further contemplation showed me that he is right for the reasons that Dawid gives. My only qualification would be that the theory may be boring but the applications are exciting.

Sprott, in his point (e), argues that you should summarize empirical evidence without reference to preconceived ideas and says that this should be done through likelihood statements. Against this I would argue that no-one has succeeded in describing a sensible way of doing this. I dispute Armitage's claim that the 'Fisherian revolution' accomplished this because, although his methods were superb, his justifications were mostly fallacious. Likelihood will not work because of difficulties with nuisance parameters and because of absurdities like that described in Section 13.

An interesting feature of the comments is an omission; there is little reference to the subjectivity advocated in the paper, which surprises me because science is usually described as objective. Indeed Cox, in concluding his advocacy of the eclectic approach, gives a personalistic reason for supporting his view: 'I regard the frequentist view as primary, for most if not virtually all the applications with which I happen to have been involved.' My advocacy of the subjective position is based on reason, subsequently supported by experiences of myself and others.

I conclude on a personal note.
When, half a century ago, I began to do serious research in statistics, my object was to put statistics, then almost entirely Fisherian, onto a logical, mathematical basis to unite the many disparate techniques that genius has produced. When this had been done by Savage, in the form that we today call Bayesian, I felt that practice and theory had been united. Kingman's sentence is so apt to what followed: 'Perhaps mathematicians select themselves by this desire to reduce chaos to order and only learn by experience that the real world takes its revenge.' The revenge came later with the advocacy of the likelihood principle by Barnard, and later Birnbaum, so that doubts began to enter, and later still, as the plethora of counter-examples appeared, I realized that Bayes destroyed frequency ideas. Even then I clung to improper priors and the attempt to be objective, only to have them damaged by the marginalization paradoxes. More recently the subjectivist view has been seen as the best that is currently available and de Finetti appreciated as the great genius of probability. It is therefore easy for me to understand how others find it hard to adopt a personalistic attitude, and I am therefore grateful to the discussants for the reasoned arguments that they have used, some of which I might have myself used in the past.

References in the comments

Balding, D. J. and Donnelly, P. (1995) Inference in forensic identification (with discussion). J. R. Statist. Soc. A, 158, 21-53.
Bernard, C. (1957) An Introduction to the Study of Experimental Medicine (Engl. transl.). New York: Dover Publications.
Cox, D. R. (1978) Foundations of statistical inference: the case for eclecticism (with discussion). Aust. J. Statist., 20, 43-59.
Cox, D. R. (1995) The relation between theory and application in statistics (with discussion). Test, 4, 207-261.
Cox, D. R. (1997) The nature of statistical inference. Nieuw Arch. Wisk., 15, 233-242.
Cox, D. R. and Hinkley, D. V. (1974) Theoretical Statistics. London: Chapman and Hall.
Curnow, R. N. (1999) Unfathomable nature and Government policy. Statistician, 48, 463-476.
Dawid, A. P. and Mortera, J. (1996) Coherent analysis of forensic identification evidence. J. R. Statist. Soc. B, 58, 425-443.
Dawid, A. P. and Mortera, J. (1998) Forensic identification with imperfect evidence. Biometrika, 85, 835-849.
Diaz-Frances, E. and Sprott, D. A. (2000) The use of the likelihood function in the analysis of environmental data. Environmetrics, 11, 75-98.
Doll, R. and Hill, A. B. (1950) Smoking and carcinoma of the lung: preliminary report. Br. Med. J., ii, 739-748.
Donnelly, P. and Friedman, R. D. (1999) DNA database searches and the legal consumption of scientific evidence. Mich. Law Rev., 97, 931-984.
Durbin, J. (1987) Statistics and statistical science. J. R. Statist. Soc. A, 150, 177-191.
Efron, B. (1982) The Jackknife, the Bootstrap, and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Efron, B. (1993) Bayes and likelihood calculations from confidence intervals. Biometrika, 80, 3-26.
Everitt, B. S. (1999) Chance Rules: an Informal Guide to Probability, Risk and Statistics. New York: Springer.
Finney, D. J. (1978) Statistical Method in Biological Assay, 3rd edn. London: Griffin.
Gianola, D. (2000) Statistics in animal breeding. J. Am. Statist. Ass., 95, 296-299.
Healy, M. J. R. (1999) Paradigmes et pragmatisme. Rev. Epidem. Sant. Publ., 47, 185-189.
Hoeting, J. A., Madigan, D., Raftery, A. E. and Volinsky, C. T. (1999) Bayesian model averaging: a tutorial (with discussion). Statist. Sci., 14, 382-417.
Lad, F. (1996) Operational Subjective Statistical Methods. New York: Wiley.
Lindley, D. V. (1957) A statistical paradox. Biometrika, 44, 187-192.
Lindley, D. V. (1971) Bayesian Statistics: a Review. Philadelphia: Society for Industrial and Applied Mathematics.
Lindley, D. V. (1977) A problem in forensic science. Biometrika, 64, 207-213.
Lindley, D. V. (1978) The Bayesian approach (with discussion). Scand. J. Statist., 5, 1-26.
Lindley, D. V. (1982) The use of randomization in inference. Philos. Sci. Ass., 2, 431-436.
Lindley, D. V. (1983) Reconciliation of probability distributions. Ops Res., 13, 866-880.
Lindley, D. V. and Novick, M. R. (1981) The role of exchangeability in inference. Ann. Statist., 9, 45-58.
Nelder, J. A. (1999) From statistics to statistical science. Statistician, 48, 257-267.
O'Hagan, A. (1988) Probability: Methods and Measurement. London: Chapman and Hall.
O'Hagan, A. (1995) Fractional Bayes factors for model comparison (with discussion). J. R. Statist. Soc. B, 57, 99-138.
Rubin, D. B. (1981) The Bayesian bootstrap. Ann. Statist., 9, 130-134.
Schwartz, D. (1994) Le Jeu de la Science et du Hasard. Paris: Flammarion.
Stockmarr, A. (1999) Likelihood ratios for evaluating DNA evidence when the suspect is found through a database search. Biometrics, 55, 671-677.
Walley, P. (1996) Inference from multinomial data: learning about a bag of marbles (with discussion). J. R. Statist. Soc. B, 58, 3-57.
Yates, F. (1950) The influence of "Statistical methods for research workers" on the development of the science of statistics. J. Am. Statist. Ass., 46, 19-34.
Yates, F. and Cochran, W. G. (1938) The analysis of groups of experiments. J. Agric. Sci., 28, 556-580.
Young, G. A. (1994) Bootstrap: more than a stab in the dark (with discussion)? Statist. Sci., 9, 382-415.