
UNIVERSITY OF NORTH CAROLINA Department of Statistics Chapel Hill, N. C.

A BAYESIAN INDIFFERENCE POSTULATE (Preliminary Report)

by

Melvin R. Novick

March 1962

This research was primarily supported by the Office of Naval Research under contract No. Nonr-855(09) for research in probability and statistics at the University of North Carolina, Chapel Hill, N. C. Reproduction in whole or in part is permitted for

any purpose of the United States Government. Supplemental support was received by the National Science Foundation, Grant G-5824.

Institute of Statistics Mimeo Series No. 319

A BAYESIAN INDIFFERENCE POSTULATE^{1,2} (Preliminary Report)

by

Melvin R. Novick

Department of Statistics and The Psychometric Laboratory The University of North Carolina

"The only thing that I know is that I know nothing." Attributed to Socrates.

1. Introduction and Summary.

A Bayesian indifference postulate and a mode of estimating the density of a random variable are proposed. Suppose $f_\theta(x)$ is the density of a random variable $X$ dependent on a parameter $\theta \in \Theta$, and $\xi$ is a prior distribution for $\theta$. The prior marginal density of $X$ may be defined as

$$f_\xi(x) = \int_\Theta f_\theta(x) \, d\xi(\theta) \qquad (1.1)$$

In cases where no prior information is available it is proposed that a prior distribution $\xi \in H$ be chosen which minimizes the Shannon information measure of $f_\xi(x)$. Some sufficient conditions are given under which the prior distribution will be uniquely specified. In common examples the choice agrees with that specified by the usual application of the Bayes postulate. The proposed postulate, however, leads to a choice of $\xi$ which is invariant under one-to-one transformation of the parameter space.
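The marginal computation in (1.1) can be sketched numerically. The fragment below (the function names, the normal model, and the normal prior are illustrative choices, not part of the paper) approximates the integral by a midpoint rule and compares it with the known closed form: a $N(\theta, 1)$ model mixed over a $N(\mu, \tau^2)$ prior yields a $N(\mu, \tau^2 + 1)$ marginal.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def prior_marginal(x, model_pdf, prior_pdf, lo, hi, steps=40000):
    """Approximate f_xi(x) = integral of f_theta(x) d xi(theta) by the midpoint rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * h
        total += model_pdf(x, theta) * prior_pdf(theta) * h
    return total

# Model: X | theta ~ N(theta, 1); prior: theta ~ N(mu, tau^2)  (illustrative values).
mu, tau2 = 0.0, 4.0
f_marg = prior_marginal(
    1.3,
    model_pdf=lambda x, th: normal_pdf(x, th, 1.0),
    prior_pdf=lambda th: normal_pdf(th, mu, tau2),
    lo=-20.0, hi=20.0,
)
# Closed form: the prior marginal is N(mu, tau^2 + 1).
assert abs(f_marg - normal_pdf(1.3, mu, tau2 + 1.0)) < 1e-6
```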

2 Presented as a contributed paper at the Eastern Regional Meeting of the Institute of Mathematical Statistics, April 1962, Chapel Hill, N. C.

The posterior marginal density of $X$ is considered as an estimator of the true density of $X$. Under general conditions, the sequence of posterior marginals

is consistent and has certain interesting information-theoretic properties.

This preliminary report is expository in nature; proofs of the results obtained are presented when they contribute materially to the conceptual development of the theory.

2. The Parametric Problem

We associate with the possible outcomes of an experiment the real- or vector-valued random variable $X$ taking values in $\mathcal{X}$. We assume that there exists a density function $f(x)$ (with respect to a suitable measure) which may serve as an adequate probabilistic representation of the relative frequencies of the possible outcomes of the experiment. If $f(x)$ were known, no statistical problem would exist, as the probability of any possible outcome could be calculated. When $f(x)$ is unknown a statistical problem exists and may be approached in different ways.

An important consideration is the character of the assumptions the statistician is willing to make concerning the nature of $f$. A common assumption is that $f$ belongs to a class of distributions $\mathcal{F}$ and that these distributions may be indexed by a parameter $\theta \in \Theta$ (real- or vector-valued). We say then that the problem is in parametric form. The parametrization of a class of distributions, however, is not unique. If $\varphi = \varphi(\theta)$ is a one-to-one transformation on $\Theta$, then $\varphi$ might as easily serve as a parametrization for $\mathcal{F}$. The choice of a particular parametrization has been subject to convention and convenience but not to logical specification.

A simple example may be illustrative. Consider $\mathcal{F}$ to be the class of normal distributions with mean zero. Then $\theta$ is real-valued, and an arbitrary member, $f_\theta$, of the class may be written

$$f_\theta(x) = \frac{1}{\theta\sqrt{2\pi}}\, e^{-x^2/2\theta^2}, \qquad -\infty < x < \infty, \quad \theta > 0.$$

Often the parametrization is chosen so that the parameters coincide with moments of the distribution, as in the above case, where $\theta^2$, the variance of $X$, is often taken as the parameter. Other classes of distributions, for example, the beta distributions, are not commonly parametrized by moments.

3. The Classical Approach

The classical approach to the statistical problem in its parametric formulation has been to consider the statistical problem to be one of drawing inferences concerning $\theta$. These inferences are usually expressed in the form of point estimates, interval estimates, or tests of hypotheses. Indeed, to avoid controversy, we will consider that to be the definition of "classical approach". The syntactic or mathematical problems arising from this formulation have been a major subject of research activity. The semantic problem, that is the problem of making such inferences meaningful with respect to the scientific problem under study, has recently been given serious attention, particularly by those advocating Bayesian procedures. An inclination to add to the arguments against the meaningfulness of the classical approach will not be pursued. Those who subscribe to this approach are unlikely to be moved by further criticism here, while a more positive approach may be more compelling. We will note only that the classical approach has transformed the domain of study from the random variable to an arbitrary parameter.

4. The Bayesian Approach

In the Bayesian approach the parametric formulation is as in the classical approach. The additional assumption is that there is a distribution $\xi \in H$ on $\Theta$ with probability element $d\xi(\theta)$. In Bayesian terminology $f_\theta(x)$ is the conditional density of $X$ given $\theta$ and $\xi$ is the unconditional or prior distribution of $\theta$. The marginal density of $X$ was defined in (1.1). Given a vector of $n$ observations $\tilde{x}$, the posterior probability element of $\theta$ given $\tilde{x}$ is given by

$$d\xi_n(\theta) = \frac{f_\theta(\tilde{x}) \, d\xi(\theta)}{f_\xi(\tilde{x})} \qquad (4.1)$$

where $f_\xi(\tilde{x})$ is defined by (1.1) with $\tilde{x}$ substituted for $x$. The posterior marginal density of $X$ is defined by

$$f_{\xi,n}(x) = \int_\Theta f_\theta(x) \, d\xi_n(\theta) \qquad (4.2)$$

The dependence of (4.1) and (4.2) on $\tilde{x}$ is indicated only by the subscript $n$.

If it is possible to specify a prior distribution, $\xi$, the implied evaluation of prior information may be combined with the experimental results to yield a posterior distribution $\xi_n$ of $\theta$ by (4.1). In practice $\xi$ may be based upon prior observation of the process being studied or by analogy to some similar process. The major unresolved difficulty in this procedure lies in specifying the prior distribution of $\theta$ when "nothing is known about $\theta$". A resolution of this problem by postulate was attempted, in a special case, by Bayes [1763]. A full discussion of the history of this problem may be found in Perks [1941].

The Bayes postulate stipulates that in the absence of prior knowledge, all points in the parameter space are to be taken as equally likely, the so-called "principle of indifference". For $a < \theta < b$, where $a$ and $b$ are arbitrary constants and $\theta$ is real-valued, a uniform prior distribution is taken. For $-\infty < \theta < \infty$, a normal distribution with "large" variance and arbitrary mean may be taken. For $0 < \theta < \infty$ an exponential distribution with large mean may be taken. The extension to vector-valued parameters is straightforward.

An advantage of the Bayesian approach is that it permits direct mathematical probability statements concerning parameters. While these mathematical probability statements are not usually interpretable in a relative frequency sense, they may be considered as a model for a subjective evaluation of the parameter. The methods of point estimation, interval estimation and tests of hypotheses have direct analogs in the Bayesian context and, it is felt, can be more meaningfully handled in that context since direct probability statements on $\theta$ can be made.

The major difficulty in the Bayes postulate arises in that it is not invariant under one-to-one transformation of the parameter space. As pointed out in section 2, the parametrization of a class of distributions is quite arbitrary and indeed, given any parametrization, say $\theta$, then $\varphi = \varphi(\theta)$ may also be taken as the parameter provided only that $\varphi(\theta)$ is one-to-one. As an example suppose $X$ is binomially distributed. The class $\mathcal{F}$ of binomial densities may be parametrized by $0 < p < 1$, where $p$ is the probability of success in each of the Bernoulli trials which make up the experiment. We have then

$$f_p(x) = \binom{n}{x} p^x (1-p)^{n-x}$$

where $n$ is the total number of trials and $x = 0, 1, \ldots, n$ is the number of successes.

Under the assumption that we have no prior information relative to the process, the Bayes postulate would specify that the prior distribution of $p$ should be uniform on $0 < p < 1$, which corresponds to the beta distribution

$$g(p) = \frac{p^{k-1}(1-p)^{\ell-1}}{\beta(k,\ell)}, \qquad 0 < p < 1; \quad k, \ell > 0$$

with parameters $k = \ell = 1$. However, the parameters $\varphi = \varphi(p) = 2 \arcsin \sqrt{p}$ and $\psi = \psi(p) = \ln \frac{p}{1-p}$ could also serve as parameters for $\mathcal{F}$ [Lindley, 1957]. The Bayes postulate would require uniform prior densities for $\varphi$ and $\psi$, which are equivalent to beta densities for $p$, in the first instance with parameters $k = \ell = \frac{1}{2}$, and in the second instance with parameters $k = \ell = \epsilon$ ($\epsilon$ arbitrarily small).
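The first equivalence asserted here can be checked by a change of variables: if $\varphi = 2 \arcsin \sqrt{p}$ is uniform on $(0, \pi)$, the induced density of $p$ is $(1/\pi)\,|d\varphi/dp| = 1/(\pi\sqrt{p(1-p)})$, which is exactly the beta density with $k = \ell = \frac{1}{2}$. A sketch (the function names are illustrative):

```python
import math

def beta_pdf(p, k, l):
    """Beta density with parameters k, l."""
    b = math.gamma(k) * math.gamma(l) / math.gamma(k + l)
    return p ** (k - 1) * (1 - p) ** (l - 1) / b

def induced_density(p):
    """Density of p when phi = 2*arcsin(sqrt(p)) is uniform on (0, pi):
    (1/pi) * |d phi / d p| = 1 / (pi * sqrt(p * (1 - p)))."""
    return 1.0 / (math.pi * math.sqrt(p * (1.0 - p)))

# The induced density coincides with the beta(1/2, 1/2) density.
for p in (0.1, 0.37, 0.5, 0.9):
    assert abs(induced_density(p) - beta_pdf(p, 0.5, 0.5)) < 1e-12
```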

Indeed if we restrict $H$ to the class of beta densities, for every specification of $k = \ell = c$ (say) there could be found a transformation on $p$ such that the beta prior distribution on $p$ with parameters $k = \ell = c$ would be equivalent to a uniform distribution on the new parameter. Thus our problem would be to give a rule which would logically specify a value for $c$.

This general problem has persisted since the posthumous publication of Bayes' paper in 1763. Notable attempts [Jeffreys, 1949; Perks, 1941] have been made to circumvent this difficulty; however, the results have not met with any general acceptance. One contemporary writer had ignored the problem in an earlier paper [Lindley, 1953] but recently [Lindley, 1961] has given it serious attention. A second has flatly asserted that it cannot be resolved [Raiffa and Schlaifer, 1961]. Early papers of Fisher [1952] contain excellent discussions of this and other points concerning Bayes theory. Fisher's discussion of the fiducial distribution of the $(n+1)$st observation [1935] was important in the development of the theory presented in this paper.

Two other criticisms of Bayes procedures have been raised on occasion. The first is that even if there is some prior information it is usually difficult to formulate this adequately into a prior distribution. It is certainly true, however, that the difficulty is not so great that it is completely unamenable to analysis. Schlaifer [1959], and Raiffa and Schlaifer [1961], discussed this problem in some detail. Given a method of specifying the prior distribution under indifference, this problem could be even less imposing.

A third criticism, a logical rather than mathematical one, has been disappearing now that the mathematical theory of probability has been placed on a firm axiomatic basis, and developments in the philosophy of science have more clearly demarcated the syntactic and semantic aspects of probability theory. It was at one time argued that it was illogical to consider a parameter to have a prior distribution as the parameter was a fixed constant and not a random variable.

The error in this thinking was the failure to recognize the difference between the semantics and the syntactics of probability theory. The syntactic definition of probability, that is the definition of probability in the syntax of a mathematical system, may be rigorously formulated within the theory of measure. The semantic definition, that is the definition which links mathematical probability, as a model, to some empirical process, is less easily explicated. There have, according to Carnap [1942], developed two (semantic) definitions of probability. The first of these is associated with the concept of degree of belief, the second with the concept of relative frequency. Prominent names to be associated with the first definition are, in the earlier period, Bayes and Laplace, and in the later period, Jeffreys, Keynes, Good and Savage. The second definition, however, gained dominant acceptance through the work of Fisher, von Mises and others.

It is possible to resolve the problem of definition and to avoid much of the confusion that has been associated with the Bayes postulate only by fully recognizing that the concept of probability requires a syntactic definition and, in a Bayesian context, two semantic definitions.

5. Minimal Information

In the previous section we discussed the problem of choosing a prior distribution for $\theta$ when there was no prior information concerning $\theta$. We were seeking what we might call a minimally informative distribution. A general measure of the amount of information in a distribution may be taken from the work of Shannon [1948]. If $X$ has density $h(x)$, continuous- or discrete-type, then the information in $h$ is defined to be [Lindley, 1956]

$$I(X) = \int_{\mathcal{X}} h(x) \log h(x) \, dx.$$

In the discrete case the integral is replaced by a sum. The convention $h \log h = 0$ when $h = 0$ is assumed. It is shown [Shannon, 1948] that the function $I(X)$ is the unique function, up to a multiplicative constant, satisfying certain properties which might reasonably be required of an information function. The major property will be noted later.

One property of the Shannon information measure [Shannon, 1948] is that for $a < x < b$, where $a$ and $b$ are fixed constants, the density

$$h(x) = \frac{1}{b-a},$$

the uniform density, has minimal information. For $-\infty < x < \infty$ and for $\mathrm{Var}(X)$ fixed the normal distribution with arbitrary mean has minimal information, and for $0 < x < \infty$ and $E(X)$ fixed the exponential distribution has minimal information. For $X$ taking a finite number of discrete values the discrete uniform distribution has minimal information. The requirement that $\mathrm{Var}(\theta)$ or $E(\theta)$ be fixed is equivalent to fixing the scale of measurement. In the case of a density on a finite interval the scale of measurement is set by specifying the endpoints of the interval. In each instance, the fixing of two constants is equivalent to fixing the origin and scale of measurement; in the case of a doubly infinite range of $x$, the constants are the prior mean and variance; in the case of a singly infinite range, the constants are the endpoint (zero) and the mean; while in the case of finite range, the constants are the two endpoints.

Extensions to two discrete cases not covered by Shannon are quite simple. It is easily seen that the geometric distribution minimizes, for fixed mean, the information among distributions on the non-negative integers, and that the discrete analog of the normal distribution, with the probability of the integer $x$ proportional to $e^{-ax^2}$, minimizes the information, for fixed variance and arbitrary mean, over the set of all integers.
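The finite discrete case of this minimal-information property is easy to verify numerically. The sketch below (names and parameter values are illustrative) computes $I = \sum h \log h$ and confirms that the uniform distribution on $\{0, \ldots, n\}$ has smaller information than, for example, binomial distributions on the same points.

```python
import math

def information(h):
    """Shannon information I = sum h log h (the document's sign convention),
    with the convention h log h = 0 when h = 0."""
    return sum(p * math.log(p) for p in h if p > 0)

n = 10
uniform = [1.0 / (n + 1)] * (n + 1)

def binomial(n, p):
    """Binomial probabilities on {0, ..., n}."""
    return [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

# Any non-uniform distribution on {0, ..., n} carries more information.
for p in (0.2, 0.5, 0.8):
    assert information(binomial(n, p)) > information(uniform)
```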

For the joint density $h(x,y)$ of two random variables the information is defined analogously as

$$I(x,y) = \int_{\mathcal{X}} \int_{\mathcal{Y}} h(x,y) \log h(x,y) \, dy \, dx.$$

The major defining relation of the information measure, mentioned previously, is

$$I(x,y) = I(x) + I(y|x),$$

where $I(y|x)$ is the information in the conditional distribution of $y$ given $x$ [Lindley, 1956]. It has also been shown [Lindley, 1956] that

$$I(x,y) \geq I(x) + I(y)$$

with equality if and only if $x$ and $y$ are statistically independent. Hence to minimize $I(x,y)$ we need only obtain a joint distribution $h(x,y)$ such that the information in each marginal is minimized and $x$ and $y$ are independent. For distributions of infinite extent, means or variances must again be fixed. Extension to several variables is straightforward.
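These two relations can be confirmed on a small discrete case; the joint distribution below is an arbitrary dependent one chosen only for illustration.

```python
import math

def info(probs):
    """Shannon information sum p log p, with 0 log 0 = 0."""
    return sum(p * math.log(p) for p in probs if p > 0)

# An arbitrary dependent joint distribution h(x, y) on {0,1} x {0,1}.
h = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

hx = {x: sum(v for (a, _), v in h.items() if a == x) for x in (0, 1)}
hy = {y: sum(v for (_, b), v in h.items() if b == y) for y in (0, 1)}

I_xy = info(h.values())
I_x = info(hx.values())
I_y = info(hy.values())
# I(y|x): expected information of the conditional distribution of y given x.
I_y_given_x = sum(
    hx[x] * info([h[(x, y)] / hx[x] for y in (0, 1)]) for x in (0, 1)
)

assert abs(I_xy - (I_x + I_y_given_x)) < 1e-12  # I(x,y) = I(x) + I(y|x)
assert I_xy >= I_x + I_y  # equality only under independence
```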

6. A Bayesian Indifference Postulate

It will be convenient now to give names to the distributions discussed in section 1. The density $f_\theta(x)$ will be called the model density. It may be the distribution of a single random variable $X$ or more generally of a random vector $X$. The distribution $\xi$ will be called the prior Bayes distribution. The marginal density $f_\xi(x)$ will be called the prior fiducial density of $X$. The distribution $\xi_n$ will be called the posterior Bayes distribution and $f_{\xi,n}(x)$ will be called the posterior fiducial density. We will also wish to consider the sequence $\{\xi_n\}$ of Bayes distributions and $\{f_{\xi,n}(x)\}$ of fiducial densities, with $\xi_0 = \xi$ and $f_{\xi,0}(x) = f_\xi(x)$. Note that the true and unknown density of $X$ is denoted by $f(x)$ and not by the more customary $f_{\theta_0}(x)$.

Our proposed Bayesian indifference postulate may be stated in the following

form:

Let $\mathcal{F}$ be a parametric class of model density functions for $X$. Let $\theta \in \Theta$ be an arbitrary parametrization for $\mathcal{F}$. Let $H$ be a class of prior distributions for $\theta$. When no prior information is available, that $\xi \in H$ is chosen, if existent, which minimizes the information in $f_\xi(x)$. If no such minimizing $\xi$ exists, a $\xi$ for which the information in $f_\xi$ is arbitrarily close to the minimum, if existent, is chosen.

A slight further explication will be needed in some cases and will be demonstrated in a later binomial example. Typically, few restrictions on $\xi$ would be imposed. However, it will often be convenient to restrict $\xi$ to the class of natural conjugates, or some other parametric class $H$, and then show that the $\xi \in H$ which minimizes the information in $f_\xi(x)$ also provides the required minimization for $\xi \in H^*$, the class of all distributions over the appropriate spectrum.

Consider the following example. Let the model density be

$$f_\theta(x) = \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^2/2}, \qquad -\infty < x < \infty.$$

For convenience let $H$ be the class of normal distributions with mean $\mu$ and variance $\tau^2$; we may express $\xi$ by the density

$$g(\theta) = \frac{1}{\tau\sqrt{2\pi}}\, e^{-(\theta-\mu)^2/2\tau^2}.$$

Then the prior fiducial density (1.1) is found to be

$$f_\xi(x) = \frac{1}{\sqrt{2\pi(\tau^2+1)}}\, e^{-(x-\mu)^2/2(\tau^2+1)}.$$

The information in $f_\xi(x)$ is $I_\xi(x) = -\log \sqrt{2\pi e(\tau^2+1)}$ and thus is minimized when $\tau$ is arbitrarily large and $\mu$ is arbitrary but fixed. This coincides with the usual interpretation of the Bayes postulate when the parameter of the model density is the mean $\theta$. Additionally, $f_\xi(x)$ has minimal information among all densities on $(-\infty, +\infty)$ with arbitrary mean and variance at most $\tau^2 + 1$ [Ketteridge, 1961]. The primary advantage of the new indifference rule is that it is obviously invariant under one-to-one transformation of the parameter space, provided the origin and scale of measurement of the random variable are considered fixed. Its reasonableness is partially confirmed by its agreement with what has come to be the accepted manner of applying the Bayes postulate.

A second example may be instructive. Consider the binomial example given in section 4. Since our postulate is invariant under reparametrization we may arbitrarily take the model distribution to be

$$f_p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n,$$

with the beta prior density

$$g(p) = \frac{p^{k-1}(1-p)^{K-1}}{\beta(k,K)}, \qquad k > 0, \quad K > 0.$$

Then it is readily seen that

$$f_\xi(x) = \frac{\Gamma(n+1)\,\Gamma(k+K)\,\Gamma(x+k)\,\Gamma(n-x+K)}{\Gamma(x+1)\,\Gamma(n-x+1)\,\Gamma(k)\,\Gamma(K)\,\Gamma(n+k+K)}, \qquad x = 0, 1, \ldots, n.$$

Since $f_\xi(x)$ is a discrete density defined on $x = 0, 1, \ldots, n$, the information will be minimized when $f_\xi(x)$ is uniform, i.e.

$$f_\xi(x) = \frac{1}{n+1}.$$

It is easily seen that this occurs if and only if $k = K = 1$, which agrees with the application of the Bayes postulate to the parameter $p$. We had required that the information be minimized only in the class of fiducial densities generated by prior Bayes distributions of the beta class, whereas we indeed minimized the information with respect to the class of all prior distributions (though not necessarily uniquely: see section 8).

7. Posterior Fiducial Inference

We now consider a second motivation for our postulate. Whereas both classical and Bayes approaches to the parametric formulation involve methods of inference concerning some parameter (or moment) of the distribution, we propose that a more natural procedure, and one which may in many scientific studies be extremely useful, would be to estimate the density function of $X$. This would have the advantage of completely freeing us from the restrictions imposed by selecting a particular parametrization. We would desire a general method which utilized all experimental information and all prior information, if there were any, but which could still be used when there was "no prior information", or when it was desirable to have a result which depended only upon the data.

Consider the sequences $\{\xi_n\}$ and $\{f_{\xi,n}(x)\}$. Under reasonably general conditions the distribution $\xi_n$ will converge to a point distribution on the true parameter value and hence $f_{\xi,n}(x)$ will be a consistent estimator of $f(x)$.
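The beta-binomial form of $f_\xi(x)$ given in the second example above, and the claim that $k = K = 1$ is the uniform-producing choice, can be checked directly; a sketch (names and the values $n = 7$, $k = K = \frac{1}{2}$ are illustrative):

```python
import math

def beta_binomial(x, n, k, K):
    """Prior fiducial density: the binomial model integrated against a beta(k, K) prior."""
    g = math.gamma
    return (g(n + 1) * g(k + K) * g(x + k) * g(n - x + K)) / (
        g(x + 1) * g(n - x + 1) * g(k) * g(K) * g(n + k + K)
    )

n = 7
# k = K = 1 (uniform prior on p) gives the discrete uniform fiducial density 1/(n+1).
for x in range(n + 1):
    assert abs(beta_binomial(x, n, 1, 1) - 1.0 / (n + 1)) < 1e-12
# The arc-sine-derived choice k = K = 1/2 does not give the uniform fiducial density.
assert abs(beta_binomial(0, n, 0.5, 0.5) - 1.0 / (n + 1)) > 1e-3
```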

While the author has been unable to find necessary and sufficient conditions in the literature, it must be presumed that this matter has been previously studied. The following two conditions are easily seen to be sufficient. For the case of real-valued $\theta$ (extensions obvious) we require

(i) $\int_a^b d\xi(\theta) > 0$ for every non-degenerate interval $(a,b)$ in $\Theta$, and

(ii) the existence of a consistent maximum likelihood estimator.

Distributions satisfying (i) will be called adaptive.

Our procedure when no prior information is available is to begin with an a priori distribution $\xi$ which minimizes the information in $f_\xi(x)$ and to utilize the results of the experiment to obtain an estimate $f_{\xi,n}(x)$. Following Lindley [1956] it is easily seen that under this procedure the expected information in $f_{\xi,n}(x)$, i.e. $E\,I_{\xi,n}(x)$, is greater than the information in $f_\xi(x)$, where the expectation is taken with respect to the true distribution of $(X_1, X_2, X_3, \ldots, X_n)$, and the expected value of the increase in information is positive, i.e.,

$$E\left[I_{\xi,n+i}(x) - I_{\xi,n}(x)\right] > 0, \qquad i = 1, 2, 3, \ldots$$

In the normal example, variance one, under our proposed rule we have

$$f_{\xi,n}(x) = \sqrt{\frac{n}{2\pi(n+1)}}\, e^{-n(x-\bar{x})^2/2(n+1)}$$

(see Section 9), where $\bar{x}$ is the observed sample mean, and hence the information in $f_{\xi,n}(x)$ is $-\log \sqrt{2\pi e\,(n+1)/n}$. We then have the "unusual" result that the information is totally independent of the observed value of $\bar{x}$ and that the sequence $I_{\xi,n}(x)$ is monotone increasing. In other cases $I_{\xi,n}(x)$ will depend on the vector of observations. It is possible in such cases to obtain, under certain conditions,

$$I_{\xi,n+i}(x) < I_{\xi,n}(x) \qquad \text{for some } i = 1, 2, 3, \ldots$$

This will occur, roughly speaking, when an unusual vector $(x_{n+1}, x_{n+2}, \ldots, x_{n+i})$ is observed.
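The monotonicity in the normal example follows from the closed form $-\log\sqrt{2\pi e\,(n+1)/n}$, since $(n+1)/n$ decreases in $n$; a numerical sketch (the range of $n$ is an arbitrary choice):

```python
import math

def fiducial_information(n):
    """I_{xi,n} = -log sqrt(2*pi*e*(n+1)/n) for the normal (variance one) example."""
    return -math.log(math.sqrt(2 * math.pi * math.e * (n + 1) / n))

# The information depends only on n, not on the observations, and increases with n.
vals = [fiducial_information(n) for n in range(1, 50)]
assert all(b > a for a, b in zip(vals, vals[1:]))
# In the limit it approaches the information of the true N(theta, 1) density.
assert abs(vals[-1] - (-math.log(math.sqrt(2 * math.pi * math.e)))) < 0.02
```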

We might also note that the Bayes approach, regardless of the prior distribution, provided only that it is not pre-judicial [Raiffa and Schlaifer, 1961], has the important and easily established property that in a sequential procedure the final distribution $\xi_n$, and hence $f_{\xi,n}(x)$, is dependent only on the value of a sufficient statistic for a parameter $\theta$ which indexes the distributions $\mathcal{F}$. It is thus, given the sufficient statistic, free of dependence on the individual observations and more particularly on the order in which they occurred. If $\xi$ is pre-judicial then by definition it is not adaptive. A restriction, in applications, to adaptive prior distributions would be in conformance with the philosophy of the methodology of science.

8. Unique determination of the prior distribution

The proposed indifference postulate provides a solution to the problem of in-

variance under reparametrization, provided the scale of measurement of the random

variable is considered fixed. It, however, creates a second problem. If the

postulate is to be meaningful it must be such as to completely specify the pro-

cedure to be followed and the result to be obtained for given observations. This

resolves itself to the problem of showing that the proposed postulate uniquely

specifies the prior distribution.

In some cases we may wish $H$ to be the class $H^*$ of all adaptive distributions on $\Theta$, or $H$ may be taken to be a parametrizable class of adaptive distributions, such as natural conjugates. In the latter case uniqueness can be readily obtained. In the former case it is often possible to obtain a solution by restricting $H$ to be the class of natural conjugates, obtaining a unique $\xi \in H$ which minimizes $I_\xi(x)$, and then showing that $\xi$ is also unique in $H^*$.

It does not appear to be entirely trivial to state conditions under which the integral equation

$$f_\xi(x) = \int_\Theta f_\theta(x) \, d\xi(\theta), \qquad x \in \mathcal{X} \qquad (8.1)$$

has at most one solution $\xi$ given $f_\xi(x)$ and $f_\theta(x)$ for all $\theta \in \Theta$. We shall, however, demonstrate uniqueness for sufficiently large classes of problems to justify further research based on the proposed postulate. It is hoped that more general proofs of uniqueness may be forthcoming.

Let us suppose $\Theta$ is an interval (perhaps infinite). If $\xi$ is to be adaptive it would be convenient to restrict it to a continuous-type density with spectrum $\Theta$, so that (8.1) becomes

$$f_\xi(x) = \int_\Theta f_\theta(x)\, g(\theta) \, d\theta. \qquad (8.2)$$

In our normal example let $g(\theta)$ be any prior density with spectrum $(-\infty, \infty)$; then if (8.2) is to hold,

$$f_\xi(x) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-(x-\theta)^2/2}\, g(\theta) \, d\theta \qquad \text{for all } x.$$

By factoring the normal density $f_\theta(x)$ we have the condition

$$\sqrt{2\pi}\, e^{x^2/2} f_\xi(x) = \int_{-\infty}^{\infty} e^{x\theta} h(\theta) \, d\theta$$

for suitable $h(\theta)$. But $h(\theta)$, and hence $g(\theta)$, is (essentially) unique by the uniqueness of the bilateral Laplace transform. Uniqueness could be shown without the assumption of the continuity of $\xi$ by employing the uniqueness of the bilateral Laplace-Stieltjes transform. The restriction to continuous-type densities for $\theta$ when $\Theta$ is continuous is appealing. We may generalize this technique somewhat, though the generalization is more formal than useful.

If

$$\int_\Theta f_\theta(x) \, d\theta < \infty \qquad \text{for all } x \in \mathcal{X}, \qquad (8.3)$$

then

$$C_z(\theta) = \frac{f_\theta(z)}{\int_\Theta f_\theta(z) \, d\theta}, \qquad \theta \in \Theta, \quad z \in \mathcal{X},$$

is a density function (natural conjugate) with parameter $z$. If $C_z(\theta)$ is a complete class [Lehmann, 1959] of densities for $z \in \mathcal{X}$, there can be at most one $g$ satisfying (8.2). The proof follows immediately from (8.2) by the definition of completeness. Unfortunately the techniques for establishing the completeness of classes of distributions tend to be rather varied and specialized. The uniqueness of integral transforms is one technique.

The restriction (8.3) was made only in order to apply the definition of the completeness of a class of densities; completeness of more general classes of functions could be defined in an obvious way and applied to the likelihood function $f_\theta(x)$ (a function of $\theta$ for $x \in \mathcal{X}$) without such a restriction. In fact multiplying both sides of (8.2) by $e^{itx}$ and integrating over $x \in \mathcal{X}$ leads analogously to the following sufficient condition. If $\varphi(t|\theta)$ is a complete class of functions for $-\infty < t < \infty$, where $\varphi(t|\theta)$ is the characteristic function of the model density, then there is at most one prior density $g$ satisfying (8.2). This method would also apply in the normal example above.

A second instructive example is the binomial case. The model distribution is

$$f_p(x) = \binom{n}{x} p^x (1-p)^{n-x}, \qquad x = 0, 1, \ldots, n.$$

The minimally informative prior fiducial distribution is

$$f_\xi(x) = \frac{1}{n+1},$$

and hence the integral equation is

$$\frac{1}{n+1} = \int_0^1 \binom{n}{x} p^x (1-p)^{n-x} g(p) \, dp, \qquad x = 0, 1, \ldots, n. \qquad (8.4)$$

For $n = 1$ we have

$$\frac{1}{2} = \int_0^1 \binom{1}{x} p^x (1-p)^{1-x} g(p) \, dp.$$

But this equation must hold for all $x$, i.e., $x = 0, 1$; hence

$$\int_0^1 p \, g(p) \, dp = \frac{1}{2} \qquad \text{and} \qquad \int_0^1 (1-p) \, g(p) \, dp = \frac{1}{2},$$

which implies $E(p) = \frac{1}{2}$. For $n = 2$ we have

$$\frac{1}{3} = \int_0^1 \binom{2}{x} p^x (1-p)^{2-x} g(p) \, dp \qquad \text{for } x = 0, 1, 2,$$

which with the restriction $E(p) = \frac{1}{2}$ implies $\mathrm{Var}(p) = \frac{1}{12}$. For general $n$, the first $n$ moments must equal the first $n$ moments of the uniform distribution. Thus, since the uniform distribution obeys the moment theorem, we may obtain uniqueness by requiring that (8.4) hold for all $n$. It would seem that whenever the spectrum of $X$ was finite it would be necessary to require that the integral equation

$$f_\xi(x_1, \ldots, x_n) = \int_\Theta \prod_{i=1}^n f_\theta(x_i) \, g(\theta) \, d\theta \qquad (8.5)$$

hold for all $n$. If it holds for $n$ it holds for $n-1$: since the conditions of Fubini's theorem are satisfied, we may integrate (8.5) over $x_n$ and interchange the order of integration to obtain the required result.
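Equation (8.4) with the uniform prior $g(p) \equiv 1$ can be verified numerically for small $n$; the quadrature below is an illustrative sketch (names and step count are arbitrary).

```python
import math

def lhs(n):
    """Left side of (8.4): the uniform fiducial value 1/(n+1)."""
    return 1.0 / (n + 1)

def rhs(x, n, steps=20000):
    """Midpoint-rule approximation of C(n,x) * integral of p^x (1-p)^(n-x) g(p) dp
    with g(p) = 1 (the uniform prior)."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) * h
        total += math.comb(n, x) * p ** x * (1 - p) ** (n - x) * h
    return total

# (8.4) holds for every x when the prior is uniform.
for n in (1, 2, 5):
    for x in range(n + 1):
        assert abs(rhs(x, n) - lhs(n)) < 1e-6
```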

The Poisson case is an example of a discrete random variable but with an infinite spectrum. The model density is

$$f_\lambda(x) = \frac{e^{-\lambda t} (\lambda t)^x}{x!}, \qquad x = 0, 1, 2, \ldots,$$

with $t$ fixed. The Bayes density is usually taken as a member of the conjugate prior class

$$g(\lambda) = \frac{T^p \lambda^{p-1} e^{-\lambda T}}{\Gamma(p)}. \qquad (8.6)$$

The prior fiducial density is then

$$f_\xi(x) = \frac{t^x\, T^p\, \Gamma(x+p)}{\Gamma(p)\, \Gamma(x+1)\, (t+T)^{x+p}}, \qquad (8.7)$$

which is simply a negative binomial, which has the geometric as a special case. From section 5 we know that the geometric distribution minimizes the information, for fixed mean, in the class of distributions on the non-negative integers. The equation (8.2) is then

$$\frac{t^x\, T^p\, \Gamma(x+p)}{\Gamma(p)\, \Gamma(x+1)\, (t+T)^{x+p}} = \int_0^\infty \frac{e^{-\lambda t} (\lambda t)^x}{\Gamma(x+1)}\, g(\lambda) \, d\lambda, \qquad x = 0, 1, 2, 3, \ldots \qquad (8.8)$$

Hence, writing $h(\lambda) = e^{-\lambda t} g(\lambda)$, any such function $h(\lambda)$ must have all moments equal to the moments of $h_1(\lambda) = e^{-\lambda t} g_1(\lambda)$, where $g_1$ is the minimally informative member of (8.6). Since the condition of the moment theorem is clearly satisfied, it follows that $h_1(\lambda)$, and hence $g_1(\lambda)$, is the unique solution to (8.8). It should be noted that we have here for convenience again taken $H$ to be the conjugate prior class and then shown that the $\xi \in H$ which uniquely minimizes $I_\xi(x)$ also uniquely minimizes $I_\xi(x)$ for $\xi \in H^*$. The method of characteristic functions mentioned above could alternately be used.

9. Frequency pr~erties of posterior fiducial distributions - a conjecture. Having focused our attention on a procedure which estimates the density rex) of the random variable X, instead of a procedure which involves inferences l-\} concerning some arbitrary parametrization of ~:r, it is now necessary to begin an examination of the properties of the estimator f'S'.n(x). Information-theoretic properties are given in section 7. In this section we shall investigate the rela- -18-

tive rrequency properties or the posterior fiducial estimation procedure. 1Je pro-

mal~e pose that the experimenter nBy wish to use r ,n(x) to probability state- ments concerning X) i.e. perhaps to act as ir r; n(x) = f(x). Hoperully) these probability statements vnll have meaning as relative rrequency probability

equalities or inequalities.

The unique feature or the proposed method is that it begins vdth a minimally

inrormative estimate and proceeds sequentially through a sequence or estiwstes

which are) in expectation) more inronnative and in the limit converge to the true

density. The amount of infonnation in a distribution is a ,measure or the lacl~ or

dispersion in the distribution. In the nornBl case it v~s found that information

v~s proportional to - log~. Now to the extent that a distribution is less dis-

perse) that is has more inrormation) predictions made rrom it will be more precise.

Thus predictions made from the distributions r '"1 n(x) will tend to be more pre- cise as n increases.

Consider the normal example. The model density is normal with mean θ and variance 1; the prior Bayes density is normal with mean μ, arbitrary but fixed, and variance τ²; the prior fiducial density is normal with mean μ and variance 1 + τ²; the posterior Bayes distribution after n observations is normal with mean (Σ xᵢ + μτ⁻²)/(n + τ⁻²) and variance (n + τ⁻²)⁻¹. The posterior fiducial density is normal with mean (Σ xᵢ + μτ⁻²)/(n + τ⁻²) and with variance [(n+1) + τ⁻²]/(n + τ⁻²). The specification of our indifference postulate will take τ² to be arbitrarily large, and hence the posterior fiducial density will be normal with mean arbitrarily close to x̄ and variance arbitrarily close to (n+1)/n. We have then

    f_{ξ,n}(x) = [n/(2π(n+1))]^{1/2} exp{ -n(x - x̄)² / (2(n+1)) }.     (9.1)
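As a numerical sanity check on this limiting form: when τ² → ∞ the Bayes posterior for θ is N(x̄, 1/n), and mixing the model density N(θ, 1) through it should reproduce a N(x̄, (n+1)/n) density. The Python sketch below (with an illustrative n and x̄, not taken from the report) verifies this by quadrature.

```python
import math

def phi(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

n, xbar = 10, 2.0  # illustrative sample size and sample mean

def predictive_numeric(x, steps=40_000):
    """Integral of N(x; theta, 1) * N(theta; xbar, 1/n) over theta, trapezoid rule:
    the model density mixed through the flat-prior (tau^2 -> infinity) posterior."""
    lo, hi = xbar - 10.0, xbar + 10.0
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        theta = lo + i * h
        w = 0.5 if i in (0, steps) else 1.0
        total += w * phi(x - theta) * math.sqrt(n) * phi((theta - xbar) * math.sqrt(n)) * h
    return total

def predictive_closed(x):
    """The limiting density: normal with mean xbar and variance (n + 1)/n."""
    v = (n + 1.0) / n
    return math.exp(-(x - xbar) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

for x in (xbar - 2.0, xbar, xbar + 1.5):
    assert abs(predictive_numeric(x) - predictive_closed(x)) < 1e-6
```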

We consider this to be the posterior fiducial distribution of the (n+1)st observation. We are thus concerned with the extent to which probability statements based upon this distribution have relative frequency meaning with respect to the random variable X which has true density f(x).

Had we proceeded by more classical methods, e.g. least squares, maximum likelihood, or minimum variance unbiased estimation, we would have obtained x̄ as our estimator of θ, and if we had replaced θ in the model density, i.e., acted as if θ = x̄, we would have obtained

    f_x̄(x) = (2π)^{-1/2} exp{ -(x - x̄)²/2 }

instead of (9.1).

The probability that a 1 - α central interval of f_x̄(x) covers a 1 - α central interval of f(x) is zero, since coverage occurs if and only if x̄ = θ. More specifically, the probability that the (n+1)st observation (which, of course, is governed by the law f(x)) falls in a 1 - α central interval of f_x̄(x) is less than 1 - α. However, the probability that the (n+1)st observation falls in the 1 - α central interval of f_{ξ,n}(x) is exactly 1 - α. This can be readily verified by noting that the distribution of X_{n+1} - X̄_n is normal with mean zero and variance (n+1)/n. Thus a sequence of such independent predictions based upon 1 - α central intervals of f_{ξ,n}(x) will be confirmed with long-run relative frequency exactly 1 - α.³

Somewhat similar results may be obtained in other examples. These results are not generally in terms of central intervals, nor is exact equality attained. In the binomial example, with the true p > .5, the expected value of p̃, the fiducial probability of a success on the (n+1)st trial, is easily seen to satisfy the inequality .5 < E(p̃) < p. Hence predictions of success made on the basis of p̃ will be conservative; that is, the true probability of confirmation is greater than that claimed by the fiducial prediction.

³ We have, in essence, a Bayesian tolerance interval.
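The two normal-case coverage claims are easy to check by simulation. The Python sketch below (true mean, sample size, α, and seed all illustrative) predicts the (n+1)st observation from the 90% central interval of N(x̄, (n+1)/n) and from the plug-in density N(x̄, 1); the first covers with frequency essentially 0.90, the second falls short.

```python
import math
import random

random.seed(42)
theta, n, trials = 0.7, 10, 40_000  # true mean, sample size, replications
z = 1.6448536269514722              # upper 5% point of N(0, 1); alpha = 0.10

hits_fid = hits_plug = 0
for _ in range(trials):
    xs = [random.gauss(theta, 1.0) for _ in range(n)]
    xbar = sum(xs) / n
    x_next = random.gauss(theta, 1.0)        # the (n+1)st observation
    half_fid = z * math.sqrt((n + 1.0) / n)  # interval from N(xbar, (n+1)/n)
    half_plug = z                            # interval from plug-in N(xbar, 1)
    hits_fid += abs(x_next - xbar) <= half_fid
    hits_plug += abs(x_next - xbar) <= half_plug

cov_fid = hits_fid / trials    # close to 0.90, as claimed
cov_plug = hits_plug / trials  # below cov_fid: the plug-in interval is too short
assert abs(cov_fid - 0.90) < 0.01
assert cov_plug < cov_fid
```

The plug-in interval undercovers because X_{n+1} - X̄_n has variance (n+1)/n, not 1.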

The described information-theoretic behavior of f_{ξ,n}(x) and a consideration of the above two very special cases lead us to the conjecture that more general frequency theorems may be obtainable. The difficulty in generalizing this concept lies in the fact that these frequency properties seem to be quite different with regard to the spectrum of the random variable, i.e., whether finite, infinite, or semi-infinite. This problem defines a major area of future research in the development of the proposed mode of inference.
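The binomial conservatism described above can be made concrete under one convenient choice of prior. Assuming, purely for illustration, a uniform prior on p (the paper's own indifference prior need not coincide with this), the fiducial probability of success on the (n+1)st trial after s successes in n trials is Laplace's rule of succession, p̃ = (s+1)/(n+2). Since E(S) = np, linearity gives E(p̃) = (np+1)/(n+2), which lies strictly between .5 and p whenever p > .5:

```python
# Under a uniform prior, p_tilde = (S + 1)/(n + 2) with S ~ Binomial(n, p),
# so E[p_tilde] = (n*p + 1)/(n + 2) by linearity of expectation.
def expected_p_tilde(n, p):
    return (n * p + 1.0) / (n + 2.0)

for n in (5, 20, 100):
    for p in (0.6, 0.75, 0.9):
        e = expected_p_tilde(n, p)
        # E[p_tilde] - 0.5 = n(p - 0.5)/(n + 2) > 0, and
        # p - E[p_tilde] = (2p - 1)/(n + 2) > 0, whenever p > 0.5
        assert 0.5 < e < p
```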

10. Measurement of the random variable.

It has been assumed throughout that the scale of measurement of the random variable was fixed. This assumption is clearly more stringent than necessary. If a linear transformation is applied to X, the minimally informative prior distribution then determined will be consistent with the minimally informative prior distribution determined by the original random variable, and this result will hold independently of the particular parametrization chosen in either case. Thus whenever we may consider the scale of measurement of the random variable to be fixed up to a linear transformation, the proposed mode of analysis will not be troubled by lack of invariance. Invariance under non-linear transformation of the random variable is generally neither present nor meaningful in classical procedures when the model density is specified.
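In the normal example of section 9 this linear-scale consistency can be seen directly: if the data are relabeled y = ax + b (taking a > 0, with the model variance rescaling to a²), the limiting posterior fiducial density on the y scale is N(ȳ, a²(n+1)/n), which is exactly the image of N(x̄, (n+1)/n) under the same map. A small sketch (constants illustrative):

```python
import math
import random

random.seed(7)
n, a, b = 12, 2.5, -1.0  # illustrative constants; a > 0
xs = [random.gauss(0.3, 1.0) for _ in range(n)]
ys = [a * x + b for x in xs]  # the same data, linearly rescaled

xbar, ybar = sum(xs) / n, sum(ys) / n
z = 1.6448536269514722  # half-width of a 90% central interval, in sd units

# limiting posterior fiducial densities: N(xbar, (n+1)/n) and N(ybar, a^2 (n+1)/n)
half_x = z * math.sqrt((n + 1.0) / n)
half_y = z * a * math.sqrt((n + 1.0) / n)
lo_x, hi_x = xbar - half_x, xbar + half_x
lo_y, hi_y = ybar - half_y, ybar + half_y

# mapping the x-scale interval through y = a*x + b reproduces the y-scale interval
assert abs(a * lo_x + b - lo_y) < 1e-9
assert abs(a * hi_x + b - hi_y) < 1e-9
```

Predictive statements are therefore unchanged by a linear relabeling of the measurement scale, as the section asserts.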

11. Acknowledgments

The author ivishes to aCknOi'Tledge his very great indebtedness to his advisor,

Professor W. J. Hall, who has given most freely of his time and energy to guide the author in formulating this system, based 1nita11y upon little more than the author's intuition and hopeful conjecture, into some semblance of mathewatica1 coherence. Hithout this guidance these ideas could Dot 'have been developed. Pro- fessor Hall's assistance in revising the present 11lanus.cl'ipt is also gratefully -21- acknowledged.

vfuile many other members of the University of North Carolina community have made some direct or indirect contribution to these developments, a few should be mentioned e2~licitly. Professor R. Darrell Bocl~, of the Psychometric Laboratory, has offered many invaluable criticisms and suggestions. Professor Walter L. Smith, of the Department of Statistics, has on several occasions shovm the author the error of his i~YS. Professor W. Robert Mann, of the Department of l~athewatics, r~s given the author private instruction in areas of mathewatics in which he was un­ trained.

The Office of Naval Research and the National Science Foundation are to be thanked for their financial support.
