
Statistical Decisions Using Likelihood Information Without Prior Probabilities

Phan H. Giang and Prakash P. Shenoy

University of Kansas School of Business, 1300 Sunnyside Ave., Summerfield Hall, Lawrence, KS 66045-7585, USA, {phgiang,pshenoy}@ku.edu

Abstract

This paper presents a decision-theoretic approach to statistical inference that satisfies the Likelihood Principle (LP) without using prior information. Unlike the Bayesian approach, which also satisfies LP, we do not assume knowledge of the prior distribution of the unknown parameter. With respect to information that can be obtained from an experiment, our solution is more efficient than Wald's minimax solution. However, with respect to information assumed to be known before the experiment, our solution demands less input than the Bayesian solution.

1 Introduction

The Likelihood Principle (LP) is one of the fundamental principles of statistical inference [5, 3, 8, 2]. A statistical inference problem can be formulated as follows. We are given a description of an experiment in the form of a partially specified probability model that consists of a random variable Y which is assumed to follow one of the distributions in the family F = {P_θ | θ ∈ Ω}. The set of distributions is parameterized by θ whose space is Ω. Suppose we observe Y = y, an outcome of the experiment. What can we conclude about the true value of the parameter θ?

Roughly speaking, LP holds that all relevant information from the situation is encoded in the likelihood function on the parameter space. In practice, likelihood information is used according to the maximum likelihood procedure (also referred to as the maximum likelihood principle or method) whereby the likelihood assigned to a set of hypotheses is often taken to be the maximum of the likelihoods of the individual hypotheses in the set. The power and importance of this procedure in statistics is testified by the following quote from Lehmann [14]:

"Although the maximum likelihood principle is not based on any clearly defined optimum consideration, it has been very successful in leading to satisfactory procedures in many specific problems. For wide classes of problems, maximum likelihood procedures have also been shown to possess various asymptotic optimum properties as the sample size tends to infinity."

A major approach to statistical inference is the decision-theoretic approach in which the statistical inference problem is viewed as a decision problem under uncertainty. This view implies that every action taken about the unknown parameter has consequences depending on the true value of the parameter. For example, in the context of an estimation problem, an action can be understood as the estimate of the parameter; or in the context of a hypothesis testing problem, an action is a decision to accept or reject a hypothesis. The consequences of actions are valued by their utility or loss. A solution to a decision problem is selected according to certain theories. Two widely used decision theories are Wald's minimax and expected utility maximization. The latter is appropriate for Bayesian statistical problems in which there is sufficient information to describe the posterior probability distribution on the hypothesis space. The Bayesian approach agrees with LP. It holds that the relevant information from the experiment is indeed contained in the likelihood function. However, in addition to the experimental data, the Bayesian approach assumes the existence of a prior probability distribution, which summarizes the information about the unknown parameter before the experiment is conducted. This prior probability assumption is probably the most contentious topic in statistics.

In [10, 11], we presented an axiomatic approach to decision making where uncertainty about the true state of nature is expressed by a possibility function. Possibility theory is a relatively recent calculus for describing uncertainty. It has been developed and used mostly within the AI community. It was first proposed in the late 1970s by Zadeh [23].

In this paper, we take the position of LP. In particular, we assume that likelihood information, as used in the maximum likelihood procedure, faithfully represents all relevant uncertainty about the unknown parameter. We show that such information is a possibility measure. Thus, our decision theory for possibility calculus is applicable to the problem of statistical inference. We argue that our approach satisfies LP. But, in contrast with the Bayesian approach, we do not assume any knowledge of the prior distribution of the unknown parameter. We claim that our solution is more information efficient than Wald's minimax solution, but demands less input than the Bayesian solution.

2 Likelihood Information and Possibility Function

Let us recall basic definitions in possibility theory. A possibility function is a mapping from the set of possible worlds S to the unit interval [0, 1],

π: S → [0, 1] such that max_{w∈S} π(w) = 1    (1)

The possibility of a subset A ⊆ S is defined as

π(A) = max_{w∈A} π(w)    (2)

A conditional possibility is defined for A, B ⊆ S with π(A) ≠ 0:¹

π(B|A) = π(A ∩ B) / π(A)    (3)

¹There are two definitions of conditioning in possibility theory. One that is frequently found in the possibility literature is called ordinal conditioning. The definition we use here is called numerical conditioning. See [7] for more details.

Compared to probability functions, the major difference is that the possibility of a set is the maximum of the possibilities of the elements in the set.

Given a probabilistic model F and an observation y, we want to make inference about the unknown parameter θ which is in space Ω.

The likelihood concept used in modern statistics was coined by R. A. Fisher who mentioned it as early as 1922 [9]. Fisher used likelihood to measure the "mental confidence" in competing scientific hypotheses as a result of a statistical experiment (see [8] for a detailed account). Likelihood has a puzzling nature. For a single value θ0, the likelihood P_θ0(y) is just the probability of observing y if θ0 is in fact the true value of the parameter. One can write P_θ0(y) in the form of a conditional probability: P(Y = y | θ = θ0). The latter notation implies that there is a probability measure on the parameter space Ω. This is the case for the Bayesian approach. In this paper, we do not assume such a probability measure, so we will stick with the former notation. For each value θ ∈ Ω, there is a likelihood quantity. If we view the set of likelihood quantities as a function on the parameter space, we have a likelihood function. To emphasize the fact that a likelihood function is tied to data y and has θ as the variable, the following notation is often used:

lik_y(θ) = P_θ(y)    (4)

Thus, a likelihood function is determined by a partially specified probabilistic model and an observation. It is important to note that the likelihood function can no longer be interpreted as a probability function. For example, the integration (or summation) of the likelihood function over the parameter space, in general, does not sum to unity.

The Likelihood Principle (LP) states that all information about θ that can be obtained from an observation y is contained in the likelihood function for θ given y, up to a proportionality constant. LP has two powerful supporting arguments [3]. First, it is well known that the likelihood function lik_y(θ) is a minimal sufficient statistic for θ. Thus, from the classical statistical point of view, the likelihood contains all information about θ. Second, Birnbaum, in a seminal article published in 1962 [4], showed that the Likelihood Principle is logically equivalent to the combination of two fundamental principles in statistics: the principle of conditionality and the principle of sufficiency. The principle of conditionality says that only the actual observation is relevant to the analysis and not those that potentially could be observed. The principle of sufficiency says that in the context of a given probabilistic model, a sufficient statistic contains all relevant information about the parameter that can be obtained from data.

Let us define an "extended" likelihood function Lik_y: 2^Ω → [0, 1] as follows:

Lik_y(θ) =def lik_y(θ) / lik_y(θ̂)    (5)

Lik_y(A) =def sup_{w∈A} Lik_y(w)  for A ⊆ Ω    (6)

where θ̂ is a maximum likelihood estimate of θ.
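As an illustration, the extended likelihood of eqs. (5) and (6) can be computed mechanically once the likelihood quantities are available. The following sketch (the function names are ours and purely illustrative) does this for a finite parameter space:

```python
# A minimal sketch of eqs. (4)-(6) on a finite parameter space.

def extended_likelihood(lik):
    """Normalize a likelihood function {theta: lik_y(theta)} by its maximum (eq. 5)."""
    m = max(lik.values())            # lik_y(theta_hat), where theta_hat is an MLE
    return {theta: v / m for theta, v in lik.items()}

def possibility_of(Lik, A):
    """Extend Lik_y to a subset A of the parameter space by taking the sup (eq. 6)."""
    return max(Lik[theta] for theta in A)

# Example: three parameter values with raw likelihoods lik_y(theta) = P_theta(y).
lik = {"t1": 0.02, "t2": 0.05, "t3": 0.01}
Lik = extended_likelihood(lik)             # {'t1': 0.4, 't2': 1.0, 't3': 0.2}
print(Lik)
print(possibility_of(Lik, {"t1", "t3"}))   # 0.4, the maximum over the subset
```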

The "extended" likelihood is not new. In fact, it was used by Neyman and Pearson in seminal papers published in 1928 [18] where their famous hypothesis testing theory was presented. The idea is to use the maximum likelihood estimate as the proxy for a set. Such a procedure is not only intuitively appealing, but it is also backed by various asymptotic optimality properties [13, 15, 19]. If, in the process, one decides to focus on a particular subset of the parameter space B ⊆ Ω, the effect of refocusing on the extended likelihood function can be expressed through what we call conditioning:

Lik_y(A|B) = Lik_y(A ∩ B) / Lik_y(B)    (7)

[Figure 1: A likelihood for Y = 1.4]

We have a theorem whose proof consists of just verifying that the function Lik_y satisfies the axioms of a possibility measure listed in eqs. (1), (2), (3).

Theorem 1 The extended likelihood function Lik_y is a possibility function.

This technically obvious theorem is important because it establishes a direct link between possibility theory and the central concept in statistics. This relationship has been noticed for quite some time. The fact that the posterior possibility function calculated from a vacuous prior possibility and a random observation behaves as a likelihood function has been pointed out by Smets [21], if the updating process is required to satisfy certain postulates. Dubois et al. [6] show that possibility measures can be viewed as the supremum of a family of likelihood functions. Based on that, they justify the min combination rule for possibility functions. We argue further that all relevant information about the unknown parameter that can be obtained from a partially specified probability model and an observation is a possibility measure. This view is essentially Shafer's idea [20], according to which statistical evidence is coded by belief functions whose focal elements are nested. This kind of belief function (consonant) in its plausibility form is a possibility function. Halpern and Fagin [12] argue for the representation of evidence by a discrete probability function that is obtained by normalizing the likelihoods of the singletons so that they sum to unity. However, this probability, Halpern and Fagin concede, should be interpreted in neither frequentist nor subjectivist senses.

The identity of likelihood and possibility can be used in two directions. First, for possibility theory and its application in AI, the link with statistics boosts its relevance and validity. We argue that unlike the Bayesian approach, which assumes a completely specified distribution, the possibility function is the result of a partially specified model (the true distribution is unknown but assumed to be in a family F) and observed data. In other words, we have answered the question which is often raised in discussions: "Where does possibility information come from?" One answer is that it comes from a partially specified probability model of a phenomenon and experimental observations. The derived possibility function encodes all relevant information that can be obtained from the experiment. Second, statistics can also benefit from recently developed techniques in an area that has a different motivation. Traditional problems in statistics may be re-examined in the light of new insights. In the rest of this paper, we study the statistical decision problem from the point of view of qualitative decision theory as described in [10, 11].

Example: A random variable Y is known to have a normal distribution. It is also known that the mean μ ∈ {0, 1} and the standard deviation σ ∈ {1, 1.5}. Suppose that the value y = 1.4 is observed. Construct an extended likelihood function representing the information about the unknown parameters of the distribution of Y.

The parameter is θ = (μ, σ) and Ω = {w1, w2, w3, w4} with w1 = (0, 1), w2 = (0, 1.5), w3 = (1, 1), w4 = (1, 1.5).

             w1      w2      w3      w4
lik_1.4(.)   0.1497  0.1720  0.3683  0.2567
Lik_1.4(.)   0.4066  0.4672  1.0     0.6970

Events                     Lik_1.4(.)
μ = 0, i.e. {w1, w2}       0.4672
μ = 1, i.e. {w3, w4}       1.
σ = 1, i.e. {w1, w3}       1.
σ = 1.5, i.e. {w2, w4}     0.6970
μ = 0 | σ = 1              0.4066
μ = 1 | σ = 1              1.0
μ = 0 | σ = 1.5            0.6703
μ = 1 | σ = 1.5            1.0
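The numbers in the two tables above can be reproduced directly from the definitions in eqs. (5)-(7); the following sketch (the names normal_pdf and Lik_event are ours and purely illustrative) assumes nothing beyond the normal density:

```python
import math

def normal_pdf(y, mu, sigma):
    """Density of N(mu, sigma) at y."""
    return math.exp(-((y - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

y = 1.4
Omega = {"w1": (0, 1.0), "w2": (0, 1.5), "w3": (1, 1.0), "w4": (1, 1.5)}

# lik_1.4 (eq. 4) and its normalization Lik_1.4 (eq. 5)
lik = {w: normal_pdf(y, mu, sigma) for w, (mu, sigma) in Omega.items()}
m = max(lik.values())                       # value at the MLE, here w3 = (1, 1)
Lik = {w: round(v / m, 4) for w, v in lik.items()}
print(Lik)                                  # {'w1': 0.4066, 'w2': 0.4672, 'w3': 1.0, 'w4': 0.697}

def Lik_event(A):
    """Extended likelihood of a subset A of Omega (eq. 6)."""
    return max(Lik[w] for w in A)

print(Lik_event({"w1", "w2"}), Lik_event({"w3", "w4"}))   # mu = 0: 0.4672,   mu = 1: 1.0
print(Lik_event({"w1", "w3"}), Lik_event({"w2", "w4"}))   # sigma = 1: 1.0,   sigma = 1.5: 0.697

# Conditioning (eq. 7): Lik(mu = 0 | sigma = 1.5) = Lik({w2}) / Lik({w2, w4})
print(round(Lik["w2"] / Lik_event({"w2", "w4"}), 4))      # 0.6703
```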

3 A Decision Theory with Likelihood Information Only

Let us review, in terms of likelihood semantics, our proposal for qualitative decision making with possibility theory [10, 11]. We assume a situation that includes a set Ω which can be interpreted as the set of hypotheses or the parameter space. Also assumed is a set of consequences or prizes X that includes one best prize (x̄) and one worst prize (x̲). An action d is a mapping Ω → X. In other words, the consequence of an action is determined by which hypothesis is true. The set of actions is denoted by A. The uncertainty about hypotheses obtained from the observation is expressed by an extended likelihood function π.

An action coupled with an uncertainty measure determines a simple lottery. Each lottery is a mechanism that delivers prizes. In a compound lottery L, a sub-lottery l_i (1 ≤ i ≤ k) obtains with likelihood π_i and delivers the prizes with likelihoods κ_i1, κ_i2, ..., κ_ir, where σ_i is a permutation of (1, 2, 3, ..., r); when a_σi(j) obtains, the holder is rewarded with prize x_j. We can view this lottery from a different angle. That is, L is a lottery that delivers x_j in case the tuple (i, j) is the true value of θ. The extended likelihood of that tuple is π_i·κ_ij, since κ_ij is the conditional likelihood given l_i. The set of tuples for which x_j is delivered is {(i, j) | 1 ≤ i ≤ k}. Thus, the extended likelihood associated with prize x_j in lottery L is max{π_i·κ_ij | 1 ≤ i ≤ k}.

The qualitative utility QU is a mapping C → U, where C is the set of lotteries and U = {(u, v) | u, v ∈ [0, 1] and max(u, v) = 1}. The binary utility scale is formed by defining an order ⪰ on U as follows:

(λ, μ) ⪰ (λ', μ') iff (λ = 1 & λ' < 1) ∨ (λ = λ' = 1 & μ ≤ μ') ∨ (μ = μ' = 1 & λ ≥ λ')    (8)

For canonical lotteries this order reads

[λ/x̄, μ/x̲] ⪰ [λ'/x̄, μ'/x̲] iff (λ ≥ λ') & (μ ≤ μ')    (9)

In [10, 11] we prove a representation theorem for the preference relation.

Theorem 2 ⪰ on C satisfies axioms A1 to A5 if and only if it is represented by a unique utility function QU, i.e., for lotteries L, L' ∈ C, L ⪰ L' iff QU(L) ⪰ QU(L').
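As a small illustration (the function name prefers is ours and purely illustrative), the comparison in eq. (9), which on U is equivalent to the order of eq. (8), can be coded directly:

```python
def prefers(u, v):
    """(lam, mu) >= (lam', mu') in the binary order: lam >= lam' and mu <= mu' (eq. 9)."""
    return u[0] >= v[0] and u[1] <= v[1]

# The best prize corresponds to (1, 0), the worst prize to (0, 1); (1, 1) sits in between.
print(prefers((1, 0), (1, 1)), prefers((1, 1), (0, 1)))   # True True
print(prefers((1, 0.25), (1, 1)))      # True:  (1, .25) is preferred to (1, 1)
print(prefers((0.43, 1), (1, 0.25)))   # False: the reverse preference holds
```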

The continuity axiom (A4) requires that for any consequence x ∈ X there is a canonical lottery c = (λ1/x̄, λ2/x̲) such that x ~ c. For clarity, let us assume that x̄ = 1 and x̲ = 0. For any x ∈ [0, 1], we need to find a canonical lottery c equivalent to x. We propose a likelihood gamble for that purpose. By doing so, we provide an operational semantics for the concept of binary utility.

Suppose that a random variable Y is generated from one of two normal distributions² that have the same standard deviation σ = 1 but two different means, either -1 or 1, i.e., Y ~ N(-1, 1) or Y ~ N(1, 1). Next you are informed that a randomly observed value of Y is y. What is the most you would be willing to pay for a gamble that pays $1 if the true mean μ is -1 and nothing if the true mean is 1? If the answer is x, then it can be shown that for you

x ~ [Lik_y(-1)/1, Lik_y(1)/0]    (13)

Figure 2 plots the functions λ1(y) = Lik_y(-1) (the dashed line) and λ2(y) = Lik_y(1).

[Figure 2: Binary utility from observation]

²The use of normal distributions is for illustration purposes only. Any other distribution will do.

This betting procedure is quite similar to the test used in practice to identify the unary utility of a monetary value. But there are important differences. First, instead of using a random variable with a known distribution (e.g., tossing a fair coin or rolling a die), we use for our test a partially specified model together with a randomly generated observation. We let the decision maker bet on a gamble tied to a situation that, by nature, is a hypothesis testing problem. The rewards in this gamble are tied not to an observed realization of the random variable as in an ordinary gamble; instead, the rewards are conditioned on the true value of the parameter underlying the random variable.
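The pair (Lik_y(-1), Lik_y(1)) implied by the gamble of eq. (13), i.e., the two curves of Figure 2, can be computed as follows (a sketch; the names lik and binary_utility are ours):

```python
import math

def lik(y, mu):
    """Normal likelihood kernel of the mean mu (sigma = 1) at the observed y."""
    return math.exp(-((y - mu) ** 2) / 2)

def binary_utility(y):
    """The pair (Lik_y(-1), Lik_y(1)) attached to the gamble in eq. (13)."""
    a, b = lik(y, -1), lik(y, 1)
    m = max(a, b)
    return (round(a / m, 4), round(b / m, 4))

print(binary_utility(-3.0))   # (1.0, 0.0025): strong evidence for mu = -1
print(binary_utility(0.26))   # (0.5945, 1.0)
print(binary_utility(0.60))   # (0.3012, 1.0)
```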

We just showed that binary utility elicitation is done by betting on a likelihood gamble. We assume that the betting behavior of the decision maker is consistent and rational in some way. For example, paying $0.20 for the gamble if Y = -3 and paying $0.70 if Y = 1 is considered irrational. The constraint we impose is formulated as axiom A5, qualitative monotonicity (with x̄ = 1 and x̲ = 0). In other words, we require that the price for the lottery [λ/1, μ/0] is no less than the price for [λ'/1, μ'/0] if the condition on the right hand side of eq. (9) is satisfied.

Since the likelihood gamble, as noted, is a hypothesis testing problem, we show that axiom A5 is supported by the traditional statistical inference methods, namely classical, Bayesian and likelihood. Let us denote by y and y' the values of Y associated with the gambles [λ/1, μ/0] and [λ'/1, μ'/0] respectively. Referring to the graphs in Figure 2, it is easy to check that the condition on the right hand side of eq. (9) is equivalent to y ≤ y'. We want to show that the confidence in favor of μ = -1 is higher for y than for y' no matter which method is used.

For the significance test method applied to testing μ = 1 against μ = -1, the p-value for rejecting μ = 1 in favor of μ = -1 provided by data y is P_{μ=1}(Y ≤ y), which is smaller than the p-value corresponding to data y', which is P_{μ=1}(Y ≤ y').

The likelihood reasoning [8] compares the hypotheses by looking at their likelihood ratios. It is obvious that

lik_y(-1)/lik_y(1) > lik_y'(-1)/lik_y'(1)    (14)

If the decision maker is Bayesian and assumes the prior probability P(μ = -1) = p, then (s)he will calculate the posterior probabilities

P(μ = -1 | y) = p·Lik_y(-1) / (p·Lik_y(-1) + (1 - p)·Lik_y(1))    (15)

From Lik_y(-1) ≥ Lik_y'(-1) and Lik_y(1) ≤ Lik_y'(1), we have P(μ = -1 | y) ≥ P(μ = -1 | y'). Let us denote the expected payoffs of the likelihood gambles in the two cases Y = y and Y = y' by V_y and V_y'. Then V_y = P(μ = -1 | y)·1 + P(μ = 1 | y)·0 = P(μ = -1 | y) ≥ P(μ = -1 | y') = V_y'.
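All three criteria can be checked numerically. The sketch below (our own illustrative code, using an arbitrary prior of 0.3 for the Bayesian case) verifies that for y < y' the p-value for rejecting μ = 1 is smaller, the likelihood ratio in favor of μ = -1 is larger, and the posterior probability of μ = -1 is larger:

```python
import math

def norm_cdf(x):                      # standard normal CDF
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def lik(y, mu):                       # normal density kernel, sigma = 1
    return math.exp(-((y - mu) ** 2) / 2)

def p_value(y):                       # P_{mu=1}(Y <= y), evidence against mu = 1
    return norm_cdf(y - 1)

def lik_ratio(y):                     # lik_y(-1) / lik_y(1)
    return lik(y, -1) / lik(y, 1)

def posterior(y, p):                  # P(mu = -1 | y) under prior p (eq. 15)
    return p * lik(y, -1) / (p * lik(y, -1) + (1 - p) * lik(y, 1))

y1, y2 = -0.5, 0.8                    # y1 < y2
print(p_value(y1) < p_value(y2))                   # True: smaller p-value at y1
print(lik_ratio(y1) > lik_ratio(y2))               # True: larger ratio in favour of -1
print(posterior(y1, 0.3) > posterior(y2, 0.3))     # True for any prior p in (0, 1)
```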

We want to invoke one more, in our opinion the most important, justification for A5. This justification is based on the concept of first order stochastic dominance (FSD).

Without being a strict Bayesian, we do not assume to know the prior probability P(μ = -1). We can model this situation by viewing the prior probability of μ = -1 as a random variable p taking values in the unit interval. However, we do not assume to know the distribution of p. We will write the expected payoff of the likelihood gamble as V_y(p). Since it is a function of p, V_y(p) is a random variable. We want to compare the two random variables V_y(p) and V_y'(p).

One of the methods used for comparing random variables is based on the concept of stochastic dominance (SD). We are interested in first degree stochastic dominance (FSD). This concept has been used extensively in economics, finance, statistics and other areas [16]. Suppose X and Y are two distinct random variables with cumulative distributions F and G respectively. We say that X stochastically dominates Y (first degree), written X D1 Y, iff F(x) ≤ G(x) for all x. Since X and Y are distinct, strict inequality must hold for at least one value x. FSD is important because of the following equivalence: X D1 Y iff E(u(X)) ≥ E(u(Y)) for all u in the class of non-decreasing utility functions, where E(u(X)) is the expected utility of X under utility function u.

Let us consider the relation between V_y(p) and V_y'(p). We have the following lemma that relates the expected payoffs and the likelihood pairs. The proof is omitted.

Lemma 1 For a given value p ∉ {0, 1}, V_y(p) > V_y'(p) iff

(Lik_y(-1), Lik_y(1)) > (Lik_y'(-1), Lik_y'(1))    (16)

In Figure 3, the lower curve is the graph of V_0.60(p) (at Y = .6 the extended likelihood pair is (.3011, 1.)) and the upper curve is the graph of V_0.26(p) (pair (.5945, 1.)).

[Figure 3: Expected payoffs for two observations]

Now, for any v ∈ (0, 1), let us denote the roots of the equations V_y(p) = v and V_y'(p) = v by p_v and p'_v respectively, i.e., V_y(p_v) = v and V_y'(p'_v) = v. If (λ, μ) > (λ', μ') then by Lemma 1, V_y(p_v) > V_y'(p_v). Therefore, V_y'(p'_v) > V_y'(p_v). Because V_y'(p) is strictly increasing, we infer p_v < p'_v. Since V_y(p) and V_y'(p) are increasing, P(V_y(p) ≤ v) = P(p ≤ p_v) and P(V_y'(p) ≤ v) = P(p ≤ p'_v). Because p_v < p'_v, P(V_y(p) ≤ v) ≤ P(V_y'(p) ≤ v). This last inequality means V_y(p) D1 V_y'(p). Thus, we have the following theorem.

Theorem 3 The order on binary utility is the order of first degree stochastic dominance.
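A quick numerical check of Lemma 1 (a sketch; the name V is ours) using the two extended likelihood pairs plotted in Figure 3: the expected-payoff curve for y = 0.26 lies above the curve for y = 0.60 at every prior p, which is the pointwise comparison behind the stochastic dominance in Theorem 3:

```python
def V(p, lik_pair):
    """Expected payoff of the likelihood gamble under prior P(mu = -1) = p (cf. eq. 15)."""
    lm, lp = lik_pair                  # (Lik_y(-1), Lik_y(1))
    return p * lm / (p * lm + (1 - p) * lp)

pair_026 = (0.5945, 1.0)               # extended likelihood pair at y = 0.26
pair_060 = (0.3011, 1.0)               # extended likelihood pair at y = 0.60

grid = [i / 100 for i in range(1, 100)]
print(all(V(p, pair_026) > V(p, pair_060) for p in grid))   # True: pointwise dominance
```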

In summary, we show that there are compelling arguments justifying the five proposed axioms.

4 A Decision-Theoretic Approach to Statistical Inference with Likelihood

We will review the decision-theoretic approach to statistical inference. We assume given the set of alternative actions, denoted by A, and the sample space of Y, denoted by Y. A loss L(a, θ) measures the loss that arises if we take action a and the true value of the parameter is θ. A decision rule is a mapping δ: Y → A, that is, for an observation y the rule recommends the action δ(y). The risk function of a decision rule δ at parameter value θ is defined as

R(δ(Y), θ) =def E_θ L(δ(Y), θ) = Σ_{y∈Y} L(δ(y), θ) p_θ(y)    (17)

The risk function measures the average loss incurred by adopting the rule δ in case θ is the true value.

The further use of risk functions depends on how much information we assume is available. For each point in the parameter space, there is a value of the risk function. In case no a priori information about the parameter exists, Wald [22] advocated the use of the minimax rule, which minimizes the worst risk that could be attained by a rule:

δ* = arg min_{δ∈Δ} max_{θ∈Ω} R(δ, θ)    (18)

where Δ is the set of decision rules. δ* is called the minimax solution.

If we assume, as the Bayesian school does, the existence of a prior distribution of the parameter, then the risk can be averaged out to one number called the Bayes risk

r(δ) = E_p R(δ, θ)    (19)

where p is the prior distribution of θ. Then the optimal rule is one that minimizes the Bayes risk, which is called the Bayes solution.

Wald [22] pointed out the relationship between the two rules. There exists a prior distribution p, called "the least favorable", for which the Bayes solution is the minimax solution.

In our view, both solution concepts are not entirely satisfactory. The Bayes solution is made possible by assuming the existence of a prior distribution, which is often a leap of faith. Although in highly idealized situations a prior probability may be extracted from preference or betting behavior, in practice there are many factors that violate the conditions necessary for such deduction.

We have two problems with Wald's proposal. First, the risk function gives no special role to the actually observed data. Thus, Wald's proposal does not respect the principle of conditionality. That violation leads to the second problem of the minimax rule. The rule ignores information about the parameter that is obtained from the data. One can argue that the minimax rule is being cautious. But it is, in our opinion, too cautious. There is information about the parameter θ provided by the observed data, namely, likelihood information, that must be utilized in making a decision.

We propose the following solution. For a given y ∈ Y, an extended likelihood function Lik_y is calculated. That allows comparison of actions on the basis of their qualitative expected utilities, or more exactly, the utilities of the lotteries induced by the actions and the likelihood function. For an action d and an observation y, denote by L_d(y) the lottery generated. The selected action given observation y is

d*(y) = arg sup_{d∈A} QU(L_d(y))    (20)

where sup³ is the operation taking the maximum element according to the binary order. Equation (20) also defines a decision rule in the sense that for each point in the sample space, it gives an action. We call such a decision rule the likelihood solution for obvious reasons.

³In contrast to max defined in eq. (12), which operates on scalars.

5 An Illustrative Example

The following example is adapted from [1]. The manufacturer of small travel clocks which are sold through a chain of department stores agrees to service any clock once only if it fails to operate satisfactorily in the first year of purchase. For any clock, a decision must be made on whether to merely clean it or replace the works, i.e., the set of actions is A = {a1, a2} where a1 denotes cleaning the clock, and a2 denotes immediately replacing the works.

Let us assume that there are only two possible faults, i.e., Ω = {θ1, θ2} where θ1 means there is the need for cleaning and θ2 means the clock has been physically damaged and the works need replacement. Utility and loss functions are given in the following table. The relationship between utility and loss is through the equation Loss = 1 - Utility.

u(a, θ)   θ1   θ2        L(a, θ)   θ1   θ2
a1        .8   .3        a1        .2   .7
a2        .5   .5        a2        .5   .5

The loss table is roughly estimated from the fact that cleaning a clock costs $0.20 and replacing the works costs $0.50. If the policy is to replace the works for every clock needing service, then the cost is $0.50 no matter which problem is present. If the policy is to clean a clock first, then if the state is θ1 the service costs $0.20; however, in the case of physical damage, cleaning alone obviously does not fix the problem and the manufacturer ends up replacing the works also. Thus the total cost is $0.70.

The manufacturer can ask the customer to provide a symptom of malfunction when a clock is sent to the service center. The symptom can be viewed as an observation. Assume the sample space Y = {y1, y2, y3} where y1 means "the clock has stopped operating", y2 "the clock is erratic in timekeeping" and y3 "the clock can only run for a limited period". Such information gives some indication about θ that is expressed in terms of likelihood:

         y1   y2   y3
θ1       .1   .4   .5
θ2       .7   .2   .1

For each point in the sample space, you can either choose a1 or a2, so there are 8 possible decision rules in total. Each decision rule specifies an action given an observation.

      δ1   δ2   δ3   δ4   δ5   δ6   δ7   δ8
y1    a1   a1   a1   a1   a2   a2   a2   a2
y2    a1   a1   a2   a2   a1   a1   a2   a2
y3    a1   a2   a1   a2   a1   a2   a1   a2

We calculate the risk function values for each rule and parameter value in the following table.

      δ1   δ2   δ3   δ4   δ5   δ6   δ7   δ8
θ1    .2   .35  .32  .47  .23  .38  .35  .50
θ2    .7   .68  .66  .64  .56  .54  .52  .50

Notice that there is no rule which is superior to all others for both values of θ. Wald's minimax solution is δ8.
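The risk table and the minimax solution follow mechanically from eqs. (17) and (18). The sketch below (the names risk and rules are ours and purely illustrative) enumerates the eight rules in the order of the table above:

```python
from itertools import product

states = ["t1", "t2"]
obs = ["y1", "y2", "y3"]
loss = {("a1", "t1"): 0.2, ("a1", "t2"): 0.7,
        ("a2", "t1"): 0.5, ("a2", "t2"): 0.5}
p = {("t1", "y1"): 0.1, ("t1", "y2"): 0.4, ("t1", "y3"): 0.5,
     ("t2", "y1"): 0.7, ("t2", "y2"): 0.2, ("t2", "y3"): 0.1}

# All 8 decision rules delta: Y -> A, in the order delta1 .. delta8 of the table above.
rules = [dict(zip(obs, acts)) for acts in product(["a1", "a2"], repeat=3)]

def risk(rule, t):
    """R(delta, theta) = sum_y L(delta(y), theta) p_theta(y)   (eq. 17)"""
    return sum(loss[(rule[y], t)] * p[(t, y)] for y in obs)

for i, rule in enumerate(rules, start=1):
    print(f"delta{i}", [round(risk(rule, t), 3) for t in states])

# Minimax rule (eq. 18): minimize the worst-case risk -> delta8, worst risk 0.5
minimax = min(rules, key=lambda r: max(risk(r, t) for t in states))
print(minimax)   # {'y1': 'a2', 'y2': 'a2', 'y3': 'a2'}
```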

If we assume a prior distribution of θ then we can calculate the Bayes risks of the rules. For example, if the prior probability p(θ1) = .7, then the Bayes risks r_.7(δi) are

          δ1    δ2    δ3    δ4    δ5    δ6    δ7    δ8
r_.7      .35   .449  .442  .541  .329  .428  .401  .50

In this case, the Bayes solution is δ5. It is easy to understand why the Bayes solution will be different if the prior p(θ1) changes. Indeed, a sensitivity analysis shows that the Bayes solution is

Bayes solution    When
δ1                p(θ1) ≥ .824
δ5                .250 ≤ p(θ1) ≤ .824
δ7                .118 ≤ p(θ1) ≤ .250
δ8                p(θ1) ≤ .118

In our approach, suppose the betting procedure produces the following preference equivalence between monetary values and binary utility.

Unary utility    Binary utility
.8               (1, .25)
.5               (1, 1)
.3               (.43, 1)

Given observation y1, the extended likelihood function is Lik_y1(θ1) = .14 and Lik_y1(θ2) = 1. Action a1 is associated with the lottery L_a1(y1) = [.14/(1, .25), 1/(.43, 1)] whose qualitative expected utility is calculated as

max{.14(1, .25), 1(.43, 1)} = max{(.14, .035), (.43, 1)} = (.43, 1)

Action a2 is associated with the lottery L_a2(y1) = [.14/(1, 1), 1/(1, 1)] whose qualitative expected utility is QU(L_a2(y1)) = (1, 1). Thus, given y1, a2 ≻_y1 a1, i.e., a2 is strictly preferred to a1.

Given observation y2, the extended likelihood function is Lik_y2(θ1) = 1 and Lik_y2(θ2) = .5. We calculate the qualitative expected utility for a1 as QU(L_a1(y2)) = (1, .5) and for a2 as QU(L_a2(y2)) = (1, 1). Thus, a1 ≻_y2 a2.

Given observation y3, the extended likelihood function is Lik_y3(θ1) = 1 and Lik_y3(θ2) = .2. The qualitative expected utility for a1 is QU(L_a1(y3)) = (1, .25) and for a2 it remains QU(L_a2(y3)) = (1, 1). Thus, a1 ≻_y3 a2.

In summary, our approach suggests δ5 as the likelihood solution.
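The likelihood solution can be reproduced as follows (a sketch; the names extended_lik, QU and prefers are ours), taking the unary-to-binary utility conversion table above as given:

```python
states, obs, acts = ["t1", "t2"], ["y1", "y2", "y3"], ["a1", "a2"]
p = {("t1", "y1"): 0.1, ("t1", "y2"): 0.4, ("t1", "y3"): 0.5,
     ("t2", "y1"): 0.7, ("t2", "y2"): 0.2, ("t2", "y3"): 0.1}
# Binary utilities of the consequences, from the unary-binary conversion table above.
bu = {("a1", "t1"): (1, 0.25), ("a1", "t2"): (0.43, 1),
      ("a2", "t1"): (1, 1),    ("a2", "t2"): (1, 1)}

def extended_lik(y):
    """Lik_y(theta): the likelihood column normalized by its maximum (eq. 5)."""
    m = max(p[(t, y)] for t in states)
    return {t: p[(t, y)] / m for t in states}

def QU(action, y):
    """Qualitative expected utility of the lottery induced by an action (cf. eq. 20)."""
    lik = extended_lik(y)
    pairs = [(lik[t] * bu[(action, t)][0], lik[t] * bu[(action, t)][1]) for t in states]
    return (max(a for a, _ in pairs), max(b for _, b in pairs))

def prefers(u, v):
    """Binary order: (lam, mu) >= (lam', mu') iff lam >= lam' and mu <= mu' (eq. 9)."""
    return u[0] >= v[0] and u[1] <= v[1]

for y in obs:
    qus = {a: QU(a, y) for a in acts}
    best = "a1" if prefers(qus["a1"], qus["a2"]) else "a2"
    print(y, qus, "->", best)   # y1 -> a2, y2 -> a1, y3 -> a1: the rule delta5
```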
Let us make an informal comparison of the likelihood solution with the minimax and Bayes solutions. In this example, the likelihood solution δ5 is different from the minimax solution δ8. It is because, as we noted, the minimax solution ignores the uncertainty generated by an observation while the likelihood solution does not. In that sense, the likelihood solution is more information efficient.

If the prior probability p(θ1) = .7, then the Bayes solution is δ5, the same as the likelihood solution. If a prior probability is available, one can argue that the Bayes solution is the optimal one. However, the "optimality" of the Bayes solution comes at a cost. The decision maker must possess the prior probability. This condition can be satisfied either at some monetary cost (doing research, or buying from those who know) or the decision maker can just assume some prior distribution. In the latter case, the cost is a compromise of solution optimality. One can extend the concept of the Bayes solution to include sensitivity analysis. This certainly helps the decision maker by providing a frame of reference. But sensitivity analysis itself does not constitute any basis for knowing the prior probability.

The fact that the likelihood solution δ5 coincides with the Bayes solution that corresponds to the largest interval of priors should not be exaggerated. Different unary-binary utility conversions may lead to different likelihood solutions. However, it is important to note that there is a fundamental reason for the agreement. As we have pointed out, the axioms A1 to A5 on which the likelihood solution is based are structurally similar to those in [17]. In that work, Luce and Raiffa used those axioms to justify the linear utility concept which ultimately is a basis for the Bayes solution. Thus, at a foundational level, the optimality of the likelihood solution could be justified in the same way as the optimality of the Bayes solution, although the two optimality concepts are obviously different. We argue that the question of which optimality has precedence over the other depends on how much information is available.

6 Summary and Conclusion

In this paper, we take the fundamental position that all relevant information about the unknown parameter is captured by the likelihood function. This position is justified on the basis of the Likelihood Principle and various asymptotic optimality properties of the maximum likelihood procedure. We show that such a likelihood function is basically a possibility measure.

We re-examine the axioms of our decision theory with possibility calculus in terms of the likelihood semantics. We provide a betting procedure that determines, for a decision maker, the binary utility of a monetary value. This test justifies the continuity axiom. We also justify the monotonicity axiom A5 in terms of first order stochastic dominance.

We propose a new likelihood solution to a statistical decision problem. The solution is based on maximizing the qualitative expected utility of an action given the likelihood function associated with a point in the sample space. The likelihood solution is sandwiched by Wald's minimax solution on one side and the Bayes solution on the other side. Compared to the minimax solution, the likelihood solution is more information efficient. Compared to the Bayesian solution, the likelihood solution does not require a prior probability distribution of the unknown parameter.

Acknowledgment

We thank anonymous referees for comments, references and suggestions to improve this paper.

References

[1] BARNETT, V. Comparative Statistical Inference, 3 ed. John Wiley and Sons, New York, Chichester, Brisbane, 1999.
[2] BASU, D., AND GHOSH, J. K. Statistical Information and Likelihood: A Collection of Critical Essays by Dr. Basu. Lecture Notes in Statistics. Springer-Verlag, New York, Berlin, Heidelberg, 1988.
[3] BERGER, J. O., AND WOLPERT, R. L. The Likelihood Principle, 2 ed. Lecture Notes-Monograph Series. Institute of Mathematical Statistics, Hayward, California, 1988.
[4] BIRNBAUM, A. On the foundations of statistical inference. Journal of the American Statistical Association 57, 298 (1962), 269-306.
[5] COX, D. R., AND HINKLEY, D. V. Theoretical Statistics. Chapman and Hall, London, 1974.
[6] DUBOIS, D., MORAL, S., AND PRADE, H. A semantics for possibility theory based on likelihoods. Journal of Mathematical Analysis and Applications 205 (1997), 359-380.
[7] DUBOIS, D., NGUYEN, T. H., AND PRADE, H. Possibility theory, probability and fuzzy sets. In Handbook of Fuzzy Sets Series, D. Dubois and H. Prade, Eds. Kluwer Academic, Boston, 2000, pp. 344-438.
[8] EDWARDS, A. W. F. Likelihood. Cambridge University Press, Cambridge, 1972.
[9] FISHER, R. A. On the mathematical foundations of theoretical statistics. Philosophical Transactions of the Royal Society of London, A 222 (1922), 309-368. Reprinted in: Collected Papers of R. A. Fisher, vol. 1, J. H. Bennett, Ed., University of Adelaide, 1971.
[10] GIANG, P. H., AND SHENOY, P. P. A qualitative linear utility theory for Spohn's theory of epistemic beliefs. In Uncertainty in Artificial Intelligence: Proceedings of the Sixteenth Conference (UAI-2000) (San Francisco, CA, 2000), C. Boutilier and M. Goldszmidt, Eds., Morgan Kaufmann, pp. 220-229.
[11] GIANG, P. H., AND SHENOY, P. P. A comparison of axiomatic approaches to qualitative decision making using possibility theory. In Uncertainty in Artificial Intelligence: Proceedings of the Seventeenth Conference (UAI-2001) (San Francisco, CA, 2001), J. Breese and D. Koller, Eds., Morgan Kaufmann.
[12] HALPERN, J. Y., AND FAGIN, R. Two views of belief: belief as generalized probability and belief as evidence. Artificial Intelligence 54, 3 (1992), 275-318.
[13] KALLENBERG, W. C. M. Asymptotic Optimality of Likelihood Ratio Tests in Exponential Families, vol. 77 of Mathematical Centre Tracts. Mathematisch Centrum, Amsterdam, 1978.
[14] LEHMANN, E. L. Testing Statistical Hypotheses, 2 ed. John Wiley and Sons, New York, Chichester, Brisbane, 1986.
[15] LEHMANN, E. L., AND CASELLA, G. Theory of Point Estimation, 2 ed. Springer Texts in Statistics. Springer, New York, Berlin, Heidelberg, 1998.
[16] LEVY, H. Stochastic dominance and expected utility: survey and analysis. Management Science 38, 4 (1992), 555-593.
[17] LUCE, R. D., AND RAIFFA, H. Games and Decisions. John Wiley & Sons, 1957.
[18] NEYMAN, J., AND PEARSON, E. S. On the use and interpretation of certain test criteria for purposes of statistical inference. Part I. Biometrika 20A (1928), 175-240. Reprinted in: J. Neyman and E. S. Pearson Joint Statistical Papers. University of California Press, 1967.
[19] SEVERINI, T. A. Likelihood Methods in Statistics. Oxford University Press, Oxford, 2000.
[20] SHAFER, G. A Mathematical Theory of Evidence. Princeton University Press, Princeton, NJ, 1976.
[21] SMETS, P. Possibilistic inference from statistical data. In Proceedings of the 2nd World Conference on Mathematics at the Service of Man (Las Palmas, Canary Islands, Spain, 1982), pp. 611-613.
[22] WALD, A. Statistical Decision Functions. John Wiley and Sons, New York, 1950.
[23] ZADEH, L. Fuzzy sets as a basis for a theory of possibility. Fuzzy Sets and Systems 1 (1978), 3-28.