<<

An Interpretation of and Basu's E. L. LEHMANN*

In order to obtain a statistical interpretation of complete• f. A necessary condition for T to be complete or bound• ness of a sufficient T, an attempt is made to edly complete is that it is minimal. characterize this property as equivalent to the condition The properties of minimality and completeness are of that all ancillary are independent ofT. For tech• a rather different nature. Under mild assumptions a min• nical reasons, this characterization is not correct, but two imal exists and whether or not T is correct versions are obtained in Section 3 by modifying minimal then depends on the choice of T. On the other either the definition of ancillarity or of completeness. The hand, existence of a complete sufficient statistic is equiv• light this throws on the meaning of completeness is dis• alent to the completeness of the minimal sufficient sta• cussed in Section 4. Finally, Section 5 reviews some of tistic, and hence is a property of the model '!/' . We shall the difficulties encountered in models that lack say that a model is complete if its minimal sufficient sta• completeness. tistic is complete. It has been pointed out in the literature that definition KEY WORDS: Anciltary statistic; Basu's theorem; Com• (1.1) is purely technical and does not suggest a direct Unbiased estimation. pleteness; Conditioning; statistical interpretation. Since various aspects of statis• 1. INTRODUCTION tical become particularly simple for complete models, one would expect this concept to carry statistical Let X be a random observable distributed according meaning. It is the principal aim of this article to attempt to a distribution from the family'!/' = {P9 , 8 E !1}. Recall a statistical characterization of models which are com• the following definitions. plete or boundedly complete. In the last section, we shall briefly survey some of the difficulties presented by in• A statistic V = V(X) is ancillary if its distribution does not depend on 8. complete models.

A statistic T = T(X) is sufficient if the conditional dis• 2. AN EXAMPLE AND BASU'S THEOREM X T does not depend on 8. tribution of given In order to get some insight into the problem, consider These two concepts are complementary. If the rest of the following example. the data is discarded, a sufficient statistic by itself retains Example 1. Let X 1 , • • • , X" be iid with density f(x1 about 8 contained in the data; an an• all the information - 8), and contrast what happens when f is normal, ex• 8. cillary statistic by itself contains no information about ponential, or uniform on the one hand, and in most other Since sufficiency reduces the data without loss of in• cases, for example, when f is logistic, Cauchy, or double formation, a sufficient statistic is of greatest interest when exponential on the other. In the fitst type of situation the • it is minimal, that is, when it provides the greatest re data can be reduced without loss of information to a small duction of the data. Formally, a sufficient statistic is number of sufficient statistics: one for the normal (Xl and minimal if it is a of every other sufficient statis• exponential (X(t~l . and two for ·the uniform (Xm . X,.,). tic. Although exceptions are possible (Landers and Rogge wbere Xo> < .. . < X denote the order statistics. In the 1972 ; Pitcher 1975), minimal sufficient statistics typically second type of situation, the of n order statistics is exist and are easy to construct (Bahadur 1954; Lehmann minimal sufficient. (Intermediate cases are of course pos• and Scheffe 1950). sible, as is seen, for example, by considering f(x - 8) A sufficient statistic T is complete if = Ce - ".) Eaf - X(i> (i = I, . .. , n - I) constitute an (n - I)-dimensional ancillary statistic. In the normal and exponential cases, • E. L. Lehmann is Professor, Department of Statistics , University Y's of the minimal sufficient statistic of California, Berkeley, CA 94720. Research was supported in part by the are independent National Science Foundation Grant MCS79-03716 and Office of Naval Research Contract N00014-75-C-0444. The author wishes to thank the referee for his very careful reading of two versions of the manuscript, © Journal of the American Statistical Association and in particular for his discovery of an error in the original version of June 1981, Volume 76, Number 374 Theorem J . and Methods Section

J. Rojo (ed.), Selected Works of E. L. Lehmann, Selected Works in Probability 315 and Statistics, DOI 10.1007/978-1-4614-1412-4_27, © Springer Science+Business Media, LLC 2012 Journal otlhe American Statistical Association, June 1981 and hence carry no information about a either by them• Theorem 1. A necessary and sufficient condition for a selves or in conjunction with the minimal sufficient sta• sufficient statistic T to be boundedly complete is that tistic. On the other hand, when the n order statistics are every bounded first-order ancillary Vis uncorrelated (for minimal, the differences Y1-although still ancillary-are all a) with every bounded real-valued function of T. functions of the minimal sufficient statistic. Thus, suffi• The proof is based on the following characterization ciency has not been successful in squeezing out the an• of all bounded, unbiased estimators of zero. cillary material. By themselves, the Y's carry no infor• Lemma 1. If a sufficient statistic Tis boundedly com• mation but in conjunction with the rest of the data they plete, then a bounded statistic S = S(X) is an unbiased do. estimator of zero (im The uniform case is intermediate. Here the Y's are not independent of the minimal sufficient statistic, but the 1](t) = E[S I t] = 0 (a.e. T) . (3.1) ratios Z1 = Y;/Y 1 (i = 2, ... , n - I) are. If we represent (Here (a. e. ~r) means: except on a set N of values ofT, the ordered sample by which has probability zero for all a.) (![X(I) X XtnJJ, Y, = XT) and hence cov6 (f(T), V) The minimal sufficient statistic continues to beT= (X(l), = 0 for all a. X 0, and Basu (1955, 1958), which states that when a sufficient this proves the condition to be sufficient. statistic T is boundedly complete, then all ancillary sta• The analogous result holds for completeness instead tistics are independent of T. This would provide the de• of bounded completeness if attention is restricted to sta• sired characterization of complete models as those in which the minimal sufficient statistic is completely suc• tistics with finite . cessful in discarding all ancillary material, if the converse Instead ofmodifying the definition of ancillarity to ob• were also true, that is, if the independence of all ancillary tain a converse of Basu's theorem, one can instead mod• ify the definition of Quite generally, call statistics from a sufficient statistic implied that T were completenes~. boundedly complete. Unfortunately this is not correct a sufficient statistic :Y'-complete if Eef( 1) = 0 for all a and f E :Y' implies f(t) = 0 (a.e. ~T) where :Y' is some (see Sec. 3, Ex. 2) but it contains a germ of . given of real-valued functions. Completeness and bounded completeness correspond respectively to the 3. SOME ADAPTATIONS OF BASU'S THEOREM case that F is the class of all integrable or all bounded The converse of Basu's theorem fails because ancil• functions. The case that it is the class of all square in• larity is concerned with the whole distribution of a sta• tegrable functions is relevant for the analog of Theorem tistic, while completeness is a property dealing only with I mentioned following the proof of that theorem. expectation. We shall now show that correct versions of If Fa is the class of all two-valued functions, we have the converse can be obtained by replacing either ancil• the following result, which provides a partial converse larity with the corresponding first-order property or com• of Basu's theorem. pleteness with a condition reflecting the whole distribution. Theorem 2. If Let us call a real-valued statistic V = V(X) first-order Every ancillary statistic is independent ofT, (3 .2) ancillary if £ 8 (V) is independent of a. (A first-order an• cillary is simply an unbiased estimator of zero plus a then Tis Fa-complete. constant.).Then we have the following first-order version Proof Suppose Tis not Fa-complete. Then there exists of Basu's theorem. a two-valued function f such that Eof(T) = 0 for all a,

316 Lehmann: Completeness and Basu's Theorem but f is not zero with probability one. Let f(t) = -a For any such V, P(V = I) = P(V = 0) = ! and Vis when t E A and f(t) = b otherwise. Then P0 (T E A) independent ofT, so that (3.2) holds. On the other hand, = b/(a + b) is independent of 8 so that the indicator of T has been seen not to be boundedly complete. A. IA(T) , is an ancillary. This provides a contradiction. That F0 -completeness is a very weak (and somewhat unnatural) condition is indicated by the fact that a suf• Basu's theorem, together with Theorem 2, shows that ficient statistic can be F0 -complete with being minimal. Bounded completeness ~ (3.2) ~ F0 -completeness. As a simple example, suppose that X and Y are inde• (3 .3) pendent Poisson variables with E(X) = E( Y) = }l. Then T = X + Y is minimal sufficient and complete, but (X, That neither of these implications holds in the other Y) is F0 -complete. This follows from the facts that the direction is shown by the following example. conditional distribution of X given X + Y = t is the binomial distribution b(l. /) and that for no constant c Example 2. Let X be distributed as does there exist a nontrivial two-valued function f(x, y) - 5 -4 -3 -2 - 1 I 2 3 4 5 such that E[f(X, Y) I t) = c for all 1. a 'pzq a'pq2 !pl !ql 'Y'pq 'YPq !ql !pl apq2 ap2q Example 2, together with (3 .3), shows that bounded where the first row lists the values X takes on and the completeness is a stronger condition than (3.2) while F0 - second row their probabilities and where a, 'Y · a', 'Y' are completeness is not strong enough. It is interesting that positive constants satisfying there exists an intermediate· completeness condition whiCh is exactly equivalent to (3 .2) . Although the con• a + 'Y =a' + 'Y ' = f . dition is rather intractable and not likely to be of much !I is easy to check the following facts. use, we give it here for the sake of completeness. Let F, be the class of all functions f(t) which are the If 0 < p < I, q = I - p, the probabilities add up conditional expectation of a two-valued function of X to l. given t. If X = T, then F 1 reduces to F0 • II T = I X I is minimal sufficient. III P(X > 0) = ! so that the indicator V of the set Theorem 3. Condition (3.2) is necessary and. sufficient {X > 0} is an ancillary statistic. forT to be F 1-complete in the model '!P. IV If a * a ', Vis not independent ofT, so that (3 .2) Proof (a) Suppose T is F 1-complete with respect to does not hold. '1P and that Vis an ancillary statistic. Then for any set A , the probability P(V E A) = pis independent of 8, and V To see that Tis F0 -complete, note that Ef(T) = 0 iff g( V) = /A ( V) - p is both a two-valued function of X and an unbiased estimator of zero. If f(t) = E[g( V) I t) , it f(l)b + 'Y']pq + f(2}q 3 + f(3)p 3 follows as in the proof of Theorem l(a} that E0 f(T} = 0 for all 8. Since f E F 1, this implies + (a + a')[f(4)pq2 + f(5)p 2 q] = 0. f(t) = P[V E A I 1) - p = 0 a.e. This holds for all 0 < p < I iff and hence the independence of V and r. f(2) = f(3) = 0, f(l) = a, (3.4) (b) Conversely, suppose that Tis not F 1-complete, so f(4) = f(5) = - 'I + 'Y' a that there exists f E F, which is not 0 a.e. but such that a+ a ' Eof(T) = 0 for all 8. Let g be a nontrivial two-valued for some a. Since no two-valued f can satisfy (3.4), Tis function of X for which E[g(X} I 1) = f(t), say g(x} = a if x E A and g(x) = b otherwise. Then Eog(X) = 0, the F0 -complete. indicator V of A is an ancillary statistic as in the proof It now follows from (IV) that F 0 -completeness does of Theorem 2, but by the assumption about f, Vis not not imply (3 .2). Also, since (3.4) shows Tnot to be bound• independent of T. edly complete; F 0 -completeness does not imply bounded completeness. 4. INTERPRETATION OF COMPLETENESS That (3 .2) does not imply bounded completeness can be seen from any example without bounded completeness Ancillary statistics by themselves have no value for in which no ancillary statistic exists and (3 .2) therefore estimating 8 and the same is likely to be true of first-order holds vacuously. A less trivial example is obtained by ancillaries (since their expectation is. independent of 8), puting a = a', 'Y = 'Y ' in Example 2. The totality of although statistics with either property may be valuable ancillary statistics is then obtained by selecting any of when used in conjunction with other statistics. The re• the 5 sign combinations ±I, ±2, ±3, ±4, ±5, for ex• sults of the preceding section can be summarized roughly ample + I, - 2, - 3, + 4, -5 and setting by saying that the various forms of completeness of a sufficient statistics T characterize the success of T in V( + I)= V( -2) = V( -3) = V(4) = V( -5) =I (3 .5) separating the informative part of the data from that part V(-1) = V(+2) = V(+3) = V(-4) = V(5} = 0 . which by itself carries no (or little) information.

317 Journal of lhe American StaHstlcal Association, June 1981

Basu's theorem explains how a complete sufficient sta• be found in Cox and Hinkley (1974). Many additional tistic achieves this separation. It does this by making the results are given by Barndorff-Nielsen (1978). As a brief ancillary part of the data independent of (or uncorrelated introduction to the subject, we shall here discuss a few with) T. For the technical reasons mentioned in Section examples. 2, this interpretation of completeness is not as clear-cut as one might like. Roughly speaking, however, a suffi• Example 3. (a) In the standard introductory example, cient statistic T is complete if it contains no ancillary which is implicit in the writing of Fisher, the natural an• information or equivalently if all ancillary information is cillary statistic is the sample size. To be specific, let N independent of (or at least uncorrelated with) T. be a random variable taking on values I, 2, . .. with The idea that it is the lack of correlation between first• known probabilities 'IT,, 'IT2, • • . . Having observed N order ancillaries and functions of T that makes complete = n, we perform n binomial trials with success proba• sufficient statistics so effective as a noise-free distillation bility p and let Sn be the number of successes in these of the data finds some support in a theorem of Lehmann trials. For the observation X = (N, SN), X is minimal and Scheffe (1950) and Rao (1952). Suppose Tis not com• sufficient and N is ancillary. plete, and consider the functions f(T) which are uncor• The natural estimator SNIN of p is unbiased and has related with all first-order ancillaries (all functions being variance pq E(l/N). It is, however, not clear that this is assumed to be square integrable). One might then expect the appropriate measure of accuracy for the following these f's (though not sufficient) to be particularly effec• reason. The most striking feature of the example is that tive estimators. The theorem in question states in part N, while not providing any information about p, tells us that these functions are the only unbiased estimators with how much information about p is contained in SN · If the uniformly minimum variance (UMVU). observed value n of N is large, SN is highly informative; Unfortunately, it turns out that in many common prob• for small n it carries little information. For this reason lems there are no functions f for which the above prop• it may be better to cite as measure of accuracy the con• ditional variance pq/n as appropriate in those cases in erty holds. For example, when X 1 , • • • , Xn are iid ac• cording to the uniform distribution on (0 - !, 0 + t), it which n has been obtained. was shown by Lehmann and Scheffe (1950, I956) that no If this is accepted, not only the variance but function of 0 has a UMVU estimator. One might expect the whole problem should be viewed conditionally. This this property to hold fairly generally in location problems includes in particular the choice of estimate which in the with incomplete minimal sufficient statistic. (For some present case would remain unchanged but in others might partial results in this direction see Bondessen I975.) not. From this point of view, conditioning on N has the The corresponding result was proved by Unni (1978) for advantage of reducing the random part of the data from the case that X 1 , • • • , X" is a sample from the normal (N, SN) to the simple one-dimensional statistic Sn . Since distribution N(rcr, cr2 ), r = known, and for the Behrens• S" is complete, the conditional model is complete, there Fisher problem (i .e. , the case of samples from two normal exist UMVU estimators for p and pq/n, and so on. distributions N(~ . cr 2), N(~ , T 2) with unknown (b) Example (a) may seem artificial (although very and common unknown mean.) Examples in which the similar situations arise in sampling from finite popula• minimal sufficient statistic is not complete but in which tions) but some of its principal features are in fact quite some parametric functions have UMVU estimators can typical. Consider for instance once more the location of course be constructed (see for instance Ex. 5.2 of problem of Example I. Here the differences Y, provide Lehmann and Scheffe I950), but even then the dimension the ancillary information which i,') analogous to N, and of the space of such functions is typicalry quite low. typically give an indication of the dispersion of the con• ditional distribution of an estimator of 0 such as the mean or median. This is particuarly clear in the case of the 5. ANCILLARY STATISTICS AND THE PRINCIPLE OF uniform distribution. The minimal sufficient statistic can CONDITIONALITY then be represented as (Z, Y) with Z = ![X(I) + X1n>l and When a model is not complete, the minimal sufficient Y = X - X(l). The conditional distribution of Z, given statistic will contain ancillary or at least first-order an• Y = y, is the uniform distribution on (0 - i(l - y), 0 cillary statistics, and as a result its dimension will typi• + !(I - y)). Thus Y again measures the amount of in• cally be larger than that of the parameter space. The formation Z contains about 0 and it is natural to restrict question then arises as to how to use the ancillary infor• attention to the conditional distribution of Z given Y. In mation to effect a further reduction of the data in order this case even the conditional model is not complete. to facilitate the inference problem. Little is known about (Additional ancillaries exist but are rather unsatisfactory. ways of utilizing first-order ancillaries (the by See Basu I964, Sec. 4.) Lehmann and Scheffe. and by Rao mentioned in Section (c) As a last illustration consider the ancillary statistic 4 can be viewed as a contribution in this direction) but V = I{x > 0} of Ex. 2. The probabilities of the conditional there is considerable literature on the use of ancillary distribution of X given V = I are I 2 3 4 5 statistics. The ideas were first put forward by R.A. Fisher (5.1) (1934 , I935) and an excellent discussion of the issues can (3 - 2a)pq, q 3 , p 3 , 2apq2 , 2apq2 •

318 Lehmann: Completeness and Basu's Theorem

Given V = 0, they are the corresponding values with a so that part of the randomness of the data is eliminated. replaced by a'. It is easy to see that an experiment with It frequently happens that the conditional model for Y outcome probabilities (5. I) is more informative than the given v is complete, in which case conditioning is partic• corresponding experiment with a ' when a > a'. (The ularly rewarding. It is interesting to note that in this latter reason is that the events with probabilities 2a.pq2 and case the ancillary V is essentially maximal in the sense 2apq2 can be split at random into four events with prob• of Basu (1959) . This result (to which we hope to return abilities 2a'pq2 , 2a'p 2 q, 2(a - a')pq2 , 2(a - a')p2 q. in a later paper) presents a considerable strengthening of The latter two have a combined probability of 2(a - Theorem 7 of Basu (1959). ex' )pq, which can be combined with (3 - 2cx)pq to yield In summary it is seen that models involving ancillaries an event with probability (3 - 2cx')pq. Thus, from the contained in the minimal sufficient statistic are more dif• a-experiment and a table of random numbers one can ficult to analyze than complete models. Even in so rel• generate the a' -experiment but not vice versa.) atively simple a situation as that of Example I, no general The above considerations suggest the conditionality agreement has been reached whether to assess the ac• principle, first proposed by Fisher (1935, 1936, 1956) and curacy .of the best estimator of a (the Pitman estimator) investigated as a fundamental principle by Birnbaum conditionally or unconditionally. The situation becomes (1962): in the presence of an ancillary statistic V, statis• even more complicated when nuisance parameters are tical inference should be carried out not in the overall present, a possibility we have not considered here. model but in the conditional model given V. Commenting on his efforts to develop a satisfactory Apart from possible reservations which one may have small-sample theory of estimation, Fisher wrote in 1956 about the principle itself, difficulties in implementing it (pp. 157 and 158): "The most important step which has arise from the fact that ancillary statistics are not unique. been taken so far to complete the structure of the theory In Example 3(b), for instance, L (X, -X? is ancillary, of estimation is the recognition of Ancillary statistics." as is L I X, - X I or any other function of the differences. Despite some important recent work on conditional in• To avoid this difficulty it is tempting, in analogy with the ference by Efron and Hinkley (1978) and Hinkley (1977), definition of a minimal sufficient statistic, to call an an• we still seem to be far from a full understanding of this cillary statistic V maximal if every other ancillary statistic difficult topic and a satisfactory small-sample theory of is a function of V. One would then condition on this models that lack completeness. maximal V, thereby carrying the conditioning process as far as possible. (Received November 1979. Revised December 1980.] Unfortunately, a maximal V in this strong sense often does not exist. Basu (1964) has therefore introduced a REFERENCES weaker concept and has defined V as maximal if there BAHADUR, R.R. (1954), "Sufficiency and Statistical Decision Func• exists no ancillary statistic V' of which V is a function. tions," Annals of Mathematical Statistics, 25, 423-462. Such maximal ancillaries, however, frequently are not BARNARD, G.A., and SPROTI, D.A. (1971), "A Note on Basu's unique. In searching for a way out of the difficulty, it is Examples of Anomalous Ancillary Statistics" (with discussion), in Foundations of Statistical Inference, eds. Godambe and Sprott, helpful to recall the motivation for conditioning which 163-176. Toronto: Holt, Rinehart and Winston. was given in the comments after Example 3, and which BARNDORFF-NIELSEN, 0. (1978), Information and Exponential was stated by Fisher (1935, p. 48) as follows: "Ancillary Families in Statistical Theory, New York: John Wiley. BASU, D. (1955, 1958) , "On Statistics Independent of a Complete statistics are only useful when different samples of the Sufficient Statistic," Sankhya, 15, 377-380 and 20, 223-226. same size can supply different amounts of information, -- (1959), "The Family of Ancillary Statistics;• Sankhya, 21, and serve to distinguish those which supply more from 247-256. --(1964), "Recovery of Ancillary Information," Sankhya, Ser. A, those which supply less." 26, 3-16. Fisher's statement leads to the idea that the better an BIRNBAUM, A. (l%2), "On the Foundations of Statistical Inference" ancillary statistic is, in distinguishing the more informa• (with discussion), Journal of the American Statistical Association, 57. 269-306. tive samples from those that are less so, the more useful BONDESSEN, L. (1975). "Uniformly Minimum Variance Estimation it is as a conditioning variable. An implementation of this in Families," Annals of Statistics, 3, 637-660. criterion for choosing an ancillary has been provided by COX, D.R. (1971), "The Choice Between Alternative Ancillary Statis• tics," Journal of the Royal Statistical Society, Ser. B, 33, 251-255. Cox (1971), who compares different ancillaries V in terms COX, D.R., and HINKLEY, D.V. (1974), Theoretical Statistics, Lon• of the variance of the of the condi• don: Chapman and Hall. tional distribution of X given V. An ancillary V is pre• EFRON, B., and HINKLEY, D.V. (1978), "Assessing the Accuracy of the Maximum Likelihood Estimator: Observed Versus Expected ferred to V' if it has the larger such variance for all pa• Fisher Information" (with comments), Biometrika, 65, 457-487. rameter values. Such a principle will resolve the choice FISHER, R.A. (1934), "Two New Properties of Mathematical Likeli• of ancillary in many cases although it may of course turn hood," Proceedings of the Royal Statistical Society, Ser. A, 144, 285-307. out that it leads to different choices for different param• --(1935), "The oflnductive Inference." Journal ofthe Royal eter values. Statistical Society, 98, 39-54. "Uncertain Inference," Proceedings of the American If X = ( Y, V), where V is ancillary, conditioning on -- (1936), Academy of Arts and Science, 71 , No.4, 245-258. V can be thought of as data .reduction obtained by re• --(1956), Statistical Methods and Scientific Inference, New York: placing the random quantity V by a· known constant v, Hafner.

319 Joumal olllle American StaHsHcal AssoclaHon, June 1981

HINKLEY, D. V. (1977), "Conditional Inference About a Normal Mean PITCHER, T.S. (1975), "Sets of Measures Not Admitting Necessary With Known Coefficient of Variation," Biometrika, 64, 105-108. and Sufficient Statistics or Subfields," Annals of Mathematical Sta• LANDERS, D., and ROGGE, L. (1972), "Minimal Sufficient cr-Fields tistics, 26, 26'1-268. and Minimal Sufficient Statistics. Two Counterexamples." Annals RAO, C.R. (1952), "On Statistics With Uniformity Minimum Vari• of Mathematical Statistics, 43, 2045-2049. ance," Science and Culture, 17, 483-484. LEHMANN, E .L., and SCHEFFE, H. (1950, 195, 1956), "Complete• UNNI, K. (1978), "The Theory of Estimation in Algebraic and Analytic ness, Similar Regions, and Unbiased Estimation," SankhyiJ, 10, Exponential Families With Applications to Variance Component 305-340; 15. 219-236; and 17, 250. Models," Ph.D. thesis, Indian Statistical Institute, Calcutta.

320