4 Invariant Statistical Decision Problems
4.1 Invariant decision problems

Let $\mathcal{G}$ be a group of measurable transformations from the sample space $\mathcal{X}$ into itself. The group operation is composition. Note that a group must include the identity transformation $e$ and, with each $g$, the inverse $g^{-1}$, such that $g^{-1}g = e$. Consequently, all transformations are one-to-one.

Definition 30. The family of distributions $P_\theta$, $\theta \in \Theta$, is said to be invariant under the group $\mathcal{G}$ if for every $g \in \mathcal{G}$ and every $\theta \in \Theta$ there exists a unique $\theta' \in \Theta$ such that the distribution of $g(X)$ is given by $P_{\theta'}$ whenever the distribution of $X$ is given by $P_\theta$. This unique $\theta'$ is denoted by $\bar{g}(\theta)$.

The meaning of the definition is that for every real-valued integrable function $\varphi$,
$$E_\theta\, \varphi(g(X)) = E_{\bar{g}(\theta)}\, \varphi(X).$$

Definition 31. A parameter $\theta$ is said to be identifiable if distinct values of $\theta$ correspond to different distributions.

If the family of distributions is invariant under $\mathcal{G}$, then the uniqueness of $\theta'$ implies that $\theta$ is identifiable.

Lemma 8. If a family of distributions $P_\theta$, $\theta \in \Theta$, is invariant under $\mathcal{G}$, then $\bar{\mathcal{G}} = \{\bar{g} : g \in \mathcal{G}\}$ is a group of transformations of $\Theta$ into itself.

Definition 32. A decision problem, consisting of the game $(\Theta, \mathcal{A}, L)$ and the distributions $P_\theta$ over $\mathcal{X}$, is said to be invariant under the group $\mathcal{G}$ if the family of distributions is invariant and if the loss function is invariant under $\mathcal{G}$ in the sense that for every $g \in \mathcal{G}$ and $a \in \mathcal{A}$ there exists a unique $a' \in \mathcal{A}$ such that $L(\theta, a) = L(\bar{g}(\theta), a')$ for all $\theta \in \Theta$. Denote the unique $a'$ by $\tilde{g}(a)$.

Lemma 9. If a decision problem is invariant under a group $\mathcal{G}$, then $\tilde{\mathcal{G}} = \{\tilde{g} : g \in \mathcal{G}\}$ is a group of transformations of $\mathcal{A}$ into itself.

Example 14. Consider the shift group in the normal estimation problem with $L(\theta, a) = (a - \theta)^2$. Here $g_c(x) = x + c$. Thus $\bar{g}_c(\theta) = \theta + c$ and $\tilde{g}_c(a) = a + c$.

Example 15. Assume $X$ is binomial. Let $L(\theta, a) = W(\theta - a)$ for some even function $W$. Let $\mathcal{G} = \{e, g\}$, where $g(x) = n - x$. The distribution of $g(X)$ is $B(n, 1 - \theta)$. Thus $\bar{g}(\theta) = 1 - \theta$ and $\tilde{g}(a) = 1 - a$.

Example 16. Let $X \sim N(\theta_1 \mathbf{1}, \theta_2^2 I)$ and $\Theta = \{\theta = (\theta_1, \theta_2) : \theta_2 > 0\}$. Let $\mathcal{A} = \mathbb{R}$ and let $L(\theta, a) = (a - \theta_1)^2/\theta_2^2$. Consider transformations of the form $g_{b,c}(x) = bx + c\mathbf{1}$, where $b \neq 0$. Then $\bar{g}_{b,c}(\theta) = (b\theta_1 + c, |b|\theta_2)$ and $\tilde{g}_{b,c}(a) = ba + c$.

Example 17. Consider again the situation in Example 16. If $\mathcal{A} = \{0, 1\}$ and we take
$$L(\theta, 0) = \begin{cases} 1 & \text{if } \theta_1 > 0 \\ 0 & \text{if } \theta_1 \leq 0, \end{cases} \qquad L(\theta, 1) = \begin{cases} 1 & \text{if } \theta_1 \leq 0 \\ 0 & \text{if } \theta_1 > 0, \end{cases}$$
then for $g_b(x) = bx$, $b > 0$, we get $\bar{g}_b(\theta) = b\theta$ and $\tilde{g}_b(a) = a$.

4.2 Invariant decision rules

Definition 33. Given an invariant decision problem, a non-randomized decision rule $d \in \mathcal{D}$ is said to be invariant under $\mathcal{G}$ if for all $x \in \mathcal{X}$ and all $g \in \mathcal{G}$,
$$d(g(x)) = \tilde{g}(d(x)).$$
A randomized decision rule is invariant if it is a mixture of invariant decision rules.

Theorem 14. The risk of an invariant decision rule is constant over the orbits of the group $\bar{\mathcal{G}}$.
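Theorem 14 is easy to see numerically. Below is a minimal Monte Carlo sketch for the shift group of Example 14; the particular rules $d(x) = x + 0.3$ (shift-invariant) and $d(x) = 0.5x$ (not invariant) are illustrative choices, not rules from the notes. Since the shift group has a single orbit in $\Theta = \mathbb{R}$, the invariant rule's risk is constant in $\theta$.

```python
import numpy as np

rng = np.random.default_rng(0)

def risk(d, theta, n_rep=200_000):
    """Monte Carlo estimate of R(theta, d) = E_theta (d(X) - theta)^2
    for a single observation X ~ N(theta, 1)."""
    x = rng.normal(loc=theta, size=n_rep)
    return np.mean((d(x) - theta) ** 2)

invariant = lambda x: x + 0.3   # satisfies d(x + c) = d(x) + c: invariant
shrunk = lambda x: 0.5 * x      # fails d(x + c) = d(x) + c: not invariant

for theta in [-2.0, 0.0, 3.0]:
    print(f"theta = {theta:+.1f}:  invariant {risk(invariant, theta):.3f},"
          f"  non-invariant {risk(shrunk, theta):.3f}")
```

The invariant rule's risk stays near $1 + 0.3^2 = 1.09$ at every $\theta$, while the non-invariant rule's risk, $0.25 + 0.25\,\theta^2$, varies with $\theta$.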
4.3 Location and scale parameters

Definition 34. A real parameter $\theta \in \Theta$ is said to be a location parameter for the distribution of a random variable $X$ if $F_\theta(x)$ is a function of $x - \theta$ only.

Lemma 10. For the location parameter:
1. $\theta$ is a location parameter for the distribution of $X$ if, and only if, the distribution of $X - \theta$ is independent of $\theta$.
2. If the distributions of $X$ are absolutely continuous with density $f_\theta(x)$, then $\theta$ is a location parameter if, and only if, $f_\theta(x) = f(x - \theta)$ for some density function $f(x)$.

Example 18. The normal mean (known variance), the Cauchy $\alpha$ (known $\beta$), and the $U(\theta, \theta + 1)$ family are all examples of location parameters.

Definition 35. A real parameter $\theta \in \Theta$ is said to be a scale parameter for the distribution of a random variable $X$ if $F_\theta(x)$ is a function of $x/\theta$ only.

Lemma 11. For the scale parameter:
1. $\theta$ is a scale parameter for the distribution of $X$ if, and only if, the distribution of $X/\theta$ is independent of $\theta$.
2. If the distributions of $X$ are absolutely continuous with density $f_\theta(x)$, then $\theta$ is a scale parameter if, and only if, $f_\theta(x) = (1/\theta)f(x/\theta)$ for some density function $f(x)$.

Example 19. The normal $N(\theta\mu, \theta^2)$ (known $\mu$), the Cauchy $\beta$ (known $\alpha$), the $U(0, \theta)$, and the $\beta$ in a Gamma distribution (known $\alpha$) are all examples of scale parameters.

One can combine both definitions and get a location-scale family with parameters $(\mu, \sigma)$. Note that the distribution of $X - \mu$ is independent of $\mu$, but the distribution of $X/\sigma$ is not independent of $\sigma$ (it still depends on $\mu/\sigma$).

Lemma 12. If every nonrandomized invariant rule is an equalizer rule (that is, it has a constant risk), the nonrandomized invariant decision rules form an essentially complete class among all randomized invariant rules.

Assume $\Theta = \mathcal{A} = \mathbb{R}$ and let $L(\theta, a) = L(a - \theta)$.

Theorem 15. In the problem of estimating a location parameter with loss $L(\theta, a) = L(a - \theta)$, if $E_0 L(X - b)$ exists and is finite for some $b$ and if there exists $b_0$ such that
$$E_0 L(X - b_0) = \inf_b E_0 L(X - b) \qquad (1)$$
(the infimum taken over those $b$ for which the expectation exists), then $d_0(x) = x - b_0$ is a best invariant rule. It has a constant risk equal to the value in (1).

Example 20. If $L(\theta, a) = (a - \theta)^2$ and $X$ has a finite variance, then $b_0 = E_0(X)$. If $L(\theta, a) = |a - \theta|$ and $X$ has a finite first moment, then $b_0$ is the median of $X$ under $P_0$.

Example 21. Let $P_\theta(X = \theta + 1) = 1 - P_\theta(X = \theta - 1) = 0.5$ and let
$$L(\theta, a) = \begin{cases} |a - \theta| & \text{if } |a - \theta| \leq 1, \\ 1 & \text{if } |a - \theta| > 1. \end{cases}$$
Then the best invariant rules are not admissible.

Example 22. In the multidimensional normal location estimation problem the vector of means is the best invariant estimator. However, it is not admissible for dimension higher than 2, since the estimator
$$d(X) = \bar{X}\left(1 - \frac{k - 2}{\|\bar{X}\|^2}\right)$$
is better ($k$ is the dimension). Nevertheless, typically the best invariant estimate is minimax.

Theorem 16. If an equalizer rule is extended Bayes, it is also a minimax rule.

Which leads to:

Theorem 17. Under the conditions of Theorem 15, if $L$ is bounded below and if for every $\epsilon > 0$ there exists $N$ such that
$$E_0\left[L(X - b)\, I_{\{|X| \leq N\}}\right] \geq \inf_b E_0 L(X - b) - \epsilon$$
for all $b$, then the best invariant rule is minimax.

An example of a location parameter is the normal mean:

Theorem 18. If $X_1, \ldots, X_n$ is a sample from a normal distribution with mean $\theta$ and known variance $\sigma^2$, then $\bar{X}$ is a best invariant estimate of $\theta$ and a minimax estimate of $\theta$, provided that the loss function is a nondecreasing function of $|a - \theta|$ and that $E_0 L(\bar{X})$ exists and is finite.
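Theorem 15 and Example 20 can be checked numerically. The following sketch assumes $P_0$ is a standard exponential (an arbitrary skewed choice, so that mean and median differ) and grid-searches for the $b_0$ achieving the infimum in (1); the function names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
# Draws from P_0; a skewed choice so that mean and median differ.
z = rng.exponential(scale=1.0, size=200_000)

def b0(loss):
    """Grid search for the b minimizing E_0 L(X - b), as in (1)."""
    grid = np.linspace(0.0, 2.0, 801)
    values = [np.mean(loss(z - b)) for b in grid]
    return grid[int(np.argmin(values))]

# Example 20: b0 = E_0(X) for squared loss, the median of P_0 for absolute loss.
print("squared loss : b0 =", b0(lambda e: e ** 2), " mean   =", z.mean())
print("absolute loss: b0 =", b0(np.abs), " median =", np.median(z))
```

With $P_0 = \mathrm{Exp}(1)$ the squared-loss minimizer lands near the mean $1$ and the absolute-loss minimizer near the median $\ln 2 \approx 0.693$, so the corresponding best invariant rules are $d_0(x) = x - 1$ and $d_0(x) = x - \ln 2$.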
4.4 Estimation of a distribution function

Let $X_1, \ldots, X_n$ be a sample from a continuous distribution $F$. We estimate the distribution with $\hat{F}$, which is continuous from the right. Two commonly used loss functions are
$$L_1(F, \hat{F}) = \sup_x |F(x) - \hat{F}(x)| \qquad (2)$$
and
$$L_2(F, \hat{F}) = \int (F(x) - \hat{F}(x))^2 \, F(dx). \qquad (3)$$

A sufficient statistic is $(X_{(1)}, \ldots, X_{(n)})$, the order statistics. Let us reduce the collection of rules further by considering only invariant decision rules. Consider the group of transformations
$$\mathcal{G} = \{g_\psi : g_\psi(x_{(1)}, \ldots, x_{(n)}) = (\psi(x_{(1)}), \ldots, \psi(x_{(n)}))\}$$
for all $\psi$ continuous and strictly monotone. Then $\bar{g}_\psi(F)(x) = F(\psi^{-1}(x))$ and $\tilde{g}_\psi(\hat{F})(x) = \hat{F}(\psi^{-1}(x))$ for both losses.

As part of the homework assignment you should prove that the invariant decision rules have the form
$$\hat{F}(x) = \sum_{i=0}^{n} u_i\, I_{[X_{(i)}, X_{(i+1)})}(x),$$
where $-\infty = X_{(0)} < X_{(1)} < \cdots < X_{(n)} < X_{(n+1)} = \infty$. It turns out that for the second loss function and for invariant rules,
$$R(F, \hat{F}) = E \sum_{i=0}^{n} \int_{F(X_{(i)})}^{F(X_{(i+1)})} (t - u_i)^2 \, dt.$$
This function is minimized for $u_i = (i + 1)/(n + 2)$.
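To close, here is a minimal sketch of the resulting best invariant estimate under the loss (3), together with a Monte Carlo check of the risk expression above; the function names are mine, and the comparison with the empirical-CDF weights $u_i = i/n$ is an illustrative addition.

```python
import numpy as np

def best_invariant_cdf(data):
    """Best invariant estimate of F under loss (3): the step function
    taking the value u_i = (i+1)/(n+2) on [X_(i), X_(i+1))."""
    xs = np.sort(data)
    n = len(xs)
    # searchsorted gives i = #{j : X_(j) <= x}, so x lies in [X_(i), X_(i+1))
    return lambda x: (np.searchsorted(xs, x, side="right") + 1) / (n + 2)

def l2_risk(weights, n, n_rep=50_000, seed=1):
    """Monte Carlo value of E sum_i int_{U_(i)}^{U_(i+1)} (t - u_i)^2 dt,
    where U_(i) = F(X_(i)) are uniform order statistics, U_(0) = 0 and
    U_(n+1) = 1.  Uses int_a^b (t - u)^2 dt = ((b-u)^3 - (a-u)^3) / 3."""
    rng = np.random.default_rng(seed)
    w = weights(n)  # u_0, ..., u_n
    total = 0.0
    for _ in range(n_rep):
        u = np.concatenate(([0.0], np.sort(rng.uniform(size=n)), [1.0]))
        total += np.sum(((u[1:] - w) ** 3 - (u[:-1] - w) ** 3) / 3.0)
    return total / n_rep

n = 5
invariant = lambda n: (np.arange(n + 1) + 1) / (n + 2)  # u_i = (i+1)/(n+2)
empirical = lambda n: np.arange(n + 1) / n              # ECDF weights u_i = i/n
print("L2 risk, best invariant:", l2_risk(invariant, n))
print("L2 risk, empirical CDF :", l2_risk(empirical, n))
```

Note that the optimal weights shrink the empirical CDF toward $1/2$: the estimate takes the value $1/(n+2)$ below $X_{(1)}$ and $(n+1)/(n+2)$ above $X_{(n)}$, never reaching $0$ or $1$.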