4 Invariant Statistical Decision Problems

4.1 Invariant decision problems

Let $\mathcal{G}$ be a group of measurable transformations from the sample space $X$ into itself. The group operation is composition. Note that a group must include the identity transformation $e$ and, for each $g$, the inverse $g^{-1}$, such that $g^{-1}g = e$. Consequently, all transformations in $\mathcal{G}$ are one-to-one.

Definition 30. The family of distributions $P_\theta$, $\theta \in \Theta$, is said to be invariant under the group $\mathcal{G}$ if for every $g \in \mathcal{G}$ and every $\theta \in \Theta$ there exists a unique $\theta' \in \Theta$ such that the distribution of $g(X)$ is $P_{\theta'}$ whenever the distribution of $X$ is $P_\theta$.

This unique $\theta'$ is denoted by $\bar g(\theta)$. The meaning of the definition is that for every real-valued integrable function $\varphi$,

$$E_\theta \varphi(g(X)) = E_{\bar g(\theta)} \varphi(X).$$
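As a sanity check, here is a minimal Python sketch of this identity for the shift family $N(\theta, 1)$ with $g_c(x) = x + c$ and $\bar g_c(\theta) = \theta + c$ (the family, the test function, and the sample size are our illustrative choices; this group reappears in Example 14):

```python
import numpy as np

rng = np.random.default_rng(0)
theta, c = 1.5, 2.0     # parameter and shift (arbitrary illustrative values)
phi = np.tanh           # any bounded measurable test function
n = 10**6

x = rng.normal(theta, 1.0, n)                      # X ~ P_theta = N(theta, 1)
lhs = phi(x + c).mean()                            # E_theta phi(g_c(X))
rhs = phi(rng.normal(theta + c, 1.0, n)).mean()    # E_{gbar_c(theta)} phi(X)
print(lhs, rhs)   # the two averages agree up to Monte Carlo error
```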

Definition 31. A parameter $\theta$ is said to be identifiable if distinct values of $\theta$ correspond to distinct distributions.

If the family of distributions is invariant under $\mathcal{G}$, then the uniqueness of $\theta'$ implies that $\theta$ is identifiable: taking $g = e$, the only $\theta'$ for which $X$ and $g(X) = X$ share a distribution must be $\theta$ itself.

Lemma 8. If a family of distributions $P_\theta$, $\theta \in \Theta$, is invariant under $\mathcal{G}$, then $\bar{\mathcal{G}} = \{\bar g : g \in \mathcal{G}\}$ is a group of transformations of $\Theta$ into itself.

Definition 32. A decision problem, consisting of the game $(\Theta, \mathcal{A}, L)$ and the distributions $P_\theta$ over $X$, is said to be invariant under the group $\mathcal{G}$ if the family of distributions is invariant and if the loss is invariant under $\mathcal{G}$ in the sense that for every $g \in \mathcal{G}$ and $a \in \mathcal{A}$ there exists a unique $a' \in \mathcal{A}$ such that $L(\theta, a) = L(\bar g(\theta), a')$ for all $\theta \in \Theta$.

Denote the unique $a'$ by $\tilde g(a)$.

Lemma 9. If a decision problem is invariant under a group $\mathcal{G}$, then $\tilde{\mathcal{G}} = \{\tilde g : g \in \mathcal{G}\}$ is a group of transformations of $\mathcal{A}$ into itself.

Example 14. Consider the shift group in the normal estimation problem with $L(\theta, a) = (a - \theta)^2$. Here $g_c(x) = x + c$. Thus $\bar g_c(\theta) = \theta + c$ and $\tilde g_c(a) = a + c$.

Example 15. Assume $X$ is binomial. Let $L(\theta, a) = W(\theta - a)$, for some even function $W$. Let $\mathcal{G} = \{e, g\}$, where $g(x) = n - x$. The distribution of $g(X)$ is $B(n, 1 - \theta)$. Thus $\bar g(\theta) = 1 - \theta$ and $\tilde g(a) = 1 - a$.

Example 16. Let $X \sim N(\theta_1 \mathbf{1}, \theta_2^2 I)$ and $\Theta = \{\theta = (\theta_1, \theta_2) : \theta_2 > 0\}$. Let $\mathcal{A} = \mathbb{R}$ and let $L(\theta, a) = (a - \theta_1)^2/\theta_2^2$. Consider transformations of the form $g_{b,c}(x) = bx + c\mathbf{1}$, where $b \neq 0$. Then $\bar g_{b,c}(\theta) = (b\theta_1 + c, |b|\theta_2)$ and $\tilde g_{b,c}(a) = ba + c$.

Example 17. Consider again the situation of Example 16. Suppose $\mathcal{A} = \{0, 1\}$ and we take
$$L(\theta, 0) = \begin{cases} 1 & \text{if } \theta_1 > 0,\\ 0 & \text{if } \theta_1 \le 0, \end{cases} \qquad L(\theta, 1) = \begin{cases} 1 & \text{if } \theta_1 \le 0,\\ 0 & \text{if } \theta_1 > 0. \end{cases}$$
For $g_b(x) = bx$, $b > 0$, we then have $\bar g_b(\theta) = b\theta$ and $\tilde g_b(a) = a$.
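A quick numerical check of Example 15 (the scipy dependency and the parameter values are our choices): the pmf of $g(X) = n - X$ under $B(n, \theta)$ coincides with the pmf of $B(n, 1 - \theta)$.

```python
import numpy as np
from scipy import stats

n, theta = 10, 0.3
k = np.arange(n + 1)

# P(n - X = k) = P(X = n - k) when X ~ B(n, theta)
pmf_gX = stats.binom.pmf(n - k, n, theta)
# pmf of B(n, 1 - theta), the candidate P_{gbar(theta)}
pmf_flipped = stats.binom.pmf(k, n, 1 - theta)
print(np.allclose(pmf_gX, pmf_flipped))   # True: gbar(theta) = 1 - theta
```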

4.2 Invariant decision rules

Definition 33. Given an invariant decision problem, a non-randomized decision rule $d \in \mathcal{D}$ is said to be invariant under $\mathcal{G}$ if for all $x \in X$ and all $g \in \mathcal{G}$,
$$d(g(x)) = \tilde g(d(x)).$$
A randomized decision rule is invariant if it is a mixture of invariant decision rules.

Theorem 14. The risk of an invariant decision rule is constant over the orbits of the group $\bar{\mathcal{G}}$; that is, $R(\bar g(\theta), d) = R(\theta, d)$ for all $g \in \mathcal{G}$ and $\theta \in \Theta$. Indeed,
$$R(\bar g(\theta), d) = E_{\bar g(\theta)} L(\bar g(\theta), d(X)) = E_\theta L(\bar g(\theta), d(g(X))) = E_\theta L(\bar g(\theta), \tilde g(d(X))) = E_\theta L(\theta, d(X)) = R(\theta, d).$$
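For the shift group of Example 14 the action on $\Theta = \mathbb{R}$ is transitive (a single orbit), so Theorem 14 says an invariant rule has the same risk at every $\theta$. A Monte Carlo sketch (the family, the rule, and the sample sizes are our illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

def risk(theta, d, n=10**6):
    """Monte Carlo estimate of R(theta, d) = E_theta (d(X) - theta)^2."""
    x = rng.normal(theta, 1.0, n)          # X ~ N(theta, 1)
    return np.mean((d(x) - theta) ** 2)

d = lambda x: x    # invariant: d(x + c) = d(x) + c = g~_c(d(x))
print([round(risk(t, d), 3) for t in (-5.0, 0.0, 3.0, 100.0)])
# all approximately 1.0 -- one orbit, hence constant risk
```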

4.3 Location and scale parameters

Definition 34. A real parameter $\theta \in \Theta$ is said to be a location parameter for the distribution of a random variable $X$ if $F_\theta(x)$ is a function of $x - \theta$ only.

Lemma 10. For a location parameter:

1. $\theta$ is a location parameter for the distribution of $X$ if, and only if, the distribution of $X - \theta$ does not depend on $\theta$.

2. If the distributions of $X$ are absolutely continuous with density $f_\theta(x)$, then $\theta$ is a location parameter if, and only if, $f_\theta(x) = f(x - \theta)$, for some density function $f(x)$.

Example 18. The normal mean (known variance), the Cauchy $\alpha$ (known $\beta$), and the $\theta$ of $U(\theta, \theta + 1)$ are all examples of location parameters.

Definition 35. A real parameter $\theta \in \Theta$ is said to be a scale parameter for the distribution of a random variable $X$ if $F_\theta(x)$ is a function of $x/\theta$ only.

Lemma 11. For a scale parameter:

1. $\theta$ is a scale parameter for the distribution of $X$ if, and only if, the distribution of $X/\theta$ does not depend on $\theta$.

2. If the distributions of $X$ are absolutely continuous with density $f_\theta(x)$, then $\theta$ is a scale parameter if, and only if, $f_\theta(x) = (1/\theta)f(x/\theta)$, for some density function $f(x)$.

Example 19. The $\theta$ of $N(\theta\mu, \theta^2)$ (known $\mu$), the Cauchy $\beta$ (known $\alpha$), the $\theta$ of $U(0, \theta)$, and the $\beta$ in a Gamma distribution (known $\alpha$) are all examples of scale parameters (see the numerical check below).

One can combine both definitions and get a location-scale family with parameters $(\mu, \sigma)$. Note that the distribution of $X - \mu$ is independent of $\mu$, but the distribution of $X/\sigma$ is not independent of $\sigma$.

Lemma 12. If every nonrandomized invariant rule is an equalizer rule (that is, if it has constant risk), then the nonrandomized invariant decision rules form an essentially complete subclass of the class of all randomized invariant rules.
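The numerical check of Lemma 11 promised above, using the Gamma scale parameter of Example 19 (the distribution, the parameter values, and the Kolmogorov-Smirnov comparison are our choices): whatever the scale $\beta$, the rescaled sample $X/\beta$ stays close in distribution to Gamma($\alpha$, 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
alpha, n = 2.0, 10**5    # known shape, Monte Carlo sample size

# X ~ Gamma(alpha, scale=beta); X/beta should be Gamma(alpha, 1) for every beta
for beta in (0.5, 1.0, 4.0):
    z = rng.gamma(alpha, beta, n) / beta
    ks = stats.kstest(z, stats.gamma(alpha).cdf)
    print(beta, round(ks.statistic, 4))   # KS distance stays small for all beta
```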

Assume $\Theta = \mathcal{A} = \mathbb{R}$ and let $L(\theta, a) = L(a - \theta)$.

Theorem 15. In the problem of estimating a location parameter with loss $L(\theta, a) = L(a - \theta)$, if $E_0 L(X - b)$ exists and is finite for some $b$, and if there exists $b_0$ such that
$$E_0 L(X - b_0) = \inf_b E_0 L(X - b), \qquad (1)$$
(the infimum being over the $b$ for which the expectation exists), then $d_0(x) = x - b_0$ is a best invariant rule. It has constant risk equal to (1).

Example 20. If $L(\theta, a) = (a - \theta)^2$ and $X$ has a finite variance, then $b_0 = E_0(X)$. If $L(\theta, a) = |a - \theta|$ and $X$ has a finite first moment, then $b_0$ is the median of $X$ under $P_0$.
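A numerical sketch of Theorem 15 and Example 20, taking $P_0$ to be the standard exponential distribution (an arbitrary choice on our part; any $P_0$ with a finite first moment would do): for absolute-error loss the minimizing $b_0$ is the median $\log 2$, and for squared-error loss it is the mean $1$.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(3)
x0 = rng.exponential(1.0, 10**6)    # a large draw from P_0 = Exp(1)

# absolute-error loss: b_0 = argmin_b E_0 |X - b| is the median of P_0
res = optimize.minimize_scalar(lambda b: np.mean(np.abs(x0 - b)),
                               bounds=(0.0, 5.0), method="bounded")
print(res.x, np.log(2))    # both approximately 0.693

# squared-error loss: b_0 = E_0(X) = 1, so the best invariant rule is x - 1
print(x0.mean())
```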

Example 21. Let $P_\theta(X = \theta + 1) = 1 - P_\theta(X = \theta - 1) = 0.5$ and let
$$L(\theta, a) = \begin{cases} |a - \theta| & \text{if } |a - \theta| \le 1,\\ 1 & \text{if } |a - \theta| > 1. \end{cases}$$
Then the best invariant rules are not admissible.

Example 22. In the multidimensional normal location estimation problem the vector of means is the best invariant estimator. However, it is not admissible for dimension higher than 2, since the estimator

$$d(X) = \left(1 - \frac{k - 2}{\|\bar X\|^2}\right)\bar X$$
is better ($k$ is the dimension).
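A Monte Carlo comparison of the two estimators, reducing the problem to a single $k$-dimensional observation $X \sim N(\theta, I)$ (our simplification; with a sample of size $m$, $\bar X \sim N(\theta, I/m)$ and the comparison is the same after rescaling):

```python
import numpy as np

rng = np.random.default_rng(4)
k, reps = 5, 10**5
theta = np.full(k, 2.0)                  # true mean vector (arbitrary)

x = rng.normal(theta, 1.0, (reps, k))    # one N(theta, I) observation per replicate
shrink = 1 - (k - 2) / np.sum(x**2, axis=1)
js = shrink[:, None] * x                 # the shrinkage estimator displayed above

mse = lambda est: np.mean(np.sum((est - theta) ** 2, axis=1))
print(mse(x), mse(js))    # for k >= 3 the shrinkage risk is strictly smaller
```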

However, typically the best invariant estimate is minimax.

Theorem 16. If an equalizer rule is extended Bayes, it is also a minimax rule.

This leads to:

Theorem 17. Under the conditions of Theorem 15, if $L$ is bounded below and if for every $\epsilon > 0$ there exists $N$ such that

$$E_0 L(X - b) I_{\{|X| \le N\}} \ge \inf_b E_0 L(X - b) - \epsilon$$
for all $b$, then the best invariant rule is minimax.

An example of a location parameter is the normal mean:

Theorem 18. If $X_1, \dots, X_n$ is a sample from a normal distribution with mean $\theta$ and known variance $\sigma^2$, then $\bar X$ is a best invariant estimate of $\theta$ and a minimax estimate of $\theta$, provided that the loss function is a nondecreasing function of $|a - \theta|$ and that $E_0 L(\bar X)$ exists and is finite.

4.4 Estimation of a distribution function

Let $X_1, \dots, X_n$ be a sample from a continuous distribution $F$. We estimate the distribution with $\hat F$, which is continuous from the right. Two commonly used loss functions are

$$L_1(F, \hat F) = \sup_x |F(x) - \hat F(x)|, \qquad (2)$$

and
$$L_2(F, \hat F) = \int (F(x) - \hat F(x))^2\, F(dx). \qquad (3)$$
A sufficient statistic is $(X_{(1)}, \dots, X_{(n)})$, the order statistic. Let us reduce the collection of rules further by considering only invariant decision rules. Consider the group of transformations
$$\mathcal{G} = \{g_\psi : g_\psi(x_{(1)}, \dots, x_{(n)}) = (\psi(x_{(1)}), \dots, \psi(x_{(n)}))\}$$
for all $\psi$ continuous and strictly monotone. Then $\bar g_\psi(F)(x) = F(\psi^{-1}(x))$ and $\tilde g_\psi(\hat F)(x) = \hat F(\psi^{-1}(x))$ for both losses. As part of the homework assignment you should prove that the invariant decision rules have the form
$$\hat F(x) = \sum_{i=0}^n u_i I_{[X_{(i)}, X_{(i+1)})}(x),$$
where $-\infty = X_{(0)} < X_{(1)} < \cdots < X_{(n)} < X_{(n+1)} = \infty$. It turns out that for the second loss function and for invariant rules
$$R(F, \hat F) = E \sum_{i=0}^n \int_{F(X_{(i)})}^{F(X_{(i+1)})} (t - u_i)^2\, dt.$$

Since $F$ is continuous, the values $F(X_{(1)}), \dots, F(X_{(n)})$ are distributed as uniform order statistics, and this function is minimized for $u_i = (i + 1)/(n + 2)$.
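A Monte Carlo check of this minimizer (the sample size and the comparison rule are our choices), exploiting the uniform-order-statistics fact just mentioned: the choice $u_i = (i+1)/(n+2)$ yields a smaller risk (3) than the empirical c.d.f. $u_i = i/n$, which is also invariant.

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 10, 10**5

def risk(u):
    """Monte Carlo value of E sum_i int_{F(X_(i))}^{F(X_(i+1))} (t - u_i)^2 dt,
    using the fact that F(X_(1)), ..., F(X_(n)) are uniform order statistics."""
    v = np.sort(rng.uniform(size=(reps, n)), axis=1)
    a = np.hstack([np.zeros((reps, 1)), v])   # lower endpoints, F(X_(0)) = 0
    b = np.hstack([v, np.ones((reps, 1))])    # upper endpoints, F(X_(n+1)) = 1
    seg = ((b - u) ** 3 - (a - u) ** 3) / 3   # exact integral on each segment
    return seg.sum(axis=1).mean()

i = np.arange(n + 1)
print(risk((i + 1) / (n + 2)))   # best invariant rule
print(risk(i / n))               # empirical c.d.f.: invariant, but larger risk
```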
