Minimax Admissible Estimation of a Multivariate Normal Mean and Improvement upon the James-Stein

By Yuzo Maruyama

Faculty of Mathematics, Kyushu University

A dissertation submitted to the Graduate School of Economics of University of Tokyo in partial fulfillment of the requirements for the degree of Doctor of Economics

June, 2000 (Revised December, 2000) Abstract

This thesis mainly focuses on the estimation of a multivariate normal mean, from the decision-theoretic point of view. We concentrate our energies on investigating a class of satisfying two optimalities: minimaxity and admissibility. We show that all classes of minimax admissible generalized Bayes estimators in the hitherto researches are divided in two, on the basis of whether a certain function of the prior distribution is bounded or not. Furthermore by using Stein’s idea, we propose a new class of minimax admissible estimators. The relations among minimaxity, a prior distribution and a shrinkage factor of the genaralized , are also investigated. For the problem of finding admissible estimators which dominate the inadmissible James-Stein estimator, we have some results. Moreover, in non-normal case, we consider the problem of proposing admissible minimax estimators. For a certain class of spherically symmetric distributions which includes at least multivariate-t distributions, admissible minimax estimators are de- rived.

ii Acknowledgements

I am indebted to my adviser, Professor Tatsuya Kubokawa for his guidance and encour- agement during the research for and preparation of this thesis. I wish to take opportunity to express my deep gratitude to Professors of Gradu- ate School of Economics, University of Tokyo: Akimichi Takemura, Naoto Kunitomo, Yoshihiro Yajima, Kazumitsu Nawata and Nozomu Matsubara, and to Professors of Faculty of Mathematics, Kyushu University: Takashi Yanagawa, Sadanori Konishi, Hiroto Hyakutake, Yoshihiko Maesono, Masayuki Uchida and Kaoru Fueda, for their encouragement and suggestions. Finally, special thanks to my wife, Hiromi, for her support, encouragement, tender- ness and patience.

iii Contents

Abstract ii

Acknowledgements iii

1 Introduction 1

2 Basic notions of statistical decision theory 6

3 Estimation of a multivariate normal mean 14 3.1 Inadmissibility of the usual estimator ...... 14 3.2 Admissibility and permissibility ...... 20 3.2.1 Preparations ...... 20 3.2.2 The necessary condition for the admissibility ...... 22 3.2.3 The sufficient condition for admissibility ...... 23 3.2.4 Exterior boundary problem and recurrent diffusions ...... 30 3.2.5 Permissibility ...... 32 3.3 The class of admissible minimax estimators ...... 34 3.3.1 Generalized Bayes estimators and their admissibility ...... 34 3.3.2 The construction of minimax Bayes estimator ...... 38 3.3.3 Alam-Maruyama’s type generalized Bayes estimator ...... 44 3.3.4 Generalization of Stein’s Bayes estimator ...... 50 3.4 Improvement upon the James-Stein estimator ...... 59

iv 4 Estimation of a mean vector of scale mixtures of multivariate normal distributions 66 4.1 Introduction and summary ...... 66 4.2 The construction of the generalized Bayes estimators ...... 67 4.3 Admissible Minimax estimators of the mean vector ...... 74

5 Estimation problems on a multiple regression model 78 5.1 Minimax estimators of a normal variance ...... 78 5.1.1 Introduction ...... 78 5.1.2 Main results ...... 79 5.2 Estimation of a multivariate normal mean with unknown variance . . . 84 5.2.1 Introduction ...... 84 5.2.2 Main results ...... 85

6 A New Positive Estimator of 89 6.1 Introduction ...... 89 6.2 A positive estimator improving on Johnstone’s estimator ...... 90

7 Conclusion 97

A Technical results 99

v Chapter 1

Introduction

Statistical decision theory, which was made by Wald(1950), characterizes the mode of modern statistical thinking. The contrast between the simplicity of its basic notions and the broad area of applicability is astonishing. The minimax principle and least favorable distributions are central notions of statistical decision theory and are close parallel to the corresponding concepts of the game theory. Admissibility is another basic concept of statistical decision theory and is related to the concept of Pareto optimality in economics. During the last thirty years, the study of statistical decision theory, in particular of admissibility, led to unexpected, amazing connections to many branches of mathematics, for example, partial differential equations of mathematical physics, differential inequalities and recurrence properties of stochastic process. For the good accounts, see Brown(1971), Eaton(1992) and Rukhin(1995). It is noted that these connections were discovered and have developed in the process of investigating the estimation problem of a multivariate normal mean, a typical topic of what is called the Stein phenomenon, which is introduced as follows. Many statisticians would like to believe that a natural estimator as maximum like- lihood estimator(MLE) is admissible because admissibility is a weak criterion for opti- 0 mality. For the estimation problem of a mean vector θ = (θ1, . . . , θp) based on random 0 vector X = (X1,...,Xp) having p-variate N(θ, Ip) relative to a quadratic loss function, Stein(1956) showed that the MLE X, which is also minimax,

1 is inadmissible for p ≥ 3 while it is admissible for p = 1, 2. James and Stein(1961) succeeded in giving an explicit form of an estimator improving on X as µ ¶ p − 2 δJS = 1 − X, kXk2 which is called the James-Stein estimator. This result surprises us because this means that a usual estimator is inadmissible in the framework of the simultaneous estimation of several parameters, although the components of the estimator are separately admissible to estimate the corresponding one-dimensional parameters, and we call it the Stein phenomenon. Generally if δ dominates a minimax estimator, then δ is also minimax and, thus, definitely preferred. It is, therefore, important from decision-theoretic point of view, to ascertain whether a proposed minimax estimator δ is admissible. Moreover if the estimator δ in the above is also inadmissible, it is interesting to derive admissible esti- mators improving upon δ. In this thesis, we mainly consider the estimation problem in Stein(1956)’s setting and concentrate our energies on investigating a class of estimators satisfying two optimalities: minimaxity and admissibility. We also consider the problem of deriving admissible estimators improving upon the James-Stein estimator which is known to be inadmissible. The outline of this thesis is as follows. In Chapter 2, we review basic notions of statistical decision theory, particular in esti- mation problem. The concepts of “admissibility”, “minimaxity”, “Bayes” and “permis- sibility” are introduced and results concerning close connections among these concepts are presented with proof. Chapter 3 contains a variety of results concerning an estimation of a multivariate normal mean. In Section 3.1, the minimaxity and the inadmissibility for p ≥ 3 of the natural estimator X are proved. In Section 3.2, results of Brown(1971,1988) and Srinivasan(1981) concerning admissibility and permissibility in this estimation problem are presented. In Section 3.3, we consider a generalized Bayes estimator δh with respect

2 to scale mixtures of multivariate normal distribution whose density is proportional to Z µ ¶ µ ¶ 1 λ p/2 λ exp − kθk2 λ−ah(λ)dλ, (1.1) 0 1 − λ 2(1 − λ) where h(λ) is a measurable positive function on (0, 1). By Brown(1971)’s result, it is shown that δh is admissible if and only if a ≤ 2. Next we consider the minimaxity of δh. As seen easily, all classes of minimax generalized Bayes estimators given in the hith- erto researches, that is, Strawderman(1971), Alam(1973), Berger(1976b), Faith(1978),

Maruyama(1998) and Fourdrinier et al.(1998) are expressed as δh although relations among classes above have not been clear. Hence we succeed in treating uniformly classes above. We point out that Fourdrinier et al.(1998)’s result is based on the boundedness of h(λ) although their result is powerful and includes results of Strawderman(1971), Berger(1976b) and Faith(1978). Indeed, the generalized Bayes estimators proposed by

Alam(1973) and Maruyama(1998), are expressed as δh in the case of unbounded h(λ). Thus we can see that the hitherto researches are divided in two, as Table 3.1 in Section 3.3.3, on the basis of whether h(λ) is bounded or not. Moreover we pay attention to the following Stein(1973)’s simple idea and provide it with the warrant for statistical decision theory. Stein(1973) suggested that the gener- alized Bayes estimator with respect to the prior distribution: the weighted sum of that determined by the density kθk2−p, which corresponds to (1.1) with a = 2 and h ≡ 1, and a measure concentrated at the origin, may dominate the James-Stein estimator. Since Kubokawa(1991) afterward showed that the generalized Bayes estimator with re- spect to the density kθk2−p dominates the James-Stein estimator, Stein’s suggestion is to the point in the case where the ratio of a measure concentrated at the origin is very small. Efron and Morris(1976) expressed Stein(1973)’s estimator explicitly. As for the decision-theoretic properties of the estimator, admissibility is easily able to be checked by using Brown(1971)’s theorem. On the other hand, even minimaxity, to say noth- ing of dominance over the James-Stein estimator has not been proved yet. Extending Stein’s prior distribution, we consider a prior distribution: the weighted sum of a mea- sure concentrated at the origin and the prior distribution whose density is proportional

3 to (1.1). This prior distribution corresponds to one which has a density function (1.1) with h(λ) = βh(λ) + δ(λ − 1), where β > 0 and δ(·) is the Dirac delta function. We ∗ show that for h(λ) such that δh is shown to be minimax, there exists a constant β such that for β ≥ β∗, the generalized Bayes estimator with respect to the above distribution is minimax. Furthermore the relations among minimaxity, the prior distribution, and the be- havior of φ(w), a shrinkage factor of the generalized Bayes estimator, are investigated. Especially, in the case where φ(w) is not monotone, the researches on the relations are fragmentary and have not been arranged yet. We characterize the subclass of the prior distributions which lead minimax estimators with not-monotone φ(w). We believe that the investigation of estimators with not-monotone φ(w) is more and more important be- cause such estimators are considered as candidates for the solution of the most difficult problem in this field, that is, the problem of finding admissible estimators dominating the James-Stein positive-part rule. In Section 3.4, the problem of finding admissible estimators which dominate the James-Stein estimator is treated. We show that an estimator which is proposed in Maruyama(1996) and improves upon the James-Stein estimator, is permissible. We unfortunately conjecture that it is inadmissible, which implies that it is quite difficult to find a class of admissible estimators which improve upon the James-Stein estimator. The Stein phenomenon is not limited in the case of a multivariate normal distri- bution. Since Stein(1956), considerable effort has been given to improving upon the best equivariant estimator δ0, which is minimax under some mild conditions. The the- oretical questions were answered quite thoroughly by Brown(1966). He showed that in

3 or more dimensions δ0 is inadmissible for an extremely wide variety of distributions and loss functions. It is however very difficult to give explicitly improved minimax es- timators without restriction of distributions. For the explicit construction of improved minimax estimators in non-normal case, the estimation of a mean vector of spheri- cally symmetric distributions under the quadratic loss function, has been an important topic in this field. As a special case of spherically symmetric distributions, Strawder-

4 man(1974) introduced scale mixtures of multivariate normal distributions, which have a probability density function Z µ ¶ ∞ kx − θk2v f(kx − θk2) = (2π)−p/2vp/2 exp − G(dv), 0 2 where G is a known distribution function. Strawderman(1974) proposed a class of improved minimax estimators of the mean vector. Generally Berger(1975), Bock(1985) and Brandwein and Strawderman(1978,1991) found and discussed classes of improved minimax estimators for spherically symmetric distributions. It is however noted that, in non-normal case, an estimator satisfying both minimaxity and admissibility, has not been derived yet. In Chapter 4, the problem of estimating the mean vector θ of scale mixtures of multivariate normal distributions, in the case where G has a probability density function, with the quadratic loss function is considered. For a certain subclass of these distributions, which includes at least multivariate-t distributions, admissible minimax estimators are derived for p ≥ 5. In Chapter 5 and 6, we review my published papers Maruyama(1997,1999a,1999b).

5 Chapter 2

Basic notions of statistical decision theory

In this chapter, referring to Takemura(1991) and Lehmann and Casella(1999), we review statistical decision theory, particular in estimation problem. Let X be a random variable

taking on values in a sample space X according to a distribution Pθ, which is known to

belong to a family P = {Pθ, θ ∈ Θ}. The aim of an estimation problem is to determine an estimator, a plausible real-valued function δ defined over the sample space, for an estimand, g(θ), a real-valued function of the parameter θ. Needless to say, we hope that δ(X) will tend to be close to the unknown g(θ). Since δ(X) is a random variable, we shall interpret this to mean that it will be close on the average. To make this requirement precise, it is necessary to specify a measure of the average closeness of an estimator to g(θ). One example of such measures is mean square error E(δ(X)−g(θ))2. Quite generally, we introduce a loss function L(θ, d), a nonnegative function of θ and d. It is usually assumed that L(θ, d) ≥ 0 for all θ, d and L(θ, g(θ)) = 0 for all θ, so that the loss is zero when the correct value is estimated. The accuracy of an estimator δ is then measured by the risk function

R(θ, δ) = Eθ{L(θ, δ(X))}. (2.1)

If estimators are given, we are able to compare the estimators using the risk functions calculated as (2.1).

6 Definition 2.1. For two estimators δ1 and δ2, δ1 dominates δ2 if

R(θ, δ1) ≤ R(θ, δ2) for all θ and with strict inequality for some θ. An estimator δ is said to be inadmissible if there exists another estimator δ0 which dominates it and admissible if no such estimator δ0 exists.

Since admissibility is a desirable property, it is of interest to determine the totality of admissible estimators.

Definition 2.2. A class of C of estimators is complete if for any δ not in C there exists an estimator δ0 in C such that δ0 dominates δ.

It follows from this definition that any estimator outside a complete class is inad- missible. It is therefore reasonable, in the search for an optimal estimator, to restrict attention to a complete class. We note however that admissibility is a weak criterion of optimality and many admissible estimators exist. For example, an unreasonable constant estimator δ0 = θ0, which does not utilize the information of X, is typically admissible. Therefore, only on the basis of admissibility, we cannot solve the problem “Which estimator should be chosen?” As one more criterion for comparing estimators, minimax criterion, which aims at minimizing the risk in the least favorable case, is often utilized. Needless to say, a constant estimator is not minimax.

Definition 2.3. An estimator δM , which minimizes the maximum risk, that is, which satisfies

inf sup R(θ, δ) = sup R(θ, δM ) δ θ θ is called a minimax estimator.

There exist close connections between admissibility and minimaxity as follows.

Theorem 2.4. If an estimator is unique minimax, it is admissible.

7 Proof. If it were inadmissible, another estimator would dominate it in risk and, hence, would be minimax.

For an estimator with constant risk, we have the following.

Theorem 2.5. If an estimator has constant risk and is admissible, it is minimax.

Proof. If it were not, another estimator would have smaller maximum risk and, hence, uniformly smaller risk.

If there is a unique minimax estimator δ, by Theorem 2.4, δ is admissible and we should use it as the best estimator. If the uniqueness is removed from the above assumption, there can exist other minimax estimators which improve upon δ. In particular, if the constant risk minimax estimator is inadmissible, this is the worst minimax estimator in the sense that every other minimax estimator has a uniformly small risk. For example, suppose that δ is minimax and has constant risk r. For any other minimax estimator 0 0 0 δ , which implies that supθ R(θ, δ ) equals r, the case where R(θ, δ ) < r for any θ and 0 limkθk→∞ R(θ, δ ) = r may happen. Indeed, on the most important theme of this thesis, the Stein phenomenon, this happens. In this case, the one and only best estimator on the basis of minimaxity and admissibility, is not able to be determined since there can exist many admissible minimax estimators. It is however noted that a problem of characterizing a class of estimators satisfying both admissibility and minimaxity appears. To prove admissibility or minimaxity of a certain estimator δ, we often have to express it as Bayes estimator or variant of Bayes estimator which are defined in the following. R Definition 2.6. Let Θ have distribution π, that is, π(dθ) = 1. An estimator δ minimizing a Bayes risk Z r(π, δ) = R(θ, δ)π(dθ) is called a Bayes estimator with respect to π.

8 This definition immediately leads the following principal method for proving admis- sibility.

Theorem 2.7. Any unique Bayes estimator is admissible.

Proof. If δ is unique Bayes with respect to the distribution π and is dominated by δ0, then Z Z R(θ, δ0)π(dθ) ≤ R(θ, δ)π(dθ), which contradicts uniqueness.

The following result replaces the uniqueness assumption by the continuity of all risk functions.

Theorem 2.8. For a possibly vector-valued parameter θ, suppose that δν is a Bayes estimator having finite Bayes risk with respect to a distribution which has a density function ν, that ν is positive for all θ, and that the risk function of every estimator δ is continuous function of θ. Then δν is admissible.

Proof. If δν is not admissible, there exists an estimator δ such that R(θ, δ) ≤ R(θ, δν) for all θ with strict inequality for some θ. It then follows from the continuity of risk ν functions that R(θ, δ) < R(θ, δ ) for all θ in some open subset Θ0 of the parameter space and hence that Z Z R(θ, δ)ν(θ)dθ < R(θ, δν)ν(θ)dθ, which contradicts the definition of δν.

In the Bayesian analysis, π and ν in the above are called a prior distribution and a prior density function, respectively. Namely θ is viewed as a realization of a random vector

θ whose prior distribution is π. A sample X is drawn from Pθ = Px|θ, which is viewed as the conditional distribution of X given θ = θ. The sample X = x is then used to

9 obtain an updated prior distribution, which is called posterior distribution. Noting that the joint distribution of X and θ is a probability measure on X × Θ determined by Z

P (A × B) = Px|θ(A)π(dθ),A ∈ BX ,B ∈ BΘ, B where BX and BΘ are the Borel σ-field on X , Θ respectively, we see that the posterior distribution of θ, given by X = x, is expressed as the conditional distribution Pθ|x. If

Px|θ and π have probability density functions f(x|θ) and ν(θ) respectively, the posterior distribution Pθ|x has a posterior density function

µZ ¶−1 ν(θ|x) = f(x|θ)ν(θ) f(x|θ)ν(θ)dθ .

By using the concept of posterior, the determination of a Bayes estimator is, in principle, quite simple as follows.

Theorem 2.9. Suppose the following assumptions hold

1. There exists an estimator δ0 with finite risk,

2. for almost all x, there exists a value δπ(x) minimizing

E {L(θ, δ(x))|X = x} , (2.2)

where the expectation is taken with respect to the posterior distribution Pθ|x.

Then δπ is a Bayes estimator.

The expectation (2.2) is called posterior expected loss. Therefore a Bayes estimator is interpreted as one which minimizes the posterior expected loss.

Proof. Let δ be any estimator with finite risk. Then (2.2) is finite since L is nonnegative. Hence,

E {L(θ, δ(x))|X = x} ≥ E {L(θ, δπ(x))|X = x} a.e., and the result follows by taking the expectation of both sides.

10 Corollary 2.9.1. Suppose the assumptions of Theorem 2.9 hold. If L(θ, d) = (d − g(θ))2, then δπ(x) = E(g(θ)|x), which is the mean of the posterior distribution.

Proof. By Theorem 2.9, the Bayes estimator is obtained by minimizing © ª E (g(θ) − δ(x))2|X = x . (2.3)

Since (2.3) is expanded as

(δ(x) − E(g(θ)|x))2 − (E(g(θ)|x))2 + E(g(θ)2|x), it is minimized by δ(x) = E(g(θ)|x).

Next we extend the above definition of the Bayes estimator to the case where π is R a measure satisfying π(dθ) = ∞, a so-called improper prior distribution. In the case where (2.2) is finite for each x, the Bayes estimator can formally be defined.

Definition 2.10. An estimator δπ is a generalized Bayes estimator with respect to a measure π(θ)(even if it is not a proper probability distribution) if the posterior expected loss, E {L(θ, δ(x))|X = x}, is minimized at δ = δπ for all x.

Therefore a class of generalized Bayes estimators coincides with direct sum of a class of proper Bayes estimators and a class of improper Bayes estimators. There is another useful variant of a Bayes estimator, a limit of Bayes estimators.

Definition 2.11. An estimator δ is a limit of Bayes estimators if there exists a sequence

πn πn of proper prior distributions πn and Bayes estimators δ such that δ → δ a.e. as n → ∞.

By using the property of a limit of Bayes estimators, a sufficient condition for minimaxity of a certain estimator δ is given as follows.

Lemma 2.12. Suppose that {πn} is a sequence of proper prior distributions and δn is the Bayes estimator with respect to πn. If

sup R(θ, δ) = lim r(πn, δn), θ n→∞ then δ is minimax.

11 Proof. Suppose that δ0 is any other estimator. Then we have Z 0 0 sup R(θ, δ ) ≥ R(θ, δ )dπn(θ) ≥ r(πn, δn), θ and this holds for every n. Hence,

sup R(θ, δ0) ≥ sup R(θ, δ) θ θ and δ is minimax.

In the case where at least one minimax estimator is derived, a following idea of unbi- ased estimator of risk, often enables us to show that the other estimators are minimax.

Definition 2.13. For a given estimator δ(X) and its risk R(θ, δ), if there exists a statistic Rˆ(δ(X)), which is not depend on θ and satisfies

ˆ R(θ, δ) = Eθ(R(δ(X))),

Rˆ(δ(X)) is called the unbiased estimator of risk.

Lemma 2.14. Suppose that δ0(X) with the unbiased estimator of risk, Rˆ(δ0(X)), is minimax and that δ(X) with the unbiased estimator of risk, Rˆ(δ(X)), is any other estimator.

1. If Rˆ(δ(x)) ≤ Rˆ(δ0(x)) for any x, δ(X) is minimax.

2. If Rˆ(δ(x)) < Rˆ(δ0(x)) for any x, δ0(X) is inadmissible.

Proof. To prove part 1, by taking the expectation of both sides of Rˆ(δ(x)) ≤ Rˆ(δ0(x)), we have R(θ, δ) ≤ R(θ, δ0) for any θ. Since δ0 is minimax, δ is also minimax. Part 2 follows immediately from the definition of inadmissibility.

The concept of unbiased estimator of risk leads to a criterion of optimality which is weaker than admissibility.

12 Definition 2.15. A estimator δ(X) with the unbiased estimator of risk, Rˆ(δ(X)), is said to be permissible if the inequality Rˆ(δ(x)) ≥ Rˆ(δ0(x)) is valid for some δ0 for all values of x, implies the equality for all x.

If the estimator δ is not permissible, then δ is inadmissible in the original problem. A permissible estimator is not always admissible. The notion of permissibility is not dealt with in standard texts of mathematical statistics. See Brown(1988) and Rukhin(1995) for details and more discussion.

13 Chapter 3

Estimation of a multivariate normal mean

3.1 Inadmissibility of the usual estimator

Let X be a random variable having p-variate normal distribution Np(θ, Ip) and pθ(x) denote its density function. Then we consider the problem of estimating the mean vector θ by δ(X) relative to the quadratic loss function kδ(X) − θk2. Therefore every estimator is evaluated on the risk function Z £ 2¤ 2 R(θ, δ) = Eθ kδ(X) − θk = kδ(x) − θk pθ(x)dx. Rp The usual estimator of θ is X, which is a uniformly minimum variance unbiased and maximal likelihood estimator. This estimation problem is invariant under the transfor- mation ΓX +d,Γθ+d for any orthogonal matrix Γ and any vector d when the estimator δ(X) satisfies the equivariance δ(ΓX + d) = Γδ(X) + d, which implies δ(X) = X + d for vector d. Hence we easily see that X is the best among this class of equivariant estima- tors. Moreover the risk of X is clearly constant, that is, R(θ, X) = p. Following results concerning the minimaxity and admissibility of X are well-known and fundamental.

Theorem 3.1. 1. The estimator X is minimax.

2. For p ≤ 2, X is admissible.

14 3. For p ≥ 3, X is inadmissible.

In this section, we prove part 1 and 3. We will prove part 2 in the next section. Part 1 and 3 of this theorem imply that for p ≥ 3, if δ dominates X, then δ is also minimax and, thus, definitely preferred.

proof of part1. For the prior distribution of θ, we select p-variate normal distribution

Np(0, mIp). The joint density function of θ and X is then given by µ ¶ µ ¶ kx − θk2 kθk2 f(x, θ) ∝ exp − exp − 2 2m µ ¶ µ ¶ m + 1 kθ − mx/(m + 1)k2 kxk2 = exp − exp − . m 2 2(m + 1)

Since the posterior density is the joint density divided by the marginal density of X, −1 we see that posterior distribution of θ|X = x is Np(mx/(m + 1), m(m + 1) Ip). Under

the quadratic loss function, the Bayes estimator is δm(X) = mX/(m + 1), by Corollary

2.9.1. The risk of δm(X) is

£ 2¤ R(θ, δm) = E kδm(X) − θk £ ¤ = E km(m + 1)−1(X − θ) − θ/(m + 1)k2 = (m/(m + 1))2 p + (m + 1)−2kθk2,

so that the Bayes risk is

2 −2 r(πm, δm) = (m/(m + 1)) p + (m + 1) pm,

which approaches the value p or the risk of X, when m tends to infinity. Lemma 2.12 guarantees the minimaxity of X.

Next by proposing a class of estimators dominating X, we prove part 3. The inadmis- sibility of X for p ≥ 3 was first presented by Stein(1956) and surprised many statis- ticians because this means that a usual estimator is inadmissible in the framework of the simultaneous estimation of several parameters, although the components of the

15 estimator are separately admissible to estimate the corresponding one-dimensional pa- rameters. Although Stein(1956)’s proof of inadmissibility was non-constructive, James and Stein(1961) succeeded in giving an explicit form of an estimator improving on X as µ ¶ p − 2 δJS = 1 − X, kXk2 which is called the James-Stein estimator. Their method of proof of the Stein phe- nomenon utilizes the fact that a non-central chi square distribution is represented by a Poisson mixture of a central chi square distribution. In this thesis, a more simple and powerful method derived by Stein(1973) is introduced. The following relation which is called the Stein identity is essential.

Lemma 3.2 (Stein(1973)). If Y ∼ N(µ, 1) and g is any differentiable function such that E[|g0(Y )|] < ∞, then

E[g(Y )(Y − µ)] = E[g0(Y )]. (3.1)

0 proof. We write p0(y) for the standard normal density, with derivative p0(y) = −yp0(y). The right-hand side of the equation (3.1) is expressed as

Z 0 Z ∞ 0 0 0 E[g (Y )] = g (y)p0(y − µ)dy + g (y)p0(y − µ)dy. (3.2) −∞ 0

R ∞ By using the relation p0(y − µ) = y (z − µ)p0(z − µ)dz, the second term in the right- hand side of the equation (3.2) is written as

Z ∞ Z ∞ µZ ∞ ¶ 0 0 g (y)p0(y − µ)dy = g (y) (z − µ)p0(z − µ)dz dy 0 0 y Z ∞ µZ ∞ ¶ 0 = (z − µ)p0(z − µ) g (y)I[y≤z]dy dz 0 0 Z ∞ = (z − µ)p0(z − µ)(g(z) − g(0))dz, (3.3) 0

16 where I[y≤z] is a indicator function, that is, 1 if y ≤ z, 0 otherwise. Fubini’s Theorem ensures the second equality in (3.3). Similarly we have

Z 0 Z 0 0 g (y)p0(y − µ)dy = (z − µ)p0(z − µ)(g(z) − g(0))dz. (3.4) −∞ −∞ Combining (3.3) and (3.4), we have Z ∞ 0 E[g (Y )] = (z − µ)p0(z − µ)(g(z) − g(0))dz −∞ = E[(Y − µ)g(Y )].

Using this identity (3.1), we can write the risk function of the estimator of the form δ(X) = X + g(X) as

£ 2¤ R(θ, δ) = Eθ kδ(X) − θk Xp £ 2¤ £ 2¤ 0 = Eθ kX − θk + Eθ kg(X)k + 2 Eθ [(X − θ) g(X)] i=1 p · ¸ £ ¤ X ∂ = p + E kg(X)k2 + 2 E g (X) , θ θ ∂x i i=1 i where we assume that g(x) = {gi(x)} is differentiable and E|(∂/∂xi)gi(X)| < ∞ for i = 1, . . . , p. In this case, the unbiased estimator of risk is

p X ∂ p + kg(X)k2 + 2 g (X) (3.5) ∂x i i=1 i and by Theorem 2.14 we have a sufficient condition for minimaxity.

Theorem 3.3 (Stein(1973)). For the estimator of the form X + g(X), assume that g(x) is given by solutions of the following partial differential inequality

p X ∂ kg(x)k2 + 2 g (x) ≤ 0. (3.6) ∂x i i=1 i Then X + g(X) is minimax.

17 To make the structure of a minimax estimator comprehensible, we consider an esti- mator of the form of

2 2 δφ(X) = (1 − φ(kXk )/kXk )X, (3.7) which is equivariant with respect to the orthogonal transformation ΓX and Γθ for orthogonal matrix Γ. Assigning g(x) = −φ(kxk2)x/kxk2 in (3.6), we have the following result.

Corollary 3.3.1 (Efron and Morris(1976)). For the estimator of the form (3.7), assume that φ(w) is given by solutions of the following differential inequality

φ(w) (2(p − 2) − φ(w)) /w + 4φ0(w) ≥ 0, (3.8) which is, for example, satisfied by

(A1) φ(w) is monotone nondecreasing.

(A2) 0 ≤ φ(w) ≤ 2(p − 2) for every w ≥ 0.

Then δφ is minimax.

Thus a class of the estimators better than X is constructed. Since the inequality (3.8) are satisfied by φ(w) = p − 2, the James-Stein estimator δJS is included in this class. It is noted that the sufficient condition for minimaxity, (A1) and (A2), had been derived by Baranchik(1970) before Stein(1973).

Remark 3.4. Although the monotonity of φ which is required by (A1) is very compre- hensible and is easily checked by differentiating φ(w), the differential inequality (3.8) allows φ(w) to decrease. Indeed (3.8) is equivalent to that

wp/2−1φ(w) (3.9) 2(p − 2) − φ(w)

18 is nondecreasing, as shown in Efron and Morris(1976). Hence considering the case where (3.9) equals a nonnegative constant c, we have an estimator which has constant 2 2 risk p, δd(X) = (1 − φd(kXk )/kXk )X where 2c(p − 2) φ (w) = . d c + wp/2−1

Notice that φd(w) is strictly monotone decreasing in w and that c = 0 corresponds to the usual estimator X while c = ∞ corresponds to the James-Stein estimator (1 − 2 2(p − 2)/kXk )X. These facts on φd(w) were first pointed out by DasGupta and Strawderman(1997).

When kxk2 < p−2, the James-Stein estimator yields an over-shrinkage and changes

the sign of each xi. For eliminating this drawback, the positive-part James-Stein esti- mator µ ¶ p − 2 δJS = max 0, 1 − X + kXk2 is considered and it dominates the James-Stein estimator as shown in Baranchik(1964). risk

6

5

4 the usual estimator

3 the James-Stein estimator

2 the James-Stein positive-part estimator

1 0 2 4 6 8 10 | |

JS JS Figure 3.1 : Comparison of the risks of X, δ and δ+

19 JS JS Figure 3.1 gives a comparison of the respective risks of X, δ and δ+ , for p = 6. Furthermore the complete class theorem, which will be introduced in the next section, implies that the positive-part James-Stein estimator is not analytic and thus is inad- missible. Since we are interested in proposing a class of admissible minimax estimators, we will review, in the next section, some results of complete class theorem, admissibility and permissibility in this problem.

3.2 Admissibility and permissibility

3.2.1 Preparations

First we briefly review a generalized Bayes estimator since it plays an important role in this section. Let F be a nonnegative σ-finite Borel measure on Rp such that f(x) = R p pθ(x)F (dθ) < ∞ for almost all x in R . Then the generalized Bayes estimator with respect to F exists and is denoted by δF . The estimator δF is given by R θpθ(X)F (dθ) δF (X) = R . pθ(X)F (dθ)

It is easy to see, by differentiating under integral sign that we can write δF (X) = ∇f(X)/f(X) + X. The basic tool for our study is the necessary and sufficient condition for admissibility due to Farrell(1968). Although his original result includes many regularity conditions, the result for our estimation problem, can be simplified as follows.

Theorem 3.5 (Farrell(1968)). δ is admissible if and only if there exists a sequence of finite measures {Gn} satisfying

1. {Gn} has compact support,

2. Gn({θ : kθk ≤ 1}) ≥ 1 for all n and Z

lim (R(θ, δ) − R(θ, δG )) Gn(dθ) = 0. (3.10) n→∞ n

20 proof. Since the necessity of a method of proof is quite deep, we omit it in this thesis. We shall prove the sufficiency below. Suppose δ is not admissible. Then there is an estimator δ0 such that R(θ, δ0) ≤ R(θ, δ) and Z kδ0(x) − δ(x)kdx > 0. (3.11)

We define δ00 by δ00 = (δ + δ0)/2. Then, using Jensen’s inequality and (3.11), we have Z 00 00 2 R(θ, δ ) = kδ (x) − θk pθ(x)dx µZ Z ¶ 1 < kδ(x) − θk2p (x)dx + kδ0(x) − θk2p (x)dx 2 θ θ = (R(θ, δ) + R(θ, δ0)) /2 ≤ R(θ, δ).

R(θ, δ00) and R(θ, δ) are both continuous functions. Hence the inequality above yields 00 the existence of an ² > 0 such that R(θ, δ ) < R(θ, δ) − ² for kθk ≤ 1. Hence if Gn satisfies Gn({θ : kθk ≤ 1}) ≥ 1, we have Z Z 00 (R(θ, δ) − R(θ, δGn )) Gn(dθ) ≥ (R(θ, δ) − R(θ, δ )) Gn(dθ) Z 00 ≥ (R(θ, δ) − R(θ, δ )) Gn(dθ) kθk≤1 ≥ ².

This contradicts (3.10). It follows that if (3.10) is satisfied, δ is admissible.

If u : Rp → R, we shall say that u is piecewise differentiable if there exists a countable p ∞ ¯ ¯ collection of disjoint open set {Oi} such that R = ∪i=1Oi ( Oi is the closure of Oi) and u is continuously differentiable in each Oi.

21 3.2.2 The necessary condition for the admissibility

Interchanging the order of integration we have, as in James and Stein(1961), for any estimator δ Z ZZ ¡ 2 2¢ (R(θ, δ) − R(θ, δGn ))Gn(dθ) = kδ − θk − kδGn − θk pθ(x)Gn(dθ) Z µ Z 2 = kδ(x)k gn(x) − 2δ(x) · θpθ(x)Gn(dθ) Z ¶ 2 −kδGn (x)k gn(x) + 2δGn (x) · θpθ(x)Gn(dθ) dx Z 2 = kδ(x) − δGn (x)k gn(x)dx. (3.12)

Using a continuity theorem for Laplace transforms shown in Brown(1971) and (3.12), we have the following result, what is called, Brown’s complete class theorem.

Theorem 3.6 (Brown(1971)). If δ is admissible, there is a nonnegative measure F such that f(x) < ∞ for all x and such that δ(X) = δF (X) a.e. (dx). proof. If {Gn} is a sequence satisfying part 2 of Theorem 3.5 then we have Z −p/2 2 2 gn(x) ≥ (2π) exp(−kxk /2 + θ · x − kθk /2)Gn(dθ) kθk≤1 Z ≥ (2π)−p/2 exp(−kxk2/2 − kxk) exp(−kθk2/2)dθ kθk≤1 = C exp(−kxk2/2 − kxk), (3.13) for some constant C(> 0). Hence (3.12) and (3.13) imply that ∇gn(x)/gn(x) converges to δ(x) − x in measure (dx) on each compact set in Rp. Thus there is a subsequence {n0} such that

k∇gn0 (x)/gn0 (x)k → kδ(x) − xk < ∞ a.e. (dx).

2 Defining µn(dθ) = exp(−kθk /2)Gn(dθ) and the multivariate Laplace transformµ ˜i by Z

µ˜i(x) = exp(x · θ)µi(dθ)

22 p for x ∈ R , the above implies that ∇µ˜n0 (x)/µ˜n0 (x) converges to δ(x) a.e. (dx). Thus, using a continuity theorem for multivariate Laplace transform shown in Brown(1971), 00 0 there is a measure µ0(dθ) and a subsequence {n } of {n } such that ∇µ˜n00 (x)/µ˜n00 (x) → p ∇µ˜0(x)/µ˜0(x) for all x ∈ R . It follows that ∇µ˜0(x)/µ˜0(x) = δ(x) a.e. (dx). Defining R 2 F (dθ) = exp(kθk /2)µ0(dθ), we have f(x) = pθ(x)F (dθ) < ∞ and x + ∇f(x)/f(x) =

∇µ˜0(x)/µ˜0(x) and hence δ(x) = δF (x) a.e. (dx). This completes the proof.

3.2.3 The sufficient condition for admissibility

As shown in Theorem 3.6, admissible estimators are generalized Bayes. Therefore, our study of admissibility of estimators can be confined to generalized Bayes estimators.

The main aim of this subsection is to obtain the sufficient conditions on f(x) for δF to be admissible. Throughout the rest of this subsection we assume that F is a fixed nonnegative σ-finite Borel measure with unbounded support. (If the support of F is bounded then F is a finite measure and therefore δF is proper Bayes and hence admissible.) We also assume that the closed convex hull of the support is equal to Rp. For example, this assumption is satisfied if F is spherically symmetric. Moreover we restrict estimators considered in this subsection ones which have bounded risk, because we are interested in minimaxity, that is, risk bounded less than p. A nec- essary and sufficient condition for bounded risk is the following.

Theorem 3.7 (Brown(1971)). There is a constant r such that R(θ, δF ) < r for all θ ∈ Rp if and only if there is a constant B such that

k∇f(x)/f(x)k < B for all x ∈ Rp. (3.14) proof. The “only if” part is omitted. See Theorem 3.3.1 of Brown(1971). We shall prove the sufficiency below. Suppose that (3.14) is satisfied. When we have Z 2 R(θ, δF ) = kx + ∇f(x)/f(x) − θk pθ(x)dx Z ¡ 2 2¢ ≤ 2 kx − θk + k∇f(x)/f(x)k pθ(x)dx ≤ 2p + B2.

23 This completes the proof.

Using the fact that δF (x) = x + ∇f(x)/f(x) and interchanging the order of integration in (3.12), we have, as in Brown(1971), Z Z 2 (R(θ, δF ) − R(θ, δGn ))Gn(dθ) = kδF (x) − δGn (x)k gn(x)dx Z ° ° ° °2 °∇f(x) ∇gn(x)° = ° − ° gn(x)dx f(x) gn(x) Z ° ° °f(x)∇g (x) − g (x)∇f(x)°2 f(x)2 = ° n n ° dx ° f(x)2 ° g (x) Z n k∇h (x)k2 = 4 n f(x)dx h (x) Z n 1/2 2 = k∇hn(x) k f(x)dx, (3.15)

where hn(x) = gn(x)/f(x). This identity plays a crucial role in the rest of this subsec- tion.

We note that by (3.13), gn(x) is greater than exp(−3/2) for kxk ≤ 1. Multiplying

F by a positive constant does not affect the value of R(θ, δF ) − R(θ, δGn ). Without loss of generality, we may thus assume F has been normalized on the unit sphere so that hn(x) ≥ 1 for kxk ≤ 1 for all n. Moreover we have the following result.

Lemma 3.8 (Birnbaum(1955)). If a sequence of {Gn} has a compact support,

lim sup hn(x) = 0 (3.16) r→∞ kxk=r for all n.

Now let J be the class of all nonnegative piecewise differentiable real valued functions j defined Rp satisfying

i) j(x) ≥ 1 for kxk ≤ 1,

ii) limr→∞ supkxk=r j(x) = 0 for all n.

24 By (3.15) it is easy to see that Z Z 2 (R(θ, δF ) − R(θ, δGn ))Gn(dθ) ≥ inf k∇j(x)k f(x)dx (3.17) j∈J for all n. By Theorem 3.5, if δF is admissible Z inf k∇j(x)k2f(x)dx (3.18) j∈J equals to zero. In particular, if (3.18) is positive then δF is inadmissible. The converse, that (3.18) is zero implies δF is admissible, is our goal. Theorem 3.9 (Brown(1971) and Srinivasan(1981)). The bounded risk generalized

Bayes estimator δF is admissible if and only if Z inf k∇j(x)k2f(x)dx = 0. j∈J proof. The “only if” part is trivial. We shall the sufficiency below. The proof involves constructing a sequence of finite measures {Gn} as in Theorem 3.5 and showing that the sequence satisfies (3.10).

Let jr(x) be a sequence of nonnegative functions satisfying  1 for kxk ≤ 1 jr(x) = exp(−2Br) for r ≤ kxk ≤ 2r (3.19)  0 for kxk > 2r and Lf jr(x) = 0 for 1 < kxk < r. Such a sequence exists by Theorem 3.11 in R 2 the next subsection. Let Gr(dθ) = j (θ)F (dθ), gr(x) = pθ(x)Gr(dθ) and ψr(x) = R r jr(θ)pθ(x)F (dθ)/f(x). We shall now prove that Z

(R(θ, δF ) − R(θ, δGr )) Gr(dθ) (3.20) converges to 0 as r → ∞. We can write (3.20) as Z °Z ° ° ¡ ¢ °2 ° 2 2 pθ(x)F (dθ)° ° jr (θ) − ψr (x) (θ − δF (x)) ° gr(x)dx kxk<2r gr(x) Z °Z ° ° °2 ° 2 pθ(x)F (dθ)° ° jr (θ)(θ − δF (x)) ° gr(x)dx. (3.21) kxk≥2r gr(x)

25 We will call the two terms in (3.21) T1 and T2 respectively. We first consider the second

term T2. Since jr(θ) = 0 for kθk > 2r, using Schwartz inequality and (3.14), we have Z Z 2 2 T2 ≤ C kxk jr (θ)pθ(x)F (dθ)dx, (3.22) kxk>2r kθk<2r for some constant C. Now let D = {θ : kθk ≤ r} and E = {x : kxk > 3r}. It is R easy to see that (3.14) implies exp(−Bkθk)F (dθ) = B1 < ∞. Hence using maximum modulus principle on jr(θ) shown in Srinivasan(1981), we have Z Z Z 2 2 2 ¡ 2 2 ¢ kxk jr (θ)pθ(x)F (dθ)dx ≤ B1 exp(Br) kxk exp −(kxk − r) /2 dx. kxk>2r D kxk>2r (3.23)

The right-hand side of (3.23) is easily seen to converge to zero as r → ∞. By a similar argument we can show that Z Z 2 2 lim kxk jr (θ)pθ(x)F (dθ) = 0. (3.24) r→∞ E kθk≤2r

Moreover from the fact that jr(θ) = exp(−2Br) for r < kθk < 2r, we have Z Z 2 2 lim kxk jr (θ)pθ(x)F (dθ) = 0. r→∞ 2r

Hence we have proved that T2 → 0 as r → ∞. We will now complete the proof of 2 2 the theorem by showing T1 goes to zero as r → ∞. Expressing (jr (θ) − ψr (x)) as

(jr(θ) − ψr(x))(jr(θ) + ψr(x)) and using Schwartz inequality we have Z µZ ¶ 2 2 T1 ≤ 2 (jr(θ) − ψr(x)) kθ − δF (x)k pθ(x)F (dθ) kxk<2r µZ ¶ ¡ 2 2 ¢ −1 × jr (θ) + ψr (x) pθ(x)F (dθ) gr (x)dx. (3.25)

2 Since the inequality ψr (x)f(x) ≤ gr(x) is satisfied by Schwartz inequality, T1 is evalu-

26 ated as Z Z 2 2 T1 ≤ 4 (jr(θ) − ψr(x)) kθ − δF (x)k pθ(x)F (dθ)dx Zkxk<2r Z 2 2 ≤ 4 (jr(θ) − jr(x)) kθ − δF (x)k pθ(x)F (dθ)dx kZxk<2r Z 2 2 +4 (jr(x) − ψr(x)) kθ − δF (x)k pθ(x)F (dθ)dx. (3.26) kxk<2r

We call the two terms in (3.26) T3 and T4 respectively. By Lemmas A.3, A.4 and (3.14), we have ·Z Z 2 2 T3 ≤ C (jr(θ) − jr(x)) kθ − xk pθ(x)F (dθ)dx Z kxk<2rZ ¸ 2 + (jr(θ) − jr(x)) pθ(x)F (dθ)dx Zkxk<2r 2 ≤ C1 k∇jr(x)k f(x)dx (3.27) where C and C1 are some positive constants. The choice of jr’s imply that (3.27) goes to zero and hence T3 goes to zero as r → ∞. Finally we consider the term T4. Using Schwartz inequality, Lemma A.4 and (3.14), Z Z µZ ¶ kθ − δ (x)k2 T ≤ (j (x) − j (η))2p (x)F (dη) F p (x)F (dθ)dx 4 r r η f(x) θ kxZk<2r Z 2 ≤ C2 (jr(x) − jr(η)) pη(x)F (dη)dx, kxk<2r for some constant C2. Appealing to Lemma A.3, it is easy to see that Z 2 T4 ≤ C3 k∇jr(x)k f(x)dx → 0 as r → ∞. (3.28)

Hence T1 goes to zero as r → ∞ and this completes the proof.

Now we restrict the class of the estimators considered in this subsection, to the class of generalized Bayes estimators with respect to spherically symmetric generalized prior distributions. Namely, for any measure set A ∈ Rp and any orthogonal transformation

27 Γ, F (A) = F (ΓA) is satisfied. In this case, the marginal density function f(x) satisfies f(Γx) = f(x) for any Γ, because we have Z f(Γx) = exp(−kθ − Γxk2/2)F (dθ) Rp Z = exp(−kxk2/2) exp(x0Γ0θ) exp(−kθk2/2)F (dθ) ZRp 2 0 2 = exp(−kxk /2) exp(x η) exp(−kηk /2)FΓ(dη) ZRp = exp(−kxk2/2) exp(x0η) exp(−kηk2/2)F (dη) Rp = f(x),

p where FΓ is a measure such that F (ΓA) = FΓ(A) for any measure set A ∈ R . Therefore f(x) is a function only of kxk2 and we put m(kxk2) = f(x). We have the following result as the corollary of Theorem 3.9.

Corollary 3.9.1. Suppose that F is spherically symmetric. The bounded risk general- ized Bayes estimator δF is admissible if and only if

Z ∞ t−p/2m(t)−1dt = ∞. 1 proof. The necessity part will be proved in Theorem 3.12, where we show that if R ∞ −p/2 −1 1 t m(t) dt < ∞ then δF is not permissible and is thus inadmissible. We shall prove the sufficiency below. We define  1 for kxk2 ≤ 1  R R i −p/2 −1 −1 i −p/2 −1 2 ji(x) = ( 1 t m(t) dt) kxk2 t m(t) dt for 1 < kxk ≤ i  0 for kxk2 > i.

28 A direct computation yields

Z µZ i ¶−2 Z 2 −p/2 −1 2−p 2 −2 k∇ji(x)k f(x)dx = 4 t m(t) dt kxk m(kxk ) f(x)dx 1 {x:1≤kxk2≤i} µZ i ¶−2 Z = 4 t−p/2m(t)−1dt kxk2−pm(kxk2)−1dx 1 {x:1≤kxk2≤i} µZ i ¶−1 = C t−p/2m(t)−1dt 1 for some positive constant C. Therefore we have

Z µZ i ¶−1 2 −p/2 −1 lim k∇ji(x)k f(x)dx = lim t m(t) dt = 0 i→∞ i→∞ 1 and this completes the proof.

By using this theorem, we can prove part 2 of Theorem 3.1.

proof of part 2 of Theorem 3.1. Since the estimator X is the generalized Bayes estima- R p ∞ −p/2 tor with respect to the Lebesque measure on R , m(t) is constant. Clearly 1 t dt diverges for p = 1, 2. This completes the proof.

By Theorem 3.6, it is very important to determine whether a proposed estimator is generalized Bayes or not. We note that every estimator of the form (3.7) is able to be written as

2 X + ∇ log mφ(kXk ), (3.29) where µ Z ¶ 1 w φ(w) mφ(w) = exp − dw , (3.30) 2 ² w for every ² > 0. Because (3.29) takes the same form of the generalized Bayes estimator, Bock(1988) called it a pseudo-Bayes estimator and investigated its properties. If there exists a measure π which satisfies Z 2 2 mφ(kxk ) = exp(−kθ − xk /2)π(dθ), (3.31) Rp

29 the estimator of the form (3.29) is truly generalized Bayes. Generally, it is however quite difficult to find such a p-dimensional measure π. Strawderman and Cohen(1971) derived one-dimensional necessary condition for an estimator of the form (3.29) to be generalized Bayes.

Theorem 3.10 (Strawderman and Cohen(1971)). A necessary condition for an estimator of the form (3.29) to be generalized Bayes, is that there exists a one-dimensional measure ν which satisfies

Z ∞ 2 2 mφ(y ) = exp(−y /2) exp(yη)ν(dη). (3.32) −∞ proof. Since an estimator of the form (3.29) is generalized Bayes, there exists a measure π which satisfies (3.31). Noting that the left-hand side of (3.31) is spherically symmetric, for any orthogonal transformation Γ, we have Z Z 2 2 2 mφ(kxk ) = exp(−kθ − xk /2)π(dθ) = exp(−kθ − Γxk /2)π(dθ). (3.33) Rp Rp

Choosing the orthogonal transformation Γx which carries x into (kxk, 0, ··· , 0), we have

Z ∞ 2 2 mφ(kxk ) = exp(−kxk /2) exp(θ1kxk)ν(dθ1), (3.34) −∞ where Z Z 2 ν(dθ1) = ··· exp(−kθk /2)π(dθ). (3.35) θ2 θp

By letting y = kxk and η = θ1, this completes the proof.

3.2.4 Exterior boundary problem and recurrent diffusions

We observed in the previous subsection, that the following calculus of variation problem Z inf k∇j(x)k2f(x)dx j∈J

30 is crucial to our study of admissibility of δF . Hence finding conditions under which this infimum is not zero is important. We introduce, below an exterior boundary value problem which describes when this infimum is zero.

Let Lf denote the elliptic differential operator given by

Lf u(x) = ∆u(x) + ∇ log f(x) · ∇u(x) (3.36) where u is a twice continuously differential function. We say that Exterior Boundary

Problem for Lf (BP for Lf ) is solvable if there exists a unique bounded solution u0 for the equation Lf u(x) = 0 in the region {x : kxk > 1} satisfying the condition u0(x) = 1 on kxk = 1. Note that the unique bounded solution u0 is identically equal to 1. The following result is well-known. Theorem 3.11. A necessary and sufficient condition for Z inf k∇j(x)k2f(x)dx = 0 j∈J is BP for Lf is solvable. proof. See Meyers and Serrin(1960) and Srinivasan(1981).

The operator Lf u(x) can be considered as the generator of the diffusion with the local covariance 2I and the local mean ∇ log f(x). There exist some bounded solutions of

Lf u(x) = 0 if and only if the diffusion is transient in which case u(x) can be taken to the probability of the reaching the unit ball. Thus the estimator is admissible if and only if the diffusion process determined by Lf u(x) is recurrent. For the usual estimator X, the corresponding diffusion process is Brownian motion, which is known to be recurrent if and only if the dimension p does not exceed two.

Equation Lf u(x) = 0 now means that

∆u(x) = 0 for kxk ≥ 1, i.e. u is a bounded harmonic function outside the unit ball whose determination is a classical Dirichlet boundary problem. Putting u(x) = v(kxk2), one obtains

pv0(y) + 2yv00(y) = 0, for y ≥ 1.

31 All solutions of this equation are v(y) = 1 for all p and ( y1−p/2, for p 6= 2 v(y) = log y + 1 for p = 2.

Hence only for p = 1, 2, the exterior boundary problem for Lf is solvable.

3.2.5 Permissibility

As shown in Chapter 2, permissibility is a weaker criterion of optimality than admis- sibility. It is however noted that if the estimator δ is permissible then it is impossible to find an estimator δ0 whose unbiased estimate of risk is always less than that of δ. In short, if δ is inadmissible, the customary method for finding an estimator improving upon δ cannot succeed.

Suppose that a proposed estimator δφ is the form of (3.29) and that δφ,h(X) = 2 2 X + ∇ log{mφ(kXk )h(kXk )} is a proposed competing estimator. In the following of 2 this subsection, let m = mφ for simplicity. Then substituting g(x) = ∇ log m(kxk ) and g(x) = ∇ log{m(kxk2)h(kXk2)} in (3.5), we have the unbiased estimators of risk ˆ as R(δφ) = p + 4R(δφ) where µ ¶ m0(w) 2 m0(w) m00(w) R(δ ) = −w + p + 2w (3.37) φ m(w) m(w) m(w)

2 ˆ and w = kxk and R(δφ,h) = p + 4R(δφ,h) where µ ¶ (m(w)h(w))0 2 m0(w) h0(w) (m(w)h(w))00 R(δ ) = −w + p + p + 2w φ,h m(w)h(w) m(w) h(w) m(w)h(w) µ ¶ h0(w) 2 h00(w) h0(w) h0(w) m0(w) = R(δ ) − w + 2w + p + 2w . φ h(w) h(w) h(w) h(w) m(w)

The estimator is then permissible if R(δφ,h) ≤ R(δφ) implies h(w) ≡ 1. Now we assume R 1 −p/2 −1 that 0 w m (w)dw = ∞, which is satisfied by reasonable estimators, for example, the generalized Bayes estimator δF such that f(x) < ∞ and the James-Stein estimator. The following result is a special case of Brown(1988)’s result.

32 Theorem 3.12 (Brown(1988)). The estimator of the form (3.29) is permissible if and only if

Z ∞ w−p/2m(w)−1dw = ∞. 1 R ∞ −p/2 −1 proof. We first consider the “only if” part. We assume that 1 w m(w) dw < ∞. p/2 By letting M(w) = w m(w), the difference R(δφ,h) − R(δφ) is written as µ ¶ M 0(w) h0(w) 2 2 h0(w) + 2h00(w) − h(w) . (3.38) M(w) h(w)

Note that the sum of the first and second terms is interpreted as the elliptic differential operator LM h(w). Let h1(w) be a function satisfying ( 1 for w ≤ 1 h (w) = ¡R ¢ R 1 ∞ −1 −1 ∞ −1 1 M (t)dt w M (t)dt for w > 1.

Then we have R(δφ,h1 ) − R(δφ) = 0 for w ≤ 1 and

µ 0 ¶2 h1(w) R(δφ,h1 ) − R(δφ) = −h1(w) ≤ 0 h1(w)

for w > 1 with strict inequality for some w which implies that δφ is not permissible. R ∞ −p/2 −1 Next we shall prove the sufficiency. We assume that 1 w m(w) dw = ∞. By 0 letting g(w) = h (w)M(w)/h(w), the inequality R(δφ,h) − R(δφ) ≤ 0 is written as à µ ¶ ! g0(w) g(w) 2 h(w) 2 + ≤ 0. (3.39) M(w) M(w)

We shall now show that the inequality (3.39) implies that g(w) ≥ 0 for all w. Suppose to the contrary that g(w) < 0 for some w0. Then g(w) < 0 for all w ≥ w0 since 0 g (w) < 0 and for all w > w0, we can write (3.39) as µ ¶ d 1 1 ≥ . dw g(w) 2M(w)

33 ∗ Integrating both sides from w0 to w leads to Z w∗ 1 1 1 −1 ∗ − ≥ M (t)dt. g(w ) g(w0) 2 w0 As w∗ → ∞, the right-hand side of above inequality tends to infinity, and this provides a contradiction since the left-hand side is less than −1/g(w0). Similarly we have g(w) ≤ 0 for all w. It follows that g(w) is zero for all w, which implies that h(w) ≡ 1 for all w. This completes the proof.

By Corollary 3.9.1 and Theorem 3.12, we have the following result.

Proposition 3.13. A bounded risk permissible estimator is admissible if it is general- ized Bayes and inadmissible if not.

It is difficult to find estimators dominating an estimator which is both permissi- ble and inadmissible, because we have to overcome the limitation of the character- istics through the Stein identity. For example, the corresponding m(kxk2) for the James-Stein estimator is kxk2−p and the James-Stein estimator is permissible since R ∞ p/2 1−p/2 −1 1 (t t ) dt = ∞. Finding estimators dominating the James-Stein estimator ex- cept for the positive-part estimator is therefore difficult. In Section 3.4, we deal with the problem of improving upon the James-Stein estimator.

3.3 The class of admissible minimax estimators

3.3.1 Generalized Bayes estimators and their admissibility

In this subsection, we construct a certain class of generalized Bayes estimators and consider their admissibility. All but Lemma 3.17 is due to the hitherto researches, in particular Fourdrinier et al.(1998). We deal with a convenient subclass of spheri- cally symmetric prior distributions, a class of variance mixtures of multivariate normal distributions, whose density is proportional to Z µ ¶ µ ¶ 1 λ p/2 λ exp − kθk2 λ−ah(λ)dλ, (3.40) 0 1 − λ 2(1 − λ)

34 where h(λ) is a measurable positive function on (0, 1) and we assume that limλ→0 h(λ) = c1 > 0. This prior distribution is interpreted as the following two-stage prior distribu- −1 tion. For a fixed value of λ, let the θ be according to N(0, λ (1 − λ)Ip). In addition, suppose that λ itself is a random variable, Λ, with distribution Λ ∼ λ−ah(λ). Hence this R 1 −a prior distribution is proper provided 0 λ h(λ)dλ is finite. By using Fubini’s theorem for positive functions, the marginal density function of X is calculated as

Z 1 Z µ ¶p/2 λ −a fh(x) ∝ λ h(λ) p 1 − λ 0 Rµ ¶ kx − θk2 λ × exp − − kθk2 dθdλ 2 2(1 − λ) Z Z µ ¶ µ ¶ 1 λ p/2 λ = exp − kxk2 p 1 − λ 2 0 Rµ ¶ kθ − (1 − λ)xk2 × exp − λ−ah(λ)dθdλ 2(1 − λ) Z µ ¶ 1 λ ∝ exp − kxk2 λp/2−ah(λ)dλ. (3.41) 0 2 Lebesque’s dominated convergence theorem ensures that differentiating under the inte- gral sign is valid. Hence we obtain

µZ 1 ¶ p/2+1−a 2 ∇fh(x) = − λ exp(−kxk λ/2)h(λ)dλ x (3.42) 0 and

µZ 1 ¶ p/2+2−a 2 2 ∆fh(x) = λ h(λ) exp(−kxk λ/2)dλ kxk 0 µZ 1 ¶ −p λp/2+1−ah(λ) exp(−kxk2λ/2)dλ . (3.43) 0

Noting that the generalized Bayes estimator is written as X + ∇ log fh(X), we see that fh(x) exists for all x and the generalized Bayes estimator is well defined if and only if R 1 p/2−a 0 λ h(λ)dλ < ∞, (3.44)

35 which requires that a < p/2 + 1. Therefore we have the generalized Bayes estimator à R ! 1 p/2+1−a 2 0 λ h(λ) exp(−kXk λ/2)dλ δh(X) = 1 − R X. (3.45) 1 p/2−a 2 0 λ h(λ) exp(−kXk λ/2)dλ It is noted that by putting R 1 p/2+1−a 0 λ h(λ) exp(−wλ/2)dλ φh(w) = w R , (3.46) 1 p/2−a 0 λ h(λ) exp(−wλ/2)dλ

2 2 δh(X) is represented as (1 − φh(kXk )/kXk )X.

Concerning finiteness of the risk of δh(X), we have the following result.

Lemma 3.14 (Fourdrinier et al.(1998)). The generalized Bayes estimator δh(X) has finite risk. proof. As the risk of the estimator X is finite (constant risk p), by a straight application of Schwarz’s inequality the risk of the estimator is evaluated as

2 R(θ, δh) = E[kX + ∇ log fh(X) − θk ] 2 2 = E[kX − θk ] + E[k∇ log fh(X)k ] 0 +2E[(X − θ) log ∇fh(X)] 2 ≤ p + E[k∇ log fh(X)k ] ¡ 2 2 ¢1/2 +2 E[kX − θk ]E[k∇ log fh(X)k ] .

2 Hence the risk is finite if and only if E[k∇ log fh(X)k ] < ∞. For any x, we have

2 2 k∇fh(x)k k∇ log fh(x)k = 2 fh (x) ÃR !2 1 λp/2+1−ah(λ) exp(−kxk2λ/2)dλ = kxk2 R0 1 p/2−a 2 0 λ h(λ) exp(−kxk λ/2)dλ ≤ kxk2,

2 2 2 so that E[k∇ log fh(X)k ] ≤ E[kXk ] ≤ p + kθk < ∞. This completes the proof.

36 Now the admissibility of δh(X) is considered. By Theorem 3.9.1, we have only to 2 investigate the behavior of fh(x) in the case where kxk tends to infinity. A following Tauberian theorem gives a nice technique for relating the tail behavior of a function and its Laplace transform.

Theorem 3.15 (Tauberian Theorem). For the Laplace transform

Z ∞ f(s) = exp(−st)g(t)dt, 0 if we have g(t) ∼ tγ as t → +0, then f(s) ∼ s−γ−1Γ(γ + 1) as s → ∞.

Using this theorem, we can evaluate fh(x) as

Z 1 ¡ 2 ¢ p/2−a fh(x) = exp −λkxk /2 λ h(λ)dλ 0 Z ∞ p/2−a+1 ¡ 2 ¢ p/2−a = 2 exp −kxk t t h(2t)I(0,1/2)(t)dt, 0 p/2−a+1 −2(p/2−a+1) ∼ c12 Γ(p/2 − a + 1)kxk . (3.47)

2 Putting mh(kxk ) = fh(x), we have

−p/2 −1 −1 −a+1 t mh(t) ∼ C t

p/2−a+1 as t → ∞, where C = c12 Γ(p/2 − a + 1). Therefore the integral Z ∞ −p/2 −1 t mh(t) dt 1 diverges if and only if a ≤ 2. Hence we have the following result.

Theorem 3.16 (Fourdrinier et al.(1998)). δh(X) is admissible if and only if a ≤ 2.

Moreover using Theorem 3.15, we can investigate the limitation of φh(w) in the case w → ∞. Similarly to (3.47), we can evaluate as

Z 1 ¡ 2 ¢ p/2−a+1 p/2−a+2 −2(p/2−a+2) exp −λkxk /2 λ h(λ)dλ ∼ c12 Γ(p/2 − a + 2) kxk , 0

37 which implies that when w tends to infinity, we have

p/2−a+2 c12 Γ(p/2 − a + 2) 1−(p/2−a+2)+(p/2−a+1) φh(w) ∼ p/2−a+1 w c12 Γ(p/2 − a + 1) = 2(p/2 − a + 1).

Hence we have the following result.

Lemma 3.17. limw→∞ φh(w) = p − 2a + 2, which is not depend upon h(λ).

3.3.2 The construction of minimax Bayes estimator

In this subsection and next two subsections, we consider a class of minimax general- ized Bayes estimators. If we are interested in deriving admissible minimax estimators, by Theorem 3.16, we have only to restrict the following results concerning minimax generalized Bayes estimators to the case of a ≤ 2. In this subsection, we assume that

lim h(λ) = c2(≥ 0), (3.48) λ→1 which implies that h(λ) is bounded. Before considering the minimaxity of δh, we inves- tigate the properties of the behavior of φh(w) with bounded h(λ).

Theorem 3.18. 1. Assume that λh0(λ)/h(λ) with bounded h(λ) is monotone non-

increasing. Then φh(w) is monotone increasing.

0 2. Assume that λh (λ)/h(λ) with bounded h(λ) is monotone increasing. Then φh(w) is increasing from the origin to a certain point and is decreasing from the point.

It is noted that the assumption of part 1 implies h0(λ) ≤ 0 and the assumption of part 2 implies h0(λ) ≥ 0 because the value λh0(λ)/h(λ) on λ = 0 is 0. proof. Applying an integration by parts to the numerator on right-hand side of (3.46),

38 we have Z 1 2 £ ¤1 λp/2+1−ah(λ) exp(−wλ/2)dλ = − λp/2+1−ah(λ) exp(−wλ/2) w 0 0 Z p + 2 − 2a 1 + λp/2−ah(λ) exp(−wλ/2)dλ w Z 0 2 1 + λp/2+1−ah0(λ) exp(−wλ/2)dλ. (3.49) w 0 It is noted that the assumption (3.48) guarantees this integration by parts. By (3.49),

φh(w) is written as R 1 p/2−a+1 0 0 λ h (λ) exp(−wλ/2)dλ φh(w) = 2(p/2 − a + 1) + 2 R 1 p/2−a 0 λ h(λ) exp(−wλ/2)dλ 2c exp(−w/2) −R 2 . 1 p/2−a 0 λ h(λ) exp(−wλ/2)dλ Putting µZ 1 ¶ p/2−a+1 0 ϕ1(w) = exp(w/2) λ h (λ) exp(−wλ/2)dλ − c2 exp(−w/2) , 0

µZ 1 ¶ p/2−a ϕ2(w) = exp(w/2) λ h(λ) exp(−wλ/2)dλ , 0

ϕ(w) = ϕ1(w)/ϕ2(w) and R 1 ϕ0 (w) λp/2−a+1(1 − λ)h0(λ) exp(w(1 − λ)/2)dλ ψ(w) = 1 = 0R , ϕ0 (w) 1 p/2−a 2 0 λ (1 − λ)h(λ) exp(w(1 − λ)/2)dλ

we have φh(w) = p − 2a + 2 + 2ϕ(w) and

0 0 −1 ϕ (w) = ϕ2(w)(ϕ2(w)) (ψ(w) − ϕ(w)) . (3.50) Note that ϕ(0) = −p/2 + a − 1 and R 1 λp/2−a+1(1 − λ)h0(λ)dλ ψ(0) = 0R 1 λp/2−a(1 − λ)h(λ)dλ 0 R 1 λp/2−a+1h(λ)dλ = −p/2 + a − 1 + R 0 , 1 p/2−a 0 λ (1 − λ)h(λ)dλ

39 which implies that ϕ(0) < ψ(0). Moreover we have

lim ϕ(w) = lim ψ(w) = 0, (3.51) w→∞ w→∞ which is shown by Tauberian’s theorem. First we prove part 1. By Lemma 3.19 in the below, ψ(w) is strictly increasing in w from ψ(0)(< 0) to ψ(∞) = 0. If we had the point w0 which satisfies ϕ(w0) = ψ(w0), 0 0 it is necessary to satisfy ϕ (w0) ≥ ψ (w0) > 0. This however contradicts the equation (3.50). Therefore for every w ≥ 0, we have ϕ(w) < ψ(w), that is, ϕ(w) is increasing. Next we prove part 2. By Lemma 3.19, ψ(w) is strictly decreasing in w from

ψ(0)(> 0) to ψ(∞) = 0. Note that for sufficiently large w, ϕ1(w) is positive. If the inequality ϕ(w) ≤ ψ(w) were satisfied for every w ≥ 0, by (3.51) we have ϕ(w) ≤ 0 for every w ≥ 0. This contradicts the positivity of ϕ1(w) for large w. Hence there exists w1 which satisfies ϕ(w1) > ψ(w1). By decreasingness of ψ(w), we have the only one point w2 which is strictly less than w1 and satisfies ϕ(w2) = ψ(w2). Clearly for the point w which satisfies 0 < w < w2, we have ϕ(w) < ψ(w). For the point w which satisfies w > w2, similarly to the proof of part 1, we easily show that ϕ(w) > ψ(w). This completes the proof.

A following lemma is known as a correlation inequality.

Lemma 3.19 (correlation inequality). Let Y be a random variable, and g(y) and h(y) any functions for which E[g(Y )], E[h(Y )] and E[g(Y )h(Y )] exist. Then:

1. If one of the functions g(·) and h(·) is nonincreasing and the other is nondecreas- ing, then

E[g(Y )]E[h(Y )] ≥ E[g(Y )h(Y )].

2. If both functions are either nondecreasing or nonincreasing, then

E[g(Y )]E[h(Y )] ≤ E[g(Y )h(Y )].

40 In the case of both part 1 and 2, the equation is satisfied if and only if either g(·) or h(·) is constant or Y has a degenerating distribution.

Combining (A1) and (A2) of Corollary 3.3.1, Lemma 3.17 and part 1 of Theorem 3.18, we have the following result which had been derived by Faith(1978).

Proposition 3.20 (Faith(1978)). Assume that h(λ) is a nonnegative bounded func- tion such that λh0(λ)/h(λ) is monotone nonincreasing. Then the generalized Bayes

estimator δh with 3 − 2/p ≤ a < 1 + p/2 is minimax and φh(w) is monotone increasing.

The minimax condition (A1) and (A2) of Corollary 3.3.1 is however included in the sufficient condition for minimaxity represented by the differential inequality (3.8), which allows φ(w) to decrease with increasing w. Therefore by using (3.8), which is equivalent to (3.6), a larger class of minimax generalized Bayes estimators is able to be constructed as follows.

Theorem 3.21. Let h(λ) be a bounded nonnegative function such that λh0(λ)/h(λ) can be decomposed as h1(λ) + h2(λ) where h1(λ) ≤ H1 and is nondecreasing while

0 ≤ h2(λ) ≤ H2 with H1 + 2H2 ≤ p/2 + a − 3. Then the generalized Bayes estimator δh is minimax.

Needless to say, this theorem includes Proposition 3.20. We note that this theorem is equivalent to the result by Fourdrinier et al.(1998). For the proof of this theorem, the integration by parts (3.49) is essential. Although Fourdrinier et al.(1998) admitted (3.49) unconditionally, the boundedness of h(λ) is indispensable for it. proof. By substituting g(x) = ∇ log fh(x) in (3.6), the sufficient condition for minimax- ity is equivalent to

∆f (x) 1 k∇f (x)k h − h ≤ 0. k∇fh(x)k 2 fh(x)

41 By (3.42) and (3.43), we rewrite the above inequality as R 1 λp/2+2−ah(λ) exp(−kxk2λ/2)dλ p − kxk2 R0 1 p/2+1−a 2 0 λ h(λ) exp(−kxk λ/2)dλ R 1 kxk2 λp/2+1−ah(λ) exp(−kxk2λ/2)dλ + R0 ≥ 0. (3.52) 2 1 p/2−a 2 0 λ h(λ) exp(−kxk λ/2)dλ We put s = kxk2/2 and apply an integration by parts in the numerators of second and third term on the left-hand side of (3.52), similarly to (3.49). Therefore (3.52) becomes " # 2c exp(−s) c exp(−s) a + p/2 − 3 + R 2 − R 2 1 λp/2+1−ah(λ) exp(−sλ)dλ 1 λp/2−ah(λ) exp(−sλ)dλ R 0 R 0 1 λp/2+2−ah0(λ) exp(−sλ)dλ 1 λp/2+1−ah0(λ) exp(−sλ)dλ − 2 R0 + 0R ≥ 0. (3.53) 1 p/2+1−a 1 p/2−a 0 λ h(λ) exp(−sλ)dλ 0 λ h(λ) exp(−sλ)dλ Clearly the term in bracket in (3.53) is nonnegative since the denominator of the second term is larger than that of the first. Putting, for fixed s,

λkg(λ) exp(−sλ) fk(λ) = R , 1 k 0 λ g(λ) exp(−sλ)dλ we have a sufficient condition for minimaxity of the Bayes estimator δh as µ ¶ µ ¶ λh0(λ) λh0(λ) a + p/2 − 3 − 2E + E ≥ 0, (3.54) p/2+1−a h(λ) p/2 h(λ)

0 where Ek denotes expectation with respect to the density fk(λ). Let λh (λ)/h(λ) = h1(λ) + h2(λ) where h1 ≤ H1 and is nonincreasing and 0 ≤ h2 ≤ H2. Noting that h1 is nonincreasing and bounded by H1, by the correlation inequality we have

2Ep/2+1−a(h1) − Ep/2−a(h1) ≤ Ep/2−a(h1) ≤ H1.

Moreover we have 2Ep/2+1−a(h2) − Ep/2−a(h2) ≤ 2H2. Hence the left-hand side of

(3.54) is bounded below by a + p/2 − 3 − H1 − 2H2, which in turn is nonnegative by the hypothesis of the theorem. This completes the proof.

42 In the case of part 2 of Theorem 3.18, that is, λh0(λ)/h(λ) is monotone nondecreasing,

we have the following by putting h1(λ) = 0 on Theorem 3.21.

Corollary 3.21.1. Assume that h(λ) is a nonnegative bounded function such that λh0(λ)/h(λ) is monotone nondecreasing and that 3 − p/2 < a < 1 + p/2 and that 0 h (1) ≤ c2(a + p/2 − 3)/2. Then the generalized Bayes estimator δh is minimax and

φh(w) has one extremum.

Example 3.22. We consider the case of µ ¶ λ h(λ) = (1 − λ)b exp c + dλ . 1 − λ Fourdrinier et al.(1998) considered the case of d = 0 in the above. Following results are the correct and extended version of Section 4.3 in Fourdrinier et al.(1998). We easily calculate as h0(λ) λ λ λ = −b + c + dλ h(λ) 1 − λ (1 − λ)2 µ ¶ λ λ 2 = −(b − c) + c + dλ. (3.55) 1 − λ 1 − λ If 3 − p/2 ≤ a < p/2 + 1, c ≤ 0, b ≥ c and d ≤ 0, the generalized Bayes estimator is minimax by Proposition 3.20. If 3 − p/2 < a < p/2 + 1, b = c = 0 and 0 < d ≤ (p/2−3+a)/2, the generalized Bayes estimator is minimax by Corollary 3.21.1. Finally we consider the case 3 − p/2 < a < p/2 + 1, c < 0, b < c and d = 0. Putting à µ ¶ ! λ λ 2 h (λ) = −(b − c) + c I (λ) 1 1 − λ 1 − λ [λ/(1−λ)≥(b−c)/(2c)]

and à µ ¶ ! λ λ 2 h (λ) = −(b − c) + c I (λ), 2 1 − λ 1 − λ [λ/(1−λ)<(b−c)/(2c)]

we easily see that h1(λ) is monotone decreasing and h2(λ) is monotone increasing and 2 that h1(λ) and h2(λ) are less than −(b − c) /(4c) for any λ. Therefore by Theorem 3.21, if a + p/2 − 3 + 3(b − c)2/(4c) ≥ 0, the generalized Bayes estimator is minimax.

43 P n bi Example 3.23. We consider the case of h(λ) = i=0 aiλ , where ai > 0 for every i, b0 = 0, bi < bj if i < j and n is either finite or infinite. If the termwise differenti- Pn ation is allowed and i=0 aibi exists (Needless to say, these assumptions are satisfied unconditionally in the case of finite n), we have P n bi 0 i=0 aibiλ λh (λ)/h(λ) = Pn , bi i=0 aiλ which is monotone increasing in λ by Lemma 3.24 in the below. Therefore the gener- Pn Pn alized Bayes estimator is minimax if 3 − p/2 < a < p/2 + 1 and i=0 aibi/ i=0 ai ≤ (a + p/2 − 3)/2.

A following lemma is a generalization of Lehmann(1986)’s. ¡P ¢ ¡P ¢ n bi n bi Lemma 3.24. Let g(y) = i=0 ciy / i=0 aiy where ai and ci are positive for every i, b0 = 0, bi < bj if i < j and n is either finite or infinite. We assume that P P n bi n bi i=0 ciy and i=0 aiy converge for all y > 0 and that the termwise differentiation is allowed. If the sequence ci/ai is monotone increasing(decreasing), then g(y) is monotone increasing(decreasing) in y. proof. We have à ! à ! Xn −2 Xn Xn Xn Xn 0 bi −1 bi bi bi bi g (y) = aiy y aiy biciy − ciy biaiy i=0 i=0 i=0 i=0 i=0 à ! Xn −1 µ ¶ bi −1 = aiy y E({ci/ai}bi) − E(ci/ai)E(bi) , i=0 P bi n bi where E denotes expectation with respect to the probability function aiy / i=0 aiy , i = 0, . . . , n. The correlation inequality guarantees the result.

3.3.3 Alam-Maruyama’s type generalized Bayes estimator

Although Theorem 3.21 is very powerful and easily enables the construction of minimax Bayes estimators, there exist two famous classes of generalized Bayes estimators to

44 which Theorem 3.21 is not able to be applied. One is the generalized Bayes estimator b δh for h(λ) = (1−λ) with −1 < b < 0, which is obviously not bounded and is considered in this subsection. The other is Stein(1973)’s Bayes estimator, which is considered in the next subsection. The generalized Bayes estimator in the case of h(λ) = (1 − λ)b with −1 < b < 0 was originally investigated by Alam(1973) and was developed by Maruyama(1998). As a matter of fact, Alam(1973) considered the prior distribution whose density is proportional to kθk2α with α < 1 − p/2. It is noted that scale mixtures of multivariate normal distributions which have a density (3.40) for h(λ) = (1 − λ)b with a = b + 2 is equivalent to kθk2(b+1−p/2) since we can calculate as Z µ ¶ µ ¶ 1 λ p/2 λ exp − kθk2 λ−a(1 − λ)bdλ 0 1 − λ 2(1 − λ) Z ∞ = tp/2−a(t + 1)a−b−2 exp(−tkθk2/2)dt 0 ∝ kθk2(b+1−p/2).

Results in this subsection are essentially equivalent to Maruyama(1998,2000a)’s al- though these are more simplificated than Maruyama(1998)’s . The generalized Bayes 2 2 estimator is written as δb(X) = (1 − φb(kXk )/kXk )X, where R 1 p/2−a+1 b 0 λ (1 − λ) exp(−wλ/2)dλ φb(w) = w R . 1 p/2−a b 0 λ (1 − λ) exp(−wλ/2)dλ

To investigate the properties of the behavior of φb(w) and the minimaxity of δb(X), the representation through the confluent hypergeometric function

n M(a, b, x) = 1 + ax/b + ··· + (a)nx /(b)nn! + ··· , where (a)n = a · (a + 1) ··· (a + n − 1) for n ≥ 1 and (a)0 = 1 is quite useful. From the formulas (13.1.27), (13.2.1), (13.4.3), (13.4.4) and (13.4.8) given in Abramowitz and Stegun(1964), we have the following relations which will be used in the sequel, Z 1 Γ(b − a)Γ(a)(Γ(b))−1M(a, b, x) = extta−1(1 − t)b−a−1dt, for b > a > 0, (3.56) 0

45 M(a, b, x) = exM(b − a, b, −x), (3.57)

bM(a, b, x) − bM(a − 1, b, x) − xM(a, b + 1, x) = 0, (3.58)

(1 + a − b)M(a, b, x) − aM(a + 1, b, x) + (b − 1)M(a, b − 1, x) = 0 (3.59)

and d a M(a, b, x) = M(a + 1, b + 1, x). (3.60) dx b From (3.56), (3.57) and (3.58), we obtain

p/2 − a + 1 M(b + 1, p/2 − a + b + 3, w/2) φ (w) = w b p/2 − a + b + 2 M(b + 1, p/2 − a + b + 2, w/2) = (p − 2a + 2)(1 + ϕ(w)), where ϕ(w) = −M(b, c, w/2)/M(b + 1, c, w/2) and c = p/2 − a + b + 2. Noting that

− d M(b, c, w/2) b M(b + 1, c + 1, w/2) ψ(w) = dw = − , d b + 1 M(b + 2, c + 1, w/2) dw M(b + 1, c, w/2) is strictly decreasing in w from ψ(0) = −b/(b + 1) to ψ(∞) = 0 by Lemma 3.24, we can prove the following result of the properties of the behavior of φb(w), similarly to part 2 of Theorem 3.18.

Lemma 3.25 (Maruyama(1998)). φb(w) with −1 < b < 0 is increasing from the origin to a certain point and is decreasing from the point.

The result of minimaxity of δb(X) is following.

Theorem 3.26 (Maruyama(1998,2000a)). The generalized Bayes estimator δb(X) with 3 − p/2 < a < 1 + p/2 and b ≥ −(a + p/2 − 3)/(3p/2 + 1 − a) is minimax.

46 proof. Similarly to the proof of Theorem 3.21, the sufficient condition for minimaxity is written as R 1 λp/2−a+2(1 − λ)b exp(−kxk2λ/2)dλ p − kxk2 R0 1 p/2−a+1 b 2 0 λ (1 − λ) exp(−kxk λ/2)dλ R 1 1 λp/2−a+1(1 − λ)b exp(−kxk2λ/2)dλ + kxk2 R0 ≥ 0. (3.61) 2 1 p/2−a b 2 0 λ (1 − λ) exp(−kxk λ/2)dλ Putting s = kxk2/2 and using the relations (3.58) and (3.59), we rewrite the inequality (3.61) as M(b, c, s) M(b, c, s) p/2 + a − 3 − 2b + 2c − (p/2 + 1 − a) ≥ 0. (3.62) M(b + 1, c + 1, s) M(b + 1, c, s) For s which satisfies M(b, c, s) ≥ 0, the left-hand side of (3.62) is bounded below by p/2 + a − 3 − 2b since clearly we have c ≥ p/2 + 1 − a and M(b + 1, c + 1, s)−1 ≥ M(b + 1, c, s)−1. For s which satisfies M(b, c, s) < 0, the left-hand side of (3.62) is bounded below by c + 1 p/2 + a − 3 + (3p/2 + 1 − a)b p/2 + a − 3 − 2b + 2b = , (3.63) b + 1 b + 1 since the inequality

(b + 1)cM(b, c, s) − b(c + 1)M(b + 1, c + 1, s) ≥ 0 is satisfied for −1 < b < 0. The above inequality is shown by verifying that the coefficients of sn for every n ≥ 0 are nonnegative. By the hypothesis of the theorem, the left hand side of (3.62) is nonnegative and this completes the proof.

Remark 3.27. To investigate the properties of the behavior of φb(w) and the mini- maxity of δb(X), the relation (3.58) substitutes for the integration by parts (3.49) in Section 3.3.2 where h(λ) is bounded.

Theorem 3.26 has been already proved by Maruyama(1998), in which he however made a mistake in evaluation of a certain inequality. We provide the correct version of the proof in the following.

47 proof. Putting µ ¶ d S(a, b, w) = 4 φ (w) + φ (w) (2(p − 2) − φ (w)) /w , dw b b b we can write the sufficient condition for minimaxity as S(a, b, w) ≥ 0 for every w ≥ 0. Putting c = p/2 − a + b + 2 and s = w/2 and using the relation (3.60), we obtain µ 2(c − b − 1) S(a, b, w) = (p − c − b − 1)M(b + 1, c + 1, s)M(b + 1, c, s) cM 2(b + 1, c, s) +(c − b − 1)M(b + 1, c + 1, s)M(b, c, s) ¶ +2(b + 1)M(b + 2, c + 1, s)M(b, c, s) . (3.64)

The following calculation is mainly based on the method of Alam(1973). Using the series expansion for the confluent hypergeometric function, we have

M(b + 1, c + 1, s)M(b + 1, c, s) µ ¶ X∞ sn Xn n (b + 1) (b + 1) = γ n−γ n! γ (c + 1) (c) n=0 γ=0 γ n−γ µ ¶ X∞ sn Xn n (b + 1) (b + 1) c = γ n−γ n! γ (c) (c) (c + n − γ) n=0 γ=0 γ n−γ [n/2] µ ¶ · ¸ X∞ sn X n (b + 1) (b + 1) 1 1 = γ n−γ cq + , (3.65) n! γ (c) (c) γ c + γ c + n − γ n=0 γ=0 γ n−γ where [n/2] denotes the largest integer less than or equal to n/2, qγ = 1 for γ < n/2 and qγ = 1/2 for γ = n/2. Similarly to (3.65), we have

M(b + 1, c + 1, s)M(b, c, s) [n/2] µ ¶ X∞ sn X n (b + 1) (b + 1) = γ n−γ cq n! γ (c) (c) γ n=0 γ=0 γ n−γ · ¸ 1 1 ×b + (3.66) (c + γ)(b + n − γ) (c + n − γ)(b + γ)

48 and

M(b + 2, c + 1, s)M(b, c, s) [n/2] µ ¶ X∞ sn X n (b + 1) (b + 1) = γ n−γ cq n! γ (c) (c) γ n=0 γ=0 γ n−γ · ¸ b b + 1 + γ b + 1 + n − γ × + . (3.67) b + 1 (c + γ)(b + n − γ) (c + n − γ)(b + γ) Combining (3.64), (3.65), (3.66) and (3.67), we can see that S(a, b, w) is nonnegative if · ¸ 1 1 T (c, b, n, γ) = (c − b − 1)b + (c + γ)(b + n − γ) (c + n − γ)(b + γ) · ¸ b b + 1 + γ b + 1 + n − γ +2(b + 1) + b + 1 (c + γ)(b + n − γ) (c + n − γ)(b + γ) · ¸ 1 1 +(p − c − b − 1) + (3.68) c + γ c + n − γ is nonnegative for each γ = 0, 1, ··· , [n/2] and for each n = 0, 1, ··· . Clearly n = 0 implies γ = 0 and we have T (c, b, 0, 0) = 2p/c > 0. Thus we deal with the case of n ≥ 1. Note that n ≥ 1 implies n − γ ≥ 1. We arrange the right-hand side of (3.68) as 2c + n T (c, b, n, γ) = (p − c − b − 1) (c + γ)(c + n − γ) · ¸ c + b + 1 + 2γ c + b + 1 + 2n − 2γ +b + . (3.69) (c + γ)(b + n − γ) (c + n − γ)(b + γ) The quantity inside the braces on the right-hand side of (3.69) can be evaluated as · ¸ c + b + 1 + 2γ c + b + 1 + 2n − 2γ + (c + γ)(b + n − γ) (c + n − γ)(b + γ) · ¸ 1 c + n − γ c + γ = (c + b + 1 + 2γ) + (c + b + 1 + 2n − 2γ) (c + γ)(c + n − γ) b + n − γ b + γ · ¸ 1 c + 1 c + 1 ≤ (c + b + 1 + 2γ) + (c + b + 1 + 2n − 2γ) (c + γ)(c + n − γ) b + 1 b + 1 c + 1 2c + n c + b + 1 + n = 2 b + 1 (c + γ)(c + n − γ) 2c + n c + 1 2c + n ≤ 2 . (3.70) b + 1 (c + γ)(c + n − γ)

49 Hence we have · ¸ 2c + n b(c + 1) T (c, b, n, γ) ≥ p − c − b − 1 + 2 , (c + γ)(c + n − γ) b + 1 which is nonnegative if p − c − b − 1 + 2b(c + 1)/(b + 1) ≥ 0, that is, (3p/2 + 1 − a)b + p/2 − 3 + a ≥ 0. In this case, S(a, b, w) is also nonnegative and this completes the proof.

Remark 3.28. In Section 3.3.2 and 3.3.3, we see that all classes of minimax gener- alized Bayes estimators given in the hitherto researches, that is, Strawderman(1971), Alam(1973), Berger(1976b), Faith(1978), Maruyama(1998,2000a) and Fourdrinier et

al.(1998) are expressed as δh, although relations among classes above have not been clear. Hence we succeed in treating uniformly classes above as the following table.

Table 3.1: All classes of minimax Bayes estimators given in the hitherto researches

Author The sufficient condition for minimaxity on h(λ) and a φh(w) Strawderman-Berger h ≡ 1 3 − p/2 ≤ a < p/2 + 1 % h(λ) is bounded Faith λh0(λ)/h(λ) is monotone nondecreasing % 3 − p/2 ≤ a < p/2 + 1 h(λ) is bounded % 0 λh (λ)/h(λ) can be decomposed as h1(λ) + h2(λ) or Fourdrinier et al. h1(λ) ≤ H1 and is monotone decreasing %& 0 ≤ h2(λ) ≤ H2 or H1 + 2H2 ≤ p/2 + a − 3 ··· Alam h(λ) = (1 − λ)b with b < 0 a = b + 2 h(λ) = (1 − λ)b with b < 0 %& Maruyama 3 − p/2 < a < p/2 + 1 −(a + p/2 − 3)/(3p/2 + 1 − a) ≤ b < 0

3.3.4 Generalization of Stein’s Bayes estimator

In this subsection, we pay attention to the following Stein(1973)’s simple idea and pro- vide it with the warrant for statistical decision theory. Stein(1973) suggested that the

50 generalized Bayes estimator with respect to the prior distribution: the weighted sum of that determined by the density kθk2−p, which corresponds to (3.40) with a = 2 and h(λ) ≡ 1, and a measure concentrated at the origin may dominate the James-Stein estimator. Since Kubokawa(1991) afterward showed that the generalized Bayes esti- mator with respect to the density kθk2−p dominates the James-Stein estimator, Stein’s suggestion is to the point in the case where the ratio of a measure concentrated at the origin is very small. Efron and Morris(1976) expressed Stein(1973)’s estimator explic- itly. As for the decision-theoretic properties of the estimator, admissibility is easily able to be checked by using Brown(1971)’s theorem. On the other hand, even minimaxity, to say nothing of dominance over the James-Stein estimator has not been proved yet. (Hara(1997) tried to prove the minimaxity of Stein’s estimator. Unfortunately his proof is incorrect.) In this subsection, we investigate minimaxity and admissibility of the generalized Bayes estimator with respect to the prior distribution which includes Stein’s. We consider the following prior distribution: scale mixtures of multivariate normal distribution whose density is proportional to Z µ ¶ µ ¶ 1 λ p/2 λ exp − kθk2 λ−ah(λ)dλ, 0 1 − λ 2(1 − λ) which is equivalent to the density function considered in Section 3.3.1, is chosen with probability β/(1+β) and a measure concentrated at the origin is chosen with probability 1/(1 + β) for β > 0. This prior distribution is interpreted as the one whose density is (3.40) with

h(λ) = βh(λ) + δ(λ − 1), (3.71) where δ(·) is the Dirac delta function. Similarly to (3.41), the marginal density function of X is calculated as

Z 1 p/2−a 2 2 fβ(x) = β λ h(λ) exp(−kxk λ/2)dλ + exp(−kxk /2). 0

51 Since the generalized Bayes estimator is written as X + ∇ log fβ(X), fβ(x) exists for all x and the generalized Bayes estimator is well defined if and only if the integral R 1 p/2−a 0 λ h(λ)dλ is finite. Noting that µZ 1 ¶ p/2−a+1 2 −1 2 ∇fβ(x) = − λ h(λ) exp(−kxk λ/2)dλ + β exp(−kxk /2) x, (3.72) 0 we have the generalized Bayes estimator

2 2 δβ(X) = (1 − φβ(kXk )/kXk )X, where R 1 p/2−a+1 β 0 λ h(λ) exp(−wλ/2)dλ + exp(−w/2) φβ(w) = w R . (3.73) 1 p/2−a β 0 λ h(λ) exp(−wλ/2)dλ + exp(−w/2) Results in Section 3.3.1 are clearly succeeded regardless of β.

Theorem 3.29. δβ(X) is admissible if and only if a ≤ 2.

Lemma 3.30. limw→∞ φβ(w) = p − 2a + 2, which is not depend upon h(λ) and β.

Before considering the minimaxity of δβ(X), we furthermore investigate the prop- erties of the behavior of φβ(w).

Theorem 3.31. 1. Assume that λh0(λ)/h(λ) with bounded h(λ) is monotone non- b decreasing or that h(λ) = (1 − λ) with −1 < b < 0. Then φβ(w) for every β > 0 is increasing from the origin to a certain point and is decreasing from the point.

2. Assume that λh0(λ)/h(λ) with bounded h(λ) is monotone nonincreasing and that

h(λ) is not constant. Then there exists a constant β1 such that the following are satisfied.

(a) If β ≥ β1, φβ(w) is monotone nondecreasing.

(b) If β < β1, the function φβ(w) is increasing from the origin to a certain point

w1 and is decreasing from the point w1 to the other point w2 and is increasing

from the point w2. Namely, φβ(w) has two extreme points.

52 We cannot derive the value β1 analytically although we see that β1 < β2 where

µZ 1 ¶−1 p/2−a β2 = λ (1 − λ)h(λ)dλ 0 ÃR R !−1 1 λp/2−a+1(1 − λ)2h0(λ)dλ 1 λp/2−a+1(1 − λ)h0(λ)dλ · 0R − 0R . 1 p/2−a 2 1 p/2−a 0 λ (1 − λ) h(λ)dλ 0 λ (1 − λ)h(λ)dλ

phi(w)

5

(p-2) 4

3

beta =1

2

beta = 2

1 beta = 4

0 0 5 10 15 20 25 30 35 w

Figure 3.2 : The behavior of φβ(w) for p = 6, a = 2 and h(λ) = 1 − λ when β = 1, 2, 4.

Figure 3.2 gives the behavior of φβ(w) with a = 2 and h(λ) = 1 − λ, in the case of

β = 1, 2, 4, for p = 6. We can guess that there exists β1 in this case, between 2 and 4. proof. First we consider the case of h(λ) = (1 − λ)b for −1 < b < 0. Using the relation

(3.58), we rewrite φβ(w) as w − (p − 2a + 2) − βd(p − 2a + 2)M(b, c, w/2) φ (w) = p − 2a + 2 + , β 1 + βdM(b + 1, c, w/2)

53 where c = p/2 − a + b + 2 and d = Γ(p/2 + 1 − a)Γ(b + 1)/Γ(c). Similarly to the proof of Theorem 3.18, we have the result. Next we consider the case of bounded h(λ). Applying an integration by parts to the numerator of the right-hand side of (3.73), we rewrite φβ(w) as R 1 p/2−a+1 0 2β 0 λ h (λ) exp((1 − λ)w/2)dλ + w φβ(w) = p − 2a + 2 + R 1 p/2−a 1 + β 0 λ h(λ) exp((1 − λ)w/2)dλ 2βc + (p − 2a + 2) − R 2 . 1 p/2−a 1 + β 0 λ h(λ) exp((1 − λ)w/2)dλ Putting

Z 1 p/2−a+1 0 ϕ1(w) = β λ h (λ) exp((1 − λ)w/2)dλ + w/2 − βc2 − (p/2 − a + 1), 0

Z 1 p/2−a ϕ2(w) = 1 + β λ h(λ) exp((1 − λ)w/2)dλ, 0

ϕ(w) = ϕ1(w)/ϕ2(w), and

R 1 ϕ0 (w) β λp/2−a+1(1 − λ)h0(λ) exp((1 − λ)w/2)dλ + 1 ψ(w) = 1 = 0 R , ϕ0 (w) 1 p/2−a 2 β 0 λ (1 − λ)h(λ) exp((1 − λ)w/2)dλ

0 0 −1 we have φβ(w) = p − 2a + 2 + 2ϕ(w) and ϕ (w) = ϕ2(w)(ϕ2(w)) (ψ(w) − ϕ(w)) . Moreover putting

R 1 ϕ00(w) λp/2−a+1(1 − λ)2h0(λ) exp((1 − λ)w/2)dλ ρ(w) = 1 = 0R , ϕ00(w) 1 p/2−a 2 2 0 λ (1 − λ) h(λ) exp((1 − λ)w/2)dλ

0 00 0 −1 we have ψ (w) = ϕ2(w)(ϕ2(w)) (ρ(w) − ψ(w)) . Furthermore it is noted that ϕ(0) = −p/2 + a − 1, R 1 λp/2−a+1(1 − λ)h0(λ)dλ + β−1 ψ(0) = 0 R 1 λp/2−a(1 − λ)h(λ)dλ 0 R 1 λp/2−a+1h(λ)dλ + β−1 = −p/2 + a − 1 + R0 , 1 p/2−a 0 λ (1 − λ)h(λ)dλ

54 R 1 λp/2−a+1(1 − λ)2h0(λ)dλ ρ(0) = 0R (3.74) 1 p/2−a 2 0 λ (1 − λ) h(λ)dλ and

lim ϕ(w) = lim ψ(w) = lim ρ(w) = 0. (3.75) w→∞ w→∞ w→∞ If λh0(λ)/h(λ) is monotone nondecreasing, by Lemma 3.19, ψ(w) is monotone decreas- ing. Hence similarly to the proof of Theorem 3.18, we have the result. Fillaly we deal with the case where λh0(λ)/h(λ) is monotone nonincreasing. By

Lemma 3.19, ρ(w) is monotone increasing. Therefore if ρ(0) ≤ ψ(0), that is, β ≥ β2, similarly to the proof of Theorem 3.18, we see that ψ(w) is monotone increasing, which

implies that ϕ(w) is monotone increasing. For β < β2, ψ(w) is decreasing from the origin to a certain point and is increasing from the point. For fixed w(> p − 2a + 2), as

β is smaller and smaller, there exists β such that φβ(w) exceeds p−2a+2, that is, ϕ(w) exceeds 0, which implies that ϕ(w) is not monotone by (3.75). By continuity of φβ(w) in β, there exists a constant β1 such that for β ≥ β1, ψ(w) ≥ ϕ(w) for every w and for

β < β1, we have the point w0 which satisfies that ψ(w0) < ϕ(w0). Similarly to the proof of Theorem 3.18, we see that for β < β1, there exist two points w1(< w0 <)w2 which satisfies that ψ(w) = ϕ(w). Moreover we see that for 0 ≤ w < w1, ψ(w) > ϕ(w) is satisfied and for w1 < w < w2, ψ(w) < ϕ(w) is satisfied and for w > w2, ψ(w) > ϕ(w) is satisfied. This completes the proof.

The result of minimaxity of δβ(X) is following.

∗ Theorem 3.32. δβ(X) is minimax if β ≥ β where

µ Z 1 ¶−1 β∗ = D λp/2−a+1(1 − λ)h(λ)dλ  0 s  µZ 1 ¶−1 Z 1 × 1 + 1 + D λp/2−ah(λ)dλ λp/2−a+1(1 − λ)h(λ)dλ 0 0

for D > 0 where D = a + p/2 − 3 − H1 − 2H2 under the assumption of Theorem 3.21: (a + p/2 − 3 + b(3p/2 + 1 − a))(b + 1)−1 in the case of h(λ) = (1 − λ)b with −1 < b < 0.

55 By this theorem, we see that, for h(λ) such that δh is shown to be minimax in Section ∗ ∗ 3.3.2 and 3.3.3, there exists a constant β such that for β ≥ β , δβ is also minimax. proof. Similarly to the proof of Theorem 3.21, the sufficient condition for minimaxity is written as R β 1 λp/2+2−ah(λ) exp(−kxk2λ/2)dλ + exp(−kxk2/2) p − kxk2 R0 1 p/2+1−a 2 2 β 0 λ h(λ) exp(−kxk λ/2)dλ + exp(−kxk /2) R 1 kxk2 β λp/2+1−ah(λ) exp(−kxk2λ/2)dλ + exp(−kxk2/2) + 0R ≥ 0. 2 1 p/2−a 2 2 β 0 λ g(λ) exp(−kxk λ/2)dλ + exp(−kxk /2) Putting s = kxk2/2 and

Z 1 k Zk(s) = exp(s) λ h(λ) exp(−sλ)dλ, 0 we can rewrite the above inequality as µ ¶ µ Z (s) Z (s) p p − 2s p/2−a+2 + s p/2−a+1 β2 + + Z (s) Z (s) Z (s) p/2−a+1 p/2−a p/2−a+1 ¶ p Z (s) − Z (s) 2s + 2s p/2−a+1 p/2−a+2 − β Zp/2−a(s) Zp/2−a(s)Zp/2−a+1(s) Zp/2−a+1(s) p − s + ≥ 0. (3.76) Zp/2−a(s)Zp/2−a+1(s)

Noting that Zp/2−a+1(s) − Zp/2−a+2(s) ≥ 0, the value β which satisfies the inequality µ ¶ Z (s) Z (s) p − 2s p/2−a+2 + s p/2−a+1 β2 Zp/2−a+1(s) Zp/2−a(s) 2s s − β − ≥ 0 (3.77) Zp/2−a+1(s) Zp/2−a(s)Zp/2−a+1(s)

also satisfies the inequality (3.76). Let

−1 −1 A(s) = p − 2sZp/2−a+2(s)Zp/2−a+1(s) + sZp/2−a+1(s)Zp/2−a(s) ,

56 B(s) = s/Zp/2−a+1(s) and C(s) = s/(Zp/2−a+1(s)Zp/2−a(s)). For the quadratic equation A(s)β2 − 2B(s)β − C(s) = 0, we would like to bound the larger solution q β∗(s) = B(s)/A(s) + (B(s)/A(s))2 + C(s)/A(s) above. For this, it is sufficient to bound A(s) below and B(s),C(s) above. First we deal with A(s). Under the assumption of Theorem 3.21, A(s) is bounded b below by a + p/2 − 3 − H1 − 2H2. In the case of h(λ) = (1 − λ) with −1 < b < 0, by (3.63), A(s) is bounded below by (a + p/2 − 3 + b(3p/2 + 1 − a))(b + 1)−1. Next we can evaluate B(s) as à Z !−1 X∞ sn 1 B(s) = sZ (s)−1 = s λp/2−a+1(1 − λ)nh(λ)dλ p/2−a+1 n! n=0 0 µZ 1 ¶−1 ≤ λp/2−a+1(1 − λ)h(λ)dλ . (3.78) 0 Moreover noting that for every s ≥ 0, the inequality

Z 1 Z 1 λp/2−ah(λ)dλ ≤ λp/2−ah(λ) exp((1 − λ)s)dλ 0 0 is satisfied, we have

µZ 1 Z 1 ¶−1 C(s) ≤ λp/2−a+1(1 − λ)h(λ)dλ λp/2−ah(λ)dλ . (3.79) 0 0 Therefore we see that β∗(s) is bounded above by β∗ where

µ Z 1 ¶−1 β∗ = D λp/2−a+1(1 − λ)h(λ)dλ  0 s  µZ 1 ¶−1 Z 1 × 1 + 1 + D λp/2−ah(λ)dλ λp/2−a+1(1 − λ)h(λ)dλ 0 0

for D > 0 where D = a + p/2 − 3 − H1 − 2H2 under the assumption of Theorem 3.21: (a + p/2 − 3 + b(3p/2 + 1 − a))(b + 1)−1 in the case of h(λ) = (1 − λ)b with −1 < b < 0. This completes the proof.

57 Example 3.33. We consider the sufficient condition for minimaxity of Stein(1973)’s Bayes estimator. Since the prior density kθk2−p corresponds to the density (3.40) with a = 2 and h(λ) ≡ 1, we have à s ! p(p + 2) (p − 2)2 β∗ = 1 + 1 + . 2(p − 2) p(p + 2)

Moreover we consider Stein(1973)’s Bayes estimator. As introduced in the next section, Kubokawa(1991) showed that the generalized Bayes estimator with respect to the den- sity kθk2−p is admissible and dominates the James-Stein estimator. As far as we know, the estimator is the only one which has above two optimalities, that is, domination over the James-Stein estimator and admissibility. The sole drawback is that Kubokawa’s es- timator has the same risk of the James-Stein estimator at kθk = 0. We note that shrinkage estimators are derived by using the vague prior information that kθk is close to 0. It goes without saying that we would like to get significant improvement of risk when the prior information is accurate. Therefore Stein’s estimator with a sufficiently large β, has quite possibility of dominating the James-Stein estimator significantly at the origin, although we cannot have an analytic result about it.

58 3.4 Improvement upon the James-Stein estimator

The inadmissibility of the James-Stein estimator is stated in Section 3.1. Since the James-Stein estimator is permissible as shown in Section 3.2, for the construction of a class of estimators improving on the James-Stein estimator, it is necessary to overcome the limitation of the characteristics through the Stein identity. Kubokawa(1991, 1994) succeeded in overcoming it. Kubokawa(1991) applied the method of Brewster and Zidek(1974) in estimation of a normal variance to this estimation problem and showed 2 2 that the estimator δK (X) = (1 − φK (kXk )/kXk )X where R 1 p/2−1 0 λ exp(−wλ/2)dλ φK (w) = wR , 1 p/2−2 0 λ exp(−wλ/2)dλ dominates the James-Stein estimator and is admissible. Moreover Kubokawa(1994) pro- posed a new approach to improving on the James-Stein estimator, by the idea to express a difference of risk functions through an integral. The main theorem of Kubokawa(1994) is stated in the following.

Theorem 3.34 (Kubokawa(1994)). Assume that

(B1) φ(w) is nondecreasing and limw→∞ φ(w) = p − 2

(B2) φ(w) ≥ φK (w).

Then δφ(X) of form (3.7) dominates the James-Stein estimator.

proof. By the Stein identity, we have · ¸ φ2(kXk2) − 2(p − 2)φ(kXk2) R(θ, δ ) = p + E − 4φ0(kXk2) φ kXk2 and · ¸ 1 R(θ, δJS) = p − (p − 2)2E . kXk2

59 By letting limw→∞ φ(w) = p − 2, the difference of risk functions can be written as

R(θ, δJS) − R(θ, δ ) · φ ¸ φ(∞) φ(kXk2) ¡ ¢ = E (φ(∞) − 2(p − 2)) − φ(kXk2) − 2(p − 2) kXk2 kXk2 £ ¤ +4E φ0(kXk2) ·Z µ ¶ ¸ ∞ d φ(tkXk2) ¡ ¢ £ ¤ = E φ(tkXk2) − 2(p − 2) dt + 4E φ0(kXk2) dt kXk2 ·Z1 ¸ ∞ ¡ ¢ £ ¤ = 2E φ(tkXk2) − (p − 2) φ0(tkXk2)dt + 4E φ0(kXk2) 1 Z ∞ Z ∞ 0 £ 0 2 ¤ = 2 (φ(tx) − (p − 2)) φ (tx)fp(x; λ)dtdx + 4E φ (kXk ) , (3.80) 0 1

2 where λ = kθk and fp(x; λ) denotes a density of a non-central chi square distribution with p degrees of freedom and non-centrality parameter λ. Making the transformations w = tx and y = w/t, we rewrite the first term in the right-hand side of the equation of (3.80) as

Z ∞ Z ∞ 0 −1 (φ(w) − (p − 2)) φ (w)fp(w/t; λ)t dtdw 0 1 Z ∞ Z w 0 −1 = (φ(w) − (p − 2)) φ (w)fp(y; λ)y dydw. 0 0 Moreover by the inequality

Z w Z w −1 −1 fp(w; λ)/ y fp(y; λ)dy ≥ fp(w)/ y fp(y)dy, 0 0 where fp(y) = fp(y; 0), which can be shown by the correlation inequality, we can eval- uate the risk difference as

JS R(θ, δ ) − R(θ, δφ) Z ∞ µ Z w ¶ 0 −1 = 2 φ (w) (φ(w) − (p − 2)) y fp(y, λ)dy + 2f(w, λ) dw 0 0 Z ∞ µZ w ¶ 0 −1 ≥ 2 φ (w)(φ(w) − φ0(w)) y fp(y, λ)dy dw, (3.81) 0 0

60 where

Z w −1 φ0(w) = p − 2 + 2fp(w)/ y fp(y)dy. 0 The assumptions of the theorem guarantee the left-hand side of the inequality (3.81)

is nonnegative. Using an integration by parts, we easily verify that φ0 = φK . This completes the proof.

JS δK (X) and δ+ (X) are included in this class. We note that a shrinkage estimators such as the James-Stein estimator is derived by using the vague prior information that kθk2 is close to 0. It goes without saying that we would like to get the significant improvement of risk when the prior information is accurate. Though δK (X) is an admissible generalized Bayes estimator and thus analytic, it does not improve upon δJS(X) at kθk2 = 0, which JS JS is easily verified by (3.81). On the other hand, δ+ (X) improves on δ (X) significantly at kθk2 = 0, but it is not analytic and is thus inadmissible by complete class theorem. Therefore it is desirable to get an analytic estimator (which implies that it has some possibilities of being admissible) improving on δJS(X) especially at kθk2 = 0. Maruyama(1996) proposed an estimator having such a property:

¡ 2 2¢ δα(X) = 1 − φα(kXk )/kXk X (3.82) where R 1 α(p/2−1) 0 λ exp(−wαλ/2)dλ φα(w) = wR 1 α(p/2−1)−1 0 λ exp(−wαλ/2)dλ M(1, α(p/2 − 1) + 2, αw/2) = w M(1, α(p/2 − 1) + 1, αw/2) µ ¶ 1 = (p − 2) 1 − . (3.83) M(1, α(p/2 − 1) + 1, αw/2)

The above second and third equations are from formula (3.56), (3.57) and (3.58).

Clearly we have δ1(X) = δK (X). Using the representation of φα(w) as (3.83), we have the following result.

61 JS Theorem 3.35 (Maruyama(1996)). δα(X) dominates δ (X) for α ≥ 1. proof. We shall verify that φα(w) for α ≥ 1 satisfies (B1) and (B2) of Theorem

3.34. As M(1, α(p/2 − 1) + 1, αw/2) is increasing in w, φα(w) is increasing in w.

As limw→∞ M(1, α(p/2 − 1) + 1, αw/2) = ∞, it is clear that limw→∞ φα(w) = p − 2.

In order to show that φα(w) ≥ φK (w) = φ1(w) for α ≥ 1, we have only to check that

φα(w) is increasing in α, which is easily verified because the coefficient of each term of the right-hand side of the equation α αw 1 M(1, (p − 2) + 1, ) = 1 + w 2 2 p − 2 + 2/α 1 + w2 + ··· (3.84) (p − 2 + 2/α)(p − 2 + 4/α)

is increasing in α. We have thus proved the theorem.

Proposition 3.36 (Maruyama(1996)). δα(X) approaches the positive-part James- Stein estimator when α tends to infinity, that is,

JS lim δα(X) = δ+ (X). α→∞ proof. By the proof of Theorem 3.35, M(1, α(p − 2)/2 + 1, αw/2) is increasing in α, so P∞ i that it converges to i=0(w/(p − 2)) by the monotone convergence theorem. Consid- ering two cases: w < (≥)p − 2, we obtain limα→∞ φα(w) = w if w < p − 2; = p − 2 otherwise. This completes the proof.

Needless to say, we are interested in determining whether δα(X) for α > 1 is admissible or not. Before considering admissibility, we consider the permissibility of δα(X). We have à ! Z 2 µZ ¶ 1 kxk φ (w) 1 αkxk2 1/α exp − α dw ∝ λα(p/2−1)−1 exp(− λ)dλ 2 ² w 0 2 ∼ ckxk2−p,

62 for some constant c. The evaluation in the above is from Tauberian’s theorem. Hence

δα(X) is permissible by Theorem 3.12. Moreover by Proposition 3.13, δα(X) with α > 1 is admissible if it is generalized Bayes, that is, there exists a measure π which satisfies

Z µZ 1 ¶1/α exp(−kθ − xk2/2)π(dθ) = λα(p/2−1)−1 exp(−αkXk2λ/2)dλ . Rp 0

By Theorem 3.10, if δα(X) with α > 1 is admissible, then it is necessary to be written as

µZ 1 ¶1/α Z ∞ λα(p/2−1)−1 exp(α(1 − λ)y2/2)dλ = exp(yt)ν(dt) (3.85) 0 −∞ for a certain measure ν.

Remark 3.37. Recently, Maruyama and Iwasaki(2000) considered the inverse Laplace transform of the left-hand side of equation (3.85)

Z c+i∞ 1 1/α fα(t) ≡ F (y) exp(yt)dy, (3.86) 2πi c−i∞ for suitable constants c, where

Z 1 F (y) = λα(p/2−1)−1 exp(α(1 − λ)y2/2)dλ 0 and investigated the properties of asymptotic behavior of F (y), although the investiga-

tion is still incomplete. As a result, Maruyama and Iwasaki(2000) conjecture that fα(t) with α > 1 takes negative values for a sufficiently large t, which implies that there does

not exist a measure ν by unicity of inverse Laplace transform and hence δα(X) with α > 1 is inadmissible.

δα(X) can be represented as follows.

Definition 3.38. We say that an estimator is fake-Bayes if it is a (generalized) Bayes estimator for a different problem and for a suitable prior distribution.

63 Proposition 3.39. δα(X) is a fake-Bayes estimator for the following problem: −1 2 X is a random variable having Np(θ, α Ip), θ is estimated relative to the loss kδ − θk and the prior distribution which has a density function Z µ ¶ µ ¶ 1 λ p/2 αλ exp − kθk2 λ(α−1)(p/2−1)−2dλ 0 1 − λ 2(1 − λ) is taken.

proof. The marginal density function of X is written as Z Z µ ¶ 1 kx − θk2 αλ f(x) ∝ exp −α − kθk2 0 Rp 2 2(1 − λ) µ ¶ λ p/2 × λ(α−1)(p/2−1)−2dθdλ 1 − λ Z µ ¶ 1 αλ ∝ exp − kxk2 λα(p/2−1)−1dλ. 0 2 Noting that the generalized Bayes estimator is expressed as X + α−1∇ log f(X) in this case, we have the estimator (3.82).

As a conclusion of this section, we consider a necessary condition for dominance over the James-Stein estimator and the James-Stein positive-part estimator. For the risk of the estimator of the form (3.7), · ¸ φ(kXk2)(φ(kXk2) − 2(p − 2)) R(θ, δ ) = p + E − 4φ0(kXk2) , (3.87) φ kXk2 by using a Laplace approximation on the expectation in (3.87), we have · ¸ φ(kXk2)(φ(kXk2) − 2(p − 2)) E − 4φ0(kXk2) kXk2 φ(η)(φ(η) − 2(p − 2)) ≈ − 4φ0(η), η where η = kθk2. By carefully working with the error terms in the Laplace approxima- tion, Berger(1976a) showed that the error of approximation is o(η−1), that is, φ(η)(φ(η) − 2(p − 2)) R(θ, δ ) = p + − 4φ0(η) + o(η−1). (3.88) φ η

64 0 Here it is noted that if δφ is minimax, it is necessary that limη→∞ ηφ (η) = 0 is satisfied 0 because limη→∞ ηφ (η) 6= 0 implies that φ(η) is not bounded, so that we have

0 lim η(R(θ, δφ) − p) = lim (φ(η)(φ(η) − 2(p − 2)) − 4ηφ (η)) η→∞ η→∞ = +∞.

By the approximation R(θ, δJS) = p − (p − 2)2/η + o(η−1), we have

(φ(η) − (p − 2))2 R(θ, δJS) − R(θ, δ ) = − + o(η−1), (3.89) φ η for a minimax estimator δφ. Clearly we have the following result.

Theorem 3.40. A necessary condition for an estimator δφ to dominate the James- 0 Stein estimator is that limη→∞ φ(η) = p − 2 and limη→∞ ηφ (η) = 0.

Next we consider a necessary condition for an admissible estimator δφ to dominate the James-Stein positive-part estimator. If an estimator δφ is admissible, φ(w) < w is JS 2 clearly satisfied. Moreover note that δ+ = max(0, 1 − (p − 2)/kXk )X is written as JS 2 2 JS JS (1 − φ+ (kXk )/kXk )X, where φ+ (w) = min(w, p − 2). If the inequality φ+ (w) > φ(w) for every w ≥ 0 is satisfied, for the risks of these two estimators at kθk = 0, we JS JS have R(0, δ+ (X)) > R(0, δφ(X)), which implies that δφ does not dominate δ+ (X).

Therefore there exists w0(> p − 2) such that φ(w0) exceeds p − 2. By noting Theorem 3.40, we have the following result.

Theorem 3.41. A necessary condition for an admissible estimator δφ to dominate the 0 James-Stein positive-part estimator is that limη→∞ φ(η) = p − 2 and limη→∞ ηφ (η) = 0 and that there exists w0(> p − 2) such that φ(w0) exceeds p − 2, which implies that φ(w) is not monotone.

This theorem implies that an admissible estimator satisfying the sufficient condition for dominance over the James-Stein estimator, (B1) and (B2), could never improve the James-Stein positive-part estimator.

65 Chapter 4

Estimation of a mean vector of scale mixtures of multivariate normal distributions

4.1 Introduction and summary

Since Stein(1956), considerable effort has been given to improving upon the best equiv-

ariant estimator δ0, which is minimax under the quadratic loss function as shown in Kiefer(1957). The theoretical questions were answered quite thoroughly by Brown(1966).

He showed that in 3 or more dimensions δ0 is inadmissible for an extremely wide va- riety of distributions and loss functions. It is however very difficult to explicitly give improved minimax estimators without restriction of distributions and of loss functions. For the explicit construction of improved minimax estimators in non-normal case, the estimation of a mean vector of spherically symmetric distributions under the quadratic loss function has been an important topic in this field. As a special case of spherically symmetric distributions, Strawderman(1974) introduced scale mixtures of multivariate normal distributions, which have a probability density function Z ∞ kx − θk2v f(kx − θk2) = (2π)−p/2vp/2 exp(− )G(dv), 0 2

66 where G is a known distribution function. Strawderman(1974) proposed a class of im- proved minimax estimators of the mean vector. Generally Berger(1975) and Brandwein and Strawderman(1978,1991) found and discussed a class of minimax estimators for spherically symmetric distributions. In view of statistical decision theory, however, we are interested in proposing a class of estimators satisfying not only minimaxity but also admissibility. So far as I know, explicit results of constructing admissible minimax estimators of a mean vector were restricted to the case of the multivariate normal dis- tribution. For the normal case, see Strawderman(1971), Alam(1973), Berger(1976b), Fourdrinier et al.(1998), Maruyama(1998) and Chapter 3. In this chapter, the problem of estimating the mean vector θ of scale mixtures of multivariate normal distributions, in the case that G has a probability density function, under the quadratic loss function is considered. For certain class of these distributions which includes at least multivariate-t distributions, admissible minimax estimators are derived.

4.2 The construction of the generalized Bayes esti- mators

In this section, we derive an generalized Bayes estimator with respect to the following prior distribution. Let the conditional distribution of θ given t, 0 < t < 1, be normal −1 with mean 0 and covariance matrix t (1−t)Ip and a density function of t is proportional −a b to t (1 − t) I(0,1)(t) with b > −1. Therefore the density function of θ, ha,b(θ) is proportional to

Z 1 µ ¶p/2 µ ¶ t t 2 −a b ha,b(θ) = exp − kθk t (1 − t) dt. 0 1 − t 2(1 − t) It is noted that this prior distribution is proper for a < 1 and is improper for a ≥ 1. Under the quadratic loss function, the generalized Bayes estimator is expressed as R θf(kx − θk2)h (θ)dθ δ g (x) = R a,b . (4.1) a,b 2 f(kx − θk )ha,b(θ)dθ

67 To make the following calculations easy, by change of variables formula, we rewrite

ha,b(θ) as, for every v > 0, Z µ ¶ 1 λ p/2 h (θ) = λ−a(1 − λ)b a,b 1 − λ 0 µ ¶ vλkθk2 × exp − vp/2−a+1(1 − λ + λv)a−b−2dλ. 2(1 − λ) The integrals with respect to θ of both the denominator and numerator of right-hand side of (4.1) are calculated as follows: Z µ ¶ vkx − θk2 vλ numerator = θ exp − − kθk2 dθ p 2 2(1 − λ) ZR µ ¶ µ ¶ λv vkθ − (1 − λ)xk2 = θ exp − kxk2 exp − dθ Rp 2 2(1 − λ) µ ¶ µ ¶ λv 1 − λ p/2 = C exp − kxk2 (1 − λ)x 1 2 v and Z µ ¶ vkx − θk2 vλ denominator = exp − − kθk2 dθ Rp 2 2(1 − λ) µ ¶ µ ¶ λv 1 − λ p/2 = C exp − kxk2 , 1 2 v

g for a constant C1. Hence we have the generalized Bayes estimator δa,b(X) = (1 − g 2 2 φa,b(kXk )/kXk )X, where R R 1 ∞ λp/2−a+1vp/2−a+1(1 − λ)b(1 − λ + vλ)a−b−2 exp(−wvλ/2)g(v)dvdλ φ g (w) = w R0 R0 . a,b 1 ∞ p/2−a p/2−a+1 b a−b−2 0 0 λ v (1 − λ) (1 − λ + vλ) exp(−wvλ/2)g(v)dvdλ g Next we investigate some properties of the behavior of φa,b(w).

Theorem 4.1. 1. Assume that g(s1t1)g(s2t2) ≤ g(s1t2)g(s2t1) for s1 ≤ s2 and t1 ≤ g t2. Then φa,b(w) for b − a + 2 > 0 and b ≥ 0 is monotone nondecreasing.

g 2. φa,b(w) for b − a + 2 = 0 and b ≥ 0 is monotone nondecreasing.

68 proof. Putting Z 1 Z ∞ p/2−a+1 p/2−a+1 b a−b−2 Φ1(w) = λ v (1 − λ) (1 − λ + vλ) exp(−wvλ/2)g(v)dvdλ 0 0

and Z 1 Z ∞ p/2−a p/2−a+1 b a−b−2 Φ2(w) = λ v (1 − λ) (1 − λ + vλ) exp(−wvλ/2)g(v)dvdλ, 0 0

g g we have φa,b(w) = wΦ1(w)/Φ2(w). The derivative of φa,b(w) is written as

d g φa,b(w) = Φ1(w)/Φ2(w) dw Z Z w 1 ∞ − λp/2−a+2vp/2−a+2(1 − λ)b(1 − λ + vλ)a−b−2 2Φ (w) 2 µ 0 0 ¶ wvλ × exp − g(v)dvdλ 2 Z Z wΦ (w) 1 ∞ + 1 λp/2−a+1vp/2−a+2(1 − λ)b(1 − λ + vλ)a−b−2 2Φ (w)2 2 µ 0 0¶ wvλ × exp − g(v)dvdλ. 2 For b > 0, applying an integration by parts gives Z 1 λp/2−a+2(1 − λ)b(1 − λ + vλ)a−b−2 exp(−wvλ/2)dλ 0 Z 2 ³ 1 = (p/2 − a + 2)λp/2−a+1(1 − λ)b(1 − λ + vλ)a−b−2 exp(−wvλ/2)dλ wv 0 Z 1 − bλp/2−a+2(1 − λ)b−1(1 − λ + vλ)a−b−2 exp(−wvλ/2)dλ 0 Z 1 ´ + (a − b − 2)(v − 1)λp/2−a+2(1 − λ)b(1 − λ + vλ)a−b−3 exp(−wvλ/2)dλ 0Z p 1 = λp/2−a+1(1 − λ)b(1 − λ + vλ)a−b−2 exp(−wvλ/2)dλ wv 0Z 2 1 − ((a − 2)(1 − λ) + bvλ)λp/2−a+1(1 − λ)b−1 wv 0 ×(1 − λ + vλ)a−b−3 exp(−wvλ/2)dλ. (4.2)

69 Similarly, we get

Z 1 λp/2−a+1(1 − λ)b(1 − λ + vλ)a−b−2 exp(−wvλ/2)dλ 0 Z p − 2 1 = λp/2−a(1 − λ)b(1 − λ + vλ)a−b−2 exp(−wvλ/2)dλ wv Z 0 2 1 − ((a − 2)(1 − λ) + bvλ)λp/2−a(1 − λ)b−1 wv 0 ×(1 − λ + vλ)a−b−3 exp(−wvλ/2)dλ.

Therefore we have µ Z Z d 1 ³ 1 ∞ φ g (w) = Φ (w) ((a − 2)(1 − λ) + bvλ)λp/2−a+1(1 − λ)b−1 dw a,b Φ (w)2 2 2 0 0 ´ ×(1 − λ + vλ)a−b−3 exp(−wvλ/2)vp/2−a+1g(v)dvdλ ³Z 1 Z ∞ p/2−a b−1 −Φ1(w) ((a − 2)(1 − λ) + bvλ)λ (1 − λ) 0 0 ´¶ ×(1 − λ + vλ)a−b−3 exp(−wvλ/2)vp/2−a+1g(v)dvdλ .

Making the transformation t = (1 − λ)−1λv gives

µZ 1 Z ∞ d g 1 (a − 2) + bt λ φa,b(w) = 2 fw(t, λ)dtdλ dw Φ2(w) 0 0 1 + t 1 − λ Z 1 Z ∞ × fw(t, λ)dtdλ Z 0Z 0 1 ∞ (a − 2) + bt 1 − fw(t, λ)dtdλ 0 0 1 + t 1 − λ Z 1 Z ∞ × λfw(t, λ)dtdλ, 0 0 where µ ¶ µ ¶ wt 1 − λ f (t, λ) = tp/2−a+1(1 + t)a−b−2λ−2(1 − λ)p/2 exp − (1 − λ) g t . w 2 λ

Here FKG inequality due to Fortuin et al.(1971) is useful.

70 Lemma 4.2 (FKG inequality). Let ξ denote a probability density function with re- spect to ν for a q-variate random vector. For two points y = (y1, ··· , yq) and z =

(z1, ··· , zq), define

y ∧ z = (y1 ∧ z1, ··· , yq ∧ zq),

y ∨ z = (y1 ∨ z1, ··· , yq ∨ zq) where a ∧ b = min(a, b), a ∨ b = max(a, b). Suppose that ξ satisfies

ξ(y)ξ(z) ≤ ξ(y ∨ z)ξ(y ∧ z) (4.3) and that α(y), β(y) are nondecreasing in each argument and that α, β, αβ are integrable with respect to ξ. then Z Z Z αβξdν ≥ αξdν βξdν.

Here we note that fw(t, λ) satisfies (4.3) by assumption of the theorem. In the case of b − a + 2 ≥ 0, as the function (a − 2 + bt)(1 + t)−1(1 − λ)−1 is nondecreasing in t g and λ, the derivative of φa,b(w) is nonnegative. For b − a + 2 = 0, it is noted that the condition with respect to g(v) is not needed. In the case of b = 0, while the difference of the calculation on (4.2) give rise to slight g g difference of the form of the derivative of φa,b(w), we are able to show that φa,b(w) is nondecreasing similarly. This completes the proof.

g 0 Theorem 4.3. φa,b(w)/w is nonincreasing if a − b − 2 ≥ 0 and vg (v)/g(v) is nonin- creasing.

g proof. By change of variables formula, we write φa,b(w)/w as R R φ g (w) 1 ∞ λ−1tp/2−a+1(1 − λ)b(1 − λ + t)a−b−2 exp(−wt/2)g(t/λ)dtdλ a,b = R0 R0 w 1 ∞ λ−2tp/2−a+1(1 − λ)b(1 − λ + t)a−b−2 exp(−wt/2)g(t/λ)dtdλ R0 0 ∞ ψ (t)tp/2−a+1 exp(−wt/2)dt = R0 1 , ∞ p/2−a+1 0 ψ2(t)t exp(−wt/2)dt

71 where

Z 1 −1 b a−b−2 ψ1(t) = λ (1 − λ) (1 − λ + t) g(t/λ)dλ 0 and

Z 1 −2 b a−b−2 ψ2(t) = λ (1 − λ) (1 − λ + t) g(t/λ)dλ. 0

g The derivative of φa,b(w)/w is

µ g ¶ µZ 1 ¶−2 µZ 1 Z 1 d φa,b(w) −1 = 2 kw(t)dt ψ(t)kw(t)dt tkw(t)dt dw w 0 0 0 Z 1 Z 1 ¶ − kw(t)dt tψ(t)kw(t)dt , 0 0

p/2−a+1 where kw(t) = ψ2(t)t exp(−wt/2) and ψ(t) = ψ1(t)/ψ2(t). By the correlation g inequality, φa,b(w)/w is nonincreasing if ψ(t) is nondecreasing in t. We can calculate the derivative of ψ(t) as

Z 1 0 −1 ¡ a−b−3 ψ (t) = ψ2(t) (a − b − 2)(1 − λ + t) g(t/λ) 0 +(1 − λ + t)a−b−2λ−1g0(t/λ)) λ−1(1 − λ)bdλ Z 1 −2 −2 b ¡ a−b−3 −ψ2(t) ψ1(t) λ (1 − λ) (a − b − 2)(1 − λ + t) g(t/λ) 0 +(1 − λ + t)a−b−2λ−1g0(t/λ)) dλ.

By the assumption of the theorem, (a−b−2)(1−λ+t)−1+λ−1g0(t/λ)/g(t/λ) is increasing in λ, which implies that ψ(t) is nondecreasing in t by the correlation inequality. This completes the proof.

Theorem 4.4. For max(a − 2, 0) ≤ b < p/2 − 1,

Z ∞ g −1 lim φa,b(w) = 2(p/2 − a + 1) v g(v)dv. w→∞ 0

72 g proof. φa,b(w) can be expressed as

Z ∞ ÁZ ∞ g −1 φa,b(w) = Ψ1(v, w)v g(v)dv Ψ2(v, w)g(v)dv 0 0 where

Z 1 p/2−a+2 p/2−a+1 b a−b−2 Ψ1(v, w) = (wv) λ (1 − λ) (1 − λ + vλ) exp(−wvλ/2)dλ 0 and

Z 1 p/2−a+1 p/2−a b a−b−2 Ψ2(v, w) = (wv) λ (1 − λ) (1 − λ + vλ) exp(−wvλ/2)dλ. 0 Making a transformation gives

Z ∞ p/2−a+2 Ψ1(v, w) = (2w) q(s, v) exp(−ws)ds 0

p/2−a+1 b a−b−2 p/2−a+1 where g(s, v) = s (1−2s/v) (1−2s/v+2s) I(0,v/2)(s). Since q(s, v) ∼ s as s → 0+, from Tauberian’s theorem, we have

Z ∞ q(s, v) exp(−ws)ds ∼ Γ(p/2 − a + 2)w−(p/2−a+2) 0

p/2−a+2 as w → ∞, which implies that limw→∞ Ψ1(v, w) = 2 Γ(p/2 − a + 2). Next we

will show Ψ1(v, w) is bounded to the above. For max(a − 2, 0) ≤ b < p/2, Ψ1(v, w) is evaluated as

Z 1 p/2−a+2 p/2−b p/2−b−1 Ψ1(v, w) ≤ w v λ exp(−wvλ/2)dλ 0 Z ∞ ≤ w−a+b+22p/2−b tp/2−b−1 exp(−t)dt. 0

Therefore Ψ1(v, w) for max(a − 2, 0) ≤ b < p/2 is bounded by a constant value

which is not depend upon v and w. Similarly we can show that limw→∞ Ψ2(v, w) = p/2−a+1 2 Γ(p/2 − a + 1) and that Ψ2(v, w) for max(a − 2, 0) ≤ b < p/2 − 1 is bounded to

73 the above. By Lebesque’s dominated convergence theorem, we have

Z ∞ Á Z ∞ g −1 lim φa,b(w) = lim Ψ1(v, w)v g(v)dv lim Ψ2(v, w)g(v)dv w→∞ w→∞ 0 w→∞ 0 Z ∞ ÁZ ∞ −1 = lim Ψ1(v, w)v g(v)dv lim Ψ2(v, w)g(v)dv 0 w→∞ 0 w→∞ Z ∞ = 2(p/2 − a + 1) v−1g(v)dv. 0

4.3 Admissible Minimax estimators of the mean vec- tor

2 2 For the shrinkage estimator δφ(X) = (1 − φ(kXk )/kXk )X, the known minimax con- dition in the case of scale mixtures of multivariate normal distributions is following.

Theorem 4.5 (Strawderman(1974) and Berger(1975)). Assume the following con- ditions:

(C1) φ(w) is nondecreasing.

(C2) φ(w)/w is nonincreasing.

R ∞ (C3) 0 ≤ φ(w) ≤ 2(p − 2)/ 0 vg(v)dv.

Then δφ(X)is minimax.

By Theorem 4.1 and 4.3, except for the case of b − a + 2 = 0, we cannot get the range of values (a, b) where both assumption (C1) and (C2) of Theorem 4.5 are satisfied. Taking account of admissibility, we have to restrict the range of a < 1 and b > −1 which implies that b − a + 2 > 0. Therefore we would like to derive the sufficient condition g for minimaxity which does not require the decrease of φa,b(w)/w. The below theorem is a special case of Berger(1975)’s Theorem 3, which provides the sufficient condition for minimaxity of general spherically symmetric distributions.

74 Theorem 4.6 (Berger(1975)). Assume the following conditions:

(D1) φ(w) is nondecreasing. R R ∞ p/2−1 ∞ p/2 (D2) 0 ≤ φ(w) ≤ 2(p − 2) 0 v g(v)dv/ 0 v g(v)dv.

Then δφ(X)is minimax. proof. By using the Stein identity, Lemma 3.2, the risk difference can be written as

¡ 2¢ ¡ ¡ 2 2¢ 2¢ R(θ, X) − R(θ, δφ) = E kX − θk − E k 1 − φ(kXk )/kXk X − θk Z Z ∞ 2(p − 2)φ(kxk2) vp/2−1 kx − θk2v = 2 p/2 exp(− )g(v)dvdx Rp 0 kxk (2π) 2 Z Z ∞ p/2−1 2 0 2 v kx − θk v + 4φ (kxk ) p/2 exp(− )g(v)dvdx p (2π) 2 ZR Z0 ∞ φ2(kxk2) vp/2 kx − θk2v − 2 p/2 exp(− )g(v)dvdx. Rp 0 kxk (2π) 2 Using the inequality R R ∞ vp/2−1 exp(−vkx − θk2/2)g(v)dv ∞ vp/2−1g(v)dv R0 ≥ R0 , ∞ p/2 2 ∞ p/2 0 v exp(−vkx − θk /2)g(v)dv 0 v g(v)dv which is shown by the correlation inequality, we evaluate the risk difference as ÃÃ ! Z Z R ∞ ∞ vp/2g(v)dv φ(kxk2) R(θ, X) − R(θ, δ ) ≥ 2(p − 2) − R 0 φ(kxk2) φ ∞ p/2−1 2 Rp 0 v g(v)dv kxk ¶ 0 kx − θk2v +4φ0(kxk2) (2π)−p/2vp/2−1 exp(− )g(v)dvdx. 2

By the conditions of (D1) and (D2), we see that R(θ, X) − R(θ, δφ) is nonnegative for every θ. This completes the proof.

Combining Theorem 4.1, 4.3, 4.4, 4.5 and 4.6, we have the theorem about minimaxity g and admissibility of δa,b(X).

75 0 g Theorem 4.7. 1. If a = b + 2 and vg (v)/g(v) is nonincreasing, δa,b(X) for

max(bg, 0) ≤ b < p/2 − 1 is minimax, where ¡R R ¢ ∞ ∞ −1 −1 bg = (p/2 − 1)(1 − 2 0 vg(v)dv 0 v g(v)dv ).

g 2. If g(v) satisfies the assumption of part 1 of Theorem 4.1, δa,b(X) for

ag ≤ a < p/2 + 1 and max(a − 2, 0) ≤ b < p/2 − 1 is minimax, where R ¡R R ¢ ∞ p/2−1 ∞ −1 ∞ p/2 −1 ag = p/2 + 1 − (p − 2) 0 v g(v)dv 0 v g(v)dv 0 v g(v)dv .

g 3. If g(v) satisfies the assumption of part 1 of Theorem 4.1 and ag < 1, δa,b(X) for

ag ≤ a < 1 and 0 ≤ b < p/2 − 1 is admissible and minimax.

a

p/2+1

minimax

1

-2 -1 p/2-1 minimax & admissible b

a(g) admissible

Figure 4.1 : Ranges of values (a, b) for minimaxity and admissibility (Theorem 4.7).

76 The result of part 1 of Theorem 4.7 is a revised version of Strawderman(1974)’s one

about minimax generalized Bayes estimators. In fact, ha,b(θ) with a = b + 2 is propor- tional to kθk2−p+2b, which corresponds to the generalized prior density considered in Strawderman(1974). Finally we consider a class of g(v) satisfying the assumption of part 1 of Theorem α−1 4.1. We easily check that gα,β(v) = Cv exp(−βv) for α > 0 and β > 0 satisfies

this condition. In particular, gα,β(v) for α = β = m/2 corresponds to the case of

multivariate-t distribution whose degrees of freedom is m. Since we calculate ag = p/2 + 1 − (p − 2)(m − 2)/(p + m − 2) in this case, minimax admissible estimators can be proposed if p ≥ 5 and m > (p − 2)(p + 4)/(p − 4).

77 Chapter 5

Estimation problems on a multiple regression model

In Chapter 3, we considered the estimation problem of a multivariate normal mean with known covariance matrix. From a practical sense, it is however important to discuss the case where a variance of the underlying normal distribution is unknown. For instance, a canonical form of a multiple regression model is given by

2 2 2 X ∼ Np(θ, σ Ip), S/σ ∼ χn,

2 where random variables X and S are independent and χn denotes a chi square distribu- tion with degrees of freedom n. In this chapter, we deal with the problem of estimating the mean vector θ and the variance σ2.

5.1 Minimax estimators of a normal variance

5.1.1 Introduction

In this section, we review my published paper Maruyama(1999a), where the problem of estimating the unknown variance σ2 by an estimator δ relative to the quadratic loss 2 2 2 function L1(σ , δ) = (δ/σ − 1) is considered. Stein(1964) showed that the best affine −1 equivariant estimator is δ0 = (n+2) S and it can be improved by considering a class of

78 2 scale equivariant estimators δφ = φ(Z)S, for Z = kXk /S. He really found an improved estimator δST = φST (Z)S, where φST (z) = min((n+2)−1, (n+p+2)−1(1+z)). Brewster and Zidek(1974) derived an improved generalized Bayes estimator δBZ = φBZ (Z)S, where R 1 1 λp/2−1(1 + λz)−(n+p)/2−1dλ φBZ (z) = R0 . n + p + 2 1 p/2−1 −(n+p)/2−2 0 λ (1 + λz) dλ We note that shrinkage estimators such as Stein’s procedure and Brewster-Zidek’s pro- cedure are derived by using the vague prior information that λ = kθk2/σ2 is close to 0. It goes without saying that we would like to get significant improvement of risk when ST the prior information is accurate. Though δ improves on δ0 at λ = 0, it is not an- alytic and thus inadmissible. On the other hand, Brewster-Zidek’s estimator does not improve on δ0 at λ = 0 though it is admissible as shown in Proskin(1985). Therefore it is desirable to get analytic improved estimators dominating δ0 especially at λ = 0. In V V this thesis, we give such a class of improved estimators δα = φα (Z)S, where

R 1 1 λαp/2−1(1 + λz)−α((n+p)/2+1)dλ φV (z) = R 0 , α n + p + 2 1 αp/2−1 −α((n+p)/2+1)−1 0 λ (1 + λz) dλ

V for α > 1. It is noted that δ1 coincides with Brewster-Zidek’s estimator. Further V V we demonstrate that δα with α > 1 improves on δ0 especially at λ = 0 and that δα approaches Stein’s estimator when α tends to infinity.

5.1.2 Main results

V 2 We first derive the estimator δα . The problem of estimating α times the variance ασ 2 2 2 relative to the loss function Lα(σ , d) = (d/(ασ ) − 1) is considered, which is slight different from that of the variance σ2. Among many generalized Bayes estimators for this problem, by selecting a suitable prior distribution, we can propose the estimator V 2 2 δα with α > 1 which is not suitable for minimax estimator of ασ but that of σ . So far V we have not determine whether or not δα with α > 1 is the generalize Bayes estimator of σ2.

79 V 2 Calculation for deriving the estimator δα is following. For η = 1/(ασ ), let the conditional distribution of θ given λ, 0 < λ < 1, be normal with mean 0 and covari- −1 −1 −1 ance matrix λ (1 − λ)α η Ip and density functions of λ and η are proportional to (α−1)p/2−1 (α−1)((p+n)/2+1)−1 λ I(0,1)(λ) and η I(0,∞)(η), respectively . Then the joint density function g(η, x, s) of η, X, S is given by Z µ ¶ µ ¶ ³ αη ´ λ p/2 λ αη g(η, x, s) ∝ ηp/2 exp − kx − θk2 η exp − kθk2 2 1 − λ 1 − λ 2 ·η(α−1)((p+n)/2+1)−1λ(α−1)p/2−1ηn/2 exp(−αηs/2)dθdλ Z µ ¶ µ ¶ λ p/2 kθ − (1 − λ)xk2 αηkxk2λ ∝ ηp/2 η exp −αη − 1 − λ 2(1 − λ) 2 ·η(α−1)((p+n)/2+1)−1λ(α−1)p/2−1ηn/2 exp(−αηs/2)dθdλ Z µ ¶ 1 kxk2λ + s ∝ ηα((p+n)/2+1)−2 λαp/2−1 exp −αη dλ. 0 2

2 2 As a generalized Bayes estimator for the loss function Lα(d, ασ ) is E(η | X,S)/E(η | X,S), we have the generalized Bayes estimator R R 1 λαp/2−1 ∞ ηα((p+n)/2+1)−1 exp (−αη(kXk2λ + S)/2) dλ δV = R0 R0 α 1 αp/2−1 ∞ α((p+n)/2+1) 2 0 λ 0 η exp (−αη(kXk λ + S)/2) dλ R 1 1 λαp/2−1(1 + λZ)−α((n+p)/2+1)dλ = R 0 S. n + p + 2 1 αp/2−1 −α((n+p)/2+1)−1 0 λ (1 + λZ) dλ

V In the following, φα (z) is represented through the hypergeometric function

X∞ (a) (b) xn F (a, b, c, x) = 1 + n n for (a) = a · (a + 1) ··· (a + n − 1). (c) n! n n=1 n The following facts about F (a, b, c, x), from Abramowitz and Stegun(1964), are needed; Z x xa ta−1(1 − t)b−1dt = F (a, 1 − b, a + 1, x) for a, b > 1, (5.1) 0 a

F (a, b, c, x) = (1 − x)c−a−bF (c − a, c − b, c, x), (5.2)

80 (c − a − b)F (a, b, c, x) − (c − a)F (a − 1, b, c, x) + b(1 − x)F (a, b + 1, c, x) = 0, (5.3)

(b − a)(1 − x)F (a, b, c, x) − (c − a)F (a − 1, b, c, x) + (c − b)F (a, b − 1, c, x) = 0, (5.4)

F (a, b, c, 1) = ∞ when c − a − b ≤ −1. (5.5)

Making a transformation and using (5.1) and (5.2), we have

Z 1 Z 1 λαp/2−1(1 + λz)−α((n+p)/2+1)dλ/ λαp/2−1(1 + λz)−α((n+p)/2+1)−1dλ 0 0 Z z Z z z+1 z+1 = tαp/2−1(1 − t)α(n/2+1)−1dt/ tαp/2−1(1 − t)α(n/2+1)dt 0 0 F (1, α(n + p + 2)/2, αp/2 + 1, z/(z + 1)) = (z + 1) . F (1, α(n + p + 2)/2 + 1, αp/2 + 1, z/(z + 1))

V Moreover by (5.3) and (5.4), φα (z) is expressed as · ¸ 1 p z + 1 φV (z) = 1 − α n + 2 p + n + 2 F (1, α(p + n + 2)/2 + 1, αp/2 + 1, z/(z + 1)) · ¸ 1 p = 1 − . (5.6) n + 2 (n + 2)F (1, α(p + n + 2)/2, αp/2 + 1, z/(z + 1)) + p

The following sufficient condition for minimaxity is very useful.

Theorem 5.1 (Brewster and Zidek(1974)). Assume that

(E1) φ(z) is nondecreasing and limz→∞ φ(z) = 1/(n + 2)

(E2) φ(z) ≥ φBZ (z).

Then δ(X,S) = φ(Z)S is minimax. proof. See Brewster and Zidek(1974) and Kubokawa(1994).

Making use of this theorem, we have the following.

81 V Theorem 5.2. The estimator δα with α ≥ 1 is minimax.

V proof. We shall verify that φα (z) with α ≥ 1 satisfies the conditions (E1) and (E2). V Since F (1, α(p + n + 2)/2, αp/2 + 1, z/(z + 1)) is increasing in z, φα (z) is increasing in V V z. By (5.5), it is clear that limz→∞ φα (z) = 1/(n + 2). In order to show that φα (z) ≥ V φ1 (z) for α ≥ 1, we have only to check that F (1, α(p + n + 2)/2, αp/2 + 1, z/(z + 1)) is increasing in α, which is easily verified because the coefficient of each term of the right-hand side of the equation

$$F(1,\alpha(p+n+2)/2,\alpha p/2+1,z/(z+1)) = 1+\frac{p+n+2}{p+2/\alpha}\,\frac{z}{1+z}+\frac{(p+n+2)(p+n+2+2/\alpha)}{(p+2/\alpha)(p+4/\alpha)}\left(\frac{z}{1+z}\right)^2+\cdots \qquad(5.7)$$
is increasing in $\alpha$. We have thus proved the theorem.
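For illustration, conditions (E1) and (E2) can be checked numerically from the closed form (5.6). The following minimal Python sketch is not part of the original argument; it assumes scipy's Gauss hypergeometric function hyp2f1, and the name phi_V is ours.

```python
# Numerical sketch: phi_alpha^V(z) of (5.6) via the Gauss hypergeometric
# function; spot-check (E1) and (E2) of Theorem 5.1 on a grid.
import numpy as np
from scipy.special import hyp2f1

def phi_V(z, alpha, p, n):
    # phi_alpha^V(z) as in (5.6); z may be a scalar or a numpy array
    x = z / (z + 1.0)
    F = hyp2f1(1.0, alpha * (p + n + 2) / 2.0, alpha * p / 2.0 + 1.0, x)
    return (1.0 - p / ((n + 2) * F + p)) / (n + 2)

p, n = 10, 4
z = np.linspace(0.0, 50.0, 1001)
bz = phi_V(z, 1.0, p, n)                  # alpha = 1: Brewster-Zidek's factor
for alpha in (1.0, 2.0, 4.0, 10.0):
    v = phi_V(z, alpha, p, n)
    assert np.all(np.diff(v) >= -1e-12)   # (E1): nondecreasing in z
    assert np.all(v >= bz - 1e-12)        # (E2): phi_alpha >= phi_1 = phi^BZ
print(phi_V(1e8, 2.0, p, n), 1.0 / (n + 2))   # the limit in (E1) is 1/(n+2)
```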

Now we investigate the nature of the risk of $\delta_\alpha^V$ with $\alpha>1$. By using Kubokawa (1994)'s method, the risk difference between $\delta_0$ and $\delta_\alpha^V$ at $\lambda=0$ is written as
$$R(0,\delta_0)-R(0,\delta_\alpha^V) = 2\int_0^\infty \frac{d}{dz}\phi_\alpha^V(z)\left(\phi_\alpha^V(z)-\phi_1^V(z)\right)\int_0^\infty t^2 F_p(zt)f_n(t)\,dt\,dz,$$
where $f_k(t)$ and $F_k(t)$ designate the density and the distribution function of $\chi_k^2$. Therefore we see that Brewster-Zidek's estimator ($\delta_\alpha^V$ with $\alpha=1$) does not improve upon the best equivariant estimator at $\lambda=0$. See also Rukhin (1992). On the other hand, since $\phi_\alpha^V(z)$ is strictly increasing in $\alpha$, $\delta_\alpha^V$ with $\alpha>1$ improves on the best equivariant estimator especially at $\lambda=0$. Figure 5.1 gives a comparison of the respective risks of the best equivariant estimator, Stein's estimator, Brewster-Zidek's estimator, and $\delta_\alpha^V$ with $\alpha=2,4,10$, for $p=10$ and $n=4$. This figure reveals that the risk behavior of $\delta_\alpha^V$ with $\alpha=10$ is similar to that of Stein's truncated estimator. In fact, we have the following result.

Proposition 5.3. $\delta_\alpha^V$ approaches Stein's estimator when $\alpha$ tends to infinity.

proof. Since $F(1,\alpha(p+n+2)/2,\alpha p/2+1,z/(z+1))$ is increasing in $\alpha$, by the monotone convergence theorem this function converges to $\sum_{i=0}^\infty \{(p(z+1))^{-1}(n+p+2)z\}^i$ when

$\alpha$ tends to infinity. Considering the two cases $(n+p+2)z<(\ge)\;p(z+1)$, we obtain $\lim_{\alpha\to\infty}\phi_\alpha^V(z)=(1+z)/(n+p+2)$ if $z<p/(n+2)$; $=1/(n+2)$ otherwise. This completes the proof.
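The limit in Proposition 5.3 can also be observed numerically; a short self-contained sketch under the same assumptions as before:

```python
# Sketch: for large alpha, phi_alpha^V(z) approaches Stein's truncated factor
# min((1+z)/(n+p+2), 1/(n+2)); convergence is slowest near the kink z = p/(n+2).
import numpy as np
from scipy.special import hyp2f1

def phi_V(z, alpha, p, n):
    F = hyp2f1(1.0, alpha * (p + n + 2) / 2.0, alpha * p / 2.0 + 1.0, z / (z + 1.0))
    return (1.0 - p / ((n + 2) * F + p)) / (n + 2)

p, n = 10, 4
z = np.linspace(0.01, 4.0, 400)
stein = np.minimum((1.0 + z) / (n + p + 2), 1.0 / (n + 2))
for alpha in (1, 10, 50):
    print(alpha, np.max(np.abs(phi_V(z, alpha, p, n) - stein)))  # gap shrinks
```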

[Figure 5.1: plot of risk (vertical axis, roughly 0.27 to 0.35) against the noncentrality parameter (horizontal axis, 0 to 14); plot omitted in this text version.]

Figure 5.1: Comparison of the risks of the estimators $\delta_0$ (best equivariant), $\delta^{ST}$ (Stein), $\delta^{BZ}$ (Brewster-Zidek), and $\delta_\alpha^V$ with $\alpha=2,4,10$. 'Noncentrality parameter' denotes $\|\theta\|/\sigma$; V$\alpha$ denotes the risk of $\delta_\alpha^V$.
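Risk curves of this kind are easy to approximate by Monte Carlo simulation. The sketch below is our own code, not the computation behind the original figure; phi_V is restated from the sketches above for self-containedness, and $\sigma^2=1$ without loss of generality.

```python
# Monte Carlo sketch of the risks compared in Figure 5.1, under the loss
# L1(sigma^2, d) = (d/sigma^2 - 1)^2 with sigma^2 = 1.
import numpy as np
from scipy.special import hyp2f1

def phi_V(z, alpha, p, n):
    F = hyp2f1(1.0, alpha * (p + n + 2) / 2.0, alpha * p / 2.0 + 1.0, z / (z + 1.0))
    return (1.0 - p / ((n + 2) * F + p)) / (n + 2)

def risks(theta_norm, p=10, n=4, reps=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(reps, p))
    x[:, 0] += theta_norm                 # theta = (||theta||, 0, ..., 0)
    s = rng.chisquare(n, size=reps)       # S ~ chi^2_n since sigma^2 = 1
    z = (x ** 2).sum(axis=1) / s
    out = {"equivariant": np.mean((s / (n + 2) - 1.0) ** 2),
           "Stein": np.mean((np.minimum(1.0 / (n + 2),
                                        (1 + z) / (n + p + 2)) * s - 1.0) ** 2)}
    for alpha in (1.0, 2.0, 4.0, 10.0):   # alpha = 1 is Brewster-Zidek
        out["V%g" % alpha] = np.mean((phi_V(z, alpha, p, n) * s - 1.0) ** 2)
    return out

print(risks(0.0))   # at ||theta|| = 0 the improvement grows with alpha
```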

Remark 5.4. Ghosh (1994) proposed the minimax generalized Bayes estimator $\delta_k^{GH}=\phi_k^{GH}(Z)S$, where

$$\phi_k^{GH}(z) = \frac{1}{n+p+2(k+2)}\,\frac{\int_0^1\lambda^{p/2+k}(1+\lambda z)^{-(n+p)/2-(k+2)}d\lambda}{\int_0^1\lambda^{p/2+k}(1+\lambda z)^{-(n+p)/2-(k+3)}d\lambda}$$

for $-1-p/2<k\le -1$. Clearly $\delta_k^{GH}$ with $k=-1$ coincides with Brewster-Zidek's estimator, and we can see that $\delta_k^{GH}$ with $-1-p/2<k<-1$ improves on the best equivariant estimator especially at $\lambda=0$. It is noted that, without the troublesome calculation in Ghosh (1994), the minimaxity of Ghosh's estimators is easily proved in the same way as Theorem 5.2, because $\phi_k^{GH}$ with $-1-p/2<k\le -1$ satisfies conditions (E1) and (E2).

Remark 5.5. For the entropy loss function $L_0(\sigma^2,d)=d/\sigma^2-\log(d/\sigma^2)-1$, the discussions in this section apply directly. In this case, the best equivariant estimator is the unbiased estimator $\delta_0=S/n$ and Stein's truncated estimator is $\delta^{ST}=\phi^{ST}(Z)S$, where $\phi^{ST}(z)=\min(1/n,(n+p)^{-1}(1+z))$. Moreover, Brewster-Zidek's estimator is $\delta^{BZ}=\phi^{BZ}(Z)S$, where

$$\phi^{BZ}(z) = \frac{1}{n+p}\,\frac{\int_0^1\lambda^{p/2-1}(1+\lambda z)^{-(n+p)/2}d\lambda}{\int_0^1\lambda^{p/2-1}(1+\lambda z)^{-(n+p)/2-1}d\lambda}.$$

Then the proposed minimax estimator is $\delta_\alpha^V=\phi_\alpha^V(Z)S$, where

$$\phi_\alpha^V(z) = \frac{1}{n+p}\,\frac{\int_0^1\lambda^{\alpha p/2-1}(1+\lambda z)^{-\alpha(n+p)/2}d\lambda}{\int_0^1\lambda^{\alpha p/2-1}(1+\lambda z)^{-\alpha(n+p)/2-1}d\lambda}$$
for $\alpha>1$.

5.2 Estimation of a multivariate normal mean with unknown variance

5.2.1 Introduction

In this section, we review the results of Maruyama (1999b), where the problem of estimating the mean vector $\theta$ relative to the quadratic loss function $\|\delta-\theta\|^2/\sigma^2$ is considered. The usual minimax estimator $X$ is inadmissible for $p\ge 3$. James and Stein (1961) constructed the improved estimator $\delta^{JS}=[1-((p-2)/(n+2))S/\|X\|^2]X$, which is in turn dominated by the James-Stein positive-part estimator $\delta_+^{JS}=\max(0,\delta^{JS})$ as shown in

Baranchik (1964). We note that shrinkage estimators such as the James-Stein procedure are derived by using the vague prior information that $\lambda=\|\theta\|^2/\sigma^2$ is close to $0$. It goes without saying that we would like to obtain a significant improvement in risk when the prior information is accurate. Though $\delta_+^{JS}$ improves on $\delta^{JS}$ at $\lambda=0$, it is not analytic and thus inadmissible. Kubokawa (1991) showed that one of the minimax generalized Bayes estimators derived by Lin and Tsai (1973) dominates $\delta^{JS}$. This estimator, however, does not improve on $\delta^{JS}$ at $\lambda=0$. Therefore it is desirable to obtain analytic improved estimators dominating $\delta^{JS}$ especially at $\lambda=0$. In this thesis, we show that the estimators in the subclass of Lin-Tsai's minimax generalized Bayes estimators $\delta_\alpha^M=(1-\psi_\alpha^M(Z)/Z)X$, where $Z=\|X\|^2/S$ and
$$\psi_\alpha^M(z) = z\,\frac{\int_0^1\lambda^{\alpha(p-2)/2}(1+\lambda z)^{-\alpha(n+p)/2-1}d\lambda}{\int_0^1\lambda^{\alpha(p-2)/2-1}(1+\lambda z)^{-\alpha(n+p)/2-1}d\lambda}
$$

with $\alpha\ge 1$, dominate $\delta^{JS}$. It is noted that $\delta_\alpha^M$ with $\alpha=1$ coincides with the estimator derived by Kubokawa (1991). Further, we demonstrate that $\delta_\alpha^M$ with $\alpha>1$ improves on $\delta^{JS}$ especially at $\lambda=0$ and that $\delta_\alpha^M$ approaches the James-Stein positive-part estimator when $\alpha$ tends to infinity.

5.2.2 Main results

We first represent $\psi_\alpha^M(z)$ through the hypergeometric function $F(a,b,c,x)$. The formulas (5.1) to (5.5) and

$$c(1-x)F(a,b,c,x) - cF(a,b-1,c,x) + (c-a)xF(a,b,c+1,x) = 0 \qquad(5.8)$$
are needed. Making a transformation and using (5.1) and (5.2), we have

$$\begin{aligned}
&\int_0^1\lambda^{\alpha(p/2-1)}(1+\lambda z)^{-\alpha(n+p)/2-1}d\lambda \Big/ \int_0^1\lambda^{\alpha(p/2-1)-1}(1+\lambda z)^{-\alpha(n+p)/2-1}d\lambda\\
&\quad= \frac{\alpha(p/2-1)}{\alpha(p/2-1)+1}\,\frac{F(1,\alpha(n+p)/2+1,\alpha(p/2-1)+2,z/(z+1))}{F(1,\alpha(n+p)/2+1,\alpha(p/2-1)+1,z/(z+1))}.
\end{aligned}$$

Moreover, by (5.3) and (5.8), we obtain
$$\psi_\alpha^M(z) = \frac{p-2}{n+2}\left[1-\frac{n+p}{(n+2)F(1,\alpha(n+p)/2,\alpha(p-2)/2+1,z/(z+1))+p-2}\right]. \qquad(5.9)$$

Making use of (5.9), we can easily prove the following theorem.

Theorem 5.6. The estimator $\delta_\alpha^M$ with $\alpha\ge 1$ dominates $\delta^{JS}$.

proof. We shall verify that $\psi_\alpha^M(z)$ with $\alpha\ge 1$ satisfies the condition, derived by Kubokawa (1994), for dominating the James-Stein estimator: for $\delta_\psi=(1-\psi(Z)/Z)X$, $\psi(z)$ is nondecreasing, $\lim_{z\to\infty}\psi(z)=(p-2)/(n+2)$, and $\psi(z)\ge\psi_1^M(z)$. Since $F(1,\alpha(n+p)/2,\alpha(p-2)/2+1,z/(z+1))$ is increasing in $z$, $\psi_\alpha^M(z)$ is increasing in $z$. By (5.5), it is clear that $\lim_{z\to\infty}\psi_\alpha^M(z)=(p-2)/(n+2)$. In order to show that $\psi_\alpha^M(z)\ge\psi_1^M(z)$ for $\alpha\ge 1$, we have only to check that $F(1,\alpha(n+p)/2,\alpha(p-2)/2+1,z/(z+1))$ is increasing in $\alpha$, which is easily verified because the coefficient of each term of the right-hand side of the equation

$$F(1,\alpha(n+p)/2,\alpha(p-2)/2+1,z/(z+1)) = 1+\frac{p+n}{p-2+2/\alpha}\,\frac{z}{1+z}+\frac{(p+n)(p+n+2/\alpha)}{(p-2+2/\alpha)(p-2+4/\alpha)}\left(\frac{z}{1+z}\right)^2+\cdots \qquad(5.10)$$
is increasing in $\alpha$. We have thus proved the theorem.
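As with (5.6), the closed form (5.9) makes the three conditions in this proof easy to verify numerically. A Python sketch under the same assumptions as before, with our own function names:

```python
# Sketch: psi_alpha^M(z) of (5.9); check that it is increasing in z, tends to
# (p-2)/(n+2), and dominates psi_1^M (Kubokawa's factor), as Theorem 5.6 uses.
import numpy as np
from scipy.special import hyp2f1

def psi_M(z, alpha, p, n):
    F = hyp2f1(1.0, alpha * (n + p) / 2.0, alpha * (p - 2) / 2.0 + 1.0, z / (z + 1.0))
    return (p - 2.0) / (n + 2.0) * (1.0 - (n + p) / ((n + 2) * F + p - 2.0))

p, n = 8, 4
z = np.linspace(0.0, 50.0, 1001)
k1 = psi_M(z, 1.0, p, n)                  # alpha = 1: Kubokawa's estimator
for alpha in (1.0, 2.0, 10.0):
    v = psi_M(z, alpha, p, n)
    assert np.all(np.diff(v) >= -1e-12) and np.all(v >= k1 - 1e-12)
print(psi_M(1e8, 2.0, p, n), (p - 2.0) / (n + 2.0))   # limit check
```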

Now we investigate the nature of the risk of $\delta_\alpha^M$ with $\alpha\ge 1$. By using Kubokawa (1994)'s method, the risk difference between the James-Stein estimator and $\delta_\alpha^M$ at $\lambda=0$ is written as

$$R(0,\delta^{JS})-R(0,\delta_\alpha^M) = 2(n+p)\int_0^\infty \frac{d}{dz}\psi_\alpha^M(z)\left(\frac{\psi_\alpha^M(z)-\psi_1^M(z)}{1+\psi_1^M(z)}\right)\int_0^z s^{-1}h(s)\,ds\,dz,$$
where $h(s)=\int_0^\infty v\,f_n(v)f_p(vs)\,dv$ and $f_p(t)$ designates the density of $\chi_p^2$. Therefore we see that Kubokawa (1991)'s estimator ($\delta_\alpha^M$ with $\alpha=1$) does not improve upon the James-Stein estimator at $\lambda=0$. On the other hand, since $\psi_\alpha^M(z)$ is strictly increasing in $\alpha$, $\delta_\alpha^M$ with $\alpha>1$ improves on the James-Stein estimator especially at $\lambda=0$ and the risk

$R(\lambda,\delta_\alpha^M)$ at $\lambda=0$ is decreasing in $\alpha$. Figure 5.2 gives a comparison of the respective risks of the James-Stein estimator, its positive-part rule, and $\delta_\alpha^M$ with $\alpha=1,2,10$ for $p=8$ and $n=4$.

[Figure 5.2: plot of risk (vertical axis, roughly 3 to 8) against the noncentrality parameter (horizontal axis, 0 to 6); plot omitted in this text version.]

Figure 5.2: Comparison of the risks of the estimators $\delta^{JS}$, $\delta_+^{JS}$, and $\delta_\alpha^M$ with $\alpha=1,2,10$ ($\alpha=1$ is Kubokawa's estimator). 'Noncentrality parameter' denotes $\|\theta\|/\sigma$; M$\alpha$ denotes the risk of $\delta_\alpha^M$.

This figure reveals that the risk behavior of $\delta_\alpha^M$ with $\alpha=10$ is similar to that of the James-Stein positive-part estimator. In fact, we have the following result.
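As with Figure 5.1, the curves can be approximated by simulation; the following sketch is our own code, with psi_M restated from the previous sketch.

```python
# Monte Carlo sketch of the risks compared in Figure 5.2 under
# ||d - theta||^2 / sigma^2, with sigma^2 = 1.
import numpy as np
from scipy.special import hyp2f1

def psi_M(z, alpha, p, n):
    F = hyp2f1(1.0, alpha * (n + p) / 2.0, alpha * (p - 2) / 2.0 + 1.0, z / (z + 1.0))
    return (p - 2.0) / (n + 2.0) * (1.0 - (n + p) / ((n + 2) * F + p - 2.0))

def risks(theta_norm, p=8, n=4, reps=100_000, seed=1):
    rng = np.random.default_rng(seed)
    theta = np.zeros(p)
    theta[0] = theta_norm
    x = rng.normal(size=(reps, p)) + theta
    s = rng.chisquare(n, size=reps)
    w = (x ** 2).sum(axis=1)
    z = w / s

    def risk(factor):                      # risk of the estimator factor * X
        d = x * factor[:, None] - theta
        return np.mean((d * d).sum(axis=1))

    js = 1.0 - (p - 2.0) / (n + 2.0) * s / w
    out = {"JS": risk(js), "JS+": risk(np.maximum(js, 0.0))}
    for alpha in (1.0, 2.0, 10.0):
        out["M%g" % alpha] = risk(1.0 - psi_M(z, alpha, p, n) / z)
    return out

print(risks(0.0))   # at ||theta|| = 0 the improvement over JS grows with alpha
```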

Proposition 5.7. $\delta_\alpha^M$ approaches the James-Stein positive-part estimator when $\alpha$ tends to infinity, that is, $\lim_{\alpha\to\infty}\delta_\alpha^M=\delta_+^{JS}$.

proof. Since $F(1,\alpha(n+p)/2,\alpha(p-2)/2+1,z/(z+1))$ is increasing in $\alpha$, by the monotone convergence theorem this function converges to $\sum_{i=0}^\infty\{((p-2)(z+1))^{-1}(n+p)z\}^i$.

Considering the two cases $(n+p)z<(\ge)\;(p-2)(z+1)$, we obtain $\lim_{\alpha\to\infty}\psi_\alpha^M(z)=z$ if $z<(p-2)/(n+2)$; $=(p-2)/(n+2)$ otherwise. This completes the proof.
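The limit can again be seen numerically (psi_M as above, restated so the sketch runs on its own):

```python
# Sketch: for large alpha, psi_alpha^M(z) approaches min(z, (p-2)/(n+2)),
# the factor corresponding to the James-Stein positive-part estimator.
import numpy as np
from scipy.special import hyp2f1

def psi_M(z, alpha, p, n):
    F = hyp2f1(1.0, alpha * (n + p) / 2.0, alpha * (p - 2) / 2.0 + 1.0, z / (z + 1.0))
    return (p - 2.0) / (n + 2.0) * (1.0 - (n + p) / ((n + 2) * F + p - 2.0))

p, n = 8, 4
z = np.linspace(0.01, 3.0, 300)
target = np.minimum(z, (p - 2.0) / (n + 2.0))
for alpha in (1, 10, 50):
    print(alpha, np.max(np.abs(psi_M(z, alpha, p, n) - target)))  # gap shrinks
```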

We consider here the connection between the estimation of the mean vector and that of the variance. It is noted that the James-Stein estimator is expressed as $(1-((p-2)/\|X\|^2)\hat\sigma_0^2)X$ for $\hat\sigma_0^2=(n+2)^{-1}S$, and that $\hat\sigma_0^2$ is the best affine equivariant estimator of the variance under the quadratic loss function $L_1(\sigma^2,d)=(d/\sigma^2-1)^2$, as seen in Section 5.1. Stein (1964) showed that $\hat\sigma_0^2$ is inadmissible. George (1990) suggested that it might be possible to use an improved variance estimator to improve on the James-Stein estimator. In the same way as Theorem 5.6, we have the following result, which includes Berry (1994)'s and Kubokawa et al. (1993)'s.

Theorem 5.8. Assume that $\hat\sigma^2$ is an improved estimator of the variance which satisfies the conditions (E1) and (E2) of Theorem 5.1. Then $\delta=(1-(p-2)\hat\sigma^2/\|X\|^2)X$ dominates $\delta^{JS}$.

It should, however, be noticed that the estimator derived from Theorem 5.8 is obviously inadmissible, since $1-(p-2)\hat\sigma^2/\|X\|^2$ sometimes takes a negative value.

Chapter 6

A New Positive Estimator of Loss Function

6.1 Introduction

In this chapter, we review Maruyama (1997). Let $X$ be a random variable having a $p$-variate normal distribution $N_p(\theta,I_p)$. We assume that, for the problem of estimating $\theta$ relative to the quadratic loss function, the usual minimax estimator $X$ is used. It is then of great interest to assess the loss $\|X-\theta\|^2$, which depends on $\theta$ and is unobservable. Therefore we wish to estimate it from the data using a rule $\delta(X)$. To study how well $\delta(X)$ estimates the loss, we use the squared error loss function; thus the expected distance, or risk, incurred by $\delta(X)$ is $R(\delta,\theta)=E[\{\delta(X)-\|X-\theta\|^2\}^2]$.

Johnstone (1987) showed that the unbiased estimator $\delta^U(X)=p$ of the loss is admissible when $p\le 4$, but is dominated by $\delta^{JO}(X)=p-2(p-4)/\|X\|^2$ when $p\ge 5$. While $\|X-\theta\|^2$ is non-negative, $\delta^{JO}(X)$ takes negative values when $\|X\|^2<2(p-4)/p$. We need to modify $\delta^{JO}(X)$ in order to eliminate this undesirable property. One possible modification, proposed in Johnstone (1987), is to truncate the rule at zero, that is, $\delta_0^{JO}(X)=\max(0,\,p-2(p-4)/\|X\|^2)$. However, this modification is imperfect because the loss is positive with probability 1. In this thesis, we propose a class of positive

estimators improving on $\delta^{JO}(X)$:
$$\delta_a^{JO}(X) = \max\left(a,\;p-\frac{2(p-4)}{\|X\|^2}\right) \quad\text{for } 0<a\le 2\,\frac{p}{p-2}.$$

Through our simulation results, we see that the positive estimators should be chosen if one has no information about $\|\theta\|^2$. The case where the covariance matrix of $X$ is $\sigma^2 I_p$ for unknown $\sigma^2$ is also discussed.
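For concreteness, the competing loss estimators take only a few lines to implement; the following Python sketch is our own illustration (the names are ours):

```python
# Sketch: the unbiased, Johnstone, and positive (truncated-at-a) estimators
# of the loss ||X - theta||^2 when X ~ N_p(theta, I_p), p >= 5.
import numpy as np

def delta_U(x):
    return float(len(x))                        # unbiased estimator: p

def delta_JO(x):
    p = len(x)
    return p - 2.0 * (p - 4.0) / np.dot(x, x)   # Johnstone's estimator

def delta_a(x, a):
    return max(a, delta_JO(x))                  # a in (0, 2p/(p-2)]

x = np.random.default_rng(2).normal(size=5)     # p = 5, theta = 0
print(delta_U(x), delta_JO(x), delta_a(x, a=5 / 3))   # a = p/(p-2)
```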

6.2 A positive estimator improving on Johnstone’s estimator

We consider a class of orthogonally invariant estimators $\delta_\phi(X)=p-\phi(\|X\|^2)/\|X\|^2$. Using Stein (1973)'s identity, Lemma 3.2, Johnstone (1987) derived the unbiased estimator of the risk, which is written as
$$\begin{aligned}
R(\delta_\phi(X),\theta) &= E\left[\left(p-\frac{\phi(\|X\|^2)}{\|X\|^2}-\|X-\theta\|^2\right)^2\right]\\
&= 2p + E\left[4(p-4)\left(\frac{\phi'(\|X\|^2)}{\|X\|^2}-\frac{\phi(\|X\|^2)}{\|X\|^4}\right)\right] + E\left[8\phi''(\|X\|^2)+\frac{\phi^2(\|X\|^2)}{\|X\|^4}\right]. \qquad(6.1)
\end{aligned}$$

Then by the above equation and Kubokawa(1994)’s argument, the condition on φ(w) for the improvement on δJO(X) is obtained.

Theorem 6.1. Assume the following conditions:

1. φ(w) is nondecreasing and limw→∞ φ(w) = 2(p − 4),

2. φ(w) ≥ min(2(p − 4), φ0(w)) for every w, where,

$$\phi_0(w) = 2(p-4) + (2-4/w)\,\frac{w^{p/2-1}e^{-w/2}}{\int_0^w y^{p/2-3}e^{-y/2}\,dy}.$$

Then $\delta_\phi(X)$ improves on $\delta^{JO}(X)$.

proof. We want to obtain the condition on $\phi(w)$ satisfying $R(\delta^{JO},\theta)-R(\delta_\phi,\theta)\ge 0$ for every $\theta$. By using equation (6.1) and noting $\phi(\infty)=2(p-4)$, the risk difference is written as

$$\begin{aligned}
R(\delta^{JO},\theta)-R(\delta_\phi,\theta) &= E\left[-\frac{4(p-4)^2}{\|X\|^4}-4(p-4)\left(\frac{\phi'(\|X\|^2)}{\|X\|^2}-\frac{\phi(\|X\|^2)}{\|X\|^4}\right)-8\phi''(\|X\|^2)-\frac{\phi^2(\|X\|^2)}{\|X\|^4}\right]\\
&= E\left[\frac{\phi(\infty)(\phi(\infty)-4(p-4))}{\|X\|^4}-\frac{\phi^2(\|X\|^2)}{\|X\|^4}+4(p-4)\,\frac{\phi(\|X\|^2)}{\|X\|^4}\right]\\
&\qquad + E\left[-4(p-4)\,\frac{\phi'(\|X\|^2)}{\|X\|^2}-8\phi''(\|X\|^2)\right]. \qquad(6.2)
\end{aligned}$$
The first term of the right-hand side of the extreme equation in (6.2) is rewritten as
$$\begin{aligned}
&E\left[\int_1^\infty \frac{d}{dt}\left\{\frac{\phi(t\|X\|^2)\left(\phi(t\|X\|^2)-4(p-4)\right)}{\|X\|^4}\right\}dt\right]\\
&\quad= E\left[\int_1^\infty\left(\frac{2\phi(t\|X\|^2)\phi'(t\|X\|^2)}{\|X\|^2}-\frac{4(p-4)\phi'(t\|X\|^2)}{\|X\|^2}\right)dt\right]\\
&\quad= 2\int_0^\infty\int_1^\infty \left(\phi(tx)-2(p-4)\right)\frac{\phi'(tx)}{x}\,f_p(x,\lambda)\,dt\,dx\\
&\quad= 2\int_0^\infty\int_1^\infty \left(\phi(w)-2(p-4)\right)\frac{\phi'(w)}{w}\,f_p(w/t,\lambda)\,dt\,dw\\
&\quad= 2\int_0^\infty \phi'(w)\left(\phi(w)-2(p-4)\right)\int_0^w \frac{1}{y^2}\,f_p(y,\lambda)\,dy\,dw.
\end{aligned}$$
It is noted that $f_p(t,\lambda)$ and $f_p(t)$ designate the density of $\chi_p^2(\lambda)$ with noncentrality $\lambda=\|\theta\|^2/2$ and the density of $\chi_p^2$, respectively. For the second term of the right-hand side of the extreme equation in (6.2), applying an integration by parts gives
$$\int_0^\infty \phi''(w)f_p(w,\lambda)\,dw = -\int_0^\infty \phi'(w)\,\frac{d}{dw}f_p(w,\lambda)\,dw.$$
Hence the second term is expressed by

$$\int_0^\infty \phi'(w)\left[8\,\frac{d}{dw}f_p(w,\lambda)-\frac{4(p-4)}{w}\,f_p(w,\lambda)\right]dw.$$

Hence we get

$$R(\delta^{JO},\theta)-R(\delta_\phi,\theta) = 2\int_0^\infty \phi'(w)\left[\left(\phi(w)-2(p-4)\right)\int_0^w y^{-2}f_p(y,\lambda)\,dy - \frac{2(p-4)}{w}\,f_p(w,\lambda) + 4\,\frac{d}{dw}f_p(w,\lambda)\right]dw. \qquad(6.3)$$
It is here noted that the following inequalities hold:

$$f_p(w,\lambda)\Big/\int_0^w \frac{f_p(y,\lambda)}{y^2}\,dy \;\ge\; f_p(w)\Big/\int_0^w \frac{f_p(y)}{y^2}\,dy \qquad(6.4)$$
and
$$\frac{d}{dw}f_p(w,\lambda)\Big/f_p(w,\lambda) \;\ge\; \frac{d}{dw}f_p(w)\Big/f_p(w) = \frac{p/2-1}{w}-\frac{1}{2}. \qquad(6.5)$$
Combining (6.3), (6.4) and (6.5), we see that $R(\delta^{JO},\theta)-R(\delta_\phi,\theta)\ge 0$ for every $\theta$, if

1. $\phi(w)$ is nondecreasing and $\lim_{w\to\infty}\phi(w)=2(p-4)$,

2. for φ(w) − 2(p − 4) < 0,

$$\left(\phi(w)-2(p-4)\right)\int_0^w \frac{f_p(y)}{y^2}\,dy\Big/f_p(w) \;-\; \frac{2(p-4)}{w} + \frac{2p-4}{w} - 2 \;\ge\; 0,$$
which are guaranteed by the conditions of this theorem, and the proof is complete.

The following result is a simple and useful consequence of Theorem 6.1.

Corollary 6.1.1. The estimator $\delta_\phi(X)$ is superior to $\delta^{JO}(X)$ if

1. $\phi(w)$ is nondecreasing and $\lim_{w\to\infty}\phi(w)=2(p-4)$,

2. $\phi(w)\ge\min\left(2(p-4),\,p(p-4)w/(p-2)\right)$ for every $w$.

The class of improved estimators given by Corollary 6.1.1 includes the nonnegative-part Johnstone estimator $\delta_0^{JO}(X)=\max(0,\,p-2(p-4)/\|X\|^2)$ and the positive estimators
$$\delta_a^{JO}(X) = \max\left(a,\;p-\frac{2(p-4)}{\|X\|^2}\right) \quad\text{for } 0\le a\le 2\,\frac{p}{p-2}.$$

proof. Our task is only to show that $\phi_0(w)\le p(p-4)w/(p-2)$. By applying an integration by parts, $\phi_0(w)$ is rewritten as
$$\begin{aligned}
\phi_0(w) &= 2(p-4)+(2-4/w)\,\frac{w^{p/2-1}e^{-w/2}}{\int_0^w y^{p/2-3}e^{-y/2}\,dy}\\
&= 2\,\frac{(p-4)\int_0^w y^{p/2-3}e^{-y/2}\,dy+(1-2/w)w^{p/2-1}e^{-w/2}}{\int_0^w y^{p/2-3}e^{-y/2}\,dy}\\
&= 2\,\frac{\int_0^w y^{p/2-2}e^{-y/2}\,dy}{\int_0^w y^{p/2-3}e^{-y/2}\,dy}+2\,\frac{w^{p/2-1}e^{-w/2}}{\int_0^w y^{p/2-3}e^{-y/2}\,dy}.
\end{aligned}$$
Noting the following two inequalities:
$$\int_0^w y^{p/2-3}e^{-y/2}\,dy \ge e^{-w/2}\int_0^w y^{p/2-3}\,dy = \frac{2}{p-4}\,w^{p/2-2}e^{-w/2}$$
and
$$\frac{\int_0^w y^{p/2-2}e^{-y/2}\,dy}{\int_0^w y^{p/2-3}e^{-y/2}\,dy} = w\,\frac{\int_0^1 t^{p/2-2}e^{-wt/2}\,dt}{\int_0^1 t^{p/2-3}e^{-wt/2}\,dt} \le w\,\frac{\int_0^1 t^{p/2-2}\,dt}{\int_0^1 t^{p/2-3}\,dt} = \frac{p-4}{p-2}\,w,$$
we get the required inequality
$$\phi_0(w) \le 2\,\frac{p-4}{p-2}\,w + (p-4)w = p\,\frac{p-4}{p-2}\,w.$$
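The key bound in this proof is easy to spot-check numerically; a sketch (our own code, scipy assumed):

```python
# Sketch: check phi_0(w) <= p(p-4)w/(p-2) on a grid, the inequality used in
# the proof of Corollary 6.1.1; note the bound is nearly tight as w -> 0.
import numpy as np
from scipy.integrate import quad

def phi_0(w, p):
    denom, _ = quad(lambda y: y ** (p / 2 - 3) * np.exp(-y / 2), 0.0, w)
    return 2 * (p - 4) + (2 - 4 / w) * w ** (p / 2 - 1) * np.exp(-w / 2) / denom

p = 6
for w in (0.01, 0.1, 0.5, 1.0, 2.0, 5.0, 20.0):
    assert phi_0(w, p) <= p * (p - 4) / (p - 2) * w + 1e-9
```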

Table 6.1 reports the results of simulation for the “relative risk improvement” which is defined by

$$\mathrm{RRI}(\delta) = 100\times\left\{R(\|\theta\|^2/p,\,\delta^{JO})-R(\|\theta\|^2/p,\,\delta)\right\}\Big/R(\|\theta\|^2/p,\,\delta^{JO}),$$

where $\delta=\delta_0^{JO},\,\delta_{p/(p-2)}^{JO},\,\delta_{2p/(p-2)}^{JO}$. We see that if one has the prior knowledge that $\|\theta\|^2$ nearly equals zero, then $\delta_0^{JO}$ may be desirable. When one has no information about $\|\theta\|^2$, positive estimators, for example $\delta_{p/(p-2)}^{JO}$ or $\delta_{2p/(p-2)}^{JO}$, should be chosen. Moreover, as seen in the case of $p=10$, improvements in risk over $\delta^{JO}$ are negligible for large values of $p$.

Table 6.1: The relative risk improvements in estimation of the loss function for p = 5, 6, 10 (columns are values of $\|\theta\|^2/p$).

  p    estimator                   0.0   0.2   0.4   0.6   0.8   1.0   1.2   1.4   1.6   1.8   2.0
  5    $\delta_0^{JO}$             4.92  2.25  1.08  1.06  0.93  0.36  0.24  0.18  0.12  0.10  0.02
       $\delta_{p/(p-2)}^{JO}$     4.89  2.31  1.16  1.14  1.00  0.41  0.27  0.20  0.14  0.12  0.03
       $\delta_{2p/(p-2)}^{JO}$    3.50  1.84  1.11  1.27  1.18  0.59  0.42  0.32  0.23  0.19  0.08
  6    $\delta_0^{JO}$             2.06  0.99  1.13  0.47  0.43  0.15  0.09  0.04  0.03  0.01  0.00
       $\delta_{p/(p-2)}^{JO}$     2.06  1.05  1.21  0.53  0.48  0.18  0.12  0.06  0.05  0.01  0.01
       $\delta_{2p/(p-2)}^{JO}$    1.14  0.85  1.29  0.68  0.62  0.29  0.22  0.13  0.10  0.05  0.02
  10   $\delta_0^{JO}$             0.07  0.04  0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00
       $\delta_{p/(p-2)}^{JO}$     0.06  0.05  0.01  0.01  0.00  0.00  0.00  0.00  0.00  0.00  0.00
       $\delta_{2p/(p-2)}^{JO}$    0.06  0.05  0.02  0.01  0.01  0.00  0.00  0.00  0.00  0.00  0.00
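Values of this kind can be reproduced by straightforward Monte Carlo; a self-contained sketch (our own code, not the original simulation program):

```python
# Monte Carlo sketch of the relative risk improvement (RRI) of the truncated
# loss estimators over Johnstone's delta^JO, at a given value of ||theta||^2/p.
import numpy as np

def rri(p, theta2_over_p, a, reps=400_000, seed=3):
    rng = np.random.default_rng(seed)
    theta = np.zeros(p)
    theta[0] = np.sqrt(theta2_over_p * p)
    x = rng.normal(size=(reps, p)) + theta
    loss = ((x - theta) ** 2).sum(axis=1)        # the quantity being estimated
    jo = p - 2.0 * (p - 4.0) / (x ** 2).sum(axis=1)
    r_jo = np.mean((jo - loss) ** 2)
    r_a = np.mean((np.maximum(jo, a) - loss) ** 2)
    return 100.0 * (r_jo - r_a) / r_jo

p = 5
for a in (0.0, p / (p - 2), 2 * p / (p - 2)):    # the three rows of Table 6.1
    print(a, rri(p, 0.0, a))                     # compare with the 0.0 column
```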

Now we extend the above results to the case where the covariance matrix of $X$ is $\sigma^2 I_p$ for unknown $\sigma^2$. Let $X$ and $S$ be independent random variables where $X$ has a $p$-variate normal distribution $N_p(\theta,\sigma^2 I_p)$ and $S/\sigma^2$ has a chi-square distribution $\chi_n^2$ with $n$ degrees of freedom. When $X$ is used to estimate $\theta$ under the loss $L(d,\theta,\sigma^2)=\|d-\theta\|^2/\sigma^2$, it is of great interest to estimate the loss $\|X-\theta\|^2/\sigma^2$ from the data using a rule $\delta(X,S)$. For evaluating the loss estimator $\delta(X,S)$, the quadratic loss $(\delta(X,S)-\|X-\theta\|^2/\sigma^2)^2$ is exploited.

We consider a class of estimators $\delta_\phi(X,S)=p-\phi(Z)/Z$ for $Z=\|X\|^2/S$. By the Stein and chi-square identities, we get the unbiased estimator of the risk $R(\delta_\phi,\theta,\sigma^2)$, which is written as
$$\begin{aligned}
R(\delta_\phi,\theta,\sigma^2) &= E\left[\left(p-\frac{\phi(Z)}{Z}-\frac{\|X-\theta\|^2}{\sigma^2}\right)^2\right]\\
&= 2p + E\left[\frac{\phi^2(Z)}{Z^2}-\frac{4(p-4)}{n+2}\,\frac{\phi(Z)}{Z^2}+\frac{4n(p+n-2)}{n+2}\,\frac{\sigma^2}{\|X\|^2}\,\phi'(Z)-4\,\frac{\phi'(Z)}{Z}\right].
\end{aligned}$$

Therefore we can see that, in the class of $\delta_c(X,S)=p-c/Z$, $\delta_c(X,S)$ with $0<c<4(p-4)/(n+2)$ is superior to the unbiased estimator $\delta^U(X,S)=p$ for $p\ge 5$, and $\delta_c^*(X,S)=\delta_c(X,S)$ with $c^*=2(p-4)/(n+2)$ is optimal in this class. Moreover, the estimator $\delta_c^*(X,S)$ has the drawback that it takes negative values with positive probability. Here, in the same way as Theorem 6.1, modified estimators improving on $\delta_c^*(X,S)$ are obtained.

Theorem 6.2. The estimator $\delta_\phi(X,S)$ improves on $\delta_c^*(X,S)$ if

1. φ(z) is nondecreasing and limz→∞ φ(z) = 2(p − 4)/(n + 2),

2. φ(z) ≥ min(2(p − 4)/(n + 2), φ0(z)) for every z, where,

$$\phi_0(z) = \frac{2(p-4)}{n+2} + 2\,\frac{nz-2}{z(n+2)}\,\frac{z^{p/2-1}(z+1)^{-(p+n)/2}}{\int_0^z t^{p/2-3}(t+1)^{-(p+n)/2}\,dt}.$$

proof. Using Kubokawa's method gives
$$\begin{aligned}
R(\delta_c^*,\theta,\sigma^2)-R(\delta_\phi,\theta,\sigma^2) = 2\int_0^\infty \phi'(z)\Bigg[&\left(\phi(z)-2\,\frac{p-4}{n+2}\right)\int_0^z t^{-2}h(t,\lambda)\,dt + \frac{2}{z}\,h(z,\lambda)\\
&-\frac{2n(p+n-2)}{n+2}\,\frac{1}{z}\int_0^\infty f_p(vz,\lambda)f_n(v)\,dv\Bigg]\,dz,
\end{aligned}$$

where $h(t,\lambda)=\int_0^\infty v\,f_p(vt,\lambda)f_n(v)\,dv$. The following inequalities:
$$\int_0^z t^{-2}h(t,\lambda)\,dt\Big/h(z,\lambda) \;\le\; \int_0^z t^{-2}h(t)\,dt\Big/h(z),$$

$$\int_0^\infty f_p(vz,\lambda)f_n(v)\,dv\Big/h(z,\lambda) \;\le\; \int_0^\infty f_p(vz)f_n(v)\,dv\Big/h(z) = \frac{z+1}{p+n-2},$$
and some calculation yield the condition of the theorem.

Furthermore, in the same way as the proof of Corollary 6.1.1, the inequality $\phi_0(z)\le p(p-4)z/(p-2)$ is derived, which implies that
$$\delta_a^{TR}(X,S) = \max\left(a,\;p-\frac{2(p-4)}{n+2}\,\frac{S}{\|X\|^2}\right) \quad\text{for } 0\le a\le 2\,\frac{p}{p-2}$$

improves on $\delta_c^*(X,S)$.
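The unknown-variance rule is equally short to implement; a sketch with our own names:

```python
# Sketch: the positive loss estimator delta_a^TR in the unknown-variance case,
# estimating ||X - theta||^2 / sigma^2 from (X, S).
import numpy as np

def delta_TR(x, s, n, a):
    p = len(x)
    return max(a, p - 2.0 * (p - 4.0) / (n + 2.0) * s / np.dot(x, x))

rng = np.random.default_rng(4)
p, n, sigma2 = 5, 4, 2.0
x = np.sqrt(sigma2) * rng.normal(size=p)         # theta = 0 here
s = sigma2 * rng.chisquare(n)
print(delta_TR(x, s, n, a=p / (p - 2)))
```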

Through the simulation experiment, we can see that the relative risk improvements of $\delta_0^{TR}(X,S)$ and $\delta_a^{TR}(X,S)$ for $a=p/(p-2),\,2p/(p-2)$ have performance similar to Table 6.1, regardless of the value of $n$, although the details are omitted. In consequence, positive estimators should be chosen as good procedures when one has no information about $\|\theta\|^2/\sigma^2$.

Chapter 7

Conclusion

In this chapter, we emphasize the original results of this thesis. We mainly consider the estimation of a multivariate normal mean and try to find a broader class of admissible minimax estimators. In Sections 3.3.2 and 3.3.3, we see that all classes of minimax admissible estimators given in previous research, that is, Strawderman (1971), Alam (1973), Berger (1976b), Faith (1978), Maruyama (1998) and

Fourdrinier et al. (1998), are expressed as $\delta_h$, although the relations among the classes above had not been clarified. See Table 3.1. We not only summarize the previous research but also show that Fourdrinier et al.'s class includes estimators whose shrinkage factor $\phi_h(w)$ is not monotone and has one extremum, as shown in Corollary 3.21.1. Moreover, we present a simpler proof of Maruyama (1998)'s result, Theorem 3.26. In Section 3.3.4, we pay attention to Stein (1973)'s simple idea and provide it with a decision-theoretic warrant. Stein (1973) suggested that the generalized Bayes estimator with respect to the prior distribution given by the weighted sum of the measure determined by the density $\|\theta\|^{2-p}$ and a measure concentrated at the origin may dominate the James-Stein estimator. This generalized Bayes estimator has, however, a non-monotone $\phi(w)$, as shown in Efron and Morris (1976), so that even its minimaxity, to say nothing of its dominance over the James-Stein estimator, has not been proved yet. Extending Stein's prior distribution, we consider a prior distribution formed as the weighted sum of a measure concentrated at the origin and the prior distribution considered in Sections 3.3.2 and

3.3.3. In Theorem 3.32, we analytically derive a sufficient condition for minimaxity in terms of the ratio of the measure concentrated at the origin. Moreover, in Theorem 3.31 we show that this class of generalized Bayes estimators includes a variety of non-monotone $\phi(w)$, such that

• $\phi(w)$ is increasing from the origin up to a certain point and decreasing beyond it; namely, $\phi(w)$ has one extreme point.

• $\phi(w)$ is increasing from the origin up to a certain point $w_1$, decreasing from $w_1$ to another point $w_2$, and increasing beyond $w_2$; namely, $\phi(w)$ has two extreme points.

Hence the relations among minimaxity, the prior distribution, and the behavior of $\phi(w)$ are now quite clear, although research on these relations, especially in the case where $\phi(w)$ is not monotone, has been fragmentary. We believe that the investigation of estimators with non-monotone $\phi(w)$ is increasingly important, because such estimators are candidates for the solution of the most difficult problem in this field, that is, the problem of finding admissible estimators dominating the James-Stein positive-part rule. In Section 3.4, the problem of finding admissible estimators which dominate the James-Stein estimator is treated. We show that an estimator which was proposed in Maruyama (1996) and improves upon the James-Stein estimator is permissible. We unfortunately conjecture that it is inadmissible, which suggests that it is quite difficult to find a class of admissible estimators improving upon the James-Stein estimator. In Chapter 4, the problem of estimating the mean vector of scale mixtures of multivariate normal distributions is considered. For a certain subclass of these distributions, which includes at least the multivariate-t distributions, we derive minimax admissible estimators, although in the non-normal case an estimator satisfying both minimaxity and admissibility had not previously been derived.

Appendix A

Technical results

The proof of Theorem 3.9 requires certain technical results, which we present in this appendix. All results in this appendix are due to Brown (1971) and Srinivasan (1981). Let $u$ be a bounded piecewise differentiable function defined on $\mathbb{R}^p$. The following result is a rather standard one.

Lemma A.1. There exists a constant $K_0$ such that
$$\int (u(\theta)-u(x))^2 p_\theta(x)\,dx \le K_0\left[\int \|\nabla u(x)\|^2\,\|x-\theta\|^{-p+1}\,p_\theta(x)\,dx + \int \|\nabla u(x)\|^2\,p_\theta(x)\,dx\right].$$

proof. Let $r=\|x-\theta\|$ and let $\varphi$ denote the usual orthogonal angular coordinates in $\mathbb{R}^p$ around the point $\theta$ ($\varphi$ is a $(p-1)$-vector); in short, $(r,\varphi)=(r(x),\varphi(x))$ are spherical coordinates around the point $\theta$. For convenience we normalize $\varphi$ so that $\int_{\|x\|<1}dx = \iint r^{p-1}\,dr\,d\varphi$, and we write $u$ in terms of these coordinates, i.e. $u(x)=u_s(r(x),\varphi(x))$. By the Schwarz inequality, we have

$$(u(\theta)-u(x))^2 = \left(\int_0^{r(x)}\|\nabla u_s(s,\varphi(x))\|\,ds\right)^2 \le r(x)\int_0^{r(x)}\|\nabla u_s(s,\varphi(x))\|^2\,ds.$$

99 Therefore we have

$$\int (u(\theta)-u(x))^2 p_\theta(x)\,dx \le \iint \left(\int_0^r \|\nabla u_s(s,\varphi)\|^2\,ds\right) r\exp(-r^2/2)\,r^{p-1}\,dr\,d\varphi = \iint_0^\infty \|\nabla u_s(s,\varphi)\|^2\left(\int_s^\infty r^p\exp(-r^2/2)\,dr\right)ds\,d\varphi.$$
Integrating by parts gives

$$\int_s^\infty r^p\exp(-r^2/2)\,dr \le K_0\,(s^{p-1}+1)\exp(-s^2/2)$$
for some constant $K_0>0$. Hence we have

$$\begin{aligned}
\int (u(\theta)-u(x))^2 p_\theta(x)\,dx &\le K_0\left[\iint_0^\infty \|\nabla u_s(s,\varphi)\|^2\,s^{p-1}\exp(-s^2/2)\,ds\,d\varphi + \iint_0^\infty \|\nabla u_s(s,\varphi)\|^2\exp(-s^2/2)\,ds\,d\varphi\right]\\
&= K_0\left[\int \|\nabla u(x)\|^2\,\|x-\theta\|^{-p+1}\,p_\theta(x)\,dx + \int \|\nabla u(x)\|^2\,p_\theta(x)\,dx\right].
\end{aligned}$$

We have proved the theorem.

Lemma A.2. Let $\rho$ be a constant such that $0<\rho<1/2$. Suppose that $u(x)$ is a bounded piecewise differentiable function satisfying $L_f u(x)=0$ for $1<\|x\|<r$, $u(x)\equiv 1$ for $\|x\|\le 1$, and $u(x)\equiv c$, $0<c<1$, for $\|x\|>r$, for some $r>2+\rho$. Then there exists a constant $K_1>0$ such that
$$\int_{\|x-\theta\|\le\rho}(u(x)-u(\theta))^2\,dx \le K_1\int_{\|x-\theta\|\le K_1}\|\nabla u(x)\|^2\,dx.$$

proof. Let $u(x)$ be as given in the lemma. Then it follows from Brown (1971) that $u(x)$ has the mean value property; more precisely, for any given $\theta$ there exists $q_\theta(\eta)>0$ such that $u(\theta)=\int u(\eta)q_\theta(\eta)\,d\eta$. Moreover $q_\theta(\eta)=0$ if $\|\theta-\eta\|>1$, $\int q_\theta(\eta)\,d\eta=1$, and $q_\theta(\eta)<K_2$ for some constant $K_2$. Then we have

$$\begin{aligned}
\int_{\|x-\theta\|\le\rho}(u(x)-u(\theta))^2\,dx &= \int_{\|x-\theta\|\le\rho}\left(u(x)-\int u(\eta)q_\theta(\eta)\,d\eta\right)^2 dx\\
&\le \int\left(\int_{\|x-\theta\|\le\rho}(u(x)-u(\eta))^2\,dx\right)q_\theta(\eta)\,d\eta\\
&\le \int\left(\int_{\|x-\eta\|\le 1+\rho}(u(x)-u(\eta))^2\,dx\right)q_\theta(\eta)\,d\eta. \qquad(\mathrm{A.1})
\end{aligned}$$
The last line (A.1) follows from the fact that $\{x:\|x-\theta\|\le\rho\}\subset\{x:\|x-\eta\|\le 1+\rho\}$ for any $\eta$ such that $\|\eta-\theta\|\le 1$. In a similar way to the proof of Lemma A.1, we have

$$\begin{aligned}
\int_{\|x-\eta\|<1+\rho}(u(x)-u(\eta))^2\,dx &\le \iint_0^{1+\rho}\left(\int_0^v \|\nabla u_s(s,\varphi)\|^2\,ds\right)v^p\,dv\,d\varphi\\
&= \iint_0^{1+\rho}\|\nabla u_s(s,\varphi)\|^2\left(\int_s^{1+\rho}v^p\,dv\right)ds\,d\varphi\\
&\le (1+\rho)^{m+1}\iint_0^{1+\rho}\|\nabla u_s(s,\varphi)\|^2\,s^{-p+1}s^{p-1}\,ds\,d\varphi\\
&= (1+\rho)^{m+1}\int_E \|\nabla u(x)\|^2\,\|x-\eta\|^{-p+1}\,dx, \qquad(\mathrm{A.2})
\end{aligned}$$
where $E=\{x:\|x-\eta\|\le 1+\rho\}$. Therefore, bounding the second integral in (A.1) using (A.2) and the upper bound of $q_\theta(\eta)$, we have
$$\begin{aligned}
\int_{\|x-\theta\|<\rho}(u(x)-u(\theta))^2\,dx &\le (1+\rho)^{m+1}K_2\int_{\|\theta-\eta\|\le 1}\left(\int_{\|x-\theta\|\le 3+\rho}\|\nabla u(x)\|^2\,\|x-\eta\|^{-p+1}\,dx\right)d\eta\\
&\le cK_2(1+\rho)^{m+1}\int_{\|x-\theta\|\le 3+\rho}\|\nabla u(x)\|^2\,dx,
\end{aligned}$$
where
$$c = \int_{\|z\|\le 1}\|z\|^{1-p}\,dz \ge \int_{\|\theta-\eta\|\le 1}\|x-\eta\|^{-p+1}\,d\eta.$$

Letting $K_1=cK_2(1+\rho)^{m+1}+4$, the lemma follows.

Lemma A.3. Let $u(x)$ satisfy the conditions of Lemma A.2. Then there exists a constant $K_3>0$ such that
$$\int (u(x)-u(\theta))^2 p_\theta(x)\,dx \le K_3\int \|\nabla u(x)\|^2 p_\theta(x)\,dx. \qquad(\mathrm{A.3})$$

proof. By Lemma A.1, we have
$$\begin{aligned}
\int_{\|x-\theta\|>\rho}(u(x)-u(\theta))^2 p_\theta(x)\,dx &\le C_1\int_{\|x-\theta\|>\rho}\|\nabla u(x)\|^2\,\|x-\theta\|^{-p+1}\,p_\theta(x)\,dx + C_1\int \|\nabla u(x)\|^2 p_\theta(x)\,dx\\
&\le C_2\int \|\nabla u(x)\|^2 p_\theta(x)\,dx, \qquad(\mathrm{A.4})
\end{aligned}$$

where $C_2=2C_1\rho^{-p+1}$. For $0<\rho<1/2$, we have by Lemma A.2
$$\begin{aligned}
\int_{\|x-\theta\|\le\rho}(u(x)-u(\theta))^2 p_\theta(x)\,dx &\le (2\pi)^{-p/2}\int_{\|x-\theta\|\le\rho}(u(x)-u(\theta))^2\,dx\\
&\le (2\pi)^{-p/2}K_1\int_{\|x-\theta\|\le K_1}\|\nabla u(x)\|^2\,dx\\
&\le K_1\exp(K_1^2/2)\int_{\|x-\theta\|\le K_1}\|\nabla u(x)\|^2 p_\theta(x)\,dx\\
&\le K_1\exp(K_1^2/2)\int \|\nabla u(x)\|^2 p_\theta(x)\,dx. \qquad(\mathrm{A.5})
\end{aligned}$$

Hence, letting $K_3=K_1\exp(K_1^2/2)+C_2$, we have the lemma.

The next lemma is an extension of Lemma A.3.

Lemma A.4. Let $u(x)$ satisfy the conditions of Lemma A.2. Then there exists a constant $K_4>0$ such that
$$\int (u(x)-u(\theta))^2\,\|x-\theta\|^2\,p_\theta(x)\,dx \le K_4\int \|\nabla u(x)\|^2\left(\int_{\|\xi\|<1}p_\theta(x+\xi)\,d\xi\right)dx.$$

Lemma A.5. For a positive constant $\alpha$, there exists a constant $K_5$, depending only upon $\alpha$ and $p$, such that

$$\exp(\alpha\|x-\theta\|)\,p_\theta(x) \le K_5\int_{\|\xi\|<1}p_\theta(x+\xi)\,d\xi.$$

Bibliography

[1] Abramowitz, M. and Stegun, I.A. (1964). Handbook of Mathematical Functions, Dover Publications, New York.

[2] Alam, K. (1973). A family of admissible minimax estimators of the mean of a multivariate normal distribution. Ann. Statist., 1, 517-525.

[3] Baranchik, A.J. (1964). Multiple regression and estimation of the mean of a multivariate normal distribution. Stanford Univ. Technical Report 51.

[4] Baranchik, A.J. (1970). A family of minimax estimators of the mean of a multivariate normal distribution. Ann. Math. Statist., 41, 642-645.

[5] Berger, J. (1975). Minimax estimation of location vectors for a wide class of densities. Ann. Statist., 3, 1318-1328.

[6] Berger, J. (1976a). Tail minimaxity in location vector problems and its applications. Ann. Statist., 4, 33-50.

[7] Berger, J. (1976b). Admissible minimax estimation of a multivariate normal mean with arbitrary quadratic loss. Ann. Statist., 4, 223-226.

[8] Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis. 2nd. Ed., Springer-Verlag, New York.

[9] Berry, J.C. (1994). Improving the James-Stein estimator using the Stein Variance estimator. Statist. and Prob. Letters, 20, 241-245.

[10] Birnbaum, A. (1955). Characterization of complete classes of tests of some multiparametric hypotheses with applications to likelihood ratio tests. Ann. Math. Statist., 22, 22-42.

[11] Bock, M.E. (1985). Minimax estimators that shift towards a hypersphere for location vectors of spherically symmetric distributions. J. Multivariate Anal., 17, 127-147.

[12] Bock, M.E. (1988). Shrinkage estimators: pseudo-Bayes rules for normal mean vectors. In Statistical Decision Theory and Related Topics IV (S.S. Gupta, J.O. Berger, eds.), 281-297, Springer-Verlag, New York.

[13] Brandwein, A. and Strawderman, W.E. (1978). Minimax estimation of location parameters for spherically symmetric unimodal distributions under quadratic loss. Ann. Statist., 6, 377-416.

[14] Brandwein, A. and Strawderman, W.E. (1991). Generalizations of James-Stein estimators under spherical symmetry. Ann. Statist., 19, 1639-1650.

[15] Brewster, J.F. and Zidek, J.V. (1974). Improving on equivariant estimators. Ann. Statist., 2, 21-38.

[16] Brown, L.D. (1966). On the admissibility of invariant estimators of one or more location parameters. Ann. Math. Statist., 37, 1087-1136.

[17] Brown, L.D. (1971). Admissible estimators, recurrent diffusions, and insoluble boundary value problems. Ann. Math. Statist., 42, 855-903.

[18] Brown, L.D. (1988). The differential inequality of a statistical estimation problem. In Statistical Decision Theory and Related Topics IV (S.S. Gupta, J.O. Berger, eds.), 299-324, Springer-Verlag, New York.

[19] DasGupta, A. and Strawderman, W.E. (1997). All estimates with a given risk, Riccati differential equations and a new proof of a theorem of Brown. Ann. Statist., 25, 1208-1221.

[20] Eaton, M.L. (1992). A statistical diptych - recurrence of symmetric Markov chains. Ann. Statist., 20, 1147-1179.

[21] Efron, B. and Morris, C. (1976). Families of minimax estimators of the mean of a multivariate normal distribution. Ann. Statist., 4, 11-21.

[22] Faith, R.E. (1978). Minimax Bayes estimators of a multivariate normal mean. J. Multivariate Anal., 8, 372-379.

[23] Farrell, R. (1968). On a necessary and sufficient condition for admissibility of estimators when strictly convex loss is used. Ann. Math. Statist., 38, 23-28.

[24] Fortuin, C.M., Kasteleyn, P.W. and Ginibre, J. (1971). Correlation inequalities on some partially ordered sets. Comm. Math. Phys. 22, 89-103.

[25] Fourdrinier, D., Strawderman, W.E. and Wells, T. (1998). On the construction of Bayes minimax estimators. Ann. Statist., 26, 660-671.

[26] George, E.I. (1990). Comment on “Developments in decision theoretic variance estimation”, by Maatta, J.M. and Casella, G. Statist. Sci., 5, 107-109.

[27] Ghosh, M. (1994). On some Bayesian solutions of the Neyman-Scott problem. In Statistical Decision Theory and Related Topics V, 267-276.

[28] Hara, H. (1997). unpublished manuscript.

[29] James, W. and Stein, C. (1961). Estimation with quadratic loss. Proc. 4th Berkeley Symp. Math. Statist. Prob., Vol. 1, 361-379, Univ. of California Press, Berkeley.

[30] Kiefer, J. (1957). Invariance, minimax sequential estimation and continuous time processes. Ann. Math. Statist., 28, 573-601.

[31] Kubokawa, T. (1991). An approach to improving the James-Stein estimator. J. Multivariate Anal., 36, 121-126.

[32] Kubokawa, T. (1994). A unified approach to improving equivariant estimators. Ann. Statist., 22, 290-299.

[33] Kubokawa, T. (1998). The Stein phenomenon in simultaneous estimation: a review. Applied Statistical Science III (eds. S.E. Ahmed, M. Ahsanullah and B.K. Sinha), 143-173, NOVA Science Publishers, Inc., New York.

[34] Kubokawa, T., Morita, K., Makita, S. and Nagakura, K. (1993). Estimation of the variance and its applications. J. Statist. Plann. Inference, 35, 319-333.

[35] Lehmann, E.L. (1986). Testing Statistical Hypotheses. 2nd Ed., Wiley, New York.

[36] Lehmann, E.L. and Casella, G. (1999). Theory of Point Estimation. 2nd Ed., Springer, New York.

[37] Lin, P. and Tsai, H. (1973). Generalized Bayes minimax estimators of the multivariate normal mean with unknown covariance matrix. Ann. Statist., 1, 142-145.

[38] Maruyama, Y. (1996). Estimation of the mean vector and the variance of a multivariate normal distribution. Master thesis, Graduate School of Economics, University of Tokyo.

[39] Maruyama, Y. (1997). A New Positive Estimator of Loss Function. Statistics & Probability Letters, 36, 269-274.

[40] Maruyama, Y. (1998). A unified and broadened class of admissible minimax estimators of a multivariate normal mean. J. Multivariate Anal., 64, 196-205.

[41] Maruyama, Y. (1999a). Minimax estimators of a normal variance. Metrika, 48, 209-214.

[42] Maruyama, Y. (1999b). Improving on the James-Stein Estimator. Statistics & Decisions, 17, 137-140.

[43] Maruyama, Y. (2000a). Correction to “A unified and broadened class of admissible minimax estimators of a multivariate normal mean”. to appear in J. Multivariate Anal.

[44] Maruyama, Y. (2000b). Admissible minimax estimators of a mean vector of scale mixtures of multivariate normal distributions. submitted.

[45] Maruyama, Y. and Iwasaki, K. (2000). unpublished manuscript.

[46] Meyers, N. and Serrin, J. (1960). The exterior problem for second order elliptic equations. J. Math. and Mech., 9, 513-539.

[47] Proskin, H.M. (1985). An admissibility theorem with applications to the estimation of variance of the normal distribution. Ph.D. dissertation, Dept. Statist., Rutgers University.

[48] Rukhin, A.L. (1992). Asymptotic risk behavior of mean vector and variance estimators and the problem of positive mean. Ann. Inst. Statist. Math., 44, 299-311.

[49] Rukhin, A.L. (1995). Admissibility: Survey of a concept in progress. Inter. Statist. Review, 63, 95-115.

[50] Robert, C.P. (1994). The Bayesian Choice: A Decision-Theoretic Motivation. Springer-Verlag, New York.

[51] Srinivasan, C. (1981). Admissible generalized Bayes estimators and exterior boundary value problems. Sankhya (Ser. A), 43, 1-25.

[52] Stein, C. (1956). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. Proc. 3rd Berkeley Symp. Math. Statist. Prob., Vol. 1, 197-206, Univ. of California Press, Berkeley.

[53] Stein, C. (1964). Inadmissibility of the usual estimator for the variance of a normal distribution with unknown mean. Ann. Inst. Statist. Math., 16, 155-160.

[54] Stein, C. (1973). Estimation of the mean of a multivariate normal distribution. In Proc. Prague Symp. Asymptotic Statist., 345-381.

[55] Strawderman, W.E. (1971). Proper Bayes minimax estimators of the multivariate normal mean. Ann. Math. Statist., 42, 385-388.

[56] Strawderman, W.E. (1974). Minimax estimation of location parameters for certain spherically symmetric distributions. J. Multivariate Anal., 4, 255-264.

[57] Strawderman, W.E. and Cohen, A. (1971). Admissibility of estimators of the mean vector of multivariate normal distribution with quadratic loss. Ann. Math. Statist., 42, 270-296.

[58] Takemura, A. (1991). Foundation of the Multivariate Statistical Inference. Kyoritsu Press, Tokyo (in Japanese).

[59] Wald, A. (1950). Statistical Decision Functions. Wiley, New York.
