Domination Over the Best Invariant Estimator Is, When Properly

Total Page:16

File Type:pdf, Size:1020Kb

Domination Over the Best Invariant Estimator Is, When Properly DOCUMENT RESUME ED 394 999 TM 024 995 AUTHOR Brandwein, Ann Cohen; Strawderman, William E. TITLE James-Stein Estimation. Program Statistics Research, Technical Report No. 89-86. INSTITUTION Educational Testing Service, Princeton, N.J. REPORT NO ETS-RR-89-20 PUB DATE Apr 89 NOTE 46p. PUB TYPE Reports Evaluative/Feasibility (142) Statistical Data (110) EDRS PRICE MF01/PCO2 Plus Postage. DESCRIPTORS Equations (Mathematics); *Estimation (Mathematics); *Maximum Likelihood Statistics; *Statistical Distributions IDENTIFIERS *James Stein Estimation; Nonnormal Distributions; *Shrinkage ABSTRACT This paper presents an expository development of James-Stein estimation with substantial emphasis on exact results for nonnormal location models. The themes of the paper are: (1) the improvement possible over the best invariant estimator via shrinkage estimation is not surprising but expected from a variety of perspectives; (2) the amount of shrinkage allowable to preserve domination over the best invariant estimator is, when properly interpreted, relatively free from the assumption of normality; and (3) the potential savings in risk are substantial when accompanied by good quality prior information. Relatively, much less emphasis is placed on choosing a particular shrinkage estimator than on demonstrating that shrinkage should produce worthwhile gains in problems where the error distribution is spherically symmetric. In addition, such gains are relatively robust with respect to assumptions concerning distribution and loss. (Contains 1 figure and 53 references.) (Author/SLD) *********************************************************************** * Reproductions supplied by EDRS are the best that can be made * * from the original document. * *********************************************************************** US. OlPAOTOPINT Of MOICATION Moo al Elea Monet Romorch and improvement "PERMISSION TO REPRODUCE THIS EDUCATION& RESOURCES INFORMATION MATERIAL HAS BEEN GRANTED BY F--"---it.,1/4:44%f4s1,c CENTER (ERIC) ne clocurnal has hailm reproduced as rectsvod from th person or orpsnisstion feiehtU/V onoinahng 0 Miner Chance tetan made 10 employs roroductiOn Quality Pointsol via./ or opinions statudinthisdocu mnt do nol ri.cseanly mpriestrit official TO THE EDUCATIONAL RESOURCES OEM Position or policy INFORMATION CENTER (ERIC)." James-Stein Estimation Ann Cohen Brandwein and William E. Strawderman PROGRAM STATISTICS RESEARCH TECHNICAL REPORT NO. 89-86 BEST COPY AVAILABLE EDUCATIONAL TESTING SERVICE PRINCETON, NEW JERSEY 08541 2 THE JAMES-STEIN ESTIMATION Ann Cohen Brandwein and William E. Strawderman Program Statistics Research Technical Report No. 89-86 Research Report No. 89-20 Educational Testing Service Princeton, New Jersey 08541-0001 April 1989 Copyright c 1989 by Educational Testing Service. All rights reserved. The Program Statistics Research TechnicalReport Series is designed to make the working papers of the ResearchStatistics Group at Educational Testing Servicegenerally available. The series con- sists of reports by the members of theResearch Statistics Group as well as their external and visiting statisticalconsultants. Reproduction of any portion of a ProgramStatistics Research Technical - Report requires the written consent of theauthor(s). 4 TABLE OF CONTENTS' Page 1. Introduction 1 2. A Geometric Hint 3 3. An Empirical Bayes Account 4 4. Some Spherical Normal Theory 5 5. More Normal Theory 8 6. Scale Mixtures of Normals. .. 18 7. Other Spherically Symmetric Distributions 22 8. Multiple Observations 27 9. Other Loss Functions 28 10. Comments 33 References 36 bst r a ct This paper presents an expository development of JamesStein estimation with substantial emphasis on exact results for nonnormal location models. The themes of the paper area) that the improvement possible over the best invariant estimator via shrinkage estimation is not surprisingbut expected from a variety uf perspectives; b) that the amount of shrinkage allowable to preservedomination over the best invariant estimator, is, when properly interpreted, relativelyfree from the assumption of normality; and c) the potential savings in risk are substantialwhen accompanied by good quality prior information. 1 I.Introduction. This paper presents an exposiiory development of JamesStein estimation with substantial emphasis on exact results for nonnormal location models. The themes of the paper area) that the improvement possible over the best invariant estimator via shrinkage estimation is not surprising but expectedfrom a variety of perspectives; b) that the amount of shrinkage alkable to preserve domination over the best invariant estimator, is, when properly interpre ,relatively free from the assumption of normality; and c) the potential savings in risk are substantial whenaccompanied by good quality prior information. Relatively, much less emphasis is placed on choosing a particularshrinkage estimator than on demonstrating that shrinkage should produceworthwhile gains in problems where the error distribution is spherically symmetric.Additionally such gains are relatively robust with respect to assumptions concerning distributionand loss. The basic problem, of course, is the estimation of the mean vector0 of a pvariate location parameter family. In the normal case (with identitycovariance) for p = I, the usual estimator, the sample mean, is the maximum likelihoodestimator, the UMVUE, the best equivariant and minimax estimator for nearly arbitrarysymmetric loss, and is admissible for essentially arbitrary symmetric loss. Admissibilityfor quadratic loss was first proved by Hodges and Lehmann (1950) and Girschickand Savage (1951) using the CramerRao inequality and by Blyth (1951) using a limit of Bayes typeargument. For p = 2, the above properties also hold in thenormal case. Stein (1956) proved admissibility using an information inequality argument. Inthat same paper however, Stein proved a result that astonished many and which has led to an enormousand rich literature of substantial importance in statistical theory and practice. Stein (1956) showed that estimators of the form (1 a/(b+11X112))X dominate X for 7 a sufficiently small and h sufficientlylarge when p > 3. James and Stein (1961) sharpened the result and gave an explicit class of dominating estim: t.ors, (1 a/11X112)X for 0 < a < 2(p-2). They also indicated that a version ofthe result holds for general location equivariant estimators with finite fourth moment and for lossfunctions which are concave functions of squared error loss. Brown (1966) showed that inadmissibilityof the best equivariant estimator of location holds for virtually allproblems for p > 3, and, in Brown (1965), that admissibility tends to hold for p = 2. Minimaxity for all pfollows from Kiefer (1957). Section 2 gives a geometrical argument du,?: to Stein whichindicates that shrinkage might be expected to work under quite broad distributionalassumptions. Section 3 gives an empirical Bayes argument in thenormal case which results in the usual JamesStein estimator. Section 4 presents Stein's "unbiased estimator of risk"in the normal case and develops the basic theory for the standard JamesSteinestimator in the normal case. Section 5 describes a variety of extensions of thebasic theory to cover shrinkage towards subspaces, Bayes minimax estimation,nonspherical shrinkage, and limited translation rules. Section 6 considers extensions of the resultsof sections 4 and 5 to scale mixtures of normal distributions. 3 Section 7 presents generalizations togeneral spherically symmetric families of distributions, while section 8 indicates theapplicability of earlier results to the multiple observation case. Section 9 is concerned with resultsfor nonspherical quadratic loss and for nonquadratic loss. ., Section 10 presents some additional comments. 2. A Geometric Hint As in much of the development of thesubject, the following rough geometric argument is basically due toStein (1962). Consider an observation vectorX in p dimensions with mean vector 0 andindependent (or uncorrelated) components.Assume n that the components have equalvariance, a`. The situation is depictedin figure 1. Figure 1 about here. Since E(X-0)' 0 = 0 we expectX-0 and 0 to be nearly orthogonal,especially for estimator of 0 might be too large HI. SinceEIIXII2=pa2+1142, itappears that X as an might be a better estimator. long, and that the projectionof 0 on X or something close to it but perhaps we This projection of course depends on0 and therefore isn't a valid estimator, approximate a. can estimate it.If we denote this projectionby (1a)X the problem is to One way to do this is to assumethat the angle between 0and X-0 is exactly a right 9 FIGURE 1 0 4 2 angle and to assume that 11 X11is exactly equal to its expected value 0'0 -I- p9 a- and i.211.x112 similarly 11X-011-2 is equaltopa2. In thiscase we have 11Y112 = pa2i2.'--..2 Ix!'from triangle BCD and 111112 = 11 0112 (1i)211X112 9 9 2 2 = 11)(11- - (1i) 11X11 from triangle ABD. Equating these expressions we obtain 2 2 2 2 pa a-11X11 = 11)(11 P2 g(la)11X11 or (1-2)l1X11= 11X11 2p2. Thisgives 2 9 2 a = po-/11X11and the suggested estimator is (1 a) X = (1Pa 11X11- The above development does not particularly depend on normality of X or even that 0 is a location vector. Unfortunately, it fails to be a proof of the inadmissibility of X, and also fails to distinguish between different values of p.It is however suggestive that the possibility of improving on the unbiased vector X by shrinkage toward the origin may be quite general. 3. An Empirical
Recommended publications
  • Matrix Means and a Novel High-Dimensional Shrinkage Phenomenon
    Matrix Means and a Novel High-Dimensional Shrinkage Phenomenon Asad Lodhia1, Keith Levin2, and Elizaveta Levina3 1Broad Institute of MIT and Harvard 2Department of Statistics, University of Wisconsin-Madison 3Department of Statistics, University of Michigan Abstract Many statistical settings call for estimating a population parameter, most typically the pop- ulation mean, based on a sample of matrices. The most natural estimate of the population mean is the arithmetic mean, but there are many other matrix means that may behave differently, especially in high dimensions. Here we consider the matrix harmonic mean as an alternative to the arithmetic matrix mean. We show that in certain high-dimensional regimes, the har- monic mean yields an improvement over the arithmetic mean in estimation error as measured by the operator norm. Counter-intuitively, studying the asymptotic behavior of these two matrix means in a spiked covariance estimation problem, we find that this improvement in operator norm error does not imply better recovery of the leading eigenvector. We also show that a Rao-Blackwellized version of the harmonic mean is equivalent to a linear shrinkage estimator studied previously in the high-dimensional covariance estimation literature, while applying a similar Rao-Blackwellization to regularized sample covariance matrices yields a novel nonlinear shrinkage estimator. Simulations complement the theoretical results, illustrating the conditions under which the harmonic matrix mean yields an empirically better estimate. 1 Introduction Matrix estimation problems arise in statistics in a number of areas, most prominently in covariance estimation, but also in network analysis, low-rank recovery and time series analysis. Typically, the focus is on estimating a matrix based on a single noisy realization of that matrix.
    [Show full text]
  • Estimation of Common Location and Scale Parameters in Nonregular Cases Ahmad Razmpour Iowa State University
    Iowa State University Capstones, Theses and Retrospective Theses and Dissertations Dissertations 1982 Estimation of common location and scale parameters in nonregular cases Ahmad Razmpour Iowa State University Follow this and additional works at: https://lib.dr.iastate.edu/rtd Part of the Statistics and Probability Commons Recommended Citation Razmpour, Ahmad, "Estimation of common location and scale parameters in nonregular cases " (1982). Retrospective Theses and Dissertations. 7528. https://lib.dr.iastate.edu/rtd/7528 This Dissertation is brought to you for free and open access by the Iowa State University Capstones, Theses and Dissertations at Iowa State University Digital Repository. It has been accepted for inclusion in Retrospective Theses and Dissertations by an authorized administrator of Iowa State University Digital Repository. For more information, please contact [email protected]. INFORMATION TO USERS This reproduction was made from a copy of a document sent to us for microfilming. While the most advanced technology has been used to photograph and reproduce this document, the quality of the reproduction is heavily dependent upon the quality of the material submitted. The following explanation of techniques is provided to help clarify markings or notations which may appear on this reproduction. 1. The sign or "target" for pages apparently lacking from the document photographed is "Missing Page(s)". If it was possible to obtain the missing page(s) or section, they are spliced into the film along with adjacent pages. This may have necessitated cutting through an image and duplicating adjacent pages to assure complete continuity. 2. When an image on the film is obliterated with a round black mark, it is an indication of either blurred copy because of movement during exposure, duplicate copy, or copyrighted materials that should not have been filmed.
    [Show full text]
  • Nearly Weighted Risk Minimal Unbiased Estimation✩ ∗ Ulrich K
    Journal of Econometrics 209 (2019) 18–34 Contents lists available at ScienceDirect Journal of Econometrics journal homepage: www.elsevier.com/locate/jeconom Nearly weighted risk minimal unbiased estimationI ∗ Ulrich K. Müller a, , Yulong Wang b a Economics Department, Princeton University, United States b Economics Department, Syracuse University, United States article info a b s t r a c t Article history: Consider a small-sample parametric estimation problem, such as the estimation of the Received 26 July 2017 coefficient in a Gaussian AR(1). We develop a numerical algorithm that determines an Received in revised form 7 August 2018 estimator that is nearly (mean or median) unbiased, and among all such estimators, comes Accepted 27 November 2018 close to minimizing a weighted average risk criterion. We also apply our generic approach Available online 18 December 2018 to the median unbiased estimation of the degree of time variation in a Gaussian local-level JEL classification: model, and to a quantile unbiased point forecast for a Gaussian AR(1) process. C13, C22 ' 2018 Elsevier B.V. All rights reserved. Keywords: Mean bias Median bias Autoregression Quantile unbiased forecast 1. Introduction Competing estimators are typically evaluated by their bias and risk properties, such as their mean bias and mean-squared error, or their median bias. Often estimators have no known small sample optimality. What is more, if the estimation problem does not reduce to a Gaussian shift experiment even asymptotically, then in many cases, no optimality
    [Show full text]
  • Improving on the James-Stein Estimator
    Improving on the James-Stein Estimator Yuzo Maruyama The University of Tokyo Abstract: In the estimation of a multivariate normal mean for the case where the unknown covariance matrix is proportional to the identity matrix, a class of generalized Bayes estimators dominating the James-Stein rule is obtained. It is noted that a sequence of estimators in our class converges to the positive-part James-Stein estimator. AMS 2000 subject classifications: Primary 62C10, 62C20; secondary 62A15, 62H12, 62F10, 62F15. Keywords and phrases: minimax, multivariate normal mean, generalized Bayes, James-Stein estimator, quadratic loss, improved variance estimator. 1. Introduction Let X and S be independent random variables where X has the p-variate normal 2 2 2 distribution Np(θ, σ Ip) and S/σ has the chi square distribution χn with n degrees of freedom. We deal with the problem of estimating the mean vector θ relative to the quadratic loss function kδ−θk2/σ2. The usual minimax estimator X is inadmissible for p ≥ 3. James and Stein (1961) constructed the improved estimator δJS = 1 − ((p − 2)/(n + 2))S/kXk2 X, JS which is also dominated by the James-Stein positive-part estimator δ+ = max(0, δJS) as shown in Baranchik (1964). We note that shrinkage estimators such as James-Stein’s procedure are derived by using the vague prior informa- tion that λ = kθk2/σ2 is close to 0. It goes without saying that we would like to get significant improvement of risk when the prior information is accurate. JS JS Though δ+ improves on δ at λ = 0, it is not analytic and thus inadmissi- ble.
    [Show full text]
  • DECISION THEORY APPLIED to an INSTRUMENTAL VARIABLES MODEL 1 Gary Chamberlain ABSTRACT This Paper Applies Some General Concepts
    DECISION THEORY APPLIED TO AN INSTRUMENTAL VARIABLES MODEL 1 Gary Chamberlain ABSTRACT This paper applies some general concepts in decision theory to a simple instrumental variables model. There are two endogenous variables linked by a single structural equation; k of the exoge- nous variables are excluded from this structural equation and provide the instrumental variables (IV). The reduced-form distribution of the endogenous variables conditional on the exogenous vari- ables corresponds to independent draws from a bivariate normal distribution with linear regression functions and a known covariance matrix. A canonical form of the model has parameter vector (½; Á; !), where Á is the parameter of interest and is normalized to be a point on the unit circle. The reduced-form coe±cients on the instrumental variables are split into a scalar parameter ½ and a parameter vector !, which is normalized to be a point on the (k 1)-dimensional unit sphere; ¡ ½ measures the strength of the association between the endogenous variables and the instrumental variables, and ! is a measure of direction. A prior distribution is introduced for the IV model. The parameters Á, ½, and ! are treated as independent random variables. The distribution for Á is uniform on the unit circle; the distribution for ! is uniform on the unit sphere with dimension k 1. These choices arise from the solution of a minimax problem. The prior for ½ is left general. It¡turns out that given any positive value for ½, the Bayes estimator of Á does not depend upon ½; it equals the maximum-likelihood estimator. This Bayes estimator has constant risk; since it minimizes average risk with respect to a proper prior, it is minimax.
    [Show full text]
  • Michal Kolesár Cowles Foundation Yale University Version 1.4, September 30, 2013
    INTEGRATED LIKELIHOOD APPROACH TO INFERENCE WITH MANY INSTRUMENTS Michal Kolesár∗ Cowles Foundation Yale University Version 1.4, September 30, 2013y Abstract I analyze a Gaussian linear instrumental variables model with a single endogenous regressor in which the number of instruments is large. I use an invariance property of the model and a Bernstein-von Mises type argument to construct an integrated likelihood which by design yields inference procedures that are valid under many instrument asymptotics and are asymptotically optimal under rotation invariance. I establish that this integrated likelihood coincides with the random-effects likelihood of Chamberlain and Imbens (2004), and that the maximum likelihood estimator of the parameter of interest coincides with the limited information maximum likelihood (liml) estimator. Building on these results, I then relax the basic setup along two dimensions. First, I drop the assumption of Gaussianity. In this case, liml is no longer optimal, and I derive a new, more efficient estimator based on a minimum distance objective function that imposes a rank restriction on the matrix of second moments of the reduced-form coefficients. Second, I consider minimum distance estimation without imposing the rank restriction and I show that the resulting estimator corresponds to a version of the bias-corrected two-stage least squares estimator. Keywords: Instrumental Variables, Incidental Parameters, Random Effects, Many Instruments, Misspecification, Limited Information Maximum Likelihood, Bias-Corrected Two-Stage Least Squares. ∗Electronic correspondence: [email protected]. I am deeply grateful to Guido Imbens and Gary Chamberlain for their guidance and encouragement. I also thank Joshua Angrist, Adam Guren, Whitney Newey, Jim Stock, and participants in the econometrics lunch seminar at Harvard University for helpful comments and suggestions.
    [Show full text]
  • Objective Priors in the Empirical Bayes Framework Arxiv:1612.00064V5 [Stat.ME] 11 May 2020
    Objective Priors in the Empirical Bayes Framework Ilja Klebanov1 Alexander Sikorski1 Christof Sch¨utte1,2 Susanna R¨oblitz3 May 13, 2020 Abstract: When dealing with Bayesian inference the choice of the prior often re- mains a debatable question. Empirical Bayes methods offer a data-driven solution to this problem by estimating the prior itself from an ensemble of data. In the non- parametric case, the maximum likelihood estimate is known to overfit the data, an issue that is commonly tackled by regularization. However, the majority of regu- larizations are ad hoc choices which lack invariance under reparametrization of the model and result in inconsistent estimates for equivalent models. We introduce a non-parametric, transformation invariant estimator for the prior distribution. Being defined in terms of the missing information similar to the reference prior, it can be seen as an extension of the latter to the data-driven setting. This implies a natural interpretation as a trade-off between choosing the least informative prior and incor- porating the information provided by the data, a symbiosis between the objective and empirical Bayes methodologies. Keywords: Parameter estimation, Bayesian inference, Bayesian hierarchical model- ing, invariance, hyperprior, MPLE, reference prior, Jeffreys prior, missing informa- tion, expected information 2010 Mathematics Subject Classification: 62C12, 62G08 1 Introduction Inferring a parameter θ 2 Θ from a measurement x 2 X using Bayes' rule requires prior knowledge about θ, which is not given in many applications. This has led to a lot of controversy arXiv:1612.00064v5 [stat.ME] 11 May 2020 in the statistical community and to harsh criticism concerning the objectivity of the Bayesian approach.
    [Show full text]
  • Invariant Models for Causal Transfer Learning
    Journal of Machine Learning Research 19 (2018) 1-34 Submitted 09/16; Revised 01/18; Published 09/18 Invariant Models for Causal Transfer Learning Mateo Rojas-Carulla [email protected] Max Planck Institute for Intelligent Systems T¨ubingen,Germany Department of Engineering Univ. of Cambridge, United Kingdom Bernhard Sch¨olkopf [email protected] Max Planck Institute for Intelligent Systems T¨ubingen,Germany Richard Turner [email protected] Department of Engineering Univ. of Cambridge, United Kingdom Jonas Peters∗ [email protected] Department of Mathematical Sciences Univ. of Copenhagen, Denmark Editor: Massimiliano Pontil Abstract Methods of transfer learning try to combine knowledge from several related tasks (or do- mains) to improve performance on a test task. Inspired by causal methodology, we relax the usual covariate shift assumption and assume that it holds true for a subset of predictor variables: the conditional distribution of the target variable given this subset of predictors is invariant over all tasks. We show how this assumption can be motivated from ideas in the field of causality. We focus on the problem of Domain Generalization, in which no ex- amples from the test task are observed. We prove that in an adversarial setting using this subset for prediction is optimal in Domain Generalization; we further provide examples, in which the tasks are sufficiently diverse and the estimator therefore outperforms pooling the data, even on average. If examples from the test task are available, we also provide a method to transfer knowledge from the training tasks and exploit all available features for prediction. However, we provide no guarantees for this method.
    [Show full text]
  • Minimax Theory
    Copyright c 2008–2010 John Lafferty, Han Liu, and Larry Wasserman Do Not Distribute Chapter 36 Minimax Theory Minimax theory provides a rigorous framework for establishing the best pos- sible performance of a procedure under given assumptions. In this chapter we discuss several techniques for bounding the minimax risk of a statistical problem, including the Le Cam and Fano methods. 36.1 Introduction When solving a statistical learning problem, there are often many procedures to choose from. This leads to the following question: how can we tell if one statistical learning procedure is better than another? One answer is provided by minimax theory which is a set of techniques for finding the minimum, worst case behavior of a procedure. In this chapter we rely heavily on the following sources: Yu (2008), Tsybakov (2009) and van der Vaart (1998). 36.2 Definitions and Notation Let be a set of distributions and let X ,...,X be a sample from some distribution P 1 n P . Let ✓(P ) be some function of P . For example, ✓(P ) could be the mean of P , the 2P variance of P or the density of P . Let ✓ = ✓(X1,...,Xn) denote an estimator. Given a metric d, the minimax risk is b b Rn Rn( )=infsup EP [d(✓,✓(P ))] (36.1) ⌘ P ✓ P 2P where the infimum is over all estimators. Theb sample complexityb is n(✏, )=min n : R ( ) ✏ . (36.2) P n P n845 o 846 Chapter 36. Minimax Theory 36.3 Example. Suppose that = N(✓, 1) : ✓ R where N(✓, 1) denotes a Gaussian P { 2 } with mean ✓ and variance 1.
    [Show full text]
  • Lecture 16: October 4 Lecturer: Siva Balakrishnan
    36-705: Intermediate Statistics Fall 2019 Lecture 16: October 4 Lecturer: Siva Balakrishnan In the last lecture we discussed the MSE and the bias-variance decomposition. We discussed briefly that finding uniformly optimal estimators (i.e. estimators with lowest possible MSE for every value of the unknown parameter) is hopeless, and then briefly discussed finding optimal unbiased estimators. We then introduced some quantities: the score and the Fisher information, and used these notions to discuss the Cram´er-Raobound, which provides a lower bound on the variance of any unbiased estimator. You might come across this terminology in your research: estimators that achieve the Cram´er-Raobound, i.e. are unbiased and achieve the Cram´er-Raolower bound on the variance, are often called efficient estimators. In many problems efficient estimators do not exist, and we often settle for a weaker notion of asymptotic efficiency, i.e. efficient but only as n ! 1. Next week, we will show that in many cases the MLE is asymptotically efficient. The Cram´er-Raobound suggests that the MSE in a parametric model typically scales as 1=(nI1(θ)). In essence 1=I1(θ) behaves like the variance (we will see this more clearly in the future), so that the MSE behaves as σ2=n, which is exactly analogous to what we have seen before with averages (think back to Chebyshev/sub-Gaussian tail bounds). Supposing that the Fisher information is non-degenerate, i.e. that I1(θ) > 0, and treating the model and θ as fixed: I1(θ) is a constant as n ! 1.
    [Show full text]
  • Improved Minimax Estimation of a Multivariate Normal Mean Under Heteroscedasticity” (DOI: 10.3150/13-BEJ580SUPP; .Pdf)
    Bernoulli 21(1), 2015, 574–603 DOI: 10.3150/13-BEJ580 Improved minimax estimation of a multivariate normal mean under heteroscedasticity ZHIQIANG TAN Department of Statistics, Rutgers University, 110 Frelinghuysen Road, Piscataway, NJ 08854, USA. E-mail: [email protected] Consider the problem of estimating a multivariate normal mean with a known variance matrix, which is not necessarily proportional to the identity matrix. The coordinates are shrunk directly in proportion to their variances in Efron and Morris’ (J. Amer. Statist. Assoc. 68 (1973) 117– 130) empirical Bayes approach, whereas inversely in proportion to their variances in Berger’s (Ann. Statist. 4 (1976) 223–226) minimax estimators. We propose a new minimax estimator, by approximately minimizing the Bayes risk with a normal prior among a class of minimax estimators where the shrinkage direction is open to specification and the shrinkage magnitude is determined to achieve minimaxity. The proposed estimator has an interesting simple form such that one group of coordinates are shrunk in the direction of Berger’s estimator and the remaining coordinates are shrunk in the direction of the Bayes rule. Moreover, the proposed estimator is scale adaptive: it can achieve close to the minimum Bayes risk simultaneously over a scale class of normal priors (including the specified prior) and achieve close to the minimax linear risk over a corresponding scale class of hyper-rectangles. For various scenarios in our numerical study, the proposed estimators with extreme priors yield more substantial risk reduction than existing minimax estimators. Keywords: Bayes risk; empirical Bayes; minimax estimation; multivariate normal mean; shrinkage estimation; unequal variances 1.
    [Show full text]
  • Stat 710: Mathematical Statistics Lecture 3
    Lecture 3: Minimaxity and admissibility Consider estimators of a real-valued J = g(q) based on a sample X from Pq , q 2 Θ, under loss L and risk RT (q) = E[L(T (X);q)]. Minimax estimator An estimator d is minimax if supq Rd (q) = infall T supq RT (q) Discussion A minimax estimator can be very conservative and unsatisfactory. It tries to do as well as possible in the worst case. A unique minimax estimator is admissible, since any estimator better than a minimax estimator is also minimax. We should find an admissible minimax estimator. Different for UMVUE: if a UMVUE is inadmissible, it is dominated by a biased estimator) If a minimax estimator has some other good properties (e.g., it is a Bayes estimator), then it is often a reasonable estimator. beamer-tu-logo UW-Madison (Statistics) Stat 710 Lecture 3 Jan 2019 1 / 15 Minimax estimator The following result shows when a Bayes estimator is minimax. Theorem 4.11 (minimaxity of a Bayes estimator) Let Π be a proper prior on Θ and d be a Bayes estimator of J w.r.t. Π. Suppose d has constant risk on ΘΠ. If Π(ΘΠ) = 1, then d is minimax. If, in addition, d is the unique Bayes estimator w.r.t. Π, then it is the unique minimax estimator. Proof Let T be any other estimator of J. Then Z Z sup RT (q) ≥ RT (q)dΠ ≥ Rd (q)dΠ = sup Rd (q): q2Θ ΘΠ ΘΠ q2Θ If d is the unique Bayes estimator, then the second inequality in the previous expression should be replaced by > and, therefore, d is the unique minimax estimator.
    [Show full text]