Econ 2148, Fall 2017 Shrinkage in the Normal Means Model


Maximilian Kasy
Department of Economics, Harvard University

Agenda

• Setup: the Normal means model $X \sim N(\theta, I_k)$ and the canonical estimation problem with loss $\|\hat\theta - \theta\|^2$.
• The James-Stein (JS) shrinkage estimator.
• Three ways to arrive at the JS estimator (almost):
  1. Reverse regression of $\theta_i$ on $X_i$.
  2. Empirical Bayes: random effects model for $\theta_i$.
  3. Shrinkage factor minimizing Stein's Unbiased Risk Estimate.
• Proof that JS uniformly dominates $X$ as an estimator of $\theta$.
• The Normal means model as asymptotic approximation.

Takeaways for this part of class

• Shrinkage estimators trade off variance and bias.
• In multi-dimensional problems, we can estimate the optimal degree of shrinkage.
• Three intuitions that lead to the JS estimator:
  1. Predict $\theta_i$ given $X_i$ ⇒ reverse regression.
  2. Estimate the distribution of the $\theta_i$ ⇒ empirical Bayes.
  3. Find the shrinkage factor that minimizes estimated risk.
• Some calculus allows us to derive the risk of JS shrinkage ⇒ better than MLE, no matter what the true $\theta$ is.
• The Normal means model is more general than it seems: large sample approximation to any parametric estimation problem.
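The first takeaway, that shrinkage trades off variance against bias, can be illustrated with a minimal sketch for a fixed, non-random scaling factor c applied to the observations (the estimator c·X). A standard calculation, not taken from the slides, gives its risk as c²·k (variance) plus (1 − c)²·‖θ‖² (squared bias). The vector `theta`, the values of `c`, the seed and the number of draws below are all hypothetical choices made for illustration.

```python
import random


def exact_risk_of_scaling(c, theta):
    """Risk of c*X: variance term c^2 * k plus squared-bias term (1 - c)^2 * ||theta||^2."""
    k = len(theta)
    return c ** 2 * k + (1.0 - c) ** 2 * sum(t ** 2 for t in theta)


def risk_of_scaling(c, theta, n_draws=20000, seed=0):
    """Monte Carlo estimate of E||c*X - theta||^2 with X = theta + eps, eps ~ N(0, I_k)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        total += sum((c * (t + rng.gauss(0.0, 1.0)) - t) ** 2 for t in theta)
    return total / n_draws


# Hypothetical true means (k = 5); any other vector would do.
theta = [0.5, -1.0, 1.5, 0.0, 2.0]

for c in (1.0, 0.8, 0.5):
    exact = exact_risk_of_scaling(c, theta)
    simulated = risk_of_scaling(c, theta)
    print(f"c = {c}: exact risk = {exact:.3f}, simulated risk = {simulated:.3f}")
```

For this particular `theta` the exact risks are 5.0, 3.5 and 3.125: moving c below 1 lowers the variance term faster than it raises the bias term, up to a point that depends on the unknown ‖θ‖². That dependence is why the degree of shrinkage has to be estimated.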
The Normal means model

Setup

• $\theta \in \mathbb{R}^k$
• $\varepsilon \sim N(0, I_k)$
• $X = \theta + \varepsilon \sim N(\theta, I_k)$
• Estimator: $\hat\theta = \hat\theta(X)$
• Loss: squared error
  $$L(\hat\theta, \theta) = \sum_i (\hat\theta_i - \theta_i)^2$$
• Risk: mean squared error
  $$R(\hat\theta, \theta) = E_\theta\left[L(\hat\theta, \theta)\right] = \sum_i E_\theta\left[(\hat\theta_i - \theta_i)^2\right].$$

Two estimators

• Canonical estimator: maximum likelihood,
  $$\hat\theta^{ML} = X$$
• Risk function
  $$R(\hat\theta^{ML}, \theta) = \sum_i E_\theta\left[\varepsilon_i^2\right] = k.$$
• James-Stein shrinkage estimator
  $$\hat\theta^{JS} = \left(1 - \frac{(k-2)/k}{\overline{X^2}}\right) \cdot X.$$
• Celebrated result: uniform risk dominance; for all $\theta$,
  $$R(\hat\theta^{JS}, \theta) < R(\hat\theta^{ML}, \theta) = k.$$

First motivation of JS: Regression perspective

• We will discuss three ways to motivate the JS estimator (up to degrees of freedom correction).
• Consider estimators of the form $\hat\theta_i = c \cdot X_i$ or $\hat\theta_i = a + b \cdot X_i$.
• How to choose $c$ or $(a, b)$?
• Two particular possibilities:
  1. Maximum likelihood: $c = 1$
  2. James-Stein: $c = 1 - \frac{(k-2)/k}{\overline{X^2}}$

Practice problem (Infeasible estimator)

• Suppose you knew $X_1, \ldots, X_k$ as well as $\theta_1, \ldots, \theta_k$,
• but are constrained to use an estimator of the form $\hat\theta_i = c \cdot X_i$.
  1. Find the value of $c$ that minimizes loss.
  2. For estimators of the form $\hat\theta_i = a + b \cdot X_i$, find the values of $a$ and $b$ that minimize loss.

Solution

• First problem:
  $$c^* = \arg\min_c \sum_i (c \cdot X_i - \theta_i)^2$$
• Least squares problem!
• First order condition:
  $$0 = \sum_i (c^* \cdot X_i - \theta_i) \cdot X_i.$$
• Solution
  $$c^* = \frac{\sum_i X_i \theta_i}{\sum_i X_i^2}.$$

Solution continued

• Second problem:
  $$(a^*, b^*) = \arg\min_{a, b} \sum_i (a + b \cdot X_i - \theta_i)^2$$
• Least squares problem again!
• First order conditions:
  $$0 = \sum_i (a^* + b^* \cdot X_i - \theta_i)$$
  $$0 = \sum_i (a^* + b^* \cdot X_i - \theta_i) \cdot X_i.$$
• Solution
  $$b^* = \frac{\sum_i (X_i - \bar X) \cdot (\theta_i - \bar\theta)}{\sum_i (X_i - \bar X)^2} = \frac{s_{X\theta}}{s_X^2}, \qquad a^* + b^* \cdot \bar X = \bar\theta.$$

Regression and reverse regression

• Recall $X_i = \theta_i + \varepsilon_i$, $E[\varepsilon_i \mid \theta_i] = 0$, $\mathrm{Var}(\varepsilon_i) = 1$.
• Regression of $X$ on $\theta$: Slope
  $$\frac{s_{X\theta}}{s_\theta^2} = 1 + \frac{s_{\varepsilon\theta}}{s_\theta^2} \approx 1.$$
• For optimal shrinkage, we want to predict $\theta$ given $X$, not the other way around!
• Reverse regression of $\theta$ on $X$: Slope
  $$\frac{s_{X\theta}}{s_X^2} = \frac{s_\theta^2 + s_{\varepsilon\theta}}{s_\theta^2 + 2 s_{\varepsilon\theta} + s_\varepsilon^2} \approx \frac{s_\theta^2}{s_\theta^2 + 1}.$$
• Interpretation: "signal to (signal plus noise) ratio" < 1.

Illustration

[The slide reproduces page 148 of an article by S. M. Stigler, including its Fig. 1: Hypothetical bivariate plot of $\theta_i$ vs. $X_i$, for $i = 1, \ldots, k$. The text of the page reads:]

[...] Florida be used to improve an estimate of the price of French wine, when it is assumed that they are unrelated? The best heuristic explanation that has been offered is a Bayesian argument: If the $\theta_i$ are a priori independent $N(0, \tau^2)$, then the posterior mean of $\theta_i$ is of the same form as $\hat\theta_i^{JS}$, and hence $\hat\theta^{JS}$ can be viewed as an empirical Bayes estimator (Efron and Morris, 1973; Lehmann, 1983, page 299). Another explanation that has been offered is that $\hat\theta^{JS}$ can be viewed as a relative of a "pre-test" estimator; if one performs a preliminary test of the null hypothesis that $\theta = 0$, and one then uses $\hat\theta = 0$ or $\hat\theta_i = X_i$ depending on the outcome of the test, the resulting estimator is a weighted average of 0 and $\hat\theta^0$ of which $\hat\theta^{JS}$ is a smoothed version (Lehmann, 1983, pages 295-296). But neither of these explanations is fully satisfactory (although both help render the result more plausible); the first because it requires special a priori assumptions where Stein did not, the second because it corresponds to the result only in the loosest qualitative way. The difficulty of understanding the Stein paradox is compounded by the fact that its proof usually depends on explicit computation of the risk function or the theory of complete sufficient statistics, by a process that convinces us of its truth without really illuminating the reasons that it works. (The best presentation I know is that in Lehmann (1983, pages 300-302) of a proof due to Efron and Morris (1973); Berger (1980, page 165, example 54) outlines a short but unintuitive proof; the one shorter proof I have encountered in a textbook is vitiated by a major noncorrectable error.)

The purpose of this paper is to show how a different perspective, one developed by Francis Galton over a century ago (Stigler, 1986, chapter 8), can render the result transparent, as well as lead to a simple, full proof. This perspective is perhaps closer to that of the period before 1950 than to subsequent approaches, but it has points in common with more recent works, particularly those of Efron and Morris (1973), Rubin (1980), Dempster (1980) and Robbins (1983).

2. STEIN ESTIMATION AS A REGRESSION PROBLEM

The estimation problem involves pairs of values $(X_i, \theta_i)$, $i = 1, \ldots, k$, where one element of each pair ($X_i$) is known and one ($\theta_i$) is unknown. Since the $\theta_i$'s are unknown, the pairs cannot in fact be plotted, but it will help our understanding of the problem and suggest a means of approaching it if we imagine what such a plot would look like. Figure 1 is hypothetical, but some aspects of it accurately reflect the situation. Since $X$ is $N(\theta, 1)$, we can think of the $X$'s as being generated by adding $N(0, 1)$ "errors" to the given $\theta$'s. Thus the horizontal deviations of the points from the 45° line $\theta = X$ are independent $N(0, 1)$, and in that respect they should cluster around the line as indicated. Also, $E(\bar X) = \bar\theta$ and $\mathrm{Var}(\bar X) = 1/k$, so we should expect the point of means $(\bar X, \bar\theta)$ to lie near the 45° line.

Now our goal is to estimate all of the $\theta_i$'s given all of the $X_i$'s, with no assumptions about a possible distributional structure for the $\theta_i$'s -- they are simply to be viewed as unknown constants. Nonetheless, to see why we should expect that the ordinary estimator $\hat\theta^0$ can be improved upon, it helps to think about what we would do if this were not the case. If the $\theta_i$'s, and hence the pairs $(X_i, \theta_i)$, had a known joint distribution, a natural (and in some settings even optimal) method of proceeding would be to calculate $\hat\theta(X) = E(\theta \mid X)$ and use this, the theoretical regression function of $\theta$ on $X$, to generate estimates of the $\theta_i$'s by evaluating it for each $X_i$. We may think of this as an unattainable ideal, unattainable because we do not know the conditional distribution of $\theta$ given $X$. Indeed, we will not assume that our uncertainty about the unknown constants $\theta_i$ can be described by a probability distribution at all; our view is not that of either the Bayesian or empirical Bayesian approach. We do know the conditional distribution of $X$ given $\theta$, namely $N(\theta, 1)$, and we can calculate $E(X \mid \theta) = \theta$. Indeed this, the theoretical regression line of $X$ on $\theta$, corresponds to the line $\theta = X$ in Figure 1, and it is this line which gives the ordinary estimators $\hat\theta_i^0 = X_i$. Thus the ordinary estimator may be viewed as being based on the "wrong" regression line, on $E(X \mid \theta)$ rather than $E(\theta \mid X)$. Since, as Francis Galton already knew in the 1880's, the regressions of $X$ on $\theta$ and of $\theta$ on $X$ can be markedly different, this suggests that the ordinary estimator can be improved upon and even suggests how this might be done -- by attempting to approximate "$E(\theta \mid X)$" -- or whatever that might mean in a [...]

Expectations

Practice problem

1. Calculate the expectations of
   $$\bar X = \frac{1}{k} \sum_i X_i, \qquad \overline{X^2} = \frac{1}{k} \sum_i X_i^2,$$
   and
   $$s_X^2 = \frac{1}{k} \sum_i (X_i - \bar X)^2 = \overline{X^2} - \bar X^2.$$
2. Calculate the expected numerator and denominator of $c^*$ and $b^*$.

Solution

• $E[\bar X] = \bar\theta$
• $E[\overline{X^2}] = \overline{\theta^2} + 1$
• $E[s_X^2] = \overline{\theta^2} - \bar\theta^2 + 1 - \tfrac{1}{k} \approx s_\theta^2 + 1$ (the $1/k$ term comes from $E[\bar X^2] = \bar\theta^2 + 1/k$ and is negligible for large $k$)
• $c^* = \overline{X\theta} / \overline{X^2}$, and $E[\overline{X\theta}] = \overline{\theta^2}$. Thus
  $$c^* \approx \frac{\overline{\theta^2}}{\overline{\theta^2} + 1}.$$
• $b^* = s_{X\theta} / s_X^2$, and $E[s_{X\theta}] = s_\theta^2$.
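  Thus
  $$b^* \approx \frac{s_\theta^2}{s_\theta^2 + 1}.$$

Feasible analog estimators

Practice problem

Propose feasible estimators of $c^*$ and $b^*$.

A solution

• Recall:
  • $c^* = \overline{X\theta} / \overline{X^2}$
  • $\overline{\theta\varepsilon} \approx 0$, $\overline{\varepsilon^2} \approx 1$.
• Since $X_i = \theta_i + \varepsilon_i$,
  $$\overline{X\theta} = \overline{X^2} - \overline{X\varepsilon} = \overline{X^2} - \overline{\theta\varepsilon} - \overline{\varepsilon^2} \approx \overline{X^2} - 1$$
• Thus:
  $$c^* = \frac{\overline{X^2} - \overline{\theta\varepsilon} - \overline{\varepsilon^2}}{\overline{X^2}} \approx \frac{\overline{X^2} - 1}{\overline{X^2}} = 1 - \frac{1}{\overline{X^2}} =: \hat c.$$

Solution continued

• Similarly:
  • $b^* = s_{X\theta} / s_X^2$
  • $s_{\theta\varepsilon} \approx 0$, $s_\varepsilon^2 \approx 1$.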
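As a rough numerical check of the derivation of the feasible factor, the sketch below simulates the Normal means model and compares the infeasible factor (the sum of X_i·θ_i divided by the sum of X_i²), the feasible analog 1 − 1/mean(X²), and the James-Stein factor 1 − ((k−2)/k)/mean(X²). It then estimates the risk of each resulting estimator by Monte Carlo. The dimension `k = 50`, the way `theta` is generated, the seeds, the number of draws and all function names are assumptions made for this illustration only; none of them come from the slides.

```python
import random


def draw_x(theta, rng):
    """One draw of X = theta + eps with eps ~ N(0, I_k)."""
    return [t + rng.gauss(0.0, 1.0) for t in theta]


def mean_sq(values):
    """Average of squares, (1/k) * sum_i v_i^2."""
    return sum(v * v for v in values) / len(values)


def infeasible_c(x, theta):
    """c* = sum_i X_i theta_i / sum_i X_i^2; requires the unknown theta."""
    return sum(a * b for a, b in zip(x, theta)) / sum(a * a for a in x)


def feasible_c(x):
    """Feasible analog: c_hat = 1 - 1 / mean(X^2)."""
    return 1.0 - 1.0 / mean_sq(x)


def james_stein_c(x):
    """James-Stein factor: 1 - ((k - 2) / k) / mean(X^2)."""
    k = len(x)
    return 1.0 - ((k - 2) / k) / mean_sq(x)


def mc_risk(factor, theta, n_draws=2000, seed=1):
    """Monte Carlo estimate of E||factor(X) * X - theta||^2."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        x = draw_x(theta, rng)
        c = factor(x)
        total += sum((c * xi - t) ** 2 for xi, t in zip(x, theta))
    return total / n_draws


rng = random.Random(0)
k = 50
theta = [rng.gauss(0.0, 1.0) for _ in range(k)]  # hypothetical true means

x = draw_x(theta, rng)
print("c*    =", infeasible_c(x, theta))
print("c_hat =", feasible_c(x))
print("c_JS  =", james_stein_c(x))

print("risk of ML (c = 1):", mc_risk(lambda _x: 1.0, theta))
print("risk with c_hat   :", mc_risk(feasible_c, theta))
print("risk with c_JS    :", mc_risk(james_stein_c, theta))
```

The two feasible factors differ only in the numerator of the subtracted term (1 versus (k−2)/k), so the feasible analog always shrinks slightly more than James-Stein. The simulated risk of the maximum likelihood estimator should come out close to k, with the shrunk estimators below it for this configuration.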