Robustness of Location Estimators under t-Distributions


Robustness of location estimators under t-distributions: a literature review

C Sumarni1,2, K Sadik1, K A Notodiputro1 and B Sartono1

1 Department of Statistics, Bogor Agricultural University, Indonesia.
2 Statistics Indonesia, Bali Province.

Email: [email protected], [email protected], [email protected], [email protected]

To cite this article: C Sumarni et al 2017 IOP Conf. Ser.: Earth Environ. Sci. 58 012015, doi:10.1088/1755-1315/58/1/012015

Abstract. The assumption of normality is commonly used in the estimation of parameters in statistical modelling, but this assumption is very sensitive to outliers. The t-distribution is more robust than the normal distribution because t-distributions have longer tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For the purpose of illustration we use onion yield data containing outliers as a case study and show that the t-model produces a better fit than the normal model.

1. Introduction
Estimation of parameters in a survey often assumes that the data are a random sample from a normally distributed population. Suppose sample data are recorded for n units, y_1, \dots, y_n, and are assumed to be a random sample from a normal distribution,

    y_i \overset{iid}{\sim} N(\mu, \sigma^2).    (1)

In statistical modelling it is also common to assume normality of the error terms. The model can be written as

    y_i \overset{iid}{\sim} N(\mu(\beta), \sigma^2), \quad \text{or} \quad y_i = x_i^T \beta + \varepsilon_i \ \text{where} \ \varepsilon_i \sim N(0, \sigma^2),    (2)

and is called the normal model. Assuming a normal distribution implies that the estimates become inaccurate when there are outliers in the data. We may assume a robust distribution for the error terms to solve this problem. The t-distribution is useful for statistical modelling when the data contain outliers. Lange et al. [1] successfully demonstrated that estimation results in linear models are robust to outliers if the assumption of normality is replaced by a t-distribution, \varepsilon_i \sim t_v(0, \sigma^2). The t-model can be written as

    y_i \overset{iid}{\sim} t(\mu(\beta), \sigma^2, v),    (3)

where t(\mu(\beta), \sigma^2, v) denotes the univariate t-distribution with location parameter \mu(\beta), where \beta is a vector of regression coefficients, scale parameter \sigma^2, and v degrees of freedom.
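The longer tails of the t-distribution can be quantified directly. The following minimal sketch is not part of the paper; it assumes Python with scipy is available, and the parameter values are illustrative. It compares the probability that an observation falls more than three scale units from the location under the t-model (3) and under the normal model (2).

from scipy import stats

mu, sigma = 0.0, 1.0                 # illustrative location and scale
for v in (3, 5, 10, 30):             # degrees of freedom of the t-distribution
    p_t = 2 * stats.t.sf(mu + 3 * sigma, df=v, loc=mu, scale=sigma)   # P(|y - mu| > 3 sigma) under t_v
    p_n = 2 * stats.norm.sf(mu + 3 * sigma, loc=mu, scale=sigma)      # same probability under N(mu, sigma^2)
    print(f"v = {v:2d}:  t-tail = {p_t:.4f},  normal tail = {p_n:.4f}")

For small v the t-model assigns roughly twenty times more probability to such extreme observations than the normal model, which is why a few outlying points distort its fit far less.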
Lange et al. [1] did not, however, give explicit measures of the robustness of location estimators under t-distributions. In this paper, this robustness is reviewed by referring to the idea that the robustness of an estimator can be measured through the influence function (IF), the asymptotic relative efficiency (ARE) and the breakdown point [2]. Here we focus on the influence function and the asymptotic relative efficiency. We also study the influence of outliers on modelling by illustrating a model of the effects of solid pesticides on onion yield.

2. Location estimation under t-distributions
In robust estimation we assume that y_1, \dots, y_n are a random sample from a t-distributed population whose probability density function (pdf) is defined by (4),

    f(y_i) = \frac{\Gamma((v+1)/2)}{\sqrt{\pi v}\,\sigma\,\Gamma(v/2)} \left( 1 + \frac{1}{v}\left(\frac{y_i - \mu}{\sigma}\right)^2 \right)^{-(v+1)/2}, \qquad -\infty < y_i < \infty.    (4)

Note that if \mu = 0 and \sigma = 1, (4) is the density of the univariate Student t-distribution with v degrees of freedom; otherwise it is a noncentral t-distribution with location parameter \mu, scale parameter \sigma^2 and shape parameter (degrees of freedom) v [3]. Denote the logarithm of (4) by l_i,

    l_i = \ln\Gamma((v+1)/2) - \tfrac{1}{2}\ln(\pi v) - \tfrac{1}{2}\ln\sigma^2 - \ln\Gamma(v/2) - \frac{v+1}{2}\ln\!\left( 1 + \frac{1}{v}\left(\frac{y_i - \mu}{\sigma}\right)^2 \right),    (5)

or, more compactly,

    l_i = \text{constant} - \frac{v+1}{2}\ln\!\left( 1 + \frac{1}{v}\left(\frac{y_i - \mu}{\sigma}\right)^2 \right).    (6)

The first derivative of l_i with respect to \mu is

    \frac{\partial l_i}{\partial \mu} = \frac{v+1}{v + ((y_i-\mu)/\sigma)^2}\,\frac{y_i - \mu}{\sigma^2}.    (7)

If \sigma^2 is assumed known and v is fixed, the maximum likelihood estimator (MLE) of \mu is obtained by solving

    \sum_{i=1}^{n} \frac{\partial l_i}{\partial \mu} = 0, \qquad \text{i.e.} \qquad \sum_{i=1}^{n} \frac{v+1}{v + ((y_i-\mu)/\sigma)^2}\,\frac{y_i - \mu}{\sigma^2} = 0.

If we denote w_i = \frac{v+1}{v + ((y_i-\mu)/\sigma)^2}, then \sum_{i=1}^{n} w_i (y_i - \mu) = 0, and the MLE of \mu under the t-distribution is

    \hat{\mu}_T = \left( \sum_{i=1}^{n} w_i y_i \right) \Big/ \left( \sum_{i=1}^{n} w_i \right).    (8)

This is a weighted mean with weights that depend on the sample, so the location estimator under t-distributions is an M-estimator [2]. Since w_i is a function of the parameters (\mu, \sigma^2, v), equation (8) has no closed-form solution. The EM algorithm can be used to obtain the solution (for more details see Lange et al. [1]); a simplified iterative sketch is given below.
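Equation (8) suggests a simple fixed-point (iteratively reweighted) computation of \hat{\mu}_T when \sigma and v are held fixed. The sketch below is not taken from the paper, which uses the full EM algorithm of Lange et al. [1]; the function and variable names are ours, it assumes Python with numpy, and it only alternates between the weights w_i and the weighted mean (8).

import numpy as np

def t_location(y, sigma=1.0, v=4.0, tol=1e-8, max_iter=200):
    # Location estimate under the t-model with sigma (scale) and v (degrees
    # of freedom) held fixed; the full EM algorithm would also update them.
    y = np.asarray(y, dtype=float)
    mu = np.median(y)                                     # robust starting value
    for _ in range(max_iter):
        w = (v + 1.0) / (v + ((y - mu) / sigma) ** 2)     # weights w_i
        mu_new = np.sum(w * y) / np.sum(w)                # weighted mean, equation (8)
        if abs(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

# A single large outlier pulls the sample mean (the MLE under normality),
# but it barely moves the t-based location estimate.
rng = np.random.default_rng(0)
y = np.append(rng.normal(10.0, 1.0, size=30), 50.0)
print(np.mean(y), t_location(y))

Because each weight shrinks as (y_i - \mu)^2/\sigma^2 grows, the extreme observation receives almost no weight; this down-weighting is exactly what the influence function in section 3 formalises.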
3. The measure of robustness under t-distributions
Robustness of an estimator can be measured quantitatively and qualitatively. Quantitative robustness can be measured by the breakdown point, while qualitative robustness can be assessed through efficiency and stability.

3.1. The influence function (IF)
The stability of a robust estimator can be seen from its influence function, which is computed from the \psi function. The influence function of an M-estimator (maximum likelihood type estimator) is defined as [2]

    \mathrm{IF}(y;\mu) = \frac{\psi(y;\mu)}{-E\left[ \partial \psi(y;\mu)/\partial\mu \right]}.    (9)

If the \psi function of an M-estimator is the first derivative of the logarithm of the pdf, \psi(y;\mu) = \partial \ln f(y)/\partial\mu, then the M-estimator is the usual MLE [4]. The location estimator under t-distributions is therefore an M-estimator with \psi function defined as (10),

    \psi(y;\mu) = \frac{v+1}{v + ((y-\mu)/\sigma)^2}\,\frac{y-\mu}{\sigma^2}.    (10)

The first derivative of the \psi function (10) is computed in the Appendix, and its expectation is

    E\!\left[ \frac{\partial \psi(y;\mu)}{\partial\mu} \right] = \frac{v+1}{\sigma^2}\, E\!\left[ \frac{((y-\mu)/\sigma)^2 - v}{\left( v + ((y-\mu)/\sigma)^2 \right)^2} \right] = \frac{v+1}{v\sigma^2}\, E\!\left[ \frac{\tfrac{z^2}{v} - 1}{\left( 1 + \tfrac{z^2}{v} \right)^2} \right],

where z = (y-\mu)/\sigma. Lange et al. [1] computed that the integration yields

    E\!\left[ \left( 1 + \frac{z^2}{v} \right)^{-m} \right] = \frac{\tfrac{v}{2}\left(\tfrac{v}{2}+1\right)\cdots\left(\tfrac{v}{2}+m-1\right)}{\tfrac{v+1}{2}\left(\tfrac{v+1}{2}+1\right)\cdots\left(\tfrac{v+1}{2}+m-1\right)},

and, since z^2/v = (1 + z^2/v) - 1,

    E\!\left[ \frac{z^2}{v}\left( 1 + \frac{z^2}{v} \right)^{-2} \right] = E\!\left[ \left( 1 + \frac{z^2}{v} \right)^{-1} \right] - E\!\left[ \left( 1 + \frac{z^2}{v} \right)^{-2} \right].

Using these results with m = 1 and m = 2,

    E\!\left[ \frac{\partial \psi(y;\mu)}{\partial\mu} \right] = \frac{v+1}{v\sigma^2}\left[ \frac{v}{(v+1)(v+3)} - \frac{v(v+2)}{(v+1)(v+3)} \right] = \frac{1}{\sigma^2(v+3)} - \frac{v+2}{\sigma^2(v+3)} = -\frac{v+1}{\sigma^2(v+3)},

so that

    -E\!\left[ \frac{\partial \psi(y;\mu)}{\partial\mu} \right] = \frac{v+1}{\sigma^2(v+3)}.    (11)

Thus the influence function of the M-estimator under t-distributions is

    \mathrm{IF}(\hat{\mu}_T) = \frac{v+3}{v + ((y-\mu)/\sigma)^2}\,(y - \mu).    (12)

How can we see that \hat{\mu}_T, the location estimator under the t-distribution, is more robust than \hat{\mu}_N, the location estimator under the normal distribution? To answer this we need the \psi function and the influence function under the normal distribution. Proceeding in the same way, under the normal distribution the \psi function is

    \psi^{*}(y;\mu) = \frac{y-\mu}{\sigma^2},    (13)

and the influence function is

    \mathrm{IF}(\hat{\mu}_N) = y - \mu.    (14)

The influence function of the location estimator under the normal distribution is a straight line and is unbounded: the larger the observed value y, the larger the value of the influence function, and vice versa. This means that outliers in the data strongly influence the estimates. In contrast, the influence function of the location estimator under t-distributions is a bounded function (see figure 1), because any outlying observation is down-weighted by

    w_i = \frac{v+1}{v + ((y_i-\mu)/\sigma)^2},    (15)

where v is a shape parameter, so the influence of such data is reduced and an outlier does not affect the estimate significantly. That is why the location estimator under t-distributions is more robust than the one under the normal distribution.

Figure 1. Influence functions of the location estimator under the t-distribution (t-IF, bounded) and under the normal distribution (normal-IF, unbounded), plotted against y.
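The bounded versus unbounded behaviour shown in figure 1 can be reproduced directly from (12) and (14). The short sketch below is not from the paper; it assumes Python with numpy and matplotlib, and the values \mu = 0, \sigma = 1, v = 4 are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt

mu, sigma, v = 0.0, 1.0, 4.0                                  # illustrative parameter values
y = np.linspace(-20, 20, 401)

if_t = (v + 3.0) * (y - mu) / (v + ((y - mu) / sigma) ** 2)   # t influence function, equation (12)
if_norm = y - mu                                              # normal influence function, equation (14)

plt.plot(y, if_norm, label="normal-IF (unbounded)")
plt.plot(y, if_t, label="t-IF (bounded)")
plt.xlabel("y")
plt.ylabel("IF")
plt.legend()
plt.savefig("influence_functions.png")

The t influence function rises to a maximum near |y - \mu| = \sigma\sqrt{v} and then redescends towards zero, so arbitrarily large outliers contribute essentially nothing, whereas the normal influence function grows linearly without bound.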