Density Estimation Using Quantile Variance and Quantile-Mean Covariance


Density Estimation Using Quantile Variance and Quantile-Mean Covariance

Montes-Rojas, Gabriel*   Mena, Sebastián†

August 2018

Abstract

Based on asymptotic properties of sample quantiles derived by Hall & Martin (1988) and Ferguson (1999), we propose a novel method that exploits the quantile variance and the quantile-mean covariance to estimate the density of a distribution from samples. The procedure first estimates the sample quantile variance and the sample quantile-mean covariance using bootstrap techniques, and then uses them to compute the density. We conduct Monte Carlo simulations for different data generating processes, sample sizes and parameters, and find that in many cases the quantile density estimators outperform the standard kernel density estimator in terms of mean integrated squared error. Finally, we propose some smoothing techniques to reduce the estimators' variance and increase their accuracy.

Keywords: Density Estimation, Quantile Variance, Quantile-Mean Covariance, Bootstrap
JEL Classification: C13, C14, C15, C46

*CONICET, Universidad de San Andrés, and Universidad de Buenos Aires. Email: [email protected]
†CONICET, Universidad de San Andrés, and Universidad Nacional de Tucumán. Email: [email protected]

1 Introduction

If the continuous random variable $X$ has density $f(x)$ and median $Q(0.5) = \theta$, then the variance of the sample median $\hat{Q}_n(0.5)$ of $n$ observations is known to be approximately $1/[4n f(\theta)^2]$, so that as $n \to \infty$ the asymptotic variance of the ($\sqrt{n}$-scaled) median converges to $1/[4 f(\theta)^2]$. (According to Stigler (1973), this result, together with the asymptotic properties of median estimation, was first stated by Laplace in 1818.) It can be shown that this holds for any quantile $\tau$: the asymptotic variance of the sample estimate of $Q(\tau) = x_\tau$ equals $\tau(1-\tau)/f(x_\tau)^2$. Ferguson (1999) extended these results and proved that the joint mean-quantile distribution has asymptotic covariance equal to $\varpi(\tau)/f(x_\tau)$, where $\varpi(\tau) = \min_a E[\rho_\tau(X - a)]$ is the expected quantile loss function.
These results have been widely used to estimate the variance of order statistics and to construct confidence intervals for robust inference. Doing so usually requires assumptions about the functional form of $f(x)$, or else a preliminary non-parametric estimate of the density. Based on the properties of the bootstrap estimator of the sample quantile variance derived by Hall & Martin (1988), we explore the reverse problem: first estimate the sample quantile variance (Qv) and the sample quantile-mean covariance (Qc), and then use them to estimate the density function $f(x)$ non-parametrically. We call these the Qv and Qc density estimators.

2 Quantile Variance Density Estimator

Let $X_1, \ldots, X_n$ be i.i.d. with distribution function $F(x)$, density $f(x)$, mean $\mu$ and finite variance $\sigma^2$. Let $0 < \tau < 1$ and let $q_\tau$ denote the $\tau$-quantile of $F$, so that $F(q_\tau) = \tau$. Assume that the density $f(x)$ is continuous and positive at $q_\tau$. Let $Q(\tau)$ be the quantile function, so that $Q(\tau) \equiv F^{-1}(\tau) = q_\tau$, and let $Y_{\tau,n} = X_{n:n\tau}$ denote the sample $\tau$-quantile. Then:

$$\sqrt{n}\,(Y_{\tau,n} - q_\tau) \xrightarrow{L} N\!\left(0, \frac{\tau(1-\tau)}{f(x_\tau)^2}\right) \qquad (1)$$

Definition 1. The quantile variance function, $V(\tau)$, is the asymptotic variance of the sample quantile $Y_{\tau,n}$ with index $\tau \in (0,1)$:

$$V(\tau) = \lim_{n \to \infty} n\,V(Y_{\tau,n}). \qquad (2)$$

From Equations (1) and (2) we know that in the limit $V(\tau) = \tau(1-\tau)/f(x_\tau)^2$, so the density at quantile $\tau$ is:

$$f(Q(\tau)) = \sqrt{\frac{\tau(1-\tau)}{V(\tau)}}. \qquad (3)$$

The quantile variance estimator of $f(Q(\tau))$ requires only a consistent estimator of $V(\tau)$. The following subsections sketch the steps needed to obtain one.
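Anticipating the bootstrap construction detailed in the subsections below, the whole Qv estimator can be sketched in code. This is a hypothetical illustration, not the authors' code: the function name and choices such as `np.quantile`'s default interpolation are ours, and because $V(\tau)$ is defined for the $\sqrt{n}$-scaled quantile, the raw bootstrap variance is rescaled by $n$ before inverting Equation (3).

```python
import numpy as np

def qv_density(x, tau, B=500, rng=None):
    """Qv density estimator sketch: bootstrap the sample tau-quantile,
    estimate its variance, and invert f = sqrt(tau*(1-tau)/V(tau))."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x)
    n = len(x)
    # Bootstrap replications of the sample tau-quantile.
    q_b = np.array([np.quantile(rng.choice(x, size=n, replace=True), tau)
                    for _ in range(B)])
    # V(tau) is defined for the sqrt(n)-scaled quantile, hence the factor n.
    V_hat = n * q_b.var()
    return np.sqrt(tau * (1.0 - tau) / V_hat)

# Example: standard normal sample; the true density at the median
# is 1/sqrt(2*pi) ≈ 0.3989.
x = np.random.default_rng(1).standard_normal(2000)
print(qv_density(x, 0.5))
```

With $n = 2000$ the estimate typically lands within a few percent of $0.3989$; the relative error shrinks only at the slow $n^{-1/4}$ rate discussed below.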
2.1 Consistent estimator of $V(\tau)$

We propose a naive non-parametric bootstrap estimator which takes $B$ random sub-samples of size $n$ (with replacement) out of the $n$ observations, sorts each into rank order $X_1^b \le \cdots \le X_n^b$, and:

1. Computes the $\tau$-quantile of sub-sample $b$ as
$$\hat{Q}_n^b(\tau) = \inf\left\{ X_j^b \;\middle|\; \frac{1}{n}\sum_{i=1}^n \mathbb{1}[X_i^b \le X_j^b] \ge \tau \right\}, \qquad b = 1, \ldots, B, \; j = 1, \ldots, n;$$

2. Computes the bootstrapped quantile mean as $\bar{Q}_n(\tau) = \frac{1}{B}\sum_{b=1}^B \hat{Q}_n^b(\tau)$;

3. Finally, computes the bootstrapped quantile variance as $\hat{V}(\tau) = \frac{1}{B}\sum_{b=1}^B \left(\hat{Q}_n^b(\tau) - \bar{Q}_n(\tau)\right)^2$.

Lemma 1. $\hat{V}_n^B(\tau) = V(\tau) + o_p(1)$ as $n \to \infty$, $B \to \infty$.

2.2 QV Density Estimator

Our proposed density estimator is then

$$\hat{f}(Q(\tau)) = \sqrt{\frac{\tau(1-\tau)}{\hat{V}(\tau)}}. \qquad (4)$$

Lemma 2. $\hat{f}_n(Q(\tau)) = f(Q(\tau)) + O_p(n^{-1/4})$ as $n \to \infty$, $B \to \infty$.

Hall & Martin (1988) show that the relative error of the bootstrap quantile variance estimator and of the bootstrap sparsity function estimator is of precise order $n^{-1/4}$, where $n$ is the sample size. Since

$$n\,\sigma^2_{q_\tau} = \frac{\tau(1-\tau)}{f(x_\tau)^2} + O(n^{-1}),$$

the rate of convergence of $\hat{f}_n(Q(\tau))$ is:

$$n^{1/4}\left(\hat{f}_n(Q(\tau)) - f(Q(\tau))\right) \xrightarrow{L} N(0, T^2) \qquad (5)$$

where $T^2 = \frac{1}{2\pi^{1/2}}\,\frac{\tau(1-\tau)}{f(x_\tau)^2}$.

3 Quantile-Mean Covariance Density Estimator

Suppose $\{X_1, \ldots, X_n\}$ is an i.i.d. (independent and identically distributed) sample with distribution function $F(\cdot)$, density $f(\cdot)$, quantile function $Q(\tau)$, $\tau \in (0,1)$, and mean $\mu$.

Definition 2. The quantile-mean covariance function, $C(\tau)$, is the asymptotic covariance between the sample quantile $\hat{Q}_n(\tau)$, with index $\tau \in (0,1)$, and the sample mean $\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i$:

$$C(\tau) = \lim_{n \to \infty} n\,\mathrm{COV}(\hat{Q}_n(\tau), \bar{X}_n). \qquad (6)$$

Definition 3. Define the expected quantile loss function (EQLF)

$$\varpi(\tau) = \min_a E[\rho_\tau(X - a)] = \tau\left(\mu - E[X \mid X < Q(\tau)]\right) = \tau\left(\mu - \frac{1}{\tau}\,E[\mathbb{1}[X < Q(\tau)]\,X]\right), \qquad (7)$$

where $\rho_\tau(u) = \{\tau - \mathbb{1}[u \le 0]\}u$ is the quantile check function of Koenker and Bassett (1978). By Ferguson (1999),

$$f(Q(\tau)) = \frac{\varpi(\tau)}{C(\tau)}. \qquad (8)$$

The estimator of $f(Q(\tau))$ requires consistent estimators of $\varpi(\tau)$ and $C(\tau)$.
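Anticipating the estimators made precise in the next subsections, Equation (8) suggests the following sketch. Again this is hypothetical code rather than the authors' implementation; the bootstrap covariance is rescaled by $n$ because $C(\tau)$ is defined for the $\sqrt{n}$-scaled statistics.

```python
import numpy as np

def qc_density(x, tau, B=500, rng=None):
    """Qc density estimator sketch: plug-in expected quantile loss over
    bootstrapped quantile-mean covariance, f = pi(tau)/C(tau)."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x)
    n = len(x)
    q_hat = np.quantile(x, tau)
    # Eq. (7): pi(tau) = tau*(mean - (1/tau)*E[X 1{X < Q(tau)}]).
    pi_hat = tau * (x.mean() - x[x < q_hat].sum() / (tau * n))
    means, quants = np.empty(B), np.empty(B)
    for b in range(B):
        xb = rng.choice(x, size=n, replace=True)
        means[b], quants[b] = xb.mean(), np.quantile(xb, tau)
    # C(tau) is defined for the sqrt(n)-scaled statistics, hence the factor n.
    C_hat = n * np.cov(means, quants)[0, 1]
    return pi_hat / C_hat

# Example: standard normal sample; true density at the median ≈ 0.3989.
x = np.random.default_rng(2).standard_normal(2000)
print(qc_density(x, 0.5))
```

For the standard normal at $\tau = 0.5$, $\varpi(0.5) = C(0.5) \cdot f(0) \approx 0.3989$, so both the numerator and the recovered density should be close to $1/\sqrt{2\pi}$.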
The following subsections sketch the steps needed for each element.

3.1 Consistent estimator of $\varpi(\tau)$

Consider the following estimator:

$$\hat{\varpi}_n(\tau) = \tau\left(\bar{X}_n - \frac{1}{\tau}\,\frac{1}{n}\sum_{i=1}^n X_i\,\mathbb{1}[X_i < \hat{Q}_n(\tau)]\right).$$

Lemma 3. $\hat{\varpi}_n(\tau) = \varpi(\tau) + o_p(1)$ as $n \to \infty$.

3.2 Consistent estimator of $C(\tau)$

The key point is how to estimate $C(\tau)$. Essentially we just want to estimate a "covariance" between two random variables. Think about how we would estimate the covariance between $\bar{X}_n$ and $\hat{\sigma}^2_n$, or any other two "moments".

Proposal: the bootstrap.

1. Consider $B$ bootstrap samples of size $n$, $\{x_i^b\}_{i=1}^n$, $b = 1, 2, \ldots, B$;

2. Compute $(\bar{X}_n^b, \hat{Q}_n^b(\tau))$, $b = 1, 2, \ldots, B$;

3. Compute $\hat{C}_n^B(\tau) = \frac{1}{B}\sum_{b=1}^B \bar{X}_n^b\,\hat{Q}_n^b(\tau) - \left(\frac{1}{B}\sum_{b=1}^B \bar{X}_n^b\right)\left(\frac{1}{B}\sum_{b=1}^B \hat{Q}_n^b(\tau)\right)$ (i.e. the sample covariance).

Lemma 4. $\hat{C}_n^B(\tau) = C(\tau) + o_p(1)$ as $n \to \infty$, $B \to \infty$.

3.3 QC Density Estimator

Our proposed density estimator is then

$$\hat{f}_n(Q(\tau)) = \frac{\hat{\varpi}_n(\tau)}{\hat{C}_n^B(\tau)}. \qquad (9)$$

Lemma 5. $\hat{f}_n(Q(\tau)) = f(Q(\tau)) + o_p(1)$ as $n \to \infty$, $B \to \infty$.

4 Smoothing Methods

In practice, the density estimator may have several kinks in small samples because of the non-continuous nature of quantile estimators. We therefore propose smoothing strategies to obtain smoother estimators.

4.1 Moving Average

Suppose a given grid of $\tau$ values $\mathcal{T} = \{\tau_1, \tau_2, \ldots, \tau_T\}$ such that $\tau_i < \tau_{i+1}$, $i = 1, 2, \ldots, T-1$. Then consider a moving average over $2m+1$ points (i.e. we average the $m$ quantiles to the left, the $m$ quantiles to the right, and the point itself): for $i = m+1, \ldots, T-m$,

$$\hat{f}_n(Q(\tau_i))^{MA} = \frac{1}{2m+1}\left[\hat{f}_n(Q(\tau_{i-m})) + \hat{f}_n(Q(\tau_{i-m+1})) + \cdots + \hat{f}_n(Q(\tau_{i-1})) + \hat{f}_n(Q(\tau_i)) + \hat{f}_n(Q(\tau_{i+1})) + \cdots + \hat{f}_n(Q(\tau_{i+m}))\right]$$

4.2 HP Filter

Hodrick and Prescott (1997) proposed a very popular method for decomposing time series into trend and cycle. Paige & Trindade (2010) proved that the HP filter is a special case of the penalized spline smoother.
Then, our HP-smoothed estimator $\hat{f}_n(Q(\tau_i))^{HP}$ results from the minimization:

$$\min_{\hat{f}_n(Q(\tau_i))^{HP}} \left\{ \sum_{i=1}^T \left(\hat{f}_n(Q(\tau_i)) - \hat{f}_n(Q(\tau_i))^{HP}\right)^2 + \lambda \sum_{i=3}^T \left[\left(\hat{f}_n(Q(\tau_i))^{HP} - \hat{f}_n(Q(\tau_{i-1}))^{HP}\right) - \left(\hat{f}_n(Q(\tau_{i-1}))^{HP} - \hat{f}_n(Q(\tau_{i-2}))^{HP}\right)\right]^2 \right\}$$

where $\lambda$ is a free smoothing parameter.

4.3 Kernel

Alternatively, consider a kernel smoothing estimator for a kernel $\Psi$:

$$\hat{f}_n(Q(\tau_i))^{K} = \frac{1}{2m+1}\left[\Psi(\tau_{i-m} - \tau_i)\hat{f}_n(Q(\tau_{i-m})) + \cdots + \Psi(\tau_{i-1} - \tau_i)\hat{f}_n(Q(\tau_{i-1})) + \Psi(0)\hat{f}_n(Q(\tau_i)) + \Psi(\tau_{i+1} - \tau_i)\hat{f}_n(Q(\tau_{i+1})) + \cdots + \Psi(\tau_{i+m} - \tau_i)\hat{f}_n(Q(\tau_{i+m}))\right]$$

Here $m$ is a smoothing parameter, and we can analyze the asymptotic properties with respect to $n$ and $m$ to obtain optimality properties.

5 Monte Carlo Simulations

This section considers alternative scenarios in which we first simulate known data generating processes (DGPs), then try to recover those DGPs using the techniques presented above (the quantile variance and quantile-mean covariance density estimators), and finally evaluate the performance of the estimators through their mean integrated squared error (MISE).
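As a concrete instance of the smoothers above, the moving average of Section 4.1 reduces to a few lines. This is a hypothetical implementation; here the endpoints where a full $2m+1$ window does not fit are simply left unsmoothed, which is one possible boundary choice the paper does not specify.

```python
import numpy as np

def ma_smooth(f_hat, m):
    """Section 4.1 moving average: replace f_hat(Q(tau_i)) by the mean of
    its 2m+1 neighbours on an ordered tau grid; endpoints are left as-is."""
    f_hat = np.asarray(f_hat, dtype=float)
    out = f_hat.copy()
    for i in range(m, len(f_hat) - m):
        out[i] = f_hat[i - m : i + m + 1].mean()
    return out

# Example: a kinked sequence of raw density estimates on a tau grid.
raw = [0.30, 0.42, 0.35, 0.45, 0.38]
print(ma_smooth(raw, 1))
```

The kernel smoother of Section 4.3 is the same loop with each neighbour weighted by $\Psi(\tau_{i+j} - \tau_i)$ instead of equally.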