Keming Yu Brunel University London Outline

Total Page:16

File Type:pdf, Size:1020Kb

Keming Yu Brunel University London Outline Bayesian Quantile Regression Keming Yu Brunel University London Outline • Quantile regression (QR) • Bayesian inference quantile regression (BQR) • Methods/algorithms • Conclusions Quantile regression • Regression or regression model usually refers to modelling the conditional expectation of Y given X E(Y jX = x): • Regression model typically measures the rela- tionship between Y and X, • it could be used to identify the dependency or effect, • it can also be used for prediction. Quantile regression • QR model is to modelling the conditional quan- tile of Y given X Qτ (Y jx); where 0 < τ < 1 stands for the τth quantile of Y . • For example, median regression with τ = 0:5. • If the conditional distribution of Y given X is symmetric, then the median regression is iden- tical to the mean regression, otherwise, they are different. Z 1 • In general, E(yjx) = Qτ (Y jx)dτ 0 =) mean regression model can be viewed as a summary of all the quantile effects. =) QR gives a deep analysis of the way that Y and X are related. An example of a simple linear quantile regression: when (X; Y ) ∼ bivariate normal distribution N(0; 0; r; 1; 1), 2 −1 Qτ (x) = rx + (1 − r )Φ (τ); where Φ−1(τ) denotes the inverse of standard nor- mal distribution. Due to the symmetry of the conditional distribu- tion, all quantile curves q0:1(x), q0:5(x) and q0:9(x) from N(0; 0; 0:75; 1; 1) are parallel. • • (X, Y)~N(0, 0, 0.75, 1, 1) • • •• •• • • • • • •• ••• • • • • • • • • • • • • • • •• • • • • • • • • ••• • •• • •• ••• • • • • • • • • • • • • • • • • • • • • • •• •• ••• • •• • •• •• • • •• • • •• • •• • • • • •• ••• • • • • • ••• • • • • • • • • •• • • • •• ••• • • • • • • • • • • • • • • • ••• •• • •••• ••• • • • • •• • • • •• •• •••••• ••• ••• • • • • • • • ••• • •••• •••• •••• • • • •• • • • •• ••• • • • • •• • • •• •• • ••• • • y • • ••• • • •• • • •• •• •• • • • • • • • •• • • ••• • •• • • • •• • • •••• • •• • • • • • •••• •• • • ••• • • •• • • •• •• • •••• • •• • • •• • • •• • • ••••• • • • • • ••••••• • • •• • • ••• • ••••• • • • •• •• ••• •• ••• • • • • • •• • • • • • 90% • •• • • ••• • • • •• •• • • • ••• • • • • • • • • •• • • • • • • •• • • • • •• • •• • • • • • • • • • •• • 50% • • • • • • • • •• • • • • • • • • • • • 10% • −4 −2 0 2 −3 −2 −1 0 1 2 x Otherwise, as see from the 4011 US girl weights against ages: Children Weight (kg) Children Weight (kg) Overweight Weight (kg) Weight (kg) Weight Underweight 0 20 40 60 80 100 0 20 40 60 80 100 5 10 15 20 5 10 15 20 Age (year) Age (year) Applications of quantile regression include • risk measurement: • the commonly used risk measurers in finance is called VaR (value at risk), which actually corresponding a tail value of an asset price. • People are more concerned the upper pollution level than an average one. Applications of quantile regression include • analysis of fat-tailed distribution: • does cheaper food contribute to children obe- sity? • does a specific diet make difference on chil- dren's weights? • The probability of Y exceeding a threshold Q: • given Q, find τ: τ = P r[Y ≤ QjX]. • forecast of the time-varying probability of a financial return exceeding a given threshold; • In wind-farm to predict/check if the wind speed exceeds a certain level; • flooding prediction and energy demand predic- tion. daily exceedance probability p LS Logit Empirical 0.0 0.2 0.4 0.6 0.8 1.0 hourly wind speed m/s 0 5 10 15 20 Jan Mar May Jul Sep Nov Jan Year 2011 Basic fitting method of QR • mean regression satisfies 2 • E(Y jX = x) = argminaE[(Y − a) jX]: • an QR qτ (x) satisfies • τ = P r[Y ≤ qτ (x)jX = x]: where the `Check function' τu; u > 0 • ρ (u) = [τ − I(u < 0)]u = f τ −(1 − τ)u; u ≤ 0: • f gn Given the data Yi;Xi i=1 to fit a linear QR model Y = x β + ϵ then P • ^ n − 0 β(τ) = argminb i=1 ρτ (Yi Xib): `Working' likelihood function for QR • Mean regression corresponds to loss function L(u) = u2 and normal likelihood with pdf − f (u) = p 1 exp(−(y µ)): N 2πσ 2σ2 • QR corresponds to check function ρτ (u) and the asymmetric Laplace likelihood (ALD) with pdf τ(1 − τ) y − µ f (u) = exp(−ρτ ( )): ALD σ σ • As minimizing the `check function' () maxi- mizing an ALD-based likelihood function. Bayesian Quantile Regression (BQR) • Regarding the regression parameters β from a 0 linear QR model Qτ (Y jx) = x β as a random variable, aiming at its posterior distribution ac- cording to Bayes Theorem: P r(βj:) ≈ P r(β) × P r(:jβ): • Bayesian inference depends on prior and likeli- hood function. • All selected likelihoods may not be true but some are working. • ALD has good performance on data generated from many error distributions (Ji et al. 2012; Li et al. 2010; among others) and theoretic justification: • Komunjer (2005) Quasi-Maximum Likelihood Estimation for Conditional Quantiles, J. Econo- metrics. Sriram, Ramamoorthi and Ghosh (2013). Pos- terior consistency of Bayesian quantile Regres- sion under a mis-specified likelihood based on asymmetric Laplace density, Bayesian Analysis. BQR • Yu and Moyeed(2001) used Metropolis Hast- ings (M-H) on joint distribution P r(βj:::): • R packages: http://cran.r-project.org/web/packages/bayesQR/bayesQR. pdf http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/ MCMCpack/html/MCMCquantreg.html Gibbs sampling for BQR • The advantage of Gibbs sampler over Metropo- lis Hastings (MH) algorithm: • Gibbs sampling doesn't have the convergence issue as MH algorithm has, and unlike M-H al- gorithm which has proposal distribution selec- tion, the proposal distribution of Gibbs sam- pling is simply taken to be the conditional dis- tributions of the target distribution. • Each density to be sampling is of low dimension (often one dimensional) and thus it is very easy and efficient to sample from it. Gibbs sampling for BQR • ALD as a Mixture of Normals { ( )} τ (1 − τ) z − µ fτ (z; µ, σ) = exp −ρτ σ σ Z { } 1 1 1 = p exp − fz − µ − (1 − 2τ)wg2 0 2 πσw 4σw ( ) τ(1 − τ) τ(1 − τ) × exp − w dw σ σ • This extends the results (Laplace distribution with τ = 0:5 here) of West (1987) and Andrews and Mallows (1974) to the ALD. Gibbs sampling for BQR • That is, p ϵ = µ v + δ σ v u; where = 1−2τ and 2 = 2 , µ τ(1−τ) δ τ(1−τ) • v and u are independent and follow standard exponential distribution and normal distribution respectively: • v ∼ Exp(1/σ) and u ∼ N(0; 1). • Therefore the conditional distribution f(yjβ; σ; v) 0 under y = β X + ϵ consists of independent nor- 0 2 2 mal distribution N(β Xt + µσzt; δ σ zt): • Once the normal-Gamma conjugate prior of a normal distribution is provided for (β; σ), we can construct a Gibbs sampler for inference of the posterior conditional densities of all quantities. • To this end we assume that the prior β ∼ N(β0;B0) and σ ∼ IG(n0=2; s0=2), an inverse Gamma dis- tribution IG(a; b) with parameters a = n0=2 and b = s0=2. • Then the posterior of β still follows normal dis- tribution βjy; v; σ ∼ N(βp;Bp) P − with B−1 = n (X X0/δ2σv ) + B 1 and β = P p i=1 i i i 0 p n − 2 −1 Bp( i=1(Xi(Yi µvi)/δ σvi) + Bp β0): • The posterior distribution for vi follows a gen- eralized inverse Gaussian distribution 1 v jy; β; σ ∼ GIG( ; α ; γ ); i 2 i i 2 − 0 2 2 2 2 2 with αi = (Yi β Xi) /δ σ and γi = 2/σ + µ /δ σ: • The posterior distribution for σ follows an in- verse Gamma distribution n∗ s∗ σjy; β; v ∼ IG( ; ); 2 2 P with n∗ = n + 3n and s∗ = s + 2 n v + P 0 0 i=1 i n − 0 − 2 2 i=1(Yi β Xi µ vi) /δ vi: • A Gibbs sampler successively sampling from σjβ; v; y vjβ; σ; y βjσ; v; y converges to p(β; σ; vjy). R packages: R-Package lqmm (cran.r-project.org/package=lqmm) for panel (longitudinal) data. R-package 'bayesQR' cran.r-project.org/web/packages/bayesQR/bayesQR.pdf. R-package Package 'Brq' cran.r-project.org/web/packages/Brq/Brq.pdf for BQR with cross-section data. R-package BSquare is also for BQR with panel data. Mathematics Justification of ALD-likelihood • Komunjer (2005): Quasi-Maximum Likelihood Estimation for Conditional Quantiles," Journal of Econometrics, 128, 137{164. • Komunjer (2005): likelihood should belong a tick-exponential family of densities { family whose role in QR is analog to the role of the linear- exponential family in mean regression estima- tion. • The "well-known member of the tick-exponential family is the asymmetric Laplace density..." How accurate of Gibbs sampler with ALD-likelihood? Consider fitting a QR model y = Qτ (yjx) + ϵ with BQR and eight different model errors. The 8 error distributions chosen to represent a wide variety of characteristics that a true error distribution may possess. • Gaussian (Gau): N(0; 12); • 1 −22 2 1 − 49 32 3 49 52 Skewed (Skew): 5N( 25 ; 1 ) + 5N( 125; 2 ) + 5N(250; 9 ); • 2 2 1 1 2 Kurtotic (Kur): 3N(0; 1 ) + 3N(0; 10 ); • 1 2 9 1 2 Outlier (Outl): 10N(0; 1 ) + 10N(0; 10 ); • 1 − 22 1 2 2 Bimodal (Bim): 2N( 1; 3 ) + 2N(1; 3) ); • 1 −3 12 1 3 12 Bimodal, separate modes (Sepa): 2N( 2; 2 ) + 2N(2; 2 ); • 3 − 43 2 1 107 12 Skewed bimodal (Skeb): 4N( 100; 1 ) + 4N(100; 3 ); • 9 −6 32 9 6 32 1 12 Trimodal (Tri): 20N( 5; 5 ) + 20N(5; 5 ) + 10N(0; 4 ): Gau Skew 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Kur Outl 0 1 2 3 0.0 0.5 1.0 1.5 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Bim Sepa 0.0 0.1 0.2 0.3 0.4 0.00 0.10 0.20 0.30 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Tri Skeb 0.0 0.1 0.2 0.3 0.4 0.00 0.10 0.20 0.30 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Error distributions.
Recommended publications
  • QUANTILE REGRESSION for CLIMATE DATA Dilhani Marasinghe Clemson University, [email protected]
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Clemson University: TigerPrints Clemson University TigerPrints All Theses Theses 8-2014 QUANTILE REGRESSION FOR CLIMATE DATA Dilhani Marasinghe Clemson University, [email protected] Follow this and additional works at: https://tigerprints.clemson.edu/all_theses Part of the Statistics and Probability Commons Recommended Citation Marasinghe, Dilhani, "QUANTILE REGRESSION FOR CLIMATE DATA" (2014). All Theses. 1909. https://tigerprints.clemson.edu/all_theses/1909 This Thesis is brought to you for free and open access by the Theses at TigerPrints. It has been accepted for inclusion in All Theses by an authorized administrator of TigerPrints. For more information, please contact [email protected]. QUANTILE REGRESSION FOR CLIMATE DATA A Master Thesis Presented to the Graduate School of Clemson University In Partial Fulfillment of the Requirements for the Degree MASTER OF SCIENCE Mathematical Sciences by DILHANI SHALIKA MARASINGHE August 2014 Accepted by: Dr. Collin Gallagher, Committee Chair Dr. Christoper McMahan Dr. Robert Lund Abstract Quantile regression is a developing statistical tool which is used to explain the relationship between response and predictor variables. This thesis describes two examples of climatology using quantile re- gression. Our main goal is to estimate derivatives of a conditional mean and/or conditional quantile function. We introduce a method to handle autocorrelation in the framework of quantile regression and used it with the temperature data. Also we explain some properties of the tornado data which is non-normally distributed. Even though quantile regression provides a more comprehensive view, when talking about residuals with the normality and the constant variance assumption, we would prefer least square regression for our temperature analysis.
    [Show full text]
  • Gretl User's Guide
    Gretl User’s Guide Gnu Regression, Econometrics and Time-series Allin Cottrell Department of Economics Wake Forest university Riccardo “Jack” Lucchetti Dipartimento di Economia Università Politecnica delle Marche December, 2008 Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation (see http://www.gnu.org/licenses/fdl.html). Contents 1 Introduction 1 1.1 Features at a glance ......................................... 1 1.2 Acknowledgements ......................................... 1 1.3 Installing the programs ....................................... 2 I Running the program 4 2 Getting started 5 2.1 Let’s run a regression ........................................ 5 2.2 Estimation output .......................................... 7 2.3 The main window menus ...................................... 8 2.4 Keyboard shortcuts ......................................... 11 2.5 The gretl toolbar ........................................... 11 3 Modes of working 13 3.1 Command scripts ........................................... 13 3.2 Saving script objects ......................................... 15 3.3 The gretl console ........................................... 15 3.4 The Session concept ......................................... 16 4 Data files 19 4.1 Native format ............................................. 19 4.2 Other data file formats ....................................... 19 4.3 Binary databases ..........................................
    [Show full text]
  • Quantreg: Quantile Regression
    Package ‘quantreg’ June 6, 2021 Title Quantile Regression Description Estimation and inference methods for models of conditional quantiles: Linear and nonlinear parametric and non-parametric (total variation penalized) models for conditional quantiles of a univariate response and several methods for handling censored survival data. Portfolio selection methods based on expected shortfall risk are also now included. See Koenker (2006) <doi:10.1017/CBO9780511754098> and Koenker et al. (2017) <doi:10.1201/9781315120256>. Version 5.86 Maintainer Roger Koenker <[email protected]> Repository CRAN Depends R (>= 2.6), stats, SparseM Imports methods, graphics, Matrix, MatrixModels, conquer Suggests tripack, akima, MASS, survival, rgl, logspline, nor1mix, Formula, zoo, R.rsp License GPL (>= 2) URL https://www.r-project.org NeedsCompilation yes VignetteBuilder R.rsp Author Roger Koenker [cre, aut], Stephen Portnoy [ctb] (Contributions to Censored QR code), Pin Tian Ng [ctb] (Contributions to Sparse QR code), Blaise Melly [ctb] (Contributions to preprocessing code), Achim Zeileis [ctb] (Contributions to dynrq code essentially identical to his dynlm code), Philip Grosjean [ctb] (Contributions to nlrq code), Cleve Moler [ctb] (author of several linpack routines), Yousef Saad [ctb] (author of sparskit2), Victor Chernozhukov [ctb] (contributions to extreme value inference code), Ivan Fernandez-Val [ctb] (contributions to extreme value inference code), Brian D Ripley [trl, ctb] (Initial (2001) R port from S (to my 1 2 R topics documented: everlasting shame -- how could I have been so slow to adopt R!) and for numerous other suggestions and useful advice) Date/Publication 2021-06-06 17:10:02 UTC R topics documented: akj..............................................3 anova.rq . .5 bandwidth.rq .
    [Show full text]
  • 柏際股份有限公司 Bockytech, Inc. 9F-3,No 70, Yanping S
    柏際股份有限公司 BockyTech, Inc. 9F-3,No 70, YanPing S. Rd., Taipei 10042, Taiwan, R.O.C. 10042 台北市延平南路 70 號 9F-3 Http://www.bockytech.com.tw Tel:886-2-23618050 Fax:886-2-23619803 Version 10 LIMDEP Version 10 is an integrated program for estimation and analysis of linear and nonlinear models, with cross section, time series and panel data. LIMDEP has long been a leader in the field of econometric analysis and has provided many recent innovations including cutting edge techniques in panel data analysis, frontier and efficiency estimation and discrete choice modeling. The collection of techniques and procedures for analyzing panel data is without parallel in any other computer program available anywhere. Recognized for years as the standard software for the estimation and manipulation of discrete and limited dependent variable models, LIMDEP10 is now unsurpassed in the breadth and variety of its estimation tools. The main feature of the package is a suite of more than 100 built-in estimators for all forms of the linear regression model, and stochastic frontier, discrete choice and limited dependent variable models, including models for binary, censored, truncated, survival, count, discrete and continuous variables and a variety of sample selection models. No other program offers a wider range of single and multiple equation linear and nonlinear models. LIMDEP is a true state-of-the-art program that is used for teaching and research at thousands of universities, government agencies, research institutes, businesses and industries around the world. LIMDEP is a Complete Econometrics Package LIMDEP takes the form of an econometrics studio.
    [Show full text]
  • Unconditional Quantile Regressions*
    UNCONDITIONAL QUANTILE REGRESSIONS* Sergio Firpo Nicole M. Fortin Departamento de Economia Department of Economics Catholic University of Rio de Janeiro (PUC-Rio) University of British Columbia R. Marques de S. Vicente, 225/210F #997-1873 East Mall Rio de Janeiro, RJ, Brasil, 22453-900 Vancouver, BC, Canada V6T 1Z1 [email protected] [email protected] Thomas Lemieux Department of Economics University of British Columbia #997-1873 East Mall Vancouver, Canada, BC V6T 1Z1 [email protected] July 2007 ABSTRACT We propose a new regression method to estimate the impact of explanatory variables on quantiles of the unconditional (marginal) distribution of an outcome variable. The proposed method consists of running a regression of the (recentered) influence function (RIF) of the unconditional quantile on the explanatory variables. The influence function is a widely used tool in robust estimation that can easily be computed for each quantile of interest. We show how standard partial effects, as well as policy effects, can be estimated using our regression approach. We propose three different regression estimators based on a standard OLS regression (RIF-OLS), a logit regression (RIF-Logit), and a nonparametric logit regression (RIF-OLS). We also discuss how our approach can be generalized to other distributional statistics besides quantiles. * We are indebted to Joe Altonji, Richard Blundell, David Card, Vinicius Carrasco, Marcelo Fernandes, Chuan Goh, Joel Horowitz, Shakeeb Khan, Roger Koenker, Thierry Magnac, Whitney Newey, Geert Ridder, Jean-Marc Robin, Hal White and seminar participants at Yale University, CAEN-UFC, CEDEPLARUFMG, PUC-Rio, IPEA-RJ and Econometrics in Rio 2006 for useful comments on this and earlier versions of the manuscript.
    [Show full text]
  • Stochastic Dominance Via Quantile Regression
    Abstract We derive a new way to test for stochastic dominance between the return of two assets using a quantile regression formulation. The test statistic is a variant of the one-sided Kolmogorov-Smirnoffstatistic and has a limiting distribution of the standard Brownian bridge. We also illustrate how the test statistic can be extended to test for stochastic dominance among k assets. This is useful when comparing the performance of individual assets in a portfolio against some market index. We show how the test statistic can be modified to test for stochastic dominance up to the α-quantile in situation where the return of one asset does not dominate another over the whole spectrum of the return distribution. Keywords: Quantile regression, stochastic dominance, Brownian bridge, test statis- tic. 1Introduction Stochastic dominance finds applications in many areas. In finance, it is used to assess portfolio diversification, capital structure, bankruptcy risk, and option’s price bound. In welfare economics, it is used to measure income distribution and income inequality In reinsurance coverage, the insured use it to select the best coverage option while the insurers use it to assess whether the options are consistently priced. It is also used to select effective treatment in medicine and selection of the best irrigation system in agriculture. There are two big classes of stochastic dominance tests. The first is based on the inf / sup statistics over the support of the distributions as in McFadden (1989), Klecan, McFadden and McFadden (1991), and Kaur, Rao and Singh (1994). The second class is based on comparison of the distributions over a set of grid points as in Anderson (1996), Dardanoni and Forcina (1998, 1999), and Davidson and Duclos (2000).
    [Show full text]
  • A New Quantile Regression for Modeling Bounded Data Under a Unit Birnbaum–Saunders Distribution with Applications in Medicine and Politics
    S S symmetry Article A New Quantile Regression for Modeling Bounded Data under a Unit Birnbaum–Saunders Distribution with Applications in Medicine and Politics Josmar Mazucheli 1 , Víctor Leiva 2,* , Bruna Alves 1 and André F. B. Menezes 3 1 Departament of Statistics, Universidade Estadual de Maringá, Maringá 87020-900, Brazil; [email protected] (J.M.); [email protected] (B.A.) 2 School of Industrial Engineering, Pontificia Universidad Católica de Valparaíso, Valparaíso 2362807, Chile 3 Departament of Statistics, Universidade Estadual de Campinas, Campinas 13083-970, Brazil; [email protected] * Correspondence: [email protected] or [email protected] Abstract: Quantile regression provides a framework for modeling the relationship between a response variable and covariates using the quantile function. This work proposes a regression model for continuous variables bounded to the unit interval based on the unit Birnbaum–Saunders distribution as an alternative to the existing quantile regression models. By parameterizing the unit Birnbaum–Saunders distribution in terms of its quantile function allows us to model the effect of covariates across the entire response distribution, rather than only at the mean. Our proposal, especially useful for modeling quantiles using covariates, in general outperforms the other competing models available in the literature. These findings are supported by Monte Carlo simulations and applications using two real data sets. An R package, including parameter estimation, model checking Citation: Mazucheli, J.; Leiva, V.; as well as density, cumulative distribution, quantile and random number generating functions of the Alves, B.; Menezes, A.F.B. A New unit Birnbaum–Saunders distribution was developed and can be readily used to assess the suitability Quantile Regression for Modeling of our proposal.
    [Show full text]
  • Quantile Regressionregression
    QuantileQuantile RegressionRegression By Luyang Fu, Ph. D., FCAS, State Auto Insurance Company Cheng-sheng Peter Wu, FCAS, ASA, MAAA, Deloitte Consulting AgendaAgenda Overview of Predictive Modeling for P&C Applications Quantile Regression: Background and Theory Two Commercial Line Case Studies – Claim Severity Modeling and Loss Ratio Modeling Q&A Overview of Predictive Modeling for P&C Applications OverviewOverview ofof PMPM forfor P&CP&C ApplicationsApplications Modeling Techniques: ◦ Ordinary Least Square and Linear Regression: Normal distribution assumption Linear relationship between target and covariates Predict the mean of the target against the covariates ◦ GLM: Expansion of distribution assumptions with exponential family distributions,: frequency - Poisson, severity - Gamma, pure premium and loss ratio - Tweedie Linear relationship between the mean of the target and the covariates through the link function ◦ Minimum Bias/General Iteration Algorithm: Iterative algorithm Essentially derives the same results as GLM Categorical predictive variables only Linear relationship between target and covariates ◦ Neural Networks: ◦ Non-linear regression: a series of logit functions to approximate the nonlinear relationship between target and covariates ◦ Originated in the data mining field; a curve fitting technique; non parametric assumptions ◦ Different cost or error functions can be selected for minimization, for example, a least square error term or an absolute error function ◦ One popular algorithm is the backward propagation
    [Show full text]
  • Notes on Median and Quantile Regression
    Notes On Median and Quantile Regression James L. Powell Department of Economics University of California, Berkeley Conditional Median Restrictions and Least Absolute Deviations It is well-known that the expected value of a random variable Y minimizes the expected squared deviation between Y and a constant; that is, µ E[Y ] Y ≡ =argminE(Y c)2, c − assuming E Y 2 is finite. (In fact, it is only necessary to assume E Y is finite, if the minimand is normalized by|| subtracting|| Y 2, i.e., || || µ E[Y ] Y ≡ =argminE[(Y c)2 Y 2] c − − =argmin[c2 2cE[Y ]], c − and this normalization has no effect on the solution if the stronger condition holds.) It is less well-known that a median of Y, defined as any number ηY for which 1 Pr Y η and { ≤ Y } ≥ 2 1 Pr Y η , { ≥ Y } ≥ 2 minimizes an expected absolute deviation criterion, η =argminE[ Y c Y ], Y c | − | − | | though the solution to this minimization problem need not be unique. When the c.d.f. FY is strictly increasing everywhere (i.e., Y is continuously distributed with positive density), then uniqueness is not an issue, and 1 ηY = FY− (1/2). In this case, the first-order condition for minimization of E[ Y c Y ] is | − | − | | 0= E[sgn(Y c)], − − for sgn(u) the “sign” (or “signum”) function sgn(u) 1 2 1 u<0 , ≡ − · { } here defined to be right-continuous. 1 Thus, just as least squares (LS) estimation is the natural generalization of the sample mean to estimation of regression coefficients, least absolute deviations (LAD) estimation is the generalization of the sample median to the linear regression context.
    [Show full text]
  • The Use of Logistic Regression and Quantile Regression in Medical Statistics
    The use of logistic regression and quantile regression in medical statistics Solveig Fosdal Master of Science Submission date: June 2017 Supervisor: Ingelin Steinsland, IMF Co-supervisor: Håkon Gjessing, Nasjonalt folkehelseinstitutt Norwegian University of Science and Technology Department of Mathematical Sciences Summary The main goal of this thesis is to compare and illustrate the use of logistic regression and quantile regression on continuous outcome variables. In medical statistics, logistic regres- sion is frequently applied to continuous outcomes by defining a cut-off value, whereas quantile regression can be applied directly to quantiles of the outcome distribution. The two approaches appear different, but are closely related. An approximate relation between the quantile effect and the log-odds ratio is derived. Practical examples and illustrations are shown through a case study concerning the effect of maternal smoking during preg- nancy and mother’s age on birth weight, where low birth weight is of special interest. Both maternal smoking during pregnancy and mother’s age are found to have a significant ef- fect on birth weight, and the effect of maternal smoking is found to have a slightly larger negative effect on low birth weight than for other quantiles. Trend in birth weight over years is also studied as a part of the case study. Further, the two approaches are tested on simulated data from known probability density functions, pdfs. We consider a population consisting of two groups, where one of the groups is exposed to a factor, and the effect of exposure is of interest. By this we illustrate the quantile effect and the odds ratio for several examples of location, scale and location-scale shift of the normal distribution and the Student t-distribution.
    [Show full text]
  • Chapter Quantile Regression
    A Handbook of Statistical Analyses Using R — 3rd Edition Torsten Hothorn and Brian S. Everitt CHAPTER 12 Quantile Regression: Head Circumference for Age 12.1 Introduction 12.2 Quantile Regression 12.3 Analysis Using R We begin with a graphical inspection of the influence of age on head circumfer- ence by means of a scatterplot. Plotting all pairs of age and head circumference in one panel gives more weight to the teens and 20s, so we produce one plot for younger boys between two and nine years old and one additional plot for boys older than nine years (or > 108 months, to be precise). The cut function is very convenient for constructing a factor representing these two groups R> summary(db) head age Min. :33.5 Min. : 0.03 1st Qu.:48.8 1st Qu.: 1.75 Median :53.0 Median : 9.99 Mean :51.7 Mean : 8.94 3rd Qu.:55.7 3rd Qu.:14.84 Max. :66.3 Max. :21.68 R> db$cut <- cut(db$age, breaks = c(2, 9, 23), + labels = c("2-9 yrs", "9-23 yrs")) which can then be used as a conditioning variable for conditional scatterplots produced with the xyplot function (Sarkar, 2014, package lattice). Because we draw 5101 points in total, we use transparent shading (via rgb(.1, .1, .1, .1)) in order to obtain a clearer picture for the more populated areas in the plot. Figure 12.1, as expected, shows that head circumference increases with age. It also shows that there is considerable variation and also quite a number of extremely large or small head circumferences in the respective age cohorts.
    [Show full text]
  • Logistic Quantile Regression to Evaluate Bounded Outcomes
    Logistic Quantile Regression to Evaluate Bounded Outcomes Vivi Wong Kandidatuppsats i matematisk statistik Bachelor Thesis in Mathematical Statistics Kandidatuppsats 2018:16 Matematisk statistik Juni 2018 www.math.su.se Matematisk statistik Matematiska institutionen Stockholms universitet 106 91 Stockholm Matematiska institutionen Mathematical Statistics Stockholm University Bachelor Thesis 2018:16 http://www.math.su.se Logistic Quantile Regression to Evaluate Bounded Outcomes Vivi Wong∗ June 2018 Abstract Lower urinary tract symptoms in men are common when men get older, and these symptoms can be measured with I-PSS (International Prostate Symptom Score), a scale between 0-35. The density function of the bounded outcome variable, I-PSS, is highly skewed to the right. It can therefore be difficult to analyze this type of variables with stan- dard regression methods such as OLS, since these methods give us the effect of the explanatory variables on the mean of the response variable. Epidemiological studies commonly study how lifestyle and several other factors affect health-related problems. We will therefore study the effect physical activity has on lower urinary tract symptoms by using logistic quantile regression, which is an appropriate method to use when we have bounded outcomes. The method works well because instead of the mean, it focuses on quantiles and it takes the bounded interval into account. The results show a negative relationship between total physical activity and lower urinary tract symptoms, so men who are more physical active will more likely have lower and milder symptoms. ∗Postal address: Mathematical Statistics, Stockholm University, SE-106 91, Sweden. E-mail: [email protected].
    [Show full text]