Keming Yu Brunel University London Outline
Total Page:16
File Type:pdf, Size:1020Kb
Bayesian Quantile Regression Keming Yu Brunel University London Outline • Quantile regression (QR) • Bayesian inference quantile regression (BQR) • Methods/algorithms • Conclusions Quantile regression • Regression or regression model usually refers to modelling the conditional expectation of Y given X E(Y jX = x): • Regression model typically measures the rela- tionship between Y and X, • it could be used to identify the dependency or effect, • it can also be used for prediction. Quantile regression • QR model is to modelling the conditional quan- tile of Y given X Qτ (Y jx); where 0 < τ < 1 stands for the τth quantile of Y . • For example, median regression with τ = 0:5. • If the conditional distribution of Y given X is symmetric, then the median regression is iden- tical to the mean regression, otherwise, they are different. Z 1 • In general, E(yjx) = Qτ (Y jx)dτ 0 =) mean regression model can be viewed as a summary of all the quantile effects. =) QR gives a deep analysis of the way that Y and X are related. An example of a simple linear quantile regression: when (X; Y ) ∼ bivariate normal distribution N(0; 0; r; 1; 1), 2 −1 Qτ (x) = rx + (1 − r )Φ (τ); where Φ−1(τ) denotes the inverse of standard nor- mal distribution. Due to the symmetry of the conditional distribu- tion, all quantile curves q0:1(x), q0:5(x) and q0:9(x) from N(0; 0; 0:75; 1; 1) are parallel. • • (X, Y)~N(0, 0, 0.75, 1, 1) • • •• •• • • • • • •• ••• • • • • • • • • • • • • • • •• • • • • • • • • ••• • •• • •• ••• • • • • • • • • • • • • • • • • • • • • • •• •• ••• • •• • •• •• • • •• • • •• • •• • • • • •• ••• • • • • • ••• • • • • • • • • •• • • • •• ••• • • • • • • • • • • • • • • • ••• •• • •••• ••• • • • • •• • • • •• •• •••••• ••• ••• • • • • • • • ••• • •••• •••• •••• • • • •• • • • •• ••• • • • • •• • • •• •• • ••• • • y • • ••• • • •• • • •• •• •• • • • • • • • •• • • ••• • •• • • • •• • • •••• • •• • • • • • •••• •• • • ••• • • •• • • •• •• • •••• • •• • • •• • • •• • • ••••• • • • • • ••••••• • • •• • • ••• • ••••• • • • •• •• ••• •• ••• • • • • • •• • • • • • 90% • •• • • ••• • • • •• •• • • • ••• • • • • • • • • •• • • • • • • •• • • • • •• • •• • • • • • • • • • •• • 50% • • • • • • • • •• • • • • • • • • • • • 10% • −4 −2 0 2 −3 −2 −1 0 1 2 x Otherwise, as see from the 4011 US girl weights against ages: Children Weight (kg) Children Weight (kg) Overweight Weight (kg) Weight (kg) Weight Underweight 0 20 40 60 80 100 0 20 40 60 80 100 5 10 15 20 5 10 15 20 Age (year) Age (year) Applications of quantile regression include • risk measurement: • the commonly used risk measurers in finance is called VaR (value at risk), which actually corresponding a tail value of an asset price. • People are more concerned the upper pollution level than an average one. Applications of quantile regression include • analysis of fat-tailed distribution: • does cheaper food contribute to children obe- sity? • does a specific diet make difference on chil- dren's weights? • The probability of Y exceeding a threshold Q: • given Q, find τ: τ = P r[Y ≤ QjX]. • forecast of the time-varying probability of a financial return exceeding a given threshold; • In wind-farm to predict/check if the wind speed exceeds a certain level; • flooding prediction and energy demand predic- tion. daily exceedance probability p LS Logit Empirical 0.0 0.2 0.4 0.6 0.8 1.0 hourly wind speed m/s 0 5 10 15 20 Jan Mar May Jul Sep Nov Jan Year 2011 Basic fitting method of QR • mean regression satisfies 2 • E(Y jX = x) = argminaE[(Y − a) jX]: • an QR qτ (x) satisfies • τ = P r[Y ≤ qτ (x)jX = x]: where the `Check function' τu; u > 0 • ρ (u) = [τ − I(u < 0)]u = f τ −(1 − τ)u; u ≤ 0: • f gn Given the data Yi;Xi i=1 to fit a linear QR model Y = x β + ϵ then P • ^ n − 0 β(τ) = argminb i=1 ρτ (Yi Xib): `Working' likelihood function for QR • Mean regression corresponds to loss function L(u) = u2 and normal likelihood with pdf − f (u) = p 1 exp(−(y µ)): N 2πσ 2σ2 • QR corresponds to check function ρτ (u) and the asymmetric Laplace likelihood (ALD) with pdf τ(1 − τ) y − µ f (u) = exp(−ρτ ( )): ALD σ σ • As minimizing the `check function' () maxi- mizing an ALD-based likelihood function. Bayesian Quantile Regression (BQR) • Regarding the regression parameters β from a 0 linear QR model Qτ (Y jx) = x β as a random variable, aiming at its posterior distribution ac- cording to Bayes Theorem: P r(βj:) ≈ P r(β) × P r(:jβ): • Bayesian inference depends on prior and likeli- hood function. • All selected likelihoods may not be true but some are working. • ALD has good performance on data generated from many error distributions (Ji et al. 2012; Li et al. 2010; among others) and theoretic justification: • Komunjer (2005) Quasi-Maximum Likelihood Estimation for Conditional Quantiles, J. Econo- metrics. Sriram, Ramamoorthi and Ghosh (2013). Pos- terior consistency of Bayesian quantile Regres- sion under a mis-specified likelihood based on asymmetric Laplace density, Bayesian Analysis. BQR • Yu and Moyeed(2001) used Metropolis Hast- ings (M-H) on joint distribution P r(βj:::): • R packages: http://cran.r-project.org/web/packages/bayesQR/bayesQR. pdf http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/ MCMCpack/html/MCMCquantreg.html Gibbs sampling for BQR • The advantage of Gibbs sampler over Metropo- lis Hastings (MH) algorithm: • Gibbs sampling doesn't have the convergence issue as MH algorithm has, and unlike M-H al- gorithm which has proposal distribution selec- tion, the proposal distribution of Gibbs sam- pling is simply taken to be the conditional dis- tributions of the target distribution. • Each density to be sampling is of low dimension (often one dimensional) and thus it is very easy and efficient to sample from it. Gibbs sampling for BQR • ALD as a Mixture of Normals { ( )} τ (1 − τ) z − µ fτ (z; µ, σ) = exp −ρτ σ σ Z { } 1 1 1 = p exp − fz − µ − (1 − 2τ)wg2 0 2 πσw 4σw ( ) τ(1 − τ) τ(1 − τ) × exp − w dw σ σ • This extends the results (Laplace distribution with τ = 0:5 here) of West (1987) and Andrews and Mallows (1974) to the ALD. Gibbs sampling for BQR • That is, p ϵ = µ v + δ σ v u; where = 1−2τ and 2 = 2 , µ τ(1−τ) δ τ(1−τ) • v and u are independent and follow standard exponential distribution and normal distribution respectively: • v ∼ Exp(1/σ) and u ∼ N(0; 1). • Therefore the conditional distribution f(yjβ; σ; v) 0 under y = β X + ϵ consists of independent nor- 0 2 2 mal distribution N(β Xt + µσzt; δ σ zt): • Once the normal-Gamma conjugate prior of a normal distribution is provided for (β; σ), we can construct a Gibbs sampler for inference of the posterior conditional densities of all quantities. • To this end we assume that the prior β ∼ N(β0;B0) and σ ∼ IG(n0=2; s0=2), an inverse Gamma dis- tribution IG(a; b) with parameters a = n0=2 and b = s0=2. • Then the posterior of β still follows normal dis- tribution βjy; v; σ ∼ N(βp;Bp) P − with B−1 = n (X X0/δ2σv ) + B 1 and β = P p i=1 i i i 0 p n − 2 −1 Bp( i=1(Xi(Yi µvi)/δ σvi) + Bp β0): • The posterior distribution for vi follows a gen- eralized inverse Gaussian distribution 1 v jy; β; σ ∼ GIG( ; α ; γ ); i 2 i i 2 − 0 2 2 2 2 2 with αi = (Yi β Xi) /δ σ and γi = 2/σ + µ /δ σ: • The posterior distribution for σ follows an in- verse Gamma distribution n∗ s∗ σjy; β; v ∼ IG( ; ); 2 2 P with n∗ = n + 3n and s∗ = s + 2 n v + P 0 0 i=1 i n − 0 − 2 2 i=1(Yi β Xi µ vi) /δ vi: • A Gibbs sampler successively sampling from σjβ; v; y vjβ; σ; y βjσ; v; y converges to p(β; σ; vjy). R packages: R-Package lqmm (cran.r-project.org/package=lqmm) for panel (longitudinal) data. R-package 'bayesQR' cran.r-project.org/web/packages/bayesQR/bayesQR.pdf. R-package Package 'Brq' cran.r-project.org/web/packages/Brq/Brq.pdf for BQR with cross-section data. R-package BSquare is also for BQR with panel data. Mathematics Justification of ALD-likelihood • Komunjer (2005): Quasi-Maximum Likelihood Estimation for Conditional Quantiles," Journal of Econometrics, 128, 137{164. • Komunjer (2005): likelihood should belong a tick-exponential family of densities { family whose role in QR is analog to the role of the linear- exponential family in mean regression estima- tion. • The "well-known member of the tick-exponential family is the asymmetric Laplace density..." How accurate of Gibbs sampler with ALD-likelihood? Consider fitting a QR model y = Qτ (yjx) + ϵ with BQR and eight different model errors. The 8 error distributions chosen to represent a wide variety of characteristics that a true error distribution may possess. • Gaussian (Gau): N(0; 12); • 1 −22 2 1 − 49 32 3 49 52 Skewed (Skew): 5N( 25 ; 1 ) + 5N( 125; 2 ) + 5N(250; 9 ); • 2 2 1 1 2 Kurtotic (Kur): 3N(0; 1 ) + 3N(0; 10 ); • 1 2 9 1 2 Outlier (Outl): 10N(0; 1 ) + 10N(0; 10 ); • 1 − 22 1 2 2 Bimodal (Bim): 2N( 1; 3 ) + 2N(1; 3) ); • 1 −3 12 1 3 12 Bimodal, separate modes (Sepa): 2N( 2; 2 ) + 2N(2; 2 ); • 3 − 43 2 1 107 12 Skewed bimodal (Skeb): 4N( 100; 1 ) + 4N(100; 3 ); • 9 −6 32 9 6 32 1 12 Trimodal (Tri): 20N( 5; 5 ) + 20N(5; 5 ) + 10N(0; 4 ): Gau Skew 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Kur Outl 0 1 2 3 0.0 0.5 1.0 1.5 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Bim Sepa 0.0 0.1 0.2 0.3 0.4 0.00 0.10 0.20 0.30 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Tri Skeb 0.0 0.1 0.2 0.3 0.4 0.00 0.10 0.20 0.30 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Error distributions.