Keming Yu Brunel University London Outline

Bayesian Quantile Regression Keming Yu Brunel University London Outline • Quantile regression (QR) • Bayesian inference quantile regression (BQR) • Methods/algorithms • Conclusions Quantile regression • Regression or regression model usually refers to modelling the conditional expectation of Y given X E(Y jX = x): • Regression model typically measures the rela- tionship between Y and X, • it could be used to identify the dependency or effect, • it can also be used for prediction. Quantile regression • QR model is to modelling the conditional quantile of Y given X Qτ (Y jx); where 0 < τ < 1 stands for the τth quantile of Y . • For example, median regression with τ = 0:5. • If the conditional distribution of Y given X is symmetric, then the median regression is iden- tical to the mean regression, otherwise, they are different. Z 1 • In general, E(yjx) = Qτ (Y jx)dτ 0 =) mean regression model can be viewed as a summary of all the quantile effects. =) QR gives a deep analysis of the way that Y and X are related. An example of a simple linear quantile regression: when (X; Y ) ∼ bivariate normal distribution N(0; 0; r; 1; 1), 2 −1 Qτ (x) = rx + (1 − r )Φ (τ); where Φ−1(τ) denotes the inverse of standard normal distribution. Due to the symmetry of the conditional distribution, all quantile curves q0:1(x), q0:5(x) and q0:9(x) from N(0; 0; 0:75; 1; 1) are parallel. • • (X, Y)~N(0, 0, 0.75, 1, 1) • • •• •• • • • • • •• ••• • • • • • • • • • • • • • • •• • • • • • • • • ••• • •• • •• ••• • • • • • • • • • • • • • • • • • • • • • •• •• ••• • •• • •• •• • • •• • • •• • •• • • • • •• ••• • • • • • ••• • • • • • • • • •• • • • •• ••• • • • • • • • • • • • • • • • ••• •• • •••• ••• • • • • •• • • • •• •• •••••• ••• ••• • • • • • • • ••• • •••• •••• •••• • • • •• • • • •• ••• • • • • •• • • •• •• • ••• • • y • • ••• • • •• • • •• •• •• • • • • • • • •• • • ••• • •• • • • •• • • •••• • •• • • • • • •••• •• • • ••• • • •• • • •• •• • •••• • •• • • •• • • •• • • ••••• • • • • • ••••••• • • •• • • ••• • ••••• • • • •• •• ••• •• ••• • • • • • •• • • • • • 90% • •• • • ••• • • • •• •• • • • ••• • • • • • • • • •• • • • • • • •• • • • • •• • •• • • • • • • • • • •• • 50% • • • • • • • • •• • • • • • • • • • • • 10% • −4 −2 0 2 −3 −2 −1 0 1 2 x Otherwise, as see from the 4011 US girl weights against ages: Children Weight (kg) Children Weight (kg) Overweight Weight (kg) Weight (kg) Weight Underweight 0 20 40 60 80 100 0 20 40 60 80 100 5 10 15 20 5 10 15 20 Age (year) Age (year) Applications of quantile regression include • risk measurement: • the commonly used risk measurers in finance is called VaR (value at risk), which actually corresponding a tail value of an asset price. • People are more concerned the upper pollution level than an average one. Applications of quantile regression include • analysis of fat-tailed distribution: • does cheaper food contribute to children obe- sity? • does a specific diet make difference on children's weights? • The probability of Y exceeding a threshold Q: • given Q, find τ: τ = P r[Y ≤ QjX]. • forecast of the time-varying probability of a financial return exceeding a given threshold; • In wind-farm to predict/check if the wind speed exceeds a certain level; • flooding prediction and energy demand prediction. daily exceedance probability p LS Logit Empirical 0.0 0.2 0.4 0.6 0.8 1.0 hourly wind speed m/s 0 5 10 15 20 Jan Mar May Jul Sep Nov Jan Year 2011 Basic fitting method of QR • mean regression satisfies 2 • E(Y jX = x) = argminaE[(Y − a) jX]: • an QR qτ (x) satisfies • τ = P r[Y ≤ qτ (x)jX = x]: where the `Check function' τu; u > 0 • ρ (u) = [τ − I(u < 0)]u = f τ −(1 − τ)u; u ≤ 0: • f gn Given the data Yi;Xi i=1 to fit a linear QR model Y = x β + ϵ then P • ^ n − 0 β(τ) = argminb i=1 ρτ (Yi Xib): `Working' likelihood function for QR • Mean regression corresponds to loss function L(u) = u2 and normal likelihood with pdf − f (u) = p 1 exp(−(y µ)): N 2πσ 2σ2 • QR corresponds to check function ρτ (u) and the asymmetric Laplace likelihood (ALD) with pdf τ(1 − τ) y − µ f (u) = exp(−ρτ ( )): ALD σ σ • As minimizing the `check function' () maxi- mizing an ALD-based likelihood function. Bayesian Quantile Regression (BQR) • Regarding the regression parameters β from a 0 linear QR model Qτ (Y jx) = x β as a random variable, aiming at its posterior distribution ac- cording to Bayes Theorem: P r(βj:) ≈ P r(β) × P r(:jβ): • Bayesian inference depends on prior and likelihood function. • All selected likelihoods may not be true but some are working. • ALD has good performance on data generated from many error distributions (Ji et al. 2012; Li et al. 2010; among others) and theoretic justification: • Komunjer (2005) Quasi-Maximum Likelihood Estimation for Conditional Quantiles, J. Econo- metrics. Sriram, Ramamoorthi and Ghosh (2013). Pos- terior consistency of Bayesian quantile Regres- sion under a mis-specified likelihood based on asymmetric Laplace density, Bayesian Analysis. BQR • Yu and Moyeed(2001) used Metropolis Hast- ings (M-H) on joint distribution P r(βj:::): • R packages: http://cran.r-project.org/web/packages/bayesQR/bayesQR. pdf http://hosho.ees.hokudai.ac.jp/~kubo/Rdoc/library/ MCMCpack/html/MCMCquantreg.html Gibbs sampling for BQR • The advantage of Gibbs sampler over Metropo- lis Hastings (MH) algorithm: • Gibbs sampling doesn't have the convergence issue as MH algorithm has, and unlike M-H algorithm which has proposal distribution selec- tion, the proposal distribution of Gibbs sampling is simply taken to be the conditional distributions of the target distribution. • Each density to be sampling is of low dimension (often one dimensional) and thus it is very easy and efficient to sample from it. Gibbs sampling for BQR • ALD as a Mixture of Normals { ( )} τ (1 − τ) z − µ fτ (z; µ, σ) = exp −ρτ σ σ Z { } 1 1 1 = p exp − fz − µ − (1 − 2τ)wg2 0 2 πσw 4σw ( ) τ(1 − τ) τ(1 − τ) × exp − w dw σ σ • This extends the results (Laplace distribution with τ = 0:5 here) of West (1987) and Andrews and Mallows (1974) to the ALD. Gibbs sampling for BQR • That is, p ϵ = µ v + δ σ v u; where = 1−2τ and 2 = 2 , µ τ(1−τ) δ τ(1−τ) • v and u are independent and follow standard exponential distribution and normal distribution respectively: • v ∼ Exp(1/σ) and u ∼ N(0; 1). • Therefore the conditional distribution f(yjβ; σ; v) 0 under y = β X + ϵ consists of independent nor- 0 2 2 mal distribution N(β Xt + µσzt; δ σ zt): • Once the normal-Gamma conjugate prior of a normal distribution is provided for (β; σ), we can construct a Gibbs sampler for inference of the posterior conditional densities of all quantities. • To this end we assume that the prior β ∼ N(β0;B0) and σ ∼ IG(n0=2; s0=2), an inverse Gamma distribution IG(a; b) with parameters a = n0=2 and b = s0=2. • Then the posterior of β still follows normal distribution βjy; v; σ ∼ N(βp;Bp) P − with B−1 = n (X X0/δ2σv ) + B 1 and β = P p i=1 i i i 0 p n − 2 −1 Bp( i=1(Xi(Yi µvi)/δ σvi) + Bp β0): • The posterior distribution for vi follows a gen- eralized inverse Gaussian distribution 1 v jy; β; σ ∼ GIG( ; α ; γ ); i 2 i i 2 − 0 2 2 2 2 2 with αi = (Yi β Xi) /δ σ and γi = 2/σ + µ /δ σ: • The posterior distribution for σ follows an inverse Gamma distribution n∗ s∗ σjy; β; v ∼ IG( ; ); 2 2 P with n∗ = n + 3n and s∗ = s + 2 n v + P 0 0 i=1 i n − 0 − 2 2 i=1(Yi β Xi µ vi) /δ vi: • A Gibbs sampler successively sampling from σjβ; v; y vjβ; σ; y βjσ; v; y converges to p(β; σ; vjy). R packages: R-Package lqmm (cran.r-project.org/package=lqmm) for panel (longitudinal) data. R-package 'bayesQR' cran.r-project.org/web/packages/bayesQR/bayesQR.pdf. R-package Package 'Brq' cran.r-project.org/web/packages/Brq/Brq.pdf for BQR with cross-section data. R-package BSquare is also for BQR with panel data. Mathematics Justification of ALD-likelihood • Komunjer (2005): Quasi-Maximum Likelihood Estimation for Conditional Quantiles," Journal of Econometrics, 128, 137{164. • Komunjer (2005): likelihood should belong a tick-exponential family of densities { family whose role in QR is analog to the role of the linear- exponential family in mean regression estimation. • The "well-known member of the tick-exponential family is the asymmetric Laplace density..." How accurate of Gibbs sampler with ALD-likelihood? Consider fitting a QR model y = Qτ (yjx) + ϵ with BQR and eight different model errors. The 8 error distributions chosen to represent a wide variety of characteristics that a true error distribution may possess. • Gaussian (Gau): N(0; 12); • 1 −22 2 1 − 49 32 3 49 52 Skewed (Skew): 5N( 25 ; 1 ) + 5N( 125; 2 ) + 5N(250; 9 ); • 2 2 1 1 2 Kurtotic (Kur): 3N(0; 1 ) + 3N(0; 10 ); • 1 2 9 1 2 Outlier (Outl): 10N(0; 1 ) + 10N(0; 10 ); • 1 − 22 1 2 2 Bimodal (Bim): 2N( 1; 3 ) + 2N(1; 3) ); • 1 −3 12 1 3 12 Bimodal, separate modes (Sepa): 2N( 2; 2 ) + 2N(2; 2 ); • 3 − 43 2 1 107 12 Skewed bimodal (Skeb): 4N( 100; 1 ) + 4N(100; 3 ); • 9 −6 32 9 6 32 1 12 Trimodal (Tri): 20N( 5; 5 ) + 20N(5; 5 ) + 10N(0; 4 ): Gau Skew 0.0 0.1 0.2 0.3 0.4 0.5 0.0 0.1 0.2 0.3 0.4 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Kur Outl 0 1 2 3 0.0 0.5 1.0 1.5 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Bim Sepa 0.0 0.1 0.2 0.3 0.4 0.00 0.10 0.20 0.30 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Tri Skeb 0.0 0.1 0.2 0.3 0.4 0.00 0.10 0.20 0.30 −3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3 x x Error distributions.

Keming Yu Brunel University London Outline

Details

Download

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

Support