Bayesian Statistics for Genetics
Lecture 4: Linear Regression

Ken Rice
Summer Institute in Statistical Genetics
September 2017

Regression models: overview

How does an outcome $Y$ vary as a function of $x = \{x_1, \ldots, x_p\}$?

• What are the effect sizes?
• What is the effect of $x_1$, in observations that have the same $x_2, x_3, \ldots, x_p$ (a.k.a. "keeping these covariates constant")?
• Can we predict $Y$ as a function of $x$?

These questions can be assessed via a regression model $p(y|x)$.

Parameters in a regression model can be estimated from data, which are often expressed in matrix/vector form:

$$ y = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}, \qquad X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} x_{1,1} & \cdots & x_{1,p} \\ \vdots & & \vdots \\ x_{n,1} & \cdots & x_{n,p} \end{pmatrix}. $$

FTO example: design

The FTO gene is hypothesized to be involved in growth and obesity. Experimental design:

• 10 fto+/- mice
• 10 fto-/- mice
• Mice are sacrificed at the end of 1-5 weeks of age
• Two mice in each group are sacrificed at each age

FTO example: data

[Figure: scatter plot of weight (g) against age (weeks), with points marked by genotype, fto-/- and fto+/-.]

FTO example: analysis

• $y$ = weight
• $x_g$ = indicator of fto heterozygote $\in \{0, 1\}$ = number of "+" alleles
• $x_a$ = age in weeks $\in \{1, 2, 3, 4, 5\}$

How can we estimate $p(y|x_g, x_a)$? One option is a cell means model:

               age (weeks)
  genotype     1               2               3               4               5
  -/-          $\theta_{0,1}$  $\theta_{0,2}$  $\theta_{0,3}$  $\theta_{0,4}$  $\theta_{0,5}$
  +/-          $\theta_{1,1}$  $\theta_{1,2}$  $\theta_{1,3}$  $\theta_{1,4}$  $\theta_{1,5}$

Problem: this needs 10 parameters, with only two observations per cell.

Linear regression

Solution: assume smoothness as a function of age. For each group,

$$ y = \alpha_0 + \alpha_1 x_a + \varepsilon. $$

This is a linear regression model. Linearity means "linear in the parameters": several covariates, each multiplied by a corresponding $\alpha$, and added together. A more complex model might assume, e.g.,

$$ y = \alpha_0 + \alpha_1 x_a + \alpha_2 x_a^2 + \alpha_3 x_a^3 + \varepsilon, $$

but this is still a linear regression model, even with the age$^2$ and age$^3$ terms.

Multiple linear regression

With enough variables, we can describe the regressions for both groups simultaneously:

$$ Y_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \beta_3 x_{i,3} + \beta_4 x_{i,4} + \varepsilon_i, $$

where

• $x_{i,1} = 1$ for each subject $i$
• $x_{i,2} = 0$ if subject $i$ is homozygous, 1 if heterozygous
• $x_{i,3}$ = age of subject $i$
• $x_{i,4} = x_{i,2} \times x_{i,3}$

Note that under this model,

$$ E[Y|x] = \beta_1 + \beta_3 \times \mathrm{age} \quad \text{if } x_2 = 0, \text{ and} $$
$$ E[Y|x] = (\beta_1 + \beta_2) + (\beta_3 + \beta_4) \times \mathrm{age} \quad \text{if } x_2 = 1. $$

For graphical thinkers:

[Figure: four panels of weight against age, showing the fitted lines under the constraints $\beta_3 = 0, \beta_4 = 0$; $\beta_2 = 0, \beta_4 = 0$; $\beta_4 = 0$; and with no constraints.]

Normal linear regression

How does each $Y_i$ vary around its mean $E[Y_i \,|\, \beta, x_i]$? We assume

$$ Y_i = \beta^T x_i + \varepsilon_i, \qquad \varepsilon_1, \ldots, \varepsilon_n \sim \text{i.i.d. } \mathrm{normal}(0, \sigma^2). $$

This assumption of Normal errors specifies the likelihood:

$$ p(y_1, \ldots, y_n \,|\, x_1, \ldots, x_n, \beta, \sigma^2) = \prod_{i=1}^n p(y_i \,|\, x_i, \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta^T x_i)^2 \right\}. $$

Note: in large(r) sample sizes, analysis is "robust" to the Normality assumption. However, we do rely on the mean being linear in the $x$'s, and on the $\varepsilon_i$'s variance being constant with respect to $x$.
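To make the sampling model concrete, here is a minimal R sketch (not from the original slides) that simulates one FTO-style dataset of 20 mice under the four-coefficient model above; the "true" values of beta and sigma are hypothetical, chosen only for illustration:

  ## Simulate one FTO-style dataset under the normal linear model.
  ## The 'true' beta and sigma below are hypothetical, for illustration only.
  set.seed(4)
  xg <- rep(c(0, 1), each = 10)      # 0 = fto-/-, 1 = fto+/-
  xa <- rep(rep(1:5, each = 2), 2)   # two mice per genotype at each age
  X  <- cbind(1, xg, xa, xg * xa)    # n x p design matrix (p = 4)
  beta  <- c(0, 3, 3, 1.7)           # intercept, genotype, age, interaction
  sigma <- 2
  y <- as.vector(X %*% beta + rnorm(20, mean = 0, sd = sigma))
  plot(xa, y, pch = 19, col = xg + 1,
       xlab = "age (weeks)", ylab = "weight (g)")

Since the real mouse weights are not reproduced here, this simulated X and y also stand in for the data in the code sketches below.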
Matrix form

• Let $y$ be the $n$-dimensional column vector $(y_1, \ldots, y_n)^T$;
• Let $X$ be the $n \times p$ matrix whose $i$th row is $x_i$.

Then the normal regression model is that

$$ y \,|\, X, \beta, \sigma^2 \sim \text{multivariate normal}(X\beta, \sigma^2 I), $$

where $I$ is the $n \times n$ identity matrix and

$$ X\beta = \begin{pmatrix} x_1 \to \\ x_2 \to \\ \vdots \\ x_n \to \end{pmatrix} \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} = \begin{pmatrix} \beta_1 x_{1,1} + \cdots + \beta_p x_{1,p} \\ \vdots \\ \beta_1 x_{n,1} + \cdots + \beta_p x_{n,p} \end{pmatrix} = \begin{pmatrix} E[Y_1 \,|\, \beta, x_1] \\ \vdots \\ E[Y_n \,|\, \beta, x_n] \end{pmatrix}. $$

Ordinary least squares estimation

What values of $\beta$ are consistent with our data $y, X$? Recall

$$ p(y_1, \ldots, y_n \,|\, x_1, \ldots, x_n, \beta, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - \beta^T x_i)^2 \right\}. $$

This is big when the sum of squared residuals

$$ \mathrm{SSR}(\beta) = \sum_{i=1}^n (y_i - \beta^T x_i)^2 = (y - X\beta)^T (y - X\beta) = y^T y - 2\beta^T X^T y + \beta^T X^T X \beta $$

is small. What value of $\beta$ makes this the smallest, i.e. gives fitted values closest to the outcomes $y_1, y_2, \ldots, y_n$?

OLS: with calculus

"Recall" that:

1. a minimum of a function $g(z)$ occurs at a value $z$ such that $\frac{d}{dz} g(z) = 0$;
2. the derivative of $g(z) = az$ is $a$, and the derivative of $g(z) = bz^2$ is $2bz$.

Doing this for SSR:

$$ \frac{d}{d\beta} \mathrm{SSR}(\beta) = \frac{d}{d\beta} \left( y^T y - 2\beta^T X^T y + \beta^T X^T X \beta \right) = -2 X^T y + 2 X^T X \beta, $$

and so

$$ \frac{d}{d\beta} \mathrm{SSR}(\beta) = 0 \iff -2X^T y + 2X^T X \beta = 0 \iff X^T X \beta = X^T y \iff \beta = (X^T X)^{-1} X^T y. $$

$\hat\beta_{ols} = (X^T X)^{-1} X^T y$ is the Ordinary Least Squares (OLS) estimator of $\beta$.

OLS: without calculus

The calculus-free, algebra-heavy version, which relies on knowing the answer in advance! Writing $\Pi = X(X^T X)^{-1} X^T$, and noting that $X = \Pi X$ and $X \hat\beta_{ols} = \Pi y$,

$$ (y - X\beta)^T (y - X\beta) = (y - \Pi y + \Pi y - X\beta)^T (y - \Pi y + \Pi y - X\beta) $$
$$ = \left( (I - \Pi) y + X(\hat\beta_{ols} - \beta) \right)^T \left( (I - \Pi) y + X(\hat\beta_{ols} - \beta) \right) $$
$$ = y^T (I - \Pi) y + (\hat\beta_{ols} - \beta)^T X^T X (\hat\beta_{ols} - \beta), $$

because all the "cross terms" involving $\Pi$ and $I - \Pi$ are zero. Only the second term depends on $\beta$, and it is minimized (at zero) by setting $\beta = \hat\beta_{ols}$; hence $\hat\beta_{ols}$ minimizes the SSR for any given set of data.

OLS: using R

  ### OLS estimate
  > beta.ols <- solve( t(X)%*%X )%*%t(X)%*%y
  > beta.ols
                     [,1]
  (Intercept) -0.06821632
  xg           2.94485495
  xa           2.84420729
  xg:xa        1.72947648

  ### or, less painfully, using lm()
  > fit.ols <- lm( y ~ xg*xa )
  > coef( summary(fit.ols) )
                 Estimate Std. Error     t value     Pr(>|t|)
  (Intercept) -0.06821632  1.4222970 -0.04796208 9.623401e-01
  xg           2.94485495  2.0114316  1.46405917 1.625482e-01
  xa           2.84420729  0.4288387  6.63234803 5.760923e-06
  xg:xa        1.72947648  0.6064695  2.85171239 1.154001e-02

[Figure: the FTO data with the two OLS fitted lines, weight against age, one line per genotype.]

Bayesian inference for regression models

$$ y_i = \beta_1 x_{i,1} + \cdots + \beta_p x_{i,p} + \varepsilon_i $$

Motivation:

• Incorporating prior information
• Posterior probability statements: $P[\beta_j > 0 \,|\, y, X]$
• OLS tends to overfit when $p$ is large; Bayes' use of prior information tends to make it more conservative, as we'll see
• Model selection and averaging (more later)

Prior and posterior distribution

prior: $\beta \sim \mathrm{mvn}(\beta_0, \Sigma_0)$
sampling model: $y \sim \mathrm{mvn}(X\beta, \sigma^2 I)$
posterior: $\beta \,|\, y, X \sim \mathrm{mvn}(\beta_n, \Sigma_n)$

where

$$ \Sigma_n = \mathrm{Var}[\beta \,|\, y, X, \sigma^2] = (\Sigma_0^{-1} + X^T X / \sigma^2)^{-1}, $$
$$ \beta_n = E[\beta \,|\, y, X, \sigma^2] = (\Sigma_0^{-1} + X^T X / \sigma^2)^{-1} (\Sigma_0^{-1} \beta_0 + X^T y / \sigma^2). $$

Notice:

• If $\Sigma_0^{-1} \ll X^T X / \sigma^2$, then $\beta_n \approx \hat\beta_{ols}$
• If $\Sigma_0^{-1} \gg X^T X / \sigma^2$, then $\beta_n \approx \beta_0$

(A small numerical check of these formulas appears after the g-prior discussion below.)

The g-prior

How to pick $\beta_0, \Sigma_0$? A classical suggestion (Zellner, 1986) uses the g-prior:

$$ \beta \sim \mathrm{mvn}(0, \, g \sigma^2 (X^T X)^{-1}). $$

Idea: the variance of the OLS estimate $\hat\beta_{ols}$ is

$$ \mathrm{Var}[\hat\beta_{ols}] = \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{n} (X^T X / n)^{-1}, $$

which is (roughly) the uncertainty in $\beta$ from $n$ observations. In the same way,

$$ \mathrm{Var}[\beta]_{gprior} = g \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{n/g} (X^T X / n)^{-1}, $$

so the g-prior can roughly be viewed as the uncertainty from $n/g$ observations. For example, $g = n$ means the prior has the same amount of information as one observation; so (roughly!) not much, in a large study.
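As the promised numerical check of the conjugate posterior formulas for $\beta_n$ and $\Sigma_n$, here is a minimal R sketch (not from the slides) that treats $\sigma^2$ as known; the prior settings are hypothetical, and X and y come from the simulation sketch above:

  ## Conjugate posterior for beta, treating sigma^2 as known.
  ## beta0, Sigma0 and sigma2 are hypothetical, for illustration only.
  p       <- ncol(X)
  beta0   <- rep(0, p)         # prior mean
  Sigma0  <- diag(100, p)      # vague prior variance
  sigma2  <- 4                 # 'known' error variance
  iSigma0 <- solve(Sigma0)
  Sigma.n <- solve(iSigma0 + t(X) %*% X / sigma2)
  beta.n  <- Sigma.n %*% (iSigma0 %*% beta0 + t(X) %*% y / sigma2)
  ## compare the posterior mean with the OLS estimate:
  cbind(ols = solve(t(X) %*% X, t(X) %*% y), bayes = beta.n)

With $\Sigma_0$ this flat, the prior precision is small relative to $X^T X / \sigma^2$, so $\beta_n$ comes out close to $\hat\beta_{ols}$, matching the first bullet above.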
Posterior distributions under the g-prior

After some algebra, it turns out that

$$ \beta \,|\, y, X, \sigma^2 \sim \mathrm{mvn}(\beta_n, \Sigma_n), $$

where

$$ \Sigma_n = \mathrm{Var}[\beta \,|\, y, X, \sigma^2] = \frac{g}{g+1} \sigma^2 (X^T X)^{-1}, $$
$$ \beta_n = E[\beta \,|\, y, X, \sigma^2] = \frac{g}{g+1} (X^T X)^{-1} X^T y. $$

Notes:

• The posterior mean estimate $\beta_n$ is simply $\frac{g}{g+1} \hat\beta_{ols}$.
• The posterior variance of $\beta$ is simply $\frac{g}{g+1} \mathrm{Var}[\hat\beta_{ols}]$.
• $g$ shrinks the coefficients towards 0, which can prevent overfitting to the data.
• If $g = n$, then as $n$ increases, inference approximates that using $\hat\beta_{ols}$.

Monte Carlo simulation

What about the error variance $\sigma^2$? It is rarely known exactly, so it is more realistic to allow for uncertainty about it with a prior:

prior: $1/\sigma^2 \sim \mathrm{gamma}(\nu_0/2, \, \nu_0 \sigma_0^2 / 2)$
sampling model: $y \sim \mathrm{mvn}(X\beta, \sigma^2 I)$
posterior: $1/\sigma^2 \,|\, y, X \sim \mathrm{gamma}([\nu_0 + n]/2, \, [\nu_0 \sigma_0^2 + \mathrm{SSR}_g]/2)$

where $\mathrm{SSR}_g$ is somewhat complicated.

Simulating from the joint posterior distribution:

joint distribution: $p(\sigma^2, \beta \,|\, y, X) = p(\sigma^2 \,|\, y, X) \times p(\beta \,|\, y, X, \sigma^2)$
simulation: $\{\sigma^2, \beta\} \sim p(\sigma^2, \beta \,|\, y, X) \iff \sigma^2 \sim p(\sigma^2 \,|\, y, X), \ \beta \sim p(\beta \,|\, y, X, \sigma^2)$

So, to simulate $\{\sigma^2, \beta\} \sim p(\sigma^2, \beta \,|\, y, X)$:

• First simulate $\sigma^2$ from $p(\sigma^2 \,|\, y, X)$
• Then use this $\sigma^2$ to simulate $\beta$ from $p(\beta \,|\, y, X, \sigma^2)$

Repeat 1000s of times to obtain Monte Carlo samples $\{\sigma^2, \beta\}^{(1)}, \ldots, \{\sigma^2, \beta\}^{(S)}$, as in the sketch below.
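This two-step recipe translates directly into a few lines of R. The following is a sketch rather than the slides' own code: $\nu_0$, $\sigma_0^2$ and $g$ are illustrative settings, X and y come from the earlier sketches, and $\mathrm{SSR}_g$ is written in its standard g-prior form (as given, e.g., in Hoff's "A First Course in Bayesian Statistical Methods"), which should be checked against your own notes:

  ## Monte Carlo sampler for (sigma^2, beta) under the g-prior.
  ## Prior settings are hypothetical; the SSR_g formula assumed here is the
  ## standard g-prior result: SSR_g = y' ( I - g/(g+1) X (X'X)^{-1} X' ) y.
  library(mvtnorm)                    # provides rmvnorm()
  n <- length(y); p <- ncol(X)
  S <- 10000                          # number of Monte Carlo samples
  g <- n; nu0 <- 1; s20 <- 1          # illustrative prior settings
  XtX  <- t(X) %*% X
  Hg   <- (g / (g + 1)) * X %*% solve(XtX) %*% t(X)
  SSRg <- as.numeric(t(y) %*% (diag(n) - Hg) %*% y)
  ## Step 1: simulate sigma^2 from its marginal posterior
  sigma2 <- 1 / rgamma(S, (nu0 + n) / 2, (nu0 * s20 + SSRg) / 2)
  ## Step 2: simulate beta given each sampled sigma^2
  Vb <- (g / (g + 1)) * solve(XtX)    # Sigma_n, up to the factor sigma^2
  Eb <- as.vector(Vb %*% t(X) %*% y)  # posterior mean beta_n
  beta <- matrix(NA, S, p)
  for (s in 1:S) beta[s, ] <- rmvnorm(1, Eb, sigma2[s] * Vb)
  ## e.g. a posterior probability statement about the interaction:
  mean(beta[, 4] > 0)

The S draws can then be summarized however you like: posterior means, credible intervals, or tail probabilities such as $P[\beta_4 > 0 \,|\, y, X]$.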
