Theory of Generalized Linear Models
Richard Lockhart, STAT 350: Estimating Equations

◮ If Y has a Poisson distribution with parameter µ then
  $$P(Y = y) = \frac{\mu^y e^{-\mu}}{y!}$$
  for y a non-negative integer.
◮ We can use the method of maximum likelihood to estimate µ if we have a sample $Y_1, \ldots, Y_n$ of independent Poisson random variables all with mean µ.
◮ If we observe $Y_1 = y_1$, $Y_2 = y_2$ and so on, then the likelihood function is
  $$P(Y_1 = y_1, \ldots, Y_n = y_n) = \prod_{i=1}^n \frac{\mu^{y_i} e^{-\mu}}{y_i!} = \frac{\mu^{\sum y_i} e^{-n\mu}}{\prod y_i!}$$
◮ This function of µ can be maximized by maximizing its logarithm, the log-likelihood function.
◮ Set the derivative of the log-likelihood with respect to µ equal to 0.
◮ This gives the likelihood equation:
  $$\frac{d}{d\mu}\Big[\sum y_i \log\mu - n\mu - \sum \log(y_i!)\Big] = \sum y_i/\mu - n = 0.$$
◮ The solution $\hat\mu = \bar{y}$ is the maximum likelihood estimate of µ.
◮ In a regression problem all the $Y_i$ will have different means $\mu_i$.
◮ Our log-likelihood is now
  $$\sum y_i \log\mu_i - \sum \mu_i - \sum \log(y_i!)$$
◮ If we treat all n of the $\mu_i$ as unknown parameters we can maximize the log-likelihood by setting each of the n partial derivatives with respect to $\mu_k$, for k from 1 to n, equal to 0.
◮ The kth of these n equations is just
  $$y_k/\mu_k - 1 = 0.$$
◮ This leads to $\hat\mu_k = y_k$.
◮ In glm jargon this model is the saturated model.
◮ A more useful model is one in which there are fewer parameters, but more than 1.
◮ A typical glm model is
  $$\mu_i = \exp(x_i^T \beta)$$
  where the $x_i$ are covariate values for the ith observation.
◮ An intercept term is often included, just as in standard linear regression.
◮ In this case the log-likelihood is
  $$\sum y_i x_i^T\beta - \sum \exp(x_i^T\beta) - \sum \log(y_i!)$$
  which should be treated as a function of β and maximized.
◮ The derivative of this log-likelihood with respect to $\beta_k$ is
  $$\sum y_i x_{ik} - \sum \exp(x_i^T\beta)\,x_{ik} = \sum (y_i - \mu_i)\,x_{ik}$$
◮ If β has p components then setting these p derivatives equal to 0 gives the likelihood equations.
◮ It is no longer possible to solve the likelihood equations analytically.
◮ We have, instead, to settle for numerical techniques.
◮ One common technique is called iteratively re-weighted least squares.
◮ For a Poisson variable with mean $\mu_i$ the variance is $\sigma_i^2 = \mu_i$.
◮ Ignore for a moment the fact that if we knew $\sigma_i$ we would know $\mu_i$, and consider fitting our model by least squares with the $\sigma_i^2$ known.
◮ We would minimize (see our discussion of weighted least squares)
  $$\sum \frac{(Y_i - \mu_i)^2}{\sigma_i^2}$$
  by taking the derivative with respect to $\beta_k$; again ignoring the fact that $\sigma_i^2$ depends on $\beta_k$, we would get
  $$-2\sum \frac{(Y_i - \mu_i)\,\partial\mu_i/\partial\beta_k}{\sigma_i^2} = 0$$
◮ But the derivative of $\mu_i$ with respect to $\beta_k$ is $\mu_i x_{ik}$,
◮ and replacing $\sigma_i^2$ by $\mu_i$ we get the equation
  $$\sum (Y_i - \mu_i)\,x_{ik} = 0$$
  exactly as before.
◮ This motivates the following estimation scheme (sketched in code below):
  1. Begin with a guess for the SDs $\sigma_i$ (taking all to be 1 is easy).
  2. Do (non-linear) weighted least squares using the guessed weights. Get estimated regression parameters $\hat\beta$.
  3. Use these to compute estimated variances $\hat\sigma_i^2$. Go back and do weighted least squares with these weights.
  4. Iterate (repeat over and over) until the estimates stop changing.
◮ NOTE: if the estimation converges then the final estimate is a fixed point of the algorithm, which solves the equation
  $$\sum (Y_i - \mu_i)\,x_{ik} = 0$$
  derived above.
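A minimal sketch of this scheme for the Poisson model $\mu_i = \exp(x_i^T\beta)$, assuming NumPy. Here the weighted least squares step is carried out with a single Gauss-Newton (Fisher scoring) update; the function name irwls_poisson, the starting value, tolerance and simulated data are illustrative choices rather than anything prescribed in these notes.

```python
import numpy as np

def irwls_poisson(X, y, n_iter=50, tol=1e-8):
    """Iteratively reweighted least squares sketch for a Poisson GLM
    with log link, mu_i = exp(x_i^T beta)."""
    n, p = X.shape
    beta = np.zeros(p)                      # step 1: crude starting guess
    for _ in range(n_iter):
        mu = np.exp(X @ beta)               # current fitted means
        W = mu                              # step 3: Poisson weights, sigma_i^2 = mu_i
        # step 2: weighted least squares (one Gauss-Newton step):
        # solve (X^T W X) delta = X^T (y - mu), using dmu_i/dbeta = mu_i x_i
        XtWX = X.T @ (W[:, None] * X)
        score = X.T @ (y - mu)              # the likelihood equations sum (y_i - mu_i) x_i
        delta = np.linalg.solve(XtWX, score)
        beta = beta + delta
        if np.max(np.abs(delta)) < tol:     # step 4: stop when estimates stop changing
            break
    return beta

# Tiny usage example on simulated data (illustrative values):
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one covariate
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))
print(irwls_poisson(X, y))   # should land near beta_true
```

At convergence the returned value satisfies $\sum (y_i - \mu_i)\,x_{ik} = 0$ up to the tolerance, which is the fixed-point property noted above.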
Estimating equations: an introduction via glim

Get estimates $\hat\theta$ by solving $h(X, \theta) = 0$ for θ. Examples:
1. The normal equations in linear regression:
   $$X^T Y - X^T X\beta = 0$$
2. The likelihood equations; if $\ell(\theta)$ is the log-likelihood:
   $$\frac{\partial\ell}{\partial\theta} = 0.$$
3. Non-linear least squares:
   $$\sum (Y_i - \mu_i)\,\frac{\partial\mu_i}{\partial\theta} = 0$$
4. The iteratively reweighted least squares estimating equation:
   $$\sum \frac{Y_i - \mu_i}{\sigma_i^2}\,\frac{\partial\mu_i}{\partial\theta} = 0;$$
   for a generalized linear model $\sigma_i^2$ is a known function of $\mu_i$.

Poisson regression revisited

◮ The likelihood function for a Poisson regression model is:
  $$L(\beta) = \left[\prod \frac{\mu_i^{y_i}}{y_i!}\right]\exp\Big(-\sum \mu_i\Big)$$
◮ The log-likelihood is
  $$\sum y_i \log\mu_i - \sum \mu_i - \sum \log(y_i!)$$
◮ A typical glm model is
  $$\mu_i = \exp(x_i^T\beta)$$
  where $x_i$ is the covariate vector for observation i (often including an intercept term, as in standard linear regression).
◮ In this case the log-likelihood is
  $$\sum y_i x_i^T\beta - \sum \exp(x_i^T\beta) - \sum \log(y_i!)$$
  which should be treated as a function of β and maximized.
◮ The derivative of this log-likelihood with respect to $\beta_k$ is
  $$\sum y_i x_{ik} - \sum \exp(x_i^T\beta)\,x_{ik} = \sum (y_i - \mu_i)\,x_{ik}$$
◮ If β has p components then setting these p derivatives equal to 0 gives the likelihood equations.
◮ For a Poisson model the variance is given by
  $$\sigma_i^2 = \mu_i = \exp(x_i^T\beta)$$
◮ so the likelihood equations can be written as
  $$\sum \frac{(y_i - \mu_i)\,x_{ik}\,\mu_i}{\mu_i} = \sum \frac{y_i - \mu_i}{\sigma_i^2}\,\frac{\partial\mu_i}{\partial\beta_k} = 0$$
  which is the fourth equation above.

IRWLS

◮ The equations are solved iteratively, as in non-linear regression, but each iteration now involves weighted least squares.
◮ The resulting scheme is called iteratively reweighted least squares.
  1. Begin with a guess for the SDs $\sigma_i$ (taking all equal to 1 is simple).
  2. Do (non-linear) weighted least squares using the guessed weights. Get estimated regression parameters $\hat\beta^{(0)}$.
  3. Use these to compute estimated variances $\hat\sigma_i^2$. Re-do the weighted least squares with these weights; get $\hat\beta^{(1)}$.
  4. Iterate (repeat over and over) until the estimates are not really changing.

Fixed Points of Algorithms

◮ Suppose the $\hat\beta^{(k)}$ converge as $k \to \infty$ to something, say $\hat\beta$.
◮ Recall
  $$\sum \left[\frac{y_i - \mu_i(\hat\beta^{(k+1)})}{\sigma_i^2(\hat\beta^{(k)})}\right]\frac{\partial\mu_i(\hat\beta^{(k+1)})}{\partial\beta} = 0$$
◮ so we learn that $\hat\beta$ must be a root of the equation
  $$\sum \left[\frac{y_i - \mu_i(\hat\beta)}{\sigma_i^2(\hat\beta)}\right]\frac{\partial\mu_i(\hat\beta)}{\partial\beta} = 0$$
  which is the last of our example estimating equations.

Distribution of Estimators

◮ Distribution theory: compute the distributions of statistics, estimators and pivots.
◮ Examples: the multivariate normal distribution; theorems about the chi-squared distribution of quadratic forms; theorems that F statistics have F distributions when the null hypothesis is true; theorems that show a t pivot has a t distribution.
◮ Exact distribution theory: exact results, as in the previous example, when the errors are assumed to have exactly normal distributions.
◮ Asymptotic or large-sample distribution theory: the same sort of conclusions, but only approximately true and assuming n is large. Theorems of the form:
  $$\lim_{n\to\infty} P(T_n \le t) = F(t)$$
◮ For generalized linear models we do asymptotic distribution theory (illustrated by the simulation sketch below).
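A small simulation can make the large-sample claim concrete. The sketch below (an illustration added here, not part of the original notes) repeatedly draws data from a single-covariate Poisson regression with true slope $\beta_0$, solves the likelihood equation for $\hat\beta$ by Newton's method, and compares the empirical quantiles of the standardized estimates with standard normal quantiles; the function name fit_beta and the choices n = 200 and 2000 replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_beta(x, y, beta=0.0, n_iter=50):
    """Solve U(beta) = sum (y_i - exp(x_i beta)) x_i = 0 by Newton's method
    (scalar beta, no intercept)."""
    for _ in range(n_iter):
        mu = np.exp(x * beta)
        U = np.sum((y - mu) * x)            # estimating function
        U_prime = -np.sum(mu * x**2)        # its derivative in beta
        beta = beta - U / U_prime
    return beta

beta0, n, reps = 0.4, 200, 2000
x = rng.uniform(-1.0, 1.0, size=n)          # fixed covariate values
estimates = np.empty(reps)
for r in range(reps):
    y = rng.poisson(np.exp(x * beta0))
    estimates[r] = fit_beta(x, y)

z = (estimates - estimates.mean()) / estimates.std()
probs = [0.05, 0.25, 0.5, 0.75, 0.95]
print("empirical quantiles of standardized beta-hat:",
      np.round(np.quantile(z, probs), 2))
# If the normal approximation is good these should be near
# -1.64, -0.67, 0.00, 0.67, 1.64.
```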
Uses of Asymptotic Theory: principles

◮ An estimate is normally only useful if it is equipped with a measure of uncertainty such as a standard error.
◮ A standard error is a useful measure of uncertainty provided the error of estimation $\hat\theta - \theta$ has approximately a normal distribution and the standard error is the standard deviation of this normal distribution.
◮ For many estimating equations $h(Y, \theta) = 0$ the root $\hat\theta$ is unique and has the desired approximate normal distribution, provided the sample size n is large.

Sketch of reasoning in a special case

◮ Poisson example with p = 1.
◮ Assume $Y_i$ has a Poisson distribution with mean $\mu_i = e^{x_i\beta}$, where now β is a scalar.
◮ The estimating equation (the likelihood equation) is
  $$U(\beta) = h(Y_1, \ldots, Y_n, \beta) = \sum (Y_i - e^{x_i\beta})\,x_i = 0$$
◮ It is now important to distinguish between a value of β which we are trying out in the estimating equation and the true value of β, which I will call $\beta_0$.
◮ If we happen to try out the true value of β in U then we find
  $$E_{\beta_0}(U(\beta_0)) = \sum x_i\,E_{\beta_0}(Y_i - \mu_i) = 0$$
◮ On the other hand, if we try out a value of β other than the correct one we find
  $$E_{\beta_0}(U(\beta)) = \sum x_i\big(e^{x_i\beta_0} - e^{x_i\beta}\big) \ne 0.$$
◮ But U(β) is a sum of independent random variables, so by the law of large numbers (law of averages) it must be close to its expected value.
◮ This means: if we stick in a value of β far from the right value we will not get 0, while if we stick in a value of β close to the right answer we will get something close to 0.
◮ This can sometimes be turned into the assertion: the glm estimate of β is consistent, that is, it converges to the correct answer as the sample size goes to ∞.
◮ The next theoretical step is another linearization.
◮ If $\hat\beta$ is the root of the equation, that is, $U(\hat\beta) = 0$, then
  $$0 = U(\hat\beta) \approx U(\beta_0) + (\hat\beta - \beta_0)\,U'(\beta_0)$$
◮ This is a Taylor expansion.
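Rearranged, the expansion says $\hat\beta - \beta_0 \approx -U(\beta_0)/U'(\beta_0)$, and how good that approximation is can be checked numerically. The sketch below (illustrative, with arbitrary simulated data, not from the original notes) reuses the single-covariate Poisson setup: it finds the exact root of $U(\hat\beta) = 0$ by Newton's method and compares it with the one-term Taylor approximation computed at the true value $\beta_0$.

```python
import numpy as np

rng = np.random.default_rng(2)
beta0, n = 0.4, 5000                         # true slope and (large) sample size
x = rng.uniform(-1.0, 1.0, size=n)
y = rng.poisson(np.exp(x * beta0))

def U(beta):
    """Estimating function U(beta) = sum (y_i - exp(x_i beta)) x_i."""
    return np.sum((y - np.exp(x * beta)) * x)

def U_prime(beta):
    """Derivative of U with respect to beta."""
    return -np.sum(np.exp(x * beta) * x**2)

# Exact root of U(beta_hat) = 0 via Newton's method.
beta_hat = 0.0
for _ in range(50):
    beta_hat = beta_hat - U(beta_hat) / U_prime(beta_hat)

# One-term Taylor (linearization) approximation around the true beta0.
beta_lin = beta0 - U(beta0) / U_prime(beta0)

print("Newton root    :", round(beta_hat, 5))
print("linearization  :", round(beta_lin, 5))   # the two should nearly agree
```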