LINEAR REGRESSION MODELS
Likelihood and Reference Bayesian Analysis

Mike West

May 3, 1999

1 Straight Line Regression Models

We begin with the simple straight line regression model
$$y_i = \alpha + \beta x_i + \epsilon_i,$$
where the design points $x_i$ are fixed in advance, and the measurement/sampling errors $\epsilon_i$ are independent and normally distributed, $\epsilon_i \sim N(0, \sigma^2)$, for each $i = 1, \ldots, n$. In this context, we have looked at general modelling questions, data, and the fitting of least squares estimates of $\alpha$ and $\beta$. Now we turn to more formal likelihood and Bayesian inference.

1.1 Likelihood and MLEs

The formal parametric inference problem is a multi-parameter problem: we require inferences on the three parameters $(\alpha, \beta, \sigma^2)$. The likelihood function has a simple enough form, as we now show. Throughout, we do not indicate the design points in conditioning statements, though they are implicitly conditioned upon.

Write $Y = \{y_1, \ldots, y_n\}$ and $X = \{x_1, \ldots, x_n\}$. Given $X$ and the model parameters, each $y_i$ is the corresponding zero-mean normal random quantity $\epsilon_i$ plus the term $\alpha + \beta x_i$, so that $y_i$ is normal with this term as its mean and variance $\sigma^2$. Also, since the $\epsilon_i$ are independent, so are the $y_i$. Thus
$$y_i \mid \alpha, \beta, \sigma^2 \sim N(\alpha + \beta x_i, \sigma^2)$$
with conditional density function
$$p(y_i \mid \alpha, \beta, \sigma^2) = (2\pi\sigma^2)^{-1/2} \exp\{-(y_i - \alpha - \beta x_i)^2 / 2\sigma^2\}$$
for each $i$. Also, by independence, the joint density function is
$$p(Y \mid \alpha, \beta, \sigma^2) = \prod_{i=1}^n p(y_i \mid \alpha, \beta, \sigma^2).$$

Given the observed response values $Y$, this provides the likelihood function for the three parameters: the joint density above, evaluated at the observed and hence now fixed values of $Y$, is now viewed as a function of $(\alpha, \beta, \sigma^2)$ as they vary across possible parameter values. It is clear that this likelihood function is given by
$$p(Y \mid \alpha, \beta, \sigma^2) \propto \exp(-Q(\alpha, \beta)/2\sigma^2)/\sigma^n$$
where
$$Q(\alpha, \beta) = \sum_{i=1}^n (y_i - \alpha - \beta x_i)^2$$
and where a constant term $(2\pi)^{-n/2}$ has been dropped.

Let us look at computing the joint maximum likelihood estimates (MLEs) of the three parameters. This involves finding the values $(\hat\alpha, \hat\beta, \hat\sigma^2)$ such that $p(Y \mid \hat\alpha, \hat\beta, \hat\sigma^2) > p(Y \mid \alpha, \beta, \sigma^2)$ for any other parameter values $(\alpha, \beta, \sigma^2)$. We do this as follows.

For any fixed value of $\sigma^2$, the likelihood function is a strictly decreasing function of $Q(\alpha, \beta)$. Hence, changing $(\alpha, \beta)$ to decrease the value of $Q$ implies that the value of the likelihood function increases. Clearly, choosing $(\alpha, \beta)$ to minimise $Q$ implies that we maximise the likelihood function. As a result, the MLEs of $(\alpha, \beta)$ for any specific value of $\sigma^2$ are simply the LSEs. From the earlier discussion of least squares estimation, we know that the LSEs $(\hat\alpha, \hat\beta)$ do not, in fact, depend on the value of the variance $\sigma^2$. As a result, the full three-dimensional maximisation is solved at the LSE values $(\hat\alpha, \hat\beta)$ and by choosing $\hat\sigma^2$ to maximise $p(Y \mid \hat\alpha, \hat\beta, \sigma^2)$ as a function of just $\sigma^2$. This trivially leads to
$$\hat\sigma^2 = \sum_{i=1}^n \hat\epsilon_i^2 / n$$
where $\hat\epsilon_i = y_i - \hat\alpha - \hat\beta x_i$ for each $i$; this MLE of $\sigma^2$ is the usual residual sum of squares with divisor $n$.

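To make these closed-form estimates concrete, the following minimal numerical sketch (not part of the original notes) computes the LSEs and the MLE of $\sigma^2$ directly from the formulas above; the synthetic data, seed, and variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic data for illustration (assumed, not from the notes):
# a true line alpha = 1.0, beta = 2.0 with N(0, 0.5^2) errors.
rng = np.random.default_rng(0)
n = 30
x = np.linspace(0.0, 3.0, n)            # fixed design points
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.5, n)

# LSEs minimise Q(alpha, beta) = sum_i (y_i - alpha - beta*x_i)^2.
x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((x - x_bar) * (y - y_bar))
beta_hat = S_xy / S_xx
alpha_hat = y_bar - beta_hat * x_bar

# Fitted residuals and the MLE of sigma^2 (note divisor n, not n - 2).
resid = y - (alpha_hat + beta_hat * x)
sig2_mle = np.sum(resid ** 2) / n

print(alpha_hat, beta_hat, sig2_mle)
```
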
1.2 Reference Bayesian Analyses

1.2.1 Parametrisation and Reference Prior

To develop the reference Bayesian posterior distribution for $(\alpha, \beta, \sigma^2)$, it is traditional to reparametrise from $\sigma^2$ to the precision parameter $\phi = 1/\sigma^2$. This is done simply for clarity of exposition and ease of development. Plugging $\sigma^2 = 1/\phi$ into the likelihood function leads simply to
$$p(Y \mid \alpha, \beta, \phi) \propto \phi^{n/2} \exp(-\phi Q(\alpha, \beta)/2),$$
defining the joint likelihood function for the three parameters $(\alpha, \beta, \phi)$.

Bayesian inference requires a prior $p(\alpha, \beta, \phi)$, a trivariate prior density defined on the model parameter space. Here we use the traditional reference prior in which:

• The three parameters are independent, so $p(\alpha, \beta, \phi) = p(\alpha)p(\beta)p(\phi)$.

• As a function of $\alpha$, the log-likelihood function is quadratic, indicating that the likelihood function will contribute a normal form in $\alpha$. Hence a normal prior for $\alpha$ would be conjugate, leading to a normal posterior. On this basis, a normal prior with an extremely large variance (as in reference analysis of normal models) will represent a vague or uninformative prior position. Taking the formal limit of a normal prior with a variance tending to infinity provides the traditional reference prior $p(\alpha) \propto$ constant.

• The same reasoning applies to $\beta$, leading to the traditional reference prior $p(\beta) \propto$ constant.

• As a function of $\phi$ alone, the likelihood function has the same form as that of a gamma density function in $\phi$. Hence a gamma prior for $\phi$ would be conjugate, leading to a gamma posterior. On this basis, a gamma prior with very small defining parameters (as in reference analysis of Poisson or exponential models) will represent a vague or uninformative prior position. Taking the formal limit of the Gamma$(a, b)$ prior at $a = b = 0$ provides the traditional reference prior $p(\phi) \propto \phi^{-1}$.

Combining these components produces the standard non-informative/reference prior $p(\alpha, \beta, \phi) \propto \phi^{-1}$. Then Bayes' theorem leads to the reference posterior
$$p(\alpha, \beta, \phi \mid Y) \propto p(\alpha, \beta, \phi)\, p(Y \mid \alpha, \beta, \phi) \propto \phi^{n/2-1} \exp(-\phi Q(\alpha, \beta)/2),$$
over real-valued $\alpha$ and $\beta$, and $\phi > 0$. This is a joint density for the three quantities, and reference inference follows by exploring and summarising its properties. Notice that the posterior is almost the normalised likelihood function: the only difference is in the prior term $\phi^{-1}$. We quote key features of this reference posterior in the following sections.

1.2.2 Marginal Reference Posterior for $\phi$

The marginal posterior for $\phi$ is available by integrating over $(\alpha, \beta)$, i.e.,
$$p(\phi \mid Y) = \int\!\!\int p(\alpha, \beta, \phi \mid Y)\, d\alpha\, d\beta$$
where the range of integration is $-\infty < \alpha, \beta < \infty$. It can be shown that this yields the simple form
$$p(\phi \mid Y) \propto \phi^{a-1} \exp\{-b\phi\}$$
where $a = (n-2)/2$ and $b = \sum_{i=1}^n \hat\epsilon_i^2 / 2$. As a result, the posterior for $\phi$ is simply Gamma$(a, b)$ with these values of $a, b$. In particular, the posterior mean is
$$E(\phi \mid Y) = 1/s^2$$
where
$$s^2 = \sum_{i=1}^n \hat\epsilon_i^2 / (n-2).$$
Since $E(\phi \mid Y)$ is a point estimate of $\phi = 1/\sigma^2$, then $s^2$ is a corresponding point estimate of $\sigma^2$. It is referred to as the residual variance estimate, as it is a sample variance computed from the fitted residuals $\hat\epsilon_i$. Note that, unlike the MLE $\hat\sigma^2$, the estimate $s^2$ has a divisor $n - 2$. The common-sense interpretation of this is that the effective number of observations is the actual total $n$ reduced by the number of fitted parameters, here just two. The term $n - 2$ is called the residual degrees of freedom, reflecting this adjustment from $n$, the initial degrees of freedom. One implication is that $s^2 > \hat\sigma^2$, reflecting a more conservative estimate of variance after accounting for the estimation of the two parameters.

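As a hedged sketch of how this gamma marginal might be summarised numerically, the following continues the synthetic example above (it assumes `n` and the residual vector `resid` from the earlier sketch; the only API detail relied on is SciPy's shape/scale parametrisation, where `scale = 1/b` gives a Gamma$(a, b)$ in the rate form used here).

```python
import numpy as np
from scipy import stats

# Reference posterior for the precision: phi | Y ~ Gamma(a, b)
# with a = (n - 2)/2 and b = (sum of squared residuals)/2.
a = (n - 2) / 2.0
b = np.sum(resid ** 2) / 2.0

# Posterior mean E(phi | Y) = a/b = 1/s^2, so s2 = b/a is the
# residual variance estimate with divisor n - 2.
s2 = b / a

# A central 95% posterior interval for phi, converted to one for
# sigma^2 = 1/phi by inverting and swapping the endpoints.
phi_lo, phi_hi = stats.gamma.ppf([0.025, 0.975], a, scale=1.0 / b)
sig2_interval = (1.0 / phi_hi, 1.0 / phi_lo)

print(s2, sig2_interval)
```
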
1.2.3 Marginal Reference Posterior for $(\alpha, \beta)$

The marginal posterior for $(\alpha, \beta)$ is obtained as
$$p(\alpha, \beta \mid Y) = \int_0^\infty p(\alpha, \beta, \phi \mid Y)\, d\phi$$
and turns out to be a bivariate T distribution. Of key practical relevance are the implied univariate posterior margins for $\alpha$, $\beta$ and linear functions of $(\alpha, \beta)$ alone, and these are all univariate T distributions (see Appendix for details of T distributions). Specific univariate margins are as follows.

Define $v^2 = 1/S_{xx}$. The univariate posterior for $\beta$ has a density function
$$p(\beta \mid Y) \propto \{(n-2) + (\beta - \hat\beta)^2/(s^2 v^2)\}^{-(n-1)/2},$$
and this is the density of a T distribution with $n - 2$ degrees of freedom, mode $\hat\beta$ and scale $sv$. By way of notation we have
$$\beta \mid Y \sim T_{n-2}(\hat\beta, s^2 v^2).$$
As long as $n > 4$, it is also true that $E(\beta \mid Y) = \hat\beta$ and $V(\beta \mid Y) = c s^2 v^2$ with $c = (n-2)/(n-4)$. The posterior is symmetric and normal-shaped about the mode, though has heavier tails than a normal posterior. We can write
$$\beta = \hat\beta + svt \quad \text{and} \quad t = (\beta - \hat\beta)/(sv)$$
where the random quantity $t \sim T_{n-2}(0, 1)$. Posterior probabilities and intervals for $\beta$ follow from those of the standard Student T distribution: if $t_p$ is the $100p\%$ quantile of $t$, then that for $\beta$ is simply $\hat\beta + svt_p$. The term $sv$ is called the posterior standard error of the coefficient $\beta$.

For large degrees of freedom, the Student T distribution approaches the standard normal, in which case we have the approximations $t \sim N(0, 1)$ and so
$$\beta \mid Y \sim N(\hat\beta, s^2 v^2).$$
Otherwise, we can view the distribution informally as "like the normal but with a little bit of additional uncertainty."

The univariate margin for $\alpha$ is similarly a T distribution with $n - 2$ degrees of freedom, mode $\hat\alpha$ and scale $sv_\alpha$ where $v_\alpha^2 = n^{-1} + \bar{x}^2/S_{xx}$; i.e.,
$$\alpha \mid Y \sim T_{n-2}(\hat\alpha, s^2 v_\alpha^2).$$

Under $p(\alpha, \beta \mid Y)$ the two parameters are generally correlated. Assuming $n > 4$ so that second moments of the T distribution exist, the posterior covariance is $s^2 c_{\alpha,\beta}(n-2)/(n-4)$ where $c_{\alpha,\beta} = -\bar{x}/S_{xx}$. Note that $(n-2)/(n-4) \approx 1$ when $n$ is large, when the posterior is approximately normal; in that case, the posterior covariance is just the term $s^2 c_{\alpha,\beta}$ above.

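These T margins translate directly into posterior intervals. The sketch below (again a continuation of the running example, with `s2`, `S_xx`, `beta_hat`, `x` and `n` carried over from the earlier sketches; the helper names are my own) computes the posterior standard errors and a central 95% interval for $\beta$ using standard Student T quantiles.

```python
import numpy as np
from scipy import stats

# Posterior standard error of beta: s*v with v^2 = 1/S_xx.
v = np.sqrt(1.0 / S_xx)
se_beta = np.sqrt(s2) * v

# Central 95% posterior interval: beta_hat + s*v*t_p, with t_p the
# standard Student T quantiles on n - 2 degrees of freedom.
t_lo, t_hi = stats.t.ppf([0.025, 0.975], df=n - 2)
beta_interval = (beta_hat + se_beta * t_lo, beta_hat + se_beta * t_hi)

# Posterior standard error of alpha uses v_alpha^2 = 1/n + xbar^2/S_xx.
v_alpha = np.sqrt(1.0 / n + x.mean() ** 2 / S_xx)
se_alpha = np.sqrt(s2) * v_alpha

print(beta_interval, se_alpha)
```

For large $n$ the interval is essentially $\hat\beta \pm 1.96\, sv$, matching the normal approximation noted above.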
