LINEAR REGRESSION MODELS
Likelihood and Reference Bayesian Analysis
Mike West
May 3, 1999
1 Straight Line Regression Models
We begin with the simple straight line regression model

    y_i = α + β x_i + ε_i,

where the design points x_i are fixed in advance, and the measurement/sampling
errors ε_i are independent and normally distributed, ε_i ∼ N(0, σ²), for each
i = 1, ..., n. In this context, we have looked at general modelling questions,
data, and the fitting of least squares estimates of α and β. Now we turn to
more formal likelihood and Bayesian inference.
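To fix ideas, here is a small Python sketch (an addition to these notes, not part of the original development) that simulates data from this model. The sample size, design points and parameter values are arbitrary choices for illustration; later sketches reuse the x and y arrays generated here.

```python
# Minimal sketch: simulate y_i = alpha + beta * x_i + eps_i, eps_i ~ N(0, sigma^2).
# All numerical values below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)

n = 25
x = np.linspace(0.0, 10.0, n)                  # fixed design points x_1, ..., x_n
alpha_true, beta_true, sigma_true = 2.0, 0.5, 1.5

eps = rng.normal(0.0, sigma_true, size=n)      # independent N(0, sigma^2) errors
y = alpha_true + beta_true * x + eps           # observed responses
```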
1.1 Likelihood and MLEs
The formal parametric inference problem is a multi-parameter problem: we
require inferences on the three parameters (α, β, σ²). The likelihood function
has a simple enough form, as we now show. Throughout, we do not indicate the
design points in conditioning statements, though they are implicitly conditioned
upon. Write Y = {y_1, ..., y_n} and X = {x_1, ..., x_n}. Given X and the model
parameters, each y_i is the corresponding zero-mean normal random quantity
ε_i plus the term α + β x_i, so that y_i is normal with this term as its mean and
variance σ². Also, since the ε_i are independent, so are the y_i. Thus

    (y_i | α, β, σ²) ∼ N(α + β x_i, σ²),

with conditional density function

    p(y_i | α, β, σ²) = exp{-(y_i - α - β x_i)² / (2σ²)} / (2πσ²)^{1/2}

for each i. Also, by independence, the joint density function is

    p(Y | α, β, σ²) = ∏_{i=1}^{n} p(y_i | α, β, σ²).
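As a numerical illustration (again an addition, continuing the simulated example above), the per-observation densities can be evaluated and multiplied together, working on the log scale for numerical stability.

```python
# Sketch: per-observation conditional densities and the joint density as their
# product, computed on the log scale; x and y come from the simulation above,
# and the trial parameter values (a, b, s2) are arbitrary.
import numpy as np
from scipy.stats import norm

a, b, s2 = 1.0, 1.0, 2.0                                      # trial (alpha, beta, sigma^2)
log_terms = norm.logpdf(y, loc=a + b * x, scale=np.sqrt(s2))  # log p(y_i | a, b, s2)
joint_log_density = log_terms.sum()                           # log of the product over i
```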
Given the observed response values Y, this provides the likelihood function for
the three parameters: the joint density above, evaluated at the observed and
hence now fixed values of Y, now viewed as a function of (α, β, σ²) as they vary
across possible parameter values. It is clear that this likelihood function is given
by

    p(Y | α, β, σ²) ∝ exp{-Q(α, β) / (2σ²)} / σ^n,

where

    Q(α, β) = Σ_{i=1}^{n} (y_i - α - β x_i)²

and where a constant term (2π)^{-n/2} has been dropped.
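The following sketch (an illustrative addition, reusing the quantities defined in the previous sketch) evaluates the log-likelihood through Q(α, β); including the dropped constant, it should agree with the direct product of normal densities computed above.

```python
# Sketch: the likelihood expressed through Q(alpha, beta).  The dropped
# constant in the proportionality statement is (2*pi)^(-n/2).
import numpy as np

def Q(alpha, beta, x, y):
    # residual sum of squares at (alpha, beta)
    return np.sum((y - alpha - beta * x) ** 2)

def log_likelihood(alpha, beta, sigma2, x, y):
    # log p(Y | alpha, beta, sigma^2), including the (2*pi)^(-n/2) constant
    n = len(y)
    return -0.5 * n * np.log(2 * np.pi * sigma2) - Q(alpha, beta, x, y) / (2 * sigma2)

# Should match the direct sum of normal log-densities from the previous sketch:
assert np.isclose(log_likelihood(a, b, s2, x, y), joint_log_density)
```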
Let us look at computing the joint maximum likelihood estimates (MLEs)
of the three parameters. This involves finding the values (α̂, β̂, σ̂²) such that
p(Y | α̂, β̂, σ̂²) > p(Y | α, β, σ²) for any other parameter values (α, β, σ²). We do
this as follows.
- For any fixed value of σ², the likelihood function is a strictly decreasing
  function of Q(α, β). Hence, changing (α, β) to decrease the value of Q
  implies that the value of the likelihood function increases. Clearly, choosing
  (α, β) to minimise Q implies that we maximise the likelihood function. As
  a result, the MLEs of (α, β) for any specific value of σ² are simply the
  LSEs.
- From the earlier discussion of least squares estimation, we know that the
  LSEs (α̂, β̂) do not, in fact, depend on the value of the variance σ². As a
  result, the full three-dimensional maximisation is solved at the LSE values
  (α̂, β̂) and by choosing σ̂² to maximise p(Y | α̂, β̂, σ²) as a function of just
  σ². This trivially leads to

      σ̂² = Σ_{i=1}^{n} ε̂_i² / n,

  where ε̂_i = y_i - α̂ - β̂ x_i for each i; this MLE of σ² is the usual residual
  sum of squares with divisor n.
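A short sketch of these calculations (illustrative, continuing the running example): the closed-form LSEs for the straight line, the fitted residuals, and the MLE of σ² with divisor n.

```python
# Sketch: least squares estimates for the straight line and the MLE of sigma^2,
# i.e. the residual sum of squares divided by n (not n - 2); x and y as above.
import numpy as np

xbar, ybar = x.mean(), y.mean()
beta_hat = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
alpha_hat = ybar - beta_hat * xbar

eps_hat = y - alpha_hat - beta_hat * x        # fitted residuals eps_hat_i
sigma2_mle = np.sum(eps_hat ** 2) / len(y)    # MLE of sigma^2, divisor n
```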
1.2 Reference Bayesian Analyses
1.2.1 Parametrisation and Reference Prior
To develop the reference Bayesian posterior distribution for (α, β, σ²), it is tra-
ditional to reparameterise from σ² to the precision parameter φ = 1/σ². This
is done simply for clarity of exposition and ease of development. Plugging
σ² = 1/φ into the likelihood function leads simply to

    p(Y | α, β, φ) ∝ φ^{n/2} exp{-φ Q(α, β) / 2},

defining the joint likelihood function for the three parameters (α, β, φ).
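As a check on the reparameterisation (an illustrative addition, continuing the running example), the log-likelihood written in terms of φ differs from the σ² version only by the dropped constant -(n/2) log(2π).

```python
# Sketch: the log-likelihood in the precision parameterisation phi = 1/sigma^2.
import numpy as np

def log_likelihood_phi(alpha, beta, phi, x, y):
    n = len(y)
    Qab = np.sum((y - alpha - beta * x) ** 2)
    return 0.5 * n * np.log(phi) - 0.5 * phi * Qab

# Consistency check against log_likelihood() from the earlier sketch:
lhs = log_likelihood_phi(a, b, 1.0 / s2, x, y) - 0.5 * len(y) * np.log(2 * np.pi)
assert np.isclose(lhs, log_likelihood(a, b, s2, x, y))
```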
Bayesian inference requires a prior p(α, β, φ), a trivariate prior density
defined on the model parameter space. Here we use the traditional reference
prior in which:

- The three parameters are independent, so p(α, β, φ) = p(α) p(β) p(φ).

- As a function of α, the log-likelihood function is quadratic, indicating that
  the likelihood function will contribute a normal form in α. Hence a normal
  prior for α would be conjugate, leading to a normal posterior. On this
  basis, a normal prior with an extremely large variance (as in reference
  analysis of normal models) will represent a vague or uninformative prior
  position. Taking the formal limit of a normal prior with a variance tending
  to infinity provides the traditional reference prior p(α) ∝ constant.

- The same reasoning applies to β, leading to the traditional reference prior
  p(β) ∝ constant.

- As a function of φ alone, the likelihood function has the same form as that
  of a gamma density function in φ. Hence a gamma prior for φ would be
  conjugate, leading to a gamma posterior. On this basis, a gamma prior
  with very small defining parameters (as in reference analysis of Poisson or
  exponential models) will represent a vague or uninformative prior position.
  Taking the formal limit of the Gamma(a, b) prior at a = b = 0 provides the
  traditional reference prior p(φ) ∝ 1/φ.
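To make the conjugacy statement concrete, here is a brief sketch (an addition, not part of the original notes) of the conditional update for φ given (α, β) under a hypothetical Gamma(a0, b0) prior with shape a0 and rate b0; the reference prior corresponds to the formal limit a0 = b0 → 0.

```python
# Sketch: conditional conjugate update for phi given (alpha, beta), assuming a
# hypothetical Gamma(a0, b0) prior (shape a0, rate b0).  The conditional
# posterior is Gamma(a0 + n/2, b0 + Q(alpha, beta)/2); the reference prior is
# the formal limit a0 = b0 -> 0.
import numpy as np

def phi_posterior_params(a0, b0, alpha, beta, x, y):
    n = len(y)
    Qab = np.sum((y - alpha - beta * x) ** 2)
    return a0 + n / 2.0, b0 + Qab / 2.0      # (shape, rate) of the gamma posterior

# Reference limit a0 = b0 = 0, evaluated at the LSEs from the earlier sketch:
shape, rate = phi_posterior_params(0.0, 0.0, alpha_hat, beta_hat, x, y)
```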