1 2

SUBMODELS (NESTED MODELS) Examples: AND

ANALYSIS OF VARIANCE . NH: !1 = 1

OF REGRESSION MODELS AH: !1 ! 1

Data: (x1, y1), (x2, y2), … , (xn, yn) What submodel does NH correspond to?

Assumptions:

• Linear mean function

• Constant conditional variance How many parameters does it have? • Independence of observations

• Normality of conditional distributions AH corresponds to full model (with three

parameters, including ! ). Our model has 3 parameters: 1

E(Y|x) = ! + ! x (Two parameters: ! and ! ) 0 1 0 1 b. NH: ! = 0 Var(Y|x) = "2 (One parameter: ") 1 AH: ! ! 0 1

We will call this the full model. AH <-> full model.

Many hypothesis tests on coefficients can NH <-> ? reformulated as test of the full model against a submodel : a special case of the full model obtained by specifying certain of the parameters or certain Number of parameters? relationships between parameters.

3 4 c. NH: !0 = 0 As with the full model, can "fit" a submodel using

AH: !0 ! 0 least squares:

AH <-> full model. Example 1: Submodel:

NH <->? (Y|x) = 2 + !1x

Var(Y|x) = "2, Number of parameters?

Consider lines y = 2 + h1x We can go the other way: [picture!] Examples: For the submodel given, what is the null hypothesis of the corresponding hypothesis test? Minimize 2 2 RSS(h1) = # di = # [ - (2 + h1xi)] d. E(Y|x) = 2 + !1x 2 Var(Y|x) = " to get "ˆ1 .

Note: For this example, yi - (2 + h1xI) = (yi - 2) - h1xi, so fitting this model is equivalent to fitting the model ! e. E(Y|x) = !0 + !0x E(Y|x) = !1x Var(Y|x) = "2 Var(Y|x) = "2

to the transformed data

(x1, y1 - 2), (x2, y2 - 2), … , (xn, yn - 2)

5 6

Exercise: Try finding the least squares fit for the Example 2: Submodel submodel

E(Y|x) = !0 E(Y|x) = !1x

2 2 Var(Y|x) = " Var(Y|x) = "

("Regression through the origin") Minimize RSS(h ) = d 2 = (y - h )2 0 # # i 0 You should get a different formula for "ˆ 1 from that

obtained by setting "ˆ 0 = 0 in the formula for the least [picture!] squares fit for the full model. Details:

Result: h0 = y -- the same as the univariate estimate.

ˆ Note: This is also the same as setting "1 = 0 in the least squares fit for the full model.

Caution: The result is not always this nice, as the exercise below shows.

! !

!

! 7 8

Generalizing: If we fit a submodel by Least Squares, General Properties: (Stated without proof; true for we can define the residual sum of squares for the multiple regression as well as simple regression) submodel: 2 i. RSSsub is a multiple of a $ distribution, with ˆ 2 eˆ 2 RSSsub = #(yi - y i,sub ) = # i,sub ii. degrees of freedom dfsub = n - (# of coefficients estimated), and

where ! RSSsub ! 2 2 iii. "ˆ sub = is an estimate of " for the ˆ ˆ dfsub y i,sub = E sub(Y|x) submodel. is the fitted value for the submodel and This will allow us to do hypothesis tests comparing a eˆ yˆ ! i,sub = yi - i,sub submodel with the full model.

Example: For the submodel ! ! E(Y|x) = !0 Var(Y|x) = "2,

yˆ i,sub = y for each i, so

2 RSSsub = #(yi - y ) = SYY = (n-1) sY,

! where sY is the sample standard deviation of Y.

! ! !

! ! 9 10

This hypothesis test uses the t-statistic Another Perspective: SXY "ˆ SXX We want a way to help decide whether the full model 1 t = s.e.("ˆ ) = "ˆ ~ t(n-2), is significantly better than the full model. 1 SXX

ˆ RSSsub - RSSfull can be considered a measure of how where here "ˆ = " full is the estimate of " for the full much better the full model is than the submodel. model.

(Why is this difference always " 0?). Note that

2 But RSSsub - RSSfull depends on the scale of the data, (SXY ) 2 (SXX) 2 (SXY) RSS " RSS 2 2 2 sub full t = "ˆ = "ˆ (SXX) so RSS is a better measure. SXX full Recall: Example (to help build intuition): The submodel (SXY )2

! RSSfull = SYY - E(Y|x) = ! SXX 0 Var(Y|x) = "2 RSS = SYY (in this particular example) sub

Testing this model against the full model is Thus equivalent to performing a hypothesis test with (SXY )2

RSSsub - RSSfull = . NH: ! = 0 SXX 1 so AH: ! ! 0. 1

! !

!

!

!

!

!

! 11 12

RSSsub " RSSfull F Distributions 2 t = #ˆ 2 Recall: A t(k) random variable has the distribution of

RSSsub " RSS full a random variable of the form = RSS (n " 2) full where RSS " RSS sub full = RSS ÷ (n-2), ! full Thus which is just a constant times our earlier measure t(k)2 ~

RSSsub " RSS full Also, ! of how much better the full model is Z2 ~ RSS full

than the submodel. Definition: An F-distribution F(!1, !2) with !1 degrees of freedom in the numerator and !2 degrees ! of freedom in the denominator is the distribution of a random variable of the form

W " 1 2 where W ~ $ (!1) "2 2 U ~ $ (!2) and U and W are independent.

!

! 13 14

Thus: t(k)2 ~ F(1, k); Still another look at the F-statistic t2:

i.e., the square of a t(k) random variable is an F(1,k) RSSsub " RSS full random variable – so any two-sided t-test could also F = RSS (n " 2) be formulated as an F-test. full

RSS " RSS df " df In particular, the hypothesis test with hypotheses ( sub full ) ( sub full ) = RSS df , full full ! NH: !1 = 0 since dfsub – dffull = (n – 1) – (n – 2) = 1. AH: !1 ! 0 could be done using the F-statistic t2 instead of the t- i.e., F is the ratio of: statistic . Numerator: Recall that in this example, the difference between the residual sum of squares for the submodel and the RSS for the full model RSSsub " RSS full 2 t = RSS ÷ (n-2), full Denominator: the residual sum of squares for the full model which we have seen does make sense as a measure of whether the full model (corresponding to AH) is ! But: better than the submodel (corresponding to NH). with each divided by its degrees of freedom to "weight" them appropriately to get a tractable Example: Forbes data. distribution.

! 15 16

RSSsub " RSS full Rewriting the F-statistic, This is also just a constant times , RSS full RSS RSS df df which is a reasonable measure of how much better ( sub " full ) ( sub " full )

the full model is than the submodel in fitting the data. RSSfull df full

! This generalizes: Whenever we have a submodel (in # &# & RSSsub " RSS full df full multiple linear regression as well as simple linear % (% ( = % RSS (% df " df ( regression), $ full '$ fsub full '

2 RSS " RSS ˆ 2 sub full a. RSSsub (hence " sub) will be a constant times a $ is just a constant multiple of , which RSS full distribution, with degrees of freedom dfsub, which we ! is a reasonable measure of how much better the full then also refer to as the degrees of freedom of RSSsub 2 model is than the submodel in fitting the data. and of "ˆ sub.

! Thus we can use this F statistic for the hypothesis test (RSSsub " RSSfull ) (dfsub " df full ) b. 2 #ˆ full NH: Submodel

AH: Full model (RSSsub " RSSfull ) (dfsub " df full ) = RSSfull df full

~ F(dfsub - dffull, dffull).

!

!

! !

!