1 2
SUBMODELS (NESTED MODELS) Examples: AND
ANALYSIS OF VARIANCE a. NH: !1 = 1
OF REGRESSION MODELS AH: !1 ! 1
Data: (x1, y1), (x2, y2), … , (xn, yn) What submodel does NH correspond to?
Assumptions:
• Linear mean function
• Constant conditional variance How many parameters does it have? • Independence of observations
• Normality of conditional distributions AH corresponds to the full model (with three
parameters, including ! ). Our model has 3 parameters: 1
E(Y|x) = ! + ! x (Two parameters: ! and ! ) 0 1 0 1 b. NH: ! = 0 Var(Y|x) = "2 (One parameter: ") 1 AH: ! ! 0 1
We will call this the full model. AH <-> full model.
Many hypothesis tests on coefficients can be NH <-> ? reformulated as test of the full model against a submodel : a special case of the full model obtained by specifying certain of the parameters or certain Number of parameters? relationships between parameters.
3 4 c. NH: !0 = 0 As with the full model, we can "fit" a submodel using
AH: !0 ! 0 least squares:
AH <-> full model. Example 1: Submodel:
NH <->? E(Y|x) = 2 + !1x
Var(Y|x) = "2, Number of parameters?
Consider lines y = 2 + h1x We can go the other way: [picture!] Examples: For the submodel given, what is the null hypothesis of the corresponding hypothesis test? Minimize 2 2 RSS(h1) = # di = # [yi - (2 + h1xi)] d. E(Y|x) = 2 + !1x 2 Var(Y|x) = " to get "ˆ1 .
Note: For this example, yi - (2 + h1xI) = (yi - 2) - h1xi, so fitting this model is equivalent to fitting the model ! e. E(Y|x) = !0 + !0x E(Y|x) = !1x Var(Y|x) = "2 Var(Y|x) = "2
to the transformed data
(x1, y1 - 2), (x2, y2 - 2), … , (xn, yn - 2)
5 6
Exercise: Try finding the least squares fit for the Example 2: Submodel submodel
E(Y|x) = !0 E(Y|x) = !1x
2 2 Var(Y|x) = " Var(Y|x) = "
("Regression through the origin") Minimize RSS(h ) = d 2 = (y - h )2 0 # i # i 0 You should get a different formula for "ˆ 1 from that
obtained by setting "ˆ 0 = 0 in the formula for the least [picture!] squares fit for the full model. Details:
Result: h0 = y -- the same as the univariate estimate.
ˆ Note: This is also the same as setting "1 = 0 in the least squares fit for the full model.
Caution: The result is not always this nice, as the exercise below shows.
! !
!
! 7 8
Generalizing: If we fit a submodel by Least Squares, General Properties: (Stated without proof; true for we can define the residual sum of squares for the multiple regression as well as simple regression) submodel: 2 i. RSSsub is a multiple of a $ distribution, with ˆ 2 eˆ 2 RSSsub = #(yi - y i,sub ) = # i,sub ii. degrees of freedom dfsub = n - (# of coefficients estimated), and
where ! RSSsub ! 2 2 iii. "ˆ sub = is an estimate of " for the ˆ ˆ dfsub y i,sub = E sub(Y|x) submodel. is the fitted value for the submodel and This will allow us to do hypothesis tests comparing a eˆ yˆ ! i,sub = yi - i,sub submodel with the full model.
Example: For the submodel ! ! E(Y|x) = !0 Var(Y|x) = "2,
yˆ i,sub = y for each i, so
2 RSSsub = #(yi - y ) = SYY = (n-1) sY,
! where sY is the sample standard deviation of Y.
! ! !
! ! 9 10
This hypothesis test uses the t-statistic Another Perspective: SXY "ˆ SXX We want a way to help decide whether the full model 1 t = s.e.("ˆ ) = "ˆ ~ t(n-2), is significantly better than the full model. 1 SXX
ˆ RSSsub - RSSfull can be considered a measure of how where here "ˆ = " full is the estimate of " for the full much better the full model is than the submodel. model.
(Why is this difference always " 0?). Note that
2 But RSSsub - RSSfull depends on the scale of the data, (SXY ) 2 (SXX) 2 (SXY) RSS " RSS 2 2 2 sub full t = "ˆ = "ˆ (SXX) so RSS is a better measure. SXX full Recall: Example (to help build intuition): The submodel (SXY )2
! RSSfull = SYY - E(Y|x) = ! SXX 0 Var(Y|x) = "2 RSS = SYY (in this particular example) sub
Testing this model against the full model is Thus equivalent to performing a hypothesis test with (SXY )2
RSSsub - RSSfull = . NH: ! = 0 SXX 1 so AH: ! ! 0. 1
! !
!
!
!
!
!
! 11 12
RSSsub " RSSfull F Distributions 2 t = #ˆ 2 Recall: A t(k) random variable has the distribution of
RSSsub " RSS full a random variable of the form = RSS (n " 2) full where RSS " RSS sub full = RSS ÷ (n-2), ! full Thus which is just a constant times our earlier measure t(k)2 ~
RSSsub " RSS full Also, ! of how much better the full model is Z2 ~ RSS full
than the submodel. Definition: An F-distribution F(!1, !2) with !1 degrees of freedom in the numerator and !2 degrees ! of freedom in the denominator is the distribution of a random variable of the form
W " 1 2 U where W ~ $ (!1) "2 2 U ~ $ (!2) and U and W are independent.
!
! 13 14
Thus: t(k)2 ~ F(1, k); Still another look at the F-statistic t2:
i.e., the square of a t(k) random variable is an F(1,k) RSSsub " RSS full random variable – so any two-sided t-test could also F = RSS (n " 2) be formulated as an F-test. full
RSS " RSS df " df In particular, the hypothesis test with hypotheses ( sub full ) ( sub full ) = RSS df , full full ! NH: !1 = 0 since dfsub – dffull = (n – 1) – (n – 2) = 1. AH: !1 ! 0 could be done using the F-statistic t2 instead of the t- i.e., F is the ratio of: statistic . Numerator: Recall that in this example, the difference between the residual sum of squares for the submodel and the RSS for the full model RSSsub " RSS full 2 t = RSS ÷ (n-2), full Denominator: the residual sum of squares for the full model which we have seen does make sense as a measure of whether the full model (corresponding to AH) is ! But: better than the submodel (corresponding to NH). with each divided by its degrees of freedom to "weight" them appropriately to get a tractable Example: Forbes data. distribution.
! 15 16
RSSsub " RSS full Rewriting the F-statistic, This is also just a constant times , RSS full RSS RSS df df which is a reasonable measure of how much better ( sub " full ) ( sub " full )
the full model is than the submodel in fitting the data. RSSfull df full
! This generalizes: Whenever we have a submodel (in # & RSSsub " RSS full df full multiple linear regression as well as simple linear % (% ( = % RSS (% df " df ( regression), $ full '$ fsub full '
2 RSS " RSS ˆ 2 sub full a. RSSsub (hence " sub) will be a constant times a $ is just a constant multiple of , which RSS full distribution, with degrees of freedom dfsub, which we ! is a reasonable measure of how much better the full then also refer to as the degrees of freedom of RSSsub 2 model is than the submodel in fitting the data. and of "ˆ sub.
! Thus we can use this F statistic for the hypothesis test (RSSsub " RSSfull ) (dfsub " df full ) b. 2 #ˆ full NH: Submodel
AH: Full model (RSSsub " RSSfull ) (dfsub " df full ) = RSSfull df full
~ F(dfsub - dffull, dffull).
!
!
! !
!