Chapter 2 Simple Linear Regression Analysis
The simple linear regression model
We consider the modelling of the relationship between a dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is termed a simple linear regression model. When there is more than one independent variable in the model, the linear model is termed a multiple linear regression model.
The linear model
Consider the simple linear regression model
$$y = \beta_0 + \beta_1 X + \varepsilon$$
where $y$ is termed the dependent or study variable and $X$ is termed the independent or explanatory variable. The terms $\beta_0$ and $\beta_1$ are the parameters of the model: $\beta_0$ is the intercept term and $\beta_1$ is the slope parameter. These parameters are usually called regression coefficients. The unobservable error component $\varepsilon$ accounts for the failure of the data to lie on a straight line and represents the difference between the true and observed realizations of $y$. There can be several reasons for such a difference, e.g., the effect of all the variables deleted from the model, variables that may be qualitative, inherent randomness in the observations, etc. We assume that the errors $\varepsilon$ are independent and identically distributed random variables with mean zero and constant variance $\sigma^2$. Later, we will additionally assume that $\varepsilon$ is normally distributed.
The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas $y$ is viewed as a random variable with
$$E(y) = \beta_0 + \beta_1 X \quad \text{and} \quad Var(y) = \sigma^2.$$
Sometimes $X$ can also be a random variable. In such a case, instead of the sample mean and sample variance of $y$, we consider the conditional mean of $y$ given $X = x$ as
$$E(y|x) = \beta_0 + \beta_1 x$$
and the conditional variance of $y$ given $X = x$ as
$$Var(y|x) = \sigma^2.$$
When the values of $\beta_0$, $\beta_1$ and $\sigma^2$ are known, the model is completely described. The parameters $\beta_0$, $\beta_1$ and $\sigma^2$ are generally unknown in practice, and $\varepsilon$ is unobserved. The determination of the statistical model $y = \beta_0 + \beta_1 X + \varepsilon$ depends on the determination (i.e., estimation) of $\beta_0$, $\beta_1$ and $\sigma^2$. In order to know the values of these parameters, $n$ pairs of observations $(x_i, y_i)$ $(i = 1, \ldots, n)$ on $(X, y)$ are observed/collected and are used to determine these unknown parameters.
Various methods of estimation can be used to determine the estimates of the parameters. Among them, the method of least squares and the method of maximum likelihood are the most popular.
Least squares estimation
Suppose a sample of $n$ pairs of observations $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ is available. These observations are assumed to satisfy the simple linear regression model, so we can write
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n).$$
The principle of least squares estimates the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of squares of the differences between the observations and the line in the scatter diagram. This idea can be viewed from different perspectives. When the vertical differences between the observations and the line in the scatter diagram are considered, and their sum of squares is minimized to obtain the estimates of $\beta_0$ and $\beta_1$, the method is known as direct regression.

[Figure: Direct regression — vertical deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]
Alternatively, the sum of squares of the differences between the observations and the line in the horizontal direction in the scatter diagram can be minimized to obtain the estimates of $\beta_0$ and $\beta_1$. This is known as the reverse (or inverse) regression method.
[Figure: Reverse regression method — horizontal deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]
Instead of horizontal or vertical errors, if the sum of squares of the perpendicular distances between the observations and the line in the scatter diagram is minimized to obtain the estimates of $\beta_0$ and $\beta_1$, the method is known as orthogonal regression or the major axis regression method.
[Figure: Major axis regression method — perpendicular distances of the observations from the line $Y = \beta_0 + \beta_1 X$.]
Instead of minimizing the distance, the area can also be minimized. The reduced major axis regression method minimizes the sum of the areas of rectangles defined between the observed data points and the nearest point on the line in the scatter diagram to obtain the estimates of regression coefficients. This is shown in the following figure:
[Figure: Reduced major axis method — areas of rectangles between the observations $(x_i, y_i)$ and the corresponding points $(X_i, Y_i)$ on the line $Y = \beta_0 + \beta_1 X$.]
The method of least absolute deviation regression considers the sum of the absolute deviations of the observations from the line in the vertical direction in the scatter diagram, as in the case of direct regression, to obtain the estimates of $\beta_0$ and $\beta_1$.
No assumption about the form of the probability distribution of $\varepsilon_i$ is required in deriving the least squares estimates. For the purpose of deriving statistical inferences only, we assume that the $\varepsilon_i$'s are random variables with $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$ $(i, j = 1, 2, \ldots, n)$. This assumption is needed to find the mean, variance and other properties of the least-squares estimates. The assumption that the $\varepsilon_i$'s are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals for the parameters.
Based on these approaches, different estimates of $\beta_0$ and $\beta_1$ are obtained which have different statistical properties. Among them, the direct regression approach is the most popular. Generally, the direct regression estimates are referred to as the least-squares estimates or ordinary least squares estimates.
Direct regression method
This method is also known as ordinary least squares estimation. Assume that a set of $n$ paired observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$ is available which satisfy the linear regression model $y = \beta_0 + \beta_1 X + \varepsilon$. So we can write the model for each observation as
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n).$$
The direct regression approach minimizes the sum of squares
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2$$
with respect to $\beta_0$ and $\beta_1$.
The partial derivative of $S(\beta_0, \beta_1)$ with respect to $\beta_0$ is
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = -2\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)$$
and the partial derivative of $S(\beta_0, \beta_1)$ with respect to $\beta_1$ is
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = -2\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)x_i.$$
The solutions for $\beta_0$ and $\beta_1$ are obtained by setting
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = 0, \qquad \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = 0.$$
The solutions of these two equations are called the direct regression estimators, usually called the ordinary least squares (OLS) estimators of $\beta_0$ and $\beta_1$.
This gives the ordinary least squares estimates $b_0$ of $\beta_0$ and $b_1$ of $\beta_1$ as
$$b_0 = \bar{y} - b_1\bar{x}, \qquad b_1 = \frac{s_{xy}}{s_{xx}}$$
where
$$s_{xy} = \sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}), \quad s_{xx} = \sum_{i=1}^{n}(x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n}x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i.$$
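As a quick numerical illustration, the following minimal Python sketch (NumPy assumed; the data arrays are hypothetical) computes $b_0$ and $b_1$ directly from these formulas.

```python
import numpy as np

# Hypothetical sample data (x_i, y_i), i = 1, ..., n
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

xbar, ybar = x.mean(), y.mean()
sxy = np.sum((x - xbar) * (y - ybar))   # s_xy
sxx = np.sum((x - xbar) ** 2)           # s_xx

b1 = sxy / sxx          # slope estimate
b0 = ybar - b1 * xbar   # intercept estimate
print(b0, b1)
```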
Further, we have
$$\frac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0^2} = 2n, \qquad \frac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_1^2} = 2\sum_{i=1}^{n}x_i^2, \qquad \frac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0\,\partial \beta_1} = 2\sum_{i=1}^{n}x_i = 2n\bar{x}.$$
The Hessian matrix, which is the matrix of second-order partial derivatives, in this case is given as
$$H^* = \begin{pmatrix} \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0^2} & \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0\,\partial \beta_1} \\[2mm] \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0\,\partial \beta_1} & \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_1^2} \end{pmatrix} = 2\begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum_{i=1}^{n}x_i^2 \end{pmatrix} = 2\begin{pmatrix} \ell'\ell & \ell'x \\ x'\ell & x'x \end{pmatrix}$$
where $\ell = (1, 1, \ldots, 1)'$ is an $n$-vector with all elements unity and $x = (x_1, \ldots, x_n)'$ is the $n$-vector of observations on $X$. The matrix $H^*$ is positive definite if its determinant and the element in the first row and column of $H^*$ are positive. The determinant of $H^*$ is given by
$$|H^*| = 4\left(n\sum_{i=1}^{n}x_i^2 - n^2\bar{x}^2\right) = 4n\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0.$$
The case $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 0$ is not interesting because then all the observations are identical, i.e., $x_i = c$ (some constant); in such a case there is no relationship between $x$ and $y$ in the context of regression analysis. Since $\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0$, therefore $|H^*| > 0$. So $H^*$ is positive definite for any $(\beta_0, \beta_1)$; therefore, $S(\beta_0, \beta_1)$ has a global minimum at $(b_0, b_1)$.
The fitted line or the fitted linear regression model is
$$\hat{y} = b_0 + b_1 x,$$
and the predicted values are
$$\hat{y}_i = b_0 + b_1 x_i \quad (i = 1, 2, \ldots, n).$$
The difference between the observed value $y_i$ and the fitted (or predicted) value $\hat{y}_i$ is called a residual. The $i$-th residual is defined as
$$e_i = y_i - \hat{y}_i = y_i - (b_0 + b_1 x_i) \quad (i = 1, 2, \ldots, n).$$
Properties of the direct regression estimators:

Unbiased property:
Note that $b_1 = s_{xy}/s_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$ are linear combinations of the $y_i$'s $(i = 1, \ldots, n)$. Therefore
$$b_1 = \sum_{i=1}^{n}k_i y_i$$
where $k_i = (x_i - \bar{x})/s_{xx}$. Note that $\sum_{i=1}^{n}k_i = 0$ and $\sum_{i=1}^{n}k_i x_i = 1$, so
$$E(b_1) = \sum_{i=1}^{n}k_i E(y_i) = \sum_{i=1}^{n}k_i(\beta_0 + \beta_1 x_i) = \beta_1.$$
Thus $b_1$ is an unbiased estimator of $\beta_1$. Next,
$$E(b_0) = E(\bar{y} - b_1\bar{x}) = E(\bar{y}) - \bar{x}\,E(b_1) = \beta_0 + \beta_1\bar{x} - \beta_1\bar{x} = \beta_0.$$
Thus $b_0$ is an unbiased estimator of $\beta_0$.
Variances:
Using the assumption that the $y_i$'s are independently distributed, the variance of $b_1$ is
$$Var(b_1) = \sum_{i=1}^{n}k_i^2\,Var(y_i) + \sum_{i \neq j}k_i k_j\,Cov(y_i, y_j) = \sigma^2\,\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{s_{xx}^2} = \frac{\sigma^2}{s_{xx}},$$
since $Cov(y_i, y_j) = 0$ because $y_1, \ldots, y_n$ are independent.
The variance of $b_0$ is
$$Var(b_0) = Var(\bar{y}) + \bar{x}^2\,Var(b_1) - 2\bar{x}\,Cov(\bar{y}, b_1).$$
First, we find that
$$Cov(\bar{y}, b_1) = Cov\left(\frac{1}{n}\sum_{i=1}^{n}y_i,\; \sum_{j=1}^{n}k_j y_j\right) = \frac{\sigma^2}{n}\sum_{i=1}^{n}k_i = 0,$$
since the $y_i$'s are independent and $\sum_{i=1}^{n}k_i = 0$. So
$$Var(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right).$$
Covariance:
The covariance between $b_0$ and $b_1$ is
$$Cov(b_0, b_1) = Cov(\bar{y}, b_1) - \bar{x}\,Var(b_1) = -\frac{\bar{x}}{s_{xx}}\,\sigma^2.$$
It can further be shown that the ordinary least squares estimators $b_0$ and $b_1$ possess the minimum variance in the class of linear and unbiased estimators, so they are termed the best linear unbiased estimators (BLUE). This property is known as the Gauss-Markov theorem, which is discussed later in the multiple linear regression model.
Residual sum of squares:
The residual sum of squares is given as
$$\begin{aligned}
SS_{res} &= \sum_{i=1}^{n}e_i^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2 \\
&= \sum_{i=1}^{n}\left(y_i - \bar{y} + b_1\bar{x} - b_1 x_i\right)^2 = \sum_{i=1}^{n}\left[(y_i - \bar{y}) - b_1(x_i - \bar{x})\right]^2 \\
&= \sum_{i=1}^{n}(y_i - \bar{y})^2 + b_1^2\sum_{i=1}^{n}(x_i - \bar{x})^2 - 2b_1\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y}) \\
&= s_{yy} + b_1^2 s_{xx} - 2b_1^2 s_{xx} = s_{yy} - b_1^2 s_{xx} = s_{yy} - \frac{s_{xy}^2}{s_{xx}} = s_{yy} - b_1 s_{xy}
\end{aligned}$$
where
$$s_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n}y_i.$$
Estimation of $\sigma^2$
The estimator of $\sigma^2$ is obtained from the residual sum of squares as follows. Assuming that $y_i$ is normally distributed, it follows that $SS_{res}$ has a $\chi^2$ distribution with $(n-2)$ degrees of freedom, i.e.,
$$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}.$$
Thus, using the result about the expectation of a chi-square random variable, we have
$$E(SS_{res}) = (n-2)\sigma^2.$$
Thus an unbiased estimator of $\sigma^2$ is
$$s^2 = \frac{SS_{res}}{n-2}.$$
Note that $SS_{res}$ has only $(n-2)$ degrees of freedom. Two degrees of freedom are lost due to the estimation of $b_0$ and $b_1$. Since $s^2$ depends on the estimates $b_0$ and $b_1$, it is a model-dependent estimate of $\sigma^2$.
Estimates of variances of $b_0$ and $b_1$:
The estimators of the variances of $b_0$ and $b_1$ are obtained by replacing $\sigma^2$ by its estimate $\hat{\sigma}^2 = s^2$ as follows:
$$\widehat{Var}(b_0) = s^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)$$
and
$$\widehat{Var}(b_1) = \frac{s^2}{s_{xx}}.$$
It is observed that since $\sum_{i=1}^{n}(y_i - \hat{y}_i) = 0$, we have $\sum_{i=1}^{n}e_i = 0$. In light of this property, $e_i$ can be regarded as an estimate of the unknown $\varepsilon_i$ $(i = 1, \ldots, n)$. This helps in verifying the different model assumptions on the basis of the given sample $(x_i, y_i)$, $i = 1, 2, \ldots, n$.
Further, note that
(i) $\sum_{i=1}^{n}x_i e_i = 0$,
(ii) $\sum_{i=1}^{n}\hat{y}_i e_i = 0$,
(iii) $\sum_{i=1}^{n}y_i = \sum_{i=1}^{n}\hat{y}_i$, and
(iv) the fitted line always passes through $(\bar{x}, \bar{y})$.
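These residual identities are easy to check numerically. The following Python sketch (a continuation of the earlier hypothetical example; NumPy assumed) verifies them up to floating-point error.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

yhat = b0 + b1 * x   # fitted values
e = y - yhat         # residuals

print(np.isclose(e.sum(), 0))                    # sum of residuals is 0
print(np.isclose((x * e).sum(), 0))              # (i)  sum of x_i e_i is 0
print(np.isclose((yhat * e).sum(), 0))           # (ii) sum of yhat_i e_i is 0
print(np.isclose(y.sum(), yhat.sum()))           # (iii) sums of y and yhat agree
print(np.isclose(b0 + b1 * x.mean(), y.mean()))  # (iv) line passes through (xbar, ybar)
```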
Centered model:
Sometimes it is useful to measure the independent variable around its mean. In such a case, the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ has the centered version
$$y_i = \beta_0 + \beta_1(x_i - \bar{x}) + \beta_1\bar{x} + \varepsilon_i = \beta_0^* + \beta_1(x_i - \bar{x}) + \varepsilon_i \quad (i = 1, 2, \ldots, n)$$
where $\beta_0^* = \beta_0 + \beta_1\bar{x}$. The sum of squares due to error is given by
$$S(\beta_0^*, \beta_1) = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}\left[y_i - \beta_0^* - \beta_1(x_i - \bar{x})\right]^2.$$
Now solving
$$\frac{\partial S(\beta_0^*, \beta_1)}{\partial \beta_0^*} = 0, \qquad \frac{\partial S(\beta_0^*, \beta_1)}{\partial \beta_1} = 0,$$
we get the direct regression least squares estimates of $\beta_0^*$ and $\beta_1$ as
$$b_0^* = \bar{y}$$
and
$$b_1 = \frac{s_{xy}}{s_{xx}},$$
respectively.
Thus the form of the estimate of the slope parameter $\beta_1$ remains the same in the usual and centered models, whereas the form of the estimate of the intercept term changes between the two models.

Further, the Hessian matrix of the second-order partial derivatives of $S(\beta_0^*, \beta_1)$ with respect to $\beta_0^*$ and $\beta_1$ is positive definite at $\beta_0^* = b_0^*$ and $\beta_1 = b_1$, which ensures that $S(\beta_0^*, \beta_1)$ is minimized at $\beta_0^* = b_0^*$ and $\beta_1 = b_1$.

Under the assumption that $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j = 1, 2, \ldots, n$, it follows that
$$E(b_0^*) = \beta_0^*, \qquad E(b_1) = \beta_1,$$
$$Var(b_0^*) = \frac{\sigma^2}{n}, \qquad Var(b_1) = \frac{\sigma^2}{s_{xx}}.$$
In this case, the fitted model of $y_i = \beta_0^* + \beta_1(x_i - \bar{x}) + \varepsilon_i$ is
$$\hat{y} = \bar{y} + b_1(x - \bar{x}),$$
and the predicted values are
$$\hat{y}_i = \bar{y} + b_1(x_i - \bar{x}) \quad (i = 1, \ldots, n).$$
Note that in the centered model
$$Cov(b_0^*, b_1) = 0.$$
No intercept term model:
Sometimes in practice, a model without an intercept term is used in situations where $x_i = 0$ implies $y_i = 0$ for all $i = 1, 2, \ldots, n$. A no-intercept model is
$$y_i = \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n).$$
For example, in analyzing the relationship between the velocity $(y)$ of a car and its acceleration $(X)$, the velocity is zero when the acceleration is zero.

Using the data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the direct regression least-squares estimate of $\beta_1$ is obtained by minimizing
$$S(\beta_1) = \sum_{i=1}^{n}\varepsilon_i^2 = \sum_{i=1}^{n}(y_i - \beta_1 x_i)^2,$$
and solving
$$\frac{dS(\beta_1)}{d\beta_1} = 0$$
gives the estimator of $\beta_1$ as
$$b_1^* = \frac{\sum_{i=1}^{n}x_i y_i}{\sum_{i=1}^{n}x_i^2}.$$
The second-order derivative of $S(\beta_1)$ with respect to $\beta_1$ at $\beta_1 = b_1^*$ is positive, which ensures that $b_1^*$ minimizes $S(\beta_1)$.
Using the assumption that $E(\varepsilon_i) = 0$, $Var(\varepsilon_i) = \sigma^2$ and $Cov(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j = 1, 2, \ldots, n$, the properties of $b_1^*$ can be derived as follows:
$$E(b_1^*) = \frac{\sum_{i=1}^{n}x_i E(y_i)}{\sum_{i=1}^{n}x_i^2} = \frac{\beta_1\sum_{i=1}^{n}x_i^2}{\sum_{i=1}^{n}x_i^2} = \beta_1.$$
Thus $b_1^*$ is an unbiased estimator of $\beta_1$. The variance of $b_1^*$ is obtained as follows:
$$Var(b_1^*) = \frac{\sum_{i=1}^{n}x_i^2\,Var(y_i)}{\left(\sum_{i=1}^{n}x_i^2\right)^2} = \frac{\sigma^2\sum_{i=1}^{n}x_i^2}{\left(\sum_{i=1}^{n}x_i^2\right)^2} = \frac{\sigma^2}{\sum_{i=1}^{n}x_i^2},$$
and an unbiased estimator of $\sigma^2$ is obtained as
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n}y_i^2 - b_1^*\sum_{i=1}^{n}x_i y_i}{n-1}.$$
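A minimal numerical sketch of the no-intercept fit (Python with NumPy; the data are hypothetical) follows directly from these formulas.

```python
import numpy as np

# Hypothetical data assumed to pass through the origin
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.3, 2.9, 4.2])

b1_star = np.sum(x * y) / np.sum(x ** 2)   # slope of the line through the origin
sigma2_hat = (np.sum(y ** 2) - b1_star * np.sum(x * y)) / (len(x) - 1)
print(b1_star, sigma2_hat)
```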
Maximum likelihood estimation
We assume that the $\varepsilon_i$'s $(i = 1, 2, \ldots, n)$ are independent and identically distributed following a normal distribution $N(0, \sigma^2)$. Now we use the method of maximum likelihood to estimate the parameters of the linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n),$$
so the observations $y_i$ $(i = 1, 2, \ldots, n)$ are independently distributed as $N(\beta_0 + \beta_1 x_i, \sigma^2)$ for all $i = 1, 2, \ldots, n$.

The likelihood function of the given observations $(x_i, y_i)$ and unknown parameters $\beta_0, \beta_1$ and $\sigma^2$ is
$$L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n}\left(\frac{1}{2\pi\sigma^2}\right)^{1/2}\exp\left[-\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2\right].$$
The maximum likelihood estimates of $\beta_0, \beta_1$ and $\sigma^2$ can be obtained by maximizing $L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$, or equivalently $\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$, where
$$\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2.$$
The normal equations are obtained by partial differentiation of the log-likelihood with respect to $\beta_0, \beta_1$ and $\sigma^2$ and equating each derivative to zero:
$$\frac{\partial \ln L}{\partial \beta_0} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i) = 0,$$
$$\frac{\partial \ln L}{\partial \beta_1} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)x_i = 0$$
and
$$\frac{\partial \ln L}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 = 0.$$
The solution of these normal equations gives the maximum likelihood estimates of $\beta_0, \beta_1$ and $\sigma^2$ as
$$\tilde{b}_0 = \bar{y} - \tilde{b}_1\bar{x},$$
$$\tilde{b}_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} = \frac{s_{xy}}{s_{xx}}$$
and
$$\tilde{s}^2 = \frac{\sum_{i=1}^{n}(y_i - \tilde{b}_0 - \tilde{b}_1 x_i)^2}{n},$$
respectively.
It can be verified that the Hessian matrix of second-order partial derivatives of $\ln L$ with respect to $\beta_0, \beta_1$ and $\sigma^2$ is negative definite at $\beta_0 = \tilde{b}_0$, $\beta_1 = \tilde{b}_1$ and $\sigma^2 = \tilde{s}^2$, which ensures that the likelihood function is maximized at these values.

Note that the least-squares and maximum likelihood estimates of $\beta_0$ and $\beta_1$ are identical. The least-squares and maximum likelihood estimates of $\sigma^2$ differ. In fact, the least-squares estimate of $\sigma^2$ is
$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2,$$
so it is related to the maximum likelihood estimate as
$$\tilde{s}^2 = \frac{n-2}{n}\,s^2.$$
Thus $\tilde{b}_0$ and $\tilde{b}_1$ are unbiased estimators of $\beta_0$ and $\beta_1$, whereas $\tilde{s}^2$ is a biased estimate of $\sigma^2$, but it is asymptotically unbiased. The variances of $\tilde{b}_0$ and $\tilde{b}_1$ are the same as those of $b_0$ and $b_1$, respectively, but $Var(\tilde{s}^2) < Var(s^2)$.
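The relation $\tilde{s}^2 = \frac{n-2}{n}s^2$ is easy to confirm numerically. A short Python sketch (hypothetical data, NumPy assumed):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

s2_ols = ss_res / (n - 2)   # unbiased (least squares) estimate of sigma^2
s2_mle = ss_res / n         # maximum likelihood estimate of sigma^2
print(np.isclose(s2_mle, (n - 2) / n * s2_ols))  # True
```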
Testing of hypotheses and confidence interval estimation for the slope parameter:
Now we consider the tests of hypotheses and confidence interval estimation for the slope parameter of the model under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.
Case 1: When $\sigma^2$ is known:
Consider the simple linear regression model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ $(i = 1, 2, \ldots, n)$. It is assumed that the $\varepsilon_i$'s are independent and identically distributed and follow $N(0, \sigma^2)$.

First, we develop a test for the null hypothesis related to the slope parameter
$$H_0: \beta_1 = \beta_{10}$$
where $\beta_{10}$ is some given constant.

Assuming $\sigma^2$ to be known, we know that $E(b_1) = \beta_1$, $Var(b_1) = \sigma^2/s_{xx}$, and $b_1$ is a linear combination of the normally distributed $y_i$'s. So
$$b_1 \sim N\left(\beta_1, \frac{\sigma^2}{s_{xx}}\right),$$
and the following statistic can be constructed:
$$Z_1 = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}},$$
which is distributed as $N(0, 1)$ when $H_0$ is true.

A decision rule to test $H_1: \beta_1 \neq \beta_{10}$ can be framed as follows:
Reject $H_0$ if $|Z_1| \geq z_{\alpha/2}$
where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the normal distribution.

Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The $100(1-\alpha)\%$ confidence interval for $\beta_1$ can be obtained using the $Z_1$ statistic as follows:
$$P\left(-z_{\alpha/2} \leq Z_1 \leq z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-z_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\sigma^2/s_{xx}}} \leq z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(b_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}} \leq \beta_1 \leq b_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}}\right) = 1 - \alpha.$$
So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\left[b_1 - z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}},\; b_1 + z_{\alpha/2}\sqrt{\frac{\sigma^2}{s_{xx}}}\right]$$
where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the $N(0, 1)$ distribution.
Case 2: When $\sigma^2$ is unknown:
When $\sigma^2$ is unknown, we proceed as follows. We know that
$$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}$$
and
$$E\left(\frac{SS_{res}}{n-2}\right) = \sigma^2.$$
Further, $SS_{res}/\sigma^2$ and $b_1$ are independently distributed. This result will be proved formally in the next module on multiple linear regression. It also follows from the result that, under the normal distribution, the maximum likelihood estimates, viz., the sample mean (estimator of the population mean) and the sample variance (estimator of the population variance), are independently distributed; so $b_1$ and $s^2$ are also independently distributed. Thus the following statistic can be constructed:
$$t_0 = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{\hat{\sigma}^2}{s_{xx}}}} = \frac{b_1 - \beta_{10}}{\sqrt{\dfrac{SS_{res}}{(n-2)s_{xx}}}},$$
which follows a $t$-distribution with $(n-2)$ degrees of freedom, denoted as $t_{n-2}$, when $H_0$ is true.
A decision rule to test $H_1: \beta_1 \neq \beta_{10}$ is to reject $H_0$ if
$$|t_0| \geq t_{n-2,\,\alpha/2}$$
where $t_{n-2,\,\alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The $100(1-\alpha)\%$ confidence interval for $\beta_1$ can be obtained using the $t_0$ statistic as follows. Consider
$$P\left(-t_{n-2,\,\alpha/2} \leq t_0 \leq t_{n-2,\,\alpha/2}\right) = 1 - \alpha$$
$$P\left(-t_{n-2,\,\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\hat{\sigma}^2/s_{xx}}} \leq t_{n-2,\,\alpha/2}\right) = 1 - \alpha$$
$$P\left(b_1 - t_{n-2,\,\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} \leq \beta_1 \leq b_1 + t_{n-2,\,\alpha/2}\sqrt{\frac{\hat{\sigma}^2}{s_{xx}}}\right) = 1 - \alpha.$$
So the $100(1-\alpha)\%$ confidence interval for $\beta_1$ is
$$\left[b_1 - t_{n-2,\,\alpha/2}\sqrt{\frac{SS_{res}}{(n-2)s_{xx}}},\; b_1 + t_{n-2,\,\alpha/2}\sqrt{\frac{SS_{res}}{(n-2)s_{xx}}}\right].$$
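A compact Python sketch of this $t$-test and confidence interval (NumPy and SciPy assumed; the data and the level `alpha = 0.05` are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])
n, alpha, beta10 = len(x), 0.05, 0.0   # H0: beta1 = 0

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

se_b1 = np.sqrt(ss_res / ((n - 2) * sxx))   # estimated standard error of b1
t0 = (b1 - beta10) / se_b1
t_crit = stats.t.ppf(1 - alpha / 2, n - 2)

print("reject H0:", abs(t0) >= t_crit)
print("CI for beta1:", (b1 - t_crit * se_b1, b1 + t_crit * se_b1))
```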
Testing of hypotheses and confidence interval estimation for the intercept term:
Now we consider the tests of hypotheses and confidence interval estimation for the intercept term under two cases, viz., when $\sigma^2$ is known and when $\sigma^2$ is unknown.
Case 1: When $\sigma^2$ is known:
Suppose the null hypothesis under consideration is
$$H_0: \beta_0 = \beta_{00},$$
where $\sigma^2$ is known. Then, using the results that $E(b_0) = \beta_0$, $Var(b_0) = \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)$ and that $b_0$ is a linear combination of normally distributed random variables, the statistic
$$Z_0 = \frac{b_0 - \beta_{00}}{\sqrt{\sigma^2\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}}\right)}}$$
has a $N(0, 1)$ distribution when $H_0$ is true.
A decision rule to test $H_1: \beta_0 \neq \beta_{00}$ can be framed as follows:
Reject $H_0$ if $|Z_0| \geq z_{\alpha/2}$
where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the normal distribution. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The $100(1-\alpha)\%$ confidence interval for $\beta_0$ when $\sigma^2$ is known can be derived using the $Z_0$ statistic as follows:
$$P\left(-z_{\alpha/2} \leq Z_0 \leq z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(-z_{\alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}} \leq z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(b_0 - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)} \leq \beta_0 \leq b_0 + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right) = 1 - \alpha.$$
So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is
$$\left[b_0 - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)},\; b_0 + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right].$$
Case 2: When $\sigma^2$ is unknown:
When $\sigma^2$ is unknown, the following statistic is constructed:
$$t_0 = \frac{b_0 - \beta_{00}}{\sqrt{\dfrac{SS_{res}}{n-2}\left(\dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}}\right)}},$$
which follows a $t$-distribution with $(n-2)$ degrees of freedom, i.e., $t_{n-2}$, when $H_0$ is true.
A decision rule to test $H_1: \beta_0 \neq \beta_{00}$ is as follows:
Reject $H_0$ whenever $|t_0| \geq t_{n-2,\,\alpha/2}$
where $t_{n-2,\,\alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The $100(1-\alpha)\%$ confidence interval for $\beta_0$ can be obtained as follows. Consider
$$P\left(-t_{n-2,\,\alpha/2} \leq t_0 \leq t_{n-2,\,\alpha/2}\right) = 1 - \alpha$$
$$P\left(-t_{n-2,\,\alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}} \leq t_{n-2,\,\alpha/2}\right) = 1 - \alpha$$
$$P\left(b_0 - t_{n-2,\,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)} \leq \beta_0 \leq b_0 + t_{n-2,\,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right) = 1 - \alpha.$$
So the $100(1-\alpha)\%$ confidence interval for $\beta_0$ is
$$\left[b_0 - t_{n-2,\,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)},\; b_0 + t_{n-2,\,\alpha/2}\sqrt{\frac{SS_{res}}{n-2}\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right)}\right].$$
Test of hypothesis for $\sigma^2$
We have considered two types of test statistics for testing hypotheses about the intercept term and the slope parameter: when $\sigma^2$ is known and when $\sigma^2$ is unknown. When dealing with the case of known $\sigma^2$, the value of $\sigma^2$ is known from some external source like past experience, long association of the experimenter with the experiment, past studies, etc. In such situations, the experimenter may like to test a hypothesis of the form $H_0: \sigma^2 = \sigma_0^2$ against $H_1: \sigma^2 \neq \sigma_0^2$, where $\sigma_0^2$ is specified. The test statistic is based on the result $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$. So the test statistic is
$$C_0 = \frac{SS_{res}}{\sigma_0^2} \sim \chi^2_{n-2} \quad \text{under } H_0.$$
The decision rule is to reject $H_0$ if $C_0 \geq \chi^2_{n-2,\,\alpha/2}$ or $C_0 \leq \chi^2_{n-2,\,1-\alpha/2}$.
Confidence interval for $\sigma^2$
A confidence interval for $\sigma^2$ can also be derived as follows. Since $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$, consider
$$P\left(\chi^2_{n-2,\,1-\alpha/2} \leq \frac{SS_{res}}{\sigma^2} \leq \chi^2_{n-2,\,\alpha/2}\right) = 1 - \alpha$$
$$P\left(\frac{SS_{res}}{\chi^2_{n-2,\,\alpha/2}} \leq \sigma^2 \leq \frac{SS_{res}}{\chi^2_{n-2,\,1-\alpha/2}}\right) = 1 - \alpha.$$
The corresponding $100(1-\alpha)\%$ confidence interval for $\sigma^2$ is
$$\left[\frac{SS_{res}}{\chi^2_{n-2,\,\alpha/2}},\; \frac{SS_{res}}{\chi^2_{n-2,\,1-\alpha/2}}\right].$$
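A small Python sketch of this interval (SciPy assumed; `ss_res` and `n` would come from a fitted model as in the earlier hypothetical examples):

```python
from scipy import stats

n, ss_res, alpha = 6, 0.45, 0.05   # hypothetical values from a fitted model

lower = ss_res / stats.chi2.ppf(1 - alpha / 2, n - 2)  # divide by upper chi2 point
upper = ss_res / stats.chi2.ppf(alpha / 2, n - 2)      # divide by lower chi2 point
print("95% CI for sigma^2:", (lower, upper))
```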
Joint confidence region for $\beta_0$ and $\beta_1$:
A joint confidence region for $\beta_0$ and $\beta_1$ can also be found. Such a region provides $100(1-\alpha)\%$ confidence that $\beta_0$ and $\beta_1$ are covered simultaneously. Consider the centered version of the linear regression model
$$y_i = \beta_0^* + \beta_1(x_i - \bar{x}) + \varepsilon_i$$
where $\beta_0^* = \beta_0 + \beta_1\bar{x}$. The least squares estimators of $\beta_0^*$ and $\beta_1$ are
$$b_0^* = \bar{y} \quad \text{and} \quad b_1 = \frac{s_{xy}}{s_{xx}},$$
respectively. Using the results
$$E(b_0^*) = \beta_0^*, \qquad E(b_1) = \beta_1, \qquad Var(b_0^*) = \frac{\sigma^2}{n}, \qquad Var(b_1) = \frac{\sigma^2}{s_{xx}},$$
when $\sigma^2$ is known, the statistics
$$\frac{b_0^* - \beta_0^*}{\sqrt{\dfrac{\sigma^2}{n}}} \sim N(0, 1) \quad \text{and} \quad \frac{b_1 - \beta_1}{\sqrt{\dfrac{\sigma^2}{s_{xx}}}} \sim N(0, 1).$$
Moreover, both statistics are independently distributed. Thus
$$\left(\frac{b_0^* - \beta_0^*}{\sqrt{\sigma^2/n}}\right)^2 \sim \chi^2_1 \quad \text{and} \quad \left(\frac{b_1 - \beta_1}{\sqrt{\sigma^2/s_{xx}}}\right)^2 \sim \chi^2_1$$
are also independently distributed, because $b_0^*$ and $b_1$ are independently distributed. Consequently, their sum
$$\frac{n(b_0^* - \beta_0^*)^2}{\sigma^2} + \frac{s_{xx}(b_1 - \beta_1)^2}{\sigma^2} \sim \chi^2_2.$$
Since
$$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}$$
and $SS_{res}$ is distributed independently of $b_0^*$ and $b_1$, the ratio
$$\frac{\left[n(b_0^* - \beta_0^*)^2 + s_{xx}(b_1 - \beta_1)^2\right]/(2\sigma^2)}{SS_{res}/\left[(n-2)\sigma^2\right]} \sim F_{2,\,n-2}.$$
Substituting $b_0^* = b_0 + b_1\bar{x}$ and $\beta_0^* = \beta_0 + \beta_1\bar{x}$, we get
$$\frac{n-2}{2}\,\frac{Q_f}{SS_{res}} \sim F_{2,\,n-2}$$
where
$$Q_f = n(b_0 - \beta_0)^2 + 2\left(\sum_{i=1}^{n}x_i\right)(b_0 - \beta_0)(b_1 - \beta_1) + \left(\sum_{i=1}^{n}x_i^2\right)(b_1 - \beta_1)^2.$$
Since
$$P\left[\frac{n-2}{2}\,\frac{Q_f}{SS_{res}} \leq F_{2,\,n-2;\,1-\alpha}\right] = 1 - \alpha$$
holds for all values of $\beta_0$ and $\beta_1$, the $100(1-\alpha)\%$ confidence region for $\beta_0$ and $\beta_1$ is
$$\frac{n-2}{2}\,\frac{Q_f}{SS_{res}} \leq F_{2,\,n-2;\,1-\alpha}.$$
This confidence region is an ellipse which gives $100(1-\alpha)\%$ confidence that $\beta_0$ and $\beta_1$ are contained simultaneously in it.
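A numerical sketch checking whether a candidate pair $(\beta_0, \beta_1)$ lies inside this elliptical region (Python, NumPy/SciPy assumed; the data and the candidate values are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])
n, alpha = len(x), 0.05

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

beta0, beta1 = 0.0, 2.0   # hypothetical candidate point for (beta0, beta1)
Qf = (n * (b0 - beta0) ** 2
      + 2 * x.sum() * (b0 - beta0) * (b1 - beta1)
      + np.sum(x ** 2) * (b1 - beta1) ** 2)

inside = (n - 2) / 2 * Qf / ss_res <= stats.f.ppf(1 - alpha, 2, n - 2)
print("inside joint confidence region:", inside)
```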
Analysis of variance:
The technique of analysis of variance is usually used for testing hypotheses related to the equality of more than one parameter, such as population means or slope parameters. It is more meaningful in the case of the multiple regression model, where there is more than one slope parameter. The technique is discussed and illustrated here to introduce the related basic concepts and fundamentals, which will be used in developing the analysis of variance in the next module on the multiple linear regression model, where there is more than one explanatory variable.
A test statistic for testing $H_0: \beta_1 = 0$ can also be formulated using the analysis of variance technique as follows.
On the basis of the identity
$$y_i - \hat{y}_i = (y_i - \bar{y}) - (\hat{y}_i - \bar{y}),$$
the sum of squared residuals is
$$S(b) = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - \bar{y})^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2 - 2\sum_{i=1}^{n}(\hat{y}_i - \bar{y})(y_i - \bar{y}).$$
Further, consider
$$\sum_{i=1}^{n}(y_i - \bar{y})(\hat{y}_i - \bar{y}) = \sum_{i=1}^{n}(y_i - \bar{y})\,b_1(x_i - \bar{x}) = b_1^2\sum_{i=1}^{n}(x_i - \bar{x})^2 = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2.$$
Thus we have
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 + \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2.$$
The term $\sum_{i=1}^{n}(y_i - \bar{y})^2$ is called the sum of squares about the mean, the corrected sum of squares of $y$ (i.e., $SS_{corrected}$), the total sum of squares, or $s_{yy}$.
The term $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$ describes the deviation "observation minus predicted value", viz., the residual sum of squares
$$SS_{res} = \sum_{i=1}^{n}(y_i - \hat{y}_i)^2,$$
whereas the term $\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2$ describes the proportion of variability explained by the regression,
$$SS_{reg} = \sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2.$$
If all the observations $y_i$ lie on a straight line, then $\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = 0$ and thus
$$SS_{corrected} = SS_{reg}.$$
Note that $SS_{reg}$ is completely determined by $b_1$ and so has only one degree of freedom. The total sum of squares $s_{yy} = \sum_{i=1}^{n}(y_i - \bar{y})^2$ has $(n-1)$ degrees of freedom due to the constraint $\sum_{i=1}^{n}(y_i - \bar{y}) = 0$, and $SS_{res}$ has $(n-2)$ degrees of freedom as it depends on the estimates $b_0$ and $b_1$.
If the errors are normally distributed, then under $H_0: \beta_1 = 0$ the sums of squares $SS_{reg}$ and $SS_{res}$ are independently distributed, each as $\sigma^2\chi^2_{df}$ with its respective degrees of freedom.
The mean square due to regression is
$$MS_{reg} = \frac{SS_{reg}}{1}$$
and the mean square due to residuals is
$$MSE = \frac{SS_{res}}{n-2}.$$
The test statistic for testing $H_0: \beta_1 = 0$ is
$$F_0 = \frac{MS_{reg}}{MSE}.$$
If $H_0: \beta_1 = 0$ is true, then $MS_{reg}$ and $MSE$ are independently distributed and thus
$$F_0 \sim F_{1,\,n-2}.$$
The decision rule for $H_1: \beta_1 \neq 0$ is to reject $H_0$ if
$$F_0 > F_{1,\,n-2;\,1-\alpha}$$
at the $\alpha$ level of significance. The test procedure can be described by an analysis of variance table.
Analysis of variance for testing $H_0: \beta_1 = 0$

Source of variation | Sum of squares | Degrees of freedom | Mean square | $F$
Regression          | $SS_{reg}$     | $1$                | $MS_{reg}$  | $MS_{reg}/MSE$
Residual            | $SS_{res}$     | $n-2$              | $MSE$       |
Total               | $s_{yy}$       | $n-1$              |             |
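The following Python sketch assembles this ANOVA table and the $F$-test for $H_0: \beta_1 = 0$ (NumPy/SciPy assumed; data hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])
n = len(x)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
yhat = b0 + b1 * x

ss_reg = np.sum((yhat - y.mean()) ** 2)   # 1 degree of freedom
ss_res = np.sum((y - yhat) ** 2)          # n - 2 degrees of freedom

ms_reg, mse = ss_reg / 1, ss_res / (n - 2)
F0 = ms_reg / mse
p_value = stats.f.sf(F0, 1, n - 2)        # P(F >= F0) under H0
print(F0, p_value)
```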
Some other forms of $SS_{reg}$, $SS_{res}$ and $s_{yy}$ can be derived as follows. The sample correlation coefficient may be written as
$$r_{xy} = \frac{s_{xy}}{\sqrt{s_{xx}\,s_{yy}}}.$$
Moreover, we have
$$b_1 = \frac{s_{xy}}{s_{xx}} = r_{xy}\sqrt{\frac{s_{yy}}{s_{xx}}}.$$
The estimator of $\sigma^2$ in this case may be expressed as
$$s^2 = \frac{1}{n-2}\sum_{i=1}^{n}e_i^2 = \frac{1}{n-2}\,SS_{res}.$$
Various alternative formulations for $SS_{res}$ are in use as well:
$$SS_{res} = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2 = \sum_{i=1}^{n}\left[(y_i - \bar{y}) - b_1(x_i - \bar{x})\right]^2 = s_{yy} + b_1^2 s_{xx} - 2b_1 s_{xy} = s_{yy} - b_1^2 s_{xx} = s_{yy} - \frac{s_{xy}^2}{s_{xx}}.$$
Using this result, we find that
$$SS_{corrected} = s_{yy}$$
and
$$SS_{reg} = s_{yy} - SS_{res} = \frac{s_{xy}^2}{s_{xx}} = b_1^2 s_{xx} = b_1 s_{xy}.$$
Goodness of fit of regression
A fitted model can be said to be good when the residuals are small. Since $SS_{res}$ is based on the residuals, a measure of the quality of a fitted model can be based on $SS_{res}$. When the intercept term is present in the model, a measure of the goodness of fit of the model is given by
$$R^2 = 1 - \frac{SS_{res}}{s_{yy}} = \frac{SS_{reg}}{s_{yy}}.$$
This is known as the coefficient of determination. This measure is based on the idea that the variation in the $y$'s, as measured by $s_{yy}$, splits into a part explained by the regression, $SS_{reg}$, and an unexplained part contained in $SS_{res}$. The ratio $SS_{reg}/s_{yy}$ describes the proportion of variability that is explained by the regression in relation to the total variability of $y$, while the ratio $SS_{res}/s_{yy}$ describes the proportion of variability that is not explained by the regression. It can be seen that
$$R^2 = r_{xy}^2$$
where $r_{xy}$ is the simple correlation coefficient between $x$ and $y$. Clearly $0 \leq R^2 \leq 1$, so a value of $R^2$ closer to one indicates a better fit and a value closer to zero indicates a poor fit.
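A minimal Python check that $R^2 = r_{xy}^2$ (NumPy assumed; data hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])

sxy = np.sum((x - x.mean()) * (y - y.mean()))
sxx = np.sum((x - x.mean()) ** 2)
syy = np.sum((y - y.mean()) ** 2)

b1 = sxy / sxx
b0 = y.mean() - b1 * x.mean()
ss_res = np.sum((y - b0 - b1 * x) ** 2)

R2 = 1 - ss_res / syy                 # coefficient of determination
r_xy = sxy / np.sqrt(sxx * syy)       # sample correlation coefficient
print(R2, r_xy ** 2, np.isclose(R2, r_xy ** 2))
```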
Prediction of values of study variable
An important use of linear regression modeling is to predict the average and actual values of the study variable. Prediction of the value of the study variable corresponds to knowing the value of $E(y)$ (in the case of the average value) and the value of $y$ (in the case of the actual value) for a given value of the explanatory variable. We consider both cases.
Case 1: Prediction of average value
Under the linear regression model $y = \beta_0 + \beta_1 x + \varepsilon$, the fitted model is $\hat{y} = b_0 + b_1 x$, where $b_0$ and $b_1$ are the OLS estimators of $\beta_0$ and $\beta_1$, respectively.

Suppose we want to predict the value of $E(y)$ for a given value of $x = x_0$. Then the predictor is given by
$$\hat{\mu}_{y|x_0} = \widehat{E(y|x_0)} = b_0 + b_1 x_0.$$

Predictive bias:
The prediction error is given as
$$\hat{\mu}_{y|x_0} - E(y|x_0) = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0) = (b_0 - \beta_0) + (b_1 - \beta_1)x_0.$$
Then
$$E\left[\hat{\mu}_{y|x_0} - E(y|x_0)\right] = E(b_0 - \beta_0) + E(b_1 - \beta_1)x_0 = 0.$$
Thus the predictor $\hat{\mu}_{y|x_0}$ is an unbiased predictor of $E(y|x_0)$.

Predictive variance:
The predictive variance of $\hat{\mu}_{y|x_0}$ is
$$\begin{aligned}
PV(\hat{\mu}_{y|x_0}) &= Var(b_0 + b_1 x_0) = Var\left[\bar{y} + b_1(x_0 - \bar{x})\right] \\
&= Var(\bar{y}) + (x_0 - \bar{x})^2\,Var(b_1) + 2(x_0 - \bar{x})\,Cov(\bar{y}, b_1) \\
&= \frac{\sigma^2}{n} + \frac{(x_0 - \bar{x})^2\sigma^2}{s_{xx}} \\
&= \sigma^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right].
\end{aligned}$$

Estimate of predictive variance:
The predictive variance can be estimated by substituting $\hat{\sigma}^2 = MSE$ for $\sigma^2$:
$$\widehat{PV}(\hat{\mu}_{y|x_0}) = \hat{\sigma}^2\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right] = MSE\left[\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right].$$
Prediction interval estimation:
The $100(1-\alpha)\%$ prediction interval for $E(y|x_0)$ is obtained as follows. The predictor $\hat{\mu}_{y|x_0}$ is a linear combination of normally distributed random variables, so it is also normally distributed:
$$\hat{\mu}_{y|x_0} \sim N\left(\beta_0 + \beta_1 x_0,\; PV(\hat{\mu}_{y|x_0})\right).$$
So if $\sigma^2$ is known, then the distribution of
$$\frac{\hat{\mu}_{y|x_0} - E(y|x_0)}{\sqrt{PV(\hat{\mu}_{y|x_0})}}$$
is $N(0, 1)$, and the $100(1-\alpha)\%$ prediction interval is obtained from
$$P\left[-z_{\alpha/2} \leq \frac{\hat{\mu}_{y|x_0} - E(y|x_0)}{\sqrt{PV(\hat{\mu}_{y|x_0})}} \leq z_{\alpha/2}\right] = 1 - \alpha,$$
which gives the prediction interval for $E(y|x_0)$ as
$$\left[\hat{\mu}_{y|x_0} - z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)},\; \hat{\mu}_{y|x_0} + z_{\alpha/2}\sqrt{\sigma^2\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}\right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{\hat{\mu}_{y|x_0} - E(y|x_0)}{\sqrt{MSE\left(\dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}}\right)}}$$
is the $t$-distribution with $(n-2)$ degrees of freedom, i.e., $t_{n-2}$. The $100(1-\alpha)\%$ prediction interval in this case is obtained from
$$P\left[-t_{\alpha/2,\,n-2} \leq \frac{\hat{\mu}_{y|x_0} - E(y|x_0)}{\sqrt{MSE\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}} \leq t_{\alpha/2,\,n-2}\right] = 1 - \alpha,$$
which gives the prediction interval as
$$\left[\hat{\mu}_{y|x_0} - t_{\alpha/2,\,n-2}\sqrt{MSE\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)},\; \hat{\mu}_{y|x_0} + t_{\alpha/2,\,n-2}\sqrt{MSE\left(\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}\right].$$
Note that the width of the prediction interval for $E(y|x_0)$ is a function of $x_0$. The interval width is minimum at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases. This is expected: the best estimates of $y$ are made at $x$-values near the center of the data, and the precision of estimation deteriorates as we move toward the boundary of the $x$-space.
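A Python sketch of this interval for the mean response (NumPy/SciPy assumed; the data and $x_0$ are hypothetical):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])
n, alpha, x0 = len(x), 0.05, 3.5

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

mu_hat = b0 + b1 * x0
half_width = (stats.t.ppf(1 - alpha / 2, n - 2)
              * np.sqrt(mse * (1 / n + (x0 - x.mean()) ** 2 / sxx)))
print("interval for E(y|x0):", (mu_hat - half_width, mu_hat + half_width))
```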
Case 2: Prediction of actual value
If $x_0$ is the value of the explanatory variable, then the predictor of the actual value of $y$ is
$$\hat{y}_0 = b_0 + b_1 x_0.$$
The true value of $y$ in the prediction period is given by
$$y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$$
where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of the predictor is the same as that of the average value predictor, but its predictive error and other properties are different. This is the dual nature of the predictor.

Predictive bias:
The predictive error of $\hat{y}_0$ is given by
$$\hat{y}_0 - y_0 = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0) = (b_0 - \beta_0) + (b_1 - \beta_1)x_0 - \varepsilon_0.$$
Thus we find that
$$E(\hat{y}_0 - y_0) = E(b_0 - \beta_0) + E(b_1 - \beta_1)x_0 - E(\varepsilon_0) = 0,$$
which implies that $\hat{y}_0$ is an unbiased predictor of $y_0$.
Predictive variance:
Because the future observation $y_0$ is independent of $\hat{y}_0$, the predictive variance of $\hat{y}_0$ is
$$\begin{aligned}
PV(\hat{y}_0) &= E(\hat{y}_0 - y_0)^2 \\
&= Var\left[(b_0 - \beta_0) + (b_1 - \beta_1)x_0 - \varepsilon_0\right] \\
&= Var(b_0) + x_0^2\,Var(b_1) + Var(\varepsilon_0) + 2x_0\,Cov(b_0, b_1) \\
&\qquad\text{(the remaining terms vanish, assuming the independence of } \varepsilon_0 \text{ from } \varepsilon_1, \ldots, \varepsilon_n\text{)} \\
&= \sigma^2\left(\frac{1}{n} + \frac{\bar{x}^2}{s_{xx}}\right) + \frac{x_0^2\sigma^2}{s_{xx}} + \sigma^2 - \frac{2x_0\bar{x}\sigma^2}{s_{xx}} \\
&= \sigma^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right].
\end{aligned}$$

Estimate of predictive variance:
The estimate of the predictive variance can be obtained by replacing $\sigma^2$ by its estimate $\hat{\sigma}^2 = MSE$:
$$\widehat{PV}(\hat{y}_0) = \hat{\sigma}^2\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right] = MSE\left[1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right].$$

Prediction interval:
If $\sigma^2$ is known, then the distribution of
$$\frac{\hat{y}_0 - y_0}{\sqrt{PV(\hat{y}_0)}}$$
is $N(0, 1)$. So the $100(1-\alpha)\%$ prediction interval is obtained from
$$P\left[-z_{\alpha/2} \leq \frac{\hat{y}_0 - y_0}{\sqrt{PV(\hat{y}_0)}} \leq z_{\alpha/2}\right] = 1 - \alpha,$$
which gives the prediction interval for $y_0$ as
$$\left[\hat{y}_0 - z_{\alpha/2}\sqrt{\sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)},\; \hat{y}_0 + z_{\alpha/2}\sqrt{\sigma^2\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}\right].$$
When $\sigma^2$ is unknown, then
$$\frac{\hat{y}_0 - y_0}{\sqrt{\widehat{PV}(\hat{y}_0)}}$$
follows a $t$-distribution with $(n-2)$ degrees of freedom. The $100(1-\alpha)\%$ prediction interval for $y_0$ in this case is obtained from
$$P\left[-t_{\alpha/2,\,n-2} \leq \frac{\hat{y}_0 - y_0}{\sqrt{\widehat{PV}(\hat{y}_0)}} \leq t_{\alpha/2,\,n-2}\right] = 1 - \alpha,$$
which gives the prediction interval
$$\left[\hat{y}_0 - t_{\alpha/2,\,n-2}\sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)},\; \hat{y}_0 + t_{\alpha/2,\,n-2}\sqrt{MSE\left(1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}}\right)}\right].$$
The prediction interval is of minimum width at $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases.

The prediction interval for $\hat{y}_0$ is wider than the prediction interval for $\hat{\mu}_{y|x_0}$ because the prediction interval for $\hat{y}_0$ depends on both the error from the fitted model and the error associated with the future observation.
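The corresponding Python sketch for the actual-value interval differs from the mean-response interval only by the extra `1 +` term in the variance (same hypothetical setup as above):

```python
import numpy as np
from scipy import stats

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])
n, alpha, x0 = len(x), 0.05, 3.5

sxx = np.sum((x - x.mean()) ** 2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx
b0 = y.mean() - b1 * x.mean()
mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)

y0_hat = b0 + b1 * x0
half_width = (stats.t.ppf(1 - alpha / 2, n - 2)
              * np.sqrt(mse * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx)))
print("prediction interval for y0:", (y0_hat - half_width, y0_hat + half_width))
```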
Reverse regression method
The reverse (or inverse) regression approach minimizes the sum of squares of the horizontal distances between the observed data points and the line in the scatter diagram to obtain the estimates of the regression parameters.

[Figure: Reverse regression — horizontal deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]
The reverse regression has been advocated in the analysis of gender (or race) discrimination in salaries. For example, if $y$ denotes salary and $x$ denotes qualifications, and we are interested in determining whether there is gender discrimination in salaries, we can ask: "Do men and women with the same qualifications (value of $x$) get the same salaries (value of $y$)?" This question is answered by direct regression. Alternatively, we can ask: "Do men and women with the same salaries (value of $y$) have the same qualifications (value of $x$)?" This question is answered by reverse regression, i.e., the regression of $x$ on $y$.
The regression equation in the case of reverse regression can be written as
$$x_i = \beta_0^* + \beta_1^* y_i + \delta_i \quad (i = 1, 2, \ldots, n)$$
where the $\delta_i$'s are the associated random error components which satisfy the assumptions as in the case of the usual simple linear regression model. The reverse regression estimates $\hat{\beta}_{0R}$ of $\beta_0^*$ and $\hat{\beta}_{1R}$ of $\beta_1^*$ for this model are obtained by interchanging $x$ and $y$ in the direct regression estimators of $\beta_0$ and $\beta_1$. The estimates are
$$\hat{\beta}_{0R} = \bar{x} - \hat{\beta}_{1R}\,\bar{y}$$
and
$$\hat{\beta}_{1R} = \frac{s_{xy}}{s_{yy}}$$
for $\beta_0^*$ and $\beta_1^*$, respectively. The residual sum of squares in this case is
$$SS_{res}^* = s_{xx} - \frac{s_{xy}^2}{s_{yy}}.$$
Note that
$$\hat{\beta}_{1R}\,b_1 = \frac{s_{xy}^2}{s_{xx}\,s_{yy}} = r_{xy}^2$$
where $b_1$ is the direct regression estimator of the slope parameter and $r_{xy}$ is the correlation coefficient between $x$ and $y$. Hence if $r_{xy}^2$ is close to one, the two regression lines will be close to each other.
An important application of the reverse regression method is in solving the calibration problem.
Orthogonal regression method (or major axis regression method)
The direct and reverse regression methods of estimation assume that the errors in the observations are either in the $x$-direction or the $y$-direction, i.e., the errors can be either in the dependent variable or the independent variable. There can be situations when uncertainties are involved in both the dependent and independent variables. In such situations, orthogonal regression is more appropriate. In order to take care of errors in both directions, the least-squares principle in orthogonal regression minimizes the sum of squared perpendicular distances between the observed data points and the line in the scatter diagram to obtain the estimates of the regression coefficients. This is also known as the major axis regression method. The estimates obtained are called orthogonal regression estimates or major axis regression estimates of the regression coefficients.
[Figure: Orthogonal or major axis regression — perpendicular distances of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]
If we assume that the line to be fitted is $Y_i = \beta_0 + \beta_1 X_i$, then it is expected that all the observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$ lie on this line. But these points deviate from the line, and in such a case the squared perpendicular distance of the observed point $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ from the line is given by
$$d_i^2 = (X_i - x_i)^2 + (Y_i - y_i)^2$$
where $(X_i, Y_i)$ denotes the $i$-th pair of observations without any error, which lies on the line.
The objective is to minimize the sum of squared perpendicular distances, $\sum_{i=1}^{n}d_i^2$, to obtain the estimates of $\beta_0$ and $\beta_1$. The observations $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ are expected to lie on the line
$$Y_i = \beta_0 + \beta_1 X_i,$$
so let
$$E_i = Y_i - \beta_0 - \beta_1 X_i = 0.$$
The regression coefficients are obtained by minimizing $\sum_{i=1}^{n}d_i^2$ under the constraints $E_i = 0$ using the Lagrangian multiplier method. The Lagrangian function is
$$L_0 = \sum_{i=1}^{n}d_i^2 - 2\sum_{i=1}^{n}\lambda_i E_i$$
where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting
$$\frac{\partial L_0}{\partial X_i} = 0, \quad \frac{\partial L_0}{\partial Y_i} = 0, \quad \frac{\partial L_0}{\partial \beta_0} = 0, \quad \frac{\partial L_0}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n).$$
Thus we find
$$\frac{\partial L_0}{\partial X_i} = (X_i - x_i) + \lambda_i\beta_1 = 0,$$
$$\frac{\partial L_0}{\partial Y_i} = (Y_i - y_i) - \lambda_i = 0,$$
$$\frac{\partial L_0}{\partial \beta_0} = \sum_{i=1}^{n}\lambda_i = 0,$$
$$\frac{\partial L_0}{\partial \beta_1} = \sum_{i=1}^{n}\lambda_i X_i = 0.$$
Since $X_i = x_i - \lambda_i\beta_1$ and $Y_i = y_i + \lambda_i$, substituting these values in $E_i$, we obtain
$$E_i = (y_i + \lambda_i) - \beta_0 - \beta_1(x_i - \lambda_i\beta_1) = 0$$
$$\Rightarrow \lambda_i = \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2}.$$
Also, using this $\lambda_i$ in the equation $\sum_{i=1}^{n}\lambda_i = 0$, we get
$$\sum_{i=1}^{n}\frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2} = 0,$$
and using $(X_i - x_i) + \lambda_i\beta_1 = 0$ and $\sum_{i=1}^{n}\lambda_i X_i = 0$, we get
$$\sum_{i=1}^{n}\lambda_i(x_i - \lambda_i\beta_1) = 0.$$
Substituting $\lambda_i$ in this equation, we get
$$\sum_{i=1}^{n}\left[\frac{(\beta_0 + \beta_1 x_i - y_i)x_i}{1 + \beta_1^2} - \frac{\beta_1(\beta_0 + \beta_1 x_i - y_i)^2}{(1 + \beta_1^2)^2}\right] = 0. \quad (1)$$
Solving the equation
$$\sum_{i=1}^{n}\frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2} = 0$$
provides the orthogonal regression estimate of $\beta_0$ as
$$\hat{\beta}_{0OR} = \bar{y} - \hat{\beta}_{1OR}\,\bar{x}$$
where $\hat{\beta}_{1OR}$ is the orthogonal regression estimate of $\beta_1$.
Now, substituting $\hat{\beta}_{0OR} = \bar{y} - \beta_1\bar{x}$ in equation (1) and writing $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$ (so that $\beta_0 + \beta_1 x_i - y_i = -(v_i - \beta_1 u_i)$), equation (1) becomes
$$(1 + \beta_1^2)\sum_{i=1}^{n}(v_i - \beta_1 u_i)(u_i + \bar{x}) + \beta_1\sum_{i=1}^{n}(v_i - \beta_1 u_i)^2 = 0.$$
Since $\sum_{i=1}^{n}u_i = \sum_{i=1}^{n}v_i = 0$, this reduces to
$$\beta_1^2\sum_{i=1}^{n}u_i v_i + \beta_1\sum_{i=1}^{n}(u_i^2 - v_i^2) - \sum_{i=1}^{n}u_i v_i = 0,$$
or
$$\beta_1^2 s_{xy} + \beta_1(s_{xx} - s_{yy}) - s_{xy} = 0.$$
Solving this quadratic equation provides the orthogonal regression estimate of $\beta_1$ as
$$\hat{\beta}_{1OR} = \frac{(s_{yy} - s_{xx}) + \text{sign}(s_{xy})\sqrt{(s_{xx} - s_{yy})^2 + 4s_{xy}^2}}{2s_{xy}}$$
where $\text{sign}(s_{xy})$ denotes the sign of $s_{xy}$, which can be positive or negative:
$$\text{sign}(s_{xy}) = \begin{cases} 1 & \text{if } s_{xy} > 0 \\ -1 & \text{if } s_{xy} < 0. \end{cases}$$
Notice that the quadratic gives two solutions for $\hat{\beta}_{1OR}$. We choose the solution which minimizes $\sum_{i=1}^{n}d_i^2$; the other solution maximizes $\sum_{i=1}^{n}d_i^2$ and corresponds to the direction perpendicular to the optimal solution. The optimal solution is selected by the sign of $s_{xy}$, as above.
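A compact Python sketch of the orthogonal (major axis) estimates from these closed forms (NumPy assumed; data hypothetical):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])

sxx = np.sum((x - x.mean()) ** 2)
syy = np.sum((y - y.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

# Root of b^2*sxy + b*(sxx - syy) - sxy = 0 chosen with the sign of sxy
b1_or = ((syy - sxx) + np.sign(sxy) * np.sqrt((sxx - syy) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
b0_or = y.mean() - b1_or * x.mean()
print(b0_or, b1_or)
```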
Reduced major axis regression method:
The direct, reverse and orthogonal methods of estimation minimize errors in a particular direction, which is usually the distance between the observed data points and the line in the scatter diagram. Alternatively, one can consider the area spanned by a data point and the line: instead of distances, the areas of the rectangles defined between each observed data point and the nearest point on the line in the scatter diagram can be minimized. Such an approach is more appropriate when uncertainties are present in both the study and explanatory variables. This approach is termed reduced major axis regression.
[Figure: Reduced major axis method — rectangles between the observations $(x_i, y_i)$ and the points $(X_i, Y_i)$ on the line $Y = \beta_0 + \beta_1 X$.]
Suppose the regression line is $Y_i = \beta_0 + \beta_1 X_i$, on which all the observed points are expected to lie, and suppose the observed points $(x_i, y_i)$, $i = 1, 2, \ldots, n$ lie away from the line. The area of the rectangle extended between the $i$-th observed data point and the line is
$$A_i = |X_i - x_i|\,|Y_i - y_i| \quad (i = 1, 2, \ldots, n)$$
where $(X_i, Y_i)$ denotes the $i$-th pair of observations without any error, which lies on the line. The total area extended by the $n$ data points is
$$\sum_{i=1}^{n}A_i = \sum_{i=1}^{n}|X_i - x_i|\,|Y_i - y_i|.$$
All the observed data points $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ are expected to lie on the line
$$Y_i = \beta_0 + \beta_1 X_i,$$
so let
$$E_i^* = Y_i - \beta_0 - \beta_1 X_i = 0.$$
The objective now is to minimize the sum of the areas under the constraints $E_i^* = 0$ to obtain the reduced major axis estimates of the regression coefficients. Using the Lagrangian multiplier method, the Lagrangian function is
$$L_R = \sum_{i=1}^{n}A_i - \sum_{i=1}^{n}\lambda_i E_i^* = \sum_{i=1}^{n}(X_i - x_i)(Y_i - y_i) - \sum_{i=1}^{n}\lambda_i E_i^*$$
where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting
$$\frac{\partial L_R}{\partial X_i} = 0, \quad \frac{\partial L_R}{\partial Y_i} = 0, \quad \frac{\partial L_R}{\partial \beta_0} = 0, \quad \frac{\partial L_R}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n).$$
Thus
$$\frac{\partial L_R}{\partial X_i} = (Y_i - y_i) + \lambda_i\beta_1 = 0,$$
$$\frac{\partial L_R}{\partial Y_i} = (X_i - x_i) - \lambda_i = 0,$$
$$\frac{\partial L_R}{\partial \beta_0} = \sum_{i=1}^{n}\lambda_i = 0,$$
$$\frac{\partial L_R}{\partial \beta_1} = \sum_{i=1}^{n}\lambda_i X_i = 0.$$
Now
$$X_i = x_i + \lambda_i,$$
$$Y_i = y_i - \lambda_i\beta_1,$$
and substituting these in $Y_i = \beta_0 + \beta_1 X_i$ gives
$$\beta_0 + \beta_1(x_i + \lambda_i) = y_i - \lambda_i\beta_1,$$
so
$$\lambda_i = \frac{y_i - \beta_0 - \beta_1 x_i}{2\beta_1}.$$
Substituting $\lambda_i$ in $\sum_{i=1}^{n}\lambda_i = 0$, the reduced major axis regression estimate of $\beta_0$ is obtained as
$$\hat{\beta}_{0RM} = \bar{y} - \hat{\beta}_{1RM}\,\bar{x}$$
where $\hat{\beta}_{1RM}$ is the reduced major axis regression estimate of $\beta_1$. Using $X_i = x_i + \lambda_i$ and $\hat{\beta}_{0RM}$ in $\sum_{i=1}^{n}\lambda_i X_i = 0$, we get
$$\sum_{i=1}^{n}\frac{y_i - \bar{y} - \beta_1(x_i - \bar{x})}{2\beta_1}\left[x_i + \frac{y_i - \bar{y} - \beta_1(x_i - \bar{x})}{2\beta_1}\right] = 0.$$
Let $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$; then this equation can be re-expressed as
$$\sum_{i=1}^{n}(v_i - \beta_1 u_i)(v_i + \beta_1 u_i + 2\beta_1\bar{x}) = 0.$$
Using $\sum_{i=1}^{n}u_i = \sum_{i=1}^{n}v_i = 0$, we get
$$\sum_{i=1}^{n}v_i^2 - \beta_1^2\sum_{i=1}^{n}u_i^2 = 0.$$
Solving this equation, the reduced major axis regression estimate of $\beta_1$ is obtained as
$$\hat{\beta}_{1RM} = \text{sign}(s_{xy})\sqrt{\frac{s_{yy}}{s_{xx}}}$$
where
$$\text{sign}(s_{xy}) = \begin{cases} 1 & \text{if } s_{xy} > 0 \\ -1 & \text{if } s_{xy} < 0. \end{cases}$$
We choose the root whose sign is the same as that of $s_{xy}$.
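The corresponding Python sketch (NumPy assumed; data hypothetical) is a one-liner given $s_{xx}$, $s_{yy}$ and $s_{xy}$:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])

sxx = np.sum((x - x.mean()) ** 2)
syy = np.sum((y - y.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

# |slope| is the geometric mean of the direct and reverse regression slopes
b1_rm = np.sign(sxy) * np.sqrt(syy / sxx)
b0_rm = y.mean() - b1_rm * x.mean()
print(b0_rm, b1_rm)
```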
Least absolute deviation regression method
The least-squares principle advocates the minimization of the sum of squared errors. Squaring the errors, rather than using the simple errors, is useful because random errors can be positive as well as negative, so their sum can be close to zero even for a badly fitting model, misleadingly indicating that there is no error in the model. Instead of the sum of random errors, the sum of absolute random errors can be considered, which also avoids this sign-cancellation problem.
In the method of least squares, the estimates of the parameters $\beta_0$ and $\beta_1$ in the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ $(i = 1, 2, \ldots, n)$ are chosen such that the sum of squared deviations $\sum_{i=1}^{n}\varepsilon_i^2$ is minimum. In the method of least absolute deviation (LAD) regression, the parameters $\beta_0$ and $\beta_1$ are estimated such that the sum of absolute deviations $\sum_{i=1}^{n}|\varepsilon_i|$ is minimum. It minimizes the sum of absolute vertical errors in the scatter diagram.

[Figure: Least absolute deviation regression — vertical deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]

The LAD estimates $\hat{\beta}_{0L}$ and $\hat{\beta}_{1L}$ are the estimates of $\beta_0$ and $\beta_1$, respectively, which minimize
$$LAD(\beta_0, \beta_1) = \sum_{i=1}^{n}\left|y_i - \beta_0 - \beta_1 x_i\right|$$
for the given observations $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$.
Conceptually, the LAD procedure is simpler than the OLS procedure because $|e|$ (the absolute residual) is a more straightforward measure of the size of a residual than $e^2$ (the squared residual). The LAD regression estimates of $\beta_0$ and $\beta_1$ are, however, not available in closed form. Instead, they are obtained numerically using algorithms, and this creates the problems of non-uniqueness and degeneracy in the estimates. Non-uniqueness means that more than one best line passes through a data point; degeneracy means that the best line through a data point also passes through more than one other data point. The non-uniqueness and degeneracy concepts are used in algorithms to judge the quality of the estimates. An algorithm for finding the estimators generally proceeds in steps. At each step, the best line that passes through a given data point is found. The best line always passes through another data point, and this data point is used in the next step. When there is non-uniqueness, there is more than one best line; when there is degeneracy, the best line passes through more than one other data point. When either problem is present, there is more than one choice for the data point to be used in the next step, and the algorithm may go around in circles or make a wrong choice of the LAD regression line. Exact tests of hypotheses and confidence intervals for the LAD regression estimates cannot be derived analytically; instead, they are derived analogously to the tests of hypotheses and confidence intervals for the ordinary least squares estimates.
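Since no closed form exists, a generic numerical optimizer can be used to minimize the absolute-deviation criterion. A minimal Python sketch (SciPy assumed; the OLS estimates serve as a hypothetical starting point, and a derivative-free method is used because the objective is not smooth):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([1.9, 4.1, 5.8, 8.3, 9.9, 12.2])

def lad_loss(params):
    b0, b1 = params
    return np.sum(np.abs(y - b0 - b1 * x))   # LAD(beta0, beta1)

# Start from the OLS solution and refine numerically
b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_ols = y.mean() - b1_ols * x.mean()
result = minimize(lad_loss, x0=[b0_ols, b1_ols], method="Nelder-Mead")
print(result.x)   # approximate LAD estimates (b0_L, b1_L)
```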
Estimation of parameters when X is stochastic
In the usual linear regression model, the study variable is supposed to be random and the explanatory variables are assumed to be fixed. In practice, there may be situations in which the explanatory variable is also random.

Suppose both the dependent and independent variables are stochastic in the simple linear regression model
$$y = \beta_0 + \beta_1 X + \varepsilon$$
where $\varepsilon$ is the associated random error component. The observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$ are assumed to be jointly distributed. Then statistical inferences can be drawn in such cases conditionally on $X$.
Assume the joint distribution of $X$ and $y$ to be bivariate normal $N(\mu_x, \mu_y, \sigma_x^2, \sigma_y^2, \rho)$, where $\mu_x$ and $\mu_y$ are the means of $X$ and $y$, $\sigma_x^2$ and $\sigma_y^2$ are the variances of $X$ and $y$, and $\rho$ is the correlation coefficient between $X$ and $y$. Then the conditional distribution of $y$ given $X = x$ is the univariate normal with conditional mean
$$E(y|X = x) = \mu_{y|x} = \beta_0 + \beta_1 x$$
and conditional variance
$$Var(y|X = x) = \sigma^2_{y|x} = \sigma_y^2(1 - \rho^2)$$
where
$$\beta_0 = \mu_y - \beta_1\mu_x$$
and
$$\beta_1 = \rho\,\frac{\sigma_y}{\sigma_x}.$$
When both $X$ and $y$ are stochastic, the problem of estimation of the parameters can be reformulated as follows. Consider the conditional random variable $y|X = x$ having a normal distribution with conditional mean $\mu_{y|x}$ and conditional variance $Var(y|X = x) = \sigma^2_{y|x}$. Obtain $n$ independently distributed observations $y_i|x_i$, $i = 1, 2, \ldots, n$ from $N(\mu_{y|x}, \sigma^2_{y|x})$ with nonstochastic $X$. Now the method of maximum likelihood can be used to estimate the parameters, which yields the estimates of $\beta_0$ and $\beta_1$ as earlier in the case of nonstochastic $X$:
$$b_0 = \bar{y} - b_1\bar{x}$$
and
$$b_1 = \frac{s_{xy}}{s_{xx}},$$
respectively. Moreover, the correlation coefficient
$$\rho = \frac{E\left[(X - \mu_x)(y - \mu_y)\right]}{\sigma_x\,\sigma_y}$$
can be estimated by the sample correlation coefficient
$$\hat{\rho} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}} = \frac{s_{xy}}{\sqrt{s_{xx}\,s_{yy}}} = b_1\sqrt{\frac{s_{xx}}{s_{yy}}}.$$
Thus
$$\hat{\rho}^2 = b_1^2\,\frac{s_{xx}}{s_{yy}} = \frac{b_1 s_{xy}}{s_{yy}} = \frac{s_{yy} - \sum_{i=1}^{n}\hat{\varepsilon}_i^2}{s_{yy}} = R^2,$$
which is the same as the coefficient of determination. Thus $R^2$ has the same expression as in the case when $X$ is fixed, and $R^2$ again measures the goodness of the fitted model even when $X$ is stochastic.