
LECTURE 1

Conditional Expectations and Regression Analysis

In this chapter, we shall study three methods that are capable of generating estimates of statistical parameters in a wide variety of contexts. These are the method of moments, the method of least squares and the principle of maximum likelihood. The methods will be studied only in relation to the simple regression model; and it will be seen that each entails assumptions that may be more or less appropriate to the context in which the model is to be applied. In the case of the regression model, the three methods generate estimating equations that are formally identical; but this does not justify us in taking a casual approach to the statistical assumptions that sustain the model. To be casual in making our assumptions is to invite the danger of misinterpretation when the results of the estimation are in hand. We shall begin with the method of moments, proceed to the method of least squares, and conclude with a brief treatment of the method of maximum likelihood.

Conditional Expectations

Let y be a continuously distributed random variable whose probability density function is f(y). If we wish to predict the value of y without the help of any other information, then we might take its expected value, which is defined by

        E(y) = ∫ y f(y) dy.

The expected value is a so-called minimum-mean-square-error (m.m.s.e.) predictor. If π is the value of a prediction, then the mean-square error is given by

(1)    M = ∫ (y − π)²f(y) dy = E{(y − π)²} = E(y²) − 2πE(y) + π²;

and, using the methods of calculus, it is easy to show that this quantity is minimised by taking π = E(y); for differentiating M with respect to π gives dM/dπ = 2π − 2E(y), which vanishes when π = E(y).

Now let us imagine that y is statistically related to another random variable x, whose values have already been observed. For the sake of argument, it may be assumed that the form of the joint distribution of x and y, which is f(x, y), is known. Then, the minimum-mean-square-error prediction of y is given by the conditional expectation

(2)    E(y|x) = ∫ y {f(x, y)/f(x)} dy,

wherein

(3)    f(x) = ∫ f(x, y) dy

is the so-called marginal distribution of x. This proposition may be stated formally in a way that will assist us in proving it:

(4)    Let ŷ = ŷ(x) be the conditional expectation of y given x, which is also expressed as ŷ = E(y|x). Then E{(y − ŷ)²} ≤ E{(y − π)²}, where π = π(x) is any other function of x.

Proof. Consider

(5)    E{(y − π)²} = E[{(y − ŷ) + (ŷ − π)}²]
                   = E{(y − ŷ)²} + 2E{(y − ŷ)(ŷ − π)} + E{(ŷ − π)²}.

In the second term, there is

(6)    E{(y − ŷ)(ŷ − π)} = ∫∫ (y − ŷ)(ŷ − π)f(x, y) ∂y ∂x
                        = ∫ {∫ (y − ŷ)f(y|x) ∂y} (ŷ − π)f(x) ∂x
                        = 0.

Here, the second equality depends upon the factorisation f(x, y) = f(y|x)f(x), which expresses the joint probability density function of x and y as the product of the conditional density function of y given x and the marginal density function of x. The final equality depends upon the fact that ∫ (y − ŷ)f(y|x) ∂y = E(y|x) − E(y|x) = 0. Therefore, E{(y − π)²} = E{(y − ŷ)²} + E{(ŷ − π)²} ≥ E{(y − ŷ)²}, and the assertion is proved.
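The minimum-mean-square-error property can be checked numerically. The following Python sketch assumes hypothetical parameters for a bivariate normal distribution, for which, as is shown below, the conditional expectation is a linear function of x, and it compares the mean-square error of E(y|x) with those of two rival predictors.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters of a bivariate normal distribution
mu_x, mu_y = 1.0, 2.0
sigma_x, sigma_y, rho = 1.5, 2.0, 0.6

# Draw a large sample of (x, y) pairs with the prescribed moments
n = 200_000
x = mu_x + sigma_x * rng.standard_normal(n)
y = (mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)
     + sigma_y * np.sqrt(1 - rho**2) * rng.standard_normal(n))

# For a normal distribution, E(y|x) is the linear function below
y_hat = mu_y + rho * (sigma_y / sigma_x) * (x - mu_x)

# Rival predictors: the unconditional mean and a perturbed linear rule
predictors = {
    "E(y|x)":          y_hat,
    "E(y)":            np.full(n, mu_y),
    "perturbed slope": mu_y + 1.5 * rho * (sigma_y / sigma_x) * (x - mu_x),
}
for name, pred in predictors.items():
    print(f"{name:15s} mean-square error: {np.mean((y - pred)**2):.3f}")
# E(y|x) attains the smallest mean-square error, which is close to
# sigma_y**2 * (1 - rho**2) = 2.56.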


The definition of the conditional expectation implies that

(7)    E(xy) = ∫∫ xy f(x, y) ∂y ∂x
             = ∫ x {∫ y f(y|x) ∂y} f(x) ∂x
             = E(xŷ).

When the equation E(xy) = E(xŷ) is rewritten as

(8)    E{x(y − ŷ)} = 0,

it may be described as an orthogonality condition. This condition indicates that the prediction error y − ŷ is uncorrelated with x. The result is intuitively appealing; for, if the error were correlated with x, then the information of x could not have been used efficiently in forming ŷ.

If the joint distribution of x and y is a normal distribution, then it is straightforward to find an expression for the function E(y|x). In the case of a normal distribution, there is

(9)    E(y|x) = α + βx,

which is to say that the conditional expectation of y given x is a linear function of x. Equation (9) is described as a linear regression equation; and this terminology will be explained later.

The object is to find expressions for α and β in terms of the first-order and second-order moments of the joint distribution. That is to say, we wish to express α and β in terms of the expectations E(x), E(y), the variances V(x), V(y) and the covariance C(x, y).

Admittedly, if we had already pursued the theory of the normal distribution to the extent of demonstrating that the regression equation is a linear equation, then we should have already discovered these expressions for α and β. However, present purposes are best served by taking equation (9) as the starting point; and the linearity of the regression equation may be regarded as an assumption in its own right rather than as a deduction from the assumption of a normal distribution.

To begin, equation (9) may be multiplied throughout by f(x) and integrated with respect to x. This gives

(10) E(y)=α + βE(x), whence

(11) α = E(y) − βE(x).


Equation (10) shows that the regression line passes through the point E(x, y) = {E(x), E(y)}, which is the expected value of the joint distribution.

Putting (11) into (9) gives

(12)    E(y|x) = E(y) + β{x − E(x)},

which shows that the conditional expectation of y differs from the unconditional expectation in proportion to the error of predicting x by taking its expected value. Next, (9) is multiplied by x and f(x) and then integrated with respect to x to provide

(13)    E(xy) = αE(x) + βE(x²).

Multiplying (10) by E(x) gives

(14)    E(x)E(y) = αE(x) + β{E(x)}²,

whence, on taking (14) from (13), we get

(15)    E(xy) − E(x)E(y) = β[E(x²) − {E(x)}²],

which implies that

(16)    β = {E(xy) − E(x)E(y)} / [E(x²) − {E(x)}²]
          = E[{x − E(x)}{y − E(y)}] / E[{x − E(x)}²]
          = C(x, y)/V(x).

Thus, α and β have been expressed in terms of the moments E(x), E(y), V (x) and C(x, y) of the joint distribution of x and y.
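As a hypothetical numerical illustration of the formulae (11) and (16), suppose that the joint distribution has the moments E(x) = 2, E(y) = 3, E(x²) = 8 and E(xy) = 10. Then

        V(x) = E(x²) − {E(x)}² = 8 − 4 = 4,
        C(x, y) = E(xy) − E(x)E(y) = 10 − 6 = 4,
        β = C(x, y)/V(x) = 1    and    α = E(y) − βE(x) = 3 − 2 = 1,

so that the regression equation is E(y|x) = 1 + x.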

Example. Let x = ξ + η be an observed random variable which combines a signal component ξ and a noise component η. Imagine that the two components are uncorrelated, with C(ξ, η) = 0, and let V(ξ) = σξ² and V(η) = ση². The object is to extract the signal from the observation.


According to the formulae of (12) and (16), the expectation of the signal conditional upon the observation is

(17)    E(ξ|x) = E(ξ) + {C(x, ξ)/V(x)}{x − E(x)}.

Given that ξ and η are uncorrelated, it follows that

(18)    V(x) = V(ξ + η) = σξ² + ση²

and that

(19)    C(x, ξ) = V(ξ) + C(ξ, η) = σξ².

Therefore

(20)    E(ξ|x) = E(ξ) + {σξ²/(σξ² + ση²)}{x − E(x)}.
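The filtering formula of (20) can be illustrated by a simulation. The following sketch assumes hypothetical values for the mean of the signal and for the two variances; the weight σξ²/(σξ² + ση²) on the observation is then 0.8, and the mean-square error of the estimate should be close to σξ²ση²/(σξ² + ση²) = 0.8.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical mean of the signal and variances of the signal and the noise
mu_xi, var_xi, var_eta = 5.0, 4.0, 1.0

n = 100_000
xi = mu_xi + np.sqrt(var_xi) * rng.standard_normal(n)   # signal
eta = np.sqrt(var_eta) * rng.standard_normal(n)         # noise
x = xi + eta                                            # observation

# Conditional expectation of the signal given the observation, equation (20)
weight = var_xi / (var_xi + var_eta)
xi_hat = mu_xi + weight * (x - mu_xi)

print("weight on the observation:", weight)                     # 0.8
print("m.s.e. of E(xi|x):        ", np.mean((xi - xi_hat)**2))  # close to 0.8
print("m.s.e. of taking x itself:", np.mean((xi - x)**2))       # close to 1.0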

Estimation by the Method of Moments

The values of the various moments comprised in the formulae for the regression parameters are unlikely to be known in the absence of sample data. However, they are easily estimated from the data. Imagine that a sample of T observations on x and y is available: (x1, y1), (x2, y2), . . . , (xT, yT). Then, the following empirical or sample moments can be calculated:

(21)    x̄ = (1/T)Σxt,    ȳ = (1/T)Σyt,

        sx² = (1/T)Σ(xt − x̄)² = (1/T)Σxt² − x̄²,

        sxy = (1/T)Σ(xt − x̄)(yt − ȳ) = (1/T)Σxtyt − x̄ȳ,

where the sums run from t = 1 to T.

The method of moments suggests that, in order to estimate α and β, the moments should be replaced in the formulae of (11) and (16) by the corresponding sample moments. Thus the estimates of α and β are

(22)    α̂ = ȳ − β̂x̄,    β̂ = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)².
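A minimal sketch of these calculations in Python, using hypothetical data generated from the equation y = 1 + 0.5x + disturbance, is the following; the estimates should lie close to the values 1 and 0.5 that were used to generate the data.

import numpy as np

def moment_estimates(x, y):
    """Estimate alpha and beta of E(y|x) = alpha + beta*x by replacing the
    population moments in (11) and (16) with the sample moments of (21)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_bar, y_bar = x.mean(), y.mean()
    s_xy = np.mean((x - x_bar) * (y - y_bar))   # sample covariance
    s_xx = np.mean((x - x_bar) ** 2)            # sample variance of x
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical sample generated from y = 1 + 0.5x + disturbance
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=500)
y = 1.0 + 0.5 * x + rng.normal(0.0, 1.0, size=500)
print(moment_estimates(x, y))   # the estimates should be close to (1.0, 0.5)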


The justification of the method is that, in many of the circumstances under which the data are liable to be generated, the sample moments are expected to converge to the true moments of the bivariate distribution, thereby causing the estimates of the parameters to converge, likewise, to the true values.

In this context, the concept of convergence has a special definition. According to the concept of convergence that is used in mathematical analysis,

(23)    A sequence of numbers {an} is said to converge to a limit a if, for any arbitrarily small real number ε, there exists a corresponding integer N such that |an − a| < ε for all n ≥ N.

This concept is not appropriate to the case of a stochastic sequence, such as a sequence of estimates. For, no matter how many observations N have been incorporated in the estimate aN, there remains a possibility that, subsequently, an aberrant observation yn will draw the estimate an beyond the bounds of a ± ε. A criterion of convergence must be adopted that allows for this possibility:

(24)    A sequence of random variables {an} is said to converge weakly in probability to a limit a if, for any ε > 0, there is lim P(|an − a| > ε) = 0 as n → ∞ or, equivalently, lim P(|an − a| ≤ ε) = 1.

This means that, by increasing the size of the sample, we can make it virtually certain that an will 'fall within an epsilon of a.' It is conventional to describe a as the probability limit of an and to write plim(an) = a.

The virtue of this definition of convergence is that it does not presuppose that the random variable an has a finite variance or even a finite mean. However, if an does have finite moments, then a concept of mean-square convergence can be employed.

(25)    A sequence of random variables {an} is said to converge in mean square to a limit a if lim(n → ∞)E{(an − a)²} = 0.

It should be noted that

(26)    E{(an − a)²} = E[{an − E(an)} − {a − E(an)}]²
                     = V(an) + {a − E(an)}²;

which is to say that the mean-square error of an is the sum of its variance and the square of its bias. If an is to converge in mean square to a, then both of these quantities must vanish. Convergence in mean square is a stronger condition than convergence in probability in the sense that it implies the latter. Whenever an estimator

converges in probability to the value of the parameter which it purports to represent, then it is said to be a consistent estimator.
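The definition of (24) can be illustrated by a small Monte Carlo experiment. The following sketch, which assumes a hypothetical population with mean µ = 2 and unit variance, estimates P(|an − µ| > ε) for the sample mean an at several sample sizes; the probability should fall towards zero as n increases.

import numpy as np

rng = np.random.default_rng(3)

# Monte Carlo check of convergence in probability for the sample mean,
# assuming a hypothetical population with mean mu = 2 and unit variance.
mu, eps, replications = 2.0, 0.1, 1000

for n in (10, 100, 1000, 10000):
    samples = mu + rng.standard_normal((replications, n))
    a_n = samples.mean(axis=1)              # one estimate per replication
    p = np.mean(np.abs(a_n - mu) > eps)     # estimate of P(|a_n - mu| > eps)
    print(f"n = {n:6d}   P(|a_n - mu| > {eps}) is roughly {p:.3f}")
# The estimated probability falls towards zero as n increases,
# which is what definition (24) requires.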

Regression and the Eugenic Movement

The theory of linear regression has its origins in the late 19th century, when it was closely associated with the name of the English eugenicist Francis Galton (1822–1911).

Galton was concerned with the heritability of physical and mental characteristics; and he sought ways of improving the genetic quality of the human race. His disciple Karl Pearson, who espoused the same eugenic principles as Galton and who was a leading figure in the early development of statistical theory in Britain, placed Galton's contributions to science on a par with those of Charles Darwin, who was Galton's cousin.

Since the 1930s, the science of eugenics has fallen into universal disrepute, and its close historical association with statistics has been largely forgotten. However, it should be recalled that one of the premier journals in its field, which now calls itself the Annals of Human Genetics, began life as The Annals of Eugenics. The thoughts which inspired the Eugenic Movement still arise, albeit that they are expressed, nowadays, in different guises.

One of Galton's studies that is best remembered concerns the relationship between the heights of fathers and the heights of their sons. The data that were gathered were plotted on a graph, and they were found to have a distribution that resembles a bivariate normal distribution. It might be supposed that the best way to predict the height of a son is to take the height of the father. In fact, such a method would lead to a systematic over-estimation of the heights of the sons if their fathers were of above-average height. In the terminology of Galton, we usually witness a regression of the son's height towards "mediocrity".

Galton's terminology suggests a somewhat unbalanced point of view. The phenomenon of regression is accompanied by a corresponding phenomenon of progression, whereby fathers of less than average height are liable to have sons who are taller than themselves. Also, if the distribution of heights is to remain roughly the same from generation to generation, and if it is not to lose its dispersion, then there are bound to be cases which conflict with the expectation of an overall reversion towards the mean.

A little reflection will go a long way toward explaining the phenomenon of reversion; for we need only consider the influence of the mother's height. If we imagine that, in general, men of above-average height show no marked tendency to marry tall women, then we might be prepared to attribute an average height to the mother, regardless of the father's height. If we acknowledge that the two parents are equally influential in determining the physical characteristics of their offspring, then we have a ready explanation of the tendency of heights to revert to the mean. To the extent that tall people choose tall partners, we shall see a retardation of the tendency; and the characteristics of abnormal height will endure through a greater number of generations.


An investigation into the relationship between the heights of parents and the heights of their offspring was published in 1886 by Francis Galton. He collected the data of 928 adult offspring and their parents. He combined the heights of the two parents by averaging the father's height and the mother's height scaled by a factor of 1.08, which was obtained by a comparison of the average male height and the average female height. This created a total of 205 midparents. Likewise, all female heights were multiplied by a factor of 1.08. Even when the sexes are combined in this manner, there is a clear regression towards the mean.

Galton's analysis was extended by Karl Pearson (1857–1936) in a series of papers. In 1903, Pearson and Lee published an analysis that comprised separate data on fathers and sons and on mothers and daughters. Figure 1 is based on 1078 measurements from Pearson's data of a father's height and his son's height. It appears to indicate that, in the late 19th century, there was a small but significant increase in adult male stature.

Figure 1. Pearson's data, comprising 1078 measurements of the heights of fathers (the abscissae) and of their sons (the ordinates), together with the two regression lines. The correlation coefficient is 0.5013.

The Bivariate Normal Distribution

Most of the results in the theory of regression that have been described so far

can be obtained by examining the functional form of the bivariate normal distribution. Let x and y be the two variables. Let their means be denoted by

(27)    E(x) = µx,    E(y) = µy,

their variances by

(28)    V(x) = σx²,    V(y) = σy²,

and their covariance by

(29) C(x, y)=ρσxσy.

Here,

(30)    ρ = C(x, y)/√{V(x)V(y)},

which is called the correlation coefficient of x and y, provides a measure of the relatedness of these variables.

The Cauchy–Schwarz inequality indicates that −1 ≤ ρ ≤ 1. If ρ = 1, then there is an exact positive linear relationship between the variables whereas, if ρ = −1, then there is an exact negative linear relationship. Neither of these extreme cases is admissible in the present context for, as can be seen by examining the following formulae, they lead to the collapse of the bivariate distribution.

The bivariate distribution is specified by

(31)    f(x, y) = [1/{2πσxσy√(1 − ρ²)}] exp Q(x, y),

where

(32)    Q = −[1/{2(1 − ρ²)}][{(x − µx)/σx}² − 2ρ{(x − µx)/σx}{(y − µy)/σy} + {(y − µy)/σy}²]

is a quadratic function of x and y. The function can also be written as

(33)    Q = −[1/{2(1 − ρ²)}][{(y − µy)/σy − ρ(x − µx)/σx}² + (1 − ρ²){(x − µx)/σx}²].

Thus, there is

(34) f(x, y)=f(y|x)f(x),

where

(35)    f(x) = {1/(σx√(2π))} exp{−(x − µx)²/(2σx²)}

and

(36)    f(y|x) = [1/{σy√(2π(1 − ρ²))}] exp{−(y − µy|x)²/(2σy²(1 − ρ²))},

with

(37)    µy|x = µy + (ρσy/σx)(x − µx).

Equation (37) is the linear regression equation, which specifies the value of E(y|x)=µy|x in terms of x; and it is simply the equation (12) in another notation. Equation (36) indicates that the variance of y about its conditional expectation is

(38)    V(y|x) = σy²(1 − ρ²).

Since (1 − ρ²) ≤ 1, it follows that the variance of the conditional predictor E(y|x) is less than that of the unconditional predictor E(y) whenever ρ ≠ 0, which is to say, whenever there is a correlation between x and y. Moreover, as this correlation increases, the variance of the conditional predictor diminishes.

There is, of course, a perfect symmetry between the arguments x and y in the bivariate distribution. Thus, if we choose to factorise the joint probability density function as f(x, y) = f(x|y)f(y), then, to obtain the relevant results, we need only interchange the x's and the y's in the formulae above.

The fact should be noted that x and y will be statistically independent random variables, uncorrelated with each other, if and only if their joint distribution can be factorised as the product of their marginal distributions: f(y, x) = f(y)f(x). In the absence of statistical independence, the joint distribution becomes the product of a conditional distribution and a marginal distribution: f(y, x) = f(y|x)f(x). The arguments of these two distributions will retain the properties of statistical independence. That is to say, the random variables ε = y − µy|x and ν = x − µx are, by construction, statistically independent with C(ε, ν) = 0.
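The formulae of (37) and (38) can be checked by simulation. The following sketch assumes hypothetical values for the means, variances and correlation, draws pairs from the corresponding bivariate normal distribution, and compares the empirical mean and variance of y within a narrow band of x values with the values given by the formulae.

import numpy as np

rng = np.random.default_rng(4)

# Hypothetical parameters of the bivariate normal distribution
mu_x, mu_y, sigma_x, sigma_y, rho = 0.0, 10.0, 2.0, 3.0, 0.7

# Draw pairs (x, y) with the prescribed means, variances and correlation
n = 500_000
cov = [[sigma_x**2, rho * sigma_x * sigma_y],
       [rho * sigma_x * sigma_y, sigma_y**2]]
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=n).T

# Condition on a narrow band of x values around x0 and compare the
# empirical mean and variance of y with equations (37) and (38)
x0 = 1.5
band = np.abs(x - x0) < 0.05
print("empirical E(y|x):", y[band].mean())
print("formula (37):    ", mu_y + rho * (sigma_y / sigma_x) * (x0 - mu_x))
print("empirical V(y|x):", y[band].var())
print("formula (38):    ", sigma_y**2 * (1 - rho**2))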

Least-Squares Regression Analysis

Galton's analysis, which described the regression relationship between the heights of fathers and their sons, was an exercise in descriptive statistics based on a given set of data. There can be no presumption that, for a different

race of men living in a different environment, the same parameters would be uncovered. It is only as an experiment in thought that we may vary the value of the explanatory variable x and watch the concomitant variation of y. The heights of individual men are not subject to experimental manipulation.

Econometrics, in contrast to descriptive statistics, is often concerned with functional regression relationships, which purport to describe the effects of manipulating the instruments of economic policy, such as interest rates and rates of taxation. In such cases, it is no longer appropriate to attribute a statistical distribution to the explanatory variable x, which now assumes the status of a control variable. Therefore, it is necessary to derive the formulae of regression analysis from principles that make no reference to the joint distribution of the variables. The principle of least squares is appropriate to this purpose.

Before admitting this change of emphasis, we should offer some words of caution. For it seems that many of the errors of applied econometrics arise when an analyst imagines that, in fitting a regression equation, he has uncovered a causal connection.

The data that are used in inferring a regression relationship are part of an historical record of the evolution of the economy; and it is never certain that the same statistical relationships would have prevailed in other circumstances. Nor is it clear that they will prevail in the future.

An econometric analysis is often conducted with a view to guiding the actions of a regulatory agent. However, such actions are liable to alter the statistical relationships prevailing amongst economic variables. An assertion that a particular relationship will endure through time and that it will be unaffected by regulatory intercessions ought to be greeted with skepticism. Yet, in such matters, applied econometricians are often eager to suspend their disbelief.

To assist the application of the method of least squares, the regression equation, which has been defined by E(y|x) = α + βx, can be written, alternatively, as

(39)    y = α + xβ + ε,

where ε = y − E(y|x) is a random variable with E(ε) = 0 and V(ε) = σ². This equation may be used to depict a functional relationship between an independent variable x and a dependent variable y. The relationship is affected by a disturbance ε, which is independent of x and which might be taken to represent the effect of a large number of variables of minor importance that are not taken into account explicitly in describing the relationship.

Imagine that there is a sample of observations (x1, y1), . . . , (xT, yT) and that, from these data, the parameters α and β are to be estimated. The principle of least squares suggests that this should be done by choosing the

values that minimise the quantity

(40)    S = Σεt² = Σ(yt − α − xtβ)²,

where the sums run from t = 1 to T.

This is the sum of squares of the vertical distances—measured parallel to the y-axis—of the data points from an interpolated regression line.

Differentiating the function S with respect to α and setting the result to zero for a minimum gives

        −2Σ(yt − α − βxt) = 0,

or, equivalently,

(41)    ȳ − α − βx̄ = 0.

This generates the following estimating equation for α:

(42)    α(β) = ȳ − βx̄.

Next, differentiating with respect to β and setting the result to zero gives

(43)    −2Σxt(yt − α − βxt) = 0.

On substituting for α from (42) and eliminating the factor −2, this becomes

(44)    Σxtyt − Σxt(ȳ − βx̄) − βΣxt² = 0,

whence we get

(45)    β̂ = (Σxtyt − Tx̄ȳ)/(Σxt² − Tx̄²)
          = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)².

This expression is identical to the one under (22), which has been derived via the method of moments. Putting β̂ into the estimating equation for α under (42) gives the same estimate α̂ for the intercept parameter as the one to be found under (22).

The method of least squares does not automatically provide an estimate of σ² = E(εt²). To obtain an estimate, the method of moments is invoked. In view

of the fact that the regression residuals et = yt − α̂ − β̂xt represent estimates of the corresponding values of εt, the method suggests the following estimator:

(46)    σ̃² = (1/T)Σet².

In fact, this is a biased estimator with

(47)    E(σ̃²) = {(T − 2)/T}σ²;

so it is common to adopt the unbiased estimator

(48)    σ̂² = Σet²/(T − 2).

There will be an occasion, later, to demonstrate the unbiasedness of this estimator. To understand the result at an intuitive level, one may recall that the unbiased estimator of the variance of a distribution, which is constructed from a random sample, is σ̂² = (T − 1)⁻¹Σ(xt − x̄)². If the mean of the distribution µ were known and were used in place of x̄, then one should divide by T instead of T − 1 to form σ̂² = T⁻¹Σ(xt − µ)². The effect of using the datum x̄ in place of the unknown mean µ would be to reduce the measure of dispersion. To compensate, the measure is scaled by the factor T/(T − 1). In the context of the regression equation, where two parameters are estimated, the scale factor T/(T − 2) is used.
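The factor (T − 2)/T of (47) can be verified by a Monte Carlo experiment. The following sketch assumes a hypothetical data-generating process with σ² = 4 and a small sample of T = 10 observations; the average of σ̃² over the replications should be close to 3.2, whereas the average of σ̂² should be close to 4.

import numpy as np

rng = np.random.default_rng(5)

# Monte Carlo check of the bias in (46), assuming a hypothetical process
# y = 1 + 0.5x + e with V(e) = 4 and a small sample of T = 10 observations.
alpha, beta, sigma2, T, replications = 1.0, 0.5, 4.0, 10, 20_000
x = rng.uniform(0, 10, size=T)              # fixed regressor values

tilde, hat = [], []
for _ in range(replications):
    y = alpha + beta * x + rng.normal(0.0, np.sqrt(sigma2), size=T)
    b = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
    a = y.mean() - b * x.mean()
    e = y - a - b * x                       # regression residuals
    tilde.append(np.sum(e**2) / T)          # biased estimator (46)
    hat.append(np.sum(e**2) / (T - 2))      # unbiased estimator (48)

print("average of sigma-tilde^2:", np.mean(tilde))   # roughly 4*(T-2)/T = 3.2
print("average of sigma-hat^2:  ", np.mean(hat))     # roughly 4.0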

Properties of the Least-Squares Estimator

Some of the properties of the least-squares estimators that follow from the assumptions that have been made already can be revealed now. We shall consider the likelihood that these assumptions will be fulfilled in practice, as well as some consequences of their violation.

It has been assumed that the disturbance term ε is a random variable with

(49)    E(εt) = 0   and   V(εt) = σ²,   for all t.

We have avoided making statistical assumptions about x, since we are unwilling to assume that its assembled values will manifest the sort of regularities that are inherent in a statistical distribution. Therefore, the assumption that ε is independent of x cannot be expressed in terms of a joint distribution of these quantities; and, in particular, it cannot be asserted that C(x, ε) = 0. However, if the regressor xt has a predetermined value that has no effect on the disturbance εt, then it can be stated that

(50) E(xtεt)=xtE(εt)=0, for all t.


In place of an assumption attributing a finite variance to x, it can be assumed that

(51)    lim(T → ∞) (1/T)Σxt² = mxx < ∞.

For the random sequence {xtεt}, it can also be assumed that

(52)    plim(T → ∞) (1/T)Σxtεt = 0.

To see the effect of these assumptions, the expression

(53)    yt − ȳ = β(xt − x̄) + εt − ε̄

may be substituted into the expression for β̂ of (45). By rearranging the result, it is found that

(54)    β̂ = β + Σ(xt − x̄)εt / Σ(xt − x̄)².

The numerator of the second term on the RHS is obtained with the help of the identity

(55)    Σ(xt − x̄)(εt − ε̄) = Σ(xtεt − x̄εt − xtε̄ + x̄ε̄) = Σ(xt − x̄)εt.

From the assumption under (50), it follows that

(56)    E{(xt − x̄)εt} = (xt − x̄)E(εt) = 0   for all t.

Therefore

(57)    E(β̂) = β + Σ(xt − x̄)E(εt) / Σ(xt − x̄)² = β;

and β̂ is seen to be an unbiased estimator of β.

The consistency of the estimator follows, likewise, from the assumptions under (51) and (52). Thus

(58)    plim(β̂) = β + plim{T⁻¹Σ(xt − x̄)εt} / plim{T⁻¹Σ(xt − x̄)²} = β;

and β̂ is seen to be a consistent estimator of β.

The consistency of β̂ depends crucially upon the assumption that the disturbance term is independent of, or uncorrelated with, the explanatory variable or regressor x. In many econometric contexts, one should be particularly wary of this assumption. For, as we have suggested earlier, the disturbance term is liable to be compounded from the variables that have been omitted from the equation that explains y in terms of x. In a time-dependent context, these variables are liable to be correlated amongst themselves; and there may be little justification for assuming that they are not likewise correlated with x.

There are other reasons of a more subtle nature why the assumption of the independence of ε and x may be violated. The following example illustrates one of the classical problems of econometrics.

Example. In elementary macroeconomic theory, a simple model of the economy is postulated that comprises two equations:

(59) y = c + i,

(60) c = α + βy + ε.

Here, y stands for the gross product of the economy, which is also the income of consumers, i stands for investment and c stands for consumption. An additional identity s = y − c or s = i, where s is savings, is also entailed. The disturbance term ε, which is omitted from the usual presentation in economics textbooks, is assumed to be independent of the variable i. On substituting the consumption function of (60) into the income identity of (59) and rearranging the result, we find that

(61)    y = (α + i + ε)/(1 − β),

from which

(62)    yt − ȳ = {(it − ī) + (εt − ε̄)}/(1 − β).

The ordinary least-squares estimator of the parameter β, which is called the marginal propensity to consume, gives rise to the following equation:

(63)    β̂ = β + Σ(yt − ȳ)εt / Σ(yt − ȳ)².

Equation (61), which shows that y is dependent on ε, suggests that β̂ cannot be a consistent estimator of β.


Figure 2. If the only source of variation in y is the variation in i, then the observations on y and c will delineate the consumption function.

Figure 3. If the only source of variation in y is the disturbance to c, then the observations on y and c will lie along a 45° line.


To determine the probability limit of the estimator, the separate probability limits of the numerator and the denominator of the term on the RHS of (63) must be determined. The following results are available:

        lim (1/T)Σ(it − ī)² = mii = V(i),

(64)    plim (1/T)Σ(yt − ȳ)² = (mii + σ²)/(1 − β)² = V(y),

        plim (1/T)Σ(yt − ȳ)εt = σ²/(1 − β) = C(y, ε).

The results indicate that

(65)    plim(β̂) = β + σ²(1 − β)/(mii + σ²) = (βmii + σ²)/(mii + σ²);

and it can be seen that the limiting value of β̂ has an upward bias which increases as the ratio σ²/mii increases.

On the assumption that the model is valid, it is easy to understand why the parameter of the regression of c on y exceeds the value of the marginal propensity to consume. This can be achieved by considering the extreme cases.

Imagine, first, that σ² = V(ε) = 0. Then, the only source of variation in y and c is the variation in i. In that case, the parameter of the regression of c on y will coincide with β. This is illustrated in Figure 2. Now imagine, instead, that i is constant and that the only variations in c and y are due to ε, which disturbs consumption. Then, the expected value of consumption is provided by the equation c = y − i, in which the coefficient associated with y is unity. Figure 3 illustrates this case. Assuming now that both mii > 0 and σ² > 0, it follows that the value of the regression parameter must lie somewhere in the interval [β, 1].

Although it may be inappropriate for estimating the structural parameter β, the direct regression of c on y does provide the conditional expectation E(c|y); and this endows it with a validity which it retains even if the Keynesian model of (59) and (60) is misspecified. In fact, the simple Keynesian model of (59) and (60) is more an epigram than a serious scientific theory. Common sense dictates that one should give

more credence to the estimate of the conditional expectation E(c|y) than to a putative estimate of the marginal propensity to consume devised within the context of a doubtful model.
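The upward bias of (65) is readily reproduced by simulation. The following sketch assumes hypothetical values for the parameters of the model of (59) and (60); with β = 0.6, mii = 4 and σ² = 1, the probability limit of the least-squares estimate is (0.6 × 4 + 1)/(4 + 1) = 0.68.

import numpy as np

rng = np.random.default_rng(6)

# Hypothetical Keynesian model: c = alpha + beta*y + e and y = c + i,
# so that y = (alpha + i + e)/(1 - beta), as in equation (61).
alpha, beta, sigma2, m_ii, T = 2.0, 0.6, 1.0, 4.0, 100_000

i = rng.normal(10.0, np.sqrt(m_ii), size=T)    # investment, with V(i) = m_ii
e = rng.normal(0.0, np.sqrt(sigma2), size=T)   # consumption disturbance
y = (alpha + i + e) / (1 - beta)
c = alpha + beta * y + e

# Ordinary least-squares regression of c on y
beta_hat = np.sum((y - y.mean()) * (c - c.mean())) / np.sum((y - y.mean())**2)

print("least-squares estimate:", beta_hat)
print("plim from (65):        ", (beta * m_ii + sigma2) / (m_ii + sigma2))  # 0.68
print("true marginal propensity to consume:", beta)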

The Method of Maximum Likelihood

The method of maximum likelihood constitutes a principle of estimation that may be applied to a wide variety of problems. One of the attractions of the method is that, granted the fulfilment of the assumptions on which it is based, it can be shown that the resulting estimates have optimal properties. In general, it can be shown that, at least in large samples, the variance of the resulting estimates is the least that can be achieved by any method.

The cost of using the method is the need to make the assumptions which are necessary to sustain it. It is often difficult to assess, without a great deal of further analysis, the extent to which the desirable properties of the maximum-likelihood estimators survive when these assumptions are not fulfilled. In the case of the regression model, there is considerable knowledge on this account, some of which will be presented in later chapters.

The method can be applied to the regression model with independently and identically distributed disturbances that follow a normal probability law. The probability density functions of the individual disturbances εt; t = 1, . . . , T are given by

(66)    N(εt; 0, σ²) = {1/√(2πσ²)} exp{−εt²/(2σ²)}.

Since the ε's are assumed to be independently distributed, their joint probability density function (p.d.f.) is

(67)    ∏ N(εt; 0, σ²) = (2πσ²)^(−T/2) exp{−(1/2σ²)Σεt²},

where the product and the sum run from t = 1 to T.

If the elements x1, . . . , xT are regarded as a given set of numbers, then it follows that the conditional p.d.f. of the sample y1, . . . , yT is

(68)    f(y1, . . . , yT |x1, . . . , xT) = (2πσ²)^(−T/2) exp{−(1/2σ²)Σ(yt − α − βxt)²}.

The principle of maximum likelihood suggests that α, β and σ² should be estimated by finding the values that maximise the probability measure that is attributed to the sample y1, . . . , yT. That is to say, the events that have generated the sample are regarded as the most likely of all the events that could have occurred.


Notice that, when α, β and σ² are the arguments of the function f rather than its parameters, and when y1, . . . , yT are data values rather than random variables, the function is no longer a probability density function. For this reason, it is apt to be called a likelihood function instead, and it is denoted by L(α, β, σ²) to signify that it is a function of the variable parameters.

The log of the likelihood function, which has the same maximising values as the original function, is

(69)    log L = −(T/2)log(2π) − (T/2)log(σ²) − (1/2σ²)Σ(yt − α − βxt)².

Given the value of σ², the likelihood is maximised by the values α̂ and β̂ that minimise the sum of squares; and the expressions for α̂ and β̂ are already available under (42) and (45) respectively.

The maximum-likelihood estimator for σ² can be obtained from the following first-order condition:

(70)    ∂log L/∂σ² = −T/(2σ²) + (1/2σ⁴)Σ(yt − α − βxt)² = 0.

By multiplying throughout by 2σ⁴/T and rearranging the result, the following estimating equation is derived:

(71)    σ²(α, β) = (1/T)Σ(yt − α − βxt)².

By putting α̂ and β̂ in place of α and β, the estimator σ̃² = σ²(α̂, β̂) = T⁻¹Σet² is derived, which has been given already under (46).
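The maximum-likelihood estimates can also be obtained by a direct numerical maximisation of the log-likelihood function of (69). The following sketch, which assumes that the SciPy library is available and which uses hypothetical data, minimises the negative of the log-likelihood and compares the result with the closed-form least-squares estimates of (42) and (45) and with the variance estimator of (46).

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(7)

# Hypothetical sample generated by y = 1 + 0.5x + e with normal disturbances
T = 200
x = rng.uniform(0, 10, size=T)
y = 1.0 + 0.5 * x + rng.normal(0.0, 2.0, size=T)

def neg_log_likelihood(theta):
    """Negative of the log-likelihood (69), with theta = (alpha, beta, sigma2)."""
    a, b, s2 = theta
    resid = y - a - b * x
    return 0.5 * T * np.log(2 * np.pi) + 0.5 * T * np.log(s2) \
        + np.sum(resid**2) / (2 * s2)

result = minimize(neg_log_likelihood, x0=[0.0, 0.0, 1.0],
                  bounds=[(None, None), (None, None), (1e-6, None)])
alpha_ml, beta_ml, sigma2_ml = result.x

# The closed-form least-squares estimates of (42) and (45) for comparison
beta_ls = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
alpha_ls = y.mean() - beta_ls * x.mean()
e = y - alpha_ls - beta_ls * x
print("maximum likelihood:", alpha_ml, beta_ml, sigma2_ml)
print("least squares:     ", alpha_ls, beta_ls, np.sum(e**2) / T)  # matches (46)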

References

Galton, F., (1869), Hereditary Genius: An Inquiry into its Laws and Consequences, London: Macmillan.

Galton, F., (1886), Regression towards mediocrity in hereditary stature, Journal of the Anthropological Institute of Great Britain and Ireland, 15, 246–263.

Pearson, K., (1896), Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia, Philosophical Transactions of the Royal Society of London, 187, 253–318.

Pearson, K., and A. Lee, (1896), Mathematical contributions to the theory of evolution. On telegony in man, &c., Proceedings of the Royal Society of London, 60, 273–283.

Pearson, K., and A. Lee, (1903), On the laws of inheritance in man: I. Inheritance of physical characters, Biometrika, 2(4), 357–462.
