
Lecture Notes

Univariate, Bivariate, Multivariate Normal, Quadratic Forms & Related Estimations

(1) Univariate Normal Distribution

Probability density function:

X ~ N(, 2 ) : X follows normal distribution of  and  2

( x )2  1 2 f (x)  e 2 ,    x  , x  R 2

Moment generating function:

1  t  2t 2 tx 2 M X (t)  e f (x)dx  e 

Theorem 1 Sampling from the normal population

Let X_1, X_2, \ldots, X_n \overset{i.i.d.}{\sim} N(\mu, \sigma^2), where i.i.d. stands for independent and identically distributed.

1. \bar{X} \sim N\!\left( \mu, \frac{\sigma^2}{n} \right)

2. A function of the sample variance, as shown below, follows the chi-square distribution with (n-1) degrees of freedom:

\frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}

*Reminder: the sample variance S^2 is defined as:

S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n-1}

*Def 1: The chi-square distribution is a special case of another well-known distribution. (*** Please find out which one it is.)

*Def 2: Let Z_1, Z_2, \ldots, Z_k \overset{i.i.d.}{\sim} N(0,1); then W = \sum_{i=1}^{k} Z_i^2 \sim \chi^2_k.

3. \bar{X} and S^2 are independent.

4. The Z-score transformation of the sample mean follows the standard normal distribution:

Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1)

5. A t random variable with n-1 degrees of freedom is obtained by replacing the population standard deviation with its sample counterpart in the Z-score above:

T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}

*Def. of t-distribution (Student’s t-distribution, first introduced by William Sealy Gosset )

Let Z \sim N(0,1) and W \sim \chi^2_k, where Z and W are independent.

Then,

T = \frac{Z}{\sqrt{W/k}} \sim t_k

*Def. of F-distribution

Let V \sim \chi^2_m and W \sim \chi^2_k, where V and W are independent.

Then,

F = \frac{V/m}{W/k} \sim F_{m,k}

It is apparent that

T^2 = \frac{Z^2}{W/k} \sim F_{1,k}
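The sampling-distribution facts in Theorem 1 and the t-statistic relation above can be checked numerically. Below is a minimal Python sketch (using numpy and scipy, with illustrative parameter values that are assumptions of this example, not part of the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 20_000      # assumed values for illustration

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)                     # sample means
s2 = samples.var(axis=1, ddof=1)                # sample variances S^2

# 1. Xbar ~ N(mu, sigma^2/n): compare empirical and theoretical standard deviations
print(xbar.std(), sigma / np.sqrt(n))

# 2. (n-1)S^2/sigma^2 ~ chi^2_{n-1}: its mean should be close to n-1
print(((n - 1) * s2 / sigma**2).mean(), n - 1)

# 5. T = (Xbar - mu)/(S/sqrt(n)) ~ t_{n-1}: compare an upper-tail quantile
t_stat = (xbar - mu) / np.sqrt(s2 / n)
print(np.quantile(t_stat, 0.975), stats.t.ppf(0.975, df=n - 1))
```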


(2) Bivariate Normal Distribution

(X, Y) \sim BN(\mu_X, \sigma_X^2;\ \mu_Y, \sigma_Y^2;\ \rho), where \rho is the correlation between X and Y.

The joint p.d.f. of (푋, 푌) is

f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}} \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left[ \left(\frac{x-\mu_X}{\sigma_X}\right)^2 - 2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right) + \left(\frac{y-\mu_Y}{\sigma_Y}\right)^2 \right] \right\}

where -\infty < x < \infty and -\infty < y < \infty. Then X and Y are said to have the bivariate normal distribution.

The joint moment generating function for X and Y is

M(t_1, t_2) = \exp\!\left[ t_1\mu_X + t_2\mu_Y + \frac{1}{2}\left( t_1^2\sigma_X^2 + 2\rho t_1 t_2 \sigma_X\sigma_Y + t_2^2\sigma_Y^2 \right) \right]

Questions:

(a) Find the marginal pdf’s of X and Y;

(b) Prove that X and Y are independent if and only if ρ = 0.

(Here ρ is the population correlation coefficient between X and Y.)

(c) Find the distribution of (X + Y).

(d) Find the conditional pdfs f(x|y) and f(y|x).

Solution:

(a)

The moment generating function of X can be given by

M_X(t) = M(t, 0) = \exp\!\left( \mu_X t + \frac{1}{2}\sigma_X^2 t^2 \right).

Similarly, the moment generating function of Y can be given by

M_Y(t) = M(0, t) = \exp\!\left( \mu_Y t + \frac{1}{2}\sigma_Y^2 t^2 \right).

Thus, X and Y are both marginally normally distributed, i.e.,

X \sim N(\mu_X, \sigma_X^2), \quad Y \sim N(\mu_Y, \sigma_Y^2).

The pdf of X is


f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma_X} \exp\!\left( -\frac{(x-\mu_X)^2}{2\sigma_X^2} \right).

The pdf of Y is

f_Y(y) = \frac{1}{\sqrt{2\pi}\,\sigma_Y} \exp\!\left( -\frac{(y-\mu_Y)^2}{2\sigma_Y^2} \right).

(b)

If \rho = 0, then

M(t_1, t_2) = \exp\!\left[ \mu_X t_1 + \mu_Y t_2 + \frac{1}{2}\left( \sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2 \right) \right] = M(t_1, 0) \cdot M(0, t_2)

Therefore, X and Y are independent.

If X and Y are independent, then

M(t_1, t_2) = M(t_1, 0) \cdot M(0, t_2) = \exp\!\left[ \mu_X t_1 + \mu_Y t_2 + \frac{1}{2}\left( \sigma_X^2 t_1^2 + \sigma_Y^2 t_2^2 \right) \right],

while by the general formula

M(t_1, t_2) = \exp\!\left[ \mu_X t_1 + \mu_Y t_2 + \frac{1}{2}\left( \sigma_X^2 t_1^2 + 2\rho\sigma_X\sigma_Y t_1 t_2 + \sigma_Y^2 t_2^2 \right) \right].

Comparing the two expressions, therefore \rho = 0.

(c)

M_{X+Y}(t) = E\!\left[ e^{t(X+Y)} \right] = E\!\left[ e^{tX + tY} \right]

Recall that M(t_1, t_2) = E\!\left[ e^{t_1 X + t_2 Y} \right]; therefore we can obtain M_{X+Y}(t) by setting t_1 = t_2 = t in M(t_1, t_2).

That is,

M_{X+Y}(t) = M(t, t) = \exp\!\left[ \mu_X t + \mu_Y t + \frac{1}{2}\left( \sigma_X^2 t^2 + 2\rho\sigma_X\sigma_Y t^2 + \sigma_Y^2 t^2 \right) \right] = \exp\!\left[ (\mu_X + \mu_Y)t + \frac{1}{2}\left( \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2 \right) t^2 \right]

\therefore\ X + Y \sim N\!\left( \mu = \mu_X + \mu_Y,\ \sigma^2 = \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2 \right)
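As a quick numerical sanity check of part (c), the following sketch (Python with numpy; the parameter values are illustrative assumptions) draws correlated bivariate normal pairs and compares the sample mean and variance of X + Y with \mu_X + \mu_Y and \sigma_X^2 + 2\rho\sigma_X\sigma_Y + \sigma_Y^2:

```python
import numpy as np

# Illustrative parameter values (assumptions, not from the notes)
mu_x, mu_y, sig_x, sig_y, rho = 1.0, -2.0, 1.5, 0.5, 0.6
mean = [mu_x, mu_y]
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]

rng = np.random.default_rng(1)
x, y = rng.multivariate_normal(mean, cov, size=200_000).T
s = x + y

print(s.mean(), mu_x + mu_y)                                   # mean of X + Y
print(s.var(), sig_x**2 + 2 * rho * sig_x * sig_y + sig_y**2)  # variance of X + Y
```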


(d)

The conditional distribution of X given Y=y is given by

f(x \mid y) = \frac{f(x, y)}{f_Y(y)} = \frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}} \exp\!\left\{ -\frac{\left[ x - \mu_X - \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y) \right]^2}{2(1-\rho^2)\sigma_X^2} \right\}.

Similarly, the conditional distribution of Y given X = x is

f(y \mid x) = \frac{f(x, y)}{f_X(x)} = \frac{1}{\sqrt{2\pi}\,\sigma_Y\sqrt{1-\rho^2}} \exp\!\left\{ -\frac{\left[ y - \mu_Y - \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X) \right]^2}{2(1-\rho^2)\sigma_Y^2} \right\}.

Therefore:

X \mid Y = y \ \sim\ N\!\left( \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y - \mu_Y),\ (1-\rho^2)\sigma_X^2 \right)

Y \mid X = x \ \sim\ N\!\left( \mu_Y + \rho\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\ (1-\rho^2)\sigma_Y^2 \right)
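The conditional-distribution formulas in (d) can also be checked by simulation: draw many (X, Y) pairs, keep those with Y near a chosen value y_0, and compare the conditional sample mean and variance of X with \mu_X + \rho\frac{\sigma_X}{\sigma_Y}(y_0 - \mu_Y) and (1-\rho^2)\sigma_X^2. A minimal sketch (Python/numpy; the parameter values are illustrative assumptions):

```python
import numpy as np

mu_x, mu_y, sig_x, sig_y, rho = 0.0, 1.0, 2.0, 1.0, 0.7   # assumed values
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]

rng = np.random.default_rng(2)
x, y = rng.multivariate_normal([mu_x, mu_y], cov, size=1_000_000).T

y0 = 2.0                          # condition on Y being close to y0
keep = np.abs(y - y0) < 0.02
print(x[keep].mean(), mu_x + rho * (sig_x / sig_y) * (y0 - mu_y))  # conditional mean
print(x[keep].var(), (1 - rho**2) * sig_x**2)                      # conditional variance
```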


(3) Multivariate Normal Distribution

Let \mathbf{Y} = (Y_1, \ldots, Y_k)' \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}). The general formula for the k-dimensional normal density is

f_{\mathbf{Y}}(y_1, \ldots, y_k) = \frac{1}{(2\pi)^{k/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}) \right]

where E(\mathbf{Y}) = \boldsymbol{\mu} and \boldsymbol{\Sigma} is the covariance matrix of \mathbf{Y}.

Now recall the bivariate normal case, where n = 2. We have:

\boldsymbol{\mu} = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}

\boldsymbol{\Sigma} = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix} = \begin{pmatrix} \sigma_1^2 & \rho\sigma_1\sigma_2 \\ \rho\sigma_1\sigma_2 & \sigma_2^2 \end{pmatrix}, \qquad \sigma_{12} = \mathrm{Cov}(Y_1, Y_2)

\boldsymbol{\Sigma}^{-1} = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)} \begin{pmatrix} \sigma_2^2 & -\rho\sigma_1\sigma_2 \\ -\rho\sigma_1\sigma_2 & \sigma_1^2 \end{pmatrix} = \frac{1}{1-\rho^2} \begin{pmatrix} 1/\sigma_1^2 & -\rho/(\sigma_1\sigma_2) \\ -\rho/(\sigma_1\sigma_2) & 1/\sigma_2^2 \end{pmatrix}

Thus the joint density of Y_1 and Y_2 is

f_{Y_1, Y_2}(y_1, y_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left\{ -\frac{1}{2(1-\rho^2)} \left[ \frac{(y_1-\mu_1)^2}{\sigma_1^2} + \frac{(y_2-\mu_2)^2}{\sigma_2^2} - \frac{2\rho(y_1-\mu_1)(y_2-\mu_2)}{\sigma_1\sigma_2} \right] \right\}

Example: plot of the bivariate pdf of \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} with mean \boldsymbol{\mu} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} and covariance \boldsymbol{\Sigma} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. (Figure not reproduced here.)


*Marginal distribution.

Let \mathbf{Y} be partitioned so that \mathbf{Y} = \begin{pmatrix} \mathbf{Y}_a \\ \mathbf{Y}_b \end{pmatrix}, where \mathbf{Y}_a = [Y_1, \ldots, Y_k]^T and \mathbf{Y}_b = [Y_{k+1}, \ldots, Y_n]^T. If \mathbf{Y} is multivariate normal

with mean \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix} and covariance matrix \boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix}, then the marginal distribution of \mathbf{Y}_a is also multivariate Gaussian, with mean \boldsymbol{\mu}_a = [\mu_1, \ldots, \mu_k]^T and covariance matrix \boldsymbol{\Sigma}_{aa}.

* Conditional distribution

Let \mathbf{Y} be partitioned so that \mathbf{Y} = \begin{pmatrix} \mathbf{Y}_a \\ \mathbf{Y}_b \end{pmatrix}. If \mathbf{Y} is multivariate Gaussian with mean \boldsymbol{\mu} = \begin{pmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{pmatrix} and covariance matrix

\boldsymbol{\Sigma} = \begin{pmatrix} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{pmatrix}, then the conditional distribution of \mathbf{Y}_a given \mathbf{Y}_b = \mathbf{y}_b is also multivariate Gaussian, with mean

\boldsymbol{\mu}_{a|b} = \boldsymbol{\mu}_a + \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}(\mathbf{y}_b - \boldsymbol{\mu}_b)

and covariance

\boldsymbol{\Sigma}_{aa|b} = \boldsymbol{\Sigma}_{aa} - \boldsymbol{\Sigma}_{ab}\boldsymbol{\Sigma}_{bb}^{-1}\boldsymbol{\Sigma}_{ba}.
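The conditional mean and covariance of a partitioned multivariate normal can be computed directly with a few matrix operations. A minimal sketch in Python/numpy (the partition sizes, \boldsymbol{\mu}, \boldsymbol{\Sigma}, and the observed y_b are illustrative assumptions):

```python
import numpy as np

# Assumed 3-dimensional example: Ya = (Y1,), Yb = (Y2, Y3)
mu = np.array([1.0, 0.0, 2.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
k = 1                                       # dimension of Ya
yb = np.array([0.5, 1.0])                   # observed value of Yb

mu_a, mu_b = mu[:k], mu[k:]
S_aa, S_ab = Sigma[:k, :k], Sigma[:k, k:]
S_ba, S_bb = Sigma[k:, :k], Sigma[k:, k:]

# mu_{a|b} = mu_a + S_ab S_bb^{-1} (yb - mu_b)
mu_cond = mu_a + S_ab @ np.linalg.solve(S_bb, yb - mu_b)
# Sigma_{a|b} = S_aa - S_ab S_bb^{-1} S_ba
Sigma_cond = S_aa - S_ab @ np.linalg.solve(S_bb, S_ba)

print(mu_cond, Sigma_cond)
```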

Theorem: t 1 2 Q  Y  μ  Σ Y  μ  ~  n

[proof:]

Since \boldsymbol{\Sigma} is positive definite, \boldsymbol{\Sigma} = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}', where \mathbf{T} is a real orthogonal matrix (\mathbf{T}\mathbf{T}' = \mathbf{T}'\mathbf{T} = \mathbf{I}) and

\boldsymbol{\Lambda} = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}.

Then \boldsymbol{\Sigma}^{-1} = (\mathbf{T}\boldsymbol{\Lambda}\mathbf{T}')^{-1} = \mathbf{T}\boldsymbol{\Lambda}^{-1}\mathbf{T}'. Thus,

Q = (\mathbf{Y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \boldsymbol{\mu}) = (\mathbf{Y} - \boldsymbol{\mu})'\mathbf{T}\boldsymbol{\Lambda}^{-1}\mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}) = \mathbf{X}'\boldsymbol{\Lambda}^{-1}\mathbf{X}, \qquad \text{where } \mathbf{X} = \mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}).

Further,


Q = \mathbf{X}'\boldsymbol{\Lambda}^{-1}\mathbf{X} = \begin{pmatrix} X_1 & X_2 & \cdots & X_n \end{pmatrix} \begin{pmatrix} \frac{1}{\lambda_1} & 0 & \cdots & 0 \\ 0 & \frac{1}{\lambda_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \frac{1}{\lambda_n} \end{pmatrix} \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix} = \sum_{i=1}^{n} \frac{X_i^2}{\lambda_i} = \sum_{i=1}^{n} \left( \frac{X_i}{\sqrt{\lambda_i}} \right)^2

Therefore, if we can prove that X_i \sim N(0, \lambda_i) and that the X_i are mutually independent, then

\frac{X_i}{\sqrt{\lambda_i}} \sim N(0,1), \qquad Q = \sum_{i=1}^{n} \left( \frac{X_i}{\sqrt{\lambda_i}} \right)^2 \sim \chi^2_n.

The joint density function of X_1, X_2, \ldots, X_n is

g x   g x1 , x 2 , , x n   f y J , where

  y y y    1 1  1   x x x    1 2 n         y2 y2 y2   yi   J  det    det  x x x    x   1 2 n    j              yn yn yn         x1 x2 xn      X  T t Y    μ   detT    Y  μ  TX    Y    T   X   det TT t  detI   1       1   detT detT t  det 2 T        detT   1   1


Therefore, the density function of X_1, X_2, \ldots, X_n is

g x  f y

n 1 2 2  1   1    1 t 1       exp  y  μ   y  μ   2   det    2  n 1 2 2  1   1    1 t 1       exp  x  x  2   det    2  n 1 2 2 2 n  1   1    1 xi       exp     2   det    2 i1 i  1     2  t  n  det    det TT    n 2   1 2 1   1 x      i  t     n exp     det TT    det I   2    2 i1      i  n  i     i1    det     i   i1 

1  1 2  2 n  1  2  1    x       exp i    2     2  i1    i   i   

Therefore, X_i \sim N(0, \lambda_i) and X_1, \ldots, X_n are mutually independent.
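The theorem can also be checked numerically: draw \mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) repeatedly, compute Q = (\mathbf{Y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{Y} - \boldsymbol{\mu}), and compare with the \chi^2_n distribution. A minimal sketch (Python/numpy/scipy; the particular \boldsymbol{\mu} and \boldsymbol{\Sigma} are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)            # an arbitrary positive definite covariance

Y = rng.multivariate_normal(mu, Sigma, size=100_000)
diff = Y - mu
Q = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)  # (Y-mu)' Sigma^{-1} (Y-mu)

# Compare empirical quantities with the chi^2_3 distribution
print(Q.mean(), 3)                                    # E[chi^2_n] = n
print(np.quantile(Q, 0.95), stats.chi2.ppf(0.95, 3))  # 95th percentile
```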

Moment Generating Function of Multivariate Normal Random Variable:

Let

\mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad \mathbf{t} = \begin{pmatrix} t_1 \\ t_2 \\ \vdots \\ t_n \end{pmatrix}.

Then, the moment generating function for Y is

t M Y t   M Y t1, t2 , ,tn   Eexp t Y 

 Eexp t1Y1  t2Y2    tnYn  

 1   exp  t tμ  t t t   2  9

Theorem: If \mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) and \mathbf{C} is a p \times n matrix of rank p, then \mathbf{C}\mathbf{Y} \sim N(\mathbf{C}\boldsymbol{\mu}, \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}').

[proof:]

Let X  CY . Then,

t t M X t   Eexp t X  Eexp t CY   s  C t t   E exp s tY      t t    s  t C   1   exp  s tμ  s t s   2   1   exp  t t Cμ  t t CC t t   2 

Since M_{\mathbf{X}}(\mathbf{t}) is the moment generating function of N(\mathbf{C}\boldsymbol{\mu}, \mathbf{C}\boldsymbol{\Sigma}\mathbf{C}'),

CY ~ N Cμ, CC t . ◆

Corollary: If \mathbf{Y} \sim N(\boldsymbol{\mu}, \sigma^2\mathbf{I}), then

\mathbf{T}\mathbf{Y} \sim N(\mathbf{T}\boldsymbol{\mu}, \sigma^2\mathbf{I}), where \mathbf{T} is an orthogonal matrix.

Theorem: If \mathbf{Y} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), then the marginal distribution of any subset of the elements of \mathbf{Y} is also multivariate normal.

That is, if \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), then \mathbf{Y}^* = \begin{pmatrix} Y_{i_1} \\ Y_{i_2} \\ \vdots \\ Y_{i_m} \end{pmatrix} \sim N(\boldsymbol{\mu}^*, \boldsymbol{\Sigma}^*), where

m \le n, \quad i_1, i_2, \ldots, i_m \in \{1, 2, \ldots, n\}, \quad \boldsymbol{\mu}^* = \begin{pmatrix} \mu_{i_1} \\ \mu_{i_2} \\ \vdots \\ \mu_{i_m} \end{pmatrix}, \quad \boldsymbol{\Sigma}^* = \begin{pmatrix} \sigma^2_{i_1 i_1} & \sigma^2_{i_1 i_2} & \cdots & \sigma^2_{i_1 i_m} \\ \sigma^2_{i_2 i_1} & \sigma^2_{i_2 i_2} & \cdots & \sigma^2_{i_2 i_m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma^2_{i_m i_1} & \sigma^2_{i_m i_2} & \cdots & \sigma^2_{i_m i_m} \end{pmatrix}

Theorem: \mathbf{Y} has a multivariate normal distribution if and only if \mathbf{a}'\mathbf{Y} is univariate normal for all real vectors \mathbf{a}.

[proof:]

:

Suppose E Y   μ , V Y    . a t Y is univariate normal. Also,

E(\mathbf{a}'\mathbf{Y}) = \mathbf{a}'E(\mathbf{Y}) = \mathbf{a}'\boldsymbol{\mu}, \qquad V(\mathbf{a}'\mathbf{Y}) = \mathbf{a}'V(\mathbf{Y})\mathbf{a} = \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}.

Then, \mathbf{a}'\mathbf{Y} \sim N(\mathbf{a}'\boldsymbol{\mu}, \mathbf{a}'\boldsymbol{\Sigma}\mathbf{a}). Since for a univariate normal Z \sim N(\mu, \sigma^2) we have M_Z(1) = E[\exp(Z)] = \exp\!\left( \mu + \frac{1}{2}\sigma^2 \right), taking Z = \mathbf{a}'\mathbf{Y} gives

E\!\left[ \exp(\mathbf{a}'\mathbf{Y}) \right] = \exp\!\left( \mathbf{a}'\boldsymbol{\mu} + \frac{1}{2}\mathbf{a}'\boldsymbol{\Sigma}\mathbf{a} \right) = M_{\mathbf{Y}}(\mathbf{a})

Since

M_{\mathbf{Y}}(\mathbf{a}) = \exp\!\left( \mathbf{a}'\boldsymbol{\mu} + \frac{1}{2}\mathbf{a}'\boldsymbol{\Sigma}\mathbf{a} \right)

holds for every real vector \mathbf{a}, and this is the moment generating function of N(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \mathbf{Y} has the multivariate normal distribution N(\boldsymbol{\mu}, \boldsymbol{\Sigma}).

:

This follows from the previous theorem on linear transformations (take \mathbf{C} = \mathbf{a}'). ◆


(4) Quadratic forms in normal variables

Theorem: If Y ~ N μ, 2 I  and let P be an n  n symmetric matrix of rank r. Then,

t Q  Y  μ  P Y  μ   2

2 2 is distributed as  r if and only if P  P (i.e., P is idempotent).

[proof]

:

Suppose P 2  P and rank P   r . Then, P has r eigenvalues equal to 1 and n  r eigenvalues equal to 0. Thus, without loss generalization,

1 0  0  0    0   0  0 

t    1     t P  T  T  T  T where T is an orthogonal matrix. Then, 0 0  0  0            0 0  0  0 

Y  μt PY  μ Y  μt TT t Y  μ Q    2  2 t Z Z t t  Z  T Y  μ   Z1 Z 2  Z n    2

Z1    1 Z  Z Z  Z  2   2 1 2 n      Z n  Z 2  Z 2  Z 2  1 2 r  2

Since \mathbf{Z} = \mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}) and \mathbf{Y} - \boldsymbol{\mu} \sim N(\mathbf{0}, \sigma^2\mathbf{I}),

\mathbf{Z} = \mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}) \sim N(\mathbf{T}'\mathbf{0}, \mathbf{T}'\sigma^2\mathbf{I}\mathbf{T}) = N(\mathbf{0}, \sigma^2\mathbf{I}).


Z_1, Z_2, \ldots, Z_n are i.i.d. normal random variables with common variance \sigma^2. Therefore,

Q = \frac{Z_1^2 + Z_2^2 + \cdots + Z_r^2}{\sigma^2} = \left( \frac{Z_1}{\sigma} \right)^2 + \left( \frac{Z_2}{\sigma} \right)^2 + \cdots + \left( \frac{Z_r}{\sigma} \right)^2 \sim \chi^2_r

:

Since \mathbf{P} is symmetric, \mathbf{P} = \mathbf{T}\boldsymbol{\Lambda}\mathbf{T}', where \mathbf{T} is an orthogonal matrix and \boldsymbol{\Lambda} is a diagonal matrix with nonzero elements \lambda_1, \lambda_2, \ldots, \lambda_r. Thus, let \mathbf{Z} = \mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}). Since \mathbf{Y} - \boldsymbol{\mu} \sim N(\mathbf{0}, \sigma^2\mathbf{I}),

\mathbf{Z} = \mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}) \sim N(\mathbf{T}'\mathbf{0}, \mathbf{T}'\sigma^2\mathbf{I}\mathbf{T}) = N(\mathbf{0}, \sigma^2\mathbf{I}).

That is, Z_1, Z_2, \ldots, Z_n are independent normal random variables with variance \sigma^2. Then,

Q = \frac{(\mathbf{Y} - \boldsymbol{\mu})'\mathbf{P}(\mathbf{Y} - \boldsymbol{\mu})}{\sigma^2} = \frac{(\mathbf{Y} - \boldsymbol{\mu})'\mathbf{T}\boldsymbol{\Lambda}\mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu})}{\sigma^2} = \frac{\mathbf{Z}'\boldsymbol{\Lambda}\mathbf{Z}}{\sigma^2} = \frac{\sum_{i=1}^{r} \lambda_i Z_i^2}{\sigma^2}, \qquad \mathbf{Z} = \mathbf{T}'(\mathbf{Y} - \boldsymbol{\mu}) = (Z_1, Z_2, \ldots, Z_n)'

r 2  i Z i The moment generating function of Q  i 1 is  2

  r    Z 2    i i  r   t Z 2  E exp  t i 1   E exp  i i   2     2     i 1          

r  1 t z 2    z 2  r  1   z 2 1 2 t  exp i i exp i dz  exp i i dz   2  2   2  i   2  2  i i1  2     2  i1  2  2  r  2 r r 1 1 2 t   z 1 2 t  1 1  i exp i i dz   1 2 t 2   2  2  i   i  i1 1 2it  2  2  i1 1 2it i1


Also, since Q is distributed as \chi^2_r, its moment generating function is also equal to (1 - 2t)^{-r/2}. Thus, for every t,

r  r 1 2 2 E exp tQ    1  2t    1  2 i t  i 1

Further,

(1 - 2t)^r = \prod_{i=1}^{r} (1 - 2\lambda_i t).

By the uniqueness of polynomial roots, we must have \lambda_i = 1 for all i. Then, \mathbf{P}^2 = \mathbf{P} by the following result: if a matrix \mathbf{P} is symmetric, then \mathbf{P} is idempotent of rank r if and only if it has r eigenvalues equal to 1 and n - r eigenvalues equal to 0. ◆
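To see the idempotency condition in action numerically, one can take \mathbf{P} to be a projection matrix (hence idempotent), for example the hat matrix of a design matrix, and check that (\mathbf{Y} - \boldsymbol{\mu})'\mathbf{P}(\mathbf{Y} - \boldsymbol{\mu})/\sigma^2 behaves like \chi^2 with r = \mathrm{rank}(\mathbf{P}) degrees of freedom. A minimal sketch (Python/numpy/scipy; the design matrix and parameter values are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, sigma = 8, 1.3
X = np.column_stack([np.ones(n), rng.normal(size=n)])    # n x 2 design matrix
P = X @ np.linalg.inv(X.T @ X) @ X.T                      # projection matrix: idempotent, rank 2
r = int(round(np.trace(P)))                               # rank = trace for an idempotent matrix

mu = rng.normal(size=n)
Y = mu + sigma * rng.normal(size=(100_000, n))            # Y ~ N(mu, sigma^2 I), one row per replicate
D = Y - mu
Q = np.einsum('ij,jk,ik->i', D, P, D) / sigma**2          # (Y-mu)' P (Y-mu) / sigma^2

print(r)                                                  # should be 2
print(Q.mean(), r)                                        # E[chi^2_r] = r
print(np.quantile(Q, 0.9), stats.chi2.ppf(0.9, r))        # 90th percentile
```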

Important Result:

Let \mathbf{Y} \sim N(\mathbf{0}, \mathbf{I}) and let Q_1 = \mathbf{Y}'\mathbf{P}_1\mathbf{Y} and Q_2 = \mathbf{Y}'\mathbf{P}_2\mathbf{Y} both be distributed as chi-square. Then, Q_1 and Q_2 are independent if and only if \mathbf{P}_1\mathbf{P}_2 = \mathbf{0}.

Useful Lemma:

If \mathbf{P}_1^2 = \mathbf{P}_1, \mathbf{P}_2^2 = \mathbf{P}_2, and \mathbf{P}_1 - \mathbf{P}_2 is positive semidefinite, then

(i) \mathbf{P}_1\mathbf{P}_2 = \mathbf{P}_2\mathbf{P}_1 = \mathbf{P}_2;

(ii) \mathbf{P}_1 - \mathbf{P}_2 is idempotent.

Theorem: If Y ~ N μ , 2 I  and let

Y μt PY μ Y μt P Y μ Q  1 , Q  2 1  2 2  2

2 If Q ~  2 , Q ~  2 , Q  Q  0 , then Q  Q and are independent and Q  Q ~  . 1 r1 2 r2 1 2 1 2 Q 2 1 2 r1  r2

[proof:]


We first prove Q_1 - Q_2 \sim \chi^2_{r_1 - r_2}. Since Q_1 - Q_2 \ge 0,

Q_1 - Q_2 = \frac{(\mathbf{Y} - \boldsymbol{\mu})'(\mathbf{P}_1 - \mathbf{P}_2)(\mathbf{Y} - \boldsymbol{\mu})}{\sigma^2} \ge 0.

Since \mathbf{Y} - \boldsymbol{\mu} \sim N(\mathbf{0}, \sigma^2\mathbf{I}), \mathbf{Y} - \boldsymbol{\mu} can take any value in \mathbb{R}^n. Therefore, \mathbf{P}_1 - \mathbf{P}_2 is positive semidefinite. By the useful lemma above, \mathbf{P}_1 - \mathbf{P}_2 is idempotent. Further, by the previous theorem,

t Y  μ P1  P2 Y  μ 2 Q  Q  2 ~  1 2  r1 r2 since

rankP1  P2   trP1  P2   trP1 trP2 

 rankP1  rank P2

 r1  r2

We now prove that Q_1 - Q_2 and Q_2 are independent. Since \mathbf{P}_1\mathbf{P}_2 = \mathbf{P}_2\mathbf{P}_1 = \mathbf{P}_2,

(\mathbf{P}_1 - \mathbf{P}_2)\mathbf{P}_2 = \mathbf{P}_1\mathbf{P}_2 - \mathbf{P}_2\mathbf{P}_2 = \mathbf{P}_2 - \mathbf{P}_2 = \mathbf{0}.

By the previous important result, the proof is complete. ◆


(5) Regression and Simple Time Series Analysis

1. Let (x_1, y_1), \ldots, (x_n, y_n) be n points. Please derive the least squares regression line for the usual simple linear regression model: y_i = \beta_0 + \beta_1 x_i + \varepsilon_i.

Solution:

To minimize the sum (over all n points) of squared vertical distances from each point to the line:

SS = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2:

\begin{cases} \dfrac{\partial SS}{\partial \beta_0} = 0 \\[2mm] \dfrac{\partial SS}{\partial \beta_1} = 0 \end{cases}

\Rightarrow \begin{cases} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \\[2mm] \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0 \end{cases}

\Rightarrow \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}

\Rightarrow \sum_{i=1}^{n} x_i \left( y_i - \bar{y} + \hat{\beta}_1\bar{x} - \hat{\beta}_1 x_i \right) = 0

\Rightarrow \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i (y_i - \bar{y})}{\sum_{i=1}^{n} x_i (x_i - \bar{x})} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{S_{xy}}{S_{xx}}

\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x is the least squares regression line.
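A minimal numerical sketch of the closed-form estimates (Python/numpy; the data-generating values are illustrative assumptions), comparing the S_{xy}/S_{xx} formula with numpy's least-squares solver:

```python
import numpy as np

rng = np.random.default_rng(6)
n, beta0, beta1, sigma = 50, 2.0, 0.8, 0.5      # assumed true values
x = rng.uniform(0, 10, size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)  # S_xy / S_xx
b0_hat = y.mean() - b1_hat * x.mean()

# Cross-check with numpy's least-squares routine
X = np.column_stack([np.ones(n), x])
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(b0_hat, b1_hat)
print(b_lstsq)
```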


2. Derive the least squares regression line for simple linear regression through the origin:

y_i = \beta_1 x_i + \varepsilon_i

Solution:

To minimize

SS = \sum_{i=1}^{n} (y_i - \beta_1 x_i)^2:

\frac{\partial SS}{\partial \beta_1} = 0 \ \Rightarrow\ \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_1 x_i \right) = 0 \ \Rightarrow\ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}

\hat{y} = \hat{\beta}_1 x is the least squares regression line through the origin.


3. Let (x_1, y_1), \ldots, (x_n, y_n) be n observed points. Our goal is to fit a simple linear regression model using the maximum likelihood estimator. The model and assumptions are as follows:

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.

We assume that the x_i's are given (we condition on the x's, so that the x's are not treated as random variables here), and \varepsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2).

Solution:

Based on the assumptions that

y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad i = 1, 2, \ldots, n, \quad x_i \text{ given}, \quad \varepsilon_i \overset{i.i.d.}{\sim} N(0, \sigma^2),

we can derive the p.d.f. of y_i:

y_i \overset{\text{independent, but not i.i.d.}}{\sim} N(\beta_0 + \beta_1 x_i, \sigma^2) \iff f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left\{ -\frac{[y_i - (\beta_0 + \beta_1 x_i)]^2}{2\sigma^2} \right\}.

Thus, the likelihood and log-likelihood function can be derived as follows:

Likelihood: \mathcal{L} = f(y_1, y_2, \ldots, y_n) = \prod_{i=1}^{n} f(y_i) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left\{ -\frac{[y_i - (\beta_0 + \beta_1 x_i)]^2}{2\sigma^2} \right\} = (2\pi\sigma^2)^{-n/2} \exp\!\left\{ -\frac{\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2}{2\sigma^2} \right\},

Log-likelihood: l = \log\mathcal{L} = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2.

To maximize the likelihood function, which is equivalent to maximizing the log-likelihood function, we take the first partial derivatives of the log-likelihood with respect to each parameter and set them to 0.

\begin{cases} \dfrac{\partial l}{\partial \beta_0} = \dfrac{1}{\sigma^2}\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)] = 0 \\[2mm] \dfrac{\partial l}{\partial \beta_1} = \dfrac{1}{\sigma^2}\sum_{i=1}^{n} x_i [y_i - (\beta_0 + \beta_1 x_i)] = 0 \\[2mm] \dfrac{\partial l}{\partial \sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^{n} [y_i - (\beta_0 + \beta_1 x_i)]^2 = 0 \end{cases}

By solving the three equations above, we can derive the MLEs as shown below:


\begin{cases} \hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \\[2mm] \hat{\beta}_1 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \dfrac{S_{xy}}{S_{xx}} \\[2mm] \hat{\sigma}^2 = \dfrac{1}{n}\sum_{i=1}^{n} [y_i - (\hat{\beta}_0 + \hat{\beta}_1 x_i)]^2 \end{cases}

Plugging in the MLEs of the model parameters above, we obtain the fitted regression model:

\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i, \quad i = 1, 2, \ldots, n.
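As the derivation shows, the MLEs of \beta_0 and \beta_1 coincide with the least squares estimates, while \hat{\sigma}^2 divides by n rather than n - 1. A minimal sketch checking this numerically (Python/numpy; the simulated data and parameter values are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)
n, beta0, beta1, sigma = 100, 1.0, -0.5, 0.8     # assumed true values
x = rng.normal(size=n)
y = beta0 + beta1 * x + rng.normal(0, sigma, size=n)

# MLEs (identical to OLS for the intercept and slope)
b1_mle = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean())**2)
b0_mle = y.mean() - b1_mle * x.mean()
resid = y - (b0_mle + b1_mle * x)
sigma2_mle = np.mean(resid**2)                   # divides by n, not n - 1

print(b0_mle, b1_mle, sigma2_mle)
print(resid @ resid / (n - 2))                   # usual unbiased estimator, for comparison
```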

Another way to write down the exact likelihood is to use the multivariate normal distribution directly because the likelihood is simply the joint distribution of the entire sample.

Let \mathbf{Y} = (Y_1, \ldots, Y_n)'. The general formula for the n-dimensional multivariate normal density function is

f(y_1, \ldots, y_n) = \frac{1}{(2\pi)^{n/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2}(\mathbf{y} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{y} - \boldsymbol{\mu}) \right]

where E(\mathbf{Y}) = \boldsymbol{\mu} and \boldsymbol{\Sigma} is the covariance matrix of \mathbf{Y}.

Given that we have i.i.d. Gaussian white noise, we can write down the exact likelihood of our regression dependent variables \mathbf{Y} = (Y_1, \ldots, Y_n)', conditioning on \{X_1, \ldots, X_n\}, as a multivariate normal density function directly, as follows:

L(\mathbf{y}; \boldsymbol{\theta}) = f(y_1, \ldots, y_n; \boldsymbol{\theta})

where \boldsymbol{\theta} = (\beta_0, \beta_1, \sigma^2)'. We have:

E(\mathbf{Y}) = \boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta} = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix} \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}

\mathrm{Var}(\mathbf{Y}) = \boldsymbol{\Sigma} = \sigma^2 \begin{pmatrix} 1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & 1 \end{pmatrix} = \sigma^2\mathbf{I}


4. Consider the stationary autoregressive model of order 1, that is, an AR(1) process: X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t, where \{\varepsilon_t\} is a sequence of Gaussian white noise.

*Definition: A process \{X_t\} is called (second-order or weakly) stationary if its mean is constant and its autocovariance function depends only on the lag, that is,

E[X(t)] = \mu

\mathrm{Cov}[X(t), X(t+\tau)] = \gamma(\tau)

a) Please derive the conditional MLE for \beta_0, \beta_1 and show that they are the same as the OLS estimates. (Assume that we have observed \{x_1, x_2, \ldots, x_N\}.)

b) Please write down the exact (full) likelihood.

c) Please derive the method of moments estimator (MOME) of the model parameters \beta_0, \beta_1. Are they the same as the OLS estimates?

Solution:

a) The conditional likelihood:

L = f(x_2 \mid x_1) f(x_3 \mid x_1, x_2) \cdots f(x_N \mid x_1, \ldots, x_{N-1}) = f(x_2 \mid x_1) f(x_3 \mid x_2) \cdots f(x_N \mid x_{N-1})

= \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[ -\frac{(x_2 - \beta_0 - \beta_1 x_1)^2}{2\sigma^2} \right] \cdot \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[ -\frac{(x_3 - \beta_0 - \beta_1 x_2)^2}{2\sigma^2} \right] \cdots \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[ -\frac{(x_N - \beta_0 - \beta_1 x_{N-1})^2}{2\sigma^2} \right]

= \prod_{t=2}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[ -\frac{(x_t - \beta_0 - \beta_1 x_{t-1})^2}{2\sigma^2} \right]

l = \log L = -\frac{N-1}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{t=2}^{N} (x_t - \beta_0 - \beta_1 x_{t-1})^2

\frac{\partial l}{\partial \beta_0} = 0 \ \Rightarrow\ \hat{\beta}_0 = \frac{\sum_{t=2}^{N} x_t - \hat{\beta}_1 \sum_{t=2}^{N} x_{t-1}}{N-1}

\frac{\partial l}{\partial \beta_1} = 0 \ \Rightarrow\ \sum_{t=2}^{N} (x_t - \hat{\beta}_0 - \hat{\beta}_1 x_{t-1})\, x_{t-1} = 0 \ \Rightarrow\ \sum_{t=2}^{N} \left( x_t - \frac{\sum_{t=2}^{N} x_t - \hat{\beta}_1 \sum_{t=2}^{N} x_{t-1}}{N-1} - \hat{\beta}_1 x_{t-1} \right) x_{t-1} = 0

\Rightarrow\ \hat{\beta}_1 = \frac{\sum_{t=2}^{N} x_{t-1} x_t - \dfrac{\left( \sum_{t=2}^{N} x_{t-1} \right)\left( \sum_{t=2}^{N} x_t \right)}{N-1}}{\sum_{t=2}^{N} x_{t-1}^2 - \dfrac{\left( \sum_{t=2}^{N} x_{t-1} \right)^2}{N-1}}

Therefore the conditional MLEs are:


\hat{\beta}_0 = \frac{\sum_{t=2}^{N} x_t - \hat{\beta}_1 \sum_{t=2}^{N} x_{t-1}}{N-1}, \qquad \hat{\beta}_1 = \frac{\sum_{t=2}^{N} x_{t-1} x_t - \dfrac{\left( \sum_{t=2}^{N} x_{t-1} \right)\left( \sum_{t=2}^{N} x_t \right)}{N-1}}{\sum_{t=2}^{N} x_{t-1}^2 - \dfrac{\left( \sum_{t=2}^{N} x_{t-1} \right)^2}{N-1}}

Next we show that the conditional MLEs are the same as the OLS estimates. Treat each pair (x_{t-1}, x_t) as a (regressor, response) pair, so that the OLS regression of x_t on x_{t-1} uses the N-1 data pairs below:

regressor x_{t-1}: x_1, x_2, \ldots, x_{N-1}

response x_t: x_2, x_3, \ldots, x_N

\Rightarrow\ \hat{\beta}_0^{OLS} = \bar{y} - \hat{\beta}_1\bar{x} = \frac{\sum_{t=2}^{N} x_t - \hat{\beta}_1 \sum_{t=2}^{N} x_{t-1}}{N-1} = \hat{\beta}_0

\Rightarrow\ \hat{\beta}_1^{OLS} = \frac{S_{xy}}{S_{xx}} = \frac{\sum_{t=2}^{N} x_{t-1} x_t - \dfrac{\left( \sum_{t=2}^{N} x_{t-1} \right)\left( \sum_{t=2}^{N} x_t \right)}{N-1}}{\sum_{t=2}^{N} x_{t-1}^2 - \dfrac{\left( \sum_{t=2}^{N} x_{t-1} \right)^2}{N-1}} = \hat{\beta}_1
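The equivalence above amounts to regressing x_t on x_{t-1} with ordinary least squares over the N-1 pairs. A minimal sketch (Python/numpy; the AR(1) parameters used to simulate the series are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
N, beta0, beta1, sigma = 500, 0.5, 0.7, 1.0      # assumed true values, |beta1| < 1
x = np.empty(N)
x[0] = beta0 / (1 - beta1) + rng.normal(0, sigma / np.sqrt(1 - beta1**2))  # start at stationarity
for t in range(1, N):
    x[t] = beta0 + beta1 * x[t - 1] + rng.normal(0, sigma)

# Conditional MLE = OLS of x_t on x_{t-1}
x_lag, x_cur = x[:-1], x[1:]
b1_hat = np.sum((x_lag - x_lag.mean()) * (x_cur - x_cur.mean())) / np.sum((x_lag - x_lag.mean())**2)
b0_hat = x_cur.mean() - b1_hat * x_lag.mean()

print(b0_hat, b1_hat)
```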

b) Write down the exact (full) likelihood.

X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t \ \Rightarrow\ E(X_t) = \frac{\beta_0}{1 - \beta_1}, \qquad \mathrm{Var}(X_t) = \frac{\sigma^2}{1 - \beta_1^2}

L = f(x_2 \mid x_1) f(x_3 \mid x_1, x_2) \cdots f(x_N \mid x_1, \ldots, x_{N-1}) \cdot f(x_1)

= \prod_{t=2}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left[ -\frac{(x_t - \beta_0 - \beta_1 x_{t-1})^2}{2\sigma^2} \right] \cdot \frac{1}{\sqrt{2\pi}\,\dfrac{\sigma}{\sqrt{1 - \beta_1^2}}}\exp\!\left[ -\frac{\left( x_1 - \dfrac{\beta_0}{1 - \beta_1} \right)^2}{2\,\dfrac{\sigma^2}{1 - \beta_1^2}} \right]

The exact MLEs can then be derived in a similar fashion to the conditional MLEs.

Another way to write down the exact likelihood is to use the multivariate normal distribution directly because the likelihood is simply the joint distribution of the entire sample.

Let \mathbf{X} = (X_1, \ldots, X_N)'. The general formula for the N-dimensional multivariate normal density function is

f(x_1, \ldots, x_N) = \frac{1}{(2\pi)^{N/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[ -\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu}) \right]

where E(\mathbf{X}) = \boldsymbol{\mu} and \boldsymbol{\Sigma} is the covariance matrix of \mathbf{X}.

Given that we have i.i.d. Gaussian white noise, we can write down the exact likelihood of our time series

\{X_1, \ldots, X_N\} as a multivariate normal density function directly, as follows:

L(\mathbf{x}; \boldsymbol{\theta}) = f(x_1, \ldots, x_N; \boldsymbol{\theta})

For our stationary AR(1)

X_t = \beta_0 + \beta_1 X_{t-1} + \varepsilon_t

where \{\varepsilon_t\} is a series of Gaussian white noise (that is, i.i.d. N(0, \sigma^2), t = 1, \ldots, N),

\boldsymbol{\theta} = (\beta_0, \beta_1, \sigma^2)', \quad |\beta_1| < 1.

E(\mathbf{X}) = \boldsymbol{\mu} = \frac{\beta_0}{1 - \beta_1}\,\mathbf{1}, where \mathbf{1} is an N-vector of ones.

\mathrm{Var}(\mathbf{X}) = \boldsymbol{\Sigma} = \frac{\sigma^2}{1 - \beta_1^2} \begin{pmatrix} 1 & \beta_1 & \cdots & \beta_1^{N-1} \\ \beta_1 & 1 & \cdots & \beta_1^{N-2} \\ \vdots & \vdots & \ddots & \vdots \\ \beta_1^{N-1} & \beta_1^{N-2} & \cdots & 1 \end{pmatrix}
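Under these expressions, the exact log-likelihood can be evaluated by plugging the AR(1) mean vector and Toeplitz covariance matrix into the multivariate normal density. A minimal sketch (Python/numpy/scipy; the parameter values and the short example series are illustrative assumptions):

```python
import numpy as np
from scipy.stats import multivariate_normal

def ar1_exact_loglik(x, beta0, beta1, sigma2):
    """Exact Gaussian log-likelihood of a stationary AR(1) sample x (assumes |beta1| < 1)."""
    N = len(x)
    mu = np.full(N, beta0 / (1 - beta1))                        # stationary mean vector
    lags = np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
    Sigma = (sigma2 / (1 - beta1**2)) * beta1**lags             # Cov(X_i, X_j) = sigma^2 beta1^{|i-j|} / (1 - beta1^2)
    return multivariate_normal(mean=mu, cov=Sigma).logpdf(x)

# Example call with assumed values and a tiny illustrative series
x = np.array([0.3, 0.8, 1.1, 0.9, 1.4])
print(ar1_exact_loglik(x, beta0=0.5, beta1=0.7, sigma2=1.0))
```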

c) Derive the MOME of the model parameters \beta_0, \beta_1. Are they the same as the OLS estimates?

E(X_t) = \frac{\beta_0}{1 - \beta_1} \ \Rightarrow\ \bar{x} = \frac{\hat{\beta}_0}{1 - \hat{\beta}_1}

\gamma(1) = \mathrm{Cov}(X_t, X_{t-1}) = \beta_1\gamma(0) \ \Rightarrow\ \rho(1) = \beta_1 \ \Rightarrow\ \hat{\beta}_1 = \frac{\sum_{t=2}^{N} (x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}

Therefore the MOME of the model parameters are:

\hat{\beta}_0 = (1 - \hat{\beta}_1)\bar{x}, \qquad \hat{\beta}_1 = \frac{\sum_{t=2}^{N} (x_t - \bar{x})(x_{t-1} - \bar{x})}{\sum_{t=1}^{N} (x_t - \bar{x})^2}

So we conclude that they are not the same as the OLS estimates (the centering and the ranges of the sums differ slightly), although the estimators are close for large N.
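A minimal numerical comparison of the MOME and the conditional-MLE/OLS estimates on the same simulated series (Python/numpy; the simulation settings are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(10)
N, beta0, beta1, sigma = 1000, 0.5, 0.7, 1.0          # assumed true values
x = np.empty(N)
x[0] = beta0 / (1 - beta1)
for t in range(1, N):
    x[t] = beta0 + beta1 * x[t - 1] + rng.normal(0, sigma)

xbar = x.mean()
# MOME: lag-1 sample autocorrelation and the stationary-mean equation
b1_mom = np.sum((x[1:] - xbar) * (x[:-1] - xbar)) / np.sum((x - xbar)**2)
b0_mom = (1 - b1_mom) * xbar

# OLS / conditional MLE, for comparison
xl, xc = x[:-1], x[1:]
b1_ols = np.sum((xl - xl.mean()) * (xc - xc.mean())) / np.sum((xl - xl.mean())**2)
b0_ols = xc.mean() - b1_ols * xl.mean()

print(b0_mom, b1_mom)
print(b0_ols, b1_ols)
```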

*Please see the supplementary notes on Cochran's Theorem to understand how to test the significance of regression, for a multiple regression, using quadratic forms.
