
D.G. Bonett (6/2018)

Matrix Notation and Operations

Matrix Notation

An r × c matrix is a rectangular array of elements with r rows and c columns. An r × c matrix is said to be of order r × c. A matrix is usually denoted by a capital letter printed in a boldface font (e.g., A, B, X). The elements of the matrix are represented by lower case letters with a double subscript (e.g., 푎푗푘, 푏푗푘, 푥푗푘). For instance, in the matrix X, 푥13 is the element in the first row and the third column. A 4 × 3 matrix X is shown below.

$$\mathbf{X} = \begin{bmatrix} x_{11} & x_{12} & x_{13} \\ x_{21} & x_{22} & x_{23} \\ x_{31} & x_{32} & x_{33} \\ x_{41} & x_{42} & x_{43} \end{bmatrix}$$

A matrix with a single row is called a row vector and a matrix with a single column is called a column vector. Vectors are usually represented by lower case letters printed in a boldface font (e.g., a, b, x). The elements of the vector are represented by lower case letters with a single subscript (e.g., 푎푗, 푏푗, 푥푗). A 3 × 1 column vector y and a 1 × 4 row vector h are shown below.

$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}$$

$$\mathbf{h} = \begin{bmatrix} h_1 & h_2 & h_3 & h_4 \end{bmatrix}.$$

It is sometimes necessary to refer to a particular row or column of a matrix. These row or column vectors are represented by a subscripted lower case letter in a boldface font. For instance, the jth row vector in the above 4 × 3 matrix X would be noted as $\mathbf{x}_j$.

A square matrix has the same number of rows as columns. A square matrix where the diagonal elements (the elements where the two subscripts are equal) are nonzero and the off-diagonal elements are zero is called a diagonal matrix. A 3 × 3 diagonal matrix D is shown below.


$$\mathbf{D} = \begin{bmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{bmatrix}$$

The identity matrix is a special type of diagonal matrix where all diagonal elements are equal to 1. The identity matrix is usually represented as I or $\mathbf{I}_n$, where n is the order of the identity matrix.

A square matrix where the jkth element is equal to the kjth element is called a symmetric matrix. A symmetric 3 × 3 matrix is shown below.

$$\mathbf{S} = \begin{bmatrix} 14 & 5 & 2 \\ 5 & 20 & 8 \\ 2 & 8 & 11 \end{bmatrix}$$

A one vector is a row or column vector in which every element is equal to 1 and is represented as the number one printed in a boldface font. A 1 × 3 one vector is shown below.

1 = [1 1 1]

Matrix Operations

The transpose of a matrix X is represented as X′ (or $\mathbf{X}^T$). The transpose of a matrix is obtained by interchanging the rows and columns of the matrix. For instance, if

$$\mathbf{X} = \begin{bmatrix} 4 & 6 \\ 7 & 1 \\ 3 & 9 \end{bmatrix}$$

then

$$\mathbf{X}' = \begin{bmatrix} 4 & 7 & 3 \\ 6 & 1 & 9 \end{bmatrix}.$$

Note that the jkth element in X is equal to the kjth element in X'. Most vectors in statistical formulas are assumed to be column vectors. Row vectors, when needed, are obtained by taking the transpose of a column vector.
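A minimal R sketch of this operation (R is the software the notes point to later for computations; the matrix below is the example above):

```r
# Transpose of the 3 x 2 example matrix X
X <- matrix(c(4, 7, 3,    # first column of X
              6, 1, 9),   # second column of X
            nrow = 3, ncol = 2)
t(X)   # the 2 x 3 transpose X'
```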


If two matrices A and B are of the same order, the two matrices are then conformable for addition or subtraction, and A + B is a matrix with element $a_{jk} + b_{jk}$ in the jth row and the kth column, as illustrated below for the sum of two 2 × 3 matrices.

$$\mathbf{A} + \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix} = \begin{bmatrix} a_{11}+b_{11} & a_{12}+b_{12} & a_{13}+b_{13} \\ a_{21}+b_{21} & a_{22}+b_{22} & a_{23}+b_{23} \end{bmatrix}$$

Likewise, A – B is a matrix with element $a_{jk} - b_{jk}$ in the jth row and the kth column, as illustrated below for the difference of two 2 × 3 matrices.

$$\mathbf{A} - \mathbf{B} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix} - \begin{bmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \end{bmatrix} = \begin{bmatrix} a_{11}-b_{11} & a_{12}-b_{12} & a_{13}-b_{13} \\ a_{21}-b_{21} & a_{22}-b_{22} & a_{23}-b_{23} \end{bmatrix}$$

To multiply a matrix by a scalar (i.e., a single number), simply multiply each element in the matrix by the scalar. Scalars are represented by italicized lower case letters in a non-boldface font. To illustrate, if b = 2 and

$$\mathbf{A} = \begin{bmatrix} 4 & 7 & 3 \\ 6 & 1 & 9 \end{bmatrix}$$

then

$$b\mathbf{A} = \begin{bmatrix} 8 & 14 & 6 \\ 12 & 2 & 18 \end{bmatrix}.$$

Some statistical formulas involve the subtraction of a scalar from an n × 1 vector. The result is obtained by first multiplying the scalar by an n × 1 one vector and then taking the difference of the two vectors. For instance, if y is a 3 × 1 vector and m is a scalar, then y – m1 is

$$\mathbf{y} - m\mathbf{1} = \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} - m\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} y_1 - m \\ y_2 - m \\ y_3 - m \end{bmatrix}$$
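A short R sketch of these elementwise operations (the matrix B and the values of y and m below are made up for illustration):

```r
# Matrix addition, scalar multiplication, and subtracting a scalar from a vector
A <- matrix(c(4, 6, 7, 1, 3, 9), nrow = 2)   # the 2 x 3 matrix A used above
B <- matrix(1:6, nrow = 2)                   # any conformable 2 x 3 matrix
A + B                                        # elementwise sum
2 * A                                        # the scalar multiple bA with b = 2
y <- c(10, 12, 14)                           # illustrative 3 x 1 vector
m <- 9                                       # illustrative scalar
y - m * rep(1, 3)                            # y - m1 (R also recycles y - m directly)
```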


The dot product of an n × 1 vector a with an n × 1 vector b is

$$\mathbf{a}'\mathbf{b} = a_1 b_1 + a_2 b_2 + \cdots + a_n b_n$$

Note that a'b = b'a. For instance, if a' = [4 3 2] and b' = [6 1 4], then a'b = 4(6) + 3(1) + 2(4) = 35.

Two n × 1 vectors, a and b, are said to be orthogonal if a'b = 0. For instance, if a' = [.5 .5 -1] and b' = [1 -1 0], then a and b are orthogonal because 퐚'b = (.5)(1) + (.5)(-1) + (-1)(0) = 0.
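In R, the dot product and the orthogonality check above can be verified directly:

```r
# Dot product a'b and a check for orthogonality
a <- c(4, 3, 2)
b <- c(6, 1, 4)
sum(a * b)              # 4(6) + 3(1) + 2(4) = 35
drop(t(a) %*% b)        # the same value using matrix multiplication
a2 <- c(0.5, 0.5, -1)
b2 <- c(1, -1, 0)
sum(a2 * b2)            # 0, so these two vectors are orthogonal
```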

Two matrices A and B can be multiplied if they are conformable for multiplication. To compute the matrix product AB, the number of columns of A must equal the number of rows of B. In general, if A is r × n and B is n × c, then the matrix product AB is an r × c matrix. The jkth element in the r × c product matrix is equal to the dot product $\mathbf{a}_j\mathbf{b}_k$ where $\mathbf{a}_j$ is the jth row vector of matrix A and $\mathbf{b}_k$ is the kth column vector of matrix B. For instance, the matrices A and B shown below are conformable for computing the product AB because A is 2 × 3 and B is 3 × 4, so the product will be a 2 × 4 matrix.

$$\mathbf{A} = \begin{bmatrix} 4 & 7 & 3 \\ 6 & 1 & 9 \end{bmatrix} \qquad \mathbf{B} = \begin{bmatrix} 1 & 2 & 1 & 4 \\ 5 & 4 & 3 & 1 \\ 4 & 2 & 3 & 2 \end{bmatrix}$$

Each of the 2 × 4 = 8 elements of the AB matrix is a dot product. For instance, the element in row 1 and column 1 of the product AB is

$$\begin{bmatrix} 4 & 7 & 3 \end{bmatrix} \begin{bmatrix} 1 \\ 5 \\ 4 \end{bmatrix} = 4(1) + 7(5) + 3(4) = 51$$

and the element in row 2 and column 3 of AB is

$$\begin{bmatrix} 6 & 1 & 9 \end{bmatrix} \begin{bmatrix} 1 \\ 3 \\ 3 \end{bmatrix} = 6(1) + 1(3) + 9(3) = 36.$$

After computing all 8 dot products, the following result is obtained.

$$\mathbf{AB} = \begin{bmatrix} 51 & 42 & 34 & 29 \\ 47 & 34 & 36 & 43 \end{bmatrix}$$
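The same product can be reproduced in R with the %*% operator (a minimal sketch using the matrices above):

```r
# Matrix product AB for the 2 x 3 and 3 x 4 example matrices
A <- matrix(c(4, 6, 7, 1, 3, 9), nrow = 2)
B <- matrix(c(1, 5, 4,
              2, 4, 2,
              1, 3, 3,
              4, 1, 2), nrow = 3)
A %*% B   # 2 x 4 product; element (1,1) is 51 and element (2,3) is 36
```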


Unlike scalar multiplication, where ab = ba, the matrix product AB does not in general equal BA. Regarding the matrix product AB, we can say that B is pre-multiplied by A or that A is post-multiplied by B. The product of matrix A with itself is denoted as A².

The transpose of a matrix product is equal to the product of the transposed matrices in reverse order. Specifically, (AB)' = B'A'.

The product of three matrices ABC requires A and B to be conformable for multiplication and also requires B and C to be conformable for multiplication. The product ABC can be obtained by first computing AB and then post-multiplying the result by C, or by first computing BC and then pre-multiplying the result by A.

If A is a square matrix, then the matrix inverse of A is represented as A⁻¹. If the inverse of A exists, then AA⁻¹ = I. This result is a generalization of the scalar result x(1/x) = 1, assuming x ≠ 0 so that the inverse of x exists. Computing a matrix inverse is tedious, and the amount of computational effort increases as the size of the matrix increases, but inverting a 2 × 2 matrix is not difficult. The inverse of a 2 × 2 matrix A is

$$\mathbf{A}^{-1} = (1/d)\begin{bmatrix} a_{22} & -a_{12} \\ -a_{21} & a_{11} \end{bmatrix}$$

where $d = a_{11}a_{22} - a_{12}a_{21}$ is called the determinant of A. The matrix inverse does not exist unless the determinant is nonzero.
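A small R sketch comparing the 2 × 2 formula above with R's solve() function (the matrix A below is made up for illustration):

```r
# Inverse of a 2 x 2 matrix: the explicit formula versus solve()
A <- matrix(c(4, 2, 1, 3), nrow = 2)              # illustrative 2 x 2 matrix
d <- A[1, 1] * A[2, 2] - A[1, 2] * A[2, 1]        # determinant
Ainv <- (1 / d) * matrix(c(A[2, 2], -A[2, 1],
                           -A[1, 2], A[1, 1]), nrow = 2)
Ainv
solve(A)                                          # same result
round(A %*% Ainv, 10)                             # the 2 x 2 identity matrix
```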

Inverting a diagonal matrix D of any order is simple. The inverse of D is equal to a diagonal matrix where the jth diagonal element is equal to $1/d_{jj}$.

The trace of a square n × n matrix A, denoted as tr(A), is defined as the sum of its n diagonal elements.

tr(A) = 푎11 + 푎22 + … + 푎푛푛


For instance if

$$\mathbf{V} = \begin{bmatrix} 14 & 5 & 2 \\ 9 & 20 & 8 \\ 7 & 8 & 11 \end{bmatrix}$$

then tr(V) = 14 + 20 + 11 = 45.
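In R, the trace is simply the sum of the diagonal:

```r
# Trace of the example matrix V
V <- matrix(c(14, 9, 7, 5, 20, 8, 2, 8, 11), nrow = 3)
sum(diag(V))   # 14 + 20 + 11 = 45
```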

The Kronecker product of two matrices, an m × n matrix A and a p × q matrix B, is defined to be the mp × nq matrix

$$\mathbf{A} \otimes \mathbf{B} = \begin{bmatrix} a_{11}\mathbf{B} & a_{12}\mathbf{B} & \cdots & a_{1n}\mathbf{B} \\ a_{21}\mathbf{B} & a_{22}\mathbf{B} & \cdots & a_{2n}\mathbf{B} \\ \vdots & \vdots & & \vdots \\ a_{m1}\mathbf{B} & a_{m2}\mathbf{B} & \cdots & a_{mn}\mathbf{B} \end{bmatrix}$$

which is obtained by replacing each element 푎푗푘 with the p × q matrix 푎푗푘B. For example, if

$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & -1 \end{bmatrix} \quad \text{and} \quad \mathbf{b} = \begin{bmatrix} 1 & 2 & 3 \end{bmatrix}$$

then

$$\mathbf{A} \otimes \mathbf{b} = \begin{bmatrix} 1 & 2 & 3 & 2 & 4 & 6 \\ 3 & 6 & 9 & -1 & -2 & -3 \end{bmatrix}.$$
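R computes Kronecker products with kronecker() or the %x% operator; a minimal sketch using the example above:

```r
# Kronecker product of the 2 x 2 matrix A and the 1 x 3 row vector b
A <- matrix(c(1, 3, 2, -1), nrow = 2)   # A = [1 2; 3 -1]
b <- matrix(c(1, 2, 3), nrow = 1)       # b = [1 2 3]
kronecker(A, b)                         # the 2 x 6 matrix shown above
A %x% b                                 # %x% is shorthand for kronecker()
```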

The Kronecker product of an identity matrix and another matrix has the following simple form

$$\mathbf{I} \otimes \mathbf{B} = \begin{bmatrix} \mathbf{B} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{B} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{B} \end{bmatrix}$$

where 0 is a matrix of zeros that has the same order as B.

The transpose of a Kronecker product of two matrices is

(A ⨂ B)' = A' ⨂ B'

and the inverse of a Kronecker product of two matrices is

(A ⨂ B)⁻¹ = A⁻¹ ⨂ B⁻¹.

The product of A ⨂ B and C ⨂ D is equal to

(A ⨂ B)(C ⨂ D) = AC ⨂ BD assuming A and C are conformable for multiplication and B and D are conformable for multiplication.

In some statistical formulas, it is convenient to rearrange the elements of an r × c matrix A into an rc × 1 column vector a. This is done by stacking the c column vectors (which are each r × 1) of A (퐚1, 퐚2, … , 퐚푐) one under the other as shown below

$$\mathbf{a} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{a}_c \end{bmatrix}$$

The conversion of a matrix A into a vector is denoted as vec(A). To illustrate, consider the following 4 × 2 matrix Y and vec(Y). The operation mat(y) converts the vector back into its original matrix form.

$$\mathbf{Y} = \begin{bmatrix} 14 & 9 \\ 2 & 8 \\ 7 & 11 \\ 16 & 34 \end{bmatrix} \qquad \mathrm{vec}(\mathbf{Y}) = \begin{bmatrix} 14 \\ 2 \\ 7 \\ 16 \\ 9 \\ 8 \\ 11 \\ 34 \end{bmatrix} = \mathbf{y}$$

The vectorization of a product of three matrices can be expressed as

vec(ABC) = (C′ ⨂ A)vec(B)
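In R, as.vector() applied to a matrix stacks its columns, which matches the vec() operation, and the identity above can be checked numerically (the random matrices below are only for illustration):

```r
# vec() stacks the columns of a matrix
Y <- matrix(c(14, 2, 7, 16, 9, 8, 11, 34), nrow = 4)   # the 4 x 2 example matrix
as.vector(Y)                                           # vec(Y)

# Numerical check of vec(ABC) = (C' kron A) vec(B)
set.seed(1)
A <- matrix(rnorm(6), 2, 3)
B <- matrix(rnorm(12), 3, 4)
C <- matrix(rnorm(8), 4, 2)
all.equal(as.vector(A %*% B %*% C),
          as.vector((t(C) %x% A) %*% as.vector(B)))    # TRUE
```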

and the vectorization of a matrix product AB follows from this result by setting C = I

vec(AB) = (Ip ⨂ A)vec(B) where p is the number of columns of B.

For two matrices A and B of the same order, the matrix equality A = B indicates that $a_{jk} = b_{jk}$ for every value of j and k. The matrix inequality A ≠ B indicates that there is at least one element in A that is not equal to its corresponding element in B.

Covariance Matrices

A covariance matrix is a symmetric matrix with variances in the diagonal elements and covariances in the off-diagonal elements. If r response variables y′ = [푦1 푦2 … 푦푟] are measured for each person in a random sample of n people, the estimated variance for the jth response variable is

$$\hat{\sigma}_j^2 = \sum_{i=1}^{n}(y_{ij} - \hat{\mu}_j)^2/(n - 1)$$

and the estimated covariance between the jth and kth response variables is

$$\hat{\sigma}_{jk} = \sum_{i=1}^{n}(y_{ij} - \hat{\mu}_j)(y_{ik} - \hat{\mu}_k)/(n - 1).$$

The estimated covariance between the jth and kth measurements is also equal to $\hat{\rho}_{jk}\hat{\sigma}_j\hat{\sigma}_k$ where $\hat{\rho}_{jk}$ is the estimated Pearson correlation between the two response variables. Note that $\hat{\sigma}_{jk} = \hat{\sigma}_{kj}$.

The r variances and the r(r – 1)/2 covariances of y′ = [푦1 푦2 … 푦푟] can be summarized in an r × r covariance matrix denoted as cov(y). For instance, with r = 3 the covariance matrix is

$$\mathrm{cov}(\mathbf{y}) = \begin{bmatrix} \hat{\sigma}_1^2 & \hat{\sigma}_{12} & \hat{\sigma}_{13} \\ \hat{\sigma}_{12} & \hat{\sigma}_2^2 & \hat{\sigma}_{23} \\ \hat{\sigma}_{13} & \hat{\sigma}_{23} & \hat{\sigma}_3^2 \end{bmatrix}.$$
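A minimal R sketch of an estimated covariance matrix (the data below are simulated; cov() uses the n − 1 divisor given above):

```r
# Estimated covariance matrix for r = 3 response variables from n = 50 people
set.seed(123)
n <- 50
y1 <- rnorm(n)
y2 <- 0.5 * y1 + rnorm(n)
y3 <- rnorm(n)
Y <- cbind(y1, y2, y3)
cov(Y)   # 3 x 3 matrix of sample variances and covariances
cor(Y)   # the corresponding Pearson correlations
```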

If there are n sets of response variables where the ith set has a covariance matrix $\mathbf{S}_i$, and response variables from different sets are assumed to be uncorrelated, the covariance matrix for all response variables has the following form


$$\begin{bmatrix} \mathbf{S}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{S}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{S}_n \end{bmatrix}$$

where each 0 represents a matrix of zeros. If all of the n covariance matrices are equal to S, the above covariance matrix can then be expressed as I ⨂ S.

Variance of a Linear Function of Variables

A linear function of r variables can be expressed as $\sum_{j=1}^{r} h_j y_j$ where the $h_j$ are known constants. This linear function can be expressed in matrix notation as h′y where h′ = [ℎ1 ℎ2 … ℎ푟] and y′ = [푦1 푦2 … 푦푟]. The variance of h′y can be expressed in matrix notation as h′cov(y)h. For instance, the variance of $y_1 + y_2 + y_3$ is

$$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} \hat{\sigma}_1^2 & \hat{\sigma}_{12} & \hat{\sigma}_{13} \\ \hat{\sigma}_{12} & \hat{\sigma}_2^2 & \hat{\sigma}_{23} \\ \hat{\sigma}_{13} & \hat{\sigma}_{23} & \hat{\sigma}_3^2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} \hat{\sigma}_1^2 + \hat{\sigma}_{12} + \hat{\sigma}_{13} & \hat{\sigma}_{12} + \hat{\sigma}_2^2 + \hat{\sigma}_{23} & \hat{\sigma}_{13} + \hat{\sigma}_{23} + \hat{\sigma}_3^2 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \hat{\sigma}_1^2 + \hat{\sigma}_2^2 + \hat{\sigma}_3^2 + 2\hat{\sigma}_{12} + 2\hat{\sigma}_{13} + 2\hat{\sigma}_{23}.$$
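A quick R check of the result h′cov(y)h (simulated data, with h = [1 1 1]):

```r
# Variance of the linear function h'y computed as h' cov(y) h
set.seed(123)
Y <- matrix(rnorm(150), nrow = 50, ncol = 3)      # n = 50 scores on r = 3 variables
S <- cov(Y)
h <- c(1, 1, 1)
drop(t(h) %*% S %*% h)                            # variance of y1 + y2 + y3
sum(diag(S)) + 2 * (S[1, 2] + S[1, 3] + S[2, 3])  # same value, written out
```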

The variance of a linear function of estimates (i.e., $\mathbf{c}'\hat{\boldsymbol{\beta}}$) is similarly defined as $\mathbf{c}'\mathrm{cov}(\hat{\boldsymbol{\beta}})\mathbf{c}$. For instance, the variance of $\hat{\beta}_1 - \hat{\beta}_2$ is

$$\begin{bmatrix} 1 & -1 \end{bmatrix} \begin{bmatrix} \hat{\sigma}_{\hat{\beta}_1}^2 & \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} \\ \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} & \hat{\sigma}_{\hat{\beta}_2}^2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \begin{bmatrix} \hat{\sigma}_{\hat{\beta}_1}^2 - \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} & \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} - \hat{\sigma}_{\hat{\beta}_2}^2 \end{bmatrix} \begin{bmatrix} 1 \\ -1 \end{bmatrix} = \hat{\sigma}_{\hat{\beta}_1}^2 - \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} - \hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2} + \hat{\sigma}_{\hat{\beta}_2}^2 = \hat{\sigma}_{\hat{\beta}_1}^2 + \hat{\sigma}_{\hat{\beta}_2}^2 - 2\hat{\sigma}_{\hat{\beta}_1\hat{\beta}_2}.$$

General Linear Model in Matrix Form

A general linear model (GLM) for a random sample of n participants can be expressed in matrix notation as follows


$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e} \qquad (1.1)$$

where y is an n × 1 vector of response variable scores, e is an n × 1 vector of prediction errors with cov(e) = $\sigma_e^2 \mathbf{I}_n$, 휷 is a (q + 1) × 1 vector of unknown population parameters, and X is an n × (q + 1) design matrix. The first column of X is often an n × 1 vector of ones to code the y-intercept, and the other q columns contain the values of the q predictor variables.

To illustrate the structure of 1.1, consider a model with q = 1 predictor variable and a random sample of n = 7. The elements of 퐲 = 퐗휷 + 퐞 for this example are shown below.

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \\ y_7 \end{bmatrix} = \begin{bmatrix} 1 & x_{11} \\ 1 & x_{12} \\ 1 & x_{13} \\ 1 & x_{14} \\ 1 & x_{15} \\ 1 & x_{16} \\ 1 & x_{17} \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} e_1 \\ e_2 \\ e_3 \\ e_4 \\ e_5 \\ e_6 \\ e_7 \end{bmatrix}$$

Using matrix notation, the ordinary least squares (OLS) estimate of 휷 is

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} \qquad (1.2)$$

and the estimated residuals are

$$\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}} \qquad (1.3)$$

where $\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$ is a vector of predicted y scores. An estimate of the residual variance ($\sigma_e^2$) is

$$MS_E = \hat{\mathbf{e}}'\hat{\mathbf{e}}/(n - q - 1). \qquad (1.4)$$

The jth element of $\hat{\boldsymbol{\beta}}$ is $\hat{\beta}_j$. The standard error of $\hat{\beta}_j$ is equal to the square root of the jth diagonal element of the following covariance matrix of $\hat{\boldsymbol{\beta}}$

$$\mathrm{cov}(\hat{\boldsymbol{\beta}}) = MS_E(\mathbf{X}'\mathbf{X})^{-1}. \qquad (1.5)$$
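The following R sketch applies Equations 1.2 through 1.5 to a small data set with q = 1 predictor and checks the results against lm() (the data values are invented):

```r
# OLS estimates, residual variance, and cov(beta-hat) for a GLM with q = 1 predictor
n <- 7
x <- c(2, 4, 5, 7, 8, 10, 12)
y <- c(4.1, 5.0, 5.9, 6.4, 7.2, 8.3, 9.1)
X <- cbind(1, x)                              # n x (q + 1) design matrix
bhat <- solve(t(X) %*% X) %*% t(X) %*% y      # Equation 1.2
ehat <- y - X %*% bhat                        # Equation 1.3
MSE  <- sum(ehat^2) / (n - 1 - 1)             # Equation 1.4
covb <- MSE * solve(t(X) %*% X)               # Equation 1.5
sqrt(diag(covb))                              # standard errors of the estimates
coef(summary(lm(y ~ x)))                      # same estimates and SEs from lm()
```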

A linear function of the elements of 휷 will be of interest in some applications. A linear function of 휷 can be expressed as 퐜′휷 where c is a (q + 1) × 1 vector of specified numbers.


For instance, in a GLM with q = 2 predictor variables the linear function 훽1 + 5훽2 can be expressed as 퐜′휷 where 퐜′ = [0 1 5]. The estimated standard error of 퐜′휷̂ is

$$SE_{\mathbf{c}'\hat{\boldsymbol{\beta}}} = \sqrt{\mathbf{c}'\mathrm{cov}(\hat{\boldsymbol{\beta}})\mathbf{c}} \qquad (1.6)$$

and a 100(1 − 훼)% confidence interval for 퐜′휷 is

$$\mathbf{c}'\hat{\boldsymbol{\beta}} \pm t_{\alpha/2;\,df_E}\,SE_{\mathbf{c}'\hat{\boldsymbol{\beta}}} \qquad (1.7)$$

where 푑푓퐸 = n – q – 1. To obtain simultaneous Bonferroni confidence intervals for v different linear functions of 휷, replace 훼 with 훼∗ = 훼/푣 in Equation 1.7.
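An R sketch of Equations 1.6 and 1.7 for the linear function β₁ + 5β₂ used in the example above (the data are simulated):

```r
# 95% confidence interval for c'beta with q = 2 predictors
set.seed(7)
n  <- 30
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + x1 + 5 * x2 + rnorm(n)
X  <- cbind(1, x1, x2)
bhat <- solve(t(X) %*% X) %*% t(X) %*% y
MSE  <- sum((y - X %*% bhat)^2) / (n - 2 - 1)
covb <- MSE * solve(t(X) %*% X)
cvec <- c(0, 1, 5)                                 # estimates beta1 + 5*beta2
est  <- drop(t(cvec) %*% bhat)
se   <- sqrt(drop(t(cvec) %*% covb %*% cvec))      # Equation 1.6
est + c(-1, 1) * qt(0.975, df = n - 2 - 1) * se    # Equation 1.7
```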

A special case of Equation 1.7 is the following confidence interval for the jth element in 휷

$$\hat{\beta}_j \pm t_{\alpha/2;\,df_E}\,SE_{\hat{\beta}_j} \qquad (1.8)$$

where

$$SE_{\hat{\beta}_j} = \sqrt{\frac{MS_E}{(1 - \hat{\rho}_{x_j.\mathbf{x}}^2)\,\hat{\sigma}_{x_j}^2\,(n - 1)}} \qquad (1.9)$$

and $\hat{\rho}_{x_j.\mathbf{x}}^2$ is the estimated squared multiple correlation between predictor variable j and all other predictor variables.

Another special case of Equation 1.7 is the following confidence interval for the difference between any two elements in 휷, such as 훽1 and 훽2

$$\hat{\beta}_1 - \hat{\beta}_2 \pm t_{\alpha/2;\,df_E}\sqrt{SE_{\hat{\beta}_1}^2 + SE_{\hat{\beta}_2}^2 - 2\,\mathrm{cov}(\hat{\beta}_1, \hat{\beta}_2)} \qquad (1.10)$$

where cov($\hat{\beta}_1, \hat{\beta}_2$) is obtained from Equation 1.5.

In a random-x GLM (where all q predictor variables are random), the squared multiple correlation (denoted as $\rho_{y.\mathbf{x}}^2$) describes the proportion of variance in the y scores that is predictable from all q predictor variables. An estimate of $\rho_{y.\mathbf{x}}^2$ (reported as "R-squared" in most statistical packages) is

$$\hat{\rho}_{y.\mathbf{x}}^2 = 1 - SS_E/SS_T \qquad (1.11)$$


where $SS_E = \hat{\mathbf{e}}'\hat{\mathbf{e}}$ and $SS_T = (n - 1)\hat{\sigma}_y^2$. The fact that the multiple correlation ($\rho_{y.\mathbf{x}}$) is equal to the Pearson correlation between y and the predicted y scores is helpful in interpreting the multiple correlation and squared multiple correlation. A confidence interval for $\rho_{y.\mathbf{x}}^2$ does not have a simple formula but it can be computed using SAS or R.

In a fixed-x GLM (where all the predictor variables are fixed), the following estimate of the coefficient of multiple determination

$$\hat{\eta}^2 = 1 - SS_E/SS_T \qquad (1.12)$$

is an estimate of η², which is equal to $\hat{\rho}_{y.\mathbf{x}}^2$. Like $\hat{\rho}_{y.\mathbf{x}}^2$, $\hat{\eta}^2$ has a positive bias, and the bias can be substantial when n – q is small. A less biased estimate of η² (confusingly called omega-squared) is available in SAS. Although $\eta^2 = \rho_{y.\mathbf{x}}^2$ and $\hat{\eta}^2 = \hat{\rho}_{y.\mathbf{x}}^2$, different symbols are used in the random-x and fixed-x models because $\hat{\eta}^2$ and $\hat{\rho}_{y.\mathbf{x}}^2$ have different sampling distributions, and a confidence interval for η² in the fixed-x model will be different than a confidence interval for $\rho_{y.\mathbf{x}}^2$ in the random-x model. The confidence interval for η² is complicated but it can be obtained in SAS or R.

Multivariate General Linear Model

Some studies will involve r ≥ 2 response variables. A GLM for a random sample of n participants, with the same set of q predictor variables for each response variable, can be specified for each of the r response variables

$$\mathbf{y}_1 = \mathbf{X}\boldsymbol{\beta}_1 + \mathbf{e}_1 \qquad \mathbf{y}_2 = \mathbf{X}\boldsymbol{\beta}_2 + \mathbf{e}_2 \qquad \cdots \qquad \mathbf{y}_r = \mathbf{X}\boldsymbol{\beta}_r + \mathbf{e}_r$$

and these r models can be combined into the following multivariate general linear model (MGLM)

$$\mathbf{Y} = \mathbf{XB} + \mathbf{E}$$

where Y = [퐲1 퐲2 … 퐲푟] is an n × r matrix of response variables, X is an n × (q + 1) design matrix, B = [휷ퟏ 휷ퟐ … 휷풓] is a (q + 1) × r matrix of unknown parameters, and E = [퐞1 퐞2 … 퐞푟] is an n × r matrix of prediction errors. Note that the design matrix in the MGLM is the same as the design matrix in each GLM. The r columns of E are assumed to be correlated, and the r × r covariance matrix of the r columns of E will be denoted as S.


The independence assumption of the GLM is also required in the MGLM and this assumption implies that the rows of E are uncorrelated.

The OLS estimate of B in the MGLM is

$$\hat{\mathbf{B}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \qquad (1.17)$$

and the matrix of predicted scores is

$$\hat{\mathbf{Y}} = \mathbf{X}\hat{\mathbf{B}}. \qquad (1.18)$$

The matrix of estimated prediction errors (residuals) is

$$\hat{\mathbf{E}} = \mathbf{Y} - \hat{\mathbf{Y}} \qquad (1.19)$$

and an estimate of the covariance matrix S is

$$\hat{\mathbf{S}} = \hat{\mathbf{E}}'\hat{\mathbf{E}}/(n - q - 1). \qquad (1.20)$$

Let 휷̂∗ = vec(퐁̂). The covariance matrix of 휷̂∗ is

$$\mathrm{cov}(\hat{\boldsymbol{\beta}}^*) = [(\mathbf{I}_r \otimes \mathbf{X})'(\hat{\mathbf{S}} \otimes \mathbf{I}_n)^{-1}(\mathbf{I}_r \otimes \mathbf{X})]^{-1} = \hat{\mathbf{S}} \otimes (\mathbf{X}'\mathbf{X})^{-1}. \qquad (1.21)$$

A multivariate linear contrast of the B parameter matrix in the MGLM can be expressed as c′Bh where c is a (q + 1) × 1 vector of researcher-specified coefficients and h is an r × 1 vector of researcher-specified coefficients. To illustrate the specification of multivariate linear contrasts, consider the parameter matrix of the MGLM given above with q = 3 predictor variables and r = 2 response variables. In this model, B has four rows and two columns. Each column of B contains the y-intercept and the three slope coefficients for a single response variable. The 4 × 2 parameter matrix for this model is given below.

$$\mathbf{B} = \begin{bmatrix} \beta_{01} & \beta_{02} \\ \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \\ \beta_{31} & \beta_{32} \end{bmatrix}$$


Some examples will illustrate the specification of c and h. To estimate $\beta_{11} - \beta_{21}$, set c′ = [0 1 -1 0] and h′ = [1 0]. To estimate $\beta_{11} - \beta_{12}$, set c′ = [0 1 0 0] and h′ = [1 -1]. To estimate $(\beta_{11} - \beta_{21}) - (\beta_{12} - \beta_{22})$, set c′ = [0 1 -1 0] and h′ = [1 -1].

A 100(1 – 훼)% confidence interval for c′Bh is

$$\mathbf{c}'\hat{\mathbf{B}}\mathbf{h} \pm t_{\alpha/2;\,df}\sqrt{(\mathbf{h}'\hat{\mathbf{S}}\mathbf{h})\,\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c}} \qquad (1.22)$$

where df = n – q – 1 and $\sqrt{(\mathbf{h}'\hat{\mathbf{S}}\mathbf{h})\,\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c}}$ is the estimated standard error of c′퐁̂h. To obtain simultaneous Bonferroni confidence intervals for v different multivariate linear contrasts of B, replace 훼 with 훼∗ = 훼/v in Equation 1.22. A confidence interval for c′Bh can be used to test H0: c′Bh = 0 against H1: c′Bh > 0 and H2: c′Bh < 0 using the following three-decision rule:

• If the upper limit of the confidence interval for c′Bh is less than 0, then H0 is rejected and H2: c′Bh < 0 is accepted.

• If the lower limit of the confidence interval for c′Bh is greater than 0, then H0 is rejected and H1: c′Bh > 0 is accepted.

• If the confidence interval for c′Bh includes 0, then H0 cannot be rejected and the results are inconclusive.

Some computer programs will compute the test statistic $\mathbf{c}'\hat{\mathbf{B}}\mathbf{h}/\sqrt{(\mathbf{h}'\hat{\mathbf{S}}\mathbf{h})\,\mathbf{c}'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{c}}$ and its associated p-value. The p-value is used to decide if H0 can be rejected.

Although Equations 1.17 – 1.22 are elegant in their generality, all of these results can be obtained from GLM results. Specifically, the jth column of 퐁̂ in Equation 1.17 can be computed using $\hat{\boldsymbol{\beta}}_j = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}_j$ (Equation 1.2), and for each column of 퐁̂ the standard errors can be computed from $MS_{E_j}(\mathbf{X}'\mathbf{X})^{-1}$ (Equation 1.5), where $MS_{E_j}$ is the jth diagonal element of 퐒̂. Thus, the least squares estimates of B and their standard errors can be obtained by simply analyzing each of the r response variables separately using a GLM. To obtain the results for Equation 1.22 from GLM results, it can be shown that the confidence interval for c′휷 (Equation 1.7) is identical to Equation 1.22 if the response variable in the GLM is defined as y = $h_1 y_1 + h_2 y_2 + \cdots + h_r y_r$.
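The following R sketch applies Equations 1.17 through 1.22 to simulated data with q = 3 predictors and r = 2 response variables, using the contrast c′ = [0 1 -1 0], h′ = [1 0] from the example above (all data values are invented):

```r
# MGLM estimates and a confidence interval for c'Bh
set.seed(99)
n <- 40; q <- 3; r <- 2
X <- cbind(1, matrix(rnorm(n * q), n, q))              # n x (q + 1) design matrix
B <- matrix(c(1, 0.5, -0.3, 0.2, 2, 0.1, 0.4, -0.6), q + 1, r)
Y <- X %*% B + matrix(rnorm(n * r), n, r)              # n x r response matrix
Bhat <- solve(t(X) %*% X) %*% t(X) %*% Y               # Equation 1.17
Ehat <- Y - X %*% Bhat                                 # Equation 1.19
Shat <- t(Ehat) %*% Ehat / (n - q - 1)                 # Equation 1.20
cvec <- c(0, 1, -1, 0); h <- c(1, 0)                   # contrast for beta11 - beta21
est <- drop(t(cvec) %*% Bhat %*% h)
se  <- sqrt(drop(t(h) %*% Shat %*% h) *
            drop(t(cvec) %*% solve(t(X) %*% X) %*% cvec))
est + c(-1, 1) * qt(0.975, df = n - q - 1) * se        # Equation 1.22
```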


Constrained MGLM

The MGLM can be expressed in the following vector form

y* = X*휷* + e* where y*= vec(Y), X* = (퐈푟 ⊗ X), 휷*= vec(B), e* = vec(E), and cov(e*) = 퐒 ⊗ 퐈푛. Note that the order of 휷* is r(q + 1) × 1 and X* has the following block-diagonal form

$$\mathbf{X}^* = \begin{bmatrix} \mathbf{X} & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X} & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X} \end{bmatrix}.$$

This design matrix for the vector form of the MGLM shows that all q predictor variables are related to all r response variables. Now suppose the predictor variables are not the same for each response variable. An MGLM with one or more slope coefficients constrained to equal 0 is referred to in the statistics and economics literature as a seemingly unrelated regression (SUR) model.

With response variable $y_j$ having its own set of $q_j$ predictor variables, the SUR model can be expressed as y* = X*휷* + e* where 휷* is a ($q_1 + q_2 + \cdots + q_r + r$) × 1 vector and X* has the following block structure form

$$\mathbf{X}^* = \begin{bmatrix} \mathbf{X}_1 & \mathbf{0} & \cdots & \mathbf{0} \\ \mathbf{0} & \mathbf{X}_2 & \cdots & \mathbf{0} \\ \vdots & \vdots & & \vdots \\ \mathbf{0} & \mathbf{0} & \cdots & \mathbf{X}_r \end{bmatrix}.$$

The prediction errors in the SUR model have the covariance structure cov(e*) = 퐒 ⊗ 퐈푛 and the following generalized least squares (GLS) estimate of 휷* is typically used

$$\hat{\boldsymbol{\beta}}^*_{GLS} = [\mathbf{X}^{*\prime}(\hat{\mathbf{S}}^{-1} \otimes \mathbf{I}_n)\mathbf{X}^*]^{-1}\mathbf{X}^{*\prime}(\hat{\mathbf{S}}^{-1} \otimes \mathbf{I}_n)\mathbf{y}^* \qquad (1.23)$$

where 퐒̂ is an estimate of S. The traditional method of estimating S is to first compute an OLS estimate of 휷* ($\hat{\boldsymbol{\beta}}^* = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{y}^*$), compute the vector of estimated prediction errors ($\hat{\mathbf{e}} = \mathbf{y}^* - \mathbf{X}^*\hat{\boldsymbol{\beta}}^*$), convert the vector 퐞̂ into its matrix form (퐄̂ = mat(퐞̂)), and then estimate S as

퐒̂ = 퐃퐄̂′퐄̂퐃 (1.24)


where D is a diagonal matrix with $\sqrt{1/(n - q_j - 1)}$ as the jth diagonal element and $q_j$ is the number of predictors of response variable j. Equations 1.20 and 1.24 give unstructured covariance matrix estimates with r estimated variances and r(r – 1)/2 estimated covariances.

The estimated covariance matrix of $\hat{\boldsymbol{\beta}}^*_{GLS}$ is

$$\mathrm{cov}(\hat{\boldsymbol{\beta}}^*_{GLS}) = [\mathbf{X}^{*\prime}(\hat{\mathbf{S}}^{-1} \otimes \mathbf{I}_n)\mathbf{X}^*]^{-1} \qquad (1.25)$$

and the square roots of the diagonal elements of cov($\hat{\boldsymbol{\beta}}^*_{GLS}$) are the standard errors of the GLS estimates. An approximate 100(1 – α)% confidence interval for a′휷∗ is

$$\mathbf{a}'\hat{\boldsymbol{\beta}}^*_{GLS} \pm z_{\alpha/2}\sqrt{\mathbf{a}'\mathrm{cov}(\hat{\boldsymbol{\beta}}^*_{GLS})\mathbf{a}} \qquad (1.26)$$

and an approximate confidence interval for $\beta_j^*$ (the jth element in 휷∗) is obtained by specifying the jth element of a to equal 1 and all other values equal to 0. Equation 1.26 is approximate because it uses a GLS estimate of 휷∗ that is biased in small samples and also because the sampling distribution of a′$\hat{\boldsymbol{\beta}}^*_{GLS}$ is not accurately approximated by a normal distribution in small samples.
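A rough R sketch of the SUR computations in Equations 1.23 through 1.25 for a two-equation model (the data, sample size, and predictors below are invented; in practice a dedicated package such as systemfit would typically be used):

```r
# Feasible GLS estimation for a two-equation SUR model
set.seed(11)
n  <- 60
x1 <- rnorm(n); x2 <- rnorm(n)
e  <- matrix(rnorm(2 * n), n, 2) %*% chol(matrix(c(1, 0.6, 0.6, 1), 2))  # correlated errors
y1 <- 1 + 2 * x1 + e[, 1]
y2 <- -1 + 3 * x2 + e[, 2]
X1 <- cbind(1, x1); X2 <- cbind(1, x2)
Xs <- rbind(cbind(X1, matrix(0, n, 2)),
            cbind(matrix(0, n, 2), X2))            # block-diagonal X*
ys <- c(y1, y2)                                    # y* = vec(Y)
b_ols <- solve(t(Xs) %*% Xs) %*% t(Xs) %*% ys      # OLS start values
Ehat  <- matrix(ys - Xs %*% b_ols, n, 2)           # E-hat = mat(e-hat)
D     <- diag(1 / sqrt(n - 1 - 1), 2)              # q1 = q2 = 1 predictor per equation
Shat  <- D %*% t(Ehat) %*% Ehat %*% D              # Equation 1.24
W     <- solve(Shat) %x% diag(n)                   # S-hat inverse kron I_n
b_gls <- solve(t(Xs) %*% W %*% Xs) %*% t(Xs) %*% W %*% ys   # Equation 1.23
covb  <- solve(t(Xs) %*% W %*% Xs)                 # Equation 1.25
cbind(estimate = drop(b_gls), se = sqrt(diag(covb)))
```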
