
Chapter

Linear Algebra

Introduction

We discuss vectors, matrices, transposes, correlation and inverse matrices, subspaces and eigenanalysis. An alternative source for much of this material is the excellent book by Strang.

Transposes and Inner Products

A collection of variables may be treated as a single entity by writing them as a vector. For example, the three variables x_1, x_2 and x_3 may be written as the vector

x = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}

Bold face type is often used to denote vectors; scalars (single variables) are written with normal type. Vectors can be written as column vectors, where the variables go down the page, or as row vectors, where the variables go across the page. It needs to be made clear when using vectors whether x means a row vector or a column vector; most often it will mean a column vector, and in our text it will always mean a column vector unless we say otherwise. To turn a column vector into a row vector we use the transpose operator, T:

x^T = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix}

The transpose operator also turns row vectors into column vectors. We now define the inner product of two vectors:

x^T y = \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}
      = x_1 y_1 + x_2 y_2 + x_3 y_3
      = \sum_{i=1}^{3} x_i y_i

which is seen to be a scalar. The outer product of two vectors produces a matrix:

x y^T = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} \begin{bmatrix} y_1 & y_2 & y_3 \end{bmatrix}
      = \begin{bmatrix} x_1 y_1 & x_1 y_2 & x_1 y_3 \\ x_2 y_1 & x_2 y_2 & x_2 y_3 \\ x_3 y_1 & x_3 y_2 & x_3 y_3 \end{bmatrix}

An N × M matrix has N rows and M columns. The (i, j)th entry of a matrix is the entry on the jth column of the ith row. Given a matrix A (matrices are also often written in bold type), the (i, j)th entry is written as A_{ij}. When applying the transpose operator to a matrix, the ith row becomes the ith column. That is, if

A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}

then

A^T = \begin{bmatrix} a_{11} & a_{21} & a_{31} \\ a_{12} & a_{22} & a_{32} \\ a_{13} & a_{23} & a_{33} \end{bmatrix}

A matrix is symmetric if A_{ij} = A_{ji}. Another way to say this is that, for symmetric matrices, A = A^T.

Two matrices can be multiplied if the number of columns in the first matrix equals the number of rows in the second. Multiplying A, an N × M matrix, by B, an M × K matrix, results in a matrix C = AB of size N × K. The (i, j)th entry in C is the inner product between the ith row in A and the jth column in B; a small numerical example is sketched in the code below.

Given two matrices A and B we note that

(AB)^T = B^T A^T
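As a quick numerical illustration (a sketch added here, not taken from the original notes), the following Python/NumPy code multiplies a 2 × 3 matrix by a 3 × 2 matrix and checks the transpose identity; the particular numbers are made up.

```python
import numpy as np

# A is N x M (2 x 3), B is M x K (3 x 2), so C = AB is N x K (2 x 2).
A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[1., 0.],
              [0., 1.],
              [2., 3.]])

C = A @ B          # C[i, j] is the inner product of row i of A and column j of B
print(C)           # [[ 7. 11.]
                   #  [16. 23.]]

# Check that (AB)^T = B^T A^T
print(np.allclose(C.T, B.T @ A.T))   # True
```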

Properties of matrix multiplication

Matrix multiplication is associative

(AB)C = A(BC)

distributive

A(B + C) = AB + AC

but not commutative

AB ≠ BA


Types of matrices

Covariance matrices

In the previous chapter the covariance σ_{xy} between two variables x and y was defined. Given p variables there are p × p covariances to take account of. If we write the covariance between variables x_i and x_j as σ_{ij}, then all the covariances can be summarised in a covariance matrix, which we write below for p = 3:

C = \begin{bmatrix} σ_1^2 & σ_{12} & σ_{13} \\ σ_{21} & σ_2^2 & σ_{23} \\ σ_{31} & σ_{32} & σ_3^2 \end{bmatrix}

The ith diagonal element is the covariance between the ith variable and itself, which is simply the variance of that variable; we therefore write σ_i^2 instead of σ_{ii}. Also note that, because σ_{ij} = σ_{ji}, covariance matrices are symmetric.

We now look at computing a covariance matrix from a given data set. Suppose we have p variables and that a single observation x_i (a row vector) consists of measuring these variables, and suppose there are N such observations. We now make a matrix X by putting each x_i into the ith row. The matrix X is therefore an N × p matrix whose rows are made up of different observation vectors. If all the variables have zero mean, then the covariance matrix can be evaluated as

C = \frac{1}{N} X^T X

This is a multiplication of a p × N matrix, X^T, by an N × p matrix, X, which results in a p × p matrix. To illustrate the use of covariance matrices for time series, the figure below shows three time series having a particular covariance matrix C_1 and mean vector m_1.

Diagonal matrices

A diagonal matrix is an M × N matrix in which all the entries are zero except along the diagonal. For example, with M = N = 3,

D = \begin{bmatrix} d_{11} & 0 & 0 \\ 0 & d_{22} & 0 \\ 0 & 0 & d_{33} \end{bmatrix}


[Figure: Three time series, plotted against t, having the covariance matrix C_1 and mean vector m_1 shown in the text. The top and bottom series have high covariance, but none of the other pairings do.]

There is also a more compact notation for the same matrix

D = diag(d_{11}, d_{22}, d_{33})

If a covariance matrix is diagonal, it means that the covariances between variables are zero; that is, the variables are all uncorrelated. Non-diagonal covariance matrices are known as full covariance matrices. If V is a vector of variances, V = [σ_1^2, σ_2^2, σ_3^2]^T, then the corresponding diagonal covariance matrix is V_d = diag(V).

The correlation matrix

The correlation matrix can be derived from the covariance matrix by the transformation

R = B C B

where B is a diagonal matrix of inverse standard deviations

B = diag(1/σ_1, 1/σ_2, 1/σ_3)
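The covariance and correlation calculations above are easy to sketch numerically. The following NumPy fragment is illustrative only: the data matrix X is randomly generated and made zero mean, as required by the formula C = X^T X / N.

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 1000, 3
X = rng.standard_normal((N, p))         # N observations (rows) of p variables
X = X - X.mean(axis=0)                  # remove the sample mean so C = X^T X / N applies

C = X.T @ X / N                         # p x p covariance matrix
B = np.diag(1.0 / np.sqrt(np.diag(C)))  # diagonal matrix of inverse standard deviations
R = B @ C @ B                           # correlation matrix

print(np.diag(R))                       # approximately [1. 1. 1.]: unit diagonal
```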

The identity matrix

The identity matrix, I, is a diagonal matrix with ones along the diagonal. Multiplication of any matrix X by the identity matrix results in X. That is

IX = X

The identity matrix is the matrix equivalent of multiplying by 1 for scalars.


The Matrix Inverse

Given a matrix X, its inverse X^{-1} is defined by the properties

X^{-1} X = I
X X^{-1} = I

where I is the identity matrix. The inverse of a diagonal matrix with entries d_{ii} is another diagonal matrix with entries 1/d_{ii}. This satisfies the definition of an inverse, e.g. diag(d_{11}, d_{22}) diag(1/d_{11}, 1/d_{22}) = I.

More generally, the calculation of inverses involves a lot more computation. Before looking at the general case we first consider the problem of solving simultaneous equations. These constitute relations between a set of input or independent variables x_i and a set of output or dependent variables y_i. Each input-output pair constitutes an observation. In the following example we consider just N = 3 observations and p = 3 dimensions per observation:

2w_1 + w_2 + w_3 = 5
4w_1 - 6w_2 = -2
-2w_1 + 7w_2 + 2w_3 = 9

which can be written in matrix form

\begin{bmatrix} 2 & 1 & 1 \\ 4 & -6 & 0 \\ -2 & 7 & 2 \end{bmatrix} \begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix}

or in matrix form

Xw = y

This system of equations can be solved in a systematic way by subtracting multiples of the first equation from the second and third equations, and then subtracting multiples of the second equation from the third. For example, subtracting twice the first equation from the second, and -1 times the first from the third, gives

2w_1 + w_2 + w_3 = 5
-8w_2 - 2w_3 = -12
8w_2 + 3w_3 = 14

Then subtracting -1 times the second from the third gives

2w_1 + w_2 + w_3 = 5
-8w_2 - 2w_3 = -12
w_3 = 2

This process is known as forward elimination. We can then substitute the value for w_3 from the third equation into the second, etc. This process is back-substitution. The two processes are together known as Gaussian elimination. Following this through for our example we get w = [1, 1, 2]^T.
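Forward elimination and back-substitution can be written out in a few lines of NumPy. The sketch below is illustrative only (no pivoting or error checking) and is applied to the system above; np.linalg.solve gives the same answer.

```python
import numpy as np

def gaussian_elimination(X, y):
    """Solve Xw = y by forward elimination followed by back-substitution.
    Illustrative sketch: assumes all pivots are non-zero (no row exchanges)."""
    A = X.astype(float)
    b = y.astype(float)
    n = len(b)
    # Forward elimination: zero the entries below each pivot.
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    # Back-substitution: solve the triangular system from the bottom up.
    w = np.zeros(n)
    for i in range(n - 1, -1, -1):
        w[i] = (b[i] - A[i, i + 1:] @ w[i + 1:]) / A[i, i]
    return w

X = np.array([[2., 1., 1.], [4., -6., 0.], [-2., 7., 2.]])
y = np.array([5., -2., 9.])
print(gaussian_elimination(X, y))   # [1. 1. 2.]
print(np.linalg.solve(X, y))        # same result from the library routine
```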

When we come to invert a matrix (as opposed to solving a system of equations, as in the previous example) we start with the equation

A A^{-1} = I

and just write down all the entries in the A and I matrices in one big matrix

\left[ \begin{array}{ccc|ccc} 2 & 1 & 1 & 1 & 0 & 0 \\ 4 & -6 & 0 & 0 & 1 & 0 \\ -2 & 7 & 2 & 0 & 0 & 1 \end{array} \right]

We then perform forward elimination until the part of the matrix corresponding to A equals the identity matrix¹; the matrix on the right is then A^{-1}. This is because, in the equation above, if A becomes I then the left hand side is A^{-1} and the right side must equal the left side. We get

A^{-1} = \begin{bmatrix} 12/16 & -5/16 & -6/16 \\ 4/8 & -3/8 & -2/8 \\ -1 & 1 & 1 \end{bmatrix}

This process is known as the Gauss-Jordan method. For more details see Strang's excellent book on linear algebra, where this example was taken from.

Inverses can be used to solve equations of the form Xw = y. This is achieved by multiplying both sides by X^{-1}, giving

w = X^{-1} y

Hence

\begin{bmatrix} w_1 \\ w_2 \\ w_3 \end{bmatrix} = \begin{bmatrix} 12/16 & -5/16 & -6/16 \\ 4/8 & -3/8 & -2/8 \\ -1 & 1 & 1 \end{bmatrix} \begin{bmatrix} 5 \\ -2 \\ 9 \end{bmatrix}

which also gives w = [1, 1, 2]^T.
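The Gauss-Jordan result can be spot-checked numerically. The NumPy sketch below (illustrative, using the same coefficient matrix) inverts X and recovers w.

```python
import numpy as np

X = np.array([[2., 1., 1.], [4., -6., 0.], [-2., 7., 2.]])
y = np.array([5., -2., 9.])

X_inv = np.linalg.inv(X)
print(np.allclose(X_inv @ X, np.eye(3)))   # True: X^{-1} X = I
print(X_inv @ y)                           # [1. 1. 2.], i.e. w = X^{-1} y
```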

The inverse of a product of matrices is given by

(AB)^{-1} = B^{-1} A^{-1}

Only square matrices are invertible because, for y = Ax, if y and x are of different dimension then we will not necessarily have a one-to-one mapping between them.

¹ We do not perform back-substitution but instead continue with forward elimination until we get a diagonal matrix.


Orthogonality

The length of a d-element vector x is written as ||x||, where

||x||^2 = \sum_{i=1}^{d} x_i^2 = x^T x

Two vectors x and y are orthogonal if

[Figure: Two vectors x and y, and the difference vector x - y. These vectors will be orthogonal if they obey Pythagoras' relation, i.e. that the sum of the squares of the sides equals the square of the hypotenuse.]

||x||^2 + ||y||^2 = ||x - y||^2

That is, if

x_1^2 + \cdots + x_d^2 + y_1^2 + \cdots + y_d^2 = (x_1 - y_1)^2 + \cdots + (x_d - y_d)^2

Expanding the terms on the right and rearranging leaves only the cross-terms

x_1 y_1 + \cdots + x_d y_d = 0

x^T y = 0

That is, two vectors are orthogonal if their inner product is zero.

Angles between vectors

Given a vector b = [b_1, b_2]^T and a vector a = [a_1, a_2]^T we can work out that


[Figure: Working out the angle between two vectors. The vector a has length ||a|| and makes an angle α with the horizontal axis; the vector b has length ||b|| and makes an angle β; the angle between the two vectors is δ = β - α, and the difference vector has length ||b - a||.]

cos α = a_1 / ||a||

sin α = a_2 / ||a||

cos β = b_1 / ||b||

sin β = b_2 / ||b||

Now, cos δ = cos(β - α), which we can expand using the trig identity

cos(β - α) = cos β cos α + sin β sin α

Hence

cos δ = \frac{a_1 b_1 + a_2 b_2}{||a|| \, ||b||}

More generally, we have

cos δ = \frac{a^T b}{||a|| \, ||b||}

Because cos(π/2) = 0, this again shows that vectors are orthogonal for a^T b = 0. Also, because |cos δ| ≤ 1, where |x| denotes the absolute value of x, we have

|a^T b| ≤ ||a|| \, ||b||

which is known as the Schwarz Inequality.
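A short NumPy sketch of the angle formula and the Schwarz inequality, using two arbitrarily chosen vectors:

```python
import numpy as np

a = np.array([3., 0.])
b = np.array([1., 1.])

cos_delta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(np.degrees(np.arccos(cos_delta)))     # 45.0 degrees between a and b

# Schwarz inequality: |a^T b| <= ||a|| ||b||
print(abs(a @ b) <= np.linalg.norm(a) * np.linalg.norm(b))   # True
```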

Projections

The projection of a vector b onto a vector a results in a projection vector p, which is the point on the line through a that is closest to the point b.

[Figure: The projection p of b onto a is the point on a which is closest to b; the vector b - p is orthogonal to a, and δ is the angle between a and b.]

Because p is a point on a, it must be some scalar multiple of it. That is,

p = w a

where w is some coefficient. Because p is the point on a closest to b, this means that the vector b - p is orthogonal to a. Therefore

a^T (b - p) = 0

a^T (b - w a) = 0

Rearranging gives

w = \frac{a^T b}{a^T a}

and

p = \frac{a^T b}{a^T a} a

We refer to p as the projection vector and to w as the projection coefficient.
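A NumPy sketch of the projection formulae (the vectors a and b are arbitrary illustrations):

```python
import numpy as np

a = np.array([2., 0., 0.])
b = np.array([1., 2., 3.])

w = (a @ b) / (a @ a)    # projection coefficient w = a^T b / a^T a
p = w * a                # projection vector: the point on a closest to b
print(p)                 # [1. 0. 0.]

# The residual b - p is orthogonal to a.
print(np.isclose(a @ (b - p), 0.0))   # True
```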

Orthogonal Matrices

The set of vectors q_1, ..., q_k are orthogonal if

q_j^T q_k = \begin{cases} 0 & j \neq k \\ d_j & j = k \end{cases}

If these vectors are placed in the columns of the matrix Q, then

Q^T Q = D

where D is the diagonal matrix with entries d_j.


Orthonormal Matrices

The set of vectors q_1, ..., q_k are orthonormal if

q_j^T q_k = \begin{cases} 0 & j \neq k \\ 1 & j = k \end{cases}

If these vectors are placed in the columns of a (square) matrix Q, then

Q^T Q = Q Q^T = I

Hence, the transpose equals the inverse

Q^T = Q^{-1}

The vectors q_1, ..., q_k are said to provide an orthonormal basis. This means that any vector can be written as a linear combination of the basis vectors. A trivial example is the two-dimensional Cartesian coordinate system, where q_1 = [1, 0]^T (the x-axis) and q_2 = [0, 1]^T (the y-axis). More generally, to represent the vector x we can write

x = \tilde{x}_1 q_1 + \tilde{x}_2 q_2 + \cdots + \tilde{x}_d q_d

To find the appropriate coefficients \tilde{x}_k (the coordinates in the new basis), multiply both sides by q_k^T. Due to the orthonormality property, all terms on the right disappear except one, leaving

\tilde{x}_k = q_k^T x

The new coordinates are therefore the projections of the data onto the basis functions (compare the projection equation earlier; there is no denominator here since q_k^T q_k = 1). In matrix form the expansion above can be written as x = Q \tilde{x}, which therefore has the solution \tilde{x} = Q^{-1} x. But given that Q^{-1} = Q^T, we have

\tilde{x} = Q^T x

Transformation to an orthonormal basis preserves lengths.² This is because

||\tilde{x}||^2 = ||Q^T x||^2
               = (Q^T x)^T (Q^T x)
               = x^T Q Q^T x
               = x^T x
               = ||x||^2

Similarly, inner products, and therefore angles between vectors, are preserved. That is,

\tilde{x}^T \tilde{y} = (Q^T x)^T (Q^T y)
                      = x^T Q Q^T y
                      = x^T y

Therefore, transformation by an orthonormal matrix constitutes a rotation of the coordinate system.
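These properties are easy to check numerically. The sketch below builds a 2-D rotation matrix (an illustrative orthonormal Q) and verifies that lengths and inner products are preserved under the transformation to the new coordinates.

```python
import numpy as np

theta = 0.3                                       # an arbitrary rotation angle
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # orthonormal columns: Q^T Q = I

print(np.allclose(Q.T @ Q, np.eye(2)))            # True

x = np.array([1., 2.])
y = np.array([-3., 0.5])
x_new, y_new = Q.T @ x, Q.T @ y                   # coordinates in the rotated basis

print(np.isclose(np.linalg.norm(x_new), np.linalg.norm(x)))   # lengths preserved
print(np.isclose(x_new @ y_new, x @ y))                       # inner products preserved
```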

² Throughout this chapter we will make extensive use of the matrix identities (AB)^T = B^T A^T and (AB)C = A(BC). We will also use (AB)^{-1} = B^{-1} A^{-1}.


Subspaces

A space is, for example, a set of real numbers. A subspace S is a set of points {x} such that (i) if we take two vectors from S and add them we remain in S, and (ii) if we take a vector from S and multiply it by a scalar we also remain in S; S is said to be closed under addition and multiplication. An example is a 2-D plane in a 3-D space. A subspace can be defined by a basis.

Determinants

The determinant of a two-by-two matrix

A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}

is given by

det(A) = ad - bc

The determinant of a three-by-three matrix

A = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}

is given by

det(A) = a \, det \begin{bmatrix} e & f \\ h & i \end{bmatrix} - b \, det \begin{bmatrix} d & f \\ g & i \end{bmatrix} + c \, det \begin{bmatrix} d & e \\ g & h \end{bmatrix}

Determinants are important because of their properties. In particular, if two rows of a matrix are equal then the determinant is zero; e.g. if

A = \begin{bmatrix} a & b \\ a & b \end{bmatrix}

then

det(A) = ab - ba = 0

In this case the transformation from x = [x_1, x_2]^T to y = [y_1, y_2]^T given by

Ax = y

reduces two pieces of information (x_1 and x_2) to one piece of information

y = y_1 = y_2 = a x_1 + b x_2

In this case it is not possible to reconstruct x from y: the transformation is not invertible, the matrix A does not have an inverse, and the value of the determinant provides a test for this. If det(A) = 0 then the matrix A is not invertible; it is singular.

[Figure: A singular (non-invertible) transformation from (x_1, x_2) to y.]

Conversely, if det(A) ≠ 0 then A is invertible. Other properties of the determinant are

det(A^T) = det(A)

det(AB) = det(A) det(B)

det(A^{-1}) = 1 / det(A)

det(A) = \prod_k a_{kk}   (for a diagonal or triangular matrix A)

Another important property of determinants is that they measure the 'volume' of a matrix. For a 3-by-3 matrix, the three rows of the matrix form the edges of a parallelepiped (a skewed cube); the determinant is the volume of this solid. For a d-by-d matrix the rows form the edges of a d-dimensional parallelepiped, and again the determinant is its volume.
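The determinant properties listed above can be spot-checked numerically; the matrices in the NumPy sketch below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

print(np.isclose(np.linalg.det(A.T), np.linalg.det(A)))                       # det(A^T) = det(A)
print(np.isclose(np.linalg.det(A @ B), np.linalg.det(A) * np.linalg.det(B)))  # det(AB) = det(A) det(B)
print(np.isclose(np.linalg.det(np.linalg.inv(A)), 1.0 / np.linalg.det(A)))    # det(A^{-1}) = 1/det(A)

D = np.diag([2., 3., 4.])
print(np.isclose(np.linalg.det(D), np.prod(np.diag(D))))                      # product of diagonal entries
```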

Eigenanalysis

The square matrix A has eigenvalues λ and eigenvectors q if

A q = λ q

Therefore

(A - λ I) q = 0

To satisfy this equation, either q = 0, which is uninteresting, or the matrix A - λI must reduce q to the null vector (a single point). For this to happen A - λI must be singular. Hence

det(A - λ I) = 0

Eigenanalysis therefore proceeds by (i) solving the above equation to find the eigenvalues λ_i and then (ii) substituting them into A q = λ q to find the eigenvectors. For a 2-by-2 matrix A, for example, det(A - λI) = 0 can be rearranged into a quadratic equation in λ; its two roots are the eigenvalues λ_1 and λ_2, and substituting each back into A q = λ q gives the corresponding eigenvectors q_1 and q_2, each of which is defined only up to a scalar multiple.
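As a concrete stand-in for a worked example, the NumPy sketch below finds the eigenvalues and eigenvectors of a small matrix and verifies A q = λ q; the matrix entries are made up for illustration.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])          # an illustrative 2 x 2 matrix

lam, Q = np.linalg.eig(A)         # eigenvalues and (column) eigenvectors
print(lam)                        # 3 and 1 (order may vary)

# Each column q_k satisfies A q_k = lambda_k q_k.
for k in range(2):
    print(np.allclose(A @ Q[:, k], lam[k] * Q[:, k]))   # True, True
```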

We now note that the determinant of a matrix is also equal to the product of its eigenvalues

det(A) = \prod_k λ_k

We also define the trace of a matrix as the sum of its diagonal elements

Tr(A) = \sum_k a_{kk}

and note that it is also equal to the sum of the eigenvalues

Tr(A) = \sum_k λ_k

Eigenanalysis applies only to square matrices.

Gram-Schmidt

A general class of procedures for finding eigenvectors are the deflation methods, of which QR-decomposition and Gram-Schmidt orthogonalization are examples.

In Gram-Schmidt we are given a set of vectors, say a, b and c, and we wish to find a set of corresponding orthonormal vectors, which we'll call q_1, q_2 and q_3. To start with, we let

q_1 = \frac{a}{||a||}

We then compute b', which is the original vector b minus the projection vector (see the projection equation earlier) of b onto q_1:

b' = b - (q_1^T b) q_1

The second orthonormal vector is then a unit-length version of b':

q_2 = \frac{b'}{||b'||}

Finally, the third orthonormal vector is given by

q_3 = \frac{c'}{||c'||}

where

c' = c - (q_1^T c) q_1 - (q_2^T c) q_2

In QR-decomposition, the Q terms are given by the vectors q_i and the R terms by projection coefficients of the form q_i^T c.
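A compact NumPy sketch of the Gram-Schmidt procedure described above (illustrative implementation; it assumes the input vectors are linearly independent):

```python
import numpy as np

def gram_schmidt(vectors):
    """Return orthonormal vectors spanning the same space as the inputs.
    Illustrative sketch: assumes the inputs are linearly independent."""
    qs = []
    for v in vectors:
        v = v.astype(float)
        for q in qs:
            v = v - (q @ v) * q          # subtract the projection onto each earlier q
        qs.append(v / np.linalg.norm(v))
    return np.array(qs)

a = np.array([1., 1., 0.])
b = np.array([1., 0., 1.])
c = np.array([0., 1., 1.])

Q = gram_schmidt([a, b, c])              # rows are q_1, q_2, q_3
print(np.allclose(Q @ Q.T, np.eye(3)))   # True: the q_i are orthonormal
```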


Diagonalization

If we put the eigenvectors into the columns of a matrix

Q = \begin{bmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_d \\ | & | & & | \end{bmatrix}

then, because A q_k = λ_k q_k, we have

A Q = \begin{bmatrix} | & | & & | \\ λ_1 q_1 & λ_2 q_2 & \cdots & λ_d q_d \\ | & | & & | \end{bmatrix}

If we put the eigenvalues into the diagonal matrix Λ, then the above matrix can also be written as QΛ. Therefore

A Q = Q Λ

Pre-multiplying both sides by Q^{-1} gives

Q^{-1} A Q = Λ

This shows that any square matrix can be converted into a diagonal form, provided it has d distinct eigenvalues (see e.g. Strang). Sometimes there won't be d distinct eigenvalues and sometimes they'll be complex.

Spectral Theorem

For any real symmetric matrix, all the eigenvalues will be real and there will be d eigenvalues (not necessarily distinct) with d eigenvectors. The eigenvectors will be orthogonal (if the matrix is not symmetric, the eigenvectors won't, in general, be orthogonal). They can be normalised and placed into the matrix Q. Since Q is now orthonormal we have Q^{-1} = Q^T. Hence

Q^T A Q = Λ

Pre-multiplying by Q and post-multiplying by Q^T gives

A = Q Λ Q^T

which is known as the spectral theorem. It says that any real symmetric matrix can be represented as above, where the columns of Q contain the eigenvectors and Λ is a diagonal matrix containing the eigenvalues, λ_i. Equivalently,

A = \begin{bmatrix} | & | & & | \\ q_1 & q_2 & \cdots & q_d \\ | & | & & | \end{bmatrix}
    \begin{bmatrix} λ_1 & & & \\ & λ_2 & & \\ & & \ddots & \\ & & & λ_d \end{bmatrix}
    \begin{bmatrix} q_1^T \\ q_2^T \\ \vdots \\ q_d^T \end{bmatrix}


This can also be written as a summation

A = \sum_{k=1}^{d} λ_k q_k q_k^T
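A NumPy sketch of the spectral theorem for an illustrative symmetric matrix, including the sum-of-outer-products form:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])             # real and symmetric (entries are illustrative)

lam, Q = np.linalg.eigh(A)               # eigh: for symmetric matrices, Q is orthonormal
Lam = np.diag(lam)

print(np.allclose(A, Q @ Lam @ Q.T))     # True: A = Q Lambda Q^T

# Equivalent rank-one expansion: A = sum_k lambda_k q_k q_k^T
A_sum = sum(lam[k] * np.outer(Q[:, k], Q[:, k]) for k in range(3))
print(np.allclose(A, A_sum))             # True
```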

Complex Matrices

If A is a matrix with complex entries, then the complex transpose, or Hermitian transpose, A^H is obtained by changing each entry into its complex conjugate (see appendix) and then transposing the result. Just as A^{-T} denotes the transpose of the inverse, so A^{-H} denotes the Hermitian transpose of the inverse.

If A^H A is a diagonal matrix then A is said to be a unitary matrix; it is the complex equivalent of an orthogonal matrix.

Quadratic Forms

The quadratic function

f(x) = a_{11} x_1^2 + a_{12} x_1 x_2 + a_{21} x_2 x_1 + \cdots + a_{dd} x_d^2

can be written in matrix form as

f(x) = \begin{bmatrix} x_1 & x_2 & \cdots & x_d \end{bmatrix}
       \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1d} \\ a_{21} & a_{22} & \cdots & a_{2d} \\ \vdots & & & \vdots \\ a_{d1} & a_{d2} & \cdots & a_{dd} \end{bmatrix}
       \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_d \end{bmatrix}

which is written compactly as

f(x) = x^T A x

If f(x) > 0 for any non-zero x, then A is said to be positive-definite. Similarly, if f(x) ≥ 0 for any non-zero x, then A is positive-semidefinite.

If we substitute A = Q Λ Q^T and x = Q y, where y are the projections onto the eigenvectors, then we can write

f(x) = y^T Λ y
     = \sum_i λ_i y_i^2

Hence, for positive-definiteness we must have λ_i > 0 for all i, i.e. positive eigenvalues.
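A short NumPy sketch of this eigenvalue test for positive-definiteness (the matrices are illustrative):

```python
import numpy as np

def is_positive_definite(A):
    """Test whether x^T A x > 0 for all non-zero x by checking that every
    eigenvalue is positive. Assumes A is real and symmetric."""
    return bool(np.all(np.linalg.eigvalsh(A) > 0))

A_pd  = np.array([[2., 1.], [1., 2.]])   # eigenvalues 3 and 1
A_not = np.array([[1., 2.], [2., 1.]])   # eigenvalues 3 and -1

print(is_positive_definite(A_pd))        # True
print(is_positive_definite(A_not))       # False
```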

Ellipses

For 2-by-2 matrices, if A = I then we have

f = x_1^2 + x_2^2

which is the equation of a circle with radius \sqrt{f}. If A = kI we have

f/k = x_1^2 + x_2^2

and the radius is now \sqrt{f/k}. If A = diag(k_1, k_2) we have

f = k_1 x_1^2 + k_2 x_2^2

which is the equation of an ellipse. For k_1 > k_2, the major axis has length \sqrt{f/k_2} and the minor axis has length \sqrt{f/k_1}.

For a non-diagonal A we can diagonalise it using A = Q Λ Q^T. This gives

f = λ_1 \tilde{x}_1^2 + λ_2 \tilde{x}_2^2

where the ellipse now lives in a new coordinate system given by the rotation \tilde{x} = Q^T x. The major and minor axes have lengths \sqrt{f/λ_2} and \sqrt{f/λ_1}.
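Finally, a NumPy sketch of the axis-length calculation for an illustrative non-diagonal A: the eigenvectors of A give the axis directions of the ellipse x^T A x = f, and the eigenvalues give the axis lengths as in the formula above.

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])                 # illustrative symmetric, positive-definite matrix
f = 1.0                                  # level set x^T A x = f

lam, Q = np.linalg.eigh(A)               # lam = [1., 3.] in ascending order
axis_lengths = np.sqrt(f / lam)          # lengths sqrt(f / lambda_i) along the columns of Q
print(axis_lengths)                      # [1.    0.577...]: major then minor axis
```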