
MATH 532: Linear Algebra, Chapter 4: Vector Spaces

Greg Fasshauer

Department of Applied Mathematics, Illinois Institute of Technology

Spring 2015

Outline

1 Spaces and Subspaces

2 Four Fundamental Subspaces

3 Linear Independence

4 Bases and Dimension

5 More About Rank

6 Classical Least Squares

7 Kriging as best linear unbiased predictor

Spaces and Subspaces

While the discussion of vector spaces can be rather dry and abstract, they are an essential tool for describing the world we work in and for understanding many practically relevant consequences.

After all, linear algebra is pretty much the workhorse of modern applied mathematics.

Moreover, many concepts we discuss now for traditional “vectors” apply also to vector spaces of functions, which form the foundation of functional analysis.


Definition
A set V of elements (vectors) is called a vector space (or linear space) over the field F if

(A1) x + y ∈ V for any x, y ∈ V (closed under addition),
(A2) (x + y) + z = x + (y + z) for all x, y, z ∈ V,
(A3) x + y = y + x for all x, y ∈ V,
(A4) there exists a zero vector 0 ∈ V such that x + 0 = x for every x ∈ V,
(A5) for every x ∈ V there is a negative (−x) ∈ V such that x + (−x) = 0,

(M1) αx ∈ V for every α ∈ F and x ∈ V (closed under scalar multiplication),
(M2) (αβ)x = α(βx) for all α, β ∈ F, x ∈ V,
(M3) α(x + y) = αx + αy for all α ∈ F, x, y ∈ V,
(M4) (α + β)x = αx + βx for all α, β ∈ F, x ∈ V,
(M5) 1x = x for all x ∈ V.

Examples of vector spaces

V = R^m and F = R (traditional real vectors)
V = C^m and F = C (traditional complex vectors)
V = R^(m×n) and F = R (real matrices)
V = C^(m×n) and F = C (complex matrices)

But also:
V is polynomials of a certain degree with real coefficients, F = R
V is continuous functions on an interval [a, b], F = R

Subspaces

Definition
Let S be a nonempty subset of V. If S is a vector space, then S is called a subspace of V.

Q: What is the difference between a subset and a subspace?
A: The structure provided by the axioms (A1)–(A5), (M1)–(M5).

Theorem
The subset S ⊆ V is a subspace of V if and only if

αx + βy ∈ S for all x, y ∈ S, α, β ∈ F. (1)

Remark
Z = {0} is called the trivial subspace.

Proof.
“=⇒”: Clear, since we actually have

(1) ⇐⇒ (A1) and (M1).

“⇐=”: Only (A1), (A4), (A5) and (M1) need to be checked (why?).

In fact, we see that (A1) and (M1) imply (A4) and (A5):

If x ∈ S, then, using (M1), (−1)x = −x ∈ S, i.e., (A5) holds.

Using (A1), x + (−x) = 0 ∈ S, so that (A4) holds.

Definition
Let S = {v1, ..., vr} ⊆ V. The span of S is

span(S) = {α1v1 + α2v2 + ... + αr vr : αi ∈ F}.

Remark
span(S) contains all possible linear combinations of vectors in S. One can easily show that span(S) is a subspace of V.

Example (Geometric interpretation)
1 If S = {v1} ⊆ R^3, then span(S) is the line through the origin with direction v1.
2 If S = {v1, v2 : v1 ≠ αv2, α ≠ 0} ⊆ R^3, then span(S) is the plane through the origin “spanned by” v1 and v2.

Definition
Let S = {v1, ..., vr} ⊆ V. If span(S) = V then S is called a spanning set for V.

Remark
A spanning set is sometimes referred to as a (finite) frame. A spanning set is not the same as a basis since the spanning set may include redundancies.

Example
{[1 0 0]^T, [0 1 0]^T, [0 0 1]^T} is a spanning set for R^3.
{[1 0 0]^T, [0 1 0]^T, [0 0 1]^T, [2 0 0]^T, [0 2 0]^T, [0 0 2]^T} is also a spanning set for R^3.


Connection to linear systems

Theorem
Let S = {a1, a2, ..., an} be the set of columns of an m × n matrix A. Then span(S) = R^m if and only if for every b ∈ R^m there exists an x ∈ R^n such that Ax = b (i.e., if and only if Ax = b is consistent for every b ∈ R^m).

Proof.
By definition, S is a spanning set for R^m if and only if for every b ∈ R^m there exist α1, ..., αn ∈ R such that

b = α1a1 + ... + αn an = Ax,

where A = [a1 a2 ··· an] is m × n and x = [α1, ..., αn]^T.
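To make the consistency criterion concrete, here is a small numerical sketch (not from the slides; it assumes NumPy is available): Ax = b is consistent exactly when appending b to the columns of A does not increase the rank.

import numpy as np

# Columns of A span R^3 exactly when rank(A) = 3 (= m).
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(np.linalg.matrix_rank(A))  # 2, so the columns do NOT span R^3

# For a specific b, Ax = b is consistent iff rank([A | b]) == rank(A).
b = np.array([[0.], [0.], [1.]])
augmented = np.hstack([A, b])
consistent = np.linalg.matrix_rank(augmented) == np.linalg.matrix_rank(A)
print(consistent)  # False: this b is not in R(A)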

Remark
The sum X + Y = {x + y : x ∈ X, y ∈ Y} is a subspace of V provided X and Y are subspaces.

If SX and SY are spanning sets for X and Y, respectively, then SX ∪ SY is a spanning set for X + Y.

Four Fundamental Subspaces

Recall that a linear function f : R^n → R^m satisfies

f(αx + βy) = αf(x) + βf(y) for all α, β ∈ R, x, y ∈ R^n.

Example
Let A be a real m × n matrix and

f(x) = Ax, x ∈ R^n.

The function f is linear since A(αx + βy) = αAx + βAy. Moreover, the range of f,

R(f) = {Ax : x ∈ R^n} ⊆ R^m,

is a subspace of R^m since for all α, β ∈ R and x, y ∈ R^n

α(Ax) + β(Ay) = A(αx + βy) ∈ R(f),

where Ax, Ay ∈ R(f).

Remark
For the situation in this example we can also use the terminology range of A (or image of A), i.e.,

R(A) = {Ax : x ∈ R^n} ⊆ R^m.

Similarly,

R(A^T) = {A^T y : y ∈ R^m} ⊆ R^n

is called the range of A^T.

Column space and row space

Since Ax = α1a1 + ... + αn an,

we have R(A) = span{a1, ..., an}, i.e.,

R(A) is the column space of A.

Similarly, R(A^T) is the row space of A.

Example
Consider

    [1 2 3]
A = [4 5 6].
    [7 8 9]

By definition, the columns of A span R(A), i.e., they form a spanning set of R(A), and the rows of A span R(A^T), i.e., they form a spanning set of R(A^T).

However, since

(A)∗3 = 2(A)∗2 − (A)∗1 and (A)3∗ = 2(A)2∗ − (A)1∗,

we also have

R(A) = span{(A)∗1, (A)∗2}
R(A^T) = span{(A)1∗, (A)2∗}.
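A minimal numerical check of these redundancy relations (not from the slides; it assumes NumPy):

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])

# Third column is a combination of the first two: A[:,2] = 2 A[:,1] - A[:,0]
print(np.allclose(A[:, 2], 2 * A[:, 1] - A[:, 0]))  # True
# Same relation for the rows: A[2,:] = 2 A[1,:] - A[0,:]
print(np.allclose(A[2, :], 2 * A[1, :] - A[0, :]))  # True
# Hence rank(A) = 2: two columns (or rows) already span R(A) (or R(A^T)).
print(np.linalg.matrix_rank(A))  # 2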

In general, how do we find such minimal spanning sets as in the previous example?

An important tool is

Lemma
Let A, B be m × n matrices. Then
(1) R(A^T) = R(B^T) ⇐⇒ A and B are row equivalent (⇐⇒ E_A = E_B).
(2) R(A) = R(B) ⇐⇒ A and B are column equivalent (⇐⇒ E_(A^T) = E_(B^T)).

Proof
(1) “⇐=”: Assume A is row equivalent to B, i.e., there exists a nonsingular matrix P such that PA = B ⇐⇒ A^T P^T = B^T.

Now a ∈ R(A^T) ⇐⇒ a = A^T y for some y. We rewrite this as

a = A^T P^T P^(−T) y = B^T P^(−T) y
⇐⇒ a = B^T x for x = P^(−T) y
⇐⇒ a ∈ R(B^T).

(cont.)
“=⇒”: Assume R(A^T) = R(B^T), i.e.,

span{(A)1∗, ..., (A)m∗} = span{(B)1∗, ..., (B)m∗},

i.e., the rows of A are linear combinations of rows of B and vice versa. Now apply row operations to A (all collected in P) to obtain

PA = B, i.e., A is row equivalent to B.

(2) Replace A by A^T and B by B^T in (1). □

Theorem
Let A be an m × n matrix and U any row echelon form obtained from A. Then
(1) R(A^T) = span of the nonzero rows of U.
(2) R(A) = span of the basic columns of A.

Remark
Later we will see that any minimal spanning set of columns of A forms a basis for R(A).

Proof
(1) This follows from (1) in the Lemma since A is row equivalent to U.

(2) Assume the columns of A are permuted (with a matrix Q1) such that

AQ1 = [B N],

where B contains the basic columns, and N the nonbasic columns.

By definition, the nonbasic columns are linear combinations of the basic columns, i.e., there exists a nonsingular Q2 such that

[B N] Q2 = [B O],

where O is a zero matrix.

(cont.)
Putting this together, we have

A Q1 Q2 = [B O], with Q = Q1 Q2,

so that A is column equivalent to [B O].

(2) in the Lemma then says that

R(A) = span{B∗1, ..., B∗r},

where r = rank(A). □

So far, we have two of the four fundamental subspaces: R(A) and R(A^T).

Third fundamental subspace: N(A) = {x : Ax = 0} ⊆ R^n,
N(A) is the nullspace of A (also called the kernel of A).

Fourth fundamental subspace: N(A^T) = {y : A^T y = 0} ⊆ R^m,
N(A^T) is the left nullspace of A.

Remark
N(A) is a linear space, i.e., a subspace of R^n. To see this, assume x, y ∈ N(A), i.e., Ax = Ay = 0. Then A(αx + βy) = αAx + βAy = 0, so that αx + βy ∈ N(A).

How to find a (minimal) spanning set for N(A)

Find a row echelon form U of A and solve Ux = 0.

Example
We can compute

    [1 2 3]          [1  2  3]
A = [4 5 6]  −→  U = [0 −3 −6].
    [7 8 9]          [0  0  0]

So Ux = 0 =⇒ x2 = −2x3 and x1 = −2x2 − 3x3 = x3, or

[x1]   [  x3]      [ 1]
[x2] = [−2x3] = x3 [−2].
[x3]   [  x3]      [ 1]

Therefore

N(A) = span{[1 −2 1]^T}.
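A quick check of this example (not from the slides; it assumes SymPy):

import sympy as sp

A = sp.Matrix([[1, 2, 3],
               [4, 5, 6],
               [7, 8, 9]])
print(A.nullspace())   # [Matrix([[1], [-2], [1]])], matching span{[1 -2 1]^T}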

Remark
We will see later that, as in the example, if rank(A) = r, then N(A) is spanned by n − r vectors.

Theorem
Let A be an m × n matrix. Then
(1) N(A) = {0} ⇐⇒ rank(A) = n.
(2) N(A^T) = {0} ⇐⇒ rank(A) = m.

Proof.
(1) We know rank(A) = n ⇐⇒ the only solution of Ax = 0 is x = 0, i.e., N(A) = {0}.
(2) Repeat (1) with A replaced by A^T and use rank(A^T) = rank(A).

How to find a spanning set of N(A^T)

Theorem
Let A be an m × n matrix with rank(A) = r, and let P be a nonsingular matrix so that PA = U (row echelon form). Then the last m − r rows of P span N(A^T).

Remark
We will later see that this spanning set is also a basis for N(A^T).
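A numerical sketch of the theorem (not from the slides; it assumes NumPy and SciPy). scipy.linalg.lu factors A = P L U with a permutation P and unit lower triangular L, so M = L^(−1) P^T is one concrete nonsingular matrix with M A = U:

import numpy as np
from scipy.linalg import lu

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
P, L, U = lu(A)                 # A = P L U
M = np.linalg.inv(L) @ P.T      # M A = U (a row echelon form)
r = np.linalg.matrix_rank(A)    # r = 2
y = M[r:, :].T                  # last m - r rows of M, transposed
print(np.allclose(A.T @ y, 0))  # True: they lie in (and here span) N(A^T)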

Proof
Partition P as

P = [P1]
    [P2],

where P1 is r × m and P2 is (m − r) × m.

The claim of the theorem implies that we should show that R(P2^T) = N(A^T).

We do this in two parts:
1 Show that R(P2^T) ⊆ N(A^T).
2 Show that N(A^T) ⊆ R(P2^T).

(cont.)
1 Partition

U = [C]
    [O]

with C ∈ R^(r×n) and O ∈ R^((m−r)×n) (a zero matrix). Then

PA = U ⇐⇒ [P1] A = [C],  which implies  P2 A = O.
           [P2]     [O]

This also means that

A^T P2^T = O^T,

i.e., every column of P2^T is in N(A^T), so that R(P2^T) ⊆ N(A^T).

(cont.)
2 Now, show N(A^T) ⊆ R(P2^T).

We assume y ∈ N(A^T) and show that then y ∈ R(P2^T). By definition,

y ∈ N(A^T) =⇒ A^T y = 0 ⇐⇒ y^T A = 0^T.

Since PA = U =⇒ A = P^(−1) U, and so

0^T = y^T P^(−1) U = y^T [Q1 Q2] [C]
                                 [O]

or

0^T = y^T Q1 C,

where P^(−1) = [Q1 Q2] with Q1 of size m × r and Q2 of size m × (m − r).

(cont.)
However, since rank(C) = r and C is r × n, we get N(C^T) = {0} (applying part (2) of our earlier theorem to C, whose rank equals its number of rows), and therefore y^T Q1 = 0^T.

Obviously, this implies that we also have

y^T Q1 P1 = 0^T. (2)

(cont.)
Now

P = [P1]
    [P2]

and P^(−1) = [Q1 Q2], so that

I = P^(−1) P = Q1 P1 + Q2 P2

or

Q1 P1 = I − Q2 P2. (3)

Now we insert (3) into (2) and get

y^T (I − Q2 P2) = 0^T ⇐⇒ y^T = y^T Q2 P2 = z^T P2, with z^T = y^T Q2.

Therefore y ∈ R(P2^T). □

Finally,

Theorem
Let A, B be m × n matrices.
(1) N(A) = N(B) ⇐⇒ A and B are row equivalent.
(2) N(A^T) = N(B^T) ⇐⇒ A and B are column equivalent.

Proof.
See [Mey00, Section 4.2].

Linear Independence

Definition
A set of vectors S = {v1, ..., vn} is called linearly independent if

α1v1 + α2v2 + ... + αn vn = 0 =⇒ α1 = α2 = ... = αn = 0.

Otherwise S is linearly dependent.

Remark
Linear independence is a property of a set, not of vectors.

Example
Is S = {[1 4 7]^T, [2 5 8]^T, [3 6 9]^T} linearly independent?

Consider

α1 [1 4 7]^T + α2 [2 5 8]^T + α3 [3 6 9]^T = [0 0 0]^T

⇐⇒ Ax = 0, where

    [1 2 3]       [α1]
A = [4 5 6],  x = [α2].
    [7 8 9]       [α3]

Example (cont.)
Since A is row equivalent to

     [1 2 3]
EA = [0 1 2]
     [0 0 0]

we know that N(A) is nontrivial, i.e., the system Ax = 0 has a nonzero solution, and therefore S is linearly dependent.
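A one-line numerical test of this example (not from the slides; it assumes NumPy): the columns are independent exactly when rank(A) equals the number of columns.

import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [7., 8., 9.]])
print(np.linalg.matrix_rank(A) == A.shape[1])  # False, so S is linearly dependent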

More generally,

Theorem
Let A be an m × n matrix.
(1) The columns of A are linearly independent if and only if N(A) = {0} ⇐⇒ rank(A) = n.
(2) The rows of A are linearly independent if and only if N(A^T) = {0} ⇐⇒ rank(A) = m.

Proof. See [Mey00, Section 4.3].

Definition
A square matrix A is called diagonally dominant if

|aii| > Σ_{j=1, j≠i}^{n} |aij|, i = 1, ..., n.

Remark
Aside from being nonsingular (see next slide), diagonally dominant matrices are important since they ensure that Gaussian elimination will succeed without pivoting. Also, diagonal dominance ensures convergence of certain iterative solvers (more later).
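A small checker for this definition (not from the slides; it assumes NumPy):

import numpy as np

def is_diagonally_dominant(A):
    """Return True if |a_ii| > sum of |a_ij|, j != i, for every row i."""
    A = np.abs(np.asarray(A, dtype=float))
    diag = np.diag(A)
    off_diag_sums = A.sum(axis=1) - diag
    return bool(np.all(diag > off_diag_sums))

print(is_diagonally_dominant([[4, 1, 2], [1, 5, 3], [2, 0, 3]]))  # True
print(is_diagonally_dominant([[1, 2], [3, 4]]))                   # False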

Theorem
Let A be an n × n matrix. If A is diagonally dominant then A is nonsingular.

Proof
We will show that N(A) = {0}, since then we know that rank(A) = n and A is nonsingular.

We will do this with a proof by contradiction.

We assume that there exists an x (≠ 0) ∈ N(A) and we will conclude that A cannot be diagonally dominant.

(cont.)
If x ∈ N(A) then Ax = 0.

Now we take k so that xk is the maximum (in absolute value) component of x and consider

Ak∗ x = 0.

We can rewrite this as

Σ_{j=1}^{n} akj xj = 0 ⇐⇒ akk xk = − Σ_{j=1, j≠k}^{n} akj xj.

(cont.)
Now we take absolute values:

|akk xk| = |Σ_{j≠k} akj xj| ≤ Σ_{j≠k} |akj| |xj| ≤ |xk| Σ_{j≠k} |akj|,

where the last step uses that xk is the maximum component. Finally, dividing both sides by |xk| yields

|akk| ≤ Σ_{j≠k} |akj|,

which shows that A cannot be diagonally dominant (which is a contradiction, since A was assumed to be diagonally dominant). □

Example
Consider m real numbers x1, ..., xm such that xi ≠ xj for i ≠ j. Show that the columns of the Vandermonde matrix

    [1 x1 x1^2 ··· x1^(n−1)]
V = [1 x2 x2^2 ··· x2^(n−1)]
    [⋮                     ]
    [1 xm xm^2 ··· xm^(n−1)]

form a linearly independent set provided n ≤ m.

From above, the columns of V are linearly independent if and only if N(V) = {0}, i.e.,

Vz = 0 =⇒ z = 0, where z = [α0, ..., α(n−1)]^T.

Example (cont.)
Now Vz = 0 if and only if

α0 + α1 xi + α2 xi^2 + ... + α(n−1) xi^(n−1) = 0, i = 1, ..., m.

In other words, x1, x2, ..., xm are all (distinct) roots of

p(x) = α0 + α1 x + α2 x^2 + ... + α(n−1) x^(n−1).

This is a polynomial of degree at most n − 1.

A nonzero such polynomial can have m distinct roots only if m ≤ n − 1. Since n ≤ m, this is impossible, so p must be the zero polynomial, i.e., α0 = α1 = ... = α(n−1) = 0, and the columns of V are linearly independent.
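A numerical illustration (not from the slides; it assumes NumPy) with m = 4 distinct nodes and n = 3 columns:

import numpy as np

x = np.array([0., 1., 2., 3.])
V = np.vander(x, N=3, increasing=True)         # columns 1, x, x^2
print(np.linalg.matrix_rank(V) == V.shape[1])  # True (n = 3 ≤ m = 4)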

The example implies that in the special case m = n there is a unique polynomial of degree (at most) m − 1 that interpolates the data {(x1, y1), (x2, y2), ..., (xm, ym)} ⊂ R^2.

We see this by writing the polynomial in the form

ℓ(t) = α0 + α1 t + α2 t^2 + ... + α(m−1) t^(m−1).

Then, interpolation of the data implies

ℓ(xi) = yi, i = 1, ..., m,

or

[1 x1 x1^2 ··· x1^(m−1)] [α0    ]   [y1]
[1 x2 x2^2 ··· x2^(m−1)] [α1    ] = [y2]
[⋮                     ] [⋮     ]   [⋮ ]
[1 xm xm^2 ··· xm^(m−1)] [α(m−1)]   [ym]

Since the columns of V are linearly independent, V is nonsingular, and the coefficients α0, ..., α(m−1) are uniquely determined.
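A sketch of this construction (not from the slides; it assumes NumPy): solve the square Vandermonde system for the interpolation coefficients.

import numpy as np

x = np.array([0., 1., 2.])
y = np.array([1., 3., 7.])
V = np.vander(x, increasing=True)    # columns 1, t, t^2
alpha = np.linalg.solve(V, y)        # coefficients α0, α1, α2
print(alpha)                         # [1. 1. 1.], i.e. ℓ(t) = 1 + t + t^2
print(np.allclose(V @ alpha, y))     # True: ℓ(xi) = yi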

Li (xj ) = δij , i, j = 1,..., m,

so that m X `(xj ) = yi Li (xj ) = yj , j = 1,..., m. i=1 | {z } =δij

Linear Independence

In fact,

m X `(t) = yi Li (t) (Lagrange interpolation polynomial) i=1 m m Y Y with Li (t) = (t − xk )/ (xi − xk ) (Lagrange functions). k=1 k=1 k6=i k6=i

[email protected] MATH 532 46 Linear Independence

In fact,

m X `(t) = yi Li (t) (Lagrange interpolation polynomial) i=1 m m Y Y with Li (t) = (t − xk )/ (xi − xk ) (Lagrange functions). k=1 k=1 k6=i k6=i

To verify (4) we note that the degree of ` is m − 1 (since each Li is of degree m − 1) and

Li (xj ) = δij , i, j = 1,..., m, so that m X `(xj ) = yi Li (xj ) = yj , j = 1,..., m. i=1 | {z } =δij
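A direct implementation of (4) (not from the slides; it assumes NumPy):

import numpy as np

def lagrange_interp(t, x, y):
    """Evaluate the Lagrange interpolation polynomial at t."""
    total = 0.0
    for i in range(len(x)):
        # Li(t) = prod over k != i of (t - xk) / (xi - xk)
        mask = np.arange(len(x)) != i
        Li = np.prod((t - x[mask]) / (x[i] - x[mask]))
        total += y[i] * Li
    return total

x = np.array([0., 1., 2.])
y = np.array([1., 3., 7.])            # samples of ℓ(t) = 1 + t + t^2
print(lagrange_interp(1.5, x, y))     # 4.75 = 1 + 1.5 + 1.5^2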

Theorem
Let S = {u1, u2, ..., un} ⊆ V be nonempty. Then

1 If S contains a linearly dependent subset, then S is linearly dependent.

2 If S is linearly independent, then every subset of S is also linearly independent.

3 If S is linearly independent and if v ∈ V, then Sext = S ∪ {v} is linearly independent if and only if v ∉ span(S).

4 If S ⊆ R^m and n > m, then S must be linearly dependent.

Proof
1 If S contains a linearly dependent subset, {u1, ..., uk} say, then there exist nontrivial coefficients α1, ..., αk such that

α1u1 + ... + αk uk = 0.

Clearly, then

α1u1 + ... + αk uk + 0uk+1 + ... + 0un = 0

and S is also linearly dependent.

2 Follows from (1) by contraposition.

(cont.)
3 “=⇒”: Assume Sext is linearly independent. Then v can’t be a linear combination of u1, ..., un, i.e., v ∉ span(S).

“⇐=”: Assume v ∉ span(S) and consider

α1u1 + α2u2 + ... + αn un + α(n+1) v = 0.

First, α(n+1) = 0, since otherwise v ∈ span(S). That leaves

α1u1 + α2u2 + ... + αn un = 0.

However, the linear independence of S implies αi = 0, i = 1, ..., n, and therefore Sext is linearly independent.

(cont.)
4 We know that the columns of an m × n matrix A are linearly independent if and only if rank(A) = n.

Here A = [u1 u2 ··· un] with ui ∈ R^m.

If n > m, then rank(A) ≤ m < n and S must be linearly dependent. □

Bases and Dimension

Earlier we introduced the concept of a spanning set of a vector space V, i.e., V = span{v1, ..., vn}.

Now

Definition
Consider a vector space V with spanning set S. If S is also linearly independent then we call it a basis of V.

Example
1 {e1, ..., en} is the standard basis for R^n.
2 The columns/rows of an n × n matrix A with rank(A) = n form a basis for R^n.
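A quick check of example 2 (not from the slides; it assumes NumPy): the columns of a full-rank square matrix form a basis, those of a rank-deficient one do not.

import numpy as np

A = np.array([[2., 1.], [1., 1.]])
B = np.array([[1., 2.], [2., 4.]])
print(np.linalg.matrix_rank(A) == 2)  # True: columns of A are a basis for R^2
print(np.linalg.matrix_rank(B) == 2)  # False: columns of B only span a line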

Remark
Linear algebra deals with finite-dimensional linear spaces.

Functional analysis can be considered as infinite-dimensional linear algebra, where the linear spaces are usually function spaces such as

infinitely differentiable functions with Taylor (polynomial) basis

{1, x, x^2, x^3, ...}

square integrable functions with Fourier basis

{1, sin(x), cos(x), sin(2x), cos(2x), ...}

[email protected] MATH 532 53 Theorem Let V be a subspace of Rm and let

B = {b1, b2,..., bn} ⊆ V.

The following are equivalent: 1 B is a basis for V. 2 B is a minimal spanning set for V. 3 B is a maximal linearly independent subset of V.

Remark We say “a basis” here since V can have many different bases.

Bases and Dimension

Earlier we mentioned the idea of minimal spanning sets.

[email protected] MATH 532 54 Remark We say “a basis” here since V can have many different bases.

Bases and Dimension

Earlier we mentioned the idea of minimal spanning sets. Theorem Let V be a subspace of Rm and let

B = {b1, b2,..., bn} ⊆ V.

The following are equivalent: 1 B is a basis for V. 2 B is a minimal spanning set for V. 3 B is a maximal linearly independent subset of V.

[email protected] MATH 532 54 Bases and Dimension

Earlier we mentioned the idea of minimal spanning sets. Theorem Let V be a subspace of Rm and let

B = {b1, b2,..., bn} ⊆ V.

The following are equivalent: 1 B is a basis for V. 2 B is a minimal spanning set for V. 3 B is a maximal linearly independent subset of V.

Remark We say “a basis” here since V can have many different bases.

[email protected] MATH 532 54 Bases and Dimension

Proof Since it is difficult to directly relate (2) and (3), our strategy will be to prove

Show (1) =⇒ (2) and (2) =⇒ (1), so that (1) ⇐⇒ (2).

Show (1) =⇒ (3) and (3) =⇒ (1), so that (1) ⇐⇒ (3).

Then — by transitivity — we will also have (2) ⇐⇒ (3).

[email protected] MATH 532 55 Assume B is not minimal, i.e., we can find a smaller spanning set {x1,..., xk } for V with k ≤ n elements. But then k X bj = αij xi , j = 1,..., n, i=1 or B = XA, where

 m×n B = b1 b2 ··· bn ∈ R ,  m×k X = x1 x2 ··· xk ∈ R , k×n [A]ij = αij , A ∈ R .

Bases and Dimension

Proof (cont.) (1) =⇒ (2): Assume B is a basis (i.e., a linearly independent spanning set) of V and show that it is minimal.

[email protected] MATH 532 56 But then k X bj = αij xi , j = 1,..., n, i=1 or B = XA, where

 m×n B = b1 b2 ··· bn ∈ R ,  m×k X = x1 x2 ··· xk ∈ R , k×n [A]ij = αij , A ∈ R .

Bases and Dimension

Proof (cont.) (1) =⇒ (2): Assume B is a basis (i.e., a linearly independent spanning set) of V and show that it is minimal. Assume B is not minimal, i.e., we can find a smaller spanning set {x1,..., xk } for V with k ≤ n elements.

[email protected] MATH 532 56 B = XA, where

 m×n B = b1 b2 ··· bn ∈ R ,  m×k X = x1 x2 ··· xk ∈ R , k×n [A]ij = αij , A ∈ R .

Bases and Dimension

Proof (cont.) (1) =⇒ (2): Assume B is a basis (i.e., a linearly independent spanning set) of V and show that it is minimal. Assume B is not minimal, i.e., we can find a smaller spanning set {x1,..., xk } for V with k ≤ n elements. But then k X bj = αij xi , j = 1,..., n, i=1 or

[email protected] MATH 532 56 Bases and Dimension

Proof (cont.) (1) =⇒ (2): Assume B is a basis (i.e., a linearly independent spanning set) of V and show that it is minimal. Assume B is not minimal, i.e., we can find a smaller spanning set {x1,..., xk } for V with k ≤ n elements. But then k X bj = αij xi , j = 1,..., n, i=1 or B = XA, where

 m×n B = b1 b2 ··· bn ∈ R ,  m×k X = x1 x2 ··· xk ∈ R , k×n [A]ij = αij , A ∈ R .

[email protected] MATH 532 56 nontrivial, i.e., there exists a z 6= 0 such that Az = 0.

But then Bz = XAz = 0, and therefore N(B) is nontrivial.

However, since B is a basis, the columns of B are linearly independent (i.e., N(B) = {0}) — and that is a contradiction.

Therefore, B has to be minimal.

Bases and Dimension

Proof (cont.) Now, rank(A) ≤ k < n, which implies N(A) is

[email protected] MATH 532 57 But then Bz = XAz = 0, and therefore N(B) is nontrivial.

However, since B is a basis, the columns of B are linearly independent (i.e., N(B) = {0}) — and that is a contradiction.

Therefore, B has to be minimal.

Bases and Dimension

Proof (cont.) Now, rank(A) ≤ k < n, which implies N(A) is nontrivial, i.e., there exists a z 6= 0 such that Az = 0.

[email protected] MATH 532 57 However, since B is a basis, the columns of B are linearly independent (i.e., N(B) = {0}) — and that is a contradiction.

Therefore, B has to be minimal.

Bases and Dimension

Proof (cont.) Now, rank(A) ≤ k < n, which implies N(A) is nontrivial, i.e., there exists a z 6= 0 such that Az = 0.

But then Bz = XAz = 0, and therefore N(B) is nontrivial.

[email protected] MATH 532 57 Therefore, B has to be minimal.

Bases and Dimension

Proof (cont.) Now, rank(A) ≤ k < n, which implies N(A) is nontrivial, i.e., there exists a z 6= 0 such that Az = 0.

But then Bz = XAz = 0, and therefore N(B) is nontrivial.

However, since B is a basis, the columns of B are linearly independent (i.e., N(B) = {0}) — and that is a contradiction.

[email protected] MATH 532 57 Bases and Dimension

Proof (cont.) Now, rank(A) ≤ k < n, which implies N(A) is nontrivial, i.e., there exists a z 6= 0 such that Az = 0.

But then Bz = XAz = 0, and therefore N(B) is nontrivial.

However, since B is a basis, the columns of B are linearly independent (i.e., N(B) = {0}) — and that is a contradiction.

Therefore, B has to be minimal.

[email protected] MATH 532 57 This is clear since if B were linearly dependent, then we would be able to remove at least one vector from B and still have a spanning set but then it would not have been minimal.

Bases and Dimension

Proof (cont.) (2) =⇒ (1): Assume B is a minimal spanning set and show that it must also be linearly independent.

[email protected] MATH 532 58 Bases and Dimension

Proof (cont.) (2) =⇒ (1): Assume B is a minimal spanning set and show that it must also be linearly independent.

This is clear since if B were linearly dependent, then we would be able to remove at least one vector from B and still have a spanning set but then it would not have been minimal.

[email protected] MATH 532 58 Assume that B is not a basis, i.e., there exists a v ∈ V such that v ∈/ span{b1,..., bn}.

Then — by an earlier theorem — the extension set B ∪ {v} is linearly independent.

But this contradicts the maximality of B, so that B has to be a basis.

Bases and Dimension

Proof (cont.) (3) =⇒ (1): Assume B is a maximal linearly independent subset of V and show that B is a basis of V.

[email protected] MATH 532 59 Then — by an earlier theorem — the extension set B ∪ {v} is linearly independent.

But this contradicts the maximality of B, so that B has to be a basis.

Bases and Dimension

Proof (cont.) (3) =⇒ (1): Assume B is a maximal linearly independent subset of V and show that B is a basis of V.

Assume that B is not a basis, i.e., there exists a v ∈ V such that v ∈/ span{b1,..., bn}.

[email protected] MATH 532 59 But this contradicts the maximality of B, so that B has to be a basis.

Bases and Dimension

Proof (cont.) (3) =⇒ (1): Assume B is a maximal linearly independent subset of V and show that B is a basis of V.

Assume that B is not a basis, i.e., there exists a v ∈ V such that v ∈/ span{b1,..., bn}.

Then — by an earlier theorem — the extension set B ∪ {v} is linearly independent.

[email protected] MATH 532 59 Bases and Dimension

Proof (cont.) (3) =⇒ (1): Assume B is a maximal linearly independent subset of V and show that B is a basis of V.

Assume that B is not a basis, i.e., there exists a v ∈ V such that v ∈/ span{b1,..., bn}.

Then — by an earlier theorem — the extension set B ∪ {v} is linearly independent.

But this contradicts the maximality of B, so that B has to be a basis.

[email protected] MATH 532 59 Let Y = {y 1,..., y k } ⊆ V, with k > n be a maximal linearly independent subset of V (note that such a set always exists). But then Y must be a basis for V by our “(1) =⇒ (3)” argument. On the other hand, Y has more vectors than B and a basis has to be a minimal spanning set. Therefore B has to already be a maximal linearly independent subset of V. 

Bases and Dimension

Proof (cont.) (1) =⇒ (3): Assume B is a basis, but not a maximal linearly independent subset of V, and show that this leads to a contradiction.

[email protected] MATH 532 60 But then Y must be a basis for V by our “(1) =⇒ (3)” argument. On the other hand, Y has more vectors than B and a basis has to be a minimal spanning set. Therefore B has to already be a maximal linearly independent subset of V. 

Bases and Dimension

Proof (cont.) (1) =⇒ (3): Assume B is a basis, but not a maximal linearly independent subset of V, and show that this leads to a contradiction.

Let Y = {y 1,..., y k } ⊆ V, with k > n be a maximal linearly independent subset of V (note that such a set always exists).

[email protected] MATH 532 60 On the other hand, Y has more vectors than B and a basis has to be a minimal spanning set. Therefore B has to already be a maximal linearly independent subset of V. 

Bases and Dimension

Proof (cont.) (1) =⇒ (3): Assume B is a basis, but not a maximal linearly independent subset of V, and show that this leads to a contradiction.

Let Y = {y 1,..., y k } ⊆ V, with k > n be a maximal linearly independent subset of V (note that such a set always exists). But then Y must be a basis for V by our “(1) =⇒ (3)” argument.

[email protected] MATH 532 60 Bases and Dimension

Proof (cont.) (1) =⇒ (3): Assume B is a basis, but not a maximal linearly independent subset of V, and show that this leads to a contradiction.

Let Y = {y 1,..., y k } ⊆ V, with k > n be a maximal linearly independent subset of V (note that such a set always exists). But then Y must be a basis for V by our “(1) =⇒ (3)” argument. On the other hand, Y has more vectors than B and a basis has to be a minimal spanning set. Therefore B has to already be a maximal linearly independent subset of V. 

[email protected] MATH 532 60 However, the number of elements in all of these bases is unique.

Definition The dimension of the vector space V is given by

dim V = the number of elements in any basis of V.

Special case: by convention

dim{0} = 0.

Bases and Dimension

Remark Above we remarked that B is not unique, i.e., a vector space V can have many different bases.

[email protected] MATH 532 61 Bases and Dimension

Remark Above we remarked that B is not unique, i.e., a vector space V can have many different bases.

However, the number of elements in all of these bases is unique.

Definition The dimension of the vector space V is given by

dim V = the number of elements in any basis of V.

Special case: by convention

dim{0} = 0.

[email protected] MATH 532 61 Geometrically, P corresponds to the plane z = 0, i.e., the xy-plane.

Note that dim P = 2.

Moreover, any subspace of R3 has dimension at most 3.

Bases and Dimension

Example Consider 1 0   3 P = span 0 , 1 ⊂ R .  0 0 

[email protected] MATH 532 62 Note that dim P = 2.

Moreover, any subspace of R3 has dimension at most 3.

Bases and Dimension

Example Consider 1 0   3 P = span 0 , 1 ⊂ R .  0 0 

Geometrically, P corresponds to the plane z = 0, i.e., the xy-plane.

[email protected] MATH 532 62 Moreover, any subspace of R3 has dimension at most 3.

Bases and Dimension

Example Consider 1 0   3 P = span 0 , 1 ⊂ R .  0 0 

Geometrically, P corresponds to the plane z = 0, i.e., the xy-plane.

Note that dim P = 2.

[email protected] MATH 532 62 Bases and Dimension

Example Consider 1 0   3 P = span 0 , 1 ⊂ R .  0 0 

Geometrically, P corresponds to the plane z = 0, i.e., the xy-plane.

Note that dim P = 2.

Moreover, any subspace of R3 has dimension at most 3.

[email protected] MATH 532 62 Proof. See [Mey00].

Bases and Dimension

In general, Theorem Let M and N be vector spaces such that M ⊆ N . Then 1 dim M ≤ dim N , 2 dim M = dim N =⇒ M = N .

[email protected] MATH 532 63 Bases and Dimension

In general, Theorem Let M and N be vector spaces such that M ⊆ N . Then 1 dim M ≤ dim N , 2 dim M = dim N =⇒ M = N .

Proof. See [Mey00].

[email protected] MATH 532 63 R(A) We know that

R(A) = span{columns of A}.

If rank(A) = r, then only r columns of A are linearly independent, i.e., dim R(A) = r. A basis of R(A) is given by the basic columns of A (determined via a row echelon form U).

Bases and Dimension Back to the 4 fundamental subspaces

Consider an m × n matrix A with rank(A) = r.

[email protected] MATH 532 64 If rank(A) = r, then only r columns of A are linearly independent, i.e., dim R(A) = r. A basis of R(A) is given by the basic columns of A (determined via a row echelon form U).

Bases and Dimension Back to the 4 fundamental subspaces

Consider an m × n matrix A with rank(A) = r.

R(A) We know that

R(A) = span{columns of A}.

[email protected] MATH 532 64 A basis of R(A) is given by the basic columns of A (determined via a row echelon form U).

Bases and Dimension Back to the 4 fundamental subspaces

Consider an m × n matrix A with rank(A) = r.

R(A) We know that

R(A) = span{columns of A}.

If rank(A) = r, then only r columns of A are linearly independent, i.e., dim R(A) = r.

[email protected] MATH 532 64 Bases and Dimension Back to the 4 fundamental subspaces

Consider an m × n matrix A with rank(A) = r.

R(A) We know that

R(A) = span{columns of A}.

If rank(A) = r, then only r columns of A are linearly independent, i.e., dim R(A) = r. A basis of R(A) is given by the basic columns of A (determined via a row echelon form U).

[email protected] MATH 532 64 Again, rank(A) = r implies that only r rows of A are linearly independent, i.e.,

dim R(AT ) = r.

A basis of R(AT ) is given by the nonzero rows of U (from the LU factorization of A).

Bases and Dimension

R(AT ) We know that

R(AT ) = span{rows of A}.

[email protected] MATH 532 65 A basis of R(AT ) is given by the nonzero rows of U (from the LU factorization of A).

Bases and Dimension

R(AT ) We know that

R(AT ) = span{rows of A}.

Again, rank(A) = r implies that only r rows of A are linearly independent, i.e.,

dim R(AT ) = r.

[email protected] MATH 532 65 Bases and Dimension

R(AT ) We know that

R(AT ) = span{rows of A}.

Again, rank(A) = r implies that only r rows of A are linearly independent, i.e.,

dim R(AT ) = r.

A basis of R(AT ) is given by the nonzero rows of U (from the LU factorization of A).

[email protected] MATH 532 65 Since P is nonsingular these rows are linearly independent and so

dim N(AT ) = m − r.

A basis of N(AT ) is given by the last m − r rows of P.

Bases and Dimension

N(AT ) One of our earlier theorems states that the last m − r rows of P span N(AT ) (where P is nonsingular such that PA = U is in row echelon form).

[email protected] MATH 532 66 A basis of N(AT ) is given by the last m − r rows of P.

Bases and Dimension

N(AT ) One of our earlier theorems states that the last m − r rows of P span N(AT ) (where P is nonsingular such that PA = U is in row echelon form).

Since P is nonsingular these rows are linearly independent and so

dim N(AT ) = m − r.

[email protected] MATH 532 66 Bases and Dimension

N(AT ) One of our earlier theorems states that the last m − r rows of P span N(AT ) (where P is nonsingular such that PA = U is in row echelon form).

Since P is nonsingular these rows are linearly independent and so

dim N(AT ) = m − r.

A basis of N(AT ) is given by the last m − r rows of P.

[email protected] MATH 532 66 A basis of N(A) is given by the n − r linearly independent solutions of Ax = 0.

Bases and Dimension

N(A) Replace A by AT above so that   dim N (AT )T = n − rank(AT ) = n − r

so that dim N(A) = n − r.

[email protected] MATH 532 67 Bases and Dimension

N(A) Replace A by AT above so that   dim N (AT )T = n − rank(AT ) = n − r

so that dim N(A) = n − r. A basis of N(A) is given by the n − r linearly independent solutions of Ax = 0.

[email protected] MATH 532 67 This follows directly from the above discussion of R(A) and N(A).

The theorem shows that there is always a balance between the rank of A and the dimension of its nullspace.

Bases and Dimension

Theorem For any m × n matrix A we have

dim R(A) + dim N(A) = n.

[email protected] MATH 532 68 The theorem shows that there is always a balance between the rank of A and the dimension of its nullspace.

Bases and Dimension

Theorem For any m × n matrix A we have

dim R(A) + dim N(A) = n.

This follows directly from the above discussion of R(A) and N(A).

[email protected] MATH 532 68 Bases and Dimension

Theorem For any m × n matrix A we have

dim R(A) + dim N(A) = n.

This follows directly from the above discussion of R(A) and N(A).

The theorem shows that there is always a balance between the rank of A and the dimension of its nullspace.

[email protected] MATH 532 68 Before we even do any calculations we know that

4 S ⊆ R , so that dim S ≤ 4.

We will now answer this question in two different ways using

1 2 2 3 1 2 4 4 6 2 A =   . 3 6 6 9 6 1 2 4 5 3

Bases and Dimension

Example Find the dimension and a basis for 1 2 2 3 1   2 4 4 6 2 S = span   ,   ,   ,   ,   . 3 6 6 9 6    1 2 4 5 3 

[email protected] MATH 532 69 We will now answer this question in two different ways using

1 2 2 3 1 2 4 4 6 2 A =   . 3 6 6 9 6 1 2 4 5 3

Bases and Dimension

Example Find the dimension and a basis for 1 2 2 3 1   2 4 4 6 2 S = span   ,   ,   ,   ,   . 3 6 6 9 6    1 2 4 5 3 

Before we even do any calculations we know that

4 S ⊆ R , so that dim S ≤ 4.

[email protected] MATH 532 69 Bases and Dimension

Example Find the dimension and a basis for 1 2 2 3 1   2 4 4 6 2 S = span   ,   ,   ,   ,   . 3 6 6 9 6    1 2 4 5 3 

Before we even do any calculations we know that

4 S ⊆ R , so that dim S ≤ 4.

We will now answer this question in two different ways using

1 2 2 3 1 2 4 4 6 2 A =   . 3 6 6 9 6 1 2 4 5 3

[email protected] MATH 532 69 Therefore, dim S = 3 and

1 2 1   2 4 2 S = span   ,   ,   3 6 6    1 4 3 

since the basic columns of EA are the first, third and fifth columns.

Bases and Dimension

Example (cont.) Via R(A), i.e., by finding the basic columns of A:

1 2 2 3 1 1 2 0 1 0 2 4 4 6 2 G.–J. 0 0 1 1 0 A =   −→ EA =   3 6 6 9 6 0 0 0 0 1 1 2 4 5 3 0 0 0 0 0

[email protected] MATH 532 70 Bases and Dimension

Example (cont.) Via R(A), i.e., by finding the basic columns of A:

1 2 2 3 1 1 2 0 1 0 2 4 4 6 2 G.–J. 0 0 1 1 0 A =   −→ EA =   3 6 6 9 6 0 0 0 0 1 1 2 4 5 3 0 0 0 0 0

Therefore, dim S = 3 and

1 2 1   2 4 2 S = span   ,   ,   3 6 6    1 4 3  since the basic columns of EA are the first, third and fifth columns.

[email protected] MATH 532 70 Therefore, dim S = 3 and

1 0 0   2 0 0 S = span   ,   ,   3 3 0    1 2 2 

since the nonzero rows of U are the first, second and third rows.

Bases and Dimension Example (cont.) Via R(AT ), i.e., R(A) = span{rows of AT }, i.e., we need the nonzero rows of U (from the LU factorization of AT :

1 2 3 1 1 2 3 1 1 2 3 1

2 4 6 2 T 0 0 0 0 0 0 3 2 T   zero out [A ]∗,1   permute   A = 2 4 6 4 −→ 0 0 0 2 −→ 0 0 0 2       3 6 9 4 0 0 0 2 0 0 0 0 1 2 6 3 0 0 3 2 0 0 0 0 | {z } =U

[email protected] MATH 532 71 Bases and Dimension Example (cont.) Via R(AT ), i.e., R(A) = span{rows of AT }, i.e., we need the nonzero rows of U (from the LU factorization of AT :

1 2 3 1 1 2 3 1 1 2 3 1

2 4 6 2 T 0 0 0 0 0 0 3 2 T   zero out [A ]∗,1   permute   A = 2 4 6 4 −→ 0 0 0 2 −→ 0 0 0 2       3 6 9 4 0 0 0 2 0 0 0 0 1 2 6 3 0 0 3 2 0 0 0 0 | {z } =U Therefore, dim S = 3 and

1 0 0   2 0 0 S = span   ,   ,   3 3 0    1 2 2  since the nonzero rows of U are the first, second and third rows.

[email protected] MATH 532 71 The procedure will be to augment the columns of S by an , i.e., to form

1 1 1 0 0 0 2 2 0 1 0 0 A =   3 6 0 0 1 0 1 3 0 0 0 1

and then to get a basis via the basic columns of U.

Bases and Dimension

Example Extend 1 1   2 2 S = span   ,   3 6    1 3  to a basis for R4.

[email protected] MATH 532 72 Bases and Dimension

Example Extend 1 1   2 2 S = span   ,   3 6    1 3  to a basis for R4. The procedure will be to augment the columns of S by an identity matrix , i.e., to form

1 1 1 0 0 0 2 2 0 1 0 0 A =   3 6 0 0 1 0 1 3 0 0 0 1 and then to get a basis via the basic columns of U.

[email protected] MATH 532 72 1 1 1 0 0 0 0 0 −2 1 0 0   0 3 −3 0 1 0 0 2 −1 0 0 1 1 1 1 0 0 0  1 1 1 0 0 0  0 2 −1 0 0 1  0 2 −1 0 0 1  −→  3 3  −→  3 3  0 0 − 2 0 1 − 2  0 0 − 2 0 1 − 2  4 0 0 −2 1 0 0 0 0 0 1 − 3 2

so that the basic columns are [A]∗1, [A]∗2, [A]∗3, [A]∗4 and

1 1 1 0   4 2 2 0 1 R = span   ,   ,   ,   . 3 6 0 0    1 3 0 0 

Bases and Dimension Example (cont.)

1 1 1 0 0 0 2 2 0 1 0 0 A =   −→ 3 6 0 0 1 0 1 3 0 0 0 1

[email protected] MATH 532 73 1 1 1 0 0 0  1 1 1 0 0 0  0 2 −1 0 0 1  0 2 −1 0 0 1  −→  3 3  −→  3 3  0 0 − 2 0 1 − 2  0 0 − 2 0 1 − 2  4 0 0 −2 1 0 0 0 0 0 1 − 3 2

so that the basic columns are [A]∗1, [A]∗2, [A]∗3, [A]∗4 and

1 1 1 0   4 2 2 0 1 R = span   ,   ,   ,   . 3 6 0 0    1 3 0 0 

Bases and Dimension Example (cont.)

1 1 1 0 0 0 1 1 1 0 0 0 2 2 0 1 0 0 0 0 −2 1 0 0 A =   −→   3 6 0 0 1 0 0 3 −3 0 1 0 1 3 0 0 0 1 0 2 −1 0 0 1

[email protected] MATH 532 73 1 1 1 0 0 0  0 2 −1 0 0 1  −→  3 3  0 0 − 2 0 1 − 2  4 0 0 0 1 − 3 2

so that the basic columns are [A]∗1, [A]∗2, [A]∗3, [A]∗4 and

1 1 1 0   4 2 2 0 1 R = span   ,   ,   ,   . 3 6 0 0    1 3 0 0 

Bases and Dimension Example (cont.)

1 1 1 0 0 0 1 1 1 0 0 0 2 2 0 1 0 0 0 0 −2 1 0 0 A =   −→   3 6 0 0 1 0 0 3 −3 0 1 0 1 3 0 0 0 1 0 2 −1 0 0 1 1 1 1 0 0 0  0 2 −1 0 0 1  −→  3 3  0 0 − 2 0 1 − 2  0 0 −2 1 0 0

[email protected] MATH 532 73 so that the basic columns are [A]∗1, [A]∗2, [A]∗3, [A]∗4 and

1 1 1 0   4 2 2 0 1 R = span   ,   ,   ,   . 3 6 0 0    1 3 0 0 

Bases and Dimension Example (cont.)

1 1 1 0 0 0 1 1 1 0 0 0 2 2 0 1 0 0 0 0 −2 1 0 0 A =   −→   3 6 0 0 1 0 0 3 −3 0 1 0 1 3 0 0 0 1 0 2 −1 0 0 1 1 1 1 0 0 0  1 1 1 0 0 0  0 2 −1 0 0 1  0 2 −1 0 0 1  −→  3 3  −→  3 3  0 0 − 2 0 1 − 2  0 0 − 2 0 1 − 2  4 0 0 −2 1 0 0 0 0 0 1 − 3 2

[email protected] MATH 532 73 Bases and Dimension Example (cont.)

1 1 1 0 0 0 1 1 1 0 0 0 2 2 0 1 0 0 0 0 −2 1 0 0 A =   −→   3 6 0 0 1 0 0 3 −3 0 1 0 1 3 0 0 0 1 0 2 −1 0 0 1 1 1 1 0 0 0  1 1 1 0 0 0  0 2 −1 0 0 1  0 2 −1 0 0 1  −→  3 3  −→  3 3  0 0 − 2 0 1 − 2  0 0 − 2 0 1 − 2  4 0 0 −2 1 0 0 0 0 0 1 − 3 2 so that the basic columns are [A]∗1, [A]∗2, [A]∗3, [A]∗4 and

1 1 1 0   4 2 2 0 1 R = span   ,   ,   ,   . 3 6 0 0    1 3 0 0 

[email protected] MATH 532 73 Theorem If X , Y are subspaces of V, then

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

Proof. See [Mey00], but the basic idea is pretty clear. We want to avoid double counting.

Bases and Dimension

Earlier we defined the sum of subspaces X and Y as

X + Y = {x + y : x ∈ X , y ∈ Y}

[email protected] MATH 532 74 Proof. See [Mey00], but the basic idea is pretty clear. We want to avoid double counting.

Bases and Dimension

Earlier we defined the sum of subspaces X and Y as

X + Y = {x + y : x ∈ X , y ∈ Y}

Theorem If X , Y are subspaces of V, then

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

[email protected] MATH 532 74 Bases and Dimension

Earlier we defined the sum of subspaces X and Y as

X + Y = {x + y : x ∈ X , y ∈ Y}

Theorem If X , Y are subspaces of V, then

dim(X + Y) = dim X + dim Y − dim(X ∩ Y).

Proof. See [Mey00], but the basic idea is pretty clear. We want to avoid double counting.

[email protected] MATH 532 74 Proof First we note that R(A + B) ⊆ R(A) + R(B) (4) since for any b ∈ R(A + B) we have

b = (A + B)x = Ax + Bx ∈ R(A) + R(B).

Bases and Dimension

Corollary Let A and B be m × n matrices. Then

rank(A + B) ≤ rank(A) + rank(B).

[email protected] MATH 532 75 since for any b ∈ R(A + B) we have

b = (A + B)x = Ax + Bx ∈ R(A) + R(B).

Bases and Dimension

Corollary Let A and B be m × n matrices. Then

rank(A + B) ≤ rank(A) + rank(B).

Proof First we note that R(A + B) ⊆ R(A) + R(B) (4)

[email protected] MATH 532 75 Bases and Dimension

Corollary Let A and B be m × n matrices. Then

rank(A + B) ≤ rank(A) + rank(B).

Proof First we note that R(A + B) ⊆ R(A) + R(B) (4) since for any b ∈ R(A + B) we have

b = (A + B)x = Ax + Bx ∈ R(A) + R(B).

[email protected] MATH 532 75 (4) ≤ dim(R(A) + R(B))

Thm= dim R(A) + dim R(B) − dim (R(A) ∩ R(B)) ≤ dim R(A) + dim R(B) = rank(A) + rank(B)

Bases and Dimension

(cont.) Now,

rank(A + B) = dim R(A + B)



[email protected] MATH 532 76 Thm= dim R(A) + dim R(B) − dim (R(A) ∩ R(B)) ≤ dim R(A) + dim R(B) = rank(A) + rank(B)

Bases and Dimension

(cont.) Now,

rank(A + B) = dim R(A + B)

(4) ≤ dim(R(A) + R(B))



[email protected] MATH 532 76 ≤ dim R(A) + dim R(B) = rank(A) + rank(B)

Bases and Dimension

(cont.) Now,

rank(A + B) = dim R(A + B)

(4) ≤ dim(R(A) + R(B))

Thm= dim R(A) + dim R(B) − dim (R(A) ∩ R(B))



[email protected] MATH 532 76 Bases and Dimension

(cont.) Now,

rank(A + B) = dim R(A + B)

(4) ≤ dim(R(A) + R(B))

Thm= dim R(A) + dim R(B) − dim (R(A) ∩ R(B)) ≤ dim R(A) + dim R(B) = rank(A) + rank(B)



[email protected] MATH 532 76 More About Rank Outline

1 Spaces and Subspaces

2 Four Fundamental Subspaces

3 Linear Independence

4 Bases and Dimension

5 More About Rank

6 Classical Least Squares

7 Kriging as best linear unbiased predictor

[email protected] MATH 532 77 Thus (for invertible P, Q), PAQ = B implies rank(A) = rank(PAQ).

As we now show, it is a general fact that multiplication by a nonsingular matrix does not change the rank of a given matrix.

Moreover, multiplication by an arbitrary matrix can only lower the rank. Theorem Let A be an m × n matrix, and let B by n × p. Then

rank(AB) = rank(B) − dim (N(A) ∩ R(B)) .

Remark Note that if A is nonsingular, then N(A) = {0} so that dim (N(A) ∩ R(B)) = 0 and rank(AB) = rank(B).

More About Rank More About Rank

We know that A ∼ B if and only if rank(A) = rank(B).

[email protected] MATH 532 78 As we now show, it is a general fact that multiplication by a nonsingular matrix does not change the rank of a given matrix.

Moreover, multiplication by an arbitrary matrix can only lower the rank. Theorem Let A be an m × n matrix, and let B by n × p. Then

rank(AB) = rank(B) − dim (N(A) ∩ R(B)) .

Remark Note that if A is nonsingular, then N(A) = {0} so that dim (N(A) ∩ R(B)) = 0 and rank(AB) = rank(B).

More About Rank More About Rank

We know that A ∼ B if and only if rank(A) = rank(B).

Thus (for invertible P, Q), PAQ = B implies rank(A) = rank(PAQ).

[email protected] MATH 532 78 Moreover, multiplication by an arbitrary matrix can only lower the rank. Theorem Let A be an m × n matrix, and let B by n × p. Then

rank(AB) = rank(B) − dim (N(A) ∩ R(B)) .

Remark Note that if A is nonsingular, then N(A) = {0} so that dim (N(A) ∩ R(B)) = 0 and rank(AB) = rank(B).

More About Rank More About Rank

We know that A ∼ B if and only if rank(A) = rank(B).

Thus (for invertible P, Q), PAQ = B implies rank(A) = rank(PAQ).

As we now show, it is a general fact that multiplication by a nonsingular matrix does not change the rank of a given matrix.

[email protected] MATH 532 78 More About Rank More About Rank

We know that A ∼ B if and only if rank(A) = rank(B).

Thus (for invertible P, Q), PAQ = B implies rank(A) = rank(PAQ).

As we now show, it is a general fact that multiplication by a nonsingular matrix does not change the rank of a given matrix.

Moreover, multiplication by an arbitrary matrix can only lower the rank. Theorem Let A be an m × n matrix, and let B by n × p. Then

rank(AB) = rank(B) − dim (N(A) ∩ R(B)) .

Remark Note that if A is nonsingular, then N(A) = {0} so that dim (N(A) ∩ R(B)) = 0 and rank(AB) = rank(B).

[email protected] MATH 532 78 Since N(A) ∩ R(B) ⊆ R(B) we know that

dim(R(B)) = s + t, for some t ≥ 0.

We can construct an extension set such that

B = {x1, x2,..., xs, z1,..., z2,..., zt }

is a basis for R(B).

More About Rank

Proof

Let S = {x1, x2,..., xs} be a basis for N(A) ∩ R(B).

[email protected] MATH 532 79 We can construct an extension set such that

B = {x1, x2,..., xs, z1,..., z2,..., zt }

is a basis for R(B).

More About Rank

Proof

Let S = {x1, x2,..., xs} be a basis for N(A) ∩ R(B).

Since N(A) ∩ R(B) ⊆ R(B) we know that

dim(R(B)) = s + t, for some t ≥ 0.

[email protected] MATH 532 79 More About Rank

Proof

Let S = {x1, x2,..., xs} be a basis for N(A) ∩ R(B).

Since N(A) ∩ R(B) ⊆ R(B) we know that

dim(R(B)) = s + t, for some t ≥ 0.

We can construct an extension set such that

B = {x1, x2,..., xs, z1,..., z2,..., zt } is a basis for R(B).

[email protected] MATH 532 79 Therefore, we now show that dim(R(AB)) = t. In particular, we show that

T = {Az1, Az2,..., Azt }

is a basis for R(AB).

We do this by showing that 1 T is a spanning set for R(AB), 2 T is linearly independent.

More About Rank

(cont.) If we can show that dim(R(AB)) = t then

rank(B) = dim(R(B)) = s + t = dim (N(A) ∩ R(B)) + dim(R(AB)), and we are done.

[email protected] MATH 532 80 We do this by showing that 1 T is a spanning set for R(AB), 2 T is linearly independent.

More About Rank

(cont.) If we can show that dim(R(AB)) = t then

rank(B) = dim(R(B)) = s + t = dim (N(A) ∩ R(B)) + dim(R(AB)), and we are done.

Therefore, we now show that dim(R(AB)) = t. In particular, we show that

T = {Az1, Az2,..., Azt } is a basis for R(AB).

[email protected] MATH 532 80 More About Rank

(cont.) If we can show that dim(R(AB)) = t then

rank(B) = dim(R(B)) = s + t = dim (N(A) ∩ R(B)) + dim(R(AB)), and we are done.

Therefore, we now show that dim(R(AB)) = t. In particular, we show that

T = {Az1, Az2,..., Azt } is a basis for R(AB).

We do this by showing that 1 T is a spanning set for R(AB), 2 T is linearly independent.

[email protected] MATH 532 80 But then By ∈ R(B), so that

s t X X By = ξi xi + ηj zj i=1 j=1

and

s t t X X X b = ABy = ξi Axi + ηj Azj = ηj Azj i=1 j=1 j=1

since xi ∈ N(A).

More About Rank

(cont.) Spanning set: Consider an arbitrary b ∈ R(AB). It can be written as

b = ABy for some y.

[email protected] MATH 532 81 s t t X X X b = ABy = ξi Axi + ηj Azj = ηj Azj i=1 j=1 j=1

since xi ∈ N(A).

More About Rank

(cont.) Spanning set: Consider an arbitrary b ∈ R(AB). It can be written as

b = ABy for some y.

But then By ∈ R(B), so that

s t X X By = ξi xi + ηj zj i=1 j=1

and

[email protected] MATH 532 81 t X = ηj Azj j=1

since xi ∈ N(A).

More About Rank

(cont.) Spanning set: Consider an arbitrary b ∈ R(AB). It can be written as

b = ABy for some y.

But then By ∈ R(B), so that

s t X X By = ξi xi + ηj zj i=1 j=1

and

s t X X b = ABy = ξi Axi + ηj Azj i=1 j=1

[email protected] MATH 532 81 More About Rank

(cont.) Spanning set: Consider an arbitrary b ∈ R(AB). It can be written as

b = ABy for some y.

But then By ∈ R(B), so that

s t X X By = ξi xi + ηj zj i=1 j=1

and

s t t X X X b = ABy = ξi Axi + ηj Azj = ηj Azj i=1 j=1 j=1

since xi ∈ N(A).

[email protected] MATH 532 81 t X The identity on the right implies that αi zi ∈ N(A). i=1 t X But we also have zi ∈ B, i.e., αi zi ∈ R(B). i=1 And so together

t X αi zi ∈ N(A) ∩ R(B). i=1

More About Rank

(cont.) Linear independence: Let’s use the definition of linear independence and look at

t t X X αi Azi = 0 ⇐⇒ A αi zi = 0. i=1 i=1

[email protected] MATH 532 82 t X But we also have zi ∈ B, i.e., αi zi ∈ R(B). i=1 And so together

t X αi zi ∈ N(A) ∩ R(B). i=1

More About Rank

(cont.) Linear independence: Let’s use the definition of linear independence and look at

t t X X αi Azi = 0 ⇐⇒ A αi zi = 0. i=1 i=1

t X The identity on the right implies that αi zi ∈ N(A). i=1

[email protected] MATH 532 82 And so together

t X αi zi ∈ N(A) ∩ R(B). i=1

More About Rank

(cont.) Linear independence: Let’s use the definition of linear independence and look at

t t X X αi Azi = 0 ⇐⇒ A αi zi = 0. i=1 i=1

t X The identity on the right implies that αi zi ∈ N(A). i=1 t X But we also have zi ∈ B, i.e., αi zi ∈ R(B). i=1

[email protected] MATH 532 82 More About Rank

(cont.) Linear independence: Let’s use the definition of linear independence and look at

t t X X αi Azi = 0 ⇐⇒ A αi zi = 0. i=1 i=1

t X The identity on the right implies that αi zi ∈ N(A). i=1 t X But we also have zi ∈ B, i.e., αi zi ∈ R(B). i=1 And so together

t X αi zi ∈ N(A) ∩ R(B). i=1

[email protected] MATH 532 82 t s X X αi zi − βj xj = 0. i=1 j=1

But B = {x1,..., xs, z1,..., zt } is linearly independent, so that α1 = ··· = αt = β1 = ··· = βs = 0 and therefore T is also linearly independent. 

More About Rank

(cont.)

Now, since S = {x1,..., xs} is a basis for N(A) ∩ R(B) we have

t s X X αi zi = βj xj ⇐⇒ i=1 j=1

[email protected] MATH 532 83 But B = {x1,..., xs, z1,..., zt } is linearly independent, so that α1 = ··· = αt = β1 = ··· = βs = 0 and therefore T is also linearly independent. 

More About Rank

(cont.)

Now, since S = {x1,..., xs} is a basis for N(A) ∩ R(B) we have

t s t s X X X X αi zi = βj xj ⇐⇒ αi zi − βj xj = 0. i=1 j=1 i=1 j=1

[email protected] MATH 532 83 therefore T is also linearly independent. 

More About Rank

(cont.)

Now, since S = {x1,..., xs} is a basis for N(A) ∩ R(B) we have

t s t s X X X X αi zi = βj xj ⇐⇒ αi zi − βj xj = 0. i=1 j=1 i=1 j=1

But B = {x1,..., xs, z1,..., zt } is linearly independent, so that α1 = ··· = αt = β1 = ··· = βs = 0 and

[email protected] MATH 532 83 More About Rank

(cont.)

Now, since S = {x1,..., xs} is a basis for N(A) ∩ R(B) we have

t s t s X X X X αi zi = βj xj ⇐⇒ αi zi − βj xj = 0. i=1 j=1 i=1 j=1

But B = {x1,..., xs, z1,..., zt } is linearly independent, so that α1 = ··· = αt = β1 = ··· = βs = 0 and therefore T is also linearly independent. 

[email protected] MATH 532 83 Therefore, the following upper and lower bounds for rank(AB) are useful.

Theorem Let A be an m × n matrix, and let B by n × p. Then 1 rank(AB) ≤ min{rank(A), rank(B)}, 2 rank(AB) ≥ rank(A) + rank(B) − n.

More About Rank

It turns out that dim(N(A) ∩ R(B)) is relatively difficult to determine.

[email protected] MATH 532 84 Theorem Let A be an m × n matrix, and let B by n × p. Then 1 rank(AB) ≤ min{rank(A), rank(B)}, 2 rank(AB) ≥ rank(A) + rank(B) − n.

More About Rank

It turns out that dim(N(A) ∩ R(B)) is relatively difficult to determine.

Therefore, the following upper and lower bounds for rank(AB) are useful.

[email protected] MATH 532 84 More About Rank

It turns out that dim(N(A) ∩ R(B)) is relatively difficult to determine.

Therefore, the following upper and lower bounds for rank(AB) are useful.

Theorem Let A be an m × n matrix, and let B by n × p. Then 1 rank(AB) ≤ min{rank(A), rank(B)}, 2 rank(AB) ≥ rank(A) + rank(B) − n.

[email protected] MATH 532 84 Similarly,

as above rank(AB) = rank(AB)T = rank(BT AT ) ≤ rank(AT ) = rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

The previous theorem states

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

[email protected] MATH 532 85 Similarly,

as above rank(AB) = rank(AB)T = rank(BT AT ) ≤ rank(AT ) = rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B))

[email protected] MATH 532 85 Similarly,

as above rank(AB) = rank(AB)T = rank(BT AT ) ≤ rank(AT ) = rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). | {z } ≥0

[email protected] MATH 532 85 as above rank(AB)T = rank(BT AT ) ≤ rank(AT ) = rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). | {z } ≥0

Similarly,

rank(AB) =

[email protected] MATH 532 85 as above ≤ rank(AT ) = rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). | {z } ≥0

Similarly,

rank(AB) = rank(AB)T = rank(BT AT )

[email protected] MATH 532 85 rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). | {z } ≥0

Similarly,

as above rank(AB) = rank(AB)T = rank(BT AT ) ≤ rank(AT ) =

[email protected] MATH 532 85 To make things as tight as possible we take the smaller of the two upper bounds.

More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). | {z } ≥0

Similarly,

as above rank(AB) = rank(AB)T = rank(BT AT ) ≤ rank(AT ) = rank(A).

[email protected] MATH 532 85 More About Rank

Proof of (1) We show that rank(AB) ≤ rank(A) and rank(AB) ≤ rank(B).

The previous theorem states

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≤ rank(B). | {z } ≥0

Similarly,

as above rank(AB) = rank(AB)T = rank(BT AT ) ≤ rank(AT ) = rank(A).

To make things as tight as possible we take the smaller of the two upper bounds.

[email protected] MATH 532 85 ≥ rank(B) − n + rank(A).

Therefore,

dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A).

But then (using the previous theorem)

rank(AB) = rank(B) − dim(N(A) ∩ R(B))



More About Rank

Proof of (2) We begin by noting that N(A) ∩ R(B) ⊆ N(A).

[email protected] MATH 532 86 ≥ rank(B) − n + rank(A).

n − rank(A).

But then (using the previous theorem)

rank(AB) = rank(B) − dim(N(A) ∩ R(B))



More About Rank

Proof of (2) We begin by noting that N(A) ∩ R(B) ⊆ N(A).

Therefore,

dim(N(A) ∩ R(B)) ≤ dim(N(A)) =

[email protected] MATH 532 86 ≥ rank(B) − n + rank(A).

But then (using the previous theorem)

rank(AB) = rank(B) − dim(N(A) ∩ R(B))



More About Rank

Proof of (2) We begin by noting that N(A) ∩ R(B) ⊆ N(A).

Therefore,

dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A).

[email protected] MATH 532 86 ≥ rank(B) − n + rank(A).

More About Rank

Proof of (2) We begin by noting that N(A) ∩ R(B) ⊆ N(A).

Therefore,

dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A).

But then (using the previous theorem)

rank(AB) = rank(B) − dim(N(A) ∩ R(B))



[email protected] MATH 532 86 More About Rank

Proof of (2) We begin by noting that N(A) ∩ R(B) ⊆ N(A).

Therefore,

dim(N(A) ∩ R(B)) ≤ dim(N(A)) = n − rank(A).

But then (using the previous theorem)

rank(AB) = rank(B) − dim(N(A) ∩ R(B)) ≥ rank(B) − n + rank(A).



[email protected] MATH 532 86 Lemma Let A be a real m × n matrix. Then T T 1 rank(A A) = rank(AA ) = rank(A). T T T 2 R(A A) = R(A ), R(AA ) = R(A). 3 N(AT A) = N(A), N(AAT ) = N(AT ).

More About Rank

To prepare for our study of least squares solutions, where the matrices AT A and AAT are important, we prove

[email protected] MATH 532 87 More About Rank

To prepare for our study of least squares solutions, where the matrices AT A and AAT are important, we prove

Lemma Let A be a real m × n matrix. Then T T 1 rank(A A) = rank(AA ) = rank(A). T T T 2 R(A A) = R(A ), R(AA ) = R(A). 3 N(AT A) = N(A), N(AAT ) = N(AT ).

[email protected] MATH 532 87 For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}. This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

Therefore (using xT = y T AT )

xT x = y T AT x = 0.

But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ x = 0. i=1

More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

[email protected] MATH 532 88 This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

Therefore (using xT = y T AT )

xT x = y T AT x = 0.

But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ x = 0. i=1

More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}.

[email protected] MATH 532 88 Therefore (using xT = y T AT )

xT x = y T AT x = 0.

But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ x = 0. i=1

More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}. This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

[email protected] MATH 532 88 = 0.

But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ x = 0. i=1

More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}. This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

Therefore (using xT = y T AT )

xT x = y T AT x

[email protected] MATH 532 88 But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ x = 0. i=1

More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}. This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

Therefore (using xT = y T AT )

xT x = y T AT x = 0.

[email protected] MATH 532 88 x = 0.

More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}. This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

Therefore (using xT = y T AT )

xT x = y T AT x = 0.

But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ i=1

[email protected] MATH 532 88 More About Rank Proof From our earlier theorem we know

rank(AT A) = rank(A) − dim(N(AT ) ∩ R(A)).

For (1) to be true we need to show dim(N(AT ) ∩ R(A)) = 0, i.e., N(AT ) ∩ R(A) = {0}. This is true since

x ∈ N(AT ) ∩ R(A)= ⇒ AT x = 0 and x = Ay for some y.

Therefore (using xT = y T AT )

xT x = y T AT x = 0.

But m T X 2 x x = 0 ⇐⇒ xi = 0 =⇒ x = 0. i=1

[email protected] MATH 532 88 The first part of (2) follows from R(AT A) ⊆ R(AT ) (see HW) and

(1) dim(R(AT A)) = rank(AT A) = rank(AT ) = dim(R(AT ))

since for M ⊆ N with dim M = dim N one has M = N (from an earlier theorem).

The other part of (2) follows by switching A and AT .

More About Rank

(cont.) rank(AAT ) = rank(AT ) obtained by switching A and AT , and then use rank(AT ) = rank(A).

[email protected] MATH 532 89 (1) = rank(AT ) = dim(R(AT ))

since for M ⊆ N with dim M = dim N one has M = N (from an earlier theorem).

The other part of (2) follows by switching A and AT .

More About Rank

(cont.) rank(AAT ) = rank(AT ) obtained by switching A and AT , and then use rank(AT ) = rank(A).

The first part of (2) follows from R(AT A) ⊆ R(AT ) (see HW) and

dim(R(AT A)) = rank(AT A)

[email protected] MATH 532 89 since for M ⊆ N with dim M = dim N one has M = N (from an earlier theorem).

The other part of (2) follows by switching A and AT .

More About Rank

(cont.) rank(AAT ) = rank(AT ) obtained by switching A and AT , and then use rank(AT ) = rank(A).

The first part of (2) follows from R(AT A) ⊆ R(AT ) (see HW) and

(1) dim(R(AT A)) = rank(AT A) = rank(AT ) = dim(R(AT ))

[email protected] MATH 532 89 The other part of (2) follows by switching A and AT .

More About Rank

(cont.) rank(AAT ) = rank(AT ) obtained by switching A and AT , and then use rank(AT ) = rank(A).

The first part of (2) follows from R(AT A) ⊆ R(AT ) (see HW) and

(1) dim(R(AT A)) = rank(AT A) = rank(AT ) = dim(R(AT )) since for M ⊆ N with dim M = dim N one has M = N (from an earlier theorem).

[email protected] MATH 532 89 More About Rank

(cont.) rank(AAT ) = rank(AT ) obtained by switching A and AT , and then use rank(AT ) = rank(A).

The first part of (2) follows from R(AT A) ⊆ R(AT ) (see HW) and

(1) dim(R(AT A)) = rank(AT A) = rank(AT ) = dim(R(AT )) since for M ⊆ N with dim M = dim N one has M = N (from an earlier theorem).

The other part of (2) follows by switching A and AT .

[email protected] MATH 532 89 = n − rank(AT A) = dim(N(AT A))

using the same reasoning as above.

T The other part of (3) follows by switching A and A . 

More About Rank

(cont.) The first part of (3) follows from N(A) ⊆ N(AT A) (see HW) and

dim(N(A)) = n − rank(A)

[email protected] MATH 532 90 T The other part of (3) follows by switching A and A . 

More About Rank

(cont.) The first part of (3) follows from N(A) ⊆ N(AT A) (see HW) and

dim(N(A)) = n − rank(A) = n − rank(AT A) = dim(N(AT A)) using the same reasoning as above.

[email protected] MATH 532 90 More About Rank

(cont.) The first part of (3) follows from N(A) ⊆ N(AT A) (see HW) and

dim(N(A)) = n − rank(A) = n − rank(AT A) = dim(N(AT A)) using the same reasoning as above.

T The other part of (3) follows by switching A and A . 

[email protected] MATH 532 90 To find a “solution” we multiply both sides by AT to get the normal : AT Ax = AT b, where AT A is an n × n matrix.

More About Rank Connection to least squares and normal equations

Consider a — possibly inconsistent —

Ax = b

with m × n matrix A (and b ∈/ R(A) if inconsistent).

[email protected] MATH 532 91 More About Rank Connection to least squares and normal equations

Consider a — possibly inconsistent — linear system

Ax = b

with m × n matrix A (and b ∈/ R(A) if inconsistent).

To find a “solution” we multiply both sides by AT to get the normal equations: AT Ax = AT b, where AT A is an n × n matrix.

[email protected] MATH 532 91 More About Rank

Theorem Let A be an m × n matrix, b an m-vector, and consider the normal equations AT Ax = AT b associated with Ax = b. 1 The normal equations are always consistent, i.e., for every A and b there exists at least one x such that AT Ax = AT b. 2 If Ax = b is consistent, then AT Ax = AT b has the same solution set (the least squares solution of Ax = b). 3 AT Ax = AT b has a unique solution if and only if rank(A) = n. Then x = (AT A)−1AT b, regardless of whether Ax = b is consistent or not. 4 If Ax = b is consistent and has a unique solution, then the same holds for AT Ax = AT b and x = (AT A)−1AT b.

[email protected] MATH 532 92 To show (2) we assume the p is some particular solution of Ax = b, i.e., Ap = b.

If we multiply by AT , then

AT Ap = AT p,

so that p is also a solution of the normal equations.

More About Rank

Proof (1) follows from our previous lemma, i.e.,

AT b ∈ R(AT ) = R(AT A).

[email protected] MATH 532 93 If we multiply by AT , then

AT Ap = AT p,

so that p is also a solution of the normal equations.

More About Rank

Proof (1) follows from our previous lemma, i.e.,

AT b ∈ R(AT ) = R(AT A).

To show (2) we assume the p is some particular solution of Ax = b, i.e., Ap = b.

[email protected] MATH 532 93 More About Rank

Proof (1) follows from our previous lemma, i.e.,

AT b ∈ R(AT ) = R(AT A).

To show (2) we assume the p is some particular solution of Ax = b, i.e., Ap = b.

If we multiply by AT , then

AT Ap = AT p, so that p is also a solution of the normal equations.

[email protected] MATH 532 93 Moreover, the general solution of AT Ax = AT b is of the form

p + N(AT A) lemma= p + N(A) = S.

More About Rank

(cont.) Now, the general solution of Ax = b is from the set (see Problem 2 on HW#4) S = p + N(A).

[email protected] MATH 532 94 lemma= p + N(A) = S.

More About Rank

(cont.) Now, the general solution of Ax = b is from the set (see Problem 2 on HW#4) S = p + N(A).

Moreover, the general solution of AT Ax = AT b is of the form

p + N(AT A)

[email protected] MATH 532 94 More About Rank

(cont.) Now, the general solution of Ax = b is from the set (see Problem 2 on HW#4) S = p + N(A).

Moreover, the general solution of AT Ax = AT b is of the form

p + N(AT A) lemma= p + N(A) = S.

[email protected] MATH 532 94 What we know immediately is that AT Ax = AT b has a unique solution if and only if rank(AT A) = n. Since we showed earlier that rank(AT A) = rank(A) this part is done.

Now, if rank(AT A) = n we know that AT A is invertible (even though AT and A may not be) and therefore

AT Ax = AT b ⇐⇒ x = (AT A)−1AT b.

To show (4) we note that Ax = b has a unique solution if and only if T rank(A) = n. But rank(A A) = rank(A) and the rest follows from (3). 

More About Rank

(cont.) For (3) we want to show that AT Ax = AT b has a unique solution if and only if rank(A) = n.

[email protected] MATH 532 95 Since we showed earlier that rank(AT A) = rank(A) this part is done.

Now, if rank(AT A) = n we know that AT A is invertible (even though AT and A may not be) and therefore

AT Ax = AT b ⇐⇒ x = (AT A)−1AT b.

To show (4) we note that Ax = b has a unique solution if and only if T rank(A) = n. But rank(A A) = rank(A) and the rest follows from (3). 

More About Rank

(cont.) For (3) we want to show that AT Ax = AT b has a unique solution if and only if rank(A) = n.

What we know immediately is that AT Ax = AT b has a unique solution if and only if rank(AT A) = n.

[email protected] MATH 532 95 Now, if rank(AT A) = n we know that AT A is invertible (even though AT and A may not be) and therefore

AT Ax = AT b ⇐⇒ x = (AT A)−1AT b.

To show (4) we note that Ax = b has a unique solution if and only if T rank(A) = n. But rank(A A) = rank(A) and the rest follows from (3). 

More About Rank

(cont.) For (3) we want to show that AT Ax = AT b has a unique solution if and only if rank(A) = n.

What we know immediately is that AT Ax = AT b has a unique solution if and only if rank(AT A) = n. Since we showed earlier that rank(AT A) = rank(A) this part is done.

[email protected] MATH 532 95 To show (4) we note that Ax = b has a unique solution if and only if T rank(A) = n. But rank(A A) = rank(A) and the rest follows from (3). 

More About Rank

(cont.) For (3) we want to show that AT Ax = AT b has a unique solution if and only if rank(A) = n.

What we know immediately is that AT Ax = AT b has a unique solution if and only if rank(AT A) = n. Since we showed earlier that rank(AT A) = rank(A) this part is done.

Now, if rank(AT A) = n we know that AT A is invertible (even though AT and A may not be) and therefore

AT Ax = AT b ⇐⇒ x = (AT A)−1AT b.

[email protected] MATH 532 95 More About Rank

(cont.) For (3) we want to show that AT Ax = AT b has a unique solution if and only if rank(A) = n.

What we know immediately is that AT Ax = AT b has a unique solution if and only if rank(AT A) = n. Since we showed earlier that rank(AT A) = rank(A) this part is done.

Now, if rank(AT A) = n we know that AT A is invertible (even though AT and A may not be) and therefore

AT Ax = AT b ⇐⇒ x = (AT A)−1AT b.

To show (4) we note that Ax = b has a unique solution if and only if T rank(A) = n. But rank(A A) = rank(A) and the rest follows from (3). 

[email protected] MATH 532 95 More About Rank

Remark The normal equations are not recommended for serious computations since they are often rather ill-conditioned since one can show that

cond(AT A) = cond(A)2.

There’s an example in [Mey00] that illustrates this fact.

[email protected] MATH 532 96 Example The matrix 1 2 2 3 1 2 4 4 6 2 A =   3 6 6 9 6 1 2 4 5 3 cannot have rank 4 since rows one and two are linearly dependent. 9 6 But rank(A) ≥ 2 since is nonsingular. 5 3

More About Rank Historical definition of rank

Let A be an m × n matrix. Then A has rank r if there exists at least one nonsingular r × r submatrix of A (and none larger).

[email protected] MATH 532 97 since rows one and two are linearly dependent. 9 6 But rank(A) ≥ 2 since is nonsingular. 5 3

More About Rank Historical definition of rank

Let A be an m × n matrix. Then A has rank r if there exists at least one nonsingular r × r submatrix of A (and none larger).

Example The matrix 1 2 2 3 1 2 4 4 6 2 A =   3 6 6 9 6 1 2 4 5 3 cannot have rank 4

[email protected] MATH 532 97 9 6 But rank(A) ≥ 2 since is nonsingular. 5 3

More About Rank Historical definition of rank

Let A be an m × n matrix. Then A has rank r if there exists at least one nonsingular r × r submatrix of A (and none larger).

Example The matrix 1 2 2 3 1 2 4 4 6 2 A =   3 6 6 9 6 1 2 4 5 3 cannot have rank 4 since rows one and two are linearly dependent.

[email protected] MATH 532 97 9 6 since is nonsingular. 5 3

More About Rank Historical definition of rank

Let A be an m × n matrix. Then A has rank r if there exists at least one nonsingular r × r submatrix of A (and none larger).

Example The matrix 1 2 2 3 1 2 4 4 6 2 A =   3 6 6 9 6 1 2 4 5 3 cannot have rank 4 since rows one and two are linearly dependent. But rank(A) ≥ 2

[email protected] MATH 532 97 More About Rank Historical definition of rank

Let A be an m × n matrix. Then A has rank r if there exists at least one nonsingular r × r submatrix of A (and none larger).

Example The matrix 1 2 2 3 1 2 4 4 6 2 A =   3 6 6 9 6 1 2 4 5 3 cannot have rank 4 since rows one and two are linearly dependent. 9 6 But rank(A) ≥ 2 since is nonsingular. 5 3

[email protected] MATH 532 97 Note that other singular 3 × 3 submatrices are allowed, such as

1 2 2 2 4 4 . 3 6 6

More About Rank

Example (cont.) In fact, rank(A) = 3 since

4 6 2 6 9 6 4 5 3 is nonsingular.

[email protected] MATH 532 98 More About Rank

Example (cont.) In fact, rank(A) = 3 since

4 6 2 6 9 6 4 5 3 is nonsingular.

Note that other singular 3 × 3 submatrices are allowed, such as

1 2 2 2 4 4 . 3 6 6

[email protected] MATH 532 98 Now Theorem Let A and E be m × n matrices. Then

rank(A + E) ≥ rank(A),

provided the entries of E are “sufficiently small”.

More About Rank

Earlier we showed that
\[
\operatorname{rank}(AB) \le \operatorname{rank}(A),
\]
i.e., multiplication by another matrix does not increase the rank of a given matrix, i.e., we can't “fix” a singular system by multiplication.

Now
Theorem
Let A and E be m × n matrices. Then
\[
\operatorname{rank}(A + E) \ge \operatorname{rank}(A),
\]
provided the entries of E are “sufficiently small”.

More About Rank

This theorem has at least two fundamental consequences of practical importance:

Beware! A theoretically singular system may become nonsingular, i.e., have a “solution”, just due to round-off error.

We may want to intentionally “fix” a singular system, so that it has a “solution”. One such strategy is known as Tikhonov regularization, i.e.,
\[
Ax = b \longrightarrow (A + \mu I)x = b,
\]
where μ is a (small) regularization parameter.
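Remark
A minimal NumPy sketch of the shifted system (A + μI)x = b stated above (the matrix, right-hand side, and value of μ are illustrative choices):

import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0]])                  # singular: row 2 = 2 * row 1
b = np.array([1.0, 2.0])                    # consistent right-hand side

mu = 1e-8                                   # small regularization parameter
x = np.linalg.solve(A + mu * np.eye(2), b)  # regularized "solution"
print(x)                                    # approximately [0.2, 0.4]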


More About Rank

Proof
We assume that rank(A) = r and that we have nonsingular P and Q such that we can convert A to rank normal form, i.e.,
\[
PAQ = \begin{pmatrix} I_r & O\\ O & O \end{pmatrix}.
\]
Then, formally,
\[
PEQ = \begin{pmatrix} E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix}
\]
with appropriate blocks \(E_{ij}\). This allows us to write
\[
P(A + E)Q = \begin{pmatrix} I_r + E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix}.
\]

More About Rank

(cont.)
Now, we note that
\[
(I - B)(I + B + B^2 + \cdots + B^{k-1}) = I - B^k \to I,
\]
provided the entries of B are “sufficiently small” (i.e., so that \(B^k \to O\) for \(k \to \infty\)). Therefore \((I - B)^{-1}\) exists.

This technique is known as the Neumann series expansion of the inverse of I − B.
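Remark
A minimal NumPy sketch of the Neumann series (the particular B is an illustrative choice whose spectral radius is well below 1, so that \(B^k \to O\)):

import numpy as np

B = np.array([[0.0, 0.2],
              [0.1, 0.1]])      # "sufficiently small" entries

# Partial sums I + B + B^2 + ... + B^(k-1) of the Neumann series
S = np.eye(2)
term = np.eye(2)
for _ in range(60):
    term = term @ B
    S += term

print(np.allclose(S, np.linalg.inv(np.eye(2) - B)))   # True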


More About Rank

(cont.)
Now, letting \(B = -E_{11}\), we know that \((I_r + E_{11})^{-1}\) exists and we can write
\[
\begin{pmatrix} I_r & O\\ -E_{21}(I_r + E_{11})^{-1} & I_{m-r} \end{pmatrix}
\begin{pmatrix} I_r + E_{11} & E_{12}\\ E_{21} & E_{22} \end{pmatrix}
\begin{pmatrix} I_r & -(I_r + E_{11})^{-1} E_{12}\\ O & I_{n-r} \end{pmatrix}
= \begin{pmatrix} I_r + E_{11} & O\\ O & S \end{pmatrix},
\]
where \(S = E_{22} - E_{21}(I_r + E_{11})^{-1} E_{12}\) is the Schur complement of \(I_r + E_{11}\) in \(P(A + E)Q\).

More About Rank

(cont.)
The Schur complement calculation shows that
\[
A + E \sim \begin{pmatrix} I_r + E_{11} & O\\ O & S \end{pmatrix}.
\]
But then this rank normal form with invertible diagonal blocks tells us
\[
\operatorname{rank}(A + E) = \operatorname{rank}(I_r + E_{11}) + \operatorname{rank}(S)
= \operatorname{rank}(A) + \operatorname{rank}(S) \ge \operatorname{rank}(A). \qquad \square
\]
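Remark
A minimal NumPy sketch illustrating both the theorem and the earlier round-off warning: a tiny perturbation of a singular matrix can only increase the (numerical) rank.

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])                  # rank(A) = 1
E = 1e-12 * rng.standard_normal((2, 2))     # "sufficiently small" perturbation

print(np.linalg.matrix_rank(A))             # 1
print(np.linalg.matrix_rank(A + E))         # 2 (almost surely): the rank went up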



Classical Least Squares Outline

1 Spaces and Subspaces

2 Four Fundamental Subspaces

3 Linear Independence

4 Bases and Dimension

5 More About Rank

6 Classical Least Squares

7 Kriging as best linear unbiased predictor


Classical Least Squares
Linear least squares (linear regression)

Given: data \(\{(t_1, b_1), (t_2, b_2), \ldots, (t_m, b_m)\}\), e.g.,

t  1    2    3    4    5
b  1.3  3.5  4.2  5.0  7.0

Find: a “best fit” by a line.

Idea for best fit
Minimize the sum of the squares of the vertical distances of the line from the data points.

[email protected] MATH 532 106 m m X 2 X 2 εi = (f (ti ) − bi ) i=1 i=1 m X 2 = (α + βti − bi ) = G(α, β) −→ min i=1 From , necessary (and sufficient) condition for minimum ∂G(α, β) ∂G(α, β) = 0, = 0. ∂α ∂β where m m ∂G(α, β) X ∂G(α, β) X = 2 (α + βt − b ) , = 2 (α + βt − b ) t ∂α i i ∂β i i i i=1 i=1

Classical Least Squares

More precisely, let \(f(t) = \alpha + \beta t\) with α, β such that
\[
\sum_{i=1}^m \varepsilon_i^2 = \sum_{i=1}^m \bigl(f(t_i) - b_i\bigr)^2
= \sum_{i=1}^m (\alpha + \beta t_i - b_i)^2 = G(\alpha, \beta) \longrightarrow \min.
\]
From calculus, a necessary (and sufficient) condition for a minimum is
\[
\frac{\partial G(\alpha,\beta)}{\partial\alpha} = 0, \qquad
\frac{\partial G(\alpha,\beta)}{\partial\beta} = 0,
\]
where
\[
\frac{\partial G(\alpha,\beta)}{\partial\alpha} = 2\sum_{i=1}^m (\alpha + \beta t_i - b_i), \qquad
\frac{\partial G(\alpha,\beta)}{\partial\beta} = 2\sum_{i=1}^m (\alpha + \beta t_i - b_i)\, t_i.
\]

Classical Least Squares

Equivalently,
\[
\Bigl(\sum_{i=1}^m 1\Bigr)\alpha + \Bigl(\sum_{i=1}^m t_i\Bigr)\beta = \sum_{i=1}^m b_i, \qquad
\Bigl(\sum_{i=1}^m t_i\Bigr)\alpha + \Bigl(\sum_{i=1}^m t_i^2\Bigr)\beta = \sum_{i=1}^m b_i t_i,
\]
which can be written as \(Qx = y\) with
\[
Q = \begin{pmatrix} \sum_{i=1}^m 1 & \sum_{i=1}^m t_i\\ \sum_{i=1}^m t_i & \sum_{i=1}^m t_i^2 \end{pmatrix},
\qquad
x = \begin{pmatrix} \alpha\\ \beta \end{pmatrix},
\qquad
y = \begin{pmatrix} \sum_{i=1}^m b_i\\ \sum_{i=1}^m b_i t_i \end{pmatrix}.
\]

Classical Least Squares

We can write each of these sums as inner products:
\[
\sum_{i=1}^m 1 = \mathbf{1}^T\mathbf{1}, \qquad
\sum_{i=1}^m t_i = \mathbf{1}^T t = t^T\mathbf{1}, \qquad
\sum_{i=1}^m t_i^2 = t^T t,
\]
\[
\sum_{i=1}^m b_i = \mathbf{1}^T b = b^T\mathbf{1}, \qquad
\sum_{i=1}^m b_i t_i = b^T t = t^T b,
\]
where
\[
\mathbf{1}^T = \begin{pmatrix} 1 & \cdots & 1 \end{pmatrix}, \qquad
t^T = \begin{pmatrix} t_1 & \cdots & t_m \end{pmatrix}, \qquad
b^T = \begin{pmatrix} b_1 & \cdots & b_m \end{pmatrix}.
\]
With this notation we have
\[
Qx = y \iff
\begin{pmatrix} \mathbf{1}^T\mathbf{1} & \mathbf{1}^T t\\ t^T\mathbf{1} & t^T t \end{pmatrix} x
= \begin{pmatrix} \mathbf{1}^T b\\ t^T b \end{pmatrix}
\iff A^T A x = A^T b,
\qquad
A^T = \begin{pmatrix} \mathbf{1}^T\\ t^T \end{pmatrix}, \quad
A = \begin{pmatrix} \mathbf{1} & t \end{pmatrix}.
\]

Classical Least Squares

Therefore we can find the parameters of the line, \(x = \begin{pmatrix} \alpha\\ \beta \end{pmatrix}\), by solving the square linear system
\[
A^T A x = A^T b.
\]
Also note that since \(\varepsilon_i = \alpha + \beta t_i - b_i\) we have
\[
\varepsilon = \begin{pmatrix} \varepsilon_1\\ \vdots\\ \varepsilon_m \end{pmatrix}
= \begin{pmatrix} 1\\ \vdots\\ 1 \end{pmatrix}\alpha
+ \begin{pmatrix} t_1\\ \vdots\\ t_m \end{pmatrix}\beta
- \begin{pmatrix} b_1\\ \vdots\\ b_m \end{pmatrix}
= \mathbf{1}\alpha + t\beta - b = Ax - b.
\]
This implies that
\[
G(\alpha, \beta) = \sum_{i=1}^m \varepsilon_i^2 = \varepsilon^T\varepsilon = (Ax - b)^T(Ax - b).
\]

Classical Least Squares

Example
Data:

t  -1  0  1  2  3  4  5  6
b  10  9  7  5  4  3  0  -1

\[
A^T A x = A^T b \iff
\begin{pmatrix} \sum_{i=1}^8 1 & \sum_{i=1}^8 t_i\\ \sum_{i=1}^8 t_i & \sum_{i=1}^8 t_i^2 \end{pmatrix}
\begin{pmatrix} \alpha\\ \beta \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^8 b_i\\ \sum_{i=1}^8 b_i t_i \end{pmatrix}
\iff
\begin{pmatrix} 8 & 20\\ 20 & 92 \end{pmatrix}
\begin{pmatrix} \alpha\\ \beta \end{pmatrix}
= \begin{pmatrix} 37\\ 25 \end{pmatrix}
\]
\[
\implies \alpha \approx 8.643, \quad \beta \approx -1.607,
\]
so that the best fit line to the given data is
\[
f(t) \approx 8.643 - 1.607\, t.
\]
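Remark
A minimal NumPy check of this example (a sketch; it simply solves the normal equations above):

import numpy as np

t = np.array([-1, 0, 1, 2, 3, 4, 5, 6], dtype=float)
b = np.array([10, 9, 7, 5, 4, 3, 0, -1], dtype=float)

A = np.column_stack([np.ones_like(t), t])   # A = [1  t]
x = np.linalg.solve(A.T @ A, A.T @ b)       # normal equations A^T A x = A^T b
print(x)                                    # [ 8.6428..., -1.6071...]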

Classical Least Squares
General Least Squares

The general least squares problem behaves analogously to the linear example.

Theorem
Let A be a real m × n matrix and b an m-vector. Any vector x that minimizes the square of the residual Ax − b, i.e.,
\[
G(x) = (Ax - b)^T(Ax - b),
\]
is called a least squares solution of Ax = b. The set of all least squares solutions is obtained by solving the normal equations \(A^T A x = A^T b\). Moreover, a unique solution exists if and only if rank(A) = n, so that
\[
x = (A^T A)^{-1} A^T b.
\]
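Remark
In floating-point practice one usually avoids forming AᵀA explicitly. A minimal sketch comparing the normal equations with NumPy's SVD-based least squares solver (random test data; A has full column rank almost surely):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 3))
b = rng.standard_normal(50)

x_normal = np.linalg.solve(A.T @ A, A.T @ b)      # x = (A^T A)^{-1} A^T b
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)   # least squares via SVD
print(np.allclose(x_normal, x_lstsq))             # True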

[email protected] MATH 532 112 = xT AT Ax − xT AT b − bT Ax + bT b = xT AT Ax − 2xT AT b + bT b

Classical Least Squares

Proof
The statement about uniqueness follows directly from our earlier theorem on p. 92 on the normal equations.

To characterize the least squares solutions we first show that if x minimizes G(x), then x satisfies \(A^T A x = A^T b\).

As in our earlier example, a necessary condition for the minimum is
\[
\frac{\partial G(x)}{\partial x_i} = 0, \qquad i = 1, \ldots, n.
\]
Let's first work out what G(x) looks like:
\[
G(x) = (Ax - b)^T(Ax - b)
= x^T A^T A x - x^T A^T b - b^T A x + b^T b
= x^T A^T A x - 2 x^T A^T b + b^T b,
\]
since \(b^T A x = (b^T A x)^T = x^T A^T b\) is a scalar.


Classical Least Squares

(cont.)
Therefore
\[
\frac{\partial G(x)}{\partial x_i}
= \frac{\partial x^T}{\partial x_i} A^T A x + x^T A^T A \frac{\partial x}{\partial x_i} - 2 \frac{\partial x^T}{\partial x_i} A^T b
= e_i^T A^T A x + x^T A^T A e_i - 2 e_i^T A^T b
= 2 e_i^T A^T A x - 2 e_i^T A^T b,
\]
since \(x^T A^T A e_i = (x^T A^T A e_i)^T = e_i^T A^T A x\) is a scalar. This means that
\[
\frac{\partial G(x)}{\partial x_i} = 0 \iff (A^T)_{i*} A x = (A^T)_{i*} b.
\]
If we collect all such conditions (for i = 1, ..., n) in one linear system, we get \(A^T A x = A^T b\).
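Remark
A quick finite-difference sanity check of the gradient formula \(\nabla G(x) = 2A^T A x - 2A^T b\) derived above (a sketch with random test data):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)
x = rng.standard_normal(3)

def G(v):
    r = A @ v - b
    return r @ r                              # (Av - b)^T (Av - b)

grad = 2 * A.T @ A @ x - 2 * A.T @ b          # the formula from the proof

h = 1e-6
fd = np.array([(G(x + h * e) - G(x - h * e)) / (2 * h) for e in np.eye(3)])
print(np.allclose(grad, fd, atol=1e-4))       # True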


Classical Least Squares

(cont.)
To verify that we indeed have a minimum, we show that if z is a solution of the normal equations, then G(z) is minimal:
\[
G(z) = (Az - b)^T(Az - b)
= z^T A^T A z - 2 z^T A^T b + b^T b
= z^T \underbrace{(A^T A z - A^T b)}_{=0} - z^T A^T b + b^T b
= -z^T A^T b + b^T b.
\]
Now, for any other y = z + u we have
\[
G(y) = (z + u)^T A^T A (z + u) - 2 (z + u)^T A^T b + b^T b
= G(z) + u^T A^T A u + \underbrace{z^T A^T A u}_{= u^T A^T A z} + u^T \underbrace{A^T A z}_{= A^T b} - 2 u^T A^T b
= G(z) + u^T A^T A u \ge G(z),
\]
since \(u^T A^T A u = \sum_{i=1}^m (Au)_i^2 \ge 0\). □

[email protected] MATH 532 115 Example 2 1 Let f (t) = α0 + α1t + α2t , i.e., we can use quadratic polynomials (or any other degree).

Classical Least Squares

Remark
Using this framework we can compute least squares fits from any linear function space.

Example
1 Let \(f(t) = \alpha_0 + \alpha_1 t + \alpha_2 t^2\), i.e., we can use quadratic polynomials (or any other degree).
2 Let \(f(t) = \alpha_0 + \alpha_1 \sin t + \alpha_2 \cos t\), i.e., we can use trigonometric polynomials.
3 Let \(f(t) = \alpha e^t + \beta\sqrt{t}\), i.e., we can use just about anything we want.
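Remark
The only change is the design matrix: one column per basis function. A minimal sketch fitting the trigonometric model \(f(t) = \alpha_0 + \alpha_1\sin t + \alpha_2\cos t\) from the example above to synthetic data (the data and true coefficients are illustrative):

import numpy as np

rng = np.random.default_rng(4)
t = np.linspace(0, 2 * np.pi, 30)
b = 1.0 + 2.0 * np.sin(t) - 0.5 * np.cos(t) + 0.1 * rng.standard_normal(30)

A = np.column_stack([np.ones_like(t), np.sin(t), np.cos(t)])  # design matrix
alpha = np.linalg.solve(A.T @ A, A.T @ b)                     # normal equations
print(alpha)                                                  # close to [1.0, 2.0, -0.5]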

[email protected] MATH 532 116 Now the actually observed data may be affected by noise, i.e.,

Classical Least Squares
Regression in Statistics (BLUE)

One assumes that there is a random process that generates data as a random variable Y of the form
\[
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n,
\]
where \(X_1, \ldots, X_n\) are (input) random variables and \(\beta_0, \beta_1, \ldots, \beta_n\) are unknown parameters.

Now the actually observed data may be affected by noise, i.e.,
\[
y = Y + \varepsilon = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n + \varepsilon,
\]
where \(\varepsilon \sim \mathcal{N}(0, \sigma^2)\) (normally distributed with mean zero and variance σ²) is another random variable denoting the noise. To determine the model parameters \(\beta_0, \beta_1, \ldots, \beta_n\) we now look at measurements, i.e.,
\[
y_i = \beta_0 + \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_n x_{i,n} + \varepsilon, \qquad i = 1, \ldots, m.
\]

[email protected] MATH 532 117 T −1 T T −1 T E[βˆ] = E[(X X) X y] = (X X) X E[y] = (XT X)−1XT Xβ = β,

Classical Least Squares

In matrix-vector form this gives us
\[
y = X\beta + \varepsilon.
\]
Now, the least squares solution of Xβ = y, i.e.,
\[
\hat{\beta} = (X^T X)^{-1} X^T y,
\]
is in fact the best linear unbiased estimator (BLUE) for β.

To show this one needs the assumption that the error is unbiased, i.e., \(E[\varepsilon] = 0\). Then
\[
E[y] = E[X\beta + \varepsilon] = E[X\beta] + E[\varepsilon] = X\beta,
\]
and therefore
\[
E[\hat{\beta}] = E[(X^T X)^{-1} X^T y] = (X^T X)^{-1} X^T E[y] = (X^T X)^{-1} X^T X \beta = \beta,
\]
so that the estimator is indeed unbiased.
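Remark
A minimal Monte Carlo sketch of this unbiasedness (the design matrix, true β, and noise level are synthetic, illustrative choices): averaging \(\hat{\beta}\) over many noisy data sets approaches β.

import numpy as np

rng = np.random.default_rng(5)
m, n = 100, 3
X = rng.standard_normal((m, n))
beta = np.array([1.0, -2.0, 0.5])        # "true" parameters
sigma = 0.3                              # noise level

trials = 2000
est = np.zeros(n)
for _ in range(trials):
    y = X @ beta + sigma * rng.standard_normal(m)   # y = X beta + eps
    est += np.linalg.solve(X.T @ X, X.T @ y)        # beta_hat for this sample
print(est / trials)                                 # approximately [1.0, -2.0, 0.5]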


Classical Least Squares

Remark
One can also show (maybe later) that \(\hat{\beta}\) has minimal variance among all unbiased linear estimators, so it is the best linear unbiased estimator of the model parameters. In fact, the theorem ensuring this is the so-called Gauss–Markov theorem.

Kriging as best linear unbiased predictor Outline

1 Spaces and Subspaces

2 Four Fundamental Subspaces

3 Linear Independence

4 Bases and Dimension

5 More About Rank

6 Classical Least Squares

7 Kriging as best linear unbiased predictor

[email protected] MATH 532 120 ˆ Since all of the Yxj have zero mean the predictor Yx is automatically unbiased. ? Goal: to compute “optimal” weights wj (·), j = 1,..., N. To this end, consider the mean-squared error (MSE) of the predictor, i.e.,  2 ˆ  T  MSE(Yx ) = E Yx − w(x) Y .

Kriging as best linear unbiased predictor
Kriging: a regression approach

Assume: the approximate value of a realization of a zero-mean (Gaussian) random field is given by a linear predictor of the form
\[
\hat{Y}_x = \sum_{j=1}^N Y_{x_j} w_j(x) = w(x)^T Y,
\]
where \(\hat{Y}_x\) and \(Y_{x_j}\) are random variables,
\(Y = \begin{pmatrix} Y_{x_1} & \cdots & Y_{x_N} \end{pmatrix}^T\), and
\(w(x) = \begin{pmatrix} w_1(x) & \cdots & w_N(x) \end{pmatrix}^T\) is a vector of weight functions at x.

Since all of the \(Y_{x_j}\) have zero mean, the predictor \(\hat{Y}_x\) is automatically unbiased.

Goal: compute “optimal” weights \(w_j^\star(\cdot)\), j = 1, ..., N. To this end, consider the mean-squared error (MSE) of the predictor, i.e.,
\[
\mathrm{MSE}(\hat{Y}_x) = E\Bigl[\bigl(Y_x - w(x)^T Y\bigr)^2\Bigr].
\]
We now present some details (see [FM15]).

[email protected] MATH 532 121 = E [(Yx − E[Yx ])(Yz − E[Yz ])] = E [Yx Yz − Yx E[Yz ] − E[Yx ]Yz + E[Yx ]E[Yz ]] = E[Yx Yz ] − E[Yx ]E[Yz ] − E[Yx ]E[Yz ] + E[Yx ]E[Yz ] = E[Yx Yz ] − E[Yx ]E[Yz ] = E[Yx Yz ] − µ(x)µ(z). Therefore, the variance of the random field,

Kriging as best linear unbiased predictor
Covariance Kernel

We need the covariance kernel K of a random field Y with mean μ(x). It is defined via
\[
\sigma^2 K(x, z) = \mathrm{Cov}(Y_x, Y_z)
= E\bigl[(Y_x - \mu(x))(Y_z - \mu(z))\bigr]
= E\bigl[(Y_x - E[Y_x])(Y_z - E[Y_z])\bigr]
\]
\[
= E\bigl[Y_x Y_z - Y_x E[Y_z] - E[Y_x] Y_z + E[Y_x] E[Y_z]\bigr]
= E[Y_x Y_z] - E[Y_x] E[Y_z]
= E[Y_x Y_z] - \mu(x)\mu(z).
\]
Therefore, the variance of the random field,
\[
\mathrm{Var}(Y_x) = E[Y_x^2] - E[Y_x]^2 = E[Y_x^2] - \mu^2(x),
\]
corresponds to the “diagonal” of the covariance, i.e.,
\[
\mathrm{Var}(Y_x) = \sigma^2 K(x, x).
\]

[email protected] MATH 532 122 T T T = E[Yx Yx ] − 2E[Yx w(x) Y ] + E[w(x) YY w(x)]

Kriging as best linear unbiased predictor

Let's now work out the MSE:
\[
\mathrm{MSE}(\hat{Y}_x) = E\Bigl[\bigl(Y_x - w(x)^T Y\bigr)^2\Bigr]
= E[Y_x Y_x] - 2 E[Y_x\, w(x)^T Y] + E[w(x)^T Y Y^T w(x)].
\]
Now use \(E[Y_x Y_z] = \sigma^2 K(x, z)\) (the covariance, since Y is centered):
\[
\mathrm{MSE}(\hat{Y}_x) = \sigma^2 K(x, x) - 2 w(x)^T \bigl(\sigma^2 k(x)\bigr) + w(x)^T \bigl(\sigma^2 \mathsf{K}\bigr) w(x),
\]
where \(\sigma^2 k(x) = \sigma^2 \begin{pmatrix} k_1(x) & \cdots & k_N(x) \end{pmatrix}^T\) with \(\sigma^2 k_j(x) = \sigma^2 K(x, x_j) = E[Y_x Y_{x_j}]\), and \(\mathsf{K}\) is the covariance matrix with entries \(\sigma^2 K(x_i, x_j) = E[Y_{x_i} Y_{x_j}]\).

Finding the minimum MSE is straightforward. Differentiation and equating to zero yields
\[
-2 k(x) + 2 \mathsf{K}\, w(x) = 0,
\]
and so the optimal weight vector is
\[
w^\star(x) = \mathsf{K}^{-1} k(x).
\]


Kriging as best linear unbiased predictor

We have shown that the (simple) kriging predictor
\[
\hat{Y}_x = k(x)^T \mathsf{K}^{-1} Y
\]
is the best (in the MSE sense) linear unbiased predictor (BLUP).

Since we are given the observations y as realizations of Y, we can compute the prediction
\[
\hat{y}_x = k(x)^T \mathsf{K}^{-1} y.
\]


Kriging as best linear unbiased predictor

The MSE of the kriging predictor with optimal weights \(w^\star(\cdot)\),
\[
E\Bigl[\bigl(Y_x - \hat{Y}_x\bigr)^2\Bigr] = \sigma^2\bigl(K(x, x) - k(x)^T \mathsf{K}^{-1} k(x)\bigr),
\]
is known as the kriging variance. It allows us to give confidence intervals for our prediction. It also gives rise to a criterion for choosing an optimal parametrization of the family of covariance kernels used for prediction.

Remark
For Gaussian random fields the BLUP is also the best nonlinear unbiased predictor (see, e.g., [BTA04, Chapter 2]).
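Remark
A minimal one-dimensional simple-kriging sketch. The Gaussian kernel, its length scale, the data, and σ² = 1 are all illustrative assumptions, not prescribed above:

import numpy as np

def K(x, z, ell=0.5):
    # Gaussian covariance kernel (an illustrative choice)
    return np.exp(-(x - z) ** 2 / (2 * ell ** 2))

xj = np.array([0.0, 0.3, 0.7, 1.0])     # observation locations
y = np.array([0.1, 0.9, 0.4, -0.2])     # observed realizations of Y

Kmat = K(xj[:, None], xj[None, :])      # covariance matrix K
x_new = 0.5
k_vec = K(x_new, xj)                    # vector k(x)

w = np.linalg.solve(Kmat, k_vec)        # optimal weights w*(x) = K^{-1} k(x)
y_hat = w @ y                           # prediction y_hat = k(x)^T K^{-1} y
var = K(x_new, x_new) - k_vec @ w       # kriging variance (with sigma^2 = 1)
print(y_hat, var)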


Kriging as best linear unbiased predictor

Remark
1 The simple kriging approach just described is precisely how Krige [Kri51] introduced the method: the unknown value to be predicted is given by a weighted average of the observed values, where the weights depend on the prediction location. Usually one assigns a smaller weight to observations further away from x. The latter statement implies that one should be using kernels whose associated weights decay away from x. Positive definite translation invariant kernels have this property.
2 More advanced kriging variants are discussed in papers such as [SWMW89, SSS13], or books such as [Cre93, Ste99, BTA04].

Appendix References

[BTA04] A. Berlinet and C. Thomas-Agnan, Reproducing Kernel Hilbert Spaces in Probability and Statistics, Kluwer, Dordrecht, 2004.

[Cre93] N. Cressie, Statistics for Spatial Data, revised ed., Wiley-Interscience, New York, 1993.

[FM15] G. E. Fasshauer and M. J. McCourt, Kernel-based Approximation Methods using MATLAB, Interdisciplinary Mathematical Sciences, vol. 19, World Scientific Publishing, Singapore, 2015.

[Kri51] D. G. Krige, A statistical approach to some basic mine valuation problems on the Witwatersrand, J. Chem. Met. & Mining Soc. S. Africa 52 (1951), no. 6, 119–139.

[Mey00] C. D. Meyer, Matrix Analysis and Applied Linear Algebra, SIAM, Philadelphia, PA, 2000.

[SSS13] M. Scheuerer, R. Schaback, and M. Schlather, Interpolation of spatial data – a stochastic or a deterministic problem?, Eur. J. Appl. Math. 24 (2013), no. 4, 601–629.

[Ste99] M. L. Stein, Interpolation of Spatial Data: Some Theory for Kriging, Springer, New York, 1999.

[SWMW89] J. Sacks, W. J. Welch, T. J. Mitchell, and H. P. Wynn, Design and analysis of computer experiments, Stat. Sci. 4 (1989), no. 4, 409–423.