
Matrix Multiplication

Let A be an l × m matrix and B an m × n matrix. Then C = A × B is given by

c_{ij} = \sum_{k=1}^{m} a_{ik} × b_{kj}, (1)

where a_{ik}, b_{kj}, c_{ij} are defined as before.

Zheng-Liang Lu 312 / 354

For example,

[1 2; 3 4] × [5 6; 7 8] = [1·5 + 2·7, 1·6 + 2·8; 3·5 + 4·7, 3·6 + 4·8] = [19 22; 43 50].

Note that if n = 1, then c_{i1} = \sum_{k=1}^{m} a_{ik} × b_{k1} becomes the matrix A times the vector B.

Example

Let A ∈ R^{l×m} and B ∈ R^{m×n}. Please implement a program that multiplies A by B and compare your result to A × B as computed by MATLAB. For simplicity, the dimensions l, m, n are the inputs used to generate random matrices by randi.

Input: l, m, n for [A]_{l×m} and [B]_{m×n}
Output: A × B

clear; clc;
% main
l = randi(5, 1); % A is {l x m}
m = randi(5, 1); % B is {m x n}
n = randi(5, 1);
A = randi(10, l, m);
B = randi(10, m, n);
C = zeros(size(A, 1), size(B, 2));
for i = 1 : 1 : size(A, 1)
    for j = 1 : 1 : size(B, 2)
        for k = 1 : 1 : size(A, 2)
            C(i, j) = C(i, j) + A(i, k) * B(k, j);
        end
    end
end

Time complexity: O(n^3).
Strassen (1969): O(n^2.807355).
Coppersmith and Winograd (1990): O(n^2.375477).
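The triple loop above maps directly to other languages. Below is a small pure-Python sketch (Python and the name mat_mul are mine, used only for illustration; plain lists of rows, no libraries) that can be used to spot-check small cases:

```python
def mat_mul(A, B):
    """Multiply an l x m matrix A by an m x n matrix B (lists of rows)."""
    l, m, n = len(A), len(B), len(B[0])
    assert all(len(row) == m for row in A), "inner dimensions must agree"
    C = [[0] * n for _ in range(l)]
    for i in range(l):          # row of A
        for j in range(n):      # column of B
            for k in range(m):  # inner sum of equation (1)
                C[i][j] += A[i][k] * B[k][j]
    return C

print(mat_mul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

The three nested loops perform l · n · m scalar multiplications, which is the O(n^3) cost quoted above for square matrices.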

Matrix Exponentiation

Raising a matrix to a power is equivalent to repeatedly multiplying the matrix by itself. For example, A^2 = A × A. Note that A should be a square matrix. (Why?) In mathematics, the matrix exponential1 is a matrix function on square matrices analogous to the ordinary exponential function. That is, f(A) = e^A, similar to f(x) = e^x. However, raising a matrix to a matrix power, that is, A^B, is undefined.

1See matrix exponential.

Example

Be aware that exp(X) and exp(1)^X are different for any matrix X.

clear; clc;
% main
X = [0 1; 1 0];
mat_exp1 = exp(1) ^ X
mat_exp2 = 0;
for i = 1 : 10 % check the series identity of e^A
    mat_exp2 = mat_exp2 + X ^ (i - 1) / factorial(i - 1);
end
mat_exp2
mat_exp3 = exp(X) % different?

mat_exp1 =

    1.5431    1.1752
    1.1752    1.5431

mat_exp2 =

    1.5431    1.1752
    1.1752    1.5431

mat_exp3 =

    1.0000    2.7183
    2.7183    1.0000
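The loop above checks the power-series identity e^A = \sum_{k=0}^{∞} A^k / k!. The same truncated series can be cross-checked outside MATLAB; here is a pure-Python sketch for 2-by-2 matrices (Python and the helper names are mine, for illustration only):

```python
import math

def mat_mul2(A, B):
    # 2-by-2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def mat_exp(A, terms=20):
    """Truncated series: sum_{k=0}^{terms-1} A^k / k! for a 2x2 matrix A."""
    S = [[0.0, 0.0], [0.0, 0.0]]
    P = [[1.0, 0.0], [0.0, 1.0]]  # running power of A, starting at A^0 = I
    for k in range(terms):
        for i in range(2):
            for j in range(2):
                S[i][j] += P[i][j] / math.factorial(k)
        P = mat_mul2(P, A)
    return S

E = mat_exp([[0.0, 1.0], [1.0, 0.0]])
# diagonal entries -> cosh(1) ~ 1.5431; off-diagonal -> sinh(1) ~ 1.1752
```

With X = [0 1; 1 0], the series converges to [cosh(1) sinh(1); sinh(1) cosh(1)], matching mat_exp1 and mat_exp2 above.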

Determinant

det(A) is the determinant of the square matrix A.2
Recall that if A = [a b; c d], then det(A) = ad − bc.
det(A) can calculate the determinant of A. For example,

clear; clc;
% main
A = magic(3);
det(A)

2In math, |A| also denotes det(A).

Actually, the general formula3 for the determinant is somewhat complicated. You can try a recursive algorithm to calculate the determinant of any n-by-n matrix.

3See determinant.

Recursive Algorithm for Determinant

function y = myDet(X)
    [r, c] = size(X);
    if isequal(r, c) % check r == c?
        if c == 1
            y = X;
        elseif c == 2
            y = X(1, 1) * X(2, 2) - X(1, 2) * X(2, 1);
        else
            for i = 1 : c
                if i == 1
                    % recall that c == r
                    y = X(1, 1) * myDet(X(2 : c, 2 : c));
                elseif (i > 1 && i < c)
                    y = y + X(1, i) * (-1) ^ (i + 1) * ...
                        myDet(X(2 : c, [1 : i - 1, i + 1 : c]));
                else
                    y = y + X(1, c) * (-1) ^ (c + 1) * ...
                        myDet(X(2 : c, 1 : c - 1));
                end
            end
        end
    else
        error('Should be a square matrix.')
    end

Obviously, there are n! terms in the calculation, so this algorithm runs in O(e^{n log n}) time. Try any 10-by-10 matrix and compare the running time with det.4
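For cross-checking myDet on small cases, here is the same first-row cofactor expansion as a compact pure-Python sketch (Python and the name my_det are mine; the list slicing builds each minor):

```python
def my_det(X):
    """Determinant by cofactor expansion along the first row."""
    n = len(X)
    assert all(len(row) == n for row in X), "should be a square matrix"
    if n == 1:
        return X[0][0]
    y = 0
    for i in range(n):
        # minor: delete row 0 and column i
        minor = [row[:i] + row[i + 1:] for row in X[1:]]
        y += X[0][i] * (-1) ** i * my_det(minor)
    return y

print(my_det([[8, 1, 6], [3, 5, 7], [4, 9, 2]]))  # magic(3): -360
```

This matches det(magic(3)) = -360 in MATLAB, and exhibits the same n! blow-up: each call spawns n recursive calls on (n−1)-by-(n−1) minors.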

4By LU decomposition, with running time O(n^3).

Inverse Matrices

In linear algebra, an n-by-n square matrix A is invertible if there exists an n-by-n square matrix B such that

AB = BA = I_n,

where I_n denotes the n-by-n identity matrix. Note that eye is similar to zeros and returns an identity matrix.
The following statements are equivalent:
A is invertible.
det(A) ≠ 0 (A is nonsingular).
A is full rank.5

5See rank.

inv(A) returns the inverse matrix of A. Note that inv(A) may return Inf if A is badly scaled or nearly singular.6 So, if A is not invertible, then det(A) = 0. It is well known that

A^{-1} = adj(A) / det(A),

where adj(A) refers to the adjugate matrix of A. Try calculating the adjugate matrix7 of an n-by-n matrix.
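One possible cofactor-based sketch in Python (Python and all helper names are mine; the determinant helper is the same cofactor expansion as myDet):

```python
def my_det(X):
    # determinant by cofactor expansion along the first row
    n = len(X)
    if n == 1:
        return X[0][0]
    return sum(X[0][i] * (-1) ** i *
               my_det([row[:i] + row[i + 1:] for row in X[1:]])
               for i in range(n))

def adjugate(A):
    """adj(A) = transpose of the cofactor matrix C, C[i][j] = (-1)^(i+j) det(minor_ij)."""
    n = len(A)
    C = [[(-1) ** (i + j) *
          my_det([row[:j] + row[j + 1:] for r, row in enumerate(A) if r != i])
          for j in range(n)] for i in range(n)]
    return [[C[j][i] for j in range(n)] for i in range(n)]  # transpose

print(adjugate([[1, 2], [3, 4]]))  # [[4, -2], [-3, 1]]
```

Dividing adj(A) by det(A) = -2 then reproduces inv(A) for this 2-by-2 example.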

6That is, det(A) is near 0.
7See adjugate matrix.

Example

clear; clc
% main
A = pascal(4);
B = inv(A)

>> B =

    4.0000   -6.0000    4.0000   -1.0000
   -6.0000   14.0000  -11.0000    3.0000
    4.0000  -11.0000   10.0000   -3.0000
   -1.0000    3.0000   -3.0000    1.0000

Try inv(magic(3)).

System of Linear Equations

A system of linear equations is a set of linear equations involving the same set of variables. For example,

3x + 2y − z = 1
x − y + 2z = −1
−2x + y − 2z = 0

After doing math, we have x = 1, y = −2, and z = −2.

A general system of m linear equations with n unknowns can be written as

a_{11} x_1 + a_{12} x_2 + ··· + a_{1n} x_n = b_1
a_{21} x_1 + a_{22} x_2 + ··· + a_{2n} x_n = b_2
...
a_{m1} x_1 + a_{m2} x_2 + ··· + a_{mn} x_n = b_m

where x_1, ..., x_n are unknowns, a_{11}, ..., a_{mn} are the coefficients of the system, and b_1, ..., b_m are the constant terms.

So, we can rewrite the system of linear equations as a matrix equation, given by

Ax = b,

where

A = [a_{11} a_{12} ··· a_{1n}; a_{21} a_{22} ··· a_{2n}; ... ; a_{m1} a_{m2} ··· a_{mn}],

x = [x_1; ... ; x_n], and b = [b_1; ... ; b_m].

General System of Linear Equations

Let x be the column vector of n independent variables, subject to m constraints.8
If m = n9, then there exists a unique solution x_0.
If m > n, then there is in general no exact solution, but there exists a least-squares solution x' such that ||Ax' − b||^2 is minimal. The error is ε = Ax' − b, usually not zero, so ||Ax' − b||^2 = (Ax' − b)^T (Ax' − b) = ||ε||^2 is minimal.
If m < n, then there are infinitely many solutions.

8Assume that these equations are linearly independent.
9Equivalently, rank(A) = rank([A b]). Or, see Cramer's rule.

Quick Glance at Method of Least Squares

The method of least squares is a standard approach to the approximate solution of overdetermined systems.

Matrix Division in MATLAB

Consider Ax = b for a system of linear equations. A\b is the matrix division of A into b, which is roughly the same as inv(A)*b, except that it is computed in a different way. Left matrix divide (\), or mldivide, computes

x = A^{-1} b = inv(A)*b = A\b.

Unique Solution (m = n)

For example,

3x + 2y − z = 1
x − y + 2z = −1
−2x + y − 2z = 0

>> A = [3 2 -1; 1 -1 2; -2 1 -2];
>> b = [1; -1; 0];
>> x = A \ b

     1
    -2
    -2

Overdetermined System (m > n)

For example,

2x − y = 2
x − 2y = −2
x + y = 1

>> A = [2 -1; 1 -2; 1 1];
>> b = [2; -2; 1];
>> x = A \ b

     1
     1

Underdetermined System (m < n)

For example,

x + 2y + 3z = 7
4x + 5y + 6z = 8

>> A = [1 2 3; 4 5 6];
>> b = [7; 8];
>> x = A \ b

   -3
    0
    3.333

% (Why?)

Gauss Elimination

Recall Gaussian elimination from high school. For example,

3x + 2y − z = 1
x − y + 2z = −1
−2x + y − 2z = 0

It is known that x = 1, y = −2, and z = −2.

Zheng-Liang Lu 335 / 354 Check if det(A) = 0, then the program terminates; otherwise, the program continues. Loop to form upper which looks like   1 ¯a12 ··· ¯a1n  0 1 ··· ¯a2n  A¯ =   ,  . . .   . . 1 .  0 0 ··· 1 and   b¯1  b¯2  b¯ =   .  .   .  b¯n where ¯aij s and b¯i s are the resulting values.

Use backward substitution to determine the solution vector x by

x_n = \bar{b}_n,
x_i = \bar{b}_i − \sum_{j=i+1}^{n} \bar{a}_{ij} x_j,

where i ∈ {1, ..., n − 1}.

Solution

clear; clc;
% main
A = [3 2 -1; 1 -1 2; -2 1 -2];
b = [1; -1; 0];
x = zeros(3, 1);
if rank(A) == rank([A b])
    for i = 1 : 3
        for j = i : 3
            b(j) = b(j) / A(j, i);
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : 3
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end
    for i = 3 : -1 : 1
        x(i) = b(i);
        for j = i + 1 : 1 : 3
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
else
    disp('No unique solution.');
end

Extension

How can we extend this algorithm to a system of n linear equations in n unknowns?

clear; clc

n = randi(10, 1);
A = randi(100, n, n);
b = randi(100, n, 1);
if rank(A) == rank([A b])
    for i = 1 : n
        for j = i : n
            b(j) = b(j) / A(j, i);
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : n
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end
    for i = n : -1 : 1
        x(i) = b(i);
        for j = i + 1 : 1 : n
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
else
    disp('No unique solution.');
end
x'
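One caveat: the elimination loop divides by A(j, i), which breaks down if a pivot happens to be zero even when the system is solvable. A standard remedy, not shown on the slides, is partial pivoting. Here is a Python sketch of the same elimination-plus-backward-substitution idea with row swaps (Python and the name gauss_solve are mine, for illustration):

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    A = [row[:] for row in A]  # work on copies
    b = b[:]
    for i in range(n):
        # partial pivoting: bring the row with the largest |A[r][i]| to row i
        p = max(range(i, n), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        b[i], b[p] = b[p], b[i]
        # eliminate column i below the pivot
        for j in range(i + 1, n):
            f = A[j][i] / A[i][i]
            A[j] = [a - f * c for a, c in zip(A[j], A[i])]
            b[j] -= f * b[i]
    # backward substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (b[i] - sum(A[i][j] * x[j] for j in range(i + 1, n))) / A[i][i]
    return x

print(gauss_solve([[3, 2, -1], [1, -1, 2], [-2, 1, -2]], [1, -1, 0]))
# approximately [1, -2, -2]
```

Pivoting also improves numerical stability by keeping the elimination factors small.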

Exercise

rank is used to check whether rank(A) = rank([A, b]). rref is used to reduce the augmented matrix [A, b]. Use these built-in functions to implement a program that solves a general system of linear equations.

Solution

function y = linearSolver(A, b)

if rank(A) == rank([A b]) % augmented matrix
    if rank(A) == size(A, 2)
        disp('Exactly one solution.')
        x = A \ b
    else
        disp('Infinitely many solutions.')
        rref([A b])
    end
else
    disp('There is no solution. (Only least-squares solutions.)')
end

Can you replace rref with your Gaussian elimination algorithm?

Method of Least Squares

The first clear and concise exposition of the method of least squares was published by Legendre in 1805. In 1809, Gauss published his method of calculating the orbits of celestial bodies. The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. To obtain the coefficient estimates, the least-squares method minimizes the summed square of residuals.

More specifically...

Let {y_i}_{i=1}^{n} be the observed response values and {\hat{y}_i}_{i=1}^{n} be the fitted response values.

Define the error, or residual, ε_i = y_i − \hat{y}_i for i = 1, ..., n. Then the sum of squared errors associated with the data is given by

S = \sum_{i=1}^{n} ε_i^2. (2)

Linear Least Squares

A linear model is defined as an equation that is linear in the coefficients. Suppose that you have n data points that can be modeled by a 1st-order polynomial, given by

y = ax + b.

By (2), ε_i = y_i − (a x_i + b). Now S = \sum_{i=1}^{n} (y_i − (a x_i + b))^2. The least-squares fitting process minimizes the sum of squared residuals. The coefficients a and b are determined by differentiating S with respect to each parameter and setting the result equal to zero. (Why?)

Hence,

∂S/∂a = −2 \sum_{i=1}^{n} x_i (y_i − (a x_i + b)) = 0,
∂S/∂b = −2 \sum_{i=1}^{n} (y_i − (a x_i + b)) = 0.

The normal equations are defined as

a \sum_{i=1}^{n} x_i^2 + b \sum_{i=1}^{n} x_i = \sum_{i=1}^{n} x_i y_i,
a \sum_{i=1}^{n} x_i + n b = \sum_{i=1}^{n} y_i.

In fact,

[\sum_{i=1}^{n} x_i^2, \sum_{i=1}^{n} x_i; \sum_{i=1}^{n} x_i, n] [a; b] = [\sum_{i=1}^{n} x_i y_i; \sum_{i=1}^{n} y_i].

Solving for a and b, we have

a = (n \sum_{i=1}^{n} x_i y_i − \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i) / (n \sum_{i=1}^{n} x_i^2 − (\sum_{i=1}^{n} x_i)^2)
  = Cov(X, Y) / Var(X),

where Cov(X, Y) refers to the covariance between X and Y. Also, we have

b = (1/n) (\sum_{i=1}^{n} y_i − a \sum_{i=1}^{n} x_i).
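These closed-form expressions are easy to verify numerically; a short Python sketch (Python and the name fit_line are mine) on an exactly linear data set:

```python
def fit_line(x, y):
    """Least-squares a, b for y ~ a*x + b via the normal-equation formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # Cov(X, Y) / Var(X)
    b = (sy - a * sx) / n
    return a, b

a, b = fit_line([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])  # data lie exactly on y = 2x + 1
print(a, b)  # 2.0 1.0
```

On noise-free data the residuals are zero and the fit recovers the slope and intercept exactly.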

Example: Drag Coefficients

Let v be the velocity of a moving object and k be a positive constant. The drag force due to air resistance is proportional to the square of the velocity, that is, d = k v^2. In a wind tunnel experiment, the velocity v can be varied by setting the speed of the fan, and the drag can be measured directly.

Zheng-Liang Lu 350 / 354 The following sequence of commands replicates the data one might receive from a wind tunnel:

clear; clc;
% main
v = [0 : 1 : 60]';
d = [0.1234 * v .^ 2]';
dn = d + 0.4 * v .* randn(size(v));
figure(1), plot(v, dn, '*', v, d, 'r-'); grid on;
legend('Data', 'Analytic');

The unknown coefficient k is to be determined by the method of least squares. The formulation could be

 2  v1 k = dn1  2  v2 k = dn2 . .  .  2 v61k = dn61 Recall that for any matrix A and vector b with Ax = b, x = A\b returns the least square solution.

>> k = v .^ 2 \ dn % note that the v and dn vectors are row vectors; they need to be transposed

k =

    0.1239
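With a single unknown k, the normal equation collapses to k = \sum v_i^2 dn_i / \sum v_i^4. A Python sketch on noise-free data (mine, for illustration; with exact data the coefficient is recovered exactly, whereas the noisy MATLAB run above gives k ≈ 0.1239):

```python
v = [float(i) for i in range(61)]       # velocities 0, 1, ..., 60
d = [0.1234 * vi ** 2 for vi in v]      # drag without measurement noise
# least squares for d = k * v^2: k = sum(v^2 * d) / sum(v^4)
k = sum(vi ** 2 * di for vi, di in zip(v, d)) / sum(vi ** 4 for vi in v)
print(k)  # ~ 0.1234
```

This is the one-column special case of x = A\b with A = v.^2 and b = dn.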

Generalized Linear Least Squares

In matrix form, linear models are given by the formula

y = Xb + ε,

where y is an n-by-1 vector of responses, X is the n-by-m matrix for the model, b is an m-by-1 vector of coefficients, and ε is an n-by-1 vector of errors. Then the normal equations are given by

(X^T X) b = X^T y,

where X^T is the transpose of X. So, b = (X^T X)^{-1} X^T y.
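For a small model the normal equations can be solved directly; here is a Python sketch (mine, for illustration) fitting y = b_1 + b_2 x by forming X^T X and X^T y and solving the resulting 2-by-2 system with Cramer's rule:

```python
def lstsq_2col(X, y):
    """Solve the normal equations (X^T X) b = X^T y for a 2-column model matrix X."""
    # G = X^T X (2x2 Gram matrix), c = X^T y (2x1)
    G = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
    c = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(2)]
    # Cramer's rule on the 2x2 system G b = c
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    b0 = (c[0] * G[1][1] - c[1] * G[0][1]) / det
    b1 = (G[0][0] * c[1] - G[1][0] * c[0]) / det
    return [b0, b1]

X = [[1.0, xi] for xi in [0, 1, 2, 3]]  # columns: intercept, x
y = [1.0, 3.0, 5.0, 7.0]                # data lie exactly on y = 1 + 2x
print(lstsq_2col(X, y))                 # approximately [1.0, 2.0]
```

In MATLAB the same result comes from X\y, which avoids forming X^T X explicitly and is numerically preferable.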
