Appendix A Algebra

A.1 Basic Definitions

AmatrixA is a rectangular array whose elements aij are arranged in rows and columns. If there are m rows and n columns, we say that we have an m × n matrix ⎡ ⎤ a11 a12 ··· a1n ⎢ ⎥ ⎢ a21 a22 ··· a2n ⎥ A = ⎢ . . . ⎥ = (aij ) ⎣ . . · . ⎦ am1 am2 ··· amn

When m = n, the matrix is said to be square. Otherwise, it is rectangular. A triangu- lar matrix is a special kind of square matrix where the entries either below or above the main diagonal are zero. Hence, we distinguish between a lower and an upper . A lower triangular matrix L is of the form ⎡ ⎤ l11 0 ··· 0 ⎢ ⎥ ⎢ l21 l22 ··· 0 ⎥ L = ⎢ . . . . ⎥ ⎣ . . .. . ⎦ ln1 an2 ··· lnn

Similarly, an upper triangular matrix can be formed. The elements aii or lii, with i = 1,...,n are called the diagonal elements. When n = 1, the matrix is said to be a column matrix or vector. Furthermore, one distinguishes between row vectors in case m = 1, submatrices, diagonal matrices (aii), zero or null matrices O = (0ij ), and the I , a of the form (aii) = 1. To indicate an n-dimensional identity matrix, sometimes the notation In is used. A matrix is called symmetric if (aij ) = (aji).

K.J. Keesman, System Identification, 249 Advanced Textbooks in Control and Signal Processing, DOI 10.1007/978-0-85729-522-4, © Springer-Verlag London Limited 2011 250 A Matrix Algebra

A.2 Important Operations

In this book, calculation is only based on real-valued matrices, hence A ∈ Rm×n.As for the scalar case (m = 1,n= 1), addition, subtraction (both element-wise), and multiplication of matrices is defined. If A is an m × n and B an n × p matrix, then the product AB is a matrix of dimension m × p whose elements cij are given by

n cij = aikbkj k=1 Consequently, the ij th element is obtained by, in turn, multiplying the elements of the ith row of A by the jth column of B and summing over all terms. However, it should be noted that for the matrices A and B of appropriate dimensions, in general

AB = BA

Hence, generally, premultiplication of matrices yields different results than post- multiplication. T T The of the matrix A = (aij ), denoted by A and defined as A = T (aij ) := (aji), is another important operation, which in general changes the di- mension of the matrix. The following holds:

(AB)T = BT AT

For vectors x,y ∈ Rn, however,

xT y = yT x which is called the inner or scalar product. If the inner product is equal to zero, i.e., xT y = 0, the two vectors are said to be orthogonal. In addition to the inner product of two vectors, also the matrix inner product has been introduced. The inner product of two real matrices A and B is defined as

A,B:=Tr AT B

Other important operations are the outer or dyadic product (ABT ) and matrix inver- sion (A−1), which is only defined for square matrices. In the last operation one has to determine the determinant of the matrix, denoted by det(A) or simply |A|, which is a scalar. The determinant of an n × n matrix A is defined as

|A|:=ai1ci1 + ai2ci2 +···+aincin

Herein, the cofactors cij of A are defined as follows:

i+j cij := (−1) |Aij | A.2 Important Operations 251 where |Aij | is the determinant of the submatrix obtained when the ith row and the jth column are deleted from A. Thus, the determinant of a matrix is defined in terms of the determinants of the associated submatrices. Let us demonstrate the calculation of the determinant to a 3 × 3 matrix.

Example A.1 Determinant:Let ⎡ ⎤ a11 a12 a13 ⎣ ⎦ A = a21 a22 a23 a31 a32 a33

Then,

a22 a23 a21 a23 a21 a22 |A|=a11 − a12 + a13 a32 a33 a31 a33 a31 a32

After some algebraic manipulation using the same rules for the subdeterminants, we obtain

|A|=a11(a22a33 − a32a23) − a12(a21a33 − a31a23) + a13(a21a32 − a31a22)

When the determinant of a matrix is equal to zero, the matrix is singular, and the inverse A−1 does not exist. If det(A) = 0, the inverse exists, and the matrix is said to be regular. Whether a matrix is invertible or not can also be checked by calculating the rank of a matrix. The column rank of a matrix A is the maximal number of linearly independent columns of A. Likewise, the row rank is the maximal number of linearly independent rows of A. Since the column rank and the row rank are always equal, they are simply called the rank of A. Thus, an n × n matrix A is invertible when the rank is equal to n. The inverse of a square n × n matrix is calculated from

− 1 A 1 = adj(A) |A| ⎡ ⎤ c11 c21 cn1 | | | | ··· | | ⎢ A A A ⎥ ⎢ c12 c22 ··· cn2 ⎥ ⎢ | | | | | | ⎥ ⎢ A A A ⎥ ⎢ . . . ⎥ ⎣ . . · . ⎦ c1n c2n ··· cnn |A| |A| |A| where adj(A) denotes the adjoint of the matrix A and is obtained after transposing the n × n matrix C with elements cij , the cofactors of A. The following properties are useful: 1. (AB)−1 = B−1A−1 2. (AB)(B−1A−1) = A(BB−1)A−1 = AIA−1 = I 3. (ABC)−1 = C−1B−1A−1 252 A Matrix Algebra

4. (AT )−1 = (A−1)T 5. |A|−1 = 1/|A|. A square matrix is said to be an if

AAT = I so that A−1 = AT . If, however, the matrix is rectangular, the matrix inverse does not exist. For these cases, the so-called generalized or pseudo-inverse has been introduced. The pseudo- inverse A+ of an m×n matrix A, also known as the MooreÐPenrose pseudo-inverse, is given by − A+ = AT A 1AT provided that the inverse (AT A)−1 exists. Consequently,

− A+A = AT A 1AT A = I and thus A+ of this form is also called the left semi-inverse of A. This MooreÐ Penrose pseudo-inverse forms the heart of the ordinary least-squares solution to a linear regression problem, where m = N (number of measurements), and n = p (number of parameters). The generalized inverse is not unique. For the case m

− A+ = AT AAT 1 so that − AA+ = AAT AAT 1 = I if (AAT )−1 exists. Application of this generalized inverse or right semi-inverse plays a key role in so-called minimum-length solutions. Finally, for the cases where (AT A)−1 and (AAT )−1 do not exist, the generalized inverse can be computed via a limiting process

+ − − A = lim AT A + δI 1AT = lim AT AAT + δI 1 δ→0 δ→0 which is related to Tikhonov regularization.

A.3 Quadratic Matrix Forms

T Define the vector x := [x1,x2,...,xn] ; then a quadratic form in x is given by

xT Qx A.4 Vector and Matrix Norms 253 where Q = (qij ) is a symmetric n × n matrix. Following the rules for matrix multi- plication, the scalar xT Qx is calculated as T = 2 + +···+ x Qx q11x1 2q12x1x2 2q1nx1xn + 2 +···+ q22x2 2q2nx2xn +···+ 2 qnnxn Hence, if Q is diagonal, the quadratic form reduces to a weighted inner product, which is also called the weighted Euclidean squared norm of x. In shorthand nota-  2 tion, x 2,Q; see Sect. A.4 for a further introduction of vector and matrix norms. Consequently, the weighted squared norm represents a weighted sum of squares. An n × n real symmetric matrix Q is called positive definite if

xT Qx > 0 for all nonzero vectors x. For an n × n positive definite matrix Q, all diagonal elements are positive, that is, qii > 0fori = 1,...,n. A positive definite matrix is invertible. In case xT Qx ≥ 0, we call the matrix semi-positive definite.

A.4 Vector and Matrix Norms

Let us introduce the norm of a vector x ∈ Rn, as introduced in the previous section, in some more detail, where the norm is indicated by the double bar. A vector norm on Rn for x,y ∈ Rn satisfies the following properties:

x≥0 x=0 ⇐⇒ x = 0 x + y≤x+y αx=|α|x

Commonly used vector norms are the 1-, 2-, and ∞-norm, which are defined as

x1 := |x1|+···+|xn| 1   := 2 +···+ 2 2 x 2 x1 xn

x∞ := max |xi| 1≤i≤n where the subscripts on the double bar are used to indicate a specific norm. Hence, the 2-norm, also known as the Euclidean (squared) norm, is frequently used to in- dicate a length of a vector. The weighted Euclidean norm for diagonal matrix Q,as already introduced in Sect. A.3, is then defined as

1   := 2 +···+ 2 2 x 2,Q q11x1 qnnxn 254 A Matrix Algebra

Sometimes this norm is also denoted as xQ, thus without an explicit reference to the 2-norm. However, in the following, we will use the notation x2,Q for a weighted 2-norm to avoid confusion. This idea of norms can be further extended to matrices A,B ∈ Rm×n with the same kind of properties as presented above. For the text in this book, it suffices to introduce one specific , the so-called Frobenius norm ·F ,

m n   2 T AF = |aij | = Tr A A i=1 j=1 where the trace (denoted by Tr(·)) of a square n × n matrix is the sum of its diag- onal elements. The Frobenius norm is used in the derivation of a total least-squares solution to an estimation problem with noise in both the regressors and regressand, the dependent variable.

A.5 Differentiation of Vectors and Matrices

Differentiation of vector and matrix products is important when deriving solutions to optimization problems. Let us start by considering the inner product of two n- dimensional vectors a and x,

T x a = x1a1 + x2a2 +···+xnan

Then, the partial derivatives with respect to ai are given by

∂(xT a) = xi ∂ai Consequently, after stacking all the partial derivatives, we obtain x, and thus vector differentiation can be summarized as

∂(xT a) ∂(xT a) = x, = xT ∂a ∂aT In general terms, a vector differentiation operator is defined as   d ∂ ∂ T := ,..., dx ∂x1 ∂xn Applying to any scalar function f(x) to find its derivative with respect to x,we obtain   d ∂f (x) ∂f (x) T f(x)= ,..., dx ∂x1 ∂xn A.5 Differentiation of Vectors and Matrices 255

Vector differentiation has the following properties: d T = d T = 1. dx a x dx x a a. d T = 2. dx x x 2x. d T = T T + T T = d T = 3. dx x Ax x A x A, and thus for A A, dx x Ax 2Ax. The matrix differentiation operator is defined as ⎡ ⎤ ∂ ··· ∂ ⎢ ∂a11 ∂a1n ⎥ d ⎢ . . ⎥ := ⎣ . · . ⎦ dA . . ∂ ··· ∂ ∂am1 ∂amn

The derivative of a scalar function f(A)with respect to A is given by ⎡ ⎤ ∂f (A) ··· ∂f (A) ⎢ ∂a11 ∂a1n ⎥ d ⎢ . . ⎥ f(A)= ⎣ . · . ⎦ dA . . ∂f (A) ··· ∂f (A) ∂am1 ∂amn

For the special case f(A)= uT Av with u an m × 1 constant vector, v an n × 1 constant vector, and A an m × n matrix,

d uT Av = uvT dA

Example A.2 Derivative of cost function: Let a cost function JW (ϑ), with ϑ a pa- rameter vector, be defined as

:=  − 2 JW (ϑ) y Φϑ 2,Q = (y − Φϑ)T Q(y − Φϑ) = yT Qy − yT QΦϑ − ϑT ΦT Qy + ϑT ΦT QΦϑ = yT Qy − 2ϑT ΦT Qy + ϑT ΦT QΦϑ with Q a symmetric positive definite weighting matrix. Then, following the rules of vector differentiation,

d T T JW =−2Φ Qy + 2Φ QΦϑ dϑ

d = After setting dϑ JW 0, a necessary condition for finding a minimum, a weighted least-squares estimate of ϑ is found. 256 A Matrix Algebra

A.6 Eigenvalues and Eigenvectors

Eigenvalue decomposition of a square matrix is given by

Au = λu where u is an eigenvector, and λ the associated eigenvalue. The combination (λ, u) is called an eigenpair. For an n-dimensional square matrix A, n (not necessarily different) eigenvalues exist. The eigenpairs (λ, u) of a square matrix A can be com- puted (in principle) by first determining the roots λ of the characteristic polyno- mial

c(λ) = det(λIn − A) Subsequently, the eigenvectors u can be found by solving the associated linear equa- tions

(λIn − A)u = 0 Let us illustrate this by an example.

Example A.3 Eigenvalues and eigenvectors:Let   10 A = 00

Then       λ 0 10 λ − 10 det − = det = (λ − 1)λ = 0 0 λ 00 0 λ      10 u11 u11 =⇒ λ1 = 1: = =⇒ u11 free,u21 = 0 01 u21 u21   1 =⇒ u = 1 0      10 u21 0 =⇒ λ2 = 0: = =⇒ u21 = 0,u22 free 01 u22 0   0 =⇒ u = 2 1

When       10 1 0 A = =⇒ λ = 1,u= and u = 01 1,2 1 0 2 1 or       20 1 0 A = =⇒ λ = 2,u= and λ = 1,u= 01 1 1 0 2 2 1 A.6 Eigenvalues and Eigenvectors 257

Finally, when     21 −0.8507 A = =⇒ λ = 2.618,u= 11 1 1 −0.5257   −0.5257 λ = 0.382,u= 2 2 −0.8507

Two matrices A and B are said to be similar if and only if there exists a non- singular matrix P such that B = P −1AP . The matrix function f(A)= P −1AP is called the similarity transformation of A. It appears that an n × n matrix A is similar to the diagonal matrix with the eigenvalues of A on the diagonal, provided that A has n distinct eigenvalues. In the case with n linearly independent eigenvectors, we can generalize Au = λu to AU = UD, and thus A = UDU−1 also called the eigendecomposition or spectral decomposition of matrix A.From this it also follows that U −1AU = D with U =[u1 u2 ··· un] that is a matrix formed by the eigenvectors, sometimes called the eigenmatrix, and ⎡ ⎤ λ1 0 ··· 0 ⎢ ⎥ ⎢ 0 λ2 ··· 0 ⎥ D = ⎢ . . . ⎥ ⎣ . . · . ⎦ 00··· λn

If this holds, it is said that A is diagonalizable. This property is important in having a geometrical interpretation of the eigenvectors and also in analyzing the behavior and error propagation of linear dynamic models. Note from the above that only diagonalizable matrices can be factorized in terms of eigenvalues and eigenvectors. The following list (derived from [HK01]) summarizes some useful eigen proper- ties 1. If (λ, u) is an eigenpair of A, then so is (λ, ku) for any k = 0. 2. If (λ1,u1) and (λ2,u2) are eigenpairs of A with λ1 = λ2, then u1 and u2 are lin- early independent. In other words, eigenvectors corresponding to distinct eigen- values are linearly independent. 3. A and AT have the same eigenvalues. 4. If A is diagonal, upper triangular, or lower triangular, then its eigenvalues are its diagonal entries. 5. The eigenvalues of a symmetric matrix are real. 6. Eigenvectors corresponding to distinct eigenvalues of a symmetric matrix are orthogonal. 258 A Matrix Algebra

7. det(A) is the product of the absolute values of the eigenvalues of A. 8. A is nonsingular if and only if 0 is not an eigenvalue of A. This implies that A is singular if and only if 0 is an eigenvalue of A. 9. Similar matrices have the same eigenvalues. 10. An n × n matrix A is similar to a diagonal matrix if and only if A has n linearly independent eigenvectors. In that case, it is said that A is diagonalizable. Especially, properties (5) and (6) are important in the evaluation of the symmet- ric, positive definite covariance matrices and the corresponding ellipsoidal uncer- tainty regions (see Appendix B). Finally, it is worth noting that eigenvalues do also play an important role in the calculation of the spectral norm of a square matrix. The spectral or 2-norm of a real matrix A is the of the largest eigenvalue of the positive semi-definite matrix AT A,  T A2 = λmax A A with is different from the entry-wise Frobenius norm,√ introduced before. However, the following inequality holds: A2 ≤AF ≤ nA2.

A.7 Range and Kernel of a Matrix

The range of a matrix A, denoted by ran(A), also called the column space of A,is the set of all possible linear combinations of the column vectors of A. Consequently, for a matrix A of dimension m × n,   ran(A) = y ∈ Rm : y = Ax for some x ∈ Rn

In addition to the range of a matrix, the kernel or null space (also nullspace) of a matrix A is defined as the set of all vectors x for which Ax = 0. In mathematical notation, for an m × n matrix A,   ker(A) = x ∈ Rn : Ax = 0,x= 0

Let us demonstrate these properties of a matrix by some examples.

Example A.4 If   21 A = 11     v = 2 v = 1 v then given the column vectors 1 1 and 2 1 , a linear combination of 1 and v2 is any vector of the form       2 1 2β1 + β2 β1 + β2 = 1 1 β1 + β2 A.8 Exponential of a Matrix 259

Fig. A.1 Range of A =[11]T (bold solid line) and kernel of AT =[11] (dashed line)

Hence, in this case with real constants β1,β2 ∈ R, the range of A is the two- dimensional plane R2. If, however,   1 A = 1

T 2 then the range of A is precisely the set of vectors [y1y2] ∈ R that satisfy the equation y1 = y2. Consequently, this set is a line through the origin in the two- dimensional space (see Fig. A.1). It can then be easily verified that the kernel of A is empty, but ker(AT ) with   AT = 11   − 0.7071 x + x = is a line with normalized slope, thus with unit length: 0.7071 ,as 1 2 0 with x=1. The kernel of AT is also presented in Fig. A.1 by the dashed line T that satisfies the equation x1 =−x2. Hence, for the linear transformation A x with AT =[11], every point on this line maps to the origin. For more information on these properties and their application in estimation problems, we refer to, for in- stance, [GVL89].

A.8 Exponential of a Matrix

The matrix exponential, denoted by eA or exp(A), is a matrix function on the square matrix A, analogous to the ordinary exponential function. Let A be an n × n real or 260 A Matrix Algebra complex matrix. The exponential of A is the n × n matrix given by the power series

∞  1 eA = Ak k! k=0

This series always converges, and thus eA is well defined. However, using this power series is not a very practical way to calculate the matrix exponential. In the famous paper of Moler and Van Loan [MVL78], 19 ways to compute the matrix exponential are discussed. Here, it suffices to illustrate this matrix function by an example.

Example A.5 Exponential of a matrix:     21 10.3247 5.4755 A = =⇒ eA = 11 5.4755 4.8492

This result is based on MATLAB’s expm. It is important to realize that this re- sult is completely different from the case where the exponential function is taken element-wise, which in fact is not the exponential of a matrix. For a comparison, using MATLAB’s exp gives     2 7.3891 2.7183 ≈ e e 2.7183 2.7183 ee

A.9 Square Root of a Matrix

AmatrixB is said to be a square root of the square matrix A, if the matrix prod- uct BB is equal to A. Furthermore, B is the unique square root for which every eigenvalue has nonnegative real part. The square root of a diagonal matrix D,for example, is found by taking the square root of all the entries on the diagonal. For a general A such that D = U −1AU,wehaveA = UDU−1. Consequently, A1/2 = UD1/2U −1 since UD1/2U −1UD1/2U −1 = UDU−1 = A. This square root is real if A is pos- itive definite. For nondiagonalizable matrices, we can calculate the followed by a series expansion. However, this is not an issue here, as in most system identification applications, in particular filtering, the square root of a real, symmetric, and positive definite is calculated. Let us demonstrate the calculation of the square root of the matrix used in Example A.3.

Example A.6 Square root of a matrix:       21 −0.8507 0.5257 2.618 0 A = =⇒ U = ,D= 11 −0.5257 −0.8507 00.382 A.10 Choleski Decomposition 261 and thus   √   −0.8507 0.5257 2.618 0 −0.8507 0.5257 A1/2 = √ −0.5257 −0.8507 0 0.382 −0.5257 −0.8507   = 1.3416 0.4472 0.4472 0.8944 as in this special case U = U −1. =  Let us also evaluate this result analytically. Then, for an unknown matrix B b1 b2 , we obtain b3 b4        2 + + = 21 = b1 b2 b1 b2 = b1 b2b3 b1b2 b2b4 A + + 2 11 b3 b4 b3 b4 b1b3 b3b4 b2b3 b4

Consequently, the solution set is given by ⎧ ⎡ √ ⎤ ⎡ √ ⎤⎫ ⎡ ⎤ ⎪⎡ ⎤ ⎡ ⎤ 3 − 3 ⎪ ⎪ − 5 5 5 5 ⎪ b1 ⎨⎪ 1 1 ⎢ √ ⎥ ⎢ √ ⎥⎬⎪ ⎢b ⎥ ⎢1⎥ ⎢−1⎥ ⎢ 1 5⎥ ⎢− 1 5⎥ ⎢ 2 ⎥ ∈ ⎢ ⎥ , ⎢ ⎥ , ⎢ 5 √ ⎥ , ⎢ 5 √ ⎥ ⎣ ⎦ ⎪⎣ ⎦ ⎣− ⎦ ⎢ 3 ⎥ ⎢ 3 ⎥⎪ b3 ⎪ 1 1 ⎣ 5⎦ ⎣− 5⎦⎪ ⎩⎪ 5 √ 5 √ ⎭⎪ b4 0 0 2 − 2 5 5 5 5 Hence, from this matrix equality four solutions appear. However, only the third so- lution results in a matrix for which every eigenvalue has nonnegative real part, that is, λ1 = 1.6180 and λ2 = 0.6180. The other solutions result in a matrix B with at least one negative eigenvalue. Thus, the matrix B with entries as defined by the third solution is the principal square root of A.

A.10 Choleski Decomposition

The Choleski decomposition or Choleski factorization is a decomposition of a sym- metric, positive definite matrix A into the product of a lower triangular matrix L and its transpose LT ,forA real and thus with all entries real. Hence, A can be decomposed as A = LLT where L is a lower triangular matrix with strictly positive diagonal, real entries. No- tice that the Choleski decomposition is an example of a square root of a matrix. The Choleski decomposition is unique: given a symmetric, positive definite matrix A, there is only one lower triangular matrix L with strictly positive diagonal entries such that A = LLT . In general, Choleski factorizations for positive semidefinite matrices are not unique. 262 A Matrix Algebra

Example A.7 Choleski decomposition:       21 1.4142 0 1.4142 0.7071 A = =⇒ L = ,LT = 11 0.7071 0.7071 00.7071 such that A = LLT .   Let us evaluate this result analytically. Then, for L = l1 0 , l3 l2        2 = 21 = l1 0 l1 l3 = l1 l1l3 A 2 + 2 11 l3 l2 0 l2 l1l3 l2 l3 √ 1√ 1√ =⇒ l = 2,l= 2,l= 2 1 3 2 2 2

A.11 Modified Choleski (UD) Decomposition

In addition to the Choleski decomposition, as presented in the previous subsection, there is also a unique, real, lower triangular matrix, L˜ , and a real, positive diagonal matrix, D˜ , such that A = L˜ D˜ L˜ T where L˜ has unit diagonal elements. This decomposition is known as the modified Choleski decomposition, [vRLS73]. Given the Choleski and modified Choleski de- composition, we obtain that L = L˜ D˜ 1/2. However, A can also be decomposed as A = UDUT , known as the modified Choleski (UD) decomposition, with U a unit upper triangular matrix and D a diagonal matrix with nonnegative elements.     Example A.8 Modified Choleski decomposition:ForL = 10 and D = d1 0 , l 1 0 d2 we obtain         21 10 d 0 1 l d d l A = = 1 = 1 1 11 l 1 0 d2 01 d1ld2

=⇒ d1 = 2,d2 = 1,l= 0.5   A UDUT U = 1 u D Similarly, we can decompose as with 01 and as before, leading to d1 = 1, d2 = 1, u = 1.

A.12 QR Decomposition

Any real square matrix A may be decomposed as A = QR, where Q is an orthog- onal matrix (i.e., QT Q = I ), and R is an upper triangular matrix, also called right triangular matrix. If A is nonsingular, then the factorization is unique if the diag- onal elements of R are positive. However, this so-called QR decomposition or QR factorization also exists for rectangular matrices; see an example below. A.13 Singular Value Decomposition 263

Example A.9 QR decomposition: Let us start with a square matrix. For example,        21 q q r r q r q r + q r A = = 1 2 1 2 = 1 1 1 2 2 3 11 q3 q4 0 r3 q3r1 q3r2 + q4r3 with      q2 + q2 q q + q q q1 q3 q1 q2 = 1 3 1 2 3 4 = 10 q q q q + 2 + 2 01 2 4 3 4 q1q2 q3q4 q2 q4

Hence, we do have seven equations with seven unknowns. The solution set is given by ⎧⎡ √ ⎤ ⎡ √ ⎤ ⎡ √ ⎤ ⎡ √ ⎤⎫ ⎡ ⎤ ⎪ 2 2 − 2 − 2 ⎪ ⎪ 5 5 5 5 5 5 5 5 ⎪ q1 ⎪⎢ √ ⎥ ⎢ √ ⎥ ⎢ √ ⎥ ⎢ √ ⎥⎪ ⎢ ⎥ ⎪⎢− 1 5⎥ ⎢ 1 5 ⎥ ⎢− 1 5⎥ ⎢ 1 5 ⎥⎪ ⎢q2 ⎥ ⎪⎢ 5√ ⎥ ⎢ 5 √ ⎥ ⎢ 5 √ ⎥ ⎢ 5 √ ⎥⎪ ⎢ ⎥ ⎪⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥ ⎢ 1 ⎥⎪ ⎢q ⎥ ⎨⎪⎢ 5 ⎥ ⎢ 5 ⎥ ⎢− 5⎥ ⎢− 5⎥⎬⎪ ⎢ 3 ⎥ ⎢ 5 √ ⎥ ⎢ 5 √ ⎥ ⎢ 5√ ⎥ ⎢ 5 √ ⎥ ⎢q ⎥ ∈ ⎢ 2 ⎥ , ⎢− 2 ⎥ , ⎢ 2 ⎥ , ⎢− 2 ⎥ ⎢ 4 ⎥ ⎪⎢ 5 5 ⎥ ⎢ 5 5⎥ ⎢ 5 5 ⎥ ⎢ 5 5⎥⎪ ⎢ ⎥ ⎪⎢ √ ⎥ ⎢ √ ⎥ ⎢ √ ⎥ ⎢ √ ⎥⎪ ⎢ r1 ⎥ ⎪⎢ 5 ⎥ ⎢ 5 ⎥ ⎢ − 5 ⎥ ⎢ − 5 ⎥⎪ ⎣ ⎦ ⎪⎢ √ ⎥ ⎢ √ ⎥ ⎢ √ ⎥ ⎢ √ ⎥⎪ r2 ⎪⎢ 3 ⎥ ⎢ 3 ⎥ ⎢ 3 ⎥ ⎢ 3 ⎥⎪ ⎪⎣ 5 ⎦ ⎣ 5 ⎦ ⎣− 5⎦ ⎣− 5⎦⎪ r ⎩⎪ 5 √ 5 √ 5√ 5 √ ⎭⎪ 3 1 − 1 1 − 1 5 5 5 5 5 5 5 5 Thus, the QR decomposition is not unique. However, there is one unique combina- tion that gives positive values for the diagonal elements r1 and r3, and this one is given by the entries as defined by the first solution. As a second case, let A be an n × m matrix with n = 3 and m = 2. For instance, ⎡ ⎤ ⎡ ⎤ 21 −0.8165 0.4924 0.3015 A = ⎣ 11⎦ =⇒ Q = ⎣ −0.4082 −0.1231 −0.9045 ⎦ , 12 −0.4082 −0.8616 0.3015 ⎡ ⎤ −2.4495 −2.0412 R = ⎣ 0 −1.3540 ⎦ 00     A = QR = Q R1 =[Q Q ] R1 = Q R R n × n such that 0 1 2 0 1 1, where 1 is an upper triangular matrix, Q1 is an m × n matrix, Q2 is an m × (m − n) matrix, and Q1 and Q2 both have orthogonal columns.

A.13 Singular Value Decomposition

Recall that eigenvalue decomposition is limited to square matrices only. The singu- lar value decomposition (SVD) is an important factorization of a rectangular ma- trix. It has several applications in signal processing and statistics. In particular, in 264 A Matrix Algebra this book, SVD is used in relation with least-squares fitting, identifiability, and total least-squares solutions. In what follows, we focus on the N × p regressor matrix Φ. The SVD technique decomposes Φ into

Φ = USVT where U and V are orthogonal matrices of dimensions N × N and p × p, respec- T T tively, such that U U = IN and V V = Ip.TheN × p singular value matrix S has the following structure: ⎡ ⎤ σ1 0 ··· 0 ⎢ ⎥ ⎢ 0 σ2 ··· 0 ⎥ ⎢ ⎥ ⎢ . . . . ⎥ = ⎢ . . . . ⎥ S ⎢ ⎥ ⎢ 00··· σp ⎥ ⎣ ...... ⎦ 0(N−p)×p where 0(N−p)×p denotes an (N − p) × p zero or null matrix. An intuitive expla- nation of this result is that the columns of V form a set of orthonormal “input” or “analyzing” basis vector directions for Φ, the columns of U form a set of orthonor- mal “output” basis vector directions for Φ, and the matrix S contains the singular values that can be thought of as a scalar “gain” by which each corresponding in- put is multiplied to give a corresponding output. If the SVD of Φ is calculated and σ1 ≥···≥σr >σr+1 =···=σp = 0, then the rank of Φ is equal to r. Hence, there exists a clear link between the rank of a matrix and its singular values. For a further interpretation of the singular vectors and values, notice that the SVD of ΦT Φ is given by

ΦT Φ = VST U T USVT = VST SV T

T T −1 Since V V = Ip,wehaveV = V , and thus

ΦT Φ V = VST S with ST S a p × p diagonal matrix. Consequently, with Λ = ST S, the right singular vector matrix V in the SVD of Φ can be calculated as the eigenmatrix of ΦT Φ, and σi in S as the square root of the corresponding eigenvalues. Similarly, the left singu- lar vector matrix U can be calculated from an eigenvalue decomposition of ΦΦT .

A.14 Projection Matrices

A square matrix A is said to be idempotent if A2 = A. Idempotent matrices have the following properties: A.14 Projection Matrices 265

1. Ar = A for r being a positive . 2. I − A is idempotent. 3. If A1 and A2 are idempotent matrices and A1A2 = A2A1, then A1A2 is idempo- tent. 4. A is a projection matrix. If, however, an is also symmetric, then we call it an orthogonal projection matrix. Hence, for orthogonal projection in real spaces, it holds that the projection matrix is idempotent and symmetric, i.e., A2 = A and A = AT . In linear regression with least-squares estimate ϑ = (ΦT Φ)−1ΦT y, where Φ the regressor matrix, we use the matrix P = Φ(ΦT Φ)−1ΦT to calculate the predicted model output, y = Φϑ = Φ(ΦT Φ)−1ΦT y, from the output vector y. Since

− − − Φ ΦT Φ 1ΦT 2 = Φ ΦT Φ 1ΦT Φ ΦT Φ 1ΦT − = Φ ΦT Φ 1ΦT and − − Φ ΦT Φ 1ΦT T = Φ ΦT Φ T ΦT

= T −1 T [(ΦT Φ)=(ΦT Φ)T ] Φ Φ Φ Φ

P defines an orthogonal projection in a real space, as does I − P . Appendix B Statistics

B.1 Random Entities

B.1.1 Discrete/Continuous Random Variables

A discrete random variable ξ is defined as a discrete-valued function ξ(j) with probability of occurrence of the jth value given by p(j), where

p(j): probability density function of the random variable ξ(j)

The mean value or first moment of p(j), which is defined as the expected value of ξ (E[ξ]) and also denoted by ξ¯, is given by  E[ξ]=ξ¯ = ξ(j)p(j) j

This concept can be extended to continuous random variables, where, for simplicity, we write ξ and p(ξ). In the following, some useful operations on ξ for given p(ξ) are defined. First, we define the expectation, ∞ E[ξ]=μ := ξp(ξ)dξ ∞ which can be interpreted as the center of the probability density function (pdf). Hence, for a,b ∈ R, (i) E[a]=a, (ii) E[aξ + b]=aE[ξ]+b. Second, we define the variance,   Var ξ := E ξ − E[ξ] 2 which can be interpreted as the dispersion of the probability density function. The following properties hold:

K.J. Keesman, System Identification, 267 Advanced Textbooks in Control and Signal Processing, DOI 10.1007/978-0-85729-522-4, © Springer-Verlag London Limited 2011 268 B Statistics

(i) Var aξ + b = a2 Var ξ. (ii) Var ξ = E[ξ 2]−(E[ξ])2. For two, either discrete or continuous random variables, ξ and ψ,thecovariance is defined as   Cov(ξ, ψ) = E ξ − E[ξ] ψ − E[ψ] which represents the dependence between two random variables. The following properties hold: (i) Cov(ξ, ψ) = E[ξψ]−E[ξ]E[ψ]. (ii) ξ and ψ independent: Cov(ξ, ψ) = 0 and E[ξψ]=E[ξ]E[ψ]. (iii) Var aξ + bψ = a2 Var ξ + b2 Var ψ + 2abCov(ξ, ψ). Then, the normalized covariance or correlation is given by Cov(ξ, ψ) ρ = √ √ Var ξ Var ψ We state the following properties: (i) −1 <ρ<1. (ii) ξ and ψ are linearly related if ρ =±1. (iii) ξ and ψ independent if ρ = 0.

B.1.2 Random Vectors

A random vector is a column vector whose elements are random variables. The expectation of ξ ∈ Rr is given by ⎡ ⎤ ⎡ ⎤ E[ξ1] μ1 ⎢ ⎥ ⎢ ⎥ ⎢E[ξ ]⎥ ⎢μ ⎥ ⎢ 2 ⎥ ⎢ 2 ⎥ E[ξ]=μξ = ⎢ . ⎥ = ⎢ . ⎥ ⎣ . ⎦ ⎣ . ⎦ E[ξr ] μr

The covariance of ξ is given by   T P = Pξ = E (ξ − μξ )(ξ − μξ ) ⎡ ⎤ 2 (ξ1 − μ1) (ξ1 − μ1)(ξ2 − μ2) ··· (ξ1 − μ1)(ξr − μr ) ⎢ 2 ⎥ ⎢ · (ξ2 − μ2) ⎥ = E ⎢ . . . ⎥ ⎣ . .. . ⎦ 2 · ··· (ξr − μr )

In the following examples, the covariance matrix is calculated and visualized for some cases. B.1 Random Entities 269

Example B.1 Covariance matrix: Let for a given vector ξ, ξi and ξj for i = j be uncorrelated with constant variance σ 2 and zero mean. Then ⎡ ⎤ σ 2 0 ··· 0 ⎢ ⎥ ⎢ 0 σ 2 ··· 0 ⎥ P = ⎢ . . . ⎥ ⎣ . .. . ⎦ 00··· σ 2

= [ ]= = [ 2]= 2 = because μξ 0, E ξi ξj 0fori j, and E ξi σ for i j.

T Example B.2 Covariance matrix: Let two random vectors ξ1 =[1223] and ξ2 = T ¯ ¯ [1122] be given. Consequently, E[ξ1]=ξ1 = 2 and E[ξ2]=ξ2 = 1.5, so that ˜ T ˜ the normalized vectors are given by ξ1 = ξ1 − E[ξ1]=[−1001] and ξ2 = ξ2 − T E[ξ2]=[−0.5 −0.50.50.5] . Define ⎡ ⎤ −1 −0.5   ⎢ 0 −0.5 ⎥ Ξ˜ := ξ˜ ξ˜ = ⎢ ⎥ 1 2 ⎣ 00.5 ⎦ 10.5

Thus, 1 2 1 Var ξ˜ Cov(ξ˜ , ξ˜ ) = ˜ T ˜ = 3 3 = 1 1 2 P Ξ Ξ 1 1 ˜ ˜ ˜ 3 3 3 Cov(ξ2, ξ1) Var ξ2 The eigenvalues and eigenvectors of P are given by

1 1√ λ , = ± 5 1 2 2 6 and     −0.8507 0.5257 u = and u = 1 −0.5257 2 −0.8507 Given any positive definite and symmetric covariance matrix, the so-called uncer- tainty ellipsoid can be constructed (see, for instance, [Bar74]). This ellipsoid rep- resents an isoline connecting points of equal probability, which could be specified if we would accept, for example, a specific symmetric, unimodal multivariate dis- tribution of the data. For our case, the uncertainty ellipse, with center ξ¯ =[21.5]T and form matrix P , is given by   − ξ ∈ R2 : ξ − ξ¯ T P 1 ξ − ξ¯ = 1 and presented in Fig. B.1. Another interpretation for the uncertainty ellipse is that, assuming that the given data have a bivariate Gaussian distribution (see explanation below), the ellipse in Fig. B.1 is the smallest area that contains a fixed probabil- ity mass. Notice from Fig. B.1 that the (orthogonal) main axes are defined by the 270 B Statistics

Fig. B.1 Uncertainty ellipse with data points (o)

eigenvectors u1 and u2. In addition to this, the lengths of the semi-axes are given by the square root of the corresponding eigenvalues λ1 and λ2 multiplied by a scaling factor. Consequently, the eigenvectors and eigenvalues of a covariance matrix and a distribution-related scaling factor define the uncertainty ellipse.

Example B.3 Forecast errors, see [DK05]: Forecast and observation files for loca- tion “De Bilt” from 1 March 2001 until 1 March 2002 were provided by the weather agency “HWS”. These forecast files contain data from 0 until 31 hours ahead with an hourly interval. Every six hours, a new forecast file was delivered. Observation files were received daily containing hourly data from the previous 24 hours. From these data the average forecast error (i.e., observation minus forecast) of the temper- ature and the covariance matrix Q of the forecast error are obtained. The covariance matrix is graphically represented in Fig. B.2.

The normally distributed probability density function (pdf) of the random vector ξ ∈ Rr is given by 1 1 T −1 p(ξ) = exp − (ξ − μξ ) P (ξ − μξ ) (2π)r/2|P |1/2 2

T −1 where (ξ −μξ ) P (ξ −μξ ) defines an ellipsoid with center μξ and principal axes that are the eigenvectors of the r × r matrix P . Clearly,

E[ξ]=μξ   T E (ξ − μξ )(ξ − μξ ) = P

In short-hand notation, ξ ∼ N(μξ ,P). It is also common practice to say that ξ has a Gaussian distribution. B.1 Random Entities 271

Fig. B.2 Covariance matrix of the short-term forecast error

The Gaussian distribution is of paramount importance in statistics, as stated by the Central Limit Theorem. Let ξ1,ξ2,...,ξn be a sequence of n independent and identically distributed (iid) random variables, each having finite values of expec- tation μ and variance σ 2 > 0. The central limit theorem states that, as the sample size n increases, the distribution of the sample average of these random variables approaches the normal distribution with mean μ and variance σ 2/n, irrespective of the shape of the common distribution of the individual terms ξi . The central limit theorem is formally stated as follows.

Theorem B.1 Let ξ1,ξ2,... be independent, identically distributed random vari- ables having mean μ and finite nonzero variance σ 2. Let Sn = ξ1 + ξ2 +···+ξn. Then   S − nμ lim P n √ ≤ x = Φ(x) n→∞ σ n where Φ(x) is the probability that a standard normal random variable is less than x.

Another distribution, which is especially of paramount importance in hypothesis testing, is the so-called chi-square or χ2-distribution. The chi-square distribution (also called chi-squared distribution) with M degrees of freedom is the distribution of a sum of the squares of M independent standard normal random variables. In Fig. B.3 the cumulative distribution function, which is the integral of the chi-square probability density function, is presented. The chi-square distribution is commonly used in the so-called chi-square tests for goodness of fit of an observed distribution to a theoretical one. A chi-square test is any statistical hypothesis test in which the sampling distribution of the test statistic (as in (9.7)) has a chi-square distribution when the null hypothesis is true, or any in which this is asymptotically true. 272 B Statistics

Fig. B.3 Cumulative distribution function related to χ 2(M)-distributions with M = 1,...,6

B.1.3 Stochastic Processes

A statistical phenomenon that evolves in time according to probabilistic laws is called a stochastic process,see[BJ70]. Let ξ and ψ be elements of the same stochas- tic process, denoted by ξ(t) and ξ(t + τ). Then, the autocorrelation function is de- fined by   rξξ(τ, t) := E ξ(t)ξ(t + τ) Some properties of the autocorrelation function are:

(i) For τ = 0, rξξ(τ, t) has its maximum. (ii) rξξ(τ, t) = rξξ(−τ,t). In the analysis of sequences or time series, a series is said to be stationary in the wide sense if the first and second moments, i.e., mean and (co)variances, are not functions of time. In what follows, we will call these series simply stationary. A very useful property of stationary series is that they are ergodic, in other words, we can estimate the mean and (co)variances by averages over time. This implies that the autocorrelation function can be estimated from 1 T rξξ(τ) = lim u(t)u(t + τ)dt T →∞ 2T −T Notice that this function is now, under the assumption of ergodicity, only a function of lag τ and not of t. Let us illustrate the autocorrelation function by an example using real-world data.

Example B.4 Autocorrelation function of wastewater data: Consider the follow- ing measurements of the ammonium concentration entering a wastewater treatment plant (upper graph, Fig. B.4) and associated normalized autocorrelation function (bottom graph, Fig. B.4). It suffices here to say that this autocorrelation function clearly demonstrates the previously mentioned properties of the autocorrelation function and that strong cor- relations between subsequent ammonium concentrations are observed. B.1 Random Entities 273

Fig. B.4 Measured ammonium concentrations with corresponding autocorrelation function

In a similar way, the cross-correlation function between ξ(t) and ψ(t) is defined as   rξψ(τ, t) := E ξ(t)ψ(t + τ) which under ergodic conditions becomes 1 T rξψ(τ) = lim ξ(t)ψ(t + τ)dt T →∞ 2T −T Appendix C Laplace, Fourier, and z-Transforms

C.1 Laplace Transform

The Laplace transform is one of the best-known and most widely used integral trans- forms. It is commonly used to produce an easily solvable algebraic equation from an ordinary differential equation. Furthermore, the Laplace transform is often in- terpreted as a transformation from the time domain, in which inputs and outputs are functions of time, to the frequency domain, where the same inputs and outputs are functions of complex angular frequency, or radians per unit time. For LTI sys- tems, the Laplace transform provides an alternative functional description that often simplifies the analysis of the behavior of the system. The most commonly applied Laplace transform is defined as   ∞ − L f(t) ≡ F(s):= f(t)e st dt 0 It is a linear operator on a function f(t) (original) with real argument t that trans- forms it to a function F(s) (image) with complex argument s. Let us illustrate this unilateral or one-sided Laplace transform by two simple examples.

− Example C.1 Laplace transform:Givenf(t)= e at with a,t ∈ R+. Then,   ∞ − − − L e at = e ate st dt 0 ∞ − + = e (a s)t dt 0  ∞ = 1 −(a+s)t − + e (a s) 0 1 = [0 − 1] −(a + s) = 1 (a + s)

K.J. Keesman, System Identification, 275 Advanced Textbooks in Control and Signal Processing, DOI 10.1007/978-0-85729-522-4, © Springer-Verlag London Limited 2011 276 C Laplace, Fourier, and z-Transforms

Example C.2 Laplace transform: Consider the function f(t − τ) with time delay τ ∈ R and f(t− τ)= 0fort<τ. Then,   ∞ − L f(t− τ) = f(t− τ)e st dt 0 ∞  −s(t+τ)  = [t=t−τ] f t e dt −τ ∞ −  −   = e sτ f t e st dt −τ ∞ −sτ  −st  = [f(t)=0fort<0] e f t e dt   0 − = e sτL f(t)

A very powerful property of the Laplace transform is given in the following ex- ample, without showing all the intermediate steps.

Example C.3 Laplace transform: Laplace transformation of the convolution integral t y(t) = −∞ g(t − τ)u(τ)dτ , with y(t), u(t), and g(t) appropriate (integrable) time functions, leads to Y(s)= G(s)U(s) which defines an algebraic relationship between transformed output signal Y(s)and transformed input signal U(s).

Finally, for the transformation from one model representation in the frequency domain to the time domain and vice versa, as depicted in Fig. 2.1, the following property is essential.

Example C.4 Laplace transform: The Laplace transform of a derivative can be found after integrating the expression for the definition of the Laplace transform, as given above, by parts. Hence,   ∞ −st ∞ ∞ − −f(t)e 1  − f(t)e st dt = + f (t)e st dt 0 s 0 s 0 After evaluating the limits and multiplying by s we obtain      sL f(t) = f(0) + L f (t)      =⇒ L f (t) = sL f(t) − f(0)

The Laplace transform has the useful property that not only the ones shown above but many relationships and operations over the originals f(t)correspond to simpler relationships and operations over the images F(s). C.2 Fourier Transform 277

C.2 Fourier Transform

The Fourier transform shows a close similarity to the Laplace transform. The con- tinuous Fourier transform is equivalent to evaluating the bilateral Laplace transform with complex argument s = jω, with ω in rad/s. The result of a Fourier transfor- mation of a real-valued function (f(t)) is often called the frequency domain repre- sentation of the original function. In particular, it describes which frequencies are present in the original function. There are several common conventions for defin- ing the Fourier transform of an integrable function f(t). In this book, with angular frequency ω = 2πξ in rad/s and frequency ξ in Hertz, we use   ∞ − F f(t) ≡ F(ω):= f(t)e jωt dt −∞ for every t. The most important property for further use in this book is illustrated by the following example.

Example C.5 Fourier transform: Fourier transformation of the convolution integral t y(t) = −∞ g(t − τ)u(τ)dτ , with y(t), u(t), and g(t) integrable functions, leads to

Y(ω)= G(ω)U(ω) which, as in the case of the Laplace transform, defines an algebraic relationship between transformed output signal Y(ω)and transformed input signal U(ω).

In this book, in addition to the continuous Fourier transform, the Discrete Fourier Transform (DFT) of the sampled, continuous-time signal f(t)for t = 1, 2,...,N is used as well and is given by

N 1 −jωt FN (ω) = √ f(t)e N t=1 where ω = 2πk/N, k = 1, 2,...,N. In this definition, N/k is the period associated 2 with the specific frequency ωk. The absolute square value of F(ωk), |F(2πk/N)| , is a measure of the energy contribution of this frequency to the energy of the signal. The plot of values of |F(ω)|2 as a function of ω is called the periodogram of the signal f(t).

C.3 z-Transform

The z-transform converts a discrete time-domain signal, which in general is a se- quence of real numbers, into a complex frequency domain representation. The z-transform is like a discrete equivalent of the Laplace transform. The unilateral 278 C Laplace, Fourier, and z-Transforms

Table C.1 Transforms of commonly used functions Function Time domain Laplace s-domain z-domain x(t) X(s) = L [x(t)] X(z) = Z [x(t)]

Unit impulse δ(t) 1Ð − − Ideal delay x(t − τ) e sτ X(s) z τ/Ts X(z)

1 Ts Unit step Hs (t) s z−1 − −sTs Unit pulse 1 [H (t) − H (t − T )] 1 1 e 1 Ts s s s Ts s 2 + Ramp tH (t) 1 Ts (z 1) s s2 2(z−1)2 − −αTs Exp. decay e−αt H (t) 1 1 1 e s (s+α) α 2−e−αTs

or one-sided z-transform is simply the Laplace transform of an ideally sampled sig- sT nal after the substitution z = e s , with Ts the sampling interval. The z-transform can also be seen as a generalization of the Discrete Fourier transform (DFT), where the DFT can be found by evaluating the z-transform F(z)at z = ejω. The two-sided z-transform of a discrete-time signal f(t)is the function F(z)defined as

  ∞ Z f(t) ≡ F(z):= f(t)z−t t=−∞ where t ∈ Z, and z is, in general, a . In this book, and basically for causal signals, the unilateral z-transform is used as well and is given by

  ∞ Z f(t) ≡ F(z):= f(t)z−t t=0

Again, a very relevant property of the z-transform is illustrated in the following.

Example C.6 z-transform: z-transformation of the convolution sum

t y(t) = g(t − k)u(k) k=0 with y(t), u(t), and g(t) discrete-time functions, gives

Y(z)= G(z)U(z) which defines a similar relationship between transformed output signal Y(z) and transformed input signal U(z), as in the case of Laplace or Fourier transforma- tion. C.3 z-Transform 279

For the approximate conversion from Laplace to z-domain and vice versa, the following relationships can be used: 2 z − 1 s = (Tustin transformation) Ts z + 1 2 + sT z = s 2 − sTs with Ts the sampling interval. Finally, we will show some relationships between the transforms. Let Hs(t) be the Heaviside step function, and δ(t) the Dirac delta function with t a real number (usually, but not necessarily, time) and Ts the sampling interval. Then, some basic time functions with their transforms are presented in Table C.1. Appendix D Bode Diagrams

D.1 The Bode Plot

In literature, the Bode plot is also referred to as the logarithmic or corner plot. The Bode plot allows graphical instead of analytical interpretations of signals and LTI systems in the frequency domain, for example, of F(ω). Because of the logarithmic transformation of functions, multiplication and division are reduced to addition and subtraction. Recall that in the frequency domain, complex numbers appear, even if the original function is real-valued. Hence, in what follows, we use F(jω) instead of F(ω). Because of the complex numbers, the Bode plot consists of two plots, i.e., 20 times the logarithm of the magnitude (Lm) in decibels (dB) and the phase shift in degrees, as functions of the angular frequency ω. Notice then that if |F(jω)| increases by tenfold, or one decade, then the log magnitude increases by 20. To simplify the interpretation of Bode plots of transfer functions, four basic types of terms are specified and analyzed beforehand. In general, the numerator and denom- inator of transfer functions of LTI systems consists of these four basic types, which are

K (jω)±n ± (1 + jωT) m ± e jωτ

Because of the logarithmic transformation, a large class of Bode plots of the entire transfer function can be analyzed by simply adding the contribution of each of these simple terms.

K.J. Keesman, System Identification, 281 Advanced Textbooks in Control and Signal Processing, DOI 10.1007/978-0-85729-522-4, © Springer-Verlag London Limited 2011 282 D Bode Diagrams

D.2 Four Basic Types

D.2.1 Constant or K Factor

For a real positive constant K, it holds that Lm(K) = 20 log |K| appears as a hori- zontal line that raises or lowers the log magnitude curve of the entire transfer func- tion by a fixed amount. Clearly, because of the constant value, there is no contribu- tion to the phase shift.

D.2.2 (jω)±n Factor

The log magnitude and phase shift of (jω)±n are given by ± Lm(jω) n =±20n log ω nπ ∠(jω)±n =± 2 Hence, the magnitude plot consists of a straight line whose slope is ±20n dB/decade = ± nπ and goes through 0 dB at ω 1. The phase shift is a constant with a value of 2 . These results have been obtained using the following rules for complex√ numbers, i.e., given z = a + bi ∈ C with a,b ∈ R, |z|= Re2 z + Im2 z = a2 + b2, and ∠ = = Im z = b z arg z arctan Re z arctan a .

D.2.3 (1 + jωT)±m Factor

Let us first consider the case with m = 1 and negative exponent. Then,

− 1 Lm(1 + jωT) 1 = 20 log 1 + jωT

1 − jωT = 20 log 1 + ω2T 2 1 ω2T 2 = 20 log + (1 + ω2T 2)2 (1 + ω2T 2)2 1 = 20 log (1 + ω2T 2) =−20 log 1 + ω2T 2

− −ωT ∠(1 + jωT) 1 = arctan =−arctan ωT 1 D.2 Four Basic Types 283

= 20 Fig. D.1 Bode plot for G(jω) jω+10 with asymptotes in the magnitude plot

For very small values of ωT , the log magnitude becomes

− Lm(1 + jωT) 1 =−20 log 1 = 0

Consequently, for low frequencies, the log magnitude becomes a line at 0 dB. On the contrary, if ωT  1,

− − Lm(1 + jωT) 1 ≈ Lm(jωT ) 1 =−20 log ωT

This defines a line through zero dB at ω = 1/T and with a −20 dB/decade slope for ωT > 1. The intersection of both lines for ωT < 1 and ωT > 1isatω = 1/T .The point is called the corner frequency. Let us first demonstrate this by an example.

Example D.1 Bode plot: Let a transfer function of an LTI system be given by

20 2 G(jω) = = jω+ 10 0.1jω+ 1 The corresponding Bode plot is presented in Fig. D.1 Taking ω = 0 gives the static gain, which in this case is equal to 2. The corner frequency is at ω = 1/T = 10. Furthermore, we obtain the asymptotes for ωT  1: |G|=20 log 2 = 6.0206 (horizontal line) and for ωT  1: |G|=20/ω (line with 284 D Bode Diagrams

−20 dB/decade slope). These asymptotes are also presented in Fig. D.1. Notice that the error between the exact curve of the magnitudes and the asymptotes is greatest at the corner frequency.

If we now consider the case with m = 1 and positive exponent, we find that the corner frequency is still at ω = 1/T except that the asymptote for ωT  1 has a slope of +20 dB/decade. This is the only, but significant, difference, since this implies that for increasing frequency, the magnitude increases as well. When we consider a transfer function with two or more factors of this form, we can simply add the contribution of each. Consider, for example, the transfer function 1 1 G(jω) = jωT1 + 1 jωT2 + 1 Then, the magnitude and phase are given by   =− + 2 2 − + 2 2 Lm G(jω) 20 log 1 ω T1 20 log 1 ω T2

∠G(jω) =−arctan ωT1 − arctan ωT2

If we assume that T2 1/T2 would appear.

D.2.4 e±jωτ Factor

The log magnitude and phase shift of e±jωτ is given by

± ± Lm e jωτ = 20 log e jωτ

= 20 log cos(±ωτ) + j sin(±ωτ) = 20 log 1 = 0

± sin(±ωτ) ∠e jωτ = arctan cos(±ωτ) =±ωτ

Hence, the magnitude plot consists of a horizontal line at 0 dB, because the magni- tude is equal to one. The phase shift is proportional to the frequency ω. D.2 Four Basic Types 285

In conclusion, we can state that given a (sampled) Bode plot related to, for exam- ple, an empirical transfer function estimate (ETFE), we are able to approximately recover the underlying factors of the complete transfer function. Thus, we can iden- tify the entire continuous-time transfer function, provided that the transfer function consists of the four basic types mentioned above. Appendix E Shift Operator Calculus

E.1 Forward- and Backward-shift Operator

Shift operator calculus is a convenient tool for manipulating linear difference equa- tions with constant coefficients, and for details, we refer to [Åst70] and the refer- ences therein. In the development of shift operator calculus, systems are viewed as operators that map input signals to output signals. To specify an operator, it is nec- essary to define its range. In particular, the class of input signals must be specified and how the operator acts on the signals. In shift operator calculus, all (sampled) signals are considered as double infinite sequences f(t): t = ...,−1, 0, 1,...with t the time index. In what follows, the sampling interval is chosen as the time unit. The forward-shift operator, denoted by q, is defined by

qf (t ) = f(t+ 1)  2 := ∞ 2 If the norm of a signal is defined as f t=−∞ f (t), 2-norm, then it directly follows that the shift operator is bounded with unit norm. The inverse of the forward- shift operator is called the backward-shift operator or delay operator. It is denoted by q−1. Consequently,

− q 1f(t)= f(t− 1)

This inverse of q exists simply because all the signals are considered as double infi- nite sequences. Shift operator calculus allows compact descriptions of discrete-time systems. Furthermore, relationship between system variables can be easily derived, because the manipulation of difference equations is reduced to purely algebraic ma- nipulation. A similar result holds for differential equations after applying integral transforms, as the Laplace or Fourier transforms (see Appendix C). Let us illustrate this to a general difference equation of the form

+ + + − +···+ = + +···+ y(t na) a1y(t na 1) ana y(t) b0u(t nb) bnb u(t) (E.1)

K.J. Keesman, System Identification, 287 Advanced Textbooks in Control and Signal Processing, DOI 10.1007/978-0-85729-522-4, © Springer-Verlag London Limited 2011 288 E Shift Operator Calculus where na ≥ nb to guarantee causality. We call d = na − nb the pole excess of the system. Application of the shift operator gives

− na + na 1 +···+ = nb +···+ q a1q ana y(t) b0q bnb u(t) Define − := na + na 1 +···+ A(q) q a1q ana − := nb + nb 1 +···+ B(q) b0q b1q bnb Then, in compact notation, the difference equation can be written as A(q)y(t) = B(q)u(t) (E.2)

Shifting (E.1) to the left, i.e., after substituting k + na by κ and using d = na − nb, we obtain + − +···+ − = − +···+ − − y(κ) a1y(κ 1) ana y(κ na) b0u(κ d) bnb u(κ d nb) With

∗ − − − 1 = + 1 +···+ na A q 1 a1q ana q ∗ − − = + 1 +···+ nb B (q) b0 b1q bnb q we obtain A∗ q−1 y(t) = B∗ q−1 u(t − d) and ∗ − − A q 1 = q na A(q) Thus, it follows from the definition of the shift operator that the difference equation (E.1) can be multiplied by powers of q, which implies a forward shift of time. The equations for shifted times can also be multiplied by real numbers and added. This operation implies the multiplication of a polynomial in q by another polynomial in q. Hence, if (E.2) holds, then also C(q)A(q)y(t) = C(q)B(q)u(t) with C(q) a polynomial in q. Multiplication of polynomials is illustrated by the next example.

Example E.1 Multiplication of polynomials:Let 2 A(q) = q + a1q + a2,C(q)= q − 1

Then

2 C(q)A(q)y(t) = (q − 1) q + a1q + a2 y(t)

3 2 2 = q + a1q + a2q − q − a1q − a2 y(t) E.2 Pulse Transfer Operator 289

3 2 = q + (a1 − 1)q + (a2 − a1)q − a2 y(t)

= y(t + 3) + (a1 − 1)y(t + 2) + (a2 − a1)y(t + 1) − a2y(t)

= (q − 1) y(t + 2) + a1y(t + 1) + a2y(t)   = C(q) A(q)y(t)

To obtain a convenient algebra, division by a polynomial in q has also to be defined. It is possible to develop an operator algebra that allows division by an arbitrary polynomial in q if it is assumed that there is some k0 such that all sequences are zero for k ≤ k0. Consequently, this shift operator calculus allows the normal manipulations of multiplication, division, addition, and subtraction. Let us illustrate this algebra by an example.

Example E.2 Shift operator calculus: Consider the difference equation

x(t + 2) = ax(t + 1) + bu(t) with output equation y(t) = x(t). Hence,

y(t + 2) − ay(t + 1) = bu(t) and thus

q2 − aq y(t) = bu(t) Under the assumption that y(1) = 0 and after premultiplication both sides by 1/(q2 − aq), b bq−2 y(t) = u(t) = u(t) (q2 − aq) 1 − aq−1 After long division we obtain

− − − y(t) = bq 2 1 + aq 1 + a2q 2 +··· u(t)

Hence, multiplication and division can be applied on polynomials in q in a normal way. Similarly, addition and subtraction rules hold. However, it should be realized that q is not a variable but an operator. As in Example E.2, the basic assumption, which holds throughout this book, is that all initial conditions for the difference equation are zero.

E.2 Pulse Transfer Operator

The introduction of shift operator calculus also allows inputÐoutput relationships to be conveniently expressed as rational functions, as illustrated by Example E.2.For 290 E Shift Operator Calculus instance, since division by polynomials in q is defined, from (E.2) we can derive that 1 y(t) = B(q)u(t) A(q) The function B(q)/A(q) is called the pulse-transfer operator. This operator can be easily obtained from a linear time-invariant (LTI) system description as follows. Let an LTI system be described by

x(t + 1) = qx(t) = Ax(t) + Bu(t) (E.3) y(t) = Cx(t) + Du(t) with A, B, C, D matrices of appropriate dimensions. From this discrete-time state- space description we obtain

(qI − A)x(t) = Bu(t) and thus   y(t) = C(qI − A)−1B + D u(t) For u(t) a unit pulse, the pulse-transfer operator of the LTI system (E.3) is given by   − G(q) = C(qI − A) 1B + D (E.4)

In the backward-shift operator,

 −  G∗ q−1 = C I − q−1A 1q−1B + D = G(q)

The pulse-transfer operator of the system of (E.3) is thus a matrix whose elements are rational functions in q. For single-input single-output (SISO) systems with B and C vectors and D a scalar,   G(q) = C(qI − A)−1B + D − = C adj(qI A)B det(qI − A) B(q) = (E.5) A(q)

If the state vector is of dimension n and if the polynomials A(q) and B(q) do not have common factors, then A(q) is of degree n. It directly follows from (E.5) that the polynomial A(q) is the characteristic polynomial of the system matrix A.We call A(q) monic if its zeroth coefficient is 1 or the identity matrix in the multiinput multioutput case. The poles of a system are the zeros of the denominator of G(q), the characteristic polynomial A(q). The system zeros are obtained from B(q) = 0. Time delay in a system gives rise to poles at the origin. From Example E.2 we notice that this system has two poles, one in a and one at the origin, as a result of a E.2 Pulse Transfer Operator 291 single time delay. The pulse-transfer operator of a discrete-time LTI system G(q) is also called the discrete-time transfer function. The z-transform of Sect. 3.2.1 maps a semi-infinite time sequence into a function of a complex variable (z = ejω with ω the frequency, see also Appendix C). Notice then the difference in range for the z-transform and the shift-operator calculus. In the operator calculus double-infinite sequences are considered. In practice, this means that the main difference between z and q, apart from being a complex variable or an operator, respectively, is that the z-transform takes the initial values explicitly into account.

Example E.3 Shift operator calculus: Consider the LTI system with matrices     10 1 A = ,B= ,C=[10],D= 0 00 0

Then  −   q − 10 1 1 G(q) =[10] 0 q 0 = q q(q − 1) Hence, after pole-zero cancelation, the pulse-transfer operator or discrete-time transfer function becomes 1 G(q) = q − 1 with no zero and a single pole in 1. Consequently, this system is a simple single integrator. Appendix F Recursive Least-squares Derivation

F.3 Least-squares Method

Recall that in this book the linear regression at each time instant t is written as

y(t) = φ(t)T ϑ + e(t) where y(t) is the measurement, φ(t) the regressor vector, and e(t) the prediction error (in this case also called the equation error) for t = 1, 2,...,N, and ϑ is the unknown p × 1 parameter vector. The ordinary least-squares method minimizes the criterion function N J(ϑ)= y(t) − φ(t)T ϑ 2 t=1 with respect to ϑ. Notice that the criterion function is quadratic in ϑ, and thus it can be minimized analytically, as shown before. The ordinary least-squares estimate is then given by N −1 N ϑ(N)= φ(t)φT (t) φ(t)y(t) (F.1) t=1 t=1 provided that the inverse exists. The estimate can also be written in a recursive form. Here we follow the deriva- tion as presented in Ljung and Söderström [LS83]. Define the p × p matrix

t K(t)¯ := φ(k)φT (k) k=1 Then from (F.1) we obtain

t−1 φ(k)y(k) = K(t¯ − 1)ϑ(t − 1) k=1

K.J. Keesman, System Identification, 293 Advanced Textbooks in Control and Signal Processing, DOI 10.1007/978-0-85729-522-4, © Springer-Verlag London Limited 2011 294 F Recursive Least-squares Derivation

From the definition of K(t)¯ it follows that

K(t¯ − 1) = K(t)¯ − φ(t)φT (t)

Thus, t−1 ϑ(t) = K¯ −1(t) φky(k) + φ(t)y(t) k=1   − = K¯ 1(t) K(t¯ − 1)ϑ(t − 1) + φ(t)y(t)   − = K¯ 1(t) K(t)¯ ϑ(t − 1) + φ(t) −φT (t)ϑ(t − 1) + y(t)   − = ϑ(t − 1) + K¯ 1(t)φ(t) y(t) − φT (t)ϑ(t − 1) (F.2) and K(t)¯ = K(t¯ − 1) + φ(t)φT (t) (F.3) Consequently, (F.2) and (F.3) together form a recursive algorithm for the estimation of ϑ(t), given the previous estimate ϑ(t − 1), K(t¯ − 1) and φ(t),y(t). Notice that we do not need previous values of φ and y(t); the actual values suffice to estimate the current value of ϑ(t). However, there is still a need to invert a p × p matrix at each time step.

F.4 Equivalent Recursive Form

Let us first define
$$K(t) := \frac{1}{t}\bar{K}(t)$$
so that (F.3) becomes
$$\begin{aligned}
K(t) &= \frac{1}{t}\bigl[\bar{K}(t-1) + \varphi(t)\varphi^T(t)\bigr] = \frac{t-1}{t}K(t-1) + \frac{1}{t}\varphi(t)\varphi^T(t) \\
&= K(t-1) + \frac{1}{t}\bigl[\varphi(t)\varphi^T(t) - K(t-1)\bigr]
\end{aligned} \tag{F.4}$$
Subsequently, we introduce

$$P(t) := \bar{K}^{-1}(t) = \frac{1}{t}K^{-1}(t)$$
so that P(t) can be updated directly, instead of using (F.3). However, to accomplish this, we need the so-called matrix inversion lemma, which is given by

$$[A + BCD]^{-1} = A^{-1} - A^{-1}B\bigl[DA^{-1}B + C^{-1}\bigr]^{-1}DA^{-1}$$
with matrices A, B, C, and D of appropriate dimensions, such that the product BCD and the sum A + BCD exist. Furthermore, the inverses A^{-1} and C^{-1} must exist. Given the definition of P(t), (F.4) can be written as
$$\begin{aligned}
tK(t) &= (t-1)K(t-1) + \varphi(t)\varphi^T(t) \\
\Longrightarrow \quad \bigl[tK(t)\bigr]^{-1} &= \bigl[(t-1)K(t-1) + \varphi(t)\varphi^T(t)\bigr]^{-1} \\
\Longrightarrow \quad P(t) &= \bigl[P^{-1}(t-1) + \varphi(t)\varphi^T(t)\bigr]^{-1}
\end{aligned}$$

Consequently, after setting $P^{-1}(t-1) = A$, $\varphi(t) = B$, $C = 1$, and $\varphi^T(t) = D$, we obtain
$$\begin{aligned}
P(t) &= \bigl[P^{-1}(t-1) + \varphi(t)\cdot 1 \cdot \varphi^T(t)\bigr]^{-1} \\
&= P(t-1) - P(t-1)\varphi(t)\bigl[\varphi^T(t)P(t-1)\varphi(t) + 1\bigr]^{-1}\varphi^T(t)P(t-1) \\
&= P(t-1) - \frac{P(t-1)\varphi(t)\varphi^T(t)P(t-1)}{1 + \varphi^T(t)P(t-1)\varphi(t)}
\end{aligned}$$
Instead of inverting a p × p matrix, we now only need a division by a scalar. From this equation we also find that

$$\begin{aligned}
P(t)\varphi(t) &= P(t-1)\varphi(t) - \frac{P(t-1)\varphi(t)\varphi^T(t)P(t-1)\varphi(t)}{1 + \varphi^T(t)P(t-1)\varphi(t)} \\
&= \frac{P(t-1)\varphi(t)}{1 + \varphi^T(t)P(t-1)\varphi(t)}
\end{aligned}$$
Define L(t) := P(t)φ(t); then the so-called recursive least-squares (RLS) algorithm is given by
$$\begin{aligned}
\vartheta(t) &= \vartheta(t-1) + L(t)\bigl[y(t) - \varphi^T(t)\vartheta(t-1)\bigr] \\
L(t) &= \frac{P(t-1)\varphi(t)}{1 + \varphi^T(t)P(t-1)\varphi(t)} \\
P(t) &= P(t-1) - \frac{P(t-1)\varphi(t)\varphi^T(t)P(t-1)}{1 + \varphi^T(t)P(t-1)\varphi(t)}
\end{aligned}$$
Recall that this algorithm has been derived from an unweighted least-squares criterion function.
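As a concrete illustration of these update equations (a sketch, not from the original text; the function name rls and the NumPy formulation are assumptions), only scalar divisions appear in the recursion:

```python
import numpy as np

def rls(Phi, y, theta0, P0):
    """Recursive least squares: theta and P updated per sample, no p x p matrix inversion."""
    theta, P = theta0.copy(), P0.copy()
    for t in range(len(y)):
        phi = Phi[t]                                  # regressor phi(t), shape (p,)
        denom = 1.0 + phi @ P @ phi                   # scalar 1 + phi^T P(t-1) phi
        L = (P @ phi) / denom                         # gain L(t)
        theta = theta + L * (y[t] - phi @ theta)      # parameter update
        P = P - np.outer(P @ phi, phi @ P) / denom    # P(t) update
    return theta, P
```

For the weighted criterion introduced next, the same sketch applies after replacing the scalar 1 in denom by the weight R(t).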

If, however, a time-varying weight R(t) is included, such that
$$J_W(\vartheta) = \sum_{t=1}^{N} \frac{1}{R(t)}\bigl(y(t) - \varphi(t)^T \vartheta\bigr)^2$$
we will obtain
$$\vartheta(t) = \vartheta(t-1) + L(t)\bigl[y(t) - \varphi^T(t)\vartheta(t-1)\bigr]$$

$$\begin{aligned}
L(t) &= \frac{P(t-1)\varphi(t)}{R(t) + \varphi^T(t)P(t-1)\varphi(t)} \\
P(t) &= P(t-1) - \frac{P(t-1)\varphi(t)\varphi^T(t)P(t-1)}{R(t) + \varphi^T(t)P(t-1)\varphi(t)}
\end{aligned}$$

Notice that the initial values ϑ(0) and P(0) must be known in order to run the RLS algorithm. Typical choices are ϑ(0) = 0 and P(0) = cI with c a large constant.
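A minimal usage sketch of the rls function above, with the typical initialization ϑ(0) = 0 and P(0) = cI (here c = 10⁶), compared against the batch estimate (F.1); the synthetic data, noise level, and constants are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 200, 3
Phi = rng.normal(size=(N, p))                           # regressors phi(t)
theta_true = np.array([1.0, -0.5, 2.0])
y = Phi @ theta_true + 0.1 * rng.normal(size=N)         # measurements with equation error

theta_rls, _ = rls(Phi, y, theta0=np.zeros(p), P0=1e6 * np.eye(p))
theta_batch = np.linalg.lstsq(Phi, y, rcond=None)[0]    # ordinary least-squares estimate (F.1)
print(theta_rls, theta_batch)                           # both close to theta_true
```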

Appendix G Dissolved Oxygen Data

Table G.1 DO data lake "De Poel en 't Zwet"^a

t (d)  DO (g/m³)  Cs (g/m³)  I (W/m²)

111.8750  9.1860  10.9873  0.0000
111.9170  9.1980  11.0374  0.0000
111.9580  8.9960  11.0374  0.0000
112.0000  8.7930  11.0374  0.0000
112.0420  8.8040  11.0878  0.0000
112.0830  8.8150  11.0878  0.0000
112.1250  8.8270  11.1412  0.0000
112.1670  8.4070  11.1412  0.0000
112.2080  8.4170  11.1412  0.0000
112.2500  8.4280  11.1924  0.0000
112.2920  8.4390  11.1924  15.8900
112.3330  8.4490  11.1924  57.1900
112.3750  8.4600  11.1412  108.0000
112.4170  8.6880  11.0878  165.2000
112.4580  8.9170  10.9873  209.7000
112.5000  9.3650  10.9351  158.9000
112.5420  9.8130  10.8343  130.3000
112.5830  10.4800  10.7350  266.9000
112.6250  10.7100  10.6373  127.1000
112.6670  10.7300  10.6373  85.7800
112.7080  10.7400  10.5902  54.0100

^a This data set, for the period 21–30 April 1983, was collected by students of the University of Twente.


Table G.1 (Continued)

t (d)  DO (g/m³)  Cs (g/m³)  I (W/m²)

112.7500  10.7600  10.6373  31.7700
112.7920  10.9900  10.6373  25.4200
112.8330  10.7800  10.6373  6.3540
112.8750  10.8000  10.6872  0.0000
112.9170  10.8100  10.6872  0.0000
112.9580  10.3800  10.7350  0.0000
113.0000  9.9510  10.7857  0.0000
113.0420  10.1900  10.8343  0.0000
113.0830  9.9760  10.8858  0.0000
113.1250  9.9880  10.8858  0.0000
113.1670  9.5550  10.9351  0.0000
113.2080  9.5670  10.9873  0.0000
113.2500  9.5780  10.9873  0.0000
113.2920  9.1430  11.0374  12.7100
113.3330  9.1540  11.0374  60.3700
113.3750  9.6140  10.9873  98.4900
113.4170  9.8510  10.9351  133.4000
113.4580  10.5400  10.8343  168.4000
113.5000  10.3300  10.7857  225.6000
113.5420  11.0200  10.6373  184.3000
113.5830  11.4800  10.6373  25.4200
113.6250  11.2700  10.6373  15.8900
113.6670  11.5100  10.5902  47.6600
113.7080  11.0700  10.5902  28.5900
113.7500  11.3100  10.5902  9.5320
113.7920  11.7800  10.5435  3.1770
113.8330  11.3400  10.5435  0.0000
113.8750  11.5800  10.5902  0.0000
113.9170  11.1400  10.6373  0.0000
113.9580  11.1500  10.6373  0.0000
114.0000  10.9400  10.6373  0.0000
114.0420  10.9500  10.6872  0.0000
114.0830  10.7300  10.6872  0.0000
114.1250  10.5200  10.7350  0.0000
114.1670  10.0700  10.7857  0.0000
114.2080  10.0800  10.8343  0.0000
114.2500  9.8640  10.8343  0.0000
114.2920  9.4130  10.8343  12.7100
114.3330  9.6560  10.8858  41.3000

Table G.1 (Continued)

t (d)  DO (g/m³)  Cs (g/m³)  I (W/m²)

114.3750  9.8990  10.8858  73.0800
114.4170  9.9110  10.8343  117.6000
114.4580  10.8500  10.7857  139.8000
114.5000  11.5600  10.6872  238.3000
114.5420  12.2800  10.5902  171.6000
114.5830  12.7600  10.4947  251.0000
114.6250  12.7700  10.4008  187.5000
114.6670  11.3900  10.3555  197.0000
114.7080  13.0400  10.3083  168.4000
114.7500  13.5200  10.2639  117.6000
114.7920  13.3000  10.2639  60.3700
114.8330  13.3200  10.2639  19.0600
114.8750  12.8700  10.3555  9.5320
114.9170  12.8800  10.4008  0.0000
114.9580  12.4200  10.4487  0.0000
115.0000  12.2000  10.4947  0.0000
115.0420  11.9800  10.4947  0.0000
115.0830  11.7600  10.4947  0.0000
115.1250  11.5400  10.5435  0.0000
115.1670  11.5500  10.5902  0.0000
115.2080  10.8500  10.6373  0.0000
115.2500  10.6200  10.6872  0.0000
115.2920  10.8800  10.7350  19.0600
115.3330  10.1700  10.7857  50.8400
115.3750  10.6600  10.7857  54.0100
115.4170  10.9100  10.7857  50.8400
115.4580  11.1700  10.7350  114.4000
115.5000  11.1800  10.6373  114.4000
115.5420  11.6700  10.6373  41.3000
115.5830  11.6900  10.6373  19.0600
115.6250  11.4600  10.6373  66.7200
115.6670  9.8720  10.6373  22.2400
115.7080  11.1200  10.5902  82.6100
115.7500  11.5600  10.5435  57.1900
115.7920  11.9900  10.5435  31.7700
115.8330  12.0200  10.5435  9.5320
115.8750  11.8500  10.5435  0.0000
115.9170  11.6700  10.5902  0.0000
115.9580  11.2900  10.5902  0.0000

Table G.1 (Continued)

t (d)  DO (g/m³)  Cs (g/m³)  I (W/m²)

116.0000  11.3200  10.6373  0.0000
116.0420  10.9300  10.6373  0.0000
116.0830  11.3700  10.6373  0.0000
116.1250  11.1900  10.6373  0.0000
116.1670  10.5900  10.6373  0.0000
116.2080  11.0400  10.6872  0.0000
116.2500  10.6400  10.7350  0.0000
116.2920  10.4500  10.7350  15.8900
116.3330  10.6900  10.7350  60.3700
116.3750  10.5100  10.6872  114.4000
116.4170  10.7400  10.6373  149.3000
116.4580  10.7700  10.5902  181.1000
116.5000  11.2200  10.5435  120.7000
116.5420  11.4700  10.4487  155.7000
116.5830  12.3500  10.3555  177.9000
116.6250  12.6000  10.2639  222.4000
116.6670  12.6300  10.2174  212.9000
116.7080  13.0900  10.1737  92.1400
116.7500  12.6900  10.1737  34.9500
116.7920  12.7200  10.2174  19.0600
116.8330  12.5300  10.2174  9.5320
116.8750  12.1200  10.2639  0.0000
116.9170  11.9300  10.3083  0.0000
116.9580  11.9600  10.3555  0.0000
117.0000  11.5500  10.3555  0.0000
117.0420  11.3500  10.4008  0.0000
117.0830  10.9400  10.4008  0.0000
117.1250  10.7400  10.4487  0.0000
117.1670  10.7600  10.4487  0.0000
117.2080  10.3400  10.4487  0.0000
117.2500  9.9170  10.4947  0.0000
117.2920  10.1600  10.4947  0.0000
117.3330  9.9630  10.5435  3.1770
117.3750  9.3080  10.5435  12.7100
117.4170  9.5550  10.5435  9.5320
117.4580  9.5770  10.5902  25.4200
117.5000  10.0500  10.5902  41.3000
117.5420  10.0800  10.5902  22.2400
117.5830  10.1000  10.6373  15.8900

Table G.1 (Continued)

t (d)  DO (g/m³)  Cs (g/m³)  I (W/m²)

117.6250  10.1200  10.6373  12.7100
117.6670  10.1400  10.6373  15.8900
117.7080  10.1700  10.6872  19.0600
117.7500  10.1900  10.7350  12.7100
117.7920  10.2100  10.7350  9.5320
117.8330  10.0000  10.7857  3.1770
117.8750  9.7920  10.8343  0.0000
117.9170  9.8130  10.8343  0.0000
117.9580  9.6010  10.8858  0.0000
118.0000  9.6220  10.9351  0.0000
118.0420  10.3500  10.9351  0.0000
118.0830  10.3700  10.9873  0.0000
118.1250  9.6850  10.9873  0.0000
118.1670  9.4700  11.0374  0.0000
118.2080  9.4900  11.0374  0.0000
118.2500  9.5110  11.0878  0.0000
118.2920  9.5310  11.1412  15.8900
118.3330  9.5510  11.1412  63.5400
118.3750  10.0500  11.0878  88.9600
118.4170  10.0700  11.0374  114.4000
118.4580  10.5700  11.0374  85.7800
118.5000  11.0800  10.9351  181.1000
118.5420  11.1000  10.8343  162.0000
118.5830  11.8500  10.7857  212.9000
118.6250  12.1200  10.7350  247.8000
118.6670  12.3800  10.6373  219.2000
118.7080  12.6500  10.5902  174.7000
118.7500  12.6800  10.5902  123.9000
118.7920  12.7100  10.5435  66.7200
118.8330  12.2400  10.5902  22.2400
118.8750  12.0800  10.6373  0.0000
118.9170  11.9300  10.6872  0.0000
118.9580  11.6500  10.6872  0.0000
119.0000  11.5700  10.7350  0.0000
119.0420  11.4800  10.7350  0.0000
119.0830  11.1800  10.7350  0.0000
119.1250  10.8700  10.7857  0.0000
119.1670  10.5600  10.7857  0.0000
119.2080  11.0900  10.8343  0.0000

Table G.1 (Continued)

t (d)  DO (g/m³)  Cs (g/m³)  I (W/m²)

119.2500  10.3300  10.8343  0.0000
119.2920  10.4300  10.8343  6.3540
119.3330  10.0900  10.8343  12.7100
119.3750  10.1900  10.8343  15.8900
119.4170  10.2800  10.8343  34.9500
119.4580  10.3700  10.8343  57.1900
119.5000  9.8000  10.8343  139.8000
119.5420  10.8200  10.7350  222.4000
119.5830  10.8300  10.6872  152.5000
119.6250  10.8500  10.5902  254.2000
119.6670  11.2700  10.4947  228.8000
119.7080  11.0800  10.4487  184.3000
119.7500  11.5000  10.4008  130.3000
119.7920  11.5200  10.3555  73.0800
119.8330  11.5400  10.4008  25.4200
119.8750  11.3500  10.4008  3.1770
119.9170  10.9600  10.4487  0.0000
119.9580  10.7800  10.4947  0.0000
120.0000  10.5900  10.5435  0.0000

References

[ADSC98] S. Audoly, L. D’Angiò, M.P. Saccomani, C. Cobelli, Global identifiability of lin- ear compartmental models—a computer algebra algorithm. IEEE Trans. Biomed. Eng. 45(1), 36Ð47 (1998) [ÅH84] K.J. Åström, T. Hagglund, Automatic tuning of simple regulators with specifica- tions on phase and amplitude margins. Automatica 20, 645Ð651 (1984) [ÅH88] K.J. Åström, T. Hagglund, Automatic tuning of PID controllers (Instrument So- ciety of America, 1988) [AH01] H. Akcay, P.S.C. Heuberger, Frequency-domain iterative identification algorithm using general orthonormal basis functions. Automatica 37(5), 663Ð674 (2001) [Aka74] H. Akaike, A new look at statistical model identification. IEEE Trans. Autom. Control AC-19, 716Ð723 (1974) [Akc00] H. Akcay, Continuous-time stable and unstable system modelling with orthonor- mal basis functions. Int. J. Robust Nonlinear Control 10(6), 513Ð531 (2000) [AMLL02] A. Al Mamun, T.H. Lee, T.S. Low, Frequency domain identification of transfer function model of a disk drive actuator. Mechatronics 12(4), 563Ð574 (2002) [AN99] H. Akcay, B. Ninness, Orthonormal basis functions for modelling continuous- time systems. Signal Process. 77(3), 216Ð274 (1999) [And85] B.D.O. Anderson, Identification of scalar errors-in-variables models with dynam- ics. Automatica 21(6), 709Ð716 (1985) [Åst70] K.J. Åström, Introduction to Stochastic Control Theory. in Science and Engineering, vol. 70 (Academic Press, San Diego, 1970) [BA02] K.P. Burnham, D.R. Anderson, Model Selection and Multimodel Inference: A Practical Information-theoretic Approach, 2nd edn. (Springer, Berlin, 2002) [Bag75] A. Bagchi, Continuous time systems identification with unknown noise covari- ance. Automatica 11(5), 533Ð536 (1975) [Bai02] E.-W. Bai, A blind approach to the HammersteinÐWiener model identification. Automatica 38(6), 967Ð979 (2002) [Bai03a] E.-W. Bai, Frequency domain identification of Hammerstein models. IEEE Trans. Autom. Control 48(4), 530Ð542 (2003) [Bai03b] E.-W. Bai, Frequency domain identification of Wiener models. Automatica 39(9), 1521Ð1530 (2003) [Bar74] Y. Bard, Nonlinear Parameter Estimation (Academic Press, San Diego, 1974) [BBB85] D. Bertin, S. Bittanti, P. Bolzern, Prediction-error directional forgetting technique for recursive estimation. Syst. Sci. 11(2), 33Ð39 (1985) [BBC90] G. Belforte, B. Bona, V. Cerone, Identification, structure selection and valida- tion of uncertain models with set-membership error description. Math. Comput. Simul. 32(5Ð6), 561Ð569 (1990)


[BBF87] G. Belforte, B. Bona, S. Fredani, Optimal sampling schedule for parameter esti- mation of linear models with unknown but bounded measurement errors. IEEE Trans. Autom. Control ACÐ32(2), 179Ð182 (1987) [BBM86] A. Benveniste, M. Basseville, G. Moustakides, Modelling and monitoring of changes in dynamical systems, in Proceedings of the IEEE Conference on De- cision and Control (1986), pp. 776Ð782 [BC94] S. Bittanti, M. Campi, Bounded error identification of time-varying parameters by RLS techniques. IEEE Trans. Autom. Control 39(5), 1106Ð1110 (1994) [BE83] D.R. Brillinger, P.R. Krishnaiah (eds.), Handbook of Statistics 3: Time Series in the Frequency Domain (North-Holland, Amsterdam, 1983) [BEW02] L. Bertino, G. Evensen, H. Wackernagel, Combining geostatistics and Kalman filtering for data assimilation in an estuarine system. Inverse Probl. 18(1), 1Ð23 (2002) [BG02] B. Bamieh, L. Giarre, Identification of linear parameter varying models. Int. J. Robust Nonlinear Control 12(9), 841Ð853 (2002) [BG03] G. Belforte, P. Gay, Optimal input design for set-membership identification of Hammerstein models. Int. J. Control 76(3), 217Ð225 (2003) [Bie77] G.J. Bierman, Factorization Methods for Discrete Sequential Estimation.Mathe- matics in Science and Engineering (Academic Press, San Diego, 1977) [Bit77] S. Bittanti, On optimal experiment design for parameters estimation of dynamic systems under periodic operation, in Proceedings of the IEEE Conference on De- cision and Control, 1977, pp. 1126Ð1131 [BJ70] G.E.P. Box, G.M. Jenkins, Time Series Analysis: Forecasting and Control (Holden-Day, Oakland, 1970) [Bjo96] A. Bjork, Numerical Methods for Least Squares Problems (SIAM, Philadelphia, 1996) [BK70] R. Bellman, K.J. Åström, On structural identifiability. Math. Biosci. 7, 329Ð339 (1970) [Blu72] M. Blum, Optimal smoothing of piecewise continuous functions. IEEE Trans. Inf. Theory 18(2), 298Ð300 (1972) [BM74] G.E.P. Box, J.F. MacGregor, Analysis of closed-loop dynamic-stochastic systems. Technometrics 16(3), 391Ð398 (1974) [BMS+04] T.Z. Bilau, T. Megyeri, A. Sárhegyi, J. Márkus, I. Kollár, Four-parameter fitting of sine wave testing result: Iteration and convergence. Comput. Stand. Interfaces 26(1), 51Ð56 (2004) [Box71] M.J. Box, Bias in nonlinear estimation. J. R. Stat. Soc., Ser. B, Stat. Methodol. 33(2), 171Ð201 (1971) [BP02] G. Belforte, G.A.Y. Paolo, Optimal experiment design for regression polynomial models identification. Int. J. Control 75(15), 1178Ð1189 (2002) [BR97] P. Barone, R. Ragona, Bayesian estimation of parameters of a damped sinusoidal model by a Markov chain Monte Carlo method. IEEE Trans. Signal Process. 45(7), 1806Ð1814 (1997) [BRD97] M. Boutayeb, H. Rafaralahy, M. Darouach, Convergence analysis of the extended Kalman filter used as an observer for nonlinear deterministic discrete-time sys- tems. IEEE Trans. Autom. Control 42(4), 581Ð586 (1997) [Bri81] D.R. Brillinger, Time Series: Data Analysis and Theory (Holden-Day, Oakland, 1981) [BRJ09] E.-W. Bai, J. Reyland Jr., Towards identification of Wiener systems with the least amount of a priori information: IIR cases. Automatica 45(4), 956Ð964 (2009) [BS72] B.D.O. Anderson, S. Chirarattananon, New linear smoothing formulas. IEEE Trans. Autom. Control 17(1), 160Ð161 (1972) [BSK+02] K. Bernaerts, R.D. Servaes, S. Kooyman, K.J. Versyck, J.F. Van Impe, Optimal temperature input design for estimation of the square root model parameters: pa- rameter accuracy and model validity restrictions. Int. J. Food Microbiol. 73(2Ð3), 145Ð157 (2002). 

[BY76] B. Beck, P. Young, Systematic identification of do-bod model structure. J. Envi- ron. Eng. Div. ASCE 102(5 EE5), 909Ð927 (1976) [BZ95] S.A. Billings, Q.M. Zhu, Model validation tests for multivariable nonlinear mod- els including neural networks. Int. J. Control 62(4), 749Ð766 (1995) [Car73] N.A. Carlson, Fast triangular formulation of the square root filter. AIAA J. 11(9), 1259Ð1265 (1973) [Car90] N.A. Carlson, Federated square root filter for decentralized parallel processes. IEEE Trans. Aerosp. Electron. Syst. 26(3), 517Ð525 (1990) [CB89] S. Chen, S.A. Billings, Recursive prediction error parameter estimator for non- linear models. Int. J. Control 49(2), 569Ð594 (1989) [CC94] J.-M. Chen, B.-S. Chen, A higher-order correlation method for model-order and parameter estimation. Automatica 30(8), 1339Ð1344 (1994) [CDC04] M. Crowder, R. De Callafon, Time Domain Control Oriented Model Validation Using Coprime Factor Perturbations, in Proceedings of the IEEE Conference on Decision and Control, vol. 2 (2004), pp. 2182Ð2187 [CG00] J. Chen, G. Gu, Control-oriented System Identification: an h∞ Approach (Wiley, New York, 2000) [CGCE03] M.J. Chapman, K.R. Godfrey, M.J. Chappell, N.D. Evans, Structural identifiabil- ity for a class of non-linear compartmental systems using linear/non-linear split- ting and symbolic computation. Math. Biosci. 183(1), 1Ð14 (2003) [Che70] R.T.N. Chen, Recurrence relationship for parameter estimation via method of quasi-linearization and its connection with Kalman filtering. AIAA J. 8(9), 1696Ð 1698 (1970) [CHY02] Y.-Y. Chen, P.-Y. Huang, J.-Y. Yen, Frequency-domain identification algorithms for servo systems with friction. IEEE Trans. Control Syst. Technol. 10(5), 654Ð 665 (2002) [CKBR08] J. Chandrasekar, I.S. Kim, D.S. Bernstein, A.J. Ridley, Cholesky-based reduced- rank square-root Kalman filtering, in Proceedings of the American Control Con- ference (2008), pp. 3987Ð3992 [CM78] F.L. Chernousko, A.A. Melikyan, Game Problems of Control and Search (Nauka, Moscow, 1978) (in Russian) [CSI00] M.-H. Chen, Q.-M. Shao, J.G. Ibrahim, Monte Carlo Methods in Bayesian Com- putation (Springer, New York, 2000) [CSS08] M.C. Campi, T. Sugie, F. Sakai, An iterative identification method for linear continuous-time systems. IEEE Trans. Autom. Control 53(7), 1661Ð1669 (2008) [CZ95] R.F. Curtain, H.J. Zwart, An Introduction to Infinite-dimensional Linear Systems Theory (Springer, Berlin, 1995), p. 698 [DA96] S. Dasgupta, B.D.O. Anderson, A parametrization for the closed-loop identifica- tion of nonlinear time-varying systems. Automatica 32(10), 1349Ð1360 (1996) [DdFG01] A. Doucet, N. de Freitas, N. Gordon, Sequential Monte Carlo Methods in Practice (Springer, New York, 2001) [DDk05] H. Deng, M. Doroslovacki,ˇ Improving convergence of the pnlms algorithm for sparse impulse response identification. IEEE Signal Process. Lett. 12(3), 181Ð184 (2005) [deS87] C.W. deSilva, Optimal input design for the dynamic testing of mechanical sys- tems. J. Dyn. Syst. Meas. Control, Trans. ASME 109(2), 111Ð119 (1987) [DGW96] S.K. Doherty, J.B. Gomm, D. Williams, Experiment design considerations for non-linear system identification using neural networks. Comput. Chem. Eng. 21(3), 327Ð346 (1996) [DI76] J.J. DiStefano III, Tracer experiment design for unique identification of nonlinear physiological systems. Am. J. Physiol. 230(2), 476Ð485 (1876) [DI82] J.J. DiStefano III, Algorithms, software and sequential optimal sampling schedule designs for pharmacokinetic and physiologic experiments. Math. Comput. 
Simul. 24(6), 531–534 (1982)

[DK05] T.G. Doeswijk, K.J. Keesman, Adaptive weather forecasting using local meteo- rological information. Biosyst. Eng. 91(4), 421Ð431 (2005) [DK09] T.G. Doeswijk, K.J. Keesman, Linear parameter estimation of rational biokinetic functions. Water Res. 43(1), 107Ð116 (2009) [DS98] N.R. Draper, H. Smith, Introduction to Linear Regression Analysis, 4th edn. Wi- ley Series in Probability and Statistics (Wiley, New York, 1998) [DvdH96] H.G.M. Dötsch, P.M.J. van den Hof, Test for local structural identifiability of high-order non-linearly parametrized state space models. Automatica 32(6), 875Ð 883 (1996) [dVvdH98] D.K. de Vries, P.M.J. van den Hof, Frequency domain identification with gener- alized orthonormal basis functions. IEEE Trans. Autom. Control 43(5), 656Ð669 (1998) [DW95] L. Desbat, A. Wernsdorfer, Direct algebraic reconstruction and optimal sampling in vector field tomography. IEEE Trans. Signal Process. 43(8), 1798Ð1808 (1995) [ECCG02] N.D. Evans, M.J. Chapman, M.J. Chappell, K.R. Godfrey, Identifiability of un- controlled nonlinear rational systems. Automatica 38(10), 1799Ð1805 (2002) [EMT00] A. Esmaili, J.F. MacGregor, P.A. Taylor, Direct and two-step methods for closed- loop identification: A comparison of asymptotic and finite data set performance. J. Process Control 10(6), 525Ð537 (2000) [EO68] L.D. Enochson, R.K. Otnes, Programming and analysis for digital time series data. Technical report, US Dept. of Defense, Shock and Vibration Info. Center, 1968 [ES81] H. El-Sherief, Multivariable system structure and parameter identification using the correlation method. Automatica 17(3), 541Ð544 (1981) [Eve94] G. Evensen, Sequential data assimilation with a nonlinear quasi-geostrophic model using Monte Carlo methods to forecast error statistics. J. Geophys. Res. 99(C5), 10143Ð10162 (1994) [Eyk74] P. Eykhoff, System Identification: Parameter and State Estimation (Wiley- Interscience, New York, 1974) [FBT96] B. Farhang-Boroujeny, T.-T. Tay, Transfer function identification with filtering techniques. IEEE Trans. Signal Process. 44(6), 1334Ð1345 (1996) [FG06] P. Falugi, L. Giarre, Set membership (in)validation of nonlinear positive models for biological systems, in Proceedings of the IEEE Conference on Decision and Control (2006) [FH82] E. Fogel, Y.F. Huang, On the value of information in system identification- bounded noise case. Automatica 18, 229Ð238 (1982) [FL99] U. Forssell, L. Ljung, Closed-loop identification revisited. Automatica 35(7), 1215Ð1241 (1999) [Fre80] P. Freymuth, Sine-wave testing of non-cylindrical hot-film anemometers accord- ing to the Bellhouse-Schultz model. J. Phys. E, Sci. Instrum. 13(1), 98Ð102 (1980) [GCH98] S. Grob, P.D.J. Clark, K. Hughes, Enhanced channel impulse response identifi- cation for the itu hf measurement campaign. Electron. Lett. 34(10), 1022Ð1023 (1998) [Gel74] A. Gelb, Applied Optimal Estimation (MIT Press, Cambridge, 1974) [GGS01] G.C. Goodwin, S.F. Graebe, M.E. Salgado, Control System Design (Prentice Hall, New York, 2001) [GL09b] J. Gillberg, L. Ljung, Frequency-domain identification of continuous-time ARMA models from sampled data. Automatica 45(6), 1371Ð1378 (2009) [God80] K.R. Godfrey, Correlation methods. Automatica 16(5), 527Ð534 (1980) [GP77] G.C. Goodwin, R.L. Payne, Dynamic System Identification: Experiment Design and Data Analysis (Prentice-Hall, New York, 1977) [GP08] W. Greblicki, M. Pawlak, Non-Parametric System Identification (Cambridge Uni- versity Press, Cambridge, 2008) References 307

[GRC09] F. Giri, Y. Rochdi, F.-Z. Chaoui, An analytic geometry approach to Wiener system frequency identification. IEEE Trans. Autom. Control 54(4), 683Ð696 (2009) [Gre94] W. Greblicki, Nonparametric identification of Wiener systems by orthogonal se- ries. IEEE Trans. Autom. Control 39(10), 2077Ð2086 (1994) [Gre98] W. Greblicki, Continuous-time Wiener system identification. IEEE Trans. Au- tom. Control 43(10), 1488Ð1493 (1998) [Gre00] W. Greblicki, Continuous-time Hammerstein system identification. IEEE Trans. Autom. Control 45(6), 1232Ð1236 (2000) [GRS96] W.R. Gilks, S. Richardson, D.J. Spiegelhalter, Markov Chain Monte Carlo in Practice (Chapman and Hall, London, 1996) [GS84] G.C. Goodwin, K.S. Sin, Adaptive Filtering Prediction and Control (Prentice- Hall, New York, 1984) [Gui03] R. Guidorzi, Multivariable System Identification: From Observations to Models (Bononia University Press, Bologna, 2003) [GvdH01] M. Gilson, P. van den Hof, On the relation between a bias-eliminated least- squares (BELS) and an IV estimator in closed-loop identification. Automatica 37(10), 1593Ð1600 (2001) [GVL80] G.H. Golub, C.F. Van Loan, An analysis of the total least squares problem. SIAM J. Numer. Anal. 17(6), 883Ð893 (1980) [GVL89] G.H. Golub, C.F. Van Loan, Matrix Computations, 2nd edn. (Johns Hopkins Uni- versity Press, Baltimore, 1989) [GW74] K. Glover, J.C. Willems, Parametrizations of linear dynamical systems: canonical forms and identifiability. IEEE Trans. Autom. Control AC-19(6), 640Ð646 (1974) [Har91] N. Haritos, Swept sine wave testing of compliant bottom-pivoted cylinders, in Proceedings of the First International Offshore and Polar Engineering Confer- ence (1991), pp. 378Ð383 [HB94] B.R. Haynes, S.A. Billings, Global analysis and model validation in nonlinear system identification. Nonlinear Dyn. 5(1), 93Ð130 (1994) [HdHvdHW04] P.S.C. Heuberger, T.J. de Hoog, P.M.J. van den Hof, B. Wahlberg, Orthonormal basis functions in time and frequency domain: Hambo transform theory. SIAM J. Control Optim. 42(4), 1347Ð1373 (2004) [HGDB96] H. Hjalmarsson, M. Gevers, F. De Bruyne, For model-based control design, closed-loop identification gives better performance. Automatica 32(12), 1659Ð 1673 (1996) [HK01] D.R. Hill, B. Kolman, Modern Matrix Algebra (Prentice Hall, New York, 2001) [HP05] D. Hinrichsen, A.J. Pritchard, Mathematical Systems Theory I: Modelling, State Space Analysis, Stability and Robustness. Texts in Applied Mathematics, vol. 48 (Springer, Berlin, 2005) [HRvS07] C. Heij, A. Ran, F. van Schagen, Introduction to Mathematical Systems Theory: Linear Systems, Identification and Control (Birkhäuser, Basel, 2007) [HS09] M. Hong, T. Söderström, Relations between bias-eliminating least squares, the Frisch scheme and extended compensated least squares methods for identifying errors-in-variables systems. Automatica 45(1), 277Ð282 (2009) [HSZ07] M. Hong, T. Söderström, W.X. Zheng, A simplified form of the bias-eliminating least squares method for errors-in-variables identification. IEEE Trans. Autom. Control 52(9), 1754Ð1756 (2007) [HvdMS02] R.H.A. Hensen, M.J.G. van de Molengraft, M. Steinbuch, Frequency domain identification of dynamic friction model parameters. IEEE Trans. Control Syst. Technol. 10(2), 191Ð196 (2002) [Ips09] I. Ipsen, Numerical Matrix Analysis: Linear Systems and Least Squares (SIAM, Philadelphia, 2009) [Jaz70] A.H. Jazwinski, Stochastic Processes and Filtering Theory. Mathematics in Sci- ence and Engineering, vol. 64 (Academic Press, San Diego, 1970) [JDVCJB06] C. Jauberthie, L. 
Denis-Vidal, P. Coton, G. Joly-Blanchard, An optimal input design procedure. Automatica 42(5), 881–884 (2006)

[JM05] M.A. Johnson, M.H. Moradi, PID Control: New Identification and Design Meth- ods (Springer, London, 2005) [Joh93] R. Johansson, System Modeling and Identification (Prentice Hall, New York, 1993) [JR04] R. Johansson, A. Robertsson, On behavioral model identification. Signal Process. 84(7), 1089Ð1100 (2004) [JU97] S.J. Julier, J.K. Uhlmann, New Extension of the to Nonlinear Sys- tems, in Proceedings of SPIE—The International Society for Optical Engineer- ing, vol. 3068 (1997), pp. 182Ð193 [JVCR98] R. Johansson, M. Verhaegen, C.T. Chou, A. Robertsson, Behavioral Model Iden- tification, in Proceedings of the IEEE Conference on Decision and Control, 1998, pp. 126Ð131 [JW68] G.M. Jenkins, D.G. Watts, Spectral Analysis and Its Applications (Holden-Day, Oakland, 1968) [JY79] A. Jakeman, P.C. Young, Joint parameter/state estimation. Electron. Lett. 15(19), 582Ð583 (1979) [Kal60] R.E. Kalman, A new approach to linear filtering and prediction problems. Am. Soc. Mech. Eng. Trans. Ser. D, J. Basic Eng. 82, 35Ð45 (1960) [Kat05] T. Katayama, Subspace Methods for System Identification. Communications and Control Engineering (Springer, Berlin, 2005) [Kau69] H. Kaufman, Aircraft parameter identification using Kalman filtering, in Proceed- ings of the National Electronics Conference, vol. XXV (1969) [Kay88] S.M. Kay, Modern Spectral Estimation: Theory and Application. Prentice-Hall Signal Processing Series (Prentice-Hall, New York, 1988) [KB61] R.E. Kalman, R.S. Bucy, New results in linear filtering and prediction problems. Am. Soc. Mech. Eng. Trans. Ser. D, J. Basic Eng. 83, 95Ð108 (1961) [KB94] A. Kumar, G.J. Balas, Approach to model validation in the μ framework, in Pro- ceedings of the American Control Conference, vol. 3, 1994, pp. 3021Ð3026 [KD09] K.J. Keesman, T.G. Doeswijk, Direct least-squares estimation and prediction of rational systems: application to food storage. J. Process Control 19, 340Ð348 (2009) [Kee89] K.J. Keesman, On the dominance of parameters in structural models of ill-defined systems. Appl. Math. Comput. 30, 133Ð147 (1989) [Kee90] K.J. Keesman, Membership-set estimation using random scanning and principal component analysis. Math. Comput. Simul. 32(5Ð6), 535Ð544 (1990) [Kee97] K.J. Keesman, Weighted least-squares set estimation from l∞ norm bounded- noise data. IEEE Trans. Autom. Control AC 42(10), 1456Ð1459 (1997) [Kee02] K.J. Keesman, State and parameter estimation in biotechnical batch reactors. Control Eng. Pract. 10(2), 219Ð225 (2002) [Kee03] K.J. Keesman, Bound-based identification: nonlinear-model case, in Encyclope- dia of Life Science Systems Article 6.43.11.2, ed. by H. Unbehauen. UNESCO EOLSS (2003) [KH95] Y. Kyongsu, K. Hedrick, Observer-based identification of nonlinear system pa- rameters. J. Dyn. Syst. Meas. Control, Trans. ASME 117(2), 175Ð182 (1995) [KJ97] K.J. Keesman, A.J. Jakeman, Identification for long-term prediction of rainfall- streamflow systems, in Proceedings of the 11th IFAC Symp. on System Identifica- tion, Fukuoka, Japan, 8Ð11 July, vol. 3 (1997), pp. 2519Ð1523 [KK09] K.J. Keesman, N. Khairudin, Linear regressive realizations of LTI state space models, in Proceedings of the 15th IFAC Symposium on System Identification,St. Malo, France (2009), pp. 1868Ð1873 [KKZ77] V.I. Kostyuk, V.E. Kraskevitch, K.K. Zelensky, Frequency domain identification of complex systems. Syst. Sci. 3(1), 5Ð12 (1977) [KM08a] K.J. Keesman, V.I. Maksimov, On reconstruction of unknown characteristics in one system of third order, in Prikl. Mat. 
i Informatika: Trudy fakulteta VMiK

MGU (Applied Mathematics and Informatics: Proc., Computer Science Dept. of Moscow State University), vol. 30 (MAKS Press, Moscow, 2008), pp. 95Ð116 (in Russian) [KM08b] K.J. Keesman, V.I. Maksimov, On feedback identification of unknown character- istics: a bioreactor case study. Int. J. Control 81(1), 134Ð145 (2008) [KMVH03] A. Kukush, I. Markovsky, S. Van Huffel, Consistent estimation in the bilinear multivariate errors-in-variables model. Metrika 57(3), 253Ð285 (2003) [Kol93] I. Kollar, On frequency-domain identification of linear systems. IEEE Trans. In- strum. Meas. 42(1), 2Ð6 (1993) [Koo37] T.J. Koopmans, Linear regression analysis of economic time series.TheNether- lands (1937) [KPL03] K.J. Keesman, D. Peters, L.J.S. Lukasse, Optimal climate control of a storage facility using local weather forecasts. Control Eng. Pract. 11(5), 505Ð516 (2003) [KR76] R.L. Kashyap, A.R. Rao, Dynamic Stochastic Models from Empirical Data (Aca- demic Press, San Diego, 1976) [KS72] H. Kwakernaak, R. Sivan, Linear Optimal Control Systems (Wiley-Interscience, New York, 1972) [KS73] R.E. Kalaba, K. Spingarn, Optimal inputs and sensitivities for parameter estima- tion. J. Optim. Theory Appl. 11(1), 56Ð67 (1973) [KS02] K.J. Keesman, J.D. Stigter, Optimal parametric sensitivity control for the estima- tion of kinetic parameters in bioreactors. Math. Biosci. 179, 95Ð111 (2002) [KS03] K.J. Keesman, J.D. Stigter, Optimal input design for low-dimensional systems: an haldane kinetics example, in Proceedings of the European Control Conference, Cambridge, UK (2003), p. 268 [KS04] K.J. Keesman, R. Stappers, Nonlinear set-membership estimation: a support vec- tor machine approach. J. Inverse Ill-Posed Probl. 12(1), 27Ð41 (2004) [Kur77] A.B. Kurzhanski, Control and Observation Under Uncertainty (Nauka, Moscow, 1977) (in Russian) [KvS89] K.J. Keesman, G. van Straten, Identification and prediction propagation of uncer- tainty in models with bounded noise. Int. J. Control 49(6), 2259Ð2269 (1989) [LB93] D. Ljungquist, J.G. Balchen, Recursive Prediction Error Methods for Online Esti- mation in Nonlinear State-space Models, in Proceedings of the IEEE Conference on Decision and Control, vol. 1 (1993), pp. 714Ð719 [LB07] Z. Lin, M.B. Beck, On the identification of model structure in hydrological and environmental systems. Water Resources Research 43(2) (2007) [LCB+07] L. Lang, W.-S. Chen, B.R. Bakshi, P.K. Goel, S. Ungarala, Bayesian estimation via sequential Monte Carlo sampling-constrained dynamic systems. Automatica 43(9), 1615Ð1622 (2007) [Lee64] R.C.K. Lee, Optimal Identification, Estimation and Control (MIT Press, Cam- bridge, 1964) [Lev64] M.J. Levin, Estimation of a system pulse transfer function in the presence of noise. IEEE Trans. Autom. Control 9, 229Ð335 (1964) [LG94] L. Ljung, T. Glad, Modeling of Dynamic Systems (Prentice Hall, New York, 1994) [LG97] L. Ljung, L. Guo, The role of model validation for assessing the size of the un- modeled dynamics. IEEE Trans. Autom. Control 42(9), 1230Ð1239 (1997) [LG09] T. Liu, F. Gao, A generalized relay identification method for time delay and non- minimum phase processes. Automatica 45(4), 1072Ð1079 (2009) [Liu94] J.S. Liu, Monte Carlo Strategies in Scientific Computing (Springer, New York, 1994) [Lju81] L. Ljung, Analysis of a general recursive prediction error identification algorithm. Automatica 17(1), 89Ð99 (1981) [Lju87] L. Ljung, System Identification—Theory for the User (Prentice Hall, New York, 1987) 310 References

[Lju99a] L. Ljung, Comments on model validation as set membership identification, in Ro- bustness in Identification and Control. Lecture Notes in Control and Information Sciences, vol. 245 (Springer, Berlin, 1999) [Lju99b] L. Ljung, System Identification—Theory for the User, 2nd edn. (Prentice Hall, New York, 1999) [LKvS96] L.J.S. Lukasse, K.J. Keesman, G. van Straten, Grey-box identification of dis- solved oxygen dynamics in an activated sludge process, in Proceedings of the 13th IFAC World Congress, San Francisco, USA, vol. N (1996), pp. 485Ð490 [LKvS99] L.J.S. Lukasse, K.J. Keesman, G. van Straten, A recursively identified model for short-term predictions of NH4/NO3-concentrations in alternating activated sludge processes (1999) [LL96] W. Li, J.H. Lee, Frequency-domain closed-loop identification of multivariable systems for feedback control. AIChE J. 42(10), 2813Ð2827 (1996) [LP96] L.H. Lee, K. Poolla, Identification of linear parameter-varying systems via LFTs, in Proceedings of the IEEE Conference on Decision and Control, vol. 2 (1996), pp. 1545Ð1550 [LP99] L.H. Lee, K. Poolla, Identification of linear parameter-varying systems using non- linear programming. J. Dyn. Syst. Meas. Control, Trans. ASME 121(1), 71Ð78 (1999) [LS83] L. Ljung, T. Söderström, Theory and Practice of Recursive Identification (MIT Press, Cambridge, 1983) [Luo07] B. Luo, A dynamic method of experiment design of computer aided sensory eval- uation. Adv. Soft Comput. 41, 504Ð510 (2007) [Maj73] J.C. Majithia, Recursive estimation of the mean value of a random variable using quantized data. IEEE Trans. Instrum. Meas. 22(2), 176Ð177 (1973) [Mar87] S.L. Marple, Digital Spectral Analysis with Applications (Prentice-Hall, New York, 1987) [May63] D.Q. Mayne, Optimal non-stationary estimation of the parameters of a linear sys- tem with Gaussian inputs. J. Electron. Control 14, 101Ð112 (1963) [May79] P.S. Maybeck, Stochastic Models, Estimation and Control, vol. 1 (Academic Press, San Diego, 1979) [MB82] M. Milanese, G. Belforte, Estimation theory and uncertainty intervals evaluation in presence of unknown but bounded errors. IEEE Trans. Autom. Control AC 27(2), 408Ð414 (1982) [MB86] J.B. Moore, R.K. Boel, Asymptotically optimum recursive prediction error meth- ods in adaptive estimation and control. Automatica 22(2), 237Ð240 (1986) [MB00] K.Z. Mao, S.A. Billings, Multi-directional model validity tests for non-linear sys- tem identification. Int. J. Control 73(2), 132Ð143 (2000) [MCS08] J. Mertl, M. Cech, M. Schlegel, One point relay identification experiment based on constant-phase filter, in 8th International Scientific Technical Conference PROCESS CONTROL 2008, Kouty nad Desnou, Czech Republic, vol. C037 (2008), pp. 1Ð9 [MF95] J.F. MacGregor, D.T. Fogal, Closed-loop identification: the role of the noise model and prefilters. J. Process Control 5(3), 163Ð171 (1995) [MG86] R.H. Middleton, G.C. Goodwin, Improved finite word length characteristics in digital control using delta operators. IEEE Trans. Autom. Control ACÐ31(11), 1015Ð1021 (1986) [MG90] R.H. Middleton, G.C. Goodwin, Digital Control and Estimation: A Unified Ap- proach. (Prentice Hall, New York, 1990) [Mil95] M. Milanese, Properties of least-squares estimates in set membership identifica- tion. Automatica 31, 327Ð332 (1995) [MN95] J.C. Morris, M.P. Newlin, Model validation in the frequency domain, in Pro- ceedings of the IEEE Conference on Decision and Control, vol. 4 (1995), pp. 3582Ð3587 References 311

[MNPLE96] M. Milanese, J.P. Norton, H. Piet-Lahanier, E. Walter (eds.), Bounding Ap- proaches to System Identification (Plenum, New York, 1996) [MPV06] D.C. Montgomery, E.A. Peck, G.G. Vining, Introduction to Linear Regression Analysis, 4th edn. Wiley Series in Probability and Statistics (Wiley, New York, 2006) [MR97] C. Maffezzoni, P. Rocco, Robust tuning of PID regulators based on step-response identification. Eur. J. Control 3(2), 125Ð136 (1997) [MRCW01] G. Margaria, E. Riccomagno, M.J. Chappell, H.P. Wynn, Differential algebra methods for the study of the structural identifiability of rational function state- space models in the biosciences. Math. Biosci. 174(1), 1Ð26 (2001) [MV91a] M. Milanese, A. Vicino, Optimal estimation theory for dynamic systems with set membership uncertainty: an overview. Automatica 27(6), 997Ð1009 (1991) [MV91b] M. Moonen, J. Vandewalle, A square root covariance algorithm for constrained recursive least squares estimation. J. VLSI Signal Process. 3(3), 163Ð172 (1991) [MVL78] C. Moler, C. Van Loan, Nineteen dubious ways to compute the exponential of a matrix. SIAM Rev. 20(4), 801Ð836 (1978) [MW79] J.B. Moore, H. Weiss, Recursive prediction error methods for adaptive estimation. IEEE Trans. Syst. Man Cybern. 9(4), 197Ð205 (1979) [MWDM02] I. Markovsky, J.C. Willems, B. De Moor, Continuous-time errors-in-variables filtering, in Proceedings of the IEEE Conference on Decision and Control,vol.3 (2002), pp. 2576Ð2581 [NGS77] T.S. Ng, G.C. Goodwin, T. Söderström, Optimal experiment design for linear systems with input-output constraints. Automatica 13(6), 571Ð577 (1977) [Nin09] B. Ninness, Some system identification challenges and approaches, in 15th IFAC Symposium on System Identification, St. Malo, France (2009) [Nor75] J.P. Norton, Optimal smoothing in the identification of linear time-varying sys- tems. Proc. Inst. Electr. Eng. 122(6), 663Ð669 (1975) [Nor76] J.P. Norton, Identification by optimal smoothing using integrated random walks. Proc. Inst. Electr. Eng. 123(5), 451Ð452 (1976) [Nor86] J.P. Norton, An Introduction to Identification (Academic Press, San Diego, 1986) [Nor87] J.P. Norton, Identification and application of bounded-parameter models. Auto- matica 23(4), 497Ð507 (1987) [Nor03] J.P. Norton, Bound-based Identification: linear-model case, in Encyclopedia of Life Science Systems Article 6.43.11.2, ed. by H. Unbehauen. UNESCO EOLSS (2003) [NW82] V.V. Nguyen, E.F. Wood, Review and unification of linear identifiability concepts. SIAM Rev. 24(1), 34Ð51 (1982) [OFOFDA96] R. Oliveira, E.C. Ferreira, F. Oliveira, S. Feyo De Azevedo, A study on the conver- gence of observer-based kinetics estimators in stirred tank bioreactors. J. Process Control 6(6), 367Ð371 (1996) [OWG04] S. Ognier, C. Wisniewski, A. Grasmick, Membrane bioreactor fouling in sub- critical filtration conditions: a local critical flux concept. J. Membr. Sci. 229, 171Ð 177 (2004) [Paw91] M. Pawlak, On the series expansion approach to the identification of Hammer- stein systems. IEEE Trans. Autom. Control 36(6), 763Ð767 (1991) [PC07] G. Pillonetto, C. Cobelli, Identifiability of the stochastic semi-blind deconvolu- tion problem for a class of time-invariant linear systems. Automatica 43(4), 647Ð 654 (2007) [PDAFD00] M. Perrier, S.F. De Azevedo, E.C. Ferreira, D. Dochain, Tuning of observer-based estimators: Theory and application to the on-line estimation of kinetic parameters. Control Eng. Pract. 8(4), 377Ð388 (2000) [Pet75] V. Peterka, Square root filter for real time multivariate regression. 
Kybernetika 11(1), 53–67 (1975) [PH05] R.L.M. Peeters, B. Hanzon, Identifiability of homogeneous systems using the state isomorphism approach. Automatica 41(3), 513–529 (2005)

[Phi73] P.C.B. Phillips, The problem of identification in finite parameter continuous time models. J. Econom. 1(4), 351Ð362 (1973) [PS97] R. Pintelon, J. Schoukens, Frequency-domain identification of linear timeinvari- ant systems under nonstandard conditions. IEEE Trans. Instrum. Meas. 46(1), 65Ð71 (1997) [PS01] R. Pintelon, J. Schoukens, System Identification: A Frequency Domain Approach (WileyÐIEEE Press, New York, 2001) [PW88] L. Pronzato, E. Walter, Robust experiment design via maximin optimization. Math. Biosci. 89(2), 161Ð176 (1988) [PW98] J.W. Polderman, J.C. Willems, Introduction to Mathematical Systems Theory: A Behavioral Approach (Springer, Berlin, 1998) [QN82] Z.H. Qureshi, T.S. Ng, Optimal input design for dynamic system parameter esti- mation: the d//s-optimality case. SIAM J. Control Optim. 20(5), 713Ð721 (1982) [Rak80] H. Rake, Step response and frequency-response methods. Automatica 16(5), 519Ð 526 (1980) [RH78] J. Rowland, W. Holmes, Nonstationary signal processing and model validation. IEEE Int. Conf. Acoust. Speech Signal Proc. 3, 520Ð523 (1978) [RSP97] Y. Rolain, J. Schoukens, R. Pintelon, Order estimation for linear time-invariant systems using frequency domain identification methods. IEEE Trans. Autom. Control 42(10), 1408Ð1417 (1997) [RU99] K. Reif, R. Unbehauen, The extended Kalman filter as an exponential observer for nonlinear systems. IEEE Trans. Signal Process. 47(8), 2324Ð2328 (1999) [RWGF07] C.R. Rojas, J.S. Welsh, G.C. Goodwin, A. Feuer, Robust optimal experiment de- sign for system identification. Automatica 43(6), 993Ð1008 (2007) [SAD03] M.P. Saccomani, S. Audoly, L. D’Angiò, Parameter identifiability of nonlinear systems: The role of initial conditions. Automatica 39(4), 619Ð632 (2003) [Sak61] M. Sakaguchi, Dynamic programming of some sequential sampling design. J. Math. Anal. Appl. 2(3), 446Ð466 (1961) [Sak65] D.J. Sakrison, Efficient recursive estimation; application to estimating the param- eters of a covariance function. Int. J. Eng. Sci. 3(4), 461Ð483 (1965) [SAML80] G. Salut, J. Aguilar-Martin, S. Lefebvre, New results on optimal joint parameter and state estimation of linear stochastic systems. J. Dyn. Syst. Meas. Control, Trans. ASME 102(1), 28Ð34 (1980) [Sar84] R.G. Sargent, Tutorial on verification and validation of simulation models, in Win- ter Simulation Conference Proceedings (1984), pp. 115Ð121 [SB94] J.D. Stigter, M.B. Beck, A new approach to the identification of model structure. Environmetrics 5(3), 315Ð333 (1994) [SB04] J.D. Stigter, M.B. Beck, On the development and application of a continuous- discrete recursive prediction error algorithm. Math. Biosci. 191(2), 143Ð158 (2004) [SBD99] R. Smith, A. Banaszuk, G. Dullerud, Model validation approaches for nonlinear feedback systems using frequency response measurements, in Proceedings of the IEEE Conference on Decision and Control, vol. 2 (1999), pp. 1500Ð1504 [SC97] G. Sparacino, C. Cobelli, Impulse response model in reconstruction of insulin secretion by deconvolution: role of input design in the identification experiment. Ann. Biomed. Eng. 25(2), 398Ð416 (1997) [Sch73] F.C. Schweppe, Uncertain Dynamic Systems (Prentice-Hall, New York, 1973) [SD98] W. Scherrer, M. Deistler, A structure theory for linear dynamic errors-in-variables models. SIAM J. Control Optim. 36(6), 2148Ð2175 (1998) [SGM88] M.E. Salgado, G.C. Goodwin, R.H. Middleton, Modified least squares algorithm incorporating exponential resetting and forgetting. Int. J. Control 47(2), 477Ð491 (1988) [SGR+00] A. Stenman, F. 
Gustafsson, D.E. Rivera, L. Ljung, T. McKelvey, On adaptive smoothing of empirical transfer function estimates. Control Eng. Pract. 9, 1309–1315 (2000)

[She95] S. Sheikholeslam, Observer-based parameter identifiers for nonlinear systems with parameter dependencies. IEEE Trans. Autom. Control 40(2), 382Ð387 (1995) [SK01] J.D. Stigter, K.J. Keesman, Optimal parametric sensitivity control for a fed batch reactor, in Proceedings of the European Control Conference 2001,Porto,Portu- gal, 4Ð7 Sep. 2001, pp. 2841Ð2844 [SK04] J.D. Stigter, K.J. Keesman, Optimal parametric sensitivity control of a fed-batch reactor. Automatica 40(8), 1459Ð1464 (2004) [SL03] S.W. Sung, J.H. Lee, Pseudo-random binary sequence design for finite impulse response identification. Control Eng. Pract. 11(8), 935Ð947 (2003) [SM97] P. Stoica, R.L. Moses, Introduction to Spectral Analysis (Prentice-Hall, New York, 1997) [SMMH94] P. Sadegh, H. Melgaard, H. Madsen, J. Holst, Optimal experiment design for identification of grey-box models, in Proceedings of the American Control Con- ference, vol. 1 (1994), pp. 132Ð137 [Söd07] T. Söderström, Errors-in-variables methods in system identification. Automatica 43(6), 939Ð958 (2007) [Söd08] T. Söderström, Extending the Frisch scheme for errors-in-variables identification to correlated output noise. Int. J. Adapt. Control Signal Process. 22(1), 55Ð73 (2008) [Sor80] H.W. Sorenson, Parameter Estimation (Dekker, New York, 1980) [Sor85] H.W. Sorenson, Kalman Filtering: Theory and Application (IEEE Press, New York, 1985) [SOS00] L. Sun, H. Ohmori, A. Sano, Frequency domain approach to closed-loop identi- fication based on output inter-sampling scheme, in Proceedings of the American Control Conference, vol. 3 (2000), pp. 1802Ð1806 [SR77] M.W.A. Smith, A.P. Roberts, A study in continuous time of the identification of initial conditions and/or parameters of deterministic system by means of a Kalman-type filter. Math. Comput. Simul. 19(3), 217Ð226 (1977) [SR79] M.W.A. Smith, A.P. Roberts, The relationship between a continuous-time iden- tification algorithm based on the deterministic filter and least-squares methods. Inf. Sci. 19(2), 135Ð154 (1979) [SS83] T. Söderström, P.G. Stoica, Instrumental Variable Methods for System Identifica- tion (Springer, Berlin, 1983) [SS87] T. Söderström, P.G. Stoica, System Identification (Prentice Hall, New York, 1987) [SSM02] T. Söderström, U. Soverini, K. Mahata, Perspectives on errors-in-variables esti- mation for dynamic systems. Signal Process. 82(8), 1139Ð1154 (2002) [SVK06] J.D. Stigter, D. Vries, K.J. Keesman, On adaptive optimal input design: a biore- actor case study. AIChE J. 52(9), 3290Ð3296 (2006) [SVPG99] J. Schoukens, G. Vandersteen, R. Pintelon, P. Guillaume, Frequency-domain iden- tification of linear systems using arbitrary excitations and a nonparametric noise model. IEEE Trans. Autom. Control 44(2), 343Ð347 (1999) [THvdH09] R. Tóth, P.S.C. Heuberger, P.M.J. van den Hof, Asymptotically optimal orthonor- mal basis functions for LPV system identification. Automatica 45(6), 1359Ð1370 (2009) [TK75] H. Thoem, V. Krebs, Closed loop identification—correlation analysis or parame- ter estimation [Identifizierung im geschlossenen Regelkreis Ð Korrelationsanalyse oder Parameterschaetzung?]. Regelungstechnik 23(1), 17Ð19 (1975) [TLH+06] K.K. Tan, T.H. Lee, S. Huang, K.Y. Chua, R. Ferdous, Improved critical point estimation using a preload relay. J. Process Control 16(5), 445Ð455 (2006) [TM03] D. Treebushny, H. Madsen, A new reduced rank square root Kalman filter for data assimilation in mathematical models. Lect. Notes Comput. Sci. 
2657, 482–491 (2003) (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

[TOS98] M. Takahashi, H. Ohmori, A. Sano, Impulse response identification by use of wavelet packets decomposition, in Proceedings of the IEEE Conference on Deci- sion and Control, vol. 1 (1998), pp. 211Ð214 [Tur85] J.M. Turner, Recursive Least-squares Estimation and Lattice Filters (Prentice- Hall, New York, 1985) [TV72] R. Tomovic, M. Vukobratovic, General Sensitivity Theory (American Elsevier, New York, 1972) [TY90] A.P. Tzes, S. Yurkovich, Frequency domain identification scheme for flexible structure control. J. Dyn. Syst. Meas. Control, Trans. ASME 112(3), 427Ð434 (1990) [vBGKS98] J. van Bergeijk, D. Goense, K.J. Keesman, B. Speelman, Digital filters to integrate global positioning system and dead reckoning. J. Agric. Eng. Res. 70, 135Ð143 (1998) [VD92] M. Verhaegen, P. Dewilde, Subspace model identification. Part 1: The output- error state-space model identification class of algorithms. Int. J. Control 56, 1187Ð1210 (1992) [vdH98] J.M. van den Hof, Structural identifiability of linear compartmental systems. IEEE Trans. Autom. Control 43(6), 800Ð818 (1998) [vdHPB95] P.M.J. van den Hof, P.S.C. Heuberger, J. Bokor, System identification with gen- eralized orthonormal basis functions. Automatica 31(12), 1821Ð1834 (1995) [Ver89] M.H. Verhaegen, Round-off error propagation in four generally-applicable, recur- sive, least-squares estimation schemes. Automatica 25(3), 437Ð444 (1989) [VGR89] S. Vajda, K.R. Godfrey, H. Rabitz, Similarity transformation approach to identi- fiability analysis of nonlinear compartmental models. Math. Biosci. 93(2), 217Ð 248 (1989) [VH97] M. Verlaan, A.W. Heemink, Tidal flow forecasting using reduced rank square root filters. Stoch. Environ. Res. Risk Assess. 11(5), 349Ð368 (1997) [VH05] I. Vajk, J. Hetthéssy, Subspace identification methods: review and re- interpretation, in Proceedings of the 5th International Conference on Control and Automation, ICCA’05 (2005), pp. 113Ð118 [VHMVS07] S. Van Huffel, I. Markovsky, R.J. Vaccaro, T. Söderström, Total least squares and errors-in-variables modeling. Signal Process. 87(10), 2281Ð2282 (2007) [VHV89] S. Van Huffel, J. Vandewalle, Analysis and properties of the generalized total least squares problem Ax ≈ b when some or all columns in A are subject to error. SIAM J. Matrix Anal. Appl. 10, 294Ð315 (1989) [Vib95] M. Viberg, Subspace-based methods for the identification of linear time-invariant systems. Automatica 31(12), 1835Ð1851 (1995) [VKZ06] D. Vries, K.J. Keesman, H. Zwart, Explicit linear regressive model structures for estimation, prediction and experimental design in compartmental diffusive systems, in Proceedings of the 14th IFAC Symposium on System Identification, Newcastle, Australia (2006), pp. 404Ð409 [vOdM95] P. van Overschee, B. de Moor, Choice of state-space basis in combined deterministic-stochastic subspace identification. Automatica 31(12), 1877Ð1883 (1995) [Vri08] D. Vries, Estimation and prediction of convection-diffusion-reaction systems from point measurements. Ph.D. thesis, Systems & Control, Wageningen Uni- versity (2008) [vRLS73] D.L. van Rooy, M.S. Lynn, C.H. Snyder, The use of the modified Choleski de- composition in divergence and classification calculations, in LARS Symposia, Pa- per 22 (1973) [vS94] J.H. van Schuppen, Stochastic realization of a Gaussian stochastic control system. J. Acta Appl. Math. 35(1Ð2), 193Ð212 (1994) [VS04] J.H. Van Schuppen, System theory for system identification. J. Econom. 118(1Ð2), 313Ð339 (2004) References 315

[vSK91] G. van Straten, K.J. Keesman, Uncertainty propagation and speculation in projec- tive forecasts of environmental change—a lake eutrophication example. J. Fore- cast. 10(2Ð10), 163Ð190 (1991) [VV02] V. Verdult, M. Verhaegen, Subspace identification of multivariable linear parameter-varying systems. Automatica 38(5), 805Ð814 (2002) [VV07] M. Verhaegen, V. Verdult, Filtering and System Identification: A Least Squares Approach (Cambridge University Press, Cambridge, 2007) [Wal82] E. Walter, Identifiability of State Space Models. Lecture Notes in Biomathemat- ics, vol. 46. (Springer, Berlin, 1982) [Wal03] E. Walter, Bound-based Identification, in Encyclopedia of Life Science Systems Article 6.43.11.2, ed. by H. Unbehauen. UNESCO EOLSS (2003) [WC97] L. Wang, W.R. Cluett, Frequency-sampling filters: an improved model structure for step-response identification. Automatica 33(5), 939Ð944 (1997) [Wel77] P.E. Wellstead, Reference signals for closed-loop identification. Int. J. Control 26(6), 945Ð962 (1977) [Wel81] P.E. Wellstead, Non-parametric methods of system identification. Automatica 17, 55Ð69 (1981) [WG04] E. Wernholt, S. Gunnarsson, On the use of a multivariable frequency response es- timation method for closed loop identification, in Proceedings of the IEEE Con- ference on Decision and Control, vol. 1 (2004), pp. 827Ð832 [Whi70] R.C. White, Fast digital computer method for recursive estimation of the mean. IEEE Trans. Comput. 19(9), 847Ð848 (1970) [Wig93] T. Wigren, Recursive prediction error identification using the nonlinear Wiener model. Automatica 29(4), 1011Ð1025 (1993) [Wil86a] J.C. Willems, From time series to linear system. Part I. Finite dimensional linear time invariant systems. Automatica 22, 561Ð580 (1986) [Wil86b] J.C. Willems, From time series to linear system. Part II. Exact modelling. Auto- matica 22, 675Ð694 (1986) [Wil87] J.C. Willems, From time series to linear system. Part III. Approximate modelling. Automatica 23, 87Ð115 (1987) [WP87] E. Walter, L. Pronzato, Optimal experiment design for nonlinear models subject to large prior uncertainties. Am. J. Physiol., Regul. Integr. Comp. Physiol. 253(3), 22Ð23 (1987) [WP90] E. Walter, L. Pronzato, Qualitative and quantitative experiment design for phe- nomenological models—a survey. Automatica 26(2), 195Ð213 (1990) [WP97] E. Walter, L. Pronzato, Identification of Parametric Models from Experimental Data. Communications and Control Engineering (Springer, Berlin, 1997) [WZG01] Q.-G. Wang, Y. Zhang, X. Guo, Robust closed-loop identification with applica- tion to auto-tuning. J. Process Control 11(5), 519Ð530 (2001) [WZL09] J. Wang, Q. Zhang, L. Ljung, Revisiting Hammerstein system identification through the two-stage algorithm for bilinear parameter estimation. Automatica 45(11), 2627Ð2633 (2009) [YB94] P.C. Young, K.J. Beven, Data-based mechanistic modelling and the rainfall-flow non-linearity. Environmetrics 5(3), 335Ð363 (1994) [YG06] P.C. Young, H. Garnier, Identification and estimation of continuous-time, data- based mechanistic (dbm) models for environmental systems. Environ. Model. Softw. 21(8), 1055Ð1072 (2006) [You84] P.C. Young, Recursive Estimation and Time-series Analysis: An Introduction. Communications and Control Engineering (Springer, Berlin, 1984) [You98] P. Young, Data-based mechanistic modelling of environmental, ecological, eco- nomic and engineering systems. Environ. Model. Softw. 13(2), 105Ð122 (1998) [YST97] Z.-J. Yang, S. Sagara, T. 
Tsuji, System impulse response identification using a multiresolution neural network. Automatica 33(7), 1345–1350 (1997) [Zar79] M.B. Zarrop, Optimal Experiment Design for Dynamic System Identification (Springer, Berlin, 1979)

[Zar81] M.B. Zarrop, Sequential generation of d-optimal input designs for linear dynamic systems. J. Optim. Theory Appl. 35(2), 277–291 (1981) [Zhu05] Q.M. Zhu, An implicit least squares algorithm for nonlinear rational model parameter estimation. Appl. Math. Model. 29(7), 673–689 (2005) [ZT00] Y. Zhou, J.K. Tugnait, Closed-loop linear model validation and order estimation using polyspectral analysis. IEEE Trans. Signal Process. 48(7), 1965–1974 (2000) [Zwa04] H.J. Zwart, Transfer functions for infinite-dimensional systems. Syst. Control Lett. 52(3–4), 247–255 (2004) [ZWR91] A. Zakhor, R. Weisskoff, R. Rzedzian, Optimal sampling and reconstruction of MRI signals resulting from sinusoidal gradients. IEEE Trans. Signal Process. 39(9), 2056–2065 (1991)

Index
