
CSC 576: Mathematical Foundations I
Ji Liu
Department of Computer Sciences, University of Rochester
September 20, 2016

1 Notations and Assumptions

In most cases (unless there is a local definition), we use
• Greek letters such as $\alpha$, $\beta$, and $\gamma$ to denote real numbers;
• lowercase letters such as $x$, $y$, and $z$ to denote vectors;
• capital letters such as $A$, $B$, and $C$ to denote matrices.

Other notations:
• $\mathbb{R}$ is the one-dimensional Euclidean space;
• $\mathbb{R}^n$ is the $n$-dimensional vector Euclidean space;
• $\mathbb{R}^{m \times n}$ is the $m \times n$-dimensional matrix Euclidean space;
• $\mathbb{R}_+$ denotes the range $[0, +\infty)$;
• $\mathbf{1}_n \in \mathbb{R}^n$ denotes the vector with $1$ in all entries;
• for any vector $x \in \mathbb{R}^n$, $|x|$ denotes the absolute vector, that is, $|x|_i = |x_i|$ for all $i = 1, \cdots, n$;
• $\odot$ denotes the componentwise product, that is, for any vectors $x$ and $y$, $(x \odot y)_i = x_i y_i$.

Some assumptions:
• Unless explicitly (locally) defined otherwise, we always assume that all vectors are column vectors.

2 Vector norms, Inner product

A function $f : \mathbb{R}^n \to \mathbb{R}_+$ is called a "norm" if the following three conditions are satisfied:
• (Zero element) $f(x) \geq 0$, and $f(x) = 0$ if and only if $x = 0$;
• (Homogeneity) for any $\alpha \in \mathbb{R}$ and $x \in \mathbb{R}^n$, $f(\alpha x) = |\alpha| f(x)$;
• (Triangle inequality) any $x, y \in \mathbb{R}^n$ satisfy $f(x) + f(y) \geq f(x + y)$.

The $\ell_2$ norm $\|\cdot\|_2$ (a special "$f(\cdot)$") on $\mathbb{R}^n$ is defined as
\[
\|x\|_2 = \left(|x_1|^2 + |x_2|^2 + \cdots + |x_n|^2\right)^{1/2}.
\]
Because the $\ell_2$ norm is the most commonly used norm (also known as the Euclidean norm), we sometimes denote it as $\|\cdot\|$ for short. (Think about it: how about $f([x_1, x_2]) = 2x_1^2 + x_2^2$?)

The general $\ell_p$ norm ($p \geq 1$) is defined as
\[
\|x\|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p}.
\]
Note that for $p < 1$ it is not a "norm", since the triangle inequality is violated. The $\ell_\infty$ norm is defined as
\[
\|x\|_\infty = \max\{|x_1|, |x_2|, \cdots, |x_n|\}.
\]
One may notice that the $\ell_\infty$ norm is the limit of the $\ell_p$ norms, that is, for any $x \in \mathbb{R}^n$, $\|x\|_\infty = \lim_{p \to +\infty} \|x\|_p$. In addition, people use $\|x\|_0$ to denote the $\ell_0$ "norm" (the number of nonzero entries of $x$, which is not a true norm).

The inner product $\langle \cdot, \cdot \rangle$ on $\mathbb{R}^n$ is defined as
\[
\langle x, y \rangle = \sum_i x_i y_i.
\]
One can show that $\langle x, x \rangle = \|x\|^2$. Two vectors $x$ and $y$ are orthogonal if $\langle x, y \rangle = 0$. That is one reason why the $\ell_2$ norm is so special.

If $p \geq q$, then for any $x \in \mathbb{R}^n$ we have $\|x\|_p \leq \|x\|_q$. In particular,
\[
\|x\|_1 \geq \|x\|_2 \geq \|x\|_\infty.
\]
To bound from the other side, we have
\[
\|x\|_1 \leq \sqrt{n}\, \|x\|_2, \qquad \|x\|_2 \leq \sqrt{n}\, \|x\|_\infty.
\]
Proof. To see the first one, we have
\[
\|x\|_1 = \langle \mathbf{1}_n, |x| \rangle \leq \|\mathbf{1}_n\|_2 \, \| |x| \|_2 = \sqrt{n}\, \|x\|_2,
\]
where the inequality uses the Cauchy–Schwarz inequality. The proof of the second inequality is left for your homework.

Given a norm $\|\cdot\|_A$, its dual norm is defined as
\[
\|x\|_{A^*} = \max_{\|y\|_A \leq 1} \langle x, y \rangle = \max_{\|y\|_A = 1} \langle x, y \rangle = \max_{z \neq 0} \frac{\langle x, z \rangle}{\|z\|_A}.
\]
Several important properties of the dual norm:
• the dual norm's dual norm is itself, that is, $\|x\|_{(A^*)^*} = \|x\|_A$;
• the $\ell_2$ norm is self-dual, that is, the dual norm of the $\ell_2$ norm is still the $\ell_2$ norm;
• the dual norm of the $\ell_p$ norm ($p \geq 1$) is the $\ell_q$ norm, where $p$ and $q$ satisfy $1/p + 1/q = 1$; in particular, the $\ell_1$ norm and the $\ell_\infty$ norm are dual to each other;
• (Hölder's inequality) $\langle x, y \rangle \leq \|x\|_A \|y\|_{A^*}$.
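The norm facts above are easy to sanity-check numerically. Here is a minimal NumPy sketch (an illustration added to these notes, not part of the original); the random test vector, the seed, and the choice of $p = 50$ as a stand-in for $p \to \infty$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
x = rng.standard_normal(n)

l1 = np.linalg.norm(x, 1)         # ||x||_1
l2 = np.linalg.norm(x, 2)         # ||x||_2
linf = np.linalg.norm(x, np.inf)  # ||x||_inf

# Ordering of the l_p norms: ||x||_1 >= ||x||_2 >= ||x||_inf.
assert l1 >= l2 >= linf

# Reverse bounds: ||x||_1 <= sqrt(n) ||x||_2 and ||x||_2 <= sqrt(n) ||x||_inf.
assert l1 <= np.sqrt(n) * l2 and l2 <= np.sqrt(n) * linf

# ||x||_inf is the limit of ||x||_p as p grows.
print(np.linalg.norm(x, 50), linf)  # nearly equal

# l_1 / l_inf duality: max_{||y||_inf <= 1} <x, y> is attained at y = sign(x)
# and equals ||x||_1.
assert np.isclose(np.dot(x, np.sign(x)), l1)
```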
3 Linear space, subspace, linear transformation

A set $S$ is a linear space if
• $0 \in S$;
• given any two points $x \in S$, $y \in S$ and any two scalars $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R}$, we have $\alpha x + \beta y \in S$.

Note that $\emptyset$ is not a linear space. Examples: the vector space $\mathbb{R}^n$ and the matrix space $\mathbb{R}^{m \times n}$. How about the following sets?
• $\emptyset$; (no)
• $\{0\}$; (yes)
• $\{x \mid Ax = b\}$, where $A$ is a matrix and $b$ is a vector. (if $b = 0$, yes; otherwise, no)

Let $S$ be a linear space. A set $S'$ is a subspace if $S'$ is a linear space and also a subset of $S$. Actually, "subspace" is equivalent to "linear space", because any subspace is a linear space and any linear space is a subspace. They are indeed talking about the same thing.

Let $S$ be a linear space. A function $L(\cdot)$ is a linear transformation if, given any two points $x, y \in S$ and two scalars $\alpha \in \mathbb{R}$ and $\beta \in \mathbb{R}$, one has
\[
L(\alpha x + \beta y) = \alpha L(x) + \beta L(y).
\]
For vector spaces, there exists a one-to-one correspondence between linear transformations and matrices. Therefore, we can simply say that "a matrix is a linear transformation". Exercises:
• prove that $\{L(x) \mid x \in S\}$ is a linear space if $S$ is a linear space and $L$ is a linear transformation;
• prove that $\{x \mid L(x) \in S\}$ is a linear space, assuming $S$ is a linear space and $L$ is a linear transformation.

How to express a subspace? The most intuitive way is to use a bunch of vectors. A subspace can be expressed as
\[
\mathrm{span}\{x_1, x_2, \cdots, x_n\} = \left\{ \sum_{i=1}^n \alpha_i x_i \;\middle|\; \alpha_i \in \mathbb{R} \right\} = \{ X\alpha \mid \alpha \in \mathbb{R}^n \},
\]
which is called the range space of the matrix $X = [x_1, x_2, \cdots, x_n]$. A subspace can also be represented by the null space of $X$:
\[
\{ \alpha \mid X\alpha = 0 \}.
\]

4 Eigenvalues / eigenvectors, rank, SVD, inverse

The transpose of a matrix $A \in \mathbb{R}^{m \times n}$ is the matrix $A^T \in \mathbb{R}^{n \times m}$ defined by
\[
(A^T)_{ij} = A_{ji}.
\]
One can verify that $(AB)^T = B^T A^T$.

A matrix $B \in \mathbb{R}^{n \times n}$ is the inverse of an invertible matrix $A \in \mathbb{R}^{n \times n}$ if
\[
AB = I \quad \text{and} \quad BA = I.
\]
$B$ is denoted as $A^{-1}$. That $A$ has an inverse is equivalent to $A$ having full rank (the definition of "rank" will be clear very soon). Note that the inverse of a matrix is unique. One can also verify that if both $A$ and $B$ are invertible, then
\[
(AB)^{-1} = B^{-1} A^{-1}.
\]
The "transpose" and the "inverse" are exchangeable:
\[
(A^T)^{-1} = (A^{-1})^T.
\]
When we write $A^{-1}$, we have to make sure that $A$ is invertible.

Given a square matrix $A \in \mathbb{R}^{n \times n}$, $x \in \mathbb{R}^n$ ($x \neq 0$) is called an eigenvector of $A$ and $\lambda \in \mathbb{R}$ is called its eigenvalue if the following relationship is satisfied:
\[
Ax = \lambda x.
\]
(The effect of applying the linear transformation $A$ to $x$ is nothing but scaling it.) Note that
• if $\{\lambda, x\}$ is an eigenvalue–eigenvector pair, then so is $\{\lambda, \alpha x\}$ for any $\alpha \neq 0$;
• one eigenvalue may correspond to multiple different eigenvectors, where "different" means the eigenvectors are different after normalization.

If the matrix $A$ is symmetric, then any two eigenvectors corresponding to different eigenvalues are orthogonal. That is, if $A^T = A$, $Ax_1 = \lambda_1 x_1$, $Ax_2 = \lambda_2 x_2$, and $\lambda_1 \neq \lambda_2$, then
\[
x_1^T x_2 = 0.
\]
Proof. Consider $x_1^T A x_2$. On the one hand,
\[
x_1^T A x_2 = x_1^T (A x_2) = x_1^T (\lambda_2 x_2) = \lambda_2 x_1^T x_2,
\]
and on the other hand,
\[
x_1^T A x_2 = (x_1^T A) x_2 = (A^T x_1)^T x_2 \overset{A = A^T}{=} (A x_1)^T x_2 = \lambda_1 x_1^T x_2.
\]
Therefore, $\lambda_2 x_1^T x_2 = \lambda_1 x_1^T x_2$. Since $\lambda_1 \neq \lambda_2$, we obtain $x_1^T x_2 = 0$.

A matrix $A \in \mathbb{R}^{m \times n}$ is a "rank-1" matrix if $A$ can be expressed as
\[
A = x y^T,
\]
where $x \in \mathbb{R}^m$, $y \in \mathbb{R}^n$, $x \neq 0$, and $y \neq 0$. The rank of a matrix $A \in \mathbb{R}^{m \times n}$ is defined as
\[
\mathrm{rank}(A) = \min\left\{ r \;\middle|\; A = \sum_{i=1}^r x_i y_i^T, \; x_i \in \mathbb{R}^m, \; y_i \in \mathbb{R}^n \right\} = \min\left\{ r \;\middle|\; A = \sum_{i=1}^r B_i, \; B_i \text{ is a rank-1 matrix} \right\}.
\]
Examples: $[1, 1, 1, 1]$, $[1, 1, 2, 2]$, and many natural images have the low-rank property. "Low rank" implies that the matrix contains little information.

We say "$U \in \mathbb{R}^{m \times n}$ has orthogonal columns" if $U^T U = I$, that is, any two columns $U_{\cdot i}$ and $U_{\cdot j}$ of $U$ satisfy
\[
U_{\cdot i}^T U_{\cdot j} = 0 \text{ if } i \neq j; \quad \text{otherwise } U_{\cdot i}^T U_{\cdot j} = 1.
\]
Swapping any two columns of $U$ to get $U'$, $U'$ still satisfies $U'^T U' = I$. Moreover,
• $\|Ux\| = \|x\|$ for all $x$;
• $\|U^T y\| \leq \|y\|$ for all $y$.

If $U$ is a square matrix and has orthogonal columns, then we call it an "orthogonal matrix". It has some nice properties:
• $U^{-1} = U^T$ (which means that $U U^T = U^T U = I$);
• $U^T$ is also an orthogonal matrix;
• the effect of applying the transformation $U$ to a vector $x$ is to rotate $x$, that is, $\|Ux\| = \|x\| = \|U^T x\|$.
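The claims above about symmetric matrices and orthogonal matrices can be verified numerically. Below is a short NumPy sketch (an added illustration, not from the original notes); the matrix size and seed are arbitrary. It uses the fact that for a symmetric input, `np.linalg.eigh` returns real eigenvalues and eigenvectors as the columns of a square matrix $V$, which is itself orthogonal.

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
A = M + M.T  # symmetrize so that A = A^T

# Real eigenvalues (ascending) and eigenvectors as columns of V.
lam, V = np.linalg.eigh(A)

# Eigen relationship: A V = V diag(lam), i.e., A v_i = lam_i v_i.
assert np.allclose(A @ V, V @ np.diag(lam))

# Eigenvectors of a symmetric matrix are mutually orthogonal,
# so V^T V = I and V is an orthogonal matrix.
assert np.allclose(V.T @ V, np.eye(4))

# Orthogonal matrices preserve the l2 norm: ||Vx|| = ||x|| = ||V^T x||.
x = rng.standard_normal(4)
assert np.isclose(np.linalg.norm(V @ x), np.linalg.norm(x))
assert np.isclose(np.linalg.norm(V.T @ x), np.linalg.norm(x))
```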
"SVD" is short for "singular value decomposition", which is the most important concept in linear algebra and matrix analysis: the SVD exposes almost all of the structure of a matrix. Any matrix $A \in \mathbb{R}^{m \times n}$ can be decomposed into
\[
A = U \Sigma V^T = \sum_{i=1}^r \sigma_i U_{\cdot i} V_{\cdot i}^T,
\]
where $U \in \mathbb{R}^{m \times r}$ and $V \in \mathbb{R}^{n \times r}$ have orthogonal columns, and $\Sigma = \mathrm{diag}\{\sigma_1, \sigma_2, \cdots, \sigma_r\}$ is a diagonal matrix with positive diagonal elements. The $\sigma_i$'s are called singular values; they are positive and arranged in decreasing order. Two basic facts:
• $\mathrm{rank}(A) = r$;
• $\|Ax\| \leq \sigma_1 \|x\|$. Why?

A matrix $B \in \mathbb{R}^{n \times n}$ is positive semi-definite (PSD) if the following are satisfied:
• $B$ is symmetric;
• for all $x \in \mathbb{R}^n$, we have $x^T B x \geq 0$.
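The following NumPy sketch (again an added illustration, not part of the original notes) checks the two SVD facts listed above on a matrix built to have rank 2; the sizes, seed, and tolerances are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(2)

# A 5x4 matrix of rank 2: the sum of two random rank-1 matrices x y^T.
A = (rng.standard_normal((5, 1)) @ rng.standard_normal((1, 4))
     + rng.standard_normal((5, 1)) @ rng.standard_normal((1, 4)))

# s holds the singular values in decreasing order; the (numerically)
# zero ones correspond to the rank deficiency.
U, s, Vt = np.linalg.svd(A)

# rank(A) = number of nonzero singular values (up to tolerance).
r = int(np.sum(s > 1e-10))
assert r == np.linalg.matrix_rank(A) == 2

# ||Ax|| <= sigma_1 ||x|| for every x.
x = rng.standard_normal(4)
assert np.linalg.norm(A @ x) <= s[0] * np.linalg.norm(x) + 1e-12
```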
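The PSD condition itself can also be tested numerically. One common approach, sketched below under the assumption that symmetry is checked first, uses the fact that a symmetric matrix satisfies $x^T B x \geq 0$ for all $x$ if and only if all of its eigenvalues are nonnegative; the helper name `is_psd` and the tolerance are hypothetical choices for this illustration.

```python
import numpy as np

def is_psd(B, tol=1e-10):
    """Return True if B is symmetric and x^T B x >= 0 for all x."""
    if not np.allclose(B, B.T):
        return False
    # For a symmetric matrix, x^T B x >= 0 for all x holds if and only if
    # every eigenvalue of B is nonnegative.
    return bool(np.all(np.linalg.eigvalsh(B) >= -tol))

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))

# M M^T is always PSD, since x^T (M M^T) x = ||M^T x||^2 >= 0.
print(is_psd(M @ M.T))  # True
# A random symmetric matrix is typically indefinite.
print(is_psd(M + M.T))  # usually False
```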