CSC 576: Mathematical Foundations I

Ji Liu
Department of Computer Science, University of Rochester

September 20, 2016

1 Notations and Assumptions

In most cases (unless defined locally), we use

• Greek letters such as α, β, and γ to denote real numbers;

• lowercase letters such as x, y, and z to denote vectors;

• capital letters such as A, B, and C to denote matrices.

Other notations:

• R is the one-dimensional Euclidean space;

• R^n is the n-dimensional vector Euclidean space;

• R^{m×n} is the m × n dimensional matrix Euclidean space;

• R_+ denotes the range [0, +∞);

• 1_n ∈ R^n denotes the vector with 1 in all entries;

• For any vector x ∈ R^n, we use |x| to denote the componentwise absolute value, that is, |x|_i = |x_i| for all i = 1, ···, n;

• ⊙ denotes the componentwise product, that is, for any vectors x and y, (x ⊙ y)_i = x_i y_i.

Some assumptions:

• Unless defined otherwise locally, we always assume that all vectors are column vectors.

2 Vector norms, Inner product

A function f : R^n → R_+ is called a "norm" if the following three conditions are satisfied:

• (Zero element) f(x) ≥ 0, and f(x) = 0 if and only if x = 0;

• (Homogeneity) For any α ∈ R and x ∈ R^n, f(αx) = |α| f(x);

• (Triangle inequality) Any x, y ∈ R^n satisfy f(x) + f(y) ≥ f(x + y).

The ℓ_2 norm "‖·‖_2" (a special "f(·)") in R^n is defined as

‖x‖_2 = (|x_1|^2 + |x_2|^2 + ··· + |x_n|^2)^{1/2}.

Because the ℓ_2 norm is the most commonly used norm (also known as the Euclidean norm), we sometimes denote it by ‖·‖ for short. (Think about it: is f([x_1, x_2]) = 2x_1^2 + x_2^2 a norm?) A general ℓ_p norm (p ≥ 1) is defined as

‖x‖_p = (|x_1|^p + |x_2|^p + ··· + |x_n|^p)^{1/p}.

Note that for p < 1, it is not a "norm" since the triangle inequality is violated. The ℓ_∞ norm is defined as

‖x‖_∞ = max{|x_1|, |x_2|, ···, |x_n|}.

One may notice that the ℓ_∞ norm is the limit of the ℓ_p norm, that is, for any x ∈ R^n, ‖x‖_∞ = lim_{p→+∞} ‖x‖_p. In addition, people use ‖x‖_0 to denote the ℓ_0 "norm", which counts the number of nonzero entries of x.
The inner product ⟨·, ·⟩ in R^n is defined as

⟨x, y⟩ = ∑_i x_i y_i.

One can show that ⟨x, x⟩ = ‖x‖^2. Two vectors x and y are orthogonal if ⟨x, y⟩ = 0. That is one reason why the ℓ_2 norm is so special.
If p ≥ q, then for any x ∈ R^n we have ‖x‖_p ≤ ‖x‖_q. In particular, we have

‖x‖_1 ≥ ‖x‖_2 ≥ ‖x‖_∞.

To bound from the other side, we have

‖x‖_1 ≤ √n ‖x‖_2,    ‖x‖_2 ≤ √n ‖x‖_∞.

Proof. To see the first one, we have

‖x‖_1 = ⟨1_n, |x|⟩ ≤ ‖1_n‖_2 ‖|x|‖_2 = √n ‖x‖_2,

where the inequality uses the Cauchy–Schwarz inequality. The proof of the second inequality is left as a homework exercise.
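As a quick numerical sanity check of these inequalities, here is a minimal NumPy sketch (the vector entries are arbitrary example values):

    import numpy as np

    x = np.array([3.0, -1.0, 2.0, 0.5])   # arbitrary example vector
    n = x.size

    l1 = np.linalg.norm(x, 1)
    l2 = np.linalg.norm(x, 2)
    linf = np.linalg.norm(x, np.inf)

    # Ordering of the p-norms: ||x||_1 >= ||x||_2 >= ||x||_inf
    assert l1 >= l2 >= linf

    # Reverse bounds: ||x||_1 <= sqrt(n) ||x||_2 and ||x||_2 <= sqrt(n) ||x||_inf
    assert l1 <= np.sqrt(n) * l2 and l2 <= np.sqrt(n) * linf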

Given a norm "‖·‖_A", its dual norm is defined as

‖x‖_{A*} = max_{‖y‖_A ≤ 1} ⟨x, y⟩ = max_{‖y‖_A = 1} ⟨x, y⟩ = max_z ⟨x, z⟩ / ‖z‖_A.

Several important properties of the dual norm are:

• The dual norm's dual norm is the original norm, that is, ‖x‖_{(A*)*} = ‖x‖_A;

• The ℓ_2 norm is self-dual, that is, the dual norm of the ℓ_2 norm is still the ℓ_2 norm;

• The dual norm of the ℓ_p norm (p ≥ 1) is the ℓ_q norm, where p and q satisfy 1/p + 1/q = 1. In particular, the ℓ_1 norm and the ℓ_∞ norm are dual to each other;

• (Hölder's inequality) ⟨x, y⟩ ≤ ‖x‖_A ‖y‖_{A*}.
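The Hölder inequality for specific dual pairs is easy to check numerically; below is a minimal NumPy sketch, where the random vectors and the pair p = 3, q = 3/2 are arbitrary example choices:

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.standard_normal(5), rng.standard_normal(5)

    # Hölder's inequality |<x, y>| <= ||x||_p ||y||_q for the dual pairs (1, inf) and (3, 3/2)
    assert abs(x @ y) <= np.linalg.norm(x, 1) * np.linalg.norm(y, np.inf) + 1e-12
    assert abs(x @ y) <= np.linalg.norm(x, 3) * np.linalg.norm(y, 1.5) + 1e-12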

3 Linear space, subspace, linear transformation

A set S is a linear space if

• 0 ∈ S;

• given any two points x, y ∈ S and any two scalars α ∈ R and β ∈ R, we have

αx + βy ∈ S.

Note that ∅ is not a linear space. Examples: R^n, and the matrix space R^{m×n}. How about the following sets:

• 0; (no)

• {0}; (yes)

• {x | Ax = b} where A is a matrix and b is a vector. (yes if b = 0; otherwise, no)

Let S be a linear space. A set S′ is a subspace if S′ is a linear space and also a subset of S. Actually, "subspace" is equivalent to "linear space", because any subspace is a linear space and any linear space is a subspace (of itself). They are indeed talking about the same thing.
Let S be a linear space. A function L(·) is a linear transformation if given any two points x, y ∈ S and two scalars α ∈ R and β ∈ R, one has

L(αx + βy) = αL(x) + βL(y).

For transformations between finite-dimensional vector spaces such as R^n and R^m, there exists a one-to-one correspondence between linear transformations and matrices. Therefore, we can simply say "a matrix is a linear transformation".

• Prove that {L(x) | x ∈ S} is a linear space if S is a linear space and L is a linear transformation.

• Prove that {x | L(x) ∈ S} is a linear space, assuming S is a linear space and L is a linear transformation.

How to express a subspace? The most intuitive way is to use a bunch of vectors. A subspace can be expressed by

span{x_1, x_2, ···, x_n} = { ∑_{i=1}^n α_i x_i | α_i ∈ R } = {Xα | α ∈ R^n},

which is called the range space of the matrix X = [x_1, x_2, ···, x_n]. A subspace can also be represented by the null space of a matrix X: {α | Xα = 0}.
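As an illustration, one can compute orthonormal bases of the range space and the null space numerically. The following is a minimal NumPy sketch (the matrix X is an arbitrary rank-2 example), using the SVD that is introduced in the next section:

    import numpy as np

    # Columns of X span a 2-dimensional subspace of R^3.
    X = np.array([[1.0, 0.0, 1.0],
                  [0.0, 1.0, 1.0],
                  [0.0, 0.0, 0.0]])

    U, s, Vt = np.linalg.svd(X)
    r = int(np.sum(s > 1e-10))      # numerical rank
    range_basis = U[:, :r]          # orthonormal basis of the range space of X
    null_basis = Vt[r:].T           # orthonormal basis of the null space {a | Xa = 0}

    assert np.allclose(X @ null_basis, 0)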

4 Eigenvalues / eigenvectors, rank, SVD, inverse

The transpose of a matrix A ∈ R^{m×n} is defined as A^T ∈ R^{n×m} with (A^T)_{ij} = A_{ji}.

One can verify that (AB)^T = B^T A^T.
A matrix B ∈ R^{n×n} is the inverse of an invertible matrix A ∈ R^{n×n} if AB = I and BA = I.

B is denoted by A^{-1}. A is invertible if and only if A has full rank (the definition of "rank" will be given shortly). Note that the inverse of a matrix is unique. One can also verify that if both A and B are invertible, then

(AB)^{-1} = B^{-1}A^{-1}.

The “transpose” and the “inverse” are exchangeable:

(A^T)^{-1} = (A^{-1})^T.
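Both rules are easy to confirm numerically; here is a minimal NumPy sketch with arbitrary random matrices (shifted by a multiple of I so they are safely invertible):

    import numpy as np

    rng = np.random.default_rng(9)
    A = rng.standard_normal((4, 4)) + 4 * np.eye(4)
    B = rng.standard_normal((4, 4)) + 4 * np.eye(4)

    # (AB)^{-1} = B^{-1} A^{-1} and (A^T)^{-1} = (A^{-1})^T
    assert np.allclose(np.linalg.inv(A @ B), np.linalg.inv(B) @ np.linalg.inv(A))
    assert np.allclose(np.linalg.inv(A.T), np.linalg.inv(A).T)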

When we write A^{-1}, we have to make sure that A is invertible.
Given a square matrix A ∈ R^{n×n}, x ∈ R^n (x ≠ 0) is called an eigenvector and λ ∈ R is called the corresponding eigenvalue if the following relationship is satisfied

Ax = λx. (The effect of applying the linear transformation A on x is nothing but scaling it.)

Note that:

• If (λ, x) is an eigenvalue–eigenvector pair, then so is (λ, αx) for any α ≠ 0.

• One eigenvalue may correspond to multiple different eigenvectors. "Different" means the eigenvectors are still different after normalization.

If the matrix A is symmetric, then any two eigenvectors corresponding to different eigenvalues are orthogonal, that is, if A^T = A, Ax_1 = λ_1 x_1, Ax_2 = λ_2 x_2, and λ_1 ≠ λ_2, then

x_1^T x_2 = 0.

Proof. Consider x_1^T A x_2. On one hand,

x_1^T A x_2 = x_1^T (A x_2) = x_1^T (λ_2 x_2) = λ_2 x_1^T x_2,

and on the other hand, using A = A^T,

x_1^T A x_2 = (x_1^T A) x_2 = (A^T x_1)^T x_2 = (A x_1)^T x_2 = λ_1 x_1^T x_2.

Therefore, we have

λ_2 x_1^T x_2 = λ_1 x_1^T x_2.

Since λ_1 ≠ λ_2, we obtain x_1^T x_2 = 0.
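A quick numerical illustration of this fact (a minimal NumPy sketch with an arbitrary random symmetric matrix):

    import numpy as np

    rng = np.random.default_rng(1)
    M = rng.standard_normal((4, 4))
    A = M + M.T                      # symmetric matrix

    w, V = np.linalg.eigh(A)         # eigenvalues w, eigenvectors as columns of V

    # Eigenvectors belonging to distinct eigenvalues are orthogonal;
    # eigh in fact returns an orthonormal set, so V^T V = I.
    assert np.allclose(V.T @ V, np.eye(4), atol=1e-10)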

A matrix A ∈ R^{m×n} is a "rank-1" matrix if A can be expressed as

A = xy^T

where x ∈ R^m and y ∈ R^n, with x ≠ 0 and y ≠ 0. The rank of a matrix A ∈ R^{m×n} is defined as

rank(A) = min{ r | A = ∑_{i=1}^r x_i y_i^T, x_i ∈ R^m, y_i ∈ R^n }
        = min{ r | A = ∑_{i=1}^r B_i, where each B_i is a "rank-1" matrix }.

Examples: [1, 1; 1, 1], [1, 1; 2, 2], and many natural images have the low-rank property. "Low rank" implies that the matrix contains only a small amount of information.
We say "U ∈ R^{m×n} has orthogonal columns" if U^T U = I, that is, any two columns U_{·i} and U_{·j} of U satisfy

U_{·i}^T U_{·j} = 0 if i ≠ j; otherwise U_{·i}^T U_{·i} = 1.

If we swap any two columns of U to obtain U′, then U′ still satisfies U′^T U′ = I.

• ‖Ux‖ = ‖x‖ for all x.

• ‖U^T y‖ ≤ ‖y‖ for all y.

If U is a square matrix and has orthogonal columns, then we call it an "orthogonal matrix". It has some nice properties:

• U^{-1} = U^T (which means that UU^T = U^T U = I);

• U^T is also an orthogonal matrix;

• The effect of applying the transformation U to a vector x is to rotate (or reflect) x, that is, ‖Ux‖ = ‖x‖ = ‖U^T x‖.

"SVD" is short for "singular value decomposition", which is the most important concept in linear algebra and matrix analysis. The SVD reveals almost all of the structure of a matrix. Any matrix A ∈ R^{m×n} can be decomposed as

A = UΣV^T = ∑_{i=1}^r σ_i U_{·i} V_{·i}^T,

where U ∈ R^{m×r} and V ∈ R^{n×r} have orthogonal columns, and Σ = diag{σ_1, σ_2, ···, σ_r} is a diagonal matrix with positive diagonal elements. The σ_i's are called singular values; they are positive and arranged in decreasing order.

• rank(A) = r;

• ‖Ax‖ ≤ σ_1 ‖x‖. Why?
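The following minimal NumPy sketch illustrates both properties on an arbitrary random low-rank example:

    import numpy as np

    rng = np.random.default_rng(2)
    # A 6 x 4 matrix of rank 2, built as the sum of two rank-1 matrices.
    A = np.outer(rng.standard_normal(6), rng.standard_normal(4)) \
        + np.outer(rng.standard_normal(6), rng.standard_normal(4))

    U, s, Vt = np.linalg.svd(A)
    r = int(np.sum(s > 1e-10))
    assert r == np.linalg.matrix_rank(A) == 2     # rank(A) = number of positive singular values

    x = rng.standard_normal(4)
    assert np.linalg.norm(A @ x) <= s[0] * np.linalg.norm(x) + 1e-12   # ||Ax|| <= sigma_1 ||x||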

A matrix B ∈ R^{n×n} is positive semi-definite (PSD) if the following conditions are satisfied:

• B is symmetric;

• For all x ∈ R^n, we have x^T Bx ≥ 0.

A positive definite matrix is defined by adding one more condition:

• x^T Bx = 0 ⇔ x = 0.

We can also use the following equivalent definition for PSD matrices: a matrix B ∈ R^{n×n} is positive semi-definite (PSD) if the SVD of B can be written as

B = UΣU^T,

where U^T U = I and Σ is a diagonal matrix with nonnegative diagonal elements. Examples of PSD matrices: I and A^T A.
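As a quick check that A^T A is PSD, here is a minimal NumPy sketch with an arbitrary random A:

    import numpy as np

    rng = np.random.default_rng(3)
    A = rng.standard_normal((5, 3))
    B = A.T @ A                                       # A^T A is PSD

    assert np.allclose(B, B.T)                        # symmetric
    assert np.all(np.linalg.eigvalsh(B) >= -1e-12)    # nonnegative eigenvalues
    x = rng.standard_normal(3)
    assert x @ B @ x >= -1e-12                        # x^T B x >= 0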

Assume matrices A and B are invertible. We have the following identity:

B^{-1} = A^{-1} − B^{-1}(B − A)A^{-1}.

The Sherman–Morrison–Woodbury formula is very useful for calculating matrix inverses:

(A + UV^T)^{-1} = A^{-1} − A^{-1}U(I + V^T A^{-1}U)^{-1}V^T A^{-1}.

This result is especially important from a computational perspective. A special case is when U and V are two vectors u and v. Then it takes the form

(A + uv^T)^{-1} = A^{-1} − (1 + v^T A^{-1}u)^{-1} A^{-1}uv^T A^{-1},

which can be calculated with complexity O(n^2) if A^{-1} is known.
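A minimal NumPy sketch of this rank-1 update follows (the matrix A and the vectors u, v are arbitrary examples); once A^{-1} is available, only matrix–vector products and one outer product are needed, i.e., O(n^2) work instead of a fresh O(n^3) inversion:

    import numpy as np

    rng = np.random.default_rng(4)
    n = 200
    A = rng.standard_normal((n, n)) + n * np.eye(n)   # a well-conditioned example matrix
    u, v = rng.standard_normal(n), rng.standard_normal(n)

    Ainv = np.linalg.inv(A)                           # assume A^{-1} is already known

    Au = Ainv @ u
    vA = v @ Ainv
    updated_inv = Ainv - np.outer(Au, vA) / (1.0 + v @ Au)   # Sherman-Morrison update

    assert np.allclose(updated_inv, np.linalg.inv(A + np.outer(u, v)))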

Sylvester's determinant theorem states that

det(I_m + AB) = det(I_n + BA),

where A ∈ R^{m×n} and B ∈ R^{n×m}.
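A quick numerical check of this identity (a minimal NumPy sketch with arbitrary random A and B):

    import numpy as np

    rng = np.random.default_rng(5)
    m, n = 5, 3
    A = rng.standard_normal((m, n))
    B = rng.standard_normal((n, m))

    lhs = np.linalg.det(np.eye(m) + A @ B)   # determinant of an m x m matrix
    rhs = np.linalg.det(np.eye(n) + B @ A)   # determinant of a (smaller) n x n matrix
    assert np.isclose(lhs, rhs)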

5 Matrix norms (spectral norm, nuclear norm, Frobenius norm)

The Frobenius norm (F-norm) of a matrix A ∈ R^{m×n} is defined as

‖A‖_F = ( ∑_{1≤i≤m, 1≤j≤n} |A_{ij}|^2 )^{1/2} = ( ∑_i σ_i^2 )^{1/2},

where the σ_i's are the singular values of A.

If A is a vector, one can verify that ‖A‖_F = ‖A‖_2. The inner product ⟨·, ·⟩ in R^{m×n} is defined as

⟨X, Y⟩ = ∑_{i,j} X_{ij} Y_{ij} = trace(X^T Y) = trace(Y X^T) = trace(XY^T) = trace(Y^T X).

An important property for trace(AB):

trace(AB) = trace(BA) = trace(A^T B^T) = trace(B^T A^T).

One may notice that ⟨X, X⟩ = ‖X‖_F^2.
The spectral norm of a matrix A ∈ R^{m×n} is defined as

‖A‖_spec = max_{‖x‖=1} ‖Ax‖ = max_{‖x‖=1, ‖y‖=1} y^T Ax = σ_1(A).

The nuclear norm of a matrix A ∈ R^{m×n} is defined as

‖A‖_tr = ∑_i σ_i(A) = trace(Σ),

where Σ is the diagonal matrix in the SVD A = UΣV^T. An important relationship is

‖A‖_spec ≤ ‖A‖_F ≤ ‖A‖_tr   and   rank(A) ‖A‖_spec ≥ √(rank(A)) ‖A‖_F ≥ ‖A‖_tr.
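These relationships can be checked numerically; the following is a minimal NumPy sketch on an arbitrary random matrix:

    import numpy as np

    rng = np.random.default_rng(6)
    A = rng.standard_normal((5, 7))
    s = np.linalg.svd(A, compute_uv=False)      # singular values of A

    spec, fro, nuc = s[0], np.linalg.norm(A, 'fro'), s.sum()
    r = np.linalg.matrix_rank(A)

    assert spec <= fro <= nuc + 1e-12
    assert r * spec >= np.sqrt(r) * fro >= nuc - 1e-12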

The dual norm of a matrix norm ‖·‖_A is defined as

‖Y‖_{A*} := max_X ⟨X, Y⟩ / ‖X‖_A = max_{‖X‖_A ≤ 1} ⟨X, Y⟩.   (1)

We have the following properties (think about why they are true):

‖X‖_{spec*} = ‖X‖_tr,   ‖X‖_{F*} = ‖X‖_F.
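For example, these dual pairings give the matrix Hölder inequalities ⟨X, Y⟩ ≤ ‖X‖_spec ‖Y‖_tr and ⟨X, Y⟩ ≤ ‖X‖_F ‖Y‖_F, which the following minimal NumPy sketch checks on arbitrary random matrices:

    import numpy as np

    rng = np.random.default_rng(7)
    X, Y = rng.standard_normal((4, 6)), rng.standard_normal((4, 6))
    inner = np.trace(X.T @ Y)                                   # <X, Y>

    sX = np.linalg.svd(X, compute_uv=False)
    sY = np.linalg.svd(Y, compute_uv=False)

    assert inner <= sX[0] * sY.sum() + 1e-12                    # <X,Y> <= ||X||_spec ||Y||_tr
    assert inner <= np.linalg.norm(X, 'fro') * np.linalg.norm(Y, 'fro') + 1e-12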

6 Matrix and Vector Differential

Let f(X) : R^{m×n} → R be a function of a matrix X ∈ R^{m×n}. Its differential (or gradient) is defined as

∂f(X)/∂X = [ ∂f(X)/∂X_{11}  ···  ∂f(X)/∂X_{1j}  ···  ∂f(X)/∂X_{1n} ]
           [      ···              ···               ···           ]
           [ ∂f(X)/∂X_{i1}  ···  ∂f(X)/∂X_{ij}  ···  ∂f(X)/∂X_{in} ]
           [      ···              ···               ···           ]
           [ ∂f(X)/∂X_{m1}  ···  ∂f(X)/∂X_{mj}  ···  ∂f(X)/∂X_{mn} ].

We provide a few examples in the following:

f(X) = trace(A^T X) = ⟨A, X⟩          ∂f(X)/∂X = A
f(X) = trace(X^T A X)                  ∂f(X)/∂X = (A + A^T)X
f(X) = (1/2) ‖AX − B‖_F^2             ∂f(X)/∂X = A^T(AX − B)
f(X) = (1/2) trace(B^T X^T X B)        ∂f(X)/∂X = XBB^T
f(X) = (1/2) trace(B^T X^T A X B)      ∂f(X)/∂X = (1/2)(A + A^T)XBB^T
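One convenient way to sanity-check such gradient formulas is a finite-difference comparison. Here is a minimal NumPy sketch for the third example above, with arbitrary random A, X, and B:

    import numpy as np

    rng = np.random.default_rng(8)
    A = rng.standard_normal((5, 4))
    X = rng.standard_normal((4, 3))
    B = rng.standard_normal((5, 3))

    # f(X) = 0.5 * ||AX - B||_F^2 and the stated gradient A^T (AX - B)
    f = lambda Z: 0.5 * np.linalg.norm(A @ Z - B, 'fro') ** 2
    grad = A.T @ (A @ X - B)

    # Central finite-difference approximation of each partial derivative df/dX_ij
    eps = 1e-6
    num_grad = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = eps
            num_grad[i, j] = (f(X + E) - f(X - E)) / (2 * eps)

    assert np.allclose(num_grad, grad, atol=1e-5)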
