1 Inner Products and Norms

ORF 523 Lecture 2    Spring 2016, Princeton University
Instructor: A.A. Ahmadi
Scribe: G. Hall
Tuesday, February 9, 2016

When in doubt on the accuracy of these notes, please cross-check with the instructor's notes on aaa.princeton.edu/orf523. Any typos should be emailed to [email protected].

Today, we review basic math concepts that you will need throughout the course:

• Inner products and norms
• Positive semidefinite matrices
• Basic differential calculus

1 Inner products and norms

1.1 Inner products

1.1.1 Definition

Definition 1 (Inner product). A function $\langle \cdot, \cdot \rangle : \mathbb{R}^n \times \mathbb{R}^n \to \mathbb{R}$ is an inner product if

1. $\langle x, x \rangle \geq 0$, and $\langle x, x \rangle = 0 \Leftrightarrow x = 0$ (positivity)
2. $\langle x, y \rangle = \langle y, x \rangle$ (symmetry)
3. $\langle x + y, z \rangle = \langle x, z \rangle + \langle y, z \rangle$ (additivity)
4. $\langle rx, y \rangle = r \langle x, y \rangle$ for all $r \in \mathbb{R}$ (homogeneity)

Homogeneity and additivity in the second argument follow:
$$\langle x, ry \rangle = \langle ry, x \rangle = r \langle y, x \rangle = r \langle x, y \rangle$$
using properties (2), (4), and again (2), and
$$\langle x, y + z \rangle = \langle y + z, x \rangle = \langle y, x \rangle + \langle z, x \rangle = \langle x, y \rangle + \langle x, z \rangle$$
using properties (2), (3), and again (2).

1.1.2 Examples

• The standard inner product is
$$\langle x, y \rangle = x^T y = \sum_i x_i y_i, \quad x, y \in \mathbb{R}^n.$$

• The standard inner product between matrices is
$$\langle X, Y \rangle = \mathrm{Tr}(X^T Y) = \sum_i \sum_j X_{ij} Y_{ij},$$
where $X, Y \in \mathbb{R}^{m \times n}$.

Notation: Here, $\mathbb{R}^{m \times n}$ is the space of real $m \times n$ matrices, and $\mathrm{Tr}(Z)$ is the trace of a real square matrix $Z$, i.e., $\mathrm{Tr}(Z) = \sum_i Z_{ii}$.

Note: The matrix inner product is the same as our original inner product between two vectors of length $mn$ obtained by stacking the columns of the two matrices.

• A less classical example in $\mathbb{R}^2$ is the following:
$$\langle x, y \rangle = 5x_1 y_1 + 8x_2 y_2 - 6x_1 y_2 - 6x_2 y_1.$$
Properties (2), (3), and (4) are obvious; positivity is less obvious. It can be seen by writing
$$\langle x, x \rangle = 5x_1^2 + 8x_2^2 - 12 x_1 x_2 = (x_1 - 2x_2)^2 + (2x_1 - 2x_2)^2 \geq 0,$$
$$\langle x, x \rangle = 0 \;\Leftrightarrow\; x_1 - 2x_2 = 0 \text{ and } 2x_1 - 2x_2 = 0 \;\Leftrightarrow\; x_1 = 0 \text{ and } x_2 = 0.$$

1.1.3 Properties of inner products

Definition 2 (Orthogonality). We say that $x$ and $y$ are orthogonal if $\langle x, y \rangle = 0$.

Theorem 1 (Cauchy-Schwarz). For $x, y \in \mathbb{R}^n$,
$$|\langle x, y \rangle| \leq \|x\| \, \|y\|,$$
where $\|x\| := \sqrt{\langle x, x \rangle}$ is the length of $x$ (it is also a norm, as we will show later on).

Proof: First, assume that $\|x\| = \|y\| = 1$. Then
$$\|x - y\|^2 \geq 0 \;\Rightarrow\; \langle x - y, x - y \rangle = \langle x, x \rangle + \langle y, y \rangle - 2\langle x, y \rangle \geq 0 \;\Rightarrow\; \langle x, y \rangle \leq 1.$$
Now, consider any $x, y \in \mathbb{R}^n$. If one of the vectors is zero, the inequality is trivially verified. If they are both nonzero, then
$$\left\langle \frac{x}{\|x\|}, \frac{y}{\|y\|} \right\rangle \leq 1 \;\Rightarrow\; \langle x, y \rangle \leq \|x\| \cdot \|y\|. \tag{1}$$
Since (1) holds for all $x, y$, replace $y$ with $-y$:
$$\langle x, -y \rangle \leq \|x\| \cdot \|{-y}\| \;\Rightarrow\; \langle x, y \rangle \geq -\|x\| \cdot \|y\|,$$
using homogeneity of the inner product and of the norm. Together with (1), this gives $|\langle x, y \rangle| \leq \|x\| \, \|y\|$.

1.2 Norms

1.2.1 Definition

Definition 3 (Norm). A function $f : \mathbb{R}^n \to \mathbb{R}$ is a norm if

1. $f(x) \geq 0$, and $f(x) = 0 \Leftrightarrow x = 0$ (positivity)
2. $f(\alpha x) = |\alpha| f(x)$ for all $\alpha \in \mathbb{R}$ (homogeneity)
3. $f(x + y) \leq f(x) + f(y)$ (triangle inequality)

Examples:

• The 2-norm: $\|x\| = \sqrt{\sum_i x_i^2}$
• The 1-norm: $\|x\|_1 = \sum_i |x_i|$
• The $\infty$-norm: $\|x\|_\infty = \max_i |x_i|$
• The $p$-norm: $\|x\|_p = \left( \sum_i |x_i|^p \right)^{1/p}$, $p \geq 1$

Lemma 1. Take any inner product $\langle \cdot, \cdot \rangle$ and define $f(x) = \sqrt{\langle x, x \rangle}$. Then $f$ is a norm.

Proof: Positivity follows from the definition of an inner product. For homogeneity,
$$f(\alpha x) = \sqrt{\langle \alpha x, \alpha x \rangle} = |\alpha| \sqrt{\langle x, x \rangle}.$$
We prove the triangle inequality by contradiction. If it is not satisfied, then there exist $x, y$ such that
$$\sqrt{\langle x + y, x + y \rangle} > \sqrt{\langle x, x \rangle} + \sqrt{\langle y, y \rangle}$$
$$\Rightarrow\; \langle x + y, x + y \rangle > \langle x, x \rangle + 2\sqrt{\langle x, x \rangle \langle y, y \rangle} + \langle y, y \rangle$$
$$\Rightarrow\; 2\langle x, y \rangle > 2\sqrt{\langle x, x \rangle \langle y, y \rangle},$$
which contradicts Cauchy-Schwarz.

Note: Not every norm comes from an inner product.
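To make these definitions concrete, here is a minimal numerical sketch (our addition, not part of the original notes) that spot-checks the less classical inner product above, the Cauchy-Schwarz inequality, and the vector norms just listed, using NumPy; the helper name `inner` is ours.

```python
import numpy as np

# The "less classical" inner product on R^2 from Section 1.1.2:
# <x, y> = 5*x1*y1 + 8*x2*y2 - 6*x1*y2 - 6*x2*y1, i.e., <x, y> = x^T Q y
Q = np.array([[5.0, -6.0],
              [-6.0, 8.0]])

def inner(x, y):
    return x @ Q @ y

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    # positivity: <x, x> = (x1 - 2*x2)^2 + (2*x1 - 2*x2)^2 >= 0
    assert inner(x, x) >= 0
    # symmetry
    assert np.isclose(inner(x, y), inner(y, x))
    # Cauchy-Schwarz: |<x, y>| <= sqrt(<x, x>) * sqrt(<y, y>)
    assert abs(inner(x, y)) <= np.sqrt(inner(x, x)) * np.sqrt(inner(y, y)) + 1e-12

# The vector norms from Section 1.2.1, on a sample vector:
x = np.array([3.0, -4.0])
print(np.linalg.norm(x))          # 2-norm: 5.0
print(np.linalg.norm(x, 1))       # 1-norm: 7.0
print(np.linalg.norm(x, np.inf))  # infinity-norm: 4.0
print(np.linalg.norm(x, 3))       # p-norm with p = 3
```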
1.2.2 Matrix norms

Matrix norms are functions $f : \mathbb{R}^{m \times n} \to \mathbb{R}$ that satisfy the same properties as vector norms. Let $A \in \mathbb{R}^{m \times n}$. Here are a few examples of matrix norms:

• The Frobenius norm: $\|A\|_F = \sqrt{\mathrm{Tr}(A^T A)} = \sqrt{\sum_{i,j} A_{i,j}^2}$
• The sum-absolute-value norm: $\|A\|_{\mathrm{sav}} = \sum_{i,j} |A_{i,j}|$
• The max-absolute-value norm: $\|A\|_{\mathrm{mav}} = \max_{i,j} |A_{i,j}|$

Definition 4 (Operator norm). An operator (or induced) matrix norm is a norm $\|\cdot\|_{a,b} : \mathbb{R}^{m \times n} \to \mathbb{R}$ defined as
$$\|A\|_{a,b} = \max_x \|Ax\|_a \;\text{ s.t. }\; \|x\|_b \leq 1,$$
where $\|\cdot\|_a$ is a vector norm on $\mathbb{R}^m$ and $\|\cdot\|_b$ is a vector norm on $\mathbb{R}^n$.

Notation: When the same vector norm is used in both spaces, we write
$$\|A\|_c = \max_x \|Ax\|_c \;\text{ s.t. }\; \|x\|_c \leq 1.$$

Examples:

• $\|A\|_2 = \sqrt{\lambda_{\max}(A^T A)}$, where $\lambda_{\max}$ denotes the largest eigenvalue.
• $\|A\|_1 = \max_j \sum_i |A_{ij}|$, i.e., the maximum column sum.
• $\|A\|_\infty = \max_i \sum_j |A_{ij}|$, i.e., the maximum row sum.

Notice that not all matrix norms are induced norms. An example is the Frobenius norm: $\|I\| = 1$ for any induced norm, but $\|I\|_F = \sqrt{n}$.

Lemma 2. Every induced norm is submultiplicative, i.e.,
$$\|AB\| \leq \|A\| \, \|B\|.$$

Proof: We first show that $\|Ax\| \leq \|A\| \, \|x\|$. Suppose this were not the case for some $x \neq 0$; then
$$\|Ax\| > \|A\| \, \|x\| \;\Rightarrow\; \frac{1}{\|x\|} \|Ax\| > \|A\| \;\Rightarrow\; \left\| A \frac{x}{\|x\|} \right\| > \|A\|,$$
but $\frac{x}{\|x\|}$ is a vector of unit norm. This contradicts the definition of $\|A\|$. Now we proceed to prove the claim:
$$\|AB\| = \max_{\|x\| \leq 1} \|ABx\| \leq \max_{\|x\| \leq 1} \|A\| \, \|Bx\| = \|A\| \max_{\|x\| \leq 1} \|Bx\| = \|A\| \, \|B\|.$$

Remark: This is only true for induced norms that use the same vector norm in both spaces. In the case where the vector norms are different, submultiplicativity can fail to hold. Consider, e.g., the induced norm $\|\cdot\|_{1,2}$ and the matrices
$$A = \begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}.$$
In this case, $\|AB\|_{1,2} > \|A\|_{1,2} \cdot \|B\|_{1,2}$. Indeed, the image of the unit circle under $A$ (notice that $A$ is a rotation matrix of angle $\pi/4$) stays within the unit square, and so $\|A\|_{1,2} \leq 1$. Using similar reasoning, $\|B\|_{1,2} \leq 1$. This implies that $\|A\|_{1,2} \|B\|_{1,2} \leq 1$. However, $\|AB\|_{1,2} \geq \sqrt{2}$, as $\|ABx\|_1 = \sqrt{2}$ for $x = (1, 0)^T$.

Example of a norm that is not submultiplicative:
$$\|A\|_{\mathrm{mav}} = \max_{i,j} |A_{i,j}|.$$
This can be seen as follows: any submultiplicative norm satisfies $\|A^2\| \leq \|A\|^2$, but for
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \quad \text{we have} \quad A^2 = \begin{pmatrix} 2 & 2 \\ 2 & 2 \end{pmatrix},$$
so $\|A^2\|_{\mathrm{mav}} = 2 > 1 = \|A\|_{\mathrm{mav}}^2$.

Remark: Not all submultiplicative norms are induced norms. An example is the Frobenius norm.

1.2.3 Dual norms

Definition 5 (Dual norm). Let $\|\cdot\|$ be any norm. Its dual norm is defined as
$$\|x\|_* = \max_y x^T y \;\text{ s.t. }\; \|y\| \leq 1.$$

You can think of this as the operator norm of $x^T$. The dual norm is indeed a norm. The first two properties are straightforward to prove. The triangle inequality can be shown in the following way:
$$\|x + z\|_* = \max_{\|y\| \leq 1} (x^T y + z^T y) \leq \max_{\|y\| \leq 1} x^T y + \max_{\|y\| \leq 1} z^T y = \|x\|_* + \|z\|_*.$$

Examples:

1. $\|x\|_{1*} = \|x\|_\infty$
2. $\|x\|_{2*} = \|x\|_2$
3. $\|x\|_{\infty *} = \|x\|_1$

Proofs:

• The proof of (1) is left as an exercise.
• Proof of (2): We have
$$\|x\|_{2*} = \max_y x^T y \;\text{ s.t. }\; \|y\|_2 \leq 1.$$
Cauchy-Schwarz implies that $x^T y \leq \|x\|_2 \|y\|_2 \leq \|x\|_2$, and $y = \frac{x}{\|x\|_2}$ achieves this bound.
• Proof of (3): We have
$$\|x\|_{\infty *} = \max_y x^T y \;\text{ s.t. }\; \|y\|_\infty \leq 1,$$
so $y_{\mathrm{opt}} = \mathrm{sign}(x)$ and the optimal value is $\|x\|_1$.
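As a quick sanity check on the matrix-norm material, the following sketch (again our own illustration, not from the notes) evaluates the three induced norms and the Frobenius norm in NumPy, reproduces the failure of submultiplicativity for the max-absolute-value norm, and verifies the dual-norm claim for the 2-norm at its maximizer:

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0]])

# Induced norms (NumPy computes exactly the three examples above):
print(np.linalg.norm(A, 2))       # spectral norm: sqrt(lambda_max(A^T A)) = 2.0
print(np.linalg.norm(A, 1))       # maximum column sum = 2.0
print(np.linalg.norm(A, np.inf))  # maximum row sum = 2.0
print(np.linalg.norm(A, 'fro'))   # Frobenius norm = 2.0 (not an induced norm)

# The max-absolute-value norm is not submultiplicative:
mav = lambda M: np.abs(M).max()
print(mav(A @ A), mav(A) ** 2)    # 2.0 > 1.0, so ||A^2||_mav > ||A||_mav^2

# Dual norm of the 2-norm: ||x||_{2*} = max{x^T y : ||y||_2 <= 1} = ||x||_2,
# with the bound achieved by the maximizer y = x / ||x||_2.
x = np.array([3.0, -4.0])
y = x / np.linalg.norm(x)
print(x @ y, np.linalg.norm(x))   # both 5.0
```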
2 Positive semidefinite matrices

We denote by $S^{n \times n}$ the set of all symmetric (real) $n \times n$ matrices.

2.1 Definition

Definition 6. A matrix $A \in S^{n \times n}$ is

• positive semidefinite (psd) (notation: $A \succeq 0$) if
$$x^T A x \geq 0, \;\forall x \in \mathbb{R}^n;$$
• positive definite (pd) (notation: $A \succ 0$) if
$$x^T A x > 0, \;\forall x \in \mathbb{R}^n, \; x \neq 0;$$
• negative semidefinite if $-A$ is psd (notation: $A \preceq 0$);
• negative definite if $-A$ is pd (notation: $A \prec 0$).

Notation: $A \succeq 0$ means $A$ is psd; $A \geq 0$ means that $A_{ij} \geq 0$ for all $i, j$.

Remark: Whenever we consider a quadratic form $x^T A x$, we can assume without loss of generality that the matrix $A$ is symmetric. The reason behind this is that any matrix $A$ can be written as
$$A = \frac{A + A^T}{2} + \frac{A - A^T}{2},$$
where $B := \frac{A + A^T}{2}$ is the symmetric part of $A$ and $C := \frac{A - A^T}{2}$ is the anti-symmetric part of $A$. Notice that $x^T C x = 0$ for any $x \in \mathbb{R}^n$.

Example: The matrix
$$M = \begin{pmatrix} 5 & 1 \\ 1 & -2 \end{pmatrix}$$
is indefinite. To see this, consider $x = (1, 0)^T$ and $x = (0, 1)^T$: the quadratic form $x^T M x$ takes the values $5 > 0$ and $-2 < 0$, respectively.

2.2 Eigenvalues of positive semidefinite matrices

Theorem 2. The eigenvalues of a symmetric real-valued matrix $A$ are real.

Proof: Let $x \in \mathbb{C}^n$ be a nonzero eigenvector of $A$ and let $\lambda \in \mathbb{C}$ be the corresponding eigenvalue, i.e., $Ax = \lambda x$. By multiplying either side of the equality by the conjugate transpose $x^*$ of the eigenvector $x$, we obtain
$$x^* A x = \lambda x^* x. \tag{2}$$
We now take the conjugate transpose of both sides, remembering that $A \in S^{n \times n}$:
$$x^* A^T x = \bar{\lambda} x^* x \;\Rightarrow\; x^* A x = \bar{\lambda} x^* x. \tag{3}$$
Combining (2) and (3), we get
$$\lambda x^* x = \bar{\lambda} x^* x \;\Rightarrow\; x^* x (\lambda - \bar{\lambda}) = 0 \;\Rightarrow\; \lambda = \bar{\lambda},$$
since $x \neq 0$.
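As a numerical companion to this section (our sketch, assuming NumPy; the helper `is_psd` is a name we introduce), positive semidefiniteness of a symmetric matrix can be tested by checking that its smallest eigenvalue is nonnegative, and the indefiniteness of $M$ shows up directly:

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """Check positive semidefiniteness via the smallest eigenvalue."""
    S = (A + A.T) / 2            # symmetric part; x^T A x = x^T S x for all x
    return np.linalg.eigvalsh(S).min() >= -tol

M = np.array([[5.0, 1.0],
              [1.0, -2.0]])

# The two test vectors from the example: x^T M x takes both signs.
e1, e2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
print(e1 @ M @ e1, e2 @ M @ e2)  # 5.0 and -2.0, so M is indefinite
print(is_psd(M))                 # False

# eigvalsh returns real eigenvalues for a symmetric matrix (Theorem 2):
print(np.linalg.eigvalsh(M))     # one negative, one positive eigenvalue
```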