
Useful Facts, Identities, Inequalities

Linear Algebra

Notation

• A: a matrix; indicated by capitalization and bold font.
• [A]_ij: the element from the i-th row and j-th column of A; sometimes denoted A_ij for brevity.
• M_{m,n}: the set of complex-valued matrices with m rows and n columns; e.g., A ∈ M_{m,n}.
• M_n: the set of complex-valued square matrices; e.g., A ∈ M_n.
• I: the identity matrix; e.g., I = diag([1, 1, ..., 1]).
• e: the vector of ones.
• σ_i(A): the i-th singular value of A; typically, we assume sorted order: σ_1(A) ≥ σ_2(A) ≥ ...
• λ_i(A): the i-th eigenvalue of A; may also write λ(A) = {λ_i}_i for the set of eigenvalues.
• ρ(A): the spectral radius of A; ρ(A) = max_{1≤i≤n} {|λ_i|}.
• Ā: the complex conjugate of A; operates entrywise.
• A^⊤: the transpose of A.
• A^†: the conjugate transpose of A; e.g., A^† = (Ā)^⊤.
• A^{-1}: the matrix inverse of A; e.g., A^{-1}A = I.
• A^+: the Moore-Penrose pseudoinverse of A; sometimes we may just write A^{-1} due to sloppy notation.

Definitions

Definition 1 (Conjugate Transpose). Let A ∈ M_{m,n}; then A^† = (Ā)^⊤.

Definition 2 (Symmetric Matrix). A matrix A such that A^⊤ = A.

Definition 3 (Hermitian Matrix). A matrix A such that A = A^† = (Ā)^⊤.

Definition 4 (Normal Matrix). A is normal ⇔ A^†A = AA^†. That is, a normal matrix is one that commutes with its conjugate transpose.

Definition 5 (Unitary Matrix). A is unitary ⇔ I = A^†A = AA^†.

Definition 6 (Orthogonal Matrix). A square matrix A whose columns and rows are orthonormal unit vectors, that is, AA^⊤ = A^⊤A = I.

Definition 7 (Idempotent Matrix). A matrix such that AA = A.

Definition 8 (Nilpotent Matrix). A matrix such that A² = AA = 0.

Definition 9 (Involutory Matrix). A matrix A such that A² = I.

Definition 10 (Stochastic Matrix). Consider a matrix P with no negative entries. P is row stochastic if each row sums to one; it is column stochastic if each column sums to one. By convention, we usually mean row stochastic when using "stochastic" without qualification. A matrix is doubly stochastic if it is both row and column stochastic.

Definition 11 (Congruent Matrices). Two square matrices A and B over a field are called congruent if there exists an invertible matrix P over the same field such that P^⊤AP = B.

Definition 12 (Similar Matrices). Two square matrices A and B are called similar if there exists an invertible matrix P such that B = P^{-1}AP. A transformation A ↦ P^{-1}AP is called a similarity transformation or conjugation of the matrix A.

Special Matrices and Vectors

The vector of all ones, 1 = (1, 1, ..., 1)^⊤, gives the rank-one matrix

    v1^⊤ = v ⊗ 1 = [ v_1  v_1  ···  v_1 ]
                   [ v_2  v_2  ···  v_2 ]
                   [  ⋮    ⋮    ⋱    ⋮  ]
                   [ v_n  v_n  ···  v_n ]   (1)

Standard Basis

The standard (Euclidean) basis for R^n is denoted {e_i}_{i=1}^n, with e_i ∈ R^n, and

    [e_i]_j = 1 if i = j, and 0 if i ≠ j.   (2)

Definiteness

Definition 13 (Definiteness). A square matrix A is
• positive definite if x^†Ax > 0 ∀x ≠ 0.
• positive semi-definite if x^†Ax ≥ 0 ∀x.
Similar definitions apply for negative (semi-)definiteness and indefiniteness.

Some sources argue that only Hermitian matrices should count as positive definite, while others note that (real) non-symmetric matrices can be positive definite according to Definition 13, although this is ultimately due to the symmetric part of the matrix.

Remark 1 (Non-symmetric positive definite matrices). If A ∈ M_n(R) is a (not necessarily symmetric) real matrix, then x^⊤Ax > 0 for all non-zero real x if and only if the symmetric part of A, defined as A_+ = (A + A^⊤)/2, is positive definite. In fact, x^⊤Ax = x^⊤A_+x for all x, because

    x^⊤Ax = (x^⊤Ax)^⊤ = x^⊤A^⊤x  ⇒  x^⊤Ax = ½ x^⊤(A + A^⊤)x   (9)

Symmetric Matrices

• Any matrix congruent to a symmetric matrix is itself symmetric: if S is symmetric, then so is ASA^⊤ for any matrix A.
• Any symmetric real matrix can be diagonalized by an orthogonal matrix.

Facts about definite matrices

Assume A is positive definite. Then:
• A is always full rank.
• A is invertible and A^{-1} is also positive definite.
• A is positive definite ⇔ there exists a nonsingular B such that A = BB^⊤.
• A_ii > 0.
• rank(BAB^⊤) = rank(B).
• det(A) ≤ Π_i A_ii (Hadamard's inequality).
• If X ∈ M_{n,r}, with n ≤ r and rank(X) = n, then XX^⊤ is positive definite.
• A is positive definite ⇔ eig((A + A^†)/2) > 0 (and ≥ 0 for A positive semi-definite).
• If B is symmetric, then A − tB is positive definite for sufficiently small t > 0.

Determinants

Lemma 1 (Sylvester's Determinant Identity). Let A ∈ M_{m,n} and B ∈ M_{n,m}. Then:

    det(I_m + AB) = det(I_n + BA)   (3)

Determinant Facts

For A ∈ M_n with eigenvalues λ(A) = {λ_1, ..., λ_n}:
• tr(A) = Σ_{i=1}^n λ_i
• det(A) = Π_{i=1}^n λ_i
• Also, det(exp{A}) = exp{tr(A)}.

Lemma 2 (Block Determinants). Let M ∈ M_{n+m} be partitioned as:

    M = [ A  B ]
        [ C  D ]   (4)

where A ∈ M_n, D ∈ M_m, with B and C being n × m and m × n respectively. If B = 0_{n,m} or C = 0_{m,n}, then

    det(M) = det(A) det(D)   (5)

Furthermore, if A is invertible, then

    det(M) = det(A) det(D − CA^{-1}B)   (6)

A similar identity holds for D invertible.

Lemma 3 (Block Triangular Determinants). Consider a matrix M ∈ M_{mn} of the form:

    M = [ S_11  S_12  ···  S_1n ]
        [ S_21  S_22  ···  S_2n ]
        [  ⋮     ⋮     ⋱    ⋮   ]
        [ S_n1  S_n2  ···  S_nn ]   (7)

where each S_ij is an m × m matrix. If M is in block triangular form (that is, S_ij = 0 for j > i), then

    det(M) = Π_{i=1}^n det(S_ii) = det(S_11) det(S_22) ··· det(S_nn)   (8)
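These determinant identities are easy to sanity-check numerically. A minimal numpy sketch; the sizes and random seed are arbitrary choices, not from the original:

```python
import numpy as np

rng = np.random.default_rng(0)

# Lemma 1 (Sylvester): det(I_m + AB) = det(I_n + BA)
m, n = 4, 6
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
assert np.isclose(np.linalg.det(np.eye(m) + A @ B),
                  np.linalg.det(np.eye(n) + B @ A))

# Lemma 2: det(M) = det(A) det(D - C A^{-1} B) for invertible A
A2, B2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 2))
C2, D2 = rng.standard_normal((2, 3)), rng.standard_normal((2, 2))
M = np.block([[A2, B2], [C2, D2]])
schur = D2 - C2 @ np.linalg.solve(A2, B2)  # Schur complement of A2 in M
assert np.isclose(np.linalg.det(M), np.linalg.det(A2) * np.linalg.det(schur))

# Determinant facts: tr(A) = sum of eigenvalues, det(A) = product of eigenvalues
S = rng.standard_normal((4, 4))
ev = np.linalg.eigvals(S)
assert np.isclose(np.trace(S), ev.sum().real)
assert np.isclose(np.linalg.det(S), ev.prod().real)
```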
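A quick numerical check of the singular value facts above (sizes and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)        # sigma_1 >= sigma_2 >= ...

# sigma_1(A) = |||A|||_2, the spectral norm
assert np.isclose(s[0], np.linalg.norm(A, 2))

# sigma_i^2(A) = lambda_i(A^T A)
lam = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]
assert np.allclose(s**2, lam)

# Unitary invariance: sigma_i(U A V) = sigma_i(A)
U, _ = np.linalg.qr(rng.standard_normal((5, 5)))
V, _ = np.linalg.qr(rng.standard_normal((3, 3)))
assert np.allclose(np.linalg.svd(U @ A @ V, compute_uv=False), s)
```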

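The projection matrix in (11) can be checked the same way. A minimal sketch, assuming U = range(X) and a diagonal weight matrix D; all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
N, k = 6, 2
X = rng.standard_normal((N, k))      # columns of X span the subspace U
d = rng.random(N)
D = np.diag(d / d.sum())             # positive diagonal weight matrix

P = X @ np.linalg.inv(X.T @ D @ X) @ X.T @ D   # eq. (11)

# Projections are idempotent: P^2 = P
assert np.allclose(P @ P, P)

# P v solves the D-weighted least-squares problem min_w ||v - X w||_D
v = rng.standard_normal(N)
w, *_ = np.linalg.lstsq(np.sqrt(D) @ X, np.sqrt(D) @ v, rcond=None)
assert np.allclose(P @ v, X @ w)
```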
Miscellaneous facts

• ||Av||_2 = (v^⊤A^⊤Av)^{1/2} ≤ |||A^⊤A|||^{1/2} (v^⊤v)^{1/2}.
• |det(A)| = σ_1σ_2···σ_n, where σ_i is the i-th singular value of A.
• All eigenvalues of a matrix are no larger in magnitude than the largest singular value, i.e., |λ_i(A)| ≤ σ_1(A). In particular, |λ_1(A)| = ρ(A) ≤ σ_1(A) = |||A|||_2.

Decompositions

Symmetric Decomposition. A square matrix A can always be written as the sum of a symmetric matrix A_+ and an antisymmetric matrix A_−, such that A = A_+ + A_−.

QR Decomposition. A decomposition of a matrix A into the product of an orthogonal matrix Q and an upper triangular matrix R such that A = QR.

Singular Value Decomposition. A decomposition of A ∈ M_{m,n} as A = UΣV^†, where U and V are unitary and Σ is diagonal with the singular values σ_1 ≥ σ_2 ≥ ... ≥ 0 on its diagonal.

Rewriting Matrices

We can rewrite some common expressions using the standard basis e_i and the single-entry matrix J^{ij} = e_i e_j^⊤:

    A_ij B_kℓ = [A e_j e_k^⊤ B]_{iℓ} = [AJ^{jk}B]_{iℓ}
              = [A^⊤ e_i e_k^⊤ B]_{jℓ} = [A^⊤J^{ik}B]_{jℓ}
              = [A e_j e_ℓ^⊤ B^⊤]_{ik} = [AJ^{jℓ}B^⊤]_{ik}
              = [A^⊤ e_i e_ℓ^⊤ B^⊤]_{jk} = [A^⊤J^{iℓ}B^⊤]_{jk}   (13)

Note that the above identities can be derived from each other via appropriate transpositions.

The single-entry matrix can also be used to extract rows and columns of a matrix. Let A be n × m and J^{ij} be m × p. Then AJ^{ij} is an n × p matrix of zeros, except for the j-th column, which is a copy of the i-th column of A:

    AJ^{ij} = [ 0  ···  0  A_{1i}  0  ···  0 ]
              [ 0  ···  0  A_{2i}  0  ···  0 ]
              [ ⋮        ⋮    ⋮    ⋮        ⋮ ]
              [ 0  ···  0  A_{ni}  0  ···  0 ]   (14)

For appropriately sized J^{ij}, we can use the Kronecker delta to express:

    [AJ^{ij}]_{kℓ} = δ_{jℓ} A_{ki}        [J^{ij}A]_{kℓ} = δ_{ik} A_{jℓ}   (15)

Quadratic Forms

Definition 15 (Quadratic Form). Given A ∈ M_n, x ∈ C^n, the scalar value x^†Ax is called a quadratic form. Explicitly,

    x^†Ax = Σ_{i=1}^n Σ_{j=1}^n x̄_i A_ij x_j = Σ_i Σ_j A_ij x̄_i x_j   (16)

For x ∈ R^n, this becomes x^⊤Ax = Σ_{i,j} A_ij x_i x_j.

The definiteness of quadratic forms maps to the definiteness of A in a pretty natural way (see Remark 1).

Norms

Vector Norms

A vector norm ||·|| satisfies:
1. ||x|| ≥ 0.
2. ||0|| = 0.
3. ||x + y|| ≤ ||x|| + ||y||.
4. ||ax|| = |a| ||x|| for any scalar a.

Definition 16 (p-norms).

    ||x||_p = (Σ_i |x_i|^p)^{1/p}   (17)

For p = ∞ this becomes the max norm or infinity norm,

    ||x||_∞ = max_i |x_i|   (18)

Vector Norm Inequalities

• |⟨x, y⟩| ≤ ||x||_p ||y||_q for 1/p + 1/q = 1. (Hölder)
• |⟨u, v⟩| ≤ ||u||_2 ||v||_2 (Cauchy-Schwarz)
• ||u + v||_p ≤ ||u||_p + ||v||_p for p > 1 (Minkowski)
• ||x||_q ≥ ||x||_p for p > q > 0 (Generalized Mean)
• ||ABx|| ≤ |||A||| ||Bx|| ≤ |||A||| |||B||| ||x||, where |||·||| is the induced norm

Matrix Norms

Properties

    |||αA||| = |α| |||A|||            (absolutely homogeneous)
    |||A + B||| ≤ |||A||| + |||B|||   (sub-additive)
    |||A||| ≥ 0                       (positive valued)
    |||A||| = 0 ⇒ A = 0               (definiteness)
    |||AB||| ≤ |||A||| |||B|||        (sub-multiplicative)

Some vector norms, when generalized to matrices, are also matrix norms. We use |||·||| when denoting a norm satisfying all five of the above properties, and ||·|| to denote the "generalized norms" that are not sub-multiplicative.

Induced norms

Definition 17 (Induced Norm). A matrix norm |||·||| is said to be induced by the vector norm ||·|| if

    |||M||| = max_{||x||=1} ||Mx||   (19)

For an induced norm we have:
• |||I||| = 1
• ||Ax|| ≤ |||A||| ||x|| for all matrices A and vectors x
• |||AB||| ≤ |||A||| |||B||| for all A, B

For a weighted norm with W a symmetric positive definite matrix, we have

    ||x||_W = (x^⊤Wx)^{1/2} = ||W^{1/2}x||_2   (20)

The corresponding induced matrix norm is

    |||A|||_W = max_{x≠0} ||Ax||_W / ||x||_W = max_{y≠0} ||W^{1/2}AW^{-1/2}y||_2 / ||y||_2 = |||W^{1/2}AW^{-1/2}|||_2   (21)

Common Matrix Norms

Let A ∈ M_{m,n}.

• |||A|||_p (L_p-norm): max_{||x||_p=1} ||Ax||_p
• |||A|||_1 (L_1-norm): max_{1≤j≤n} Σ_{i=1}^m |A_ij|
• |||A|||_2 (L_2-norm): σ_1(A) = √(λ_max(A^†A))
• |||A|||_∞ (L_∞-norm): max_{1≤i≤m} Σ_{j=1}^n |A_ij|
• |||A|||_F (Frobenius norm): √(Σ_{i=1}^m Σ_{j=1}^n |A_ij|²) = √(tr(A^†A)) = √(Σ_{i=1}^{min{m,n}} σ_i²)

Spectrum of a Matrix

Definition 18 (Spectral Radius). The spectral radius ρ(A) of a matrix A ∈ M_n is

    ρ(A) = max{|λ| : λ is an eigenvalue of A}   (22)

For square matrices, we have lim_{r→∞} |||A^r|||^{1/r} = ρ(A).

Theorem 4. If |||·||| is a submultiplicative matrix norm and A ∈ M_n, then

    ρ(A) ≤ |||A|||   (23)

Proof. Let v be an eigenvector associated with λ, where |λ| = ρ(A). Consider the matrix V with all columns equal to v, i.e., [V]_ij = v_i. Note that AV = λV. So we have

    |λ| |||V||| = |||λV||| = |||AV||| ≤ |||A||| |||V|||   (24)

So |λ| = ρ(A) ≤ |||A|||. ∎

Theorem 5 (Weyl's Inequality for Eigenvalues and Singular Values). Let A ∈ M_n be a square matrix with singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n ≥ 0 and eigenvalues ordered so that |λ_1| ≥ ··· ≥ |λ_n|. Then

    |λ_1 ··· λ_k| ≤ σ_1 ··· σ_k   (25)

for k = 1, ..., n, with equality for k = n.

Miscellaneous Matrix Norm Facts

• |||A^+|||_2 = 1/σ_min(A), where σ_min denotes the smallest non-zero singular value.

Definition 19 (Unitarily Invariant Norm). A matrix norm |||·||| is unitarily invariant if |||UAV||| = |||A||| for all unitary U and V.

Other Norm Facts

1. If D = diag(d_1, d_2, ..., d_N), then |||D|||_2 = max_i |d_i|.
2. |||A^⊤A|||_F = |||AA^⊤|||_F ≤ |||A|||²_F
3. |||A + B|||²_F = |||A|||²_F + |||B|||²_F + 2⟨A, B⟩_F

Matrix Norm Inequalities

• Norm equivalence: |||A|||_2 ≤ √(|||A|||_1 |||A|||_∞)
• |||·||| an induced norm ⇒ ρ(A) ≤ |||A^r|||^{1/r}
• For |||·|||_2 and A Hermitian, we have |||A|||_2 = ρ(A)
• |||A|||_2 ≤ |||A|||_F ≤ √(min{m, n}) |||A|||_2
• Weyl's inequality (TODO)

Hadamard Product

Definition 20. Let A, B ∈ M_{n,r}. Then the Hadamard product (AKA Schur product, elementwise product) of A and B is denoted A ∘ B and defined such that

    [A ∘ B]_ij = A_ij B_ij   (26)

Identities

• A ∘ B = B ∘ A
• A ∘ (B ∘ C) = (A ∘ B) ∘ C
• A ∘ (B + C) = (A ∘ B) + (A ∘ C)
• (kA) ∘ B = A ∘ (kB) = k(A ∘ B)
• (A ∘ B)^⊤ = A^⊤ ∘ B^⊤
• A ∘ 1 = A (identity operation; here 1 is the matrix with all entries one)
• A ∘ (xy^⊤) = D_x A D_y = diag(x) A diag(y)
• (X ∘ ab^⊤)(Y ∘ cd^⊤) = (ad^⊤) ∘ (X diag(b ∘ c) Y)
• (A ∘ xy^⊤)z = x ∘ (A(y ∘ z))
• For D_x and D_y diagonal matrices,

    D_x(A ∘ B)D_y = (D_x A D_y) ∘ B = A ∘ (D_x B D_y) = (D_x A) ∘ (B D_y) = (A D_y) ∘ (D_x B)   (27)

• a ∘ b = D_a b (vector product)
• (a ∘ b)(x ∘ y)^⊤ = (ax^⊤) ∘ (by^⊤) = (ay^⊤) ∘ (bx^⊤)
• x^†(A ∘ B)y = tr(D_x^† A D_y B^⊤)
• [(A ∘ B)x]_i = [A D_x B^⊤]_ii
• Σ_j [A ∘ B]_ij = [AB^⊤]_ii. That is, the row sums of A ∘ B are the diagonal elements of AB^⊤.

Inequalities

For A = [A_ij] ∈ M_{m,n}, denote the decreasingly ordered Euclidean row and column lengths respectively by

    r_1(A) ≥ r_2(A) ≥ ··· ≥ r_m(A)
    c_1(A) ≥ c_2(A) ≥ ··· ≥ c_n(A)   (28)

where r_k(A) is the k-th largest value of (Σ_{j=1}^n |A_ij|²)^{1/2} for i = 1, 2, ..., m, and similarly for c_k(A). Then:

    σ_1(A ∘ B) ≤ r_1(A)c_1(B) ≤ min{r_1(A)σ_1(B), σ_1(A)c_1(B)} ≤ σ_1(A)σ_1(B)   (29)

(from [HJ91], Theorem 5.5.3, pg. 332; this is a bit of a niche result, but even some experts have missed it)

• |||A ∘ B|||_2 ≤ ½(|||A|||_2 |||B|||_2 + |||AB^†|||_2) ≤ |||A|||_2 |||B|||_2, for A, B ∈ M_n.
• σ_1(A ∘ B) ≤ σ_1(A)σ_1(B), for A, B ∈ M_{m,n}.
• det(A ∘ B) ≥ det(A) det(B), for A, B positive semi-definite.
• Heinz Inequality for Matrices: |||A^{1/2} X B^{1/2}||| ≤ ½ |||A^v X B^{1−v} + A^{1−v} X B^v||| ≤ ½ |||AX + XB|||, for 0 < v ≤ 1, A, B, X ∈ M_n with A, B positive semidefinite, and |||·||| unitarily invariant.
• Bhatia and Kittaneh: |||AB||| ≤ ¼ |||(A + B)²||| ≤ ½ |||A² + B²|||, for A, B positive semi-definite and |||·||| unitarily invariant.
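A few of the Hadamard product identities and the bound (29) lend themselves to quick numerical verification; a minimal numpy sketch (sizes and seed arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 5
A, B = rng.standard_normal((n, n)), rng.standard_normal((n, n))
x, y, z = rng.standard_normal((3, n))

# (A o xy^T) z = x o (A (y o z))
assert np.allclose((A * np.outer(x, y)) @ z, x * (A @ (y * z)))

# Row sums of A o B are the diagonal elements of A B^T
assert np.allclose((A * B).sum(axis=1), np.diag(A @ B.T))

# eq. (29): sigma_1(A o B) <= r_1(A) c_1(B) <= sigma_1(A) sigma_1(B)
s1 = lambda M: np.linalg.norm(M, 2)
r1 = np.sqrt((A**2).sum(axis=1)).max()   # largest Euclidean row length of A
c1 = np.sqrt((B**2).sum(axis=0)).max()   # largest Euclidean column length of B
eps = 1e-12
assert s1(A * B) <= r1 * c1 + eps
assert r1 * c1 <= s1(A) * s1(B) + eps
```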

Stochastic Matrices

• ρ(P) = 1 ≤ σ_1(P)
• |||P|||_2 = σ_1(P) = 1 ⇔ P is doubly stochastic.
• If P is doubly stochastic, then so is P^⊤P.

Some Results on Stochastic Matrices

For P aperiodic and irreducible, with stationary distribution d and D = diag(d), we have

    |||DP|||_2 ≤ 1   (30)

Proof. Note that by definition, 0 ≤ P_ij ≤ 1 and 0 < d_i < 1. Then,

    |||DP|||_2 ≤ |||DP|||_F = (Σ_i Σ_j [DP]²_ij)^{1/2}

    Σ_i Σ_j [DP]²_ij = Σ_i Σ_j d_i² P_ij² ≤ Σ_i Σ_j d_i P_ij = Σ_i d_i = 1   (31)

    ⇒ |||DP|||_2 ≤ |||DP|||_F ≤ 1^{1/2} = 1 ∎

The bound is actually pretty tight, and it gets close to exact as the distribution becomes more concentrated in a single state. Generally |||DP|||_2 is pretty close to ||d||_2, which can be justified in a hand-wavy manner. NB: we don't use strict inequality here, though we could, since this could plausibly be generalized to other kinds of stochastic matrices; this is future-proofing.
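A quick numerical illustration of (30), assuming a dense random chain (which is automatically irreducible and aperiodic); the size and seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 6
P = rng.random((N, N)) + 0.1           # strictly positive entries, so the
P /= P.sum(axis=1, keepdims=True)      # chain is irreducible and aperiodic

# Stationary distribution: the left eigenvector of P with eigenvalue 1
w, V = np.linalg.eig(P.T)
d = np.real(V[:, np.argmin(np.abs(w - 1))])
d /= d.sum()
assert np.allclose(d @ P, d)           # d^T P = d^T

D = np.diag(d)
assert np.linalg.norm(D @ P, 2) <= 1 + 1e-12   # eq. (30)
```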

We can use a variation of the above to show that P is a non-expansion in the distribution-weighted Euclidean norm, as in Lemma 6.

Lemma 6 (Stochastic Matrix is a Non-Expansion in Distribution-Weighted Euclidean Norm). Let P ∈ R^{N×N} be an ergodic stochastic matrix with stationary distribution d = (d_1, d_2, ..., d_N)^⊤, with D = diag(d). Then P is a non-expansion in the weighted Euclidean norm ||·||_D, that is,

    ||Pz||_D ≤ ||z||_D   (32)

Proof of Lemma 6. (From Lemma 1 in [TV97]) Recall that d^⊤P = d^⊤, and then expand:

    ||Pz||²_D = (Pz)^⊤ D (Pz) = Σ_{i=1}^N d(i) (Σ_j P_ij z_j)²
              ≤ Σ_{i=1}^N d(i) Σ_j P_ij z_j²
              = Σ_{j=1}^N (Σ_i d(i) P_ij) z_j² = Σ_{j=1}^N d(j) z_j²
              = ||z||²_D   (33)

where we first applied Jensen's inequality, as the function f(x) = x² is convex, and then interchanged the order of summation via the Fubini-Tonelli theorem¹. For the penultimate step, we recognize that Σ_i d(i) P_ij corresponds to d^⊤P = d^⊤, and then note that what remains is just ||z||²_D. ∎

¹ Exchanging the order of summation can actually be ill-defined if the series contains subsequences diverging to both positive and negative infinity. Given that d(·), P(·,·) and z²(·) are positive-valued functions, this does not happen; moreover, since we are dealing with a finite state space, the sums cannot diverge at all. Presumably the original proof invoked the theorem for maximum generality, because this lemma would hold in the setting where the state space was continuous (requiring only changes to the notation). See [Coh80] or another book on analysis for further details.

Lemma 7 (Bounded Features, Bounded Feature Matrix). Suppose we have ||x(i)||_2 ≤ K_b for some K_b > 0 for i = 1, ..., N, and let X ∈ R^{N×m} be the feature matrix with [X]_ij = x_j(i). Let D = diag(d) be a diagonal matrix with d_i ≥ 0 and Σ_i d_i = 1. Then we have an L2 norm bound of the form:

    |||DX|||_2 ≤ |||D^{1/2}X|||_2 ≤ K_b   (34)

Proof of Lemma 7. First, note that

    |||DX|||_2 = |||D^{1/2}(D^{1/2}X)|||_2 ≤ |||D^{1/2}|||_2 |||D^{1/2}X|||_2 ≤ |||D^{1/2}X|||_2   (35)

since |||D^{1/2}|||_2 = max_i √(d_i) ≤ 1. Then use the fact that |||·|||_2 ≤ |||·|||_F to get

    |||D^{1/2}X|||_2 ≤ |||D^{1/2}X|||_F   (36)

and upon expanding,

    |||D^{1/2}X|||²_F = Σ_i Σ_j [D^{1/2}X]²_ij = Σ_i Σ_j d_i X²_ij = Σ_i d_i Σ_j X²_ij ≤ Σ_i d_i K_b² = K_b²   (37)

so we have

    |||DX|||_2 ≤ |||D^{1/2}X|||_2 ≤ |||D^{1/2}X|||_F ≤ (K_b²)^{1/2} = K_b   (38) ∎

Sequences, Series, and Products

Identities

Series identities (many taken from Wolfram/MathWorld):

• Σ_{n=0}^∞ x^n = 1/(1 − x), for |x| < 1 (Infinite Geometric Series)
• (Σ_{n=0}^∞ a_n x^n)² = Σ_{k=0}^∞ a_k² x^{2k} + 2 Σ_{n=1}^∞ (Σ_{i+j=n, i<j} a_i a_j) x^n (Square of a Power Series)
• (Σ_{k=1}^n a_k b_k)² = (Σ_{k=1}^n a_k²)(Σ_{k=1}^n b_k²) − ½ Σ_{i=1}^n Σ_{j=1}^n (a_i b_j − a_j b_i)² (Lagrange's Identity)
• Σ_{k=0}^n log(a_k) = log(Π_{k=0}^n a_k), for 0 < a_k ∈ R (Log-Sum Identity)
• Σ_{k=0}^n log(z_k) = log(Π_{k=0}^n z_k) − 2πi ⌊(π − Σ_{k=0}^n arg(z_k)) / (2π)⌋, for n ∈ N (General Log-Sum Identity)
• Σ_{k=0}^m Σ_{j=0}^k a_k b_j = Σ_{j=0}^m Σ_{k=j}^m a_k b_j (Triangle-Sum Reordering)
• Σ_{k=0}^m Σ_{j=k}^p a_k b_j = Σ_{j=0}^m Σ_{k=0}^j a_k b_j + Σ_{j=m+1}^p Σ_{k=0}^m a_k b_j, for 0 ≤ k ≤ m, 0 ≤ k ≤ j ≤ p (Quadrangle-Sum Reordering)
• Σ_{k=0}^∞ Σ_{j=0}^∞ a_k b_j = Σ_{j=0}^∞ Σ_{k=0}^j a_k b_{j−k} (Infinite Double Sum Reordering)
• Σ_{k=0}^∞ Σ_{j=0}^k a_k b_j = Σ_{j=0}^∞ Σ_{k=j}^∞ a_k b_j (Infinite Double Sum Reordering)

Inequalities

Power Mean Bound for the Geometric Mean [Ste04, p. 122]. For weights p_k, k = 1, 2, ..., n with p_k ≥ 0, Σ_{k=1}^n p_k = 1, and x_k > 0, there is the bound:

    Π_{k=1}^n x_k^{p_k} ≤ [Σ_{k=1}^n p_k x_k^q]^{1/q}   (46)

for all q > 0.

Power Mean Inequality [Ste04, p. 123]. For weights p_k, k = 1, 2, ..., n with p_k ≥ 0 and Σ_{k=1}^n p_k = 1, there is the bound:

    [Σ_{k=1}^n p_k x_k^t]^{1/t} ≤ [Σ_{k=1}^n p_k x_k^q]^{1/q}   (47)

for all −∞ < t < q < ∞, with equality if and only if x_1 = x_2 = ··· = x_n.

Sums of Squares

Product of two linear forms:

    (Σ_{j=1}^n u_j x_j)(Σ_{j=1}^n v_j x_j) ≤ ½ [(Σ_{j=1}^n u_j²)^{1/2}(Σ_{j=1}^n v_j²)^{1/2} + Σ_{j=1}^n u_j v_j] Σ_{j=1}^n x_j²   (48)

for {u_i}, {v_i}, {x_i} real valued.

Cauchy-Schwarz

For {a_k} and {b_k} two sequences, we have

    (Σ_{k=1}^N a_k b_k)² ≤ (Σ_{k=1}^N a_k²)(Σ_{k=1}^N b_k²)   (39)

Some direct implications:

    (ac + bd)² ≤ (a² + b²)(c² + d²)   (40)

For a_i, b_i > 0, we have Titu's Lemma:

    (a_1 + a_2 + ··· + a_n)² / (b_1 + b_2 + ··· + b_n) ≤ a_1²/b_1 + a_2²/b_2 + ··· + a_n²/b_n   (41)

For 0 ≤ x < 1:

    Σ_{k=0}^∞ a_k x^k ≤ (1/√(1 − x²)) (Σ_{k=0}^∞ a_k²)^{1/2}   (42)

    Σ_{k=1}^n a_k / k < 2 (Σ_{k=1}^n a_k²)^{1/2}   (43)
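These implications can be spot-checked numerically; a minimal sketch with positive sequences (sizes, seed, and the value of x are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(5)
a = rng.random(8) + 0.1    # positive sequences
b = rng.random(8) + 0.1

# Cauchy-Schwarz (39)
assert (a @ b) ** 2 <= (a @ a) * (b @ b)

# Titu's lemma (41)
assert a.sum() ** 2 / b.sum() <= (a**2 / b).sum() + 1e-12

# eq. (42), truncated to finitely many terms, with 0 <= x < 1
x, k = 0.7, np.arange(8)
assert (a * x**k).sum() <= np.sqrt((a**2).sum() / (1 - x**2)) + 1e-12
```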

Arithmetic and Geometric Mean Inequality

In general, the arithmetic mean is larger:

    (Π_{k=1}^n |a_k|)^{1/n} ≤ (1/n) Σ_{k=1}^n |a_k|   (44)

There are some particular implications of interest. Let {a_i} be a sequence of nonnegative real numbers, and let {p_i} be a sequence of positive reals that sums to one. Then, via the exponential bound:

    Π_{k=1}^n a_k^{p_k} ≤ Σ_{k=1}^n p_k a_k   (45)

General and Miscellaneous

(Facts/information that will be split into their own areas once enough are collected)

Algebraic Identities

    a² − b² = (a + b)(a − b)

    (a² + b²)(c² + d²) = (ac + bd)² + (ad − bc)²   (Fibonacci-Brahmagupta)
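A quick numerical spot-check of both identities:

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, c, d = rng.standard_normal(4)

assert np.isclose(a**2 - b**2, (a + b) * (a - b))
# Fibonacci-Brahmagupta: a product of sums of two squares is again
# a sum of two squares
assert np.isclose((a**2 + b**2) * (c**2 + d**2),
                  (a*c + b*d)**2 + (a*d - b*c)**2)
```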

Stirling’s Approximation Stirling’s approximation: given as ln(n!) = n ln(n) − n + O(ln n) or   (49) √ n n n! ≈ 2πn ≈ nne−n e

comes with the following bounds that hold for n ∈ N⁺:

    √(2π) n^{n + 1/2} e^{−n} ≤ n! ≤ e n^{n + 1/2} e^{−n}   (50)
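A quick numerical check of (49) and (50); note the 1/(12n) relative-error rate asserted below is the leading term of the Stirling series, an added fact rather than something stated above:

```python
import math

for n in range(1, 15):
    exact = math.factorial(n)
    approx = math.sqrt(2 * math.pi * n) * (n / math.e) ** n   # eq. (49)
    lower = math.sqrt(2 * math.pi) * n ** (n + 0.5) * math.exp(-n)
    upper = math.e * n ** (n + 0.5) * math.exp(-n)
    assert lower <= exact <= upper                            # eq. (50)
    assert abs(exact - approx) / exact < 1.1 / (12 * n)
```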

Inequalities

• |⟨x, y⟩| ≤ ||x||_p ||y||_q for 1/p + 1/q = 1 (Hölder)
• |⟨u, v⟩| ≤ ||u||_2 ||v||_2 (Cauchy-Schwarz)
• ||u + v||_p ≤ ||u||_p + ||v||_p for p > 1 (Minkowski)
• ||x||_q ≥ ||x||_p for p > q > 0 (Generalized Mean)
• f(Σ_k p_k x_k) ≤ Σ_k p_k f(x_k) for f convex, p_k ≥ 0, Σ_k p_k = 1 (Jensen)
• (Π_{k=1}^n |a_k|)^{1/n} ≤ (1/n) Σ_{k=1}^n |a_k| (AM-GM)
• Σ_{k=1}^n x_k^{p+1} / y_k^p ≥ (Σ_{k=1}^n x_k)^{p+1} / (Σ_{k=1}^n y_k)^p for x_k ≥ 0, y_k > 0, p > 0 (Radon)
• 1 + x ≤ e^x for x ∈ R (Exponential Bound)
• 1 + nx ≤ (1 + x)^n for x ≥ −1, n ≥ 1 (Bernoulli)
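A few of these can be spot-checked numerically in one go; a minimal sketch (the exponents, sizes, and test points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(7)
x, y = rng.standard_normal((2, 10))
a = np.abs(rng.standard_normal(10)) + 0.1

# Holder with p = 3, q = 3/2 (so 1/p + 1/q = 1)
assert abs(x @ y) <= np.linalg.norm(x, 3) * np.linalg.norm(y, 1.5)

# Generalized mean: ||x||_q >= ||x||_p for p > q > 0
assert np.linalg.norm(x, 1) >= np.linalg.norm(x, 2)

# AM-GM: the geometric mean is at most the arithmetic mean
assert np.exp(np.log(a).mean()) <= a.mean()

# Exponential bound and Bernoulli
assert all(1 + t <= np.exp(t) for t in (-0.5, 0.0, 2.0))
assert all(1 + n * t <= (1 + t) ** n for t in (-0.5, 0.3) for n in (1, 2, 5))
```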