
Mathematics of Data: From Theory to Computation

Prof. Volkan Cevher volkan.cevher@epfl.ch

Lecture 1: Objects in space
Laboratory for Information and Inference Systems (LIONS)
École Polytechnique Fédérale de Lausanne (EPFL)

EE-556 (Fall 2015)

License Information for Mathematics of Data Slides

▶ This work is released under a Creative Commons License with the following terms:
▶ Attribution
  ▶ The licensor permits others to copy, distribute, display, and perform the work. In return, licensees must give the original authors credit.
▶ Non-Commercial
  ▶ The licensor permits others to copy, distribute, display, and perform the work. In return, licensees may not use the work for commercial purposes, unless they get the licensor's permission.
▶ Share Alike
  ▶ The licensor permits others to distribute derivative works only under a license identical to the one that governs the licensor's work.
▶ Full Text of the License

Outline

▶ This class:
  1. Linear algebra review
     ▶ Notation
     ▶ Vectors
     ▶ Matrices
     ▶ Tensors

▶ Next class:
  1. Review of probability theory

Recommended reading material

▶ Zico Kolter and Chuong Do, Linear Algebra Review and Reference, http://cs229.stanford.edu/section/cs229-linalg.pdf, 2012.

▶ KC Border, Quick Review of Matrix and Real Linear Algebra, http://www.hss.caltech.edu/~kcb/Notes/LinearAlgebra.pdf, 2013.

▶ Simon Foucart and Holger Rauhut, A Mathematical Introduction to Compressive Sensing (Appendix A: Matrix Analysis), Springer, 2013.

▶ Joel A. Tropp, Column subset selection, matrix factorization, and eigenvalue optimization, in Proc. of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 978-986, SIAM, 2009.


Motivation

▶ This lecture is intended to help you follow mathematical discussions in data sciences, which rely heavily on basic linear algebra concepts.

Notation

▶ Scalars are denoted by lowercase letters (e.g., k)
▶ Vectors by lowercase boldface letters (e.g., x)
▶ Matrices by uppercase boldface letters (e.g., A)

▶ Components of a vector x and a matrix A are denoted by x_i and a_ij, respectively (and A_{i,j,k,...} for tensors)
▶ Sets by uppercase calligraphic letters (e.g., S)

Vector spaces

Note: We focus on the field of real numbers (R) but most of the results can be generalized to the field of complex numbers (C).

Definition (Vector space)
A vector space or linear space (over the field R) consists of
(a) a set of vectors V,
(b) an addition operation: V × V → V,
(c) a scalar multiplication operation: R × V → V,
(d) a distinguished element 0 ∈ V,
and satisfies the following properties:
1. x + y = y + x, ∀x, y ∈ V (commutativity of addition)
2. (x + y) + z = x + (y + z), ∀x, y, z ∈ V (associativity of addition)
3. 0 + x = x, ∀x ∈ V (0 is the additive identity)
4. ∀x ∈ V, ∃(−x) ∈ V such that x + (−x) = 0 (−x is the additive inverse)
5. (αβ)x = α(βx), ∀α, β ∈ R, ∀x ∈ V (associativity of scalar multiplication)
6. α(x + y) = αx + αy, ∀α ∈ R, ∀x, y ∈ V (distributivity)
7. 1x = x, ∀x ∈ V (1 is the multiplicative identity)

Vector spaces contd.

Example (Vector space)

1. V_1 = {0} for 0 ∈ R^p
2. V_2 = R^p
3. V_3 = { Σ_{i=1}^k α_i x_i | α_i ∈ R } for x_1, ..., x_k ∈ R^p

It is straightforward to show that V_1, V_2, and V_3 satisfy properties 1-7 above.

Definition (Subspace)
A subspace is a vector space that is a subset of another vector space.

Example (Subspace)
V_1, V_2, and V_3 in the example above are subspaces of R^p.

Vector spaces contd.

Definition (Span)

The span of a set of vectors {x_1, x_2, ..., x_k} is the set of all possible linear combinations of these vectors; i.e.,

span{x_1, x_2, ..., x_k} = {α_1 x_1 + α_2 x_2 + ··· + α_k x_k | α_1, α_2, ..., α_k ∈ R}.

Definition (Linear independence)

A set of vectors {x_1, x_2, ..., x_k} is linearly independent if

α_1 x_1 + α_2 x_2 + ··· + α_k x_k = 0 ⇒ α_1 = α_2 = ··· = α_k = 0.

Definition (Basis)

A basis of a vector space V is a set of vectors {x_1, x_2, ..., x_k} that satisfies
(a) V = span{x_1, x_2, ..., x_k},
(b) {x_1, x_2, ..., x_k} are linearly independent.

Definition (Dimension∗)
The dimension of a vector space V, denoted dim(V), is the number of vectors in a basis of V.

∗We will generalize the concept of affine dimension to the statistical dimension of convex objects.

Vector norms

Definition (Vector norm)
A norm of a vector in R^p is a map ‖·‖ : R^p → R such that, for all vectors x, y ∈ R^p and scalars λ ∈ R:
(a) ‖x‖ ≥ 0 for all x ∈ R^p (nonnegativity)
(b) ‖x‖ = 0 if and only if x = 0 (definiteness)
(c) ‖λx‖ = |λ|‖x‖ (homogeneity)
(d) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)

▶ There is a family of ℓ_q-norms parameterized by q ∈ [1, ∞].
▶ For x ∈ R^p, the ℓ_q-norm is defined as ‖x‖_q := ( Σ_{i=1}^p |x_i|^q )^{1/q}.

Example

(1) ℓ_2-norm: ‖x‖_2 := ( Σ_{i=1}^p x_i² )^{1/2} (Euclidean norm)
(2) ℓ_1-norm: ‖x‖_1 := Σ_{i=1}^p |x_i| (Manhattan norm)
(3) ℓ_∞-norm: ‖x‖_∞ := max_{i=1,...,p} |x_i| (Chebyshev norm)
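
To make these definitions concrete, here is a minimal NumPy sketch (our illustration, not part of the original slides) that evaluates the ℓ_q-norm from the definition and checks it against numpy.linalg.norm:

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

def lq_norm(x, q):
    """l_q-norm from the definition: (sum_i |x_i|^q)^(1/q)."""
    return (np.abs(x) ** q).sum() ** (1.0 / q)

assert np.isclose(lq_norm(x, 2), np.linalg.norm(x, 2))    # Euclidean
assert np.isclose(lq_norm(x, 1), np.linalg.norm(x, 1))    # Manhattan
# The Chebyshev norm is the q -> infinity limit: max_i |x_i|
assert np.isclose(np.abs(x).max(), np.linalg.norm(x, np.inf))
```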

Vector norms contd.

Definition (Quasi-norm)
A quasi-norm satisfies all the norm properties except (d) the triangle inequality, which is replaced by ‖x + y‖ ≤ c(‖x‖ + ‖y‖) for a constant c ≥ 1.

Definition (Semi(pseudo)-norm)
A semi(pseudo)-norm satisfies all the norm properties except (b) definiteness.

Example

▶ The ℓ_q-norm is in fact a quasi-norm when q ∈ (0, 1), with c = 2^{1/q − 1}.
▶ The total variation norm (TV-norm), defined in 1D as ‖x‖_TV := Σ_{i=1}^{p−1} |x_{i+1} − x_i|, is a semi-norm since it fails to satisfy (b); e.g., any x = c(1, 1, ..., 1)^T with c ≠ 0 has ‖x‖_TV = 0 even though x ≠ 0.

Definition (ℓ_0-"norm")
‖x‖_0 := lim_{q→0} ‖x‖_q^q = |{i : x_i ≠ 0}|

The ℓ_0-norm counts the nonzero components of x. It is not a norm: it fails property (c) (homogeneity), so it is neither a quasi-norm nor a semi-norm.

Vector norms contd.

Problem (s-sparse approximation)

Find arg min_{x∈R^p} ‖x − y‖_2 subject to ‖x‖_0 ≤ s.


Solution
Define ŷ ∈ arg min_{x∈R^p : ‖x‖_0 ≤ s} ‖x − y‖_2² and let Ŝ = supp(ŷ). We now consider an optimization over sets:

Ŝ ∈ arg min_{S : |S|≤s} ‖y_S − y‖_2²
  ∈ arg max_{S : |S|≤s} ( ‖y‖_2² − ‖y_S − y‖_2² )
  ∈ arg max_{S : |S|≤s} ‖y_S‖_2² = arg max_{S : |S|≤s} Σ_{i∈S} |y_i|²   (≡ modular approximation problem).

Thus, the best s-sparse approximation of a vector keeps its s largest-magnitude components and sets the rest to zero.
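
A short NumPy sketch of this solution (the helper name s_sparse_approx is ours, not from the slides): hard-threshold all but the s largest-magnitude entries.

```python
import numpy as np

def s_sparse_approx(y, s):
    """Best s-sparse approximation: keep the s largest-magnitude entries of y."""
    x = np.zeros_like(y)
    keep = np.argsort(np.abs(y))[-s:]   # indices of the s largest |y_i|
    x[keep] = y[keep]
    return x

y = np.array([0.5, -3.0, 1.2, 0.1, 2.4])
print(s_sparse_approx(y, 2))            # [ 0.  -3.   0.   0.   2.4]
```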

Vector norms contd.

Norm balls
The radius-r ball in the ℓ_q-norm is B_q(r) = {x ∈ R^p : ‖x‖_q ≤ r}.

[Figure: example norm balls in R^3: the set {x : ‖x‖_0 ≤ 2}, the ℓ_0.5-quasi-norm ball, the ℓ_1-, ℓ_2-, and ℓ_∞-norm balls, and the TV-semi-norm ball.]

Inner products

Definition (Inner product)
The inner product of two vectors x, y ∈ R^p, denoted ⟨·, ·⟩, is defined as ⟨x, y⟩ = x^T y = Σ_{i=1}^p x_i y_i.

The inner product satisfies the following properties:
1. ⟨x, y⟩ = ⟨y, x⟩, ∀x, y ∈ R^p (symmetry)
2. ⟨αx + βy, z⟩ = ⟨αx, z⟩ + ⟨βy, z⟩, ∀α, β ∈ R, ∀x, y, z ∈ R^p (linearity)
3. ⟨x, x⟩ ≥ 0, ∀x ∈ R^p (positive definiteness)

Important relations involving the inner product:

▶ Hölder's inequality: |⟨x, y⟩| ≤ ‖x‖_q ‖y‖_r, where 1/q + 1/r = 1
▶ Cauchy-Schwarz is the special case of Hölder's inequality with q = r = 2

Definition (Inner product space)
An inner product space is a vector space endowed with an inner product.

Vector norms contd.

Definition (Dual norm)
Let ‖·‖ be a norm in R^p. The dual norm, denoted ‖·‖^∗, is defined as ‖x‖^∗ = sup_{‖y‖≤1} x^T y for x ∈ R^p.

▶ The dual of the dual norm is the original (primal) norm, i.e., ‖x‖^{∗∗} = ‖x‖.
▶ Hölder's inequality ⇒ ‖·‖_q is the dual norm of ‖·‖_r when 1/q + 1/r = 1.

Example 1
i) ‖·‖_2 is the dual of ‖·‖_2 (i.e., ‖·‖_2 is self-dual): sup{z^T x | ‖x‖_2 ≤ 1} = ‖z‖_2.
ii) ‖·‖_1 is the dual of ‖·‖_∞ (and vice versa): sup{z^T x | ‖x‖_∞ ≤ 1} = ‖z‖_1.

Example 2
What is the dual norm of ‖·‖_q for q = 1 + 1/log(p)?

Solution
By Hölder's inequality, ‖·‖_r is the dual norm of ‖·‖_q if 1/q + 1/r = 1. Therefore, r = 1 + log(p) for q = 1 + 1/log(p).
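
As a sanity check (our own illustration, not from the slides), the dual of the ℓ_1-norm can be evaluated by brute force: the sup of x^T y over the ℓ_1 unit ball is attained at a signed standard basis vector, recovering ‖x‖_∞.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(5)

# Extreme points of the l1 unit ball are +/- the standard basis vectors,
# so sup_{||y||_1 <= 1} x^T y = max_i |x_i| = ||x||_inf.
extremes = np.vstack([np.eye(5), -np.eye(5)])
dual_val = max(e @ x for e in extremes)
assert np.isclose(dual_val, np.linalg.norm(x, np.inf))
```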

Metrics

▶ A metric on a set is a function that satisfies the minimal properties of a distance.

Definition (Metric)
Let X be a set. A function d(·, ·) : X × X → R is a metric if, ∀x, y ∈ X:
(a) d(x, y) ≥ 0 for all x and y (nonnegativity)
(b) d(x, y) = 0 if and only if x = y (definiteness)
(c) d(x, y) = d(y, x) (symmetry)
(d) d(x, y) ≤ d(x, z) + d(z, y) for all z ∈ X (triangle inequality)

▶ A pseudo-metric satisfies (a), (c), and (d), but not necessarily (b)
▶ A metric space (X, d) is a set X with a metric d defined on X
▶ Norms induce metrics, while pseudo-norms induce pseudo-metrics

Example

▶ Euclidean distance: d_E(x, y) = ‖x − y‖_2
▶ Bregman distance: d_B(·, ·) ... more on this later!

Basic matrix definitions

Definition (Nullspace of a matrix)
The nullspace of a matrix A ∈ R^{n×p}, denoted null(A), is defined as null(A) = {x ∈ R^p | Ax = 0}.

▶ null(A) is the set of vectors mapped to zero by A.
▶ null(A) is the set of vectors orthogonal to the rows of A.

Definition (Range of a matrix)
The range of a matrix A ∈ R^{n×p}, denoted range(A), is defined as range(A) = {Ax | x ∈ R^p} ⊆ R^n.

▶ range(A) is the span of the columns (i.e., the column space) of A.

Definition (Rank of a matrix)
The rank of a matrix A ∈ R^{n×p}, denoted rank(A), is defined as rank(A) = dim(range(A)).

▶ rank(A) is the maximum number of linearly independent columns (or rows) of A; hence rank(A) ≤ min(n, p).
▶ rank(A) = rank(A^T), and rank(A) + dim(null(A)) = p (rank-nullity).
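
A quick NumPy/SciPy sketch (ours; scipy.linalg.null_space requires SciPy ≥ 1.1) illustrating rank, nullspace, and the rank-nullity identity:

```python
import numpy as np
from scipy.linalg import null_space

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])            # rank-1 matrix in R^{2x3}

r = np.linalg.matrix_rank(A)            # 1
N = null_space(A)                       # orthonormal basis of null(A)
assert np.allclose(A @ N, 0)            # nullspace vectors map to zero
assert r + N.shape[1] == A.shape[1]     # rank(A) + dim(null(A)) = p
```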

Matrix definitions contd.

Definition (Eigenvalues & eigenvectors)
A nonzero vector x is an eigenvector of a matrix A ∈ R^{n×n} if Ax = λx, where λ ∈ R is called an eigenvalue of A.

▶ A scales its eigenvectors by its eigenvalues.

Definition (Singular values & singular vectors)
For A ∈ R^{n×p} and unit vectors u ∈ R^n and v ∈ R^p, if Av = σu and A^T u = σv, then σ ∈ R (σ ≥ 0) is a singular value of A; v and u are the corresponding right and left singular vectors of A, respectively.

Definition (Symmetric matrix)
A matrix A ∈ R^{n×n} is symmetric if A = A^T.

Lemma
The eigenvalues of a symmetric matrix A are real.

Proof.
Assume Ax = λx with x ∈ C^n, x ≠ 0. Then x̄^T Ax = x̄^T (λx) = λ Σ_{i=1}^n |x_i|², but also x̄^T Ax = (Ax̄)^T x = (λ̄x̄)^T x = λ̄ Σ_{i=1}^n |x_i|², so λ = λ̄, i.e., λ ∈ R. □
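
A numerical illustration of the lemma (ours, not from the slides): symmetrizing a random matrix yields a real spectrum.

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # symmetric matrix

w = np.linalg.eigvals(A)                # generic eigenvalue routine
assert np.allclose(w.imag, 0)           # symmetric => eigenvalues are real
w_real = np.linalg.eigvalsh(A)          # symmetric routine returns them real and sorted
```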

Matrix definitions contd.

Definition (Positive semidefinite & positive definite matrices)
A symmetric matrix A ∈ R^{n×n} is positive semidefinite (denoted A ⪰ 0) if x^T Ax ≥ 0 for all x; it is positive definite (denoted A ≻ 0) if x^T Ax > 0 for all x ≠ 0.

▶ A ⪰ 0 iff all its eigenvalues are nonnegative, i.e., λ_min(A) ≥ 0.
▶ Similarly, A ≻ 0 iff all its eigenvalues are positive, i.e., λ_min(A) > 0.
▶ A is negative semidefinite if −A ⪰ 0; A is negative definite if −A ≻ 0.
▶ Semidefinite ordering of two symmetric matrices A and B: A ⪰ B if A − B ⪰ 0.

Example (Matrix inequalities)

1. If A ⪰ 0 and B ⪰ 0, then A + B ⪰ 0
2. If A ⪰ B and C ⪰ D, then A + C ⪰ B + D
3. If B ⪰ 0, then A + B ⪰ A
4. If A ⪰ 0 and α ≥ 0, then αA ⪰ 0
5. If A ≻ 0, then A² ≻ 0
6. If A ≻ 0, then A⁻¹ ≻ 0
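
A minimal positive-semidefiniteness test based on the eigenvalue characterization above (the function name is ours; the tolerance guards against round-off):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """A is PSD iff A = A^T and lambda_min(A) >= 0 (up to round-off)."""
    return np.allclose(A, A.T) and np.linalg.eigvalsh(A).min() >= -tol

A = np.array([[2., -1.],
              [-1., 2.]])               # eigenvalues 1 and 3 => positive definite
print(is_psd(A), is_psd(-A))            # True False
```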

Matrix decompositions

Definition (Eigenvalue decomposition)
The eigenvalue decomposition of a square matrix A ∈ R^{n×n} is given by A = XΛX⁻¹.

▶ The columns of X ∈ R^{n×n}, i.e., x_i, are eigenvectors of A.
▶ Λ = diag(λ_1, λ_2, ..., λ_n), where the λ_i (also denoted λ_i(A)) are eigenvalues of A.
▶ A matrix that admits this decomposition is called diagonalizable.

Eigendecomposition of symmetric matrices
If A ∈ R^{n×n} is symmetric, the decomposition becomes A = UΛU^T, where U ∈ R^{n×n} is unitary (or orthonormal), i.e., U^T U = I, and the λ_i are real.
If we order λ_1 ≥ λ_2 ≥ ··· ≥ λ_n, then λ_i(A) is the i-th largest eigenvalue of A:
▶ λ_n(A) = λ_min(A) is the minimum eigenvalue of A
▶ λ_1(A) = λ_max(A) is the maximum eigenvalue of A
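
A NumPy sketch of the symmetric eigendecomposition (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
A = (B + B.T) / 2                        # symmetric

lam, U = np.linalg.eigh(A)               # A = U diag(lam) U^T, lam in ascending order
assert np.allclose(U.T @ U, np.eye(3))   # U is orthonormal
assert np.allclose(U @ np.diag(lam) @ U.T, A)
```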

Matrix decompositions contd.

Definition (Determinant of a matrix)
The determinant of a square matrix A ∈ R^{p×p}, denoted det(A), is given by det(A) = Π_{i=1}^p λ_i, where the λ_i are the eigenvalues of A.

Matrix decompositions contd.

Definition (Singular value decomposition)
The singular value decomposition (SVD) of a matrix A ∈ R^{n×p} is given by
A = UΣV^T = Σ_{i=1}^r σ_i u_i v_i^T
▶ rank(A) = r ≤ min(n, p), and σ_i is the i-th singular value of A
▶ u_i and v_i are the i-th left and right singular vectors of A, respectively
▶ U ∈ R^{n×r} and V ∈ R^{p×r} have orthonormal columns (i.e., U^T U = V^T V = I)
▶ Σ = diag(σ_1, σ_2, ..., σ_r), where σ_1 ≥ σ_2 ≥ ··· ≥ σ_r ≥ 0
▶ The v_i are eigenvectors of A^T A; σ_i = √(λ_i(A^T A)) (and λ_i(A^T A) = 0 for i > r), since A^T A = (UΣV^T)^T (UΣV^T) = VΣ²V^T
▶ The u_i are eigenvectors of AA^T; σ_i = √(λ_i(AA^T)) (and λ_i(AA^T) = 0 for i > r), since AA^T = (UΣV^T)(UΣV^T)^T = UΣ²U^T
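
A quick check of the SVD and its relation to the eigenvalues of A^T A (our illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # thin SVD: A = U diag(s) V^T
assert np.allclose(U @ np.diag(s) @ Vt, A)

# sigma_i = sqrt(lambda_i(A^T A)), eigenvalues taken in descending order
lam = np.linalg.eigvalsh(A.T @ A)[::-1]
assert np.allclose(s, np.sqrt(np.clip(lam, 0.0, None)))
```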

Matrix decompositions contd.

Definition (LU)
The LU factorization of a nonsingular square matrix A ∈ R^{p×p} is given by A = PLU, where P is a permutation matrix¹, L is lower triangular, and U is upper triangular.

Definition (QR)
The QR factorization of any matrix A ∈ R^{n×p} is given by A = QR, where Q ∈ R^{n×n} is an orthonormal matrix, i.e., Q^T Q = I, and R ∈ R^{n×p} is upper triangular.

Definition (Cholesky)
The Cholesky factorization of a symmetric positive definite matrix A ∈ R^{p×p} is given by A = LL^T, where L is a lower triangular matrix with positive entries on the diagonal.

¹A matrix P ∈ R^{p×p} is a permutation matrix if it has exactly one 1 in each row and each column (and zeros elsewhere).
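
The three factorizations as computed by SciPy (a sketch under SciPy's conventions; see scipy.linalg for details):

```python
import numpy as np
from scipy.linalg import lu, qr, cholesky

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 4))

P, L, U = lu(A)                          # A = P L U (row permutations in P)
assert np.allclose(P @ L @ U, A)

Q, R = qr(A)                             # A = Q R with Q^T Q = I
assert np.allclose(Q @ R, A)

S = A @ A.T + 4 * np.eye(4)              # symmetric positive definite
Lc = cholesky(S, lower=True)             # S = L L^T
assert np.allclose(Lc @ Lc.T, S)
```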

Complexity of matrix operations

The complexity or cost of an algorithm is expressed in terms of floating-point operations (flops) as a function of the problem dimension.

Definition (Floating-point operation)
A floating-point operation (flop) is one addition, subtraction, multiplication, or division of two floating-point numbers.

Complexity of matrix operations contd.

Table: Complexity examples. Vectors are in R^p; matrices are in R^{n×p} or R^{p×m} (R^{p×p} for square matrices).

Operation              | Complexity                                   | Remarks
vector addition        | p flops                                      |
vector inner product   | 2p − 1 flops, ≈ 2p for large p               |
matrix-vector product  | n(2p − 1) flops, ≈ 2np for large p           | 2m if A is sparse with m nonzeros
matrix-matrix product  | mn(2p − 1) flops, ≈ 2mnp for large p         | much less if A is sparse¹
LU decomposition       | (2/3)p³ + 2p² flops, ≈ (2/3)p³ for large p   | much less if A is sparse¹
Cholesky decomposition | (1/3)p³ + 2p² flops, ≈ (1/3)p³ for large p   | much less if A is sparse¹
SVD                    | C₁n²p + C₂p³ flops                           | C₁ = 4, C₂ = 22 for the R-SVD algorithm
determinant            | complexity of SVD                            |

¹ Complexity depends on p, the number of nonzeros in A, and the sparsity pattern.

Matrix norms

Similar to vector norms, matrix norms define a metric over matrices:

Definition (Matrix norm)
A norm of an n × p matrix is a map ‖·‖ : R^{n×p} → R such that, for all matrices A, B ∈ R^{n×p} and scalars λ ∈ R:
(a) ‖A‖ ≥ 0 for all A ∈ R^{n×p} (nonnegativity)
(b) ‖A‖ = 0 if and only if A = 0 (definiteness)
(c) ‖λA‖ = |λ|‖A‖ (homogeneity)
(d) ‖A + B‖ ≤ ‖A‖ + ‖B‖ (triangle inequality)

Definition (Matrix inner product)
The matrix inner product is defined as ⟨A, B⟩ = trace(AB^T).

Linear operators

▶ Matrices are often given in an implicit form.
▶ It is convenient to think of them as linear operators.

Proposition (Linear operators & matrices)
Any linear operator between finite-dimensional spaces can be represented as a matrix.

Example
Given matrices A, B, and X with compatible dimensions, a linear operator M can define the following implicit mapping:

M(X) := (B^T ⊗ A) vec(X) = vec(AXB),

where ⊗ is the Kronecker product and vec : R^{n×p} → R^{np} is yet another linear operator that vectorizes X by stacking its columns.

Note: Clearly, it is more efficient to compute vec(AXB) than to perform the matrix multiplication (B^T ⊗ A) vec(X).
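
A numerical check of the vec identity above (our illustration); note how much cheaper the implicit computation is than forming the Kronecker product:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((3, 4))
X = rng.standard_normal((4, 2))
B = rng.standard_normal((2, 5))

vec = lambda M: M.reshape(-1, order="F")     # stack columns (column-major)

lhs = np.kron(B.T, A) @ vec(X)               # explicit (B^T kron A) vec(X)
rhs = vec(A @ X @ B)                         # implicit and far cheaper
assert np.allclose(lhs, rhs)
```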

Matrix norms contd.

Definition (Operator norm)

The operator norm between ℓ_q and ℓ_r (1 ≤ q, r ≤ ∞) of a matrix A is defined as

‖A‖_{q→r} := sup_{‖x‖_q ≤ 1} ‖Ax‖_r.

Problem

Show that ‖A‖_{2→2} = ‖A‖, i.e., the ℓ_2 → ℓ_2 operator norm is the spectral norm.

Solution

‖A‖_{2→2} = sup_{‖x‖_2≤1} ‖Ax‖_2 = sup_{‖x‖_2≤1} ‖UΣV^T x‖_2   (using the SVD of A)
          = sup_{‖x‖_2≤1} ‖ΣV^T x‖_2                           (rotational invariance of ‖·‖_2)
          = sup_{‖z‖_2≤1} ‖Σz‖_2                               (letting z = V^T x)
          = sup_{‖z‖_2≤1} ( Σ_{i=1}^{min(n,p)} σ_i² z_i² )^{1/2} = σ_max = ‖A‖  □
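
A numerical companion to this derivation (ours): NumPy's matrix 2-norm equals the largest singular value, and power iteration on A^T A recovers it without a full SVD (assuming the top singular value is isolated, which holds almost surely for random A):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((4, 3))

sigma_max = np.linalg.svd(A, compute_uv=False)[0]
assert np.isclose(np.linalg.norm(A, 2), sigma_max)   # spectral norm = sigma_max

# Power iteration on A^T A converges to the top right singular vector
x = rng.standard_normal(3)
for _ in range(200):
    x = A.T @ (A @ x)
    x /= np.linalg.norm(x)
assert np.isclose(np.linalg.norm(A @ x), sigma_max)
```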

Matrix norms contd.

Other examples

▶ ‖A‖_{∞→∞} (the norm induced by the ℓ_∞-norm), also denoted ‖A‖_∞, is the max-row-sum norm:

‖A‖_{∞→∞} := sup{‖Ax‖_∞ | ‖x‖_∞ ≤ 1} = max_{i=1,...,n} Σ_{j=1}^p |a_ij|.

▶ ‖A‖_{1→1} (the norm induced by the ℓ_1-norm), also denoted ‖A‖_1, is the max-column-sum norm:

‖A‖_{1→1} := sup{‖Ax‖_1 | ‖x‖_1 ≤ 1} = max_{j=1,...,p} Σ_{i=1}^n |a_ij|.
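
Both induced norms are one-liners in NumPy (our check):

```python
import numpy as np

A = np.array([[1., -2.],
              [3.,  4.]])

max_row_sum = np.abs(A).sum(axis=1).max()    # ||A||_{inf->inf}
max_col_sum = np.abs(A).sum(axis=0).max()    # ||A||_{1->1}
assert np.isclose(max_row_sum, np.linalg.norm(A, np.inf))
assert np.isclose(max_col_sum, np.linalg.norm(A, 1))
```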

Matrix norms contd.

Useful relation for operator norms
The following identity holds:

‖A‖_{q→r} := max_{‖z‖_{r′}≤1, ‖x‖_q≤1} ⟨z, Ax⟩ = max_{‖z‖_{r′}≤1, ‖x‖_q≤1} ⟨A^T z, x⟩ =: ‖A^T‖_{r′→q′},

whenever 1/q + 1/q′ = 1 = 1/r + 1/r′.

Example

1. ‖A‖_{∞→1} = ‖A^T‖_{∞→1}
2. ‖A‖_{2→1} = ‖A^T‖_{∞→2}
3. ‖A‖_{∞→2} = ‖A^T‖_{2→1}

⋆Matrix norms contd.

Computation of operator norms

▶ The computation of some operator norms is NP-hard∗ [3]; these include:
  1. ‖A‖_{∞→1}
  2. ‖A‖_{2→1}
  3. ‖A‖_{∞→2}
▶ But some of them can be approximated [5]; these include:
  1. ‖A‖_{∞→1} (via Grothendieck factorization)
  2. ‖A‖_{∞→2} (via Pietsch factorization)

∗: See Lecture 3.

Matrix norms contd.

▶ Similar to vector ℓ_q-norms, we have Schatten q-norms for matrices.

Definition (Schatten q-norms)
‖A‖_q := ( Σ_{i=1}^p (σ(A)_i)^q )^{1/q}, where σ(A)_i is the i-th singular value of A.

Example (with r = min{n, p} and σ_i = σ(A)_i)

‖A‖_1 = ‖A‖_∗ := Σ_{i=1}^r σ_i ≡ trace(√(A^T A))   (nuclear/trace norm)
‖A‖_2 = ‖A‖_F := ( Σ_{i=1}^r σ_i² )^{1/2} ≡ ( Σ_{i=1}^n Σ_{j=1}^p |a_ij|² )^{1/2}   (Frobenius norm)
‖A‖_∞ = ‖A‖ := max_{i=1,...,r} σ_i ≡ max_{x≠0} ‖Ax‖_2 / ‖x‖_2   (spectral/matrix norm)
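
The three Schatten norms from the singular values, checked against NumPy's built-ins (our sketch):

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((5, 3))
s = np.linalg.svd(A, compute_uv=False)       # singular values, descending

assert np.isclose(s.sum(),               np.linalg.norm(A, "nuc"))  # Schatten-1
assert np.isclose(np.sqrt((s**2).sum()), np.linalg.norm(A, "fro"))  # Schatten-2
assert np.isclose(s.max(),               np.linalg.norm(A, 2))      # Schatten-inf
```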

Matrix norms contd.

Problem (Rank-r approximation)

Find arg min_X ‖X − Y‖_F subject to rank(X) ≤ r.


Solution (Eckart–Young–Mirsky Theorem)

arg min_{X:rank(X)≤r} ‖X − Y‖_F = arg min_{X:rank(X)≤r} ‖X − UΣ_Y V^T‖_F   (SVD of Y)
  = arg min_{X:rank(X)≤r} ‖U^T XV − Σ_Y‖_F    (unitary invariance of ‖·‖_F)
  = U ( arg min_{X:rank(X)≤r} ‖X − Σ_Y‖_F ) V^T   (sparse approx.)
  = U H_r(Σ_Y) V^T   (r-sparse approximation of the diagonal entries)

The singular value hard thresholding operator H_r performs the best rank-r approximation of a matrix via sparse approximation: we keep the r largest singular values of the matrix and set the rest to zero.
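
A compact NumPy sketch of the truncated-SVD solution (the function name rank_r_approx is ours):

```python
import numpy as np

def rank_r_approx(Y, r):
    """Best rank-r approximation in Frobenius norm (Eckart-Young-Mirsky)."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r:] = 0.0                               # hard-threshold all but the r largest
    return (U * s) @ Vt                       # U diag(s) V^T

rng = np.random.default_rng(8)
Y = rng.standard_normal((6, 4))
Y2 = rank_r_approx(Y, 2)
assert np.linalg.matrix_rank(Y2) == 2
```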

⋆Matrix norms contd.

▶ The last step of the above solution makes use of the Mirsky inequality.

Theorem (Mirsky inequality)
Let A and B be p × p matrices with singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_p ≥ 0 and τ_1 ≥ τ_2 ≥ ··· ≥ τ_p ≥ 0, respectively, and let σ = (σ_1, ..., σ_p)^T and τ = (τ_1, ..., τ_p)^T. Then

‖A − B‖_F ≥ ‖σ − τ‖_2.

▶ The Mirsky inequality is proved using the following simplified version of the von Neumann trace inequality.

Theorem (von Neumann trace inequality)
Let A and B be p × p matrices with singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_p ≥ 0 and τ_1 ≥ τ_2 ≥ ··· ≥ τ_p ≥ 0, respectively, and let σ = (σ_1, ..., σ_p)^T and τ = (τ_1, ..., τ_p)^T. Then

⟨A, B⟩ ≤ ⟨σ, τ⟩.

Matrix norms contd.

Matrix & vector norm analogy

Vectors  | ‖x‖_1 | ‖x‖_2 | ‖x‖_∞
Matrices | ‖X‖_∗ | ‖X‖_F | ‖X‖

Definition (Dual norm of a matrix)
The dual norm of A ∈ R^{n×p} is defined as

‖A‖^∗ = sup{ trace(A^T X) | ‖X‖ ≤ 1 }.

Matrix & vector dual norm analogy

Vector primal norm | ‖x‖_1 | ‖x‖_2 | ‖x‖_∞
Vector dual norm   | ‖x‖_∞ | ‖x‖_2 | ‖x‖_1
Matrix primal norm | ‖X‖_∗ | ‖X‖_F | ‖X‖
Matrix dual norm   | ‖X‖   | ‖X‖_F | ‖X‖_∗

Matrix norms contd.

Definition (Nuclear norm computation)

‖A‖_∗ := ‖σ(A)‖_1, where σ(A) is the vector of singular values of A,
       = min_{U,V : A=UV^H} ‖U‖_F ‖V‖_F = min_{U,V : A=UV^H} (1/2)( ‖U‖_F² + ‖V‖_F² ).

Additional useful properties are below:
▶ Nuclear vs. Frobenius: ‖A‖_F ≤ ‖A‖_∗ ≤ √rank(A) · ‖A‖_F
▶ Hölder for matrices: |⟨A, B⟩| ≤ ‖A‖_p ‖B‖_q when 1/p + 1/q = 1
▶ We have:
  1. ‖A‖_{2→2} ≤ ‖A‖_F
  2. ‖A‖²_{2→2} ≤ ‖A‖_{1→1} ‖A‖_{∞→∞}
  3. ‖A‖_{2→2} ≤ ‖A‖_{1→1} when A is self-adjoint.

⋆Matrix perturbation inequalities

▶ In the theorems below, A, B ∈ R^{p×p} are symmetric matrices with spectra {λ_i(A)}_{i=1}^p and {λ_i(B)}_{i=1}^p, where λ_1 ≥ λ_2 ≥ ··· ≥ λ_p.

Theorem (Lidskii inequality)

λ_{i_1}(A + B) + ··· + λ_{i_n}(A + B) ≤ λ_{i_1}(A) + ··· + λ_{i_n}(A) + λ_1(B) + ··· + λ_n(B),

for any 1 ≤ i_1 < ··· < i_n ≤ p.

Theorem (Weyl inequality)

λ_{i+j−1}(A + B) ≤ λ_i(A) + λ_j(B), for any i, j ≥ 1 with i + j − 1 ≤ p.

Theorem (Interlacing property)

Let A_n = A(1:n, 1:n). Then λ_{n+1}(A_{n+1}) ≤ λ_n(A_n) ≤ λ_n(A_{n+1}) for n = 1, ..., p − 1.

▶ These inequalities hold in the more general setting where the λ_i are replaced by singular values σ_i.
▶ The list goes on to include Wedin's bounds, Wielandt-Hoffman bounds, and so on.
▶ More on such inequalities can be found in Terence Tao's blog (254A, Notes 3a).
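
A brute-force numerical check of the Weyl inequality on random symmetric matrices (our illustration; the small tolerance absorbs round-off):

```python
import numpy as np

rng = np.random.default_rng(9)
p = 5
A = rng.standard_normal((p, p)); A = (A + A.T) / 2
B = rng.standard_normal((p, p)); B = (B + B.T) / 2

eig_desc = lambda M: np.sort(np.linalg.eigvalsh(M))[::-1]
lA, lB, lAB = eig_desc(A), eig_desc(B), eig_desc(A + B)

# Weyl: lambda_{i+j-1}(A+B) <= lambda_i(A) + lambda_j(B)  (0-based: index i+j)
for i in range(p):
    for j in range(p - i):
        assert lAB[i + j] <= lA[i] + lB[j] + 1e-10
```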

⋆Tensors

▶ Tensors provide a natural and concise mathematical representation of data.

Definition (Tensor)
An m-th-rank tensor in p-dimensional space is a mathematical object that has m indices and p^m components and obeys certain transformation rules.

▶ In the literature, order is used interchangeably with rank; i.e., a k-th-rank tensor is also referred to as an order-k tensor.

▶ Tensors are multidimensional arrays and are a generalization of:
  1. scalars: tensors with no indices, i.e., zeroth-rank tensors.
  2. vectors: tensors with exactly one index, i.e., first-rank tensors.
  3. matrices: tensors with exactly two indices, i.e., second-rank tensors.

▶ Think of the third-order Taylor series expansion.

⋆Tensors contd.

Caveat!
Not much is known about tensors and the generalizability of matrix notions to tensors:

▶ The notion of tensor (symmetric) rank is considerably more delicate than matrix (symmetric) rank. For instance:
  1. It is not clear a priori that the symmetric rank should even be finite [2].
  2. Removing the best rank-1 approximation of a general tensor may increase the tensor rank of the residual [4].
▶ It is NP-hard to compute the rank of a tensor in general; only approximations for (super-)symmetric tensors are possible [1].

References

[1] Anima Anandkumar, Rong Ge, Daniel Hsu, Sham M Kakade, and Matus Telgarsky. Tensor decompositions for learning latent variable models. arXiv preprint arXiv:1210.7559, 2012.

[2] Pierre Comon, Gene Golub, Lek-Heng Lim, and Bernard Mourrain. Symmetric tensors and symmetric tensor rank. SIAM Journal on Matrix Analysis and Applications, 30(3):1254–1279, 2008.

[3] Simon Foucart and Holger Rauhut. A mathematical introduction to compressive sensing. Springer, 2013.

[4] Alwin Stegeman and Pierre Comon. Subtracting a best rank-1 approximation may increase tensor rank. Linear Algebra and its Applications, 433(7):1276–1300, 2010.

[5] Joel A Tropp. Column subset selection, matrix factorization, and eigenvalue optimization. In Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 978–986, 2009.
