Mathematics of Data: From Theory to Computation
Prof. Volkan Cevher, volkan.cevher@epfl.ch
Lecture 1: Objects in space
Laboratory for Information and Inference Systems (LIONS)
École Polytechnique Fédérale de Lausanne (EPFL)
EE-556 (Fall 2015)

License Information

This work is released under a Creative Commons License with the following terms:
- Attribution: the licensor permits others to copy, distribute, display, and perform the work. In return, licensees must give the original authors credit.
- Non-Commercial: the licensor permits others to copy, distribute, display, and perform the work. In return, licensees may not use the work for commercial purposes, unless they get the licensor's permission.
- Share Alike: the licensor permits others to distribute derivative works only under a license identical to the one that governs the licensor's work.
- Full Text of the License

Outline

This class:
1. Linear algebra review: notation, vectors, matrices, tensors
Next class:
1. Review of probability theory

Recommended reading material

- Zico Kolter and Chuong Do, "Linear Algebra Review and Reference," http://cs229.stanford.edu/section/cs229-linalg.pdf, 2012.
- KC Border, "Quick Review of Matrix and Real Linear Algebra," http://www.hss.caltech.edu/~kcb/Notes/LinearAlgebra.pdf, 2013.
- Simon Foucart and Holger Rauhut, "A Mathematical Introduction to Compressive Sensing" (Appendix A: Matrix Analysis), Springer, 2013.
- Joel A. Tropp, "Column subset selection, matrix factorization, and eigenvalue optimization," in Proc. of the 20th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 978–986, SIAM, 2009.
Motivation

This lecture is intended to help you follow mathematical discussions in data sciences, which rely heavily on basic linear algebra concepts.

Notation

- Scalars are denoted by lowercase letters (e.g., k).
- Vectors are denoted by lowercase boldface letters (e.g., x).
- Matrices are denoted by uppercase boldface letters (e.g., A).
- Components of a vector x, a matrix A, and a tensor are denoted x_i, a_{ij}, and A_{i,j,k,...}, respectively.
- Sets are denoted by uppercase calligraphic letters (e.g., S).

Vector spaces

Note: We focus on the field of real numbers (R), but most of the results can be generalized to the field of complex numbers (C).

A vector space or linear space (over the field R) consists of
(a) a set of vectors V,
(b) an addition operation: V × V → V,
(c) a scalar multiplication operation: R × V → V,
(d) a distinguished element 0 ∈ V,
and satisfies the following properties:
1. x + y = y + x, ∀x, y ∈ V (commutativity of addition)
2. (x + y) + z = x + (y + z), ∀x, y, z ∈ V (associativity of addition)
3. 0 + x = x, ∀x ∈ V (0 is the additive identity)
4. ∀x ∈ V, ∃(−x) ∈ V such that x + (−x) = 0 (−x is the additive inverse)
5. (αβ)x = α(βx), ∀α, β ∈ R, ∀x ∈ V (associativity of scalar multiplication)
6. α(x + y) = αx + αy, ∀α ∈ R, ∀x, y ∈ V (distributivity)
7. 1x = x, ∀x ∈ V (1 is the multiplicative identity)

Example (Vector space)
1. V1 = {0} for 0 ∈ R^p
2. V2 = R^p
3. V3 = {Σ_{i=1}^k α_i x_i : α_i ∈ R, x_i ∈ R^p}
It is straightforward to show that V1, V2, and V3 satisfy properties 1–7 above.

Definition (Subspace)
A subspace is a vector space that is a subset of another vector space.

Example (Subspace)
V1, V2, and V3 in the example above are subspaces of R^p.
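As a quick numerical illustration (my addition, not part of the original slides), a few of the vector-space axioms above can be checked on sample vectors in R^3; plain Python lists stand in for vectors, and `add`/`scale` are hypothetical helper names:

```python
# Sanity-check a few vector-space axioms in R^3 using plain Python
# lists as vectors (illustration only, not a full proof).

def add(x, y):
    """Vector addition: V x V -> V."""
    return [xi + yi for xi, yi in zip(x, y)]

def scale(a, x):
    """Scalar multiplication: R x V -> V."""
    return [a * xi for xi in x]

x, y = [1.0, -2.0, 3.0], [0.5, 4.0, -1.0]
a, b = 2.0, -3.0

# Property 1: commutativity of addition
assert add(x, y) == add(y, x)

# Property 5: associativity of scalar multiplication
assert scale(a * b, x) == scale(a, scale(b, x))

# Property 6: distributivity
assert scale(a, add(x, y)) == add(scale(a, x), scale(a, y))

print("axioms hold on this sample")
```

Of course, passing on sample vectors only illustrates the axioms; they hold for all of R^p by the arithmetic of the reals.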
Vector spaces contd.

Definition (Span)
The span of a set of vectors {x1, x2, ..., xk} is the set of all possible linear combinations of these vectors; i.e.,
span{x1, x2, ..., xk} = {α1 x1 + α2 x2 + ··· + αk xk : α1, α2, ..., αk ∈ R}.

Definition (Linear independence)
A set of vectors {x1, x2, ..., xk} is linearly independent if
α1 x1 + α2 x2 + ··· + αk xk = 0 ⇒ α1 = α2 = ··· = αk = 0.

Definition (Basis)
A basis of a vector space V is a set of vectors {x1, x2, ..., xk} that satisfies
(a) V = span{x1, x2, ..., xk},
(b) {x1, x2, ..., xk} are linearly independent.

Definition (Dimension*)
The dimension of a vector space V, denoted dim(V), is the number of vectors in a basis of V.
*We will generalize the concept of affine dimension to the statistical dimension of convex objects.

Vector norms

Definition (Vector norm)
A norm of a vector in R^p is a function ‖·‖ : R^p → R such that for all vectors x, y ∈ R^p and any scalar λ ∈ R:
(a) ‖x‖ ≥ 0 for all x ∈ R^p (nonnegativity)
(b) ‖x‖ = 0 if and only if x = 0 (definiteness)
(c) ‖λx‖ = |λ|‖x‖ (homogeneity)
(d) ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality)

There is a family of ℓq-norms parameterized by q ∈ [1, ∞]: for x ∈ R^p, the ℓq-norm is defined as
‖x‖_q := (Σ_{i=1}^p |x_i|^q)^{1/q}.

Example
(1) ℓ2-norm: ‖x‖_2 := (Σ_{i=1}^p x_i^2)^{1/2} (Euclidean norm)
(2) ℓ1-norm: ‖x‖_1 := Σ_{i=1}^p |x_i| (Manhattan norm)
(3) ℓ∞-norm: ‖x‖_∞ := max_{i=1,...,p} |x_i| (Chebyshev norm)

Definition (Quasi-norm)
A quasi-norm satisfies all the norm properties except (d) the triangle inequality, which is replaced by ‖x + y‖ ≤ c(‖x‖ + ‖y‖) for a constant c ≥ 1.

Definition (Semi(pseudo)-norm)
A semi(pseudo)-norm satisfies all the norm properties except (b) definiteness.

Example
The ℓq-norm is in fact a quasi-norm when q ∈ (0, 1), with c = 2^{1/q − 1}.
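The ℓq-norms above, and the failure of the triangle inequality for q ∈ (0, 1), can be seen in a few lines of pure Python (a sketch I added; `lq_norm` is a hypothetical helper, not from the slides):

```python
# Compute lq "norms" on R^p and exhibit the quasi-norm behavior
# for q in (0, 1).

def lq_norm(x, q):
    """(sum_i |x_i|^q)^(1/q); for q = inf, the max of |x_i|."""
    if q == float("inf"):
        return max(abs(xi) for xi in x)
    return sum(abs(xi) ** q for xi in x) ** (1.0 / q)

x = [3.0, -4.0]
print(lq_norm(x, 1))             # 7.0  (Manhattan)
print(lq_norm(x, 2))             # 5.0  (Euclidean)
print(lq_norm(x, float("inf")))  # 4.0  (Chebyshev)

# For q in (0, 1) the triangle inequality fails: with unit coordinate
# vectors e1, e2, we get ||e1 + e2||_q = 2^(1/q) > ||e1||_q + ||e2||_q = 2.
e1, e2 = [1.0, 0.0], [0.0, 1.0]
q = 0.5
assert lq_norm([1.0, 1.0], q) > lq_norm(e1, q) + lq_norm(e2, q)
```

For q = 0.5 the sum ‖e1‖_q + ‖e2‖_q = 2 while ‖e1 + e2‖_q = 2^{1/q} = 4, matching the constant c = 2^{1/q − 1} = 2 in the quasi-norm inequality.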
The total variation semi-norm (TV-norm), defined in 1D as ‖x‖_TV := Σ_{i=1}^{p−1} |x_{i+1} − x_i|, is a semi-norm since it fails to satisfy (b): any x = c(1, 1, ..., 1)^T with c ≠ 0 has ‖x‖_TV = 0 even though x ≠ 0.

Definition (ℓ0-"norm")
‖x‖_0 := lim_{q→0} ‖x‖_q^q = |{i : x_i ≠ 0}|.
The ℓ0-norm counts the non-zero components of x. It is not a norm, since it does not satisfy property (c); hence it is neither a quasi-norm nor a semi-norm.

Problem (s-sparse approximation)
Find arg min_{x ∈ R^p} ‖x − y‖_2 subject to ‖x‖_0 ≤ s.

Solution
Define ŷ ∈ arg min_{x ∈ R^p : ‖x‖_0 ≤ s} ‖x − y‖_2^2 and let Ŝ = supp(ŷ). We now consider an optimization over sets:
Ŝ ∈ arg min_{S : |S| ≤ s} ‖y_S − y‖_2^2
  = arg max_{S : |S| ≤ s} (‖y‖_2^2 − ‖y_S − y‖_2^2)
  = arg max_{S : |S| ≤ s} ‖y_S‖_2^2 = arg max_{S : |S| ≤ s} Σ_{i ∈ S} |y_i|^2 (≡ a modular approximation problem).
Thus, the best s-sparse approximation of a vector keeps its s largest components in magnitude and sets the rest to zero.

Norm balls
The radius-r ball in the ℓq-norm is B_q(r) = {x ∈ R^p : ‖x‖_q ≤ r}.
[Figure: example norm balls in R^3 — the ℓ0 ball {x : ‖x‖_0 ≤ 2}, the ℓ0.5 quasi-norm ball, the ℓ1-, ℓ2-, and ℓ∞-norm balls, and the TV semi-norm ball.]

Inner products

Definition (Inner product)
The inner product of two vectors x, y ∈ R^p, denoted ⟨·, ·⟩, is defined as ⟨x, y⟩ = x^T y = Σ_{i=1}^p x_i y_i.
The inner product satisfies the following properties:
1. ⟨x, y⟩ = ⟨y, x⟩, ∀x, y ∈ R^p (symmetry)
2. ⟨αx + βy, z⟩ = α⟨x, z⟩ + β⟨y, z⟩, ∀α, β ∈ R, ∀x, y, z ∈ R^p (linearity)
3. ⟨x, x⟩ ≥ 0 with equality iff x = 0, ∀x ∈ R^p (positive definiteness)

Important relations involving the inner product:
- Hölder's inequality: |⟨x, y⟩| ≤ ‖x‖_q ‖y‖_r, where q, r ≥ 1 and 1/q + 1/r = 1.
- The Cauchy–Schwarz inequality is the special case q = r = 2.

Definition (Inner product space)
An inner product space is a vector space endowed with an inner product.

Vector norms contd.

Definition (Dual norm)
Let ‖·‖ be a norm in R^p. The dual norm, denoted ‖·‖*, is defined by
‖x‖* = sup_{‖y‖ ≤ 1} x^T y for all x ∈ R^p.
- The dual of the dual norm is the original (primal) norm, i.e., ‖x‖** = ‖x‖.
- Hölder's inequality ⇒ ‖·‖_q is the dual norm of ‖·‖_r when 1/q + 1/r = 1; for instance, the dual of the ℓq-norm with q = 1 + 1/log(p) is the ℓr-norm with r = 1 + log(p).

Example
(i) ‖·‖_2 is the dual of ‖·‖_2 (i.e., ‖·‖_2 is self-dual): sup{z^T x : ‖x‖_2 ≤ 1} = ‖z‖_2.
(ii) ‖·‖_1 is the dual of ‖·‖_∞ (and vice versa): sup{z^T x : ‖x‖_∞ ≤ 1} = ‖z‖_1.
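The ℓ1/ℓ∞ duality in example (ii) can be checked numerically: the supremum over the ℓ∞ ball is attained at x = sign(z). The sketch below (my addition, with hypothetical helper names `dot` and `sign`) verifies this and spot-checks that random points in the ball never do better:

```python
import random

# Verify sup{ z^T x : ||x||_inf <= 1 } = ||z||_1, attained at x = sign(z).

def dot(z, x):
    return sum(zi * xi for zi, xi in zip(z, x))

def sign(t):
    return (t > 0) - (t < 0)

z = [2.0, -1.0, 0.5]
x_star = [float(sign(zi)) for zi in z]   # maximizer on the l_inf ball
l1 = sum(abs(zi) for zi in z)            # ||z||_1

assert abs(dot(z, x_star) - l1) < 1e-12  # z^T x* = ||z||_1 = 3.5

# No other point of the l_inf ball exceeds ||z||_1:
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in z]
    assert dot(z, x) <= l1 + 1e-12
```

The same pattern works for (i): on the ℓ2 ball the maximizer is z/‖z‖_2, giving z^T x = ‖z‖_2.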
