
CCCG 2014, Halifax, Nova Scotia, August 11–13, 2014

An approach to Dictionary Learning∗

Meera Sitharam† Mohamad Tarifi‡ Menghan Wang§

Abstract

The Dictionary Learning problem arises in various contexts such as signal processing and machine learning. We study the Dictionary Learning (aka Sparse Coding) problem of obtaining a sparse representation of data points, by learning dictionary vectors upon which the data points can be written as sparse linear combinations. We view this problem from a geometry perspective as the spanning set of a subspace arrangement, and focus on understanding the case when the underlying hypergraph of the subspace arrangement is specified. For this Fitted Dictionary Learning problem, we completely characterize the combinatorics of the associated subspace arrangements (i.e. their underlying hypergraphs), using incidence geometry. Specifically, a combinatorial rigidity-type theorem is proven that characterizes the hypergraphs of subspace arrangements that generically yield (a) at least one dictionary, and (b) a locally unique dictionary (i.e. at most a finite number of isolated dictionaries) of the specified size. We are unaware of prior application of combinatorial rigidity techniques in the setting of Dictionary Learning, or even in machine learning. We also provide a systematic classification of problems related to Dictionary Learning together with various approaches, the assumptions they require, and their performance.

1 Introduction

Dictionary Learning (aka Sparse Coding) is the problem of obtaining a sparse representation of data points, by learning dictionary vectors upon which the data points can be written as sparse linear combinations.

Problem 1 (Dictionary Learning) A set X = [x_1 ... x_m] in R^d is said to be s-represented by a dictionary D = [v_1 ... v_n] for a given sparsity s < d, if there exists Θ = [θ_1 ... θ_m] such that x_i = Dθ_i, with ||θ_i||_0 ≤ s. Given an X known to be s-represented by an unknown dictionary D of size |D| = n, Dictionary Learning is the problem of finding any dictionary D' satisfying the properties of D, i.e. |D'| ≤ n, and there exists θ'_i such that x_i = D'θ'_i for all x_i ∈ X.

The dictionary under consideration is usually overcomplete, with n > d. However, we are interested in asymptotic performance with respect to all four variables n, m, d, s. Typically, m ≫ n ≫ d > s. Both the case when s is large relative to d and the case when s is small relative to d are interesting.

1.1 Previous approaches and challenges

Several traditional Dictionary Learning algorithms work by alternating minimization, i.e. iterating the two steps of Vector Selection, which finds a representation Θ of X in an estimated dictionary, and updating the dictionary estimate by solving an optimization problem that is convex in D when Θ is known [25, 20, 19].

For an overcomplete dictionary, the general vector selection problem is ill defined and has been shown to be NP-hard [21]. One is then tempted to conclude that Dictionary Learning is also NP-hard. However, this cannot be directly deduced in general, since even though adding a witness D turns the problem into an NP-hard problem, it is possible that the Dictionary Learning solution produces a different dictionary D'. On the other hand, if D satisfies the condition of being a frame, i.e. for all θ such that ||θ||_0 ≤ s, there exists a δ_s such that (1 − δ_s) ≤ ||Dθ||_2^2 / ||θ||_2^2 ≤ (1 + δ_s), it is guaranteed that the sparsest solution to the Vector Selection problem can be found via l_1 minimization [8, 7].

One popular alternating minimization method is the Method of Optimal Directions (MOD) [9], which uses a maximum likelihood formalism and computes D via the pseudoinverse. Another method, k-SVD [2], updates D by taking every atom in D and applying SVD to X and Θ restricted to only the columns that have a contribution from that atom.

Though alternating minimization methods work well in practice, there is no theoretical guarantee that their results will converge to a true dictionary. Several recent works give provable algorithms under stronger constraints on X and D. Spielman et al. [24] give an l_1 minimization based approach which provably finds the exact dictionary D, but requires D to be a basis. Arora et al. [3] and Agarwal et al. [1] independently give provable non-iterative algorithms for learning approximations of overcomplete dictionaries. Both

∗ This research was supported in part by the research grant NSF CCF-1117695 and a research gift from SolidWorks.
† Department of Computer and Information Science and Engineering, University of Florida, [email protected]
‡ Google, [email protected]
§ Department of Computer and Information Science and Engineering, University of Florida, [email protected]

of their methods are based on an overlapping clustering approach to find data points sharing a dictionary vector, and then estimate the dictionary vectors from the clusters via SVD. However, their algorithms require the dictionaries to be pairwise incoherent, which is much stronger than the frame property.

In this paper, we understand the Dictionary Learning problem from an intrinsically geometric point of view. Notice that each x ∈ X lies in an s-dimensional subspace supp_D(x), which is the span of the s vectors v ∈ D that form the support of x. The resulting s-subspace arrangement S_{X,D} = {(x, supp_D(x)) : x ∈ X} has an underlying labeled (multi)hypergraph H(S_{X,D}) = (I(D), I(S_{X,D})), where I(D) denotes the index set of the dictionary D and I(S_{X,D}) is the set of (multi)hyperedges over the indices I(D) corresponding to the labeled sets (x, supp_D(x)). The word "multi" appears because if supp_D(x_1) = supp_D(x_2) for data points x_1, x_2 ∈ X with x_1 ≠ x_2, then that support set of dictionary vectors (resp. their indices) is multiply represented in S_{X,D} (resp. I(S_{X,D})) as the labeled sets (x_1, supp_D(x_1)) and (x_2, supp_D(x_2)). We denote the sizes of these multisets by |S_{X,D}| (resp. |I(S_{X,D})|).

Note that there could be many dictionaries D for the same set X of data points, and for each D, many possible subspace arrangements S_{X,D} that are solutions to the Dictionary Learning problem.

2 Contributions

In this paper, we focus on the version of Dictionary Learning where the underlying hypergraph is specified.

Problem 2 (Fitted Dictionary Learning) Let X be a given set of data points in R^d. For an unknown dictionary D = [v_1, ..., v_n] that s-represents X, we are given the hypergraph H(S_{X,D}) of the underlying subspace arrangement S_{X,D}. Find any dictionary D' of size |D'| ≤ n consistent with the hypergraph H(S_{X,D}).

Our contributions in this paper are as follows:

• As the main result, we use combinatorial rigidity techniques to obtain a complete characterization of the hypergraphs H(S_{X,D}) that generically yield (a) at least one solution dictionary D, and (b) a locally unique solution dictionary D (i.e. at most a finite number of isolated solution dictionaries) of the specified size (see Theorem 2). To the best of our knowledge, this paper pioneers the use of combinatorial rigidity for problems related to Dictionary Learning.

• We are interested in minimizing |D| for general X. As a corollary of the main result, we obtain that if the data points in X are highly general, for example picked uniformly at random from the sphere S^{d−1}, then when s is fixed, |D| = Ω(|X|) with probability 1 (see Corollary 4).

• As a corollary to our main result, we obtain a Dictionary Learning algorithm for sufficiently general data X, i.e. requiring a sufficiently large dictionary size n (see Corollary 5).

• We provide a systematic classification of problems related to Dictionary Learning together with various approaches, conditions and performance (see Section 4).

Note that although our results are stated for uniform hypergraphs H(S_{X,D}) (i.e. each subspace in S_{X,D} has the same dimension), they can be easily generalized to non-uniform underlying hypergraphs.

Remark on technical significance: in this paper, we follow [4] and [31] to give a complete combinatorial characterization for the Fitted Dictionary Learning problem, starting from the initial representation as a nonlinear algebraic system. For more details of the technical challenges and significance, see Section 3.4.

3 Main Result: Combinatorial Rigidity Characterization for Dictionary Learning

In this section, we present the main result of the paper, i.e. a combinatorial characterization of the (multi)hypergraphs H such that the existence and local uniqueness of a dictionary D is guaranteed for generic X satisfying H(S_{X,D}) = H.

Since the magnitudes of the vectors in X or D are uninteresting, we treat the data and dictionary points as living in the projective space of dimension d − 1, and use the same notation to refer to both the original d-dimensional and the projective (d − 1)-dimensional versions when the meaning is clear from the context. We rephrase the Fitted Dictionary Learning problem as the following Pinned Subspace-Incidence problem, for the convenience of applying machinery from incidence geometry.

Problem 3 (Pinned Subspace-Incidence Problem) Let X be a given set of m points (pins) in P^{d−1}(R). For every pin x ∈ X, we are also given the hyperedge supp_D(x), i.e. an index subset of an unknown set of points D = {v_1, ..., v_n}, such that x lies on the subspace spanned by supp_D(x). Find any such set D that satisfies the given subspace incidences.

3.1 Algebraic Representation

We represent the Pinned Subspace-Incidence problem in the tradition of geometric constraint solving [6, 23], and view the problem as finding the common solutions of a system of polynomial equations (finding a real algebraic variety).
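The labeled (multi)hypergraph H(S_{X,D}) that Problems 2 and 3 take as input is straightforward to assemble once the supports are known. The following Python sketch is our own illustration (the function name is ours, not from the paper): it records each support set as a hyperedge with multiplicity equal to the number of data points sharing it, which is exactly the "multi" behavior described above.

```python
from collections import Counter

def support_hypergraph(supports):
    """Build the underlying (multi)hypergraph H(S_{X,D}) from the support
    sets supp_D(x) of the data points: vertices are dictionary indices,
    and each support set is a hyperedge whose multiplicity is the number
    of data points sharing it."""
    edges = Counter(frozenset(s) for s in supports)
    vertices = set().union(*edges) if edges else set()
    return vertices, edges

# Two data points sharing the support {0, 1} yield a multi-hyperedge
# of multiplicity 2.
V, E = support_hypergraph([{0, 1}, {0, 1}, {1, 2}, {2, 3}])
```

Here `E` is a multiset of hyperedges (a Counter keyed by frozensets), matching the multiset I(S_{X,D}) of the text.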

Consider a pin x_k on the subspace spanned by points v_1^k, v_2^k, ..., v_s^k. Using homogeneous coordinates, we can write this incidence constraint by letting all the s × s minors of the (d − 1) × s matrix

E^k = [ v_1^k − x_k   v_2^k − x_k   ...   v_s^k − x_k ]

be zero, where v_i^k = (v_{i,1}^k, v_{i,2}^k, ..., v_{i,d−1}^k) and x_k = (x_{k,1}, x_{k,2}, ..., x_{k,d−1}). So each incidence can be written as C(d−1, s) equations, of which any d − s are independent.

As the hypergraph H = H(S_{X,D}) of the underlying subspace arrangement has m (multi)hyperedges, the pinned subspace-incidence problem now reduces to solving a system of m·C(d−1, s) equations, denoted as

(H,X)(D) = 0,    (1)

where (H,X)(D) is a vector valued function from R^{n(d−1)} to R^{m·C(d−1, s)} parameterized by X. Without any pins, the points in D have in total n(d − 1) degrees of freedom, and every pin (hyperedge) potentially removes d − s degrees of freedom.

We use the underlying (multi)hypergraph H(S_{X,D}) = (I(D), I(S_{X,D})) to define a pinned subspace-incidence framework (H, X, D), where X : {x_1, ..., x_m} ⊆ R^{d−1} → I(S_{X,D}) is an assignment of the given set of pins x_k to edges X(x_k) = supp_D(x_k) ∈ I(S_{X,D}), and D : I(D) → R^{d−1} is an embedding of each vertex j into a point v_j ∈ R^{d−1}, such that each pin x_k lies on the subspace spanned by {v_1^k, v_2^k, ..., v_s^k}. Two frameworks (H_1, X_1, D_1) and (H_2, X_2, D_2) are equivalent if H_1 = H_2 and X_1 = X_2, i.e. they satisfy the same algebraic equations for the same labeled hypergraph and ordered set of pins. They are congruent if they are equivalent and D_1 = D_2.

The pinned subspace-incidence system (H,X)(D) is independent if none of the algebraic constraints is in the ideal generated by the others. Generally, independence implies the existence of a solution D to the system (H,X)(D), where X is fixed. The system is rigid if there exist at most finitely many (real or complex) solutions. The system is minimally rigid if it is both rigid and independent.

Rigidity is often defined (slightly differently) for individual frameworks. A framework (H, X, D) is rigid (i.e. locally unique) if there is a neighborhood N(D) such that any framework (H, X, D') equivalent to (H, X, D) with D' ∈ N(D) is also congruent to (H, X, D). A rigid framework (H, X, D) is minimally rigid if it becomes flexible after removing any pin.

We are interested in characterizing minimal rigidity of the pinned subspace-incidence system and framework. However, checking independence relative to the ideal generated by the variety is computationally hard, and the best known algorithms, such as Gröbner basis computation, are exponential in time and space [17]. However, the algebraic system (H,X)(D) can be linearized at generic or regular (non-singular) points, whereby independence and rigidity reduce to linear independence and maximal rank at generic frameworks. Intuitively, a property being generic means that the property holds on the open dense complement of a (real) algebraic variety. Formally:

Definition 1 A framework (H, X, D) is generic w.r.t. a property Q if and only if (H, X, D) avoids an algebraic variety V_Q specific to Q. In other words, there exists a neighborhood N(D) such that for all frameworks (H, X, D') with D' ∈ N(D), (H, X, D') satisfies Q if and only if (H, X, D) satisfies Q.

A property Q of frameworks is generic (i.e. becomes a property of the hypergraph alone) if for all hypergraphs H, either all generic (w.r.t. Q) frameworks satisfy Q, or all generic (w.r.t. Q) frameworks do not satisfy Q. Once an appropriate notion of genericity is defined, we can treat Q as a property of a hypergraph. The primary activity of the area of combinatorial rigidity is to give purely combinatorial characterizations of such generic properties Q.

3.2 Linearization as Rigidity Matrix and its Generic Combinatorics

Next we follow the approach taken by traditional combinatorial rigidity theory [4, 12] to show that rigidity and independence (based on nonlinear polynomials) of pinned subspace-incidence systems are generically properties of the underlying hypergraph H(S_{X,D}), and can furthermore be captured by linear conditions in an infinitesimal setting. Specifically, Lemma 1 shows that rigidity of a pinned subspace-incidence system is equivalent to the existence of a full rank rigidity matrix, obtained by taking the Jacobian of the algebraic system (H,X)(D) at a regular point.

A rigidity matrix of a framework (H, X, D) is a matrix whose kernel is the space of infinitesimal motions (flexes) of (H, X, D). A framework is infinitesimally independent if the rows of the rigidity matrix are independent. A framework is infinitesimally rigid if the space of infinitesimal motions is trivial, i.e. the rigidity matrix has full rank. A framework is infinitesimally minimally rigid if it is both infinitesimally independent and rigid.

To define a rigidity matrix for a pinned subspace-incidence framework (H, X, D), we take the Jacobian J_X(D) of the algebraic system (H,X)(D) by taking partial derivatives w.r.t. the coordinates of the v_i's. In the Jacobian, each vertex v_i has d − 1 corresponding columns, and each pin / hyperedge has C(d−1, s) corresponding rows, of which any d − s rows are independent and span the rest. This m·C(d−1, s) by n(d − 1) matrix is called the symmetric rigidity matrix M of the framework. If we choose d − s rows per hyperedge in M, the obtained matrix M̂ is a rigidity matrix of size m(d − s) by n(d − 1). The framework is infinitesimally rigid if and only if there is an M̂ with full rank. Note that the rank of a generic matrix M̂ is at least as large as the rank of any specific realization M̂(H, X, D). Defining generic as non-singular, for a generic framework (H, X, D), infinitesimal rigidity is equivalent to generic rigidity.

Lemma 1 If D and X are regular / non-singular with respect to the system (H,X)(D), then generic infinitesimal rigidity of the framework (H, X, D) is equivalent to generic rigidity.

Remark: Pinned subspace-incidence frameworks are generalizations of related types of frameworks, such as pin-collinear body-pin frameworks [14], direction networks [32], slider-pinning rigidity [27], the molecular conjecture in 2D [22], body-cad constraint systems [13, 16], k-frames [31], and affine rigidity [11].

3.3 Statement of main results

We study the rigidity matrix to obtain the following combinatorial characterization of (a) sparsity / independence, i.e. existence of a dictionary, and (b) rigidity, i.e. the solution set being locally unique / finite, for a pinned subspace-incidence framework.

Theorem 2 (Main Theorem) A pinned subspace-incidence framework is generically minimally rigid if and only if the underlying hypergraph H(S_{X,D}) = (I(D), I(S_{X,D})) satisfies (d − s)|I(S_{X,D})| = (d − 1)|I(D)| (i.e. (d − s)|X| = (d − 1)|D|), and (d − s)|E'| ≤ (d − 1)|V'| for every vertex induced subgraph H' = (V', E'). The latter condition alone ensures the independence of the framework.

This combinatorial condition is actually (d − 1, 0)-tightness of the hypergraph, which is a special case of the (k, l)-tightness condition that was widely studied in the geometric constraint solving and combinatorial rigidity literature before it was given a name in [15]. A hypergraph H = (V, E) is (k, 0)-tight if |E| = k|V|, and for any V' ⊂ V, the induced subgraph H' = (V', E') satisfies |E'| ≤ k|V'|.

The proof of Theorem 2 adapts the approach of [31] for proving rigidity of k-frames, with the following outline:

• We obtain an expanded multihypergraph of H(S_{X,D}) by replacing each hyperedge with d − s copies, in order to apply the (k, 0)-tightness condition.

• We show that for a specific form of the rows of a matrix defined on a map-graph (a relevant concept from hypergraph matroids corresponding to cycles of graphs), the determinant is not identically zero.

• Using a lemma from [26], which generalizes Tutte–Nash-Williams [28, 18] and states that a hypergraph H is composed of k edge-disjoint map-graphs if and only if H is (k, 0)-tight, we apply Laplace decomposition to the (d − 1, 0)-tight expanded multihypergraph as a union of d − 1 maps. We then substitute in the result for map-graphs and show that the determinant of the rigidity matrix is not identically zero, as long as the framework avoids a certain polynomial.

• The resulting polynomial is called the pure condition, which characterizes the badly behaved cases (i.e. the conditions of non-genericity that the framework has to avoid for the combinatorial characterization to hold).

Example 1 Figure 1 shows a pinned subspace-incidence framework with d = 4, s = 2. The expanded multihypergraph (replacing each hyperedge with 2 copies) satisfies the (3, 0)-tightness condition, and the framework is minimally rigid.

Figure 1: A minimally rigid pinned subspace-incidence framework of 6 pins and 4 vertices, with d = 4, s = 2.

One particular situation avoided by the pure condition is having more than s − 1 hyperedges containing the same set of vertices, namely, more than s − 1 pins on the same subspace spanned by the dictionary vectors. Otherwise, s pins completely determine an s-subspace, whereby the vertices of the corresponding hyperedge have their degrees of freedom restricted, and simple counterexamples to the characterization of the main theorem can be constructed.

Example 2 Consider the framework in Figure 2 with d = 3, s = 2. There are s = 2 pins on each subspace. The expanded multihypergraph of the framework is (2, 0)-tight. However, the framework is obviously not minimally rigid.

Figure 2: A pinned subspace-incidence framework of 8 pins and 4 vertices, with d = 3, s = 2, that violates the pure condition.

We relate the Fitted Dictionary Learning problem to the general Dictionary Learning problem, and give the lower bound of dictionary size for generic data points as a corollary to the main theorem.
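The (d − 1, 0)-tightness count on the expanded multihypergraph can be checked by brute force for small examples. The sketch below is our own illustration (a real implementation would use the pebble game of [26] rather than subset enumeration): it verifies the count for a hypergraph of the Example 1 kind, and also shows that an Example 2 style configuration passes the count even though it violates the pure condition, so tightness alone does not certify rigidity.

```python
from itertools import combinations

def is_k0_tight(vertices, edges, k):
    """Brute-force (k, 0)-tightness check for a multihypergraph:
    |E| = k|V|, and every vertex-induced sub-hypergraph H' = (V', E')
    has |E'| <= k|V'|.  Exponential in |V|; fine for small examples."""
    if len(edges) != k * len(vertices):
        return False
    verts = sorted(vertices)
    for r in range(1, len(verts)):
        for sub in combinations(verts, r):
            s = set(sub)
            if sum(1 for e in edges if set(e) <= s) > k * r:
                return False
    return True

# Example 1 style (d = 4, s = 2): 6 pins on the 6 vertex pairs of 4
# vertices; expansion takes d - s = 2 copies of each hyperedge, and the
# result must be (d - 1, 0) = (3, 0)-tight.
V = {0, 1, 2, 3}
expanded = [e for e in combinations(V, 2) for _ in range(2)]  # 12 edges

# Example 2 style (d = 3, s = 2): two pins on each of 4 subspaces; the
# expansion (d - s = 1 copy per pin) is (2, 0)-tight, yet the framework
# violates the pure condition and is not minimally rigid.
expanded2 = [e for e in [(0, 1), (1, 2), (2, 3), (3, 0)] for _ in range(2)]
```

Running `is_k0_tight` on both `expanded` (with k = 3) and `expanded2` (with k = 2) returns True, illustrating that the pure condition is a separate, non-combinatorial requirement.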

Corollary 3 (Lower bound for generic data) Given a set of m points X = {x_1, ..., x_m} in R^d, generically there is a dictionary D of size n that s-represents X only if (d − s)m ≤ (d − 1)n. Conversely, if (d − s)m = (d − 1)n and the supports of the x_i (the nonzero entries of the θ_i's) are known to form a (d − 1, 0)-tight hypergraph H, then generically there is at least one and at most finitely many such dictionaries.

Quantifying the term "generically" in Corollary 3 yields Corollaries 4 and 5 below, because the pure conditions fail (i.e. the framework becomes non-generic) only on a measure-zero subset of the space of frameworks, and the number of possible underlying multihypergraphs is finite for a given set of pins.

Corollary 4 (Lower bound for highly general data) Given a set of m points X = {x_1, ..., x_m} picked uniformly at random from the sphere S^{d−1}, a dictionary D that s-represents X has size at least ((d − s)/(d − 1))m with probability 1. In other words, |D| = Ω(|X|) if s and d are constants.

Corollary 5 (Straightforward Learning Algorithm) Given a set of m points X = [x_1 ... x_m] picked uniformly at random from the sphere S^{d−1}, we have a straightforward algorithm to construct a dictionary D = [v_1 ... v_n] that s-represents X, where n = ((d − s)/(d − 1))m.

The algorithm has two major parts: (1) constructing the underlying hypergraph H(S_{X,D}), and (2) constructing the s-subspace arrangement S_{X,D} and the dictionary D. Part (1) starts from a minimal minimally rigid hypergraph H_0, and each following step appends a base structure B of d − s vertices and d − 1 edges to the same set of base vertices in H_0. Both H_0 and B can be constructed using a modified version of the pebble game algorithm from [26]. Part (2) follows the graph construction of Part (1) by solving the size O(d) algebraic system corresponding to B at each step, which takes O(1) time when d is constant. Although there is more than one choice of solution for each step, since every graph construction step is based on the same set of base vertices, generically any choice will result in a successful solution for the entire sequence of steps.

3.4 Technical significance

In this paper, (1) we formulate the Pinned Subspace-Incidence problem as a nonlinear algebraic system (H,X)(D); (2) we apply Asimow and Roth [4] to generically linearize (H,X)(D); (3) we apply White and Whiteley [31] to combinatorially characterize the rigidity of the underlying hypergraph H(S_{X,D}) and give the pure conditions; and (4) finally, in order to generalize the proof of (2) to hypergraphs, we use the map-graph characterization of Streinu and Theran [26], and adapt H(S_{X,D}) as expanded multihypergraphs. To the best of our knowledge, the only known results with a similar flavor are [13, 16], which characterize the rigidity of body-and-cad frameworks. However, those results are dedicated to specific frameworks in 3D rather than arbitrary dimension subspace arrangements and hypergraphs, and their formulation process starts directly with the linearized Jacobian.

4 Systematic classification of problems closely related to Dictionary Learning and previous approaches

By imposing a systematic series of increasingly stringent constraints on the input, we classify a whole set of independently interesting problems closely related to Dictionary Learning. A summarization of the input conditions and results of these different types of Dictionary Learning approaches can be found in Table 1 in Appendix A.

A natural restriction of the general Dictionary Learning problem is the following. We say that a set of data points X lies on a set S of s-dimensional subspaces if for all x_i ∈ X, there exists S_i ∈ S such that x_i ∈ S_i.

Problem 4 (Subspace Arrangement Learning) Let X be a given set of data points known to lie on a set S of s-dimensional subspaces of R^d, with |S| ≤ k. Subspace arrangement learning finds any subspace arrangement S' of s-dimensional subspaces of R^d satisfying these conditions, i.e. |S'| ≤ k and X lies on S'.

There are several known algorithms for learning subspace arrangements. The Random Sample Consensus (RANSAC) method [29] learns subspace arrangements by isolating one subspace at a time via random sampling. Another method, Generalized PCA (GPCA) [30], uses techniques from algebraic geometry for subspace clustering, by factoring a homogeneous polynomial of degree k that is fitted to the points {x_1 ... x_m}.

The next problem is obtaining a minimally sized dictionary from a subspace arrangement.

Problem 5 (Smallest Spanning Set for Arrangement) Let S be a given set of s-dimensional subspaces of R^d. Assume their intersections are known to be s-represented by a set I of vectors with |I| at most n. Find any set of vectors I' that satisfies these conditions.

The smallest spanning set is not necessarily unique in general. This problem is closely related to the intersection semilattice of the subspace arrangement [5, 10].

When X contains sufficiently dense data to solve Problem 4, Dictionary Learning reduces to Problem 5, i.e. we can find D using the following two steps: (1)

Learn an s-subspace arrangement S for X. (2) Recover References D by finding the smallest Spanning Set of S. A natural restriction of this is when the data set X [1] A. Agarwal, A. Anandkumar, and P. Netrapalli. Ex- is given in support-equivalence classes. For a given sub- act recovery of sparsely used overcomplete dictionaries. arXiv preprint arXiv:1309.1952, 2013. space t in the subspace arrangement SX,D (respectively [2] M. Aharon, M. Elad, and A. Bruckstein. k-svd: An hyperedge h in the hypergraph’s edge-set I(SX,D)), let algorithm for designing overcomplete dictionaries for Xt = Xh ⊆ X be the equivalence class of data points x such that span(supp (x)) = t. We call the data points sparse representation. Signal Processing, IEEE Trans- D actions on, 54(11):4311–4322, 2006. x in a same Xh as support-equivalent. [3] S. Arora, R. Ge, and A. Moitra. New algorithms Problem 6 (Dictionary Learning for Partitioned Data) for learning incoherent and overcomplete dictionaries. Given data X partitioned into Xi ⊆ X, arXiv preprint arXiv:1308.6273, 2013. 1. What is the minimum size of X and Xi’s guarantee- [4] L. Asimow and B. Roth. The rigidity of graphs. Trans- ing that there exists a locally unique dictionary D for actions of the American Mathematical Society, 245:279– a s-subspace arrangement SX,D satisfying |D| ≤ n, and 289, 1978. Xi represents the support-equivalence classes of X with [5] A. Bj¨orner.Subspace arrangements. In First European respect to D? Congress of , pages 321–370. Springer, 2. How to find such a dictionary D? 1994. With regard to the problem of minimizing |D|, very [6] B. Br¨uderlin and D. Roller. Geometric constraint solv- little is known for simple restrictions on X. For example ing and applications. Springer, 1998. the following question is open. [7] E. J. Cand`es,J. Romberg, and T. Tao. Robust un- certainty principles: Exact signal reconstruction from Question 1 Given a general position assumption on highly incomplete frequency information. 
Information X, are smaller dictionaries possible than indicated by Theory, IEEE Transactions on, 52(2):489–509, 2006. Corollary 4? Conversely, what is the best lower bound [8] D. L. Donoho. Compressed sensing. Information The- on |D| under such an assumption? ory, IEEE Transactions on, 52(4):1289–1306, 2006.

The combinatorial characterization by Theorem 2 leads [9] K. Engan, S. O. Aase, and J. Hakon Husoy. Method of optimal directions for frame design. In Acoustics, to the following question for general Dictionary Learn- Speech, and Signal Processing, 1999 IEEE International ing. Conference on, volume 5, pages 2443–2446. IEEE, 1999. Question 2 What is the minimum size of a data set X [10] M. Goresky and R. MacPherson. Stratified morse the- such that the Dictionary Learning Problem for X has ory. Springer, 1988. a locally unique solution dictionary D of a given size? [11] S. J. Gortler, C. Gotsman, L. Liu, and D. P. Thurston. What are the geometric characteristics of such an X? On affine rigidity. Journal of Computational Geometry, 4(1):160–181, 2013. A summarization of the input conditions and results of [12] J. E. Graver, B. Servatius, and H. Servatius. Combina- these different types of Dictionary Learning problems torial rigidity, volume 2. AMS Bookstore, 1993. can be found in Table 1 in Appendix A. [13] K. Haller, A. Lee-St John, M. Sitharam, I. Streinu, and N. White. Body-and-cad geometric constraint systems. 5 Conclusion Computational Geometry, 45(8):385–405, 2012. [14] B. Jackson and T. Jord´an. Pin-collinear body-and-pin In this paper, we approached Dictionary Learning from frameworks and the molecular conjecture. Discrete & a geometric point of view. Computational Geometry, 40(2):258–278, 2008. We investigated Fitted Dictionary Learning theoreti- cally using machinery from incidence geometry. Specif- [15] A. Lee, I. Streinu, and L. Theran. Graded sparse graphs ically, a combinatorial rigidity type theorem (our main and matroids. J. UCS, 13(11):1671–1679, 2007. result) is obtained which completely characterizes the [16] A. Lee-St.John and J. Sidman. Combinatorics and the subspace arrangements that are guaranteed to recover rigidity of {CAD} systems. Computer-Aided Design, a finite number of dictionaries, using a purely combina- 45(2):473 – 482, 2013. 
Solid and Physical Modeling torial property on the supports. As corollaries of the 2012. main result, we gave lower bound for the size of dictio- nary when the data points are picked uniformly at ran- [17] J. Mittmann. Gr¨obnerbases: Computational algebraic dom, and provided an algorithm for Dictionary Learning geometry and its complexity. 2007. for such general data points. Additionally, we compare [18] C. S. J. Nash-Williams. Edge-disjoint spanning trees several closely related problems of independent interest, of finite graphs. Journal of the London Mathematical leading to different directions for future work. Society, 1(1):445–450, 1961. CCCG 2014, Halifax, Nova Scotia, August 11–13, 2014

[19] B. A. Olshausen and D. J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37(23):3311–3325, 1997.

[20] I. Ramirez, P. Sprechmann, and G. Sapiro. Classification and clustering via dictionary learning with structured incoherence and shared features. In Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pages 3501–3508. IEEE, 2010.

[21] G. Davis. Adaptive nonlinear approximations. PhD thesis, 1994.

[22] B. Servatius. Molecular conjecture in 2D, 2006. 16th Fall Workshop on Computational and Combinatorial Geometry.

[23] M. Sitharam. Combinatorial approaches to geometric constraint solving: Problems, progress, and directions. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, 67:117, 2005.

[24] D. A. Spielman, H. Wang, and J. Wright. Exact recovery of sparsely-used dictionaries. In Proceedings of the Twenty-Third International Joint Conference on Artificial Intelligence, pages 3087–3090. AAAI Press, 2013.

[25] P. Sprechmann and G. Sapiro. Dictionary learning and sparse coding for unsupervised clustering. In Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on, pages 2042–2045. IEEE, 2010.

[26] I. Streinu and L. Theran. Sparse hypergraphs and pebble game algorithms. European Journal of Combinatorics, 30(8):1944–1964, 2009.

[27] I. Streinu and L. Theran. Slider-pinning rigidity: a Maxwell–Laman-type theorem. Discrete & Computational Geometry, 44(4):812–837, 2010.

[28] W. T. Tutte. On the problem of decomposing a graph into n connected factors. Journal of the London Mathematical Society, 1(1):221–230, 1961.

[29] R. Vidal. Subspace clustering. IEEE Signal Processing Magazine, 28(2):52–68, 2011.

[30] R. Vidal, Y. Ma, and S. Sastry. Generalized principal component analysis (GPCA). IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(12):1945–1959, 2005.

[31] N. White and W. Whiteley. The algebraic geometry of motions of bar-and-body frameworks. SIAM Journal on Algebraic Discrete Methods, 8(1):1–32, 1987.

[32] W. Whiteley. Some matroids from discrete applied geometry. Contemporary Mathematics, 197:171–312, 1996.

A Classification of Dictionary Learning problems

A summary of the input conditions and results of the different types of Dictionary Learning problems is given in Table 1.

B Proof of Results

In this section, we provide details and proofs for the results in Section 3.

In the following, we denote a minor of a matrix A by A[R,C], where R and C are the index sets of the rows and columns contained in the minor, respectively. In addition, A[R, · ] denotes the minor containing all columns and row set R, and A[ · , C] denotes the minor containing all rows and column set C.

B.1 Algebraic Representation

In this section, we give the details of deriving the algebraic system of equations (H,X)(D) = 0 (1).

Consider a pin $x_k$ on the subspace spanned by the points $v_1^k, v_2^k, \ldots, v_s^k$. Using homogeneous coordinates, we can write this incidence constraint by requiring that all $s \times s$ minors of the $(d-1) \times s$ matrix

$$E^k = \begin{bmatrix} v_1^k - x_k & v_2^k - x_k & \cdots & v_s^k - x_k \end{bmatrix}$$

be zero, where $v_i^k = (v_{i,1}^k, v_{i,2}^k, \ldots, v_{i,d-1}^k)$ and $x_k = (x_{k,1}, x_{k,2}, \ldots, x_{k,d-1})$. So each incidence can be written as $\binom{d-1}{s}$ equations:

$$E^k[R(l), \,\cdot\,] = 0, \quad 1 \le l \le \binom{d-1}{s} \qquad (2)$$

where $R(l)$ enumerates the $s$-subsets of rows of $E^k$. Note that only $d-s$ of these $\binom{d-1}{s}$ equations are independent, since the subspace spanned by $v_1^k, v_2^k, \ldots, v_s^k$ is an $s$-dimensional subspace of a $d$-dimensional space, which has only $s(d-s)$ degrees of freedom.

Given the hypergraph $H = H(S_{X,D})$ of the underlying subspace arrangement, the pinned subspace-incidence problem now reduces to solving a system of $m\binom{d-1}{s}$ equations (or, equivalently, $m(d-s)$ independent equations), each of the form (2). The system of equations sets a multivariate function (H,X)(D) to 0:

$$(H,X)(D) = \begin{bmatrix} \vdots \\ E^k[R(l), \,\cdot\,] \\ \vdots \end{bmatrix} = 0 \qquad (1)$$

Viewing X as a fixed parameter, (H,X)(D) is a vector-valued function from $\mathbb{R}^{n(d-1)}$ to $\mathbb{R}^{m\binom{d-1}{s}}$ parameterized by X.

Without any pins, the points in D have in total $n(d-1)$ degrees of freedom. In general, putting $r$ pins on an $s$-dimensional subspace of a $d$-dimensional space leaves an $(s-r)$-dimensional subspace of a $(d-r)$-dimensional space, which has $(s-r)((d-r)-(s-r)) = (s-r)(d-s)$ degrees of freedom. So every pin potentially removes $(d-s)$ degrees of freedom.
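As a numerical sanity check of this incidence constraint, the following sketch (hypothetical data, using numpy) builds $E^k$ for a small instance with d = 4, s = 2 and verifies that all $s \times s$ minors vanish exactly when the pin lies on the span:

```python
import numpy as np
from itertools import combinations

def incidence_residuals(vs, x):
    """All s-by-s minors of E^k = [v_1 - x | ... | v_s - x].

    vs: s points (rows) in R^{d-1} (affine coordinates); x: the pin.
    The pin lies on the affine span of vs iff every minor is zero,
    i.e. rank(E^k) < s."""
    E = np.stack([v - x for v in vs], axis=1)          # (d-1) x s matrix
    d_minus_1, s = E.shape
    return np.array([np.linalg.det(E[list(rows), :])
                     for rows in combinations(range(d_minus_1), s)])

# d = 4, s = 2: a pin that is an affine combination of v1, v2 lies on
# their span; perturbing it off the span makes some minor non-zero.
v1, v2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 2.0])
x_on = 0.3 * v1 + 0.7 * v2
x_off = x_on + np.array([0.0, 0.0, 1.0])

print(np.allclose(incidence_residuals([v1, v2], x_on), 0))   # True
print(np.allclose(incidence_residuals([v1, v2], x_off), 0))  # False
```

Here $\binom{d-1}{s} = 3$ residuals are produced, of which only $d-s = 2$ are independent, matching the count in (2).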

Table 1: Classification of Problems

- Traditional Dictionary Learning (Alternating Minimization approaches):
  - Input and conditions: D satisfies the frame property.
  - Minimum m guaranteeing existence of a locally unique dictionary of a given size n: Question 2.
  - Minimum m guaranteeing efficient dictionary learning: Unknown.
  - Dictionary Learning algorithms: MOD, k-SVD, etc.

- Dictionary Learning, Spielman et al. [24]:
  - Input and conditions: X generated from a hidden dictionary D and a certain distribution of Θ; D is a basis; s ≤ d^{1/2}.
  - Minimum m guaranteeing a locally unique dictionary: O(n log n).
  - Minimum m guaranteeing efficient dictionary learning: O(n^2 log^2 n).
  - Dictionary Learning algorithms: algorithm from [24].

- Dictionary Learning, Arora et al. [3], Agarwal et al. [1]:
  - Input and conditions: X generated from a hidden dictionary D and a certain distribution of Θ; D is pairwise incoherent; s ≤ d^{1/2}.
  - Minimum m guaranteeing a locally unique dictionary: O(n^2 log^2 n).
  - Minimum m guaranteeing efficient dictionary learning: Unknown.
  - Dictionary Learning algorithms: algorithms from [3, 1].

- Dictionary Learning via subspace arrangement and spanning set:
  - Input and conditions: X with the promise that each subspace/dictionary support set is shared by sufficiently many data points in X.
  - Minimum m: minimum number of points to guarantee a unique subspace arrangement that will give a spanning set of size n.
  - Minimum m guaranteeing efficient dictionary learning: Unknown.
  - Dictionary Learning algorithms: Subspace Arrangement Learning (Problem 4) and Spanning Set Finding (Problem 5).

- Dictionary Learning for Segmented Data:
  - Input and conditions: partitioned / segmented data X.
  - Minimum m: Problem 6.
  - Minimum m guaranteeing efficient dictionary learning: Unknown.
  - Dictionary Learning algorithms: Problem 6 and Spanning Set Finding (Problem 5).

- Fitted Dictionary Learning (this paper):
  - Input and conditions: generic data points X (satisfying the pure condition) with the underlying hypergraph specified.
  - Minimum m guaranteeing a locally unique dictionary of a given size n: ((d-1)/(d-s)) n (Theorem 2); unknown for general-position data (Question 1).
  - Dictionary Learning algorithms: straightforward algorithm (Corollary 5).

(Illustrative examples (a), (b), (c) are figures, omitted here.)

B.2 Linearization as Rigidity Matrix and its Generic Combinatorics

Adapting [4], we now show that rigidity and independence (based on nonlinear polynomials) of pinned subspace-incidence systems are generically properties of the underlying hypergraph $H(S_{X,D})$, and can furthermore be captured by linear conditions in a generic infinitesimal setting. Recall that a rigidity matrix of a framework (H,X,D) is a matrix whose kernel is the space of infinitesimal motions (flexes) of (H,X,D). A framework is infinitesimally minimally rigid if the rows of the rigidity matrix are independent and the rigidity matrix has full rank.

To define a rigidity matrix for a pinned subspace-incidence framework (H,X,D), we take the Jacobian $J_X(D)$ of the algebraic system (H,X)(D), by taking partial derivatives w.r.t. the coordinates of the $v_i$'s. In the Jacobian, each pin $x_k$ has $\binom{d-1}{s}$ corresponding rows, and each vertex $v_i$ has $d-1$ corresponding columns. Each equation $E^k[R(l), \,\cdot\,] = 0$ of (2) gives the corresponding row of the Jacobian:

$$r^k(l) = [0,\ldots,0,\; V_{1,1}^k(l), V_{1,2}^k(l), \ldots, V_{1,d-1}^k(l),\; 0,\ldots,0,\; V_{s,1}^k(l), V_{s,2}^k(l), \ldots, V_{s,d-1}^k(l),\; 0,\ldots,0]$$

Let $D^k = \{v_i^k, 1 \le i \le s\}$ be the vertex set of the hyperedge corresponding to $x_k$. Each vertex $v_i^k$ has the entries $V_{i,1}^k(l), V_{i,2}^k(l), \ldots, V_{i,d-1}^k(l)$ in its $d-1$ columns. For $j \in R(l)$, the entry $V_{i,j}^k(l)$ is the $(s-1)$-dimensional volume of the $(s-1)$-simplex formed by the vertices $(D^k \setminus \{v_i^k\}) \cup \{x_k\}$, projected on the coordinates $R(l) \setminus \{j\}$, which is generically non-zero. All other entries, including the terms $V_{i,j}^k(l)$ with $j \notin R(l)$ and the entries in columns of vertices not on the hyperedge of $x_k$, are zero. Notice that for every pair of vertices $v_i^k$ and $v_{i'}^k$, the projected volumes on different coordinates all have the same ratio:

$$\frac{V_{i,j_2}^k(l)}{V_{i,j_1}^k(l)} = \frac{V_{i',j_2}^k(l)}{V_{i',j_1}^k(l)} \quad \text{for all } 1 \le j_1, j_2 \le d-1 \text{ with } j_1, j_2 \in R(l).$$

So we can divide each row $r^k(l)$ by $\sum_{i=1}^{s} V_{i,j^*}^k(l)$, where $j^*$ is the smallest index in $R(l)$, and simplify $r^k(l)$ to

$$[0,\ldots,0,\; b_1 a_1, b_2 a_1, \ldots, b_{d-1} a_1,\; 0,\ldots,0,\; b_1 a_{s-1}, b_2 a_{s-1}, \ldots, b_{d-1} a_{s-1},\; 0,\ldots,0,\; b_1\Bigl(1-\sum_{i=1}^{s-1} a_i\Bigr), b_2\Bigl(1-\sum_{i=1}^{s-1} a_i\Bigr), \ldots, b_{d-1}\Bigl(1-\sum_{i=1}^{s-1} a_i\Bigr),\; 0,\ldots,0] \qquad (3)$$

where the values of $a_i$ and $b_j$ depend on $l$ and $k$, and $b_j = 0$ if $j \notin R(l)$.

Figure 3: A pinned subspace-incidence framework of 6 pins ($x_1, \ldots, x_6$) and 4 vertices (A, B, C, D), with d = 4, s = 2. [figure omitted]

Example 3 Figure 3 shows a pinned subspace-incidence framework with d = 4, s = 2. If we denote $\alpha_{1,1} = A_1 - x_{1,1}$, $\alpha_{1,2} = A_2 - x_{1,2}$, $\beta_{1,2} = B_2 - x_{1,2}$, etc., the edge AB has the following three rows in the Jacobian:

$$\begin{bmatrix}
\beta_{1,2} & -\beta_{1,1} & 0 & -\alpha_{1,2} & \alpha_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
\beta_{1,3} & 0 & -\beta_{1,1} & -\alpha_{1,3} & 0 & \alpha_{1,1} & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & \beta_{1,3} & -\beta_{1,2} & 0 & -\alpha_{1,3} & \alpha_{1,2} & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

and the corresponding rows in the simplified Jacobian have the form

$$\begin{bmatrix}
b_{1,1}a_1 & b_{1,2}a_1 & 0 & b_{1,1}(1-a_1) & b_{1,2}(1-a_1) & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_{2,1}a_2 & 0 & b_{2,2}a_2 & b_{2,1}(1-a_2) & 0 & b_{2,2}(1-a_2) & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & b_{3,1}a_3 & b_{3,2}a_3 & 0 & b_{3,1}(1-a_3) & b_{3,2}(1-a_3) & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}$$

For a pinned subspace-incidence framework (H,X,D), we define the symmetric rigidity matrix M to be the simplified Jacobian matrix obtained above, of size $m\binom{d-1}{s}$ by $n(d-1)$, where each row has the form (3). If we choose $d-s$ rows per hyperedge in M, the obtained matrix M̂ is a rigidity matrix of size $m(d-s)$ by $n(d-1)$. The framework is infinitesimally rigid if and only if there is a choice of M̂ with full rank. Note that the rank of a generic matrix M̂ is at least as large as the rank of any specific realization M̂(H,X,D).

Defining generic as non-singular, we prove Lemma 1, showing that for a generic framework (H,X,D), infinitesimal rigidity is equivalent to generic rigidity.

Proof. [Proof sketch of Lemma 1] First we show that for a regular framework, infinitesimal rigidity implies rigidity. Consider the polynomial system (H,X)(D) of equations. The Implicit Function Theorem states that there exists a function g such that D = g(X) on some open interval if and only if the Jacobian $J_X(D)$ of (H,X)(D) with respect to D has full rank. Therefore, if the framework is infinitesimally rigid, then the solutions to the algebraic system are isolated points (otherwise g could not be explicit). Since the algebraic system contains finitely many components, there are only finitely many such solutions, and each solution is a 0-dimensional point. This implies that the total number of solutions is finite, which is the definition of rigidity.

To show that generic rigidity implies generic infinitesimal rigidity, we take the contrapositive: if a generic framework is not infinitesimally rigid, we show that there is a finite flex. Let M̂ be the $m(d-s)$ by $n(d-1)$ rigidity matrix obtained from the Jacobian $J_X(D)$ which has the maximum rank. If (H,X,D) is not infinitesimally rigid, then the rank r of M̂ is less than $n(d-1)$. Let $E^*$ be a set of edges of H such that $|E^*| = r$ and the corresponding rows of the Jacobian $J_X(D)$ are all independent. In M̂[$E^*$, · ], we can find r independent columns. Let $D^*$ be the components of D corresponding to those r independent columns and $D^{*\perp}$ the remaining components. The r-by-r submatrix M̂[$E^*$, $D^*$], made up of the corresponding independent rows and columns, is invertible. Then, by the Implicit Function Theorem, in a neighborhood of D there exists a continuous and differentiable function g such that $D^* = g(D^{*\perp})$. This identifies $D_0$, whose components are $D^{*\perp}$ together with the level set of g corresponding to $D^*$, such that $(H,X)(D_0) = 0$. The level set defines a finite flex of the framework. Therefore the system is not rigid. □

B.3 Required Hypergraph Properties

This section formally introduces the (k,0)-sparsity condition used to prove Theorem 2.

Definition 2 A hypergraph H = (V,E) is (k,0)-sparse if for any V′ ⊂ V, the induced subgraph H′ = (V′,E′) satisfies |E′| ≤ k|V′|. A hypergraph H is (k,0)-tight if H is (k,0)-sparse and |E| = k|V|.

This is a special case of the (k,l)-sparsity condition that was widely studied in the geometric constraint solving and combinatorial rigidity literature before it was given a name in [15]. A relevant concept from graph matroids is the map-graph, defined as follows.

Definition 3 An orientation of a hypergraph is given by identifying one endpoint of each edge as its tail. The out-degree of a vertex v is the number of edges which identify v as the tail and connect v to V − v. A map-graph is a hypergraph that admits an orientation such that the out-degree of every vertex is exactly one.

The following lemma from [26] follows Tutte–Nash-Williams [28, 18] in giving a useful characterization of (k,0)-tight graphs in terms of maps.

Lemma 6 A hypergraph H is composed of k edge-disjoint map-graphs if and only if H is (k,0)-tight.

B.4 Proof of Main Theorem and Corollaries

In this section, we prove the main theorem, Theorem 2, a combinatorial characterization of the existence of finitely many solutions for a pinned subspace-incidence framework. The proof adopts the approach of [31] for proving rigidity of k-frames.

First notice that the graph property from Theorem 2 is not directly a (k,0)-tightness condition, so we modify the underlying hypergraph by duplicating each hyperedge into (d − s) copies.

Definition 4 (Expanded multihypergraph) Given the underlying hypergraph $H = (I(D), I(S_{X,D}))$ of a pinned subspace-incidence problem, the expanded multihypergraph Ĥ = (V, Ê) of H is obtained by letting V = I(D) and replacing each hyperedge in $I(S_{X,D})$ with (d − s) copies in Ê.
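The (k,0)-sparsity and tightness conditions can be verified directly on small examples by enumerating vertex subsets. A brute-force sketch (exponential in |V|, for illustration only; the hypergraph and k below are hypothetical):

```python
from itertools import combinations

def is_k0_sparse(vertices, edges, k):
    """(k,0)-sparsity (Definition 2): every subset V' of vertices induces
    at most k|V'| edges, where an edge is induced by V' if all of its
    endpoints lie in V'. Edges are given as a list, so multi-edges
    (as in the expanded multihypergraph) are counted with multiplicity."""
    for r in range(1, len(vertices) + 1):
        for sub in combinations(vertices, r):
            vs = set(sub)
            induced = sum(1 for e in edges if set(e) <= vs)
            if induced > k * r:
                return False
    return True

def is_k0_tight(vertices, edges, k):
    """(k,0)-tight: (k,0)-sparse with exactly |E| = k|V| edges."""
    return is_k0_sparse(vertices, edges, k) and len(edges) == k * len(vertices)

# A (1,0)-tight hypergraph: a single map, each vertex can be the tail
# of exactly one edge.
V = [0, 1, 2]
E = [(0, 1), (1, 2), (2, 0)]
print(is_k0_tight(V, E, 1))  # True
```

This mirrors Lemma 6: the (1,0)-tight example above is itself a map-graph, with the orientation 0→(0,1), 1→(1,2), 2→(2,0).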

The rigidity matrix M̂ for a pinned subspace-incidence framework is a $(d-s)|\hat{E}|$ by $(d-1)|V|$ matrix on the expanded multihypergraph Ĥ = (V, Ê), where each hyperedge $x_k \in I(S_{X,D})$ has $d-s$ rows, one for each copy. The $d-s$ rows are arbitrarily picked from the $\binom{d-1}{s}$ rows of $x_k$ in the symmetric rigidity matrix M.

Theorem 2 can be restated on the expanded multihypergraph:

Theorem 7 A pinned subspace-incidence framework is generically minimally rigid if and only if the underlying expanded multihypergraph is (d − 1, 0)-tight.

Since Theorem 7 is equivalent to Theorem 2, we only need to prove Theorem 7 in the following.

We first consider the generic rank of particular matrices defined on a single map-graph.

Lemma 8 A matrix N defined on a map-graph H = (V,E), such that columns are indexed by the vertices and rows by the edges, where the row for hyperedge $x_k \in E$ has non-zero entries only at the s indices corresponding to the vertices $v_i \in x_k$, with the following pattern:

$$[0,\ldots,0,\; a_1^k,\; 0,\ldots,0,\; a_2^k,\; 0,\ldots,0,\; a_{s-1}^k,\; 0,\ldots,0,\; 1-\sum_{i=1}^{s-1} a_i^k,\; 0,\ldots,0] \qquad (4)$$

is generically full rank.

Proof. According to the definition of a map-graph, the function t : E → V assigning a tail vertex to each hyperedge is a one-to-one correspondence. Without loss of generality, assume that for any $x_k$, the corresponding entry of $t(x_k)$ in N is $1 - \sum_i a_i^k$ (notice that we can arbitrarily switch the variable names $a_1^k, \ldots, a_{s-1}^k, 1 - \sum_i a_i^k$). The determinant of the map N is:

$$\det(N) = \pm \prod_k \Bigl(1 - \sum_i a_i^k\Bigr) + \sum_{\sigma} \operatorname{sgn}(\sigma) \prod_{i=1}^{n} N[i, \sigma_i] \qquad (5)$$

where σ enumerates all other permutations of |N|, excluding the one giving the first term $\pm \prod_k (1 - \sum_i a_i^k)$. Notice that each term $\prod_{i=1}^{n} N[i,\sigma_i]$ has at least one $a_i^k$ as a factor. If we use the specialization $a_i^k = 0$ for all i and k, the summation over σ vanishes, and det(N) becomes $\pm \prod_k (1 - \sum_i a_i^k) = \pm 1$. So generically, N must have full rank. □

Now we are ready to prove the main theorem by decomposing the expanded multihypergraph into a union of d − 1 maps and applying Lemma 8.

Proof. [Proof of Main Theorem] First we show the only if direction. For a generically minimally rigid pinned subspace-incidence framework, the determinant of M̂ is not identically zero. Since the number of columns is $n(d-1)$, it is trivial that $n(d-1)$ copied edges in M̂, namely $n\frac{d-1}{d-s}$ pins, are necessary. It is also trivial to see that (d − 1, 0)-tightness is necessary, since any subgraph H′ = (V′,E′) of Ĥ with |E′| > (d − 1)|V′| is overdetermined and generically has no solution.

Next we show the if direction: $n(d-1)$ copied edges arranged generically in a (d − 1, 0)-tight pattern imply infinitesimal rigidity.

We first group the columns according to the coordinates. In other words, we have d − 1 groups $C_j$, where all columns for the first coordinate belong to $C_1$, all columns for the second coordinate belong to $C_2$, etc. This can be done by applying a Laplace expansion to rewrite the determinant of the rigidity matrix M̂ as a sum of products of determinants (brackets) representing each of the coordinates taken separately:

$$\det(\hat{M}) = \sum_{\sigma} \pm \prod_j \det\bigl(\hat{M}[R_j^\sigma, C_j]\bigr)$$

where the sum is taken over all partitions σ of the rows into d − 1 subsets $R_1^\sigma, R_2^\sigma, \ldots, R_j^\sigma, \ldots, R_{d-1}^\sigma$, each of size |V|. Observe that for each $\hat{M}[R_j^\sigma, C_j]$,

$$\det\bigl(\hat{M}[R_j^\sigma, C_j]\bigr) = (b_1^{\sigma_j} \cdots b_n^{\sigma_j}) \det\bigl(M'[R_j^\sigma, C_j]\bigr)$$

for some coefficients $(b_1^{\sigma_j} \cdots b_n^{\sigma_j})$, and each row of $M'[R_j^\sigma, C_j]$ is either all zero or of pattern (4).

By Lemma 6, the expanded multihypergraph Ĥ can be decomposed into d − 1 edge-disjoint maps. Each such decomposition has some corresponding row partitions σ, where each column group $C_j$ corresponds to a map $N_j$, and $R_j^\sigma$ contains the rows corresponding to the edges in that map. Observe that $\hat{M}[R_j^\sigma, C_j]$ contains an all-zero row r if and only if the row r has a zero entry at the jth coordinate in M̂. Recall that for each hyperedge $x_k$, we are free to pick any $d-s$ of the $\binom{d-1}{s}$ rows of the symmetric rigidity matrix M to include in M̂. We claim that:

Claim 1 Given a map decomposition, we can always pick the rows of the rigidity matrix M̂ such that there is a corresponding row partition $\sigma^*$, where none of the minors $\hat{M}[R_j^{\sigma^*}, C_j]$ contains an all-zero row.

Given a map decomposition, for any map $N_j$, there are $\binom{d-2}{s-1}$ among these $\binom{d-1}{s}$ rows with the jth coordinate non-zero. Also, it is not hard to show that $\binom{d-2}{s-1} \ge d-s$ for all $2 \le s \le d-1$. So for any $N_j$ containing $k_j$ copies of a particular hyperedge, since all the other maps can pick at most $(d-s) - k_j$ rows from its choices, it still has $\binom{d-2}{s-1} - ((d-s) - k_j) \ge k_j$ choices. Therefore, given a map decomposition, we can always pick the rows of the rigidity matrix M̂ such that there is a partition of each hyperedge's rows where each map $N_j$ gets its required rows with non-zeros at coordinate j. This concludes the proof of the claim.

So by Lemma 8, the determinant of each such minor $\hat{M}[R_j^{\sigma^*}, C_j]$ is generically non-zero. We conclude that

$$\det(\hat{M}) = \sum_{\sigma} \pm \prod_j \Bigl( (b_1^{\sigma_j} \cdots b_n^{\sigma_j}) \det\bigl(M'[R_j^\sigma, C_j]\bigr) \Bigr)$$

Observe that each term of the sum has a unique multilinear coefficient $(b_1^{\sigma_j} \cdots b_n^{\sigma_j})$ that generically does not cancel with any of the others, since the $\det(M'[R_j^\sigma, C_j])$ are independent of the b's. This implies that M̂ is generically full rank, which completes the proof. Moreover, substituting the values of $\det(M'[R_j^\sigma, C_j])$ from Lemma 8 gives the pure condition for genericity. □
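The generic full-rank claim of Lemma 8 can be sanity-checked by random specialization: instantiate the $a_i^k$ with random values and test $\det(N) \neq 0$. A minimal sketch for s = 2 (so each row carries $a^k$ and $1 - a^k$), on a hypothetical 4-cycle map-graph in which every vertex is the tail of exactly one edge:

```python
import numpy as np

def map_matrix(n, edges, tails, rng):
    """Build the matrix N of Lemma 8 for s = 2: rows indexed by edges
    (vertex pairs), columns by vertices. The tail entry is 1 - a and the
    other endpoint gets a, for a random generic value of a (pattern (4))."""
    N = np.zeros((len(edges), n))
    for row, ((u, v), t) in enumerate(zip(edges, tails)):
        a = rng.uniform(0.1, 0.9)
        other = v if t == u else u
        N[row, t] = 1.0 - a
        N[row, other] = a
    return N

rng = np.random.default_rng(0)
# A map-graph on 4 vertices: the out-degree of every vertex is exactly 1.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
tails = [0, 1, 2, 3]
N = map_matrix(4, edges, tails, rng)
print(abs(np.linalg.det(N)) > 1e-9)  # generically True
```

Setting every a to 0 specializes N to a permutation-like matrix with determinant ±1, which is exactly the specialization argument used in the proof of Lemma 8.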

Example 4 Consider the pinned subspace-incidence framework in Example 3. The expanded multihypergraph satisfies the (3,0)-tightness condition. The rigidity matrix M̂ has the following form, where $\bar{a}_i$ stands for $1 - a_i$:

$$\begin{bmatrix}
b_{1,1}a_1 & b_{1,2}a_1 & 0 & b_{1,1}\bar{a}_1 & b_{1,2}\bar{a}_1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_{2,1}a_2 & 0 & b_{2,2}a_2 & b_{2,1}\bar{a}_2 & 0 & b_{2,2}\bar{a}_2 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & b_{3,1}a_3 & b_{3,2}a_3 & 0 & b_{3,1}\bar{a}_3 & b_{3,2}\bar{a}_3 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & b_{4,1}a_4 & 0 & b_{4,2}a_4 & b_{4,1}\bar{a}_4 & 0 & b_{4,2}\bar{a}_4 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & b_{5,1}a_5 & b_{5,2}a_5 & 0 & b_{5,1}\bar{a}_5 & b_{5,2}\bar{a}_5 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & b_{6,1}a_6 & 0 & b_{6,2}a_6 & b_{6,1}\bar{a}_6 & 0 & b_{6,2}\bar{a}_6 \\
b_{7,1}a_7 & b_{7,2}a_7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & b_{7,1}\bar{a}_7 & b_{7,2}\bar{a}_7 & 0 \\
b_{8,1}a_8 & 0 & b_{8,2}a_8 & 0 & 0 & 0 & 0 & 0 & 0 & b_{8,1}\bar{a}_8 & 0 & b_{8,2}\bar{a}_8 \\
b_{9,1}a_9 & b_{9,2}a_9 & 0 & 0 & 0 & 0 & b_{9,1}\bar{a}_9 & b_{9,2}\bar{a}_9 & 0 & 0 & 0 & 0 \\
b_{10,1}a_{10} & 0 & b_{10,2}a_{10} & 0 & 0 & 0 & b_{10,1}\bar{a}_{10} & 0 & b_{10,2}\bar{a}_{10} & 0 & 0 & 0 \\
0 & 0 & 0 & b_{11,1}a_{11} & b_{11,2}a_{11} & 0 & 0 & 0 & 0 & b_{11,1}\bar{a}_{11} & b_{11,2}\bar{a}_{11} & 0 \\
0 & 0 & 0 & b_{12,1}a_{12} & 0 & b_{12,2}a_{12} & 0 & 0 & 0 & b_{12,1}\bar{a}_{12} & 0 & b_{12,2}\bar{a}_{12}
\end{bmatrix}$$

After grouping the coordinates, it becomes

$$\begin{bmatrix}
b_{1,1}a_1 & b_{1,1}\bar{a}_1 & 0 & 0 & b_{1,2}a_1 & b_{1,2}\bar{a}_1 & 0 & 0 & 0 & 0 & 0 & 0 \\
b_{2,1}a_2 & b_{2,1}\bar{a}_2 & 0 & 0 & 0 & 0 & 0 & 0 & b_{2,2}a_2 & b_{2,2}\bar{a}_2 & 0 & 0 \\
0 & b_{3,1}a_3 & b_{3,1}\bar{a}_3 & 0 & 0 & b_{3,2}a_3 & b_{3,2}\bar{a}_3 & 0 & 0 & 0 & 0 & 0 \\
0 & b_{4,1}a_4 & b_{4,1}\bar{a}_4 & 0 & 0 & 0 & 0 & 0 & 0 & b_{4,2}a_4 & b_{4,2}\bar{a}_4 & 0 \\
0 & 0 & b_{5,1}a_5 & b_{5,1}\bar{a}_5 & 0 & 0 & b_{5,2}a_5 & b_{5,2}\bar{a}_5 & 0 & 0 & 0 & 0 \\
0 & 0 & b_{6,1}a_6 & b_{6,1}\bar{a}_6 & 0 & 0 & 0 & 0 & 0 & 0 & b_{6,2}a_6 & b_{6,2}\bar{a}_6 \\
b_{7,1}a_7 & 0 & 0 & b_{7,1}\bar{a}_7 & b_{7,2}a_7 & 0 & 0 & b_{7,2}\bar{a}_7 & 0 & 0 & 0 & 0 \\
b_{8,1}a_8 & 0 & 0 & b_{8,1}\bar{a}_8 & 0 & 0 & 0 & 0 & b_{8,2}a_8 & 0 & 0 & b_{8,2}\bar{a}_8 \\
b_{9,1}a_9 & 0 & b_{9,1}\bar{a}_9 & 0 & b_{9,2}a_9 & 0 & b_{9,2}\bar{a}_9 & 0 & 0 & 0 & 0 & 0 \\
b_{10,1}a_{10} & 0 & b_{10,1}\bar{a}_{10} & 0 & 0 & 0 & 0 & 0 & b_{10,2}a_{10} & 0 & b_{10,2}\bar{a}_{10} & 0 \\
0 & b_{11,1}a_{11} & 0 & b_{11,1}\bar{a}_{11} & 0 & b_{11,2}a_{11} & 0 & b_{11,2}\bar{a}_{11} & 0 & 0 & 0 & 0 \\
0 & b_{12,1}a_{12} & 0 & b_{12,1}\bar{a}_{12} & 0 & 0 & 0 & 0 & 0 & b_{12,2}a_{12} & 0 & b_{12,2}\bar{a}_{12}
\end{bmatrix}$$

where, in each column group, a subset of rows (shown in red in the original figure) corresponds to a map decomposition of the expanded multihypergraph.

The combinatorial characterization in Theorem 2 leads to the proof of Corollary 3, which gives a lower bound on the dictionary size for generic data points in the general Dictionary Learning problem.

Proof. [Proof of Corollary 3 (lower bound on dictionary size for generic data points)] We first prove one direction: there is generically no dictionary of size |D| = n if $(d-s)m > (d-1)n$. For any hypothetical s-subspace arrangement $S_{X,D}$, the expanded multihypergraph $\hat{H}(S_{X,D})$, with the given bound for |D|, cannot be (d − 1, 0)-sparse. Hence generically, under the pure conditions of Theorem 2, the rigidity matrix of the s-subspace framework $H(S_{X,D})$, with indeterminates representing the coordinate positions of the points in D, has dependent rows. In that case, the original algebraic system (H,X)(D) (whose Jacobian is the rigidity matrix) has no (complex or real) solution for D with X plugged in.

The converse is implied by our theorem, since we are guaranteed both generic independence (the existence of a solution) and generic rigidity (at most finitely many solutions). □

By characterizing the term "generically" in Corollary 3, we prove Corollary 4, which gives a lower bound on dictionary size for data points picked uniformly at random from the sphere $S^{d-1}$.

Proof. [Proof of Corollary 4 (lower bound on dictionary size for highly general data points)] To quantify the term "generically" in Corollary 3, we note that the pure conditions fail only on a measure-zero subset of the space of frameworks $S_{X,D}$. Since the number of possible multihypergraphs representing the s-subspace arrangements is finite for a given set of pins, it follows that except for a measure-zero subset of the space of pin-sets X, there is no (real or complex) solution to the algebraic system (H,X)(D) = 0 when $(d-s)m > (d-1)n$. Thus when X is picked uniformly at random from the sphere $S^{d-1}$, if |D| is less than $\frac{d-s}{d-1}|X|$, then with probability 1 there is no solution. □

B.5 Straightforward Dictionary Learning Algorithm for highly general data points

In this section, we present the algorithm in Corollary 5, which constructs a dictionary of size $n = \frac{d-s}{d-1} m$, given m data points picked uniformly at random from the sphere $S^{d-1}$. The algorithm has two major parts: (1) constructing the underlying hypergraph $H(S_{X,D})$, and (2) constructing the s-subspace arrangement $S_{X,D}$ and the dictionary D.

(1) Algorithm for constructing the underlying hypergraph $H(S_{X,D})$ for a hypothetical s-subspace arrangement $S_{X,D}$:

1. We start by constructing a minimal minimally rigid hypergraph $H_0 = (V_0, E_0)$, using the pebble game algorithm introduced below. Here $|V_0| = k(d-s)$ and $|E_0| = k(d-1)$, where k is the smallest positive integer such that $\binom{k(d-s)}{s}(s-1) \ge k(d-1)$, so it is possible to construct $E_0$ such that no more than $s-1$ hyperedges in $E_0$ contain the same set of vertices in $V_0$. The values $|V_0|$ and $|E_0|$ are constants when we regard d and s as constants.

2. We use the pebble game algorithm to append a set $V_1$ of $d-s$ vertices and a set $E_1$ of $d-1$ hyperedges to $H_0$, such that each hyperedge in $E_1$ contains at least one vertex from $V_1$, and the obtained graph $H_1$ is still minimally rigid. The subgraph $B_1$ induced by $E_1$ has vertex set $V_{B_1} = V_1 \cup V_B$, where $V_B \subset V_0$. We call the vertex set $V_B$ the base vertices of the construction.

3. Each following construction step i appends a set $V_i$ of $d-s$ vertices and a set $E_i$ of $d-1$ hyperedges such that the subgraph $B_i$ induced by $E_i$ has vertex set $V_i \cup V_B$, and $B_i$ is isomorphic to $B_1$. In other words, at each step we directly append a basic structure, the same as $(V_1, E_1)$, to the base vertices $V_B$. It is not hard to verify that the obtained graph is still minimally rigid.

The pebble game algorithm by [26] works on a fixed finite set V of vertices and constructs a (k,l)-sparse hypergraph. Conversely, any (k,l)-sparse hypergraph on vertex set V can be constructed by this algorithm. The algorithm initializes by putting k pebbles on each vertex of V. There are two types of moves:

- Add-edge: adds a hyperedge e (e must contain at least l + 1 pebbles), removes a pebble from a vertex v in e, and assigns v as the tail of e;

- Pebble-shift: for a vertex $v_1 \in e$ which contains at least one pebble, let $v_2$ be the tail of e; move one pebble from $v_1$ to $v_2$, and change the tail of e to $v_1$.

At the end of the algorithm, if exactly l pebbles remain in the hypergraph, then the hypergraph is (k,l)-tight.
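The moves above, specialized to the (k,0) case used in this paper, can be sketched as follows. This is a simplified illustration that omits the pebble search (chains of pebble-shift moves) of the full algorithm of [26]; the vertex and edge data are hypothetical:

```python
def pebble_game_k0(vertices, candidate_edges, k):
    """Simplified (k,0)-pebble game: start with k pebbles per vertex; an
    add-edge move succeeds if some endpoint of the edge still holds a
    pebble (for l = 0 the edge only needs 1 pebble), consuming that
    pebble and making its vertex the tail. The accepted edges always
    form a (k,0)-sparse hypergraph; if no pebbles remain, it is
    (k,0)-tight. The full algorithm would first try pebble-shift moves
    before rejecting an edge."""
    pebbles = {v: k for v in vertices}
    accepted = []                      # list of (edge, tail) pairs
    for e in candidate_edges:
        for v in e:
            if pebbles[v] > 0:         # add-edge move
                pebbles[v] -= 1
                accepted.append((e, v))
                break
    tight = sum(pebbles.values()) == 0  # l = 0: no pebbles left
    return accepted, tight

V = [0, 1, 2]
edges = [(0, 1), (0, 1), (1, 2), (2, 0), (0, 1, 2), (1, 2)]
acc, tight = pebble_game_k0(V, edges, k=2)
print(len(acc), tight)  # 6 True
```

In this run all six candidate hyperedges are accepted and all 2|V| = 6 pebbles are consumed, so the result is (2,0)-tight, consistent with Definition 2.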

Our algorithm runs a slightly modified pebble game to find a (d − 1, 0)-tight expanded multihypergraph. We require that each add-edge move add $d-s$ copies of a hyperedge e, so a total of $d-s$ pebbles are removed from vertices in e. Additionally, the multiplicity of a hyperedge (not counting the expanded copies) cannot exceed $s-1$. For constructing the basic structure of Step 2, the algorithm initializes by putting $d-1$ pebbles on each vertex of $V_1$. In addition, an add-edge move can only add a hyperedge that contains at least one vertex of $V_1$, and a pebble-shift move can only shift a pebble inside $V_1$.

The pebble game algorithm takes $O\bigl(s^2 |V_0| \binom{|V_0|}{s}\bigr)$ time in Step 1 and $O\bigl(s^2 (|V_0| + (d-s)) \binom{|V_0|+(d-s)}{s}\bigr)$ time in Step 2. Since the complete underlying hypergraph $H(S_{X,D})$ has $m = |X|$ edges, Step 3 is iterated $O(m/(d-1))$ times, and each iteration takes constant time. Therefore the overall time complexity for constructing $H(S_{X,D})$ is $O\bigl(s^2 (|V_0| + (d-s)) \binom{|V_0|+(d-s)}{s} + m/(d-1)\bigr)$, which is O(m) when d and s are regarded as constants.

(2) Algorithm for constructing the s-subspace arrangement $S_{X,D}$ and the dictionary D:

The construction of the s-subspace arrangement $S_{X,D}$ follows naturally from the construction of the underlying hypergraph. For the initial hypergraph $H_0$, we get a pinned subspace-incidence system $(H_0, X_0)(D_0)$ by arbitrarily choosing $|X_0| = |E_0|$ pins from X. Similarly, for Step 2 and each iteration of Step 3, we form a pinned subspace-incidence system $(B_i, X_i)(D_i)$ by arbitrarily choosing $|X_i| = d-1$ pins from X.

Given $X_0$, we know that the rigidity matrix of the s-subspace framework $H_0(S_{X_0,D_0})$, with indeterminates representing the coordinate positions of the points in $D_0$, generically has full rank (rows are maximally independent) under the pure conditions of Theorem 2; in which case, the original algebraic subsystem $(H_0, X_0)(D_0)$ (whose Jacobian is the rigidity matrix), with $X_0$ plugged in, is guaranteed to have a (possibly complex) solution, and only finitely many solutions, for $D_0$. Since the pure conditions fail only on a measure-zero subset of the space of pin-sets $X_0$, where each pin is in $S^{d-1}$, it follows that if the pins in $X_0$ are picked uniformly at random from $S^{d-1}$, such a solution exists for $D_0$ (and $S_{X_0,D_0}$) and can be found by solving the algebraic system $H_0(S_{X_0,D_0})$.

Once we have solved $(H_0, X_0)(D_0)$, for each following construction step i, $B_i$ is also rigid, since the coordinate positions of the vertices in $V_B$ have been fixed (this follows from the generalization of our main result to non-uniform hypergraphs, by thinking of each vertex in $V_B$ as being spanned by a single pin). So similarly, we know a solution exists for $D_i$ (and $S_{X_i,D_i}$) and can be found by solving the algebraic system $B_i(S_{X_i,D_i})$, which is of constant size O(d). Although there can be more than one choice of solution at each step, since every construction step is based on the base vertices $V_B$, the solution of one step does not affect any other step, so generically any choice results in a successful solution for the entire construction sequence, and we obtain D by taking the union of all the $D_i$'s.

When we regard d and s as constants, the time complexity of Part (2) is the constant time for solving the size-$O(|V_0|)$ algebraic system $(H_0, X_0)(D_0)$, plus $O(m/(d-1))$ times the constant time for solving the size-O(d) system $(B_i, X_i)(D_i)$, that is, O(m). Therefore the overall time complexity of the dictionary learning algorithm is O(m).