arXiv:1207.3094v2 [cs.IT] 15 Jul 2013

Vanishingly Sparse Matrices and Expander Graphs, With Application to Compressed Sensing

Bubacarr Bah and Jared Tanner

B.B. is with the Laboratory for Information and Inference Systems (LIONS), Ecole Polytechnique Federale de Lausanne (EPFL), Lausanne, Switzerland ([email protected]). J.T. is with the Mathematics Institute and Exeter College, University of Oxford, Oxford, UK ([email protected]). J.T. acknowledges support from the Leverhulme Trust. Copyright (c) 2012 IEEE.

Abstract—We revisit the probabilistic construction of sparse random matrices where each column has a fixed number of nonzeros whose row indices are drawn uniformly at random without replacement. These matrices have a one-to-one correspondence with the adjacency matrices of fixed left degree expander graphs. We present formulae for the expected cardinality of the set of neighbors for these graphs, and present tail bounds on the probability that this cardinality will be less than the expected value. Deducible from these bounds are similar bounds for the expansion of the graph, which is of interest in many applications. These bounds are derived through a more detailed analysis of collisions in unions of sets using a novel dyadic splitting technique. The analysis led to the derivation of better order constants that allow for quantitative theorems on the existence of lossless expander graphs, and hence of the sparse random matrices we consider, and also quantitative compressed sensing sampling theorems when using sparse non mean-zero measurement matrices.

Index Terms—Algorithms, compressed sensing, signal processing, sparse matrices, expander graphs.

I. INTRODUCTION

Sparse matrices are particularly useful in applied and computational mathematics because of their low storage complexity and fast implementation as compared to dense matrices, see [1], [2], [3]. Of late, significant progress has been made to incorporate sparse matrices in compressed sensing [4], [5], [6], [7], both giving theoretical performance guarantees and also exhibiting numerical results showing that sparse matrices coming from expander graphs (as considered here) can be as good sensing matrices as their dense counterparts. In fact, Blanchard and Tanner [8] recently demonstrated in a GPU implementation how well these types of matrices, even with a very small fixed number of nonzeros per column, do compared to dense Gaussian and Discrete Cosine Transform matrices.

In this manuscript we consider random sparse matrices that are adjacency matrices of lossless expander graphs. Lossless expander graphs are highly connected bipartite graphs with very sparse adjacency matrices; a precise definition of a lossless expander is given in Definition 1.1 and their illustration in Fig. 1.

Definition 1.1: A graph G = (U, V, E) is a lossless (k, d, ǫ)-expander if it is a bipartite graph with |U| = N left vertices, |V| = n right vertices, and a regular left degree d, such that any X ⊂ U with |X| ≤ k has |Γ(X)| ≥ (1 − ǫ)d|X| neighbors.

Fig. 1. Illustration of a lossless (k, d, ǫ)-expander graph.
Remark 1.2:
1) The graphs are unbalanced expanders because n ≪ N;
2) They are called lossless expanders when ǫ ≪ 1;
3) Lossless (k, d, ǫ)-expander graphs are equivalent to conductors with parameters that are base 2 logarithms of the parameters of a lossless expander; see [9], [5] and the references therein.

Such graphs have been well studied in theoretical computer science and pure mathematics and have many applications, including Distributed Routing in Networks, Linear Time Decodable Error-Correcting Codes, Bitprobe Complexity of Storing Subsets, Fault-tolerance and Distributed Storage, and Hard Tautologies in Proof Complexity; see [9] or [10] for a more detailed survey. Pinsker and Bassylago [11] proved the existence of lossless expanders and showed that a random left-regular bipartite graph is, with high probability, an expander graph; that is, probabilistic constructions achieve optimal parameters, and these are what we consider here. Deterministic constructions only achieve sub-optimal parameters; see Guruswami et al. [12].

Our main contribution is the presentation of quantitative guarantees on the probabilistic construction of these objects, in the form of a bound on the tail probability of the size of the set of neighbors, Γ(X) for a given X ⊂ U, of a randomly generated left-degree bipartite graph. Moreover, we provide deducible bounds on the tail probability of the expansion of the graph, |Γ(X)|/|X|. We derive quantitative guarantees for randomly generated non-mean zero sparse binary matrices to be adjacency matrices of expander graphs. In addition, we derive the first phase transitions showing regions in parameter space that depict when a left-regular bipartite graph with a given set of parameters is guaranteed to be a lossless expander with high probability. The most significant contribution, and the key innovation of this paper, is the use of a novel technique of dyadic splitting of sets. We derive our bounds using this technique and apply them to derive ℓ1 restricted isometry constants (RIC1).
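To make the object of study concrete, the following minimal sketch (Python with NumPy; the function names and parameter values are ours, purely illustrative) draws a random bipartite graph with N left vertices, n right vertices and left degree d by giving each left vertex d distinct right neighbors chosen uniformly at random, and then records |Γ(X)| and the expansion |Γ(X)|/|X| for one randomly chosen set X of k left vertices.

```python
import numpy as np

def random_left_regular_graph(n, N, d, rng):
    """Neighbor lists of a random left-d-regular bipartite graph:
    row j holds the d right-vertex indices of left vertex j,
    drawn uniformly at random without replacement."""
    return np.array([rng.choice(n, size=d, replace=False) for _ in range(N)])

def neighbor_set(neighbors, X):
    """Gamma(X): the union of the neighbor sets of the left vertices in X."""
    return np.unique(neighbors[np.asarray(X)].ravel())

rng = np.random.default_rng(0)
n, N, d, k = 1024, 4096, 8, 100
neighbors = random_left_regular_graph(n, N, d, rng)

X = rng.choice(N, size=k, replace=False)      # one random set of k left vertices
gamma = neighbor_set(neighbors, X)
print(len(gamma), d * k, len(gamma) / k)      # |Gamma(X)|, its maximum dk, and the expansion
# Definition 1.1 asks for |Gamma(X)| >= (1 - eps) * d * |X| for every |X| <= k;
# this sketch only inspects a single randomly drawn X.
```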
Numerous compressed sensing algorithms have been designed for sparse matrices [4], [5], [6], [7]. Another contribution of our work is the derivation of sampling theorems, presented as phase transitions, comparing performance guarantees for some of these algorithms as well as for the more traditional ℓ1-minimization compressed sensing formulation. We also show how favorably the ℓ1-minimization performance guarantees for such sparse matrices compare to what ℓ2 restricted isometry constants (RIC2) analysis yields for dense Gaussian matrices. For this comparison we used sampling theorems and phase transitions from related work by Blanchard et al. [13], which provided such theorems for dense Gaussian matrices based on RIC2 analysis.

The outline of the rest of this introduction section goes as follows. In Section I-A we present our main results in Theorem 1.6 and Corollary 1.7. In Section I-B we discuss RIC1 and its implication for compressed sensing, leading to two sampling theorems.

A. Main results

Our main results are about a class of sparse matrices coming from lossless expander graphs, a class which includes non-mean zero matrices. We start by defining the class of matrices we consider and a key concept, the set of neighbors, used in the derivation of the main results of the manuscript. Firstly, we denote by H(p) := −p log(p) − (1 − p) log(1 − p) the Shannon entropy function with base e logarithm.

Definition 1.3: Let A be an n × N matrix with d nonzeros in each column. We refer to A as a random
1) sparse expander (SE) if every nonzero has value 1;
2) sparse signed expander (SSE) if every nonzero has value from {−1, 1};
and the support set of the d nonzeros per column is drawn uniformly at random, with each column drawn independently.

SE matrices are adjacency matrices of lossless (k, d, ǫ)-expander graphs, while SSE matrices have random sign patterns in the nonzeros of an adjacency matrix of a lossless (k, d, ǫ)-expander graph. If A is either an SE or an SSE it will have only d nonzeros per column, and since we fix d ≪ n, A is therefore "vanishingly sparse." We denote by AS a submatrix of A composed of columns of A indexed by the set S with |S| = s. To aid translation between the terminology of graph theory and linear algebra we define the set of neighbors in both notations.

Definition 1.4: Consider a bipartite graph G = (U, V, E) where E is the set of edges and eij = (xi, yj) is the edge that connects vertex xi to vertex yj. For a set of left vertices S ⊂ U its set of neighbors is Γ(S) = {yj | xi ∈ S and eij ∈ E}. In terms of the adjacency matrix, A, of G = (U, V, E), the set of neighbors of AS for |S| = s, denoted by As, is the set of rows of AS with at least one nonzero.

Definition 1.5: Using Definition 1.4, the expansion of the graph is given by the ratio |Γ(S)|/|S| or, equivalently, |As|/s.

By the definition of a lossless expander, Definition 1.1, we need |Γ(S)| to be large for every small S ⊂ U. In terms of the class of matrices defined by Definition 1.3, for every AS we want |As| to be as close to n as possible, where n is the number of rows. Henceforth, we will only use the linear algebra notation As, which is equivalent to Γ(S). Note that |As| is a random variable depending on the draw of the set of columns, S, for each fixed A. Therefore, we can ask what is the probability that |As| is not greater than as, in particular where as is smaller than the expected value of |As|. This is the question that Theorem 1.6 answers. We then use this theorem with RIC1 to deduce the corollaries that follow, which are about the probabilistic construction of expander graphs, the matrices we consider, and sampling theorems for some selected compressed sensing algorithms.

Theorem 1.6: For fixed s, n, N and d, let an n × N matrix A be drawn from either of the classes of matrices defined in Definition 1.3; then

Prob(|As| ≤ as) < pmax(s, d) · exp{n · Ψ(as, . . . , a1)},   (1)

where pmax(s, d) is given by

pmax(s, d) = 2 / (25 √(2π s³ d³)),   (2)

and

Ψ(as, . . . , a1) = (1/n) [ Σ_{i=1}^{⌈s/2⌉} (s/(2i)) ( (n − ai) · H((a2i − ai)/(n − ai)) + ai · H((a2i − ai)/ai) − n · H(ai/n) ) + 3s · log(5d) ],   (3)

where a1 := d. If no restriction is imposed on as, then the ai for i > 1 take on their expected values âi given by

â2i = âi (2 − âi/n)   for i = 1, 2, 4, . . . , ⌈s/2⌉.   (4)

If as is restricted to be less than âs, then the ai for i > 1 are the unique solutions to the following polynomial system

a2i³ − 2 ai a2i² + 2 ai² a2i − ai² a4i = 0   for i = 1, 2, . . . , ⌈s/4⌉,   (5)

with a2i ≥ ai for each i.

Corollary 1.7: For fixed s, n, N, d and 0 < ǫ < 1/2, let an n × N matrix A be drawn from the class of matrices defined in Definition 1.3; then

Prob(‖AS x‖1 ≤ (1 − 2ǫ) d ‖x‖1) < pmax(s, d) · exp{n · Ψ(s, d, ǫ)},

where Ψ(s, d, ǫ) is Ψ(as, . . . , a1) of (3) evaluated at as = (1 − ǫ)ds, with the remaining ai given by the polynomial system (5).
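For a feel of the recursion (4), the short sketch below (Python; the example values of n, d and the cutoff are ours) iterates â2i = âi(2 − âi/n) from â1 = d.

```python
n, d = 1024, 8          # example values; any n >> d works
a = {1: float(d)}       # a[i] approximates the expected |A_i| of Theorem 1.6
i = 1
while 2 * i <= 512:     # dyadic indices i = 1, 2, 4, ..., up to the set size of interest
    a[2 * i] = a[i] * (2 - a[i] / n)   # recursion (4)
    i *= 2
print({j: round(v, 2) for j, v in a.items()})
# starts {1: 8.0, 2: 15.94, 4: 31.63, 8: 62.28, ...}: each doubling of the set size
# slightly less than doubles the expected number of neighbors, reflecting collisions.
```

When as is instead constrained below âs, the recursion is replaced by the polynomial system (5).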
B. RIC1 and compressed sensing

Compressed sensing theory provides recovery guarantees for a variety of reconstruction algorithms. One of these guarantees is a bound on the approximation error between the recovered vector, say x̂, and the original vector by the best k-term representation error, i.e. ‖x − x̂‖1 ≤ Const. · ‖x − xk‖1, where xk is the optimal k-term representation for x. This is possible if A has small RIC1, in other words if A satisfies the ℓ1 restricted isometry property (RIP-1), introduced by Berinde et al. in [5] and defined thus.

Definition 1.8 (RIP-1): Let χN be the set of all k-sparse vectors; then an n × N matrix A has RIP-1, with the lower RIC1 being the smallest L(k, n, N; A), when the following condition holds:

(1 − L(k, n, N; A)) ‖x‖1 ≤ ‖Ax‖1 ≤ ‖x‖1   ∀ x ∈ χN.   (8)

For computational purposes it is preferable to have A sparse, but little quantitative information on L(k, n, N; A) has been available for large sparse rectangular matrices. Berinde et al. in [5] showed that scaled adjacency matrices of lossless expander graphs (i.e. scaled SE matrices) satisfy RIP-1, and the same proof extends to the signed adjacency matrices (i.e. the so-called SSE matrices).

Theorem 1.9: If an n × N matrix A is either SE or SSE as defined in Definition 1.3, then A/d satisfies RIP-1.

Corollary 1.11: Fix 0 < ǫ < 1/2 and d, and let A be drawn as in Definition 1.3. As (k, n, N) → ∞ with n/N → δ ∈ (0, 1) and k/n → ρ < (1 − γ)ρexp(δ; d, ǫ) for any γ > 0,

Prob(‖Ax‖1 ≥ (1 − 2ǫ) d ‖x‖1) → 1   (12)

exponentially in n, where ρexp(δ; d, ǫ) is the largest limiting value of k/n for which

(1/n) · N · H(k/N) + Ψ(k, d, ǫ) = 0.   (13)

The outline of the rest of the manuscript is as follows: in Section II we show empirical data to validate our main results and also present lemmas (and their proofs) that are key to the proof of the main theorem, Theorem 1.6. In Section III we discuss restricted isometry constants and compressed sensing algorithms. In Section IV we prove the main results, that is Theorem 1.6 and the corollaries in Sections I-A and I-B. Section V is the appendix, where we present the alternative to Theorem 1.6.

II. DISCUSSION AND DERIVATION OF THE MAIN RESULTS

We present the method used to derive the main results and discuss the validity and implications of the method. We start by presenting, in the next subsection, Section II-A, numerical results that support the claims of the main results in Sections I-A and I-B. This is followed, in Section II-B, by lemmas, propositions and corollaries and their proofs.

A. Discussion on main results

Theorem 1.6 gives a bound on the probability that the cardinality of a union of k sets each with d elements, i.e. |Ak|, is less than ak. Fig. 2 shows plots of values of |Ak| (the size of the set of neighbors) for different k (in blue); superimposed on these plots are the mean value of |Ak| (in red), both taken over 500 realizations, and the âk in green. Similarly, Fig. 3 shows values of |Ak|/k (the graph expansion), also taken over 500 realizations.

Theorem 1.6 also claims that the âs are the expected values of the cardinalities of the union of s sets. We give a brief sketch of its proof in Section II-B in terms of maximum likelihood, and we illustrate the accuracy of the result empirically in Fig. 4, where we show the relative error between âk and the mean values of the ak, denoted āk, realized over 500 runs, to be less than 10^-3.
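The empirical comparison behind Figs. 2 and 4 is straightforward to reproduce in outline. The following sketch (Python with NumPy; problem sizes are ours and smaller than in the figures) estimates the mean of |Ak| over repeated draws and compares it with âk from (4).

```python
import numpy as np

def a_hat(k, d, n):
    """Expected |A_k| from the recursion (4), for k a power of two."""
    a = float(d)
    while k > 1:
        a = a * (2 - a / n)
        k //= 2
    return a

rng = np.random.default_rng(1)
n, d, k, trials = 1024, 8, 64, 500
sizes = []
for _ in range(trials):
    # supports of k independent columns, each with d nonzero rows drawn without replacement
    rows = np.concatenate([rng.choice(n, size=d, replace=False) for _ in range(k)])
    sizes.append(np.unique(rows).size)          # |A_k|: rows hit at least once
print(np.mean(sizes), a_hat(k, d, n))           # empirical mean versus the a_hat of (4)
```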

Fig. 4. For fixed d = 8 and n = 2^10, over 500 realizations, the relative error between the mean values of ak (referred to as āk) and the âk from Equation (4) of Theorem 1.6.

Fig. 2. For fixed d = 8 and n = 2^10, over 500 realizations we plot (in blue) the cardinalities of the index sets of nonzeros for a given number of set sizes, k. The dotted red curve is the mean of the simulations and the green squares are the âk.

Fig. 5. Values of ai as a function of ǫ ∈ [0, 1) for ak := (1 − ǫ)âk with d = 8, k = 2 × 10^3 and n = 2^20. For this choice of d, k, n there are twelve levels of dyadic splits resulting in ai for i = 2^j, j = 0, . . . , ⌈log2 k⌉ = 12. The highest curve corresponds to ai for i = 2^12, the next highest curve corresponds to i = 2^11, and they continue in decreasing magnitude with decreasing subscript values.

Fig. 3. For fixed d = 8 and n = 2^10, over 500 realizations we plot (in blue) the graph expansion for a given input set size k. The dotted red curve is the mean of the simulations and the green squares are the âk/k.

Fig. 5 shows representative values of ai from (5) for ak := (1 − ǫ)âk as a function of ǫ for d = 8, k = 2 × 10^3, and n = 2^20. Each of the ai decreases smoothly towards d, but with the ai for smaller values of i varying less than those for larger values of i.

For fixed 0 < ǫ < 1/2 and for small but fixed d, ρexp(δ; d, ǫ) in Corollary 1.11 is a function of δ for each d and ǫ, and is a phase transition function in the (δ, ρ) plane. Below the curve of ρexp(δ; d, ǫ) the probability in (12) goes to one exponentially in n as the problem size grows. That is, if A

is drawn at random with d 1s or d −1s in each column, and has parameters (k, n, N) that fall below the curve of ρexp(δ; d, ǫ), then we say it is from the class of matrices in Definition 1.3 with probability approaching one exponentially in n. In terms of Γ(X) for X ⊂ U and |X| ≤ k, Corollary 1.11 says that the probability that |Γ(X)| ≥ (1 − ǫ)dk goes to one exponentially in n if the parameters of our graph lie in the region below ρexp(δ; d, ǫ). This implies that if we draw a random bipartite graph that has parameters in the region below the curve of ρexp(δ; d, ǫ), then with probability approaching one exponentially in n that graph is a lossless (k, d, ǫ)-expander.
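Verifying the expander property exactly requires checking every set X with |X| ≤ k, which is combinatorial, but the probability statement above can be probed by sampling. The sketch below (Python with NumPy; parameters are ours) repeatedly draws a fresh matrix from Definition 1.3 together with a random support X of size k and records how often |Γ(X)| ≥ (1 − ǫ)dk.

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, d, k, eps, trials = 256, 1024, 8, 16, 0.25, 2000
hits = 0
for _ in range(trials):
    # supports of the k columns indexed by X, drawn as in Definition 1.3
    X_cols = [rng.choice(n, size=d, replace=False) for _ in range(k)]
    gamma = np.unique(np.concatenate(X_cols)).size        # |Gamma(X)|
    hits += gamma >= (1 - eps) * d * k
print(hits / trials)   # empirical estimate of Prob(|Gamma(X)| >= (1 - eps) d k) for a random X
# Note: this samples one random X per draw; certifying a lossless (k, d, eps)-expander
# requires the bound for every X with |X| <= k, which the union bound over sets handles.
```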

Fig. 7. Phase transition plots of ρexp(δ; d, ǫ) for fixed ǫ = 1/6 and n = 2^10 with d varied (d = 8, 12, 16, 20).
Fig. 6. Phase transition plots of ρexp(δ; d, ǫ) for fixed d = 8 and ǫ = 1/4 with n varied (n = 256, 1024, 4096).

Fig. 8. Phase transition plots of ρexp(δ; d, ǫ) for fixed d = 8 and n = 2^10 with ǫ varied (ǫ = 0.0625, 0.16667, 0.25).

Fig. 6 shows a plot of what ρexp(δ; d, ǫ) converges to for different values of n with ǫ and d fixed; Fig. 7 shows a plot of what ρexp(δ; d, ǫ) converges to for different values of d with ǫ and n fixed; while Fig. 8 shows plots of what ρexp(δ; d, ǫ) converges to for different values of ǫ with n and d fixed. It is interesting to note how increasing d increases the phase transition up to a point, after which it decreases the phase transition. Essentially, beyond d = 16 there is no gain in increasing d. This vindicates the use of small d in most of the numerical simulations involving the class of matrices considered here. Note the vanishing sparsity as the problem size (k, n, N) grows while d is fixed to a small value of 8. In their GPU implementation [8], Blanchard and Tanner observed that SSE with d = 7 has a phase transition for numerous sparse approximation algorithms that is consistent with dense Gaussian matrices, but with dramatically faster implementation.

As afore-stated, Corollary 1.11 follows from Theorem 1.6; alternatively Corollary 1.11 can be arrived at based on probabilistic constructions of expander graphs given by Proposition 2.1 below. This proposition and its proof can be traced back to Pinsker in [25], but for more recent proofs see [26], [10].

Proposition 2.1: For any N/2 ≥ k ≥ 1 and ǫ > 0 there exists a lossless (k, d, ǫ)-expander with d = O(log(N/k)/ǫ) and n = O(k log(N/k)/ǫ²).

To put our results in perspective, we compare them to the alternative construction in [26] which led to Corollary 2.2, whose proof is given in Section V-A of the Appendix. Fig. 9 compares the phase transitions resulting from our construction to that presented in [26]; we must point out, however, that the proof in [26] was not aimed at a tight bound.

Corollary 2.2: Consider a bipartite graph G = (U, V, E) with left vertices |U| = N, right vertices |V| = n and left degree d. Fix 0 < ǫ < 1/2 and d; as (k, n, N) → ∞ while k/n → ρ ∈ (0, 1) and n/N → δ ∈ (0, 1), then for ρ < (1 − γ)ρexp_bi(δ; d, ǫ) and γ > 0

Prob(G fails to be an expander) → 0   (14)

exponentially in n, where ρexp_bi(δ; d, ǫ) is the largest limiting value of k/n for which

Ψ(k, n, N; d, ǫ) = 0   (15)

with Ψ(k, n, N; d, ǫ) = H(k/N) + (dk/N) H(ǫ) + (ǫdk/N) log(dk/n).
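Under the reading of (15) above, the curve ρexp_bi(δ; d, ǫ) can be traced numerically. The sketch below (Python with NumPy; a bisection of our own, not the authors' code) finds, for a given δ, d and ǫ, the largest ρ = k/n in (0, 1/d) at which the exponent changes sign.

```python
import numpy as np

def shannon(p):
    """Base-e Shannon entropy H(p)."""
    return -p * np.log(p) - (1 - p) * np.log(1 - p)

def psi_bi(rho, delta, d, eps):
    """Exponent of (15), written in terms of rho = k/n and delta = n/N."""
    kN = rho * delta                          # k/N
    return shannon(kN) + d * kN * shannon(eps) + eps * d * kN * np.log(d * rho)

def rho_bi(delta, d, eps, iters=80):
    """Root of psi_bi in (0, 1/d), located by bisection; assumes a single sign
    change on this interval (true for the example parameters below)."""
    lo, hi = 1e-12, 1.0 / d - 1e-12
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if psi_bi(mid, delta, d, eps) < 0 else (lo, mid)
    return lo

print(rho_bi(delta=0.5, d=8, eps=0.25))   # one point on the curve of Corollary 2.2
```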

Fig. 9. A comparison of ρexp in Theorem 1.6 to ρexp_bi of Corollary 2.2, derived using the construction based on Corollary 2.2.

Fig. 10. The binary splitting of the support set S that indexes the columns of AS, resulting in a regular binary tree. Here we show the set of neighbors, As, and the resulting sets from the splitting. Each child has either the ceiling or the floor of its parent's number of columns. The leaves of the tree, which are at level ⌈log2 s⌉ − 1 of the tree, have sets that are composed of the union of nonzero elements in at most two columns, A2.
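The splitting shown in Fig. 10 is easy to state programmatically. The sketch below (Python; written by us for illustration) performs the dyadic splitting of a support of size s and checks the bookkeeping of the forthcoming Lemma 2.5: at level j there are qj sets of size Qj = ⌈s/2^j⌉ (here in the form qj = s − 2^j(Qj − 1)) and rj = 2^j − qj sets of size Qj − 1, and the sizes at each level sum to s.

```python
import math

def dyadic_split(sizes):
    """One level of the dyadic splitting: each set of size m becomes
    two children of sizes ceil(m/2) and floor(m/2)."""
    out = []
    for m in sizes:
        out += [math.ceil(m / 2), m // 2]
    return out

s = 37                                     # any support size, power of two or not
sizes = [s]
for j in range(1, math.ceil(math.log2(s))):
    sizes = dyadic_split(sizes)
    Qj = math.ceil(s / 2 ** j)
    qj = s - 2 ** j * (Qj - 1)             # count of ceiling-sized sets predicted by Lemma 2.5
    rj = 2 ** j - qj
    assert sorted(sizes, reverse=True) == [Qj] * qj + [Qj - 1] * rj
    assert qj * Qj + rj * (Qj - 1) == s    # the sizes at level j sum to s
print("Lemma 2.5 bookkeeping verified for s =", s)
```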

B. Key Lemmas Proof: Given B ,B [n] where B = b and 1 2 ⊂ | 1| 1 The following set of lemmas, propositions and corollaries B2 = b2 are drawn uniformly at random, independent of each form the building blocks of the proof of our main results to |other,| we calculate Prob ( B B = z) where z = b +b b. | 1 ∩ 2| 1 2− be presented in Section IV. Without loss of generality consider drawing B1 first, then For one fixed set of columns of A, denoted AS, the the probability that the draw of B2 intersecting B1 will have probability in (1) can be understood as the cardinality of the cardinality z, i.e. Prob ( B B = z), is the size of the event | 1 ∩ 2| unions of nonzeros in the columns of AS. Our analysis of this of drawing B2 intersecting B1 by z divided by the size of probability follows from a nested unions of subsets using a the sample space of drawing B2 from [n], which are given b n b n dyadic splitting technique. Given a starting set of columns we by 1 − 1 and respectively. Rewriting the division z · b2 z b2 recursively split the number of columns from this set, and the as a product− with the divisor raised to a negative power and    resulting sets, into two sets composed of the ceiling and floor replacing z by b + b b gives (16). 1 2 − of half of the number of columns of the set we split. In other Corollary 2.4: If two sets, B ,B [n] are drawn uni- 1 2 ⊂ words, given a starting support set S (referred to as the parent formly at random, independent of each other, and B = B1 B2 ∪ set), we split it into two disjoint sets of size s/2 and s/2 ⌈ ⌉ ⌊ ⌋ Prob ( B = b)= P (b,b ,b ) (referred to as children). Then we union the nonzero elements | | n 1 2 × in the columns indexed by the children sets to get A s/2 and Prob ( B1 = b1) Prob ( B2 = b2) (17) ⌈ ⌉ | | · | | A s/2 . We continue this process until at a level when the ⌊ ⌋ Proof: Prob ( B = b)= Prob ( B B = b) by defini- cardinalities in each child set is at most two. Resulting from 1 2 tion. As a consequence| | of the inclusion-exclusion| ∪ | principle this type of splitting is a regular binary tree where the size of each child is either the ceiling or the floor of the size of it’s Prob ( B1 B2 = b)= Prob ( B1 B2 = b1 + b2 b) parent set. The root of the binary is our starting set S or A | ∪ | | ∩ | − s Prob ( B = b ) Prob ( B = b ) . (18) which we refer to as level 0. Then, the splitting, as described × | 1| 1 · | 2| 2 above proceeds till level log 1, see Fig. 10. We use Lemma 2.3 to replace Prob ( B B = b + b b) ⌈ 2⌉− | 1 ∩ 2| 1 2 − The probability of interest becomes a product of the prob- in (18) by Pn (b,b1,b2) leading to the required result. abilities involving all the children from the dyadic splitting In the binary tree resulting from our dyadic splitting scheme of the original set S, the index of the union of the nonzero the number of columns in the two children of a parent elements forming As. node is the ceiling and the floor of half of the number of The computation of the probability in (1) involves the com- columns of the parent node. At each level of the split the putation of the probability of the cardinality of the intersection number of columns of the children of that level differ by of two sets. This probability is given by Lemma 2.3 and one. The enumeration of these two quantities at each level Corollary 2.4 below. of the splitting process is necessary in the computation of the Lemma 2.3: Let B, B , B [n] where B = b , B = probability of (1). We state and prove what we refer to a dyadic 1 2 ⊂ | 1| 1 | 2| b2, B = B1 B2 and B = b. 
Also let B1 and B2 be drawn splitting lemma, Lemma 2.5, which we later use to enumerate uniformly at∪ random,| independent| of each other, and define these two quantities - the sizes (number of columns) of the P (b,b ,b ) := Prob ( B B = b + b b), then children and the number of children with a given size at each n 1 2 | 1 ∩ 2| 1 2 − 1 level of the split. b1 n b1 n − Pn (b,b1,b2)= − . (16) Lemma 2.5: Let S be an index set of cardinality s. For any b1 + b2 b b b1 b2 level j of the dyadic splitting, j = 0,..., log s 1, the  −  −   ⌈ 2 ⌉− 7

set S is decomposed into disjoint sets each having cardinality Equating this value of Qj to its defined value in the statement s Qj = 2j or Rj = Qj 1. Let qj sets have cardinality Qj of the lemma gives and r sets have cardinality− R , then j  j j s j j s qj s j s j q = s 2 +2 , and r =2 q . (19) − +1= qj = s 2 +2 . (35) j − · 2j j − j 2j 2j ⇒ − · 2j l m l m l m Proof: At every node on the binary tree the children have either of two sizes (number of columns) of the floor and ceiling Therefore, from (33) we use (35) to have of half the sizes of their parents and these sizes differ at most by 1, that is at level j of the splitting we have at most 2 j j s rj =2 qj rj =2 s, (36) different sizes. We define these sizes, Qj and Rj , in terms of − ⇒ · 2j − two arbitrary integers, m1 and m2, as follows. l m s m s m 1 and 2 (20) which concludes the proof. Qj = j + j Rj = j + j . 2 2 2 2 The bound in (1) is derived using a large deviation analysis Because of the nature of our splitting scheme we have Rj = of the nested probabilities which follow from the dyadic split- Q 1 which implies that m and m must satisfy the relation ting in Corollary 2.4. The large deviation analysis of (16) at j − 1 2 each stage involves its large deviation exponent ψn( ), which m1 m2 − =1. (21) follows from Stirling’s inequality bounds on the combinato· rial 2j product of (16). Lemma 2.6 establishes a few properties of Now let q and r be the number of children with Q and R j j j j ψn( ) while Lemma 2.7 shows how the various ψn( )’s at a number of columns respectively. Therefore, given· dyadic splitting level can be combined into a relative· ly j simple expression. qj + rj =2 . (22) Lemma 2.6: Define At each level j of the splitting the following condition must be satisfied x z x y qj Qj + rj Rj = s. (23) ψ (x,y,z) := y H − + (n y) H − · · n · y − · n y     To find m , m , q and r , from (20) we substitute for Q z− 1 2 j j j n H , (37) and Rj in (23) to have − · n   s m s m q + 1 + r + 2 = s, (24) j · 2j 2j j · 2j 2j then for n>x>y we have that j j j j 2− qj s +2 − qj m1 +2 − rj s+2− rj m2 = s, (25) j j 2− (qj + rj ) s +2− (qj m1 + rj m2)= s, (26) for y>z ψn(x,y,y) ψn(x,y,z) ψn(x,z,z); (38) j ≤ ≤ s +2− (qj m1 + rj m2)= s, (27) for x>z ψn(x,y,y) > ψn(z,y,y); (39) qj m1 + rj m2 =0. (28) for 1/2 < α 1 ψ (x,y,y) < ψ (αx, αy, αy). (40) ≤ n n We expanded the brackets from (24) to (25) and simplified from (25) to (26). We simplify the first term of (26) using Proof: We start with Property (38) and first show that the (22) to get (27) and we simplified this to get (28). left inequality holds. If we substitute y for z in (37) with y>z Equation (21) yields we reduce the first and last terms of (37) while we increase the j middle term of (37) which makes ψn(x,y,y) ψn(x,y,z). m1 = m2 +2 . (29) For second inequality we replace y by z in (37)≤ with y>z Substituting this in (28) yields we increase the first and the last terms of (37) and reduce the middle term which makes ψ (x,y,z) ψ (x,z,z). This j n n qj m2 +2 + rj m2 =0, (30) concludes the proof for (38). ≤ (q + r ) m +2jq =0, (31) j j 2 j Property (39) states that for fixed y, ψn(x,y,y) is mono- j 2 (qj + m2)=0. (32) tonically increasing in its first argument. To prove (39) we use the condition n>x>y to ensure that H(p) increases From (30) to (31) we expanded the brackets and rearranged monotonically with p, which implies that the first and last the terms and used (22) to simplify to (32). 
Using (32) and terms of (37) increase with x for fixed y while the second (29) respectively we have term remains constant. j m2 = qj and m1 =2 qj = rj . (33) Property (40) means that ψn(x,y,y) is monotonically de- − − creasing in x and y. For the proof we show that for 1/2 < α Substituting this in (20) we have 1 the difference ψ (αx, αy, αy) ψ (x,y,y) > 0. Using (37)≤ n − n s q s q we write out clearly what the difference, ψn(αx, αy, αy) Q = − j +1 and R = − j . (34) − j 2j j 2j ψn(x,y,y), is as follows. 8

is equal to the following if we replace qj and rj by their values given in Lemma 2.5. αx αy αx αy αy αyH − + (n αy)H − nH αy − n αy − n    −    j s j Q Q (46) x y x y y s 2 j +2 ψn aQj ,a j ,a j yH − (n y)H − + nH (41) − 2 · 2 2 − y − − n y n  l m       −    j s     x y αx αy αx αy + 2 j s ψn aRj ,a Rj ,a Rj = αyH − + nH − αyH − 2 − · 2 2 y n αy − n αy  l m       −   −  j s j   Q  Q (47) αy x y x y < s 2 j +2 ψn aQj ,a j ,a j nH yH − nH − − 2 · 2 2 − n − y − n y  l m         −  j s     x y y + 2 j s ψn aRj ,a Rj ,a Rj . + yH − + nH (42) 2 − · 2 2 n y n  l m     −    j s j   R  R (48) x y αx αy x y < s 2 j +2 ψn aQj ,a j ,a j = αyH − αyH − yH − − 2 · 2 2 y − n αy − y  l m       −    j s     x y y αy + 2 j s ψn aRj ,a Rj ,a Rj . + yH − + nH nH 2 − · 2 2 n y n − n  l m     −      j s j   R  R (49) αx αy x y < s 2 j +2 ψn aQj ,a j ,a j + nH − nH − (43) − 2 · 2 2 n αy − n y  l m     −   −  j s     + 2 j s ψn aQj ,a Rj ,a Rj . From (41) to (42) we expanded brackets and simplified, while 2 − · 2 2 from (42) to (43) we rearranged the terms for easy comparison.  l m    j     Again n>x>y ensures that the arguments of H( ) are =2 ψn aQj ,a Rj ,a Rj . (50) · 2 2 strictly less than half and H(p) increases monotonically· with       p. In (43) the difference of the first two terms in the first row From (46) to (47) we upper bounded is positive while the difference of the second two terms is negative. However, the whole sum of the first four terms is ψn aQj ,a Qj ,a Qj by ψn aQj ,a Qj ,a Qj 2 2 2 2 negative but very close to zero when α is close to one which     is the regime that we will be considering. The difference of and ψn aR j ,a Rj ,a Rj by ψn aRj ,a Rj ,a Rj  2 2 2 2 the last two terms in the second row is positive while the using (38) of Lemma 2.6. We then upper bounded difference of the terms on bottom row is negative but due to         ψn aQj ,a Qj ,a Qj by ψn aQj ,a Rj ,a Rj , the concavity and steepness of the Shannon entropy function 2 2 2 2 the first positive difference is larger hence the sum of last four from (47) to (48), again using (38) of Lemma 2.6. From         terms is positive. Since we can write n = cy with c> 1 being (48) to (49), using (39) of Lemma 2.6, we bounded an arbitrarily constant, then the positive sum in the second ψn aRj ,a Rj ,a Rj by ψn aQj ,a Rj ,a Rj . four terms dominates the negative sum in the first four terms. 2 2 2 2 For the final step from (49) to (50) we factored out This gives the required results and hence concludes this proof         and the proof of Lemma 2.6. ψn aQj ,a Rj ,a Rj and then simplified. Lemma 2.7: Given ψ ( ) as defined in (37) then the fol- 2 2 n   log (s) 1 · Using q + r = 2⌈ 2 ⌉− we bound lowing bound holds. log 2(s) 1  log2(s) 1 ⌈ ⌉−log (s) ⌈ 1 ⌉− q log (s) 1 by 2⌈ 2 ⌉− . Then we add this to the summa- ⌈ 2 ⌉− log2(s) 2 tion of (49) for j =0,..., log (s) 2 establishing the bound ⌈ ⌉− ⌈ 2 ⌉− qj ψn aQj ,a Qj ,a Qj + of Lemma 2.7. · 2 2 j=0    X Now we state and prove a lemma about the quantities ai.     During the proof we will make a statement about the a R R i rj ψn aRj ,a j ,a j + q log2(s) 1 ψn (a2, d, d) · 2 2 ⌈ ⌉− · using their expected values which follows from a maximum   aˆi log (s) 1 ⌈ 2 ⌉− likelihood analogy.  j    2 ψn aQj ,a Rj ,a Rj , (44) Lemma 2.8: The problem ≤ · 2 2 j=0 X       s/2 where R . ⌈ ⌉ a ⌈log2(s)⌉−1 = d s 2 max ψn (a2i,ai,ai) (51) Proof: The quantity inside the left hand side summation as,...,a2 2i · i=1 in (44), i.e. X has a global maximum and the maximum occurs at the

qj ψn aQj ,a Qj ,a Qj expected values of the ai, ai given by · 2 2       ai + rj ψn aR ,a Rj ,a Rj , (45) b j a2i = ai 2 for i =1, 2, 4,..., s/2 , (52) · 2 2 − n ⌈ ⌉         b b b 9

which are a solution of the following polynomial system. If we restrict as to take a fixed value, then Ψn (as,...,a2, d) is given by 2 ∇ a s/2 2na s/2 + nas =0, ⌈ ⌉ − ⌈ ⌉ T 3 2 2 2 a2i 2aia2i +2ai a2i ai a4i =0, e s a2i (a4i a2i)(2ai a2i) − − log − − 2 for i =1, 2,..., s/4 , (53) 2i · " (2a2i a4i) (a2i ai) #! ⌈ ⌉ − − for i =1, 2, 4,..., s/4 . (58) where a1 = d. If as is constrained to be less than aˆs, then ⌈ ⌉ there is a different global maximum, instead the ai satisfy the Obtaining the critical points by solving Ψn (as,...,a2, d)= following system 0 leads to the polynomial system (54).∇ 2 Given as, the Hessian, Ψn (as,...,ae 2, d) at these op- 3 2 2 2 ∇ a 2aia +2a a2i a a4i =0, timal a which are the solutions to the polynomial system 2i − 2i i − i i for i =1, 2, 4,..., s/4 , (54) (54) is negative definite whiche implies that this unique critical ⌈ ⌉ point is a global maximum; this case differs from a maximum again with a1 = d. likelihood estimation because of the extra constraint of fixing Proof: Define as. The dyadic splitting technique we employ requires greater s/2 ⌈ ⌉ s care of the polynomial term in the large deviation bound of Ψn (as,...,a2, d) := ψn (a2i,ai,ai) . (55) 2i · Pn (x,y,z) in (16); Lemma 2.10 establishes the polynomial i=1 X term. e Definition 2.9: P defined in (16) satisfies the up- Using the definition of ψn( ) in (37) we therefore have n (x,y,z) · per bound s/2 ⌈ ⌉ s a a Pn (x,y,z) π (x,y,z) exp(ψn(x,y,z)) (59) Ψ (a ,...,a , d)= a H 2i − i + ≤ n s 2 2i · i · a i=1 i with bounds of π (x,y,z) given in Lemma 2.10. X    e a2i ai ai Lemma 2.10: For π (x,y,z) and Pn (x,y,z) given by (59) (n ai) H − n H . (56) and (16) respectively, if , is − · n ai − · n y,z

The gradient of Ψn (as,...,a2, d) , Ψn (as,...,a2, d) is 4 1 ∇ 5 yz(n y)(n z) 2 given by − − , (60) 4 2πn(y + z x)(x y)(x z)(n x) e e    − − − −  2a s/2 as (n as) otherwise π (x,y,z) has the following cases. log ⌈ ⌉ − − , 2 3 1 " as a s/ 2 # 5 y(n z) 2 − ⌈ ⌉ − if x = y>z; (61) T 4 n(y z) s a2i (a4i a2i)(2ai a2i)    −  log − − 3 1 2i · (2a a ) (a a )2 5 (n y)(n z) 2 " 2i 4i 2i i #! − − if x = y + z; (62) − − 4 n(n y z) for i =1, 2, 4,..., s/4 , (57)    − −  ⌈ ⌉ 2 1 5 2πz(n z) 2 T − if x = y = z. (63) where v is the transpose of the vector v. Obtaining the 4 n critical points by solving Ψn (as,...,a2, d)=0 leads to     the polynomial system (53).∇ Proof: The Stirling’s inequality [27] below would be used 2 in this proof and other proofs to follow. The Hessian, Ψn (as,...,ae 2, d) at these optimal ai which are the solutions∇ to the polynomial system (53) is 16 1 NH(p) N (2πp(1 p)N)− 2 e negative definite whiche implies that this unique critical point 25 − ≤ Np is a global maximum point. Let the solution of the system   5 1 NH(p) be the aˆ then they satisfy a recurrence formula (52) which (2πp(1 p)N)− 2 e . (64) i ≤ 4 − is equivalent to their expected values as explained in the paragraph that follows. From Definition 2.9 the quantity π (x,y,z) is the polyno- mial portion of the large deviation upper bound. Within this We estimate the uniformly distributed parameter relating a 2i proof we express this by to ai. The best estimator of this parameter is the maximum 1 likelihood estimator which we calculate from the maximum y n y n − log-likelihood estimator (MLE). The summation of the ψ ( ) π (x,y,z)= poly − . (65) n · " y + z x x y z # is the logarithm of the join density functions for the a2i.  −  −   The MLE is obtained by maximizing this summation and We derive the upper bound π (x,y,z) using the Stirling’s it corresponds to the expected log-likelihood. Therefore, the inequality. The right inequality of (64) is used to upper bound y n y parameters given implicitly by (52) are the expected log- y+z x and x−y and the left inequality of (64) is used to likelihood which implies that the values of the aˆ in (52) are lower− bound n−. If y,z

If x = y>z (60) is undefined; however, substituting y for The convergence analysis of nearly all of these algorithms y y n y n y x in (65) gives y+z x = z and x−y = −0 = 1. We rely heavily on restricted isometry constants (RIC). As we saw − 1 − upper bound the product y n − using the right inequality earlier RICs measures how near isometry A is when applied to  z z   y k-sparse vectors in some norm. For the ℓ1 norm, also known in (64) to bound z from above and the left inequality in n  as the Manhattan norm, RIC1 is stated in (8). The restricted (64) to bound from below z . The resulting polynomial part of the product simplifies  to (61). Euclidian norm isometry, introduced by Cand`es in [32], is y y n y If x = y + z, then = = 1 and − = denoted by RIC2 and is defined in Definition 3.1. y+z x 0 x y N n y . As above, we upper bound− the product of n −y and Definition 3.1 (RIC2): Define χ to be the set of all −z   x−y  n 1 − k sparse vectors and draw an n N matrix A, then for all x − using (64) and simplify the polynomial part of this −N × ∈ z   χ , A has RIC2, with lower and upper RIC2, L(k,n,N; A) product to get (62). If instead , then y y x = y = z y+z x = 0 and U(k,n,N; A) respectively, when the following holds.  n y n y − and x−y = −0 both of which equal 1. Therefore the − 1   n − (1 L(k,n,N; A)) x 2 Ax 2 bound only involves  z which we bound using (64) and − k k ≤k k the resulting polynomial part simplifies to (63). (1 + U(k,n,N; A)) x . ≤ k k2 Corollary 2.11: If n>  2y, then π(y,y,y) is monotonically The computation of RIC for adjacency matrices of lossless increasing in y. 1 (k,d,ǫ)-expander graphs is equivalent to calculating ǫ. The Proof: If n > 2y, (63) implies that π(y,y,y) is propor- computation of RIC is intractable except for trivially small tional to y, i.e. π(y,y,y) = c y, with c > 0 and c y is 2 √ √ √ problem sizes (k,n,N) because it involves doing a combinato- monotonic in y. N rial search over all k column submatrices of A. As a results attempts have been made to derive RIC bounds. Some of III. RESTRICTED ISOMETRY CONSTANTS AND 2 these attempts have been successful in deriving RIC bounds COMPRESSED SENSING ALGORITHMS 2 for the Gaussian ensemble and these bounds have evolved from Here we introduce RIC2 and briefly discuss the implications the first by Cand`es and Tao in [19], improved by Blanchard, of RIC1 and RIC2 to compressed sensing algorithms in Sec- Cartis and Tanner in [33] and further improved by Bah and tion III-A. In Section III-B we present the first ever quantitative Tanner in [34]. comparison of the performance guarantees of some of the RIC2 bounds have been used to derive sampling theorems compressed sensing algorithms proposed for sparse matrices for compressed sensing algorithms - ℓ1-minimization and the as stated in Definition 1.3. greedy algorithms for dense matrices, NIHT, CoSAMP, and SP. Using the phase transition framework with RIC2 bounds A. Restricted isometry constants Blanchard et. al. compared performance of these algorithms It is possible to include noise in the Compressed Sensing in [13]. In a similar vain, as another key contribution of this model, for instance y = Ax + e where e is a noise vector paper we provide sampling theorems for ℓ1-minimization and capturing the model misfit or the non-sparsity of the signal x. combinatorial greedy algorithms, EMP, SMP, SSMP, LDDSR The ℓ0-minimization problem (7) in the noise case setting is and ER, proposed for SE and SSE matrices.
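For orientation, ℓ1-minimization with an SE matrix can be carried out with a generic linear programming solver via the standard positive/negative split x = x⁺ − x⁻. The sketch below (Python with NumPy/SciPy; sizes and seed are ours, and this is not the solver used in the paper) recovers a k-sparse vector from y = Ax.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(3)
n, N, d, k = 128, 512, 8, 10

# SE matrix: d ones per column, support drawn uniformly at random (Definition 1.3)
A = np.zeros((n, N))
for j in range(N):
    A[rng.choice(n, size=d, replace=False), j] = 1.0

x = np.zeros(N)                                  # a k-sparse test signal
x[rng.choice(N, size=k, replace=False)] = rng.standard_normal(k)
y = A @ x

# min ||x||_1 subject to Ax = y, with x = xp - xm and xp, xm >= 0
c = np.ones(2 * N)
A_eq = np.hstack([A, -A])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
x_hat = res.x[:N] - res.x[N:]
print(np.max(np.abs(x_hat - x)))                 # near zero when recovery succeeds
```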

min x 0 subject to Ax y 2 < e 2, (66) x χN k k k − k k k B. Algorithms and their performance guarantees ∈ where e 2 is the magnitude of the noise. Theoretical guarantees have been given for ℓ1 recovery Problemsk k (7) and (66) are in general NP-hard and hence and other greedy algorithms including EMP, SMP, SSMP, intractable. To benefit from the rich literature of algorithms LDDSR and ER designed to do compressed sensing recovery available in both convex and non-convex optimization the with adjacency matrices of lossless expander graphs and by ℓ0-minimization problem is relaxed to an ℓp-minimization extension SSE matrices. Sparse matrices have been observed one for 0 < p 1. It is well known that the ℓ norm to have recovery properties comparable to dense matrices ≤ p for 0 < p 1 are sparsifying norms, see [19], [21]. In for ℓ1-minimization and some of the aforesaid algorithms, addition, there≤ are specifically designed classes of algorithms see [5], [6], [23], [4], [24] and the references therein. Base that take on the ℓ0 problem and they have been referred to on theoretical guarantees, we derived sampling theorems and as greedy algorithms. When using dense sensing matrices, A, present here phase transition curves which are plots of phase popular greedy algorithms include Normalized Iterative Hard transition functions ρalg(δ; d, ǫ) of algorithms such that for Thresholding (NIHT), [28], Compressive Sampling Matching k/n ρ < (1 γ)ρalg(δ; d, ǫ), γ> 0, a given algorithm is Pursuits (CoSAMP), [22], and Subspace Pursuit (SP), [20]. guaranteed→ to recover− all k-sparse signals with overwhelming When A is sparse and non-mean zero, a different set of probability approaching one exponentially in n. combinatorial greedy algorithms have been proposed which 1) ℓ1-minimization: Note that ℓ1-minimization is not an iteratively locates and eliminate large (in magnitude) com- algorithm per se, but can be solved using Linear Program- ponents of the vector, [5]. They include Expander Matching ming (LP) algorithms. Berinde et. al. showed in [5] that ℓ1- Pursuit (EMP), [29], Sparse Matching Pursuit (SMP), [30], minimization can be used to perform signal recovery with bi- Sequential Sparse Matching Pursuit (SSMP), [31], Left Degree nary matrices coming from expander graphs. We reproduce the Dependent Signal Recovery (LDDSR), [24], and Expander formal statement of this guarantee in the following theorem, Recovery (ER), [23], [7]. the proof of which can be found in [5], [6]. 11

Algorithm 1 Sequential Sparse Matching Pursuit (SSMP) [31] Theorem 3.2 (Theorem 3, [5], Theorem 1, [6]): Let A be Input: A, y, η an adjacency matrix of a lossless (k,d,ǫ)-expander graph with Output: k-sparse approximation xˆ of the target signal x α(ǫ)=2ǫ/(1 2ǫ) < 1/2. Given any two vectors x, xˆ such Initialization: − 1. Set j = 0 that Ax = Axˆ, and xˆ 1 x 1, let xk be the largest (in || || ≤ || || 2. Set xj = 0 magnitude) coefficients of x, then Iteration: Repeat T = O (log (kxk1/η)) times 1. Set j = j + 1 2 2. Repeat (c − 1)k times x xˆ x x . (67) || − ||1 ≤ 1 2α(ǫ) || − k||1 a) Find a coordinate i & an increment z that − minimizes kA (xj + zei) − yk1 b) Set xj to xj + zei The condition that α(ǫ)=2ǫ/(1 2ǫ) < 1/2 implies the 3. Set xj = Hk (xj ) sampling theorem stated as Corollary− 3.3, that when satisfied Return xˆ = xT ensures a positive upper bound in (67). The resulting sampling theorem is given by ρℓ1 (δ; d, ǫ) using ǫ =1/6 from Corollary 3.3. Given a vector y = Ax + e, the algorithm returns approxima- Corollary 3.3 ([5]): ℓ1-minimization is guaranteed to re- tion vector xˆ satisfying cover any k-sparse vector from its linear measurement by an 1 4ǫ 6 x xˆ − x x + e , (68) adjacency matrix of a lossless (k,d,ǫ)-expander graph with || − ||1 ≤ 1 16ǫ|| − k||1 (1 16ǫ)d|| ||1 ǫ< 1/6. − − Proof: Setting the denominator of the fraction in the right where xk is the k largest (in magnitude) coordinates of x. hand side of (67) to be greater than zero gives the required Corollary 3.5 ([29]): SSMP, EMP, and SMP are all guar- results. anteed to recover any k-sparse vector from its linear measure- ment by an adjacency matrix of a lossless (k,d,ǫ)-expander 2) Sequential Sparse Matching Pursuit (SSMP): Introduced graph with ǫ< 1/16. by Indyk and Ruzic in [31], SSMP has evolved as an im- 3) Expander Recovery (ER): Introduced by Jafarpour et. provement of Sparse Matching Pursuit (SMP) which was an al. in [23], [7], ER is an improvement on an earlier algorithm improvement on Expander Matching Pursuit (EMP). EMP also introduced by Xu and Hassibi in [24] known as Left Degree introduced by Indyk and Ruzic in [29] uses a voting-like mech- Dependent Signal Recovery (LDDSR). The improvement was anism to identify and eliminate large (in magnitude) compo- mainly on the number of iterations used by the algorithms nents of signal. EMP’s drawback is that the empirical number and the type of expanders used, from (k, d, 1/4)-expanders of measurements it requires to achieve correct recovery is for LDDSR to (k,d,ǫ)-expander for any ǫ < 1/4 for ER. suboptimal. SMP, introduced by Berinde, Indyk and Ruzic in Both algorithms use this concept of a gap defined below. [30], improved on the drawback of EMP. However, it’s original Definition 3.6 (gap, [24], [23], [7]): Let be the original version had convergence problems when the input parameters x signal and y = Ax. Furthermore, let xˆ be our estimate for x. (k and n) fall outside the theoretically guaranteed region. For each value we define a gap as: This is fixed by the SMP package which forces convergence yi gi when the user provides an additional convergence parameter. N In order to correct the aforementioned problems of EMP and gi = yi Aij xˆj . (69) − j=1 SMP, Indyk and Ruzic developed SSMP. It is a version of X SMP where updates are done sequentially instead of parallel, Algorithm 2 below is a pseudo-code of the ER algorithm consequently convergence is automatically achieved. 
All three for an original k-sparse signal x RN and the measurements algorithms have the same theoretical recovery guarantees, y = Ax with an n N measurement∈ matrix A that is an which we state in Theorem 3.4, but SSMP has better empirical adjacency matrix of a× lossless (2k,d,ǫ)-expander and ǫ< 1/4. performances compared to it’s predecessors. The measurements are assumed to be without noise, so we aim Algorithm 1 below is a pseudo-code of the SSMP algorithm for exact recovery. The authors of [23], [7] have a modified based on the following problem setting. The measurement version of the algorithm for when x is almost k-sparse. matrix A is an n N adjacency matrix of a lossless ((c + × 1)k,d,ǫ/2)-expander scaled by d and A has a lower RIC1, Algorithm 2 Expander Recovery (ER) [23], [7] L ((c + 1)k,n,N)= ǫ. The measurement vector y = Ax + e Input: A, y Output: k-sparse approximation xˆ of the original signal x where e is a noise vector and η = e . We denote by H (y) k k1 k Initialization: the hard thresholding operator which sets to zero all but the 1. Set xˆ = 0 largest, in magnitude, k entries of y. Iteration: Repeat at most 2k times 1. if y = Axˆ then The recovery guarantees for SSMP (also for EMP and 2. return xˆ and exit SMP) are formalized by the following theorem from which 3. else we deduce the recovery condition (sampling theorem) in terms 4. Find a variable node xˆj such that at least (1 − 2ǫ)d of the measurements it participated in, have identical gap g of ǫ in Corollary 3.5. Based on Corollary 3.5 deduced from 5. Set xˆj =x ˆj + g, and go to 2. Theorem 3.4 we derived phase transition, ρSSMP (δ; d, ǫ), for 6. end if SSMP. Theorem 3.4 (Theorem 10, [29]): Let A be an adjacency Theorem 3.7 gives recovery guarantees for ER. Directly matrix of a lossless (k,d,ǫ)-expander graph with ǫ < 1/16. from this theorem we read-off the recovery condition in terms 12 of ǫ for Corollary 3.8, from which we derive phase transition 0.014 functions, ρER(δ; d, ǫ), for ER.
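Algorithm 2 translates almost line for line into code. The sketch below (Python with NumPy; our own rendering of the pseudo-code, with an explicit stop when no node qualifies) assumes A is an n × N 0/1 adjacency matrix with d ones per column and noiseless measurements y = Ax; the gap is that of Definition 3.6. With real-valued signals the equality test on gaps should be replaced by a tolerance.

```python
import numpy as np

def expander_recovery(A, y, k, d, eps):
    """Sketch of Algorithm 2 (ER): A is an n-by-N 0/1 adjacency matrix with d ones
    per column and y = A @ x for a k-sparse x, measured without noise."""
    N = A.shape[1]
    x_hat = np.zeros(N, dtype=float)
    for _ in range(2 * k):                        # "repeat at most 2k times"
        gap = y - A @ x_hat                       # g_i of Definition 3.6
        if not np.any(gap):
            return x_hat                          # y = A x_hat: exact recovery
        updated = False
        for j in range(N):
            rows = np.flatnonzero(A[:, j])        # the d measurements node j participates in
            nz = gap[rows][gap[rows] != 0]
            if nz.size == 0:
                continue
            vals, counts = np.unique(nz, return_counts=True)
            m = np.argmax(counts)
            if counts[m] >= (1 - 2 * eps) * d:    # identical gap on >= (1-2eps)d measurements
                x_hat[j] += vals[m]
                updated = True
                break
        if not updated:
            break          # no qualifying node: stop (should not occur for a
                           # lossless (2k, d, eps)-expander with eps < 1/4)
    return x_hat
```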

Theorem 3.7 (Theorem 6, [7]): Let A ∈ R^{n×N} be the adjacency matrix of a lossless (2k, d, ǫ)-expander graph, where ǫ < 1/4 and n = O(k log(N/k)). Then, for any k-sparse signal x, given y = Ax, ER recovers x successfully in at most 2k iterations.

Corollary 3.8: ER is guaranteed to recover any k-sparse vector from its linear measurement by an adjacency matrix of a lossless (k, d, ǫ)-expander graph with ǫ < 1/4.

Fig. 12. Phase transition plots of ℓ1-minimization: ρℓ1_G(δ) for Gaussian matrices derived using RIC2, and ρℓ1_E(δ; d, ǫ) for adjacency matrices of expander graphs with n = 1024, d = 8, and ǫ = 1/6.

IV. PROOF OF MAIN RESULTS
k/n 0.01 A. Proof of Theorem 1.6 1 2 0.008 By the dyadic splitting As = A s A s and therefore | | ⌈ 2 ⌉ ∪ ⌊ 2 ⌋ 0.006 1 2 Prob ( As as)= Prob A s A s as (70) 0.004 2 2 ρSSMP (δ; d, ǫ = 1/16) | |≤ ⌈ ⌉ ∪ ⌊ ⌋ ≤ l 0.002 ρ 1 (δ; d, ǫ = 1/6) 1 2  ER = Prob A s A s = l s (71) ρ (δ; d, ǫ = 1/4) ⌈ 2 ⌉ ∪ ⌊ 2 ⌋ 0 ls l⌈ s ⌉,l⌊ s ⌋ 2 2   0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 X X n/N 1 2 = Pn ls,l s ,l s ⌈ 2 ⌉ ⌊ 2 ⌋ × ls l1 l2 alg ⌈ s ⌉ ⌊ s ⌋ Fig. 11. Phase transition curves ρ (δ; d, ǫ) computed over finite values X X2 X2   of δ ∈ (0, 1) with d fixed and the different ǫ values for each algorithm - 1/4, 1 1 2 2 1/6 and 1/16 for ER, ℓ1 and SSMP respectively. Prob A s = l s Prob A s = l s . (72) ⌈ 2 ⌉ ⌈ 2 ⌉ ⌊ 2 ⌋ ⌊ 2 ⌋     From (70) to (71) we sum over all possible events while from (71) to (72), in line with the splitting technique, we simplify the probability to the product of the probabilities of 4) Comparisons of phase transitions of algorithms: Fig. 1 2 the cardinalities of s and s and their intersection. SSMP A A 11 compares the phase transition plot of ρ (δ; d, ǫ) for ⌈ 2 ⌉ ⌊ 2 ⌋ In a slight abuse of notation we write j to denote SSMP (also for EMP and SMP), the phase transition of plot l j=1,...,x ER applying the sum x times. Now we use Lemma 2.5 to simplify ρ (δ; d, ǫ) for ER (also of LDDSR) and the phase transition P ℓ1 (72) as follows. plot of ρ (δ; d, ǫ) for ℓ1-minimization. Remarkably, for ER and LDDSR recovery is guaranteed for a larger portion of j1 2j1 1 2j1 P l ,l Q − ,l Q the plane than is guaranteed by the theory for - n Q0 0 0 (δ, ρ) ℓ1 2 2 j j j ⌈ ⌉ ⌊ ⌋ l 1 l 2 l 3   minimization using sparse matrices; however, ℓ1-minimization XQ0 XQ1 XR1 has a larger recovery region than does SSMP, EMP, and SMP. j1=1,...,q0 j2=1,...,q1 j3=1,...,r1 q1 j2 j2 Fig. 12 shows a comparison of the phase transition of - Prob AQ = lQ ℓ1 × 1 1 j =1 minimization as presented by Blanchard et. al. in [13] for Y2   q1+r 1 dense Gaussian matrices based on RIC2 analysis and the Prob Aj3 = lj3 . (73) phase transition we derived here for the sparse binary matrices × R1 R1 j =q +1 coming from lossless expander based on RIC1 analysis. This 3 1   Y shows a remarkable difference between the two with sparse Let’s quickly verify that (73) is the same as (72). By Lemma matrices having better performance guarantees. However, these 2.5, Q0 = s is the number of columns in the set at the zeroth improved recovery guarantees are likely more due to the closer level of the split while q0 =1 is the number of sets with Q0 match of the method of analysis than to the efficacy of sparse columns at the zeroth level of the split. Thus for j1 = 1 the matrices over dense matrices. first summation and the P ( ) term are the same in the two n · 13 equations. If Q0 = Q0 , then they are both equal to Q and a polynomial, π( ), and an exponential with exponent ψ ( ). ⌈ 2 ⌉ ⌊ 2 ⌋ 1 · n · q =2 while r =0. If on the other hand Q0 = Q0 +1, 1 1 ⌈ 2 ⌉ ⌊ 2 ⌋ then q1 =1 and r1 =1. In either case we have the remaining j1 2j1 1 2j1 π l ,l Q − ,l Q Q0 0 0 2 2 part of the expression of (72) i.e. the second two summations j j j ⌈ ⌉ ⌊ ⌋ × l 1 l 2 l 3   and the product of the two Prob( ). XQ0 XQ1 XR1 · j1=1,...,q0 j2=1,...,q1 j3=1,...,r1 Now we proceed with the splitting - note (73) stopped only j1 2j1−1 2j1 ψn lQ ,l Q ,l Q at the first level. At the next level, the second, we will have 0 ⌈ 0 ⌉ ⌊ 0 ⌋ e 2 2 ! q2 sets with Q2 columns and r2 sets with R2 columns which · " j j l 4 l 5 leads to the following expression. XQ2 XR2 j4=1,...,q2 j5=1,...,r2

j2 2j2−1 2j2 j1 2j1 1 2j1 ψn l ,l ,l Q1 Q1 Q1 P l ,l Q − ,l Q j2 2j2 1 2j2 ⌈ ⌉ ⌊ ⌋ n Q0 0 0 2 2 ! 2 2 π l ,l Q − ,l Q e j j j ⌈ ⌉ ⌊ ⌋ Q1 1 1 l 1 l 2 l 3   ⌈ 2 ⌉ ⌊ 2 ⌋ · × XQ0 XQ1 XR1   j1=1,...,q0 j2=1,...,q1 j3=1,...,r1 j3 2j3−1 2j3 ψn lR ,l R ,l R j 2j 1 2j 1 ⌈ 1 ⌉ ⌊ 1 ⌋ 3 3 3 2 2 ! π l ,l R − ,l R e j2 2j2 1 2j2 R1 1 1 2 2 P l ,l Q − ,l Q · n Q1 1 1 ⌈ ⌉ ⌊ ⌋ 2 2   × " j j ⌈ ⌉ ⌊ ⌋ l 4 l 5   XQ2 XR2 ... j4=1,...,q2 j5=1,...,r2 × × " " j s q2 l 2⌈log2 ⌉−2 Q Xs j3 2j3 1 2j3 j4 j4 ⌈log2 ⌉−1 P l ,l R − ,l R Prob A = l n R1 1 1 Q1 Q1 j2⌈log s⌉−2=1,...,qj 2 2 × 2 ⌈log2 s⌉−1 ⌈ ⌉ ⌊ ⌋ j =1   4   j 2j 1 2j Y 2⌈log2 s⌉−4 2⌈log2 s⌉−4− 2⌈log2 s⌉−4 q2+r2 π l4 ,l2 ,l2 j5 j5 Prob A = l . (74) j 2j −1 2j R1 R1  2⌈log2 s⌉−4 2⌈log2 s⌉−4 2⌈log2 s⌉−4 ψn l ,l ,l j =q +1 # 4 2 2 5 2   e   Y × j 2j 1 2⌈log2 s⌉−3 2⌈log2 s⌉−3− We continue this splitting of each instance of Prob( ) for π l3 ,l2 , d · × log s +1 levels (from level 0 to level log s ) until reaching j 2j −1 2 2  2⌈log2 s⌉−3 2⌈log2 s⌉−3  ⌈ ⌉ ⌈ ⌉ ψn l ,l ,d sets with single columns. Note that for the resulting binary tree e 3 2 ×  × from the splitting it suffice to stop at level log s 1 as can be j 2 2⌈log2 s⌉−2 ⌈ ⌉− j ψn l ,d,d seen in Fig. 10, since at this level we have all the information 2⌈log2 s⌉−2 2 π l2 , d, d e   ... . (76) required for the enumeration of the sets. By construction, at · #    level log s , omitted from Fig. 10, the probability that the 2 Using Lemma 2.6 we maximize the ψ ( ) and hence the single⌈ column⌉ has d nonzeros is one. This process gives a n exponentials. If we maximize each by choosing· l to be a , complicated product of nested sums of P ( ) which we express ( ) ( ) n then we can pull the exponentials out of the product.· The· as · exponential will then have the exponent Ψn (as,...,a2, d). The factor involving the π( ) will be called Π (ls,...,l2, d) and we have the following upper· bound for (76). j1 2j1 1 2j1 P l ,l Q − ,l Q n Q0 0 0 2 2 j j j ⌈ ⌉ ⌊ ⌋ Π (ls,...,l2, d) exp [Ψn (as,...,a2, d)] , (77) l 1 l 2 l 3   XQ0 XQ1 XR1 · j =1,...,q j =1,...,q j =1,...,r 1 0 2 1 3 1 where the exponent Ψn (as,...,a2, d) is given by

j2 2j2 1 2j2 P l ,l Q − ,l Q Q Q n Q1 1 1 ψn aQ0 ,a 0 ,a 0 + ... + ψn (a2, d, d) . (78) 2 2 2 2 × " j j ⌈ ⌉ ⌊ ⌋ ⌈ ⌉ ⌊ ⌋ l 4 l 5   XQ2 XR2 Now we attempt to bound the probability of interest in j4=1,...,q2 j5=1,...,r2 (70). This task reduces to bounding Π (ls,...,l2, d) and j3 2j3 1 2j3 P l ,l R − ,l R ... Ψn (as,...,a2, d) in (77) and we start with the former, i.e. n R1 1 1 2 2 × ⌈ ⌉ ⌊ ⌋ · " " j bounding Π (l ,...,l , d). We bound each sum of π( ) in   2⌈log2 s⌉−2 s 2 lQ X · ⌈log2 s⌉−1 Π (ls,...,l2, d) of (77) by the maximum of summations j2⌈log s⌉−2=1,...,qj 2 ⌈log2 s⌉−1 multiplied by the number of terms in the sum. From (63) we j 2j 1 2j 2⌈log2 s⌉−4 2⌈log2 s⌉−4 2⌈log2 s⌉−4 see that π( ) is maximized when all the three arguments are Pn l ,l − ,l 4 2 2 the same and· using Corollary 2.11 we take largest possible j 2j 1  2⌈log2 s⌉−3 2⌈log2 s⌉−3−  Pn l3 ,l2 , d arguments that are equal in the range of the summation. In × this way the following proposition provides the bound we end   j 2⌈log2 s⌉−2 up. Pn l2 , d, d ... . (75) × # # Proposition 4.1: Let’s make each summation over the sets   with the same number of columns to have the same range Using the definition of Pn( ) in Lemma 2.3 we bound where the range we take are the maximum possible for each (75) by bounding each P ( ) as· in (59) with a product of such set. Let’s also maximize π( ) where all its three input n · · 14 variables are equal and are equal to the maximum of the . Consequently (79) is upper bounded by the following. third variable. Then we bound each sum by the largest term in the sum multiplied by the number of terms. This scheme q0 Q 5 2 Q combined with Lemma 2.5 give the following upper bound on 0 d 2π 0 d 2 4 2 × Π (ls,...,l2, d).     s   ! log s 2 qj +rj ⌈ 2 ⌉− Q 5 2 Q j d 2π j d q0 2 4 2 × 2 j=1     s   ! Q0 5 Q0 Y d 2π d q⌈log2 s⌉−1 2 4 2 × Q 2 Q s ! log2 s 1 5 log2 s 1       ⌈ ⌉− d 2π ⌈ ⌉− d qj log2 s 2 2 2 4 s 2 ⌈ ⌉− Q 5 Q       ! j d 2π j d (82)  2 4 2 × j=1 s ! Y       r j  2 j Now we use the property that qj + rj = 2 for j = Rj 5 Rj d 2π d 1,..., log2 s 1 from Lemma 2.5 to bound (82) by the 2 4 2  × ⌈ ⌉ −     s   ! following.

q⌈log2 s⌉−1 2  2j Q log2 s 1 5 Q log2 s 1 log2 s 1 2 ⌈ ⌉− d 2π ⌈ ⌉− d ⌈ ⌉− Qj 5 Qj 2 4 s 2 d 2π d . (83)       ! 2 4 2 j=0 s ! (79) Y       We have a strict upper bound when , which r log2 s 1 = 0 ⌈ ⌉− 6 j Proof: From (63) we have occurs when s is not a power of 2, because then by qj +rj =2 we have log2 s 1. In fact (83) is q log2 s 1 + r log2 s 1 =2⌈ ⌉− an overestimate⌈ ⌉− for a large⌈ s⌉−which is not a power of 2. 2 2 s Qj s 5 2πy(n y) 5 Note Q = j by Lemma 2.5. Thus = j and π(y,y,y)= − < 2πy. (80) j 2 2 2 +1 4 n 4 Qj s l m   r   j+1 . So we bound (83) by the following.  p 2 ≤ 2 j k   j Simply put, we bound π(x,y,z) by multiplying the max- log s 1 2 x ⌈ 2 ⌉− s 5 2 s imum of π(x,y,z) with the number of terms in the summa- d 2π d (84) P 2j+1 4 2j+1 tion. Remember the order of magnitude of the arguments of j=0 r ! Y l m   l m π(x,y,z) is x y z. Therefore, the maximum of π(x,y,z) ≥ ≥ occurs when the arguments are all equal to the maximum value Next we upper bound log2 s 1 in the limit of the ⌈ ⌉ − s s 1 of z. In our splitting scheme the maximum possible value of product by log2 s and upper bound 2j+1 by 2j+1 + 2 = Qj j+1 ⌈ ⌉ Q is since there are nonzeros in each column. s 2 l j 2 d d j+1 1+ , we also move the d into the square root and 2 · 2 s Also Q Q Q so the number of terms combined the constants to have the following bound on (84). l j  lQj l j + l j     2 ≤ ≤ 2 2 Qj in the summation over lQj is 2 d, and similarly for Rj .      · log2 s j+1 We know the values of the Qj and the Rj and their quantities s 2 25√2π   j+1 1+ qj and rj respectively from Lemma 2.5. 2 s 16 × j=0 " ! Q R   j j Y j We replace y by 2 d or 2 d accordingly into the 2 · · j+1 bound of π(y,y,y) in (80) and multiply by the number of s 2 3   Qj  Rj 1+ d . terms in the summation, i.e. d or d. This product 2j+1 s 2 2 s   # is then repeated q or r times accordingly· until· the last level j j     of the split, j = log s 1, where we have q and 2j+1 2 log2 s 1 We bound 1+ by 2 to bound the above by (which⌈ is equal⌉− to 2). We exclude ⌈ ⌉− since s Q log2 s 1 R log2 s 1 l ⌈ ⌉− = d. Putting the whole product together⌈ ⌉− results to   R⌈log2 s⌉−1 2j (79) hence concluding the proof of Proposition 4.1. log2 s s 25√2π s d3 As a final step we need the following corollary. 2j 16 2j j=0 " ! r # Corollary 4.2: Y j log s 2 2 25√2π s3d3 = (85) 16 23j 2 j=0 " ! r # Π (ls,...,l2, d) < exp [3s log(5d)] . (81) Y 25√2πs3d3 · where we moved s/2j into the square root. Using the rule of indices the product of the constant term is replaced by it’s Proof: From Lemma 2.5 we can upper bound Rj by Qj power to sum of the indices. We then rearranged to have the 15 power 3/2 in the outside and this gives the following. 2s 25√2π 16 1 (2d)3s (95) log2 s i 3/2 3 3 2 j 16 ! 25√2π 8√s d i=0 log2 s 2   25√2π sd 2s P (86) √ 16  2j  25 2π 3s 2 ! j=0   = (2d) (96) Y 16 25√2πs3d3 3/2 ! 2s 1  log s j − s 2 2 2/3 25√2π log2 2i 1 = (sd) i=0 (87) 2 25√2π 16  2j  = exp 3s log 2 d (97) ! P j=0 √ 3 3 ·   16  Y   25 2πs d ! 3/2 2s 1  log2 s j  − j=0 j2 2    25√2π 2s 1 1 < exp [3s log(5d)] (98) = (sd) − P . (88) √ 3 3 · 16  2  25 2πs d !   From (95) to (96) we simplified and from (96) to (97) we   rewrote the powers as an exponential with a logarithmic expo- 2/3 25√2π From (86) to (87) we evaluate the power of the first factor nent. Then from (97) to (98) we upper bounded 2 16 which is a geometric series and we again use the rule of indices by 5 which gives the required format of a product  of a for the sd factor. 
Then from (87) to (88) we use the rule of indices for the last factor and evaluate the power of the sd factor, which is also a geometric series. We simplify the power of the last factor by using the following identity:

\[
\sum_{k=1}^{m} k\,2^{k} \;=\; (m-1)\,2^{m+1} + 2. \qquad (89)
\]

This therefore simplifies (88) as follows:

\[
\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s-1}\left[(sd)^{2s-1}\left(\frac{1}{2}\right)^{(\log_2 s-1)\,2^{\log_2 s+1}+2}\right]^{3/2} \qquad (90)
\]
\[
=\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s-1}\left[(sd)^{2s-1}\,\frac{1}{4}\left(\frac{1}{2}\right)^{2s(\log_2 s-1)}\right]^{3/2} \qquad (91)
\]
\[
=\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s-1}\left[\frac{(sd)^{2s}}{4sd}\,2^{-2s\log_2 s}\,2^{2s}\right]^{3/2} \qquad (92)
\]
\[
=\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s}\frac{16}{25\sqrt{2\pi}}\left[\frac{(2sd)^{2s}}{4sd}\,s^{-2s}\right]^{3/2} \qquad (93)
\]
\[
=\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s}\frac{16}{25\sqrt{2\pi}}\left[\frac{(2d)^{2s}}{4sd}\right]^{3/2}. \qquad (94)
\]

From (90) through (92) we simplified using basic properties of indices and logarithms, while from (92) to (93) we incorporated 2^{2s} into the (sd)^{2s} factor inside the square brackets and rewrote the first factor into a product of a power in s and another without s. From (93) to (94) the s^{2s} and s^{−2s} canceled out.

Now we expand the square brackets in (94) to obtain (95) below:

\[
\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s}\frac{16}{25\sqrt{2\pi}}\,\frac{1}{8\sqrt{s^{3}d^{3}}}\,(2d)^{3s} \qquad (95)
\]
\[
=\frac{2}{25\sqrt{2\pi s^{3}d^{3}}}\left(\frac{25\sqrt{2\pi}}{16}\right)^{2s}(2d)^{3s} \qquad (96)
\]
\[
=\frac{2}{25\sqrt{2\pi s^{3}d^{3}}}\,\exp\left[3s\log\!\left(2\Big(\tfrac{25\sqrt{2\pi}}{16}\Big)^{2/3} d\right)\right] \qquad (97)
\]
\[
<\frac{2}{25\sqrt{2\pi s^{3}d^{3}}}\,\exp\big[3s\log(5d)\big]. \qquad (98)
\]

From (95) to (96) we simplified, and from (96) to (97) we rewrote the powers as an exponential with a logarithmic exponent. Then from (97) to (98) we upper bounded 2(25√(2π)/16)^{2/3} by 5, which gives the required format of a product of a polynomial and an exponential, concluding the proof.

With the bound in Corollary 4.2 we have completed the bounding of Π(l_s,…,l_2,d) in (77). Next we bound Ψ_n(a_s,…,a_2,d), which is given by (78). Lemma 2.5 gives the three arguments for each ψ_n(·) and the number of ψ_n(·) with the same arguments. Using this lemma we express Ψ_n(a_s,…,a_2,d) as

\[
\sum_{j=0}^{\lceil\log_2(s)\rceil-2}\Big[\,q_j\,\psi_n\big(a_{Q_j},\,a_{\lceil Q_j/2\rceil},\,a_{\lfloor Q_j/2\rfloor}\big) + r_j\,\psi_n\big(a_{R_j},\,a_{\lceil R_j/2\rceil},\,a_{\lfloor R_j/2\rfloor}\big)\Big] + q_{\lceil\log_2(s)\rceil-1}\,\psi_n(a_2,d,d). \qquad (99)
\]

Equation (99) is bounded above in Lemma 2.7 by the following:

\[
\sum_{j=0}^{\lceil\log_2(s)\rceil-1} 2^{j}\,\psi_n\big(a_{Q_j},\,a_{\lceil R_j/2\rceil},\,a_{\lceil R_j/2\rceil}\big). \qquad (100)
\]

If we let a_{2i} = a_{Q_j} and a_i = a_{⌈R_j/2⌉}, we have (100) equal to the following:

\[
\sum_{i=1}^{\lceil s/2\rceil}\frac{s}{2i}\,\psi_n(a_{2i},a_i,a_i)
\;=\; \sum_{i=1}^{\lceil s/2\rceil}\frac{s}{2i}\left[a_{2i}\,H\!\left(\frac{a_{2i}-a_i}{a_{2i}}\right) + (n-a_i)\,H\!\left(\frac{a_{2i}-a_i}{n-a_i}\right) - n\,H\!\left(\frac{a_{2i}-a_i}{n}\right)\right]. \qquad (101)
\]

Now we combine the bound on Π(l_s,…,l_2,d) in (81) and the exponential whose exponent is the bound on Ψ_n(a_s,…,a_2,d) in (101) to get (2), the polynomial p_max(s,d) = 2/(25√(2πs³d³)), and (3), the exponent of the exponential Ψ(a_s,…,d), given by the sum of 3s·log(5d) and the right hand side of (101). Lemma 2.8 gives the a_i that maximize (101) and the systems (53) and (54) they satisfy depending on the constraints on a_s. Solving the system (53) completely gives the â_i in (52) and (4), which are the expected values of the a_i. The system (5) is equivalent to (54), hence also proven in Lemma 2.8. This therefore concludes the proof of Theorem 1.6.
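As a quick sanity check of the summation identity (89) used above (an added snippet, not from the original text), the closed form can be verified numerically for small m:

def lhs(m):
    # left hand side of (89): sum_{k=1}^{m} k * 2^k
    return sum(k * 2 ** k for k in range(1, m + 1))

def rhs(m):
    # right hand side of (89): (m - 1) * 2^{m+1} + 2
    return (m - 1) * 2 ** (m + 1) + 2

assert all(lhs(m) == rhs(m) for m in range(1, 21))
print("identity (89) holds for m = 1, ..., 20")

With m = log_2 s this is exactly what collapses the exponent of the last factor in (88) to 2s(log_2 s − 1) + 2, as used in (90) and (91).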

B. Main Corollaries

In this section we present the proofs of the corollaries in Sections I-A and I-B. These include the proof of Corollary 1.7 in Section IV-B1, the proof of Corollary 1.10 in Section IV-B2, and the proof of Corollary 1.11 in Section IV-B3.

1) Corollary 1.7: Satisfying RIP-1 means that for any s-sparse vector x, ‖A_S x‖_1 ≥ (1−2ε)d‖x‖_1, which indicates that the cardinality of the set of neighbors satisfies |A_s| ≥ (1−ε)ds. Therefore

\[
\mathrm{Prob}\big(\|A_S x\|_1 \le (1-2\epsilon)d\|x\|_1\big) \;\equiv\; \mathrm{Prob}\big(|A_s| \le (1-\epsilon)ds\big). \qquad (102)
\]

This implies that a_s = (1−ε)ds, and since this restricts a_s to be less than its expected value given by (4), the rest of the a_i satisfy the polynomial system (5). If there exists a solution, then the a_i are functions of s, d and ε, which makes Ψ_n(a_s,…,a_2,d) = Ψ(s,d,ε).

2) Corollary 1.10: Corollary 1.7 states that, fixing S and the other parameters, Prob(‖A_S x‖_1 ≤ (1−2ε)d‖x‖_1) < p_max(s,d)·exp[n·Ψ(s,d,ε)]. Corollary 1.10 considers any S ⊂ [N], and since the matrices are adjacency matrices of lossless expanders we need to consider every S ⊂ [N] such that |S| ≤ k. Therefore our target is Prob(‖Ax‖_1 ≤ (1−2ε)d‖x‖_1), which is bounded by a simple union bound over all \binom{N}{s} sets S of cardinality at most k; treating each set independently, we sum this probability to get the following bound:

\[
\mathrm{Prob}\big(\|Ax\|_1 \le (1-2\epsilon)d\|x\|_1\big) \;\le\; \sum_{s=2}^{k}\binom{N}{s}\,\mathrm{Prob}\big(\|A_S x\|_1 \le (1-2\epsilon)d\|x\|_1\big) \qquad (103)
\]
\[
<\; \sum_{s=2}^{k}\binom{N}{s}\,p_{\max}(s,d)\,\exp\big[n\cdot\Psi(s,d,\epsilon)\big] \qquad (104)
\]
\[
<\; \sum_{s=2}^{k} p_{\max}(s,d)\,\frac{5}{4}\Big[2\pi s\big(1-\tfrac{s}{N}\big)\Big]^{-1/2}\exp\Big[N\,H\big(\tfrac{s}{N}\big) + n\,\Psi(s,d,\epsilon)\Big] \qquad (105)
\]
\[
<\; k\,\frac{5}{4}\,p_{\max}(k,d)\Big[2\pi k\big(1-\tfrac{k}{N}\big)\Big]^{-1/2}\exp\Big[N\Big(H\big(\tfrac{k}{N}\big) + \tfrac{n}{N}\,\Psi(k,d,\epsilon)\Big)\Big]. \qquad (106)
\]

From (103) to (104) we bound the probability in (103) using Corollary 1.7. Then from (104) to (105) we bound \binom{N}{s} using Stirling's formula (64) by a polynomial in N multiplying p_max(s,d) and an exponential incorporated into the exponent of the exponential term. From (105) to (106) we use that for N > 2k the entropy H(s/N) is largest when s = k, and we bound the summation by taking the maximum value of s and multiplying by the number of terms plus one, giving k, in the summation. This gives p'_max(N,k,d) = (5/4)·p_max(k,d)·[2πk(1−k/N)]^{−1/2}, and the factor Ψ_net(k,n,N;d,ε) = H(k/N) + (n/N)·Ψ(k,d,ε) is what multiplies N in the exponent, as claimed.

3) Corollary 1.11: Corollary 1.10 has given us an upper bound on the probability Prob(‖Ax‖_1 ≤ (1−2ε)d‖x‖_1) in (9). In this bound the exponential dominates the polynomial. Consequently, in the limit as (k,n,N) → ∞ while k/n → ρ ∈ (0,1) and n/N → δ ∈ (0,1), this bound has a sharp transition at the zero level curve of Ψ_net. For Ψ_net(k,n,N;d,ε) strictly bounded above zero the overall bound grows exponentially in N without limit, while for Ψ_net(k,n,N;d,ε) strictly bounded below zero the overall bound decays to zero exponentially quickly. We define ρ_exp(δ;d,ε) to satisfy Ψ_net(k,n,N;d,ε) = 0 in (13), so that for any ρ strictly less than ρ_exp(δ;d,ε) the exponent satisfies Ψ_net(k,n,N;d,ε) < 0 and hence the bound decays to zero.

More precisely, for k/n → ρ < (1−γ)ρ_exp(δ;d,ε) with small γ > 0, we have Prob(‖Ax‖_1 ≤ (1−2ε)d‖x‖_1) → 0. Therefore Prob(‖Ax‖_1 ≥ (1−2ε)d‖x‖_1) → 1 as the problem size grows such that (k,n,N) → ∞, n/N → δ ∈ (0,1) and k/n → ρ.
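Numerically, ρ_exp(δ; d, ε) can be traced as the zero level curve of Ψ_net by a one-dimensional root find in ρ. The sketch below is an added illustration under stated assumptions: psi(k, d, eps) is a placeholder for the exponent Ψ(k,d,ε) of Corollary 1.7, whose closed form via the maximizing a_i of (3)–(5) is not reproduced here; a fixed large N is used as a proxy for the limit; and a single monotone sign change in ρ is assumed so that bisection applies.

import math

def H(x):
    # Shannon entropy in nats: H(x) = -x ln x - (1 - x) ln(1 - x)
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1.0 - x) * math.log(1.0 - x)

def psi_net(rho, delta, N, d, eps, psi):
    # Psi_net(k, n, N; d, eps) = H(k/N) + (n/N) * Psi(k, d, eps), with k = rho*delta*N and n = delta*N
    k, n = rho * delta * N, delta * N
    return H(k / N) + (n / N) * psi(k, d, eps)

def rho_exp(delta, N, d, eps, psi):
    # bisection for the zero level curve of Psi_net in rho, assuming Psi_net < 0 for small rho
    lo, hi = 1e-9, 1.0 - 1e-9
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if psi_net(mid, delta, N, d, eps, psi) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Example call with a stand-in exponent (NOT the paper's Psi), only to exercise the code:
print(rho_exp(delta=0.5, N=10**6, d=8, eps=0.25, psi=lambda k, d, eps: -0.5))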
V. APPENDIX

A. Proof of Corollary 2.2

The first part of this proof uses ideas from the proof of Proposition 2.1, which is the same as Theorem 16 in [26]. We consider a bipartite graph G = (U,V,E) with |U| = N left vertices, |V| = n right vertices and left degree d. For a fixed S ⊂ U where |S| = s ≤ k, G fails to be an expander on S if |Γ(S)| < (1−ε)ds. This means that in a sequence of ds vertex indices at least εds of these indices are in the collision set, that is, are identical to some preceding value in the sequence.

Therefore, the probability that a neighbor chosen uniformly at random is in the collision set is at most ds/n and, treating each event independently, the probability that a set of εds neighbors chosen at random are in the collision set is at most (ds/n)^{εds}. There are \binom{ds}{\epsilon ds} ways of choosing a set of εds points from a set of ds points, and \binom{N}{s} ways of choosing each set S from U. This means that the probability that G fails to expand on at least one of the sets S of fixed size s can be bounded above by a union bound:

\[
\mathrm{Prob}\,(G \text{ fails to expand on } S) \;\le\; \binom{N}{s}\binom{ds}{\epsilon ds}\left(\frac{ds}{n}\right)^{\epsilon ds}. \qquad (107)
\]

We define p_s to be the right hand side of (107) and we use the right hand side of Stirling's inequality (64) to upper bound p_s as

\[
p_s \;<\; \frac{5}{4}\left[2\pi\,\frac{\epsilon ds}{ds}\Big(1-\frac{\epsilon ds}{ds}\Big)\,ds\right]^{-1/2}\exp\left[ds\,H\Big(\frac{\epsilon ds}{ds}\Big)\right]
\times \frac{5}{4}\left[2\pi\,\frac{s}{N}\Big(1-\frac{s}{N}\Big)\,N\right]^{-1/2}\exp\left[N\,H\Big(\frac{s}{N}\Big)\right]
\times \left(\frac{ds}{n}\right)^{\epsilon ds}. \qquad (108)
\]
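To see the scale of the bound (107) for concrete parameters, it can be evaluated in log form with log-binomial coefficients. The snippet below is an added illustration: the example parameter values are arbitrary, and εds is rounded to an integer collision count as an assumption of the sketch rather than of the paper.

import math

def log_binom(n, k):
    # log of the binomial coefficient C(n, k) via log-gamma, to avoid overflow
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def log_p_s(N, n, d, s, eps):
    # log of the right hand side of (107): C(N, s) * C(ds, eps*ds) * (ds/n)^(eps*ds)
    t = round(eps * d * s)  # number of collisions
    return log_binom(N, s) + log_binom(d * s, t) + t * (math.log(d * s) - math.log(n))

if __name__ == "__main__":
    # e.g. N = 2^20 columns, n = 2^14 rows, d = 8 nonzeros per column, eps = 1/4
    for s in (10, 100, 1000):
        print(s, log_p_s(N=2**20, n=2**14, d=8, s=s, eps=0.25))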

Writing the last multiplicand of (108) in exponential form, (ds/n)^{εds} = exp[εds·log(ds/n)], and simplifying the expression gives p_s as a polynomial factor multiplying an exponential whose exponent is N·H(s/N) + ds·H(ε) + εds·log(ds/n).

REFERENCES

[1] R. Horn and C. Johnson, Matrix Analysis. Cambridge University Press, 1990.
[2] J. Demmel, Numerical Linear Algebra. Center for Pure and Applied Mathematics, Dept. of Mathematics, University of California, 1993.

[28] T. Blumensath and M. E. Davies, "Normalized iterative hard thresholding: guaranteed stability and performance," IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 2, pp. 298–309, 2010.
[29] P. Indyk and M. Ruzic, "Near-optimal sparse recovery in the l1 norm," in Foundations of Computer Science, 2008 (FOCS'08), 49th Annual IEEE Symposium on. IEEE, 2008, pp. 199–207.
[30] R. Berinde, P. Indyk, and M. Ruzic, "Practical near-optimal sparse recovery in the l1 norm," in Communication, Control, and Computing, 2008, 46th Annual Allerton Conference on. IEEE, 2008, pp. 198–205.
[31] R. Berinde and P. Indyk, "Sequential sparse matching pursuit," in Communication, Control, and Computing, 2009 (Allerton 2009), 47th Annual Allerton Conference on. IEEE, 2009, pp. 36–43.
[32] E. J. Candès, "The restricted isometry property and its implications for compressed sensing," C. R. Math. Acad. Sci. Paris, vol. 346, no. 9–10, pp. 589–592, 2008.
[33] J. Blanchard, C. Cartis, and J. Tanner, "Compressed sensing: How sharp is the restricted isometry property?" SIAM Review, vol. 53, no. 1, pp. 105–125, 2011.
[34] B. Bah and J. Tanner, "Improved bounds on restricted isometry constants for Gaussian matrices," SIAM Journal on Matrix Analysis and Applications, 2010.

Bubacarr Bah received the BSc degree in mathematics and physics from the University of The Gambia, and the MSc degree in mathematical modeling and scientific computing from the University of Oxford, Wolfson College. He received his PhD degree in applied and computational mathematics from the University of Edinburgh, where his PhD supervisor was Jared Tanner. He is currently a Postdoctoral Fellow at EPFL working with Volkan Cevher. Previously he was a Graduate Assistant at the University of The Gambia (2004–2007). His research interests include random matrix theory with applications to compressed sensing and sparse approximation. Dr. Bah is a member of SIAM and has received the SIAM Best Student Paper Prize (2010).

Jared Tanner received the B.S. degree in physics from the University of Utah, and the Ph.D. degree in applied mathematics from UCLA, where his PhD adviser was Eitan Tadmor. He is currently Professor of the Mathematics of Information at the University of Oxford and a Fellow of Exeter College. Previously he held the academic posts of Professor of the Mathematics of Information at the University of Edinburgh (2007–2012), Assistant Professor at the University of Utah (2006–2007), National Science Foundation Postdoctoral Fellow in the mathematical sciences at Stanford University (2004–2006), and Visiting Assistant Research Professor at the University of California at Davis (2002–2004). His research interests include signal processing with emphasis on compressed sensing, sparse approximation, applied Fourier analysis, and spectral methods. Prof. Tanner has received the Philip Leverhulme Prize (2009), a Sloan Research Fellowship in Science and Technology (2007), the Monroe Martin Prize (2005), and the Leslie Fox Prize (2003).