Explicit Matrices for Sparse Approximation

Amin Khajehnejad, Arash Saber Tehrani, Alexandros G. Dimakis, Babak Hassibi

Abstract—We show that girth can be used to certify that sparse matrices have good sparse approximation guarantees. This allows us to present the first deterministic measurement matrix constructions that have an optimal number of measurements for ℓ_1/ℓ_1 approximation. Our techniques are coding theoretic and rely on a recent connection of compressed sensing to LP relaxations for channel decoding.

I. Introduction

Assume we observe m linear measurements of an unknown vector e ∈ R^n:

  H · e = s,

where H is a real-valued matrix of size m × n, called the measurement matrix. When m < n this is an underdetermined system of linear equations, and one fundamental compressed sensing problem involves recovering e assuming that it is also k-sparse, i.e. it has k or fewer non-zero entries.

The sparse approximation problem goes beyond exactly sparse vectors and requires the recovery of a k-sparse vector ê that is close to e, even if e is not exactly k-sparse itself. Recent breakthrough results [11]–[13] showed that it is possible to construct measurement matrices with m = O(k log(n/k)) rows that recover k-sparse signals exactly in polynomial time. This scaling is also optimal, as discussed in [2], [3]. These results rely on randomized matrix constructions and establish that the optimal number of measurements will be sufficient with high probability over the choice of the matrix and/or the signal. Unfortunately, the required properties (RIP [11], nullspace [14], [19], and high expansion with expansion quality ε < 1/6) have no known deterministic constructions and no known way to be checked efficiently. There are several explicit constructions of measurement matrices (e.g. [4], [7]) which, however, require a slightly sub-optimal number of measurements (m growing super-linearly as a function of n for k = p·n).

In this paper we focus on the linear sparsity regime, where k is a fraction of n and the optimal number of measurements is also a fraction of n. The explicit construction of measurement matrices with an optimal number of rows is a well-known open problem in compressed sensing theory (see e.g. [2] and references therein). A closely related issue is that of checking or certifying in polynomial time that a given candidate matrix has good recovery guarantees.

Our Contributions: Consider a matrix H ∈ {0,1}^{m×n} that has d_c ones per row and d_v ones per column. If the bipartite graph corresponding to H has Ω(log n) girth, then for k = p·n and an optimal number of measurements m = c_2·n, we show that H offers ℓ_1/ℓ_1 sparse approximation under basis pursuit decoding. Our technical requirement of girth, unlike expansion or RIP, is easy to check, and several deterministic constructions of matrices with m = c·n and Ω(log n) girth exist, starting with the early construction in Gallager's thesis [16] and the progressive edge-growth Tanner graphs of [17].

Our result is a weak bound, also known as a 'for-each signal' guarantee [2]. This means that we have a fixed deterministic matrix and show the ℓ_1/ℓ_1 sparse approximation guarantee with high probability over the support of the signal. To the best of our knowledge, this is the first deterministic construction of matrices with an optimal number of measurements.

Our techniques are coding-theoretic and rely on recent developments that connect the channel decoding LP relaxation of Feldman et al. [20] to compressed sensing [8]. We rely on a primal-based density evolution technique initiated by Koetter and Vontobel [18] and analytically strengthened in the breakthrough paper of Arora et al. [9], which established the best known finite-length threshold results for LDPC codes under LP decoding.

To show our ℓ_1/ℓ_1 sparse approximation result, we first need to extend [8] to non-sparse signals. Specifically, the first step in our analysis (Theorem 3) is to show that ℓ_1/ℓ_1 sparse approximation corresponds to a channel decoding problem for a perturbed symmetric channel. The second component is to extend the Arora et al. argument to this perturbed symmetric channel, an analysis achieved in Theorem 2. This is performed by showing how a similar tree recursion allows us to establish robustness of the fundamental cone property (FCP), i.e. every pseudocodeword in the fundamental cone [18] has a non-negative pseudoweight even if the flipped bit likelihoods are multiplied by a factor larger than one. The FCP condition shows that the matrix H, taken as an LDPC code, can tolerate a constant fraction of errors on this perturbed symmetric channel under LP decoding.
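The girth requirement is indeed straightforward to certify. As a concrete illustration (the helper names and the dense-matrix representation are our own choices, not from the paper), the following sketch computes the girth of the Tanner graph of H with the standard BFS-from-every-node algorithm:

```python
import numpy as np
from collections import deque

def tanner_adjacency(H):
    """Adjacency lists of the bipartite Tanner graph of a 0-1 matrix H:
    variable nodes 0..n-1, check nodes n..n+m-1."""
    m, n = H.shape
    adj = [[] for _ in range(n + m)]
    for j in range(m):
        for i in np.flatnonzero(H[j]):
            adj[int(i)].append(n + j)
            adj[n + j].append(int(i))
    return adj

def girth(adj):
    """Length of the shortest cycle (standard O(V*E) BFS from every node);
    returns inf for a forest."""
    best = float("inf")
    for root in range(len(adj)):
        dist, parent = {root: 0}, {root: -1}
        q = deque([root])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w], parent[w] = dist[u] + 1, u
                    q.append(w)
                elif w != parent[u]:
                    # non-tree edge: closes a cycle through the BFS tree
                    best = min(best, dist[u] + dist[w] + 1)
    return best
```

For an m × n matrix with E ones this runs in time O((m+n)·E), i.e. in polynomial time, in contrast to RIP or expansion, for which no efficient certification is known.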

Amin Khajehnejad and Babak Hassibi are with the Department of Electrical Engineering, California Institute of Technology, Pasadena, CA, USA. email: {amin,hassibi}@caltech.edu. A. Saber Tehrani and A. G. Dimakis are with the Department of Electrical Engineering, University of Southern California, Los Angeles, CA, USA. email: {saberteh,dimakis}@usc.edu.

We note that even though our analysis involves a rigorous density evolution argument, our decoder is always the basis pursuit linear relaxation, which is substantially different from the related work on message-passing algorithms for compressed sensing [15], [28].
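To make this decoder concrete, here is a minimal sketch of basis pursuit (formalized as CS-LPD in Section II below) written as a standard LP via variable splitting; the function name and the choice of scipy's linprog are our own, not part of the paper:

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(H, s):
    """CS-LPD / basis pursuit: min ||x||_1 s.t. H x = s, written as the LP
    min sum(u) s.t. -u <= x <= u, H x = s over the stacked variable (x, u)."""
    m, n = H.shape
    c = np.concatenate([np.zeros(n), np.ones(n)])
    I = np.eye(n)
    A_ub = np.block([[I, -I], [-I, -I]])      # x - u <= 0 and -x - u <= 0
    b_ub = np.zeros(2 * n)
    A_eq = np.hstack([H, np.zeros((m, n))])   # H x = s
    bounds = [(None, None)] * n + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=s,
                  bounds=bounds, method="highs")
    return res.x[:n]
```

At the optimum u_i = |x_i|, so minimizing Σ_i u_i subject to −u ≤ x ≤ u and Hx = s is equivalent to minimizing ‖x‖_1 subject to Hx = s.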

II. Background Information

In this section we provide a brief background for the major topics we will discuss. We begin by introducing the noiseless compressed sensing problem and basis pursuit. Then we describe the channel coding problem and its LP relaxation.

A. Compressed Sensing Preliminaries

The simplest noiseless compressed sensing (CS) problem for exactly sparse signals consists of recovering the sparsest real vector e′ of a given length n from a set of m real-valued measurements s, given by H · e′ = s; namely

  CS-OPT: minimize ‖e′‖_0 subject to H · e′ = s.

Since ℓ_0 minimization is NP-hard, one can relax CS-OPT by replacing the ℓ_0 norm with ℓ_1, specifically

  CS-LPD: minimize ‖e′‖_1 subject to H · e′ = s.

This LP relaxation is also known as basis pursuit. A central question in compressed sensing is under what conditions the solution given by CS-LPD equals (or is very close to, especially in the case of approximately sparse signals) the solution given by CS-OPT, i.e., when the LP relaxation is tight. There has been a substantial amount of work in this area, see e.g. [2], [11]–[14], [19].

One sufficient way to certify that a given measurement matrix is "good" is through the well-known restricted isometry property (RIP), which guarantees that the LP relaxation will be tight for all k-sparse vectors e and, further, that the recovery will be robust to approximate sparsity [11], [12]. However, the RIP condition is not a complete characterization of "good" measurement matrices for the LP relaxation (see, e.g., [21]). In this paper we rely instead on the nullspace characterization (see, e.g., [22], [23]), which gives a necessary and sufficient condition for a matrix to be "good".

Definition 1: Let S ⊂ {1, ..., n}, let C_R ≥ 0, and let S̄ denote the complement of S. We say that H has the nullspace property NSP_R^≤(S, C_R), and write H ∈ NSP_R^≤(S, C_R), if

  C_R · ‖ν_S‖_1 ≤ ‖ν_{S̄}‖_1, for all ν ∈ N(H).

We say that H has the strict nullspace property NSP_R^<(S, C_R), and write H ∈ NSP_R^<(S, C_R), if

  C_R · ‖ν_S‖_1 < ‖ν_{S̄}‖_1, for all ν ∈ N(H) \ {0}.

The next performance metric (see, e.g., [1], [6]) for CS involves recovering approximations to signals that are not exactly k-sparse. This is the main performance metric we use in this paper.

Definition 2: An ℓ_1/ℓ_1 approximation guarantee means that the basis pursuit LP relaxation outputs an estimate ê that is within a constant factor of the best k-sparse approximation of e, i.e.,

  ‖e − ê‖_1 ≤ 2 · (C_R + 1)/(C_R − 1) · min_{e′ ∈ Σ_k^n} ‖e − e′‖_1,  (1)

where Σ_k^n = {w ∈ R^n : |supp(w)| ≤ k}. The nullspace condition is necessary and sufficient for a measurement matrix to yield ℓ_1/ℓ_1 approximation guarantees [8], [27].

B. Channel Coding Preliminaries

A binary linear code C of length n is defined by an m × n parity-check matrix H_CC, i.e., C = {x ∈ F_2^n | H_CC · x = 0}. We define the set of codeword indices I = I(H_CC) = {1, ..., n}, the set of check indices J = J(H_CC) = {1, ..., m}, the set of check indices that involve the i-th codeword position J_i = J_i(H_CC) = {j ∈ J | [H_CC]_{j,i} = 1}, and the set of codeword positions that are involved in the j-th check I_j = I_j(H_CC) = {i ∈ I | [H_CC]_{j,i} = 1}. For a regular code, the degree |J_i| of every codeword position and the size |I_j| of every check equation are constant, denoted d_v and d_c respectively. If a codeword x ∈ C is transmitted through a channel and an output sequence y is received, then one can potentially decode x by solving for the maximum likelihood codeword in C, namely

  CC-MLD: minimize λ^T x′ subject to x′ ∈ conv(C),

where λ is the likelihood vector defined by λ_i = log(P(y_i|x_i=0)/P(y_i|x_i=1)), and conv(C) is the convex hull of all codewords of C in R^n. CC-MLD solves the ML decoding problem by virtue of the fact that the objective λ^T x is minimized at a corner point of conv(C), which is a codeword. CC-MLD is NP-hard, and therefore an efficient description of the exact codeword polytope is very unlikely to exist.

The well-known channel decoding LP relaxation is:

  CC-LPD: minimize λ^T x′ subject to x′ ∈ P(H_CC),

where P = P(H_CC) is known as the fundamental polytope [18], [20]. The fundamental polytope is compactly described as follows: if h_j^T is the j-th row of H_CC, then

  P = ⋂_{1≤j≤m} conv(C_j),  (2)

where C_j = {x ∈ F_2^n | h_j^T x = 0 mod 2}. Due to the symmetries of the fundamental polytope [20] we can focus on the cone around the all-zeros codeword without loss of generality. Given the parity check matrix H_CC, its fundamental cone K(H_CC) is defined as the smallest cone in R^n that encompasses P(H_CC), and can be formally defined as follows.

Definition 3: The fundamental cone K = K(H_CC) of H_CC is the set of all vectors w ∈ R^n that satisfy

  w_i ≥ 0, for all i ∈ I,  (3)

  w_i ≤ Σ_{i′ ∈ I_j \ {i}} w_{i′}, for all i ∈ I_j, j ∈ J.  (4)
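Membership in the fundamental cone is easy to test numerically. The following sketch (our own helper, not from the paper) checks conditions (3) and (4), using the equivalent form 2·max_{i∈I_j} w_i ≤ Σ_{i∈I_j} w_i of (4):

```python
import numpy as np

def in_fundamental_cone(H, w, tol=1e-9):
    """Test conditions (3)-(4) for w in K(H): w >= 0 entrywise, and for
    every check j, w_i <= sum over the other positions of I_j, which is
    equivalent to 2 * max_{i in I_j} w_i <= sum_{i in I_j} w_i."""
    if np.any(w < -tol):
        return False
    for j in range(H.shape[0]):
        Ij = np.flatnonzero(H[j])
        if Ij.size == 0:
            continue
        if 2 * w[Ij].max() > w[Ij].sum() + tol:
            return False
    return True
```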

Given the fundamental cone of a code C, we define the following property.

Definition 4: Let S ⊂ {1, 2, ..., n} and C_R ≥ 1 be fixed. A code C with parity check matrix H_CC is said to have the fundamental cone property FCP(S, C_R) if for every w ∈ K(H_CC) the following holds:

  C_R ‖w_S‖_1 < ‖w_{S̄}‖_1.  (5)

III. ℓ_1/ℓ_1 Guarantee for Ω(log n) Girth Matrices

We start by stating the main Theorem of this paper.

Theorem 1: Let H be an m × n 0-1 measurement matrix with girth g = Ω(log(n)), and column and row densities d_v and d_c respectively. There exists a fraction p*(d_c, d_v), solely a function of d_c and d_v, so that for every 0 ≤ p ≤ p*(d_c, d_v) there exists some C_R > 1 such that H provides the ℓ_1/ℓ_1 guarantee (with parameter C_R) for CS-LPD, for a randomly chosen support set S of size p·n, with probability higher than 1 − O(e^{−αn}) for some α > 0.

The function p*(d_c, d_v) and C_R are deterministic functions computed from a recursion on a tree (see Theorem 2). For example, for d_c = 6 and d_v = 3 we numerically obtain p* ≥ 0.045, and therefore we can recover almost all k = 0.045n-sparse signals with m = (1/2)n measurements. To the best of our knowledge, this is the best provable recovery threshold for sparse compressed sensing measurement matrices. Furthermore, for p = 0.04 we obtain C_R ≥ 1.08, which means that for compressible signals, ℓ_1 minimization gives the ℓ_1/ℓ_1 error bound with constant 2(C_R + 1)/(C_R − 1) = 52, i.e. the ℓ_1 norm of the recovery error is bounded by a factor of 52 times the ℓ_1 norm of the smallest (1 − 0.04)n coefficients of the signal (see (1)).

Proof: To prove this Theorem, we take the steps shown in Fig. 2. We import the best performance guarantees for LDPC codes under LP decoding [9] through the bridge established in [8]. First, through Theorem 2, we show that the code C has the fundamental cone property FCP(S, C_R) with probability at least 1 − O(e^{−αn}). Next, through Lemma 2, we demonstrate that C corrects the error configuration S at the output of the perturbed symmetric channel. Finally, by Theorem 3, we establish a connection between the properties of H_CC as a parity check matrix (i.e. the FCP condition) and its null space properties as a measurement matrix in compressed sensing. Consequently, we show that the solution of CS-LPD satisfies the ℓ_1/ℓ_1 condition.

Fig. 2. The procedure of importing the performance guarantee from LP decoding into compressed sensing: H_CC has FCP(S, C_R) with high probability (Thm. 2, via the argument of Arora et al.), hence C corrects the error pattern S (Lemma 2), hence H_CC ∈ NSP_R^<(S, C_R) (Lemma 3, the CC-CS bridge), hence the ℓ_1/ℓ_1 guarantee (Thm. 3).

Note that if the girth is g = Ω(log log(n)), then H_CC has the ℓ_1/ℓ_1 guarantee with probability higher than 1 − O(1/n). Theorem 1 is our main result, and we build the blocks necessary to complete its proof in the remainder of this paper.

A. Extension of CC-LPD

We introduce the perturbed symmetric channel (PSC) shown in Fig. 1, where each bit +1 (−1)¹ is flipped into −C_R (+C_R) with probability p and remains unchanged with probability 1 − p. For C_R = 1, the perturbed symmetric channel is the same as the binary symmetric channel (BSC). Note that the data can be received error free through the perturbed channel. Through solving CC-LPD, however, it is apparent that the recovery probability over the perturbed channel with C_R > 1 is less than over the BSC.

Fig. 1. Perturbed symmetric channel model: each input ±1 stays unchanged with probability 1 − p and is flipped to ∓C_R with probability p.

¹Note that the bits 0, 1 are mapped to +1, −1, respectively.

For a fixed code rate, the probabilistic analysis of [9] yields a (weak) threshold p* for the probability of bit flip, below which CC-LPD can recover the output of a binary bit flipping channel with high probability, provided that the code has girth at least doubly logarithmic in the size of the code. The resulting threshold is significantly tighter than the previously achieved bounds, namely those obtained in [24], which were established for expander graph based codes.

Building upon the work of Arora et al., we further prove that codes with optimal girth maintain the FCP property in a weak sense. In other words, we prove that if the probability of bit flip p is sufficiently small, there is a finite C_R > 1 such that, with high probability, the output of the perturbed symmetric channel of Figure 1 with this C_R can be recovered by CC-LPD. The parameter C_R naturally depends on p. We compute an upper bound on the achievable robustness factor C_R as a function of p, which approaches 1 as p approaches the recoverable threshold p* of LP decoding based on the analysis of [9]. Using the connection between CS-LPD and CC-LPD, this result allows us to explicitly find the relationship between the ℓ_1/ℓ_1 robustness factor C_R and the fraction p, when the results are translated to the context of compressed sensing.

In order to state the main theorem of this section, we first define η to be a random variable that takes the value −C_R with probability p and the value 1 with probability 1 − p, i.e. the output of the PSC when a 0 is transmitted. Also, let the sequences of random variables X_i, Y_i, i ≥ 0 be defined in the following way:

  Y_0 = η,
  X_i = min{Y_i^{(1)}, ..., Y_i^{(d_c−1)}},  ∀ i ≥ 0,
  Y_i = 2^i η + X_{i−1}^{(1)} + ··· + X_{i−1}^{(d_v−1)},  ∀ i > 0,  (6)

where the X_i^{(j)}'s (respectively Y_i^{(j)}'s) are independent copies of the random variable X_i (respectively Y_i).
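The recursion (6) lends itself to direct simulation: the Laplace transforms E[e^{−tX_j}] and the constants γ and c that appear in Theorem 2 below can be estimated by Monte Carlo. The sketch below is our own illustration; the bootstrap resampling, sample sizes, and function names are implementation choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def laplace_Xj(p, C_R, d_v, d_c, j, t, samples=100_000):
    """Monte Carlo estimate of E[exp(-t X_j)] for recursion (6):
    eta = -C_R w.p. p and 1 w.p. 1-p; Y_0 = eta;
    X_i = min of (d_c - 1) i.i.d. copies of Y_i;
    Y_i = 2^i * eta + sum of (d_v - 1) i.i.d. copies of X_{i-1}."""
    def eta(shape):
        return np.where(rng.random(shape) < p, -C_R, 1.0)
    # X_0: minimum over d_c - 1 independent copies of Y_0 = eta
    X = eta((samples, d_c - 1)).min(axis=1)
    for i in range(1, j + 1):
        # each of the d_c - 1 copies of Y_i gets its own eta and its own
        # bootstrap resample of d_v - 1 previous-level X values
        idx = rng.integers(0, samples, size=(samples, d_c - 1, d_v - 1))
        Y = (2.0 ** i) * eta((samples, d_c - 1)) + X[idx].sum(axis=2)
        X = Y.min(axis=1)
    return np.exp(-t * X).mean()

def c_value(p, C_R, d_v, d_c, j, t_grid):
    """Estimate the constants gamma and c of Theorem 2 below; c < 1 is
    the condition certifying the FCP with high probability."""
    gamma = (d_c - 1) * (C_R + 1) / C_R \
            * (C_R * p / (1 - p)) ** (1.0 / (C_R + 1)) * (1 - p)
    lap = min(laplace_Xj(p, C_R, d_v, d_c, j, t) for t in t_grid)
    return gamma ** (1.0 / (d_v - 2)) * lap
```

For instance, c_value(0.04, 1.08, 3, 6, j=4, t_grid=np.linspace(0.05, 2, 40)) estimates c for the (d_v, d_c) = (3, 6) setting discussed after Theorem 1; whether the estimate falls below 1 is exactly the condition Theorem 2 asks for.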

Theorem 2: Let C be a regular (d_v, d_c)-LDPC code with girth equal to g, let 0 ≤ p ≤ 1/2 be the probability of bit flip, and let S be the random set of flipped bits. If for some j ∈ N

  c = γ^{1/(d_v−2)} · min_{t≥0} E e^{−tX_j} < 1,

where γ = (d_c − 1) · (C_R + 1)/C_R · (C_R·p/(1 − p))^{1/(C_R+1)} · (1 − p) < 1, then with probability at least 1 − O(n) · c^{d_v(d_v−1)^{T−j−1}} the code C has the FCP(S, C_R), where T is any integer with j ≤ T < g/4.

Note that the value of c can be derived for specific values of p, d_v, d_c, and C_R. The proof of the above theorem falls along the same lines as the arguments of [9]. The bottom line is the existence of a certificate in the primal LP problem for the success of LP decoding, which can be extended to the case of the perturbed channel. In order to define the certificate, we first recall the following definitions from [9]. In the sequel, we denote the bipartite graph that corresponds to H by G.

Definition 5: A tree 𝒯 of height 2T is called a skinny subtree of G if it is rooted at some variable node v_{i_0}, for every variable node v in 𝒯 all the neighboring check nodes of v in G are also present in 𝒯, and for every check node c in 𝒯 exactly two neighboring variable nodes of c in G are present in 𝒯.

Definition 6: Let w ∈ [0,1]^T be a fixed vector. A vector β^{(w)} is called a minimal T-local deviation if there is a skinny subtree of G of height 2T, say 𝒯, so that for every variable node v_i, 1 ≤ i ≤ n,

  β_i^{(w)} = w_{h(i)} if v_i ∈ 𝒯 \ {v_{i_0}}, and β_i^{(w)} = 0 otherwise,

where h(i) = (1/2) · d(v_{i_0}, v_i).

The key to the derivation of a certificate for LP decoding is the following lemma:

Lemma 1 (Lemma 1 of [9]): For any vector z ∈ P and any positive vector w ∈ [0,1]^T, there exists a distribution on the minimal T-local deviations β^{(w)} such that

  E β^{(w)} = α z,

where 0 < α ≤ 1.

Lemma 1 has the following interpretation: if a linear property holds for all minimal T-local deviations (f(β^{(w)}) ≥ 0, where f(·) is a linear function), then it also holds for all pseudo-codewords (f(z) ≥ 0 for all z ∈ P). Interestingly enough, the success of LP decoding over the perturbed symmetric channel of Figure 1 for a given set of bit flips S has a linear certificate, namely FCP(S, C_R). In other words, if we define

  f_{C_R}^{(S)}(x) = Σ_{i∈S̄} x_i − C_R · Σ_{i∈S} x_i,  (7)

then the LP decoder is successful if and only if f_{C_R}^{(S)}(z) ≥ 0 for every pseudocodeword z ∈ P. Therefore, according to Lemma 1, it suffices that the condition be true for all T-local deviations, which is equivalent to FCP(S, C_R).

Proof of Theorem 2: We denote the sets of variable nodes and check nodes by X_v and X_c respectively. For a fixed w ∈ [0,1]^T, let B be the set of all minimal T-local deviations, and let B_i be the set of minimal T-local deviations defined over a skinny tree rooted at the variable node v_i. Also, let S be the random set of flipped bits when the flip probability is p. Interchangeably, we also use S to refer to the set of variable nodes corresponding to the flipped bit indices. We are interested in the probability that f_C^{(S)}(β^{(w)}) ≥ 0 for all β^{(w)} ∈ B. For simplicity we denote this event by f_C^{(S)}(B) ≥ 0. Since the bits are flipped independently and with the same probability, we have the following union bound:

  P(f_C^{(S)}(B) ≥ 0) ≥ 1 − n · P(f_C^{(S)}(B_1) ≤ 0).  (8)

Now consider the full tree of height 2T that is rooted at the node v_0 and contains every node u in G that is no more than 2T distant from v_0, i.e. d(v_0, u) ≤ 2T. We denote this tree by B(v_0, 2T). To every variable node u of B(v_0, 2T) we assign a label I(u), which is equal to −C_R·ω_{h(u)} if u ∈ S, and to ω_{h(u)} if u ∉ S, where (ω_0, ω_2, ..., ω_{2T−2}) = w. We can now see that the event f_C^{(S)}(B_1) ≥ 0 is equivalent to the event that for every skinny subtree 𝒯 of B(v_0, 2T) of height 2T, the sum of the labels on the variable nodes of 𝒯 is non-negative. In other words, if Γ_0 is the set of all skinny trees of height 2T that are rooted at v_0, then f_C^{(S)}(B_1) ≥ 0 is equivalent to

  min_{𝒯∈Γ_0} Σ_{v∈𝒯∩X_v} I(v) ≥ 0.  (9)

We assign to each node u (either check or variable node) of B(v_0, 2T) a random variable Z_u, which is equal to the contribution to the quantity min_{𝒯∈Γ_0} Σ_{v∈𝒯∩X_v} I(v) by the offspring of the node u in the tree B(v_0, 2T), together with the node u itself. The value of Z_u can be determined recursively from those of its children. Furthermore, the distribution of Z_u only depends on the height of u in B(v_0, 2T). Therefore, to find the distribution of Z_u, we use X_0, X_1, ..., X_{T−1} as random variables with the same distribution as Z_u when u is a variable node (X_0 is assigned to the lowest level variable nodes), and likewise Y_1, Y_2, ..., Y_{T−1} for the check nodes.

same distribution as Zu when u is a variable node (X0 Proof: Without loss of generality, we can assume is assigned to the lowest level variable node) and likewise that the all-zero codeword is transmitted. We begin by Y ,Y , ,Y − for the check nodes. It then follows that: proving the sufficiency. Let +1 be the log-likelihood ratio 1 2 ··· T 1 associated with a received 0, and let CR < 1 be the log- likelihood ratio associated with a received− 1.− Therefore, Y = ω η , 0 0 C ( (1) (dc−1) +1 if i Xi = min Y ,...,Y i > 0, { i i } ∀ λi = ∈ S . Y = ω η + X(1) + + X(dv −1) i > 0, (10) CR if i i i C i−1 ··· i−1 ∀ − ∈ S Then it follows from the assumptions in the lemma state- where X(j)’s are independent copies of a random variable ment that for any w (HCC) 0 X, and ηC is a random variable that takes the value CR ∈ K \{ } − X X with probability p and value 1 with probability 1 p. It λT w = (+1).w + ( C ).w − i − R i follows that i∈S i∈S (a) (b) T     = +1. wS 1 CR. wS 1 > 0 = λ 0, (S) (1) (dv ) k k − k k · P fC ( 1) 0 = P XT −1 + + XT −1 0 B ≤ ··· ≤ where step (a) follows from the fact that wi = wi for −tX d (E(e T −1 )) v . (11) all i (H ), and where step (b) follows| | from (5). ≤ ∈ I CC The last inequality is by Markov inequality and is true for Therefore, under CC-LPD the all-zero codeword has the all t > 0. The rest of the proof is modifications to the proof lowest cost function value when compared to all non-zero of Lemma 8 in [9], for the Laplace transform evolution of pseudo-codewords in the fundamental cone, and therefore the variables X s and Y s, to account for a non-unitary also compared to all non-zero pseudo-codewords in the i i fundamental polytope. robustness factor CR. By upper bounding the Laplace transform of the variables recursively it is possible to show Note that the proof of the converse is direct and can that (see Lemma 8 of [9], the argument is completely the easily be derived by taking the sufficiency proof steps, same for our case) backward.

(d −1)i−j B. Establishing the Connection − −tXj  v Ee tXi Ee ≤ In this section, through Lemma 3 (taken from [10]), − k Y −twi−kη(dv 1) (dc 1)Ee ,(12) we will establish a bridge between LP-decoding and com- − 0≤k≤i−j−1 pressed sensing. Specifically, using this bridge, we will import performance results from LP-decoding context into for all 1 j i < T . If we take the weight vector as compressed sensing. w = (1, 2,≤ ,≤2j, ρ, ρ, , ρ) for some integer 1 j < T , Lemma 3: (Lemma 6 in [10]): Let H be a zero-one and use equation··· (12)··· for i = T 1, we obtain: ≤ CC − measurement matrix. Then

−tX (d −1)T −j−1 ν (HCC) ν (HCC). Ee−tXT −1 (Ee j ) v ∈ N ⇒ | | ∈ K ≤ (dv−1)T −j−1−1 Proof: Let w ν and to show that such a vector −tρη dv −2 , (dc 1)Ee , | | · − w is indeed in the fundamental cone of HCC, we need ρ and t can be chosen to jointly minimize Ee−tXj and to verify (3) and (4). It is apparent that w satisfies (3). Ee−tρη in the above, which along with (11) results in Therefore, let us focus on the proof that w satisfies (4). Namely, from ν (HCC) it follows that for all j , S −tXT −1 dv P ∈ N P ∈ J P(fC ( 1 0)) (Ee ) i∈I hj,iνi = 0, i.e., for all j , i∈Ij νi = 0. This B ≤ ≤ T −j−1 ∈ J γ−dv /(dv −2) cdv (dv −1) , implies ≤ × where γ = (d 1) CR+1 (1 p)( CR.p )1/(CR+1) and c = c CR 1−p X X X 0 0 0 1/(dv −2) − −tXj − wi = νi = νi νi = wi , γ mint≥0 e . If c < 1, then probability of error E | | − 0 ≤ 0 | | 0 tends to zero as stated in Theorem 2. i ∈Ij \{i} i ∈Ij \{i} i ∈Ij \{i} In the next Lemma, we show that if a code has for all j and all i j, showing that w indeed satisfies the fundamental cone property FCP( ,C ), thenCCC- (4). ∈ J ∈ I S R LPD can correct the perturbed symmetric channel error From here on, we deal with HCC as a zero-one mea- configuration . surement matrix. Note that the bridge established in [8] S Lemma 2: Let HCC be a parity-check matrix of a code and through Lemma 3 connects CC-LPD of the binary and let (HCC) be a particular set of coordinate linear channel code and CS- LPD based on a zero-one indicesC thatS are ⊂ I flipped by a perturbed channel with cross- measurement matrix over reals by viewing this binary over probability p > 0. If and only if HCC has the parity-check matrix as a measurement matrix. This con- FCP( ,CR), then the solution of CC-LPD equals the nection allows the translation of performance guarantees codewordS that was sent. from one setup to the other. Using this bridge, we can show 6
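Lemma 3 is easy to sanity-check numerically: any real null space vector of a 0-1 matrix, taken entrywise in absolute value, must pass the fundamental cone test. A small sketch of our own, reusing the in_fundamental_cone helper from Section II:

```python
import numpy as np
from scipy.linalg import null_space

# Numerical illustration of Lemma 3: for a random real vector nu in the
# null space of a 0-1 matrix H, |nu| satisfies conditions (3)-(4).
rng = np.random.default_rng(2)
H = (rng.random((4, 8)) < 0.4).astype(float)  # a small 0-1 test matrix
N = null_space(H)                             # orthonormal basis of N(H)
nu = N @ rng.standard_normal(N.shape[1])      # random null space vector
assert np.allclose(H @ nu, 0)
print(in_fundamental_cone(H, np.abs(nu)))     # expected: True
```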

Using this bridge, we can show that parity-check matrices of "good" channel codes can be used as provably "good" measurement matrices under basis pursuit. In the next theorem, we show that the parity-check matrix H_CC, as a measurement matrix, has the nullspace property and as a result satisfies the ℓ_1/ℓ_1 guarantee.

Theorem 3: Let H_CC ∈ {0,1}^{m×n} be a parity-check matrix of the code C and let k be a non-negative integer. Further assume that the code C can correct the error configuration S over the perturbed symmetric channel whenever |S| ≤ k. Additionally, assume that s = H_CC · e. Then H_CC as a measurement matrix satisfies

  H_CC ∈ NSP_R^<(S, C_R).

Furthermore, the solution ê produced by CS-LPD will satisfy

  ‖e − ê‖_1 ≤ 2 · (C_R + 1)/(C_R − 1) · ‖e_{S̄}‖_1.

Proof: We begin by proving the nullspace property. Since the code C corrects the configuration S, by Lemma 2, for any point w in the fundamental cone K(H_CC) and any set S ⊂ {1, ..., n} with |S| ≤ k, we have

  C_R ‖w_S‖_1 < ‖w_{S̄}‖_1.  (13)

We prove by contradiction. Assume H_CC does not have the strict nullspace property NSP_R^<(S, C_R), i.e. there exists a point ν ∈ N(H_CC) such that

  C_R ‖ν_S‖_1 ≥ ‖ν_{S̄}‖_1.

Further, from Lemma 3, we know that w = |ν| lies in K(H_CC), and

  C_R ‖w_S‖_1 = C_R ‖|ν_S|‖_1 = C_R ‖ν_S‖_1 ≥ ‖ν_{S̄}‖_1 = ‖|ν_{S̄}|‖_1 = ‖w_{S̄}‖_1,

which contradicts (13) and shows that no such point ν exists.

We showed that H_CC has the claimed nullspace property. Since H_CC · e = s and H_CC · ê = s, it easily follows that ν ≜ e − ê is in the nullspace of H_CC. So

  ‖e_S‖_1 + ‖e_{S̄}‖_1 = ‖e‖_1
   (a)≥ ‖ê‖_1 = ‖e − ν‖_1 = ‖e_S − ν_S‖_1 + ‖e_{S̄} − ν_{S̄}‖_1
   (b)≥ ‖e_S‖_1 − ‖ν_S‖_1 + ‖ν_{S̄}‖_1 − ‖e_{S̄}‖_1
   (c)≥ ‖e_S‖_1 + (C_R − 1)/(C_R + 1) · ‖ν‖_1 − ‖e_{S̄}‖_1,  (14)

where step (a) follows from the fact that the solution of CS-LPD satisfies ‖ê‖_1 ≤ ‖e‖_1, step (b) follows from applying the triangle inequality property of the ℓ_1-norm twice, and step (c) follows from

  (C_R + 1) · (−‖ν_S‖_1 + ‖ν_{S̄}‖_1)
   = −C_R ‖ν_S‖_1 − ‖ν_S‖_1 + C_R ‖ν_{S̄}‖_1 + ‖ν_{S̄}‖_1
   (d)≥ −‖ν_{S̄}‖_1 − ‖ν_S‖_1 + C_R ‖ν_{S̄}‖_1 + C_R ‖ν_S‖_1
   = (C_R − 1) ‖ν_S‖_1 + (C_R − 1) ‖ν_{S̄}‖_1
   = (C_R − 1) ‖ν‖_1,

where step (d) follows from applying twice the fact that ν ∈ N(H_CC) and the assumption that H_CC ∈ NSP_R^≤(S, C_R). Subtracting the term ‖e_S‖_1 from both sides of (14) and solving for ‖ν‖_1 = ‖e − ê‖_1 yields the promised result.

This Theorem was the last piece in the proof of Theorem 1, and with it, the proof is complete.

Note that it is easy to obtain deterministic constructions of measurement matrices that are regular, have the optimal order of measurements, and have Ω(log n) girth. Therefore, a byproduct of our result is the explicit construction of measurement matrices with m = cn rows which can approximate almost all near-sparse signals in the ℓ_1/ℓ_1 sense, for sparsity k = p·n. These are parity-check matrices of codes with girth Ω(log n). To explicitly obtain such codes, we can use one of the known deterministic matrix constructions, such as the progressive edge-growth (PEG) Tanner graphs [17] or the deterministic construction suggested by Gallager [16].

References

[1] R. Berinde, A. Gilbert, P. Indyk, H. Karloff, and M. Strauss, Combining geometry and combinatorics: a unified approach to sparse signal recovery, in Proc. 46th Allerton Conf. on Communications, Control, and Computing, Allerton House, Monticello, IL, USA, Sept. 23–26, 2008.
[2] A. Gilbert and P. Indyk, A survey on sparse recovery using sparse matrices, Proceedings of the IEEE, June 2010.
[3] K. Do Ba, P. Indyk, E. Price, and D. Woodruff, Lower bounds for sparse recovery, in Proc. ACM-SIAM Symp. Discrete Algorithms (SODA), Jan. 2010.
[4] W. U. Bajwa, R. Calderbank, and S. Jafarpour, Why Gabor frames? Two fundamental measures of coherence and their role in model selection, submitted, June 2010.
[5] M. Capalbo, O. Reingold, S. Vadhan, and A. Wigderson, Randomness conductors and constant-degree lossless expanders, in Proc. 34th Annual ACM Symposium on Theory of Computing, 2002.
[6] A. Cohen, W. Dahmen, and R. DeVore, Compressed sensing and best k-term approximation, J. Amer. Math. Soc., vol. 22, pp. 211–231, July 2008.
[7] R. A. DeVore, Deterministic constructions of compressed sensing matrices, J. Complexity, vol. 23, no. 4–6, pp. 918–925, Aug. 2007.
[8] A. G. Dimakis, R. Smarandache, and P. O. Vontobel, LDPC codes for compressed sensing, submitted for publication. Available online: http://arxiv.org/abs/1012.0602
[9] S. Arora, C. Daskalakis, and D. Steurer, Message-passing algorithms and improved LP decoding, in Proc. 41st Annual ACM Symp. Theory of Computing, Bethesda, MD, USA, May 31–June 2, 2009.
[10] R. Smarandache and P. O. Vontobel, Absdet-pseudo-codewords and perm-pseudo-codewords: definitions and properties, in Proc. IEEE Int. Symp. Information Theory, Seoul, Korea, June 28–July 3, 2009.
[11] E. J. Candes and T. Tao, Decoding by linear programming, IEEE Trans. Inf. Theory, vol. 51, no. 12, pp. 4203–4215, Dec. 2005.
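As a rough end-to-end illustration (our own sketch, not from the paper), one can stack random permutation layers to obtain a (d_v, d_c)-regular 0-1 matrix as a stand-in for a girth-certified PEG [17] or Gallager [16] construction, measure a nearly sparse signal, and compare the basis pursuit recovery error with the tail term in the ℓ_1/ℓ_1 bound (1); basis_pursuit refers to the sketch in the introduction.

```python
import numpy as np

rng = np.random.default_rng(1)

def regular_01_matrix(n, d_v, d_c):
    """A (d_v, d_c)-regular 0-1 matrix from d_v stacked random permutations
    of a block-diagonal layer (a stand-in for PEG [17] or Gallager [16];
    girth is not certified here and should be checked separately)."""
    assert n % d_c == 0
    layer = np.zeros((n // d_c, n))
    for r in range(n // d_c):
        layer[r, r * d_c:(r + 1) * d_c] = 1.0
    return np.vstack([layer[:, rng.permutation(n)] for _ in range(d_v)])

n, k = 120, 5
H = regular_01_matrix(n, d_v=3, d_c=6)     # m = n/2 rows
e = 0.01 * rng.standard_normal(n)           # small non-sparse tail
e[rng.choice(n, k, replace=False)] += rng.standard_normal(k)
e_hat = basis_pursuit(H, H @ e)             # sketch from the introduction
tail = np.sort(np.abs(e))[:n - k].sum()     # ||e - best k-term approx||_1
print(np.abs(e - e_hat).sum(), tail)        # compare with bound (1)
```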

[12] D. Donoho, Compressed sensing, IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[13] D. Donoho and J. Tanner, Neighborliness of randomly-projected simplices in high dimensions, Proc. National Academy of Sciences, 102(27), pp. 9452–9457, 2005.
[14] D. L. Donoho and X. Huo, Uncertainty principles and ideal atomic decomposition, IEEE Trans. Inf. Theory, 47(7), 2001.
[15] D. L. Donoho, A. Maleki, and A. Montanari, Message passing algorithms for compressed sensing, Proc. Natl. Acad. Sci., 2009.
[16] R. G. Gallager, Low-Density Parity-Check Codes, M.I.T. Press, Cambridge, MA, 1963.
[17] X. Y. Hu, E. Eleftheriou, and D. M. Arnold, Regular and irregular progressive edge-growth Tanner graphs, IEEE Trans. Inf. Theory, vol. 51, no. 1, pp. 386–398, 2005.
[18] R. Koetter and P. O. Vontobel, On the block error probability of LP decoding of LDPC codes, in Proc. Inaugural Workshop of the Center for Information Theory and Applications, UC San Diego, La Jolla, CA, USA, Feb. 6–10, 2006.
[19] M. Stojnic, W. Xu, and B. Hassibi, Compressed sensing - probabilistic analysis of a null-space characterization, in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2008.
[20] J. Feldman, M. J. Wainwright, and D. R. Karger, Using linear programming to decode binary linear codes, IEEE Trans. Inf. Theory, vol. 51, no. 3, pp. 954–972, Mar. 2005.
[21] J. D. Blanchard, C. Cartis, and J. Tanner, Compressed sensing: how sharp is the restricted isometry property?, to appear in SIAM Review, 2010. Available online: http://arxiv.org/abs/1004.5026
[22] W. Xu and B. Hassibi, Compressed sensing over the Grassmann manifold: a unified analytical framework, in Proc. 46th Allerton Conf. on Communications, Control, and Computing, Allerton House, Monticello, IL, USA, Sept. 23–26, 2008.
[23] M. Stojnic, W. Xu, and B. Hassibi, Compressed sensing - probabilistic analysis of a null-space characterization, in Proc. IEEE Intern. Conf. Acoustics, Speech and Signal Processing, Las Vegas, NV, USA, Mar. 31–Apr. 4, 2008, pp. 3377–3380.
[24] C. Daskalakis, A. G. Dimakis, R. M. Karp, and M. J. Wainwright, Probabilistic analysis of linear programming decoding, IEEE Trans. Inf. Theory, 2008.
[25] P. O. Vontobel and R. Koetter, Graph-cover decoding and finite-length analysis of message-passing iterative decoding of LDPC codes, accepted for IEEE Trans. Inform. Theory, 2007. Available online: http://www.arxiv.org/abs/cs.IT/0512078
[26] A. Khajehnejad, A. G. Dimakis, B. Hassibi, and W. Bradley, Iterative reweighted LP decoding, in Proc. Allerton Conference, 2010.
[27] W. Xu and B. Hassibi, Compressed sensing over the Grassmann manifold: a unified analytical framework, in Proc. 46th Allerton Conf. on Communications, Control, and Computing, Allerton House, Monticello, IL, USA, Sept. 23–26, 2008.
[28] F. Zhang and H. D. Pfister, On the iterative decoding of high rate LDPC codes with applications in compressed sensing, submitted, Mar. 2009. Available online: http://arxiv.org/abs/0903.2232