
Robustifying the Sparse Walsh-Hadamard Transform without Increasing the Sample Complexity of O(K log N)

Xiao Li, Joseph Kurata Bradley, Sameer Pawar and Kannan Ramchandran
Dept. of Electrical Engineering and Computer Sciences, U.C. Berkeley.

Abstract—The problem of computing a K-sparse N-point Walsh-Hadamard Transform (WHT) from noisy time domain samples is considered, where K = O(N^α) scales sub-linearly in N for some α ∈ (0, 1). A robust algorithm is proposed to recover the sparse WHT coefficients in a stable manner in the presence of additive Gaussian noise. In particular, it is shown that the K-sparse WHT of the signal can be reconstructed from noisy time domain samples with an error probability ε_N that vanishes to zero, using the same sample complexity O(K log N) as in the noiseless case.

I. INTRODUCTION

The Walsh-Hadamard Transform (WHT) has been widely deployed in image coding [1], spreading code design in multiuser systems such as CDMA and GPS [2], and compressive sensing [3]. The WHT may be computed using N samples and N log N operations via a recursive algorithm [4], [5] analogous to the Fast Fourier Transform (FFT). However, these costs can be significantly reduced if the signal is sparse in the WHT domain, as is true in many real world scenarios [6], [7].

Since the WHT is a special case of the multidimensional DFT over the finite field F_2^n, recent advances in computing K-sparse N-point Fourier transforms have provided insights for designing algorithms that compute sparse WHTs. There has been much recent work on computing a sparse Discrete Fourier Transform (DFT) [8]–[13]. Among these works, the Fast Fourier Aliasing-based Sparse Transform (FFAST) algorithm proposed in [13] uses O(K) samples and O(K log K) operations for any sparsity regime K = O(N^α) with α ∈ (0, 1) under a uniform sparsity distribution. Following the sparse-graph decoding design in [13] for DFTs, the Sparse Fast Hadamard Transform (SparseFHT) algorithm developed in [14] computes a K-sparse N-point WHT with K = O(N^α) using O(K log N) samples. Since K is sub-linear in N, their results can be interpreted as achieving a sample complexity of O(K log N). However, the algorithm specifically exploits the noiseless nature of the underlying signals and hence fails to work in the presence of noise.

In this paper, we consider the problem of computing a K-sparse N-point WHT in the presence of noise. A key question of theoretical and practical interest is: what price must be paid to be robust to noise? Surprisingly, there is no cost in sample complexity to being robust to noise, other than a constant factor. Specifically, we develop a robust algorithm which uses O(K log N) samples and has strong performance guarantees. We prove that our algorithm can recover the sparse WHT at constant signal-to-noise ratios (SNRs) with the same O(K log N) samples as for the noiseless case in [14]. This result contrasts with the DFT work in [15], for which robustness to noise increases the sample complexity from O(K) to O(K log N).

The rest of the paper is organized as follows. In Section II, we provide the problem formulation along with the signal and noise models. Section III presents our main results and a brief comparison with related literature. In Section IV, we explain the proposed front-end architecture for acquiring samples and the robust decoding algorithm using a simple example. In Section V, we provide simulation results which empirically validate the performance of our algorithm.

Notations: Throughout this paper, the set of integers {0, 1, ..., N−1} for some integer N is denoted by [N]. Lowercase letters, such as x, are used for time domain expressions and capital letters, such as X, are used for the transform domain signal. Any letter with a bar, such as x̄ or X̄, represents a vector containing the corresponding samples. Given a real-valued vector v̄ ∈ R^N with N = 2^n, the i-th entry of v̄ is interchangeably represented by v[i], indexed by the decimal representation of i, or by v_{i_0, i_1, ..., i_{n−1}}, indexed by the binary representation of i, where i_0, i_1, ..., i_{n−1} denotes the binary expansion of i with i_0 and i_{n−1} being the least significant bit (LSB) and the most significant bit (MSB), respectively. The notation F_2 refers to the finite field consisting of {0, 1}, with summation and multiplication defined modulo 2. Furthermore, we let F_2^n be the n-dimensional vector space with each element from F_2 and the addition of vectors done element-wise over this field.
The inner product of two binary indices i and j is defined by ⟨i, j⟩ = Σ_{t=0}^{n−1} i_t j_t with arithmetic over F_2, and the inner product between two vectors is defined as ⟨x̄, ȳ⟩ = Σ_{t=1}^{N} x[t] y[t] with arithmetic over R.

II. SIGNAL MODEL AND PROBLEM FORMULATION

Consider a signal x̄ ∈ R^N containing N = 2^n samples x_m indexed with elements m ∈ F_2^n, and the corresponding WHT X̄ ∈ R^N containing N coefficients X_k with k ∈ F_2^n. The N-dimensional WHT X̄ of the signal x̄ is given by

  X_k = (1/√N) Σ_{m ∈ F_2^n} (−1)^{⟨k,m⟩} x_m,        (1)

where k ∈ F_2^n denotes the corresponding index in the transform domain.
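As a concrete reference for (1), the following minimal NumPy sketch (our own illustration, not the authors' code; the function names are ours) computes the WHT both directly from the definition and via the N log N butterfly recursion mentioned in the introduction.

```python
import numpy as np

def wht_direct(x):
    """Direct evaluation of (1): X_k = (1/sqrt(N)) sum_m (-1)^<k,m> x_m."""
    N = len(x)
    X = np.zeros(N)
    for k in range(N):
        for m in range(N):
            # <k,m> over F_2 is the parity of the bitwise AND of k and m
            X[k] += (-1) ** bin(k & m).count("1") * x[m]
    return X / np.sqrt(N)

def fwht(x):
    """Fast WHT via the butterfly recursion: N log N operations."""
    X = np.array(x, dtype=float)
    h = 1
    while h < len(X):
        for i in range(0, len(X), 2 * h):
            a, b = X[i:i + h].copy(), X[i + h:i + 2 * h].copy()
            X[i:i + h], X[i + h:i + 2 * h] = a + b, a - b
        h *= 2
    return X / np.sqrt(len(X))

x = np.random.randn(64)
assert np.allclose(wht_direct(x), fwht(x))   # the two agree
```

Since the normalized transform is orthonormal and symmetric, `fwht` is its own inverse, which is convenient for generating test signals from a sparse X̄.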

We assume the WHT is a sub-linearly sparse signal with K = N^α non-zero coefficients X_k, k ∈ K, for some α ∈ (0, 1).

Previous analysis [14] assumes exact measurements of the time-domain signal x̄. We generalize this setting by using noise-corrupted measurements:

  y_m = x_m + w_m,        (2)

where w_m ∼ N(0, σ²) is Gaussian noise added to the clean samples x_m. The SparseFHT algorithm [14] no longer works in the presence of noise. Therefore, the focus of this paper is to develop a robust algorithm which can compute the sparse WHT coefficients {X_k}_{k∈K} reliably from the noisy samples y_m with the same sample complexity as in the noiseless case.

III. RELATED WORK AND OUR RESULTS

In this section, we first frame our results in the context of previous work on recovering sparse transforms. We then summarize our main results.

A. Related Work

Due to the similarities between the Discrete Fourier Transform (DFT) and the WHT, we give a brief account of previous work on reducing the sample and computational complexity of computing a K-sparse N-point DFT. The works [8], [9] developed randomized sub-linear time algorithms that achieve near-optimal sample and computational complexities of O(K log N), with potentially large big-Oh constants [11]. Then, [10] further improved the algorithm for 2-D Discrete Fourier Transforms (DFTs) with K = √N, which reduces the sample complexity to O(K) and the computational complexity to O(K log K), albeit with a constant failure probability that does not vanish as the signal dimension N grows. On this front, the deterministic algorithm in [12] is shown to guarantee zero errors, but with complexities of O(poly(K, log N)).

A major improvement in terms of both complexities is given by the FFAST algorithm [13], which achieves a vanishing failure probability using only O(K) samples and O(K log K) operations for any sparsity regime K = O(N^α) with α ∈ (0, 1). The success of the FFAST algorithm is due to peeling-based decoding over sparse graphs, which depends on the single-ton test to pinpoint the "parity" Fourier bin containing only one "erasure event" (unknown non-zero DFT coefficient). Given such a single-ton bin, the value and location of the coefficient can be obtained and then removed from other "parity" bins. This procedure iterates until no more single-ton bins are found.

Inspired by [13], the SparseFHT algorithm in [14] computes a K-sparse WHT of x̄ using O(K log N) samples and O(K log² N) operations.¹ The tenet of the algorithm is again to intelligently subsample the multidimensional signal to create hashing/aliasing patterns in the transform domain bins. Similar to the single-ton test in [13], the SparseFHT algorithm critically relies on a collision detection module to identify parity bins which contain only one unknown WHT coefficient. Since both the single-ton test in [13] and the collision detection in [14] specifically exploit the noiseless nature of the signals, they cannot be used in the noisy setting without major algorithmic changes. Our work fills this gap by developing a sparse WHT algorithm which is robust to noise.

¹In [14], the result suggests a requirement of O(K log(N/K)) samples and O(K log K log(N/K)) operations, which is equivalent to O(K log N) and O(K log² N), respectively, since K = N^α for a fixed constant α ∈ (0, 1).

B. Our Results

We now summarize our main results on recovering a K-sparse N-point WHT of a signal from noisy time domain samples. For our analysis, we make the following assumptions:
• The support of the non-zero WHT coefficients is uniformly random in the set [N].
• The unknown WHT coefficients take values from ±ρ.
• The signal-to-noise ratio SNR = ρ²/(Nσ²) is fixed.
The first assumption is critical to analyzing the peeling decoder. The next two assumptions merely simplify the analysis.

Theorem 1. For any sub-linear sparsity regime K = O(N^α) with α ∈ (0, 1), our robust algorithm based on the randomized hashing front-end (Section IV-A) and the associated peeling-based decoder (Section IV-B) can stably compute the WHT X̄ of any signal x̄ in the presence of noise w ∼ N(0, σ² I_{N×N}), with the following properties:
1) Sample complexity: The algorithm needs O(K log N) noisy samples y_m.
2) Computational complexity: The algorithm requires O(N log² N) operations.
3) Probability of success: The algorithm successfully computes the K-sparse WHT X̄ with probability at least 1 − ε_N, where ε_N vanishes as N grows.

Proof. See Appendix A.

Importantly, the proposed robust algorithm can compute the sparse WHT using O(K log N) samples, i.e., no more than the SparseFHT algorithm [14] developed for the noiseless case. The overhead in moving from the noiseless to the noisy regime is only in the extra computational complexity.
IV. STABLE FAST WALSH-HADAMARD TRANSFORM VIA ROBUST SPARSE GRAPH DECODING

We now describe our randomized hashing front-end architecture and the associated peeling-based decoding algorithm for computing a K-sparse N-point WHT, which we then connect to the framework of decoding over sparse-graph codes.

A. Randomized Hashing

Our algorithm is based on subsampling to create aliasing patterns, similarly to the SparseFHT algorithm in [14] and the FFAST algorithm in [13]. After subsampling down to B = O(K) time domain samples, computing the corresponding B-point WHT creates "bins" of coefficients from the original N-point WHT. Each of these hashed (aliased) WHT coefficients (bins) is composed of zero, one, or several coefficients from the original WHT.

Each subsample of B time domain samples is called a stage; the hashing front-end consists of C stages, each of which uses a different subsampling matrix Ψ_c ∈ F_2^{n×b} of rank b.
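For intuition about how a subsampling matrix partitions the coefficients, here is a small sketch (our own illustration; the specific Ψ_0, which keeps the two LSBs, is an assumption matching the example of Fig. 1 below) computing the bin index h_c(k) = Ψ_c^T k, as formalized in Proposition 1 below, for a few locations.

```python
import numpy as np

n, b = 6, 2                      # N = 64 signal hashed into B = 4 bins

# Assumed stage-0 matrix: identity on the 2 LSBs, zeros elsewhere,
# so the subsampler varies the LSBs and freezes the 4 MSBs.
Psi0 = np.zeros((n, b), dtype=int)
Psi0[:b, :b] = np.eye(b, dtype=int)

def bits(i, length):
    """LSB-first binary expansion of i as an F_2 vector."""
    return np.array([(i >> t) & 1 for t in range(length)])

def hash_bin(Psi, k, n):
    """Bin index h_c(k) = Psi^T k over F_2, returned as a decimal."""
    j = Psi.T.dot(bits(k, n)) % 2
    return int(sum(int(v) << t for t, v in enumerate(j)))

for k in [4, 8, 17, 62]:         # the non-zero locations used below
    print(k, "-> bin", hash_bin(Psi0, k, n))
```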

The subsample for each stage c is shifted by P different binary patterns d_{c,p} ∈ F_2^n, p ∈ [P]; we call these "substreams." The key change from [13], [14] is that, rather than using deterministic shifts, we use randomized shifts, which make our algorithm robust to noise. We summarize the subsampling and shifting procedure in Algorithm 1. The following proposition describes how original WHT coefficients are hashed to bins.

Algorithm 1 Subsampling and Shifting
Input: Noisy time domain samples ȳ ∈ R^N; subsampling matrix Ψ_c ∈ F_2^{n×b}.
Set: Random shifts d_{c,p}, where c ∈ [C], p ∈ [P].
return Length-B time-domain vector indexed by ℓ ∈ F_2^b:

  u_{c,p}[ℓ] = √(N/B) · y_{Ψ_c ℓ + d_{c,p}}        (3)

[Fig. 1 here: the randomized hashing front-end for the running example. Stage 0 subsamples (y[0], ..., y[63]) into substreams (y[0], ..., y[3]), (y[4], ..., y[7]), (y[8], ..., y[11]), ... with shifts d_{0,1} = [4]_2, d_{0,2} = [8]_2, d_{0,3} = [12]_2, d_{0,4} = [16]_2; stage 1 forms (y[0], y[4], y[8], y[12]), (y[1], y[5], y[9], y[13]), ... with shifts d_{1,1} = [1]_2, d_{1,2} = [2]_2, d_{1,3} = [16]_2, d_{1,4} = [32]_2. Each substream passes through a 4-point WHT into bins 0–3, which feed the peeling decoder that outputs X̄.]

Fig. 1. Consider a length N = 2^n = 64 signal x̄ that has a K = 2^b = 4 sparse WHT, i.e., n = 6, b = 2. The signal x̄ is processed through a 2-stage FHT architecture. In general there are 3 or more stages, but for purposes of illustration we show an example architecture with 2 stages. The subsampling operation is performed using the matrix Ψ_0 in stage 0 and Ψ_1 in stage 1. The first sampling matrix Ψ_0 freezes the 4 MSBs, while the second sampling matrix Ψ_1 freezes the 2 MSBs and 2 LSBs. Various shifts are implemented by shifting the signal using the 6-bit patterns d_{c,p}, where d_{c,p} is a random shift. The peeling decoder then synthesizes the big WHT X̄ from the short WHTs of each of the subsampled data streams.
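In code, Algorithm 1 is a gather followed by a scaling. The sketch below (ours; it assumes the LSB-first index convention of Section I) produces the substream of (3); its B-point WHT, computable with the hypothetical `fwht` helper shown earlier, then gives the hashed observations of Proposition 1 below.

```python
import numpy as np

def substream(y, Psi, d, n, b):
    """Algorithm 1: u_{c,p}[l] = sqrt(N/B) * y[Psi l + d] for l in F_2^b."""
    N, B = 1 << n, 1 << b
    d_bits = np.array([(d >> t) & 1 for t in range(n)])
    u = np.zeros(B)
    for l in range(B):
        l_bits = np.array([(l >> t) & 1 for t in range(b)])
        m = (Psi.dot(l_bits) + d_bits) % 2        # index Psi*l + d over F_2
        u[l] = y[int(sum(int(v) << t for t, v in enumerate(m)))]
    return np.sqrt(N / B) * u

# Hashed observations for substream (c, p), as used in Proposition 1:
# U_cp = fwht(substream(y, Psi_c, d_cp, n, b))
```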

Proposition 1. (Randomized Hashing) Suppose that in the p-th substream of the c-th stage, the noisy time domain samples y_m are subsampled by Ψ_c ∈ F_2^{n×b} and shifted by d_{c,p}, as in Algorithm 1. Let u_{c,p}[ℓ] be the resulting length-B time-domain vector. Then the B-point WHT of u_{c,p}[ℓ] may be written as

  U_{c,p}[j] = Σ_{k ∈ F_2^n : h_c(k) = j} X_k (−1)^{⟨d_{c,p}, k⟩} + ξ_{c,p}[j],        (4)

where h_c(k) = Ψ_c^T k denotes the hash function and where

  ξ_{c,p}[j] = (√N / B) Σ_{ℓ ∈ F_2^b} (−1)^{⟨j,ℓ⟩} w_{Ψ_c ℓ + d_{c,p}}        (5)

is the compound Gaussian noise, with ξ_{c,p}[j] ∼ N(0, Nσ²/B).

Proof. The proof follows from the properties of the WHT, similarly to [14], and hence is omitted here.

The hash function h_c(k) maps the original WHT coefficient X_k to the hash bin j. The shifts d_{c,p} change the sign of the contribution of each original coefficient X_k to its bin.

1) A Simple Example: A simple example of the randomized hashing front-end is shown in Fig. 1 with N = 2^n = 64 and sparsity level K = B = 2^b = 4. Suppose the 4 non-zero WHT coefficients of the signal X̄ are X[4], X[8], X[17] and X[62]. Here the decimal representation X[k] of X_k is used for convenience: e.g., X[4] = X_{000100}. The randomized hashing front-end subsamples the input signal and its shifted versions through C = 2 stages. The signal is shifted using the random 6-bit patterns d_{c,p}.

For illustration, we show substream p = 1 in stage c = 0, where the associated random shift is chosen as d_{0,1} = [4]_2 = 000100. The subsampling matrix for stage c = 0 is Ψ_0 = [0_{4×2}^T, I_{2×2}^T]^T, which freezes the 4 MSBs. Thus, substream p = 1 in this stage is obtained as

  u_{0,1}[0] = y_{000000 + d_{0,1}} = y[4],  u_{0,1}[1] = y_{000001 + d_{0,1}} = y[5],
  u_{0,1}[2] = y_{000010 + d_{0,1}} = y[6],  u_{0,1}[3] = y_{000011 + d_{0,1}} = y[7].

The second sampling matrix Ψ_1 freezes the 2 MSBs and 2 LSBs; its subsampled outputs are shown in Fig. 1. These substreams, each containing B = 2^b = 4 subsamples, are then passed to a B-point WHT to obtain the hash observations. The output of the short WHT for this particular substream (c = 0, p = 1) is:

  U_{0,1}[0] = X[0] − X[4] + ··· − X[60] + ξ_{0,1}[0]
  U_{0,1}[1] = X[1] − X[5] + ··· − X[61] + ξ_{0,1}[1]
  U_{0,1}[2] = X[2] − X[6] + ··· − X[62] + ξ_{0,1}[2]
  U_{0,1}[3] = X[3] − X[7] + ··· − X[63] + ξ_{0,1}[3].

2) General Case using Bin-Measurement Matrices: For the general case, the hash outputs of the P substreams in each stage can be stacked in a length-P vector Ū_c[j] ≜ [U_{c,1}[j], ···, U_{c,P}[j]]^T for each bin j and expressed as

  Ū_c[j] = G_c[j] X̄ + ξ̄_c[j],        (6)

for c ∈ [C] and j ∈ [B], where ξ̄_c[j] ∼ N(0, (N/B)σ² I_{P×P}) and G_c[j] is the bin measurement matrix defined by the random shifts as well as the subsampling operator Ψ_c. Let

  ḡ_k = [(−1)^{⟨d_{c,1},k⟩}, ···, (−1)^{⟨d_{c,P},k⟩}]^T ∈ {−1, 1}^P        (7)

be the signature associated with a particular WHT coefficient X_k. Then the k-th column of the bin measurement matrix G_c[j] ∈ {−1, 0, 1}^{P×N} of bin j in stage c is given by

  [G_c[j]]_{(:,k)} = ḡ_k if Ψ_c^T k = j, and 0_{P×1} otherwise.        (8)

Therefore, the output Ū_c[j] of the randomized hashing front-end at a given bin j of stage c is a compressed measurement of the unknown sparse WHT vector X̄, with the random bin measurement matrix G_c[j] in (6).
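A quick numerical check of (4) ties the pieces together. The sketch below (ours, reusing the hypothetical `fwht`, `substream`, `hash_bin`, and `Psi0` helpers defined earlier; the parameters are the example's, assumed for illustration) compares the hashed observations against the signed sums Σ X_k (−1)^{⟨d,k⟩} over each bin; they agree up to the compound noise of variance Nσ²/B.

```python
import numpy as np

n, b = 6, 2
N, B = 1 << n, 1 << b
rng = np.random.default_rng(0)

X = np.zeros(N)
X[[4, 8, 17, 62]] = 5.0           # the example's non-zero coefficients
x = fwht(X)                       # normalized WHT is its own inverse
y = x + 0.1 * rng.standard_normal(N)

d = 4                             # the example's shift d_{0,1} = [4]_2
U = fwht(substream(y, Psi0, d, n, b))

pred = np.zeros(B)                # right-hand side of (4), noise excluded
for k in range(N):
    pred[hash_bin(Psi0, k, n)] += X[k] * (-1) ** bin(d & k).count("1")

print(np.round(U, 2))             # approx. pred plus noise of std ~0.4
print(np.round(pred, 2))
```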

Thus, each stage divides the computation of the sparse WHT into multiple sparse recovery problems.

To analyze recovery performance, we must place restrictions on G_c[j] to ensure that it is a "good" measurement matrix. Here we adopt the criterion of mutual coherence µ between different codewords ḡ_k in the non-zero columns of G_c[j], defined as

  µ = max_{k ≠ k'} (1/P) |ḡ_k^T ḡ_{k'}|.        (9)

[Fig. 2 here: bipartite graph with variable nodes X[4], X[8], X[17], X[62] on the left and the stage-0 and stage-1 check nodes (bins 00, 01, 10, 11) on the right.]

Fig. 2. A 2-left-regular-degree bipartite graph representing the relation between the unknown non-zero WHT coefficients and the observations obtained through the architecture shown in Fig. 1, for the 64-point example signal x̄. Variable (left) nodes correspond to the non-zero WHT coefficients and the check (right) nodes are the observations.

Algorithm 2 Robust Bin Identification
Input: Noisy observations Ū_c[j] ∈ R^P for bin j in stage c.
Set: Parameters γ > 0 and L ≥ 1.
if ||Ū_c[j]||²/P ≤ (1 + γ)ν² then
  return Bin j is a zero-ton
else if ||Ū_c[j]||²/P ≥ (1 + γ)(Lρ² + ν²) then
  return Bin j is a multi-ton
else
  for each k ∈ F_2^n with h_c(k) = j do
    Obtain the MLE of the coefficient:
      X̂_k = (1/P) ḡ_k^T Ū_c[j].        (11)
  end for
  Identify the best coefficient:
      k̂ = arg min_k ||Ū_c[j] − X̂_k ḡ_k||²        (12)
  if ||Ū_c[j] − X̂_{k̂} ḡ_{k̂}||²/P ≤ (1 + γ)ν² then
    return Coefficient index k̂ and value X̂_{k̂}
  else
    return Unable to identify bin
  end if
end if
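The mutual coherence (9) of the random signatures is easy to probe empirically. The following sketch (ours; the sizes are arbitrary illustrative choices) draws P random shifts, forms the signatures of (7), and evaluates µ, whose decay with P is what Lemma 1 below quantifies.

```python
import numpy as np

def coherence(n, P, locations, rng):
    """Empirical mutual coherence (9) of the signatures in (7)."""
    d = rng.integers(0, 1 << n, size=P)      # P random shifts d_{c,p}
    # G[p, i] = (-1)^<d_p, k_i> for each candidate location k_i
    G = np.array([[(-1) ** bin(int(dp) & int(k)).count("1")
                   for k in locations] for dp in d])
    C = np.abs(G.T @ G) / P                  # all pairwise |g_k^T g_k'| / P
    np.fill_diagonal(C, 0)                   # ignore the diagonal (k = k')
    return C.max()

rng = np.random.default_rng(1)
locations = rng.choice(1 << 14, size=64, replace=False)
for P in [16, 64, 256, 1024]:
    print(P, coherence(14, P, locations, rng))   # coherence shrinks with P
```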

Intuitively, µ measures the similarity between codewords, and a good measurement matrix should have codewords that are as mutually different as possible. We use the following bound on the mutual coherence:

Lemma 1. The mutual coherence µ is bounded by

  µ ≤ 1/(2(L + 1))        (10)

for some positive integer L = O(1), with probability at least 1 − O(N^{−2}), as long as the number of random shifts is at least P ≥ 24(L + 1)² log N.

Proof. See Appendix B.

B. Peeling-based Decoder for Stable WHT Recovery

So far, we have described how WHT coefficients are hashed to bins. We now describe how those coefficients are identified and recovered. After explaining the structure of our decoding problem by relating it to sparse graph codes (following [13]), we discuss how to identify coefficients and iteratively "peel" them from the bins.

1) Sparse Graph Codes with Randomized Check Nodes: Each hash observation Ū_c[j] consists of randomized linear combinations of the unknown WHT coefficients {X_k}_{k∈K} in bin j. In terms of sparse graph codes, we need to identify a set of K erasures (the coefficients {X_k}). These erasures/coefficients correspond to K variable nodes, and the observations correspond to the check nodes, for stages c ∈ [C]. We illustrate this graph decoding problem in Fig. 2, continuing the example from Fig. 1. The degree of each check node Ū_c[j] depends on how many non-zero coefficients X_k are hashed into bin j in stage c. Our next goal will be to identify the degree of the check nodes, which we categorize as zero-ton bins (no non-zero coefficients), single-ton bins (one non-zero coefficient), and multi-ton bins (multiple non-zero coefficients).

2) Robust Identification of Single-ton Bins: We briefly describe our tests for zero/single/multi-tons, summarized in Algorithm 2. We prove that these tests succeed with high probability in the appendix. For simplicity, we assume that the signal strength ρ and the (hashed) noise variance ν² = Nσ²/B are known.

For each type of bin, the observation in bin (c, j) takes the form:

  Ū_c[j] = ξ̄_c[j]        (zero-ton)        (13)
  Ū_c[j] = X_k ḡ_k + ξ̄_c[j]        (single-ton)        (14)
  Ū_c[j] = Σ_{k ∈ nonzeros(c,j)} X_k ḡ_k + ξ̄_c[j]        (multi-ton)        (15)

For zero-tons and large multi-tons, we can expect the energy ||Ū_c[j]||² to be small and large, respectively, relative to the energy of a single-ton. Algorithm 2 uses this idea to eliminate zero-tons and large multi-tons. To locate single-tons and distinguish them from small multi-tons, Algorithm 2 uses a Maximum Likelihood Estimate (MLE) test. For each of the N/B possible coefficient locations k (for a fixed bin (c, j)), we obtain the MLE of X_k as

  X̂_k = (1/P) ḡ_k^T Ū_c[j].        (16)

We choose among the locations by finding the location k̂ which minimizes the residual energy:

  k̂ = arg min_k ||Ū_c[j] − X̂_k ḡ_k||².        (17)
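Algorithm 2, together with the MLE of (16) and the residual search of (17), can be rendered compactly as follows (our own sketch; the default γ and L are illustrative, and `candidates` is assumed to list the N/B locations hashing to the bin).

```python
import numpy as np

def identify_bin(U, shifts, candidates, rho, nu2, gamma=0.5, L=2):
    """Algorithm 2: classify one bin as zero-ton / multi-ton / single-ton.

    U          : length-P vector of hashed observations for bin (c, j)
    shifts     : the P random shifts d_{c,p} of this stage
    candidates : locations k with h_c(k) = j
    rho, nu2   : signal magnitude and hashed noise variance N sigma^2 / B
    """
    P = len(U)
    energy = U.dot(U) / P
    if energy <= (1 + gamma) * nu2:
        return ("zero-ton", None, None)
    if energy >= (1 + gamma) * (L * rho**2 + nu2):
        return ("multi-ton", None, None)
    best = None
    for k in candidates:
        g = np.array([(-1) ** bin(int(d) & int(k)).count("1") for d in shifts])
        Xk = g.dot(U) / P                    # MLE of the coefficient, (16)
        res = U - Xk * g
        r2 = res.dot(res)                    # residual energy, (17)
        if best is None or r2 < best[0]:
            best = (r2, k, Xk)
    if best[0] / P <= (1 + gamma) * nu2:     # final residual test
        return ("single-ton", best[1], best[2])
    return ("unidentified", None, None)
```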
3) Iterative Decoding: Once we identify a single-ton bin's coefficient (location and value), we can subtract its contribution from other bins, possibly creating new single-tons. We detail this iterative method in Algorithm 3. Barring the zero/single/multi-ton testing, peeling may be analyzed the same way as in [13], [14].

Let bin j_c be a single-ton detected in stage c, and let the associated non-zero location be k̂ and the coefficient estimate be X̂_{k̂} as in (16). For each stage c' ∈ [C], the coefficient X̂_{k̂} contributes to the bins j_{c'} for which

  j_{c'} = Ψ_{c'}^T k̂, c' ∈ [C].        (18)

We remove X̂_{k̂} from a bin j_{c'} by updating the bin values as

  U_{c',p}(j_{c'}) ← U_{c',p}(j_{c'}) − X̂_{k̂} (−1)^{⟨d_{c',p}, k̂⟩}, ∀p.        (19)

This whole process iterates until no more single-tons are found and the K non-zero coefficients have been decoded.

Algorithm 3 Peeling-based Decoding Algorithm
Require: # of hashing stages C; # of peeling iterations I; sets of random shifts d_{c,p} for each substream p ∈ [P] and stage c ∈ [C].
Ensure: Hash block size B = O(K) and P = β log N for some sufficiently large constant β.
Given: Noisy sequence ȳ = x̄ + ξ̄ ∈ R^N with N = 2^n, where the WHT of x̄ has (unknown) sparsity K.
for c = 0 to C − 1 do
  for p = 0 to P − 1 do

    u_{c,p}[ℓ] = √(N/B) y_{Ψ_c ℓ + d_{c,p}}, ℓ ∈ F_2^b        (20)
    U_{c,p}[j] = WHT_B[u_{c,p}[ℓ]], j ∈ F_2^b        (21)

  end for
end for
for i = 1 to I do
  for c = 0 to C − 1 do
    for j = 0 to B − 1 do
      Perform the robust bin identification of Algorithm 2. If not a single-ton, continue to the next j.
      Obtain the estimated index k̂ and coefficient X̂_{k̂}.
      Infer the participating bins: j_{c'} = Ψ_{c'}^T k̂, c' ∈ [C].
      Peel off:
      for c' = 0 to C − 1 do
        Remove the contribution of the single-ton:
          U_{c',p}(j_{c'}) ← U_{c',p}(j_{c'}) − X̂_{k̂} (−1)^{⟨d_{c',p}, k̂⟩}, ∀p ∈ [P].
      end for
    end for
  end for
end for
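For completeness, a minimal end-to-end peeling loop in the spirit of Algorithm 3 is sketched below (ours, under the conventions of the earlier snippets: it assumes the hypothetical `fwht`, `substream`, `hash_bin`, and `identify_bin` helpers, and stage matrices `Psis` of full rank b; error handling and the stopping rule are simplified).

```python
import numpy as np

def peeling_decode(y, Psis, D, n, b, rho, sigma2, I=10):
    """Sketch of Algorithm 3: hash via (20)-(21), then peel single-tons."""
    N, B = 1 << n, 1 << b
    C, P = len(Psis), len(D[0])
    nu2 = N * sigma2 / B
    # (20)-(21): stack the P substream observations per bin
    U = [[np.zeros(P) for _ in range(B)] for _ in range(C)]
    for c in range(C):
        for p in range(P):
            Up = fwht(substream(y, Psis[c], D[c][p], n, b))
            for j in range(B):
                U[c][j][p] = Up[j]
    # Candidate locations hashing to each bin (c, j)
    cands = []
    for c in range(C):
        buckets = [[] for _ in range(B)]
        for k in range(N):
            buckets[hash_bin(Psis[c], k, n)].append(k)
        cands.append(buckets)
    Xhat = {}
    for _ in range(I):
        found = False
        for c in range(C):
            for j in range(B):
                kind, k, Xk = identify_bin(U[c][j], D[c], cands[c][j],
                                           rho, nu2)
                if kind != "single-ton" or k in Xhat:
                    continue
                Xhat[k] = Xk
                found = True
                for c2 in range(C):      # peel it from all stages, (18)-(19)
                    j2 = hash_bin(Psis[c2], k, n)
                    sgn = np.array([(-1) ** bin(int(d) & int(k)).count("1")
                                    for d in D[c2]])
                    U[c2][j2] -= Xk * sgn
        if not found:
            break                         # no single-tons left
    return Xhat
```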
V. NUMERICAL EXPERIMENTS

We tested our method on samples generated from a sparse WHT signal X̄ of length N = 2^n with K = 2^b randomly positioned non-zero coefficients of magnitude ρ. We added zero-mean Gaussian noise with variance σ² to the time-domain signal computed from X̄. Despite the fact that our analysis is asymptotic, the probability of failure approaches 0 quickly in a range of problem settings.

Sample Complexity: We examine how noise affects the number of samples our algorithm requires. For a fixed problem size (a WHT signal of length 2^14 with 2^6 non-zero coefficients), we varied the SNR and calculated the probability of failure from 100 random problems. In Fig. 3, we can see that the method requires a small constant oversampling factor of about 4× more shifts than the noiseless algorithm from [14] (oversampling factor 1).

Sparsity: We examine how sparsity affects the probability of success in Fig. 4. For a fixed signal length of 2^14, we vary the number of non-zero coefficients 2^b. Our method seems to recover dense problems more easily. We posit that this difference is due to the fact that, for a fixed SNR, the expected magnitude of the noise in each bin shrinks as √(1/B); thus, as B increases, it is easier to recognize signals of magnitude ρ amidst the noise. Note that we use 2B bins rather than the B bins assumed in our asymptotic analysis. With B bins, the probability of failure does not get quite as close to 0 because of occasional failures in the peeling process.

VI. CONCLUSIONS

In this paper, we have proposed a robust algorithm to compute a K-sparse N-point WHT using O(K log N) samples generated by the randomized hashing front-end and O(N log² N) operations. Our approach is based on strategic subsampling of the input noisy signal y_m using a small set of randomly chosen subsampling patterns, which achieves a vanishing failure probability at the same level of complexities as in the noiseless case.

We are currently developing faster decoding methods to decrease the computational complexity. Another valuable but more challenging direction would be to modify the analysis to relax our assumptions, especially the uniformly random sparsity pattern and the assumed knowledge of problem parameters such as the SNR.

APPENDIX A
PROOF OF MAIN RESULTS IN THEOREM 1

A. Sample Complexity

In [14], it is shown that, for the noiseless case, for any given 0 < α < 1 and sufficiently large (K, N), their algorithm computes the K-sparse N-length WHT X̄ with probability at least 1 − O(N^{−3/8}) using a total of O(K) bins. Later, we show that P = O(log N) observations per bin are sufficient to make our algorithm robust against the observation noise. Hence, the total sample complexity of our algorithm in the presence of observation noise is O(K log N).

B. Computational Complexity

The computational cost of our algorithm can be roughly computed as

  Total # of arithmetic operations        (22)
    = (# of iterations) × (# of bins) × (operations per bin).        (23)

Similar to [14], for all values of the sparsity index 0 < α < 1, our front-end employs no more than O(K) bins and, if successful, completes decoding in a constant number of iterations. Each stage first requires the computation of a B-point WHT with B = O(K), which has complexity O(K log K), for each of the O(P) substreams.

Now, from the pseudocode of the bin identification scheme provided in Algorithm 2, it is clear that, for each bin, the algorithm performs an exhaustive search over O(N/B) columns of possible signatures ḡ_k, where each column is of dimension P. Further, as shown later, the number of shifts P = O(log N) is sufficient for reliable reconstruction. Thus, the overall computational complexity of our algorithm is no more than

  Total # of arithmetic operations
    = (# of iterations) × O(K) × (O(N/B) × P)
      + (# of iterations) × O(K log K) × P
    = O(N log N) + O(K log K log N)
    ≤ O(N log² N),

where log K = α log N is used.

[Fig. 3 here: probability of failure (y-axis) vs. SNR in dB (x-axis), with one curve per oversampling factor in {1, 2, 4, 8, 14, 28, 56}.]

Fig. 3. Sample complexity: Our theory for decoding requires that the number P of random delays be on the order of n. These plots of probability of failure (y-axis) vs. SNR (x-axis) show that this value of n is fairly tight. The noiseless method from [14] uses n + b − 1 delays; each plotline uses (oversampling factor) · (n + b − 1) delays. A small oversampling factor > 1 results in high success probability. Test details: Problems are sparse WHT signals of length N = 2^n, n = 14, with K = 2^b, b = 6, non-zeros. The algorithm uses C = 4 stages and 2K bins. The probability of failure is computed from 100 random problems.

C. Probability of Success

The success of the algorithm depends on each bin j being processed correctly, meaning that each bin is correctly identified as a zero-ton, single-ton or multi-ton. Define E_b as the error event where a bin j is decoded wrongly;

then, using a union bound over the different bins and the different iterations, the probability of the algorithm making a mistake in bin identification can be bounded as

  P(E) < (number of iterations) × (number of bins) × P(E_b).

Furthermore, define E_f as the error event where the algorithm fails to reconstruct the WHT coefficients X̄; its probability can then be obtained as

  ε_N = P(E_f)        (24)
      = P(E_f | E) P(E) + P(E_f | E^c) P(E^c)        (25)
      ≤ P(E) + P(E_f | E^c).        (26)

[Fig. 4 here: probability of failure (y-axis) vs. SNR in dB (x-axis), with one curve per sparsity level K = 2^b non-zeros, b ∈ {0, 1, 2, 4, 6}.]

Fig. 4. Sparsity: Our algorithm is robust to noise at reasonable SNRs (x-axis), and the probability of failure (y-axis) goes to 0 as the SNR increases. Plotlines show different sparsity levels (with 2^b non-zero coefficients); sparser problems seem to require higher SNRs. Test details: Each test uses twice as many samples as the noiseless algorithm: P = 2(n − b + 1). Problems are sparse WHT signals of length N = 2^n, n = 14, with K = 2^b non-zeros. The algorithm uses C = 4 stages and 2K bins. The probability of failure is computed from 100 random problems.

The first term is the error probability of bin processing, and the second term is the error probability of the algorithm when it fails to decode all the WHT coefficients given noiseless observations (i.e., when bin processing is always correct). According to the result in [14], the error probability of the algorithm with noiseless observations can be upper bounded by P(E_f | E^c) ≤ O(N^{−3/8}); therefore, as long as the bin processing error probability can be bounded by

  P(E) < O(1/N),        (27)

the overall failure probability remains at the same level as in the noiseless case [14], i.e.,

  lim_{N→∞} ε_N = 0.        (28)

In the following, we focus on showing that (27) is guaranteed by having a bin-wise error probability

  P(E_b) < O(1/N²),        (29)

since there are at most B ≈ K bins in each iteration and K < N. The following section is dedicated to showing that (29) can be guaranteed by having

  P = O(log N)        (30)

shifts in the randomized hashing front-end.

Probability of Bin Error P(E_b): We define the set K_{c,j} ≜ {k : h_c(k) = j} = {k_1, ···, k_{N/B}}. Given a specific set of bin measurements Ū_c[j], we have the following possible outcomes from the single-ton identification scheme (i.e., bin processing):
1) It is a zero-ton bin (13).
2) It is a multi-ton bin (15).
3) It is a single-ton bin (14) for some k ∈ K_{c,j}.
The errors made in the identification then include the following miss detection events:
1) detecting a single-ton as a zero-ton, E_{z|k⋆};
2) detecting a single-ton as another single-ton, E_{k|k⋆};
3) detecting a single-ton as a small L-ton, E_{mL|k⋆}, with L = O(1);
4) detecting a single-ton as a large L-ton, E_{ML|k⋆}, with L ≠ O(1) and lim_{N→∞} L = ∞;
and the following false detection events:
1) detecting a zero-ton as a single-ton, E_{k⋆|z};
2) detecting a small L-ton with L = O(1) as a single-ton, E_{k⋆|mL};
3) detecting a large L-ton with L ≠ O(1) and lim_{N→∞} L = ∞ as a single-ton, E_{k⋆|ML}.

Next, we provide the highlights of the analysis showing that the probability of bin error of the peeling decoder goes to zero as N → ∞ at a rate that is at least O(1/N²).
• Vanishing zero-ton error: The zero-ton error probability P(E_{k⋆|z} ∪ E_{z|k⋆}) is found by evaluating the tail of the noise distribution based on the zero-ton test.
• Vanishing large L-ton error: The large multi-ton error probability P(E_{k⋆|ML} ∪ E_{ML|k⋆}) is found by applying the Central Limit Theorem (CLT) to multi-tons with L ≠ O(1) and lim_{N→∞} L = ∞, and evaluating the tail probability with noise based on the multi-ton test.
• Vanishing small L-ton error: The error exponent for mistaking a small L-ton and the true single-ton, P(E_{k⋆|mL} ∪ E_{mL|k⋆}), is driven by the minimum distance between any small L-ton and a single-ton, which can be obtained by a proper bound on the mutual coherence of the bin-observation matrix for any two randomly picked columns, applying the worst-case mutual coherence bound to L = O(1) columns.
• Vanishing single-ton error: The error exponent for mistaking any single-ton with the true single-ton, P(E_{k|k⋆}), is driven by the minimum distance between every distinct pair of single-tons, which can be obtained using the mutual coherence of any two randomly picked columns of the bin-observation matrix.

In the following, we analytically study the probability of error in each category.

Proposition 2. (Vanishing zero-ton error) The probabilities of mistaking a single-ton for a zero-ton (miss detection) and of mistaking a zero-ton for a single-ton (false detection) can be upper bounded as

  P(E_{z|k⋆} ∪ E_{k⋆|z}) ≤ 2 exp(−C_0 P),

where

  C_0 = (1/2) log( ρ² / ((1 + γ)ν²) )        (31)

for some γ.

Proof. See Appendix D.

Proposition 3. (Large Multi-ton vs. Single-ton) The probabilities of mistaking a single-ton for a large multi-ton (miss detection) and of mistaking a large multi-ton for a single-ton (false detection) can be bounded as

  P(E_{ML|k⋆} ∪ E_{k⋆|ML}) < 4 exp(−γ²P/8).        (32)

Proof. See Appendix E.
Proposition 4. (Small Multi-ton vs. Single-ton) The probabilities of mistaking a single-ton for a small multi-ton (miss detection) and of mistaking a small multi-ton for a single-ton (false detection) can be bounded, respectively, as

  P(E_{mL|k⋆}) ≤ 2 exp(−γ²P/8) + exp(−Pρ²/ν²),        (33)
  P(E_{k⋆|mL}) ≤ exp(−C_L P),        (34)

where

  C_L = (1/2) log( (L − 1)ρ² / (2(1 + γ)ν²) ).        (35)

Proof. See Appendix F.

Proposition 5. (Single-ton vs. Single-ton) The probability of mistaking a single-ton for another single-ton is bounded as

  P(E_{k⋆|k}) ≤ exp(−C_0 P).        (36)

Proof. See Appendix G.

It can be inferred from Propositions 2–5 that the bin error probability can be bounded as

  P(E_b) ≤ exp(−C_min P),        (37)

where

  C_min ≜ min{ C_0, γ²/8, C_L }.        (38)

To drive the error to vanish at the rate of 1/N 2, it is sufficient APPENDIX C to have PROOFOF GENERAL TAIL PROBABILITY 2 log N We first invoke the following tail bound for later error P (39) ≥ Cmin analysis. and thus P = (log N). Lemma 2. [16] Given a random Gaussian vector ξ¯ O ∼ (0, ν2I) RP and any length-P vector φ¯ RP such that N ∈ ∈ 1 ¯ 2 φ Emin, (50) APPENDIX B P ≥ PROOFOF LEMMA 1 then the following tail bounds hold:   1 ¯ ¯ 2 The mutual coherence µ is obtained as P φ + ξ τ exp ( Cτ P ) (51) P ≤ ≤ − P 1 X hdc,p,ni hdc,p,ki where Cτ is some constant given by µ = max ( 1) ( 1) (40) n=6 k P − −   p=1 1 Emin P Cτ = log . (52) 1 X 2 τ = max ( 1)hdc,p,n+ki (41) n=6 k P − p=1 APPENDIX D = max µn,k (42) PROOFOF PROPOSITION 2 n=6 k A. Detecting a Single-ton as a Zero-ton z|k where E ? The event z|k occurs under the single-ton model in (14) P E ? X such that the bin energy obtained as µ ( 1)hdc,p,n+ki/P . (43) n,k , − p=1 1 ¯ 2 1 ¯ 2 Uc[j] = Xk? g¯k? + ξc[j] (53) P P Under the assumption that the shifts dc,p’s are selected ran- domly, each summand is an independent Bernoulli random drops below the threshold τ = (1 + γ)ν2. By using φ = ¯ ¯ ¯ √ variable taking values 1 equiprobably, which implies that the Xk? g¯k? and ξc[j] = ξ in Lemma 2, we have φ = ρ P ± 2 mean of each term is zero. Furthermore, since each random (i.e., Emin = ρ ) and thus variable is bounded between [ 1, 1], the Hoeffding bound  P z|k? exp ( C0P ) gives us − E ≤ −

P ! where C0 is given in (31). X  P µ2  hdc,p,n+ki 0 P ( 1) /P µ0 2 exp . (44) − ≥ ≤ − 2 B. Detecting a Zero-ton as a Single-ton p=1 Ek?|z  2  The event k |z occurs under the zero-ton model in (13) Therefore, with probability at least 1 2 exp P µ0 , the E ? − − 2 such that the bin energy obtained as variable µn,k can be bounded by 1 ¯ 2 1 ¯ 2 Uc[j] Xk? g¯k? = Xk? g¯k? ξc[j] (54) µn,k µ0. (45) P − P − ≤ 2 Therefore, the probability of the mutual coherence µ greather rises above the threshold τ = (1 + γ)ν . than µ can be bounded by the union bound below (assuming   0  1 ¯ 2 P k?|z = P Xk? g¯k? ξc[j] τ . a uniform distribution of n and k) E P − ≤ 1 N ¯ ¯ ¯ X X Now we let φ = Xk? g¯k? and ξ = ξc[j]. This implies that P (µ µ0) P (µn,k µ0) (46) 2 − ≥ ≤ N ≥ Emin = ρ and therefore by Lemma 2, the probability of error n=1 k=6 n for the event can be similarly obtained as  P µ2  Ek?|k 2N exp 0 . (47)  ≤ − 2 P k?|z exp ( C0P ) (55) E ≤ − By choosing µ0 = 1/2(L+1) for some positive integer L > 0, for some γ. the mutual coherence µ can be bounded as 1 APPENDIX E µ (48) PROOFOF PROPOSITION 3 ≤ 2(L + 1) Define the set of indices of non-zero coefficients in bin:  P  with probability at least 1 2N exp 2 . As long as . Since the multi-ton model is a sum of different − − 8(L+1) Lc,j ⊆ Kc,j P 24(L + 1)2 log N, the probability can be achieved with signatures and noise, ≥   X P 2 U¯ [j] = X g¯ + ξ [j], (56) (49) c k k c 1 2N exp 2 1 2 . − −8(L + 1) ≥ − N k∈Lc,j 9 we specifically analyze the following sum in the asymptotic Therefore, regime L as N  γ2P  → ∞ → ∞  < 2 exp . (64) 1 X P ML|k? Z¯ = X g¯ . (57) E − 8 L k k k∈Lc,j

Since Xk and g¯k are independent random variables, therefore B. Detecting a Large L-ton as a Single-ton Ek?|ML Xkg¯k’s are independent identically distributed. Clearly, by the Central Limit Theorem (CLT), the vector Z¯ asymptotically The event k?|ML corresponds to the error when the un- derlying binE is a large multi-ton bin of size L = (1) converges in distribution to a multi-variate normal random 6 O vector with limN→∞ L = and is mistaken as a single-ton bin at location k . The event∞ occurs under the multi-ton ¯ ? Ek?|ML Z (0P ×1, Σ), (58) model (15) whenever the energy ∼ N where 1 ¯ 2 2 Uc[j] Xk? g¯k? (1 + γ)ν .  T 2 2  T  P − ≤ Σ = E g¯kg¯ Xk = ρ E g¯kg¯ . (59) k | | k Thus, the error probability can be obtained by The (i, j)-th element in Σ can be readily obtained as   1 ¯ 2 2 ( P( k?|ML ) = P Uc[j] Xk? g¯k? (1 + γ)ν (65) h i 2 E P − ≤ 2 hdc,i+dc,j ,ki ρ , i = j Σij = ρ E ( 1) = . (60)   − 0, i = j 1 ¯ 2 2 2 P Uc[j] (1 + γ)ν + ρ . (66) 6 ≤ P ≤ Therefore, Z¯ has P independent unit-variance normal random ¯ Since (1+γ)ν2 +ρ2 (1 γ)(Lρ2 +ν2) as long as 0 < γ < 1 variables and therefore asymptotically the entries in Uc[j] are ≤ − 2 in the asymptotic regime L , we have independent normal random variables each with variance σL = 2 → ∞ Lρ2 + ν2. Therefore, U¯ [j] is a P -dimensional chi-square   c 1 ¯ 2 2 2 2 2 P( k?|ML ) P Uc[j] (1 γ)(ν + Lρ ) (67) random variable σLχP . E ≤ P ≤ − where U¯ [j] follows the multi-ton model. Therefore, the prob- A. Detecting a Single-ton as a Large L-ton m |k c E L ? ability in (67) can be obtained by The event ML|k? occurs when the underlying bin is a E  2 2  single-ton bin, which is mistaken as a large multi-ton. Such 2 P (1 γ)(ν + Lρ ) P( k?|ML ) P χP − 2 (68) event occurs under the single-ton model (14) whenever E ≤ ≤ σL 2  1 2 2 2 = P χP P (1 γ) . (69) U¯c(j) τ, τ = (1 + γ)(Lρ + ν ), ≤ − P ≥ This can be bounded by the same bound in (61) as Substituting the single-ton model into the above expression,  2  we have γ P P( k?|ML ) 2 exp . (70) E ≤ − 8 1 ¯ 2 2 2 Xk? g¯k? + ξc[j] (1 + γ)(Lρ + ν ). P ≥ Using the triangular inequality, we have APPENDIX F    ¯ 2 2 2 2 PROOFOF PROPOSITION 4 P ML|k? P ξc[j] P (1 + γ)(Lρ + ν ) ρ E ≤ ≥ −  2  A. Detecting a Single-ton as a Small -ton ¯ 2 L mL|k? < P ξc[j] P (1 + γ)ν . E ≥ The event m |k corresponds to the error when the un- ¯ 2 2 2 2 E L ? Since ξc[j] follows a chi-square distribution ν χP , and χP derlying bin is a single-ton bin and is mistaken as a small is a sub-exponential random variable with parameters (4P, 4), multi-ton. Such event occurs under the single-ton model (14) 2 we obtain a standard tail bound for χP for some real number whenever t (0,P ) as follows ∈ 1 ¯ 2 2  2  Uc(j) Xk? g¯k? (1 + γ)ν , 2  t P − ≥ P χ P t 2 exp (61) P − ≥ ≤ −8P or and therefore Xbk? = Xk? ,  2  6 2  t which gives P χP t + P 2 exp . (62) ≥ ≤ −8P  2 P mL|k? Now let P + t = P τ/ν = P (1 + γ) such that t = γP , we E   1 ¯ 2 2 have P Uc(j) Xk? g¯k? (1 + γ)ν Xbk? = Xk? ≤ P − ≥ |    2  ¯ 2 2 γ P   P ξc[j] P (1 + γ)ν < 2 exp . (63) + P Xbk? = Xk? . ≥ − 8 6 10

Substituting the single-ton model into the above expression, and therefore by Lemma 2, the probability of error for the the first term is equivalent to event can be computed by Ek?|mL   1 2 ¯ 2 P( k?|mL ) exp ( CLP ) (77) P Uc(j) Xk? g¯k? (1 + γ)ν Xbk? = Xk? P − ≥ | E ≤ −   where CL is given in (35). ¯ 2 2 = P ξc[j] P (1 + γ)ν . ≥ The probability of this event can be bounded as that in (63) APPENDIX G PROOFOF PROPOSITION 5    2  ¯ 2 2 γ P P ξc[j] P (1 + γ)ν < 2 exp . (71) The event k|k? occurs under the single-ton model (14) 8 ≥ − whenever theE energy   2 On the other hand, the value of P Xbk? = Xk? can be ¯ 2 6 Uc[j] Xkg¯k (1 + γ)τ, τ = P ν . bounded by the error probability of binary detections as − ≤ The probability of this event can be upper bounded by    P ρ2  (72)  2  P Xbk? = Xk? exp 2 . ¯ 6 ≤ − ν P( k?|k) = P Xk? g¯k? Xkg¯k + ξc[j] (1 + γ)τ E − ≤ The result in Proposition 4 thus follows by summing up (71)  ¯ 2  = P Gc[j]¯q + ξc[j] (1 + γ)τ , and (72). ≤

APPENDIX G
PROOF OF PROPOSITION 5

The event E_{k|k⋆} occurs under the single-ton model (14) whenever the energy

  ||Ū_c[j] − X_k ḡ_k||² ≤ (1 + γ)τ, τ = Pν².

The probability of this event can be upper bounded by

  P(E_{k⋆|k}) = P( ||X_{k⋆} ḡ_{k⋆} − X_k ḡ_k + ξ̄_c[j]||² ≤ (1 + γ)τ )
             = P( ||G_c[j] q̄ + ξ̄_c[j]||² ≤ (1 + γ)τ ),

where q̄ is a 2-sparse vector and G_c[j] is the bin-observation matrix in (6). Now let φ̄ = G_c[j] q̄ and ξ̄ = ξ̄_c[j]. Since the support of q̄ is exactly 2, the norm ||φ̄||² can be bounded as

  ||φ̄||² ≥ Pρ²,        (78)

with probability at least 1 − O(N^{−2}) if P ≥ 24 log N, according to Lemma 1. This implies that E_min = ρ², and therefore, by Lemma 2, the probability of error for the event E_{k⋆|k} can be computed as

  P(E_{k⋆|k}) ≤ exp(−C_0 P).        (79)
REFERENCES

[1] W. Pratt, J. Kane, and H. C. Andrews, "Hadamard transform image coding," Proceedings of the IEEE, vol. 57, no. 1, pp. 58–68, 1969.
[2] 3GPP TSG RAN WG1, "Spreading and modulation (FDD)," 3GPP Tech. Rep. TS 25.213, 2000, http://www.3gpp.org.
[3] S. Haghighatshoar and E. Abbe, "Polarization of the Rényi information dimension for single and multi terminal analog compression," in Information Theory Proceedings (ISIT), 2013 IEEE International Symposium on. IEEE, 2013, pp. 779–783.
[4] M. Lee and M. Kaveh, "Fast Hadamard transform based on a simple matrix factorization," Acoustics, Speech and Signal Processing, IEEE Transactions on, vol. 34, no. 6, pp. 1666–1667, Dec. 1986.
[5] J. Johnson and M. Puschel, "In search of the optimal Walsh-Hadamard transform," in Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings. 2000 IEEE International Conference on, vol. 6, 2000, pp. 3347–3350.
[6] K. J. Horadam, Hadamard Matrices and Their Applications. Princeton University Press, 2007.
[7] A. Hedayat and W. Wallis, "Hadamard matrices and their applications," The Annals of Statistics, vol. 6, no. 6, pp. 1184–1238, 1978.
[8] H. Hassanieh, P. Indyk, D. Katabi, and E. Price, "Simple and practical algorithm for sparse Fourier transform," in Proceedings of the Twenty-Third Annual ACM-SIAM Symposium on Discrete Algorithms. SIAM, 2012, pp. 1183–1194.
[9] ——, "Nearly optimal sparse Fourier transform," in Proceedings of the 44th Symposium on Theory of Computing. ACM, 2012, pp. 563–578.
[10] B. Ghazi, H. Hassanieh, P. Indyk, D. Katabi, E. Price, and L. Shi, "Sample-optimal average-case sparse Fourier transform in two dimensions," arXiv preprint arXiv:1303.1209, 2013.
[11] M. Iwen, A. Gilbert, and M. Strauss, "Empirical evaluation of a sub-linear time sparse DFT algorithm," Communications in Mathematical Sciences, vol. 5, no. 4, pp. 981–998, 2007.
[12] M. A. Iwen, "Combinatorial sublinear-time Fourier algorithms," Foundations of Computational Mathematics, vol. 10, no. 3, pp. 303–338, 2010.

[13] S. Pawar and K. Ramchandran, "Computing a k-sparse n-length discrete Fourier transform using at most 4k samples and O(k log k) complexity," arXiv preprint arXiv:1305.0870, 2013.
[14] R. Scheibler, S. Haghighatshoar, and M. Vetterli, "A fast Hadamard transform for signals with sub-linear sparsity," arXiv preprint arXiv:1310.1803, 2013.
[15] S. A. Pawar, "PULSE: Peeling-based ultra-low complexity algorithms for sparse signal estimation," Ph.D. dissertation, 2013.
[16] Y. Jin, Y.-H. Kim, and B. D. Rao, "Limits on support recovery of sparse signals via multiple-access communication techniques," Information Theory, IEEE Transactions on, vol. 57, no. 12, pp. 7877–7892, 2011.