arXiv:1205.4673v1 [cs.IT] 21 May 2012

Minimum Complexity Pursuit: Stability Analysis

Shirin Jalali
Center for Mathematics of Information
California Institute of Technology
Pasadena, California, 91125

Arian Maleki
Department of Electrical Engineering
Rice University
Houston, Texas, 77005

Richard Baraniuk
Department of Electrical Engineering
Rice University
Houston, Texas, 77005

Abstract—A host of problems involve the recovery of structured signals from a dimensionality-reduced representation such as a random projection; examples include sparse signals (compressive sensing) and low-rank matrices (matrix completion). Given the wide range of different recovery algorithms developed to date, it is natural to ask whether there exist "universal" algorithms for recovering "structured" signals from their linear projections. We recently answered this question in the affirmative in the noise-free setting. In this paper, we extend our results to the case of noisy measurements.

I. INTRODUCTION

Data compressors are ubiquitous in the digital world. They are all built on the premise that text, images, etc. are highly structured objects, and exploiting those structures can dramatically reduce the number of bits required for their storage. In recent years, a parallel trend has been developing for sampling analog signals, too. The idea is that many of the analog signals of interest have some kind of structure that enables lowering the sampling rate considerably from the Shannon-Nyquist rate.

The first structure that was extensively studied in this context is sparsity. It has been observed that many natural signals have sparse representations in some high-dimensional domain. The term compressed sensing (CS) refers to undersampling a sparse signal through linear measurements and recovering it from those measurements using efficient algorithms [1], [2]. Low-rankedness [3], model-based compressed sensing [4]–[8], and finite rate of innovation [9] are examples of some other structures that have already been explored in the literature.

While in the original source coding problem introduced by Shannon [10] the assumption was that the source distribution is known both to the encoder and to the decoder, and hence is used in the code design, it was later shown that this assumption is not essential. In fact, universal compression algorithms are able to code stationary ergodic processes at their entropy rates without knowing the source distribution [11]. In other words, there exists a family of compression codes that are able to code any stationary ergodic process at its entropy rate asymptotically [11]. The same result holds for universal lossy compression.

One can ask similar questions for the problem of undersampling "structured" signals: How to define the class of "structured" signals? Are there sampling and recovery algorithms for the reconstruction of "structured" signals from their linear measurements without having the knowledge of the underlying structure? Does this ignorance incur a cost in the sampling rate?
In algorithmic information theory, Kolmogorov complexity, introduced by Solomonoff [12], Kolmogorov [13], and Chaitin [14], defines a universal notion of complexity for finite-alphabet sequences. Given a finite-alphabet sequence $x$, the Kolmogorov complexity of $x$, $K(x)$, is defined as the length of the shortest computer program that prints $x$ and halts.

In [15], extending the notion of Kolmogorov complexity to real-valued signals^1 by their proper quantization, we addressed some of the above questions. We introduced the minimum complexity pursuit (MCP) algorithm for recovering "structured" signals from their linear measurements. We showed that finding the "simplest" solution satisfying the linear measurements recovers the signal using many fewer measurements than its ambient dimension.

In this paper, we extend the results of [15] to the case where the measurements are noisy, i.e., where the measurements are a linear transformation of the signal plus Gaussian noise. We first propose an updated version of MCP that takes the noise into account. We then prove that the proposed algorithm is stable with respect to the noise, in the sense that the reconstruction error of the signal is proportional to the variance of the noise, and derive bounds on its sampling rate.

The organization of this paper is as follows. Section II defines the notation used throughout the paper. Section II-B defines the Kolmogorov information dimension of a real-valued signal. Section III formally defines the MCP algorithm and reviews and extends some of the related results proved in [15]. Section IV considers the case of noisy measurements and proves that MCP is stable. Section V mentions some of the related work in the literature, and Section VI concludes the paper. Appendix A presents two useful lemmas used in the proofs.

^1 These types of extensions are straightforward and have already been explored in [16].

II. DEFINITIONS

A. Notation

Calligraphic letters such as $\mathcal{A}$ and $\mathcal{B}$ denote sets. For a set $\mathcal{A}$, $|\mathcal{A}|$ and $\mathcal{A}^c$ denote its size and its complement, respectively. For a sample space $\Omega$ and event set $\mathcal{E} \subseteq \Omega$, $\mathbb{1}_{\mathcal{E}}$ denotes the indicator function of the event $\mathcal{E}$. Bold-faced lower case letters denote vectors. For a vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, its $\ell_p$ and $\ell_\infty$ norms are defined as $\|x\|_p^p = \sum_{i=1}^n |x_i|^p$ and $\|x\|_\infty = \max_i |x_i|$, respectively. For integer $n$, let $I_n$ denote the $n \times n$ identity matrix.

For $x \in [0,1]$, let $((x)_1, (x)_2, \ldots)$, $(x)_i \in \{0,1\}$, denote the binary expansion of $x$, i.e., $x = \sum_{i=1}^{\infty} 2^{-i}(x)_i$. The $m$-bit approximation of $x$, $[x]_m$, is defined as $[x]_m = \sum_{i=1}^{m} 2^{-i}(x)_i$. Similarly, for a vector $(x_1, \ldots, x_n) \in [0,1]^n$, $[x^n]_m = ([x_1]_m, \ldots, [x_n]_m)$.
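The $m$-bit quantization above is straightforward to compute. The following is a minimal sketch (ours, not from the paper) of the operator $[\cdot]_m$ applied elementwise to a vector; the function name is our own.

```python
# Minimal sketch (not from the paper) of the m-bit quantization [x]_m of
# Section II-A: keep the first m bits of the binary expansion of x in [0, 1].
import numpy as np

def quantize(x, m):
    """Return [x]_m = sum_{i=1}^m 2^{-i} (x)_i, applied elementwise."""
    return np.floor(np.asarray(x, dtype=float) * 2**m) / 2**m

x = np.array([0.1, 0.5, 0.8125])
print(quantize(x, m=4))   # [0.0625, 0.5, 0.8125]; per-entry error is < 2^{-4}
```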
B. Kolmogorov complexity

The Kolmogorov complexity of a finite-alphabet sequence $x$ with respect to a universal Turing machine $\mathcal{U}$ is defined as the length of the shortest program on $\mathcal{U}$ that prints $x$ and halts.^2 Let $K(x)$ denote the Kolmogorov complexity of binary string $x \in \{0,1\}^* = \cup_{n \ge 1}\{0,1\}^n$.

^2 See Chapter 14 of [11] for the exact definition of a universal computer, and more details on the definition of Kolmogorov complexity.

Definition 1: For real-valued $x = (x_1, x_2, \ldots, x_n) \in [0,1]^n$, define the Kolmogorov complexity of $x$ at resolution $m$ as
\[ K^{[\cdot]_m}(x) = K([x_1]_m, [x_2]_m, \ldots, [x_n]_m). \]

Definition 2: The Kolmogorov information dimension of vector $(x_1, x_2, \ldots, x_n) \in [0,1]^n$ at resolution $m$ is defined as
\[ \kappa_{m,n} \triangleq \frac{K^{[\cdot]_m}(x_1, x_2, \ldots, x_n)}{m}. \]

To clarify the above definition, we derive an upper bound for $\kappa_{m,n}$.

Lemma 1: For $(x_1, x_2, \ldots) \in [0,1]^\infty$ and any resolution sequence $\{m_n\}$, we have
\[ \limsup_{n \to \infty} \frac{\kappa_{m_n,n}}{n} \le 1. \]

Therefore, by Lemma 1, we call a signal compressible if $\limsup_{n\to\infty} n^{-1}\kappa_{m_n,n} < 1$. As stated in the following proposition, Lemma 1's upper bound on $\kappa_{m,n}$ is achievable.

Proposition 1: Let $\{X_i\}_{i=1}^\infty \overset{\mathrm{iid}}{\sim} \mathrm{Unif}[0,1]$. Then,
\[ \frac{1}{mn} K^{[\cdot]_m}(X_1, X_2, \ldots, X_n) \to 1 \]
in probability.

III. MINIMUM COMPLEXITY PURSUIT

Consider the problem of reconstructing a vector $x_o^n \in [0,1]^n$ from $d < n$ random linear measurements $y_o^d = A x_o^n$. The MCP algorithm proposed in [15] reconstructs $x_o^n$ from its linear measurements $y_o^d$ by solving the following optimization problem:
\[ \min_{x^n} \; K^{[\cdot]_m}(x_1, \ldots, x_n) \quad \text{s.t.} \quad A x^n = y_o^d. \tag{1} \]
Let the elements of $A \in \mathbb{R}^{d \times n}$, $A_{ij}$, be i.i.d. $\mathcal{N}(0,1)$.^3 Let $\hat{x}_o^n = \hat{x}_o^n(y_o^d, A)$ denote the output of (1) for inputs $y_o^d = A x_o^n$ and $A$. Theorem 1 stated below is a generalization of Theorem 2 proved in [15].

^3 Note that in [15] we had assumed that $A_{i,j} \sim \mathcal{N}(0, 1/d)$.

Theorem 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote the Kolmogorov information dimension of $x_o^n$ at resolution $m$. Then, for any $\tau_n < 1$ and $t > 0$, we have
\[ \mathrm{P}\!\left( \|x_o^n - \hat{x}_o^n\|_2 > \frac{\sqrt{nd^{-1}} + t + 1 + 1}{\tau_n}\,\sqrt{n\,2^{-2m+2}} \right) \le 2^{2\kappa_{m,n}m}\,\mathrm{e}^{\frac{d}{2}\left(1 - \tau_n^2 + 2\log\tau_n\right)} + \mathrm{e}^{-\frac{d t^2}{2}}. \]

Theorem 1 can be proved following the steps used in the proof of Theorem 2 in [15]. To interpret this theorem, in the following we consider several interesting corollaries that follow from Theorem 1. Note that in all of the results, the logarithms are to the base of Euler's number $\mathrm{e}$.

Corollary 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let $\kappa_n \triangleq \kappa_{m_n,n}$. Then, if $d_n = \lceil \kappa_n \log n \rceil$, for any $\epsilon > 0$ we have $\mathrm{P}(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$, as $n \to \infty$.

Proof: For $m = m_n = \lceil \log n \rceil$ and $d = \lceil \kappa_n \log n \rceil$,
\[ \big(\sqrt{nd^{-1}} + t + 1 + 1\big)\sqrt{n\,2^{-2m_n+2}} \le 2\left(\sqrt{\lceil\kappa_n\log n\rceil^{-1} + (t+1)n^{-1}} + \sqrt{n^{-1}}\right). \tag{2} \]
Hence, fixing $t > 0$ and setting $\tau_n = \tau = 0.1$, for any $\epsilon > 0$ and large enough values of $n$ we have
\[ \frac{\big(\sqrt{nd^{-1}} + t + 1 + 1\big)\sqrt{n\,2^{-2m+2}}}{\tau_n} \le \epsilon. \]
Therefore, for $n$ large enough,
\begin{align}
\mathrm{P}\big( \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \big) &\le 2^{2\kappa_n \log n}\,\mathrm{e}^{\frac{d}{2}(1 - \tau^2 + 2\log\tau)} + \mathrm{e}^{-\frac{d t^2}{2}} \nonumber\\
&\le \mathrm{e}^{1.4\kappa_n \log n}\,\mathrm{e}^{-1.7\kappa_n \log n} + \mathrm{e}^{-\frac{d t^2}{2}}, \tag{3}
\end{align}
which shows that as $n \to \infty$, $\mathrm{P}(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$. □

According to Corollary 1, if the complexity of the signal is less than $\kappa$, then the number of linear measurements needed for asymptotically perfect recovery is, roughly speaking, on the order of $\kappa \log n$. In other words, the number of measurements is proportional to the complexity of the signal and only logarithmically proportional to its ambient dimension $n$.

Corollary 2: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let $\kappa_n \triangleq \kappa_{m_n,n}$. Then, if $d = d_n = \lceil 3\kappa_n \rceil$, we have
\[ \mathrm{P}\!\left( \frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \to 0, \]
as $n \to \infty$, for any $\epsilon > 0$.

Proof: Setting $\tau_n = n^{-0.5}$, $m = m_n = \lceil \log n \rceil$, and $d = d_n = \lceil 3\kappa_n \rceil$ in Theorem 1, it follows that
\begin{align}
\mathrm{P}\!\left( \frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2 > 2\left(\sqrt{d_n^{-1} + (t+1)n^{-1}} + 2\sqrt{n^{-1}}\right) \right)
&\le 2^{2\kappa_n \log n}\,\mathrm{e}^{1.5\kappa_n\left(1 - n^{-1} - \log n\right)} + \mathrm{e}^{-\frac{d t^2}{2}} \nonumber\\
&= \mathrm{e}^{-(1.5 - 2\log 2)\kappa_n \log n + \kappa_n\left(1.5 - 1.5 n^{-1}\right)} + \mathrm{e}^{-\frac{d t^2}{2}}. \nonumber
\end{align}
Since $1.5 - 2\log 2 > 0$, for any $\epsilon > 0$ and $n$ large enough, we have
\[ 2\left(\sqrt{d_n^{-1} + (t+1)n^{-1}} + 2\sqrt{n^{-1}}\right) < \epsilon. \]
It follows that $\mathrm{P}\big( \frac{1}{\sqrt{n}}\|x_o^n - \hat{x}_o^n\|_2 > \epsilon \big) \to 0$, as $n \to \infty$. □

In other words, if we are interested in the normalized mean square error, or per-element squared distance, then $3\kappa_n$ measurements are sufficient.
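To make the program (1) concrete, here is a toy, brute-force rendition (our own illustration, not the authors' implementation). Kolmogorov complexity is uncomputable, so we substitute the length of a zlib-compressed description of the quantized candidate as a crude proxy; the grid search is only feasible because $n$ and $m$ are tiny, and $x_o$ is chosen to lie exactly on the quantization grid.

```python
# Toy brute-force version of the MCP program (1) (illustration only): among
# m-bit quantized candidates consistent with A x = y, return the one whose
# zlib-compressed description is shortest (a crude proxy for K^{[.]_m}).
import itertools, zlib
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 6, 4, 2                                  # tiny problem, exhaustive search
A = rng.standard_normal((d, n))                    # A_ij ~ N(0, 1), as in Section III
x_o = np.array([0.75, 0.0, 0.0, 0.25, 0.0, 0.0])   # x_o lies on the 2-bit grid
y_o = A @ x_o

def proxy_K(x, m):
    return len(zlib.compress((np.asarray(x) * 2**m).astype(np.uint8).tobytes()))

grid = [i / 2**m for i in range(2**m)]
best, best_K = None, np.inf
for cand in itertools.product(grid, repeat=n):     # 4^6 = 4096 candidates
    x = np.array(cand)
    if np.linalg.norm(A @ x - y_o) < 1e-9:
        k = proxy_K(x, m)
        if k < best_K:
            best, best_K = x, k
print(best)   # with probability one, only x_o satisfies the d = 4 constraints
```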
IV. STABILITY ANALYSIS OF MCP

In the previous section we considered the case where the signal is exactly of low complexity and the measurements are also noise-free. In this section, we extend the results to noisy measurements, where $y_o^d = A x_o^n + w^d$, with $w^d \sim \mathcal{N}(0, \sigma^2 I_d)$. Assuming that the complexity of the signal is known at the reconstruction stage, we consider the following reconstruction algorithm:
\[ \arg\min_{x^n} \; \|A x^n - y_o^d\|_2^2 \quad \text{s.t.} \quad K^{[\cdot]_m}(x^n) \le \kappa_{m,n}\, m. \tag{4} \]
Note that $\kappa_{m,n} m$ is an upper bound on the Kolmogorov complexity of $x_o^n$ at resolution $m$. The major issue of this section is to calculate the number of measurements required for robust recovery in noise.

Theorem 2: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote the information dimension of $x_o^n$ at resolution $m$. If $m = m_n = \lceil \log n \rceil$ and $d = 8 r \kappa_{m,n} m$, where $r > 1$, then for any $\epsilon > 0$ we have
\[ \mathrm{P}\!\left( \|x_o^n - \hat{x}_o^n\|_2^2 > \frac{(2\kappa_{m,n}m)\,\sigma^2}{\rho\, d} + \epsilon \right) \to 0, \tag{5} \]
as $n \to \infty$, where $\rho \triangleq (1 - \sqrt{r^{-1}})^2/2$.

According to Theorem 2, as long as $d > 8 r \kappa_n \log n$ the algorithm is stable, in the sense that the reconstruction error is proportional to the variance of the input noise. By increasing the number of measurements one may reduce the reconstruction error.
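Before the proof, the following sketch (again ours, with the same zlib stand-in for $K$) shows the noisy measurement model and the shape of program (4): minimize the residual over candidates whose proxy complexity stays below the budget that plays the role of $\kappa_{m,n} m$. At this toy scale the proxy is very crude, so the example only illustrates the structure of the optimization, not its guarantees.

```python
# Sketch of the noisy setting of Section IV and program (4) (illustration only):
# y = A x_o + w with w ~ N(0, sigma^2 I_d); minimize ||A x - y||_2 over m-bit
# candidates whose (proxy) complexity does not exceed the budget.
import itertools, zlib
import numpy as np

rng = np.random.default_rng(1)
n, d, m, sigma = 6, 4, 2, 0.05
A = rng.standard_normal((d, n))
x_o = np.array([0.75, 0.0, 0.0, 0.25, 0.0, 0.0])
y = A @ x_o + sigma * rng.standard_normal(d)

def proxy_K(x, m):
    return len(zlib.compress((np.asarray(x) * 2**m).astype(np.uint8).tobytes()))

budget = proxy_K(x_o, m)                      # stands in for kappa_{m,n} * m
grid = [i / 2**m for i in range(2**m)]
candidates = (np.array(c) for c in itertools.product(grid, repeat=n)
              if proxy_K(c, m) <= budget)
x_hat = min(candidates, key=lambda x: np.linalg.norm(A @ x - y))
print(np.linalg.norm(x_hat - x_o))            # typically zero or of order sigma
```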
Proof: Since by definition $K^{[\cdot]_m}(x_o^n) = \kappa_{m,n} m$, $x_o^n$ is also a feasible point in (4). But, by assumption, $\hat{x}_o^n$ is the solution of (4). Therefore,
\[ \|A\hat{x}_o^n - y_o^d\|_2^2 \le \|A x_o^n - y_o^d\|_2^2 = \|A x_o^n - A x_o^n - w^d\|_2^2 = \|w^d\|_2^2. \tag{6} \]
Expanding $\|A\hat{x}_o^n - y_o^d\|_2^2 = \|A\hat{x}_o^n - A x_o^n - w^d\|_2^2$ in (6), it follows that
\[ \|A(\hat{x}_o^n - x_o^n)\|_2^2 + \|w^d\|_2^2 - 2(w^d)^T A(\hat{x}_o^n - x_o^n) \le \|w^d\|_2^2. \tag{7} \]
Canceling $\|w^d\|_2^2$ from both sides of (7), we obtain
\[ \|A(\hat{x}_o^n - x_o^n)\|_2^2 \le 2 (w^d)^T A(\hat{x}_o^n - x_o^n) \le 2 \left| (w^d)^T A(\hat{x}_o^n - x_o^n) \right|. \]
Let $e_m^n \triangleq x_o^n - [x_o^n]_m$ and $\hat{e}_m^n \triangleq \hat{x}_o^n - [\hat{x}_o^n]_m$ denote the quantization errors of the original and the reconstructed signals, respectively. Using these definitions and the Cauchy-Schwarz inequality, we find a lower bound for $\|A(\hat{x}_o^n - x_o^n)\|_2^2$ as
\begin{align}
\|A(\hat{x}_o^n - x_o^n)\|_2^2 &= \|A([\hat{x}_o^n]_m + \hat{e}_m^n - [x_o^n]_m - e_m^n)\|_2^2 \nonumber\\
&= \|A([\hat{x}_o^n]_m - [x_o^n]_m) + A(\hat{e}_m^n - e_m^n)\|_2^2 \nonumber\\
&\ge \|A([\hat{x}_o^n]_m - [x_o^n]_m)\|_2^2 - 2\left|(\hat{e}_m^n - e_m^n)^T A^T A ([\hat{x}_o^n]_m - [x_o^n]_m)\right| \nonumber\\
&\ge \|A([\hat{x}_o^n]_m - [x_o^n]_m)\|_2^2 - 2\|A(\hat{e}_m^n - e_m^n)\|_2 \, \|A([\hat{x}_o^n]_m - [x_o^n]_m)\|_2. \tag{8}
\end{align}
On the other hand, again using our definitions plus the Cauchy-Schwarz inequality, we find an upper bound on $|(w^d)^T A(\hat{x}_o^n - x_o^n)|$ as
\begin{align}
(w^d)^T A(\hat{x}_o^n - x_o^n) &= ([\hat{x}_o^n]_m - [x_o^n]_m + \hat{e}_m^n - e_m^n)^T A^T w^d \nonumber\\
&\le ([\hat{x}_o^n]_m - [x_o^n]_m)^T A^T w^d + (\hat{e}_m^n - e_m^n)^T A^T w^d \nonumber\\
&\le ([\hat{x}_o^n]_m - [x_o^n]_m)^T A^T w^d + \|\hat{e}_m^n - e_m^n\|_2 \, \|A^T w^d\|_2. \tag{9}
\end{align}
For any $x \in [0,1]$, $0 \le x - [x]_m < 2^{-m}$. Therefore,
\[ \|\hat{e}_m^n - e_m^n\|_2 \le \sqrt{n\,2^{-2m+2}}. \tag{10} \]
Let $\mathcal{S}$ be the set of all vectors of length $n$ that can be written as the difference of two vectors with complexity less than $\kappa_{m,n} m$; that is,
\[ \mathcal{S} = \{ h_1^n - h_2^n : K(h_1^n) \le \kappa_{m,n} m, \; K(h_2^n) \le \kappa_{m,n} m \}. \]
Note that $|\mathcal{S}| \le 2^{2\kappa_{m,n} m}$. Define the event $\mathcal{E}_1$ as
\[ \mathcal{E}_1 \triangleq \{ \forall\, h^n \in \mathcal{S} : |(w^d)^T A h^n| \le t_1 \|h^n\|_2 \}. \]
For any fixed $h^n$, $A h^n$ is an i.i.d. zero-mean Gaussian vector of length $d$ with variance $\|h^n\|_2^2$. Assuming that $\|h^n\|_2 = 1$ and applying Lemma 3, we obtain
\begin{align}
\mathrm{P}\big( (w^d)^T A h^n \ge t_1 \big) &= \mathrm{P}\big( \|w^d\|_2 G \ge t_1 \big) \nonumber\\
&= \mathrm{P}\big( \|w^d\|_2 G \ge t_1,\; \|w^d\|_2 \ge \sqrt{d}\sigma(1+t_2) \big) + \mathrm{P}\big( \|w^d\|_2 G \ge t_1,\; \|w^d\|_2 < \sqrt{d}\sigma(1+t_2) \big) \nonumber\\
&\le \mathrm{P}\big( \|w^d\|_2 \ge \sqrt{d}\sigma(1+t_2) \big) + \mathrm{P}\big( G \ge t_1 (\sqrt{d}\sigma(1+t_2))^{-1} \big) \nonumber\\
&\le \mathrm{e}^{-\frac{d t_2^2}{2}} + \mathrm{e}^{-\frac{t_1^2}{2\sigma^2 d (1+t_2)^2}}. \tag{11}
\end{align}
Hence, by the union bound and the fact that $|\mathcal{S}| \le 2^{2\kappa_{m,n} m}$ [11], we have
\[ \mathrm{P}(\mathcal{E}_1^c) \le 2^{2\kappa_{m,n} m} \left( \mathrm{e}^{-\frac{d t_2^2}{2}} + \mathrm{e}^{-\frac{t_1^2}{2\sigma^2 d (1+t_2)^2}} \right). \tag{12} \]
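The step from the first to the second line of (11) is exactly Lemma 3 in the appendix: for a fixed unit-norm $h^n$, $(w^d)^T A h^n$ has the same law as $\|w^d\|_2 G$ with $G \sim \mathcal{N}(0,1)$ independent of $\|w^d\|_2$. A quick Monte Carlo check (ours, not from the paper) of the implied second moment:

```python
# Monte Carlo check (ours) of the reduction used in (11): for fixed unit h,
# (w^d)^T A h^n is distributed as ||w^d||_2 G, so its standard deviation
# should be sigma * sqrt(d).
import numpy as np

rng = np.random.default_rng(4)
d, n, trials, sigma = 30, 80, 2000, 1.0
h = rng.standard_normal(n)
h /= np.linalg.norm(h)                      # fixed unit-norm direction
samples = np.empty(trials)
for t in range(trials):
    w = sigma * rng.standard_normal(d)
    A = rng.standard_normal((d, n))
    samples[t] = w @ A @ h
print(samples.std(), sigma * np.sqrt(d))    # both approximately 5.5
```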

Note that
\[ \|A(\hat{e}_m^n - e_m^n)\|_2 \le \sigma_{\max}(A)\, \|\hat{e}_m^n - e_m^n\|_2. \]
For $t_3 > 0$, define the event $\mathcal{E}_2^{(n)}$ as
\[ \mathcal{E}_2^{(n)} \triangleq \{ \sigma_{\max}(A) < (1+t_3)\sqrt{d} + \sqrt{n} \}. \]
It can be proved that [17]
\[ \mathrm{P}\big( \mathcal{E}_2^{(n),c} \big) \le \mathrm{e}^{-\frac{d t_3^2}{2}}. \tag{13} \]
But if $\sigma_{\max}(A) < (1+t_3)\sqrt{d} + \sqrt{n}$, then from (10),
\[ \|A(\hat{e}_m^n - e_m^n)\|_2 \le \left( 1 + (1+t_3)\sqrt{\frac{d}{n}} \right) 2^{-m+1}\, n. \]
Define the event $\mathcal{E}_3^{(n)}$ as $\mathcal{E}_3^{(n)} \triangleq \{ \forall\, h^n \in \mathcal{S} : \|A h^n\|_2^2 > (1-t_4)\, d\, \|h^n\|_2^2 \}$. By the union bound and Lemma 2, it follows that
\[ \mathrm{P}\big( \mathcal{E}_3^{(n),c} \big) \le 2^{2\kappa_{m,n} m}\, \mathrm{e}^{\frac{d}{2}(t_4 + \log(1-t_4))}. \tag{14} \]
Define the event $\mathcal{E}_4^{(n)}$ as
\[ \mathcal{E}_4^{(n)} \triangleq \{ \forall\, h^n \in \mathcal{S} : \|A h^n\|_2^2 < (1+t_5)\, d\, \|h^n\|_2^2 \}. \]
Again by the union bound and Lemma 2, it follows that
\[ \mathrm{P}\big( \mathcal{E}_4^{(n),c} \big) \le 2^{2\kappa_{m,n} m}\, \mathrm{e}^{-\frac{d}{2}(t_5 - \log(1+t_5))}. \tag{15} \]
Finally, for $t_6 > 0$, define
\[ \mathcal{E}_5^{(n)} \triangleq \{ \|A^T w^d\|_2^2 \le n d (1+t_6) \}. \]
Given $w^d$, $A^T w^d$ is an $n$-dimensional i.i.d. Gaussian random vector with variance $\|w^d\|_2^2$. Hence, by Lemma 2,
\[ \mathrm{P}\big( \|A^T w^d\|_2^2 \ge n\gamma^2(1+t_7) \,\big|\, \|w^d\|_2^2 = \gamma^2 \big) \le \mathrm{e}^{-\frac{n}{2}(t_7 - \log(1+t_7))}. \]
On the other hand, again by Lemma 2,
\[ \mathrm{P}\big( \|w^d\|_2^2 < d(1-t_8) \big) \le \mathrm{e}^{\frac{d}{2}(t_8 + \log(1-t_8))}. \]
Choosing $t_6, t_7, t_8 > 0$ such that $t_6 < t_7$ and $1+t_6 = (1-t_8)(1+t_7)$, it follows that
\begin{align}
\mathrm{P}\big( \|A^T w^d\|_2^2 \ge n d(1+t_6) \big) &= \mathrm{P}\big( \|A^T w^d\|_2^2 \ge n d(1+t_6),\; \|w^d\|_2^2 > d(1-t_8) \big) \nonumber\\
&\quad + \mathrm{P}\big( \|A^T w^d\|_2^2 \ge n d(1+t_6),\; \|w^d\|_2^2 < d(1-t_8) \big) \nonumber\\
&\le \mathrm{e}^{-\frac{n}{2}(t_7 - \log(1+t_7))} + \mathrm{e}^{\frac{d}{2}(t_8 + \log(1-t_8))}. \tag{16}
\end{align}
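Event $\mathcal{E}_2^{(n)}$ rests on the standard fact that the largest singular value of a $d \times n$ matrix with i.i.d. $\mathcal{N}(0,1)$ entries rarely exceeds $\sqrt{d} + \sqrt{n}$ by much [17]. A small simulation (ours, not from the paper) comparing the empirical exceedance rate with the bound in (13):

```python
# Quick numerical sanity check (ours) of the bound behind event E_2 / (13):
# sigma_max(A) exceeds (1 + t3) * sqrt(d) + sqrt(n) with probability at most
# exp(-d * t3^2 / 2) when A has i.i.d. N(0, 1) entries.
import numpy as np

rng = np.random.default_rng(2)
d, n, trials, t3 = 50, 200, 200, 0.1
exceed = 0
for _ in range(trials):
    A = rng.standard_normal((d, n))
    if np.linalg.norm(A, 2) >= (1 + t3) * np.sqrt(d) + np.sqrt(n):
        exceed += 1
print(exceed / trials, np.exp(-d * t3**2 / 2))   # empirical rate <= bound (~0.78)
```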

Combining (8) and (9) with the upper and lower bounds derived for the corresponding terms of (8) and (9), and choosing $t_1 = 2\sigma\sqrt{d(1+t_2)(2\kappa_{m,n} m)}$, with probability $\mathrm{P}(\mathcal{E}_1 \cap \mathcal{E}_2 \cap \mathcal{E}_3 \cap \mathcal{E}_4 \cap \mathcal{E}_5)$ the following inequality holds:
\begin{align}
(1-t_4)\sqrt{d}\,\|\Delta_m\|_2^2
&- 2\sqrt{1+t_5}\,2^{-m+1}\sqrt{n}\big((1+t_3)\sqrt{d}+\sqrt{n}\big)\,\|\Delta_m\|_2 \nonumber\\
&- 2\sigma\sqrt{1+t_2}\sqrt{2\kappa_{m,n}m}\,\|\Delta_m\|_2
- 2^{-m+1}\sqrt{1+t_6}\, n \le 0, \tag{17}
\end{align}
where $\Delta_m \triangleq [\hat{x}_o^n]_m - [x_o^n]_m$.

Inequality (17) involves a quadratic equation in $\|\Delta_m\|_2$. Finding the roots of this quadratic equation, using $\sqrt{1+x} \le 1 + x/2$, and replacing $m = \lceil \log n \rceil$, we obtain
\[ \|\Delta_m\|_2 \le \sigma\gamma_3\sqrt{(2\kappa_{m,n}\log n)\, d^{-1}} + \big(\gamma_1\sqrt{n^{-1}} + \gamma_2\sqrt{d^{-1}}\big) + \sqrt{d^{-1}}\,\gamma_4, \tag{18} \]
where $\gamma_1 = \sqrt{1+t_5}\,(1+t_3)(1-t_4)^{-1}$, $\gamma_2 = \sqrt{1+t_5}\,(1-t_4)^{-1}$, $\gamma_3 = \sqrt{1+t_2}\,(1-t_4)^{-1}$, and $\gamma_4 = \sqrt{1+t_6}\,(1-t_4)^{-1}$.

On the other hand, by the union bound,
\[ \mathrm{P}\big( (\mathcal{E}_1\cap\mathcal{E}_2\cap\mathcal{E}_3\cap\mathcal{E}_4\cap\mathcal{E}_5)^c \big) = \mathrm{P}\big( \mathcal{E}_1^c\cup\mathcal{E}_2^c\cup\mathcal{E}_3^c\cup\mathcal{E}_4^c\cup\mathcal{E}_5^c \big) \le \mathrm{P}(\mathcal{E}_1^c)+\mathrm{P}(\mathcal{E}_2^c)+\mathrm{P}(\mathcal{E}_3^c)+\mathrm{P}(\mathcal{E}_4^c)+\mathrm{P}(\mathcal{E}_5^c). \tag{19} \]
Given $d = 8 r \kappa_{m,n} m$, choosing $t_2 = t_4 = 1/\sqrt{r}$ and fixing $t_1, t_3, t_5, \ldots, t_8$ at appropriate fixed small numbers, (12), (13), (14), (15) and (16) guarantee that (19) goes to zero as $n \to \infty$. Moreover, for the chosen parameters, $\gamma_3 < 2/(1 - \sqrt{r^{-1}})$. Finally, for any $\epsilon > 0$ and $n$ large enough, $\big(\gamma_1\sqrt{n^{-1}} + \gamma_2\sqrt{d^{-1}}\big) + \sqrt{d^{-1}}\,\gamma_4 < \epsilon$. This concludes the proof. □

V. RELATED WORK

The MCP algorithm proposed in [15] is mainly inspired by [18] and [19]. Consider the universal denoising problem where $\theta$ is corrupted by additive white Gaussian noise as $X^n = \theta + Z^n$. The denoiser's goal is to recover $\theta$ from the noisy observation $X^n$. The minimum Kolmogorov complexity estimation (MKCE) approach proposed in [18] suggests a denoiser that looks for the sequence $\hat{\theta}$ with minimum Kolmogorov complexity among all the vectors that are within some distance of the observation vector $X^n$. [18] shows that if the $\theta_i$ are i.i.d., then under certain conditions, the average marginal conditional distribution of $\hat{\theta}_i$ given $X_i$ tends to the actual posterior distribution of $\theta_1$ given $X_1$.

In [19], the authors consider the problem of recovering a low-complexity sequence from its linear measurements. Let $\mathcal{S}_n(k_0) \triangleq \{x^n \in [0,1]^n : K(x^n) \le k_0\}$. Consider measuring $x_o^n \in \mathcal{S}_n(k_0)$ using a $d \times n$ binary matrix $A$, and let $y_o^d = A x_o^n$. To recover $x_o^n$ from the measurements $y_o^d$, [19] suggests finding $\hat{x}^n$ as $\hat{x}^n(y_o^d, A) \triangleq \arg\min_{x^n:\, y_o^d = A x^n} K(x^n)$, and proves that if $d \ge 2k_0$, then this algorithm is able to find $x_o^n$ with high probability. Clearly, assuming that a real-valued sequence has low Kolmogorov complexity is very restrictive, and hence $\mathcal{S}_n(k_0)$ does not include any of the classes that have been studied in the CS literature. For instance, most one-sparse signals have infinite Kolmogorov complexity, and hence the result of [19] does not provide useful information.

In a recent and independent work, [20] and [21] consider a scheme similar to MCP. For a stationary and ergodic source, they propose an algorithm to approximate MCP. While the empirical results are promising, no theoretical guarantees are provided on either the performance of MCP or their final algorithm.

The notion of sparsity has already been generalized in the literature in several different directions [3], [4], [9], [22]. More recently, [22] introduced the class of simple functions and the atomic norm as a framework that unifies some of the above observations and extends them to some other signal classes. While all these models can be considered as subclasses of the general model considered in this paper, it is worth noting that even though the recovery approach proposed here is universal, given the incomputability of Kolmogorov complexity, it is not useful for practical purposes. Finding practical algorithms with provable performance guarantees is left for future research.
In this paper, we have focused on deterministic signal models. For the case of random signals, [23] considers the problem of recovering a memoryless process from its undersampled linear measurements and establishes a connection between the required number of measurements and the Rényi information dimension of the source. Also, our work is in the same spirit as the minimum entropy decoder proposed by Csiszár in [24], which is a universal decoder for reconstructing an i.i.d. signal from its linear measurements.

VI. CONCLUSION

In this paper, we have considered the problem of recovering structured signals from their random linear measurements. We have investigated the minimum complexity pursuit (MCP) scheme. Our results confirm that if the Kolmogorov complexity of the signal is upper bounded by $\kappa$, then MCP recovers the signal accurately from $O(\kappa \log n)$ random linear measurements, which is much smaller than the ambient dimension. In this paper, we have specifically proved that MCP is stable, in the sense that the $\ell_2$-norm of the reconstruction error is proportional to the standard deviation of the noise.

APPENDIX A

The following two lemmas are frequently used in our proofs.

Lemma 2 ($\chi^2$-concentration): Fix $\tau > 0$, and let $Z_i \sim \mathcal{N}(0,1)$, $i = 1, 2, \ldots, d$. Then
\[ \mathrm{P}\Big( \sum_{i=1}^d Z_i^2 < d(1-\tau) \Big) \le \mathrm{e}^{\frac{d}{2}(\tau + \log(1-\tau))}, \quad\text{and}\quad \mathrm{P}\Big( \sum_{i=1}^d Z_i^2 > d(1+\tau) \Big) \le \mathrm{e}^{-\frac{d}{2}(\tau - \log(1+\tau))}. \]
The proof of Lemma 2 is presented in [15].

Lemma 3: Let $X^n$ and $Y^n$ denote two independent Gaussian random vectors with i.i.d. elements. Further assume that for $i = 1, \ldots, n$, $X_i \sim \mathcal{N}(0,1)$ and $Y_i \sim \mathcal{N}(0,1)$. Then the distribution of $(X^n)^T Y^n = \sum_{i=1}^n X_i Y_i$ is the same as the distribution of $\|X^n\|_2\, G$, where $G \sim \mathcal{N}(0,1)$ is independent of $\|X^n\|_2$.

Proof: We need to show that $(X^n)^T Y^n / \|X^n\|_2$ is distributed as $\mathcal{N}(0,1)$ and is independent of $\|X^n\|_2$. To prove the first claim, note that
\[ \frac{(X^n)^T Y^n}{\|X^n\|_2} = \sum_{i=1}^n \frac{X_i}{\|X^n\|_2}\, Y_i. \tag{A-1} \]
On the other hand, given $X^n/\|X^n\|_2 = a^n$,
\[ \sum_{i=1}^n \frac{X_i}{\|X^n\|_2}\, Y_i \sim \mathcal{N}(0,1), \]
because $\sum_{i=1}^n a_i^2 = 1$. Therefore, since the distribution of $\sum_{i=1}^n \frac{X_i}{\|X^n\|_2} Y_i$ given $X^n/\|X^n\|_2 = a^n$ is independent of $a^n$,
\[ \sum_{i=1}^n \frac{X_i}{\|X^n\|_2}\, Y_i \sim \mathcal{N}(0,1). \]
To prove the independence, note that $X^n/\|X^n\|_2$ and $Y^n$ are both independent of $\|X^n\|_2$. □
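A short simulation (ours, not from the paper) confirming that the two tails in Lemma 2 are valid, if loose, for a moderate $d$:

```python
# Monte Carlo check (ours) of the two chi-square tail bounds in Lemma 2.
import numpy as np

rng = np.random.default_rng(3)
d, tau, trials = 100, 0.3, 20_000
chi2 = (rng.standard_normal((trials, d))**2).sum(axis=1)
print(np.mean(chi2 < d * (1 - tau)), np.exp(d / 2 * (tau + np.log(1 - tau))))
print(np.mean(chi2 > d * (1 + tau)), np.exp(-d / 2 * (tau - np.log(1 + tau))))
```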
REFERENCES

[1] D. L. Donoho. Compressed sensing. IEEE Trans. Info. Theory, 52(4):1289–1306, April 2006.
[2] E. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Info. Theory, 52(2):489–509, February 2006.
[3] B. Recht, M. Fazel, and P. A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, April 2010.
[4] R. G. Baraniuk, V. Cevher, M. F. Duarte, and C. Hegde. Model-based compressive sensing. IEEE Trans. Info. Theory, 56(4):1982–2001, April 2010.
[5] Y. C. Eldar, P. Kuppinger, and H. Bölcskei. Block-sparse signals: Uncertainty relations and efficient recovery. IEEE Trans. on Sig. Proc., 58(6):3042–3054, 2010.
[6] M. Stojnic, F. Parvaresh, and B. Hassibi. On the reconstruction of block-sparse signals with an optimal number of measurements. IEEE Trans. on Sig. Proc., 57(8):3075–3085, 2009.
[7] M. Stojnic. Block-length dependent thresholds in block-sparse compressed sensing. arXiv preprint arXiv:0907.3679, 2009.
[8] D. Malioutov, M. Cetin, and A. S. Willsky. A sparse signal reconstruction perspective for source localization with sensor arrays. IEEE Trans. on Sig. Proc., 53(8):3010–3022, August 2005.
[9] M. Vetterli, P. Marziliano, and T. Blu. Sampling signals with finite rate of innovation. IEEE Trans. on Sig. Proc., 50(6):1417–1428, June 2002.
[10] C. E. Shannon. A mathematical theory of communication: Parts I and II. Bell Syst. Tech. J., 27:379–423 and 623–656, 1948.
[11] T. Cover and J. Thomas. Elements of Information Theory. Wiley, New York, 2nd edition, 2006.
[12] R. J. Solomonoff. A formal theory of inductive inference. Inform. Contr., 7:224–254, 1964.
[13] A. N. Kolmogorov. Logical basis for information theory and probability theory. IEEE Trans. Info. Theory, 14:662–664, 1968.
[14] G. J. Chaitin. On the length of programs for computing finite binary sequences. J. Assoc. Comput. Mach., 13:547–569, 1966.
[15] S. Jalali and A. Maleki. Minimum complexity pursuit. In Proc. 49th Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sep. 2011.
[16] L. Staiger. The Kolmogorov complexity of real numbers. Theoretical Computer Science, 284(2):455–466, 2002.
[17] E. Candès and T. Tao. Decoding by linear programming. IEEE Trans. Info. Theory, 51(12):4203–4215, Dec. 2005.
[18] D. L. Donoho. Kolmogorov sampler. Preprint, 2002.
[19] D. L. Donoho, H. Kakavand, and J. Mammen. The simplest solution to an underdetermined system of linear equations. In Proc. IEEE Int. Symp. Info. Theory, July 2006.
[20] D. Baron and M. Duarte. Universal MAP estimation in compressed sensing. In Proc. 49th Allerton Conference on Communication, Control, and Computing, Monticello, IL, Sep. 2011.
[21] D. Baron and M. Duarte. Signal recovery in compressed sensing via universal priors. arXiv preprint arXiv:1204.2611, 2012.
[22] V. Chandrasekaran, B. Recht, P. A. Parrilo, and A. Willsky. The convex geometry of linear inverse problems. Preprint, 2010.
[23] Y. Wu and S. Verdú. Rényi information dimension: Fundamental limits of almost lossless analog compression. IEEE Trans. Info. Theory, 56(8):3721–3748, Aug. 2010.
[24] I. Csiszár. Linear codes for sources and source networks: Error exponents, universal coding. IEEE Trans. Info. Theory, 28:585–592, 1982.