Minimum Complexity Pursuit: Stability Analysis
Shirin Jalali, Center for Mathematics of Information, California Institute of Technology, Pasadena, California 91125
Arian Maleki, Department of Electrical Engineering, Rice University, Houston, Texas 77005
Richard Baraniuk, Department of Electrical Engineering, Rice University, Houston, Texas 77005

arXiv:1205.4673v1 [cs.IT] 21 May 2012

Abstract— A host of problems involve the recovery of structured signals from a dimensionality-reduced representation such as a random projection; examples include sparse signals (compressive sensing) and low-rank matrices (matrix completion). Given the wide range of different recovery algorithms developed to date, it is natural to ask whether there exist "universal" algorithms for recovering "structured" signals from their linear projections. We recently answered this question in the affirmative in the noise-free setting. In this paper, we extend our results to the case of noisy measurements.

I. INTRODUCTION

Data compressors are ubiquitous in the digital world. They are built on the premise that text, images, videos, etc. are all highly structured objects, and hence exploiting those structures can dramatically reduce the number of bits required for their storage. In recent years, a parallel trend has been developing for sampling analog signals. There too, the idea is that many analog signals of interest have some kind of structure that enables considerably lowering the sampling rate below the Shannon-Nyquist rate.

The first structure that was extensively studied in this context is sparsity. It has been observed that many natural signals have sparse representations in some domain. The term compressed sensing (CS) refers to the process of undersampling a high-dimensional sparse signal through linear measurements and recovering it from those measurements using efficient algorithms [1], [2]. Low-rankedness [3], model-based compressed sensing [4]–[8], and finite rate of innovation [9] are examples of some other structures that have already been explored in the literature.

While in the original source coding problem introduced by Shannon [10] the assumption was that the source distribution is known both to the encoder and to the decoder, and hence is used in the code design, it was later shown that this information is not essential. In fact, universal compression algorithms are able to code stationary ergodic processes at their entropy rates without knowing the source distribution [11]. In other words, there exists a family of compression codes that are able to code any stationary ergodic process at its entropy rate asymptotically [11]. The same result holds for universal lossy compression.

One can ask similar questions for the problem of undersampling "structured" signals: How should the class of "structured" signals be defined? Are there sampling and recovery algorithms for the reconstruction of "structured" signals from their linear measurements without knowledge of the underlying structure? Does this ignorance incur a cost in the sampling rate?

In algorithmic information theory, Kolmogorov complexity, introduced by Solomonoff [12], Kolmogorov [13], and Chaitin [14], defines a universal notion of complexity for finite-alphabet sequences. Given a finite-alphabet sequence x, the Kolmogorov complexity of x, K(x), is defined as the length of the shortest computer program that prints x and halts. In [15], extending the notion of Kolmogorov complexity to real-valued signals¹ by their proper quantization, we addressed some of the above questions. We introduced the minimum complexity pursuit (MCP) algorithm for recovering "structured" signals from their linear measurements. We showed that finding the "simplest" solution satisfying the linear measurements recovers the signal using many fewer measurements than its ambient dimension.

In this paper, we extend the results of [15] to the case where the measurements are noisy. We first propose an updated version of MCP that takes into account that the measurements are a linear transformation of the signal plus Gaussian noise. We then prove that the proposed algorithm is stable with respect to the noise and derive bounds on its reconstruction error in terms of the sampling rate and the variance of the noise.

The organization of this paper is as follows. Section II defines the notation used throughout the paper. Section II-B defines the Kolmogorov information dimension of a real-valued signal. Section III formally defines the MCP algorithm and reviews and extends some of the related results proved in [15]. Section IV considers the case of noisy measurements and proves that MCP is stable. Section V mentions some of the related work in the literature, and Section VI concludes the paper. Appendix A presents two useful lemmas used in the proofs.

¹ These types of extensions are straightforward and have already been explored in [16].
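The setting analyzed in the rest of the paper is easy to picture in code. The following minimal Python sketch (ours, not the authors') generates noisy linear measurements y = A x_o + z of a toy structured signal; the dimensions n and d, the sparse choice of x_o, and the noise level sigma are illustrative assumptions only. A recovery algorithm such as MCP sees only y and A and must return an estimate of x_o.

import numpy as np

# Minimal sketch of the measurement model discussed above: a "structured"
# signal x_o in [0, 1]^n observed through d < n random linear projections,
# corrupted by additive Gaussian noise. All numbers below are illustrative.
rng = np.random.default_rng(0)

n = 1000        # ambient dimension of the signal
d = 100         # number of linear measurements, d < n
sigma = 0.01    # noise standard deviation (hypothetical value)

x_o = np.zeros(n)                # one simple example of structure: sparsity
x_o[rng.choice(n, size=10, replace=False)] = rng.random(10)

A = rng.standard_normal((d, n))  # i.i.d. Gaussian measurement matrix
z = sigma * rng.standard_normal(d)  # additive Gaussian noise
y = A @ x_o + z                  # noisy measurements given to the recovery algorithm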
II. DEFINITIONS

A. Notation

Calligraphic letters such as $\mathcal{A}$ and $\mathcal{B}$ denote sets. For a set $\mathcal{A}$, $|\mathcal{A}|$ and $\mathcal{A}^c$ denote its size and its complement, respectively. For a sample space $\Omega$ and event set $\mathcal{A} \subseteq \Omega$, $\mathbb{1}_{\mathcal{A}}$ denotes the indicator function of the event $\mathcal{A}$. Bold-faced lower case letters denote vectors. For a vector $x = (x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$, its $\ell_p$ and $\ell_\infty$ norms are defined as $\|x\|_p^p \triangleq \sum_{i=1}^{n} |x_i|^p$ and $\|x\|_\infty \triangleq \max_i |x_i|$, respectively. For integer $n$, let $I_n$ denote the $n \times n$ identity matrix.

For $x \in [0,1]$, let $((x)_1, (x)_2, \ldots)$, $(x)_i \in \{0,1\}$, denote the binary expansion of $x$, i.e., $x = \sum_{i=1}^{\infty} 2^{-i} (x)_i$. The $m$-bit approximation of $x$, $[x]_m$, is defined as $[x]_m \triangleq \sum_{i=1}^{m} 2^{-i} (x)_i$. Similarly, for a vector $(x_1, \ldots, x_n) \in [0,1]^n$, $[x^n]_m \triangleq ([x_1]_m, \ldots, [x_n]_m)$.
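As a small aid to the notation above, the snippet below computes the $m$-bit approximation $[x^n]_m$ of a vector; the helper name m_bit_approx is ours, and the floor-based formula agrees with truncating the standard binary expansion for entries in [0, 1).

import numpy as np

def m_bit_approx(x, m):
    # [x]_m = sum_{i=1}^m 2^{-i} (x)_i: keep the first m bits of the binary
    # expansion of each entry (entries assumed in [0, 1); for x exactly 1 the
    # all-ones expansion would give 1 - 2**-m instead of 1).
    return np.floor(np.asarray(x, dtype=float) * 2**m) / 2**m

x = np.array([0.7154, 0.3333, 0.0625])
print(m_bit_approx(x, 4))   # -> [0.6875 0.3125 0.0625]
# Each entry moves by at most 2**-m, so ||x^n - [x^n]_m||_2 <= sqrt(n) * 2**-m.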
B. Kolmogorov complexity

The Kolmogorov complexity of a finite-alphabet sequence $x$ with respect to a universal Turing machine $\mathcal{U}$ is defined as the length of the shortest program on $\mathcal{U}$ that prints $x$ and halts. Let $K(x)$ denote the Kolmogorov complexity of a binary string $x \in \{0,1\}^* \triangleq \cup_{n \ge 1} \{0,1\}^n$.

Definition 1: For real-valued $x^n = (x_1, x_2, \ldots, x_n) \in [0,1]^n$, define the Kolmogorov complexity of $x^n$ at resolution $m$ as
$$K^{[\cdot]_m}(x^n) = K([x_1]_m, [x_2]_m, \ldots, [x_n]_m).$$

Definition 2: The Kolmogorov information dimension of a vector $(x_1, x_2, \ldots, x_n) \in [0,1]^n$ at resolution $m$ is defined as
$$\kappa_{m,n} \triangleq \frac{K^{[\cdot]_m}(x_1, x_2, \ldots, x_n)}{m}.$$

To clarify the above definition, we derive an upper bound on $\kappa_{m,n}$.

Lemma 1: For $(x_1, x_2, \ldots) \in [0,1]^\infty$ and any resolution sequence $\{m_n\}$, we have
$$\limsup_{n \to \infty} \frac{\kappa_{m,n}}{n} \le 1.$$

Therefore, by Lemma 1, we call a signal compressible if $\limsup_{n \to \infty} n^{-1} \kappa_{m,n} < 1$. As stated in the following proposition, Lemma 1's upper bound on $\kappa_{m,n}$ is achievable.

Proposition 1: Let $\{X_i\}_{i=1}^\infty \overset{\mathrm{iid}}{\sim} \mathrm{Unif}[0,1]$. Then,
$$\frac{1}{mn} K^{[\cdot]_m}(X_1, X_2, \ldots, X_n) \to 1$$
in probability.

III. MINIMUM COMPLEXITY PURSUIT

Consider the problem of reconstructing a vector $x_o \in [0,1]^n$ from $d < n$ random linear measurements $y_o^d = A x_o^n$. The MCP algorithm proposed in [15] reconstructs $x_o^n$ from its linear measurements $y_o^d$ by solving the following optimization problem:
$$\min_{x^n} \; K^{[\cdot]_m}(x_1, \ldots, x_n) \quad \text{s.t.} \quad A x^n = y_o^d.$$

Theorem 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$. For integers $m$ and $n$, let $\kappa_{m,n}$ denote the Kolmogorov information dimension of $x_o$ at resolution $m$. Then, for any $\tau_n < 1$ and $t > 0$, we have
$$\mathrm{P}\left( \|x_o^n - \hat{x}_o^n\|_2 > \frac{\sqrt{n d^{-1} + t} + 1 + 1}{\tau_n} \, \sqrt{n} \, 2^{-2m+2} \right) \le 2^{2 \kappa_{m,n} m} \, e^{\frac{d}{2} (1 - \tau_n + 2 \log \tau_n)} + e^{-\frac{d}{2} t^2}.$$

Theorem 1 can be proved following the steps used in the proof of Theorem 2 in [15]. To interpret this theorem, in the following we consider several interesting corollaries that follow from Theorem 1. Note that in all of the results, the logarithms are to the base of Euler's number $e$.

Corollary 1: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let $\kappa_n \triangleq \kappa_{m_n,n}$. Then, if $d_n = \lceil \kappa_n \log n \rceil$, for any $\epsilon > 0$ we have $\mathrm{P}(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$, as $n \to \infty$.

Proof: For $m = m_n = \lceil \log n \rceil$ and $d_n = \lceil \kappa_n \log n \rceil$,
$$\left( \sqrt{n d^{-1} + t} + 1 + 1 \right) \sqrt{n} \, 2^{-2 m_n + 2} \le \sqrt{\lceil \kappa_n \log n \rceil^{-1} + (t+1) n^{-1}} + \sqrt{n^{-1}}. \quad (2)$$

Hence, fixing $t > 0$ and setting $\tau_n = \tau = 0.1$, for any $\epsilon > 0$ and large enough values of $n$ we have
$$\frac{\left( \sqrt{n d^{-1} + t} + 1 + 1 \right) \sqrt{n} \, 2^{-2m+2}}{\tau_n} \le \epsilon.$$

Therefore, for $n$ large enough,
$$\mathrm{P}\left( \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \le 2^{2 \kappa_n \log n} \, e^{\frac{d}{2} (1 - \tau + 2 \log \tau)} + e^{-\frac{d}{2} t^2} \le e^{1.4 \kappa_n \log n} \, e^{-1.7 \kappa_n \log n} + e^{-\frac{d}{2} t^2}, \quad (3)$$
which shows that as $n \to \infty$, $\mathrm{P}(\|x_o^n - \hat{x}_o^n\|_2 > \epsilon) \to 0$.

According to Corollary 1, if the complexity of the signal is less than $\kappa$, then the number of linear measurements needed for asymptotically perfect recovery is, roughly speaking, of the order of $\kappa \log n$. In other words, the number of measurements is proportional to the complexity of the signal and only logarithmically proportional to its ambient dimension $n$.

Corollary 2: Assume that $x_o = (x_{o,1}, x_{o,2}, \ldots) \in [0,1]^\infty$ and $m = m_n = \lceil \log n \rceil$. Let $\kappa_n \triangleq \kappa_{m_n,n}$. Then, if $d = d_n = \lceil 3 \kappa_n \rceil$, we have
$$\mathrm{P}\left( \frac{1}{\sqrt{n}} \|x_o^n - \hat{x}_o^n\|_2 > \epsilon \right) \to 0,$$
as $n \to \infty$.
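To get a rough feel for the measurement rates promised by Corollary 1, the following snippet (ours, purely illustrative) tabulates $d_n = \lceil \kappa \log n \rceil$ against the ambient dimension $n$ for a hypothetical fixed complexity level $\kappa = 20$; in the corollary itself $\kappa_n$ is the information dimension of the signal and may depend on $n$.

import numpy as np

# Illustration only: measurement count d_n = ceil(kappa * log n) from
# Corollary 1 versus the ambient dimension n, for an assumed constant kappa.
kappa = 20.0
for n in [10**3, 10**4, 10**5, 10**6]:
    d_n = int(np.ceil(kappa * np.log(n)))   # natural logarithm, as in the paper
    print(f"n = {n:>8d}    d_n = {d_n:>5d}    d_n / n = {d_n / n:.5f}")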