IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. 60, NO. 4, APRIL 2014, p. 2346

Sparsity and Infinite Divisibility

Arash Amini and Michael Unser, Fellow, IEEE

Abstract—We adopt an innovation-driven framework and investigate the sparse/compressible distributions obtained by linearly measuring or expanding continuous-domain stochastic models. Starting from first principles, we show that all such distributions are necessarily infinitely divisible. This property is satisfied by many distributions used in statistical learning, such as Gaussian, Laplace, and a wide range of fat-tailed distributions, such as student's-t and α-stable laws. However, it excludes some popular distributions used in compressed sensing, such as the Bernoulli–Gaussian distribution and distributions that decay like exp(−O(|x|^p)) for 1 < p < 2. We further explore the implications of infinite divisibility on distributions and conclude that tail decay and unimodality are preserved by all linear functionals of the same continuous-domain process. We explain how these results help in distinguishing suitable variational techniques for statistically solving inverse problems like denoising.

Index Terms—Decay gap, infinite divisibility, Lévy–Khinchine representation, Lévy process, sparse stochastic processes.

I. INTRODUCTION

GAUSSIAN processes are by far the most studied stochastic models. There are many advantages in favor of Gaussian models, such as simplicity of statistical analysis (e.g., in inference problems), stability of the Gaussian distribution (i.e., closedness under linear combinations), and unified parameterization of all marginal distributions. Yet, one of the downsides of Gaussian distributions is that they fail to properly represent sparse or compressible data, which is a good incentive for the study of alternative models.

The identification of compressible distributions (whose high-dimensional i.i.d. realizations likely consist of a small number of large elements that capture most of the energy of the sequence) is an active field of research where established findings indicate that rapidly decaying distributions such as Gaussian and Laplace are not compressible. Meanwhile, fat-tailed distributions are potential candidates for distributions with compressibility [1], [2]. For instance, mixture models are common representatives of the compressible distributions in the literature: one might think of a mixture of two zero-mean Gaussian laws with considerably different variances, where the outcome of the one with the smaller variance forms the insignificant or compressible part. The so-called Bernoulli–Gaussian distribution is an extreme case where one of the variances is zero.

To understand physical phenomena, we need to establish mathematical models that are calibrated with a finite number of measurements. These measurements are usually described by discrete stochastic models. Nevertheless, there are two fundamentally different modeling approaches.
1) Assume a discrete-domain model right from the start.
2) Initially adopt a continuous-domain model and discretize it later to describe the measurements.
We refer to the two as the discrete and the discretized models, respectively. One way to formalize stochastic processes is through an innovation model. In words, we assume that the stochastic objects are linked to discrete/continuous-domain innovation processes by means of linear operators. In this paper, we study innovation-driven discretized models. The framework was recently introduced in [3] and [4] under the name "Sparse Stochastic Processes."

A. Motivation

In many applications, the signals of interest possess sparse/compressible representations in some transform domains, although the observations are rarely sparse themselves. For analyzing such signals, it is befitting to establish sparse/compressible signal models. One way to incorporate sparsity into the model is to assume a sparsity-inducing probability distribution for the signal. Although the statistics of the signal are often known in the observation domain, the common trend is to assume a sparse/compressible distribution on the coefficients of a sparse representation. The typical example is to assume independent and identically distributed (i.i.d.) coefficients in a transform domain with Bernoulli–Gaussian law [5].

Our approach in this paper is to assume a continuous-domain innovation-driven model for the signal, where the statistics are imposed on the innovation process. The continuous-domain models are known to explain physical phenomena more accurately. In addition, as we will show in this paper, the innovation-driven models allow for specifying the statistics in any transform domain. This advantage is better understood when compared to the conventional Bernoulli–Gaussian discrete model. In the latter case, any transformation of the signal involves linear combinations of Bernoulli–Gaussian random variables. In general, such distributions are found by n-fold convolution of the constituent

Manuscript received December 4, 2012; revised December 7, 2013; accepted January 6, 2014. Date of publication January 29, 2014; date of current version March 13, 2014. This work was supported by the European Research Council under Grant ERC-2010-AdG 267439-FUN-SP.
A. Amini is with the Department of Electrical Engineering, Sharif University of Technology, Tehran 16846, Iran (e-mail: [email protected]).
M. Unser is with the Biomedical Imaging Group, École Polytechnique Fédérale de Lausanne, Lausanne CH-1015, Switzerland (e-mail: michael.unser@epfl.ch).
Communicated by G. Moustakides, Associate Editor for Detection and Estimation.
Digital Object Identifier 10.1109/TIT.2014.2303475
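As a quick numeric illustration of the mixture idea above (not part of the original analysis, with purely illustrative parameters), the sketch below draws i.i.d. samples from a two-component zero-mean Gaussian mixture and from a plain Gaussian of comparable scale, and compares what fraction of the total energy is captured by the largest 5% of the coefficients:

```python
import random

random.seed(0)
n = 10_000

def mixture_sample():
    # With probability 0.95 draw from the small-variance component (the
    # "insignificant" part); otherwise from the large-variance component.
    # The weights and variances are illustrative assumptions, not values
    # taken from the paper.
    return random.gauss(0.0, 0.1) if random.random() < 0.95 else random.gauss(0.0, 1.0)

mix = [mixture_sample() for _ in range(n)]
gauss = [random.gauss(0.0, 1.0) for _ in range(n)]

def top_energy_fraction(xs, frac=0.05):
    """Fraction of the total energy carried by the largest 100*frac % of |x|."""
    sq = sorted((x * x for x in xs), reverse=True)
    k = int(len(sq) * frac)
    return sum(sq[:k]) / sum(sq)

mix_frac = top_energy_fraction(mix)      # a few large entries dominate
gauss_frac = top_energy_fraction(gauss)  # energy is spread out
```

The mixture realization is "compressible" in the informal sense used here: a small fraction of its entries carries most of the energy, whereas the Gaussian realization spreads its energy much more evenly.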
density functions. However, in the case of innovation-driven models, there is a direct way of expressing all such statistics based on the distribution of the innovation process. In particular, it turns out that the statistics of the innovation process determine whether the process of interest has a sparse/compressible representation.

Let us assume that the structure of the process is such that it has a sparse/compressible representation in a given domain (e.g., a continuous-domain wavelet transform). In order to represent a realization of the process based on a finite number of measurements, it is favorable to estimate the sparse wavelet coefficients. However, we need the probability laws of the coefficients in order to estimate them. In other words, unlike the previous scenario in which we assume the statistics of the sparse representation, we now need to derive them.

The knowledge about the distribution of the coefficients in the sparse representation can also be exploited in signal-recovery problems (e.g., inverse problems) by devising statistical techniques such as maximum a posteriori (MAP) and minimum mean-square error (MMSE) methods. The common implementation of such statistical techniques is to reformulate them as variational problems in which we minimize a cost that involves the prior or posterior probability density function (pdf). Thus, the shape of the pdf plays a significant role in the minimization procedure and, therefore, in the recovery method.

In this paper, we maintain awareness of the analog perspective of the model while studying the sparse/compressible distributions that arise from innovation-driven models. Such models serve as common ground for the conventional continuous-domain and the modern sparsity-based models. In particular, we investigate the implications of these models in minimization problems linked with statistical recovery methods.

B. Contribution

To study innovation-driven processes, we formally introduce innovation processes. The approach towards these processes is based on observations through analysis functions rather than through conventional pointwise samples. It allows us to deduce the statistics of any linear functional of the innovation process (or observations through some arbitrary test functions). As a starting point, we show in Proposition 1 that the observation of an innovation process through a rectangular window characterizes the whole process. The result can also be used to characterize innovation-driven processes, by mapping the observations onto the innovation process itself. The practical advantage is that this formulation lends itself to the derivation of statistics in any linear transform domain.

At first glance, it would seem that there is no obvious distinction between the discrete and discretized versions of innovation models. Nevertheless, we shall show in Theorems 2 and 3 that the discretized models are strictly embedded in the discrete family. The reason for this is that every random variable associated with linear measurements of a continuous-domain innovation model is necessarily infinitely divisible (id), while there is no such restriction on discrete-domain models. The key property is that infinite divisibility, which is classically associated with Lévy processes [6], is preserved by linear transformations.

To highlight the implications of the discretized model, we focus on the tail behavior of probability density functions in Theorem 5. It is well known that, among the family of id laws, Gaussian distributions have the fastest decay. Since the degree of sparsity/compressibility of a distribution is in inverse relation with its decay rate [1], all non-Gaussian members of the id family are sparser than the Gaussians. Therefore, distributions with super-Gaussian decay (e.g., distributions with finite support such as the uniform distribution) cannot be id. As we shall show in Theorem 7, there is even a gap between the Gaussian rate of decay and the rest of the id family, in the sense that the non-Gaussian id pdfs cannot decay faster than e^{−O(|x| log |x|)}.

Besides the tail behavior, we study the unimodality and moment indeterminacy of id laws in Theorems 9 and 11. We shall show that linear transformations preserve these properties along with infinite divisibility.

C. Outline

We address continuous-domain models in Section II. This includes the definition and characterization of innovation processes. In Section III, we deduce the infinite-divisibility property as a major consequence of adopting an innovation-based model. This property helps us characterize the set of admissible probability distributions. In Section IV, we cover some key properties that are shared among infinitely divisible distributions obtained from the model, such as unimodality and the rate of decay of the tail.

II. STOCHASTIC FRAMEWORK

The notation in this paper is consistent with the previous works [3] and [4]. We denote by s(x), for x ∈ R^d, the continuous-domain stochastic process which models a real/complex-valued physical phenomenon. We assume that the process is the result of applying a linear operator on a continuous-domain innovation process. The innovation process is represented by w. The model presupposes the existence of a pair of operators L⁻¹ and L with LL⁻¹ = I (identity operator) that transform w to s and vice versa, respectively. The former is known as the shaping operator L⁻¹ and the latter as the whitening operator L.

To discretize the continuous-domain process or to project it onto a Riesz basis (transform domain), we consider generalized sampling through sampling kernels (or a dual frame) ψ₁, ..., ψ_K. The constraint on the kernels is that φ_i = L⁻¹*ψ_i should have a finite L_p norm for certain values of p, where L⁻¹* refers to the adjoint of L⁻¹. The schematic of our stochastic model is shown in Fig. 1.

A. Characteristic Functionals

Since the considered processes are not necessarily Gaussian, the first- and second-order statistics (such as mean and covariance) are not sufficient to fully characterize them. The alternative used here is the so-called characteristic functional.
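The insufficiency of second-order statistics for non-Gaussian laws can be illustrated numerically (this sketch is ours, not from the paper): a Laplace variable and a Gaussian variable with identical mean 0 and variance 1 are indistinguishable up to second order, yet their characteristic functions E[e^{jωX}] differ. Below, the Laplace law of scale b = 1/√2 is sampled as a difference of two exponentials, and both empirical characteristic functions are estimated at ω = 2:

```python
import math
import random

random.seed(2)
n = 200_000
b = 1 / math.sqrt(2)  # Laplace scale chosen so that the variance 2*b^2 equals 1

def laplace_sample():
    # A Laplace(0, b) variable is the difference of two i.i.d. Exp(scale b)
    # variables; its characteristic function is 1 / (1 + b^2 w^2).
    return b * (random.expovariate(1.0) - random.expovariate(1.0))

w0 = 2.0
# For symmetric laws the characteristic function is real: E[cos(w0 * X)].
gauss_cf = sum(math.cos(w0 * random.gauss(0.0, 1.0)) for _ in range(n)) / n
lap_cf = sum(math.cos(w0 * laplace_sample()) for _ in range(n)) / n

gauss_exact = math.exp(-w0 ** 2 / 2)   # Gaussian: e^{-w^2/2}
lap_exact = 1 / (1 + (b * w0) ** 2)    # Laplace: 1/(1 + b^2 w^2)
```

Both estimates match their closed forms, and the two closed forms are clearly different even though the first two moments agree, which is why the characteristic function (and, for processes, the characteristic functional) is the right object to work with.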

Fig. 1. Schematic of the stochastic framework.

The characteristic functional is conceptually the same as the characteristic function, but it is associated with random processes rather than random variables [7]. For an arbitrary random process z, the characteristic functional P̂_z is defined as

P̂_z(ψ) = E{e^{j⟨z,ψ⟩}},   (1)

where ψ is a suitable test function and

Z_ψ = ⟨z, ψ⟩ = ∫_{R^d} z(x)ψ(x) dx   (2)

is a real/complex-valued random variable which is a linear functional of z. Note that the characteristic functional is indexed by test functions rather than scalars.

The specification of the input domain of a characteristic functional is part of its definition. For a process z, we denote the set of all valid test functions by Φ_z. It is required that this set is a function space. This implies that linear combinations of elements in Φ_z belong to Φ_z. Hence, if ψ₁, ..., ψ_K ∈ Φ_z and ω₁, ..., ω_K is a set of real variables, then Σ_{k=1}^K ω_k ψ_k ∈ Φ_z and

P̂_z(Σ_{k=1}^K ω_k ψ_k) = E{e^{j Σ_{k=1}^K ω_k Z_{ψ_k}}}
 = ∫_{R^K} p_{Z_{ψ₁}:Z_{ψ_K}}(x₁, ..., x_K) e^{j Σ_{k=1}^K ω_k x_k} dx₁ ⋯ dx_K
 = F⁻¹{p_{Z_{ψ₁}:Z_{ψ_K}}}(ω₁, ..., ω_K),   (3)

where F{·} represents the Fourier transform operator and p_{Z_{ψ₁}:Z_{ψ_K}} is the joint pdf of the linear observations Z_{ψ₁}, ..., Z_{ψ_K} of the process z. Therefore, all finite-dimensional pdfs can be derived from the characteristic functional. The function spaces Φ used in this paper are intersections of two L_p spaces (functions with finite p-norm).

The main interest of characteristic functionals is that they provide a concise and rigorous way of defining stochastic processes. In brief, a functional P̂_z for which P̂_z(0) = 1 and P̂_z(Σ_{k=1}^K ω_k ψ_k) is a valid characteristic function for all ψ_k ∈ Φ_z corresponds to a unique random process z. More details about this fact are provided in Appendix A.

B. Innovation Process

The innovation process (a.k.a. white noise) is a random object composed of i.i.d. constituents. A discrete-domain innovation process is simply a sequence of i.i.d. random variables that is completely characterized by its pdf, which can be arbitrary. By contrast, a continuous-domain innovation process is defined by its observations through test functions.

Definition 1: The process w is a continuous-domain innovation process if
1) (Stationarity) for any test function ϕ ∈ Φ_w and arbitrary τ₁, τ₂ ∈ R^d, the two random variables ⟨w, ϕ(·−τ₁)⟩ and ⟨w, ϕ(·−τ₂)⟩ have identical distributions;
2) (Independent atoms) for test functions ϕ₁, ϕ₂ ∈ Φ_w with disjoint supports such that ϕ₁(x)ϕ₂(x) ≡ 0, the random variables ⟨w, ϕ₁⟩ and ⟨w, ϕ₂⟩ are independent.

The independent-atom property implies that, for ϕ₁, ϕ₂ ∈ Φ_w with disjoint supports, we should have that

P̂_w(ϕ₁ + ϕ₂) = P̂_w(ϕ₁) P̂_w(ϕ₂).   (4)

In Section III, we give a full characterization of continuous-domain innovation processes using the Gelfand–Vilenkin approach.

It is known that two Gaussian random variables are independent if and only if they are uncorrelated. Thus, for Gaussian innovation processes, it is common to express the independent-atom property based on two orthogonal test functions (∫ ϕ₁(x)ϕ₂(x) dx = 0) without considering their supports. The characteristic functional of a Gaussian innovation process with symmetric distribution is given by [8]

P̂_{w_g}(ϕ) = e^{−(1/2) σ_g² ‖ϕ‖₂²}.   (5)

This functional is well-defined for ϕ ∈ L₂. It satisfies the requirements of Definition 1 owing to the facts that ‖ϕ(·−τ)‖₂² = ‖ϕ‖₂² and ‖ϕ₁ + ϕ₂‖₂² = ‖ϕ₁‖₂² + ‖ϕ₂‖₂² + 2⟨ϕ₁, ϕ₂⟩.

C. Linear Operators

The linear operator L in Fig. 1 is the continuous-domain analog of the sparsifying matrix used in compressed sensing. Conversely, the inverse operator L⁻¹ mixes the independent components of the innovation process to form specific correlation patterns. In order to formally define the application of L⁻¹ on w, one might think of studying the effect of the operator on the realizations. Here, we concentrate on the characteristic functionals. The key to our study is the concept of the adjoint operator, which enables us to write

S_ψ = ⟨s, ψ⟩ = ⟨L⁻¹w, ψ⟩ = ⟨w, L⁻¹*ψ⟩ = W_φ,   (6)

where L⁻¹* is the adjoint operator of L⁻¹ and φ = L⁻¹*ψ. Hence, the characteristic functional of the process s can be expressed as

P̂_s(ψ) = P̂_w(L⁻¹*ψ) = P̂_w(φ).   (7)

This definition implies that the domain of P̂_s is made up of those test functions ψ for which φ = L⁻¹*ψ ∈ Φ_w. For existence considerations and the proper interpretation of s as a generalized process over tempered distributions (S′), it is important that the domains of both P̂_w and P̂_s include the Schwartz function space S (see Appendix A). This constrains L⁻¹* to form a mapping from the Schwartz space to a subset of Φ_w. In this paper, we assume that the operator L⁻¹ satisfies the required admissibility properties. More details regarding the specification of suitable inverse operators can be found in [3] and [9].

It is worth mentioning that L need not be uniquely invertible to have an acceptable shaping operator L⁻¹. The real requirement is that L has a finite-dimensional null space and L⁻¹ is a right inverse of L that continuously maps S into Φ_w [9]. Ordinary differential operators with constant coefficients are among the examples of L that admit suitable right-inverses [10], [11]. For such operators, L⁻¹ is an integral operator. The formalism also extends to the cases where the underlying system is unstable, which requires the imposition of suitable linear boundary conditions in order to enforce unicity. Linearity is of fundamental importance to the formulation and is embedded in the definition of s via the use of the adjoint operator L⁻¹*.

D. Discretization

We use the term discretization for observations of the continuous-domain process s through test functions (sampling kernels). We now explain two objectives of discretization.
1) Let us consider the expansion of the realizations of s in a Riesz basis, namely, {ψ̃_i}. The coefficients in the expansion are found as the inner products of s with the dual basis {ψ_i}, as in

s = Σ_i ⟨s, ψ_i⟩ ψ̃_i.   (8)

This suggests that the statistics of s are encapsulated in the coefficients ⟨s, ψ_i⟩. In this scenario, the task of finding the coefficients and their statistics in a transform domain is referred to as discretization.
2) We are practically limited to sensing physical phenomena through a finite number of measurements. The stochastic models that describe such phenomena are usually characterized by a set of parameters, and the measurements can be used to estimate these parameters. For instance, one might think of the optimal set of parameters as the one that best explains the statistics of the measurements. In many applications, the measurements are point samples or, more generally, linear samples of the physical phenomena by means of sampling kernels ψ₁, ..., ψ_K. In this context, the discretization procedure translates into linearly measuring the process. It might also involve an additive noise term.

In spite of the different objectives, all discretizations are centered on the random variables s_{ψ_i} = ⟨s, ψ_i⟩. The definition of s restricts the kernels ψ_i to satisfy φ_i = L⁻¹*ψ_i ∈ Φ_w. This guarantees the inclusion of the random variables s_{ψ_i} in the established framework by way of the characteristic functional.

Our results in this paper do not depend on the choice of sampling kernels used in discretization, as long as they are admissible. However, certain kernels are preferable for the purpose of sparse representation. For the sake of simplicity, let us consider a hypothetical setting in which the s_{ψ_i} are i.i.d. If the distribution is also compressible, then we are dealing with a linear transform domain with compressible i.i.d. coefficients, which is an ideal scenario for compressed sensing. In most cases, however, such a linear transformation does not exist. Instead, one may think of a linear transformation which best uncouples the coefficients. For instance, it is shown in [10] that, if L is a differential operator, then the generalized differences of uniform point samples of s have finite-length dependencies. Note that the generalized differences are linear functionals of the process and can be written in the form ⟨s, ψ_i⟩. In turn, the functions φ_i = L⁻¹*ψ_i are exponential splines associated with the differential operator L and are of finite support [12]. The relaxation of the i.i.d. property to finite-length dependencies is still useful because the sequence can be written as the union of a finite (but more than one) number of i.i.d. subsequences.

III. INFINITE DIVISIBILITY

The main property of the measurements that we are going to explore is the infinite divisibility stated in Definition 2.

Definition 2: A random variable X (or its distribution) is said to be infinitely divisible if, for every positive integer n, we can write X as the sum of n independent and identically distributed (i.i.d.) random variables.

It is easy to check that the sum of n independent Gaussian random variables with mean μ/n and variance σ²/n is a Gaussian random variable with mean μ and variance σ². Thus, all Gaussian random variables are infinitely divisible. The same argument can be extended to other stable distributions. Nevertheless, the stable distributions are only a small part of the id family. The complete family is characterized by the celebrated Lévy–Khinchine representation theorem in Section III-B.

In the following, we first show that the discretizations of an innovation process through rectangular test functions are infinitely divisible. Then, we characterize all id laws. This in turn characterizes the innovation processes. The final result of this section is that infinite divisibility is a general property that is shared by all linear measurements and is not restricted to rectangular test functions.

A. Observations With Rectangular Windows

To measure a process through a given test function, we should first make sure that the test function belongs to the associated function space. For the innovation process w, this requires the knowledge of the space Φ_w. Although we postpone the exact identification of Φ_w to Section III-D, we can already use the result that Φ_w is the intersection of some L_p spaces, which certainly contains the intersection of all L_p spaces.

We first examine the unit rectangular test function rect(x) that takes the value 1 for x ∈ [0, 1[^d and 0 otherwise. Since this function is bounded and has finite support, it belongs to ∩_{p≥0} L_p(R^d).

Lemma 1: The random variable ⟨w, rect⟩, where w is an innovation process as specified in Definition 1, is infinitely divisible.
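The Gaussian divisibility argument recalled above can be checked numerically. The following sketch (ours, with illustrative parameters) builds each sample as the sum of n i.i.d. N(μ/n, σ²/n) parts and verifies that the empirical mean and variance of the sums match μ and σ²:

```python
import math
import random
import statistics

random.seed(1)
mu, sigma, n_parts, trials = 2.0, 3.0, 7, 20_000

# Each X is assembled from n_parts i.i.d. N(mu/n_parts, sigma^2/n_parts)
# summands, exactly as in the divisibility argument for the Gaussian law.
part_mu = mu / n_parts
part_sigma = sigma / math.sqrt(n_parts)
sums = [sum(random.gauss(part_mu, part_sigma) for _ in range(n_parts))
        for _ in range(trials)]

sum_mean = statistics.fmean(sums)   # should be close to mu = 2
sum_var = statistics.variance(sums) # should be close to sigma^2 = 9
```

Since n here is arbitrary, the same construction works for every positive integer n, which is precisely the content of Definition 2 for the Gaussian case.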

Proof: A distinguishing property of the rectangular function is its refinability, given by

rect(x) = Σ_{i=0}^{n−1} rect(nx₁ − i, x₂, ..., x_d),   (9)

where x = (x₁, ..., x_d) and n is any positive integer. This implies that

X = ⟨w, rect⟩ = Σ_{i=0}^{n−1} ⟨w(x), rect(nx₁ − i, x₂, ..., x_d)⟩ =: Σ_{i=0}^{n−1} X_i.   (10)

Note that the test functions rect(nx₁ − i, x₂, ..., x_d) for i = 0, ..., (n−1) differ only by the shift parameter i. Moreover, they have disjoint supports. Hence, due to the stationarity and independent-atom properties of the innovation process w, the random variables X_i are i.i.d. Consequently, for arbitrary n, we have a representation of X as the sum of n i.i.d. random variables; thus, X is infinitely divisible.

B. Characterization of id Distributions

The concept of infinite divisibility was introduced and studied in the late 1920s and early 1930s by de Finetti, Kolmogorov, and Lévy. The complete characterization of id distributions is given by the Lévy–Khinchine representation theorem.

Theorem 1 (Lévy–Khinchine [6]): The random variable X is infinitely divisible if and only if its characteristic function has the form p̂_X(ω) = exp(f(ω)) with

f(ω) = jθω − (σ²/2)ω² + ∫_{R∖{0}} ( e^{jaω} − 1 − jaω 1_{|a|<1}(a) ) dV(a),   (11)

where 1_{h(a)<1}(a) = 1 for {a | h(a) < 1} and 0 otherwise, θ and σ are constants, and V (the Lévy measure) is a positive measure that satisfies

∫_{R∖{0}} min(1, a²) dV(a) < ∞.   (12)

The function f in (11) is usually referred to as the Lévy exponent. Its finiteness implies that the characteristic function of an id random variable does not vanish. It is interesting to point out how three important properties of characteristic functions impact the Lévy exponent.
1) Normalization: p̂_X(0) = ∫_R p_X(x) dx = 1. This implies that f(0) = 0, which is consistent with (11).
2) Since the characteristic function is the Fourier transform of a nonnegative distribution, we have that |p̂_X(ω)| ≤ p̂_X(0). This is equivalent to ℜ{f(ω)} ≤ 0, with equality at ω = 0.
3) Continuity: p̂_X is the Fourier transform of a nonnegative integrable distribution. Thus, it is continuous. This translates into f = log p̂_X being continuous as well.

Theorem 1 indicates that id distributions are uniquely characterized by the triplet (θ, σ, V). For instance, a Gaussian distribution with mean μ_g and variance σ_g² corresponds to the triplet (μ_g, σ_g, V ≡ 0). In fact, the term associated with the constant σ is usually regarded as the Gaussian term; this becomes even more evident in the Lévy–Itō decomposition of Lévy processes.

When the Lévy measure is symmetric, with V(I) = V(−I) for all measurable sets I, the Lévy exponent admits the simplified form

f(ω) = jθω − (σ²/2)ω² − ∫_{R∖{0}} (1 − cos(aω)) dV(a).   (13)

By Theorem 1 and Lemma 1, it follows that the characteristic function of ⟨w, rect⟩ takes the generic form

p̂_{⟨w,rect⟩}(ω) = P̂_w(ω rect) = e^{f(ω)},   (14)

where f is a valid Lévy exponent.

C. Discretizations With General Windows

So far, we have considered rectangular windows. Next, we study the implications of the rectangular results for more general test functions. By employing the independent-atom and stationarity properties of the innovation process and the refinement formula

rect(x) = Σ_{k ∈ {0,...,n−1}^d} rect(nx − k),   (15)

we conclude that

P̂_w(ω rect(nx − k)) = (P̂_w(ω rect))^{1/n^d} = e^{f(ω)/n^d}.   (16)

This allows us to further identify the value of P̂_w for the general class of piecewise-constant functions of the form Σ_k a_k rect(nx − k), such as

P̂_w(ω Σ_{k∈A_K} a_k rect(nx − k)) = Π_{k∈A_K} P̂_w(ω a_k rect(nx − k)) = e^{(1/n^d) Σ_{k∈A_K} f(ω a_k)},   (17)

where A_K = {−K, ..., K}^d and a_k ∈ R are arbitrary coefficients. In other words, the characterization of ⟨w, rect⟩ results in the identification of P̂_w over the set of d-dimensional piecewise-constant signals of finite support with corners at rational grid points.

These step functions can also be used to approximate other test functions; by increasing n and K, we make the step functions finer and wider in support, respectively.

Proposition 1: For a given test function ϕ ∈ Φ_w, where Φ_w = L_{p₁} ∩ L_{p₂} and |ϕ|^{p_i} is Riemann-integrable, we have that

∀ω: P̂_w(ωϕ) = exp( ∫_{R^d} f(ωϕ(τ)) dτ ).   (18)

Proof: The key idea is that, due to the Riemann integrability of |ϕ|, it is possible to find a sequence of step functions {ϕ_n}_{n∈N} such that |ϕ_n| ≤ |ϕ| and lim_{n→∞} ϕ_n = ϕ. For each realization w_r of w, we have that

⟨w_r, ϕ⟩ = lim_{n→∞} ⟨w_r, ϕ_n⟩.   (19)

Therefore, the random variables W_{ϕ_n} = ⟨w, ϕ_n⟩ converge to the random variable W_ϕ = ⟨w, ϕ⟩ almost surely.
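The symmetric form (13) can be checked numerically on a concrete id law. It is a classical fact that the Laplace distribution with density (1/2)e^{−|x|} is id with θ = 0, σ = 0, and Lévy density e^{−|a|}/|a|; its characteristic function is 1/(1 + ω²), so (13) predicts f(ω) = −log(1 + ω²). The sketch below (ours, simple midpoint quadrature) confirms this:

```python
import math

def levy_exponent(w, upper=50.0, step=1e-3):
    """Evaluate f(w) from (13) with theta = sigma = 0 and the symmetric
    Lévy density dV(a) = e^{-|a|}/|a| da of the Laplace law.

    The integrand (1 - cos(a*w)) e^{-a}/a behaves like a*w^2*e^{-a} near 0,
    so there is no singularity; the tail beyond `upper` is negligible.
    """
    total = 0.0
    a = step / 2  # midpoint rule over (0, upper)
    while a < upper:
        total += (1 - math.cos(a * w)) * math.exp(-a) / a * step
        a += step
    # Factor 2 accounts for the symmetric measure on both half-lines.
    return -2 * total

w = 2.0
f_num = levy_exponent(w)
f_exact = -math.log(1 + w * w)  # log of the Laplace characteristic function
```

The agreement of `f_num` with `f_exact` illustrates how the triplet (0, 0, V) in Theorem 1 reproduces a familiar non-Gaussian id characteristic function.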

According to Lévy's continuity theorem, a similar convergence result holds for the characteristic functions:

p̂_{W_ϕ}(ω) = E{e^{jωW_ϕ}} = lim_{n→∞} E{e^{jωW_{ϕ_n}}} = lim_{n→∞} p̂_{W_{ϕ_n}}(ω).   (20)

Note that p̂_{W_φ}(ω) = P̂_w(ωφ) and, for step functions ϕ_n, we already know the validity of (18) from (17), so that

P̂_w(ωϕ_n) = exp( ∫_{R^d} f(ωϕ_n(τ)) dτ ).   (21)

Hence,

P̂_w(ωϕ) = lim_{n→∞} exp( ∫_{R^d} f(ωϕ_n(τ)) dτ ) = exp( lim_{n→∞} ∫_{R^d} f(ωϕ_n(τ)) dτ ).   (22)

The proof is completed by

lim_{n→∞} ∫_{R^d} f(ωϕ_n(τ)) dτ = ∫_{R^d} f(ω lim_{n→∞} ϕ_n(τ)) dτ,   (23)

where we invoke the continuity of f and Lebesgue's dominated convergence theorem to justify the interchange of limits. This requires the upper bound of Lemma 3 (Section III-D) on |f(ω)|, which implies an upper bound on ∫_{R^d} |f(ωϕ_n(τ))| dτ in terms of some L_p norms of ϕ_n and, consequently, of ϕ.

One can check that (18) is consistent with the previous assumptions regarding the rectangular window. In particular, the characteristic function of X = ⟨w, rect⟩ predicted by (18) matches p̂_X(ω) = exp(f(ω)). Moreover, by setting ω = 1 in (18), we can interpret the result of Proposition 1 in terms of the characteristic functional as

P̂_w(ϕ) = exp( ∫_{R^d} f(ϕ(τ)) dτ ).   (24)

This form is multiplicative for disjointly supported test functions ϕ₁, ϕ₂, which guarantees the independent-atom property of the process. Conversely, Gelfand and Vilenkin proved in [8] that (24) is a valid characteristic functional over the space of smooth and compactly supported functions if and only if f is a valid Lévy exponent. In this work, we shall investigate the extent to which we can expand the class of test functions.

D. Characteristic Functional Over Φ_w

By defining a characteristic functional over some function space Φ, we imply that the probability measure of the process is supported on the dual of Φ (Appendix A). To highlight this point, let Φ̂ be a strict subspace of Φ. The definition of the characteristic functional over Φ induces a definition over Φ̂. The latter definition results in an extension of the probability space to the algebraic dual of Φ̂, typically via the inclusion of new sets with probability measure zero. Therefore, it is desirable to base the definition of the characteristic functional on the largest possible space, so as to maximally constrain the support of the probability measure.

Definition 3: The Lévy measure V is said to be (p₁, p₂)-bounded for 0 ≤ p₁ ≤ p₂ ≤ 2 if

∫_{R∖{0}} min(|a|^{p₁}, |a|^{p₂}) dV(a) < ∞.   (25)

The purpose of (p₁, p₂)-boundedness is to refine the (0, 2)-boundedness imposed by (12) in order to better represent the properties of a given Lévy measure. As Lemma 2 indicates, a (p₁, p₂)-bounded measure is automatically (0, 2)-bounded.

Lemma 2: If 0 ≤ q₁ ≤ p₁ ≤ p₂ ≤ q₂ ≤ 2, then the (p₁, p₂)-boundedness of a measure V implies its (q₁, q₂)-boundedness.

Proof: By separately studying the cases |a| ≤ 1 and |a| > 1, we can check that

min(|a|^{q₁}, |a|^{q₂}) ≤ min(|a|^{p₁}, |a|^{p₂}).   (26)

This yields

∫_{R∖{0}} min(|a|^{q₁}, |a|^{q₂}) dV(a) ≤ ∫_{R∖{0}} min(|a|^{p₁}, |a|^{p₂}) dV(a) < ∞.   (27)

The particular instance of Lemma 2 for q₁ = 0 and q₂ = 2 suggests that (p₁, p₂)-boundedness is a more restrictive property than the classical constraint (12).

The concept of (p₁, p₂)-boundedness describes some properties of the Lévy measure V, such as the decay of its tail. In order to specify Φ_w, we need to take into account the properties of the full Lévy triplet (θ, σ, V). We therefore refine this concept in Definition 4 by including the two other elements of the triplet.

Definition 4: We say that the pair (p_min, p_max), where 0 ≤ p_min ≤ p_max ≤ 2, bounds the Lévy triplet (θ, σ, V) if
1) V is (p_min, p_max)-bounded (see Definition 3);
2) 1 ∈ [p_min, p_max] in case θ ≠ 0 or V is asymmetric (no constraint when θ = 0 and V is symmetric); and
3) p_max = 2 for σ ≠ 0 (no constraint for σ = 0).

Similar to (p₁, p₂)-boundedness, it is easy to check that (0, 2) bounds all Lévy triplets. Furthermore, if (p_min, p_max) bounds a given triplet, then all pairs (q_min, q_max) such that 0 ≤ q_min ≤ p_min ≤ p_max ≤ q_max ≤ 2 bound the triplet as well.

The significance of Definition 4 is in identifying the L_p spaces whose intersection results in a valid function space Φ_w for the domain of the characteristic functional. We show in Theorem 2 that, if (p_min, p_max) bounds the Lévy triplet, then

Φ_w = L_{p_min}(R^d) ∩ L_{p_max}(R^d)   (28)

is a suitable function space as the input domain of the characteristic functional of the innovation process. By convention, the limit case L₀ denotes the space of bounded and compactly supported functions.

For 0 ≤ q_min ≤ p_min ≤ p_max ≤ q_max ≤ 2, we have that L_{q_min} ∩ L_{q_max} ⊆ L_{p_min} ∩ L_{p_max}. Thus, the tighter the bounding pair (p_min, p_max) on the Lévy triplet, the larger the function space Φ_w. Concretely, this means that by finding tight bounding pairs for the Lévy triplet we get a better description of the properties of the innovation process.

bounding pairs for the Lévy triplet, we obtain a better description of the properties of the innovation process.

In Theorem 2, we prove that the function space $\Phi_w$ defined in (28) is a suitable domain for the characteristic functional of (24). Lemma 3, whose proof is given in Appendix B, is our main tool for establishing this fact.

Lemma 3: Let $V$ be a $(p_1, p_2)$-bounded Lévy measure and define
\[ g(\omega) = \int_{\mathbb{R}\setminus\{0\}} \big( \mathrm{e}^{\mathrm{j}a\omega} - 1 - \mathrm{j}a\omega\,\mathbb{1}_{|a|<1}(a) \big)\,\mathrm{d}V(a). \tag{29} \]
We have that
\[ |g(\omega)| \le \kappa_1 |\omega|^m + \kappa_2 |\omega|^M, \tag{30} \]
where $\kappa_1$ and $\kappa_2$ are nonnegative constants. Here, $(m, M) = (p_1, p_2)$ if $V$ is symmetric, and $m = \min(1, p_1)$ and $M = \max(1, p_2)$ otherwise.

Theorem 2: Let $f$ be a Lévy exponent characterized by the triplet $(b, \sigma, V)$ and let $(p_{\min}, p_{\max})$ be a pair that bounds the triplet. Then, the characteristic functional
\[ \widehat{\mathscr{P}}_w(\varphi) = \exp\Big( \int_{\mathbb{R}^d} f\big(\varphi(\tau)\big)\,\mathrm{d}\tau \Big) \]
is finite (well-defined) over $\Phi_w = L_{p_{\min}}(\mathbb{R}^d) \cap L_{p_{\max}}(\mathbb{R}^d)$.

Proof: Using the Lévy–Khintchine representation (11), we rewrite the exponent of the characteristic functional as
\[ \int_{\mathbb{R}^d} f\big(\varphi(\tau)\big)\,\mathrm{d}\tau = \underbrace{\mathrm{j}\theta \int_{\mathbb{R}^d} \varphi(\tau)\,\mathrm{d}\tau}_{T_1} - \underbrace{\frac{\sigma^2}{2} \int_{\mathbb{R}^d} \varphi^2(\tau)\,\mathrm{d}\tau}_{T_2} + \underbrace{\int_{\mathbb{R}^d} g\big(\varphi(\tau)\big)\,\mathrm{d}\tau}_{T_3}. \tag{31} \]
We establish finiteness for each of the terms contributing in (31).
• If $\theta = 0$, then $T_1 = 0$. For $\theta \neq 0$, Definition 4 implies that $1 \in [p_{\min}, p_{\max}]$ and, hence, $\varphi \in L_1(\mathbb{R}^d)$. The inequality $|T_1| \le |\theta| \cdot \|\varphi\|_1$ confirms that $T_1$ is finite.
• Similarly, $\sigma = 0$ yields $T_2 = 0$; thus, we assume $\sigma \neq 0$. Under this assumption, Definition 4 necessitates that $p_{\max} = 2$ and, hence, $\varphi \in L_2(\mathbb{R}^d)$. The finiteness of $T_2$ follows from the inequality $|T_2| \le \frac{\sigma^2}{2}\|\varphi\|_2^2$.
• We prove the finiteness of $T_3$ by applying Lemma 3. Note that the pair $(p_{\min}, p_{\max})$ also satisfies the requirements of Lemma 3. This provides us with $\varphi \in L_{p_{\min}}(\mathbb{R}^d) \cap L_{p_{\max}}(\mathbb{R}^d)$ and
\[ |T_3| \le \kappa_1 \|\varphi\|_{p_{\min}}^{p_{\min}} + \kappa_2 \|\varphi\|_{p_{\max}}^{p_{\max}}. \tag{32} \]

E. Infinite Divisibility of All Discretizations

Our last contribution in this section is to show that all the measurements $W_\varphi = \langle w, \varphi\rangle$ are infinitely divisible.

Theorem 3: Let $(\theta, \sigma, V)$ be a Lévy triplet representing the Lévy exponent $f$, and let $\Phi_w$ and $w$ be the corresponding function space and innovation process as defined in Theorem 2, respectively. For a given $\varphi \in \Phi_w$, define $\mu_\varphi$ to be the measure describing the amplitude distribution of $\varphi$. Then, the random variable $X_\varphi = \langle w, \varphi\rangle$ is infinitely divisible with the Lévy exponent
\[ f_\varphi(\omega) = \int_{\mathbb{R}^d} f\big(\omega\varphi(\tau)\big)\,\mathrm{d}\tau, \tag{33} \]
which can be represented by the triplet $(\theta_\varphi, \sigma_\varphi, V_\varphi)$, where
\[ \begin{cases} \sigma_\varphi = \sigma\|\varphi\|_2 \quad (= 0 \text{ if } \sigma = 0), \\ V_\varphi(I) = \displaystyle\iint_{a\tau \in I} \mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) \quad (0 \notin I \subset \mathbb{R}). \end{cases} \tag{34} \]
Furthermore, except for $\varphi \equiv 0$, $V_\varphi$ is $(p_1, p_2)$-bounded if and only if $V$ is $(p_1, p_2)$-bounded.

To facilitate the reading of the paper, the proof is postponed to Appendix C. The main message of Theorem 3 is that all linear observations of an innovation-driven process (subject to the admissibility condition $\varphi = L^{-1*}\psi \in \Phi_w$) are infinitely divisible, with roughly similar Lévy measures.

IV. PROPERTIES OF INFINITELY DIVISIBLE DISTRIBUTIONS

In this section, we study properties of the id family, such as the decay and unimodality of the probability density functions. We also investigate their consequences on transform-domain statistics. Our approach is based on expressing various properties of the pdf in terms of the associated Lévy measure. As discussed in Section III, certain high-level properties of the Lévy measures are shared among different linear measurements of an innovation-driven process. The links between the Lévy measures and pdfs help us in establishing the implications that this has on the probability laws.

Remark 1: Let $X$ be an infinitely divisible random variable with Lévy triplet $(\theta, \sigma, V)$, and let $N$ be a Gaussian random variable independent of $X$ with mean $\mu_g$ and variance $\sigma_g^2$. Then, the random variable $X + N$ is also infinitely divisible, with the Lévy triplet $\big(\theta + \mu_g, \sqrt{\sigma^2 + \sigma_g^2}, V\big)$. This can be easily verified by stating the independence of $X$ and $N$ in the form $\hat{p}_{X+N}(\omega) = \hat{p}_X(\omega)\,\hat{p}_N(\omega)$. The main consequence is that the existence of an additive Gaussian noise does not change those properties of $X$ that are related to its Lévy measure.
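As a quick sanity check of Remark 1, the characteristic-function identity $\hat{p}_{X+N} = \hat{p}_X\,\hat{p}_N$ can be probed numerically. The sketch below is our own illustration (not part of the original derivation): it takes $X$ to be a unit Laplace variable, whose characteristic function $1/(1+\omega^2)$ is a standard fact, adds an independent Gaussian $N$, and compares the empirical characteristic function of $X + N$ against the product formula.

```python
import cmath
import random

random.seed(1)
n = 200_000
mu_g, sigma_g = 0.3, 0.8  # mean and standard deviation of the Gaussian term N

# X: Laplace(0, 1) via a sign-flipped exponential; N: independent Gaussian.
def laplace_sample():
    return random.expovariate(1.0) * random.choice((-1.0, 1.0))

samples = [laplace_sample() + random.gauss(mu_g, sigma_g) for _ in range(n)]

def emp_cf(omega):
    # empirical characteristic function (1/n) * sum exp(j*omega*s)
    return sum(cmath.exp(1j * omega * s) for s in samples) / n

def theory_cf(omega):
    # product of the Laplace and Gaussian characteristic functions
    cf_x = 1.0 / (1.0 + omega ** 2)
    cf_n = cmath.exp(1j * mu_g * omega - 0.5 * (sigma_g * omega) ** 2)
    return cf_x * cf_n

errors = [abs(emp_cf(w) - theory_cf(w)) for w in (0.5, 1.0, 2.0)]
```

With $2\times 10^5$ samples, the Monte Carlo error is on the order of $10^{-3}$, so the empirical and product characteristic functions agree to well within statistical accuracy.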

A. Decay Rate

The id property is typically associated with slowly decaying pdfs. More specifically, it will be proved that Gaussian laws have the fastest rate of decay among id distributions. Thus, all distributions with super-Gaussian decay are necessarily non-id. However, a sub-Gaussian decay does not necessarily imply infinite divisibility. As we shall demonstrate, there is a decay gap between the Gaussians and the rest of the id family.

We start our investigation by recalling a standard result in the theory of id laws.

Theorem 4 (25.3 in [6]): Let $V$ be the Lévy measure of an infinitely divisible random variable $X$. Then, for all locally bounded functions $g: \mathbb{R} \mapsto \mathbb{R}$ such that
\[ g(x + y) \le g(x)\,g(y), \quad \forall x, y \in \mathbb{R}, \tag{35} \]
we have that
\[ \mathrm{E}\{g(X)\} < \infty \iff \int_{|a| \ge 1} g(a)\,\mathrm{d}V(a) < \infty. \tag{36} \]

The functions $g$ satisfying (35) are called submultiplicative. They include all functions of the form $g(x) = (1 + |x|)^{\delta_1}\big(1 + \ln(1 + |x|)\big)^{\delta_2}\mathrm{e}^{\delta_3|x|^{\delta_4}}$, where $\delta_{1,2,3} \ge 0$ and $0 < \delta_4 \le 1$. In particular, Theorem 4 can be applied to investigate the existence of moments.

Lemma 4: Let $X$ be an id random variable with Lévy measure $V$. Then, for all $p \ge 0$,
\[ \mathrm{E}\{|X|^p\} < \infty \iff \int_{|a| \ge 1} |a|^p\,\mathrm{d}V(a) < \infty. \tag{37} \]

Proof: The function $g(x) = |x|^p$ does not satisfy the requirement of Theorem 4. Therefore, we continue with
\[ \mathrm{E}\{|X|^p\} < \infty \iff \mathrm{E}\{1 + |X|^p\} < \infty \iff \mathrm{E}\{(1 + |X|)^p\} < \infty, \tag{38} \]
where we used the inequalities
\[ 1 + |x|^p \le (1 + |x|)^p \le 2^{p-1}(1 + |x|^p). \]
Now, the function $g(x) = (1 + |x|)^p$ fulfills the requirement of Theorem 4. Thus,
\[ \mathrm{E}\{|X|^p\} < \infty \iff \int_{|a| \ge 1} (1 + |a|)^p\,\mathrm{d}V(a) < \infty \iff \int_{|a| \ge 1} |a|^p\,\mathrm{d}V(a) < \infty. \tag{39} \]

The effect of an additive Gaussian noise is cast in the Gaussian parameter $\sigma$ of the Lévy triplet. Lemma 4 shows that the fat-tail property is solely determined by the Lévy measure.

Compressible distributions are closely related to fat-tailed distributions [1], [2]. In fact, Lemma 5 states that the compressibility of a linear observation is a property that is inherited from the innovation process and is independent of how it is measured or expanded.

Illustration 1: Let us consider the recovery of compressible vectors from noisy linear measurements. For this purpose, let $\mathbf{x}$ be an i.i.d. random vector with a fat-tailed distribution and let $\mathbf{y} = \mathbf{A}\mathbf{x} + \mathbf{n}$ be the measurements, where $\mathbf{A}$ is a known sensing matrix and $\mathbf{n}$ stands for a vector of white Gaussian noise with variance $\sigma_n^2$. In our framework, this problem can reflect the discretization of a continuous-domain process, where $\mathbf{A}$ and $\mathbf{x}$ correspond to the discretizations of $L^{-1}$ and the innovation process, respectively. An example of such a discretization can be found in [13]. For the sake of simplicity, we focus on the MAP estimator, which is known to take the form
\[ \hat{\mathbf{x}} = \arg\min_{\mathbf{x}} \frac{1}{2\sigma_n^2}\|\mathbf{y} - \mathbf{A}\mathbf{x}\|_2^2 + J(\mathbf{x}), \tag{41} \]

where $J(\mathbf{x}) = -\log p_{\mathbf{X}}(\mathbf{x}) = -\sum_i \log p_X(x_i)$. The common sparsifying penalty term used in compressed sensing is $J(\mathbf{x}) = \|\mathbf{x}\|_1 = \sum_i |x_i|$, which is obtained for vectors $\mathbf{x}$ following a Laplace distribution. Several authors have pointed out that the Laplace distribution is by no means sparse or compressible. Furthermore, the classical least-squares estimator outperforms the MAP estimator under Laplace distributions [10]. For fat-tailed distributions of $\mathbf{x}$, the penalty term $J(\mathbf{x})$ is of the form $\sum_i \Phi(x_i)$ with $\Phi(x) = \mathcal{O}(\log|x|)$, which is fundamentally different from $\|\mathbf{x}\|_1$. Nevertheless, the penalty term $\log(\cdot)$ can be regarded as an $\ell_1$-$\ell_0$ relaxation [14] and is useful in image recovery [15]. Moreover, for fat-tailed distributions, the MAP estimator is a biased but still fair approximation of the Bayesian (posterior mean) estimator [10]. As the rate of decay of the tail of a distribution increases (faster decay), it becomes less compressible.

Theorem 5: Let $w$ be an innovation process and let $X$ and $X_\varphi$ be $\langle w, \mathrm{rect}\rangle$ and $\langle w, \varphi\rangle$, respectively, where $0 \neq \varphi \in \Phi_w \cap L_p \cap L_{\max(2,p)}$. Then, we have that
\[ \mathrm{E}\{|X|^p\} < \infty \iff \mathrm{E}\{|X_\varphi|^p\} < \infty. \tag{40} \]

In a nutshell, Theorem 4 and Lemma 4 imply that the pdf and the Lévy measure of an id distribution have the same rate of decay of their tails. Theorem 5 states that $X$ and $X_\varphi$ are equivalent random variables in the sense of the existence of moments. The additional restriction $\varphi \in L_p \cap L_{\max(2,p)}$ is to ensure that the amplitude-distribution measure $\mu_\varphi$ has finite $p$th, or both $p$th and second-order, moments. The proof of Theorem 5 is postponed to Appendix D.

Our next step is to show that the linear observations of an innovation process through test functions in $\bigcap_p L_p$ all have the same fat-tail behavior.
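To make the role of the penalty $J$ concrete, the sketch below contrasts, for the scalar denoising case ($\mathbf{A} = \mathbf{I}$, $\sigma_n = 1$, so (41) separates across coordinates), the $\ell_1$ penalty with a logarithmic penalty of the kind that arises for fat-tailed priors. The specific penalty $\log(1 + x^2)$ and the brute-force grid minimizer are our own illustrative choices, not taken from the paper; the point is that the log penalty leaves large coefficients nearly unbiased, whereas $\ell_1$ shifts every surviving coefficient by a constant.

```python
import math

GRID = [-15 + 0.0005 * i for i in range(60001)]  # search grid, step 5e-4

def map_scalar(y, penalty, lam):
    # coordinate-wise MAP for y = x + noise: argmin 0.5*(y-x)^2 + lam*penalty(x)
    return min(GRID, key=lambda x: 0.5 * (y - x) ** 2 + lam * penalty(x))

l1 = abs                                   # Laplace prior -> soft thresholding
logpen = lambda x: math.log(1.0 + x * x)   # fat-tailed (Student-t-like) prior

y = 8.0
x_l1 = map_scalar(y, l1, 1.0)       # soft threshold: shifts y by the constant 1
x_log = map_scalar(y, logpen, 1.0)  # nearly unbiased for large y
x_small = map_scalar(0.8, l1, 1.0)  # below the threshold: exactly 0
```

The $\ell_1$ solution reproduces the closed-form soft threshold $\mathrm{sign}(y)\max(|y| - \lambda, 0)$, while the log-penalty estimate of the large coefficient sits much closer to the observation.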
Lemma 5: Let $\Phi_w$ be the domain of the characteristic functional of an innovation process $w$. If the distribution of $X_{\bar\varphi} = \langle w, \bar\varphi\rangle$ for a given $\bar\varphi \in \bigcap_p L_p \setminus \{0\}$ is fat-tailed with $\lim_{|x|\to\infty} |x|^p\,\mathrm{P}(|X_{\bar\varphi}| > |x|) \in\, ]0, \infty[$ for some $p > 0$, then the distribution of $\langle w, \varphi\rangle$ for all $\varphi \in \bigcap_p L_p \setminus \{0\}$ is fat-tailed with the same decay rate $|x|^{-p}$. Moreover, the addition of Gaussian noise to the measurement does not change the fat-tail property.

Proof: First note that, due to Theorem 3, all the random variables $\langle w, \varphi\rangle$ are infinitely divisible. Let $X$, $X_{\bar\varphi}$, and $X_\varphi$ denote the random variables $\langle w, \mathrm{rect}\rangle$, $\langle w, \bar\varphi\rangle$, and $\langle w, \varphi\rangle$, respectively. Then, the condition $0 < \lim_{|x|\to\infty} |x|^p\,\mathrm{P}(|X_{\bar\varphi}| > |x|) < \infty$ (fat-tail property of $X_{\bar\varphi}$) indicates that $\mathrm{E}\{|X_{\bar\varphi}|^r\}$ is finite for all $0 < r < p$ and is infinite for $r \ge p$. Recalling Theorem 5, we conclude that $\mathrm{E}\{|X|^r\}$ and, therefore, $\mathrm{E}\{|X_\varphi|^r\}$ are finite for $0 < r < p$ and infinite for $r \ge p$. Thus, $X_\varphi$ is also fat-tailed with the same decay rate $|x|^{-p}$. Finally, the effect of an additive Gaussian noise is cast in the Gaussian parameter $\sigma$ of the Lévy triplet, and Lemma 4 shows that the fat-tail property is solely determined by the Lévy measure.

One of the properties of the id family is that the Gaussian distributions are the least compressible members. In fact, Gaussian distributions are somewhat isolated members, not only because of their extreme rate of decay, but also due to a gap between their rate of decay and that of the rest of the family. Theorem 6 paves the road for specifying this gap.

Theorem 6 (26.1 in [6]): Let $X$ be an id random variable corresponding to a Lévy measure $V$. Define
\[ c = \inf\big\{ a > 0 : S_V \subseteq \{x : |x| \le a\} \big\}, \tag{42} \]
where $S_V$ denotes the support set of the Lévy measure $V$. We also allow $c$ to take the values 0 and $\infty$.

Then, for the super-exponential moments, we have that
\[ \begin{cases} 0 < \alpha < c^{-1}: & \mathrm{E}_X\big\{ \mathrm{e}^{\alpha|X|\log|X|} \big\} < \infty, \\ c^{-1} < \alpha: & \mathrm{E}_X\big\{ \mathrm{e}^{\alpha|X|\log|X|} \big\} = \infty. \end{cases} \tag{43} \]

Theorem 7: The only id distributions that decay faster than $\mathrm{e}^{-\mathcal{O}(|x|\log|x|)}$ are the Gaussians.

Proof: A tail decaying faster than $\exp(-\mathcal{O}(|x|\log|x|))$ implies that all super-exponential moments $\mathrm{E}_X\{\mathrm{e}^{\alpha|X|\log|X|}\}$ are finite. By using the result of Theorem 6, this implies that $c^{-1} = \infty$, where $c$ is defined in (42). Thus, we shall have $c = 0$, which confirms that $V$ is supported only at $\{0\}$. Besides, note that $\{0\}$ is excluded in all the integrals involving $V$. Hence, such a $V$ is effectively equivalent to the zero measure. Evidently, an id distribution with zero Lévy measure is a Gaussian distribution (see Section III-B).

Illustration 2: Let us consider the pdfs that have a rate of decay of the form $\exp(-\mathcal{O}(|x|^\kappa))$. The Gaussian and Laplace distributions are id examples that correspond to $\kappa = 2$ and $\kappa = 1$, respectively. However, Theorem 7 states that the pdfs corresponding to $1 < \kappa < 2$ are not infinitely divisible (the gap). For a better understanding of this result, we revisit the MAP estimator of Illustration 1. It is well known that, for Gaussian and Laplace distributions of $\mathbf{x}$, the penalty term $J(\mathbf{x})$ in (41) transforms into $\mathcal{O}(\|\mathbf{x}\|_2^2)$ and $\mathcal{O}(\|\mathbf{x}\|_1)$, respectively. A simple consequence of the gap in the decay of the tail of id distributions is that penalty terms of the form $\|\mathbf{x}\|_p^p$ for $1 < p < 2$ are not allowed. We illustrate this gap in Fig. 2.

[Fig. 2. Identification of id distributions with respect to their tail probabilities of the form $\exp(-\mathcal{O}(|x|^\alpha(\log|x|)^\beta))$. Examples include $(\alpha = 2, \beta = 0)$ for Gaussians; $(\alpha = 1, \beta = 0)$ for Laplace, hyperbolic, and Gumbel distributions; $(\alpha = 0, \beta = 1)$ for all the fat-tailed laws; $(\alpha = 0, \beta = 2)$ for log-normal distributions; and $(\alpha = 1, \beta = 1)$ for all id laws with non-zero but finitely supported Lévy measures. The only id distributions in the shaded area are Gaussians.]
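The dichotomy in (43) can be probed numerically. In the sketch below, which is an illustration of our own with $\alpha = 1/2$, the truncated integrals $\int_0^T \mathrm{e}^{\alpha x\log x}\,p(x)\,\mathrm{d}x$ are computed for $|X|$ Gaussian (zero Lévy measure, $c = 0$, so every super-exponential moment is finite) and $|X|$ Laplace (whose Lévy measure has unbounded support, $c = \infty$, so every such moment diverges): the Gaussian integral stabilizes while the Laplace integral blows up as $T$ grows.

```python
import math

ALPHA = 0.5

def trunc_moment(pdf, T, dx=0.001):
    # trapezoidal approximation of \int_0^T exp(ALPHA*x*log x) * pdf(x) dx
    total, steps = 0.0, int(T / dx)
    for i in range(steps + 1):
        x = i * dx
        g = math.exp(ALPHA * x * math.log(x)) if x > 0 else 1.0  # limit at x=0
        w = 0.5 if i in (0, steps) else 1.0
        total += w * g * pdf(x) * dx
    return total

half_gauss = lambda x: math.sqrt(2.0 / math.pi) * math.exp(-0.5 * x * x)  # |N(0,1)|
half_laplace = lambda x: math.exp(-x)                                     # |Laplace(0,1)|

g20, g40 = trunc_moment(half_gauss, 20.0), trunc_moment(half_gauss, 40.0)
l20, l40 = trunc_moment(half_laplace, 20.0), trunc_moment(half_laplace, 40.0)
```

Doubling the truncation point leaves the Gaussian value untouched to machine precision, while the Laplace value grows by many orders of magnitude, as the exponent $\alpha x\log x - x$ eventually dominates.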
B. Unimodality

The modes of a real-valued function are the points at which the function attains its local maxima or minima. A pdf is unimodal if it has a unique local maximum and no local minima. In words, a unimodal pdf is decreasing on the right side of its mode and increasing on its left side. Unimodality is useful in optimization problems such as MAP.

Here, we want to show that the unimodality of the pdf is a property that is inherited from the innovation process. Similar to the decay of the tail, we investigate the implications of the Lévy measure on the pdf in terms of unimodality.

Definition 5 ([16]): A measure $V$ is said to be unimodal with mode $a_0$ if it can be expressed as
\[ V(\mathrm{d}a) = c\,\delta_{a_0}(\mathrm{d}a) + v(a)\,\mathrm{d}a, \tag{44} \]
where $c$ is a nonnegative constant, $\delta_{a_0}$ is Dirac's delta function supported at $a_0$, and $v$ is an increasing function on $]-\infty, a_0[$ and decreasing on $]a_0, \infty[$.

We use Theorem 8, proved in [17], as the main tool for connecting the unimodality of the pdf to that of the Lévy measure.

Theorem 8 ([17]): If a Lévy measure $V$ is symmetric and unimodal with mode 0, all the id random variables identified by the Lévy triplet $(\theta, \sigma, V)$ have unimodal pdfs.

Theorem 9: Let $w$ be an innovation process for which the random variable $\langle w, \mathrm{rect}\rangle$ admits the Lévy triplet $(\theta, \sigma, V)$. If $V$ is symmetric and unimodal with mode 0, then the pdf of $\langle w, \varphi\rangle$ for all $\varphi \in \Phi_w$ is unimodal.

Proof: By using Theorem 8, it is sufficient to show that the Lévy measure $V_\varphi$ of $\langle w, \varphi\rangle$ is also symmetric and unimodal with mode 0. To show its symmetry, we recall Theorem 3 and write $V_\varphi(-I)$ for $0 \notin I \subset \mathbb{R}$ as
\[ V_\varphi(-I) = \iint_{a\tau \in -I} \mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) = \iint_{a\tau \in I} \mathrm{d}V(-a)\,\mathrm{d}\mu_\varphi(\tau) = \iint_{a\tau \in I} \mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) = V_\varphi(I). \tag{45} \]
Unimodality of $V$ with mode 0 requires the corresponding delta term of $V(\mathrm{d}x)$ to be placed at zero. However, as pointed out earlier, zero is excluded in all the integrals over $V$. This fact, in conjunction with the symmetry of $V$, suggests that $V(\mathrm{d}x)$ can be effectively written as $v(|x|)\,\mathrm{d}x$, where $v$ is a decreasing function. Hence, for $|\bar a| \ge |a| > 0$ we can write that
\[ V_\varphi(\mathrm{d}a) = \int_{\tau \neq 0} v\Big(\Big|\frac{a}{\tau}\Big|\Big)\,\mathrm{d}\mu_\varphi(\tau) \ge \int_{\tau \neq 0} v\Big(\Big|\frac{\bar a}{\tau}\Big|\Big)\,\mathrm{d}\mu_\varphi(\tau) = V_\varphi(\mathrm{d}\bar a), \tag{46} \]
which proves the unimodality of $V_\varphi$.

C. Moment Indeterminacy

The problem of moments, or the Hamburger moment problem, is to answer whether the set of moments
\[ m_n(U) = \int_{\mathbb{R}} a^n\,\mathrm{d}U(a), \quad n = 0, 1, 2, \ldots \]
uniquely determines the measure $U$. In case the answer is negative, the measure is called moment-indeterminate or, briefly, indeterminate. There are simple necessary or sufficient conditions (namely, Krein's and Carleman's conditions, respectively) for the indeterminacy of a measure, while necessary and sufficient conditions are more complicated to formulate. If at least one of the moments of a distribution is infinite, then the distribution is automatically considered as indeterminate.
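A classical concrete instance of indeterminacy, added here purely for illustration (it is not discussed in the paper), is the log-normal law: in Heyde's construction, perturbing the log-normal density by the factor $1 + \epsilon\sin(2\pi\ln x)$ yields a genuinely different density with exactly the same moment sequence $m_n = \mathrm{e}^{n^2/2}$. The sketch below checks this by quadrature in the variable $t = \ln x$.

```python
import math

EPS = 0.5  # perturbation amplitude; any |EPS| <= 1 keeps the density nonnegative

def moment(n, perturbed, lo=-15.0, hi=20.0, dt=0.0005):
    # n-th moment of the (possibly perturbed) log-normal, integrated in t = ln x:
    # \int e^{n t} phi(t) (1 + EPS sin(2 pi t)) dt,  phi = standard normal pdf
    steps = int((hi - lo) / dt)
    total = 0.0
    for i in range(steps + 1):
        t = lo + i * dt
        f = math.exp(n * t - 0.5 * t * t) / math.sqrt(2.0 * math.pi)
        if perturbed:
            f *= 1.0 + EPS * math.sin(2.0 * math.pi * t)
        w = 0.5 if i in (0, steps) else 1.0
        total += w * f * dt
    return total

pairs = [(moment(n, False), moment(n, True)) for n in range(5)]
rel_errs = [abs(a - b) / a for a, b in pairs]
exact_errs = [abs(a - math.exp(0.5 * n * n)) / math.exp(0.5 * n * n)
              for n, (a, b) in enumerate(pairs)]
```

The two densities differ pointwise, yet their first few moments agree to quadrature accuracy and match the closed-form values $\mathrm{e}^{n^2/2}$, so the moment sequence cannot single out the log-normal law.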

Theorem 5 states that $\langle w, \mathrm{rect}\rangle$ and $\langle w, \varphi\rangle$ for $\varphi \in \bigcap_p L_p$ are equivalent in the sense of existing moments. Thus, if $\langle w, \mathrm{rect}\rangle$ is indeterminate by means of having infinite moments, the same applies to all $\langle w, \varphi\rangle$. However, when all the moments are finite, Theorem 5 does not settle the issue of in/determinacy.

Theorem 10 ([18]): An infinitely divisible distribution that corresponds to an indeterminate Lévy measure is itself indeterminate.

Next, we show that the indeterminacy of a Lévy measure and, consequently, of its associated probability distribution, is a property that is shared by all linear measurements of an innovation process.

Theorem 11: If the Lévy measure $V$ of $\langle w, \mathrm{rect}\rangle$ is indeterminate, where $w$ is an innovation process, then the distribution of $\langle w, \varphi\rangle$ for all $0 \neq \varphi \in \bigcap_p L_p$ is indeterminate.

Proof: By applying Theorem 3, we know that
\[ \int_{\mathbb{R}\setminus\{0\}} \bar a^n\,\mathrm{d}V_\varphi(\bar a) = \iint_{\mathbb{R}\setminus\{0\}} (a\tau)^n\,\mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) = \underbrace{\Big( \int_{\mathbb{R}\setminus\{0\}} a^n\,\mathrm{d}V(a) \Big)}_{m_n^{(V)}} \underbrace{\Big( \int_{\mathbb{R}\setminus\{0\}} \tau^n\,\mathrm{d}\mu_\varphi(\tau) \Big)}_{m_n^{(\mu_\varphi)}}. \tag{47} \]
Since $\varphi \in \bigcap_p L_p$, the moments $\{m_n^{(\mu_\varphi)}\}_{n=0}^{\infty}$ are all finite. This implies a one-to-one mapping between the moments $\{m_n^{(V)}\}_{n=0}^{\infty}$ and $\{m_n^{(V_\varphi)}\}_{n=0}^{\infty}$ for a given $\varphi \in \bigcap_p L_p$. In other words, $\{m_n^{(V)}\}_n$ uniquely determines $V$ if and only if $\{m_n^{(V_\varphi)}\}_{n=0}^{\infty}$ uniquely determines $V_\varphi$. Hence, indeterminacy of $V$ translates into indeterminacy of $V_\varphi$, which in turn establishes the indeterminacy of the distribution of $\langle w, \varphi\rangle$ through Theorem 10.

V. CONCLUSION

We considered an innovation-driven continuous-domain model from which we obtain linear measurements. Our goal was to identify the sparse/compressible distributions that can describe the distribution of such measurements. We showed that a common property of such distributions is infinite divisibility. One of the important implications of this property is the exclusion of all distributions that decay faster than Gaussians.

APPENDIX A
RANDOM PROCESSES VIA CHARACTERISTIC FUNCTIONALS

A given functional over the space $\Phi$ defines a probability measure on the algebraic or topological (continuous) dual of $\Phi$ (the set of realizations) if the functional satisfies certain conditions. The two main results are the Kolmogorov extension theorem [19] and the Bochner–Minlos theorem [20]. For suitable characteristic functionals, the Kolmogorov extension theorem demonstrates the existence of a random process supported over the algebraic dual of $\Phi$, while the Bochner–Minlos theorem narrows down the support to the continuous dual of $\Phi$, provided that the latter space is nuclear.

Theorem 12 (Bochner–Minlos [20]): Let $\Phi$ be a nuclear space over $\mathbb{R}$ and let $C: \Phi \mapsto \mathbb{C}$ be a continuous functional. If $C(0) = 1$ and $C$ is semipositive-definite, then $C$ is the characteristic functional of a unique random process supported on the continuous dual of $\Phi$, denoted by $\Phi'$. Semipositive-definiteness of $C$ means that, for any positive integer $K$ and for all $z_1, \ldots, z_K \in \mathbb{C}$ and $\varphi_1, \ldots, \varphi_K \in \Phi$, the value
\[ \sum_{k_1, k_2 = 1}^{K} z_{k_1} \bar{z}_{k_2}\, C(\varphi_{k_1} - \varphi_{k_2}) \]
is real and nonnegative.

The function space $\Phi$ of the characteristic functionals considered in this paper is the intersection of $L_p$ spaces. Such spaces are not nuclear; thus, the Bochner–Minlos theorem does not apply. However, the Kolmogorov extension theorem [19] implies that the functional $C$ defines a random process over the algebraic dual of $\Phi$, represented as $\Phi^*$. Additionally, $C$ gives rise to a cylinder-set measure (a quasi-measure) over $\Phi'$ (continuous dual). This quasi-measure is equivalent to a random process as long as finite-dimensional pdfs are considered. The key observation for our purpose is that the Schwartz space $\mathcal{S}$ of rapidly decreasing functions is a nuclear space which is included in all $L_p$ spaces. In terms of duals, this translates into $\Phi' \subset \mathcal{S}' \subset \Phi^*$. The Bochner–Minlos theorem therefore guarantees that a random process over $\Phi^*$ is indeed supported over $\mathcal{S}'$ (the space of tempered distributions).
Furthermore, we revealed a gap between the decay rate of Gaussian distributions and other id distributions. The Lévy–Khinchine representation theorem characterizes all infinitely divisible distributions by means of a measure known as the Lévy measure. It was already known that many properties of infinitely divisible distributions can be expressed in terms of their Lévy measure. The contribution of this paper is to show that most of the higher-level properties of pdfs (finiteness of moments, rate of decay, and unimodality) are also preserved through linear measurements. For instance, if a model generates a compressible distribution in a particular measurement scheme, the distribution of all possible measurements would be compressible. Furthermore, this compressibility can be identified a priori through the Lévy measure associated with the innovation process.

APPENDIX B
PROOF OF LEMMA 3

We first decompose $g$ into three components as
\[ g(\omega) = \underbrace{\mathrm{j}\int_{|a|<1} \big(\sin(a\omega) - a\omega\big)\,\mathrm{d}V(a)}_{\mathring{g}_{\Im}(\omega)} + \underbrace{\mathrm{j}\int_{|a|\ge 1} \sin(a\omega)\,\mathrm{d}V(a)}_{\check{g}_{\Im}(\omega)} \underbrace{-\,2\int_{\mathbb{R}\setminus\{0\}} \sin^2\Big(\frac{a\omega}{2}\Big)\,\mathrm{d}V(a)}_{g_{\Re}(\omega)}. \tag{48} \]
Our next step is to upper-bound each term separately. For this purpose, note that $|\sin(x)| \le |x|$ and $|\sin(x)| \le 1$. Hence, for all $a \in [0, 1]$, we conclude that $|\sin(x)| \le \min(1, |x|) \le |x|^a$. In addition, since $V$ is $(m, M)$-bounded, we have that
\[ \int_{|a|<1} |a|^M\,\mathrm{d}V(a) < \infty, \qquad \int_{|a|\ge 1} |a|^m\,\mathrm{d}V(a) < \infty. \]

1) In case $V$ is symmetric, due to the odd symmetry of the integrand, we conclude that $\mathring{g}_{\Im} \equiv 0$. For asymmetric measures, we continue as
\[ |\mathring{g}_{\Im}(\omega)| = \Big| \int_{|a|<1} a\omega\big(1 - \mathrm{sinc}(a\omega)\big)\,\mathrm{d}V(a) \Big|, \tag{49} \]
where $\mathrm{sinc}(x) = \frac{\sin(x)}{x}$. We further know that $|\mathrm{sinc}(x)| \le 1$ and $0 \le 1 - \mathrm{sinc}(x) \le |x|$. The latter is obtained by observing that $x + \cos x \ge 1$ for $x \ge 0$, which confirms that the function $\frac{x^2}{2} + \sin x - x$ is increasing for $x \ge 0$. The two inequalities for sinc lead to $|1 - \mathrm{sinc}(x)| \le \min(2, |x|)$. Thus, we can bound $\mathring{g}_{\Im}(\omega)$ as
\[ |\mathring{g}_{\Im}(\omega)| \le \int_{|a|<1} |a\omega|\cdot|1 - \mathrm{sinc}(a\omega)|\,\mathrm{d}V(a) \le \int_{|a|<1} |a\omega| \min(2, |a\omega|)\,\mathrm{d}V(a) \le |\omega|^M \underbrace{\int_{|a|<1} |a|^M\,\mathrm{d}V(a)}_{<\infty}, \tag{50} \]
where we used $\min(2, |x|) \le |x|^{M-1}$, which is justified by $1 \le M \le 2$ for asymmetric Lévy measures.

2) Similar to $\mathring{g}_{\Im}$, we have that $\check{g}_{\Im} \equiv 0$ for symmetric Lévy measures $V$. Otherwise, we can write that
\[ |\check{g}_{\Im}(\omega)| \le \int_{|a|\ge 1} |\sin(a\omega)|\,\mathrm{d}V(a) \le \int_{|a|\ge 1} |a\omega|^m\,\mathrm{d}V(a) = |\omega|^m \underbrace{\int_{|a|\ge 1} |a|^m\,\mathrm{d}V(a)}_{<\infty}, \tag{51} \]
where we used $0 \le m \le 1$ for asymmetric Lévy measures.

3) We employ a trigonometric rule to simplify $g_{\Re}$ as
\[ |g_{\Re}(\omega)| = 2\int_{\mathbb{R}\setminus\{0\}} \sin^2\Big(\frac{a\omega}{2}\Big)\,\mathrm{d}V(a) \le 2\int_{\mathbb{R}\setminus\{0\}} \min\Big(1, \Big|\frac{a\omega}{2}\Big|^2\Big)\,\mathrm{d}V(a) \le 2\Big(\frac{|\omega|^m}{2^m} + \frac{|\omega|^M}{2^M}\Big) \underbrace{\int_{\mathbb{R}\setminus\{0\}} \min(|a|^m, |a|^M)\,\mathrm{d}V(a)}_{<\infty}. \tag{52} \]

Finally, we combine the individual upper bounds using the triangle inequality, which completes the proof.
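Lemma 3 can be spot-checked numerically. The sketch below, which is our own check rather than part of the paper, takes the symmetric $\alpha$-stable-type Lévy density $v(a) = |a|^{-1-\alpha}$ with $\alpha = 1.5$, for which $g(\omega) = 2\int_0^\infty (\cos(a\omega) - 1)\,a^{-2.5}\,\mathrm{d}a = -C|\omega|^{1.5}$, and verifies both the exact $|\omega|^{1.5}$ scaling and a bound of the form $\kappa_1|\omega|^{p_1} + \kappa_2|\omega|^{p_2}$ for the bounding pair $(p_1, p_2) = (1, 2)$.

```python
import math

ALPHA = 1.5
# logarithmic grid on (0, infinity), truncated where the integrand is negligible
N = 200_000
LO, HI = math.log(1e-8), math.log(1e6)
STEP = (HI - LO) / N
A = [math.exp(LO + i * STEP) for i in range(N + 1)]
# quadrature weight a^{-1-ALPHA} da, with da = a * d(log a) on the log grid
W = [a ** (-1.0 - ALPHA) * a * STEP for a in A]

def g(omega):
    # g(omega) = 2 * \int_0^inf (cos(a*omega) - 1) a^{-1-ALPHA} da  (symmetric V)
    return 2.0 * sum((math.cos(a * omega) - 1.0) * w for a, w in zip(A, W))

# constant M = \int_0^inf min(a, a^2) a^{-2.5} da, which equals 4 analytically
M = sum(min(a, a * a) * w for a, w in zip(A, W))

scaling = abs(g(2.0) / g(1.0))  # alpha-stable scaling: should be close to 2**1.5
# term-by-term: 2(1-cos(a w)) <= 4 min(1,(aw/2)^2) <= 4 (w/2 + w^2/4) min(a, a^2)
bounds_ok = all(abs(g(w)) <= 4.0 * (w / 2.0 + w * w / 4.0) * M
                for w in (0.5, 1.0, 2.0))
```

The computed ratio $|g(2)/g(1)|$ matches $2^{1.5}$, and the bound $\kappa_1|\omega| + \kappa_2|\omega|^2$ holds with explicit constants built from $\int \min(|a|, |a|^2)\,\mathrm{d}V(a)$, in line with Lemma 3.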
APPENDIX C
PROOF OF THEOREM 3

Recalling (3) and Theorem 2, we obtain the characteristic function of $X_\varphi$ as
\[ \hat{p}_{X_\varphi}(\omega) = \widehat{\mathscr{P}}_w(\omega\varphi) = \exp\Big( \int_{\mathbb{R}^d} f\big(\omega\varphi(\tau)\big)\,\mathrm{d}\tau \Big). \tag{53} \]
Similar to (31), we can write that
\[ \int_{\mathbb{R}^d} f\big(\omega\varphi(\tau)\big)\,\mathrm{d}\tau = \mathrm{j}\theta_{1,\varphi}\,\omega - \frac{\sigma_\varphi^2}{2}\omega^2 + g_\varphi(\omega), \tag{54} \]
where
\[ \begin{cases} \theta_{1,\varphi} = \theta \displaystyle\int_{\mathbb{R}^d} \varphi(\tau)\,\mathrm{d}\tau, \\ \sigma_\varphi^2 = \sigma^2 \displaystyle\int_{\mathbb{R}^d} \varphi^2(\tau)\,\mathrm{d}\tau, \\ g_\varphi(\omega) = \displaystyle\int_{\mathbb{R}^d} g\big(\omega\varphi(\tau)\big)\,\mathrm{d}\tau. \end{cases} \tag{55} \]
The upper bound on $g$ imposed by Lemma 3 indicates that $g_\varphi(\omega)$ is finite for all $\omega$. We simplify $g_\varphi$ by rewriting it as
\[ g_\varphi(\omega) = \int_{\mathbb{R}^d} \int_{\mathbb{R}\setminus\{0\}} \big( \mathrm{e}^{\mathrm{j}a\omega\varphi(\tau)} - 1 - \mathrm{j}a\omega\varphi(\tau)\,\mathbb{1}_{|a|<1}(a) \big)\,\mathrm{d}V(a)\,\mathrm{d}\tau = \bar{g}_\varphi(\omega) + \mathrm{j}\omega\,\theta_{2,\varphi}, \tag{56} \]
where
\[ \bar{g}_\varphi(\omega) = \int_{\mathbb{R}^d} \int_{\mathbb{R}\setminus\{0\}} \big( \mathrm{e}^{\mathrm{j}a\omega\varphi(\tau)} - 1 - \mathrm{j}a\omega\varphi(\tau)\,\mathbb{1}_{|a\varphi(\tau)|<1}(a) \big)\,\mathrm{d}V(a)\,\mathrm{d}\tau \tag{57} \]
and
\[ \theta_{2,\varphi} = \int_{\mathbb{R}^d} \int_{\mathbb{R}\setminus\{0\}} a\varphi(\tau)\big( \mathbb{1}_{|a\varphi(\tau)|<1}(a) - \mathbb{1}_{|a|<1}(a) \big)\,\mathrm{d}V(a)\,\mathrm{d}\tau. \tag{58} \]
In (57), the integration parameter $\tau$ is used only as the input argument of $\varphi$. Consequently, $\varphi(\tau)$ can be replaced with its amplitude-distribution measure $\mu_\varphi$. On the other hand, whenever $\varphi(\tau) = 0$, the integrand in (57) is also zero. Thus, those values of $\tau$ for which $\varphi(\tau) = 0$ do not contribute to the integral. In summary, we can rewrite (57) as
\[ \bar{g}_\varphi(\omega) = \int_{\mathbb{R}\setminus\{0\}} \int_{\mathbb{R}\setminus\{0\}} \big( \mathrm{e}^{\mathrm{j}\bar a\omega} - 1 - \mathrm{j}\bar a\omega\,\mathbb{1}_{|\bar a|<1}(\bar a) \big)\,\mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) = \int_{\mathbb{R}\setminus\{0\}} \big( \mathrm{e}^{\mathrm{j}\bar a\omega} - 1 - \mathrm{j}\bar a\omega\,\mathbb{1}_{|\bar a|<1}(\bar a) \big)\,\mathrm{d}V_\varphi(\bar a), \tag{59} \]
where we used the change of variables $\bar a = a\tau$.

Equations (54)–(59) suggest $(\theta_{1,\varphi} + \theta_{2,\varphi}, \sigma_\varphi, V_\varphi)$ as the Lévy triplet of $X_\varphi$, provided that $\theta_{2,\varphi}$ is finite and $V_\varphi$ satisfies the requirement (12). Note that the $(0, 2)$-boundedness of $V_\varphi$ (requirement (12)) implies the finiteness of $\bar{g}_\varphi$ through Lemma 3. This establishes the finiteness of $\theta_{2,\varphi}$, since the finiteness of $g_\varphi$ is guaranteed by Theorem 2. Thus, to prove that $X_\varphi$ is infinitely divisible with the suggested Lévy triplet, it suffices to show that $V_\varphi$ is $(0, 2)$-bounded.
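The change of variables $\bar a = a\tau$ behind $V_\varphi$ can be illustrated on the simplest innovation model for which everything is elementary: a compound-Poisson innovation $w = \sum_k A_k\,\delta(\cdot - \tau_k)$ with rate $\lambda$ and standard-Gaussian amplitudes, observed through a test function on $[0, 1]$. In that case the characteristic function of $X_\varphi = \sum_k A_k\varphi(\tau_k)$ is $\exp\big(\lambda\int_0^1(\mathrm{e}^{-\omega^2\varphi^2(\tau)/2} - 1)\,\mathrm{d}\tau\big)$, which the Monte Carlo sketch below reproduces. This is our own illustration; the choices $\varphi(\tau) = 1 - \tau$ and $\lambda = 5$ are arbitrary.

```python
import math
import random

random.seed(7)
LAM, N_MC = 5.0, 100_000
phi = lambda t: 1.0 - t  # test function on [0, 1]

def poisson(lam):
    # Knuth's product method for Poisson sampling
    L, k, prod = math.exp(-lam), 0, random.random()
    while prod > L:
        prod *= random.random()
        k += 1
    return k

def draw_X():
    # one realization of X_phi = sum_k A_k * phi(tau_k)
    return sum(random.gauss(0.0, 1.0) * phi(random.random())
               for _ in range(poisson(LAM)))

samples = [draw_X() for _ in range(N_MC)]

def emp_cf_real(omega):
    return sum(math.cos(omega * s) for s in samples) / N_MC

def theory_cf(omega, dt=0.001):
    # exp(LAM * \int_0^1 (exp(-omega^2 phi(t)^2 / 2) - 1) dt), trapezoidal rule
    steps, acc = int(1.0 / dt), 0.0
    for i in range(steps + 1):
        t = i * dt
        w = 0.5 if i in (0, steps) else 1.0
        acc += w * (math.exp(-0.5 * (omega * phi(t)) ** 2) - 1.0) * dt
    return math.exp(LAM * acc)

cf_errors = [abs(emp_cf_real(w) - theory_cf(w)) for w in (1.0, 2.0)]
```

Since the amplitudes are symmetric, the characteristic function is real, and the empirical values agree with the exponential formula to Monte Carlo accuracy.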

Instead, we prove a stronger statement: if $V$ is $(p_1, p_2)$-bounded, then $V_\varphi$ is also $(p_1, p_2)$-bounded. Specifically,
\[ \int_{\mathbb{R}\setminus\{0\}} \min(|\bar a|^{p_1}, |\bar a|^{p_2})\,\mathrm{d}V_\varphi(\bar a) = \iint \min(|a\tau|^{p_1}, |a\tau|^{p_2})\,\mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) \le \big( \|\varphi\|_{p_1}^{p_1} + \|\varphi\|_{p_2}^{p_2} \big) \underbrace{\int_{\mathbb{R}\setminus\{0\}} \min(|a|^{p_1}, |a|^{p_2})\,\mathrm{d}V(a)}_{<\infty}. \tag{60} \]
To complete the proof of Theorem 3, we establish the converse statement: if $V_\varphi$ is $(p_1, p_2)$-bounded and $\varphi \in \Phi\setminus\{0\}$, then $V$ is also $(p_1, p_2)$-bounded, since
\[ \int_{\mathbb{R}\setminus\{0\}} \min(|a|^{p_1}, |a|^{p_2})\,\mathrm{d}V(a) \le \frac{\displaystyle\int_{\mathbb{R}\setminus\{0\}} \min(|\bar a|^{p_1}, |\bar a|^{p_2})\,\mathrm{d}V_\varphi(\bar a)}{\displaystyle\int_{\mathbb{R}\setminus\{0\}} \min(|\tau|^{p_1}, |\tau|^{p_2})\,\mathrm{d}\mu_\varphi(\tau)}. \tag{61} \]
The numerator in (61) is finite since $V_\varphi$ is assumed to be $(p_1, p_2)$-bounded. The integrand in the denominator is positive and the integral is nonzero because $\varphi \not\equiv 0$. The boundedness of the denominator is readily confirmed by
\[ \int_{\mathbb{R}\setminus\{0\}} \min(|\tau|^{p_1}, |\tau|^{p_2})\,\mathrm{d}\mu_\varphi(\tau) \le \min\big( \|\varphi\|_{p_1}^{p_1}, \|\varphi\|_{p_2}^{p_2} \big). \tag{62} \]

APPENDIX D
PROOF OF THEOREM 5

According to Theorem 3, for all $\varphi \in \Phi_w$ the random variable $\langle w, \varphi\rangle$ is infinitely divisible. Let $V$ and $V_\varphi$ denote the Lévy measures of $\langle w, \mathrm{rect}\rangle$ and $\langle w, \varphi\rangle$, respectively. By using Lemma 4, we reformulate the claim in Theorem 5 as
\[ \int_{|a|\ge 1} |a|^p\,\mathrm{d}V(a) < \infty \iff \int_{|\bar a|\ge 1} |\bar a|^p\,\mathrm{d}V_\varphi(\bar a) < \infty. \tag{63} \]
We apply Theorem 3 to rewrite the integral against the measure $V_\varphi$ in the form
\[ \int_{|\bar a|\ge 1} |\bar a|^p\,\mathrm{d}V_\varphi(\bar a) = \iint_{|a\tau|\ge 1} |a\tau|^p\,\mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau) = \int_{\mathbb{R}\setminus\{0\}} |\tau|^p \Big( \int_{|a| > \frac{1}{|\tau|}} |a|^p\,\mathrm{d}V(a) \Big)\,\mathrm{d}\mu_\varphi(\tau). \tag{64} \]
This yields
\[ \int_{|\bar a|\ge 1} |\bar a|^p\,\mathrm{d}V_\varphi(\bar a) \le \int_{\mathbb{R}\setminus\{0\}} |\tau|^p\,\mathrm{d}\mu_\varphi(\tau) \cdot \int_{|a|\ge 1} |a|^p\,\mathrm{d}V(a) + \int_{|\tau|\ge 1} |\tau|^p \int_{\frac{1}{|\tau|} \le |a| \le 1} |a|^p\,\mathrm{d}V(a)\,\mathrm{d}\mu_\varphi(\tau). \tag{65} \]
Note that $\int_{\mathbb{R}\setminus\{0\}} |\tau|^p\,\mathrm{d}\mu_\varphi(\tau) = \|\varphi\|_p^p$ and
\[ \int_{\frac{1}{|\tau|} \le |a| \le 1} |a|^p\,\mathrm{d}V(a) \le |\tau|^{-\bar p} \int_{\frac{1}{|\tau|} \le |a| \le 1} |a|^2\,\mathrm{d}V(a) \le |\tau|^{-\bar p} \underbrace{\int \min(1, |a|^2)\,\mathrm{d}V(a)}_{c_V} = \frac{c_V}{|\tau|^{\bar p}}, \tag{66} \]
where $\bar p = \min(0, p - 2)$. Hence,
\[ \int_{|\bar a|\ge 1} |\bar a|^p\,\mathrm{d}V_\varphi(\bar a) \le \|\varphi\|_p^p \int_{|a|\ge 1} |a|^p\,\mathrm{d}V(a) + c_V\,\|\varphi\|_{\max(2,p)}^{\max(2,p)}. \tag{67} \]
This proves that
\[ \int_{|a|\ge 1} |a|^p\,\mathrm{d}V(a) < \infty \implies \int_{|\bar a|\ge 1} |\bar a|^p\,\mathrm{d}V_\varphi(\bar a) < \infty. \tag{68} \]
Next, we prove the converse statement. The assumption $\varphi \not\equiv 0$ necessitates the existence of $0 < T \le 1$ such that $\|\varphi\|_{p,(T)}^p = \int_{|\tau|\ge T} |\tau|^p\,\mathrm{d}\mu_\varphi(\tau)$ is strictly positive. This helps us bound (64) as
\[ \int_{|\bar a|\ge 1} |\bar a|^p\,\mathrm{d}V_\varphi(\bar a) \ge \int_{|\tau|\ge T} |\tau|^p\,\mathrm{d}\mu_\varphi(\tau) \cdot \int_{|a|\ge \frac{1}{T}} |a|^p\,\mathrm{d}V(a) = \|\varphi\|_{p,(T)}^p \Big( \int_{|a|\ge 1} |a|^p\,\mathrm{d}V(a) - \int_{1 \le |a| \le \frac{1}{T}} |a|^p\,\mathrm{d}V(a) \Big) \ge \|\varphi\|_{p,(T)}^p \Big( \int_{|a|\ge 1} |a|^p\,\mathrm{d}V(a) - \frac{c_V}{T^p} \Big), \tag{69} \]
which completes the proof.

ACKNOWLEDGMENT

The authors would like to thank J. Ward, C. Vonesch, and P. Thévenaz for their constructive comments.

REFERENCES

[1] A. Amini, M. Unser, and F. Marvasti, "Compressibility of deterministic and random infinite sequences," IEEE Trans. Signal Process., vol. 59, no. 11, pp. 5193–5201, Nov. 2011.
[2] R. Gribonval, V. Cevher, and M. E. Davies, "Compressible distributions for high-dimensional statistics," IEEE Trans. Inf. Theory, vol. 58, no. 8, pp. 5016–5034, Aug. 2012.
[3] M. Unser, P. Tafti, and Q. Sun, "A unified formulation of Gaussian vs. sparse stochastic processes—Part I: Continuous-domain theory," 2011 [Online]. Available: arXiv:1108.6150
[4] M. Unser, P. Tafti, A. Amini, and H. Kirshner, "A unified formulation of Gaussian vs. sparse stochastic processes—Part II: Discrete-domain theory," 2011 [Online]. Available: arXiv:1108.6152
[5] J. S. Turek, I. Yavneh, M. Protter, and M. Elad, "On MMSE and MAP denoising under sparse representation modeling over a unitary dictionary," IEEE Trans. Signal Process., vol. 59, no. 8, pp. 3526–3535, Aug. 2011.
[6] K. Sato, Lévy Processes and Infinitely Divisible Distributions. Boston, MA, USA: Chapman & Hall, 1994.
[7] A. N. Kolmogorov, "La transformation de Laplace dans les espaces linéaires," C. R. Acad. Sci. Paris, vol. T200, pp. 1717–1718, Jan./Feb. 1935.

[8] I. Gelfand and N. Y. Vilenkin, Generalized Functions, vol. 4. New York, NY, USA: Academic Press, 1964.
[9] Q. Sun and M. Unser, "Left-inverses of fractional Laplacian and sparse stochastic processes," Adv. Comput. Math., vol. 36, no. 3, pp. 399–441, Apr. 2012.
[10] A. Amini, U. Kamilov, E. Bostan, and M. Unser, "Bayesian estimation for continuous-time sparse stochastic processes," IEEE Trans. Signal Process., vol. 61, no. 4, pp. 907–920, Oct. 2013.
[11] A. Amini, U. Kamilov, and M. Unser, "Bayesian denoising of generalized Poisson processes with finite rate of innovation," in Proc. ICASSP, Kyoto, Japan, Mar. 2012, pp. 3629–3632.
[12] M. Unser and T. Blu, "Cardinal exponential splines: Part I—Theory and filtering algorithms," IEEE Trans. Signal Process., vol. 53, no. 4, pp. 1425–1438, Apr. 2005.
[13] U. S. Kamilov, P. Pad, A. Amini, and M. Unser, "MMSE estimation of sparse Lévy processes," IEEE Trans. Signal Process., vol. 61, no. 1, pp. 137–147, Feb. 2013.
[14] E. J. Candès, M. Wakin, and S. Boyd, "Enhancing sparsity by reweighted ℓ1 minimization," J. Fourier Anal. Appl., vol. 14, nos. 5–6, pp. 877–905, Dec. 2008.
[15] E. Bostan, U. Kamilov, and M. Unser, "Reconstruction of biomedical images and sparse stochastic modeling," in Proc. ISBI, Barcelona, Spain, May 2012, pp. 880–883.
[16] O. E. Barndorff-Nielsen, T. Mikosch, and S. I. Resnick, Lévy Processes: Theory and Applications. Cambridge, MA, USA: Birkhäuser, 2001.
[17] S. J. Wolfe, "On the unimodality of infinitely divisible distribution functions," Prob. Theory Rel. Fields, vol. 45, no. 4, pp. 329–335, Dec. 1978.
[18] C. C. Heyde, "Some remarks on the moment problem (II)," Quart. J. Math. Oxford Ser., vol. 14, no. 1, pp. 97–105, 1963.
[19] B. Øksendal, Stochastic Differential Equations: An Introduction with Applications. Berlin, Germany: Springer-Verlag, 2003.
[20] T. R. Johansen, "The Bochner–Minlos theorem for nuclear spaces and an abstract white noise space," Dept. Math., Louisiana State Univ., Baton Rouge, LA, USA, Tech. Rep., 2003.

Arash Amini received the B.Sc., M.Sc., and Ph.D. degrees in electrical engineering (communications and signal processing) in 2005, 2007, and 2011, respectively, and the B.Sc. degree in petroleum engineering (reservoir) in 2005, all from Sharif University of Technology (SUT), Tehran, Iran. He was a researcher at École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, from 2011 to 2013, working on statistical approaches toward modeling sparsity in the continuous domain. Since September 2013, he has been an assistant professor at Sharif University of Technology, Tehran, Iran. His research interests include various topics in statistical signal processing, especially compressed sensing.

Michael Unser (M'89–SM'94–F'99) received the M.S. (summa cum laude) and Ph.D. degrees in electrical engineering in 1981 and 1984, respectively, from the École Polytechnique Fédérale de Lausanne (EPFL), Switzerland. From 1985 to 1997, he worked as a scientist with the National Institutes of Health, Bethesda, USA. He is now full professor and Director of the Biomedical Imaging Group at EPFL. His main research area is biomedical image processing. He has a strong interest in sampling theories, multiresolution algorithms, wavelets, the use of splines for image processing, and, more recently, stochastic processes. He has published about 250 journal papers on those topics. Dr. Unser is currently a member of the editorial boards of IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, Foundations and Trends in Signal Processing, SIAM Journal on Imaging Sciences, and the PROCEEDINGS OF THE IEEE. He co-organized the first IEEE International Symposium on Biomedical Imaging (ISBI 2002) and was the founding chair of the technical committee of the IEEE-SP Society on Bio Imaging and Signal Processing (BISP). He received three Best Paper Awards (1995, 2000, 2003) from the IEEE Signal Processing Society and two IEEE Technical Achievement Awards (2008 SPS and 2010 EMBS). He is a EURASIP Fellow and a member of the Swiss Academy of Engineering Sciences.