Sparsity, Randomness and Compressed Sensing

Petros Boufounos
Mitsubishi Electric Research Labs
[email protected]

Sparsity

Why Sparsity

• Natural data and signals exhibit structure

• Sparsity often captures that structure

• Very general signal model

• Computationally tractable

• Wide range of applications in signal acquisition, processing, and transmission

Signal Representations

Signal example: Images


• 2-D function

• Idealized view: some function space defined over a continuous domain

• In practice:

i.e., an N×N matrix (pixel averages)

Signal Models

Classical Model: Signal lies in a linear vector space (e.g., bandlimited functions)

Sparse Model: Signals of interest are often sparse or compressible under a suitable signal transform

Examples: natural images; bat sonar chirps under a Gabor/STFT representation. In each case very few coefficients are large and many are close to zero.

Sparse Signal Models

Sparse signals have few non-zero coefficients: a 1-sparse signal in R³ lies on a coordinate axis, a 2-sparse signal on a coordinate plane.

Compressible signals have few significant coefficients; the sorted coefficients decay as a power law (the signal lies in an l_p ball, p < 1).

Sparse Approximation

Computational Harmonic Analysis

• Representation: f = Ψa (coefficients a, basis/frame Ψ)

• Analysis: study f through the structure of a; Ψ should extract the features of interest

• Approximation: uses just a few terms; exploits the sparsity of a

Wavelet Transform Sparsity

• Many small coefficients (blue)

Sparseness ⇒ Approximation

[Plot: coefficient magnitude vs. sorted index: few big, many small]

Linear Approximation


• K-term approximation: use the “first” K coefficients

Nonlinear Approximation

• K-term approximation: use the K largest coefficients, independently of their location

• Greedy / thresholding (a numerical comparison with linear approximation is sketched below)
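A minimal sketch of the gap between the two schemes, on a synthetic compressible coefficient vector (all parameters here are illustrative, not from the slides):

```python
import numpy as np

# Linear vs. nonlinear K-term approximation of a compressible vector.
rng = np.random.default_rng(0)
N, K = 1024, 50
a = (np.arange(1, N + 1) ** -1.5) * rng.choice([-1, 1], N)  # power-law decay
a = rng.permutation(a)               # large coefficients at arbitrary indices

a_lin = np.zeros(N)
a_lin[:K] = a[:K]                    # linear: keep the "first" K coefficients

a_nl = np.zeros(N)
idx = np.argsort(np.abs(a))[-K:]     # nonlinear: keep the K largest
a_nl[idx] = a[idx]

print("linear error:   ", np.linalg.norm(a - a_lin))
print("nonlinear error:", np.linalg.norm(a - a_nl))   # much smaller
```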

Error Approximation Rates

• Approximation error ‖f − f_K‖ decays as K^(−r) as K → ∞
• Optimize the asymptotic error decay rate r
• Nonlinear approximation works better than linear

Compression is Approximation

• Lossy compression of an image creates an approximation:

f ≈ Ψâ: coefficients â in a basis/frame Ψ, quantized to a total of R bits

Approximation ≠ Compression

• Sparse approximation chooses the coefficients (by thresholding) but does not quantize them or worry about their locations

Location, Location, Location

• Nonlinear approximation selects the K largest coefficients to minimize error (easy: threshold)

• A compression algorithm must encode both a set of coefficient values and their locations (harder)

Exposing Sparsity

Spikes and Sinusoids example

Example Signal Model: a sinusoidal signal with a few spikes.

f = Ba (B: DCT basis)

Spikes and Sinusoids Dictionary

f = Da

D = [ DCT basis | impulses ]

Lost uniqueness!!

Overcomplete Dictionaries

Strategy: Improve sparse approximation by constructing a large dictionary: f = Da

How do we design a dictionary?

Dictionary Design

Dictionary D: wavelets; DCT, DFT; edgelets, curvelets, …; the impulse basis; an oversampling frame; …

Can we just throw everything we know into the bucket?

Dictionary Design Considerations

• Dictionary Size: – Computation and storage increase with size

• Fast Transforms: – FFT, DCT, FWT, etc. dramatically decrease computation and storage

• Coherence: – Similarity between elements makes the solution harder

Dictionary Coherence

Two candidate dictionaries:

[Figure: candidate dictionary D1 has well-separated elements; D2 has many similar elements. BAD!]

Intuition: D2 has too many similar elements. It is very coherent.

Coherence (similarity) between elements: 〈d1,d2〉

Dictionary coherence: μ = max_{i≠j} |〈di,dj〉|

Incoherent Bases

• “Mix” the signal components well:
– Impulses and the Fourier basis
– Anything and a random Gaussian basis
– Anything and a random 0-1 basis
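As a small illustration (a hypothetical computation, using the spikes-and-sinusoids dictionary from earlier):

```python
import numpy as np
from scipy.fft import dct

# Coherence of the overcomplete dictionary D = [DCT basis | impulses].
N = 64
B = dct(np.eye(N), norm="ortho")    # orthonormal DCT basis
D = np.hstack([B, np.eye(N)])       # dictionary: DCT atoms plus impulses

G = np.abs(D.T @ D)                 # |<di, dj>| for all pairs
np.fill_diagonal(G, 0)              # exclude the i == j terms
print("coherence mu =", G.max())    # ~ sqrt(2/N) for this pair of bases
```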

Computing Sparse Representations

Thresholding

Compute the full set of coefficients for f = Da, then zero out the small ones:

a=D†f

Computationally efficient. Good for small and very incoherent dictionaries.
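In code, thresholding is just a pseudoinverse followed by keeping the K largest coefficients (a sketch; D, f, and K are assumed given):

```python
import numpy as np

def threshold_sparse(D, f, K):
    """Compute a = D† f, then zero out all but the K largest coefficients."""
    a = np.linalg.pinv(D) @ f
    small = np.argsort(np.abs(a))[:-K]   # indices of all but the K largest
    a[small] = 0.0
    return a
```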

Matching Pursuit

• Measure the image against the dictionary: ρ = Dᵀf, i.e., ρk = 〈dk, f〉
• Select the largest correlation ρk
• Add it to the representation: ak ← ak + ρk
• Compute the residual: f ← f − ρk dk
• Iterate using the residual
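A minimal sketch of this loop (assuming a dictionary with unit-norm columns; the stopping rule is simplified to a fixed iteration count):

```python
import numpy as np

def matching_pursuit(D, f, n_iter):
    """Greedy MP for f ≈ D a; D is N x P with unit-norm columns."""
    a = np.zeros(D.shape[1])
    r = f.copy()                     # residual
    for _ in range(n_iter):
        rho = D.T @ r                # correlations <dk, r>
        k = np.argmax(np.abs(rho))   # largest correlation
        a[k] += rho[k]               # add to the representation
        r -= rho[k] * D[:, k]        # update the residual
    return a
```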

Greedy Pursuits Family

• Several variations of MP: OMP, StOMP, ROMP, CoSaMP, Tree MP, … (You can create an AndrewMP if you work on it…)

• Some have provable guarantees

• Some improve dictionary search

• Some improve coefficient selection

CoSaMP (Compressive Sampling MP)

• Measure the image against the dictionary: ρ = Dᵀf
• Select the support of the 2K largest correlations: Ω = supp(ρ|2K)
• Add it to the current support: T = Ω ∪ supp(a)
• Invert over the support: b = D_T† f
• Truncate to the K largest entries: a = b|K
• Compute the residual r = f − Da and iterate using it
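A compact sketch of the iteration (illustrative; the support-restricted pseudoinverse is implemented with a least-squares solve):

```python
import numpy as np

def cosamp(D, f, K, n_iter):
    """CoSaMP sketch for f ≈ D a with sparsity K; D is N x P."""
    P = D.shape[1]
    a = np.zeros(P)
    r = f.copy()
    for _ in range(n_iter):
        rho = D.T @ r                                # correlations
        omega = np.argsort(np.abs(rho))[-2 * K:]     # 2K largest
        T = np.union1d(omega, np.flatnonzero(a))     # merge supports
        b = np.zeros(P)
        b[T], *_ = np.linalg.lstsq(D[:, T], f, rcond=None)  # invert on T
        keep = np.argsort(np.abs(b))[-K:]            # truncate to K largest
        a = np.zeros(P)
        a[keep] = b[keep]
        r = f - D @ a                                # residual
    return a
```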

Optimization (Basis Pursuit)

Sparse approximation:

Minimize non-zeros in representation s.t.: representation is close to signal

min ‖a‖0 s.t. f ≈ Da

‖a‖0: number of non-zeros (the sparsity measure); f ≈ Da: data fidelity (the approximation quality)

Combinatorial complexity. Very hard problem!


Convex Relaxation

min ‖a‖1 s.t. f ≈ Da

Polynomial complexity. Solved using convex optimization (e.g., linear programming).
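Many solvers exist for the relaxation; as one self-contained illustration (not one of the packages named below), ISTA applied to the penalized form min_a ½‖f − Da‖₂² + λ‖a‖₁:

```python
import numpy as np

def ista(D, f, lam, n_iter=500):
    """ISTA for min_a 0.5*||f - D a||_2^2 + lam*||a||_1."""
    L = np.linalg.norm(D, 2) ** 2    # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - f) / L                           # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft threshold
    return a
```

The soft-thresholding step is where the l1 norm induces sparsity: coefficients smaller than λ/L are set exactly to zero.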

Why l1 relaxation works

min ‖a‖1 s.t. f ≈ Da

[Figure: the l1 “ball” grows until it touches the constraint set f = Da; because the ball is pointy, the touching point is a sparse solution]

Basis Pursuits

• Have provable guarantees – Finds the sparsest solution for incoherent dictionaries

• Several variants in formulation: BPDN, Dantzig selector, …

• Variations on the fidelity term and the relaxation choice

• Several fast algorithms: FPC, GPSR, SPGL1, …

Compressed Sensing: Sensing, Sampling and Data Processing

Data Acquisition

• Usual acquisition methods sample signals uniformly
– Time: A/D with microphones, geophones, hydrophones.
– Space: CCD cameras, sensor arrays.

• Foundation: Nyquist/Shannon sampling theory – Sample at twice the signal bandwidth. – Generally a projection to a complete basis that spans the signal space.

• Data processing steps: – Sample densely: signal x, N coefficients

– Transform to an informative domain (Fourier, wavelet)

K ≪ N significant coefficients

– Process/compress/transmit: set the small coefficients to zero (sparsification)

Sparsity Model

• Signals can usually be compressed in some basis

• Sparsity: a good prior for picking from a lot of candidates

Compressive Sensing Principles

• If a signal is sparse, do not waste effort sampling the empty space.
• Instead, use fewer samples and allow ambiguity.
• Use the sparsity model to reconstruct and uniquely resolve the ambiguity.

Measuring Sparse Signals

Compressive Measurements

Ambiguity

[Figure: a sparse signal x in R³ is projected to measurements y in R²; reconstruction must map y back to the sparse x]

Measurement (projection): y = Φx, where Φ has rank M ≪ N

N = signal dimensionality; M = number of measurements (the dimensionality of y); K = signal sparsity

N ≫ M ≳ K

One Simple Question

Geometry of Sparse Signal Sets

Geometry: Embedding in R^M

Illustrative Example

[Figures: a 1-sparse signal with N=3, K=1, measured with M=2K=2 projections. A projection aligned with the coordinate axes (e.g., y1=x2, y2=x3) is bad: signals supported on x1 are lost, and distinct sparse signals map to the same measurements. A projection in general position is good: it keeps all 1-sparse signals separated, and a generic random one does even better.]

Restricted Isometry Property

RIP as a “Stable” Embedding

Verifying RIP

Universality Property

Democracy

[Figure: measurement matrix Φ̃ obtained by deleting rows of Φ: bad/lost/dropped measurements]

• Measurements are democratic [Davenport, Laska, Boufounos, Baraniuk]
- They are all equally important
- We can lose some arbitrarily (i.e., an adversary can choose which ones)

• The reduced matrix Φ̃ still satisfies the RIP (as long as we do not drop too many)

Reconstruction

Requirements for Reconstruction

• Let x1, x2 be K-sparse signals (i.e., x1 − x2 is 2K-sparse)
• The mapping y = Φx is invertible for K-sparse signals:

Φ(x1-x2)≠0 if x1≠x2

• Mapping is robust for K‐sparse signals:

||Φ(x1-x2)||2≈|| x1-x2||2

– Restricted Isometry Property (RIP): Φ preserves distances when projecting K-sparse signals

• Guarantees that a unique K-sparse signal explains the measurements, and that reconstruction is robust to noise
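Verifying the RIP exactly is combinatorial, but a quick Monte Carlo check (an illustration, not a proof; all parameters are arbitrary) shows the near-isometry on random 2K-sparse differences for a Gaussian Φ:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M, K = 256, 80, 5
Phi = rng.normal(size=(M, N)) / np.sqrt(M)   # normalized Gaussian measurements

ratios = []
for _ in range(2000):
    x = np.zeros(N)
    supp = rng.choice(N, size=2 * K, replace=False)  # random 2K-sparse vector
    x[supp] = rng.normal(size=2 * K)
    ratios.append(np.linalg.norm(Phi @ x) / np.linalg.norm(x))

print(f"min ratio {min(ratios):.3f}, max ratio {max(ratios):.3f}")  # both near 1
```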

Reconstruction Ambiguity

• The solution should be consistent with the measurements:

x̂ s.t. y = Φx̂ or y ≈ Φx̂
• The projection implies that an infinite number of solutions are consistent!

• Classical approach: use the pseudoinverse (minimize the l2 norm)
• Compressive sensing approach: pick the sparsest.

• RIP guarantee: the sparsest solution is unique and reconstructs the signal.

Becomes a sparse approximation problem!

Putting everything together

Compressed Sensing Coming Together

[Diagram: signal structure (sparsity) → signal x → measurement y = Φx → reconstruction x̃ using sparse approximation]

Stable embedding (random projections); non-linear reconstruction (Basis Pursuit, Matching Pursuit, CoSaMP, etc.)

• Signal model: Provides prior information; allows undersampling

• Randomness: Provides robustness/stability; makes proofs easier

• Non-linear reconstruction: Incorporates information through computation (an end-to-end sketch follows)
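Tying the pieces together, a minimal end-to-end sketch: a random Gaussian Φ, a synthetic K-sparse x, and an OMP-style non-linear reconstruction (all parameters illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
N, M, K = 256, 64, 5

x = np.zeros(N)                                  # K-sparse ground truth
x[rng.choice(N, K, replace=False)] = rng.normal(size=K)
Phi = rng.normal(size=(M, N)) / np.sqrt(M)       # random measurement matrix
y = Phi @ x                                      # compressive measurements

support, r = [], y.copy()                        # OMP-style reconstruction
for _ in range(K):
    k = int(np.argmax(np.abs(Phi.T @ r)))        # most correlated column
    support.append(k)
    coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
    r = y - Phi[:, support] @ coeffs             # update residual

x_hat = np.zeros(N)
x_hat[support] = coeffs
print("relative error:", np.linalg.norm(x - x_hat) / np.linalg.norm(x))
```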

Beyond: Extensions, Connections, Generalizations

Sparsity Models

Block Sparsity

Measurements y = Φx of a sparse signal x whose nonzero entries occur in blocks of length L.

Mixed l1/l2 norm (the sum of the l2 norms of the blocks): Σᵢ ‖x_{Bᵢ}‖₂

Basis pursuit becomes: min_x Σᵢ ‖x_{Bᵢ}‖₂ s.t. y ≈ Φx

Blocks are not allowed to overlap.
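The mixed norm is simple to state in code (a sketch; the block length L and the contiguous-block convention are assumptions for illustration):

```python
import numpy as np

def mixed_l1l2_norm(x, L):
    """Sum of l2 norms over consecutive non-overlapping blocks of length L.
    Assumes len(x) is a multiple of L."""
    blocks = x.reshape(-1, L)                  # one row per block
    return np.linalg.norm(blocks, axis=1).sum()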

Joint Sparsity

Y = ΦX: L sparse signals in R^N (the columns of the N×L matrix X), each measured with the same M×N matrix Φ, giving M×L measurements Y.

Sparse components per signal with common support

Mixed l1/l2 norm (the sum of the l2 norms of the rows): Σᵢ ‖x(i,·)‖₂

Basis pursuit becomes: min_x Σᵢ ‖x(i,·)‖₂ s.t. y ≈ Φx

Randomized Embeddings

Stable Embeddings

Johnson-Lindenstrauss Lemma

Favorable JL Distributions

Connecting JL to RIP
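The JL lemma states that a random projection to M = O(ε⁻² log Q) dimensions preserves the pairwise distances of Q points up to a factor 1 ± ε; a quick numerical illustration (parameters are arbitrary):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(3)
N, M, Q = 1000, 200, 50
points = rng.normal(size=(Q, N))             # Q arbitrary points in R^N
A = rng.normal(size=(M, N)) / np.sqrt(M)     # JL map: scaled Gaussian

worst = 0.0
for i, j in combinations(range(Q), 2):
    d = points[i] - points[j]
    worst = max(worst, abs(np.linalg.norm(A @ d) / np.linalg.norm(d) - 1))
print(f"worst pairwise distortion: {worst:.3f}")  # small for M ~ log Q / eps^2
```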

More? The tip of the iceberg (today's lecture).

Compressive Sensing Repository: dsp.rice.edu/cs

Blog on CS: nuit-blanche.blogspot.com/

Yet to be discovered… Start working on it!