Source Coding With Side Information

O. Simeone

NJIT

European School of Information Theory, 2013

Overview


PART I: Basics
  Example: File Synchronization
  The Wyner-Ziv problem
  Code design
PART II: Constraints
  Causality constraints
  Reconstruction constraints
  Robustness constraints
PART III: Extensions
  Action-dependent side information
  Cascade problems
  Sources with memory


Example: File Synchronization

[Figure: the encoder observes x^n = (01001…0) and sends nR bits; the decoder has y^n = (00011…1)]

Goal: file synchronization with minimum (bit) rate R (bits per source symbol) [Orlitsky ’91]


Example: File Synchronization

Source coding with side information:
  Encoder: x^n (and possibly y^n) → nR bits (message or code)
  Decoder: (nR bits, y^n) → x^n

Example: File Synchronization

n = 3; at most one substitution between x^3 and y^3

If y^3 is known at the encoder, only the error pattern x^3 ⊕ y^3 must be communicated (delta compression)
Ex.: x^3 = (100) and y^3 = (110), so x^3 ⊕ y^3 = (010)
Required number of bits:

nR = log2(#error patterns) = log2[ (3 choose 0) + (3 choose 1) ] = log2(4) = 2 bits
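The count generalizes to any block length n and at most t substitutions; a minimal Python check (the helper name `bits_for_delta` is ours, for illustration):

```python
from math import comb, ceil, log2

def bits_for_delta(n: int, t: int) -> int:
    """Bits needed to index every error pattern of weight <= t in n positions."""
    num_patterns = sum(comb(n, w) for w in range(t + 1))
    return ceil(log2(num_patterns))

print(bits_for_delta(3, 1))  # 4 patterns -> 2 bits, as in the example
```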


Example: File Synchronization

When y^3 is not known at the encoder... what does the decoder know based on y^n?
Equivocation set at the receiver: for y^n = (000)

[Figure: 3-cube with vertices 000, 001, 010, 100, 011, 101, 110, 111; the equivocation set for y^3 = (000) is {000, 001, 010, 100}]

Example: File Synchronization

Equivocation set at the receiver: for y^n = (001)

[Figure: 3-cube as above; the equivocation set for y^3 = (001) is {001, 000, 011, 101}]

Example: File Synchronization

Each source sequence is connected to all others at a Hamming distance of at most 2

[Figure: graph on the 3-cube; edges connect sequences at Hamming distance at most 2]

(characteristic graph [Witsenhausen ’76])

The encoder need only distinguish sequences connected by an edge

Example: File Synchronization

The encoder can thus encode with the same label (“color”) sequences that belong to an independent set

[Figure: 3-cube colored with four colors, one per independent set]

A set with the same color forms a “bin”

Example: File Synchronization

Encoding:

Bin                Code
C1 = {000, 111}    00
C2 = {001, 110}    01
C3 = {100, 011}    10
C4 = {010, 101}    11

Same rate as the case in which y^3 is known at the encoder! (But this is not always the case for zero error [Orlitsky ’91])
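To make the binning concrete, a small Python sketch of this exact table: the encoder sends only the 2-bit bin label, and the decoder returns the unique bin member within Hamming distance 1 of its y^3 (helper names are ours):

```python
BINS = {"00": ["000", "111"], "01": ["001", "110"],
        "10": ["100", "011"], "11": ["010", "101"]}

def hamming(a: str, b: str) -> int:
    return sum(ca != cb for ca, cb in zip(a, b))

def encode(x: str) -> str:
    """Send only the label of the bin containing x."""
    return next(label for label, bin_ in BINS.items() if x in bin_)

def decode(label: str, y: str) -> str:
    """With at most one substitution, exactly one bin member is within distance 1 of y."""
    candidates = [x for x in BINS[label] if hamming(x, y) <= 1]
    assert len(candidates) == 1
    return candidates[0]

x, y = "100", "110"            # differ in one position
assert decode(encode(x), y) == x
```

Uniqueness holds because the two members of each bin are at distance 3, so their radius-1 spheres are disjoint.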

Example: File Synchronization: Bins

[Figure: X^n → Encoder → M → Decoder → X̂^n; Y^n is the output of a channel with input X^n]

Channel decoding at the decoder... bins as channel codes?

Example: File Synchronization: Bins

C1 = {000, 111} is a (3, 1) repetition code: it can correct one bit flip ... a good channel code for the channel between X and Y
C2 = {001, 110} = C1 ⊕ (001): a coset of C1 with coset leader (001)
C3 = {100, 011} = C1 ⊕ (100): a coset of C1 with coset leader (100)
C4 = {010, 101} = C1 ⊕ (010): a coset of C1 with coset leader (010)


Example: File Synchronization: Code

Parity matrix for C1 = {000, 111}

H = [ 1 1 0 ]
    [ 0 1 1 ]

Codewords must satisfy H·(000)ᵀ = H·(111)ᵀ = (00)ᵀ
(00) is the code for the sequences in C1

(00) is also known as the syndrome for C1

Example: File Synchronization: Code

The code C2 = {001, 110} is obtained in the same way

 0   1   0  H  0  = H  1  = 1 1 0

(01) is referred to as the syndrome for C2

Check: the syndromes for C3 and C4 are (10) and (11), respectively
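The syndromes can be checked mechanically with GF(2) arithmetic; a short numpy sketch (our illustration):

```python
import numpy as np

H = np.array([[1, 1, 0],
              [0, 1, 1]])

bins = {"C1": ["000", "111"], "C2": ["001", "110"],
        "C3": ["100", "011"], "C4": ["010", "101"]}

for name, members in bins.items():
    for x in members:
        v = np.array([int(b) for b in x])
        syndrome = H @ v % 2          # GF(2) matrix-vector product
        print(name, x, "->", "".join(map(str, syndrome)))
```

Both members of each bin map to the same syndrome: 00, 01, 10, 11.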

Example: File Synchronization: Cosets and Syndromes

Bin (Coset)        Code (Syndrome)
C1 = {000, 111}    00
C2 = {001, 110}    01
C3 = {100, 011}    10
C4 = {010, 101}    11

... the operation at the decoder can be interpreted as channel decoding [Orlitsky and Viswanathan ’03]
Binning is related to the concept of hashing [MacKay ’03]



Wyner-Ziv Problem [Wyner and Ziv ’76]

[Figure: X^n → Encoder → M → Decoder → X̂^n; the decoder also observes Y^n, the output of the memoryless channel p(y|x) with input X^n]

X^n ~ p(x) i.i.d.
Memoryless side information “channel”: Y^n | X^n = x^n ~ ∏_{i=1}^n p(y_i|x_i)
Fixed-length codes: M ∈ {1, 2, ..., 2^{nR}} (nR bits)

Wyner-Ziv Problem: Doubly Symmetric Binary Source (DSBS)

Binary Symmetric Source (BSS): X_i ~ Ber(1/2) i.i.d.
Binary Symmetric Channel BSC(q): Y_i = X_i ⊕ Q_i, with Q_i ~ Ber(q) i.i.d. and independent of X^n

Wyner-Ziv Problem [Wyner and Ziv ’76]

Lossy compression: distortion measure 0 ≤ d(x, x̂) < ∞
Ex.: Hamming distortion d(x, x̂) = 1(x ≠ x̂); MSE d(x, x̂) = (x − x̂)²
Average per-block distortion:

(1/n) ∑_{i=1}^n E[d(X_i, X̂_i)] ≤ D

Ex.: with Hamming distortion, (1/n) ∑_{i=1}^n E[d(X_i, X̂_i)] = (1/n) ∑_{i=1}^n Pr[X_i ≠ X̂_i]

Wyner-Ziv Problem [Wyner and Ziv ’76]

Lossy functional reconstruction: d(x, y, x̂) = d(f(x, y), x̂) [Yamamoto ’82] [Feng et al ’04]

(1/n) ∑_{i=1}^n E[d(X_i, Y_i, X̂_i)] ≤ D

Wyner-Ziv Problem [Wyner and Ziv ’76]

(R, D) is achievable if, for any ε > 0, there exists a sufficiently large n0 such that, for n ≥ n0,

(1/n) ∑_{i=1}^n E[d(X_i, Y_i, X̂_i)] ≤ D + ε

The rate-distortion function R^{WZ}_{X|Y}(D) is the infimum of all R such that (R, D) is achievable

Wyner-Ziv Problem [Wyner and Ziv ’76]: Applications

Sensor networks [Draper and Wornell ’04], video coding [Aaron et al ’02], file synchronization [Orlitsky ’91], distributed file backup and file sharing systems [Suel and Memon ’02], reference-based genome compression [Chern et al ’12], ...
Systematic source-channel coding: analog+digital audio/video broadcasting [Shamai et al ’98]
Denoising: fidelity-boosting sequence constrained to rate R [Jalali et al ’10]
Networks: relay channel [Cover and El Gamal ’79], distributed uplink reception [Sanderovich et al ’09], ...

Covering Lemma [El Gamal and Kim ’11]

Source A^n ~ p(a) i.i.d. (ex.: A_i ~ Ber(1/2))

Cover (quantize) the source A^n using a codebook of 2^{nR} codewords Â^n(m), m = 1, ..., 2^{nR}

[Figure: space of source sequences A^n covered by the codewords Â^n(1), ..., Â^n(2^{nR})]

Covering Lemma [El Gamal and Kim ’11]

For a sequence A^n, we want to find a codeword Â^n(m) such that (A^n, Â^n(m)) is jointly typical according to a desired joint distribution p(a, â) = p(a)p(â|a)

Ex.: p(a, â):
         â = 0   â = 1
a = 0     0.4     0.1
a = 1     0.1     0.4

→ Hamming distortion D = 0.2
p(a) is given by the problem; p(â|a) is referred to as the test channel
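For the example pmf above, the distortion and the minimal covering rate I(A; Â) can be checked directly; a short Python computation (our illustration):

```python
import numpy as np

# Example joint pmf p(a, a_hat) from the slide: rows indexed by a, columns by a_hat
p = np.array([[0.4, 0.1],
              [0.1, 0.4]])

# Hamming distortion: probability that a != a_hat
D = p[0, 1] + p[1, 0]

# Mutual information I(A; A_hat) = sum p * log2( p / (p_a * p_ahat) )
pa = p.sum(axis=1, keepdims=True)
pahat = p.sum(axis=0, keepdims=True)
I = (p * np.log2(p / (pa * pahat))).sum()

print(f"D = {D:.2f}")                 # 0.20, as stated
print(f"I(A;A_hat) = {I:.3f} bits")   # ~0.278 = 1 - H(0.2): covering needs R > 0.278
```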

Covering Lemma [El Gamal and Kim ’11]

If the codewords Â^n(m) are selected i.i.d. ~ p(â) and independently of each other (random codebook), then, if

R > I(A; Â),

we have

Pr[∄ Â^n(m) jointly typical with A^n] → 0 as n → ∞

Packing Lemma [El Gamal and Kim ’11]

Memoryless channel p(b|a) (ex.: BSC(q))

[Figure: codewords A^n(1), ..., A^n(2^{nR}) mapped through the channel to B^n]

Random codebook of 2^{nR} codewords A^n(m), m = 1, ..., 2^{nR}, with A^n(m) ~ p(a) i.i.d. and independent
If R < I(A; B), then Pr[M̂ ≠ M] → 0 as n → ∞


Conditional Rate-Distortion Theory [Gray ’72]

[Figure: as in the Wyner-Ziv setting, but Y^n is available at both the encoder and the decoder]

Conditional Rate-Distortion Theory: Converse

Using the data processing inequality, memorylessness, and convexity properties of the rate-distortion function, we get the lower bound:

R_{X|Y}(D) ≥ min_{p(x̂|x,y)} I(X; X̂|Y)
s.t. E[d(X, Y, X̂)] ≤ D

Proof: see Appendix-A
Note that p(x, y, x̂) = p(x, y)p(x̂|x, y)

Conditional Rate-Distortion Theory: Achievability

DSBS(q) (Y = X ⊕ Q with Q ~ Ber(q)) with Hamming distortion
The lower bound

R_{X|Y}(D) ≥ min_{p(x̂|x,y)} I(X; X̂|Y)
s.t. E[d(X, Y, X̂)] ≤ D

leads to

R_{X|Y}(D) ≥ H(q) − H(D) = I(Q; Q̂),

where p(q|q̂) is a BSC(D)

... the lower bound is achievable by compressing the error Q^n with test channel p(q̂|q) (delta compression)
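A direct numerical sketch of this conditional rate-distortion function (our helper names; H is the binary entropy in bits):

```python
import numpy as np

def H(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p*np.log2(p) - (1-p)*np.log2(1-p)

def R_cond(D, q):
    """Conditional rate-distortion function of the DSBS(q), for 0 <= D <= q."""
    return max(H(q) - H(D), 0.0)

for D in (0.0, 0.05, 0.1):
    print(f"D = {D:.2f}: R_X|Y(D) = {R_cond(D, q=0.1):.3f} bits")
```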


Conditional Rate-Distortion Theory: Achievability

In order to obtain a more generally applicable scheme, note that:

R_{X|Y}(D) ≥ min_{p(x̂|x,y)} I(X; X̂|Y) = min_{p(x̂|x,y)} ∑_{y∈Y} p(y) I(X; X̂|Y = y)
s.t. E[d(X, Y, X̂)] ≤ D

Conditional Rate-Distortion Theory: Achievability

[Figure: the encoder demultiplexes X^n according to the value of Y_i, encodes each subsequence separately, and multiplexes the outputs; the decoder mirrors this structure using its copy of Y^n]

Each subsequence has length ≈ np(y) and is encoded using the test channel p(x̂|x, y)
Number of bits produced for each subsequence: ≈ np(y) I(X; X̂|Y = y)

Conditional Rate-Distortion Theory

The achievable rate matches the lower bound, giving the rate-distortion function

R_{X|Y}(D) = min_{p(x̂|x,y)} I(X; X̂|Y)
s.t. E[d(X, Y, X̂)] ≤ D

Wyner-Ziv Problem [Wyner and Ziv ’76]

R_{X|Y}(D) = min_{p(x̂|x,y)} I(X; X̂|Y) s.t. E[d(X, Y, X̂)] ≤ D is not achievable here!
Problem: the encoder cannot implement the test channel p(x̂|x, y)

Wyner-Ziv Problem: Converse

Lower bound:

R^{WZ}_{X|Y}(D) ≥ min_{p(u|x), f(u,y)} I(X; U|Y)
s.t. E[d(X, Y, f(U, Y))] ≤ D

Proof: see Appendix-B

Wyner-Ziv Problem: Converse

Key point: we have the Markov chain U − X − Y, so that p(x, y, u) = p(x, y)p(u|x)

[Figure: per-letter view of the code: M is a function of the whole block X^n, and each X̂_i is a function of (M, Y^n)]

Wyner-Ziv Problem: Converse

Lower bound:

R^{WZ}_{X|Y}(D) ≥ min_{p(u|x), f(u,y)} I(X; U|Y) = min_{p(u|x), f(u,y)} [ I(X; U) − I(U; Y) ]
s.t. E[d(X, Y, f(U, Y))] ≤ D

Wyner-Ziv Problem: Achievability: 1. Quantization

[Figure: codebook of U^n codewords covering the space of X^n sequences; message rate I(X; U) − I(U; Y)]

Quantization with test channel p(u|x)

Wyner-Ziv Problem: Achievability: 2. Binning

[Figure: the U^n codebook is partitioned into bins; message rate I(X; U) − I(U; Y)]

2^{nI(U;Y)} codewords per bin: the decoder can detect the codeword inside the bin based on the side information Y^n

Wyner-Ziv Problem: Achievability: 3. Estimate

Final estimate: symbol-by-symbol function X̂_i = f(U_i, Y_i)
Rate-distortion function:

R^{WZ}_{X|Y}(D) = min_{p(u|x), f(u,y)} I(X; U|Y)
s.t. E[d(X, Y, f(U, Y))] ≤ D

Wyner-Ziv Problem: Summary

If Y^n is known at the encoder:

R_{X|Y}(D) = min_{p(x̂|x,y)} I(X; X̂|Y)
s.t. E[d(X, Y, X̂)] ≤ D

Achievability: delta compression or mux/demux

If Y^n is not known at the encoder:

R^{WZ}_{X|Y}(D) = min_{p(u|x), f(u,y)} I(X; U|Y)
s.t. E[d(X, Y, f(U, Y))] ≤ D

Achievability: quantization, binning, and estimation


Wyner-Ziv Problem: Lossless Case

Hamming distortion d(x, x̂) = 1(x ≠ x̂) and D = 0 (symbol-error probability):
R^{WZ}_{X|Y}(0) = H(X|Y) [Slepian and Wolf ’73]

But R_{X|Y}(0) = H(X|Y) as well... the same rate whether or not the encoder knows Y!

This property extends to any D for Gaussian sources with MSE distortion [Wyner ’78], and for binary sources with erased side information under Hamming/erasure distortion [Diggavi et al ’09] [Weissman and Verdú ’08]
(Due to properties of strong typicality, the result also applies to the stronger criterion of block-error probability)
For functional reconstruction, a characterization in terms of characteristic graphs is available [Orlitsky and Roche ’01] [Doshi et al ’10]


Wyner-Ziv Problem: Computation of the Rate-Distortion Function

Introducing Shannon strategies T : Y → X̂, we have

R^{WZ}_{X|Y}(D) = min_{p(t|x)} I(X; T|Y)
s.t. E[d(X, Y, T(Y))] ≤ D

This is a convex problem in p(t|x)
It can be solved using alternating optimization à la Blahut-Arimoto [Dupuis et al ’04]

Wyner-Ziv Problem: Example

DSBS(q) with Hamming distortion:

R^{WZ}_{X|Y}(D) = l.c.e.{ H(q ∗ D) − H(D), (q, 0) } for D ≤ q,

where q ∗ D = q(1 − D) + (1 − q)D is the binary convolution and l.c.e. denotes the lower convex envelope of the curve H(q ∗ D) − H(D) and the point (q, 0)

[Plot: R^{WZ}_{X|Y}(D) for q = 1/2, 1/4, 0.1]
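The lower convex envelope is easy to evaluate numerically: sample H(q ∗ D) − H(D) on a grid, append the point (q, 0), and take the lower convex hull. A Python sketch (our illustration; `Hb` is the binary entropy):

```python
import numpy as np

def Hb(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -p*np.log2(p) - (1-p)*np.log2(1-p)

def wz_dsbs(q, num=401):
    """Vertices of l.c.e.{(D, H(q*D) - H(D))} U {(q, 0)} on [0, q]."""
    D = np.linspace(0.0, q, num)[:-1]           # curve samples, D < q
    g = Hb(q*(1 - D) + (1 - q)*D) - Hb(D)       # H(q*D) - H(D)
    pts = list(zip(D, g)) + [(q, 0.0)]          # append the point (q, 0)
    hull = []                                   # lower hull (monotone chain)
    for x, y in pts:
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1)*(y - y1) <= (y2 - y1)*(x - x1):
                hull.pop()                      # hull[-1] lies above the chord
            else:
                break
        hull.append((x, y))
    return hull

env = wz_dsbs(0.25)
print(env[0], env[-1])   # starts at (0, H(q)), ends at (q, 0)
```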

Wyner-Ziv Problem: Example

[Plot: R^{WZ}_{X|Y}(D) compared with R_{X|Y}(D) for q = 1/4]


Code Design: Quantization, Binning, Estimation

Random codes lead to exponential quantization and decoding complexities
Goal: design structured codes with feasible quantization/decoding

Code Design

1. Lossless case: binning via syndromes and coset decoding
2. Lossy case, no side information: quantization as decoding
3. Lossy case, side information: quantization, binning, and estimation

Lossless Case

DSBS(q) with Hamming distortion
With D = 0, there is no need for quantization (U = X):
R^{WZ}_{X|Y}(0) = H(X|Y) = H(q)

Lossless Case

[Figure: the set of binary sequences, with X^n shown in red and Y^n in yellow]

Lossless Case: From the Theory

From the packing lemma, the decoder can decode within a bin of 2^{nI(X;Y)} = 2^{n(1−H(q))} randomly selected sequences X^n
The number of bins is 2^n / 2^{n(1−H(q))} = 2^{nH(q)}
Rate of the message: R = H(q) = R^{WZ}_{X|Y}(0)

Linear codes are optimal for communication over a BSC [Gallager ’68] ... therefore, bins can be constructed from linear codes


Lossless Case

Fix a linear code ((n − k) × n parity matrix H) with k/n = 1 − H(q)

Lossless Case

... and cosets, each described by the syndrome s^{n−k} = Hx^n and by the corresponding coset leader f(s^{n−k})
The coset leader f(s^{n−k}) is the offset with the minimal number of ones

Lossless Case

The encoder informs the decoder about the syndrome s^{n−k} = Hx^n, and hence about the bin
The decoder decodes inside the bin

Lossless Case

To decode, estimate Q^n and then calculate X̂^n = Y^n ⊕ Q̂^n

Q_i ~ Ber(q) i.i.d.
If R = (n−k)/n > H(q), then from HQ^n we can recover Q^n as f(HQ^n) with high probability [Ancheta ’76]

The encoder cannot calculate HQ^n, but we have HQ^n = H(Y^n ⊕ X^n) = HY^n ⊕ S^{n−k} ...
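Putting the pieces together, a brute-force Python sketch of syndrome encoding and coset-leader decoding for a small random linear code (a toy construction for illustration; practical systems use the LDPC or polar codes discussed below):

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
n, k, q = 10, 4, 0.1                 # (n-k) x n parity matrix, BSC(q)
H = rng.integers(0, 2, (n - k, n))

def coset_leader(s):
    """Minimum-weight e with He = s (brute force over increasing weights)."""
    for w in range(n + 1):
        for pos in itertools.combinations(range(n), w):
            e = np.zeros(n, dtype=int)
            e[list(pos)] = 1
            if np.array_equal(H @ e % 2, s):
                return e

x = rng.integers(0, 2, n)            # source sequence
y = (x + (rng.random(n) < q)) % 2    # side information: x through BSC(q)

s = H @ x % 2                        # encoder sends the (n-k)-bit syndrome
x_hat = (y + coset_leader((s + H @ y) % 2)) % 2   # decoder: y xor f(HQ^n)
print("success:", np.array_equal(x, x_hat))
```

Decoding succeeds whenever the true error pattern Q^n is the minimum-weight member of its coset.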


Lossless Case, Side Information

[Figure: X^n → H → S^{n−k} = HX^n → decoder; the decoder computes X̂^n = Y^n ⊕ f(S^{n−k} ⊕ HY^n)]

Linear encoding; the decoder still has exponential complexity

Lossless Case, Side Information

The complexity at the decoder can be drastically reduced by adopting specific classes of linear codes:
LDPC codes are capacity-achieving for the BSC and are empirically known to be efficiently decoded using message passing [MacKay ’99]
Polar codes are capacity-achieving for the BSC and can be efficiently decoded via successive decoding [Arıkan ’09]


Lossy Case, No Side Information: From the Theory

No need for binning (U = X̂); random coding

[Figure: codebook of X̂^n codewords covering the space of X^n sequences]

For the BSS: R_X(D) = 1 − H(D) for 0 ≤ D ≤ 1/2, and R_X(D) = 0 otherwise

Lossy Case, No Side Information

Can linear encoders be optimal for D > 0 (as in the lossless case)?

No, non-linear operations are generally necessary [Ancheta ’76] [Massey ’78]


Lossy Case, No Side Information

[Figure: quantization of X^n onto the nearest codeword X̂^n]

... Quantization is akin to channel decoding! (cf. binning, where channel decoding is performed at the decoder’s side)
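A tiny illustration of “quantization as channel decoding”: use the minimum-distance decoder of the (3, 1) repetition code as a rate-1/3 quantizer of a Ber(1/2) source (our toy example; the resulting distortion is well above the rate-distortion bound, as expected for such a short code):

```python
import numpy as np

rng = np.random.default_rng(2)
codebook = np.array([[0, 0, 0], [1, 1, 1]])   # (3,1) repetition code

blocks = rng.integers(0, 2, (100_000, 3))     # fair-coin source, 3-bit blocks
# "channel decode" each block: map it to the closest codeword
dists = (blocks[:, None, :] != codebook[None, :, :]).sum(axis=2)
quantized = codebook[dists.argmin(axis=1)]

D = (blocks != quantized).mean()
Hb = lambda d: -d*np.log2(d) - (1-d)*np.log2(1-d)
print(f"rate 1/3, empirical distortion D = {D:.3f}")      # ~0.25
print(f"rate-distortion bound R(0.25) = {1 - Hb(0.25):.3f}")  # ~0.189
```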

Lossy Case, No Side Information

[Figure: duality between lossy compression and channel coding [Gupta and Verdú ’11]]

Source and channel coding problems are related by functional duality [Pradhan et al ’03]

Lossy Case, No Side Information

Example 1: binary source Ber(1/2) with Hamming distortion D ≤ 1/2
The dual channel coding problem is the BSC X = X̂ ⊕ Z with Z ~ Ber(D)

Example 2: X_i ~ N(0, σ²) i.i.d., MSE distortion D ≤ σ²
The dual channel coding problem is X = X̂ + Z with Z ~ N(0, D) and cost constraint E[X̂²] ≤ σ² − D


Lossy Case, No Side Information

Capacity-achieving codes exist whose ML channel decoder for the dual channel is an optimal quantizer [Gupta and Verdú ’11]
LDPC codes achieve the rate-distortion bound with an ML decoder as encoder [Matsunaga and Yamamoto ’03]
Polar codes achieve the rate-distortion bound with successive encoding [Korada and Urbanke ’10]


Lossy Case, No Side Information

Not every optimal channel decoder is an optimal quantizer [Gupta and Verdú ’11] (see also the proof of the rate-distortion theorem in [Csiszár and Körner ’11])
Message passing decoders fail as quantizers with LDPC codes [Martinian and Yedidia ’03] (see Appendix-C)


Lossy Case, No Side Information

Low Density Generator Matrix (LDGM) codes achieve the rate-distortion function for the BSS with Hamming distortion [Sun et al ’10]
LDGM codes allow low-complexity encoding via message-passing-type algorithms [Wainwright et al ’10] [Sun et al ’10]
Unlike channel coding, many codewords are “close” to the source signal → modifications of the message passing strategy are needed

Lossy Case, No Side Information

The performance of LDGM codes with finite check degree γ is bounded as [Dimakis et al ’07] [Kudekar and Urbanke ’08]: R ≥ R(D)/(1 − e^{−(1−D)γ/R})
Compound LDGM-LDPC codes achieve the rate-distortion function with bounded degrees [Martinian and Wainwright ’06]

Lossy Case, No Side Information

For more general alphabets and non-uniform distributions, non-linear codes are obtained via multilevel mappings [Sun et al ’10] or by limiting the set of w’s [Gupta and Verdú ’09]

Lossy Case, No Side Information

BSS with Hamming distortion

[Figure: empirical performance results from [Wainwright and Maneva ’05]]

Lossy Case, Side Information: From the Theory

Quantization, binning, estimation

[Figure: quantization of X^n into U^n, with the codebook binned for the side information]

Lossy Case, Side Information

[Figure: DISCUS-style code construction [Pradhan and Ramchandran ’03]]

Lossy Case, Side Information

Quantization: linear code with (n − k_q) × n parity matrix H_q, where k_q/n = I(X; U) = 1 − H(D)
Binning: nested linear code [Zamir et al ’02] with (n − k_b) × n parity matrix

H_b = [ H_q
        ΔH_b ],

where k_b/n = I(U; Y) = 1 − H(D ∗ q)

Lossy Case, Side Information

[Figure: X^n → quantization (code H_q) → X̂^n → syndrome S^{n−k_b} = H_b X̂^n → decoder; the decoder computes X̂^n = Y^n ⊕ f(S^{n−k_b} ⊕ H_b Y^n)]

(The estimation step is dealt with via time-sharing)
The fine code is good for quantization and the coarse code is good for channel coding [Zamir et al ’02]


Lossy Case, Side Information

LDGM codes alone do not provide good channel codes
Compound LDGM-LDPC codes achieve the rate-distortion function with side information with bounded degrees [Martinian and Wainwright ’06]
Polar codes achieve it as well [Korada and Urbanke ’10]

Lossy Case, Side Information

Gaussian sources with MSE distortion
Scalar quantizer followed by binning

[Figure: X^n → scalar quantizer → binning → M; quantization levels labeled with alternating bin indices]

Scalar quantization followed by vector binning is optimal at high rate [Liu et al ’06] (cf. [Ziv ’85])

Lossy Case, Side Information

Vector quantization and vector binning: nested lattice codes [Zamir et al ’02] (see also [Fleming et al ’04] [Saxena et al ’10])

[Figure: nested lattices; X^n is quantized to the nearest fine-lattice point U^n, and fine-lattice points sharing a label form a bin]

The fine code is good for quantization and the coarse code is good for channel coding [Zamir et al ’02]

Lossy Case, Side Information

DISCUS: trellis codes for quantization and binning [Pradhan and Ramchandran ’03]
TCQ for quantization and LDPC for binning [Yang et al ’09]


Strictly Causal Side Information

[Figure: X^n → Encoder → M → Decoder → X̂_i; the decoder observes Y^{i−1}, with Y the output of the memoryless channel p(y|x)]


Lower bound:

R^{WZ−C}_{X|Y}(D) ≥ min_{p(x̂|x)} I(X; X̂)
s.t. E[d(X, Y, X̂)] ≤ D

Proof: see Appendix-D
... delayed side information is not useful for memoryless sources!

Causal Side Information [Weissman and El Gamal ’06]

[Figure: as above, but the decoder observes Y^i (causal side information)]

Causal Side Information [Weissman and El Gamal ’06]

Lower bound:

R^{WZ−C}_{X|Y}(D) ≥ min_{p(u|x), f(u,y)} I(X; U)
s.t. E[d(X, Y, f(U, Y))] ≤ D

Proof: see Appendix-E

Causal Side Information [Weissman and El Gamal ’06]

Achievability via standard compression with test channel p(u|x) (... no need for binning!)
Rate-distortion function:

R^{WZ−C}_{X|Y}(D) = min_{p(u|x), f(u,y)} I(X; U)
s.t. E[d(X, Y, f(U, Y))] ≤ D

Causal Side Information [Weissman and El Gamal ’06]

DSBS(1/4) with Hamming distortion

[Plot: R^{WZ}_{X|Y}(D), R^{WZ−C}_{X|Y}(D), R_X(D), and R_{X|Y}(D) for the DSBS(1/4)]

Limited Look-Ahead Side Information [Weissman and El Gamal ’06]

[Figure: as above, but the decoder observes Y^{i+ℓ} (look-ahead of ℓ symbols)]

Limited Look-Ahead Side Information [Weissman and El Gamal ’06]

Treat the source as a super-source with n/k symbols made of successive k-blocks of X^n
Achievable rate:

R_k(D) = (1/k) min_{p(u|x^k)} I(X^k; U)
s.t. (1/(k−ℓ)) ∑_{i=1}^{k−ℓ} E[d(X_i, Y_i, f_i(U, Y^{i+ℓ}))] ≤ D

Rate-distortion function:

R^{WZ−ℓ}_{X|Y}(D) = lim_{k→∞} R_k(D)

Limited Look-Ahead Side Information [Weissman and El Gamal ’06]

[Plot: R^{WZ−ℓ}_{X|Y}(D) compared with R^{WZ}_{X|Y}(D), R_{X|Y}(D), and R_X(D)]


Common Reconstruction Constraint

In the Wyner-Ziv problem, the side information sequence plays two roles:
1. Reduce the rate required for “digital” communication between encoder and decoder via binning
2. Improve the source estimate

Common Reconstruction Constraint [Steinberg ’09]

Wyner-Ziv rate-distortion function:

R^{WZ}_{X|Y}(D) = min_{p(u|x), f: E[d(X, f(U,Y))] ≤ D} I(X; U|Y)

f(U, Y) cannot be reproduced at the encoder
This might be unacceptable, e.g., for the transmission of sensitive medical, financial, or military information

Common Reconstruction Constraint [Steinberg ’09]

[Figure: Wyner-Ziv setting, with the encoder additionally computing ψ(X^n)]

Common Reconstruction (CR) constraint: X̂^n should be reproducible by the encoder (ψ(X^n) = X̂^n w.h.p.)

Common Reconstruction Constraint [Steinberg ’09]

Rate-distortion function:

R^{WZ−CR}_{X|Y}(D) = min_{p(x̂|x): E[d(X,X̂)] ≤ D} I(X; X̂|Y)

Common Reconstruction Constraint [Steinberg ’09]

Achievability: 1. compression + 2. binning, but no estimation at the receiver
Converse: identify ψ_i with X̂_i
Dual to the causal constraint
Extension to distortion-based CR constraints [Lapidoth et al ’11]

Common Reconstruction Constraint [Steinberg ’09]

DSBS(1/4) with Hamming distortion

[Plot: R_X(D), R^{WZ−CR}_{X|Y}(D), and R^{WZ}_{X|Y}(D) for the DSBS(1/4)]


Robust Source Coding [Heegard and Berger ’85] [Kaspi ’94]

[Figure: X^n → Encoder → M; Decoder 1 with side information Y_1^n outputs X̂_1^n; Decoder 2 with side information Y_2^n outputs X̂_2^n]

Distortion constraints: (1/n) ∑_{i=1}^n E[d_1(X_i, X̂_{1i})] ≤ D_1 and (1/n) ∑_{i=1}^n E[d_2(X_i, X̂_{2i})] ≤ D_2
Applications: sensor networks with unreliable remote measurements, file sharing with unreliable remote file transfers

Robust Source Coding [Heegard and Berger ’85] [Kaspi ’94]

Assume that Y_1 is (physically/stochastically) degraded side information with respect to Y_2: p(x, y_1, y_2) = p(x, y_2)p(y_1|y_2)
Rate-distortion function:

R^{HB}_{X|Y_1 Y_2}(D_1, D_2) = min_{p(u_1,u_2|x), f_j(u_j,y_j)} I(X; U_1|Y_1) + I(X; U_2|Y_2, U_1)
s.t. E[d_j(X, f_j(U_j, Y_j))] ≤ D_j, for j = 1, 2

Robust Source Coding: Achievability: 1. Quantization

[Figure: quantization of X^n into U_1^n; total rate I(X; U_1|Y_1) + I(X; U_2|Y_2, U_1)]

Robust Source Coding: Achievability: 2. Binning

[Figure: the U_1^n codebook is binned against decoder 1's side information Y_1^n]

Since I(X; U_1|Y_1) ≥ I(X; U_1|Y_2), decoder 2 can also decode

Robust Source Coding: Achievability: 3. Successive refinement

[Figure: refinement codebook U_2^n superimposed on U_1^n]

I(X; U_2|Y_2, U_1) = I(X; U_2|U_1) − I(U_2; Y_2|U_1)

Robust Source Coding: Achievability: 4. Binning

[Figure: the U_2^n codebook is binned against decoder 2's side information Y_2^n]

Robust Source Coding: Achievability: 5. Estimate

Final estimates: symbol-by-symbol functions X̂_1 = f_1(U_1, Y_1) and X̂_2 = f_2(U_2, Y_2)

HB Problem with CR Constraint [Ahmadi et al ’13]

[Figure: X^n → Encoder (rate R); Decoder 1 with Y_1^n outputs X̂_1^n, Decoder 2 with Y_2^n outputs X̂_2^n; the encoder computes ψ_1(X^n) and ψ_2(X^n)]

CR constraints: ψ_j(X^n) = X̂_j^n for j = 1, 2 w.h.p.

HB Problem with CR Constraint: Rate-Distortion Function

If the side information Y_1 is stochastically degraded with respect to Y_2, the rate-distortion function for the HB problem with CR is given by

R^{HB−CR}_{X|Y_1 Y_2}(D_1, D_2) = min_{p(x̂_1,x̂_2|x)} I(X; X̂_1|Y_1) + I(X; X̂_2|Y_2, X̂_1),

s.t.

E[d_j(X, X̂_j)] ≤ D_j, for j = 1, 2

HB Problem with CR Constraint: Sketch of Proof of Achievability

As for conventional HB, successive refinement and binning
... but X̂_1^n and X̂_2^n are generated by the encoder (no estimation at the decoders) to satisfy the CR constraint

HB Problem with CR Constraint: Binary Source with Erased Side Information

[Figure: X^n → erasure channel p_{y_2|x} → Y_2^n → erasure channel p(y_1|y_2) → Y_1^n; X is erased with probability p_2 in Y_2 and with overall probability p_1 in Y_1]

Binary source X ~ Ber(1/2)
For the point-to-point set-up (erasure probability p) under Hamming distortion:

R^{WZ−CR}_{X|Y}(D) = p(1 − H(D)) for D ≤ 1/2, and zero otherwise
With no CR constraint: R^{WZ}_{X|Y}(D) = R_{X|Y}(D) = p(1 − H(D/p)) for D ≤ p/2, and zero otherwise [Diggavi et al ’09]
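The price of the CR constraint is easy to evaluate from these two closed forms; a short Python comparison (our illustration, with erasure probability p = 0.5 as an assumed example):

```python
import numpy as np

def H(d):
    d = np.clip(d, 1e-12, 1 - 1e-12)
    return -d*np.log2(d) - (1-d)*np.log2(1-d)

p = 0.5                          # erasure probability of the side information
for D in (0.05, 0.1, 0.2):
    r_cr = p * (1 - H(D)) if D <= 0.5 else 0.0          # with CR
    r_wz = p * (1 - H(D / p)) if D <= p / 2 else 0.0    # without CR
    print(f"D = {D:.2f}: R_CR = {r_cr:.3f}, R_WZ = {r_wz:.3f}")
```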

HB Problem with CR Constraint: Binary Source with Erased Side Information

[Plot: R(D_1, D_2) vs. D_1 for the Kaspi, HB, and HB-CR rate-distortion functions, for D_2 = 0.05 and D_2 = 0.3]

p_1 = 1 (decoder 1 has no side information), p_2 = 0.35, and two values of the distortion D_2 = 0.05, 0.3

R^{Kaspi}_{X|Y_1,Y_2}(D_1, D_2) ≤ R^{HB}_{X|Y_1 Y_2}(D_1, D_2) ≤ R^{HB−CR}_{X|Y_1,Y_2}(D_1, D_2)



Side Information “Vending Machine” [Permuter and Weissman ’11]

Wyner-Ziv problem:

[Figure: X^n → Encoder → M → Decoder → X̂^n; side information Y^n ~ p(y|x)]

Side Information “Vending Machine” [Permuter and Weissman ’11]

Generalization of the Wyner-Ziv problem in which the side information can be controlled via cost-constrained actions

[Figure: Node 1 and Node 2 querying a database server]

Applications: computer networks, sensor networks, ...

Side Information “Vending Machine” [Permuter and Weissman ’11]

[Figure: X^n → Node 1 → (rate R) → Node 2 → X̂^n; Node 2 selects actions A^n, and the side information is Y^n ~ p(y|a, x)]

The action sequence A^n(M) is cost-constrained: (1/n) ∑_{i=1}^n E[Λ(A_i)] ≤ Γ
Ex.: if A = 1, then Y = X + Z; if A = 0, then Y = φ; with (1/n) ∑_{i=1}^n E[A_i] ≤ Γ


Side Information “Vending Machine” [Permuter and Weissman ’11]

Key issue: adaptive vs. non-adaptive data acquisition
Rate-distortion-cost function:

R(D, Γ) = min_{p(a,u|x), f(u,y)} I(X; A) + I(X; U|A, Y)
subject to E[Λ(A)] ≤ Γ and E[d(X, f(U, Y))] ≤ D

The result extends to adaptive actions A_i(M, Y^{i−1}) [Choudhuri and Mitra ’12]


Side Information “Vending Machine”: Achievability: 1. Quantization

[Figure: action codebook A^n; total rate I(X; A) + I(X; U|A, Y)]

Side Information “Vending Machine”: Achievability: 2. Successive refinement

[Figure: refinement codebook U^n superimposed on A^n]

I(X; U|A, Y) = I(X; U|A) − I(U; Y|A)

Side Information “Vending Machine”: Achievability: 3. Binning

[Figure: the U^n codebook is binned against the decoder's side information Y^n]

Side Information “Vending Machine”: Achievability: 4. Estimate

Final estimate: X̂ = f(U, Y)
... adaptive data acquisition

Side Information “Vending Machine”: Comparison with Non-Adaptive Approach

[Figure: X^n → Encoder → M → Decoder → T̂^n; actions A^n select measurements Z^n ~ p(z|x, y, a)]

(X, Y) DSBS(q); T = f(X, Y) = X ⊗ Y (binary product)

Measurement Z_i ~ p(z|x, y, a) (to measure or not to measure):
Z_i = Y_i if A_i = 1, and Z_i = 1 if A_i = 0

Cost constraint: Λ(A_i) = 1 if A_i = 1, and Λ(A_i) = 0 if A_i = 0

Side Information “Vending Machine”: Comparison with Non-Adaptive Approach

[Plot: rate R (bits/source sample) vs. cost Γ for causal side information, non-causal side information, and the greedy approach, for D = 0.05 and D = 0.15]

“Greedy” approach: choose A_i independently of M
For Γ = 1, the greedy approach is clearly optimal
The gains come from the joint design of control and compression


Cascade Source Coding

[Figure: X^n → Node 1 → (R_1) → Node 2 → (R_2) → Node 3; Node 2 has side information Y^n and outputs X̂_1^n; Node 3 outputs X̂_2^n]

Fixed-length codes: M_1 ∈ {1, 2, ..., 2^{nR_1}} (nR_1 bits) and M_2 ∈ {1, 2, ..., 2^{nR_2}} (nR_2 bits)
(1/n) ∑_{i=1}^n E[d_j(X_i, Y_i, X̂_{ji})] ≤ D_j for j = 1, 2
Applications: sensor networks, computer networks

Cascade Source Coding

The problem of determining the rate-distortion region is still open
The main issue is the operation at Node 2 [Vasudevan et al ’06] [Gu and Effros ’06] [Bakshi et al ’07] [Cuff et al ’09]

Cascade Source Coding: Idea 1: Recompress [Cuff et al ’09]

R_1 ≥ I(X; U|Y)
R_2 ≥ I(Y, U; X̂_2)

[Figure: Node 2 decodes U^n using Y^n, and then recompresses (Y^n, U^n) into X̂_2^n for Node 3]


Cascade Source Coding: Idea 2: Forward [Cuff et al ’09]

R_1 ≥ I(X; U|Y)
R_2 ≥ I(X; U) + I(Y; X̂_2|U)

[Figure: Node 2 decodes U^n and forwards it to Node 3, adding a compressed description of Y^n]


Cascade Source Coding: When Forward is Optimal [Chia et al ’11]

[Figure: cascade with side information Y^n at Node 2 and degraded side information Z^n at Node 3]

If X − Y − Z forms a Markov chain, the rate-distortion region is given by the union of the sets of rate pairs (R_1, R_2) that satisfy

R_1 ≥ I(X; X̂_1, U|Y)
R_2 ≥ I(X, Y; U|Z),

where the mutual informations are evaluated with respect to the joint pmf

p(x, y, z, x̂_1, u) = p(x, y)p(z|y)p(x̂_1, u|x, y)

for some pmf p(x̂_1, u|x, y) that satisfies E[d_j(X, X̂_j)] ≤ D_j for j = 1, 2. See Appendix-F

Cascade Source Coding Problem with CR Constraint [Ahmadi et al ’13]

[Figure: X^n → Node 1 → (R_1) → Node 2 → (R_2) → Node 3; Node 2 has side information Y_1^n and outputs X̂_1^n, Node 3 has Y_2^n and outputs X̂_2^n; the encoder computes ψ_1(X^n) and ψ_2(X^n)]

Distortion constraints: (1/n) ∑_{i=1}^n E[d_j(X_i, X̂_{ji})] ≤ D_j for j = 1, 2
CR constraints: ψ_1(X^n) = X̂_1^n and ψ_2(X^n) = X̂_2^n w.h.p.

Common Reconstruction Constraint: When the CR Constraint Simplifies the Problem

Simultaneous transmission of data and state over state-dependent channels [Steinberg ’09]
Joint source-channel coding for the degraded broadcast channel [Steinberg ’09]
The multiple description problem [Tandon et al ’12]

Cascade Source Coding Problem with CR Constraint

Rate-Distortion Function (X − Y_1 − Y_2) [Ahmadi et al ’13]

If X − Y_1 − Y_2 forms a Markov chain, the rate-distortion region is given by the union of the sets of rate pairs (R_1, R_2) that satisfy

R_1 ≥ I(X; X̂_1, X̂_2|Y_1)
R_2 ≥ I(X; X̂_2|Y_2),

where the mutual informations are evaluated with respect to the joint pmf

p(x, y_1, y_2, x̂_1, x̂_2) = p(x, y_1)p(y_2|y_1)p(x̂_1, x̂_2|x)

for some p(x̂_1, x̂_2|x) that satisfies E[d_j(X, X̂_j)] ≤ D_j, for j = 1, 2. See Appendix-G


The Impact of Memory: Lossless Compression, No Side Information

Consider a stationary and ergodic source X^n
Lossless compression requires a rate equal to the entropy rate [Cover and Thomas ’06]:

H(𝒳) = lim_{n→∞} (1/n) H(X^n) = lim_{n→∞} H(X_n|X^{n−1}) ≤ H(X)

The Impact of Memory: Lossless Compression, No Side Information

For stationary Markov chains: H(𝒳) = H(X_2|X_1)
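For example, for the binary symmetric Markov chain used later (transition probability ε), H(X_2|X_1) = H(ε) is strictly smaller than the single-letter entropy H(X) = 1 bit; a quick check (our illustration):

```python
import numpy as np

def H(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-p*np.log2(p) - (1-p)*np.log2(1-p))

eps = 0.1                      # P(X_i != X_{i-1}) for the symmetric chain
entropy_rate = H(eps)          # H(X_2 | X_1) = H(eps)
print(entropy_rate, "<", 1.0)  # ~0.469 bits/symbol vs. H(X) = 1 bit
```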

The Impact of Memory: Lossless Compression, No Side Information

Achieving rate R = H(X_2|X_1):

[Figure: the encoder demultiplexes X_i according to X_{i−1}, compresses each subsequence, and multiplexes; the decoder uses the matching demux/mux structure, also driven by X_{i−1}]

The Impact of Memory: Lossy Compression, No Side Information

Rate-distortion function for stationary ergodic sources [Gallager ’68]:

R(D) = lim_{n→∞} (1/n) min_{p(x̂^n|x^n)} I(X^n; X̂^n)
s.t. (1/n) ∑_{i=1}^n E[d(X_i, X̂_i)] ≤ D

It can be evaluated for some sources, such as stationary Gaussian processes

The Impact of Memory: Lossy Compression, No Side Information

Hard to evaluate in general
For stationary Markov sources:

I(X^n; X̂^n) = ∑_{i=1}^n [ H(X_i|X_{i−1}) − H(X_i|X^{i−1}, X̂^n) ]
            ≥ ∑_{i=1}^n [ H(X_i|X_{i−1}) − H(X_i|X_{i−1}, X̂_i) ]
            = ∑_{i=1}^n I(X_i; X̂_i|X_{i−1})

This leads to the lower bound [Gray ’73]:

R(D) ≥ R_{X_2|X_1}(D) = min_{p(x̂|x_1,x_2)} I(X_2; X̂|X_1)
s.t. E[d(X_2, X̂)] ≤ D

Strictly Causal Side Information Revisited: Lossy Compression, No Side Information

In the lossless case, R(0) = H(X_2|X_1)
However, in the lossy case (D > 0), we can have R(D) > R_{X_2|X_1}(D)

Strictly Causal Side Information Revisited: Lossy Compression, No Side Information

X_i binary Markov chain with symmetric transition probabilities p(1|0) = p(0|1) = ε (ε ≤ 1/2)
Note that X_i = X_{i−1} ⊕ Z_i, where Z_i ~ Ber(ε) i.i.d. (innovation process)
From [Weissman and Merhav ’03], for 0 ≤ D ≤ ε:

R_{X_2|X_1}(D) = H(ε) − H(D) < R(D) [Gray ’70]

for D larger than a critical value

The Impact of Memory: Lossy Compression, No Side Information

Unlike the lossless case, R_{X_2|X_1}(D) = min_{p(x̂|x_1,x_2)} I(X_2; X̂|X_1) is generally not achievable... there is no known general single-letter expression

[Figure: the conditional mux/demux scheme would require X_{i−1} at the decoder]

Strictly Causal Side Information Revisited: Source Coding with Feed-Forward

Achievable if the receiver has strictly causal side information X^{i−1}

[Figure: X^n → Encoder → M → Decoder → X̂_i; the decoder also observes X^{i−1} (feed-forward)]

Studied for general sources [Weissman and Merhav ’03] [Venkataramanan and Pradhan ’07] [Naiss and Permuter ’11]

For stationary Markov sources: R^{FF}(D) = R_{X_2|X_1}(D)

Strictly Causal Side Information Revisited: Source Coding with Feed-Forward: Achievability

[Figure: the mux/demux scheme is now feasible: the encoder demultiplexes by X_{i−1}, and the decoder can do the same thanks to feed-forward]

Strictly Causal Side Information Revisited: Generalizing the Conditional Lower Bound

Generalizing to stationary and ergodic sources [Gray ’73]:

R(D) ≥ R_{X_m|X^{m−1}}(D) − (H(X_m|X^{m−1}) − H(𝒳))

Achievable for order-m stationary Markov sources with feed-forward

Strictly Causal Side Information Revisited: Source Coding with Feed-Forward

Rate-distortion function for stationary ergodic sources [Venkataramanan and Pradhan ’07] [Naiss and Permuter ’11]:

R(D) = lim_{n→∞} (1/n) min_{p(x̂^n|x^n)} I(X^n → X̂^n)
s.t. (1/n) ∑_{i=1}^n E[d(X_i, X̂_i)] ≤ D

Directed information [Massey ’90]:

I(X^n → X̂^n) = ∑_{i=1}^n I(X_i^n; X̂_i | X^{i−1}, X̂^{i−1}) = ∑_{i=1}^n I(X_i; X̂^i | X^{i−1})
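The two forms of the directed information can be checked numerically; the brute-force Python sketch below evaluates both sums for n = 2 and a random joint pmf over binary alphabets (our illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
# axes: 0 -> X1, 1 -> X2, 2 -> Xhat1, 3 -> Xhat2 (all binary)
p = rng.random((2, 2, 2, 2))
p /= p.sum()

def ent(axes):
    """Joint entropy (bits) of the variables on the given axes."""
    drop = tuple(i for i in range(p.ndim) if i not in axes)
    q = np.atleast_1d(p.sum(axis=drop)).ravel()
    q = q[q > 0]
    return float(-(q * np.log2(q)).sum())

def I(A, B, C=()):
    """Conditional mutual information I(A; B | C)."""
    A, B, C = set(A), set(B), set(C)
    return ent(A | C) + ent(B | C) - ent(A | B | C) - ent(C)

# first form: sum_i I(X_i^n; Xhat_i | X^{i-1}, Xhat^{i-1})
form1 = I({0, 1}, {2}) + I({1}, {3}, {0, 2})
# second form: sum_i I(X_i; Xhat^i | X^{i-1})
form2 = I({0}, {2}) + I({1}, {2, 3}, {0})
print(f"{form1:.6f} == {form2:.6f}")   # the two sums coincide
```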


Strictly Causal Side Information Revisited: Source Coding with Feed-Forward: Converse

Converse: easy (try!)

Strictly Causal Side Information Revisited: Source Coding with Feed-Forward: Achievability

[Figure: codetrees]

Compare with codetrees under general feed-forward (the encoder does not know X̂^n) [Simeone ’13]
Related results for Hidden Markov Models [Simeone and Permuter ’13]

Conclusions

Further issues:
  Vector sources [Gastpar et al ’06] [Yu and Sharma ’11]
  Fixed-to-variable codes [Alon and Orlitsky ’96] [Koulgi et al ’03] [Zhao and Effros ’03] [Grangetto et al ’09]
  Rateless codes [Draper ’04] [Eckford and Yu ’05] [Caire et al ’05]
  Universal coding [Merhav and Ziv ’06] [Jalali et al ’10]
  Computational complexity [Gupta et al ’08]
  Real-time operation [Teneketzis ’06] [Kaspi and Merhav ’12]
  Source-channel coding [Wernersson et al ’09] [Akyol et al ’10]
  Interactive coding [Orlitsky ’92] [Ma and Ishwar ’11]
  Quantum source coding [Yard and Devetak ’09]
  Distributed source coding [El Gamal and Kim ’11]

References

[Aaron et al ’02] A. Aaron, S. Rane, E. Setton, and B. Girod, "Transform-domain Wyner-Ziv codec for video," in SPIE Visual Communications and Image Processing Conf., San Jose, CA, 2004.
[Ahmadi et al ’13] B. Ahmadi, R. Tandon, O. Simeone, and H. V. Poor, "Heegard-Berger and cascade source coding problems with common reconstruction constraints," IEEE Trans. Inform. Theory, March 2013.
[Akyol et al ’10] E. Akyol, K. Rose, and T. Ramstad, "Optimal mappings for joint source channel coding," in Proc. IEEE Information Theory Workshop (ITW), 2010.
[Alon and Orlitsky ’96] N. Alon and A. Orlitsky, "Source coding and graph entropies," IEEE Trans. Inform. Theory, vol. 42, no. 5, pp. 1329-1339, 1996.
[Ancheta ’76] T. Ancheta Jr., "Syndrome-source-coding and its universal generalization," IEEE Trans. Inform. Theory, 1976.
[Arıkan ’09] E. Arıkan, "Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels," IEEE Trans. Inform. Theory, Jul. 2009.

[Bakshi et al ’07] M. Bakshi, M. Effros, W. Gu, and R. Koetter, "On network coding of independent and dependent sources in line networks," in Proc. IEEE Int. Symp. Information Theory (ISIT), pp. 1096-1100, 2007.
[Berger ’71] T. Berger, Rate-Distortion Theory, John Wiley & Sons, 1971.
[Caire et al ’03] G. Caire, S. Shamai, and S. Verdú, "Lossless data compression with error correcting codes," in Proc. IEEE Int. Symp. Information Theory (ISIT), p. 22, 2003.
[Caire et al ’05] G. Caire, S. Shamai, A. Shokrollahi, and S. Verdú, "Fountain codes for lossless data compression," DIMACS Series in Discrete Mathematics and Theoretical Computer Science, vol. 68, pp. 1-18, 2005.
[Chern et al ’12] B. Chern, I. Ochoa, A. Manolakos, A. No, K. Venkat, and T. Weissman, "Reference based genome compression," arXiv preprint arXiv:1204.1912, 2012.

[Choudhuri and Mitra ’12] C. Choudhuri and U. Mitra, "How useful is adaptive action?," in Proc. IEEE Globecom, 2012.
[Chia et al ’11] Y.-K. Chia, H. H. Permuter, and T. Weissman, "Cascade, triangular, and two-way source coding with degraded side information at the second user," IEEE Trans. Inform. Theory, vol. 58, no. 1, pp. 189-206, 2012.
[Cover and El Gamal ’79] T. Cover and A. El Gamal, "Capacity theorems for the relay channel," IEEE Trans. Inform. Theory, vol. 25, no. 5, pp. 572-584, 1979.
[Cover and Thomas ’06] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley-Interscience, 2006.
[Csiszár and Körner ’11] I. Csiszár and J. Körner, Information Theory: Coding Theorems for Discrete Memoryless Systems, Cambridge University Press, 2011.
[Cuff et al ’09] P. Cuff, H.-I. Su, and A. El Gamal, "Cascade multiterminal source coding," in Proc. IEEE Int. Symp. Information Theory (ISIT), pp. 1199-1203, 2009.

O. Simeone Source Coding With Side Information 162 / 219 References
[Diggavi et al ’09] Diggavi, S., E. Perron, and Emre Telatar. "Lossy source coding with Gaussian or erased side-information." In Information Theory, 2009. ISIT 2009. IEEE International Symposium on, pp. 1035-1039.
[Dimakis et al ’07] Dimakis, A. G., M. J. Wainwright, and K. Ramchandran. "Lower bounds on the rate-distortion function of LDGM codes." In Information Theory Workshop, 2007. ITW ’07. IEEE, pp. 650-655. IEEE, 2007.
[Dimakis et al ’09] Dimakis, Alexandros G., and Pascal O. Vontobel. "LP decoding meets LP decoding: a connection between channel coding and compressed sensing." In Communication, Control, and Computing, 2009. Allerton 2009. 47th Annual Allerton Conference on, pp. 8-15. IEEE, 2009.
[Doshi et al ’10] Doshi, Vishal, Devavrat Shah, Muriel Médard, and Michelle Effros. "Functional compression through graph coloring." Information Theory, IEEE Transactions on 56, no. 8 (2010): 3901-3917.
[Draper ’04] Draper, Stark C. "Universal incremental Slepian-Wolf coding." In Proc. 42nd Allerton Conf. on Communication, Control and Computing, pp. 1332-1341. 2004.

O. Simeone Source Coding With Side Information 163 / 219 References

[Draper and Wornell ’04] Draper, Stark C., and Gregory W. Wornell. "Side information aware coding strategies for sensor networks." Selected Areas in Communications, IEEE Journal on 22, no. 6 (2004): 966-976.
[Dupuis et al ’04] Dupuis, Frédéric, Wei Yu, and Frans M. J. Willems. "Blahut-Arimoto algorithms for computing channel capacity and rate-distortion with side information." In Information Theory, 2004. ISIT 2004. Proceedings. International Symposium on, p. 179. IEEE, 2004.
[Eckford and Yu ’05] Eckford, Andrew W., and Wei Yu. "Rateless Slepian-Wolf codes." In Proc. Asilomar Conference on Signals, Systems and Computers, pp. 1757-1761. 2005.
[El Gamal and Kim ’11] A. El Gamal and Y. Kim, Network Information Theory, Cambridge University Press, Dec. 2011.
[Feng et al ’04] Feng, Hanying, Michelle Effros, and Serap Savari. "Functional source coding for networks with receiver side information." In Proceedings of the Allerton Conference on Communication, Control, and Computing, pp. 1419-1427. 2004.

O. Simeone Source Coding With Side Information 164 / 219 References

[Fleming et al ’04] Fleming, Michael, Qian Zhao, and Michelle Effros. "Network vector quantization." Information Theory, IEEE Transactions on 50, no. 8 (2004): 1584-1604.
[Gallager ’68] Gallager, Robert G. Information Theory and Reliable Communication. Wiley, 1968.
[Gastpar et al ’06] Gastpar, Michael, Pier Luigi Dragotti, and Martin Vetterli. "The distributed Karhunen–Loève transform." Information Theory, IEEE Transactions on 52, no. 12 (2006): 5177-5196.
[Gersho and Gray ’92] Gersho, Allen, and Robert M. Gray. Vector Quantization and Signal Compression. Vol. 159. Springer, 1992.
[Grangetto et al ’09] Grangetto, Marco, Enrico Magli, and Gabriella Olmo. "Distributed arithmetic coding for the Slepian–Wolf problem." Signal Processing, IEEE Transactions on 57, no. 6 (2009): 2245-2257.
[Gray ’72] R. M. Gray, "Conditional rate-distortion theory," Stanford Univ., Stanford, CA, Electronics Laboratories Tech. Rep. 6502-2, Oct. 1972.

O. Simeone Source Coding With Side Information 165 / 219 References

[Gray ’73] Gray, R. "A new class of lower bounds to information rates of stationary sources via conditional rate-distortion functions." Information Theory, IEEE Transactions on 19, no. 4 (1973): 480-489.
[Gu and Effros ’06] Gu, Wei-Hsin, and Michelle Effros. "Source coding for a simple multi-hop network." In Information Theory, 2005. ISIT 2005. Proceedings. International Symposium on, pp. 2335-2339. IEEE, 2005.
[Gupta et al ’08] Gupta, Ankit, Sergio Verdu, and Tsachy Weissman. "Rate-distortion in near-linear time." In Information Theory, 2008. ISIT 2008. IEEE International Symposium on, pp. 847-851. IEEE, 2008.
[Gupta and Verdú ’09] Gupta, Ankit, and Sergio Verdú. "Nonlinear sparse-graph codes for lossy compression." Information Theory, IEEE Transactions on 55, no. 5 (2009): 1961-1975.
[Gupta and Verdú ’11] Gupta, Ankit, and Sergio Verdú. "Operational duality between lossy compression and channel coding." Information Theory, IEEE Transactions on 57, no. 6 (2011): 3171-3179.

O. Simeone Source Coding With Side Information 166 / 219 References

[Heegard and Berger ’85] C. Heegard and T. Berger, "Rate distortion when side information may be absent," IEEE Trans. Inf. Theory, vol. 31, no. 6, pp. 727-734, Nov. 1985.
[Jalali et al ’10] Jalali, Shirin, Sergio Verdú, and Tsachy Weissman. "A Universal Scheme for Wyner–Ziv Coding of Discrete Sources." Information Theory, IEEE Transactions on 56, no. 4 (2010): 1737-1750.
[Kanlis et al ’96] A. Kanlis, S. Khudanpur, and P. Narayan. "Typicality of a good rate-distortion code." Problems of Information Transmission (Problemy Peredachi Informatsii), January 1996. Special issue in honor of M. S. Pinsker.
[Kaspi ’94] A. Kaspi, "Rate-distortion when side-information may be present at the decoder," IEEE Trans. Inf. Theory, vol. 40, no. 6, pp. 2031–2034, Nov. 1994.
[Kaspi and Merhav ’12] Kaspi, Yonatan, and Neri Merhav. "On real-time and causal secure source coding." In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pp. 353-357. IEEE, 2012.

O. Simeone Source Coding With Side Information 167 / 219 References

[Koller and Friedman ’09] Koller, Daphne, and Nir Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[Korada and Urbanke ’10] Korada, Satish Babu, and Rüdiger L. Urbanke. "Polar codes are optimal for lossy source coding." Information Theory, IEEE Transactions on 56, no. 4 (2010): 1751-1768.
[Koulgi et al ’03] Koulgi, Prashant, Ertem Tuncel, Shankar L. Regunathan, and Kenneth Rose. "On zero-error source coding with decoder side information." Information Theory, IEEE Transactions on 49, no. 1 (2003): 99-111.
[Kramer ’08] Kramer, Gerhard. Topics in Multi-User Information Theory. Vol. 4. Now Publishers, 2008.
[Kudekar and Urbanke ’08] Kudekar, Shrinivas, and Ruediger Urbanke. "Lower bounds on the rate-distortion function of individual LDGM codes." In Turbo Codes and Related Topics, 2008 5th International Symposium on, pp. 379-384. IEEE, 2008.

O. Simeone Source Coding With Side Information 168 / 219 References

[Liu et al ’06] Liu, Zhixin, Samuel Cheng, Angelos D. Liveris, and Zixiang Xiong. "Slepian-Wolf coded nested lattice quantization for Wyner-Ziv coding: High-rate performance analysis and code design." Information Theory, IEEE Transactions on 52, no. 10 (2006): 4358-4379.
[Ma and Ishwar ’11] Ma, Nan, and Prakash Ishwar. "Some results on distributed source coding for interactive function computation." Information Theory, IEEE Transactions on 57, no. 9 (2011): 6180-6195.
[MacKay ’99] MacKay, David J. C. "Good error-correcting codes based on very sparse matrices." Information Theory, IEEE Transactions on 45, no. 2 (1999): 399-431.
[MacKay ’03] MacKay, David J. C. Information Theory, Inference and Learning Algorithms. Cambridge University Press, 2003.
[Martinian and Wainwright ’06] Martinian, Emin, and Martin Wainwright. "Low density codes achieve the rate-distortion bound." In Data Compression Conference, 2006. DCC 2006. Proceedings, pp. 153-162. IEEE, 2006.

O. Simeone Source Coding With Side Information 169 / 219 References

[Martinian and Yedidia ’03] Martinian, Emin, and Jonathan S. Yedidia. "Iterative quantization using codes on graphs." arXiv preprint cs/0408008 (2004).
[Matsunaga and Yamamoto ’03] Y. Matsunaga and H. Yamamoto, "Coding Theorem for Lossy Data Compression by LDPC Codes," IEEE Trans. Inform. Theory, Sept. 2003.
[Massey ’78] J. L. Massey, "Joint source channel coding," in Communication Systems and Random Process Theory, 1978.
[Massey ’90] Massey, J. "Causality, feedback and directed information." In Proc. Int. Symp. Information Theory Application, pp. 303-305. 1990.
[Merhav and Ziv ’06] Merhav, Neri, and Jacob Ziv. "On the Wyner-Ziv problem for individual sequences." IEEE Trans. Inf. Theory, 2006.
[Naiss and Permuter ’11] I. Naiss and H. Permuter, "Computable bounds for rate distortion with feed-forward for stationary and ergodic sources," arXiv:1106.0895v1.

O. Simeone Source Coding With Side Information 170 / 219 References

[Orlitsky ’91] Orlitsky, Alon. "Interactive communication: Balanced distributions, correlated files, and average-case complexity." In Foundations of Computer Science, 1991. Proceedings., 32nd Annual Symposium on, pp. 228-238. IEEE, 1991.
[Orlitsky ’92] Orlitsky, Alon. "Average-case interactive communication." Information Theory, IEEE Transactions on 38, no. 5 (1992): 1534-1547.
[Orlitsky and Roche ’01] Orlitsky, Alon, and James R. Roche. "Coding for computing." Information Theory, IEEE Transactions on 47, no. 3 (2001): 903-917.
[Orlitsky and Viswanathan ’03] Orlitsky, Alon, and Krishnamurthy Viswanathan. "One-way communication and error-correcting codes." Information Theory, IEEE Transactions on 49, no. 7 (2003): 1781-1788.
[Permuter and Weissman ’11] H. Permuter and T. Weissman, "Source coding with a side information 'vending machine'," IEEE Trans. Inf. Theory, vol. 57, pp. 4530–4544, Jul. 2011.

O. Simeone Source Coding With Side Information 171 / 219 References

[Pradhan et al ’03] Pradhan, S. Sandeep, Jim Chou, and Kannan Ramchandran. "Duality between source coding and channel coding and its extension to the side information case." Information Theory, IEEE Transactions on 49, no. 5 (2003): 1181-1203.
[Pradhan and Ramchandran ’03] Pradhan, S. Sandeep, and Kannan Ramchandran. "Distributed source coding using syndromes (DISCUS): Design and construction." Information Theory, IEEE Transactions on 49, no. 3 (2003): 626-643.
[Sanderovich et al ’09] Sanderovich, Amichai, Oren Somekh, H. Vincent Poor, and Shlomo Shamai. "Uplink macro diversity of limited backhaul cellular network." Information Theory, IEEE Transactions on 55, no. 8 (2009): 3457-3478.
[Saxena et al ’10] Saxena, Ankur, Jayanth Nayak, and Kenneth Rose. "Robust distributed source coder design by deterministic annealing." Signal Processing, IEEE Transactions on 58, no. 2 (2010): 859-868.

O. Simeone Source Coding With Side Information 172 / 219 References

[Shamai et al ’98] Shamai, Shlomo, Sergio Verdú, and Ram Zamir. "Systematic lossy source/channel coding." Information Theory, IEEE Transactions on 44, no. 2 (1998): 564-579.
[Simeone ’13] Simeone, Osvaldo. "Source Coding with in-Block Memory and Causally Controllable Side Information." arXiv preprint arXiv:1301.0926 (2013).
[Simeone and Permuter ’13] Simeone, Osvaldo, and Haim H. Permuter. "Source coding when the side information may be delayed." To appear in IEEE Trans. Inform. Theory, 2013.
[Slepian and Wolf ’73] Slepian, David, and Jack K. Wolf. "Noiseless coding of correlated information sources." Information Theory, IEEE Transactions on 19, no. 4 (1973): 471-480.
[Steinberg ’09] Y. Steinberg, "Coding and common reconstruction," IEEE Trans. Inform. Theory, vol. 55, no. 11, 2009.

O. Simeone Source Coding With Side Information 173 / 219 References
[Suel and Memon ’02] T. Suel and N. Memon. "Algorithms for delta compression and remote file synchronization." In K. Sayood, editor, Lossless Compression Handbook. Academic Press, 2002.
[Sun et al ’10] Sun, Zhibin, Mingkai Shao, Jun Chen, Kon Wong, and Xiaolin Wu. "Achieving the rate-distortion bound with low-density generator matrix codes." Communications, IEEE Transactions on 58, no. 6 (2010): 1643-1653.
[Tandon et al ’11] R. Tandon, S. Mohajer, and H. V. Poor, "Cascade source coding with erased side information," in Proc. IEEE Symp. Inform. Theory, St. Petersburg, Russia, Aug. 2011.
[Tandon et al ’12] R. Tandon, B. Ahmadi, O. Simeone, and H. V. Poor, "Gaussian Multiple Descriptions with Common and Constrained Reconstruction Constraints," in Proc. IEEE International Symposium on Information Theory (ISIT 2012), Cambridge, MA, USA, July 1-6, 2012.
[Teneketzis ’06] Teneketzis, Demosthenis. "On the structure of optimal real-time encoders and decoders in noisy communication." Information Theory, IEEE Transactions on 52, no. 9 (2006): 4017-4035.

O. Simeone Source Coding With Side Information 174 / 219 References

[Vaezi and Labeau ’12] Vaezi, Mojtaba, and Fabrice Labeau. "Distributed lossy source coding using real-number codes." In Vehicular Technology Conference (VTC Fall), 2012 IEEE, pp. 1-5. IEEE, 2012.
[Vasudevan et al ’06] D. Vasudevan, C. Tian, and S. N. Diggavi, "Lossy source coding for a cascade communication system with side informations," in Proc. Allerton, Sept. 2006.
[Venkataramanan and Pradhan ’07] R. Venkataramanan and S. S. Pradhan, "Source coding with feed-forward: Rate-distortion theorems and error exponents for a general source," IEEE Trans. Inform. Theory, vol. 53, no. 6, pp. 2154-2179, Jun. 2007.
[Wainwright et al ’10] Wainwright, Martin J., Elitza Maneva, and Emin Martinian. "Lossy source compression using low-density generator matrix codes: Analysis and algorithms." Information Theory, IEEE Transactions on 56, no. 3 (2010): 1351-1368.

O. Simeone Source Coding With Side Information 175 / 219 References
[Wainwright and Maneva ’05] M. J. Wainwright and E. Maneva, "Lossy source coding via message-passing and decimation over generalized codewords of LDGM codes," IEEE International Symp. Inf. Theory, Adelaide, Australia, Sept. 2005.
[Wainwright and Martinian ’09] Wainwright, Martin J., and Emin Martinian. "Low-density graph codes that are optimal for binning and coding with side information." Information Theory, IEEE Transactions on 55, no. 3 (2009): 1061-1079.
[Weissman and El Gamal ’06] T. Weissman and A. El Gamal, "Source coding with limited-look-ahead side information at the decoder," IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5218–5239, Dec. 2006.
[Weissman and Merhav ’03] T. Weissman and N. Merhav, "On competitive prediction and its relation to rate-distortion theory," IEEE Trans. Inform. Theory, vol. 49, no. 12, pp. 3185-3194, Dec. 2003.

O. Simeone Source Coding With Side Information 176 / 219 References

[Weissman and Ordentlich ’05] Weissman, Tsachy, and Erik Ordentlich. "The empirical distribution of rate-constrained source codes." Information Theory, IEEE Transactions on 51, no. 11 (2005): 3718-3733.
[Weissman and Verdú ’08] Verdú, Sergio, and Tsachy Weissman. "The information lost in erasures." Information Theory, IEEE Transactions on 54, no. 11 (2008): 5030-5058.
[Wernersson et al ’09] Wernersson, Niklas, Johannes Karlsson, and Mikael Skoglund. "Distributed quantization over noisy channels." Communications, IEEE Transactions on 57, no. 6 (2009): 1693-1700.
[Witsenhausen ’76] Witsenhausen, H. "The zero-error side information problem and chromatic numbers (corresp.)." Information Theory, IEEE Transactions on 22, no. 5 (1976): 592-593.
[Wyner ’74] Wyner, Aaron. "Recent results in the Shannon theory." Information Theory, IEEE Transactions on 20, no. 1 (1974): 2-10.

O. Simeone Source Coding With Side Information 177 / 219 References

[Wyner ’78] Wyner, Aaron D. "The rate-distortion function for source coding with side information at the decoder-II: General sources." Information and Control 38, no. 1 (1978): 60-80.
[Wyner and Ziv ’76] Wyner, Aaron, and Jacob Ziv. "The rate-distortion function for source coding with side information at the decoder." Information Theory, IEEE Transactions on 22, no. 1 (1976): 1-10.
[Yamamoto ’82] Yamamoto, Hirosuke. "Wyner-Ziv theory for a general function of the correlated sources (Corresp.)." Information Theory, IEEE Transactions on 28, no. 5 (1982): 803-807.
[Yang et al ’09] Yang, Yang, Samuel Cheng, Zixiang Xiong, and Wei Zhao. "Wyner-Ziv coding based on TCQ and LDPC codes." Communications, IEEE Transactions on 57, no. 2 (2009): 376-387.
[Yard and Devetak ’09] Yard, Jon T., and Igor Devetak. "Optimal quantum source coding with quantum side information at the encoder and decoder." Information Theory, IEEE Transactions on 55, no. 11 (2009): 5339-5351.

O. Simeone Source Coding With Side Information 178 / 219 References

[Yu and Sharma ’11] Yu, Chao, and Gaurav Sharma. "Distributed estimation and coding: A sequential framework based on a side-informed decomposition." Signal Processing, IEEE Transactions on 59, no. 2 (2011): 759-773.
[Zamir et al ’02] Zamir, Ram, Shlomo Shamai, and Uri Erez. "Nested linear/lattice codes for structured multiterminal binning." Information Theory, IEEE Transactions on 48, no. 6 (2002): 1250-1276.
[Zhao and Effros ’03] Zhao, Qian, and Michelle Effros. "Lossless and near-lossless source coding for multiple access networks." Information Theory, IEEE Transactions on 49, no. 1 (2003): 112-128.
[Ziv ’85] Ziv, Jacob. "On universal quantization." Information Theory, IEEE Transactions on 31, no. 3 (1985): 344-347.

O. Simeone Source Coding With Side Information 179 / 219 Appendix-A

O. Simeone Source Coding With Side Information 180 / 219 Conditional Rate-Distortion Theory Converse

[Figure: X^n → Encoder → M → Decoder → X̂^n, with side information Y^n, generated by the memoryless channel p(y|x), available at both the encoder and the decoder.]

Factorization of the joint pmf:
p(x^n, y^n, m, x̂^n) = ∏_{i=1}^n p(x_i) p(y_i|x_i) · 1(m|x^n, y^n) · 1(x̂^n|m, y^n)
From data processing inequality considerations (note that X̂^n is a function of (M, Y^n)):
nR ≥ H(M) ≥ H(M|Y^n) = H(M|Y^n) − H(M|X^n, Y^n) = I(X^n; M, X̂^n|Y^n) ≥ I(X^n; X̂^n|Y^n)

O. Simeone Source Coding With Side Information 181 / 219 Conditional Rate-Distortion Theory Converse

nR ≥ I(X^n; X̂^n|Y^n)
   = H(X^n|Y^n) − H(X^n|Y^n, X̂^n)
   = Σ_{i=1}^n [H(X_i|Y_i) − H(X_i|X^{i−1}, Y^n, X̂^n)]
   ≥ Σ_{i=1}^n [H(X_i|Y_i) − H(X_i|Y_i, X̂_i)]
   = Σ_{i=1}^n I(X_i; X̂_i|Y_i)
Using convexity properties of the rate-distortion function (see, e.g., [El Gamal and Kim ’11]), we get the lower bound:
R_{X|Y}(D) ≥ min_{p(x̂|x,y)} I(X; X̂|Y)   s.t.   E[d(X, Y, X̂)] ≤ D
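As a sanity check on this bound, the following minimal sketch (my own addition, not from the original slides) evaluates R_{X|Y}(D) numerically in the spirit of the Blahut-Arimoto algorithms of [Dupuis et al ’04]: for a fixed Lagrange multiplier s the minimization over p(x̂|x, y) decouples across side-information symbols y, so one standard Blahut-Arimoto run per y with a common s yields a point (D, R) on the curve. The binary example (X ~ Bern(1/2), Y the output of a BSC with assumed crossover p = 0.1, Hamming distortion) is illustrative only; for it the printed points should match the closed form h(p) − h(D).

import numpy as np

def h2(t):
    # binary entropy in bits
    return -t * np.log2(t) - (1 - t) * np.log2(1 - t)

def ba_fixed_slope(px, dist, s, iters=1000):
    # standard Blahut-Arimoto run at Lagrange multiplier s; returns (rate, distortion)
    q = np.full(dist.shape[1], 1.0 / dist.shape[1])     # output marginal q(xhat)
    for _ in range(iters):
        r = q[None, :] * np.exp(-s * dist)              # unnormalized p(xhat|x)
        r /= r.sum(axis=1, keepdims=True)
        q = px @ r                                      # marginal induced by r
    rate = float(np.sum(px[:, None] * r * np.log2(r / q[None, :])))
    distortion = float(np.sum(px[:, None] * r * dist))
    return rate, distortion

def conditional_rd_point(pxy, dist, s):
    # for fixed s the problem decouples: one Blahut-Arimoto run per value of y
    py = pxy.sum(axis=0)
    R = D = 0.0
    for y in range(pxy.shape[1]):
        Ry, Dy = ba_fixed_slope(pxy[:, y] / py[y], dist[:, y, :], s)
        R += py[y] * Ry
        D += py[y] * Dy
    return R, D

p = 0.1                                                 # assumed BSC crossover
pxy = 0.5 * np.array([[1 - p, p], [p, 1 - p]])          # pxy[x, y], X ~ Bern(1/2)
dist = np.zeros((2, 2, 2))                              # dist[x, y, xhat] = 1(x != xhat)
for x in (0, 1):
    dist[x, :, 1 - x] = 1.0

for s in (2.5, 3.0, 4.0, 6.0):
    R, D = conditional_rd_point(pxy, dist, s)
    print(f"s={s}: D={D:.4f}  R={R:.4f} bits  [h(p)-h(D)={h2(p)-h2(D):.4f}]")

O. Simeone Source Coding With Side Information 182 / 219 Appendix-B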

O. Simeone Source Coding With Side Information 183 / 219 Wyner-Ziv Problem Converse

Factorization of the joint pmf:
p(x^n, y^n, m, x̂^n) = ∏_{i=1}^n p(x_i) p(y_i|x_i) · 1(m|x^n) · 1(x̂^n|m, y^n)

[Figure: Bayesian network of this factorization, with edges X_i → Y_i, (X_1, ..., X_n) → M, and (M, Y^n) → X̂_i.]

See [Kramer ’08] and [Koller and Friedman ’09]
O. Simeone Source Coding With Side Information 184 / 219 Wyner-Ziv Problem Converse

Factorization of the joint pmf:
p(x^n, y^n, m, x̂^n) = ∏_{i=1}^n p(x_i) p(y_i|x_i) · 1(m|x^n) · 1(x̂^n|m, y^n)

[Figure: the same Bayesian network, grouped around time i: (X^{i−1}, Y^{i−1}, X̂^{i−1}), (X_i, Y_i, X̂_i), (X^n_{i+1}, Y^n_{i+1}, X̂^n_{i+1}).]

O. Simeone Source Coding With Side Information 185 / 219 Wyner-Ziv Problem Converse

nR ≥ H(M) ≥ H(M|Y^n)
   = H(M|Y^n) − H(M|X^n, Y^n)
   = I(X^n; M|Y^n)
   = H(X^n|Y^n) − H(X^n|Y^n, M)
   = Σ_{i=1}^n [H(X_i|Y_i) − H(X_i|X^{i−1}, Y^n, M)]
   = Σ_{i=1}^n [H(X_i|Y_i) − H(X_i|X^{i−1}, Y^{n∖i}, M, Y_i)]   (define U_i ≜ (X^{i−1}, Y^{n∖i}, M))
   = Σ_{i=1}^n I(X_i; U_i|Y_i)
where Y^{n∖i} = (Y^{i−1}, Y^n_{i+1}) denotes all of Y^n except Y_i.

O. Simeone Source Coding With Side Information 186 / 219 Wyner-Ziv Problem Converse

[Figure: Bayesian network around time i, as above.]

U_i = (X^{i−1}, Y^{n∖i}, M) satisfies U_i − X_i − Y_i, and hence p(u_i, x_i, y_i) = p(x_i, y_i) p(u_i|x_i)
X̂_i = f(M, Y^n) = f(U_i, Y_i)
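The resulting single-letter quantity can be explored numerically. The sketch below (my own addition, not from the slides) brute-forces the minimization of I(X; U|Y) over p(u|x) and over all decoding functions f, restricted to |U| = 2, for an assumed doubly symmetric binary example (X ~ Bern(1/2), Y = X ⊕ Bern(0.1), Hamming distortion). It uses the identity I(X; U|Y) = I(X; U) − I(U; Y), which holds under U − X − Y, and it returns achievable values, i.e., upper bounds on the true minimum (time sharing, which convexifies the region, is not included).

import itertools
import numpy as np

p = 0.1                                             # assumed BSC crossover
pxy = 0.5 * np.array([[1 - p, p], [p, 1 - p]])      # pxy[x, y], X ~ Bern(1/2)
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

def mi(pa, pb_a):
    # I(A; B) in bits from the marginal p(a) and the channel p(b|a)
    pb = pa @ pb_a
    return sum(pa[a] * pb_a[a, b] * np.log2(pb_a[a, b] / pb[b])
               for a in range(pb_a.shape[0]) for b in range(pb_a.shape[1])
               if pb_a[a, b] > 0)

def wz_rate(pu_x):
    # I(X; U|Y) = I(X; U) - I(U; Y), valid since U - X - Y
    px_y = pxy / py[None, :]                        # px_y[x, y] = p(x|y)
    pu_y = px_y.T @ pu_x                            # pu_y[y, u] = sum_x p(x|y) p(u|x)
    return mi(px, pu_x) - mi(py, pu_y)

def distortion(pu_x, f):
    # E[1(X != f(U, Y))]; the chain U - X - Y holds by construction
    return sum(pxy[x, y] * pu_x[x, u] * (x != f[u][y])
               for x in (0, 1) for y in (0, 1) for u in (0, 1))

funcs = [((a, b), (c, d)) for a, b, c, d in itertools.product((0, 1), repeat=4)]
grid = np.linspace(0.0, 1.0, 51)
for D in (0.02, 0.05, 0.08):
    best = float("inf")
    for a, b in itertools.product(grid, grid):      # a = p(u=0|x=0), b = p(u=0|x=1)
        pu_x = np.array([[a, 1 - a], [b, 1 - b]])
        r = wz_rate(pu_x)
        if r < best and any(distortion(pu_x, f) <= D for f in funcs):
            best = r
    print(f"D={D:.2f}: achievable I(X;U|Y) with |U| = 2: {best:.3f} bits")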

O. Simeone Source Coding With Side Information 187 / 219 Wyner-Ziv Problem Converse

d-separation test for conditional independence [Kramer ’08]

[Figure: Bayesian network around time i, as above.]

U_i = (X^{i−1}, Y^{n∖i}, M) satisfies U_i − X_i − Y_i

O. Simeone Source Coding With Side Information 188 / 219 Wyner-Ziv Problem Converse

d-separation test for conditional independence [Kramer ’08]: 1. Include only edges and vertices moving backward from the involved vertices

[Figure: subgraph after step 1, containing M and the vertices X^{i−1}, Y^{i−1}, X_i, Y_i, X^n_{i+1}, Y^n_{i+1}.]

O. Simeone Source Coding With Side Information 189 / 219 Wyner-Ziv Problem Converse

d-separation test for conditional independence [Kramer ’08]:

2. Remove all edges coming out of the conditioning variable Xi

[Figure: the same subgraph with the edges leaving X_i (to Y_i and to M) removed.]

O. Simeone Source Coding With Side Information 190 / 219 Wyner-Ziv Problem Converse d-separation test for conditional independence [Kramer ’08]: 3. Make edges undirected

[Figure: the resulting undirected graph.]

If no path between Ui and Yi in the resulting undirected graph, then we have Ui − Xi − Yi
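The three steps translate directly into code. The sketch below is my own transcription of the test (with "the conditioning variable" generalized to a set of conditioned vertices, as in [Kramer ’08]); the node names and the instance n = 3, i = 2 are choices made to match the figures. It verifies the chain U_i − X_i − Y_i above, and also the chain X_i − (X^{i−1}, M) − (Y^{i−1}, M) used in Appendix D (for which the same parent lists suffice, since the X̂ vertices are never reached moving backward from the involved vertices).

from collections import deque

def separated(parents, A, B, C):
    # Step 1: keep only vertices encountered moving backward from A, B, C.
    keep = set(A) | set(B) | set(C)
    stack = list(keep)
    while stack:
        v = stack.pop()
        for par in parents.get(v, ()):
            if par not in keep:
                keep.add(par)
                stack.append(par)
    # Step 2: remove edges coming out of any conditioned vertex; Step 3: undirect.
    adj = {v: set() for v in keep}
    for v in keep:
        for par in parents.get(v, ()):
            if par in keep and par not in C:
                adj[v].add(par)
                adj[par].add(v)
    # Conclusion: the chain A - C - B holds iff no path connects A and B.
    seen, queue = set(A), deque(A)
    while queue:
        v = queue.popleft()
        if v in B:
            return False
        for w in adj[v] - seen:
            seen.add(w)
            queue.append(w)
    return True

# Graph of the factorization for n = 3: Y_k <- X_k, M <- X^n, Xhat_k <- (M, Y^n).
parents = {"Y1": ["X1"], "Y2": ["X2"], "Y3": ["X3"],
           "M": ["X1", "X2", "X3"],
           "Xh1": ["M", "Y1", "Y2", "Y3"],
           "Xh2": ["M", "Y1", "Y2", "Y3"],
           "Xh3": ["M", "Y1", "Y2", "Y3"]}

# U_2 = (X^1, Y^{3 minus 2}, M) vs Y_2, conditioned on X_2: U_2 - X_2 - Y_2.
print(separated(parents, {"X1", "Y1", "Y3", "M"}, {"Y2"}, {"X2"}))   # True
# The chain X_3 - (X^2, M) - Y^2 used in Appendix D.
print(separated(parents, {"X3"}, {"Y1", "Y2"}, {"X1", "X2", "M"}))   # True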

O. Simeone Source Coding With Side Information 191 / 219 Appendix C

O. Simeone Source Coding With Side Information 192 / 219 LDPC Codes and Compression

[Figure from [Martinian and Yedidia ’03]: iterative quantization using codes on graphs.]

O. Simeone Source Coding With Side Information 193 / 219 LDPC Codes and Compression

[Figure from [Martinian and Yedidia ’03], continued.]
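Since the figures are not reproduced here, the following self-contained toy (my own sketch) illustrates the underlying syndrome/binning idea in the spirit of DISCUS [Pradhan and Ramchandran ’03]: the encoder transmits only the syndrome s = Hx (mod 2), and the decoder outputs the member of the coset {x : Hx = s} closest to the side information y. With the Hamming(7,4) parity-check matrix assumed below, 3 bits per block of 7 suffice whenever x and y differ in at most one position; practical LDPC/LDGM schemes replace the brute-force search with iterative message passing over a sparse graph.

import itertools
import numpy as np

# Parity-check matrix of the Hamming(7,4) code (column j is binary j, minimum distance 3).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def encode(x):
    # Encoder: send nR = 3 bits (the syndrome) instead of n = 7.
    return H @ x % 2

def decode(s, y):
    # Decoder: the member of the coset {x : Hx = s (mod 2)} closest to y.
    # Brute force over all 2^7 sequences; real schemes use message passing.
    best, best_dist = None, 8
    for bits in itertools.product((0, 1), repeat=7):
        x = np.array(bits)
        if np.array_equal(H @ x % 2, s):
            d = int(np.sum(x != y))
            if d < best_dist:
                best, best_dist = x, d
    return best

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 7)
y = x.copy()
y[rng.integers(7)] ^= 1          # side information: y differs from x in one position
x_hat = decode(encode(x), y)
assert np.array_equal(x_hat, x)  # exact recovery: minimum distance 3 handles one flip
print("x     =", x)
print("y     =", y)
print("x_hat =", x_hat)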

O. Simeone Source Coding With Side Information 194 / 219 Appendix D

O. Simeone Source Coding With Side Information 195 / 219 Strictly Causal Side Information Converse

[Figure: X^n → Encoder → M → Decoder → X̂_i, with strictly causal side information Y^{i−1}, generated by the memoryless channel p(y|x), available at the decoder.]

nR ≥ Σ_{i=1}^n [H(X_i) − H(X_i|X^{i−1}, M)]
   ≥ Σ_{i=1}^n [H(X_i) − H(X_i|Y^{i−1}, M)]

by the data processing inequality, since X_i − (X^{i−1}, M) − (Y^{i−1}, M) (check with d-separation: the sketch in Appendix B verifies exactly this chain)

O. Simeone Source Coding With Side Information 196 / 219 Strictly Causal Side Information Converse

nR ≥ Σ_{i=1}^n [H(X_i) − H(X_i|X^{i−1}, M)]
   ≥ Σ_{i=1}^n [H(X_i) − H(X_i|Y^{i−1}, M)]
   = Σ_{i=1}^n [H(X_i) − H(X_i|Y^{i−1}, M, X̂_i)]   (X̂_i is a function of (Y^{i−1}, M))
   ≥ Σ_{i=1}^n I(X_i; X̂_i)   (dropping the conditioning on (Y^{i−1}, M))

... so strictly delayed side information is not useful for memoryless sources (but it can be for sources with memory)

O. Simeone Source Coding With Side Information 197 / 219 Appendix E

O. Simeone Source Coding With Side Information 198 / 219 Causal Side Information Converse

nR ≥ Σ_{i=1}^n [H(X_i) − H(X_i|X^{i−1}, M)]
   ≥ Σ_{i=1}^n [H(X_i) − H(X_i|Y^{i−1}, M)]   (define U_i ≜ (Y^{i−1}, M))
   = Σ_{i=1}^n I(X_i; U_i)

We also have X̂_i = f(U_i, Y_i) and U_i − X_i − Y_i. It follows that

R^{WZ−C}_{X|Y}(D) ≥ min_{p(u|x), f} I(X; U)   s.t.   E[d(X, Y, f(U, Y))] ≤ D
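Analogously to the Wyner-Ziv sketch in Appendix B, this bound can be explored by brute force, but the objective is now the unconditional I(X; U). The sketch below (my own, not from the slides) restricts the search to |U| = 2 for the same assumed binary example (X ~ Bern(1/2), Y = X ⊕ Bern(0.1), Hamming distortion), so it returns achievable points rather than the exact minimum.

import itertools
import numpy as np

p = 0.1                                             # assumed BSC crossover
pxy = 0.5 * np.array([[1 - p, p], [p, 1 - p]])      # pxy[x, y], X ~ Bern(1/2)
px = pxy.sum(axis=1)

def rate(pu_x):
    # I(X; U) in bits for the test channel p(u|x)
    pu = px @ pu_x
    return sum(px[x] * pu_x[x, u] * np.log2(pu_x[x, u] / pu[u])
               for x in (0, 1) for u in (0, 1) if pu_x[x, u] > 0)

def distortion(pu_x, f):
    # E[1(X != f(U, Y))]; the chain U - X - Y holds by construction
    return sum(pxy[x, y] * pu_x[x, u] * (x != f[u][y])
               for x in (0, 1) for y in (0, 1) for u in (0, 1))

funcs = [((a, b), (c, d)) for a, b, c, d in itertools.product((0, 1), repeat=4)]
grid = np.linspace(0.0, 1.0, 51)
for D in (0.02, 0.05, 0.08):
    best = float("inf")
    for a, b in itertools.product(grid, grid):      # a = p(u=0|x=0), b = p(u=0|x=1)
        pu_x = np.array([[a, 1 - a], [b, 1 - b]])
        r = rate(pu_x)
        if r < best and any(distortion(pu_x, f) <= D for f in funcs):
            best = r
    print(f"D={D:.2f}: best I(X;U) over the |U| = 2 grid: {best:.3f} bits")

O. Simeone Source Coding With Side Information 199 / 219 Appendix F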

O. Simeone Source Coding With Side Information 200 / 219 Cascade Source Coding Problem When Forward is Optimal [Chia et al ’11]

R1 ≥ I(X; X̂1, U|Y)
R2 ≥ I(X, Y; U|Z)

[Figure (built incrementally over slides 201-206): cascade network X^n → Node 1 → Node 2 → Node 3 with link rates R1 and R2; side information Y^n at Node 2 and Z^n at Node 3; Node 2 reconstructs X̂1^n and Node 3 reconstructs X̂2^n; the auxiliary sequence U^n and the pair (X^n, Y^n) are indicated at Node 2.]

O. Simeone Source Coding With Side Information 206 / 219 Appendix G

O. Simeone Source Coding With Side Information 207 / 219 Cascade Source Coding Problem with CR constraint

Rate-Distortion Region (X − Y1 − Y2)

R1 ≥ I(X; X̂1, X̂2|Y1)
R2 ≥ I(X; X̂2|Y2)

[Figure (built incrementally over slides 208-213): cascade network X^n → Node 1 → Node 2 → Node 3 with link rates R1 and R2; side information Y1^n at Node 2 and Y2^n at Node 3; Node 2 reconstructs X̂1^n and Node 3 reconstructs X̂2^n, and under the common reconstruction (CR) constraint both X̂1^n and X̂2^n must also be reproducible at Node 1.]

O. Simeone Source Coding With Side Information 213 / 219 Cascade Source Coding Problem with CR constraint

Outer bound to the Rate-Distortion Region (X − Y1 − Y2)

[Figure: the same cascade network; under the CR constraint the reconstructions are functions ψ1(X^n) and ψ2(X^n) of the source.]

O. Simeone Source Coding With Side Information 214 / 219 Cascade Source Coding Problem with CR constraint

Outer bound to the Rate-Distortion Region (X − Y1 − Y2)

[Figure (built incrementally over slides 215-217): Y1^n is additionally provided to Node 1 and the rate-R2 link is removed; the reconstructions remain ψ1(X^n) and ψ2(X^n).]

R1 ≥ I(X; X̂1, X̂2|Y1)

O. Simeone Source Coding With Side Information 217 / 219 Cascade Source Coding Problem with CR constraint

Outer bound to the Rate-Distortion Region (X − Y1 − Y2)

[Figure: the source X^n is additionally provided to Node 2; the reconstructions remain ψ1(X^n) and ψ2(X^n).]

O. Simeone Source Coding With Side Information 218 / 219 Cascade Source Coding Problem with CR constraint

Outer bound to the Rate-Distortion Region (X − Y1 − Y2)

[Figure: only the rate-R2 link from Node 2 to Node 3 remains, with X^n and Y1^n at Node 2 and Y2^n at Node 3.]

R2 ≥ I(X; X̂2|Y2)

O. Simeone Source Coding With Side Information 219 / 219