
Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

Daniel Hillel Schonberg

Electrical Engineering and Computer Sciences University of California at Berkeley

Technical Report No. UCB/EECS-2007-93 http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-93.html

July 25, 2007 Copyright © 2007, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

I am very grateful to my advisor, Professor Kannan Ramchandran, for his guidance, support, and encouragement throughout the course of my graduate study. This research would not have been possible without his vision and motivation.

I would like to thank my dissertation committee, Kannan Ramchandran, Martin Wainwright, and Peter Bickel, for their assistance. Their comments and guidance have proved invaluable.

This research was supported in part by the NSF under grant CCR-0325311. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

by

Daniel Hillel Schonberg

B.S.E. (University of Michigan, Ann Arbor) 2001 M.S.E. (University of Michigan, Ann Arbor) 2001

A dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy in

Engineering-Electrical Engineering and Computer Sciences

in the

GRADUATE DIVISION

of the

UNIVERSITY OF CALIFORNIA, BERKELEY

Committee in charge:

Professor Kannan Ramchandran, Chair
Professor Martin Wainwright
Professor Peter Bickel

Fall 2007

The dissertation of Daniel Hillel Schonberg is approved.

Chair Date

Date

Date

University of California, Berkeley

Fall 2007

Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

Copyright © 2007

by

Daniel Hillel Schonberg

Abstract

Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

by

Daniel Hillel Schonberg

Doctor of Philosophy in Engineering-Electrical Engineering and Computer Sciences

University of California, Berkeley

Professor Kannan Ramchandran, Chair

Distributed source codes compress multiple data sources by leveraging the correlation between those sources, even without access to the realization of each source, through the use of a joint decoder. Though distributed source coding has been a topic of research since the 1970s (Slepian and Wolf, 1973), there has been little practical work. In this thesis, we present a practical and flexible framework for distributed source coding and its applications.

Only recently, as applications have arisen, did practical work begin in earnest. The applications range from dense sensor networks to robust video coding for wireless transmission to the compression of encrypted data. Unfortunately, most published work offering practical codes considers only a limited scenario. Such work focuses on simple idealized data models, and it assumes a priori knowledge of the underlying joint statistics. These assumptions are made to ease the description and analysis of the solutions. We argue, though, that a full solution requires a construction flexible enough to be adapted to real-world distributed source coding scenarios. This solution must be able to accommodate any source and unknown statistics. In this thesis, we develop practical distributed source codes, applicable to idealized sources and to real-world sources such as images and videos, that are rate adaptable to all scenarios.

This thesis is broken into two halves. In the first half we discuss analytic considerations for generating practical distributed source codes. We begin by assuming a priori knowledge of the underlying source statistics, and then consider source coding with side information, a special case of distributed source coding. As a basis for our solution, we develop codes

for independent and identically distributed (i.i.d.) sources and then adapt the codes to parallel sources. We then generalize this framework to a complete distributed source coding construction, flexible enough for arbitrary scenarios and adaptable to important special cases. Finally we conclude this half by eliminating our assumption of a priori knowledge of the statistics. We discuss a protocol for rate adaptation, ensuring that our codes can operate blind of the source statistics. Our protocol converges rapidly to the entropy rate of a stationary source.

In the second half of this thesis, we consider distributed source coding of real world sources. As a motivating application, we consider the compression of encrypted images and video. Since encryption masks the source, traditional compression algorithms are ineffective. However, through the use of distributed source-coding techniques, the compression of encrypted data is possible. It is possible to reduce data size without requiring data be compressed prior to encryption. We develop algorithms for the practical lossless compression of encrypted data. Our methods leverage the statistical correlations of the source, even without direct access to their instantiations. For video, we compare our performance to a state-of-the-art motion-compensated lossless video encoder that requires unencrypted video as input. It compresses each unencrypted frame of the “Foreman” test video sequence by 59% on average. In comparison, our proof-of-concept implementation, working on encrypted data, compresses the same sequence by 33%.

Professor Kannan Ramchandran Dissertation Committee Chair

I dedicate my thesis to my parents, Ila and Leslie. Mom and Dad, thank you so much. Without your encouragement and support, I would have never gotten so far. Thank you so much for instilling in me a love of learning, and for so much more.

I further dedicate this thesis to my girlfriend, Helaine. Thank you so much for all the love and support you have given me.

Contents

Contents ii

List of Figures vi

List of Tables xiv

Acknowledgements xv

1 Introduction 1

2 Background 6 2.1 Traditional Source Coding ...... 6 2.2 Channel Coding ...... 8 2.3 Low-Density Parity-Check Codes and Graphical Models ...... 9 2.3.1 The Sum-Product Algorithm and LDPC codes ...... 11 2.3.2 LDPC Design ...... 13 2.4 Distributed Source Coding ...... 16

3 A Practical Construction for Source Coding With Side Information 20 3.1 Introduction ...... 20 3.2 Linear Block Codes for Source Codes With Side Information ...... 22 3.2.1 Illustrative Example ...... 23 3.2.2 Block Codes For Source Coding With Side Information ...... 24 3.3 Low-Density Parity-Check Codes ...... 26 3.4 Practical Algorithm ...... 27 3.4.1 Application of Sum-Product Algorithm ...... 28 3.5 Simulations and Results ...... 30

3.5.1 Compression With Side Information Results ...... 31 3.5.2 Compression of a Single Source ...... 32 3.6 Conclusions ...... 33

4 Slepian-Wolf Coding for Parallel Sources: Design and Error-Exponent Analysis 35 4.1 Introduction ...... 36 4.2 Error Exponent Analysis ...... 37 4.3 LDPC Code Design ...... 39 4.3.1 EXIT Chart Based Code Design ...... 41 4.4 Simulation Results ...... 44 4.5 Conclusions & Open Problems ...... 48

5 Symmetric Distributed Source Coding 52 5.1 Introduction ...... 53 5.1.1 Related Work ...... 54 5.2 Distributed Source Coding ...... 56 5.2.1 Illustrative Example ...... 56 5.2.2 Code Generation ...... 57 5.2.3 Decoder ...... 59 5.2.4 Multiple Sources ...... 61 5.3 Low-Density Parity-Check Codes ...... 62 5.4 Adaptation of Coding Solution to Parity Check Matrices ...... 63 5.5 Integration of LDPC Codes Into Distributed Coding Construction . . . . . 64 5.5.1 Application of Sum-Product Algorithm ...... 67 5.6 Simulations and Results ...... 67 5.7 Conclusions & Open Problems ...... 69

6 A Protocol For Blind Distributed Source Coding 71 6.1 Introduction ...... 71 6.2 Protocol ...... 72 6.3 Analysis ...... 74 6.3.1 Probability of Decoding Error ...... 74 6.3.2 Bounding the Expected Slepian-Wolf Rate ...... 77 6.3.3 Optimizing the Rate Margin ...... 78

6.4 Practical Considerations ...... 79 6.5 Results ...... 80 6.6 Conclusions & Open Problems ...... 82

7 The Compression of Encrypted Data: A Framework for Encrypted Simple Sources to Encrypted Images 84 7.1 Introduction ...... 85 7.2 General Framework ...... 87 7.2.1 Encryption Function ...... 89 7.2.2 Code Constraint ...... 89 7.2.3 Source and Source Model ...... 90 7.2.4 Decoding Algorithm ...... 91 7.3 From Compression of Encrypted Simple Sources to Encrypted Images . . . 92 7.3.1 1-D Markov Model ...... 93 7.3.2 2-D Markov Model ...... 93 7.3.3 Message Passing Rules ...... 94 7.3.4 Doping ...... 95 7.3.5 Experimental Results ...... 96 7.3.6 Gray Scale Images ...... 98 7.4 Conclusions & Open Problems ...... 99

8 Compression of Encrypted Video 101 8.1 Introduction ...... 101 8.1.1 High-Level System Description and Prior Work ...... 102 8.1.2 Main Results & Outline ...... 104 8.2 Source Model ...... 105 8.3 Message Passing Rules ...... 109 8.4 Experimental Results ...... 109 8.5 Blind Video Compression ...... 113 8.6 Conclusions & Open Problems ...... 114

9 Conclusions & Future Directions 116

Bibliography 120

A Codes and Degree Distributions Used to Generate LDPC Codes 127

B Deriving the Slepian-Wolf Error Exponent From the Channel Coding Error Exponent 136

List of Figures

1.1 A block diagram for the distributed source coding problem. The encoders act separately on each source for joint recovery at a common receiver. There is theoretically no loss of lossless sum rate compression performance due to the distributed nature of the problem, as shown in Slepian and Wolf [78]. . 2

2.1 A block diagram for single source and single destination source coding. The encoder maps source symbols xn to a bitstream, which is then fed to the decoder. The decoder reconstructs xˆn. The rate R is the number of bits transmitted per input symbol...... 7 2.2 A block diagram for channel coding. The source word M is mapped by the encoder to xn. xn is then transmitted through the channel and received at the decoder as y n. To ensure efficient and reliable communication, the goal is to design the mapping of M to xn such that Mˆ can be recovered to M from y n with high probability...... 8 2.3 A sample factor graph. The five circles represent bits while the four squares represent the parity constraints of the code. Connections between the two types of nodes correspond to the parity check matrix given in Table 2.1. A labeling of the two types of messages passed between nodes when running the sum-product algorithm is given...... 10 2.4 A graphical representation of a random irregular LDPC code. The degree of each node is determined according to a degree distribution. Each edge is then randomly connected to a variable node and a check node according to some random permutation...... 13 2.5 An example of a good code degree distribution pair for channel coding rate R = 0.55, designed using EXIT charts. The tight fit of the two curves in the plot shows that the code operates near to capacity. The dashed curve corresponds to the variable nodes while the solid curve corresponds to the check nodes. Decoding operates by moving from the left-hand side to the right-hand side of the graph, going through the gap between the two curves. Since the gap between the two curves is positive, decoding will follow the path between the curves successfully...... 16

2.6 An extension of the traditional single source and single destination source coding problem to two sources, xn and y n. Here, the minimum rate according to information theory is the joint entropy of the two sources, i.e. R ≥ H(x, y). 17 2.7 The block diagram for the distributed source coding problem. Here, without communication between the two, one encoder maps the source symbols xn to a bitstream while the other maps y n to a separate bitstream. The decoder receives both bit streams and reconstructs xˆn and yˆn. The minimum rate region for Rx and Ry is given in Equation (2.2)...... 17 2.8 A graphical representation of the Slepian-Wolf rate region as specified in Equation (2.2)...... 18 2.9 A special case of distributed source coding referred to as source coding with side information. Here, the correlated source y n is available at the decoder, but is unavailable at the encoder. Using the Slepian-Wolf rate region of Equation (2.2), we see that the achievable rate is bounded as Rx ≥ H(x|y). 19

3.1 The source coding with side information problem. The encoder seeks to compress x given that y is available at the decoder but unavailable to itself. 22 3.2 A sample factor graph. The circles represent bits while squares represent the parity constraints of the code. A labeling of the two types of messages passed from node to node in the decoding algorithm is given...... 26

3.3 The factor graph for source coding with side information. Circles labeled xi or yi represent source bits. Circles labeled sxj represent the compressed bits. Squares labeled fxj represent the LDPC code applied to the source. Squares labeled fi represent the correlation between bits xi and yi. ...... 28

3.4 The factor graph for entropy coding. Circles labeled xi represent source bits. Circles labeled sxj represent the compressed bits. Squares labeled fxj represent the LDPC code applied to the source. Squares labeled fi represent the source constraint on xi. This model develops as a graceful degradation of the source model of Figure 3.3 in the absence of correlated side information.

3.5 Source coding with side information simulation results for several block lengths and varying the correlation parameter α(= β). Source x is compressed to rate 1/2 while source y is available at the decoder. Each data point is the result of the simulation of at least 10^6 bits. The results in this figure are plotted in terms of their bit error rates (BER). In addition, we plot one standard deviation error bars. Similar error bars result for all BER results in this chapter, but are omitted for clarity. ...... 32

3.6 Results of the same experiments as in Figure 3.5 plotted in terms of the symbol error rates (SER). Each value is the result of the simulation of at least 10^3 symbols (blocks). In addition, we plot one standard deviation error bars. ...... 33

3.7 Simulation results compressing source x to rate 1/2 while source y is available at the decoder. The simulations are of source coding with side information results, as in Figure 3.5. Each data point is the result of the simulation of at least 10^6 bits. The block length is fixed to 10,000 and a more general correlation structure is considered. The type of data point used indicates the marginal entropy of the source x...... 33 3.8 Simulation results from entropy coding source x to rate 1/2. This plot considers single source results for several block lengths and source statistics. Each data point is the result of the simulation of at least 10^6 bits. Resulting coding performance is similar to that obtained when source coding with side information as in Figure 3.5...... 34

4.1 A comparison of the bound presented in Equation (4.3) for joint coding versus the bound presented in Equation (4.4) for separate coding. The details of the scenario considered are discussed in the text. For each value of R, the values of R1 and R2 are chosen to give the least upper bound on the probability of error. This demonstrates the improved performance achievable with joint coding across parallel sources...... 40 4.2 A sample LDPC graphical model representing the matrix at right. The circles represent bits and correspond to matrix columns while the squares represent parity constraints and matrix rows. Elements of the matrix indicate edges between nodes. Whenever the element in the ith row and jth column is 1, an edge connects constraint node i with variable node j. A 0 in the corresponding position implies no edge connects the nodes...... 41 4.3 One example of a code designed using EXIT charts for a rate R=0.45 source code. The tight fit of the two curves shows that the degree distributions are good. The dashed curve corresponds to the variable nodes while the solid curve corresponds to the check nodes. Decoding operates by moving from the left-hand side to the right-hand side of the graph, going through the gap between the two curves. Since the gap between the two curves is positive, decoding will follow the path between the curves successfully...... 42 4.4 A second example of a code using EXIT charts, as in Figure 4.3. This code is a rate R=0.55 source code. Since the gap between the two curves is positive here as well, decoding will follow the path between the curves successfully. . 43 4.5 A comparison of the EXIT charts for separate coding of two sources. These curves are taken from Figures 4.3 and 4.4. Note the intersection of the curves for λ2(x) and ρ1(x) prevents the use of these curves for a joint code. Since the gap between these two curves is not maintained, decoding will fail. . . . 44 4.6 A plot of the EXIT chart for the best possible degree distributions found for coding over parallel sources. For these curves, we have considered one check node curve (solid line) and matched both variable node curves (dashed lines) to it. Since the gap between the variable node curves and the check node curve is always positive, decoding will succeed...... 45

4.7 log10(P(Error)) results of the simulations for separate coding. The first code is designed with rate R1 = 0.45 (Figure 4.3) while the second code is designed with rate R2 = 0.55 (Figure 4.4). The binary entropy ranges of Q1 and Q2 are shown on the horizontal and vertical axes. Dark regions of the plot indicate low probability of error while light regions indicate high probability of error. In this plot, the probability of error is high whenever H2(Q1) or H2(Q2) is large ...... 47

4.8 log10(P (Error)) results of the simulations using a code designed for rate R = 0.5. The binary entropy ranges of Q1 and Q2 are shown on the horizontal and vertical axes respectively. Dark regions of the plot indicate low probability of error while light regions indicate high probability of error. Unlike the separate coding results shown in Figure 4.7, the probability of error only gets large when H2(Q1) and H2(Q2) are large...... 47

4.9 log10(P(Error)) results of the simulations for the EXIT chart based design method (Figure 4.6). The binary entropy ranges of Q1 and Q2 are shown on the horizontal and vertical axes respectively. Dark regions of the plot indicate low probability of error while light regions indicate high probability of error. Like the single code simulations shown in Figure 4.8, the probability of error only gets large when H2(Q1) and H2(Q2) are large...... 48 4.10 Comparison of the results from Figures 4.7 and 4.9. The region of P(error) > 10^-2 is considered a decoding failure while everywhere else is a success. In this plot, decoding success is seen in the lower left corner and decoding failure in the upper right corner. The middle region represents where the EXIT chart designed code succeeds and the separate coding fails. This figure shows the advantage of joint compression...... 49 4.11 Comparison of the results from Figures 4.8 and 4.9. The region of P(error) > 10^-2 is considered a decoding failure while everywhere else is a success. As in Figure 4.10, in this plot decoding success is seen in the lower left corner and decoding failure in the upper right corner. The middle region represents where the EXIT chart designed code succeeds and the simple rate R = 0.5 code fails. This figure shows the advantage of designing the code for the parallel sources...... 49

5.1 The partition and distribution of a single code’s generator matrix for the codes for compressing x and y. Gc is shared between both resulting codes to allow for arbitrary rate point selection...... 59 5.2 Iterative generator matrix definition process for achieving a corner point for the distributed compression of L sources. The process initializes with the L-th matrix on the right-hand side and iterates until the left-most matrix is defined. After defining these codes, arbitrary points can be achieved by moving rows between generator matrices...... 62 5.3 A sample factor graph. The circles represent bits while squares represent the parity constraints of the code. A labeling of the two types of messages passed node to node in the decoding algorithm is given...... 63

5.4 Factor graph for distributed source coding. Circles labeled xi or yi represent source bits. Circles labeled Sxj or Syj represent the compressed bits. Squares labeled fxj or fyj represent the LDPC code applied to each source. Squares labeled fi represent the correlation between bits xi and yi. ...... 65

5.5 An abstracted version of the factor graph of Figure 5.4. Recall that we use rectangles as well as squares to represent constraints. Here the graph is abstracted into the operations on each source and the correlation interface between the two. This graph can be extended to an arbitrary number of sources by interfacing additional sources with the generalized correlation constraint. ...... 66

5.6 Simulation results for several arbitrary rate combinations in the Slepian-Wolf region for a block length of 10,000. Each data point is the result of the simulation of at least 10^6 bits. The BER results are plotted versus the difference between the sum rate and the Slepian-Wolf bound. As the difference grows, resulting error rates improve. ...... 69

6.1 A block diagram for blind source coding with side information. For the analysis, we assume the simple correlation structure for xi and yi as shown here. During each cycle, the source is transmitted at rate Rk over the forward channel. The decoder feeds back an acknowledgment and an estimate of the parameter Qˆk...... 72 6.2 Error exponents and quadratic approximations for Q = 0.05 and Q = 0.3. Note, for Q = 0.3, the approximation is so close to the error exponent that the lines are on top of each other. This demonstrates that the quadratic approximation is quite good...... 77 6.3 A block diagram for adaptation of our blind distributed source coding algorithm to the general distributed source coding problem. During each cycle, the source is transmitted at rate Rk over the forward channel. The decoder feeds back an acknowledgment and a rate request, Rk+1, for the next cycle. 79 6.4 Averaged results (over 25 trials) of blind source coding with side information of sources as modeled in Section 6.2. The horizontal axis is the number of cipher-text bits considered, nk, in log-scale, and the vertical axis is the system redundancy (the percent of bits used above the entropy rate). We present one standard deviation error bars to demonstrate consistency. As the number of bits transmitted grows performance improves. Note, the bound from Equation (6.18) plotted here excludes feedback costs. Ideally, we would like to transmit at 0% redundancy (the entropy rate)...... 82 6.5 Averaged results (over 25 trials) of blind source coding with side information where the source model of Section 6.2 is altered such that the samples of zi exhibit Markov memory. Similar to Figure 6.4, the horizontal axis is the number of cipher-text bits considered, nk, in log-scale, and the vertical axis is the system redundancy (the percent of bits used above the entropy rate). As the number of bits transmitted grows performance improves...... 83

x 7.1 The source is first encrypted and then compressed. The compressor does not have access to the key used in the encryption step. The decoder jointly decompresses and decrypts. The key serves as the decoder side information. 85 7.2 A sample 100 × 100 world map image is on the left (10, 000 bits total). To encrypt this image, the 10, 000 bit random key in the center is added to the unencrypted image on the left (using a bit-wise exclusive-OR)...... 86 7.3 The abstracted model for compressing encrypted data. This model consists of two parts. The first part, the generative model and encoder, describes how the raw source data is first encrypted and then compressed. This part operates on the true source bits, xi, the key bits, ki, the encrypted bits, yi, and the compressed bits, si. The second part, the decoder, shows how the various components of the source are modeled at the decoder for recovering an estimate of the original source. This part uses the same compressed bits, si, and key bits ki, but works on an estimate of the encrypted bits, yˆi, and the source bits, xˆi...... 88 7.4 The graphical constraints imposed by stream-cipher encryption. The source estimate xˆi, the key bit ki, and the cipher-text estimate yˆi must have even parity...... 89 7.5 The graphical model of the constraints imposed by using a LDPC code for compression. All cipher-text estimate yˆi connected to the same parity-check gj must sum to the syndrome value sj...... 90 7.6 The simple i.i.d. model considered in [38] and in Chapter 3. This model consists of the source estimates xˆi and their priors...... 91 7.7 The 1-D source model. This model consists of bits (circles), source priors (squares below the bits) and correlation constraints (squares between the bits). In addition, message labels, µ and ν, are presented...... 93 7.8 A section of the 2-D source model. This model includes the source bits h v (circles) and both horizontal correlations, Mi,j, and vertical correlations, Mi,j. 94 7.9 A sample 100 × 100 world map image (also shown in Figure 7.2) is on the left (10, 000 bits total). To encrypt this image, the 10, 000 bit random key in the center is added to the unencrypted image on the left (using a bit-wise exclusive-OR)...... 96 7.10 A comparison of the compressed bits and reconstructed image using the 1-D Markov model and the 2-D Markov model. The 1-D model compressed the encrypted data to 7, 710 bits. This is 3, 411 bits more than the 4, 299 bits used for the 2-D model. The i.i.d model could not compress the encrypted image at all...... 97 7.11 A comparison of the intermediate estimates at the decoder. The estimates from the 1-D Markov model are on the top row, while estimates from the 2- D model are on the bottom row. The estimates from the 1-D model exhibit convergence along north-south lines. In contrast, the 2-D model decoder exhibits a localized clumping nature...... 98

7.12 A sample image, “Foreman,” is shown on the left (811,008 bits total). To encrypt this image, the 811,008 bit random key in the center is added to the unencrypted image (using a bitwise exclusive-OR). Many classical methods exist to compress the “Foreman” image. These methods use techniques such as transform based methods and localized predictors. Unfortunately, due to the nonlinear effects of bitwise encryption, none of the classical methods or techniques will successfully compress the encrypted image...... 99

8.1 A sample frame from the standard “Foreman” video test sequence is shown on the left (811, 008 bits total). To encrypt this frame, the 811, 008 bit random key in the center is added to the unencrypted image (using a bitwise exclusive- OR). Although many classical methods exist to compress the “Foreman” video sequence, none will successfully compress the encrypted video. . . . . 103 8.2 The source is first encrypted and then compressed. The compressor does not have access to the key used in the encryption step. The decoder jointly decompresses and decrypts. The key serves as the decoder side information. 104 8.3 A reprint of Figure 7.8. A section of the 2-D source model. This model h includes the source bits (circles) and both horizontal correlations, Mi,j, and v vertical correlations, Mi,j. We apply this model to each bit plane of each frame of the video sequences...... 106 8.4 A diagram of the way by which information is generated and fed to the de- coder for each frame. Here, predicted frames are developed from the two previous frames. A measure of the predictor’s accuracy is generated by mea- suring the accuracy of the previous frame’s predictor. This gives the decoder sufficient assistance to recover the frame with the standard joint decompresser and decrypter discussed in Chapter 7...... 107 8.5 Predictor generation using motion compensation and motion extrapolation. Frames xˆt−2 and xˆt−1 are available after having been previously decoded. For each block in frame xˆt−1, motion estimation is performed to find the best matching block in frame xˆt−2. In the above example, the best predictor for the block with thickened edges in xˆt−1 is the lightly thickened box in xˆt−2, shown with its corresponding motion vector (vx, vy). We estimate the motion vector for each block in frame xˆt with that of its co-located block in frame xˆt−1. Its predictor is then indicated by the estimated motion vector. Here, the predictor for the block with thickened edges in xˆt is the lightly thickened t−1 box in xˆ , as pointed to by the motion vector (vx, vy)...... 108 8.6 A sample compression of the fourth frame of the “Foreman” sequence. This block diagram demonstrates the generation of each of the elements available to the encoder and to the decoder. Here, the decoder is able to reconstruct frame 4 from the predicted frame 4, the probability model, and the com- pressed bits. The encryption key has been omitted from this figure for clarity.112 8.7 Compression results for the 9 frames considered by our system. The average rate (in source bits per output bit) used on each frame. This plot demon- strates consistent rate savings from frame to frame...... 112

8.8 Compression results for the 9 frames considered by our system. This plot shows the rate used on each bit plane, ranging from most significant (left) to least significant (right). Maximum and minimum are taken across all 9 frames. This demonstrates the consistent compressibility of each bit plane from frame to frame. It further demonstrates that successively less significant bit planes are successively less compressible...... 113

List of Tables

2.1 The parity check matrix corresponding to the graph shown in Figure 2.3. Whenever the element in the ith row and jth column is 1, an edge connects constraint node i with variable node j. A 0 in the corresponding position implies no edge connects the nodes...... 10

3.1 Structure of the joint distribution between x and y considered for Section 3.5. 31

5.1 Structure of the joint distribution between x and y considered for Section 5.6. 68 5.2 Simulation results for several arbitrary rate combinations in the Slepian-Wolf region for a block length of 10,000. The BER×10^4 results are given for several rate combinations. As the sum rate grows, the resulting error rates improve. 69

8.1 Comparison of results for the compression of three encrypted sequences com- pared to results of traditional encoders compressing the unencrypted se- quences. Despite operating on encrypted data, the proposed approach per- forms well...... 111 8.2 Results for the blind compression of the three encrypted sequences. The proposed approach achieves significant compression gains. This demonstrates the value of the approach presented here...... 114

Acknowledgements

I am very grateful to my advisor, Professor Kannan Ramchandran, for his guidance, support, and encouragement throughout the course of my graduate study. This research would not have been possible without his vision and motivation.

I would like to thank my dissertation committee, Kannan Ramchandran, Martin Wain- wright, and Peter Bickel, for their assistance. Their comments and guidance have proved invaluable.

I would like to thank my colleagues and collaborators, in alphabetical order, Stark Draper, Prakash Ishwar, Mark Johnson, Vinod Prabhakaran, Sandeep Pradhan, Rohit Puri, and Chuohao Yeo for their invaluable help at various stages of this work and for the many productive discussions we had. I would like to single out Stark Draper for special acknowledgment. I cannot thank him enough for all he’s done for me.

I would also like to thank all the members of the BASiCS group, past and present, for making my stay a memorable and enjoyable one. I would like to specially acknowledge those regularly in the lab space, listed in alphabetical order, Jim Chou, Mark Johnson, Abhik Majumdar, Vinod Prabhakaran, June Wang, Wei Wang, Ben Wild, and Chuohao Yeo. Thank you for making the lab such an excellent place to work.

I would like to thank all the professors and students of the Berkeley Wireless Foundation. I highly value the research community supported there.

I would like to acknowledge the support of my friends and family. I would especially like to acknowledge the support of my girlfriend, Helaine Blumenthal. There are too many more to mention here, but I am deeply grateful for all of you.

This research was supported in part by the NSF under grant CCR-0325311. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Curriculum Vitæ

Daniel Hillel Schonberg

Education

2001  University of Michigan, Ann Arbor: B.S.E., Electrical Engineering
2001  University of Michigan, Ann Arbor: M.S.E., Electrical Engineering: Systems
2007  University of California, Berkeley: Ph.D., Electrical Engineering and Computer Science

Experience

2002 – 2007  Graduate Research Assistant, University of California, Berkeley.
• Design and implement a variety of codes for distributed source coding problems. Application to the compression of encrypted video sources and blind distributed source coding.

2003 – 2003  Research Intern, Microsoft Research.
• Develop EyeCerts, a biometric, cryptographically secure, identification system.
• Aid in developing forensics to analyze Stirmark attacks.

2001 – 2002  Graduate Student Instructor, University of California, Berkeley.
• EECS 123: Digital Signal Processing
• EECS 42: Introduction to Digital Electronics
• Run two discussion sections for 20-25 students, and hold weekly office hours.
• Perform course work, including writing homework and grading exams.

2001 – 2001  Summer Intern, MIT Lincoln Laboratory.
• Performed analysis of simulated radar data for the E2-C Hawkeye. Analysis focused on determining error in steering due to wing effects and multi-path.
• Studied principles of airborne radar and adaptive antenna arrays.

2000 – 2000  Research Scientist, Ford Motor Company.
• Wrote Visual Basic model to analyze vapor cold start applications.
• Performed extensive evaluation of latest NTK acoustic fuel vapor sensor prototype.
• Wrote two internal Ford reports upon work completed and presented findings.

Selected Publications

Journal Publications D. Schonberg, C. Yeo, S. C. Draper, K. Ramchandran, “On Compression of Encrypted Images and Video Sequences,” IEEE Transactions on Information Forensics and Security, Submitted for Publication.

D. Schonberg, S. S. Pradhan, K. Ramchandran, “One can Achieve the Arbitrary Slepian-Wolf Bound: Design and Construction,” IEEE Transactions on Communications, Accepted for Publication.

D. Schonberg, D. Kirovski, “EyeCerts,” IEEE Transactions on Information Forensics and Security, Volume: 1, Issue: 2, June 2006, Pages: 144-153.

M. Johnson, P. Ishwar, V. M. Prabhakaran, D. Schonberg, K. Ramchandran, “On Compressing Encrypted Data,” IEEE Transactions on Signal Processing, Supplement on Secure Media I, Volume: 52, Issue: 10, October 2004, Pages 2992-3006.

Conference Publications D. Schonberg, S. C. Draper, R. Puri, and K. Ramchandran, “Slepian-Wolf Codes for Parallel Sources: Design and Error-Exponent Analysis,” 43rd Annual Allerton Conference, Monticello, IL, September 2005.

S. Draper, P. Ishwar, D. Molnar, V. Prabhakaran, K. Ramchandran, D. Schonberg, and D. Wagner, “An Analysis of PMF Based Tests For Least Significant Bit Image Steganography,” Proceedings of the Information Hiding Workshop, June 2005.

D. Schonberg and D. Kirovski, “Fingerprinting and Forensic Analysis Of Multimedia,” Proceedings of the ACM Multimedia 2004, pp. 788-795, October 2004.

Biographical Sketch

Daniel Schonberg received the B.S.E. and M.S.E. degrees in electrical engineering from the University of Michigan, Ann Arbor in 2001 as part of a joint 4 year program. He was granted his Ph.D. degree in electrical engineering from the University of California, Berkeley in 2007. He studied there under the advisement of Professor Kannan Ramchandran. His general research interests are in the fields of Signal Processing and Communications. He has focused on applied and statistical modeling during preparation for his dissertation research.

Chapter 1

Introduction

The distributed source coding problem, i.e. removing source redundancy in networks, is finding application in a growing variety of areas. Distributed source coding is an extension of the traditional, single source and single destination source coding problem, i.e. removing source redundancy for the purpose of efficient communication. Applications of distributed source coding, for which no traditional source coding solution is sufficient, range from dense sensor networks to robust video transmission for wireless multimedia applications [68], to compression of encrypted data [38]. Though varied, each application could benefit from a robust yet adaptable distributed source code construction.

The problem of lossless distributed compression of finite-alphabet sources goes back to the seminal work of Slepian & Wolf in 1973 [78]. The Slepian-Wolf theorem states, somewhat surprisingly, that there is theoretically no loss of performance due to the distributed nature of the problem. They show that each source can be coded to its conditional entropy as long as the sum rate is at least the joint entropy (defined via the Shannon entropy function). The Slepian-Wolf rate region for two arbitrarily correlated sources x and y is bounded by the following inequalities

Rx ≥ H(x|y),Ry ≥ H(y|x), and Rx + Ry ≥ H(x, y).
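
For concreteness, the short Python sketch below evaluates these bounds for a doubly symmetric binary source, in which y is obtained by passing x through a binary symmetric channel; the crossover probability p = 0.1 is a hypothetical value chosen purely for illustration and does not correspond to any experiment in this thesis.

```python
import numpy as np

def h2(p):
    """Binary entropy function H2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

# Hypothetical correlation model: x ~ Bernoulli(1/2) and y = x XOR z,
# where z ~ Bernoulli(p) is independent of x.
p = 0.1

H_x_given_y = h2(p)        # H(x|y) = H2(p) by symmetry
H_y_given_x = h2(p)        # H(y|x) = H2(p)
H_xy = 1.0 + h2(p)         # H(x,y) = H(x) + H(y|x) = 1 + H2(p)

print(f"Rx >= {H_x_given_y:.3f}, Ry >= {H_y_given_x:.3f}, "
      f"Rx + Ry >= {H_xy:.3f} bits per symbol")
```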

While the seminal information theory for the distributed source coding problem, studied in the papers [78] and [89], characterized the associated fundamental compression bounds for lossless and lossy compression, respectively, for asymptotically long block lengths, the resulting analysis/coding prescriptions were initially limited to the case of independent and identically distributed (i.i.d.) source and side-information correlation models. More


Figure 1.1. A block diagram for the distributed source coding problem. The encoders act separately on each source for joint recovery at a common receiver. There is theoretically no loss of lossless sum rate compression performance due to the distributed nature of the problem, as shown in Slepian and Wolf [78].

importantly, recent practical code construction methods for the distributed source coding problem, starting from Pradhan & Ramchandran [64] to more recent^1 state-of-the-art extensions [2, 94, 48, 14, 16, 79], have almost exclusively focused on a special case of the distributed source coding problem. First, they focus on the i.i.d. source/side-information correlation models and second, they assume an a priori knowledge of the Slepian-Wolf rate constraints.

We argue that fully general distributed source codes are important for both the applications mentioned above and others yet to be developed. In the first half of this dissertation, we provide a fully general and fully adaptable distributed source coding solution. We tackle the two assumptions made in prior work in order.

We begin by developing distributed source codes for the problem of source coding with side information to act as a basis for our work. Then, we generalize these source codes with side information to parallel sources, important for the application of distributed source codes to block stationary sources. We then build on this construction to offer a distributed source coding solution that can account for (i) arbitrary correlations, (ii) the entire Slepian-Wolf coding region corresponding to arbitrary rate combinations, and (iii) an arbitrary number of sources. The construction presented is based on only a single linear block code, and is adaptable to a wide range of distributed coding problems. Our scheme shows that fixed-length codes can be used for both point-to-point (a departure from more common variable length coding schemes applied to discrete sources) and Slepian-Wolf source coding. We include LDPC codes in our constructions, since they represent a powerful state-of-the-art class of capacity achieving codes. We describe how to incorporate LDPC codes into our

^1 See Section 5.1.1 for an extensive discussion on related work. Though the literature is extensive, it has mostly focused on very limited cases of the distributed source coding problem.

framework and how they are an important tool for building practical distributed source codes.

For a fully adaptable framework for distributed source codes, we next eliminate the assumption of a priori knowledge of the minimum rate constraint. Unfortunately, due to the distributed nature of the problem, the distributed source encoders cannot analyze sample data and determine the minimum rate as is done in traditional source coding solutions. To resolve this issue, we present a feedback-based scheme for so-called blind distributed source coding, i.e. these codes operate without a priori knowledge of the source statistics.

We summarize our contributions in this first half of this thesis as follows.

• We contribute a fully general lossless distributed source coding construction. Our construction is adaptable to arbitrary correlation structures, an arbitrary number of sources, and can accommodate arbitrary rate choices.

• Our contributed construction gracefully degrades to special cases of the distributed source coding problem, such as source coding with side information and entropy coding.

• We contribute analysis and design techniques for the application of our construction to block stationary sources.

• To ensure our construction can be used in scenarios lacking foreknowledge of the source statistics, we contribute a transmission scheme for our distributed source coding construction that can operate blind of the source statistics. We both analyze this scheme and demonstrate its performance.

Having developed a general framework for distributed source coding, in the second half of this thesis we focus on the application of distributed source codes for compressing encrypted data. We consider sources characterized by statistical redundancy, such as images and video sequences, that have been encrypted uncompressed. Since encryption masks the source, traditional data compression algorithms are rendered ineffective. However, as has been shown previously, the compression of encrypted data is in fact possible through the use of distributed source-coding techniques. This means that it is possible to reduce data size without requiring that the data be compressed prior to encryption. Indeed, under some reasonable conditions, neither security nor compression efficiency need be sacrificed when compression is performed on the encrypted data (Johnson et al., 2004 [38]). Thus, in the second half of this dissertation we develop distributed source codes for the practical lossless

compression of encrypted data sources. We consider algorithms for a variety of sources, ranging from i.i.d. to images to video sequences. Our methods are designed to leverage the statistical correlations of the source, even without direct access to their instantiations.

We summarize our contributions in this second half as follows.

• We present a practical framework, based on our distributed source coding construction of the first half of this thesis, for compressing a variety of sources. These sources range from simple sources to images.

• We contribute an extension to the compression of encrypted data framework for com- pressing encrypted video sequences.

• Incorporating the blind transmission scheme presented here, we give a fully automatic system for compressing encrypted video.

This thesis is organized as follows. Chapter 2 is provided as background material for the reader. We begin development of distributed source codes by providing a construction for the source coding with side information problem in Chapter 3. We build on this construction in Chapter 4, developing codes for parallel and non-stationary sources. Chapter 5 extends the construction of Chapter 3 to develop fully arbitrary distributed source codes. In Chapter 6 we develop a blind transmission protocol so that our distributed source codes can operate without a priori knowledge of the source statistics.

The second half of this thesis shifts focus to the problem of compressing encrypted data. We describe the compression of encrypted data problem in Chapter 7, and provide a framework for solving the problem for various types of encrypted sources, ranging from simple sources to images. We extend these constructions to encrypted video sequences in Chapter 8. Finally, we conclude this thesis in Chapter 9.

Notation:

For clarity, we specify here the notation we use throughout this thesis. We denote random variables with a sans serif font, e.g. X, and their realizations with a Roman font, e.g. X. We use subscripts and superscripts to indicate a vector, e.g. X_a^b = {X_i}_{i=a}^b = {X_a, X_{a+1}, ..., X_{b−1}, X_b}. Note that a dropped subscript implies that the vector starts from 1, e.g. X^b = {X_i}_{i=1}^b = {X_1, X_2, ..., X_{b−1}, X_b}. We use bold to indicate vectors and matrices, e.g. X. Note, we occasionally use subscripts as the index of several random vectors instead of as an index into a random vector. For example, X_1^n has samples {X_{1,1}, X_{1,2}, ...} and X_2^n has samples {X_{2,1}, X_{2,2}, ...}. The meaning in these cases is clear from context.

The Hamming weight of the vector x^n is denoted wt_H(x^n). We denote the bitwise “exclusive-OR” (XOR) operation using the ⊕ operator. We use H(·) and I(·; ·) to represent the Shannon entropy and Shannon mutual information functions respectively.

Further, we use H2(·) to represent the binary entropy function.

Additionally, we use some standard notation for linear block channel codes. The (n, k) block code C has generator matrix G, parity check matrix H, and parameter m ≜ n − k.
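
As a brief illustration of this notation, the following Python fragment (using an arbitrary example vector, not data from this thesis) shows the subvector indexing, the Hamming weight, and the bitwise exclusive-OR just defined.

```python
import numpy as np

# An arbitrary length-8 realization x^8 = {x_1, ..., x_8}.
x = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y = np.array([1, 1, 0, 1, 0, 0, 1, 0])

# The subvector x_3^6 = {x_3, x_4, x_5, x_6}; the text indexes from 1,
# while the code indexes from 0.
x_3_to_6 = x[2:6]

wt = int(x.sum())   # Hamming weight wt_H(x^8): the number of ones in x^8
z = x ^ y           # bitwise exclusive-OR (XOR) of x^8 and y^8

print(x_3_to_6, wt, z)
```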

Chapter 2

Background

In this chapter, we provide background material to aid the reader of this dissertation. We begin by describing two of the core problems of information theory; source coding (also referred to as data compression) in Section 2.1 and channel coding in Section 2.2. Having discussed channel coding in general, we then discuss a particular type of channel code known as LDPC (Low-Density Parity-Check) codes in Section 2.3. We focus on graphical models and the sum-product algorithm when discussing LDPC codes, important themes throughout this thesis. Finally, we give background on the distributed source coding problem, a focus of this thesis. Note that summaries of this material appear again, interspersed throughout the text, to aid the reader.

2.1 Traditional Source Coding

This section presents a brief overview of point to point source coding, the problem of efficient communication of data from a single source to a single destination. We consider two scenarios: lossless source coding and lossy source coding. The goal of lossless source coding is to recover the original source exactly at the decoder. Lossy source coding aims to recover the original source only to within some distortion constraint. Throughout this dissertation, we focus primarily on lossless source coding and thus discuss it first. We briefly discuss lossy source coding in this section to offer the reader a better understanding. For a more complete discussion of these topics, please refer to the standard text [19].

Consider the block diagram in Figure 2.1. Let xn be an i.i.d. sequence of n random


Figure 2.1. A block diagram for single source and single destination source coding. The encoder maps source symbols xn to a bitstream, which is then fed to the decoder. The decoder reconstructs xˆn. The rate R is the number of bits transmitted per input symbol.

variables drawn from discrete alphabet X according to some distribution P (x). For rate R (in bits per source symbol), the encoder map is as follows

f : X^n → {1, 2, ..., 2^{nR}}.

The decoder performs the following mapping

g : {1, 2, ..., 2^{nR}} → X^n.

For lossless source coding, the decoder aims to recover x^n with high probability. Defining P_e^{(n)} as the probability of decoding error, we mathematically describe the aim of the design of the decoder as

P_e^{(n)} = P(g(f(x^n)) ≠ x^n) → 0 as n → ∞.

The set of possible codewords (f(1), f(2), ..., f(2^{nR})) is referred to as the codebook. We thus use R bits per source symbol to specify a codeword to the decoder. In order to provide efficiency, we seek to define the mappings f(·) and g(·) such that we minimize R while ensuring that the probability of error P_e^{(n)} → 0 as n → ∞ still holds. Information theory provides that such mappings can be found for any R that satisfies the following inequality

R ≥ H(x).
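
As a simple numerical illustration of this bound, the Python sketch below computes H(x) for a hypothetical i.i.d. source over a four-letter alphabet; the distribution is chosen arbitrarily and is not taken from any experiment in this thesis.

```python
import numpy as np

# A hypothetical i.i.d. source over the alphabet X = {a, b, c, d}.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon entropy: the minimum achievable lossless rate in bits/symbol.
H = -sum(p * np.log2(p) for p in P.values())

print(f"H(x) = {H} bits per symbol")                       # 1.75 bits
print(f"naive fixed-length code: {np.log2(len(P))} bits")  # 2.0 bits
```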

When some distortion in the reconstructed sequence x̂^n is acceptable, we perform lossy source coding. Given some distortion function, lossy source coding offers the minimal rate, R, that satisfies a distortion constraint, D. Information theory gives the form of the optimal tradeoff between the minimum rate, R, and the distortion, D. The reader is referred to Cover and Thomas [19] for further details.

2.2 Channel Coding

In this section we consider another of the core problems of information theory, that of channel coding. In channel coding, we seek to efficiently and reliably communicate a data source over a noisy channel. For a more complete discussion of these topics please refer to Cover and Thomas [19].


Figure 2.2. A block diagram for channel coding. The source word M is mapped by the encoder to x^n. x^n is then transmitted through the channel and received at the decoder as y^n. To ensure efficient and reliable communication, the goal is to design the mapping of M to x^n such that M can be recovered as M̂ from y^n with high probability.

Figure 2.2 shows the basic block diagram of channel coding. The encoder is fed a message, M, from the set of possible messages {1, 2, ..., M}. It then maps M to x^n(M) and transmits it through the channel. The decoder receives y^n, distributed according to the channel P(y^n|x^n). It then estimates M as M̂ through a decoding rule. An error occurs whenever M̂ ≠ M.

Formally, a discrete channel, denoted (X, P(y|x), Y), consists of finite input alphabet X, finite output alphabet Y, and a set of probability mass functions P(y|x) indexed by x. An (n, M) code C for channel (X, P(y|x), Y) consists of a set of messages, an encoding function, and a decoding function. The set of messages is {1, 2, ..., M}. The encoding function maps the input message to a codeword

x^n : {1, 2, ..., M} → X^n.

The decoding function, g(·), determines from y^n the value of x^n most likely sent by the encoder

g : Y^n → {1, 2, ..., M}.

The maximum probability of error for any codeword of the (n, M) code is given here

P_e^{(n)} = max_i P(g(y^n) ≠ i | x^n = x^n(i)).    (2.1)

The resulting rate R of the (n, M) code is R = (log_2 M)/n bits per transmission. We would like to design x^n(·) and g(·) so as to maximize R while holding P_e^{(n)} → 0 as n → ∞. Information theory gives the maximum achievable rate R meeting the constraint above, and defines this number as the channel capacity, C,

C = max_{P(x)} I(x; y).
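
As an illustration of this maximization, the Python sketch below numerically approximates the capacity of a binary symmetric channel by searching over input distributions; the crossover probability p = 0.11 is a hypothetical value, and the grid search is used only to mirror the defining optimization (for this channel the closed form C = 1 − H2(p) is of course available).

```python
import numpy as np

def mutual_information(px1, p):
    """I(x;y) in bits for a binary symmetric channel with crossover
    probability p and input distribution P(x = 1) = px1."""
    px = np.array([1.0 - px1, px1])
    W = np.array([[1.0 - p, p],        # P(y|x = 0)
                  [p, 1.0 - p]])       # P(y|x = 1)
    py = px @ W                        # output distribution P(y)
    I = 0.0
    for x in range(2):
        for y in range(2):
            if px[x] > 0 and W[x, y] > 0:
                I += px[x] * W[x, y] * np.log2(W[x, y] / py[y])
    return I

p = 0.11  # hypothetical crossover probability
C = max(mutual_information(q, p) for q in np.linspace(0.01, 0.99, 99))
print(f"C ~= {C:.4f} bits per channel use")  # matches 1 - H2(0.11)
```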

Binary linear block codes are a particular sub-class of channel codes. Defining k = log_2 M, an (n, k) linear block code C maps k input bits into n bits for transmission via a linear transformation. We refer to the transform matrix, G, as the generator matrix. We thus see that a codebook is defined as the range space of the matrix G. We further define a matrix, H, called the parity check matrix. We define H as orthogonal to G, and thus H describes the null space of G. As a result, all valid codewords of C have a zero product with the matrix H. The parity check matrix can thus be used to easily check whether y^n is a codeword of C or not. In addition to the convenience of defining these codes in terms of these matrices, linear codes are also comparatively easier to implement than general random codes. Further, linear codes are easier to analyze. Fortunately, as was shown by Gallager [27], linear codes can achieve channel capacity for the channels of interest in this dissertation (discrete memoryless channels with discrete input and output alphabets). Gallager's result means that linear codes offer no fundamental disadvantage.
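
The following Python sketch illustrates these definitions with a small hypothetical (7, 4) systematic code (not a code used elsewhere in this thesis): every codeword generated by G lies in the null space of H, and a corrupted word is flagged by a nonzero syndrome.

```python
import numpy as np

# A hypothetical (7, 4) code in systematic form: G = [I | P], H = [P^T | I].
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
G = np.hstack([np.eye(4, dtype=int), P])      # 4 x 7 generator matrix
H = np.hstack([P.T, np.eye(3, dtype=int)])    # 3 x 7 parity check matrix

msg = np.array([1, 0, 1, 1])                  # k = 4 input bits
codeword = msg @ G % 2                        # n = 7 transmitted bits

# Every valid codeword has a zero product with the parity check matrix.
assert np.all(H @ codeword % 2 == 0)

# A single bit flip is detected because the syndrome becomes nonzero.
received = codeword.copy()
received[2] ^= 1
syndrome = H @ received % 2
print("syndrome:", syndrome)   # nonzero -> received word is not in C
```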

2.3 Low-Density Parity-Check Codes and Graphical Models

In this section, we discuss a type of linear block channel code (Section 2.2) known as Low-Density Parity-Check (LDPC) codes. They are a powerful class of codes that can achieve rates within a small gap from capacity. Since they were originally developed by Gallager [26], LDPC codes are also referred to as Gallager codes. They were rediscovered by MacKay and Neal [52] and have since received considerable attention. In this section we give a brief overview of LDPC codes, and we frame our discussion in terms of graphical models.

As a particular type of linear block channel code, LDPC codes have an advantageous pseudo-random structure and a near Maximum Likelihood^1 (ML), low complexity, decoding algorithm. Factor graphs [42], a type of graphical model as in Figure 2.3 (with its corresponding code in Table 2.1), are often used to visually represent LDPC codes. These

^1 ML decoding provides the best possible performance with regard to the probability of error as defined in Equation (2.1). Unfortunately, ML decoding is known to be NP-hard. It is thus impractical to implement ML decoding for large block lengths, n.

1 1 0 0 0
1 1 1 0 0
0 1 1 0 1
0 0 1 1 1

Table 2.1. The parity check matrix corresponding to the graph shown in Figure 2.3. When- ever the element in the ith row and jth column is 1, an edge connects constraint node i with variable node j. A 0 in the corresponding position implies no edge connects the nodes. bipartite graphs consist of circles used to represent variables and squares (though more generally rectangles are used) used to represent constraints. For LDPC codes, we take the circles to represent bits and the squares as even parity constraints (applied to the bits to which they are connected).

f1 f2 f 3 f 4

ν ()X µ 2,2 2 3,4(X4)

X1 X2X 3X 4X 5

Figure 2.3. A sample factor graph. The five circles represent bits while the four squares represent the parity constraints of the code. Connections between the two types of nodes correspond to the parity check matrix given in Table 2.1. A labeling of the two types of messages passed between nodes when running the sum-product algorithm is given.

Decoding of LDPC codes is achieved using the sum-product algorithm [26, 61, 42]. The algorithm is an iterative inference algorithm that though exact on trees, is known to have good (but sub-optimal) empirical performance on loopy graphs such as those of an LDPC code. The algorithm is related to dynamic programming, and hence exact on trees. A loop is a path through the graph that has both the same starting and end point but travels along each of the edges in its path only once. The sum-product algorithm operates by iteratively estimating the marginal distribution for each bit. The algorithm initializes with estimates

10 of each marginal distribution (based on a priori or estimated statistics). In the first phase of each iteration, each variable node combines the estimates available and sends them along its edges (e.g., ν2,2 in Figure 2.3). In the second phase of each iteration, these estimates are merged, according to the constraint, at each check node for each bit node (e.g., µ3,4 in Figure 2.3). The marginals are used to estimate the bits after each iteration, and the set of estimates is checked against a stopping condition. If no stopping condition is met, the algorithm is stopped after a fixed number of iterations. Further details of the decoding algorithm are given in Section 2.3.1.

2.3.1 The Sum-Product Algorithm and LDPC codes

Though outlined above, in this section we specify the sum-product algorithm’s explicit form. The following expressions are given in terms of the graph in Figure 2.3, but can be naturally extended to more general cases. We begin by expressing the update equations in their most general form for any graphical model. We then give the simplified form of the update equations as they apply to LDPC codes in particular. A more complete discussion of these topics can be found in Kschischang, Frey, & Loeliger [42].

Before specifying the update equations, we note that N (i) denotes all the nodes directly connected to node i (i.e., the neighborhood of i) and N (i)\j denotes that same set less the node j. Further note that these equations often omit scaling factors. It is straightforward to scale the elements of a message so that they sum to 1, and are a proper distribution.

If each of the variables xi in Figure 2.3 is distributed over the alphabet X , then the messages to and from the node for xi are of the following forms respectively

µj,i(xi), xi ∈ X , and

νi,j(xi), xi ∈ X .

The variable node update equation to calculate the outgoing message from xi to fj, νi,j, from the incoming messages is given below

νi,j(xi) = ∏_{t∈N(i)\j} µt,i(xi).

We consider the update equations for the constraint nodes. We denote the mathematical form of the constraint as fj(xN(j)). For the even parity constraint, for example, f(x1 = 1, x2 = 1) = 1 because of the even parity sum, while f(x1 = 1, x2 = 0) = 0 because of the odd parity sum. We give the explicit form here

fj(xN(j)) = 1 if ⊕_{i∈N(j)} xi = 0, and fj(xN(j)) = 0 otherwise.

The check node update equation to calculate the outgoing message from fj to xi, µj,i, is given below

µj,i(xi) = ∑_{xN(j)\i} fj(xN(j)) ∏_{s∈N(j)\i} νs,j(xs).

Finally, for the variable node xi to collect all the messages at a particular iteration into a single estimate of its distribution, it uses the rule below, where ∝ means proportional to within a scaling factor

P(xi) ∝ ∏_{j∈N(i)} µj,i(xi).

We can make several simplifications to these equations (as in Richardson & Urbanke [70]), since this thesis primarily considers binary codes. We use these simplified forms throughout this thesis. Since we focus on binary alphabets, i.e. X = {0, 1}, the message update rules can also be simplified. In particular, for some arbitrary distribution on xi with p0 = P(x = 0) and p1 = P(x = 1), we represent this distribution as follows

τ = log(p0/p1).

Given τ, and recalling that p0 + p1 = 1, we can recover p0 and p1. This assures us that no information is lost in this representation

p0 = e^τ / (1 + e^τ),   p1 = 1 / (1 + e^τ).

Using the simplified message forms, we rewrite νi,j(xi) and µj,i(xi) as νi,j and µj,i respectively, no longer needing an index over multiple values. We give the resulting update equations below. The variable node update equation is

νi,j = ∑_{t∈N(i)\j} µt,i.

Since we focus on even parity sums, the update equation for the constraints over binary random variables is as follows

µj,i = log [ (1 + ∏_{s∈N(j)\i} tanh(νs,j/2)) / (1 − ∏_{s∈N(j)\i} tanh(νs,j/2)) ].
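As an illustration of these simplified update rules, the following is a minimal sketch in Python (the function names and the dictionary-of-messages representation are our own, not part of any particular decoder implementation). It applies the variable node sum and the check node tanh rule exactly as written above.

    import numpy as np

    def variable_to_check(incoming, exclude):
        """nu_{i,j}: sum of all incoming LLRs at a variable node (channel/prior
        term folded into `incoming`) except the one from the destination check."""
        return sum(llr for j, llr in incoming.items() if j != exclude)

    def check_to_variable(incoming, exclude):
        """mu_{j,i} for an even parity constraint: the tanh rule given above."""
        prod = 1.0
        for i, llr in incoming.items():
            if i != exclude:
                prod *= np.tanh(llr / 2.0)
        return np.log((1.0 + prod) / (1.0 - prod))

    # A degree-3 check node: message sent to variable 2 from the other two edges.
    print(check_to_variable({0: 4.0, 1: -3.5, 2: 0.7}, exclude=2))
    # A degree-3 variable node: message sent to check 3 from the other two edges.
    print(variable_to_check({0: 1.2, 1: -0.4, 3: 2.0}, exclude=3))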

2.3.2 LDPC Code Design

In this section, we consider the issue of LDPC code design. We begin with a discussion of how to specify a family of LDPC codes, and how to select a particular LDPC code from the family. Then, we discuss techniques to design those families to have favorable properties. We begin by describing degree distributions and how a pair of degree distributions specifies a family of LDPC codes. We then briefly discuss degree distribution design via density evolution and via EXIT charts. ...


Figure 2.4. A graphical representation of a random irregular LDPC code. The degree of each node is determined according to a degree distribution. Each edge is then randomly connected to a variable node and a check node according to some random permutation.

As opposed to the specific code implied by the graph in Figure 2.3, we now consider a more general approach to LDPC codes, as shown in Figure 2.4. Each variable node and each check node has some number of edges emanating from it. We refer to the number of edges emanating from a node as its degree. A degree distribution is then the probability distribution on the number of edges emanating from the nodes. That is, in generating a code, the degree of each node is determined by drawing it from the degree distribution for that type of node. We denote the distributions over the variable and check nodes with λ and ρ respectively. Note that if the degree is constant across all the variable nodes and is separately constant across all the check nodes, the code is said to be regular. Otherwise, the code is said to be irregular.

For convenience, we write the degree distributions in terms of their generating function

polynomials (λ(x), ρ(x)) of the following form

λ(x) = ∑_{i=2}^{dv,max} λi x^{i−1},   ρ(x) = ∑_{i=2}^{dc,max} ρi x^{i−1}.

In the above, λi and ρi represent the fraction of edges that are connected to nodes with degree i, for the variable nodes and check nodes respectively. We define dv,max and dc,max as the maximum variable and check degrees respectively. For these numbers to represent distributions, the values of λi and ρi must each sum to 1

∑_{i=2}^{dv,max} λi = 1,   ∑_{i=2}^{dc,max} ρi = 1.

Since the number of edges emanating from the variable nodes must equal the number of edges emanating from the check nodes, when we specify the degree distribution pair for a code we also specify its channel coding rate

r(λ, ρ) = 1 − ( ∫_0^1 ρ(x) dx ) / ( ∫_0^1 λ(x) dx ).

Here we have leveraged the convenience of representing the degree distributions with their generating functions. In the following we discuss techniques for the design of good degree distributions.
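Because ∫_0^1 ∑_i c_i x^{i−1} dx = ∑_i c_i/i, the design rate r(λ, ρ) can be computed directly from the degree distribution coefficients. A minimal sketch (our own helper function; the coefficients are assumed given as degree-to-edge-fraction dictionaries):

    def design_rate(lambda_coeffs, rho_coeffs):
        """r(lambda, rho) from edge-perspective degree distributions, given as
        dictionaries mapping degree i to edge fraction lambda_i (rho_i). Uses the
        closed form: integral_0^1 sum_i c_i x^(i-1) dx = sum_i c_i / i."""
        int_lambda = sum(c / i for i, c in lambda_coeffs.items())
        int_rho = sum(c / i for i, c in rho_coeffs.items())
        return 1.0 - int_rho / int_lambda

    # A regular (3, 6) LDPC ensemble: every edge sees a degree-3 variable node and
    # a degree-6 check node, giving the familiar channel coding rate of 1/2.
    print(design_rate({3: 1.0}, {6: 1.0}))   # 0.5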

When we specify a family of LDPC codes and draw a particular code, the question of selecting a “good” code arises. Fortunately, due to the concentration theorem of Richardson and Urbanke [70], we can be reasonably well assured that most codes in a “good” family are also “good” codes. In practice, we often ensure the quality of a particular code by “pruning” short loops. We search the code for the presence of short loops and remove (or prune) one of the edges to eliminate each short loop. Though LDPC codes are loopy by their very nature, removing loops of length 4 performs very well in practice [70].

Density Evolution

Density evolution, developed by Richardson and Urbanke [70], is an analysis tool to determine the average performance of a family of LDPC codes. Density evolution is an iterative process that begins by fixing a particular family (degree distribution pair) of LDPC codes and a parameterized channel model (discrete channels can always be parameterized). In each iteration, density evolution determines whether decoding will, on average, succeed for a particular channel parameter. A binary search is then performed to determine the “worst” channel parameter over which the code can be expected to successfully

decode. Knowing the “worst” channel a degree distribution pair can decode over gives a way to evaluate the quality of a code. If we compare two families with equal rates, the one that can decode over “worse” channel parameters is the better family. Techniques for designing good codes using density evolution are discussed in the papers [69, 15].
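Density evolution in general tracks entire message densities; over the binary erasure channel it collapses to a one-dimensional recursion, which makes for a compact illustration of the procedure just described. The sketch below is our own code, and the erasure-channel specialization is used purely for illustration (it is not the channel model used in this thesis): it runs the recursion x_{t+1} = ε λ(1 − ρ(1 − x_t)) and binary searches for the worst erasure probability ε that a degree distribution pair survives.

    def poly_eval(coeffs, x):
        """Evaluate an edge-perspective degree polynomial sum_i c_i x^(i-1)."""
        return sum(c * x ** (i - 1) for i, c in coeffs.items())

    def decodes(eps, lam, rho, iters=2000, tol=1e-12):
        """One density evolution run: the erasure probability recursion
        x_{t+1} = eps * lambda(1 - rho(1 - x_t)); success if x is driven to ~0."""
        x = eps
        for _ in range(iters):
            x = eps * poly_eval(lam, 1.0 - poly_eval(rho, 1.0 - x))
            if x < tol:
                return True
        return False

    def threshold(lam, rho, lo=0.0, hi=1.0, steps=30):
        """Binary search for the worst erasure probability the family survives."""
        for _ in range(steps):
            mid = (lo + hi) / 2.0
            lo, hi = (mid, hi) if decodes(mid, lam, rho) else (lo, mid)
        return lo

    # Regular (3, 6) ensemble: the known erasure threshold is roughly 0.4294.
    print(threshold({3: 1.0}, {6: 1.0}))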

In this thesis, we do not use density evolution as a design tool. Instead, we use code families found by the Communications Theory Lab (LTHC) at the École Polytechnique Fédérale de Lausanne (EPFL) using density evolution and published online [4]. Since the corpus is constantly being improved, better code families appear regularly. The specific degree distributions used in this thesis can be found in Appendix A.

EXIT Charts

Extrinsic Information Transfer (EXIT) charts are another popular technique for LDPC code design. An EXIT chart is constructed by treating the bit nodes and check nodes as two separate decoders that feed their outputs to each other. The behavior of each of these two component decoders is assessed by measuring a statistic of the input and output data. In particular, we measure the average mutual information between the decoder estimates of the codeword and the true codeword, denoted IA for the input and IE for the output. By measuring this statistic on both the input and output of each of the component decoders, a characterizing curve for the two decoders is generated. The particular shape of these curves is dependent on the code's degree distributions and the channel parameters. A sample plot of the curves for both the variable node decoder and the check node decoder is shown in Figure 2.5. To represent the input-output relationship of these two decoders, the curves are plotted on reflected axes. Readers are referred to the paper [7] for more detail about EXIT charts.

It was shown by Ashikhmin, Kramer, & ten Brink [7] that as long as the variable node decoder curve lies above the check node decoder curve when plotted in the manner of Figure 2.5, successful decoding, asymptotic in block length, is assured. Decoding can be modeled as following a path in the gap between the two curves. Thus if the two curves intersect, the path is impeded and decoding fails. It was further shown that the area of the gap between the two curves is proportional to the performance gap between the code and capacity. Thus the problem of code design is reduced to that of finding degree distribution pairs that minimize the gap between the two curves, conditioned on maintaining a positive gap. By fixing a series of check node degree distributions, a good pair of degree distributions can be found with a short series of linear programs.


Figure 2.5. An example of a good code degree distribution pair for channel coding rate R = 0.55, designed using EXIT charts. The tight fit of the two curves in the plot shows that the code operates near capacity. The dashed curve corresponds to the variable nodes while the solid curve corresponds to the check nodes. Decoding operates by moving from the left-hand side to the right-hand side of the graph, going through the gap between the two curves. Since the gap between the two curves is positive, decoding will follow the path between the curves successfully.

In comparison to density evolution, obtaining good degree distributions with EXIT charts is far less computationally intensive, but the results are less accurate. As stated above, the specific degree distributions used in this thesis can be found in Appendix A.

2.4 Distributed Source Coding

In this section we discuss an important extension of the traditional point-to-point source coding problem discussed in Section 2.1. We reach this extension by considering a series of smaller extensions. We begin by considering the point-to-point compression of two sources, x^n and y^n, as shown in Figure 2.6. Here, the encoder draws samples {(xi, yi)}, i = 1, . . . , n, in an i.i.d. manner with distribution P(x, y). Standard source coding theory says that achievable

rates are bounded by the following

R ≥ H(x, y).


Figure 2.6. An extension of the traditional single source and single destination source coding problem to two sources, xn and y n. Here, the minimum rate according to information theory is the joint entropy of the two sources, i.e. R ≥ H(x, y).

In order to extend this scenario to distributed source coding, we consider the case of separate encoders for each source, x^n and y^n. Each encoder operates without access to the other source. This scenario is illustrated in Figure 2.7. Using the techniques of Section 2.1, we would expect that each source could only be compressed to its marginal entropy. The anticipated region would be as follows

Rx ≥ H(x),Ry ≥ H(y).


Figure 2.7. The block diagram for the distributed source coding problem. Here, without communication between the two, one encoder maps the source symbols x^n to a bitstream while the other maps y^n to a separate bitstream. The decoder receives both bit streams and reconstructs x̂^n and ŷ^n. The minimum rate region for Rx and Ry is given in Equation (2.2).

Surprisingly, the work of Slepian and Wolf [78] shows that the achievable rate region is much larger. Specifically, their work shows that the region is bounded by the three inequalities in Equation (2.2),

Rx ≥ H(x|y),   Ry ≥ H(y|x),   Rx + Ry ≥ H(x, y).   (2.2)

The achievable rate region is shown graphically in Figure 2.8. Stated simply, the Slepian-Wolf theorem shows that there is no theoretical loss in compression performance (with respect to the sum-rate) between Figure 2.7 and Figure 2.6.
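As a small numerical illustration (our own helper functions, applied to a hypothetical joint distribution), one can compute the entropies appearing in Equation (2.2) from a joint pmf and test whether a candidate rate pair falls inside the Slepian-Wolf region:

    import numpy as np

    def entropies(p_xy):
        """Joint, marginal, and conditional entropies (in bits) of a 2x2 joint pmf
        whose rows index x and columns index y."""
        p_xy = np.asarray(p_xy, dtype=float)
        h = lambda p: -np.sum(p[p > 0] * np.log2(p[p > 0]))
        H_xy = h(p_xy)
        H_x, H_y = h(p_xy.sum(axis=1)), h(p_xy.sum(axis=0))
        return {"H(x,y)": H_xy, "H(x|y)": H_xy - H_y, "H(y|x)": H_xy - H_x}

    def in_slepian_wolf_region(Rx, Ry, p_xy):
        """Check the three inequalities of Equation (2.2)."""
        e = entropies(p_xy)
        return Rx >= e["H(x|y)"] and Ry >= e["H(y|x)"] and Rx + Ry >= e["H(x,y)"]

    # Example: x uniform, y obtained from x through a BSC with crossover 0.1.
    p_xy = np.array([[0.45, 0.05],
                     [0.05, 0.45]])
    print(entropies(p_xy))                          # H(x|y) = H(y|x) ~ 0.469 bits
    print(in_slepian_wolf_region(0.5, 1.0, p_xy))   # True: a corner point works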


Figure 2.8. A graphical representation of the Slepian-Wolf rate region as specified in Equation (2.2).

An important special case of distributed source coding is called source coding with side information. Also referred to as corner point coding (since the operating point is one of the corner points of the Slepian-Wolf rate region in Figure 2.8), a block diagram is shown in Figure 2.9. In this figure, the source y^n is sent directly to the decoder, but is unavailable to the encoder2 of x^n. The goal in this scenario is to leverage the availability of y^n at the decoder despite the encoder only having access to x^n. From Equation (2.2), we see that the achievable rate is as given here

Rx ≥ H(x|y).

Though not a focus of this thesis, lossy source coding with side information has been studied extensively in the literature. Wyner and Ziv [89] establish the rate-distortion tradeoff for this scenario. They show that for Gaussian sources, the rate-distortion bound for source coding with side information is equivalent to the rate-distortion bound for the scenario in which the side information is made available to the encoder as well as the decoder. In general, however, the achievable rate-distortion tradeoff is diminished relative to the scenario of the encoder having full access to the side information.

2We implicitly assume that y n has been transmitted losslessly to the decoder at rate H(y).


Figure 2.9. A special case of distributed source coding referred to as source coding with side information. Here, the correlated source y^n is available at the decoder, but is unavailable at the encoder. Using the Slepian-Wolf rate region of Equation (2.2), we see that the achievable rate is bounded as Rx ≥ H(x|y).

The Slepian-Wolf theorem was essential for establishing the potential benefits of distributed source coding. Unfortunately, the Slepian-Wolf theorem is asymptotic and non-constructive. In the chapters that follow, we discuss how to construct practical codes to achieve the Slepian-Wolf bound.

Chapter 3

A Practical Construction for Source Coding With Side Information

We begin our discussion of practical constructions for distributed source coding in this chapter by focusing on the special case of source coding with side information. As discussed in Section 2.4, source coding with side information is a special case of the general distributed source coding problem. The codes developed in this chapter are able to leverage the correlation between two sources even without direct access to the data itself. The development of these codes is crucial to the development of codes for the fully general distributed source coding problem, and will be used as a building block for the rest of the work in this thesis.

3.1 Introduction

Most proposed frameworks for source coding with side information to date consider only a limited correlation structure. Such restricted structure is convenient for analysis, but is an unrealistic assumption for scenarios of practical interest. A comprehensive solution requires

a single framework flexible enough to operate on arbitrary correlation structures, including the special case of no correlation1.

As discussed in Section 2.4, Slepian and Wolf [78] gave information theoretic bounds on lossless compression of two correlated sources with a common decoder. The Slepian-Wolf result is however asymptotic and non-constructive. We summarize here some of the related literature towards a practical construction for source coding with side information. A more complete discussion of practical efforts towards distributed source codes can be found in Section 5.1.1. One early approach to practical codes is Wyner's [90] formulation of a linear coset code for source coding with side information (assuming binary sources and symmetric correlation). Wyner's work further establishes the relationship between distributed source coding and channel coding. The late 1990s work of Zamir & Shamai [92] shows the theoretical feasibility of a lattice code framework for a lossy version of this problem (first considered by Wyner & Ziv [89] in the information theoretic setting). Following that, Pradhan & Ramchandran [66] developed DISCUS, a framework for the Wyner-Ziv problem, and described practical codes and achievable error probabilities. Recently, a flurry of work in the field provides practical constructions of various levels of sophistication for the Slepian-Wolf problem [66, 80, 83, 30, 95, 34, 31, 32, 46, 49, 50, 76, 56, 54, 2]. However, the constructions to date in the literature do not tackle the general source coding with side information problem for arbitrary sources. While the scenarios considered in these works are important operating regimes, they fail to cover the complete spectrum of diverse real-world sources and application scenarios requiring more flexible operating points.

In this chapter, we develop a practical and constructive framework for source coding with side information. A block diagram is shown in Figure 3.1. This construction is a foundation for our solution to the arbitrary distributed source coding problem (Chapter 5), the compression of encrypted data (Chapter 7), and the work throughout this dissertation. We provide our solution focusing on binary sources, though it is applicable to larger alphabets. We relate the source coding with side information problem to channel coding and show how to apply Low-Density Parity-Check codes to generate powerful codes for source coding with side information. We provide simulation results that demonstrate how this construction performs when applied to arbitrarily correlated binary random sources. Our algorithm is powerful enough to approach the Slepian-Wolf bound.

This chapter is organized as follows. In section 3.2 we discuss a solution for source

1In the case of 2 sources with no correlation between them, the problem of distributed source coding reduces to two separate traditional (single source, single destination) source coding problems.


Figure 3.1. The source coding with side information problem. The encoder seeks to compress x given that y is available at the decoder but unavailable to itself.

coding with side information based on linear block codes. Then, in section 3.3, we present a short summary of LDPC codes, a powerful class of linear block codes. Section 3.4 addresses how LDPC codes can be adapted to our source coding with side information solution and how the algorithm can be successful for arbitrary x, y correlations. Section 3.5 provides results and some related performance analysis. Finally, Section 3.6 concludes the chapter and discusses some open problems.

3.2 Linear Block Codes for Source Codes With Side Information

In this section, we discuss a code construction for source coding with side information. We base this discussion on the work of Pradhan and Ramchandran [66]. For clarity, we focus on binary alphabets. Extension to larger alphabets is largely straightforward, but is not discussed here. We focus on compressing x, given that side information, y, is available at the decoder. We begin with a simple, yet illustrative example. This example is then generalized to a framework for source coding with side information.

In information theoretic approaches, random binning (hashing) is a widely used technique [19] for compressing a data sequence. When binning, the space of length-n sequences is broken up into 2^{nR} bins, each of size 2^{n(1−R)}. Compression is achieved by sending bin indices, using nR bits each. Random binning is used in the proof of the Slepian-Wolf theorem (and entropy coding) [18, 19]. So long as the number of bins used satisfies the Slepian-Wolf bounds, the sources can be recovered with high probability using a decoder operating on “joint typicality” principles.

Key to a practical solution for source coding with side information is to use structured binning. Random binning is asymptotic, unstructured, and as such is intractable for practical implementation. Csiszár [20] showed that linear bins (codes) can be used. To leverage practical structured codes, it is useful to view the two sources as related via a “virtual” channel. In this channel, with input x and output y, z is seen as the noise pattern induced by the “channel” y = x ⊕ z. (3.1)

We have assumed an additive structure here as it is convenient and suffices for binary channels. In order to accommodate arbitrary joint distributions, we do not assume independence between2 x and z.

We thus use a linear block code C for binning. The bin index, denoted sx, of a source word x^n is the word's syndrome with respect to C's parity check matrix

s_x = x^n H_x^T. (3.2)

Thus, we encode x^n to sx via matrix multiplication. An unfortunate result of this approach is that it induces a source of error. To account for atypical source behavior, standard source coders allow variable length codewords. By using fixed rate linear block codes, atypical realizations of the source inevitably induce errors. Fortunately, as we consider larger block lengths such sequences become unlikely.
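A minimal sketch of this binning step (our own code, with a hypothetical (7, 4) Hamming parity check matrix): compression is nothing more than the modulo-2 matrix multiplication of Equation (3.2), mapping n source bits to n − k syndrome bits.

    import numpy as np

    def compress(x, H):
        """Bin a source word by its syndrome s_x = x H^T (mod 2), Equation (3.2):
        n source bits are mapped to n - k syndrome bits."""
        return (x @ H.T) % 2

    # Hypothetical example: the parity check matrix of a (7, 4) Hamming code
    # compresses 7 source bits to 3 bits (a rate of 3/7 bits per source bit).
    H = np.array([[1, 1, 0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0, 1, 0],
                  [0, 1, 1, 1, 0, 0, 1]])
    x = np.array([1, 0, 1, 1, 0, 0, 1])
    print(compress(x, H))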

3.2.1 Illustrative Example

As an example [66], let x and y be two uniform 3-bit sources. They are correlated such that they differ in at most 1 of their 3 bits (i.e., wt_H(x ⊕ y) ≤ 1). We calculate the following: H(x) = H(y) = 3 bits, H(x|y) = H(y|x) = 2 bits, and H(x, y) = 5 bits.

With direct access to y, it is easy for the encoder to convey x to the decoder using only 2 bits. In this case the encoder determines which of the difference patterns between x and y has occurred, z ∈ {000, 001, 010, 100}. (3.3)

Since there are only 4 possible values for z, the encoder needs only 2 bits to specify which of the 4 it is. The decoder can then reconstruct x by calculating x̂ = y ⊕ z.

2E.g. consider the binary non-symmetric channel; here the crossover probabilities are dependent upon the channel input.

When the encoder does not have access to y though, according to the Slepian-Wolf theorem, the encoder still only needs 2 bits to transmit x. To achieve this, we note that given any particular sequence y, there are only 4 possible values for x based on Equation (3.3). To leverage this we create the following cosets such that for any given y only one of the possible values of x lies in each set.

{000, 111} ⇒ 00,   {001, 110} ⇒ 01,   {010, 101} ⇒ 10,   {011, 100} ⇒ 11.   (3.4)

To compress x to 2 bits, we need only specify which one of these 4 cosets it occurs in. With knowledge of y and the coset, the decoder can uniquely determine the value of x.

To compress x to 2 bits, in effect we are using the (3, 1)-repetition code to define the parity check matrix H in Equation (3.2)

H = [ 1 0 1 ]
    [ 0 1 1 ].

Key to the success of this code is that the (3, 1)-repetition code has a minimum distance of 3. This minimum distance is sufficient to isolate all sequences with a 1 bit difference

(wt_H(x ⊕ y) ≤ 1) from any particular y. Thus knowledge of y and sx allows the decoder to determine x exactly.

3.2.2 Block Codes For Source Coding With Side Information

In the example in Section 3.2.1, the (3, 1)-repetition code is successful because it is strong enough to handle the channel corresponding to the correlation model considered. More generally, we need to use a linear block channel code strong enough to reach the capacity of the corresponding correlation channel. In this section, we define the code in terms of its generator matrix. As stated above, compression is achieved by finding the syndrome of the source word with respect to the parity check matrix corresponding to the generator matrix defined. Recall the correlation model of Equation (3.1).

As an example, we assume that z is independent of y, and that y is a sequence of Bernoulli-(1/2) random variables while z is a sequence of Bernoulli-(Q) random variables.

The Slepian-Wolf theorem says we can compress x to Rx = H(x|y) = H_2(Q). The channel corresponding to this correlation structure is a binary symmetric channel (BSC) with crossover parameter Q. Thus, a channel code designed for a BSC with parameter Q could be used to successfully bin x for compression. We apply this principle in general to the correlation structure of x and y.

For the rest of this section, we discuss an ML (Maximum Likelihood) decoding algorithm for linear block codes applied to source coding with side information. We assume that the code has been selected appropriately for the correlation. Though the algorithm presented here is impractical (due to the use of an ML decoder), an implementable (but sub-optimal) algorithm is given in Section 3.4. We give pseudo-code for our decoding algorithm below.

In the first step of the pseudo code, we write the source as a function of the compression code and the compressed value. We use the vector ux to select which codeword is necessary to satisfy these equations. We denote the codeword of Cx closest to x^n as uxGx.

In Step 1, s̄x represents a zero-padded version of the value sx; s̄x is the element of the coset formed by Cx that has minimum Hamming weight. To zero pad, insert zeros so that they align with the systematic bits of the code3. For example, consider the code defined by the (7, 4)-Hamming code generator matrix in Equation (3.5). Since the third through sixth columns form the systematic bits, zeros are inserted into the third through sixth positions.

If Gx = [ 1 1 1 0 0 0 1 ]
        [ 1 1 0 1 0 0 0 ]
        [ 1 0 0 0 1 0 1 ]
        [ 0 1 0 0 0 1 1 ]   and   sx = [1 0 1],   then   s̄x = [1 0 0 0 0 0 1].   (3.5)

In Step 2, we decode the quantity [s̄x ⊕ y^n] with respect to Gx to give ẑ^n and ux. Finally, in Step 3, we combine y^n and ẑ^n to recover x̂^n.

Pseudo code for the decoding algorithm is provided here:

1. Expand the source equations: x^n = uxGx ⊕ s̄x and y^n = x^n ⊕ z^n = uxGx ⊕ s̄x ⊕ z^n.

2. ML decode ẑ^n = uxGx ⊕ [s̄x ⊕ y^n].

3. Recover x̂^n = y^n ⊕ ẑ^n using knowledge of ẑ^n and y^n.

3 More generally, we need only select any kx linearly independent columns of Gx since bit ordering does not impact code performance in the memoryless case.

This decoding prescription reinforces our insight that Slepian-Wolf code design is really a channel coding problem, since Step 2 requires a channel decoder4. We leverage this insight in the design requirements for C. This code needs to be from a “good” family of codes (“good” meaning that the probability of a decoding error goes to zero as the block length goes to infinity). If the code used is bounded away from capacity, its application to distributed source coding will be correspondingly bounded away from the Slepian-Wolf bound. Using random codes, code design randomly generates Gx to bin x^n. These requirements can be approached with large block lengths and pseudo-random codes [20]. Though we do not prove that the block code construction here achieves the Slepian-Wolf limit, we demonstrate it via empirical methods (Section 3.5).
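The following is a brute-force sketch of the three decoding steps above, using the (3, 1)-repetition code of Section 3.2.1 (our own code; the exhaustive searches stand in for the ML decoder of Step 2, which practical systems replace with the LDPC decoding of Section 3.4).

    import numpy as np
    from itertools import product

    # (3, 1)-repetition code of Section 3.2.1.
    G = np.array([[1, 1, 1]])
    H = np.array([[1, 0, 1],
                  [0, 1, 1]])

    def syndrome(w):
        return tuple((w @ H.T) % 2)

    def min_weight_coset_member(s):
        """s_bar: a minimum Hamming weight word whose syndrome is s."""
        words = [np.array(w) for w in product([0, 1], repeat=3)]
        return min((w for w in words if syndrome(w) == s), key=lambda w: int(w.sum()))

    def decode(x_syndrome, y):
        """Steps 1-3: shift by s_bar, find the closest codeword, undo the shift."""
        s_bar = min_weight_coset_member(x_syndrome)
        shifted = s_bar ^ y
        codewords = [(u @ G) % 2 for u in (np.array([0]), np.array([1]))]
        closest = min(codewords, key=lambda c: int(np.sum(c ^ shifted)))
        z_hat = closest ^ shifted
        return y ^ z_hat

    x = np.array([0, 1, 0])
    y = np.array([1, 1, 0])           # differs from x in exactly one bit
    print(decode(syndrome(x), y))     # recovers [0 1 0]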

3.3 Low-Density Parity-Check Codes

In this section, we highlight some of the background for LDPC (Low-Density Parity-Check) codes. The reader is referred to Section 2.3 for further details. Low-Density Parity-Check codes are a class of powerful linear block codes and thus a natural choice for our distributed source coding framework. We use factor graphs [42] (as in Figure 2.3) to visually represent LDPC codes. These bipartite graphs consist of circles used to represent bits and squares used to represent constraints.


Figure 3.2. A sample factor graph. The circles represent bits while squares represent the parity constraints of the code. A labeling of the two types of messages passed from node to node in the decoding algorithm is given.

4A formal derivation of the Slepian-Wolf error exponent from the random coding error exponent is given in Appendix B.

Decoding of LDPC codes is achieved using the sum-product algorithm [26, 61, 42]. The algorithm is an iterative inference algorithm that, though exact on trees, is known to have good (but sub-optimal) empirical performance on loopy graphs such as those of an LDPC code. The sum-product algorithm operates by iteratively estimating the marginal distribution for each bit. In the first phase of each iteration, each variable node combines the estimates available and sends them along its edges (e.g., νi,j in Figure 3.2). In the second phase of each iteration, these estimates are merged, according to the constraint, at each check node for each bit node (e.g., µj+1,i+2 in Figure 3.2). The marginals are used to estimate the bits after each iteration; if the maximum number of iterations has not been reached, the set of estimates is checked against a stopping condition.

3.4 Practical Algorithm

In this section we combine the LDPC codes of Section 3.3 with the linear block coding construction of Section 3.2. We first describe the graphical model for source coding with side information. We then show how this model gracefully degrades to entropy coding. Practical decoding is achieved via the sum-product algorithm over the resulting graph. Since this graph is loopy, sum-product is sub-optimal but is empirically good. We assume knowledge of the source statistics at the decoder. Discussion of blind operation is considered in Chapter 6.

The graphical model for source coding with side information is shown in Figure 3.3. The variables considered are the n bits of sources x and y (middle and bottom rows of circles respectively), labeled xi and yi, and the mx bits of the compressed source x (top row), labeled sxj . These variables are constrained by the LDPC codes used to compress x (top row of squares), labeled fxj , plus the correlation between sources x and y (bottom row), labeled fi. The correlation constraints between the sources x and y are distinct and different from the parity constraints considered in Section 3.3.

This model reduces gracefully to an entropy code as a special case of source coding with side information. Whenever side information is either not present or is uncorrelated with the source, the problem reduces to single source and single destination source coding. Fortunately, the graphical model of Figure 3.3 gracefully degrades to the graphical model for entropy coding in Figure 3.4. In this graph, the variable nodes for the second source y are removed. In this graph, the correlation constraints fi now represent source constraints.


Figure 3.3. The factor graph for source coding with side information. Circles labeled xi or yi represent source bits. Circles labeled sxj represent the compressed bits. Squares labeled fxj represent the LDPC code applied to the source. Squares labeled fi represent the correlation between bits xi and yi.


Figure 3.4. The factor graph for entropy coding. Circles labeled xi represent source bits.

Circles labeled sxj represent the compressed bits. Squares labeled fxj represent the LDPC code applied to the source. Squares labeled fi represent the source constraint on xi. This model develops as a graceful degradation of the source model of Figure 3.3 in the absence of correlated side information.

3.4.1 Application of Sum-Product Algorithm

We present the message update equations here. We first describe the update equations, and then summarize with pseudo code. Throughout this section, take i ∈ {1, . . . , n} and

j ∈ {1, . . . , mx}. The bits of source x are labeled xi, the compressed bits are labeled sxj, the LDPC constraints are labeled fxj, and the correlation constraints are labeled fi. Since we consider bits, messages are equal to the log of the ratio of the two probabilities. We use µ and ν to denote messages to and from variable nodes respectively (as in Figure 3.2), and subscripts indicate the source and destination respectively. For example, the form of the message from the check node fi to variable node xi is below

µfi,xi = log P (xi = 0)/P (xi = 1).

Messages are initialized using P (x, y).

Since the compressed bits are known and are connected to only one other node, the messages passed to them are insignificant. Their output messages are their observed value.

The messages passed to and from the LDPC constraint node fxj and the source bit xi are in Equation (3.6) and are based on the formulas in Section 2.3.1. Here N(i) is the set of nodes connected to node i and N(i)\j is N(i) less node j. These updates are similar to those used for channel coding, modified to include the compressed bits;

µ_{fxj,xi} = (−1)^{sxj} log [ (1 + ∏_{xk∈N(fxj)\{xi}} tanh(ν_{xk,fxj}/2)) / (1 − ∏_{xk∈N(fxj)\{xi}} tanh(ν_{xk,fxj}/2)) ]   (3.6)

and ν_{xi,fxj} = µ_{fi,xi} + ∑_{fxk∈N(xi)\fxj} µ_{fxk,xi}.

Finally we consider the correlation constraint updates in Equation (3.7). We ignore the messages sent to the variables yi since they are terminal nodes. The message updates in

Equation (3.7) convert the likelihood of yi to the likelihood of xi based on the correlation between the sources

µ_{fi,xi} = log [ ((2 − P(xi=1|yi=0) − P(xi=1|yi=1)) + (P(xi=1|yi=1) − P(xi=1|yi=0)) tanh(ν_{yi,fi}/2)) / ((P(xi=1|yi=0) + P(xi=1|yi=1)) + (P(xi=1|yi=0) − P(xi=1|yi=1)) tanh(ν_{yi,fi}/2)) ].   (3.7)
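A minimal sketch of the correlation constraint update of Equation (3.7) follows (our own function; the conditional probabilities P(xi = 1 | yi) are assumed known, consistent with the assumption of known source statistics stated at the start of this section).

    import numpy as np

    def correlation_message(nu_y, p1_given_y0, p1_given_y1):
        """mu_{fi,xi} of Equation (3.7): converts the LLR of y_i (nu_y) into an
        LLR for x_i via the conditional probabilities P(x_i = 1 | y_i)."""
        a, b = p1_given_y0, p1_given_y1
        t = np.tanh(nu_y / 2.0)
        num = (2.0 - a - b) + (b - a) * t
        den = (a + b) + (a - b) * t
        return np.log(num / den)

    # Sanity check: when y_i is known to be 0 (nu_y large, tanh -> 1) the message
    # reduces to log P(x_i = 0 | y_i = 0) / P(x_i = 1 | y_i = 0).
    print(correlation_message(50.0, 0.1, 0.2))   # ~ 2.197
    print(np.log(0.9 / 0.1))                     # ~ 2.197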

We summarize the full algorithm with the pseudo code below.

1. Initialize messages.

xi nodes: ν_{xi,fi} = log P(xi = 0)/P(xi = 1), and ν_{xi,fxj} = log P(xi = 0)/P(xi = 1).
sxj nodes: ν_{sxj,fxj} = sxj.
yi nodes: ν_{yi,fi} = log P(yi = 0)/P(yi = 1) = +∞ if yi = 0, and −∞ if yi = 1.

2. Update messages sent by constraint nodes.
fxj nodes:

µ_{fxj,xi} = (−1)^{sxj} log [ (1 + ∏_{xk∈N(fxj)\{xi}} tanh(ν_{xk,fxj}/2)) / (1 − ∏_{xk∈N(fxj)\{xi}} tanh(ν_{xk,fxj}/2)) ].

fi nodes:

µ_{fi,xi} = log [ ((2 − P(xi=1|yi=0) − P(xi=1|yi=1)) + (P(xi=1|yi=1) − P(xi=1|yi=0)) tanh(ν_{yi,fi}/2)) / ((P(xi=1|yi=0) + P(xi=1|yi=1)) + (P(xi=1|yi=0) − P(xi=1|yi=1)) tanh(ν_{yi,fi}/2)) ].

3. Update messages sent by variable nodes.

xi nodes:

ν_{xi,fxj} = µ_{fi,xi} + ∑_{fxk∈N(xi)\fxj} µ_{fxk,xi},

and ν_{xi,fi} = ∑_{fxk∈N(xi)} µ_{fxk,xi}.

sxj nodes: Unchanged from Step 1.

yi nodes: Unchanged from Step 1.

4. Determine the best estimate of x^n:

x̂i = 0 if µ_{fi,xi} + ∑_{fxk∈N(xi)} µ_{fxk,xi} ≥ 0, and x̂i = 1 otherwise.

5. Check the stopping condition. If the maximum number of iterations has been reached, quit and return x̂^n. Else if sx = x̂^n H_x^T, quit and return x̂^n. Otherwise, return to Step 2.

In the following section, we give the results of simulations using this algorithm.

3.5 Simulations and Results

In this section, we present simulation results. We measure the quality of a simulation with its empirical probability of error, the proportion of bits decoded in error. Since our scheme is asymptotically lossless, we make comparisons with the lossless bounds of Slepian-Wolf despite our measured probabilities of error. Though we focus on BER (bit error rate), we also consider SER (symbol error rate, a symbol error occurring whenever any one or more bits of a block are in error). Bit error rate is a more appropriate measure for source coding problems since each reconstructed bit is of use to the receiver. In channel coding, by contrast, SER is the appropriate measure since not even one source bit can be recovered reliably when the symbol is in error.

P(x, y)     y = 0               y = 1
x = 0       (1 − γ)(1 − α)      (1 − γ)α
x = 1       γβ                  γ(1 − β)

Table 3.1. Structure of the joint distribution between x and y considered in Section 3.5.

We observed that in our simulations the empirical SER and BER measures resulted in very similar behavior. We present both SER and BER results for our first experiments, but omit SER results after this demonstration (Figures 3.5 and 3.6).

Each simulation consisted of selecting a particular code, source distribution, and rate combination. Each empirical probability is generated by simulation of at least one million bits, broken into blocks. We limited decoding to at most 50 iterations and used LDPC code designs as described in Appendix A.

We consider arbitrary distributions over two correlated binary sources, parameterized via 0 ≤ α, β, γ ≤ 1. The distribution considered is P(x = 0, y = 0) = (1 − γ)(1 − α), P(x = 0, y = 1) = (1 − γ)α, P(x = 1, y = 0) = γβ, and P(x = 1, y = 1) = γ(1 − β). We give a diagram of this in Table 3.1.
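For concreteness, the following sketch (our own code) builds this joint pmf, computes H(x|y) for the parameters used in Section 3.5.1, and draws i.i.d. sample pairs from it.

    import numpy as np

    def joint_pmf(alpha, beta, gamma):
        """The 2x2 joint distribution of Table 3.1 (rows index x, columns index y)."""
        return np.array([[(1 - gamma) * (1 - alpha), (1 - gamma) * alpha],
                         [gamma * beta,              gamma * (1 - beta)]])

    def sample_source(alpha, beta, gamma, n, seed=0):
        """Draw n i.i.d. pairs (x_i, y_i) from that joint distribution."""
        rng = np.random.default_rng(seed)
        x = (rng.random(n) < gamma).astype(int)        # P(x = 1) = gamma
        p_y1 = np.where(x == 0, alpha, 1.0 - beta)     # P(y = 1 | x)
        y = (rng.random(n) < p_y1).astype(int)
        return x, y

    # Parameters used in Section 3.5.1: (alpha, beta, gamma) = (0.1, 0.1, 0.5),
    # for which H(x|y) is about 0.47 bits per source bit.
    p = joint_pmf(0.1, 0.1, 0.5)
    h = lambda q: -np.sum(q[q > 0] * np.log2(q[q > 0]))
    print(h(p) - h(p.sum(axis=0)))     # H(x|y) = H(x,y) - H(y)
    x, y = sample_source(0.1, 0.1, 0.5, n=100_000)
    print(np.mean(x != y))             # empirical disagreement rate, about 0.1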

3.5.1 Compression With Side Information Results

We present simulation results of source coding with side information in this section. In each simulation, we assume y is available perfectly at the decoder (and has been transmitted at Ry = H(y)) and Rx = 0.5. In the following plots, we plot the minimum rate (conditional entropy) on the horizontal axis and the resulting probability of error on the vertical axis. For example, consider a block of n = 100,000 bits from a source with (α, β, γ) = (0.1, 0.1, 0.5), H(x) = H(y) = 1, and H(x|y) = 0.47. This example results in a plot point of (0.47, 4 × 10^−3) for the BER.

Initially, we restricted the joint distribution such that α = β and γ = 0.5. The results consider a variety of block lengths: 10^3, 5 × 10^3, 10^4, 2 × 10^4, 5 × 10^4, and 10^5. The results are plotted in terms of BER in Figure 3.5 and SER in Figure 3.6 (the results for block lengths 5 × 10^4 and 10^5 are omitted). The results improve, for both BER and SER, as the number of bits used above the Slepian-Wolf minimum grows and as the block length grows. Note that for these two plots we give single standard deviation error bars. The error bars for the other plots in this chapter are similar to those of the BER result plot of Figure 3.5.

Figure 3.7 plots the results obtained after considering arbitrary joint distributions, for a block length of 10^4. Points are grouped in terms of the marginal entropy of source x, and plotted in terms of the minimum rate. Performance improves as the gap from the Slepian-Wolf bound grows, similar to the results presented in Figure 3.5.


Figure 3.5. Source coding with side information simulation results for several block lengths and varying the correlation parameter α(= β). Source x is compressed to rate 1/2 while source y is available at the decoder. Each data point is the result of the simulation of at least 10^6 bits. The results in this figure are plotted in terms of their bit error rate (BER). In addition, we plot one standard deviation error bars. Similar error bars result for all BER results in this chapter, but are omitted for clarity.

3.5.2 Compression of a Single Source

The results from compression of a single source are presented in Figure 3.8. The source entropy is varied and tests are run for several block lengths: 10^3, 5 × 10^3, and 10^4. The results are similar to those obtained above for source coding with side information (see Figure 3.5). Performance improves with growth in both the code rate used above entropy and the block length.


Figure 3.6. Results of the same experiments as in Figure 3.5 plotted in terms of the symbol error rates (SER). Each value is the result of the simulation of at least 10^3 symbols (blocks). In addition, we plot one standard deviation error bars.


Figure 3.7. Simulation results compressing source x to rate 1/2 while source y is available at the decoder. The simulations are of source coding with side information, as in Figure 3.5. Each data point is the result of the simulation of at least 10^6 bits. The block length is fixed to 10,000 and a more general correlation structure is considered. The type of data point used indicates the marginal entropy of the source x.

3.6 Conclusions

This chapter has provided a practical solution for the problem of source coding with side information. We began by providing a general linear block coding solution and then


Figure 3.8. Simulation results from entropy coding source x to rate 1/2. This plot considers single source results for several block lengths and source statistics. Each data point is the result of the simulation of at least 10^6 bits. Resulting coding performance is similar to that obtained when source coding with side information as in Figure 3.5.

specialized to LDPC codes. The simulation results demonstrated the strength of this construction. In the following chapters we build upon this construction. In the next chapter, we extend it to source coding with side information for block stationary sources.

Chapter 4

Slepian-Wolf Coding for Parallel Sources: Design and Error-Exponent Analysis

In this chapter we extend the codes developed in the last chapter to parallel sources. Most of the constructions proposed for distributed source coding are designed for independent identically distributed random source models, which fail to capture the high degree of correlation inherent in real-world sources like image and video signals. Motivated by its success in the modeling of real-world sources, in this chapter we invoke a parallel source model for the distributed source coding problem. The traditional way to deal with parallel source models has been through the so-called water-filling prescription of using multiple shorter codes, each matched to one of the source components. Our main contribution in this chapter is the proposal of a single long-block-length distributed source code that takes into account the parallel nature of the source. We first present an information-theoretic analysis of the gains made possible by this approach by using the well-developed theory of error exponents. More significantly, our study exposes a new code design problem which we describe for an LDPC framework. We show simulation results to validate our design.

35 4.1 Introduction

The distributed source coding problem (and the special case of source coding with side information) is finding application in a growing variety of areas. As previously discussed, these applications range from dense sensor networks to robust video transmission for wireless multimedia applications [68], to compression of encrypted data [38]. Unfortunately, the analysis and coding prescriptions for distributed source coding have initially been limited to the case of independent and identically distributed (i.i.d.) source and side-information correlation models. More importantly, recent practical code construction methods for the distributed source coding problem, starting from [64] to more recent state-of-the-art extensions [2, 94, 48, 14, 16, 79] (as discussed further in Chapter 5), have almost exclusively focused on the i.i.d. source/side-information correlation models.

On the other hand, real-world image and video sources, that have been suggested as promising application ground for distributed source coding, are characterized by a high degree of correlation/memory, and thus differ significantly from i.i.d. models. As has been witnessed in the success of block-based transform coding for contemporary image and video compression standards [1, 17, 85], parallel/mixture source models better describe real-world sources. This motivated the work in [62], extending the above-mentioned information theoretic analysis to the case of colored source/side-information correlation models. Though this work has implications for both lossy and lossless coding problems, in the interests of clear explanations, in this chapter we consider only the lossless Slepian-Wolf coding of an idealized source model.

Parallel sources have been examined in the context of Slepian-Wolf design before. In particular, the standard approach to the design of Slepian-Wolf codes for a non-white source is to transform it into a number of uncorrelated parallel sources, and then code these residual sources independently. In this chapter we propose to code across such parallel sources, which can result in significant gains when the number of samples of each source is relatively small. To understand where the gains come from in this strategy, we turn to the underlying information-theory of the problem.

As is already well-understood [90, 64], Slepian-Wolf code design is really a channel coding problem (as is discussed in Section 3.2)1. Information theorists have also long understood that independent coding across parallel channels is capacity-achieving. Thus, at first blush,

1A formal derivation of the Slepian-Wolf error exponent from the random coding error exponent is given in Appendix B.

one might think there is nothing to gain from coupling the code design across sources. However, what is also well-understood from an information-theoretic perspective is that, while not helping from a capacity perspective, achieving capacity with a low probability of decoding error requires a long block-length. Thus, in applications where the block-lengths of each of the parallel channels are small, we can significantly reduce the probability of decoding error by coding across the parallel channels because the resulting block-length is the sum of the individual block-lengths. Alternately, we can achieve a target reliability at a higher communication rate.

Because Slepian-Wolf coding is a channel coding problem, these ideas bear directly on the design of Slepian-Wolf codes for parallel sources. In particular, in many media applications, sources are very non-stationary. Therefore, the block-lengths of each of the parallel sources can be quite short. Hence, when designing Slepian-Wolf codes for parallel sources, each of which has a short block-length, we improve reliability by coding across the sources. Or, alternately, we can achieve a target reliability at a lower encoding rate.

It is in this context that we propose a novel coding strategy - the main contribution of this chapter - which offers a single long block-length code that accounts for the parallel nature of the source/side-information correlation model. This approach exposes a new code design problem, for which we present a practical solution based on the Low-Density Parity-Check code (LDPC) framework [26]. We focus on the source coding with side information problem here, and build on the solution presented in Chapter 3.

This chapter is organized as follows. In Section 4.2, we present an error-exponent-based theoretical analysis that quantifies the performance gains of coding across parallel sources. In Section 4.3, we present the new code design problem in the LDPC framework. In Section 4.4 we provide experimental results to validate the proposed code design. Finally, Section 4.5 offers conclusions and discusses some open problems.

4.2 Error Exponent Analysis

In this section we introduce the source coding problem as we consider it, and analyze its error exponent behavior. The source is composed of M parallel sources, each a sequence of i.i.d. symbols distributed according to p_{xk,yk} for k = 1, 2, . . . , M. We observe nk samples of source k, giving a total block-length N = ∑_k nk. The length-N vector of source samples x = x_1^N is observed at the encoder, while the vector of side information

samples y = y_1^N is observed at the decoder. Their joint distribution is given by p_{x,y}(x, y) = ∏_{k=1}^{M} ∏_{l=1}^{nk} p_{xk,yk}(xk,l, yk,l), where (xk,l, yk,l) is the lth symbol pair of source k. In [28] Gallager bounds the error probability of maximum-likelihood decoding of a randomly binned source x when side information y is observed at the decoder as

Pr[error] ≤ exp{ −ρNR + log ∑_y [ ∑_x p_{x,y}(x, y)^{1/(1+ρ)} ]^{1+ρ} }.   (4.1)

The bound holds for all 0 ≤ ρ ≤ 1. Substituting in the distribution for parallel sources, and defining λk = nk/N to be the fraction of observed samples that are from source k, gives

Pr[error] ≤ exp{ −N sup_{0≤ρ≤1} [ ρR − ∑_k λk log ∑_y ( ∑_x p_{xk,yk}(x, y)^{1/(1+ρ)} )^{1+ρ} ] }.   (4.2)

To understand why coding across parallel sources helps, we now relate Equation (4.2) to the error exponent achieved by the independent encoding of each source. In particular, assume that source k is encoded at rate Rk. If the total number of bits available for source coding is NR, the Rk are constrained so that ∑_k nkRk ≤ NR. We assume equality. Since nk = λkN, the constraint gives ∑_k λkRk = R, where R is the average Slepian-Wolf coding rate. Substituting ∑_k λkRk for R in Equation (4.2) gives the following expression for coding across the parallel sources

Pr[error] ≤ inf_{0≤ρ≤1} ∏_{k=1}^{M} exp{ −λkN [ ρRk − log ∑_y ( ∑_x p_{xk,yk}(x, y)^{1/(1+ρ)} )^{1+ρ} ] }.   (4.3)

This expression lets us understand why coding across parallel sources decreases the error probability. Each of the M factors in Equation (4.3) is a bound on the error probability of the respective parallel source if encoded independently. If encoded independently, we could further optimize each bound separately over ρ. However, if the λk were fixed and we let N grow, the overall probability of error would be limited by the worst-case source, giving the following expression for the overall error resulting from separate coding of the parallel sources

Pr[error] ≤ sup_k inf_{0≤ρ≤1, Rk s.t. ∑_k λkRk = R} exp{ −λkN [ ρRk − log ∑_y ( ∑_x p_{xk,yk}(x, y)^{1/(1+ρ)} )^{1+ρ} ] }.   (4.4)

Contrasting Equation (4.3) with Equation (4.4), we see that while in the latter we can optimize the ρ parameter for each source, in Equation (4.3) we gain from the longer block-length. Indeed, Equation (4.3) strictly dominates Equation (4.4). This can be seen by letting

k∗ be the (worst-case) source that maximizes Equation (4.4), with minimizing ρ = ρ∗ and rate allocations Rk∗ such that ∑_k λkRk∗ = R. Substituting these values into Equation (4.3) results in a lower bound on the probability of error. This is because each of the M factors in Equation (4.3) must be strictly lower than one for all ρ, since each Rk must be greater than the conditional entropy H(xk|yk), else Equation (4.4) would not have a positive error exponent. Thus, as long as the rate is above the conditional entropy, the error exponent is positive for all ρ (and hence for ρ∗, even though ρ∗ is not necessarily the best ρ for source k ≠ k∗).

To aid understanding of the comparison of the bounds in Equations (4.3) and (4.4), consider the plot shown in Figure 4.1. For this plot, we consider the scenario of two sources

(M = 2) with N = 10,000 and λ1 = λ2 = 0.5 (this scenario is also simulated in Section 4.4). For the two sources we consider the following distributions

p_{x1,y1}(x, y) = 0.4530 if x = y, and 0.0470 if x ≠ y,

and

p_{x2,y2}(x, y) = 0.4363 if x = y, and 0.0637 if x ≠ y.

The bound of Equation (4.3) is plotted with a dashed line while the bound of Equation (4.4) is plotted with a solid line. As can be seen in this figure, joint coding across parallel sources offers a far better error bound.
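The comparison can also be reproduced numerically. The sketch below is our own rough implementation, not the procedure used to produce Figure 4.1: it uses a coarse grid search over ρ and over the rate split rather than an exact optimization, approximates the separate-coding bound of Equation (4.4) by the worst per-source bound under the best rate split, and converts rates from bits to nats before applying the exponent expressions.

    import numpy as np

    def e0(rho, p_xy):
        """Per-symbol term: log sum_y (sum_x p(x,y)^(1/(1+rho)))^(1+rho)."""
        return np.log(np.sum(np.sum(p_xy ** (1.0 / (1.0 + rho)), axis=0) ** (1.0 + rho)))

    RHOS = np.linspace(1e-3, 1.0, 200)

    def source_bound(R, n, p_xy):
        """Equation (4.1) for one i.i.d. source coded alone at rate R (in nats)."""
        exponent = max(rho * R - e0(rho, p_xy) for rho in RHOS)
        return np.exp(-n * max(exponent, 0.0))

    def joint_bound(R, N, sources, lams):
        """Equations (4.2)/(4.3): one length-N code across the parallel sources."""
        exponent = max(rho * R - sum(l * e0(rho, p) for l, p in zip(lams, sources))
                       for rho in RHOS)
        return np.exp(-N * max(exponent, 0.0))

    def separate_bound(R, N, sources, lams):
        """Equation (4.4): per-source codes; best rate split, worst source dominates."""
        best = np.inf
        for R1 in np.linspace(0.0, R / lams[0], 201):
            R2 = (R - lams[0] * R1) / lams[1]
            worst = max(source_bound(Rk, lam * N, p)
                        for Rk, lam, p in zip((R1, R2), lams, sources))
            best = min(best, worst)
        return best

    # The two-source example of Figure 4.1: N = 10,000, lambda_1 = lambda_2 = 0.5.
    p1 = np.array([[0.4530, 0.0470], [0.0470, 0.4530]])
    p2 = np.array([[0.4363, 0.0637], [0.0637, 0.4363]])
    for R_bits in (0.6, 0.7, 0.8):
        R = R_bits * np.log(2.0)      # the exponent expressions use natural logs
        print(R_bits,
              joint_bound(R, 10_000, (p1, p2), (0.5, 0.5)),
              separate_bound(R, 10_000, (p1, p2), (0.5, 0.5)))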

4.3 LDPC Code Design

In this section we shift away from information theoretic analysis to the practical problem of code design for parallel sources. In Section 4.2 we used Gallager’s error exponent analysis for the Slepian-Wolf problem. In contrast, we now consider the code design problem using channel code design techniques. The relationship of the Slepian-Wolf code design problem to the channel coding problem has long been recognized [90], and Appendix B makes explicit the relationship between the error exponents for these two problems.

Section 4.2 considers only two approaches (fully separate coding or joint coding across sources). Practical code design raises other approaches. One such approach would be to use the best available rate R code for coding across sources. In this section, we consider a code design approach that leverages the parallel nature of the source for better


Figure 4.1. A comparison of the bound presented in Equation (4.3) for joint coding versus the bound presented in Equation (4.4) for separate coding. The details of the scenario considered are discussed in the text. For each value of R, the values of R1 and R2 are chosen to give the least upper bound on the probability of error. This demonstrates the improved performance achievable with joint coding across parallel sources.

performance. Note, throughout the following we discuss the rate, R, of a code as its source coding rate. Assuming binary random variables, R is related to the channel coding rate,

Rc as R = 1 − Rc.

Although several types of codes can be used for source coding with side information, we consider LDPC codes here. These codes are particularly suited to the application of coding across parallel sources, due to their performance over large block lengths and the natural adaptability of their decoding algorithm. Details on how to apply LDPC codes as linear transformation codes (with transform matrices denoted H) to the Slepian-Wolf problem can be found in Chapter 3.

We now give a brief overview of LDPC codes. See Section 2.3 for further discussion. These codes are a class of graph-based capacity-approaching linear block codes. They can be decoded using the iterative sum-product algorithm, an inference algorithm that is exact on trees. Although not exact on “loopy” graphs (such as those that describe LDPC codes), in practice decoding performance is very good. These codes are often defined in terms

of their parity check matrix, and this matrix corresponds to a graphical representation. An example of this is shown in Figure 4.2. Here the circles represent bits, are referred to as variable nodes, and correspond to columns of the matrix. Squares represent parity constraints, are referred to as check nodes, and correspond to rows of the matrix. Parallel sources, and their varying likelihoods, can be naturally accommodated by grouping the bit nodes from each source. In order to use LDPC codes for the practical coding strategies listed above, then, a method of designing LDPC codes for parallel sources is necessary.

H = [ 1 1 1 0 1 0 ]
    [ 1 1 0 1 0 1 ]
    [ 0 0 1 1 1 1 ]

Figure 4.2. A sample LDPC graphical model representing the matrix at right. The circles represent bits and correspond to matrix columns while the squares represent parity con- straints and matrix rows. Elements of the matrix indicate edges between nodes. Whenever the element in the ith row and jth column is 1, an edge connects constraint node i with variable node j. A 0 in the corresponding position implies no edge connects the nodes.

4.3.1 EXIT Chart Based Code Design

One method of designing LDPC codes is the EXIT (Extrinsic Information Transfer) chart method of Ashikhmin, Kramer, and ten Brink [7]. We give a brief overview of it here. For greater depth and explanations of the notation, see Section 2.3.2. The EXIT chart design method focuses on developing degree distributions for LDPC codes that decode reliably, asymptotically as block length goes to infinity. We use λ(x) for the degree distribution of the bit nodes and ρ(x) for the check nodes.

The EXIT chart method operates by treating the bit nodes and check nodes as two separate decoders that feed their outputs to one another. The behavior of each of these two component decoders is modeled with a curve of the average mutual information between the decoder estimates and the original data, denoted IA for the input and IE for the output. The shape of these curves depends upon the code's degree distributions and the source correlation. A sample plot of the curves for both the variable node decoder and the check node decoder is shown in Figures 4.3 and 4.4. To represent the input-output relationship of these two decoders, the curves are plotted on reflected axes.

It was shown [7] that as long as the variable node decoder curve lies above the check node decoder curve when plotted in the manner of Figures 4.3 and 4.4, then successful decoding, asymptotic in block length, is assured. Decoding operates along a path between the two curves. It was further shown that the area of the gap between the two curves is proportional to the performance gap between the code and the Slepian-Wolf bound. Thus design is reduced to finding degree distribution pairs that minimize the gap between the two curves, conditioned on a positive gap. By fixing a series of check node degree distributions, a good pair of degree distributions can be found with a series of linear programs.
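To make the gap condition concrete, the following short Python sketch (not the linear-programming design tool itself) checks a pair of sampled EXIT curves for a positive decoding tunnel and estimates the area between them; the curve arrays and grid are illustrative placeholders that would come from a separate EXIT analysis.

import numpy as np

def exit_gap(i_a, i_ev, i_ec_inv):
    # i_a: uniform grid of a priori mutual information values in [0, 1)
    # i_ev: variable node decoder curve I_E,V evaluated at i_a
    # i_ec_inv: check node decoder curve, plotted on the reflected axes, evaluated at i_a
    gap = i_ev - i_ec_inv
    decodable = bool(np.all(gap > 0.0))                      # positive gap -> decoding tunnel is open
    step = i_a[1] - i_a[0]                                   # assumes a uniform grid
    gap_area = float(np.sum(np.maximum(gap, 0.0)) * step)    # proportional to the rate penalty
    return decodable, gap_area

# Toy usage with made-up curves that leave a small positive gap everywhere.
i_a = np.linspace(0.0, 0.999, 200)
print(exit_gap(i_a, 0.05 + 0.95 * i_a, 0.97 * i_a))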

We begin our parallel source code design by considering codes for two sources. Using EXIT chart techniques, we can design LDPC matrices Hkk of dimension nkRk × nk for the separate coding. Assuming 2 parallel sources (M = 2), we label the degree distributions (λ1(x), ρ1(x)) and (λ2(x), ρ2(x)). We present EXIT charts for LDPC codes designed in this manner,2 one for rate R1 = 0.45 in Figure 4.3 and one for rate R2 = 0.55 in Figure 4.4. Here, the gaps between the curves are minimal, showing that they are good codes.

[Figure 4.3 plot: "EXIT Chart For Channel Coding Rate R=0.55, Code 1" — transfer curves for λ(x) and ρ(x) plotted on the reflected mutual information axes (IE,V, IA,C) versus (IA,V, IE,C).]

Figure 4.3. One example of a code designed using EXIT charts for a rate R=0.45 source code. The tight fit of the two curves shows that the degree distributions are good. The dashed curve corresponds to the variable nodes while the solid curve corresponds to the check nodes. Decoding operates by moving from the left-hand side to the right-hand side of the graph, going through the gap between the two curves. Since the gap between the two curves is positive, decoding will follow the path between the curves successfully.

2Note, as mentioned above, the rates here are the codes’ source coding rates. Assuming binary variables, the rate R is related to the channel coding rate Rc as R = 1 − Rc.

[Figure 4.4 plot: "EXIT Chart For Rate R=0.55 (Single Source)" — transfer curves for λ2(x) and ρ2(x) on the same axes as Figure 4.3.]

Figure 4.4. A second example of a code using EXIT charts, as in Figure 4.3. This code is a rate R=0.55 source code. Since the gap between the two curves is positive here as well, decoding will follow the path between the curves successfully.

To consider the separate coding of the two sources within one framework, we can view the two matrices as a single larger matrix partitioned in the form of Equation (4.5)

H = [ H11  0
      0    H22 ] .   (4.5)

We extend to the joint coding solution by relaxing this restricted formulation. A more general matrix structure is shown in Equation (4.6)

H = [ H11  H12
      H21  H22 ] .   (4.6)

A first design approach for these off-diagonal matrices would be to maintain the degree distributions of the separate coding solution but extend them across the entire matrix. We can examine this approach by studying the EXIT charts of Figures 4.3 and 4.4, shown on a single plot in Figure 4.5. It is clear from these charts that the decoders associated with H11, H22, and H21 (the curves for λ1(x) and ρ2(x)) would all decode successfully. In contrast, H12 (the curves for λ2(x) and ρ1(x)) would result in a failed decoding, since a positive gap between the two curves is not maintained. An alternate design method is necessary.

Here, we propose a joint EXIT chart design strategy. In our approach we fix several ρ1(x) and ρ2(x) degree distributions and then minimize the gaps between the curves for λ1(x) and λ2(x) while maintaining positive gaps throughout. This forces all 4 curves to closely follow each other, and as a result will induce ρ1(x) = ρ2(x). Considering an equal share (λ1 = λ2 = 0.5) of the two sources considered in Figures 4.3 and 4.4, we present the resulting EXIT chart in Figure 4.6. In the following, we consider performance results from implementations of this solution.

[Figure 4.5 plot: "EXIT Charts For Rate R=0.45 and Rate R=0.55" — the curves for λ1(x), ρ1(x), λ2(x), and ρ2(x) from Figures 4.3 and 4.4 overlaid on common axes.]

Figure 4.5. A comparison of the EXIT charts for separate coding of two sources. These curves are taken from Figures 4.3 and 4.4. Note the intersection of the curves for λ2(x) and ρ1(x) prevents the use of these curves for a joint code. Since the gap between these two curves is not maintained, decoding will fail.

4.4 Simulation Results

In this section, we present simulation results of coding two parallel sources. For the simulations,3 we consider a particular joint distribution such that yk,l = xk,l ⊕ zk,l, where xk,l and zk,l are independent and zk,l is distributed according to a Bernoulli-Qk. From the Slepian-Wolf theorem, separately we could code each of the two sources at rates Rk = H2(Qk), and across both sources we could code at rate R = λ1H2(Q1) + λ2H2(Q2). We set n1 = n2 = 5,000, where z1,l has a Bernoulli-Q1 distribution and z2,l has a Bernoulli-Q2 distribution.

3For more general source models and code rates, techniques as presented in Chapter 5 should be used.
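As a small numerical illustration of these rate expressions (the crossover probabilities and the equal shares below are illustrative only), the rates for separate coding and for coding across the two parallel sources can be computed as follows.

import math

def h2(q):
    # Binary entropy H2(q) in bits.
    if q in (0.0, 1.0):
        return 0.0
    return -q * math.log2(q) - (1 - q) * math.log2(1 - q)

Q1, Q2 = 0.084, 0.087            # illustrative crossover probabilities of the two parallel sources
lam1 = lam2 = 0.5                # equal share of the block for each source

R1, R2 = h2(Q1), h2(Q2)          # rates needed if each source is coded separately
R_joint = lam1 * R1 + lam2 * R2  # rate needed when coding across both sources

print(f"Separate: R1 = {R1:.3f}, R2 = {R2:.3f} bits/sample")
print(f"Joint   : R  = {R_joint:.3f} bits/sample")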

[Figure 4.6 plot: "EXIT Chart For Two Parallel Sources" — curves for λ1(x) and λ2(x) (dashed) and the common curve for ρ1(x) = ρ2(x) (solid).]

Figure 4.6. A plot of the EXIT chart for the best possible degree distributions found for coding over parallel sources. For these curves, we have considered one check node curve (solid line) and matched both variable node curves (dashed lines) to it. Since the gap between the variable node curves and the check node curve is always positive, decoding will succeed.

In each of the following sets of simulations, we operate by selecting a particular code (or pair of codes for separate coding) and then vary both Q1 and Q2 over a range of values. In total, we use 5,000 (rate 0.5) bits to represent the compressed sequences in each case. For separate coding, we use 2,250 (rate 0.45) bits on the first source and 2,750 (rate 0.55) on the second source.

We simulated three scenarios of parallel source coding. The results for separate coding are plotted in Figure 4.7, the results for coding across sources using solely a rate R = 0.5 code are plotted in Figure 4.8, and the results for a code designed using the method of Section 4.3 are plotted in Figure 4.9. The degree distributions for the first two scenarios can be found in Appendix A. The degree distributions designed using the technique of Section 4.3 are

λ1(x) = 0.1120154560x + 0.2582850885x^2 + 0.0001468088x^3 + 0.0264927449x^4 + 0.0003299464x^5 + 0.0580453365x^6 + 0.0273602982x^7 + 0.1744986414x^8 + 0.0003597095x^9 + 0.0001281492x^10 + 0.0000652846x^11 + 0.0000417584x^12 + 0.0000298120x^13 + 0.0000232059x^14 + 0.0000190287x^15 + 0.0000163757x^16 + 0.0000145926x^17 + 0.0000132616x^18 + 0.0000123405x^19 + 0.0000117698x^20 + 0.0000113592x^21 + 0.0000111619x^22 + 0.0000110523x^23 + 0.0000111133x^24 + 0.0000113830x^25 + 0.0000116269x^26 + 0.0000121447x^27 + 0.0000127688x^28 + 0.0000135517x^29 + 0.0000146539x^30 + 0.0000160846x^31 + 0.0000180324x^32 + 0.0000205581x^33 + 0.0000242711x^34 + 0.0000298491x^35 + 0.0000391718x^36 + 0.0000579812x^37 + 0.0001141510x^38 + 0.3416494758x^39,

λ2(x) = 0.1192580551x + 0.3487206986x^2 + 0.0000211693x^3 + 0.0000368451x^4 + 0.0000448929x^5 + 0.1599996270x^6 + 0.0010628532x^7 + 0.1418750045x^8 + 0.0002589765x^9 + 0.0001592179x^10 + 0.0001008248x^11 + 0.0000724090x^12 + 0.0000737363x^13 + 0.0000923443x^14 + 0.0001615279x^15 + 0.0015715951x^16 + 0.0547919510x^17 + 0.1702804060x^18 + 0.0008040490x^19 + 0.0001965115x^20 + 0.0000959664x^21 + 0.0000648864x^22 + 0.0000470811x^23 + 0.0000355826x^24 + 0.0000278140x^25 + 0.0000223420x^26 + 0.0000183836x^27 + 0.0000155004x^28 + 0.0000132890x^29 + 0.0000115733x^30 + 0.0000102124x^31 + 0.0000091282x^32 + 0.0000082153x^33 + 0.0000074589x^34 + 0.0000068605x^35 + 0.0000063400x^36 + 0.0000059279x^37 + 0.0000055754x^38 + 0.0000051674x^39,

and

ρ1(x) = ρ2(x) = 0.2x^8 + 0.8x^9.

In these plots, the horizontal axis shows the range of H2(Q1) and the vertical axis the range of H2(Q2); the probability of error ranges from low in the dark regions to high in the light regions. As a sample point, consider the point H2(Q1 = 0.084) = 0.416 and H2(Q2 = 0.087) = 0.427. Here we measure P(error) = 10^−1.5 in Figure 4.7, P(error) = 10^−3.3 in Figure 4.8, and P(error) = 10^−4.8 in Figure 4.9. As can be seen in these plots, when each source is coded separately, the probability of error is large whenever either H2(Q1) or H2(Q2) is large. In comparison, when coding across the parallel sources, the probability of error only gets large when both H2(Q1) and H2(Q2) are large.

[Figure 4.7 plot: "log10(P(Error)) for Separate Coding" — a shaded map over H(Q1) (horizontal axis) and H(Q2) (vertical axis) with a color scale from −5.5 to −0.5.]

Figure 4.7. log10(P(Error)) results of the simulations for separate coding. The first code is designed with rate R1 = 0.45 (Figure 4.3) while the second code is designed with rate R2 = 0.55 (Figure 4.4). The binary entropy ranges of Q1 and Q2 are shown on the horizontal and vertical axes. Dark regions of the plot indicate low probability of error while light regions indicate high probability of error. In this plot, the probability of error is high whenever H2(Q1) or H2(Q2) is large.

[Figure 4.8 plot: "log10(P(Error)) for Code Designed for Average" — the same axes and color scale as Figure 4.7.]

Figure 4.8. log10(P (Error)) results of the simulations using a code designed for rate R = 0.5. The binary entropy ranges of Q1 and Q2 are shown on the horizontal and vertical axes respectively. Dark regions of the plot indicate low probability of error while light regions indicate high probability of error. Unlike the separate coding results shown in Figure 4.7, the probability of error only gets large when H2(Q1) and H2(Q2) are large.

[Figure 4.9 plot: "log10(P(Error)) for EXIT Chart Joint Design" — the same axes and color scale as Figures 4.7 and 4.8.]

Figure 4.9. log10(P (Error)) results of the simulations for the EXIT chart based design method (Figure 4.6). The binary entropy ranges of Q1 and Q2 are shown on the horizontal and vertical axes respectively. Dark regions of the plot indicate low probability of error while light regions indicate high probability of error. Like the single code simulations shown in Figure 4.8, the probability of error only gets large when H2(Q1) and H2(Q2) are large.

In order to aid in the comparison of these results, we use the plots of Figures 4.10 and 4.11. For these plots, we divide each of the plots of Figures 4.7, 4.8, and 4.9 into two regions: decoding success where P(error) < 10^−2 and decoding failure where P(error) ≥ 10^−2. The comparison between the regions for separate coding and the EXIT chart designed joint code is shown in Figure 4.10, while the comparison between the EXIT chart designed code and the rate R = 0.5 code is shown in Figure 4.11. In both of these plots, all decoding succeeds in the lower left corner and fails in the upper right corner. The middle regions show where the EXIT chart designed code results in a decoding success while the competing scheme results in a decoding failure. The EXIT chart designed code gives the best performance in each case. Although the gain of the EXIT chart technique over the rate R = 0.5 code is slim, this is due to the relative closeness of the considered statistics and the limitation to only two sources. These gains would grow in significance when these techniques are applied to more general scenarios.

4.5 Conclusions & Open Problems

In this chapter, we have introduced the problem of Slepian-Wolf coding of parallel sources and have presented a full analysis. We have presented an information theoretic view of the problem, showing the potential gains of coding across parallel sources.

[Figure 4.10 plot: "Comparison of Separate and EXIT Designed Codes for Parallel Sources" — the (H(Q1), H(Q2)) plane divided into the regions "Both Succeed", "EXIT Design Succeeds, Separate Coding Fails", and "Both Fail".]

Figure 4.10. Comparison of the results from Figures 4.7 and 4.9. The region where P(error) ≥ 10^−2 is considered a decoding failure, while everywhere else is a success. In this plot, decoding success is seen in the lower left corner and decoding failure in the upper right corner. The middle region represents where the EXIT chart designed code succeeds and the separate coding fails. This figure shows the advantage of joint compression.

[Figure 4.11 plot: "Comparison of Two Code Designs for Joint Coding of Parallel Sources" — the (H(Q1), H(Q2)) plane divided into the regions "Both Succeed", "EXIT Design Succeeds, R=0.5 Code Fails", and "Both Fail".]

Figure 4.11. Comparison of the results from Figures 4.8 and 4.9. The region where P(error) ≥ 10^−2 is considered a decoding failure, while everywhere else is a success. As in Figure 4.10, in this plot decoding success is seen in the lower left corner and decoding failure in the upper right corner. The middle region represents where the EXIT chart designed code succeeds and the simple rate R = 0.5 code fails. This figure shows the advantage of designing the code for the parallel sources.

We have considered the new code design problem raised by coding across parallel sources, and have presented a coding theoretic design technique for the problem. Finally, we have presented practical results of LDPC codes using our scheme, built on the construction of Chapter 3, demonstrating the performance gain over other coding strategies.

This work suggests many areas for future work. Consideration of the problem in the case of continuous sources, particularly as the problem relates to video, is an important extension. A further area of future work is the extension of this work to be blind to the source statistics, requiring no knowledge of the source parameters. In particular, learning the boundaries of each block presents a significant challenge.

Finally, an important next step is the development of more powerful code design methods. One problem that arises with the EXIT chart based design method presented in this chapter occurs when there is a large skew in the statistics of the parallel sources.

In Section 4.3 the design rates were R1 = 0.45 and R2 = 0.55, giving a small skew (i.e., the absolute difference) |R1 − R2| = 0.1. If we alter the design rates such that R1 = 0.05 and R2 = 0.95, so that |R1 − R2| = 0.9, the EXIT chart design method performs poorly. This is due to the fact that the choice of the design rates induces a particular IE,V intercept value (as can be seen in Figures 4.3 through 4.6). When there is a large difference in the design rates, minimizing the gap between the curves becomes difficult. Very high degree variable nodes are necessary, for which numerical accuracy concerns result in the failure of the EXIT chart analysis.

A promising solution to this problem is to use density evolution [70]. As discussed in Section 2.3.2, density evolution is a more accurate, though more computationally intensive, design tool. Density evolution is more naturally able to consider the high degree nodes required for a large design rate skew. Further, other coding techniques could be incorporated into the density evolution technique. For example, consider that for a design rate such as

R2 = 0.95, since the source and the side information are nearly uncorrelated, it may be beneficial to supply the decoder with some values of the source uncompressed (not in the form of a parity sum) in order to aid decoding. This technique, which we refer to as “doping,” is discussed in much greater detail in Chapter 7. Due to its flexibility, the notion of doping could be more naturally incorporated into density evolution than into EXIT charts.

In the following chapter, we return to consideration of stationary sources. We proceed to develop codes for the arbitrary distributed source coding problem. Though we omit discussion of parallel sources in the following chapters, the techniques presented here can be incorporated in a straightforward manner.

Chapter 5

Symmetric Distributed Source Coding

In this chapter, we build on the construction of Chapter 3 and propose a distributed linear block code construction for the Slepian-Wolf problem. The proposed construction is derived from a single code for attaining any point in the achievable rate region for arbitrarily correlated discrete sources. Our prescription allows for any arbitrary memoryless joint probability distribution (even with sources that are marginally compressible) over any arbitrary number of distributed sources and allows for any arbitrary rate combination in the Slepian-Wolf achievable region without source splitting or alphabet expansion. Since our framework derives from a single code, codes for similar rate choices are also similar. Our framework reduces effectively for the special cases of a single source and source coding with side-information. In this chapter, we further describe how to incorporate Low-Density Parity-Check (LDPC) codes in our framework to solve the general Slepian-Wolf problem constructively, and present simulation results. We are able to achieve error rates below 10^−4 with block lengths of 10,000 while being within 0.30 bits per sample of a 1.15 bits per sample Slepian-Wolf bound (for a total of 1.45 bits per sample, 26.1% above the minimum rate as specified by the Slepian-Wolf bound) for symmetric coding of two arbitrarily correlated sources.

5.1 Introduction

As discussed in previous chapters, the Slepian-Wolf result is asymptotic and non-constructive. Quite a bit of work has been done to extend the work of Slepian and Wolf. One early approach to practical codes is Wyner's [90] formulation of a linear coset code for the corner points1 (assuming binary sources and symmetric correlation). Wyner's work further establishes the relationship between distributed source coding and channel coding. The late 1990s work of Zamir & Shamai [92] shows the theoretical feasibility of a lattice code framework for a lossy version of this problem (first considered by Wyner & Ziv [89] in the information theoretic setting). Following that, Pradhan & Ramchandran [66] developed DISCUS, a framework for the Wyner-Ziv problem and described practical codes and achievable error probabilities. Recently, a flurry of work in the field provides practical constructions of various levels of sophistication for the Slepian-Wolf problem [66, 64, 80, 83, 30, 95, 34, 96, 31, 32, 46, 49, 50, 43, 79, 76, 75, 56, 54, 2] (greater detail on related work is provided below). However, the constructions to date in the literature do not tackle the general Slepian-Wolf rate region for arbitrary sources. While these are important operating regimes, they fail to cover the complete spectrum of diverse real-world sources and application scenarios requiring more flexible operating points.

Most2 existing distributed source coding works focus on achieving corner points, arguing that arbitrary points can be achieved by using either source splitting [92, 87, 71] or time sharing. As motivation for fully flexible individual rate choices, consider multiple description coding (studied extensively in the literature [29, 81]). In Pradhan, Puri, & Ramchandran [63, 67], connections between the best-known rate regions to date for the general n-channel (n > 2) multiple description problem and the distributed compression problem are identified. Due to multiple description coding's non-ergodic nature, no corner point based solutions are sufficient. Further, time-sharing has asymptotic disadvantages (i.e., smaller error exponents [21]). Better error exponents can be achieved, with increased complexity and requiring detailed knowledge of the source statistics, by using source splitting.

We argue that the fully general Slepian-Wolf problem can benefit from a holistic solution. The main contribution of this chapter is to provide such a solution. Specifically, this

1As a reminder from Section 2.4, corner points are rate pairs that lie at vertex points of the Slepian-Wolf rate region.
2The works [31, 96, 79, 35, 73, 74] all propose practical codes for achieving rates other than the corner points. Their constructions all face a constraint wherein Rx + Ry ≥ 1, either implicit in their construction or in the correlation model considered. We consider scenarios and develop codes not subject to this constraint.

implies (i) arbitrary correlations, (ii) the entire Slepian-Wolf coding region corresponding to arbitrary rate combinations, and (iii) an arbitrary number of sources. The construction presented in this chapter is based on only a single linear block code. In contrast to existing solutions, which require unique codes to be defined at each operating point, our solution allows neighboring points in the Slepian-Wolf rate region to have similar solutions. Further, our construction simplifies to the constructions of Chapter 3. We focus on LDPC codes in this chapter, since they represent a powerful state-of-the-art class of capacity-achieving codes. We describe how to incorporate LDPC codes into our framework by building on the construction of Chapter 3. We provide an efficient algorithm for decoding two sources without loss of generality (assuming knowledge of the source statistics) based on a two-machine framework,3 with extensions to an arbitrary number of sources.

This chapter is organized as follows. Section 5.2 provides a description of the algebraic solution for linear codes to solve the Slepian-Wolf problem. A brief summary of LDPC codes is provided in Section 5.3 and the application of LDPC codes to the construction is described in Sections 5.4 and 5.5. Section 5.6 presents results and Section 5.7 concludes the chapter.

5.1.1 Related Work

The first reported use of syndromes for source coding was in 1969 [8]. This concept is studied extensively in [59, 25, 39, 36, 5, 6]. As mentioned above, the work [90] considers linear codes to achieve the Slepian-Wolf bound for the distributed compression of doubly binary symmetric sources. Witsenhausen & Wyner [88] consider linear codes for source coding with side information at the decoder. Alon & Orlitsky [60, 3] consider several constructive approaches for the Slepian-Wolf problem in a graph-theoretic setting. Csiszár [20] shows that linear codes can achieve the Slepian-Wolf bound in the universal setting using random coding arguments. Uyematsu [80] shows that the decoding complexity is polynomial in block-length. Zamir, Shamai, and Erez [92] show that asymptotically linear codes can achieve the optimal rate-distortion performance for the binary case under a Hamming distortion criterion and lattice codes for the jointly Gaussian case. Pradhan & Ramchandran in [66, 64] develop a constructive framework for the Wyner-Ziv problem.

We can broadly classify the constructive approaches in the recent literature into three categories: (a) LDPC codes-based, (b) Turbo codes-based, and (c) Entropy codes-based.

3The work of [96] proposes a similar architecture for joint source-channel coding with a limited correlation model.

An article by Xiong, Liveris, and Cheng [91] provides an overview of several techniques from these categories in the context of sensor networks.

LDPC codes-based approaches: Liveris, Xiong and Georghiades [47, 46, 45, 79] presented a variety of schemes for achieving optimal performance in distributed source coding. Particular emphasis was placed on the corner points in the Slepian-Wolf rate region, but allowing for multiple sources and some alphabets larger than binary. The work of Schonberg, Ramchandran, and Pradhan [76] focused on the corner points of the Slepian-Wolf rate region, as well as suggesting an approach for classical single source entropy coding. Caire, Shamai, and Verdu [12, 10, 13, 11] proposed a coding scheme wherein the encoder does additional computation to improve performance and insure correct decoding for the classical single-source compression problem. Sartipi and Fekri [73, 74] considered an alternate method for applying LDPC codes to the Slepian-Wolf problem, but considered only a limited source model. Coleman, Lee, Médard, and Effros [16] presented a scheme for achieving arbitrary rate pairs based on source splitting. Gehrig and Dragotti present Slepian-Wolf codes based on non-systematic linear codes. Zhong, Lou, and García-Frías present a framework based on related LDGM codes.

Turbo codes-based approaches: García-Frías and Zhao [31, 32, 33, 94] proposed a system using punctured turbo codes for the Slepian-Wolf problem. Aaron and Girod considered similar codes for the problem of source coding with side information [2]. Both the team of Mitran and Bajcsy [56, 55] and the team of Liveris, Xiong, and Georghiades [50] made further contributions along these lines. Finally, considering the arbitrary Slepian-Wolf problem, Li, Tu, and Blum [44] applied entropy codes to the turbo-code outputs in order to achieve additional compression gains.

Entropy-codes based approaches: Koulgi, Tuncel, Ragunathan, and Rose [40, 41] and Zhao and Effros [93] proposed several approaches for the Slepian-Wolf source coding problem using methods inspired by the theory of entropy codes for the classical source coding problem.

As stated above, none of these works considers a fully arbitrary scenario. In this chapter, a linear code based solution is provided, focusing on arbitrary correlation models for distributed source coding.

5.2 Distributed Source Coding

In this section, we provide the intuition, framework, and operational details of an algebraic construction for distributed source coding. We focus on binary alphabets. Extension to larger alphabets is largely straightforward, but is not discussed here. We focus first on two sources, x and y, and provide a simple example. Then we describe the code generation and decoding algorithm. Finally, we generalize this to a setting involving an arbitrary number of sources. This section builds on the work of Section 3.2.

Recall from Section 3.2 that we use a linear block code C for binning. The bin index, denoted sx, of a word x is the word's syndrome with respect to Cx's parity check matrix,

sx = x Hx^T.

Thus, we encode x to sx via matrix multiplication.
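A minimal sketch of this encoding operation is the following; the small parity check matrix used is arbitrary and purely illustrative, and the compressed representation of x is simply its syndrome computed over GF(2).

import numpy as np

def syndrome(x, H):
    # Return s = x H^T over GF(2); x is a length-n 0/1 vector, H is m x n.
    return (np.asarray(x) @ np.asarray(H).T) % 2

H_example = np.array([[1, 0, 1, 1],
                      [0, 1, 1, 0]])
x_example = np.array([1, 1, 0, 1])
print(syndrome(x_example, H_example))   # the 2-bit bin index of x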

5.2.1 Illustrative Example

We now consider an example [64] to motivate our distributed source code construction. This example is similar to the example in Section 3.2.1, but illustrates issues that arise in general distributed source coding that do not arise in source coding with side information.

Let x and y be two uniform 7-bit sources. They are correlated such that they differ in at most 1 of their 7 bits (i.e. wtH (x ⊕ y) ≤ 1). We calculate the following: H(x) = H(y) = 7 bits, H(x|y) = H(y|x) = 3 bits, and H(x, y) = 10 bits.

To code at the rates (Rx, Ry) = (3, 7) bits, as in Chapter 3, we use the (7, 4)-Hamming code. Source y sends its 7 bits uncompressed, while source x sends its 3-bit syndrome with respect to the code (recall, sx = x Hx^T). Since the Hamming code has a minimum distance of 3, knowledge of y^7 and the syndrome sx allows the decoder to determine x^7 exactly.
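The following sketch illustrates the decoder side of this example under the stated assumptions (y known at the decoder, x and y differing in at most one position). The parity check matrix below is one standard systematic (7, 4) Hamming choice and is used only for illustration; it is not necessarily the exact matrix of Chapter 3.

import numpy as np

H = np.array([[1, 0, 0, 0, 1, 1, 1],
              [0, 1, 0, 1, 0, 1, 1],
              [0, 0, 1, 1, 1, 0, 1]])

# Precompute the syndrome of every error pattern of weight at most one.
patterns = [np.zeros(7, dtype=int)] + [np.eye(7, dtype=int)[i] for i in range(7)]
syn_to_z = {tuple(p @ H.T % 2): p for p in patterns}

rng = np.random.default_rng(0)
x = rng.integers(0, 2, 7)
z = np.eye(7, dtype=int)[rng.integers(0, 7)]   # x and y differ in exactly one bit
y = (x + z) % 2

s_x = x @ H.T % 2                      # the 3 bits sent by the x encoder
s_z = (s_x + y @ H.T % 2) % 2          # decoder: syndrome of z = x XOR y
z_hat = syn_to_z[tuple(s_z)]           # the weight <= 1 pattern with that syndrome
x_hat = (y + z_hat) % 2
print(np.array_equal(x_hat, x))        # True: x recovered from y and s_x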

To code at rates (Rx,Ry) = (5, 5) bits, we will define the codes Cx and Cy to compress each source respectively below. In order to show that a unique decoding is possible, consider source pairs (x1, y1) and (x2, y2) that compress equivalently as below

x1 Hx^T = x2 Hx^T, and y1 Hy^T = y2 Hy^T.

Define e1 = x1 ⊕ y1 and e2 = x2 ⊕ y2. From the definition above, we bound the Hamming weight of e1 and e2 as below

wtH(e1) ≤ 1, and wtH(e2) ≤ 1.

Defining e3 = e1 ⊕ e2, we can write the following

x1 ⊕ x2 ∈ Cx, and x1 ⊕ x2 ⊕ e3 ∈ Cy.

We bound wtH(e3) ≤ 2. In order to allow unique decoding, we specify the generator matrices of Cx and Cy (compression is performed with respect to the corresponding parity check matrices). We define Gx by the first two rows of the Hamming code's generator matrix and Gy with the other two rows:

Gx = [ 0 1 1 1 0 0 0
       1 0 1 0 1 0 0 ] ,

and Gy = [ 1 1 0 0 0 1 0
           1 1 1 0 0 0 1 ] .

Since these are sub-codes of the Hamming code, and Hamming codes have a minimum distance of 3, only e3 = 0 satisfies the above constraints. In this case we have the following

x1 = x2, and y1 = y2.

This property insures that unique decoding is possible.

Key to this example is the partition of a generator matrix to develop codes for non-corner points. By choosing the partition, codes for the entire rate region can be obtained from a single code. The intuition of this example is extended more generally in the following.

5.2.2 Code Generation

We develop a general construction for two sources in this section. We focus first on corner points and then generalize to arbitrary rate points. For a fixed block length we construct codes by defining their generator matrices. Rates are lowered, to the Slepian-Wolf bound, by adding rows to each generator matrix. Then, we alter our perspective to show how the codes derive from a single code, as shown in Figure 5.1. Single code derivation offers the advantage that codes for similar operating points are similar. In contrast, in [20], a unique code is randomly generated for each and every rate pair considered.

Assume, without loss of generality, that H(x) ≤ H(y). Focusing on the corner point, our goal is to construct generator matrices Gx and Gy containing n(1 − H(x|y)) and n(1 − H(y)) rows respectively4 for the point (Rx, Ry) = (H(x|y), H(y)).

4We ignore the issue of fractional numbers of rows. In practice, additional rate would be required to round up. We further assume that each generator matrix is of full row rank. Thus the design rate and actual rate of the codes are equal.

We begin by considering a generator matrix Gc of size n(1 − H(y)) × n, which we use to bin y. Set Gy = Gc to get Ry = H(y). Now construct Ga of size n(H(y) − H(x)) × n linearly independent of Gc. Define Gx as below to get Rx = H(x),

Gx = [ Gc^T  Ga^T ]^T, where Gc^T is of size n × n(1 − H(y)) and Ga^T is of size n × n(H(y) − H(x)).

These matrices compress each source to its marginal entropy.

To do distributed source coding, we increase the size of Gx to reduce Rx to H(x|y) bits/sample. We construct Gs with n(H(x) − H(x|y)) rows linearly independent of Gc and Ga, and construct a larger Gx by stacking Gc, Ga and Gs as in Equation (5.1). Using Gx and Gy to compress each source, we achieve (Rx, Ry) = (H(x|y), H(y)),

Gx = [ Gc^T  Ga^T  Gs^T ]^T,   (5.1)

where the blocks Gc^T, Ga^T, and Gs^T are of sizes n × n(1 − H(y)), n × n(H(y) − H(x)), and n × n(H(x) − H(x|y)) respectively.

To generate these codes from a single code, consider another viewpoint. Above we view each generator matrix as a stack of smaller matrices. In contrast, consider the union of the rows of the two matrices (trivially Gx above). We consider this union as a single code G.

From this code, Gx and Gy are obtained by selecting subsets of rows from G. This general idea is illustrated in Figure 5.1. Note that the subsets must be chosen so as to satisfy the Slepian-Wolf bounds.

To understand why some rows must be shared between the two generator matrices, consider codes so derived yet having no rows in common (Gc = 0 and kx + ky = k). The rates induced by these codes are below

(Rx,Ry) = ((n − kx)/n, (n − ky)/n).

Knowing that a channel code’s generator matrix must satisfy k < n and combining the above expressions gives Rx + Ry > 1. This restriction is a particular impediment when arbitrary source distributions are considered. If H(x, y) < 1, then this restriction bounds the achievable rate region away from the Slepian-Wolf bounds. Thus row sharing is essential for a general solution.

To shift this solution away from a corner point and achieve arbitrary rate pairs, we alter the distribution of rows to each matrix (as in Figure 5.1). We can view this as a partition of

Gs into Gsx and Gsy and assign the partitions to each respective source. When Gsy = Gs, then (Rx, Ry) = (H(x), H(y|x)) bits/sample.

[Figure 5.1 diagram: the single generator matrix G partitioned into row blocks Gc, Ga, and Gs; Gx is formed from Gc, Ga, and Gsx, while Gy is formed from Gc and Gsy.]

Figure 5.1. The partition and distribution of a single code’s generator matrix for the codes for compressing x and y. Gc is shared between both resulting codes to allow for arbitrary rate point selection.
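A small numpy sketch of this partition is given below; the block sizes and matrix entries are illustrative placeholders, with the row blocks playing the roles of Gc, Ga, Gsx, and Gsy from Figure 5.1. In a real design the blocks would be sized as n(1 − H(y)), n(H(y) − H(x)), and n(H(x) − H(x|y)).

import numpy as np

rng = np.random.default_rng(1)
n = 12
Gc  = rng.integers(0, 2, (3, n))   # rows shared by both codes
Ga  = rng.integers(0, 2, (2, n))   # rows used only by the x code
Gs  = rng.integers(0, 2, (4, n))   # rows to be divided between the sources
Gsx, Gsy = Gs[:2], Gs[2:]          # a particular (non-corner) split of Gs

G  = np.vstack([Gc, Ga, Gs])       # the single underlying code
Gx = np.vstack([Gc, Ga, Gsx])      # generator used to bin x
Gy = np.vstack([Gc, Gsy])          # generator used to bin y

# Moving rows of Gs from Gsx to Gsy (or back) moves the operating point along the
# Slepian-Wolf rate region without changing the underlying code G.
print(G.shape, Gx.shape, Gy.shape)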

5.2.3 Decoder

We present here a decoding prescription for two sources operating at an arbitrary rate pair based on two successive decodings. The algorithm presented naturally simplifies for special cases such as source coding with side information and entropy coding, though these cases are not discussed here. Referring to a decoding element as a single-machine decoder, this algorithm is a successive two-machine algorithm. This algorithm recovers both sources only after both decodings, since both encoders send “partial” information. No one traditional decoder can recover either source alone. Though the algorithm presented here is impractical (due to the use of two ML decoders), an implementable (but sub-optimal) simultaneous two-machine algorithm is given in Section 5.5.

Decoding focuses first on recovering z = x ⊕ y. It is reasonable to assume that x and y are highly correlated, otherwise Slepian-Wolf coding gains could be nominal at best. We can thus expect that z has low Hamming weight and is compressible. Once z is recovered, x and y are then reconstructed by a second machine. Pseudo code for the successive two-machine algorithm is presented below, where the partitions of G, Gx, and Gy from Figure 5.1 are used.

In the first step of the pseudo code, as in Section 3.2.2, we write each source as a function of the compression code and the compressed value. We use the variable u to select which codeword is necessary to satisfy these equations. We denote the closest codeword to x (y) in Cx (Cy) as uxGx (uyGy). We further divide these equations into their component parts as follows

ux Gx = [uc,x | ua,x | us,x] [Gc^T Ga^T Gsx^T]^T = uc,x Gc ⊕ ua,x Ga ⊕ us,x Gsx,

and uy Gy = [uc,y | us,y] [Gc^T Gsy^T]^T = uc,y Gc ⊕ us,y Gsy.

In Step 1, s̄x (s̄y) represents a zero-padded version of the value sx (sy). s̄x (s̄y) is a minimum Hamming weight element of the coset formed by Cx (Cy), as in Section 3.2.2. We repeat the description of zero padding from Section 3.2.2 here. To zero pad, insert zeros so that they align with the systematic bits of the code.5 For example, consider a code defined with the first two rows of the Hamming code, Equation (5.2). Since the fourth and fifth columns form the systematic bits, zeros are inserted into the fourth and fifth positions:

If Gx = [ 0 1 1 1 0 0 0
          1 0 1 0 1 0 0 ]  and sx = [1 0 1 0 1],
then s̄x = [1 0 1 0 0 0 1].   (5.2)
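A small helper sketching this zero-padding step (with the systematic positions supplied by the caller as 0-indexed column indices) is the following; it reproduces the example of Equation (5.2).

def zero_pad(s, n, systematic_positions):
    # Insert zeros into s (a list of bits) at the given 0-indexed systematic positions.
    s_bar, it = [], iter(s)
    for i in range(n):
        s_bar.append(0 if i in systematic_positions else next(it))
    return s_bar

# Example from Equation (5.2): systematic bits in the fourth and fifth positions.
print(zero_pad([1, 0, 1, 0, 1], 7, {3, 4}))   # -> [1, 0, 1, 0, 0, 0, 1]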

Step 2 of the decoding combines s̄x and s̄y. The first machine runs in Step 3, the ML decoding of s̄ with respect to G to give ẑ and u. Knowledge of u gives knowledge of us,y. Insert us,y into the equation for y. With greater knowledge of y, in Step 4 we ML decode us,y Gsy ⊕ s̄y with respect to Gc to obtain ŷ and uc,y. Finally, combine ŷ and ẑ to recover x̂ in Step 5.

Pseudo code for the decoding algorithm is provided here:

1. Expand the source equations: y^n = uc,y Gc ⊕ us,y Gsy ⊕ s̄y and x^n = uc,x Gc ⊕ ua,x Ga ⊕ us,x Gsx ⊕ s̄x.

2. Form the composite values: s̄ = s̄x ⊕ s̄y and u = [uc,x ⊕ uc,y | ua,x | us,x | us,y].

3. ML decode6 ẑ = x ⊕ y = u G ⊕ s̄.

4. ML decode ŷ = uc,y Gc ⊕ [us,y Gsy ⊕ s̄y] using knowledge of us,y.

5. Recover x̂ = ŷ ⊕ ẑ using knowledge of ẑ and ŷ.

5More generally, we need only select any kx (ky) linearly independent columns of Gx (Gy), since bit ordering does not impact code performance in the memoryless case.
6In this construction, care needs to be taken to insure that the dimensions of G are sufficient for the statistics of z. Note though, in cases of interest where significant correlation between the sources is present, this will indeed be the case.

This prescription gives insights on the design requirements of C (i.e., G and Gc). Each of these codes needs to be a “good” code (“good” meaning that the probability of a decoding error goes to zero as block length goes to infinity). Using random codes, code design first randomly generates Gc to act as an entropy coder for either source given their sum. Then a random code Gsy is generated to bin the bins induced by Gc. The resulting sub-bins need to be strong enough for the statistics of the sum of the two sources. Though difficult in practice, these constraints can be approached with large block lengths and pseudo-random codes [20]. Though we do not prove that the block code construction here achieves the Slepian-Wolf limit, we demonstrate it via empirical methods (Section 5.6).

5.2.4 Multiple Sources

We now extend the construction to L sources. As in Section 5.2.2, we begin by stacking generator matrices and then shift to a single code viewpoint. We show that codes Ci (i ∈ {1, 2, ··· ,L}) can be derived from a single code C.

Since there is no inherent ordering of the sources, without loss of generality, we label them {x1,..., xL} so that the following is true

H(x1|x2,..., xL) ≤ ... ≤ H(xi|xi+1,..., xL) ≤ ... ≤ H(xL).

We generate L codes to achieve the corner point below by recursive matrix definitions:

(R1, R2, ..., RL) = (H(x1|x2,..., xL), ..., H(xi|xi+1,..., xL), ..., H(xL)).

The recursion initializes by defining matrix GL with n(1 − H(xL)) linearly independent rows. The recursion defines the matrices from GL−1 to G1 in reverse order, setting Gi−1 = [Gi^T Ai−1^T]^T. Ai−1 is full rank with n(H(xi|xi+1,..., xL) − H(xi−1|xi,..., xL)) rows linearly independent of Gi. Thus Gi−1 is full rank. This recursion is illustrated in Figure 5.2.

We define a single code G as the union of the rows of the generator matrices {G1,..., GL} (G = G1 above). Arbitrary points are achieved by modifying the assignment of rows from G. The resulting codes achieve the Slepian-Wolf bound.

ML decoding (though impractical) for an arbitrary number of sources requires a successive L-machine algorithm. As above, the sources are only obtained after the final machine is run. A practical (but suboptimal) simultaneous L-machine algorithm is described in Section 5.5.

[Figure 5.2 diagram: the generator matrices G1, G2, ..., GL−1, GL, each formed by stacking the blocks AL, AL−1, ..., down to Ai, with AL containing n(1 − H(xL)) rows, AL−1 containing n(H(xL) − H(xL−1|xL)) rows, and so on down to A1 with n(H(x2|x3,...,xL) − H(x1|x2,...,xL)) rows.]

Figure 5.2. Iterative generator matrix definition process for achieving a corner point for the distributed compression of L sources. The process initializes with the L-th matrix on the right-hand side and iterates until the left-most matrix is defined. After defining these codes, arbitrary points can be achieved by moving rows between generator matrices.
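The row-count bookkeeping behind this recursion can be sketched as follows; the conditional entropy values used in the example are illustrative only.

def corner_point_rows(n, cond_entropies):
    # cond_entropies[i] = H(x_{i+1} | x_{i+2}, ..., x_L) in bits, listed in non-decreasing order.
    L = len(cond_entropies)
    rows_A = [0] * L
    rows_A[L - 1] = round(n * (1 - cond_entropies[L - 1]))            # A_L
    for i in range(L - 2, -1, -1):                                     # A_{L-1}, ..., A_1
        rows_A[i] = round(n * (cond_entropies[i + 1] - cond_entropies[i]))
    # G_i stacks A_i, A_{i+1}, ..., A_L, so its row count is the tail sum.
    rows_G = [sum(rows_A[i:]) for i in range(L)]
    return rows_A, rows_G

# Three illustrative sources with H(x1|x2,x3) = 0.25, H(x2|x3) = 0.55, H(x3) = 0.90 bits.
print(corner_point_rows(1000, [0.25, 0.55, 0.90]))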

5.3 Low-Density Parity-Check Codes

In this section, we highlight some of the background for LDPC (Low-Density Parity-Check) codes. The reader is referred to Section 2.3 for further details. Low-Density Parity-Check codes are a class of powerful linear block codes and thus a natural choice for our distributed source coding framework. We use factor graphs [42] (as in Figure 2.3) to visually represent LDPC codes. These bipartite graphs consist of circles used to represent bits and squares used to represent constraints.

Decoding of LDPC codes is achieved using the sum-product algorithm [26, 61, 42]. The algorithm is an iterative inference algorithm that, though exact on trees, is known to have good (but sub-optimal) empirical performance on loopy graphs such as those of an LDPC code. The sum-product algorithm operates by iteratively estimating the marginal distribution for each bit. In the first phase of each iteration, each variable node combines the estimates available and sends them along its edges (e.g., νi,j in Figure 5.3). In the second phase of each iteration, these estimates are merged, according to the constraint, at each check node into an estimate for each bit node (e.g., µj+1,i+2 in Figure 5.3). The marginals are used to estimate the bits after each iteration; if below a maximum number of iterations, the set of estimates is checked against a stopping condition.

[Figure 5.3 diagram: a factor graph with check nodes f1, ..., fj, fj+1, ..., fm above variable nodes X1, ..., Xi, Xi+1, Xi+2, ..., Xn, with example messages νi,j and µj+1,i+2 labeled on the edges.]

Figure 5.3. A sample factor graph. The circles represent bits while squares represent the parity constraints of the code. A labeling of the two types of messages passed node to node in the decoding algorithm is given.

5.4 Adaptation of Coding Solution to Parity Check Matrices

This section considers the generator matrix partitioning scheme of Section 5.2 in terms of parity check matrices. Such a consideration is valuable for codes such as LDPC codes, defined in terms of their parity check matrices. We develop the “column dropping” procedure, analogous to the matrix partitioning technique above. Specific application of LDPC codes for distributed source coding is given in Section 5.5. In the “column dropping” technique, columns of the base parity check matrix (the “single” code) are zeroed (dropped from the parity equations) and an extra row is added on to the parity check matrix. We develop the technique using a generic systematic code, and then extend it to general codes.

Take G = [Ik | P] and H = [P^T | Im] to be a systematic generator and parity check matrix pair. We write out the partition of Figure 5.1 in this form in Equations (5.3) and (5.4):

G = [ Ikc  0    0     0     Pc
      0    Ika  0     0     Pa
      0    0    Iksx  0     Psx
      0    0    0     Iksy  Psy ] ,   (5.3)

and H = [ Pc^T  Pa^T  Psx^T  Psy^T  Im ] .   (5.4)

Algebraically, we determine expressions for Hx and Hy:

Hx = [ Pc^T  Pa^T  Psx^T  0     Im
       0     0     0      Iksy  0  ] ,

and Hy = [ Pc^T  0    0     Psy^T  Im
           0     Ika  0     0      0
           0     0    Iksx  0      0  ] .

Examine Hx to see the “column dropping” procedure. Compared to G, Gx has the rows containing Psy removed. Comparing H to Hx, we see that Psy has been replaced by 0 while ksy additional rows have been added to Hx. In the new rows, all elements are zero except Iksy, which is below the 0 in place of Psy. We see that in the transition from H to Hx, columns of H have been “dropped” into the added rows.

Since permuting the order of the bits of a code does not affect its performance over memoryless channels, we can “drop” arbitrary sets of columns. This allows the extension of this technique to an arbitrary number of sources since the columns need not be adjacent.

To consider non-systematic codes, consider an alternate interpretation of the “column dropping” procedure. Examining the “column dropped” parity check matrices, we note that a portion of the syndrome bits are exactly equal to some of the source bits. The decoder thus knows the exact value of the bits of the “dropped columns.” The codes used for each source are thus the main code with a portion of the original bits sent uncoded. We will explore this idea in greater depth in Section 7.3.4. Consider the non-systematic parity check matrix H = [Pc^T Pa^T Psx^T Psy^T P0]. We use the alternate perspective of the “column dropping” procedure to write the corresponding Hx and Hy as follows:

Hx = [ Pc^T  Pa^T  Psx^T  Psy^T  P0
       0     0     0      Iksy   0  ] ,

and Hy = [ Pc^T  Pa^T  Psx^T  Psy^T  P0
           0     Ika   0      0      0
           0     0     Iksx   0      0  ] .
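The following numpy sketch illustrates the “column dropping” operation on a generic parity check matrix; the matrix and the choice of dropped columns are illustrative placeholders.

import numpy as np

def drop_columns(H, cols):
    # Return the parity check matrix obtained by dropping the given columns of H:
    # the dropped bits are removed from every parity sum, and each dropped bit is
    # instead conveyed directly as its own syndrome bit.
    H = np.array(H) % 2
    m, n = H.shape
    H_dropped = H.copy()
    H_dropped[:, cols] = 0
    extra = np.zeros((len(cols), n), dtype=int)
    for r, c in enumerate(cols):
        extra[r, c] = 1
    return np.vstack([H_dropped, extra])

H = np.array([[1, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 1, 1],
              [1, 0, 1, 1, 1, 0]])
Hx = drop_columns(H, [4, 5])    # e.g. drop the columns associated with the Gsy block
print(Hx)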

5.5 Integration of LDPC Codes Into Distributed Coding Construction

We now present a simultaneous multi-machine algorithm for use with LDPC codes. We first describe the graphical model for coding two correlated sources, then generalize for an arbitrary number of sources. Simultaneous decoding is achieved via the sum-product algorithm over these graphs. Since these graphs are loopy, sum-product is sub-optimal but empirically good. We assume a priori knowledge of the source statistics. Chapter 6 presents a method by which we can relax this assumption.

The general graphical model for distributed source coding of two sources is in Figure 5.4. The variables considered are the n bits of sources x and y (middle top and bottom rows of circles respectively), the mx bits of the compressed source x (top row), and the my bits of the compressed source y (bottom row). These variables are constrained by the LDPC codes used to compress x (top row of squares) and y (bottom row) plus the correlation between sources x and y (middle row). The correlation constraints between the sources x and y are distinct and different from the parity constraints considered in Section 5.3.

[Figure 5.4 diagram: the factor graph for distributed source coding of two sources — compressed bits Sx1, ..., Sxmx and code constraints fx1, ..., fxp attached to source bits x1, ..., xn; correlation constraints f1, ..., fn joining xi to yi; and source bits y1, ..., yn attached to code constraints fy1, ..., fyq and compressed bits Sy1, ..., Symy.]

Figure 5.4. Factor graph for distributed source coding. Circles labeled xi or yi represent source bits. Circles labeled Sxj or Syj represent the compressed bits. Squares labeled fxj or fyj represent the LDPC code applied to each source. Squares labeled fi represent the correlation between bits xi and yi.

In order to adapt this graphical model to other problems, we consider the model as several connected subgraphs. One subgraph, the “X operations”, is composed of the x bit nodes, the code applied to x, and the compressed bits of x. The respective graph elements for y form the “Y operations” subgraph. Finally, the correlation constraints form a “correlation operations” graph. The abstracted graph is thus composed of the source subgraphs connected via the correlations subgraph. We refer to the source subgraphs as “machines.” This abstracted factor graph is shown in Figure 5.5. Recall that we use rectangles as well as squares to represent constraints in factor graphs.

To extend this graphical model to code L sources, we attach a machine for each source to the graph. Each of the L machines is connected to the others via the correlations subgraph. Using the sum-product algorithm over this graph gives the simultaneous multiple machine algorithm.

[Figure 5.5 diagram: the abstracted factor graph — compressed bits Sx1, ..., Sxmx feeding a "Code Constraint − X" block over the source bits x1, ..., xn, a "Source/Correlation Constraint" block joining them to the source bits y1, ..., yn, and a "Code Constraint − Y" block over the compressed bits Sy1, ..., Symy.]

Figure 5.5. An abstracted version of the factor graph of Figure 5.4. Recall that we use rectangles as well as squares to represent constraints. Here the graph is abstracted into the operations on each source and the correlation interface between the two. This graph can be extended to an arbitrary number of sources by interfacing additional sources with the generalized correlation constraint.

We further consider two special cases: entropy coding and source coding with side information. For entropy coding, remove the “Y operations” subgraph from Figure 5.4. This effectively reduces it to the entropy coding graph in Figure 3.4. The correlation constraints become source constraints due to the lack of side information. A source coding with side information model is made by replacing the “Y operations” machine with just the y bit nodes, as in Figure 3.3 in Chapter 3. Since the source y is uncompressed, the interaction with the correlation nodes is straightforward. Note, the graphs of Figures 5.4 and 5.5 naturally simplify as appropriate for both of these special cases.

As in Section 5.2.3, the codes used in Figure 5.4 derive from a single code. Interpreting Section 5.4, we use the same code twice, and account for the additional uncoded bits in the decoding algorithm. We achieve this by inserting certainty in the distribution estimates for the “dropped column” bits.

As above, the single code must meet two design constraints. First, the overall code must be strong enough for the statistics of the sum of the sources. Second, the reduced code (the code resulting from consideration of the remaining columns) must be strong enough to recover either source given their difference. In general, a nested LDPC design algorithm could be used to first build the inner matrix and then the entire matrix.

5.5.1 Application of Sum-Product Algorithm

As the factor graph of Figure 5.5 is related to the factor graphs of Chapter 3, so are the update equations used. We refer the reader to Section 3.4.1 to see the form of the update equations. The only extension to the pseudo code given in that section is in the actions of the “Y operations” nodes. Fortunately, these update equations simply mirror the form of the “X operations” nodes. As an example, consider the message from correlation node fi to yi. The message is of the following form

µfi,yi = log { [ (2 − P(yi=1|xi=0) − P(yi=1|xi=1)) + (P(yi=1|xi=1) − P(yi=1|xi=0)) tanh(νxi,fi/2) ] / [ (P(yi=1|xi=0) + P(yi=1|xi=1)) + (P(yi=1|xi=0) − P(yi=1|xi=1)) tanh(νxi,fi/2) ] } .   (5.5)

Alteration of the other update equations is similarly straightforward. In the following section, we present results obtained from simulating this algorithm.
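Before turning to those results, a direct transcription of Equation (5.5) is sketched below; the message is also checked against brute-force marginalization of P(yi | xi) over the incoming belief on xi (the correlation values used are illustrative, and the log-likelihood ratio convention is ν = log P(x=0)/P(x=1)).

import math

def correlation_message(nu_x, p_y1_given_x0, p_y1_given_x1):
    # LLR message from correlation node f_i to y_i, given the incoming LLR nu_{x_i,f_i}.
    t = math.tanh(nu_x / 2.0)
    a, b = p_y1_given_x0, p_y1_given_x1
    num = (2 - a - b) + (b - a) * t
    den = (a + b) + (a - b) * t
    return math.log(num / den)

# Sanity check by explicit marginalization.
nu_x, a, b = 1.2, 0.1, 0.8
p_x0 = 1.0 / (1.0 + math.exp(-nu_x))
m0 = (1 - a) * p_x0 + (1 - b) * (1 - p_x0)   # unnormalized message value for y_i = 0
m1 = a * p_x0 + b * (1 - p_x0)               # unnormalized message value for y_i = 1
print(correlation_message(nu_x, a, b), math.log(m0 / m1))   # the two values agree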

5.6 Simulations and Results

In this section, we present simulation results. We focus on simulations of the general distributed source coding problem. For results considering the special cases mentioned in this chapter, including entropy coding and source coding with side information, see Section 3.5.

We measure the quality of a simulation with its empirical probability of error, the proportion of bits decoded in error. Since our scheme is asymptotically lossless, we make comparisons with the lossless bounds of Slepian-Wolf despite our measured probabilities of error. We focus here on BER (bit error rate). For a discussion of SER (symbol error rate) as a measure of simulation quality, please see Section 3.5. Each simulation consisted of selecting a particular code, source distribution, and rate combination. Each empirical probability is generated by simulation of at least one million bits, broken into blocks. We limited decoding to at most 50 iterations and used LDPC code designs as described in Appendix A. We omit error bars in these plots for clarity. The resulting error bars are similar to those obtained for the BER results from Section 3.5.

P(x, y)    y = 0               y = 1
x = 0      (1 − γ)(1 − α)      (1 − γ)α
x = 1      γβ                  γ(1 − β)

Table 5.1. Structure of the joint distribution between x and y considered for Section 5.6.

As in Section 3.5, we consider arbitrary distributions over two correlated binary sources, parameterized via 0 ≤ α, β, γ ≤ 1. The distribution considered is P(x = 0, y = 0) = (1 − γ)(1 − α), P(x = 0, y = 1) = (1 − γ)α, P(x = 1, y = 0) = γβ, and P(x = 1, y = 1) = γ(1 − β). We give a diagram of this in Table 5.1.
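A short sketch that evaluates the relevant entropies of this parameterization is given below; with (α, β, γ) = (0.09, 0.005, 0.25) it reproduces, up to rounding, the example values quoted later in this section.

import math

def h(probs):
    # Entropy in bits of a list of probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

def entropies(alpha, beta, gamma):
    p = {(0, 0): (1 - gamma) * (1 - alpha), (0, 1): (1 - gamma) * alpha,
         (1, 0): gamma * beta,              (1, 1): gamma * (1 - beta)}
    Hxy = h(p.values())
    Hx = h([p[0, 0] + p[0, 1], p[1, 0] + p[1, 1]])
    Hy = h([p[0, 0] + p[1, 0], p[0, 1] + p[1, 1]])
    return {"H(x)": Hx, "H(y)": Hy, "H(x,y)": Hxy,
            "H(x|y)": Hxy - Hy, "H(y|x)": Hxy - Hx}

print(entropies(0.09, 0.005, 0.25))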

We now present results for compressing 10^4-bit blocks from two sources at arbitrary rate combinations (Figure 5.4). To consider a range of arbitrary source distributions, we consider all 27 possible combinations that arise from fixing α ∈ {0.01, 0.05, 0.09}, β ∈ {0.005, 0.09, 0.15}, γ ∈ {0.1, 0.25, 0.5}. We use a base code of rate 0.5, from Appendix A. For practical considerations, we omit the dual objective design as discussed in Section 5.4 in favor of simpler code design. We use the “column dropping” method of Section 5.4 for generating the two codes from the base code. Empirical performance is good. We omit consideration of the successive two-machine algorithm since we lack a practical implementation of ML decoding for this block length.

For example, consider (α, β, γ) = (0.09, 0.005, 0.25). The resulting distribution is P(x = 0, y = 0) = 0.6825, P(x = 0, y = 1) = 0.0675, P(x = 1, y = 0) = 0.0013, and P(x = 1, y = 1) = 0.2487. This distribution has entropy properties H(x) = 0.8113 bits, H(x|y) = 0.2497 bits, H(y) = 0.9003 bits, H(y|x) = 0.3387 bits, and H(x, y) = 1.150 bits. We consider rate combinations with a gap (from the Slepian-Wolf bound) ranging from 0.0 bits to 0.4 bits. Resulting bit error rates are listed in Table 5.2. The rates used are listed atop each column (Rx) and leftmost in each row (Ry).

More extensive results are plotted in Figure 5.6. The results of each (α, β, γ)-tuple and each rate pair chosen in the previous example are considered. We plot the difference between the sum rate and the joint entropy (Rx + Ry − H(x, y)) on the horizontal axis and the observed bit error rate on the vertical axis.

P(bit error) (× 10^−4)
            Rx = 0.60   Rx = 0.70   Rx = 0.80
Ry = 0.55   4732.9      832.1       70.5
Ry = 0.65   678.3       67.3        10.2
Ry = 0.75   73.6        9.5         0.2

Table 5.2. Simulation results for several arbitrary rate combinations in the Slepian-Wolf region for a block length of 10,000. The BER × 10^4 results are given for several rate combinations. As the sum rate grows, the resulting error rates improve.

[Figure 5.6 plot: "Symmetric Decoding Simulations Over Arbitrary Distributions" — observed BER at the decoder (log scale) versus Rx + Ry − H(X,Y) (bits).]

Figure 5.6. Simulation results for several arbitrary rate combinations in the Slepian-Wolf region for a block length of 10,000. Each data point is the result of the simulation of at least 10^6 bits. The BER results are plotted versus the difference between the sum rate and the Slepian-Wolf bound. As the difference grows, resulting error rates improve.

As expected, performance improves as the distance between the sum rate and the Slepian-Wolf bound grows.

5.7 Conclusions & Open Problems

In this chapter we presented a construction using linear codes for achieving the Slepian-Wolf bound. This construction allows for arbitrary distributions, arbitrary rate pairs, an arbitrary number of sources, and derives from a single code. We presented this construction using either generator or parity check matrices, enabling application of a range of codes. Key to our scheme is the particular partition of a channel code to produce Slepian-Wolf codes.

Further, we illustrated how to incorporate LDPC codes into the construction and how a two-machine construction can be used to decode to arbitrary rate pairs. We described how to generalize the two-machine construction to accommodate an arbitrary number of sources for arbitrary rate combinations. Finally, we presented simulation results validating the performance of the two-machine decoder.

This work suggests many areas for future work. As in Chapter 4, a particularly promising area of study is in code design. Application of density evolution [70] to the “column dropping” procedure could better describe how to partition the single code to each source. Another area of study is in the application of the algorithm presented here for larger alphabets. Although extension to larger alphabets is straightforward, computational complexity unfortunately also rises. Application of approximations to the algorithm presented here could potentially provide faster performance.

Throughout this chapter and the previous chapters, we have assumed that the source statistics are known a priori. In practical implementations these statistics may not actually be available. As a result, in the following chapter, we describe a protocol for blind distributed source coding.

Chapter 6

A Protocol For Blind Distributed Source Coding

6.1 Introduction

To this point, in Chapters 3 through 5, we have assumed that both the encoder and decoder know the source statistics. With this knowledge, the system can determine ahead of time at what rate to operate. In this chapter, we relax this assumption and present a scheme that adapts its rate to that required by the source statistics. Since distributed source encoders lack a manner by which to learn the joint source statistics, feedback is necessary to gain the full compression gains. Our protocol relies on the availability of a small amount of decoder-to-encoder feedback.

Consider the following example of two distributed sources. The first source, x, is i.i.d. with a Bernoulli-(1/2) distribution. We define the second source, y, such that for every time instance they are exactly equal, i.e. xi = yi. If the encoders know this correlation prior to encoding, then only one encoder needs to be active at a time. In this case, we could easily the achieve the Slepian-Wolf bound of Rx + Ry = 1. Without knowledge of the correlation between x and y, each encoder can only apply traditional source coding techniques. Since the marginal entropy of each source if H(x) = H(y) = 1, in this case the sum rate achieved is Rx+Ry = 2. Without feedback, there is no way for the encoders to learn of the correlation between the x and y. This example demonstrates the value of feedback, since the encoders could quickly be informed of a more efficient transmission scheme.

Our approach is motivated by the ideas developed in Draper [23]. As we demonstrate, the protocol quickly converges to the rate the encoder would use if it had access to the source statistics. Although our analysis is for source coding with side information for binary sources with a binary symmetric correlation, we explain how to extend this algorithm to other scenarios. We demonstrate such an extension here and in Chapter 8.

This chapter is organized as follows. In Section 6.2 we specify our blind distributed compression protocol and describe the source scenario we analyze it on. We then analyze our blind transmission protocol using error exponents in Section 6.3. Then, in Section 6.4 we discuss how to implement our protocol in practice, and give simulation results in Section 6.5. We finally conclude this chapter in Section 6.6.

6.2 Protocol

For this analysis, the source model is as follows. The sources xi and yi are sequences of i.i.d. Bernoulli-(1/2) binary random variables, while zi is a sequence of i.i.d. Bernoulli-(Q) binary random variables independent of xi. We relate these variables as follows

yi = xi ⊕ zi. (6.1)

We consider encoding xi and presume that yi is available losslessly at the decoder. As described in Slepian & Wolf [78], by using Slepian-Wolf codes, one need only transmit at a rate equal to H(x|y) = H(z ⊕ y|y) = H(z) = H2(Q). Thus if Q is known, the encoder can compress the source as much as if it had observed both sources, x and y.

Figure 6.1. A block diagram for blind source coding with side information. For the analysis, we assume the simple correlation structure for xi and yi as shown here. During each cycle, the source is transmitted at rate Rk over the forward channel. The decoder feeds back an acknowledgment and an estimate of the parameter Q̂k.

We allow the compression system a reverse channel through which estimates of Q can be fed back to the encoder. At time k the encoder maintains a state variable, Q̂k, corresponding to its current estimate of the joint statistics of xi and yi, and a rate-margin design parameter εk > 0. We set Q̂0 = 0.5. The system works as follows:

1. The encoder divides the sequence xi into blocks of length lk, indexed by k = 1, 2, .... We use $n_k = \sum_{j=1}^{k} l_j$ to denote the cumulative block-length. The k-th block corresponds to symbols $x_{n_{k-1}+1}^{n_k}$. By choosing the cumulative block-length to double every transmission, i.e., nk = 2nk−1, the error analysis simplifies somewhat. We assume this choice for the duration of the analysis.

2. At step k, the encoder encodes $x_{n_{k-1}+1}^{n_k}$ using a rate-Rk Slepian-Wolf code, where Rk depends on the estimate Q̂k−1 and margin εk as

$$R_k(\hat{Q}_{k-1}, \epsilon_k) = \begin{cases} \max\left[H_2(\hat{Q}_{k-1} + \epsilon_k),\, H_2(\hat{Q}_{k-1} - \epsilon_k)\right], & \text{if } \epsilon_k < |\hat{Q}_{k-1} - 0.5|, \\ 1, & \text{else,} \end{cases} \qquad (6.2)$$

i.e., the rate used is H2(Q̂k−1) plus a margin. The various cases in Equation (6.2) come into play depending on whether Q̂k−1 is greater than or less than one-half, or within the margin εk of one-half.¹

3. The decoder attempts to decode. If it can, it sends an acknowledgment to the transmitter, along with the updated estimate Q̂k. The estimate Q̂k is simply the proportion of 1s (or 0s) observed in zi thus far (i.e., the Hamming weight of xi ⊕ yi). The feedback following the kth block can therefore be accomplished with 1 + log2 lk bits. If the decoder cannot decode reliably, it sends a “NAK”, i.e., a request for retransmission. It holds off sending the updated estimate until it receives the retransmission.²

4. If the encoder receives an acknowledgment, it moves on to the next block, using the updated estimate of Q. If the encoder receives a request for retransmission, it sends the sequence $x_{n_{k-1}+1}^{n_k}$ uncompressed.³

If ck is the transmission rate (in bits per symbol) of block k, we want to minimize the expected transmission rate, E[ck], of the scheme. E[ck] equals the expected rate E[Rk] of the Slepian-Wolf code plus the probability of a decoding error times ln 2 (1 bit/symbol, or ln 2 nats/symbol, to account for its uncompressed retransmission). We use ek to denote the event of a decoding error on block k. Ideally, we minimize the following cost,

$$E[c_k] = \Pr[e_k] \ln 2 + E[R_k]. \qquad (6.3)$$

¹ If we used the more standard choice, i.e., H(Q̂k−1) + εk, we could avoid the multiplicity of cases of Equation (6.2). However, by expressing the margin in the current manner, our analysis simplifies, resulting in some nice closed-form error expressions. Note that we can choose to add the margin directly to Q̂ only because we focus on binary sources.
² We assume the code has perfect error-detection capabilities.
³ More efficient hybrid-ARQ-type retransmission strategies can be used. But this strategy simplifies the error analysis, and only leads to a marginal loss in efficiency.

With some abuse of notation we use Rk both to represent the rate-determination function, as in Equation (6.2) and the resulting rate, as in Equation (6.3). We choose to express the rate in nats to simplify notation in subsequent analysis.
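To make the protocol concrete, the following Python sketch simulates one possible realization of the encoder/decoder loop described above, under the same idealizations used in the analysis (the decoder succeeds whenever the empirical conditional entropy of a block is below the transmission rate, and the feedback cost is ignored). The helper names (binary_entropy, choose_rate, simulate) and the specific constants are ours and purely illustrative; choose_rate implements the rate rule of Equation (6.2) with εk = K/√lk.

import numpy as np

def binary_entropy(q):
    # H2(q) in bits; H2(0) = H2(1) = 0.
    if q <= 0.0 or q >= 1.0:
        return 0.0
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def choose_rate(q_hat, eps):
    # Rate rule of Equation (6.2): H2 of the estimate plus a margin, capped at 1.
    if eps < abs(q_hat - 0.5):
        return max(binary_entropy(q_hat + eps), binary_entropy(q_hat - eps))
    return 1.0

def simulate(Q=0.1, K=2.0, n_blocks=12, l0=100, seed=0):
    rng = np.random.default_rng(seed)
    q_hat, total_bits, total_symbols, ones_seen = 0.5, 0.0, 0, 0
    l_k = l0
    for k in range(n_blocks):
        eps_k = K / np.sqrt(l_k)
        rate = choose_rate(q_hat, eps_k)
        z = rng.random(l_k) < Q              # z_i = x_i XOR y_i for this block
        emp = binary_entropy(z.mean())       # empirical conditional entropy of the block
        if emp <= rate:                      # idealized Slepian-Wolf decoder succeeds (ACK)
            total_bits += rate * l_k
        else:                                # NAK: block is retransmitted uncompressed
            total_bits += (rate + 1.0) * l_k
        ones_seen += z.sum()
        total_symbols += l_k
        q_hat = ones_seen / total_symbols    # decoder feeds back the updated estimate
        l_k = total_symbols                  # cumulative block length doubles: n_k = 2 n_{k-1}
    return total_bits / total_symbols        # average rate in bits per symbol

if __name__ == "__main__":
    print("average rate:", simulate(), "vs H2(0.1) =", binary_entropy(0.1))

Running this sketch shows the average rate approaching H2(Q) as the blocks grow, which is the qualitative behavior established in the analysis that follows.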

6.3 Analysis

In this section we show that choosing εk = K(Q)/√lk (for some function K(·)) minimizes the cost of Equation (6.3). For some constants φ1, φ2, and φ3, we will develop a bound for E[ck] of the following form

$$E[c_k] \le H_2(Q) + \phi_1 \epsilon_k + \phi_2 \exp\{-\phi_3 \epsilon_k^2\}.$$

By choosing εk of the form εk = K(Q)/√lk, we obtain the fastest asymptotic behavior of this bound. As one would expect, the form of εk should be chosen dependent on Q. However, Q is unknown, so in practice we would pick the constant as K(Q̂k−1). The dependence on √lk is also somewhat intuitive. The standard deviation in the best estimate of Q drops as 1/√nk−1 = 1/√lk.

Without loss of generality, in this analysis we assume that Q + εk < 0.5. In Section 6.3.1 we first bound the first term of Equation (6.3), the probability of a decoding error. In Section 6.3.2 we bound the second term, the expected rate used during the Slepian-Wolf transmission. In Section 6.3.3 we put the results together to choose εk.

6.3.1 Probability of Decoding Error

We assume that the scheme uses Slepian-Wolf codes that succeed as long as the entropy rate of the kth block is below the transmission rate, i.e., $H(x_{n_{k-1}+1}^{n_k})/l_k < R_k$. We use standard large deviation techniques following the results of [22] to bound the probability of decoding error on the kth block.

$$\begin{aligned}
\Pr[e_k] &= \Pr\left[H(x_{n_{k-1}+1}^{n_k})/l_k > R_k\right] &(6.4)\\
&= \sum_{P} \sum_{x^{n_{k-1}} \in T_P} p(x^{n_{k-1}}) \Pr\left[H(x_{n_{k-1}+1}^{n_k})/l_k > R_k\right] &(6.5)\\
&= \sum_{P} \sum_{x^{n_{k-1}} \in T_P} p(x^{n_{k-1}}) \sum_{\tilde{P}\ \mathrm{s.t.}\ H(\tilde{P}) > R_k(P,\epsilon_k)}\ \sum_{x_{n_{k-1}+1}^{n_k} \in T_{\tilde{P}}} p(x_{n_{k-1}+1}^{n_k}). &(6.6)
\end{aligned}$$

In Equation (6.4) the rate Rk is random, depending on the empirical distribution of the first nk−1 source symbols and the margin εk via Equation (6.2). In Equation (6.5) we use P to denote this empirical distribution and TP to indicate the corresponding type class. Since

Q̂k−1 = P, plugging P into Equation (6.2) gives the, now non-random, rate used Rk(P, εk). We continue as

$$\begin{aligned}
\Pr[e_k] &\le \sum_{P} \sum_{\tilde{P}\ \mathrm{s.t.}\ H(\tilde{P}) > H(P+\epsilon_k)} \exp\{-n_{k-1} D(P\|Q) - l_k D(\tilde{P}\|Q)\} &(6.7)\\
&\le \sum_{P} \sum_{\tilde{P}} \exp\Big\{-l_k \min_{P,\tilde{P}\ \mathrm{s.t.}\ H(\tilde{P}) \ge H(P+\epsilon_k)} \big[D(P\|Q) + D(\tilde{P}\|Q)\big]\Big\} &(6.8)\\
&\le (l_k + 1)^2 \exp\{-l_k [D(P^*\|Q) + D(P^* + \epsilon_k\|Q)]\} &(6.9)
\end{aligned}$$

In Equation (6.7) we use $p(x^{n_{k-1}}) = \exp\{-n_{k-1}[H(P) + D(P\|Q)]\}$ for all sequences $x^{n_{k-1}} \in T_P$, and $|T_P| \le \exp\{n_{k-1} H(P)\}$; see [22]. In Equation (6.8) we use lk = nk−1, and the minimization is over all distributions, not just those that are types. In Equation (6.9) P* is the minimizing distribution (P = P* and P̃ = P* + εk). After minimization the exponent does not depend on P or P̃. We sum over all binary types of length lk, of which there are lk + 1.

The error exponent of the probability of decoding error depends on the unknown distribution Q. We study this exponent to determine a good choice for εk. To do this, we solve for P* assuming a fixed εk.

$$\begin{aligned}
\frac{d}{dP}\big[D(P\|Q) + D(P + \epsilon_k\|Q)\big] &= \frac{d}{dP}\left[P \ln\frac{P}{Q} + (1-P)\ln\frac{1-P}{1-Q} + (P+\epsilon_k)\ln\frac{P+\epsilon_k}{Q} + (1-P-\epsilon_k)\ln\frac{1-P-\epsilon_k}{1-Q}\right]\\
&= \ln\left[\frac{P(P+\epsilon_k)(1-Q)^2}{(1-P)(1-P-\epsilon_k)Q^2}\right]. \qquad (6.10)
\end{aligned}$$

Setting Equation (6.10) equal to zero, and solving for P using the quadratic equation, gives

$$P^* = -\frac{\epsilon_k}{2} - \frac{2Q^2 - \sqrt{\epsilon_k^2(1-2Q)^2 + 4Q^2(1-Q)^2}}{2(1-2Q)}. \qquad (6.11)$$

For any choice of εk, and any source distribution Q, this value of P* determines the dominant source of decoding error. Using Equation (6.11) in Equation (6.9) yields a bound on the decoding error for this protocol. Note that because D(P‖Q) is convex in its arguments, P* ≤ Q ≤ P* + εk.
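As a numerical check of Equations (6.9) and (6.11), the short Python sketch below computes P*, the resulting exponent D(P*‖Q) + D(P* + εk‖Q), its quadratic approximation from Equation (6.13), and the bound of Equation (6.9). The function names and the specific values of Q, εk, and lk are illustrative assumptions, not values from our simulations.

import numpy as np

def D(p, q):
    # Binary KL divergence D(p||q) in nats, with the 0*log(0) = 0 convention.
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        total += 0.0 if a == 0 else a * np.log(a / b)
    return total

def p_star(Q, eps):
    # Minimizing distribution of Equation (6.11); assumes Q != 0.5.
    root = np.sqrt(eps**2 * (1 - 2 * Q)**2 + 4 * Q**2 * (1 - Q)**2)
    return -eps / 2 - (2 * Q**2 - root) / (2 * (1 - 2 * Q))

Q, eps, l_k = 0.1, 0.05, 10_000
P = p_star(Q, eps)
exponent = D(P, Q) + D(P + eps, Q)                 # exact error exponent
quad = eps**2 / (4 * Q * (1 - Q))                  # quadratic approximation, Equation (6.13)
bound = (l_k + 1)**2 * np.exp(-l_k * exponent)     # Equation (6.9)
print(P, exponent, quad, bound)

The printed P* indeed satisfies P* ≤ Q ≤ P* + εk, consistent with the convexity remark above.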

The error exponent D(P*‖Q) + D(P* + εk‖Q) has a particularly simple form when εk is small. We define $P^* = Q - \underline{\epsilon}$ and $P^* + \epsilon_k = Q + \bar{\epsilon}$, where $\epsilon_k = \underline{\epsilon} + \bar{\epsilon}$. By the convexity property just discussed, $\underline{\epsilon}, \bar{\epsilon} > 0$. With these definitions, we approximate the error exponent when εk is small.

$$\begin{aligned}
D(P^*\|Q) + D(P^* + \epsilon_k\|Q) &= D(Q - \underline{\epsilon}\,\|\,Q) + D(Q + \bar{\epsilon}\,\|\,Q)\\
&= (Q - \underline{\epsilon})\ln\left[1 - \frac{\underline{\epsilon}}{Q}\right] + (1 - Q + \underline{\epsilon})\ln\left[1 + \frac{\underline{\epsilon}}{1-Q}\right]\\
&\quad + (Q + \bar{\epsilon})\ln\left[1 + \frac{\bar{\epsilon}}{Q}\right] + (1 - Q - \bar{\epsilon})\ln\left[1 - \frac{\bar{\epsilon}}{1-Q}\right]\\
&\simeq \frac{\underline{\epsilon}^2 + \bar{\epsilon}^2}{2Q(1-Q)} + \frac{(\underline{\epsilon}^3 - \bar{\epsilon}^3)(1-2Q)}{2Q^2(1-Q)^2} &(6.12)\\
&\ge \frac{\epsilon_k^2}{4Q(1-Q)} \ge \epsilon_k^2 &(6.13)
\end{aligned}$$

In Equation (6.12) we use $\ln[1+x] \simeq x - x^2/2$. Writing $\underline{\epsilon}^2 + \bar{\epsilon}^2 = (\underline{\epsilon} + \bar{\epsilon})^2 - 2\underline{\epsilon}\bar{\epsilon} = \epsilon_k^2 - 2\underline{\epsilon}\bar{\epsilon}$, one can see that Equation (6.12) is minimized under the constraint for small εk by selecting $\underline{\epsilon} = \bar{\epsilon} = \epsilon_k/2$. Choosing Q = 0.5 lower-bounds the quadratic approximation of the error exponent.

In Figure 6.2 we plot the error exponent and the quadratic approximation to it for Q = 0.05 and Q = 0.3. The approximation of Equation (6.13) is quite good, even for large values of εk. For Q = 0.3, one can barely distinguish the quadratic approximation from the full solution. The lowest curve in Figure 6.2 is the lower bound to the quadratic approximation with Q = 0.5.

Figure 6.2. Error exponents and quadratic approximations for Q = 0.05 and Q = 0.3. Note, for Q = 0.3, the approximation is so close to the error exponent that the lines are on top of each other. This demonstrates that the quadratic approximation is quite good.

6.3.2 Bounding the Expected Slepian-Wolf Rate

In order to minimize the cost of Equation (6.3) we must also take into account its second term, E[Rk].

$$\begin{aligned}
E[R_k] &\le \Pr[H(x^{n_{k-1}}) \le H(Q + \gamma)]\, H(Q + \gamma + \epsilon_k) + \Pr[H(x^{n_{k-1}}) > H(Q + \gamma)] \ln 2 &(6.14)\\
&\le H(Q + \gamma + \epsilon_k) + \ln 2 \sum_{P\ \mathrm{s.t.}\ H(P) > H(Q+\gamma)}\ \sum_{x^{n_{k-1}} \in T_P} p(x^{n_{k-1}}) &(6.15)\\
&\le H(Q + \gamma + \epsilon_k) + \ln 2 \sum_{P\ \mathrm{s.t.}\ H(P) > H(Q+\gamma)} \exp\{-n_{k-1} D(P\|Q)\}\\
&\le H(Q + \gamma + \epsilon_k) + \ln 2\,(l_k + 1) \exp\{-l_k D(Q+\gamma\|Q)\} &(6.16)
\end{aligned}$$

In Equation (6.14) we split the expected rate into two events. The first is a high-probability event that occurs when the realized entropy is below the source entropy plus a margin γ. The second is a low-probability event that occurs when the realized entropy is large. In the former case, the code rate is upper bounded by H(Q + γ + εk), while in the latter it is upper bounded by ln 2. In Equation (6.15) we upper bound the probability of the high-probability event by one, and analyze the low-probability event using techniques similar to those used in Section 6.3.1. As in that section, we also examine the small-γ region, which gives

$$D(Q + \gamma\|Q) \simeq \frac{\gamma^2}{2Q(1-Q)}. \qquad (6.17)$$

6.3.3 Optimizing the Rate Margin

We can now understand how εk should be chosen. It is easiest to see this in the small-εk region. Indeed, the small-εk region is of great interest as we want to be as efficient as possible for large data files.

Noting that the probability Pr[ek] in Equation (6.4) and $\Pr[H(x^{n_{k-1}}) > H(Q+\gamma)]$ in Equation (6.14) can both be upper bounded by one, then substituting Equation (6.9), Equation (6.11), Equation (6.16), and Equation (6.17) into Equation (6.3) gives

$$\begin{aligned}
E[c_k] &= E[R_k] + \ln 2 \Pr[e_k] &(6.18)\\
&\le H(Q + \gamma + \epsilon_k) + \ln 2 \min\left[1,\ (l_k+1)\exp\left\{-l_k \frac{\gamma^2}{2Q(1-Q)}\right\}\right]\\
&\quad + \ln 2 \min\left[1,\ (l_k+1)^2 \exp\left\{-l_k \frac{\epsilon_k^2}{4Q(1-Q)}\right\}\right]. &(6.19)
\end{aligned}$$

To give the best bound, we want to pick γ as small as possible in Equation (6.18). Picking γ = εk/√2 balances the exponents. We combine the exponential terms and use a Taylor series expansion of the entropy around Q to simplify the bound of Equation (6.18), giving:

$$E[c_k] \le H(Q) + \frac{1+\sqrt{2}}{\sqrt{2}}\ln\left[\frac{1-Q}{Q}\right]\epsilon_k + 2\ln 2 \min\left[1,\ (l_k+1)^2 \exp\left\{-l_k \frac{\epsilon_k^2}{4Q(1-Q)}\right\}\right]. \qquad (6.20)$$

In Equation (6.20), the second term is linear in εk, so we want to pick εk as small as possible. However, the third term constrains this choice. The margin εk must go to zero slower than 1/√lk, else the polynomial in lk that pre-multiplies the exponent will dominate.⁴ Thus, the algorithm presented here will quickly converge to the case of an encoder aware of the source statistics.

⁴ Note that to study the trade-off for small Q, one must use a better approximation to the entropy function than the quadratic one we used. For example, ck − H(Q) should always be less than one bit.
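The following short sketch illustrates this trade-off numerically by evaluating the right-hand side of Equation (6.20) for doubling block lengths. One choice consistent with the requirement that εk decay more slowly than 1/√lk is εk = K√(ln lk / lk); that scaling, the constant K, and the value of Q below are arbitrary illustrative assumptions.

import numpy as np

def bound_6_20(Q, eps, l_k):
    # Right-hand side of Equation (6.20), in nats per symbol.
    H = -Q * np.log(Q) - (1 - Q) * np.log(1 - Q)
    linear = (1 + np.sqrt(2)) / np.sqrt(2) * np.log((1 - Q) / Q) * eps
    tail = 2 * np.log(2) * min(1.0, (l_k + 1)**2 * np.exp(-l_k * eps**2 / (4 * Q * (1 - Q))))
    return H + linear + tail

Q, K = 0.1, 2.0
for l_k in [100, 400, 1600, 6400, 25600, 102400]:
    eps_k = K * np.sqrt(np.log(l_k) / l_k)   # decays slightly slower than 1/sqrt(l_k)
    print(l_k, round(float(bound_6_20(Q, eps_k, l_k)), 4))

The printed bound decreases toward H(Q) (about 0.325 nats for Q = 0.1) as lk grows, matching the convergence claim above.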

6.4 Practical Considerations

In this section, we discuss the practical impact of two assumptions made in Sections 6.2 and 6.3. We first consider aspects of the protocol induced by the specific source structure and correlation model. Second, we consider the idealized codes used. Relaxation of these assumptions is necessary for any implementation of the protocol presented here.

Though Sections 6.2 and 6.3 consider two sources with a specific correlation structure, we would like to consider more generalized source structures. For example, we would like a blind algorithm for arbitrary correlation structures over an arbitrary number of sources. Fortunately, the protocol presented can be naturally adapted to such sources just by modifying the contents of the feedback.

Figure 6.3. A block diagram for adaptation of our blind distributed source coding algorithm to the general distributed source coding problem. During each cycle, the source is transmitted at rate Rk over the forward channel. The decoder feeds back an acknowledgment and a rate request, Rk+1, for the next cycle.

In the protocol presented in Section 6.2, the feedback consists of the acknowledgment and the estimate of the source parameter, Q̂k. In our description of the protocol, there is no specific reason the encoder needs this detailed information. We modify our proposed protocol by altering the feedback to provide rate requests, Rk+1, instead of summary statistics. This structure can be adapted to arbitrary correlation structures and to an arbitrary number of sources. A block diagram for the two-source scenario considered in Chapter 5 is given in Figure 6.3. Since the rate request need only be specified to a constant precision, the rate of feedback becomes negligible as the block length grows.

Second, we consider the assumptions on the properties of the codes used in Sections 6.2 and 6.3. In those sections, three assumptions were made about the codes. First, we assume that the codes used respond to every source sequence of equivalent type equivalently. Second, we assume that the codes are capacity achieving. Finally, we assume that decoding errors are perfectly detectable.

In order to implement this algorithm with practical codes, we use LDPC based codes as introduced in Section 2.3 and used throughout this dissertation. These codes are a class of powerful linear block codes that are capacity approaching. They are decoded using the iterative sum-product algorithm; as the algorithm progresses, it either converges to a solution that satisfies the code's constraints (most likely the maximum likelihood solution) or fails to converge at all.

Though LDPC codes do not satisfy the three assumptions stated here, they have properties that allow them to approximately meet these assumptions. Though LDPC codes do not respond to all sequences of a particular type equivalently and are not capacity achieving, they perform well in practice. We leverage this empirical performance here, knowing that we will pay a small rate penalty. Further, since LDPC based Slepian-Wolf systems approach the minimal compression rate bound only for large block lengths, as discussed in Chapters 3 and 5, extra redundancy is needed for the short block-lengths used for the first few blocks of our protocol. Finally, we leverage the iterative decoder of LDPC codes by setting a small number as the maximum number of iterations before a decoding failure is declared. We can use these declared decoding failures as a strong approximation of the exact error detection assumed in Section 6.2. As will be seen in Section 6.5, these effects are minimal.

6.5 Results

In this section we present results from running our blind algorithm. In each of the simulations, blocks of 100,000 source symbols are generated according to a distribution. The initial block-length is set to 100, and each successive block length equals the number of bits sent thus far. As discussed below, we use a selection of 38 codes. Since not all rates are available, some additional redundancy is introduced as rates are rounded up to the nearest available rate. The redundancy parameter εk is set to εk = 2.0/√nk−1 (arbitrarily, though resulting in good performance).

For these experiments, we generate a finite set of LDPC codes from a selection of LDPC degree distributions [70]. We use 38 different degree distributions (each both check and variable irregular), with compression rates ranging from 0.025 to 0.95, with a spacing of approximately 0.025 between codes. Most degree distributions are obtained from the binary symmetric channel design section of the LTHC website [4]. The rest of the degree distributions we use are designed using EXIT (extrinsic information transfer) charts [7], though similar performance is seen by selecting codes from LTHC designed for other channel models (such as the binary additive white Gaussian noise channel). Using these degree distributions, the transform matrices are generated randomly, and then pruned to remove short loops (of length 4 or less). For greater detail, see Appendix A.

We begin by considering the simple two-source scenario presented in Section 6.2 and shown in Figure 6.1. We plot the results in Figure 6.4. In these plots the average cumulative redundancy in percent (averaged over 25 simulations) is plotted versus cumulative block-length, nk. Cumulative redundancy is defined as $\sum_{j=1}^{k} \frac{[d_j - H(Q)]\, l_j}{H(Q)\, n_k}$. The results of the system are plotted for 2 different zi entropy rates: 0.1791 and 0.1944. As an example, to transmit 10^5 bits for a zi entropy H(Q) = 0.1791 bits required an average redundancy of 49%, or 8,875 bits more than the 17,912 bit minimum (26,787 bits total). In these plots, as nk grows the overall redundancy declines. To demonstrate consistency between simulations, we plot one standard deviation error bars. In addition, for a zi of entropy 0.1791 (the other zi is omitted for clarity, but is similar), a bound on the expected cumulative redundancy using Equation (6.18) is also plotted, as well as the performance assuming full knowledge of the source statistics (based on the results of [76]). Despite the limitations mentioned, and the fact that our bound omits the cost of feedback, our results perform well in relation to our bound.
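For concreteness, the cumulative redundancy measure can be computed as in the following sketch; the per-block rates dj and block lengths lj below are hypothetical placeholders, not the values used in our simulations.

def cumulative_redundancy(rates, lengths, H):
    # 100 * sum_j (d_j - H) * l_j / (H * n_k), with n_k = sum_j l_j.
    n_k = sum(lengths)
    excess = sum((d - H) * l for d, l in zip(rates, lengths))
    return 100.0 * excess / (H * n_k)

# Hypothetical example: three blocks sent at rates above H(Q) = 0.1791 bits/symbol.
print(cumulative_redundancy([1.0, 0.40, 0.25], [100, 100, 200], 0.1791))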

For the next simulation, we consider a correlation structure requiring the rate request structure of Section 6.4. Here, we retain the binary sum of zi with xi to obtain yi as depicted in Equation (6.1). In contrast, we introduce Markov memory to zi. We now parameterize the source as follows

p = Pr(zi = 1),
h0 = Pr(zi = 1 | zi−1 = 0), and h1 = Pr(zi = 1 | zi−1 = 1).

Greater discussion of this model can be found in Chapter 7.
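The following sketch, under the stated parameterization, samples such a Markov zi and computes the conditional entropy rate H(xi|yi), which for this model equals the entropy rate of zi; the helper names are ours and the sample length is arbitrary.

import numpy as np

def binary_entropy(q):
    return 0.0 if q in (0.0, 1.0) else -q * np.log2(q) - (1 - q) * np.log2(1 - q)

def markov_entropy_rate(p, h0, h1):
    # Entropy rate of the binary Markov chain z: (1-p)*H2(h0) + p*H2(h1).
    return (1 - p) * binary_entropy(h0) + p * binary_entropy(h1)

def sample_markov_z(n, p, h0, h1, seed=0):
    rng = np.random.default_rng(seed)
    z = np.empty(n, dtype=np.uint8)
    z[0] = rng.random() < p
    for i in range(1, n):
        z[i] = rng.random() < (h1 if z[i - 1] else h0)
    return z

p, h0, h1 = 0.3571, 0.1103, 0.8014
print(markov_entropy_rate(p, h0, h1))    # approximately 0.5787 bits/symbol
z = sample_markov_z(10_000, p, h0, h1)
print(z.mean())                           # empirical Pr(z_i = 1), close to p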

[Figure 6.4 plots curves for H(Bernoulli(0.027)) = 0.1791 and H(Bernoulli(0.030)) = 0.1944, along with the Equation (6.20) bound and the full-source-knowledge performance for the Bernoulli(0.027) case.]

Figure 6.4. Averaged results (over 25 trials) of blind source coding with side information of sources as modeled in Section 6.2. The horizontal axis is the number of cipher-text bits considered, nk, in log-scale, and the vertical axis is the system redundancy (the percent of bits used above the entropy rate). We present one standard deviation error bars to demonstrate consistency. As the number of bits transmitted grows performance improves. Note, the bound from Equation (6.18) plotted here excludes feedback costs. Ideally, we would like to transmit at 0% redundancy (the entropy rate).

In Figure 6.5, we plot the results of blind source compression with side information of xi when zi has the Markov memory structure described here. We parameterize zi as (p, h0, h1) = (0.3571, 0.1103, 0.8014). This results in a conditional entropy rate of H(xi|yi) = 0.5787. On average, these simulations transmitted 80,000 bits in 47,951 bits, a 3.56% redundancy. For purposes of contrast, we also plot the redundancy of transmitting xi uncompressed. As can be seen in Figure 6.5, the blind transmission scheme here approaches the entropy rate for correlated sources.

6.6 Conclusions & Open Problems

In this chapter we presented a protocol for blind distributed source coding. We analyzed the protocol in the case of a simple correlation model and showed its convergence to the case of full knowledge of the source statistics. We then discussed how to actually implement this protocol, and demonstrated its practical performance.

Figure 6.5. Averaged results (over 25 trials) of blind source coding with side information where the source model of Section 6.2 is altered such that the samples of zi exhibit Markov memory. Similar to Figure 6.4, the horizontal axis is the number of cipher-text bits considered, nk, in log-scale, and the vertical axis is the system redundancy (the percent of bits used above the entropy rate). As the number of bits transmitted grows performance improves.

Some issues stand out for future study. First, we would like to obtain a lower bound on transmission redundancy to help establish the optimality of our scheme. Second, we would like to implement the algorithm with other classes of codes and decoders. For example, the linear programming LDPC decoder [24], which offers a guarantee of either maximum likelihood decoding or error detection, shows promise.

Having developed the tools to perform practical distributed source coding throughout the previous chapters, we now consider a practical application of these codes. We apply these techniques to the compression of encrypted data to demonstrate how these codes can be used to solve new problems.

Chapter 7

The Compression of Encrypted Data: A Framework for Encrypted Simple Sources to Encrypted Images

In this chapter, and the next, we switch our focus from generally applicable distributed source codes to developing codes for the application of compressing encrypted data. In these chapters, we consider sources characterized by statistical redundancy, such as images, that have been encrypted without first being compressed. Since encryption masks the source, traditional data compression algorithms are rendered ineffective. However, as has been shown previously, the compression of encrypted data is in fact possible through the use of distributed source-coding techniques. This means that it is possible to reduce data size without requiring that the data be compressed prior to encryption. Indeed, under some reasonable conditions, neither security nor compression efficiency need be sacrificed when compression is performed on the encrypted data (Johnson et al., 2004 [38]).

Building on this foundation, in this chapter we develop algorithms for the practical lossless compression of encrypted data sources. We consider algorithms for a variety of sources, ranging from independent and identically distributed sources to images. Our methods are designed to leverage the statistical correlations of the source, even without direct access to their instantiations. As an example, we are able to compress an encrypted binary version of the world map to 43% of its original size, giving a 57% rate savings. Traditional data compression algorithms are unable to compress the encrypted image at all.

7.1 Introduction

Since good encryption makes a source look completely random, traditional algorithms are unable to compress encrypted data. Typical systems therefore must compress before they encrypt. However, Johnson et al. in [38] show that the compression of encrypted data can be cast as a problem of source coding with side information [78]. It is further shown in [38] that neither compression performance nor security need be lost under some reasonable conditions. A block diagram of this system structure is presented in Figure 7.1. In this figure, a length n source message, xn, is encrypted with a key, kn, to produce the cipher-text, yn. The cipher-text is compressed into nR bits (R < 1) and the decoder makes a source estimate, x̂n, based on the nR bits and kn.


Figure 7.1. The source is first encrypted and then compressed. The compressor does not have access to the key used in the encryption step. The decoder jointly decompresses and decrypts. The key serves as the decoder side information.

To better understand the problem framework, consider the image in Figure 7.2. In the left-hand plot is a binary image of the world map. We refer to this as the plain-text since it is unencrypted. Each of the 100 × 100 pixels in this frame takes a value of either 0 or 1, giving a raw image size of 10,000 bits. The middle plot of the figure shows an image of equal dimension to the first, but consisting of bits selected uniformly at random (i.e., independent and identically distributed (i.i.d.) Bernoulli-0.5 random variables). This image is the stream cipher key. We encrypt the image by applying a bitwise exclusive-OR (XOR) between the key bits and the source (this is an example of a “stream cipher,” and has the same security properties as the Shannon one-time pad [77]). The resulting encrypted image shown in the right-hand plot is referred to as the cipher-text. The problem of compressing the highly structured unencrypted image on the left has been extensively researched, and many effective solutions exist. However, none of these can compress the marginally random image on the right. The cipher-text, in isolation, is independent of the source, is itself a Bernoulli-0.5 i.i.d. image, and is thus not compressible.

Figure 7.2. A sample 100 × 100 world map image is on the left (10, 000 bits total). To encrypt this image, the 10, 000 bit random key in the center is added to the unencrypted image on the left (using a bit-wise exclusive-OR).

While the plain-text and cipher-text are independent, the cipher-text can still be compressed using techniques of source coding with side information. To understand why, note that compression can be achieved by leveraging the dependence between the cipher-text, yn, and the key, kn. This dependence can be understood by viewing the cipher-text as a noisy version of the key stream. For example, if we know that the source has a low Hamming-weight¹, then the set of possible cipher-texts are clustered around the key sequence. Viewing the key sequence as a noisy version of the cipher-text, we use a Slepian-Wolf decoding process [78, 22] on the compressed bits to de-noise the key and uncompress the cipher-text. The source estimate is produced by decrypting the uncompressed cipher-text.

In addition to establishing the connection between compression of encrypted data and Slepian-Wolf coding, Johnson et al. [38] demonstrate a practical scheme for compressing an encrypted i.i.d. source. This scheme is based on graphical codes, and draws from the work presented in Chapter 3. However, the main target applications for these techniques are in media coding, where the statistics of sources such as images are far from i.i.d.

¹ That is, it is mostly zeros and contains only a few ones. Though most data sources do not have low Hamming-weight, we can leverage source redundancy to the same effect.

In this chapter, we develop a framework for the compression of encrypted media sources. Since we operate on encrypted data, the encoder has no access to the original (compressible) source. Therefore, only the decoder can exploit source structure. The accuracy of the statistical model used is thus essential to the resulting compression performance. Earlier work [38] focuses on memoryless models and ignores any spatial and temporal dependencies. Our contribution is to develop and implement a practical solution for compressing encrypted media.

We consider the issues in compressing media sources. We introduce 1-D and 2-D models which can leverage the spatial structure of images. We show how to exploit increased model complexity for improved performance. We apply these models to both binary and gray scale images, and we demonstrate that employing these models achieves greater compression than the i.i.d. source model.

This chapter is organized as follows. In Section 7.2 we describe our framework for compressing encrypted sources. Then, in Section 7.3 we develop statistical models and provide experimental results. Finally, we provide concluding notes and open problems in Section 7.4.

7.2 General Framework

In this section we present the framework of our design. We then specialize it to the scenario of interest (stream-cipher encryption and compression using linear codes). We use factor graphs [42] (discussed in greater detail in Section 2.3) to represent the statistical relationships between quantities in our problem.

We now give a brief summary of the relevant material from Section 2.3. Factor graphs are bipartite graphs, consisting of two types of nodes: variable nodes (represented by circles) and factor nodes (represented by rectangles or squares). Variable nodes are either hidden quantities we wish to estimate (such as the plain-text) or observed quantities (such as the key). Factor nodes represent constraints between subsets of variable nodes. The set of variable nodes constrained by a particular factor are connected to the factor node by edges. Factor graphs not only concisely represent the statistical structure of a statistical inference problem, but also provide the basis for efficient inference algorithms.

The high-level factor graph for our problems is depicted in Figure 7.3. The left half of the factor graph depicts the generative model (the mapping from source and key to the compressed bits), while the right half is used by the decoder to estimate the source. The generative model contains the mapping from the source² xn and key kn through encryption into the cipher-text yn. The “encryption function” factor node constrains xn, kn and yn to be compatible inputs and outputs of the encryption function. The generative model also contains the code used to map the cipher-text yn into the syndrome³ sm. The compression rate is m/n. The “code constraint” factor node constrains yn and sm to be compatible inputs and outputs of the encoding function. The right half of the factor graph is used by the decoder to estimate the plain-text. It has as inputs the key kn and syndrome sm and as internal variables the source and cipher-text estimates, x̂n and ŷn, that are the output of the decoder. We now discuss how some of the high-level factor nodes of Figure 7.3 can be further decomposed into more tractable low-level functions. We describe these factorizations in terms of the variables on the right-hand decoding half of the graph, as it is in decoding that these simplifications will be exploited.


Figure 7.3. The abstracted model for compressing encrypted data. This model consists of two parts. The first part, the generative model and encoder, describes how the raw source data is first encrypted and then compressed. This part operates on the true source bits, xi, the key bits, ki, the encrypted bits, yi, and the compressed bits, si. The second part, the decoder, shows how the various components of the source are modeled at the decoder for recovering an estimate of the original source. This part uses the same compressed bits, si, and key bits ki, but works on an estimate of the encrypted bits, yˆi, and the source bits, xˆi.

² We exclusively consider binary sources herein. This is not as restrictive an assumption as it may first appear. For instance, a gray-scale image can be broken down into bit planes, with the source variables indicating the bit-plane pixel values.
³ We refer to the compressed bits as a “syndrome” to make explicit the connection to distributed source coding and to Chapter 3.

7.2.1 Encryption Function

In this dissertation we consider stream-cipher encryption exclusively. Such encryption is used in a variety of symmetric key encryption standards, and (under our assumption of an i.i.d. Bernoulli-0.5 key) is Shannon-sense secure [77]. The structure of stream-ciphers allows us to factor the larger encryption process into a set of bit-by-bit processes. The cipher-text is computed as ŷi = x̂i ⊕ ki, where ŷi is the ith bit of the cipher-text estimate, ki is the ith bit of the key, and ⊕ indicates addition mod-2 (XOR).

At decoding a parallel constraint must be enforced on yˆi, xˆi, and ki for every i. In particular, these variables must sum to zero. We therefore split the large encryption factor node into n low-level factor nodes. As depicted in Figure 7.4, the square node labeled fi enforces the even parity.


Figure 7.4. The graphical constraints imposed by stream-cipher encryption. The source estimate xˆi, the key bit ki, and the cipher-text estimate yˆi must have even parity.
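A minimal sketch of this bit-wise stream cipher, assuming a binary plain-text and an i.i.d. Bernoulli-0.5 key, is given below; the variable names follow the text, and the plain-text bias 0.38 is an arbitrary illustrative choice.

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x = (rng.random(n) < 0.38).astype(np.uint8)   # plain-text bits (e.g., a binary image)
k = rng.integers(0, 2, n, dtype=np.uint8)     # stream-cipher key, i.i.d. Bernoulli-0.5
y = x ^ k                                     # cipher-text: y_i = x_i XOR k_i

# The cipher-text is marginally Bernoulli-0.5 and incompressible on its own,
# but XORing it with the key recovers the plain-text exactly.
print(y.mean())                   # close to 0.5
print(np.array_equal(y ^ k, x))   # True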

7.2.2 Code Constraint

We use a linear transform to compress the cipher-text. The syndrome is calculated as sm = H yn, where H is an m × n binary matrix, and addition is performed modulo two. This is the same technique used for compression in Section 3.2.

We choose a low-density parity-check (LDPC) code [26] as the linear transform. Further discussion of the LDPC codes we use and how we modify them is provided in Section 2.3 and in our experimental descriptions in Section 7.3.4, respectively. See Chapters 3 and 5 for a more extensive discussion of code design and the merits of using LDPC codes as a base.

The H matrix corresponding to an LDPC code is a sparse matrix, i.e., it has few ones in any column or row. As detailed later, this choice makes efficient application of iterative message-passing inference possible. As with the encryption function, the “code constraint” function node of Figure 7.3 can also be factored into local functional constraints, as shown in Figure 7.5. In particular, H can be described by m local constraints, each corresponding to a row of the H matrix. For example, say that there is a non-zero (i.e., unity) entry in the ith column and jth row of H. Then in the factor graph we draw an edge connecting the ith variable ŷi to the jth parity constraint gj. The sum of all ŷi connected to gj must equal sj.


Figure 7.5. The graphical model of the constraints imposed by using a LDPC code for compression. All cipher-text estimate yˆi connected to the same parity-check gj must sum to the syndrome value sj.
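The compression step itself is a sparse matrix-vector product over GF(2). A sketch is given below, with a small random sparse H standing in for the pruned LDPC matrices described in Section 7.3.5; the dimensions and row weight are illustrative assumptions only.

import numpy as np

def syndrome(H, y):
    # s = H y over GF(2); H is an m x n binary matrix, y a length-n bit vector.
    return (H @ y) % 2

rng = np.random.default_rng(0)
m, n, row_weight = 400, 1000, 6
H = np.zeros((m, n), dtype=np.uint8)
for j in range(m):
    H[j, rng.choice(n, size=row_weight, replace=False)] = 1   # sparse rows

y = rng.integers(0, 2, n, dtype=np.uint8)    # cipher-text bits
s = syndrome(H, y)                            # m compressed bits; rate = m/n = 0.4
print(s.shape, s[:10])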

Here, we implicitly assume that the code provides sufficiently many compressed bits for the decoder to recover the source. In practice the encoder needs to learn how many bits it must supply. We discuss a relaxation to this assumption in Chapter 6.

7.2.3 Source and Source Model

Unlike the encryption and compression functions, where exact description of the imposed constraints is possible, we cannot have exact knowledge of the underlying characteristics of the general source. Instead, the decoder must use a model of the source. Thus, in Figure 7.3 the left-hand side function is labeled “source” (as determined by nature) while the corresponding function on the right-hand side is labeled “source model” (chosen by the system designer). The compression rates our system can achieve depend strongly on the quality of the model.

We first discuss the simple n-bit i.i.d. source model of Johnson et al. [38] for the “source model” constraint. This is the same model discussed in Chapter 3. We defer the discussion of more sophisticated models to Section 7.3 and Chapter 8. The i.i.d. model is represented in Figure 7.6. This graph consists of the n variable nodes corresponding to the source bit estimates, labeled x̂i, and the n function nodes corresponding to the source priors, labeled Pi. The source prior Pi is the marginal probability that each bit equals one, denoted p = Pr(x̂i = 1). For now, we assume the decoder is supplied with knowledge of p. We show how to relax this assumption in Section 7.3.4.


Figure 7.6. The simple i.i.d. model considered in [38] and in Chapter 3. This model consists of the source estimates xˆi and their priors.

7.2.4 Decoding Algorithm

We recover the plain-text source using the sum-product message passing decoding algorithm. We give a brief summary of the algorithm here; for greater depth, see Section 2.3.1. The sum-product algorithm is an inference algorithm that works on the factor graph. When the factor graph is a tree, the algorithm produces the correct variable estimates. It is not exact when there are loops in the factor graph (such as in the code graph of Figure 7.5). Empirically, however, its performance is quite good on average with reasonable computational complexity (the number of calculations grows polynomially with the block length). This is due both to code sparsity (thus its loops are typically quite long) and the slowly varying⁴ nature of the source.

The algorithm works by iteratively sending messages from node to node along the edges of the factor graph. Each message pertains to a single variable node and represents the algorithm's current belief of whether its value is zero or one. Recall that all variables in our system are binary. We thus use log-likelihood ratios log[P(xi = 0)/P(xi = 1)] to represent beliefs. By using this parameterization we only need to propagate scalar messages. Incoming messages are fused together to produce updated beliefs. The sum-product algorithm operates over the factor graph of Figure 7.3, with the simplifications developed in Section 7.2.1–Section 7.2.3. As message passing iterations progress, statistical information is disseminated across the graph. The algorithm continues until either a stopping condition is met (e.g., if we compress our current best guess of ŷn and the result is sm) or a maximum number of iterations is reached.

⁴ For most models considered here, there is strong dependency between adjacent bits (with the exception of the i.i.d. model). See Section 7.3 and Chapter 8.

We now describe the rules used to fuse together incoming messages into refined beliefs at each node. Messages coming into a variable node are added together to produce an updated log-likelihood estimate of the value of the node. The rule for fusing messages incoming to the ith parity constraint (gi) is nearly identical to the update rule for LDPC codes [70], altering the sign as dictated by si; that is, gi's fusion rule takes into account the value of the ith syndrome. See Section 3.4.1 for examples of the specific form of the update rules. The update rule for encryption function nodes (fi) is simple: flip the sign of the log-likelihood ratio if the key bit is 1,

$$\mu_{f_i, \hat{x}_i} = (-1)^{k_i}\, \nu_{\hat{y}_i, f_i}.$$
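The two node-local rules just described can be written compactly. The sketch below shows, in our own notation, the encryption-node sign flip and a syndrome-adjusted check-node update of the standard tanh form; it is a schematic of single messages under the stated LLR convention, not a full decoder, and the function names are ours.

import numpy as np

def encryption_node_update(nu, key_bit):
    # mu_{f_i -> x_i} = (-1)^{k_i} * nu_{y_i -> f_i}: flip the LLR sign if k_i = 1.
    return -nu if key_bit == 1 else nu

def check_node_update(incoming, syndrome_bit):
    # Standard LDPC check-node (tanh) rule, with the sign set by the syndrome value.
    prod = np.prod(np.tanh(np.asarray(incoming) / 2.0))
    sign = -1.0 if syndrome_bit == 1 else 1.0
    # Clip to keep arctanh finite for numerically saturated messages.
    return 2.0 * np.arctanh(np.clip(sign * prod, -0.999999, 0.999999))

print(encryption_node_update(1.7, key_bit=1))            # -1.7
print(check_node_update([0.8, -1.2, 2.5], syndrome_bit=1))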

The message update rules for the specific source models we implement are discussed in Section 7.3 and Chapter 8.

In this section we have made specific choices for the code and encryption constraints of Figure 7.3. We chose these particular structures for their practical strengths, since they interact well together in the sum-product algorithm. Other choices could be made for these constraints, but they may result in significantly increased computational complexity. In the following section, we discuss various aspects of the source constraint.

7.3 From Compression of Encrypted Simple Sources to Encrypted Images

We now focus on the issue of source modeling. In this section, we consider source models of complexity greater than the i.i.d. source model from Section 7.2.3. Each source model is intended for use with the factor graph of Figure 7.3 and is described in terms of the decoder half (right-hand side) of Figure 7.3; the models are of increasing complexity, with correspondingly better compression performance. Since we focus on bit-wise stream ciphers in Section 7.2.1, each of our models is also bit based. As we describe each model, we illustrate them using factor graphs. We omit discussion of blind transmission for clarity.

7.3.1 1-D Markov Model

The first extension of the i.i.d. model we consider is Markov memory between successive bits. The factor graph of our 1-D Markov model is shown in Figure 7.7. In this figure, we again consider the n bits, labeled xˆi, and the source priors, labeled Pi. Additionally, we now consider the correlation between consecutive bits, shown as the constraints labeled Mi. The correlation constraints represent the Markov state transitions. In addition to parameterizing the source priors with the marginal probability p = Pr(xˆi = 1), we now also parameterize the Markov correlation constraints. Assuming these correlations to be equivalent both forwards and backwards, we denote these correlations h0 = Pr(xˆi = 1|xˆi−1 = 0) = Pr(xˆi = 1|xˆi+1 = 0) and h1 = Pr(xˆi = 1|xˆi−1 = 1) = Pr(xˆi = 1|xˆi+1 = 1).


Figure 7.7. The 1-D source model. This model consists of bits (circles), source priors (squares below the bits) and correlation constraints (squares between the bits). In addition, message labels, µ and ν, are presented.

7.3.2 2-D Markov Model

In order to incorporate the spatial correlations of sources such as images, we introduce a 2-D Markov model⁵. We consider the n bits as arranged according to a grid, with Nv rows and Nh columns (Nv × Nh = n). In addition to the source prior, we consider the correlation between each pixel x̂i,j and its 4 nearest neighbors: up & down, left & right. A section of the corresponding factor graph is illustrated in Figure 7.8. Besides the circles labeled x̂i,j (i ∈ {1, ..., Nh} and j ∈ {1, ..., Nv}) representing the bits and the squares labeled Pi,j representing the source priors, the constraints labeled $M^h_{i,j}$ represent the horizontal correlations while those labeled $M^v_{i,j}$ represent the vertical correlations.

As with the other models, the prior probability on each bit is denoted p = Pr(x̂i,j = 1). For the 2-D model, we denote the horizontal correlation parameters as h0 = Pr(x̂i,j = 1 | x̂i,j−1 = 0) = Pr(x̂i,j = 1 | x̂i,j+1 = 0) and h1 = Pr(x̂i,j = 1 | x̂i,j−1 = 1) = Pr(x̂i,j = 1 | x̂i,j+1 = 1). We further denote the vertical correlation parameters as v0 = Pr(x̂i,j = 1 | x̂i−1,j = 0) = Pr(x̂i,j = 1 | x̂i+1,j = 0) and v1 = Pr(x̂i,j = 1 | x̂i−1,j = 1) = Pr(x̂i,j = 1 | x̂i+1,j = 1).

⁵ The constructions of [82, 65], developed concurrently, are related but consider an image and a noisy version of the same image. Since neither the key nor the cipher-text are images here, their constructions do not directly apply.

Figure 7.8. A section of the 2-D source model. This model includes the source bits (circles) and both horizontal correlations, $M^h_{i,j}$, and vertical correlations, $M^v_{i,j}$.

7.3.3 Message Passing Rules

Incorporating the models above into the general framework of Figure 7.3 is straightforward. In this section, we give the form of the additional message update rules that were not covered in either Section 7.2.4 or Section 2.3.1. There are two additional types of nodes considered here: the source priors and the correlation constraints. Since each source prior is a terminal node, its message is constant across iterations and equals the log-likelihood ratio of the bit according to the prior, p.

As a representative example, we write the update rule for the 1-D message (Figure 7.7) from M2 to x̂2 (labeled µ) based on the message from x̂1 to M2 (labeled ν). All other correlation constraint message updates (messages in the alternate direction as well as the message update rules for $M^h_{i,j}$ and $M^v_{i,j}$) are identical in form. Recall the parameters for M2 here: h0 = Pr(x̂i = 1 | x̂i−1 = 0) and h1 = Pr(x̂i = 1 | x̂i−1 = 1). The update equation is given below,

$$\mu = \log\left(\frac{(2 - h_0 - h_1) + (h_1 - h_0)\tanh(\nu/2)}{(h_0 + h_1) + (h_0 - h_1)\tanh(\nu/2)}\right). \qquad (7.1)$$

M2 thus converts the estimate of x̂1 to an estimate of x̂2, based on the correlation between x̂1 and x̂2.
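A direct implementation of the update of Equation (7.1) is a one-liner; the sketch below also cross-checks it against the same update computed from raw probabilities, as a sanity check on the algebra. The function name and the numeric values of ν, h0, and h1 are illustrative assumptions.

import numpy as np

def markov_edge_update(nu, h0, h1):
    # Equation (7.1): convert an LLR for x_{i-1} into an LLR for x_i.
    t = np.tanh(nu / 2.0)
    return np.log(((2 - h0 - h1) + (h1 - h0) * t) / ((h0 + h1) + (h0 - h1) * t))

# Cross-check against the probability-domain computation.
nu, h0, h1 = 0.9, 0.0399, 0.9331
p0 = 1.0 / (1.0 + np.exp(-nu))            # P(x_{i-1} = 0) implied by the LLR
p1 = 1.0 - p0
mu_direct = np.log((p0 * (1 - h0) + p1 * (1 - h1)) / (p0 * h0 + p1 * h1))
print(markov_edge_update(nu, h0, h1), mu_direct)   # the two values agree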

7.3.4 Doping

As mentioned in Section 7.2.2, the codes used to compress these encrypted sources are modified forms of LDPC codes. The code consists of bits calculated using a sparse linear transformation (based on an LDPC matrix) and a portion of the encrypted bits transmitted uncompressed. We refer to these uncompressed bits as “doped” bits. In practice, the doped bits range from 30% to 50% of the total output compressed bits (see Section 7.3.5 for sample empirical results). We use these bits in two ways⁶.

First, since doped bits are known unambiguously at the decoder, they anchor the iterative decoder and initiate the decoding process. Without these bits as a catalyst, decoding fails, since marginally each source bit is roughly uniform and thus the log-likelihood ratios lack bias. This aspect of their use is similar to their use in the distributed source coding scheme for Markov sources of García-Frías & Zhong [34] and in fountain codes [51].

Second, they additionally provide a mechanism to estimate the statistics of the encrypted data. As mentioned in Section 7.2.4, we assume that the decoder knows the source statistics. Using doped bits, we can eliminate this assumption. By selecting an appropriate subset of the source to send as doped bits, the decoder develops estimates of the source parameters (p, h0, etc.). For example, with Markov source models we dope adjacent bits for parameter estimation. In this work, we dope bits uniformly at random. Study of more structured doping schemes is an open problem. For instance, it may be that doping bits according to the variable node degree of the compression code results in better sum-product algorithm performance.
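One way to realize this estimation step, under the stream-cipher assumption (the decoder can decrypt each doped cipher-text bit with its key bit), is sketched below. Doping adjacent index pairs is the choice mentioned above; the helper names and the hypothetical correlated data used to exercise the function are ours.

import numpy as np

def estimate_markov_params(doped_pairs):
    # Estimate (p, h0, h1) from decrypted adjacent source-bit pairs (x_i, x_{i+1}).
    pairs = np.asarray(doped_pairs)
    first, second = pairs[:, 0], pairs[:, 1]
    p = pairs.mean()
    h0 = second[first == 0].mean() if np.any(first == 0) else 0.5
    h1 = second[first == 1].mean() if np.any(first == 1) else 0.5
    return p, h0, h1

# Hypothetical doped data: adjacent pairs sampled from a strongly correlated source.
rng = np.random.default_rng(0)
x_prev = rng.random(2000) < 0.37
x_next = np.where(x_prev, rng.random(2000) < 0.93, rng.random(2000) < 0.04)
print(estimate_markov_params(np.stack([x_prev, x_next], axis=1)))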

Though we refer to these bits specifically, they could be incorporated into the code's degree distribution. Each bit transmitted doped is equivalent to adding a degree 1 check node to the sparse linear transform. A study of code design techniques [70] incorporating doped bits is an open problem.

⁶ Though not a part of this discussion, one additional use of these “doped” bits would be in initialization of the blind transmission protocol of Chapter 6.

7.3.5 Experimental Results

We use an example to demonstrate the compression results. Consider the image in Figure 7.9, reprinted from Figure 7.2. We encrypt this binary world map image (10,000 bits) and decompress it using the various idealized source models. Note that for the 1-D Markov model, we use a raster scan (north to south, then east to west) of the image. Through a sequence of trials, we determine the minimum rate to which we can compress the encrypted image and still recover the original source exactly, and we present that rate here.

For these experiments, we generate a finite set of LDPC codes from a selection of LDPC degree distributions [70]. We use 38 different degree distributions (each both check and variable irregular), with compression rates ranging from 0.025 to 0.95, with a spacing of approximately 0.025 between codes. Most degree distributions are obtained from the binary symmetric channel design section of the LTHC website [4]. The rest of the degree distributions we use are designed using EXIT (extrinsic information transfer) charts [7], though similar performance is seen by selecting codes from LTHC designed for other channel models (such as the binary additive white Gaussian noise channel). Using these degree distributions, the transform matrices are generated randomly, and then pruned to remove short loops (of length 4 or less). For greater detail, see Appendix A.

Figure 7.9. A sample 100 × 100 world map image (also shown in Figure 7.2) is on the left (10, 000 bits total). To encrypt this image, the 10, 000 bit random key in the center is added to the unencrypted image on the left (using a bit-wise exclusive-OR).

Our first attempt to decompress the world map uses the i.i.d. model. Based on the doped bits we measure this image, under an i.i.d. source model, to have parameter (p) = (0.38) and thus an entropy of about 0.9580 bits per pixel. In practice, we are unable to recover the source no matter what code rate we use to compress the encrypted image. Using the 1-D model, we estimate (p, h0, h1) = (0.3738, 0.0399, 0.9331), and the encrypted image could be compressed to 7,710 bits. In that simulation, the image was reconstructed in 27 iterations⁷. Finally, we compress the encrypted world map image to 4,299 bits using the 2-D model, of which 2,019 are doped bits. The decoder empirically estimates (p, h0, h1, v0, v1) = (0.3935, 0.0594, 0.9132, 0.0420, 0.9295) and then reconstructs the original image using 81 iterations. The compressed bits and the reconstruction are presented in Figure 7.10.


Figure 7.10. A comparison of the compressed bits and reconstructed image using the 1-D Markov model and the 2-D Markov model. The 1-D model compressed the encrypted data to 7, 710 bits. This is 3, 411 bits more than the 4, 299 bits used for the 2-D model. The i.i.d model could not compress the encrypted image at all.

To demonstrate the effects of the source model on decoding, we present the estimates of the 1-D and 2-D decoders at three iterations in Figure 7.11. The influence of the 1-D source model on the decoder estimates is seen in their convergence along north-south lines. When the 2-D source model is used, in contrast, the estimates converge in “clumped” regions, seeded by the doped bits, that grow from iteration to iteration.

⁷ Note that we did not study the effect of the model on the number of iterations required to decode. Empirical results tended to show that a minimal number of iterations are required unless the code rate is close to the minimum.

[Figure 7.11 panels show the 1-D (top row) and 2-D (bottom row) decoder estimates at iterations 1, 3, and 5.]

Figure 7.11. A comparison of the intermediate estimates at the decoder. The estimates from the 1-D Markov model are on the top row, while estimates from the 2-D model are on the bottom row. The estimates from the 1-D model exhibit convergence along north-south lines. In contrast, the 2-D model decoder exhibits a localized clumping nature.

7.3.6 Gray Scale Images

We discuss the challenges posed by gray-scale images whose pixel values range from 0 to 255. As with the 2-D model, pixels are assumed to lie on a grid with Nv rows and Nh columns. We use a bit plane decomposition of each pixel, with 8 bits per pixel. The bit planes range from most to least significant. Due to the bitwise focus of this model, we find it convenient to consider each bit plane separately, modeling each with the factor graph of Figure 7.8.
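A bit-plane decomposition of an 8-bit image can be obtained with simple shifts, as in the following sketch; the random image is a stand-in for a real frame, and the function names are ours.

import numpy as np

def to_bit_planes(img):
    # Split an 8-bit image into 8 binary planes, most significant first.
    return [((img >> b) & 1).astype(np.uint8) for b in range(7, -1, -1)]

def from_bit_planes(planes):
    # Reassemble the 8-bit image from its planes (inverse of to_bit_planes).
    img = np.zeros_like(planes[0], dtype=np.uint8)
    for b, plane in zip(range(7, -1, -1), planes):
        img |= (plane.astype(np.uint8) << b)
    return img

img = np.random.default_rng(0).integers(0, 256, (288, 352), dtype=np.uint8)
planes = to_bit_planes(img)
assert np.array_equal(from_bit_planes(planes), img)
print(len(planes), planes[0].shape)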

In contrast to the bit plane model, we could use a pixel based model. As an advantage, this would allow us to consider all bit planes jointly. Unfortunately, this would also result in a drastic increase in decoding complexity, since the computational complexity of the sum-product algorithm is proportional to the variable alphabet size. Developing the algorithm to operate efficiently with pixel based operations is an open problem.

The 1-D and 2-D models provide significant improvements over the i.i.d. model when considering binary images. These models are very simple in contrast with what is used in unencrypted gray-scale image coding. Unfortunately, the extension of the idealized models from above to “real world” gray-scale images proves difficult. In practice we find significant exploitable correlation gains only for the two most significant bit planes. Even exploiting inter-bit plane correlations (as discussed in Chapter 8) provides no significant additional gain.

Two of the most common techniques used for compression of unencrypted gray scale images are inapplicable to compressing encrypted images. The first method is the application of transforms, such as the DCT (Discrete Cosine Transform). As can be seen in Figure 7.12 though, bitwise encryption of gray scale images is a non-linear operation. Therefore the transform of the encrypted image lacks the exploitable structure of the transform of the unencrypted image. The second method is the use of localized predictors. Without access to the local unencrypted context available to compressors of unencrypted images, localized predictors fail. In contrast, due to the high temporal correlation that is present, video offers greater opportunity for compression. We consider the compression of encrypted video in Chapter 8.

Figure 7.12. A sample image, “Foreman,” is shown on the left (811,008 bits total). To encrypt this image, the 811,008 bit random key in the center is added to the unencrypted image (using a bitwise exclusive-OR). Many classical methods exist to compress the “Foreman” image. These methods use techniques such as transform based methods and localized predictors. Unfortunately, due to the nonlinear effects of bitwise encryption, none of the classical methods or techniques will successfully compress the encrypted image.

7.4 Conclusions & Open Problems

In this chapter we presented a practical scheme for compressing encrypted data. We first described a general framework, and developed this framework for LDPC based codes and stream ciphers. We proposed idealized i.i.d., 1-D, and 2-D Markov models for compressing encrypted images, and gave an example of their benefit.

The work presented herein is a proof of concept. It suggests a number of areas for further study. First, the study of other models for data could provide even stronger compression performance. For example, in nature the correlation between image pixels is not limited simply to adjacent pixels. Incorporation of a greater number of relationships in the model could improve system performance, though these gains would be balanced by increased model complexity.

Another area of study is in the structure of the “doped” bits. We would like to study a systematic method for selecting which bits to dope. Some questions related to bit doping immediately arise from the work presented in this chapter. The first question is what proportion of bits should be doped. Second, we would like to study structured bit doping schemes. It seems likely that doping based on the code structure could result in better performance. Finally, we would like to incorporate the open problem of code design presented in Section 4.5 into this study.

A final area of future work suggested by this chapter is the study of the compression of other encrypted “real world” sources, such as text, audio, and video. In the following chapter, we present such a scheme for compressing encrypted video sequences.

Chapter 8

Compression of Encrypted Video

In this chapter, we build on the framework presented in Chapter 7 to develop a scheme for compressing encrypted video. Encryption masks the source, thereby rendering traditional compression algorithms ineffective. However, by conceiving of the problem as one of distributed source coding, it is shown in (Johnson et al., 2004) that encrypted data is as compressible as unencrypted data. Still, as discussed throughout this thesis, there are a number of challenges to implementing these theoretical results. In this chapter we tackle some of the major impediments.

To achieve compression, it is crucial that our models are well matched to the underlying source and are compatible with our compression techniques. In this chapter, we develop such models for video. We compare our performance to a state-of-the-art motion-compensated lossless video encoder that requires unencrypted video as input. It compresses each unencrypted frame of the "Foreman" test video sequence by about 59% on average. In comparison, our proof-of-concept implementation, which operates on encrypted data, compresses the same sequence by about 33%. Because the source is masked, the encoder cannot know the target rate a priori; we incorporate the rate adaptive protocol of Chapter 6 into our compression of encrypted video solution to address this problem.

8.1 Introduction

Encryption masks digital content so that it appears completely random. This renders traditional compression algorithms ineffective. Best practices therefore dictate that content

must be compressed before it is encrypted. Unfortunately, best practices in compression and security cannot always be assumed. There are prominent examples where raw digital content is encrypted without first being compressed; this is the case, for example, in the High-Bandwidth Digital Content Protection (HDCP) protocol [86]. This motivates the search for novel compression routines that operate on uncompressed, encrypted data.

As a motivating application, consider the delivery of high-definition video content over home wireless networks. On one hand are content providers with strict security and privacy requirements. To meet these requirements while maintaining the highest quality video, content providers often encrypt their raw (uncompressed) content prior to distribution. For example, the aforementioned HDCP standard requires the encryption of raw video content. On the other hand are wireless infrastructure companies. In wireless networks, transport capacity is limited, and the data rates of raw high-definition video will overwhelm many communication links, limiting the potential of such systems. By building a prototype, this chapter illustrates a solution that can enable the successful development of these systems.

8.1.1 High-Level System Description and Prior Work

To illustrate the system architecture, consider the encryption process depicted in Figure 8.1. In the left-hand plot is a frame from the standard "Foreman" test video sequence. This "plain-text" image is unencrypted. Each of the 288 × 352 pixels in this frame takes an integer value in the range 0 to 255, so each raw frame occupies 811,008 bits. The middle plot of the figure shows an image of equal dimension to the first, but consisting of bits selected uniformly at random (an independent and identically distributed (i.i.d.) Bernoulli-0.5 sequence). This is an example of a "stream-cipher." We encrypt the image by applying a bitwise exclusive-OR (XOR) between each key bit and the corresponding plain-text bit. Assuming the key is truly random and used only once, such a "one-time pad" system offers perfect security [77]. The resulting encrypted "cipher-text" is shown in the right-hand plot. Compression of the highly structured plain-text image has been studied for decades. However, as discussed in Chapter 7, none of these techniques can be used to successfully compress the marginally random cipher-text. The cipher-text, independent of the source, is a Bernoulli-0.5 i.i.d. image and is therefore not individually compressible.
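To make the encryption step concrete, the following is a minimal sketch (illustrative only; the function and variable names are ours, and the frame below is a random stand-in) of applying a one-time pad to a gray scale frame with a bitwise exclusive-OR:

```python
import numpy as np

def encrypt_frame(plain_frame, rng):
    """XOR an 8-bit gray scale frame with a random one-time pad of the same shape.

    Returns the cipher-text frame and the key. Decryption is the same XOR
    applied with the same key.
    """
    key = rng.integers(0, 256, size=plain_frame.shape, dtype=np.uint8)
    cipher = np.bitwise_xor(plain_frame, key)
    return cipher, key

# Frame size from the text: 288 x 352 pixels at 8 bits each = 811,008 bits.
rng = np.random.default_rng(0)
plain = rng.integers(0, 256, size=(288, 352), dtype=np.uint8)  # stand-in frame
cipher, key = encrypt_frame(plain, rng)
assert np.array_equal(np.bitwise_xor(cipher, key), plain)  # XOR with the key recovers the plain-text
```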

As discussed in Chapter 7, while the plain-text and cipher-text are independent the cipher-text and stream-cipher are not. This is the crucial insight. Since the stream-cipher

Figure 8.1. A sample frame from the standard "Foreman" video test sequence is shown on the left (811,008 bits total). To encrypt this frame, the 811,008-bit random key in the center is added to the unencrypted image (using a bitwise exclusive-OR). Although many classical methods exist to compress the "Foreman" video sequence, none will successfully compress the encrypted video.

is known to the decoder, compression is possible by leveraging the dependence between cipher-text and stream-cipher. The compression and reconstruction/decryption process can be understood by viewing the cipher-text as a noisy version of the stream-cipher.

We compress the cipher-text by calculating a number of parity constraints on the cipher-text bits. These are mod-2 sums of various subsets of the cipher-text bits. The resulting sums are the compressed bits, which we refer to as the "syndrome" of the cipher-text (as discussed in Chapter 7). The joint decompressor and decrypter combines the key sequence and the syndrome to find the source sequence closest to the key sequence that also satisfies the parity constraints. This search is identical to the decoding process of an error-correcting code. By choosing the parity constraints appropriately, the cipher-text can be recovered with high probability. Adding the key sequence to the reconstructed cipher-text yields the plain-text, assuming decoding success.
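As an illustration of this compression step (a sketch under our own naming; the parity-check matrices actually used are the LDPC codes described in Section 8.4), the syndrome is a matrix-vector product over GF(2):

```python
import numpy as np

def compress_to_syndrome(cipher_bits, H):
    """Compress a length-n bit vector to its length-m syndrome s = H y (mod 2).

    Each syndrome bit is a mod-2 sum of the cipher-text bits selected by one
    row of the binary parity-check matrix H (m < n, so R = m/n < 1).
    """
    return (H @ cipher_bits) % 2

# Toy example: n = 8 cipher-text bits compressed to m = 4 syndrome bits.
rng = np.random.default_rng(1)
H = rng.integers(0, 2, size=(4, 8))   # stand-in parity-check matrix
y = rng.integers(0, 2, size=8)        # cipher-text bits
s = compress_to_syndrome(y, H)        # the compressed bits sent to the decoder
```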

A simplified block diagram of the system is depicted in Figure 8.2 (a repeat of the block diagram introduced in Figure 7.1). The length-n plain-text source x^n is encrypted with the stream-cipher k^n to produce the cipher-text y^n. The cipher-text is compressed into the nR (R < 1) bits of the syndrome. The decoder makes its source estimate x̂^n based on the syndrome and the stream-cipher k^n. In standard Slepian-Wolf terminology, the key sequence is the "side-information" available to the decoder.

In the previous chapter, we discussed a framework for compressing encrypted sources, ranging from simple sources to images. However, we foresee the main applications of these ideas in sources such as video. The HDCP motivating example above underscores


Figure 8.2. The source is first encrypted and then compressed. The compressor does not have access to the key used in the encryption step. The decoder jointly decompresses and decrypts. The key serves as the decoder side information.

video as an important focus. Unfortunately, for video sources, the models discussed in Chapter 7 poorly capture the source characteristics.

8.1.2 Main Results & Outline

Although we demonstrated improvements on images in Chapter 7, our techniques truly gain traction when the lessons learned are applied to video content. In this chapter, we show that, in contrast to images, encrypted video is better matched to our techniques and is an application in much greater need of such a solution. Operating on a frame-by-frame basis, video offers the decoder the opportunity to combine intra-frame spatial statistical models with predictive temporal models. We present a proof-of-concept implementation and evaluate its performance on standard video sequences. Finally, we incorporate the feedback ideas of Chapter 6 to give a fully operational system for blind compression of encrypted video.

This chapter is organized as follows. Section 8.2 provides a model to incorporate into the framework of Chapter 7 for compressing encrypted video. In Section 8.3 we discuss the decoding algorithm for encrypted video. Section 8.4 provides the results of compressing encrypted video, and compares the results to other leading lossless video codecs. In Section 8.5, we discuss how to incorporate the blind protocol of Chapter 6 and provide compression results. Finally, we provide concluding notes and open problems in Section 8.6.

8.2 Source Model

In this section, we consider a model for video sequences. Encrypted video offers a greater opportunity for compression than still images, due to the availability of temporal correlations in addition to spatial correlations. Because our scheme compresses and decompresses frame by frame, the decoder has access to temporal (as opposed to just spatial) predictors. In this section, we consider videos composed of sequences of gray scale images. Since we consider bitwise stream ciphers (Section 7.2), we develop a bitwise source model for video sources. We then proceed to describe adaptations made to the decoder and finally present practical results (including blind simulations). Note that we describe our model in terms of the joint decoder from the framework presented in Section 7.2.

We base our model for video on a succession of the 2-D Markov models considered in Section 7.3.2, one for each bit plane. We begin by relabeling each bit from Figure 8.3 (reprinting Figure 7.8) as x̂^t_{i,j}[l], representing the lth most significant bit of the pixel at row i and column j of frame t. A dropped index implies that we are considering the entire range over that index. For example, x̂^t[l] represents the entire lth bit plane of frame t, while x̂^t represents the entire frame at time t, with elements ranging from 0 to 255 (since we include every bit plane).
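For concreteness, a small sketch (with our own function name) of extracting the lth most significant bit plane of an 8-bit frame:

```python
import numpy as np

def bit_plane(frame, l):
    """Return bit plane l of an 8-bit frame (l = 1 is the most significant),
    i.e. the array of bits x^t_{i,j}[l] over all pixel positions (i, j)."""
    return (np.asarray(frame) >> (8 - l)) & 1
```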

We begin the description of our temporal prediction scheme with a high-level enumeration of the steps involved.

1. Generate frame predictors x̃^t and x̃^{t−1} based on motion extrapolation from the previously decoded frames x̂^{t−1}, x̂^{t−2}, and x̂^{t−3}. Using the motion extrapolation function g(·, ·), we denote this as follows:

$$\tilde{x}^t = g(\hat{x}^{t-1}, \hat{x}^{t-2}), \qquad \tilde{x}^{t-1} = g(\hat{x}^{t-2}, \hat{x}^{t-3}).$$

2. Having generated the frame predictors, the next step is to estimate their quality. We generate an estimated distribution, P̃(x̂^t, x̃^t), based on the empirical distribution of the prior decoded frame and its predictor. We denote this as follows:

$$\tilde{P}(\hat{x}^t, \tilde{x}^t) \approx P(\hat{x}^{t-1}, \tilde{x}^{t-1}).$$

3. Finally, we condition the current frame bit plane model, x^t[l], using the estimated distribution, P̃(x̂^t, x̃^t), the predictor, x̃^t, and the prior decoded bit planes, x̂^t[1] through x̂^t[l − 1].


Figure 8.3. A reprint of Figure 7.8. A section of the 2-D source model. This model includes the source bits (circles) and both the horizontal correlations, M^h_{i,j}, and the vertical correlations, M^v_{i,j}. We apply this model to each bit plane of each frame of the video sequences.

We now discuss our temporal prediction approach in detail, diagrammed in Figure 8.4 and Figure 8.5. Figure 8.4 shows what information is fed to the decoder for the frame at time t. We assume our system works on one frame at a time, and that the previous frames were decoded exactly. The decoder therefore has a full copy of the exact previous frames available, and thus it is able to exploit temporal correlations. We generate a predictor of each frame based on the two previous frames. That is, given that the previous two frames, x̂^{t−2} and x̂^{t−1}, are decoded successfully, we generate a predictor, x̃^t = g(x̂^{t−1}, x̂^{t−2}); this function is represented by the "Predictor" boxes in Figure 8.4.

In this work, we implement a simple predictor operating across the bit planes that uses motion compensation together with motion extrapolation to generate x̃^t, as illustrated in Figure 8.5. In our scheme, we use the exact motion relationship between frames x̂^{t−2} and x̂^{t−1} to extrapolate the motion between frames x̂^{t−1} and x̂^t. We divide x̂^{t−1} into sub-blocks of N × N pixels (N = 8 in this work), and for each block, we perform motion


Figure 8.4. A diagram of how information is generated and fed to the decoder for each frame. Predicted frames are developed from the two previous frames. A measure of the predictor's accuracy is generated by measuring the accuracy of the previous frame's predictor. This gives the decoder sufficient assistance to recover the frame with the standard joint decompressor and decrypter discussed in Chapter 7.

estimation to find the best matching block in x̂^{t−2}. In other words, for the (a, b) block, with a ∈ {1, 2, ..., N_v/N} and b ∈ {1, 2, ..., N_h/N}, we find (v_x(a, b), v_y(a, b)) such that:

$$(v_x(a,b),\, v_y(a,b)) = \arg\min_{(v_x, v_y) \in \mathcal{N}} \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} d\left(\hat{x}^{t-1}_{aN+i,\,bN+j},\ \hat{x}^{t-2}_{aN+i+v_y,\,bN+j+v_x}\right),$$

where $\mathcal{N}$ denotes the allowable search range, and d(·, ·) is a metric that measures the difference between two pixel intensities. Here, we simply use the squared difference, i.e., d(α, β) = |α − β|². Assuming that there is little change in the motion fields over one frame instance, we estimate the motion vector for each block in x̂^t as the motion vector of the co-located block in x̂^{t−1}. Therefore, if (v_x, v_y) is the motion vector of the (a, b) block of x̂^{t−1}, the (a, b) block in x̃^t is then given by:

$$\tilde{x}^t_{aN+i,\,bN+j} = \hat{x}^{t-1}_{aN+i+v_y,\,bN+j+v_x}, \qquad \forall\; 0 \le i, j \le N-1.$$

A more sophisticated predictor could lead to a better x̃^t and hence better model performance, but this implementation suffices here.
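The block matching and motion extrapolation described above can be sketched as follows (an illustrative implementation under our own naming; it assumes the frame dimensions are multiples of the block size and uses a plain full search):

```python
import numpy as np

def motion_extrapolation_predictor(prev2, prev1, block=8, search=8):
    """Predict frame t from the decoded frames t-2 (prev2) and t-1 (prev1).

    For each block of prev1, find the best match in prev2 (squared difference),
    then reuse that motion vector for the co-located block of frame t and copy
    the displaced block out of prev1, i.e. motion is assumed constant over one
    frame instance.
    """
    rows, cols = prev1.shape
    pred = np.zeros_like(prev1)
    for r in range(0, rows, block):
        for c in range(0, cols, block):
            target = prev1[r:r + block, c:c + block].astype(np.int64)
            best_cost, best_v = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    rr, cc = r + dy, c + dx
                    if rr < 0 or cc < 0 or rr + block > rows or cc + block > cols:
                        continue  # candidate block falls outside the frame
                    cand = prev2[rr:rr + block, cc:cc + block].astype(np.int64)
                    cost = np.sum((target - cand) ** 2)
                    if best_cost is None or cost < best_cost:
                        best_cost, best_v = cost, (dy, dx)
            dy, dx = best_v
            # Copy the block of prev1 displaced by the extrapolated motion vector.
            pred[r:r + block, c:c + block] = prev1[r + dy:r + dy + block, c + dx:c + dx + block]
    return pred
```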

It is clear that the predicted frame x̃^t can be useful for understanding the actual frame x̂^t. To make use of the predictor, though, we must model how accurate a predictor it is. We do this by considering the empirical distribution P(x̂^{t−1}, x̃^{t−1}) of the previous frames and their predictors, as done in Chapter 6. These distributions are calculated at the pixel level since the distribution for individual bit planes can be determined as a function of the pixel level distribution. That is, we estimate the joint distribution of our current frame and its predictor as P̃(x̂^t, x̃^t) ≈ P(x̂^{t−1}, x̃^{t−1}). In Figure 8.4, we represent this function with the box labeled "P(Decoded,Predicted)." In practice, when calculating the histogram for


Figure 8.5. Predictor generation using motion compensation and motion extrapolation. Frames x̂^{t−2} and x̂^{t−1} are available, having been previously decoded. For each block in frame x̂^{t−1}, motion estimation is performed to find the best matching block in frame x̂^{t−2}. In the above example, the best predictor for the block with thickened edges in x̂^{t−1} is the lightly thickened box in x̂^{t−2}, shown with its corresponding motion vector (v_x, v_y). We estimate the motion vector for each block in frame x̂^t with that of its co-located block in frame x̂^{t−1}. Its predictor is then indicated by the estimated motion vector. Here, the predictor for the block with thickened edges in x̂^t is the lightly thickened box in x̂^{t−1}, as pointed to by the motion vector (v_x, v_y).

the previous frame and its predictor, we add it to the histogram used for decoding the prior frame (with the old counts divided by 2 to act as a forgetting factor). This results in greater stability in the estimate of the distribution between the predicted and actual frames. We omit this detail from Figure 8.4 for clarity.
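One simple way to maintain this estimate, including the halving forgetting factor just described, is sketched below (illustrative helpers of our own naming):

```python
import numpy as np

def update_joint_histogram(decoded_prev, predicted_prev, hist=None):
    """Accumulate a 256 x 256 joint histogram of (decoded pixel, predicted pixel).

    The running counts are halved before the new frame is added, so older
    frames are gradually forgotten while the estimate remains stable.
    """
    if hist is None:
        hist = np.zeros((256, 256), dtype=np.float64)
    hist *= 0.5  # forgetting factor
    np.add.at(hist, (decoded_prev.ravel(), predicted_prev.ravel()), 1.0)
    return hist

def estimate_joint_distribution(hist):
    """Normalize the histogram into an estimate of P(decoded, predicted)."""
    total = hist.sum()
    return hist / total if total > 0 else np.full_like(hist, 1.0 / hist.size)
```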

We consider one final source of side information for the decoder (as mentioned in Section 7.3.6). Since we process each frame a single bit plane at a time, we further assume that the planes are processed in order from most significant to least significant. Consider adjacent pixels x^t_{i,j} = 127 and x^t_{i,j+1} = 128. If we consider only the bitwise expansions of these two pixels, we see that they differ at every bit plane. Looking at each bit plane in isolation, the strong similarity between these two pixels is missed. Conversely, by considering each bit plane in the context of the bit planes of greater significance when using P̃(x̂^t, x̃^t), we are able to exploit more relevant correlations between bits.
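The 127/128 example is easy to check numerically; a minimal sketch:

```python
def bits_msb_first(value):
    """The 8 bits of a pixel value, most significant first."""
    return [(value >> (7 - l)) & 1 for l in range(8)]

# 127 = 01111111 and 128 = 10000000 differ in every bit plane even though the
# intensities differ by only 1; conditioning on the more significant planes
# (through the pixel-level distribution) recovers this similarity.
assert bits_msb_first(127) == [0, 1, 1, 1, 1, 1, 1, 1]
assert bits_msb_first(128) == [1, 0, 0, 0, 0, 0, 0, 0]
```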

The overall model consists of a predicted frame x̃^t based on the two previous frames and an empirical distribution P̃(x̂^t, x̃^t) based on the previous frame and its predictor. We use all of this available information to alter the parameters (p, h_0, h_1, v_0, v_1). We no longer assume

108 them to be stationary; instead we make them dependent upon their context as follows

$$\begin{aligned}
& p\bigl(x^t_{i,j}[l] \,\big|\, \tilde{x}^t_{i,j},\, \hat{x}^t_{i,j}[1], \ldots, \hat{x}^t_{i,j}[l-1]\bigr), \\
& h_0\bigl(x^t_{i,j}[l] \,\big|\, \tilde{x}^t_{i,j},\, \hat{x}^t_{i,j}[1], \ldots, \hat{x}^t_{i,j}[l-1]\bigr), \\
& h_1\bigl(x^t_{i,j}[l] \,\big|\, \tilde{x}^t_{i,j},\, \hat{x}^t_{i,j}[1], \ldots, \hat{x}^t_{i,j}[l-1]\bigr), \\
& v_0\bigl(x^t_{i,j}[l] \,\big|\, \tilde{x}^t_{i,j},\, \hat{x}^t_{i,j}[1], \ldots, \hat{x}^t_{i,j}[l-1]\bigr), \\
& v_1\bigl(x^t_{i,j}[l] \,\big|\, \tilde{x}^t_{i,j},\, \hat{x}^t_{i,j}[1], \ldots, \hat{x}^t_{i,j}[l-1]\bigr).
\end{aligned}$$

Specifically, we calculate each of these parameters using the marginalized empirical distribution P̃(x̂^t, x̃^t), conditioned as above.
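As an illustration of this marginalization (a sketch with our own naming; the prior on the current bit is shown, and h_0, h_1, v_0, v_1 are obtained analogously from the appropriate joint statistics), the pixel-level estimate can be reduced to a bit-plane probability as follows:

```python
import numpy as np

def bit_prior(joint, predicted_pixel, decoded_msbs):
    """P(next bit = 1 | predictor pixel value, already-decoded higher bits).

    joint is the 256 x 256 estimate of P(decoded, predicted); decoded_msbs is
    the list of already-decoded bits of this pixel, most significant first,
    so the bit being estimated sits at plane len(decoded_msbs) + 1.
    """
    plane = len(decoded_msbs)                 # 0-indexed plane of the next bit
    column = joint[:, predicted_pixel]        # unnormalized P(decoded value, this predictor value)
    ones = zeros = 0.0
    for v in range(256):
        bits = [(v >> (7 - l)) & 1 for l in range(8)]
        if bits[:plane] != list(decoded_msbs):
            continue                          # inconsistent with the decoded context
        if bits[plane] == 1:
            ones += column[v]
        else:
            zeros += column[v]
    total = ones + zeros
    return 0.5 if total == 0 else ones / total
```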

8.3 Message Passing Rules

When belief propagation is run for each bit plane over the graph model considered here, the operation of the constraints is similar to the operation discussed in Section 7.3.3. Since each source constraint, labeled P^t_{i,j}[l], is a terminal node, it transmits its log likelihood ratio calculated using the marginal probability p(x^t_{i,j}[l] | x̃^t_{i,j}, x̂^t_{i,j}[1], ..., x̂^t_{i,j}[l − 1]). For the correlation nodes, labeled M^{h,t}_{i,j}[l] and M^{v,t}_{i,j}[l], the message update rules are of the form in Equation (8.1) (reprinting Equation (7.1)), but modified for the correlation parameters above:

$$\mu = \log\left(\frac{(2 - h_0 - h_1) + (h_1 - h_0)\tanh(\nu/2)}{(h_0 + h_1) + (h_0 - h_1)\tanh(\nu/2)}\right). \tag{8.1}$$

The rest of the messages passed throughout the compression of encrypted data framework are identical to those discussed in Chapter 7.
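Equation (8.1) translates directly into a small numerical routine; a sketch, assuming messages are represented as log likelihood ratios as in Chapter 7:

```python
import math

def correlation_message(nu, h0, h1):
    """Outgoing message of a horizontal correlation node given incoming LLR nu,
    as in Equation (8.1); vertical correlation nodes use (v0, v1) in place of
    (h0, h1)."""
    t = math.tanh(nu / 2.0)
    numerator = (2.0 - h0 - h1) + (h1 - h0) * t
    denominator = (h0 + h1) + (h0 - h1) * t
    return math.log(numerator / denominator)
```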

8.4 Experimental Results

We evaluate our technique on the standard "Foreman," "Garden," and "Football" video test sequences.¹ Respectively, these are low-motion, high-motion, and high-motion sequences. We compress 12 encrypted frames (i.e., a group of pictures (GOP) size of 12) of these test sequences. In this group, we only compress the last nine frames (frames 4 through 12); as mentioned above, the first three frames are used to initialize the predictors. With a larger GOP size, the rate effect of the three uncompressed frames diminishes. Further, for ease of implementation, we divide each frame into 9 regions (arranged as a 3 × 3 grid).

¹ These sequences are used to establish benchmarks for video coding schemes throughout the video coding community.

Each region, having 1/9 of the pixels, contains on the order of 10^4 pixels (and each bit plane of a region has approximately 10^4 bits). Better performance is likely achievable by considering each frame as a whole.

For these experiments, we generate a finite set of LDPC codes from a selection of LDPC degree distributions [70]. We use 38 different degree distributions (each both check and variable irregular), with compression rates ranging from 0.025 to 0.95, with a spacing of approximately 0.025 between codes. Most degree distributions are obtained from the binary symmetric channel design section of the LTHC website [4]. The rest of the degree distributions we use are designed using EXIT (extrinsic information transfer) charts [7], though similar performance is seen by selecting codes from LTHC designed for other channel models (such as the binary additive white Gaussian noise channel). Using these degree distributions, the transform matrices are generated randomly, and then pruned to remove short loops (of length 4 or less). For greater detail, see Appendix A.
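A highly simplified sketch of this code generation step (toy parameters and a fixed column weight stand in for the irregular degree distributions of Appendix A; a loop of length 4 exists exactly when two rows of the parity-check matrix share two or more columns):

```python
import numpy as np
from itertools import combinations

def prune_length4_loops(H):
    """Whenever two rows share two or more columns, clear one shared entry;
    repeat until no length-4 loops remain (this slightly perturbs the degrees)."""
    changed = True
    while changed:
        changed = False
        for r1, r2 in combinations(range(H.shape[0]), 2):
            shared = np.flatnonzero(H[r1] & H[r2])
            if len(shared) >= 2:
                H[r2, shared[0]] = 0
                changed = True
    return H

def random_parity_check(m, n, col_weight, rng):
    """Draw a random m x n binary matrix with fixed column weight, then prune."""
    H = np.zeros((m, n), dtype=np.uint8)
    for j in range(n):
        H[rng.choice(m, size=col_weight, replace=False), j] = 1
    return prune_length4_loops(H)

H = random_parity_check(m=6, n=12, col_weight=3, rng=np.random.default_rng(2))
```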

As with the image results from Chapter 7, through a sequence of trials we determine the minimum rate we can compress the encrypted sections to and still recover the original source exactly. We aggregate the total number of “transmitted bits” and present the resulting rate here. The results of an automated system are presented in Section 8.5.

As in Section 7.3, we use doping to initiate the decoding process. In contrast to the models of Section 7.3 though, only 5% to 10% of the total output compressed bits are doped bits here. The reduced need for doped bits is due to the strength of the predictors used. Since the prior frames provide significant bias for the decoder, additional doped bits are unnecessary.

Our overall results are presented in Table 8.1. The numbers in this table represent the average compression ratio in terms of the number of output bits per source bit. For example, the value 0.6700 in the upper-right cell of the table indicates that, on average, only 0.6700 output bits were necessary to represent each bit of the source video. As a reminder, this number applies only to the last 9 frames in our group of pictures, ignoring the first 3 frames of the video used for initialization as in Figure 8.4. As can be seen in this table, although we cannot perform as well as we would have on unencrypted data, significant compression gains can still be achieved. Further, we can see the effect of the predictor on performance: performance is far better for the low-motion "Foreman" sequence, whose predictor frames are better than those of the other sequences, which have large deviations between predicted and actual frames.

Frames 4 - 12    JPEG-LS    MSU lossless    Encrypted
Foreman           0.4904       0.4113         0.6700
Garden            0.7320       0.5908         0.8236
Football          0.6700       0.5956         0.9283

Table 8.1. Comparison of results for the compression of three encrypted sequences compared to results of traditional encoders compressing the unencrypted sequences. Despite operating on encrypted data, the proposed approach performs well.

For purposes of comparison, we also compress the videos losslessly using publicly available software. The first system uses JPEG-LS [84] to perform pure intra-frame video compression. JPEG-LS not only has low encoding complexity, but also demonstrates exceptional compression performance relative to other lossless image coding standards [72]. In our study, we compress each video frame with the publicly available JPEG-LS coder [37]. The second system, the MSU lossless video codec [57], is a publicly available lossless inter-frame video codec, which is claimed by its authors to have the highest lossless video compression performance [58]. Due to its proprietary nature, the details of this video codec are not known. However, its performance does seem to be comparable to past results in the literature that used either a fixed spatio-temporal predictor [9] or motion compensation [53].

As an example of the operations performed for a particular frame, consider Figure 8.6. This figure demonstrates everything that goes into the decompression of frame 4 (except the encryption key, omitted for clarity). Here we see that the encoder has access to only the encrypted version of frame 3. In contrast, the decoder is given the predicted frame 4, the probability model estimating the reliability of the predicted frame, and the compressed bits. In this example, the frame is recovered without error.

In Figure 8.7 and Figure 8.8, we present more detailed results from compressing the encrypted "Foreman" sequence. The vertical axis of these plots represents the compression ratio presented earlier (output bits per source bit). In the plot on the left (Figure 8.7), we plot the rate as a function of the frame number. Note that we made no attempt to compress the first three frames. This plot shows that the overall performance varies as the video progresses, but each frame is compressed to at least 70% of the source rate. In the plot on the right (Figure 8.8), we present the rate used for each bit plane (across the 9 compressed frames). The horizontal axis ranges from the most significant bit plane (1) at left to the least significant bit plane (8) at right. For reference, we are able to compress the most significant bit plane by 78% on average. Due to variations in the magnitude of the motion, for the most significant bit plane frame 12 is the most compressible (81%) while


Figure 8.6. A sample compression of the fourth frame of the “Foreman” sequence. This block diagram demonstrates the generation of each of the elements available to the encoder and to the decoder. Here, the decoder is able to reconstruct frame 4 from the predicted frame 4, the probability model, and the compressed bits. The encryption key has been omitted from this figure for clarity.

frame 5 is least compressible (76%). This gives a good indication of how much correlation our system can exploit in each bit plane. As can be seen in this plot, we are able to obtain no compression gains in the two least significant bit planes of this sequence.

[Plot for Figure 8.7: compression ratio (output bits per source bit) versus frame number.]

Figure 8.7. Compression results for the 9 frames considered by our system: the average rate (in output bits per source bit) used on each frame. This plot demonstrates consistent rate savings from frame to frame.

[Plot for Figure 8.8: compression ratio (output bits per source bit) versus bit plane, with maximally compressed, average, and minimally compressed curves.]

Figure 8.8. Compression results for the 9 frames considered by our system. This plot shows the rate used on each bit plane, ranging from most significant (left) to least significant (right). Maximum and minimum are taken across all 9 frames. This demonstrates the consistent compressibility of each bit plane from frame to frame. It further demonstrates that successively less significant bit planes are successively less compressible.

8.5 Blind Video Compression

In this section we extend the algorithm of Chapter 6 to compressing encrypted video. In particular, the decoder feeds back rate requests (as in Section 6.4) for each portion of each bit plane of each frame. When a decoding error is detected, the decoder requests that the data be retransmitted uncompressed. As mentioned in Section 6.2, performance can be improved with a hybrid-ARQ-type retransmission strategy.

For clarity, the following discussion presumes each bit plane of each frame is compressed whole, even though in Section 8.4 we break each bit plane into sections. We label the rate used for the lth most significant bit plane of frame t as R^t[l], and the minimum value of R^t[l] for which decoding is successful as R^t[l]*.

In Section 8.4, we assume that the system knows the value of R^t[l]* for each frame and plane. In blind operation, while R^t[l]* cannot be achieved, we note that it can be determined by the decoder after successful transmission is complete. That is, after successfully decoding a portion of the video sequence, the decoder can, through trial and error, determine the value of R^t[l]*. In practice, we found that the decoder could determine this value with fewer than 20 encoding trials.

Frames 4 - 12    Foreman    Garden    Football
Encrypted         0.7918     0.9039     0.9917

Table 8.2. Results for the blind compression of the three encrypted sequences. The proposed approach achieves significant compression gains, demonstrating the value of the approach presented here.

Our goal when compressing plane x^t[l] is then to determine a rate, R^t[l], based on the best rate for the previous bit plane, R^t[l − 1]*, and on the best rate for the same plane of the previous frame, R^{t−1}[l]*. We select R^t[l] to balance the rate used in transmission against the probability of a decoding error and a retransmission, as in Section 6.3. Fortunately, there is significant predictability between the value R^t[l]* and the values R^t[l − 1]* and R^{t−1}[l]*, as can be seen in Figure 8.8.

In practice we use a function of the form φ_1 R^{t−1}[l]* + φ_2 R^t[l − 1]* + φ_3 to select R^t[l]. We generate this function by using the "Foreman" sequence as training data and then apply it to all three sequences. We select φ_1 = 0.8350, φ_2 = 0.2518, and φ_3 = 0.0426. The blind results are given in Table 8.2. As can be seen there, significant compression gains are still achievable.
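A minimal sketch of this rate-selection rule, using the trained coefficients reported above (capping the request at a rate of 1 is our own addition, since a plane is never expanded beyond its raw size):

```python
def select_rate(best_rate_prev_frame, best_rate_prev_plane,
                phi1=0.8350, phi2=0.2518, phi3=0.0426):
    """Initial rate request for bit plane l of frame t, predicted from the best
    rate found for plane l of frame t-1 and for plane l-1 of frame t."""
    return min(phi1 * best_rate_prev_frame + phi2 * best_rate_prev_plane + phi3, 1.0)
```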

8.6 Conclusions & Open Problems

In this chapter we have presented a practical scheme for compressing encrypted video sequences. We described how to adapt the models of Chapter 7 to gray scale video, and demonstrated the practical compression performance on the standard "Foreman," "Garden," and "Football" sequences. Finally, we presented an algorithm for blindly compressing encrypted video sequences, and demonstrated its performance.

The work presented herein is a proof of concept, and it suggests a number of areas for further study. As a first area of study, we would like to extend the techniques developed here to other "real world" sources, such as text and audio. Next, we would like to develop techniques to compress the initial frames in order to reduce their influence on the overall results.

Another important area of further study is the influence of predictor quality on compression performance. The prediction resulting from motion extrapolation offers significant potential for performance improvement. A better motion extrapolation technique

will result in better prediction frames and hence better performance. This is demonstrated by our algorithm's relative performance on the high-motion sequences versus the low-motion sequence. Along the lines of prediction, a technique to better evaluate the relationship between the predictor and the actual frames would also improve performance. A detailed study could better bound the error in the predicted frames relative to the actual frames. As a final issue of prediction, a further study of how to perform rate requests could result in vastly improved blind video compression.

Though we consider lossless video coding here, our inability to compress the least significant bit planes suggests a way to apply our scheme to lossy video coding, a final area of future study. Namely, by simply dropping the less significant bit planes, we would improve system performance while still providing good signal quality. Since these bit planes play little role in the prediction process, their loss should not strongly inhibit algorithm performance.

Chapter 9

Conclusions & Future Directions

In this thesis, we have described algorithms and protocols for practical distributed source coding. We have presented a solution flexible enough to be applied to arbitrarily structured distributed source coding problems. Throughout, we have demonstrated the incorporation of Low-Density Parity-Check (LDPC) codes, and how their properties can be leveraged for strong practical performance.

In the first half of this dissertation, we presented code construction and theoretical analysis for the generalized distributed source coding problem. The developments in this half of this dissertation can be classified as follows:

• A practical construction based on linear block codes for the source coding with side information problem was presented in Chapter 3. We selected LDPC codes to incorporate into this construction, and demonstrated the strength of these source codes.

• An adaptation of the construction in Chapter 3 to distributed source codes for parallel sources was presented in Chapter 4. The analysis presented there demonstrates the value of leveraging the non-stationary nature of parallel sources within a single coding framework. We further identified the resulting LDPC code design problem, and presented a powerful solution.

• A further adaptation of the construction in Chapter 3 for the arbitrarily structured distributed source coding problem was presented in Chapter 5. This linear block code based construction is flexible enough to perform over arbitrary correlation structures, for arbitrary rate points, and for an arbitrary number of sources. This construction is

also flexible enough to naturally degrade to the solution of Chapter 3. We demonstrate the application of LDPC codes to the block code construction of Chapter 5, and show that they approach the Slepian-Wolf limit.

• A protocol for blind distributed source coding, to fully automate the codes presented in this portion of the thesis, was introduced in Chapter 6. Such a protocol is necessary to eliminate the a priori assumptions made by most distributed source coding solutions. We presented a theoretical analysis of the protocol and showed that it quickly approaches the scenario with a priori knowledge of the source statistics. We further presented results demonstrating the success of this protocol in practice.

In the second half of this dissertation, we focused on the application of our distributed source coding solution to the problem of compressing encrypted data. The developments in this half of the dissertation can be classified as follows:

• A practical framework for compressing encrypted data was presented in Chapter 7. We demonstrated the flexibility of this framework to compress a source under various source models, and concluded by demonstrating the application of this framework to the compression of encrypted binary images.

• A method to compress encrypted video, extending the compression of encrypted data framework of Chapter 7, was applied to gray scale video sequences in Chapter 8. We demonstrated the significant compression gains achievable when compressing encrypted video without access to the encryption key. Finally, we incorporated the blind transmission protocol of Chapter 6 and demonstrated the results of a completely automatic blind compression of encrypted video system.

The work in this thesis suggests several avenues of further study. We itemize some of these directions here.

• The study of additional code design techniques, as they apply to the parallel source problem considered in Chapter 4, the arbitrary distributed source coding problem considered in Chapter 5, and the source models for compressing encrypted data in Chapters 7 and 8, is a significant open problem. In particular, the study of the application of density evolution techniques [70] to the code design problem offers significant potential gains.

• The study of structured doping is an interesting open problem. Though unstructured doping results in strong performance, as in Chapter 7, it seems probable that structured doping could perform better still. Doping based on either the structure of the previously decoded data or the structure of the compression code used promises strong performance improvements. The density evolution tool of Richardson & Urbanke [70] can potentially provide answers to several of these questions. Incorporating this study into the study of code design offers significant promise as a field of study.

• Application of our distributed source coding solution of Chapter 5 to larger alphabets and to a larger number of sources is another area of further study. Though our construction naturally adapts to these scenarios, it does so at the cost of increased decoder complexity. A full characterization of the system performance in these scenarios is an important next step.

• Since Chapter 6 presented only an upper bound on our protocol's performance, we would also like to analytically develop a lower bound. The development of a lower bound on protocol performance would help to establish the optimality of the protocol presented.

• In addition, the incorporation of other types of linear codes or other types of decoders into the blind protocol of Chapter 6 is a promising open problem. Rate adaptable codes and codes offering decoding guarantees promise to greatly improve convergence performance. Further, the LDPC decoder presented in Feldman, Wainwright, & Karger [24] could improve system performance through decoding quality guarantees.

• The study of improved source models for images and video is another area of further study. Though the models presented in Chapters 7 and 8 result in strong performance, it is clear that better modeling of the data would result in better performance.

• The application of the techniques for practical distributed source coding and practical compression of encrypted data to other “real world” sources is another field of further interest. Significant gains have been achieved through the study of video sequences. Similar gains are likely achievable through the study of text and audio sources.

• Throughout this dissertation, the focus has been on lossless data compression. Yet each problem considered here could also be considered in terms of lossy data compression. Extension of these techniques to lossy data compression is a significant open problem.

To conclude, we hope that this work will spur greater research into applied distributed source codes. More importantly, we hope that it will spur greater use of distributed source codes in a variety of applications.

Bibliography

[1] Digital Compression and Coding of Continuous-tone Still Images: Requirements and Guidelines, 1st ed. ISO/IEC JTC 1/SC 29 10918-1, 1994.

[2] A. Aaron and B. Girod, “Compression with side information using turbo codes,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 2002, pp. 252–261.

[3] N. Alon and A. Orlitsky, "Source coding and graph entropies," IEEE Trans. IT, vol. 42, no. 5, pp. 1329–1339, Sep. 1996.

[4] A. Amraoui and R. Urbanke, "Ldpcopt," World Wide Web, http://lthcwww.epfl.ch/research/ldpcopt/bschelp.php, 2003.

[5] T. Ancheta, “Syndrome source-coding and its universal generalization,” IEEE Trans. IT, vol. 22, pp. 432–436, Jul. 1976.

[6] ——, “Bounds and techniques for linear source coding (Ph.D. Thesis abstract),” IEEE Trans. IT, vol. 24, p. 276, Mar. 1978.

[7] A. Ashikhmin, G. Kramer, and S. ten Brink, “Extrinsic information transfer functions: model and erasure channel properties,” in IEEE Trans. IT, vol. 50, no. 11, Nov. 2004, pp. 2657–2673.

[8] R. B. Blizard, “Convolutional coding for data compression,” in Martin Marietta Corp., Denver Div., vol. Report R-69-17, 1969.

[9] D. Brunello, G. Calvagno, G. Mian, and R. Rinaldo, "Lossless compression of video using temporal information," IEEE Trans. Image Processing, vol. 12, no. 2, pp. 132–139, 2003.

[10] G. Caire, S. Shamai, and S. Verdú, "Lossless data compression with error correcting codes," in Proc. Int. Symp. Inform. Theory, Yokohama, Japan, Jul. 2003.

[11] ——, "Lossless data compression with low-density parity-check codes," in Multiantenna Channels: Capacity, Coding, and Signal Processing. Providence, RI: DIMACS, American Mathematical Society, 2003.

[12] ——, “A new data compression algorithm for sources with memory based on error correcting codes,” in Proc. Inform. Theory Workshop, Paris, France, Apr. 2003, pp. 291–295.

[13] ——, “Universal data compression with LDPC codes,” in Proc. Intl. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 2003.

[14] J. Chou, S. S. Pradhan, and K. Ramchandran, "Turbo and trellis-based constructions for source coding with side information," in Proc. Data Compression Conf., Snowbird, UT, Mar. 2003.

[15] S. Y. Chung, T. J. Richardson, and R. Urbanke, “Analysis of sum-product decoding of low-density parity-check codes using a gaussian approximation,” IEEE Trans. IT, vol. 47, pp. 657–670, Feb. 2001.

[16] T. P. Coleman, A. H. Lee, M. Médard, and M. Effros, "On some new approaches to practical Slepian-Wolf compression inspired by channel coding," in Proc. Data Compression Conf., Mar. 2004.

[17] G. Cote, B. Erol, M. Gallant, and F. Kossentini, “H.263+: Video Coding at Low Bit Rates,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 7, pp. 849–66, November 1998.

[18] T. Cover, “A proof of the data compression theorem of Slepian and Wolf for ergodic sources,” in IEEE Trans. IT, vol. 22, Mar. 1975, pp. 226–228.

[19] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley and Sons, 1991.

[20] I. Csiszár, "Linear codes for sources and source networks: Error exponents, universal coding," IEEE Trans. IT, vol. 28, no. 4, pp. 585–592, Jul. 1982.

[21] I. Csiszár and J. Körner, "Towards a general theory of source networks," IEEE Trans. IT, vol. 26, pp. 155–165, Mar. 1980.

[22] ——, Information Theory: Coding Theorems for Discrete Memoryless Systems. Academic Press, New York, 1981.

[23] S. C. Draper, “Universal incremental Slepian-Wolf coding,” in 42nd Annual Allerton Conf., Oct. 2004.

[24] J. Feldman, M. J. Wainwright, and D. R. Karger, “Using linear programming to decode linear codes,” in IEEE Trans. IT, vol. 51, no. 3, Mar. 2005, pp. 954–972.

[25] K. C. Fung, S. Tavares, and J. M. Stein, “A comparison of data compression schemes using block codes,” Proc. IEEE Int. Electrical and Electronics Conf., pp. 60–61, Oct. 1973.

[26] R. G. Gallager, “Low density parity check codes,” Ph.D. dissertation, MIT, Cambridge, MA, 1963.

[27] ——, Information Theory and Reliable Communication. New York: John Wiley and Sons, 1968.

[28] ——, “Source coding with side information and universal coding,” Mass. Instit. Tech., Tech. Rep. LIDS-P-937, 1976.

[29] A. E. Gamal and T. M. Cover, “Achievable rates for multiple descriptions,” IEEE Trans. IT, vol. 28, pp. 851–857, Nov. 1982.

[30] J. García-Frías, "Joint source-channel decoding of correlated sources over noisy channels," in Proc. Data Compression Conf., Snowbird, UT, Mar. 2001, pp. 283–292.

[31] J. García-Frías and Y. Zhao, "Compression of correlated binary sources using turbo codes," IEEE Comm. Letters, vol. 5, pp. 417–419, Oct. 2001.

[32] ——, "Data compression of unknown single and correlated binary sources using punctured turbo codes," in 39th Annual Allerton Conf., Allerton, IL, Oct. 2001.

[33] ——, “Compression of binary memoryless sources using punctured turbo codes,” in IEEE Comm. Letters, vol. 6, no. 9, Sep. 2002, pp. 394–396.

[34] J. García-Frías and W. Zhong, "LDPC codes for compression of multiterminal sources with hidden Markov correlation," in IEEE Comm. Letters, Mar. 2003, pp. 115–117.

[35] N. Gehrig and P. L. Dragotti, “Symmetric and asymmetric Slepian-Wolf codes with systematic and non-systematic linear codes,” in IEEE Comm. Letters, vol. 9, no. 1, Jan. 2005, pp. 61–63.

[36] M. E. Hellman, “Convolutional source encoding,” IEEE Trans. IT, vol. 21, no. 6, pp. 651–656, Nov. 1975.

[37] HP Labs, “HP Labs LOCO-I/JPEG-LS,” http://www.hpl.hp.com/loco/.

[38] M. Johnson, P. Ishwar, V. M. Prabhakaran, D. Schonberg, and K. Ramchandran, “On compressing encrypted data,” in IEEE Trans. Signal Processing, vol. 52, no. 10, Oct. 2004, pp. 2992–3006.

[39] V. N. Koshelev, “Direct sequential encoding and decoding for discrete sources,” IEEE Trans. IT, vol. 19, no. 3, pp. 340–343, May 1973.

[40] P. Koulgi, E. Tuncel, S. L. Ragunathan, and K. Rose, “On zero-error source coding with decoder side information,” IEEE Trans. IT, vol. 49, pp. 99–111, Jan. 2003.

[41] P. Koulgi, E. Tuncel, S. Regunathan, and K. Rose, “On zero-error coding of correlated sources,” in IEEE Trans. IT, vol. 49, Nov. 2003, pp. 2856–2873.

[42] F. Kschischang, B. Frey, and H. Loeliger, "Factor graphs and the sum-product algorithm," in IEEE Trans. IT, vol. 47, no. 2, Feb. 2001, pp. 498–519.

[43] C. Lan, A. D. Liveris, K. Narayanan, Z. Xiong, and C. N. Georghiades, “Slepian-Wolf coding of three binary sources using LDPC codes,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 2004.

[44] J. Li, Z. Tu, and R. S. Blum, “Slepian-Wolf coding for nonuniform sources using turbo codes,” in Proc. Data Compression Conf., Mar. 2004.

[45] A. D. Liveris, C. Lan, K. Narayanan, Z. Xiong, and C. N. Georghiades, “Slepian-Wolf coding of three binary sources using LDPC codes,” in Proc. Intl. Symp. Turbo Codes and Related Topics, Brest, France, Sep. 2003.

[46] A. D. Liveris, Z. Xiong, and C. N. Georghiades, “Compression of binary sources with side information at the decoder using LDPC codes,” in Proc. IEEE Global Comm. Symposium, Taipei, Taiwan, Nov. 2002.

[47] ——, "Compression of binary sources with side information at the decoder using low-density parity-check codes," IEEE Comm. Letters, vol. 6, pp. 440–442, 2002.

[48] ——, “A distributed source coding technique for highly correlated images using turbo- codes,” in Proc. Int. Conf. Acoust. Speech, Signal Processing, May 2002, pp. 3261–3264.

[49] ——, "Joint source-channel coding of binary sources with side information at the decoder using IRA codes," in Proc. Multimedia Signal Processing Workshop, St. Thomas, US Virgin Islands, Dec. 2002.

[50] ——, “Distributed compression of binary sources using conventional parallel and serial concatenated convolutional codes,” in Proc. Data Compression Conf., Snowbird, UT, Mar. 2003.

[51] M. Luby, "LT codes," in IEEE Symposium on Foundations of Computer Science, citeseer.nj.nec.com/luby02lt.html, 2002.

[52] D. J. C. MacKay and R. M. Neal, “Near Shannon limit performance of low density parity check codes,” Electron. Lett., vol. 33, pp. 457–458, Oct. 1996.

[53] I. Matsuda, T. Shiodera, and S. Itoh, “Lossless video coding using variable block-size MC and 3D prediction optimized for each frame,” European Signal Processing Conf., pp. 1967–1970, 2004.

[54] P. Mitran and J. Bajcsy, “Coding for the Wyner-Ziv problem with turbo-like codes,” in Proc. Int. Symp. Inform. Theory, Lausanne, Switzerland, Jun. 2002, p. 91.

[55] ——, “Near shannon-limit coding for the Slepian-Wolf problem,” in 21st Biennial Symposium on Communications, Jun. 2002.

[56] ——, “Turbo source coding: A noise-robust approach to data compression,” in Proc. Data Compression Conf., Apr. 2002, p. 465.

[57] MSU Graphics & Media Lab Video Group, "MSU lossless video codec," http://www.compression.ru/video/ls-codec/index_en.html.

[58] ——, "MSU lossless video codecs comparison," http://www.compression.ru/video/codec_comparison/lossless_codecs_en.html.

[59] H. Ohnsorge, “Data compression system for the transmission of digitized signals,” Proc. Int. Conf. Comm., vol. II, pp. 485–488, Jun. 1973.

[60] A. Orlitsky, “Interactive communication of balanced distributions and correlated files,” SIAM Journal on Discrete Mathematics, vol. 6, pp. 548–564, 1993.

[61] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA: Morgan Kaufmann Publishers Inc., 1988.

[62] S. S. Pradhan, "On Rate-Distortion of Gaussian Sources with Memory in the Presence of Side Information at the Decoder," Project Report, ECE 480, Department of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, December 1998.

[63] S. S. Pradhan, R. Puri, and K. Ramchandran, "n-channel symmetric multiple descriptions - part I: (n,k) source-channel erasure codes," IEEE Trans. IT, vol. 50, no. 1, pp. 47–61, Jan. 2004.

[64] S. S. Pradhan and K. Ramchandran, “Distributed source coding: Symmetric rates and applications to sensor networks,” in Proc. Data Compression Conf., Mar. 2000.

[65] ——, “Enhancing analog image transmission systems using digital side information: A new wavelet based image coding paradigm,” in Proc. Data Compression Conf., Mar. 2001.

[66] ——, "Distributed source coding using syndromes (DISCUS): design and construction," IEEE Trans. IT, vol. 49, no. 3, pp. 626–643, Mar. 2003.

[67] R. Puri, S. Pradhan, and K. Ramchandran, “n-channel symmetric multiple descriptions - part II: An achievable rate-distortion region,” IEEE Trans. IT, vol. 51, no. 4, pp. 1377 – 1392, Apr. 2005.

[68] R. Puri and K. Ramchandran, "PRISM: A New Robust Video Coding Architecture Based on Distributed Compression Principles," 40th Allerton Conference on Communication, Control and Computing, October 2002.

[69] T. J. Richardson, M. A. Shokrollahi, and R. Urbanke, “Design of capacity-approaching irregular low-density parity-check codes,” IEEE Trans. IT, vol. 47, pp. 619–637, Feb. 2001.

[70] T. J. Richardson and R. L. Urbanke, “The capacity of low-density parity-check codes under message-passing decoding,” IEEE Trans. IT, vol. 47, pp. 599–618, Feb. 2001.

[71] B. Rimoldi and R. Urbanke, “Asynchronous Slepian-Wolf coding via source-splitting,” Proc. Int. Symp. Inform. Theory, p. 271, Jul. 1997.

[72] D. Santa-Cruz, T. Ebrahimi, J. Askelof, M. Larsson, and C. Christopoulos, “JPEG 2000 still image coding versus other standards,” PROC SPIE INT SOC OPT ENG, vol. 4115, pp. 446–454, 2000.

[73] M. Sartipi and F. Fekri, “Distributed source coding in wireless sensor networks using LDPC codes: A non-uniform framework,” in Proc. Data Compression Conf., Mar. 2005, p. 477.

[74] ——, “Distributed source coding in wireless sensor networks using LDPC codes: The entire Slepian-Wolf rate region,” in Wireless Comm. And Networking Conf., vol. 4, Mar. 2005, pp. 1939–1944.

[75] D. Schonberg, S. S. Pradhan, and K. Ramchandran, “Distributed code constructions for the entire Slepian-Wolf rate region for arbitrarily correlated sources,” in Proc. Data Compression Conf., Mar. 2004, pp. 292–301.

[76] D. Schonberg, K. Ramchandran, and S. S. Pradhan, “LDPC codes can approach the Slepian Wolf bound for general binary sources,” in 40th Annual Allerton Conf., Oct. 2002, pp. 576–585.

[77] C. Shannon, "Communication theory of secrecy systems," in Bell System Technical Journal, vol. 28, Oct. 1949, pp. 656–715.

[78] D. Slepian and J. K. Wolf, “Noiseless coding of correlated information sources,” IEEE Trans. IT, vol. 19, pp. 471–480, Jul. 1973.

[79] V. Stanković, A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Design of Slepian-Wolf codes by channel code partitioning," in Proc. Data Compression Conf., Mar. 2004.

[80] T. Uyematsu, “An algebraic construction of codes for Slepian-Wolf source networks,” IEEE Trans. IT, vol. 47, no. 7, pp. 3082–3088, Nov. 2001.

[81] V. A. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Trans. IT, vol. 3, pp. 821–834, May 1993.

[82] D. Varodayan, A. Aaron, and B. Girod, “Exploiting spatial correlation in pixel-domain distributed image compression,” in Proc. Picture Coding Symp., Beijing, China, April 2006.

[83] X. Wang and M. Orchard, "Design of trellis codes for source coding with side information at the decoder," in Proc. Data Compression Conf., Snowbird, UT, Mar. 2001, pp. 361–370.

[84] M. Weinberger, G. Seroussi, and G. Sapiro, "The LOCO-I lossless image compression algorithm: principles and standardization into JPEG-LS," IEEE Trans. Image Processing, vol. 9, no. 8, pp. 1309–1324, 2000.

[85] T. Wiegand, G. J. Sullivan, G. Bjøntegaard, and A. Luthra, "Overview of the H.264/AVC video coding standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 560–576, July 2003.

[86] Wikipedia, "High-Bandwidth Digital Content Protection," http://en.wikipedia.org/wiki/High-Bandwidth_Digital_Content_Protection, October 2006.

[87] F. M. J. Willems, “Totally asynchronous Slepian-Wolf data compression,” IEEE Trans. IT, vol. IT-34, pp. 35–44, Jan. 1988.

[88] H. S. Witsenhausen and A. D. Wyner, “Interframe coder for video signals,” U.S. Patent 4,191,970, May, 1978.

[89] A. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Trans. IT, vol. 22, no. 1, pp. 1–10, Jan. 1976.

[90] A. D. Wyner, “Recent results in the Shannon theory,” IEEE Trans. IT, pp. 2–9, Jan. 1974.

[91] Z. Xiong, A. D. Liveris, and S. Cheng, “Distributed source coding for sensor networks,” IEEE Signal Proc. Mag., vol. 21, pp. 80–94, Sep. 2004.

[92] R. Zamir, S. Shamai, and U. Erez, "Nested linear/lattice codes for structured multiterminal binning," IEEE Trans. IT, vol. 48, pp. 1250–1276, Jun. 2002.

[93] Q. Zhao and M. Effros, “Lossless and near-lossless source coding for multiple access networks,” IEEE Trans. IT, vol. 49, pp. 112–128, Jan. 2003.

[94] Y. Zhao and J. García-Frías, "Data compression of correlated non-binary sources using punctured turbo codes," in Proc. Data Compression Conf., Snowbird, UT, Apr. 2002, pp. 242–251.

[95] ——, "Joint estimation and data compression of correlated non-binary sources using punctured turbo codes," Proc. Conf. on Information Sciences and Systems, Mar. 2002.

[96] W. Zhong, H. Lou, and J. García-Frías, "LDGM codes for joint source-channel coding of correlated sources," in Proc. Int. Conf. Image Processing, Barcelona, Spain, Sep. 2003.

Appendix A

Codes and Degree Distributions Used to Generate LDPC Codes

In this section we list the degree distributions we used in our simulations. These degree distributions were either obtained from the LTHC [4] online database or were generated using the EXIT chart [7] based design technique. For a further description of LDPC code design techniques, and an explanation of the terminology used in this section, see Section 2.3.2. For convenience, the compression rates of the codes, to 4 significant digits, are as follows: 0.0300, 0.0500, 0.0800, 0.1000, 0.1300, 0.1500, 0.1800, 0.2000, 0.2281, 0.2500, 0.2780, 0.3000, 0.3277, 0.3500, 0.3765, 0.4010, 0.4257, 0.4508, 0.4754, 0.5000, 0.5261, 0.5515, 0.5765, 0.6006, 0.6257, 0.6500, 0.6800, 0.7000, 0.7300, 0.7500, 0.7800, 0.8000, 0.8200, 0.8500, 0.8800, 0.9000, 0.9300, 0.9500. Note that the source coding rate, R_s, and the channel coding rate, R_c, are related as R_s = 1 − R_c. Unless otherwise specified, rates are given as source coding rates throughout this thesis. We give the detailed degree distributions for each of the codes below.

1. Code compression rate R = 0.0300. Obtained from LTHC database.
λ(x) = 0.0976872000x + 0.2084250000x^2 + 0.0239832000x^4 + 0.0025877000x^5 + 0.0030756200x^6 + 0.2216020000x^7 + 0.0791919000x^15 + 0.0178188000x^17 + 0.0707131000x^21 + 0.0032752600x^30 + 0.0539584000x^39 + 0.0959645000x^40 + 0.0373115000x^49 + 0.0555949000x^64 + 0.0288105000x^66
ρ(x) = x^199

2. Code compression rate R = 0.0500. Obtained from LTHC database.
λ(x) = 0.1130550000x + 0.2223260000x^2 + 0.0217564000x^5 + 0.1437610000x^6 + 0.0077757700x^7 + 0.0978175000x^8 + 0.0282852000x^13 + 0.0669393000x^15 + 0.0730412000x^26 + 0.0149455000x^34 + 0.2102970000x^35
ρ(x) = x^109

3. Code compression rate R = 0.0800. Obtained from LTHC database.
λ(x) = 0.1172100000x + 0.2107930000x^2 + 0.1521100000x^6 + 0.0845317000x^7 + 0.0236898000x^8 + 0.0049341200x^15 + 0.0493028000x^16 + 0.1133230000x^19 + 0.0147003000x^23 + 0.1190500000x^40 + 0.0175400000x^42 + 0.0928161000x^46
ρ(x) = x^69

4. Code compression rate R = 0.1000. Obtained from LTHC database.
λ(x) = 0.1173140000x + 0.1997290000x^2 + 0.1935750000x^6 + 0.0179234000x^7 + 0.0322564000x^8 + 0.0133904000x^16 + 0.1638470000x^18 + 0.0222610000x^20 + 0.1509760000x^48 + 0.0445984000x^50 + 0.0131310000x^51 + 0.0168516000x^56 + 0.0141470000x^58
ρ(x) = 0.5x^56 + 0.5x^57

5. Code compression rate R = 0.1300. Obtained from LTHC database.
λ(x) = 0.1154540000x + 0.1846430000x^2 + 0.1872730000x^6 + 0.0107396000x^7 + 0.0107802000x^8 + 0.0298498000x^9 + 0.0676952000x^17 + 0.0713005000x^21 + 0.0311166000x^22 + 0.0523218000x^25 + 0.1940290000x^65 + 0.0148834000x^69 + 0.0192438000x^72 + 0.0020586100x^89 + 0.0086120200x^99
ρ(x) = 0.5x^45 + 0.5x^46

6. Code compression rate R = 0.1500. Obtained from LTHC database.
λ(x) = 0.0903419000x + 0.1760760000x^2 + 0.3044350000x^6 + 0.1356970000x^15 + 0.0127703000x^16 + 0.0764734000x^22 + 0.1680910000x^27 + 0.0361156000x^49
ρ(x) = x^39

7. Code compression rate R = 0.1800. Obtained from LTHC database.
λ(x) = 0.1267310000x + 0.1851360000x^2 + 0.1895540000x^6 + 0.0406345000x^7 + 0.0170619000x^15 + 0.0511614000x^18 + 0.0888360000x^19 + 0.0299488000x^20 + 0.0217271000x^21 + 0.0074947900x^32 + 0.0092171200x^39 + 0.0078997400x^53 + 0.0871835000x^60 + 0.0412600000x^62 + 0.0890379000x^63 + 0.0071160400x^68
ρ(x) = 0.7x^31 + 0.3x^32

8. Code compression rate R = 0.2000. Obtained from LTHC database.
λ(x) = 0.0815474000x + 0.1982150000x^2 + 0.7202380000x^19
ρ(x) = x^34

9. Code compression rate R = 0.2281. Obtained from EXIT chart based design.
λ(x) = 0.0000696838x + 0.5645527721x^2 + 0.0006797545x^3 + 0.0298558824x^4 + 0.0010561819x^5 + 0.0162886198x^6 + 0.2852260980x^7 + 0.0945686623x^8 + 0.0003948341x^9 + 0.0017553939x^10 + 0.0010631110x^11 + 0.0006613925x^12 + 0.0004469575x^13 + 0.0003264797x^14 + 0.0002535001x^15 + 0.0002065416x^16 + 0.0001745563x^17 + 0.0001515564x^18 + 0.0001343969x^19 + 0.0001209905x^20 + 0.0001102764x^21 + 0.0001014784x^22 + 0.0000940540x^23 + 0.0000877841x^24 + 0.0000822511x^25 + 0.0000775660x^26 + 0.0000733644x^27 + 0.0000697260x^28 + 0.0000664825x^29 + 0.0000636670x^30 + 0.0000611618x^31 + 0.0000589626x^32 + 0.0000570479x^33 + 0.0000554059x^34 + 0.0000539797x^35 + 0.0000527396x^36 + 0.0000517750x^37 + 0.0000508221x^38 + 0.0000500451x^39 + 0.0000494506x^40 + 0.0000489172x^41 + 0.0000484412x^42 + 0.0000481137x^43 + 0.0000476051x^44 + 0.0000472480x^45 + 0.0000467789x^46 + 0.0000463338x^47 + 0.0000458079x^48 + 0.0000453462x^49 + 0.0000446644x^50 + 0.0000441249x^51 + 0.0000436608x^52 + 0.0000435132x^53 + 0.0000440380x^54
ρ(x) = x^17

10. Code compression rate R = 0.2500. Obtained from LTHC database.
λ(x) = 0.1118170000x + 0.1479280000x^2 + 0.0721407000x^5 + 0.2464250000x^6 + 0.0021321100x^8 + 0.4195580000x^29
ρ(x) = x^23

11. Code compression rate R = 0.2780. Obtained from EXIT chart based design.
λ(x) = 0.0365683268x + 0.4339804675x^2 + 0.0001299819x^3 + 0.0171330361x^4 + 0.0023872502x^5 + 0.1923109345x^6 + 0.0016539566x^7 + 0.0012354770x^8 + 0.0009357160x^9 + 0.0010813697x^10 + 0.0013693568x^11 + 0.0024491406x^12 + 0.0001371963x^13 + 0.2953988307x^14 + 0.0072606502x^15 + 0.0021649586x^16 + 0.0009936879x^17 + 0.0005717216x^18 + 0.0003736405x^19 + 0.0002657385x^20 + 0.0002003367x^21 + 0.0001578227x^22 + 0.0001285540x^23 + 0.0001075570x^24 + 0.0000919092x^25 + 0.0000799746x^26 + 0.0000706532x^27 + 0.0000632326x^28 + 0.0000571490x^29 + 0.0000521729x^30 + 0.0000480070x^31 + 0.0000444806x^32 + 0.0000414675x^33 + 0.0000389182x^34 + 0.0000366747x^35 + 0.0000347150x^36 + 0.0000329633x^37 + 0.0000314569x^38 + 0.0000301197x^39 + 0.0000288897x^40 + 0.0000278226x^41 + 0.0000268317x^42 + 0.0000259364x^43 + 0.0000251422x^44 + 0.0000244178x^45 + 0.0000237279x^46 + 0.0000231042x^47 + 0.0000225359x^48 + 0.0000219885x^49
ρ(x) = 0.3x^15 + 0.7x^16

12. Code compression rate R = 0.3000. Obtained from LTHC database.
λ(x) = 0.1392280000x + 0.2007590000x^2 + 0.2522010000x^6 + 0.0134136000x^11
     + 0.1710390000x^17 + 0.0424794000x^31 + 0.0855733000x^41 + 0.0953074000x^49
ρ(x) = 0.3x^16 + 0.7x^17

13. Code compression rate R = 0.3277. Obtained from EXIT chart based design.
λ(x) = 0.0530701457x + 0.3670818133x^2 + 0.0000467414x^3 + 0.0001693434x^4
     + 0.0001839202x^5 + 0.0069791251x^6 + 0.2340869351x^7 + 0.0293543350x^8
     + 0.0010512763x^9 + 0.0005884899x^10 + 0.0004336224x^11 + 0.0004421017x^12
     + 0.0003594770x^13 + 0.0004299928x^14 + 0.0004909475x^15 + 0.0005842977x^16
     + 0.0006793042x^17 + 0.0007968070x^18 + 0.0012180480x^19 + 0.0004888936x^20
     + 0.0233161791x^21 + 0.1332294395x^22 + 0.1290474222x^23 + 0.0085915359x^24
     + 0.0029141588x^25 + 0.0012946281x^26 + 0.0007207962x^27 + 0.0004649800x^28
     + 0.0003297155x^29 + 0.0002492324x^30 + 0.0001973730x^31 + 0.0001619024x^32
     + 0.0001364856x^33 + 0.0001174977x^34 + 0.0001029730x^35 + 0.0000914730x^36
     + 0.0000822134x^37 + 0.0000744651x^38 + 0.0000681241x^39 + 0.0000626681x^40
     + 0.0000580978x^41 + 0.0000541896x^42 + 0.0000508900x^43 + 0.0000479426x^44
ρ(x) = 0.4x^14 + 0.6x^15

14. Code compression rate R = 0.3500. Obtained from LTHC database.
λ(x) = 0.1577030000x + 0.1991060000x^2 + 0.0324838000x^5 + 0.1985480000x^6
     + 0.0807045000x^7 + 0.2997880000x^21 + 0.0310813000x^22 + 0.0005852400x^33
ρ(x) = x^13

15. Code compression rate R = 0.3765. Obtained from EXIT chart based design.
λ(x) = 0.0749746514x + 0.3276397454x^2 + 0.0000133230x^3 + 0.0000334288x^4
     + 0.0000536346x^5 + 0.0153727931x^6 + 0.0815021268x^7 + 0.2102399279x^8
     + 0.0002396780x^9 + 0.0001470809x^10 + 0.0001101496x^11 + 0.0000966061x^12
     + 0.0000903891x^13 + 0.0000890635x^14 + 0.0000914754x^15 + 0.0000970680x^16
     + 0.0001067689x^17 + 0.0001207840x^18 + 0.0001414627x^19 + 0.0001706132x^20
     + 0.0002138222x^21 + 0.0002778502x^22 + 0.0003801933x^23 + 0.0005575559x^24
     + 0.0009942277x^25 + 0.0038202879x^26 + 0.1061438593x^27 + 0.1680729114x^28
     + 0.0046663975x^29 + 0.0013383072x^30 + 0.0007207862x^31 + 0.0004404507x^32
     + 0.0002851821x^33 + 0.0002019419x^34 + 0.0001557353x^35 + 0.0001265960x^36
     + 0.0001058190x^37 + 0.0000899191x^38 + 0.0000773867x^39
ρ(x) = 0.2x^12 + 0.8x^13

16. Code compression rate R = 0.4010. Obtained from EXIT chart based design.
λ(x) = 0.0665052292x + 0.3059480919x^2 + 0.0001148099x^3 + 0.0008967976x^4
     + 0.0004931189x^5 + 0.0885292799x^6 + 0.0028298017x^7 + 0.0933425249x^8
     + 0.0270974961x^9 + 0.0288241934x^10 + 0.0067485256x^11 + 0.0000092266x^12
     + 0.0622890112x^13 + 0.0120946708x^14 + 0.0041578038x^15 + 0.0023011096x^16
     + 0.0016328252x^17 + 0.0015906365x^18 + 0.0014174658x^19 + 0.0012243279x^20
     + 0.0010393924x^21 + 0.0008685978x^22 + 0.0007279214x^23 + 0.0006136080x^24
     + 0.0005260355x^25 + 0.0004590425x^26 + 0.0004104392x^27 + 0.0003758855x^28
     + 0.0003539140x^29 + 0.0003424701x^30 + 0.0003416727x^31 + 0.0003517762x^32
     + 0.0003755456x^33 + 0.0004182788x^34 + 0.0004931475x^35 + 0.0006325812x^36
     + 0.0009463712x^37 + 0.0020643746x^38 + 0.2806119994x^39
ρ(x) = x^13

17. Code compression rate R = 0.4257. Obtained from EXIT chart based design.
λ(x) = 0.1242596346x + 0.3029532976x^2 + 0.0000000503x^3 + 0.1130781949x^4
     + 0.0000019471x^5 + 0.1030539620x^6 + 0.0000011974x^7 + 0.0000000440x^8
     + 0.0000002074x^9 + 0.0000002709x^10 + 0.0000003055x^11 + 0.0000003467x^12
     + 0.0000004646x^13 + 0.3566500769x^14
ρ(x) = 0.5x^9 + 0.5x^10

18. Code compression rate R = 0.4508. Obtained from EXIT chart based design.
λ(x) = 0.0999701013x + 0.2884808570x^2 + 0.0000004467x^3 + 0.0000007193x^4
     + 0.0000015285x^5 + 0.2323194203x^6 + 0.0053583039x^7 + 0.0000044615x^8
     + 0.0000014946x^9 + 0.0000008772x^10 + 0.0000006229x^11 + 0.0000004958x^12
     + 0.0000004363x^13 + 0.0000004124x^14 + 0.0000003973x^15 + 0.0000004041x^16
     + 0.0000004193x^17 + 0.0000004649x^18 + 0.0000005231x^19 + 0.0000006215x^20
     + 0.0000008004x^21 + 0.0000011595x^22 + 0.0000022604x^23 + 0.3738527719x^24
ρ(x) = 0.6x^10 + 0.4x^11

19. Code compression rate R = 0.4754. Obtained from EXIT chart based design.
λ(x) = 0.1153122367x + 0.2911329163x^2 + 0.0000013866x^3 + 0.0000237308x^4
     + 0.0000055074x^5 + 0.1563430715x^6 + 0.1185591176x^7 + 0.0000123470x^8
     + 0.0000039402x^9 + 0.0000021293x^10 + 0.0000016722x^11 + 0.0000014129x^12
     + 0.0000012628x^13 + 0.0000011759x^14 + 0.0000011534x^15 + 0.0000011760x^16
     + 0.0000012159x^17 + 0.0000013357x^18 + 0.0000015028x^19 + 0.0000017858x^20
     + 0.0000022500x^21 + 0.0000032496x^22 + 0.0000062909x^23 + 0.3185781326x^24
ρ(x) = 0.7x^9 + 0.3x^10

20. Code compression rate R = 0.5000. Obtained from LTHC database.
λ(x) = 0.1527930000x + 0.2823500000x^2 + 0.0062193000x^3 + 0.5586370000x^19
ρ(x) = x^9

21. Code compression rate R = 0.5261. Obtained from EXIT chart based design.
λ(x) = 0.1460827952x + 0.2706599775x^2 + 0.0375150106x^4 + 0.2419193856x^6
     + 0.3038228311x^21
ρ(x) = 0.3x^7 + 0.7x^8

22. Code compression rate R = 0.5515. Obtained from EXIT chart based design.
λ(x) = 0.1483309314x + 0.2284719333x^2 + 0.0000030096x^3 + 0.0949413930x^4
     + 0.0001039602x^5 + 0.1805360736x^6 + 0.0000004930x^7 + 0.0000044589x^8
     + 0.0000001607x^9 + 0.0000056705x^10 + 0.0000009369x^11 + 0.0000017390x^12
     + 0.0000020428x^13 + 0.0000021972x^14 + 0.0000024381x^15 + 0.0000027101x^16
     + 0.0000031452x^17 + 0.0000038634x^18 + 0.0000048678x^19 + 0.0000067093x^20
     + 0.0000100212x^21 + 0.0000167482x^22 + 0.0000309110x^23 + 0.3475135857x^24
ρ(x) = 0.3x^7 + 0.7x^8

23. Code compression rate R = 0.5765. Obtained from EXIT chart based design.
λ(x) = 0.1603735738x + 0.2493334680x^2 + 0.0000000001x^3 + 0.1398432549x^4
     + 0.0009310009x^6 + 0.1919102356x^8 + 0.2576084667x^24
ρ(x) = 0.2x^6 + 0.8x^7

24. Code compression rate R = 0.6006. Obtained from EXIT chart based design.
λ(x) = 0.2365103129x + 0.0000003166x^2 + 0.3137349967x^3 + 0.0714882030x^4
     + 0.0000017984x^5 + 0.0000026206x^6 + 0.0000020295x^7 + 0.0000013959x^8
     + 0.0000009058x^9 + 0.1706734148x^10 + 0.0004347797x^11 + 0.0000483848x^12
     + 0.0000733639x^13 + 0.0000728850x^14 + 0.0000680328x^15 + 0.0000796862x^16
     + 0.0000905415x^17 + 0.0001139106x^18 + 0.0001440668x^19 + 0.0002042075x^20
     + 0.0003112035x^21 + 0.0006324201x^22 + 0.0182989558x^23 + 0.1854026561x^24
     + 0.0016072092x^25 + 0.0000017023x^26
ρ(x) = 0.9x^6 + 0.1x^7

25. Code compression rate R = 0.6257. Obtained from EXIT chart based design.
λ(x) = 0.2345281333x + 0.2494103165x^3 + 0.0868495874x^4 + 0.1048097509x^6
     + 0.3244022119x^23
ρ(x) = 0.9x^6 + 0.1x^7

26. Code compression rate R = 0.6500. Obtained from LTHC database.
λ(x) = 0.2454340000x + 0.1921240000x^2 + 0.1357320000x^5 + 0.0838990000x^6
     + 0.1116600000x^12 + 0.0029827600x^14 + 0.0222593000x^15 + 0.0742901000x^28
     + 0.1316190000x^32
ρ(x) = 0.5x^5 + 0.5x^6

27. Code compression rate R = 0.6800. Obtained from LTHC database.
λ(x) = 0.2177260000x + 0.1634340000x^2 + 0.0001449710x^3 + 0.0000070738x^4
     + 0.0980647000x^5 + 0.1018190000x^6 + 0.0537834000x^13 + 0.0301359000x^16
     + 0.0566144000x^20 + 0.0109644000x^26 + 0.0808932000x^30 + 0.0000059471x^97
     + 0.0001740400x^98 + 0.1862340000x^99
ρ(x) = 0.9x^6 + 0.1x^7

28. Code compression rate R = 0.7000. Obtained from LTHC database.
λ(x) = 0.2202400000x + 0.1604510000x^2 + 0.1219000000x^5 + 0.0669837000x^6
     + 0.0728829000x^12 + 0.0056090100x^19 + 0.0223284000x^21 + 0.0531729000x^22
     + 0.0496530000x^25 + 0.0222808000x^26 + 0.2044980000x^99
ρ(x) = 0.1x^5 + 0.9x^6

29. Code compression rate R = 0.7300. Obtained from LTHC database.
λ(x) = 0.2374900000x + 0.1643770000x^2 + 0.1482710000x^5 + 0.0442728000x^6
     + 0.0276623000x^13 + 0.1174440000x^15 + 0.0384218000x^30 + 0.0368359000x^35
     + 0.0319294000x^37 + 0.1532960000x^99
ρ(x) = 0.7x^5 + 0.3x^6

30. Code compression rate R = 0.7500. Obtained from LTHC database.
λ(x) = 0.2911570000x + 0.1891740000x^2 + 0.0408389000x^4 + 0.0873393000x^5
     + 0.0074271800x^6 + 0.1125810000x^7 + 0.0925954000x^15 + 0.0186572000x^20
     + 0.1240640000x^32 + 0.0160020000x^39 + 0.0201644000x^44
ρ(x) = 0.8x^4 + 0.2x^5

31. Code compression rate R = 0.7800. Obtained from LTHC database.
λ(x) = 0.2547740000x + 0.1634760000x^2 + 0.0032539300x^4 + 0.1524220000x^5
     + 0.0331399000x^6 + 0.0038860400x^9 + 0.0189110000x^12 + 0.0998195000x^14
     + 0.0151103000x^27 + 0.0769337000x^29 + 0.0218393000x^32 + 0.1564350000x^99
ρ(x) = 0.3x^4 + 0.7x^5

32. Code compression rate R = 0.8000. Obtained from LTHC database.
λ(x) = 0.2920250000x + 0.1739820000x^2 + 0.0523131000x^4 + 0.0257749000x^5
     + 0.1220460000x^6 + 0.0218315000x^8 + 0.0209295000x^10 + 0.0322251000x^14
     + 0.1127710000x^23 + 0.0001708020x^25 + 0.0328124000x^31 + 0.0274748000x^44
     + 0.0048302000x^53 + 0.0126282000x^59 + 0.0681855000x^99
ρ(x) = x^4

33. Code compression rate R = 0.8200. Obtained from LTHC database.
λ(x) = 0.3037920000x + 0.1731880000x^2 + 0.0671337000x^4 + 0.0123568000x^5
     + 0.1341320000x^6 + 0.0314767000x^12 + 0.0108393000x^14 + 0.0256390000x^16
     + 0.0910351000x^19 + 0.0400076000x^39 + 0.0000240473x^45 + 0.0117242000x^51
     + 0.0189157000x^57 + 0.0112433000x^62 + 0.0684922000x^76
ρ(x) = 0.2x^3 + 0.8x^4

34. Code compression rate R = 0.8500. Obtained from LTHC database.
λ(x) = 0.3151270000x + 0.1902840000x^2 + 0.0449124000x^4 + 0.1705930000x^6
     + 0.1405970000x^17 + 0.0081261000x^37 + 0.0440236000x^41 + 0.0863369000x^66
ρ(x) = 0.5x^3 + 0.5x^4

35. Code compression rate R = 0.8800. Obtained from LTHC database.
λ(x) = 0.3424730000x + 0.1650060000x^2 + 0.1203830000x^4 + 0.0191956000x^5
     + 0.0120714000x^6 + 0.1416920000x^10 + 0.0211997000x^25 + 0.0201976000x^26
     + 0.0185881000x^34 + 0.0428897000x^36 + 0.0133019000x^38 + 0.0021735800x^39
     + 0.0104203000x^40 + 0.0704081000x^99
ρ(x) = 0.8x^3 + 0.2x^4

36. Code compression rate R = 0.9000. Obtained from LTHC database.
λ(x) = 0.3585670000x + 0.1663620000x^2 + 0.0000299853x^3 + 0.0487523000x^4
     + 0.1205300000x^5 + 0.0004778820x^6 + 0.0000422043x^7 + 0.0409013000x^10
     + 0.0744850000x^13 + 0.0339421000x^25 + 0.0076194000x^30 + 0.0564230000x^34
     + 0.0918683000x^99
ρ(x) = x^3

37. Code compression rate R = 0.9300. Obtained from LTHC database.
λ(x) = 0.4050180000x + 0.1716200000x^2 + 0.0995717000x^4 + 0.0446767000x^5
     + 0.0379776000x^6 + 0.0612300000x^10 + 0.0188277000x^14 + 0.0332702000x^16
     + 0.0026478100x^17 + 0.0127722000x^20 + 0.0435222000x^28 + 0.0075207600x^50
     + 0.0123120000x^52 + 0.0258378000x^62 + 0.0065513300x^63 + 0.0166443000x^71
ρ(x) = 0.4x^2 + 0.6x^3

38. Code compression rate R = 0.9500. Obtained from LTHC database.
λ(x) = 0.4145410000x + 0.1667480000x^2 + 0.0971414000x^4 + 0.0737392000x^5
     + 0.0007658270x^6 + 0.0022987300x^8 + 0.0118195000x^9 + 0.0751327000x^11
     + 0.0575786000x^19 + 0.0063649900x^26 + 0.0046459300x^35 + 0.0171996000x^43
     + 0.0443262000x^62 + 0.0111913000x^82 + 0.0165064000x^99
ρ(x) = 0.5x^2 + 0.5x^3

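As a sanity check on the degree distributions listed above, the short Python sketch below (illustrative only, not part of the thesis software) verifies that the λ(x) and ρ(x) coefficients sum to one and recomputes the design compression rate, assuming the standard edge-perspective convention R = ∫₀¹ ρ(x) dx / ∫₀¹ λ(x) dx. The entry for R = 0.5000 is used as the example.

# Minimal sanity-check sketch (assumes the edge-perspective convention above).
def poly_integral(poly):
    """Integrate sum_k c_k x^k over [0, 1]: sum_k c_k / (k + 1)."""
    return sum(coeff / (exp + 1) for exp, coeff in poly.items())

def compression_rate(lam, rho):
    """Design compression rate R = int(rho) / int(lambda)."""
    return poly_integral(rho) / poly_integral(lam)

# Entry 20 above: R = 0.5000 (LTHC database), keyed as {exponent: coefficient}.
lam = {1: 0.152793, 2: 0.282350, 3: 0.0062193, 19: 0.558637}
rho = {9: 1.0}

assert abs(sum(lam.values()) - 1.0) < 1e-3   # lambda coefficients sum to 1
assert abs(sum(rho.values()) - 1.0) < 1e-3   # rho coefficients sum to 1
print(f"design compression rate: {compression_rate(lam, rho):.4f}")  # ~0.5000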
Appendix B

Deriving the Slepian-Wolf Error Exponent From the Channel Coding Error Exponent

In this appendix we show that Slepian-Wolf source coding is equivalent to a channel coding problem by deriving the Slepian-Wolf error exponent from the random coding exponent. Because their forms are somewhat simpler to manipulate, we use the forms of these exponents given by Csiszár and Körner [22]. Their form of the Slepian-Wolf exponent is [22, Exercise 3.1.5, p. 264]:

\min_{P_{\tilde{y}|\tilde{x}},\,P_{\tilde{x}}} D(P_{\tilde{x},\tilde{y}} \| Q_{x,y}) + \max\{0,\; R_{sw} - H(\tilde{x}|\tilde{y})\}, \qquad (B.1)

where R_sw is the encoding rate, Q_{x,y} = Q_{y|x} Q_x is the underlying joint distribution defined by the side-information channel Q_{y|x} and the source distribution Q_x, and x̃ and ỹ are random variables corresponding to the arbitrary distribution P_{x̃,ỹ} = P_{ỹ|x̃} P_{x̃}.

The constant-composition random coding error exponent for the channel Q_{y|x}, where codewords are selected uniformly from all vectors of type (empirical distribution) P_{x̃}, is [22, Thm. 2.5.2, p. 165]:

\inf_{P_{\tilde{y}|\tilde{x}}} D(P_{\tilde{y}|\tilde{x}} \| Q_{y|x} \,|\, P_{\tilde{x}}) + \max\{0,\; I(\tilde{x};\tilde{y}) - R_{cc}\}, \qquad (B.2)

where R_cc is the code rate. Note that the Slepian-Wolf and random-coding exponents of Equation (B.1) and Equation (B.2) equal the maximum-likelihood exponents derived by Gallager in [28] and [27].

To derive Equation (B.1) from Equation (B.2) we use the following Slepian-Wolf code. The encoder first calculates the type (empirical distribution) of the observed random source sequence x and communicates this type to the decoder. Call this type P_{x̃}, and note that this initial communication costs no rate asymptotically, since the number of types is only polynomial in the block length. For each type, the encoder and decoder share a random partition of the sequences in that type class. The partition is made by uniformly randomly binning the sequences of the given type into exp{N R_sw} bins, so that each bin consists of roughly exp{N [H(P_{x̃}) − R_sw]} sequences selected uniformly from the type class specified by P_{x̃}. From the decoder's point of view, this bin of sequences is the same as a randomly generated constant-composition code of rate H(P_{x̃}) − R_sw. The decoder observes one of these sequences through the channel Q_{y|x}.
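This binning construction can be illustrated with a toy simulation. The Python sketch below is purely hypothetical (not from the thesis): it emulates the shared random partition with a deterministic hash, uses the Hamming weight as the type, models the side information as the source seen through a binary symmetric channel, and performs exhaustive in-bin decoding, which is feasible only for very small block lengths.

# Toy illustration (hypothetical) of Slepian-Wolf coding by random binning
# within a type class. Side information y is x observed through a BSC(p).
import itertools
import random
import hashlib

n, p, rate_sw = 12, 0.1, 0.5
num_bins = 2 ** int(n * rate_sw)

def bin_index(seq):
    # Stand-in for the shared random partition: a deterministic hash of the sequence.
    digest = hashlib.sha256(bytes(seq)).digest()
    return int.from_bytes(digest[:4], "big") % num_bins

def decode(idx, weight, y):
    # Among sequences of the announced type (Hamming weight) that fall in bin idx,
    # return the one closest to y (ML decoding for a BSC with p < 1/2).
    candidates = (x for x in itertools.product((0, 1), repeat=n)
                  if sum(x) == weight and bin_index(x) == idx)
    return min(candidates, key=lambda x: sum(a != b for a, b in zip(x, y)))

random.seed(0)
x = tuple(random.randint(0, 1) for _ in range(n))
y = tuple(b ^ int(random.random() < p) for b in x)   # side information at the decoder
x_hat = decode(bin_index(x), sum(x), y)              # encoder sends only type + bin index
print("decoded correctly:", x_hat == x)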

The probability that the source x, which is distributed according to Q_x, has type P_{x̃} is upper-bounded by exp{−N D(P_{x̃} || Q_x)}. Given that the source is of type P_{x̃}, the probability of a subsequent decoding error is bounded by Equation (B.2) with R_cc = H(P_{x̃}) − R_sw. The overall probability of error is therefore dominated by the jointly worst-case source type P_{x̃} and channel law P_{ỹ|x̃}; using the divergence expansion D(P_{x̃,ỹ} || Q_{x,y}) = D(P_{ỹ|x̃} || Q_{y|x} | P_{x̃}) + D(P_{x̃} || Q_x), the resulting exponent can be seen to equal Equation (B.1).
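To spell out that last step, substituting R_cc = H(P_{x̃}) − R_sw into Equation (B.2), adding the type-probability exponent D(P_{x̃} || Q_x), and taking the worst case gives (using H(P_{x̃}) = H(x̃) and H(x̃) − I(x̃;ỹ) = H(x̃|ỹ)):

\min_{P_{\tilde{x}}} \Big[ D(P_{\tilde{x}} \| Q_{x}) + \inf_{P_{\tilde{y}|\tilde{x}}} D(P_{\tilde{y}|\tilde{x}} \| Q_{y|x} \,|\, P_{\tilde{x}}) + \max\{0,\; I(\tilde{x};\tilde{y}) - (H(P_{\tilde{x}}) - R_{sw})\} \Big]
  = \min_{P_{\tilde{y}|\tilde{x}},\,P_{\tilde{x}}} D(P_{\tilde{x},\tilde{y}} \| Q_{x,y}) + \max\{0,\; R_{sw} - H(\tilde{x}|\tilde{y})\},

which is exactly Equation (B.1).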
