Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

Total pages: 16 | File type: PDF | Size: 1020 KB

Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

Daniel Hillel Schonberg
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2007-93
http://www.eecs.berkeley.edu/Pubs/TechRpts/2007/EECS-2007-93.html
July 25, 2007

Copyright © 2007, by the author(s). All rights reserved. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Acknowledgement

I am very grateful to my advisor, Professor Kannan Ramchandran, for his guidance, support, and encouragement throughout the course of my graduate study. This research would not have been possible without his vision and motivation. I would like to thank my dissertation committee, Kannan Ramchandran, Martin Wainwright, and Peter Bickel, for their assistance. Their comments and guidance have proved invaluable. This research was supported in part by the NSF under grant CCR-0325311. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data

by Daniel Hillel Schonberg
B.S.E. (University of Michigan, Ann Arbor) 2001
M.S.E. (University of Michigan, Ann Arbor) 2001

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Engineering-Electrical Engineering and Computer Sciences in the Graduate Division of the University of California, Berkeley.

Committee in charge:
Professor Kannan Ramchandran, Chair
Professor Martin Wainwright
Professor Peter Bickel

Fall 2007

The dissertation of Daniel Hillel Schonberg is approved. University of California, Berkeley, Fall 2007.

Copyright © 2007 by Daniel Hillel Schonberg

Abstract

Practical Distributed Source Coding and Its Application to the Compression of Encrypted Data
by Daniel Hillel Schonberg
Doctor of Philosophy in Engineering-Electrical Engineering and Computer Sciences
University of California, Berkeley
Professor Kannan Ramchandran, Chair

Distributed source codes compress multiple data sources by leveraging the correlation between those sources, even without access to the realization of each source, through the use of a joint decoder. Though distributed source coding has been a topic of research since the 1970s (Slepian and Wolf, 1973), there has been little practical work. In this thesis, we present a practical and flexible framework for distributed source coding and its applications. Only recently, as applications have arisen, did practical work begin in earnest. The applications range from dense sensor networks to robust video coding for wireless transmission to the compression of encrypted data. Unfortunately, most published works offering practical codes consider only a limited scenario. They focus on simple idealized data models, and they assume a priori knowledge of the underlying joint statistics.
These assumptions are made to ease the description and analysis of their solutions. We argue, though, that a full solution requires a construction flexible enough to be adapted to real-world distributed source coding scenarios. This solution must be able to accommodate any source and unknown statistics. In this thesis, we develop practical distributed source codes, applicable to idealized sources and real-world sources such as images and videos, that are rate adaptable to all scenarios.

This thesis is broken into two halves. In the first half we discuss analytic considerations for generating practical distributed source codes. We begin by assuming a priori knowledge of the underlying source statistics, and then consider source coding with side information, a special case of distributed source coding. As a basis for our solution, we develop codes for independent and identically distributed (i.i.d.) sources and then adapt the codes to parallel sources. We then generalize this framework to a complete distributed source coding construction, flexible enough for arbitrary scenarios and adaptable to important special cases. Finally, we conclude this half by eliminating our assumption of a priori knowledge of the statistics. We discuss a protocol for rate adaptation, ensuring that our codes can operate blind of the source statistics. Our protocol converges rapidly to the entropy rate of a stationary source.

In the second half of this thesis, we consider distributed source coding of real-world sources. As a motivating application, we consider the compression of encrypted images and video. Since encryption masks the source, traditional compression algorithms are ineffective. However, through the use of distributed source coding techniques, the compression of encrypted data is possible. This makes it possible to reduce data size without requiring that the data be compressed prior to encryption. We develop algorithms for the practical lossless compression of encrypted data. Our methods leverage the statistical correlations of the source, even without direct access to their instantiations. For video, we compare our performance to a state-of-the-art motion-compensated lossless video encoder that requires unencrypted video as input. It compresses each unencrypted frame of the "Foreman" test video sequence by 59% on average. In comparison, our proof-of-concept implementation, working on encrypted data, compresses the same sequence by 33%.

Professor Kannan Ramchandran, Dissertation Committee Chair

I dedicate my thesis to my parents, Ila and Leslie. Mom and Dad, thank you so much. Without your encouragement and support, I would have never gotten so far. Thank you so much for instilling in me a love of learning, and for so much more. I further dedicate this thesis to my girlfriend, Helaine. Thank you so much for all the love and support you have given me.

Contents

Contents
List of Figures
List of Tables
Acknowledgements
1 Introduction
2 Background
  2.1 Traditional Source Coding
  2.2 Channel Coding
  2.3 Low-Density Parity-Check Codes and Graphical Models
    2.3.1 The Sum-Product Algorithm and LDPC Codes
    2.3.2 LDPC Code Design
  2.4 Distributed Source Coding
3 A Practical Construction for Source Coding With Side Information
  3.1 Introduction
  3.2 Linear Block Codes for Source Codes With Side Information
    3.2.1 Illustrative Example
    3.2.2 Block Codes for Source Coding With Side Information
  3.3 Low-Density Parity-Check Codes
  3.4 Practical Algorithm
    3.4.1 Application of Sum-Product Algorithm
  3.5 Simulations and Results
    3.5.1 Compression With Side Information Results
    3.5.2 Compression of a Single Source
  3.6 Conclusions
4 Slepian-Wolf Coding for Parallel Sources: Design and Error-Exponent Analysis
  4.1 Introduction
  4.2 Error Exponent Analysis
  4.3 LDPC Code Design
    4.3.1 EXIT Chart Based Code Design
  4.4 Simulation Results
  4.5 Conclusions & Open Problems
5 Symmetric Distributed Source Coding
  5.1 Introduction
    5.1.1 Related Work
  5.2 Distributed Source Coding
    5.2.1 Illustrative Example
    5.2.2 Code Generation
    5.2.3 Decoder
    5.2.4 Multiple Sources
  5.3 Low-Density Parity-Check Codes
  5.4 Adaptation of Coding Solution to Parity Check Matrices
  5.5 Integration of LDPC Codes Into Distributed Coding Construction
    5.5.1 Application of Sum-Product Algorithm
  5.6 Simulations and Results
  5.7 Conclusions & Open Problems
6 A Protocol for Blind Distributed Source Coding
  6.1 Introduction
  6.2 Protocol
  6.3 Analysis
    6.3.1 Probability of Decoding Error
    6.3.2 Bounding the Expected Slepian-Wolf Rate
    6.3.3 Optimizing the Rate Margin
  6.4 Practical Considerations
  6.5 Results
  6.6 Conclusions & Open Problems
7 The Compression of Encrypted Data: A Framework for Encrypted Simple Sources to Encrypted Images
  7.1 Introduction
  7.2 General Framework
    7.2.1 Encryption Function
    7.2.2 Code Constraint
    7.2.3 Source and Source Model
    7.2.4 Decoding Algorithm
  7.3 From Compression of Encrypted Simple Sources to Encrypted Images
    7.3.1 1-D Markov Model
    7.3.2 2-D Markov Model
    7.3.3 Message Passing Rules
    7.3.4 Doping
    7.3.5 Experimental Results
    7.3.6 Gray Scale Images
  7.4 Conclusions & Open Problems
8 Compression of Encrypted Video
  8.1 Introduction
    8.1.1 High-Level System Description and Prior Work
    8.1.2 Main Results & Outline
  8.2 Source Model
  8.3 Message Passing Rules
  8.4 Experimental Results
  8.5 Blind Video Compression
  8.6 Conclusions & Open Problems
9 Conclusions & Future Directions
Bibliography
A Codes and Degree Distributions Used to Generate LDPC Codes
B Deriving the Slepian-Wolf Error Exponent From the Channel Coding Error Exponent

List of Figures

1.1 A block diagram for the distributed source coding problem. The encoders act separately on each source for joint recovery at a common receiver. There is theoretically no loss of lossless sum-rate compression performance due to the distributed nature of the problem, as shown in Slepian and Wolf [78].
2.1 A block diagram for single source and single destination source coding.
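The idea behind the second half of the thesis can be seen in miniature: the syndrome of a linear code computed on the ciphertext still constrains the plaintext, so a joint decoder that holds the decryption key and a statistical model of the source can recover it. The sketch below is only a toy illustration of that setup, with an assumed 3x6 parity-check matrix, a Bernoulli source model, and brute-force decoding; the thesis itself uses LDPC codes with belief-propagation decoding and richer image/video models.

    # Toy sketch of compressing one-time-pad-encrypted bits via syndromes.
    # The parity-check matrix, source model, and brute-force decoder are
    # illustrative assumptions, not the construction used in the thesis.
    import itertools, random

    def syndrome(H, bits):
        # s = H * bits over GF(2)
        return tuple(sum(h * b for h, b in zip(row, bits)) % 2 for row in H)

    random.seed(0)
    n, p = 6, 0.1                            # 6 source bits, Bernoulli(0.1): mostly zeros
    H = [[1, 1, 0, 1, 0, 0],                 # 3x6 parity-check matrix: 6 bits -> 3 bits
         [0, 1, 1, 0, 1, 0],
         [1, 0, 1, 0, 0, 1]]

    source = [1 if random.random() < p else 0 for _ in range(n)]
    key    = [random.randint(0, 1) for _ in range(n)]        # one-time pad
    cipher = [s ^ k for s, k in zip(source, key)]            # encrypted bits

    compressed = syndrome(H, cipher)         # encoder sees only the ciphertext

    # Joint decoder: knows the key and the source statistics, not the source.
    best = None
    for cand in itertools.product([0, 1], repeat=n):
        if syndrome(H, [c ^ k for c, k in zip(cand, key)]) != compressed:
            continue                         # candidate inconsistent with the syndrome
        prob = 1.0
        for b in cand:
            prob *= p if b else (1 - p)      # likelihood under the Bernoulli model
        if best is None or prob > best[0]:
            best = (prob, list(cand))

    print("source   ", source)
    print("recovered", best[1])

With six bits and a rate-1/2 compression this search is exhaustive and can fail for unlucky draws; the point is only that the encoder never touches the plaintext, while the decoder combines the syndrome, the key, and the source model.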
Recommended publications
  • Entropy Rate of Stochastic Processes
Entropy Rate of Stochastic Processes
Timo Mulder, Jorn Peters
[email protected], [email protected]
February 8, 2015

A sequence of independent and identically distributed events can on average be encoded with H(X) bits per source symbol. In reality, however, series of events (or processes) often exhibit arbitrary dependence between the events. Such processes with arbitrary dependence between variables are called stochastic processes. This report shows how to calculate the entropy rate for such processes, accompanied by some definitions, proofs and brief examples.

1 Introduction

One may be interested in the uncertainty of an event or a series of events. Shannon entropy is a method to compute the uncertainty in an event. This can be used for single events or for several independent and identically distributed events. However, often events or series of events are not independent and identically distributed. In that case the events together form a stochastic process, i.e. a process in which multiple outcomes are possible. These processes are omnipresent in all sorts of fields of study. As it turns out, Shannon entropy cannot directly be used to compute the uncertainty in a stochastic process. However, it is easy to extend it so that it can be used. Also, when a stochastic process satisfies certain properties, for example if the stochastic process is a Markov process, it is straightforward to compute the entropy of the process. The entropy rate of a stochastic process can be used in various ways. An example of this is given in Section 4.1.
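As a concrete companion to the excerpt, the sketch below computes the entropy rate of a stationary Markov chain, H = -sum_i mu_i sum_j P_ij log2 P_ij, where mu is the stationary distribution; the two-state transition matrix is an arbitrary example, not one taken from the report.

    # Minimal sketch: entropy rate of a stationary two-state Markov chain.
    # The transition matrix is an arbitrary example, not taken from the report.
    import numpy as np

    P = np.array([[0.9, 0.1],    # P[i, j] = Pr{next state = j | current state = i}
                  [0.4, 0.6]])

    # Stationary distribution mu solves mu P = mu with sum(mu) = 1.
    eigvals, eigvecs = np.linalg.eig(P.T)
    mu = np.real(eigvecs[:, np.argmin(np.abs(eigvals - 1))])
    mu = mu / mu.sum()

    # Entropy rate: H = -sum_i mu_i sum_j P_ij log2 P_ij  (bits per symbol)
    H_rate = -sum(mu[i] * P[i, j] * np.log2(P[i, j])
                  for i in range(2) for j in range(2) if P[i, j] > 0)
    print(f"entropy rate: {H_rate:.4f} bits/symbol")   # ~0.57 bits for this chain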
  • Lecture 3: Entropy, Relative Entropy, and Mutual Information 1 Notation 2
EE376A/STATS376A Information Theory, Lecture 3 - 01/16/2018
Lecture 3: Entropy, Relative Entropy, and Mutual Information
Lecturer: Tsachy Weissman. Scribes: Yicheng An, Melody Guan, Jacob Rebec, John Sholar

In this lecture, we will introduce certain key measures of information, that play crucial roles in theoretical and operational characterizations throughout the course. These include the entropy, the mutual information, and the relative entropy. We will also exhibit some key properties exhibited by these information measures.

1 Notation

A quick summary of the notation: 1. Discrete random variable: U. 2. Alphabet: U = {u_1, u_2, ..., u_M} (an alphabet of size M). 3. Specific value: u, u_1, etc. For discrete random variables, we will write (interchangeably) P(U = u), P_U(u) or, most often, just p(u). Similarly, for a pair of random variables X, Y we write P(X = x | Y = y), P_{X|Y}(x | y) or p(x | y).

2 Entropy

Definition 1. "Surprise" function: S(u) := log(1/p(u)).  (1)
A lower probability of u translates to a greater "surprise" that it occurs. Note here that we use log to mean log2 by default, rather than the natural log ln, as is typical in some other contexts. This is true throughout these notes: log is assumed to be log2 unless otherwise indicated.

Definition 2. Entropy: Let U be a discrete random variable taking values in alphabet U. The entropy of U is given by:
H(U) := E[S(U)] = E[log(1/p(U))] = E[-log p(U)] = -sum_u p(u) log p(u),  (2)
where the sum runs over all values u the variable can take.
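To make Definitions 1 and 2 concrete, the sketch below evaluates the surprise S(u) = log2(1/p(u)) and the entropy H(U) = E[S(U)] for a hypothetical biased coin; the pmf is an illustration, not taken from the lecture notes.

    # Surprise and entropy for a biased coin, following Definitions 1 and 2 above.
    # The pmf is an arbitrary illustration.
    import math

    pmf = {"heads": 0.9, "tails": 0.1}

    def surprise(p):
        return math.log2(1.0 / p)            # S(u) = log2(1 / p(u))

    H = sum(p * surprise(p) for p in pmf.values())   # H(U) = E[S(U)]
    for u, p in pmf.items():
        print(f"S({u}) = {surprise(p):.3f} bits")
    print(f"H(U) = {H:.3f} bits")            # about 0.469 bits for a 0.9/0.1 coin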
  • An Introduction to Information Theory
An Introduction to Information Theory
Vahid Meghdadi
Reference: Elements of Information Theory by Cover and Thomas
September 2007

Contents
1 Entropy
2 Joint and conditional entropy
3 Mutual information
4 Data Compression or Source Coding
5 Channel capacity
  5.1 Examples
    5.1.1 Noiseless binary channel
    5.1.2 Binary symmetric channel
    5.1.3 Binary erasure channel
    5.1.4 Two fold channel
6 Differential entropy
  6.1 Relation between differential and discrete entropy
  6.2 Joint and conditional entropy
  6.3 Some properties
7 The Gaussian channel
  7.1 Capacity of Gaussian channel
  7.2 Band limited channel
  7.3 Parallel Gaussian channel
8 Capacity of SIMO channel
9 Exercise (to be completed)

1 Entropy

Entropy is a measure of uncertainty of a random variable. The uncertainty, or the amount of information contained in a message (or in a particular realization of a random variable), is defined as the logarithm of the inverse of its probability: log(1/P_X(x)). So a less likely outcome carries more information. Let X be a discrete random variable with alphabet X and probability mass function P_X(x) = Pr{X = x}, x ∈ X. For convenience P_X(x) will be denoted by p(x). The entropy of X is defined as follows:

Definition 1. The entropy H(X) of a discrete random variable is defined by
H(X) = E[log(1/p(X))] = sum_{x ∈ X} p(x) log(1/p(x)).  (1)

Entropy indicates the average information contained in X.
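The contents above also list joint and conditional entropy and mutual information; the sketch below evaluates H(X), H(X|Y), and I(X;Y) = H(X) - H(X|Y) for a small joint pmf chosen only for illustration.

    # Entropy, conditional entropy, and mutual information for a toy joint pmf.
    # The joint distribution is an arbitrary illustration.
    import math

    pxy = {(0, 0): 0.4, (0, 1): 0.1,
           (1, 0): 0.1, (1, 1): 0.4}

    def H(dist):
        return -sum(p * math.log2(p) for p in dist.values() if p > 0)

    px = {x: sum(p for (a, _), p in pxy.items() if a == x) for x in (0, 1)}
    py = {y: sum(p for (_, b), p in pxy.items() if b == y) for y in (0, 1)}

    H_X, H_Y, H_XY = H(px), H(py), H(pxy)
    H_X_given_Y = H_XY - H_Y          # chain rule: H(X, Y) = H(Y) + H(X | Y)
    I_XY = H_X - H_X_given_Y          # mutual information I(X; Y)
    print(f"H(X) = {H_X:.3f}, H(X|Y) = {H_X_given_Y:.3f}, I(X;Y) = {I_XY:.3f} bits")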
  • L11-IT-Handouts.Pdf
Today's Topics: Information Theory
Mohamed Hamada, Software Engineering Lab, The University of Aizu
Email: [email protected]  URL: http://www.u-aizu.ac.jp/~hamada

• Communication Channel
• Noiseless binary channel
• Binary Symmetric Channel (BSC)
• Symmetric Channel
• Mutual Information
• Channel Capacity

Digital Communication Systems. Source coding methods include: 1. Huffman Code. 2. Two-pass Huffman Code. 3. Lempel-Ziv Code. 4. Fano Code. 5. Shannon Code. 6. Arithmetic Code. Information sources may be: 1. Memoryless 2. Stochastic 3. Markov 4. Ergodic. The system chain runs from the information source through the source encoder, channel encoder and modulator, across the channel, and back through the de-modulator, channel decoder and source decoder to the user of the information.

Information transfer across channels. A (discrete) channel is a system consisting of an input alphabet X, an output alphabet Y and a probability transition matrix p(y|x) that expresses the probability of observing the output symbol y given that we send the symbol x. Source coding provides compression and decompression; channel coding provides error correction. Examples of channels: CDs, CD-ROMs, DVDs, phones, Ethernet, video cassettes, etc. Related quantities: source entropy and rate vs. distortion on the source side; channel capacity and capacity vs. efficiency on the channel side.

Noiseless binary channel: the channel is memoryless (the output depends only on the current input), the input and output alphabets are finite, and the transition matrix maps 0 to 0 and 1 to 1 with probability 1.

Binary Symmetric Channel (BSC, a noisy channel): each input bit is flipped with crossover probability p and passed through unchanged with probability 1 - p; equivalently, y_i = x_i + e_i (mod 2), where the error bit e_i equals 1 with probability p.

Symmetric Channel (noisy channel): in the transition matrix of this channel, all the rows are permutations of each other, and so are the columns.
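For the binary symmetric channel described above, the capacity has the closed form C = 1 - H_b(p), where H_b is the binary entropy of the crossover probability p; the sketch below evaluates it for an arbitrary example value of p (the noiseless binary channel is the special case p = 0, giving C = 1 bit).

    # Capacity of the binary symmetric channel: C = 1 - H_b(p).
    # The crossover probability below is an arbitrary example.
    import math

    def binary_entropy(p):
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    p = 0.11                                   # crossover probability of the BSC
    C = 1.0 - binary_entropy(p)
    print(f"BSC capacity at p = {p}: {C:.3f} bits per channel use")   # ~0.5 bits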
  • Entropy Rate Estimation for Markov Chains with Large State Space
Entropy Rate Estimation for Markov Chains with Large State Space

Yanjun Han, Department of Electrical Engineering, Stanford University, Stanford, CA 94305, [email protected]
Jiantao Jiao, Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94720, [email protected]
Chuan-Zheng Lee, Tsachy Weissman, Department of Electrical Engineering, Stanford University, Stanford, CA 94305, {czlee, tsachy}@stanford.edu
Yihong Wu, Department of Statistics and Data Science, Yale University, New Haven, CT 06511, [email protected]
Tiancheng Yu, Department of Electronic Engineering, Tsinghua University, Haidian, Beijing 100084, [email protected]
(arXiv:1802.07889v3 [cs.LG], 24 Sep 2018)

Abstract. Entropy estimation is one of the prototypical problems in distribution property testing. To consistently estimate the Shannon entropy of a distribution on S elements with independent samples, the optimal sample complexity scales sublinearly with S as Θ(S/log S), as shown by Valiant and Valiant [41]. Extending the theory and algorithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with S states from a sample path of n observations. We show that:

• Provided the Markov chain mixes not too slowly, i.e., the relaxation time is at most O(S/ln^3 S), consistent estimation is achievable when n ≫ S^2/log S.
• Provided the Markov chain has some slight dependency, i.e., the relaxation time is at least 1 + Ω(ln^2 S/√S), consistent estimation is impossible when n ≲ S^2/log S.

Under both assumptions, the optimal estimation accuracy is shown to be Θ(S^2/(n log S)).
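For orientation, the baseline against which such results are measured is the naive plug-in estimator: estimate the transition probabilities from the sample path and evaluate the entropy rate formula on them. The sketch below implements that baseline on a hypothetical three-state chain; it says nothing about the refined estimators analyzed in the paper.

    # Naive plug-in estimate of a Markov chain's entropy rate from a sample path.
    # The chain below is an arbitrary illustration, not from the paper.
    import numpy as np

    rng = np.random.default_rng(0)
    S, n = 3, 100_000
    P = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])

    path = np.empty(n, dtype=int)              # simulate a sample path of length n
    path[0] = 0
    for t in range(1, n):
        path[t] = rng.choice(S, p=P[path[t - 1]])

    counts = np.zeros((S, S))                  # empirical transition counts
    for a, b in zip(path[:-1], path[1:]):
        counts[a, b] += 1
    P_hat = counts / counts.sum(axis=1, keepdims=True)
    mu_hat = counts.sum(axis=1) / counts.sum()

    H_hat = -sum(mu_hat[a] * P_hat[a, b] * np.log2(P_hat[a, b])
                 for a in range(S) for b in range(S) if P_hat[a, b] > 0)
    print(f"plug-in entropy rate estimate: {H_hat:.4f} bits/symbol")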
  • Package 'Infotheo'
Package ‘infotheo’
February 20, 2015
Title: Information-Theoretic Measures
Version: 1.2.0
Date: 2014-07
Publication: 2009-08-14
Author: Patrick E. Meyer
Description: This package implements various measures of information theory based on several entropy estimators.
Maintainer: Patrick E. Meyer <[email protected]>
License: GPL (>= 3)
URL: http://homepage.meyerp.com/software
Repository: CRAN
NeedsCompilation: yes
Date/Publication: 2014-07-26 08:08:09

R topics documented: condentropy, condinformation, discretize, entropy, infotheo, interinformation, multiinformation, mutinformation, natstobits.

condentropy: conditional entropy computation

Description: condentropy takes two random vectors, X and Y, as input and returns the conditional entropy, H(X|Y), in nats (base e), according to the entropy estimator method. If Y is not supplied the function returns the entropy of X - see entropy.

Usage: condentropy(X, Y=NULL, method="emp")

Arguments:
X: data.frame denoting a random variable or random vector where columns contain variables/features and rows contain outcomes/samples.
Y: data.frame denoting a conditioning random variable or random vector where columns contain variables/features and rows contain outcomes/samples.
method: The name of the entropy estimator. The package implements four estimators: "emp", "mm", "shrink", "sg" (default: "emp") - see details. These estimators require discrete data values - see discretize.

Details:
• "emp": This estimator computes the entropy of the empirical probability distribution.
• "mm": This is the Miller-Madow asymptotic bias corrected empirical estimator.
• "shrink": This is a shrinkage estimate of the entropy of a Dirichlet probability distribution.
• "sg": This is the Schurmann-Grassberger estimate of the entropy of a Dirichlet probability distribution.
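The excerpt shows the R call condentropy(X, Y=NULL, method="emp"). As a rough Python counterpart (my own sketch, not code from the package), the "emp" estimator amounts to computing H(X|Y) = H(X,Y) - H(Y) from empirical frequencies, reported in nats.

    # Empirical ("emp") conditional entropy H(X|Y) in nats, mirroring what the
    # R call condentropy(X, Y, method="emp") computes. This is an illustrative
    # re-implementation, not code from the infotheo package.
    import math
    from collections import Counter

    def cond_entropy_emp(xs, ys):
        n = len(xs)
        joint = Counter(zip(xs, ys))
        marg_y = Counter(ys)
        H_xy = -sum((c / n) * math.log(c / n) for c in joint.values())
        H_y = -sum((c / n) * math.log(c / n) for c in marg_y.values())
        return H_xy - H_y                      # H(X|Y) = H(X,Y) - H(Y)

    xs = [0, 0, 1, 1, 0, 1, 0, 1]
    ys = [0, 0, 0, 1, 1, 1, 0, 1]
    print(f"H(X|Y) = {cond_entropy_emp(xs, ys):.4f} nats")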
• arXiv:1907.00325v5 [cs.LG] 25 Aug 2020
Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities
Ronan Perry, Ronak Mehta, Richard Guo, Jesús Arroyo, Mike Powell, Hayden Helm, Cencheng Shen, and Joshua T. Vogelstein

Abstract. Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty. Current widely used approaches for computing such quantities rely on nearest neighbor methods and exhibit both strong performance and theoretical guarantees in certain simple scenarios. However, existing approaches fail in high-dimensional settings and when different features are measured on different scales. We propose decision forest-based adaptive nearest neighbor estimators and show that they are able to effectively estimate posterior probabilities, conditional entropies, and mutual information even in the aforementioned settings. We provide an extensive study of efficacy for classification and posterior probability estimation, and prove certain forest-based approaches to be consistent estimators of the true posteriors and derived information-theoretic quantities under certain assumptions. In a real-world connectome application, we quantify the uncertainty about neuron type given various cellular features in the Drosophila larva mushroom body, a key challenge for modern neuroscience.

1 Introduction

Uncertainty quantification is a fundamental desiderata of statistical inference and data science. In supervised learning settings it is common to quantify uncertainty with either conditional entropy or mutual information (MI). Suppose we are given a pair of random variables (X, Y), where X is d-dimensional vector-valued and Y is a categorical variable of interest. Conditional entropy H(Y|X) measures the uncertainty in Y on average given X. On the other hand, mutual information quantifies the shared information between X and Y.
  • Chapter 2: Entropy and Mutual Information
Chapter 2: Entropy and Mutual Information
University of Illinois at Chicago, ECE 534, Natasha Devroye

Chapter 2 outline: • Definitions • Entropy • Joint entropy, conditional entropy • Relative entropy, mutual information • Chain rules • Jensen's inequality • Log-sum inequality • Data processing inequality • Fano's inequality

Definitions. A discrete random variable X takes on values x from the discrete alphabet X. The probability mass function (pmf) is described by p_X(x) = p(x) = Pr{X = x}, for x ∈ X.

2 Probability, Entropy, and Inference. This chapter, and its sibling, Chapter 8, devote some time to notation. Just as the White Knight distinguished between the song, the name of the song, and what the name of the song was called (Carroll, 1998), we will sometimes need to be careful to distinguish between a random variable, the value of the random variable, and the proposition that asserts that the random variable has a particular value. In any particular chapter, however, I will use the most [Figure 2.2]
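For reference, the standard forms of several of the results named in the outline above are given below (with P_e the probability of error in estimating X from Y in Fano's inequality):

    \begin{align}
      H(X,Y) &= H(X) + H(Y \mid X)                                   && \text{(chain rule)} \\
      I(X;Y) &= H(X) - H(X \mid Y)
              = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}      && \text{(mutual information)} \\
      I(X;Z) &\le I(X;Y) \quad \text{whenever } X \to Y \to Z         && \text{(data processing)} \\
      H(X \mid Y) &\le H(P_e) + P_e \log\bigl(|\mathcal{X}| - 1\bigr) && \text{(Fano's inequality)}
    \end{align}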
  • Entropy Power, Autoregressive Models, and Mutual Information
    entropy Article Entropy Power, Autoregressive Models, and Mutual Information Jerry Gibson Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA 93106, USA; [email protected] Received: 7 August 2018; Accepted: 17 September 2018; Published: 30 September 2018 Abstract: Autoregressive processes play a major role in speech processing (linear prediction), seismic signal processing, biological signal processing, and many other applications. We consider the quantity defined by Shannon in 1948, the entropy rate power, and show that the log ratio of entropy powers equals the difference in the differential entropy of the two processes. Furthermore, we use the log ratio of entropy powers to analyze the change in mutual information as the model order is increased for autoregressive processes. We examine when we can substitute the minimum mean squared prediction error for the entropy power in the log ratio of entropy powers, thus greatly simplifying the calculations to obtain the differential entropy and the change in mutual information and therefore increasing the utility of the approach. Applications to speech processing and coding are given and potential applications to seismic signal processing, EEG classification, and ECG classification are described. Keywords: autoregressive models; entropy rate power; mutual information 1. Introduction In time series analysis, the autoregressive (AR) model, also called the linear prediction model, has received considerable attention, with a host of results on fitting AR models, AR model prediction performance, and decision-making using time series analysis based on AR models [1–3]. The linear prediction or AR model also plays a major role in many digital signal processing applications, such as linear prediction modeling of speech signals [4–6], geophysical exploration [7,8], electrocardiogram (ECG) classification [9], and electroencephalogram (EEG) classification [10,11].
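The identity referred to in the abstract follows directly from Shannon's definition of the entropy power of a process with differential entropy h (in nats); writing it out for two processes:

    \begin{align}
      Q_i &= \frac{1}{2\pi e}\, e^{2 h_i}, \qquad i = 1, 2, \\
      \frac{1}{2} \ln \frac{Q_1}{Q_2} &= h_1 - h_2 .
    \end{align}

For a Gaussian process the entropy power coincides with the minimum mean squared one-step prediction error, which is why the substitution discussed in the abstract is natural in the autoregressive setting.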
  • GAIT: a Geometric Approach to Information Theory
GAIT: A Geometric Approach to Information Theory
Jose Gallego, Ankit Vani, Max Schwarzer, Simon Lacoste-Julien
Mila and DIRO, Université de Montréal

Abstract. We advocate the use of a notion of entropy that reflects the relative abundances of the symbols in an alphabet, as well as the similarities between them. This concept was originally introduced in theoretical ecology to study the diversity of ecosystems. Based on this notion of entropy, we introduce geometry-aware counterparts for several concepts and theorems in information theory. Notably, our proposed divergence exhibits performance on par with state-of-the-art methods based on the Wasserstein distance, but enjoys a closed-form expression that can be computed efficiently. We demonstrate the versatility of our method via experiments on a broad range of domains: training generative models, comput-

... supports. Optimal transport distances, such as Wasserstein, have emerged as practical alternatives with theoretical grounding. These methods have been used to compute barycenters (Cuturi and Doucet, 2014) and train generative models (Genevay et al., 2018). However, optimal transport is computationally expensive as it generally lacks closed-form solutions and requires the solution of linear programs or the execution of matrix scaling algorithms, even when solved only in approximate form (Cuturi, 2013). Approaches based on kernel methods (Gretton et al., 2012; Li et al., 2017; Salimans et al., 2018), which take a functional analytic view on the problem, have also been widely applied. However, further exploration on the interplay between kernel methods and information theory is lacking.

Contributions. We i) introduce to the machine learning community a similarity-sensitive definition of entropy developed by Leinster and Cobbold (2012).
  • Source Coding: Part I of Fundamentals of Source and Video Coding
Foundations and Trends® in sample, Vol. 1, No. 1 (2011) 1–217, © 2011 Thomas Wiegand and Heiko Schwarz, DOI: xxxxxx

Source Coding: Part I of Fundamentals of Source and Video Coding
Thomas Wiegand (Berlin Institute of Technology and Fraunhofer Institute for Telecommunications — Heinrich Hertz Institute, Germany, [email protected]) and Heiko Schwarz (Fraunhofer Institute for Telecommunications — Heinrich Hertz Institute, Germany, [email protected])

Abstract. Digital media technologies have become an integral part of the way we create, communicate, and consume information. At the core of these technologies are source coding methods that are described in this text. Based on the fundamentals of information and rate distortion theory, the most relevant techniques used in source coding algorithms are described: entropy coding, quantization as well as predictive and transform coding. The emphasis is put onto algorithms that are also used in video coding, which will be described in the other text of this two-part monograph.

To our families

Contents
1 Introduction
  1.1 The Communication Problem
  1.2 Scope and Overview of the Text
  1.3 The Source Coding Principle
2 Random Processes
  2.1 Probability
  2.2 Random Variables
    2.2.1 Continuous Random Variables
    2.2.2 Discrete Random Variables
    2.2.3 Expectation
  2.3 Random Processes
    2.3.1 Markov Processes
    2.3.2 Gaussian Processes
    2.3.3 Gauss-Markov Processes
  2.4 Summary of Random Processes
3 Lossless Source Coding
  3.1 Classification
  • Relative Entropy Rate Between a Markov Chain and Its Corresponding Hidden Markov Chain
© Statistical Research and Training Center
J. Statist. Res. Iran 8 (2011): 97–109 [Vol. 8, No. 1, Spring and Summer 1390 (2011), pp. 97–109]

Relative Entropy Rate between a Markov Chain and Its Corresponding Hidden Markov Chain
GH. Yari (Iran University of Science and Technology) and Z. Nikooravesh (Birjand University of Technology)

Abstract. In this paper we study the relative entropy rate between a homogeneous Markov chain and a hidden Markov chain defined by observing the output of a discrete stochastic channel whose input is the finite state space homogeneous stationary Markov chain. For this purpose, we obtain the relative entropy between two finite subsequences of the above mentioned chains with the help of the definition of relative entropy between two random variables; we then define the relative entropy rate between these stochastic processes and study its convergence.

Keywords: relative entropy rate; stochastic channel; Markov chain; hidden Markov chain.
MSC 2010: 60J10, 94A17.

1 Introduction

Suppose {X_n}_{n∈N} is a homogeneous stationary Markov chain with finite state space S = {0, 1, 2, ..., N − 1} and {Y_n}_{n∈N} is a hidden Markov chain (HMC) which is observed through a discrete stochastic channel whose input is the Markov chain. The output state space of the channel is characterized by the channel's statistical properties. From now on we study channels whose output state space equals the state space of the input chain. Let P = {p_ab} be the one-step transition probability matrix of the Markov chain, such that p_ab = Pr{X_n = b | X_{n−1} = a} for a, b ∈ S, and let Q = {q_ab} be the noise matrix of the channel, where q_ab = Pr{Y_n = b | X_n = a} for a, b ∈ S.
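For orientation (standard definitions, not text from the paper): the relative entropy between two pmfs p and q on S, and the relative entropy rate between two processes, defined as the per-symbol limit whose convergence the paper studies, are

    \begin{align}
      D(p \,\|\, q) &= \sum_{a \in S} p(a) \log \frac{p(a)}{q(a)}, \\
      \mathcal{D}(\{X_n\} \,\|\, \{Y_n\}) &= \lim_{n \to \infty} \frac{1}{n}\,
        D\bigl(P_{X_1^n} \,\|\, P_{Y_1^n}\bigr),
    \end{align}

provided the limit exists.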