Lossless Source Coding

Total Page:16

File Type:pdf, Size:1020Kb

Lossless Source Coding Lossless Source Coding Gadiel Seroussi Marcelo Weinberger Information Theory Research Hewlett-Packard Laboratories Palo Alto, California Seroussi/Weinberger – Lossless source coding 26-May-04 1 LosslessLossless SourceSource CodingCoding 1. Fundamentals Seroussi/Weinberger – Lossless source coding 26-May-04 2 Notation • A = discrete (usually finite) alphabet • a = | A| = size of A (when finite) n n • x1 = x = x1x2 x3 Kxn = finite sequence over A ¥ ¥ • x1 = x = x1x2 x3 Kxt K = infinite sequence over A j • xi = xi xi+1Kx j = sub-sequence (i sometimes omitted if = 1) • pX(x) = Prob(X=x) = probability mass function (PMF) on A (subscript X and argument x dropped if clear from context) • X ~ p(x) : X obeys PMF p(x) • Ep [F] = expectation of F w.r.t. PMF p (subscript and [ ] may be dropped) n • pˆ n (x) = empirical distribution obtained from x1 x1 • log x = logarithm to base 2 of x, unless base otherwise specified • ln x = natural logarithm of x • H(X), H(p) = entropy of a random variable X or PMF p, in bits; also • H(p) = - p log p - (1-p) log (1-p), 0 £ p £ 1 : binary entropy function • D(p||q) = relative entropy (information divergence) between PMFs p and q Seroussi/Weinberger – Lossless source coding 26-May-04 3 Coding in a communication/storage system data data encoder channel decoder source destination noise data source channel source encoder encoder noise channel data source channel destination decoder decoder Seroussi/Weinberger – Lossless source coding 26-May-04 4 Information Theory q Shannon, “A mathematical theory of communication,” Bell Tech. Journal, 1948 l Theoretical foundations of source and channel coding l Fundamental bounds and coding theorems in a probabilistic setting u in a nutshell: perfect communication in the presence of noise is possible as long as the entropy rate of the source is below the channel capacity l Fundamental theorems essentially non-constructive: we’ve spent the last 52 years realizing Shannon’s promised paradise in practice u very successful: enabled current digital revolution (multimedia, internet, wireless communication, mass storage, …) l Separation theorem: source and channel coding can be done independently Seroussi/Weinberger – Lossless source coding 26-May-04 5 Source Coding data source channel source encoder encoder noiseless channelnoise channel data source channel destination decoder decoder Source coding = Data compression ÞÞ efficient use of bandwidth/space Seroussi/Weinberger – Lossless source coding 26-May-04 6 Data Compression de- D compressor C C D’ channel compressor (encoder) (decoder) data compressed data decompressed data the goal: size(C) < size(D) size(C) compression ratio: r = < 1 in appropriate units, size(D) e.g., bits/symbol q Channel: l Communications channel (“from here to there”) l Storage channel (“from now to then”) qq LosslessLosslesscompression:compression: DD==D’D’ the case of interest here q Lossy compression: D’ is an approximation of D under some metric Seroussi/Weinberger – Lossless source coding 26-May-04 7 Data Sources data D n source x1x2 x3 Kxt Kxn = x1 q Symbols xi Î A = a countable (usually finite) alphabet n q Probabilistic source: xi are random variables; x1 obeys some probability distribution P on An (the ensemble of possible sequences emitted by the source) ¥ l we are often interested in n®¥ : x1 is a random process ¥ ¥ u stationary (time-invariant): xi = xj , as random processes, " i,j ³ 1 u ergodic: time averages converge (to ensemble averages) u memoryless: xi are statistically independent u independent, identically distributed (i.i.d.): memoryless, and xi ~ pX " i q Individual sequence: xi are just symbols, not assumed to be a realization of a random process. The “data source” will include a probability assignment, but it will be derived from the data under certain constraints, and with certain objectives Seroussi/Weinberger – Lossless source coding 26-May-04 8 Statistics on Individual Sequences q Empirical distributions 1 memoryless, pˆ n (a) = {xi |1 £ i £ n, xi = a}, a Î A Bernoulli model x1 n l we can compute empirical statistics of any order (joint, conditional, etc.) l sequence probability according to own empirical distribution n ˆ n P n (x1 ) = pˆ n (xi ) x1 Õ x1 i=1 l this is the highest probability assigned to the sequence by any distribution from the model class (maximum likelihood estimator) l Example: A = {0,1}, n0 = {i | xi = 0}, n1 = n - n0 = {i | xi = 1} n n nn0 nn1 pˆ(0) = 0 , pˆ(1) = 1 : Pˆ(xn ) = pˆ(0)n0 pˆ(1)n1 = 0 1 n n 1 nn n l Notice that if x1 is in fact the outcome of a random process, then its empirical statistics are themselves random variables l e.g., expressions of the type Prp (| pˆ(a) - p(a) |³ e ) Seroussi/Weinberger – Lossless source coding 26-May-04 9 Statistical Models for Data Sources x 1 xt-k+1 xt-1 xt xt+1 q Markov of order k ³ 0 k : finite memory t t p(xt+1 | x1 ) = p(xt+1 | xt-k +1), t ³ k (some convention for t < k) l i.i.d = Markov of order 0 Example p(0|s1) = 0.5 q Finite State Machine (FSM) 1 s 1 l state space S = {s0,s1,...,sK-1} 1 l initial state s0 0 0 1 l transition probability s0 s2 q(s |s’,a), s,s’ÎS, aÎA 0 p(0|s ) = 0.9 p(0|s ) = 0.1 l output probability p(a|s), aÎA, sÎS 0 2 l unifilar Û deterministic transitions: Steady state next-state function f : S´ A®S state probs. é.9 .1 0ù ê ú l every Markov source is equivalent [p 0 p1 p 2 ] .5 0 .5 = [p 0 p1 p 2 ] to a unifilar FSM with K £ |A|k, but in ê.1 0 .9ú ë û symb. probs. general, finite state ¹ finite memory é5 1 5 ù é5 3ù Þ [p 0 p1 p 2 ] = , [p0 p1] = ëê8 16 16ûú ëê8 8ûú Seroussi/Weinberger – Lossless source coding 26-May-04 10 Statistical Models for Data Sources (cont.) q Tree sources (FSMX) 1 s1 1 l finite memory £ k (Markov) l # of past symbols needed to 0 s 0 s 1 determine the state might be < 0 0 2 k for some states s0 « 0, s1 « 01, s2 « 11 l by merging nodes from the full Markov tree, we get a model with a smaller number of free parameters 0 1 l the set of tree sources with unbalanced trees has measure zero in the space of p(0|s0) s0 Markov sources of any given order 0 1 l yet, tree source models have proven very useful in practice, and are associated s0 s0 s1 s2 with some of the best compression algorithms to date p(0|s0) p(0|s0) p(0|s1) p(0|s2) l more about this later ... Seroussi/Weinberger – Lossless source coding 26-May-04 11 Entropy D X ~ p(x) : H (X ) = - p(x)log p(x) [0log 0 = 0] åxÎA entropy of X (or of the PMF p(×) ), measured in bits H (X ) = E p [- log p(X )] l H measures the uncertainty or self-information of X l we also write H(p) : a random variable is not actually needed; p(×) could be an empirical distribution Example: A={0,1}, pX(1) = p, pX(0) = 1-p (overloaded notation!) H 2 ( p) = - p log p - (1- p)log(1- p) binary entropy function Main properties: H2(p) • H2(p) ³ 0, H2 (p) is Ç-convex, 0 £ p £ 1 • H2(p) ® 0 as p ® 0 or 1, with slope ¥ • H2(p) is maximal at p = 0.5, H2(0.5)=1 Þ the entropy of an unbiased coin is 1 bit p Seroussi/Weinberger – Lossless source coding 26-May-04 12 Entropy (cont.) q For a general finite alphabet A, H(X) is maximal when X ~ pu , where pu(a)=1/ |A| for all aÎ A (uniform distribution) l Jensen’s inequality: if f is a È-convex function, then Ef (X) ³ f (EX). l - log x is a È-convex function of x l H(X) = E[log(1/p(X))] = - E[- log(1/p(X))] £ log E[1/p(X)] = log |A| = H(pu) q Empirical entropy l entropy computed an an empirical distribution l Example: recall that the probability assigned to an individual binary sequence by its own zero-order empirical distribution is nn0 nn1 normalized, Pˆ(xn ) = pˆ(0)n0 pˆ(1)n1 = 0 1 1 nn in bits/symbol l we have 1 n æ n ö n æ n ö ˆ n 0 0 1 1 ˆ ˆ n - log P(x1 ) = - logç ÷ - logç ÷ = H2 ( p(0)) = H (x1 ) n n è n ø n è n ø 1 in fact, - log P ˆ ( x n ) = H ˆ ( x n ) holds for a large class of probability models n 1 1 Seroussi/Weinberger – Lossless source coding 26-May-04 13 Joint and Conditional Entropies q The joint entropy of random variables (X,Y) ~ p(x,y) is defined as H(X ,Y ) = - p(x, y)log p(x, y) åx,y l this can be extended to any number of random variables: H(X1,X2,..., Xn) Notation: H( X1,X2,..., Xn) = joint entropy of X1,X2,..., Xn (0 £ H £ n log |A|) H( X1,X2,..., Xn) = H/n = normalized per-symbol entropy (0 £ H £ log |A|) l if (X,Y) are statistically independent, then H(X,Y) = H(X) + H(Y) q The conditional entropy is defined as H (Y | X ) = p(x)H (Y | X = x) = -E log p(Y | X ) åx p( x, y) q Chain rule: H(X ,Y ) = H (X ) + H (Y | X ) q Conditioning reduces uncertainty (on the average): H (X |Y ) £ H (X ) l but H(X|Y=y) ³ H(X) is possible Seroussi/Weinberger – Lossless source coding 26-May-04 14 Entropy Rates q Entropy rate of a random process in bits/symbol, ¥ 1 n H (X1 ) = lim H( X1 ) n®¥ n if the limit exists! q A related limit based on conditional entropy * ¥ in bits/symbol, H (X1 ) = lim H (X n X n-1, X n-2 ,K, X1) n®¥ if the limit exists! Theorem: For a stationary random process, both limits exist, and * ¥ ¥ H (X1 ) = H(X1 ) q Examples: ¥ l X1,X2,..
Recommended publications
  • Entropy Rate of Stochastic Processes
    Entropy Rate of Stochastic Processes Timo Mulder Jorn Peters [email protected] [email protected] February 8, 2015 The entropy rate of independent and identically distributed events can on average be encoded by H(X) bits per source symbol. However, in reality, series of events (or processes) are often randomly distributed and there can be arbitrary dependence between each event. Such processes with arbitrary dependence between variables are called stochastic processes. This report shows how to calculate the entropy rate for such processes, accompanied with some denitions, proofs and brief examples. 1 Introduction One may be interested in the uncertainty of an event or a series of events. Shannon entropy is a method to compute the uncertainty in an event. This can be used for single events or for several independent and identically distributed events. However, often events or series of events are not independent and identically distributed. In that case the events together form a stochastic process i.e. a process in which multiple outcomes are possible. These processes are omnipresent in all sorts of elds of study. As it turns out, Shannon entropy cannot directly be used to compute the uncertainty in a stochastic process. However, it is easy to extend such that it can be used. Also when a stochastic process satises certain properties, for example if the stochastic process is a Markov process, it is straightforward to compute the entropy of the process. The entropy rate of a stochastic process can be used in various ways. An example of this is given in Section 4.1.
    [Show full text]
  • Lecture 3: Entropy, Relative Entropy, and Mutual Information 1 Notation 2
    EE376A/STATS376A Information Theory Lecture 3 - 01/16/2018 Lecture 3: Entropy, Relative Entropy, and Mutual Information Lecturer: Tsachy Weissman Scribe: Yicheng An, Melody Guan, Jacob Rebec, John Sholar In this lecture, we will introduce certain key measures of information, that play crucial roles in theoretical and operational characterizations throughout the course. These include the entropy, the mutual information, and the relative entropy. We will also exhibit some key properties exhibited by these information measures. 1 Notation A quick summary of the notation 1. Discrete Random Variable: U 2. Alphabet: U = fu1; u2; :::; uM g (An alphabet of size M) 3. Specific Value: u; u1; etc. For discrete random variables, we will write (interchangeably) P (U = u), PU (u) or most often just, p(u) Similarly, for a pair of random variables X; Y we write P (X = x j Y = y), PXjY (x j y) or p(x j y) 2 Entropy Definition 1. \Surprise" Function: 1 S(u) log (1) , p(u) A lower probability of u translates to a greater \surprise" that it occurs. Note here that we use log to mean log2 by default, rather than the natural log ln, as is typical in some other contexts. This is true throughout these notes: log is assumed to be log2 unless otherwise indicated. Definition 2. Entropy: Let U a discrete random variable taking values in alphabet U. The entropy of U is given by: 1 X H(U) [S(U)] = log = − log (p(U)) = − p(u) log p(u) (2) , E E p(U) E u Where U represents all u values possible to the variable.
    [Show full text]
  • An Introduction to Information Theory
    An Introduction to Information Theory Vahid Meghdadi reference : Elements of Information Theory by Cover and Thomas September 2007 Contents 1 Entropy 2 2 Joint and conditional entropy 4 3 Mutual information 5 4 Data Compression or Source Coding 6 5 Channel capacity 8 5.1 examples . 9 5.1.1 Noiseless binary channel . 9 5.1.2 Binary symmetric channel . 9 5.1.3 Binary erasure channel . 10 5.1.4 Two fold channel . 11 6 Differential entropy 12 6.1 Relation between differential and discrete entropy . 13 6.2 joint and conditional entropy . 13 6.3 Some properties . 14 7 The Gaussian channel 15 7.1 Capacity of Gaussian channel . 15 7.2 Band limited channel . 16 7.3 Parallel Gaussian channel . 18 8 Capacity of SIMO channel 19 9 Exercise (to be completed) 22 1 1 Entropy Entropy is a measure of uncertainty of a random variable. The uncertainty or the amount of information containing in a message (or in a particular realization of a random variable) is defined as the inverse of the logarithm of its probabil- ity: log(1=PX (x)). So, less likely outcome carries more information. Let X be a discrete random variable with alphabet X and probability mass function PX (x) = PrfX = xg, x 2 X . For convenience PX (x) will be denoted by p(x). The entropy of X is defined as follows: Definition 1. The entropy H(X) of a discrete random variable is defined by 1 H(X) = E log p(x) X 1 = p(x) log (1) p(x) x2X Entropy indicates the average information contained in X.
    [Show full text]
  • Entropy Rate Estimation for Markov Chains with Large State Space
    Entropy Rate Estimation for Markov Chains with Large State Space Yanjun Han Department of Electrical Engineering Stanford University Stanford, CA 94305 [email protected] Jiantao Jiao Department of Electrical Engineering and Computer Sciences University of California, Berkeley Berkeley, CA 94720 [email protected] Chuan-Zheng Lee, Tsachy Weissman Yihong Wu Department of Electrical Engineering Department of Statistics and Data Science Stanford University Yale University Stanford, CA 94305 New Haven, CT 06511 {czlee, tsachy}@stanford.edu [email protected] Tiancheng Yu Department of Electronic Engineering Tsinghua University Haidian, Beijing 100084 [email protected] Abstract Entropy estimation is one of the prototypicalproblems in distribution property test- ing. To consistently estimate the Shannon entropy of a distribution on S elements with independent samples, the optimal sample complexity scales sublinearly with arXiv:1802.07889v3 [cs.LG] 24 Sep 2018 S S as Θ( log S ) as shown by Valiant and Valiant [41]. Extendingthe theory and algo- rithms for entropy estimation to dependent data, this paper considers the problem of estimating the entropy rate of a stationary reversible Markov chain with S states from a sample path of n observations. We show that Provided the Markov chain mixes not too slowly, i.e., the relaxation time is • S S2 at most O( 3 ), consistent estimation is achievable when n . ln S ≫ log S Provided the Markov chain has some slight dependency, i.e., the relaxation • 2 time is at least 1+Ω( ln S ), consistent estimation is impossible when n . √S S2 log S . Under both assumptions, the optimal estimation accuracy is shown to be S2 2 Θ( n log S ).
    [Show full text]
  • Package 'Infotheo'
    Package ‘infotheo’ February 20, 2015 Title Information-Theoretic Measures Version 1.2.0 Date 2014-07 Publication 2009-08-14 Author Patrick E. Meyer Description This package implements various measures of information theory based on several en- tropy estimators. Maintainer Patrick E. Meyer <[email protected]> License GPL (>= 3) URL http://homepage.meyerp.com/software Repository CRAN NeedsCompilation yes Date/Publication 2014-07-26 08:08:09 R topics documented: condentropy . .2 condinformation . .3 discretize . .4 entropy . .5 infotheo . .6 interinformation . .7 multiinformation . .8 mutinformation . .9 natstobits . 10 Index 12 1 2 condentropy condentropy conditional entropy computation Description condentropy takes two random vectors, X and Y, as input and returns the conditional entropy, H(X|Y), in nats (base e), according to the entropy estimator method. If Y is not supplied the function returns the entropy of X - see entropy. Usage condentropy(X, Y=NULL, method="emp") Arguments X data.frame denoting a random variable or random vector where columns contain variables/features and rows contain outcomes/samples. Y data.frame denoting a conditioning random variable or random vector where columns contain variables/features and rows contain outcomes/samples. method The name of the entropy estimator. The package implements four estimators : "emp", "mm", "shrink", "sg" (default:"emp") - see details. These estimators require discrete data values - see discretize. Details • "emp" : This estimator computes the entropy of the empirical probability distribution. • "mm" : This is the Miller-Madow asymptotic bias corrected empirical estimator. • "shrink" : This is a shrinkage estimate of the entropy of a Dirichlet probability distribution. • "sg" : This is the Schurmann-Grassberger estimate of the entropy of a Dirichlet probability distribution.
    [Show full text]
  • Arxiv:1907.00325V5 [Cs.LG] 25 Aug 2020
    Random Forests for Adaptive Nearest Neighbor Estimation of Information-Theoretic Quantities Ronan Perry1, Ronak Mehta1, Richard Guo1, Jesús Arroyo1, Mike Powell1, Hayden Helm1, Cencheng Shen1, and Joshua T. Vogelstein1;2∗ Abstract. Information-theoretic quantities, such as conditional entropy and mutual information, are critical data summaries for quantifying uncertainty. Current widely used approaches for computing such quantities rely on nearest neighbor methods and exhibit both strong performance and theoretical guarantees in certain simple scenarios. However, existing approaches fail in high-dimensional settings and when different features are measured on different scales. We propose decision forest-based adaptive nearest neighbor estimators and show that they are able to effectively estimate posterior probabilities, conditional entropies, and mutual information even in the aforementioned settings. We provide an extensive study of efficacy for classification and posterior probability estimation, and prove cer- tain forest-based approaches to be consistent estimators of the true posteriors and derived information-theoretic quantities under certain assumptions. In a real-world connectome application, we quantify the uncertainty about neuron type given various cellular features in the Drosophila larva mushroom body, a key challenge for modern neuroscience. 1 Introduction Uncertainty quantification is a fundamental desiderata of statistical inference and data science. In supervised learning settings it is common to quantify uncertainty with either conditional en- tropy or mutual information (MI). Suppose we are given a pair of random variables (X; Y ), where X is d-dimensional vector-valued and Y is a categorical variable of interest. Conditional entropy H(Y jX) measures the uncertainty in Y on average given X. On the other hand, mutual information quantifies the shared information between X and Y .
    [Show full text]
  • Chapter 2: Entropy and Mutual Information
    Chapter 2: Entropy and Mutual Information University of Illinois at Chicago ECE 534, Natasha Devroye Chapter 2 outline • Definitions • Entropy • Joint entropy, conditional entropy • Relative entropy, mutual information • Chain rules • Jensen’s inequality • Log-sum inequality • Data processing inequality • Fano’s inequality University of Illinois at Chicago ECE 534, Natasha Devroye Definitions A discrete random variable X takes on values x from the discrete alphabet . X The probability mass function (pmf) is described by p (x)=p(x) = Pr X = x , for x . X { } ∈ X University of Illinois at Chicago ECE 534, Natasha Devroye Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/ for links. 2 Probability, Entropy, and Inference Definitions Copyright Cambridge University Press 2003. On-screen viewing permitted. Printing not permitted. http://www.cambridge.org/0521642981 This chapter, and its sibling, Chapter 8, devote some time to notation. Just You can buy this book for 30 pounds or $50. See http://www.inference.phy.cam.ac.uk/mackay/itila/as the White Knight fordistinguished links. between the song, the name of the song, and what the name of the song was called (Carroll, 1998), we will sometimes 2.1: Probabilities and ensembles need to be careful to distinguish between a random variable, the v23alue of the i ai pi random variable, and the proposition that asserts that the random variable x has a particular value. In any particular chapter, however, I will use the most 1 a 0.0575 a Figure 2.2.
    [Show full text]
  • Entropy Power, Autoregressive Models, and Mutual Information
    entropy Article Entropy Power, Autoregressive Models, and Mutual Information Jerry Gibson Department of Electrical and Computer Engineering, University of California, Santa Barbara, Santa Barbara, CA 93106, USA; [email protected] Received: 7 August 2018; Accepted: 17 September 2018; Published: 30 September 2018 Abstract: Autoregressive processes play a major role in speech processing (linear prediction), seismic signal processing, biological signal processing, and many other applications. We consider the quantity defined by Shannon in 1948, the entropy rate power, and show that the log ratio of entropy powers equals the difference in the differential entropy of the two processes. Furthermore, we use the log ratio of entropy powers to analyze the change in mutual information as the model order is increased for autoregressive processes. We examine when we can substitute the minimum mean squared prediction error for the entropy power in the log ratio of entropy powers, thus greatly simplifying the calculations to obtain the differential entropy and the change in mutual information and therefore increasing the utility of the approach. Applications to speech processing and coding are given and potential applications to seismic signal processing, EEG classification, and ECG classification are described. Keywords: autoregressive models; entropy rate power; mutual information 1. Introduction In time series analysis, the autoregressive (AR) model, also called the linear prediction model, has received considerable attention, with a host of results on fitting AR models, AR model prediction performance, and decision-making using time series analysis based on AR models [1–3]. The linear prediction or AR model also plays a major role in many digital signal processing applications, such as linear prediction modeling of speech signals [4–6], geophysical exploration [7,8], electrocardiogram (ECG) classification [9], and electroencephalogram (EEG) classification [10,11].
    [Show full text]
  • GAIT: a Geometric Approach to Information Theory
    GAIT: A Geometric Approach to Information Theory Jose Gallego Ankit Vani Max Schwarzer Simon Lacoste-Julieny Mila and DIRO, Université de Montréal Abstract supports. Optimal transport distances, such as Wasser- stein, have emerged as practical alternatives with theo- retical grounding. These methods have been used to We advocate the use of a notion of entropy compute barycenters (Cuturi and Doucet, 2014) and that reflects the relative abundances of the train generative models (Genevay et al., 2018). How- symbols in an alphabet, as well as the sim- ever, optimal transport is computationally expensive ilarities between them. This concept was as it generally lacks closed-form solutions and requires originally introduced in theoretical ecology to the solution of linear programs or the execution of study the diversity of ecosystems. Based on matrix scaling algorithms, even when solved only in this notion of entropy, we introduce geometry- approximate form (Cuturi, 2013). Approaches based aware counterparts for several concepts and on kernel methods (Gretton et al., 2012; Li et al., 2017; theorems in information theory. Notably, our Salimans et al., 2018), which take a functional analytic proposed divergence exhibits performance on view on the problem, have also been widely applied. par with state-of-the-art methods based on However, further exploration on the interplay between the Wasserstein distance, but enjoys a closed- kernel methods and information theory is lacking. form expression that can be computed effi- ciently. We demonstrate the versatility of our Contributions. We i) introduce to the machine learn- method via experiments on a broad range of ing community a similarity-sensitive definition of en- domains: training generative models, comput- tropy developed by Leinster and Cobbold(2012).
    [Show full text]
  • Source Coding: Part I of Fundamentals of Source and Video Coding
    Foundations and Trends R in sample Vol. 1, No 1 (2011) 1–217 c 2011 Thomas Wiegand and Heiko Schwarz DOI: xxxxxx Source Coding: Part I of Fundamentals of Source and Video Coding Thomas Wiegand1 and Heiko Schwarz2 1 Berlin Institute of Technology and Fraunhofer Institute for Telecommunica- tions — Heinrich Hertz Institute, Germany, [email protected] 2 Fraunhofer Institute for Telecommunications — Heinrich Hertz Institute, Germany, [email protected] Abstract Digital media technologies have become an integral part of the way we create, communicate, and consume information. At the core of these technologies are source coding methods that are described in this text. Based on the fundamentals of information and rate distortion theory, the most relevant techniques used in source coding algorithms are de- scribed: entropy coding, quantization as well as predictive and trans- form coding. The emphasis is put onto algorithms that are also used in video coding, which will be described in the other text of this two-part monograph. To our families Contents 1 Introduction 1 1.1 The Communication Problem 3 1.2 Scope and Overview of the Text 4 1.3 The Source Coding Principle 5 2 Random Processes 7 2.1 Probability 8 2.2 Random Variables 9 2.2.1 Continuous Random Variables 10 2.2.2 Discrete Random Variables 11 2.2.3 Expectation 13 2.3 Random Processes 14 2.3.1 Markov Processes 16 2.3.2 Gaussian Processes 18 2.3.3 Gauss-Markov Processes 18 2.4 Summary of Random Processes 19 i ii Contents 3 Lossless Source Coding 20 3.1 Classification
    [Show full text]
  • Relative Entropy Rate Between a Markov Chain and Its Corresponding Hidden Markov Chain
    ⃝c Statistical Research and Training Center دورهی ٨، ﺷﻤﺎرهی ١، ﺑﻬﺎر و ﺗﺎﺑﺴﺘﺎن ١٣٩٠، ﺻﺺ ٩٧–١٠٩ J. Statist. Res. Iran 8 (2011): 97–109 Relative Entropy Rate between a Markov Chain and Its Corresponding Hidden Markov Chain GH. Yariy and Z. Nikooraveshz;∗ y Iran University of Science and Technology z Birjand University of Technology Abstract. In this paper we study the relative entropy rate between a homo- geneous Markov chain and a hidden Markov chain defined by observing the output of a discrete stochastic channel whose input is the finite state space homogeneous stationary Markov chain. For this purpose, we obtain the rela- tive entropy between two finite subsequences of above mentioned chains with the help of the definition of relative entropy between two random variables then we define the relative entropy rate between these stochastic processes and study the convergence of it. Keywords. Relative entropy rate; stochastic channel; Markov chain; hid- den Markov chain. MSC 2010: 60J10, 94A17. 1 Introduction Suppose fXngn2N is a homogeneous stationary Markov chain with finite state space S = f0; 1; 2; :::; N − 1g and fYngn2N is a hidden Markov chain (HMC) which is observed through a discrete stochastic channel where the input of channel is the Markov chain. The output state space of channel is characterized by channel’s statistical properties. From now on we study ∗ Corresponding author 98 Relative Entropy Rate between a Markov Chain and Its ::: the channels state spaces which have been equal to the state spaces of input chains. Let P = fpabg be the one-step transition probability matrix of the Markov chain such that pab = P rfXn = bjXn−1 = ag for a; b 2 S and Q = fqabg be the noisy matrix of channel where qab = P rfYn = bjXn = ag for a; b 2 S.
    [Show full text]
  • On a General Definition of Conditional Rényi Entropies
    proceedings Proceedings On a General Definition of Conditional Rényi Entropies † Velimir M. Ili´c 1,*, Ivan B. Djordjevi´c 2 and Miomir Stankovi´c 3 1 Mathematical Institute of the Serbian Academy of Sciences and Arts, Kneza Mihaila 36, 11000 Beograd, Serbia 2 Department of Electrical and Computer Engineering, University of Arizona, 1230 E. Speedway Blvd, Tucson, AZ 85721, USA; [email protected] 3 Faculty of Occupational Safety, University of Niš, Carnojevi´ca10a,ˇ 18000 Niš, Serbia; [email protected] * Correspondence: [email protected] † Presented at the 4th International Electronic Conference on Entropy and Its Applications, 21 November–1 December 2017; Available online: http://sciforum.net/conference/ecea-4. Published: 21 November 2017 Abstract: In recent decades, different definitions of conditional Rényi entropy (CRE) have been introduced. Thus, Arimoto proposed a definition that found an application in information theory, Jizba and Arimitsu proposed a definition that found an application in time series analysis and Renner-Wolf, Hayashi and Cachin proposed definitions that are suitable for cryptographic applications. However, there is still no a commonly accepted definition, nor a general treatment of the CRE-s, which can essentially and intuitively be represented as an average uncertainty about a random variable X if a random variable Y is given. In this paper we fill the gap and propose a three-parameter CRE, which contains all of the previous definitions as special cases that can be obtained by a proper choice of the parameters. Moreover, it satisfies all of the properties that are simultaneously satisfied by the previous definitions, so that it can successfully be used in aforementioned applications.
    [Show full text]