Information Theory: Principles and Applications


Tiago T. V. Vinhoza
MAP-Tele, May 28, 2010

Outline
1. Differential Entropy: Definition; Other Information Measures; Properties
2. Gaussian Channel: Capacity; Coding Theorem; Achievability and Converse; Parallel Gaussian Channels: Waterfilling
3. Fading Channels

Differential Entropy: Definition

Differential entropy is the entropy of a continuous random variable. Let X be a random variable with cumulative distribution function F_X(x) and probability density function p_X(x). Then

    h(X) = -\int_S p_X(x) \log p_X(x) \, dx,

where S is the support set of the random variable X, that is, the set where p_X(x) > 0. As in the discrete case, the differential entropy depends only on p_X(x).

Differential Entropy: Example 1 (Uniform distribution)

Consider a random variable X uniformly distributed from 0 to a:

    h(X) = -\int_S p_X(x) \log p_X(x) \, dx = -\int_0^a \frac{1}{a} \log \frac{1}{a} \, dx = \log a.

Note that for a < 1 we have \log a < 0, so the differential entropy can be negative. The volume of the support set, 2^{h(X)} = 2^{\log a} = a, is always a non-negative quantity.

Differential Entropy: Example 2 (Gaussian distribution)

Let X be Gaussian with zero mean and variance \sigma^2, that is,

    p_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-x^2/2\sigma^2}.

Then

    h(X) = -\int_S p_X(x) \ln p_X(x) \, dx
         = \int p_X(x) \ln \sqrt{2\pi\sigma^2} \, dx + \int \frac{x^2}{2\sigma^2} \, p_X(x) \, dx
         = \frac{1}{2} \ln 2\pi\sigma^2 + \frac{E[X^2]}{2\sigma^2}
         = \frac{1}{2} \ln 2\pi\sigma^2 + \frac{1}{2}
         = \frac{1}{2} \ln 2\pi e \sigma^2 \text{ nats} = \frac{1}{2} \log 2\pi e \sigma^2 \text{ bits}.

Relation Between Differential Entropy and Discrete Entropy

Consider a random variable X with density p_X(x) and divide the range of X into bins of length \Delta. By the mean value theorem, within each bin there is a point x_i such that

    p_X(x_i)\,\Delta = \int_{i\Delta}^{(i+1)\Delta} p_X(x) \, dx.

Define the quantized random variable

    X^\Delta = x_i, \quad \text{if } i\Delta \le X < (i+1)\Delta,

so that P(X^\Delta = x_i) = p_X(x_i)\,\Delta.

The entropy of this quantized variable is

    H(X^\Delta) = -\sum_{i=-\infty}^{\infty} p_i \log p_i
                = -\sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log\big(p_X(x_i)\Delta\big)
                = -\sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log p_X(x_i) - \sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log \Delta
                = -\sum_{i=-\infty}^{\infty} p_X(x_i)\Delta \log p_X(x_i) - \log \Delta,

where the last step uses \sum_i p_X(x_i)\Delta = 1.

If the density p_X(x) is Riemann integrable, then

    H(X^\Delta) + \log \Delta \to h(X) \quad \text{as } \Delta \to 0.

Hence the entropy of an n-bit quantization of a continuous random variable X is approximately h(X) + n.

Example: If X has a uniform distribution on [0, 1] and we let \Delta = 2^{-n}, then h(X) = 0 and H(X^\Delta) = n, so n bits suffice to describe X with n-bit accuracy.

Example: If X has a uniform distribution on [0, 1/8), the first three bits after the binary point are zero, so to describe X with n-bit precision we need only n - 3 bits, which agrees with h(X) = -3.
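To make the relation H(X^\Delta) + \log \Delta \to h(X) concrete, here is a minimal numerical sketch (not part of the original slides; it assumes NumPy and SciPy are available). It quantizes a standard Gaussian with bin width Delta and compares H(X^Delta) + log2(Delta) against the closed-form differential entropy (1/2) log2(2*pi*e*sigma^2) from Example 2.

```python
import numpy as np
from scipy.stats import norm

sigma = 1.0

# Closed-form differential entropy of N(0, sigma^2), in bits (Example 2 above).
h_exact = 0.5 * np.log2(2 * np.pi * np.e * sigma**2)

for delta in [1.0, 0.1, 0.01, 0.001]:
    # Quantize the real line into bins of width delta, truncated to +/- 10 sigma
    # (which captures essentially all of the probability mass).
    edges = np.arange(-10 * sigma, 10 * sigma + delta, delta)
    probs = np.diff(norm.cdf(edges, scale=sigma))      # P(X^Delta = x_i)
    probs = probs[probs > 0]
    H_quantized = -np.sum(probs * np.log2(probs))      # discrete entropy H(X^Delta)
    print(f"delta={delta:7.3f}  H(X^D)+log2(delta)={H_quantized + np.log2(delta):.4f}"
          f"  h(X)={h_exact:.4f}")
```

As Delta shrinks, the printed gap between the two quantities closes, which is exactly the n-bit quantization statement above.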
Joint Differential Entropy

The differential entropy of a random vector X^n composed of the n random variables X_1, X_2, \ldots, X_n with density p_{X^n}(x^n) is defined as

    h(X^n) = -\int \cdots \int p_{X^n}(x^n) \log p_{X^n}(x^n) \, dx^n.

Example (multivariate Gaussian): let X_1, X_2, \ldots, X_n form a Gaussian random vector with mean \mu and covariance matrix K, that is, X^n \sim N(\mu, K). Then

    h(X^n) = h(X_1, X_2, \ldots, X_n) = \frac{1}{2} \log\big((2\pi e)^n |K|\big),

where |K| denotes the determinant of the covariance matrix K.

Conditional Differential Entropy

If X and Y have a joint pdf p_{XY}(x, y), the conditional differential entropy h(X|Y) is defined as

    h(X|Y) = -\iint p_{XY}(x, y) \log p_{X|Y}(x|y) \, dx \, dy.

Since p_{XY}(x, y) = p_{X|Y}(x|y)\, p_Y(y), we can write

    h(X, Y) = h(X|Y) + h(Y).

Relative Entropy

The relative entropy is a measure of the distance between two continuous distributions. The relative entropy between two probability density functions f_X(x) and g_X(x) is defined as

    D(f_X \| g_X) = \int f_X(x) \log \frac{f_X(x)}{g_X(x)} \, dx.

Properties: D(f_X \| g_X) \ge 0, with equality if and only if f_X(x) = g_X(x); in general, D(f_X \| g_X) \ne D(g_X \| f_X).

Mutual Information

The mutual information of two continuous random variables X and Y is defined as the relative entropy between the joint probability density p_{XY}(x, y) and the product of the marginals p_X(x) and p_Y(y):

    I(X; Y) = D(p_{XY} \| p_X p_Y) = \iint p_{XY}(x, y) \log \frac{p_{XY}(x, y)}{p_X(x) p_Y(y)} \, dx \, dy,

so that I(X; Y) = h(X) - h(X|Y) = h(Y) - h(Y|X).

The mutual information of continuous random variables is the limit of the mutual information between their quantized versions:

    I(X^\Delta; Y^\Delta) = H(X^\Delta) - H(X^\Delta|Y^\Delta)
                          \approx \big(h(X) - \log \Delta\big) - \big(h(X|Y) - \log \Delta\big) = I(X; Y).

Properties of Differential Entropy

Translation does not change the differential entropy: h(X + c) = h(X).

Multiplication by a constant: h(aX) = h(X) + \log |a|. The same property holds for random vectors: h(A X^n) = h(X^n) + \log |A|, where |A| is the absolute value of the determinant of A.

The multivariate Gaussian distribution maximizes the entropy over all distributions with the same covariance matrix:

    h(X^n) \le \frac{1}{2} \log\big((2\pi e)^n |K|\big),

with equality if and only if X^n \sim N(0, K).

Asymptotic Equipartition Property

As in the discrete case, we can define a typical set and characterize it. Let X_1, X_2, \ldots, X_n be a sequence of i.i.d. random variables with probability density function p_X(x). Then

    -\frac{1}{n} \log p_{X_1 X_2 \ldots X_n}(X_1, X_2, \ldots, X_n) \to E[-\log p_X(X)] = h(X).

As in the discrete case, this result follows from the weak law of large numbers.
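The claim that mutual information survives quantization can be checked numerically. The sketch below (an illustration added here, not from the slides) estimates I(X^Delta; Y^Delta) from a fine histogram of jointly Gaussian samples and compares it with the closed form I(X; Y) = -(1/2) log2(1 - rho^2) for correlation coefficient rho; the -log Delta terms cancel, so the quantized estimate approximates I(X; Y) directly.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, n_samples, delta = 0.8, 200_000, 0.25

# Jointly Gaussian pair (X, Y) with unit variances and correlation rho.
x = rng.standard_normal(n_samples)
y = rho * x + np.sqrt(1 - rho**2) * rng.standard_normal(n_samples)

# Quantize both variables with bin width delta and build the joint histogram.
edges = np.arange(-5, 5 + delta, delta)
joint, _, _ = np.histogram2d(x, y, bins=[edges, edges])
p_xy = joint / joint.sum()
p_x = p_xy.sum(axis=1, keepdims=True)          # marginal of X^Delta
p_y = p_xy.sum(axis=0, keepdims=True)          # marginal of Y^Delta

# Plug-in estimate of I(X^Delta; Y^Delta) = sum p(x,y) log2[ p(x,y) / (p(x)p(y)) ].
mask = p_xy > 0
mi_quantized = np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask]))
mi_exact = -0.5 * np.log2(1 - rho**2)
print(f"quantized estimate: {mi_quantized:.3f} bits, closed form: {mi_exact:.3f} bits")
```

With a moderate bin width and a large sample, the two printed values should be close, illustrating I(X^Delta; Y^Delta) -> I(X; Y).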
Asymptotic Equipartition Property: Typical Set

The typical set A_\epsilon^{(n)} with respect to p_X(x) is the set of sequences (x_1, x_2, \ldots, x_n) \in \mathcal{X}^n with the following property:

    A_\epsilon^{(n)} = \left\{ x^n : \left| -\frac{1}{n} \log p_{X^n}(x^n) - h(X) \right| \le \epsilon \right\}.

The properties of the typical set for continuous random variables are the same as those for the discrete case.

The volume of the typical set for continuous random variables is the analog of the cardinality of the typical set in the discrete case. The volume Vol(A) of a set A in R^n is defined as

    Vol(A) = \int_A dx_1 \, dx_2 \ldots dx_n.

Properties:

- P(X^n \in A_\epsilon^{(n)}) > 1 - \epsilon for n sufficiently large.
- Vol(A_\epsilon^{(n)}) \le 2^{n(h(X)+\epsilon)} for all n.
- Vol(A_\epsilon^{(n)}) \ge (1 - \epsilon)\, 2^{n(h(X)-\epsilon)} for n sufficiently large.

The results for joint typicality follow those for the discrete case.

Gaussian Channel

The Gaussian channel is a discrete-time channel where the output at time i, Y_i, is the sum of the input X_i and the Gaussian noise Z_i:

    Y_i = X_i + Z_i, \quad Z_i \sim N(0, N).

The noise is assumed to be independent of the input.

Noiseless case: infinite capacity. Any real number can be transmitted without error.

Unconstrained inputs: infinite capacity. Even with noise, we can choose the inputs arbitrarily far apart, so that they are distinguishable at the output with probability of error as small as we want.

Gaussian Channel: Power Constraint

The limitation on the input is a power constraint; an average power constraint is assumed. For any length-n codeword (x_1, x_2, \ldots, x_n), it is required that

    \frac{1}{n} \sum_{i=1}^{n} x_i^2 \le P.

Gaussian Channel: A Suboptimal Use

We want to send one bit over the channel in one use of the channel. Given the power constraint, we have two possible signals to transmit: +\sqrt{P} and -\sqrt{P}.
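As a complement to this last slide, the following sketch (added here; not part of the deck) simulates the one-bit antipodal scheme: transmit +sqrt(P) or -sqrt(P), decode by the sign of Y, and compare the simulated error rate with the standard Gaussian tail expression Q(sqrt(P/N)) for this detector.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
P, N, n_uses = 1.0, 0.25, 1_000_000        # signal power, noise variance, channel uses

bits = rng.integers(0, 2, n_uses)          # information bits
x = np.sqrt(P) * (2 * bits - 1)            # map 0 -> -sqrt(P), 1 -> +sqrt(P)
y = x + rng.normal(scale=np.sqrt(N), size=n_uses)  # additive Gaussian noise Z ~ N(0, N)
decoded = (y > 0).astype(int)              # sign detector

p_err_sim = np.mean(decoded != bits)
p_err_theory = norm.sf(np.sqrt(P / N))     # Q(sqrt(P/N)) for antipodal signaling
print(f"simulated Pe = {p_err_sim:.4f}, Q(sqrt(P/N)) = {p_err_theory:.4f}")
```

This scheme uses each channel symbol for a single bit, which is what makes it suboptimal relative to coding over long blocks.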