Information Measures for Deterministic Input-Output Systems

1 Information Measures for Deterministic Input-Output Systems Bernhard C. Geiger, Student Member, IEEE, and Gernot Kubin, Member, IEEE Abstract—In this work the information loss in deterministic, processing, just as, loosely speaking, a passive system cannot memoryless systems is investigated by evaluating the conditional increase the energy contained in a signal1. entropy of the input random variable given the output random Clearly, the information lost in a system not only depends variable. It is shown that for a large class of systems the information loss is finite, even if the input is continuously distributed. on the system but also on the signal carrying this information; Based on this finiteness, the problem of perfectly reconstructing the same holds for the energy lost or gained. While there the input is addressed and Fano-type bounds between the is a strong connection between energy and information for information loss and the reconstruction error probability are Gaussian signals (entropy and entropy rate of a Gaussian derived. signal are related to its variance and power spectral density, For systems with infinite information loss a relative measure is defined and shown to be tightly related to Rényi information respectively), for non-Gaussian signals this connection degen- dimension. Employing another Fano-type argument, the recon- erates to a bound by the max-entropy property of the Gaussian struction error probability is bounded by the relative information distribution. Energy and information of a signal therefore can loss from below. behave completely differently when fed through a system. But In view of developing a system theory from an information- while we have a definition of the energy loss (namely, the theoretic point-of-view, the theoretical results are illustrated by inverse gain), an analysis of the information loss in a a few example systems, among them a multi-channel autocorre- L2 lation receiver. system is still lacking. It is the purpose of this work to close this gap and to Index Terms—Data processing inequality, Fano’s inequality, information loss, Rényi information dimension, system theory propose the information loss – the conditional entropy of the input given the output – as a general system characteristic, complementing the prevailing energy-centered descriptions. I. INTRODUCTION The choice of this conditional entropy is partly motivated by When opening a textbook on linear [1] or nonlinear [2] the data processing inequality (cf. Definition 1) and justified input-output systems, the characterizations one typically finds by a recent axiomatization of information loss [6]. – aside from the difference or differential equation defining At present we restrict ourselves to memoryless systems2 the system – are almost exclusively energy-centered in nature: operating on (multidimensional) random variables. The reason transfer functions, input-output stability, passivity, lossless- for this is that already for this comparably simple system class ness, and the or energy/power gain are all defined using the L2 a multitude of questions can be asked (e.g., what happens if we amplitudes (or amplitude functions) of the involved signals, lose an infinite amount of information?), to some of which we therefore essentially energetic in nature. When opening a intend to present adequate answers (e.g., by the introduction textbook on statistical signal processing [3] or an engineering- of a relative measure of information loss). This manuscript can oriented textbook on stochastic processes [4], one can add thus be regarded as a first small step towards a system theory correlations, power spectral densities, and how they are af- from an information-theoretic point-of-view; we hope that the fected by linear and nonlinear systems (e.g., the Bussgang results presented here will ley the foundation for some of the arXiv:1303.6409v2 [cs.IT] 17 Apr 2013 theorem [4, Thm. 9-17]). By this overwhelming prevalence of forthcoming steps, such as extensions to stochastic processes energetic measures and second-order statistics, it is no surprise or systems with memory. that many problems in system theory or signal processing are formulated in terms of energetic cost functions, e.g., the mean- squared error. A. Related Work What one does not find in all these books is an information- To the best of the authors’ knowledge, very few results theoretic characterization of the system at hand, despite the about the information processing behavior of deterministic fact that such a characterization is strongly suggested by an input-output systems have been published. Notable exceptions elementary theorem: the data processing inequality. We know are Pippenger’s analysis of the information lost in the mul- that the information content of a signal (be it a random vari- tiplication of two integer random variables [7] and the work able or a stochastic process) cannot increase by deterministic of Watanabe and Abraham concerning the rate of information loss caused by feeding a discrete-time, finite-alphabet Bernhard C. Geiger ([email protected]) and Gernot Kubin ([email protected]) are with the Signal Processing and Speech stationary stochastic process through a static, non-injective Communication Laboratory, Graz University of Technology Parts of this work have been presented at the 2011 IEEE Int. Sym. 1At least by not more than a finite amount, cf. [5]. on Wireless Communication Systems, the 2012 Int. Zürich Seminar on 2Since these systems will be described by (typically) nonlinear functions, Communications, and the 2012 IEEE Information Theory Workshop. we will use “systems” and “functions” interchangeably. 2 function [8]. Moreover, in [9] the authors made an effort to B. Outline and Contributions extend the results of Watanabe and Abraham to dynamical The organization of the paper is as follows: In Section II we input-output systems with finite internal memory. All these give the definitions of absolute and relative information loss works, however, focus only on discrete random variables and and analyze their elementary properties. Among other things, stochastic processes. we prove a connection between relative information loss and Slightly larger, but still focused on discrete random vari- Rényi information dimension and the additivity of absolute ables only, is the field concerning information-theoretic cost information loss for a cascade. We next turn to a class of functions: The infomax principle [10], the information bottle- systems which has finite absolute information loss for a real- neck method [11] using the Kullback-Leibler divergence as valued input in Section III. A connection to differential entropy a distortion function, and system design by minimizing the is shown as well as numerous upper bounds on the information error entropy (e.g., [12]) are just a few examples of this recent loss. Given this finite information loss, we present Fano- trend. Additionally, Lev’s approach to aggregating accounting type inequalities for the probability of a reconstruction error. data [13], and, although not immediately evident, the work Section IV deals with systems exhibiting infinite absolute about macroscopic descriptions of multi-agent systems [14] information loss, e.g., quantizers and systems reducing the belong to that category. dimensionality of the data, to which we apply our notion A system theory for neural information processing has been of relative information loss. Presenting a similar connection proposed by Johnson3 in [15]. The assumptions made there between relative information loss and the reconstruction er- (information need not be stochastic, the same information can ror probability establishes a link to analog compression as be represented by different signals, information can be seen investigated in [26]. We apply our theoretical results to two as a parameter of a probability distribution, etc.) suggest the larger systems in Section V, a communications receiver and use of the Kullback-Leibler divergence as a central quantity. an accumulator. It is shown that indeed both the absolute and Although these assumptions are incompatible with ours, some relative measures of information loss are necessary to fully similarities exist (e.g., the information transfer ratio of a characterize even the restricted class of systems we analyze in cascade in [15] and Proposition 4). this work. Eventually, Section VI is devoted to point at open issues and lay out a roadmap for further research. Aside from these, the closest the literature comes to these concepts is in the field of autonomous dynamical systems or iterated maps: There, a multitude of information-theoretic II. DEFINITION AND ELEMENTARY PROPERTIES OF characterizations are used to measure the information transfer INFORMATION LOSS AND RELATIVE INFORMATION LOSS in deterministic systems. In particular, the information flow A. Notation between small and large scales, caused by the folding and We adopt the following notation: Random variables (RVs) stretching behavior of chaotic one-dimensional maps was are represented by upper case letters (e.g., ), lower case analyzed in [16]; the result they present is remarkably sim- X letters (e.g., ) are reserved for (deterministic) constants or ilar to our Proposition 7. Another notable example is the x realizations of RVs. The alphabet of an RV is indicated by a introduction of transfer entropy in [17], [18] to capture the calligraphic letter (e.g., ). The probability distribution of an information transfer between states of different (sub-)systems. RV is

Information Measures for Deterministic Input-Output Systems

Polarization of the Rényi Information Dimension with Applications To

Rényi Information Dimension: Fundamental Limits of Almost Lossless Analog Compression Yihong Wu, Student Member, IEEE, and Sergio Verdú, Fellow, IEEE

Understanding the Fractal Dimensions of Urban Forms Through Spatial Entropy

Information Theory and Dynamical System Predictability

Understanding Fractal Dimension of Urban Form Through Spatial Entropy

Information, Measure Shifts and Distribution Diagnostics

Measuring Statistical Dependence Via the Mutual Information Dimension

Information Dimension and the Probabilistic Structure of Chaos J

Modeling and Analyzing Fractal Point Processes by Anand R

Information-Theoretic Analysis of Memoryless Deterministic Systems