Entropy and Information Theory, Robert M. Gray
Entropy and Information Theory

Robert M. Gray
Information Systems Laboratory
Electrical Engineering Department
Stanford University
Stanford, CA 94305-2099 USA

Springer Science+Business Media, LLC

Printed on acid-free paper.

© 1990 by Springer Science+Business Media New York. Originally published by Springer-Verlag New York, Inc. in 1990. Softcover reprint of the hardcover 1st edition 1990.

All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher, Springer Science+Business Media, LLC, except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use of general descriptive names, trade names, trademarks, etc. in this publication, even if the former are not especially identified, is not to be taken as a sign that such names, as understood by the Trade Marks and Merchandise Marks Act, may accordingly be used freely by anyone.

Camera-ready text prepared by the author using LaTeX.

9 8 7 6 5 4 3 2 1

ISBN 978-1-4757-3984-8
ISBN 978-1-4757-3982-4 (eBook)
DOI 10.1007/978-1-4757-3982-4

To Tim, Lori, Julia, Peter, Gus, Amy Elizabeth, and Alice, and in memory of Tino

Contents

Prologue

1 Information Sources
1.1 Introduction
1.2 Probability Spaces and Random Variables
1.3 Random Processes and Dynamical Systems
1.4 Distributions
1.5 Standard Alphabets
1.6 Expectation
1.7 Asymptotic Mean Stationarity
1.8 Ergodic Properties

2 Entropy and Information
2.1 Introduction
2.2 Entropy and Entropy Rate
2.3 Basic Properties of Entropy
2.4 Entropy Rate
2.5 Conditional Entropy and Information
2.6 Entropy Rate Revisited
2.7 Relative Entropy Densities

3 The Entropy Ergodic Theorem
3.1 Introduction
3.2 Stationary Ergodic Sources
3.3 Stationary Nonergodic Sources
3.4 AMS Sources
3.5 The Asymptotic Equipartition Property

4 Information Rates I
4.1 Introduction
4.2 Stationary Codes and Approximation
4.3 Information Rate of Finite Alphabet Processes

5 Relative Entropy
5.1 Introduction
5.2 Divergence
5.3 Conditional Relative Entropy
5.4 Limiting Entropy Densities
5.5 Information for General Alphabets
5.6 Some Convergence Results

6 Information Rates II
6.1 Introduction
6.2 Information Rates for General Alphabets
6.3 A Mean Ergodic Theorem for Densities
6.4 Information Rates of Stationary Processes

7 Relative Entropy Rates
7.1 Introduction
7.2 Relative Entropy Densities and Rates
7.3 Markov Dominating Measures
7.4 Stationary Processes
7.5 Mean Ergodic Theorems

8 Ergodic Theorems for Densities
8.1 Introduction
8.2 Stationary Ergodic Sources
8.3 Stationary Nonergodic Sources
8.4 AMS Sources
8.5 Ergodic Theorems for Information Densities

9 Channels and Codes
9.1 Introduction
9.2 Channels
9.3 Stationarity Properties of Channels
9.4 Examples of Channels
9.5 The Rohlin-Kakutani Theorem

10 Distortion
10.1 Introduction
10.2 Distortion and Fidelity Criteria
10.3 Performance
10.4 The rho-bar Distortion
10.5 d-bar Continuous Channels
10.6 The Distortion-Rate Function

11 Source Coding Theorems
11.1 Source Coding and Channel Coding
11.2 Block Source Codes for AMS Sources
11.3 Block Coding Stationary Sources
11.4 Block Coding AMS Ergodic Sources
11.5 Subadditive Fidelity Criteria
11.6 Asynchronous Block Codes
11.7 Sliding Block Source Codes
11.8 A Geometric Interpretation of OPTA's

12 Coding for Noisy Channels
12.1 Noisy Channels
12.2 Feinstein's Lemma
12.3 Feinstein's Theorem
12.4 Channel Capacity
12.5 Robust Block Codes
12.6 Block Coding Theorems for Noisy Channels
12.7 Joint Source and Channel Block Codes
12.8 Synchronizing Block Channel Codes
12.9 Sliding Block Source and Channel Coding

Bibliography

Index

Prologue

This book is devoted to the theory of probabilistic information measures and their application to coding theorems for information sources and noisy channels. The eventual goal is a general development of Shannon's mathematical theory of communication, but much of the space is devoted to the tools and methods required to prove the Shannon coding theorems. These tools form an area common to ergodic theory and information theory and comprise several quantitative notions of the information in random variables, random processes, and dynamical systems. Examples are entropy, mutual information, conditional entropy, conditional information, and discrimination or relative entropy, along with the limiting normalized versions of these quantities such as entropy rate and information rate. Much of the book is concerned with their properties, especially the long term asymptotic behavior of sample information and expected information.

The book has been strongly influenced by M. S. Pinsker's classic Information and Information Stability of Random Variables and Processes and by the seminal work of A. N. Kolmogorov, I. M. Gelfand, A. M. Yaglom, and R. L. Dobrushin on information measures for abstract alphabets and their convergence properties. Many of the results herein are extensions of their generalizations of Shannon's original results. The mathematical models of this treatment are more general than traditional treatments in that nonstationary and nonergodic information processes are treated. The models are somewhat less general than those of the Soviet school of information theory in the sense that standard alphabets rather than completely abstract alphabets are considered. This restriction, however, permits many stronger results as well as the extension to nonergodic processes. In addition, the assumption of standard spaces simplifies many proofs, and such spaces include as examples virtually all examples of engineering interest.

The information convergence results are combined with ergodic theorems to prove general Shannon coding theorems for sources and channels. The results are not the most general known and the converses are not the strongest available, but they are sufficiently general to cover most systems encountered in applications and they provide an introduction to recent extensions requiring significant additional mathematical machinery. Several of the generalizations have not previously been treated in book form.
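To fix ideas before listing these novel topics, the most basic of the quantities named earlier, entropy, entropy rate, and relative entropy, can be recorded in their familiar finite-alphabet form; the notation here is generic rather than the book's own, whose general-alphabet definitions are developed in Chapters 2 and 5:

\[
H(X) = -\sum_{a \in A} p_X(a) \log p_X(a), \qquad
\bar{H}(\{X_n\}) = \lim_{n \to \infty} \frac{1}{n} H(X_0, X_1, \ldots, X_{n-1}),
\]
\[
D(P \| Q) = \sum_{a \in A} P(a) \log \frac{P(a)}{Q(a)},
\]

with the usual convention \(0 \log 0 = 0\); the limit defining the entropy rate exists, for example, whenever the process is stationary.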
Examples of novel topics for an information theory text include asymptotic mean stationary sources, one-sided sources as well as two-sided sources, nonergodic sources, d-bar continuous channels, and sliding block codes. Another novel aspect is the use of recent proofs of general Shannon-McMillan-Breiman theorems which do not use martingale theory: a coding proof of Ornstein and Weiss [117] is used to prove the almost everywhere convergence of sample entropy for discrete alphabet processes, and a variation on the sandwich approach of Algoet and Cover [7] is used to prove the convergence of relative entropy densities for general standard alphabet processes. Both results are proved for asymptotically mean stationary processes which need not be ergodic.

This material can be considered as a sequel to my book Probability, Random Processes, and Ergodic Properties [51], wherein the prerequisite results on probability, standard spaces, and ordinary ergodic properties may be found. This book is self-contained with the exception of common (and a few less common) results which may be found in the first book.

It is my hope that the book will interest engineers in some of the mathematical aspects and general models of the theory and mathematicians in some of the important engineering applications of performance bounds and code design for communication systems.

Information theory, or the mathematical theory of communication, has two primary goals: The first is the development of the fundamental theoretical limits on the achievable performance when communicating a given information source over a given communications channel using coding schemes from within a prescribed class. The second goal is the development of coding schemes that provide performance that is reasonably good in comparison with the optimal performance given by the theory.

Information theory was born in a surprisingly rich state in the classic papers of Claude E. Shannon [129] [130], which contained the basic results for simple memoryless sources and channels and introduced more general communication systems models, including finite state sources and channels. The key tools used to prove the original results and many of those that followed were special cases of the ergodic theorem and a new variation of the ergodic theorem which considered sample averages of a measure of the entropy or self information in a process.

Information theory can be viewed as simply a branch of applied probability theory. Because of its dependence on ergodic theorems, however, it can also be viewed as a branch of ergodic theory, the theory of invariant transformations and transformations related to invariant transformations. In order to develop the ergodic theory example of principal interest to information theory, suppose that one has a random process, which for the moment we consider as a sample space or ensemble of possible output sequences together with a probability measure on events composed of collections of such sequences. The shift is the transformation on this space of sequences that takes a sequence and produces a new sequence by shifting the first sequence a single time unit to the left.
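As a minimal formalization of this verbal description (the notation is generic; the book's own conventions are fixed in Chapter 1), if \(x = (x_0, x_1, x_2, \ldots)\) denotes a one-sided sample sequence, the shift \(T\) acts coordinatewise as

\[
(Tx)_n = x_{n+1}, \qquad n = 0, 1, 2, \ldots,
\]

and a source with distribution \(m\) is stationary, that is, invariant under the shift, if \(m(T^{-1}F) = m(F)\) for every event \(F\).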