Probability, Random Processes, and Ergodic Properties

January 2, 2010

Robert M. Gray
Information Systems Laboratory
Electrical Engineering Department
Stanford University

Springer-Verlag
New York

© 1987 by Springer Verlag. Revised 2001, 2006, 2007, 2008 by Robert M. Gray.

This book is affectionately dedicated to the memory of
Elizabeth Dubois Jordan Gray, 1906–1998
R. Adm. Augustine Heard Gray, U.S.N., 1888–1981
Sara Jean Dubois, 1811–?
and William "Old Billy" Gray, 1750–1825

Contents

Preface

1 Probability and Random Processes
  1.1 Introduction
  1.2 Probability Spaces and Random Variables
  1.3 Random Processes and Dynamical Systems
  1.4 Distributions
  1.5 Extension
  1.6 Isomorphism

2 Standard alphabets
  2.1 Extension of Probability Measures
  2.2 Standard Spaces
  2.3 Some properties of standard spaces
  2.4 Simple standard spaces
  2.5 Metric Spaces
  2.6 Extension in Standard Spaces
  2.7 The Kolmogorov Extension Theorem
  2.8 Extension Without a Basis

3 Borel Spaces and Polish alphabets
  3.1 Borel Spaces
  3.2 Polish Spaces
  3.3 Polish Schemes

4 Averages
  4.1 Introduction
  4.2 Discrete Measurements
  4.3 Quantization
  4.4 Expectation
  4.5 Time Averages
  4.6 Convergence of Random Variables
  4.7 Stationary Averages

5 Conditional Probability and Expectation
  5.1 Introduction
  5.2 Measurements and Events
  5.3 Restrictions of Measures
  5.4 Elementary Conditional Probability
  5.5 Projections
  5.6 The Radon-Nikodym Theorem
  5.7 Conditional Probability
  5.8 Regular Conditional Probability
  5.9 Conditional Expectation
  5.10 Independence and Markov Chains

6 Ergodic Properties
  6.1 Ergodic Properties of Dynamical Systems
  6.2 Some Implications of Ergodic Properties
  6.3 Asymptotically Mean Stationary Processes
  6.4 Recurrence
  6.5 Asymptotic Mean Expectations
  6.6 Limiting Sample Averages
  6.7 Ergodicity

7 Ergodic Theorems
  7.1 Introduction
  7.2 The Pointwise Ergodic Theorem
  7.3 Block AMS Processes
  7.4 The Ergodic Decomposition
  7.5 The Subadditive Ergodic Theorem

8 Process Metrics and the Ergodic Decomposition
  8.1 Introduction
  8.2 A Metric Space of Measures
  8.3 The Rho-Bar Distance
  8.4 Measures on Measures
  8.5 The Ergodic Decomposition Revisited
  8.6 The Ergodic Decomposition of Markov Processes
  8.7 Barycenters
  8.8 Affine Functions of Measures
  8.9 The Ergodic Decomposition of Affine Functionals

Bibliography

Index

Preface

History and Goals

This book has been written for several reasons, not all of which are academic. This material was for many years the first half of a book in progress on information and ergodic theory. The intent was and is to provide a reasonably self-contained advanced treatment of measure theory, probability theory, and the theory of discrete time random processes with an emphasis on general alphabets and on ergodic and stationary properties of random processes that might be neither ergodic nor stationary. The intended audience was mathematically inclined engineering graduate students and visiting scholars who had not had formal courses in measure theoretic probability. Much of the material is familiar stuff for mathematicians, but many of the topics and results have not previously appeared in books.

The original project grew too large and the first part contained much that would likely bore mathematicians and discourage them from the second part. Hence I finally followed a suggestion to separate the material and split the project in two. The original justification for the present manuscript was the pragmatic one that it would be a shame to waste all the effort thus far expended.
A more idealistic motivation was that the presentation had merit as filling a unique, albeit small, hole in the literature. Personal experience indicates that the intended audience rarely has the time to take a complete course in measure and probability theory in a mathematics or statistics department, at least not before they need some of the material in their research. In addition, many of the existing mathematical texts on the subject are hard for this audience to follow, and the emphasis is not well matched to engineering applications. A notable exception is Ash's excellent text [1], which was likely influenced by his original training as an electrical engineer. Still, even that text devotes little effort to ergodic theorems, perhaps the most fundamentally important family of results for applying probability theory to real problems. In addition, there are many other special topics that are given little space (or none at all) in most texts on advanced probability and random processes. Examples of topics developed in more depth here than in most existing texts are the following:

Random processes with standard alphabets

We develop the theory of standard spaces as a model of quite general process alphabets. Although not as general (or abstract) as examples often considered by probability theorists, standard spaces have useful structural properties that simplify the proofs of some general results and yield additional results that may not hold in the more general abstract case. Three important examples of results holding for standard alphabets that have not been proved in the general abstract case are the Kolmogorov extension theorem, the ergodic decomposition, and the existence of regular conditional probabilities.
In fact, Blackwell [6] introduced the notion of a Lusin space, a structure closely related to a standard space, in order to avoid known examples of probability spaces where the Kolmogorov extension theorem does not hold and regular conditional probabilities do not exist. Standard spaces include the common models of finite alphabets (digital processes) and real alphabets as well as more general complete separable metric spaces (Polish spaces). Thus they include many function spaces, Euclidean vector spaces, two-dimensional image intensity rasters, etc. The basic theory of standard Borel spaces may be found in the elegant text of Parthasarathy [55], and treatments of standard spaces and the related Lusin and Suslin spaces may be found in Christensen [10], Schwartz [62], Bourbaki [7], and Cohn [12]. We here provide a different and more coding oriented development of the basic results and attempt to separate clearly the properties of standard spaces, which are useful and easy to manipulate, from the demonstrations that certain spaces are standard, which are more complicated and can be skipped. Thus, unlike in the traditional treatments, we define and study standard spaces first from a purely probability theory point of view and postpone the topological metric space considerations until later.

Nonstationary and nonergodic processes

We develop the theory of asymptotically mean stationary processes and the ergodic decomposition in order to model many physical processes better than can traditional stationary and ergodic processes. Both topics are virtually absent in all books on random processes, yet they are fundamental to understanding the limiting behavior of nonergodic and nonstationary processes. Both topics are considered in Krengel's excellent book on ergodic theorems [41], but the treatment here is more detailed and in greater depth.
We consider both the common two-sided processes, which are considered to have been producing outputs forever, and the more difficult one-sided processes, which better model processes that are "turned on" at some specific time and which exhibit transient behavior.

Ergodic properties and theorems

We develop the notion of time averages along with that of probabilistic averages to emphasize their similarity and to demonstrate many of the implications of the existence of limiting sample averages. We prove the ergodic theorem for the general case of asymptotically mean stationary processes. In fact, it is shown that asymptotic mean stationarity is both sufficient and necessary for the classical pointwise or almost everywhere ergodic theorem to hold for all bounded measurements. We also prove the subadditive ergodic theorem of Kingman [39], which is useful for studying the limiting behavior of certain measurements on random processes that are not simple arithmetic averages. The proofs are based on recent simple proofs of the ergodic theorem developed by Ornstein and Weiss [52], Katznelson and Weiss [38], Jones [37], and Shields [64]. These proofs use coding arguments reminiscent of information and communication theory rather than the traditional (and somewhat tricky) maximal ergodic theorem. We consider the interrelations of stationary and ergodic properties of processes that are stationary or ergodic with respect to block shifts, that is, processes that produce stationary or ergodic vectors rather than scalars, a topic largely developed by Nedoma [49] that plays an important role in the general versions of Shannon channel and source coding theorems.

Process distance measures

We develop measures of a "distance" between random processes. Such results quantify how "close" one process is to another and are useful for considering spaces of random processes.
These in turn provide the means of proving the ergodic decomposition of certain functionals of random processes and of characterizing how close or different the long term behavior of distinct random processes can be expected to be. Of particular interest are the distribution or variational distance and the Kantorovich/Vasershtein/Ornstein matching distance.

Having described the topics treated here that are lacking in most texts, we admit to the omission of many topics usually contained in advanced texts on random processes or second books on random processes for engineers. The most obvious omission is that of continuous time random processes.

