Random Processes and Entropy Rates
Mathias Winther Madsen
Institute for Logic, Language, and Computation, University of Amsterdam
March 2015

Random Processes
Definition: A (discrete) random process is a sequence of random variables
$X_1, X_2, X_3, X_4, X_5, \ldots$
In each possible world $\omega \in \Omega$, each variable $X_t$ is assigned a value $X_t(\omega)$.
ω       X_1   X_2   X_3   X_4   X_5   X_6   ···
ω_1      0     1     2     1     2     3    ···
ω_2      0    −1    −2    −1     0    −1    ···
ω_3      0     1     2     1     0    −1    ···

A Memoryless Uniform Process
baaabaaccbacacccababbcca ...
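A minimal Python sketch of this process, assuming the alphabet {a, b, c} with equal letter probabilities; the function name `memoryless_uniform` is hypothetical. Each letter is drawn independently of everything printed before it, which is what makes the process memoryless.

```python
import random


def memoryless_uniform(n, alphabet="abc", rng=None):
    """Return n letters, each drawn independently and uniformly from alphabet."""
    rng = rng or random.Random()
    return "".join(rng.choice(alphabet) for _ in range(n))


# A seeded generator makes the sample reproducible.
print(memoryless_uniform(24, rng=random.Random(0)))
```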
Geometric-Length Repetitions
Randomly choose one of the letters {a, b, c}; flip a coin and print the letter until the coin comes up heads; repeat.
c bb aa c c bb aaa b a a a cccc bb a b bb b ...
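A sketch under one reading of the rule, consistent with the sample (every run has at least one letter): print the chosen letter once, then flip a fair coin after each printed copy and stop at heads. Under that assumption the run lengths are geometric, Pr{length = k} = (1/2)^k, with mean 2. The name `geometric_repetitions` is hypothetical.

```python
import random


def geometric_repetitions(n_runs, alphabet="abc", rng=None):
    """Return n_runs runs, each a uniformly chosen letter repeated a geometric number of times."""
    rng = rng or random.Random()
    runs = []
    for _ in range(n_runs):
        letter = rng.choice(alphabet)
        run = letter  # the letter is always printed at least once
        while rng.random() >= 0.5:  # tails (probability 1/2): print the letter again
            run += letter
        runs.append(run)
    return runs


print(" ".join(geometric_repetitions(12, rng=random.Random(0))))
```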
A Repetition Code
Choose a three-letter word over {a, b, c}; print it twice; repeat.
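A sketch of the repetition code, assuming uniform letter choices (the slide does not specify the distribution); `repetition_code` is a hypothetical name. The second copy of each word is fully determined by the first, so only half of the printed letters carry new information.

```python
import random


def repetition_code(n_words, alphabet="abc", rng=None):
    """Return 2 * n_words words: each random three-letter word printed twice."""
    rng = rng or random.Random()
    words = []
    for _ in range(n_words):
        word = "".join(rng.choice(alphabet) for _ in range(3))
        words.extend([word, word])  # print it twice
    return words


print(" ".join(repetition_code(5, rng=random.Random(0))))
```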
cca cca bac bac aca aca cba cba cbc cbc ...

Markov Chains
A Markov Chain
Choose the first letter according to some unconditional distribution. Then choose each next letter from a conditional distribution given the previous letter; repeat.
bbaaaaaaaaaaaaaaabbaabbbbababa ...
[Figure: two-state Markov chain over {a, b}. Transition probabilities: from a, to a .7 and to b .3; from b, to a .5 and to b .5.]
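A sketch of sampling from the two-state chain. The transition probabilities below (.7/.3 out of a, .5/.5 out of b) are taken from the diagram above and should be treated as an assumption, as should the names `P` and `sample_markov`. Each step looks only at the current state, which is the defining property of a Markov chain.

```python
import random

# Assumed transition probabilities: P[current][next].
P = {"a": {"a": 0.7, "b": 0.3},
     "b": {"a": 0.5, "b": 0.5}}


def sample_markov(n, P, initial, rng=None):
    """Sample n states: the first from `initial`, each later one from P given the previous state."""
    rng = rng or random.Random()
    states = list(initial)
    state = rng.choices(states, weights=[initial[s] for s in states])[0]
    out = [state]
    for _ in range(n - 1):
        nxt = list(P[state])
        state = rng.choices(nxt, weights=[P[state][s] for s in nxt])[0]
        out.append(state)
    return "".join(out)


print(sample_markov(30, P, {"a": 0.5, "b": 0.5}, rng=random.Random(0)))
```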
[Figure: transition diagram of a Markov chain over the states t, h, e, a, and _.]
t_ate_t_he_te_the_the_that_t_te_athe_at_athe_t_athe_te_ath_th_a_a_the_the_thatea_the_he_a_t_ ...
[Figure: Pr{a} plotted against epoch (0 to 4) for the two-state chain over {a, b}.]
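The curve of Pr{a} against epoch can be computed exactly rather than sampled. A sketch, again assuming the transition probabilities .7/.3 and .5/.5 and a hypothetical function `prob_a_by_epoch`: each epoch applies Pr{X_{t+1} = a} = .7 · Pr{X_t = a} + .5 · Pr{X_t = b}. Under those numbers both starting points approach π_a = .5/(.3 + .5) = .625.

```python
# Assumed transition probabilities: P[current][next].
P = {"a": {"a": 0.7, "b": 0.3},
     "b": {"a": 0.5, "b": 0.5}}


def prob_a_by_epoch(p_a, epochs):
    """Return [Pr{X_0 = a}, ..., Pr{X_epochs = a}] for the two-state chain."""
    history = [p_a]
    for _ in range(epochs):
        # Total probability: reach a from a, or reach a from b.
        p_a = p_a * P["a"]["a"] + (1 - p_a) * P["b"]["a"]
        history.append(p_a)
    return history


for start in (1.0, 0.0):
    print([round(p, 4) for p in prob_a_by_epoch(start, 4)])
```

Starting in a this prints [1.0, 0.7, 0.64, 0.628, 0.6256]; starting in b it prints [0.0, 0.5, 0.6, 0.62, 0.624]. The gap to .625 shrinks by a factor of .7 − .5 = .2 per epoch.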
The Ergodic Theorem for Markov Chains
A Markov chain with a finite number of states converges to a unique stationary distribution, whatever its initial distribution, if it is
connected: every state can be reached from every other state by a path with positive probability; and
aperiodic: the lengths of its positive-probability cycles have no common divisor greater than 1.
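For a finite chain, the two conditions together are equivalent to the transition matrix being primitive: some power P^k has all entries positive, and by Wielandt's bound it suffices to check k = (n − 1)^2 + 1 for n states. A sketch of that test, using only which transitions have positive probability; `satisfies_ergodic_theorem` is a hypothetical name, and states are numbered 0 to n − 1.

```python
def satisfies_ergodic_theorem(P):
    """True if the finite chain with transition matrix P (a list of rows) is
    connected and aperiodic, i.e. P**k > 0 entrywise for k = (n - 1)**2 + 1."""
    n = len(P)
    # Boolean adjacency: is there a positive-probability step from i to j?
    A = [[P[i][j] > 0 for j in range(n)] for i in range(n)]

    def bool_matmul(X, Y):
        return [[any(X[i][k] and Y[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]

    power = A  # A**1
    for _ in range((n - 1) ** 2):
        power = bool_matmul(power, A)  # ends at A**((n - 1)**2 + 1)
    return all(all(row) for row in power)


print(satisfies_ergodic_theorem([[0.7, 0.3], [0.5, 0.5]]))  # True
print(satisfies_ergodic_theorem([[0, 1], [1, 0]]))          # False: period 2
print(satisfies_ergodic_theorem([[1, 0], [0, 1]]))          # False: not connected
```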
[Figure: four example chains, marked No, Yes, No, No.]

Entropy Rates
Definition: The conditional entropy H(X | Y) is the average of the entropies H(X | Y = y), weighted by the probabilities Pr{Y = y}:
$$H(X \mid Y) = \sum_y \Pr\{Y = y\}\, H(X \mid Y = y).$$
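A small sketch of the definition, computing H(X | Y) in bits from a joint distribution stored as nested dictionaries `joint[y][x] = Pr{X = x, Y = y}` (a representation chosen here for illustration). Each conditional distribution Pr{X = x | Y = y} is obtained by dividing a row by Pr{Y = y}.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def conditional_entropy(joint):
    """H(X | Y) = sum over y of Pr{Y = y} * H(X | Y = y), with joint[y][x] = Pr{X = x, Y = y}."""
    total = 0.0
    for row in joint.values():
        p_y = sum(row.values())
        if p_y > 0:
            total += p_y * entropy(p / p_y for p in row.values())
    return total


# Y = 0 fixes X = a; Y = 1 leaves X uniform on {a, b}.
joint = {0: {"a": 0.5, "b": 0.0}, 1: {"a": 0.25, "b": 0.25}}
print(conditional_entropy(joint))  # 0.5 * 0 + 0.5 * 1 = 0.5 bits
```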
Definition: The entropy rate of a stochastic process $X_1, X_2, X_3, \ldots$ is
$$\lim_{t \to \infty} H(X_t \mid X_1, X_2, \ldots, X_{t-1})$$
when this limit exists.

A Memoryless Process
bacadabbdcbbaadcacbbabbdacbdbb ...
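For a memoryless process the past is independent of X_t, so every term H(X_t | X_1, ..., X_{t−1}) equals H(X_1), and the entropy rate is just the entropy of a single letter. A sketch in bits; the plug-in estimate from the 30-letter sample above is only a rough stand-in for the true letter distribution, which the slide does not give. Function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(probs):
    """Shannon entropy in bits of a sequence of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def plug_in_entropy(sample):
    """Entropy of the empirical letter frequencies in sample (a rough estimate)."""
    counts = Counter(sample)
    return entropy(c / len(sample) for c in counts.values())


# If the four letters were equally likely, the entropy rate would be 2 bits per letter.
print(entropy([0.25] * 4))  # 2.0
print(plug_in_entropy("bacadabbdcbbaadcacbbabbdacbdbb"))
```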
Random Walk
A dust particle starts at $X_0 = 0$ and takes a step of +1 or −1 each period.
0, −1, −2, −1, 0, −1, 0, 1, 0, −1,...
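The chain rule gives H(X_t | X_1, ..., X_{t−1}) = H(X_1, ..., X_t) − H(X_1, ..., X_{t−1}), and since the path and the step sequence determine each other, the answer is the entropy of a single step. A brute-force sketch that enumerates all step sequences, assuming independent steps that go up with probability `p_up` (the slide does not say the coin is fair; `p_up = 0.5` gives exactly 1 bit per step). The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import log2


def path_entropy(t, p_up=0.5):
    """Exact H(X_1, ..., X_t) in bits for a walk from X_0 = 0 with independent +1/-1 steps."""
    dist = Counter()
    for steps in product((1, -1), repeat=t):
        prob, pos, path = 1.0, 0, []
        for s in steps:
            prob *= p_up if s == 1 else 1 - p_up
            pos += s
            path.append(pos)
        dist[tuple(path)] += prob
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def step_conditional_entropy(t, p_up=0.5):
    """H(X_t | X_1, ..., X_{t-1}) via the chain rule."""
    return path_entropy(t, p_up) - path_entropy(t - 1, p_up)


print([step_conditional_entropy(t) for t in range(1, 6)])
```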
Geometric-Length Words
Choose a letter from {a, b, c, d}; flip a coin and keep printing the letter until the coin comes up heads; repeat.
c a c cc aa bbb a d b b c d aa b c cc ccc b aa dd ...

Ergodic Theory
Definition: Suppose a time shift operation $T : \Omega \to \Omega$ is given. A set $A \subseteq \Omega$ is a trapping set for T if T merely reshuffles the set: $TA = A$.
Definition: A trapping set A is trivial with respect to a measure m if $m(A) = 0$ or $m(A) = m(\Omega)$.
A time shift operation T is ergodic with respect to a measure m if it has only trivial trapping sets.
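On a small finite Ω these definitions can be checked by brute force. A sketch, assuming T is given as a dictionary mapping each point to its image and m as a dictionary of point masses (both representations, and the function names, are illustrative): it enumerates every subset A, keeps those with TA = A, and tests whether each has measure 0 or m(Ω).

```python
from itertools import combinations
from math import isclose


def trapping_sets(omega, T):
    """Yield every subset A of omega with T(A) == A (exponential; small omega only)."""
    for r in range(len(omega) + 1):
        for subset in combinations(omega, r):
            A = frozenset(subset)
            if frozenset(T[w] for w in A) == A:
                yield A


def is_ergodic(omega, T, m):
    """True if every trapping set has measure 0 or m(omega)."""
    total = sum(m[w] for w in omega)
    for A in trapping_sets(omega, T):
        mass = sum(m[w] for w in A)
        if not (isclose(mass, 0.0, abs_tol=1e-12) or isclose(mass, total, abs_tol=1e-12)):
            return False
    return True


omega = [0, 1, 2, 3]
rotate = {0: 1, 1: 2, 2: 3, 3: 0}  # one cycle through all four points
swap = {0: 1, 1: 0, 2: 3, 3: 2}    # two separate 2-cycles
uniform = {w: 0.25 for w in omega}
print(is_ergodic(omega, rotate, uniform))  # True
print(is_ergodic(omega, swap, uniform))    # False: {0, 1} has measure 1/2
```

Ergodicity depends on the measure: under m = {0: .5, 1: .5, 2: 0, 3: 0}, the same `swap` map is ergodic, because {0, 1} then has full measure and {2, 3} has measure 0.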
[Figure: four examples, marked Yes, No, Yes, No.]
Definition: A measure m is stationary with respect to a time shift T if T has no effect on the measure of any set: $m(T^{-1}A) = m(A)$ for every measurable set A.
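On a finite Ω it is enough to check single points, because both sides of m(T^{-1}A) = m(A) add up over the points of A. A sketch, assuming T is a dictionary from points to their images and m a dictionary of point masses (an illustrative representation); `is_stationary` is a hypothetical name.

```python
from math import isclose


def is_stationary(omega, T, m):
    """Check m(T^{-1}{y}) == m({y}) for every point y of a finite omega."""
    for y in omega:
        preimage_mass = sum(m[w] for w in omega if T[w] == y)
        if not isclose(preimage_mass, m[y], abs_tol=1e-12):
            return False
    return True


omega = [0, 1, 2, 3]
rotate = {0: 1, 1: 2, 2: 3, 3: 0}
print(is_stationary(omega, rotate, {w: 0.25 for w in omega}))          # True
print(is_stationary(omega, rotate, {0: 0.4, 1: 0.2, 2: 0.2, 3: 0.2}))  # False
```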
Birkhoff’s Ergodic Theorem
If a time shift T is ergodic with respect to some stationary measure m, then the time average of any integrable reward function X converges to its space average under m:
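$$\frac{1}{n} \sum_{i=0}^{n-1} X(T^i \omega) \;\longrightarrow\; \int_\Omega X \, dm = \int_\Omega X(\omega)\, m(\omega)\, d\omega \qquad (n \to \infty)$$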
This convergence holds for all ω ∈ Ω except a set of measure 0.
George David Birkhoff: “Proof of the ergodic theorem,” Proceedings of the National Academy of Sciences of the USA, 1931.
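A numerical illustration of the theorem, not a proof. It uses a standard example that is not on the slides: the rotation T(ω) = ω + α mod 1 on [0, 1) with irrational α, which is ergodic with respect to the stationary uniform measure. Taking X to be the indicator of [0, 0.25), the space average is 0.25, and time averages from different starting points should come close to it. The choice α = √2 − 1 and the helper names are assumptions.

```python
from math import sqrt

ALPHA = sqrt(2) - 1  # irrational rotation angle (up to floating-point precision)


def T(w):
    """Time shift: rotate the point w in [0, 1) by ALPHA."""
    return (w + ALPHA) % 1.0


def X(w):
    """Reward function: indicator of [0, 0.25); its space average is 0.25."""
    return 1.0 if w < 0.25 else 0.0


def time_average(X, T, w, n):
    """(1/n) * sum of X(T^i w) for i = 0, ..., n - 1."""
    total = 0.0
    for _ in range(n):
        total += X(w)
        w = T(w)
    return total / n


for start in (0.0, 0.7):
    print(start, time_average(X, T, start, 100_000))
```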
Corollary: A Law of Large Numbers
In the long run, an ergodic process visits a set A with a stable frequency, namely m(A) when m is a probability measure, for all initial conditions outside a set of measure 0.
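A simulation sketch of this corollary for the two-state Markov chain, assuming the transition probabilities .7/.3 out of a and .5/.5 out of b from the earlier diagram; under those numbers the stationary probability of a is .625. The long-run fraction of epochs spent in a should come out close to .625 whether the chain starts in a or in b. `visit_frequency` is a hypothetical name.

```python
import random

# Assumed transition probabilities: P[current][next].
P = {"a": {"a": 0.7, "b": 0.3},
     "b": {"a": 0.5, "b": 0.5}}


def visit_frequency(target, start, n, rng):
    """Fraction of the first n epochs that the chain started at `start` spends in `target`."""
    state, hits = start, 0
    for _ in range(n):
        hits += state == target
        state = "a" if rng.random() < P[state]["a"] else "b"
    return hits / n


rng = random.Random(0)
for start in ("a", "b"):
    print(start, visit_frequency("a", start, 100_000, rng))
```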
Corollary: Uniqueness
Two different stationary measures under which T is ergodic are mutually singular: some set has full measure under one of them and measure 0 under the other.
Corollary: i.i.d. Equivalence
In terms of long-run averages, an ergodic process behaves just like a memoryless (i.i.d.) process: its time averages converge to expected values.
Corollary: The General Source Coding Theorem
Any stationary ergodic process has an entropy rate.
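For the two-state chain (again assuming the transition probabilities .7/.3 and .5/.5), the terms in the definition of the entropy rate can be computed exactly. By the Markov property, H(X_t | X_1, ..., X_{t−1}) = H(X_t | X_{t−1}), which is the average of the entropies of the two rows of the transition table, weighted by Pr{X_{t−1} = a} and Pr{X_{t−1} = b}. A sketch showing that these terms settle down even when the chain starts in a rather than at its stationary distribution; the names are hypothetical.

```python
from math import log2

# Assumed transition probabilities: P[current][next].
P = {"a": {"a": 0.7, "b": 0.3},
     "b": {"a": 0.5, "b": 0.5}}


def entropy(probs):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probs if p > 0)


def conditional_entropy_terms(p_a, count):
    """H(X_t | X_1, ..., X_{t-1}) for t = 2, ..., count + 1, given Pr{X_1 = a} = p_a."""
    row_entropy = {s: entropy(P[s].values()) for s in P}
    terms = []
    for _ in range(count):
        # Markov property: only the distribution of the previous state matters.
        terms.append(p_a * row_entropy["a"] + (1 - p_a) * row_entropy["b"])
        p_a = p_a * P["a"]["a"] + (1 - p_a) * P["b"]["a"]
    return terms


print([round(h, 4) for h in conditional_entropy_terms(1.0, 6)])
```

Under these assumptions the terms approach .625 · H(.7, .3) + .375 · 1 ≈ 0.926 bits per letter.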