Random Processes and Rates

Mathias Winther Madsen

Institute for Logic, Language, and Computation, University of Amsterdam

March 2015

Random Processes

Definition

A (discrete) random process is a sequence of random variables

X1, X2, X3, X4, X5,...

In each possible world ω ∈ Ω, each variable Xt is assigned a value Xt(ω).

ω    X1   X2   X3   X4   X5   X6   ···
ω1    0    1    2    1    2    3   ···
ω2    0   −1   −2   −1    0   −1   ···
ω3    0    1    2    1    0   −1   ···

A Memoryless Uniform Process

baaabaaccbacacccababbcca ...

Geometric-Length Repetitions

Randomly choose one of the letters {a, b, c}; flip a coin and print the letter until the coin comes up heads; repeat.

c bb aa c c bb aaa b a a a cccc bb a b bb b ...
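The recipe above can be sketched in a few lines of Python. This is only an illustration: the function name, the fair coin (`p_heads=0.5`), the fixed seed, and the reading that each letter is printed at least once before the first flip are assumptions, not part of the slides.

```python
import random

def geometric_repetitions(n_runs, letters="abc", p_heads=0.5, seed=0):
    """Return n_runs runs, each one letter repeated a geometric number of times."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        letter = rng.choice(letters)      # pick a letter uniformly at random
        run = letter                      # assumption: print at least once
        while rng.random() >= p_heads:    # tails: print the letter again
            run += letter
        runs.append(run)                  # heads: stop and start a new run
    return runs

print(" ".join(geometric_repetitions(12)))
```

Under these assumptions the run lengths are geometric with mean 1 / p_heads, so with a fair coin the runs average two letters.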

A Repetition Code

Choose a three-letter word over {a, b, c}; print it twice; repeat.

cca cca bac bac aca aca cba cba cbc cbc ...

Markov Chains

Choose the first letter according to some unconditional distribution. Then choose each next letter from a conditional distribution given your last choice; repeat.

bbaaaaaaaaaaaaaaabbaabbbbababa ...
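A minimal sketch of this sampling procedure, assuming a two-letter alphabet. The numbers in `INITIAL` and `TRANSITION` are hypothetical stand-ins chosen for illustration, and the helper names `draw` and `sample_chain` are not from the slides.

```python
import random

# Hypothetical parameters, chosen only for illustration.
INITIAL = {"a": 0.5, "b": 0.5}                 # unconditional distribution
TRANSITION = {                                 # conditional distributions
    "a": {"a": 0.7, "b": 0.3},
    "b": {"a": 0.5, "b": 0.5},
}

def draw(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]

def sample_chain(length, initial=INITIAL, transition=TRANSITION, seed=0):
    """Sample `length` letters (length >= 1): first from `initial`, then from `transition`."""
    rng = random.Random(seed)
    state = draw(initial, rng)
    letters = [state]
    for _ in range(length - 1):
        state = draw(transition[state], rng)   # depends only on the last letter
        letters.append(state)
    return "".join(letters)

print(sample_chain(30))
```

The only memory the sampler keeps is `state`, which is what makes the process a Markov chain.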

[Figure: initial distribution and transition diagram for a two-state Markov chain over {a, b}, with probability labels .3, .5 and .7.]

[Figure: transition diagram of a Markov chain over the five characters t, h, e, a, _, with edge probabilities.]

t_ate_t_he_te_the_the_that_t_te_
athe_at_athe_t_athe_te_ath_th_a_
a_the_the_thatea_the_he_a_t_ ...

[Figure: Pr{a} (vertical axis, 0 to 1) plotted against epoch (horizontal axis, 0 to 4) for a two-state chain over {a, b} with edge labels .7, .3, .5, .5.]

The Ergodic Theorem for Markov Chains

A Markov chain with a finite number of states converges to a unique stationary distribution if it is

connected: every state can be reached from every other state along a path with positive probability; and

aperiodic: the lengths of its positive-probability cycles have no common divisor greater than 1.
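The convergence can be checked numerically by pushing a distribution forward one epoch at a time. The sketch below assumes a hypothetical two-state chain (a stays with probability .7, b stays with probability .5) and, for contrast, a periodic chain that flips state every epoch; the function names are illustrative.

```python
def step(dist, transition):
    """Push a distribution over states one epoch forward."""
    new = {s: 0.0 for s in transition}
    for s, p in dist.items():
        for t, q in transition[s].items():
            new[t] += p * q
    return new

def iterate(dist, transition, epochs):
    """Apply `step` repeatedly, starting from `dist`."""
    for _ in range(epochs):
        dist = step(dist, transition)
    return dist

# Hypothetical connected, aperiodic chain.
TRANSITION = {"a": {"a": 0.7, "b": 0.3}, "b": {"a": 0.5, "b": 0.5}}
# Connected but periodic: every cycle has even length.
PERIODIC = {"a": {"b": 1.0}, "b": {"a": 1.0}}

from_a = iterate({"a": 1.0, "b": 0.0}, TRANSITION, 50)
from_b = iterate({"a": 0.0, "b": 1.0}, TRANSITION, 50)
print(from_a, from_b)                                  # both starts agree
print(iterate({"a": 1.0, "b": 0.0}, PERIODIC, 50))     # mass back on a
print(iterate({"a": 1.0, "b": 0.0}, PERIODIC, 51))     # mass on b
```

For the first chain, the balance condition Pr{a} × .3 = Pr{b} × .5 gives the stationary distribution Pr{a} = .625, Pr{b} = .375, and both starting points approach it. The periodic chain keeps oscillating, which is why aperiodicity is required.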

[Figure: four example transition diagrams, answered No, Yes, No, No.]

Entropy Rates

Definition

The conditional entropy H(X | Y) is the average of the entropies H(X | Y = y), weighted by the probabilities Pr{Y = y}.
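As a worked illustration of the weighted average, the sketch below computes a conditional entropy in bits from a joint distribution. The joint table `JOINT` and the function names are hypothetical examples, not taken from the slides.

```python
from collections import defaultdict
from math import log2

def entropy(dist):
    """Shannon entropy in bits of a {outcome: probability} dictionary."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def conditional_entropy(joint):
    """H(X | Y) for a joint distribution given as {(x, y): probability}."""
    by_y = defaultdict(dict)
    for (x, y), p in joint.items():
        by_y[y][x] = p
    total = 0.0
    for y, row in by_y.items():
        p_y = sum(row.values())                  # weight Pr{Y = y}
        if p_y == 0:
            continue
        conditional = {x: p / p_y for x, p in row.items()}
        total += p_y * entropy(conditional)      # Pr{Y = y} * H(X | Y = y)
    return total

# Hypothetical joint distribution: when Y = 0, X is a fair coin over {a, b};
# when Y = 1, X is always a.
JOINT = {("a", 0): 0.25, ("b", 0): 0.25, ("a", 1): 0.5}
print(conditional_entropy(JOINT))
```

Here H(X | Y = 0) is 1 bit and H(X | Y = 1) is 0 bits, each with weight 1/2, so the weighted average is half a bit.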

Definition

The entropy rate of a random process X1, X2, X3, ... is

$$\lim_{t \to \infty} H(X_t \mid X_1, X_2, \ldots, X_{t-1})$$

when this limit exists.

A Memoryless Process

bacadabbdcbbaadcacbbabbdacbdbb ...
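For a memoryless process the past carries no information about the next letter, so H(Xt | X1, ..., Xt−1) is the same for every t. The sketch below checks this by brute force, using the chain rule H(Xt | X1, ..., Xt−1) = H(X1, ..., Xt) − H(X1, ..., Xt−1). The uniform distribution over {a, b, c, d} is an assumption suggested by the sample, and the function names are illustrative.

```python
from itertools import product
from math import log2

# Assumption: letters are independent and uniform over {a, b, c, d}.
LETTERS = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

def block_entropy(letters, t):
    """H(X1, ..., Xt) in bits for t independent draws from `letters`."""
    total = 0.0
    for block in product(letters, repeat=t):
        p = 1.0
        for x in block:
            p *= letters[x]          # independence: probabilities multiply
        if p > 0:
            total -= p * log2(p)
    return total

def conditional_entropy_at(letters, t):
    """H(Xt | X1, ..., Xt-1), computed with the chain rule."""
    return block_entropy(letters, t) - block_entropy(letters, t - 1)

for t in range(1, 5):
    print(t, conditional_entropy_at(LETTERS, t))
```

Every term equals log2(4) = 2 bits, so under this assumption the limit exists and the entropy rate is 2 bits per letter.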

Random Walk

A dust particle starts at X0 = 0 and takes a step up or down each period.

0, −1, −2, −1, 0, −1, 0, 1, 0, −1,...

Geometric-Length Words

Choose a letter from {a, b, c, d}; flip a coin and keep printing the letter until the coin comes up heads; repeat.

c a c cc aa bbb a d b b c d aa b c cc ccc b aa dd ...

Ergodic Theory

Definition

Suppose a time shift operation T : Ω → Ω is given. A set A ⊆ Ω is a trapping set for T if T merely reshuffles the set, TA = A.

Definition

A trapping set A is trivial with respect to a measure m if

m(A) = 0 or m(A) = m(Ω).

A time shift operation T is ergodic with respect to a measure m if it has only trivial trapping sets.
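On a finite Ω these definitions can be checked by enumeration. The sketch below is a toy illustration under stated assumptions: Ω has four points, T is a permutation stored as a dictionary, and the measure is a dictionary of integer weights. All names and example maps are hypothetical.

```python
from itertools import combinations

def trapping_sets(omega, T):
    """All subsets A of omega with T(A) = A, for T given as a dictionary."""
    found = []
    for size in range(len(omega) + 1):
        for subset in combinations(omega, size):
            A = set(subset)
            if {T[w] for w in A} == A:       # T merely reshuffles A
                found.append(A)
    return found

def is_ergodic(omega, T, m):
    """True if every trapping set has measure 0 or the measure of omega."""
    total = sum(m[w] for w in omega)
    for A in trapping_sets(omega, T):
        mass = sum(m[w] for w in A)
        if mass != 0 and mass != total:      # a non-trivial trapping set
            return False
    return True

OMEGA = [0, 1, 2, 3]
COUNTING = {w: 1 for w in OMEGA}             # one unit of mass per point
ON_FIRST_PAIR = {0: 1, 1: 1, 2: 0, 3: 0}     # all mass on {0, 1}
ONE_CYCLE = {0: 1, 1: 2, 2: 3, 3: 0}         # a single cycle through all points
TWO_CYCLES = {0: 1, 1: 0, 2: 3, 3: 2}        # two separate two-point cycles

print(is_ergodic(OMEGA, ONE_CYCLE, COUNTING))
print(is_ergodic(OMEGA, TWO_CYCLES, COUNTING))
print(is_ergodic(OMEGA, TWO_CYCLES, ON_FIRST_PAIR))
```

The single cycle has only the empty set and Ω as trapping sets, so it is ergodic. The two-cycle map traps {0, 1}, which has half the counting measure, so it is not ergodic for that measure; it becomes ergodic once all the mass sits on {0, 1}. Ergodicity is a property of the pair (T, m), not of T alone.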

[Figure: four example diagrams, answered Yes, No, Yes, No.]

Definition

A measure m is stationary with respect to a time shift T if T has no effect on the measure of any measurable set A: m(T⁻¹A) = m(A).

Birkhoff's Ergodic Theorem

If a time shift T is ergodic with respect to some stationary measure m, then the time average of any measurable reward function X converges to its space average under m:

$$\frac{1}{n} \sum_{i=0}^{n-1} X(T^i \omega) \;\to\; \int_\Omega X \, dm = \int_\Omega X(\omega) \, m(\omega) \, d\omega$$

This convergence holds for all ω ∈ Ω except a set of measure 0.

George David Birkhoff: “Proof of the ergodic theorem,” Proceedings of the National Academy of Sciences of the USA, 1931.
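A numerical illustration of the theorem, using a standard textbook example that is not from the slides: rotating the unit interval [0, 1) by an irrational angle, which is ergodic with respect to the uniform (Lebesgue) measure. The angle `ALPHA`, the reward functions, and the sample size are assumptions, and floating-point arithmetic only approximates an irrational rotation.

```python
from math import sqrt

ALPHA = (sqrt(5) - 1) / 2      # assumed rotation angle (irrational in exact arithmetic)

def T(omega):
    """Time shift: rotate the point omega around the unit circle [0, 1)."""
    return (omega + ALPHA) % 1.0

def time_average(X, omega, n):
    """Average of X(omega), X(T omega), ..., X(T^(n-1) omega)."""
    total = 0.0
    for _ in range(n):
        total += X(omega)
        omega = T(omega)
    return total / n

def position(omega):
    """Reward X(omega) = omega; its space average under the uniform measure is 1/2."""
    return omega

def in_first_quarter(omega):
    """Indicator of the set A = [0, 1/4); its space average is m(A) = 1/4."""
    return 1.0 if omega < 0.25 else 0.0

for start in (0.0, 0.3, 0.9):
    print(start,
          time_average(position, start, 100_000),
          time_average(in_first_quarter, start, 100_000))
```

The time averages come out close to 1/2 and 1/4 for every starting point tried, matching the space averages. The second reward is an indicator function, so its time average is the long-run frequency of visits to A.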

Corollary: A Law of Large Numbers

In the long run, an ergodic process visits a set A with a stable frequency, regardless of initial conditions.

Corollary: Uniqueness

There is at most one stationary measure under which T is ergodic.

Corollary: i.i.d. Equivalence

In terms of expected values, an ergodic process behaves just like a memoryless process.

Corollary: The General Source Coding Theorem

Any ergodic process has an entropy rate.