Spring 2006

The following materials are from http://www.mlpedia.org/index.php?title=Markov_chain

A Markov chain, named after Andrey Markov, is a stochastic process with the Markov property. In such a process, the past is irrelevant for predicting the future, given knowledge of the present.

A Markov chain is a sequence X1, X2, X3, ... of random variables. The range of these variables, i.e., the set of their possible values, is called the state space, the value of Xn being the state of the process at time n. If the conditional probability distribution of Xn+1 given the past states is a function of Xn alone, then

P( Xn+1 = x | X1, X2, ..., Xn ) = P( Xn+1 = x | Xn ),

where x is some state of the process.

A simple way to visualise a specific type of Markov chain is through a finite state machine. If you are in state y at time n, then the probability that you move to state x at time n+1 does not depend on n; it depends only on the current state y. Hence at any time n, a finite Markov chain can be characterized by a matrix of probabilities whose (x, y) element is given by P( Xn+1 = x | Xn = y ) and is independent of the time index n.
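To make this concrete, here is a minimal sketch in Python; the two-state "weather" chain, its labels, and its probabilities are all invented for illustration, not taken from the text above:

```python
import random

# Invented two-state chain: transitions[y][x] = P( Xn+1 = x | Xn = y ).
transitions = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}

def step(state):
    """Sample the next state given only the current state (the Markov property)."""
    r = random.random()
    cumulative = 0.0
    for next_state, p in transitions[state].items():
        cumulative += p
        if r < cumulative:
            return next_state
    return next_state  # guard against floating-point rounding

# Simulate a short walk: each move depends only on the current state,
# never on the earlier history or on the time index n.
state = "sunny"
for n in range(10):
    state = step(state)
    print(n + 1, state)
```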

Andrey Markov produced the first results (1906) for these processes. A generalization to countably infinite state spaces was given by Kolmogorov (1936). Markov chains are related to Brownian motion and the ergodic hypothesis, two topics in physics which were important in the early years of the twentieth century.

Properties of Markov chains

A Markov chain is characterized by the conditional distribution P( Xn+1 | Xn ), which is called the transition probability of the process. This is sometimes called the "one-step" transition probability. The marginal distribution P(Xn) is the distribution over states at time n. The initial distribution is P(X0). There may exist one or more state distributions π such that

π(x) = ∫ P( Xn+1 = x | Xn = Y ) π(Y) dY,

where Y is just a convenient name for the variable of integration. Such a distribution π is called a stationary distribution or steady-state distribution. Whether a stationary distribution exists, and whether it is unique if it does, is determined by certain properties of the process. Irreducible means that every state is accessible from every other state. A process is periodic if there exists at least one state to which the process will continually return with a fixed time period (greater than one); aperiodic means that there is no such state. Positive recurrent means that the expected return time is finite for every state. If the Markov chain is positive recurrent, there exists a stationary distribution. If it is positive recurrent and irreducible, there exists a unique stationary distribution, and furthermore the process constructed by taking the stationary distribution as the initial distribution is ergodic.

Markov chains in discrete state spaces

If the state space is finite, the transition probability distribution can be represented as a matrix, called the transition matrix, with the (i, j)th element equal to Pij = P( Xn+1 = j | Xn = i ).

For a discrete state space, if P is the one-step transition matrix, then Pk is the transition matrix for the k-step transition. The stationary distribution is a vector π which satisfies the equation

πT P = πT,

where πT is the transpose of π.
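As a hedged illustration, the stationary distribution of a finite chain can be computed as a left eigenvector of P for eigenvalue 1; the matrix below is the same invented two-state example used earlier:

```python
import numpy as np

# Invented transition matrix: row i holds P( Xn+1 = j | Xn = i ).
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# piT P = piT says pi is a left eigenvector of P with eigenvalue 1,
# i.e. an ordinary eigenvector of P transpose. Normalise it to sum to 1.
eigenvalues, eigenvectors = np.linalg.eig(P.T)
i = int(np.argmin(np.abs(eigenvalues - 1.0)))
pi = np.real(eigenvectors[:, i])
pi = pi / pi.sum()
print(pi)  # approximately [0.8333, 0.1667] for this invented chain
```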

In general, neither the existence nor the uniqueness of a stationary distribution is guaranteed for an arbitrary transition matrix P. However, if the transition matrix P is irreducible and aperiodic, then there exists a unique stationary distribution π. In addition, Pk converges elementwise to a rank-one matrix in which each row is the (transpose of the) stationary distribution πT, that is

lim k→∞ Pk = 1 πT,

where 1 is the column vector with all entries equal to 1. This follows from the Perron-Frobenius theorem.
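The convergence can be observed directly by raising P to a high power; this continues the same invented example:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])

# For an irreducible, aperiodic P, P^k approaches a rank-one matrix
# whose rows all equal the stationary distribution (as a row vector).
Pk = np.linalg.matrix_power(P, 50)
print(Pk)  # both rows are approximately [0.8333, 0.1667]
```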

This means that if we simulate or observe a random walk with transition matrix P, the long-term probability of finding the walker in a given state is independent of where the chain was started and is dictated by the stationary distribution. The random walk "forgets" the past. In short, Markov chains are the "next thing" after memoryless processes (i.e., sequences of independent, identically distributed random variables).

A transition matrix which is positive (that is, every element of the matrix is positive) is irreducible and aperiodic. A matrix is a stochastic matrix if and only if it is the matrix of transition probabilities of some Markov chain.
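A small hedged helper (the function names are invented) that checks these two conditions for a candidate transition matrix:

```python
import numpy as np

def is_stochastic(P, tol=1e-9):
    """True if P is a transition matrix: nonnegative entries, rows summing to 1."""
    P = np.asarray(P)
    return bool(np.all(P >= 0) and np.allclose(P.sum(axis=1), 1.0, atol=tol))

def is_positive(P):
    """True if every entry is strictly positive, which implies irreducible and aperiodic."""
    return bool(np.all(np.asarray(P) > 0))

print(is_stochastic([[0.9, 0.1], [0.5, 0.5]]))  # True
print(is_positive([[0.9, 0.1], [0.5, 0.5]]))    # True
```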

The special case of the transition probability being independent of the past is known as the Bernoulli scheme. A Bernoulli scheme with only two possible states is known as a Bernoulli process.

Scientific applications

Markovian systems appear extensively in physics, particularly statistical mechanics, whenever probabilities are used to represent unknown or unmodelled details of the system, provided it can be assumed that the dynamics are time-invariant and that no relevant history need be considered which is not already included in the state description.

Markov chains can also be used to model various processes in queueing theory and statistics. Claude Shannon's famous 1948 paper A Mathematical Theory of Communication, which created the field of information theory at a single step, opens by introducing the concept of entropy through Markov modeling of the English language. Such idealised models can capture many of the statistical regularities of systems. Even without describing the full structure of the system perfectly, such signal models can make possible very effective data compression through entropy coding techniques such as arithmetic coding. They also allow effective state estimation and pattern recognition. The world's mobile telephone systems depend on the Viterbi algorithm for error correction, while hidden Markov models (where the Markov transition probabilities are initially unknown and must be estimated from the data) are extensively used in speech recognition and in bioinformatics, for instance for coding region/gene prediction.
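One small, concrete piece of this entropy connection: for a stationary Markov source, the entropy rate is H = -Σi πi Σj Pij log2 Pij, a standard formula from information theory. A hedged sketch using the invented two-state chain from earlier:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi = np.array([5/6, 1/6])  # stationary distribution of this invented chain

# Entropy rate in bits per symbol: H = -sum_i pi_i sum_j P_ij log2 P_ij,
# skipping zero entries since 0 * log 0 is taken to be 0.
H = -sum(pi[i] * sum(P[i, j] * np.log2(P[i, j])
                     for j in range(P.shape[1]) if P[i, j] > 0)
         for i in range(P.shape[0]))
print(H)  # about 0.558 bits per symbol
```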

The PageRank of a webpage as used by Google is defined by a Markov chain: it is the probability of being at page i in the stationary distribution of the following Markov chain on all (known) webpages. If N is the number of known webpages, and a page i has ki outgoing links, then the transition probability from page i is (1-q)/ki + q/N for each page that i links to, and q/N for each page that i does not link to. The parameter q is taken to be about 0.15.
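A minimal power-iteration sketch of this construction; the four-page link graph is invented for illustration, with q = 0.15 as in the text:

```python
import numpy as np

q = 0.15                                         # teleportation parameter
links = {0: [1, 2], 1: [2], 2: [0], 3: [0, 2]}   # invented link graph
N = len(links)

# From page i: probability (1-q)/ki + q/N to each page i links to,
# and q/N to every other page, exactly as described above.
P = np.full((N, N), q / N)
for i, outgoing in links.items():
    for j in outgoing:
        P[i, j] += (1 - q) / len(outgoing)

# Power iteration: the row vector converges to the stationary
# distribution, whose entries are the PageRank values.
rank = np.full(N, 1.0 / N)
for _ in range(100):
    rank = rank @ P
print(rank)
```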

Markov chain methods have also become very important for generating sequences of random numbers that accurately reflect very complicated desired probability distributions, a process called Markov chain Monte Carlo (MCMC). In recent years this has revolutionised the practicability of Bayesian inference methods.
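As a hedged illustration of the MCMC idea, here is a minimal random-walk Metropolis sampler; the target density, step size, and iteration count are arbitrary choices for this sketch:

```python
import math
import random

def target(x):
    # Unnormalised target density, chosen arbitrarily: exp(-x^2 / 2),
    # the shape of a standard normal distribution.
    return math.exp(-0.5 * x * x)

# Random-walk Metropolis: propose a nearby point and accept it with
# probability min(1, target(proposal) / target(current)). The resulting
# Markov chain has the target as its stationary distribution.
x = 0.0
samples = []
for _ in range(10000):
    proposal = x + random.uniform(-1.0, 1.0)
    if random.random() < target(proposal) / target(x):
        x = proposal
    samples.append(x)

print(sum(samples) / len(samples))  # near 0 for this symmetric target
```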

Markov chains also have many applications in biological modelling, particularly population processes, which model systems that are (at least) analogous to biological populations.

A recent application of Markov chains is in geostatistics: Markov chains are used in two- and three-dimensional stochastic simulations of discrete variables conditional on observed data. Such an application is called "Markov chain geostatistics", analogous to kriging geostatistics. The Markov chain geostatistics method is still under development.

Markov chains can be used to model many games of chance. The children's games Chutes and Ladders and Candy Land, for example, are represented exactly by Markov chains. At each turn, the player starts in a given state (on a given square) and from there has fixed odds of moving to certain other states (squares).
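A hedged miniature of this idea (the board, the two-outcome spinner, and the single ladder below are invented, not the real Chutes and Ladders board):

```python
import numpy as np

# Invented board: squares 0..5, where square 5 is the absorbing goal.
# A fair 1-or-2 spinner moves the player; one ladder jumps 2 -> 4.
N = 6
ladder = {2: 4}
P = np.zeros((N, N))
for square in range(N - 1):
    for spin in (1, 2):                   # each outcome has probability 1/2
        landing = min(square + spin, N - 1)
        landing = ladder.get(landing, landing)
        P[square, landing] += 0.5
P[N - 1, N - 1] = 1.0                     # stay on the goal once reached

# Distribution over squares after 3 turns, starting from square 0.
start = np.zeros(N)
start[0] = 1.0
print(start @ np.linalg.matrix_power(P, 3))
```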
