Stochastic processes and Markov chains (part I)
Wessel van Wieringen (w.n.van.wieringen@vu.nl)
Department of Epidemiology and Biostatistics, VUmc & Department of Mathematics, VU University Amsterdam, The Netherlands

Stochastic processes
Example 1 • The intensity of the sun. • Measured every day by the KNMI.
• Stochastic variable Xt represents the sun’s intensity at day t, 0 ≤ t ≤ T. Hence, Xt assumes values in R+ (positive values only).
Example 2 • A DNA sequence 11 bases long. • At each base position there is an A, C, G or T.
• Stochastic variable Xi is the base at position i, i = 1,…,11. • In case the sequence has been observed, say:
(x1, x2, …, x11) = ACCCGATAGCT,
then A is the realization of X1, C that of X2, et cetera.
[Figure: at each of the positions t, t+1, t+2, t+3 the base can be A, C, G or T; the realized bases are A, A, T, C.]
Example 3 • A patient’s heart pulse during surgery. • Measured continuously during interval [0, T].
• Stochastic variable Xt represents the occurrence of a heartbeat at time t, 0 ≤ t ≤ T. Hence, Xt assumes only the values 0 (no heartbeat) and 1 (heartbeat).
Example 4 • Brain activity of a human under experimental conditions. • Measured continuously during interval [0, T].
• Stochastic variable Xt represents the magnetic field at time t, 0 ≤ t ≤ T.
Hence, Xt assumes values in R.
Differences between the examples:

                    Xt discrete    Xt continuous
  Discrete time     Example 2      Example 1
  Continuous time   Example 3      Example 4
The state space S is the collection of all possible values that the random variables of the stochastic process may assume.
If S = {E1, E2, …, Es}, discrete, then Xt is a discrete stochastic variable. → examples 2 and 3.
If S = [0, ∞), continuous, then Xt is a continuous stochastic variable. → examples 1 and 4.
Stochastic process:
• Discrete time: t = 0, 1, 2, 3, …
• S = {-3, -2, -1, 0, 1, 2, …}
• P(one step up) = ½ = P(one step down)
• Absorbing state at -3
[Figure: a realization of this random walk over time steps 1 to 18.]
The first passage time of a certain state Ei in S is the time t at which Xt = Ei for the first time since the start of the process.
[Figure: the random walk realization.] What is the first passage time of 2? And of -1?
An absorbing state is a state Ei for which the following holds: if Xt = Ei, then Xs = Ei for all s ≥ t. The process will never leave state Ei.
[Figure: the random walk realization.] The absorbing state is -3.
The time of absorption of an absorbing state is the first passage time of that state.
[Figure: the random walk realization.] The time of absorption is 17.
The recurrence time is the first time t at which the process has returned to its initial state.
[Figure: the random walk realization.] The recurrence time is 14.
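The quantities just defined can be computed on a simulated walk. A minimal R sketch (the seed, and hence the realized path, is arbitrary; `firstPassage` and `recurrenceTime` are helper functions introduced here, not part of the slides):

```r
set.seed(1)  # arbitrary seed
# simulate the random walk: from every state one step up or down with
# probability 0.5 each, except the absorbing state -3
nSteps <- 18
x <- numeric(nSteps + 1)  # x[t + 1] is the state at time t
x[1] <- 0                 # initial state
for (t in 1:nSteps) {
  if (x[t] == -3) {
    x[t + 1] <- -3        # absorbing: the walk stays put
  } else {
    x[t + 1] <- x[t] + sample(c(-1, 1), size = 1)
  }
}

# first passage time of a state: first t with x[t] = state (NA if never reached)
firstPassage <- function(x, state) which(x == state)[1] - 1
# recurrence time: first t >= 1 at which the walk revisits its initial state
recurrenceTime <- function(x) which(x[-1] == x[1])[1]
```

For the realization drawn in the slides, `firstPassage` applied to state -3 would return 17 (the time of absorption) and `recurrenceTime` would return 14.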
A stochastic process is described by a collection of time points, the state space, and the simultaneous distribution of the variables Xt, i.e., the distributions of all Xt and their dependencies.

There are two important types of processes:
• Poisson process: all variables are identically and independently distributed. Examples: queues for counters, call centers, servers, et cetera.
• Markov process: the variables are dependent in a simple manner.

Markov processes
A 1st order Markov process in discrete time is a stochastic process {Xt}t=1,2,… for which the following holds:
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt).
In other words, only the present determines the future; the past is irrelevant.
[Figure: the random walk realization.] From every state, one step up or down with probability 0.5 each (except for state -3).
The 1st order Markov property, i.e.
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt),
does not imply independence between Xt-1 and Xt+1. In fact, often P(Xt+1=xt+1 | Xt-1=xt-1) is unequal to zero.
For instance,
P(Xt+1=2 | Xt-1=0) = P(Xt+1=2 | Xt=1) · P(Xt=1 | Xt-1=0) = ¼.
An m-th order Markov process in discrete time is a stochastic
process {Xt}t=1,2, … for which the following holds:
P(Xt+1=xt+1 | Xt=xt, …, X1=x1) = P(Xt+1=xt+1 | Xt=xt,…,Xt-m+1= xt-m+1). Loosely, the future depends on the most recent past.
Suppose m = 9:
… CTTCCTCAAGATGCGTCCAATCCCTCAAAC …
The distribution of the (t+1)-th base depends on the 9 preceding bases.
A Markov process is called a Markov chain if the state space is discrete, i.e., finite or countable.
In this lecture series we consider Markov chains in discrete time.
Recall the DNA example.
The transition probabilities are the
P(Xt+1=xt+1 | Xt=xt), but also the
P(Xt+1=xt+1 | Xs=xs) for s < t,
where xt+1, xt in {E1, …, Es} = S.
[Figure: the random walk realization.]
P(Xt+1=3 | Xt=2) = 0.5
P(Xt+1=0 | Xt=1) = 0.5
P(Xt+1=5 | Xt=7) = 0
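For this random walk the one-step transition probabilities can be collected in a matrix. A sketch in R, where the state space is truncated to {-3, …, 7} purely for illustration (the slides leave it unbounded above, so the boundary row at state 7 is an assumption):

```r
states <- -3:7
s <- length(states)
P <- matrix(0, s, s,
            dimnames = list(as.character(states), as.character(states)))
for (i in 2:(s - 1)) {   # interior states: one step down or up, prob 0.5 each
  P[i, i - 1] <- 0.5
  P[i, i + 1] <- 0.5
}
P["-3", "-3"] <- 1       # absorbing state at -3
P["7", "6"]  <- 0.5      # truncation assumption at the upper boundary
P["7", "7"]  <- 0.5
rowSums(P)               # every row sums to one
```

Note how the three probabilities above can be read off directly, e.g. `P["2", "3"]` is 0.5 and `P["7", "5"]` is 0.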
A Markov process is called time homogeneous if the transition probabilities are independent of t:
P(Xt+1=x1 | Xt=x2) = P(Xs+1=x1 | Xs=x2).
For example:
P(X2=-1 | X1=2) = P(X6=-1 | X5=2)
P(X584=5 | X583=4) = P(X213=5 | X212=4)
Example of a time inhomogeneous process: the bathtub curve.
[Figure: hazard h(t) against t, showing infant mortality, random failures, and wearout failures.]
Consider a DNA sequence of 11 bases. Then S = {a, c, g, t}, Xi is the base at position i, and {Xi}i=1,…,11 is a Markov chain if the base at position i only depends on the base at position i-1, and not on those before i-1. If this is plausible, a Markov chain is an acceptable model for the base ordering of DNA sequences.
[State diagram with states A, C, G and T.]
In the remainder, we consider only time homogeneous Markov processes. We then denote the transition probabilities of a finite time homogeneous Markov chain in discrete time {Xt}t=1,2,… with S = {E1, …, Es} as:
P(Xt+1=Ej | Xt=Ei) = pij (does not depend on t).
Putting the pij in a matrix yields the transition matrix P = (pij)i,j=1,…,s, with pij in row i and column j. The rows of this matrix sum to one.
In the DNA example:
p pAA CC p where AC A p C p = P(X = A | X = A) CA AA t+1 t pGC pGC and: pGA pAG pTC pCT
pAA + pAC + pAG + pAT = 1 pGT pCA + pCC + pCG + pCT = 1 G T pTG et cetera pGG pTT Markov processes
The initial distribution π = (π1, …, πs)T gives the probabilities of the initial state:
πi = P(X1 = Ei) for i = 1, …, s,
and π1 + … + πs = 1.
Together the initial distribution π and the transition matrix P determine the probability distribution of the
process, the Markov chain {Xt}t=1,2,…
We now show how the couple (π, P) determines the probability distribution (transition probabilities) of time steps larger than one. Hereto define for n = 2:
pij(2) = P(Xt+2 = Ej | Xt = Ei),
and for general n:
pij(n) = P(Xt+n = Ej | Xt = Ei).
For n = 2:

pij(2) = P(Xt+2 = Ej | Xt = Ei)
(just the definition)

= Σk P(Xt+2 = Ej, Xt+1 = Ek | Xt = Ei)
(use the fact that P(A, B) + P(A, BC) = P(A))

= Σk P(Xt+2 = Ej | Xt+1 = Ek, Xt = Ei) · P(Xt+1 = Ek | Xt = Ei)
(use the definition of conditional probability: P(A, B) = P(A | B) P(B))

= Σk P(Xt+2 = Ej | Xt+1 = Ek) · P(Xt+1 = Ek | Xt = Ei)
(use the Markov property)

= Σk pik pkj = (P²)ij.
In summary, we have shown that:
pij(2) = (P²)ij, i.e., P(2) = P².
In a similar fashion we can show that:
P(n) = Pⁿ.
In words: the transition matrix for n steps is the one-step transition matrix raised to the power n.
The general case:
P(n) = Pⁿ
is proven by induction on n. One needs the Kolmogorov-Chapman equations:
pij(n+m) = Σk pik(n) pkj(m)
for all n, m ≥ 0 and i, j = 1, …, s.
Proof of the Kolmogorov-Chapman equations:
pij(n+m) = P(Xt+n+m = Ej | Xt = Ei)
= Σk P(Xt+n+m = Ej, Xt+n = Ek | Xt = Ei)
= Σk P(Xt+n+m = Ej | Xt+n = Ek, Xt = Ei) · P(Xt+n = Ek | Xt = Ei)
= Σk P(Xt+n+m = Ej | Xt+n = Ek) · P(Xt+n = Ek | Xt = Ei)   (Markov property)
= Σk pik(n) pkj(m).
Induction proof of P(n) = Pⁿ. Assume P(n-1) = Pⁿ⁻¹. Then, by the Kolmogorov-Chapman equations with m = 1:
pij(n) = Σk pik(n-1) pkj,
i.e., P(n) = P(n-1) · P = Pⁿ⁻¹ · P = Pⁿ.
A numerical example with two states, I and II. The two-step transition matrix P(2) = P · P is obtained by matrix multiplication (“rows times columns”).
Thus: 0.6490 = P(Xt+2 = I | Xt = I) is composed of two probabilities: the probability of the route I → I → I plus the probability of the route I → II → I.
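In R this two-step computation is a single matrix product. The original one-step matrix is not reproduced in these notes, so the values below are hypothetical, chosen only so that the two-step I-to-I probability equals the 0.6490 quoted above:

```r
P <- matrix(c(0.70, 0.30,    # hypothetical one-step transition matrix
              0.53, 0.47),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("I", "II"), c("I", "II")))
P2 <- P %*% P    # two-step transition matrix P(2) = P^2
# route I -> I -> I plus route I -> II -> I:
P2["I", "I"]     # 0.70 * 0.70 + 0.30 * 0.53 = 0.649
```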
In similar fashion we may obtain, e.g., the five-step transition probabilities: sum over the probabilities of all possible paths between the two states at positions t and t+5. Similarly for any number of steps.
How do we sample from a Markov process? Reconsider the DNA example.
[State diagram with states A, C, G, T and transition probabilities pAC (from A to C), et cetera.]
1 Sample the first base from the initial distribution π:
P(X1=A) = 0.45, P(X1=C) = 0.10, et cetera.
Suppose we have drawn an A. Our DNA sequence now consists of one base, namely A. After the first step, our sequence looks like: A
2 Sample the next base using P, given the previous base. Here X1=A, thus we sample from:
P(X2=A | X1=A) = 0.1, P(X2=C | X1=A) = 0.1, et cetera.
In case X1=C, we would have sampled from:
P(X2=A | X1=C) = 0.2, P(X2=C | X1=C) = 0.3, et cetera.
2 After the second step: AT
3 The last step is iterated until the last base. After the third step: ATC
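These sampling steps can be sketched in R. Only P(X1=A) = 0.45, P(X1=C) = 0.10 and the four conditional probabilities above are taken from the slides; the remaining entries of π and P below are hypothetical, filled in so that everything sums to one:

```r
set.seed(1)  # arbitrary seed
bases <- c("A", "C", "G", "T")
# initial distribution; the G and T entries are hypothetical
initDist <- c(A = 0.45, C = 0.10, G = 0.25, T = 0.20)
# transition matrix; rows G and T (and parts of rows A and C) are hypothetical
P <- matrix(c(0.10, 0.10, 0.40, 0.40,
              0.20, 0.30, 0.25, 0.25,
              0.25, 0.25, 0.25, 0.25,
              0.25, 0.25, 0.25, 0.25),
            nrow = 4, byrow = TRUE, dimnames = list(bases, bases))

sampleChain <- function(n, initDist, P) {
  x <- character(n)
  x[1] <- sample(bases, 1, prob = initDist)          # step 1
  for (i in 2:n) {                                   # step 2, iterated
    x[i] <- sample(bases, 1, prob = P[x[i - 1], ])
  }
  paste(x, collapse = "")
}
sampleChain(9, initDist, P)  # a sequence of nine bases
```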
9 After nine steps: ATCCGATGC
Andrey Markov (1856–1922)
In the famous first application of Markov chains, Andrey Markov studied the sequence of 20,000 letters in Pushkin’s poem “Eugene Onegin”, discovering that
- the stationary vowel probability is 0.432,
- the probability of a vowel following a vowel is 0.128, and
- the probability of a vowel following a consonant is 0.663.
Markov also studied the sequence of 100,000 letters in Aksakov’s novel “The Childhood of Bagrov, the Grandson”.
This was in 1913, long before the computer age!
Basharin et al. (2004)

Parameter estimation
[Background: a long stretch of DNA sequence.]

How do we estimate the parameters of a Markov chain from the data? Maximum likelihood!
Using the definition of conditional probability, the likelihood can be decomposed as follows:
L = P(X1=x1, X2=x2, …, XT=xT)
  = P(X1=x1) · P(X2=x2 | X1=x1) · … · P(XT=xT | XT-1=xT-1, …, X1=x1),
where x1, …, xT in S = {E1, …, Es}.
Using the Markov property, we obtain:
L = P(X1=x1) · P(X2=x2 | X1=x1) · … · P(XT=xT | XT-1=xT-1).
Furthermore, e.g.:
P(Xt=xt | Xt-1=xt-1) = Πi,j pij^1{xt-1 = Ei, xt = Ej},
where only one transition probability at a time enters the likelihood, due to the indicator function.
The likelihood then becomes:
L = πx1 · Πi,j pij^nij,
where, e.g., nAC is the number of times an A is directly followed by a C in the observed sequence.
Now note that, e.g.,
pAA + pAC + pAG + pAT = 1, or
pAT = 1 - pAA - pAC - pAG.
Substitute this in the likelihood, and take the logarithm to arrive at the log-likelihood.
Note: it is irrelevant which transition probability is substituted.
The log-likelihood:
log L = log πx1 + Σi,j nij log pij,
with, e.g., pAT replaced by 1 - pAA - pAC - pAG (and similarly in the other rows).
Differentiating the log-likelihood yields, e.g.:
∂ log L / ∂ pAA = nAA / pAA - nAT / (1 - pAA - pAC - pAG).
Equate the derivatives to zero. This yields four systems of equations to solve, e.g.:
nAA / pAA = nAC / pAC = nAG / pAG = nAT / pAT.
The transition probabilities are then estimated by:
p̂ij = nij / Σk nik,
the observed fraction of transitions out of Ei that end in Ej.
Verify that the 2nd order partial derivatives of the log-likelihood are negative!
If one may assume the observed data is a realization of a stationary Markov chain, the initial distribution is estimated by the stationary distribution.
If only one realization of a stationary Markov chain is available and stationarity cannot be assumed, the initial distribution is estimated by:
π̂i = 1 if x1 = Ei, and π̂i = 0 otherwise.
Thus, for the following sequence: ATCGATCGCA, tabulation yields:
The maximum likelihood estimates thus become:
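These estimates amount to dividing each transition count by its row total. In R this is a single call to prop.table (a sketch, using the sequence above):

```r
DNAseq <- c("A", "T", "C", "G", "A", "T", "C", "G", "C", "A")
transCounts <- table(DNAseq[1:9], DNAseq[2:10])   # transition counts n_ij
Phat <- prop.table(transCounts, margin = 1)       # divide each row by its sum
Phat["A", "T"]   # 2/2 = 1: every observed A is followed by a T
```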
In R:
> DNAseq <- c("A", "T", "C", "G", "A", "T", "C", "G", "C", "A")
> table(DNAseq)
DNAseq
A C G T
3 3 2 2
> table(DNAseq[1:9], DNAseq[2:10])
    A C G T
  A 0 0 0 2
  C 1 0 2 0
  G 1 1 0 0
  T 0 2 0 0

Testing the order of a Markov chain
Often the order of the Markov chain is unknown and needs to be determined from the data. This is done using a Chi-square test for independence.
Consider a DNA sequence. To assess the order, one first tests whether the sequence consists of independent letters. Hereto count the nucleotides, di-nucleotides and tri-nucleotides in the sequence:
The null hypothesis of independence between the letters of the DNA sequence is evaluated by the χ2 statistic:
X2 = Σi,j (nij - eij)2 / eij, with expected di-nucleotide counts eij = ni nj / (T - 1),
which has (4-1) × (4-1) = 9 degrees of freedom.
If the null hypothesis cannot be rejected, one would fit a Markov model of order m=0. However, if H0 can be rejected, one would test for higher order dependence.
To test for 1st order dependence, consider how often a di-nucleotide, e.g., CT, is followed by, e.g., a T.
The 1st order dependence hypothesis is evaluated by an analogous χ2 statistic, comparing the observed counts of each di-nucleotide-to-nucleotide transition with their expected counts. This statistic is χ2 distributed with (16-1) × (4-1) = 45 degrees of freedom.
A 1st order Markov chain provides a reasonable description of the sequence of the hlyE gene of the E. coli bacterium.
Independence test: Chi-sq stat: 22.45717, p-value: 0.00754
1st order dependence test: Chi-sq stat: 55.27470, p-value: 0.14025
The sequence of the prrA gene of the E. coli bacterium requires a higher order Markov chain model.
Independence test: Chi-sq stat: 33.51356, p-value: 1.266532e-04
1st order dependence test: Chi-sq stat: 114.56290, p-value: 5.506452e-08

In R (the independence case, assuming the DNAseq-object is a character object containing the sequence):
> # calculate nucleotide and dimer frequencies
> nuclFreq <- matrix(table(DNAseq), ncol=1)
> dimerFreq <- table(DNAseq[1:(length(DNAseq)-1)], DNAseq[2:length(DNAseq)])
> # calculate expected dimer frequencies > dimerExp <- nuclFreq %*% t(nuclFreq)/(length(DNAseq)-1)
> # calculate test statistic and p-value
> teststat <- sum((dimerFreq - dimerExp)^2 / dimerExp)
> pval <- exp(pchisq(teststat, 9, lower.tail=FALSE, log.p=TRUE))
Exercise: modify the code above for the 1st order test.

Application: sequence discrimination
We have fitted 1st order Markov chain models to a representative coding and noncoding sequence of the E. coli bacterium.
Pcoding
      A      C      G      T
A  0.321  0.257  0.211  0.211
C  0.319  0.207  0.266  0.208
G  0.259  0.284  0.237  0.219
T  0.223  0.243  0.309  0.225

Pnoncoding
      A      C      G      T
A  0.320  0.278  0.231  0.172
C  0.295  0.205  0.286  0.214
G  0.241  0.261  0.233  0.265
T  0.283  0.238  0.256  0.223
These models can be used to discriminate between coding and noncoding sequences.
For a new sequence X with unknown function, calculate the likelihood under the coding and the noncoding model. These likelihoods are compared by means of their log ratio:
LR(X) = log[ L(X; Pcoding) / L(X; Pnoncoding) ],
the log-likelihood ratio.
If the log-likelihood ratio LR(X) of a new sequence exceeds a certain threshold, the sequence is classified as coding, and as noncoding otherwise.
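This classification rule can be sketched in R with the two fitted matrices given earlier. For simplicity the initial-state probability is ignored here (an assumption; only the transitions enter the likelihoods), and newSeq is a hypothetical query sequence:

```r
dnaStates <- c("A", "C", "G", "T")
Pcoding <- matrix(c(0.321, 0.257, 0.211, 0.211,
                    0.319, 0.207, 0.266, 0.208,
                    0.259, 0.284, 0.237, 0.219,
                    0.223, 0.243, 0.309, 0.225),
                  nrow = 4, byrow = TRUE,
                  dimnames = list(dnaStates, dnaStates))
Pnoncoding <- matrix(c(0.320, 0.278, 0.231, 0.172,
                       0.295, 0.205, 0.286, 0.214,
                       0.241, 0.261, 0.233, 0.265,
                       0.283, 0.238, 0.256, 0.223),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(dnaStates, dnaStates))

# log-likelihood ratio of a sequence under the two transition matrices
logLikRatio <- function(x, P1, P2) {
  trans <- cbind(x[-length(x)], x[-1])  # observed transitions (from, to)
  sum(log(P1[trans])) - sum(log(P2[trans]))
}

newSeq <- c("A", "T", "C", "G", "A", "T")  # hypothetical new sequence
logLikRatio(newSeq, Pcoding, Pnoncoding)   # classify as coding if above threshold
```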
Back to E. coli. To illustrate the potential of the discrimination approach, we calculate the log-likelihood ratio for a large set of known coding and noncoding sequences. The distributions of the two sets of LR(X)’s are compared.
[Figure: histograms of the log-likelihood ratios of the noncoding and the coding sequences; x-axis: log-likelihood ratio, y-axis: frequency.]
[Figure: the same two distributions, flipped and mirrored into a violin plot, for the noncoding and coding sets.]
Conclusion E. coli example: comparison of the distributions indicates that one could discriminate reasonably well between coding and noncoding sequences on the basis of a simple 1st order Markov model.
Improvements:
- Higher order Markov model.
- Incorporate more structural information of the DNA.

References and further reading
Basharin, G.P., Langville, A.N., Naumov, V.A. (2004), “The life and work of A.A. Markov”, Linear Algebra and its Applications, 386, 3-26.
Ewens, W.J., Grant, G. (2006), Statistical Methods for Bioinformatics, Springer, New York.
Reinert, G., Schbath, S., Waterman, M.S. (2000), “Probabilistic and statistical properties of words: an overview”, Journal of Computational Biology, 7, 1-46.

This material is provided under the Creative Commons Attribution/Share-Alike/Non-Commercial License.
See http://www.creativecommons.org for details.