Q-bio 2017 CSU FC
Marek Kimmel
Follow-up to Natalia Komarova lectures (with a twist)
• Two general talks
  – Stem cells
  – Crossing fitness valley
• Two hours of class
  – Very brief intro of cancer as an evolutionary process, including mutations, selection, multistage carcinogenesis, and periods of growth and plateau.
  – Stochastic modeling of drug resistance in cancer
    . Brief intro to the biology of CML
    . Continuous-time birth-death process; derivation of the Kolmogorov forward equation
    . PDE for generating functions, solved by the method of characteristics
    . Applications, such as combination therapies, the number of drugs required, and modeling cross-resistance
Outline
Purpose: Understanding of cancer mutations based on data and theory of genetics and statistics
BS-1 Monday Basic stochastic models of molecular evolution and population genetics
BS-2 Wednesday Applications to modeling and analysis of leukemic and lung cancer data
General Talk Thursday Modified Griffiths-Tavaré coalescent applied to The Cancer Genome Atlas data (estimation of growth and mutation characteristics?)
Vignette: Non-Darwinian evolution in tumors
What should we know about molecular evolution and population genetics?
• Markov chain models of molecular evolution (infinite population)
• Moran model (finite but constant population)
Later on (BS-2 and General Lecture)
• Branching processes (finite growing or decaying population)
• Wright-Fisher model and coalescent
Elementary Introduction to Markov Chains
Stochastic (random) processes
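$$X(t, \omega): T \times \Omega \to \mathbb{R} \text{ or } \mathbb{Z}$$
Markov property in discrete time
Sequence of random variables
$$X_0, X_1, \ldots, X_n, \ldots$$
$$P[X_n \in A \mid X_0, X_1, \ldots, X_{n-1}] = P[X_n \in A \mid X_{n-1}]$$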
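If the process is nonnegative integer-valued, then
$$P[X_n = j \mid X_0 = i_0, X_1 = i_1, \ldots, X_{n-1} = i] = P[X_n = j \mid X_{n-1} = i] = P_{ij},$$
where $i, j \in S$, the state space of the process (chain). The $P_{ij}$ are the transition probabilities.
Transition probability matrix (TPM)
Attention: the state space $S$ is finite or denumerable
$$P = [P_{ij}]_{i,j \in S} = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1j} & \cdots \\ P_{21} & P_{22} & \cdots & P_{2j} & \cdots \\ \vdots & \vdots & & \vdots & \\ P_{i1} & P_{i2} & \cdots & P_{ij} & \cdots \\ \vdots & \vdots & & \vdots & \ddots \end{pmatrix}$$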
TPM is a stochastic matrix
$$\forall i, j \in S: \; P_{ij} \ge 0$$
$$\forall i \in S: \; \sum_{j \in S} P_{ij} = 1$$
Marginal probabilities
The marginal probability of the process at time $n$ follows from the law of total probability
$$P[X_n = j] = \sum_{i \in S} P[X_n = j \mid X_{n-1} = i]\, P[X_{n-1} = i]$$
or, in abbreviated notation,
$$p_j(n) = \sum_{i \in S} p_i(n-1) P_{ij}, \quad j \in S, \; n = 1, 2, \ldots$$
where $p_j(n) = P[X_n = j]$. In matrix-vector notation
$$p(n) = p(n-1)P, \quad \text{where } p(n) = [p_1(n), p_2(n), \ldots]$$
Example 1: Irreversible mutations in discrete generations
Let us assume that state 1 can mutate into state 2, but not conversely
$$P = \begin{pmatrix} 1-\mu & \mu \\ 0 & 1 \end{pmatrix}, \qquad p(n) = p(0)P^n$$
$$P^n = \begin{pmatrix} (1-\mu)^n & 1-(1-\mu)^n \\ 0 & 1 \end{pmatrix}$$
After a long time, nothing is left in state 1 (state 2 is absorbing)
Example 2: Reversible mutations in discrete time
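Let us assume that state 1 mutates into state 2 and conversely.
$$P = \begin{pmatrix} 1-\mu & \mu \\ \nu & 1-\nu \end{pmatrix}$$
We expect the existence of a stationary distribution, which is invariant with respect to multiplication by $P$ (and $P^n$)
$$p = pP: \quad (p_1, p_2) = (p_1, p_2) \begin{pmatrix} 1-\mu & \mu \\ \nu & 1-\nu \end{pmatrix}$$
This two-equation system is underdetermined (since the TPM is stochastic), so a probability norming condition is needed
$$p_1 = p_1(1-\mu) + p_2\nu, \quad p_1 + p_2 = 1$$
$$p_1 = \frac{\nu}{\mu+\nu}, \quad p_2 = \frac{\mu}{\mu+\nu}$$
Markov property in continuous time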
Assume the following sequence of time points
$$0 \le t_0 < t_1 < t_2 < \cdots < t_{n-2} < t_{n-1} (= s) < t_n (= t)$$
The Markov property now reads
$$P[X(t) = j \mid X(t_0) = i_0, X(t_1) = i_1, \ldots, X(s) = i] = P[X(t) = j \mid X(s) = i] = P_{ij}(s,t) = P_{ij}(t-s)$$
where the last equality follows from time-homogeneity (time-shift invariance)
Transition probabilities in continuous time
As in the time-discrete case, marginal distributions evolve by multiplication by the TPM
$$p(t) = p(s)P(t-s)$$
A more general (in fact, fundamental) property is the Chapman-Kolmogorov equation (aka the semigroup property)
$$P(t) = P(s)P(t-s)$$
or, alternatively,
$$P_{ij}(t) = \sum_{k \in S} P_{ik}(s) P_{kj}(t-s), \quad i, j \in S$$
Transition intensities
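How to build a time-continuous Markov chain? Let us specify infinitesimal transition probabilities (at most 1 jump per $\Delta t$)
$$P_{ij}(\Delta t) = P[X(t+\Delta t) = j \mid X(t) = i] = \begin{cases} Q_{ij}\Delta t + o(\Delta t), & j \ne i \\ 1 - \sum_{k \in S,\, k \ne i} Q_{ik}\Delta t + o(\Delta t), & j = i \end{cases}$$
$$\lim_{\Delta t \to 0} \frac{o(\Delta t)}{\Delta t} = 0$$
$o(\Delta t)$ is called a "small o" of $\Delta t$ (generic Landau symbol)
$Q_{ij}$ are transition intensities. Define the diagonal element
$$Q_{ii} = -\sum_{j \in S,\, j \ne i} Q_{ij} \;\Rightarrow\; \sum_{j \in S} Q_{ij} = 0$$
We now have
$$P_{ij}(\Delta t) = \delta_{ij} + Q_{ij}\Delta t + o(\Delta t), \quad i, j \in S$$
or, in condensed notation,
$$P(\Delta t) = I + Q\Delta t + o(\Delta t)$$
Matrix of transition intensities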
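$$Q = [Q_{ij}]_{i,j \in S} = \begin{pmatrix} Q_{11} & Q_{12} & \cdots & Q_{1j} & \cdots \\ Q_{21} & Q_{22} & \cdots & Q_{2j} & \cdots \\ \vdots & \vdots & & \vdots & \\ Q_{i1} & Q_{i2} & \cdots & Q_{ij} & \cdots \\ \vdots & \vdots & & \vdots & \ddots \end{pmatrix}$$
$$\forall i, j \in S,\; i \ne j: \; Q_{ij} \ge 0$$
$$\forall i \in S: \; Q_{ii} \le 0$$
$$\forall i \in S: \; \sum_{j \in S} Q_{ij} = 0$$
Differential equations for TPM
The infinitesimal transition equation
$$P(\Delta t) = I + Q\Delta t + o(\Delta t)$$
can be multiplied from the left by the TPM, $P(t)\,(\ldots)$:
$$P(t)P(\Delta t) = P(t) + P(t)Q\Delta t + o(\Delta t)$$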
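Applying the Chapman-Kolmogorov equation
$$P(t+\Delta t) = P(t) + P(t)Q\Delta t + o(\Delta t)$$
$$\frac{P(t+\Delta t) - P(t)}{\Delta t} = P(t)Q + \frac{o(\Delta t)}{\Delta t}$$
and letting $\Delta t \to 0$,
$$dP(t)/dt = P(t)Q, \quad P(0) = I$$
This is called the forward equation (why?)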
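The forward equation can be explored numerically. The following is a minimal sketch, not part of the lecture: it iterates P ← P(I + QΔt) starting from P(0) = I, which is the infinitesimal transition equation with the o(Δt) term dropped, and then checks that the result is still a stochastic matrix and obeys the semigroup property. The intensity matrix `Q` and the helper names `mat_mul` and `forward_euler` are hypothetical, chosen only for illustration.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def forward_euler(q, t, steps):
    """Approximate P(t) by iterating P <- P (I + Q dt), starting from P(0) = I."""
    n = len(q)
    dt = t / steps
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # One-step matrix I + Q dt: the infinitesimal TPM without the o(dt) term.
    step = [[identity[i][j] + q[i][j] * dt for j in range(n)] for i in range(n)]
    p = identity
    for _ in range(steps):
        p = mat_mul(p, step)
    return p


# Hypothetical two-state intensity matrix (assumption): each row sums to zero.
Q = [[-0.3, 0.3],
     [0.1, -0.1]]

P1 = forward_euler(Q, t=1.0, steps=10_000)   # approximates P(1)
P2 = forward_euler(Q, t=2.0, steps=20_000)   # approximates P(2), same dt
P1_squared = mat_mul(P1, P1)                 # semigroup property: P(2) = P(1) P(1)

row_sums = [sum(row) for row in P1]
semigroup_gap = max(abs(P2[i][j] - P1_squared[i][j])
                    for i in range(2) for j in range(2))
print("row sums of P(1):", row_sums)
print("max |P(2) - P(1)P(1)|:", semigroup_gap)
```

With a fixed step Δt the iterate is exactly (I + QΔt)^k, so the semigroup check holds up to rounding error. The gap between the iterate and the true P(t) shrinks as Δt → 0.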
Similarly, multiplying from the right by the TPM, $(\ldots)\,P(t)$, we obtain the backward equation
$$dP(t)/dt = QP(t), \quad P(0) = I$$
Both the forward equations
$$dP(t)/dt = P(t)Q, \quad P(0) = I$$
and the backward equations (remember, these are matrix equations!)
$$dP(t)/dt = QP(t), \quad P(0) = I$$
can sometimes be solved. If this is the case, then the solution has the form
$$P(t) = P(0)e^{Qt} = e^{Qt} = \exp(Qt)$$
which is formally the same as in the scalar case, except that the matrix exponential
$$\exp(Qt) = I + Qt + (Qt)^2/2! + (Qt)^3/3! + \cdots = \sum_{i \ge 0} (Qt)^i / i!$$
is itself a square matrix. Another useful expression is the following
$$\lim_{t \downarrow 0} P'(t) = P'(0) = Q$$
Stationarity in time-continuous processes
Conclusion from Chapman-Kolmogorov:
$$p(t+s) = p(s)P(t)$$
For the stationary distribution, $p = p(s) = p(s+t)$, so
$$p = pP(t) \quad \text{for all } t \ge 0$$
Since we have
$$\exp(Qt) = I + Qt + \frac{Q^2t^2}{2!} + \frac{Q^3t^3}{3!} + \cdots$$
an equivalent condition is
$$pQ = 0$$
Example 3: Mutations in continuous time
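Now, $\mu$ and $\nu$ are intensities, not probabilities
$$Q = \begin{pmatrix} -\mu & \mu \\ \nu & -\nu \end{pmatrix}$$
$$P(t) = \exp(Qt) = \frac{1}{\mu+\nu} \begin{pmatrix} \nu + \mu e^{-(\mu+\nu)t} & \mu - \mu e^{-(\mu+\nu)t} \\ \nu - \nu e^{-(\mu+\nu)t} & \mu + \nu e^{-(\mu+\nu)t} \end{pmatrix}$$
In the limit, the transition probability matrix has rows equal to the stationary distribution (ergodicity)
$$P(\infty) = \begin{pmatrix} p \\ p \end{pmatrix}, \quad p = \left( \frac{\nu}{\mu+\nu}, \frac{\mu}{\mu+\nu} \right)$$
Models for DNA substitution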
Nothing in Biology Makes Sense Except in the Light of Evolution
Theodosius Dobzhansky (1900-1975)
Substitutions
Transitions (more likely): Purine → Purine, Pyrimidine → Pyrimidine
A→G, G→A, C→T, T→C
Transversions (less likely): Purine → Pyrimidine, Pyrimidine → Purine
A→T, T→A, A→C, C→A, G→T, T→G, G→C, C→G
Hypotheses
Substitution of nucleotides in the evolution of DNA sequences can be modeled by a Markov chain
– time-discrete, or
– time-continuous
Usually
– stationary, and
– reversible (why?)
Transition matrix
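$$P = \begin{array}{c|cccc} & a & g & c & t \\ \hline a & p_{aa} & p_{ag} & p_{ac} & p_{at} \\ g & p_{ga} & p_{gg} & p_{gc} & p_{gt} \\ c & p_{ca} & p_{cg} & p_{cc} & p_{ct} \\ t & p_{ta} & p_{tg} & p_{tc} & p_{tt} \end{array}$$
Jukes-Cantor model (1969)
Neutral evolution theory: most mutations have no selective value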
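All substitutions are equally probable
$$P = \begin{pmatrix} 1-3\alpha & \alpha & \alpha & \alpha \\ \alpha & 1-3\alpha & \alpha & \alpha \\ \alpha & \alpha & 1-3\alpha & \alpha \\ \alpha & \alpha & \alpha & 1-3\alpha \end{pmatrix}$$
Stationary distribution
$$\pi = \pi P$$
$$\pi = (\pi_a, \pi_g, \pi_c, \pi_t) = (0.25, 0.25, 0.25, 0.25)$$
Spectral decomposition of $P^n$
$$P^n = \begin{pmatrix} 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \\ 0.25 & 0.25 & 0.25 & 0.25 \end{pmatrix} + (1-4\alpha)^n \begin{pmatrix} 0.75 & -0.25 & -0.25 & -0.25 \\ -0.25 & 0.75 & -0.25 & -0.25 \\ -0.25 & -0.25 & 0.75 & -0.25 \\ -0.25 & -0.25 & -0.25 & 0.75 \end{pmatrix}$$
Parameter estimation
Jukes-Cantor model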
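The following 3 graphs are equivalent due to reversibility:
[Figure: three equivalent graphs relating the ancestor A and the descendants D1 and D2. In the first, A is at the root with a branch of length t to each descendant; in the other two, D1 and D2 are joined through A by a path of total length 2t.]
Probability that the nucleotides are different in two descendants
$$p = p(t) = 0.75\,(1 - \exp(-8\alpha t))$$
Estimating α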
We have two DNA sequences of length $N$
D1: ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG
D2: ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG
$$\hat{p} = \frac{\text{Number of differences}}{N}$$
$$\hat{\alpha} t = -\frac{1}{8} \log\left(1 - \frac{4}{3}\hat{p}\right)$$
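As a worked example, the estimator can be applied to the two sequences above. This is a minimal sketch under the Jukes-Cantor assumptions: it counts mismatches, forms p̂, and inverts p(t) = 0.75(1 − exp(−8αt)) to get α̂t = −(1/8) log(1 − (4/3) p̂). The function name `estimate_alpha_t` is hypothetical.

```python
import math

# The two aligned sequences from the slide.
D1 = "ACAATACAGGGCAGATAGATACAGATAGACACAGACAGAGCAGAGACAG"
D2 = "ACAATACAGGACAGTTAGATACAGATAGACACAGACAGAGCAGAGACAG"


def estimate_alpha_t(seq1, seq2):
    """Return (number of differences, p_hat, estimate of alpha * t)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned and of equal length")
    differences = sum(a != b for a, b in zip(seq1, seq2))
    p_hat = differences / len(seq1)
    if p_hat >= 0.75:
        # The logarithm is undefined once p_hat reaches the saturation level 0.75.
        raise ValueError("p_hat must be below 0.75")
    alpha_t = -math.log(1.0 - 4.0 * p_hat / 3.0) / 8.0
    return differences, p_hat, alpha_t


differences, p_hat, alpha_t = estimate_alpha_t(D1, D2)
print(f"N = {len(D1)}, differences = {differences}")
print(f"p_hat = {p_hat:.4f}, estimate of alpha*t = {alpha_t:.5f}")
```

For these sequences the sketch counts 2 differences among 49 sites.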
Estimated product of mutation rate and time only!
But how does this work in finite populations?
• Is there really a problem?
Moran Process with discrete time and directional selection
Assumption: Mutants are already there!
[Figure: the population at time $t$ and at time $t + 1$. One individual dies (randomly chosen); another reproduces.]
Transition probabilities if $i$ is the number of Orange Mutants and $r = 1 + s$ is their relative fitness
$$p_{i,i-1} = \frac{i}{N} \cdot \frac{N-i}{ri + N - i}, \quad p_{i,i+1} = \frac{N-i}{N} \cdot \frac{ri}{ri + N - i}, \quad p_{i,i} = 1 - p_{i,i-1} - p_{i,i+1}$$
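The one-step transition probabilities can be written as a small function. This is a sketch under stated assumptions: the individual that dies is chosen uniformly, the one that reproduces is chosen with probability proportional to fitness, and mutants have relative fitness r = 1 + s against 1 for the wild type. The function name and the values of `N` and `S` are hypothetical.

```python
def moran_step_probabilities(i, n, s):
    """Return (down, stay, up) for a population of size n holding i mutants."""
    r = 1.0 + s                      # assumed relative fitness of a mutant
    total_fitness = r * i + (n - i)  # mutants weigh r, wild types weigh 1
    # A wild type dies (prob. (n - i)/n) and a mutant is chosen to reproduce.
    up = (n - i) / n * (r * i) / total_fitness
    # A mutant dies (prob. i/n) and a wild type is chosen to reproduce.
    down = i / n * (n - i) / total_fitness
    stay = 1.0 - up - down
    return down, stay, up


N, S = 20, 0.1  # hypothetical population size and selection coefficient
for i in (0, 1, 10, 19, 20):
    down, stay, up = moran_step_probabilities(i, N, S)
    print(i, round(down, 4), round(stay, 4), round(up, 4))
```

States 0 and N are absorbing (both jump probabilities vanish there), and for every interior state the ratio of the up to the down probability equals r.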
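Final effect: Extinction or fixation of an evolutionarily favorable mutant
Expressions for probability of fixation and expected time to fixation
$$P[T_N < T_0 \mid Z(0) = i] = \frac{1 - (1+s)^{-i}}{1 - (1+s)^{-N}}$$
where $T_N$ and $T_0$ are the times at which the mutant count $Z$ first reaches $N$ (fixation) and 0 (extinction).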
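The fixation probability (1 − (1+s)^(−i)) / (1 − (1+s)^(−N)) can be checked against the chain itself. A minimal sketch, assuming the standard Moran step probabilities with mutant fitness r = 1 + s: the fixation probability h(i) must satisfy the first-step equation h(i) = p_down h(i−1) + p_stay h(i) + p_up h(i+1) with h(0) = 0 and h(N) = 1, and the code measures how far the closed form is from satisfying it. All function names and parameter values are hypothetical.

```python
def step_probabilities(i, n, s):
    """(down, stay, up) for the Moran chain with mutant fitness r = 1 + s (assumed)."""
    r = 1.0 + s
    total_fitness = r * i + (n - i)
    up = (n - i) / n * (r * i) / total_fitness
    down = i / n * (n - i) / total_fitness
    return down, 1.0 - up - down, up


def fixation_probability(i, n, s):
    """Closed form: (1 - (1+s)^-i) / (1 - (1+s)^-n)."""
    r = 1.0 + s
    return (1.0 - r ** (-i)) / (1.0 - r ** (-n))


def max_first_step_residual(n, s):
    """Largest violation of h(i) = down*h(i-1) + stay*h(i) + up*h(i+1), 0 < i < n."""
    worst = 0.0
    for i in range(1, n):
        down, stay, up = step_probabilities(i, n, s)
        rhs = (down * fixation_probability(i - 1, n, s)
               + stay * fixation_probability(i, n, s)
               + up * fixation_probability(i + 1, n, s))
        worst = max(worst, abs(fixation_probability(i, n, s) - rhs))
    return worst


N, S = 50, 0.1  # hypothetical population size and selection coefficient
residual = max_first_step_residual(N, S)
print("fixation probability from one mutant:", fixation_probability(1, N, S))
print("max first-step residual:", residual)
```

A residual at the level of rounding error, together with the boundary values, indicates that the closed form is the fixation probability of this chain.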
$$E[T_N \mid T_N < T_0;\, Z(0) = i] \approx \frac{2\ln N - \ln i}{s}$$
(the latter is approximate: large $N$, small $s$)
Anatomy of a Moran Process conditional on mutant fixation (Durrett-style)
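[Figure: mutant count (vertical axis, from $i$ up to $N$) against time (horizontal axis, from 0 to $T_N$), in three phases: (1) supercritical branching process (bp) while the count is below $N/\ln N$; (2) deterministic growth between $N/\ln N$ and $N - N/\ln N$; (3) subcritical bp above $N - N/\ln N$.]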
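The three phases can be looked at in simulation. The sketch below is an illustration only: it simulates the discrete-time Moran chain with mutant fitness r = 1 + s (an assumption), conditions on fixation by rejection sampling, and counts the time steps spent below N/ln N, between N/ln N and N − N/ln N, and above N − N/ln N. The thresholds are taken from the figure; the function names, parameter values, and seed are hypothetical.

```python
import math
import random


def simulate_until_absorbed(n, s, start, rng):
    """Run one Moran path from `start` mutants; return the list of mutant counts."""
    r = 1.0 + s  # assumed relative fitness of a mutant
    i = start
    path = [i]
    while 0 < i < n:
        total_fitness = r * i + (n - i)
        up = (n - i) / n * (r * i) / total_fitness
        down = i / n * (n - i) / total_fitness
        u = rng.random()
        if u < up:
            i += 1
        elif u < up + down:
            i -= 1
        path.append(i)  # recorded every time step, including steps with no change
    return path


def fixation_path(n, s, start, rng, max_tries=10_000):
    """Rejection sampling: rerun until the mutant fixes (conditioning on T_N < T_0)."""
    for _ in range(max_tries):
        path = simulate_until_absorbed(n, s, start, rng)
        if path[-1] == n:
            return path
    raise RuntimeError("no fixation observed; increase max_tries")


def phase_durations(path, n):
    """Count time steps spent in each of the three phases of the figure."""
    low = n / math.log(n)
    high = n - low
    counts = {"supercritical": 0, "deterministic": 0, "subcritical": 0}
    for i in path[:-1]:  # each entry before the last is one time step spent at i
        if i < low:
            counts["supercritical"] += 1
        elif i <= high:
            counts["deterministic"] += 1
        else:
            counts["subcritical"] += 1
    return counts


N, S = 100, 0.2  # hypothetical population size and selection coefficient
path = fixation_path(N, S, start=1, rng=random.Random(2017))
durations = phase_durations(path, N)
print("steps to fixation:", len(path) - 1)
print("steps per phase:", durations)
```

Time here is counted in single birth-death steps. Comparing it with an approximation stated in generations would need a rescaling, which this sketch does not attempt.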