Markov Chains, Renewal, Branching and Coalescent Processes: Four Topics in Probability Theory

Andreas Nordvall Lagerås

Doctoral Dissertation 2007
Mathematical Statistics
Department of Mathematics
Stockholm University
SE-106 91 Stockholm

Typeset by LaTeX
© Andreas Nordvall Lagerås
ISBN 91-7155-375-4
pp. 1–14
Printed by US AB

Abstract

This thesis consists of four papers. In paper 1, we prove central limit theorems for Markov chains under (local) contraction conditions. As a corollary we obtain a central limit theorem for Markov chains associated with iterated function systems with contractive maps and place-dependent Dini-continuous probabilities. In paper 2, properties of inverse subordinators are investigated, in particular similarities with renewal processes. The main tool is a theorem on processes that are both renewal and Cox processes. In paper 3, distributional properties of supercritical and especially immortal branching processes are derived. The marginal distributions of immortal branching processes are found to be compound geometric. In paper 4, a description of a dynamic population model is presented, such that samples from the population have genealogies as given by a Λ-coalescent with mutations. Depending on whether the sample is grouped according to litters or families, the sampling distribution is either regenerative or non-regenerative.

Acknowledgements

I would like to thank

- my supervisor Thomas Höglund, who was my first lecturer in probability theory and immediately made me take a liking to the subject.

- Anders Martin-Löf, who has always been at hand to bounce ideas off.

- Örjan Stenflo, who was very good as a first co-author, thoroughly motivating and sufficiently demanding.

- my other colleagues in mathematical statistics, because we have fika (coffee breaks), eat, teach and do research so well together. Especially fika.

- my family and friends, for everything that happens outside my little workshop.

Stockholm, 25 January 2007
Andreas Nordvall Lagerås

Contents

Introduction and summary of the four papers

1 Paper I
1.1 Markov chains as iterated function systems
1.2 Limit theorems
1.3 Main result

2 Paper II
2.1 Renewal processes and beyond
2.2 Cox processes
2.3 Main result

3 Paper III
3.1 Compound distributions
3.2 Branching processes in continuous time
3.3 Binary branching: the Yule process
3.4 Main result

4 Paper IV
4.1 Population models
4.2 Main result

List of papers

I Lagerås, A. N. and Stenflo, Ö. (2005) Central limit theorems for contractive Markov chains.∗ Nonlinearity, 18(5), 1955–1965.

II Lagerås, A. N. (2005) A renewal-process-type expression for the moments of inverse subordinators.† Journal of Applied Probability, 42(4), 1134–1144.

III Lagerås, A. N. and Martin-Löf, A. (2006) Genealogy for supercritical branching processes.‡ Journal of Applied Probability, 43(4), 1066–1076.

IV Lagerås, A. N. (2006) A population model for Λ-coalescents with neutral mutations. Submitted.

∗ © 2005 IOP Publishing Ltd.
† © 2005 The Applied Probability Trust
‡ © 2006 The Applied Probability Trust

Introduction and summary of the four papers

This thesis consists of four papers that concern different areas in probability theory. The following pages have short summaries of each article for the non-specialist probabilist.

1 Paper I: Central limit theorems for contractive Markov chains

The first article of this thesis is joint work with Örjan Stenflo, and was first published in Nonlinearity (2005), vol 18 no 5. It was also part of my licentiate thesis.

1.1 Markov chains as iterated function systems

Consider the following way of generating a Markov chain $(Z_n)_{n\in\mathbb{N}}$ on some state space $S$. Let $\{w_i\}_{i\in I}$ be a collection of functions defined on $S$. Given that $Z_n = z_n$, draw a random variable $X_n$ on $I$, whose distribution may depend on $z_n$, and let $Z_{n+1} = w_{X_n}(Z_n)$. In fact, any Markov chain can be described in this way∗, with $I = [0,1]$ and $X_n$ uniformly distributed on $I$ and independent of $Z_n$, but sometimes it is more natural to take $X_n$ dependent on $Z_n$.

Example. Let $(Z_n)_{n\in\mathbb{N}}$ have state space $\{0, 1, 2, 3\}$ and transition matrix
$$(p_{ij}) = \begin{pmatrix} 0 & 1/2 & 0 & 1/2 \\ 1/2 & 0 & 1/2 & 0 \\ 0 & 1/4 & 0 & 3/4 \\ 3/4 & 0 & 1/4 & 0 \end{pmatrix},$$
which we can also describe with this picture:

[Figure: transition diagram of the chain on the states 0, 1, 2, 3, with jump probabilities 1/4, 1/2 and 3/4.]

Here full (dashed) arrows indicate a step up (down) modulo 4, and the double (simple, half) arrowheads belong to jumps occurring with probability 3/4 (1/2, 1/4).

∗At least if S is Borel, see Proposition 7.6 in Kallenberg, O. (1997) Foundations of Modern Probability. Springer.

One way of generating this Markov chain is to let

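$$w_{\downarrow}(z) \equiv z - 1 \bmod 4, \qquad w_{\uparrow}(z) \equiv z + 1 \bmod 4, \qquad w_{\updownarrow}(z) \equiv \begin{cases} z - 1 \bmod 4 & \text{for } z = 0, 1,\\ z + 1 \bmod 4 & \text{for } z = 2, 3,\end{cases}$$
and let $X_1, X_2, \dots$ be an i.i.d. sequence of random elements in $\{\downarrow, \uparrow, \updownarrow\}$, such that $P(X_n = \downarrow) = 1/4$, $P(X_n = \uparrow) = 1/2$ and $P(X_n = \updownarrow) = 1/4$ for all $n$, and then set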

$$Z_{n+1} = w_{X_n}(Z_n) = w_{X_n} \circ w_{X_{n-1}}(Z_{n-1}) = \cdots = w_{X_n} \circ \cdots \circ w_{X_1}(Z_1).$$

If we allow $X_1, X_2, \dots$ to depend on the values of $Z_1, Z_2, \dots$, we can describe the dynamics with only $w_{\downarrow}$ and $w_{\uparrow}$, namely by letting
$$P(X_n = \downarrow \mid Z_n \in \{0, 1\}) = \tfrac{1}{2}, \qquad P(X_n = \uparrow \mid Z_n \in \{0, 1\}) = \tfrac{1}{2},$$
$$P(X_n = \downarrow \mid Z_n \in \{2, 3\}) = \tfrac{1}{4}, \qquad P(X_n = \uparrow \mid Z_n \in \{2, 3\}) = \tfrac{3}{4},$$
which is arguably more natural. In the article, we investigate properties of Markov chains with a compact state space, typically a closed and bounded subset of $\mathbb{R}^d$, that are generated by a collection of contractive maps $\{w_i\}_{i\in I}$, with $I$ countable, where the probability of $X_n = i$ given $Z_n = z$ is $p_i(z)$ for some functions $\{p_i\}_{i\in I}$. Such collections $(\{w_i\}, \{p_i\})$ are called iterated function systems with place-dependent probabilities.
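To make the example concrete, here is a minimal Python sketch; all function and variable names are our own, hypothetical choices, and `w_updown` stands for the map written $w_{\updownarrow}$ above. It builds the transition matrix from both iterated-function-system descriptions, using exact fractions, and checks that they give the same $(p_{ij})$. It also simulates a path via $Z_{n+1} = w_{X_n}(Z_n)$ with the place-dependent probabilities.

```python
import random
from fractions import Fraction as Fr

# The three maps of the example (hypothetical names).
def w_down(z):
    return (z - 1) % 4

def w_up(z):
    return (z + 1) % 4

def w_updown(z):
    return (z - 1) % 4 if z in (0, 1) else (z + 1) % 4

MAPS = {"down": w_down, "up": w_up, "updown": w_updown}

def iid_probs(z):
    """First description: X_n i.i.d. and independent of Z_n."""
    return {"down": Fr(1, 4), "up": Fr(1, 2), "updown": Fr(1, 4)}

def place_dependent_probs(z):
    """Second description: only w_down and w_up, with probabilities depending on Z_n."""
    if z in (0, 1):
        return {"down": Fr(1, 2), "up": Fr(1, 2)}
    return {"down": Fr(1, 4), "up": Fr(3, 4)}

def transition_matrix(probs):
    """p_ij = total probability of the maps that send state i to state j."""
    P = [[Fr(0)] * 4 for _ in range(4)]
    for z in range(4):
        for name, p in probs(z).items():
            P[z][MAPS[name](z)] += p
    return P

def simulate(z1, n_steps, probs, rng):
    """Iterate Z_{n+1} = w_{X_n}(Z_n), drawing X_n from probs(Z_n)."""
    path = [z1]
    for _ in range(n_steps):
        z = path[-1]
        names = list(probs(z))
        weights = [float(probs(z)[name]) for name in names]
        x = rng.choices(names, weights=weights)[0]
        path.append(MAPS[x](z))
    return path

P_EXPECTED = [
    [0, Fr(1, 2), 0, Fr(1, 2)],
    [Fr(1, 2), 0, Fr(1, 2), 0],
    [0, Fr(1, 4), 0, Fr(3, 4)],
    [Fr(3, 4), 0, Fr(1, 4), 0],
]

print(transition_matrix(iid_probs) == P_EXPECTED)              # True
print(transition_matrix(place_dependent_probs) == P_EXPECTED)  # True
```

Both descriptions yield the same chain; the place-dependent one needs only two maps, at the price of letting the law of $X_n$ depend on $Z_n$.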

1.2 Limit theorems

When $Z_n \xrightarrow{d} Z$, with $Z$ having a stationary distribution for $(Z_n)_{n\in\mathbb{N}}$, one typically has a law of large numbers for the Markov chain:

$$\frac{1}{n}\sum_{i=1}^{n} f(Z_i) \xrightarrow{\text{a.s.}} E[f(Z)],$$
or even a central limit theorem:

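$$\frac{1}{\sqrt{n}}\sum_{k=1}^{n}\bigl(f(Z_k) - E[f(Z)]\bigr) \xrightarrow{d} N(0, \sigma^2), \qquad (1)$$
where $f$ is a function from the state space $S$ to $\mathbb{R}$. One may also center the summands in (1) by $E[f(Z_k)]$ instead of $E[f(Z)]$. A so-called functional central limit theorem is a stronger version of a central limit theorem, in which
$$\frac{1}{\sqrt{n}}\sum_{k=1}^{[nt]}\bigl(f(Z_k) - E[f(Z)]\bigr) \xrightarrow{d} \sigma B_t,$$
with convergence of the whole process indexed by $t \in [0,1]$, where $(B_t)_{0\le t\le 1}$ is a standard Brownian motion.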
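For a finite, irreducible chain such as the example in section 1.1, the stationary distribution and the variance $\sigma^2$ in (1) can be computed exactly. The following NumPy sketch is our own illustration, not the method of the article (which treats general contractive chains). It uses the standard Poisson-equation formula $\sigma^2 = 2\pi(\bar f g) - \pi(\bar f^2)$, where $\bar f = f - \pi(f)$ and $g$ solves $g - Pg = \bar f$, with the hypothetical choice $f(z) = z$. As a cross-check it also computes $\operatorname{Var}\bigl(\sum_{k=1}^{n} f(Z_k)\bigr)/n$ exactly for a stationary start.

```python
import numpy as np

# Transition matrix of the example chain on {0, 1, 2, 3}.
P = np.array([
    [0.0, 0.5, 0.0, 0.5],
    [0.5, 0.0, 0.5, 0.0],
    [0.0, 0.25, 0.0, 0.75],
    [0.75, 0.0, 0.25, 0.0],
])

def stationary_distribution(P):
    """Solve pi P = pi with sum(pi) = 1 (one balance equation is redundant)."""
    n = P.shape[0]
    A = np.vstack([(P - np.eye(n)).T[:-1], np.ones(n)])
    b = np.zeros(n)
    b[-1] = 1.0
    return np.linalg.solve(A, b)

def asymptotic_variance(P, f):
    """sigma^2 in the CLT via the Poisson equation g - P g = f - pi(f)."""
    n = P.shape[0]
    pi = stationary_distribution(P)
    fbar = f - pi @ f
    # I - P + 1 pi is invertible for an irreducible finite chain, periodic or not.
    g = np.linalg.solve(np.eye(n) - P + np.outer(np.ones(n), pi), fbar)
    return 2 * pi @ (fbar * g) - pi @ fbar**2

def normalized_variance(P, f, n_steps):
    """Exact Var(f(Z_1) + ... + f(Z_n)) / n when Z_1 has the stationary law."""
    pi = stationary_distribution(P)
    fbar = f - pi @ f
    total = n_steps * (pi @ fbar**2)
    v = fbar.copy()
    for k in range(1, n_steps):
        v = P @ v                                        # v = P^k fbar
        total += 2 * (n_steps - k) * (pi @ (fbar * v))   # covariance at lag k
    return total / n_steps

f = np.arange(4.0)                     # hypothetical choice f(z) = z
pi = stationary_distribution(P)
sigma2 = asymptotic_variance(P, f)
print(pi * 34)                         # approximately [11, 7, 6, 10]
print(sigma2, normalized_variance(P, f, 4000))
```

The example chain has period 2; the fundamental-matrix formula still applies, whereas the plain series of lag covariances does not converge absolutely here.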

1.3 Main result

Our main result concerns Markov chains that have contractive maps $\{w_i\}_{i\in I}$ when they are viewed as iterated function systems; hence they are called contractive Markov chains. Under conditions on the smoothness of $f$ and $\{p_i\}_{i\in I}$ we obtain a functional central limit theorem. The conditions are such that a highly regular $f$ allows for more "wild" $\{p_i\}_{i\in I}$, and vice versa. We also state the results with conditions on the rate of convergence towards the stationary distribution of the Markov chain. Often one considers chains for which this rate is, in a sense, "exponential", but our results also work with even slower convergence.

2 Paper II: A renewal-process-type expression for the moments of inverse subordinators

This article was first published in Journal of Applied Probability (2005), vol 42 no 4, and was a part of my licentiate thesis.

2.1 Renewal processes and beyond

One of the first stochastic processes that one is introduced to in a beginner's course in stochastic processes is the renewal process. It is simply a collection of points in time, events of some sort, such that the times between consecutive events are independent and identically distributed. One quantity of interest is $N_t$ = "the number of events in $[0, t]$". The simplest case of a renewal process is the Poisson process, which has an exponential distribution for the times between events. The name comes from the fact that $N_t$ is Poisson distributed for all $t$. Calculations for the Poisson process are greatly simplified by the fact that it is a Markov process in continuous time, and that the numbers of events in disjoint time intervals are independent. For other renewal processes, one can hardly give any exact results about the distribution of $N_t$. An exception is that an explicit expression for $E[N_t]$ can be used to calculate the joint moments, of arbitrary integer order, of the increments of $(N_t)$ over disjoint intervals. It is easiest to state the result with factorial moments instead of ordinary moments: $E[N_t^{[k]}] = E[N_t(N_t - 1)\cdots(N_t - k + 1)]$. Note that these are the moments you get if you differentiate the probability generating function of a random variable and plug in 1:

$$f(s) = E[s^Z] \;\Rightarrow\; f^{(k)}(s) = E[Z^{[k]} s^{Z-k}] \;\Rightarrow\; f^{(k)}(1) = E[Z^{[k]}].$$
The state space of $(N_t)_{t\in\mathbb{R}_+}$ is clearly $\mathbb{N}_0$, the non-negative integers. I was searching in the literature for a simple increasing process that, like a renewal process, had some dependence between increments over disjoint time intervals, but, in contrast to renewal processes, had the whole of $\mathbb{R}_+$ as its state space. This is where the "inverse subordinators" of the title of this article enter. These processes are the analogues of renewal processes when the state space is $\mathbb{R}_+$. To see why, note that $(N_t)$ is the inverse of the random walk $S_n = \sum_{k=0}^{n} X_k$ with independent steps $X_0, X_1, \dots$, corresponding to the times between events of the renewal process: $N_t = \min(n \in \mathbb{N}_0 : S_n > t)$.


Figure 1: To the left we have a realization of the renewal process $(N_t)$, and to the right its inverse, the random walk $(S_n)$.

The renewal process has state space $\mathbb{N}_0$, since $(S_n)$ only changes value at times in $\mathbb{N}_0$, see figure 1. The equivalents of random walks in continuous time are called Lévy processes, and increasing Lévy processes are called subordinators. In this sense, we can say that the equivalents of renewal processes with a continuous state space are the inverse subordinators, constructed as
$$\tau_t = \inf(s \in \mathbb{R}_+ : Y_s > t),$$
where $(Y_s)$ is a subordinator, see the top part of figure 3.
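The two inverse constructions can be sketched in a few lines of Python; the helper names are hypothetical. `renewal_count` computes $N_t = \min(n \in \mathbb{N}_0 : S_n > t)$ from the times between events, and `inverse_subordinator` computes $\tau_t = \inf(s : Y_s > t)$ for a single path $Y_s = d\,s + \sum_{s_j \le s} h_j$ with drift $d > 0$ and jumps of sizes $h_j$ at times $s_j$. This is the kind of path a compound Poisson process with drift has, which we assume here for concreteness.

```python
import bisect
from itertools import accumulate

def renewal_count(t, interarrival_times):
    """N_t = min(n >= 0 : S_n > t), with S_n = X_0 + ... + X_n.

    Only valid if enough interarrival times are supplied for some S_n to exceed t."""
    partial_sums = list(accumulate(interarrival_times))   # S_0, S_1, S_2, ...
    return bisect.bisect_right(partial_sums, t)           # index of the first S_n > t

def inverse_subordinator(t, drift, jump_times, jump_sizes):
    """tau_t = inf(s >= 0 : Y_s > t) for Y_s = drift*s + (sum of jumps at times <= s)."""
    s_prev, level = 0.0, 0.0          # level = value of Y at time s_prev
    for s_j, h in zip(jump_times, jump_sizes):
        # Sloping stretch on [s_prev, s_j): Y grows linearly with slope `drift`.
        if level + drift * (s_j - s_prev) > t:
            return s_prev + (t - level) / drift
        level += drift * (s_j - s_prev)
        # Jump at s_j: if it carries Y past t, tau_t sits at s_j (a flat part of tau).
        level += h
        if level > t:
            return s_j
        s_prev = s_j
    return s_prev + (t - level) / drift   # after the last jump only the drift remains

print([renewal_count(t, [1.0, 2.0, 0.5]) for t in (0.5, 1.0, 3.2)])   # [0, 1, 2]
print([inverse_subordinator(t, 1.0, [1.0, 3.0], [2.0, 1.0]) for t in (0.5, 2.0, 4.0, 7.0)])
# [0.5, 1.0, 2.0, 4.0]
```

In the second example every level $t \in [1, 3)$ is jumped over at time 1, so $\tau_t = 1$ on that whole interval: a jump of $Y$ becomes a flat part of $\tau$.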

2.2 Cox processes

The Poisson process mentioned above is also referred to as a homogeneous Poisson process, since the process is homogeneous in time. One can extend this process to an inhomogeneous process and still keep the Poisson property, namely that $N_t - N_s$, the number of points in the interval $(s, t]$, is Poisson distributed, but the renewal property will then be lost. In the general case, the mean of $N_t - N_s$ is $\lambda((s, t])$, where $\lambda$ is a given measure on $\mathbb{R}$, called the intensity measure. See figure 2 for an illustration. In the homogeneous case $\lambda((s, t]) = c(t - s)$ for some positive constant $c$. If we let $\Lambda$ be a random measure and, given a realization $\Lambda = \lambda$, let $\Pi$ be the points of the corresponding inhomogeneous Poisson process, we get a point process $\Pi$ that is called a Cox process. Note that a Poisson random variable $Z$ with mean $l$ has probability generating function $f(s) = e^{l(s-1)}$ and thus factorial moments $E[Z^{[k]}] = l^k$. Generalizing this result, we obtain the well-known relation that the factorial moments of a Cox process equal the ordinary moments of its random measure. This is also true for joint moments. It is also known that if we produce our random measure with the aid of an inverse subordinator by letting $\Lambda((s, t]) = \tau_t - \tau_s$, then the Cox process is also a renewal process.

Figure 2: The ×'s denote the points of an inhomogeneous Poisson process, whose intensity measure of $(0, t]$, $\lambda((0, t])$, is given by the dashed line. The measure has a density, $\frac{d}{dt}\lambda((0, t])$, which is illustrated by the full lines. Note that no points can occur where this density is zero.
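The identity $E[Z^{[k]}] = l^k$ and its Cox-process generalization can be checked numerically. In this Python sketch (hypothetical helper names), the number of points of a Cox process in a fixed interval is mixed Poisson: given that $\Lambda$ of the interval equals $\lambda$, it is Poisson with mean $\lambda$. Purely for illustration, we let $\Lambda$ of the interval be 1 or 3 with probability 1/2 each, so the second factorial moment of the count should equal $E[\Lambda^2] = (1 + 9)/2 = 5$. Infinite sums are truncated at `nmax`.

```python
import math

def poisson_pmf(l, nmax):
    """P(Z = n) for n = 0..nmax, Z ~ Poisson(l), computed recursively."""
    pmf = [math.exp(-l)]
    for n in range(1, nmax + 1):
        pmf.append(pmf[-1] * l / n)
    return pmf

def factorial_moment(pmf, k):
    """E[Z^[k]] = E[Z(Z-1)...(Z-k+1)]; math.perm(n, k) = n!/(n-k)!, and 0 when k > n."""
    return sum(math.perm(n, k) * p for n, p in enumerate(pmf))

def mixed_poisson_pmf(values, weights, nmax):
    """pmf of a Poisson count whose mean equals values[i] with probability weights[i]."""
    pmfs = [poisson_pmf(v, nmax) for v in values]
    return [sum(w * pmf[n] for w, pmf in zip(weights, pmfs)) for n in range(nmax + 1)]

print(factorial_moment(poisson_pmf(2.5, 150), 3), 2.5**3)                   # both about 15.625
print(factorial_moment(mixed_poisson_pmf([1.0, 3.0], [0.5, 0.5], 150), 2))  # about 5.0
```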

2.3 Main result

It can be as hard to calculate properties of inverse subordinators as it is for renewal processes, but not necessarily harder! The main result of the article is an expression for the joint moments of arbitrary integer order of the increments of any inverse subordinator, similar to the already known expression for the factorial moments of renewal processes. A sketch of the proof goes as follows. By constructing a random measure from a given inverse subordinator, and from that a Cox process as above, we are left with a Cox process that is also a renewal process. Ordinary moments of the inverse subordinator equal the corresponding factorial moments of the constructed Cox process, and since the Cox process by construction is also a renewal process, we can calculate those factorial moments. Other results from renewal theory also carry over to inverse subordinators with this device.


Figure 3: Top: A realization of an inverse subordinator, in fact the inverse of a compound Poisson process with drift. This implies that the sloping parts have i.i.d. exponential lengths, and that the flat parts are also i.i.d., according to some distribution. Middle: A graph of $\frac{d}{dt}\tau_t$ and a realization of the Cox process $\Pi$ with $(\tau_t)$ as its intensity measure. Bottom: The counting process $N_t$ associated to $\Pi$, which is both a renewal and a Cox process.
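The device in the proof sketch can be illustrated by simulation. This Python sketch is our own and assumes that $Y$ is a compound Poisson process with drift 1, jump rate 1 and exponentially distributed jumps with mean 1 (hypothetical parameters). We sample $\tau_t$ and then, given $\tau_t$, the number $N$ of points of the Cox process in $(0, t]$, which is Poisson with mean $\Lambda((0, t]) = \tau_t$. Sample averages of $N$ and $N(N-1)$ should then be close to those of $\tau_t$ and $\tau_t^2$. The check says nothing about the explicit moment formula of the article, which is not reproduced here.

```python
import random

def sample_tau(t, drift, jump_rate, mean_jump, rng):
    """tau_t = inf(s : Y_s > t), Y = drift*s + compound Poisson with Exp(mean_jump) jumps."""
    s, level = 0.0, 0.0
    while True:
        wait = rng.expovariate(jump_rate)              # time to the next jump of Y
        if level + drift * wait > t:                   # Y crosses t while sloping
            return s + (t - level) / drift
        s += wait
        level += drift * wait + rng.expovariate(1.0 / mean_jump)   # drift, then the jump
        if level > t:                                  # Y jumps over t: flat part of tau
            return s

def poisson_sample(mean, rng):
    """Poisson(mean) as the number of unit-rate exponential arrivals in [0, mean]."""
    n, arrival = 0, rng.expovariate(1.0)
    while arrival <= mean:
        n += 1
        arrival += rng.expovariate(1.0)
    return n

rng = random.Random(11)
t, reps = 2.0, 100_000
taus = [sample_tau(t, 1.0, 1.0, 1.0, rng) for _ in range(reps)]
counts = [poisson_sample(tau, rng) for tau in taus]   # Cox count in (0, t] given tau_t

mean_tau = sum(taus) / reps
mean_tau2 = sum(x * x for x in taus) / reps
mean_n = sum(counts) / reps
mean_n_fact2 = sum(n * (n - 1) for n in counts) / reps
print(mean_tau, mean_n)          # ordinary first moment vs first factorial moment
print(mean_tau2, mean_n_fact2)   # ordinary second moment vs second factorial moment
```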

3 Paper III: Genealogy for supercritical branching processes

This article is joint work with Anders Martin-Löf, and was first published in Journal of Applied Probability (2006), vol 43 no 4.

3.1 Compound distributions

In order to understand the results of this article, one needs to know what a compound distribution is. We say that a random variable $X$ is compound-$N$ if
$$X \overset{d}{=} \sum_{i=1}^{N} Y_i,$$
where $N, Y_1, Y_2, \dots$ are independent, $N$ has a distribution on the non-negative integers and $Y_1, Y_2, \dots$ all have the same distribution.
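A compound-$N$ distribution on the non-negative integers can be computed by mixing the convolution powers of the law of $Y$. The NumPy sketch below (hypothetical helper names; all probability mass functions are truncated) checks a classical special case: if each $Y_i$ is Bernoulli($p$), then $X$ counts how many of $N$ objects survive independent thinning, and for $N \sim$ Poisson($l$) the result is Poisson($lp$). The same Bernoulli thinning, with $N$ the population size, gives the binomial count of individuals with an infinite line of descent in section 3.2.

```python
import numpy as np
from math import exp, factorial

def compound_pmf(p_N, p_Y, xmax):
    """pmf on {0, ..., xmax} of X = Y_1 + ... + Y_N, where p_N[n] = P(N = n), p_Y[y] = P(Y = y)."""
    out = np.zeros(xmax + 1)
    conv = np.array([1.0])                  # law of the empty sum: point mass at 0
    for w in p_N:
        m = min(len(conv), xmax + 1)
        out[:m] += w * conv[:m]             # add P(N = n) * P(Y_1 + ... + Y_n = x)
        conv = np.convolve(conv, p_Y)[: xmax + 1]   # law of the sum with one more term
    return out

def poisson_pmf(l, nmax):
    return np.array([exp(-l) * l**n / factorial(n) for n in range(nmax + 1)])

thinned = compound_pmf(poisson_pmf(3.0, 60), np.array([0.6, 0.4]), 20)   # Y ~ Bernoulli(0.4)
print(np.max(np.abs(thinned - poisson_pmf(1.2, 20))))                    # close to 0
```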

3.2 Branching processes in continuous time

A Markov branching process in continuous time is a random process whose value at any given time is the number of individuals alive in a population that evolves as follows. At time $t = 0$ there is one individual. She lives for an exponentially distributed time with intensity $\mu$, and when she dies, she gives birth to a random number of children, distributed as $X$, say. Each individual has a life length and an offspring size with the same distributions as those of her mother, independent of those of her sisters. The evolution carries on in the same way with the grandchildren, etc. If $E[X] = 1$, $< 1$ or $> 1$, the process is called critical, subcritical or supercritical, respectively. It is well known that the process dies out almost surely if it is critical (excluding the trivial case $X \equiv 1$) or subcritical. If it is supercritical, the probability of extinction, $q$, is strictly less than one, and in the case of non-extinction the population size tends to infinity as $t \to \infty$.

In this article we investigate the distribution of the number of individuals in supercritical branching processes. An individual in the population at time $t$ will have an infinite line of descent, i.e. descendants at all later times, if the branching process that starts with her as an ancestor tends to infinity in size. This will happen with probability $p = 1 - q$, independently of what happens with the descendants of the other individuals in the population at time $t$. This implies that, given that the population has size $n$ at time $t$, the number of individuals who have an infinite line of descent has a binomial distribution with parameters $n$ and $p$. We let $N \equiv 0$ if $N \sim \mathrm{Bin}(0, p)$.

Figure 4: This picture illustrates a supercritical branching process. The full lines denote individuals who have an infinite line of descent, and the dashed lines those who do not. The 13 individuals that are alive at time $t$ are denoted with circles. Each of those 13 individuals has a chance $p$ of having an infinite line of descent, independently of all the other ones. In the picture, 8 of the 13 have one, and are denoted by full circles. Deaths of individuals who leave no children are denoted by a cross. At the rightmost part of the tree, the crosses also denote lineages that will eventually die out, whereas the stars denote infinite lineages.
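The extinction probability $q$ is usually not available in closed form. A standard fact, used here without proof, is that $q$ is the smallest root in $[0, 1]$ of $g(s) = s$, where $g(s) = E[s^X]$ is the probability generating function of the offspring number; the exponential life lengths do not affect whether the population dies out. The Python sketch below uses a hypothetical offspring law $P(X = 0) = 1/4$, $P(X = 2) = 3/4$ and finds $q$ by fixed-point iteration from 0, which increases monotonically to the smallest root. Here $g(s) = 1/4 + 3s^2/4$, so $q = 1/3$ and $p = 2/3$.

```python
def extinction_probability(offspring_pmf, tol=1e-13, max_iter=100_000):
    """Smallest fixed point in [0, 1] of g(s) = sum_k P(X = k) s^k, by iterating from 0."""
    def g(s):
        return sum(prob * s**k for k, prob in enumerate(offspring_pmf))
    q = 0.0
    for _ in range(max_iter):
        q_next = g(q)            # q_n = P(extinct within n generations), increasing in n
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q                     # convergence is slow near criticality

offspring = [0.25, 0.0, 0.75]    # hypothetical law: P(X = 0) = 1/4, P(X = 2) = 3/4
q = extinction_probability(offspring)
p = 1 - q
n_alive = 13                     # hypothetical population size at time t
print(q, p, n_alive * p)         # mean number with an infinite line of descent is n * p
```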

This binomial relation means that it sometimes suffices to study the subpopulation of individuals who have an infinite line of descent in order to understand the dynamics of a supercritical branching process in general. It turns out that this subpopulation can also be described as a branching process, see figure 4, and this branching process has $X \ge 2$.

3.3 Binary branching: the Yule process

A branching process that has $X \equiv 2$, i.e. only binary branching, is called a Yule process. It is one of the very few types of branching processes whose distribution can be calculated exactly. At any time $t$, the number of births until $t$, which we call $N_t$, has a geometric distribution with parameter $e^{-\mu t}$, i.e. $P(N_t = n) = e^{-\mu t}(1 - e^{-\mu t})^n$ for $n = 0, 1, 2, \dots$

Since a death and birth event creates a net increase of one, and we start with one individual, the size of the population at $t$ is $1 + N_t$. It is easy to show that, given $N_t = n$, the birth times in the population, $\tau_{(1)} < \tau_{(2)} < \dots < \tau_{(n)}$, have the distribution of an ordered sample of size $n$ from independent $\tau_1, \tau_2, \dots$ that have a certain distribution $F$ depending on $t$ and $\mu$.
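Both descriptions of the Yule process can be compared by simulation. The first function in this Python sketch simulates the process directly: with $k$ individuals alive, the next death occurs after an exponential time with rate $k\mu$, and the population then grows by one. The second samples $N_t$ from the geometric law and then the birth times as an ordered i.i.d. sample from $F$. For $F$ we use the standard formula $F(s) = (e^{\mu s} - 1)/(e^{\mu t} - 1)$, $0 \le s \le t$, stated here without proof since it is not given in the text. Helper names and parameters are hypothetical.

```python
import math
import random

def yule_birth_times_direct(mu, t, rng):
    """Simulate a Yule process (X = 2) from one individual; return its birth times in [0, t]."""
    time, size, births = 0.0, 1, []
    while True:
        time += rng.expovariate(mu * size)   # next death among `size` individuals
        if time > t:
            return births
        births.append(time)                  # a death plus two children: net +1
        size += 1

def yule_birth_times_conditional(mu, t, rng):
    """Sample N_t ~ Geometric(e^{-mu t}) on {0, 1, ...}, then N_t sorted i.i.d. draws from F."""
    p = math.exp(-mu * t)
    n = 0
    while rng.random() > p:                  # count failures before the first success
        n += 1
    # Inverse of F(s) = (e^{mu s} - 1) / (e^{mu t} - 1) on [0, t].
    return sorted(math.log1p(rng.random() * math.expm1(mu * t)) / mu for _ in range(n))

rng = random.Random(2024)
mu, t, reps = 1.0, 1.0, 20_000
direct = [yule_birth_times_direct(mu, t, rng) for _ in range(reps)]
conditional = [yule_birth_times_conditional(mu, t, rng) for _ in range(reps)]
for sample in (direct, conditional):
    mean_births = sum(len(b) for b in sample) / reps      # E[N_t] = e^{mu t} - 1
    all_times = [s for b in sample for s in b]
    print(mean_births, sum(all_times) / len(all_times))   # mean birth time, 1/(e - 1) for mu = t = 1
```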

3.4 Main result

If $Z_t$ is the size at time $t$ of a branching process $Z$ with $X \ge 2$, e.g. the subpopulation of individuals with an infinite line of descent in a supercritical branching process, then $Z_t - 1$ is compound geometric, or more precisely:
$$Z_t \overset{d}{=} 1 + \sum_{i=1}^{N_t} Y_i, \qquad (2)$$
where $N_t$ is geometric with parameter $e^{-\mu t}$ and independent of $Y_1, Y_2, \dots$, which are all i.i.d. as some $Y$. The random variable $Y - 1$ itself also has a compound distribution:
$$Y \overset{d}{=} 1 + \sum_{j=1}^{X-2} Z^{(j)}_{t-\tau}, \qquad (3)$$
where $Z^{(1)}_{t-\tau}, Z^{(2)}_{t-\tau}, \dots$ are i.i.d. as the value of the process $Z$ at a random time $t - \tau$, where $\tau$ has distribution $F$ as in the previous section about the Yule process.

This should be understood as follows. We can embed a Yule process $\hat Z$ in any branching process with $X \ge 2$, simply by picking out exactly two of the ancestor's children, two children from each of the offspring of those two, etc., see figure 5. Let $\hat Z_t$ be the number of individuals in this subpopulation at time $t$. We can now decompose $Z_t$:
$$Z_t = \hat Z_t + (Z_t - \hat Z_t) = 1 + N_t + (Z_t - \hat Z_t),$$
where $Z_t - \hat Z_t$ counts the individuals of $Z$ that are not in $\hat Z$, i.e. the sisters of individuals in $\hat Z$ together with their descendants. A birth time in $\hat Z$ will be distributed as $\tau$ if we pick it uniformly at random from all the birth times between 0 and $t$. An additional number of $X - 2$ individuals will be born in $Z$ at that time. Each of these individuals will start an independent branching process that evolves as $Z$, but only during a random time of length $t - \tau$. Hence
$$Z_t - \hat Z_t \overset{d}{=} \sum_{i=1}^{N_t} \sum_{j=1}^{X^{(i)}-2} Z^{(ij)}_{t-\tau_i},$$
with $X^{(1)}, X^{(2)}, \dots$ i.i.d. as $X$ and all processes $Z^{(ij)}$ i.i.d. as $Z$. If we put all this together we arrive at (2) and (3). This reasoning is quite heuristic, but in the article the results are proved rigorously with the aid of generating functions and their properties.

Figure 5: A branching process with $X \ge 2$. We have also embedded a Yule process in this process (the full lines). The ×'s denote the times of birth in the Yule process. Consider the time $\tau$, at which a birth in both $Z$ and $\hat Z$ occurs. Here $X = 4$, and thus the two new individuals in $\hat Z$ have two ($= X - 2$) sisters, indicated by the arrows, that start their own independent branching processes whose sizes at time $t$ are distributed as $Z_{t-\tau}$.
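The decomposition (2) and (3) can be turned directly into a recursive sampler and compared with a direct simulation of $Z$. The following Python sketch does this for a hypothetical offspring law, $X = 2$ or $X = 3$ with probability 1/2 each, and $\mu = t = 1$. It samples $\tau$ from $F(s) = (e^{\mu s} - 1)/(e^{\mu t} - 1)$ on $[0, t]$, the standard formula for the Yule birth-time distribution, stated without proof. Both samplers should have mean $E[Z_t] = e^{\mu(E[X] - 1)t}$, a standard fact for Markov branching processes, here $e^{1.5} \approx 4.48$. This only compares means by simulation; it is not the generating-function proof of the article.

```python
import math
import random

MU = 1.0
OFFSPRING = (2, 3)        # hypothetical law: X = 2 or X = 3, each with probability 1/2

def size_direct(t, rng):
    """Z_t by direct simulation: each individual dies at rate MU and leaves X children."""
    time, size = 0.0, 1
    while True:
        time += rng.expovariate(MU * size)
        if time > t:
            return size
        size += rng.choice(OFFSPRING) - 1

def size_decomposed(t, rng):
    """Z_t sampled via (2)-(3): Z_t = 1 + sum_{i <= N_t} Y_i, Y = 1 + sum_{j <= X-2} Z^{(j)}_{t - tau}."""
    p = math.exp(-MU * t)
    n_births = 0
    while rng.random() > p:                  # N_t ~ Geometric(e^{-MU t}) on {0, 1, ...}
        n_births += 1
    total = 1
    for _ in range(n_births):
        tau = math.log1p(rng.random() * math.expm1(MU * t)) / MU   # tau ~ F on [0, t]
        x = rng.choice(OFFSPRING)
        # Y_i: 1 for the net growth of the embedded Yule process, plus X - 2
        # independent copies of Z run for the remaining time t - tau.
        total += 1 + sum(size_decomposed(t - tau, rng) for _ in range(x - 2))
    return total

rng = random.Random(7)
t, reps = 1.0, 20_000
mean_direct = sum(size_direct(t, rng) for _ in range(reps)) / reps
mean_decomposed = sum(size_decomposed(t, rng) for _ in range(reps)) / reps
print(mean_direct, mean_decomposed, math.exp(MU * (2.5 - 1) * t))
```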

4 Paper IV: A population model for Λ-coalescents with neutral mutations

This article has been submitted for peer review.

4.1 Population models

The problems studied in this article come from the field of theoretical population dynamics. Consider a sample of $n$ individuals from some very large population. We want to know how these individuals are related to each other. When we trace their lineages backwards in time and reach a common ancestor of some, or all, of the individuals in our sample, we have in effect reached a branching point on the family tree of the individuals in the sample. We continue until we have reached the most recent common ancestor of all the individuals. There are two natural conditions to be put on this process of coalescing lineages. First, it should be Markov. Second, it should be consistent in the following sense: if we draw the family tree of a sample of $n + 1$ individuals and then delete the branch of one of the individuals, the resulting tree should have the same distribution as if we had started with $n$ individuals and not deleted any branch. If both these conditions are fulfilled, the dynamics of the process can be completely parametrized by a finite measure $\Xi$ on the infinite simplex
$$\Delta = \Bigl\{(x_1, x_2, \dots) : x_1 \ge x_2 \ge \dots \ge 0,\ \sum_{i=1}^{\infty} x_i \le 1\Bigr\}.$$
These processes are called $\Xi$-coalescents, or coalescents with simultaneous multiple collisions. We only consider processes with at most one collision at any given time, i.e. we will never reach two or more ancestors of different groups in our sample at the same time. In this case the process is completely parametrized by a finite measure $\Lambda$ on $[0, 1]$, and is called a $\Lambda$-coalescent, or a coalescent with multiple collisions. We say that the process has (possibly) multiple collisions since more than two lineages may reach their common ancestor at any given time. For the dynamics of the population this means that the common ancestor had such a large number of offspring at the time of the collision that they constituted a considerable fraction of the whole population in the next generation, since it would otherwise be highly unlikely that more than two individuals in the sample, representing different lineages, would happen to have the same mother in any given generation. This reasoning holds if the original sample was drawn uniformly from the entire population, as we have assumed.

The population thus evolves in jumps, where an individual is picked uniformly at random and begets offspring whose total size is a fraction, say $X$, of the total population, while the rest of the population is scaled down by a factor $1 - X$. See figure 6 for an illustration with $X = 0.5$, $0.4$ and $0.5$. Note that large families grow at the expense of the smaller ones, since it is more probable that a mother is picked from a large family than from a small one.

Figure 6: Example of population dynamics without mutations. The population consists of three families represented by circles. An individual denoted by a cross is picked at random, and in the next step to the right she has begotten offspring, denoted by the black circle. The total area of the circles is constant throughout the series.

In reality one often cannot observe the genealogy of a sample directly, but only partition the sample into groups according to their genetic make-up. Lineages have different genotypes because of mutations that introduce new types that have never been seen before in the population. In the article a model is described in which mutations occur with a small probability between generations, such that mutations appear at some constant rate when tracing a lineage backwards in time. This has been studied before from the point of view of the dynamics of the sample, but the novel idea in my paper is a description of the dynamics of the whole population, such that any sample behaves as described earlier. The idea is simply that since any lineage mutates at a constant rate, the families of the population erode at the same constant rate. The "mutants" all have different genotypes, so they constitute "infinitesimally" small families in the population. Nevertheless it can happen that at the time of a jump in the process, a mother is picked among the "mutants", and from that moment on her family makes up a positive fraction of the population, see figure 7 for an example.

Figure 7: The gray circles denote all the "mutants", or singletons, which only grow by erosion. If a mutant begets some offspring, the resulting family will no longer count among the singletons. Above, when moving from left to right, mass is eroded and added to the mutants. When moving from the right, down to the left, the individual denoted by a cross begets offspring, indicated by the black circle.
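The jump-and-erosion dynamics of the whole population can be sketched as follows. The Python code keeps the sizes of the families as fractions of the population, together with the total mass of the "mutants" (`dust`). Between jumps, every family loses mass at a constant rate θ and the lost mass joins the mutants; at a jump, an individual is picked uniformly at random, her offspring take a fraction $x$ of the population, and everything else is scaled by $1 - x$. The exponential waiting times between jumps, the uniform law of $x$, the value of θ and all helper names are hypothetical placeholders of ours, not the specification in the article.

```python
import math
import random

def erode(families, dust, theta, dt):
    """Mutation at rate theta: each family keeps e^{-theta dt} of its mass, the rest becomes mutants."""
    keep = math.exp(-theta * dt)
    lost = sum(families) * (1.0 - keep)
    return [f * keep for f in families], dust + lost

def jump(families, dust, x, rng):
    """Pick an individual uniformly; her offspring form a fraction x, the rest shrinks by 1 - x."""
    u, cumulative, mother = rng.random(), 0.0, None
    for i, f in enumerate(families):
        cumulative += f
        if u < cumulative:                 # the mother belongs to family i
            mother = i
            break
    families = [f * (1.0 - x) for f in families]
    dust *= 1.0 - x
    if mother is None:                     # mother picked among the mutants:
        families.append(x)                 # her offspring found a new family
    else:
        families[mother] += x              # otherwise her family grows
    return families, dust

rng = random.Random(3)
families, dust = [], 1.0                   # hypothetical start: everyone is a mutant
for _ in range(200):
    families, dust = erode(families, dust, theta=0.5, dt=rng.expovariate(1.0))
    families, dust = jump(families, dust, x=rng.uniform(0.05, 0.5), rng=rng)
print(len(families), sum(families) + dust)  # total mass stays 1 up to rounding
```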

4.2 Main result

The main result of the paper is that a partition of a sample into families will have the correct sampling distribution when the sample is picked from a population that has evolved according to the proposed dynamics for a long time. By correct distribution we mean that the distribution is the same as for a sample from a coalescent process with mutations, a type of process that has been investigated by others earlier. This shows that the proposed model for the whole population has the right dynamics.
