<<

Stochastic Chemical Kinetics

A Special Course on Probability and Stochastic Processes for Physicists, Chemists, and Biologists

Summer Term 2011

Version of July 7, 2011

Peter Schuster
Institut für Theoretische Chemie der Universität Wien
Währingerstraße 17, A-1090 Wien, Austria
Phone: +43 1 4277 527 36, Fax: +43 1 4277 527 93
E-Mail: pks @ tbi.univie.ac.at
Internet: http://www.tbi.univie.ac.at/~pks

Preface

The current text contains notes that were prepared first for a course on ‘Stochastic Chemical Kinetics’ held at Vienna University in the winter term 1999/2000 and repeated in the winter term 2006/2007. The current version refers to the summer term 2011, but no claim is made that it is free of errors. The course is addressed to students of chemistry, biochemistry, molecular biology, mathematical and theoretical biology, bioinformatics, and systems biology with particular interests in phenomena observed at small particle numbers. Stochastic kinetics is an interdisciplinary subject and hence the course will contain elements from various disciplines, mainly from probability theory, mathematical statistics, stochastic processes, chemical kinetics, evolutionary biology, and computer science. Considerable usage of mathematical language and analytical tools is indispensable, but we have consciously avoided dwelling upon deeper and more formal mathematical topics. This series of lectures will concentrate on principles rather than technical details. At the same time it will be necessary to elaborate tools that allow us to treat real problems. Analytical results on stochastic processes are rare and thus it will be unavoidable to deal also with approximation methods and numerical techniques that are able to produce (approximate) results through computer calculations (see, for example, the articles [1–4]). The applicability of simulations to real problems depends critically on the population sizes that can be handled. Present-day computers can deal with 10^6 to 10^7 particles, which is commonly not enough for chemical reactions but sufficient for most biological problems, and accordingly the sections dealing with practical examples will contain more biological than chemical problems. The major goal of this text is to spare the audience the distraction of taking notes and to facilitate understanding of subjects that are, at least in parts, quite sophisticated. At the same time the text allows for a repetition of the major issues of the course. Accordingly, an attempt was made to prepare a

useful and comprehensive list of references. To study the literature in detail is recommended to every serious scholar who wants to progress towards a deeper understanding of this rather demanding discipline. Apart from a respectable number of publications mentioned during the progress of the course, the following books were used in the preparation [5–9], the German textbooks [10, 11], elementary texts in English [12, 13] and in German [14]. In addition, we mention several other books on probability theory and stochastic processes [15–26]. More references will be given in the chapters on chemical and biological applications of stochastic processes.

Peter Schuster
Wien, March 2011.

1. History and Classical Probability

An experimentalist reproduces an experiment. What can he expect to find? There are certainly limits to precision and these limits confine the reproducibility of experiments and at the same time restrict the predictability of outcomes. The limitations of correct predictions are commonplace: We witness them every day by watching the failures of various forecasts from the weather to the stock market. Daily experience also tells us that there is an enormous variability in the sensitivity of events with respect to precision of observation. It ranges from the highly sensitive and hard to predict phenomena like the ones just mentioned to the enormous accuracy of astronomical predictions, for example, the precise dating of the eclipse of the sun in Europe on August 11, 1999. Most cases lie between these two extremes and careful analysis of unavoidable randomness becomes important. In this series of lectures we are heading for a formalism which allows for proper accounting of the limitations of the deterministic approach and which extends the conventional description by differential equations. In order to be able to study reproducibility and sensitivity of prediction by means of a scientific approach we require a solid mathematical basis. An appropriate conceptual frame is provided by the theory of probability and stochastic processes. Conventional or deterministic variables have to be replaced by random variables, which fluctuate in the sense that different values are obtained in consecutive measurements under (within the limits of control) identical conditions. The solutions of differential equations, commonly used to describe time-dependent phenomena, will not be sufficient and we shall search for proper formalisms to model stochastic processes. Fluctuations play a central role in the stochastic description of processes. The search for the origin of fluctuations is an old topic of physics, which still is of high current interest. Some examples will be mentioned in the next section 1.1. Recently,

the theory of fluctuations became important because of the progress in spectroscopic techniques, particularly in fluorescence spectroscopy. Fluctuations became directly measurable in fluorescence correlation spectroscopy [27–29]. Even single molecules can be efficiently detected, identified, and analyzed by means of these new techniques [30, 31]. In this series of lectures we shall adopt a phenomenological approach to sources of randomness or fluctuations. Thus, we shall not search here for the various mechanisms setting the limitations to reproducibility (except a short account in the next section), but we will develop a mathematical technique that allows us to handle stochastic problems. In the philosophy of such a phenomenological approach the size of fluctuations, for example, is given through external parameters that can be provided by means of a deeper or more basic theory or derived from experiments. This course starts with a very brief primer of current probability theory (chapter 2), which is largely based on an undergraduate text by Kai Lai Chung [5] and an introduction to stochasticity by Hans-Otto Georgii [11]. Then, we shall be concerned with the general formalism to describe stochastic processes (chapter 3, [6]) and present several analytical as well as numerical tools to derive or compute solutions. The following two chapters 4 and 5 deal with applications to selected problems in chemistry and biology.

1.1 Precision limits and fluctuations

Conventional chemical kinetics handles ensembles of molecules with large numbers of particles, N ≈ 10^20 and more. Under many conditions¹ random fluctuations of particle numbers are proportional to √N. Dealing with 10^-4 moles, being tantamount to N = 10^20 particles, natural fluctuations typically involve √N = 10^10 particles and thus are in the range of ±10^-10 N. Under these conditions the detection of fluctuations would require a relative precision in the order of 10^-10, which is (almost always) impossible to achieve.² Accordingly, the chemist uses concentrations rather than particle numbers, c = N/(N_L × V), wherein N_L = 6.022 × 10^23 and V are Avogadro's number and the volume (in dm³), respectively. Conventional chemical kinetics considers concentrations as continuous variables and applies deterministic methods, in essence differential equations, for modeling and analysis of reactions. Thereby, it is implicitly assumed that particle numbers are sufficiently large that the limit of infinite particle numbers neglecting fluctuations is fulfilled.

In 1827 the British botanist Robert Brown detected and analyzed the irregular motion of particles in aqueous suspensions that turned out to be independent of the nature of the suspended materials – pollen grains, fine particles of glass or minerals [32]. Although Brown himself had already demonstrated that Brownian motion is not caused by some (mysterious) biological effect, its origin remained kind of a riddle until Albert Einstein [33], and independently Marian von Smoluchowski [34], published a satisfactory explanation in 1905, which contained two main points:

(i) The motion is caused by highly frequent impacts on the pollen grain of the steadily moving molecules in the liquid in which it is suspended.

(ii) The motion of the molecules in the liquid is so complicated in detail that its effect on the pollen grain can only be described probabilistically

¹ Computation of fluctuations and their time course will be a subject of this course. Here we mention only that the √N law is always fulfilled in the approach towards equilibrium and stable stationary states.
² Most techniques of analytical chemistry meet serious difficulties when accuracies in particle numbers of 10^-4 or higher are required.

in terms of frequent statistically independent impacts.

In particular, Einstein showed that the number of particles per unit volume, f(x, t),³ fulfils the already known differential equation of diffusion,

∂f/∂t = D ∂²f/∂x²   with the solution   f(x, t) = ( N / √(4πDt) ) exp( −x²/(4Dt) ) ,

where N is the total number of particles. From the solution of the diffusion equation Einstein computed the square root of the mean square displacement,

λ_x, that the particle experiences in the x-direction:

λ_x = √⟨x²⟩ = √(2Dt) .
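The scaling law λ_x = √(2Dt) is easy to check numerically. The following minimal Python sketch – an illustration added here, with the step rule chosen so that the walk has diffusion coefficient D – simulates an ensemble of one-dimensional random walks and compares the root-mean-square displacement with √(2Dt).

```python
import math
import random

# Minimal sketch: unbiased 1-D random walk with diffusion coefficient D.
# Each step of duration dt is drawn from a normal distribution with
# variance 2*D*dt, so that <x^2> = 2*D*t is expected after time t.
D = 1.0            # diffusion coefficient (arbitrary units)
dt = 0.01          # time step
n_steps = 1000     # number of steps, total time t = n_steps*dt
n_walkers = 5000   # ensemble size

t = n_steps * dt
mean_square = 0.0
for _ in range(n_walkers):
    x = 0.0
    for _ in range(n_steps):
        x += random.gauss(0.0, math.sqrt(2.0 * D * dt))
    mean_square += x * x
mean_square /= n_walkers

lambda_x = math.sqrt(mean_square)          # root-mean-square displacement
print(f"simulated lambda_x = {lambda_x:.3f}, sqrt(2 D t) = {math.sqrt(2*D*t):.3f}")
```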

Einstein's treatment is based on discrete time steps and thus contains an approximation – one that is well justified – but it represents the first analysis of a process based on a probabilistic concept that is comparable to the current theories, and we may consider Einstein's paper as the beginning of stochastic modeling. Brownian motion was indeed the first completely random process that became accessible to a description that was satisfactory by the standards of classical physics. Thermal motion as such had been used previously as the irregular driving force causing collisions of molecules in gases by James Clerk Maxwell and Ludwig Boltzmann. The physicists in the second half of the nineteenth century, however, were concerned with molecular motion only as far as it is required to describe systems in the thermodynamic limit. They derived the desired results by means of global averaging statistics. Thermal motion as an uncontrollable source of random fluctuations has been complemented by quantum mechanical uncertainty as another limitation of achievable precision. For the purpose of this course the sensitivity of processes to small (and uncontrolled) changes in initial conditions, however, is of more relevance than the consequences of quantum uncertainty. The analysis of complex dynamical systems was initiated in essence by Edward Lorenz [35], who detected through computer integration of differential equations what is nowadays called deterministic chaos.

³ For the sake of simplicity we consider only motion in one spatial direction, x.

Complex dynamics in physics and chemistry has been known already earlier, as the works of the French mathematician Henri Poincaré and the German chemist Wilhelm Ostwald demonstrate. New in the second half of the twentieth century were not the ideas but the tools to study complex dynamics. A previously unknown potential for analysis by numerical computation became available through easy access to electronic computers. These studies have shown that the majority of dynamical systems modeled by nonlinear differential equations show irregular – that means non-periodic – oscillations for certain ranges of parameter values. In these chaotic regimes solution curves were found to be extremely sensitive to small changes in the initial conditions: solution curves which are almost identical at the beginning deviate exponentially from each other. Limitations in the control of initial conditions, which are inevitable because of the natural limits to achievable precision, result in upper bounds on the time spans for which the dynamics of the system can be predicted with sufficient accuracy. It is not accidental that Lorenz first detected chaotic dynamics in equations for atmospheric motions, which are thought to be so complex that forecasting is inevitably limited to rather short times. Fluctuations play an important role in highly sensitive dynamical systems. Commonly fluctuations increase with time and any description of such a system will be incomplete when it does not consider their development in time. Thermal fluctuations are highly relevant at low concentrations of one, two or more reaction partners or intermediates, and such situations occur almost regularly in oscillating or chaotic systems. An excellent and well studied example in chemistry is the famous Belousov-Zhabotinskii reaction. In biology, on the other hand, we encounter regularly situations that are driven by fluctuations: Every mutation leading to a new variant produces a single individual at first. Whether or not the mutant will be amplified to population level depends on both the properties of the new individual and events that are completely governed by chance. Both phenomena, quantum mechanical uncertainty and sensitivity of complex dynamics, put an ultimate end to the deterministic view of the world. Quantum mechanics sets a limit in principle to determinism, which commonly becomes evident only in the world of atoms and molecules. Limited predictability of complex dynamics is more of a practical nature: Although the differential equations used to describe and analyze chaos are still deterministic, initial conditions of a precision that can never be achieved in reality would be required for correct long-time predictions.

1.2 Thinking in terms of probability

The concept of probability originated from the desire to analyze gambling by rigorous mathematical thought. An early study that has largely remained unnoticed but contained already the basic ideas of probability was done in the sixteenth century by the Italian mathematician Gerolamo Cardano. Commonly, the beginning of classical probability theory is attributed to the French mathematician Blaise Pascal, who wrote in the middle of the seventeenth century – 100 years later – several letters to Pierre de Fermat. The most famous of these letters, dated July 29, 1654, reports the careful observation of a professional gambler, the Chevalier de Méré. The Chevalier observed that obtaining at least one "six" with one die in 4 throws is successful in more than 50% of cases, whereas obtaining the "six" simultaneously on two dice (a double six) at least once in 24 throws has less than a 50% chance to win. He considered this finding a paradox because he calculated naïvely and erroneously that the chances should be the same:

4 throws with one die yields 4 × 1/6 = 2/3 ,
24 throws with two dice yields 24 × 1/36 = 2/3 .

Blaise Pascal became interested in the problem and calculated the probability correctly, as we do it now in classical probability theory, by careful counting of events:

probability = Prob = (number of favorable events) / (total number of events) .   (1.1)

Probability according to equation (1.1) is always a positive quantity between zero and one, 0 ≤ Prob ≤ 1. The sum of the probabilities that an event has occurred or did not occur thus always has to be one. Sometimes, as in

Pascal's example, it is easier to calculate the probability of the unfavorable case, q, and to obtain the desired probability as p = 1 − q. In the one-die example the probability not to throw a "six" is 5/6, in the two-dice example we have 35/36 as the probability of failure. In the case of independent events probabilities are multiplied⁴ and we finally obtain for 4 and 24 trials, respectively:

q(1) = (5/6)^4   and   p(1) = 1 − (5/6)^4 = 0.5177 ,
q(2) = (35/36)^24   and   p(2) = 1 − (35/36)^24 = 0.4914 .

It is remarkable that the gambler could observe this rather small difference in the probability of success – he must have tried the game very often indeed!

Statistics in biology was pioneered by the Augustinian monk Gregor Mendel. In table 1.1 we list the results of two typical experiments distinguishing roundish or wrinkled seeds with yellow or green color. The ratios observed for single plants show large scatter. In the mean values for ten plants some averaging has occurred, but still the deviations from the ideal values are substantial. Mendel carefully investigated several hundred plants, and then the statistical law of inheritance demanding a ratio of 3:1 became evident [36]. Ronald Fisher, in a rather polemic publication [37], reanalyzed Mendel's experiments, questioned Mendel's statistics, and accused him of intentionally manipulating his data because the results are too close to the ideal ratio. Fisher's publication initiated a long lasting debate during which the majority of scientists spoke up in favor of Mendel, until a recent book, published in 2008, declared the end of the Mendel-Fisher controversy [38]. In chapter 5 we shall discuss statistical laws and Mendel's statistics in the light of present day mathematical statistics.

The third example we mention here can be used to demonstrate the usual weakness of people in estimating probabilities. Let your friends guess – without calculating – how many persons you need in a group such that there is a fifty percent chance that at least two of them celebrate their birthday on the same day – you will be surprised by the oddness of the answers!

⁴ We shall come back to the problem of independent events later when we introduce current probability theory in section 2, which is based on set theory.

Table 1.1: Statistics of Gregor Mendel's experiments with the garden pea (Pisum sativum). The results of two typical experiments with ten plants are shown. In total Mendel analyzed 7324 seeds from 253 hybrid plants in the second trial year; 5474 were round or roundish and 1850 were angular wrinkled, yielding a ratio of 2.96:1. The color was recorded for 8023 seeds from 258 plants, out of which 6022 were yellow and 2001 were green, with a ratio of 3.01:1.

            Form of seed                 Color of seeds
plant    round   angular   ratio     yellow   green   ratio
  1        45       12      3.75       25       11     2.27
  2        27        8      3.38       32        7     4.57
  3        24        7      3.43       14        5     2.80
  4        19       10      1.90       70       27     2.59
  5        32       11      2.91       24       13     1.85
  6        26        6      4.33       20        6     3.33
  7        88       24      3.67       32       13     2.46
  8        22       10      2.20       44        9     4.89
  9        28        6      4.67       50       14     3.57
 10        25        7      3.57       44       18     2.44
total     336      101      3.33      355      123     2.89
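The pooled counts quoted in the caption of table 1.1 can be checked with a few lines of code. The following sketch – a simple preview of the statistical treatment deferred to chapter 5, with the chi-square statistic computed by hand – compares the observed numbers with the ideal 3:1 expectation.

```python
# Pooled counts from the caption of table 1.1 (second trial year).
experiments = {
    "seed form (round : wrinkled)": (5474, 1850),
    "seed color (yellow : green)":  (6022, 2001),
}

for name, (dominant, recessive) in experiments.items():
    total = dominant + recessive
    ratio = dominant / recessive
    # Expected counts under the ideal Mendelian 3:1 ratio.
    exp_dom, exp_rec = 0.75 * total, 0.25 * total
    chi2 = ((dominant - exp_dom) ** 2 / exp_dom
            + (recessive - exp_rec) ** 2 / exp_rec)
    print(f"{name}: ratio = {ratio:.2f} : 1, chi-square against 3:1 = {chi2:.3f}")
```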

With our knowledge of the gambling problem the probability is easy to calculate. First we compute the negative event: all persons celebrate their birthdays on different days of the year – 365 days, no leap year – and find for n people in the group,⁵

q = (365/365) · (364/365) · (363/365) · · · ((365 − (n − 1))/365)   and   p = 1 − q .

⁵ The expression is obtained by the argument that the first person can choose his birthday freely. The second person must not choose the same day and so he has 364 possible choices. For the third person remain 363 choices and the n-th person, ultimately, has 365 − (n − 1) possibilities.
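The product formula for q is readily evaluated numerically; the following minimal Python sketch (function name and loop structure are, of course, only one possible implementation) computes p(n) = 1 − q for a group of n persons under the assumptions made above (365 equally likely birthdays, no leap years).

```python
def birthday_probability(n: int) -> float:
    """Probability that at least two of n persons share a birthday."""
    q = 1.0                       # probability that all n birthdays differ
    for k in range(n):
        q *= (365 - k) / 365      # the (k+1)-th person avoids k occupied days
    return 1.0 - q

for n in (23, 41, 57, 70):
    print(f"n = {n:2d}: p(n) = {birthday_probability(n):.4f}")
```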

Figure 1.1: The birthday puzzle. The curve shows the probability p(n) that at least two persons in a group of n people celebrate their birthday on the same day of the year.

The function p(n) is shown in figure 1.1. For the above mentioned 50% chance we need only 23 persons; with 41 people we have already more than a 90% chance that two celebrate their birthday on the same day; 57 yield more than 99%, and 70 persons exceed 99.9%.

The fourth and final example deals again with counterintuitive probabilities: the coin toss game Penney Ante invented by Walter Penney [39]. Before a sufficiently long sequence of heads and tails is determined by coin flipping, each of two players chooses a sequence of n consecutive flips – commonly n = 3 is applied, which leaves the choice among the eight triples shown in table 1.2. The second player has the advantage of knowing the choice of the first player. Then a sufficiently long sequence of coin flips is recorded until both of the chosen triples have appeared in the sequence. The player whose sequence appeared first has won. The advantage of the second player is commonly largely underestimated when guessed without calculation. A simple argument illustrates the disadvantage of player 1: assume he had chosen '111'. If the second player chooses a triple starting with '0', the only chances for player 1 to win are expressed by the sequences beginning '111...', and they have a probability of p = 1/8, leading to odds of 7 to 1 for player 2. Finally, we mention the optimal strategy for player 2: take the first two digits of the three-bit sequence of player 1 and precede them with the opposite of the symbol in the middle of the triple (the shifted pair is shown in red, the switched symbol in bold in table 1.2).

Table 1.2: Advantage of the second player in Penney’s game. Two players choose two triples of digits one after the other, player 2 after player 1. Coin flipping is played until the two triples appear. The player whose triple came first has won. An optimally gambling player 2 (column 2) has the advantage shown in column 3. Code: 1 = head and 0 = tail. The optimal strategy for player 2 is encoded by color and boldface (see text).

Player's choice            Outcome
Player 1    Player 2       Odds in favor of player 2
  111         011                7 to 1
  110         011                3 to 1
  101         110                2 to 1
  100         110                2 to 1
  011         001                2 to 1
  010         001                2 to 1
  001         100                3 to 1
  000         100                7 to 1
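The odds listed in table 1.2 are easily reproduced by direct simulation. The sketch below – an illustrative Monte Carlo check with an arbitrarily chosen sample size – flips a fair coin until one of the two triples appears and estimates the odds in favor of player 2 for the first row of the table.

```python
import random

def penney_winner(triple1: str, triple2: str) -> int:
    """Flip a fair coin ('1' = head, '0' = tail) until one triple appears; return the winner."""
    last3 = ""
    while True:
        last3 = (last3 + random.choice("01"))[-3:]
        if last3 == triple1:
            return 1
        if last3 == triple2:
            return 2

def odds_for_player2(triple1: str, triple2: str, trials: int = 100_000) -> float:
    wins2 = sum(penney_winner(triple1, triple2) == 2 for _ in range(trials))
    return wins2 / (trials - wins2)      # empirical odds in favor of player 2

# Player 1 chooses '111'; the optimal reply of player 2 is '011' (table 1.2).
print(f"estimated odds for '011' against '111': {odds_for_player2('111', '011'):.2f} to 1")
```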

Probability theory in its classical form is more than 300 years old. Not accidentally, the concept arose from thinking about gambling, which was considered a domain of chance in contrast to rigorous science. It took indeed a rather long time before the concept of probability entered scientific thought in the nineteenth century. The main obstacle to the acceptance of probabilities in physics was the strong belief in determinism, which was not overcome before the advent of quantum theory. Probabilistic concepts in the physics of that time were still based on deterministic thinking, although the details of individual events were considered to be too numerous and too complex to be accessible to calculation. It is worth mentioning that thinking in terms of probabilities entered biology earlier, already in the second half of the nineteenth century, through the reported works of Gregor Mendel on genetic inheritance. The reason for this difference appears to lie in the very nature of biology: small sample sizes are typical, and most of the regularities are probabilistic and become observable only through the application of probability theory. Ironically, Mendel's investigations and papers did not attract a broad scientific audience before their rediscovery at the beginning of the twentieth century. The scientific community in the second half of the nineteenth century was simply not yet prepared for the acceptance of probabilistic concepts.

Although classical probability theory can be applied successfully to a great variety of problems, a more elaborate notion of probability that is derived from set theory is advantageous and absolutely necessary for extrapolation to infinitely large sample sizes. Here we shall use the latter concept because it is easily extended to probability measures on continuous variables, where the numbers of sample points are not only infinite but also uncountable.

2. Probabilities, Random Variables, and Densities

The development of set theory initiated by Georg Cantor and Richard Dedekind in the eighteen-seventies provided a possibility to build the concept of probability on a firm basis that allows for an extension to certain families of uncountable samples as they occur, for example, with continuous variables. Present day probability theory thus can be understood as a convenient extension of the classical concept by means of set and measure theory. We start by repeating a few indispensable notions and operations of set theory.

2.1 Sets and sample spaces

Sets are collections of objects with two restrictions: (i) each object either belongs to a given set or it does not, and (ii) a member of a set must not appear twice or more often. In other words, objects are assigned to sets unambiguously. In the application to probability theory we shall denote the elementary objects by the small Greek letter omega, ω – if necessary with various sub- and superscripts – and call them sample points or individual results, and the collection of all objects ω under consideration, the sample space, is denoted by Ω with ω ∈ Ω. Events, A, are subsets of sample points that fulfil some condition¹

A = {ω_k, ω ∈ Ω : f(ω) = c}   (2.1)

with ω = (ω_1, ω_2, ...) being some set of individual results, and f(ω) = c encapsulates the condition on the ensemble of sample points ω_k.

¹ What a condition means will become clear later. For the moment it is sufficient to understand a condition as a function providing a restriction, which implies that not all subsets of sample points belong to A.


Any partial collection of points is a subset of Ω. We shall be dealing with fixed Ω and, for simplicity, often call these subsets of Ω just sets. There are two extreme cases, the entire sample space Ω and the empty set, ∅. The number of points in some set S is called its size, |S|, and thus is a nonnegative integer or ∞. In particular, the size of the empty set is |∅| = 0. The unambiguous assignment of points to sets can be expressed by²

ω ∈ S   exclusive or   ω ∉ S .

Consider two sets A and B. If every point of A belongs to B, then A is contained in B. A is a subset of B and B is a superset of A:

A ⊂ B   and   B ⊃ A .

Two sets are identical if they contain exactly the same points, and then we write A = B. In other words, A = B iff (if and only if) A ⊂ B and B ⊂ A. The basic operations with sets are illustrated in figure 2.1. We briefly repeat them here:

Complement. The complement of the set A is denoted by A^c and consists of all points not belonging to A:³

A^c = {ω | ω ∉ A} .   (2.2)

There are three evident relations which can be verified easily: (A^c)^c = A, Ω^c = ∅, and ∅^c = Ω.

Union. The union of the two sets A and B, A ∪ B, is the set of points which belong to at least one of the two sets:

A ∪ B = {ω | ω ∈ A or ω ∈ B} .   (2.3)

² In order to be unambiguously clear we shall write 'or' for 'and/or' and 'exclusive or' for 'or' in the strict sense.
³ Since we are considering only fixed sample sets Ω these points are uniquely defined.


Figure 2.1: Some definitions and examples from set theory. Part a shows the complement A^c of a set A in the sample space Ω. In part b we explain the two basic operations union and intersection, A ∪ B and A ∩ B, respectively. Parts c and d show the set-theoretic difference, A \ B and B \ A, and the symmetric difference, A ∆ B. In parts e and f we demonstrate that a vanishing intersection of three sets does not imply pairwise disjoint sets.

Intersection. The intersection of the two sets A and B, A ∩ B, is the set of points which belong to both sets (for short, A ∩ B is sometimes written AB):

A ∩ B = {ω | ω ∈ A and ω ∈ B} .   (2.4)

Unions and intersections can be executed in sequence and are also defined for more than two sets, or even for an infinite number of sets:

⋃_{n=1,...} A_n = A_1 ∪ A_2 ∪ ··· = {ω | ω ∈ A_n for at least one value of n} ,
⋂_{n=1,...} A_n = A_1 ∩ A_2 ∩ ··· = {ω | ω ∈ A_n for all values of n} .

These relations are true because the commutative and the associative laws are fulfilled by both operations, intersection and union:

A ∪ B = B ∪ A ,   A ∩ B = B ∩ A ;
(A ∪ B) ∪ C = A ∪ (B ∪ C) ,   (A ∩ B) ∩ C = A ∩ (B ∩ C) .

Difference. The set A \ B is the set of points which belong to A but not to B:

A \ B = A ∩ B^c = {ω | ω ∈ A and ω ∉ B} .   (2.5)

In case A ⊃ B we write A − B for A \ B and have A \ B = A − (A ∩ B) as well as A^c = Ω − A.

Symmetric difference. The symmetric difference A ∆ B is the set of points which belong to exactly one of the two sets A and B. It is used in the advanced theory of sets and is symmetric as it fulfils the commutative law, A ∆ B = B ∆ A:

A ∆ B = (A ∩ B^c) ∪ (A^c ∩ B) = (A \ B) ∪ (B \ A) .   (2.6)

Disjoint sets. Disjoint sets A and B have no points in common and hence their intersection, A ∩ B, is empty. They fulfil the following relations:

A ∩ B = ∅ ,   A ⊂ B^c   and   B ⊂ A^c .   (2.7)

A number of sets are disjoint only if they are pairwise disjoint. For three sets, A, B and C, this requires A ∩ B = ∅, B ∩ C = ∅, and C ∩ A = ∅. When two sets are disjoint the addition symbol is (sometimes) used for the union, A + B for A ∪ B. Clearly we always have the decomposition Ω = A + A^c.
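Python's built-in set type mirrors these operations directly; the following minimal sketch, with arbitrarily chosen example sets, verifies relations (2.5)–(2.7) on a small sample space.

```python
# A small example sample space and two events (chosen only for illustration).
omega = set(range(1, 11))            # the sample space {1, ..., 10}
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

A_c = omega - A                      # complement of A in omega
print(A | B, A & B, A - B, A ^ B)    # union, intersection, difference, symmetric difference

assert A - B == A & (omega - B)                  # equation (2.5): A \ B = A intersected with B^c
assert A ^ B == (A - B) | (B - A)                # equation (2.6)
assert (A & A_c) == set() and A | A_c == omega   # disjoint decomposition of the sample space
```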

Figure 2.2: Sizes of sample sets and countability. Finite, countably infinite, and uncountable sets are distinguished. We show examples of every class. A set is countably infinite when its elements can be assigned uniquely to the natural numbers (1, 2, 3, ..., n, ...).

Sample spaces may contain finite or infinite numbers of sample points. As shown in figure 2.2 it is important to distinguish further between different classes of infinity: countable and uncountable numbers of points. The set of rational numbers, for example, is countably infinite since the numbers can be labeled and assigned uniquely to the positive integers 1 < 2 < 3 < ··· < n < ···. The set of real numbers cannot be ordered in such a way and hence it is uncountable.

2.2 Probability measure on countable sample spaces

Although we are now equipped, in principle, with the tools for a probability theory that shall enable us to handle uncountable sets under certain conditions, the starting point pursued in this section will be chosen under the assumption that our sets are countable.

2.2.1 Probabilities on countable sample spaces

For countable sets it is straightforward to measure the size of sets by counting the numbers of points they contain. The proportion

P(A) = |A| / |Ω|   (2.8)

is identified as the probability of the event represented by the elements of subset A. For another event B we have, for example, P(B) = |B| / |Ω|. Calculating the sum of the two probabilities, P(A) + P(B), requires some care since we know only (figure 2.1):

|A| + |B| ≥ |A ∪ B| .

The excess of |A| + |B| over the size of the union |A ∪ B| is precisely the size of the intersection |A ∩ B|, and thus we find

|A| + |B| = |A ∪ B| + |A ∩ B| ,

or, by division through the size of the sample space Ω,

P(A) + P(B) = P(A ∪ B) + P(A ∩ B) .

Only in case the intersection is empty, A ∩ B = ∅, the two sets are disjoint and their probabilities are additive, |A ∪ B| = |A| + |B|, and hence

P(A + B) = P(A) + P(B)   iff   A ∩ B = ∅ .   (2.9)

It is important to memorize this condition for later use, because it represents an implicitly made assumption for computing probabilities.
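For a finite sample space the relation between the sizes of union and intersection can be verified by direct enumeration. The following sketch uses the two-dice sample space of figure 2.4; the particular events are chosen ad hoc for illustration.

```python
from itertools import product

# Sample space of two distinguishable fair dice: 36 equally likely sample points.
omega = set(product(range(1, 7), repeat=2))

A = {w for w in omega if sum(w) == 7}     # event: the total score equals seven
B = {w for w in omega if w[0] == 6}       # event: the first die shows a 'six'

def P(S):                                 # probability by counting, equation (2.8)
    return len(S) / len(omega)

assert len(A) + len(B) == len(A | B) + len(A & B)
print(P(A) + P(B), P(A | B) + P(A & B))   # the two sums coincide
```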

Now we can define a probability measure by means of the basic axioms of probability theory (for alternative axioms in probability theory see, for example [40, 41]):

A probability measure on the sample space Ω is a function of subsets of Ω, P : S → P(S), or P(·) for short, which is defined by the three axioms:

(i) for every set A ⊂ Ω, the value of the probability measure is a nonnegative number, P(A) ≥ 0 for all A,

(ii) the probability measure of the entire sample set – as a subset – is equal to one, P(Ω) = 1, and

(iii) for any two disjoint subsets A and B, the value of the probability measure for the union, A ∪ B = A + B, is equal to the sum of its value for A and its value for B,

P(A ∪ B) = P(A + B) = P(A) + P(B)   provided   A ∩ B = ∅ .

Condition (iii) implies that for any countable – possibly infinite – collection of disjoint or non-overlapping sets, A_i (i = 1, 2, 3, ...) with A_i ∩ A_j = ∅ for all i ≠ j, the relation called σ-additivity

P( ⋃_i A_i ) = Σ_i P(A_i)   or   P( Σ_{k=1}^{∞} A_k ) = Σ_{k=1}^{∞} P(A_k)   (2.10)

holds. Clearly we also have P(A^c) = 1 − P(A), P(A) = 1 − P(A^c) ≤ 1, and P(∅) = 0. For any two sets A ⊂ B we have P(A) ≤ P(B) and P(B − A) = P(B) − P(A). For any two arbitrary sets A and B we can write a sum of disjoint sets as follows:

A ∪ B = A + (A^c ∩ B)   and
P(A ∪ B) = P(A) + P(A^c ∩ B) .

Since A^c ∩ B ⊂ B we obtain P(A ∪ B) ≤ P(A) + P(B).

The set of all subsets of Ω is the powerset Π(Ω) (figure 2.3). It contains the empty set ∅, the sample space Ω, and the subsets of Ω, and this includes

Figure 2.3: The powerset. The powerset Π(Ω) is a set containing all subsets of Ω including the empty set ∅ and Ω itself. The figure sketches the powerset of three events A, B, and C.

the results of all set theoretic operations that were listed above. The relation between the sample point ω, an event A, the sample space Ω and the powerset Π(Ω) is illustrated by means of an example presented already in section 1.2 as Penney's game: the repeated coin toss. Flipping a coin has two outcomes: '0' for head and '1' for tail (see also the Bernoulli process in subsection 2.7.5). The sample points for flipping the coin n times are binary n-tuples or strings, ω = (ω_1, ω_2, ..., ω_n) with ω_i ∈ {0, 1}.⁴ For finite n the sample space Ω = {0, 1}^n is finite and hence countable. It is useful to consider also infinite numbers of repeats, in particular for computing limits n → ∞: ω = (ω_1, ω_2, ...) = (ω_i)_{i∈N} with ω_i ∈ {0, 1}. Then we are dealing with infinitely long binary strings, and the sample space Ω = {0, 1}^N is the space of all infinitely long binary strings. This space, however, is no longer countable – in contrast to the set of finite binary strings, each of which can be read as the binary encoding of a natural number k ∈ N_0 – and it will be taken up again in section 2.3. A subset of Ω will be called an event A when a probability measure

⁴ There is a trivial but important distinction between strings or n-tuples and sets: In a string the position of an element matters, whereas in a set it does not. The following three sets are identical: {1, 2, 3} = {3, 1, 2} = {1, 2, 2, 3}. In order to avoid ambiguities strings are written in (normal) parentheses and sets in curly brackets.

derived from axioms (i), (ii), and (iii) has been assigned. Commonly, one is not interested in the full detail of a probabilistic result, and events can be easily adapted to lumping together sample points. We ask, for example, for the probability of the event A that n coin flips yield at least k times tail (the score for tail is 1):

A = { ω = (ω_1, ω_2, ..., ω_n) ∈ Ω : Σ_{i=1}^{n} ω_i ≥ k } ,

where the sample space is Ω = {0, 1}^n. The task is now to find a system of events F that allows for a consistent assignment of a probability P(A) for every event A. For countable sample spaces Ω the powerset Π(Ω) represents such a system F; we characterize P(A) as a probability measure on (Ω, Π(Ω)), and the further handling of probabilities as outlined below is straightforward. In the case of uncountable sample spaces Ω, however, the powerset Π(Ω) is too large and a more sophisticated procedure is required (section 2.3).

So far we have constructed and compared sets but not yet introduced numbers for actual computations. In order to construct a probability measure that is adaptable for numerical calculations on some countable sample space, Ω = {ω_1, ω_2, ..., ω_n, ...}, we assign a weight ϱ_n to every sample point ω_n subject to the conditions

∀ n : ϱ_n ≥ 0 ;   Σ_n ϱ_n = 1 .   (2.11)

Then, with P({ω_n}) = ϱ_n for all n, the following two equations

P(A) = Σ_{ω ∈ A} ϱ(ω)   for A ∈ Π(Ω) ,   and
ϱ(ω) = P({ω})   for ω ∈ Ω   (2.12)

represent a bijective relation between the probability measure P on (Ω, Π(Ω)) and the sequences ϱ = (ϱ(ω))_{ω ∈ Ω} in [0, 1] with Σ_{ω ∈ Ω} ϱ(ω) = 1. Such a sequence is called a (discrete) probability density.

The function ϱ(ω_n) = ϱ_n has to be estimated or determined empirically, because it is the result of factors lying outside of probability

Figure 2.4: Probabilities of throwing two dice. The probabilities of obtaining two to twelve counts through throwing two perfect or fair dice are based on the equal probability assumption for obtaining the individual faces of a single die. The probability P(N) rises linearly from two to seven and then decreases linearly between seven and twelve (P(N) is a discretized tent map), and the additivity condition requires Σ_{k=2}^{12} P(N = k) = 1.

theory. In physics and chemistry the correct assignment of probabilities has to meet the conditions of the experimental setup. An example will make this point clear: Whether a die is fair and shows all its six faces with equal probability, or has been manipulated and shows the 'six' more frequently than the other numbers, is a matter of physics and not of mathematics. For many purposes the discrete uniform distribution, U_Ω, is applied: all results ω ∈ Ω appear with equal probability and hence ϱ(ω) = 1/|Ω|. With the assumption of the uniform distribution U_Ω we can measure the size of sets by counting sample points, as illustrated best by considering the throws

Figure 2.5: An ordered partial sum of a random variable. The sum S_n = Σ_{k=1}^{n} X_k represents the cumulative outcome of a series of events described by a class of random variables, X_k. The series can be extended to +∞, and such a case will be encountered, for example, with probability distributions. The ordering criterion is not yet specified; it could be time t, for example.

of dice. For one die the sample space is Ω = {1, 2, 3, 4, 5, 6} and for the fair die we make the assumption

P({k}) = 1/6 ,   k = 1, 2, 3, 4, 5, 6 ,

that all six outcomes corresponding to different faces of the die are equally likely. Based on the assumption of U_Ω we obtain the probabilities for the outcome of two simultaneously thrown fair dice shown in figure 2.4. The most likely outcome is a count of seven points because it has the largest multiplicity, {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}.
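The probabilities of figure 2.4 follow from counting sample points under the uniform assumption U_Ω; a minimal sketch:

```python
from itertools import product
from collections import Counter
from fractions import Fraction

# Multiplicities of the total score of two fair dice (36 equally likely points).
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

for score in range(2, 13):
    print(f"P(N = {score:2d}) = {Fraction(counts[score], 36)}")

assert sum(counts.values()) == 36          # the probabilities add up to one
```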

2.2.2 Random variables and functions

For the definition of random variables on countable sets a probability triple (Ω, Π(Ω), P) is required: Ω contains the sample points or individual results, the powerset Π(Ω) provides the events A as subsets, and P finally represents the probability measure defined by equation (2.12). Based on such a probability triple we define a random variable X as a numerically valued function of ω on the domain of the entire sample space Ω,

ω ∈ Ω :   ω → X(ω) .   (2.13)

Random variables, X(ω) and Y(ω), can be subject to operations that yield other random variables, such as

X(ω) + Y(ω) ,   X(ω) − Y(ω) ,   X(ω) · Y(ω) ,   X(ω)/Y(ω)   [Y(ω) ≠ 0] ,

and, in particular, any linear combination of random variables such as a X(ω) + b Y(ω) is a random variable too. Just as a function of a function is still a function, a function of a random variable is a random variable,

ω ∈ Ω :   ω → ϕ(X(ω), Y(ω)) = ϕ(X, Y) .

Particularly important cases are the (partial) sums of n variables:

S_n(ω) = X_1(ω) + ... + X_n(ω) = Σ_{k=1}^{n} X_k(ω) .   (2.14)

Such a partial sum S_n could be, for example, the cumulative outcome of n successive throws of a die.⁵ Consider, for example, an ordered series of events where the current cumulative outcome is given by the partial sum S_n = Σ_{k=1}^{n} X_k, as shown in figure 2.5. In principle, the series can be extended to t → ∞.

The ordered partial sums are step functions, and their precise definition is hidden in equations (2.13) and (2.14). Three definitions are possible for the value of the function at the discontinuity. We present them for the Heaviside step function

H(x) =   0             if x < 0 ,
         {0, 1/2, 1}   if x = 0 ,      (2.15)
         1             if x > 0 .

⁵ The use of 'partial' in this context implies that the sum does not cover the entire sample space at the moment. Series of throws of dice, for example, could be continued in the future.

Figure 2.6: The cumulative distribution function of fair dice. The cumulative probability distribution function (cdf) or probability distribution is a mapping from the sample space Ω onto the unit interval [0, 1] of R. It corresponds to the ordered partial sum with the ordering parameter being the score determined by the stochastic variable. The example shown deals with throwing fair dice: The distribution for one die (black) consists of six steps of equal height at the scores 1, 2, ..., 6. The second curve (red) is the probability distribution for throwing two dice (figure 2.4).

The value '0' at x = 0 implies left-hand continuity for H(x) and in terms of a probability distribution would correspond to a definition P(X < x) in equation (2.16); the value 1/2 implies that H(x) is neither right-hand nor left-hand semi-differentiable at x = 0, but is useful in many applications that make use of the inherent symmetry of the Heaviside function, for example the relation H(x) = (1 + sgn(x))/2, where sgn(x) is the sign or signum function:

sgn(x) =  −1   if x < 0 ,
           0   if x = 0 ,
           1   if x > 0 .
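In numerical work the value at the jump has to be fixed explicitly. The following minimal sketch (parameter names are arbitrary) implements H(x) with a selectable value h0 at x = 0 and checks the relation H(x) = (1 + sgn(x))/2 for the symmetric choice h0 = 1/2.

```python
def sgn(x: float) -> int:
    """Sign (signum) function: -1, 0 or +1."""
    return (x > 0) - (x < 0)

def heaviside(x: float, h0: float = 1.0) -> float:
    """Heaviside step function; h0 selects the convention at x = 0 (0, 1/2 or 1)."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return h0

# The symmetric convention h0 = 1/2 reproduces H(x) = (1 + sgn(x))/2.
for x in (-2.0, 0.0, 3.5):
    assert heaviside(x, h0=0.5) == (1 + sgn(x)) / 2
```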

The functions used in probability theory make use of the third definition, determined by P(X ≤ x), or H(0) = 1 in the case of the Heaviside function. Right-hand continuity is an important definition in the conventional handling of stochastic processes, for example for semimartingales (subsection 3.1.1). Often the property of right-hand continuity with left limits is denoted as càdlàg, which is an acronym from the French "continue à droite, limites à gauche".

An important step function for the characterization of a discrete probability distribution is the cumulative distribution function (cdf, see also subsection 2.2.3). It is a mapping from the sample space into the real numbers on the unit interval, (P(X ≤ x); Ω) ⇒ (F_X(x); R : 0 ≤ F_X(x) ≤ 1), defined by

F_X(x) = P(X ≤ x)   with   lim_{x→−∞} F_X(x) = 0   and   lim_{x→+∞} F_X(x) = 1 .   (2.16)

Two examples for throwing one die or two dice are shown in figure 2.6. The distribution function is defined for the entire x-axis, x ∈ R, but cannot be integrated by conventional Riemann integration. The cumulative distribution function and the partial sums of random variables, however, are continuous and differentiable on the right-hand side of the steps and therefore they are Stieltjes-Lebesgue integrable (see subsection 2.3.5).

The probability mass function (pmf) is also a mapping from the sample space into the real numbers and gives the probability that a discrete random variable attains exactly some value x. We assume that X is a discrete random variable on the sample space Ω, X : Ω → R, and then we define the probability mass function as a mapping onto the unit interval, f_X : R → [0, 1], by

f_X(x) = P(X = x) = P({s ∈ Ω : X(s) = x}) .   (2.17)

Sometimes it is useful to be able to treat a discrete probability distribution as if it were continuous. The function f_X(x) is therefore defined for all real numbers, x ∈ R, including those outside the sample set. Then we have: f_X(x) = 0 ∀ x ∉ X(Ω). Figure 2.7 shows the probability mass function of fair dice corresponding to the cumulative distributions in figure 2.6. For

Figure 2.7: Probability mass function of fair dice. The probability mass functions (pmf), f_X(x), belong to the two probability distributions F_X(x) shown in figure 2.6. The upper part of the figure represents the scores x obtained with one die. The pmf is zero everywhere on the x-axis except at a set of points, x = 1, 2, 3, 4, 5, 6, of measure zero where it adopts the value 1/6 (black). The lower part contains the probability mass function for simultaneously throwing two dice (red, see also figure 2.4). The maximal probability value is obtained for the score x = 7.

throwing one die it consists of six peaks, f_X(k) = 1/6 with k = 1, 2, ..., 6, and has the value f_X(x) = 0 everywhere else (x ≠ k). In the case of two dice the probability mass function corresponds to the discrete probabilities shown in figure 2.4. It contains, in essence, the same information as the cumulative distribution function or the listing of discrete probabilities.
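The step functions of figures 2.6 and 2.7 are obtained from the probability mass function by partial summation. The following sketch (the helper function is, again, only illustrative) computes f_X and the right-continuous F_X for one die and for the sum of two dice.

```python
from itertools import product
from collections import Counter

def pmf_and_cdf(scores):
    """Return pmf f(x) and cdf F(x) = P(X <= x) for equally likely outcomes."""
    counts = Counter(scores)
    n = len(scores)
    pmf, cdf, cumulative = {}, {}, 0.0
    for x in sorted(counts):
        pmf[x] = counts[x] / n
        cumulative += pmf[x]
        cdf[x] = cumulative            # value of the step at x, i.e. P(X <= x)
    return pmf, cdf

one_die = list(range(1, 7))
two_dice = [a + b for a, b in product(range(1, 7), repeat=2)]

for label, scores in (("one die", one_die), ("two dice", two_dice)):
    f, F = pmf_and_cdf(scores)
    print(label, f, F)
```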

2.2.3 Probabilities on intervals

Now, we write down sets, which are defined by the range of a random variable on the closed interval [a, b],⁶

{a ≤ X ≤ b} = {ω | a ≤ X(ω) ≤ b} ,

and define their probabilities by P(a ≤ X ≤ b). More generally, the set A of sample points can be defined by the open interval ]a, b[, the half-open intervals [a, b[ and ]a, b], the infinite intervals ]−∞, b[ and ]a, +∞[, as well as the set of real numbers, R = ]−∞, +∞[. When A is reduced to the single point x, it is called the singleton {x}:

P(X = x) = P(X ∈ {x}) .

For countable – finite or countably infinite – sample spaces Ω the exact range of X is just the set of real numbers v_i below:

V_X = ⋃_{ω ∈ Ω} {X(ω)} = {v_1, v_2, ..., v_n, ...} .

Now we introduce the probabilities

p_n = P(X = v_n) ,   v_n ∈ V_X ,

and clearly we have P(X = x) = 0 if x ∉ V_X. Knowledge of all p_n values implies full information on all probabilities concerning the random variable X:

P(a ≤ X ≤ b) = Σ_{a ≤ v_n ≤ b} p_n   or, in general,   P(X ∈ A) = Σ_{v_n ∈ A} p_n .   (2.18)

⁶ The notation we are applying here uses square brackets, '[ · ]', for closed intervals, open square brackets, '] · [', for open intervals, and '] · ]' and '[ · [' for left-hand or right-hand half-open intervals, respectively. An alternative, less common notation uses parentheses instead of open square brackets, e.g., '( · )'.

An especially important case, which has been discussed already in the previous subsection 2.2.2, is obtained when A is the infinite interval ]−∞, x]. The function x → F_X(x), defined on R with values in the unit interval [0, 1], 0 ≤ F_X(x) ≤ 1, is the cumulative distribution function of X:

F_X(x) = P(X ≤ x) = Σ_{v_n ≤ x} p_n .   (2.16')

It fulfils several easy-to-verify properties:

F_X(b) − F_X(a) = P(X ≤ b) − P(X ≤ a) = P(a < X ≤ b) ,
P(X = x) = lim_{ε→0} ( F_X(x + ε) − F_X(x − ε) ) ,   and
P(a < X < b) = lim_{ε→0} ( F_X(b − ε) − F_X(a + ε) ) .

An important special case is an integer valued positive random variable X corresponding to a countably infinite sample space which is the set of non-negative integers, Ω = N_0 = {0, 1, 2, ..., n, ...}, with

p_n = P(X = n) ,   n ∈ N_0 ,   and   F_X(x) = Σ_{0 ≤ n ≤ x} p_n .   (2.19)

Integer valued random variables will be used, for example, for modeling particle numbers in stochastic processes.

Two (or more) random variables,⁷ X and Y, form a random vector (X, Y), which is defined by the probabilities

P(X = x_i, Y = y_j) = p(x_i, y_j) .   (2.20)

These probabilities constitute the joint probability distribution of the random vector. By summation over one variable we obtain the probabilities of the two marginal distributions,

P(X = x_i) = Σ_{y_j} p(x_i, y_j) = p(x_i, ∗)   and
P(Y = y_j) = Σ_{x_i} p(x_i, y_j) = p(∗, y_j) ,   (2.21)

of X and Y, respectively.

⁷ For simplicity we restrict ourselves to the two-variable case here. The extension to any finite number of variables is straightforward.
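Equation (2.21) amounts to summing a joint probability table over rows or columns. A minimal sketch with an arbitrarily chosen joint distribution p(x_i, y_j):

```python
# Joint probabilities p(x_i, y_j) of a discrete random vector (X, Y);
# the numbers are an arbitrary illustrative example and sum to one.
joint = {
    (0, 0): 0.10, (0, 1): 0.20,
    (1, 0): 0.30, (1, 1): 0.40,
}

xs = sorted({x for x, _ in joint})
ys = sorted({y for _, y in joint})

# Marginal distributions, equation (2.21): sum over the other variable.
p_x = {x: sum(joint[(x, y)] for y in ys) for x in xs}
p_y = {y: sum(joint[(x, y)] for x in xs) for y in ys}

print("P(X = x):", p_x)      # {0: 0.3, 1: 0.7}
print("P(Y = y):", p_y)      # {0: 0.4, 1: 0.6}
assert abs(sum(joint.values()) - 1.0) < 1e-12
```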

2.3 Probability measure on uncountable sample spaces

A new and more difficult situation arises when the sample space Ω is uncountable. Then problems with measurability arise from the impossibility of assigning a probability to every subset of Ω. The general task of developing measures for uncountable sets that are based on countably infinite subsets is highly demanding and requires advanced mathematics. For the probability concept we are using here, however, a restriction of the measure to sets of a certain family called Borel sets or Borel fields, F, and the introduction of the Lebesgue measure are sufficient.

2.3.1 Existence of non-measurable sets

The notion that is essential for the construction of a probability measure for an uncountable set is again the powerset Π(Ω), which is defined as the set of all subsets of Ω (figure 2.3). It would seem straightforward to proceed exactly as we did in the case of countability; the powerset, however, is too large since it contains uncountably many subsets. Giuseppe Vitali [42] provided a proof by example that for the infinitely repeated coin flip, Ω = {0, 1}^N, no mapping P : Π(Ω) → [0, 1] exists which fulfils the three indispensable properties for probabilities [11, pp. 9–10]:

(N) normalization: P (Ω) = 1 ,

(A) σ-additivity: for pairwise disjoint events A_1, A_2, ... ⊂ Ω it holds that

P( ⋃_{i ≥ 1} A_i ) = Σ_{i ≥ 1} P(A_i) ,

(I) invariance: for all A ⊂ Ω and k ≥ 1 it holds that P(T̂_k A) = P(A), where T̂_k is an operator that inverts the outcome of the k-th toss,

T̂_k : ω = (ω_1, ..., ω_{k−1}, ω_k, ω_{k+1}, ...) → (ω_1, ..., ω_{k−1}, 1 − ω_k, ω_{k+1}, ...) ,

and T̂_k A = {T̂_k(ω) : ω ∈ A} is the image of A under the operation T̂_k.

The first two conditions are the criteria for probability measures, and the invariance condition (I) is specific for coin flipping and encapsulates the properties derived from the uniform distribution, U_Ω. In general, such a relation will always exist, with the details depending on the process – here the coin flip – and its implementation in the real world – here the uniform distribution – be it an experimental setup, a census in sociology or the rules of gambling. We dispense first with the details of the proof and mention only the nature of the constructed contradiction:

1 = P(Ω) = Σ_{S ∈ 𝒮} P(T̂_S A) = Σ_{S ∈ 𝒮} P(A)   (2.22)

cannot be fulfilled for infinitely long sequences of coin tosses. There are sequences S where all values P(A) or P(T̂_S A) are the same, and infinite summation of the same number yields either 0 or ∞ but never 1.

The construction of the proof starts from the finite subsets of N: 𝒮 = {S ⊂ N : |S| < ∞}; 𝒮 is countable as it is a union of countably many finite sets {S ⊂ N : max S = m}. For S = {k_1, ..., k_n} ∈ 𝒮 we define T̂_S = Π_{k ∈ S} T̂_k = T̂_{k_1} ∘ ... ∘ T̂_{k_n} and an equivalence relation '≡' on Ω: ω ≡ ω̃ iff ω_k = ω̃_k for sufficiently large k. The axiom of choice guarantees the existence of a set A ⊂ Ω which contains exactly one element of each equivalence class. Then we have

(i) for each ω ∈ Ω there exists an ω̃ ∈ A with ω ≡ ω̃, and therefore an S ∈ 𝒮 with ω = T̂_S ω̃ ∈ T̂_S A, and hence Ω = ⋃_{S ∈ 𝒮} T̂_S A, and

(ii) the sets (T̂_S A)_{S ∈ 𝒮} are pairwise disjoint, as can be easily verified: assume that T̂_S A ∩ T̂_S̃ A ≠ ∅ for S, S̃ ∈ 𝒮; then there exists a pair ω, ω̃ ∈ A with T̂_S ω = T̂_S̃ ω̃ and consequently ω ≡ T̂_S ω = T̂_S̃ ω̃ ≡ ω̃, and according to the choice of A we have ω = ω̃ and consequently S = S̃.

Application of the properties (N), (A) and (I) to P and taking m to infinity yields equation (2.22) and completes the proof. Accordingly, the proof of Vitali's theorem demonstrates the existence of non-measurable subsets of the real numbers, called Vitali sets – precisely,

Figure 2.8: Conceptual levels of sets in probability theory. The lowest level is the sample space Ω; it contains the sample points or individual results ω as elements, and events A are subsets of Ω: ω ∈ Ω and A ⊂ Ω. The next higher level is the powerset Π(Ω). Events A are its elements, and event systems F constitute its subsets: A ∈ Π(Ω) and F ⊂ Π(Ω). The highest level, finally, is the power powerset Π(Π(Ω)) that houses event systems F as elements: F ∈ Π(Π(Ω)).

a subset of the real numbers that is not Lebesgue measurable (see subsection 2.3.2). The problem to be solved is a reduction of the powerset to an event system F such that the subsets causing the lack of measurability are eliminated.

2.3.2 Borel σ-algebra and Lebesgue measure

Before we define minimal requirements for an event system F, the three levels of sets in set theory that are relevant for our construction are considered (figure 2.8). The objects on the lowest level are the sample points corresponding to individual results, ω ∈ Ω. The next higher level is the powerset Π(Ω) housing the events A ∈ Π(Ω); the elements of the powerset are subsets of the sample space, A ⊂ Ω. To illustrate the role of event systems F we need one more, higher level, the powerset of the powerset, Π(Π(Ω)): event systems are elements of the power powerset, F ∈ Π(Π(Ω)), and subsets of the powerset, F ⊂ Π(Ω).

Minimal requirements for an event system F are summarized in the following definition of a σ-algebra F on Ω with Ω ≠ ∅ and F ⊂ Π(Ω):

(a) Ω ∈ F ,

(b) A ∈ F  ⇒  A^c := Ω \ A ∈ F , and

(c) A_1, A_2, ... ∈ F  ⇒  ⋃_{i ≥ 1} A_i ∈ F .

Condition (b) defines the logical negation as expressed by the difference between the entire sample space and the event A, and condition (c) represents the logical 'or' operation. The pair (Ω, F) is called an event space or a measurable space. From the three properties (a) to (c) other properties follow. The intersection, for example, is the complement of the union of the complements, A ∩ B = (A^c ∪ B^c)^c ∈ F. The consideration is easily extended to the intersection of countably many subsets of F, which also belongs to F. Thus, a σ-algebra is closed under the operations 'c', '∪' and '∩'.⁸ Trivial examples of σ-algebras are {∅, Ω}, {∅, A, A^c, Ω}, or the family of all subsets.

A construction principle for σ-algebras starts out from an event system G ⊂ Π(Ω) (Ω ≠ ∅) that is sufficiently small and otherwise arbitrary. Then there exists exactly one smallest σ-algebra F = σ(G) in Ω with F ⊃ G; F is called the σ-algebra induced by G, or G is the generator of F. Here are three important examples:

(i) the powerset with Ω being countable, where G = {{ω} : ω ∈ Ω} is the system of the subsets of Ω containing a single element, σ(G) = Π(Ω), each A ∈ Π(Ω) is countable, and A = ⋃_{ω ∈ A} {ω} ∈ σ(G) (see section 2.2),

(ii) the Borel σ-algebra B^n (see below), and

(iii) the product σ-algebra for sample spaces Ω that are Cartesian products

of sets E_k, Ω = Π_{k ∈ I} E_k, where I is a set of indices with I ≠ ∅. We assume that ℰ_k is a σ-algebra on E_k, with X_k : Ω → E_k being the projection onto the k-th coordinate, and the generator G = {X_k^{-1} A_k : k ∈ I, A_k ∈ ℰ_k}

is the system of all sets in Ω which are determined by an event on a single coordinate. Then ⊗_{k ∈ I} ℰ_k := σ(G) is called the product σ-algebra of the sets ℰ_k on Ω. In the important case of equivalent Cartesian coordinates, E_k = E and ℰ_k = ℰ for all k ∈ I, the shorthand notation ℰ^{⊗I} is common. The Borel σ-algebra on R^n is represented by the n-dimensional product σ-algebra of the Borel σ-algebra B = B^1 on R, or B^n = B^{⊗n}.

⁸ A family of sets is called closed under an operation when the operation can be applied a countable number of times without producing a set that lies outside the family.

n

G = { Π_{k=1}^{n} [a_k, b_k] : a_k < b_k ; a_k, b_k ∈ Q }   (2.23)

where Q is the set of all rational numbers. The σ-algebra induced by this generator is denoted as the Borelian σ-algebra, B^n := σ(G), on R^n, and each A ∈ B^n is a Borel set (for n = 1 one commonly writes B instead of B^1). Five properties of the Borel σ-algebra are useful for applications and for imagining its enormous size.

(i) Each open set ']..[' A ⊂ R^n is Borelian: every ω ∈ A has a neighborhood Q ∈ G with Q ⊂ A, and therefore we have A = ⋃_{Q ∈ G, Q ⊂ A} Q, representing a union of countably many sets in B^n, which follows from condition (c) of σ-algebras.

(ii) Each closed set '[..]' A ⊂ R^n is Borelian, since A^c is open and Borelian according to item (i).

⁹ Sometimes a Borel σ-algebra is also called a Borel field.

(iii) The σ-algebra B^n cannot be described in a constructive way, because it consists of much more than the union of cuboids and their complements. In order to create B^n, the operation of adding complements and countable unions has to be repeated as often as there are countable ordinal numbers (and this means uncountably many times [43, pp. 24, 29]). It is sufficient to memorize for practical purposes that B^n covers almost all sets in R^n – but not all of them.

(iv) The Borelian σ-algebra B on R is generated not only by the system of compact sets (2.23) but also by the system of closed, left-unbounded infinite intervals:

G̃ = { ]−∞, c] ; c ∈ R } .   (2.23')

Condition (b) requires G̃ ⊂ B and – because of the minimality of σ(G̃) – σ(G̃) ⊂ B too. Conversely, σ(G̃) contains all left-open intervals, since ]a, b] = ]−∞, b] \ ]−∞, a], and all compact or closed intervals, since [a, b] = ⋂_{n ≥ 1} ]a − 1/n, b], and accordingly also the σ-algebra B generated from these intervals (2.23). In full analogy B is also generated from all open left-unbounded, from all closed, and from all open right-unbounded intervals.

(v) The event system n = A Ω : A n on Ω Rn, Ω = represents BΩ { ∩ ∈ B } ⊂ 6 ∅ a σ-algebra on ω, which is denoted as the Borelian σ-algebra on Ω.

All intervals discussed in items (i) to (iv) are (Lebesgue) measurable, while certain other sets are not. The Lebesgue measure is the conventional means of assigning lengths, areas, and volumes to subsets of three-dimensional Euclidean space and, in formal Cartesian spaces, to objects with higher-dimensional volumes. Sets to which generalized volumes¹⁰ can be assigned are called Lebesgue measurable, and the measure or volume of such a set $A$ is denoted by $\lambda(A)$. The Lebesgue measure on $\mathbb{R}^n$ has the following properties:¹¹

10 We generalize volume here to arbitrary dimension n: the generalized volume for n = 1 is a length, for n = 2 an area, for n = 3 a (conventional) volume, and for arbitrary dimension n a cuboid in n-dimensional space.
11 Slightly modified from Wikipedia: Lebesgue measure, version March 04, 2011.

(1) If $A$ is a Lebesgue measurable set, then $\lambda(A) \geq 0$.

(2) If $A$ is a Cartesian product of intervals, $I_1 \otimes I_2 \otimes \ldots \otimes I_n$, then $A$ is Lebesgue measurable and $\lambda(A) = |I_1| \cdot |I_2| \cdots |I_n|$.

(3) If $A$ is Lebesgue measurable, its complement $A^c$ is so too.

(4) If $A$ is a union of countably many disjoint Lebesgue measurable sets, $A = \bigcup_k A_k$, then $A$ is itself Lebesgue measurable and $\lambda(A) = \sum_k \lambda(A_k)$.

(5) If $A$ and $B$ are Lebesgue measurable and $A \subset B$, then $\lambda(A) \leq \lambda(B)$.

(6) Countable unions and countable intersections of Lebesgue measurable sets are Lebesgue measurable.¹²

(7) If $A$ is an open or closed subset or Borel set of $\mathbb{R}^n$, then $A$ is Lebesgue measurable.

(8) The Lebesgue measure is strictly positive on non-empty open sets, and so its support is the entire $\mathbb{R}^n$.

(9) If $A$ is a Lebesgue measurable set with $\lambda(A) = 0$, called a null set, then every subset of $A$ is also a null set, and every subset of $A$ is measurable.

(10) If $A$ is Lebesgue measurable and $r$ is an element of $\mathbb{R}^n$, then the translation of $A$ by $r$, defined by $A + r = \{a + r : a \in A\}$, is also Lebesgue measurable and has the same measure as $A$.

(11) If $A$ is Lebesgue measurable and $\delta > 0$, then the dilation of $A$ by $\delta$, defined by $\delta A = \{\delta r : r \in A\}$, is also Lebesgue measurable and has measure $\delta^n \lambda(A)$.

(12) In generalization of items (10) and (11), if $T$ is a linear transformation and $A$ is a measurable subset of $\mathbb{R}^n$, then $T(A)$ is also measurable and has the measure $|\det(T)|\, \lambda(A)$.

12 This is not a consequence of items (3) and (4): a family of sets which is closed under complements and countable disjoint unions need not be closed under (non-disjoint) countable unions, for example the family $\{\emptyset, \{1,2\}, \{1,3\}, \{2,4\}, \{3,4\}, \{1,2,3,4\}\}$.

All twelve items listed above can be succinctly summarized in one lemma: the Lebesgue measurable sets form a $\sigma$-algebra on $\mathbb{R}^n$ containing all products of intervals, and $\lambda$ is the unique complete translation-invariant measure on that $\sigma$-algebra with $\lambda\bigl([0,1] \otimes [0,1] \otimes \ldots \otimes [0,1]\bigr) = 1$.

We conclude this subsection on Borel algebras and Lebesgue measure by mentioning a few characteristic and illustrative examples:

• Any closed interval $[a,b]$ of real numbers is Lebesgue measurable, and its Lebesgue measure is the length $b - a$. The open interval $]a,b[$ has the same measure, since the difference between the two sets consists of the two endpoints $a$ and $b$ only and has measure zero.

• Any Cartesian product of intervals $[a,b]$ and $[c,d]$ is Lebesgue measurable, and its Lebesgue measure is $(b-a) \cdot (d-c)$, the area of the corresponding rectangle.

• The Lebesgue measure of the set of rational numbers in an interval of the line is zero, although this set is dense in the interval.

• The Cantor set¹³ is an example of an uncountable set that has Lebesgue measure zero.

• Vitali sets are examples of sets that are not measurable with respect to the Lebesgue measure.

In the forthcoming sections we make use of the fact that sets on the real axis become countable and Lebesgue measurable if rational numbers are chosen as beginning and end points of intervals. Hence, for practical purposes we can work with real numbers with almost no restriction.

13 The Cantor set is generated from the interval [0, 1] by consecutively taking out the open middle third: $[0,1] \to [0,\tfrac{1}{3}] \cup [\tfrac{2}{3},1] \to [0,\tfrac{1}{9}] \cup [\tfrac{2}{9},\tfrac{1}{3}] \cup [\tfrac{2}{3},\tfrac{7}{9}] \cup [\tfrac{8}{9},1] \to \ldots$. An explicit formula for the set is $C = [0,1] \setminus \bigcup_{m=1}^{\infty} \bigcup_{k=0}^{3^{m-1}-1} \bigl]\tfrac{3k+1}{3^m}, \tfrac{3k+2}{3^m}\bigr[$.

2.3.3 Random variables on uncountable sets

Sufficient for dealing with random variables on uncountable sets is a probability triple $(\Omega, \mathcal{F}, P)$. The sets in $\mathcal{F}$, the Borel $\sigma$-algebra, are measurable and they alone have probabilities. We are now in the position to handle probabilities on uncountable sets:

$$\{\omega \,|\, \mathcal{X}(\omega) \leq x\} \in \mathcal{F} \quad\text{and}\quad P(\mathcal{X} \leq x) = \frac{|\{\mathcal{X}(\omega) \leq x\}|}{|\Omega|} \,, \tag{2.24a}$$
$$\{a < \mathcal{X} \leq b\} = \{\mathcal{X} \leq b\} - \{\mathcal{X} \leq a\} \in \mathcal{F} \quad\text{with } a < b \,, \tag{2.24b}$$
$$P(a < \mathcal{X} \leq b) = \frac{|\{a < \mathcal{X} \leq b\}|}{|\Omega|} = F(b) - F(a) \,. \tag{2.24c}$$

Equation (2.24a) contains the definition of a real-valued function $\mathcal{X}$ that is called a random variable iff $P(\mathcal{X} \leq x)$ is defined for any real number $x$, equation (2.24b) is valid since $\mathcal{F}$ is closed under differences, and finally equation (2.24c) provides the basis for defining and handling probabilities on uncountable sets. The three equations (2.24) together constitute the basis of the probability concept on uncountable sample spaces that will be applied throughout this course.

Random variables on uncountable sets $\Omega$ are commonly characterized by probability density functions (pdf). The probability density function – or density for short – is the continuous analogue of the (discrete) probability mass function (pmf). A density is a function $f$ on $\mathbb{R} = ]-\infty, +\infty[$, $u \to f(u)$, which satisfies the two conditions:

(i) $\forall\, u:\ f(u) \geq 0$, and

(ii) $\int_{-\infty}^{+\infty} f(u)\, du = 1$.

Now we can define a class of random variables¹⁴ on general sample spaces: $\mathcal{X}$ is a function on $\Omega$, $\omega \to \mathcal{X}(\omega)$, whose probabilities are prescribed by means of a density function $f(u)$. For any interval $[a,b]$ the probability is given by

$$P(a \leq \mathcal{X} \leq b) = \int_a^b f(u)\, du \,. \tag{2.25}$$

14 Random variables having a density are often called continuous in order to distinguish them from discrete random variables defined on countable sample spaces.

If $A$ is a union of not necessarily disjoint intervals (some of which may even be infinite), the probability can be derived in general from the density,

$$P(\mathcal{X} \in A) = \int_A f(u)\, du \,;$$

in particular, $A$ can be split into disjoint intervals, $A = \bigcup_{j=1}^{k} [a_j, b_j]$, and then the integral can be rewritten as

$$\int_A f(u)\, du = \sum_{j=1}^{k} \int_{a_j}^{b_j} f(u)\, du \,.$$

For $A = ]-\infty, x]$ we obtain the (cumulative probability) distribution function $F(x)$ of the continuous random variable $\mathcal{X}$,

$$F(x) = P(\mathcal{X} \leq x) = \int_{-\infty}^{x} f(u)\, du \,.$$

If $f$ is continuous, then it is the derivative of $F$, as follows from the fundamental theorem of calculus,

$$F'(x) = \frac{dF(x)}{dx} = f(x) \,.$$

If the density $f$ is not continuous everywhere, the relation is still true for every $x$ at which $f$ is continuous. If the random variable $\mathcal{X}$ has a density, then we find by setting $a = b = x$

$$P(\mathcal{X} = x) = \int_x^x f(u)\, du = 0 \,,$$

reflecting the trivial geometric result that every line segment has zero area. It seems somewhat paradoxical that $\mathcal{X}(\omega)$ must be some number for every $\omega$ whereas any given number has probability zero. The paradox can be resolved by looking at countable and uncountable sets in more depth.

Extension to two variables $\mathcal{X}$ and $\mathcal{Y}$, forming a random vector $(\mathcal{X}, \mathcal{Y})$, yields the joint probability distribution

$$P(\mathcal{X} \leq x, \mathcal{Y} \leq y) = \int_{-\infty}^{x} \int_{-\infty}^{y} f(u,v)\, du\, dv \,. \tag{2.26}$$

Again we have to restrict the definition of probabilities to Borel sets $S$, which could be, for example, polygons filling the two-dimensional plane,

$$P\bigl( (\mathcal{X}, \mathcal{Y}) \in S \bigr) = \iint_S f(u,v)\, du\, dv \,.$$
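As a quick numerical illustration of equations (2.24c) and (2.25), the following sketch checks that integrating a density over an interval reproduces the difference of the distribution function; the standard normal density and the interval limits are arbitrary choices made purely for illustration.

# Sketch: P(a <= X <= b) from a density and from the cdf difference F(b) - F(a).
# The standard normal density is an illustrative assumption.
import numpy as np
from scipy import integrate
from scipy.stats import norm

f = norm.pdf          # density f(u)
F = norm.cdf          # distribution function F(x)

a, b = -1.0, 2.0
p_density, _ = integrate.quad(f, a, b)   # P(a <= X <= b) = integral of f over [a, b]
p_cdf = F(b) - F(a)                      # same probability via eq. (2.24c)

print(p_density, p_cdf)   # both ~ 0.8186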

Figure 2.9: Discretization of a probability density. The segment $[x_1, x_m]$ on the $u$-axis is divided up into $m-1$ not necessarily equal intervals and elementary probabilities are obtained by integration.

The joint density function f satisfies the following conditions:

(i) $\forall\, (u,v):\ f(u,v) \geq 0$, and

(ii) $\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(u,v)\, du\, dv = 1$.

Condition (ii) implies that $f$ is integrable over the whole plane. As in the discrete case we may compute the probabilities of the individual variables from the marginal density functions, $u \to f(u,*)$ and $v \to f(*,v)$, respectively:

$$P(\mathcal{X} \leq x) = \int_{-\infty}^{x} f(u,*)\, du \quad\text{where}\quad f(u,*) = \int_{-\infty}^{+\infty} f(u,v)\, dv \,,$$
$$P(\mathcal{Y} \leq y) = \int_{-\infty}^{y} f(*,v)\, dv \quad\text{where}\quad f(*,v) = \int_{-\infty}^{+\infty} f(u,v)\, du \,. \tag{2.27}$$

In the most general case we may also define a joint distribution function $F$ of $(\mathcal{X}, \mathcal{Y})$ by

$$F(x,y) = P(\mathcal{X} \leq x, \mathcal{Y} \leq y) \quad\text{for all } (x,y) \,,$$

and obtain the marginal distribution functions as limits:

$$\lim_{y \to \infty} F(x,y) = F(x,\infty) = P(\mathcal{X} \leq x, \mathcal{Y} < \infty) = P(\mathcal{X} \leq x) \quad\text{and}$$
$$\lim_{x \to \infty} F(x,y) = F(\infty,y) = P(\mathcal{X} < \infty, \mathcal{Y} \leq y) = P(\mathcal{Y} \leq y) \,. \tag{2.28}$$

We note that the relations $\mathcal{X} < \infty$ and $\mathcal{Y} < \infty$ put no restrictions on the variables $\mathcal{X}$ and $\mathcal{Y}$.

Let us finally consider the process of discretization of a density function in order to yield a set of elementary probabilities. The $x$-axis is divided up into $m+1$ pieces (figure 2.9), not necessarily equal and not necessarily small, and we denote the piece of the integral between $x_n$ and $x_{n+1}$ by

$$p_n = \int_{x_n}^{x_{n+1}} f(u)\, du \,, \quad 0 \leq n \leq m \,. \tag{2.29}$$

When $x_0 = -\infty$ and $x_{m+1} = +\infty$ we have

$$\forall\, n:\ p_n \geq 0 \quad\text{and}\quad \sum_n p_n = 1 \,.$$

The partition is not finite but countable, provided we label the intervals suitably, for example $\ldots, p_{-2}, p_{-1}, p_0, p_1, p_2, \ldots$. Now we consider a random variable $\mathcal{Y}$ such that

$$P(\mathcal{Y} = x_n) = p_n \,, \tag{2.29'}$$

where we may replace $x_n$ by any number in the subinterval $[x_n, x_{n+1}]$. The random variable $\mathcal{Y}$ can be interpreted as the discrete analogue of the random variable $\mathcal{X}$.

We end by presenting in table 2.1 a comparison between probability measures on countable and uncountable sample spaces, where the latter are based on a probability density $f(u)$.¹⁵

Table 2.1: Comparison of the formalism of probability theory on countable and uncountable sample spaces.

                                     Countable case                      Uncountable case
Range                                $v_n$, $n = 1, 2, \ldots$           $-\infty < u < +\infty$
Probability element                  $p_n$                               $f(u)\,du = dF(u)$
$P(a \leq \mathcal{X} \leq b)$       $\sum_{a \leq v_n \leq b} p_n$      $\int_a^b f(u)\,du$
$P(\mathcal{X} \leq x) = F(x)$       $\sum_{v_n \leq x} p_n$             $\int_{-\infty}^{x} f(u)\,du$
$E(\mathcal{X})$                     $\sum_n p_n v_n$                    $\int_{-\infty}^{\infty} u f(u)\,du$
proviso                              $\sum_n p_n |v_n| < \infty$         $\int_{-\infty}^{\infty} |u| f(u)\,du < \infty$

15 Expectation values E(X) and higher moments of probability distributions are discussed in section 2.5.
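A small sketch of the discretization in equation (2.29): a hypothetical standard normal density and an arbitrary grid are used, and the resulting elementary probabilities $p_n$ sum to one.

# Sketch: discretize a density f(u) into elementary probabilities p_n (eq. 2.29).
# The standard normal density and the grid below are illustrative assumptions.
import numpy as np
from scipy import integrate
from scipy.stats import norm

grid = np.array([-np.inf, -2.0, -0.5, 0.0, 1.0, 3.0, np.inf])  # x_0, ..., x_{m+1}

p = np.array([integrate.quad(norm.pdf, a, b)[0]
              for a, b in zip(grid[:-1], grid[1:])])

print(p)          # elementary probabilities p_n >= 0
print(p.sum())    # ~ 1.0, as required below eq. (2.29)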

2.3.4 Limits of series of random variables

Limits of sequences of random variables are required for convergence problems and for approximations to random variables. A difficulty arises because there is ambiguity in the definition of such limits. A sequence of random variables, $\mathcal{X}_n$, is defined on a probability space $\Omega$ and is assumed to have the limit

$$\mathcal{X} = \lim_{n \to \infty} \mathcal{X}_n \,. \tag{2.30}$$

The probability space $\Omega$, we assume now, has elements $\omega$ which have a probability density $p(\omega)$. Four different definitions of the limit are common in probability theory [6, pp. 40, 41].

Almost certain limit. The sequence $\mathcal{X}_n$ converges almost certainly to $\mathcal{X}$ if for all $\omega$ except a set of probability zero

$$\mathcal{X}(\omega) = \lim_{n \to \infty} \mathcal{X}_n(\omega) \tag{2.31}$$

is fulfilled, and each realization of $\mathcal{X}_n$ converges to $\mathcal{X}$.

Limit in the mean. The limit in the mean or mean square limit of a sequence requires that the mean square deviation of $\mathcal{X}_n(\omega)$ from $\mathcal{X}(\omega)$ vanishes in the limit, and the condition is

$$\lim_{n \to \infty} \int_\Omega d\omega\, p(\omega)\, \bigl( \mathcal{X}_n(\omega) - \mathcal{X}(\omega) \bigr)^2 \;\equiv\; \lim_{n \to \infty} \bigl\langle (\mathcal{X}_n - \mathcal{X})^2 \bigr\rangle = 0 \,. \tag{2.32}$$

The mean square limit is the standard limit in Hilbert space theory and it is commonly used in quantum mechanics.

Stochastic limit. The limit in probability, also called the stochastic limit, fulfils the condition: $\mathcal{X}$ is the stochastic limit if for any $\varepsilon > 0$ the relation

$$\lim_{n \to \infty} P\bigl( |\mathcal{X}_n - \mathcal{X}| > \varepsilon \bigr) = 0 \tag{2.33}$$

holds.

Limit in distribution. Probability theory also uses a weaker form of convergence than the previous three limits, the limit in distribution, which requires that for any continuous and bounded function $f(x)$ the relation

$$\lim_{n \to \infty} \langle f(\mathcal{X}_n) \rangle = \langle f(\mathcal{X}) \rangle \tag{2.34}$$

holds. This limit is particularly useful for characteristic functions, $f(x) = \exp(\mathrm{i}\, x s)$: if two characteristic functions approach each other, then the probability density of $\mathcal{X}_n$ converges to that of $\mathcal{X}$.
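The following sketch illustrates the mean square limit (2.32) and the stochastic limit (2.33) empirically for sample means of uniform random numbers, whose limit is the expectation value 1/2; the sample sizes, the tolerance ε and the number of repetitions are arbitrary choices.

# Sketch: empirical check of the mean-square limit (2.32) and the stochastic
# limit (2.33) for X_n = (1/n) * sum of n uniform(0,1) variables, X = 1/2.
import numpy as np

rng = np.random.default_rng(1)
limit, eps, repeats = 0.5, 0.05, 10_000

for n in (10, 100, 1000):
    x_n = rng.random((repeats, n)).mean(axis=1)       # realizations of X_n
    mean_square = np.mean((x_n - limit) ** 2)         # <(X_n - X)^2>
    prob_exceed = np.mean(np.abs(x_n - limit) > eps)  # P(|X_n - X| > eps)
    print(n, mean_square, prob_exceed)                # both decrease with n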

2.3.5 Stieltjes and Lebesgue integration

This subsection provides a short recapitulation of some generalizations of the conventional Riemann integral which are important in probability theory. We start with a sketch comparing the Riemann and the Lebesgue approach to integration, presented in figure 2.10. One difference between the two integration methods for a non-negative function – like the probability functions – can be visualized in three-dimensional space: the volume below a surface given by the non-negative function $g(x,y)$ is measured in the Riemann case by summation of the volumes of cuboids over squares of edge length $\Delta d$, whereas the Lebesgue integral sums the volumes of layers of thickness $\Delta d$ between constant level sets.

The Stieltjes integral is an important generalization of Riemannian integration,

$$\int_a^b g(x)\, dh(x) \,. \tag{2.35}$$

Herein $g(x)$ is the integrand and $h(x)$ is the integrator, and the conventional Riemann integral is obtained for $h(x) = x$. The integrator can be visualized best as a weighting function for the integrand.

Figure 2.10: Comparison of Riemann and Lebesgue integrals. In the conventional Riemann-Darboux integration^a the integrand is embedded between an upper sum (light blue) and a lower sum (blue) of rectangles. The integral exists iff the upper sum and the lower sum converge to the integrand in the limit $\Delta d \to 0$. The Lebesgue integral can be visualized as an approach to calculating the area enclosed by the x-axis and the integrand through partitioning into horizontal stripes (red) and considering the limit $\Delta d \to 0$. The definite integral $\int_a^b g(x)\,dx$ confines the integrand to a closed interval: $[a,b]$ or $a \leq x \leq b$.
^a The concept of representing the integral by the convergence of two sums is due to the French mathematician Gaston Darboux. A function is Darboux integrable iff it is Riemann integrable, and the values of the Riemann and the Darboux integral are equal in case they exist.

In case $g(x)$ and $h(x)$ are continuous and continuously differentiable the Stieltjes integral can be resolved by partial integration:

$$\int_a^b g(x)\, dh(x) = \int_a^b g(x)\, \frac{dh(x)}{dx}\, dx = \Bigl[ g(x)\, h(x) \Bigr]_{x=a}^{b} - \int_a^b h(x)\, \frac{dg(x)}{dx}\, dx = g(b)\, h(b) - g(a)\, h(a) - \int_a^b h(x)\, \frac{dg(x)}{dx}\, dx \,.$$

The integrator $F(x)$ may also be a step function: for $g(x)$ being continuous and $F(x)$ making jumps at the points $x_1, \ldots, x_n \in\, ]a,b[$ with the heights $\Delta F_1, \ldots, \Delta F_n \in \mathbb{R}$, and $\sum_{i=1}^{n} \Delta F_i \leq 1$, the Stieltjes integral is of the form

$$\int_a^b g(x)\, dF(x) = \sum_{i=1}^{n} g(x_i)\, \Delta F_i \,. \tag{2.36}$$

With $g(x) = 1$ and in the limit $a \to -\infty$ the integral becomes identical with the (discrete) cumulative probability distribution function (cdf).

Riemann-Stieltjes integration is used in probability theory to compute, for example, moments of probability densities (section 2.5). If $F(x)$ is the cumulative probability distribution of a random variable $\mathcal{X}$, the expected value (see section 2.5) of any function $g(\mathcal{X})$ is obtained from

$$E\bigl(g(\mathcal{X})\bigr) = \int_{-\infty}^{+\infty} g(x)\, dF(x) = \sum_i g(x_i)\, \Delta F_i \,,$$

and this is the equation for the discrete case. If the random variable $\mathcal{X}$ has a probability density $f(x) = dF(x)/dx$ with respect to the Lebesgue measure, continuous integration can be used,

$$E\bigl(g(\mathcal{X})\bigr) = \int_{-\infty}^{+\infty} g(x)\, f(x)\, dx \,.$$

Important special cases are the moments: $E(\mathcal{X}^n) = \int_{-\infty}^{+\infty} x^n\, dF(x)$.

Lebesgue theory of integration assumes the existence of a probability space defined by the triple $(\Omega, \mathcal{F}, \mu)$ representing the sample space $\Omega$, a $\sigma$-algebra $\mathcal{F}$ of subsets $A \subseteq \Omega$, and a (non-negative) probability measure $\mu$ satisfying $\mu(\Omega) = 1$. Lebesgue integrals are defined for measurable functions $g$ fulfilling

$$g^{-1}\bigl([a,b]\bigr) \in \mathcal{F} \quad\text{for all } a < b \,.$$

This condition is equivalent to the requirement that the pre-image of any Borel subset $[a,b]$ of $\mathbb{R}$ is an element of the event system $\mathcal{F}$. The set of measurable functions is closed under algebraic operations and also closed under certain pointwise sequential limits like $\sup_{k \in \mathbb{N}} g_k$, $\liminf_{k \in \mathbb{N}} g_k$ or $\limsup_{k \in \mathbb{N}} g_k$, which are measurable if the sequence of functions $(g_k)_{k \in \mathbb{N}}$ contains only measurable functions.

The construction of an integral $\int_\Omega g\, d\mu = \int_\Omega g(x)\, \mu(dx)$ is done in steps, and we begin with the indicator function

$$\mathbf{1}_A(x) = \begin{cases} 1 & \text{iff } x \in A \,, \\ 0 & \text{otherwise} \,, \end{cases} \tag{2.38}$$

which provides a possibility to define the integral over $A \in \mathcal{B}^n$ by

$$\int_A g(x)\, dx := \int \mathbf{1}_A(x)\, g(x)\, dx$$

and which assigns a volume to Lebesgue measurable sets by setting $g \equiv 1$,

$$\int \mathbf{1}_A\, d\mu = \mu(A) \,,$$

which is the Lebesgue measure $\mu(A) = \lambda(A)$ for a mapping $\lambda: \mathcal{B} \to \mathbb{R}$.

Simple functions are finite linear combinations of indicator functions, $g = \sum_j \alpha_j \mathbf{1}_{A_j}$. They are measurable if the coefficients $\alpha_j$ are real numbers and the sets $A_j$ are measurable subsets of $\Omega$. For non-negative coefficients $\alpha_j$ the linearity property of the integral leads to a measure for non-negative simple functions:

$$\int \Bigl( \sum_j \alpha_j \mathbf{1}_{A_j} \Bigr) d\mu = \sum_j \alpha_j \int \mathbf{1}_{A_j}\, d\mu = \sum_j \alpha_j\, \mu(A_j) \,.$$

Often a simple function can be written in several ways as a linear combination of indicator functions, but the value of the integral will always be the same. Sometimes some care is needed in the construction of a real-valued simple function $g = \sum_j \alpha_j \mathbf{1}_{A_j}$ in order to avoid undefined expressions of the kind $\infty - \infty$. Choosing $\alpha_i = 0$ implies that $\alpha_i\, \mu(A_i) = 0$, because $0 \cdot \infty = 0$ by convention in measure theory.

An arbitrary non-negative function $g: (\Omega, \mathcal{F}, \mu) \to (\mathbb{R}^+, \mathcal{B}, \lambda)$ is measurable iff there exists a sequence of simple functions $(g_k)_{k \in \mathbb{N}}$ that converges pointwise¹⁶ and monotonically increasing to $g$. The Lebesgue integral of a non-negative and measurable function $g$ is defined by

$$\int_\Omega g\, d\mu = \lim_{k \to \infty} \int_\Omega g_k\, d\mu \tag{2.39}$$

with $g_k$ being simple functions that converge pointwise and monotonically towards $g$. The limit is independent of the particular choice of the functions $g_k$. Such a sequence of simple functions is easily visualized, for example, by the bands below the function $g(x)$ in figure 2.10: the band widths $\Delta d$ decrease and converge to zero as the index increases, $k \to \infty$.

16 Pointwise convergence of a sequence of functions {f_n}, lim_{n→∞} f_n = f pointwise, is fulfilled iff lim_{n→∞} f_n(x) = f(x) for every x in the domain.

The extension to general functions with positive and negative value domains is straightforward. There is one important major difference between Riemann and Lebesgue integration: the contribution to the Riemann integral changes sign when the function changes sign, whereas all partial areas enclosed between the function and the axis of integration are summed up in the Lebesgue integral. The partial integrals $\int_0^t \cos x\, dx = \sin t$ of the improper Riemann integral $\int_0^\infty \cos x\, dx$ oscillate between the limit inferior $-1$ and the limit superior $+1$, whereas the corresponding Lebesgue integral does not exist.

For $\Omega = \mathbb{R}$ and the Lebesgue measure $\lambda$ the following holds: functions that are Riemann integrable on a compact interval $[a,b]$ are Lebesgue integrable too, and the values of both integrals are the same. The converse is not true: not every Lebesgue integrable function is Riemann integrable (see the Dirichlet function below). A function that has an improper Riemann integral need not be Lebesgue integrable on the whole domain. We consider one example for each case:

(i) The Dirichlet (step) function $D(x)$ is the characteristic function of the rational numbers and assumes the value 1 for rational $x$ and the value 0 for irrational $x$:

$$D(x) = \begin{cases} 1 & \text{if } x \in \mathbb{Q} \,, \\ 0 & \text{otherwise} \,, \end{cases} \qquad\text{or}\qquad D(x) = \lim_{m \to \infty} \lim_{n \to \infty} \cos^{2n}(m!\, \pi\, x) \,.$$

$D(x)$ lacks Riemann integrability on every arbitrarily small interval: each partitioning $S$ of the integration domain $[a,b]$ into intervals $[x_{k-1}, x_k]$ leads to parts that necessarily contain at least one rational and one irrational number. Hence the lower Darboux sum,

$$\Sigma_{\mathrm{low}}(S) = \sum_{k=1}^{n} (x_k - x_{k-1}) \inf_{x_{k-1} < x < x_k} D(x) = 0 \,,$$

and the upper Darboux sum,

$$\Sigma_{\mathrm{high}}(S) = \sum_{k=1}^{n} (x_k - x_{k-1}) \sup_{x_{k-1} < x < x_k} D(x) = b - a \,,$$

never converge to a common value and the Riemann integral does not exist. The Lebesgue integral over a set $S$, on the other hand, is readily evaluated:

$$\int_S D\, d\lambda = 0 \cdot \lambda(S \cap \mathbb{R} \setminus \mathbb{Q}) + 1 \cdot \lambda(S \cap \mathbb{Q}) \,,$$

with $\lambda$ being the Lebesgue measure. The evaluation of the integral is straightforward: the first term vanishes because multiplication by zero yields zero no matter how large $\lambda(S \cap \mathbb{R} \setminus \mathbb{Q})$ is, since $0 \cdot \infty$ is zero by the convention of measure theory, and the second term is also zero because $\lambda(S \cap \mathbb{Q})$ is zero, since the set of rational numbers $\mathbb{Q}$ is countable. Hence we have $\int_S D\, d\lambda = 0$.

(ii) The step function with alternatingly positive and negative areas of size $\tfrac{1}{n}$, $(1, -\tfrac{1}{2}, \tfrac{1}{3}, -\tfrac{1}{4}, \ldots)$ (see figure 2.11), is an example of a function that has an improper Riemann integral whereas the Lebesgue integral diverges. The function is $h(x) = (-1)^{k+1}/k$ with $k-1 \leq x < k$.

Figure 2.11: The alternating harmonic series. The alternating harmonic step function, $h(x) = n_k = (-1)^{k+1}/k$ with $k-1 \leq x < k$ and $k \in \mathbb{N}$, has an improper Riemann integral since $\sum_{k=1}^{\infty} n_k = \ln 2$. It is not Lebesgue integrable because the series $\sum_{k=1}^{\infty} |n_k|$ diverges.

A right-hand continuous, monotonically increasing function $F: \mathbb{R} \to \mathbb{R}$ induces a measure $\lambda_F$ on the Borel sets through $\lambda_F(]a,b]) = F(b) - F(a)$; such functions are therefore called measure generating. The Lebesgue integral of a $\lambda_F$ integrable function $g$ is called the Lebesgue-Stieltjes integral,

$$\int_A g\, d\lambda_F \quad\text{with } A \in \mathcal{B} \tag{2.40}$$

being Borel measurable. If $F$ is the identity function on $\mathbb{R}$,¹⁷ $F = \mathrm{id}: \mathbb{R} \to \mathbb{R}$, $\mathrm{id}(x) = x$, then the corresponding Lebesgue-Stieltjes measure is the Lebesgue measure itself: $\lambda_F = \lambda_{\mathrm{id}} = \lambda$.

For (proper) Riemann integrable functions $g$ we have stated that the Lebesgue integral is identical with the Riemann integral:

$$\int_{[a,b]} g\, d\lambda = \int_a^b g(x)\, dx \,.$$

The interval $[a,b] = \{a \leq x \leq b\}$ is partitioned into a sequence $\sigma_n = (a = x_0^{(n)}, x_1^{(n)}, \ldots, x_r^{(n)} = b)$, where the superscript '(n)' indicates a Riemann sum with $|\sigma_n| \to 0$, and the Riemann integral on the right-hand side is replaced by the limit of the Riemann summation:

$$\int_{[a,b]} g\, d\lambda = \lim_{n \to \infty} \sum_{k=1}^{r} g(x_{k-1}^{(n)}) \bigl( x_k^{(n)} - x_{k-1}^{(n)} \bigr) = \lim_{n \to \infty} \sum_{k=1}^{r} g(x_{k-1}^{(n)}) \bigl( \mathrm{id}(x_k^{(n)}) - \mathrm{id}(x_{k-1}^{(n)}) \bigr) \,.$$

The Lebesgue measure $\lambda$ has been introduced above as the special case $F = \mathrm{id}$, and therefore we find for the Stieltjes-Lebesgue integral, by replacing $\lambda$ by $\lambda_F$ and 'id' by $F$,

$$\int_{[a,b]} g\, d\lambda_F = \lim_{n \to \infty} \sum_{k=1}^{r} g(x_{k-1}^{(n)}) \bigl( F(x_k^{(n)}) - F(x_{k-1}^{(n)}) \bigr) \,.$$

The details of the derivation are found in [44, 45].

17 The identity function id(x) = x maps a domain, for example [a,b], point by point onto itself.

In summary we define a Stieltjes-Lebesgue integral or $F$-integral as follows: $F, g: \mathbb{R} \to \mathbb{R}$ are two functions, the interval $[a,b]$ is partitioned by the sequence $\sigma = (a = x_0, x_1, \ldots, x_r = b)$, and we set

$$\sum_\sigma g\, dF := \sum_{k=1}^{r} g(x_{k-1}) \bigl( F(x_k) - F(x_{k-1}) \bigr) \,.$$

The function $g$ is $F$-integrable on $[a,b]$ if

$$\int_a^b g\, dF := \lim_{|\sigma| \to 0} \sum_\sigma g\, dF \tag{2.41}$$

exists in $\mathbb{R}$, and $\int_a^b g\, dF$ is called the Stieltjes-Lebesgue integral or $F$-integral of $g$. This formulation will be required for the presentation of the Itô integral used in Itô calculus in section 3.
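A minimal numerical sketch of the $F$-integral (2.41): the Riemann-Stieltjes sum over a refining partition is evaluated for the illustrative choice $g(x) = x$ and $F$ the cdf of the uniform distribution on [0, 1], for which the exact value $E(\mathcal{X}) = 1/2$ is known.

# Sketch: F-integral (eq. 2.41) as the limit of Riemann-Stieltjes sums
# sum_k g(x_{k-1}) (F(x_k) - F(x_{k-1})).  Illustrative choices:
# g(x) = x and F the uniform cdf on [0, 1], so the exact value is 1/2.
import numpy as np

def stieltjes_sum(g, F, a, b, r):
    x = np.linspace(a, b, r + 1)                  # partition sigma
    return np.sum(g(x[:-1]) * np.diff(F(x)))      # sum_sigma g dF

g = lambda x: x
F = lambda x: np.clip(x, 0.0, 1.0)                # cdf of the uniform distribution

for r in (10, 100, 1000, 10000):
    print(r, stieltjes_sum(g, F, 0.0, 1.0, r))    # converges to 0.5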

2.4 Conditional probabilities and independence

The conventional probability has been defined on the entire sample space¹⁸ $\Omega$, $P(A) = |A|/|\Omega| = \sum_{\omega \in A} P(\omega) \big/ \sum_{\omega \in \Omega} P(\omega)$. We shall now define a probability of a set $A$ relative to another set, say $S$. This means that we are interested in the proportional weight of the part of $A$ in $S$, which is expressed by the intersection $A \cap S$ relative to $S$, and obtain

$$\sum_{\omega \in A \cap S} P(\omega) \Big/ \sum_{\omega \in S} P(\omega) \,.$$

In other words, we switch from $\Omega$ to $S$ as the new universe and consider the conditional probability of $A$ relative to $S$:

$$P(A|S) = \frac{P(A \cap S)}{P(S)} = \frac{P(AS)}{P(S)} \,, \tag{2.42}$$

provided $P(S) \neq 0$. Apparently, the conditional probability vanishes when the intersection is empty: $P(A|S) = 0$ if $A \cap S = \emptyset$. From here on we shall always use the short notation for the intersection, $AS \equiv A \cap S$.

18 The sample space Ω is assumed to be countable and the weight P(ω) = P({ω}) is assigned to every point. Generalization to Lebesgue measures is straightforward.

Next we mention several simple but fundamental relations involving conditional probabilities that we present here, in essence, without proof (for details see [5], pp. 111-144). For $n$ arbitrary events $A_i$ we have

$$P(A_1, A_2, \ldots, A_n) = P(A_1)\, P(A_2|A_1)\, P(A_3|A_1 A_2) \cdots P(A_n|A_1 A_2 \cdots A_{n-1})$$

provided $P(A_1 A_2 \cdots A_{n-1}) > 0$. Under this proviso all conditional probabilities are well defined since

$$P(A_1) \geq P(A_1 A_2) \geq \ldots \geq P(A_1 A_2 \cdots A_{n-1}) > 0 \,.$$

Let us assume that the sample space $\Omega$ is partitioned into $n$ disjoint sets, $\Omega = \sum_n A_n$. For any set $B$ we then have

$$P(B) = \sum_n P(A_n)\, P(B|A_n) \,.$$

From this relation it is straightforward to derive the conditional probability

$$P(A_j|B) = \frac{P(A_j)\, P(B|A_j)}{\sum_n P(A_n)\, P(B|A_n)}$$

provided $P(B) > 0$.

Independence of random variables will be a highly relevant issue in the forthcoming chapters. Countably-valued random variables $\mathcal{X}_1, \ldots, \mathcal{X}_n$ are defined to be independent if and only if for any combination $x_1, \ldots, x_n$ of real numbers the joint probabilities can be factorized:

$$P(\mathcal{X}_1 = x_1, \ldots, \mathcal{X}_n = x_n) = P(\mathcal{X}_1 = x_1) \cdots P(\mathcal{X}_n = x_n) \,. \tag{2.43}$$

A major extension of equation (2.43) replaces the single values $x_i$ by arbitrary sets $S_i$:

$$P(\mathcal{X}_1 \in S_1, \ldots, \mathcal{X}_n \in S_n) = P(\mathcal{X}_1 \in S_1) \cdots P(\mathcal{X}_n \in S_n) \,.$$

In order to prove this extension we sum over all points belonging to the sets $S_1, \ldots, S_n$:

$$\sum_{x_1 \in S_1} \cdots \sum_{x_n \in S_n} P(\mathcal{X}_1 = x_1, \ldots, \mathcal{X}_n = x_n) = \sum_{x_1 \in S_1} \cdots \sum_{x_n \in S_n} P(\mathcal{X}_1 = x_1) \cdots P(\mathcal{X}_n = x_n) = \Bigl( \sum_{x_1 \in S_1} P(\mathcal{X}_1 = x_1) \Bigr) \cdots \Bigl( \sum_{x_n \in S_n} P(\mathcal{X}_n = x_n) \Bigr) = P(\mathcal{X}_1 \in S_1) \cdots P(\mathcal{X}_n \in S_n) \,,$$

which is equal to the right-hand side of the equation to be proven.

Since the factorization is fulfilled for arbitrary sets $S_1, \ldots, S_n$, it holds also for all subsets of $(\mathcal{X}_1, \ldots, \mathcal{X}_n)$, and accordingly the events $\{\mathcal{X}_1 \in S_1\}, \ldots, \{\mathcal{X}_n \in S_n\}$ are also independent. It can also be verified that for arbitrary real-valued functions $\varphi_1, \ldots, \varphi_n$ on $(-\infty, +\infty)$ the random variables $\varphi_1(\mathcal{X}_1), \ldots, \varphi_n(\mathcal{X}_n)$ are independent too.

Independence can be extended in a straightforward manner to the joint distribution function of the random vector $(\mathcal{X}_1, \ldots, \mathcal{X}_n)$,

$$F(x_1, \ldots, x_n) = F_1(x_1) \cdots F_n(x_n) \,,$$

where the $F_j$'s are the marginal distributions of the $\mathcal{X}_j$'s, $1 \leq j \leq n$. Thus, the marginal distributions determine the joint distribution in case of independence of the random variables. For the continuous case we can formulate the definition of independence for the sets $S_1, \ldots, S_n$ forming a Borel family. In particular, if there is a joint density function $f$, then we have

$$P(\mathcal{X}_1 \in S_1, \ldots, \mathcal{X}_n \in S_n) = \Bigl( \int_{S_1} f_1(u)\, du \Bigr) \cdots \Bigl( \int_{S_n} f_n(u)\, du \Bigr) = \int_{S_1} \cdots \int_{S_n} f_1(u_1) \cdots f_n(u_n)\, du_1 \cdots du_n \,,$$

where $f_1, \ldots, f_n$ are the marginal densities. The probability is also equal to

$$\int_{S_1} \cdots \int_{S_n} f(u_1, \ldots, u_n)\, du_1 \cdots du_n \,,$$

and hence we finally find for the density case:

$$f(u_1, \ldots, u_n) = f_1(u_1) \cdots f_n(u_n) \,. \tag{2.44}$$

As we have seen here, stochastic independence makes it possible to factorize joint probabilities, distributions, or densities. Applications of conditional probabilities to problems in biology are found in chapter 5. Genetics was indeed one of the first cases in science where probabilities were used in the interpretation of experimental results (see, for example, the works of Gregor Mendel as described in [46, 47] and section 1.2).

Finally, we mention that a whole branch of probability theory, Bayesian statistics, is based on conditional probabilities. It is named after the English mathematician and Presbyterian minister Thomas Bayes, who initiated an alternative way to think about probabilities through the formulation of Bayes's theorem about a hypothesis $H$ and data $D$ [48]:

$$P(H|D) = \frac{P(D|H)\, P(H)}{P(D)} \,, \tag{2.45}$$

wherein $P(H)$ is the prior probability that $H$ is correct before the data are seen, and $P(D)$ is the marginal probability of $D$, giving the probability of witnessing the data under all possible hypotheses and as such depending on the prior probabilities assigned to them,

$$P(D) = \sum_i P(D, H_i) = \sum_i P(D|H_i)\, P(H_i) \,;$$

$P(D|H)$ is the conditional probability of seeing the data $D$ given that the hypothesis $H$ is true, and $P(H|D)$ eventually is the posterior probability, the probability that the hypothesis $H$ is true given the data $D$ and the previous state of belief about the hypothesis (for the current status of Bayesian statistics see, for example, [49-51]). Bayesian statistics is thus dealing with statistical inference rather than the conventional frequency-based interpretation of probabilities and accordingly is much closer to formal logic.
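A minimal sketch of equation (2.45) with invented numbers: two hypotheses with assumed priors and likelihoods are updated to posteriors after one observation.

# Sketch of Bayes's theorem (2.45) for two competing hypotheses.
# Priors and likelihoods below are invented purely for illustration.
priors      = {"H1": 0.7, "H2": 0.3}          # P(H_i)
likelihoods = {"H1": 0.2, "H2": 0.9}          # P(D | H_i) for the observed data D

evidence = sum(priors[h] * likelihoods[h] for h in priors)          # P(D)
posteriors = {h: priors[h] * likelihoods[h] / evidence for h in priors}

print(posteriors)   # {'H1': ~0.341, 'H2': ~0.659}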

2.5 Expectation values and higher moments

Random variables are accessible to analysis via their probabilities. In addition, straightforward information can also be derived from ensembles defined on the entire sample space $\Omega$. The most important example is the expectation value, $E(\mathcal{X}) = \langle \mathcal{X} \rangle$. We start with a countable sample space:

$$E(\mathcal{X}) = \sum_{\omega \in \Omega} \mathcal{X}(\omega)\, P(\omega) = \sum_n p_n v_n \,. \tag{2.46}$$

In the special case of a random variable $\mathcal{X}$ on $\mathbb{N}_0$ we have

$$E(\mathcal{X}) = \sum_{n=0}^{\infty} n\, p_n \,.$$

The expectation value (2.46) exists when the series converges in absolute values, $\sum_{\omega \in \Omega} |\mathcal{X}(\omega)|\, P(\omega) < \infty$. Whenever the random variable $\mathcal{X}$ is bounded, which means that there exists a number $m$ such that $|\mathcal{X}(\omega)| \leq m$ for all $\omega \in \Omega$, then it is summable and in fact

$$E(|\mathcal{X}|) = \sum_\omega |\mathcal{X}(\omega)|\, P(\omega) \leq m \sum_\omega P(\omega) = m \,.$$

It is straightforward to show that the sum of two random variables, $\mathcal{X} + \mathcal{Y}$, is summable iff $\mathcal{X}$ and $\mathcal{Y}$ are summable:

$$E(\mathcal{X} + \mathcal{Y}) = E(\mathcal{X}) + E(\mathcal{Y}) \,.$$

The relation can be extended to an arbitrary countable number of random variables:

$$E\Bigl( \sum_{k=1}^{n} \mathcal{X}_k \Bigr) = \sum_{k=1}^{n} E(\mathcal{X}_k) \,.$$

In addition, the expectation values fulfil the relations $E(a) = a$ and $E(a\,\mathcal{X}) = a\, E(\mathcal{X})$, which can be combined in

$$E\Bigl( \sum_{k=1}^{n} a_k \mathcal{X}_k \Bigr) = \sum_{k=1}^{n} a_k\, E(\mathcal{X}_k) \,. \tag{2.47}$$

Thus, $E(\cdot)$ is a linear operator.

For a random variable $\mathcal{X}$ on an arbitrary sample space $\Omega$ the expectation value may be written as an abstract integral on $\Omega$ or – provided the density $f(u)$ exists and we know it – as an integral over $\mathbb{R}$:

$$E(\mathcal{X}) = \int_\Omega \mathcal{X}(\omega)\, d\omega = \int_{-\infty}^{+\infty} u\, f(u)\, du \,. \tag{2.48}$$

It is worth reconsidering the discretization of a continuous density in this context (see figure 2.9 and section 2.3): the discrete expression for the expectation value is based upon $p_n = P(\mathcal{Y} = x_n)$ as outlined in equations (2.29) and (2.29'),

$$E(\mathcal{Y}) = \sum_n x_n\, p_n \;\approx\; E(\mathcal{X}) = \int_{-\infty}^{+\infty} u\, f(u)\, du \,,$$

and approximates the exact value in the sense of an approximation to the Riemann integral.
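A short sketch of the approximation just described, using for illustration an exponential density with rate 1 (so the exact expectation value is 1) and an arbitrary grid; finer grids move the discrete sum closer to the integral.

# Sketch: E(Y) = sum_n x_n p_n from the discretization (2.29) approximates
# E(X) = integral of u f(u) du.  The exponential density f(u) = exp(-u), u >= 0,
# is chosen for illustration; its exact expectation value is 1.
import numpy as np
from scipy import integrate

f = lambda u: np.exp(-u)

for m in (5, 50, 500):
    grid = np.linspace(0.0, 20.0, m + 1)                 # x_0, ..., x_m
    p = np.array([integrate.quad(f, a, b)[0] for a, b in zip(grid[:-1], grid[1:])])
    expectation = np.sum(grid[:-1] * p)                  # sum_n x_n p_n
    print(m, expectation)                                # approaches 1.0 as m grows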

2.5.1 First and second moments

For two or more variables, for example $(\mathcal{X}, \mathcal{Y})$ described by a joint density $f(u,v)$, we have

$$E(\mathcal{X}) = \int_{-\infty}^{+\infty} u\, f(u,*)\, du \quad\text{and}\quad E(\mathcal{Y}) = \int_{-\infty}^{+\infty} v\, f(*,v)\, dv \,.$$

The expectation value of the sum of the variables, $\mathcal{X} + \mathcal{Y}$, can be evaluated by iterated integration:

$$E(\mathcal{X} + \mathcal{Y}) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} (u+v)\, f(u,v)\, du\, dv = \int_{-\infty}^{+\infty} u \Bigl( \int_{-\infty}^{+\infty} f(u,v)\, dv \Bigr) du + \int_{-\infty}^{+\infty} v \Bigl( \int_{-\infty}^{+\infty} f(u,v)\, du \Bigr) dv = \int_{-\infty}^{+\infty} u\, f(u,*)\, du + \int_{-\infty}^{+\infty} v\, f(*,v)\, dv = E(\mathcal{X}) + E(\mathcal{Y}) \,,$$

which establishes the previously derived expression.

The multiplication theorem of probability theory requires that the two variables $\mathcal{X}$ and $\mathcal{Y}$ are independent and summable, and this implies for the discrete and the continuous case, respectively,

$$E(\mathcal{X} \cdot \mathcal{Y}) = E(\mathcal{X}) \cdot E(\mathcal{Y}) \quad\text{and} \tag{2.49a}$$
$$E(\mathcal{X} \cdot \mathcal{Y}) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} u\, v\, f(u,v)\, du\, dv = \int_{-\infty}^{+\infty} u\, f(u,*)\, du \int_{-\infty}^{+\infty} v\, f(*,v)\, dv = E(\mathcal{X}) \cdot E(\mathcal{Y}) \,. \tag{2.49b}$$

The multiplication theorem is easily extended to any finite number of independent and summable random variables:

$$E(\mathcal{X}_1 \cdots \mathcal{X}_n) = E(\mathcal{X}_1) \cdots E(\mathcal{X}_n) \,. \tag{2.49c}$$

Let us now consider the expectation values of special functions of random variables, in particular their powers, which give rise to the raw moments of the probability distribution, $\hat\mu_r$. For a random variable $\mathcal{X}$ we distinguish the $r$-th raw moments $E(\mathcal{X}^r)$ and the so-called centered moments¹⁹ $\mu_r = E(\tilde{\mathcal{X}}^r)$ referring to the random variable

$$\tilde{\mathcal{X}} = \mathcal{X} - E(\mathcal{X}) \,.$$

Clearly, the first raw moment is the expectation value and the first centered moment vanishes, $E(\tilde{\mathcal{X}}) = \mu_1 = 0$. Often the expectation value is denoted by $\mu = \hat\mu_1 = E(\mathcal{X})$, a notation that we shall use too for the sake of convenience, but it is important not to confuse $\mu$ and $\mu_1$. In general, a moment is defined about some point $a$ by means of the random variable $\mathcal{X}(a) = \mathcal{X} - a$.

19 Since the moments centered around the expectation value will be used more frequently than the raw moments we denote them by µ_r and the raw moments by µ̂_r. The r-th moment of a distribution is also called the moment of order r.

For a = 0 we obtain the raw moments

$$\hat\mu_r = E(\mathcal{X}^r) \,, \tag{2.50}$$

whereas $a = E(\mathcal{X})$ yields the centered moments. The general expressions for the raw and centered $r$-th moments as derived from the density $f(u)$ are

$$E(\mathcal{X}^r) = \alpha_r(\mathcal{X}) = \int_{-\infty}^{+\infty} u^r f(u)\, du \quad\text{and} \tag{2.51a}$$
$$E(\tilde{\mathcal{X}}^r) = \mu_r(\mathcal{X}) = \int_{-\infty}^{+\infty} (u - \mu)^r f(u)\, du \,. \tag{2.51b}$$

The second centered moment is called the variance, $\sigma^2(\mathcal{X})$, and its positive square root the standard deviation, $\sigma(\mathcal{X})$. The variance is always a non-negative quantity, as can easily be shown. Further we can derive:

$$\sigma^2(\mathcal{X}) = E(\tilde{\mathcal{X}}^2) = E\Bigl( \bigl( \mathcal{X} - E(\mathcal{X}) \bigr)^2 \Bigr) = E\bigl( \mathcal{X}^2 - 2\, \mathcal{X}\, E(\mathcal{X}) + E(\mathcal{X})^2 \bigr) = E(\mathcal{X}^2) - 2\, E(\mathcal{X})\, E(\mathcal{X}) + E(\mathcal{X})^2 = E(\mathcal{X}^2) - E(\mathcal{X})^2 \,. \tag{2.52}$$

If $E(\mathcal{X}^2)$ is finite, then $E(|\mathcal{X}|)$ is finite too and fulfils the inequality

$$E(|\mathcal{X}|)^2 \leq E(\mathcal{X}^2) \,,$$

and since $E(\mathcal{X}) \leq E(|\mathcal{X}|)$ the variance is necessarily a non-negative quantity, $\sigma^2(\mathcal{X}) \geq 0$.

If $\mathcal{X}$ and $\mathcal{Y}$ are independent and have finite variances, then we obtain

$$\sigma^2(\mathcal{X} + \mathcal{Y}) = \sigma^2(\mathcal{X}) + \sigma^2(\mathcal{Y}) \,,$$

as follows readily by simple calculation:

$$E\bigl( (\tilde{\mathcal{X}} + \tilde{\mathcal{Y}})^2 \bigr) = E\bigl( \tilde{\mathcal{X}}^2 + 2\, \tilde{\mathcal{X}} \tilde{\mathcal{Y}} + \tilde{\mathcal{Y}}^2 \bigr) = E(\tilde{\mathcal{X}}^2) + 2\, E(\tilde{\mathcal{X}})\, E(\tilde{\mathcal{Y}}) + E(\tilde{\mathcal{Y}}^2) = E(\tilde{\mathcal{X}}^2) + E(\tilde{\mathcal{Y}}^2) \,.$$

Here we have used the fact that the first centered moments vanish: $E(\tilde{\mathcal{X}}) = E(\tilde{\mathcal{Y}}) = 0$.

For two general – not necessarily independent – random variables $\mathcal{X}$ and $\mathcal{Y}$, the Cauchy-Schwarz inequality holds for the mixed expectation value:

$$E(\mathcal{X} \mathcal{Y})^2 \leq E(\mathcal{X}^2)\, E(\mathcal{Y}^2) \,. \tag{2.53}$$

If both random variables have finite variances, the covariance is defined by

$$\mathrm{Cov}(\mathcal{X}, \mathcal{Y}) = E\Bigl( \bigl( \mathcal{X} - E(\mathcal{X}) \bigr) \bigl( \mathcal{Y} - E(\mathcal{Y}) \bigr) \Bigr) = E\bigl( \mathcal{X} \mathcal{Y} - \mathcal{X}\, E(\mathcal{Y}) - E(\mathcal{X})\, \mathcal{Y} + E(\mathcal{X})\, E(\mathcal{Y}) \bigr) = E(\mathcal{X} \mathcal{Y}) - E(\mathcal{X})\, E(\mathcal{Y}) \,. \tag{2.54}$$

The covariance $\mathrm{Cov}(\mathcal{X}, \mathcal{Y})$ and the coefficient of correlation $\rho(\mathcal{X}, \mathcal{Y})$,

$$\mathrm{Cov}(\mathcal{X}, \mathcal{Y}) = E(\mathcal{X} \mathcal{Y}) - E(\mathcal{X})\, E(\mathcal{Y}) \quad\text{and}\quad \rho(\mathcal{X}, \mathcal{Y}) = \frac{\mathrm{Cov}(\mathcal{X}, \mathcal{Y})}{\sigma(\mathcal{X})\, \sigma(\mathcal{Y})} \,, \tag{2.54'}$$

are a measure of the correlation between the two variables. As a consequence of the Cauchy-Schwarz inequality we have $-1 \leq \rho(\mathcal{X}, \mathcal{Y}) \leq 1$. If covariance and correlation coefficient are equal to zero, the two random variables $\mathcal{X}$ and $\mathcal{Y}$ are uncorrelated. Independence implies lack of correlation, but the latter is in general the weaker property.

In addition to the expectation value, two more quantities are used to characterize the center of probability distributions (figure 2.12): (i) the median $\bar\mu$ is the value at which the number of points of a distribution at lower values of $\mathcal{X}$ matches exactly the number of points at higher values, as expressed in terms of two inequalities,

$$P(\mathcal{X} \leq \bar\mu) \geq \frac{1}{2} \;\text{ and }\; P(\mathcal{X} \geq \bar\mu) \geq \frac{1}{2} \qquad\text{or}\qquad \int_{-\infty}^{\bar\mu} dF(x) \geq \frac{1}{2} \;\text{ and }\; \int_{\bar\mu}^{+\infty} dF(x) \geq \frac{1}{2} \,, \tag{2.55}$$

where Lebesgue-Stieltjes integration is applied, or in case of an absolutely continuous distribution the condition simplifies to

$$P(\mathcal{X} \leq \bar\mu) = P(\mathcal{X} \geq \bar\mu) = \int_{-\infty}^{\bar\mu} f(x)\, dx = \frac{1}{2} \,, \tag{2.55'}$$

Figure 2.12: Probability densities and moments. As an example of an asymmetric distribution with highly different values for mode, median, and mean, the lognormal density $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma x} \exp\bigl( -(\ln x - \nu)^2/(2\sigma^2) \bigr)$ is shown. Parameter values $\sigma = \sqrt{\ln 2}$ and $\nu = \ln 2$ were chosen, and they yield $\tilde\mu = \exp(\nu - \sigma^2) = 1$ for the mode, $\bar\mu = \exp(\nu) = 2$ for the median, and $\mu = \exp(\nu + \sigma^2/2) = 2\sqrt{2}$ for the mean, respectively. The sequence mode < median < mean is characteristic for distributions with positive skewness, whereas the opposite sequence mean < median < mode is found in cases of negative skewness (see also figure 2.14).

and (ii) the mode $\tilde\mu$ of a distribution is the most frequent value – the value that is most likely to be obtained through sampling – and it is obtained as the maximum of the probability mass function for discrete distributions or as the maximum of the probability density in the continuous case. An illustrative example for the discrete case is the probability mass function of the scores for throwing two dice (the mode in the lower part of figure 2.7 is $\tilde\mu = 7$). A probability distribution may have more than one mode. Bimodal distributions occur occasionally, and then the two modes provide much more information on the expected outcomes than mean or median (see also subsection 2.7.9).

For many purposes a generalization of the median from two to $n$ equally sized data sets is useful. The quantiles are points taken at regular intervals from the cumulative distribution function $F(x)$ of a random variable $\mathcal{X}$. Ordered data are divided into $n$ essentially equal-sized subsets, and accordingly $(n-1)$ points on the $x$-axis separate the subsets. Then the $k$-th $n$-quantile is defined by $P(\mathcal{X} < x) \leq \frac{k}{n} = p$ or, equivalently,

$$F^{-1}(p) = \inf \{ x \in \mathbb{R} : F(x) \geq p \} \quad\text{and}\quad p = \int_{-\infty}^{x} dF(u) \,. \tag{2.56}$$

In case the random variable has a probability density, the integral simplifies to $p = \int_{-\infty}^{x} f(u)\, du$. The median is simply the value of $x$ for $p = \frac{1}{2}$. For partitioning into four parts we have the first or lower quartile at $p = \frac{1}{4}$, the second quartile or median at $p = \frac{1}{2}$, and the third or upper quartile at $p = \frac{3}{4}$. The lower quartile contains 25 % of the data, the median 50 %, and the upper quartile eventually 75 %.

Figure 2.13: Definition and determination of quantiles. A quantile $q$ with $p_q = k/n$ defines a value $x_q$ at which the (cumulative) probability distribution reaches the value $F(x_q) = p_q$, corresponding to $P(\mathcal{X} < x) \leq p$. The figure shows how the position of the quantile $p_q = k/n$ is used to determine its value $x_q(p)$. In particular we use here the normal distribution $\Phi(x)$ as function $F(x)$, and the computation yields $\Phi(x_q) = \frac{1}{2}\bigl( 1 + \mathrm{erf}\bigl( \frac{x_q - \nu}{\sqrt{2\sigma^2}} \bigr) \bigr) = p_q$. Parameter choice: $\nu = 2$, $\sigma^2 = \frac{1}{2}$, and the quantile $(n = 5, k = 2)$, yielding $p_q = 2/5$ and $x_q = 1.8209$.

The interpretation of expectation value and variance or standard deviation is straightforward: the expectation value is the mean or average value of a distribution and the variance measures its width.
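A brief numerical cross-check of the lognormal example in figure 2.12 (parameters ν = ln 2, σ² = ln 2 as stated there); the sampled mean, median and quartiles can be compared with the analytical values.

# Sketch: mean, median and quartiles of the lognormal density of figure 2.12
# (nu = ln 2, sigma^2 = ln 2), estimated from a random sample.
import numpy as np

rng = np.random.default_rng(0)
nu, sigma = np.log(2.0), np.sqrt(np.log(2.0))

sample = rng.lognormal(mean=nu, sigma=sigma, size=1_000_000)

print(sample.mean())                            # ~ exp(nu + sigma^2/2) = 2*sqrt(2)
print(np.median(sample))                        # ~ exp(nu) = 2
print(np.quantile(sample, [0.25, 0.5, 0.75]))   # lower quartile, median, upper quartile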

2.5.2 Higher moments

Two other quantities related to higher moments are frequently used for a more detailed characterization of probability distributions:²⁰ (i) the skewness,

$$\gamma_1 = \frac{\mu_3}{\mu_2^{3/2}} = \frac{\mu_3}{\sigma^3} = \frac{E\Bigl( \bigl( \mathcal{X} - E(\mathcal{X}) \bigr)^3 \Bigr)}{E\Bigl( \bigl( \mathcal{X} - E(\mathcal{X}) \bigr)^2 \Bigr)^{3/2}} \,, \tag{2.57}$$

and (ii) the kurtosis, which is either defined as the fourth standardized moment $\beta_2$ or, in terms of cumulants, as the excess kurtosis $\gamma_2$,

$$\beta_2 = \frac{\mu_4}{\mu_2^2} = \frac{\mu_4}{\sigma^4} = \frac{E\Bigl( \bigl( \mathcal{X} - E(\mathcal{X}) \bigr)^4 \Bigr)}{E\Bigl( \bigl( \mathcal{X} - E(\mathcal{X}) \bigr)^2 \Bigr)^2} \quad\text{and}\quad \gamma_2 = \frac{\kappa_4}{\kappa_2^2} = \frac{\mu_4}{\sigma^4} - 3 = \beta_2 - 3 \,. \tag{2.58}$$

Skewness is a measure of the asymmetry of the probability density: curves that are symmetric about the mean have zero skew, negative skew implies a longer left tail of the distribution caused by a small number of low values, and positive skew is characteristic for a distribution with a longer right tail. Kurtosis characterizes the degree of peakedness of a distribution. High kurtosis implies a sharper peak and fatter tails, whereas low kurtosis characterizes flat or round peaks and thin tails. Distributions are called leptokurtic if they have a positive excess kurtosis and therefore a sharper peak and thicker tails than the normal distribution, which is taken as the reference with zero excess kurtosis (see section 2.7); they are characterized as platykurtic if they have a negative excess kurtosis, a broader peak, and thinner tails (see figure 2.14). One property of skewness and kurtosis that follows from their definition is important to note: the expectation value, the standard deviation, and the variance are quantities with dimensions of mass, length, and/or time, whereas skewness and kurtosis are dimensionless numbers.

20 In contrast to expectation value, variance, and standard deviation, skewness and kurtosis are not uniquely defined, and it is therefore necessary to check the author's definitions carefully when reading texts from the literature.

Figure 2.14: Skewness and kurtosis. The upper part of the figure illustrates the sign of the skewness with asymmetric density functions. The examples are taken from the binomial distribution $B_k(n,p)$: $\gamma_1 = (1-2p)/\sqrt{np(1-p)}$ with $p = 0.1$ (red), $0.5$ (black; symmetric), and $0.9$ (blue), with the values $\gamma_1 = 0.596$, $0$, $-0.596$. Densities with different kurtosis are compared in the lower part of the figure: the Laplace distribution (D, red), the hyperbolic secant distribution (S, orange), and the logistic distribution (L, green) are leptokurtic with excess kurtosis values 3, 2, and 1.2, respectively. The normal distribution is the reference curve with excess kurtosis 0 (N, black). The raised cosine distribution (C, cyan), the Wigner semicircle distribution (W, blue), and the uniform distribution (U, magenta) are platykurtic with excess kurtosis values of $-0.593762$, $-1$, and $-1.2$, respectively. (The picture is taken from http://en.wikipedia.org/wiki/Kurtosis, March 30, 2011.)
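A small sketch computing skewness (2.57) and excess kurtosis (2.58) from samples; the exponential distribution is used as an arbitrary test case, for which the exact values γ₁ = 2 and γ₂ = 6 are known.

# Sketch: skewness (2.57) and excess kurtosis (2.58) estimated from a sample.
# The exponential distribution is chosen for illustration: gamma1 = 2, gamma2 = 6.
import numpy as np

rng = np.random.default_rng(42)
x = rng.exponential(scale=1.0, size=1_000_000)

centered = x - x.mean()
m2 = np.mean(centered ** 2)                 # mu_2 = sigma^2
m3 = np.mean(centered ** 3)                 # mu_3
m4 = np.mean(centered ** 4)                 # mu_4

gamma1 = m3 / m2 ** 1.5                     # skewness
gamma2 = m4 / m2 ** 2 - 3.0                 # excess kurtosis

print(gamma1, gamma2)                       # ~ 2 and ~ 6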

The cumulants κi are the coefficients of a series expansion of the logarithm of the characteristic function (see section 2.7), which in turn is the Fourier transform of the probability density function, f(x):

$$\phi(z) = \int_{-\infty}^{+\infty} \exp(\mathrm{i} z x)\, f(x)\, dx \quad\text{and}\quad \ln \phi(z) = \sum_{i=1}^{\infty} \kappa_i \frac{(\mathrm{i} z)^i}{i!} \,. \tag{2.59}$$

The first five cumulants $\kappa_i$ ($i = 1, \ldots, 5$), expressed in terms of the expectation value $\mu$ and the central moments $\mu_i$ ($\mu_1 = 0$), are

$$\kappa_1 = \mu \,, \quad \kappa_2 = \mu_2 \,, \quad \kappa_3 = \mu_3 \,, \quad \kappa_4 = \mu_4 - 3\, \mu_2^2 \,, \quad \kappa_5 = \mu_5 - 10\, \mu_2\, \mu_3 \,.$$

We shall come back to the use of cumulants κi in the next section 2.6 when we apply k-statistics in order to compute empirical moments from incomplete data sets and in section 2.7 where we shall compare frequently used individual probability densities.
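A tiny helper, as a sketch, that evaluates the listed moment-to-cumulant relations for a given set of central moments; the standard normal distribution (µ₂ = 1, µ₄ = 3, odd central moments zero) serves as a check, since all its cumulants beyond κ₂ vanish.

# Sketch: cumulants kappa_1..kappa_5 from mu and the central moments mu_2..mu_5,
# following the relations listed above.  Checked with the standard normal
# distribution (mu = 0, mu_2 = 1, mu_3 = 0, mu_4 = 3, mu_5 = 0).
def cumulants(mu, mu2, mu3, mu4, mu5):
    return (mu,
            mu2,
            mu3,
            mu4 - 3 * mu2 ** 2,
            mu5 - 10 * mu2 * mu3)

print(cumulants(0.0, 1.0, 0.0, 3.0, 0.0))   # (0.0, 1.0, 0.0, 0.0, 0.0)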

2.6 Mathematical statistics

Although mathematical statistics is a discipline in its own right and would require a separate course, we mention here briefly the basic concept which is of general importance for every scientist.21 In practice, we can collect data for all sample points of the sample space Ω only in very exceptional cases. Otherwise we have to rely on limited samples as they are obtained in experiments or in opinion polls. Among other things mathematical statistics is dealing extensively with the problem of incomplete data sets.

For a given incomplete random sample $(\mathcal{X}_1, \ldots, \mathcal{X}_n)$ some function $Z$ is evaluated and yields a random variable $\mathcal{Z} = Z(\mathcal{X}_1, \ldots, \mathcal{X}_n)$ as output. From a limited set of data, $x = \{x_1, x_2, \ldots, x_n\}$, sample expectation values – also called sample means – sample variances, sample standard deviations, or other quantities are computed as estimators in the same way as if the sample set covered the complete sample space. In particular we compute the sample mean

$$m = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{2.60}$$

and the moments around the sample mean. For the sample variance we obtain

$$m_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \Bigl( \frac{1}{n} \sum_{i=1}^{n} x_i \Bigr)^2 , \tag{2.61}$$

and for the third and fourth moments after some calculation

$$m_3 = \frac{1}{n} \sum_{i=1}^{n} x_i^3 - \frac{3}{n^2} \Bigl( \sum_{i=1}^{n} x_i \Bigr) \Bigl( \sum_{j=1}^{n} x_j^2 \Bigr) + \frac{2}{n^3} \Bigl( \sum_{i=1}^{n} x_i \Bigr)^3 , \tag{2.62a}$$

$$m_4 = \frac{1}{n} \sum_{i=1}^{n} x_i^4 - \frac{4}{n^2} \Bigl( \sum_{i=1}^{n} x_i \Bigr) \Bigl( \sum_{j=1}^{n} x_j^3 \Bigr) + \frac{6}{n^3} \Bigl( \sum_{i=1}^{n} x_i \Bigr)^2 \Bigl( \sum_{j=1}^{n} x_j^2 \Bigr) - \frac{3}{n^4} \Bigl( \sum_{i=1}^{n} x_i \Bigr)^4 . \tag{2.62b}$$

These naive estimators $m_i$ ($i = 2, 3, 4, \ldots$) contain a bias because the exact expectation value $\mu$, around which the moments are centered, is not known and has to be approximated by the sample mean $m$. For the variance we illustrate the systematic deviation by calculating a correction factor known as Bessel's correction, but more properly attributed to Gauss [53, part 2, p. 161]. In order to obtain an expectation value for the sample moments we repeat the drawing of samples with $n$ elements and denote their expectation values by $\langle m_i \rangle$. In particular we have

$$m_2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \Bigl( \frac{1}{n} \sum_{i=1}^{n} x_i \Bigr)^2 = \frac{1}{n} \sum_{i=1}^{n} x_i^2 - \frac{1}{n^2} \Bigl( \sum_{i=1}^{n} x_i^2 + \sum_{i,j=1,\, i \neq j}^{n} x_i x_j \Bigr) = \frac{n-1}{n^2} \sum_{i=1}^{n} x_i^2 - \frac{1}{n^2} \sum_{i,j=1,\, i \neq j}^{n} x_i x_j \,.$$

The expectation value is now of the form

$$\langle m_2 \rangle = \frac{n-1}{n} \Bigl\langle \frac{1}{n} \sum_{i=1}^{n} x_i^2 \Bigr\rangle - \frac{1}{n^2} \Bigl\langle \sum_{i,j=1,\, i \neq j}^{n} x_i x_j \Bigr\rangle \,,$$

and by using $\langle x_i x_j \rangle = \langle x_i \rangle \langle x_j \rangle = \langle x_i \rangle^2$ we find

$$\langle m_2 \rangle = \frac{n-1}{n}\, \alpha_2 - \frac{n(n-1)}{n^2}\, \mu^2 = \frac{n-1}{n} \bigl( \alpha_2 - \mu^2 \bigr) \,,$$

where $\alpha_2$ is the second moment about zero. Using the identity $\alpha_2 = \mu_2 + \mu^2$ we eventually find

$$\langle m_2 \rangle = \frac{n-1}{n}\, \mu_2 \quad\text{and}\quad \mathrm{var}(x) = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)^2 \,. \tag{2.63}$$

Further useful measures of correlation between pairs of random variables can be derived straightforwardly: (i) the unbiased sample covariance,

$$\mathcal{M}_{XY} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - m)(y_i - m) \,, \tag{2.64}$$

and (ii) the sample correlation coefficient,

$$\mathcal{R}_{XY} = \frac{\sum_{i=1}^{n} (x_i - m)(y_i - m)}{\sqrt{\bigl( \sum_{i=1}^{n} (x_i - m)^2 \bigr) \bigl( \sum_{i=1}^{n} (y_i - m)^2 \bigr)}} \,. \tag{2.65}$$

For practical purposes Bessel's correction is often unimportant when the data sets are sufficiently large, but the recognition of the principle matters, in particular for statistical properties more involved than variances. Sometimes a problem is encountered in cases where the second moment of a distribution, $\mu_2$, does not exist, that is, it diverges. Then computing variances from incomplete data sets is also unstable and one may choose the mean absolute deviation,

$$\mathcal{D}(\mathcal{X}) = \frac{1}{n} \sum_{i=1}^{n} |\mathcal{X}_i - m| \,, \tag{2.66}$$

as a measure for the width of the distribution [54, pp. 455-459], because it is commonly more robust than the variance or standard deviation.

In order to derive unbiased estimators for the cumulants of a probability distribution, $\langle k_i \rangle = \kappa_i$, k-statistics has been invented [55, pp. 99-100]. The first four terms of k-statistics for $n$ sample points are

$$k_1 = m \,, \qquad k_2 = \frac{n}{n-1}\, m_2 \,, \qquad k_3 = \frac{n^2}{(n-1)(n-2)}\, m_3 \,, \qquad k_4 = \frac{n^2 \bigl( (n+1)\, m_4 - 3 (n-1)\, m_2^2 \bigr)}{(n-1)(n-2)(n-3)} \,, \tag{2.67}$$

which result from inversion of the relationships

$$\langle m \rangle = \mu \,, \qquad \langle m_2 \rangle = \frac{n-1}{n}\, \mu_2 \,, \qquad \langle m_3 \rangle = \frac{(n-1)(n-2)}{n^2}\, \mu_3 \,,$$
$$\langle m_2^2 \rangle = \frac{(n-1)\bigl( (n-1)\, \mu_4 + (n^2 - 2n + 3)\, \mu_2^2 \bigr)}{n^3} \,, \qquad \langle m_4 \rangle = \frac{(n-1)\bigl( (n^2 - 3n + 3)\, \mu_4 + 3 (2n-3)\, \mu_2^2 \bigr)}{n^3} \,. \tag{2.68}$$

The usefulness of these relations becomes evident in various applications. The statistician computes moments and other functions from empirical data sets, for example $\{x_1, \ldots, x_n\}$ or $\{(x_1, y_1), \ldots, (x_n, y_n)\}$, by means of equations (2.60) and (2.63) to (2.65). The main issue of mathematical statistics, however, has always been and still is the development of independent tests that allow for the derivation of information on the quality of data. The underlying assumption commonly is that the values of the empirical functions converge to the corresponding exact moments as the random sample increases. Predictions on the reliability of the computed values are made by means of a great variety of tools. We dispense with the details, which are extensively treated in the literature [10, 25, 26].

21 For the reader who is interested in more details on mathematical statistics we recommend the textbook by Fisz [10] and the comprehensive treatise by Stuart and Ord [25, 26], which is a new edition of Kendall's classic on statistics. The monograph by Cooper [52] is particularly addressed to experimentalists practicing statistics. A great variety of other and equally suitable texts is, of course, available in the rich literature on mathematical statistics.
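A short sketch of Bessel's correction (2.63) and of the k-statistics k₂ and k₃ of (2.67), averaged over many small samples from a distribution with known µ₂ and µ₃; the exponential distribution (µ₂ = 1, µ₃ = 2), the sample size and the number of repetitions are arbitrary choices.

# Sketch: bias of the naive sample moments m_2, m_3 versus the corrected
# estimators k_2 (eqs. 2.63/2.67) and k_3, averaged over many samples of size n.
# Test case (arbitrary): exponential distribution with mu_2 = 1, mu_3 = 2.
import numpy as np

rng = np.random.default_rng(7)
n, repeats = 5, 200_000

x = rng.exponential(scale=1.0, size=(repeats, n))
m = x.mean(axis=1, keepdims=True)
m2 = np.mean((x - m) ** 2, axis=1)                 # naive second moment
m3 = np.mean((x - m) ** 3, axis=1)                 # naive third moment

k2 = n / (n - 1) * m2
k3 = n ** 2 / ((n - 1) * (n - 2)) * m3

print(m2.mean(), k2.mean())    # ~ (n-1)/n * 1 = 0.8   versus  ~ 1.0
print(m3.mean(), k3.mean())    # ~ (n-1)(n-2)/n^2 * 2 = 0.96  versus  ~ 2.0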

2.7 Distributions, densities and generating functions

In this section we introduce a few frequently used probability distributions and analyze their properties. For this goal we define and make use of auxiliary functions which allow for the derivation of compact representations of the distributions and which provide convenient tools for handling functions of probabilities. In particular we shall make use of probability generating functions g(s), moment generating functions M(s), and characteristic functions φ(s). The characteristic function φ(s) exists for all distributions, but we shall encounter cases where no generating function exists (see, for example, the Cauchy-Lorentz distribution in subsection 2.7.8). In addition to the three generating functions mentioned here, other functions are in use as well. An example is the cumulant generating function, which lacks a uniform definition: it is either the logarithm of the moment generating function or the logarithm of the characteristic function.

2.7.1 Probability generating functions

Let $\mathcal{X}$ be a random variable taking only non-negative integer values, with a probability distribution given by

$$P(\mathcal{X} = j) = a_j \,; \quad j = 0, 1, 2, \ldots \,. \tag{2.69}$$

A dummy variable $s$ is introduced and the probability generating function is expressed by an infinite power series,

$$g(s) = a_0 + a_1 s + a_2 s^2 + \ldots = \sum_{j=0}^{\infty} a_j s^j \,. \tag{2.70}$$

As we shall show later, the full information on the probability distribution is encapsulated in the coefficients $a_j$ ($j \geq 0$). In most cases $s$ is a real variable, although it can be of advantage to consider also complex $s$. Recalling that $\sum_j a_j = 1$ we verify easily that the power series (2.70) converges for $|s| \leq 1$:

$$|g(s)| \leq \sum_j |a_j| \cdot |s|^j \leq \sum_j a_j = 1 \,, \quad\text{for } |s| \leq 1 \,.$$

For $|s| < 1$ we can differentiate the series term by term to calculate the derivatives of the generating function $g$,

$$g'(s) = \frac{dg}{ds} = a_1 + 2 a_2 s + 3 a_3 s^2 + \ldots = \sum_{n=1}^{\infty} n\, a_n s^{n-1} \,,$$
$$g''(s) = \frac{d^2 g}{ds^2} = 2 a_2 + 6 a_3 s + \ldots = \sum_{n=2}^{\infty} n (n-1)\, a_n s^{n-2} \,,$$

and, in general, we have

$$g^{(j)}(s) = \frac{d^j g}{ds^j} = \sum_{n=j}^{\infty} n (n-1) \cdots (n-j+1)\, a_n s^{n-j} = \sum_{n=j}^{\infty} j!\, \binom{n}{j}\, a_n s^{n-j} \,.$$

Setting $s = 0$, all terms vanish except the constant term,

$$g^{(j)}(0) = j!\, a_j \quad\text{or}\quad a_j = \frac{1}{j!}\, g^{(j)}(0) \,.$$

In this way all $a_j$'s may be obtained by consecutive differentiation of the generating function, and alternatively the generating function can be determined from the known probability distribution.

Putting $s = 1$ in $g'(s)$ and $g''(s)$ we can compute the first and second moments of the distribution of $\mathcal{X}$:

$$g'(1) = \sum_{n=0}^{\infty} n\, a_n = E(\mathcal{X}) \,, \qquad g''(1) = \sum_{n=0}^{\infty} n^2 a_n - \sum_{n=0}^{\infty} n\, a_n = E(\mathcal{X}^2) - E(\mathcal{X}) \,, \quad\text{and}$$
$$E(\mathcal{X}) = g'(1) \,, \qquad E(\mathcal{X}^2) = g'(1) + g''(1) \,. \tag{2.71}$$

We summarize: the probability distribution of a non-negative integer-valued random variable can be converted into a generating function without losing information. The generating function is uniquely determined by the distribution and vice versa.
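A sketch using sympy to recover moments from a probability generating function via equation (2.71); the fair die with g(s) = (s + s² + … + s⁶)/6 serves as an arbitrary example.

# Sketch: moments from a probability generating function, eq. (2.71).
# Example (arbitrary): a fair die, g(s) = (s + s^2 + ... + s^6)/6.
import sympy as sp

s = sp.symbols('s')
g = sum(s**j for j in range(1, 7)) / 6          # probability generating function

first  = sp.diff(g, s, 1).subs(s, 1)            # g'(1)  = E(X)
second = sp.diff(g, s, 2).subs(s, 1)            # g''(1) = E(X^2) - E(X)

print(first)            # 7/2
print(first + second)   # 91/6 = E(X^2)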

2.7.2 Moment generating functions

The basis of the moment generating function is the series expansion of the exponential of the random variable $\mathcal{X}$,

$$e^{s \mathcal{X}} = 1 + s\, \mathcal{X} + \frac{s^2}{2!} \mathcal{X}^2 + \frac{s^3}{3!} \mathcal{X}^3 + \ldots \,.$$

The moment generating function allows for direct computation of the moments of a probability distribution as defined in equation (2.69), since we have

$$M_{\mathcal{X}}(s) = E\bigl( e^{s \mathcal{X}} \bigr) = 1 + \hat\mu_1 s + \frac{\hat\mu_2}{2!} s^2 + \frac{\hat\mu_3}{3!} s^3 + \ldots \,, \tag{2.72}$$

wherein $\hat\mu_i$ is the $i$-th raw moment. The moments are obtained by differentiating $M_{\mathcal{X}}(s)$ $n$ times with respect to $s$ and then setting $s = 0$,

$$E(\mathcal{X}^n) = \hat\mu_n = M_{\mathcal{X}}^{(n)}(0) = \frac{d^n M_{\mathcal{X}}}{ds^n} \bigg|_{s=0} \,.$$

A probability distribution thus has (at least) as many moments as the number of times the moment generating function can be continuously differentiated (see also the characteristic function in subsection 2.7.3). If two distributions have the same moment generating function they are identical at all points:

$$M_{\mathcal{X}}(s) = M_{\mathcal{Y}}(s) \iff F_{\mathcal{X}}(x) = F_{\mathcal{Y}}(x) \,.$$

This statement, however, does not imply that two distributions are identical when they have the same moments, because in some cases the moments exist but the moment generating function does not, since the limit $\lim_{n \to \infty} \sum_{k=0}^{n} \frac{\hat\mu_k s^k}{k!}$ diverges, as is the case, for example, for the logarithmic normal distribution.

2.7.3 Characteristic functions

Like the moment generating function, the characteristic function $\phi(s)$ of a random variable $\mathcal{X}$ completely describes the probability distribution $F(x)$. It is defined by

$$\phi(s) = \int_{-\infty}^{+\infty} \exp(\mathrm{i} s x)\, dF(x) = \int_{-\infty}^{+\infty} \exp(\mathrm{i} s x)\, f(x)\, dx \,, \tag{2.59'}$$

where the integral over $dF(x)$ is of Riemann-Stieltjes type. In case a probability density $f(x)$ exists for the random variable $\mathcal{X}$, the characteristic function is (almost) the Fourier transform of the density.²² From equation (2.59') follows the useful expression $\phi(s) = E(e^{\mathrm{i} s \mathcal{X}})$ that we shall use, for example, in solving the equations for stochastic processes (subsection 3.4).

The characteristic function exists for all random variables since it is an integral of a bounded continuous function over a space of finite measure. There is a bijection between distribution functions and characteristic functions:

$$\phi_{\mathcal{X}}(s) = \phi_{\mathcal{Y}}(s) \iff F_{\mathcal{X}}(x) = F_{\mathcal{Y}}(x) \,.$$

If a random variable $\mathcal{X}$ has moments up to $k$-th order, then the characteristic function $\phi(s)$ is $k$ times continuously differentiable on the entire real line, and vice versa: if a characteristic function $\phi(s)$ has a $k$-th derivative at zero, then the random variable $\mathcal{X}$ has all moments up to $k$ if $k$ is even and up to $k-1$ if $k$ is odd:

$$E(\mathcal{X}^k) = (-\mathrm{i})^k \frac{d^k \phi(s)}{ds^k} \bigg|_{s=0} \quad\text{and}\quad \frac{d^k \phi(s)}{ds^k} \bigg|_{s=0} = \mathrm{i}^k\, E(\mathcal{X}^k) \,. \tag{2.73}$$

An interesting example is presented by the Cauchy distribution (subsection 2.7.8) with $\phi(s) = \exp(-|s|)$: it is not differentiable at $s = 0$, and the distribution has no moments, not even the expectation value.

The moment generating function is related to the probability generating function $g(s)$ (subsection 2.7.1) and the characteristic function $\phi(s)$ (subsection 2.7.3) by

$$g_{\mathcal{X}}(e^s) = E\bigl( e^{s \mathcal{X}} \bigr) = M_{\mathcal{X}}(s) \quad\text{and}\quad \phi(s) = M_{\mathrm{i}\mathcal{X}}(s) = M_{\mathcal{X}}(\mathrm{i} s) \,.$$

All three generating functions are closely related, but it may happen that not all three exist. As said, characteristic functions exist for all probability distributions.

22 The difference between the Fourier transform ψ(s) and the characteristic function φ(s),
$$\psi(s) = \int_{-\infty}^{+\infty} \exp(-2\pi \mathrm{i}\, s x)\, f(x)\, dx \quad\text{and}\quad \phi(s) = \int_{-\infty}^{+\infty} \exp(+\mathrm{i}\, s x)\, f(x)\, dx \,,$$
is only a matter of the factor 2π and the choice of the sign.

Table 2.2: Comparison of several common probability densities (Poisson, binomial, normal, uniform, logistic, Laplace, Cauchy, and chi-square), listing for each distribution the parameters, support, cdf, mean, median, mode, variance, skewness, kurtosis, moment generating function (mgf), and characteristic function (cf). The cdf column makes use of the upper and lower incomplete gamma functions and the regularized incomplete beta function. For more details see [56].

Figure 2.15: The Poisson probability density. Two examples of Poisson distributions, $\pi_k(\alpha) = \alpha^k e^{-\alpha}/k!$, with $\alpha = 1$ (black) and $\alpha = 5$ (red) are shown. The distribution with the larger $\alpha$ has the mode shifted further to the right and a thicker tail.

Before entering the discussion of individual common probability distribu- tions we present an overview over the most important characteristics of these distributions in table 2.2.

2.7.4 The Poisson distribution

The Poisson distribution, named after the French physicist and mathemati- cian Sim´eon Denis Poisson, is a discrete probability distribution. It is used to model the number of independent events occurring in a given interval of time. In physics and chemistry the Poisson process is the stochastic basis of first order processes, for example radioactive decay or irreversible first order reactions and the Poisson distribution is the probability distribution underly- ing the time course of particle numbers, N(t). Despite its major importance in physics and biology the Poisson distribution, π(α), is a fairly simple math- ematical object. It contains a single parameter only, the real valued positive 80 Peter Schuster number α:23 e α P ( = k) = π (α) = − αk ; k N0 . (2.74) X k k! ∈ As an exercise we leave to verify the following properties:

∑_{k=0}^{∞} π_k = 1 ,   ∑_{k=0}^{∞} k π_k = α   and   ∑_{k=0}^{∞} k² π_k = α² + α .

Examples of Poisson distributions with two different parameter values, α = 1 and 5, are shown in figure 2.15. By means of a Taylor expansion we can find the generating function of the Poisson distribution,

g(s) = e^{α(s−1)} .   (2.75)

From the generating function we easily calculate

g′(s) = α e^{α(s−1)}   and   g″(s) = α² e^{α(s−1)} .

Expectation value and second moment follow straightforwardly from equation (2.71):

E(X) = g′(1) = α ,   (2.75a)
E(X²) = g′(1) + g″(1) = α + α² ,   (2.75b)
σ²(X) = α .   (2.75c)

Both the expectation value and the variance are equal to the parameter α, and hence the standard deviation amounts to σ(X) = √α. This remarkable property of the Poisson distribution is not limited to the second moment. The factorial moments, ⟨X^r⟩_f, fulfil the equation

⟨X^r⟩_f = E( X (X−1) ⋯ (X−r+1) ) = α^r .   (2.75d)

²³ In order to solve the problems one requires knowledge of some basic infinite series: e = ∑_{n=0}^{∞} 1/n! ,  e^x = ∑_{n=0}^{∞} x^n/n! for |x| < ∞ ,  e^α = lim_{n→∞} (1 + α/n)^n ,  e^{−α} = lim_{n→∞} (1 − α/n)^n .
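The moment identities stated above can be checked numerically. The following minimal sketch (not part of the original text) compares sample estimates of the mean, variance, and third factorial moment of Poisson-distributed random numbers with the exact values α, α, and α³; the values of α and the sample size are arbitrary illustrative choices, and numpy is assumed to be available.

import numpy as np

# Numerical check of E(X) = alpha, Var(X) = alpha and the factorial
# moment E(X(X-1)(X-2)) = alpha^3 for the Poisson distribution.
rng = np.random.default_rng(seed=1)
alpha, n_samples = 5.0, 10**6
x = rng.poisson(alpha, size=n_samples).astype(float)

print("mean     ", x.mean(), "   expected", alpha)
print("variance ", x.var(),  "   expected", alpha)

fact_moment = np.mean(x * (x - 1) * (x - 2))     # third factorial moment
print("3rd factorial moment", fact_moment, "   expected", alpha**3)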

Figure 2.16: The binomial probability density. Two examples of binomial distributions, B_k(n, p) = \binom{n}{k} p^k (1−p)^{n−k}, with n = 10, p = 0.5 and p = 0.1 are shown. The former distribution is symmetric with respect to the expectation value E(B_k) = n/2.

2.7.5 The binomial distribution

The binomial distribution, B(n, p), characterizes the cumulative result of independent trials with two-valued outcomes, for example, successive coin tosses as we discussed in sections 1.2 and 2.2: S_n = ∑_{i=1}^{n} X_i. The X_i's are commonly called Bernoulli random variables, named after the Swiss mathematician Jakob Bernoulli, and accordingly the sequence of events is known as a Bernoulli process, and the corresponding random variable is said to have a Bernoulli or binomial distribution:

P(S_n = k) = B_k(n, p) = \binom{n}{k} p^k q^{n−k} ,   q = 1 − p ,   (k ∈ N₀, k ≤ n) .   (2.76)

Two examples are shown in figure 2.16. The distribution with p = 0.5 is symmetric with respect to k = n/2. The generating function for the single trial is g(s) = q + ps. Since we have n independent trials the complete generating function is

g(s) = (q + ps)^n = ∑_{k=0}^{n} \binom{n}{k} q^{n−k} p^k s^k .   (2.77)

From the derivatives of the generating function,

g′(s) = n p (q + ps)^{n−1}   and   g″(s) = n(n−1) p² (q + ps)^{n−2} ,

we readily compute expectation value and variance:

E(S_n) = g′(1) = n p ,   (2.77a)
E(S_n²) = g′(1) + g″(1) = np + n²p² − np² = n p q + n² p² ,   (2.77b)
σ²(S_n) = n p q ,   (2.77c)
σ(S_n) = √(n p q) .   (2.77d)

For p = 1/2, the case of the unbiased coin, we have the symmetric binomial distribution with E(S_n) = n/2, σ²(S_n) = n/4, and σ(S_n) = √n/2. We note that the expectation value is proportional to the number of trials, n, and the standard deviation is proportional to its square root, √n.

The binomial distribution B(n, p) can be transformed into the Poisson distribution π(α) in the limit n → ∞. In order to show this we start from

B_k(n, p) = \binom{n}{k} p^k (1 − p)^{n−k} ,   (k ∈ N₀, 0 ≤ k ≤ n) .

The symmetry parameter p is assumed to vary with n, p(n) = α/n for n ≥ 1, and thus we have

B_k(n, α/n) = \binom{n}{k} (α/n)^k (1 − α/n)^{n−k} ,   (k ∈ N₀, k ≤ n) .

We let n go to infinity for fixed k and start with B₀(n, p):

lim_{n→∞} B₀(n, α/n) = lim_{n→∞} (1 − α/n)^n = e^{−α} .

Now we compute the ratio of two consecutive terms, B_{k+1}/B_k:

B_{k+1}(n, α/n) / B_k(n, α/n) = (n − k)/(k + 1) · (α/n) · (1 − α/n)^{−1} = α/(k + 1) · [ (n − k)/n ] · [ (1 − α/n)^{−1} ] .

Both terms in the square brackets converge to one as n → ∞, and hence we find:

lim_{n→∞} B_{k+1}(n, α/n) / B_k(n, α/n) = α/(k + 1) .

From the two results we compute all terms starting from the limit value of B₀ and find lim B₁ = α exp(−α), lim B₂ = α² exp(−α)/2!, …, lim B_k = α^k exp(−α)/k!. Accordingly we have verified Poisson's limit law:

lim_{n→∞} B_k(n, α/n) = π_k(α) ,   k ∈ N₀ .   (2.78)

It is worth remembering that the limit was performed in a peculiar way since the symmetry parameter p(n) was shrinking with increasing n and as a matter of fact vanished in the limit n → ∞.
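Poisson's limit law (2.78) can also be illustrated numerically. The sketch below (an illustration added here, not from the text) compares binomial probabilities B_k(n, α/n) with the Poisson probabilities π_k(α) for increasing n; the values of α, k, and n are arbitrary choices, and numpy and scipy are assumed to be available.

import numpy as np
from scipy.stats import binom, poisson

# As n grows with p = alpha/n, the binomial probabilities approach
# the Poisson probabilities pi_k(alpha) for every fixed k.
alpha, k = 2.0, np.arange(6)
for n in (10, 100, 1000, 10000):
    b = binom.pmf(k, n, alpha / n)
    diff = np.max(np.abs(b - poisson.pmf(k, alpha)))
    print(f"n = {n:6d}   max |B_k - pi_k| = {diff:.2e}")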

2.7.6 The normal distribution

The normal distribution is of central importance in probability theory because many distributions converge to it in the limit of large numbers. It is the basis for the estimation of statistical errors and thus we shall discuss it in some detail. The general normal distribution has the density²⁴

ϕ(x) = (1/(√(2π) σ)) e^{−(x−ν)²/(2σ²)}   with   ∫_{−∞}^{+∞} ϕ(x) dx = 1 ,   (2.79)

and the corresponding random variable X has the moments E(X) = ν, σ²(X) = σ², and σ(X) = σ. For many purposes it is convenient to use the normal density in centered and normalized form:

ϕ(x) = (1/√(2π)) e^{−x²/2}   with   ∫_{−∞}^{+∞} ϕ(x) dx = 1 .   (2.79')

In this form we have E(X̃) = 0, σ²(X̃) = 1, and σ(X̃) = 1. Integration of the density yields the distribution function

P(X ≤ x) = Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du .   (2.80)

The function Φ(x) is not available in analytical form, but it can be easily formulated in terms of the error function, erf(x). This function as well as its

²⁴ The notations applied here for the normal distribution are: Φ(x; ν, σ) for the cumulative distribution and ϕ(x; ν, σ) for the density. Commonly the parameters (ν, σ) are omitted.

complement, erfc(x), defined by

erf(x) = (2/√π) ∫₀^x e^{−t²} dt   and   erfc(x) = (2/√π) ∫_x^∞ e^{−t²} dt ,

are available in tables and in standard mathematical packages.²⁵ Examples of the normal density ϕ(x) with different values of the standard deviation σ and one example of the integrated distribution Φ(x) are shown in figure 2.17. The normal distribution is also used in statistics to define confidence intervals: 68.2 % of the data points lie within an interval ν ± σ, 95.4 % within an interval ν ± 2σ, and eventually 99.7 % within an interval ν ± 3σ.

The normal density function ϕ(x) has, among other remarkable properties, derivatives of all orders. Each derivative can be written as the product of ϕ(x) with a polynomial of the order of the derivative, known as a Hermite polynomial. The existence of all derivatives makes the bell-shaped curve x → ϕ(x) particularly smooth, and in addition the function ϕ(x) decreases to zero very rapidly as |x| → ∞. Since the normal density can be differentiated arbitrarily often, the moment generating function of the normal distribution is especially attractive (see subsection 2.7.2). M(s) can be obtained directly by integration

M(s) = ∫_{−∞}^{+∞} e^{xs} ϕ(x) dx = ∫_{−∞}^{+∞} (1/√(2π)) exp( xs − x²/2 ) dx =
     = e^{s²/2} ∫_{−∞}^{+∞} (1/√(2π)) exp( −(x − s)²/2 ) dx = e^{s²/2} ∫_{−∞}^{+∞} ϕ(x − s) dx =
     = e^{s²/2} .   (2.81)

All raw moments of the normal distribution,

μ̂^{(n)} = ∫_{−∞}^{+∞} x^n ϕ(x) dx ,   (2.82)

can be obtained, for example, by successive differentiation of M(s) with respect to s (subsection 2.7.2). The moments are obtained more efficiently

²⁵ We remark that erf(x) and erfc(x) are not normalized in the same way as the normal density: erf(x) + erfc(x) = (2/√π) ∫₀^∞ exp(−t²) dt = 1, but ∫₀^∞ ϕ(x) dx = (1/2) ∫_{−∞}^{+∞} ϕ(x) dx = 1/2.

Figure 2.17: The normal probability density. In the upper part we show the density function of the normal distribution, ϕ(x) = exp(−(x−ν)²/(2σ²))/(√(2π) σ), for ν = 5 and σ = 0.5 (red), 1.0 (black), and 2.0 (blue). The smaller the standard deviation σ, the sharper is the curve. The lower part shows the density function ϕ(x) (red) together with the distribution Φ(x) = ∫_{−∞}^x ϕ(u) du = 0.5 (1 + erf((x − ν)/(√2 σ))) (black) for ν = 5 and σ = 0.5.

by expanding the first and the last expression in the previous equation (2.81)

Figure 2.18: A fit of the normal distribution to the binomial distribution. The curves represent normal densities (red), which were fitted to the points of the binomial distribution (black). Parameter choices for the three binomial distributions: (n = 4, p = 0.5), (n = 10, p = 0.5), and (n = 10, p = 0.1) for the upper, middle, and lower plot, respectively.

in a power series of s,

∫_{−∞}^{+∞} ( 1 + xs + (xs)²/2! + … + (xs)^n/n! + … ) ϕ(x) dx =
     = 1 + s²/2 + (1/2!) (s²/2)² + … + (1/n!) (s²/2)^n + … ,

or, expressed in terms of the moments μ̂^{(n)},

∑_{n=0}^{∞} (μ̂^{(n)}/n!) s^n = ∑_{n=0}^{∞} (1/(2^n n!)) s^{2n} ,

from which we compute the moments by equating the coefficients of the powers of s on both sides of the expansion and find for n ≥ 1:²⁶

μ̂^{(2n−1)} = 0   and   μ̂^{(2n)} = (2n)!/(2^n n!) .   (2.83)

All odd moments vanish because of symmetry. For the fourth moment, the kurtosis, a kind of standardization is common which assigns zero excess kurtosis, γ₂ = 0, to the normal distribution. In other words, excess kurtosis measures peak shape relative to the normal distribution: positive excess kurtosis implies peaks that are sharper than the normal density, negative excess kurtosis peaks that are broader than the normal density.
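A quick numerical check of equation (2.83) is possible by Monte Carlo sampling. The sketch below is an illustration added here (not from the text); the sample size and the chosen orders are arbitrary, and numpy is assumed.

import math
import numpy as np

# Even raw moments of the standard normal density should equal (2n)!/(2^n n!).
rng = np.random.default_rng(seed=2)
x = rng.standard_normal(10**7)
for n in (1, 2, 3):
    exact = math.factorial(2 * n) / (2**n * math.factorial(n))
    print(f"2n = {2*n}:   sample {np.mean(x**(2*n)):8.4f}   exact {exact:8.4f}")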

Multivariate normal distribution. For general applications it is worth considering the normal distribution in multiple dimensions. The random variable is replaced by a random vector,²⁷ X⃗ = (X₁, …, X_n), with the joint probability distribution

²⁶ The definite integrals are: ∫_{−∞}^{+∞} x^n exp(−x²) dx = √π for n = 0;  0 for n ≥ 1, odd;  ((n−1)!!/2^{n/2}) √π for n ≥ 2, even,  where (n−1)!! = 1 · 3 ⋯ (n−1).
²⁷ The notation we use here for vectors is ' ⃗ ' or bold-face letters. Matrices are denoted either by upper-case Roman or upper-case Greek letters. Vectors are commonly understood as columns or one-column matrices. Transposition, indicated by ' ′ ', converts them into row vectors or one-row matrices, and vice versa transposition of a row vector yields a column vector. If no confusion is possible, row vectors are used instead of column vectors in order to save space.

P(X₁ = x₁, …, X_n = x_n) = p(x₁, …, x_n) = p(x) .

This multivariate normal probability density can be written as

ϕ(x) = (1/√((2π)^n |Σ|)) exp( −(1/2) (x − ν)′ Σ^{−1} (x − ν) ) .

The vector ν consists of the (raw) first moments along the different coordinates, ν = (ν₁, …, ν_n), and the variance-covariance matrix Σ contains the n variances in the diagonal, while the covariances appear as off-diagonal elements:

Σ = [ σ²(X₁)        Cov(X₁, X₂)   …   Cov(X₁, X_n)
      Cov(X₂, X₁)   σ²(X₂)        …   Cov(X₂, X_n)
      ⋮              ⋮              ⋱   ⋮
      Cov(X_n, X₁)  Cov(X_n, X₂)  …   σ²(X_n)      ] ,

which is symmetric by the definition of covariances: Cov(X_i, X_j) = Cov(X_j, X_i). The mean and variance are given by ν̂ = ν and the variance-covariance matrix Σ; the moment generating function, expressed in the dummy vector variable s = (s₁, …, s_n), is of the form

M(s) = exp(ν′ s) · exp( (1/2) s′ Σ s ) ,

and, finally, the characteristic function is given by

φ(s) = exp(i ν′ s) · exp( −(1/2) s′ Σ s ) .

Without showing the details we remark that this particularly simple characteristic function implies that all moments higher than order two can be expressed in terms of first and second moments, in particular expectation values, variances, and covariances. To give an example that we shall require later in subsection 3.5.2, the fourth order moments can be derived from

E(X₁ X₂ X₃ X₄) = σ₁₂ σ₃₄ + σ₁₄ σ₂₃ + σ₁₃ σ₂₄   and   E(X₁⁴) = 3 σ₁₁² .   (2.84)
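Equation (2.84) can be verified by sampling from a zero-mean multivariate normal distribution. The sketch below is an illustration added here, not part of the text; the covariance matrix is an arbitrary positive-definite example, and numpy is assumed.

import numpy as np

# Sampling check of E(X1 X2 X3 X4) = s12*s34 + s14*s23 + s13*s24 and
# E(X1^4) = 3*s11^2 for a zero-mean multivariate normal distribution.
rng = np.random.default_rng(seed=3)
Sigma = np.array([[2.0, 0.5, 0.3, 0.1],
                  [0.5, 2.0, 0.2, 0.4],
                  [0.3, 0.2, 2.0, 0.6],
                  [0.1, 0.4, 0.6, 2.0]])     # diagonally dominant, hence positive definite
x = rng.multivariate_normal(np.zeros(4), Sigma, size=2 * 10**6)

lhs = np.mean(x[:, 0] * x[:, 1] * x[:, 2] * x[:, 3])
rhs = (Sigma[0, 1] * Sigma[2, 3] + Sigma[0, 3] * Sigma[1, 2]
       + Sigma[0, 2] * Sigma[1, 3])
print("E(X1 X2 X3 X4):  sample", lhs, "   formula", rhs)
print("E(X1^4):         sample", np.mean(x[:, 0]**4), "   3*s11^2 =", 3 * Sigma[0, 0]**2)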

Normal and binomial distribution. The normal distribution is of general importance since it may be derived, for example, from the binomial distribution

B_k(n, p) = \binom{n}{k} p^k (1 − p)^{n−k} ,   0 ≤ k ≤ n ,

through extrapolation to large values of n at constant p.²⁸ As a matter of fact the expression normal distribution originated from the idea that many distributions can be transformed in a natural way for large n to yield the distribution Φ(x). The transformation from the binomial distribution to the normal distribution is properly done in two steps (see also [5, pp. 210-217]): (i) At first we make the binomial distribution comparable by shifting the maximum towards x = 0 and adjusting the width (figures 2.18 and 2.19). For 0 < p < 1 we introduce the standardized variable²⁹ ξ_k = (k − np)/√(npq).

We assume now an arbitrary but fixed positive constant c. In the range of k defined by |ξ_k| ≤ c we approximate

\binom{n}{k} p^k q^{n−k} ≈ (1/√(2π n p q)) e^{−ξ_k²/2} .

The convergence is uniform with respect to k in the range specified above. (ii) The limit n → ∞ is performed by means of the de Moivre-Laplace theorem, which proves convergence of the (centered and adjusted) distribution of the random variable S_n* towards the normal distribution Φ(x) on any finite

²⁸ This is different from the extrapolation in the previous subsection because the limit lim_{n→∞} B_k(n, α/n) = π_k(α) leading to the Poisson distribution was performed in the limit of vanishing p = α/n.
²⁹ The new variable ξ_k depends also on n, but for short we dispense with a second subscript.

Figure 2.19: Normalization of the binomial distribution. The figure shows a symmetric binomial distribution B(20, 1/2), which is centered around µ = ν = 10 (black). The transformation yields a binomial distribution centered around the origin with unit variance, σ = σ² = 1 (red). The grey and the pink continuous curves are normal distributions ϕ = exp(−(x − ν)²/(2σ²))/√(2πσ²) with the parameters (ν = 10, σ² = np(1−p) = 5) and (ν = 0, σ² = 1), respectively.

interval. For an arbitrary constant interval ]a, b] with a < b, we have

lim_{n→∞} P( (S_n − np)/√(npq) ∈ ]a, b] ) = (1/√(2π)) ∫_a^b e^{−x²/2} dx .   (2.85)

In the proof the definite integral ∫_a^b ϕ(x) dx is partitioned into n (small) segments as in Riemannian integration: the segments still reflect the discrete distribution. In the limit n → ∞ the partition becomes finer and eventually converges to the continuous function described by the integral. A comparison of figures 2.18 and 2.19 shows that the convergence is particularly effective in the symmetric case, p = q = 0.5, where only minor differences are observable already for n = 20 (see also the next subsection 2.7.7).
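The quality of the de Moivre-Laplace approximation (2.85) can be checked directly by summing binomial probabilities over the standardized range ]a, b]. The following sketch is an added illustration; n, p, a, and b are arbitrary choices and numpy and scipy are assumed.

import numpy as np
from scipy.stats import binom, norm

# Compare P(a < (S_n - np)/sqrt(npq) <= b) with Phi(b) - Phi(a).
n, p = 1000, 0.5
a, b = -1.0, 2.0
k = np.arange(n + 1)
xi = (k - n * p) / np.sqrt(n * p * (1 - p))   # standardized values xi_k
mask = (xi > a) & (xi <= b)
exact = binom.pmf(k[mask], n, p).sum()
approx = norm.cdf(b) - norm.cdf(a)
print("binomial probability", exact, "   normal approximation", approx)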

2.7.7 Central limit theorem and the law of large numbers

The central limit theorem, in essence, is a more general formulation of the just discussed convergence of the binomial distribution to the normal distribution in the limit of large n, as expressed by the de Moivre-Laplace theorem (2.85). The outcomes of a number of successive trials described by random variables X_j are summed up to yield the random variable

S_n = X₁ + X₂ + … + X_n ,   n ≥ 1 .

The individual random variables X_j are assumed to be independent and identically distributed. By identically distributed we mean that the variables have a common distribution which need not be specified. In the previous subsection we considered the binomial distribution as the outcome of successive Bernoulli trials. Here, the only assumptions concerning the distributions P(X_j ≤ x) = F_j(x) and P(S_n ≤ x) = F_n(x) are that the means µ and the variances σ² of all random variables X_j are the same and finite.

The first step towards the central limit theorem is again a transformation of variables shifting the maximum to the origin and adjusting the width of the distribution:

X_j* = (X_j − E(X_j))/σ(X_j)   and   S_n* = (S_n − E(S_n))/σ(S_n) = (1/√n) ∑_{j=1}^{n} X_j* .   (2.86)

The values for the individual first and second moments are: E(X_j) = µ, E(S_n) = nµ, σ(X_j) = σ, and σ(S_n) = √n σ. The transformation is in full analogy to the one performed with the random variable S_n of the binomial distribution in the previous subsection and eventually we obtain:

E(X_j*) = 0 ,   σ²(X_j*) = 1 ,
E(S_n*) = 0 ,   σ²(S_n*) = 1 .   (2.87)

If I is the finite interval ]a, b] then F(I) = F(b) − F(a) for any distribution function F, and we can write the de Moivre-Laplace theorem in the compact form

lim_{n→∞} F_n(I) = Φ(I) .   (2.88)

Under the generalized conditions the central limit theorem states that for any interval ]a, b] with a < b the limit

lim_{n→∞} P( (S_n − nµ)/(√n σ) ∈ ]a, b] ) = (1/√(2π)) ∫_a^b e^{−x²/2} dx   (2.89)

is fulfilled. A proof of the central limit theorem makes use of the characteristic function for the unit normal distribution (ν = 0, σ² = 1):

φ(s) = exp( iν s − (1/2) σ² s² ) = e^{−s²/2} .   (2.90)

Assume that for every s the characteristic function for S_n* converges to the characteristic function φ(s),

lim_{n→∞} φ_n(s) = φ(s) = e^{−s²/2} .

Since the φ_n(s) are the characteristic functions associated with the (otherwise arbitrary) distribution functions F_n(x), it follows for every x that

lim_{n→∞} F_n(x) = Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du ,   (2.91)

and in particular the de Moivre-Laplace theorem follows as a special case. Characteristic functions h(s) of random variables X with mean zero, ν = 0, and variance one, σ² = 1, have the Taylor expansion

h(s) = 1 − (s²/2) ( 1 + ε(s) )   with   lim_{s→0} ε(s) = 0

at s = 0, truncated after the second term. The proof is straightforward and starts from the full Taylor expansion up to the second term:

h(s) = h(0) + h′(0) s + (h″(0)/2) s² ( 1 + ε(s) ) .

From h(s) = E(e^{isX}) follows by differentiation

h′(s) = E( iX e^{isX} )   and   h″(s) = E( −X² e^{isX} ) ,

and hence h′(0) = E(iX) = 0 and h″(0) = −E(X²) = −1, yielding the equation given above. Next we consider the characteristic function of S_n*:

E( exp(i s S_n*) ) = E( exp( i s ∑_{j=1}^{n} X_j*/√n ) ) .

The right hand side of the equation can be factorized and yields

E( e^{i s ∑_{j=1}^{n} X_j*/√n} ) = ( E( e^{i s X_j*/√n} ) )^n = ( h(s/√n) )^n ,

where h(s) is the characteristic function of the random variable X_j*. Insertion into the expression for the Taylor series now yields

h(s/√n) = 1 − (s²/(2n)) ( 1 + ε(s/√n) ) .

Herein s is fixed and n approaches infinity:

lim_{n→∞} E( e^{i s S_n*} ) = lim_{n→∞} ( 1 − (s²/(2n)) ( 1 + ε(s/√n) ) )^n = e^{−s²/2} .   (2.92)

For taking the limit in the last step of the derivation we recall the summation of infinite series:

lim_{n→∞} ( 1 − α_n/n )^n = e^{−α}   for   lim_{n→∞} α_n = α .   (2.93)

This is a stronger result than the convergence of the conventional exponential series, lim_{n→∞} (1 − α/n)^n = e^{−α}. Thus we have shown that the characteristic function of the normalized sum of random variables, S_n*, converges to the characteristic function of the unit normal distribution, and therefore by equation (2.91) the distribution F_n(x) converges to the unit normal distribution Φ(x), and the validity of (2.88) follows straightforwardly.

Summarizing the results of this part we conclude that the distribution of the sum S_n of every sequence of n independent random variables X_j with finite variance converges to the normal distribution for n → ∞. This convergence is independent of the particular distribution of the random variables provided all variables follow the same distribution and they have finite mean µ and variance σ².
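The convergence expressed by the central limit theorem is easy to observe in a simulation. The following sketch, added here as an illustration, sums n uniform random variables per run, standardizes the sums according to (2.86), and compares the empirical distribution function with Φ(x); the parameter choices are arbitrary and numpy and scipy are assumed.

import numpy as np
from scipy.stats import norm

# Standardized sums of n iid U(0,1) variables compared with the unit normal.
rng = np.random.default_rng(seed=4)
n, n_runs = 50, 10**5
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)            # mean and standard deviation of U(0,1)
s = rng.random((n_runs, n)).sum(axis=1)         # S_n for each run
s_star = (s - n * mu) / (np.sqrt(n) * sigma)    # normalized sums S_n*

for x in (-1.0, 0.0, 1.0, 2.0):
    print(f"x = {x:4.1f}   F_n(x) = {np.mean(s_star <= x):.4f}"
          f"   Phi(x) = {norm.cdf(x):.4f}")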

In contrast to the rigorous mathematical derivation, simple practical applications used in the large sample theory of statistics turn the limit theorem encapsulated in equation (2.92) into a rough approximation

P( σ√n x₁ < S_n − nm < σ√n x₂ ) ≈ Φ(x₂) − Φ(x₁)   (2.94)

or, for the spread around the sample mean m, setting −x₁ = x₂ = x,

P( |S_n − nm| < σ√n x ) ≈ 2 Φ(x) − 1 .   (2.94')

For practical purposes equation (2.94) was used in pre-computer times together with extensive tabulations of the functions Φ(x) and Φ^{−1}(x), which are still found in statistics textbooks.

The law of large numbers can be derived as a straightforward consequence of the central limit theorem (2.92) [5, pp. 227-233]. For any fixed but arbitrary constant c > 0 we have

lim_{n→∞} P( |S_n/n − µ| < c ) = 1 .   (2.95)

Related to and a consequence of equation (2.95) is Chebyshev's inequality, named after the Russian mathematician Pafnuty Chebyshev, for random variables X that have a finite second moment:

P( |X| ≥ c ) ≤ E(X²)/c²   (2.96)

is fulfilled for any constant c. We dispense here with a proof, which is found in [5, pp. 228-233].

Equation (2.95) is extended to a sequence of independent random variables X_j with different expectation values and variances, E(X_j) = µ(j) and σ²(X_j) = σ_j², with the restriction that there exists a constant Σ² < ∞ such that σ_j² ≤ Σ² is fulfilled for all X_j. Then we have for each c > 0:

lim_{n→∞} P( | (X₁ + … + X_n)/n − (µ(1) + … + µ(n))/n | < c ) = 1 .   (2.97)

For the purpose of illustration we consider a Bernoulli sequence of coin tosses as described by a binomial distribution. First we rewrite equation (2.97) by the introduction of centered random variables X̂_j = X_j − µ(j) – we note that the variables X̂_j are not normalized with respect to variance, X_j* = X̂_j/σ(X_j) – and their sum Ŝ_n = ∑_{i=1}^{n} X̂_i with

E(Ŝ_n) = 0   and   E( (Ŝ_n)² ) = σ²(Ŝ_n) = ∑_{i=1}^{n} σ²(X̂_i) = ∑_{i=1}^{n} σ_i² ,

make use of the boundedness of the variances, σ_j² ≤ Σ²,

E( (Ŝ_n)² ) ≤ n Σ²   and   E( (Ŝ_n/n)² ) ≤ Σ²/n ,

and find through application of Chebyshev's inequality (2.96)

P( |Ŝ_n/n| ≥ c ) ≤ E( (Ŝ_n/n)² )/c² ≤ Σ²/(n c²) .

Hence the probability P above converges to zero in the limit n → ∞. Second, insertion of the specific data for the Bernoulli series yields

P( |S_n/n − p| ≥ c ) ≤ p(1 − p)/(n c²) ≤ 1/(4 n c²) ,

where the last inequality results from p(1 − p) ≤ 1/4 for 0 ≤ p ≤ 1. In order to make an error smaller than ε, the number of trials has to exceed n ≥ 1/(4c²ε). Since the Chebyshev inequality provides a rather crude estimate we present also a sharper bound that leads to the approximation

P( |S_n − np|/√(np(1−p)) ≥ c √(n/(p(1−p))) ) ≈ 2 ( 1 − Φ( c √(n/(p(1−p))) ) ) .

Eventually we put η = c √(n/(p(1−p))) and find

2 ( 1 − Φ(η) ) ≤ ε   or   Φ(η) ≥ 1 − ε/2 ,

which is suitable for numerical evaluation. The main message of the law of large numbers is that for a sufficiently large number of independent events the statistical errors in the sum will vanish and the mean converges to the exact expectation value. Hence, the law of large numbers provides the basis for the assumption of convergence in mathematical statistics.
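As a worked example, the two estimates for the required number of trials can be compared numerically for the fair coin. The sketch below is added here as an illustration; the tolerance c and the error level ε are arbitrary choices and scipy is assumed.

import math
from scipy.stats import norm

# Chebyshev estimate n >= 1/(4 c^2 eps) versus the normal-approximation
# requirement Phi(eta) >= 1 - eps/2 with eta = c*sqrt(n/(p(1-p))), p = 1/2.
c, eps, p = 0.01, 0.05, 0.5

n_chebyshev = 1.0 / (4.0 * c**2 * eps)
eta = norm.ppf(1.0 - eps / 2.0)                  # quantile of the unit normal
n_normal = eta**2 * p * (1.0 - p) / c**2

print("Chebyshev estimate:            n >=", math.ceil(n_chebyshev))
print("normal approximation estimate: n >=", math.ceil(n_normal))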

2.7.8 The Cauchy-Lorentz distribution

The Cauchy-Lorentz distribution is named after the French mathematician Augustin Louis Cauchy and the Dutch physicist Hendrik Antoon Lorentz and is important in mathematics and in particular in physics, where it occurs as the solution to the differential equation for forced resonance. In spectroscopy the Lorentz curve is used for the description of spectral lines that are homogeneously broadened. The Cauchy density function is of the form

f(x) = (1/(πγ)) · 1/( 1 + ((x − x₀)/γ)² ) = (1/π) · γ/( (x − x₀)² + γ² ) ,   (2.98)

which yields the cumulative distribution function

F(x) = (1/π) arctan( (x − x₀)/γ ) + 1/2 .   (2.99)

The two parameters define the position of the peak, x₀, and the width of the distribution, γ (figure 2.20). The peak height or amplitude is 1/(πγ). The function F(x) can be inverted,

F^{−1}(p) = x₀ + γ tan( π (p − 1/2) ) ,   (2.99')

and we obtain for the quartiles and the median the values (x₀ − γ, x₀, x₀ + γ). The characteristic function of the Cauchy distribution is given by

φ_X(s) = E( e^{isX} ) = ∫_{−∞}^{∞} f(x) e^{isx} dx = exp( i x₀ s − γ |s| ) ,   (2.100)

which is the Fourier transform of the probability density; the density in turn can be recovered from

f(x) = (1/(2π)) ∫_{−∞}^{∞} φ_X(s) e^{−ixs} ds .

We remark that the conventional Fourier transformation differs slightly through the choice of a factor 2π and the sign in the exponent. The Cauchy distribution has a well defined median and a well defined mode, given by µ̄ = µ̃ = x₀, and quantiles can be calculated readily from (2.99'), but the mean and all other moments do not exist because the integral ∫_{−∞}^{∞} x f(x) dx diverges.
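The inverse distribution function (2.99') provides a direct way to generate Cauchy-distributed samples, and the missing mean becomes visible in simulations. The sketch below is an added illustration with arbitrary parameters; numpy is assumed.

import numpy as np

# Inverse-transform sampling: x = x0 + gamma*tan(pi*(p - 1/2)) for uniform p.
rng = np.random.default_rng(seed=5)
x0, gamma = 5.0, 2.0
p = rng.random(10**6)
x = x0 + gamma * np.tan(np.pi * (p - 0.5))

print("sample quartiles", np.percentile(x, [25, 50, 75]))
print("exact values    ", [x0 - gamma, x0, x0 + gamma])
# the running 'mean' does not settle down: the Cauchy distribution has no moments
print("running mean of the first 10^2, 10^4, 10^6 samples:",
      [x[:m].mean() for m in (100, 10**4, 10**6)])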

Figure 2.20: The Cauchy-Lorentz probability density. The figure shows three examples of Cauchy-Lorentz densities, f(x) = γ/( π ((x − x₀)² + γ²) ), centered around the median x₀ = 5. The width of the distribution increases with γ, which was chosen to be 0.5 (red), 1.0 (black), and 2.0 (blue).

2.7.9 Bimodal distributions

As the name of the bimodal distribution indicates, the density function f(x) has two maxima. It arises commonly as a mixture of two unimodal distributions in the sense that the bimodally distributed random variable X is defined as

P(X = Y₁) = α   and   P(X = Y₂) = 1 − α .

Bimodal distributions commonly arise from statistics of populations that are split into two subpopulations with sufficiently different properties. The sizes of weaver ants, for example, follow a bimodal distribution because of the existence of two classes of workers [57]. If the differences are too small, as in the case of the combined distribution of body heights for men and women, monomodality is observed [58].

As an illustrative model we choose the superposition of two normal distributions with different means and variances (figure 2.21). The probability

Figure 2.21: A bimodal probability density. The figure illustrates a bimodal distribution modeled as a superposition of two normal distributions (2.101) with α = 1/2 and different values for mean and variance, (ν₁ = 2, σ₁² = 1/2) and (ν₂ = 6, σ₂² = 1): f(x) = ( √2 e^{−(x−2)²} + e^{−(x−6)²/2} )/(2√(2π)). The upper part shows the probability density corresponding to the two modes µ̃₁ = ν₁ = 2 and µ̃₂ = ν₂ = 6. Median µ̄ = 3.65685 and mean µ = 4 are situated near the density minimum between the two maxima. The lower part presents the cumulative probability distribution, F(x) = (1/4)( 2 + erf(x − 2) + erf((x − 6)/√2) ), as well as the construction of the median. The second moments in this example are µ̂₂ = 20.75 and µ₂ = 4.75.

f(x) = (1/(2√(2π))) ( e^{−(x−ν₁)²/(2σ₁²)}/σ₁ + e^{−(x−ν₂)²/(2σ₂²)}/σ₂ ) .   (2.101)

The cumulative distribution function is readily obtained by integration. As in the case of the normal distribution the result is not analytical but is formulated in terms of the error function, which is available only numerically through integration:

F(x) = (1/4) ( 2 + erf( (x − ν₁)/(√2 σ₁) ) + erf( (x − ν₂)/(√2 σ₂) ) ) .   (2.102)

In the numerical example shown in figure 2.21 the distribution function shows two distinct steps corresponding to the maxima of the density f(x). As an exercise, the first and second moments of the bimodal distribution can be readily computed analytically. The results are:

µ̂₁ = µ = (1/2)(ν₁ + ν₂) ,   µ₁ = 0   and
µ̂₂ = (1/2)(ν₁² + ν₂²) + (1/2)(σ₁² + σ₂²) ,   µ₂ = (1/4)(ν₁ − ν₂)² + (1/2)(σ₁² + σ₂²) .

The centered second moment illustrates the contributions to the variance of the bimodal density: it is composed of the sum of the variances of the subpopulations and the square of the difference between the two means, (ν₁ − ν₂)².

In table 2.2 we have listed also several other probability densities which are of importance for special applications. In the forthcoming chapters 4 and 5 dealing with applications we shall make use of them, in particular of the logistic distribution that describes the stochasticity of growth following the logistic equation.
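The moments stated above can be checked numerically for the example of figure 2.21. The sketch below is an added illustration using the parameter values from that figure (α = 1/2, ν₁ = 2, σ₁² = 1/2, ν₂ = 6, σ₂² = 1); numpy and scipy are assumed.

import numpy as np
from scipy.integrate import quad

# Numerical first and second moments of the bimodal density (2.101).
nu1, s1, nu2, s2 = 2.0, np.sqrt(0.5), 6.0, 1.0

def f(x):
    return 0.5 * (np.exp(-(x - nu1)**2 / (2 * s1**2)) / (np.sqrt(2 * np.pi) * s1)
                  + np.exp(-(x - nu2)**2 / (2 * s2**2)) / (np.sqrt(2 * np.pi) * s2))

mean = quad(lambda x: x * f(x), -np.inf, np.inf)[0]
second = quad(lambda x: x**2 * f(x), -np.inf, np.inf)[0]
print("mean      ", mean,   "   expected (nu1+nu2)/2 =", 0.5 * (nu1 + nu2))
print("2nd moment", second, "   expected 20.75")
print("variance  ", second - mean**2, "   expected 4.75")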

3. Stochastic processes

Systems evolving probabilistically in time can be described and modeled in mathematical terms by stochastic processes. More precisely, we postulate the existence of a time dependent random variable X(t) or random vector X⃗(t).¹ We shall distinguish the simpler discrete case,

P_n(t) = P( X(t) = n )   with   n ∈ N₀ ,

and the continuous or probability density case,

dF(x, t) = f(x, t) dx = P( x ≤ X(t) ≤ x + dx )   with   x ∈ R .

In both cases an experiment, or a trajectory, is understood as a recording of the particular values of X at certain times:

T = ( (x₁, t₁), (x₂, t₂), (x₃, t₃), …, (y₁, τ₁), (y₂, τ₂), … ) .   (3.1)

Although it is not essential for the application of probability theory, we shall assume for the sake of clarity that the recorded values are always time ordered, here with the earliest or oldest values in the rightmost position and the most recent value as the latest entry on the left-hand side (figure 3.1):²

t₁ ≥ t₂ ≥ t₃ ≥ … ≥ τ₁ ≥ τ₂ ≥ … .

A trajectory is a sequence of time ordered doubles (x, t). A general comment on the meaning of variables is required. So far we have only used the vague notion of scores and not yet specified what kinds of

¹ At first we need not specify whether X(t) is a simple random variable or a random vector. Later on, when a distinction between problems of different dimensionality becomes necessary, we shall make clear in which sense X(t) is used (variable in one dimension or vector X⃗(t)).
² It is worth noticing that the conventional time axis in drawings of stochastic processes goes in the opposite direction, from left to right.


Figure 3.1: Time order in modeling stochastic processes. Time is progressing from left to right and the most recent event is given by the rightmost recording at time t₁. The Chapman-Kolmogorov equation describing stochastic processes comes in two forms: (i) the forward equation predicting the future from past and present and (ii) the backward equation that extrapolates back in time from present to past.

quantities the random variables (X, Y, Z) ∈ Ω describe or what their realizations in some measurable space, denoted by (x, y, z) ∈ Rⁿ, are. Depending on the chemical or biological model these variables can be discrete numbers of particles in ensembles of atoms or molecules or continuous concentrations, they can be the numbers of individuals in populations, or they can be positions in three-dimensional space when migration processes are considered. In the last example we shall tacitly assume that the one-dimensional variables can be replaced by vectors, (X(t), Y(t), Z(t)) ⇒ (X⃗(t), Y⃗(t), Z⃗(t)) or (x, y, z) ⇒ (x, y, z) with bold-face symbols denoting vectors, without changing the equations, and if this is not the case the differences will be stated. In the more involved models of chemical reaction-diffusion systems the variables will be functions of space and time, for example X(r, t).

In this chapter we shall present a general formalism to describe stochastic processes and distinguish between different classes, in particular drift, diffusion, and jump processes. In essence, we shall use the notation introduced by Crispin Gardiner [59]. The introduction given here is essentially based on the two textbooks [6, 16]. A few examples of stochastic processes of general importance will be discussed in this chapter in order to illustrate the formalism. Applications are presented in the following two chapters 4 and 5.

3.1 Markov processes

A stochastic process, as we shall assume, is determined by a set of joint probability densities the existence of which is given and which determine the system completely:3

p(x₁, t₁; x₂, t₂; x₃, t₃; … ; x_n, t_n; …) .   (3.2)

By the phrase 'the determination is complete' we mean that no additional information is needed to describe the progress in terms of a time ordered series (3.1), and we shall call such a process a separable stochastic process. Although more general processes are conceivable, they play little role in current physics, chemistry, and biology, and therefore we shall not consider them here. Calculation of probabilities from (3.2) by means of marginal densities (2.21) and (2.27) is straightforward. For the continuous case we obtain

P( X₁ = x₁ ∈ [a, b] ) = ∫_a^b dx₁ ∫∫∫_{−∞}^{+∞} dx₂ dx₃ ⋯ dx_n ⋯  p(x₁, t₁; x₂, t₂; x₃, t₃; … ; x_n, t_n; …) ,

and in the discrete case the result is obvious:

P(X = x₁) = p(x₁, ∗) = ∑_{x_k ≠ x₁} p(x₁, t₁; x₂, t₂; x₃, t₃; … ; x_n, t_n; …) .

Time ordering allows us to formulate predictions of future values from the known past in terms of conditional probabilities:

p(x₁, t₁; x₂, t₂; … | y₁, τ₁; y₂, τ₂; …) = p(x₁, t₁; x₂, t₂; … ; y₁, τ₁; y₂, τ₂; …) / p(y₁, τ₁; y₂, τ₂; …) ,

with t₁ ≥ t₂ ≥ … ≥ τ₁ ≥ τ₂ ≥ … . In other words, we may compute {(x₁, t₁), (x₂, t₂), …} from known {(y₁, τ₁), (y₂, τ₂), …}. Before we derive a general concept that allows for flexible modeling and a tractable stochastic description of processes, we introduce a few common and characteristic classes of stochastic processes.

³ The joint density p is defined in the same way as in equations (2.20) and (2.26) but with a slightly different notation. In describing stochastic processes we are always dealing with doubles (x, t), and therefore we separate individual doubles by a semicolon: … ; x_k, t_k; x_{k+1}, t_{k+1}; … .

3.1.1 Simple stochastic processes

The simplest class of stochastic processes is characterized by complete independence,

p(x₁, t₁; x₂, t₂; x₃, t₃; …) = ∏_i p(x_i, t_i) ,   (3.3)

which implies that the current value X(t) is completely independent of its values in the past. A special case is the sequence S_n of Bernoulli trials (see chapter 2, in particular subsections 2.2.1 and 2.7.5) where the probability densities are also independent of time, p(x_i, t_i) = p(x_i), and then we have

p(x₁, t₁; x₂, t₂; x₃, t₃; …) = ∏_i p(x_i) .   (3.3')

Further simplification occurs, of course, when all trials are based on the same probability distribution – for example, if the same coin is tossed in Bernoulli trials – and then the product is replaced by p(x)ⁿ.

The notion of a martingale was introduced by the French mathematician Paul Pierre Lévy, and the development of the theory of martingales is due to the American mathematician Joseph Leo Doob. The conditional mean value of the random variable X(t), provided X(t₀) = x₀, is defined as

E( X(t) | (x₀, t₀) ) = ∫ dx  x p(x, t | x₀, t₀) .

In a martingale the conditional mean is simply given by

E( X(t) | (x₀, t₀) ) = x₀ .   (3.4)

The mean value at time t is identical to the initial value of the process. The martingale property is rather strong and we shall use it in several specific situations.

The somewhat relaxed notion of a semimartingale is of importance because it covers the majority of processes that are accessible to modeling by stochastic differential equations (section 3.5). A semimartingale is composed of a local martingale and a càdlàg adapted process with bounded variation,

X(t) = M(t) + A(t) .

A local martingale is a stochastic process that satisfies the martingale property (3.4) locally, but its expectation value ⟨M(t)⟩ may be distorted at long times by large values of low probability. Hence, every martingale is a local martingale and every bounded local martingale is a martingale. In particular, every driftless diffusion process is a local martingale but need not be a martingale. An adapted or nonanticipating process is a process that cannot see into the future. An informal interpretation [60, section II.25] would say: a stochastic process X(t) is adapted if and only if, for every realization and for every time t, X(t) is known at time t. Càdlàg stands for right-hand continuous with left limits (for the càdlàg property of processes see also section 2.2.2).

More formally the definition of an adapted process reads: for a probability space (Ω, F, P) with I being an index set with total order ≤ (I ∈ N, N₀, I = [0, t] or I = [0, +∞)), F_· = (F_i)_{i∈I} being a filtration⁴ of the σ-algebra F, (S, Σ) being a measurable state space, and X: I × Ω → S being a stochastic process, the process X is said to be adapted to the filtration (F_i)_{i∈I} if the random variable X_i: Ω → S is an (F_i, Σ)-measurable function for each i ∈ I. The concept of adapted processes is essential for the Ito stochastic integral, which requires that the integrand is an adapted process.

⁴ A filtration is an indexed set S_i of subobjects of a given algebraic structure S, with the index i running over some index set I, which is totally ordered with respect to the condition: i ≤ j, (i, j) ∈ I ⇒ S_i ⊆ S_j. Alternatively, in a filtered algebra there is instead the requirement that the S_i are subobjects with respect to certain operations, for example vector addition, whereas other operations are objects from other structures, for example multiplication, which satisfies S_i · S_j ⊂ S_{i⊕j} where the index set is the natural numbers, I ≡ N (see also [61]).

Another simple concept assumes that knowledge of the present only is sufficient to predict the future. It is realized in Markov processes, named after the Russian mathematician Andrey Markov, and can be formulated easily in terms of conditional probabilities:

p(x₁, t₁; x₂, t₂; … | y₁, τ₁; y₂, τ₂; …) = p(x₁, t₁; x₂, t₂; … | y₁, τ₁) .   (3.5)

In essence, the Markov condition expresses more precisely the assumptions made by Albert Einstein and Marian von Smoluchowski in their derivation of the diffusion process. In particular, we have

p(x₁, t₁; x₂, t₂ | y₁, τ₁) = p(x₁, t₁ | x₂, t₂) p(x₂, t₂ | y₁, τ₁) .

As we have seen in section 2.4, any arbitrary joint probability can be simply expressed as a product of conditional probabilities:

p(x₁, t₁; x₂, t₂; x₃, t₃; … ; x_n, t_n) =
   = p(x₁, t₁ | x₂, t₂) p(x₂, t₂ | x₃, t₃) ⋯ p(x_{n−1}, t_{n−1} | x_n, t_n) p(x_n, t_n)   (3.5')

under the assumption of time ordering t₁ ≥ t₂ ≥ t₃ ≥ … ≥ t_{n−1} ≥ t_n.

3.1.2 The Chapman-Kolmogorov equation

From joint probabilities it also follows that summation over all mutually exclusive events of one kind eliminates the corresponding variable:

∑_B P(A ∩ B ∩ C) = P(A ∩ C) .

p(x₁, t₁) = ∫ dx₂ p(x₁, t₁; x₂, t₂) = ∫ dx₂ p(x₁, t₁ | x₂, t₂) p(x₂, t₂) .

p(x₁, t₁ | x₃, t₃) = ∫ dx₂ p(x₁, t₁; x₂, t₂ | x₃, t₃) =
   = ∫ dx₂ p(x₁, t₁ | x₂, t₂; x₃, t₃) p(x₂, t₂ | x₃, t₃) .

For t₁ ≥ t₂ ≥ t₃ and making use of the Markov assumption we obtain the Chapman-Kolmogorov equation, which is named after the British geophysicist and mathematician Sydney Chapman and the Russian mathematician Andrey Kolmogorov:

p(x₁, t₁ | x₃, t₃) = ∫ dx₂ p(x₁, t₁ | x₂, t₂) p(x₂, t₂ | x₃, t₃) .   (3.6)

In case we are dealing with a discrete random variable N ∈ N₀ defined on the integers we replace the integral by a sum and obtain

P(n₁, t₁ | n₃, t₃) = ∑_{n₂} P(n₁, t₁ | n₂, t₂) P(n₂, t₂ | n₃, t₃) .   (3.7)

The Chapman-Kolmogorov equation can be interpreted in two different ways, known as the forward and the backward equation. In the forward equation the double (x₃, t₃) is considered to be fixed and (x₁, t₁) represents the variable x₁(t₁), with the time t₁ proceeding in the positive direction. The backward equation explores the past of a given situation: the double (x₁, t₁) is fixed and (x₃, t₃) is propagating backwards in time. The forward equation is better suited to describe actual processes, whereas the backward equation is the appropriate tool to compute the evolution towards given events, for example first passage times. In order to discuss the structure of solutions of equations (3.6) and (3.7), we shall derive the equations in differential form.
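For a time-homogeneous Markov chain on a finite state space, the discrete Chapman-Kolmogorov equation (3.7) is simply the composition of transition matrices. The sketch below is an added illustration with an arbitrary column-stochastic matrix T, where T[i, j] plays the role of P(i, t+1 | j, t); numpy is assumed.

import numpy as np

# Two-step transition probabilities obtained directly and via the
# intermediate sum over n2, as in equation (3.7).
T = np.array([[0.90, 0.20, 0.10],
              [0.05, 0.50, 0.30],
              [0.05, 0.30, 0.60]])          # columns sum to one

P_two_step = np.linalg.matrix_power(T, 2)   # P(n1, t1 | n3, t3) for t1 - t3 = 2
P_composed = T @ T                          # sum over intermediate states n2
print(np.allclose(P_two_step, P_composed))  # True: (3.7) is satisfied
print("column sums of the two-step matrix:", P_two_step.sum(axis=0))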

Continuity of processes. Before we can do so we need to consider a condition for the continuity of Markov processes. The process goes from position z at time t to position x at time t + ∆t. Continuity of the process implies that the probability of x being finitely different from z goes to zero faster than ∆t in the limit ∆t → 0:

lim_{∆t→0} (1/∆t) ∫_{|x−z|>ε} dx p(x, t + ∆t | z, t) = 0 ,   (3.8)

and this uniformly in z, t, and ∆t. In other words, the difference in probability as a function of |x − z| converges sufficiently fast to zero and thus no jumps occur in the random variable X(t).

Figure 3.2: Continuity in Markov processes. Continuity is illustrated by means of two stochastic processes of the random variable X(t), the Wiener process W(t) (3.9) and the Cauchy process C(t) (3.10). The Wiener process describes Brownian motion and is continuous but almost nowhere differentiable. The even more irregular Cauchy process is wildly discontinuous.

As two illustrative examples for the analysis of continuity we choose in figure 3.2 the Einstein-Smoluchowski solution of Brownian motion,⁵ which leads to a normally distributed probability,

p(x, t + ∆t | z, t) = (1/√(4πD∆t)) exp( −(x − z)²/(4D∆t) ) ,   (3.9)

and the so-called Cauchy process following the Cauchy-Lorentz distribution,

p(x, t + ∆t | z, t) = (1/π) · ∆t/( (x − z)² + ∆t² ) .   (3.10)

In the case of the Wiener process we exchange the limit and the integral,

⁵ Later on we shall discuss this particular stochastic process in detail and call it a Wiener process.

introduce ϑ = (∆t)^{−1}, perform the limit ϑ → ∞, and have

lim_{∆t→0} (1/∆t) ∫_{|x−z|>ε} dx (1/√(4πD)) (1/√∆t) exp( −(x − z)²/(4D∆t) ) =
   = ∫_{|x−z|>ε} dx lim_{∆t→0} (1/∆t) (1/√(4πD)) (1/√∆t) exp( −(x − z)²/(4D∆t) ) =
   = ∫_{|x−z|>ε} dx (1/√(4πD)) lim_{ϑ→∞} ϑ^{3/2}/exp( (x − z)² ϑ/(4D) ) ,   where

lim_{ϑ→∞} ϑ^{3/2}/( 1 + ((x−z)²/(4D)) ϑ + (1/2!) ((x−z)²/(4D))² ϑ² + (1/3!) ((x−z)²/(4D))³ ϑ³ + … ) = 0 .

Since the power expansion of the exponential in the denominator increases faster than every finite power of ϑ, the ratio vanishes in the limit ϑ → ∞ and the value of the integral is zero. In the second example, the Cauchy process, we exchange limit and integral as in the case of the Wiener process, and perform the limit ∆t → 0:

lim_{∆t→0} (1/∆t) ∫_{|x−z|>ε} dx (1/π) ∆t/( (x − z)² + ∆t² ) =
   = ∫_{|x−z|>ε} dx lim_{∆t→0} (1/∆t) (1/π) ∆t/( (x − z)² + ∆t² ) =
   = ∫_{|x−z|>ε} dx lim_{∆t→0} (1/π) 1/( (x − z)² + ∆t² ) = ∫_{|x−z|>ε} dx 1/( π (x − z)² ) ≠ 0 .

The last integral, I = ∫_{|x−z|>ε} dx/(x − z)², takes a value of the order I ≈ 1/ε.

Although it is continuous, the curve of Brownian motion is indeed extremely irregular since it is nowhere differentiable (figure 3.2). The Cauchy-process curve is also irregular but even discontinuous. Both processes, as required for consistency, fulfill the relation

lim_{∆t→0} p(x, t + ∆t | z, t) = δ(x − z) ,

where δ(·) is the so-called delta-function.⁶ It is also straightforward to show that the Chapman-Kolmogorov equation is fulfilled in both cases. Note that a small but finite difference |x − z| > ε is required to avoid the collapse of the distribution to the delta-function, which would prohibit the detection of continuity.

⁶ The delta-function is not a proper function but a generalized function or distribution. It was introduced by Paul Dirac in quantum mechanics. For more details see, for example, [62, pp. 585-590] and [63, pp. 38-42].
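The qualitative difference between the two transition densities (3.9) and (3.10) is easy to see in simulated sample paths. The sketch below, added here as an illustration, builds discretized trajectories whose increments follow the two densities; step size and parameters are arbitrary choices and numpy is assumed.

import numpy as np

# Increments of the Wiener path are normal with variance 2*D*dt (eq. 3.9);
# increments of the Cauchy path are Cauchy-distributed with scale dt (eq. 3.10).
rng = np.random.default_rng(seed=6)
n_steps, dt, D = 1000, 0.01, 0.5

wiener = np.cumsum(rng.normal(0.0, np.sqrt(2 * D * dt), n_steps))
cauchy = np.cumsum(dt * rng.standard_cauchy(n_steps))

# the largest single increment reveals the jumps of the Cauchy process
print("largest Wiener increment", np.max(np.abs(np.diff(wiener))))
print("largest Cauchy increment", np.max(np.abs(np.diff(cauchy))))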

Differential Chapman-Kolmogorov equation. Now we shall develop a differential version of the Chapman-Kolmogorov equation which is based on the continuity condition just discussed. This requires a technique to divide differentiability conditions into parts corresponding either to continuous mo- tion under generic conditions or to discontinuous motion. This partitioning is based on the following conditions for all ε> 0: (3.11)

(i) lim_{∆t→0} (1/∆t) p(x, t + ∆t | z, t) = W(x | z, t), uniformly in x, z, and t for |x − z| ≥ ε;

(ii) lim_{∆t→0} (1/∆t) ∫_{|x−z|<ε} dx (x_i − z_i) p(x, t + ∆t | z, t) = a_i(z, t) + O(ε), uniformly in z and t;

(iii) lim_{∆t→0} (1/∆t) ∫_{|x−z|<ε} dx (x_i − z_i)(x_j − z_j) p(x, t + ∆t | z, t) = B_ij(z, t) + O(ε), uniformly in z and t,

where x_i, x_j, z_i, and z_j refer to particular components of the vectors x and z, respectively. In (i) W(x | z, t) is the jump probability from z to x at time t. It is important to notice that all higher-order coefficients of motion C_ijk, defined in analogy to a_i in (ii) or B_ij in (iii), must vanish by symmetry considerations [6, pp. 47-48]. As an example we consider the third order term defined by

lim_{∆t→0} (1/∆t) ∫_{|x−z|<ε} dx (x_i − z_i)(x_j − z_j)(x_k − z_k) p(x, t + ∆t | z, t) = C_ijk(z, t) + O(ε) .

The function Cijk(z, t) is symmetric in the indices i, j and k, and in order to check the consequences of this symmetry we define

C̃(α, z, t) ≡ ∑_{i,j,k} α_i α_j α_k C_ijk(z, t) ,

which can be written as

C_ijk(z, t) = (1/3!) ∂³ C̃(α, z, t) / (∂α_i ∂α_j ∂α_k) .

By comparison with item (iii) we find

|C̃(α, z, t)| ≤

≤ lim_{∆t→0} (1/∆t) ∫_{|x−z|<ε} dx |α·(x − z)| (α·(x − z))² p(x, t + ∆t | z, t) + O(ε) ,

≤ |α| ε lim_{∆t→0} (1/∆t) ∫_{|x−z|<ε} dx (α·(x − z))² p(x, t + ∆t | z, t) + O(ε) ,

= ε |α| ( ∑_{i,j} α_i α_j B_ij(z, t) + O(ε) ) + O(ε) ,

= O(ε) ,

and accordingly C̃ vanishes. It can be shown by an analogous derivation that all quantities of higher than third order are zero too.

According to the continuity condition (3.8) a Markov process can only have a continuous path if W(x | z, t) vanishes for all x ≠ z. It is suggestive therefore that this function describes discontinuous motion, whereas the quantities a_i and B_ij are connected with aspects of continuous motion. In order to derive a differential version of the Chapman-Kolmogorov equation we consider the time evolution of the expectation of a function f(z) which is (at least) twice differentiable:

∂/∂t ∫ dx f(x) p(x, t | y, t₀) =
   = lim_{∆t→0} (1/∆t) ∫ dx f(x) ( p(x, t + ∆t | y, t₀) − p(x, t | y, t₀) ) =
   = lim_{∆t→0} (1/∆t) ( ∫∫ dx dz f(x) p(x, t + ∆t | z, t) p(z, t | y, t₀) − ∫ dz f(z) p(z, t | y, t₀) ) ,

where we have used the Chapman-Kolmogorov equation in the positive term in order to produce the dz expression. In the negative term we made use of the fact that ∫ dx p(x, t + ∆t | z, t) = 1 because p(x, t + ∆t | z, t) is a conditional probability. The derivation of the expression can be visualized much more easily by considering the association of variables with times: x ↔ t + ∆t, z ↔ t, and y ↔ t₀ with t + ∆t > t > t₀. Then we obtain the second term just by the appropriate change of variables: x ↔ z.

The integral over dx is now divided into two regions, |x − z| ≥ ε and |x − z| < ε. Since f(z) is assumed to be twice continuously differentiable, we

find by means of Taylor expansion up to second order:

f(x) = f(z) + ∑_i (∂f(z)/∂z_i) (x_i − z_i) + (1/2) ∑_{i,j} (∂²f(z)/∂z_i∂z_j) (x_i − z_i)(x_j − z_j) + |x − z|² R(x, z) .

From the condition of twice continuous differentiability follows |R(x, z)| → 0 as |x − z| → 0, where R(x, z) is the remainder term after the truncation of the Taylor series. Substitution into the partial time derivative of the expectation value from above yields:

∂/∂t ∫ dx f(x) p(x, t | y, t₀) =

= lim_{∆t→0} (1/∆t) { ∫∫_{|x−z|<ε} dx dz ( ∑_i (x_i − z_i) ∂f(z)/∂z_i + (1/2) ∑_{i,j} (x_i − z_i)(x_j − z_j) ∂²f(z)/∂z_i∂z_j ) ×
      × p(x, t + ∆t | z, t) p(z, t | y, t₀) +   (3.12a)

+ ∫∫_{|x−z|<ε} dx dz |x − z|² R(x, z) p(x, t + ∆t | z, t) p(z, t | y, t₀) +   (3.12b)

+ ∫∫_{|x−z|<ε} dx dz f(x) p(x, t + ∆t | z, t) p(z, t | y, t₀) +   (3.12c)

+ ∫∫_{|x−z|≥ε} dx dz f(x) p(x, t + ∆t | z, t) p(z, t | y, t₀) −   (3.12d)

− ∫∫ dx dz f(z) p(x, t + ∆t | z, t) p(z, t | y, t₀) } .   (3.12e)

In the last term of the equation, line (3.12e), the integral over x is simply one, since p(x, t + ∆t | z, t) is a probability and the integration covers the entire sample space. In the following we consider the individual terms separately. As we have assumed uniform convergence we can take the limit inside the integral and obtain by means of conditions (ii) and (iii) from (3.11) for the term (3.12a):

∫ dz ( ∑_i a_i(z) ∂f(z)/∂z_i + (1/2) ∑_{i,j} B_ij(z) ∂²f(z)/∂z_i∂z_j ) p(z, t | y, t₀) + O(ε) .

The next term (3.12b) is a remainder term and vanishes in the limit ε → 0

[6, p.49]:

(1/∆t) ∫_{|x−z|<ε} dx (x − z)² R(x, z) p(x, t + ∆t | z, t) ≤

≤ ( (1/∆t) ∫_{|x−z|<ε} dx (x − z)² p(x, t + ∆t | z, t) ) · max_{|x−z|<ε} R(x, z) →

→ ( ∑_{i,j} B_ij(z, t) + O(ε) ) · max_{|x−z|<ε} R(x, z) .

From the previously stated requirement of twice continuous differentiability follows

max_{|x−z|<ε} R(x, z) → 0   as   ε → 0 .

The remaining three terms, (3.12c), (3.12d), and (3.12e), can be combined and yield:⁷

∫∫_{|x−z|<ε} dx dz f(z) ( W(z | x, t) p(x, t | y, t₀) − W(x | z, t) p(z, t | y, t₀) ) .

The whole right-hand side of equation (3.12) is independent of ε. Thus we can take the limit ε → 0 and find

∂/∂t ∫ dz f(z) p(z, t | y, t₀) =

= ∫ dz ( ∑_i a_i(z, t) ∂f(z)/∂z_i + (1/2) ∑_{i,j} B_ij(z) ∂²f(z)/∂z_i∂z_j ) p(z, t | y, t₀) +

+ ∫ dz f(z) –∫ dx ( W(z | x, t) p(x, t | y, t₀) − W(x | z, t) p(z, t | y, t₀) ) ,

where –∫ dx stands for a principal value integral, for example,

–∫ dx F(x, z) ≡ lim_{ε→0} ∫_{|x−z|>ε} dx F(x, z) ,

the principal value integral of a function F(x, z). For any realistic process we can assume that this integral exists. Condition (i) of (3.11) defines W(x | z, t) for x ≠ z only (ε = |x − z| > 0), and hence leaves open the possibility that it becomes infinite at x = z.⁸ In general, when p(x, t | y, t₀) is continuous and once differentiable, the principal value integral exists. In the forthcoming part we shall dispense with spelling out the principal value integral explicitly, since singular cases like the Cauchy process, for which contour integration based on complex function theory is required, are considered only rarely.

⁷ Note that we interchanged the variables x and z in the two positive terms (3.12c) and (3.12d). This is generally admissible since the integration extends over both domains.

The final step in our derivation is now integration by parts, for which we recall ∫ f′(x) g(x) dx = f(x) g(x) − ∫ f(x) g′(x) dx from elementary calculus. Some careful computation finally yields

∂/∂t ∫ dz f(z) p(z, t | y, t₀) =

= ∫ dz f(z) ( − ∑_i ∂/∂z_i ( a_i(z, t) p(z, t | y, t₀) ) + (1/2) ∑_{i,j} ∂²/∂z_i∂z_j ( B_ij(z, t) p(z, t | y, t₀) ) +

+ ∫ dx ( W(z | x, t) p(x, t | y, t₀) − W(x | z, t) p(z, t | y, t₀) ) ) +

+ surface terms .

So far we have not yet specified the range of the integrals. The process under consideration is assumed to be confined to a region R ∈ Ω in sample space with some surface S. Clearly, probabilities vanish when the variables are outside R and by definition we have

p(x, t | z, t₀) = 0   and   W(x | z, t) = 0   unless both x, z ∈ R .

The situation with the functions a_i(z, t) and B_ij(z, t) is more subtle, since the conditions on them can lead to discontinuities because the conditional

⁸ This is indeed the case for the Cauchy process (figure 3.2), for which W(x | z, t) = 1/( π (x − z)² ).

probability p(x, t + ∆t | z, t₀) may well change discontinuously as z crosses the boundary of R. Such a discontinuity could arise, for example, because no transitions are allowed from outside R to inside R or vice versa.

Integration by parts requires, as initially stated, that a_i and B_ij are once and twice differentiable, respectively. In order to avoid problems related to discontinuous behavior at the surface S we may choose the function f(z) to be (arbitrary but) non-vanishing only in an arbitrary region R′ = R \ S ⊂ R, and according to the definition of f(z) the surface terms then vanish necessarily. Then we have for all z in the interior of R:

∂p(z, t | y, t₀)/∂t =

= − ∑_i ∂/∂z_i ( a_i(z, t) p(z, t | y, t₀) ) +   (3.13a)

+ (1/2) ∑_{i,j} ∂²/∂z_i∂z_j ( B_ij(z, t) p(z, t | y, t₀) ) +   (3.13b)

+ ∫ dx ( W(z | x, t) p(x, t | y, t₀) − W(x | z, t) p(z, t | y, t₀) ) .   (3.13c)

This equation has been called the differential Chapman-Kolmogorov equation by Crispin Gardiner [64]. Precisely, it is the forward equation since it specifies initial conditions lying in the past and describes the development of the probability density with increasing time. Later on, we shall also discuss a backward equation.

From a mathematical purist's point of view it is not clear from the derivation given here that solutions of the differential Chapman-Kolmogorov equation (3.13) exist or that the solutions of (3.13) are also solutions of the Chapman-Kolmogorov equation (3.6). It is true, however, that the set of conditional probabilities obeying equation (3.13) does generate a Markov process in the sense that the joint probabilities produced satisfy all probability axioms. It has been shown that a non-negative solution to the differential Chapman-Kolmogorov equation exists and satisfies the Chapman-Kolmogorov equation under certain conditions (see [65, Vol. II]):

(i) a(x, t) = { a_i(x, t); i = 1, … } and B(x, t) = { B_ij(x, t); i, j = 1, … } are specific vectors and positive semidefinite matrices⁹ of functions, respectively,

(ii) the W(x | y, t) are non-negative quantities,

(iii) the initial condition has to satisfy p(z, t | y, t) = δ(y − z), which follows from the definition of a conditional probability density, and

(iv) appropriate boundary conditions have to be fulfilled.

The boundary conditions are very hard to specify for the full equation but can be discussed precisely for special cases, for example in the case of the Fokker-Planck equation [9].

⁹ A positive definite matrix has exclusively positive eigenvalues, λ_k > 0, whereas a positive semidefinite matrix has non-negative eigenvalues, λ_k ≥ 0.

3.2 Classes of stochastic processes

The differential Chapman-Kolmogorov equation describes three classes of components of stochastic processes, which allow us to define several important special cases. These classes refer to the three conditions (i), (ii), and (iii) discussed in (3.11) as well as combinations derived from them. Here, we shall be concerned only with the general aspect of classification. Specific examples will be discussed in the forthcoming section 3.4.

3.2.1 Jump process and master equation

For the jump process we consider the last term in the differential Chapman-

Kolmogorov equation (3.13c) and set ai(z, t) = 0 and Bij(z, t) = 0 for all i and j. The resulting equation is known as master equation:

∂p(z, t | y, t₀)/∂t = ∫ dx ( W(z | x, t) p(x, t | y, t₀) − W(x | z, t) p(z, t | y, t₀) ) .   (3.14)


Figure 3.3: Jump process. The figure shows a typical trajectory of a jump process J(t) as described, for example, by a master equation. The random variable X(t) stays constant except at certain discrete points where the jumps occur.

In order to illustrate the general process described by the master equation (3.14) we consider the evolution in a short time interval. For this goal we solve approximately to first order in ∆t and use the initial condition p(z, t | y, t) = δ(y − z) representing a sharp probability density at t = 0:¹⁰

p(z, t + ∆t | y, t) = p(z, t | y, t) + (∂/∂t) p(z, t | y, t) ∆t + … ≈
   ≈ p(z, t | y, t) + (∂/∂t) p(z, t | y, t) ∆t =

= δ(y − z) + ( W(z | y, t) − ∫ dx W(x | y, t) δ(y − z) ) ∆t =

= ( 1 − ∫ dx W(x | y, t) ∆t ) δ(y − z) + W(z | y, t) ∆t .

In the first term, the coefficient of δ(y − z) is the (finite) probability for the particle to stay in the original position y, whereas the distribution of particles that have jumped is given after normalization by W(z | y, t). A typical path X⃗(t) thus will consist of constant sections, X⃗(t) = const, and discontinuous jumps distributed according to W(z | y, t) (figure 3.3). It is worth noticing that a pure jump process occurs here even though the variable X⃗(t) can take on a continuous range of values.

A highly relevant special case of the master equation is obtained when the sample space is mapped onto the space of integers, Ω → Z = {…, −2, −1, 0, 1, 2, …}. Then we can use conditional probabilities rather than probability densities in the master equation:

∂P(n, t | n₀, t₀)/∂t = ∑_m ( W(n | m, t) P(m, t | n₀, t₀) − W(m | n, t) P(n, t | n₀, t₀) ) .   (3.14')

Clearly, the process is confined to jumps since only discrete values of the random variable N(t) are allowed. The master equation on the even more restricted sample space Ω → N₀ = {0, 1, 2, …} is of particular importance in chemical kinetics. The random variable N(t) then counts particle numbers, which are necessarily non-negative integers.
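Trajectories of the kind sketched in figure 3.3 are easy to simulate once the transition probabilities W are specified. The following sketch, added here as an illustration and not a case worked out in the text, takes irreversible first-order decay with W(n−1 | n, t) = k·n as an example; the waiting times between jumps are then exponentially distributed, and the parameter values are arbitrary. numpy is assumed.

import numpy as np

# Sample trajectory of a jump process: first-order decay, n -> n - 1 with rate k*n.
rng = np.random.default_rng(seed=7)
k, n = 0.1, 100                      # rate parameter and initial particle number
t, times, numbers = 0.0, [0.0], [n]

while n > 0:
    t += rng.exponential(1.0 / (k * n))   # waiting time to the next jump
    n -= 1                                # the jump: n -> n - 1
    times.append(t)
    numbers.append(n)

print("first five jump times", np.round(times[1:6], 3))
print("decay completed at t =", round(t, 2))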

¹⁰ We recall a basic property of the delta-function: ∫ f(x) δ(x − y) dx = f(y).

Figure 3.4: Drift and diffusion process. The figure shows a typical trajectory of a drift-and-diffusion process whose probability density is described by a Fokker-Planck equation. The sample curve D(t) is characterized by drift (red) and diffusion (pink) of the random variable X(t). The band indicates a confidence interval of about ν ± σ that contains 68.2 % of the points (when they are normally distributed; see subsection 2.7.6).

3.2.2 Diffusion process and Fokker-Planck equation

The Fokker-Planck equation is in a way complementary to the master equation since the quantities W(z | x, t) are assumed to be zero and hence jumps are excluded:

∂p(z, t | y, t₀)/∂t =
   = − ∑_i ∂/∂z_i ( a_i(z, t) p(z, t | y, t₀) ) + (1/2) ∑_{i,j} ∂²/∂z_i∂z_j ( B_ij(z, t) p(z, t | y, t₀) ) .   (3.15)

The process corresponding to the Fokker-Planck equation is a diffusion process with a(z, t) being the drift vector and B(z, t) the diffusion matrix. As a result of the definition given in condition (iii) of (3.11), the diffusion matrix is positive semi-definite and symmetric. The trajectories of the diffusion process are continuous, as follows directly from W(z | x, t) = 0 in condition (i) of (3.11).

Making use of the initial condition p(z, t | y, t) = δ(z − y) and neglecting the derivatives of a_i(z, t) and B_ij(z, t) for small ∆t we find the approximation

2 ∂p(z, t y, t0) ∂p(z, t y, t0) 1 ∂ p(z, t y, t0) | = a (z, t) | + B (z, t) | , ∂t − i ∂z 2 ij ∂z z ∂ i i i,j i j X X 11 which can be solved for small ∆t = t t0 and obtain in matrix form − 1 1/2 p(z, t + ∆t y, t) = B(y, t) − | (2π)n ∆t | | ×   1 p 0 − 1 z y a(y, t) ∆t B(y, t) z y a(y, t) ∆t exp − − − − , × −2    ∆t      which is a normal distribution with a variance-covariance matrix Σ = B(y, t) and an expectation value ν = a(y, t) ∆t. The general picture thus is a sample point moving with a systematic drift, whose is a(y, t), and a superimposed Gaussian fluctuation with a covariance matrix B(y, t):

y(t + ∆t) = y(t) + a y(t), t ∆t + η(t) √∆t ,  with E η(t) = 0 and E η(t)η(t)0 = B(y, t). Accordingly, this picture gives (i) trajectories which are always continuous (since, for example, y(t + ∆t)   → y(t) is fulfilled for ∆t 0), and → (ii) trajectories which are nowhere differentiable because of the √∆t depen- dence of the fluctuations.12 An example of a typical solution of a Fokker- Planck equation is shown in figure 3.4.

11For readers not familiar with matrix formalism it is a straightforward exercise to write down the one-dimensional solution of the problem (See also subsection 2.7.6). 12The role of the √∆t dependence will become clear when we discuss the Wiener process in subsection 3.4.3. Stochastic Kinetics 121

3.2.3 Deterministic processes and Liouville’s equation

When the differential Chapman-Kolmogorov equation contains only the first term, all others being zero, the resulting differential equation is a special case of the Liouville equation

∂p(z, t y, t0) ∂ | = a (z, t) p(z, t y, t0) , (3.16) ∂t − ∂z i | i i X   which is known from . Equation (3.16) describes a com- pletely deterministic motion which is a solution of the ordinary differential equation dx(t) = a x(t), t with x(y, t0) = y . (3.17) dt The (probabilistic) solution of the differential equation (3.16) with the initial condition p(z, t0 y, t0)= δ(z y) is p(z, t y, t0)= δ z x(y, t) . | − | − The proof of this assertion is obtained by direct substitution [6, p.54]. ∂ ∂ a (z, t) δ z x(y, z) = a (x(y, t), t) δ z x(y, t) , ∂z i − ∂z i − i i i i X   X   ∂ = a (x(y, t), t) δ z x(y, t) , i ∂z − i i X  ∂ ∂ dx (y, t) and δ z x(y, t) = δ z x(y, t) i , ∂t − − ∂z − dt i i  X  and by means of equation (3.17) the last two lines become equal.

If a particle is in a well-defined position y at time t0 it will remain on the trajectory obtained by solving the corresponding ordinary differential equation (ODE). Deterministic motion can be visualized therefore as an ele- mentary form of Markov process, which can be formulated by a drift-diffusion process with a zero diffusion matrix. 122 Peter Schuster

3.3 Forward and backward equations

Equations which reproduce the time development with respect to initial vari- ables (y, t0) of the probability density p(x, t y, t0) are readily derived: | 1 lim p(x,t y,t0 +∆t0) p(x,t0 y,t0) = ∆t0 0 ∆t0 | − | →   1 = lim dz p(z,t0 +∆t0 y,t0) p(x,t y,t0 +∆t0) p(x,t z,t0 +∆t0) . ∆t0 0 ∆t0 | | − | → Z  

Figure 3.5: Illustration of forward and backward equations. The forward differential Chapman-Kolmogorov equation starts from an initial condition corre- sponding to the sharp distribution δ(y z), (y,t ) is fixed (blue), and the probabil- − 0 ity density unfolds with time t t (black). It is well suited for the description of ≥ 0 actual experimental situations. The backward equation, although somewhat more convenient and easier to handle from the mathematical point of view, is less suited to describe typical experiments and commonly applied to first passage time or exit problems. Here, (x,t) is held constant (blue) and the time dependence of the probability density corresponds to samples unfolding into the past, t t (red). 0 ≤ The initial condition, δ(y z), in this case is rather a final condition represented − by a sharp final distribution. Stochastic Kinetics 123

Thereby we used the Chapman-Kolmogorov equation in the second term and the fact that the first term yields 1 p(x, t y, t0 + ∆t0) on integration. × | Under the usual conditions that p(x, t0 y, t0) is continuous and bounded | in x, t, and t0 for a finite range t t0 >δ> 0 and that all relevant derivatives − exist, we may rewrite the left-hand side of the last equations 1 lim p(x,t y,t0 +∆t0) p(x,t0 y,t0) = ∆t0 0 ∆t0 | − | →   1 = lim dz p(z,t0 +∆t0 y,t0) p(x,t y,t0) p(x,t z,t0) . ∆t0 0 ∆t0 | | − | → Z   Similarly as in section 3.1.2 we can proceed and derive a differential version of this class of the Chapman-Kolmogorov equation:

∂p(x, t y, t0) ∂p(x, t y, t0) | = a (y, t0) | (3.18a) ∂t − i ∂y − 0 i i X 2 1 ∂ p(x, t y, t0) B (y, t0) | + (3.18b) − 2 ij ∂y ∂y i,j i j X

+ dz W (z y, t0) p(x, t y, t0) p(x, t z, t0) . (3.18c) | | − | Z   This equation is called the backward differential Chapman-Kolmogorov equation in contrast to the previously derived forward equation 3.13. In purely mathematical terms the backward equation is (somewhat) better de- fined than its forward analogue. The appropriate initial condition is

p(x, t y, t) = δ(x y) for all t , | − which expresses the fact that the probability density for finding the particle at position x at time t if it is at y at the same time is δ(x y), or in other words − the (classical and non-quantum-mechanical) particle can be simultaneously at x and y if and only if x y. ≡ The forward and the backward equations are equivalent to each other. The basic difference concerns the set of variables which is held fixed. In case on the forward equation we hold (y, t0) fixed, and consequently solutions exist for t t0, so that p(x, t y, t) = δ(x y) is an initial condition for the ≥ | − forward equation. The backward equation has solutions for t0 t and hence ≤ 124 Peter Schuster

it expresses development in t0. Accordingly, p(x, t y, t) = δ(x y) is a final | − condition rather than an initial condition. Both differential expressions, the forward and the backward equation, are useful in their own right. The forward equation gives more directly the values of measurable quantities as functions of the observed time. Accordingly, it is more commonly used in modeling experimental systems. The backward equation finds applications in the study of first passage time or exit prob- lems in which we search for the probability that a particle leaves a region at a certain time. Stochastic Kinetics 125

3.4 Examples of special stochastic processes

In this section we discuss a few stochastic processes which are of special importance and, at the same time, well understood in terms of mathematical analysis.

3.4.1 Poisson process

The Poisson process is commonly used to model certain classes of cumulative random events. These may be, for example, electrons arriving at an anode, customers entering a shop or telephone calls arriving at a switch board. The cumulative number of these events is denoted by the random variable (t). N The probability of arrival is assumed to be λ per unit time, or λ ∆t = λ · in a time interval of length ∆t. The master equation for this process is derived from the conditional probability quantities W ( ) which we denote ·|· as transition probabilities

W (n +1 n, t) = α and otherwise W (m n, t)=0 m = n +1 . (3.19) | | ∀ 6 Accordingly, the master equation is of the form

∂P (n, t n0, t0) | = λ P (n 1, t n0, t0) P (n, t n0, t0) (3.20) ∂t − | − |   and represents a one-sided random walk with a probability α for the walker to step to the right within a unit time interval. The increase in the probability to have n recorded events at time t is proportional to the difference in the probabilities of n 1 and n recorded − events, because of the elementary processes (n 1 n) and (n n + 1) − → → of a single arrival, which increase or decrease the probability of n events, re- spectively. We solve the master equation by introducing the time dependent characteristic function see equations (2.59) and (2.59’) :

. ııs n(t)  . φ(s, t) = E(e ) = P (n, t n0, t0) exp(ııs n) . | n X Now we differentiate φ(s, t) with respect to time and obtain by combining it 126 Peter Schuster with the master equation

. ∂ ∂ ııs n φ(s, t) = P (n, t n0, t0) e = ∂t ∂t | n X . ııs n = λ P (n 1, t n0, t0) P (n, t n0, t0) e = − | − | n X   . . . ııs(n 1) ııs ııs n = λ P (n 1, t n0, t0) e − e P (n, t n0, t0) e = − | − | n ! X. = λ eııs 1 φ(s, t) . −  Since, at first, time t is the only explicit variable it is straightforward to compute the solution:

. φ(s, t) = φ(s, 0) exp λ (eııs 1) t . (3.21) −   It is meaningful to assume that there are no electrons or customers at time t = 0, which implies P (0, 0) = 1, P (n, 0) = 0 for all n = 0, and φ(s, 0) = 1. 6 We are now obtaining the respective solution

n n λt (λt) α α P (n, t 0, 0) = e− = e− . (3.22) | n! n! With λt = α is our old friend, the Poisson distribution (2.74), which has the expectation value E (t) = λt. N In case the probability space is discrete, n N0, and the initial conditions  ∈ are simple, for example P (n, 0) = δ(n) or P (n, 0) = δ(n m) for t = 0, we − shall prefer a simpler notation for the probability of particle numbers

P (t) = Prob( (t)= n) . (3.23) n N With the transition probabilities given in (3.19) we can write equations and solution of the Poisson process in the following way:

n dPn(t) (λt) λt = λ Pn 1(t) Pn(t) and Pn(t) = e− . (3.24) dt − − n!   This notation will conveniently be used for the majority of stochastic pro- cesses in chemistry and biology. Stochastic Kinetics 127

As said initially the Poisson process can be viewed also from a slightly different perspective by considering the arrival times of individual indepen- dent events as random variables , , . We shall assume that they are T1 T2 ··· a t positive and follow an exponential density %(a, t) = a e− · with a > 0 and · ∞ 0 %(a, t) dt = 1, and thus for each index j we have

R a t a t P ( t)=1 e− · and thus P ( > t) = e− · , t 0 . Tj ≤ − Tj ≥ Independence of the individual events implies the validity of

a(t1+...+tn) P ( > t ,..., > t ) = P ( > t ) . . . P ( > t ) = e− , T1 1 Tn n T1 1 · · Tn n which determines the joint probability distribution of the arrival times ’s. Tj The expectation value of the inter-arrival times is simply given by E( )= 1 . Tj a Clearly, the smaller a is, the longer will be the mean inter-arrival time, and thus a can be addressed as the intensity of flow. In comparison to the previous derivation we have a λ. For = 0 and n 1 we define by the cumulative ≡ S0 ≥ random variable n = + . . . = Sn T1 Tn Tj j=1 X the waiting time until the nth arrival. The event = ( t) implies I Sn ≤ that the nth arrival has occurred before time t. The connection between the arrival times and the cumulative number of arrived objects, (t), is easily N performed and illustrates the usefulness of the dual point of view:

P ( ) = P ( t) = P ( (t) n) . I Sn ≤ N ≥ More precisely, (t) is determined by the whole sequence ( , j 1), and N Tj ≥ depends on the elements ω of the sample space through the individual arrival times . In fact, we can compute the number of objects exactly by Tj (t)= n = t t = t . {N } {Sn ≤ } − {Sn+1 ≤ } {Sn ≤ ≤ Sn+1} We may interpret this equation directly: there are exactly n arrivals in [0, t] if and only if the arrival n occurs before t and the arrival (n+1) occurs after 128 Peter Schuster t. For each value of t the probability distribution of the random variable (t) is given by N P ( (t)= n) = P t P t , n N0 , N {Sn ≤ } − {Sn+1 ≤ } ∈ where we used already the initial condition = 0. As we have shown before S0 this distribution of (t) is the Poisson distribution π(at)= π(λt)= π(α). N

3.4.2 Random walk in one dimension

The random walk in one dimension is now a classical and famous problem in probability theory. A walker moves along a line and takes steps to the left or to the right with equal probability and length `. The position of the walker is thus n` with n being an integer, n Z. The first problem to solve ∈ is the computation of the probability that the walker reaches a given point at distance n` from the origin within a predefined time span. For this goal we encapsulate the random walk in a master equation and try to find an analytical solution. For the master equation we have the following transition probabilities per unit time:

W (n+1 n, t) = W (n 1 n, t) = ϑ and W (m n, t)=0 m = n+1, n 1 . | − | | ∀ 6 { − } (3.25) Hence, the master equation describing the evolution of the probability for the walker to be in position n` at time t when he started at n0 ` at time t0 is

∂P (n, t n0, t0) | = ϑ P (n +1, t n0, t0) + P (n 1, t n0, t0) 2 P (n, t n0, t0) . ∂t | − | − |  (3.26) As for the Poisson process, the master equation can be solved by means of the time dependent characteristic function see equations (2.59) and (2.59’) :

. ııs n(t) .  φ(s, t) = E(e ) = P (n, t n0, t0) exp(ııs n) . (3.27) | n X Combining (3.26) and (3.27) yields

. . ∂φ(s, t) ııs ııs = ϑ e + e− φ(s, t) . ∂t  Stochastic Kinetics 129

Figure 3.6: Probability distribution of the random walk. The figure presents the conditional probabilities P (n,t 0, 0) of a random walker to be in posi- | tion n Z at time t for the initial condition to be at n = 0 at time t = t = 0. The ∈ 0 n-values of the individual curves are: n = 0 (black), n = 1 (blue), n = 2 (purple), and n = 3 (red). Parameter choice: ϑ = 1.

Accordingly, the solution for the initial condition n0 =0 at t0 = 0 is . . . . ııs ııs ııs ııs φ(s, t) = φ(s, 0) exp ϑ t (e + e− 2) = exp ϑ t (e + e− 2) . − − . Comparison of the coefficients for individual powers of eııs yields the individ- ual conditional probabilities:

2ϑt P (n, t 0, 0) = I (4ϑt) e− , n Z or | n ∈ 2ϑt P (t) = I (4ϑt) e− , n Z for P (0) = δ(n) . (3.28) n n ∈ n where the pre-exponential term is written in terms of modified Bessel func- tions Ik(θ) with θ =4ϑt, which are defined by ∞ (θ/4)2j+k I (θ) = . (3.29) k j!(j + k)! j=0 X It is straightforward to calculate first and second moments from the charac- teristic function φ(s, t) by means of equation (2.73) and the result is: E (t) = n and σ2 (t) = 2ϑ (t t ) . (3.30) N 0 N − 0   130 Peter Schuster

The expectation value is constant and coincides with the starting point of the random walk and the variance increases linearly with time.

In figure 3.6 we illustrate the probabilities Pn(t) by means of a concrete example. The probability distribution is symmetric for a symmetric initial condition Pn(0) = δ(n) and hence Pn(t)= P n(t). For long times the proba- − bility density P (n, t) becomes flatter and flatter and eventually converges to the uniform distribution over the spatial domain. In case n Z all probabil- ∈ ities vanish: limt Pn(t)=0 for all n. →∞ In the random walk we may also discretize time in the sense that the walker takes a step precisely every time interval τ. Then, time is a discrete variable, t = m τ with m N0. In addition we assume the random walk to · ∈ be symmetric in the sense that steps to the left and to the right are taken with equal probability and find: 1 P n, (m + 1)τ n0,m0τ = P (n + 1,mτ n0,m0τ) + P (n 1,mτ n0,m0τ) . | 2 | − |    For small τ the continuous and the discrete process represent approximations 1 1 to each other with t = m τ, t0 = m0 τ, and ϑ = 2 τ − . The transition probability per unit time, ϑ, in the master equation model corresponds to one half of the inverse waiting time, τ, in the discrete model. Again, it is straightforward to apply the same generating function – with m = t/τ – which leads to the solution m . . 1 ııs ııs φ(s, m)= e + e− 2    and finally to the probability distribution

m 1 1 m n m + n − P (n, m τ 0, 0) = m! − ! ! , (3.31) | 2 2 2        which is also known as the Bernoulli distribution. It is also straightforward to consider the continuous time random walk in the limit of continuous space. This is achieved by setting the distance traveled to x = n` and performing the limit ` 0. For that purpose we can → start from the characteristic function of the distribution in x,

. . . ıısx ıı`s ıı`s φ(s, t) = E e = φ(`s,t) = exp ϑ (e + e− 2) , −     Stochastic Kinetics 131 and take the limit of infinitesimally small steps, lim ` 0: → . . ıı`s ıı`s lim exp ϑ t (e + e− 2) t = ` 0 − →   = lim exp ϑ t ( `2s2 + . . .) = exp( s2 Dt/2) , ` 0 → − −   2 where we used the definition D = 2 lim` 0(` d). Since this is the character- → istic function of the normal distribution we have for the density (2.79): 1 p(x, t 0, 0) = exp x2/2Dt . (2.79) | √2πDt −  We could also have proceeded directly from equation (3.26) and expanded the right-hand side as a function of x up to second order in ` which gives ∂p(x, t 0, 0) D ∂2 | = p(x, t 0, 0) , (3.32) ∂t 2 ∂x2 | 2 where D stands again for 2 lim` 0(` ϑ). This equation will be considered in → detail in the next section 3.4.3, which is dealing with the Wiener process.

3.4.3 Wiener process and the diffusion problem

The Wiener process named after the American mathematician and logician Norbert Wiener is fundamental in many aspects. It is synonymous to Brown- ian motion or white noise and describes among other things random fluctua- tions caused by thermal motion. From the point of view of statistic processes the Wiener process is the solution of the Fokker-Planck equation in one ran- dom variable, (t) and the probability W w0 P ( (t) w0)= p(w, t) dw , W ≤ Z−∞ with drift coefficient zero and diffusion coefficient D = 1. This equation reads: ∂p(w, t w , t ) 1 ∂2 | 0 0 = p(w, t w , t ) . (3.33) ∂t 2 ∂w2 | 0 0 We solve for the initial condition on the conditional probability p(w, t w , t )= 0| 0 0 δ(w w ) by using the characteristic function − 0 . φ(s, t) = dwp(w, t w , t ) exp(ııs w) , | 0 0 Z 132 Peter Schuster which fulfils ∂φ(s, t)/∂t = 1 s2 φ(s, t) as can be shown by applying integra- − 2 tion in parts twice and making use that p(w, t w , t ) like every probability | 0 0 density has to vanish in the limits w and the same is true for the first → ±∞ partial derivative (∂p/∂w). Next we compute the characteristic function by integration: 1 φ(s, t) = φ(s, t ) exp s2 (t t ) . (3.34) 0 · −2 − 0 .  With the initial condition φ(s, t0) = exp(ııs w0) we complete the characteris- tic function . 1 φ(s, t) = exp ııs w s2 (t t ) (3.35) 0 − 2 − 0   and eventually obtain the probability density through inversion of Fourier transformation

2 1 (w w0) p(w, t w0, t0) = exp − . (3.36) | 2π (t t ) − 2(t t0) − 0  −  The density function is a normalp distribution with an expectation value of 2 E (t) = w = ν and variance of E (t) w = t t = σ(t)2, so W 0 W − 0 − 0 that an initially sharp distribution spreads in time as illu strated in figure 3.7. The Wiener process may be characterized by three important features:

(i) irregularity of sample path,

(ii) non-differentiability of sample path, and

(iii) independence of increment.

Although the mean value E (t) is well defined and independent of time, W w , in the sense of a martingale, the mean square E (t)2 becomes infinite 0  W as t . This implies that the individual trajectories, (t), are extremely →∞ W variable and diverge after short time (see, for example, the three trajectories of the forward equation in figure 3.5). We shall encounter such a situation with finite mean but diverging variance also in biology in the case of multi- plication as a birth and death process (chapter 5): Although the mean is well defined it looses its value in practice when the standard deviation becomes much larger than the expectation value. Stochastic Kinetics 133

Figure 3.7: Probability density of the Wiener process. In the figure we show the conditional probability density of the Wiener process which is identical with the normal distribution (figure 2.17), p(w,t w ,t ) = exp (w w )2/ 2(t t ) / 2π(t t ). | 0 0 − − 0 − 0 − 0 The values used are w0 = 5 and t t0 =0.01 (red),0.5 (purple),p 1.0 (violet), 2.0 −  (blue). The initially sharp distribution, p(w,t w ,t ) = δ(w w ) spreads with | 0 0 − 0 increasing time until it becomes completely flat in the limit t . →∞

Continuity of sample paths of the Wiener process has been demonstrated already in subsection 3.1.2. In order to show that the trajectories of the Wiener process are not differentiable we consider the probability

(t + h) (t) ∞ 1 P W − W >k = 2 dw exp( w2/2h) , h kh √2πh −   Z

which can be readily computed from the conditional probability (3.36). In the limit h 0 the integral becomes 1 and the probability is one. The result → 2 implies that, no matter what finite k we choose, (t + h) (t) /h will | W − W | almost certainly be greater than this value. In other words, the derivative of the Wiener process will be infinite with probability one and the sample path is not differentiable. Diffusion is closely related to the Wiener process and hence it is important to proof statistical independence of the increments of (t). Since we are W 134 Peter Schuster dealing with a Markov process we can write the joint probability

n 1 − p(wn,tn; wn 1,tn 1; wn 2,tn 2; . . . ; w0,t0) = p(wi+1,ti+1 wn,tn) p(w0,t0) . − − − − | Yi=0 Now we express the conditional probabilities in terms of (3.36) and find

(w w )2 n 1 exp i+1− i − − 2(ti+1 ti) p(wn,tn; wn 1,tn 1; wn 2,tn 2; . . . ; w0,t0) = − p(w0,t0) . − − − − 2π(ti+1 ti)  Yi=0 − p We simplify notation by writing new variables ∆Wi (ti) (ti 1) and ≡ W − W − 13 ∆ti ti ti 1. The joint probability density for W (tn) becomes now, ≡ − − ∆w2 n exp i − 2∆ti p(∆wn; ∆wn 1; ∆wn 2; . . . ; ∆w1; ∆w0) = p(w0, t0) , − − √  i=1 2π ∆ti Y where the factorization shows independence of the variables ∆Wi of each other and of W (t0). The Wiener process is readily extended to higher dimension. For the multivariate Wiener process, defined as

~ (t) = (t),..., (t) (3.37) W W1 Wn   satisfying the Fokker-Planck equation ∂p(w, t w , t ) 1 ∂2 | 0 0 = p(w, t w , t ) . (3.38) ∂t 2 ∂w2 | 0 0 i i X The solution is a multivariate normal density

2 1 (w w0) p(w, t w0, t0) = exp − . (3.39) | 2π (t t ) − 2(t t0) − 0  −  with mean E ~ (t) = w andp variance-covariance matrix W 0 Σ = E (t) w (t) w = (t t ) δ , ij Wi − 0i Wj − 0j − 0 ij   where all off-diagonal elements – the covariances – are zero. Hence, Wiener processes along different Cartesian coordinates are independent.

13Since we shall refer frequently to the Wiener process in the forthcoming section 3.5, the notation (t) will be replaced by W (t) and the expectation value E( ) by for better W · h·i appearance. Stochastic Kinetics 135

3.4.4 Ornstein-Uhlenbeck process

All three examples of stochastic processes discussed so far had one feature in common: All individual trajectories diverged to +infinity (Poisson-Process) or infinity (random walk and Wiener process) in the long time limit and ± no stationary solutions exist. In this subsection we shall consider a general stochastic process that allows for the approach to a stationary solution, the Ornstein-Uhlenbeck process named after the American physicists George Uh- lenbeck and Leonard Ornstein [66]. Many further examples of processes with stationary solutions will follow in chapters 4 and 5. The Ornstein-Uhlenbeck process is obtained through addition of a drift term to the Wiener process:

∂p(x, t x , 0) ∂ 1 ∂2 | 0 = kxp(x, t x , 0) + D p(x, t x , 0) . (3.40) ∂t ∂x | 0 2 ∂x2 | 0   In order to solve equation (3.40) for the probability density we make use of the characteristic function that fulfils the equation

∞ . φ(s) = eıısx p(x, t x , 0) dx , (3.41) | 0 Z−∞ which is converted into the partial differential equation ∂φ ∂φ 1 + ks = Ds2 φ . ∂t ∂s 2 For the boundary condition p(x, 0 x , 0) = δ(x x ) one can calculate the | 0 − 0 solution for the characteristic function 2 Ds 2kt . kt φ(s, t) = exp 1 e− + ıısx e− . (3.42) − 4k − 0    The characteristic function corresponds to a normal distribution and can be used to calculate the moments of the Ornstein-Uhlenbeck process:

E (t) = µ = x exp( kt) and X 0 − (3.43)   D σ2 (t) = µ = 1 exp( 2kt) . X 2 2k − −     The expectation value E (t) decreases exponentially from (0) = x to X X 0 limt E (t) = µ = 0. In other words, all trajectories start from the same →∞ X   136 Peter Schuster point (0) = x and converge to a final distribution around the mean µ = 0. X 0 Accordingly the variance is initially zero, σ2 (0) = 0 and increases with X 2 time until it reaches the value limt σ (t) = D (2k). This behavior is →∞ X  in contrast to the previously discussed stochastic  process es, in particular in contrast to the Wiener process, where the variance diverges. The stationary solution of the Ornstein-Uhlenbeck process,p ¯(x), can be readily computed from (3.40) by putting ∂p(x, t x , 0)/∂t = 0, which yields | 0 the differential equation

d 1 d¯p(x) kx p¯(x) + D dx 2 dx   and is a Gaussian with meanx ¯ = 0 and variance σ2 = D/2k

k kx2 p¯(x) = exp . (3.44) πD − D r   An explicit solution can be derived by means of stochastic differential equa- tions (see section 3.5). One additional point concerning the probability density and the mean of the Ornstein-Uhlenbeck process is worth to be stressed: The mean of the sta- tionary can easily be shifted by replacing kx by k(x m) in equation (3.40) − where m = E ¯ , the mean of the stationary distribution. The drift term – X the first term on the right-hand side of the equation – prevents the distribu- tion to become completely flat in the long-time limit but still the distribution extends from to + and there are individual trajectories (of measure −∞ ∞ zero) that diverge. Stochastic Kinetics 137

3.5 Stochastic differential equations

The idea of stochastic differential equations goes back to the French math- ematician Paul Langevin who conceived an equation named after him that allows for the introduction of random fluctuations into conventional differ- ential equations [67]. The idea was to find a sufficiently simple approach to model successfully Brownian motion. In its original form the Langevin equation d2r dr m = γ + ξ(t) (3.45) dt2 − dt describes the motion of a Brownian particle where r(t) and v(t) = dr/dt are position and velocity of the particle. Often the Langevin equation is formulated in terms of the and then is written in the more familiar form dv(t) γ 1 = v(t) + ξ(t) . (3.45’) dt −m m The parameter γ = 6πηr is the coefficient according to Stokes law with η being the viscosity coefficient of the medium and r the size of the particle, m is the particle mass, and ξ(t) a fluctuating random force. The analogy of (3.45) to Newton’s equation of motion is evident: The determinis- tic force, f(x)= (∂V/∂x) with V (x) being the potential , is replaced − by ξ(t). The forthcoming discussion of stochastic differential equations follows the presentation by Crispin Gardiner [6, pp.77-96]. In the literature one can find an enormous variety of more detailed treatises. We mention here only the monograph [68] and two books that are available on the internet: [61, 69].

3.5.1 Derivation of the stochastic differential equation

Generalization of equation (3.45) yields dx = a(x, t) + b(x, t) ξ(t) , (3.46) dt where x is the variable under consideration, a(x, t) and b(x, t) are functions defined by the model investigated, and ξ(t) is a rapidly fluctuating term. From the mathematical point of view we require statistical independence for 138 Peter Schuster

ξ(t) and ξ(t0) iff t = t0 and furthermore we assume ξ(t) = 0 since any drift 6 h i term can be absorbed in a(x, t), and cast all requirements in the condition

ξ(t) ξ(t0) = δ(t t0) . (3.47) h i − This assumption has the consequence that σ2 ξ(t) is infinite and leads to the idealized concept of white noise.  The differential equation (3.46) makes only sense if it can be integrated and hence an integral of the form

t u(t) = ξ(τ) dτ Z0 exists. The assumption that u(t) is a continuous function of time has the consequence that u(t) has the Markov property, which can be proven by splitting the integral

t t t ε t 0 − 0 u(t0) = ξ(τ) dτ + ξ(τ 0) dτ 0 = lim ξ(τ) dτ + ξ(τ 0) dτ 0 ε 0 Z0 Zt → Z0  Zt and hence for every ε> 0 the ξ(τ) in the first integral is independent of the

ξ(τ 0) in the second integral. By continuity u(t) and u(t0) u(t) are statistically − independent in the limit ε 0, and further u(t0) u(t) is independent of → − all u(t00) with t00 < t. In other words, u(t0) is completely determined in probabilistic terms by the value u(t) and no information on any past values is required: u(t) is Markovian. According to the differential Chapman-Kolmogorov equation (3.13) and because of the continuity of u(t), it must be possible to find a Fokker-Planck equation for the description of u(t) (see subsection 3.2.2), and we can com- pute the drift and diffusion coefficient(with u(t)=ˆu):

t+∆t u(t + ∆t) uˆ (ˆu, t) = ξ(τ) dτ = 0 and − | Zt   t+∆t t+∆t 2 u(t + ∆t) uˆ (ˆu, t) = dτ ξ(τ)ξ(τ 0) dτ 0 = − | h i Zt Zt D  E t+∆t t+∆t = dτ δ(τ τ 0) dτ 0 = ∆t , − Zt Zt Stochastic Kinetics 139 and we obtain for the drift and diffusion coefficient u(t + ∆t) uˆ (ˆu, t) A(ˆu, t) = lim − | = 0 ∆t 0 ∆t →  2 (3.48) u(t + ∆t) uˆ (ˆu, t) B(ˆu, t) = lim − | = 1 . ∆t 0 D E → ∆t  Accordingly, the Fokker-Planck equation we are looking is that of the Wiener process and we have

t ξ(τ) dτ = u(t) = W (t) . Z0 Considering the consequences of equation (3.48) we are left with the paradox that the integral of ξ(t) is W (t), which is continuous but not differentiable, and hence the Langevin equation (3.45) and the stochastic differential equa- tion (3.46) do not exist in strict mathematical terms. The corresponding integral equation,

t t x(t) x(0) = a x(τ), τ dτ + b x(τ), τ ξ(τ) dτ , (3.49) − Z0 Z0   however, is accessible to consistent interpretation. Eventually, we make the relation to the Wiener process more visible by using

dW (t) W (t + dt) W (t) = ξ(t) dt ≡ − and obtain: t t x(t) x(0) = a x(τ), τ dτ + b x(τ), τ dW (τ) . (3.49’) − Z0 Z0   The second integral is a stochastic Stieltjes integral the evaluation of which will be discussed in the next subsection 3.5.2. Finally, we remark that we have presented here Crispin Gardiner’s ap- proach and assumed continuity of the function u(t). The result was that ξ(t) follows the normal distribution. An alternative approach starts out from the assumption of the Gaussian nature of the probability density ξ(t). It is defi- nitely a matter of taste, which assumption is preferred but the requirement of continuity seems more natural. 140 Peter Schuster

t1 t2 t3 t4 t5 t6 tn-2 tn-1 tn

G() t

t t t t t t t t t t t 0 1 2 3 4 5 6 n-3 n-2 n-1

Figure 3.8: Stochastic integral. The time interval [t0,t] is partitioned into n segments and an intermediate point τi is defined in each segment: ti 1 τi ti. − ≤ ≤

3.5.2 Stochastic integration

In this subsection we define the stochastic integral and present practical recipes for integration (for more details see [70]). Let G(t) be an arbitrary function of time and W (t) the Wiener process, then the stochastic integral is defined as a Riemann-Stieltjes integral (2.35) of the form

t I(t, t0) = G(τ) dW (τ) . (3.50) Zt0 The integral is partitioned into n subintervals, which are separated by the points ti: t0 t1 t2 tn 1 t (figure 3.8). Intermediate points ≤ ≤ ≤ ··· ≤ − ≤ are defined within the subintervals ti 1 τi ti for the evaluation of the − ≤ ≤ function G(τi) and as we shall see the value of the integral depends on the position of the τ’s within the subintervals. t The stochastic integral 0 G(τ) dW (τ) is defined as the limit of the partial sums R n

Sn = G(τi) W (ti) W (ti 1) − − i=1 X   Stochastic Kinetics 141 and it is not difficult to realize that the integral is different for different choices of the intermediate point τi. As an example we consider the important case

G(τi)= W (τi): n

Sn = W (τi) W (ti) W (ti 1) = h i − − * i=1 + X   n n

= min(τi, ti) min(τi, ti 1) = (τi ti 1) . − − − − i=1 i=1 X   X Next we choose the same intermediate position τ for all subintervals ’i’

τi = α ti + (1 α) ti 1 with 0 α 1 (3.51) − − ≤ ≤ and obtain for the sum n

Sn = (ti ti 1) α = (t t0) α . h i − − − i=1 X Accordingly, the mean value of the integral may adopt any value between zero and (t t ) depending on the choice of the position of the intermediate − 0 points as expressed by the parameter α.

Ito stochastic integral. The most frequently used definition of the stochas- tic integral is due to the Japanese mathematician Kiyoshi It¯o[71, 72]. The choice α =0or τi = ti 1 defines the Ito stochastic integral of a function G(t) − to be t n G(τ) dW (τ) = lim G(ti 1) W (ti) W (ti 1) , (3.52) n − − − t0 →∞ i=1 Z X   where the limit is taken as the mean square limit (2.32). As an example we compute the previously discussed integral t W (τ) dW (τ) t0 and find for the sum Sn where we abbreviate W (ti) by Wi: R n n

Sn = Wi 1 (Wi Wi 1) Wi 1 ∆Wi = − − − ≡ − i=1 i=1 X X n 1 2 2 2 = (Wi 1 + ∆Wi) Wi 1 ∆Wi = 2 − − − − i=1 X  1 1 n = W (t)2 W (t )2 ∆W 2 , 2 − 0 − 2 i i=1   X 142 Peter Schuster where the second line results from: 2 ab = (a + b)2 a2 b2. It is now − − necessary to calculate the mean squareP limit of the second term in the last line of the equation. For a finite sum we have the expectation values

n 2 2 ∆Wi = (Wi Wi 1) = (ti ti 1) = t t0 , (3.53) − − − − − * i=1 + i i X XD E X where the second equality results from the Gaussian nature of the probability density (3.36): (W W )2 = W 2 W 2 = σ2(W ) σ2(W )= t t .14 i − j h i i − j i − j i − j Next we calculate the expectation of the mean square deviation in (3.53):

n 2 2 4 (Wi Wi 1) (t t0) = (Wi Wi 1) + − − − − − − * i=1 + i X   X 2 2 + 2 (Wi Wi 1) (Wj Wj 1) − − − − − i

We start the evaluation with the second line and make again use of the independence of Gaussian variables:

2 2 (Wi Wi 1) (Wj Wj 1) = (ti ti 1)(tj tj 1) . − − − − − − − − D E According to (2.84) the fourth moment of a Gaussian variable can be ex- pressed in terms of the variance

2 4 2 2 (Wi Wi 1) = 3 (Wi Wi 1) = 3(ti ti 1) − − − − − − D E D E 14For the derivation of this relation we used the fact that the stochastic variables of the Wiener process at different times are uncorrelated, W W = 0 and the variance is h i j i σ2(W )= W 2 W 2 = W 2 ν2. i i − h ii i −

Stochastic Kinetics 143 and insertion into the expectation value eventually yields:

n 2 2 (Wi Wi 1) (t t0) = * − − − − + Xi=1  2 = 2 (ti ti 1) + (ti ti 1) (t t0) (tj tj 1) (t t0) = − − − − − − − − − − Xi Xi Xj  2 = 2 (ti ti 1) 0 as n . − − → →∞ Xi 2 Accordingly, limn (Wi Wi 1) = t t0 in the mean square limit. →∞ i − − − Eventually, we obtainP for the Ito stochastic integral of the Wiener process:

t 1 2 2 W (τ) dW (τ) = W (t) W (t0) (t t0) . (3.54) t 2 − − − Z 0   We remark that the Ito integral differs from the conventional Riemann- Stieltjes integral where the term t t is absent. An illustrative explanation − 0 for this unusual behavior of the limit of the sum Sn is the fact that the quan- tity W (t + ∆t) W (t) is almost always of the order √t and hence – unlike | − | in ordinary integration – the terms of second order in ∆W (t) do not vanish on taking th e limit. It is also worth noticing that the expectation value of the integral (3.54) vanishes,

t 1 W (τ) dW (τ) = W (t)2 W (t )2 (t t ) = 0 , (3.55) 2 − 0 − − 0 Zt0    since the intermediate terms Wi 1∆Wi vanish because ∆Wi and Wi 1 are h − i − statistically independent. Semimartingales (subsection 3.1.1), in particular local martingales are the most common stochastic processes that allow for straightforward application of It¯o’s formulation of stochastic calculus. 144 Peter Schuster

Stratonovich stochastic integral. As already outlined, the value of a stochastic integral depends on the particular choice of the intermediate points,

τi. The Russian physicist and engineer Ruslan Leontevich Stratonovich [73] and the American mathematician Donald LeRoy Fisk [74] developed simul- taneously an alternative approach to It¯o’s stochastic integration, which is commonly called Stratonovich integration. The intermediate points are cho- sen such that the unconventional term (t t ) does not appear any more. − 0 The integrand as a function of W (t) is evaluated precisely in the middle, namely at the value ti ti 1 /2 and it is straightforward to show that that − − the mean square limit converges to the expression for the integral in conven- tional calculus

t n W (ti)+ W (ti 1) W (τ)dW (τ) = lim − W (ti) W (ti 1) = n 2 − t0 →∞ − Z Xi=1   (3.56) 1 = W (t)2 W (t )2 . 2 − 0   It is important to stress that a stand-alone Stratonovich integral has no relationship to an Ito integral or, in other words, there is no connection between the two classes of integrals for an arbitrary function G(t). Only when the stochastic differential equation is known to which the two integrals refer a formula can be derived that relates one integral to the other.

Nonanticipating functions. The concept of an nonanticipating or adap- tive process has been discussed in subsection 3.1.1. Here we shall require this property in order to be able to solve certain classes of Ito stochastic integrals. The situation we are referring to requires that all functions can be expressed as functions or functionals15 of the Wiener process W (t) by means of a stochastic differential or integral equation of the form

t t x(t) x(t ) = a x(τ), τ dτ + b x(τ), τ dW (τ) . (3.49’) − 0 Zt0 Zt0   A function G(t) is nonanticipating (with respect to t) if G(t) is probabilis- tically independent of W (s) W (t) for all s and t with s > t. In other − 15A function assigns a value to the argument of the function, x f(x ) whereas a 0 → 0 functional relates a function to the value of a function, f f(x ). → 0 Stochastic Kinetics 145 words, G(t) is independent of the behavior of the Wiener process in the future s > t. This is a natural and physically reasonable requirement for a solution of equation (3.49”) because it boils down to the condition that x(t) involves W (τ) only for τ t. Examples of important nonanticipating functions are ≤

(i) W (t) ,

(ii) t F W (τ) dτ , t0 R  (iii) t F W (τ) dW (τ) , t0 R  (iv) t G(τ) dτ, when G(t) itself is nonanticipating, and t0 R (v) t G(τ) dW (τ), when G(t) itself is nonanticipating. t0 R The items (iii) and (v) depend on the fact that in It¯o’s version the stochastic integral is defined as the limit of a sequence in which G(τ) and W (τ) are involved exclusively for τ < t. Three reasons for the specific discussion of nonanticipating functions are important: 1. Many results can be derived that are only valid for nonanticipating func- tions, 2. nonanticipating function occur naturally in situations, in which causality can be expected in the sense that the future cannot affect the presence, and 3. the definition of stochastic differential equations requires nonanticipating functions. In conventional calculus we never encounter situations in which the future acts back on the presence or even on the past. Several relations are useful and required in Ito calculus:

dW (t)2 = dt ,

dW (t)2+n = 0 for n> 0 ,

dW (t)dt = 0 , 146 Peter Schuster

t t n 1 n+1 n+1 n n 1 W (τ) dW (τ) = W (t) W (t0) W (τ) − dτ , t n + 1 − − 2 t Z 0   Z 0 ∂f 1 ∂2f ∂f df W (t),t = + dt + dW (t) , ∂t 2 ∂W 2 ∂W   t  G(τ)dW (τ) = 0 , and Zt0  t t t G(τ)dW (τ) H(τ)dW (τ) = G(τ)H(τ) dτ h i Zt0 Zt0  Zt0 The expressions are easier to memorize when we assign a dimension [t1/2] to W (t) and discard all terms of order t1+n with n> 0. At the end of this subsection we are left with the dilemma that the Ito integral is mathematically and technically most satisfactory but the more natural choice would be the Stratonovich integral that enables the usage of conventional calculus. In addition, the noise term ξ(t) in the Stratonovich interpretation can be real noise with finite correlation time whereas the ide- alized white noise assumed as reference in It¯o’s formalism gives rise to diver- gence of variances and correlations.

3.5.3 Integration of stochastic differential equations

A stochastic variable x(t) is consistent with an Ito stochastic differential equation (SDE)

dx(t) = a x(t), t dt + b x(t), t dW (t) (3.46’)   if for all t and t0 the integral equation (3.49’) is fulfilled. Time is ordered,

t < t < t < < t = t , 0 1 2 ··· n and the time axis may be assumed to be split into (equal or unequal) incre- ments, ∆t = t t . We visualize a particular solution curve of the SDE i i+1 − i for the initial condition x(t0)= x0 by means of a discretized version

xi+1 = xi + a(xi, ti) ∆ti + b(xi, ti) ∆Wi , (3.49”) wherein x = x(t ), ∆t = t t , and ∆W = W (t ) W (t ). Figure 3.9 i i i i+1 − i i i+1 − i illustrates the partitioning of the stochastic process into a deterministic drift Stochastic Kinetics 147

Figure 3.9: Stochastic integration. The figure illustrates the Cauchy-Euler procedure for the construction of an approximate solution of the stochastic dif- ferential equation (3.46’). The stochastic process consists of two different compo- nents: (i) the drift term, which is the solution of the ODE in absence of diffusion

(red; b(xi,ti) = 0) and (ii) the diffusion term representing a Wiener process W (t)

(blue; a(xi,ti) = 0). The superposition of the two terms gives the stochastic pro- cess (black). The two lower plots show the two components in separation. The increments of the Wiener process ∆Wi are independent or uncorrelated. An ap- proximation to a particular solution of the stochastic process is constructed by letting the mesh size approach zero, lim ∆t 0. → component, which is the discretized solution curve of the ODE obtained by setting b x(t), t = 0 in equation (3.49”) and stochastic diffusion compo- nent, which is a random Wiener process W (t) that is obtained by setting a x(t), t = 0 in the SDE. The increment of the Wiener process in the stochastic  term, ∆Wi, is independent of xi provided (i) x0 is independent of all W (t) W (t ) for t > t and (ii) a(x, t) is a nonanticipating function of − 0 0 t for any fixed x. Condition (i) is tantamount to the requirement that any random initial condition must be nonanticipating. 148 Peter Schuster

A particular solution to equation (3.49”) is constructed by letting the mesh size go to zero, lim n implying ∆t 0. In the construction → ∞ → of an approximate solution x is always independent of ∆W for j i as i j ≥ we verify easily that by inspection of (3.49”). Uniqueness of solutions refers to individual trajectories in the sense that a particular solution is uniquely obtained for a given sample function (t) of the Wiener Process W (t). The W existence of a solution is defined for the whole ensemble of sample functions: A solution of equation (3.49”) exists if – with probability one – a particular solution exists for any choice of sample function (t) of the Wiener process. W Existence and uniqueness of solutions to Ito stochastic differential equa- tions can be proven for two conditions [68, pp.100-115]: (i) the Lipschitz condition and (ii) the growth condition. Existence and uniqueness of a nonanticipating solution x(t) of an Ito SDE within the time interval [t0, t] require: (i) Lipschitz condition: there exists a κ such that a(x, τ) a(y, τ) + b(x, τ) b(y, τ) κ x y | − | | − | ≤ | − | for all x and y and τ [t , t], and ∈ 0 (ii) growth condition: a κ exists such that for all τ [t , t] ∈ 0 a(x, τ) 2 + b(x, τ) 2 κ2 (1 + x 2) . | | | | ≤ | | The Lipschitz condition is almost always fulfilled for stochastic differential equations in practice, because in essence it is a smoothness condition. The growth condition, however, may often be violated in abstract model equa- tions, for example, when a solution explodes to infinity. In other words, the value of x may become infinite in a finite (random) time. We shall encounter such situations in the applied chapters 4 and 5. As a matter of fact this is typical model behavior since no population or spatial variable can approach infinity at finite times in a finite world. Several other properties known to apply to solutions of ordinary differen- tial equations apply without major modifications to SDE’s: Continuity in the dependence on parameters and boundary conditions as well as the Markov property (for proofs we refer to [68]). Stochastic Kinetics 149

3.5.4 Changing variables in stochastic differential equations

In order to see the effect of a change of variables in Ito’s stochastic differential equations we consider an arbitrary function: x(t) f x(t) . We start with ⇒ the simpler single variable case and then introduce the mult idimensional situation.

Single variable case. Making use of our previous results on nonanticipating functions we expand df (t) up to second order in dW (t): f x(t) = f x(t)+dx(t) f x(t) = −  1  2 = f 0 x(t) dx(t) + f 00 x(t) dx(t) + = 2 · · ·   1 2 2 = f 0 x(t) a x(t),t dt + b x(t),t dW (t) + f 00 x(t) b x(t),t dW (t) , 2       where all terms higher than second order have been neglected. Introducing dW (t)2 = dt into the last line of this equation we obtain It¯o’s formula:

1 2 df x(t) = a x(t), t f 0 x(t) + b x(t), t f 00 x(t) dt + 2 (3.57)       + b x(t), t f 0 x(t), t dW (t) .

It is worth noticing that It¯o’s formula and ordinary calculus lead to different results unless f x(t) is linear in x(t) and thus f 00 x(t) vanishes.

Many variable case . The application of It¯o’s formalism  to many dimen- sions, in general, becomes very complicated. The most straightforward sim- plification is the extension of Ito calculus to the multivariate case by making use of the rule that dW (t) is an infinitesimal of order t1/2. Then we can show that the following relations hold for an n-dimensional Wiener process

W(t)= W1(t), W2(t),...,Wn(t) :  dWi(t) dWj(t) = δij dt ,

2+N dWi(t) = 0 , (N > 0) ,

dWi(t)dt =0 , dt1+N = 0 , (N > 0) . 150 Peter Schuster

The first relation is a consequence of the independence of increments of

Wiener processes along different coordinate axes, dWi(t) and dWj(t). Making use of the drift vector A(x, t) and the diffusion matrix B(x, t) the multidi- mensional stochastic differential equation

dx = A(x, t) dt + B(x, t) dW(t) . (3.58)

Following It¯o’s procedure we obtain for an arbitrary well-behaved function f x(t) the result  ∂ df(x) = A (x, t) f(x)+ i ∂x i i X 1 ∂2 + B(x, t) B0(x, t) f(x) dt + 2 · ij ∂x ∂x (3.59) i,j i j  X  ∂ + B f(x) dW (t) . ij ∂x j i,j i X Again we observe the additional term introduced through the definition of the Ito integral.

3.5.5 Fokker-Planck and stochastic differential equations

Next we calculate the expectation value of an arbitrary function f x(t) by means of It¯o’s formula and begin with a single variable:  df x(t) df x(t) d = = f x(t) = dt  * dt + dt  ∂f x(t) 1 ∂2f x(t) = a x(t), t + b x(t), t 2 . * ∂x  2 ∂x +   The stochastic variable x(t) has the conditional probability density p(x, t x , t ) | 0 0 and hence we can compute the expectation value by integration – whereby we simplify the notation f(x) f x(t) and p(x, t) p(x, t x , t ): ≡ ≡ | 0 0 d ∂  f(x) = dx f(x) p(x, t) = dt h i ∂t Z ∂f(x) 1 ∂2f(x) = dx a(x, t) + b(x, t)2 p(x, t) ∂x 2 ∂x2 Z   Stochastic Kinetics 151

The further derivation follows the procedure we have used in case of the differential Chapman-Kolmogorov equation in subsection 3.1.2 – in particular integration by parts, neglect of surface terms and so on – and we obtain ∂ ∂ 1 ∂2 dxf(x) p(x,t) = dxf(x) a(x,t) p(x,t) + b(x,t)2 p(x,t) . ∂t −∂x 2 ∂x2 Z Z      Since the choice of a function f(x) has been arbitrary we can drop it now and obtain finally a forward Fokker-Planck type equation ∂p(x, t x , t ) ∂ | 0 0 = a(x, t) p(x, t x , t ) + ∂t − ∂x | 0 0 (3.60) 1 ∂2  + b(x, t)2 p(x, t x , t ) . 2 ∂x2 | 0 0 The probability density p(x, t) thus obeys an equation that is completely equivalent to the equation for a diffusion process characterized by a drift coef- ficient a(x, t) and a diffusion coefficient b(x, t) as derived from the Chapman- Kolmogorov equation. Hence, It¯o’s stochastic differential equation provides indeed a local approximation to a (drift and) diffusion process in probability space. An example comparing a change form Cartesian in polar coordinates in an Ito stochastic differential equation and in the corresponding Fokker- Planck equation is shown in subsubsection 3.6.3.1. The extension to the multidimensional case based on It¯o’s formula (3.59) is straightforward, and we obtain for the conditional probability density p(x, t x , t p) the Fokker-Planck equation: | 0 0 ≡ ∂p ∂ 1 ∂ ∂ = A (x,t) p + B(x,t) B0(x,t) p . (3.61) ∂t − ∂x i 2 ∂x ∂x · i,j i i i,j i j   X  X  Here, we derive one additional property, which is relevant in practice. The stochastic differential equation,

dx = A(x, t) dt + B(x, t) dW(t) , (3.58) is mapped into a Fokker-Planck equation that depends only on the matrix product B B0 and accordingly, the same Fokker-Planck equation arises from · all matrices B that give rise to the same product B B0. Thus, the Fokker- · Planck equation is invariant to a replacement B B S when S is an orthog- ⇒ · onal matrix: S S0 = I. If S fulfils the orthogonality relation it may depend · on x(t), but for the stochastic handling it has to be nonanticipating. 152 Peter Schuster

Now, we want to proof the redundancy directly from the SDE and define a transformed Wiener process

dV(t) = S(t) dW(t) .

The random vector V(t) is a normalized linear combination of Gaussian variables dWi(t) and S(t) in nonanticipating, and accordingly, dV (t) is it- self Gaussian with the same correlation matrix. Averages dWi(t) to various powers and taken at different times factorize and the same is true for the dVi(t). Accordingly, the infinitesimal elements dV(t) are increments of a Wiener process: The orthogonal transformation mixes sample path without, however, changing the stochastic nature of the process. Equation (3.58) can be rewritten and yields

dx = A(x, t) dt + B(x, t) S0(t) S(t) dW(t) = · = A(x, t) dt + B(x, t) S0(t) dV(t) = · = A(x, t) dt + B(x, t) S0(t) dW(t) , · since V(t) is as good a Wiener process as W(t) is, and both SDEs give rise to the same Fokker-Planck equation. Stochastic Kinetics 153

3.6 Fokker-Planck equations

The name Fokker-Planck equation originated from two independent works by the Dutch physicist Adriaan Dani¨el Fokker on Brownian motion of electric dipoles in a radiation field [75] and by the German physicist Max Planck who aimed at a comprehensive theory of fluctuations [76]. Other frequently used notations for this equation are Kolmogorov’s forward equation preferred by mathematicians because Kolmogorov developed the rigorous basis for it [77] or Smoluchowski equation because of Smoluchowski’s use of the equa- tion in random motion of colloidal particles. Fokker-Planck equations are related to stochastic differential equations in the sense that they describe the (deterministic) time evolution of a probability distribution p(x, t x , t ) that | 0 0 is derived from the ensemble of trajectories obtained by integration of the stochastic differential equation with different time courses of the underlying Wiener Process W (t) (subsection 3.5.5). The Fokker-Planck equation (3.15) is a parabolic partial differential equa- tion16 and thus its solution requires boundary conditions in addition to the initial conditions. The boundary conditions are determined by the na- ture of the stochastic process, reflecting boundary conditions, for example, conserve the numbers of particles whereas the particles disappear on the boundaries in case they are absorbing. General boundary conditions may be much more complex than reflection or absorption but these two simple special cases may be used to characterize the extremes, permeable and impermeable boundaries.

16The classification of partial differential equations of the form

Auxx + 2Buxy + Cuyy + Dux + Euy + F = 0 where u(x, y) is a function and the subscripts stand for partial differentiation, for example

A B 2 ux ∂u/∂x, makes use of the determinant of the matrix Z = , det(Z)= AC B ≡ B C! − and defines an elliptic PDE by the condition Z is positive definite, a parabolic PDE by det(Z) = 0, and a hyperbolic PDE by det(Z) < 0. 154 Peter Schuster

3.6.1 Probability currents and boundary conditions

For the definition of a probability current and the derivation of bound- ary conditions we consider a multi-variable forward Fokker-Planck equation (where we omit the explicit statement of initial conditions)

∂p(z, t) ∂ 1 ∂2 = A (z, t) p(z, t) + B (z, t) p(z, t) . (3.62) ∂t − ∂z i 2 ∂z ∂z ij i i i,j i j X X We rewrite this equation by introducing a vectorial flux J(z, t) denoted as probability current that in components is defined as 1 ∂ J (z, t) = A (z, t) p(z.t) B (z, t) p(z, t) , (3.63) i i − 2 ∂z ij j j X and introduction into equation (3.62) yields

∂p(z, t) ∂ + J (z, t) = 0 . (3.64) ∂t ∂z i i i X This equation can be interpreted as a local conservation condition. By in- tegration over a volume V with a boundary S we obtain for the probability that the random variable lies in V : Z P (V, t) = Prob( V ) = dz p(z, t) . Z ∈ ZV The time derivative is conveniently formulated by means of the surface inte- gral ∂P (V, t) = dS n J(z, t) , ∂t S ZS where nS is a unit vector perpendicular to the surface and pointing outward, and nS J is the component of J perpendicular to the surface. The total · change of probability is given by the surface integral of the current J over the boundary of V . The sketch in figure 3.10 is used now to show that the surface integral over the current J for any arbitrary surface S yields the net flow of probability across this surface. The volume V is split into two parts, V1 and V2, and the two volumes are separated by the surface S12. Then, V1 is enclosed by Stochastic Kinetics 155

Figure 3.10: Probability current. The figure presents a sketch, which is used to proof that the probability current (3.63) measures the flow of probability. A total volume V = V1 + V2 with surface S = S1 + S2 is split by a surface S12 into two parts, V1 and V2 and Φ12 (red) and Φ21 (blue) are the particle fluxes from V2 to V1 and vice versa. The unit vector n defines the local direction perpendicular to the differential surface element dS. For the calculation of the probability net flow, Φ = Φ Φ see text. 12 − 21

S1 + S12 and V2 by S2 + S12. In order to compute the net flow of probability we make use of the fact that the sample path of the stochastic process are continuous (because a process described by a Fokker-Planck equation is free of jumps). We denote by Φ12 the particle flow crossing the boundary S12 from V2 to V1, and by Φ21 the flux going in opposite direction from V1 to

V1. Choosing a sufficiently small time interval ∆t the probability of crossing the boundary S12 from V2 to V1 can be expressed by the joint probability of being in V2 at time t and in V1 at time t + ∆t,

Φ12(t, ∆t) = dx dy p(x, t + ∆t; y, t) . ZV1 ZV2

The net flow of probability from V2 to V1 is obtained from the difference between the flows in opposite direction, Φ Φ , through division by ∆t 12 − 21 and forming the limit ∆t 0: → 1 Φ(t) = lim dx dy p(x, t + ∆t; y, t) p(y, t + ∆t; x, t) . ∆t 0 ∆t V V − → Z 1 Z 2   In the limit ∆t = 0 the integrals Φ12(t, ∆t) and Φ21(t, ∆t) vanish because

dx dy p(x, t; y, t)=0 or dx dy p(y, t; x, t)=0 ZV1 ZV2 ZV1 ZV2 156 Peter Schuster since the probability of being simultaneously in both compartments is zero, and we obtain further ∂ ∂ Φ(t) = dx dy p(x, τ; y, t) p(y, τ; x, t) . V V ∂τ − ∂τ Z 1 Z 2   Now we may use the flow version of the Fokker-Planck equation (3.64) and obtain ∂ ∂ Φ(t) = J (x, t; V , t) + J (y, t; V , t) , − ∂x i 2 ∂y i 1 V1 i i V2 i i Z X Z X where the integration over V2 or V1 is encapsulated in the definition of the probability that applies for the flow J(x, t; V2, t), which is calculated accord- ing to (3.63):

p(x, t; V2, t) = dy p(x, t; y, t) . ZV2 Volume integrals are now converted into surface integrals. The integrals over the boundaries S2 and S1 vanish (except for sets of measure zero) because they involve probabilities p(x, t; V2, t) with x neither in V2 nor in its boundary and vice versa. The only non-vanishing terms are those where the integration extends over S12 and this yields 1 Φ(t) = lim dx dy p(x, t + ∆t; y, t) p(y, t + ∆t; x, t) = ∆t 0 ∆t V V − → Z 1 Z 2  

= dS n J(x, t; V1, t) + J(x, t; V1, t) = S · Z 12   = dS n J(x, t) , (3.65) · ZΣ where Σ is some surface separating two regions and n is a unit vector pointing from region V2 to region V1. With the precise definition and the known properties of the probability current we are now in the position to discuss different types of boundary conditions. Stochastic Kinetics 157

Reflecting boundary conditions. A reflecting barrier prevents particles from leaving the volume V and accordingly there is zero net flow across S, the boundary of V :

n J(z, t) = 0 for z S and n normal to S , (3.66) · ∈ where J(z, t) is defined by equation (3.63). Since a particle cannot cross S is must be reflected there and this explains the name reflecting barrier.

Absorbing boundary conditions. An absorbing barrier is defined by the fact that every particle that reaches the boundary is instantaneously removed from the system. In other words, since the fate of the particle outside V is not considered, the barrier absorbs the particle and accordingly the probability to find a particle at the barrier is zero:

P (z, t) = 0 for z S . (3.67) ∈

Discontinuities at boundaries. Both classes of coefficients, Aj and Bij may have discontinuities at the surface S, although the particles are supposed move freely across the boundary S. In order to allow for free motion both the probability, p(z), and the normal component of the current, n J(z), have · be continuous at the boundary S:

p(z) = p(z) and n J(z) = n J(z) , (3.68) S+ S S+ S − · · − where the subscripts S+ and S indicate the limits of the quantities taken − from the left and right-hand side of the surface. The definition of the prob- ability current in equation (3.63) thus is compatible with discontinuities in the derivatives of p(z) at the surface S. . 158 Peter Schuster

3.6.2 Fokker-Planck equation in one dimension

The general Fokker-Planck equation in one variable is of the simple form:

∂f(x, t) ∂ 1 ∂2 = A(x, t) f(x, t) + B(x, t) f(x, t) . (3.69) ∂t −∂x 2 ∂x2     So far we have applied the Fokker-Planck operator always to the conditional probability

f(x, t) = p(x, t x , t ) with p(x, t x , t ) = δ(x x ) (3.70) | 0 0 0| 0 0 − 0 as initial condition. In order to allow for more general initial conditions we need only to redefine the one time probability

p(x, t) = dx p(x, t; x , t ) dx p(x, t x , t ) p(x , t ) , (3.71) 0 0 0 ≡ 0 | 0 0 0 0 Z Z which is compatible with the general initial probability density

p(x, t) = p(x, t0) . t=t0

In the previous section 3.5 we had shown that the stochastic process described by the conditional probability (3.70), which satisfies the Fokker-Planck equa- tion (3.69), is equivalent to the It¯ostochastic differential equation

dx(t) = A x(t), t) dt + B x(t), t) dW (t) . (3.72)  q  In a way the two description are complementary to each other. In particu- lar, perturbation theories derived from the Fokker-Planck equation are very different from those based on the stochastic differential equation but both a suitable in their own right for specific applications.

Boundary conditions in one dimensional systems. General boundary conditions have been discussed in subsection 3.6.1. Systems in one dimension allow for the introduction of additional special boundary conditions that are useful for certain classes of problems. Periodic boundary conditions. An, in principle, infinite system or cyclic sys- tem is partitioned into identical intervals (of finite size). Then, the stochastic Stochastic Kinetics 159 process takes place on an interval [a, b], the endpoints of which are assumed to be identical. For a discontinuity at the boundary we obtain:

lim p(x, t) = lim p(x, t) and x b x a+ → − → (3.73) lim J(x, t) = lim J(x, t) . x b x a+ → − → Continuous boundary conditions simply imply that the two functions A(x, t) and B(x, t) are periodic on the same interval:

A(b, t) = A(a, t) and B(b, t) = B(a, t) or (3.74) p(a, t) = p(b, t) and J(b, t) = J(a, t) , the probability and its derivatives are identical at the endpoints of the inter- val. Prescribed boundary conditions. Under the assumption that the diffusion co- efficient vanishes at the boundary and diffusive motion occurs only for x > a and further that A(x, t) and B(x, t) obey the Lipschitz condition at x = a and B(x, t) is differentiable there,p we have ∂B(a, t) = 0 , the SDE dx(t) = A(x, t) dt + B(x, t) dW (t) ∂t p has solutions and the nature of the boundary conditions is exclusively deter- mined by the sign of A(x, t) at x = a,

(i) exit boundary: A(a, t) > 0, if a particle reaches the point x = a it will certainly proceed out of the interval [a, b] into the open x < a,

(ii) entrance boundary: A(a, t) < 0, if a particle reaches the point x = a it will certainly return to the region x > a or in other words a particle at the right-hand side of x = a can never leave the interval, if the particle is introduced at x = a it will certainly enter the region x > a, and

(iii) natural boundary: A(a, t) = 0, a particle that has reached x = a would remain there, however, it can be shown that this point cannot even be reached from x > a and, moreover, a particle introduced there will stay, and thus this boundary is neither absorbing nor releasing particles. 160 Peter Schuster

The Feller classification of boundaries. William Feller [78] gave very general criteria for the classification of boundary conditions into four classes: regu- lar, exit, entrance, and natural. For this goal definitions of four classes of functions are required: x A(s) (i) f(x) = exp 2 ds , − B(s)  Zx0  2 (ii) g(x) = , B(x) f(x) x (iii) h1(x) = f(x) g(s) ds , and Zx0 x (iv) h2(x) = g(x) f(s) ds . Zx0 wherein x is fixed and from the interval x ]a, b[. Now, we write (x , x ) 0 0 ∈ L 1 2 for the set of all functions, which can be integrated on the interval ]x1, x2[ and then the Feller classification is of the form: (I) regular: if f(x) (a, x ) and g(x) (a, x ) ∈L 0 ∈L 0 (II) exit: if g(x) / (a, x ) and h (x) (a, x ) ∈L 0 1 ∈L 0 (III) entrance: if g(x) (a, x ) and h (x) (a, x ) ∈L 0 2 ∈L 0 (IV) natural: all other cases. The classification becomes important in the context of stationary solutions of the one-dimensional Fokker-Planck equation. Many results concerning the compatibility of stationary solutions with certain classes of boundaries are self-evident and will be discussed in the next paragraph. Boundaries at infinity. In principle, all kinds of boundaries can exist at in- finity but the requirement to obtain a probability density p(x, t) that can be normalized and is sufficiently well behaved is a severe restriction. These requirements are ∂p(x, t) lim p(x, t) = 0 and lim = 0 , (3.75) x x →∞ →∞ ∂t where the second condition excludes cases in which the probability oscillated infinitely fast as x . Accordingly, nonzero current at infinity can only → ∞ occur when either A(x, t) or B(x, t) diverges in the limit x . →∞ Stochastic Kinetics 161

Only two currents at boundaries x = are compatible with conserva- ±∞ tion of probability: (i) J( , t) = 0 and (ii) J(+ , t) = J( , t) corre- ±∞ ∞ −∞ sponding to reflecting and periodic boundary conditions, respectively.

Stationary solutions for homogeneous Fokker-Planck equations. In a homogeneous process the drift and the diffusion coefficients do not depend on time and hence the stationary probability density satisfies the ordinary differential equation d 1 d2 dJ(x) A(x)¯p(x) B(x)¯p(x) = 0 or = 0 , (3.76) dx − 2 dx2 dx     which evidently has the solution

J¯(x) = constant . (3.76’)

For a process on an interval [a, b] we have J¯(a)= J¯(x)= J¯(b)= J¯, therefore. One reflecting boundary implies that the other boundary is reflecting too and hence the current vanishes J¯ = 0. For boundaries that are not reflecting the only case satisfying equation (3.76) is periodic boundary conditions fulfilling (3.73). The conditions for these cases are: Zero current condition. From J = 0 follows 1 d A(x)¯p(x) = B(x)¯p(x) = 0 , 2 dx   which can be solved by integration: x A(ξ) p¯(x) = N exp 2 dξ , B(x) B(ξ)  Za  where the normalization constant is determined by the probability con- b N servation condition, a dx p¯(x)=1. Periodic boundary conditionR . The nonzero current of periodic boundary con- ditions fulfils the equation 1 d A(x)¯p(x) = B(x)¯p(x) = J . 2 dx   It is, however, not arbitrary since it is restricted by normalization and the conditions for periodic boundary conditions,p ¯(a)=p ¯(b) and J(a) = J(b). 162 Peter Schuster

In order to simplify the expressions we define

x A(ξ) ψ(x) = exp 2 dξ , B(ξ)  Za  integrate and obtain

p¯(x) B(x) p¯(a) B(a) x dξ = 2 J . ψ(x) ψ(a) − ψ(ξ) Za Making use of the boundary condition for the current

B(b) B(a) ψ(b) ψ(a) p¯(a) J = − ,  b dξ  a ψ(ξ) and find eventually R

x dξ B(b) + b dξ B(a) a ψ(ξ) · ψ(b) x ψ(ξ) · ψ(a) p¯(x)=p ¯(a) b R B(x) R dξ ! ψ(x) · a ψ(ξ) Infinite range of the stochastic variable andR singular boundaries may com- plicate the situation and a full enumeration and analysis of possible cases is extremely hard. Commonly one relies on the handling of special cases, a typical one is given in the next paragraph.

A chemical reaction as model. For the purpose of illustration we con- sider an autocatalytic chemical reaction, although chemical reactions are commonly modeled better by master equations,

k1

X + A −−−→ 2 X . (3.77) ←−−− k2

The stochastic variable describes the numbers of molecules X and X p(x, t) = P (t)= x (3.78) X  is the corresponding probability density whereby we assume that particle numbers are sufficiently large in order to justify modeling by continuous Stochastic Kinetics 163

Figure 3.11: “Stationary” probability density of the reaction A+X 2X. The figure shows the “stationary” solution of the Fokker-Planck equation of the autocatalytic reaction 3.77 according to equation (3.80). The result is not an ordinary stationary solution because it is not normalizable. variables. The reaction system is of special interest since it has an exit barrier at x = 0 which has a simple physical explanation: If no molecule X is present in the system no X can be produced.17 The Fokker-Planck equation, which will be derived in chapter 4, for reaction 3.77 is of the form

∂p(x, t) ∂ = k ax k x(x 1) p(x, t) + ∂t ∂x 1 − 2 − 1 ∂2   + k ax + k x(x 1) p(x, t) (3.79) 2 ∂x2 1 2 − ≈ ∂  1 ∂2   (k ax k x2 p(x, t) + (k ax + k x2) p(x, t) . ≈ ∂x 1 − 2 2 ∂x2 1 2      Reflecting boundaries are introduced at the positions x = α and x = β and the stationary probability density is computed to be

4a 1 (a + x) − 2x p¯(x) = e− . (3.80) x

17Actually this is an artifact of a system violating thermodynamics. Correct thermo- dynamic handling of catalysis requires that for every catalyzed reaction the uncatalyzed reaction is taken into account too. This is A X for the current example and if this is done properly the singularity at x = 0 disappears. 164 Peter Schuster

This function (figure 3.11) is not normalizable for α = 0. In fact the pole at x = 0 is a result of absorption occurring there and this becomes evident when we compute the Fokker-Planck coefficients

2 B(0, t) = (ax + x ) x=0 = 0 , A(0, t) = (ax x2) = 0 , and − x=0 (3.81) ∂ B(0, t) = (a +2x) > 0 , ∂x x=0 which meet the conditions of an exit boundary. The stationary solution is strictly relevant for α> 0 only. The meaning of the reflecting barrier is quite simple, whenever a molecule X disappears another one is added instanta- neously. Despite the mathematical difficulties the stationary solution (3.80) is a useful representation of the probability distribution in practice except near the point x = 0, because the time required for all molecules X to disap- pear is extraordinarily long in real chemical systems and exceeds the duration of an experiment by many orders of magnitude.

3.6.3 Fokker-Planck equation in several dimensions

Although multidimensional Fokker-Planck equations are characterized by es- sentially more complex behavior than in the one-dimensional case – in par- ticular, boundary problems are much more complex and show much higher variability, some analogies between one and many dimensions are quite useful and will be reported here.

3.6.3.1 Change of variables

A multidimensional Fokker-Planck equation written in general variables, x =

(x1, x2,...,xn)0, n n ∂p(x,t) ∂ 1 ∂2 = Ai(x)p(x,t) + Bij(x)p(x,t) , (3.82) ∂t − ∂xi 2 ∂xi∂xj Xi=1   i,jX=1   is to be transformed into the corresponding equation for the new variables

ξi = fi(x) (with i = 1,...,n). The functions fi are assumed to be inde- pendent and differentiable. If π(ξ, t)is the probability density for the new Stochastic Kinetics 165 variable, we can obtain it from

∂x1 ∂x1 . . . ∂x1 ∂ξ1 ∂ξ2 ∂ξn ∂x2 ∂x2 ∂x2 ∂ξ ∂ξ . . . ∂ξ π(ξ, t) = p(x, t) 1 2 n = p(x, t) J , (3.83) . . .. . ·| | . . . .

∂xm ∂xm . . . ∂xm ∂ξ1 ∂ξ2 ∂ξn where J is the Jacobian matrix and J the Jacobian determinant of the | | transformation of coordinates. Often, the easiest way to implement the change of variable is make use if the corresponding stochastic differential equation

dx(t) = A(x) dt + b(x) dW(t) with b(x) b(x)0 = B(x) , (3.58’) · and then recompute the Fokker-Planck equations for π(ξ, t) from the stochas- tic differential equations. Commonly, both procedures are quite involved and the calculations are quite mess unless the use of symmetries or simplifications facilitate the problem.

Transformation from Cartesian to polar coordinates.. As an example we consider the Rayleigh process also called Rayleigh fading. The commonly applied model uses two orthogonal Ornstein-Uhlenbeck processes along the real and the imaginary axis of an electric field, E =(E1, E2). The stochastic differential equation,

dE1(t) = γ E1(t) dt + ε dW1(t) and − (3.84) dE (t) = γ E (t) dt + ε dW (t) , 2 − 2 2 is converted into polar coordinates

E1(t) = α(t) cos φ(t) and E2(t) = α(t) sin φ(t) . . . With α(t) = exp µ(t) we can write E1 + ıı E2 = exp µ(t)+ ııφ(t) and for the Wiener processes we define 

dWα = dW1(t) cos φ(t) + dW2(t) sin φ(t) and dW = dW (t) cos φ(t) + dW (t) sin φ(t) , φ − 1 2 166 Peter Schuster and obtain for Raleigh fading in polar coordinates ε dφ(t) = dW (t) and α(t) φ (3.85) ε2 dα(t) = γ α(t) + dt + ε dW (t) . − 2α(t) α   The interesting result is that the phase angle φ diffused without a drift term. Now we shall perform the analogous transformation on the Fokker-Planck equation describing the probability density p(E1, E2, t)

∂p(E , E ,t) ∂ ∂ 1 ∂2p ∂2p 1 2 = γ (E p)+ (E p) + ε2 + (3.86) ∂t ∂E 1 ∂E 2 2 ∂E2 ∂E2  1 2   1 2  The transformation of coordinates, (E , E ) = (α,φ) with E = α cos φ 1 2 ⇒ 1 and E2 = α cos φ yields the following Jacobian determinant

∂(E , E cos φ sin φ J = 1 2 = = α , | | ∂(α,φ) α sin φ α cos φ −

and the Laplacian transformed to polar coordinates yields (equation (3.86) second term on the right-hand side):

∂2 ∂2 1 ∂ ∂ 1 ∂2 2 + 2 = α + 2 2 . ∂E1 ∂E2 α ∂α ∂α α ∂φ  2 2 1 For the inverse transformation we obtain α = E1 + E2 and φ = tan− (E2/E1) and the Jacobian p

1 ∂(α,φ cos φ sin φ/α 1 J − = = − = α− . | | ∂(E1, E2) sin φ cos φ/α

Further calculations are straightforward and yield ∂ ∂ (E1 p)+ (E2,p) = ∂E1 ∂E2 ∂p ∂α ∂p ∂φ ∂p ∂α ∂p ∂φ = 2 p + E + + E + = 1 ∂α ∂E ∂φ ∂E 2 ∂α ∂E ∂φ ∂E  1 1   2 2  ∂p 1 ∂ = 2 p + α = (α2 p) , ∂α α ∂α Stochastic Kinetics 167 the probability density in polar coordinates becomes

∂(E , E π(α,φ) = 1 2 p(E , E ) = αp(E , E ) . ∂(α,φ) 1 2 1 2

Collection of all previous results leads to ∂π(α,φ,t) ∂ ε2 ε2 1 ∂2π ∂2π = γ α + π + + . (3.87) ∂t − ∂α − α 2 α2 ∂φ2 ∂α2   !   The Fokker-Planck equation (3.87) corresponds to the two stochastic differ- ential equations (3.85), which were derived by changing variables according to Ito’s formalism.

3.6.3.2 Stationary solutions

Here, we give a brief overview over the stationary solutions of many variable Fokker-Planck equations. Some general aspects of boundary conditions have been discussed already in subsection 3.6.1 and we just summarize. For the forward Fokker-Planck equation the reflecting barrier boundary condition has to satisfy

n J = 0 for x S , · ∈ where S is the surface of the domain V that is considered (figure 3.10), n is a local vector standing normal to the surface and J is the probability current with the components 1 ∂ J (x, t) = A (x, t) p(x, t) B (x, t) p(x, t) . i i − 2 ∂x ij j j X The absorbing barrier condition is

p(x, t) = 0 for x S . ∈ In reality, parts of the surface may be reflecting whereas other parts are absorbing. Then, at discontinuities on the surface the conditions

n J(x, t) = n J(x, t) and p (x, t) = p (x, t) for x S · 1 · 2 1 2 ∈ 168 Peter Schuster have to be fulfilled. The stationarity condition for the multidimensional Fokker-Planck equa- tion implies that the probability current J (3.63) vanishes for all x V and ∈ leads to the equation for the stationary probability densityp ¯(x):

1 ∂p¯(x) ∂ B (x) =p ¯(x) A (x) B (x) . (3.88) 2 ij ∂x i − ∂x ij j j j j ! X X Under the assumption that the matrix B is invertible this equation can be rewritten as

∂ 1 ∂ log p¯(x) = Bik− (x) 2 Ak(x) Bkj(x) ∂xi − ∂xj ≡ k j ! (3.89)  X X Z (A, B, x) . ≡ i The stationarity condition (3.89) cannot be satisfied for arbitrary drift vec- tors A and diffusion matrices B since the left-hand side of the equation is a gradient and thus Zi has to fulfill the (necessary and sufficient) condition for a vanishing curl, (∂Zi/∂xj )=(∂Zj/∂xi). If the vanishing curl condi- tion is fulfilled, the stationary solution can be obtained by straightforward integration x p¯(x) = exp dξ Z(A, B, x) .  Z  This condition is often addressed as potential condition, because the gradients Z can be associated with the existence of a potential Φ(x), and is better i − illustrated by

x p¯(x) = exp Φ(x) with Φ(x) = dξ Z(A, B, x) . (3.90) − − Z  Not every Fokker-Planck equation has a stationary solution but if it sustains one, the solution can be obtained by simple integration. The Rayleigh process in polar coordinates (subsubsection 3.6.3.1) is used as an illustrative example. From equation (3.87) we obtain

2 γα + ε ε2 0 A = − 2α , B = , 0 ! 0 ε2! Stochastic Kinetics 169

Figure 3.12: Detailed balance in a reaction cycle A B C A. In the cycle of three monomolecular reaction the condition of detailed balance is stronger than the stationarity condition. The common condition for the stationary state, d[A] dt = d[B] dt = d[C] dt = 0, requires that the probability currents for the individual reaction steps are equal, J12 = J23 = J31 = J, whereas detailed balance is satisfied only when all individual currents vanish, J = 0. from which we obtain

2γα 1 ∂ ∂ 1 ε2 + α Bα,j = 0 , Bφ,j = 0 and hence Z = 2 B− A = − , ∂xj ∂xj · 0 ! Xj Xj

(∂Zα ∂φ)=0 and(∂Zφ ∂α) = 0, and then the stationary solution is of the form (with being a normalization constant)  N  (α,φ) p¯(α,φ) = exp dαZα + dφZφ = Z !  γα2 = exp log α = (3.91) N − ε2   γα2 = α exp . N − ε2  

3.6.3.3 Detailed balance

The stationary solutions of certain Fokker-Planck equations correspond to the condition of vanishing probability current and this property suggests that 170 Peter Schuster the condition is a particular version of the physical principle of detailed bal- ance. In detailed balance was studied first by Richard Tolman [79]. A Markov process fulfils detailed balance if in the stationary state the frequency of every possible transition is balanced by the correspond- ing reverse transition. A special example showing that detailed balance is stronger than the stationarity condition is presented in figure 3.12. In an n membered cycle stationarity is obtained iff all individual probability cur- rents are equal, J = J = = J = J, whereas detailed balance requires 1 2 ··· n J = 0. The local flux for a single reaction step is J = k [I ] k [I ] i i,i+1 i − i+1,i i+1 the condition of a constant flux Ji = J can be fulfilled even in case of irre- versible reactions, for example kj,i = 0 for all i = 1, 2,...,n and j = i + 1, j =mod (n).

Detailed balance in Markov processes. In general, the condition of de- tailed balance in Markov processes is formulated best by means of time re- versal visualized as a transformation to reversed variables: x = ε x with i ⇒ i i ε = 1 depending whether the variable shows even or odd behavior under i ± time reversal. The detailed balance requires

p¯(x, t + τ; z, t)=p ¯(ζ, t + τ; ξ, t) with (3.92) ξ =(ε1x1,ε2x2,... ) and ζ =(ε1z1,ε2z2,... ) .

By setting τ = 0 equation (3.92) we find

δ(x z)¯p(z) = δ(ξ ζ)¯p(ξ) . − − The two delta functions are equal as only simultaneous changes of sign are in- volved, and hencep ¯(x)=¯p(ξ) as a consequence of the formulation of detailed balance. The condition 3.92 can now be rewritten in terms of conditional probabilities: p(x, τ z, 0)p ¯(z) = p(ζ, τ ξ, 0)p ¯(x) . (3.92’) | | Equation (3.92’) has consequences for several stationary quantities and func- tions. For the mean holds x¯ = ξ¯ (3.93) h i

Stochastic Kinetics 171 and this has the consequence that all odd variables – variables x with ε = 1 i i − – have zero mean at the stationary state with detailed balance. For the au- tocorrelation functions and the spectrum one obtains with ε =(ε1,...,εn)0:

G(τ) = εG0(τ)ε0 and S(ω) = εS0(ω)ε0 , with the covariance matrix Σ fulfilling Σ ε = ε0 Σ for G(τ) at τ = 0. A somewhat different situation arises in cases where the vectorial quan- tity transforms like an axial vector or pseudovector and represents angular like mechanical rotation or magnetic fields. Then, there exist two or more stationary solutions and the condition (3.92) must be relaxed to

p λk (x, t + τ z, t) = p εkλk (ζ, t + τ ξ, t), (3.92”) | | where λ =(λ1,λ2,... ) is a vector of constant quantities under rotation which changes to (ε1λ1,ε2λ2 . . . ) under time reversal. Crispin Gardiner suggests to call this property time reversal invariance instead of detailed balance [6, p.145].

Detailed balance in the differential Chapman-Kolmogorov equa- tion. The necessary and sufficient conditions for a homogeneous Markov process to have a stationary state that fulfils detailed balance are: (i) W (x z)¯p(z) = W (ζ ξ)¯p(x) , (3.94) | | ∂ (ii) ε A (ξ)¯p(x) = A (x) + B (x)¯p(x) , (3.95) i i − i ∂x ij j j X   (iii) εiεj Bij(ξ) = Bij(x) . (3.96) The corresponding conditions for the Fokker-Planck equation are obtained simply by setting the jump probabilities equal to zero: W (x z)=0. A | considerable simplification arises for exclusively even variables because the conditions simplify to: (i) W (x z)¯p(z) = W (z x)¯p(x) , (3.94’) | | 1 ∂ (ii) A (x)¯p(x) = B (x)¯p(x) , (3.95’) i 2 ∂x ij j j X   (iii) Bij(x) = Bij(x) . (3.96’) 172 Peter Schuster

We remark that condition (3.95’) is identical with the condition of a vanishing flux at the stationary state equation (3.88) that has been the requirement for the existence of a potential φ(x). Special for Fokker-Planck is the partitioning of the drift term into a re- versible and an irreversible part [80–82]: 1 D (x) = A (x) + ε A (ξ) irreversible drift (3.97) i 2 i i i 1  I (x) = A (x) ε A (ξ) reversible drift (3.98) i 2 i − i i   Making use again of the probability formulated by means of a potential, p¯(x) = exp φ(x) , the conditions for detailed balance are of the form −

εiεj Bij(ξ) = Bij(x) , 1 ∂ 1 ∂φ(x) D (x) B (x) = B (x) , and i − 2 ∂x ij − 2 ij ∂x j j j j X X ∂ ∂φ(x) I (x) I (x) = 0 . ∂x i − i ∂x i i i X   Under the assumption that these conditions are fulfilled by the functions

Di(x) and Bij(x) and matrix B is invertible the equation in the middle can be rewritten in the form ∂Zˆ ∂Zˆ i = j with ∂xj ∂xi

1 ∂ Zˆ = B− (x) 2 D (x) B (x) , i ik k − ∂x kj j j ! Xk X and the stationary probability densityp ¯(x) fulfilling detailed balance is of the form, x p¯(x) = exp φ(x) = exp dz Zˆ , (3.99) − ·   Z  and can be calculated explicitly as an integral. Finally, we mention the the reciprocity relations in linear irreversible ther- modynamics, developed by the Norwegian and American physicist Lars On- sager and therefore also called Onsager relations, are also a consequence of detailed balance [83, 84]. Stochastic Kinetics 173

3.7 Autocorrelation functions and spectra

Analysis of experimentally recorded or computer created sample path is often largely facilitated by the usage of additional tools complementing moments and probability distributions since they can, in principle, be derived from a single trajectory. These tools are autocorrelation functions and spectra of random variables (for an extensive treatment of time series analysis see, for example, [85]). They provide direct insight into the dynamics of the process as they deal with relations between sample points collected at different times. The autocorrelation function of the random variable (t) is a measure X of the influence the value x recorded at time t has on the measurement of the same variable at time t + τ 1 t G(τ) = lim dθ x(θ) x(θ + τ) . (3.100) t t →∞ Z0 It represents the time average of the product of two values recorded at arbi- trary or sufficiently long times. The autocorrelation function is of particular importance in the analysis of experimental data because technical devices called autocorrelators have been built which sample data and can record directly the autocorrelation function of a process under investigation. Another relevant quantity is the spectrum or the spectral density of the quantity x(t). In order to derive the spectrum, we construct a new variable . t ııωθ y(ω) by means of the transformation y(ω) = 0 dθ e x(θ). The spectrum is then obtained from y by performing the limit t : R →∞ t 2 1 1 . S(ω) = lim y(ω) 2 = lim dθ eııωθ x(θ) . (3.101) t 2πt | | T 2πt →∞ →∞ Z0

The autocorrelation function and the spectrum are closely connected. By some calculations one finds t t τ 1 1 − S(ω) = lim cos(ωτ) dτ x(θ) x(θ + τ) dθ . t π t →∞  Z0 Z0  Under certain assumptions, which insure the validity of the interchanges of order, we may take the limit t and find →∞ 1 ∞ S(ω) = cos(ωτ) G(τ) dτ . π Z0 174 Peter Schuster

This result relates the Fourier transform of the autocorrelation function to the spectrum and can be cast in an even prettier form by using t τ 1 − G( τ) = lim dθ x(θ) x(θ + τ) = G(τ) − t t τ →∞ Z− to yield the Wiener-Khinchin theorem named after Norbert Wiener and the Russian mathematician Aleksandr Khinchin + + . . 1 ∞ ııωτ ∞ ııωτ S(ω) = e− G(τ) dτ and G(τ) = e S(ω) dω . (3.102) 2π Z−∞ Z−∞ Spectrum and autocorrelation function are related to each other by the Fourier transformation and its inversion. Equation (3.102) allows for a straightforward proof that the Wiener pro- cess ~ (t) = W (t) gives rise to white noise (subsection 3.4.3). Let w be W a zero-mean random vector with the identity matrix as (auto)covariance or autocorrelation matrix:

2 E(w) = µ = 0 and Cov( , ) = E(ww0) = σ I , W W then the Wiener process W (t) fulfils the relations,

µ (t) = E (t) = 0 and W W G (τ) = E (t) (t + τ) = δ(τ) , W W W defining it as a zero-mean process with infinite power at zero time shift. For the spectral density of the Wiener process we obtain: + . 1 ∞ ııωτ 1 S (ω) = e− δ(τ) dτ = . (3.103) W 2π 2π Z−∞ The spectral density of the Wiener process is a constant and hence all fre- quencies in the noise are represented with equal weight. All colors mixed with equal weight in light yields white light and this property of visible light gave the name for white noise; in colored noise the noise frequencies do not fulfil the uniform distribution. The time average of a signal as expressed by an autocorrelation function is complemented by the ensemble average, , or expressed by the expec- h·i tation value of the corresponding random variable, E( ), which implies an · Stochastic Kinetics 175

(infinite) number of repeats of the same measurement. In case the assump- tion of ergodic behavior is true, the time average is equal to the ensemble average. Thus we find for a fluctuating quantity (t) in the ergodic limit X E (t), (t + τ) = x(t)x(t + τ) = G(τ) . X X h i  It is straightforward to consider dual quantities which are related by Fourier transformation and get:

. . 1 ııωt ııωt x(t) = dωc(ω) e and c(ω) = dt x(t) e− . 2π Z Z We use this relation to derive several important results. Measurements refer to real quantities x(t) and this implies: c(ω) = c∗( ω). From the condition − of stationarity, x(t)x(t0) = f(t t0) and does not depend on t otherwise h i − follows

. 1 ııωt+iωt c(ω)c∗(ω0) = dt dt0 e− 0 x(t)x(t0) = h i (2π)2 h i ZZ . δ(ω ω0) ııωτ = − dτ e G(τ) = δ(ω ω0) S(ω) . 2π − Z The last expression relates not only the mean square c(ω) 2 with the spec- h| | i trum of the random variable, it shows also that stationarity alone implies that c(ω) and c∗(ω0) are uncorrelated. 176 Peter Schuster 4. Applications in chemistry

The chapter starts by a discussion of stochasticity in chemical reactions and then presents the chemical master equation as an appropriate tool for mod- eling chemical reactions. Then, the birth-and-death process is introduced as a well justified and useful approximation to general jump processes. The equilibration of particle numbers or concentrations in the flow reactor is used as a simple example to demonstrate the analysis of a birth-and-death pro- cess. Next follow discussions of mono- and bimolecular chemical reactions that can be still solved exactly by means of time dependent probability gen- erating functions. The last sections handle the transition from microscopic to macroscopic systems by means of the size expansion technique and the numerical approach to stochastic chemical kinetics.

4.1 Stochasticity in chemical reactions

Stochastic chemical kinetics is based on the assumption that knowledge on the transformation of molecules in chemical reactions is not accessible in full detail or if it would, the information would be overwhelming and obscur- ing essential features. Thus, it is assumed that chemical reactions have a probabilistic element and can be modeled properly by means of stochastic processes. The random processes are caused by thermal noise as well as by random encounter of molecules in collisions. Fluctuations, therefore, play an important role and they are responsible for the limitations in the reproduc- tion of experiments. We shall model chemical reactions as Markov processes and analyze the corresponding master and Fokker-Planck equations. As an appropriate criterium for classification we shall use the molecularity of reac- tions1 and the complexity of the dynamical behavior.

1The molecularity of a reaction is the number of molecules that are involved in the reac- tion, for example two in a reactive collision between molecules or one in a conformational

177 178 Peter Schuster

The stochastic approach to chemical reaction kinetics has tradition and started in the late fifties from two different initiatives: (i) approximation of the highly complex vibrational relaxation [86–88] and its application to chem- ical reactions, and (ii) direct simulation of chemical reactions as stochastic processes [89–91]. The latter approach is in the sense of the initially men- tioned limited information on details and has been taken up and developed further by several groups [92–96]. The major part of these works has been summarized in an early review [97] which is particularly recommended here for further reading. Bartholomay’s work is also highly relevant for biological models of evolution, because he studied reproduction as a linear birth-and- death process. Exact solutions to master or Fokker-Planck equation can be found only for particularly simple special cases. Often approximations were used or the analysis has been restricted the expectation values of variables. Later on computer assisted approximation techniques and numerical simula- tion methods were developed which allow for handling stochastic phenomena in chemical kinetics on a more general level [64, 98].

4.1.1 Elementary steps of chemical reactions

Chemical reactions are defined by mechanisms, which can be decomposed into elementary processes. An elementary process describes the transforma- tion of one or two molecules into products. Elementary processes involving three or more molecules are unlikely to happen in the vapor phase or in dilute solutions, because trimolecular encounters are very rare under these conditions. Therefore, elementary steps of three molecules are not consid- ered in conventional chemical kinetics.2 Two additional events which occur in open systems, for example in flow reactors, are the creation of a molecules through influx or the annihilation of a molecule through outflux. Common elementary steps are:

change. 2Exceptions are reactions involving surfaces as third partner, which are important in gas phase kinetics, and biochemical reactions involving macromolecules. Stochastic Kinetics 179

? A (4.1a) −−−→ A (4.1b) −−−→ A B (4.1c) −−−→ A 2 B (4.1d) −−−→ A B + C (4.1e) −−−→ A + B C (4.1f) −−−→ A + B 2 A (4.1g) −−−→ A + B C + B (4.1h) −−−→ A + B C + D (4.1i) −−−→ 2 A B (4.1j) −−−→ 2 A 2 B (4.1k) −−−→ 2 A B + C (4.1l) −−−→ Depending on the number of reacting molecules the elementary processes are called mono-, bi-, or trimolecular. Tri- and higher molecular elementary steps are excluded in conventional chemical reaction kinetics as said above. The example show in equation (4.1g) is an autocatalytic elementary process. In practice autocatalytic reactions commonly involve many elemen- tary steps and thus are the results of complex reaction mechanisms. In order to study basic features of autocatalysis or chemical self-enhancement, single step autocatalysis is often used as a model system. One particular trimolec- ular autocatalytic process,

2 A + B 3 A , (4.2) −−−→ became very famous [99] despite its trimolecular nature which makes it un- likely to occur in real systems. The elementary step (4.2) is the essential step in the so-called Brusselator model, it can be straightforwardly addresses by analytical mathematical techniques, and it gives rise to complex dynamical 180 Peter Schuster phenomena in space and time which are otherwise not observed in chemi- cal reaction systems. Among other features such special phenomena are: (i) multiple stationary states, (ii) chemical hysteresis, (iii) oscillations in concen- trations, (iv) deterministic chaos, and (v) spontaneous formation of spatial structures. The last example is known as Turing instability [100] and is fre- quently used as a model for pattern formation or morphogenesis in biology [101].

4.1.2 The master equation in chemistry

Provided particle numbers are assigned to the variables describing the progress of chemical reactions, the stochastic variable (t) with the probability P (t)= N n P ( (t)= n) can take only nonnegative integer values, n N0. In addition N ∈ we introduce a few simplifications and some conventions in our notation. We shall use the forward equation unless stated differently and assume an infinitely sharp initial density: P (n, 0 n , 0) = δ with n = n(0). Then, | 0 n,n0 0 we can simplify the full notation by P (n, t n , 0) P (t) with the implicit | 0 ⇒ n assumption of the initial condition specified above. Other sharp initial val- ues or for initial extended probability densities will be given explicitly. The expectation value of the stochastic variable (t) will be denoted by N ∞ E (t) = n(t) = n P (t) . (4.3) N h i · n n=0  X Its stationary value, provided it exists, will be written

n¯ = lim n(t) . (4.4) t →∞ h i Almost alwaysn ¯ will be identical with the long time value of the correspond- ing deterministic variable. The running index of integers will be denoted by either m or n0. Then the chemical master equation is of the form ∂P (t) n = W (n m)P (t) W (m n)P (t) . (4.5) ∂t | m − | n m X  The transition probabilities may be time dependent in certain cases: W (n m, t). | Most frequently we shall assume that they are not. The probabilities W (n m) | Stochastic Kinetics 181

. can be understood as the elements of a transition matrix W = W ; n, m { nm ∈ N0 . Diagonal elements W cancel in the master equation (4.5) and hence } nn need not be defined. According to their nature as transition probabilities, all W with n = m have to be nonnegative. We may define, nevertheless, nm 6 m Wmn = 0 which implies Wnn = m=n Wmn and then insertion into − 6 P(4.5) leads to a compact form of the masterP equation ∂P (t) n = W P (t) . (4.5’) ∂t nm m m X

Introducing vector notation, P(t)0 =(P1(t),...,Pn(t),...), we obtain

∂P(t) = W P(t) . (4.5”) ∂t ×

With the initial condition Pn(0) = δn,n0 stated above we can solve equa- tion (4.5”) formally for each n0 and obtain

P (n, t n0, 0) = exp(W t) , | n,n0   where the element (n, n0) of the matrix exp(Wt) is the probability to have n particles at time t, (t)= n, when there were n particles at time t = 0. N 0 0 The evaluation of this equation boils down to diagonalize the matrix W which can be done analytically in rather few cases only. For the forthcoming considerations we shall derive the so-called jump moments ∞ α (n) = (m n)p W (m n) ; p =1, 2,... . (4.6) p − | m=0 X The usefulness of the first two jump moments (p = 1, 2) is easily demon- strated: We multiply equation (4.5) by n and obtain through summation:

d ∞ ∞ n = m W (n m) P (t) n W (m n) P (t) = dt h i | m − | n n=0 m=0 X X  ∞ ∞ = (m n) W (m n) P (t) = α (n) . − | n h 1 i n=0 m=0 X X 182 Peter Schuster

Only in case α1(n) is a linear function of n formation of moment and expec- tation value may be interchanged and we have the simple equation d n = α ( n ) . dt h i 1 h i Otherwise this is only a zeroth order approximation which can be improved through expansion of α (n)in(n n ). Break off after the second derivative 1 −h i yields d 1 d2 n = α ( n ) + σ 2 α ( n ) . (4.6’) dt h i 1 h i 2 n dn2 1 h i In order to obtain a consistent approximation one may apply a similar ap- proximation to the time development of the variance and finds [98]: d d σ 2 = α ( n )+2 σ 2 α ( n ) . (4.6”) dt n 2 h i n dn 1 h i These expressions will be simplified in case of the forthcoming examples. We proceed now by discussing first some important special cases where exact solutions are derivable and then present a general and systematic approx- imation scheme which allows to solve the master equation for sufficiently large systems [64, 98]. This scheme is based on a power series expansion in some extensive physical parameter Ω, for example the size of the system or 1/2 the total number of particles. It will turn out that Ω− is the appropri- ate quantity for the expansion and thus the approximation is based on the smallness of fluctuations. This implies that we shall encounter the limits of reliability of the technique at small population sizes or in situations of self-enhancing fluctuations, for example at instabilities or phase transitions. The chemical master equation has been shown to be based on a rigor- ous microscopic concept of chemical reactions in the vapor phase within the frame of classical collision theory [3]. The two general requirements that have to be fulfilled are: (i) a homogeneous mixture as it is assumed to exits through well stirring and (ii) thermal equilibrium implying that the veloci- ties of molecules follow a Maxwell-Boltzmann distribution. Daniel Gillespie’s approach focusses on chemical reactions rather than molecular species and is well suited to handle reaction networks. In addition the algorithm can be easily implemented for computer simulation. We shall discuss the Gillespie formalism together with the computer program in section 4.4. Stochastic Kinetics 183

Figure 4.1: Sketch of the transition probabilities in master equations. In the general jump process steps of any size are admitted (upper drawing) whereas in birth-and-death processes all jumps have the same size. The simplest and most common case is dealing with the condition that the particles are born and die one at a time (lower drawing).

4.1.3 Birth-and-death master equations

The concept of birth-and-death processes has been created in biology and is based on the assumption that only a finite number of individuals are produced – born – or destroyed – die – in a single event. The simplest and the only case, we shall discuss here, is occurs when birth and death is confined to single individuals of only one species. These processes are commonly denoted as one step birth-and-death processes.3 In figure 4.1 the transitions in a general jump process and a birth-and-death process are illustrated. Restriction to single events is tantamount to the choice of a sufficiently small time interval of recording, ∆t, such that the simultaneous occurrence of two events has

3In addition, one commonly distinguishes between birth-and-death processes in one variable and in many variables [64]. We shall restrict the analysis here to the simpler single variable case here. 184 Peter Schuster a probability of measure zero (see also section 4.4). This small time step is often called the blind interval, because no information on things happening within ∆t is available. Then, the transition probabilities can be written in the form

+ W (n m, t) = w (m) δn,m+1 + w−(m) δn,m 1 , (4.7) | − since we are dealing with only two allowed processes:

n n + 1 with w+(n) as transition probability per unit time and −→ n n 1 with w−(n) as transition probability per unit time. −→ − In subsection 3.4.1 we discussed the Poisson process which can be understood as a birth-and-death process on n N0 with zero death rate. Modeling of ∈ chemical reactions by birth-and-death processes turns out to be a very useful approach for reaction mechanisms which can be described by changes in a single variable. The stochastic process can now be described by a birth-and-death master equation

∂Pn(t) + = w (n 1) Pn 1(t) + w−(n + 1) Pn+1(t) ∂t − − − (4.8) + (w (n)+ w−(n) P (t) . − n   There is no general technique that allows to find the time-dependent solutions of equation (4.8) and therefore we shall present some special cases later on. In subsection 5.1.2 we shall give also a detailed overview of exactly solvable single step birth-and-death processes. It is, however, possible to analyze the stationary case in full generality.

Provided a stationary solution of equation (4.8), limt Pn(t) = P¯n, ex- →∞ ists, we can compute it in straightforward manner. It is useful to define a probability current J(n) for the n-th step in the series,

Particle number 0 1 . . . n 1 n n + 1 . . . − Reaction step 1 2 ... n 1 n n + 1 . . . − Stochastic Kinetics 185 which is of the form

+ J(n) = w− n P¯(n) w (n 1) P¯(n 1) . (4.9) − − − Now, the conditions for the stationary solution are given by ∂P (t) n = 0 = J(n + 1) J(n) , (4.10) ∂t − 0 Restriction to positive particle numbers, n N , implies w−(0) = 0 and ∈ Pn(t)=0 for n< 0, which in turn leads to J(0) = 0. Now we sum the vanishing flow terms according to equation (4.10) and obtain: n 1 − 0 = J(z + 1) J(z) = J(n) J(0) . − − z=0 X  Thus we find J(n) = 0 for arbitrary n which leads to w+(n 1) n w+(z 1) P¯n = − P¯n 1 and finally P¯n = P¯0 − . w (n) − w (z) − z=1 − Y The condition J(n) = 0 for every reaction step is known in chemical ki- netics as the principle of detailed balance, which has been formulated first by the American mathematical physicist Richard Tolman [79] (see also sec- tion 3.6.3.3 and [6, pp.142-158]). The macroscopic rate equations are readily derived from the master equa- tion through computation of the expectation value:

∂ ∂ ∞ E n(t) = nP (t) = ∂t ∂t n n=0 !  X ∞ + + = n w (n 1) Pn 1(t) w (n) Pn(t) + − − − n=0 X   ∞ + n w−(n + 1) P (t) w−(n) P (t) = n+1 − n n=0 X   ∞ + + = (n + 1) w (n) nw (n)+(n 1) w−(n) nw−(n) P (t) = − − − n n=0 X  ∞ ∞ + + = w (n) P (t) w−(n) P (t) = E w (n) E w−(n) . n − n − n=0 n=0 X X   186 Peter Schuster

Neglect of fluctuations yields the deterministic rate equation of the birth- and-death process

d n + h i = w n w− n . (4.11) dt h i − h i

  + The condition of stationary yields:n ¯ = limt n(t) for which holds w (¯n)= →∞ h i w−(¯n). Compared to this results we note that the maximum value of the stationary probability density, max P¯ , n N0 , is defined by P¯ P¯ { n ∈ } n+1 − n ≈ (P¯n P¯n 1) or P¯n+1 P¯n 1 which coincide with the deterministic value − − − ≈ − for large n.

4.1.4 The flow reactor

The flow reactor is introduced as an experimental device that allows for investigations of systems off thermodynamic equilibrium. The establishment of a stationary state or the flow equilibrium in a flow reactor (figure 4.2) is a suitable case study for the illustration of the search for a solution of a birth- and-death master equation. At the same time the non-reactive flow of a single compound represents the simplest conceivable process in such a reactor. The 1 stock solution contains A at the concentration [A] =a ˆ =a ¯ [mole l− ]. influx · The influx concentrationa ˆ is equal to the stationary concentrationa ¯, because no reaction is assumed to take place in the reactor. The flow is measured by 1 1 means of the flow rate r [l sec− ]: This implies an influx ofa ¯ r [mole sec− ] · · · of A into the reactor, instantaneous mixing with the content of the reactor, and an outflux of the mixture in the reactor at the same flow rate r.4 The reactor has a volume of V [l] and thus we have a mean residence time of 1 τ = V r− [sec] of a volume element dV in the reactor. R · In- and outflux of compound A into and from the reactor are modeled by two formal elementary steps or pseudo-reactions

? A −−−→ (4.12) A . −−−→ 4The assumption of equal influx and outflux rate is required because we are dealing with a flow reactor of constant volume V (CSTR, figure 4.2). Stochastic Kinetics 187

Figure 4.2: The flow reactor. The reactor shown in the sketch is a device of chemical reaction kinetics which is used to carry out chemical reactions in an open system. The stock solution contains materials, for example A at the concentration

[A]influx =a ˆ, which are usually consumed during the reaction to be studied. The reaction mixture is stirred in order to guarantee a spatially homogeneous reaction medium. Constant volume implies an outflux from the reactor that compensates precisely the influx. The flow rate r is equivalent to the inverse mean residence 1 time of solution in the reactor multiplied by the reactor volume, τ − V = r. The R · reactor shown here is commonly called continuously stirred tank reactor (CSTR).

In chemical kinetics the differential equations are almost always formulated in molecular concentrations. For the stochastic treatment, however, we replace concentrations by the numbers of particles,n ¯ =a ¯ V N with n N0 and · · L ∈ 188 Peter Schuster

5 NL being Loschmidt’s or Avogadro’s number, the number of particles per mole. The particle number of A in the reactor is a stochastic variable with the probability P (t) = P (t) = n . The time derivative of the probability n N distribution is described by means of the master equation

∂Pn(t) 0 = r nP¯ n 1(t)+(n+1) Pn+1(t) (¯n+n) Pn(t) ; n N . (4.13) ∂t − − ∈   Equation (4.13) describes a birth-and-death process with w+(n) = rn¯ and w−(n) = rn. Thus we have a constant birth rate and a death rate which is proportional to n. Solutions of the master equation can be found in text books listing stochastic processes with known solutions, for example [19]. Here we shall derive the solution by means of probability generating functions as introduced in subsection 2.7.1, equation (2.70) in order to illustrate one particularly powerful approach:

∞ n g(s, t) = Pn(t) s . (2.70’) n=0 X

Sometimes the initial state is included in the notation: gn0 (s, t) implies

Pn(0) = δn,n0 . Partial derivatives with respect to time t and the dummy variable s are readily computed:

∂g(s,t) ∞ ∂P (t) = n sn = ∂t ∂t · nX=0 ∞ n = r n¯ Pn 1(t) + (n + 1) Pn+1(t) (¯n + n) Pn(t) s and − − n=0 X  ∂g(s,t) ∞ n 1 = n P (t) s − . ∂s n nX=0 5As a matter of fact there is a difference between Loschmidt’s and Avogadro’s number 23 1 that is often ignored in the literature: Avogadro’s number, N =6.02214179 10 mole− L × 25 3 refers to one mole substance whereas Loschmidt’s constant n = 2.6867774 10 m− 0 × counts the number of particles in one liter gas under normal conditions. The conver- sion factor between both constants is the molar volume of an ideal gas that amounts to 3 1 22.414 dm− mole− . · Stochastic Kinetics 189

Proper collection of terms and arrangement of summations – by taking into account: w−(0) = 0 – yields

∂g(s,t) ∞ n ∞ n = rn¯ Pn 1(t) Pn(t) s + r (n + 1) Pn+1(t) n Pn(t) s . ∂t − − − n=0 n=0 X  X  Evaluation of the four infinite sums

∞ n ∞ n 1 Pn 1(t) s = s Pn 1(t) s − = sg(s, t) , n=0 − n=0 − X∞ n X Pn(t) s = g(s, t) , n=0 X ∞ n ∂g(s, t) (n + 1) Pn+1(t) s = , and n=0 ∂t X ∞ n ∞ n 1 ∂g(s, t) nPn(t) s = s nPn(t) s − = s , n=0 n=0 ∂t X X and regrouping of terms yields a linear partial differential equation of first order ∂g(s, t) ∂g(s, t) = r n¯(s 1) g(s, t) (s 1) . (4.14) ∂t − − − ∂s   The partial differential equation (PDE) is solved through consecutive sub- stitutions

∂φ(s, t) ∂φ(s, t) φ(s, t) = g(s, t) exp( ns¯ ) = r(s 1) , − −→ ∂t − − ∂s ∂ψ(ρ, t) ∂ψ(ρ, t) s 1 = eρ and ψ(ρ, t) = φ(s, t) + r . − −→ ∂t ∂ρ

Computation of the characteristic manifold is equivalent to solving the ordinary differential equation (ODE) r dt = dρ. We find: rt ρ = C − − where C is the integration constant. The general solution of the PDE is an arbitrary function of the combined variable rt ρ: −

n¯ rt n¯ ψ(ρ, t) = f exp( rt + ρ) e− and φ(s, t) = f (s 1) e− e− , − · − ·    and the probability generating function

rt g(s, t) = f (s 1) e− ) exp (s 1)¯n . − · −    190 Peter Schuster

Normalization of probabilities (for s = 1) requires g(1, t) = 1 and hence f(0) = 1. The initial conditions as expressed by the conditional probability P (n, 0 n , 0) = P (0) = δ leads to the final expression | 0 n n,n0 g(s, 0) = f(s 1) exp (s 1)¯n = sn0 , − · − n0 rt f(ζ) = (ζ + 1) exp( ζn¯) with ζ = (s 1) e− , · − − n (4.15) rt 0 rt g(s,t) = 1 + (s 1) e− exp n¯(s 1) e− exp n¯(s 1) = − · − − · −  n0 rt  rt  = 1 + (s 1) e− exp n¯(s 1) (1 e− ) . − · − − −   From the generating function we compute with somewhat tedious but straight- forward algebra the probability distribution

min n0,n krt rt n0+n 2k { } n e 1 e − rt 0 n k − − n¯ (1 e− ) P (t) = n¯ − − e− − (4.16) n k · (n k)! · k=0  X   − with n, n , n¯ N0. In the limit t we obtain a non-vanishing contri- 0 ∈ → ∞ bution to the stationary probability only from the first term, k = 0, and find n¯n lim Pn(t) = exp( n¯) . t →∞ n! − This is a Poissonian distribution with parameter and expectation value α = n¯. The Poissonian distribution has also a variance which is numerically identical with the expectation value, σ2( ) = E( )=¯n, and thus the NA NA distribution of particle numbers fulfils the √N-law at the stationary state. The time dependent probability distribution allows to compute the ex- pectation value and the variance of the particle number as a function of time

rt E (t) =n ¯ + (n0 n¯) e− , N − · (4.17) 2 rt rt σ (t) = n¯ + n e− 1 e− . N 0 · · − As expected the expectation  value apparently coincides with the solution curve of the deterministic differential equation

dn + = w (n) w−(n) = r (¯n n) , (4.18) dt − − which is of the form

rt n(t)=n ¯ + (n n¯) e− . (4.18’) 0 − · Stochastic Kinetics 191

Figure 4.3: Establishment of the flow equilibrium in the CSTR. The upper part shows the evolution of the probability density, Pn(t), of the number of molecules of a compound A which flows through a reactor of the type illustrated in figure 4.2. The initially infinitely sharp density becomes broader with time until the variance reaches its maximum and then sharpens again until it reaches stationarity. The stationary density is a Poissonian distribution with expectation value and variance, E( ) = σ2( ) =n ¯. In the lower part we show the expectation value N N E (t) in the confidence interval E σ. Parameters used:n ¯ = 20, n = 200, N ± 0 and V = 1; sampling times (upper part): τ = r t = 0 (black), 0.05 (green), 0.2  · (blue), 0.5 (violet), 1 (pink), and (red). ∞ 192 Peter Schuster

Since we start from sharp initial densities variance and standard deviation are zero at time t = 0. The qualitative time dependence of σ2 (t) , however, {NA } depends on the sign of (n n¯): 0 − (i) For n n¯ the standard deviation increases monotonously until it 0 ≤ reaches the value √n¯ in the limit t , and →∞

(ii) for n0 > n¯ the standard deviation increases until it passes through a maximum at 1 t(σ ) = ln 2 + ln n ln(n n¯) max r 0 − 0 −   and approaches the long-time value √n¯ from above.

In figure 4.3 we show an example for the evolution of the probability den- sity (4.16). In addition, the figure contains a plot of the expectation value E (t) inside the band E σ

4.2 Classes of chemical reactions

In this section we shall present exact solutions of the chemical master equa- tion for two classes of chemical reactions: monomolecular and bimolecular. Molecularity of a reaction refers to the number of molecules in the reaction complex and in most cases the molecularity is also reflected by the chemical rate law of reaction kinetics in form of the reaction order. In particular, we distinguish fist order and second order kinetics, which is typically observed with monomolecular and bimolecular reactions, respectively.

4.2.1 Monomolecular chemical reactions

The reversible mono- or unimolecular chemical reaction can be split into two irreversible elementary reactions

k1 A B (4.19a) −−−→ k2 A B , (4.19b) ←−−− wherein the reaction rate parameters, k1 and k2, are called reaction rate constants. The reaction rate parameters depend on temperature, pressure, and other environmental factors. At equilibrium the rate of the forward reaction (4.19a) is precisely compensated by the rate of the reverse reac- tion (4.19b), k [A]= k [B], leading to the condition for the thermodynamic 1 · 2 · equilibrium: k [B] K = 1 = . (4.20) k2 [A] The parameter K is called the equilibrium constant that depends on tem- perature, pressure, and other environmental factors like the reaction rate parameters. In an isolated or in a closed system we have a conservation law:

(t) + (t) NA NB = [A] + [B] = c(t) = c =c ¯ = constant , (4.21) V N 0 · A with c being the total concentration andc ¯ the corresponding equilibrium value, limt c(t)=¯c. →∞ 194 Peter Schuster

4.2.1.1 Irreversible monomolecular chemical reaction

We start by discussing the simpler irreversible case,

k A B , (4.19a’) −−−→ which is can be modeled and analyzed in full analogy to the previous case of the flow equilibrium. Although we are dealing with two molecular species, A and B the process is described by a single stochastic variable, (t), NA since we have (t) = n (t) with n = n(0) being the number of NB 0 −NA 0 A molecules initially present because of the conservation relation (4.21). If a sufficiently small time interval is applied, the irreversible monomolecular reaction is modeled by a single step birth-and-death process with w+(n)=0 6 and w−(n)= kn. The probability density is defined by P (t)= P ( = n) n NA and its time dependence obeys ∂P (t) n = k (n + 1) P (t) knP (t) . (4.22) ∂t n+1 − n The master equation (4.22) is solved again by means of the probability gen- erating function, ∞ g(s, t) = P (t) sn ; s 1 , n | | ≤ n=0 X which is determined by the PDE ∂g(s, t) ∂g(s, t) k (1 s) = 0. ∂t − − ∂s The computation of the characteristic manifold of this PDE is tantamount to solving the ODE ds k dt = = ekt = s 1 + const . s 1 ⇒ − − With φ(s, t)=(s 1) exp( kt)+ γ, g(s, t) = f(φ), the normalization con- − − n0 dition g(1, t) = 1, and the boundary condition g(s, 0) = f(φ)t=0 = s we find n0 kt kt g(s, t) = s e− + 1 e− . (4.23) · − 6  +  We remark that w−(0) = 0 and w (0) = 0 are fulfilled, which are the conditions for a natural absorbing barrier at n = 0 (section 5.1.2). Stochastic Kinetics 195

This expression is easily expanded in binomial form, which orders with re- spect to increasing powers of s,

kt n0 n0 kt kt n0 1 n0 2kt kt n0 2 g(s,t) = (1 e− ) + se− (1 e− ) − + se− (1 e− ) − + − 1 − 2 −    

n0 n0 1 (n0 1)kt kt n0 n0kt + . . . + s − e− − (1 e− )+ s e− . n 1 −  0 −  Comparison of coefficients yields the time dependent probability density

n n n0 n P (t) = 0 exp( kt) 1 exp( kt) − . (4.24) n n − − −       It is straightforward to compute the expectation value of the stochastic vari- able , which coincides again with the deterministic solution, and its vari- NA ance

kt E A(t) = n0 e− , N (4.25) 2 kt kt σ (t) = n e− 1 e− . NA 0 −

The half-life of a population of n0 particles, 

n0 ktm 1 t : E (t) = = n e− = t = ln 2 , 1/2 {NA } 2 0 · ⇒ 1/2 k is time of maximum variance or standard deviation, dσ2/ dt=0ordσ/ dt = 0, respectively. An example of the time course of the probability density of an irreversible monomolecular reaction is shown in figure 4.4.

4.2.1.2 Reversible monomolecular chemical reaction

The analysis of the irreversible reaction is readily extended to the reversible case (4.19), where we are dealing with a one step birth-and-death process. Again we are dealing with a closed system, the conservation relation (t)+ NA (t)= n – with n being again the number of molecules of class A initially NB 0 0 present, Pn(0) = δn,n0 – holds and the transition probabilities are given by: + 7 w (n) = k (n n) and w−(n) = k n. The master equation is now of the 2 0 − 1 7 Here we note the existence of barriers at n = 0 and n = n0, which are characterized + + by w−(0) = 0, w (0) = k2n0 > 0 and w (n0) = 0, w−(n0) = k1n0 > 0, respectively. These equations fulfil the conditions for reflecting barriers (section 5.1.2). 196 Peter Schuster

Figure 4.4: Continued on next page. Stochastic Kinetics 197

Figure 4.4: Probability density of an irreversible monomolecular reac- tion. The three plots on the previous page show the evolution of the probability density, Pn(t), of the number of molecules of a compound A which undergo a re- action A B. The initially infinitely sharp density P (0) = δ becomes broader → n n,n0 with time until the variance reaches its maximum at time t = t1/2 = ln 2/k and then sharpens again until it approaches full transformation, limt Pn(0) = δn,0. →∞ On this page we show the expectation value E (t) and the confidence intervals NA E σ (68,3%,red) and 2σ (95,4%,blue) with σ2 (t) being the variance. Pa- ± ± NA 1 rameters used: n0 = 200, 2000, and 20 000; k = 1 [t− ]; sampling times: 0 (black), 0.01 (green), 0.1 (blue), 0.2 (violet), (0.3) (magenta), 0.5 (pink), 0.75 (red), 1 (pink), 1.5 (magenta), 2 (violet), 3 (blue), and 5 (green). form

\frac{\partial P_n(t)}{\partial t} = k_2 (n_0-n+1) P_{n-1}(t) + k_1 (n+1) P_{n+1}(t) - \bigl( k_1 n + k_2 (n_0-n) \bigr) P_n(t) .   (4.26)

Making use of the probability generating function g(s,t) we derive the PDE

\frac{\partial g(s,t)}{\partial t} = \bigl( k_1 + (k_2-k_1)s - k_2 s^2 \bigr) \frac{\partial g(s,t)}{\partial s} + n_0 k_2 (s-1)\, g(s,t) .

The solutions of the PDE are simpler when expressed in terms of the parameter combinations κ = k_1 + k_2 and λ = k_1/k_2, and the function

ω(t) = λ \exp(-κt) + 1:

g(s,t) = \Bigl( 1 + (s-1)\,\frac{ω(t)}{1+λ} \Bigr)^{n_0} = \Bigl( \frac{λ(1-e^{-κt}) + s\,(λ e^{-κt}+1)}{1+λ} \Bigr)^{n_0} = \frac{1}{(1+λ)^{n_0}} \sum_{n=0}^{n_0} \binom{n_0}{n} (λ e^{-κt}+1)^{n} \bigl(λ(1-e^{-κt})\bigr)^{n_0-n} s^n .

The probability density for the reversible reaction is then obtained as

P_n(t) = \frac{1}{(1+λ)^{n_0}} \binom{n_0}{n} (λ e^{-κt}+1)^{n} \bigl(λ(1-e^{-κt})\bigr)^{n_0-n} .   (4.27)

Expectation value and variance of the numbers of molecules are readily computed (with ω(t) = λ \exp(-κt)+1):

E(\mathcal{N}_A(t)) = \frac{n_0}{1+λ}\, ω(t) ,
\sigma^2_{\mathcal{N}_A}(t) = n_0\, \frac{ω(t)}{1+λ}\Bigl(1 - \frac{ω(t)}{1+λ}\Bigr) ,   (4.28)

and the stationary values are

\lim_{t\to\infty} E(\mathcal{N}_A(t)) = n_0\, \frac{k_2}{k_1+k_2} ,
\lim_{t\to\infty} \sigma^2(\mathcal{N}_A(t)) = n_0\, \frac{k_1 k_2}{(k_1+k_2)^2} ,   (4.29)
\lim_{t\to\infty} \sigma(\mathcal{N}_A(t)) = \sqrt{n_0}\, \frac{\sqrt{k_1 k_2}}{k_1+k_2} .

This result shows that the \sqrt{N}-law is fulfilled up to a factor that is independent of N: E/σ = \sqrt{n_0}\, k_2/\sqrt{k_1 k_2}. Starting from a sharp distribution, P_n(0) = \delta_{n,n_0}, the variance increases, may or may not pass through a maximum, and eventually reaches the equilibrium value \bar{\sigma}^2 = k_1 k_2\, n_0/(k_1+k_2)^2. The time of maximal fluctuations is easily calculated from the condition \mathrm{d}\sigma^2/\mathrm{d}t = 0 and one obtains

t_{\rm var\,max} = \frac{1}{k_1+k_2}\, \ln\frac{2 k_1}{k_1-k_2} .   (4.30)
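Since the density (4.27) is again binomial – with success probability p(t) = ω(t)/(1+λ) – the reversible case is equally easy to evaluate. The sketch below (illustrative parameter values with k_1 > k_2) computes the time of maximal fluctuations from (4.30) and compares the variance there with the equilibrium value implied by (4.29).

```python
import numpy as np
from scipy.stats import binom

n0, k1, k2 = 200, 2.0, 1.0            # illustrative parameters with k1 > k2
kappa, lam = k1 + k2, k1 / k2

def pA(t):
    """Success probability of the binomial density (4.27): p(t) = omega(t)/(1 + lambda)."""
    return (lam * np.exp(-kappa * t) + 1.0) / (1.0 + lam)

def Pn(t):
    """Probability density (4.27) of N_A at time t."""
    return binom.pmf(np.arange(n0 + 1), n0, pA(t))

var = lambda t: n0 * pA(t) * (1.0 - pA(t))            # variance from (4.28)
t_varmax = np.log(2 * k1 / (k1 - k2)) / kappa          # equation (4.30), defined for k1 > k2
print("t_varmax =", t_varmax)
print("maximal variance   :", var(t_varmax))
print("equilibrium variance:", n0 * k1 * k2 / kappa**2)
```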


Figure 4.5: Probability density of a reversible monomolecular reaction. The three plots show the evolution of the probability density, P_n(t), of the number of molecules of a compound A which undergoes a reaction A ⇌ B. The initially infinitely sharp density P_n(0) = \delta_{n,n_0} becomes broader with time until the variance settles down at the equilibrium value, eventually passing a point of maximum variance. Also shown are the expectation value E(\mathcal{N}_A(t)) and the confidence intervals E ± σ (68.3 %, red) and E ± 2σ (95.4 %, blue) with \sigma^2_{\mathcal{N}_A}(t) being the variance. Parameters used: n_0 = 200, 2000, and 20 000; k_1 = 2, k_2 = 1 [t^{-1}]; sampling times: 0 (black), 0.01 (dark green), 0.025 (green), 0.05 (turquoise), 0.1 (blue), 0.175 (blue violet), 0.3 (purple), 0.5 (magenta), 0.8 (deep pink), 2 (red).

Depending on the sign of (k_1 - k_2) the approach towards equilibrium passes through a maximum value or not. The maximum is readily detected from the height of the mode of P_n(t), as seen in figure 4.5 where a case with k_1 > k_2 is presented. In order to illustrate fluctuations and their value under equilibrium conditions the Austrian physicist Paul Ehrenfest designed a game called Ehrenfest's urn model [102], which was indeed played in order to verify the \sqrt{N}-law. Balls, 2N in total, are numbered consecutively, 1, 2, \ldots, 2N, and distributed arbitrarily over two containers, say A and B. A lottery machine draws lots which carry the numbers of the balls. When the number of a ball is drawn, the ball is put from one container into the other. This setup is already sufficient for a simulation of the equilibrium condition: the more balls a container holds, the more likely it is that the number of one of its balls is drawn and a transfer into the other container occurs. Just as with chemical reactions we have self-controlling fluctuations: whenever a fluctuation becomes large it creates a restoring force that is proportional to the size of the fluctuation.
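Ehrenfest's urn game is also easily played on a computer. The following minimal sketch – container size, number of draws, and the discarded transient are arbitrary illustrative choices – moves one randomly drawn ball per step and confirms that the equilibrium fluctuations of the occupation number follow the \sqrt{N}-law.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1000                                # 2N balls in total (illustrative)
in_A = np.ones(2 * N, dtype=bool)       # start with all balls in container A
n_A = []                                # trajectory of the occupation number of A

for _ in range(200_000):
    ball = rng.integers(2 * N)          # the lottery machine draws one ball number
    in_A[ball] = ~in_A[ball]            # move that ball to the other container
    n_A.append(in_A.sum())

n_A = np.array(n_A[50_000:])            # discard the relaxation towards equilibrium
print(n_A.mean(), n_A.std(), np.sqrt(N / 2))   # fluctuations around N follow the sqrt(N)-law
```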

4.2.2 Bimolecular chemical reactions

Two classes of bimolecular reactions are accessible to full stochastic analysis:

A + B --k--> C   and   (4.31a)
2 A --k--> B .   (4.31b)

Bimolecularity gives rise to nonlinearities in the kinetic differential equations and in the master equations and complicates substantially the analysis of the individual cases. At the same time, these classes of bimolecular reactions do not show essential differences in the qualitative behavior compared to the corresponding monomolecular or linear case A → B, in contrast to autocatalytic processes (section 5.1). The following derivations are based upon two publications [94, 95].

4.2.2.1 Addition reaction

In the first example (4.31a) we are dealing with three dependent stochastic variables, \mathcal{N}_A(t), \mathcal{N}_B(t), and \mathcal{N}_C(t). Following McQuarrie et al. we define the probability P_n(t) = P(\mathcal{N}_A(t) = n) and apply the standard initial conditions P_n(0) = \delta_{n,n_0}, P(\mathcal{N}_B(0) = b) = \delta_{b,b_0}, and P(\mathcal{N}_C(0) = c) = \delta_{c,0}. Accordingly, we have from the laws of stoichiometry \mathcal{N}_B(t) = b_0 - n_0 + \mathcal{N}_A(t) and \mathcal{N}_C(t) = n_0 - \mathcal{N}_A(t). For simplicity we denote b_0 - n_0 = \Delta_0. Then the master equation for the chemical reaction is of the form

\frac{\partial P_n(t)}{\partial t} = k (n+1)(\Delta_0+n+1) P_{n+1}(t) - k\, n (\Delta_0+n) P_n(t) .   (4.31a')

Figure 4.6: Irreversible bimolecular addition reaction A + B → C. The plot shows the probability distribution P_n(t) = \mathrm{Prob}(\mathcal{N}_C(t) = n) describing the number of molecules of species C as a function of time, calculated by equation (4.37). The initial conditions are chosen to be P(\mathcal{N}_A(0)=a) = δ(a, a_0), P(\mathcal{N}_B(0)=b) = δ(b, b_0), and P(\mathcal{N}_C(0)=c) = δ(c, 0). With increasing time the peak of the distribution moves from left to right. The state n = \min(a_0, b_0) is an absorbing state and hence the long time limit of the system is \lim_{t\to\infty} \mathcal{N}_C(t) = δ(n, \min(a_0, b_0)). Parameters used: a_0 = 50, b_0 = 51, k = 0.02 [t^{-1}·M^{-1}]; sampling times (upper part): t = 0 (black), 0.01 (green), 0.1 (turquoise), 0.2 (blue), 0.3 (violet), 0.5 (magenta), 0.75 (red), 1.0 (yellow), 1.5 (red), 2.25 (magenta), 3.5 (violet), 5.0 (blue), 7.0 (cyan), 11.0 (turquoise), 20.0 (green), and ∞ (black).

We remark that the birth and death rates are no longer linear in n. The corresponding PDE for the generating function is readily calculated:

\frac{\partial g(s,t)}{\partial t} = k (\Delta_0+1)(1-s) \frac{\partial g(s,t)}{\partial s} + k\, s (1-s) \frac{\partial^2 g(s,t)}{\partial s^2} .   (4.32)

The derivation of solutions of this PDE is quite demanding. It can be achieved by separation of variables:

g(s,t) = \sum_{m=0}^{\infty} A_m\, Z_m(s)\, T_m(t) .   (4.33)

We dispense with details and list only the coefficients and functions of the solution:

A_m = (-1)^m\, \frac{(2m+\Delta_0)\,\Gamma(m+\Delta_0)\,\Gamma(n_0+1)\,\Gamma(n_0+\Delta_0+1)}{\Gamma(m+1)\,\Gamma(\Delta_0+1)\,\Gamma(n_0-m+1)\,\Gamma(n_0+\Delta_0+m+1)} ,
Z_m(s) = J_m(\Delta_0, \Delta_0+1, s) , and
T_m(t) = \exp\bigl(-m(m+\Delta_0)\, k\, t\bigr) .

Herein, Γ represents the conventional gamma function with the definition Γ(x+1) = xΓ(x), and J_n(p,q,s) are the Jacobi polynomials named after the German mathematician Carl Jacobi [103, ch.22, pp.773-802], which are solutions of the differential equation

s(1-s)\, \frac{\mathrm{d}^2 J_n(p,q,s)}{\mathrm{d}s^2} + \bigl( q - (p+1)s \bigr) \frac{\mathrm{d}J_n(p,q,s)}{\mathrm{d}s} + n(n+p)\, J_n(p,q,s) = 0 .

These polynomials fulfil the following conditions:

\frac{\mathrm{d}J_n(p,q,s)}{\mathrm{d}s} = -\frac{n(n+p)}{q}\, J_{n-1}(p+2, q+1, s)   and
\int_0^1 s^{q-1}(1-s)^{p-q}\, J_n(p,q,s)\, J_\ell(p,q,s)\, \mathrm{d}s = \frac{n!\, \Gamma(q)^2\, \Gamma(n+p-q+1)}{(2n+p)\, \Gamma(n+p)\, \Gamma(n+q)}\, \delta_{\ell,n} .

At the relevant value of the dummy variable, s = 1, we differentiate twice and find:

\Bigl(\frac{\partial g(s,t)}{\partial s}\Bigr)_{s=1} = \sum_{m=1}^{n_0} \frac{(2m+\Delta_0)\, \Gamma(n_0+1)\, \Gamma(n_0+\Delta_0+1)}{\Gamma(n_0-m+1)\, \Gamma(n_0+\Delta_0+m+1)}\, T_m(t) ,   (4.34)

\Bigl(\frac{\partial^2 g(s,t)}{\partial s^2}\Bigr)_{s=1} = \sum_{m=2}^{n_0} \frac{(m-1)(m+\Delta_0+1)(2m+\Delta_0)\, \Gamma(n_0+1)\, \Gamma(n_0+\Delta_0+1)}{\Gamma(n_0-m+1)\, \Gamma(n_0+\Delta_0+m+1)}\, T_m(t) ,   (4.35)

from which we obtain expectation value and variance according to subsection 2.7.1:

E(\mathcal{N}_A(t)) = \Bigl(\frac{\partial g(s,t)}{\partial s}\Bigr)_{s=1}   and
\sigma^2_{\mathcal{N}_A}(t) = \Bigl(\frac{\partial^2 g(s,t)}{\partial s^2}\Bigr)_{s=1} + \Bigl(\frac{\partial g(s,t)}{\partial s}\Bigr)_{s=1} - \biggl( \Bigl(\frac{\partial g(s,t)}{\partial s}\Bigr)_{s=1} \biggr)^{2} .   (2.71')

As we see in the current example, and as we shall see in the next subsubsection, bimolecularity complicates the solution of the chemical master equations substantially and makes it quite sophisticated. We dispense here with the detailed expressions but provide the results for the special case of vast excess of one reaction partner, |\Delta_0| \gg n_0 > 1, which is known as the pseudo first order condition. Then the sums can be approximated well by their first terms and we find (with k_0 = \Delta_0 k):

\Bigl(\frac{\partial g(s,t)}{\partial s}\Bigr)_{s=1} \approx n_0\, \frac{\Delta_0+2}{n_0+\Delta_0+1}\, e^{-(\Delta_0+1)kt} \approx n_0\, e^{-k_0 t}   and
\Bigl(\frac{\partial^2 g(s,t)}{\partial s^2}\Bigr)_{s=1} \approx n_0(n_0-1)\, e^{-2 k_0 t} ,

and we obtain finally

E(\mathcal{N}_A(t)) = n_0\, e^{-k_0 t}   and
\sigma^2_{\mathcal{N}_A}(t) = n_0\, e^{-k_0 t}\bigl(1-e^{-k_0 t}\bigr) ,   (4.36)

which is essentially the same result as obtained for the irreversible first order reaction.

For the calculation of the probability density we make use of a slightly different definition of the stochastic variables and use \mathcal{N}_C(t), counting the number of molecules C in the system: P_n(t) = P(\mathcal{N}_C(t) = n). With the initial condition P_n(0) = \delta(n,0) and the upper limit c = \min\{a_0, b_0\} of n – where a_0 and b_0 are the sharply defined numbers of A and B molecules initially present (\mathcal{N}_A(0) = a_0, \mathcal{N}_B(0) = b_0) – we have

\sum_{n=0}^{c} P_n(t) = 1   and thus   P_n(t) = 0 \;\forall\; n \notin [0,c],\; n \in \mathbb{Z} ,

and the master equation is now of the form

\frac{\partial P_n(t)}{\partial t} = k \bigl(a_0-(n-1)\bigr)\bigl(b_0-(n-1)\bigr) P_{n-1}(t) - k (a_0-n)(b_0-n) P_n(t) .   (4.31a'')

∂Pn(t) = k a0 (n 1) b0 (n 1) Pn 1(t) ∂t − − − − − − k (a n)(b n)P (t) .  (4.31a”) − 0 − 0 − n Stochastic Kinetics 205

In order to solve the master equation (4.31a”) the probability distribution

Pn(t) is Laplace transformed in order to obtain a set of pure difference equa- tion from the master equation being a set of differential-difference equation

∞ q (s) = exp( s t) P (t) dt n − · n Z0 and with the initial condition Pn(0) = δ(n, 0) we obtain

1 + s q (s) = k a b q (s) , − 0 − 0 0 0

s qn(s) = k a0 (n 1) b0 (n 1) qn 1(s) − − − − − − k (a n)(b n) q (s) , 1 n c . − 0 − 0 − n ≤ ≤

Successive iteration yields the solutions in terms of the functions qn(s)

a b n 1 q (s) = 0 0 (n!)2kn , 0 n c n n n s + k(a j)(b j) ≤ ≤ j=0 0 0    Y − − and after converting the product into partial fractions and inverse transfor- mation one finds the result

a b n n j P (t) = ( 1)n 0 0 ( 1)j 1+ − n − n n − a + b n j × j=0 0 0    X  − −  (4.37) 1 − n a0 + b0 j k(a0 j)(b0 j)t − e− − − . × j n    An illustrative example is shown in figure 4.6. The difference between the irreversible reactions monomolecular conversion and the bimolecular addition reaction (figure 4.4) is indeed not spectacular.
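The closed form (4.37) involves delicate alternating sums; for moderate particle numbers it is often easier to integrate the master equation (4.31a'') directly as a linear system of ODEs. The following Python sketch does exactly that for the parameters of figure 4.6 (the use of scipy's solve_ivp and the chosen tolerances are illustrative choices, not part of the text).

```python
import numpy as np
from scipy.integrate import solve_ivp

a0, b0, k = 50, 51, 0.02             # parameters of figure 4.6 (illustrative units)
c = min(a0, b0)                       # absorbing state of N_C

def master(t, P):
    """Right-hand side of the master equation (4.31a'') for P_n(t), n = 0..c."""
    dP = np.empty_like(P)
    for n in range(c + 1):
        gain = k * (a0 - n + 1) * (b0 - n + 1) * P[n - 1] if n > 0 else 0.0
        loss = k * (a0 - n) * (b0 - n) * P[n]
        dP[n] = gain - loss
    return dP

P0 = np.zeros(c + 1); P0[0] = 1.0     # initially no C molecules: P_n(0) = delta(n, 0)
sol = solve_ivp(master, (0.0, 20.0), P0, t_eval=[0.1, 1.0, 5.0, 20.0], rtol=1e-8, atol=1e-12)
for t, P in zip(sol.t, sol.y.T):
    print(f"t = {t:5.1f}   E(N_C) = {np.dot(np.arange(c + 1), P):6.2f}   sum P = {P.sum():.6f}")
```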

4.2.2.2 Dimerization reaction

When the dimerization reaction (4.31b) is modeled by means of a master equation [94] we have to take into account that two molecules A vanish at a time, and an individual jump always involves ∆n = 2:

\frac{\partial P_n(t)}{\partial t} = \frac{k}{2} (n+2)(n+1) P_{n+2}(t) - \frac{k}{2}\, n(n-1) P_n(t) ,   (4.31b')

Figure 4.7: Irreversible dimerization reaction 2A → C. The plot shows the probability distribution P_n(t) = \mathrm{Prob}(\mathcal{N}_A(t) = n) describing the number of molecules of species A as a function of time, calculated by equation (4.42). The number of molecules C is given by the distribution P_m(t) = \mathrm{Prob}(\mathcal{N}_C(t) = m). The initial conditions are chosen to be P(\mathcal{N}_A(0)=n) = δ(n, a_0) and P(\mathcal{N}_C(0)=m) = δ(m, 0), and hence we have n + 2m = a_0. With increasing time the peak of the distribution moves from right to left. The state n = 0 is an absorbing state and hence the long time limit of the system is \lim_{t\to\infty} \mathcal{N}_A(t) = δ(n, 0), \lim_{t\to\infty} \mathcal{N}_C(t) = δ(m, a_0/2). Parameters used: a_0 = 100 and k = 0.02 [t^{-1}·M^{-1}]; sampling times (upper part): t = 0 (black), 0.01 (green), 0.1 (turquoise), 0.2 (blue), 0.3 (violet), 0.5 (magenta), 0.75 (red), 1.0 (yellow), 1.5 (red), 2.25 (magenta), 3.5 (violet), 5.0 (blue), 7.0 (cyan), 11.0 (turquoise), 20.0 (green), 50.0 (chartreuse), and ∞ (black).

which gives rise to the following PDE for the probability generating function

\frac{\partial g(s,t)}{\partial t} = \frac{k}{2} (1-s^2) \frac{\partial^2 g(s,t)}{\partial s^2} .   (4.38)

The analysis of this PDE is more involved than it might look at a first glance. Nevertheless, an exact solution similar to (4.33) is available:

g(s,t) = \sum_{m=0}^{\infty} A_m\, C_m^{-1/2}(s)\, T_m(t) ,   (4.39)

wherein the parameters and functions are defined by

A_m = \frac{1-2m}{2^m} \cdot \frac{\Gamma(n_0+1)\, \Gamma\bigl[(n_0-m+1)/2\bigr]}{\Gamma(n_0-m+1)\, \Gamma\bigl[(n_0+m+1)/2\bigr]} ,
C_m^{-1/2}(s):\quad (1-s^2)\, \frac{\mathrm{d}^2 C_m^{-1/2}(s)}{\mathrm{d}s^2} + m(m-1)\, C_m^{-1/2}(s) = 0 ,
T_m(t) = \exp\bigl(-\tfrac{1}{2}\, k\, m(m-1)\, t\bigr) .

The functions C_m^{-1/2}(s) are ultraspherical or Gegenbauer polynomials, named after the Austrian mathematician Leopold Gegenbauer [103, ch.22, pp.773-802]. They are solutions of the differential equation shown above and belong to the family of hypergeometric functions. It is straightforward to write down expressions for the expectation value and the variance of the stochastic variable \mathcal{N}_A(t) (µ stands for an integer running index, µ ∈ ℕ):

E(\mathcal{N}_A(t)) = -\sum_{m=2\mu,\;\mu=1}^{2\lfloor n_0/2 \rfloor} A_m\, T_m(t)   and
\sigma^2_{\mathcal{N}_A}(t) = -\sum_{m=2\mu,\;\mu=1}^{2\lfloor n_0/2 \rfloor} \Bigl( \frac{m^2-m+2}{2}\, A_m\, T_m(t) + A_m^2\, T_m^2(t) \Bigr) .   (4.40)

In order to obtain concrete results these expressions can be readily evaluated numerically.

There is one interesting detail in the deterministic version of the dimerization reaction. It is conventionally modeled by the differential equation (4.41a), which can be solved readily. The correct ansatz, however, would be (4.41b), for which we also have an exact solution (with [A] = a(t) and a(0) = a_0):

\frac{\mathrm{d}a}{\mathrm{d}t} = -k\, a^2 \;\Longrightarrow\; a(t) = \frac{a_0}{1+a_0 k t} ,   (4.41a)
\frac{\mathrm{d}a}{\mathrm{d}t} = -k\, a(a-1) \;\Longrightarrow\; a(t) = \frac{a_0}{a_0 + (1-a_0)\, e^{-kt}} .   (4.41b)

The expectation value of the stochastic solution always lies between the solution curves of (4.41a) and (4.41b). An illustrative example is shown in figure 4.7.
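The two deterministic solutions (4.41a) and (4.41b) are easy to compare numerically; the short sketch below (parameter values are those of figure 4.7, chosen purely for illustration) evaluates both curves, between which the stochastic expectation value is stated to lie.

```python
import numpy as np

a0, k = 100.0, 0.02        # illustrative values matching figure 4.7

def a_conventional(t):
    """Solution of the conventional rate equation (4.41a): da/dt = -k a^2."""
    return a0 / (1.0 + a0 * k * t)

def a_corrected(t):
    """Solution of the corrected ansatz (4.41b): da/dt = -k a (a - 1)."""
    return a0 / (a0 + (1.0 - a0) * np.exp(-k * t))

for t in (0.0, 0.5, 1.0, 2.0, 5.0):
    print(f"t = {t:4.1f}   (4.41a): {a_conventional(t):7.3f}   (4.41b): {a_corrected(t):7.3f}")
# the stochastic expectation value of N_A(t) lies between the two curves
```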

As in the previous subsection 4.2.2.1 we consider also a solution of the master equation by means of a Laplace transformation [95]. Since we are dealing with a step size of two – two molecules A are converted into one molecule C – the master equation is defined only for odd or only for even numbers of molecules A. For an initial number of 2a_0 molecules and a probability P_{2n}(t) = P(\mathcal{N}_A(t) = 2n) we have the initial conditions \mathcal{N}_A(0) = 2a_0, \mathcal{N}_C(0) = 0 and the condition that all probabilities outside the interval [0, 2a_0] as well as the odd probabilities P_{2n-1} (n = 1, \ldots, a_0) vanish:

\frac{\partial P_{2n}(t)}{\partial t} = -\frac{k}{2} (2n)(2n-1) P_{2n}(t) + \frac{k}{2} (2n+2)(2n+1) P_{2n+2}(t) .   (4.31b'')

The probability distribution P_{2n}(t) is derived as in the previous subsection by Laplace transformation,

q_{2n}(s) = \int_0^{\infty} \exp(-s\, t)\, P_{2n}(t)\, \mathrm{d}t ,

yielding the set of difference equations

-1 + s\, q_{2a_0}(s) = -\frac{k}{2} (2a_0)(2a_0-1)\, q_{2a_0}(s) ,
s\, q_{2n}(s) = -\frac{k}{2} (2n)(2n-1)\, q_{2n}(s) + \frac{k}{2} (2n+2)(2n+1)\, q_{2n+2}(s) , \quad 0 \le n \le a_0-1 ,

which again can be solved by successive iteration. It is straightforward to calculate first the Laplace transform of the probability that 2(a_0-m) molecules of species A are left, where m = [C] counts the molecules of C already formed and 0 \le m \le a_0:

q_{2(a_0-m)}(s) = \Bigl(\frac{k}{2}\Bigr)^{m} \binom{2a_0}{2m} (2m)! \prod_{j=0}^{m} \frac{1}{s + \frac{k}{2}\cdot 2(a_0-j)\bigl(2(a_0-j)-1\bigr)} ,

and a somewhat tedious but straightforward exercise in algebra yields the inverse Laplace transform:

P_{2(a_0-m)}(t) = (-1)^m \frac{a_0!\,(2a_0-1)!!}{(a_0-m)!\,(2a_0-2m-1)!!} \sum_{j=0}^{m} (-1)^j \frac{(4a_0-4j-1)\,(4a_0-2m-2j-3)!!}{j!\,(m-j)!\,(4a_0-2j-1)!!}\, e^{-k(a_0-j)(2(a_0-j)-1)\,t} .

The substitution i = a_0 - j leads to

P_{2(a_0-m)}(t) = (-1)^m \frac{a_0!\,(2a_0-1)!!}{(a_0-m)!\,(2a_0-2m-1)!!} \sum_{i=a_0-m}^{a_0} (-1)^{a_0-i} \frac{(4i-1)\,(2a_0-2m+2i-3)!!}{(a_0-i)!\,(i-a_0+m)!\,(2a_0+2i-1)!!}\, e^{-k\, i(2i-1)\,t} .

Setting now n = a_0 - m in accord with the definition of m we obtain the final result

P_{2n}(t) = (-1)^n \frac{a_0!\,(2a_0-1)!!}{n!\,(2n-1)!!} \sum_{i=n}^{a_0} (-1)^i \frac{(4i-1)\,(2n+2i-3)!!}{(a_0-i)!\,(i-n)!\,(2a_0+2i-1)!!}\, e^{-k\, i(2i-1)\,t} .   (4.42)

The results are illustrated by means of a numerical example in figure 4.7.

4.3 Fokker-Planck approximation of master equations

It is often desirable to use a single differential equation for the description of the probability distributions P(x) of jump processes. The obvious strategy is to approximate the master equation by a Fokker-Planck equation, and implicitly such an approach had already been used by Albert Einstein in his famous work on Brownian motion [33]. A nonrigorous but straightforward derivation of an expansion of the master equation was given by the Dutch physicist Hendrik Kramers [104], and later the Kramers approach was substantially improved by the mathematical physicist José Moyal [105]. This approach is generally known as the Kramers-Moyal expansion. A differently motivated and rigorous expansion is due to the Dutch theoretical physicist Nicholas van Kampen, who was a PhD student of Hendrik Kramers. His expansion, known as the van Kampen size expansion, is based on a general approximation by size dependent parameters containing those that vanish in the limit of macroscopic extensions.

Before we discuss these expansions and approximations, we shall first introduce the reverse approach: a diffusion process can always be approximated by a jump process whereas the inverse is not always true. We shall encounter such a situation in the case of the Poisson process, which cannot be simulated by a suitable Fokker-Planck equation. Basic for the conversion in the limit of vanishing step size is an expansion of the transition probabilities that is truncated after the second term, as in the case of the derivation of the Chapman-Kolmogorov equation (subsection 3.1.2).

4.3.1 Diffusion process approximated by a jump process

The typical result is derived for a random walk (subsection 3.4.2) where the master equation becomes a Fokker-Planck equation in the limit of infinitely small step size. Jumps, in this case, must become smaller and more probable simultaneously, and this is taken care of by a scaling assumption that is encapsulated in a parameter δ in the following way: the average step size and the variance of the step size are proportional to δ whereas the jump probabilities increase as δ becomes smaller.

First we introduce a new variable y = (z - x - A(x)δ)/\sqrt{δ} and write for the jump probability

W_δ(z\,|\,x) = δ^{-3/2}\, Φ(y,x)   with   \int \mathrm{d}y\, Φ(y,x) = Q   and   \int \mathrm{d}y\, y\, Φ(y,x) = 0 .   (4.43)

Now we define three terms for a series expansion in the jump moments,

α_0(x) \equiv \int \mathrm{d}z\, W_δ(z\,|\,x) = \frac{Q}{δ} ,
α_1(x) \equiv \int \mathrm{d}z\, (z-x)\, W_δ(z\,|\,x) = A(x)\, Q ,   (4.44)
α_2(x) \equiv \int \mathrm{d}z\, (z-x)^2\, W_δ(z\,|\,x) = \int \mathrm{d}y\, y^2\, Φ(y,x) ,

and assume that the function Φ(y,x) vanishes sufficiently fast as y → ∞ in order to guarantee that

\lim_{δ\to 0} W_δ(z\,|\,x) = \lim_{y\to\infty} \Bigl( \frac{y}{z-x} \Bigr)^{3} Φ(y,x) = 0   for   z \ne x .

Next we choose a twice differentiable function f(z) and carry out a procedure that is very similar to the derivation of the differential Chapman-Kolmogorov equation in section 3.1.2 and find

\lim_{δ\to 0} \frac{\partial \langle f(z)\rangle}{\partial t} = \Bigl\langle α_1(z)\, \frac{\partial f(z)}{\partial z} \Bigr\rangle + \frac{1}{2} \Bigl\langle α_2(z)\, \frac{\partial^2 f(z)}{\partial z^2} \Bigr\rangle .

This result has the consequence that in the limit δ → 0 the master equation

\frac{\partial P(x)}{\partial t} = \int \mathrm{d}z\, \bigl( W(x\,|\,z) P(z) - W(z\,|\,x) P(x) \bigr)   (4.45a)

becomes the FPE

\frac{\partial P(x)}{\partial t} = -\frac{\partial}{\partial x}\bigl( α_1\, P(x) \bigr) + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\bigl( α_2\, P(x) \bigr) .   (4.45b)

Accordingly, one can always construct a master equation if the requirements imposed by the three α-functions (4.44) are met. In case these criteria are not fulfilled, no such approximation is possible. The approximation is illustrated by means of three examples:

Random walk. Based on the notation introduced in subsection 3.4.2 we find for x = n·l:

W(x\,|\,z) = ϑ\, (δ_{z,x-l} + δ_{z,x+l}) \;\Longrightarrow\; α_0(x) = 2ϑ,\; α_1(x) = 0,\; α_2(x) = 2\, l^2 ϑ .

With δ = l^2 and D = l^2 ϑ we obtain the familiar diffusion equation

\frac{\partial P(x,t)}{\partial t} = D\, \frac{\partial^2 P(x,t)}{\partial x^2} .   (4.46)

Poisson process. With the notation used in subsection 3.4.1 (except α ↔ ϑ) and x = n·l we find:

W(x\,|\,z) = ϑ\, δ_{z,x+l} \;\Longrightarrow\; α_0(x) = ϑ,\; α_1(x) = l\,ϑ,\; α_2(x) = l^2\,ϑ .

In this case there is no way to define l and ϑ as functions of δ such that α_1(x) and α_2(x) both remain finite and nonvanishing in the limit l → 0. There is no Fokker-Planck limit for the Poisson process.
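The scaling argument behind (4.46) is easy to check by direct simulation: shrinking the step length l while keeping D = l²ϑ fixed leaves the statistics of the displacement unchanged. The following sketch (sample sizes and parameter values are illustrative) verifies that the variance of the walker's position approaches 2Dt independently of l.

```python
import numpy as np

# Shrink the step size l while keeping D = l^2 * theta fixed; the jump process
# then approaches the diffusion equation (4.46), checked here via the variance 2 D t.
D, t = 1.0, 1.0                          # illustrative diffusion constant and observation time
rng = np.random.default_rng(0)

for l in (1.0, 0.1, 0.01):
    theta = D / l**2                     # jump rate fixed by the scaling assumption
    n_jumps = rng.poisson(2 * theta * t, size=20_000)    # total number of jumps up to time t
    steps = 2 * rng.binomial(n_jumps, 0.5) - n_jumps     # net number of +l vs -l steps
    x = l * steps
    print(f"l = {l:5.2f}   var(x) = {x.var():6.3f}   expected 2 D t = {2 * D * t:6.3f}")
```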

General approximation of diffusion by birth-and-death master equations. We begin with a master equation of the class

W_δ(z\,|\,x) = \Bigl( \frac{A(x)}{2δ} + \frac{B(x)}{2δ^2} \Bigr) δ_{z,x+δ} + \Bigl( -\frac{A(x)}{2δ} + \frac{B(x)}{2δ^2} \Bigr) δ_{z,x-δ} ,   (4.47)

where W_δ(z\,|\,x) is positive for sufficiently small δ. Under the assumption that this is fulfilled for the entire range of interest for x, the process takes place on a range of x that is composed of integer multiples of δ.^8 In the limit δ → 0 the birth-and-death process is converted into a Fokker-Planck equation with

α_0(x) = \frac{B(x)}{δ^2} ,\quad α_1(x) = A(x) ,\quad α_2(x) = B(x) ,\quad and \quad \lim_{δ\to 0} W_δ(z\,|\,x) = 0 \;for\; z \ne x .   (4.48)

Although α_0(x) diverges with 1/δ^2 – in contrast to (4.44), where we prescribed the required 1/δ behavior – and the picture of jumps converging smoothly into a continuous distribution is no longer valid, there exists a limiting Fokker-Planck equation, because the behavior of α_0(x) is irrelevant:

\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl( A(x)\, P(x,t) \bigr) + \frac{1}{2}\, \frac{\partial^2}{\partial x^2}\bigl( B(x)\, P(x,t) \bigr) .   (4.49)

Equation (4.48) provides a tool for the simulation of a diffusion process by an approximating birth-and-death process. The method, however, fails where B(x) = 0 within the range of x, since then W_δ(z\,|\,x) does not fulfil the criterion of being nonnegative.

^8 We remark that the scaling relations (4.43) and (4.47) are not the same but both lead to a Fokker-Planck equation.

4.3.2 Kramers-Moyal expansion

The derivation starts from the master equation (4.45a) and a substitution of z by the definition

y = x - z   in the first term   and   y = z - x   in the second term .

We also redefine the elements of the transition matrix,

T(y, x) = W(x+y\,|\,x) ,

and the master equation is now of the form

\frac{\partial P(x,t)}{\partial t} = \int \mathrm{d}y\, \bigl( T(y, x-y)\, P(x-y, t) - T(y, x)\, P(x,t) \bigr) ,   (4.45a')

and the integral is readily expanded in a power series

\frac{\partial P(x,t)}{\partial t} = \sum_{n=1}^{\infty} \int \mathrm{d}y\, \frac{(-y)^n}{n!}\, \frac{\partial^n}{\partial x^n} \bigl( T(y,x)\, P(x,t) \bigr) = \sum_{n=1}^{\infty} \frac{(-1)^n}{n!}\, \frac{\partial^n}{\partial x^n} \bigl( α_n(x)\, P(x,t) \bigr) ,   (4.50)

where the n-th derivative moment is defined by

α_n(x) = \int \mathrm{d}z\, (z-x)^n\, W(z\,|\,x) = \int \mathrm{d}y\, y^n\, T(y,x) .   (4.50')

In case the Kramers-Moyal expansion is terminated at the second term the result is a Fokker-Planck equation of the form (4.45b).

4.3.3 Size expansion of the chemical master equation

Although we were able to analyze a number of representative examples by solving the one step birth-and-death master equation exactly, the actual applicability of this technique to specific problems of chemical kinetics is rather limited. In order to apply a chemical master equation to a problem in practice one is commonly dealing with about 10^{12} particles or more. Upscaling discloses one particular problem that is related to size expansion and that becomes virulent in the transition from the master equation to a Fokker-Planck equation. The problem is intimately related to the volume V, which is the best possible estimator of system size. We distinguish two classes of quantities: (i) intensive quantities that are independent of system size, and (ii) extensive quantities that grow proportionally to system size. Examples of intensive properties are temperature, pressure, and density; extensive properties are volume, energy, or entropy. In upscaling from, say, 1000 to 10^{12} particles extensive properties grow by a factor of 10^9 whereas intensive properties remain the same. Some pairs of properties – one extensive and one intensive – are of particular importance, for example particle number \mathcal{N} and concentration c = \mathcal{N}/(V·N_L), or mass M and (volume) density ϱ = M/V.

In order to compensate for the lack of generality, approximation methods were developed which turned out to be particularly useful in the limit of sufficiently large particle numbers [98]. The Dutch theoretical physicist Nicholas van Kampen expands the master equation in the inverse square root of some extensive quantity – particle number, mass or volume – which is characteristic of system size and which will be denoted by Ω. In van Kampen's notation,

a ∝ Ω \ldots extensive variable ,
x = a/Ω \ldots intensive variable ,   (4.51)

the limit of interest is a large value of Ω at fixed x, which is tantamount to the transition to a macroscopic system. The transition probabilities are reformulated as

W(a\,|\,a') = W(a'; ∆a)   with   ∆a = a - a' ,

and scaled according to the assumption

W(a\,|\,a') = Ω\, ψ\Bigl( \frac{a'}{Ω}, ∆a \Bigr) .

The essential trick in the van Kampen expansion is that the size of the jump is expressed in terms of an extensive quantity, ∆a, whereas the intensive variable x is used for the expression of the dependence on the position variable a'. The expansion is made now in the new variable z defined by

a = Ω\, φ(t) + Ω^{1/2}\, z ,

where the function φ(t) is still to be determined. The derivative moments α_n(a) are now proportional to the system size Ω and therefore we can scale them accordingly: α_n(a) = Ω\, \tilde{α}_n(x). In the next step the new variable z is introduced into the Kramers-Moyal expansion (4.50):

\frac{\partial P(z,t)}{\partial t} - Ω^{1/2}\, \frac{\partial φ}{\partial t}\, \frac{\partial P(z,t)}{\partial z} = \sum_{n=1}^{\infty} (-1)^n\, \frac{Ω^{1-n/2}}{n!}\, \frac{\partial^n}{\partial z^n} \Bigl( \tilde{α}_n\bigl(φ(t)+Ω^{-1/2}z\bigr)\, P(z,t) \Bigr) .

For general validity of an expansion all terms of a certain order in the expansion parameter must vanish. We make use of this property to define φ(t) such that the terms of order Ω^{1/2} are eliminated by demanding

\frac{\partial φ}{\partial t} = \tilde{α}_1\bigl(φ(t)\bigr) .   (4.52)

This equation is an ODE determining φ(t) and is, of course, in full agreement with the deterministic equation for the expectation value of the random variable. Accordingly, φ(t) is indeed the deterministic part of the solution. The next step is an expansion of \tilde{α}_n\bigl(φ(t)+Ω^{-1/2}z\bigr) in Ω^{-1/2} and a reordering of terms yielding

\frac{\partial P(z,t)}{\partial t} = \sum_{m=2}^{\infty} \frac{Ω^{-(m-2)/2}}{m!} \sum_{n=1}^{m} (-1)^n \binom{m}{n}\, \tilde{α}_n^{(m-n)}\bigl(φ(t)\bigr)\, \frac{\partial^n}{\partial z^n} \bigl( z^{m-n}\, P(z,t) \bigr) .

In taking the limit of large system size Ω all terms vanish except the one with m = 2 and we find the result

\frac{\partial P(z,t)}{\partial t} = -\tilde{α}_1^{(1)}\bigl(φ(t)\bigr)\, \frac{\partial}{\partial z}\bigl( z\, P(z,t) \bigr) + \frac{1}{2}\, \tilde{α}_2\bigl(φ(t)\bigr)\, \frac{\partial^2}{\partial z^2} P(z,t) ,   (4.53)

where \tilde{α}_1^{(1)} stands for the linear drift term. Since the size expansion is of fundamental importance for an understanding of the relation between microscopic and macroscopic processes, we shall also provide van Kampen's original, slightly different derivation in section 5.2. It is straightforward to compare with the result of the Kramers-Moyal expansion (4.50) terminated after two terms:

\frac{\partial P(a,t)}{\partial t} = -\frac{\partial}{\partial a}\bigl( α_1(a)\, P(a,t) \bigr) + \frac{1}{2}\, \frac{\partial^2}{\partial a^2}\bigl( α_2(a)\, P(a,t) \bigr) .

The change of variables x = a/Ω leads to

\frac{\partial P(x,t)}{\partial t} = -\frac{\partial}{\partial x}\bigl( \tilde{α}_1(x)\, P(x,t) \bigr) + \frac{1}{2Ω}\, \frac{\partial^2}{\partial x^2}\bigl( \tilde{α}_2(x)\, P(x,t) \bigr) .

The application of small noise theory with ε^2 = Ω^{-1} and the substitution z = Ω^{1/2}\bigl( x - φ(t) \bigr) yields the lowest order Fokker-Planck equation, which is exactly the same as the lowest order approximation in the van Kampen expansion. This result has an important consequence: if we are only interested in the lowest order approximation we may use the Kramers-Moyal equation, which is much easier to derive than the van Kampen equation. If only the small noise approximation is valid, then it is appropriate to consider only the linearization of the drift term, and individual solutions of this equation are represented by the trajectories of the stochastic differential equation

\mathrm{d}z = \tilde{α}_1^{(1)}\bigl(φ(t)\bigr)\, z\, \mathrm{d}t + \sqrt{\tilde{α}_2\bigl(φ(t)\bigr)}\, \mathrm{d}W(t) .   (4.54)

Eventually, we have found a procedure to relate master equations, Fokker-Planck equations, and stochastic differential equations approximately, and to close the gap between microscopic stochasticity and macroscopic behavior.

The chemical reaction A ⇌ B as an example. The transition probabilities of the corresponding single step birth-and-death master equation for the interval t_0 → t, with [A]_t = a(t), [A]_{t_0} = a_0, [B] = b a fixed concentration, and the reaction rate parameters k_1 and k_2, are:

W(a\,|\,a_0) = δ_{a,a_0+1}\, k_2 b + δ_{a,a_0-1}\, k_1 a_0 .

Figure 4.8: Comparison of expansions of the master equation. The re- action A B with B buffered, [B] = b = b0, is chosen as example and the exact solution (black) is compared with the results of the Kramers-Moyal expansion (red) and the van Kampen size expansion (blue). Parameter choice: V = 1, k1 = 2, k2 = 1, b = 40.

Now we choose the volume of the system, V, as size parameter and have a = αV and b = βV. This leads to the scaled transition probability

W(α'; ∆a) = V\, \bigl( k_2 β\, δ_{∆a,1} + k_1 α'\, δ_{∆a,-1} \bigr) ,

and the first two derivative moments

α_1 = \sum_{(a')} (a'-a)\, W(a'\,|\,a) = k_2 b - k_1 a = V (k_2 β - k_1 α) ,
α_2 = \sum_{(a')} (a'-a)^2\, W(a'\,|\,a) = k_2 b + k_1 a = V (k_2 β + k_1 α) .

Following the procedure of van Kampen's expansion we define

a = V\, φ(t) + V^{1/2}\, z   (4.55)

and obtain for the deterministic differential equation and its solution:

\frac{\mathrm{d}φ(t)}{\mathrm{d}t} = k_2 β - k_1 φ(t)   and   φ(t) = φ(0)\, e^{-k_1 t} + \frac{k_2 β}{k_1}\bigl(1 - e^{-k_1 t}\bigr) .

The Fokker-Planck equation takes on the form

\frac{\partial P(z)}{\partial t} = k_1\, \frac{\partial}{\partial z}\bigl( z\, P(z) \bigr) + \frac{1}{2}\, \frac{\partial^2}{\partial z^2}\Bigl( \bigl( k_2 β + k_1 φ(t) \bigr) P(z) \Bigr) .

The expectation value of z is readily computed to be ⟨z(t)⟩ = z(0)\, e^{-k_1 t}. Since the partition of the variable a in equation (4.55) is arbitrary we can assume z(0) = 0 – as usual^9 – and find for the variance in z

σ^2\bigl(z(t)\bigr) = \Bigl( \frac{k_2 β}{k_1} + φ(0) \Bigr) \bigl( 1 - e^{-k_1 t} \bigr) ,

and eventually obtain for the solutions in the macroscopic variable a with a(0) = V φ(0)

⟨a(t)⟩ = V φ(t) = a(0)\, e^{-k_1 t} + \frac{k_2 b}{k_1}\bigl( 1 - e^{-k_1 t} \bigr) ,
σ^2\bigl(a(t)\bigr) = V σ^2\bigl(z(t)\bigr) = \Bigl( \frac{k_2 b}{k_1} + a(0) \Bigr) \bigl( 1 - e^{-k_1 t} \bigr) .

^9 The assumption z(0) = 0 implies ⟨z(t)⟩ = 0 and hence the corresponding stochastic variable \mathcal{Z}(t) describes the fluctuations around zero.

Finally, we compare the different stationary state solutions obtained from the van Kampen expansion, with α = k_2 b/k_1,

\bar{P}(z) = \frac{1}{\sqrt{πα/2}\,\bigl( 1 + \mathrm{erf}(\sqrt{α/2}) \bigr)}\, \exp\Bigl( -\frac{(z-α)^2}{2α} \Bigr) ,

with those derived from the Kramers-Moyal expansion,

\bar{P}(a) = \mathcal{N}\, (k_2 b + k_1 a)^{-1+4 k_2 b/k_1}\, e^{-2a} ,

and the exact solution

\bar{P}(a) = \frac{(k_2 b/k_1)^a\, \exp(-k_2 b/k_1)}{a!} = \frac{α^a\, e^{-α}}{a!} ,

which is a Poissonian. A comparison of numerical plots is shown in figure 4.8.
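The comparison shown in figure 4.8 is easily reproduced numerically. The sketch below evaluates the three stationary densities on a common grid of particle numbers for the parameters quoted in the figure caption; the numerical normalization of the Kramers-Moyal expression is our own convenience choice.

```python
import numpy as np
from scipy.special import gammaln, erf

k1, k2, b, V = 2.0, 1.0, 40.0, 1.0          # parameters of figure 4.8
alpha = k2 * b / k1                          # stationary value of a

a = np.arange(0, 61)                         # grid of particle numbers

# exact stationary solution: Poissonian with mean alpha
p_exact = np.exp(a * np.log(alpha) - alpha - gammaln(a + 1))

# Kramers-Moyal stationary solution, normalized numerically
p_km = (k2 * b + k1 * a) ** (-1.0 + 4.0 * k2 * b / k1) * np.exp(-2.0 * a)
p_km /= p_km.sum()

# van Kampen (size expansion) result: Gaussian with mean and variance alpha, truncated at a >= 0
p_vk = np.exp(-(a - alpha) ** 2 / (2 * alpha)) / (np.sqrt(np.pi * alpha / 2) * (1 + erf(np.sqrt(alpha / 2))))

print(np.vstack([a, p_exact, p_km, p_vk]).T[15:25])   # compare the three densities around the mean
```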

4.4 Numerical simulation of master equations

In this section we introduce a model for computer simulation of stochastic chemical kinetics that has been developed and put upon a solid basis by the American physicist and mathematical chemist Daniel Gillespie [1–4]. Considered is a population of N molecular species, \{S_1, S_2, \ldots, S_N\}, in the gas phase, which interact through M elementary chemical reactions (R_1, R_2, \ldots, R_M).^{10} Two conditions are assumed to be fulfilled by the system: (i) the container with constant volume V, in the sense of a flow reactor (CSTR in figure 4.2), is assumed to be well mixed by efficient stirring, and (ii) the system is assumed to be in thermal equilibrium at constant temperature T. The goals of the simulation are the computation of the time course of the stochastic variables – \mathcal{X}_k(t) being the number of molecules of species S_k at time t – and the description of the evolution of the population. A single computation yields a single trajectory, very much in the sense of a single solution of a stochastic differential equation (figure 3.9), and observable results are commonly derived through sampling of trajectories. Exceptions are single molecule techniques, which allow for experimental observation of single events including whole trajectories of biopolymer folding and unfolding (see, for example, [28,30,106,107]).

4.4.1 Definitions and conditions

For a reaction mechanism involving N species in M reactions the entire population is characterized by an N-dimensional random vector counting numbers of molecules for the various species Sk,

\vec{\mathcal{X}}(t) = \bigl( \mathcal{X}_1(t), \mathcal{X}_2(t), \ldots, \mathcal{X}_N(t) \bigr) .   (4.56)

The common variables in chemistry are concentrations rather than particle numbers:

x = \bigl( x_1(t), x_2(t), \ldots, x_N(t) \bigr)   with   x_k = \frac{\mathcal{X}_k}{V\cdot N_L} ,   (4.57)

where the volume V is the appropriate expansion parameter Ω for the system size (subsection 4.3.3).^{11} The following derivation of the chemical master equation [3, pp. 407-417] focuses on reaction channels R_µ of bimolecular nature,

S_a + S_b \longrightarrow S_c + \ldots ,   (4.58)

like (4.1f, 4.1i, 4.1j and 4.1k) shown in the list (4.1). An extension to monomolecular and trimolecular reaction channels is straightforward, and zero-molecular processes like the influx of material into the reactor in the elementary step (4.1a) provide no major problems. Reversible reactions, for example (4.19), are handled as two elementary steps, A + B → C + D and C + D → A + B. In equation (4.58) we distinguish between reactant species, A and B, and product species, C \ldots, of a reaction R_µ.

^{10} Elementary steps of chemical reactions are defined and discussed in subsection 4.1.1.

The two stipulations, (i) perfect mixing and (ii) thermal equilibrium, can now be cast into precise physical meanings. Premise (i) requires that the probability of finding the center of an arbitrarily chosen molecule inside a container subregion with a volume ∆V is equal to ∆V/V. The system is spatially homogeneous on macroscopic scales but it allows for random fluctuations from homogeneity. Formally, requirement (i) asserts that the position of a randomly selected molecule is described by a random variable which is uniformly distributed over the interior of the container. Premise (ii) implies that the probability that the velocity of a randomly chosen molecule of mass m will be found to lie within an infinitesimal region \mathrm{d}^3v around the velocity v is equal to

P_{\rm MB} = \Bigl( \frac{m}{2π k_B T} \Bigr)^{3/2} e^{-m v^2/(2 k_B T)}\, \mathrm{d}^3v .

Here, the velocity vector is denoted by v = (v_x, v_y, v_z) in Cartesian coordinates, the infinitesimal volume element fulfils \mathrm{d}^3v = \mathrm{d}v_x\, \mathrm{d}v_y\, \mathrm{d}v_z, the square of the velocity is v^2 = v_x^2 + v_y^2 + v_z^2, and k_B is Boltzmann's constant. Premise (ii) asserts that the velocities of molecules follow a Maxwell-Boltzmann distribution, or formally it states that each Cartesian velocity component of a randomly selected molecule of mass m is represented by a random variable,

^{11} In order to distinguish random and deterministic variables, stochastic concentrations are indicated by upright fonts.

Figure 4.9: Sketch of a molecular collision in dilute gases. A spherical molecule S_a with radius r_a moves with a velocity v = v_b - v_a relative to a spherical molecule S_b with radius r_b. If the two molecules are to collide within the next infinitesimal time interval dt, the center of S_b has to lie inside a cylinder of radius r = r_a + r_b and height v\,dt. The upper and lower surfaces of the cylinder are deformed into identically oriented hemispheres of radius r and therefore the volume of the deformed cylinder is identical with that of the non-deformed one.

which is normally distributed with mean 0 and variance k_B T/m. Implicitly, the two stipulations assert that the molecular position and velocity components are all statistically independent of each other. For practical purposes, we expect premises (i) and (ii) to be valid for any dilute gas system at constant temperature in which nonreactive molecular collisions occur much more frequently than reactive molecular collisions.

4.4.2 The probabilistic rate parameter

In order to derive a chemical master equation for the population variables

\mathcal{X}_k(t) we need some properties of the probability π_µ(t, \mathrm{d}t), µ = 1, \ldots, M, that a randomly selected combination of the reactant molecules for reaction R_µ at time t will react to yield products within the next infinitesimal time interval [t, t+\mathrm{d}t[. With the assumptions made in the previous subsection 4.4.1 virtually all chemical reaction channels fulfil the condition

π_µ(t, \mathrm{d}t) = γ_µ\, \mathrm{d}t ,   (4.59)

where the specific probability rate parameter γµ is independent of dt. First, we calculate the rate parameter for a general bimolecular reaction by means of classical collision theory and then extend briefly to mono- and trimolec- ular reactions. Apart from the quantum mechanical approach the theory of collisions in dilute gases is the best developed microscopic model for chemi- cal reactions and well suited for a rigorous derivation of the master equation from molecular motion and events.

4.4.2.1 Bimolecular reactions

The occurrence of a reaction A + B has to be preceded by a collision of an S_a molecule with an S_b molecule, and first we shall calculate the probability of such a collision in the reaction volume V. For simplicity molecular species are regarded as spheres with specific masses and radii, for example m_a and r_a for S_a, and m_b and r_b for S_b, respectively. A collision occurs whenever the center-to-center distance of the two molecules, R_{AB}, decreases to (R_{AB})_{\min} = r_a + r_b. Next we define the probability that a randomly selected pair of R_µ reactant molecules at time t will collide within the next infinitesimal time interval [t, t+\mathrm{d}t[ by π^*_µ(t, \mathrm{d}t) and calculate it from the Maxwell-Boltzmann distribution of molecular velocities according to figure 4.9.

The probability that a randomly selected pair of R_µ reactant molecules – one molecule S_a and one molecule S_b – has a relative velocity v = v_b - v_a lying in an infinitesimal volume element \mathrm{d}^3v about v at time t is denoted by P(v(t), R_µ) and can be readily obtained from the kinetic theory of gases:

P(v(t), R_µ) = \Bigl( \frac{µ}{2π k_B T} \Bigr)^{3/2} \exp\bigl( -µ v^2/(2 k_B T) \bigr)\, \mathrm{d}^3v .

Herein v = |v| = \sqrt{v_x^2+v_y^2+v_z^2} is the value of the relative velocity and µ = m_a m_b/(m_a+m_b) is the reduced mass of the two R_µ molecules. Two properties of the probabilities P(v(t), R_µ) for different velocities v are important:

(i) the elements of the set of all velocity combinations, \{E_{v(t),R_µ}\}, are mutually exclusive, and
(ii) they are collectively exhaustive since v is varied over the entire three-dimensional velocity space.

Now we relate the probability P(v(t), R_µ) to a collision event E_{\rm col} by calculating the conditional probability P(E_{\rm col}(t,\mathrm{d}t)\,|\,E_{v(t),R_µ}). In figure 4.9 we sketch the geometry of the collision event between two randomly selected spherical molecules S_a and S_b that is assumed to occur within an infinitesimal time interval \mathrm{d}t:^{12} a randomly selected molecule S_a moves along the vector v of the relative velocity v_b - v_a with respect to an also randomly selected molecule S_b. A collision between the molecules will take place in the interval [t, t+\mathrm{d}t[ if and only if the center of molecule S_b is inside the spherically distorted cylinder (figure 4.9) at time t. Thus P(E_{\rm col}(t,\mathrm{d}t)\,|\,E_{v(t),R_µ}) is the probability that the center of a randomly selected S_b molecule moving with velocity v(t) relative to the randomly selected S_a molecule is situated at time t within a certain subregion of V, which has the volume V_{\rm col} = v\cdot\mathrm{d}t\cdot π(r_a+r_b)^2, and by scaling with the total volume V we obtain:^{13}

P\bigl( E_{\rm col}(t,\mathrm{d}t)\,|\,E_{v(t),R_µ} \bigr) = \frac{ v(t)\,\mathrm{d}t\cdot π (r_a+r_b)^2 }{ V } .   (4.60)

By substitution and integration over the entire velocity space we can calculate the desired probability

π^*_µ(t,\mathrm{d}t) = \iiint_v \Bigl( \frac{µ}{2π k_B T} \Bigr)^{3/2} e^{-µ v^2/(2k_B T)}\, \frac{ v(t)\,\mathrm{d}t\cdot π (r_a+r_b)^2 }{ V }\, \mathrm{d}^3v .

Evaluation of the integral is straightforward and yields

π^*_µ(t,\mathrm{d}t) = \Bigl( \frac{8π k_B T}{µ} \Bigr)^{1/2} \frac{(r_a+r_b)^2}{V}\, \mathrm{d}t .   (4.61)

The expression contains the macroscopic quantities volume V and temperature T as well as the molecular parameters, the radii r_a and r_b and the reduced mass µ.

^{12} The absolute time t comes into play because the positions of the molecules, r_a and r_b, and their velocities, v_a and v_b, depend on t.
^{13} Implicitly in the derivation we made use of the infinitesimally small size of \mathrm{d}t. Only if the distance v\,\mathrm{d}t is vanishingly small can the possibility of collisional interference by a third molecule be neglected.

A collision is a necessary but not a sufficient condition for a reaction to take place, and therefore we introduce a collision-conditioned reaction probability p_µ, which is the probability that a randomly selected pair of colliding R_µ reactant molecules will indeed react according to R_µ. By multiplication of independent probabilities we have

π_µ(t, \mathrm{d}t) = p_µ\, π^*_µ(t, \mathrm{d}t) ,

and with respect to equation (4.59) we find

γ_µ = p_µ\, \Bigl( \frac{8π k_B T}{µ} \Bigr)^{1/2} \frac{(r_a+r_b)^2}{V} .   (4.62)

As said before, it is crucial for the forthcoming analysis that γ_µ is independent of \mathrm{d}t, and this will be the case if and only if p_µ does not depend on \mathrm{d}t. This is highly plausible for the definition given above, and an illustrative check through the detailed examination of bimolecular reactions can be found in [3, pp.413-417]. It has to be remarked, however, that the application of classical collision theory to the molecular details of chemical reactions can be an illustrative and useful heuristic at best, because the molecular domain falls into the realm of quantum phenomena, and any theory that aims at a derivation of reaction probabilities from first principles has to be built upon a quantum mechanical basis.
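As a numerical illustration of equation (4.62), the following sketch evaluates γ_µ for a pair of hypothetical spherical molecules; the reaction probability p_µ, the radii, masses, temperature, and volume are invented illustrative values, not data from the text.

```python
import numpy as np

kB = 1.380649e-23                    # Boltzmann's constant in J/K

def gamma_mu(p_mu, r_a, r_b, m_a, m_b, T, V):
    """Specific probability rate parameter of a bimolecular channel, equation (4.62)."""
    mu = m_a * m_b / (m_a + m_b)                       # reduced mass
    return p_mu * np.sqrt(8 * np.pi * kB * T / mu) * (r_a + r_b) ** 2 / V

# illustrative numbers: two small molecules in a (1 micrometre)^3 volume at 300 K
g = gamma_mu(p_mu=1e-3, r_a=2e-10, r_b=2e-10,
             m_a=5e-26, m_b=5e-26, T=300.0, V=1e-18)
print(f"gamma_mu = {g:.3e} per second (per reactant pair)")
```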

4.4.2.2 Monomolecular, trimolecular, and other reactions

A monomolecular reaction is of the form A → C and describes the spontaneous conversion

S_a \longrightarrow S_c .   (4.63)

One molecule S_a is converted into one molecule S_c. This reaction is different from a catalyzed conversion

S_a + S_b \longrightarrow S_c + S_b ,   (4.58')

where the conversion A → C is initiated by a collision of an A molecule with a B molecule,^{14} and a description as an ordinary bimolecular process is straightforward. The true monomolecular conversion (4.63) is driven by some quantum mechanical mechanism, similar to the radioactive decay of a nucleus. Time-dependent perturbation theory in quantum mechanics [108, pp.724-739] shows that almost all weakly perturbed energy-conserving transitions have linear probabilities of occurrence in time intervals δt, when δt is microscopically large but macroscopically small. Therefore, to a good approximation the probability for a radioactive nucleus to decay within the next infinitesimal time interval \mathrm{d}t is of the form α\,\mathrm{d}t, where α is some time-independent constant. On the basis of this analogy we may expect π_µ(t,\mathrm{d}t), the probability for a monomolecular conversion, to be approximately of the form γ_µ\,\mathrm{d}t with γ_µ being independent of \mathrm{d}t.

Trimolecular reactions of the form

S_a + S_b + S_c \longrightarrow S_d + \ldots   (4.64)

should not be considered because collisions of three particles occur only with a probability of measure zero. There may be, however, special situations where the approximation of complicated processes by trimolecular events is justified. One example is a set of three coupled reactions with four reactant molecules [109, pp.359-361] where it was shown that π_µ(t,\mathrm{d}t) is essentially linear in \mathrm{d}t.

The last class of reactions to be considered here is no proper chemical reaction but an influx of material into the reactor. It is often denoted as the zeroth order reaction (4.1a):

* \longrightarrow S_a .   (4.65)

Here, the definition of the influx and the efficient mixing or homogeneity condition is helpful, because it guarantees that the number of molecules entering the homogeneous system is a constant and does not depend on \mathrm{d}t.

^{14} Remembering what has been said in subsection 3.6.2, the two reactions are related by rigorous thermodynamics: whenever the catalyzed reaction is described, an incorporation of the uncatalyzed process in the reaction mechanism is required.

4.4.3 Simulation of chemical master equations

So far we have succeeded in deriving the fundamental fact that for each elementary reaction channel R_µ with µ = 1, \ldots, M, which is accessible to the molecules of a well-mixed and thermally equilibrated system in the gas phase, there exists a scalar quantity γ_µ, which is independent of \mathrm{d}t, such that [3, p.418]

γ_µ\,\mathrm{d}t = probability that a randomly selected combination of R_µ reactant molecules at time t will react accordingly in the next infinitesimal time interval [t, t+\mathrm{d}t[ .   (4.66)

The specific probability rate constant γ_µ is one of three quantities that are required to fully characterize a particular reaction channel R_µ. In addition we shall require a function h_µ(n), where the vector n = (n_1, \ldots, n_N)' contains the exact numbers of all molecules at time t, \vec{\mathcal{N}}(t) = (\mathcal{N}_1(t), \ldots, \mathcal{N}_N(t))' = n(t),

h_µ(n) ≡ the number of distinct combinations of R_µ reactant molecules in the system when the numbers of molecules S_k are exactly n_k with k = 1, \ldots, N ,   (4.67)

and an N × M matrix of integers S = \{ν_{kµ};\; k = 1, \ldots, N,\; µ = 1, \ldots, M\}, where

ν_{kµ} ≡ the change in the S_k molecular population caused by the occurrence of one R_µ reaction.   (4.68)

The functions h_µ(n) and the matrix S are readily deduced by inspection of the algebraic structure of the reaction channels. We illustrate by means of an example:

R_1: S_1 + S_2 \longrightarrow S_3 + S_4 ,
R_2: 2 S_1 \longrightarrow S_1 + S_5 , and   (4.69)
R_3: S_3 \longrightarrow S_5 .

The functions h_µ(n) are obtained by simple combinatorics,

h_1(n) = n_1 n_2 ,
h_2(n) = n_1(n_1-1)/2 , and
h_3(n) = n_3 ,

and the matrix S is of the form

S = \begin{pmatrix} -1 & -1 & 0 \\ -1 & 0 & 0 \\ +1 & 0 & -1 \\ +1 & 0 & 0 \\ 0 & +1 & +1 \end{pmatrix} .

It is worth noticing that the functional form of h_µ is determined exclusively by the reactant side of R_µ. In particular, it has precisely the same form as determined by mass action in the deterministic kinetic equations, with the exception that the particle numbers have to be counted exactly in small systems, n(n-1) instead of n^2, for example. The stoichiometric matrix S, in contrast, counts the net production of molecular species per elementary reaction event: ν_{kµ} is the number of molecules S_k produced by reaction R_µ; these numbers are integers and negative values indicate the number of molecules which have disappeared during one reaction. In the forthcoming analysis we shall make use of vectors corresponding to individual reactions R_µ: ν_µ = (ν_{1µ}, \ldots, ν_{Nµ})'.
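For the example mechanism (4.69) the combinatorial functions and the stoichiometric matrix can be written down directly in code; the following sketch (particle numbers are arbitrary illustrative values) mirrors the expressions above.

```python
import numpy as np

# Combinatorial functions h_mu(n) for the three channels of mechanism (4.69)
def h(n):
    n1, n2, n3, n4, n5 = n
    return np.array([n1 * n2,                 # R1: S1 + S2 -> S3 + S4
                     n1 * (n1 - 1) // 2,      # R2: 2 S1   -> S1 + S5
                     n3])                     # R3: S3     -> S5

# Stoichiometric matrix S (rows: S1..S5, columns: R1..R3), net production per reaction event
S = np.array([[-1, -1,  0],
              [-1,  0,  0],
              [+1,  0, -1],
              [+1,  0,  0],
              [ 0, +1, +1]])

n = np.array([5, 3, 2, 0, 0])                 # illustrative particle numbers
print("h(n) =", h(n))                         # -> [15, 10, 2]
print("state after one R1 event:", n + S[:, 0])
```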

Analogy to deterministic kinetics. It is illustrative to consider now the analogy to conventional chemical kinetics. If we denote the concentration vector of our molecular species S_k by x = (x_1, \ldots, x_N)' and the flux vector by ϕ = (ϕ_1, \ldots, ϕ_M)', the kinetic equation can be expressed as

\frac{\mathrm{d}x}{\mathrm{d}t} = S \cdot ϕ .   (4.70)

The individual elements of the flux vector in mass action kinetics are

ϕ_µ = k_µ \prod_{k=1}^{N} x_k^{\,g_{kµ}}   for   g_{1µ} S_1 + g_{2µ} S_2 + \ldots + g_{Nµ} S_N \longrightarrow products ,

wherein the factors g_{kµ} are the stoichiometric coefficients on the reactant side of the reaction equations. It is sometimes useful to define analogous factors q_{kµ} for the product side; both classes of factors can be summarized in matrices G and Q, and then the stoichiometric matrix is simply given by the difference S = Q - G. We illustrate by means of the model mechanism (4.69) in our example:

Q - G = \begin{pmatrix} 0 & +1 & 0 \\ 0 & 0 & 0 \\ +1 & 0 & 0 \\ +1 & 0 & 0 \\ 0 & +1 & +1 \end{pmatrix} - \begin{pmatrix} +1 & +2 & 0 \\ +1 & 0 & 0 \\ 0 & 0 & +1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix} = \begin{pmatrix} -1 & -1 & 0 \\ -1 & 0 & 0 \\ +1 & 0 & -1 \\ +1 & 0 & 0 \\ 0 & +1 & +1 \end{pmatrix} = S .

We remark that the entries of G and Q are nonnegative integers by definition. The flux ϕ has the same structure as in the stochastic approach: γ_µ corresponds to the kinetic rate parameter or rate constant k_µ, and the combinatorial function h_µ and the mass action product are identical apart from the simplifications for large particle numbers.
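The deterministic counterpart (4.70) is equally compact in code. The sketch below integrates the mass-action kinetic equation dx/dt = S·ϕ for mechanism (4.69); rate constants and initial concentrations are illustrative, and the use of scipy's solve_ivp is our own choice.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Reactant-order matrix G and stoichiometric matrix S = Q - G of mechanism (4.69)
G = np.array([[1, 2, 0],
              [1, 0, 0],
              [0, 0, 1],
              [0, 0, 0],
              [0, 0, 0]])
S = np.array([[-1, -1, 0], [-1, 0, 0], [1, 0, -1], [1, 0, 0], [0, 1, 1]])
k = np.array([1.0, 0.5, 0.2])                 # illustrative rate constants k_mu

def mass_action(t, x):
    """Kinetic equation (4.70): dx/dt = S . phi(x) with phi_mu = k_mu * prod_k x_k^g_kmu."""
    phi = k * np.prod(x[:, None] ** G, axis=0)
    return S @ phi

x0 = np.array([1.0, 0.8, 0.0, 0.0, 0.0])      # illustrative initial concentrations of S1..S5
sol = solve_ivp(mass_action, (0.0, 10.0), x0, rtol=1e-8)
print(sol.y[:, -1])                            # concentrations at t = 10
```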

Occurrence of reactions. The probability of occurrence of reaction events within an infinitesimal time interval dt is cast into three theorems:

Theorem 1. If \vec{\mathcal{X}}(t) = n, then the probability that exactly one R_µ reaction will occur in the system within the time interval [t, t+\mathrm{d}t[ is equal to

γ_µ\, h_µ(n)\,\mathrm{d}t + o(\mathrm{d}t) ,

where o(\mathrm{d}t) denotes terms that approach zero with \mathrm{d}t faster than \mathrm{d}t.

Theorem 2. If \vec{\mathcal{X}}(t) = n, then the probability that no reaction will occur within the time interval [t, t+\mathrm{d}t[ is equal to

1 - \sum_{µ} γ_µ\, h_µ(n)\,\mathrm{d}t + o(\mathrm{d}t) .

Theorem 3. The probability of more than one reaction occurring in the system within the time interval [t, t+\mathrm{d}t[ is of order o(\mathrm{d}t).

Proofs for all three theorems are found in [3, pp.420,421]. Stochastic Kinetics 229

Based on the three theorems an analytical description of the evolution of the population vector \vec{\mathcal{X}}(t) can be given. The initial state of the system at some initial time t_0 is fixed: \vec{\mathcal{X}}(t_0) = n_0. Although there is no chance to derive a deterministic equation for the time evolution of \vec{\mathcal{X}}(t), a deterministic equation for the time evolution of the probability function P(n, t\,|\,n_0, t_0) for t ≥ t_0 will be obtained. We express the probability P(n, t+\mathrm{d}t\,|\,n_0, t_0) as the sum of the probabilities of several mutually exclusive and collectively exhaustive routes from \vec{\mathcal{X}}(t_0) = n_0 to \vec{\mathcal{X}}(t+\mathrm{d}t) = n. These routes are distinguished from one another with respect to the event that happened in the time interval [t, t+\mathrm{d}t[:

P(n, t+\mathrm{d}t\,|\,n_0, t_0) = P(n, t\,|\,n_0, t_0) \times \Bigl( 1 - \sum_{µ=1}^{M} γ_µ h_µ(n)\,\mathrm{d}t + o(\mathrm{d}t) \Bigr)
 + \sum_{µ=1}^{M} P(n-ν_µ, t\,|\,n_0, t_0) \times \bigl( γ_µ h_µ(n-ν_µ)\,\mathrm{d}t + o(\mathrm{d}t) \bigr)
 + o(\mathrm{d}t) .   (4.71)

The different routes from \vec{\mathcal{X}}(t_0) = n_0 to \vec{\mathcal{X}}(t+\mathrm{d}t) = n are obvious from the balance equation (4.71):

(i) One route is given by the first term on the right-hand side of the equation: no reaction occurs in the time interval [t, t+\mathrm{d}t[ and hence \vec{\mathcal{X}}(t) = n was already fulfilled at time t. The joint probability for route (i) is therefore the probability to be in \vec{\mathcal{X}}(t) = n conditioned by \vec{\mathcal{X}}(t_0) = n_0, times the probability that no reaction occurs in [t, t+\mathrm{d}t[. In other words, the probability for this route is the probability to go from n_0 at time t_0 to n at time t and to stay in this state during the next interval \mathrm{d}t.

(ii) An alternative route is accounted for by one particular term in the sum on the right-hand side of the equation: an R_µ reaction occurs in the time interval [t, t+\mathrm{d}t[ and hence \vec{\mathcal{X}}(t) = n - ν_µ was fulfilled at time t. The joint probability for route (ii) is therefore the probability to be in \vec{\mathcal{X}}(t) = n - ν_µ conditioned by \vec{\mathcal{X}}(t_0) = n_0, times the probability that exactly one R_µ reaction occurs in [t, t+\mathrm{d}t[. In other words, the probability for this route is the probability to go from n_0 at time t_0 to n - ν_µ at time t and to undergo an R_µ reaction during the next interval \mathrm{d}t. Obviously, the same consideration is valid for every elementary reaction and we have M terms of this kind.

(iii) A third possibility – neither no reaction nor exactly one reaction chosen from the set \{R_µ;\; µ = 1, \ldots, M\} – must inevitably invoke more than one reaction within the time interval [t, t+\mathrm{d}t[. The probability for such events, however, is o(\mathrm{d}t) or of measure zero by theorem 3.

All routes (i) and (ii) are mutually exclusive since different events take place within the last interval [t, t+\mathrm{d}t[. The last step in the derivation of the chemical master equation is straightforward: P(n, t\,|\,n_0, t_0) is subtracted from both sides of equation (4.71), then both sides are divided by \mathrm{d}t, the limit \mathrm{d}t \downarrow 0 is taken, all o(\mathrm{d}t) terms vanish, and finally we obtain

\frac{\partial}{\partial t} P(n, t\,|\,n_0, t_0) = \sum_{µ=1}^{M} \Bigl( γ_µ\, h_µ(n-ν_µ)\, P(n-ν_µ, t\,|\,n_0, t_0) - γ_µ\, h_µ(n)\, P(n, t\,|\,n_0, t_0) \Bigr) .   (4.72)

Initial conditions are required to calculate the time evolution of the probability P(n, t\,|\,n_0, t_0) and we can easily express them in the form

P(n, t_0\,|\,n_0, t_0) = 1 if n = n_0 , and 0 if n ≠ n_0 ,   (4.72')

which is precisely the initial condition used in the derivation of equation (4.71). Any sharp probability distribution P(n_k, t_0\,|\,n_k^{(0)}, t_0) = δ(n_k - n_k^{(0)}) for the molecular particle numbers at t_0 is admitted. The assumption of extended initial distributions is, of course, also possible but the corresponding master equation becomes more sophisticated.

4.4.4 The simulation algorithm

The chemical master equation (4.72) as derived in the last subsection 4.4.3 is closely related to a stochastic simulation algorithm for chemical reactions [1, 2, 4], and it is important to realize how the simulation tool fits into the general theoretical framework of the chemical master equation. The algorithm is not based on the probability function P(n, t\,|\,n_0, t_0) but on another, related probability density p(τ, µ\,|\,n, t), which expresses the probability that, given \vec{\mathcal{X}}(t) = n, the next reaction in the system will occur in the infinitesimal time interval [t+τ, t+τ+\mathrm{d}τ[ and will be an R_µ reaction.

Figure 4.10: Partitioning of the time interval [t,t+τ +dτ[. The entire interval is subdivided into (k + 1) nonoverlapping subintervals. The first k intervals are of equal size ε = τ/k and the (k + 1)-th interval is of length dτ.

Considering the theory of random variables, p(τ, µ\,|\,n, t) is the joint density function of two random variables: (i) the time to the next reaction, τ, and (ii) the index of the next reaction, µ. The possible values of the two random variables are given by the domain of the real variable 0 ≤ τ < ∞ and the integer variable 1 ≤ µ ≤ M. In order to derive an explicit formula for the probability density p(τ, µ\,|\,n, t) we introduce the quantity

a(n) = \sum_{µ=1}^{M} γ_µ\, h_µ(n)

and consider the time interval [t, t+τ+\mathrm{d}τ[ to be partitioned into k+1 subintervals, k > 1. The first k of these intervals are chosen to be of equal length ε = τ/k, and together they cover the interval [t, t+τ[, leaving the interval [t+τ, t+τ+\mathrm{d}τ[ as the remaining (k+1)-th part (figure 4.10). With \vec{\mathcal{X}}(t) = n the probability p(τ, µ\,|\,n, t) describes the event: no reaction occurring in any of the k ε-size subintervals and exactly one R_µ reaction in the final infinitesimal \mathrm{d}τ interval. Making use of theorems 1 and 2 and the multiplication law of probabilities we find

p(τ, µ\,|\,n, t)\,\mathrm{d}τ = \bigl( 1 - a(n)\,ε + o(ε) \bigr)^{k}\, \bigl( γ_µ\, h_µ(n)\,\mathrm{d}τ + o(\mathrm{d}τ) \bigr) .

Dividing both sides by \mathrm{d}τ and taking the limit \mathrm{d}τ \downarrow 0 yields

p(τ, µ\,|\,n, t) = \bigl( 1 - a(n)\,ε + o(ε) \bigr)^{k}\, γ_µ\, h_µ(n) .

This equation is valid for any integer k > 1 and hence its validity is also guaranteed for k → ∞. Next we rewrite the first factor on the right-hand side of the equation,

\bigl( 1 - a(n)\,ε + o(ε) \bigr)^{k} = \Bigl( 1 - \frac{a(n)\,kε + k\,o(ε)}{k} \Bigr)^{k} = \Bigl( 1 - \frac{a(n)\,τ + τ\,o(ε)/ε}{k} \Bigr)^{k} ,

and take now the limit k → ∞, whereby we make use of the simultaneously occurring convergence o(ε)/ε \downarrow 0:

\lim_{k\to\infty} \bigl( 1 - a(n)\,ε + o(ε) \bigr)^{k} = \lim_{k\to\infty} \Bigl( 1 - \frac{a(n)\,τ}{k} \Bigr)^{k} = e^{-a(n)\,τ} .

By substituting this result into the initial equation for the probability density of the occurrence of a reaction we find

p(τ, µ\,|\,n, t) = a(n)\, e^{-a(n)\,τ}\, \frac{γ_µ\, h_µ(n)}{a(n)} = γ_µ\, h_µ(n)\, e^{-\sum_{ν=1}^{M} γ_ν h_ν(n)\,τ} .   (4.73)

Equation (4.73) provides the mathematical basis for the stochastic simulation algorithm. Given \vec{\mathcal{X}}(t) = n, the probability density consists of two independent factors, where the first factor describes the time to the next reaction and the second factor the index of the next reaction. These factors correspond to two statistically independent random variables r_1 and r_2.

4.4.5 Implementation of the simulation algorithm

Equation (4.73) is now implemented for computer simulation, and we inspect the probability densities of the two unit-interval uniform random variables r_1 and r_2 in order to find the conditions to be imposed on a statistically exact sample pair (τ, µ): the time τ to the next reaction has an exponential density function with the decay constant a(n),

τ = \frac{1}{a(n)}\, \ln\Bigl( \frac{1}{r_1} \Bigr) ,   (4.74a)

and µ is taken to be the smallest integer m which fulfils

µ = \inf\Bigl\{ m \;\Big|\; \sum_{ν=1}^{m} c_ν\, h_ν(n) > a(n)\, r_2 \Bigr\} .   (4.74b)

After the values for τ and µ have been determined accordingly, the advancement of the state vector \vec{\mathcal{X}}(t) of the system takes place:

\vec{\mathcal{X}}(t) = n \;\longrightarrow\; \vec{\mathcal{X}}(t+τ) = n + ν_µ .

Repeated application of the advancement procedure is the essence of the stochastic simulation algorithm. It is important to realize that this advancement procedure is exact as far as r_1 and r_2 are obtained by fair samplings from a unit-interval uniform random number generator; in other words, the correctness of the procedure depends on the quality of the random number generator applied. Two further issues are important: (i) the algorithm operates with an internal time control that corresponds to the real time of the chemical process, and (ii) contrary to the situation in differential equation solvers, the discrete time steps are not finite-interval approximations of an infinitesimal time step; instead, the population vector \vec{\mathcal{X}}(t) maintains the value \vec{\mathcal{X}}(t) = n throughout the entire finite time interval [t, t+τ[ and then changes abruptly to \vec{\mathcal{X}}(t+τ) = n + ν_µ at the instant t+τ when the R_µ reaction occurs. In other words, there is no blind interval during which the algorithm is unable to record changes.
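Equations (4.74a,b) translate directly into a few lines of code. The sketch below is an illustrative Python fragment (the function name and the use of numpy are our choices, not part of the text) that draws one statistically exact pair (τ, µ) and advances the population vector.

```python
import numpy as np

rng = np.random.default_rng()

def advance(n, c, h, S):
    """One advancement step: sample (tau, mu) via (4.74a,b) and update the population vector n."""
    rates = c * h(n)                       # c_mu * h_mu(n) for all reaction channels
    a = rates.sum()
    if a == 0.0:                           # no reactant combinations left
        return None, n
    r1, r2 = rng.random(2)
    tau = np.log(1.0 / r1) / a             # exponential waiting time, equation (4.74a)
    mu = np.searchsorted(np.cumsum(rates), a * r2, side='right')   # smallest m with cumulative sum > a r2
    return tau, n + S[:, mu]               # advance the population by the stoichiometric vector nu_mu
```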

Table 4.1: The combinatorial functions h_µ(n) for elementary reactions. Reactions are ordered with respect to reaction order, which in case of mass action is identical to the molecularity of the reaction. Order zero implies that no reactant molecule is involved and the products come from an external source, for example from the influx in a flow reactor. Orders 1, 2, and 3 mean that one, two, or three molecules are involved in the elementary step, respectively.

No.   Reaction                    Order   h_µ(n)
1     ∗ → products                0       1
2     A → products                1       n_A
3     A + B → products            2       n_A n_B
4     2 A → products              2       n_A(n_A−1)/2
5     A + B + C → products        3       n_A n_B n_C
6     2 A + B → products          3       n_A(n_A−1) n_B/2
7     3 A → products              3       n_A(n_A−1)(n_A−2)/6

Structure of the algorithm. The time evolution of the population in de- scribed by the vector ~ (t) = n(t), which is updated after every individual X reaction event. Reactions are chosen from the set = Rµ; µ =1,...,M , R { } which is defined by the reaction mechanism under consideration. They are classified according to the criteria listed in table 4.1. The reaction proba- bilities corresponding to the reaction rates of deterministic kinetics are con- tained in a vector a(n) = c1h1(n),...,cM hM (n) 0, which is also updated after every individual reaction event. Updating is performed according to the stoichiometric vectors νµ of the individual reactions Rµ, which represent columns of the stoichiometric matrix S. We repeat that the combinato- rial functions hµ(n) are determined exclusively by the reactant side of the reaction equation whereas the stoichiometric vectors νµ represent the net production, (products) (reactants). − Stochastic Kinetics 235

The algorithm comprises five steps:

(i) Step 0. Initialization: The time variable is set to t = 0, the initial values of all N variables ,..., for the species – for species X1 XN Xk Sk – are stored, the values for the M parameters of the reactions Rµ,

c1,...,cM , are stored, and the combinatorial expressions are incorpo- rated as factors for the calculation of the reaction rate vector a(n) according to table 4.1 and the probability density P (τ, µ). Sampling times, t < t < and the stopping time t are specified, the first 1 2 ··· stop sampling time is set to t1 and stored and the pseudorandom number generator is initialized by means of seeds or at random.

(ii) Step 1. Monte Carlo step: A pair of random numbers is created (τ, µ) by the random number generator according to the joint probability function P (τ, µ). In essence two explicit methods can be used: the direct method and the first-reaction method.

(iii) Step 2. Propagation step: (τ, µ) is used to advance the simulation time

t and to update the population vector n, t t + τ and n n + νµ, → → then all changes are incorporated in a recalculation of the reaction rate vector a.

(iv) Step 3. Time control: Check whether or not the simulation time has

been advanced through the next sampling time ti, and for t > ti send current t and current n(t) to the output storage and advance the sam- pling time, t t . Then, if t > t or if no more reactant molecules i → i+1 stop remain leading to hµ = 0 µ = 1,...,M, finalize the calculation by ∀ switching to step 4, and otherwise continue with step 1.

(v) Step 4. Termination: Prepare for final output by setting flags for early termination or other unforseen stops and send final time t and final n to the output storage and terminate the computation.

A caveat is needed for the integration of stiff systems where the values of in- dividual variable can vary by many orders of magnitude and such a situation might caught the calculation in a trap by slowing down time progress. 236 Peter Schuster

The Monte Carlo step. Pseudorandom numbers are drawn from a random number generator of sufficient quality whereby quality is meant in terms of no or very long recurrence cycles and a the closeness of the distribution of the pseudorandom numbers r to the uniform distribution on the unit interval:

0 α < β 1 = P (α r β) = β α . ≤ ≤ ⇒ ≤ ≤ − With this prerequisite we discuss now two methods which use two output values r of the pseudorandom number generator to generate a random pair (τ, µ) with the prescribed probability density function P (τ, µ).

The direct method. The two-variable probability density is written as the product of two one-variable density functions:

P (τ, µ) = P (τ) P (µ τ) . 1 · 2 |

Here, P1(τ) dτ is the probability that the next reaction will occur between times t + τ and t + τ + dτ, irrespective of which reaction it might be, and

P (µ τ) is the probability that the next reaction will be an Rµ given that 2 | the next reaction occurs at time t + τ.

By the addition theorem of probabilities, P1(τ) dτ is obtained by summa- tion of P (τ, µ) dτ over all reactions Rµ:

M

P1(τ) = P (τ, µ) . (4.75) µ=1 X Combining the last two equations we obtain for P (µ τ) 2 | M P (µ τ) = P (τ, µ) P (τ, ν) (4.76) 2 | ν  X Equations (4.75) and (4.76) express the two one-variable density functions in terms of the original two-variable density function P (τ, µ). From equa- tion (4.73) we substitute into P (τ, µ)= p(τ, µ n, t) through simplifying the | notation by using

M M

aµ γµhµ(n) and a = aµ γµhµ(n) ≡ ≡ µ=1 µ=1 X X Stochastic Kinetics 237 and find

P1(τ) = a exp( a τ) , 0 τ < and − ≤ ∞ (4.77) P (µ τ) = P (µ) = aµ a , µ =1,...,M . 2 | 2 As indicated, in this particular case, P (µ τ) turns out to be independent 2 | of τ. Both one variable density functions are properly normalized over their domains of definition:

M M ∞ ∞ a τ aµ P (τ) dτ = a e− dτ = 1 and P (µ) = = 1 . 1 2 a 0 0 µ=1 µ=1 Z Z X X Thus, in the direct method a random value τ is created from a random number on the unit interval, r1, and the distribution P1(τ) by taking ln r τ = 1 . (4.78) − a The second task is to generate a random integer µˆ according to P (µ τ) in 2 | such a way that the pair (τ, µ) will be distributed as prescribed by P (τ, µ).

For this goal another random number, r2, will be drawn from the unit interval and then µˆ is taken to be the integer that fulfils

µ 1 µ − aν < r a aν . (4.79) 2 ≤ ν=1 ν=1 X X The values a1, a2, . . . , are cumulatively added in sequence until their sum is observed to be equal or to exceed r2a and then µˆ is set equal to the index of the last aν term that had been added. Rigorous justifications for equations (4.78) and (4.79) are found in [1, pp.431-433]. If a fast and reliable uniform random number generator is available, the direct method can be easily programmed and rapidly executed. This it represents a simple, fast, and rigorous procedure for the implementation of the Monte Carlo step of the simulation algorithm.

The first-reaction method. This alternate method for the implementa- tion of the Monte Carlo step of the simulation algorithm is not quite as efficient as the direct method but it is worth presenting here because it adds 238 Peter Schuster insight into the stochastic simulation approach. Adopting again the notation aν γν hν (n) it is straightforward to derive ≡

Pν (τ) dτ = aν exp( aν τ) dτ (4.80) − from (4.66) and (4.67). Then, Pν (τ) would indeed be the probability at time t for an Rν reaction to occur in the time interval [t + τ, t + τ + dτ[ were it not for the fact that the number of Rν reactant combinations might have been altered between t and t + τ by the occurrence of other reactions. Taking this into account, a tentative reaction time τν for Rν is generated according to the probability density function Pν (τ), and in fact, the same can be done for all reactions Rµ . We draw a random number rν from the unit interval and { } compute ln rν τν = , ν =1,...,M . (4.81) − aν From these M tentative next reactions the one, which occurs first, is chosen to be the actual next reactions:

τ = smallest τν for all ν =1,...,M , (4.82) µ = ν for which τν is smallest .

Daniel Gillespie [1, pp.420-421] provides a straightforward proof that the random (τ, µ) obtained by the first reaction method is in full agreement with the probability density P (τ, µ) from equation (4.73). It is tempting to try to extend the first reaction methods by letting the second next reaction be the one for which τν has the second smallest value. This, however, is in conflict with correct updating of the vector of particle numbers, n, because the results of the first reaction are not incorporated into the combinatorial terms hµ(n). Using the second earliest reaction would, for example, allow the second reaction to involve molecules already destroyed in the first reaction but would not allow the second reaction to involve molecules created ion the first reaction. Thus, the first reaction method is just as rigorous as the direct method and it is probably easier to implement in a computer code than the direct method. From a computational efficiency point of view, however, the direct Stochastic Kinetics 239 method is preferable because for M 3 it requires fewer random numbers ≥ and hence the first reaction methods is wasteful. This question of economic use of computer time is not unimportant because stochastic simulations in general are taxing the random number generator quite heavily. For M 3 ≥ and in particular for large M the direct method is probably the method of choice for the Monte Carlo step. An early computer code of the simple version of the algorithm described – still in FORTRAN – is found in [1]. Meanwhile many attempts were made in order to -up computations and allow for simulation of stiff systems (see e.g. [110]. A recent review of the simulation methods also contains a discussion of various improvements of the original code [4]. 240 Peter Schuster 5. Applications of stochastic processes in biology

Compared to stochasticity in chemistry stochastic phenomena in biology are not only more important but also much harder to control. The major sources of the problem are small particle numbers and the lack of sufficiently simple references systems that are accessible to experimental studies. In biology we are regularly encountering reaction mechanisms that lead to enhancement of fluctuations at non-equilibrium conditions and biology in essence is dealing with processes and stationary states far away from equilibrium whereas in chemistry autocatalysis in non-equilibrium systems became an object of gen- eral interest and intensive investigation not before some forty years ago. We start therefore with the analysis of simple autocatalysis modeled by means of a simple birth-and-death process. Then we present an overview over solvable birth-and-death processes (section 5.1) and discuss the role of boundaries in form of different barriers (section 5.1.2). In section 5.2 we come back to the size expansion for stochastic processes and analyze it with biological prob- lems in the focus. Finally, the Poisson expansion is presented in section 5.3, because it finds very useful applications in biology.

5.1 Autocatalysis, replication, and extinction

In the previous chapter we analyzed already bimolecular reactions, the addi- tion and the dimerization reaction, which gave rise to perfectly normal be- havior although the analysis was quite sophisticated (subsection 4.2.2). The nonlinearity became manifest in task to find solutions but did not change effectively the qualitative behavior of the reaction systems, for example the √N-law for the fluctuations in the stationary states retained its validity. As an exactly solvable example we shall study first a simple reaction mechanism

241 242 Peter Schuster consisting of two elementary steps, replication and extinction. In this case the √N-law is not valid and fluctuations do not settle down to some value which is proportional to the square root of the size of the system but grow in time without limit as we saw in case of the Wiener process (3.4.3).

5.1.1 Autocatalytic growth and death

Reproduction of individuals is modeled by a simple duplication mechanism and death is represented by first order decay. In the language of chemical kinetics these two steps are:

λ A + X 2 X , (5.1a) −−−→ µ X B . (5.1b) −−−→

The rate parameters for reproduction and extinction are denoted by λ and µ, respectively.1 The material required for reproduction is assumed to be replenished as it is consumed and hence the amount of A available is con- stant and assumed to be included in the birth parameter: λ = f [A]. The · degradation product B does not enter the kinetic equation because reaction (5.1b) is irreversible. The stochastic process corresponding to equations (5.1) belongs to the class of linear birth-and-death processes with w+(n) = λ n · 2 and w−(n)= µ n. The master equation is of the form, ·

∂Pn(t) = λ (n 1) Pn 1(t) + µ (n + 1) Pn+1(t) (λ + µ) nPn(t) , (5.2) ∂t − − −

1Reproduction is to be understood a asexual reproduction here. Sexual reproduction, of course, requires two partners and gives rise to a process of order 2 (table 4.1). 2 Here we use the symbols commonly applied in biology: λ(n) for birth, µ(n) for death, and ν for immigration and ρ for emigration (tables 5.1 and 5.2). These notions were created especially for application to biological problems, in particular for problems in theoretical ecology. Other notions and symbols are common in chemistry: A birth corresponds to the production of a molecule, f λ, a death to its decomposition or degradation through a ≡ chemical reaction, d µ. Influx and outflux are the proper notions for immigration and ≡ emigration. Stochastic Kinetics 243

E()N()t

t

Figure 5.1: A growing linear birth-and-death process.The two-step reaction mechanism of the process is (X 2X, X ) with rate parameters λ and → → µ, respectively. The upper part shows the evolution of the probability density, P (t) = Prob (t)= n. The initially infinitely sharp density, P (n, 0) = δ(n,n ) n X 0 becomes broader with time and flattens as the variance increases with time. In the lower part we show the expectation value E (t) in the confidence interval N E σ. Parameters used: n = 100, λ = √2, and µ = 1/√2; sampling times ± 0  (upper part): t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.3 (blue), 0.4 (violet), 0.5 (magenta), 0.75 red), and 1.0 (yellow). and after introduction of the probability generating function g(s, t) gives rise to the PDE ∂g(s, t) ∂g(s, t) (s 1)(λs µ) = 0 . (5.3) ∂t − − − ∂s 244 Peter Schuster

E()N()t

t

Figure 5.2: A decaying linear birth-and-death process. The two-step reac- tion mechanism of the process is (X 2X, X ) with rate parameters λ and → → µ, respectively. The upper part shows the evolution of the probability density, P (t) = Prob (t)= n. The initially infinitely sharp density, P (n, 0) = δ(n,n ) n X 0 becomes broader with time and flattens as the variance increases but then sharp- ens again as process approaches the absorbing barrier at n = 0. In the lower part we show the expectation value E (t) in the confidence interval E σ. Param- N ± eters used: n0 = 40, λ = 1/√2, and µ = √2; sampling times (upper part): t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.35 (blue), 0.65 (violet), 1.0 (magenta), 1.5 red), 2.0 (orange), 2.5 (yellow), and limt (black). →∞ Stochastic Kinetics 245

Solution of this PDE yields different results for equal or different replication and extinction rate coefficients, λ = µ and λ = µ, respectively. In the first 6 case we substitute γ = λ/µ (= 1) and η(t) = exp (λ µ)t , and find: 6 − 

n0 η(t) 1 + γ η(t) s g(s, t) = − − and γη(t) 1 + γ 1 η(t) s ( −  −  ) min(n,n ) 0  n + n m 1 n P (t) = γn ( 1)m 0 − − 0 (5.4) n − n m m × m=0  −   X m n0+n m 1 η(t) − γ η(t) − − . × 1 γη(t) γ 1 η(t)  −  − ! 

In the derivation of the expression for the probability distributions we ex- panded enumerator and denominator of the expression in the generating n n n k function g(s, t), by using expressions for the sums (1 + s) = k=0 k s n k n(n+1)...(n+k 1) k and (1+ s)− =1+ ∞ ( 1) − s , multiply, order terms with k=1 − k! P  respect to powers ofPs, and compare with the expansion of the generating n function, g(s, t)= n∞=0 Pn(t) s .

Computations ofP expectation value and variance are straightforward:

(λ µ) t E (t) = n e − and NX 0 (5.5) 2  λ + µ (λ µ) t (λ µ) t σ (t) = n e − e − 1 NX 0 λ µ − −  

Illustrative examples of linear birth-and-death processes with growing (λ > µ) and decaying (λ<µ) populations are shown in figures 5.1 and 5.2, re- spectively.

In the degenerate case of neutrality with respect to growth, µ = λ, the 246 Peter Schuster same procedure yields: λt + (1 λt) s n0 g(s, t) = − , (5.6a) 1+ λt + λts   min(n,n ) λt n0+n 0 n + n m 1 n 1 λ2t2 m P (t) = 0 − − 0 − , n 1+ λt n m m λ2t2 m=0   X  −     (5.6b)

E (t) = n , and (5.6c) NX 0 σ2 (t) = 2 n λt . (5.6d) NX 0 Comparison  of the last two expressions shows the inherent instability of this reaction system. The expectation value is constant whereas the fluctuations increase with time. The degenerate birth-and-death process is illustrated in figure 5.3. The case of steadily increasing fluctuations is in contrast to an equilibrium situation where both, expectation value and variance approach constant values. Recalling the Ehrenfest urn game, where fluctuations were negatively correlated with the deviation from equilibrium, we have here two uncorrelated processes, replication and extinction. The particle number n fulfils a kind of random walk on the natural numbers, and indeed in case of the random walk (see equation (3.30) in subsection 3.4.2 we had also obtained a constant expectation value E = n0 and a variance that increases linearly with time, σ2(t)=2ϑ(t t )). − 0 A constant expectation value accompanied by a variance that increases with time has an easy to recognize consequence: At some critical time above which the standard deviation exceeds the expectation, tcr = n0 (2λ). From this instant on predictions on the evolution of the system based on the expec- tation value become obsolete. Then we have to rely on individual probabili- ties or other quantities. Useful in this context is the probability of extinction of all particles, which can be readily computed: λt n0 P (t) = . (5.7) 0 1+ λt   Provided we wait long enough, the system will die out with probability one, since we have limt P0(t) = 1. This seems to be a contradiction to the →∞ Stochastic Kinetics 247

Pn () t

n

Figure 5.3: Continued on next page. 248 Peter Schuster

Figure 5.3: Probability density of a linear birth-and-death with equal birth and death rate. The two-step reaction mechanism of the process is (X → 2X, X ) with rate parameters λ = µ. The upper and the middle part show → the evolution of the probability density, P (t) = Prob (t) = n . The initially n X infinitely sharp density, P (n, 0) = δ(n,n0) becomes broader with time and flattens as the variance increases but then sharpens again as the process approaches the absorbing barrier at n = 0. In the lower part, we show the expectation value E (t) in the confidence interval E σ. The variance increases linearly with N ± time and at t = n0/(2λ) = 50 the standard deviation is as large as the expectation value. Parameters used: n0 = 100, λ = 1; sampling times, upper part: t = 0 (black), 0.1 (green), 0.2 (turquoise), 0.3 (blue), 0.4 (violet), 0.49999 (magenta), 0.99999 red), 2.0 (orange), 10 (yellow), and middle part: t = 10 (yellow), 20

(green), 50 (cyan), 100 (blue), and limt (black). →∞ constant expectation value. As a matter of fact it is not: In almost all individual runs the system will go extinct, but there are very few cases of probability measure zero where the particle number grows to infinity for t . These rare cases are responsible for the finite expectation value. →∞ Equation (5.7) can be used to derive a simple model for random selection [111]. We assume a population of n different species

λ A + X 2 X , j =1,...,n , (5.1a’) j −−−→ j µ X B , j =1,...,n . (5.1b’) j −−−→ The probability joint distribution of the population is described by

P = P (t)= x ,..., (t)= x = P (1) . . . P (n) , (5.8) x1...xn X1 1 Xn n x1 · · xn wherein all probability distribution for individual species are given by equa- tion (5.6b) and independence of individual birth events as well as death events allows for the simple product expression. In the spirit of Motoo Kimura’s neutral theory of evolution [7] all birth and all death parameters are assumed to be equal, λj = λ and µj = µ for all j = 1,...,n. For convenience we as- sume that every species is initially present in a single copy: Pnj (0) = δnj ,1. Stochastic Kinetics 249

Figure 5.4: The distribution of sequential extinction times . Shown are Tk the expectation values E( ) for n = 20 according to equation(5.10). Since E( ) Tk T0 diverges, is the extinction that appears on the average at a finite value. A single T1 species is present above and random selection has occurred in the population. T1

We introduce a new random variable that has the nature of a first passage time: is the time up to the extinction of n k species and characterize Tk − it as sequential extinction time. Accordingly, n species are present in the population between n, which fulfils n 0 by definition, n 1, n 1 species T T ≡ T − − between n 1 and n 2, and eventually a single species between 1 and 0, T − T − T T which is the moment of extinction of the entire population. After no T0 particle X exists any more. Next we consider the probability distribution of the sequential extinction times H (t) = P ( < t) . (5.9) k Tk The probability of extinction of the population is readily calculated: Since individual reproduction and extinction events are independent we find

λt n H = P = P (1) . . . P (n) = . 0 0,...,0 0 · · 0 1+ λt   250 Peter Schuster

The event < t can happen in several ways: Either X is present and all T1 1 other species have become extinct already, or only X2 is present, or only X3, and so on, but < t is also fulfilled if the whole population has died out: T1

H1 = Px1=0,0,...,0 + P0,x2=0,...,0 + P0,0,...,xn=0 + H0 . 6 6 6 The probability that a given species has not yet disappeared is obtained by exclusion since existence and nonexistence are complementary, λt 1 Px=0 = 1 P0 = 1 = , 6 − − 1+ λt 1+ λt which yields the expression for the presence of a single species (λt)n 1 H (t) = (n + λt) − , 1 (1 + λt)n and by similar arguments a recursion formula is found for the extinction probabilities with higher indices

n k n (λt) − Hk(t) = + Hk 1(t) , k (1 + λt)n −   that eventually leads to the expression

k n (λt)n j H (t) = − . k j (1 + λt)n j=0 X   The moments of the sequential extinction times are computed straightfor- wardly by means of a handy trick: Hk is partitioned into terms for the k individual powers of λt, Hk(t) = j=0 hj(t) and then differentiated with respect to time t P n (λt)n k h (t) = − , j j (1 + λt)n   dhj(t) λ n n j 1 n n j = h0 = (n j)(λt) − − j(λt) − . dt j (1 + λt)n+1 j − − j     

The summation of the derivatives is simple because hk0 + hk0 1 + . . . + h0 is a − telescopic sum and we find

n k 1 dHk(t) n n k t − − = (n k) λ − . dt k − (1 + λt)n+1   Stochastic Kinetics 251

Making use of the definite integral [112, p.338]

n k (n k+1) ∞ t λ − dt = − − , (1 + λt)n+1 n k Z0 k we finally obtain for the expectation values of the  sequential extinction times

∞ dH (t) n k 1 E( ) = k t dt = − , n k 1 , (5.10) Tk dt k · λ ≥ ≥ Z0 and E( ) = (see figure ). It is worth recognizing here another paradox T0 ∞ of probability theory: Although extinction is certain, the expectation value for the time to extinction diverges. Similarly as the expectation values, we calculate the variances of the sequential extinction times: n(n k) 1 σ2( ) = − , n k 2 , (5.11) Tk k2(k 1) · λ2 ≥ ≥ − from which we see that the variances diverges for k = 0 and k = 1.

For distinct birth parameters, λ1,...,λn, and different initial particle numbers, x1(0),...,xn(0), the expressions for the expectation values become considerably more complicated, but the main conclusion remains unaffected: E( ) is finite whereas E( ) diverges. T1 T0

5.1.2 Boundaries in one step birth-and-death processes

One step birth-and-death processes have been studied extensively and ana- lytical solutions are available in table form [19]. For transition probabilities + at most linear in n, w (n)= ν + λn and w−(n)= ρ + µn, one distinguishes birth (λ), death (µ), immigration (ν), and emigration (ρ) terms. Analytical solutions exist for all one step birth-and-death processes whose transitions probabilities are not of higher order than linear in particle numbers n. It is necessary, however, to consider also the influence of boundaries on these processes. For this goal we define an interval [a, b] for the stochastic process. There are two classes of boundary conditions, absorbing and reflect- ing boundaries. In the former case, a particle that left the interval is not allowed to return to it whereas the latter boundary implies that it is forbid- den to exit from the interval. Boundary conditions can be easily implemented by ad hoc definitions of transition probabilities: 252 Peter Schuster

Reflecting Absorbing + Boundary at a w−(a)=0 w (a 1)=0 − + Boundary at b w (b)=0 w−(b +1)=0

+ The reversible chemical reaction with w−(n)= k n and w (n)= k (n n), 1 2 0 − for example had two reflecting barriers at a = 0 and b = n0. Among the examples we have studied so far we found an absorbing boundary in the replication-extinction process between = 1 and = 0 tantamount to NA NA the boundary a = 1, and the state n = 0 is an end point of all trajectories reaching it. Compared, for example, to an unrestricted random walk on positive and negative integers, n Z, a chemical reaction or a biological process has to ∈ be restricted by definition, n N0, since negative particle numbers are not ∈ allowed. In general, the one step birth-and-death master equation (4.8),

∂Pn(t) + + = w (n 1) Pn 1(t)+ w−(n+1) Pn+1(t) (w (n)+w−(n) Pn(t) , ∂t − − −   is not restricted to n N0 and thus does not automatically fulfil the proper ∈ boundary conditions to model a chemical reaction. What we need is a mod- ification of the equation at n = 0 which introduces a proper boundary of the process: ∂P0(t) + = w−(1) P (t) w (0) P (t) . (4.8’) ∂t 1 − 0

This occurs naturally if w−(n) vanishes for n = 0 which is always so when the constant or migration term vanishes, ν = 0. With w−(0) = 0 we only need to make sure that P 1(t) = 0 and obtain equation (4.8’). This will − be the case whenever we take an initial state with P (0) = 0 n < 0, and n ∀ it is certainly true for our conventional initial condition, Pn(0) = δn,n0 with n 0. By the same token we prove that the upper reflecting boundary 0 ≥ for chemical reactions, b = n0, fulfils the conditions of being natural too. Equipped with natural boundary conditions the stochastic process can be solved for the entire integer range, n Z, which is often much easier than ∈ with artificial boundaries. All the barriers we have encountered so far were natural. Stochastic Kinetics 253 [16] [16] [16] Ref. [114] [114] [114] [113] ) × µt 1) ) 1)  − e µt − t − µt ) σ − − − ρ 1 e λt λt e 0 e − 0 ρt νt + ( (1 +1)( − γ n n γ 2 ν ( λt µt ( (1 σ e Variance + − 0 0 e × n ν µ n 0  n t ) ) ρ µt + µt ρt − νt − λt µt e σ µ − 0 e ν − − 0 − e + n 0 e n 0 0 (1 0 Mean n 0 ν n n + ( n n 0 + n ) ) ,n ,n (0 (0 k t > < ) − ) ) ρ 0 n 0 × + n  ,n ,n  n k ν 0 ν µ ; ( × k  ; k (0 n (0 0  0 −   × 0 n e k σ n  2 > n < k t σ/γ  1 ) 2 − 2 2 ) t 0 ≥ 0 1 − − − ≤ ) λ 2 1 0 k n n 0 νρ µt k λ − n −  ; ; 1 − − + √ ) 0 0 , n 0 =0 e n,n k , n t n t n n  k ( ) Data are taken from [19, pp.10,11]. Abbrevi- n 0 n ( )! − n + 0 n (2 k − 0  µt n − P 1 n n ≥ ≤ − − − 0 0 n − e n n,n − + (1 n k ( ) n k 0 ) is the modified Bessel function. n ) − ν k µ P n + − x 1) , n  , n − λ t (1 I 0 ( µ t n − n n − − n σ 2 −   γσ νt ρt ( e / I µtk + e − e e ) )! )! − − λt n 1 0) 0 0 0 0 1 n e − λt n − n n n =0 − exp )  P 1+ k − − − n,n × − 0 0 ( (1 n n n  (1 n n × ) ) ( =0 , and ( ( n P k  λ t n,n } ρt νt γ ( 0 ρ ν ( ( 0 nµ t n × − − e e n,n   { 0 0 n n n n min ≡  ) 0 /µ 0 ) t × n 0 ) µt 0 0 n − n,n n n −  ρ/s  n e )   ) , ( + t ) s t  s s t/s ) ) s − ) s /s ) νs 1) ) µ ) σ − σ s 1 − − − − +( − λt s − λt s t λ s,t ( γ (1 − ) ( (1 − − ( (1 ν 1)(1 ρ (1 e γ 0 ρ e µt λt + e n (1 1) µt 0 +(1 − − ν 1)+( ≡ g 0 ( − n e − 1+ λt s λt − n s e − e σ ( σ s γσ e  − ( ( , ν − − 0 1   n 1 1  s   λ/µ exp ≡ × γ n 0 0 ρ ρ µ λn µn µn µn n 0 0 ν ν ν Comparison of of results for some unrestricted processes. λ λn λn λn Birth Death Death Birth& Poisson Poisson Process Table 5.1: ation and notations: 254 Peter Schuster Ref. [16, 115] [16, 115] [16, 115] [16, 115] [16, 115] [116, 117] [116, 117] [116, 117] [116, 117] ) ), x ( , γ  n j  I   ) ζ l j j ( are the − + n + j l 0 u ˆ 2 ( ξ n H k − − 0 where n ≡ +2 n − , respectively. 0  and n + u t n 1 n 2 j 2 H /γ − I I ξ − / ) n 1 1 1  2 2 ) − − j − l , γ 1 j/ j/ 2  j γ  νρ j ˆ − I  j ζ  j j ( γ ρ ν  ρ ν γ l G 2( H j + j j  −  ˆ 1 1 ˆ ) γ G n H u l − − I l l l l − =2 =2 H ∞ j u − − − − ∞ j ), =0 =0 =0 =0 ( ≡ j u j u u j u j   k l 0 P , γ P , respectively; n 2 n )= j P · P l P P · +2 ˆ − ξ − l −     0 I ( 2   − , γ n k k n k k  n j  − ˆ ˆ ξ ζ ξ − ζ ˆ 0 ρ ζ ν + = ρ ν ) u G σ − ( σ − n t n 2 1 ( n I l σ l σ + I − − 0 ≡ I − n n n − − l 1 1 I ; n ,...,u − n − − n − + t n,n u u ˆ ) ˆ u G G G 0 ρ 0 ˆ l P l H n H =0 + n H + + = 0 − − 0 0 ∞ k − ν 0 − 0 n n u ( 0 n n n n 2 is a hypergeometric function, e n − − I P , j I ˆ 2 u − u G ) G − l / F ˆ n l ) Data are taken from [19, pp.16,17]. 
Abbreviation 1 α − α H + H 1 and 0 , γ − − 0 =0 − l l 1 n j ) l k u n l ˆ − ξ − =0 + − − − =0 ( + l − l k u u n u k l P n u ( 2 − =0 ( − I ) − I u k n P k P u 2 2 − ) where n / l n / P 1 +2 1 1 − ν/ρ γ 0 − n − ρ ν l γ G ( ρ ν u n γ − γ γ   − u ,...,u ≡ n − γ + I )= + α 1 0 , 0 = 0 , , γ n 1 t n j −∞ ) ˆ − ξ is a Gottlieb polynomial, − µ = ( , j n x, n ∞ k − I n I − λ +1 (  l G  P e − n, α α  ) = 0 u − ≡ α ( G , γ j σ F ζ , n ( l γ ) where − −∞ −∞ u 1 and , γ = λ/µ : refl : abs : refl : abs : refl : refl : abs j : : H l l  − ξ l l l l l l l ≡ ( l ; ; ), n +1 γ 1 k k − ∞ ∞ G − − x ≡ : refl; : refl; : abs; : refl; : abs; Boundaries : refl; : + : abs; : +  x, γ n u u u u u k n u u ( u u G ,...,u n k ) G 1 = 0 − ) ) γ l l )= + 1) + 1) , j − − − n n n ρ ρ ρ ρ ρ µ x, γ n n (1 − − ( ( ( n ) = 0 µ µ u u =0 ( ( n k H , γ µ µ j ), P ξ ( n l , γ Comparison of of results for some restricted processes. γ j − ) ) ˆ ζ u n n ( ≡ + 1) + 1) G n l l ) n − − ν ν ν ν ν H λ − − u u ( ( x, γ ≡ n n λ λ ( ( ( n n λ λ ˆ Table 5.2: and notations used in the table are: is a modified Bessel function; G roots of H Stochastic Kinetics 255

For the sake of completeness we summarize the conditions, which can be introduced in the master equation in order to sustain reflecting or absorbing barriers for processes described by forward and backward master equations on the interval [a, b] (see e.g. [6, pp.283-284]).

Forward master equation on [a, b] Reflecting Absorbing + Boundary at a w−(a)Pa(t)= w (a 1)Pa 1(t) Pa 1(t)=0 − − − + Boundary at b w (b)Pb(t)= w−(b + 1)Pb+1(t) Pb+1(t)=0

Backward master equation on [a, b] Reflecting Absorbing

Boundary at a P (., . a 1, t0)= P (., . a, t0) P (., . a 1, t0)=0 | − | | − Boundary at b P (., . b +1, t0)= P (., . b, t0) P (., . b +1, t0)=0 | | | An overview over a few selected birth-and-death processes is given in tables 5.1 and 5.2. Commonly, unrestricted and restricted processes are distinguished [19]. An unrestricted process is characterized by the possibility to reach all states (t) = n. A requirement imposed by physics demands N that all changes in state space are finite for finite times (growth condition in subsection 3.5.3) and hence the probabilities to reach infinity at finite times must vanish: limn Pn,n0 = 0. The linear birth and death process in →±∞ table 5.1 is unrestricted only in the positive direction and the state (t)=0 N is special because it represents an absorbing barrier. The restriction is here hidden and met by the condition P (t)=0 n< 0. n,n0 ∀ 256 Peter Schuster

5.2 Size expansion in biology

In the previous chapter we introduced a size expansion for the chemical mas- ter equation (section 4.3.3). Here, we shall repeat the derivation of introduce this useful technique by means of a simple example, the spreading of an epi- demic, which is, nevertheless, sufficiently general in order to be transferable to other cases [98, pp.251-258]. Before we discuss this example, however, we come back to the birth-and- + death master equation (4.8) for one step processes, W (n n0)= w (n0)δn,n 1+ 0− + | w−(n0)δn,n0+1, where w (n) and w−(n) are usually analytic functions, which we shall assume are (at least) twice differentiable. We repeat the one step master equation:

∂Pn(t) + + = w (n 1) Pn 1(t) + w−(n + 1) Pn+1(t) w (n)+ w−(n) Pn(t) , ∂t − − −   and define a single step difference operator D by

1 D f(n) = f(n + 1) , and D− fb(n) = f(n 1) . (5.12) − Using thisb operator we can rewrite theb master equation and find

∂P(t) + 1 = ( D 1)w (n)+( D− 1)w−(n) P(t) . (4.8’) ∂t − − n o The jump moments areb now b

p + α (n) = ( 1) w (n) + w−(n) . (4.6’) p − We repeat the macroscopic rate equation

d n + h i = w−( n ) + w ( n ) , dt − h i h i find for the coupled equations for expectation value and variance the simpler expressions

2 + 2 d n + 1 d w ( n ) d w−( n ) 2 h i = w ( n ) w−( n ) + h i h i σ , (5.13a) dt h i − h i 2 dn2 − dn2 n   2 + dσn + dw ( n ) dw−( n ) 2 = w ( n ) + w−( n )+2 h i h i σ , (5.13b) dt h i h i dn − dn n   Stochastic Kinetics 257

Figure 5.5: The macroscopic part of a stochastic variable . The variable N n is partitioned according to into a macroscopic part and the fluctuations around it, n =Ωφ(t)+Ω1/2x(t), wherein Ω is a size parameter, for example the size of the population or the volume of the system. Computations: Ωφ(t) = 5n (1 0.8e kt) 0 − − 1/2 (n Ωφ(t))2 /(2σ2) 2 with n0 = 2 and k = 0.5; p(n,t)=Ω x(t) = e− − / 2πσ with σ = 0.1, 0.17, 0.24, 0.285, 0.30. p and are now in the position to handle the example by means of an expansion technique (see [98, pp.251-254] and section 4.3.3). An epidemic spreads in a population of Ω individuals. We assume that n(t) individuals are already infected. The probability of a new infection is proportional to both, to the number of infected and to the number of uninfected individuals, w−(n) = βn(Ω n). No cure is possible and thus − 258 Peter Schuster w+(n) = 0. Finally, we have

W (n n0) = β δ n0 (Ω n0) , | n,n0+1 − which leads to the master equation ∂P (t) n = β(n 1)(Ω n + 1) P (t) βn(Ω n) P (t) or ∂t n+1 n − − − − (5.14) ∂P 1 = β D− 1 n (Ω n) P(t) . ∂t − −   Basic to the expansionb is the idea that the density of the stochastic variable can be split in a macroscopic part, Ω φ(t), and fluctuations of the order N Ω1/2 around it. As shown in figure 5.5 we assume that P (n, t) is represented by a (relatively) sharp peak located approximately at Ω φ(t) with a width of order Ω1/2. In other words, we assume that the fluctuations fulfil a √N-law, and we make the ansatz

n(t) = Ω φ(t) + Ω1/2 x(t) , (5.15) where x is a new variable describing the fluctuations and the function φ(t) has to be chosen in accord with the master equation. As said above, Ω φ(t) is called the macroscopic part and Ω1/2 x the fluctuating part of n. We may refer to the new variables as an Ω language. The probability density of n becomes now a probability density Π(x, t) of x:

P (n, t) ∆n = Π(x, t) ∆x , (5.16) Π(x, t) = Ω1/2 P Ωφ(t) + Ω1/2x, t .

Differentiation yields3  ∂Π ∂P ∂Π dφ ∂P ∂P = Ω1/2 , = Ω1/2 Ω + , ∂x ∂n ∂t dt ∂n ∂t   and eventually we obtain ∂P ∂Π dφ ∂Π Ω1/2 = Ω1/2 . (5.17) ∂t ∂t − dt ∂x 3The somewhat unclear differentiation ∂P/∂n can be circumvented through direct vari- ation of t by δt and simultaneously of x by Ω1/2φ(t)δt, which leads to the same final − result. Stochastic Kinetics 259

Now, the difference operators are also size expanded in power series of differ- ential operators

2 1/2 ∂ 1 1 ∂ D = 1+Ω− + Ω− + ... , (5.18a) ∂x 2 ∂x2 1 1 D−b = = 1/2 ∂ 1 1 ∂2 1 + Ω− ∂x + 2 Ω− ∂x2 + . . . 2 2 b 1/2 ∂ 1 1 ∂ 1 ∂ = 1 Ω− Ω− + . . . + Ω− + . . . − ∂x − 2 ∂x2 ∂x2 ≈ 2 1/2 ∂ 1 1 ∂ 1 Ω− + Ω− , (5.18b) ≈ − ∂x 2 ∂x2 2 1 1/2 ∂ 1 1 ∂ D− 1 = Ω− + Ω− . (5.18c) − − ∂x 2 ∂x2 Insertionb of the operator and substitution of the new variables into the master 1/2 equation (5.14) yields after cancelation of an overall factor Ω−

2 ∂Π 1/2 dφ ∂Π 2 1/2 ∂ 1 1 ∂ Ω = βΩ Ω− + Ω− ∂t − dt ∂x − ∂x 2 ∂x2 ·  1/2  1/2 φ + Ω− x 1 φ Ω− x Π(x, t) . · − −   The right-hand side requires two consecutive differentiati ons of three factors:

1/2 ∂ 1/2 1/2 Ω− φ +Ω− x 1 φ Ω− x Π(x,t) = − ∂x · − −      1/2 1/2 1/2 1/2 ∂Π = 1 2φ 2Ω− x Π(x,t) Ω φ +Ω− x 1 φ Ω− x , − − − − · − − ∂x  2      1 1 ∂ 1/2 1/2 Ω− φ +Ω− x 1 φ Ω− x Π(x,t) = 2 ∂x2 · − − n    o 2 3/2 ∂Π 1 1 1/2 1/2 ∂ Π = Ω− + Ω− φ +Ω− x 1 φ Ω− x . − ∂x 2 · − − ∂x2     For convenience we introduce a new time scale, τ = βΩt, in order to absorb one factor Ω – and for convenience also the factor β – into the time variable. Collection of terms corresponding to the largest powers in Ω now yields dφ ∂Π(x, τ) ∂Π(x, τ) Ω1/2 : = φ (1 φ) , dτ ∂x − ∂x ∂Π(x, τ) ∂ 1 ∂2Π(x, τ) Ω0 : = (1 2φ) xΠ(x, τ) + φ (1 φ) . ∂τ − − ∂x 2 − ∂x2   260 Peter Schuster

The largest term cancels if dφ = φ (1 φ) , (5.19’) dτ − and this yields the differential equation for the macroscopic variable φ(t), which after transformation back into the original variables leads to the macro- scopic rate equation dn = β n (Ω n) . (5.19) dt − Equating the next largest term, the coefficient of Ω0 to zero results in a linear Fokker-Planck equation with time dependent coefficients φ(t):

∂Π(x, τ) ∂ 1 ∂2Π(x, τ) = (1 2φ) xΠ(x, τ) + φ (1 φ) . (5.20) ∂τ − − ∂x 2 − ∂x2   Equation (5.20) describes the fluctuations of the random variable (t) around N the macroscopic part and these fluctuations are of order Ω1/2 as expected and initially assumed. The strategy for solving the master equation (5.14) is now obvious. At first one determines φ(τ) by integrating the differential equation (5.19’) with the initial value φ(0) = n0/Ω, then one solves the Fokker-Planck equa- tion (5.20) with the initial condition Π(x, 0) = δ(x) and finally one obtains the desired probabilities from

1/2 n Ωφ(τ) P (n, t n , 0) = Ω− Π − , τ (5.21) | 0 Ω1/2   A typical solution is sketched in figure 5.5 and it compares perfectly with the exact solutions for sufficiently large systems (see, for example figures 4.4 and 4.5). Remembering the derivation we remark the terms of relative order 1/2 Ω− and smaller have been neglected. Stochastic Kinetics 261

5.3 The Poisson representation

Master equations that are constructed on the basis of combinatorial or mass action kinetcs and expansion of the probability distribution in Poisson distri- butions provide a useful technique for the setup of Fokker-Planck equations and stochastic differential equations, which are exactly equivalent to birth- and-death master equations with many variables as they are used for the de- scription of chemical or biological reaction networks. The expansion method is called Poisson representation and has been developed by Crispin Gardiner and Subhash Chaturvedi [118, 119] (see also [6, pp.301-335]). In this context the system size expansion takes on the form of a small noise expansion of the Fokker-Planck equation obtained by the Poisson representation. In simple cases the Poisson representation yields stochastic differential equations, which can be solved straightforwardly, but there are other cases where the corresponding stochastic differential equation can be formulated only in the complex plane. Then, the gauge Poisson representation [120] provides an appropriate frame for solving the more complicated cases, and recent applications of the theory to practical problems in population biology have demonstrated the usefulness of the approach [121]. In a very wide class of systems the time development is considered as the result of individual encounters between members of a population. Such systems comprise, for example, (i) chemical reactions being caused by colli- sions between molecules, (ii) biological population dynamics resulting from giving birth, mating, eating each other and dying, and (iii) systems in epi- demiology where diseases are transmitted by contacts between individuals. Encounter based time evolution gives rise to combinatorial kinetics, which on the macroscopic level is known as mass action kinetics in chemistry.We in- clude therefore a consideration of birth-an-death processes in many variables in this section (subsection 5.3.2). 262 Peter Schuster

5.3.1 Motivation of the Poisson representation

Statistical physics relates microscopic processes to macroscopic quantities and thus provides the link between the realm of atoms and molecules and thermodynamics. Under the conditions of an ideal gas or an ideal solution of reacting species probability distributions of microstates are assumed to be Poissonian and the correctness or suitability of this assumption has been well established empirically. Considering a system of n molecular species Sj involved in m reactions Rµ we obtain for the distribution function in the grand canonical ensemble4

P (σ ) = exp β µ n (σ ) ε , (5.22) k N · j j k − k j ! X  where σk is a microscopic state characterized by an index (set) ‘k’, nj(σk) is the number of particles of species Sj in state σk, εk the energy of this par- ticular state, µ is the chemical potential of species S , is a normalization j j N factor, and β =1/(kBT ) with T being the (absolute) temperature in Kelvin. Chemical reactions are converting chemical species and consequently equi- librium thermodynamics requires that the chemical potentials fulfil certain relations. If a state σk is supposed to be converted into a state σ` then

(S) (S) νj nj(σk) = νi ni(σ`) , S =1, 2, 3,..., j i X X which implies the stoichiometric restrictions on the reaction system. The canonical and the grand canonical ensembles are defined by the requirements

(S) (S) νj nj(σk) = τ and j X P (σ ) ν(S) n (σ ) ν(S) n = τ (S) , k j j k ≡ j h ji j j Xk X X 4In statistical thermodynamics three ensembles are distinguished [122, pp.513-518]: The microcanonical ensemble referring to constant energy and maximizing entropy, the canonical ensemble defined for constant volume and minimizing Helmholtz free energy, and eventually the grand canonical ensemble is defined by constant chemical potentials µj and the quantity minimized is the product p V . Stochastic Kinetics 263 respectively. The grand canonical probability distribution results through maximizing entropy with fixed mean energy with the stoichiometric con- straint, and this implies that the chemical potential satisfies the relation

(S) µj = νj µS . (5.23) S X Eventually one finds for the probability distribution of the population num- bers of the individual states n : { j} nj 1 P ( n ) = exp β µ n exp βε(A) . (5.24) { j} N j j n ! − k j ! j j ! X Y Xk   (A) A Herein the εk -values are the eigenstates of a single molecules of . Equa- tion (5.24) is a multivariate Poissonian distribution with the expectation values exp(βµj) nj = . (5.25) h i exp βε(A) k − k Equation (5.25) combined with theP chemical potential (5.23) yields the law of mass action. Implementation of the stronger constraint for the canonical ensemble leads to

nj (A) (A) 1 (A) P ( nj ) = δ νj nj , τ exp β εk . (5.26) { } N × nj! − ! YA Xj  Yj Xk   Local fluctuations can be included into the considerations by partitioning the system into individual cells interrelated by transport and the stoichiometric relation involve summations over all cells [118]. The canonical distribution allows for a straightforward proof that one obtains locally Poissonian dis- tributions, which become uncorrelated in the large volume limit. For local calculations there is no difference between the canonical and the grand canon- ical distribution but the latter is much easier to handle for the description of thermodynamic equilibrium. The application of pure statistical mechanics thus leads to Poissonian distributions, locally as well as globally, and it is suggestive therefore to approximate otherwise hard to derive probability distributions form master equations by multivariate Poissonians. 264 Peter Schuster

5.3.2 Many variable birth-and-death systems

Combinatorial kinetics is introduced best by means of examples and we shall start by illustrating a reversible dimerization process (see the subsubsec- tion 4.2.2.2): X 2 Y. The forward reaction X 2 Y occurs by a kind of → spontaneous fission giving rise to two identical molecules Y. Such a sponta- neous process can be visualized as a kind of degenerate encounter involving just one molecule of X. For the transition in the molecular population we have the probability

w(x x 1,y y +2) = k+ x , → − → and for the reverse reaction, 2 Y X, pairs of molecules Y have to be as- → sembled, which have a combinatorial probability of y(y 1)/2 and hence we − find

w(x x +1,y y 2) = k− y(y 1) . → → − −

Chemical multi-variate master equation. Generalization to a reaction system with n reaction components or chemical species – reactants and/or X products j – involved in m individual reactions Rµ where nµG and nµQ are the numbers of variable chemical reactant species or product species,5 respectively, yields:

k+ µ nµ nµG Q (µG)X (µQ)X νj j −−−→ νi i with µ =1, 2,...,m. (5.27) j=1 ←−−− i=1 X kµ− X

(µG) (µQ) The stoichiometric coefficients νj and νi represent the numbers of molecules of species Xj or Xi, respectively, which are involved in the elemen- 6 tary step of reaction Rµ (5.27). The coefficients are properly understood as

5Concentrations or particle numbers of species that are kept constant are assumed to be incorporated into rate parameters. 6For consistency we shall use the formal indices G and Q for the reactant side and the product side of a general reaction Rµ, respectively. For clarity we use different indices for summation and products – ‘j’ for reactants and ‘i’ for products – although this is not demanded by mathematical rules. Stochastic Kinetics 265 elements of three matrices: (i) the stoichiometric reactant matrix G, (ii) the stoichiometric product matrix Q, and (iii) the stoichiometric matrix S, which fulfil the relation Q G = S. The particle numbers or concentrations of the − individual chemical species are subsumed in the vector,

x(t) = ([X1], [X2],..., [Xn]) = (x1, x2,...,xn)0 , and making use of individual, reaction specific columns of the three matrices, G, Q, and S, we obtain for the changes in particle numbers for one elementary step of the reaction Rµ:

ν(1) ν(2) ν(µ) ν(m) 1 1 ··· 1 ··· 1 ν(1) ν(2) ν(µ) ν(m) S =  2 2 ··· 2 ··· 2  ......  ......   (1) (2) (µ) (m) νn νn νn νn   ··· ···    For each reaction we define a vector

(µ) (µ) (µ) (µ) r = α (ν1 , ν1 ,..., νn )0 , (5.28) which, in principle, accounts for α steps of the reaction Rµ. It is straight- forward now to write down the changes in particle numbers caused by α elementary steps of reaction R(µ) in either direction, forward or backward:

x x + r(µ) in forward direction, and −→ (5.29) x x r(µ) in backward direction. −→ − The reaction rates or transition probabilities are readily calculated from the reaction rate parameters and the combinatorics of molecular encounters

nµ G x ! w+ = k+ j , and µ µ (µG) j=1 (xj νj )! Y − (5.30) nµQ xi! w− = k− . µ µ (µQ) i=1 (xi ν )! Y − i For large particle numbers and in macroscopic deterministic kinetics the com- binatorial expressions are approximated by the terms with the highest power 266 Peter Schuster in the particle numbers

nµ nµG µG Q µQ + + xj xi wµ kµ and wµ− kµ− . ≈ (µG) ≈ (µQ) j=1 νj ! i=1 ν ! Y Y i Finally, we obtain the master equation for an arbitrary chemical reaction step (5.27) of a reaction network

m ∂P (x, t) (µ) (µ) + = w−(x + r ) P (x + r , t) w (x) P (x, t)+ ∂t µ − µ µ=1 X (5.31) + (µ) (µ) + w (x r ) P (x r , t) w−(x) P (x, t) , µ − − − µ  or written in terms of the original parameters:

m n (µG) (µQ) x (xj + ν ν )! ∂P ( ,t) + j j (µG) (µQ) = kµ − P x + ν ν ,t ∂t (µQ) − − µ=1 (xj ν )! X jY=1 − j    n x ! j P x,t + (µG) − (xj ν )! ! jY=1 − j    m n (µG) (µQ) (xj ν + ν )! j j (µG) (µQ) + kµ− − P x ν + ν ,t (µG) − − µ=1 (xj ν )! X jY=1 − j    n x ! j P x,t . (5.31’) − (µQ) (xj ν )! ! jY=1 − j    In equations (5.28) and (5.31) steps of any size α are permitted, the single- step birth-and-death master equation for the chemical reaction network, how- ever, is tantamount to the restriction α = 1 (see also subsection 4.4.3).

Generating functions. For combinatorial kinetics we can derive a fairly simple differential equation for the probability generation function:

xmax n xj g(s, t) = sj P (x, t) (5.32) x=0 j=1 ! X Y (max) (max) (max) with xmax = x1 , x1 ,...,xn being a vector collecting the (in- dividually) maximal values of particle numbers for individual species and Stochastic Kinetics 267

0 = (0, 0,..., 0) being the zero-vector. Now we shall calculate the corre- sponding partial differential equation. For this goal we separate the two parts corresponding to step-up and step-down functions as expressed by the + transition probabilities wµ and wµ−, respectively:

∂g(s, t) ∂+g(s, t) ∂ g(s, t) = + − ∂t ∂t ∂t As follows from equation (5.30) the corresponding equations are of the form

n + µG µ ∂ g(s, t) + (xj rj )! xj µ = kµ − sj P (x r , t) ∂t µ (µG) − − µ,x j=1 (xj rj νj )!  X Y − − (5.33) nµG xj! xj sj P (x, t) , − (µG) j=1 (xj νj )! ! Y −  whereby the step-down equation (∂−g(s, t)/∂t) has the same structure and the two equations are related by replacing the superscripts, + , the 7 ↔ − running index, i j, and numbers of reactions, nµG nµQ , and the stoi- ↔ µ ↔ chiometric coefficients νµG ν Q . Next we change the summation variable j ↔ i in the first term from x to x r(µ) and obtain, because the summation is − extended over the entire domain of x and all probabilities for x-values outside vanish:

nµ nµ + G (µ) G ∂ g(s,t) + xj! xj +rj xj! xj = kµ sj sj P (x,t) , ∂t (µG) − (µG) µ,x (xj ν )! (xj ν )! ! X jY=1 − j jY=1 − j For further simplification we make use of the easy to verify expressions

(µ ) xj G νj (µG) sj xj! ∂ xj νj = (µ ) sj sj and (µG) ν G j (xj νj )! j ∂x j ! Y − Y j (µ) (µ ) xj +rj G (µ ) νj Q sj xj! ∂ xj νj = (µ ) sj sj , (µG) ν G j (xj νj )! j ∂x j ! Y − Y j 7 The numbers nµG and nµQ need not be the same in case some reactions are considered to proceed irreversibly. An alternative approach uses nµG = nµQ and zero values for some rate parameters. 268 Peter Schuster and obtain for the step-up equation

(µG) + (µ ) (µ ) ν s Q ν G j s ∂ g( , t) + νi j ∂ g( , t) = kµ si sj (µ ) . ∂t − ν G µ i j ! j X Y Y ∂xj A similar formula is derived for the step-down equation and summation of the step-up and the step-down equation finally yields the general expression of the differential equation for the generating function of chemical networks:

(µ ) (µ ) ∂g(s, t) ν Q ν G = s i s j ∂t i − j × µ i j X Y Y  (µG) (µQ) (5.34) νj νi + ∂ ∂ kµ (µ ) kµ− (µ ) g(s, t) . × ν G − ν Q  j j i i ! Y ∂xj Y ∂xi We illustrate by means of examples.

Two reactions with a single variable. The two-step reaction mechanism

k1 A + X 2 X + D and −−−→ k2 (5.35) B + X −−−→ C . ←−−− k3 consists of an irreversible autocatalytic reaction step and a reversible addition reaction. The concentrations of three molecular species are assumed to be

fixed: [A] = a0, [B] = b0, and [C] = c0, species D does not enter the kinetic equations since it is the product of an irreversible reaction and hence, only a single variable x = [X] has to be considered. Among other applications this model is a simple implementation of the processes taking place in a nuclear reactor. The first reaction represents nuclear fission: One neutron (X) hits a nucleus A and releases to neutrons (2X) and residual products (D). The second reaction describes absorption of neutrons by B and creation by the reverse process. The three stoichiometric matrices have a very simple form – n = 1 and m = 2 – in this case:

G = (1 1) , Q = (2 0) , and S = Q G = (1 1) . − − Stochastic Kinetics 269

The constant concentrations can be absorbed into the rate parameters and we have

+ + k1 = k1 a0 = α, k1− = 0 , k2 = k2 b0 = β, k2− = k3 c0 = γ .

From equation (5.34) we obtain by insertion and some calculation:

∂g(s, t) ∂g(s, t) = (1 s)(β αs) (1 s) γg(s, t) . (5.36) ∂t − − ∂s − − Formal division by dg yields the characteristic equations dt ds dg = = , 1 − (1 s)(β αs) γ(1 s) g − − − leading to two ODEs ds 1 dg ds dt = and = − (1 s)(β αs) − γ g β αs − − − that can be readily integrated and yield the solutions

1 s (α β)t γ/α u = − e − and v = (β αs) g . β αs − − The general solution can now be written v = f(u) where f(u) is some function still to be determined

γ/α 1 s (α β)t g(s, t) = (β αs)− f − e − . − β αs  −  ! This equation sustains a variety of time-dependent solutions. From the condi- tional probability P (x, t x , 0) that represents the initial conditions we obtain | 0 g(s, 0) = sx0 and

x0 x0/α x0 γ/α f(z) = (1 βz) (1 αz)− − (β α) − − − and with λ = β α the probability generating function takes on the form − x0 γ/α λt λt g(s, t) = λ β(1 e− ) s(α βe− ) − − − × (5.37)   γ/α x0 λt λt − − (β αe− ) αs(1 e− ) . × − − −   270 Peter Schuster

First and second moment for the probability distribution of the random vari- able (t) are readily computed from the derivatives of the generating func- X tion by means of equation (2.71): ∂g(s, t) ∂2g(s, t) = x(t) and = x(t) x(t) 1 . (2.71’) ∂s s=1 h i ∂s2 s=1 −

 The computation then yields for the time derivatives d x(t) h i = (α β) x(t) + γ and dt − h i d x(t) x(t) 1 − = 2(α β) x(t) x(t) 1 + 2(α + γ) x(t) dt  − − h i For α < β the two equations have a stationary solution and the stationary mean and variance are γ βγ x¯ = and σ2(¯x) = . (5.38) h i β α (α β)2 − − Equation (5.37) allows for straightforward calculation of stationary states

γ/α x0 γ/α x0 lim g(s, t) = (β α) (β sα) (β sα)− − , t →∞ − − − β α γ/α g¯(s) = − , and β sα  −  Γ(x + γ/α)(α/β)x P¯(x) = (β α)γ/α . (5.39) Γ(γ/α) x! − A stationary solution exist only if α < β or [A] β the system is unstable in the sense that x(t) diverges exponentially, h i limt x(t) = . Expectation value and variance at the steady state are →∞ h i ∞ readily computed from (∂g¯(s)/∂s) and (∂2g¯(s)/∂s2) and, of course, |s=1 |s=1 yield the same result as in equation (5.38).

Interesting phenomena arise when α = k1a0 approaches β = k2b0, this is when the number of particles X created by reaction one approximates the number of particles X, which are annihilated in reaction two. Since both quantities α and β involve the input concentrations of A and B, respectively, the critical quantity β α can easily fine tuned in experiments. It is inter- − esting that both moments become very large when the critical value α = β Stochastic Kinetics 271 in approached since

σ2(¯x) β k b = = 2 0 for α β , x¯ β α k b k a → ∞ → h i − 2 0 − 1 0 and accordingly we are dealing with very large fluctuations in x¯ near the h i critical point. Since the system is Markovian and the differential equation for the expectation value is linear we have

2 α β 2 k1a0 k2b0 x(t), x(0) = σ (¯x) e − = σ (¯x) e − , h i and the relaxation of fluctuations becomes very slow near the critical point, a phenomenon that is known as critical slowing down.

5.3.3 The formalism of the Poisson representation

The basic assumption is that a probability distribution P (x, t) can be ex- panded as a superposition of uncorrelated, multivariate Poisson distributions:

x e αk α k P (x, t) = dα − k f(α, t) , (5.40) xk! Z Yk where f(α, t) is the quasiprobability of the Poisson representation, and then the probability generating function g(s, t) can be written as

g(s, t) = dα exp (sk 1) αk f(α, t) . (5.41) − ! Z Xk Substitution of g(s, t) into the probability generating function in equation (5.34), which stems from the general master equation (5.31’) for a chemical reaction, leads to

(µ ) (µ ) ν Q ν G ∂g(s, t) ∂ i ∂ j = dα +1 +1 ∂t ∂α − ∂α × µ i i j j X Z Y   Y    (µ ) (µ ) Q ν G + νi j Σ (s 1)α k α k− α e k k− k f(α, t) . × µ i − µ j i j !  Y Y  272 Peter Schuster

Integration by parts, neglect of surface terms and finally comparison of co- efficients in the exponential functions yields [6, pp.301,302]

(µ ) (µ ) ν Q ν G ∂f(α, t) ∂ i ∂ j = 1 1 ∂t − ∂α − − ∂α × µ i i j j ! X Y  Y  (5.42) (µ ) (µ ) Q ν G + νi j k α k− α f(α, t) . × µ i − µ j i j  Y Y  The function f(α, t) is obtained as solution of equation (5.42) and insertion into equation (5.40) yields the approximation to the probability distribution P (x, t). The introduction of a reaction flux J(α) with the components

(µ ) (µ ) Q ν G + νi j Jµ(α) = k α k− α (5.43) µ i − µ j i j Y Y facilitates the formulation of a Fokker-Planck equation. If the repertoire of elementary steps does not contain reaction orders larger than two or at maximum bimolecular reactions – which is almost always the case in realistic systems – then the Fokker-Planck equation contains no higher derivatives than second order:

n ∂f(α, t) ∂ µ = A Jµ(α) f(α, t)+ ∂t − ∂α i i=1 i µ ! X X 1 n ∂2 + B J(α) f(α, t) with 2 ∂α ∂α ij i,j=1 i j X  (µ ) (µ ) (5.44) Aµ = ν Q ν G and i i − i (µQ) (µQ) (µG) (µG) Bij J(α) = δi,j νi (νi 1) νi (νi 1) Jµ(α)+ µ − − −  X  (µQ) (µQ) (µG) (µG) + (1 δ ) ν ν ν ν Jµ(α) . − i,j i j − i j µ X  For practical purposes it is much more convenient not to work with the Fokker-Planck equation (5.44) but with the equivalent stochastic differential Stochastic Kinetics 273 equation in It¯oform: n µ dαi = Ai Jµ(α)dt + cij J(α) dW (t) with µ j=1 (5.45) X X  cij J(α) = B J(α) , ij where dW (t) is a differential   Wiener process. In order to calculate quantities of interest from equation (5.45) in the inverse power of the system size V , it is useful to define + α + kµ kµ− η = , κµ = (µ ) , and κµ− = (µ ) . V ν Q +1 ν G +1 V i i V i i and the stochastic differentialP equation takes on the formP n µ 1 dαi = Ai Jµ(α)dt +  cij J(α) dW (t) with  = . µ j=1 V X X  We illustrate by means of two examples. p

The monomolecular reversible reaction. Here, we consider the monomolecular reversible reaction

$$\mathrm{X}\;\overset{k_{1}}{\underset{k_{2}}{\rightleftharpoons}}\;\mathrm{Y} \tag{5.46}$$
as an exercise in two variables, although mass conservation would allow for the elimination of one variable. It can be described by the master equation
$$\frac{\partial P(x,y,t)}{\partial t} \;=\; k_{1}\Big[(x+1)\,P(x+1,y-1,t) - x\,P(x,y,t)\Big] \;+\; k_{2}\Big[(y+1)\,P(x-1,y+1,t) - y\,P(x,y,t)\Big]\,, \tag{5.47}$$
where P(x, y, t) = Prob($\mathcal{X}(t) = x,\; \mathcal{Y}(t) = y$).⁸ Now we expand P(x, y, t) in Poisson distributions according to equation (5.40), whereby $\mathcal{N}$ is a normalization factor and the region of integration is still to be determined:
$$P(x,y,t) \;=\; \mathcal{N}\int d\alpha_{x}\,d\alpha_{y}\;\frac{e^{-\alpha_{x}}\alpha_{x}^{x}}{x!}\;\frac{e^{-\alpha_{y}}\alpha_{y}^{y}}{y!}\; f(\alpha_{x},\alpha_{y},t)\,. \tag{5.48}$$

⁸ It is important to keep in mind that, depending on the experimental setup, the stochastic variables $\mathcal{X}$ and $\mathcal{Y}$ may be dependent or independent. In closed systems we have the conservation relation $\mathcal{X} + \mathcal{Y} = N$ or x + y = n with N and n being constants.

In general, a Poisson representation in terms of a function f(α, t) need not exist but it can be proven to exist in terms of generalized functions:

Any distribution in x can be realized as a linear combination of Kronecker deltas δ_{x,z} (the n-th derivative of the delta function is denoted by δⁿ), which can be chosen to fulfill
$$\delta_{x,z} \;=\; \int d\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\;\delta^{z}(-\alpha)\, e^{\alpha}\,.$$
Now we choose f_z(α) = (−1)ᶻ δᶻ(α) e^α and find through integration by parts
$$\int d\alpha\; f_{z}(\alpha)\,\frac{e^{-\alpha}\alpha^{x}}{x!} \;=\; \int d\alpha\;\delta(\alpha)\left(-\frac{d}{d\alpha}\right)^{z}\frac{\alpha^{x}}{x!} \;=\; \delta_{x,z}\,,$$
and for a one-variable probability distribution we can write

$$P(x) \;=\; \int d\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\left(\sum_{z}(-1)^{z}\,P(z)\,\delta^{z}(\alpha)\, e^{\alpha}\right)\,,$$
and in a formal sense a function f(α) can always be found for any P(x).

Commonly, however, one need not rely on such rather bizarre distributions as the one that has been used for the proof of existence of a Poisson quasiprobability. If the quasiprobability vanishes at the boundary of the region of integration, substitution of (5.48) into (5.47) and integration by parts yields the Fokker-Planck equation for f(α_x, α_y, t)

$$\frac{\partial f(\alpha_{x},\alpha_{y},t)}{\partial t} \;=\; \left(\frac{\partial}{\partial\alpha_{x}} - \frac{\partial}{\partial\alpha_{y}}\right)\big(k_{1}\alpha_{x} - k_{2}\alpha_{y}\big)\, f(\alpha_{x},\alpha_{y},t)\,. \tag{5.49}$$
It is important to note that the diffusion coefficient in equation (5.49) is zero, and this is generally observed for all first-order reactions giving rise to linear kinetic equations: all fluctuations obey the Poissonian law of noise as it is encapsulated in the Poisson distribution. The range of integration in equation (5.48) follows from the solution of (5.49) by searching for the manifold in the (α_x, α_y) plane as the boundary on which f(α_x, α_y) vanishes. The general steady state solution satisfying the boundary condition of vanishing f(α_x, α_y) at the limits of the range of integration can be written

$$\bar{f}(\alpha_{x},\alpha_{y}) \;=\; \delta\big(k_{1}\alpha_{x} - k_{2}\alpha_{y}\big)\,\phi(\alpha_{x},\alpha_{y})\,, \tag{5.50}$$

where φ(α_x, α_y) still is some arbitrary function.¹⁰ If we choose, for example, φ(α_x, α_y) = δ(α_x − x̄), the corresponding steady state solution P̄(x, y) is
$$\bar{P}(x,y) \;=\; \mathcal{N}\int d\alpha_{x}\,d\alpha_{y}\;\frac{e^{-\alpha_{x}}\alpha_{x}^{x}}{x!}\;\frac{e^{-\alpha_{y}}\alpha_{y}^{y}}{y!}\;\delta\big(k_{1}\alpha_{x} - k_{2}\alpha_{y}\big)\,\delta(\alpha_{x} - \bar{x})\,.$$
The range of integration is any region that contains the point where the arguments of both delta functions vanish. Eventually we find for the equilibrium distribution
$$\bar{P}(x,y) \;=\; \frac{\bar{x}^{x}\,\bar{y}^{y}}{x!\,y!}\; e^{-(\bar{x}+\bar{y})}\,, \tag{5.51}$$
which is a Poisson distribution in x and y wherein the values x̄ and ȳ are related by the condition for the deterministic equilibrium: k₁x̄ − k₂ȳ = 0. Alternatively, we could use

$$\phi(\alpha_{x},\alpha_{y}) \;=\; (-1)^{n}\,\delta^{n}(\alpha_{y})\; e^{\alpha_{x}+\alpha_{y}}\,.$$
Then we obtain for the stationary probability distribution
$$\bar{P}(x,y) \;=\; \frac{n!}{n^{n}}\,\frac{1}{x!}\,\frac{1}{y!}\,\bar{x}^{x}\,\bar{y}^{y}\;\delta(x+y-n) \quad\text{and, with } y = n - x, \quad \bar{P}(x) \;=\; \frac{1}{n^{n}}\binom{n}{x}\,\bar{x}^{x}\,(n-\bar{x})^{(n-x)}\,, \tag{5.52}$$
which is a binomial distribution. In this case the condition for the deterministic equilibrium is k₁x̄ = k₂ȳ = n k₁k₂/(k₁ + k₂). The two choices correspond to the grand canonical and the canonical ensemble, respectively, and we see nicely that the latter is more restricted than the former: in the closed system the total number of molecules X and Y is constant, $\mathcal{X} + \mathcal{Y} = N$.

¹⁰ An alternative but equivalent ansatz for the stationary solution is f̄(α_x, α_y) = g(α_x + α_y)/(k₁α_x − k₂α_y), but because of the pole at k₁α_x = k₂α_y it requires integration in the complex plane (subsection 5.3.4). If integration is restricted to a region along the real axis, (5.50) is unique.
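The binomial stationary distribution (5.52) of the closed system can also be checked by direct simulation. The following sketch uses Gillespie's direct method in a minimal form for the reversible monomolecular reaction (5.46); the rate constants and the total particle number n are arbitrary illustrative choices.

import numpy as np
from math import comb

rng = np.random.default_rng(1)
k1, k2, n = 1.0, 2.0, 50            # rate constants of (5.46) and total X + Y = n
x = n                                # start with all molecules in state X

t, t_end, t_burn = 0.0, 5000.0, 50.0
weights = np.zeros(n + 1)            # time-weighted histogram of x

while t < t_end:
    a1, a2 = k1 * x, k2 * (n - x)    # propensities of X -> Y and Y -> X
    a0 = a1 + a2
    tau = rng.exponential(1.0 / a0)
    if t > t_burn:
        weights[x] += min(tau, t_end - t)
    x += -1 if rng.random() < a1 / a0 else 1
    t += tau

p_sim = weights / weights.sum()

# Binomial stationary distribution (5.52) with xbar = n k2/(k1 + k2)
xbar = n * k2 / (k1 + k2)
p_theory = np.array([comb(n, k) * xbar**k * (n - xbar)**(n - k) / n**n
                     for k in range(n + 1)])

print("mean (simulation, theory):", weights @ np.arange(n + 1) / weights.sum(), xbar)
print("max deviation of P(x)    :", np.abs(p_sim - p_theory).max())

The time-weighted histogram converges to (5.52), a binomial distribution with mean x̄ = n k2/(k1 + k2), in accordance with the canonical-ensemble interpretation given above.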

Two reactions with a single variable. The reaction mechanism studied here is closely related to the nuclear reactor model (5.35):

$$\mathrm{A} + \mathrm{X}\;\overset{k_{1}}{\underset{k_{2}}{\rightleftharpoons}}\;2\,\mathrm{X} \qquad\text{and}\qquad \mathrm{B} + \mathrm{X}\;\overset{k_{3}}{\underset{k_{4}}{\rightleftharpoons}}\;\mathrm{C}\,. \tag{5.53}$$
Both reactions are assumed to be reversible, and the amounts of the three molecular species A, B, and C are assumed to be constant,¹¹ [A] = a0, [B] = b0, and [C] = c0. The stoichiometric matrices for n = 1 and m = 2 have a very simple form:

$$G = (1\;\;1)\,,\qquad Q = (2\;\;0)\,,\qquad\text{and}\qquad S = Q - G = (1\;\;{-1})\,.$$
The constant concentrations can be absorbed into the rate parameters and we have
$$k_{1}^{+} = k_{1}a_{0}\,,\qquad k_{1}^{-} = k_{2}\,,\qquad k_{2}^{+} = k_{3}b_{0}\,,\qquad k_{2}^{-} = k_{4}c_{0}\,.$$
Equation (5.42) for the function f(α, t) in the Poisson representation now takes on the form

$$\frac{\partial f(\alpha,t)}{\partial t} \;=\; \left[\left(1-\frac{\partial}{\partial\alpha}\right)^{2} - \left(1-\frac{\partial}{\partial\alpha}\right)\right]\big(k_{1}a_{0}\alpha - k_{2}\alpha^{2}\big)\, f(\alpha,t) \;+\; \left[1 - \left(1-\frac{\partial}{\partial\alpha}\right)\right]\big(k_{3}b_{0}\alpha - k_{4}c_{0}\big)\, f(\alpha,t)\,,$$
eventually leading to

$$\frac{\partial f(\alpha,t)}{\partial t} \;=\; \left(-\frac{\partial}{\partial\alpha}\Big[k_{4}c_{0} + (k_{1}a_{0} - k_{3}b_{0})\,\alpha - k_{2}\alpha^{2}\Big] \;+\; \frac{\partial^{2}}{\partial\alpha^{2}}\big(k_{1}a_{0}\alpha - k_{2}\alpha^{2}\big)\right) f(\alpha,t)\,. \tag{5.54}$$

¹¹ We remark that the mechanism (5.53) is compatible with equilibrium thermodynamics if and only if the relation (k₁a₀ · k₃b₀ − k₂ · k₄c₀) = ϑ = 0 is fulfilled.

This equation is of Fokker-Planck form as long as D = k₁a₀α − k₂α² > 0 is fulfilled. The critical values of α where the expression for D vanishes are

α₁ = 0 and α₂ = k₁a₀/k₂; between these two values we have D > 0 and a conventional Fokker-Planck equation exists. For one variable the factorial moments are obtained from the simple relationship (2.75d) and they are of the general form

$$\big\langle x(x-1)\cdots(x-r+1)\big\rangle \;\equiv\; \sum_{x} x(x-1)\cdots(x-r+1)\int d\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\, f(\alpha) \;=\; \int d\alpha\;\alpha^{r} f(\alpha) \;\equiv\; \langle\alpha^{r}\rangle\,. \tag{5.55}$$
There is, however, one caveat: the quasiprobability f(α) need not fulfil the conditions required for a probability. The stationary solution of the Fokker-Planck equation (5.54), up to normalization, is

$$\bar{f}(\alpha) \;=\; \mathcal{N}\,\big(k_{1}a_{0} - k_{2}\alpha\big)^{\,k_{3}b_{0}/k_{2}\,-\,k_{4}c_{0}/(k_{1}a_{0})\,-\,1}\;\alpha^{\,k_{4}c_{0}/(k_{1}a_{0})\,-\,1}\; e^{\alpha}\,. \tag{5.56}$$
Although this is a fairly smooth function, the prerequisites for a probability density – f̄(α) being nonnegative everywhere and normalizable – need not be fulfilled. With ϑ = (k₃b₀/k₂ − k₄c₀/(k₁a₀)), normalization is possible on the interval ]0, k₁a₀/k₂[ provided ϑ > 0 and k₄ > 0 are fulfilled. In addition, one has to check whether the surface terms vanish under these conditions, which is the case in the current example, and thus there exists a genuine Fokker-Planck equation for ϑ > 0 and k₄ > 0 that is equivalent to the stochastic differential equation
$$d\alpha \;=\; \Big[k_{4}c_{0} + (k_{1}a_{0} - k_{3}b_{0})\,\alpha - k_{2}\alpha^{2}\Big]\, dt \;+\; \sqrt{2\,\big(k_{1}a_{0}\alpha - k_{2}\alpha^{2}\big)}\; dW(t)\,. \tag{5.57}$$
The domain of the variable is 0 < α < k₁a₀/k₂.
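For ϑ > 0 the stationary quasiprobability (5.56) can be normalized numerically on ]0, k₁a₀/k₂[, and the factorial-moment relation (5.55) then yields the moments of x directly, since ⟨x⟩ = ⟨α⟩ and ⟨x(x−1)⟩ = ⟨α²⟩. The following sketch uses simple midpoint quadrature; the parameter values are arbitrary choices for which ϑ > 0 and the exponents in (5.56) are benign.

import numpy as np

# Illustrative parameters with theta = k3 b0/k2 - k4 c0/(k1 a0) > 0
k1a0, k2, k3b0, k4c0 = 2.0, 1.0, 4.0, 2.0

def fbar(alpha):
    """Unnormalized stationary quasiprobability of equation (5.56)."""
    e1 = k3b0/k2 - k4c0/k1a0 - 1.0        # exponent of (k1 a0 - k2 alpha)
    e2 = k4c0/k1a0 - 1.0                  # exponent of alpha
    return (k1a0 - k2*alpha)**e1 * alpha**e2 * np.exp(alpha)

# Midpoint rule on the open interval ]0, k1 a0/k2[
m = 200000
edges = np.linspace(0.0, k1a0/k2, m + 1)
alpha = 0.5*(edges[:-1] + edges[1:])
w = np.diff(edges) * fbar(alpha)
w /= w.sum()                              # normalization

mean_alpha  = np.sum(w * alpha)           # <x>        (first factorial moment)
mean_alpha2 = np.sum(w * alpha**2)        # <x(x-1)>   (second factorial moment)
var_x = mean_alpha2 - mean_alpha**2 + mean_alpha

print("<x>          =", mean_alpha)
print("sigma^2(x)   =", var_x)
print("Fano factor  =", var_x / mean_alpha)   # exceeds one for a nonnegative quasiprobability

Because f̄(α) is nonnegative on the whole interval in this case, the variance of α adds to the Poissonian contribution ⟨α⟩ and the resulting Fano factor σ²(x)/⟨x⟩ exceeds one.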

5.3.4 Real, complex, and positive Poisson representations

In the case of a single variable only, the Poisson representation of the probability distribution P(x, t) is of the form

$$P(x,t) \;=\; \int_{\mathcal{D}} d\mu(\alpha)\;\frac{e^{-\alpha}\alpha^{x}}{x!}\; f(\alpha,t)\,, \tag{5.58}$$
where µ(α) is a measure that can be chosen in different ways and $\mathcal{D}$ is the domain of integration, which can take on various forms depending on the choice of µ(α).

In real Poisson representations the choice of the measure is dµ(α) = dα and $\mathcal{D}$ is a section [a, b] of the real line. As we have seen in the second example of the previous subsection 5.3.3, there may be situations where the diffusion coefficient becomes negative and then a real Poisson representation does not exist.

For complex Poisson representations we choose again the simple measure dµ(α) = dα but $\mathcal{D}$ is a contour C in the complex plane. In order to analyze the existence of a complex Poisson representation we choose

$$f_{z}(\alpha) \;=\; \frac{z!}{2\pi i}\;\alpha^{-z-1}\, e^{\alpha}$$
and C being a contour surrounding the origin, instead of the expression f_z(α) = (−1)ᶻ δᶻ(α) e^α used previously for the real representation. Then we can show that

$$P_{z}(x) \;=\; \frac{1}{2\pi i}\,\frac{z!}{x!}\oint_{C} d\alpha\;\alpha^{x-z-1} \;=\; \delta_{x,z}\,.$$
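The statement P_z(x) = δ_{x,z} is easily confirmed numerically by evaluating the contour integral on a circle around the origin; the radius and discretization in the following sketch are arbitrary.

import numpy as np
from math import factorial

def P_z(x, z, radius=1.5, m=4000):
    """Numerical version of (1/2 pi i)(z!/x!) oint alpha^(x-z-1) d alpha on a circle."""
    theta = np.linspace(0.0, 2*np.pi, m, endpoint=False)
    alpha = radius * np.exp(1j*theta)
    dalpha = 1j * alpha * (2*np.pi/m)        # d alpha along the circle
    return (factorial(z)/factorial(x)) * np.sum(alpha**(x - z - 1) * dalpha) / (2j*np.pi)

for z in range(4):
    print(np.round([P_z(x, z).real for x in range(4)], 6))   # rows of the identity matrix

For every z the computed row reproduces the corresponding unit vector, independently of the chosen radius, as expected from the residue theorem.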

By appropriate summation we may express a given probability distribution P (x) in terms of functions f(α) which are now given by

$$f(\alpha) \;=\; \frac{1}{2\pi i}\sum_{z} P(z)\, z!\; e^{\alpha}\,\alpha^{-z-1}\,. \tag{5.59}$$
Provided the P(z) have the property that z!P(z) is bounded for all z, the series has a finite radius of convergence outside which f(α) is analytic.

By choosing the contour C to lie outside this circle of convergence, the integration can be taken inside the summation and we find that P(x) is finally given by
$$P(x) \;=\; \oint_{C} d\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\; f(\alpha)\,. \tag{5.60}$$
Equation (5.60) is the analogue of the real representation in the complex plane and, for example, can be used for functions f(α) that have poles on the real axis.

Figure 5.6: Definition of the contour C in the complex plane. The sketch refers to the reaction mechanism (5.53). The contour C is chosen to enclose the pole on the real axis at α = k₁a₀/k₂. The range on the real axis that is accessible to the real Poisson representation, the stretch ]0, k₁a₀/k₂[, is shown as a full line.

The complex Poisson representation allows for the calculation of stationary solutions in analytic form to which exact or asymptotic methods are easily applicable. In general, it is not so useful for the derivation of time-dependent solutions.

Two reactions with a single variable. Again we adopt the reaction model (5.53) of subsection 5.3.3 with the notation introduced there. When a steady state exists, the quantity ϑ = (k₃b₀/k₂ − k₄c₀/(k₁a₀)) provides a measure for the relative direction in which the two reactions are progressing: ϑ > 0 implies that the first reaction produces and the second reaction consumes X, at ϑ = 0 both reactions balance separately and we have thermodynamic equilibrium (see the footnote to the mechanism (5.53)), and ϑ < 0 implies that the first reaction consumes X whereas the second reaction produces it.¹²

¹² This example demonstrates nicely the difference between stationarity and detailed balance: all three cases describe steady states, but only the condition ϑ = 0, the state of thermodynamic equilibrium, is compatible with detailed balance.

Now we discuss the three conditions separately:

(i) ϑ > 0: This case has been analyzed in subsection 5.3.3. The conditions for f̄(α) to be a valid quasiprobability on the real interval ]0, k₁a₀/k₂[ are fulfilled. Within this range the diffusion coefficient D = k₁a₀α − k₂α² is positive and the deterministic mean of α, given by
$$\bar{\alpha} \;=\; \langle\alpha\rangle \;=\; \frac{k_{1}a_{0} - k_{3}b_{0} + \sqrt{(k_{1}a_{0} - k_{3}b_{0})^{2} + 4k_{2}k_{4}c_{0}}}{2k_{2}}\,,$$
lies within the interval under consideration, ]0, k₁a₀/k₂[. Thus we are dealing with a genuine Fokker-Planck equation, and f̄(α) is a function vanishing at both ends of the interval and having its maximum near the deterministic mean.

(ii) ϑ = 0: Both reactions balance separately and the existence of a Poissonian steady state is expected. The quasiprobability f̄(α) has a pole at α = k₁a₀/k₂ and the range of α is chosen to be a contour C in the complex plane enclosing this pole. No boundary terms will arise for a closed contour C through partial integration, and hence P̄(x) resulting from this type of Poissonian representation satisfies the steady state master equation. By calculus of residues we find that

$$\bar{P}(x) \;=\; \frac{e^{-\alpha_{0}}\alpha_{0}^{x}}{x!} \qquad\text{with}\qquad \alpha_{0} = \frac{k_{1}a_{0}}{k_{2}}\,. \tag{5.61}$$

(iii) ϑ < 0: The steady state no longer satisfies the condition ϑ > 0. If, however, the range of α is chosen to be a contour C in the complex plane as shown in figure 5.6, the complex Poisson representation can be used to construct a steady state solution of the master equation that has the form
$$\bar{P}(x) \;=\; \oint_{C} d\alpha\; \bar{f}(\alpha)\,\frac{e^{-\alpha}\alpha^{x}}{x!}\,.$$
Now the deterministic steady state corresponds to a point on the real axis that is situated to the right of the singularity at α = k₁a₀/k₂, and asymptotic evaluation of means, higher moments, and other quantities may be performed by choosing C to pass through the saddle point occurring there. Then the variance σ²(α) = ⟨α²⟩ − ⟨α⟩² is negative. This has the consequence that the variance in x, which is obtained as the variance in α plus the variance of the Poisson distribution, ⟨α⟩,
$$\sigma^{2}(x) \;=\; \langle x^{2}\rangle - \langle x\rangle^{2} \;=\; \langle\alpha^{2}\rangle - \langle\alpha\rangle^{2} + \langle\alpha\rangle\,, \tag{5.62}$$
is smaller than the variance of the Poissonian: σ²(x) < ⟨x⟩. In other words, the steady state distribution is narrower than the Poissonian distribution.

Finally, we stress that all three cases may be obtained from the contour C. For ϑ = 0 the cut runs from the singularity at α = k₁a₀/k₂ to α = −∞, and C can be distorted to a simple contour around the pole. If ϑ > 0 the singularity becomes integrable and the contour C may be collapsed onto the cut. The integral to be evaluated is a discontinuity integral over the full range [0, k₁a₀/k₂] that might need modifications for ϑ being a positive integer.
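The prediction that the stationary distribution is narrower than a Poissonian for ϑ < 0 can be probed by a direct stochastic simulation of mechanism (5.53). The sketch below runs Gillespie's direct method with particle-number rate parameters c1, ..., c4 playing the roles of k₁a₀, k₂, k₃b₀, and k₄c₀; all numerical values are illustrative, the combinatorial propensity x(x−1) is used for the step 2X → A + X, and the parameters are chosen such that the analogue of ϑ is negative, so the estimated Fano factor σ²(x)/⟨x⟩ is expected to come out below one.

import numpy as np

rng = np.random.default_rng(7)

# Illustrative particle-number rates with c3/c2 < c4/c1, i.e. analogue of theta < 0
c1, c2, c3, c4 = 2.0, 0.02, 1.0, 150.0

x = 100                                   # initial number of X molecules
t, t_end, t_burn = 0.0, 1000.0, 20.0
sum_w = sum_x = sum_x2 = 0.0              # time-weighted moments after burn-in

while t < t_end:
    a1 = c1 * x                           # A + X -> 2X   (x -> x + 1)
    a2 = c2 * x * (x - 1)                 # 2X -> A + X   (x -> x - 1)
    a3 = c3 * x                           # B + X -> C    (x -> x - 1)
    a4 = c4                               # C -> B + X    (x -> x + 1)
    a0 = a1 + a2 + a3 + a4
    tau = rng.exponential(1.0 / a0)
    if t > t_burn:
        sum_w += tau
        sum_x += tau * x
        sum_x2 += tau * x * x
    u = rng.random() * a0
    x += 1 if (u < a1 or u >= a1 + a2 + a3) else -1
    t += tau

mean = sum_x / sum_w
var = sum_x2 / sum_w - mean * mean
print("mean number of X :", mean)
print("Fano factor      :", var / mean)   # expected to lie below one here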

For the positive Poisson representation we choose α to be a genuine two-valued complex variable, α ≡ α_x + iα_y,
$$d\mu(\alpha) \;=\; d^{2}\alpha \;=\; d\alpha_{x}\, d\alpha_{y}\,,$$
and $\mathcal{D}$ being the entire complex plane. Then it can be proven [6, pp.309-312] that for any P(x) there exists a positive f(α) such that
$$P(x) \;=\; \int d^{2}\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\; f(\alpha)\,. \tag{5.63}$$
A positive Poisson representation, f_p(α), however, need not be unique, as we show by means of an example. We choose
$$f_{p}(\alpha) \;=\; \frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{|\alpha-\alpha_{0}|^{2}}{2\sigma^{2}}\right)$$
and g(α) an analytic function of α that can be expanded according to

$$g(\alpha) \;=\; g(\alpha_{0}) + \sum_{n=1}^{\infty} g^{(n)}(\alpha_{0})\,\frac{(\alpha-\alpha_{0})^{n}}{n!} \qquad\text{such that}\qquad \int d^{2}\alpha\;\frac{1}{2\pi\sigma^{2}}\exp\!\left(-\frac{|\alpha-\alpha_{0}|^{2}}{2\sigma^{2}}\right) g(\alpha) \;=\; g(\alpha_{0})\,,$$
since all terms with n ≥ 1 vanish upon integration. As the Poissonian e^{−α}α^x/x! itself is an analytic function, we find for any positive value of σ²
$$P(x) \;=\; \int d^{2}\alpha\;\frac{e^{-\alpha}\alpha^{x}}{x!}\; f_{p}(\alpha) \;=\; \frac{e^{-\alpha_{0}}\alpha_{0}^{x}}{x!}\,.$$
Nonuniqueness of this kind is an advantage in practice rather than a problem.
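This non-uniqueness statement can be verified by a small Monte Carlo experiment: sampling α from the Gaussian quasiprobability f_p centered at a real α₀ and averaging the analytic Poisson kernel reproduces the Poissonian with parameter α₀ for any σ. In the following sketch α₀, σ, and the sample size are arbitrary choices.

import numpy as np
from math import factorial

rng = np.random.default_rng(3)
alpha0, sigma, samples = 3.0, 1.0, 200000

# Complex samples distributed according to f_p(alpha) = exp(-|alpha-alpha0|^2/(2 sigma^2))/(2 pi sigma^2)
alpha = alpha0 + sigma*(rng.standard_normal(samples) + 1j*rng.standard_normal(samples))

for x in range(8):
    kernel = np.exp(-alpha) * alpha**x / factorial(x)   # analytic Poisson kernel
    estimate = kernel.mean()                             # imaginary parts average to ~0
    exact = np.exp(-alpha0) * alpha0**x / factorial(x)
    print(x, round(estimate.real, 4), round(exact, 4))

The estimates agree with e^{−α₀}α₀^x/x! up to Monte Carlo error, regardless of the width σ.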

As an example we consider again mechanism (5.53), which gives rise to an SDE in the complex plane, since α = α_x + iα_y:
$$d\alpha \;=\; \Big[k_{4}c_{0} + (k_{1}a_{0} - k_{3}b_{0})\,\alpha - k_{2}\alpha^{2}\Big]\, dt \;+\; \sqrt{2\,\big(k_{1}a_{0}\alpha - k_{2}\alpha^{2}\big)}\; dW(t)\,. \tag{5.64}$$
Again the term ϑ plays the decisive role: for ϑ > 0 the noise term vanishes at α = 0 and α = k₁a₀/k₂, is positive between these points, and the drift term takes care that α returns to the range ]0, k₁a₀/k₂[ whenever it reaches one of the endpoints. For ϑ > 0 equation (5.64) is the real stochastic differential equation (5.57) on the real interval [0, k₁a₀/k₂].

For ϑ < 0 the stationary point lies outside the interval [0,k1a0/k2], and a point inside the interval will migrate according to (5.64) along the interval until it reaches the right-hand end, where the noise vanishes but the drift continues to drive it further towards the right. On leaving the interval the noise becomes imaginary and the point will start to diffuse in the complex plane until it may eventually return again to the interval [0,k1a0/k2]. Thus, the behavior for the entire range of ϑ is encapsulated in a single SDE.
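This behaviour can be visualized with a crude Euler-Maruyama integration of equation (5.64) in the complex plane. The sketch below uses illustrative parameter values with ϑ < 0 and a small, fixed time step; it is only meant to show the qualitative excursions of α away from the real interval, not to serve as a quantitatively reliable integrator.

import numpy as np

rng = np.random.default_rng(11)

# Illustrative parameters with theta = k3 b0/k2 - k4 c0/(k1 a0) < 0
k1a0, k2, k3b0, k4c0 = 2.0, 1.0, 1.0, 4.0

def drift(a):
    return k4c0 + (k1a0 - k3b0)*a - k2*a*a

def noise(a):
    # complex square root: imaginary noise once alpha leaves ]0, k1 a0/k2[
    return np.sqrt(2.0*(k1a0*a - k2*a*a) + 0j)

dt, steps = 1.0e-4, 200000
alpha = 1.0 + 0j                                   # start inside ]0, k1 a0/k2[
path = np.empty(steps, dtype=complex)

for n in range(steps):
    dW = rng.standard_normal() * np.sqrt(dt)       # real Wiener increment
    alpha = alpha + drift(alpha)*dt + noise(alpha)*dW
    path[n] = alpha

print("mean of alpha (2nd half)      :", path[steps//2:].mean())
print("largest |Im alpha|            :", np.abs(path.imag).max())
print("fraction of steps with Im != 0:", np.mean(np.abs(path.imag) > 1e-6))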

Application to logistic growth. Based on the mechanism (5.53), an extensive stochastic analysis of the logistic model in population dynamics has been performed by Alexei and Peter Drummond [121, 123]. The deterministic dynamics is described by the differential equation
$$\frac{dx}{dt} \;=\; x\,(g - c\,x)\,,$$
which is identical with the logistic equation of Pierre-François Verhulst [124]. In this formulation, g is the unconstrained growth rate and g/c the carrying capacity of the ecosystem. The dynamical system has an equilibrium point at x̄ = g/c and a second stationary state at x̄ = 0. Drummond and Drummond add a death reaction in addition to the competition process allowing for limitation of the population size. The implementation of the logistic model makes use of three processes:

$$\text{Death:}\quad \mathrm{X}\;\overset{a}{\longrightarrow}\;\varnothing\,,\qquad \text{Birth:}\quad \mathrm{X}\;\overset{b}{\longrightarrow}\;2\,\mathrm{X}\,,\qquad \text{Competition:}\quad 2\,\mathrm{X}\;\overset{c}{\longrightarrow}\;\mathrm{X}\,. \tag{5.65}$$
Parametrization yields the three quantities: (i) the net growth rate g = b − a, (ii) the carrying capacity N_c = g/c, and (iii) the reproductive ratio r = b/a. Comparison with the mechanism (5.53) shows that this logistic model is a special case of it with the substitutions k₁a₀ → b, k₂ → c, k₃b₀ → a, and k₄c₀ → 0.
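A direct simulation of the three elementary processes (5.65) illustrates the stochastic logistic behaviour. The sketch below applies Gillespie's direct method with arbitrarily chosen values of a, b, and c and with the combinatorial propensity x(x−1) for the competition step; the time-averaged population size should lie close to the carrying capacity N_c = g/c.

import numpy as np

rng = np.random.default_rng(5)

a, b, c = 0.5, 1.0, 0.0025      # death, birth, competition rates (illustrative)
g = b - a                        # net growth rate
Nc = g / c                       # carrying capacity, here 200

x = 50                           # founder population
t, t_end, t_burn = 0.0, 500.0, 50.0
sum_w = sum_x = 0.0

while t < t_end and x > 0:       # x = 0 is absorbing (extinction)
    a_death = a * x
    a_birth = b * x
    a_comp  = c * x * (x - 1)
    a0 = a_death + a_birth + a_comp
    tau = rng.exponential(1.0 / a0)
    if t > t_burn:
        sum_w += tau
        sum_x += tau * x
    u = rng.random() * a0
    if u < a_birth:
        x += 1
    else:
        x -= 1                   # death and competition both remove one individual
    t += tau

print("time-averaged population:", sum_x / sum_w)
print("carrying capacity Nc    :", Nc)

The state x = 0 is absorbing, so every trajectory eventually goes extinct, but for the chosen carrying capacity the expected extinction time vastly exceeds the simulated time span, in line with the extinction-time analysis of [121].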

Bibliography

[1] D. T. Gillespie. A general method for numerically simulating the stochastic time evolution of coupled chemical reactions. J. Comp. Phys., 22:403–434, 1976.

[2] D. T. Gillespie. Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem., 81:2340–2361, 1977.

[3] D. T. Gillespie. A rigorous derivation of the chemical master equation. Physica A, 188:404–425, 1992.

[4] D. T. Gillespie. Stochastic simulation of chemical kinetics. Annu. Rev. Phys. Chem., 58:35–55, 2007.

[5] K. L. Chung. Elementary Probability Theory with Stochastic Processes. Springer-Verlag, New York, 3rd edition, 1979.

[6] C. W. Gardiner. Stochastic Methods. A Handbook for the Natural Sciences and Social Sciences. Springer Series in Synergetics. Springer-Verlag, Berlin, fourth edition, 2009.

[7] M. Kimura. The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK, 1983.

[8] D. S. Moore, G. P. McCabe, and B. Craig. Introduction to the Practice of Statistics. W. H. Freeman & Co., New York, sixth edition, 2009.

[9] H. Risken. The Fokker-Planck Equation. Methods of Solution and Applications. Springer-Verlag, Berlin, 2nd edition, 1989.

[10] M. Fisz. Wahrscheinlichkeitsrechnung und mathematische Statistik. VEB Deutscher Verlag der Wissenschaft, Berlin, 1989.

[11] H. Georgii. Stochastik. Einführung in die Wahrscheinlichkeitstheorie und Statistik. Walter de Gruyter GmbH & Co., Berlin, third edition, 2007.

[12] D. S. Moore and W. I. Notz. Statistics. Concepts and Controversies. W. H. Freeman & Co., New York, seventh edition, 2009.

[13] W. I. Notz and M. A. Fligner. Study Guide for Moore and McCabe’s Introduction to the Practice of Statistics. W. H. Freeman, New York, third edition, 1999.


[14] N. Henze. Stochastik für Einsteiger. Eine Einführung in die faszinierende Welt des Zufalls. Vieweg Verlag, Braunschweig, DE, fourth edition, 2003.

[15] M. S. Bartlett. An Introduction to Stochastic Processes with Special Reference to Methods and Applications. Cambridge University Press, Cambridge, UK, 3rd edition, 1978.

[16] D. R. Cox and H. D. Miller. The Theory of Stochastic Processes. Methuen, London, 1965.

[17] J. L. Doob. Stochastic Processes. John Wiley & Sons, New York, 1953.

[18] W. Feller. An Introduction to Probability Theory and its Applications, volume I and II. John Wiley, New York, 1966.

[19] N. S. Goel and N. Richter-Dyn. Stochastic Models in Biology. Academic Press, New York, 1974.

[20] R. L. Graham, D. E. Knuth, and O. Patashnik. Concrete Mathematics. Addison-Wesley, Reading, MA, 2nd edition, 1994.

[21] M. Iosifescu and P. Tăutu. Stochastic Processes and Application in Biology and Medicine. I. Theory, volume 3 of Biomathematics. Springer-Verlag, Berlin, 1973.

[22] M. Iosifescu and P. Tăutu. Stochastic Processes and Application in Biology and Medicine. II. Models, volume 4 of Biomathematics. Springer-Verlag, Berlin, 1973.

[23] S. Karlin and H. M. Taylor. A First Course in Stochastic Processes. Academic Press, New York, 1975.

[24] S. Karlin and H. M. Taylor. A Second Course in Stochastic Processes. Academic Press, New York, 1981.

[25] A. Stuart and J. K. Ord. Kendall’s Advanced Theory of Statistics. Volume 1: Distribution Theory. Charles Griffin & Co., London, fifth edition, 1987.

[26] A. Stuart and J. K. Ord. Kendall’s Advanced Theory of Statistics. Volume 2: Classical Inference and Relationship. Edward Arnold, London, fifth edition, 1991.

[27] A. Loman, I. Gregor, C. Stutz, M. Mund, and J. Enderlein. Measuring rotational diffusion of macromolecules by fluorescence correlation spectroscopy. Photochem. Photobiol. Sci., 9:627–636, 2010.

[28] M. Gösch and R. Rigler. Fluorescence correlation spectroscopy of molecular motions and kinetics. Advanced Drug Delivery Reviews, 57:169–190, 2005.

[29] S. T. Hess, S. Huang, A. A. Heikal, and W. W. Webb. Biological and chemical applications of fluorescence correlation spectroscopy: A review. Biochemistry, 41:697–705, 2002.

[30] J. Hohlbein, K. Gryte, M. Heilemann, and A. N. Kapanidis. Surfing on a new wave of single-molecule fluorescence methods. Phys. Biol., 7:031001, 2010.

[31] E. Haustein and P. Schwille. Single-molecule spectroscopic methods. Curr. Op. Struct. Biol., 14:531–540, 2004.

[32] R. Brown. A brief description of microscopical observations made in the months of June, July and August 1827, on the particles contained in the pollen of plants, and on the general existence of active molecules in organic and inorganic bodies. Phil. Mag., Series 2, 4:161–173, 1828.

[33] A. Einstein. Über die von der molekular-kinetischen Theorie der Wärme geforderte Bewegung von in ruhenden Flüssigkeiten suspendierten Teilchen. Annal. Phys. (Leipzig), 17:549–560, 1905.

[34] M. von Smoluchowski. Zur kinetischen Theorie der Brownschen Molekularbewegung und der Suspensionen. Annal. Phys. (Leipzig), 21:756–780, 1906.

[35] E. N. Lorenz. Deterministic nonperiodic flow. J. Atmospheric Sciences, 20:130–141, 1963.

[36] G. Mendel. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereins in Brünn, 4:3–47, 1866.

[37] R. A. Fisher. Has Mendel’s work been rediscovered? Annals of Science, 1:115–137, 1936.

[38] A. Franklin, A. W. F. Edwards, D. Fairbanks, D. Hartl, and T. Seidenfeld. Ending the Mendel-Fisher Controversy. University of Pittsburgh Press, Pittsburgh, PA, 2008.

[39] W. Penney. Problem: Penney-Ante. J. Recreational Math., 2(October):241, 1969.

[40] R. T. Cox. The Algebra of Probable Inference. The Johns Hopkins Press, Baltimore, MD, 1961.

[41] E. T. Jaynes. Probability Theory. The Logic of Science. Cambridge University Press, Cambridge, UK, 2003.

[42] G. Vitali. Sul problema della misura dei gruppi di punti di una retta. Tipi Gamberini e Parmeggiani, Bologna, 1905.

[43] P. Billingsley. Probability and Measure. Wiley-Interscience, New York, third edition, 1995.

[44] M. Carter and B. van Brunt. The Lebesgue-Stieltjes Integral. A Practical Introduction. Springer-Verlag, Berlin, 2007.

[45] D. Meintrup and S. Schäffler. Stochastik. Theorie und Anwendungen. Springer-Verlag, Berlin, 2005.

[46] G. de Beer, Sir. Mendel, Darwin, and Fisher. Notes and Records of the Royal Society of London, 19:192–226, 1964.

[47] K. Sander. Darwin und Mendel. Wendepunkte im biologischen Denken. Biologie in unserer Zeit, 18:161–167, 1988.

[48] T. Bayes and R. Price. An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M.A. and F.R.S. Phil. Trans. Roy. Soc. London, 53:370–418, 1763.

[49] C. P. Robert. The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation. Springer-Verlag, Berlin, 2007.

[50] P. M. Lee. Bayesian Statistics. An Introduction. Hodder Arnold, New York, third edition, 2004.

[51] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman & Hall / CRC, Boca Raton, FL, second edition, 2004.

[52] B. E. Cooper. Statistics for Experimentalists. Pergamon Press, Oxford, 1969.

[53] J. F. Kenney and E. S. Keeping. Mathematics of Statistics. Van Nostrand, Princeton, NJ, second edition, 1951.

[54] W. H. Press, B. P. Flannery, S. A. Teukolsky, and W. T. Vetterling. Numerical Recipes. The Art of Scientific Computing. Cambridge University Press, Cambridge, UK, 1986.

[55] J. F. Kenney and E. S. Keeping. The k-Statistics. In Mathematics of Statistics. Part I, § 7.9. Van Nostrand, Princeton, NJ, third edition, 1962.

[56] M. Evans, N. A. J. Hastings, and J. B. Peacock. Statistical Distributions. John Wiley & Sons, New York, third edition, 2000.

[57] N. A. Weber. Dimorphism of the African oecophylla worker and an anomaly (hymenoptera formicidae). Annals of the Entomological Society of America, 39:7–10, 1946.

[58] M. F. Schilling, A. E. Watkins, and W. Watkins. Is human height bimodal? The American Statistician, 56:223–229, 2002.

[59] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, first edition, 1983.

[60] D. Williams. Diffusions, Markov Processes and Martingales. Volume 1: Foundations. John Wiley & Sons, Chichester, UK, 1979.

[61] B. K. Øksendal. Stochastic Differential Equations. An Introduction with Applications. Springer-Verlag, Berlin, sixth edition, 2003.

[62] M. Schubert and G. Weber. Quantentheorie. Grundlagen und Anwendungen. Spektrum Akademischer Verlag, Heidelberg, DE, 1993.

[63] R. W. Robinett. Quantum Mechanics. Classical Results, Modern Systems, and Visualized Examples. Oxford University Press, New York, 1997.

[64] C. W. Gardiner. Handbook of Stochastic Methods. Springer-Verlag, Berlin, second edition, 1985.

[65] I. F. Gihman and A. V. Skorohod. The Theory of Stochastic Processes. Vol. I, II, and III. Springer-Verlag, Berlin, 1975.

[66] G. E. Uhlenbeck and L. S. Ornstein. On the theory of the Brownian motion. Phys. Rev., 36:823–841, 1930.

[67] P. Langevin. On the theory of Brownian motion. C. R. Acad. Sci. (Paris), 146:530–533, 1908.

[68] L. Arnold. Stochastic Differential Equations. Theory and Applications. John Wiley & Sons, New York, 1974.

[69] P. Medvegyev. Stochastic Integration Theory. Oxford University Press, New York, 2007.

[70] P. E. Protter. Stochastic Integration and Differential Equations, volume 21 of Applications of Mathematics. Springer-Verlag, Berlin, second edition, 2004.

[71] K. Itô. Stochastic integral. Proc. Imp. Acad. Tokyo, 20:519–524, 1944.

[72] K. Itô. On stochastic differential equations. Mem. Amer. Math. Soc., 4:1–51, 1951.

[73] R. L. Stratonovich. Introduction to the Theory of Random Noise. Gordon and Breach, New York, 1963.

[74] D. L. Fisk. Quasi-martingales. Trans. Amer. Math. Soc., 120:369–389, 1965.

[75] A. D. Fokker. Die mittlere Energie rotierender elektrischer Dipole im Strahlungsfeld. Annal. Phys. (Leipzig), 43:810–820, 1914.

[76] M. Planck. Über einen Satz der statistischen Dynamik und seine Erweiterung in der Quantentheorie. Sitzungsber. Preuss. Akad. Wiss. Phys.-Math. Kl., 1917/I:324–341, 1917.

[77] A. N. Kolmogorov. Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Mathematische Annalen, 104:415–418, 1931.

[78] W. Feller. The parabolic differential equations and the associated semi-groups of transformations. Annals of Mathematics, Second Series, 55:468–519, 1952.

[79] R. C. Tolman. The Principles of Statistical Mechanics. Oxford University Press, Oxford, UK, 1938.

[80] N. G. van Kampen. Derivation of the phenomenological equations from the master equation. I. Even variables only. Physica, 23:707–719, 1957.

[81] N. G. van Kampen. Derivation of the phenomenological equations from the master equation. II. Even and odd variables. Physica, 23:816–829, 1957.

[82] R. Graham and H. Haken. Generalized thermodynamic potential for Markoff systems in detailed balance and far from thermal equilibrium. Z. Physik, 243:289–302, 1971.

[83] L. Onsager. Reciprocal relations in irreversible processes. I. Phys. Rev., 37:405–426, 1931.

[84] L. Onsager. Reciprocal relations in irreversible processes. II. Phys. Rev., 38:2265–2279, 1931.

[85] W. W. S. Wei. Time Series Analysis. Univariate and Multivariate Methods. Addison-Wesley Publ. Co., Redwood City, CA, 1990.

[86] E. W. Montroll and K. E. Shuler. Studies in nonequilibrium rate processes: I. The relaxation of a system of harmonic oscillators. J. Chem. Phys., 26:454–464, 1956.

[87] K. E. Shuler. Studies in nonequilibrium rate processes: II. The relaxation of vibrational nonequilibrium distributions in chemical reactions and shock waves. J. Phys. Chem., 61:849–856, 1957.

[88] N. W. Bazley, E. W. Montroll, R. J. Rubin, and K. E. Shuler. Studies in nonequilibrium rate processes: III. The vibrational relaxation of a system of anharmonic oscillators. J. Chem. Phys., 28:700–704, 1958.

[89] A. F. Bartholomay. On the linear birth and death processes of biology as Markoff chains. Bull. Math. Biophys., 20:97–118, 1958.

[90] A. F. Bartholomay. Stochastic models for chemical reactions: I. Theory of the unimolecular reaction process. Bull. Math. Biophys., 20:175–190, 1958.

[91] A. F. Bartholomay. Stochastic models for chemical reactions: II. The unimolecular rate constant. Bull. Math. Biophys., 21:363–373, 1959.

[92] S. K. Kim. Mean first passage time for a random walker and its application to chemical kinetics. J. Chem. Phys., 28:1057–1067, 1958.

[93] D. A. McQuarrie. Kinetics of small systems. I. J. Chem. Phys., 38:433–436, 1962.

[94] D. A. McQuarrie, C. J. Jachimowski, and M. E. Russell. Kinetics of small systems. II. J. Chem. Phys., 40:2914–2921, 1964.

[95] K. Ishida. Stochastic model for bimolecular reaction. J. Chem. Phys., 41:2472–2478, 1964.

[96] I. G. Darvey and P. J. Staff. Stochastic approach to first-order chemical reaction kinetics. J. Chem. Phys., 44:990–997, 1966.

[97] D. A. McQuarrie. Stochastic approach to chemical kinetics. J. Appl. Prob., 4:413–478, 1967.

[98] N. G. van Kampen. The expansion of the master equation. Adv. Chem. Phys., 34:245–309, 1976.

[99] G. Nicolis and I. Prigogine. Self-Organization in Nonequilibrium Systems. John Wiley & Sons, New York, 1977.

[100] A. M. Turing. The chemical basis of morphogenesis. Phil. Trans. Roy. Soc. London B, 237(641):37–72, 1952.

[101] H. Meinhardt. Models of Biological Pattern Formation. Academic Press, London, 1982.

[102] P. Ehrenfest and T. Ehrenfest. Über zwei bekannte Einwände gegen das Boltzmannsche H-Theorem. Z. Phys., 8:311–314, 1907.

[103] M. Abramowitz and I. A. Segun, editors. Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables, New York, 1965. Dover Publications.

[104] H. A. Kramers. Brownian motion in a field of force and the diffusion model of chemical reactions. Physica, 7:284–304, 1940.

[105] J. E. Moyal. Stochastic processes and statistical physics. J. Roy. Statist. Soc. B, 11:151–210, 1949.

[106] A. Janshoff, M. Neitzert, Y. Oberdörfer, and H. Fuchs. Force spectroscopy of molecular systems – single molecule spectroscopy of polymers and biomolecules. Angew. Chem. Int. Ed., 39:3212–3237, 2000.

[107] W. K. Zhang and X. Zhang. Single molecule mechanochemistry of macromolecules. Prog. Polym. Sci., 28:1271–1295, 2003.

[108] A. Messiah. Quantum Mechanics, volume II. North-Holland Publishing Company, Amsterdam, NL, 1970.

[109] D. T. Gillespie. Markov Processes: An Introduction for Physical Scientists. Academic Press, San Diego, CA, 1992.

[110] Y. Cao, D. T. Gillespie, and L. R. Petzold. Efficient step size selection for the tau-leaping simulation method. J. Chem. Phys., 124:044109, 2006.

[111] P. Schuster and K. Sigmund. Random selection – A simple model based on linear birth and death processes. Bull. Math. Biol., 46:11–17, 1984.

[112] I. S. Gradstein and I. M. Ryshik. Tables of Series, Products, and Integrals, volume 1. Verlag Harri Deutsch, Thun, DE, 1981.

[113] C. R. Heathcote and J. E. Moyal. The random walk (in continuous time) and its application to the theory of queues. Biometrika, 46:400–411, 1959.

[114] N. T. J. Bailey. The Elements of Stochastic Processes with Application in the Natural Sciences. Wiley, New York, 1964.

[115] E. W. Montroll. Stochastic processes and chemical kinetics. In W. M. Muller, editor, Energetics in Metallurgical Phenomenon, volume 3, pages 123–187. Gordon & Breach, New York, 1967.

[116] E. W. Montroll and K. E. Shuler. The application of the theory of stochastic processes to chemical kinetics. Adv. Chem. Phys., 1:361–399, 1958.

[117] K. E. Shuler, G. H. Weiss, and K. Anderson. Studies in nonequilibrium rate processes. V. The relaxation of moments derived from a master equation. J. Math. Phys., 3:550–556, 1962.

[118] C. W. Gardiner and S. Chaturvedi. The Poisson representation. I. A new technique for chemical master equations. J. Statist. Phys., 17:429–468, 1977.

[119] S. Chaturvedi and C. W. Gardiner. The Poisson representation. II. Two-time correlation functions. J. Statist. Phys., 18:501–522, 1978.

[120] P. D. Drummond. Gauge Poisson representation for birth-death master equations. Eur. Phys. J. B, 38:617–634, 2004.

[121] P. D. Drummond, T. G. Vaughan, and A. J. Drummond. Extinction times in autocatalytic systems. J. Phys. Chem. A, 114:10481–10491, 2010.

[122] R. S. Berry, S. A. Rice, and J. Ross. Physical Chemistry. Oxford University Press, New York, second edition, 2000.

[123] A. J. Drummond and P. D. Drummond. Extinction in a self-regulating population with demographic and environmental noise. ArXiv: 0807.4772v2, the University of Auckland, Auckland, NZ and the University of Queensland, Brisbane, QLD, AU, 2008.

[124] P. Verhulst. Recherches mathématiques sur la loi d'accroissement de la population. Nouv. Mém. de l'Académie Royale des Sci. et Belles-Lettres de Bruxelles, 18:1–41, 1845.

Contents

1 History and Classical Probability
  1.1 Precision limits and fluctuations
  1.2 Thinking in terms of probability

2 Probabilities, Random Variables, and Densities
  2.1 Sets and sample spaces
  2.2 Probability measure on countable sample spaces
    2.2.1 Probabilities on countable sample spaces
    2.2.2 Random variables and functions
    2.2.3 Probabilities on intervals
  2.3 Probability measure on uncountable sample spaces
    2.3.1 Existence of non-measurable sets
    2.3.2 Borel σ-algebra and Lebesgue measure
    2.3.3 Random variables on uncountable sets
    2.3.4 Limits of series of random variables
    2.3.5 Stieltjes and Lebesgue integration
  2.4 Conditional probabilities and independence
  2.5 Expectation values and higher moments
    2.5.1 First and second moments
    2.5.2 Higher moments
  2.6 Mathematical statistics
  2.7 Distributions, densities and generating functions
    2.7.1 Probability generating functions
    2.7.2 Moment generating functions
    2.7.3 Characteristic functions
    2.7.4 The Poisson distribution
    2.7.5 The binomial distribution
    2.7.6 The normal distribution
    2.7.7 Central limit theorem and the law of large numbers
    2.7.8 The Cauchy-Lorentz distribution
    2.7.9 Bimodal distributions


3 Stochastic processes
  3.1 Markov processes
    3.1.1 Simple stochastic processes
    3.1.2 The Chapman-Kolmogorov equation
  3.2 Classes of stochastic processes
    3.2.1 Jump process and master equation
    3.2.2 Diffusion process and Fokker-Planck equation
    3.2.3 Deterministic processes and Liouville's equation
  3.3 Forward and backward equations
  3.4 Examples of special stochastic processes
    3.4.1 Poisson process
    3.4.2 Random walk in one dimension
    3.4.3 Wiener process and the diffusion problem
    3.4.4 Ornstein-Uhlenbeck process
  3.5 Stochastic differential equations
    3.5.1 Derivation of the stochastic differential equation
    3.5.2 Stochastic integration
    3.5.3 Integration of stochastic differential equations
    3.5.4 Changing variables in stochastic differential equations
    3.5.5 Fokker-Planck and stochastic differential equations
  3.6 Fokker-Planck equations
    3.6.1 Probability currents and boundary conditions
    3.6.2 Fokker-Planck equation in one dimension
    3.6.3 Fokker-Planck equation in several dimensions
      3.6.3.1 Change of variables
      3.6.3.2 Stationary solutions
      3.6.3.3 Detailed balance
  3.7 Autocorrelation functions and spectra

4 Applications in chemistry
  4.1 Stochasticity in chemical reactions
    4.1.1 Elementary steps of chemical reactions
    4.1.2 The master equation in chemistry
    4.1.3 Birth-and-death master equations
    4.1.4 The flow reactor
  4.2 Classes of chemical reactions
    4.2.1 Monomolecular chemical reactions
      4.2.1.1 Irreversible monomolecular chemical reaction
      4.2.1.2 Reversible monomolecular chemical reaction
    4.2.2 Bimolecular chemical reactions
      4.2.2.1 Addition reaction

      4.2.2.2 Dimerization reaction
  4.3 Fokker-Planck approximation of master equations
    4.3.1 Diffusion process approximated by a jump process
    4.3.2 Kramers-Moyal expansion
    4.3.3 Size expansion of the chemical master equation
  4.4 Numerical simulation of master equations
    4.4.1 Definitions and conditions
    4.4.2 The probabilistic rate parameter
      4.4.2.1 Bimolecular reactions
      4.4.2.2 Monomolecular, trimolecular, and other reactions
    4.4.3 Simulation of chemical master equations
    4.4.4 The simulation algorithm
    4.4.5 Implementation of the simulation algorithm

5 Applications of stochastic processes in biology
  5.1 Autocatalysis, replication, and extinction
    5.1.1 Autocatalytic growth and death
    5.1.2 Boundaries in one step birth-and-death processes
  5.2 Size expansion in biology
  5.3 The Poisson representation
    5.3.1 Motivation of the Poisson representation
    5.3.2 Many variable birth-and-death systems
    5.3.3 The formalism of the Poisson representation
    5.3.4 Real, complex, and positive Poisson representations
