<<

Course Notes, Math 558

March 28, 2012 2 Contents

1 Nondimensionalization, Scaling, and Units 5 1.1 ...... 5 1.1.1 Example 1. Throw a ball ...... 5 1.1.2 Example 2. Drag on a sphere ...... 7 1.1.3 Using dimensional analysis for scale models ...... 8 1.1.4 Buckingham Pi Theorem ...... 9 1.1.5 Example 3. Computing the yield of a nuclear device ...... 10 1.1.6 Example 4. Pythagoras’ Theorem ...... 11 1.1.7 Example 5. Diffusion equation ...... 11 1.2 Scaling and nondimensionalization ...... 12 1.2.1 Projectile problem revisited ...... 12 1.2.2 A different scaling for projectile problem ...... 14 1.2.3 Diffusion equation, revisited ...... 14

2 Regular Asymptotics 17 2.1 Motivation, definitions ...... 17 2.2 Asymptotics for polynomials ...... 18 2.2.1 Examples ...... 18 2.2.2 Systematic expansions for polynomials ...... 19 2.3 Asymptotics for Differential Equations ...... 22 2.3.1 Examples ...... 22 2.3.2 Regular asymptotic regime ...... 24

3 Random Walks, Diffusions, and SDE 27 3.1 Random Walks ...... 27 3.1.1 Unbiased random walk — combinatorial approach ...... 27 3.1.2 Unbiased random walk— asymptotic approach ...... 29 3.1.3 Biased random walk ...... 30 3.1.4 Another unbiased random walk ...... 31 3.1.5 Summary ...... 31 3.2 Solution of heat equation ...... 31 3.2.1 No drift term ...... 31 3.2.2 Drift term ...... 33 3.3 Stochastic differential equations ...... 33 3.3.1 Background on probability theory ...... 33 3.3.2 Stochastic differential equations ...... 34 3.3.3 Fokker-Planck equations ...... 35 3.4 Comments on units, scaling ...... 35

3 4 Birth–Death Processes, Generators 37 4.1 Counting Process ...... 37 4.1.1 Exponential Random Variable ...... 37 4.1.2 Poisson process ...... 38 4.1.3 Generators ...... 40 4.1.4 `p spaces and adjoints ...... 41 4.1.5 Adjoint of generator ...... 43 4.1.6 Large number limit ...... 44 4.2 General discrete stochastic process ...... 46

5 Singular Asymptotics 47 5.1 Long timescales for weakly nonlinear ODE ...... 47 5.1.1 Example from Celestial mechanics ...... 47 5.1.2 Weakly nonlinear oscillators ...... 49 5.1.3 Method of Multiple Scales ...... 50 5.1.4 Linearization and –revisiting the nonlinear oscillator ...... 52 5.1.5 Resonance for linearized systems ...... 54 5.2 Multiscale differential equations ...... 55 5.2.1 Example ...... 55 5.2.2 One slow manifold ...... 55 5.2.3 Multiple slow manifolds ...... 55 5.3 Boundary layers for ODE ...... 55 5.3.1 Example 1. Zero-time blowup ...... 55 5.3.2 Example 2...... 56 5.3.3 Inner and outer expansions ...... 57 5.4 WKB expansions ...... 58 5.5 Reaction-diffusion equations, small diffusion limit ...... 58

A Background on measure theory, probability theory 59 A.1 Basic Measure Theory ...... 59

4 Chapter 1

Nondimensionalization, Scaling, and Units

Barenblatt.book Holmes.new.book This section is a selection of material related to include Chapters 0–3 of [?], Chapters 1–3 of [?].

1.1 Dimensional Analysis

1.1.1 Example 1. Throw a ball Consider the following example: We throw a ball directly upwards from the surface of the Earth. What is its maximum height? We need two physical laws to describe this system: 1. Newton’s Second Law. The acceleration of a body is directly proportional to its force and inversely proportional to its mass; moreover, the acceleration is parallel to the form. This is typically said as “Force equals mass times acceleration” or F = ma. 2. Newton’s law of universal gravitation. Every body in the universe attracts every other body with a gravitational force, which force is proportional to each of the masses of the objects and inversely proportional to the square of the distance between the objects. In equations, if x is the vector from mass 1 to mass 2, then the force on mass 1 due to mass 2 is

Gm1m2x F12 = , kxk3 and the force on mass 2 due to mass 1 is

Gm1m2x F21 = −F12 = − . kxk3

Since this problem is one-dimensional, we can think of the force, acceleration, and position as scalars (dis- tance from the ground). So if we define mB as the mass of the ball, mE as the mass of the earth, then we have the for x(t), the height above the ground, as 2 d x GmBmE m = − , (1.1) eq:d B dt2 d2 where d = R + x is the distanceeq:d between the center of the ball and the center of the earth (R is the radius of the earth). Simplifying (1.1) we obtain d2x gR2 = − , (1.2) eq:x dt2 (R + x)2

5 2 where we write g = GME/R . Clearly this problem, being second-order, requires two integration constants, but these can be specified by x(0) and x0(0), the initial position and velocity. Let us denote the maximal height by xmax. This equation can actually be solved exactly, but it is tedious and requires special functions (inverse sinh, for example) and in general is a big mess. Let us delay this solution for now. Idea 1. The first idea we might have is to say that x(t)  R for all t, and thus

x(t) + R ≈ R. (1.3) eq:approx eq:x If we make this approximation, then (1.2) becomes

d2x = −g, (1.4) eq:constcoeff dt2 which we can easily solve as g x(t) = − t2 + x0(0)t + x(0). 2 0 ∗ Writing x(0) = 0 and x (0) = v0, it is easy to see that the time of maximal height is t = v0/g, and therefore 2 the maximal height is xmax = v0/2g. Of course, what we have done here is what is known as an uncontrolledeq:approx approximation! We have no idea, a priorieq:x, how the error we have introduced in the approximation in (1.3) percolates into errors in the solution of (1.2). Studying the effects of this approximation is exactly what we will study in Topic 2 of this course in a few weeks. eq:constcoeff Idea 2. Let us imagine that we didn’t know how to solve (1.4), and look at the units of the problem. There are three physical dimensions that appear in this problem, length, time, and mass; we denote these as L, T , M. Clearly, the units of velocity are L/T , acceleration is L/T 2, force is ML/T 2, etc. (If we didn’t know these, we could deduce the first two from the definitions and the third from Newton’s Second Law!) Now, let us make the Ansatz that the maximal height depends only on g, mB, and v0, so that

xmax = f(g, mB, v0).

If this is true, then these quantities must have the same units, i.e.

[xmax] = [f(g, mB, v0)],

and let us make the further Ansatz that this function can be written as a monomial, so that

a b c [xmax] = [m v0g ].

(We will justify this second Ansatz later.) This then becomes

 L b  L c L = M a = M aLb+cT −b−2c. T T 2

Equating powers, we obtain the system

a = 0, b + c = 1, −b − 2c = 0, or, a = 0, b = 2, c = −1.

Therefore we have v2 x = α 0 . max g This is consistent with the exact answer derived above, although is a weaker statement (here we only know that there is a constant out front and not that this constant is α = 1/2). However, notice that we had to insert much less information into the problem to obtain this solution and did not have to know how to solve an ODE.

6 1.1.2 Example 2. Drag on a sphere Imagine a sphere moving through a fluid, we want to compute the force due to drag on the sphere. We postulate that it should depend on the dimensional quantities R, v, ρ, µ: R is the radius of the sphere, v is the velocity of the sphere, ρ is the density of the fluid, and µ is the (dynamic) viscosity of the fluid. The units of these quantities are L M M [R] = L, [v] = , [ρ] = , [µ] = . T L3 LT By the previous logic, we should write a b c d [DF ] = [R v ρ µ ], or ML  L b  M c  M d = La T 2 T L3 LT Expanding these out and equating powers, we obtain the system

a + b − 3c + d = 1, c + d = 1, b + d = 2.

There are only three equations, but four unknowns! So there will not be a unique solution and there is (at least) one free variable. For now, let us make the choice of d as the free variable, and then we obtain

a = 2 − d, b = 2 − d, c = 1 − d, giving us  µ d D = αR2−dv2−dρ1−dµd = αR2v2ρ , F Rvρ α where is a scalar. Let us denote µ Π = (1.5) eq:defofPi Rvρ and we have 2 2 d DF = αR v ρΠ . (1.6) eq:ad We might want to know the dimensions of Π, so we compute M/LT [Π] = = 1. L(L/T )(M/L3) eq:ad We say that Π is dimensionless. Now, since Π is dimensionless, (1.6) is dimensionally correct no matter what the choice of α, or d. Therefore we could write

2 2 d1 DF = R v ρ(α1Π ),

or 2 2 d2 DF = R v ρ(α2Π ), or in fact any linear combination of such terms, namely

n ! 2 2 X dk DF = R v ρ αkΠ . k=1

In general, given any function of Π, this expression is dimensionally correct, so we can write

2 2 DF = R v ρf(Π)

7 for some unknown function f. Since Π is nondimensional, f(Π) is nondimensional for any function f, and therefore we cannot proscribe the solution further. Now, notice that we made a choice of d as a free parameter in the derivation above. Would anything have changed had we made another choice there? For example, let’s now say that c is free. Solving, we would obtain a = 1 + c, b = 1 + c, d = 1 − c, giving Rvρc D = αR1+cv1+cρcµ1−c = αRvρ . F µ Thus defining Rvρ Πe = , (1.7) eq:defofPitilde µ we have c DF = αRvµΠe , and by the same argument of arbitrary powers, we have

DF = Rvµg(Π)e .

Are these two different expressions really different? Note, first of all, that Πe = 1/Π. Moreover, if we set them equal, we obtain

ρR2v2f(Π) = Rvµg(1/Π), Rvρ f(Π) = g(1/Π), µ f(Π) = Πg(1/Π).

Thus there is a functional relationship between f and g; if we know one, we know the other. So these expressions are equivalent even though they do not seem so at first glance. Now, one might ask how one determines theHolmes.new.book unknown function f (or g), and this is something that should be done by experiment. See Figure 1.3 of [?]. The nondimensional quantities Π and Πe show up so often in fluid dynamics that they are given names: Πe is known as the Reynolds number, and Π is the P´ecletnumber.

1.1.3 Using dimensional analysis for scale models Let us imagine that we have specific values for the quantities R, v, ρ, µ in mind, but we want to know the drag a sphere would experience without building the object itself and making a . Can we figure out how to build a scale model of the sphere, and then embed it in another physical experiment to get the measurement. We know that µ D = ρR2v2f(Π), Π = . F Rvρ

As long as we know f(Π), then we are done. So let us build a model, with model parameters Rm, vm, ρm, µm so that Πm = Π. Then f(Πm) = f(Π), and we can measure the former to get the value for the latter. The restriction that Π = Π means that m µ µ = m . Rvρ Rmvmρm

Assuming that we’re using the same fluid, so that ρm = ρ, µm = µ, we then have Rv Rv = Rmvm, or, vm = . Rm

8 As an example, let us say that we wanted to measure the drag force on a sphere of radius 1000m at a given velocity v. This is much to large to build, but we could, for example, build a sphere with radius Rm = 2m and then choose velocity vm = 500v.

1.1.4 Buckingham Pi Theorem The questions we have from the example above are clear. Will such a procedure always work? Will we have choices? When we are given a choice during the procedure, will this affect matters significantly? The answer to this is given in a theorem that we prove below. We will state and prove the theorem in the case that the only dimensions available to us are mass, length, and time, for concreteness. There could be other dimensional quantities (e.g. charge) but it will be easy to see at the end how to modify the statements when there are other dimensions. Let us consider a physical quantity q which depends on the n physical quantities p1, p2, . . . , pn. We have the relationship q = f(p1, . . . , pn), and let us assume a monomial dependence of the units as

a1 a2 an [q] = [p1 p2 . . . pn ] (1.8) eq:mono

Let us assume that the dimensions of each quantity are known, and denote them as

li mi ti l0 m0 t0 [pi] = L M T , [q] = L M T . eq:mono Plugging these into (1.8) and equating powers, we obtain the three equations

n n n X X X aili = l0, aimi = m0, aiti = t0. (1.9) eq:lmt i=1 i=1 i=1

We can write these equations efficiently in matrix form as follows. Define the matrix and vectors   a1  l l ··· l   l  1 2 n  a2  0 A = t t ··· t , a =   , b = t ,  1 2 n   .   0  m1 m2 ··· mn  .  m0 an eq:lmt then (1.9) becomes Aa = b. (1.10) eq:A eq:A So, the question remains: does (1.10) have a solution? If so, is it unique? eq:A Definition 1.1.1. We say that p1, . . . , pn are dimensionally complete if (1.10) has a solution for every q, and dimensionally incomplete if it does not. Equivalently, we say that p1, . . . , pN are dimensionally complete if the matrix A has rank 3. We are now in position to state the theorem:

Theorem 1.1.2. Assume that q = f(p1, . . . , pn) is a dimensionally homogeneous relation and p1, . . . , pn are di- mensionally complete. Then there exists a function F such that q = QF (Π1,..., Πk), where Πi are dimensionless products of the pi, [Q] = [q], and k is the dimension of the kernel of A. eq:A Proof. We know that all solutions to (1.10) can be written in the form

k X a = a∗ + γ(i)a(i), i=1

9 eq:A where a∗ is a particular solution to (1.10), γ(i) ∈ R, and a(i) all lie in the kernel of A. We will write

(i) (i) (i) (i) a = (a1 , a2 , . . . , an ),

and similarly for a∗. We first claim that if a(i) ∈ N (A), then

n (i) (i) a Y aj Πi := p := pj j=1 is a . To check this, we have

(i) (i) a1 an [P ii] = [p1 ··· pn ] l a(i) m a(i) t a(i) l a(i) m a(i) t a(i) = L 1 1 M 1 1 T 1 1 ··· L n n M n n T n n

Pn l a(i) Pn m a(i) Pn t a(i) = L j=1 j j M j=1 j j T j=1 j j .

However, notice that the three powers which appear in this last expression are the three rows of Aa(i), and (i) 0 0 0 since a is in the kernel of A, these are all zero. Therefore [P ii] = L M T = 1 and is dimensionless. Now we also define ∗ ∗ ∗ a a1 an Q = p = p1 . . . pn . (1.11) eq:defofQ

We can check that [Q] = [q] in a manner similar to checking that [P ii] = 1 above. From this it follows that for any gamma1, . . . , γk ∈ R, γ1 γk γ [q] = [QΠ1 ... Πk ] =: [QΠ ]. Of course, this works for any choice of γ, so we can therefore write

∞ X γ q = Q αiΠ i i=1 for some choice of αi (some of which may be zero) and thus we have the general nonlinear function q = QF (Π1,..., Πk). Moreover, given a choice of basis of the kernel of A (i.e. choosing a(i)), this specifies the nondimensional quantities Πi. If we were to make a different choice of basis here, this would give different Π’s.

Remark 1.1.3. We assumed in the theorem that the law q = f(p1, . . . , pn) is homogeneous. Is this a valid assump- tion? For example, consider a relationship of the form

n n Y αi Y βi f(p1, . . . , pn) = C1 pi + C2 pi . i=1 i=1 P P By dimensional consistency, we would need to satisfy both i liαi = l0 and i liβi = l0, and similarly for m, t, which would mean that we must satisfy both Aα = Aβ = a and therefore we can combine the two terms.

1.1.5 Example 3. Computing the yield of a nuclear device See course lecture.

10 1.1.6 Example 4. Pythagoras’ Theorem Consider a right-triangle ABC where B is the right angle. Define c as the length of the hypotenuse AC and θ as the angle between AB and AC. It is not hard to see that specifying c and θ completely determines the triangle, and therefore the area of the triangle is given by a function f(c, θ). Notice that θ is nondimensional, [c] = L and [area] = L2, therefore

area = c2F (θ).

Now drop a perpendicular from vertex B to the hypotenuse, breaking the original triangle into two smaller ones. These are both right triangles with one interior angle equal to θ; the first has hypotenuse a, the second hypotenuse b. Therefore we have c2F (θ) = a2F (θ) + b2F (θ),

and assuming that F (θ) 6= 0, we obtain c2 = a2 + b2. 1 Moreover, we can actually use trigonometry to compute F (θ) = 2 sin(θ) cos(θ).

1.1.7 Example 5. Diffusion equation If we consider the density of (e.g.) a chemical in solution, where we denote said density at x and t by u(x, t), then the diffusion equation is given by ut = Duxx, (1.12) eq:PDE where D is the diffusion coefficient. Let us imagine that we post this problem on the domain x ∈ (0, ∞) and t > 0. Moreover, we assume that u(x, 0) = 0 for all x (zero concentration at time zero) and we inject the chemical at x = 0 so that u(0, t) = u0. We also append the boundary condition u(∞, t) = 0. Computing dimensions, we have

M M M M 3 [u] = , [u ] = , [u ] = , [u ] = . L3 t TL3 xx L5 0 L To make the PDE dimensionally consistent, we must have [D] = L2/T . If we assume that the concentration u(x, t) is a function of x, t, D, and u0, we obtain

a b c d [u] = [x t D u0], or M L2 c  M d = LaT b = La−3d+2cT b−cM d. L3 T L3 One solution of the system is to choose b = c = −a/2 and d = 1, giving  a 2 −a/2 −a/2 x u = αx t D u0 = αu0 √ . Dt √ Thus we have the nondimensional Π = x/ Dt and we have u = αF (Π) for some function F . Note, of course, that if we consider the matrix A as in the theorem, we have  1 0 2 −3  A =  0 0 0 1  . 0 1 −1 0

It is easy to see that this matrix has a one-dimensional nullspace (it clearly has column rank 3) and we can check that a nullvector is (2, −1, −1, 0)t. Therefore we have no real choice here, our nondimensional quantity will be x2/Dt (or a power of it) and we have chosen the square root of that.

11 eq:PDE Plugging the expression u(x, t) = αF (Π) into the PDE (1.12), we obtain Π F 00(Π) = − F 0(Π). 2 We also have the boundary conditions F (0) = 1,F (∞) = 0. The general solution of this system is

Π Z 2 F (Π) = β + α e−s /4 ds. 0 It is not hard to see (exercise below!) that ∞ Z 2 √ e−s /4 ds = π, 0 and therefore plugging in the initial conditions we obtain that

Π 1 Z 2 F (Π) = 1 − √ e−s /4 ds, π 0 and therefore we have the general solution √ Z x/ Dt ! 1 −s2/4 u(x, t) = u0 1 − √ e ds . π 0

1.2 Scaling and nondimensionalization

We will revisit a couple of the earlier problems and nondimensionalize them (i.e. remove the dimensions from the problems to obtain purely mathematical problems).

1.2.1 Projectile problem revisited Consider the projectile problem above. Recall that we have the ODE given by

d2x −gR2 = . (1.13) eq:dim dt2 (x + R)2

Let us rescale space and time by the following

t = tcτ, x = xcξ, where [t] = [tc] and [x] = [xc], so that τ, ξ are nondimensional. By the chain rule, we have

d2 1 d2 2 = 2 2 , dt tc dτ eq:dim so (1.13) becomes 2 2 xc d ξ −gR −g 2 2 = 2 = 2 , tc dτ (R + xcξ) (1 + xcξ/R) or, rewriting, we have x d2ξ −1 c = . 2 2 xc 2 gtc dτ (1 + R ξ)

12 Moreover, we also have the initial conditions

dξ tc ξ(0) = 0, (0) = v0, dτ xc where v0 is the initial velocity of the projectile. Note that three non-dimensional constants show up in the problem, namely: xc xc tcv0 Π1 = 2 , Π2 = , Π3 = . gtc R xc The first is the ratio of the characteristic acceleration of the problem with respect to the gravitational accel- eration of the earth; the second is the characteristic size of the problem compared to the earth’s radius, and the third is the characteristic velocity with respect to the initial velocity of the problem. Now, notice that we have two degrees of freedom to choose (xc, tc) and three quantities which we can change. Of course it is unclear what to do here since there are so many choices, so we have rules of thumb on how to proceed here. Rules of Thumb on nondimensionalization:

1. (always) Make as many nondimensional constants equal to one as possible.

2. (usually) Make the constants that appear in the initial or boundary conditions equal to one.

3. (usually) If there is a nondimensional constant that, if we were to set it equal to zero, would simplify the problem significantly, allow it to remain free and then see when we can make it small.

Using the guidance above, we definitely want to choose Π3 = 1. Moreover, we saw above that if the Π2 term disappeared, the problem becomes very simple, so Π2 should remain free, and we then set Π1 = 1. This means that we choose the characteristic scales of the problem as

v v2 t = 0 , x = 0 , c g c g and this gives us x v2 Π = c = 0 . 2 R gR Now, we expect this to be small if the projectile is our throwing a ball. We definitely don’t expect the characteristic height of this problem to be significant when compared to R. Similarly, notice that the other expression is the ratio of the kinetic energy of the ball to the potential energy of the ball at time zero. Since we don’t expect to be able to throw a ball into orbit, we expect this ratio to be small as well. Since Π2 is small, we denote it by , and we obtain the rescaled, nondimensional ODE

d2ξ −1 = , ξ(0) = 0, ξ0(0) = 1. dτ 2 (1 + ξ)2

Now, we have chosen wisely, since we know that if we set  = 0 in this problem, we have the ODE

ξ00 = −1, ξ(0) = 0, ξ0(0) = 1, which we can solve explicitly as ξ = −τ 2/2 + τ. So the question one can (and should!) ask at this point is how much the addition of an  in the problem changes things. More specifically, if we consider the problem

d2ξ −1 dξ = , ξ(0) = 0, (0) = 1, (1.14) eq:rp dτ 2 (1 + ξ)2 dτ

13 then how close are ξ and ξ0? This is a useful question, since we know the latter exactly. Of course, we need to be careful about what we mean by close here. Now, for example, imagine that we know how to write

 0 2 ξ (t) = ξ (t) + ξ1(t) +  ξ2(t) + ...

and we can guaranteed that ξ1(t) is bounded over some time interval. Then we have a good approximation  to the solution ξeq:rpand we can make it better and better as  → 0. If we can do this, then we call the perturbation in (1.14) a regular perturbation; we will study these extensively in the next section of the course. (We will see for this problem that we can do so.) Just to get a handle on numbers here, let us assume that the initial velocity was 25m/s (this is about 55 mph so pretty fast for a human!). We then have

v 25m/s v2 625m2/s2 t = 0 = ≈ 2.6s, x = 0 = ≈ 63.8m. c g 9.8m/s2 c g 9.8m/s2

This gives us the typical time and length scales for the problem. Moreover, notice that x 62.8m  = c = ≈ 9.85 × 10−6. R 6378.1km Since  is so small, our approximation will probably work quite well (we verify this later). However, solving the approximate problem is easy: the time of maximum height occurs at τ = 1 (or t = tc = 2.6s) and therefore the maximum height is ξ = 1/2 (or xmax = xc/2 = 31.9m).

1.2.2 A different scaling for projectile problem

What if, on the other hand, we had chosen Π2 = Π3 = 1? Then we would have

2 2 R v0 v0 xc = R, tc = , Π1 = ≈ 7 2 2 v0 gR 6.25 × 10 m /s

For human velocities, this is clearly quite small. Again choosing v0 = 25m/s, we have

−6  := Π1 = 9.99 × 10 .

This is again small, but plugging the nondimensional variables into the equation, we obtain

d2ξ 1 dξ  = − , ξ(0) = 0, (0) = 1. dτ 2 (1 + ξ)2 dτ

When we take the limit as  → 0, we obtain an equation where the derivatives disappear, and in fact is not a differential equation at all. This is actually a singular perturbation; we will deal with these kinds of problems as well, but after regular perturbations.

1.2.3 Diffusion equation, revisited Recall the diffusion equation we considered above. We can even add a nonlinear term to it, as follows:

∂u ∂2u = D + γu3, u(x, 0) = u (x). ∂t ∂x2 0 This is an example of a reaction-diffusion equation; the polynomial term is a (local) reaction of the substance whose concentration we are tracking. Rescaling with

x = xcξ, t = tcτ, u = ucν,

14 we obtain 2 ∂ν tc ∂ ν 2 3 = D 2 2 + γtcuc ν . ∂t xc ∂ξ Thus we have Dtc 2 Π1 = 2 , Π2 = γtcuc , xc as our nondimensional quantities. This lets us know what parameters we would choose to have (relative) small diffusion or (relative) large diffusion. If Π1  Π2, i.e.

2 Duc 2  1, γxc then we can write Π1 = , Π2 = 1 and we have the PDE

∂ν ∂2ν =  + ν3, ∂t ∂ξ2

or the small diffusion scaling regime. This lets us know how we would make this small, e.g. say the constants D, γ are fixed, we could either take uc small (small concentrations) or xc large (long lengthscales) to get the small diffusion regime. Similarly, if Π1  Π2, or 2 Duc 2  1, γxc then we can write Π1 = 1, Π2 =  and we have the PDE

∂ν ∂2ν = + ν3, ∂t ∂ξ2

or the small reaction scaling regime. This can be obtained by looking at large concentrations, or really small lengthscales. We will study both of these asymptotic problems below as well; we will see that the small reaction regime is basically a regular perturbation, but the small diffusion regime is a singular perturbation. As we saw in the previous case, when we have a small parameter multiplying a derivative term, setting the small parameter to zero makes that term disappear, and this changes the form of the equation significantly.

15 16 Chapter 2

Regular Asymptotics

ch:RA sec:motivation We start with some motivation in Section 2.1

2.1 Motivation, definitions sec:motivation Consider the projectile problem considered above. One choice of scaling gave us the problem d2x 1 = − , x(0) = 0, x0(0) = 1, dt2 (1 + x)2

and we had  ≈ 10−5. As we saw before, if we ignore the  term, then we have the system d2y = −1, y(0) = 0, y0(0) = 1, dt2 which we can solve in our heads. The remaining question is where or not the solutions x(t), y(t) are close, i.e. whether we have some norm | · | so that lim |x(t) − y(t)| → 0, →0 or, even better, we can approximate the rate at which this limit approaches zero. More generally, we can consider systems of equations of the form

x˙ = f(x) + g(x), x(0) = x0, and question whether we can approximate this system by the system

y˙ = f(y), y(0) = y0, i.e. whether there is some norm in which we can make |x(t) − y(t)| small. It turns out that the answer to this question is yes, in some sense, but the problem is subtle in some surprising ways. We want some terminology to determine whether these expansions are “well-behaved” or not, so we consider the following general context. Definition 2.1.1. Let S be a set and V a vector space, and let F  : S → V be a one-parameter family of functions  defined for  ∈ [0, 0] for some 0 > 0. We define a solution of the asymptotic problem as any x such that F (x) = 0. (2.1) eq:Fe eq:Fe We say that this problem is regularly perturbed if there exists an 1 such that, for all  ∈ [0, 1],(2.1) has at least   0 one solution, and, moreover, there is a choice of solution x for each  ∈ [0, 1] such that lim→0 x = x . If not, we call the problem singularly perturbed.

17 Remark 2.1.2. There are many ways in which a problem can fail to be regularly perturbed: the problem can fail to have solutions in a -neighborhood of 0, it could have solutions that limit on something other than the solution to the unperturbed problem, etc.

2.2 Asymptotics for polynomials

2.2.1 Examples Example 2.2.1. We want to compute the roots of the equation

x2 + 0.003x − 1 = 0.

Let us write  = 10−3, then we have 2 x + 3x − 1 = 0. (2.2) eq:poly1 2 First consider the simplereq:poly1 equation that we obtain when we set  to 0, this gives x0 − 1 = 0, which has roots x0 = ±1. Are the solutions to (2.2) close to ±1? Using the quadratic formula, we have √ 3 ± 92 + 1 3 r 9 x = − = −  ± 1 + 2.  2 2 4

Using the Taylor expansion √ A A2 A3 1 + A = 1 + − + + ..., 2 8 16 we have 3 9 x = ±1 −  ± 2 + O(3).  2 8 We see that lim x = x0, →0 and, moreover, 3 |x − x | =  + O(2).  0 2 eq:poly1 Thus, when finding the roots of (2.2), we know that ±1 is a pretty good approximation to the two solutions, and, moreover, we know that we are off by about .0015 with a high degree of confidence.

Remark 2.2.2. We see in the previous example that we can show that the solutions to the perturbed ( > 0) problem are close to the solutions of the unperturbed ( = 0) problem (up to the fact that we have to keep track of which solution is from the plus choice, and which is from the minus). In fact, we can do even more, and we see that we can get a estimate on the leading-order asymptotic difference between the two solutions. All in all, we see that this problem is regularly perturbed, and we even have what can be called a regular perturbation expansion for the solution.

Example 2.2.3. Let us now consider the polynomial equation

0.001x2 + x − 1 = 0.

We again choose  = 10−3 and obtain 2 x + x − 1 = 0. (2.3) eq:poly2 If we set  = 0 in the equation, we will obtain the equation

x0 − 1 = 0,

18 eq:poly2 which has only one solution. Since we know that (2.3) has two solutions, this should already make us a bit suspicious. Using the quadratic formula, we obtain √ √ −1 ± 1 + 4 1 1 + 4 x = = − ± .  2 2 2

Using the Taylor expansion again, we now have

1  1  x = − ± + 1 −  + O(2)  2 2

We see that there are two radically different scalings, depending on whether we choose a plus or a minus in the previous expansion, i.e. we have 1 x = 1 −  + O(2), − − 1 +  + O(2).  

One of these solutions is O(1) (and close to the x0 solution!) and the other is of a completely different order, and very far away from 1. eq:poly2 Remark 2.2.4. It is clear that this problem is singularly perturbed: although (2.3) has two solutions for every  > 0, one of those solutions limits on −∞, which is not a valid solution to the unperturbed equation.

2.2.2 Systematic expansions for polynomials Regular perturbation example

Let us reconsider the problems above, but attempt to make a systematic approach to finding these solutions in the absence of a solution formula. If we reconsider the equation

(x)2 + 3x − 1 = 0,

then make the Ansatz  x = x0 + x1.

Plugging in and expanding, we obtain

2 2 2 2 x0 + 2x0x1 +  x1 + 3x0 + 3 x1 − 1 = 0,

and if this must be zero for a range of , this means that all coefficients of this polynomial is  must be zero, so we reorder: 2 2 x0 − 1 + (2x0x1 + 3x0) = O( ). Therefore we have the system of equations

2 x0 − 1 = 0, 2x0x1 + 3x0 = 0,

which has solutions x0 = ±1, x1 = −3/2. Therefore we have that

3 x = ±1 −  + O(2),  2

which agrees with the solution above. However, notice that the advantage here is that we did not need to know a solution method for the polynomial, we obtained this solution by solving the unperturbed problem, and then solving a linear equation!

19 Singular perturbation Now we reconsider the equation (x)2 + x − 1 = 0.  Making the same Ansatz x = x0 + x1, we have

2 2 3 2 x0 + 2 x0x1 +  x1 + x0 + x1 − 1 = 0.

Again recombining we obtain 2 2 x0 − 1 + (x0 + x1) = O( ).

This gives x0 = 1, x1 = −1 as a unique solution. Therefore we obtain

x = 1 −  + O(2), which does agree with one of the solutions discovered above, but misses the other entirely. Of course, since the other solution has a completely different scaling form than our Ansatz, this is not surprising, so we should make a different Ansatz. Now, of course, we know the scaling form of our solution, so we should make an Ansatz of the form x x = −1 + x + x + ...  0 1 but we are inspired by this form only because we have a formula for the solution, so this is somewhat cheating. Let us imagine that we had no idea how the solution scaled, and we make the Ansatz

 γ α x =  (x0 +  x1), where α > 0 and γ is real. Note there is nothing really special about this form, all we are assuming is that there is a leading order term which scales like γ for some γ, and a correction term that is asymptotically smaller. Now, plugging in, we obtain

2γ+1 2 2γ+α+1 2γ+2α+1 2 γ γ+α  x0 +  2x0x1 +  x1 +  x0 +  x1 − 1 = 0.

We have six different terms of six different orders. Clearly there are many choices of α, γ that will simplify this expression and allow for cancellations, but how to we proceed systematically to make a choice? The rules of thumb are as follows: first, concentrate on the leading order terms; second, always leave multiple terms at the highest order so that we can cancel terms. Looking at this, we have terms of order q, where

q = 2γ + 1, 2γ + α + 1, 2γ + 2α + 1, γ, γ + α, 0.

Since we will be choosing α > 0, it is clear that every term with an α in it is dominated by some term without an α in it, i.e. 2γ + 2α + 1 > 2γ + α + 1 > 2γ + 1, γ + α > γ. Therefore the only candidates for leading order terms are 2γ + 1, γ, 0. Clearly we cannot choose γ to make all of these equal, but we can make three choices to make any pair of these match, and these choices are

γ = 0, γ = −1, γ = −1/2.

We should not choose γ = 0, since that gives us the previous expansion, and that didn’t work well for us. If we choose γ = −1/2, then 2γ + 1 = 0 and the first and third terms match at order 0, but then we are left with a single term at order −1/2 with nothing to cancel it. Thus we should choose γ = −1. This choice means that we now have terms of order

−1, −1 + α, −1 + 2α, −1, −1 + α, 0.

20 We will need a term to cancel out the order 0 term, and this means that we either have to choose α = 1 or α = 1/2. If we choose the former, we would be left with terms −1, 0, 1, −1, 0, 0,

and if we choose the latter, we would obtain −1, −1/2, 0, −1, −1/2, 0.

It is not entirely clear which choice of α is best here, so we will end up trying both. So let us choose γ = −1, α = 1, then we have

−1 2 0  (x0 + x0) +  (2x0x1 + x1 − 1) = O().

Solving this, we obtain x0 = 0, −1. If x0 = 0, then we obtain x1 = 1, and if x0 = −1, we obtain x1 = −1. Thus we obtain the expansions 1 x = − + 1 + O(), −1 + O().  This matches the expansions that we obtained above. Of course, the second term is somewhat unsatisfying: we have obtained the leading order term correctly, but we don’t have any idea what the correction term looks like. More on this below. Had we made the other choice, we would then have

−1 2 −1/2 2  (x0 + x0) +  (2x0x1 + x1) + (x1 − 1) = 0

Solving the first equation gives x0 = −1, 0 as before. Notice that the second equation gives us x1(2x0 + 1) = −1/2 0, and since 2x0 + 1 6= 0, this gives x1 = 0. This is what we expect, since there is no term of order  in the solution. However, notice that this then makes the problem inconsistent, since if x1 = 0, then we cannot solve the 0 term. This lets us know that we have chosen something wrong in our asymptotic expansion, and at the very least our choice of α leads to problems.

Yet another Ansatz We saw in the previous case that either expansion that we obtained was unsatisfactory in some regard: one choice gives us a consistent (and correct) answer, but one that doesn’t give us the correction term, and the second choice gave us an inconsistent answer. Clearly there is some problem with our Ansatz, and this can be solved by adding yet another term in the expansion. Thus we make the Ansatz

 γ α 2α x =  (x0 +  x1 +  x2).

Plugging this in, we obtain

2γ+1 2 2γ+α+1 2γ+2α+1 2 2γ+3α+1 2γ+4α+1 2 γ γ+α γ+2α  x0+ (2x0x1)+ (2x0x2+x1)+ (2x1x2)+ x2+ x0+ x1+ x2−1 = 0. Collating the orders, we have 2γ + 1, 2γ + α + 1, 2γ + 2α + 1, 2γ + 3α + 1, 2γ + 4α + 1, γ, γ + α, γ + 2α, 0.

Again, since α > 0, the leading order terms can only be 2γ + 1, γ, 0, and we again are left with the choice of γ = −1, giving orders of −1, −1 + α, −1 + 2α, 1 + 3α, −1 + 4α, −1, −1 + α, −1 + 2α, 0.

We again can choose α = 1 which will match the second, seventh, and last terms, or α = 1/2, which matches the second and seventh terms, and the third and last terms. If we make the first choice, this gives

−1 2 0 1 2 2  (x0 + x0) +  (2x0x1 + x1 − 1) +  (2x0x2 + x1 + x2) = O( ),

21 which gives us the system

2 x0 + x0 = 0,

(2x0 + 1)x1 = 1, 2 (2x0 + 1)x2 = −x1. We write it in this form to stress that while this system of equations is nonlinear, it has the pattern that the first equation is the solution to the unperturbed problem, and each further equation is linear in the unknown quantity that appears at that order. So, we can solve as before: if we make the choice x0 = 0, then we have x1 = 1 and thus x2 = −1. If we choose x0 = −1, then we have x1 = −1 and x2 = 1. Therefore, we have the expansions 1 x = 1 −  + O(2), − − 1 +  + O(2),  which agrees with the solution above up to O(2). Had we made the choice γ = −1, α = 1/2, then we obtain

−1 2 −1/2 0 2  (x0 + x0) +  (2x0x1 + x1) +  (2x0x2 + x1 + x2 − 1) = 0.

Notice that again we obtain x0 = 0, −1 and x1 = 0, but now we can solve the last equation for x2 as the solution of (2x0 + 1)x2 = 1. Notice that now the equation for x2 is the same as the equation for x1 when we made the choice of α = 1. Therefore, we obtain our expansions 1 0 √ 0 0 √ x = − + √ − 1 + O( ), + √ + 1 + O( ),    

which, up to truncation of the highersec:? order terms, agrees with both the exact solution, and with the two- term expansion done in Section ??. What we see here is, if we have enough terms in our expansion, we can make the expansion correct, and even when we made the choice of α = 1/2, the expansions let us know that the coefficient of that term was zero, and thus the expansion was unnecessary. So, had we made that choice at first, we would see that there was no need for a term at that order, and that would let us know that α = 1 is the better choice.

2.3 Asymptotics for Differential Equations

2.3.1 Examples We first consider some examples that we know how to solve analytically. Example 2.3.1. Consider the linear ODE x00 + x0 + x = 0, x(0) = 1, x0(0) = 0.

We make the exponential Ansatz x(t) = ert, giving us the characteristic equation r2 + r + 1 = 0.

We can solve this to obtain √  2 − 4 r = − + ± , ± 2 2 or  r = ±i − + O(2). ± 2 This lets us know that we can write the solution of the ODE as

x(t) = e−/2t(α cos t + β sin t), or, e−/2tA cos(t + ϕ),

22 where A, ϕ are the amplitude and phase of the solution. Plugging in the initial data gives  A cos(ϕ) = 1,A sin ϕ = − , 2 or r 2  A = 1 + , tan ϕ = − . 4 2 Since tan(x) ≈ x for x small, we can ignore the tan function. More formally, since we know the Taylor series approximation for tan−1, we have

2  A = 1 + + O(4), ϕ = − + O(3). 8 2 Therefore our solution is  2    −  t    3 x (t) = 1 + e 2 cos t − + O( ). 8 2 Choose any t ∈ R. Then it is not hard to see that

 0 lim x (t) − x (t) = 0. →0

Moreover, we could even consider a fixed time domain and obtain a similar result. For example, choose T > 0 and consider all t ∈ [0,T ]. Then we have

 2   −  t    sup 1 + e 2 cos t − − cos(t) t∈[0,T ] 8 2

2 −  t    2 −  t 2 −  t 2 2 2 = sup (1 +  /8)e cos t − − (1 +  /8)e cos(t) + (1 +  /8)e cos(t) − cos(t) t∈[0,T ] 2

2 −  t    2 −  t 2 −  t 2 2 2 ≤ sup (1 +  /8)e cos t − − (1 +  /8)e cos(t) + sup (1 +  /8)e cos(t) − cos(t) t∈[0,T ] 2 t∈[0,T ]

2 −  t    2 −  t 2 2 ≤ (1 +  /8)e sup cos t − − cos(t) + sup (1 +  /8)e − 1 . t∈[0,T ] 2 t∈[0,T ]

Using the fact that cos(t − /2) = cos(t) −  sin(t)/2 + O(2), and e−t/2 > 1 − t/2, we see that

 0 sup x (t) − x (t) = O(). t∈[0,T ]

Example 2.3.2. We now consider the ODE given by

x00 + x0 + x = 0, x(0) = 1, x0(0) = 0.

We know that this equation has a unique solution. However, if we naively set  = 0 in the equation, we obtain

x0 + x = 0, , x(0) = 1, x0(0) = 0,

and this has no solution (since the general solution is x(t) = Ce−t and we cannot meet both boundary conditions. So, let us solve the equation directly. The characteristic equation is

r2 + r + 1 = 0,

which has solutions √ −1 ± 1 − 4 1  1  r = = − ± − 1 +  + ... ± 2 2 2

23 The general solution of this problem is  r+t r−t x (t) = C+e + C−e , and plugging in boundary conditions gives

C+ + C− = 1,C+r+ + C−r− = 0, or −r− r+ C+ = ,C− = . r+ − r− r+ − r− Expanding in , we 2 3 2 3 C+ = 1 −  + 3 + O( ),C− =  − 3 + O( ). giving −1 x(t) = (1 −  + 32)e(−1−)t + ( − 32)e− +1−t + .... We check that x(0) = 1 + O(2) and x˙ (0) = 0 + O(2). Moreover, choose any t > 0, and notice that

lim x(t) = e−t, →0 which is, in fact, the solution to x0 + x = 0, x(0) = 1. So, in some sense, the solution of the perturbed equation does limit onto the unperturbed equation. However, the way that the boundary conditions are met is by a rapidly- varying term which only exists on a timescale of O(). This is an example of a boundary layer in the problem: very near the place where the conditions are imposed, the solution fluctuates wildly to matchsec:bl it, but otherwise looks like a lower-order equation in the bulk. We will examine this problem in detail in Section ?? below.

2.3.2 Regular asymptotic regime Here we consider one regular asymptotic regime for a family of ODE with a small parameter. Specifically, we consider the system d x = f(x) + g(x), x(0) = x , (2.4) eq:xe dt 0 n n n where x0 ∈ R , and f, g : R → R . We would like to ignore the term of O() in the equation, i.e. replace this equation with an equation of the form

dy = f(y), y(0) = x . (2.5) eq:y dt 0 What we will show below is, under some assumptions delineated below, that

sup |x(t) − y(t)| = O(), t∈[0,T ]

and, in particular, that lim sup |x(t) − y(t)| = 0. →0 t∈[0,T ] Let us first assume that f is linear, i.e. that f(x) = Ax for some n × n matrix. Then defining z = x − y, we have d d d z = x − y = Ax + g(x) − Ay = Az + g(x), dt dt dt and of course z(0) = 0. Using integrating factors, we have

d eAt z − eAtAz = eAtg(x), dt

24 and, integrating both sides of the equation, we obtain

Z t d Z t (eAsz) ds =  eAsg(x(s)) ds 0 ds 0 Z t eAtz(t) − z(0) =  eAsg(x(s)) ds 0 Z t z(t) = e−Atz(0) + e−At eAsg(x(s)) ds, 0 and, using the fact that z(0) = 0, we have Z t z(t) = e−At eAsg(x(s)) ds. 0 We want to show that |z(t)| is O().

Definition 2.3.3. Given any norm on Rn, we define the induced operator norm on linear operators from Rn to itself by |Ax| |A| := sup |Ax| = sup . |x|=1 x6=0 |x| Exercise 2.3.4. Show the second equality in the definition above. We then have that |Ax| ≤ |A| |x| by definition. Then we compute Z t  −At As  |z (t)| ≤  e e |g(x (s))| ds 0 −At As  ≤  e · t · sup e |g(x (s))| . s∈[0,t]

If we make the following assumptions:

1. x(t) is bounded for all t, 2. g is a continuous function (e.g. any polynomial will work),

and use the results of the following

Exercise 2.3.5. Show that for any n × n matrix A, there are a, b, C ∈ R such that At at −At bt e ≤ Ce , e ≤ Ce for all t > 0.

then we have sup |z(t)| ≤ CT e(a+b)T sup |g(x(t))| , t∈[0,T ] t∈[0,T ] and the latter is bounded for all T by the assumptions. Therefore sup |z(t)| = O(), t∈[0,T ] and in particular, lim sup |z(t)| = 0. →0 t∈[0,T ] What changes in this argument if f is nonlinear? Proceeding formally, we again define z(t) and obtain an ODE d z(t) = f(x(t)) − f(y(t)) + g(x(t)). dt

25 Let us assume that f is Lipschitz, i.e. that there is a constant M so that

|f(x) − f(y)| ≤ M |x − y| .

Then we have d    z (t) ≤ M |z (t)| +  |g(x (t))| . dt Recall the alternate form of the triangle inequality:

|x − y + y| ≤ |x − y| + |y| |x| − |y| ≤ |x − y|

And note that d |z(t + h)| − |z(t)| |z(t)| := lim dt h→0 h |z(t + h) − z(t)| ≤ lim h→0 h   z (t + h) − z (t) = lim h→0 h

d  = z (t) , dt so we have d |z(t)| ≤ M |z(t)| +  |g(x(t))| . (2.6) eq:scalar dt We now use a theorem known as Gronwall’s Inequality: Theorem 2.3.6. Let x(t) be a nonnegative differentiable function on [0,T ] which satisfies

x0(t) ≤ a(t)x(t) + b(t),

where a(t), b(t) are nonnegative, integrable functions on [0,T ]. Then

Z t   Z t  x(t) ≤ exp a(s) ds x(0) + b(s) ds 0 0

for all t ∈ [0,T ].

Proof. Exercise! eq:scalar Using Gronwall’s on (2.6), we obtain

 Z t  |z(t)| ≤ eMt z(0) +  |g(x(t))| , 0

Since z(0) = 0, we have sup |z(t)| ≤ T eMT sup |g(x(t))| , t∈[0,T ] t∈[0,T ] and the last term is bounded by the assumptions above.

26 Chapter 3

Random Walks, Diffusions, and SDE

3.1 Random Walks

3.1.1 Unbiased random walk — combinatorial approach We consider an unbiased random walk on the line, which is given by the description that we move left with probability 1/2 and right with probability 1/2. More specifically, if we define Xt to be the (random) location of the walker at time t, then ( Xt + 1, prob. 1/2, Xt+1 = Xt − 1, prob. 1/2.

If we could specify the probability distribution of Xn for all n, then we would understand this problem completely. Let us use the notation p(k, n) = P(Xn = k), the probability that the walker is at state k at time n. It turns out that it is not difficult to write down an exact combinatorial formula for p(k, n). Each particular path of length n has probability 2−n, and to end up at state k we have to take r steps to the right, l steps to the left, and r − l = k. Therefore we have the system

r − l = k, r + l = n.

Adding these equations gives 2r = n + k and therefore n + k must be even (i.e. n, k have the same parity). Solving for r, l gives n + k n − k r = , l = . 2 2 The number of paths with r right steps and l left steps is

 r + l  (r + l)! = , r r!l!

so we have n! −n p(k, n) = n+k  n−k  2 , 2 ! 2 ! if k, n have the same parity, and zero otherwise. While this formula is exact, it is hard to deal with, and a priori not at all easy to estimate. To be more concrete, let us focus on k = 0, then we have to have n even, so we rewrite (2n)! p(0, 2n) = 2−2n. n!n! How large is this when n is large? Does it go to zero? If so, how quickly?

27 To get an idea of the order of magnitude of this quantity, we use the Stirling Approximation √ n! ∼ 2πnnne−n, where we use the notation f(n) ∼ g(n) to mean f(n) lim = 1. n→∞ g(n) Plugging this in, we have √ 4πn(2n)2ne−2n p(0, 2n) ≈ √ √ 2−2n 2πnnne−n 2πnnne−n 1 = √ , πn since all of the large terms cancel. This suggests that p(0, 2n) ∼ n−1/2, n → ∞ and in particular limn→∞ p(0, 2n) = 0. And this makes sense: the question we’re asking is the probability of throwing 2n coins and getting exactly n heads and n tails. While this is the most likely outcome, the probability of getting a precise 50/50 split should be small when n is large. We have capture the decay in the calculation above. Of course, we say suggests above because we should be a little suspicious of the asymptotic calculation we performed up there! Notice that we wrote Stirling’s Formula as just the leading order term, and didn’t write down the correction. But then later, we canceled huge terms in the numerator and the denominator to get our scaling, but of course corrections to Stirling’s Formula could have messed this calculation up. A more precise formulation of Stirling’s Formula goes as follows. We can write the asymptotic series √ nn  1 1 139  n! = 2πn 1 + + − + ... , e 12n 288n2 51840n3 or we can estimate that √ nn n! = 2πn eλn e where 1 1 ≤ λ ≤ 12n + 1 n 12n sec:? for all n. (We will show how to do this below in Section ??, but take these as given for now.) Then we have 1 1 λ ≤ , λ ≥ , 2n 24n n 12n + 1 so (2n)! 1  1 1  ≤ √ exp − , n!n! πn 24n 2(12n + 1) and 1 1 1 − = < 1 for n > 0. 24n 2(12n + 1) 24n(12n + 1) Since ex < 1 + 2x for x < 1, we have

1 1 1  1  p(0, 2n) ≤ √ e 24n(12n+1) ≤ √ 1 + , πn πn 12n(12n + 1) so we obtain the same asymptotics in an upper bound, with the cost of adding a correction of O(n−3/2. One can do similar asymptotics on the lower bound, and we obtain p(0, 2n) = Cn−1/2 + O(n−3/2). This gives us the asymptotics, but it is a lot of work.

28 3.1.2 Unbiased random walk— asymptotic approach Let us modify the problem consider above and add a length-scale and a time-scale to the problem. Specifi- cally, let us take the same unbiased random walk, but now make the step sizes to the left and right ∆x and the timesteps ∆t. That is to say, if we know the position at time t, then we have ( Xt + ∆x, prob. 1/2, Xt+∆t = Xt − ∆x, prob. 1/2.

Now, let us imagine that we know the probability distribution p(x, t) at some time t. Can we write down an infinitesimal law for the evolution of P ? Using naive reasoning, we would say that there are two ways to be at state x at time t + ∆t: either we were at x − ∆x and time t, and moved right, or we were at x + ∆x at time t, and moved left. In short, 1 1 p(x, t + ∆t) = p(x − ∆x, t) + p(x + ∆x, t). (3.1) eq:discrete 2 2 Being more rigorous in our application of probability theory, we have the law of total probability: X P(Xt+∆t = x) = P(Xt+∆t = x|Xt = y)P(Xt = y) y

and noticing that 1 (X = x|X = y) = (δ ), P t+∆t t 2 y−x=±∆x eq:discrete eq:discrete which recovers (3.1). Let us now take ∆x, ∆t small and expand (3.1) in a Taylor series. If we were to assume that p(x, t) is a smooth function of x, t, then we have ∂p p(x, t + ∆t) = p(x, t) + ∆t (x, t) + O(∆t2), ∂t ∂p (∆x)2 ∂2p p(x + ∆x, t) = p(x, t) + ∆x (x, t) + + O(∆x3), ∂x 2 ∂x2 ∂p (∆x)2 ∂2p p(x − ∆x, t) = p(x, t) − ∆x (x, t) + + O(∆x3). ∂x 2 ∂x2

eq:discrete Plugging into (3.1), we obtain

∂p (∆x)2 ∂2p p(x, t) + ∆t (x, t) = p(x, t) + (x, t) + O(∆x3, ∆t2). ∂t 2 ∂x2 ∂p Canceling and solving for , we obtain ∂t ∂p (∆x)2 ∂2p (x, t) = (x, t) + O(∆x3/∆t, ∆t). ∂t 2∆t ∂x2 Clearly the most interesting limit is if the leading order term on the right-hand side is finite and non-zero, so we take the limit (∆x)2 ∆x, ∆t → 0, → D. 2∆t In this limit, the correction term is negligible, so we have

∂p ∂2p (x, t) = D (x, t). (3.2) eq:heat1 ∂t ∂x2

29 This is the diffusion equation that we have seen before, and D is the diffusion coefficient. Note that [D] = L2/T by definition, which is as it should be. Now, we need to specify an initial condition (and perhaps boundary conditions) for the solution of this PDE. It is not too hard to see that the initial condition that corresponds to X0 = 0 is to take p(x, 0) = δ0(x), i.e. the delta function centered at zero. If so, we have that the solution of the heat equation is

1 2 p(x, t) = √ e−x /2Dt 2πDt and we see that p(0, t) ∼ t−1/2. Presuming that we know how to solve PDE, we get the decay times much more easily that we did before. (We will discuss solutions of this PDE later as well.)

3.1.3 Biased random walk Let us try the same procedure with a biased random walk: namely, choose α, β with α + β = 1, and define ( Xt + ∆x, prob. α, Xt+∆t = Xt − ∆x, prob. β. eq:discrete The rule (3.1) will now become p(x, t + ∆t) = αp(x − ∆x, t) + βp(x + ∆x, t),

and when we again do a Taylor expansion we obtain ∂p ∂p (∆x)2 ∂2p ∆t (x, t) = (β − α)∆x (x, t) + (x, t), ∂t ∂x 2 ∂x2 or ∂p ∆x ∂p (∆x)2 ∂2p (x, t) = (β − α) (x, t) + (x, t). (3.3) eq:sp ∂t ∆t ∂x 2∆t ∂x2 Now we have a scaling problem. If we choose the scaling as we did above, i.e. (∆x)2 → D, 2∆t then the first-order term will go like ∆x 1 ∼ → ∞. ∆t ∆x If, on the other hand, we choose a scaling to make the first term finite (∆x ∼ ∆t), then the diffusion term would be zero. We would like the two terms to scale the same way. This is not possible if α, β are constant. But what if we allow α, β to change in the limit? If β − α were O(∆x), then we would be ok. So we will do this: choose 1 1 α = − a∆x, β = + a∆x, 2 2 eq:sp which makes (3.3) into ∂p (∆x)2 ∂p (∆x)2 ∂2p (x, t) = 2a (x, t) + (x, t). ∂t ∆t ∂x 2∆t ∂x2 We again choose (∆x)2 → D, 2∆t then in this limit we obtain ∂p ∂p ∂2p (x, t) = v (x, t) + D (x, t), ∂t ∂x ∂x2 where we write v = 4aD. Notice that [a] = L−1, since [∆x] = L and α, β are numbers, so therefore [v] = L/T , making v a velocity.

30 3.1.4 Another unbiased random walk We could also extend the definition of the random walk above to only require that α+β ≤ 1 in the definition: ( Xt + ∆x, prob. α, Xt+∆t = Xt − ∆x, prob. β.

Again, we rescale the bias and choose γ γ α = − a∆x, β = + a∆x. 2 2 This now gives p(x, t + ∆t) = (1 − α − β)p(x, t) + αp(x − ∆x, t) + βp(x + ∆x, t), and, expanding as above, we obtain ∂x (∆x)2 ∂p (∆x)2 ∂2p = 2a (x, t) + γ (x, t) ∂t ∆t ∂x 2∆t ∂x2 and in the limit (∆x)2 → D, 2∆t we obtain ∂p ∂p ∂2p (x, t) = v (x, t) + γD (x, t), ∂t ∂x ∂x2 where we write v = 4aD. In this manner, we can make the diffusion coefficient not depend exactly on the scaling.

3.1.5 Summary Notice that in all of the models we’ve considered above, we have microscopic parameters γ, a and, in the correct scaling, this gives a macroscopic model for the flow of probability, and this has parameters v, D.

3.2 Solution of heat equation

3.2.1 No drift term eq:heat1 Let us reconsider (3.2) ∂p ∂2p = D . ∂t ∂x2 We have x ∈ R. It is reasonable for us to assume that we should consider only those solutions that decay sufficiently rapidly at infinity, i.e. require that p(x, t) → 0 fast enough as |x| → ∞. But of course, we will still need to specify an initial condition since this is an evolutionary equation. We will have to think about how to interpret the meaning of this initial condition, or in fact the meaning of the solution. We can do a dimensional√ analysis as we did before, and note that we expect to have a nondimensional parameter Π = x/ Dt. We first consider the special function

1 2 k(x, t) = √ e−x /4Dt 4πDt

which is defined for all t > 0 and all x, and, for each fixed t > 0, decayseq:heat1 exponentially fast as |x| → ∞. It is not hard to see that on the domain {(x, t): t > 0}, this function solves (3.2) by differentiating. Notice, first of all, that for all t > 0, ∞ ∞ Z 1 Z 2 k(x, t) dx = √ e−x /4Dt = 1. −∞ 4πDt −∞

31 so it makes sense to think of k(x, t) as a probability distribution. Also notice that this means that we should have [p] = 1/L. Now, as t → 0, the function k(x, t) becomes discontinuous, since ( 0, x 6= 0, lim k(x, t) = t→0 ∞, x = 0. However, we also have Z ∞ lim k(x, t) dx = lim 1 = 1, t→0 −∞ t→0 so it makes sense to interpret k(x, t) distributionally as a δ function. To be more precise in this statement we proceed as follows. Choose some function p0(x), and define Z ∞ p(x, t) := k ∗ p0 := k(x − y, t)p0(y) dy −∞

Let us also denote √ x/ 4Dt 1 1 Z 2 Q(x, t) := + √ e−s ds. 2 π 0 ∂Q Notice that = k, so we write ∂x Z ∞ ∂Q p(x, t) = (x − y, t)p0(y) dy −∞ ∂x Z ∞ ∂Q = − (x − y, t)p0(y) dy −∞ ∂y Z ∞ ∞ dp0 = − Q(x − y, t)p0(y)|−∞ + Q(x − y, t) (y) dy. −∞ dy

If p0(y) → 0 as |y| → ∞, then this boundary term is zero, in which case we have Z ∞ dp0 p(x, t) = Q(x − y, t) (y) dy. (3.4) eq:weak −∞ dy Moreover, notice that Q is continuous in t, so we have Z ∞ dp Z ∞ dp lim p(x, t) = lim Q(x − y, t) 0 (y) dy = Q(x − y, 0) 0 (y) dy. t→0 t→0 −∞ dy −∞ dy Moreover, it is easy to see that ( 1, x > 0, Q(x, 0) = 0, x < 0, so we have Z ∞ Z x dp0 dp0 Q(x − y, 0) (y) dy = (y) dy = p0(x). −∞ dy −∞ dy eq:heat1 Finally, since (3.2) is linear, we have that if k is a solution, then k ∗ p0 is also. To see this: ∂ ∂2 ∂k ∂2k  (k ∗ p ) − (k ∗ p ) = − ∗ p = 0, ∂t 0 ∂x2 0 ∂t ∂x2 0 eq:heat1 Therefore, we have that p(x, t) is a solution of (3.2) with the property that p(x, 0) = p0(x). Said another way, if we want to solve the BVP ∂p ∂2p = D , p(x, 0) = p (x), ∂t ∂x2 0 then we have shown that the solution is Z ∞ 1 −(x−y)2/4Dt k ∗ p0 = √ e p0(y) dy. 4πDt −∞

32 3.2.2 Drift term Let us now consider the solution of ∂p ∂p ∂2p = v + D . (3.5) eq:heat2 ∂t ∂x ∂x2 We first write q(x, t) = p(x − vt, t) and compute:

∂q ∂p ∂p = −v + , ∂t ∂x ∂t ∂2q ∂2p = , ∂x2 ∂x2 eq:heat2 eq:heat1 and so if q satisfies (3.5) then p satisfies (3.2). Therefore we have Z ∞ 1 −(x−y−vt)2/4Dt q(x, t) = √ e p0(y) dy. 4πDt −∞

In particular, if we think of the solution p(x, t) as a Gaussian which is spreading out as time increases, the case with v 6= 0 is the same solution but moving with velocity v.

3.3 Stochastic differential equations

In earlier sections, we considered the correspondence between microscopic and macroscopic models in a particular scaling limit. We were able to obtain a macroscopic description of how probabilities move around, but we might next ask what the paths look like in the macroscopic limit. These paths are the solutions to what are known as stochastic differential equations (SDE). In this section we formally derive their form and properties.

3.3.1 Background on probability theory We state the three main theorems on sums of independent random variables. Throughout this section, we will assume that we have a probability distribution and that X1,X2,...,Xn,... are independent samples from this distribution. Denote the mean of this distribution as µ and its variance as σ2 < ∞, i.e.

2 2 hXii = µ, (Xi − hXii) = σ .

We will denote n 1 X S = X . n n i i=1 Then we have

Theorem 3.3.1 (Weak Law of Large Numbers). Sn → µ is probability, i.e. for any  > 0, we have

lim (|Sn − µ| > ) = 0. n→∞ P

Theorem 3.3.2 (Strong Law of Large Numbers). Sn → µ almost surely, i.e.   lim Sn = µ = 1. P n→∞

33 √ 2 Theorem 3.3.3 (Central Limit Theorem). We have that n(Sn − µ) → N(0, σ ) in distribution, i.e. √  Z z n 1 −s2/2 lim P (Sn − µ) ≤ z = e ds. n→∞ σ 2π −∞

A more colloquial way of saying this is that σ S = µ + √ N(0, 1). n n

3.3.2 Stochastic differential equations Now, recall the random walk process from before, where we have γ (X = 1) = − a∆x, P i 2 γ (X = −1) = + a∆x, P i 2 P(Xi = 0) = 1 − γ.

It is easy to see that µ = −2a∆x. To compute the variance of Xi, notice that we have

2 2 2 (Xi − hXii) = Xi − 2Xi hXii + (hXii) 2 2 = Xi − 2 hXii hXii + (hXii) 2 2 = Xi − (hXii) ,

2 the last formula being a more convenient way to compute this. Notice that Xi = 1 with probability γ, and is zero otherwise, so that ρ2 = γ − (2a∆x)2 = γ − 4a2(∆x)2, 2 where we denote the variance of Xi by ρ . (We will need σ for something else below.) Of course, the position of the random walk after n steps of size ∆x is n∆xSn. Moreover, if we want to know the position of the random walk at time τ, where our timesteps are ∆t, then we need to make τ/∆t timesteps. In short, if we define Yτ = ∆x(τ/∆t)Xτ/∆t, then we have

2 r 2 ρ (∆x) p 2 (∆x) Yτ = ∆x(τ/∆t)µ + ∆x(τ/∆t) N(0, 1) = −4aτ + 2ρ τ N(0, 1). pτ/∆t 2∆t 2∆t

As always, we will choose the limit $(\Delta x)^2/2\Delta t \to D$, and this gives
$$Y_\tau = -4aD\tau + \sqrt{2}\,\rho\,\sqrt{D\tau}\,N(0,1).$$

This is an exact quantity, up to the convergence speed in the Central Limit Theorem, and works for all $\tau$. However, we might also want to consider the case where $\mu, \rho$ vary as a function of position, and thus we take $\tau$ small. To stress this, we will denote this timestep by $\delta t$, and let us define the iterative random equation by
$$Y_{t+\delta t} = Y_t + b\,\delta t + \sigma\sqrt{\delta t}\,N(0,1),$$
where $b = -4aD$ and $\sigma = \sqrt{2\gamma D - 8a^2 D(\Delta x)^2} \to \sqrt{2\gamma D}$. If it makes sense to take the limit as $\delta t \to 0$, then we want to have a symbol for that limit, and we denote this by the stochastic differential equation

$$dY_t = b\,dt + \sigma\,dW_t.$$
We will not prove here that this limit exists (see e.g. [?]), but it does. In fact, there are many ways to think of SDEs, but one operational way is to think in terms of limits like these. Notice, however, that we've taken

two limits here; to move from the random walk timescale $\Delta t$ to the iterative timescale $\delta t$ we had to use the CLT and the limit as $n \to \infty$, so we have to be careful to think of scalings where $\Delta t \ll \delta t \ll 1$. Notice that since we have an infinitesimal $\delta t$, we can easily define SDE with non-constant coefficients, as follows: to define the SDE
$$dY_t = b(Y_t)\,dt + \sigma(Y_t)\,dW_t,$$
we define the iterative system
$$Y_{t+\delta t} = Y_t + b(Y_t)\,\delta t + \sigma(Y_t)\sqrt{\delta t}\,N(0,1).$$
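This iterative system is exactly the classical Euler–Maruyama scheme, so it is easy to turn into code (a minimal sketch assuming numpy; the drift and diffusion functions chosen here are arbitrary placeholders):

import numpy as np

def euler_maruyama(b, sigma, y0, T, dt, rng):
    # Iterate Y_{t+dt} = Y_t + b(Y_t) dt + sigma(Y_t) sqrt(dt) N(0,1).
    y = np.empty(int(T/dt) + 1)
    y[0] = y0
    for k in range(len(y) - 1):
        y[k+1] = y[k] + b(y[k])*dt + sigma(y[k])*np.sqrt(dt)*rng.standard_normal()
    return y

rng = np.random.default_rng(1)
# Constant coefficients, so Y_T should be distributed as N(y0 + b T, sigma^2 T).
path = euler_maruyama(lambda y: -0.5, lambda y: 1.2, y0=0.0, T=1.0, dt=1e-3, rng=rng)
print(path[-1])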

3.3.3 Fokker-Planck equations

If we have microscopic parameters $\gamma, a$ and consider the diffusive limit, then on one hand we have the macroscopic description of paths, namely the SDE
$$dY_t = -4aD\,dt + \sqrt{2\gamma D}\,dW_t.$$

On the other hand, we have the macroscopic description of the flows of probability, namely the PDE

$$\frac{\partial p}{\partial t} = 4aD\,\frac{\partial p}{\partial x} + \gamma D\,\frac{\partial^2 p}{\partial x^2}.$$
So, as a general rule, to any SDE $dX_t = b\,dt + \sigma\,dW_t$ we associate a particular PDE, the Fokker–Planck equation, namely

$$\frac{\partial u}{\partial t} = -b\,\frac{\partial u}{\partial x} + \frac{\sigma^2}{2}\,\frac{\partial^2 u}{\partial x^2}.$$
In particular, if we want to know how the probability distribution of the SDE moves around, we simply say that if we define $p_0(x)$ to be the probability distribution of $X_0$ and solve the Fokker–Planck equation with $u(x,0) = p_0(x)$, then $u(x,t)$ gives the probability distribution of $X_t$. In particular, for constant-coefficient SDE, if we start out with a deterministic $X_0$, then the distribution for $X_t$ is a shifted Gaussian, as we showed in Section ??. [Add discussion of non-constant coefficients.]
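For constant coefficients, the Fokker–Planck solution started from a point mass is the Gaussian $N(bt, \sigma^2 t)$, which we can check against an Euler–Maruyama ensemble (a sketch assuming numpy; all parameter values are arbitrary):

import numpy as np

b, sigma, T, dt, M = -1.0, 0.8, 2.0, 1e-3, 20000
rng = np.random.default_rng(2)

# M independent Euler-Maruyama paths of dX = b dt + sigma dW, all started at X_0 = 0.
X = np.zeros(M)
for _ in range(int(T / dt)):
    X += b*dt + sigma*np.sqrt(dt)*rng.standard_normal(M)

# Fokker-Planck prediction: X_T ~ N(b T, sigma^2 T).
print("empirical mean/var:   ", X.mean(), X.var())
print("Fokker-Planck mean/var:", b*T, sigma**2 * T)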

3.4 Comments on units, scaling

Chapter 4

Birth–Death Processes, Generators

4.1 Counting Process

4.1.1 Exponential Random Variable

We define the counting process with rate $\rho$. Let us first consider the case of an event that happens "at rate" $\rho$ and think of what this means. The interpretation of this should be that if the event has not occurred by time $t$, then the probability of it happening in the next interval of length $\Delta t$ should be $\rho\Delta t$ to leading order, or
$$\mathbb{P}(E \in (t, t+\Delta t] \mid E \notin [0,t]) = \rho\Delta t + o(\Delta t).$$
So, let us define $Q(t)$ as the probability that the event has not occurred by time $t$. Then $Q(0) = 1$, and we have
$$Q(t+\Delta t) = Q(t) - \rho\Delta t\,Q(t) + o(\Delta t),$$
or
$$\frac{Q(t+\Delta t) - Q(t)}{\Delta t} = -\rho Q(t) + o(1).$$
Taking the limit $\Delta t \to 0$, we have
$$\frac{dQ}{dt} = -\rho Q(t), \qquad Q(0) = 1.$$
The solution of this is $Q(t) = e^{-\rho t}$, so that the probability of surviving to time $t$ goes to zero exponentially fast as $t \to \infty$. In fact, this inspires the definition:

Definition 4.1.1. An exponential random variable with rate $\rho$ is a random variable $T$ with the property that
$$\mathbb{P}(T > t) = e^{-\rho t}.$$
We will denote this by $T \sim E(\rho)$.

Now, we might ask the question as to when the event will occur, or, more specifically, what is the probability distribution of the event time. If $T \sim E(\rho)$, then
$$\mathbb{P}(T \in (t, t+\Delta t]) = \mathbb{P}(T > t \cap T \le t+\Delta t) = \mathbb{P}(T > t) - \mathbb{P}(T > t+\Delta t) = e^{-\rho t} - e^{-\rho(t+\Delta t)}.$$
If $\Delta t \ll 1$, then we have $e^{-\rho(t+\Delta t)} = e^{-\rho t}e^{-\rho\Delta t} = e^{-\rho t}(1 - \rho\Delta t + O((\Delta t)^2))$, so for $\Delta t \ll 1$ we have
$$\mathbb{P}(T \in (t, t+\Delta t]) = \rho e^{-\rho t}\Delta t + O((\Delta t)^2),$$
thus giving the probability density $\rho e^{-\rho t}$. For those more familiar with probability theory, one can derive this more easily by noting that we are defining the hazard function of $E(\rho)$ in the definition above, and therefore the density is $-\frac{d}{dt}(e^{-\rho t})$.
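A small simulation of the "rate $\rho$" mechanism (a sketch assuming numpy; parameters arbitrary): the number of $\Delta t$-slices until the first success of a Bernoulli($\rho\Delta t$) trial is geometric, and the resulting waiting time should be approximately $E(\rho)$ for small $\Delta t$.

import numpy as np

rho, dt, M = 2.0, 1e-3, 100000
rng = np.random.default_rng(3)

# Waiting time = dt * (number of Bernoulli(rho*dt) slices until the first success).
waits = dt * rng.geometric(rho * dt, size=M)

print("empirical   P(T > 1):", (waits > 1.0).mean())
print("exponential P(T > 1):", np.exp(-rho * 1.0))
print("empirical mean:", waits.mean(), " vs 1/rho =", 1/rho)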

4.1.2 Poisson process

Let us now consider a generalization of the model given above, where we think of some object being created at rate $\rho$, and we want to count the number of copies of this object which exist at time $t$. For example, think of the chemical system $A \xrightarrow{\rho} X$, where $A$ is a constant source, and we want to know the number of molecules of $X$. Let $C_t$ be the number of these molecules; then we define

$$\mathbb{P}(C_{t+\Delta t} = k+1 \mid C_t = k) = \rho\Delta t, \qquad \mathbb{P}(C_{t+\Delta t} = k \mid C_t = k) = 1 - \rho\Delta t, \qquad \mathbb{P}(C_{t+\Delta t} = k \mid C_t = j) = 0 \ \text{ if } j \neq k, k-1.$$
In short, we are defining a process that, in any small timestep $\Delta t$, can either create one molecule with probability $\rho\Delta t$, or not. We cannot destroy molecules or create more than one in a given timestep. Let us define $Q_k(t) = \mathbb{P}(C_t = k)$ as the probability of having $k$ molecules at time $t$. Then, using an argument similar to that above, we have for $k > 0$,

$$Q_k(t+\Delta t) = \mathbb{P}(C_{t+\Delta t} = k) = \mathbb{P}(C_{t+\Delta t} = k \mid C_t = k)\,\mathbb{P}(C_t = k) + \mathbb{P}(C_{t+\Delta t} = k \mid C_t = k-1)\,\mathbb{P}(C_t = k-1)$$
$$= (1 - \rho\Delta t)\,Q_k(t) + \rho\Delta t\,Q_{k-1}(t) = Q_k(t) + \rho\Delta t\,(Q_{k-1}(t) - Q_k(t)).$$

In the limit as $\Delta t \to 0$, this leads to
$$\frac{d}{dt}Q_k(t) = \rho\,(Q_{k-1}(t) - Q_k(t)). \qquad (4.1)$$
For $k = 0$, we obtain the same equation as before, namely
$$\frac{d}{dt}Q_0(t) = -\rho Q_0(t). \qquad (4.2)$$
If we further assume that there are no molecules at the outset of the process, we have

$$Q_0(0) = 1, \qquad Q_k(0) = 0 \ \text{ for } k > 0.$$

Also, note that from total probability we should have that

$$\sum_{k=0}^{\infty} Q_k(t) = 1 \quad \text{for all } t \ge 0.$$
Clearly this is true for $t = 0$ by definition. Moreover, we have that

$$\frac{d}{dt}\sum_{k=0}^{\infty} Q_k(t) = \sum_{k=0}^{\infty}\frac{d}{dt}Q_k(t) = -\rho Q_0(t) + \sum_{k=1}^{\infty}\rho\,(Q_{k-1}(t) - Q_k(t)) = -\rho Q_0(t) + \rho\sum_{k=0}^{\infty}Q_k(t) - \rho\sum_{k=1}^{\infty}Q_k(t) = 0,$$
so it remains true for all $t > 0$. Now we want to solve the system. As before, we have

$$Q_0(t) = e^{-\rho t}.$$

For $k > 0$, we have the system
$$Q_k' + \rho Q_k = \rho Q_{k-1}.$$

Using Duhamel's formula, we have
$$Q_k(t) = e^{-\rho t}\int_0^t \rho e^{\rho s}\,Q_{k-1}(s)\,ds.$$
Writing $Q_k(t) = f_k(t)\,e^{-\rho t}$, this simplifies to
$$f_k(t) = \int_0^t \rho f_{k-1}(s)\,ds, \qquad f_0(t) \equiv 1.$$
It is not hard to see that this recursion relation gives rise to the formula
$$f_k(t) = \frac{(\rho t)^k}{k!},$$
and thus
$$Q_k(t) = \frac{(\rho t)^k}{k!}\,e^{-\rho t}.$$
This is the well-known Poisson distribution. To solve the system for general initial conditions, let us first consider the case where we have $k$ molecules at time zero with probability one. Then this system is exactly the same, except the indices are shifted by $k$, i.e. if we assume that

$$Q_k(0) = 1, \qquad Q_l(0) = 0 \ \text{ for } l \neq k,$$
then we have
$$Q_l(t) = \frac{(\rho t)^{l-k}}{(l-k)!}\,e^{-\rho t} \quad \text{for all } l \ge k,$$

and of course $Q_l(t) = 0$ for $l < k$ (since molecules cannot be destroyed). If we assume that the initial distribution of molecules is random and the distribution is given by the vector $\lambda = (\lambda_0, \lambda_1, \dots)$, where $\lambda_k$ is the probability of starting with $k$ molecules, then we have

$$Q_l(t) = \sum_{k=0}^{l} \lambda_k\,\frac{(\rho t)^{l-k}}{(l-k)!}\,e^{-\rho t}$$
by linearity. Just as a sanity check that this is still a distribution, we compute

$$\sum_{l=0}^{\infty} Q_l(t) = \sum_{l=0}^{\infty}\sum_{k=0}^{l}\lambda_k\,\frac{(\rho t)^{l-k}}{(l-k)!}\,e^{-\rho t}.$$
We will change the order of summation. The set $\{(k,l) \mid 0 \le k \le l\}$ can be written by ranging $l$ from $0$ to $\infty$ and then $k$ from $0$ to $l$, or can also be described by ranging $k$ from $0$ to $\infty$ and $l$ from $k$ to $\infty$, so we have
$$\sum_{l=0}^{\infty} Q_l(t) = e^{-\rho t}\sum_{k=0}^{\infty}\lambda_k\sum_{l=k}^{\infty}\frac{(\rho t)^{l-k}}{(l-k)!} = e^{-\rho t}\sum_{k=0}^{\infty}\lambda_k\,e^{\rho t} = \sum_{k=0}^{\infty}\lambda_k = 1.$$
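To see the formula in action, we can simulate the counting process directly from its $\Delta t$-definition and compare the empirical distribution of $C_t$ with $Q_k(t) = (\rho t)^k e^{-\rho t}/k!$ (a sketch assuming numpy; parameters arbitrary; the total number of successes among $t/\Delta t$ Bernoulli($\rho\Delta t$) trials is sampled as a single binomial draw):

import numpy as np
from math import exp, factorial

rho, t, dt, M = 1.5, 2.0, 1e-3, 100000
rng = np.random.default_rng(4)

# Each sample: the total number of Bernoulli(rho*dt) creation events up to time t.
C = rng.binomial(int(t / dt), rho * dt, size=M)

for k in range(6):
    Q_k = (rho*t)**k * exp(-rho*t) / factorial(k)
    print(k, (C == k).mean(), Q_k)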

4.1.3 Generators

Now, let us consider two classes of objects defined on the integers. We first consider probability distributions, which are vectors $\lambda = (\lambda_k)$ defined on $\mathbb{Z}$ with

$$\lambda_k \ge 0 \ \text{ for all } k, \qquad \sum_k \lambda_k = 1.$$

Let us also consider observables, which are simply functions $f : \mathbb{Z} \to \mathbb{R}$, i.e. functions that can take any real value on the integers. In practice, we will be considering bounded observables, i.e. those functions with $\sup_k |f(k)| < \infty$. For a given observable $f$, let us compute the expected value of the observable evaluated on the counting process. One way to think of this is that if we will receive a payoff of $f(k)$ units if we have $k$ molecules at time $t$, then we want to compute the average payoff at time $t$. Recall that if we have a random variable $X$ that takes values in the integers, we have the average value as

$$\mathbb{E}[X] := \sum_k k\,\mathbb{P}(X = k),$$
and the average value of $f(X)$ is given by

$$\mathbb{E}[f(X)] = \sum_k f(k)\,\mathbb{P}(X = k).$$

So, as a general formula, we have

$$\mathbb{E}[f(C_t)] = \sum_k f(k)\,\mathbb{P}(C_t = k) = \sum_k f(k)\,\frac{(\rho t)^k}{k!}\,e^{-\rho t},$$

and it seems difficult to evaluate this sum exactly for general f. However, if we consider the case f(x) = x, then we have

$$\mathbb{E}[C_t] = \sum_{k=0}^{\infty} k\,\mathbb{P}(C_t = k) = e^{-\rho t}\sum_{k=1}^{\infty} k\,\frac{(\rho t)^k}{k!} = e^{-\rho t}\sum_{k=1}^{\infty}\frac{(\rho t)^k}{(k-1)!} = e^{-\rho t}\sum_{l=0}^{\infty}\frac{(\rho t)^{l+1}}{l!} = \rho t\,e^{-\rho t}\sum_{l=0}^{\infty}\frac{(\rho t)^l}{l!} = \rho t.$$

However, is there a way to attack the problem for general $f$? We compute:

$$\mathbb{E}[f(C_{t+\Delta t})] = \sum_{k=0}^{\infty} f(k)\,\mathbb{P}(C_{t+\Delta t} = k)$$
$$= \sum_{k=1}^{\infty} f(k)\,\mathbb{P}(C_{t+\Delta t} = k \mid C_t = k-1)\,\mathbb{P}(C_t = k-1) + \sum_{k=0}^{\infty} f(k)\,\mathbb{P}(C_{t+\Delta t} = k \mid C_t = k)\,\mathbb{P}(C_t = k)$$
$$= \sum_{k=1}^{\infty} f(k)\,\rho\Delta t\,\mathbb{P}(C_t = k-1) + \sum_{k=0}^{\infty} f(k)(1 - \rho\Delta t)\,\mathbb{P}(C_t = k)$$
$$= \sum_{k=0}^{\infty} f(k)\,\mathbb{P}(C_t = k) + \rho\Delta t\sum_{k=0}^{\infty}[f(k+1) - f(k)]\,\mathbb{P}(C_t = k)$$
$$= \mathbb{E}[f(C_t)] + \rho\Delta t\sum_{k=0}^{\infty}[f(k+1) - f(k)]\,\mathbb{P}(C_t = k).$$

Now, let us define a linear operator on observables given by

$$Lf(x) := \rho\,(f(x+1) - f(x)).$$

Then we have
$$\frac{d}{dt}\mathbb{E}[f(C_t)] = \mathbb{E}[Lf(C_t)].$$
Moreover, if we condition this rate of change on the number of molecules at time $t$, the right-hand side becomes deterministic. To see this, let us condition on the event that $C_t = x$, and consider the last line of the computation above, namely we have

$$\mathbb{E}[f(C_{t+\Delta t}) \mid C_t = x] = \mathbb{E}[f(C_t) \mid C_t = x] + \rho\Delta t\sum_{k=0}^{\infty}[f(k+1) - f(k)]\,\mathbb{P}(C_t = k \mid C_t = x),$$

but of course $\mathbb{P}(C_t = k \mid C_t = x) = \delta_{k,x}$, so we have

$$\mathbb{E}[f(C_{t+\Delta t}) \mid C_t = x] - \mathbb{E}[f(C_t) \mid C_t = x] = \rho\Delta t\sum_{k=0}^{\infty}[f(k+1) - f(k)]\,\delta_{k,x} = \rho\Delta t\,[f(x+1) - f(x)],$$

i.e.
$$\frac{d}{dt}\mathbb{E}[f(C_t) \mid C_t = x] = Lf(x).$$
This linear operator $L$ is called the generator of the Poisson process.
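We can check the identity $\frac{d}{dt}\mathbb{E}[f(C_t)] = \mathbb{E}[Lf(C_t)]$ numerically by differencing the exact Poisson expectation (a sketch assuming numpy and scipy are available; the observable $f(k) = \cos k$ is an arbitrary bounded choice):

import numpy as np
from scipy.stats import poisson

rho, t, h = 1.5, 2.0, 1e-4
k = np.arange(400)                       # truncated state space; tail mass is negligible
f = np.cos(k)                            # bounded observable f(k) = cos(k)
Lf = rho * (np.cos(k + 1) - np.cos(k))   # generator: Lf(x) = rho (f(x+1) - f(x))

E = lambda g, s: np.sum(g * poisson.pmf(k, rho * s))
print((E(f, t + h) - E(f, t - h)) / (2*h))   # d/dt E[f(C_t)] by centered difference
print(E(Lf, t))                              # E[L f(C_t)]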

4.1.4 $\ell^p$ spaces and adjoints

Recall the definitions of the $\ell^p$ spaces:

Definition 4.1.2. If $f : \mathbb{Z} \to \mathbb{R}$, we define the $\ell^p$ norm

$$\|f\|_p := \left(\sum_k |f(k)|^p\right)^{1/p}$$

for $1 \le p < \infty$, and the $\ell^\infty$ norm as

$$\|f\|_\infty = \sup_k |f(k)|.$$

Exercise 4.1.3. Show that $\|\cdot\|_p$ is a norm for $1 \le p \le \infty$.

Exercise 4.1.4. Show that for any function $f$ with $\|f\|_{p_0} < \infty$ for some $p_0 < \infty$,

$$\lim_{p\to\infty}\|f\|_p = \|f\|_\infty.$$

This justifies the notation of the latter.

Definition 4.1.5. For all $1 \le p \le \infty$, we define $\ell^p(\mathbb{Z})$ as

$$\ell^p(\mathbb{Z}) := \{f : \mathbb{Z} \to \mathbb{R} \mid \|f\|_p < \infty\}.$$

Notice that, by definition, probability distributions are in $\ell^1(\mathbb{Z})$ and bounded observables are in $\ell^\infty(\mathbb{Z})$. This means that there is a way to multiply these two objects, and for this we need the following theorem.

Theorem 4.1.6 (Hölder's Inequality). Let $f \in \ell^p(\mathbb{Z})$, $g \in \ell^q(\mathbb{Z})$ with $\frac{1}{p} + \frac{1}{q} = 1$. Then
$$\sum_k |f(k)g(k)| \le \|f\|_p\,\|g\|_q.$$

Remark 4.1.7. Throughout this section, whenever $1 \le p \le \infty$, we will write $p'$ for the (unique) number such that $\frac{1}{p} + \frac{1}{p'} = 1$.

Proof. We first have to establish an inequality known as Young's Inequality, i.e. if $1 < p < \infty$ and $a, b > 0$, then
$$ab \le \frac{a^p}{p} + \frac{b^{p'}}{p'}.$$
To see this, notice that the map $x \mapsto e^x$ is convex, and thus

$$ab = e^{\log a + \log b} = e^{\frac{1}{p}\log a^p + \frac{1}{p'}\log b^{p'}} \le \frac{1}{p}\,e^{\log a^p} + \frac{1}{p'}\,e^{\log b^{p'}} = \frac{a^p}{p} + \frac{b^{p'}}{p'}.$$

Now, let us first assume that $\|f\|_p = \|g\|_{p'} = 1$, $1 < p < \infty$. Then we have

$$\sum_k |f(k)g(k)| \le \sum_k\left(\frac{|f(k)|^p}{p} + \frac{|g(k)|^{p'}}{p'}\right) = \frac{1}{p}\,\|f\|_p^p + \frac{1}{p'}\,\|g\|_{p'}^{p'} = \frac{1}{p} + \frac{1}{p'} = 1.$$

More generally, if $\|f\|_p, \|g\|_{p'} < \infty$, then we define

$$\hat f = \frac{f}{\|f\|_p}, \qquad \hat g = \frac{g}{\|g\|_{p'}},$$

and we have

$$\sum_k |f(k)g(k)| = \sum_k \left|\hat f(k)\hat g(k)\right|\,\|f\|_p\,\|g\|_{p'} = \|f\|_p\,\|g\|_{p'}\sum_k \left|\hat f(k)\hat g(k)\right| \le \|f\|_p\,\|g\|_{p'}.$$

Finally, in the case where we choose $p = 1$, $p' = \infty$, we have

$$\sum_k |f(k)g(k)| \le \sum_k |f(k)|\left(\sup_k |g(k)|\right) = \left(\sup_k |g(k)|\right)\sum_k |f(k)| = \|f\|_1\,\|g\|_\infty.$$

Definition 4.1.8 (inner product). Whenever $f \in \ell^p$, $g \in \ell^{p'}$, we define
$$\langle f, g\rangle := \sum_k f(k)g(k),$$
and by Hölder (and the triangle inequality) this is always finite.

Exercise 4.1.9. Prove that this is an inner product.

In particular, whenever $\lambda$ is a probability distribution, and $f$ a bounded observable, then
$$\langle \lambda, f\rangle = \sum_k \lambda_k f_k < \infty.$$

Once we have an inner product, it makes sense to define an adjoint:

Definition 4.1.10 (adjoint). Let $L : \ell^p \to \ell^p$ be a linear operator. We define the (unique!) linear operator $L^* : \ell^{p'} \to \ell^{p'}$ by requiring that
$$\langle Lf, g\rangle = \langle f, L^*g\rangle$$

for all $f \in \ell^p$, $g \in \ell^{p'}$.

Remark 4.1.11. Remarks on existence, uniqueness, and linearity if L is bounded.

Definition 4.1.12 (Bounded linear operator). Let $V$ be a normed vector space, and $L : V \to V$ be a linear operator. We define
$$\|L\|_{op(V)} = \sup_{v \in V,\ v \neq 0}\frac{\|Lv\|}{\|v\|} = \sup_{\|v\| = 1}\|Lv\|.$$

If $\|L\|_{op(V)} < \infty$, then we say that $L$ is bounded on $V$.

4.1.5 Adjoint of generator Recall the generator given for the Poisson process by

$$Lf(x) = \rho\,(f(x+1) - f(x)).$$

This is clearly a bounded operator on $\ell^\infty$. To see this, we compute

$$\|Lf\|_\infty = \rho\sup_x |f(x+1) - f(x)| \le 2\rho\sup_x |f(x)| = 2\rho\,\|f\|_\infty < \infty.$$

So, in fact, $\|L\|_{op(\ell^\infty)} \le 2\rho$. Now, what is the adjoint of this operator when we think of it as an operator on probability distributions? We have to have that
$$\langle L^*\lambda, f\rangle = \langle Lf, \lambda\rangle,$$

i.e.
$$\sum_{k=0}^{\infty}\rho\,(f(k+1) - f(k))\,\lambda_k = \sum_{k=0}^{\infty}(L^*\lambda)_k\,f(k).$$
We compute:

$$\sum_{k=0}^{\infty}\rho\,(f(k+1) - f(k))\,\lambda_k = \sum_{k=0}^{\infty}\rho f(k+1)\lambda_k - \sum_{k=0}^{\infty}\rho f(k)\lambda_k = \sum_{k=1}^{\infty}\rho f(k)\lambda_{k-1} - \sum_{k=0}^{\infty}\rho f(k)\lambda_k,$$

and we see that we must define
$$(L^*\lambda)_k := \begin{cases} -\rho\lambda_0, & k = 0,\\ \rho\,(\lambda_{k-1} - \lambda_k), & k > 0.\end{cases}$$
Recall the ODEs (4.1), (4.2) for the probability distribution of the Poisson process: if we define $Q(t) = (Q_k(t))_{k=0}^{\infty}$ to be the probability distribution of the Poisson process at time $t$, then
$$\frac{d}{dt}Q = L^*Q.$$
This is not a coincidence. Using our inner product notation, if we ever have a random variable $X$ with probability distribution $\lambda$, then
$$\langle \lambda, f\rangle = \sum_k \lambda_k f_k = \mathbb{E}[f(X)].$$
Thus we write
$$\frac{d}{dt}\mathbb{E}[f(C_t)] = \frac{d}{dt}\langle Q, f\rangle = \langle Q', f\rangle = \langle L^*Q, f\rangle,$$
but from the computations in Section ??, we have
$$\frac{d}{dt}\mathbb{E}[f(C_t)] = \mathbb{E}[Lf(C_t)] = \langle Q, Lf\rangle.$$
So even though we have done the calculations for both the derivative of the probability distribution and the expectation of an observable, we see that the calculations are (at least for the Poisson process) dual to each other. In fact, we will see below that this duality holds in general.
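The forward equation $dQ/dt = L^*Q$ is easy to integrate on a truncated state space and compare with the Poisson solution (a sketch assuming numpy and scipy; forward Euler with a small step is crude but adequate here):

import numpy as np
from scipy.stats import poisson

rho, t, K, dt = 1.5, 2.0, 200, 1e-3

def Lstar(Q):
    # (L* Q)_0 = -rho Q_0,  (L* Q)_k = rho (Q_{k-1} - Q_k) for k > 0
    out = np.empty_like(Q)
    out[0] = -rho * Q[0]
    out[1:] = rho * (Q[:-1] - Q[1:])
    return out

Q = np.zeros(K); Q[0] = 1.0         # start with zero molecules
for _ in range(int(t / dt)):        # forward Euler on dQ/dt = L* Q
    Q = Q + dt * Lstar(Q)

print(Q[:5])
print(poisson.pmf(np.arange(5), rho * t))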

4.1.6 Large number limit

As we stated above, we can think of the Poisson process as a model for a system that creates a certain object with rate $\rho$ and then computes the distribution for the size of the population at some future time. Now, let us imagine that we have $N$ copies of such a process, namely that we have Poisson processes
$$C_t^{(1)}, C_t^{(2)}, \dots, C_t^{(N)},$$
where each is a Poisson process, and we define

$$X_t = \frac{1}{N}\sum_{i=1}^{N} C_t^{(i)}.$$

The question we ask is, what is the generator of the stochastic process $X_t$? The only thing that can happen for the process $X_t$ is that it can stay the same, or increase by $1/N$ if one of the $C_t^{(i)}$ increases by $1$. In short, consider the event $E_i$ defined as "$C_t^{(i)}$ increases by one, and the other $C_t^{(j)}$ do not increase". Clearly these events are disjoint, and we can write

$$\mathbb{P}(X_{t+\Delta t} = \alpha + 1/N \mid X_t = \alpha) = \sum_{i=1}^{N}\mathbb{P}(E_i \mid X_t = \alpha) = \sum_{i=1}^{N}\mathbb{P}(C_{t+\Delta t}^{(i)} = C_t^{(i)} + 1 \mid X_t = \alpha)\prod_{j\neq i}\mathbb{P}(C_{t+\Delta t}^{(j)} = C_t^{(j)} \mid X_t = \alpha)$$
$$= \sum_{i=1}^{N}\rho\Delta t\prod_{j\neq i}(1 - \rho\Delta t) = N\rho\Delta t + O(\Delta t)^2.$$
Thus we can think of the mean process as a process that makes smaller jumps (smaller by a factor of $N$) but faster jumps (faster by a factor of $N$). We technically still need to show that the probability of $X_t$ jumping by $2/N$ (or $3/N$, $4/N$, etc.) is infinitesimally small, but we could do so, and we would have that

$$\mathbb{P}(X_{t+\Delta t} = \alpha + 1/N \mid X_t = \alpha) = N\rho\Delta t + O(\Delta t)^2, \qquad \mathbb{P}(X_{t+\Delta t} = \alpha \mid X_t = \alpha) = 1 - N\rho\Delta t + O(\Delta t)^2.$$
So we compute
$$\mathbb{E}[f(X_{t+\Delta t}) \mid X_t = \alpha] = \sum_{\beta \in \mathbb{Z}/N} f(\beta)\,\mathbb{P}(X_{t+\Delta t} = \beta \mid X_t = \alpha) = f(\alpha + 1/N)\,N\rho\Delta t + f(\alpha)(1 - N\rho\Delta t) = f(\alpha) + N\rho\Delta t\,[f(\alpha + 1/N) - f(\alpha)].$$

Noting that $\mathbb{E}[f(X_t) \mid X_t = \alpha] = f(\alpha)$, we have
$$\frac{d}{dt}\mathbb{E}[f(X_t) \mid X_t = \alpha] = N\rho\,[f(\alpha + 1/N) - f(\alpha)].$$
Therefore we should choose the generator $L^N$ to be
$$L^N f(x) = N\rho\,[f(x + 1/N) - f(x)].$$

(We will generalize this computation and make it more precise below, but at least for now this formally checks out.) Now, let us consider what happens as we send $N \to \infty$. We can (at least formally) expand $f$ in a Taylor series, and we obtain
$$f(x + 1/N) = f(x) + \frac{1}{N}f'(x) + \frac{1}{2N^2}f''(x) + O(N^{-3}),$$
and this gives us
$$L^N f(x) = N\rho\left[\frac{1}{N}f'(x) + \frac{1}{2N^2}f''(x) + O(N^{-3})\right] = \rho f'(x) + \frac{\rho}{2N}f''(x) + O(N^{-2}).$$
So, in some sense, for large $N$,
$$L^N f(x) \approx \rho f'(x) + \frac{\rho}{2N}f''(x)$$

as operators. This is a differential operator, but we have seen these before. In fact, this should correspond to the SDE
$$dX_t = \rho\,dt + \sqrt{\frac{\rho}{N}}\,dW_t,$$
since the generator of $dX_t = b\,dt + \sigma\,dW_t$ is $bf' + \frac{\sigma^2}{2}f''$, and here $\frac{\sigma^2}{2} = \frac{\rho}{2N}$.
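Since the average of $N$ independent rate-$\rho$ Poisson processes is exactly a Poisson($N\rho t$) variable divided by $N$, we can check the SDE's predicted mean $\rho t$ and variance $(\rho/N)t$ directly (a sketch assuming numpy; parameters arbitrary):

import numpy as np

rho, t, N, M = 1.5, 2.0, 1000, 100000
rng = np.random.default_rng(5)

# X_t = (1/N) * sum of N independent Poisson processes  =  Poisson(N rho t) / N.
X = rng.poisson(N * rho * t, size=M) / N

print("empirical mean/var:", X.mean(), X.var())
print("SDE prediction:    ", rho * t, (rho / N) * t)   # dX = rho dt + sqrt(rho/N) dW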

4.2 General discrete stochastic process

We want to consider a generalization of the Poisson process that we considered above.

Chapter 5

Singular Asymptotics

There are many problems that are singular, for which the regular asymptotic methods above break down and we need singular asymptotics. We give some examples below.

5.1 Long timescales for weakly nonlinear ODE

As we showed above, if we consider the system
$$\frac{dx}{dt} = f(x) + \epsilon g(x),$$
and replace it with the system
$$\frac{dy}{dt} = f(y),$$
then on timescales $t \sim O(1)$ it will not make a difference whether we retain the $O(\epsilon)$ term or not, i.e. we have
$$\sup_{t\in[0,T]} |x(t) - y(t)| = O(\epsilon).$$
What if, however, we are interested in understanding longer timescales? In particular, what if we wanted an approximate equation for $y(t)$ that allowed an estimate of the form

$$\sup_{t\in[0,T/\epsilon]} |x(t) - y(t)| = O(\epsilon),$$

i.e. we can take times of order $O(1/\epsilon)$ and still retain such an estimate? It turns out that we can, with (of course) the added complexity of making the equation for $y$ more complicated. But this would be good, because it would tell us that by choosing $\epsilon$ sufficiently small, we could make our approximation good for as long as we might need.

5.1.1 Example from celestial mechanics

Let us consider a concrete example where it is clear that $O(1)$ timescale asymptotics are simply not acceptable. It is classically known that the two-body problem in celestial mechanics is solvable while the three-body problem is not. More specifically, let us consider the case of a universe with exactly two bodies, $A$ and $B$, interacting with gravitational forces alone. As we saw in Section ??, using Newton's Law of Gravitation and Newton's Second Law, we have
$$\ddot{x}_B = Gm_A\,\frac{x_A - x_B}{\|x_A - x_B\|^3},$$

and similarly for body $A$. Here $G$ is the gravitational constant. Note that $[G] = L^3/MT^2$. More generally, if we consider $N$ bodies, then the acceleration of body $n$ due to the other bodies is given by
$$\ddot{x}_n = G\sum_{k \neq n} m_k\,\frac{x_k - x_n}{\|x_k - x_n\|^3}.$$
Although this equation does not look much different, formally, when we change the number of bodies, the types of solutions that this equation can exhibit change radically. As said above, when there are two bodies, the solutions are simple: each body orbits on an ellipse with the center of mass at one of the foci. This is in accordance with Kepler's First Law of Planetary Motion. Once we move to three bodies, the solution is much more complicated and can, e.g., exhibit chaos. So let us think of how to solve the three-body problem perturbatively, i.e. to think of it as a perturbation of a solvable problem (the two-body problem). In particular, let us consider the three bodies to be the Earth, the Sun, and Jupiter. In the absence of Jupiter, we expect the Earth to move around the Sun in an ellipse (here we're eliding a bit, and by "Earth" we really mean "the coupled Earth–Moon system"), but when Jupiter appears it should modify Earth's dynamics a bit. So we should have the following dynamical system for Earth:
$$\ddot{x}_E = G\left(m_S\,\frac{x_S - x_E}{\|x_S - x_E\|^3} + m_J\,\frac{x_J - x_E}{\|x_J - x_E\|^3}\right), \qquad x_E(0) = x_0, \quad \dot{x}_E(0) = v_0.$$

We nondimensionalize as always with

$$x = x_c\,\xi, \qquad t = t_c\,\tau,$$

and we obtain
$$\ddot{\xi}_E = \frac{Gm_S t_c^2}{x_c^3}\,\frac{\xi_S - \xi_E}{\|\xi_S - \xi_E\|^3} + \frac{Gm_J t_c^2}{x_c^3}\,\frac{\xi_J - \xi_E}{\|\xi_J - \xi_E\|^3}, \qquad \xi(0) = \frac{x_0}{x_c}, \quad \dot{\xi}(0) = \frac{v_0 t_c}{x_c}.$$

We have four nondimensional parameters, namely

$$\Pi_1 = \frac{Gm_S t_c^2}{x_c^3}, \qquad \Pi_2 = \frac{Gm_J t_c^2}{x_c^3}, \qquad \Pi_3 = \frac{x_0}{x_c}, \qquad \Pi_4 = \frac{v_0 t_c}{x_c}.$$

We will choose $\Pi_1 = \Pi_3 = 1$. We want to consider the effect of Jupiter on the system as a perturbation, so we should definitely not choose $\Pi_2 = 1$, and in fact we see that $\Pi_2 = m_J/m_S$. Plugging in the values for the masses of the Sun and Jupiter, we have

$$m_J = 1.90\times 10^{27}\ \mathrm{kg}, \qquad m_S = 1.99\times 10^{30}\ \mathrm{kg}, \qquad \epsilon := \Pi_2 \approx 9.5\times 10^{-4}.$$

Since we choose $\Pi_3 = 1$, we have $x_c = x_0$. Taking the average distance of the Earth from the Sun gives

$$x_c = 1.5\times 10^{8}\ \mathrm{km} = 1.5\times 10^{11}\ \mathrm{m},$$

and
$$t_c^2 = \frac{x_c^3}{Gm_S} = \frac{(1.5\times 10^{11}\ \mathrm{m})^3}{(6.7\times 10^{-11}\ \mathrm{m^3\,kg^{-1}\,s^{-2}})(1.99\times 10^{30}\ \mathrm{kg})} \approx 2.5\times 10^{13}\ \mathrm{s^2}.$$
This gives
$$t_c \approx 5.0\times 10^{6}\ \mathrm{s} \approx 58\ \mathrm{days}.$$
Notice that
$$\Pi_4 \approx \frac{(3\times 10^4\ \mathrm{m/s})(5\times 10^6\ \mathrm{s})}{1.5\times 10^{11}\ \mathrm{m}} \approx 1.$$

This is not a coincidence, and is in fact due to the fact that there is more structure in this system than we have heretofore recognized, but we will not use that here. Now, the question one might ask is, how long can we guarantee that the Earth's orbit does not significantly deviate from its current one due to the effects of Jupiter? A quick look on Wikipedia (http://en.wikipedia.org/wiki/Habitable_zone) tells us that there are many estimates, but a conservative estimate is that if the Earth were to change its distance from the Sun by 25%, it would become uninhabitable. Therefore, if we know that the error incurred by ignoring the Jupiter term can be bounded by $\epsilon\tau$, we need to choose $\tau \le 1/(4\epsilon) \approx 263$. Converting this to the original time units, we have that our approximation breaks down at time

$$t = t_c\tau \approx 58\ \mathrm{days} \times 263 \approx 1.5\times 10^{4}\ \mathrm{days} \approx 41.8\ \mathrm{years}.$$

Hopefully this is not sharp! In fact, we know from geological and paleobiological evidence that the Earth's orbit has been roughly stable over timescales of length $10^9$ years, so this estimate is extremely pessimistic. If we could justify the approximation on longer timescales, this would probably be more realistic.
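The arithmetic above is easy to reproduce (a minimal script; the physical constants are the rounded values quoted in the text):

# Reproducing the estimates above (SI units, rounded values from the text).
G   = 6.7e-11        # m^3 kg^-1 s^-2
m_S = 1.99e30        # kg (Sun)
m_J = 1.90e27        # kg (Jupiter)
x_c = 1.5e11         # m (Earth-Sun distance)

eps = m_J / m_S                        # ~9.5e-4
t_c = (x_c**3 / (G * m_S)) ** 0.5      # ~5.0e6 s ~ 58 days
tau_max = 1 / (4 * eps)                # dimensionless time before a 25% error is possible
print("eps =", eps)
print("t_c =", t_c / 86400, "days")
print("breakdown time =", t_c * tau_max / 86400 / 365.25, "years")   # ~42 years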

5.1.2 Weakly nonlinear oscillators Let us consider a version of a weakly nonlinear oscillator, namely, an oscillator of the form

$$\ddot{x} + \omega^2 x = \epsilon x^3, \qquad (5.1)$$

and try to find an approximating equation that works for $t \sim \epsilon^{-1}$. We first try the naive Ansatz of expanding our solution in powers of $\epsilon$, so we try

$$x = x_0 + \epsilon x_1 + O(\epsilon^2).$$

Plugging this in and separating scales, we obtain

$$\ddot{x}_0 + \omega^2 x_0 = 0, \qquad (5.2)$$
$$\ddot{x}_1 + \omega^2 x_1 = x_0^3. \qquad (5.3)$$
We solve (5.2) as
$$x_0(t) = Ae^{i\omega t} + Be^{-i\omega t},$$

where $A, B$ are arbitrary (complex) constants. Notice that since $x(t)$ is real, we have $x_0 = \overline{x_0}$, or
$$Ae^{i\omega t} + Be^{-i\omega t} = \overline{A}e^{-i\omega t} + \overline{B}e^{i\omega t},$$
and so $B = \overline{A}$. We will use this fact later, but we do not need it now. Plugging our solution for $x_0$ into (5.3), we obtain
$$\ddot{x}_1 + \omega^2 x_1 = A^3 e^{3i\omega t} + 3A^2 B e^{i\omega t} + 3AB^2 e^{-i\omega t} + B^3 e^{-3i\omega t}. \qquad (5.4)$$
Since this equation is a forced linear equation, we can consider each term separately and then form a linear combination. In general, there is a formula for solving systems like this, and we have

Theorem 5.1.1. Given an equation of the form
$$\ddot{x} + \omega^2 x = e^{i\nu t},$$

the general solution to this system is

$$x(t) = \begin{cases} Ae^{i\omega t} + Be^{-i\omega t} + \dfrac{1}{\omega^2 - \nu^2}\,e^{i\nu t}, & \omega^2 \neq \nu^2,\\[1ex] Ae^{i\omega t} + Be^{-i\omega t} + \dfrac{1}{2i\nu}\,t e^{i\nu t}, & \omega^2 = \nu^2.\end{cases}$$

In particular, the solution is bounded in time iff $\omega^2 \neq \nu^2$.

Proof. First assume that $\omega^2 \neq \nu^2$, and make the Ansatz $x = Ce^{i\nu t}$.

Then
$$\ddot{x} + \omega^2 x = C(-\nu^2 + \omega^2)e^{i\nu t} = e^{i\nu t},$$
or
$$C = \frac{1}{\omega^2 - \nu^2}.$$
This is valid, of course, only if $\omega \neq \pm\nu$. If it turns out that this denominator is zero, then we should try the Ansatz $x = Cte^{i\nu t}$. We have
$$\ddot{x} + \omega^2 x = 2i\nu Ce^{i\nu t} - \nu^2 Cte^{i\nu t} + \omega^2 Cte^{i\nu t},$$
and since $\omega^2 = \nu^2$, we have $2i\nu Ce^{i\nu t} = e^{i\nu t}$, or
$$C = \frac{1}{2i\nu}.$$

Using the theorem, we can solve (5.4) and obtain
$$x_1(t) = \frac{3A^2B}{2i\omega}\,te^{i\omega t} - \frac{3AB^2}{2i\omega}\,te^{-i\omega t} - \frac{A^3}{8\omega^2}\,e^{3i\omega t} - \frac{B^3}{8\omega^2}\,e^{-3i\omega t} + \alpha e^{i\omega t} + \beta e^{-i\omega t}.$$

The constants $\alpha, \beta$ are arbitrary, but notice that no matter how we choose them, the solution for $x_1(t)$ is unbounded in time if $A, B \neq 0$. In particular, as $t \sim 1/\epsilon$, we have $x_1(t) \sim 1/\epsilon$ and $\epsilon x_1(t) \sim 1$. This means that the Ansatz we made above breaks down and nothing in the above approximation is necessarily valid.

5.1.3 Method of Multiple Scales

[Add some history.] Let us now make a more complicated Ansatz. We will still assume that there is a scale separation in the equations, but now let us also assume that each piece depends on two timescales, a fast time $t$ and a slow time $\tau$, and we assume that $\tau = \epsilon^\alpha t$ for some $\alpha > 0$. We then have
$$\frac{d}{dt} = \frac{\partial}{\partial t} + \frac{\partial\tau}{\partial t}\frac{\partial}{\partial\tau} = \frac{\partial}{\partial t} + \epsilon^\alpha\frac{\partial}{\partial\tau},$$
by the chain rule. We can take an iterate of this operator to obtain

$$\frac{d^2}{dt^2} = \left(\frac{\partial}{\partial t} + \epsilon^\alpha\frac{\partial}{\partial\tau}\right)^2 = \frac{\partial^2}{\partial t^2} + \epsilon^\alpha\left(\frac{\partial^2}{\partial t\,\partial\tau} + \frac{\partial^2}{\partial\tau\,\partial t}\right) + \epsilon^{2\alpha}\frac{\partial^2}{\partial\tau^2}.$$

We will further make the simplifying assumption that the partials commute, i.e. that
$$\frac{\partial^2}{\partial t\,\partial\tau} = \frac{\partial^2}{\partial\tau\,\partial t}.$$
This is, of course, not strictly true since $\tau$ and $t$ aren't really independent variables, but since they evolve on such different timescales this might be at least approximately true. Then, in short, we have
$$\frac{d^2x}{dt^2} = x_{,tt} + 2\epsilon^\alpha x_{,t\tau} + \epsilon^{2\alpha} x_{,\tau\tau},$$

where we are denoting subscripted variables after the comma to be derivatives, i.e.
$$x_{,tt} = \frac{\partial^2 x}{\partial t^2},$$
etc. If we then write $x = x_0 + \epsilon x_1$, where $x_0 = x_0(t,\tau)$ and $x_1 = x_1(t,\tau)$, and plug this into (5.1), we have

$$x_{0,tt} + \epsilon x_{1,tt} + 2\epsilon^\alpha x_{0,t\tau} + 2\epsilon^{\alpha+1} x_{1,t\tau} + \epsilon^{2\alpha} x_{0,\tau\tau} + \epsilon^{2\alpha+1} x_{1,\tau\tau} + \omega^2 x_0 + \epsilon\omega^2 x_1 = \epsilon x_0^3 + \text{h.o.t.}$$
As always, the first guess is to choose $\alpha$ so that the highest-order terms which are not $O(1)$ match, which suggests that we should choose $\alpha = 1$. We then obtain

$$x_{0,tt} + \omega^2 x_0 = 0, \qquad (5.5)$$
$$x_{1,tt} + \omega^2 x_1 + 2x_{0,t\tau} = x_0^3. \qquad (5.6)$$
Now, (5.5) looks the same as (5.2), except notice that $x_0$ is now a function of two variables, so that (5.5) is a PDE, whereas (5.2) is an ODE. Solving (5.5), we obtain

$$x_0(t,\tau) = A(\tau)e^{i\omega t} + B(\tau)e^{-i\omega t}.$$

Notice that the arbitrary constants are no longer constants, but are arbitrary functions of τ. Notice then that we have, similarly to before,

$$x_0^3 = A(\tau)^3 e^{3i\omega t} + 3A(\tau)^2 B(\tau)e^{i\omega t} + 3A(\tau)B(\tau)^2 e^{-i\omega t} + B(\tau)^3 e^{-3i\omega t},$$
and we also have
$$2x_{0,t\tau} = 2i\omega A'(\tau)e^{i\omega t} - 2i\omega B'(\tau)e^{-i\omega t},$$
where we denote by $'$ the derivative with respect to $\tau$. Putting all of this together, we obtain

$$x_{1,tt} + \omega^2 x_1 = A(\tau)^3 e^{3i\omega t} + \left(3A(\tau)^2 B(\tau) - 2i\omega A'(\tau)\right)e^{i\omega t} + \left(3A(\tau)B(\tau)^2 + 2i\omega B'(\tau)\right)e^{-i\omega t} + B(\tau)^3 e^{-3i\omega t}.$$

As we saw before, the terms that will give a problem are those that look like $e^{\pm i\omega t}$ on the right-hand side, and we can remove this problem by setting their coefficients equal to zero. Thus we obtain the system
$$A'(\tau) = \frac{3}{2i\omega}A(\tau)^2 B(\tau), \qquad B'(\tau) = -\frac{3}{2i\omega}A(\tau)B(\tau)^2.$$
We now recall the observation that $B = \overline{A}$ due to the reality of $x$, and we have
$$A'(\tau) = -\frac{3i}{2\omega}A(\tau)\,|A(\tau)|^2.$$
This is a nonlinear equation, so in general we might not be able to solve it. However, we can write down this solution in closed form; the easiest way to see this is to write the equation in polar coordinates. We write
$$A = \rho e^{i\theta}, \qquad A\overline{A} = \rho^2, \qquad e^{2i\theta} = A/\overline{A}.$$
Differentiating the relation $A\overline{A} = \rho^2$ gives

$$2\rho\rho' = A'\overline{A} + A\overline{A}' = -\frac{3i}{2\omega}A|A|^2\,\overline{A} + \frac{3i}{2\omega}\overline{A}|A|^2\,A = 0,$$
so we have that the radius is conserved, $\rho(\tau) = \rho(0)$. (This is not surprising in retrospect, since if we have the linear system $A' = iA$, the radius stays conserved, and all we're doing is multiplying the right-hand side of this equation by the real quantity $-\frac{3}{2\omega}|A|^2$.) We also have
$$\frac{A'}{A} = \frac{\rho'e^{i\theta} + \rho e^{i\theta}\,i\theta'}{\rho e^{i\theta}} = \frac{\rho'}{\rho} + i\theta',$$

and since $\rho' = 0$, we have
$$\theta' = -i\,\frac{A'}{A} = -\frac{3}{2\omega}|A(\tau)|^2 = -\frac{3}{2\omega}\rho_0^2.$$
We can solve this as
$$\theta(\tau) = \theta(0) - \frac{3\rho_0^2}{2\omega}\,\tau,$$
and therefore
$$A(\tau) = A(0)\exp\left(-i\,\frac{3\rho_0^2}{2\omega}\,\tau\right) = A(0)\exp\left(-i\,\frac{3\rho_0^2}{2\omega}\,\epsilon t\right).$$
This gives us
$$x_0(t) = A(0)\exp\left(-i\,\frac{3\epsilon\rho_0^2}{2\omega}\,t\right)e^{i\omega t} + \text{c.c.} = A(0)\exp\left(i\left(\omega - \epsilon\,\frac{3\rho_0^2}{2\omega}\right)t\right) + \text{c.c.}$$
So we see that the addition of the nonlinearity in the equation leads to a slow phase shift, and on timescales of $\epsilon^{-1}$ this solution will get far from the solution with $\epsilon = 0$, and it doesn't make sense to try to estimate

$$\left|A(0)\exp\left(i\left(\omega - \epsilon\,\frac{3\rho_0^2}{2\omega}\right)t\right) - A(0)e^{i\omega t}\right|$$

without taking this phase shift into account.
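Numerically, the frequency shift is easy to see (a sketch assuming numpy and scipy; for initial data $x(0) = a$, $\dot{x}(0) = 0$ we have $\rho_0 = a/2$, so the multiple-scales prediction is $x(t) \approx a\cos((\omega - 3\epsilon a^2/(8\omega))t)$):

import numpy as np
from scipy.integrate import solve_ivp

w, eps, a, T = 1.0, 0.05, 1.0, 60.0   # T is a few multiples of 1/eps

sol = solve_ivp(lambda t, u: [u[1], -w**2*u[0] + eps*u[0]**3],
                (0.0, T), [a, 0.0], rtol=1e-10, atol=1e-12, dense_output=True)

t = np.linspace(0.0, T, 7)
x_num = sol.sol(t)[0]
x_mms = a * np.cos((w - 3*eps*a**2/(8*w)) * t)   # shifted frequency from the MMS analysis
x_naive = a * np.cos(w * t)                      # naive O(1) approximation
print(np.c_[t, x_num, x_mms, x_naive])           # the naive column drifts out of phase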

Exercise 5.1.2. Prove odds and evens.

5.1.4 Linearization and resonance: revisiting the nonlinear oscillator

The lesson of the above example is that if we want to get long-timescale approximations, we must take resonant terms into account. Now, it was somewhat difficult to figure out which terms are resonant and which terms are not in the above example, and we would like to streamline the process. It turns out that the best formalism for this is to write the equations as first-order diagonal systems. For example, if we revisit the equation

$$\ddot{x} + \omega^2 x = \epsilon x^3,$$

writing $\dot{x} = v$, we obtain the first-order (but two-dimensional) system

$$\dot{x} = v, \qquad \dot{v} = -\omega^2 x + \epsilon x^3,$$

or, written in vector form,
$$\frac{d}{dt}\begin{pmatrix} x\\ v \end{pmatrix} = \begin{pmatrix} 0 & 1\\ -\omega^2 & 0 \end{pmatrix}\begin{pmatrix} x\\ v \end{pmatrix} + \epsilon\begin{pmatrix} 0\\ x^3 \end{pmatrix}.$$
This still isn't in the form we would like, because we would prefer a diagonal matrix. We can compute the eigenvalues and eigenvectors of the matrix in the equation above, and we obtain the standard diagonalization
$$\begin{pmatrix} 0 & 1\\ -\omega^2 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1\\ i\omega & -i\omega \end{pmatrix}\begin{pmatrix} i\omega & 0\\ 0 & -i\omega \end{pmatrix}\begin{pmatrix} \frac{1}{2} & -\frac{i}{2\omega}\\ \frac{1}{2} & \frac{i}{2\omega} \end{pmatrix}.$$
So we have written $A = SDS^{-1}$, where $D$ is the diagonal matrix with the eigenvalues on the diagonal, and $S$ is the matrix whose columns are the corresponding eigenvectors. Now, as a general rule, if we have a system in the form

$$\dot{x} = Ax + \epsilon f(x),$$

and make the change of variables $x = Sy$, then we have
$$\dot{y} = S^{-1}\dot{x} = S^{-1}(Ax + \epsilon f(x)) = S^{-1}ASy + \epsilon S^{-1}f(Sy),$$
and $S^{-1}AS = D$, so we have the system
$$\dot{y} = Dy + \epsilon S^{-1}f(Sy).$$

Also notice that since $S, S^{-1}$ are linear transformations, if $f$ is a polynomial of degree $p$, then $S^{-1}f(S\,\cdot\,)$ will be as well. Thus we can assume wlog that our linear piece is diagonal below. But for now, let us continue the computation. If we make the variable
$$z = \frac{1}{2}x + \frac{i}{2\omega}v,$$
as suggested by the second row of $S^{-1}$, we have
$$\dot{z} = \frac{1}{2}\dot{x} + \frac{i}{2\omega}\dot{v} = \frac{1}{2}v + \frac{i}{2\omega}(-\omega^2 x + \epsilon x^3) = -\frac{i}{2}\omega x + \frac{1}{2}v + \epsilon\,\frac{i}{2\omega}x^3. \qquad (5.7)$$
Noting that
$$-i\omega z = -\frac{i}{2}\omega x + \frac{1}{2}v,$$
and that $z + \overline{z} = x$, we obtain
$$\dot{z} = -i\omega z + \epsilon\,\frac{i}{2\omega}(z + \overline{z})^3 = -i\omega z + \epsilon\,\frac{i}{2\omega}\left(z^3 + 3z^2\overline{z} + 3z\overline{z}^2 + \overline{z}^3\right).$$
Now, to find the resonant terms, we have the even simpler theorem for first-order systems:

Theorem 5.1.3. Consider the first-order system

$$\dot{z} + \alpha z = e^{\beta t};$$
this has solution of the form
$$z(t) = Ae^{-\alpha t} + \begin{cases} \dfrac{1}{\alpha + \beta}\,e^{\beta t}, & \beta \neq -\alpha,\\[1ex] te^{-\alpha t}, & \beta = -\alpha,\end{cases}$$
where $A$ is an arbitrary constant.

Now we plug in the naive Ansatz $z(t) = z_0(t) + \epsilon z_1(t)$, which will (as always) give us the equations

$$\dot{z}_0 = -i\omega z_0, \qquad \dot{z}_1 = -i\omega z_1 + \frac{i}{2\omega}\left(z_0^3 + 3z_0^2\overline{z}_0 + 3z_0\overline{z}_0^2 + \overline{z}_0^3\right).$$
We obtain $z_0(t) = Ae^{-i\omega t}$, and then plugging this in, we see that we will obtain

$$\dot{z}_1 = -i\omega z_1 + \frac{i}{2\omega}\left(A^3 e^{-3i\omega t} + 3A^2\overline{A}\,e^{-i\omega t} + 3A\overline{A}^2\,e^{i\omega t} + \overline{A}^3\,e^{3i\omega t}\right).$$
As before, we see that three of these terms will be no problem, but the term with $e^{-i\omega t}$ will be in resonance and we will have to deal with the $3A^2\overline{A}$ term.

Again, trying the MMS approach, if we write

$$z = z_0(t,\tau) + \epsilon z_1(t,\tau),$$

and again write $\tau = \epsilon t$, we will have

$$z_{0,t} = -i\omega z_0, \qquad z_{1,t} + z_{0,\tau} = -i\omega z_1 + \frac{i}{2\omega}\left(z_0^3 + 3z_0^2\overline{z}_0 + 3z_0\overline{z}_0^2 + \overline{z}_0^3\right).$$
Solving the first equation gives
$$z_0(t,\tau) = \alpha(\tau)e^{-i\omega t}.$$
Plugging this in, we obtain

$$z_{1,t} = -i\omega z_1 + \frac{i}{2\omega}\,\alpha^3 e^{-3i\omega t} + \left(\frac{3i}{2\omega}\,\alpha^2\overline{\alpha} - \alpha'(\tau)\right)e^{-i\omega t} + \frac{3i}{2\omega}\,\alpha\overline{\alpha}^2\,e^{i\omega t} + \frac{i}{2\omega}\,\overline{\alpha}^3\,e^{3i\omega t}.$$

To remove the problem term, we clearly have to solve the system
$$\alpha'(\tau) = \frac{3i}{2\omega}\,\alpha(\tau)^2\,\overline{\alpha}(\tau). \qquad (5.8)$$
This is the same equation as (??), as it should be. The advantage here is that once we have linearized the system in (5.7), we can identify which terms will be resonant simply by looking at the exponentials, and in fact it's not hard to see how to read off (5.8) from observing (5.7). In fact, we will show that this is true in general below.

5.1.5 Resonance for linearized systems

Let us now assume that we have a general system on $\mathbb{R}^n$ of the form
$$\dot{x} = Ax + \epsilon f(x),$$

where $A$ is assumed diagonal and $f : \mathbb{R}^n \to \mathbb{R}^n$ is a polynomial. Since $f$ is a polynomial, it can be written as a finite linear combination of monomials. We will denote these terms as follows. First, let us define, for $x \in \mathbb{R}^n$ and $\alpha \in \mathbb{Z}^n$,
$$x^\alpha = \prod_{k=1}^{n} x_k^{\alpha_k}.$$

We also define $\hat{e}_i$ as the $i$th standard basis vector; then we have that any polynomial can be written as the finite sum
$$f(x) = \sum_{\alpha,k} C_{\alpha,k}\,x^\alpha\,\hat{e}_k.$$

Let us further assume that the eigenvalues of $A$ are purely imaginary, and we will denote them by $i\omega_k$, so that
$$A = \begin{pmatrix} i\omega_1 & 0 & \cdots & 0\\ 0 & i\omega_2 & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & 0 & i\omega_n \end{pmatrix}.$$
If we write out the $k$th component of (??), we have

$$\dot{x}_k = i\omega_k x_k + \epsilon\sum_\alpha C_{\alpha,k}\,x^\alpha.$$

Again, as before, we can identify the problem terms as those which will give an oscillatory response that looks like $e^{i\omega_k t}$. Now we again make the Ansatz $x = x_0 + \epsilon x_1$, which gives the separation

$$\dot{x}_0 = Ax_0, \qquad \dot{x}_1 = Ax_1 + \sum_{\alpha,k} C_{\alpha,k}\,x_0^\alpha\,\hat{e}_k.$$

Solving the first equation gives
$$x_0(t) = e^{At}x_0(0),$$
but since $A$ is diagonal it is easy to see what this looks like in coordinates, namely

$$x_{0,k}(t) = e^{i\omega_k t}\,x_{0,k}(0).$$

We now compute: if we know that $x_{0,k}(t) = C_k e^{i\omega_k t}$, then
$$x_0^\alpha = \prod_{k=1}^{n} x_{0,k}^{\alpha_k} = \prod_{k=1}^{n}\left(C_k e^{i\omega_k t}\right)^{\alpha_k} = \left(\prod_{k=1}^{n} C_k^{\alpha_k}\right)\exp\left(i\left(\sum_{k=1}^{n}\alpha_k\omega_k\right)t\right).$$
The constant out front is just a constant, but we see that we will have a resonance if and only if the frequency in the forcing is the same as the natural frequency of the oscillator, which in the $k$th equation is $\omega_k$. Therefore, if we define
$$\Lambda_{\alpha,k} := \sum_{l=1}^{n}\alpha_l\omega_l - \omega_k = \langle\alpha,\omega\rangle - \omega_k,$$
then we have a resonant term if $\Lambda_{\alpha,k} = 0$.
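This resonance condition is mechanical enough to automate (a minimal sketch; the function name and tolerance are our own choices). For the diagonalized cubic oscillator with frequencies $(\omega, -\omega)$, it recovers exactly the $\overline{z}z^2$-type monomial found above:

from itertools import product

def resonant_terms(omega, degree, k, tol=1e-12):
    # Return all multi-indices alpha with |alpha| = degree and
    # Lambda_{alpha,k} = <alpha, omega> - omega_k = 0 (up to tol).
    hits = []
    for alpha in product(range(degree + 1), repeat=len(omega)):
        if sum(alpha) == degree:
            Lam = sum(a*w for a, w in zip(alpha, omega)) - omega[k]
            if abs(Lam) < tol:
                hits.append(alpha)
    return hits

# Diagonalized oscillator: variables (zbar, z) with frequencies (+w, -w).
omega = (1.0, -1.0)
print(resonant_terms(omega, degree=3, k=1))   # -> [(1, 2)]: the zbar * z^2 monomial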

5.2 Multiscale differential equations

5.2.1 Example

Let us consider an example of the FitzHugh–Nagumo system

$$\dot{x} = x - \frac{x^3}{3} - y, \qquad \dot{y} = x.$$

5.2.2 One slow manifold

5.2.3 Multiple slow manifolds

5.3 Boundary layers for ODE

5.3.1 Example 1. Zero-time blowup First, let us consider a singular initial value problem to motivate the analysis below. Consider the equation

$$\epsilon\,\frac{dy}{dt} = y^2, \qquad y(0) = y_0.$$

Notice that since this is a first-order equation, the distinction between initial-value and boundary-value problems is nonexistent, and here we want to think of this as more of a BVP than an IVP. If we formally set $\epsilon = 0$ in this equation, we obtain $y^2(t) = 0$, which is inconsistent with the initial condition unless $y_0 = 0$. Therefore the problem is singular by definition, since in the limit $\epsilon \to 0$ most forms of the problem have no solution. In fact, we can solve this problem exactly and see what happens. If we separate variables and write
$$\frac{dy}{y^2} = \frac{dt}{\epsilon},$$
then we can integrate once to obtain
$$\frac{1}{y(0)} - \frac{1}{y(t)} = \frac{t}{\epsilon},$$
or
$$y(t) = \frac{\epsilon y_0}{\epsilon - t y_0}.$$

This solution has a finite-time blowup at $t = \epsilon/y_0$ (when $y_0 > 0$). Moreover, we see that the blowup time goes to $0$ as $\epsilon \to 0$, so that every such solution blows up in $O(\epsilon)$ time, except, of course, the solution with $y_0 = 0$.
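A two-line check of the blowup (a minimal sketch; values arbitrary): just before $t^* = \epsilon/y_0$ the exact solution is already enormous, and $t^*$ shrinks linearly with $\epsilon$.

for eps in [1e-1, 1e-2, 1e-3]:
    y0 = 1.0
    t_star = eps / y0                           # blowup time of y(t) = eps*y0/(eps - t*y0)
    t = 0.99 * t_star                           # evaluate just before blowup
    print(eps, t_star, eps*y0 / (eps - t*y0))   # solution has already grown to 100*y0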

5.3.2 Example 2.

Now let us consider the BVP
$$\epsilon y'' + y' + y = 0, \qquad y(0) = 0, \quad y(1) = 1.$$
Make the exponential Ansatz $y(t) = e^{rt}$, and we obtain

$$\epsilon r^2 + r + 1 = 0.$$
As we studied above in Section ??, the roots of this polynomial act singularly, and we found the roots $r_\pm$ to be
$$r_+ = -1 - \epsilon, \qquad r_- = -\frac{1}{\epsilon} + 1 + \epsilon,$$
up to $O(\epsilon^2)$. So the general solution of the ODE is

$$y(t) = C_+e^{r_+t} + C_-e^{r_-t}.$$

Plugging in the two boundary conditions, we obtain
$$C_+ = \frac{1}{e^{r_+} - e^{r_-}}, \qquad C_- = -C_+.$$

Now, as $\epsilon \to 0$, we have $r_- \ll r_+$, and in particular $e^{r_-} \to 0$, so we have

$$C_+ \approx e^{-r_+}, \qquad C_- \approx -e^{-r_+}.$$

Plugging this in, we obtain
$$y(t) \approx e^{r_+(t-1)} - e^{r_-t - r_+}.$$
Just to check, if we plug in $t = 0, 1$, we obtain

$$y(0) \approx e^{-r_+} - e^{-r_+} = 0, \qquad y(1) \approx 1 - e^{r_- - r_+} \approx 1.$$

So, we can solve the ODE, but perhaps this solution doesn't give us a huge amount of insight. If, on the other hand, we take the BVP above and plug in $\epsilon = 0$ directly, we obtain the system

$$y' + y = 0, \qquad y(0) = 0, \quad y(1) = 1,$$

but this has no solution. To see this, the general solution of the ODE is $y(t) = Ce^{-t}$. Plugging in the boundary condition at $t = 0$ gives $C = 0$ and thus $y(t) \equiv 0$, which obviously cannot solve the other boundary condition. Conversely, plugging in the BC at $t = 1$, we get $C = e$ and $y(t) = e^{1-t}$, but unfortunately this doesn't satisfy the other condition, since it would give $y(0) = e$. More generally, if we consider the system
$$\epsilon y'' + y' + y = 0, \qquad y(0) = y_0, \quad y(1) = 1,$$
then this only has a consistent solution in the $\epsilon \to 0$ limit if $y_0 = e$, and nothing else.

5.3.3 Inner and outer expansions

However, let us formally revisit the BVP above. Its solution is actually capable of rapid change, since we are multiplying the highest derivative by a small number. Conversely, notice that the solution we obtained by formally setting $\epsilon = 0$ worked pretty well everywhere in the domain, except at $t = 0$. So let us consider the fast time $\tau = t/\epsilon^\alpha$, $\alpha > 0$, and this gives us
$$\frac{d}{dt} = \frac{d\tau}{dt}\frac{d}{d\tau} = \epsilon^{-\alpha}\frac{d}{d\tau}.$$
Let us write $Y(\tau)$ and plug it into the BVP, and we obtain
$$\epsilon^{1-2\alpha}Y_{\tau\tau} + \epsilon^{-\alpha}Y_\tau + Y = 0,$$
and we only expect that this should be valid in a neighborhood of the origin, so we impose only the boundary condition $Y(0) = 0$. Again using the scheme that we want to choose scales to match the leading-order terms, this suggests that $1 - 2\alpha = -\alpha$, or $\alpha = 1$. Therefore we have
$$\epsilon^{-1}Y_{\tau\tau} + \epsilon^{-1}Y_\tau + Y = 0, \qquad Y(0) = 0,$$
which we can write as
$$Y_{\tau\tau} + Y_\tau + \epsilon Y = 0, \qquad Y(0) = 0.$$
This is now a regularly-perturbed problem. To see this, if we make the Ansatz $Y(\tau) = e^{r\tau}$, then we get the characteristic equation $r^2 + r + \epsilon = 0$, which has roots $r = 0, -1$ with $O(\epsilon)$ corrections. Therefore we obtain
$$Y(\tau) = C_0 + C_1 e^{-\tau},$$
and imposing the BC at $\tau = 0$, we obtain
$$Y(\tau) = C_0(1 - e^{-\tau}).$$
Therefore we have two solutions of the system, each valid in different regimes. We have the "inner solution" $Y(\tau) = C_0(1 - e^{-\tau})$, which is valid only in a neighborhood of $t = 0$, and the "outer solution" $y(t) = e^{1-t}$, valid outside of a neighborhood of $t = 0$. Since we have a continuous solution to this differential equation, we should impose the matching condition
$$\lim_{\tau\to\infty} Y(\tau) = \lim_{t\to 0} y(t) = e,$$
which therefore gives
$$Y(\tau) = e - e^{1-\tau}.$$
Therefore we should expect the full solution to be of the form
$$\eta(t) = y(t) + Y(t/\epsilon) + C$$
for some constant $C$. Plugging in the boundary data $\eta(0) = 0$, $\eta(1) = 1$, we obtain
$$\eta(0) = y(0) + Y(0) + C = e + C, \qquad \eta(1) = y(1) + Y(1/\epsilon) + C = 1 + e - e^{1-1/\epsilon} + C,$$
and we see that by choosing $C = -e$, we can satisfy both boundary conditions (the second up to an exponentially small error)! Specifically, we have
$$\eta(t) = y(t) + Y(t/\epsilon) - e = e^{1-t} - e^{1-t/\epsilon}.$$
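Comparing this composite expansion with the exact solution of the BVP shows the agreement both inside and outside the layer (a sketch assuming numpy; the value of $\epsilon$ is arbitrary):

import numpy as np

eps = 0.02
rp = (-1 + np.sqrt(1 - 4*eps)) / (2*eps)   # slow root, approximately -1 - eps
rm = (-1 - np.sqrt(1 - 4*eps)) / (2*eps)   # fast root, approximately -1/eps

t = np.array([0.0, 0.5*eps, 5*eps, 0.5, 1.0])
exact = (np.exp(rp*t) - np.exp(rm*t)) / (np.exp(rp) - np.exp(rm))   # exact BVP solution
composite = np.exp(1 - t) - np.exp(1 - t/eps)                       # inner + outer - overlap
print(np.c_[t, exact, composite])   # columns agree to O(eps) everywhere in [0, 1]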

5.4 WKB expansions

5.5 Reaction-diffusion equations, small diffusion limit

Appendix A

Background on measure theory, probability theory

A.1 Basic Measure Theory

Measure theory is a rich and complex subject and we will not cover it in its generality here. However, we will cover what we will need in this course.

Definition A.1.1. Let $S$ be a set. A σ-algebra $\mathcal{S}$ on $S$ is a set of subsets of $S$ satisfying
1. $\emptyset \in \mathcal{S}$,
2. if $A \in \mathcal{S}$, then $A^c \in \mathcal{S}$,
3. if $A_n \in \mathcal{S}$, $n \in \mathbb{N}$, then $\bigcup_n A_n \in \mathcal{S}$.

Remark A.1.2. For any set $S$, there are always two σ-algebras on $S$: one is the pair $\{\emptyset, S\}$, the other is $\mathcal{P}(S)$, the power set of $S$. Note that $S \in \mathcal{S}$ by combining the first two axioms.

Example A.1.3. Let $S = \{1, 2, 3\}$. What are the possible σ-algebras on $S$? We have the two cases listed above, but there are others. For example, let us assume that $\{1\} \in \mathcal{S}$. Then $\{1\}^c = \{2,3\} \in \mathcal{S}$ by the second axiom. It is not hard to see, however, that this is all we need, since $\{1\} \cup \{2,3\} = S$. Therefore one such σ-algebra is
$$\mathcal{S} = \{\emptyset, \{1\}, \{2,3\}, \{1,2,3\}\}.$$

It is not hard to see that all σ-algebras on this $S$ must be a permutation of this one, or one of the trivial two mentioned above.

Definition A.1.4. The pair $(S, \mathcal{S})$ is called a measurable space. A measure on $(S, \mathcal{S})$ is a function $\mu : \mathcal{S} \to [0,\infty]$ that has the property of countable additivity, namely that if $A_n \in \mathcal{S}$ and $A_m \cap A_n = \emptyset$ for $m \neq n$, then
$$\mu\left(\bigcup_n A_n\right) = \sum_n \mu(A_n).$$
The triple $(S, \mathcal{S}, \mu)$ is called a measure space.

Definition A.1.5. If µ(S) < ∞, we say that µ is a finite measure. If there exists a sequence of sets An ∈ S with µ(An) < ∞ and ∪nAn = S, then we say that µ is a σ-finite measure.

Example A.1.6. Choose our space to be $\mathbb{Z}$ and the σ-algebra $\mathcal{P}(\mathbb{Z})$. Define the counting measure $\kappa : \mathcal{P}(\mathbb{Z}) \to [0,\infty]$ by saying that $\kappa(E)$ is the number of elements of $E$. It is not hard to see that this measure is countably additive. Now, notice that $\kappa(\mathbb{Z}) = \infty$, so $\kappa$ is not finite. However, note that $\kappa([-M, M]) = 2M + 1 < \infty$, and clearly $\mathbb{Z} = \bigcup_M [-M, M]$. Thus the counting measure is σ-finite.

Remark A.1.7. It turns out that measure theory is much easier if our underlying set is countable. Assume that $S$ is countable, and choose $\mathcal{S} = \mathcal{P}(S)$. Choose a function $\lambda : S \to \mathbb{R}_{\ge 0}$, i.e. for all $s \in S$, $\lambda(s) \in [0,\infty)$. Define, for $E \subseteq S$,
$$\mu(E) = \sum_{x\in E}\lambda(x).$$

This is always a σ-finite measure on S, and, in fact, every σ-finite measure on S can be constructed in such a manner.

Exercise A.1.8. Choose $\lambda : S \to \mathbb{R}_{\ge 0}$ and define $\mu$ as above.
• First, prove that $\mu$ is countably additive, using the fact that series of positive terms are independent of ordering.
• Second, show that $\mu$ is σ-finite by exhibiting a sequence of subsets as in the definition.
• Finally, show that
