Topics in analysis 1 — Real functions

Onno van Gaans

Version May 21, 2008

Contents

1 Monotone functions
  1.1 Continuity
  1.2 Differentiability
  1.3 Bounded variation

2 Fundamental theorem of calculus
  2.1 Indefinite integrals
  2.2 The Cantor function
  2.3 Absolute continuity

3 Sobolev spaces
  3.1 Sobolev spaces on intervals
  3.2 Application to differential equations
  3.3 Distributions
  3.4 Fourier transform and fractional derivatives

4 Stieltjes integral
  4.1 Definition and properties
  4.2 Langevin's equation
  4.3 Wiener integral

5 Convex functions

1 Monotone functions

1.1 Continuity

1.2 Differentiability

1.3 Bounded variation

2 Fundamental theorem of calculus

2.1 Indefinite integrals

2.2 The Cantor function

2.3 Absolute continuity

3 Sobolev spaces

When studying differential equations it is often convenient to assume that all functions in the equation are several times differentiable and that their derivatives are continuous. If the equation is of second order, for instance, one usually assumes that the solution should be twice continuously differentiable. There are several reasons to allow functions with less smoothness. Let us consider two of them.

Suppose that $u''(t) + \alpha u'(t) + \beta u(t) = f(t)$ describes the position $u(t)$ of a particle at time $t$, where $f$ is some external force acting on the particle. If the force is continuous in $t$, one expects the solution $u$ to be twice continuously differentiable. If the function $f$ has jumps, the second derivative of $u$ will not be continuous and, more than that, $u'$ will probably not be differentiable at the points where $f$ is discontinuous. Such external forces typically appear in systems such as electrical circuits, where external influences can be switched on or off. There is no harm in such discontinuities: the function $u''$ still exists almost everywhere and $u'$ will still be the indefinite integral of $u''$.

The second reason concerns mathematical rather than modeling issues. Differential equations can often not be solved explicitly. It can then still be useful to know that solutions exist, for instance if one wants to apply numerical approximation schemes. A functional analytic approach to existence of solutions has become a popular method. Showing existence comes down to finding, among all differentiable functions (or $m$ times differentiable functions), a function which satisfies the equation. The set of all differentiable functions is a function space. The functional analytic approach uses the structure of function spaces and maps on them to show existence of solutions. It is almost indispensable that the function space in which the solution is sought is a linear space with a norm and that it is complete with respect to the norm.
For instance, the linear space $C^1[0,1]$ of all continuously differentiable functions on $[0,1]$ with the norm
\[ \|f\|_{C^1[0,1]} := |f(0)| + \max_{0\le t\le 1} |f'(t)| \]
is a complete normed space, that is, a Banach space. It is often desirable that the space is a Hilbert space, that is, a Banach space where the norm is induced by an inner product. It turns out that the space of differentiable functions is not a Hilbert space. However, there are

Hilbert spaces consisting of almost everywhere differentiable functions, which turn out to be very useful for the study of differential equations.

3.1 Sobolev spaces on intervals

The space $C[a,b]$ of all continuous functions from $[a,b]$ to $\mathbb{R}$ is a Banach space with the norm
\[ \|f\|_\infty = \max_{a\le t\le b} |f(t)|. \]
This space is not a Hilbert space. Let $\mathcal{L}^2[a,b]$ denote the space of all Lebesgue measurable functions $f\colon [a,b]\to\mathbb{R}$ such that $\int_{[a,b]} f(t)^2\,dt < \infty$. Let $L^2[a,b]$ denote the Banach space of equivalence classes of functions from $\mathcal{L}^2[a,b]$, where we identify two functions that are equal almost everywhere. The space $L^2[a,b]$ is a Hilbert space with the inner product given by

\[ \langle f,g\rangle_{L^2} = \int_{[a,b]} f(t)g(t)\,dt \]
and its induced norm is
\[ \|f\|_{L^2} = \Big( \int_a^b f(t)^2\,dt \Big)^{1/2}. \]
A continuously differentiable function can be seen as the indefinite integral of a continuous function. In the hope to obtain a Hilbert space of `differentiable' functions we could consider the functions that are indefinite integrals of functions from $L^2[a,b]$. Recall that every $g \in L^2[a,b]$ is integrable. Define
\[ W^{1,2}[a,b] := \Big\{ \alpha + \int_a^t g(s)\,ds : \alpha \in \mathbb{R},\ g \in L^2[a,b] \Big\}. \]
First observe that there is no ambiguity in the function $t \mapsto \int_a^t g(s)\,ds$. If we would take another element $\tilde g$ from the same equivalence class as $g$, then $\tilde g$ and $g$ are equal almost everywhere, so that their integrals over any set coincide. Observe further that the set $W^{1,2}[a,b]$ is a vector space with the pointwise addition and scalar multiplication. We have seen that the indefinite integral of an integrable function is absolutely continuous. Therefore the function $f(t) = \alpha + \int_a^t g(s)\,ds$, where $g \in L^2[a,b]$, is absolutely continuous. Moreover, $f$ is differentiable almost everywhere and its derivative equals $g$ almost everywhere. We will denote the (almost everywhere defined) derivative of an absolutely continuous function $f$ by $f'$. The set $W^{1,2}$ equals
\[ W^{1,2}[a,b] = \Big\{ f\colon [a,b]\to\mathbb{R} : f \text{ is absolutely continuous and } \int_a^b f'(t)^2\,dt < \infty \Big\}. \]
Define an inner product on $W^{1,2}[a,b]$ by
\[ \langle f,g\rangle_{W^{1,2}} = f(a)g(a) + \int_a^b f'(t)g'(t)\,dt, \qquad f,g \in W^{1,2}. \]
The induced norm is given by

\[ \|f\|_{1,2} = \Big( f(a)^2 + \int_a^b f'(t)^2\,dt \Big)^{1/2} = \big( f(a)^2 + \|f'\|_{L^2}^2 \big)^{1/2}. \]
The space $W^{1,2}[a,b]$ endowed with the inner product $\langle\cdot,\cdot\rangle_{W^{1,2}}$ is called a Sobolev space. The role of the upper indices $1,2$ becomes clear if we introduce some other Sobolev spaces:

\[ W^{1,p}[a,b] = \Big\{ \alpha + \int_a^t g(s)\,ds : \alpha \in \mathbb{R},\ g \in L^p[a,b] \Big\} \]
and, inductively,

\[ W^{m,p}[a,b] = \Big\{ \alpha + \int_a^t g(s)\,ds : \alpha \in \mathbb{R},\ g \in W^{m-1,p}[a,b] \Big\}, \]
where $m \in \mathbb{N}$ and $p \ge 1$. We focus on $W^{1,2}$.

Lemma 3.1. The Sobolev space $(W^{1,2}[a,b], \langle\cdot,\cdot\rangle_{W^{1,2}})$ is a Hilbert space.

Proof. It is straightforward to show that $W^{1,2}$ is a vector space, that $\langle\cdot,\cdot\rangle_{W^{1,2}}$ is a symmetric bilinear form on $W^{1,2}$, and that $\langle f,f\rangle_{W^{1,2}} \ge 0$ for every $f$. If $f \in W^{1,2}$ is such that $\langle f,f\rangle_{W^{1,2}} = 0$, then $\int_a^b f'(t)^2\,dt = 0$, so $f'(t) = 0$ for almost every $t$. Since $f$ is absolutely continuous (as it is an indefinite integral), it follows that $f$ is constant. Since we also have that $f(a) = 0$, the function $f$ must be identically zero.

Next we show that $W^{1,2}[a,b]$ is complete. Let $(f_n)$ be a Cauchy sequence in $W^{1,2}$. As $\|f_n' - f_m'\|_{L^2}^2 \le \|f_n - f_m\|_{W^{1,2}}^2$, the sequence $(f_n')$ is a Cauchy sequence in $L^2[a,b]$. Because $L^2[a,b]$ is complete, there exists a $g \in L^2[a,b]$ such that $\|f_n' - g\|_{L^2} \to 0$ as $n \to \infty$. Also, $|f_n(a) - f_m(a)| \le \|f_n - f_m\|_{W^{1,2}}$, so $(f_n(a))$ is a Cauchy sequence in $\mathbb{R}$. Let $\alpha := \lim_{n\to\infty} f_n(a)$. Define
\[ f(t) := \alpha + \int_a^t g(s)\,ds, \qquad t \in [a,b]. \]
Then $f \in W^{1,2}$. It remains to show that $f$ is the limit of $f_n$ with convergence in the sense of $W^{1,2}$. Well,
\[ \|f - f_n\|_{W^{1,2}}^2 = (\alpha - f_n(a))^2 + \|g - f_n'\|_{L^2}^2 \to 0 \quad \text{as } n \to \infty, \]
so $f$ is indeed the limit in $W^{1,2}$ of $(f_n)$. Hence $W^{1,2}$ is complete. $\square$

There is a physical interpretation of the space $W^{1,2}$. If $f$ describes the position of a particle, then $f'(t)$ is its velocity at time $t$. The condition $\int_a^b f'(t)^2\,dt < \infty$ then means that the total kinetic energy of the motion from time $a$ to $b$ is finite. So $W^{1,2}$ is the space of `curves with finite kinetic energy'.

The essential part of the norm $\|\cdot\|_{W^{1,2}}$ is the term $\int_a^b f'(t)^2\,dt$. Only this term would not define a norm, as otherwise each constant function would get norm equal to zero. There are different ways to add a term to make it a norm. A common other choice is the following. Define
\[ \langle\langle f,g\rangle\rangle = \int_a^b f(t)g(t)\,dt + \int_a^b f'(t)g'(t)\,dt, \qquad f,g \in W^{1,2}[a,b]. \]
It is easy to verify that $\langle\langle\cdot,\cdot\rangle\rangle$ is an inner product on $W^{1,2}$.

Claim 3.2. The norm $|||\cdot|||$ induced by $\langle\langle\cdot,\cdot\rangle\rangle$ is equivalent to $\|\cdot\|_{W^{1,2}}$.

Proof. For $x \in [a,b]$, by $(\alpha+\beta)^2 \le 2\alpha^2 + 2\beta^2$ and the Cauchy--Schwarz inequality,
\begin{align*}
f(x)^2 &= \Big( f(a) + \int_a^x f'(t)\,dt \Big)^2 \\
&\le 2f(a)^2 + 2\Big( \int_a^x f'(t)\cdot 1\,dt \Big)^2 \\
&\le 2f(a)^2 + 2\Big( \int_a^x f'(t)^2\,dt \Big)\Big( \int_a^x 1^2\,dt \Big) \\
&\le 2f(a)^2 + 2(b-a)\int_a^b f'(t)^2\,dt, \tag{1}
\end{align*}
so
\[ \int_a^b f(x)^2\,dx \le 2(b-a)f(a)^2 + 2(b-a)^2 \int_a^b f'(t)^2\,dt \le \max\{2(b-a),\, 2(b-a)^2\}\, \|f\|_{W^{1,2}}^2, \]
so
\[ |||f||| \le C_1 \|f\|_{W^{1,2}}, \]
where $C_1 = \big( \max\{2(b-a),\, 2(b-a)^2\} + 1 \big)^{1/2}$.

For the estimate in the converse direction, for $x \in [a,b]$,
\begin{align*}
f(a)^2 &= \Big( f(x) - \int_a^x f'(t)\,dt \Big)^2 \\
&\le 2f(x)^2 + 2\Big( \int_a^x f'(t)\cdot 1\,dt \Big)^2 \\
&\le 2f(x)^2 + 2\Big( \int_a^x f'(t)^2\,dt \Big)\Big( \int_a^x 1^2\,dt \Big) \\
&\le 2f(x)^2 + 2(b-a)\int_a^b f'(t)^2\,dt,
\end{align*}
so
\[ f(a)^2 = \frac{1}{b-a}\int_a^b f(a)^2\,dx \le \frac{2}{b-a}\int_a^b f(x)^2\,dx + 2(b-a)\int_a^b f'(t)^2\,dt. \]
Hence
\[ \|f\|_{W^{1,2}} \le C_2 |||f|||, \]
where $C_2 = \big( \max\{ \tfrac{2}{b-a},\, 2(b-a) \} + 1 \big)^{1/2}$. Thus the norms $|||\cdot|||$ and $\|\cdot\|_{W^{1,2}}$ are equivalent. $\square$

There are many other equivalent norms, e.g.,

\[ \Big( f(b)^2 + \int_a^b f'(t)^2\,dt \Big)^{1/2}, \qquad \Big( f(c)^2 + \int_a^b f'(t)^2\,dt \Big)^{1/2}, \qquad \Big( 5f(a)^2 + 7f(b)^2 + \pi \int_a^b f'(t)^2\,dt \Big)^{1/2}, \]
etc. We conclude this section with a useful lemma.
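As a numerical illustration of the two norms just introduced (an addition for illustration, not part of the original notes), the sketch below evaluates $\|f\|_{W^{1,2}}$ and $|||f|||$ for $f(t) = t^2$ on $[0,1]$ by trapezoidal quadrature; for this $f$ one has $\|f\|_{W^{1,2}} = \sqrt{4/3}$ and $|||f||| = \sqrt{1/5 + 4/3}$. It also checks the two-sided bound with the constants $C_1, C_2$ read off from the proof of Claim 3.2. The grid size and quadrature rule are arbitrary choices.

```python
import numpy as np

def trap(y, t):
    # composite trapezoidal rule for int y dt on the grid t
    return float(np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2)

a, b = 0.0, 1.0
t = np.linspace(a, b, 20001)
f = t**2          # f(t) = t^2, so f(a) = 0
fp = 2 * t        # f'(t) = 2t

w12 = np.sqrt(0.0**2 + trap(fp**2, t))         # ||f||_{W^{1,2}}
tri = np.sqrt(trap(f**2, t) + trap(fp**2, t))  # |||f|||

print(w12, tri)   # sqrt(4/3) and sqrt(1/5 + 4/3)

# constants from the proof of Claim 3.2
C1 = np.sqrt(max(2 * (b - a), 2 * (b - a)**2) + 1)
C2 = np.sqrt(max(2 / (b - a), 2 * (b - a)) + 1)
assert tri <= C1 * w12 and w12 <= C2 * tri
```

The inequalities hold with room to spare here; the constants from the proof are of course not optimal.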

Lemma 3.3. Let $c \in [a,b]$. Let
\[ \varphi_c(f) := f(c), \qquad f \in W^{1,2}[a,b]. \]
Then $\varphi_c$ is a bounded linear functional on $W^{1,2}$.

Proof. Clearly, $\varphi_c$ is linear. From (1) it follows that
\[ |\varphi_c(f)|^2 \le 2f(a)^2 + 2(b-a)\int_a^b f'(t)^2\,dt \le \max\{2,\, 2(b-a)\}\,\|f\|_{W^{1,2}}^2 \]
for all $f \in W^{1,2}$, so $\varphi_c$ is a bounded linear functional. $\square$

From (1) we even obtain that
\[ \max_{a\le t\le b} |f(t)| \le C \|f\|_{W^{1,2}} \]
for some constant $C$. That means that the norm $\|\cdot\|_{W^{1,2}}$ is stronger than $\|\cdot\|_\infty$.

3.2 Application to differential equations

In order to explain the functional analytic approach to differential equations and show the use of Sobolev spaces, we consider two examples. Consider
\[ \begin{cases} u'' - u = f & \text{on } (0,1), \\ u(0) = u(1) = 0, \end{cases} \tag{2} \]
where the function $f\colon [0,1]\to\mathbb{R}$ is given and we want to find a function $u\colon [0,1]\to\mathbb{R}$ satisfying (2). A classical solution of (2) is a twice continuously differentiable function $u$ satisfying (2). From the equation we see that for existence of such a solution it is necessary that $f$ is continuous. We will show existence of a solution of (2) by means of the Sobolev space $W^{1,2}[0,1]$ and the representation theorem of Riesz:

Theorem 3.4. Let $(H, \langle\cdot,\cdot\rangle)$ be a Hilbert space and let $\varphi\colon H \to \mathbb{R}$ be a bounded linear functional. Then there exists a unique $y \in H$ such that
\[ \varphi(u) = \langle u, y\rangle \quad \text{for all } u \in H. \]

First we reformulate (2). If $u$ would satisfy (2), we could multiply both sides of the equation by a $v \in W^{1,2}[0,1]$ and integrate over $[0,1]$,
\[ \int_0^1 u''(t)v(t)\,dt - \int_0^1 u(t)v(t)\,dt = \int_0^1 f(t)v(t)\,dt, \]
and use integration by parts to obtain
\[ u'(1)v(1) - u'(0)v(0) - \int_0^1 u'(t)v'(t)\,dt - \int_0^1 u(t)v(t)\,dt = \int_0^1 f(t)v(t)\,dt. \]
If $v(0) = v(1) = 0$, we get
\[ -\int_0^1 u'(t)v'(t)\,dt - \int_0^1 u(t)v(t)\,dt = \int_0^1 f(t)v(t)\,dt. \]
Hence (2) implies
\[ \int_0^1 u(t)v(t)\,dt + \int_0^1 u'(t)v'(t)\,dt = -\int_0^1 f(t)v(t)\,dt \quad \text{for all } v \in D, \tag{3} \]
where
\[ D = \{ v \in W^{1,2}[0,1] : v(0) = v(1) = 0 \}. \]
Equation (3) is called the weak formulation of (2). The function $v \in D$ is called a test function. The integral at the right hand side of (3) makes sense for any $f \in L^2[0,1]$. The integrals at the left hand side exist for each $u \in W^{1,2}[0,1]$. A function $u \in W^{1,2}[0,1]$ with $u(0) = u(1) = 0$ which satisfies (3) is called a weak solution of (2). Assume $f \in L^2[0,1]$. We claim that

(a) $D$ is a closed subspace of $W^{1,2}[0,1]$, hence a Hilbert space,

(b) there is a unique $u \in D$ such that (3) holds,

(c) if $f$ is continuous, then $u$ is twice continuously differentiable.

It follows that (2) has a unique weak solution if $f \in L^2[0,1]$ and a classical solution if $f \in C[0,1]$.

Proof of (a). By Lemma 3.3,
\[ \varphi_0(v) := v(0) \quad \text{and} \quad \varphi_1(v) := v(1), \qquad v \in W^{1,2}[0,1], \]
define bounded linear functionals $\varphi_0, \varphi_1\colon W^{1,2}[0,1] \to \mathbb{R}$. Hence
\[ D = \{ v \in W^{1,2}[0,1] : \varphi_0(v) = 0,\ \varphi_1(v) = 0 \} = \varphi_0^{-1}(\{0\}) \cap \varphi_1^{-1}(\{0\}) \]
is a closed subspace of $W^{1,2}[0,1]$.

Proof of (b). Consider the inner product
\[ \langle\langle u,v\rangle\rangle = \int_0^1 u(t)v(t)\,dt + \int_0^1 u'(t)v'(t)\,dt \]

on $W^{1,2}[0,1]$. As shown in the previous section, its induced norm $|||\cdot|||$ is equivalent to $\|\cdot\|_{W^{1,2}}$, so $W^{1,2}[0,1]$ is also a Hilbert space when endowed with the inner product $\langle\langle\cdot,\cdot\rangle\rangle$. As $D$ is a closed subspace, $(D, \langle\langle\cdot,\cdot\rangle\rangle)$ is a Hilbert space. Define
\[ \varphi_f(v) := -\int_0^1 f(t)v(t)\,dt, \qquad v \in D. \]
Then $\varphi_f$ is linear and
\[ |\varphi_f(v)| \le \Big( \int_0^1 f(t)^2\,dt \Big)^{1/2} \Big( \int_0^1 v(t)^2\,dt \Big)^{1/2} \le \|f\|_{L^2}\, |||v||| \quad \text{for all } v \in D, \]
so $\varphi_f$ is a bounded linear functional on $D$. Equation (3) can be written as
\[ \langle\langle u,v\rangle\rangle = \varphi_f(v) \quad \text{for all } v \in D. \]
Due to Riesz's representation theorem, there exists a unique $y \in D$ such that $\varphi_f(v) = \langle\langle v,y\rangle\rangle = \langle\langle y,v\rangle\rangle$ for all $v \in D$. This $y$ is then the unique solution in $D$ of (3). In other words, $y$ is the unique weak solution of (2).

Proof of (c). Assume that $f$ is continuous. From (3) and integration by parts we have for every $v \in D$ that
\begin{align*}
\int_0^1 u'(t)v'(t)\,dt &= -\int_0^1 (u(t)+f(t))v(t)\,dt \\
&= \int_0^1 \int_0^t (u(s)+f(s))\,ds\; v'(t)\,dt,
\end{align*}
so
\[ \int_0^1 \Big( u'(t) - \int_0^t (u(s)+f(s))\,ds \Big) v'(t)\,dt = 0. \tag{4} \]
It follows from the next lemma that there exists a constant $C$ such that

\[ u'(t) - \int_0^t (u(s)+f(s))\,ds = C \quad \text{for almost every } t. \tag{5} \]

Lemma 3.5. If $w \in L^2[a,b]$ is such that
\[ \int_a^b w(t)g(t)\,dt = 0 \quad \text{for all } g \in L^2[a,b] \text{ with } \int_a^b g(t)\,dt = 0, \]
then there exists a constant $C$ such that
\[ w(t) = C \quad \text{for almost every } t \in [a,b]. \]
Moreover,
\[ C = \frac{1}{b-a}\int_a^b w(t)\,dt. \]

Proof. Let $C := \frac{1}{b-a}\int_a^b w(t)\,dt$ and $g(t) := w(t) - C$, $t \in [a,b]$. Then $g \in L^2[a,b]$ and $\int_a^b g(t)\,dt = 0$, so
\[ \int_a^b w(t)g(t)\,dt = 0. \]
Hence
\[ \int_a^b g(t)^2\,dt = \int_a^b \big(w(t)-C\big)g(t)\,dt = \int_a^b w(t)g(t)\,dt - C\int_a^b g(t)\,dt = 0. \]
Hence $g = 0$ a.e. $\square$

Let us see that (4) indeed yields (5). We have that
\[ \int_0^1 \Big( u'(t) - \int_0^t (u(s)+f(s))\,ds \Big) g(t)\,dt = 0 \tag{6} \]
holds for any $g$ of the form $v'$ with $v \in D$. If $g \in L^2[0,1]$ is such that $\int_0^1 g(s)\,ds = 0$, then $v(t) := \int_0^t g(s)\,ds$ is such that $v \in D$ and $v' = g$. Hence (6) holds for all $g \in L^2[0,1]$ with $\int_0^1 g(t)\,dt = 0$. Then the lemma yields existence of a constant $C$ such that (5) holds. Consequently, since $u$ is absolutely continuous,
\[ u(t) = u(0) + \int_0^t u'(s)\,ds = Ct + \int_0^t \int_0^s (u(r)+f(r))\,dr\,ds. \]
Because $u+f$ is continuous, we obtain that $u \in C^2[0,1]$, that is, $u$ is twice continuously differentiable. Moreover, differentiating twice yields that

\[ u''(t) = u(t) + f(t), \qquad t \in (0,1). \]
Recall that $u \in D$, so that also the boundary conditions $u(0) = u(1) = 0$ are satisfied. $\square$

The above procedure is typical for the functional analytic approach to differential equations. First find a weak formulation of the equation by multiplying by a test function, integrating and using integration by parts. Second, apply abstract functional analytic methods (e.g., the Riesz representation theorem) to obtain existence of a weak solution. Third, consider regularity of the solution.
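This three-step procedure also underlies numerical schemes. The sketch below (an illustration added here, not part of the original notes) applies the Galerkin method to the weak formulation (3): the solution is sought as a combination of piecewise linear `hat' functions on a uniform grid, which turns (3) into a linear system. For $f \equiv -1$ the classical solution is $u(t) = 1 - (e^t + e^{1-t})/(1+e)$, so the discrete solution can be compared with it.

```python
import numpy as np

n = 200                       # number of subintervals (arbitrary)
h = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)
m = n - 1                     # interior nodes; u(0) = u(1) = 0 is built in

# Stiffness matrix K_ij = int phi_i' phi_j' and mass matrix M_ij = int phi_i phi_j
# for the hat functions phi_1, ..., phi_m.
K = (2.0 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / h
M = (4.0 * np.eye(m) + np.eye(m, k=1) + np.eye(m, k=-1)) * h / 6.0

f = -np.ones(m)               # f = -1 at the interior nodes
F = h * f                     # load vector int f phi_i (exact for constant f)

# Weak form (3): <<u, v>> = -int f v for all hats v, i.e. (M + K) U = -F.
U = np.linalg.solve(M + K, -F)

u_exact = 1.0 - (np.exp(t[1:-1]) + np.exp(1.0 - t[1:-1])) / (1.0 + np.e)
err = float(np.max(np.abs(U - u_exact)))
print("max nodal error:", err)   # decreases like O(h^2) as n grows
```

The Riesz-representation argument above guarantees that the continuous problem this system approximates has exactly one weak solution, which is what makes such a discretization meaningful.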

More generally, we consider the Sturm--Liouville problem
\[ \begin{cases} -(pu')' + qu = f & \text{on } (0,1), \\ u(0) = u(1) = 0, \end{cases} \tag{7} \]
where the coefficients $p \in C^1[0,1]$ and $q \in C[0,1]$ are given with $p(x) > 0$ and $q(x) \ge 0$ for all $x \in [0,1]$. Let $f \in L^2[0,1]$. The corresponding weak formulation is
\[ \int_0^1 \big( p(x)u'(x)v'(x) + q(x)u(x)v(x) \big)\,dx = \int_0^1 f(x)v(x)\,dx \quad \text{for all } v \in D. \tag{8} \]
Define
\[ \langle u,v\rangle_{SL} := \int_0^1 p(x)u'(x)v'(x)\,dx + \int_0^1 q(x)u(x)v(x)\,dx, \qquad u,v \in D. \]
We claim

(a) $\langle\cdot,\cdot\rangle_{SL}$ is an inner product on $D$ and its induced norm $\|\cdot\|_{SL}$ is equivalent to $\|\cdot\|_{W^{1,2}}$ on $D$,

(b) there exists a unique $u \in D$ which satisfies (8),

(c) this function $u$ is in $C^2[0,1]$ if $f \in C[0,1]$.

Proof of (a). It is straightforward that $\langle\cdot,\cdot\rangle_{SL}$ is bilinear and symmetric. For $u \in D$ we have $u(0) = 0$, so
\[ \|u\|_{SL}^2 \ge \int_0^1 p(x)u'(x)^2\,dx \ge \min_{x\in[0,1]} p(x) \int_0^1 u'(x)^2\,dx = \min_{x\in[0,1]} p(x)\, \|u\|_{W^{1,2}}^2 \]
and
\[ \|u\|_{SL}^2 \le \max_{x\in[0,1]} p(x) \int_0^1 u'(x)^2\,dx + \max_{x\in[0,1]} |q(x)| \int_0^1 u(x)^2\,dx \le \max\Big\{ \max_x p(x),\ \max_x |q(x)| \Big\}\, |||u|||^2 \le C \|u\|_{W^{1,2}}^2. \]

It follows that $\langle\cdot,\cdot\rangle_{SL}$ is an inner product and that its induced norm is equivalent to $\|\cdot\|_{W^{1,2}}$.

Proof of (b). Due to (a), $(D, \langle\cdot,\cdot\rangle_{SL})$ is a Hilbert space. The functional $\varphi_f(v) := \int_0^1 v(x)f(x)\,dx$, $v \in D$, is bounded and linear, so by Riesz's representation theorem there is a unique $u \in D$ with
\[ \varphi_f(v) = \langle u,v\rangle_{SL} \quad \text{for all } v \in D. \]
That is, $u$ satisfies (8).

Proof of (c). The weak solution $u$ satisfies
\[ \int_0^1 p(x)u'(x)v'(x)\,dx - \int_0^1 \Big( \int_0^x (q(s)u(s) - f(s))\,ds \Big) v'(x)\,dx = 0 \quad \text{for all } v \in D, \]
from which we infer by Lemma 3.5 that there exists a constant $C$ such that
\[ p(x)u'(x) - \int_0^x (q(s)u(s) - f(s))\,ds = C \quad \text{for a.e. } x \in [0,1]. \]
Hence
\[ u'(x) = \frac{1}{p(x)} \Big( C + \int_0^x (q(s)u(s) - f(s))\,ds \Big), \]
so, because $u$ is absolutely continuous,
\[ u(x) = \int_0^x \frac{1}{p(s)} \Big( C + \int_0^s (q(r)u(r) - f(r))\,dr \Big)\,ds. \]
Hence, if $f$ is continuous, $u$ is $C^2$, since $q$ is continuous and $p$ is in $C^1[0,1]$. $\square$

3.3 Distributions

In order to be able to differentiate functions that are not everywhere differentiable, we have considered almost everywhere derivatives. There is an even weaker notion of derivative, which makes even more functions `differentiable'. The price to pay is that derivatives will no longer be functions. The idea is that instead of describing a function $f$ by all its values $f(x)$, it

can also be described by all values $\int f(x)g(x)\,dx$, where $g$ runs through a suitable set of test functions.

The support of a function $f\colon \mathbb{R}\to\mathbb{R}$ is the closed set $\operatorname{supp} f := \overline{\{x \in \mathbb{R} : f(x) \ne 0\}}$. Consider the space of test functions
\[ \mathcal{D}(\mathbb{R}) = \{ f\colon \mathbb{R}\to\mathbb{R} : f \text{ is infinitely many times differentiable and } \operatorname{supp} f \text{ is compact} \}. \]
On $\mathcal{D}(\mathbb{R})$ we consider a notion of convergence given by
\[ g_k \to g \text{ in } \mathcal{D} \iff \begin{array}{l} \exists N \text{ such that } \operatorname{supp} g_k \subseteq [-N,N] \text{ for all } k \text{ and} \\ g_k^{(m)} \to g^{(m)} \text{ uniformly as } k\to\infty \text{ for all } m \ge 0. \end{array} \tag{9} \]
Here we use the convention that $g^{(0)} = g$. This notion of convergence is given by a topology on $\mathcal{D}$, which is more complicated to describe and of little interest. It can be defined as follows. For $N \in \mathbb{N}$, define
\begin{align*}
\mathcal{D}^N &= \{ g \in \mathcal{D} : \operatorname{supp} g \subseteq [-N,N] \}, \\
B^N(f,m,r) &= \{ g \in \mathcal{D}^N : \sup_{x\in\mathbb{R}} |g^{(m)}(x) - f^{(m)}(x)| < r \}, \qquad f \in \mathcal{D}^N,\ m \ge 0,\ r > 0, \\
\mathcal{B}^N &= \{ B^N(f,m,r) : f \in \mathcal{D}^N,\ m \ge 0,\ r > 0 \}, \\
\mathcal{B}^N_\cap &= \{ B_1 \cap \cdots \cap B_n : B_i \in \mathcal{B}^N,\ n \in \mathbb{N} \}, \\
\mathcal{U}^N &= \Big\{ \bigcup_{i\in I} U_i : U_i \in \mathcal{B}^N_\cap,\ I \text{ any set} \Big\}, \quad \text{and} \\
\mathcal{U} &= \{ U \subseteq \mathcal{D} : U \cap \mathcal{D}^N \in \mathcal{U}^N \text{ for all } N \}.
\end{align*}

Proposition 3.6. (i) $\mathcal{U}$ is a topology on $\mathcal{D}$.
(ii) $g_k \to g$ w.r.t. $\mathcal{U}$ $\iff$ $\exists N$ such that $\operatorname{supp} g_k \subseteq [-N,N]$ for all $k$ and $g_k^{(m)} \to g^{(m)}$ uniformly as $k\to\infty$ for all $m \ge 0$.

Proof. (i) It is straightforward to check that $\emptyset, \mathcal{D}^N \in \mathcal{U}^N$ and that $\mathcal{U}^N$ is closed under finite intersections and under arbitrary unions. It then follows that $\mathcal{U}$ is a topology on $\mathcal{D}$.
(ii) $\Rightarrow$) $g_k \to g$ w.r.t. $\mathcal{U}$ means that for each $U \in \mathcal{U}$ with $g \in U$ there exists a $k_0$ such that $g_k \in U$ for all $k \ge k_0$. Take $N_0$ such that $\operatorname{supp} g \subseteq [-N_0,N_0]$. As $\mathcal{D}^{N_0} \in \mathcal{U}$, there exists a $k_0$ such that $g_k \in \mathcal{D}^{N_0}$ for all $k \ge k_0$. Choose $N_1$ such that $\operatorname{supp} g_k \subseteq [-N_1,N_1]$ for $k = 1,\ldots,k_0$. Let $N := \max\{N_0,N_1\}$. For every $m \ge 0$ and $\varepsilon > 0$, the set $B^N(g,m,\varepsilon) \in \mathcal{U}$, so there exists a $k_1$ such that for all $k \ge k_1$ one has $g_k \in B^N(g,m,\varepsilon)$, which means that $\sup_{x\in\mathbb{R}} |g_k^{(m)}(x) - g^{(m)}(x)| < \varepsilon$. Hence $g_k^{(m)} \to g^{(m)}$ uniformly.
$\Leftarrow$) Let $U \in \mathcal{U}$ with $g \in U$. Take $N$ such that $\operatorname{supp} g_k \subseteq [-N,N]$ for all $k$. Then $U \cap \mathcal{D}^N \in \mathcal{U}^N$, so there are $B_1,\ldots,B_n \in \mathcal{B}^N$ with $U \cap \mathcal{D}^N \supseteq B_1 \cap \cdots \cap B_n$ and $g \in B_1 \cap \cdots \cap B_n$. For each $j$, the uniform convergence and the triangle inequality yield that $g_k \in B_j$ for $k$ sufficiently large. Therefore $g_k \in U$ for $k$ large. Hence $g_k \to g$ w.r.t. $\mathcal{U}$. $\square$

On $\mathcal{D}$ we will consider the convergence given by (9). Denote
\[ \mathcal{D}' = \{ \varphi\colon \mathcal{D}\to\mathbb{R} : \varphi \text{ is linear and continuous} \}. \]
The elements of $\mathcal{D}'$ are called distributions.

Example 3.7. (a) A function $f\colon \mathbb{R}\to\mathbb{R}$ is locally integrable if it is measurable and $\int_a^b |f(x)|\,dx < \infty$ for every $a < b$. In other words, $f$ is locally integrable if it is integrable on $[-R,R]$ for every $R > 0$. If $f$ is locally integrable, then
\[ \varphi_f(g) = \int_{\mathbb{R}} f(x)g(x)\,dx, \qquad g \in \mathcal{D}, \]
defines a distribution $\varphi_f \in \mathcal{D}'$. Indeed, the linearity is clear. If $g_k \to g$ in $\mathcal{D}$, then there exists $N \in \mathbb{N}$ such that $\operatorname{supp} g_k \subseteq [-N,N]$ and $\operatorname{supp} g \subseteq [-N,N]$ and $g_k \to g$ uniformly. Hence
\[ |\varphi_f(g_k) - \varphi_f(g)| \le \int_{\mathbb{R}} |f(x)||g_k(x) - g(x)|\,dx \le \int_{-N}^N |f(x)|\,dx\ \sup_x |g_k(x) - g(x)| \to 0 \]
as $k \to \infty$.
(b) The Dirac distribution $\delta_a$ at $a \in \mathbb{R}$ is defined by
\[ \delta_a(g) = g(a), \qquad g \in \mathcal{D}. \]
(c) If $\mu$ is any finite measure on the Borel $\sigma$-algebra of $\mathbb{R}$, then
\[ \varphi_\mu(g) = \int_{\mathbb{R}} g(x)\,d\mu(x), \qquad g \in \mathcal{D}, \]
defines $\varphi_\mu \in \mathcal{D}'$.

Lemma 3.8. If $f_1$ and $f_2$ are locally integrable and $\varphi_{f_1}(g) = \varphi_{f_2}(g)$ for all $g \in \mathcal{D}$, then $f_1 = f_2$ a.e.

Proof. (Sketch) Denote $f = f_1 - f_2$. Then $\varphi_f(g) = 0$ for all $g \in \mathcal{D}$. Let
\[ h(x) = \begin{cases} c\, e^{\frac{1}{x^2-1}}, & -1 < x < 1, \\ 0, & |x| \ge 1, \end{cases} \]
where $c$ is chosen such that $\int_{\mathbb{R}} h(x)\,dx = 1$. Then $h \in \mathcal{D}$. Let
\[ h_m(x) = m\,h(mx), \qquad x \in \mathbb{R}. \]
Let $c < d$. Let

\[ g_m(x) = \int_{[c-1/m,\,d+1/m]} h_m(x-y)\,dy. \]
Then $g_m \in \mathcal{D}$ for all $m$ and $g_m(x) \to \mathbf{1}_{[c,d]}(x)$ for all $x \in \mathbb{R}$. By Lebesgue's dominated convergence theorem,
\[ \int_{[c,d]} f(x)\,dx = \lim_{m\to\infty} \int_{\mathbb{R}} f(x)g_m(x)\,dx = 0. \]
It follows that $f = 0$ a.e. $\square$
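The mollifier construction in this proof can be watched at work numerically. The sketch below (an added illustration; all grid sizes are ad hoc) normalizes $h$, forms $g_m$ by quadrature, and evaluates it inside and outside $[c,d]$.

```python
import numpy as np

def bump(x):
    # exp(1/(x^2-1)) on (-1,1), zero elsewhere: smooth with compact support
    y = np.zeros_like(x)
    inside = np.abs(x) < 1
    y[inside] = np.exp(1.0 / (x[inside] ** 2 - 1.0))
    return y

# normalize: choose the constant so that the integral of h over R equals 1
xs = np.linspace(-1.0, 1.0, 100001)
vals = bump(xs)
mass = float(np.sum((vals[1:] + vals[:-1]) * np.diff(xs)) / 2)
h = lambda x: bump(x) / mass

def g_m(x, m, c, d, n=20000):
    # g_m(x) = int_{[c-1/m, d+1/m]} h_m(x - y) dy  with  h_m(y) = m h(m y)
    y = np.linspace(c - 1.0 / m, d + 1.0 / m, n + 1)
    fy = m * h(m * (x - y))
    return float(np.sum((fy[1:] + fy[:-1]) * np.diff(y)) / 2)

c, d = 0.2, 0.7
inside_val = g_m(0.45, 100, c, d)    # deep inside [c, d]
outside_val = g_m(0.95, 100, c, d)   # outside [c - 1/m, d + 1/m]
print(inside_val, outside_val)       # close to 1, and exactly 0
```

For points well inside $[c,d]$ the full mass of $h_m$ is captured, so $g_m$ is already essentially $1$ there for moderate $m$; outside the enlarged interval it vanishes identically.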

It follows from the lemma that there is an injection $f \mapsto \varphi_f$ from the locally integrable functions into $\mathcal{D}'$, if we identify functions that are equal almost everywhere. In other words, every locally integrable function $f$ corresponds to a distribution $\varphi_f$. Instead of considering the function $f$ we can as well consider the distribution $\varphi_f$.

Distributions on intervals. More generally we can consider distributions on any open interval $I \subseteq \mathbb{R}$. Let
\[ \mathcal{D}(I) = \{ g\colon I\to\mathbb{R} : g \text{ is infinitely many times differentiable and } \operatorname{supp} g \subset I \}. \]
Observe that $\operatorname{supp} g \subset I$ means that there is a closed interval $[c,d] \subset I$ such that $g = 0$ outside $[c,d]$. In particular $g$ vanishes near the boundary of $I$. On $\mathcal{D}(I)$ we introduce the following convergence:
\[ g_k \to g \text{ in } \mathcal{D}(I) \iff \begin{array}{l} \exists \text{ closed interval } [c,d] \subset I \text{ such that } \operatorname{supp} g_k \subseteq [c,d] \text{ for all } k \text{ and} \\ g_k^{(m)} \to g^{(m)} \text{ uniformly as } k\to\infty \text{ for all } m \ge 0. \end{array} \]
Define the distributions on $I$ by
\[ \mathcal{D}'(I) = \{ \varphi\colon \mathcal{D}(I)\to\mathbb{R} : \varphi \text{ is linear and continuous} \}. \]
As before, we can view locally integrable functions on $I$ as distributions, by
\[ \varphi_f(g) = \int_I f(x)g(x)\,dx. \]
If $\varphi$ is a distribution, we will say that $\varphi \in C(I)$ (or $L^p(I)$ or \ldots) if there exists a function $f \in C(I)$ (or $L^p(I)$ or \ldots) such that $\varphi = \varphi_f$.

Derivatives of distributions. If $f$ is $C^1$ and $g \in \mathcal{D}$, then by integration by parts,
\[ \int_I f'(x)g(x)\,dx = -\int_I f(x)g'(x)\,dx. \]
That is, $\varphi_{f'}(g) = -\varphi_f(g')$ for all $g \in \mathcal{D}$.

Definition 3.9. The derivative of a distribution $\varphi \in \mathcal{D}'$ is the distribution $D\varphi \in \mathcal{D}'$ defined by
\[ D\varphi(g) = -\varphi(g'), \qquad g \in \mathcal{D}. \]
It is not difficult to prove that $D\varphi$ is indeed an element of $\mathcal{D}'$. It follows that every locally integrable function and even every distribution has a derivative.

Lemma 3.10. If $f\colon I\to\mathbb{R}$ is locally integrable, then
\[ f \text{ is absolutely continuous} \iff D\varphi_f \text{ is integrable}. \]

Proof. $\Rightarrow$) Let $g \in \mathcal{D}$. We first show that $fg$ is absolutely continuous. If $(a_i,b_i)$, $i = 1,\ldots,n$, are mutually disjoint subintervals of $I$, then
\begin{align*}
\sum_{i=1}^n |(fg)(b_i) - (fg)(a_i)| &\le \sum_{i=1}^n \big( |f(b_i)g(b_i) - f(b_i)g(a_i)| + |f(b_i)g(a_i) - f(a_i)g(a_i)| \big) \\
&\le \max_{x\in[a,b]} |f(x)| \sum_{i=1}^n |g(b_i) - g(a_i)| + \max_{x\in[a,b]} |g(x)| \sum_{i=1}^n |f(b_i) - f(a_i)|.
\end{align*}
Moreover, $g$ is $C^1$, hence absolutely continuous. Therefore, it follows that $fg$ is absolutely continuous. (In fact we have just shown that the product of two absolutely continuous functions is absolutely continuous.) Then, for $a \in I$,
\[ \int_a^x (fg)'(t)\,dt = (fg)(x) - (fg)(a) \quad \text{for all } x \in I,\ x \ge a. \]
Since $\operatorname{supp} g \subset I$ and $I$ is open, $g$ vanishes near the boundaries of $I$. Hence we obtain
\[ \int_I (fg)'(t)\,dt = 0. \]
That is,
\[ \int_I f'(t)g(t)\,dt + \int_I f(t)g'(t)\,dt = 0. \]
Hence $D\varphi_f = \varphi_{f'}$. As $f'$ is integrable, $D\varphi_f$ is integrable.

$\Leftarrow$) By definition, there exists an integrable $h\colon I\to\mathbb{R}$ such that $D\varphi_f = \varphi_h$. That is,
\[ -\int_I f(t)g'(t)\,dt = \int_I h(t)g(t)\,dt. \]
Let $a$ be the left end point of $I$ (possibly $-\infty$) and let
\[ H(x) = \int_a^x h(t)\,dt. \]
Then $H$ is absolutely continuous, so $Hg$ is absolutely continuous, so
\[ H(x)g(x) = \int_a^x (Hg)'(t)\,dt = \int_a^x H'(t)g(t)\,dt + \int_a^x H(t)g'(t)\,dt. \]
Hence (with $x = b$, the right end point of $I$)
\[ 0 = \int_I h(t)g(t)\,dt + \int_I H(t)g'(t)\,dt, \]
so
\[ -\int_I f(t)g'(t)\,dt = \int_I h(t)g(t)\,dt = -\int_I H(t)g'(t)\,dt, \]
so
\[ \int_I \big( f(t) - H(t) \big) g'(t)\,dt = 0 \quad \text{for all } g \in \mathcal{D}. \]
From this we want to conclude that there exists a constant $C$ such that $f(t) - H(t) = C$ for almost every $t \in I$. Since $H + C$ is absolutely continuous, it then follows that $f$ is absolutely continuous. If $w \in \mathcal{D}$ is such that $\int_I w(s)\,ds = 0$, then $g(x) := \int_a^x w(s)\,ds$ defines an element $g$ in $\mathcal{D}$ with $g' = w$. So $\int_I (f(t) - H(t))w(t)\,dt = 0$. By a suitable approximation argument it can then be shown that $\int_I (f(t) - H(t))w(t)\,dt = 0$ holds for any $w \in L^2(I)$ with $\int_I w(s)\,ds = 0$, so that there exists a constant $C$ with $f(t) - H(t) = C$ for almost every $t$. $\square$

Corollary 3.11.
\[ W^{1,2}[a,b] = \{ f \in L^2[a,b] : D\varphi_f \in L^2[a,b] \}. \]
Similarly,
\begin{align*}
W^{m,p}[a,b] &= \{ f \in L^p[a,b] : D\varphi_f \in W^{m-1,p}[a,b] \} \\
&= \{ f \in L^p[a,b] : D\varphi_f, \ldots, D^m\varphi_f \in L^p[a,b] \}.
\end{align*}
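A concrete one-dimensional example (added for illustration, not from the notes): $f(t) = |t|$ on $I = (-1,1)$ is absolutely continuous with $f'(t) = \operatorname{sign}(t)$ almost everywhere, so by Lemma 3.10 its distributional derivative is $D\varphi_{|t|} = \varphi_{\operatorname{sign}}$, i.e. $-\int_I |t|\,g'(t)\,dt = \int_I \operatorname{sign}(t)g(t)\,dt$ for every test function $g$. The sketch checks this identity numerically for the (deliberately asymmetric) test function $g(t) = (t + \tfrac12)e^{1/(t^2-1)}$.

```python
import numpy as np

# grid on the open interval (-1, 1); the test function vanishes at the ends
t = np.linspace(-1.0, 1.0, 200001)[1:-1]
dt = t[1] - t[0]

bump = np.exp(1.0 / (t**2 - 1.0))             # e^{1/(t^2-1)}, a smooth bump
bump_p = bump * (-2.0 * t) / (t**2 - 1.0)**2  # its derivative

g = (t + 0.5) * bump                          # test function g
gp = bump + (t + 0.5) * bump_p                # g'

def trap(y):
    # composite trapezoidal rule on the fixed grid t
    return float(np.sum(y[1:] + y[:-1]) * dt / 2)

lhs = -trap(np.abs(t) * gp)       # (D phi_f)(g) = -int |t| g'(t) dt
rhs = trap(np.sign(t) * g)        # phi_sign(g)  =  int sign(t) g(t) dt
print(lhs, rhs)                   # the two values agree
```

An asymmetric $g$ is needed here: for an even test function both sides vanish by symmetry and the check would be vacuous.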

Multidimensional distributions. Let $d \in \mathbb{N}$ and let $\Omega$ be an open subset of $\mathbb{R}^d$. Let
\[ \mathcal{D}(\Omega) := \{ g\colon \Omega\to\mathbb{R} : \text{all partial derivatives of } g \text{ of all orders exist and are continuous, and } \exists \text{ compact } K \subseteq \Omega \text{ such that } \operatorname{supp} g \subseteq K \}. \]
Introduce a notion of convergence in $\mathcal{D}$ by
\[ g_n \to g \iff \begin{array}{l} \exists \text{ compact } K \subseteq \Omega \text{ such that } \operatorname{supp} g_n \subseteq K \text{ for all } n \text{ and, for every} \\ m_1,\ldots,m_d \ge 0,\ \dfrac{\partial^{m_1+\cdots+m_d} g_n}{\partial x_1^{m_1}\cdots \partial x_d^{m_d}} \to \dfrac{\partial^{m_1+\cdots+m_d} g}{\partial x_1^{m_1}\cdots \partial x_d^{m_d}} \text{ uniformly}. \end{array} \]
The set of distributions is given by
\[ \mathcal{D}'(\Omega) := \{ \varphi\colon \mathcal{D}(\Omega)\to\mathbb{R} : \varphi \text{ is linear and continuous} \}. \]
A function $f\colon \Omega\to\mathbb{R}$ is locally integrable if
\[ \int_B |f(x_1,\ldots,x_d)|\,dx_1\ldots dx_d < \infty \]
for every bounded $B \subseteq \Omega$. The functional
\[ \varphi_f(g) = \int_\Omega f(x_1,\ldots,x_d)g(x_1,\ldots,x_d)\,dx_1\ldots dx_d, \qquad g \in \mathcal{D}, \]
is then a distribution. For $m_1,\ldots,m_d \ge 0$, define the derivative of $\varphi \in \mathcal{D}'$ of order $(m_1,\ldots,m_d)$ by
\[ D^{m_1,\ldots,m_d}\varphi(g) := (-1)^{m_1+\cdots+m_d}\, \varphi\Big( \frac{\partial^{m_1+\cdots+m_d} g}{\partial x_1^{m_1}\cdots \partial x_d^{m_d}} \Big), \qquad g \in \mathcal{D}. \]
Then $D^{m_1,\ldots,m_d}\varphi \in \mathcal{D}'$. By means of these multidimensional distributional derivatives we can define Sobolev spaces on $\Omega$ as follows:
\[ W^{m,p}(\Omega) := \{ f \in L^p(\Omega) : D^{m_1,\ldots,m_d}\varphi_f \in L^p(\Omega) \text{ for all } m_1,\ldots,m_d \ge 0 \text{ with } m_1+\cdots+m_d \le m \}. \]
If $d = 1$, we have seen that every $f \in W^{1,p}$ is continuous (even absolutely continuous). In higher dimensions elements of $W^{1,p}$ need not be continuous. The following theorem is the famous Sobolev embedding theorem. Its proof is beyond the scope of this course.

Theorem 3.12. If $d \in \mathbb{N}$, $m,k \in \{0,1,2,\ldots\}$, $\Omega \subseteq \mathbb{R}^d$ is open, and $p > 1$, then
\[ W^{m,p}(\Omega) \subseteq C^k(\Omega) \quad \text{if } m > k + \frac{d}{p}. \]
If we denote by $H_0^{m,p}(\Omega)$ the closure of $\mathcal{D}(\Omega)$ in $W^{m,p}(\Omega)$, then

• $H_0^{m,p}(\Omega)$ is compactly embedded in $L^q(\Omega)$ if $mp < d$ and $q < \frac{dp}{d-mp}$,

• $H_0^{m,p}(\Omega)$ is compactly embedded in $C^k(\Omega)$ if $mp > d + kp$.

Here we say that a normed space $X$ is compactly embedded in a normed space $Y$ if $X$ is a subspace of $Y$ and the unit ball of $X$ is a relatively compact set in $Y$. A standard reference on Sobolev spaces is the book "Sobolev spaces" by R.A. Adams, Academic Press, 1975.

3.4 Fourier transform and fractional derivatives

If one analyses linear operations in $\mathbb{R}^d$, it is sometimes convenient to choose a new basis. The transformation of the coordinates with respect to the old basis to the coordinates for the new basis is a linear bijection. Differentiation is also a linear transformation. It is sometimes convenient to use a transformation of a function space that makes the operation of differentiation simpler. Such a transformation is the Fourier transform. A complete theory of the Fourier transform is quite involved. This section contains a glossary of the main features relevant for Sobolev spaces.

For an integrable function $f\colon \mathbb{R}\to\mathbb{R}$ we define the Fourier transform of $f$ by
\[ (\mathcal{F}f)(\omega) = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(t)e^{-i\omega t}\,dt, \qquad \omega \in \mathbb{R}. \]
Not all mathematicians agree on the choice of the scaling factor $\frac{1}{\sqrt{2\pi}}$. Other common choices are $1$ and $\frac{1}{2\pi}$. The function $\mathcal{F}f$ is often also denoted by $\hat f$. It can be shown that $\hat f$ is a continuous function and $\hat f(\omega) \to 0$ as $\omega \to \infty$ or $\omega \to -\infty$.

If we apply $\mathcal{F}$ to a test function $f \in \mathcal{D}$, the image $\mathcal{F}f$ need not be in $\mathcal{D}$. If we enlarge the class of test functions somewhat, we get a class that is more compatible with the action of $\mathcal{F}$. A function $f\colon \mathbb{R}\to\mathbb{R}$ is called rapidly decreasing if
\[ |t|^n f(t) \to 0 \quad \text{as } |t| \to \infty, \text{ for every } n \in \mathbb{N}, \]
that is, $f$ decays faster than any power of $t$. For instance, $f(t) = e^{-|t|}$ is rapidly decreasing and every function with a bounded support is rapidly decreasing. The set
\[ \mathcal{S}(\mathbb{R}) = \{ f \in C^\infty(\mathbb{R}) : f^{(m)} \text{ is rapidly decreasing for all } m \in \mathbb{N} \} \]
is called the Schwartz space. Observe that $\mathcal{D}(\mathbb{R}) \subset \mathcal{S}(\mathbb{R})$. The relevance of the Schwartz space for the Fourier transform is shown by the next theorem, which we cite without proof.

Theorem 3.13. The Fourier transform $\mathcal{F}$ is a bijection from $\mathcal{S}(\mathbb{R})$ onto $\mathcal{S}(\mathbb{R})$.

Proposition 3.14. For $f,g \in \mathcal{S}(\mathbb{R})$ we have

(a) $\int_{\mathbb{R}} \hat f(\omega)g(\omega)\,d\omega = \int_{\mathbb{R}} f(t)\hat g(t)\,dt$;

(b) $\mathcal{F}(t \mapsto e^{-\frac{1}{2}t^2}) = \omega \mapsto e^{-\frac{1}{2}\omega^2}$ and, more generally, $\mathcal{F}(t \mapsto e^{-\frac{1}{2}(at)^2}) = \omega \mapsto \frac{1}{a} e^{-\frac{1}{2}(\omega/a)^2}$ for every $a \ne 0$;

(c) $(\mathcal{F}\mathcal{F}f)(\xi) = f(-\xi)$ for all $\xi \in \mathbb{R}$;

(d) $\int_{\mathbb{R}} \hat f(\omega)\overline{\hat g(\omega)}\,d\omega = \int_{\mathbb{R}} f(t)g(t)\,dt$.

Proof. (a) By Fubini,

\begin{align*}
\int_{\mathbb{R}} \hat f(\omega)g(\omega)\,d\omega &= \int_{\mathbb{R}} \Big( \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(t)e^{-i\omega t}\,dt \Big) g(\omega)\,d\omega \\
&= \int_{\mathbb{R}} f(t)\, \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} g(\omega)e^{-i\omega t}\,d\omega\,dt \\
&= \int_{\mathbb{R}} f(t)\hat g(t)\,dt.
\end{align*}
(b) We have

\begin{align*}
\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-\frac{1}{2}(at)^2} e^{-i\omega t}\,dt &= \frac{1}{\sqrt{2\pi}\,a} \int_{\mathbb{R}} e^{-\frac{1}{2}s^2} e^{-i\frac{\omega}{a}s}\,ds \\
&= \frac{1}{\sqrt{2\pi}\,a} \int_{\mathbb{R}} e^{-\frac{1}{2}\left(s+i\frac{\omega}{a}\right)^2}\,ds\; e^{-\frac{1}{2}\left(\frac{\omega}{a}\right)^2} \\
&= \frac{1}{a}\, e^{-\frac{1}{2}\left(\frac{\omega}{a}\right)^2}.
\end{align*}
(c) By (a) and (b),

\begin{align*}
\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} (\mathcal{F}f)(\omega)\, e^{-i\omega\xi} e^{-\frac{1}{2}(a\omega)^2}\,d\omega &= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(t)\, \mathcal{F}\big( \omega \mapsto e^{-i\omega\xi} e^{-\frac{1}{2}(a\omega)^2} \big)(t)\,dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(t)\, \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} e^{-i\omega\xi} e^{-\frac{1}{2}(a\omega)^2} e^{-i\omega t}\,d\omega\,dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(t)\, \frac{1}{a}\, e^{-\frac{1}{2}\left(\frac{\xi+t}{a}\right)^2}\,dt \\
&= \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f(as-\xi)\, e^{-\frac{1}{2}s^2}\,ds.
\end{align*}
If we let $a \to 0$, Lebesgue's dominated convergence theorem applied to both the left and right hand side yields
\[ \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} (\mathcal{F}f)(\omega)\, e^{-i\omega\xi}\,d\omega = f(-\xi), \]
so $\mathcal{F}\mathcal{F}f(\xi) = f(-\xi)$.
(d) Since
\[ \overline{\hat g(\omega)} = \overline{\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} g(t)e^{-i\omega t}\,dt} = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} g(t)e^{i\omega t}\,dt = \frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} g(-s)e^{-i\omega s}\,ds = \mathcal{F}(s \mapsto g(-s))(\omega), \]
(a) and (c) yield with $h(t) = g(-t)$ that
\begin{align*}
\int_{\mathbb{R}} \hat f(\omega)\overline{\hat g(\omega)}\,d\omega &= \int_{\mathbb{R}} \hat f(\omega)(\mathcal{F}h)(\omega)\,d\omega \\
&= \int_{\mathbb{R}} f(t)(\mathcal{F}\mathcal{F}h)(t)\,dt = \int_{\mathbb{R}} f(t)h(-t)\,dt \\
&= \int_{\mathbb{R}} f(t)g(t)\,dt. \qquad \square
\end{align*}

From part (d) we have
\[ \langle \mathcal{F}f, \mathcal{F}g\rangle_{L^2(\mathbb{R})} = \langle f,g\rangle_{L^2(\mathbb{R})} \quad \text{and} \quad \|\mathcal{F}f\|_{L^2(\mathbb{R})} = \|f\|_{L^2(\mathbb{R})}. \]
So $\mathcal{F}$ is an isometry with respect to the $L^2$-norm on $\mathcal{S}(\mathbb{R})$. It can be shown that $\mathcal{S}(\mathbb{R})$ is dense in $L^2(\mathbb{R})$, and then it follows that $\mathcal{F}$ extends uniquely to an isometry $\mathcal{F}_2$ from $L^2(\mathbb{R})$ into $L^2(\mathbb{R})$. This extension $\mathcal{F}_2$ is called the Fourier--Plancherel transformation. It satisfies
\[ \langle \mathcal{F}_2 f, \mathcal{F}_2 g\rangle_{L^2(\mathbb{R})} = \langle f,g\rangle_{L^2(\mathbb{R})} \quad \text{for all } f,g \in L^2(\mathbb{R}). \]
Moreover, $\mathcal{F}_2$ is a bijection from $L^2(\mathbb{R})$ onto $L^2(\mathbb{R})$.

Fourier transform and derivative. The importance of the Fourier transform in the context of differentiation is that it transforms differentiation to multiplication by $i\omega$, which is an easier operation.

Proposition 3.15. If $f \in \mathcal{S}(\mathbb{R})$, then
\[ \widehat{f'}(\omega) = i\omega \hat f(\omega). \]

Proof.
\begin{align*}
\frac{1}{\sqrt{2\pi}} \int_{\mathbb{R}} f'(t)e^{-i\omega t}\,dt &= \lim_{M,N\to\infty} \frac{1}{\sqrt{2\pi}} \int_{-M}^N f'(t)e^{-i\omega t}\,dt \\
&= \lim_{M,N\to\infty} \frac{1}{\sqrt{2\pi}} \Big( f(t)e^{-i\omega t}\Big|_{-M}^N - \int_{-M}^N f(t)(-i\omega)e^{-i\omega t}\,dt \Big) \\
&= i\omega \hat f(\omega). \qquad \square
\end{align*}
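Both the Gaussian computation of Proposition 3.14(b) and the differentiation rule of Proposition 3.15 are easy to confirm numerically. The sketch below (an added illustration; truncation interval and grids are arbitrary choices) approximates the transform integral by the trapezoidal rule for the Gaussian $f(t) = e^{-t^2/2}$.

```python
import numpy as np

def fourier(fvals, t, ws):
    # (F f)(w) = (2 pi)^{-1/2} int f(t) e^{-i w t} dt  by the trapezoidal rule
    dt = t[1] - t[0]
    weights = np.full(t.size, dt)
    weights[0] = weights[-1] = dt / 2
    return np.exp(-1j * np.outer(ws, t)) @ (fvals * weights) / np.sqrt(2 * np.pi)

t = np.linspace(-20.0, 20.0, 4001)   # Gaussian tails are ~ e^{-200} at the cut-off
w = np.linspace(-3.0, 3.0, 61)

f = np.exp(-t**2 / 2)                # f(t) = e^{-t^2/2}
fp = -t * np.exp(-t**2 / 2)          # f'(t)

fhat = fourier(f, t, w)
fphat = fourier(fp, t, w)

err_b = float(np.max(np.abs(fhat - np.exp(-w**2 / 2))))   # Prop. 3.14(b)
err_d = float(np.max(np.abs(fphat - 1j * w * fhat)))      # Prop. 3.15
print(err_b, err_d)                  # both essentially at machine precision
```

Because the integrand is smooth and decays rapidly, the trapezoidal rule converges extremely fast here, so both identities hold to many digits already on this modest grid.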

It can be seen from the Fourier transform of a function whether its distributional derivative is actually a function in $L^2(\mathbb{R})$.

Proposition 3.16. For $f \in L^2(\mathbb{R})$ one has
\[ f \text{ has distributional derivative in } L^2(\mathbb{R}) \iff \omega \mapsto \omega\hat f(\omega) \in L^2(\mathbb{R}). \]
Moreover, in that case the distributional derivative of $f$ equals
\[ \mathcal{F}_2^{-1}\big( \omega \mapsto i\omega\hat f(\omega) \big). \]

Proof. $\Leftarrow$) As $\mathcal{F}_2$ is a bijection from $L^2(\mathbb{R})$ onto $L^2(\mathbb{R})$, we can define $h = \mathcal{F}_2^{-1}(\omega \mapsto i\omega\hat f(\omega)) \in L^2(\mathbb{R})$. For $g \in \mathcal{D}(\mathbb{R})$,
\begin{align*}
\int_{\mathbb{R}} h(t)g(t)\,dt &= \int_{\mathbb{R}} \hat h(\omega)\overline{\hat g(\omega)}\,d\omega = \int_{\mathbb{R}} i\omega\hat f(\omega)\,\overline{\hat g(\omega)}\,d\omega \\
&= -\int_{\mathbb{R}} \hat f(\omega)\,\overline{(i\omega)\hat g(\omega)}\,d\omega = -\int_{\mathbb{R}} \hat f(\omega)\,\overline{\widehat{g'}(\omega)}\,d\omega = -\int_{\mathbb{R}} f(t)g'(t)\,dt,
\end{align*}
so $h$ is the distributional derivative of $f$.
$\Rightarrow$) Let $h \in L^2(\mathbb{R})$ be the distributional derivative of $f$. Then
\[ \int_{\mathbb{R}} h(t)g(t)\,dt = -\int_{\mathbb{R}} f(t)g'(t)\,dt \quad \text{for all } g \in \mathcal{D}(\mathbb{R}). \]
Therefore, for every $g \in \mathcal{D}(\mathbb{R})$,
\begin{align*}
\int_{\mathbb{R}} \hat h(\omega)\overline{\hat g(\omega)}\,d\omega &= \int_{\mathbb{R}} h(t)g(t)\,dt = -\int_{\mathbb{R}} f(t)g'(t)\,dt \\
&= -\int_{\mathbb{R}} \hat f(\omega)\overline{\widehat{g'}(\omega)}\,d\omega = -\int_{\mathbb{R}} \hat f(\omega)\overline{(i\omega)\hat g(\omega)}\,d\omega \\
&= \int_{\mathbb{R}} i\omega\hat f(\omega)\,\overline{\hat g(\omega)}\,d\omega,
\end{align*}
so
\[ \int_{\mathbb{R}} \big( \hat h(\omega) - i\omega\hat f(\omega) \big)\, \overline{\hat g(\omega)}\,d\omega = 0. \]
By a quite involved approximation argument it can be shown that this implies that
\[ \hat h(\omega) - i\omega\hat f(\omega) = 0 \quad \text{for almost every } \omega. \]
Hence $\omega\hat f(\omega) = -i\hat h(\omega)$, so $\omega \mapsto \omega\hat f(\omega) \in L^2(\mathbb{R})$. $\square$

We can view a function $f \in L^2[a,b]$ as a function in $L^2(\mathbb{R})$ by extending with zero:
\[ f(t) = \begin{cases} f(t), & t \in [a,b], \\ 0, & t \in \mathbb{R}\setminus[a,b]. \end{cases} \]
Notice that $f \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$.

The previous proposition yields yet another way to describe the Sobolev space $W^{1,2}[a,b]$.

Corollary 3.17.
\[ W^{1,2}[a,b] = \{ f \in L^2[a,b] : \omega \mapsto \omega\hat f(\omega) \in L^2(\mathbb{R}) \}. \]

Fractional derivative. Does there exist an operator $D^{1/2}$ such that applying it twice to a function $f$ yields the derivative $f'$? For higher derivatives we see from $\widehat{f'}(\omega) = i\omega\hat f(\omega)$ that
\[ f^{(m)} = \mathcal{F}_2^{-1}\big( \omega \mapsto (i\omega)^m \hat f(\omega) \big), \]
whenever $f$ is such that $\omega \mapsto (i\omega)^m \hat f(\omega)$ is in $L^2(\mathbb{R})$. If $s > 0$ and $f \in L^2(\mathbb{R})$ is such that $\omega \mapsto (i\omega)^s \hat f(\omega) \in L^2(\mathbb{R})$, we define
\[ D^s f = \mathcal{F}_2^{-1}\big( \omega \mapsto (i\omega)^s \hat f(\omega) \big). \]
Then the function $D^s f$ is in $L^2(\mathbb{R})$ and it is called the fractional derivative of $f$ of order $s$. Recall that
\[ (i\omega)^s = \begin{cases} e^{\frac{1}{2}\pi i s}\, |\omega|^s, & \omega \ge 0, \\ e^{\frac{3}{2}\pi i s}\, |\omega|^s, & \omega < 0. \end{cases} \]
By means of the Fourier--Plancherel transform we see that if $f \in L^2(\mathbb{R})$ has distributional derivative in $L^2(\mathbb{R})$, then $D^{1/2}D^{1/2}f = f'$. A short overview with other approaches to fractional derivatives is given on http://en.wikipedia.org/wiki/Fractional_calculus.
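The branch choice for $(i\omega)^s$ can be checked directly: squaring the half-derivative multiplier must reproduce the multiplier $i\omega$ of ordinary differentiation, and pushing $i\omega\hat f$ through the inverse transform must recover $f'$. A numerical sketch of this (an added illustration; quadrature settings are arbitrary), again for the Gaussian $f(t) = e^{-t^2/2}$ with $\hat f(\omega) = e^{-\omega^2/2}$:

```python
import numpy as np

def iw_pow(w, s):
    # (i w)^s with the branch above: e^{pi i s/2}|w|^s for w >= 0,
    # and e^{3 pi i s/2}|w|^s for w < 0
    phase = np.where(w >= 0, np.exp(0.5j * np.pi * s), np.exp(1.5j * np.pi * s))
    return phase * np.abs(w) ** s

w = np.linspace(-20.0, 20.0, 8001)
dw = w[1] - w[0]
fhat = np.exp(-w**2 / 2)            # transform of the Gaussian f(t) = e^{-t^2/2}

m = iw_pow(w, 0.5)                  # half-derivative multiplier
# composing the half-derivative multiplier twice gives the multiplier i w
comp_err = float(np.max(np.abs(m * m - 1j * w)))

# invert  w -> (i w)^{1/2} (i w)^{1/2} fhat(w)  by the trapezoidal rule and
# compare with f'(t) = -t e^{-t^2/2}
t = np.linspace(-5.0, 5.0, 201)
integrand = (m * m * fhat)[None, :] * np.exp(1j * np.outer(t, w))
d = (integrand[:, 1:] + integrand[:, :-1]).sum(axis=1) * dw / 2 / np.sqrt(2 * np.pi)
deriv_err = float(np.max(np.abs(d - (-t * np.exp(-t**2 / 2)))))
print(comp_err, deriv_err)          # both tiny
```

Note that the intermediate function $D^{1/2}f$ itself is complex-valued with this branch; only after applying the multiplier twice does one land back on the real function $f'$.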

4 Stieltjes integral

4.1 Definition and properties

Riemann's idea to define the integral of a function $f$ on an interval $[a,b]$ was to consider convergence of sums $\sum_i f(s_i)(t_{i+1} - t_i)$, where $a = t_0 \le t_1 \le \cdots \le t_n = b$ is a partition and $s_i \in [t_i, t_{i+1}]$. The factor $t_{i+1} - t_i$ is the weight or length of the interval $[t_i, t_{i+1}]$. If $\mu$ is an arbitrary measure on $[a,b]$, then the Lebesgue integral of $f$ with respect to $\mu$ is constructed by means of sums $\sum_i \alpha_i \mu(A_i)$, where the $\sum_i \alpha_i \mathbf{1}_{A_i}$ are step functions approximating $f$. The factor $\mu(A_i)$ is the weight or measure of the set $A_i$. The integral of Thomas Stieltjes is both historically and in generality between the Riemann and the Lebesgue integral. It is not as general as Lebesgue's integral but its construction is as explicit as that of Riemann's integral. It is especially convenient in probability theory. Moreover, the stochastic integrals of Wiener and Itô are based on Stieltjes's construction.

A partition of an interval $[a,b]$ is a finite set $t = \{t_0, t_1, \ldots, t_n\}$ of points of $[a,b]$. We will assume that a partition always contains $a$ and $b$. In proofs and constructions it is often convenient to order the points of a partition. The assumption that the end points are in the partition and the labeling of the points so that they are ordered can be summarized in the phrase "let $a = t_0 \le t_1 \le \cdots \le t_n = b$ be a partition of $[a,b]$". It turns out to be convenient to allow here that some consecutive points are equal. This means in particular that a point may occur several times. The mesh size of a partition $a = t_0 \le t_1 \le \cdots \le t_n = b$ is defined by
\[
\operatorname{mesh}(t) = \max_{1 \le k \le n} |t_k - t_{k-1}|.
\]
If $t$ and $s$ are two partitions, then $s$ is called finer than $t$ if $s \supseteq t$. That is, a finer partition is obtained from a coarser one by adding more points. Notice that $\{0, \pi, 4\}$ and $\{0, 1, 2, 4\}$ are both partitions of $[0,4]$ but neither of the two is finer than the other. Given two (or finitely many) partitions, there is always a partition that is finer than each of them, namely the union. Such a finer partition is sometimes called a common refinement.

Definition 4.1. Let $f, g : [a,b] \to \mathbb{R}$. The function $f$ is Stieltjes integrable with respect to $g$ if there exists $I \in \mathbb{R}$ such that for every $\varepsilon > 0$ there exists a partition such that for every finer partition $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$ and every choice of $s_k \in [t_{k-1}, t_k]$ we have
\[
\Big| \sum_{k=1}^n f(s_k)\big(g(t_k) - g(t_{k-1})\big) - I \Big| < \varepsilon.
\]

The number $I$ is then called the Stieltjes integral of $f$ with respect to $g$ and it is denoted by
\[
\int_a^b f(t)\,dg(t).
\]
If $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$ is a partition and $s_k \in [t_{k-1}, t_k]$ are intermediate points, we denote the corresponding Riemann-Stieltjes sum by
\[
S(s,t) = \sum_{k=1}^n f(s_k)\big(g(t_k) - g(t_{k-1})\big).
\]
The most famous instance of Stieltjes integrability is the following.

Theorem 4.2. If $f : [a,b] \to \mathbb{R}$ is continuous and $g : [a,b] \to \mathbb{R}$ is of bounded variation, then $f$ is Stieltjes integrable with respect to $g$.

Proof. First we compare the Riemann-Stieltjes sums for two partitions, one finer than the other. Let $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$ and for each $k$ let $t_{k-1} = \tau_{k,0} \le \tau_{k,1} \le \cdots \le \tau_{k,m_k} = t_k$. Let $s_k \in [t_{k-1}, t_k]$ and let $\sigma_{k,i} \in [\tau_{k,i-1}, \tau_{k,i}]$. Then
\begin{align*}
|S(\sigma,\tau) - S(s,t)|
&= \Big| \sum_{k=1}^n \sum_{i=1}^{m_k} f(\sigma_{k,i})\big(g(\tau_{k,i}) - g(\tau_{k,i-1})\big) - \sum_{k=1}^n f(s_k)\big(g(t_k) - g(t_{k-1})\big) \Big| \\
&= \Big| \sum_{k=1}^n \sum_{i=1}^{m_k} f(\sigma_{k,i})\big(g(\tau_{k,i}) - g(\tau_{k,i-1})\big) - \sum_{k=1}^n \sum_{i=1}^{m_k} f(s_k)\big(g(\tau_{k,i}) - g(\tau_{k,i-1})\big) \Big| \\
&= \Big| \sum_{k=1}^n \sum_{i=1}^{m_k} \big(f(\sigma_{k,i}) - f(s_k)\big)\big(g(\tau_{k,i}) - g(\tau_{k,i-1})\big) \Big| \\
&\le \max_k \max_{\sigma, s \in [t_{k-1}, t_k]} |f(\sigma) - f(s)| \sum_{k=1}^n \sum_{i=1}^{m_k} \big|g(\tau_{k,i}) - g(\tau_{k,i-1})\big| \\
&\le \max_k \max_{\sigma, s \in [t_{k-1}, t_k]} |f(\sigma) - f(s)| \; V_{[a,b]}(g).
\end{align*}
Next, choose any sequence of partitions $t^n$ with mesh size tending to zero and such that $t^{n+1}$ is a refinement of $t^n$ for every $n$. Then for $m \ge n$,
\[
|S(s^n, t^n) - S(s^m, t^m)| \le \max_{|s - \sigma| \le \operatorname{mesh}(t^n)} |f(s) - f(\sigma)| \; V_{[a,b]}(g).
\]
As $f$ is continuous on $[a,b]$, it is uniformly continuous, since $[a,b]$ is compact. Hence the previous formula yields
\[
|S(s^n, t^n) - S(s^m, t^m)| \to 0 \quad \text{as } m, n \to \infty.
\]

So $(S(s^n, t^n))_n$ is a Cauchy sequence and therefore there exists $I \in \mathbb{R}$ such that
\[
I = \lim_{n \to \infty} S(s^n, t^n).
\]
Now we have the candidate integral $I$. Let $\varepsilon > 0$. Choose $\delta > 0$ such that
\[
|s - \sigma| < \delta \implies |f(s) - f(\sigma)| < \frac{\varepsilon}{2V_{[a,b]}(g) + 1}.
\]
(The $+1$ in the denominator is only there to avoid division by $0$ in case $V_{[a,b]}(g) = 0$. Alternatively, this trivial case could be treated separately.) Choose $N$ such that
\[
\operatorname{mesh}(t^N) < \delta \quad \text{and} \quad |S(s^n, t^n) - I| < \varepsilon/2 \text{ for all } n \ge N.
\]
Let $\tau$ be any partition finer than $t^N$ and $\sigma$ any choice of intermediate points for $\tau$. Then

\[
|S(\sigma,\tau) - S(s^N, t^N)| \le \max_{|s - \sigma| < \delta} |f(s) - f(\sigma)| \; V_{[a,b]}(g) < \varepsilon/2,
\]
hence
\[
|S(\sigma,\tau) - I| \le |S(\sigma,\tau) - S(s^N, t^N)| + |S(s^N, t^N) - I| < \varepsilon.
\]
Therefore $f$ is Stieltjes integrable with respect to $g$ with integral $I$. $\Box$

Example 4.3. Let $f : [a,b] \to \mathbb{R}$ be continuous and let $g$ be a step function: for $a < u_1 < u_2 < \cdots < u_m \le b$ and $\beta_1, \ldots, \beta_m \in \mathbb{R}$, let
\[
g = \sum_{i=1}^m \beta_i \mathbf{1}_{[u_i, b]},
\]
or, equivalently, $g(t) = \beta_1 + \cdots + \beta_\ell$ whenever $t \in [u_\ell, u_{\ell+1})$ (where $u_{m+1} := b$). Then $f$ is Stieltjes integrable with respect to $g$ and
\[
\int_a^b f(t)\,dg(t) = \sum_{\ell=1}^m f(u_\ell)\beta_\ell.
\]
Indeed, let $\varepsilon > 0$. Choose $\delta > 0$ such that
\[
|s - \sigma| < \delta \implies |f(s) - f(\sigma)| < \frac{\varepsilon}{\sum_\ell |\beta_\ell| + 1}
\]
and such that $\delta < u_{\ell+1} - u_\ell$ for all $\ell$. Let $a = t_0 \le t_1 \le \cdots \le t_n = b$ be any partition with $\operatorname{mesh}(t) < \delta$ and let $s$ be any choice of intermediate points for $t$. Then each interval $[t_{k-1}, t_k]$ contains at most one of the jump points $u_\ell$, and for each $\ell$ there is exactly one $k_\ell$ such that $u_\ell \in (t_{k_\ell - 1}, t_{k_\ell}]$. Then
\[
\sum_{k=1}^n f(s_k)\big(g(t_k) - g(t_{k-1})\big) = \sum_{\ell=1}^m f(s_{k_\ell})\big(g(t_{k_\ell}) - g(t_{k_\ell - 1})\big) = \sum_{\ell=1}^m f(s_{k_\ell})\beta_\ell.
\]
Since both $s_{k_\ell} \in [t_{k_\ell - 1}, t_{k_\ell}]$ and $u_\ell \in (t_{k_\ell - 1}, t_{k_\ell}]$, we have $|s_{k_\ell} - u_\ell| \le |t_{k_\ell} - t_{k_\ell - 1}| < \delta$. Hence
\[
|f(s_{k_\ell}) - f(u_\ell)| < \frac{\varepsilon}{\sum_\ell |\beta_\ell| + 1},
\]
so
\[
\Big| \sum_{\ell=1}^m f(s_{k_\ell})\beta_\ell - \sum_{\ell=1}^m f(u_\ell)\beta_\ell \Big| \le \sum_{\ell=1}^m |f(s_{k_\ell}) - f(u_\ell)|\,|\beta_\ell| < \varepsilon.
\]
Therefore $f$ is Stieltjes integrable with respect to $g$ and $\int_a^b f(t)\,dg(t) = \sum_{\ell=1}^m f(u_\ell)\beta_\ell$.
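The step-function computation above can be checked numerically. The sketch below uses our own toy jump data (not from the notes): a fine Riemann-Stieltjes sum for a continuous $f$ against a step integrator should be close to $\sum_\ell f(u_\ell)\beta_\ell$.

```python
import math

# Toy check: for continuous f and the step function g = sum_l beta_l 1_{[u_l, b]},
# a fine Riemann-Stieltjes sum should approximate sum_l f(u_l) * beta_l.
# The jump points and weights below are illustrative choices.

def stieltjes_sum(f, g, points):
    """Riemann-Stieltjes sum with left endpoints as intermediate points."""
    return sum(f(t0) * (g(t1) - g(t0))
               for t0, t1 in zip(points[:-1], points[1:]))

a, b = 0.0, 1.0
jumps = [(0.25, 2.0), (0.5, -1.0), (0.75, 3.0)]  # pairs (u_l, beta_l)

def g(t):
    return sum(beta for u, beta in jumps if u <= t)  # g = sum beta_l 1_{[u_l, b]}

f = math.cos
n = 100_000
partition = [a + (b - a) * k / n for k in range(n + 1)]

approx = stieltjes_sum(f, g, partition)
exact = sum(f(u) * beta for u, beta in jumps)
print(approx, exact)  # nearly equal
```

Only the subintervals containing a jump contribute to the sum, which is exactly why the limit singles out the values $f(u_\ell)$.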

Example 4.4. The function $f = \mathbf{1}_{[0,1/2)}$ is not Stieltjes integrable with respect to $g = \mathbf{1}_{[1/2,1]}$. Indeed, if $0 = t_0 \le t_1 \le \cdots \le t_n = 1$ is any partition, then there exists a $k^*$ such that $(t_{k^*-1}, t_{k^*}]$ contains $1/2$. Then $\sum_{k=1}^n f(s_k)\big(g(t_k) - g(t_{k-1})\big) = f(s_{k^*})$. By choosing $s_{k^*} \in [t_{k^*-1}, t_{k^*}]$ less than $1/2$ or greater than or equal to $1/2$ we can make the Riemann-Stieltjes sum either $1$ or $0$. Thus $f$ is not Stieltjes integrable with respect to $g$.

A formula for calculus with Stieltjes integrals. If $g \in C^1[a,b]$ then $g$ is of bounded variation. Indeed, for $a = t_0 \le t_1 \le \cdots \le t_n = b$,
\[
\sum_{k=1}^n |g(t_k) - g(t_{k-1})| \le \max_{\xi \in [a,b]} |g'(\xi)| \sum_{k=1}^n |t_k - t_{k-1}| = \max_{\xi \in [a,b]} |g'(\xi)|\,(b-a),
\]
so $V_{[a,b]}(g) \le \max_{\xi \in [a,b]} |g'(\xi)|\,(b-a)$.

Theorem 4.5. If $f : [a,b] \to \mathbb{R}$ is continuous and $g : [a,b] \to \mathbb{R}$ is continuously differentiable, then
\[
\int_a^b f(t)\,dg(t) = \int_a^b f(t)g'(t)\,dt.
\]
Proof. First we show that the continuous differentiability of $g$ implies uniform differentiability in the following sense: for every $\varepsilon > 0$ there exists $\delta > 0$ such that

\[
0 < |s - t| < \delta \implies \Big| \frac{g(s) - g(t)}{s - t} - g'(t) \Big| < \varepsilon.
\]
Indeed, let
\[
h(s,t) := \begin{cases} \dfrac{g(s) - g(t)}{s - t}, & s \ne t, \\ g'(t), & s = t. \end{cases}
\]
Then $h$ is continuous on $[a,b] \times [a,b]$ and therefore uniformly continuous. Hence for $\varepsilon > 0$ there exists a $\delta > 0$ such that
\[
\|(s,t) - (s_0, t_0)\| < \delta \implies |h(s,t) - h(s_0, t_0)| < \varepsilon.
\]
Then (with $(s_0, t_0) = (t,t)$)
\[
|s - t| < \delta \implies |h(s,t) - h(t,t)| < \varepsilon,
\]
and that is the implication we wanted to show.

Next, let $\varepsilon > 0$. Choose $\delta > 0$ such that
\[
0 < |s - t| < \delta \implies \Big| \frac{g(s) - g(t)}{s - t} - g'(t) \Big| < \frac{\varepsilon}{2(b-a)\|f\|_\infty + 1}.
\]
Then
\[
0 < |s - t| < \delta \implies |g(s) - g(t) - g'(t)(s - t)| < \frac{\varepsilon}{2(b-a)\|f\|_\infty + 1}\,|s - t|.
\]
Choose a partition $t$ such that for every finer partition $a = \tau_0 \le \tau_1 \le \cdots \le \tau_n = b$ and for every choice of $\sigma_k \in [\tau_{k-1}, \tau_k]$,
\[
\Big| \sum_{k=1}^n f(\sigma_k)\big(g(\tau_k) - g(\tau_{k-1})\big) - \int_a^b f(t)\,dg(t) \Big| < \varepsilon/2.
\]
Without loss of generality we may choose $t$ such that $\operatorname{mesh}(t) < \delta$. Then also $\operatorname{mesh}(\tau) < \delta$,

so
\begin{align*}
&\Big| \sum_{k=1}^n f(\sigma_k)g'(\sigma_k)(\tau_k - \tau_{k-1}) - \int_a^b f(t)\,dg(t) \Big| \\
&\quad < \Big| \sum_{k=1}^n f(\sigma_k)g'(\sigma_k)(\tau_k - \tau_{k-1}) - \sum_{k=1}^n f(\sigma_k)\big(g(\tau_k) - g(\tau_{k-1})\big) \Big| + \varepsilon/2 \\
&\quad \le \sum_{k=1}^n |f(\sigma_k)|\,\big| g'(\sigma_k)(\tau_k - \tau_{k-1}) - \big(g(\tau_k) - g(\tau_{k-1})\big) \big| + \varepsilon/2 \\
&\quad \le \|f\|_\infty \sum_{k=1}^n \big| g'(\sigma_k)(\tau_k - \sigma_k) + g'(\sigma_k)(\sigma_k - \tau_{k-1}) - \big(g(\tau_k) - g(\sigma_k)\big) + \big(g(\tau_{k-1}) - g(\sigma_k)\big) \big| + \varepsilon/2 \\
&\quad \le \|f\|_\infty \sum_{k=1}^n \Big( \frac{\varepsilon}{2(b-a)\|f\|_\infty + 1}\,|\tau_k - \sigma_k| + \frac{\varepsilon}{2(b-a)\|f\|_\infty + 1}\,|\tau_{k-1} - \sigma_k| \Big) + \varepsilon/2 \\
&\quad \le \frac{\varepsilon}{2(b-a)} \sum_{k=1}^n \big( \tau_k - \sigma_k + \sigma_k - \tau_{k-1} \big) + \varepsilon/2 \le \frac{\varepsilon}{2(b-a)}\,(b-a) + \varepsilon/2 = \varepsilon.
\end{align*}
So $\int_a^b f(t)\,dg(t) = \int_a^b f(t)g'(t)\,dt$. $\Box$

Integration by parts. Another important theorem about the Stieltjes integral is the integration by parts formula.

Theorem 4.6. Let $f, g : [a,b] \to \mathbb{R}$. If $f$ is Stieltjes integrable with respect to $g$, then $g$ is Stieltjes integrable with respect to $f$ and
\[
\int_a^b f(t)\,dg(t) = -\int_a^b g(t)\,df(t) + f(b)g(b) - f(a)g(a).
\]
Proof. Let $\varepsilon > 0$. Since we know that $\int_a^b f(t)\,dg(t)$ exists, we can choose a partition $a = t_0 \le t_1 \le \cdots \le t_n = b$ such that for every finer partition $a = \tau_0 \le \tau_1 \le \cdots \le \tau_m = b$ and every $\sigma_k \in [\tau_{k-1}, \tau_k]$ we have
\[
\Big| \sum_{k=1}^m f(\sigma_k)\big(g(\tau_k) - g(\tau_{k-1})\big) - \int_a^b f(t)\,dg(t) \Big| < \varepsilon.
\]

In order to show that $\int_a^b g(t)\,df(t)$ exists and equals $-\int_a^b f(t)\,dg(t) + f(b)g(b) - f(a)g(a)$, we consider the partition $a = t_0 \le t_0 \le t_1 \le t_1 \le t_2 \le t_2 \le \cdots \le t_n \le t_n = b$ (on purpose!). Let $a = \theta_0 \le \theta_1 \le \cdots \le \theta_M = b$ be a finer partition and let $\zeta_k \in [\theta_{k-1}, \theta_k]$. Due to the way we have chosen the partition with double $t_i$'s, each $t_i$ will occur as one of the $\zeta_k$. That is, $a = \zeta_1 \le \zeta_2 \le \cdots \le \zeta_M = b$ is a partition of $[a,b]$ which is finer than $a = t_0 \le t_1 \le t_2 \le \cdots \le t_n = b$. Moreover, $\theta_{k-1} \le \zeta_k \le \theta_k \le \zeta_{k+1}$, so $\theta_k \in [\zeta_k, \zeta_{k+1}]$ for $k = 0, \ldots, M-1$, where we set $\zeta_0 := a$. Hence
\[
\Big| \sum_{k=0}^{M-1} f(\theta_k)\big(g(\zeta_{k+1}) - g(\zeta_k)\big) - \int_a^b f(t)\,dg(t) \Big| < \varepsilon.
\]
Further,
\begin{align*}
\sum_{k=1}^{M} g(\zeta_k)\big(f(\theta_k) - f(\theta_{k-1})\big)
&= \sum_{k=1}^{M} g(\zeta_k)f(\theta_k) - \sum_{k=1}^{M} g(\zeta_k)f(\theta_{k-1}) \\
&= \sum_{k=1}^{M} g(\zeta_k)f(\theta_k) - \sum_{k=0}^{M-1} g(\zeta_{k+1})f(\theta_k) \\
&= g(\zeta_M)f(\theta_M) - g(\zeta_0)f(\theta_0) - \sum_{k=0}^{M-1} f(\theta_k)\big(g(\zeta_{k+1}) - g(\zeta_k)\big) \\
&= f(b)g(b) - f(a)g(a) - \sum_{k=0}^{M-1} f(\theta_k)\big(g(\zeta_{k+1}) - g(\zeta_k)\big).
\end{align*}
So
\[
\Big| \sum_{k=1}^{M} g(\zeta_k)\big(f(\theta_k) - f(\theta_{k-1})\big) - \Big( -\int_a^b f(t)\,dg(t) + f(b)g(b) - f(a)g(a) \Big) \Big|
= \Big| \int_a^b f(t)\,dg(t) - \sum_{k=0}^{M-1} f(\theta_k)\big(g(\zeta_{k+1}) - g(\zeta_k)\big) \Big| < \varepsilon.
\]
Hence $g$ is Stieltjes integrable with respect to $f$ and
\[
\int_a^b f(t)\,dg(t) = -\int_a^b g(t)\,df(t) + f(b)g(b) - f(a)g(a). \qquad \Box
\]
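Theorems 4.5 and 4.6 can both be illustrated with a quick numerical experiment (our own toy functions and grid; a sketch, not part of the notes): for $f(t) = t^2$ and $g = \sin$ on $[0,1]$, a fine Riemann-Stieltjes sum for $\int f\,dg$ should match both $\int f(t)g'(t)\,dt$ and $-\int g\,df + f(b)g(b) - f(a)g(a)$.

```python
import math

# Toy numerical check of Theorem 4.5 (int f dg = int f g' dt) and of the
# integration by parts formula of Theorem 4.6.  Functions and grid are ours.

def stieltjes_sum(f, g, points):
    """Riemann-Stieltjes sum using midpoints as intermediate points."""
    return sum(f((t0 + t1) / 2) * (g(t1) - g(t0))
               for t0, t1 in zip(points[:-1], points[1:]))

a, b, n = 0.0, 1.0, 20_000
pts = [a + (b - a) * k / n for k in range(n + 1)]

f = lambda t: t * t              # continuous f
g, g_prime = math.sin, math.cos  # continuously differentiable integrator

f_dg = stieltjes_sum(f, g, pts)
f_gprime = sum(f((t0 + t1) / 2) * g_prime((t0 + t1) / 2) * (t1 - t0)
               for t0, t1 in zip(pts[:-1], pts[1:]))           # Theorem 4.5
parts = -stieltjes_sum(g, f, pts) + f(b) * g(b) - f(a) * g(a)  # Theorem 4.6
print(f_dg, f_gprime, parts)  # three nearly equal numbers
```

All three quantities approximate $\int_0^1 t^2 \cos t\,dt$; the midpoint choice of intermediate points merely accelerates the convergence.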

Remark 4.7. The definition given here of Stieltjes integrability is not the original one, which was somewhat more restrictive. Let $f, g : [a,b] \to \mathbb{R}$. The original definition of "$f$ is Stieltjes integrable with respect to $g$" would read:

There exists an $I \in \mathbb{R}$ such that for every $\varepsilon > 0$ there exists a $\delta > 0$ such that for every partition $a = t_0 \le t_1 \le \cdots \le t_n = b$ with $\operatorname{mesh}(t) < \delta$ and every choice of $s_k \in [t_{k-1}, t_k]$ one has
\[
\Big| \sum_{k=1}^n f(s_k)\big(g(t_k) - g(t_{k-1})\big) - I \Big| < \varepsilon.
\]

The difference is that in this definition every partition of small enough mesh has to be considered, whereas in our definition we only need to consider partitions that are refinements of a given partition. The latter enables us to include some critical points in each of the partitions.

Let us consider an example. Let $f = \mathbf{1}_{[0,1/2]}$ and $g = \mathbf{1}_{[1/2,1]}$. As in Example 4.4 we can show for any partition of which $1/2$ is not one of the points that the Riemann-Stieltjes sum can be $1$ or $0$, depending on the choice of intermediate points. Therefore, $f$ and $g$ do not satisfy the original definition stated above. However, for every partition $0 = \tau_0 \le \tau_1 \le \cdots \le \tau_n = 1$ finer than the partition $\{0, 1/2, 1\}$ and any choice of intermediate points $\sigma_k \in [\tau_{k-1}, \tau_k]$, we have
\[
\sum_{k=1}^n f(\sigma_k)\big(g(\tau_k) - g(\tau_{k-1})\big) = 1.
\]
So $f$ is Stieltjes integrable with respect to $g$ according to our definition.

More generally, it can be shown that every càglàd function $f$ is Stieltjes integrable with respect to any càdlàg function $g$. Here càglàd stands for the French phrase "continu à gauche, limites à droite", which means left continuous with right limits. It means that $f$ is such that at every point $t$, both the left and right limits exist and that $f(t)$ equals the left limit. Similarly, càdlàg means right continuous with left limits. This situation, although it seems somewhat technical, is of great importance in the theory of stochastic processes, in particular in integration with respect to semimartingales.

Connection with Lebesgue integral. Let $g : [a,b] \to \mathbb{R}$ be increasing. Then $\mu((s,t]) := g(t) - g(s)$ extends to a measure $\mu$ on the Borel sets of $[a,b]$. If $f$ is continuous, then $f$ is Lebesgue integrable with respect to the bounded measure $\mu$. We will show that
\[
\int_a^b f(t)\,dg(t) = \int_{[a,b]} f(t)\,d\mu(t).
\]
Let $N \in \mathbb{N}$. Choose a partition $a = t_0^N \le t_1^N \le \cdots \le t_{n_N}^N = b$ such that $\operatorname{mesh}(t^N) < 1/N$ and
\[
\Big| \sum_{k=1}^{n_N} f(s_k)\big(g(t_k^N) - g(t_{k-1}^N)\big) - \int_a^b f(t)\,dg(t) \Big| < \frac{1}{N}
\]
for any choice of $s_k \in [t_{k-1}^N, t_k^N]$. Choose any $s_k^N \in [t_{k-1}^N, t_k^N]$ and let
\[
h_N := \sum_{k=1}^{n_N} f(s_k^N)\,\mathbf{1}_{(t_{k-1}^N, t_k^N]}.
\]
Then
\[
\int h_N\,d\mu = \sum_{k=1}^{n_N} f(s_k^N)\,\mu\big((t_{k-1}^N, t_k^N]\big) = \sum_{k=1}^{n_N} f(s_k^N)\big(g(t_k^N) - g(t_{k-1}^N)\big).
\]
Since $f$ is continuous, $h_N(t) \to f(t)$ as $N \to \infty$ for every $t \in (a,b]$ (and $\mu(\{a\}) = 0$). Further, $|h_N| \le \|f\|_\infty$ on $[a,b]$. Due to Lebesgue's dominated convergence theorem we find as $N \to \infty$
\[
\int h_N(t)\,d\mu(t) \to \int f\,d\mu.
\]
Also,
\[
\int h_N(t)\,d\mu(t) = \sum_{k=1}^{n_N} f(s_k^N)\big(g(t_k^N) - g(t_{k-1}^N)\big) \to \int_a^b f(t)\,dg(t).
\]
Hence the two limits coincide, which proves the asserted equality.

Recall that a measure $\mu$ on $[a,b]$ is called absolutely continuous with respect to the Lebesgue measure $\lambda$ if $\lambda(A) = 0$ implies $\mu(A) = 0$ for every Borel set $A \subseteq [a,b]$.

Theorem 4.8. Let $g : [a,b] \to \mathbb{R}$ be increasing. Let $\mu$ be a measure on the Borel sets of $[a,b]$ such that $\mu((s,t]) = g(t) - g(s)$ for all $a \le s < t \le b$. Then
\[
\mu \text{ is absolutely continuous with respect to } \lambda \iff g \text{ is absolutely continuous.}
\]
Proof. ($\Leftarrow$) Let $A$ be a Borel set with $\lambda(A) = 0$. Let $\varepsilon > 0$. We want to show that $\mu(A) \le \varepsilon$. Since $g$ is absolutely continuous, we can choose a $\delta > 0$ such that

\[
\sum_{k=1}^n |g(b_k) - g(a_k)| < \varepsilon
\]
for every collection of mutually disjoint intervals $(a_k, b_k)$, $k = 1, \ldots, n$, with $\sum_{k=1}^n (b_k - a_k) < \delta$. Since $\lambda(A) = 0$, there are mutually disjoint open intervals $(a_i, b_i)$, $i \in \mathbb{N}$, such that $A \subseteq \bigcup_{i=1}^\infty (a_i, b_i)$ and $\sum_{i=1}^\infty (b_i - a_i) < \delta$. Then $\sum_{i=1}^n (b_i - a_i) < \delta$ for each $n$, so $\sum_{i=1}^n |g(b_i) - g(a_i)| < \varepsilon$ for every $n$. Hence $\sum_{i=1}^\infty |g(b_i) - g(a_i)| \le \varepsilon$. We infer
\[
\mu(A) \le \mu\Big( \bigcup_{i=1}^\infty (a_i, b_i) \Big) = \sum_{i=1}^\infty \mu((a_i, b_i)) \le \sum_{i=1}^\infty \mu((a_i, b_i]) = \sum_{i=1}^\infty \big(g(b_i) - g(a_i)\big) \le \varepsilon.
\]
Thus $\mu(A) = 0$.

($\Rightarrow$) Assume that $\mu$ is absolutely continuous with respect to $\lambda$. Due to the Radon-Nikodym theorem there exists an integrable function $h : [a,b] \to \mathbb{R}$ with $h \ge 0$ such that
\[
\mu(A) = \int \mathbf{1}_A(s)\,h(s)\,d\lambda(s)
\]
for all Borel sets $A$. If we use this for $A = (a,t]$, we find

\[
g(t) - g(a) = \mu((a,t]) = \int_{(a,t]} h(s)\,d\lambda(s).
\]
Hence
\[
g(t) = g(a) + \int_a^t h(s)\,ds \quad \text{for all } t \in [a,b],
\]
which yields that $g$ is absolutely continuous. $\Box$

4.2 Langevin's equation

Consider the movement of a particle of mass $m$ along a line, with velocity $v(t)$ at time $t$. Due to friction there is a force $-av(t)$ acting on the particle, where $a \ge 0$ is a constant. Suppose that many small influences in the environment of the particle are acting on the particle, resulting in a force which is erratic and very irregular. We will call this force "noise". The velocity of the particle then satisfies
\[
m v'(t) = -a v(t) + \text{``noise''}.
\]
By means of rescaling, we can rewrite the equation as
\[
y'(t) = -a y(t) + \text{``noise''}. \tag{10}
\]
How to model noise? Let us first consider the case $a = 0$. We conceive of the noise as a force which is so erratic that it cannot be modeled deterministically. We think of tossing a coin at each time $t$ and a force hitting the particle in positive direction if the outcome is heads and in negative direction if the outcome is tails. For a good mathematical model we consider a discretization of the differential equation
\[
y'(t) = \text{``noise''}. \tag{11}
\]
Fix $t > 0$ and $n \in \mathbb{N}$. Let $X_1, \ldots, X_n$ be mutually independent random variables such that $X_k = 1$ with probability $1/2$ and $X_k = -1$ with probability $1/2$. We let $t_k = \frac{k}{n}t$ and consider the discretized equation

\[
\frac{y(t_k) - y(t_{k-1})}{t_k - t_{k-1}} = b X_k, \qquad k = 1, \ldots, n,
\]
where $b$ is a constant. Then
\[
y(t_k) = y(t_{k-1}) + b(t_k - t_{k-1})X_k = y(t_{k-1}) + b\,\frac{t}{n}\,X_k,
\]
so
\[
y(t) = y(t_n) = y(0) + b\,\frac{t}{n} \sum_{k=1}^n X_k.
\]
We do this for each $n$:
\[
y_n(t) = y(0) + b_n\,\frac{t}{n} \sum_{k=1}^n X_{n,k}.
\]
We are looking for a limit as $n \to \infty$. If $b_n$ is too small, the random influence may converge to $0$. If $b_n$ is too big, the sums may not converge. The Central Limit Theorem says that $\sqrt{t}\,\frac{1}{\sqrt{n}} \sum_{k=1}^n X_{n,k}$ converges in law to a random variable $W(t)$ with normal (Gaussian) law with mean $0$ and variance $t$. Therefore we choose $b_n = \sqrt{n/t}$, so that $b_n \frac{t}{n} = \sqrt{t/n}$. The limiting random variables $W(t)$ can be chosen in such a way that the dependence on $t$ is continuous, as is stated in the next theorem. Its proof is difficult and beyond these notes.
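The scaling limit can be watched in a simulation. In this sketch (sample sizes are our own choices, not from the notes) a sum of $n$ independent $\pm 1$ coin flips is scaled by $\sqrt{t/n}$, which makes its variance exactly $t$; by the Central Limit Theorem its law approaches that of $W(t)$.

```python
import random
import statistics

# Simulation sketch of the scaling limit: a coin-flip sum scaled by
# sqrt(t/n) should be approximately N(0, t) for large n.  The sample
# sizes below are illustrative choices.

random.seed(0)

def scaled_walk_endpoint(t, n):
    """sqrt(t/n) times a sum of n independent +/-1 coin flips."""
    steps = sum(random.choice((-1, 1)) for _ in range(n))
    return (t / n) ** 0.5 * steps

t, n, samples = 1.0, 100, 20_000
endpoints = [scaled_walk_endpoint(t, n) for _ in range(samples)]
print(statistics.mean(endpoints), statistics.pvariance(endpoints))
# empirical mean near 0, empirical variance near t = 1
```

Repeating this for a mesh of times $t$ (and summing increments) produces the discrete approximations of the Wiener process used below.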

Theorem 4.9. There exists a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and random variables $W(t)$, $t \ge 0$, such that

• $W_0 = 0$ a.e. on $\Omega$,
• $W_t$ has normal law with mean $0$ and variance $t$,
• $W_t - W_s$ is independent of $W_r$ for all $0 \le r \le s \le t$,
• $t \mapsto W_t(\omega)$ is continuous for a.e. $\omega \in \Omega$.

The family of random variables $W_t$, $t \ge 0$, of the above theorem is called a Brownian motion or Wiener process.

Recall that a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ is a set $\Omega$, a $\sigma$-algebra $\mathcal{F}$, and a measure $\mathbb{P}$ on $\mathcal{F}$ with $\mathbb{P}(\Omega) = 1$. A random variable is a map $X : \Omega \to \mathbb{R}$ that is measurable with respect to $\mathcal{F}$ and the Borel $\sigma$-algebra in $\mathbb{R}$. The law or distribution of $X$ is the measure $\mu_X$ on $\mathbb{R}$ satisfying
\[
\mu_X(A) = \mathbb{P}(X^{-1}(A)) = \mathbb{P}(\{\omega \in \Omega : X(\omega) \in A\}),
\]
where $A \subseteq \mathbb{R}$ is a Borel set. In words, $\mu_X(A)$ is the probability that $X$ has values in $A$. There can be different random variables (even defined on different probability spaces) with the same law. If $X$ and $Y$ are random variables defined on $\Omega$, then their joint law is the measure $\mu_{(X,Y)}$ on $\mathbb{R}^2$ defined by
\[
\mu_{(X,Y)}(C) = \mathbb{P}(\{\omega \in \Omega : (X(\omega), Y(\omega)) \in C\}),
\]
$C \subseteq \mathbb{R}^2$ Borel. The random variables $X$ and $Y$ are independent if $\mu_{(X,Y)}$ equals the product measure $\mu_X \otimes \mu_Y$. Equivalently, if
\[
\mathbb{P}((X,Y) \in A \times B) = \mathbb{P}(X \in A)\,\mathbb{P}(Y \in B)
\]
for all Borel sets $A \subseteq \mathbb{R}$, $B \subseteq \mathbb{R}$. A law $\mu$ is called normal or Gaussian with mean $m \in \mathbb{R}$ and variance $\sigma^2 \ge 0$ if for every Borel set $A \subseteq \mathbb{R}$,
\[
\mu(A) = \frac{1}{\sqrt{2\pi}\,\sigma} \int_A e^{-\frac{1}{2}\frac{(x-m)^2}{\sigma^2}}\,dx.
\]

By considering limits of discretizations we have found a mathematical model for noise. As a solution of (11) we found $y(t) = y(0) + W_t$, so
\[
\text{``noise''} = y'(t) = \frac{d}{dt} W_t.
\]
Langevin's equation is therefore
\[
y'(t) = -a y(t) + \sigma \frac{d}{dt} W_t, \tag{12}
\]
where $\sigma \in \mathbb{R}$ is some auxiliary parameter. More explicitly, $W_t$ depends on the variable $\omega \in \Omega$, so also $y(t)$ will depend on $\omega$. We thus have for each $\omega \in \Omega$ the differential equation
\[
y'(t)(\omega) = -a y(t)(\omega) + \sigma \frac{d}{dt} W_t(\omega).
\]
As usual in probability theory we often do not write the dependence on $\omega$ explicitly.

There is a mathematical problem: the map $t \mapsto W_t(\omega)$ is continuous for almost every $\omega$, but it is not differentiable, and even nowhere differentiable. So $\frac{d}{dt}W_t(\omega)$ does not exist. (The proof of this fact is far from easy and we omit it.) There is a simple way to avoid this problem: we formulate Langevin's equation as an integral equation,
\[
y(T) = y(0) - \int_0^T a y(t)\,dt + \sigma W_T, \qquad T \ge 0. \tag{13}
\]

Let us next solve equation (12), or rather its correct formulation (13). The homogeneous equation
\[
y'(t) = -a y(t)
\]
has solution
\[
y(t) = c e^{-at}, \qquad t \ge 0.
\]
Due to the variation-of-constants formula, the solution of the inhomogeneous equation
\[
y'(t) = -a y(t) + f(t), \qquad t \ge 0,
\]
is given by
\[
y(t) = c e^{-at} + \int_0^t e^{-a(t-s)} f(s)\,ds,
\]
where $c = y(0)$ and $f$ is a continuous function. Formally, the solution of (12) would be
\begin{align*}
y(t) &= y(0)e^{-at} + \int_0^t e^{-a(t-s)}\,\sigma \frac{d}{ds} W_s\,ds \\
&= y(0)e^{-at} + \sigma \int_0^t e^{-a(t-s)}\,dW_s.
\end{align*}
Be aware that $\frac{d}{ds}W_s$ does not exist. The latter expression, however, makes sense as a Stieltjes integral. Indeed, $s \mapsto e^{-a(t-s)}$ is increasing, hence of bounded variation, and $W_s$ is continuous. Hence
\[
\int_0^t e^{-a(t-s)}\,dW_s(\omega)
\]
exists for almost every $\omega \in \Omega$ as a Stieltjes integral. These formal manipulations do yield the correct solution.

Theorem 4.10. Let $a, \sigma \in \mathbb{R}$. Let $W_t$, $t \ge 0$, be a Wiener process defined on $(\Omega, \mathcal{F}, \mathbb{P})$. Let $y_0 \in \mathbb{R}$. Define
\[
y(t) = e^{-at} y_0 + \sigma \int_0^t e^{-a(t-s)}\,dW_s
\]
(for almost every $\omega \in \Omega$ as a Stieltjes integral). Then $y$ satisfies the Langevin equation
\[
y(t) = y_0 - a \int_0^t y(s)\,ds + \sigma W_t.
\]

Proof. By integration by parts,
\begin{align*}
y(t) &= e^{-at} y_0 + \sigma \int_0^t e^{-a(t-s)}\,dW_s \\
&= e^{-at} y_0 + \sigma e^{-at} \Big( e^{as} W_s \big|_0^t - \int_0^t W_s\,de^{as} \Big) \\
&= e^{-at} y_0 + \sigma e^{-at} \Big( e^{at} W_t - \int_0^t a W_s e^{as}\,ds \Big) \\
&= e^{-at} y_0 + \sigma W_t - \sigma \int_0^t a e^{-a(t-s)} W_s\,ds.
\end{align*}
Hence the above expression for $y$ and Fubini's theorem yield, for $T \ge 0$,
\begin{align*}
a \int_0^T y(t)\,dt
&= a \int_0^T e^{-at} y_0\,dt + a\sigma \int_0^T W_t\,dt - a\sigma \int_0^T \int_0^t a e^{-a(t-s)} W_s\,ds\,dt \\
&= y_0 \big(1 - e^{-aT}\big) + a\sigma \int_0^T W_t\,dt - a\sigma \int_0^T \Big( \int_s^T a e^{-a(t-s)}\,dt \Big) W_s\,ds \\
&= y_0 \big(1 - e^{-aT}\big) + a\sigma \int_0^T W_t\,dt - a\sigma \int_0^T \big( 1 - e^{-a(T-s)} \big) W_s\,ds \\
&= y_0 \big(1 - e^{-aT}\big) + a\sigma \int_0^T e^{-a(T-s)} W_s\,ds \\
&= y_0 \big(1 - e^{-aT}\big) + e^{-aT} y_0 + \sigma W_T - y(T) \\
&= y_0 + \sigma W_T - y(T),
\end{align*}
where in the next-to-last step we used the expression for $y(T)$ obtained above. Thus $y(T) = y_0 - a \int_0^T y(t)\,dt + \sigma W_T$, i.e., $y$ satisfies the Langevin equation. $\Box$

Concerning uniqueness we have the following.

Proposition 4.11. Let $a, \sigma \in \mathbb{R}$. Let $W_t$, $t \ge 0$, be a Wiener process and let $\Omega_0 \subseteq \Omega$ be such that $\mathbb{P}(\Omega_0) = 1$ and $t \mapsto W_t(\omega)$ is continuous for all $\omega \in \Omega_0$. If $y_1(t)$ and $y_2(t)$, $t \ge 0$, are two families of random variables on $(\Omega, \mathcal{F}, \mathbb{P})$ such that $t \mapsto y_1(t)(\omega)$ and $t \mapsto y_2(t)(\omega)$ are continuous for all $\omega \in \Omega_0$ and such that $y_1$ and $y_2$ both satisfy (13), then
\[
|y_1(t) - y_2(t)| \le e^{-at} |y_1(0) - y_2(0)|, \qquad t \ge 0.
\]
Proof. Let
\[
v(t) := y_1(t) - y_2(t), \qquad t \ge 0.
\]
Then $v$ satisfies
\[
v(t) = v(0) - a \int_0^t v(s)\,ds.
\]
On $\Omega_0$, $v$ is continuous and then, by the previous formula, $v$ is even continuously differentiable. Differentiating yields
\[
v'(t) = -a v(t),
\]
so
\[
v(t) = v(0) e^{-at},
\]
which yields the desired formula. $\Box$

Consequently, if $y_1(0) = y_2(0)$, then $y_1 = y_2$.

Law of the solution. So far we have considered $\omega \in \Omega$ as a parameter and solved the Langevin equation for each $\omega$ separately. We obtained for almost every $\omega$ a function $t \mapsto y(t)(\omega)$. Next, we fix $t$. Then $y(t)$ is an almost everywhere defined map from $\Omega$ to $\mathbb{R}$. If necessary we can assume it everywhere defined by choosing it zero on those $\omega$ where it is not defined. It can be shown that the map $y(t)$ is then measurable. Thus for fixed $t$, $y(t)$ is a random variable. What is its law? We know that
\[
y(t) = e^{-at} y_0 + \sigma \int_0^t e^{-a(t-s)}\,dW_s.
\]
By definition of the Stieltjes integral, $y(t)$ is a limit of sums of the form
\[
\mathcal{S}(s_1, \ldots, s_n) = e^{-at} y_0 + \sigma \sum_{k=1}^n e^{-a(t-s_k)} \big( W_{s_k} - W_{s_{k-1}} \big).
\]
From the properties of the Wiener process it follows that the $W_{s_k} - W_{s_{k-1}}$, $k = 1, \ldots, n$, are mutually independent normal random variables with means $0$ and variances $s_k - s_{k-1}$. Hence $\mathcal{S}(s_1, \ldots, s_n)$ has normal law with mean $e^{-at} y_0$ and variance $\sigma^2 \sum_{k=1}^n e^{-2a(t-s_k)}(s_k - s_{k-1})$. It follows that the random variables converge in law to a normal random variable with mean $e^{-at} y_0$ and variance
\[
\int_0^t \sigma^2 e^{-2a(t-s)}\,ds = \frac{\sigma^2}{2a}\big( 1 - e^{-2at} \big).
\]
Hence $y(t)$ is normal with mean $e^{-at} y_0$ and variance $\frac{\sigma^2}{2a}(1 - e^{-2at})$. The details of the above arguments need more probability theory and we omit them. We summarize the conclusion.

Proposition 4.12. The solution $y(t)$ at time $t$ of the Langevin equation
\[
y(t) = y_0 - a \int_0^t y(s)\,ds + \sigma W_t, \qquad t \ge 0,
\]
has normal law with mean $e^{-at} y_0$ and variance $\frac{\sigma^2}{2a}(1 - e^{-2at})$.

We could also take the initial value $y_0$ to be a random variable. If we take it independent of $W_t$ for all $t$ and with normal law with mean $0$ and variance $\frac{\sigma^2}{2a}$, then it turns out that $y(t)$ has at each time $t$ mean $0$ and variance $\frac{\sigma^2}{2a}$. Thus, the random variables $y(t)$ vary in time, but their laws are constant. Such a solution is called a stationary solution. In case of the above laws, the solution is called an Ornstein-Uhlenbeck process.
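Proposition 4.12 can be probed with a simulation sketch. The Euler-Maruyama discretization used below is a standard scheme but is not discussed in the notes, and all parameter values are our own choices; the empirical mean and variance of many simulated paths at time $t$ should be close to $e^{-at}y_0$ and $\frac{\sigma^2}{2a}(1 - e^{-2at})$.

```python
import math
import random
import statistics

# Euler-Maruyama sketch for the Langevin equation:
#   y_{k+1} = y_k - a * y_k * dt + sigma * dW,   dW ~ N(0, dt).
# Parameters are illustrative.  The empirical law at time t_end is compared
# with Proposition 4.12: mean e^{-at} y0, variance sigma^2/(2a) (1 - e^{-2at}).

random.seed(1)
a, sigma, y0 = 1.0, 0.8, 2.0
t_end, n_steps, n_paths = 1.0, 200, 5_000
dt = t_end / n_steps

finals = []
for _ in range(n_paths):
    y = y0
    for _ in range(n_steps):
        y += -a * y * dt + sigma * random.gauss(0.0, math.sqrt(dt))
    finals.append(y)

mean_th = math.exp(-a * t_end) * y0
var_th = sigma**2 / (2 * a) * (1 - math.exp(-2 * a * t_end))
print(statistics.mean(finals), mean_th)       # empirical vs theoretical mean
print(statistics.pvariance(finals), var_th)   # empirical vs theoretical variance
```

Drawing the initial value from $N(0, \sigma^2/2a)$ instead would produce the stationary Ornstein-Uhlenbeck behaviour: the empirical variance then stays near $\sigma^2/2a$ at all times.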

4.3 Wiener integral

Let $W_t$, $t \ge 0$, be a Wiener process defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. Since $s \mapsto e^{-a(t-s)}$ is of bounded variation, we can define
\[
\int_0^t e^{-a(t-s)}\,dW_s
\]
as a Stieltjes integral, almost everywhere on $\Omega$. In the same way we can define
\[
\int_0^t f(s)\,dW_s
\]
for any $f : [0,t] \to \mathbb{R}$ of bounded variation. As before, the corresponding Riemann-Stieltjes sums
\[
\sum_{k=1}^n f(s_k)\big( W_{s_k} - W_{s_{k-1}} \big)
\]
have mean $0$ and variance
\[
\sum_{k=1}^n f(s_k)^2 (s_k - s_{k-1}).
\]
In other words,

\[
\mathbb{E}\Big( \sum_{k=1}^n f(s_k)\big( W_{s_k} - W_{s_{k-1}} \big) \Big)^2 = \sum_{k=1}^n f(s_k)^2 (s_k - s_{k-1}),
\]
where $\mathbb{E}$ denotes expectation: $\mathbb{E}X = \int_\Omega X(\omega)\,d\mathbb{P}(\omega)$. It can be shown that taking limits leads to
\[
\mathbb{E}\Big( \int_0^t f(s)\,dW_s \Big)^2 = \int_0^t f(s)^2\,ds.
\]
This can be restated as
\[
\Big\| \int_0^t f(s)\,dW_s \Big\|_{L^2(\Omega, \mathbb{P})} = \|f\|_{L^2[0,t]}.
\]
This identity is called the Ito isometry. The map $J : f \mapsto \int_0^t f(s)\,dW_s$ from $BV[0,t]$ to $L^2(\Omega, \mathbb{P})$ is hence an isometry with respect to $\|\cdot\|_{L^2[0,t]}$ on $BV[0,t]$. $J$ is also linear. Since $BV[0,t]$ is dense in $(L^2[0,t], \|\cdot\|_{L^2[0,t]})$, there is a unique bounded linear map $\hat J : L^2[0,t] \to L^2(\Omega, \mathbb{P})$ that extends $J$. Moreover, this map $\hat J$ is an isometry. We denote $\hat J$ by an integral sign:
\[
\int_0^t f(s)\,dW_s := \hat J(f).
\]
Thus we can "integrate" any $L^2$-function $f : [0,t] \to \mathbb{R}$ with respect to the Wiener process $W_s$. This integral is called the Wiener integral. The definition of integration with respect to $W_s$ can still be extended further, for instance to functions $f$ that are themselves random. Then we arrive at a stochastic integral called the Ito integral.
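The Ito isometry can be tested by Monte Carlo (a sketch with our own choices of $f$ and sample sizes). For the discrete sums the identity $\mathbb{E}\big(\sum_k f(s_k)\Delta W_k\big)^2 = \sum_k f(s_k)^2 \Delta s_k$ is exact, so the empirical second moment should match the discrete $L^2$ norm up to sampling error.

```python
import math
import random

# Monte Carlo sketch of the Ito isometry for the Wiener integral:
#   E (int_0^t f dW)^2 = int_0^t f(s)^2 ds,     here with f = cos on [0, 1].
# Increments W_{s_k} - W_{s_{k-1}} are simulated as independent N(0, ds).

random.seed(2)
t, n_steps, n_paths = 1.0, 200, 10_000
ds = t / n_steps
f = math.cos

second_moment = 0.0
for _ in range(n_paths):
    integral = sum(f(k * ds) * random.gauss(0.0, math.sqrt(ds))
                   for k in range(n_steps))
    second_moment += integral ** 2
second_moment /= n_paths

l2_norm_sq = sum(f(k * ds) ** 2 * ds for k in range(n_steps))
print(second_moment, l2_norm_sq)  # nearly equal
```

The same experiment with a rougher $f \in L^2[0,t] \setminus BV[0,t]$ still works, which reflects the extension by density described above.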

5 Convex functions
