
Mathematical Finance, Spring 2021

Dario Gasbarra1

March 26, 2021

University of Helsinki, Master Program in Mathematics and Statistics, MAST31801

My Thesis, paradoxically and a little provocatively, but nonetheless genuinely, is simply this:

PROBABILITY DOES NOT EXIST.

The abandonment of superstitious beliefs about the existence of Phlogiston, the Cosmic Ether, Absolute Space and Time, ..., or Fairies and Witches was an essential step along the road to scientific thinking. Probability too, if regarded as something endowed with some kind of objective existence, is no less a misleading misconception, an illusory attempt to exteriorize or materialize our true probabilistic beliefs.

Bruno De Finetti, Theory of Probability (1974).

Contents

1 Probability = Price 5
  1.0.1 Expectation 9
  1.1 Convexity and separating hyperplane theorem 11
    1.1.1 First fundamental Theorem from Farkas lemma 15
    1.1.2 Separation theorem in infinite dimension 20
  1.2 Kolmogorov axioms of probability 20
    1.2.1 Lebesgue integral and expectation 22
  1.3 Arbitrage in one period market model 25
    1.3.1 Discounting 26
  1.4 Geometric Characterization of Arbitrage Free Markets 30
  1.5 First Fundamental Theorem in Multiperiod models 33
  1.6 Change of numeraire 40
  1.7 Option pricing 41
    1.7.1 Intermezzo: The Gaussian integration by parts formula 43
    1.7.2 Black and Scholes european call option pricing formula for the lognormal stock price 44
    1.7.3 Hedging and superhedging contingent claims 47
  1.8 Convex functions 54
    1.8.1 Replicating european options using call options 56
    1.8.2 Leverage effect of option investments 57

2 Martingale representation and hedging 61
  2.1 Some basic facts from martingale theory 61
    2.1.1 Conditional Expectation and Martingales 61
    2.1.2 Square integrable martingales and predictable bracket 66
    2.1.3 Orthogonal projections in the space of square integrable martingales 67
    2.1.4 Martingale property and change of measure 68
    2.1.5 Doob decomposition and change of measure 69
    2.1.6 Stopping times and localization 71


    2.1.7 Martingale Predictable Representation Property 73
  2.2 Uniform Doob decomposition 76

3 Option Pricing and Hedging in the Cox-Ross-Rubinstein Binomial market model 81

4 American options in discrete time 91
  4.1 Optimal Stopping and dynamic programming 91
  4.2 Pricing and hedging american options 93
    4.2.1 Hedging american options 94
    4.2.2 Optimal exercise strategies for the option holder 95
    4.2.3 Relation between american and european call options 95
  4.3 Complements on American options 96
  4.4 Dual representation 97
  4.5 Software tools for computing some option prices in the binomial tree model 98

5 Calculus with respect to functions of finite variation 103

6 Convergence towards time continuous Black & Scholes model 111

7 Quadratic variation and Ito-Föllmer calculus 115
  7.0.1 Ito-Föllmer calculus for random paths 124
  7.0.2 Cross-variation 128
  7.0.3 Pathwise Stratonovich calculus 130

Chapter 1

Probability = Price

Bruno De Finetti (1906-1985) was an Italian mathematician, economist and philosopher. In his philosophy of science the postulate of an absolute probability is abandoned; instead, probabilities have a purely operative meaning: they are always relative to our state of knowledge or ignorance.

In an uncertain world, we classify events into two types C = { certain events } and U = { uncertain events }. When E ∈ C is certain and F ⊇ E, meaning that also F happens when E happens, it follows that also F ∈ C is a certain event. We say that an event E is impossible when its complement or negation Ec (which happens when E does not happen) is a certain event.

We may say that uncertain events which are not impossible are random events. If E and F are events, we denote by (E ∪ F) the event where either E or F or perhaps both of them occur, and by (E ∩ F) the event where both E and F occur.

We denote the class N = { impossible events }, and we say that two events E, F are not compatible, or disjoint, if their joint occurrence is an impossible event, i.e. (E ∩ F) ∈ N. Note also that for any event E, always (E ∪ E^c) ∈ C.

A bookmaker accepts bets from his customers on the occurrence of the disjoint events E1, . . . , En, such that (Ei ∩ Ej) ∈ N (impossible event) for i ≠ j.

The bookmaker chooses the prices of the gambles for the events Ei, their complements Ei^c, and all possible unions (Ei ∪ Ej), (Ei ∪ Ej ∪ Ek), . . . , as

Pr(Ei), Pr(Ei ∪ Ej), Pr(Ei ∪ Ej ∪ Ek), · · · ∈ R.

Remark 1.0.1. The set of prices for all the considered events is called a pricing system. The notation Pr stands both for Prices and for Probabilities.

Gamblers are allowed to choose their gambling strategy and place simultaneously gambles on all these considered events A, with both positive and negative coefficients (for each event both long and short positions are allowed), and the bookmaker has to accept every possible bet at the prices he has set.

Let's introduce the indicator of an event, 1_A, which takes value 1 if A occurs and 0 if A does not occur. Such indicators are random variables, whose value is not specified in advance but depends on the occurrence of the random event A.

A customer gambles on the events E1, . . . , En with a gambling strategy y = (y1, . . . , yn) ∈ R^n of his choice. This means that he pays

y1 Pr(E1) + y2 Pr(E2) + · · · + yn Pr(En)

and receives the value of the portfolio

y1 1_{E1} + y2 1_{E2} + · · · + yn 1_{En},

with profit

V = (1_{E1} − p1) y1 + · · · + (1_{En} − pn) yn,

where pi = Pr(Ei). A negative profit corresponds to a loss.
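To fix ideas, the profit formula can be evaluated directly. The following sketch (not part of the original notes; events, prices and stakes are invented) computes V in each of three mutually exclusive outcomes:

```python
# Profit of a betting strategy: V = sum_i (1_{E_i} - p_i) * y_i.
# Hypothetical numbers for illustration only.

def profit(occurred, prices, strategy):
    """occurred[i] is 1 if event E_i occurred, else 0;
    prices[i] = Pr(E_i); strategy[i] = y_i (negative = short)."""
    return sum((ind - p) * y for ind, p, y in zip(occurred, prices, strategy))

# Three disjoint events with prices summing to 1 (a coherent bookmaker):
prices = [0.5, 0.3, 0.2]
y = [10.0, -5.0, 0.0]               # long E1, short E2

# Profit in each of the three mutually exclusive outcomes:
outcomes = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
profits = [profit(o, prices, y) for o in outcomes]
print(profits)                      # gain if E1 occurs, losses otherwise
```

With the coherent prices above (summing to 1 over the disjoint events), the price-weighted average of the three profits is exactly 0, in line with the theorem that follows.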

Definition 1.0.1. An arbitrage opportunity is a situation where, by following a suitable strategy, the customer makes a profit which is certainly non-negative, and it is possible (i.e. not impossible) that the profit is strictly positive.

If the bookmaker wants to stay in business, he must choose the gambling prices so as not to create arbitrage opportunities for his customers. We show that in order to do so, the bookmaker has to be coherent and use a probability to form the price system.

Theorem 1.0.1. A pricing system free of arbitrage must be a finitely additive probability, with the following properties:

1. The prices of the gambles are unique (law of one price), and the prices of the event indicators 1_E are numbers in the interval [0, 1].

2. If E ∈ C is certain, necessarily Pr(E) = 1, and if E ∈ N is impossible, necessarily Pr(E) = 0.

3. Prices of events are (finitely) additive.

Proof. 1. If the bet 1_E had two prices p1 > p2, a gambler would place an arbitrarily large number x > 0 of bets at the cheaper price p2 and simultaneously place (−x) bets at the price p1. Trivially this is an arbitrage strategy, where the gambler's profit (which is the loss of the bookmaker) is, with certainty and in all situations,

(1_E − p2)x − (1_E − p1)x = (p1 − p2)x > 0.

2. If the event E is certain and the bet 1_E has price p ≠ 1, the gambler placing x bets makes with certainty the profit

(1_E − p)x = (1 − p)x.

When p > 1 (resp. p < 1) the gambler makes with certainty an arbitrarily large profit by taking x < 0 (resp. x > 0). If E is impossible, the gambler's profit is with certainty

(1_E − p)x = −px,

where the gambler chooses x with sign opposite to that of p, arbitrarily large in absolute value.

3. We consider first the situation where we price two disjoint events, n = 2, (E1 ∩ E2) ∈ N, such that their union (E1 ∪ E2) ∈ C is a certain event. If the pricing system is arbitrage free, we should have Pr(E1 ∪ E2) = 1. The linear system

V(E1) = (1 − p1)y1 − p2 y2 = gambler's profit when E1 occurs
V(E2) = −p1 y1 + (1 − p2)y2 = gambler's profit when E2 occurs

has a solution (y1, y2) for any profit vector (V(E1), V(E2)) if and only if the coefficient matrix of the system is invertible,

det [ 1 − p1   −p2
      −p1    1 − p2 ] = 1 − p1 − p2 ≠ 0.

In order to exclude arbitrage possibilities for the gamblers, this determinant has to be zero (if it were invertible, the gambler could achieve a strictly positive profit in every state), equivalently

Pr(E1) + Pr(E2) = 1 = Pr(E1 ∪ E2).

Similarly, for n > 2 disjoint events Ei with (Ei ∩ Ej) ∈ N for i ≠ j, such that their union E1 ∪ E2 ∪ · · · ∪ En ∈ C is a certain event, the price system pi = Pr(Ei), i = 1, . . . , n, is arbitrage-free if and only if

det [ 1 − p1   −p2   . . .   −pn
      −p1    1 − p2  . . .   −pn
      . . .
      −p1     −p2    . . .  1 − pn ] = 1 − p1 − p2 − · · · − pn = 0.

This determinant is computed by induction or by using Sylvester lemma from linear algebra:

Lemma 1.0.1. If A and B are respectively n × m and m × n matrices, and In denotes the n × n identity matrix,

det(In + AB) = det(Im + BA).
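The determinant above can be checked numerically. The following sketch (my own illustration with invented numbers) verifies Sylvester's identity, and then the no-arbitrage determinant: the matrix in the previous display is exactly I − 1pᵀ, with 1 the all-ones column vector, so its determinant is 1 − Σ pi.

```python
import numpy as np

rng = np.random.default_rng(0)

# Sylvester's identity: det(I_n + A B) = det(I_m + B A)
n, m = 5, 2
A = rng.normal(size=(n, m))
B = rng.normal(size=(m, n))
lhs = np.linalg.det(np.eye(n) + A @ B)
rhs = np.linalg.det(np.eye(m) + B @ A)
assert np.isclose(lhs, rhs)

# The no-arbitrage matrix is I - 1 p^T, so its determinant is 1 - sum(p):
p = np.array([0.2, 0.3, 0.1, 0.15])
M = np.eye(len(p)) - np.outer(np.ones(len(p)), p)
assert np.isclose(np.linalg.det(M), 1 - p.sum())
```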

More in general, for two disjoint events (E1 ∩ E2) ∈ N,

E1 ∪ E2 ∪ (E1 ∪ E2)^c ∈ C

implies

1 = Pr(E1) + Pr(E2) + Pr((E1 ∪ E2)^c)
  = Pr(E1) + Pr(E2) + (1 − Pr(E1 ∪ E2)),

equivalently

Pr(E1 ∪ E2) = Pr(E1) + Pr(E2). □

Remark 1.0.2. In reality the prices of the bets offered by bookmakers do not form a probability system; however, the gambling strategies of the customers are restricted to positive coefficients (no short positions are allowed), and usually Pr(E) + Pr(E^c) ≥ 1. In this way, if the bookmaker can match the bets of his customers he will make a profit. Because the bookmakers are competing for customers, they have to keep their share of profit small enough to attract gamblers with the best bet prices. Different bookmakers offer different bet prices, and sometimes a gambler, by choosing the best price for each bet from different bookmakers, can construct an arbitrage opportunity under the positivity constraint.

1.0.1 Expectation

Assume that a bookmaker has priced coherently the events E1, . . . , En (not necessarily disjoint) by using a (finitely) additive probability Pr, and consider the contract X which is a linear combination of the bets

X = Σ_{i=1}^n x_i 1_{E_i} = Σ_{x∈R} x 1_{E_x}

with coefficient vector x = (x1, . . . , xn) ∈ R^n, where

E_x := ∪_{1≤i≤n : x_i = x} E_i = { X takes value x }

is also an event, and E_x = ∅ when x ≠ x_i ∀i. The contract X pays the predetermined amount x_i ∈ R when the event E_i occurs. This contract is replicable by gambling on the events E1, . . . , En using the strategy (x1, . . . , xn). By the law of one price, it follows that the price of the contract is the linear combination of the event prices:

E_Pr(X) := Σ_{i=1}^n x_i Pr(E_i) = Σ_{x∈R} x Pr(E_x).

We say that X is a (simple) random variable and that its expectation is its price E_Pr(X) under the pricing system (probability) Pr. The expectation has the following properties:

Linearity: if a, b are constants and X, Y are random variables,

E_Pr(aX + bY) = a E_Pr(X) + b E_Pr(Y)

Positivity: if the event (X ≥ 0) is certain under the probability Pr (meaning that Pr(X ≥ 0) = 1), then

E_Pr(X) ≥ 0.
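As a concrete check (my own sketch, with invented payoffs and prices), the two expressions for the price of a simple contract agree; the events E_i are assumed disjoint, so that Pr(E_x) is the sum of the prices of the events on which X takes the value x:

```python
from collections import defaultdict

# Two ways to price the simple contract X = sum_i x_i 1_{E_i}
# (disjoint events E_i; numbers invented for illustration).
coeffs = [3.0, -1.0, 3.0, 2.0]     # payoffs x_i
prices = [0.1, 0.2, 0.3, 0.4]      # Pr(E_i)

# Coefficient-wise: sum_i x_i Pr(E_i)
e1 = sum(x * p for x, p in zip(coeffs, prices))

# Grouped by distinct value: sum_x x Pr(E_x), where Pr(E_x) adds up the
# prices of those events on which X takes the value x.
pr_ex = defaultdict(float)
for x, p in zip(coeffs, prices):
    pr_ex[x] += p
e2 = sum(x * p for x, p in pr_ex.items())

print(e1, e2)   # both approximately 1.8
```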

Let

X ⊆ { Σ_{i=1}^n x_i 1_{E_i} : n ∈ N, x_i ∈ R }

be a collection of contracts (random variables), and let c = (c(X) : X ∈ X) be the price system for these contracts.

Proposition 1.0.1 (First fundamental theorem of mathematical finance). The price system c(·) is free of arbitrage if and only if there exists a pricing probability Pr such that Pr(E) = 0 if and only if E ∈ N is an impossible event, and the price c(X) of each contract X ∈ X is the expectation E_Pr(X) under the pricing probability Pr:

c(X) = E_Pr(X) = Σ_x x Pr(X = x).

Remark 1.0.3. Note that when the pricing system for the contracts is free of arbitrage, the pricing probability does not need to be unique: there could be different probabilities under which the considered contracts have the same expectation.

Proof. (⇐=): Let X1, . . . , Xm be a finite set of random variables (contracts) and Pr a pricing probability, which defines the pricing system

c_k = E_Pr(X_k) = Σ_{x∈R} x Pr(X_k = x), k = 1, . . . , m,

assuming that for each contract X_k the set of possible values { x : Pr(X_k = x) > 0 } is finite.

Let y = (y1, . . . , ym) ∈ R^m be a strategy where we buy (when y_k > 0; short-sell when y_k < 0) y_k units of the contract X_k, and suppose that with certainty the profit is non-negative:

( Σ_{k=1}^m y_k (X_k − c_k) ≥ 0 ) ∈ C.

By the linearity of the expectation, since c_k = E_Pr(X_k),

E_Pr( Σ_{k=1}^m y_k (X_k − c_k) ) = Σ_{k=1}^m y_k ( E_Pr(X_k) − c_k ) = 0,

which implies that

( Σ_{k=1}^m y_k (X_k − c_k) > 0 ) ∈ N

is an impossible event: otherwise

Pr( Σ_{k=1}^m y_k (X_k − c_k) > 0 ) > 0,

with the contradiction

0 = E_Pr( Σ_{k=1}^m y_k (X_k − c_k) )
  = E_Pr( Σ_{k=1}^m y_k (X_k − c_k) · 1{ Σ_{k=1}^m y_k (X_k − c_k) > 0 } ) > 0.

Hence no strategy gives an arbitrage. To prove the other implication (=⇒) we will need the separating hyperplane theorem from convex analysis.

Remark 1.0.4. The arbitrage-free pricing system assigned to the events is extended by linearity to all simple random variables taking finitely many values. In the other direction, if a set of prices (c(X) : X ∈ X) is assigned for a class of simple random variables (gambling contracts on elementary events), we want to check whether it is free of arbitrage. We show that this is the case if and only if there is at least one pricing probability Pr such that E_Pr(X) = c(X). When the pricing probability is not unique, the price system (c(X) : X ∈ X) can be extended in different ways.

Remark 1.0.5. In this introduction we adopted De Finetti's minimalist point of view, and it was not necessary to assume all the Kolmogorov probability axioms, where events are subsets of a space Ω of point events ω.

1.1 Convexity and separating hyperplane theorem

Definition 1.1.1 (Convexity). Let V be a vector space, for example V = R^d, the Euclidean space.

1. A set C ⊆ V is convex if and only if

x, y ∈ C, 0 ≤ α ≤ 1 =⇒ αx + (1 − α)y ∈ C.

2. A set K ⊆ V is a cone if ∀x ∈ K, λ > 0, also λx ∈ K.

Definition 1.1.2. The convex hull of a set A ⊆ V is defined as the smallest convex set containing A:

C(A) = ∩_{convex C ⊇ A} C = { x = Σ_{i=1}^n α_i y_i : n ∈ N, y_i ∈ A, α_i > 0, Σ_{i=1}^n α_i = 1 }

Lemma 1.1.1 (Föllmer & Schied book, Proposition A.1). Let C ⊂ R^d be a convex set such that 0 ∉ C. Then there is a vector ρ ∈ R^d such that

(x, ρ) := x · ρ := Σ_{i=1}^d x_i ρ_i ≥ 0  ∀x ∈ C,

and there exists x0 ∈ C such that x0 · ρ > 0. Moreover, when 0 ∉ C̄ (the closure of C), there is a vector ρ ∈ R^d such that x · ρ > 0 ∀x ∈ C.

Geometric interpretation: there is a separating hyperplane

H = { z ∈ R^d : z · ρ = 0 }

between the convex set C and the origin 0.

Proof. Denote |x| = √(x · x) (the Euclidean norm), and let C̄ denote the closure of C, which contains the limits of all Cauchy sequences {x_n : n ∈ N} ⊆ C. Note that

0 ∉ C̄ ⇐⇒ inf_{x∈C} |x| > 0 ⇐⇒ inf_{x∈C̄} |x| > 0,

and in that case there is y ∈ C̄ such that |y| = inf_{x∈C̄} |x| > 0. Since the closure of a convex set is convex (exercise), it follows that ∀ 0 ≤ α ≤ 1, ∀x ∈ C̄,

|(x − y)α + y|² ≥ |y|² > 0
⇐⇒ (x − y, x − y)α² + (y, y) + 2α(x − y, y) ≥ (y, y) > 0
⇐⇒ (x − y, x − y)α ≥ 2(y − x, y);

by letting α ↓ 0, it follows that (x, y) ≥ (y, y) > 0, and we can choose ρ = y. Otherwise 0 ∈ C̄ \ C and inf_{x∈C} |x| = 0. We can assume that

LinearSpan(C) = { Σ_{k=1}^m λ_k x_k : m ∈ N, λ_k ∈ R, x_k ∈ C } = R^d,

otherwise we consider C embedded in R^ℓ with smaller dimension ℓ < d. We show that when C is convex and 0 ∉ C, necessarily C̄ ⊊ R^d.

Let {e1, . . . , ed} ⊆ C be vectors forming a basis of R^d. We show that

ξ := − Σ_{i=1}^d e_i ∉ C̄.

Otherwise there would be a sequence (ξ^(n) : n ∈ N) ⊆ C with

ξ^(n) = Σ_{i=1}^d λ_i^(n) e_i → ξ as n → ∞ ⇐⇒ λ_i^(n) → −1 ∀i = 1, . . . , d.

For n large enough, λ_i^(n) < 0 ∀i = 1, . . . , d, and it follows that

0 = ( 1 / (1 − Σ_{j=1}^d λ_j^(n)) ) ξ^(n) + Σ_{i=1}^d ( (−λ_i^(n)) / (1 − Σ_{j=1}^d λ_j^(n)) ) e_i,

where 0 is then a convex combination of the vectors ξ^(n), e1, . . . , ed ∈ C, and since C is convex necessarily 0 ∈ C. This contradicts the assumption 0 ∉ C; therefore we must have ξ ∉ C̄. By the same argument ζ_n = ξ/n ∉ C̄ ∀n ∈ N, which means that

inf_{x∈C} |x − ζ_n| > 0,

and at the same time ζ_n → 0. For each n,

C_n = (C − ζ_n) := { x − ζ_n : x ∈ C }

is convex and 0 ∉ C̄_n. By the first part of the proof inf_{x∈C_n} |x| > 0, and there is a vector ρ_n ∈ R^d such that

(ρn, x) > 0 ∀x ∈ Cn

By multiplying with a positive scalar we can assume that |ρ_n| = 1 ∀n. Since the surface of the unit ball S^{d−1} = { x ∈ R^d : |x| = 1 } is compact, by the Heine-Borel lemma it follows that there is a convergent subsequence

ρ_{n_k} → ρ, a vector with |ρ| = 1, such that ∀x ∈ C

ρ · x = lim_{k→∞} ρ_{n_k} · x = lim_{k→∞} ρ_{n_k} · (x − ζ_{n_k}) ≥ 0,

where |ζ_{n_k}| → 0. Since |ρ| = 1, necessarily ρ ≠ 0, and since L = span(C) is d-dimensional, it cannot be that (x, ρ) = 0 ∀x ∈ C: otherwise we would have C ⊆ ρ^⊥, where the hyperplane ρ^⊥ = { x ∈ R^d : (x, ρ) = 0 } is (d − 1)-dimensional. Therefore there exists a vector x0 ∈ C such that (ρ · x0) > 0. □

Corollary 1.1.1. Let C ⊂ R^d be a convex set and b ∈ R^d such that b ∉ C. Then there is a vector ρ ∈ R^d such that

(x, ρ) := x · ρ := Σ_{i=1}^d x_i ρ_i ≥ b · ρ  ∀x ∈ C,

and there exists x0 ∈ C such that x0 · ρ > b · ρ. Moreover, when b ∉ C̄, there is a vector ρ ∈ R^d such that x · ρ > b · ρ ∀x ∈ C.

Proof. Apply the first version of the separating hyperplane theorem to (C − b) = { x − b : x ∈ C }, which is a convex set not containing 0.
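The minimizing-vector construction in the proof can be illustrated numerically (my own sketch; the set and numbers are invented): for a closed convex set, the point closest to the origin serves as the separating vector ρ.

```python
import numpy as np

# Closed convex set C = {x in R^2 : x >= (1, 2)} (a shifted quadrant).
# The projection of the origin onto C is the componentwise maximum
# of the lower corner with 0, here simply the corner (1, 2) itself:
lower = np.array([1.0, 2.0])
rho = np.maximum(lower, 0.0)       # closest point of C to the origin

# rho separates C from 0: x . rho >= |rho|^2 > 0 for every x in C.
rng = np.random.default_rng(1)
samples = lower + rng.uniform(0, 5, size=(1000, 2))  # random points of C
assert (samples @ rho >= rho @ rho - 1e-12).all()
assert rho @ rho > 0
```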

Lemma 1.1.2 (Farkas (1902), Theorem of the Alternative). For a d × n matrix A and a vector b = (b1, . . . , bd) ∈ R^d, one and only one of these two alternatives is always true:

(1) There exists q = (q1, . . . , qn)^⊤ ∈ (0, ∞)^n such that Aq = b.

(2) There exists y = (y1, . . . , yd) ∈ R^d such that y · b ≤ 0, yA ∈ R^n_+, and (yA − y · b) ∈ R^n_+ \ {0}.

Proof. Denote by a1, . . . , an ∈ R^d the column vectors of the matrix A and consider the open cone

C = { Σ_{i=1}^n α_i a_i : α_i > 0 } ⊆ R^d,

which is the convex cone generated by the vectors a1, . . . , an. Alternatives (1) and (2) correspond to the alternatives b ∈ C and b ∉ C. When b ∉ C, by the separating hyperplane theorem there is a vector y ∈ R^d such that

y · v ≥ y · b ∀v ∈ C and ∃v0 ∈ C : y · v0 > y · b.

Equivalently

yAq ≥ y · b ∀q ∈ (0, ∞)^n and ∃q0 ∈ (0, ∞)^n : yAq0 > y · b.

Since q = (q1, . . . , qn) can have arbitrarily large or arbitrarily small positive components, it follows that y · b ≤ 0, and by choosing for example

q^(N,j) = (1/N, . . . , 1/N, N, 1/N, . . . , 1/N), with N in the j-th coordinate,

since

yA q^(N,j) ≥ y · b  ∀N ∈ N, 1 ≤ j ≤ n,

necessarily yA ∈ R^n_+. Therefore y · (Aq0 − b) > 0 for some q0 ∈ (0, ∞)^n, and (yA − y · b) ∈ R^n_+ \ {0}. □

Corollary 1.1.2 (Farkas). For a d × n matrix A and a vector b = (b1, . . . , bd) ∈ R^d, one and only one of these two alternatives is always true:

(1) There exists q = (q1, . . . , qn)^⊤ ∈ R^n_+ such that Aq = b.

(2) There exists y = (y1, . . . , yd) ∈ R^d such that yA ∈ R^n_+ \ {0} and y · b < 0.

Proof. Here R_+ = [0, ∞). Argue as in the previous lemma, considering the closed convex cone C̄ instead of the open cone C.
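The alternative can be decided algorithmically with a linear programming solver. The sketch below (my own illustration; matrix and price vectors are invented) implements the closed-cone version of Corollary 1.1.2 using SciPy's linprog; the box constraint |y_i| ≤ 1 only normalizes the separating vector y.

```python
# Deciding which Farkas alternative holds for a given (A, b).
import numpy as np
from scipy.optimize import linprog

def farkas(A, b):
    """Return ('q', q) with A q = b, q >= 0, or ('y', y) with
    y A >= 0 and y . b < 0 (an arbitrage, in the market reading)."""
    d, n = A.shape
    # Alternative (1): feasibility of A q = b, q >= 0.
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    if res.success:
        return 'q', res.x
    # Alternative (2): minimize y . b subject to y A >= 0, |y_i| <= 1.
    res = linprog(c=b, A_ub=-A.T, b_ub=np.zeros(n), bounds=[(-1, 1)] * d)
    return 'y', res.x

A = np.array([[1.0, 1.0], [1.0, 3.0]])
kind, v = farkas(A, np.array([1.0, 2.0]))   # q = (1/2, 1/2) exists
print(kind, v)
kind, v = farkas(A, np.array([1.0, 5.0]))   # no q >= 0: a vector y exists
print(kind, v)
```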

1.1.1 First fundamental Theorem from Farkas lemma

Farkas lemma gives immediately the First Fundamental Theorem of Mathematical Finance on a finite probability space. Let A1(ω), . . . , Ad(ω) be the final values of d financial assets depending on ω ∈ Ω = {ω1, . . . , ωn}, a discrete probability space with a reference probability P such that P({ωi}) > 0 ∀i = 1, . . . , n. Let Ajk = Aj(ωk) for j = 1, . . . , d, k = 1, . . . , n, and let bj ∈ R be the initial price of the asset Aj. In general the Ajk are not restricted to be non-negative, and neither are the initial prices bj. Now the first version of Farkas Lemma 1.1.2 says that either there is a vector (q1, . . . , qn) with qk > 0 such that Aq = b, which means that

b_j = Σ_{k=1}^n A_{jk} q_k,

y · b = Σ_{j=1}^d y_j b_j ≤ 0

(y · A)_k = Σ_{j=1}^d y_j A_j(ω_k) ≥ 0,

(yA)_k − y · b = Σ_{j=1}^d y_j (A_{jk} − b_j) > 0.

So either there is a pricing measure equivalent to P (giving positive weight to all the elementary events), or there is an arbitrage opportunity, which has non-positive initial cost and always non-negative final value, which possibly (on a set of positive P-probability) is also strictly positive.

In order to show that q is a probability measure we need to have a deterministic money asset in the investment portfolio, for example A1(ω), corresponding to a constant investment with A11 = A12 = · · · = A1n = b1 ≠ 0, which gives the normalization

q1 + q2 + ··· + qn = 1

In order to reduce ourselves to this situation, we assume that there is one asset, for example A1(ω), which we call the numeraire, which takes always strictly positive values: b1 > 0 and A1(ω) > 0 ∀ω with P({ω}) > 0. We then apply the theorem to the vector of discounted initial prices b̃ = (1, b2/b1, . . . , bd/b1) and discounted asset values

Ã_jk = A_jk / A_1k,

so that b̃1 = Ã11 = · · · = Ã1n = 1. Then either there is a probability q = (q1, . . . , qn), with each qk > 0 and Σ_{k=1}^n qk = 1, such that

Ãq = b̃,   (1.1.1)

or there is an arbitrage strategy y = (y1, . . . , yd) such that yÃ ∈ R^n_+ \ {0} and y · b̃ ≤ 0. Since A_1k > 0 and b1 > 0, the latter condition is equivalent to yA ∈ R^n_+ \ {0} and y · b ≤ 0.

We say that q satisfying (1.1.1) is a risk-neutral probability measure, or martingale measure. The expectation w.r.t. q of the discounted final value of each asset is its discounted initial price.

The second version of Farkas lemma (Corollary 1.1.2) characterizes the existence of an immediate arbitrage.

Assuming that A11 = · · · = A1n = b1 = 1, either there is a probability q = (q1, . . . , qn) with qj ≥ 0 and Σ_{k=1}^n qk = 1 such that Aq = b, or there is a strategy y = (y1, . . . , yd) such that yA ∈ R^n_+ \ {0} and y · b < 0. This is called an immediate arbitrage, since the profit of the arbitrager,

−y · b + yA ≥ −y · b > 0,

is partly collected already at the beginning.

Example 1.1.1. On Ω = {ω1, ω2, ω3} we fix a reference probability P such that P({ωi}) > 0 ∀i = 1, 2, 3. Consider a one-period market with a random vector of 3 assets A(ω) = (A0(ω), A1(ω), A2(ω)), with initial prices b = (b0, b1, b2) = (1, 2, 7) at time 0 and possible final values at time 1

A(ω1) = (1, 3, 9)^⊤,  A(ω2) = (1, 1, 5)^⊤,  A(ω3) = (1, 5, 10)^⊤.

Apply Farkas lemma to see whether there exists an arbitrage or not.

Solution. The linear system Aq = b with

A = [ 1  1  1
      3  1  5
      9  5  10 ],  where A_ij = A_i(ω_j),

has the unique solution q = (1/2, 1/2, 0)^⊤, which is a probability but is not strictly positive. By Farkas lemma there are arbitrage strategies y = (y0, y1, y2) ≠ 0 with initial price y · b ≤ 0 and profit

(yA − y · b) ∈ R³_+ \ {0}.

Geometrically, we look for an investment strategy y ∈ R³ such that y · A(ωj) = 0 for j = 1, 2, and y · A(ω3) > 0 strictly. Concretely we can take an orthogonal projection and find y as the component of the vector A(ω3) orthogonal to the plane spanned by A(ω1) and A(ω2). This investment portfolio has initial value

y · b = q1 (y · A(ω1)) + q2 (y · A(ω2)) = 0,

and it has final value y · A(ωj) = 0 for j = 1, 2 and y · A(ω3) > 0. Since we had assumed that P({ω3}) > 0, this strategy is an arbitrage: it has zero initial cost, and in all cases a non-negative final value which is strictly positive with positive probability.

The second version of Farkas lemma tells us that, since there is a pricing probability measure, there are no immediate arbitrages with negative initial price and always non-negative final value. In fact the arbitrage portfolio we found has zero initial price, and it is not an immediate arbitrage.
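The computations of Example 1.1.1 can be reproduced numerically; the following sketch (my own illustration, using NumPy) solves for q and constructs the arbitrage strategy by orthogonal projection, exactly as described above.

```python
import numpy as np

# Example 1.1.1: columns of A are the final asset values A(omega_j);
# b holds the initial prices.
A = np.array([[1.0, 1.0, 1.0],
              [3.0, 1.0, 5.0],
              [9.0, 5.0, 10.0]])
b = np.array([1.0, 2.0, 7.0])

q = np.linalg.solve(A, b)
print(q)                       # approximately (0.5, 0.5, 0): q_3 = 0

# Arbitrage strategy: component of A(omega_3) orthogonal to the plane
# spanned by A(omega_1) and A(omega_2), via a least squares projection.
P = A[:, :2]                               # basis of the plane
coef, *_ = np.linalg.lstsq(P, A[:, 2], rcond=None)
y = A[:, 2] - P @ coef                     # orthogonal component

print(y @ A[:, 0], y @ A[:, 1])            # both 0: no loss in omega_1,2
print(y @ A[:, 2] > 0, np.isclose(y @ b, 0))  # gain in omega_3, zero cost
```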

Theorem 1.1.1 (Another version of Farkas lemma with an inequality system). Let A be a d × n matrix and b a 1 × d vector. Then exactly one of the following two alternatives holds:

1. there is a vector q ∈ (0, ∞)^n such that Aq ≤ b;

2. there is a non-trivial ρ ∈ R^d_+ \ {0} (an investment strategy with non-negative weights) such that ρ · b ≤ 0, ρA ∈ R^n_+, and the profit satisfies

(ρA − ρ · b) ∈ R^n_+ \ {0}.

Proof. We first give a hint for an exercise version of the proof: introduce the non-negative orthants K = R^d_+ and K^o = (0, ∞)^n; check that there is a strictly positive supermartingale probability measure for the price vector b ∈ R^d if and only if b ∈ C := AK^o + K, which is a convex cone; otherwise apply the separating hyperplane theorem to prove that there is an arbitrage strategy with non-negative weights.

In detail: K = [0, ∞)^d (the non-negative orthant) and K^o = (0, ∞)^n are convex cones. Let

C = { y : Ax ≤ y for some x ∈ K^o } = { y = Ax + z for some x ∈ K^o, z ∈ K } = AK^o + K,

which is a convex cone, since a linear transformation of a convex cone is a convex cone and the sum of convex cones is also a convex cone. Now either b ∈ C, which means that there exists q ∈ K^o with Aq ≤ b, or b ∉ C. In the latter case, by the separating hyperplane theorem, there is ρ ∈ R^d such that ρ · y ≥ ρ · b ∀y ∈ C, and y0 ∈ C such that

ρ · y0 > ρ · b, implying that ρ ≠ 0. Since C is a cone, necessarily ρ · b ≤ 0. Also

ρAx + ρ · z ≥ 0 ∀x ∈ Ko, z ∈ K

which implies ρA ∈ R^n_+ and ρ ∈ R^d_+ \ {0}. □

Theorem 1.1.2 (First Fundamental Theorem under a non-negative portfolio weights constraint). Let A be a d × n matrix and b a 1 × d vector. Assume that A1k = b1 = 1 for k = 1, . . . , n, corresponding to a money instrument with constant value. Then exactly one of the following two alternatives holds:

1. there is a probability vector q ∈ (0, ∞)^n with q1 + · · · + qn = 1 such that Aq ≤ b, with equality in the first coordinate:

(Aq)1 = q1 + · · · + qn = b1 = 1.

Such a probability vector q is a supermartingale measure for the pricing system b of the random assets A.

2. there is an arbitrage strategy ρ ∈ (R × [0, ∞)^{d−1}) \ {0}, where the weights ρj, j = 2, . . . , d, of the assets are non-negative, with the exception of the money instrument corresponding to j = 1, such that ρ · b ≤ 0, ρA ∈ R^n_+, and the profit vector satisfies

(ρA − ρ · b) ∈ R^n_+ \ {0}.

If bj > 0 for j = 2, . . . , d, it follows also that ρ1 < 0 for the money component of the portfolio, meaning that at time t = 0, (−ρ1) units of the money instrument are borrowed from the bank in order to buy ρj units of the j-th instrument for j = 2, . . . , d.

Proof. Let

C1 = { y : Ax ≤ y for some x ∈ (0, ∞)^n, with y1 = (Ax)1 }
   = { y = Ax + z for some x ∈ (0, ∞)^n, z ∈ {0} × [0, ∞)^{d−1} }
   = A(0, ∞)^n + {0} × [0, ∞)^{d−1},

which again is a convex cone (a linear transformation of a convex cone and the sum of convex cones are convex cones).

Now either b ∈ C1, which means that there exists a supermartingale probability measure q ∈ (0, ∞)^n with Aq ≤ b and q1 + · · · + qn = 1, or b ∉ C1. In the latter case, by the separating hyperplane theorem, there is ρ ∈ R^d such that

ρ · y ≥ ρ · b ∀y ∈ C1,

and y0 ∈ C1 such that

ρ · y0 > ρ · b,

which implies ρ ≠ 0. Since C1 is a cone, necessarily ρ · b ≤ 0. Also

ρAx + ρ · z ≥ 0 ∀x ∈ (0, ∞)^n, z ∈ {0} × [0, ∞)^{d−1},

which implies ρA ∈ R^n_+ and ρj ≥ 0 for j = 2, . . . , d. Since ρ · b ≤ 0 and b1 = 1, when bj > 0 for j = 2, . . . , d and the weights ρj, j = 2, . . . , d, are not all zero (they cannot all vanish, since otherwise ρA ∈ R^n_+ and ρ · b ≤ 0 would force ρ1 = 0, i.e. ρ = 0), it follows that ρ1 < 0.

1.1.2 Separation theorem in infinite dimension

Definition 1.1.3. We say that the hyperplane { x ∈ R^d : a · x = α }, with a ∈ R^d \ {0} and α ∈ R, separates two sets C1, C2 ⊂ R^d if

C1 ⊆ { x : a · x ≤ α } and C2 ⊆ { x : a · x ≥ α }, or viceversa.

When { x : a · x = α } ⊉ C1 ∪ C2 we say that the separation is proper. We say that the separation is strict if there are α1 < α2 ∈ R with

C1 ⊆ {x : a · x ≤ α1} and C2 ⊆ {x : a · x ≥ α2} or viceversa.

Theorem 1.1.3 (Geometric Hahn–Banach theorem). This infinite dimensional generalization of the separating hyperplane theorem is discussed in functional analysis courses. Let V be a normed vector space (more generally, a locally convex vector space with topology defined by a family of seminorms). The dual space V* is the set of continuous linear functionals v* : V → R, with v ↦ ⟨v, v*⟩. When K ⊂ V, K ≠ ∅, is convex and compact and C ⊂ V, C ≠ ∅, is convex and closed, with K ∩ C = ∅, there exist a continuous linear functional v* ∈ V* and a scalar r ∈ R such that the hyperplane

H = { v ∈ V : ⟨v, v*⟩ = r } ⊆ V

separates K strictly from C, i.e.

sup_{v∈C} ⟨v, v*⟩ < r < inf_{w∈K} ⟨w, v*⟩.

Proof. See Section 6.2 in the book Real Analysis and Probability by Richard Dudley.

Corollary 1.1.3. On a normed vector space V (more generally, on a locally convex vector space) the dual space V* separates points: if v ≠ w ∈ V, there exists v* ∈ V* with ⟨v, v*⟩ ≠ ⟨w, v*⟩.

1.2 Kolmogorov axioms of probability

In 1933 Andrey Nikolaevich Kolmogorov (1903-1987) published the book Foundations of the Theory of Probability, containing the modern axiomatization of probability, the construction of probability measures on infinite dimensional spaces, and the general notion of conditional expectation.

Definition 1.2.1. Let Ω be a set (the space of point events ω ). A σ-algebra of events is a collection F of Ω-subsets satisfying the following properties

1. Ω ∈ F

2. E ∈ F =⇒ E^c = (Ω \ E) ∈ F. In particular ∅ = Ω^c ∈ F.

3. F is closed under countable unions:

{ E_n : n ∈ N } ⊂ F =⇒ ( ∪_{n∈N} E_n ) ∈ F.

Elements A ∈ F are called events (also measurable sets).

Definition 1.2.2. A set Ω equipped with a σ-algebra F forms a measurable space (Ω, F); once also a probability measure is given, it becomes a probability space.

Definition 1.2.3. A (positive) measure on a measurable space (Ω, F) is a function μ : F → R_+ ∪ {+∞} such that

1. µ(∅) = 0

2. µ is σ-additive (countably additive) i.e. for any countable sequence of disjoint events {En : n ∈ N} ⊂ F,

μ( ∪_{n=0}^∞ E_n ) = Σ_{n=0}^∞ μ(E_n).

We say that the positive measure μ is a probability when μ(Ω) = 1.

Definition 1.2.4. Let (Ω, F) and (V, B) be measurable spaces. Usually we consider V = R^d and B = B(R^d), the smallest σ-algebra containing the open sets, called the Borel σ-algebra. A random variable (measurable function) taking values in V is a function X : Ω → V such that ∀B ∈ B, the counterimage

X^{−1}(B) = { ω : X(ω) ∈ B } ∈ F.

Remark 1.2.1. Note the analogy with the concept of continuity: a function between topological spaces f : (T, T) → (U, U), where T and U are the collections of open sets in the respective spaces, is continuous if and only if

f^{−1}(U) = { t : f(t) ∈ U } ∈ T (is open), ∀ open U ∈ U.

Definition 1.2.5. We say that a random variable X : (Ω, F) → (V, B) is simple when it takes finitely many values. In this case it has the representation

X(ω) = Σ_{i=1}^n v_i 1_{E_i}(ω), v_i ∈ V, E_i ∈ F.   (1.2.1)

Definition 1.2.6. Let X : (Ω, F) → (V, B) be a simple random variable on a normed vector space V with representation (1.2.1), and P a probability measure on (Ω, F). The expectation of X under P is the vector

E_P(X) = ∫_Ω X(ω) P(dω) = Σ_{i=1}^n v_i P(E_i) ∈ V.

Note that if V = R^d we can also define the integral coordinatewise. We continue with scalar random variables.

1.2.1 Lebesgue integral and expectation

Let X : Ω → R_+ be a scalar non-negative random variable. One can always construct a sequence of simple random variables with 0 ≤ X^(n)(ω) ≤ X(ω) ∀n, ω, and X^(n)(ω) ↑ X(ω) as n ↑ ∞ ∀ω, approximating X(ω) from below. For example, take the sequence of simple random variables

X^(n)(ω) = Σ_{k=0}^{n²} (k/n) 1( k/n ≤ X(ω) < (k+1)/n ).

Since the expectations are

E_P( X^(n) ) = Σ_{k=0}^{n²} (k/n) P( ω : k/n ≤ X(ω) < (k+1)/n ),

and since this sequence of expectations is monotone, its limit

lim_{n→∞} E_P( X^(n) ) ∈ R_+ ∪ {+∞}

always exists (possibly infinite), and (as proved in probability theory or measure theory courses) it does not depend on the approximating sequence of simple functions X^(n). This limit defines the expectation E_P(X) of the random variable X under the probability P.

More generally, when X takes values in R, X^+(ω) = max{X(ω), 0} and X^−(ω) = max{−X(ω), 0} are non-negative random variables with X(ω) = X^+(ω) − X^−(ω) ∀ω ∈ Ω, and we define

E_P(X) = E_P(X^+) − E_P(X^−),

which is well defined unless E_P(X^+) = E_P(X^−) = +∞.
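The simple-function approximation can be tried numerically. In this sketch (my own illustration, not from the notes) Ω = [0, 1] carries the uniform probability and X(ω) = ω², so E_P(X) = 1/3 and P(k/n ≤ X < (k+1)/n) = √((k+1)/n) − √(k/n), with both square roots clipped to [0, 1]:

```python
import math

def approx_expectation(n):
    """E_P[X^(n)] = sum_k (k/n) P(k/n <= X < (k+1)/n) for X(w) = w^2
    on Omega = [0, 1] with the uniform probability."""
    total = 0.0
    for k in range(n * n + 1):
        lo = min(math.sqrt(k / n), 1.0)
        hi = min(math.sqrt((k + 1) / n), 1.0)
        total += (k / n) * (hi - lo)
    return total

for n in (10, 100, 1000):
    print(n, approx_expectation(n))   # approaches E_P[X] = 1/3 from below
```

Since X^(n) ≤ X pointwise, each approximation lies below 1/3, and the error is at most the mesh size 1/n.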

Definition 1.2.7. The space

L¹(Ω, F, P) := { X : (Ω, F) → (R, B) random variable with E_P(|X|) < ∞ }

is the space of integrable random variables, with well-defined expectation under P.

Proposition 1.2.1. The expectation satisfies the following properties:

Linearity If $X, Y \in L^1(\Omega,\mathcal F,P)$ and $a, b \in \mathbb R$, then $(aX + bY) \in L^1(\Omega,\mathcal F,P)$ and
$$ E_P(aX + bY) = a\, E_P(X) + b\, E_P(Y). $$

Positivity If $X$ is a random variable with $P(X \ge 0) = 1$, then $E_P(X) \ge 0$.

Definition 1.2.8. Given the probability triple $(\Omega,\mathcal F,P)$, we say that $A \subseteq \Omega$ is a null set under $P$ when there is a $B \in \mathcal F$ with $A \subseteq B$ and $P(B) = 0$, and denote the collection of $P$-null sets by $\mathcal N^P$. For $A \in \mathcal N^P$ we set $P(A) = 0$. We define also the $P$-completed $\sigma$-algebra $\mathcal F^P = \mathcal F \vee \mathcal N^P$ as the smallest $\sigma$-algebra containing $\mathcal F$ and $\mathcal N^P$. By the Carathéodory extension theorem there is an extension $P^*$ of $P$ on $(\Omega, \mathcal F^P)$ such that $P^*(B) = P(B)$ when $B \in \mathcal F$.

Definition 1.2.9. The space $L^\infty(\Omega,\mathcal F,P)$ is the space of $\mathcal F$-measurable $\mathbb R$-valued random variables $X$ which are essentially bounded, i.e. have finite $P$-essential supremum
$$ \| X \|_\infty = \operatorname*{ess\,sup}_{\omega, P} |X(\omega)| := \inf\big\{ k > 0 : P(|X| \le k) = 1 \big\} < \infty. $$

Definition 1.2.10. We also denote by $L^0(\Omega,\mathcal F,P)$ the space of $\mathcal F$-measurable $\mathbb R$-valued random variables, where two elements $X \overset{P}{\equiv} Y$ are identified when $P(X = Y) = 1$ (this is an equivalence relation).

Definition 1.2.11. Let $\mathcal C$ be any collection of $\mathbb R$-valued random variables. There exists a random variable $X^*$ such that $\forall X \in \mathcal C$

$$ X^* \ge X \quad P\text{-almost surely} \tag{1.2.2} $$

which is the smallest possible with this property: if Z is a random variable such that ∀X ∈ C

$$ Z \ge X \quad P\text{-almost surely}, $$
then necessarily $Z \ge X^*$ $P$-almost surely. We denote
$$ X^* = \operatorname*{ess\,sup}_{X \in \mathcal C} X, $$
which is called the essential supremum of the collection of random variables $\mathcal C$. If $\mathcal C$ is a directed set, meaning that $\forall X', X'' \in \mathcal C$ $\exists X \in \mathcal C$ with $X \ge X' \vee X''$ $P$-almost surely, then there is a sequence $\{X_n\}_{n\in\mathbb N} \subseteq \mathcal C$ such that $X_n \uparrow X^*$ $P$-almost surely.

Proof. (from Appendix A.5 in Föllmer and Schied's Stochastic Finance book) Without loss of generality we can assume that the random variables $X \in \mathcal C$ take values in $[0,1]$ (otherwise we consider the composed random variables $f(X)$, where $f : \mathbb R \to [0,1]$ is a strictly increasing function).

When $\mathcal C$ is countable, $X^* = \sup_{X \in \mathcal C} X$ is a random variable which satisfies the properties of the essential supremum. The problem is that $\sup_{X\in\mathcal C} X$ is not necessarily measurable when we take the supremum over an uncountable family. We prove first that the supremum

$$ c^* = \sup\Big\{ E_P\Big( \sup_{X \in \Phi} X \Big) : \text{countable } \Phi \subseteq \mathcal C \Big\} $$
is attained, i.e. $\exists$ countable $\Phi^* \subseteq \mathcal C$ such that

$$ c^* = E_P\Big( \sup_{X \in \Phi^*} X \Big) \tag{1.2.3} $$

First there is a sequence of countable Φn ⊂ C such that

$$ E_P\Big( \sup_{X \in \Phi_n} X \Big) \uparrow c^* \in [0,1], $$
and $\Phi^* = \bigcup_{n\in\mathbb N} \Phi_n \subseteq \mathcal C$ is countable, as a countable union of countable sets, which necessarily satisfies (1.2.3). We claim that

$$ X^* = \sup_{X \in \Phi^*} X $$
satisfies the properties of the essential supremum. First, $X^*$ is a random variable, since it is the supremum over a countable set of random variables. If (1.2.2) did not hold, there would be a random variable $\widetilde X \in \mathcal C$ such that $P(\widetilde X > X^*) > 0$. But this would mean that
$$ P\Big( \sup_{X \in \Phi^* \cup \{\widetilde X\}} X \ge \sup_{X \in \Phi^*} X \Big) = 1 \quad\text{and}\quad P\Big( \sup_{X \in \Phi^* \cup \{\widetilde X\}} X > \sup_{X \in \Phi^*} X \Big) > 0, $$
which gives the contradiction
$$ c^* \ge E_P\Big( \sup_{X \in \Phi^* \cup \{\widetilde X\}} X \Big) > E_P\Big( \sup_{X \in \Phi^*} X \Big) = c^*. $$
To conclude, if $Z$ is a random variable such that $Z \ge X$ $\forall X \in \mathcal C$, then necessarily $Z \ge \sup_{X\in\Phi^*} X = X^*$.

Definition 1.2.12. We say that two probabilities $P$ and $Q$ on a measurable space $(\Omega,\mathcal F)$ are equivalent when $\mathcal N^P = \mathcal N^Q$, i.e. for $A \in \mathcal F$, $P(A) = 0$ if and only if $Q(A) = 0$, and introduce the notation $P \overset{\mathcal F}{\sim} Q$ for this relation. Note that this equivalence depends on the choice of the $\sigma$-algebra $\mathcal F$.

Note that $L^\infty(\Omega,\mathcal F,P) = L^\infty(\Omega,\mathcal F,Q)$ when $P \sim Q$.

Theorem 1.2.1 (Radon-Nikodym). If $P \sim Q$ are equivalent probability measures on $(\Omega,\mathcal F)$, their likelihood ratio or Radon-Nikodym derivative $Z = \frac{dQ}{dP} \ge 0$ exists as a random variable in $L^1(\Omega,\mathcal F,P)$ such that for all essentially bounded random variables $X \in L^\infty(\Omega,\mathcal F,P) = L^\infty(\Omega,\mathcal F,Q)$
$$ E_Q(X) = \int_\Omega X(\omega)\, Q(d\omega) = E_P(ZX) = \int_\Omega X(\omega)\, \frac{dQ}{dP}(\omega)\, P(d\omega), $$
$P(Z > 0) = 1$, and

$$ Z^{-1} = \frac{dP}{dQ}, \quad\text{meaning that}\quad E_P(X) = E_Q(X Z^{-1}). $$

Example 1.2.1. If $P \sim Q$ are defined on a discrete probability space $\Omega = \mathbb N$, we have $Z(\omega) = \frac{Q(\{\omega\})}{P(\{\omega\})}$, where in case $P(\{\omega\}) = Q(\{\omega\}) = 0$ we set an arbitrary value, for example $Z(\omega) = 0$.

Example 1.2.2. If $P \sim Q$ are probability measures on the euclidean space $(\mathbb R^d, \mathcal B(\mathbb R^d))$ with densities $p(x)$ and $q(x)$ with respect to the $d$-dimensional Lebesgue measure, then $Z(x) = \frac{dQ}{dP}(x) = q(x)/p(x)$.
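Example 1.2.1 can be made concrete on a finite sample space, where the likelihood ratio is just the ratio of point masses. The numbers below are my own illustration, not taken from the notes:

```python
import numpy as np

P = np.array([0.2, 0.3, 0.5])   # reference probability on Omega = {0, 1, 2}
Q = np.array([0.4, 0.4, 0.2])   # an equivalent probability (same null sets)
Z = Q / P                       # Radon-Nikodym derivative dQ/dP

X = np.array([1.0, -2.0, 3.0])  # an arbitrary random variable

# E_Q(X) computed directly and via the change-of-measure formula E_P(Z X)
assert abs(np.sum(X * Q) - np.sum(X * Z * P)) < 1e-12

# the inverse ratio dP/dQ = 1/Z recovers expectations under P
assert abs(np.sum(X * P) - np.sum(X / Z * Q)) < 1e-12
```

Note that $E_P(Z) = \sum_\omega Q(\{\omega\}) = 1$, as the theorem requires.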

1.3 Arbitrage in one period market model.

On a probability space $(\Omega,\mathcal F)$ equipped with a reference probability $P$, we consider the one period market model with a random vector of $(d+1)$ asset values at time $t = 1$, $\bar S = (S_0, S_1, \dots, S_d) = (S_0, S)$, with $S_0$ scalar-valued and $S$ $\mathbb R^d$-valued, and with initial prices $\bar\pi = (\pi_0, \pi_1, \dots, \pi_d) = (\pi_0, \pi)$ at time $t = 0$, where $\pi_0 > 0$ and $\pi \in (0,\infty)^d$. Here $\pi_i > 0$, $P(S_i \ge 0) = 1$ with

$P(S_i > 0) > 0$, and for the first asset, which is chosen as numeraire, we also assume that $P(S_0 > 0) = 1$. Often $S_0$ is a riskless asset, with $S_0 = (1+r)\pi_0$, where $r > -1$ is the return (interest rate).

Definition 1.3.1. A portfolio or investment strategy is a vector $\bar\xi = (\xi_0, \xi_1, \dots, \xi_d) \in \mathbb R^{d+1}$ with initial price $V(0) = \bar\xi \cdot \bar\pi = \sum_{i=0}^d \xi_i \pi_i$ at time $t = 0$ and random final value $V(1) = \bar\xi \cdot \bar S = \sum_{i=0}^d \xi_i S_i$ at time $t = 1$.

Definition 1.3.2. An arbitrage opportunity is a portfolio $\bar\xi \in \mathbb R^{d+1}$ with initial price $V(0) = \bar\xi \cdot \bar\pi \le 0$ and final value $V(1) = \bar\xi \cdot \bar S$ satisfying $P(V(1) \ge 0) = 1$ and $P(V(1) > 0) > 0$. Note that the definition of arbitrage depends on the choice of the probability only through the null sets $\mathcal N^P$, so it is invariant under an equivalent change of probability measure.

Proposition 1.3.1. The market model with assets $(S_0, S)$ and initial prices $(\pi_0, \pi) \in \mathbb R_+ \times \mathbb R_+^d$ has arbitrage opportunities if and only if there is a vector $\xi = (\xi_1,\dots,\xi_d) \in \mathbb R^d$ such that

$$ (\xi \cdot S) \ge \frac{S_0}{\pi_0}\, (\xi \cdot \pi) \quad P\text{-almost surely}, \qquad P\Big( (\xi \cdot S) > \frac{S_0}{\pi_0}\, (\xi \cdot \pi) \Big) > 0. $$

Proof. Let $\xi_0 = -(\xi\cdot\pi)/\pi_0$ and take $\bar\xi = (\xi_0, \xi) \in \mathbb R \times \mathbb R^d$.
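For intuition, absence of arbitrage in a finite-state one-period model can be checked via state prices: the market $(\bar\pi, \bar S)$ is arbitrage free iff there is a strictly positive vector $\psi$ with $\bar S^\top \psi = \bar\pi$. The sketch below is my own two-state, complete-market illustration (payoff matrix invertible, so $\psi$ is unique), not a construction from the notes:

```python
import numpy as np

def state_prices(pi, S):
    """Unique candidate state-price vector psi solving S^T psi = pi
    (complete market: square, invertible payoff matrix assumed)."""
    return np.linalg.solve(S.T, pi)

# rows = states, columns = assets: a bond (price 1, pays 1.1 in both states)
# and a stock (price 1, pays 1.3 or 0.9)
pi = np.array([1.0, 1.0])
S = np.array([[1.1, 1.3],
              [1.1, 0.9]])

psi = state_prices(pi, S)
assert np.all(psi > 0)            # strictly positive: arbitrage free

# same payoffs, stock underpriced at 0.7: the unique psi has a
# non-positive component, so an arbitrage exists
psi_bad = state_prices(np.array([1.0, 0.7]), S)
assert np.any(psi_bad <= 0)
```

Normalizing $\psi$ by the bond price yields exactly the risk-neutral probabilities of Definition 1.3.3 below.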

1.3.1 Discounting

Using the assumption $P(S_0 > 0) = 1$, we choose the first asset $S_0$ as numeraire and consider the discounted asset values at time $t = 1$

$$ \widetilde S_i = S_i / S_0, \qquad i = 0, \dots, d, $$
which $P$-almost surely take finite non-negative values, and their discounted initial prices at time $t = 0$:

$$ \widetilde\pi_i = \pi_i / \pi_0. $$

Often $\pi_0 = 1$ and $S_0 = (1+r)$ with $r > -1$ deterministic, so that $\widetilde\pi_i = \pi_i$ and $\widetilde S_i = \frac{S_i}{1+r}$, $i = 1, \dots, d$.

Definition 1.3.3. We say that a probability $Q$ on $(\Omega,\mathcal F)$ is risk-neutral (or that it is a martingale measure) with respect to the numeraire asset $S_0$, if under $Q$ the expected discounted values of the other assets coincide with their initial prices:
$$ E_Q\big( \widetilde S_i \big) = E_Q\Big( \frac{S_i}{S_0} \Big) = \widetilde\pi_i = \frac{\pi_i}{\pi_0}, \qquad i = 1, \dots, d. $$

Theorem 1.3.1 (1st fundamental theorem of asset pricing). The market model is arbitrage free if and only if there exists a risk-neutral measure $Q$ equivalent to $P$ with (essentially) bounded density $\frac{dQ}{dP} \in L^\infty(P)$.

Proof. ($\Leftarrow$) Assume that there is a risk-neutral measure $Q$ equivalent to $P$, and let $\bar\xi = (\xi_0, \xi) \in \mathbb R^{d+1} = \mathbb R \times \mathbb R^d$. If $Q(\bar\xi\cdot\bar S \ge 0) = 1$ and $Q(\bar\xi\cdot\bar S > 0) > 0$, then by discounting with the numeraire $S_0$

$$ Q\Big( \xi_0 + \xi\cdot\frac{S}{S_0} \ge 0 \Big) = 1 \quad\text{and}\quad Q\Big( \xi_0 + \xi\cdot\frac{S}{S_0} > 0 \Big) > 0, $$
which implies

$$ E_Q\Big( \xi_0 + \xi\cdot\frac{S}{S_0} \Big) = \xi_0 + \xi\cdot\frac{\pi}{\pi_0} > 0, $$
equivalently $\bar\xi\cdot\bar\pi > 0$, and $\bar\xi$ cannot be an arbitrage portfolio.

Proof. ($\Rightarrow$) (from the Föllmer and Schied book). Now $(\Omega,\mathcal F,P)$ is a general abstract probability space. We introduce the vector of discounted gains

$$ Y_i(\omega) = \widetilde S_i(\omega) - \widetilde\pi_i, \qquad i = 1,\dots,d, $$
discounted with respect to the numeraire $S_0$. Let

$$ \mathcal P := \Big\{ Q \sim P \text{ probability with } Z = \frac{dQ}{dP} \in L^\infty(\Omega,\mathcal F,P),\ E_P(Z) = 1,\ Z > 0\ P\text{-a.s.} \Big\} $$
be the set of probability measures $Q$ equivalent to $P$ with essentially bounded likelihood ratio $\frac{dQ}{dP}$, and
$$ \mathcal C = \big\{ E_Q(Y) : Q \in \mathcal P \big\} \subseteq \mathbb R^d. $$

Note that $\mathcal C$ is convex as the image of the convex set $\mathcal P$ under the linear map $Q \mapsto E_Q(Y) \in \mathbb R^d$. When $0 \notin \mathcal C$, meaning that there are no risk-neutral measures $Q \sim P$ with bounded likelihood ratio, we show that arbitrage possibilities exist. Indeed, by the separating hyperplane theorem there is a vector $\xi \in \mathbb R^d$ such that $\xi\cdot y \ge 0$ for all $y \in \mathcal C$, and $\xi\cdot y > 0$ for some $y \in \mathcal C$; equivalently

$$ E_Q(\xi\cdot Y) = \sum_{i=1}^d \xi_i\, E_Q(Y_i) \ge 0 \ \ \forall Q \in \mathcal P, \quad\text{and}\quad \exists Q_0 \in \mathcal P : E_{Q_0}(\xi\cdot Y) > 0. $$
Let $A = \{\omega : \xi\cdot Y(\omega) < 0\}$; we show that $P(A) = Q(A) = 0$ $\forall Q \in \mathcal P$. By contradiction, assume that $P(A) > 0$. Then, by reweighting the probability $Q_0$, we can construct a sequence $(Q_n : n \in \mathbb N) \subseteq \mathcal P$ which concentrates more and more probability mass on the set $A$. Starting from $Q_0$ with $Q_0(A) > 0$, let
$$ \varphi_n(\omega) = \Big(1 - \frac 1n\Big)\mathbf 1_A(\omega) + \frac 1n \mathbf 1_{A^c}(\omega), \qquad Z_n(\omega) = \frac{\varphi_n(\omega)}{E_{Q_0}(\varphi_n)} = \frac{\varphi_n(\omega)}{\big(1 - \frac 1n\big) Q_0(A) + \frac 1n}, $$
which is a bounded random variable with $0 < Z_n \le n$ and $E_{Q_0}(Z_n) = 1$, and take the probability $Q_n \in \mathcal P$ with

$$ dQ_n = Z_n\, dQ_0. $$
Note that
$$ \frac{dQ_n}{dP} = \frac{dQ_n}{dQ_0}\,\frac{dQ_0}{dP}, $$
where the likelihood ratios in the product on the right hand side are strictly positive and essentially bounded, so $\frac{dQ_n}{dP}$ is also strictly positive and essentially bounded, and therefore $Q_n \in \mathcal P$. We assume now, without loss of generality, that
$$ \| Y \| = \sqrt{ \sum_{i=1}^d Y_i^2 } \in L^1(\Omega,\mathcal F,P), $$
which implies that
$$ E_P\big( |\xi\cdot Y| \big) \le E_P\big( \| Y \| \big)\, \max_i |\xi_i| < \infty $$
(and likewise with respect to all $Q \in \mathcal P$). This is without loss of generality: otherwise one can always find a probability measure $P^* \in \mathcal P$ such that $E_{P^*}(\|Y\|) < \infty$ by taking
$$ \frac{dP^*}{dP} = \frac{1}{1 + \|Y\|}\, E_P\Big( \frac{1}{1 + \|Y\|} \Big)^{-1} > 0, $$
which is a bounded random variable, leading to
$$ E_{P^*}\big( \|Y\| \big) = E_P\Big( \frac{\|Y\|}{1 + \|Y\|} \Big)\, E_P\Big( \frac{1}{1 + \|Y\|} \Big)^{-1} < \infty, $$
and starting the construction with $P^*$ in the place of $P$. Since the $Q_n$ have bounded densities with respect to $P$, it follows that $E_{Q_n}(\|Y\|) < \infty$ for all $n$ as well. We have

$$ \lim_{n\to\infty} \varphi_n(\omega) = \mathbf 1_A(\omega) \quad \forall\omega, $$
and by the Lebesgue dominated convergence theorem
$$ \lim_{n\to\infty} E_{Q_0}\big( (\xi\cdot Y)\varphi_n \big) = E_{Q_0}\big( (\xi\cdot Y)\mathbf 1_A \big) = E_{Q_0}\big( (\xi\cdot Y)\mathbf 1(\xi\cdot Y < 0) \big) \le 0; $$
on the other hand, since $Q_n \in \mathcal P$,
$$ E_{Q_n}\big( \xi\cdot Y \big) = E_{Q_0}\big( (\xi\cdot Y)\varphi_n \big)\, \underbrace{E_{Q_0}(\varphi_n)^{-1}}_{>0} \ge 0. $$
Therefore $E_{Q_0}\big( (\xi\cdot Y)\mathbf 1_A \big) = 0$, which means that $Q_0(A) = 0$, and hence $P(\xi\cdot Y \ge 0) = 1$ under the equivalent probability. Since $E_{Q_0}(\xi\cdot Y) > 0$, necessarily $P(\xi\cdot Y > 0) > 0$, and $\bar\xi = (-\xi\cdot\pi, \xi) \in \mathbb R^{d+1}$ is an arbitrage strategy.

Remark 1.3.1. In this proof, from the mathematical point of view, it was not essential that the assets and their prices take non-negative values.
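The ($\Leftarrow$) direction of the theorem can be illustrated in the standard one-period binomial model (my own example, not from the notes): with riskless return $r$ and stock moving to $u\pi_1$ or $d\pi_1$, the risk-neutral up-probability is $q = \frac{1+r-d}{u-d}$, and under $Q$ the expected discounted stock equals its initial price, so no arbitrage is possible.

```python
# one-period binomial model: d < 1 + r < u guarantees 0 < q < 1,
# i.e. Q is a genuine probability equivalent to any P with both states charged
r, u, d = 0.05, 1.3, 0.9
pi1 = 100.0

q = (1 + r - d) / (u - d)          # risk-neutral probability of the up move
assert 0 < q < 1

S_up, S_down = u * pi1, d * pi1
disc_expectation = (q * S_up + (1 - q) * S_down) / (1 + r)
assert abs(disc_expectation - pi1) < 1e-12   # E_Q(S/S_0) = pi1/pi0
```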

Remark 1.3.2. In the case of a market with infinitely many assets $(S_i : i \in \mathbb N)$, existence of a risk neutral measure implies absence of arbitrage, but there are arbitrage free models which do not admit a risk neutral measure. See Remark 1.18 in the Föllmer and Schied book.

Remark 1.3.3. Similarly, we can always apply an equivalent change of measure to assume that the discounted assets $\widetilde S^j_t$ are square integrable, and there is an equivalent risk-neutral measure preserving square integrability.

1.4 Geometric Characterization of Arbitrage Free Markets

We have seen that the one period market model with random final values $S_1(\omega) = (S^0_1(\omega), S^1_1(\omega), \dots, S^d_1(\omega)) \in \mathbb R^{d+1}_+$ at time $t = 1$ and initial prices $\pi = (\pi^0, \pi^1, \dots, \pi^d) \in \mathbb R^{d+1}_+$ at time $t = 0$ is arbitrage free if and only if the zero vector belongs to the set of measure barycenters
$$ 0 \in M_b(\mu) := \Big\{ E_Q(Y) \ \Big|\ Q \sim P,\ \frac{dQ}{dP} \in L^\infty,\ E_Q(|Y|) < \infty \Big\} $$
$$ = \Big\{ \int_{\mathbb R^d} y\, d\nu \ \Big|\ \nu \sim \mu,\ \frac{d\nu}{d\mu} \in L^\infty(\mathbb R^d, \mu),\ \nu(\mathbb R^d) = 1,\ \int_{\mathbb R^d} |y|\, d\nu < \infty \Big\} \subseteq \mathbb R^d, $$
where
$$ Y^j = \widetilde S^j - \widetilde\pi^j = \frac{S^j_1}{S^0_1} - \frac{\pi^j}{\pi^0}, \qquad j = 1,\dots,d, $$
is the vector of discounted rewards, and $\mu(B) = P(Y \in B)$, $B \in \mathcal B(\mathbb R^d)$ (Borel sets), is the image measure of the r.v. $Y$ on its value space $\mathbb R^d$. We consider also the larger set

$$ M_b(\mu) \subseteq M(\mu) := \big\{ E_Q(Y) \ \big|\ Q \sim P,\ E_Q(|Y|) < \infty \big\} $$

$$ = \Big\{ \int_{\mathbb R^d} y\, d\nu \ \Big|\ \nu \sim \mu,\ \nu(\mathbb R^d) = 1,\ \int_{\mathbb R^d} |y|\, d\nu < \infty \Big\} \subseteq \mathbb R^d. $$

Definition 1.4.1. For any probability measure $\nu$ on $\mathcal B(\mathbb R^d)$, there exists a smallest closed set $S \subseteq \mathbb R^d$ such that $\nu(S^c) = 0$, called the support of $\nu$. It is characterized by the following property: $\nu(S^c) = 0$, and if $U \subseteq \mathbb R^d$ is open with $S \cap U \ne \emptyset$, then $\nu(U \cap S) > 0$.

Proof. Take
$$ U = \bigcup_{r \in \mathbb Q_+,\ x \in \mathbb Q^d :\ \nu(B_r(x)) = 0} B_r(x), $$

where $B_r(x) = \{ y : |x - y| < r \}$. Then $S := U^c$ satisfies the definition.

Definition 1.4.2. The relative interior of a convex set $C \subseteq \mathbb R^d$ is the set of points $x \in C$ such that $\forall y \in C$ $\exists \varepsilon > 0$ with

$$ x - \varepsilon (y - x) \in C. $$

Remark 1.4.1. According to the definition, a singleton $\{x\}$ is the relative interior of itself.

Theorem 1.4.1.
$$ \operatorname{RelativeInterior}\Big( \operatorname{ConvexHull}\big( \operatorname{Support}(\mu) \big) \Big) = M_b(\mu) = M(\mu) \tag{1.4.1} $$

Remark 1.4.2. This gives a practical way to check the NA condition: it is enough to look at the convex hull of the support of the distribution of the discounted reward vector Y , and check whether 0 is in the relative interior.
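For a scalar discounted reward $Y$ with finitely many values, the check in Remark 1.4.2 reduces to an elementary rule: $0$ lies in the relative interior of the convex hull of the support iff either the support is $\{0\}$, or $Y$ charges both strictly positive and strictly negative values. A small sketch of my own:

```python
import numpy as np

def arbitrage_free_1d(values, probs):
    """NA check for a scalar discounted reward with finite support."""
    support = values[probs > 0]
    if np.all(support == 0):
        return True                  # conv hull = {0}, relint = {0}, contains 0
    # otherwise 0 must be an interior point of the support's convex hull
    return bool(np.any(support > 0) and np.any(support < 0))

# discounted net gain -10, 0 or +20 with positive probability: arbitrage free
assert arbitrage_free_1d(np.array([-10.0, 0.0, 20.0]), np.array([0.3, 0.3, 0.4]))

# gain >= 0 and sometimes > 0: this is an arbitrage
assert not arbitrage_free_1d(np.array([0.0, 20.0]), np.array([0.5, 0.5]))
```

In dimension $d > 1$ the same idea requires an actual relative-interior test of the convex hull, e.g. by linear programming.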

Proof. We show first that the set of barycenters (mean values) of probability measures $\nu \sim \mu$ on $\mathbb R^d$ contains the left-hand side of (1.4.1).

For $m \in \operatorname{RelativeInterior}\big( \operatorname{ConvexHull}(\operatorname{Support}(\mu)) \big)$, we can always consider the shifted measure $\widetilde\mu(B) = \mu(B + m)$, $B \in \mathcal B(\mathbb R^d)$, such that
$$ 0 \in \operatorname{RelativeInterior}\Big( \operatorname{ConvexHull}\big( \operatorname{Support}(\widetilde\mu) \big) \Big). \tag{1.4.2} $$

Without loss of generality it is enough to show that (1.4.2) implies the following no-arbitrage condition:

d ∀ξ ∈ R is such that µ(y : ξ · y ≥ 0) = 1, then µ(y : ξ · y = 0) = 1. (1.4.3)

Otherwise, $\exists \xi \in \mathbb R^d$, $\delta > 0$, such that $\mu(\{y : \xi\cdot y \ge 0\}) = 1$ and $\mu(\{y : \xi\cdot y > \delta\}) > 0$. In this case

$$ \operatorname{Support}(\mu) \subseteq \{ y : \xi\cdot y \ge 0 \}, $$
a closed half-space, but the support is not contained in the (closed) hyperplane $\{y : \xi\cdot y = 0\}$. By the definition of support, $\xi\cdot y \ge 0$ $\forall y \in \operatorname{Support}(\mu)$ and $\exists y^* \in \operatorname{supp}(\mu)$ such that $\xi\cdot y^* > 0$. Since $y^* \in \operatorname{ConvexHull}(\operatorname{Support}(\mu))$, by assumption (1.4.2) and the definition of relative interior,

$$ -\varepsilon y^* \in \operatorname{ConvexHull}\big(\operatorname{Support}(\mu)\big) $$
for some $\varepsilon > 0$, which means

$$ -\varepsilon y^* = \alpha_1 y_1 + \dots + \alpha_n y_n $$
for some $n \in \mathbb N$, $y_k \in \operatorname{supp}(\mu)$, $\alpha_k > 0$ with $\alpha_1 + \dots + \alpha_n = 1$, and we obtain the contradiction

$$ 0 > -\varepsilon\, \xi\cdot y^* = \alpha_1\, \xi\cdot y_1 + \dots + \alpha_n\, \xi\cdot y_n \ge 0. $$
Therefore (1.4.3) must hold, and by the fundamental theorem of asset pricing on the canonical space $\Omega = \mathbb R^d$ equipped with the probability measure $\mu$, there is an equivalent risk-neutral measure $\mu^* \sim \mu$ with $\frac{d\mu^*}{d\mu} \in L^\infty(\mu)$, such that
$$ \int_{\mathbb R^d} |y|\, \mu^*(dy) < \infty \quad\text{and}\quad \int_{\mathbb R^d} y\, \mu^*(dy) = 0, $$
which proves that $0 \in M_b(\mu)$.

In the opposite direction, assume that for some $\nu \sim \mu$
$$ \int_{\mathbb R^d} |y|\, \nu(dy) < \infty \quad\text{and}\quad m = \int_{\mathbb R^d} y\, \nu(dy) \notin C, \qquad C = \operatorname{RelativeInterior}\big( \operatorname{ConvexHull}(\operatorname{Support}(\mu)) \big). $$

Again, by shifting the measure we can assume that $m = 0 \in \mathbb R^d$. By the finite dimensional Separating Hyperplane Theorem applied to the convex set $C$, there is a vector $\xi \in \mathbb R^d$ such that $\xi\cdot y \ge 0$ $\forall y \in C$ and $\xi\cdot y^* > 0$ for some $y^* \in C$. By taking limits, $\xi\cdot y \ge 0$ also $\forall y \in \operatorname{ConvexHull}(\operatorname{Support}(\mu))$, and $\xi\cdot y_0 > 0$ strictly for some $y_0 \in \operatorname{Support}(\mu)$. Hence
$$ \{ y : \xi\cdot y > 0 \} \cap \operatorname{supp}(\mu) \ne \emptyset, $$
which by the property of the support implies that
$$ \mu\big( \{ y : \xi\cdot y > 0 \} \big) > 0 \quad\text{together with}\quad \mu\big( \{ y : \xi\cdot y \ge 0 \} \big) = 1. $$
By the equivalence of $\mu$ and $\nu$, this is true also if we replace $\mu$ with the measure $\nu$, giving the contradiction
$$ 0 = \xi\cdot 0 = \xi\cdot \int_{\mathbb R^d} y\, \nu(dy) = \int_{\mathbb R^d} \xi\cdot y\, \nu(dy) > 0. $$
We conclude that
$$ M(\mu) \subseteq \operatorname{RelativeInterior}\big( \operatorname{ConvexHull}(\operatorname{Support}(\mu)) \big). $$

1.5 First Fundamental Theorem in Multiperiod models

Definition 1.5.1. On a probability space $(\Omega,\mathcal F)$, a filtration $\mathbb F = (\mathcal F_t : t \ge 0)$ is a nondecreasing family of $\sigma$-algebras indexed by time, i.e. $\mathcal F_s \subseteq \mathcal F_t \subseteq \mathcal F$ $\forall s \le t$. We say that a stochastic process $(S_t(\omega) : t \ge 0)$ is $\mathbb F$-adapted if $\omega \mapsto S_t(\omega)$ is $\mathcal F_t$-measurable $\forall t$. In discrete time, we say that a stochastic process $(S_t(\omega) : t \in \mathbb N)$ is $\mathbb F$-predictable if $\omega \mapsto S_t(\omega)$ is $\mathcal F_{t-1}$-measurable $\forall t \ge 1$.

Definition 1.5.2. We say that a stochastic process $(S_t(\omega) : t \ge 0)$ is an $(\mathbb F, P)$-supermartingale if

1. $S_t \in L^1(\Omega,\mathcal F_t,P)$ $\forall t$, i.e. $S$ is $\mathbb F$-adapted and integrable w.r.t. $P$;

2. the supermartingale property holds: $\forall r \le t$
$$ E_P\big( S_t \mid \mathcal F_r \big)(\omega) \le S_r(\omega). $$

We say that $S_t$ is an $(\mathbb F,P)$-submartingale if $(-S_t)$ is an $(\mathbb F,P)$-supermartingale, i.e. $\forall r \le t$
$$ E_P\big( S_t \mid \mathcal F_r \big)(\omega) \ge S_r(\omega), $$
and $S_t$ is an $(\mathbb F,P)$-martingale when it is both an $(\mathbb F,P)$-submartingale and an $(\mathbb F,P)$-supermartingale, i.e. $\forall r \le t$
$$ E_P\big( S_t \mid \mathcal F_r \big)(\omega) = S_r(\omega). $$
We consider now a stochastic process of $(d+1)$ asset prices

$$ S_t(\omega) = \big( S^0_t(\omega), S^1_t(\omega), \dots, S^d_t(\omega) \big) \in \mathbb R^{d+1}_+ $$
with time index $t \in \{0,1,\dots,T\}$, $T > 1$. We assume that $S_t$ is adapted to the filtration $\mathbb F = (\mathcal F_t,\ 0 \le t \le T)$, with $S^j_t(\omega) \ge 0$ $\forall j, t$ and $S^0_t(\omega) > 0$ with $P$-probability $1$. We choose the asset $S^0_t$ as numeraire and consider the discounted prices

$$ \widetilde S_t(\omega) = \big( 1, \widetilde S^1_t(\omega), \dots, \widetilde S^d_t(\omega) \big) \in \mathbb R^{d+1}_+ \quad\text{with}\quad \widetilde S^j_t(\omega) = \frac{S^j_t(\omega)}{S^0_t(\omega)}. $$

We say that $Q \sim P$ is an equivalent martingale measure (or equivalent risk-neutral measure for the market with asset prices $S$) if the $\widetilde S^j$ are $Q$-martingales $\forall\, 1 \le j \le d$. The fundamental theorem of asset pricing in discrete time says that the market with price process $S_t$ is arbitrage-free if and only if there exists an equivalent martingale measure $Q \sim P$ with essentially bounded density $\frac{dQ}{dP} \in L^\infty(P)$. Note that this extends the previous version of the fundamental theorem also to the case where the initial prices at time $t = 0$ are random variables. The idea of the proof by Kreps and Yan is as follows: assuming that there are no arbitrage possibilities between time $t$ and time $(t+1)$, by the infinite dimensional version of the separating hyperplane theorem there is a random variable in $L^\infty(P)$ (which is the dual of $L^1(P)$) which turns out to be the positive density $\frac{dQ}{dP}$ of a risk-neutral measure $Q$. Consider the set of discounted gains achievable by a predictable strategy between time $t$ and time $(t+1)$:

$$ \mathcal K_t = \big\{ \xi\cdot(\widetilde S_{t+1} - \widetilde S_t) : \xi \in L^0(\mathcal F_t) \big\} \subseteq L^0(\Omega, \mathcal F_{t+1}, P). $$

The No-Arbitrage Condition at time t (NAt) means that

$$ \mathcal K_t \cap L^0(\mathbb R_+) = \{0\}, $$
which means that the only non-negative discounted gain which can be obtained using a predictable strategy is identically zero. Equivalently,

$$ \widetilde{\mathcal K}_t \cap L^0(\mathbb R_+) = \{0\} \tag{1.5.1} $$
where

$$ \widetilde{\mathcal K}_t = \mathcal K_t - L^0(\mathbb R_+) $$
is the set of discounted gains obtained by predictable strategies after subtracting a non-negative consumption. Note that $\mathcal K_t$ is a linear subspace of random variables (closed under linear combinations) and $\widetilde{\mathcal K}_t$ is a convex cone (closed under linear combinations with positive coefficients). We will need the following technical lemma.

Lemma 1.5.1. Under the (NA$_t$) condition (1.5.1), $\widetilde{\mathcal K}_t$ is closed under convergence in probability.

Proof. See Paragraph 1.2.3.2 in Fundamentals and Advanced Techniques in Derivatives Hedging, by Bruno Bouchard and Jean-François Chassagneux.

Proof of the Kreps-Yan Theorem. If the (NA$_t$) condition (1.5.1) holds, we show that there exists

$$ \zeta_t \in L^\infty(\Omega,\mathcal F_t,P), \qquad t = 1,\dots,T, $$
such that
$$ E_P\big( \zeta_{t+1}(\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big) = 0 \quad\text{and}\quad P(\zeta_t > 0) = 1. $$
Then, by using the $\zeta_t$, we shall construct an equivalent martingale measure $Q$. $\forall A \in \mathcal F_{t+1}$ with $P(A) > 0$, by the (NA$_t$) condition $\mathbf 1_A \notin \mathcal K_t$, and also $\mathbf 1_A \notin \widetilde{\mathcal K}_t$. The set $\widetilde{\mathcal K}_t \cap L^1(P)$ is closed in $L^1(\Omega,\mathcal F_{t+1},P)$: namely, if a sequence $X_n \in \widetilde{\mathcal K}_t \cap L^1(P)$ converges to $X$ in $L^1(P)$, it converges to $X$ also in probability, and since by Lemma 1.5.1 $\widetilde{\mathcal K}_t$ is closed under convergence in probability, necessarily $X \in \widetilde{\mathcal K}_t$.

We recall now that the dual space of $L^1(P)$ is $L^\infty(P)$ (the space of $P$-essentially bounded random variables). Since $\widetilde{\mathcal K}_t \cap L^1(P)$ is closed and convex in $L^1(\Omega,\mathcal F_{t+1},P)$, and the singleton $\{\mathbf 1_A\}$ with $A \in \mathcal F_{t+1}$ is obviously convex and compact, by the Hahn-Banach strict separating hyperplane theorem there exists a random variable $Y = Y_{t+1} \in L^\infty(\Omega,\mathcal F_{t+1},P)$ such that
$$ \sup_{X \in \widetilde{\mathcal K}_t \cap L^1(P)} E_P(YX) < E_P(Y\mathbf 1_A), \tag{1.5.2} $$
and, because $\widetilde{\mathcal K}_t$ is a cone, $E_P(Y\mathbf 1_A) > 0$ and
$$ \sup_{X \in \widetilde{\mathcal K}_t \cap L^1(P)} E_P(YX) \le 0. \tag{1.5.3} $$

From (1.5.3) it follows, taking $X = -\mathbf 1(Y < 0) \in \widetilde{\mathcal K}_t$ (it corresponds to an investment and consumption strategy with only the consumption term), that $P(Y \ge 0) = 1$. We show that there is also an element $\widehat Y = \widehat Y_{t+1} \in L^\infty(\Omega,\mathcal F_{t+1},P)$ satisfying (1.5.3) with $P(\widehat Y > 0) = 1$. Let
$$ \mathcal G_{t+1} = \big\{ Y \in L^\infty(\Omega,\mathcal F_{t+1},P) : Y \ge 0\ P\text{-a.s., satisfying (1.5.3)} \big\}, $$
which is also a convex cone, and for each $Y$ let $s(Y) := \{\omega : Y(\omega) > 0\} \in \mathcal F_{t+1}$. Let $(Y_n) \subset \mathcal G_{t+1}$ be such that
$$ P\big( s(Y_n) \big) \to \sup_{Y \in \mathcal G_{t+1}} P\big( s(Y) \big). $$

Then

$$ \widehat Y(\omega) = \sum_{n > 0} 2^{-n}\big( 1 + \| Y_n \|_\infty \big)^{-1} Y_n(\omega) \in \mathcal G_{t+1} $$
with
$$ P\big( s(\widehat Y) \big) = \max\big\{ P(s(Y)) : Y \in \mathcal G_{t+1} \big\}. $$

We show that $P(s(\widehat Y)) = 1$. Otherwise the complement of $s(\widehat Y)$ has $P\big( s(\widehat Y)^c \big) > 0$,

and by the previous argument with $A = s(\widehat Y)^c \in \mathcal F_{t+1}$ there is $\bar Y \in \mathcal G_{t+1}$ such that

$$ \sup_{X \in \widetilde{\mathcal K}_t \cap L^1(P)} E_P(\bar Y X) \le 0 < E_P\big( \bar Y\, \mathbf 1_{s(\widehat Y)^c} \big). $$

This implies $P\big( s(\bar Y) \cap s(\widehat Y)^c \big) > 0$ (strictly) and $P\big( s(\bar Y + \widehat Y) \big) > P\big( s(\widehat Y) \big)$, which contradicts the maximal property of $\widehat Y$. Therefore $P(\widehat Y > 0) = 1$. Applying (1.5.3) to the strategies
$$ e_j\, \mathbf 1\Big( E_P\big( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t \big) > 0 \Big) \quad\text{and}\quad -e_j\, \mathbf 1\Big( E_P\big( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t \big) < 0 \Big) \in L^\infty(\mathcal F_t), $$
investing in $\pm 1$ units of the $j$-th asset, $j = 1,\dots,d$, gives
$$ 0 \ge E_P\Big( \mathbf 1\big( E_P( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t ) > 0 \big)\, (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \Big) = E_P\Big( \mathbf 1\big( E_P( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t ) > 0 \big)\, E_P\big( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t \big) \Big) $$
and
$$ 0 \le E_P\Big( \mathbf 1\big( E_P( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t ) < 0 \big)\, (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \Big) = E_P\Big( \mathbf 1\big( E_P( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t ) < 0 \big)\, E_P\big( (\widetilde S^j_{t+1} - \widetilde S^j_t)\widehat Y \mid \mathcal F_t \big) \Big) $$

(by the properties of the conditional expectation), which implies
$$ E_P\big( (\widetilde S_{t+1} - \widetilde S_t)\,\widehat Y \mid \mathcal F_t \big) = 0 \quad P\text{-almost surely}. $$

Note that $\widehat Y = \widehat Y_{t+1} \in L^\infty(\Omega,\mathcal F_{t+1},P)$, and by normalizing we can assume without loss of generality that $E_P(\widehat Y) = 1$. We do this starting from the end, for the transition from $t = T-1$ to $t+1 = T$, obtaining in this way the probability measure $Q^{(T)} \sim P$ with bounded density

$$ 0 < \frac{dQ^{(T)}}{dP} = \widehat Y_T(\omega) \in L^\infty(\Omega,\mathcal F_T,P). $$
Now, starting from this new probability $Q^{(T)} \sim P$, we repeat the argument for the previous transition from $t = T-2$ to $t+1 = T-1$, finding a probability measure $Q^{(T-1)} \sim Q^{(T)}$ with bounded density

$$ 0 < \frac{dQ^{(T-1)}}{dQ^{(T)}} = \widehat Y_{T-1}(\omega) \in L^\infty(\Omega,\mathcal F_{T-1},Q^{(T)}), $$
and continue inductively, until the first transition from $t = 0$ to $t+1 = 1$, where we find a probability measure $Q^{(1)} \sim Q^{(2)}$ with bounded density

$$ 0 < \frac{dQ^{(1)}}{dQ^{(2)}} = \widehat Y_1(\omega) \in L^\infty(\Omega,\mathcal F_1,Q^{(2)}). $$

Note that all these probabilities are equivalent to $P$, and the last one, $Q = Q^{(1)}$, has likelihood ratio
$$ Z_T := \frac{dQ^{(1)}}{dP} = \frac{dQ^{(1)}}{dQ^{(2)}}\,\frac{dQ^{(2)}}{dQ^{(3)}} \cdots \frac{dQ^{(T-1)}}{dQ^{(T)}}\,\frac{dQ^{(T)}}{dP} = \widehat Y_1 \widehat Y_2 \cdots \widehat Y_{T-1} \widehat Y_T \in L^\infty(\Omega,\mathcal F_T,P). $$
We introduce the likelihood ratio process $Z_t = E_P(Z_T \mid \mathcal F_t)$, $t = 1,\dots,T$. We have

1. $P(Z_t > 0) = 1$,

2. $Z_t \in L^\infty(\Omega,\mathcal F_t,P)$,

3. $E_P(Z_{t+1} \mid \mathcal F_t) = Z_t$ (martingale property),

4. $E_P(Z_t) = 1$,

i.e. $Z_t$ is an $(\mathbb F,P)$-martingale which is positive and essentially bounded. For $0 \le t \le T-1$,
$$ E_P\big( Z_T(\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big) = E_P\big( \widehat Y_1 \cdots \widehat Y_T\, (\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big) = \widehat Y_1 \cdots \widehat Y_t\, \underbrace{E_P\big( \widehat Y_{t+1} \cdots \widehat Y_T\, (\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big)}_{=0} = 0, $$
since for $t+1 = T$
$$ E_P\big( \widehat Y_T (\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big) = 0 $$
and for $t < T-1$
$$ E_P\big( \widehat Y_{t+1} \cdots \widehat Y_T\, (\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big) = E_{Q^{(t+2)}}\big( \widehat Y_{t+1} (\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big) = 0. $$
By the abstract Bayes formula (2.1.1),
$$ 0 = \frac{E_P\big( Z_T(\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big)}{E_P(Z_T \mid \mathcal F_t)} = \frac{E_P\big( Z_{t+1}(\widetilde S_{t+1} - \widetilde S_t) \mid \mathcal F_t \big)}{Z_t} = E_Q\big( \widetilde S_{t+1} - \widetilde S_t \mid \mathcal F_t \big), $$

and $Q$ with $\frac{dQ}{dP} = Z_T \in L^\infty(P)$ is an equivalent martingale measure for the discounted asset values $\widetilde S_t$ on the discrete time interval $\{0,1,\dots,T\}$. $\square$

Exercise 1.5.1. Let $Y \in L^\infty(\Omega,\mathcal F,P)$ with $Y > 0$ $P$-a.s. Since $L^\infty(P) \subseteq L^1(P)$, after normalization

$$ Y / E_P(Y) $$
is also an essentially bounded random variable. However, if $\mathcal G \subset \mathcal F$ is a sub-$\sigma$-algebra which is not $P$-independent from the r.v. $Y$, it is not always true that

$$ Y / E_P(Y \mid \mathcal G) \in L^\infty(P). $$

Construct a counterexample.

We prove now that the existence of an equivalent martingale measure $Q$ for $\widetilde S_t$ implies absence of arbitrage between consecutive times (NA$_t$). Let $Q$ be an equivalent martingale measure for $\widetilde S_t$, and for a fixed $t \le T$ let $\xi_t \in \mathbb R^{d+1}$ be an $\mathcal F_{t-1}$-measurable investment strategy which has, between times $(t-1)$ and $t$, discounted profit

$$ \Delta\widetilde V^\xi_t = \xi_t \cdot \Delta\widetilde S_t \ge 0. $$
Note that in general the discounted portfolio profits $\xi\cdot\Delta\widetilde S_t$, which are the increments of the discounted value process

$$ \widetilde V^\xi_t = \widetilde V^\xi_0 + \sum_{k=1}^t \xi_k \cdot \Delta\widetilde S_k = \widetilde V^\xi_0 + (\xi \bullet \widetilde S)_t $$

(with the martingale transform notation), need not be $Q$-integrable unless $\xi_t \in L^\infty(P)$, and without integrability the process $\widetilde V^\xi_t$ is not necessarily a martingale (it is a generalized martingale, as explained in the book by Shiryaev, Probability, Part II Chapter 7.1). Because of that, we use a truncation argument: for each $n \in \mathbb N$ consider the bounded investment strategy $\xi^{(n)}_t = \xi_t\, \mathbf 1(|\xi_t| \le n)$. Then also

$$ \xi^{(n)}_t \cdot \Delta\widetilde S_t \ge 0 $$
and

$$ E_Q\big( \xi^{(n)}_t \cdot \Delta\widetilde S_t \big) = 0, $$
which implies $\xi_t\cdot\Delta\widetilde S_t\, \mathbf 1(|\xi_t| \le n) = 0$ $P$-a.s.; and since this holds $\forall n$ and $P(|\xi_t| > n) \downarrow 0$, it follows by $\sigma$-additivity of the probability measure that $\Delta\widetilde V^\xi_t = 0$ $P$-a.s. $\square$

Lemma 1.5.2. (NA$_t$) is equivalent to (NA$_{tb}$) below.

(NA$_{tb}$): there is no arbitrage with bounded predictable strategies between consecutive times $0 \le t-1 < t \le T$; that is, when $\xi_t \in L^\infty(\Omega,\mathcal F_{t-1},P)$ is $\mathbb R^d$-valued and $\xi_t\cdot\Delta\widetilde S_t \ge 0$ $P$-a.s., necessarily $\xi_t\cdot\Delta\widetilde S_t = 0$.

Proof. Obviously (NA$_t$) $\Rightarrow$ (NA$_{tb}$). In the other direction, if $\xi_t$ is $\mathcal F_{t-1}$-measurable and $\xi_t\cdot\Delta\widetilde S_t \ge 0$ $P$-a.s., for each $n \in \mathbb N$ also

$$ \mathbf 1(|\xi_t| \le n)\, \xi_t\cdot\Delta\widetilde S_t \ge 0 \quad P\text{-a.s.}, $$

where $\mathbf 1(|\xi_t| \le n)\,\xi_t$ is in $L^\infty(\Omega,\mathcal F_{t-1},P)$, and by (NA$_{tb}$)

$$ \mathbf 1(|\xi_t| \le n)\, \xi_t\cdot\Delta\widetilde S_t = 0 \quad P\text{-a.s.}, $$
and taking the limit as $n \to \infty$, (NA$_t$) follows.

Lemma 1.5.3. (NA) is equivalent to (NA$_t$), and (NA$_b$) is equivalent to (NA$_{tb}$).

Proof. Obviously (NA) $\Rightarrow$ (NA$_t$) and (NA$_b$) $\Rightarrow$ (NA$_{tb}$). In the other direction, we show that if there is a predictable arbitrage strategy $\xi_t$ in the time interval $\{1,\dots,T\}$ such that
$$ (\xi\bullet\widetilde S)_T \ge 0 \ P\text{-a.s.} \quad\text{and}\quad (\xi\bullet\widetilde S)_T > 0 \ \text{with positive } P\text{-probability}, $$
then there has to be an arbitrage between some consecutive times $0 \le t-1 < t \le T$. Let
$$ t = \inf\Big\{ r : P\big( (\xi\bullet\widetilde S)_r \ge 0 \big) = 1 \ \text{and}\ P\big( (\xi\bullet\widetilde S)_r > 0 \big) > 0 \Big\}. $$
Necessarily $t \le T$, and since $t$ is the smallest time with arbitrage possibilities in the interval $\{0,1,\dots,t\}$,

1. either $P\big( (\xi\bullet\widetilde S)_{t-1} = 0 \big) = 1$, or

2. $P\big( (\xi\bullet\widetilde S)_{t-1} < 0 \big) > 0$.

In case (1),

$$ \xi_t\cdot\Delta\widetilde S_t = (\xi\bullet\widetilde S)_t - (\xi\bullet\widetilde S)_{t-1} = (\xi\bullet\widetilde S)_t \quad P\text{-a.s.}, $$
which is non-negative with $P$-probability $1$ and strictly positive with positive $P$-probability; and in case (2), let $A = \{\omega : (\xi\bullet\widetilde S)_{t-1} < 0\} \in \mathcal F_{t-1}$, with $P(A) > 0$. Then

$$ \mathbf 1_A\, \xi_t\cdot\Delta\widetilde S_t = (\xi\bullet\widetilde S)_t\, \mathbf 1_A - (\xi\bullet\widetilde S)_{t-1}\, \mathbf 1_A \ge -(\xi\bullet\widetilde S)_{t-1}\, \mathbf 1_A \ge 0 \quad P\text{-a.s.}, $$
where the last term on the right is strictly positive with positive $P$-probability. Therefore $\xi_t \mathbf 1_A$ is an arbitrage strategy between the consecutive times $t-1, t$. $\square$

To summarize:
$$ \mathcal P^* \ne \emptyset \iff \text{(NA)} \iff \text{(NA}_t\text{)} \iff \text{(NA}_{tb}\text{)} \iff \text{(NA}_b\text{)}. $$
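The multiperiod statement can be illustrated in a binomial tree (a standard example of my own, not from the notes): building $Q$ from $q = \frac{1+r-d}{u-d}$ at every node makes the discounted price $S_t/(1+r)^t$ a $Q$-martingale, so in particular $E_Q[S_T](1+r)^{-T} = S_0$.

```python
import math

# T-period binomial tree with constant up/down factors and interest rate r
r, u, d, S0, T = 0.02, 1.1, 0.95, 100.0, 3
q = (1 + r - d) / (u - d)            # risk-neutral one-step up-probability
assert 0 < q < 1                     # d < 1 + r < u

# E_Q[S_T] summed over the number of up-moves k (binomial distribution)
expectation = sum(
    math.comb(T, k) * q**k * (1 - q)**(T - k) * S0 * u**k * d**(T - k)
    for k in range(T + 1)
)

# the discounted terminal expectation recovers the initial price
assert abs(expectation / (1 + r)**T - S0) < 1e-9
```

The same identity holds at every intermediate node, which is exactly the martingale property of $\widetilde S_t$ under $Q$.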

1.6 Change of numeraire

In an arbitrage free market the set of risk neutral measures
$$ \mathcal P^* = \Big\{ Q \sim P : \frac{dQ}{dP} \in L^\infty(P),\ E_Q\Big( \frac{S_i}{S_0} \Big) = \frac{\pi_i}{\pi_0},\ i = 1,\dots,d \Big\} $$
depends on the choice of the numeraire. In fact, instead of choosing $S_0$ we could choose as numeraire any other asset (say for example $S_1$) with the property $P(S_1 > 0) = 1$. If $Q \in \mathcal P^*$ is risk neutral with numeraire $S_0$, we look for a new measure $\widehat Q$ with likelihood ratio $\widehat Z = \frac{d\widehat Q}{dQ} \in L^\infty$ such that
$$ E_{\widehat Q}\Big( \frac{S_i}{S_1} \Big) = \frac{\pi_i}{\pi_1} \qquad \forall\, 0 \le i \le d. $$

By the change of measure formula this means that ∀i

$$ \frac{\pi_i}{\pi_1} = E_Q\Big( \frac{S_i}{S_1}\,\frac{d\widehat Q}{dQ} \Big), $$
and by multiplying and dividing by $\pi_0$ and by $S_0$ under the expectation
$$ \frac{\pi_i}{\pi_0}\,\frac{\pi_0}{\pi_1} = E_Q\Big( \frac{S_i}{S_0}\,\frac{S_0}{S_1}\,\frac{d\widehat Q}{dQ} \Big), $$
and since $Q$ is risk neutral for the numeraire $S_0$, this is true when

$$ \frac{d\widehat Q}{dQ}(\omega) = \frac{\pi_0}{\pi_1}\,\frac{S_1(\omega)}{S_0(\omega)}. $$

Note that it is always possible to find $Q \sim P$ and $\widehat Q \sim P$ risk neutral with numeraires $S_0$ and $S_1$ respectively, such that both $\frac{dQ}{dP}$ and $\frac{d\widehat Q}{dP}$ are $P$-essentially bounded.
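The numeraire-change formula can be verified numerically on a finite sample space. The numbers below are my own two-state illustration: starting from a risk-neutral $Q$ for numeraire $S_0$, reweighting by $\frac{d\widehat Q}{dQ} = \frac{\pi_0 S_1}{\pi_1 S_0}$ produces a probability $\widehat Q$ that is risk neutral for numeraire $S_1$.

```python
import numpy as np

Q = np.array([0.5, 0.5])            # risk-neutral weights on two states
S0 = np.array([1.05, 1.05])         # numeraire payoff, initial price pi0 = 1
S1 = np.array([1.30, 0.80])         # second asset
pi0 = 1.0
pi1 = pi0 * np.sum(Q * S1 / S0)     # price S1 consistently with Q

Z = (pi0 * S1) / (pi1 * S0)         # dQhat/dQ
Qhat = Q * Z

assert abs(np.sum(Qhat) - 1.0) < 1e-12          # Z has Q-expectation 1
# under Qhat, discounting by S1 reproduces the price ratios pi_i / pi1:
assert abs(np.sum(Qhat * S0 / S1) - pi0 / pi1) < 1e-12
```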

1.7 Option pricing

Consider an arbitrage free market model with random assets $(S^i_t : i = 0,1,\dots,d)$, with $S^i_t \ge 0$ $P$-almost surely and initial prices $(S^i_0 = \pi^i : i = 0,1,\dots,d)$ on a probability space $(\Omega,\mathcal F,P)$. Since the market is arbitrage free, the set of risk-neutral measures equivalent to the reference probability $P$ (with respect to the numeraire $S^0_t$ satisfying the assumption $P(S^0 > 0) = 1$)
$$ \mathcal P^* = \Big\{ Q \sim P : \frac{dQ}{dP} \in L^\infty(P) \ \text{and}\ E_Q\Big( \frac{S^i_1}{S^0_1} \Big) = \frac{S^i_0}{S^0_0} \Big\} $$
is nonempty. For a new random contingent claim $F(\omega) \ge 0$, we want to find the set of arbitrage free prices $\mathcal C(F) \subset \mathbb R_+$ such that for $c \in \mathcal C(F)$ the extended market model with assets $(S^0, S^1, \dots, S^d, F)$ and initial prices $(\pi^0, \pi^1, \dots, \pi^d, c)$ remains arbitrage-free. The 1st fundamental theorem implies that
$$ \mathcal C(F) = \Big\{ c = \pi^0\, E_Q\Big( \frac{F}{S^0_T} \Big) : Q \in \mathcal P^* \ \text{and the expectation is finite} \Big\}. $$
When $F(\omega) = f(S_T(\omega))$ for some measurable function $f : \mathbb R^{d+1}_+ \to \mathbb R_+$ of the final asset values, we say that $F$ is an option of european type. For example,
$$ F^{call,K}(\omega) = \big( S^1_T(\omega) - K \big)^+ $$
is the european call option on the underlying asset $S^1$ with strike price $K > 0$, and

$$ F^{put,K}(\omega) = \big( K - S^1_T(\omega) \big)^+ $$
is the european put option on the underlying asset $S^1$ with (deterministic) strike price $K > 0$. Note that we have the put-call parity

$$ \big( S^1_T(\omega) - K \big)^+ - \big( K - S^1_T(\omega) \big)^+ = S^1_T(\omega) - K, $$
which implies that the arbitrage free prices of call and put options, $c(F^{call,K})$ and $c(F^{put,K})$, are linked by the relation
$$ c(F^{call,K}) - c(F^{put,K}) = \pi^1 - \frac{K}{1+r}, $$
where $\pi^1 = c(S^1)$ is the price of the underlying asset $S^1$ at time $t = 0$ and $r = \frac{S^0 - \pi^0}{\pi^0} > -1$ is the deterministic return of the riskless money account.

Theorem 1.7.1. Suppose that the one-period market model with asset values $(S^0(t), S^1(t), \dots, S^d(t))$, $t \in \{0,1\}$, is arbitrage free, and that $P(S^0(t) > 0) = 1$, so we can choose the asset $S^0$ as numeraire. Then the set of arbitrage free prices of a contingent claim $F(\omega) \ge 0$ is given by
$$ \mathcal C(F) = \Big\{ S^0(0)\, E_Q\Big( \frac{F}{S^0(1)} \Big) : Q \in \mathcal P^* \ \text{such that}\ E_Q\Big( \frac{F}{S^0(1)} \Big) < \infty \Big\}, $$
where by assumption the set $\mathcal P^*$ of equivalent risk neutral measures with respect to the numeraire $S^0$ is nonempty.

Proof. $\pi(F)$ is an arbitrage free price if and only if
$$ \pi(F) = S^0(0)\, E_Q\Big( \frac{F}{S^0(1)} \Big) $$
for some $Q \in \mathcal P^*$. To show that $\mathcal C(F) \ne \emptyset$, we fix first a measure $\widetilde P \sim P$ such that
$$ E_{\widetilde P}\Big( \frac{F}{S^0(1)} \Big) < \infty. $$

This can be done by choosing $\widetilde P$ with

$$ 0 < \frac{d\widetilde P}{dP} = \frac{1}{1+F}\, E_P\Big( \frac{1}{1+F} \Big)^{-1} \in L^\infty(P). $$

Then, for $Q \in \mathcal P^*$ with $\frac{dQ}{d\widetilde P} \in L^\infty$,
$$ E_Q\Big( \frac{F}{S^0(1)} \Big) = E_{\widetilde P}\Big( \frac{dQ}{d\widetilde P}\,\frac{F}{S^0(1)} \Big) < \infty. $$

Corollary 1.7.1. In an arbitrage free one period market model, the set C(F ) of arbitrage free prices is convex, i.e. it is an interval (possibly a singleton if the price is unique) contained in R+.
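Put-call parity is easy to confirm numerically: for any risk-neutral $Q$, discounting the identity $(S^1_T - K)^+ - (K - S^1_T)^+ = S^1_T - K$ gives $c(F^{call,K}) - c(F^{put,K}) = \pi^1 - K/(1+r)$, whatever $Q$ is. The three-state numbers below are my own illustration:

```python
import numpy as np

r, K = 0.05, 100.0
Q = np.array([0.25, 0.45, 0.30])         # some risk-neutral measure
S1_T = np.array([80.0, 105.0, 125.0])    # stock values at maturity
pi1 = np.sum(Q * S1_T) / (1 + r)         # stock price implied by Q

call = np.sum(Q * np.maximum(S1_T - K, 0.0)) / (1 + r)
put = np.sum(Q * np.maximum(K - S1_T, 0.0)) / (1 + r)

# parity holds exactly, independently of the chosen Q
assert abs((call - put) - (pi1 - K / (1 + r))) < 1e-12
```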

1.7.1 Intermezzo: The Gaussian integration by parts formula

The standard Gaussian density

$$ \varphi(x) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}, \qquad x \in \mathbb R, $$
satisfies the ordinary differential equation
$$ \frac{d}{dx}\varphi(x) = -x\,\varphi(x). $$

Definition 1.7.1. If $f : \mathbb R \to \mathbb R$ is continuous with
$$ f(x) = f(x_0) + \int_{x_0}^x Df(y)\, dy $$
for some measurable function $Df(x)$, we say that $f$ is absolutely continuous and $Df(x)$ is a weak derivative. When $f$ is differentiable in the classical sense it is absolutely continuous and the classical derivative is a weak derivative. For example, for the Gaussian density
$$ \varphi(x) = \varphi(x_0) - \int_{x_0}^x y\,\varphi(y)\, dy, $$
where we can also take $x_0 = -\infty$.

Proposition 1.7.1 (Stein equation). Let $N \sim \mathcal N(0,1)$ be a standard Gaussian random variable and $f$ be absolutely continuous with
$$ E_P\big( |Df(N)| \big) < \infty. \tag{1.7.1} $$
Then we have the integration by parts formula
$$ E_P\big( Df(N) \big) = E_P\big( f(N)\, N \big). \tag{1.7.2} $$

Proof.
$$ E_P\big( f(N) N \big) = \int_{\mathbb R} x\Big( f(0) + \int_0^x Df(y)\, dy \Big)\varphi(x)\, dx $$
$$ = f(0)\underbrace{E_P(N)}_{=0} + \int_0^\infty\!\!\int_0^\infty \mathbf 1(x > y)\, Df(y)\, x\varphi(x)\, dy\, dx - \int_{-\infty}^0\!\!\int_{-\infty}^0 \mathbf 1(x < y)\, Df(y)\, x\varphi(x)\, dy\, dx $$
$$ = -\int_0^\infty\Big( \int_y^\infty D\varphi(x)\, dx \Big) Df(y)\, dy + \int_{-\infty}^0\Big( \int_{-\infty}^y D\varphi(x)\, dx \Big) Df(y)\, dy $$
$$ = \int_{-\infty}^\infty Df(y)\,\varphi(y)\, dy = E_P\big( Df(N) \big), $$
where under condition (1.7.1) we use the Fubini theorem to change the order of integration.
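The Stein equation is also easy to verify by simulation. As a sketch (my own check, not part of the notes), take $f(x) = x^3$, so $Df(x) = 3x^2$: both sides of (1.7.2) equal $E(N^4) = 3$.

```python
import numpy as np

rng = np.random.default_rng(42)
N = rng.standard_normal(2_000_000)   # standard Gaussian sample

lhs = np.mean(3 * N**2)              # E[Df(N)] with f(x) = x^3, exact value 3
rhs = np.mean(N**3 * N)              # E[f(N) N] = E[N^4],      exact value 3

assert abs(lhs - 3.0) < 0.02
assert abs(rhs - 3.0) < 0.05
```

The tolerances are generous relative to the Monte Carlo standard errors at two million samples.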

Corollary 1.7.2. If $f, g$ are absolutely continuous and $N \sim \mathcal N(0,1)$ with $f(N), g(N), Df(N), Dg(N) \in L^2(P)$, then the divergence operator
$$ \delta f(x) = D^* f(x) := f(x)\, x - Df(x) $$
is the adjoint of the weak derivative operator in the $L^2$ space of the standard Gaussian distribution, satisfying the Gaussian integration by parts formula
$$ E\big( g(N)\, Df(N) \big) = E\big( f(N)\,\delta g(N) \big) = E\big( f(N)\,( g(N) N - Dg(N) ) \big). $$

Proof. Using the integration by parts formula (1.7.2) together with the product rule of differentiation for $h(N) = f(N)g(N)$ and the linearity of the expectation, we have $E\big( |Dh(N)| \big) < \infty$ and
$$ E_P\big( f(N) g(N) N \big) = E_P\big( f(N)\, Dg(N) \big) + E_P\big( g(N)\, Df(N) \big). $$

1.7.2 Black and Scholes european call option pricing formula for the lognormal stock price

Consider the 1-period market model with $t = 0, T$, where $T > 0$ is the time horizon or maturity, and investment assets $B_t, S_t$ where
$$ B_T = B_0\, e^{rT}, \qquad S_T = S_0 \exp\Big( \Big( \mu - \frac{\sigma^2}{2} \Big) T + \sigma W_T \Big) $$

and $B_0, S_0 > 0$. Here the interest rate $r$ of the riskless investment, the drift parameter $\mu$ and the volatility parameter $\sigma$ are deterministic, and, under the reference probability $P$, $W_T$ is a Gaussian random variable with $E_P[W_T] = 0$ and variance $E_P[W_T^2] = T$. This Black & Scholes lognormal model is derived from a continuous-time stochastic process model, the geometric Brownian motion. In order to price european options with maturity $T$, it is enough to consider the marginal distribution at maturity. We recall that $W_T$ has moment generating function $E\bigl( e^{\sigma W_T} \bigr) = e^{\sigma^2 T/2}$, and write
\[
B_0\, E_P\bigl[ S_T/B_T \bigr] = e^{-rT} E_P\bigl[ S_T \bigr] = S_0\, e^{(\mu - r - \frac{\sigma^2}{2})T}\, E_P\bigl[ e^{\sigma W_T} \bigr] = S_0 \exp\bigl( (\mu - r)T \bigr).
\]
To show that this (B,S)-model is arbitrage free it is equivalent to find an equivalent risk-neutral probability measure $Q \sim P$ for the discounted stock price, which means
\[
E_Q\Bigl[ \frac{S_T}{B_T} \Bigr] = \frac{S_0}{B_0},
\]
equivalently
\[
e^{-rT} E_Q[S_T] = S_0. \tag{1.7.3}
\]

Note that $S_T$ is lognormally distributed, i.e. $\log(S_T)$ is Gaussian, and the law of $S_T$ under $P$ is equivalent to the Lebesgue measure on $[0,\infty)$. Therefore any probability distribution on $[0,\infty)$ with strictly positive density satisfying the moment condition (1.7.3) corresponds to an equivalent risk-neutral measure for this (B,S)-model. We change the probability measure by changing the drift parameter of the lognormal distribution in order to satisfy (1.7.3), keeping the same volatility parameter $\sigma$, as follows:
\[
e^{-rT} S_T = S_0 \exp\Bigl( \bigl( \mu - r - \tfrac{\sigma^2}{2} \bigr)T + \sigma W_T \Bigr) = S_0 \exp\Bigl( \sigma \widetilde W_T - \frac{\sigma^2 T}{2} \Bigr),
\]
where
\[
\widetilde W_T = W_T + \frac{(\mu - r)T}{\sigma},
\]
and we introduce an equivalent pricing probability measure $Q \sim P$ such that $\widetilde W_T$ is a zero-mean Gaussian with variance $T$ under $Q$. Equivalently, $W_T$ has mean $(r-\mu)T/\sigma$ and variance $T$ under $Q$. This is obtained by

computing the likelihood ratio $\frac{dQ}{dP}$ as the density ratio, evaluated at $W_T$, of a Gaussian with mean $(r-\mu)T/\sigma$ and variance $T$ and a zero-mean Gaussian with the same variance $T$:
\[
\frac{dQ}{dP} = \exp\Bigl( -\frac{(W_T + (\mu-r)T/\sigma)^2}{2T} \Bigr) \Big/ \exp\Bigl( -\frac{W_T^2}{2T} \Bigr)
= \exp\Bigl( -\frac{(\mu-r)W_T}{\sigma} - \frac{(\mu-r)^2 T}{2\sigma^2} \Bigr)
= \exp\Bigl( -\frac{(\mu-r)\widetilde W_T}{\sigma} + \frac{(\mu-r)^2 T}{2\sigma^2} \Bigr),
\]
the normalization factors $\frac{1}{\sqrt{2\pi T}}$ cancelling in the ratio.
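The change of measure can be checked numerically: reweighting $P$-samples of $W_T$ by the likelihood ratio $dQ/dP$ must recover the martingale condition (1.7.3). A minimal sketch, where the parameter values are our own illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
S0, r, mu, sigma, T = 100.0, 0.03, 0.08, 0.2, 1.0
W = np.sqrt(T) * rng.standard_normal(10**6)          # W_T ~ N(0, T) under P

ST = S0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * W)
# likelihood ratio dQ/dP evaluated at W_T
Z = np.exp(-(mu - r) * W / sigma - (mu - r)**2 * T / (2 * sigma**2))

price = np.exp(-r * T) * np.mean(ST * Z)             # = E_Q[e^{-rT} S_T]
print(price)                                         # close to S0 = 100
```

Without the weights $Z$ the sample mean of $e^{-rT} S_T$ would be $S_0 e^{(\mu - r)T} \ne S_0$, as computed above.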

Under $Q$ the expected return of the stock $S_T$ at maturity coincides with the return $e^{rT}$ of the riskless instrument $B_T$. Now let $F = f(S_T)$ be an option of european type; the price of the option at time $t=0$ consistent with our pricing measure is given by
\[
c(F) = e^{-rT} E_Q\bigl[ f(S_T) \bigr] = e^{-rT} E_Q\Bigl[ f\Bigl( S_0 \exp\bigl( (r - \tfrac{\sigma^2}{2})T + \sigma\widetilde W_T \bigr) \Bigr) \Bigr].
\]
In the case of a european call option $F = f(S_T) = (S_T - K)^+$ with strike price $K > 0$ we compute explicitly the Black & Scholes price:
\[
c(F) = e^{-rT} E_Q\bigl[ (S_T - K)^+ \bigr] = E_Q\bigl[ (e^{-rT}S_T - Ke^{-rT})^+ \bigr]
= E_Q\Bigl[ \Bigl( S_0 \exp\bigl( \sigma\widetilde W_T - \tfrac{\sigma^2 T}{2} \bigr) - Ke^{-rT} \Bigr)^+ \Bigr]
= E\Bigl[ \Bigl( S_0 \exp\bigl( \sigma\sqrt T\, N - \tfrac{\sigma^2 T}{2} \bigr) - Ke^{-rT} \Bigr)^+ \Bigr],
\]
where $N \sim \mathcal N(0,1)$ is a standard Gaussian random variable. Integrating with respect to the standard Gaussian density $\varphi(x)$,
\[
c(F) = \int_\alpha^\infty \Bigl( S_0\, e^{\sigma\sqrt T\, x - \frac{\sigma^2 T}{2}} - Ke^{-rT} \Bigr) \varphi(x)\,dx
\quad\text{with}\quad
\alpha = \frac{\log(K/S_0) + \bigl( \frac{\sigma^2}{2} - r \bigr)T}{\sigma\sqrt T},
\]
where, since the Gaussian density is an even function, we assumed $\sigma > 0$ without loss of generality. Denoting by $\Phi(x) = Q(N \le x) = \int_{-\infty}^x \varphi(y)\,dy = 1 - \Phi(-x)$ the cumulative distribution function of the standard Gaussian,
\[
c(F) = S_0 \int_\alpha^\infty \exp\Bigl( \sigma\sqrt T\, x - \frac{\sigma^2 T}{2} - \frac{x^2}{2} \Bigr) \frac{dx}{\sqrt{2\pi}} - Ke^{-rT}\,\Phi(-\alpha)
= S_0 \int_\alpha^\infty \exp\Bigl( -\frac{(x - \sigma\sqrt T)^2}{2} \Bigr) \frac{dx}{\sqrt{2\pi}} - Ke^{-rT}\,\Phi(-\alpha),
\]
where in the integral we have the density of the shifted Gaussian random variable $N + \sigma\sqrt T$ with mean $\sigma\sqrt T$ and variance 1, obtaining the Black & Scholes formula (1973) for the call option with strike price $K$:
\[
c(F) = S_0\, Q\bigl( N + \sigma\sqrt T > \alpha \bigr) - Ke^{-rT}\Phi(-\alpha) = S_0\,\Phi\bigl( \sigma\sqrt T - \alpha \bigr) - Ke^{-rT}\Phi(-\alpha)
\]
\[
= S_0\,\Phi\Bigl( \frac{\log(S_0/K) + (r + \frac{\sigma^2}{2})T}{\sigma\sqrt T} \Bigr) - Ke^{-rT}\,\Phi\Bigl( \frac{\log(S_0/K) + (r - \frac{\sigma^2}{2})T}{\sigma\sqrt T} \Bigr).
\]
Note that the option price $c(F)$ does not depend on the drift parameter $\mu$ of $\log S_T$ under the reference probability $P$. Also, in a discrete-time (B,S) market (when we are constrained to change our portfolio only at finitely many prefixed times) the price from the Black & Scholes formula is not the unique arbitrage-free price. It is only in continuous time, where continuous rebalancing of the portfolio is allowed, that the B&S price is the unique arbitrage-free price (when, under the reference probability $P$, $S_t$ is a geometric Brownian motion). In 1997 Myron S. Scholes and Robert C. Merton received the Nobel prize in economics for deriving the formula and inventing option pricing theory.
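The closed formula translates directly into code; a short sketch, where the variable names $d_1 = \sigma\sqrt T - \alpha$ and $d_2 = -\alpha$ are ours:

```python
from math import log, sqrt, exp
from statistics import NormalDist

Phi = NormalDist().cdf   # standard Gaussian cdf

def bs_call(S0, K, r, sigma, T):
    """Black & Scholes price of the european call (S_T - K)^+."""
    d1 = (log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    return S0 * Phi(d1) - K * exp(-r * T) * Phi(d2)

print(bs_call(100, 100, 0.05, 0.2, 1.0))   # ~ 10.45
```

As the formula predicts, the price depends on $r$ and $\sigma$ but not on the drift $\mu$.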

1.7.3 Hedging and superhedging contingent claims

Definition 1.7.1. We say that a contingent claim $F \in L^0(\Omega, \mathcal F_T, P)$ is attainable at time $T$ in the market $S_t = (S_t^0, S_t^1, \ldots, S_t^d)$ with initial prices $S_0 = \pi = (\pi^0, \pi^1, \ldots, \pi^d)$ when there is a replicating (hedging) portfolio $\xi_t \in L^0(\mathbb R^{d+1}, \mathcal F_{t-1}, P)$ which is $\mathbb F$-predictable and satisfies the self-financing condition
\[
V_t = \xi_t \cdot S_t = \xi_{t+1} \cdot S_t,
\]
such that at maturity $T$
\[
F(\omega) = V_T(\omega) = \xi_T \cdot S_T(\omega) = \sum_{j=0}^{d} \xi_T^j\, S_T^j(\omega).
\]

In such a case, when the market $(S_t)$ is arbitrage free, by the law of one price the unique arbitrage-free price for $F$ at time $t = 0$ is given by the initial price of the replicating portfolio
\[
c_0(F) = V_0 = \xi_0 \cdot \pi = \sum_{j=0}^{d} \xi_0^j\, \pi^j.
\]

Remark 1.7.1. Note that if the contingent claim is attainable, there can be different hedging portfolios, all with the same initial price.

Definition 1.7.2. In the previous setting, we say that an $\mathbb F$-predictable self-financing strategy $\xi_t$ is a superhedging (super-replicating) strategy for the option $F(\omega)$ if
\[
F(\omega) \le V_T(\omega) = \xi_T \cdot S_T \quad P\text{-almost surely,}
\]
and it is a subhedging (under-replicating) strategy if
\[
F(\omega) \ge V_T(\omega) = \xi_T \cdot S_T \quad P\text{-almost surely.}
\]

Theorem 1.7.2. In the one-period market model with $t \in \{0,1\}$, a contingent claim $F \in L^1(\Omega, \mathcal F, P)$ is attainable in the arbitrage-free market $(S_t^0, S_t^1, \ldots, S_t^d)$ with initial prices $S_0 = \pi = (\pi^0, \pi^1, \ldots, \pi^d)$ if and only if it admits a unique arbitrage-free price $c(F)$. If this is not the case, then either $C(F) = \emptyset$, or the set of arbitrage-free prices for $F$ is an open interval
\[
C(F) = \bigl( c_0^{\inf}(F),\ c_0^{\sup}(F) \bigr),
\]
where possibly $c_0^{\sup}(F) = \infty$. Moreover we have the superhedging duality
\[
c_0^{\inf}(F) = \sup C^-(F), \quad\text{where } C^-(F) = \bigl\{ V_0 = \xi_0 \cdot S_0 : \mathbb F\text{-predictable self-financing portfolio } \xi_t \text{ under-replicating } F \bigr\},
\]
and
\[
c_0^{\sup}(F) = \inf C^+(F), \quad\text{where } C^+(F) = \bigl\{ V_0 = \xi_0 \cdot S_0 : \mathbb F\text{-predictable self-financing portfolio } \xi_t \text{ super-replicating } F \bigr\}.
\]

Remark 1.7.2. $c_0^{\inf}(F)$ is the lower bound for the prices a trader can ask for selling $F$, and $c_0^{\sup}(F)$ is the upper bound for the prices a trader can offer for buying $F$, without creating arbitrage opportunities for the counterparts. When $c_0^{\inf}(F) < c_0^{\sup}(F)$, these are arbitrage prices.

Proof. We give the proof in the one-period market model with deterministic initial prices. The situation is clear if the contingent claim is attainable: by the Law of One Price, the unique arbitrage-free price for $F$ is the current price of a replicating portfolio, and in an arbitrage-free market all replicating portfolios have the same price. Otherwise, since $\mathcal P^*$ is convex, $C(F)$ is an interval when it is non-empty. We show that it is an open interval. As we have seen, $c$ is an arbitrage-free price for $F$ in the market model with assets $(S_1^i : i = 0, \ldots, d)$ and initial prices $S_0^i = \pi^i$, $i = 0, \ldots, d$, if and only if the extended market with assets $(S_1^0, S_1^1, \ldots, S_1^d, F)$ and initial prices $(S_0^i = \pi^i,\ c_0(F) = c : i = 0, \ldots, d)$ is arbitrage-free; equivalently, there exists $Q \in \mathcal P^*$ such that
\[
E_Q(\widetilde S_1) = \widetilde S_0 \qquad\text{and}\qquad E_Q(\widetilde F) = \widetilde c,
\]
where $\widetilde S_t = S_t / S_t^0$, $\widetilde F = F / S_1^0$, $\widetilde c = c / S_0^0$ are discounted with respect to the numeraire instrument $S_t^0$.

We show first that if there exists $Q^* \in \mathcal P^*$ with $E_{Q^*}(\widetilde F) = \infty$, then $\sup C(F) = +\infty$. Otherwise $F$ admits an arbitrage price $\pi > \sup C(F)$, which means that there is an arbitrage strategy $(\xi^i : i = 0, 1, \ldots, d+1)$ in the extended market with
\[
\sum_{i=0}^{d} \xi^i S_0^i + \xi^{d+1} \pi \le 0, \qquad
\sum_{i=0}^{d} \xi^i S_1^i + \xi^{d+1} F \ge 0 \ \ P\text{-a.s., with strict inequality with positive probability.}
\]
Necessarily $\xi^{d+1} \ne 0$, because $\xi^{d+1} = 0$ would mean that $\xi$ is an arbitrage strategy in the $(S_t)$-market, and when $\xi^{d+1} < 0$,
\[
0 \le F \le G = -\,\frac{\sum_{i=0}^{d} \xi^i S_1^i}{\xi^{d+1}},
\]
which leads to a contradiction, since after discounting by the numeraire the right-hand side is $Q^*$-integrable while $\widetilde F$ is not. Hence $\xi^{d+1} > 0$, and
\[
\widetilde F \ge \widetilde G = -\,\frac{\sum_{i=0}^{d} \xi^i \widetilde S_1^i}{\xi^{d+1}}
\]
is the discounted value of an under-replicating portfolio. For $n \ge 1$ let $F_n = F \wedge (G \vee 0 + n)$, with $0 \le F_n \uparrow F$. Then
\[
\xi^{d+1}(F_n - G) = \sum_{i=0}^{d} \xi^i S_1^i + \xi^{d+1} F_n \ge 0
\]
$P$-a.s., with strict inequality with positive probability for $n$ such that $P(F < n) > 0$. Therefore for $n$ large enough $\pi$, which was assumed to be an arbitrage price for $F$, is also an arbitrage price for $F_n$, with the same arbitrage strategy $\xi$. For all $n \in \mathbb N$, $E_{Q^*}(\widetilde F_n) < \infty$, and by the monotone convergence theorem
\[
E_{Q^*}(\widetilde F_n) \uparrow E_{Q^*}(\widetilde F) = \infty,
\]
so for $n$ large enough $E_{Q^*}(\widetilde F_n) > \widetilde\pi$. Since we assumed $C(F) \ne \emptyset$, there exists $P^* \in \mathcal P^*$ such that
\[
E_{P^*}(\widetilde F_n) \le E_{P^*}(\widetilde F) < \widetilde\pi,
\]
and for some $\alpha \in (0,1)$
\[
E_{P^\alpha}(\widetilde F_n) = \widetilde\pi, \qquad\text{where } P^\alpha = \alpha Q^* + (1-\alpha) P^* \in \mathcal P^*.
\]
In contradiction with the previous statements, $\pi$ would be an arbitrage-free price for $F_n$.

We show now that when $F$ is not attainable in the $(S_t)$-market, if $\pi \in C(F)$ then there are $\check\pi, \hat\pi \in C(F)$ with $0 < \check\pi < \pi < \hat\pi$. Let $P^* \in \mathcal P^*$ be such that $\widetilde F \in L^1(\Omega, \mathcal F_T, P^*)$ and
\[
\pi = S_0^0\, E_{P^*}\Bigl( \frac{F}{S_1^0} \Bigr) = S_0^0\, E_{P^*}(\widetilde F).
\]
The set of discounted portfolio values

\[
\widetilde{\mathcal V} = \bigl\{ \xi \cdot \widetilde S_1 : \xi \in \mathbb R^{d+1} \bigr\} \subset L^1(\Omega, \mathcal F_1, P^*)
\]
is a closed linear space of dimension $\le d+1$. Since $F$ is not attainable, $\widetilde F \notin \widetilde{\mathcal V}$, and by the Hahn–Banach separating hyperplane theorem there is an element of the dual space $X \in L^\infty(\Omega, \mathcal F_1, P^*)$ such that
\[
\sup_{\widetilde V \in \widetilde{\mathcal V}} E_{P^*}(X \widetilde V) \le 0 < E_{P^*}(X \widetilde F),
\]
with strict separation. Since $\widetilde{\mathcal V}$ is a linear space,
\[
E_{P^*}(X \widetilde V) = 0 \quad \forall\, \widetilde V \in \widetilde{\mathcal V}.
\]
By rescaling we can assume that $|X| \le 1/2$ $P$-a.s., and consider the probability measure $\widehat P \sim P^* \sim P$ with
\[
\frac{d\widehat P}{dP^*}(\omega) = 1 + X(\omega) > 0 \quad P\text{-a.s.}
\]
Note that
\[
E_{P^*}(X \cdot 1) = 0,
\]
since the constant $1 \in \widetilde{\mathcal V}$, corresponding to an investment of one unit of the numeraire asset. In particular $E_{\widehat P}(\widetilde V) = E_{P^*}(\widetilde V)$ for all $\widetilde V \in \widetilde{\mathcal V}$, which implies also that $\widehat P \in \mathcal P^*$. The same holds for $\check P$ defined by
\[
\frac{d\check P}{dP^*}(\omega) = 1 - X(\omega) > 0 \quad P\text{-a.s.}
\]
We have that
\[
\frac{\check\pi}{S_0^0} = E_{\check P}(\widetilde F) = E_{P^*}(\widetilde F) - E_{P^*}(X \widetilde F) < E_{P^*}(\widetilde F) = \frac{\pi}{S_0^0} < E_{P^*}(\widetilde F) + E_{P^*}(X \widetilde F) = E_{\widehat P}(\widetilde F) = \frac{\hat\pi}{S_0^0},
\]
where both $\check\pi$ and $\hat\pi$ are arbitrage-free prices for $F$ in the $S$-market.

The initial price of a superhedging strategy is an arbitrage price for the option seller (unless the superhedging strategy is a replicating strategy; in that case the option $F$ is attainable, with unique arbitrage-free price). Let $\eta \ge 0$ and $\xi = (\xi^1, \ldots, \xi^d) \in \mathbb R^d$ be such that
\[
\eta + \sum_{i=1}^{d} \xi^i (\widetilde S_1^i - \widetilde S_0^i) \ge \widetilde F \quad P\text{-a.s.}
\]
The strategy
\[
\Bigl( \eta - \sum_{i=1}^{d} \xi^i \widetilde S_0^i,\ \xi^1, \ldots, \xi^d \Bigr)
\]
is a superhedging strategy with initial discounted value $\widetilde V_0 = \eta$. By taking the expectation with respect to any risk-neutral measure $Q \in \mathcal P^*$ we get $\eta \ge E_Q(\widetilde F)$, which implies
\[
\inf C^+(F) \ge \sup C(F). \tag{1.7.4}
\]
We next show that equality holds. It is clear that this is the case when $\sup C(F) = +\infty$. Otherwise, if $\infty > \pi > \sup C(F)$, we have shown that $\pi$ is an arbitrage price for $F$, and there is $\xi = (\xi^0, \xi^1, \ldots, \xi^d, \xi^{d+1}) \in \mathbb R^{d+2}$ with $\xi^{d+1} < 0$ such that
\[
\sum_{i=0}^{d} \xi^i S_0^i + \xi^{d+1} \pi \le 0
\]
and
\[
\sum_{i=0}^{d} \xi^i S_1^i + \xi^{d+1} F \ge 0 \quad P\text{-a.s., and } > 0 \text{ with positive probability;}
\]
equivalently, we have obtained a superhedging strategy
\[
-\sum_{i=0}^{d} \frac{\xi^i}{\xi^{d+1}}\, S_1^i \ \ge\ F
\]
with initial price
\[
-\sum_{i=0}^{d} \frac{\xi^i}{\xi^{d+1}}\, S_0^i \ \le\ \pi,
\]
and $\inf C^+(F) \le \pi$. Now letting $\pi \downarrow \sup C(F)$ it follows that
\[
\inf C^+(F) \le \sup C(F),
\]
and combined with (1.7.4) we get the equality.

We show that there are superhedging strategies attaining the minimum price $\inf C^+(F)$. Let $\xi_n$ be superhedging strategies with $V_0^{\xi_n} = \pi_n \downarrow \inf C^+(F)$. We show that $\limsup_n |\xi_n| < \infty$, which implies that there is a convergent subsequence $\xi_{n_k}$ whose limit $\xi \in \mathbb R^{d+1}$ is a superhedging strategy with minimum price. First we normalize these strategies by taking $\eta_n = \xi_n / |\xi_n|$, with values in the $d$-dimensional unit sphere, which is compact. Necessarily there is a convergent subsequence, denoted again by $\eta_n$, with a limit $\eta$ on the unit sphere. If $\limsup_n |\xi_n| = \infty$, for some subsequence

\[
\frac{\pi_n}{|\xi_n|} + \eta_n \cdot \bigl( \widetilde S_1 - \widetilde S_0 \bigr) \ge \frac{\widetilde F}{|\xi_n|}
\]
would have limit
\[
\eta \cdot \bigl( \widetilde S_1 - \widetilde S_0 \bigr) \ge 0 \quad P\text{-a.s.}
\]
Since the market is arbitrage free, necessarily
\[
\eta \cdot \bigl( \widetilde S_1 - \widetilde S_0 \bigr) = 0 \quad P\text{-a.s.},
\]
which implies $\eta = 0$, in contradiction with $|\eta| = 1$. A similar argument works for the underhedging strategies.

Definition 1.7.2. We say that a market model $\bar S = (S^0, S^1, \ldots, S^d)$ is complete if all contingent claims $F(\omega) = f(S^0(\omega), \ldots, S^d(\omega)) \in L^1(\Omega, \mathcal F, P)$ which are measurable with respect to the $\sigma$-algebra $\mathcal F = \sigma(S^0, S^1, \ldots, S^d)$ are attainable. In a market model which is arbitrage free and complete, all contingent claims have a unique arbitrage-free price.

Theorem 1.7.3 (2nd fundamental theorem of asset pricing). An arbitrage-free market model $(S^0, S^1, \ldots, S^d)$ with initial prices $(\pi^0, \pi^1, \ldots, \pi^d)$ is complete if and only if the risk-neutral measure $Q$ is unique:
\[
\mathcal P^* = \Bigl\{ Q \sim P \ \text{risk neutral for } (\pi, S),\ \frac{dQ}{dP} \in L^\infty(P) \Bigr\} = \{Q\}.
\]

Proof. When the market is complete, the contingent claim $F(\omega) = S^0(\omega)\,\mathbf 1_A(\omega)$ with measurable $A \in \mathcal F$ is attainable by a portfolio $\bar\xi \in \mathbb R^{d+1}$, and it has a unique arbitrage-free price $c(F) = \bar\xi \cdot \bar\pi$. By the first fundamental theorem
\[
\frac{c(F)}{\pi^0} = E_Q(\mathbf 1_A) = Q(A) \quad \forall\, Q \in \mathcal P^*,
\]
but this means that all $Q \in \mathcal P^*$ coincide, since they coincide on all events $A \in \mathcal F$.

In the other direction, we show that when the risk-neutral measure is unique, all contingent claims are attainable, with unique arbitrage-free prices. Otherwise there would be a non-attainable contingent claim $\widetilde F \in L^1(\Omega, \mathcal F, Q) \setminus \widetilde{\mathcal V}$, where $Q \sim P$ is the unique risk-neutral measure, and the set of attainable discounted contingent claims
\[
\widetilde{\mathcal V} = \bigl\{ \xi \cdot \widetilde S_1 : \xi \in \mathbb R^{d+1} \bigr\}
\]
is a finite-dimensional closed linear subspace of dimension $\le d+1$. By the separating hyperplane theorem in $L^1(\Omega, \mathcal F, Q)$ there is $X \in L^\infty(\Omega, \mathcal F, P)$ with $|X| \le 1/2$ $P$-a.s. such that
\[
E_Q\bigl( (\xi \cdot \widetilde S_1) X \bigr) > E_Q\bigl( X \widetilde F \bigr) \quad \forall\, \xi \in \mathbb R^{d+1},
\]
and since $\widetilde{\mathcal V}$ is a linear subspace,
\[
E_Q\bigl( (\xi \cdot \widetilde S_1) X \bigr) = 0 > E_Q\bigl( X \widetilde F \bigr) \quad \forall\, \xi \in \mathbb R^{d+1}.
\]
It follows that $\widehat Q$ with Radon–Nikodym derivative
\[
\frac{d\widehat Q}{dQ} = (1 + X) > 0 \quad P\text{-a.s.}
\]
satisfies
\[
E_{\widehat Q}\bigl( \widetilde S_1^i \bigr) = E_Q\bigl( \widetilde S_1^i \bigr) = \widetilde S_0^i, \quad \forall\, i = 0, 1, \ldots, d.
\]
$\widehat Q \sim P$ would be a risk-neutral probability measure different from $Q$, which is in contradiction with the uniqueness assumption.

Remark 1.7.3. In the one-period market model $(S_t^i : i \in \{0, 1, \ldots, d\},\ t \in \{0, T = 1\})$ the set of all discounted attainable claims is given by
\[
\widetilde{\mathcal V} = \bigl\{ \xi \cdot \widetilde S_T : \xi \in \mathbb R^{d+1} \bigr\} \subseteq L^1(\Omega; \sigma(S_T), Q), \tag{1.7.5}
\]
where $\widetilde{\mathcal V}$ is a finite-dimensional linear subspace (without non-negativity constraints) of dimension $\le d+1$. The market model is complete if and only if we have equality in (1.7.5), which means that the $\sigma$-algebra $\mathcal F = \sigma(S_T) = \sigma(A_1, \ldots, A_n)$ is generated by finitely many disjoint atoms $A_k$ with $P(A_k) > 0$, and the number $n$ of atoms equals the dimension of $\widetilde{\mathcal V}$, which is the number of linearly independent discounted instruments. The same happens in discrete-time models with finite horizon: when the market is complete, the $\sigma$-algebra $\mathcal F_T$ is finitely generated and we can work on a finite probability space with finitely many atoms.
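The finite-atom picture of the remark can be made concrete in the smallest complete model, a one-period binomial market (the numbers below are our own illustration, not from the text): two atoms, two linearly independent instruments, a unique risk-neutral probability, and every claim replicated by solving a 2x2 linear system.

```python
import numpy as np

# one-period binomial market, r = 0: B_1 = B_0 = 1, S_0 = 100, S_1 in {120, 90}
# the unique risk-neutral probability q solves 120 q + 90 (1 - q) = 100
q = (100 - 90) / (120 - 90)                 # = 1/3

# completeness: a claim paying f_u on the up-atom and f_d on the down-atom
# is replicated by (b units of B, s units of S) solving a 2x2 system
f = np.array([15.0, 0.0])                   # call with strike 105
A = np.array([[1.0, 120.0],
              [1.0, 90.0]])
b, s = np.linalg.solve(A, f)
price = b + s * 100                         # initial value of the replication
print(q, price)                             # replication price equals E_Q[f] = q * 15
```

With three or more atoms and only two instruments the linear system would be overdetermined, matching the remark's dimension count.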

1.8 Convex functions

Definition 1.8.1. A function $f : V \to \mathbb R \cup \{+\infty\}$ on a vector space is convex if and only if its epigraph, defined as the set of points above its graph,
\[
\mathrm{epi}(f) = \bigl\{ (x,y) \in V \times (\mathbb R \cup \{+\infty\}) : y \ge f(x) \bigr\},
\]
is a convex set. Equivalently, $\forall\, \alpha \in [0,1]$, $x, y \in V$,
\[
f(\alpha x + (1-\alpha)y) \le \alpha f(x) + (1-\alpha) f(y).
\]

Theorem 1.8.1. Let $f : \mathbb R \to \mathbb R \cup \{+\infty\}$ be a convex function.

• $f$ has left and right derivatives at all $x \in \mathbb R$:
\[
\partial^- f(x) := \lim_{u \uparrow x} \frac{f(u) - f(x)}{u - x} \ \le\ \partial^+ f(x) := \lim_{y \downarrow x} \frac{f(y) - f(x)}{y - x}.
\]

• The one-sided derivatives $\partial^\pm f(x)$ are non-decreasing:
\[
\partial^- f(x) \le \partial^+ f(x) \le \partial^- f(y) \le \partial^+ f(y) \quad \forall\, x < y,
\]
with at most countably many discontinuity points where $\partial^- f(x) < \partial^+ f(x)$, and $f$ is differentiable elsewhere.

• If $f(x) = f_+(x) - f_-(x)$, where $f_\pm : \mathbb R \to \mathbb R \cup \{+\infty\}$ are convex, then $f(x)$ has the representation
\[
f(x) - f(x_0) = \partial^\pm f(x_0)(x - x_0) + \int_{(x_0,\infty)} (x-y)^+ \,\mu(dy) + \int_{(0,x_0]} (y-x)^+ \,\mu(dy),
\]
where $\mu(dy) = \mu_+(dy) - \mu_-(dy)$ is a signed measure.

• If $f(x)$ is convex and twice differentiable, necessarily $\partial^2 f(x) \ge 0$.

Proof. For $u \le x \le y$, since we can write $x = (1-\alpha)u + \alpha y$ for $\alpha = \frac{x-u}{y-u} \in [0,1]$, by convexity
\[
f(x) - f(u) = f((1-\alpha)u + \alpha y) - f(u) \le \alpha\bigl( f(y) - f(u) \bigr) = \frac{x-u}{y-u}\bigl( f(y) - f(u) \bigr),
\]
so that
\[
\frac{f(x) - f(u)}{x - u} \le \frac{f(y) - f(u)}{y - u} \le \frac{f(y) - f(x)}{y - x},
\]
which implies that the limits in the definition of $\partial^\pm$ are monotone. When $f(x)$ is convex,
\[
f(x) = f(x_0) + \int_{x_0}^{x} \partial^+ f(t)\,dt = f(x_0) + \partial^+ f(x_0)(x - x_0) + \int_{x_0}^{x}\Bigl( \int_{x_0}^{t} \partial^+ f(ds) \Bigr) dt
\]
\[
= f(x_0) + \partial^+ f(x_0)(x - x_0) + \int_{x_0}^{x}\Bigl( \int_s^x dt \Bigr) \partial^+ f(ds)
= f(x_0) + \partial^+ f(x_0)(x - x_0) + \int_{x_0}^{x} (x - s)\,\partial^+ f(ds)
\]
\[
= f(x_0) + \partial^+ f(x_0)(x - x_0) + \int_{x_0}^{\infty} (x - s)^+\,\partial^+ f(ds) + \int_0^{x_0} (s - x)^+\,\partial^+ f(ds),
\]
where we used the Fubini theorem to change the order of integration, and we have to distinguish the cases $x \ge x_0$ and $x < x_0$. When $f(x)$ has a second derivative, so that
\[
\partial f(t) = \partial f(x_0) + \int_{x_0}^{t} \partial^2 f(s)\,ds,
\]
necessarily $\partial^2 f(x) \ge 0$ (since $x \mapsto \partial f(x)$ is non-decreasing) and
\[
f(x) = f(x_0) + \partial f(x_0)(x - x_0) + \int_{x_0}^{x} (x - s)\,\partial^2 f(s)\,ds.
\]

Lemma 1.8.1 (Jensen inequality). For $f : \mathbb R \to \mathbb R$ convex and $X$ a random variable with $E_P(|X|) < \infty$,
\[
E_P\bigl( f(X) \bigr) \ge f\bigl( E_P(X) \bigr).
\]

Proof.
\[
f(x) = f(x_0) + \partial^+ f(x_0)(x - x_0) + \underbrace{\int_{x_0}^{x} (x - s)\,\partial^+ f(ds)}_{\ge 0} \ \ge\ f(x_0) + \partial^+ f(x_0)(x - x_0),
\]
since $\partial^+ f(ds)$ is a positive measure. By taking the expectation under $P$ with random $x = X(\omega)$ and constant $x_0 = E_P(X)$ we obtain the Jensen inequality
\[
E_P\bigl( f(X) \bigr) \ge f\bigl( E_P(X) \bigr) + \partial^+ f\bigl( E_P(X) \bigr)\,\underbrace{E_P\bigl( X - E_P(X) \bigr)}_{=0}.
\]

1.8.1 Replicating european options using call options

Corollary 1.8.1. A convex function $f : \mathbb R \to \mathbb R$ is necessarily absolutely continuous, with non-decreasing weak derivative:
\[
f(x) = f(x_0) + \int_{x_0}^{x} \partial f(y)\,dy,
\]
integrating against the Lebesgue measure; without loss of generality we can take $\partial f = \partial^+ f$, which is continuous from the right. Since non-decreasing functions on $\mathbb R$ correspond to positive measures, we introduce $\mu((a,b]) = \partial f(b) - \partial f(a)$ for $a < b$, which extends to a positive measure on the Borel $\sigma$-algebra $\mathcal B(\mathbb R)$. Then, by using the Fubini theorem,
\[
f(x) - f(x_0) = \partial f(x_0)(x - x_0) + \int_{x_0}^{x} \bigl( \mu((0,y]) - \mu((0,x_0]) \bigr)\,dy
\]
\[
= \partial f(x_0)(x - x_0) + \int_{x_0}^{\infty}\!\!\int_{x_0}^{\infty} \mathbf 1(r < y \le x)\,\mu(dr)\,dy + \int_0^{x_0}\!\!\int_0^{x_0} \mathbf 1(x < y \le r)\,\mu(dr)\,dy
\]
\[
= \partial f(x_0)(x - x_0) + \int_{x_0}^{\infty} (x - r)^+\,\mu(dr) + \int_0^{x_0} (r - x)^+\,\mu(dr),
\]
using $(r - x)^+ = (x - r)^+ + r - x$ where needed.

For an underlying financial asset $S_t$, $t \in \{0, T\}$, the value at maturity $T$ of a european option $F = f(S_T)$ with $f$ convex is
\[
f(S_T) = f(0) + \partial f(0)\,S_T + \int_0^\infty (S_T - K)^+\,\mu(dK)
\]
\[
= f(S_0) + \partial f(S_0)(S_T - S_0) + \int_{S_0}^{\infty} (S_T - K)^+\,\mu(dK) + \int_0^{S_0} (K - S_T)^+\,\mu(dK),
\]
and in a (B,S) market with a riskless bank account $B_T = e^{rT} B_0$ its initial price is necessarily given by
\[
c_0(F) = e^{-rT} f(0) + \partial f(0)\,S_0 + \int_0^\infty c_0\bigl( (S_T - K)^+ \bigr)\,\mu(dK)
\]
\[
= e^{-rT} f(S_0) + \partial f(S_0)\bigl( 1 - e^{-rT} \bigr) S_0 + \int_{S_0}^{\infty} c_0\bigl( (S_T - K)^+ \bigr)\,\mu(dK) + \int_0^{S_0} c_0\bigl( (K - S_T)^+ \bigr)\,\mu(dK),
\]
where $c_0\bigl( (S_T - K)^+ \bigr) = e^{-rT} E_Q\bigl( (S_T - K)^+ \bigr)$ is the market price at time $t = 0$ of the standard european call option, and by the put–call parity of european options
\[
c_0\bigl( (K - S_T)^+ \bigr) = K e^{-rT} - S_0 + c_0\bigl( (S_T - K)^+ \bigr).
\]

When the underlying stock and call options on it are traded on the market for all strike prices, we can use the market prices when available, without needing to estimate the pricing measure $Q$ and compute the call option prices. The same expression holds when $f(x)$ is a difference of convex functions: then $\partial f(x)$ has finite variation on finite intervals and decomposes into positive and negative parts, and $\mu$ is a signed measure, the difference of two mutually singular positive measures. By using the standardized european call options, the underlying and the bank account, we can replicate any european option $f(S_T)$ with $f$ a difference of convex functions.
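For a concrete check of the replication formula, take $f(x) = x^2$, so that $\mu(dK) = 2\,dK$, and approximate the two integrals by a finite portfolio of calls and puts on a strike grid; the grid, the asset value $S_0 = 10$ and the truncation at $K_{\max} = 40$ below are our own illustrative choices.

```python
import numpy as np

# replicate f(S_T) = S_T^2: the weak second derivative gives mu(dK) = 2 dK
S0, Kmax, n = 10.0, 40.0, 4000
edges = np.linspace(0.0, Kmax, n + 1)
K = (edges[:-1] + edges[1:]) / 2            # midpoint strikes
dK = Kmax / n

def replicated(ST):
    # f(S0) + f'(S0)(ST - S0) + calls above S0 + puts below S0
    calls = 2 * dK * np.maximum(ST - K[K > S0], 0.0).sum()
    puts = 2 * dK * np.maximum(K[K < S0] - ST, 0.0).sum()
    return S0**2 + 2 * S0 * (ST - S0) + calls + puts

for ST in (3.0, 10.0, 25.0):
    print(ST, replicated(ST), ST**2)        # portfolio value matches the payoff
```

Replacing each payoff by its market price, as in the pricing formula above, turns the same portfolio into a price for $f(S_T)$.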

Exercise 1.8.1. In a (B,S) market replicate a digital option

\[
f(S_T) = \mathbf 1(S_T > K)
\]

using the underlying $S_T$, the deterministic bank account $B_T = B_0 e^{rT}$ and european call options $(S_T - K)^+$ with all possible strikes.

Hint: the indicator $f(x) = \mathbf 1(x > K)$ is discontinuous, and the replication method does not apply directly. Approximate $f$ by differences of convex functions $f_n$, replicate the options $f_n(S_T)$, and find the price of $f(S_T)$ as the limit of the $f_n(S_T)$ prices.
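Following the hint, one standard choice of approximating functions (a sketch of one possibility, not the unique solution of the exercise) is a call spread: $f_\varepsilon(x) = \bigl( (x-K)^+ - (x-K-\varepsilon)^+ \bigr)/\varepsilon$ is a difference of convex functions converging to $\mathbf 1(x > K)$ pointwise for $x \ne K$ as $\varepsilon \downarrow 0$.

```python
import numpy as np

K, eps = 100.0, 0.01

def call(x, strike):                         # european call payoff
    return np.maximum(x - strike, 0.0)

def digital_approx(x):
    # long 1/eps calls struck at K, short 1/eps calls struck at K + eps
    return (call(x, K) - call(x, K + eps)) / eps

print(digital_approx(90.0), digital_approx(110.0))   # 0.0 and 1.0
```

Its price, $\bigl( c_0((S_T - K)^+) - c_0((S_T - K - \varepsilon)^+) \bigr)/\varepsilon$, then approximates the digital option price as $\varepsilon \downarrow 0$.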

1.8.2 Leverage effect of option investments

Options are useful to manage financial risks. We demonstrate this in two examples invented by Esko Valkeila.

Exercise 1.8.2. Consider a random asset S(ω) ≥ 0 P -a.s. and a contract

\[
F_K(\omega) = \alpha(K)\bigl( S(\omega) - K \bigr)^+,
\]
where $K \ge 0$ is the strike price of the european call option and
\[
\alpha(K) = \frac{E_Q(S)}{E_Q\bigl( (S - K)^+ \bigr)} = \frac{\pi(S)}{\pi\bigl( (S - K)^+ \bigr)} > 0
\]
is such that $F_K$ and $S$ have the same expectation with respect to the pricing measure $Q$ (i.e. have the same price). We show that
\[
\mathrm{Var}_Q(F_K) \ge \mathrm{Var}_Q(S),
\]
i.e. under the pricing probability $Q$, investing only in a european call option is always more risky than investing only in the underlying asset. However, we remark that the risk-neutral probability $Q$ used for pricing is not necessarily the probability that one would use to forecast the future values of $S$, when the market is not risk-neutral.

Solution. Equivalently we have to show that
\[
\frac{E\bigl[ ((S-K)^+)^2 \bigr] - \bigl( E[(S-K)^+] \bigr)^2}{\bigl( E[(S-K)^+] \bigr)^2} \ \ge\ \frac{E[S^2] - E[S]^2}{E[S]^2},
\]
or
\[
\frac{E\bigl[ ((S-K)^+)^2 \bigr]}{\bigl( E[(S-K)^+] \bigr)^2} \ \ge\ \frac{E[S^2]}{E[S]^2} \ \ge\ 1.
\]

We show that the map
\[
K \mapsto \frac{E\bigl[ ((S-K)^+)^2 \bigr]}{\bigl( E[(S-K)^+] \bigr)^2}
\]
is non-decreasing on $\mathbb R_+$. We show that this map is differentiable and that for all $K > 0$
\[
0 \le \frac{\partial}{\partial K}\, \frac{E\bigl[ ((S-K)^+)^2 \bigr]}{\bigl( E[(S-K)^+] \bigr)^2},
\]
since this derivative equals
\[
\frac{E\bigl[ (S-K)^+ \bigr]\,\frac{\partial}{\partial K} E\bigl[ ((S-K)^+)^2 \bigr] - 2\, E\bigl[ ((S-K)^+)^2 \bigr]\,\frac{\partial}{\partial K} E\bigl[ (S-K)^+ \bigr]}{\bigl( E[(S-K)^+] \bigr)^3},
\]
where
\[
\frac{\partial}{\partial K} E\bigl[ (S-K)^+ \bigr] = E\Bigl[ \frac{\partial}{\partial K} (S-K)^+ \Bigr] = -E\bigl[ \mathbf 1(S > K) \bigr] = -Q(S > K),
\]
and it is justified to switch the order of differentiation and integration since the derivative of the integrand, $0 \le \mathbf 1(S > K) \le 1$, is uniformly bounded; and
\[
\frac{\partial}{\partial K} E\bigl[ ((S-K)^+)^2 \bigr] = E\Bigl[ \frac{\partial}{\partial K} \bigl( (S-K)^+ \bigr)^2 \Bigr] = -2\,E\bigl[ (S-K)^+ \bigr],
\]
where again it is justified to switch the order of differentiation and integration, since the derivative of the integrand, $2(S-K)^+ \le 2S \in L^1(Q)$, is dominated uniformly in $K$ by an integrable random variable (we assume throughout that $E(S) < \infty$). Therefore
\[
\frac{\partial}{\partial K}\, \frac{E\bigl[ ((S-K)^+)^2 \bigr]}{\bigl( E[(S-K)^+] \bigr)^2}
= 2\,\frac{E\bigl[ ((S-K)^+)^2 \bigr]\, Q(S > K) - \bigl( E[(S-K)^+] \bigr)^2}{\bigl( E[(S-K)^+] \bigr)^3}
\]
\[
= \frac{2\, Q(S > K)^2}{\bigl( E[(S-K)^+] \bigr)^3}\Bigl( E\bigl[ (S-K)^2 \mid S > K \bigr] - E\bigl[ S - K \mid S > K \bigr]^2 \Bigr) \ \ge\ 0,
\]
where
\[
E\bigl[ (S-K)^2 \mid S > K \bigr] = \frac{E\bigl[ ((S-K)^+)^2 \bigr]}{Q(S > K)} \qquad\text{and}\qquad E\bigl[ S - K \mid S > K \bigr] = \frac{E\bigl[ (S-K)^+ \bigr]}{Q(S > K)}
\]
are (elementary) conditional expectations conditioned on the event $\{\omega : S(\omega) > K\}$, and the inequality follows by the Jensen inequality for conditional expectations: if $X \in L^1(\Omega, \mathcal F, Q)$, $f(x)$ is convex (for example $f(x) = x^2$) and $\mathcal G \subseteq \mathcal F$ is a sub-$\sigma$-algebra, then
\[
E\bigl( f(X) \mid \mathcal G \bigr)(\omega) \ \ge\ f\bigl( E(X \mid \mathcal G)(\omega) \bigr),
\]
and it holds also when we condition on an event $A$ with $Q(A) > 0$:
\[
E\bigl( f(X) \mid A \bigr) \ \ge\ f\bigl( E(X \mid A) \bigr).
\]

On the other hand, combining the investment in the stock with a put option reduces the risk.

Exercise 1.8.3. Consider a random asset $S(\omega) \ge 0$ $P$-a.s. and the stop-loss option contract
\[
F_K(\omega) = \alpha(K)\bigl( S(\omega) + (K - S(\omega))^+ \bigr) = \alpha(K)\bigl( K + (S(\omega) - K)^+ \bigr) = \alpha(K)\,\max\{K, S(\omega)\}
\]
with strike price $K > 0$, where under the pricing probability $Q$

\[
\alpha(K) = \frac{E_Q(S)}{E_Q(S) + E_Q\bigl( (K-S)^+ \bigr)} = \frac{\pi(S)}{\pi(S) + \pi\bigl( (K-S)^+ \bigr)}
= \frac{E_Q(S)}{K + E_Q\bigl( (S-K)^+ \bigr)} = \frac{\pi(S)}{K + \pi\bigl( (S-K)^+ \bigr)} \in (0,1)
\]
is a constant such that $E_Q(F_K) = E_Q(S)$ under the risk-neutral pricing measure $Q$, and $\pi(F_K) = \pi(S)$. Show that for all $K > 0$
\[
\mathrm{Var}_Q(F_K) \le \mathrm{Var}_Q(S),
\]
meaning that in the risk-neutral world options can be used to reduce the financial risk of an investment.

Solution. Since $\max\{K, S(\omega)\} \ge S(\omega) \ge 0$, $E\bigl( \max\{K, S\} \bigr) \ge E(S)$ and $0 \le \alpha(K) \le 1$. Using the decomposition into positive and negative parts $X = X^+ - X^-$,
\[
\mathrm{Var}(X) = \mathrm{Var}(X^+) + \mathrm{Var}(X^-) - 2\,\mathrm{Cov}(X^+, X^-) = \mathrm{Var}(X^+) + \mathrm{Var}(X^-) + 2\,E(X^+)\,E(X^-) \ \ge\ \mathrm{Var}(X^+) + \mathrm{Var}(X^-) \ \ge\ \mathrm{Var}(X^\pm),
\]
where $\mathrm{Cov}(X^+, X^-) = -E(X^+)E(X^-) \le 0$ since $X^+ X^- = 0$. Finally
\[
\mathrm{Var}_Q(F_K) = \alpha(K)^2\,\mathrm{Var}\bigl( K + (S-K)^+ \bigr) \le \mathrm{Var}\bigl( K + (S-K)^+ \bigr) = \mathrm{Var}\bigl( (S-K)^+ \bigr) \le \mathrm{Var}(S - K) = \mathrm{Var}(S).
\]

Chapter 2

Martingale representation and hedging in discrete time models

2.1 Some basic facts from martingale theory

2.1.1 Conditional Expectation and Martingales

Let $(\Omega, \mathcal F, P)$ be a probability space.

Definition 2.1.1 (Conditional expectation). Let $X$ be an ($\mathcal F$-measurable) random variable and $\mathcal G \subseteq \mathcal F$ a sub-$\sigma$-algebra. $E_P(X|\mathcal G)$ is a $\mathcal G$-measurable random variable such that for all $B \in \mathcal G$
\[
E_P(\mathbf 1_B X) = E_P\bigl( \mathbf 1_B\, E_P(X|\mathcal G) \bigr).
\]
Properties:

i) $E_P\bigl( E_P(X|\mathcal G) \bigr) = E_P(X)$;

ii) if $Y$ is $\mathcal G$-measurable, $E_P(XY|\mathcal G) = Y\, E_P(X|\mathcal G)$;

iii) if $Y \perp\!\!\!\perp \mathcal G$, then $E_P(Y|\mathcal G) = E_P(Y)$;

iv) if $E_P(X^2) < \infty$, the random variable $E_P(X|\mathcal G)$ is the orthogonal projection of the r.v. $X$ onto the subspace $L^2(\Omega, \mathcal G, P) \subset L^2(\Omega, \mathcal F, P)$:
\[
E\bigl( (X - E_P(X|\mathcal G))^2 \bigr) = \min_{Y \in L^2(\Omega, \mathcal G, P)} E\bigl( (X - Y)^2 \bigr);
\]

v) the conditional expectation is linear:
\[
E_P(aX + bY\,|\,\mathcal G)(\omega) = a\,E_P(X|\mathcal G)(\omega) + b\,E_P(Y|\mathcal G)(\omega);
\]


vi) the conditional expectation is positive: if $X(\omega) \ge 0$ $P$-a.s., then $E(X|\mathcal G)(\omega) \ge 0$ $P$-a.s.

Let $Q$ be a probability measure which dominates $P$ ($P \ll Q$) on a $\sigma$-algebra $\mathcal G \subseteq \mathcal F$, which means that $Q(A) = 0 \implies P(A) = 0$ for all $A \in \mathcal G$. The Radon–Nikodym derivative of $P$ w.r.t. $Q$ is a $\mathcal G$-measurable random variable
\[
Z^{\mathcal G}(\omega) = Z^{\mathcal G}(P,Q)(\omega) = \frac{dP|_{\mathcal G}}{dQ|_{\mathcal G}}(\omega) \ge 0.
\]
This means that $P(d\omega) = Z(P,Q)(\omega)\,Q(d\omega)$ on $\mathcal G$, and if $X$ is a $\mathcal G$-measurable random variable, we change the measure to represent the expectation w.r.t. $P$ as an expectation w.r.t. $Q$:
\[
E_P(X) = E_Q\bigl( X\, Z(P,Q) \bigr).
\]
We have that $0 \le Z^{\mathcal G}(P,Q) \in L^1(\Omega, \mathcal G, Q)$ and $E_Q(Z(P,Q)) = 1$. In statistics $Z(P,Q)$ is called the likelihood ratio. Note that if $\mathcal A \subseteq \mathcal G$ is a sub-$\sigma$-algebra and $P \ll Q$ on $\mathcal G$, then trivially $P \ll Q$ on $\mathcal A$, and
\[
Z^{\mathcal A}(P,Q) = E_Q\bigl( Z^{\mathcal G}(P,Q) \,\big|\, \mathcal A \bigr).
\]
This is the $Q$-martingale property for nested $\sigma$-algebras.

We also have a formula for changing the measure inside a conditional expectation. For $P \ll Q$, $\mathcal G \subseteq \mathcal F$, and $X$ $\mathcal F$-measurable, the Bayes formula holds:
\[
E_P(X|\mathcal G) = \frac{E_Q\bigl( X\,Z(P,Q) \,\big|\, \mathcal G \bigr)}{E_Q\bigl( Z(P,Q) \,\big|\, \mathcal G \bigr)}. \tag{2.1.1}
\]
Sometimes it is also called the abstract Bayes formula. The proof is not difficult: for $B \in \mathcal G$, denoting $Z = Z^{\mathcal F}(P,Q)$,
\[
E_P\bigl( X \mathbf 1_B \bigr) = E_Q\bigl( Z X \mathbf 1_B \bigr) = E_Q\bigl( E_Q(Z X \mathbf 1_B \mid \mathcal G) \bigr) = E_Q\bigl( E_Q(Z X \mid \mathcal G)\,\mathbf 1_B \bigr)
\]
\[
= E_Q\Bigl( E_Q(Z \mid \mathcal G)\,\frac{E_Q(Z X \mid \mathcal G)}{E_Q(Z \mid \mathcal G)}\,\mathbf 1_B \Bigr)
= E_Q\Bigl( Z\,\frac{E_Q(Z X \mid \mathcal G)}{E_Q(Z \mid \mathcal G)}\,\mathbf 1_B \Bigr)
= E_P\Bigl( \frac{E_Q(Z X \mid \mathcal G)}{E_Q(Z \mid \mathcal G)}\,\mathbf 1_B \Bigr),
\]
and the result follows from the definition of conditional expectation.

Example 2.1.1. As an exercise we show that the elementary Bayes formula used in statistics follows as a special case.

Let $(X,Y)$ be a random vector with values in $\mathbb R^2$, with
\[
P(X \in dx, Y \in dy) = \pi(x)\,p(y|x)\,dx\,dy.
\]
We work directly on the canonical space $\Omega = \mathbb R^2$. On the $\sigma$-algebra $\mathcal F = \sigma(X,Y)$ we take as reference measure a dominating product measure, for example $Q(dx,dy) = \pi(x)\,dx\,dy$ (although $Q$ is not a probability measure, the Bayes formula works also in this case). Clearly $P \ll Q$ and $Z(P,Q) = \frac{dP}{dQ}(x,y) = p(y|x)$. When we condition on the sub-$\sigma$-algebra $\mathcal G = \sigma(Y)$, our (abstract) Bayes formula says that for any bounded measurable function $f(x)$,
\[
E_P\bigl( f(X) \mid \sigma(Y) \bigr)(\omega) = \frac{E_Q\bigl( f(X)\,Z(P,Q) \mid \sigma(Y) \bigr)(\omega)}{E_Q\bigl( Z(P,Q) \mid \sigma(Y) \bigr)(\omega)} = \frac{\int_{\mathbb R} f(x)\,\pi(x)\,p(Y(\omega)|x)\,dx}{\int_{\mathbb R} \pi(x)\,p(Y(\omega)|x)\,dx},
\]
which is the elementary Bayes formula as we use it in statistics.
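A numerical sanity check of the ratio of integrals, using a conjugate Gaussian pair of our own choosing: with prior $\pi = \mathcal N(0,1)$ and likelihood $p(y|x)$ the $\mathcal N(x,1)$ density, the posterior is $\mathcal N(y/2, 1/2)$, so taking $f(x) = x$ must return the posterior mean $y/2$.

```python
import numpy as np

y = 1.3                                    # observed value of Y
x = np.linspace(-8.0, 8.0, 100_001)

prior = np.exp(-x**2 / 2)                  # pi(x), up to a constant
lik = np.exp(-(y - x)**2 / 2)              # p(y|x), up to a constant
w = prior * lik                            # unnormalized posterior density

post_mean = (x * w).sum() / w.sum()        # Bayes formula with f(x) = x
print(post_mean)                           # ~ y/2 = 0.65
```

Note that the normalizing constants of both densities cancel in the ratio, exactly as in the abstract formula.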

We introduce now a filtration $\mathbb F := \{\mathcal F_t\}_{t \ge 0}$, which is an increasing family of $\sigma$-algebras such that $\mathcal F_s \subseteq \mathcal F_t \subseteq \mathcal F$ for all $s \le t$. (Here it does not matter whether time is discrete or continuous: we can always embed discrete time in continuous time by taking $\mathcal F_t = \mathcal F_{\lfloor t\rfloor}$.)

Definition 2.1.2. A process $M_t$ is a $(P, \mathbb F)$-martingale if $M_t$ is $\mathcal F_t$-measurable, $M_t \in L^1(P)$, and for $s \le t$
\[
E_P(M_t \mid \mathcal F_s) = M_s.
\]
When
\[
E_P(M_t \mid \mathcal F_s) \le M_s, \quad s \le t,
\]
we say that $(M_t)$ is a $(P, \{\mathcal F_t\})$-supermartingale, and if
\[
E_P(M_t \mid \mathcal F_s) \ge M_s, \quad s \le t,
\]
$(M_t)$ is a $(P, \{\mathcal F_t\})$-submartingale. Given all the past, the conditional expectation of a future value of a martingale is the current value. Note that the martingale property depends on the measure $P$ and on the filtration $\{\mathcal F_t\}$.

Given two measures $P$ and $Q$ defined on $(\Omega, \mathcal F)$ we consider at each time $t$ the restriction of the measures to the current information $\sigma$-algebra $\mathcal F_t$,
\[
P_t = P|_{\mathcal F_t}, \qquad Q_t = Q|_{\mathcal F_t}.
\]
If $P_t \ll Q_t$ on $\mathcal F_t$, we define
\[
Z_t(P,Q) = \frac{dP_t}{dQ_t}.
\]
From the definition it follows that $Z_t \in L^1(Q, \mathcal F_t)$ and $Z_t(\omega) \ge 0$. We show that $Z_t$ is a $(Q, \mathbb F)$-martingale: for $s \le t$, if $B \in \mathcal F_s$ then also $B \in \mathcal F_t$, and we have
\[
P(B) = E_P(\mathbf 1_B) = E_Q(Z_s \mathbf 1_B) = E_Q(Z_t \mathbf 1_B),
\]
which means that $Z_s = E_Q(Z_t \mid \mathcal F_s)$.

Example 2.1.2. On a probability space $(\Omega, \mathcal F)$ we have a sequence of ($\mathbb R$-valued) random variables $(X_1, X_2, \ldots, X_n, \ldots)$ and two probability measures $P$ and $Q$ such that the $(X_i)$ are independent and identically distributed under both $P$ and $Q$. We assume that $P(X_1 \in dx) = f(x)\,Q(X_1 \in dx)$. Let $\mathcal F_t = \sigma(X_1, \ldots, X_t)$, $t \in \mathbb N$. It follows that
\[
Z_t(P,Q) = \prod_{s \in \mathbb N:\, s \le t} f(X_s).
\]

Exercise 2.1.1. Check that Z(P,Q) is a (Q, {Ft})-martingale.

Definition 2.1.3. We say that a process $(X_t)$ is adapted if $X_t$ is $\mathcal F_t$-measurable for all $t$, and in the discrete-time situation it is predictable if $X_t$ is $\mathcal F_{t-1}$-measurable for all $t$.

Theorem 2.1.1 (Doob decomposition in discrete time). If $(X_t)$ is adapted to the filtration $\{\mathcal F_t\}$ and $E(|X_t|) < \infty$ for all $t = 0, 1, \ldots, T$, then there is a unique decomposition
\[
X_t = X_0 + A_t + M_t,
\]
where $A_t$ is $\{\mathcal F_t\}$-predictable and $M_t$ is an $\{\mathcal F_t\}$-martingale, with $A_0 = 0$ and $M_0 = 0$. If $(X_t)$ is a supermartingale (respectively a submartingale), the process $A_t$ is non-increasing (respectively non-decreasing).

Proof.
\[
\Delta X_t = \bigl( \Delta X_t - E_P(\Delta X_t \mid \mathcal F_{t-1}) \bigr) + E_P(\Delta X_t \mid \mathcal F_{t-1}) = \Delta M_t + \Delta A_t,
\]
where
\[
A_t = \sum_{s=1}^{t} E_P(\Delta X_s \mid \mathcal F_{s-1}), \qquad M_t = \sum_{s=1}^{t} \bigl( \Delta X_s - E_P(\Delta X_s \mid \mathcal F_{s-1}) \bigr).
\]
If another Doob decomposition $X_t - X_0 = \bar A_t + \bar M_t$ existed, we would have $\bar M_t - M_t = A_t - \bar A_t$, which means that $(\bar M_t - M_t)$ is a predictable martingale, which is necessarily the constant zero.
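For a walk with iid increments the decomposition is explicit: $A_t$ accumulates the predictable conditional means and $M_t$ the centred increments. A small numerical sketch, where the drift value 0.3 is our own choice:

```python
import numpy as np

rng = np.random.default_rng(2)
T = 1000
dX = 0.3 + rng.standard_normal(T)          # increments with E[dX_t | F_{t-1}] = 0.3
X = np.concatenate(([0.0], np.cumsum(dX)))

A = 0.3 * np.arange(T + 1)                 # predictable part (deterministic here)
M = X - A                                  # martingale part

print(M[-1] / T)                           # sample mean of centred increments ~ 0
```

In the iid case $A_t$ is even deterministic; with dependent increments $A_t$ would be random but still known one step ahead.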

Definition 2.1.4. If $(Y_t)$ and $(X_t)$ are sequences, we define the stochastic integral of $Y$ with respect to $X$ as the sequence
\[
(Y \bullet X)_t = \sum_{s=1}^{t} Y_s\,\Delta X_s,
\]
which is called the martingale transform or discrete stochastic integral.

Theorem 2.1.2. Assume that $(Y_t)$ is an $\{\mathcal F_t\}$-predictable process and $(M_t)$ is a $(P, \{\mathcal F_t\})$-martingale. If $Y_t$ is a bounded random variable for all $t$, or alternatively both $Y_t$ and $M_t$ are square-integrable r.v.'s, it follows that $E(|Y_t \Delta M_t|) < \infty$. Under such assumptions, the stochastic integral $(Y \bullet M)_t$ is a martingale.

Proof: exercise.
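The role of predictability can be seen numerically: with the predictable integrand $Y_t = M_{t-1}$ the transform has mean zero, while "peeking" at the current increment with $Y_t = M_t$ does not. A sketch with coin-toss increments $\Delta M_t = \pm 1$ (our own toy example):

```python
import numpy as np

rng = np.random.default_rng(3)
paths, T = 200_000, 5
dM = rng.choice([-1.0, 1.0], size=(paths, T))       # martingale increments
M = np.cumsum(dM, axis=1)

Y = np.hstack([np.zeros((paths, 1)), M[:, :-1]])    # predictable: Y_t = M_{t-1}
transform = (Y * dM).sum(axis=1)                    # (Y . M)_T

peek = (M * dM).sum(axis=1)                         # NOT predictable: Y_t = M_t

print(transform.mean(), peek.mean())                # ~ 0 and ~ T = 5
```

The non-predictable version earns $E(M_t \Delta M_t) = E\bigl( (\Delta M_t)^2 \bigr) = 1$ per step, which is exactly why predictability is required in the theorem.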

Proposition 2.1.1. If $M_t$ is a $(P, \mathbb F)$-martingale with $E_P(M_t^2) < \infty$ for all $t$, and $Y_t$ is $\mathbb F$-predictable such that
\[
E_P\Bigl( \sum_{s=1}^{t} Y_s^2\,\Delta\langle M\rangle_s \Bigr) < \infty,
\]
where the predictable variation (or angle bracket) is defined through the increments
\[
\Delta\langle M\rangle_s = \Delta\langle M, M\rangle_s = E_P\bigl( (\Delta M_s)^2 \,\big|\, \mathcal F_{s-1} \bigr),
\]
then $(Y \bullet M)_t$ is an $L^2(P)$-martingale.

Proof.
\[
E_P\bigl( (Y_t\,\Delta M_t)^2 \bigr) = E_P\bigl( Y_t^2\, E_P\bigl( (\Delta M_t)^2 \mid \mathcal F_{t-1} \bigr) \bigr) = E_P\bigl( Y_t^2\,\Delta\langle M\rangle_t \bigr).
\]

2.1.2 Square integrable martingales and predictable bracket

A $(P, \{\mathcal F_t\})$-martingale $(M_t)$ is square integrable when $E(M_t^2) < \infty$ for all $t$. If $M_t, N_t$ are square-integrable martingales, then by the Cauchy–Schwarz inequality
\[
E(|M_t N_t|) \le \sqrt{E(M_t^2)}\,\sqrt{E(N_t^2)} < \infty,
\]
so the product $(M_t N_t)$ is in $L^1$ and it makes sense to consider its Doob decomposition:
\[
M_t N_t - M_{t-1} N_{t-1} = M_{t-1}\Delta N_t + N_{t-1}\Delta M_t + \Delta M_t\,\Delta N_t
\]
\[
= M_{t-1}\Delta N_t + N_{t-1}\Delta M_t + \bigl( \Delta M_t\,\Delta N_t - E_P(\Delta M_t\,\Delta N_t \mid \mathcal F_{t-1}) \bigr) + E_P(\Delta M_t\,\Delta N_t \mid \mathcal F_{t-1}).
\]
By introducing the predictable covariation process
\[
\langle M, N\rangle_t := \sum_{s=1}^{t} E_P(\Delta M_s\,\Delta N_s \mid \mathcal F_{s-1}),
\]
we write the Doob decomposition
\[
M_t N_t = M_0 N_0 + \langle M, N\rangle_t + m_t,
\]
where the increments of the martingale part are
\[
\Delta m_t = M_{t-1}\Delta N_t + N_{t-1}\Delta M_t + \bigl( \Delta M_t\,\Delta N_t - E_P(\Delta M_t\,\Delta N_t \mid \mathcal F_{t-1}) \bigr),
\]
and the integrability conditions in the definition of martingale follow from the Cauchy–Schwarz inequality, since we have assumed that $M$ and $N$ are square integrable. We introduce also the quadratic covariation
\[
[M, N]_t := \sum_{s=1}^{t} \Delta M_s\,\Delta N_s.
\]
It follows that $\bigl( [M,N]_t - \langle M,N\rangle_t \bigr)$ is a $(P, \{\mathcal F_t\})$-martingale.

$[M,N]_t$ is called the quadratic covariation or square-bracket process, while $\langle M,N\rangle_t$ is called the predictable covariation or angle-bracket process. Since $E_P\bigl( (\Delta M_t)^2 \mid \mathcal F_{t-1} \bigr) \ge 0$, the process $([M,M]_t)$ is a submartingale and therefore $(\langle M,M\rangle_t)$ is non-decreasing. The notations $[M]_t := [M,M]_t$ and $\langle M\rangle_t := \langle M,M\rangle_t$ are also used. Note that $[M,N]_t$ does not depend on the measure $P$, but the predictable bracket $\langle M,N\rangle_t$ does!
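For iid increments the angle bracket is deterministic, $\langle M\rangle_t = t\cdot\mathrm{Var}(\Delta M)$, while $[M]_t$ stays random; both compensate $M_t^2$ in expectation, as a simulation shows (the increment variance $v = 2$ is our own illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(4)
paths, T, v = 100_000, 8, 2.0
dM = np.sqrt(v) * rng.standard_normal((paths, T))   # iid increments, Var = v

M_T = dM.sum(axis=1)
QV = (dM**2).sum(axis=1)                            # [M]_T, pathwise and random
angle = v * T                                       # <M>_T, deterministic here

print(np.mean(M_T**2), np.mean(QV), angle)          # all ~ v*T = 16
```

With increments whose conditional variance depends on the past, $\langle M\rangle_t$ would be random but predictable, and $[M]_t - \langle M\rangle_t$ would still average to zero.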

Definition 2.1.5. Two square integrable martingales (Mt), (Nt) are orthog- onal if the product (MtNt) is a martingale. Equivalent conditions are i) [M,N]t is a martingale, ii) hM,Nit = 0, which means EP (∆Mt∆Nt|Ft−1)(ω) = 0 P a.s.

2 2 ∆hMit := EP ((∆Mt) |Ft−1)(ω) and EP (∆hMti) = EP {∆Mt}
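The bracket identities above can be checked exactly on a toy martingale. The following sketch uses illustrative parameters: i.i.d. increments taking $+2$ and $-1$ with probabilities $1/3$ and $2/3$ (mean zero), so that $\langle M\rangle_t = 2t$; it enumerates all paths and verifies $E([M]_t) = E(\langle M\rangle_t)$, which must hold since $[M] - \langle M\rangle$ is a martingale started at $0$.

```python
import itertools
from fractions import Fraction

# Martingale with i.i.d. increments: +2 w.p. 1/3, -1 w.p. 2/3 (mean zero).
# Predictable bracket: <M>_t = t * E[(dM)^2] = t * (4/3 + 2/3) = 2t.
T = 4
p_up, p_down = Fraction(1, 3), Fraction(2, 3)

E_square_bracket = [Fraction(0)] * (T + 1)
for path in itertools.product([2, -1], repeat=T):
    prob = Fraction(1)
    for step in path:
        prob *= p_up if step == 2 else p_down
    qv = 0  # running quadratic variation [M]_t = sum of (dM_s)^2
    for t, step in enumerate(path, start=1):
        qv += step ** 2
        E_square_bracket[t] += prob * qv

for t in range(T + 1):
    # E[[M]_t] must equal E[<M>_t] = 2t exactly
    assert E_square_bracket[t] == 2 * t
```

The exact rational arithmetic (`fractions.Fraction`) makes the equality a true identity rather than a floating-point approximation.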

2.1.3 Orthogonal projections in the space of square integrable martingales

Let $M$ and $N$ be square integrable martingales. We write
$$ N_t = N_0 + (H \bullet M)_t + N_t^{\perp} = N_0 + \sum_{s=1}^{t} H_s\Delta M_s + N_t^{\perp}, \qquad (2.1.2) $$
where $(H_t)$ is the predictable process
$$ H_t = \mathbf{1}\big( \Delta\langle M,M\rangle_t > 0 \big)\, \frac{\Delta\langle M,N\rangle_t}{\Delta\langle M,M\rangle_t} = \mathbf{1}\big( E_P((\Delta M_t)^2 \mid \mathcal{F}_{t-1}) > 0 \big)\, \frac{E_P(\Delta M_t\Delta N_t \mid \mathcal{F}_{t-1})}{E_P((\Delta M_t)^2 \mid \mathcal{F}_{t-1})}, $$
and $N_t^{\perp}$ is a $P$-martingale orthogonal to $M_t$.

Note first that since the conditional expectation is a positive operator,
$$ E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big)(\omega) \ge 0, $$
and
$$ E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big)(\omega) = 0 \qquad (2.1.3) $$
if and only if
$$ P\big( (\Delta M_t)^2 = 0 \mid \mathcal{F}_{t-1} \big)(\omega) = 1; $$
otherwise, for some $\varepsilon > 0$,
$$ P\big( (\Delta M_t)^2 > \varepsilon \mid \mathcal{F}_{t-1} \big)(\omega) > \eta > 0 $$
and
$$ E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big)(\omega) \ge \varepsilon\eta > 0. $$

Therefore (2.1.3) implies that
$$ E_P\big( \Delta N_t^{\perp}\,\Delta M_t \mid \mathcal{F}_{t-1} \big)(\omega) = 0 $$
and $H_t$ is well defined. Note also that $H_t \in L^2(\Omega, \mathcal{F}_{t-1}, P)$, since
$$ E_P(H_t^2) = E_P\bigg( \frac{E_P(\Delta M_t\Delta N_t \mid \mathcal{F}_{t-1})^2}{E_P((\Delta M_t)^2 \mid \mathcal{F}_{t-1})^2} \bigg) \le E_P\big( (\Delta N_t)^2 \big) < \infty, $$
where we used the Cauchy--Schwarz inequality for the conditional expectation together with the properties of the conditional expectation:
$$ E_P(\Delta M_t\Delta N_t \mid \mathcal{F}_{t-1})(\omega)^2 \le E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big)(\omega)\; E_P\big( (\Delta N_t)^2 \mid \mathcal{F}_{t-1} \big)(\omega). $$
Note also that from Proposition 2.1.1 it follows that $(H \bullet M)_t$ is a square integrable martingale, since
$$ H_s^2\, \Delta\langle M,M\rangle_s = \frac{(\Delta\langle M,N\rangle_s)^2}{(\Delta\langle M,M\rangle_s)^2}\, \Delta\langle M,M\rangle_s = \frac{(\Delta\langle M,N\rangle_s)^2}{\Delta\langle M,M\rangle_s} \le \Delta\langle N,N\rangle_s \in L^1(P), $$
where we used the Cauchy--Schwarz inequality. This implies $N_t^{\perp} \in L^2(P)$, and it is a $P$-martingale orthogonal to $M$, since
$$ E_P\big( \Delta N_t^{\perp}\,\Delta M_t \mid \mathcal{F}_{t-1} \big) = E_P\big( (\Delta N_t - H_t\Delta M_t)\Delta M_t \mid \mathcal{F}_{t-1} \big) = \Delta\langle N,M\rangle_t - \frac{\Delta\langle M,N\rangle_t}{\Delta\langle M,M\rangle_t}\, \Delta\langle M,M\rangle_t = 0. $$
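A minimal one-period sketch of the projection, on a hypothetical three-point sample space with trivial $\mathcal{F}_0$: here $H$ reduces to $E(\Delta M\,\Delta N)/E((\Delta M)^2)$, and the residual $\Delta N - H\Delta M$ is orthogonal to $\Delta M$.

```python
from fractions import Fraction

# One-period orthogonal projection on a three-point sample space (trivial F_0):
# H = E(dM dN) / E(dM^2), and the residual dN - H dM is orthogonal to dM.
probs = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
dM = [Fraction(1), Fraction(-1), Fraction(-1)]
dN = [Fraction(2), Fraction(-1), Fraction(-3)]

def E(X):
    return sum(p * x for p, x in zip(probs, X))

assert E(dM) == 0 and E(dN) == 0  # both increments are centered

H = E([m * n for m, n in zip(dM, dN)]) / E([m * m for m in dM])
residual = [n - H * m for m, n in zip(dM, dN)]

assert E([r * m for r, m in zip(residual, dM)]) == 0  # orthogonality
```

With $\mathcal{F}_0$ trivial the conditional expectations are plain expectations, which keeps the sketch to a few lines; the multi-period case applies the same computation conditionally at each step.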

2.1.4 Martingale property and change of measure

Theorem 2.1.3. Let $Q \ll P$ and let
$$ Z_t(\omega) = Z_t(Q,P) = \frac{dQ_t}{dP_t}(\omega). $$
Then $M_t$ is a $(Q,\{\mathcal{F}_t\})$-martingale if and only if the product $(M_tZ_t)$ is a $(P,\{\mathcal{F}_t\})$-martingale.

Proof. For $s \le t$, let $A \in \mathcal{F}_s$. By the change of measure formula
$$ E_Q\big( (M_t - M_s)\mathbf{1}_A \big) = E_P\big( Z_tM_t\mathbf{1}_A \big) - E_P\big( Z_sM_s\mathbf{1}_A \big) = E_P\big( (Z_tM_t - Z_sM_s)\mathbf{1}_A \big), $$
with $M_t\mathbf{1}_A \in \mathcal{F}_t$ and $M_s\mathbf{1}_A \in \mathcal{F}_s$. By taking $M_t = M_s = 1$ it follows that $Z_t$ is a $P$-martingale, and by the definition of conditional expectation
$$ E_Q(M_t \mid \mathcal{F}_s) = M_s \quad \text{if and only if} \quad E_P(Z_tM_t \mid \mathcal{F}_s) = Z_sM_s. $$

2.1.5 Doob decomposition and change of measure

Suppose that $M$ is a $(P,\mathcal{F}_t)$-martingale with $M_0 = 0$ and $\Delta M_t > -1$. Then
$$ Z_t = \mathcal{E}(M)_t := \prod_{s=1}^{t}(1 + \Delta M_s) = 1 + \sum_{s=1}^{t} Z_{s-1}\Delta M_s > 0, $$
and we define on each $\mathcal{F}_t$, consistently, a measure
$$ Q_t(d\omega) = Z_t(\omega)\,P_t(d\omega). $$
If $(Z_t)_{t=0,1,\dots,T}$ is integrable, then $(Z_t)$ is a $P$-martingale and $E_P(Z_t) = Z_0 = 1$.

Example 2.1.3. Assume that $\{\xi_t(\omega) : t = 1,\dots,T\}$ are i.i.d. $\mathcal{N}(0,1)$-distributed (univariate Gaussian with mean $0$ and variance $1$). For a given $\theta \in \mathbb{R}$ define
$$ M_t(\theta) = \sum_{s=1}^{t}\Big( \exp\big(\theta\xi_s - \tfrac{1}{2}\theta^2\big) - 1 \Big). $$
This is a martingale with independent increments, and $\Delta M_t > -1$. Then we set $Z_0(\theta) = 1$ and
$$ Z_t(\theta) = \mathcal{E}(M(\theta))_t = 1 + \sum_{s=1}^{t} Z_{s-1}(\theta)\Delta M_s(\theta) = \prod_{s=1}^{t}\Big( 1 + \exp\big(\theta\xi_s - \tfrac{1}{2}\theta^2\big) - 1 \Big) = \prod_{s=1}^{t}\exp\big(\theta\xi_s - \tfrac{1}{2}\theta^2\big) = \exp\Big( \theta\sum_{s=1}^{t}\xi_s - \tfrac{1}{2}\theta^2 t \Big). $$

It follows that $Z_t(\theta)$ is integrable, since under $P$ the r.v. $\sum_{s=1}^{t}\xi_s$ is Gaussian $\mathcal{N}(0,t)$. Since integrability is satisfied, $Z_t(\theta)$ is a $P$-martingale, which defines a probability measure $dQ_t(\theta) = Z_t(\theta)\,dP_t$ on $\mathcal{F}_t$.

For example, for $N_t = \sum_{s=1}^{t}\xi_s$, the martingale decomposition under $Q(\theta)$ is given by
$$ \Delta N_t = \big( \Delta N_t - \Delta\langle N, M(\theta)\rangle_t \big) + \Delta\langle N, M(\theta)\rangle_t, $$
$$ \xi_t = \Big( \xi_t - E_P\big( \xi_t\exp(\theta\xi_t - \tfrac{1}{2}\theta^2) \big) \Big) + E_P\big( \xi_t\exp(\theta\xi_t - \tfrac{1}{2}\theta^2) \big) = (\xi_t - \theta) + \theta, $$
meaning that $(N_t - \theta t)$ is a $Q(\theta)$-martingale. Here
$$ E_P\big( \xi_t\exp(\theta\xi_t - \tfrac{1}{2}\theta^2) \big) = \frac{\partial}{\partial\theta}\log E_P\big( \exp(\theta\xi_t) \big) = \theta. $$
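A Monte Carlo sketch of Example 2.1.3 (the sample size and the choice $\theta = 0.5$, $t = 4$ are illustrative assumptions): the exponential martingale $Z_t(\theta)$ averages to $1$ under $P$, and reweighting by $Z_t$ shifts the sample mean of each $\xi_s$ towards $\theta$.

```python
import numpy as np

# Monte Carlo sketch: with xi ~ N(0,1) i.i.d. under P,
# Z_t(theta) = exp(theta * sum(xi) - theta^2 * t / 2) has P-mean 1,
# and reweighting by Z shifts the mean of each xi_s from 0 to theta.
rng = np.random.default_rng(0)
theta, t, n = 0.5, 4, 1_000_000

xi = rng.standard_normal((n, t))
Z = np.exp(theta * xi.sum(axis=1) - 0.5 * theta**2 * t)

assert abs(Z.mean() - 1.0) < 0.02                  # E_P[Z_t] = 1
assert abs(np.mean(Z * xi[:, 0]) - theta) < 0.02   # E_Q[xi_1] = theta
```

The tolerances are loose statistical bounds, not exact identities; the exact statements are the martingale property of $Z_t(\theta)$ and the $\mathcal{N}(\theta,1)$ law of $\xi_s$ under $Q(\theta)$ derived below.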

Assume that $M$ and $N$ are square integrable $P$-martingales, $\Delta M_t \ge -1$, and $Z_t = \mathcal{E}(M)_t$, $t = 1,\dots,T$, with $Z_t \in L^1(P)$ for all $t$. By projecting $N$ on $M$ we obtain the orthogonal martingale decomposition
$$ N_t = N_0 + (H \bullet M)_t + N_t^{\perp}. $$
What happens to the martingale property of $N$ and $M$ under the new measure $Q$ with likelihood ratio $\frac{dQ}{dP} = Z$?

Proposition 2.1.2 (Girsanov theorem in discrete time). The Doob decomposition of $N$ under $Q$ is given by
$$ N_t = N_0 + \big( H \bullet \langle M,M\rangle \big)_t + \big( H \bullet (M - \langle M,M\rangle) \big)_t + N_t^{\perp}, $$
where $\widetilde{M}_t := (M - \langle M,M\rangle)_t$ is a $Q$-martingale, $N^{\perp}$ is a martingale under both $P$ and $Q$, and $(H \bullet \langle M,M\rangle)_t$ is a predictable process.

Proof. From the Bayes formula for the change of measure in conditional expectations,
$$ E_Q(\Delta M_t \mid \mathcal{F}_{t-1}) = \frac{E_P(Z_t\Delta M_t \mid \mathcal{F}_{t-1})}{E_P(Z_t \mid \mathcal{F}_{t-1})} = E_P\Big( \frac{Z_t}{Z_{t-1}}\Delta M_t \,\Big|\, \mathcal{F}_{t-1} \Big) = E_P\Big( \Delta M_t\Big( 1 + \frac{\Delta Z_t}{Z_{t-1}} \Big) \,\Big|\, \mathcal{F}_{t-1} \Big) = \underbrace{E_P\big( \Delta M_t \mid \mathcal{F}_{t-1} \big)}_{=0} + E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big) = \Delta\langle M,M\rangle_t, $$
since $\Delta Z_t/Z_{t-1} = \Delta M_t$; this means that $(M_t - \langle M,M\rangle_t)$ is a $Q$-martingale. On the other hand,
$$ E_Q(\Delta N_t^{\perp} \mid \mathcal{F}_{t-1}) = E_P\big( \Delta N_t^{\perp}(1 + \Delta M_t) \mid \mathcal{F}_{t-1} \big) = E_P(\Delta N_t^{\perp}\Delta M_t \mid \mathcal{F}_{t-1}) = \Delta\langle N^{\perp}, M\rangle_t = 0, $$
since $N^{\perp}$ and $M$ are orthogonal martingales under $P$.

Remark 2.1.1. Note that $N_t^{\perp}$ is a martingale with respect to both the $P$ and $Q$ measures. However, although it is orthogonal to $M$ with respect to the $P$ probability, which means
$$ \langle N^{\perp}, M\rangle_t^{(P)} = \sum_{s=1}^{t} E_P\big( \Delta N_s^{\perp}\Delta M_s \mid \mathcal{F}_{s-1} \big) = 0, $$
$N^{\perp}$ does not need to be orthogonal to $\widetilde{M} = (M - \langle M\rangle)$ under $Q$:
$$ \langle N^{\perp}, \widetilde{M}\rangle_t^{(Q)} = \sum_{s=1}^{t} E_Q\big( \Delta N_s^{\perp}\Delta\widetilde{M}_s \mid \mathcal{F}_{s-1} \big) $$
does not need to be zero. In fact
$$ E_Q\big( \Delta N_t^{\perp}(\Delta M_t - \Delta\langle M\rangle_t) \mid \mathcal{F}_{t-1} \big) = E_Q\big( \Delta N_t^{\perp}\Delta M_t \mid \mathcal{F}_{t-1} \big) - \Delta\langle M\rangle_t\,\underbrace{E_Q\big( \Delta N_t^{\perp} \mid \mathcal{F}_{t-1} \big)}_{0} = E_P\big( \Delta N_t^{\perp}\Delta M_t(1 + \Delta M_t) \mid \mathcal{F}_{t-1} \big) $$
$$ = \underbrace{E_P\big( \Delta N_t^{\perp}\Delta M_t \mid \mathcal{F}_{t-1} \big)}_{0} + E_P\big( \Delta N_t^{\perp}(\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big) = \Delta\langle M\rangle_t\,\underbrace{E_P\big( \Delta N_t^{\perp} \mid \mathcal{F}_{t-1} \big)}_{0} + E_P\big( \Delta N_t^{\perp}(\Delta[M]_t - \Delta\langle M\rangle_t) \mid \mathcal{F}_{t-1} \big), $$
where the $(P,\mathbb{F})$-martingale $([M] - \langle M\rangle)$ does not need to be orthogonal to $N^{\perp}$ under $P$.

In Example 2.1.3 we compute $\langle M(\theta), M(\theta)\rangle$, and find the law of $(\xi_s)$ under the probability measure $Q_T(\theta)$. Recall that the characteristic function of the Gaussian distribution $\mathcal{N}(\mu,\sigma^2)$ is given by
$$ \varphi_X(u) := E_{\mu,\sigma^2}\big( \exp(iuX) \big) = \exp\Big( iu\mu - \tfrac{1}{2}u^2\sigma^2 \Big), $$
where $X(\omega)$ is $\mathcal{N}(\mu,\sigma^2)$-distributed and $i$ is the imaginary unit. We also find the characteristic function of the vector $(\xi_1,\dots,\xi_t)$ under the measure $Q$: for $u = (u_1,\dots,u_t) \in \mathbb{R}^t$,
$$ E_Q\Big( \exp\Big( i\sum_{s=1}^{t}u_s\xi_s \Big) \Big) = E_P\Big( Z_t\exp\Big( i\sum_{s=1}^{t}u_s\xi_s \Big) \Big) = E_P\Big( \exp\Big( \sum_{s=1}^{t}(iu_s + \theta)\xi_s - \tfrac{1}{2}\theta^2 t \Big) \Big) = \prod_{s=1}^{t}\exp\Big( i\theta u_s - \tfrac{1}{2}u_s^2 \Big) = \prod_{s=1}^{t} E_P\big( \exp\big( iu_s(\theta + \xi_s) \big) \big). $$
This means that the law under $Q$ of $\xi_s$ is the same as the law under $P$ of $(\theta + \xi_s)$, i.e. under $Q$ the variables $(\xi_s : s = 1,\dots,t)$ are i.i.d. $\mathcal{N}(\theta,1)$.

2.1.6 Stopping times and localization

Definition 2.1.6. A random time $\tau(\omega)$ taking values in the time index set $\mathbb{N}\cup\{+\infty\}$ is a stopping time with respect to the filtration $\mathbb{F} = (\mathcal{F}_t : t \in \mathbb{N})$ when
$$ \{\omega : \tau(\omega) \le t\} \in \mathcal{F}_t. $$

The interpretation is that if F is our information filtration, at time t we know whether τ has already happened (and then we know its value), or not.

Definition 2.1.7. If $(X_t : t \in \mathbb{N})$ is a random process and $\tau$ is an $\mathbb{F}$-stopping time, we define the stopped process
$$ X_t^{\tau}(\omega) = X_{t\wedge\tau}(\omega) = X_0(\omega) + \sum_{s=1}^{t} \mathbf{1}(\tau(\omega) \ge s)\,\Delta X_s(\omega), \qquad (2.1.4) $$
which is constant after $\tau(\omega)$.

Lemma 2.1.1. 1. If $X_t$ is $\mathbb{F}$-adapted and $\tau$ is an $\mathbb{F}$-stopping time, then the stopped process $X_t^{\tau}$ is $\mathbb{F}$-adapted.

2. If $X_t$ is an $\mathbb{F}$-martingale (submartingale, supermartingale, respectively), then the stopped process $X_t^{\tau}$ is also an $\mathbb{F}$-martingale (submartingale, supermartingale, respectively).

Proof. Note that the integrand process $\mathbf{1}(\tau(\omega) \ge s)$ in (2.1.4) is bounded, non-negative and predictable, since
$$ \{\omega : \tau(\omega) \ge s\} = \{\omega : \tau(\omega) \le s-1\}^c \in \mathcal{F}_{s-1}, \qquad \forall s > 0, $$
and the result follows by Theorem 2.1.2.
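The lemma can be checked exactly in a small example. The sketch below stops a $\pm 1$ random walk at the hitting time of level $+2$ (an illustrative choice of stopping time) and verifies $E(X_{t\wedge\tau}) = 0$ for every $t$ by enumerating all paths.

```python
import itertools
from fractions import Fraction

# Optional stopping check: a +/-1 random walk X stopped at
# tau = first time X hits +2 is still a martingale, so E[X_{t ^ tau}] = 0.
T = 6
half = Fraction(1, 2)

expectations = [Fraction(0)] * (T + 1)
for steps in itertools.product([1, -1], repeat=T):
    prob = half ** T
    x, stopped = 0, False
    path = [0]                  # path of the stopped process, X_0 = 0
    for s in steps:
        if not stopped:
            x += s
            if x == 2:          # hitting time of level +2
                stopped = True
        path.append(x)
    for t, xt in enumerate(path):
        expectations[t] += prob * xt

assert all(e == 0 for e in expectations)  # E[X_{t ^ tau}] = E[X_0] = 0
```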

Definition 2.1.8 (Localization). Let $\mathcal{C}$ be a subclass of $\mathbb{F}$-adapted stochastic processes. We say that an $\mathbb{F}$-adapted stochastic process $(X_t : t \in \mathbb{N})$ is locally in $\mathcal{C}$, and we write $X \in \mathcal{C}_{loc}$, if there exists a non-decreasing sequence of $\mathbb{F}$-stopping times $0 \le \tau_n(\omega) \le \tau_{n+1}(\omega) \uparrow \infty$ such that for each $\tau_n$ the stopped process $X^{\tau_n} \in \mathcal{C}$. In a finite time horizon with $t \in \{0,1,\dots,T\}$ it is enough to ask that $\tau_n \uparrow T$. For example, an $\mathbb{F}$-adapted process $M_t$ is an $\mathbb{F}$-local martingale if there is a non-decreasing sequence of stopping times with $\tau_n(\omega) \uparrow \infty$ such that for each $n$ the stopped process $(M_{t\wedge\tau_n} : t \in \mathbb{N})$ is an $\mathbb{F}$-martingale. Similarly we can speak about $\mathbb{F}$-locally bounded processes, $\mathbb{F}$-locally integrable processes, $\mathbb{F}$-locally square integrable processes, and so on.

Lemma 2.1.2. An $\mathbb{F}$-adapted process $X_t(\omega)$ with continuous paths is locally bounded.

Proof. For each $n \in \mathbb{N}$, let
$$ \tau_n(\omega) := \inf\big\{ t \ge 0 : |X_t(\omega)| > n \big\}. $$
This is a stopping time:
$$ \{\omega : \tau_n(\omega) > t\} = \big\{ \omega : |X_q(\omega)| \le n,\ \forall q \in \mathbb{Q}\cap[0,t] \big\} \in \mathcal{F}_t, $$
and by path continuity $|X_{t\wedge\tau_n}(\omega)| \le n$.

Remark 2.1.2. Note that without additional assumptions it is not always possible to show that a local martingale $M_t$ is a true martingale. If $\tau_n \uparrow \infty$ is a localizing sequence of stopping times in the filtration $\mathbb{F}$, such that for each fixed $n$ the stopped process $(M_{t\wedge\tau_n} : t \ge 0)$ is a $(P,\mathbb{F})$-martingale, then in order to show that $M_t$ is a true martingale we need to show that $\forall t > s \ge 0$ and $A \in \mathcal{F}_s$
$$ E\big( (M_t - M_s)\mathbf{1}_A \big) = E\Big( \lim_{n\to\infty}(M_{t\wedge\tau_n} - M_{s\wedge\tau_n})\mathbf{1}_A \Big) \qquad (2.1.5) $$
$$ \stackrel{?}{=} \lim_{n\to\infty} E\big( (M_{t\wedge\tau_n} - M_{s\wedge\tau_n})\mathbf{1}_A \big) = 0. \qquad (2.1.6) $$
The problem is that the interchange of limit and expectation is not justified without additional assumptions.

2.1.7 Martingale Predictable Representation Property

Let $M$ be a $P$-martingale w.r.t. a discrete time filtration $\{\mathcal{F}_t : t \in \mathbb{N}\}$. We say that $M$ has the martingale representation property in the filtration $\mathbb{F} = \{\mathcal{F}_t\}$ if any other bounded $(P,\mathbb{F})$-martingale $(X_t)$ can be represented as an $\mathcal{F}_0$-measurable initial value plus a martingale transform w.r.t. $M$:
$$ X_t - X_0 = (Y \bullet M)_t = \sum_{s=1}^{t} Y_s\Delta M_s, $$
where $(Y_t)$ is $\mathbb{F}$-predictable, that is, $Y_t$ is $\mathcal{F}_{t-1}$-measurable for all $t$. Since $X$ is a bounded martingale, $\Delta M_s(\omega)$ is conditionally bounded given $\mathcal{F}_{s-1}$ on the set $\{\omega : Y_s(\omega) \ne 0\}$:
$$ |\Delta M_t(\omega)| \le \|\Delta X_t\|_{L^\infty(P)}\, |Y_t(\omega)|^{-1}. $$
Therefore $M_t$ itself has to be locally bounded; a localizing sequence is given by
$$ \tau_n := \inf\big\{ t : \|\Delta X_{t+1}\|_{L^\infty(P)}\, |Y_{t+1}(\omega)|^{-1} > n \big\}, $$
which is a stopping time since $\{\tau_n \le t\} \in \mathcal{F}_t$.

Note that this notation covers also the case of $d$-dimensional martingales. In that case $(Y_s)$ is a $d$-dimensional predictable process, and
$$ \sum_{s=1}^{t} Y_s\cdot\Delta M_s = \sum_{s=1}^{t}\sum_{i=1}^{d} Y_s^{(i)}\Delta M_s^{(i)}. $$

Lemma 2.1.3. Let $(M_t)$ be a square integrable $(P,\mathbb{F})$-martingale. Then $(M_t)$ has the predictable representation property in the filtration $\mathbb{F}$ if and only if the only bounded $(P,\mathbb{F})$-martingales $(N_t)$ such that the product $(M_tN_t)$ is a $(P,\mathbb{F})$-martingale are the constants.

Proof. Assume that the PRP holds for $M$. Then every bounded martingale $N$ has the form $N_t = (H \bullet M)_t$. If $N$ is such that $(N_tM_t)$ is a martingale, then necessarily
$$ \Delta(M_tN_t) = M_{t-1}\Delta N_t + N_{t-1}\Delta M_t + \Delta M_t\Delta N_t = (M_{t-1}H_t + N_{t-1})\Delta M_t + H_t(\Delta M_t)^2. $$
This gives a contradiction, since
$$ 0 = E_P\big( \Delta(M_tN_t) \mid \mathcal{F}_{t-1} \big) = H_t\,E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big) \ne 0 $$
with positive probability, unless either $\Delta M_t = 0$ or $H_t = 0$. This implies that $N_t$ is constant. For the opposite implication, write
$$ \Delta N_t = \frac{\Delta\langle M,N\rangle_t}{\Delta\langle M,M\rangle_t}\,\Delta M_t + \Delta N_t^{\perp} $$
with $\langle N^{\perp}, M\rangle_t = 0$; the assumption then implies that $N^{\perp}$ is constant.

Theorem 2.1.4. In the discrete time setting, $M$ has the martingale representation property in the filtration $\mathbb{F}$ if and only if $P$ is the unique equivalent martingale measure for $M$ which coincides with $P$ on $\mathcal{F}_0$: if $Q \sim P$ is such that $Q|_{\mathcal{F}_0} = P|_{\mathcal{F}_0}$, $Z(\omega) = \frac{dQ}{dP}(\omega) \in L^\infty(P)$, and $(M_t)$ is also a $(Q,\mathbb{F})$-martingale, then necessarily $Q = P$.

Proof. For simplicity we set $\mathcal{F}_0 = \{\Omega,\emptyset\}$. Assume that $Q \sim P$ and $0 < Z = \frac{dQ}{dP} \in L^\infty(P)$. We know that $Z_t = Z_t(Q,P) = E_P(Z \mid \mathcal{F}_t)$ is an essentially bounded $(P,\mathbb{F})$-martingale. By the predictable representation property,
$$ \Delta Z_t = Z_{t-1}H_t\Delta M_t, $$
where $H_t$ is $\mathcal{F}_{t-1}$-measurable.

We show that $M$ is not a martingale under $Q$ unless $H_t = 0$:
$$ E_Q(\Delta M_t \mid \mathcal{F}_{t-1}) = E_P\Big( \frac{Z_t}{Z_{t-1}}\Delta M_t \,\Big|\, \mathcal{F}_{t-1} \Big) = E_P\Big( \Delta M_t\Big( 1 + \frac{\Delta Z_t}{Z_{t-1}} \Big) \,\Big|\, \mathcal{F}_{t-1} \Big) = E_P\big( \Delta M_t(1 + H_t\Delta M_t) \mid \mathcal{F}_{t-1} \big) $$
$$ = E_P(\Delta M_t \mid \mathcal{F}_{t-1}) + E_P\big( H_t(\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big) = 0 + H_t\,E_P\big( (\Delta M_t)^2 \mid \mathcal{F}_{t-1} \big) \ne 0, $$
unless $M_t \equiv M_0$ (which cannot have the PRP unless $\mathcal{F}_t \equiv \mathcal{F}_0$ $\forall t$), or $H_t = 0$ $P$-a.s. for all $t$. This means that $Z_t = 1$ for all $t$ and $Q = P$.

Vice versa, suppose that the representation property does not hold for $M$ in the filtration $\mathbb{F}$, so that there is some other non-constant bounded $(P,\mathbb{F})$-martingale $N$ such that the product $(M_tN_t)$ is a martingale. We can take $N$ satisfying $N_0 = 0$ and $|N_t| \le 1/2$. It is a fact from martingale theory that a bounded martingale $(N_t)$ has almost surely a limit $N_\infty$ as $t \to \infty$. Define a probability on $(\Omega,\mathcal{F}_t)$ as
$$ dQ_t = (1 + N_t)\,dP_t = Z_t(\omega)\,dP_t. $$
Note that $(Z_t)$ is a $P$-martingale with $0 < \tfrac{1}{2} \le Z_t(\omega) \le 3/2$ and $Z_0 = 1$, and $Q_t$ is a probability measure equivalent to $P_t$ on $\mathcal{F}_t$. It follows that
$$ M_tZ_t = (1 + N_t)M_t $$
is a $P$-martingale, since $(M_t)$ and $(N_tM_t)$ are $P$-martingales. This means we have constructed another measure $Q_t \sim P_t$, with $Q_t \ne P_t$, such that $(M_t)$ is a $Q$-martingale.

Example 2.1.4. Consider a sequence of i.i.d. standard normal random variables $(\xi_t)$ on the probability space $(\Omega,\mathcal{F},P)$, with the filtration of $\sigma$-algebras $\mathcal{F}_t = \sigma(\xi_s : 1 \le s \le t)$.

Define $M_t = \sum_{s=1}^{t}\xi_s$. Then $M_t$ is a $P$-martingale, since it has independent and centered increments. $M_t$ is also square integrable, since the increments are Gaussian. Note that $\mathcal{F}_t = \sigma(M_s : 1 \le s \le t)$.

Note that $\eta_t = (\xi_t^2 - 1)$ are also i.i.d. and centered, and $N_t = \sum_{s=1}^{t}\eta_s$ is also a $P$-martingale. It follows that the product $(N_tM_t)$ is a $P$-martingale, since $E_P(\xi_t\eta_t) = E_P(\xi_t^3 - \xi_t) = 0$. The filtration $\{\mathcal{F}_t\}$ generated by $(M_t)$ contains the $P$-martingale $(N_t)$, which is orthogonal to $(M_t)$. Neither $M$ nor $N$ has the predictable representation property.
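The orthogonality claims of Example 2.1.4 can be checked numerically. The sketch below evaluates the relevant moments of a standard normal with Gauss--Hermite quadrature (the node count 40 is an illustrative choice, far more than needed for these polynomial integrands).

```python
import numpy as np

# For xi ~ N(0,1), the increments dM = xi and dN = xi^2 - 1 are both
# centered and orthogonal: E[xi (xi^2 - 1)] = E[xi^3] - E[xi] = 0.
# Gauss-Hermite quadrature: E[f(xi)] = (1/sqrt(pi)) * sum w_i f(sqrt(2) x_i).
nodes, weights = np.polynomial.hermite.hermgauss(40)

def E(f):
    return np.dot(weights, f(np.sqrt(2.0) * nodes)) / np.sqrt(np.pi)

assert abs(E(lambda x: x)) < 1e-12               # E[dM] = 0
assert abs(E(lambda x: x**2 - 1)) < 1e-12        # E[dN] = 0
assert abs(E(lambda x: x * (x**2 - 1))) < 1e-12  # orthogonality E[dM dN] = 0
```

Since the integrands are low-degree polynomials, the quadrature is exact up to floating-point rounding, which is why the tight tolerances are safe.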

We show that there exists an equivalent martingale measure for $M$. Note that $\Delta N_t = (\xi_t^2 - 1) > -1$ $P$-almost surely. Therefore
$$ Z_t = \prod_{s=1}^{t}(1 + \Delta N_s) = 1 + \sum_{s=1}^{t} Z_{s-1}\Delta N_s > 0 $$
defines an equivalent probability measure $dQ_t = Z_t\,dP_t$. By the Girsanov theorem, since $(M_tN_t)$ is a $P$-martingale it follows that also $(M_tZ_t)$ is a $P$-martingale. But this means that $(M_t)$ is a $Q$-martingale. So $Q \sim P$ but $Q \ne P$ is another martingale measure equivalent to $P$.

In order to construct a bounded $(P,\{\mathcal{F}_t\})$-martingale, we can take the i.i.d. sequence of centered and bounded random variables
$$ \varepsilon_t := (\xi_t^2\wedge 1) - E_P(\xi_t^2\wedge 1) \in (-1,1). $$
It follows that
$$ E_P(\xi_t\varepsilon_t) = E_P\big( \xi_t(\xi_t^2\wedge 1) \big) - E_P(\xi_t)E_P(\xi_t^2\wedge 1) = E_P\big( \xi_t\mathbf{1}(|\xi_t| > 1) \big) + E_P\big( \xi_t^3\mathbf{1}(|\xi_t| \le 1) \big) + 0 = 0, $$
since the distribution of $\xi_t$ is symmetric around $0$. Therefore for any fixed $T$, the process stopped at $T$,
$$ X_t^{T} := \sum_{s=1}^{t\wedge T}\varepsilon_s, $$
is a bounded $P$-martingale orthogonal to $(M_t)$.

Remark 2.1.1. We could replace, in the definition of the Predictable Representation Property, bounded martingales with square integrable martingales: in discrete time there is no difference. The predictable representation property is possible only when the $\sigma$-algebra $\mathcal{F}_T$ is finitely generated, so that all random variables take only finitely many possible values with positive probability. As we will show, in continuous time we can have the Predictable Representation Property in filtrations where the $\sigma$-algebras are not finitely generated, such as the filtration generated by a Brownian motion and/or a Poisson process.

2.2 Uniform Doob decomposition

Theorem 2.2.1. On a probability space $(\Omega,\mathcal{F},P)$ equipped with a discrete time filtration $\mathbb{F} = (\mathcal{F}_t : t = 0,1,\dots,T)$ we have a market model with $(1+d)$ $\mathbb{F}$-adapted asset value processes $(1, S_t^{(1)},\dots,S_t^{(d)} : t = 0,1,\dots,T)$, already discounted with respect to the numeraire. We denote as usual the set of equivalent risk-neutral probability measures with bounded density as
$$ \mathcal{P}^* = \Big\{ Q \sim P : \frac{dQ}{dP} \in L^\infty \text{ and } E_Q(S_t \mid \mathcal{F}_{t-1}) = S_{t-1}\ \forall\, 1 \le t \le T \Big\}. $$
An $\mathbb{F}$-adapted process $(U(t) : t = 0,1,\dots,T)$ is a $(Q,\mathbb{F})$-supermartingale for all martingale measures $Q \in \mathcal{P}^*$ if and only if it admits the optional decomposition
$$ U(t) - U(0) = \sum_{r=1}^{t}\xi_r\cdot\Delta S_r - B_t, \qquad \forall t = 1,\dots,T, $$
where $\xi_t$ is $\mathbb{R}^d$-valued and $\mathbb{F}$-predictable, and $B_t$ is non-decreasing and $\mathbb{F}$-adapted.

Remark 2.2.1. This optional decomposition differs from the usual Doob decomposition of a supermartingale w.r.t. a single probability in the non-increasing part of the process, which here is adapted but not necessarily predictable.

Proof. Without loss of generality we can assume that $B_t \in L^1(P)$ under the reference measure, $\forall\, 0 \le t \le T$.

($\Leftarrow$) $\forall Q \in \mathcal{P}^*$ we get the supermartingale property
$$ E_Q\big( \Delta U_t \mid \mathcal{F}_{t-1} \big) = \xi_t\cdot\underbrace{E_Q\big( \Delta S_t \mid \mathcal{F}_{t-1} \big)}_{=0} - \underbrace{E_Q\big( \Delta B_t \mid \mathcal{F}_{t-1} \big)}_{\ge 0} \le 0. $$

($\Rightarrow$) Let $U_t$ be a $(Q,\mathbb{F})$-supermartingale $\forall Q \in \mathcal{P}^*$, and define
$$ \mathcal{V}_t = \big\{ \xi_t\cdot\Delta S_t : \xi_t \in L^0(\Omega,\mathcal{F}_{t-1},P;\mathbb{R}^d) \big\}, $$
the set of attainable profits between two consecutive times. We need to prove that $\forall t = 1,\dots,T$
$$ U_t - U_{t-1} \in \mathcal{K}_t := \mathcal{V}_t - L^0(\Omega,\mathcal{F}_t,P;\mathbb{R}_+), \qquad (2.2.1) $$
where we can assume without loss of generality that $P$ is risk-neutral, i.e. $E_P(S_t \mid \mathcal{F}_{t-1}) = S_{t-1}$. Assuming that (2.2.1) does not hold, for some $t \in \{1,\dots,T\}$
$$ U_t - U_{t-1} \notin \mathcal{C}_t = \big( \mathcal{V}_t - L^0_+(\Omega,\mathcal{F}_t,P) \big)\cap L^1(P), $$
where $\mathcal{C}_t$ is a convex cone closed in $L^1(\Omega,\mathcal{F}_t,P)$. By the separating hyperplane theorem, there exists $\zeta_t \in L^\infty(\Omega,\mathcal{F}_t,P)$ such that
$$ \sup_{W \in \mathcal{C}_t} E_P(W\zeta_t) = 0 < E_P\big( (U_t - U_{t-1})\zeta_t \big) < \infty. $$

By taking $W = -\mathbf{1}(\zeta_t < 0) \in \mathcal{C}_t$ it follows that $\zeta_t \ge 0$ $P$-a.s. We show that we can also find a $\zeta_t$ with the same property and bounded away from zero, $\zeta_t \ge \varepsilon$ $P$-a.s. for some $\varepsilon > 0$. By the Fatou lemma, for $W = (\xi_t\cdot\Delta S_t - Y) \in \mathcal{C}_t$, with $Y \ge 0$ and $\xi_t$ $\mathcal{F}_{t-1}$-measurable,
$$ W = \lim_{n\to\infty}\mathbf{1}(|\xi_t| \le n)\,W \quad P\text{-a.s. and in } L^1(P), $$
where for each $n$, $E_P\big( \xi_t\mathbf{1}(|\xi_t| \le n)\cdot\Delta S_t \big) = 0$, since $S_t$ is a $P$-martingale and $\xi_t\mathbf{1}(|\xi_t| \le n)$ is bounded and $\mathcal{F}_{t-1}$-measurable, and
$$ E_P(W) = \lim_{n\to\infty} E_P\big( \mathbf{1}(|\xi_t| \le n)\,W \big) \le 0. $$
Therefore for some small enough $\varepsilon > 0$ and $\forall W \in \mathcal{C}_t$
$$ E_P\big( (\zeta_t + \varepsilon)W \big) \le 0 < \delta = E_P\big( (\zeta_t + \varepsilon)(U_t - U_{t-1}) \big). $$
Let us define
$$ Z_t = \frac{\zeta_t + \varepsilon}{E_P(\zeta_t \mid \mathcal{F}_{t-1}) + \varepsilon} \in L^\infty(\Omega,\mathcal{F}_t,P), $$
with $E_P(Z_t \mid \mathcal{F}_{t-1}) = 1$, and introduce the new measure $Q$ with $\frac{dQ}{dP} = Z_t$. By using the abstract Bayes formula,
$$ E_Q\big( S_t - S_{t-1} \mid \mathcal{F}_{t-1} \big) = \frac{E_P\big( (S_t - S_{t-1})Z_t \mid \mathcal{F}_{t-1} \big)}{E_P(Z_t \mid \mathcal{F}_{t-1})} = \frac{E_P\big( (S_t - S_{t-1})(\zeta_t + \varepsilon) \mid \mathcal{F}_{t-1} \big)}{E_P(\zeta_t \mid \mathcal{F}_{t-1}) + \varepsilon} = 0, $$
so $Q \in \mathcal{P}^*$ is a martingale measure for $S_t$, and by assumption $E_Q\big( U_t - U_{t-1} \mid \mathcal{F}_{t-1} \big) \le 0$. By using the abstract Bayes formula again,
$$ 0 \ge E_Q\Big( E_Q(U_t - U_{t-1} \mid \mathcal{F}_{t-1})\, E_P(\zeta_t + \varepsilon \mid \mathcal{F}_{t-1}) \Big) = E_Q\Big( (U_t - U_{t-1})\, E_P(\zeta_t + \varepsilon \mid \mathcal{F}_{t-1}) \Big) $$
$$ = E_P\Big( (U_t - U_{t-1})\,\frac{\zeta_t + \varepsilon}{E_P(\zeta_t + \varepsilon \mid \mathcal{F}_{t-1})}\, E_P(\zeta_t + \varepsilon \mid \mathcal{F}_{t-1}) \Big) = E_P\big( (U_t - U_{t-1})(\zeta_t + \varepsilon) \big) = \delta > 0, $$
which is a contradiction.

Chapter 3

Option Pricing and Hedging in the Cox-Ross-Rubinstein Binomial market model

Consider the finite probability space $(\Omega,\mathcal{F},P)$, where $\Omega = \{0,1\}^T$ with $T < \infty$, $\mathcal{F} = 2^{\Omega}$ is the collection of all subsets, and the probability measure satisfies $P(\{\omega\}) > 0$ for all $\omega \in \Omega$. A history is a vector $\omega = (\omega_1,\dots,\omega_T) \in \Omega$, and we denote $\omega^t = (\omega_1,\dots,\omega_t)$ for $t \le T$. We have a market with a bank account $B_t$ and a stock price $S_t(\omega)$, $t = 0,1,\dots,T$, adapted to the filtration $\mathbb{F}$ with $\mathcal{F}_t = \sigma(\omega_s, s \le t)$, $\mathcal{F}_0 = \{\Omega,\emptyset\}$. We assume that there are $\{\mathcal{F}_t\}$-predictable processes $U_t(\omega) > R_t(\omega) > D_t(\omega) > -1$. $B_0 > 0$ and $S_0 > 0$ are deterministic values, and we let
$$ B_t = B_0\prod_{s=1}^{t}(1+R_s), \qquad S_t = S_0\prod_{s=1}^{t}\big( 1 + D_s + \omega_s(U_s - D_s) \big). $$

Suppose that $G(\omega)$ is an $\mathcal{F}_T$-measurable contingent claim, and we want to find a self-financing hedging strategy $(\beta_t,\gamma_t)$ satisfying
$$ V_t = \beta_tB_t + \gamma_tS_t = \beta_{t+1}B_t + \gamma_{t+1}S_t. $$
Let $\bar{G}(\omega) = G(\omega)/B_T(\omega)$ be the discounted contingent claim. We show first that there is a unique probability measure $Q$ such that $Q \sim P$ and the discounted process $\bar{S}_t := S_t/B_t$ is a $Q$-martingale.

[Figure 3.1: Values of the discounted stock process $\widetilde{S}_t = S_t/B_t$ in the recombining binary tree with constant parameters $-1 < d < r < u$, $\widetilde{S}_0 = 1$: after $k$ up-moves in $t$ steps the node value is $(1+u)^k(1+d)^{t-k}/(1+r)^t$.]

Once we have shown that $Q$ is the unique martingale measure for $(\bar{S}_t)$ in the filtration $\mathbb{F}$, it follows that every $(Q,\mathbb{F})$-martingale $(N_t)$ has the representation

$$ N_t = N_0 + \sum_{u=1}^{t} H_u\Delta\bar{S}_u, $$
where $(H_t)$ is an $\mathbb{F}$-predictable process. In particular we can take $N_t = E_Q(\bar{G} \mid \mathcal{F}_t)$, and obtain when $t = T$
$$ \bar{G}(\omega) = \frac{G(\omega)}{B_T(\omega)} = E_Q(\bar{G} \mid \mathcal{F}_T) = E_Q(\bar{G}) + \sum_{t=1}^{T}\gamma_t\Delta\bar{S}_t, $$
where $(\gamma_t)$ is an $\mathbb{F}$-predictable process. This gives the unique price $c(G) = E_Q(\bar{G})B_0$ and the hedging strategy for the contingent claim $G$.

Let us first compute the martingale measure $Q$:
$$ \Delta\bar{S}_t = \frac{S_t}{B_t} - \frac{S_{t-1}}{B_{t-1}} = \frac{S_{t-1}}{B_{t-1}}\Big( \frac{1 + D_t + (U_t - D_t)\omega_t}{1 + R_t} - 1 \Big) = \frac{S_{t-1}}{B_{t-1}(1+R_t)}\Big( (U_t - D_t)\omega_t - (R_t - D_t) \Big). $$
Taking the conditional expectation with respect to a measure $Q$ and imposing the martingale property,
$$ E_Q(\Delta\bar{S}_t \mid \mathcal{F}_{t-1}) = \frac{S_{t-1}}{B_{t-1}(1+R_t)}\Big( (U_t - D_t)E_Q(\omega_t \mid \mathcal{F}_{t-1}) - (R_t - D_t) \Big) = 0, $$
which implies that $Q$ is a martingale measure for $(\bar{S}_t)$ if and only if
$$ q_t(\omega^{t-1}) := E_Q(\omega_t \mid \mathcal{F}_{t-1}) = Q(\omega_t = 1 \mid \mathcal{F}_{t-1}) = \frac{R_t - D_t}{U_t - D_t}, $$
where $q_t(\omega^{t-1}) \in (0,1)$ is a probability, since we have assumed that $D_t < R_t < U_t$ $P$-a.s., and it is uniquely determined. We define globally the unique risk-neutral measure $Q$ as
$$ Q(\omega) = \prod_{t=1}^{T} q_t(\omega^{t-1})^{\omega_t}\big( 1 - q_t(\omega^{t-1}) \big)^{1-\omega_t}, $$
with $Q \sim P$ since $Q(\{\omega\}) > 0$ for all $\omega \in \Omega$. We define the basic $Q$-martingale
$$ M_t = \sum_{s=1}^{t}\big( \omega_s - q_s(\omega^{s-1}) \big). $$
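The martingale property of the discounted stock under $q = (r-d)/(u-d)$ can be checked exactly on a small tree; the constant parameters $u = 0.2$, $r = 0.05$, $d = -0.1$ below are illustrative, and exact rational arithmetic is used.

```python
import itertools
from fractions import Fraction

# Check that q = (r - d)/(u - d) makes the discounted stock a Q-martingale.
u, r, d = Fraction(1, 5), Fraction(1, 20), Fraction(-1, 10)
q = (r - d) / (u - d)
T, S0, B0 = 3, Fraction(100), Fraction(1)

def S_bar(omega):  # discounted stock S_T/B_T along the history omega
    s, b = S0, B0
    for w in omega:
        s *= (1 + d) + w * (u - d)
        b *= 1 + r
    return s / b

expectation = Fraction(0)
for omega in itertools.product([0, 1], repeat=T):
    p = Fraction(1)
    for w in omega:
        p *= q if w == 1 else 1 - q
    expectation += p * S_bar(omega)

assert 0 < q < 1               # a genuine probability, since d < r < u
assert expectation == S0 / B0  # E_Q[S_bar_T] = S_bar_0: the martingale property
```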

We write
$$ \Delta\bar{S}_t = \frac{S_{t-1}}{B_{t-1}(1+R_t)}(U_t - D_t)\big( \omega_t - q_t(\omega^{t-1}) \big) = \frac{S_{t-1}(U_t - D_t)}{B_{t-1}(1+R_t)}\,\Delta M_t, $$
and we can represent $\Delta M_t$ in terms of $\Delta\bar{S}_t$:
$$ \Delta M_t = \frac{B_{t-1}(1+R_t)}{S_{t-1}(U_t - D_t)}\,\Delta\bar{S}_t. $$
Next we show how to use the martingale representation to compute the hedging strategy for the contingent claim $G$.

Definition 3.0.1. If $X(\omega)$ is a function of the binary variables $\omega_1,\dots,\omega_T \in \{0,1\}$, we define its discrete Malliavin derivative, or discrete stochastic gradient, at time $t \in \{1,\dots,T\}$ w.r.t. $\omega_t$ as
$$ \nabla_tX(\omega) := X(\omega_1,\dots,\omega_{t-1},1,\omega_{t+1},\dots,\omega_T) - X(\omega_1,\dots,\omega_{t-1},0,\omega_{t+1},\dots,\omega_T). $$

Note that in general $\nabla_tX(\omega)$ is not $\mathcal{F}_t$-measurable, unless the r.v. $X(\omega) = X(\omega^t)$ is $\mathcal{F}_t$-measurable; in that case $\nabla_tX(\omega)$ is also $\mathcal{F}_{t-1}$-measurable. Consider now the $\mathcal{F}_T$-measurable option value $G(\omega)$ and the discounted value $\bar{G}(\omega) = G(\omega)B_T^{-1}$. The following quantities are $\mathcal{F}_{T-1}$-measurable:

$$ \nabla_TG(\omega^{T-1}) = G(\omega^{T-1},1) - G(\omega^{T-1},0), $$
$$ \nabla_T\bar{G}(\omega^{T-1}) = \bar{G}(\omega^{T-1},1) - \bar{G}(\omega^{T-1},0) = \frac{1}{B_T(\omega)}\big( G(\omega^{T-1},1) - G(\omega^{T-1},0) \big) = \frac{\nabla_TG(\omega^{T-1})}{B_T(\omega)}, $$
since $B_T(\omega)$ is $\mathcal{F}_{T-1}$-measurable, and
$$ \nabla_TS_T(\omega^{T-1}) = S_T(\omega^{T-1},1) - S_T(\omega^{T-1},0) = S_{T-1}(\omega^{T-1})\big( U_T(\omega^{T-1}) - D_T(\omega^{T-1}) \big), \qquad \nabla_T\bar{S}_T(\omega^{T-1}) = \frac{1}{B_T}\nabla_TS_T(\omega^{T-1}). $$

Note also that
$$ \Delta\bar{S}_T = \bar{S}_T - \bar{S}_{T-1} = \frac{S_{T-1}}{B_T}(U_T - D_T)(\omega_T - q_T) = \nabla_T\bar{S}_T(\omega^{T-1})\big( \omega_T - q_T(\omega^{T-1}) \big) $$
and
$$ \Delta M_T = \omega_T - q_T(\omega^{T-1}) = \frac{1}{\nabla_T\bar{S}_T}\,\Delta\bar{S}_T = \frac{B_T}{\nabla_TS_T}\,\Delta\bar{S}_T. $$

Denoting the discounted option price process as
$$ \bar{V}_t = E_Q(\bar{G} \mid \mathcal{F}_t), \quad \text{with } \bar{V}_T = \bar{G} = \frac{G}{B_T} \text{ for } t = T, $$
we have
$$ \bar{V}_T = \bar{G}(\omega) = \bar{G}(\omega^{T-1},\omega_T) = \bar{G}(\omega^{T-1},0) + \big( \bar{G}(\omega^{T-1},1) - \bar{G}(\omega^{T-1},0) \big)\omega_T = \bar{G}(\omega^{T-1},0) + \nabla_T\bar{G}(\omega^{T-1})\,\omega_T $$
$$ = \bar{G}(\omega^{T-1},0) + \nabla_T\bar{G}(\omega^{T-1})\,q_T + \nabla_T\bar{G}(\omega^{T-1})(\omega_T - q_T) = E_Q(\bar{G} \mid \mathcal{F}_{T-1}) + \nabla_T\bar{G}\,\Delta M_T = E_Q(\bar{G} \mid \mathcal{F}_{T-1}) + \frac{\nabla_T\bar{G}}{\nabla_T\bar{S}_T}\,\Delta\bar{S}_T $$
$$ = E_Q(\bar{G} \mid \mathcal{F}_{T-1}) + \frac{\nabla_TG}{\nabla_TS_T}\,\frac{1}{B_T}\Big( \Delta S_T - \frac{S_{T-1}}{B_{T-1}}\,\Delta B_T \Big), $$
where
$$ \frac{\nabla_T\bar{G}(\omega)}{\nabla_T\bar{S}_T(\omega)} = \frac{\nabla_TG(\omega)}{\nabla_TS_T(\omega)} \quad \text{and} \quad \Delta\bar{S}_t = \Delta(S_t/B_t) = \frac{1}{B_t}\Big( \Delta S_t - \frac{S_{t-1}}{B_{t-1}}\,\Delta B_t \Big). $$
The option value process is obtained by multiplying the discounted value by the numeraire, $V_t = \bar{V}_tB_t$, and we obtain the increments by the Abel summation formula (5.0.1) (discrete integration by parts):
$$ \Delta V_T = \Delta(\bar{V}_TB_T) = B_T\Delta\bar{V}_T + \bar{V}_{T-1}\Delta B_T = \frac{\nabla_TG}{\nabla_TS_T}\,\Delta S_T - \frac{\nabla_TG}{\nabla_TS_T}\,\frac{S_{T-1}}{B_{T-1}}\,\Delta B_T + \frac{V_{T-1}}{B_{T-1}}\,\Delta B_T = \frac{\nabla_TG}{\nabla_TS_T}\,\Delta S_T + \frac{V_{T-1} - S_{T-1}\frac{\nabla_TG}{\nabla_TS_T}}{B_{T-1}}\,\Delta B_T. $$

By investing at time $T-1$ the (random) wealth
$$ c_{T-1}(G) = E_Q\big( \bar{G} \mid \mathcal{F}_{T-1} \big)(\omega)\,B_{T-1}(\omega) = \frac{E_Q(G \mid \mathcal{F}_{T-1})(\omega)}{1+R_T}, $$
we replicate the contingent claim $G$ as follows. We buy $\gamma_T$ units of the stock $S$ at price $S_{T-1}$, with
$$ \gamma_T = \frac{\nabla_TG}{\nabla_TS_T}, $$
so that the wealth invested in stocks at time $T-1$ is $\gamma_TS_{T-1}$ (if $\gamma_T < 0$ we short-sell stocks), if necessary borrowing from the bank at the predictable interest rate $R_T$, and we invest the remaining wealth in the riskless bank account $B$, buying $\beta_T$ units with
$$ \beta_T = \frac{1}{B_{T-1}}\big( c_{T-1}(G) - \gamma_TS_{T-1} \big) $$
at price $B_{T-1}$, so that the total value of the investment at time $T-1$ is
$$ V_{T-1} = c_{T-1}(G) = \beta_TB_{T-1} + \gamma_TS_{T-1}. $$
At time $T$ the value of the portfolio becomes
$$ V_T = \beta_TB_T + \gamma_TS_T = \beta_TB_{T-1}(1+R_T) + \gamma_TS_{T-1} + \gamma_T\Delta S_T = E_Q(G \mid \mathcal{F}_{T-1}) - \gamma_TS_{T-1}(1+R_T) + \gamma_TS_{T-1} + \gamma_T\Delta S_T $$
$$ = E_Q(G \mid \mathcal{F}_{T-1}) - \gamma_TS_{T-1}R_T + \gamma_T\Delta S_T = E_Q(G \mid \mathcal{F}_{T-1}) + \gamma_T\big( S_T - (1+R_T)S_{T-1} \big) $$
$$ = E_Q(G \mid \mathcal{F}_{T-1}) + B_T\gamma_T\Big( \frac{S_T}{B_T} - \frac{(1+R_T)S_{T-1}}{B_T} \Big) = B_T\Big( E_Q(\bar{G} \mid \mathcal{F}_{T-1}) + \gamma_T\Delta\bar{S}_T \Big) = B_T\bar{G} = G. $$

Remark 3.0.1. The martingale measure $Q$, when it is unique, gives a device to compute the price and the hedging strategy. In fact the price and the hedging can be computed without using probability, once we have assumed that all histories $\omega \in \Omega$ have positive probability. A direct way to compute the hedging without using martingales is to solve at time $T$ the system of equations
$$ G(\omega^{T-1},0) = B_T\beta_T + \gamma_TS_{T-1}(1+D_T), \qquad G(\omega^{T-1},1) = B_T\beta_T + \gamma_TS_{T-1}(1+U_T). $$
By subtracting these two equations we get
$$ \gamma_T = \frac{\nabla_TG(\omega^{T-1})}{S_{T-1}(U_T - D_T)}, $$

and by adding the two equations with the respective weights $1 - q_T(\omega^{T-1})$ (corresponding to $\omega_T = 0$) and $q_T(\omega^{T-1})$ (corresponding to $\omega_T = 1$) we obtain
$$ \beta_T = \frac{1}{B_T}\big( E_Q(G \mid \mathcal{F}_{T-1}) - \gamma_TE_Q(S_T \mid \mathcal{F}_{T-1}) \big) = \frac{1}{B_T}E_Q(G \mid \mathcal{F}_{T-1}) - \gamma_T\frac{S_{T-1}}{B_{T-1}}. $$
Combining these together we get the price of the contingent claim at time $T-1$:
$$ c_{T-1}(G) = \beta_TB_{T-1} + \gamma_TS_{T-1} = \frac{1}{1+R_T}E_Q(G \mid \mathcal{F}_{T-1}). $$

The martingale method has the advantage that it gives a probabilistic interpretation to the price of the contingent claim, which can be computed directly as a $Q$-expectation. Another advantage is that the martingale method also works in continuous-time settings.

We found the price of the option $G$ at time $T-1$, and the hedging strategy between times $T-1$ and $T$. The option price at any time $t$ and the hedging strategy on the whole time interval $t = 1,\dots,T$ are then obtained by backward induction. Let $c_t(G)$ be the price of the contract $G$ at time $t \le T$. This is an $\mathcal{F}_t$-measurable contingent claim: it means that we are able to hedge the contingent claim $G$ expiring at time $T$ when at time $t$ we have a portfolio with wealth $c_t(G)$. By repeating the martingale argument, or by writing directly the system of equations, we find the price $c_{t-1}(G)$ at time $t-1$ and the replicating portfolio $\beta_t(\omega^{t-1})$, $\gamma_t(\omega^{t-1})$ of the contract $c_t(G)$:
$$ c_{t-1}(G) = E_Q\Big( \frac{B_{t-1}}{B_t}\,c_t(G) \,\Big|\, \mathcal{F}_{t-1} \Big) = E_Q\Big( \frac{B_{t-1}}{B_t}\,B_t\,E_Q\big( B_T^{-1}G \mid \mathcal{F}_t \big) \,\Big|\, \mathcal{F}_{t-1} \Big) = B_{t-1}E_Q\big( B_T^{-1}G \mid \mathcal{F}_{t-1} \big). $$

By combining the hedging between times $t-1$ and $t$ with the hedging between $t$ and the maturity $T$, we obtain a hedging strategy over the whole discrete time interval. The martingale method enables us to compute directly the price and the replicating strategy at all times $t$ by computing $Q$-expectations. The predictable representation property of the $Q$-martingale $M$ gives:

Theorem 3.0.1 (Discrete Clark--Ocone formula).
$$ E_Q(\bar{G} \mid \mathcal{F}_t)(\omega) = E_Q(\bar{G}) + \sum_{s=1}^{t}\nabla_sE_Q\big( \bar{G}(\omega) \mid \mathcal{F}_s \big)\big( \omega_s - q_s(\omega^{s-1}) \big) = E_Q(\bar{G}) + \sum_{u=1}^{t}\frac{\nabla_uE_Q(\bar{G}(\omega) \mid \mathcal{F}_u)}{\nabla_u\bar{S}_u}\,\Delta\bar{S}_u, $$
where by definition $\nabla_tE_Q(\bar{G}(\omega) \mid \mathcal{F}_t)$ is $\mathcal{F}_{t-1}$-measurable.
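The formula can be tested path by path on a small tree. The sketch below uses $T = 3$, constant illustrative parameters and a call payoff (all hypothetical choices), and checks with exact rational arithmetic that the discounted claim equals its Clark--Ocone expansion on every history.

```python
import itertools
from fractions import Fraction

# Discrete Clark-Ocone check in a 3-period CRR model with constant u, d, r
# (so q is constant): Gbar(w) = E_Q[Gbar] + sum_t grad_t E_Q[Gbar|F_t](w) (w_t - q).
u, r, d = Fraction(1, 5), Fraction(1, 20), Fraction(-1, 10)
q = (r - d) / (u - d)
T, S0, K = 3, Fraction(100), Fraction(100)

def prob(tail):  # Q-probability of the future coordinates in `tail`
    p = Fraction(1)
    for w in tail:
        p *= q if w == 1 else 1 - q
    return p

def Gbar(omega):  # discounted call payoff G/B_T
    s = S0
    for w in omega:
        s *= (1 + d) + w * (u - d)
    return max(s - K, Fraction(0)) / (1 + r) ** T

def cond_E(head):  # E_Q[Gbar | F_t] evaluated at omega^t = head
    return sum(prob(tail) * Gbar(head + tail)
               for tail in itertools.product([0, 1], repeat=T - len(head)))

checked = 0
for omega in itertools.product([0, 1], repeat=T):
    rhs = cond_E(())
    for t in range(1, T + 1):
        head = omega[:t]
        # grad_t of E_Q[Gbar|F_t]: flip the t-th coordinate of omega^t
        grad = cond_E(head[:-1] + (1,)) - cond_E(head[:-1] + (0,))
        rhs += grad * (omega[t - 1] - q)
    assert rhs == Gbar(omega)   # the representation is exact, path by path
    checked += 1
```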

We set
$$ \gamma_t = \frac{\nabla_tE_Q(G(\omega) \mid \mathcal{F}_t)}{\nabla_tS_t}. $$
This gives
$$ V_t = E_Q(G \mid \mathcal{F}_t) = E_Q(G \mid \mathcal{F}_{t-1}) + \gamma_tB_t\Delta\bar{S}_t = \frac{E_Q(G \mid \mathcal{F}_{t-1})}{1+R_t} + \gamma_t\Delta S_t + \Big( \frac{E_Q(G \mid \mathcal{F}_{t-1})}{1+R_t} - \gamma_tS_{t-1} \Big)\frac{\Delta B_t}{B_{t-1}} = V_{t-1} + \gamma_t\Delta S_t + \beta_t\Delta B_t, $$
where
$$ \beta_t = \Big( \frac{E_Q(G \mid \mathcal{F}_{t-1})}{1+R_t} - \gamma_tS_{t-1} \Big)\frac{1}{B_{t-1}}. $$

This means that to obtain a portfolio with value $E_Q(G \mid \mathcal{F}_t)$ at time $t$, we need to invest
$$ c_{t-1} := \frac{E_Q(G \mid \mathcal{F}_{t-1})}{1+R_t} $$
at time $t-1$. Equivalently, to have $E_Q\big( G\frac{B_t}{B_T} \mid \mathcal{F}_t \big)$ in our portfolio at time $t$, we need to invest the amount
$$ E_Q\Big( G\frac{B_{t-1}}{B_T} \,\Big|\, \mathcal{F}_{t-1} \Big) \quad \text{at time } t-1. $$
Inductively, to have $G = E_Q(G \mid \mathcal{F}_T)$ at time $T$, we have to invest at time $t \le T$ the amount
$$ c_t(G) = E_Q\Big( G\frac{B_t}{B_T} \,\Big|\, \mathcal{F}_t \Big). $$
The hedging at time $t-1$ is given by
$$ \gamma_t = \frac{\nabla_tE_Q\big( G\frac{B_t}{B_T} \mid \mathcal{F}_t \big)}{\nabla_tS_t} = \frac{\nabla_tc_t(G)}{\nabla_tS_t}, \qquad \beta_t = \frac{1}{B_{t-1}}\big( c_{t-1}(G) - \gamma_tS_{t-1} \big), $$
giving
$$ V_t = c_t(G) = c_0(G) + \sum_{u=1}^{t}\big( \gamma_u\Delta S_u + \beta_u\Delta B_u \big), \qquad V_T = G = c_0(G) + \sum_{u=1}^{T}\big( \gamma_u\Delta S_u + \beta_u\Delta B_u \big). $$

When $R_t$ is deterministic, we can take the discounting factors $B_t/B_T$ outside the conditional expectation also when $T - t > 1$. When $(D_t,R_t,U_t)$ are deterministic, under the martingale measure $Q$ the random variable $\omega_t$ is independent of the past. Then the computation of the hedging strategy may be simplified by using the following formula:

Corollary 3.0.1. If $(D_t,R_t,U_t)$ are deterministic for all $t \le T$, conditional expectation and gradient commute in the Clark--Ocone formula:
$$ \nabla_tE_Q(G \mid \mathcal{F}_t) = E_Q(\nabla_tG \mid \mathcal{F}_t) = E_Q(\nabla_tG \mid \mathcal{F}_{t-1}), $$
giving
$$ E_Q(G \mid \mathcal{F}_t)(\omega) = E_Q(G) + \sum_{s=1}^{t} E_Q(\nabla_sG \mid \mathcal{F}_s)\big( \omega_s - q_s(\omega^{s-1}) \big). $$

Proof. When $\omega = (\omega_1,\dots,\omega_T)$ we denote by $\omega^{t,T}$ the vector $(\omega_t,\dots,\omega_T)$. Using the independence of the r.v.'s $(\omega_t)$,
$$ E_Q(\nabla_tG \mid \mathcal{F}_t)(\omega^t) = \sum_{\omega^{t+1,T}\in\{0,1\}^{T-t}} \big( G(\omega^{t-1},1,\omega^{t+1,T}) - G(\omega^{t-1},0,\omega^{t+1,T}) \big)\,Q(\omega^{t+1,T}) = \nabla_tE_Q(G \mid \mathcal{F}_t)(\omega^t), $$
which is $\mathcal{F}_{t-1}$-measurable.

Example 3.0.1. Assume that $R_t = r$, $U_t = u$, $D_t = d$ are deterministic, with $-1 < d < r < u$. Then $q_t = q = (r-d)/(u-d)$ is constant. We have
$$ S_t = S_0(1+u)^{N_t}(1+d)^{t-N_t}, $$
where $N_t = \sum_{s=1}^{t}\omega_s$. Then if $G(\omega) = \varphi(S_T)$ is a plain European option, we compute the price at time $t = 0$ using the Binomial$(q,T)$ distribution:
$$ V_0 = c_0(G) = B_0E_Q\big( \varphi(S_T)/B_T \big) = (1+r)^{-T}\sum_{n=0}^{T}\binom{T}{n}q^n(1-q)^{T-n}\,\varphi\big( S_0(1+u)^n(1+d)^{T-n} \big). $$
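The closed-form binomial sum can be cross-checked against backward induction $V_{t-1} = \big( qV_t^{\text{up}} + (1-q)V_t^{\text{down}} \big)/(1+r)$ on the recombining tree; the parameters and the strike below are illustrative, and the two prices must agree exactly.

```python
from fractions import Fraction
from math import comb

# Closed-form binomial price vs. backward induction on the recombining tree.
u, r, d = Fraction(1, 5), Fraction(1, 20), Fraction(-1, 10)
q = (r - d) / (u - d)
T, S0, K = 5, Fraction(100), Fraction(105)

phi = lambda s: max(s - K, Fraction(0))   # European call payoff

# closed-form binomial sum for V_0
V0_formula = sum(
    comb(T, n) * q**n * (1 - q)**(T - n)
    * phi(S0 * (1 + u)**n * (1 + d)**(T - n))
    for n in range(T + 1)
) / (1 + r)**T

# backward induction; at time t, node n means n up-moves so far
V = [phi(S0 * (1 + u)**n * (1 + d)**(T - n)) for n in range(T + 1)]
for t in range(T - 1, -1, -1):
    V = [(q * V[n + 1] + (1 - q) * V[n]) / (1 + r) for n in range(t + 1)]

assert V[0] == V0_formula   # exact agreement in rational arithmetic
```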

The conditional distribution of $N_T - N_t$ given $\mathcal{F}_t$ is Binomial$(q, T-t)$, and the price of the replicating portfolio at time $t$ is given by
$$ V_t = c_t(G) = B_tE_Q\big( \varphi(S_T)/B_T \mid \mathcal{F}_t \big) = (1+r)^{t-T}\sum_{n=0}^{T-t}\binom{T-t}{n}q^n(1-q)^{T-t-n}\,\varphi\big( S_0(1+u)^{N_t+n}(1+d)^{T-N_t-n} \big). $$
This wealth is invested in $\gamma_{t+1}$ stocks and the rest in the bank account, with

$$ \gamma_{t+1} = \frac{\nabla_{t+1}c_{t+1}(G)}{\nabla_{t+1}S_{t+1}} = (1+r)^{t+1-T}\,\frac{E_Q(\nabla_{t+1}G \mid \mathcal{F}_t)}{S_t(u-d)} = \frac{(1+r)^{t+1-T}}{S_t(u-d)}\sum_{n=0}^{T-t-1}\binom{T-t-1}{n}q^n(1-q)^{T-t-1-n}\Big( \varphi\big( S_0(1+u)^{N_t+n+1}(1+d)^{T-N_t-n-1} \big) - \varphi\big( S_0(1+u)^{N_t+n}(1+d)^{T-N_t-n} \big) \Big). $$

Chapter 4

American options in discrete time

4.1 Optimal Stopping and dynamic programming

Definition 4.1.1. Let $(F_t : t = 0,1,\dots,T)$ be an $\mathbb{F}$-adapted process in $L^1(P)$. The Snell envelope of $F_t$ is the smallest supermartingale $U_t$ with $U_t \ge F_t$ $\forall t$, and it is constructed by backward induction (also called dynamic programming) as
$$ U_T := F_T, \qquad U_t := F_t\vee E_P\big( U_{t+1} \mid \mathcal{F}_t \big), \quad t = T-1, T-2, \dots, 1, 0. $$

Proof. By construction $U_t \ge F_t$ $\forall t$, and $U_t$ is a supermartingale. If $Y_t$ is another supermartingale with $Y_t \ge F_t$, then $Y_T \ge F_T = U_T$, and by backward induction
$$ Y_t \ge E_P(Y_{t+1} \mid \mathcal{F}_t)\vee F_t \ge E_P(U_{t+1} \mid \mathcal{F}_t)\vee F_t = U_t, \qquad \forall t = T-1, T-2, \dots, 1, 0. $$
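The backward induction is immediate to implement. The sketch below builds the Snell envelope of a discounted American put payoff on a recombining CRR tree (the parameters, strike, and the resulting weight $q = 1/2$ are illustrative assumptions) and verifies the domination and supermartingale properties.

```python
from fractions import Fraction

# Snell envelope by dynamic programming: U_T = F_T,
# U_t = max(F_t, E(U_{t+1}|F_t)), for a discounted American put payoff.
u, r, d = Fraction(1, 5), Fraction(1, 20), Fraction(-1, 10)
q = (r - d) / (u - d)          # = 1/2 for these parameters
T, S0, K = 4, Fraction(100), Fraction(110)

def F(t, n):  # discounted face value (K - S_t)^+ / B_t at node (t, n)
    s = S0 * (1 + u)**n * (1 + d)**(t - n)
    return max(K - s, Fraction(0)) / (1 + r)**t

U = {(T, n): F(T, n) for n in range(T + 1)}
for t in range(T - 1, -1, -1):
    for n in range(t + 1):
        cont = q * U[(t + 1, n + 1)] + (1 - q) * U[(t + 1, n)]
        U[(t, n)] = max(F(t, n), cont)

# the Snell envelope dominates F and is a supermartingale:
for t in range(T):
    for n in range(t + 1):
        assert U[(t, n)] >= F(t, n)
        assert U[(t, n)] >= q * U[(t + 1, n + 1)] + (1 - q) * U[(t + 1, n)]
```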

Lemma 4.1.1. For each $0 \le t \le T$ the random time
$$ \tau_t = \inf\{ r : t \le r \le T \text{ such that } F_r = U_r \} \qquad (4.1.1) $$
is a stopping time, and the stopped supermartingale $(U_{r\wedge\tau_t} : r = t, t+1, \dots, T)$ is a martingale.

Proof. We check the martingale property. For $t \le r \le T$, on the event $\{\tau_t > r\}$ we have $U_r > F_r$, hence $U_r = E_P(U_{r+1} \mid \mathcal{F}_r)$, and
$$ U_{r\wedge\tau_t}\mathbf{1}(\tau_t > r) = U_r\mathbf{1}(\tau_t > r) = E_P\big( U_{r+1} \mid \mathcal{F}_r \big)\mathbf{1}(\tau_t > r) = E_P\big( U_{r+1}\mathbf{1}(\tau_t > r) \mid \mathcal{F}_r \big) = E_P\big( U_{(r+1)\wedge\tau_t}\mathbf{1}(\tau_t > r) \mid \mathcal{F}_r \big) = E_P\big( U_{(r+1)\wedge\tau_t} \mid \mathcal{F}_r \big)\mathbf{1}(\tau_t > r), $$
and
$$ U_{r\wedge\tau_t}\mathbf{1}(\tau_t \le r) = U_{\tau_t}\mathbf{1}(\tau_t \le r) = F_{\tau_t}\mathbf{1}(\tau_t \le r) = U_{(r+1)\wedge\tau_t}\mathbf{1}(\tau_t \le r) = E_P\big( U_{(r+1)\wedge\tau_t} \mid \mathcal{F}_r \big)\mathbf{1}(\tau_t \le r). $$

Theorem 4.1.1. The Snell envelope of the $\mathbb{F}$-adapted process $F_t$ can also be defined as
$$ U_t = \operatorname{ess\,sup}_{\tau\in\mathcal{T}_t} E_P\big( F_\tau \mid \mathcal{F}_t \big), $$
where we take the essential supremum over
$$ \mathcal{T}_t = \big\{ \mathbb{F}\text{-stopping times } \tau \text{ with values in } \{t, t+1, \dots, T\} \big\}. $$

Proof. Since $U_t$ is a supermartingale, for any stopping time $\tau$ the stopped process
$$ U_t^{\tau} = U_{t\wedge\tau} = U_0 + \sum_{k=1}^{t}\mathbf{1}(\tau \ge k)\,\Delta U_k, \qquad 0 \le t \le T, $$
is also a supermartingale, since it is the martingale transform of a supermartingale by a non-negative bounded predictable process. For all stopping times $\tau$ with values in $\{t, t+1, \dots, T\}$,
$$ E_P(F_\tau \mid \mathcal{F}_t) \le E_P(U_\tau \mid \mathcal{F}_t) = E_P(U_{\tau\wedge T} \mid \mathcal{F}_t) \le E_P(U_{\tau\wedge t} \mid \mathcal{F}_t) = U_t, \qquad (4.1.2) $$
therefore
$$ U_t \ge \operatorname{ess\,sup}_{\tau\in\mathcal{T}_t} E_P\big( F_\tau \mid \mathcal{F}_t \big). $$

Moreover, the stopping time $\tau_t$ defined in (4.1.1) belongs to $\mathcal{T}_t$, and by Lemma 4.1.1 the process $(U_r : t \le r \le T)$ stopped at time $\tau_t$ is a martingale; since necessarily $U_{\tau_t} = F_{\tau_t}$, the inequalities in (4.1.2) become equalities:
$$ E(F_{\tau_t} \mid \mathcal{F}_t) = E(U_{\tau_t} \mid \mathcal{F}_t) = E(U_{\tau_t\wedge T} \mid \mathcal{F}_t) = E(U_{\tau_t\wedge t} \mid \mathcal{F}_t) = U_t. $$

Definition 4.1.2. We say that a stopping time τe is an optimal exercise time for the americal option if (F ) = ess sup (F ) = U EP τe EP τ 0 τ∈T0 4.2. PRICING AND HEDGING AMERICAN OPTIONS 93

Proposition 4.1.1. A stopping time τ_e is an optimal exercise time if and only if the following two conditions hold

1. the stopped supermartingale U^{τ_e}_t = U_{t∧τ_e} is a martingale;
2. U_{τ_e} = F_{τ_e}.

Proof. The supermartingale U has Doob decomposition U_t = M_t − A_t, where M_t is a martingale and A_t is a predictable non-decreasing process with A_0 = 0. Assuming that F_0 is P-trivial, by definition τ_e is optimal if and only if

U_0 = E_P(F_{τ_e}) ≤ E_P(U_{τ_e}) = E_P(U^{τ_e}_T) ≤ E_P(U^{τ_e}_0) = E_P(U_0) = U_0,

which forces the inequalities to be equalities, and since U dominates F, necessarily F_{τ_e} = U_{τ_e} P-almost surely. On the other hand we have also

E_P(M_{τ_e}) = M_0 = U_0 = E_P(U_{τ_e}) = E_P(M_{τ_e}) − E_P(A_{τ_e}),

where the first equality follows from the Doob optional stopping theorem for the bounded stopping time τ_e ≤ T; hence E_P(A_{τ_e}) = 0, which implies A_{τ_e} = 0 P-almost surely since A is non-negative, and the stopped supermartingale U_{t∧τ_e} = M_{t∧τ_e} is a martingale. For the opposite implication, when both conditions hold,

E_P(F_{τ_e}) = E_P(U_{τ_e}) = E_P(U^{τ_e}_T) = E_P(U^{τ_e}_0) = U_0

by the martingale property of the stopped process, so τ_e is optimal.

4.2 Pricing and hedging american options

On a probability space (Ω, F, P) equipped with the filtration F = (F_t : t = 0, 1, ..., T), we have an arbitrage free market model with F-adapted price processes (B_t, S_t). Denote by S̃_t = S_t/B_t the discounted prices with respect to the numeraire instrument B. When the market is complete there is a unique equivalent martingale measure Q ∈ P* for S̃_t (in the case of an incomplete market we may think for simplicity that the market chooses for us a pricing measure Q ∈ P*).

Definition 4.2.1. Let F_t ≥ 0, t = 0, 1, ..., T be an F-adapted process. An american option with face-value process F_t is a contract which gives to the holder the possibility at any time t ∈ {0, 1, 2, ..., T} to exercise the option and receive the payment F_t.

The option can be exercised only once, and at every time t the decision whether to exercise the option immediately or to wait, in case the option was not exercised before, can be based only on the current information available, which is carried by the σ-algebra F_t. The exercise strategy corresponds to an F-stopping time τ(ω) taking values in {0, 1, ..., T}. The discounted price process Ũ_t of the american option is the Snell envelope under the martingale measure Q of the discounted face value process F̃_t = F_t/B_t, defined by backward induction as

Ũ_T = F̃_T,   Ũ_t = F̃_t ∨ E_Q(Ũ_{t+1} | F_t),   t = T − 1, ..., 1, 0

or by the equivalent definition

Ũ_t = ess sup_{τ∈T_t} E_Q(F̃_τ | F_t)

and the price of the american option at time t is U_t = Ũ_t B_t.

4.2.1 Hedging american options

Under the assumption of market completeness, the option writer constructs a hedging strategy for the american option as follows: starting from the Doob decomposition of the supermartingale Ũ

Ũ_t = M̃_t − Ã_t, where M̃ is a Q-martingale and Ã_t is predictable and non-decreasing with Ã_0 = 0, she constructs a predictable portfolio (η_t, ξ_t) replicating the standard option M̃_T with fixed maturity T, the time horizon of the american option, with discounted value process M̃_t satisfying the self-financing condition

Mft = ηt + ξtSet = ηt+1 + ξt+1Set

By construction

M̃_t = Ũ_t + Ã_t ≥ F̃_t + Ã_t ≥ F̃_t,   ∀t = 0, 1, ..., T

which means that a hedging strategy for M̃_T is a superhedging strategy for the discounted face value process F̃_t at all maturities 0 ≤ t ≤ T, and M̃_τ ≥ F̃_τ for every exercise strategy τ chosen by the option holder.

4.2.2 Optimal exercise strategies for the option holder

In order to act optimally, the option holder should not exercise the option before

τ_0 = inf{ t : F̃_t ≥ E_Q(Ũ_{t+1} | F_t) } = inf{ t : F̃_t = Ũ_t }

which is the first optimal exercise time, and should at the latest exercise the option at the stopping time

τ* = inf{ t : Ã_{t+1} > 0 } = inf{ t : F̃_t > E_Q(Ũ_{t+1} | F_t) }

which is the latest optimal exercise time (τ* is a stopping time since Ã_t is predictable). When the option holder uses a suboptimal strategy, the option writer who is hedging the option will gain a positive profit.
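On a small tree both boundary times can be read off node by node from the backward induction: τ_0 exercises at the first node where F̃ = Ũ, while Ã_{t+1} > 0, i.e. F̃_t > E_Q(Ũ_{t+1} | F_t), marks the nodes where waiting strictly loses value. A Python sketch on made-up two-period data (the tree and its numbers are illustrative assumptions, not from the text):

```python
import numpy as np

p = 0.5                                  # risk-neutral up-probability (toy value)
face = np.zeros((3, 3))                  # face[i, t]: payoff after i down-moves
face[0, 0] = 1.0; face[1, 1] = 3.0; face[1, 2] = 1.0; face[2, 2] = 4.0

T = face.shape[1] - 1
U = face.copy()
late_stop = np.zeros(face.shape, dtype=bool)   # nodes where dA_{t+1} > 0
late_stop[:, T] = True                         # an unexercised option pays F_T at T
for t in range(T - 1, -1, -1):
    cont = p * U[:t + 1, t + 1] + (1 - p) * U[1:t + 2, t + 1]
    U[:t + 1, t] = np.maximum(face[:t + 1, t], cont)
    late_stop[:t + 1, t] = face[:t + 1, t] > cont   # exercise at the latest here

first_stop = (face == U)   # tau_0 stops at the first such node along the path
# here the down node at t = 1 (face 3 > continuation 2.5) lies in both regions,
# while at the root face 1 < continuation 1.75, so it is optimal to wait.
```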

4.2.3 Relation between american and european call options

Proposition 4.2.1. Assume that the discounted face value process F̃_t = F_t/B_t, t = 0, 1, ..., T is a submartingale w.r.t. the risk neutral pricing measure Q. Then Ũ_t = U_t/B_t is a Q-martingale, the constant exercise strategy τ_e ≡ T is optimal, and the price of the american option (F_t : t = 0, 1, ..., T) coincides with the price of the standard option F_T with maturity T. In particular this holds when the face value has the form F_t = f(S_t/B_t) B_t with convex f.

Proof. At the time horizon T, U_T = F_T. Since F̃ is a Q-submartingale,

B_{T−1} E_Q( F_T / B_T | F_{T−1} ) ≥ F_{T−1}

which implies

U_{T−1} = F_{T−1} ∨ B_{T−1} E_Q( U_T / B_T | F_{T−1} ) = B_{T−1} E_Q( U_T / B_T | F_{T−1} )

and by backward induction

U_{t−1} = F_{t−1} ∨ B_{t−1} E_Q( U_t / B_t | F_{t−1} ) = B_{t−1} E_Q( U_t / B_t | F_{t−1} )

since

F_{t−1} ≤ B_{t−1} E_Q( F_t / B_t | F_{t−1} ) ≤ B_{t−1} E_Q( U_t / B_t | F_{t−1} )

Now since Ũ_t is a martingale, the latest optimal stopping time τ* = inf{t ≤ T : Ã_{t+1} > 0} ≡ T. Therefore it is optimal for the option holder to wait and exercise the option at the time horizon T, and so for an american option holder following this optimal strategy there is no difference between the american option and the corresponding standard option F_T with fixed exercise time T. When F̃_t = f(S̃_t) with f convex, it follows by Jensen's inequality for the conditional expectation that F̃_t is a Q-submartingale.

Corollary 4.2.1. An american option with face value F_t = (S_t − K)^+ = B_t (S̃_t − K/B_t)^+, t = 0, 1, ..., T, has at time 0 ≤ t ≤ T the value

U_t = B_t E_Q( (S̃_T − K/B_T)^+ | F_t ) = B_t E_Q( ((S_T − K)/B_T)^+ | F_t ),

the price of a standard european call.

Remark Note that this does not hold for american options of put-type, and there is no put-call parity for american options.
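The statement can be checked numerically. Below is a Python sketch of a hand-written CRR pricer (written for illustration only, not the binprice routine used later; S0 = 50, K = 60, r = 0.01, u = 0.2, d = −0.1, N = 10 are toy parameters): with r ≥ 0 the american and european call prices coincide, while the american put is strictly more expensive than the european put.

```python
import numpy as np

def crr_price(S0, K, r, u, d, N, put=False, american=False):
    """CRR binomial pricing with per-period returns u (up) and d (down),
    riskless return r; risk-neutral up-probability p = (r - d)/(u - d)."""
    p = (r - d) / (u - d)
    disc = 1.0 / (1.0 + r)
    i = np.arange(N + 1)                       # number of down-moves at maturity
    S = S0 * (1.0 + u) ** (N - i) * (1.0 + d) ** i
    V = np.maximum((K - S) if put else (S - K), 0.0)
    for n in range(N, 0, -1):                  # backward induction
        V = disc * (p * V[:n] + (1 - p) * V[1:n + 1])
        if american:                           # compare with immediate exercise
            j = np.arange(n)
            S = S0 * (1.0 + u) ** (n - 1 - j) * (1.0 + d) ** j
            V = np.maximum(V, np.maximum((K - S) if put else (S - K), 0.0))
    return V[0]

euro_call = crr_price(50.0, 60.0, 0.01, 0.2, -0.1, 10)
amer_call = crr_price(50.0, 60.0, 0.01, 0.2, -0.1, 10, american=True)
euro_put = crr_price(50.0, 60.0, 0.01, 0.2, -0.1, 10, put=True)
amer_put = crr_price(50.0, 60.0, 0.01, 0.2, -0.1, 10, put=True, american=True)
```

For the call the early-exercise comparison never binds, in line with Corollary 4.2.1; for the deep in-the-money put the early-exercise premium is strictly positive, which is why there is no put-call parity for american options.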

4.3 Complements on American options

For the american option with (discounted) face value (F_t : t = 0, 1, ..., T), the value is given by U_T = F_T and by dynamic programming

U_t = max( F_t, E_Q(U_{t+1} | F_t) ),   t < T

We have

U_t = E_Q(U_{t+1} | F_t) + ( F_t − E_Q(U_{t+1} | F_t) )^+,   t < T

which leads to the Doob decomposition

U_{t+1} − U_t = [ U_{t+1} − E_Q(U_{t+1} | F_t) ] − ( F_t − E_Q(U_{t+1} | F_t) )^+   (4.3.1)

with ∆M_{t+1} := U_{t+1} − E_Q(U_{t+1} | F_t) and ∆A_{t+1} := ( F_t − E_Q(U_{t+1} | F_t) )^+, so that U_t = U_0 + M_t − A_t where M_0 = A_0 = 0, M_t is a martingale and A is F-predictable. U_t is also characterized as the smallest supermartingale such that U_t ≥ F_t ∀t.

4.4 Dual representation

Recall that an F-adapted and integrable process M_t is an (F, Q)-martingale if and only if E_Q(M_τ) = E_Q(M_0) for all bounded stopping times τ(ω). Let T_t = { F-stopping times τ with t ≤ τ ≤ T }. Note that if (N_u : t ≤ u ≤ T) is a martingale with N_t = 0 and τ ∈ T_t, then E_Q(N_τ | F_t) = 0 and

U_t := sup_{τ∈T_t} E_Q( F_τ | F_t ) = sup_{τ∈T_t} E_Q( F_τ − N_τ | F_t ) ≤ E_Q( sup_{t≤u≤T} { F_u − N_u } | F_t )

since F_τ − N_τ ≤ sup_{t≤u≤T} ( F_u − N_u ). Therefore

U_t := sup_{τ∈T_t} E_Q( F_τ | F_t )   (4.4.1)
≤ inf_{N martingale with N_t = 0} E_Q( sup_{t≤u≤T} { F_u − N_u } | F_t )   (4.4.2)

In fact, by using the Doob decomposition (4.3.1) we can show that the infimum is attained by choosing N_v = M_v − M_t = (U_v − U_t) + (A_v − A_t), so that the inequality (4.4.2) is in fact an equality:

U_t := sup_{τ∈T_t} E_Q( F_τ | F_t )   (4.4.3)
= inf_{N martingale with N_t = 0} E_Q( sup_{t≤u≤T} { F_u − N_u } | F_t )   (4.4.4)

Lemma 4.4.1.

U_t = max( F_t, F_{t+1} − ∆M_{t+1}, F_{t+2} − ∆M_{t+2} − ∆M_{t+1}, ..., F_T − ∆M_T − ∆M_{T−1} − ··· − ∆M_{t+1} )   (4.4.5)

Proof. By backward induction. (4.4.5) holds at t = T since F_T = U_T. Assuming that it holds at time t ≤ T, we have

U_{t−1} = max( F_{t−1}, E_Q(U_t | F_{t−1}) ) = max( F_{t−1}, U_t − ∆M_t )
= max( F_{t−1}, F_t − ∆M_t, F_{t+1} − ∆M_{t+1} − ∆M_t, F_{t+2} − ∆M_{t+2} − ∆M_{t+1} − ∆M_t, ..., F_T − ∆M_T − ∆M_{T−1} − ··· − ∆M_{t+1} − ∆M_t ).

Since (4.4.5) reads U_t = max_{t≤u≤T} ( F_u − (M_u − M_t) ), the martingale N_v = M_v − M_t = (U_v − U_t) + (A_v − A_t) achieves the bound.

Although in order to find the martingale N_t which achieves the minimum

U_t := sup_{τ∈T_t} E_Q( F_τ | F_t )   (4.4.6)
= inf_{N martingale with N_t = 0} E_Q( sup_{t≤u≤T} { F_u − N_u } | F_t )

we would need to compute the Snell envelope, often a suboptimal choice of the martingale N results in a useful upper bound for the american option prices.

4.5 Software tools for computing some option prices in the binomial tree model

We start to get familiar with the Octave/Matlab scientific computing language and with the financial package. Octave is free GNU software for scientific computing. It is a clone of the commercial Matlab software. With a licence you can also use Matlab; the computer code needs only minor modifications to switch between Matlab and Octave. Download and install Octave (the current version is 4.2.2) on your device following the instructions at https://www.gnu.org/software/octave/download.html There is also a free android version for tablets/phones downloadable from the play store: https://play.google.com/store/apps/details?id=com.octave&hl=en https://play.google.com/store/apps/details?id=com.octave.financial&hl=en For example in ubuntu you can install octave with these commands:

sudo apt-add-repository ppa:octave/stable
sudo apt-get update
sudo apt-get install octave

Download and install the octave financial package: http://octave.sourceforge.net/financial/ http://octave.sourceforge.net/financial/overview.html For example in the ubuntu linux distribution:

sudo apt-get install liboctave-dev
sudo apt-get install octave-info
sudo apt-get install octave-financial
octave --force-gui

>> pkg install −forge financial

>> pkg load financial

>> K = 60; S0 = 50; r = 0.05; sigma = 0.2; Dt = 0.01; N = 100; T = N*Dt;

>> [S, americancall] = binprice(S0, K, r, T, Dt, sigma, 0);

>> [S, americanput] = binprice(S0, K, r, T, Dt, sigma, 1);

>> europeanput = americancall(1,1) + K*exp(-r*T) - S0;

The output is the asset price and the American option value at each node of the binary tree. The octave and matlab function binprice() computes American call and put option prices using the binomial tree CRR model; note that the european put follows from put-call parity with the discounted strike.

[AssetPrice , OptionValue] = binprice (Price , Strike , Rate, Time, Increment, Volatility , OptType)

>> help binprice

>> doc binprice

Since in the CRR model the european and american call options have the same price, you can use the same binprice formula together with the call-put parity for european options to get the price of the european put. Run the example

>> option_example

downloadable from https://wiki.helsinki.fi/pages/viewpage.action?pageId=164343144&preview=/164343144/194680277/option_example.m

%%% octave 4.0.0 code by Dario Gasbarra [email protected] 21.3.2016
% uncomment to install the package if not installed
%pkg install -forge financial
pkg load financial
K = 60; S0 = 50; d = -0.1; r = 0.01; u = 0.2; T = 10;
[S, americancall, fv_americancall] = binpriceFacevalue(K, S0, d, r, u, T, 0);
[S, americanput, fv_americanput] = binpriceFacevalue(K, S0, d, r, u, T, 1);
europeanput = americancall + K - S0;
crrgraph(americancall, fv_americancall);
title('American call option prices and face value in the CRR binary tree model')
figure
crrgraph(americanput, fv_americanput);
title('American put option prices and face value in the CRR binary tree model')

using the function crrgraph() downloadable from https://wiki.helsinki.fi/pages/viewpage.action?pageId=164343144&preview=/164343144/194680274/crrgraph.m

% octave 4.0.0 code by Dario Gasbarra [email protected] 21.3.2016
function crrgraph(option, face)
  T = size(option, 1);
  N = T*(T+1)/2;
  B = zeros(T, T);
  XY = zeros(N, 2);
  last = 0;
  V = zeros(N, 1);
  F = zeros(N, 1);
  for i = 1:T
    idx = (last+1):(last+i);
    B(i, 1:i) = idx;
    XY(idx, 2) = i;
    XY(idx, 1) = (1:i);
    V(idx) = option(1:i, i);
    F(idx) = face(1:i, i);
    last = last + i;
  endfor
  A = zeros(N, N);
  for i = 1:(T-1)
    for j = 1:i
      A(B(i,j), [B(i+1,j) B(i+1,j+1)]) = 1;
    endfor
  endfor
  A = A + A';
  %XY(1,:) = XY(1,:) + T/2;
  %XY(2,:) = T - XY(2,:);
  gplot(A, XY, '-*');
  axis("off")
  dy = 0.3;
  for i = 1:N
    text(XY(i,1), XY(i,2), num2str(V(i)));
    text(XY(i,1), XY(i,2)+dy, num2str(F(i)));
  endfor
endfunction

and the function binpriceFacevalue(), downloadable from https://wiki.helsinki.fi/pages/viewpage.action?pageId=164343144&preview=/164343144/194680273/binpriceFacevalue.m

%%% octave 4.0.0 code by Dario Gasbarra [email protected] 21.3.2016 %%% it gives also the face values of the american option

## This program is free software; you can redistribute it and/or modify it under
## the terms of the GNU Lesser General Public License as published by the Free
## Software Foundation; either version 3 of the License, or (at your option) any
## later version.
##
## This program is distributed in the hope that it will be useful, but WITHOUT
## ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
## FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License
## for more details.
##
## You should have received a copy of the GNU Lesser General Public License
## along with this program; if not, see <https://www.gnu.org/licenses/>.

## Computes the American call and put option prices using the ## Cox−Ross−Rubinstein binomial tree. ## @seealso {binprice ,blkprice , blsprice}

function [AssetPrice, OptionValue, FaceValue] = binpriceFacevalue(Price, Strike, d, r, u, N, OptType)
%%% OptType = 0 for american call option
%%% OptType = 1 for american put option
%%% the function returns at each node of the CRR binary tree
%%% the price of the asset AssetPrice
%%% the price of the american option
%%% and the face value of the option

% (reconstructed check, the original line is truncated: the tree is arbitrage free iff -1 < d < r < u)
if ((d <= -1) || (r <= d) || (r >= u))
  error ("binpriceFacevalue: the returns must satisfy -1 < d < r < u");
endif

if (nargin != 7)
  print_usage ();
elseif (! isbool (OptType) && ! isequal (OptType, 0) && ...
        ! isequal (OptType, 1))
  error ("binprice: OPTTYPE must be logical, 0 (call) or 1 (put)");
endif

AssetPrice = OptionValue = FaceValue = zeros (N);

## Martingale measure
a = (1 + r);
p = (r - d) / (u - d);
q = 1 - p;

## Build tree
AssetPrice(1, 1) = Price;
for n = 2:N
  AssetPrice(1:(n - 1), n) = (1 + u) * AssetPrice(1:(n - 1), n - 1);
  AssetPrice(n, n) = (1 + d) * AssetPrice(n - 1, n - 1);
endfor

## Initial condition
opt = 2 * OptType - 1;
FaceValue(:, N) = OptionValue(:, N) = max (opt * (AssetPrice(:, N) - Strike), 0);

## Time-stepping loop
for n = (N - 1):-1:1
  HoldValue = (q * OptionValue(2:(n + 1), n + 1) + ...
               p * OptionValue(1:n, n + 1)) / a;
  FaceValue(1:n, n) = opt * (AssetPrice(1:n, n) - Strike);
  %OptionValue(1:n, n) = max (HoldValue, opt * (AssetPrice(1:n, n) - Strike));
  OptionValue(1:n, n) = max (HoldValue, FaceValue(1:n, n));
endfor

endfunction

## Tests
%K = 60; S0 = 50; d = -0.1; r = 0.01; u = 0.2; T = 10;
% [S, americancall, fv_americancall] = binpriceFacevalue(K, S0, d, r, u, T, 0);
% [S, americanput, fv_americanput] = binpriceFacevalue(K, S0, d, r, u, T, 1);

Chapter 5

Calculus with respect to functions of finite variation

The main goal of these lectures is to define an integral

(Y · X)_t := ∫_0^t Y(s) dX(s)

where X and Y are sample paths of stochastic processes. We start from the case of X and Y with finite variation on compacts, where standard Lebesgue-Stieltjes integration theory applies. Later on we will study Ito stochastic integration theory, which is based on martingale theory and allows to define stochastic integrals such as

∫_0^t W(s) dW(s)

where W(t) is a Brownian motion, whose sample paths, with probability P = 1, are continuous and have infinite variation on every interval.

Proposition 5.0.1 (Integration by parts formula). Let X, Y : R → R be non-decreasing and right-continuous functions. Then ∀a < b,

∫_a^b Y(t) X(dt) = ∫_a^b Y(t−) X(dt) + Σ_{t∈(a,b]} ∆Y(t) ∆X(t)
= X(b)Y(b) − X(a)Y(a) − ∫_a^b X(t−) Y(dt)

where these Lebesgue-Stieltjes integrals (which in fact exist also in the Riemann-Stieltjes sense) are well defined, X(t−) = lim_{s↑t} X(s) denotes the limit from the left and ∆X(t) = X(t) − X(t−).


It holds also when X and Y have finite variation on the interval [a, b], meaning that

X(t) = X^⊕(t) − X^⊖(t),   Y(t) = Y^⊕(t) − Y^⊖(t),

with non-decreasing and right-continuous functions X^⊕, X^⊖, Y^⊕, Y^⊖.

Proof. By proving the formula separately for the components of

∫_a^b Y(s) X(ds) = ∫_a^b Y^⊕(s) X^⊕(ds) − ∫_a^b Y^⊖(s) X^⊕(ds) + ∫_a^b Y^⊖(s) X^⊖(ds) − ∫_a^b Y^⊕(s) X^⊖(ds)

we can assume that X and Y are non-decreasing. We integrate with respect to the measures X(ds) and Y(ds) defined on R by the Caratheodory extension theorem, such that X((u, v]) = X(v) − X(u) and Y((u, v]) = Y(v) − Y(u) ∀u < v. We have

∫_a^b Y(t) X(dt) = ∫_{(a,b]} Y(t) X(dt),

and as t ≥ a,

Y(t) = Y(a) + ∫_a^∞ 1(s ≤ t) Y(ds).

This implies

∫_a^b Y(t) X(dt) = ∫_a^b ( Y(a) + ∫_a^b 1(s ≤ t) Y(ds) ) X(dt)
= Y(a)( X(b) − X(a) ) + ∫_a^b ( ∫_a^b 1(s ≤ t) Y(ds) ) X(dt).

Since the measures X(dt) and Y(ds) are finite on the compact interval [a, b] and the integrand is non-negative, by the Fubini theorem we can change the order of integration, obtaining

= Y(a)( X(b) − X(a) ) + ∫_a^b ( ∫_a^b 1(s ≤ t) X(dt) ) Y(ds)
= Y(a)( X(b) − X(a) ) + ∫_a^b ( X(b) − X(s−) ) Y(ds)
= X(b)Y(b) − X(a)Y(a) − ∫_a^b X(s−) Y(ds).

Note that we could also write

∫_a^b Y(t) X(dt) = ∫_a^b Y(t−) X(dt) + Σ_{t∈(a,b]} ∆Y(t) ∆X(t)

since a non-decreasing function has at most a countable number of jumps.

Definition 5.0.1. A right-continuous function with finite left-limits is called càdlàg, the french abbreviation for "continue à droite, limite à gauche".

Definition 5.0.2. For X, Y càdlàg with finite variation on [0, T],

[X, Y]_t := Σ_{0<s≤t} ∆X(s) ∆Y(s)

Example 5.0.1. For sequences (x_t : t ∈ N), (y_t : t ∈ N), denote ∆x_t = x_t − x_{t−1} and ∆y_t = y_t − y_{t−1}. In this case the discrete integration by parts is known as the Abel summation formula:

x_t y_t − x_0 y_0 = Σ_{s=1}^t x_{s−1} ∆y_s + Σ_{s=1}^t y_{s−1} ∆x_s + Σ_{s=1}^t ∆x_s ∆y_s   (5.0.1)
= Σ_{s=1}^t x_s ∆y_s + Σ_{s=1}^t y_{s−1} ∆x_s = Σ_{s=1}^t x_{s−1} ∆y_s + Σ_{s=1}^t y_s ∆x_s
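The identity (5.0.1) and its two one-sided variants are easy to sanity-check numerically; the random sequences in this Python sketch are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=11)            # x_0, ..., x_10
y = rng.normal(size=11)
dx, dy = np.diff(x), np.diff(y)    # dx_s = x_s - x_{s-1}, s = 1..10

lhs = x[-1] * y[-1] - x[0] * y[0]
rhs = (x[:-1] * dy).sum() + (y[:-1] * dx).sum() + (dx * dy).sum()
rhs2 = (x[1:] * dy).sum() + (y[:-1] * dx).sum()
rhs3 = (x[:-1] * dy).sum() + (y[1:] * dx).sum()
# all three right-hand sides agree with x_t y_t - x_0 y_0 up to roundoff
```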

Theorem 5.0.1 (Chain rule of calculus). Let X(t) be a cadlag function with finite variation on [0, T], let

a = inf{ X(t) : t ∈ [0, T] } > −∞,   b = sup{ X(t) : t ∈ [0, T] } < ∞,

and f ∈ C^1([a, b]). Then ∀t ∈ [0, T], we have the change of variable formula

f(X(t)) − f(X(0))   (5.0.2)
= ∫_0^t f′(X(s)) X(ds) + Σ_{0<s≤t} ( f(X(s)) − f(X(s−)) − f′(X(s)) ∆X(s) )
= ∫_0^t f′(X(s)) X^c(ds) + Σ_{0<s≤t} ( f(X(s)) − f(X(s−)) )

where

X^c(t) = X(t) − Σ_{0<s≤t} ∆X(s)

is the continuous part of X.

Proof. Let A = { f ∈ C^1([a, b]) : (5.0.2) holds }; we show that A is an algebra, closed under linear combinations and multiplication. Linearity is clear by the linearity of the derivative and of the integrals. Let h(x) = f(x)g(x) with f, g ∈ A. By using the integration by parts formula,

h(X(t)) − h(X(0)) = f(X(t))g(X(t)) − f(X(0))g(X(0))
= ∫_0^t f(X(s)) g′(X(s)) X^c(ds) + ∫_0^t g(X(s)) f′(X(s)) X^c(ds) + Σ_{0<s≤t} ( f(X(s))g(X(s)) − f(X(s−))g(X(s−)) )

= ∫_0^t h′(X(s)) X^c(ds) + Σ_{0<s≤t} ( h(X(s)) − h(X(s−)) )

so that h ∈ A.

Clearly A contains the constants and the identity x ↦ x, and therefore all polynomials. Given f ∈ C^1([a, b]), by the Weierstrass approximation theorem there is a sequence of polynomials (p_n) such that

‖f − p_n‖_∞ + ‖f′ − p′_n‖_∞ → 0 as n → ∞,

where the supremum norm is taken over the compact interval [a, b]. Necessarily

sup_n ‖p_n‖_∞ + sup_n ‖p′_n‖_∞ < ∞

and we have uniform Lipschitz continuity

|p_m(x) − p_m(y)| ≤ |x − y| sup_n ‖p′_n‖_∞   ∀x, y ∈ [a, b], m ∈ N

which implies

sup_n |p_n(X(s)) − p_n(X(s−))| ≤ |∆X(s)| sup_n ‖p′_n‖_∞

with

Σ_{0<s≤t} |∆X(s)| < ∞

By the bounded convergence theorem for the Lebesgue-Stieltjes integrals with respect to the positive measures derived from the non-decreasing processes X^c(t) and Σ_{s≤t} ∆X(s),

lim_{n→∞} ( ∫_0^t p′_n(X(s)) X^c(ds) + Σ_{0<s≤t} ( p_n(X(s)) − p_n(X(s−)) ) )
= ∫_0^t f′(X(s)) X^c(ds) + Σ_{0<s≤t} ( f(X(s)) − f(X(s−)) ),

so that (5.0.2) passes to the limit and f ∈ A.

Lemma 5.0.1. When X(t) is right-continuous with finite variation on [0, T] and Y(t) is bounded and measurable, the integral

J(t) = (Y · X)_t = ∫_0^t Y(s) X(ds)

is itself a right-continuous function with finite variation on [0, T]. If V(s) is another bounded measurable integrand we have

∫_0^t V(s) J(ds) = ∫_0^t V(s) Y(s) dX(s)   (5.0.3)

Proof. Without loss of generality assume that X(t) is non-decreasing on [0, T]. By taking the positive and negative parts of the integrand, Y^+(s) = Y(s) ∨ 0, Y^−(s) = (−Y(s)) ∨ 0, we decompose the integral as

J(t) = ∫_0^t Y^+(s) X(ds) − ∫_0^t Y^−(s) X(ds),   t ∈ [0, T]

which is the difference of two bounded non-decreasing functions. Therefore J(t) has finite variation on [0, T]. By construction J(t) is right-continuous, with

∆J(s) = J(s) − J(s−) = Y(s)( X(s) − X(s−) ).

Let's prove (5.0.3): it follows trivially for V(s) = 1(s ≤ r), since

∫_0^t V(s) Y(s) X(ds) = ∫_0^{t∧r} Y(s) X(ds) = J(t ∧ r) = ∫_0^t V(s) J(ds).

Since the intervals generate the Borel σ-algebra, (5.0.3) follows by the monotone class theorem for all bounded Borel-measurable V(s), if we can show that the class C of bounded functions V(s) satisfying (5.0.3) is a monotone class. Obviously constant functions satisfy (5.0.3), and C is a vector space by the linearity of the integral. Let 0 ≤ V_n(t) ≤ V_{n+1}(t) ↑ V(t) pointwise as n ↑ ∞, with V_n ∈ C. By the monotone convergence theorem it follows that V satisfies (5.0.3), and V ∈ C when it is bounded. Monotone class arguments of this kind are used over and over in stochastic analysis.
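In discrete time the associativity property (5.0.3) is transparent: the increments of J = (Y · X) are ∆J_k = Y_k ∆X_k, so integrating V against J is the same as integrating V·Y against X. A minimal Python check with arbitrary test data:

```python
import numpy as np

# Discrete-time analogue of (5.0.3): with dJ_k = Y_k dX_k,
# integrating V against J equals integrating V*Y against X.
rng = np.random.default_rng(1)
dX = rng.normal(size=20)       # increments of a finite-variation "path" X
Y = rng.normal(size=20)        # integrand Y
V = rng.normal(size=20)        # second integrand V

dJ = Y * dX                    # increments of J = (Y . X)
lhs = np.cumsum(V * dJ)        # (V . J)_t
rhs = np.cumsum((V * Y) * dX)  # ((V Y) . X)_t
```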

Example 5.0.2. For a discrete sequence (y_n : n ∈ N) with y_n ≠ 0, ∆y_n = y_n − y_{n−1}, and

∆(1/y_n) = 1/y_n − 1/y_{n−1} = −∆y_n / (y_n y_{n−1})

and by using the Abel summation formula

x_n/y_n − x_0/y_0 = Σ_{k=1}^n (1/y_k) ∆x_k − Σ_{k=1}^n ( x_{k−1}/(y_{k−1} y_k) ) ∆y_k
= Σ_{k=1}^n (1/y_{k−1}) ∆x_k − Σ_{k=1}^n ( x_k/(y_{k−1} y_k) ) ∆y_k

Definition 5.0.3. Let t ↦ X(t) be a cadlag function with finite variation on compact intervals and X(0) = 0. We define the Doleans stochastic exponential

E(X)_t = exp( X^c(t) ) Π_{0<s≤t} ( 1 + ∆X(s) )

1. E(X)_t is the unique solution, bounded on compact intervals, of the linear equation

Z(t) = 1 + ∫_0^t Z(s−) X(ds)

2. Let Y(t) be another cadlag function with finite variation on compacts. The non-homogeneous stochastic exponential

E^Y(X)_t = E(X)_t ( Y(0) + ∫_0^t (1/E(X)_s) Y(ds) )

is the unique locally bounded solution of the non-homogeneous linear equation

Z(t) = Y(t) + ∫_0^t Z(s−) X(ds)

Hint: use the integration by parts formula.

3. Yor formula:

E(X)_t E(Y)_t = E(X + Y + [X, Y])_t

4.

1/E(X)_t = 1 − ∫_0^t (1/E(X)_s) X(ds)

Note that

∆X_s ≥ −1 ∀s ≤ t ⟺ E(X)_s ≥ 0 ∀s ∈ [0, t],

and E(X)_t = 0 ∀t ≥ τ = inf{ s : ∆X_s = −1 }. On the other hand every non-negative right-continuous function Z with finite variation on compacts such that Z_0 = 1 and

Z_t = 0 ⟹ Z_u = 0 ∀u ≥ t

is the stochastic exponential of some right-continuous process X with finite variation on compacts, such that X_0 = 0 and ∆X_t ≥ −1 ∀t, which is called the stochastic logarithm of Z and is given by

X_t = Log(Z)_t = ∫_0^t ( 1(Z_{s−} > 0)/Z_{s−} ) dZ(s)
= log(Z_t) + Σ_{0<s≤t} 1(Z_{s−} > 0) ( ∆Z(s)/Z_{s−} − log( 1 + ∆Z(s)/Z_{s−} ) ).
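For a purely discontinuous discrete-time path the stochastic exponential reduces to the product E(X)_n = Π_{k≤n}(1 + ∆X_k), and properties 1 and 3 can be checked directly; the jump sizes in this Python sketch are arbitrary test data:

```python
import numpy as np

rng = np.random.default_rng(2)
dX = rng.uniform(-0.5, 0.5, size=15)   # jumps of a purely discontinuous X, dX > -1
dY = rng.uniform(-0.5, 0.5, size=15)

EX = np.cumprod(1 + dX)                # E(X)_n = prod_{k<=n} (1 + dX_k)
EY = np.cumprod(1 + dY)

# property 1: Z_n = 1 + sum_{k<=n} Z_{k-1} dX_k
Z_prev = np.concatenate(([1.0], EX[:-1]))
lin = 1 + np.cumsum(Z_prev * dX)

# Yor formula: E(X) E(Y) = E(X + Y + [X, Y]), with [X, Y]_n = sum dX_k dY_k
yor = np.cumprod(1 + dX + dY + dX * dY)
```

The Yor check works because (1 + ∆X)(1 + ∆Y) = 1 + ∆X + ∆Y + ∆X ∆Y factor by factor.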

Chapter 6

Convergence towards time continuous Black & Scholes model

We consider a sequence of discrete time market models, where the continuous time interval [0, T] is discretized uniformly with nodes t_k^{(N)} = Tk/N, k = 0, 1, ..., N, N ∈ N.

1. r^{(N)} > −1 is a deterministic sequence of investment returns with r^{(N)} → 0 as N → ∞ and (1 + r^{(N)})^N → exp(rT). For example, take r^{(N)} = rT/N.

2. B_k^{(N)} = (1 + r^{(N)})^k B_0^{(N)}, k = 0, ..., N is a deterministic numeraire asset with B_0^{(N)} = 1.

3. −1 < D^{(N)} < r^{(N)} < U^{(N)} are deterministic sequences with D^{(N)}, U^{(N)} → 0 as N → ∞.

4. S_k^{(N)}(ω) > 0, k = 0, ..., N are financial instruments, with deterministic S_0^{(N)} → S_0, and with random returns R_k^{(N)}(ω) satisfying

−1 < D^{(N)} ≤ R_k^{(N)}(ω) := ( S_k^{(N)}(ω) − S_{k−1}^{(N)}(ω) ) / S_{k−1}^{(N)}(ω) ≤ U^{(N)},   P^{(N)}-almost surely,

with respect to a sequence of reference probabilities P^{(N)}. S̃_k^{(N)} = S_k^{(N)}/B_k^{(N)} denotes the asset price discounted by the riskless numeraire. We assume that there is a sequence of risk neutral probabilities Q^{(N)} ∼ P^{(N)} (which is unique in the case where the support of the conditional distribution of R_k^{(N)} given F_{k−1}^{(N)} contains only 2 predictable values),

E_{Q^{(N)}}( S̃_k^{(N)} | F_{k−1}^{(N)} ) = S̃_{k−1}^{(N)},   k = 1, ..., N,


where F_k^{(N)} = σ(S_1^{(N)}, ..., S_k^{(N)}). We assume that the returns R_k^{(N)} are independent w.r.t. Q^{(N)}, and

σ_N^2 := (1/T) Σ_{k=1}^N Variance_{Q^{(N)}}( R_k^{(N)} ) → σ^2 ∈ (0, ∞) as N → ∞

Theorem 6.0.1. Under these assumptions the distribution of S_N^{(N)}(ω) w.r.t. Q^{(N)} converges in law towards the log-normal distribution of the random variable

S_0 exp( G(ω) σ √T + ( r − σ²/2 ) T )

where G(ω) is standard Gaussian with E(G) = 0, E(G²) = 1.

Definition 6.0.1. A sequence of random variables X^{(N)} defined on the respective probability spaces (Ω^{(N)}, F^{(N)}, Q^{(N)}) converges in law towards a random variable X defined on some probability space (Ω, F, Q), when for all bounded and continuous functions f(x)

E_{Q^{(N)}}( f(X^{(N)}) ) → E_Q( f(X) )
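The convergence of the first two moments of log S_N^{(N)} can be seen numerically. The sketch below uses one common CRR parametrization, R_k ∈ { rh ± σ√h } with h = T/N (an assumption made for illustration, consistent with the hypotheses above), and compares the exact mean and variance of log S_N^{(N)} under Q^{(N)} with the lognormal limit of Theorem 6.0.1:

```python
import numpy as np

S0, r, sigma, T = 50.0, 0.05, 0.2, 1.0
N = 200_000
h = T / N
u = r * h + sigma * np.sqrt(h)        # per-period up-return
d = r * h - sigma * np.sqrt(h)        # per-period down-return
q = (r * h - d) / (u - d)             # risk-neutral up-probability (= 1/2 here)

lu, ld = np.log(1 + u), np.log(1 + d)
mean_log = np.log(S0) + N * (q * lu + (1 - q) * ld)
var_log = N * q * (1 - q) * (lu - ld) ** 2

mean_lim = np.log(S0) + (r - 0.5 * sigma ** 2) * T   # lognormal limit
var_lim = sigma ** 2 * T
```

Already for moderate N the discrete moments agree with (r − σ²/2)T and σ²T to several decimals, which is the moment content of the theorem.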

Lemma 6.0.1 (Central Limit Theorem). For each N, let (ξ_k^{(N)} : k = 1, ..., N) be P^{(N)}-independent random variables such that

1. μ^{(N)} := Σ_{k=1}^N E_{P^{(N)}}( ξ_k^{(N)} ) → μ

2. (σ^{(N)})² := Σ_{k=1}^N Variance_{P^{(N)}}( ξ_k^{(N)} ) → σ² ∈ (0, ∞)

3. |ξ_k^{(N)}(ω)| ≤ c^{(N)} → 0 for some deterministic scaling sequence c^{(N)}.

Then

X^{(N)} := Σ_{k=1}^N ξ_k^{(N)} → μ + G(ω) σ

with convergence in distribution, where G is a standard Gaussian random variable.

Proof of Thm. 6.0.1. By taking the 2nd order Taylor expansion

log(1 + x) = x − x²/2 + (1/2)( 1 − 1/(1 + η)² ) x²

where η = η(x), 0 ≤ η ≤ x or −1 < x ≤ η ≤ 0. It follows that, ∀d ≤ x ≤ u,

| 1 − 1/(1 + η)² | → 0 as d, u → 0, since |η| ≤ |u| ∨ |d|.

Hence

log( S_N^{(N)} ) = log(S_0) + Σ_{k=1}^N log( 1 + R_k^{(N)} )
= log(S_0) + Σ_{k=1}^N ( R_k^{(N)} − (R_k^{(N)})²/2 ) + O( |U^{(N)}| ∨ |D^{(N)}| ) Σ_{k=1}^N (R_k^{(N)})².

By Slutsky's lemma, if X^{(N)} → X in law and Y^{(N)} → c in law for a deterministic constant c, then

( X^{(N)} + Y^{(N)} ) → ( X + c ),   X^{(N)} Y^{(N)} → cX in law,

so it is enough to apply the Central Limit Theorem:

Σ_{k=1}^N ( R_k^{(N)} − (R_k^{(N)})²/2 ) → N( rT − σ²T/2, σ²T ) in law,

with

Σ_{k=1}^N E_{Q^{(N)}}( (R_k^{(N)})² ) = N (r^{(N)})² + Σ_{k=1}^N Var( R_k^{(N)} ) = N^{−1} r²T² + σ_N² T → σ²T

and

E_{Q^{(N)}}( Σ_{k=1}^N ( R_k^{(N)} − (R_k^{(N)})²/2 ) ) → rT − σ²T/2.

Chapter 7

Quadratic variation and Ito-Föllmer calculus

In 1979 Hans Föllmer published a short paper with the title "Ito calculus without probabilities", where he showed how the stochastic calculus invented by Ito, based on convergence of Riemann sums in the L²(Ω, P) sense, applies surprisingly also pathwise for some non-random functions, using some special sequences of finite partitions. We choose to start our journey into stochastic analysis from this modern pathwise result of Föllmer, which is rather minimalist. Later in the following chapters we develop the classical Ito calculus based on martingales. Note that in the real world it is often the case that a random process, say (B_t(ω) : t ∈ [0, 1]), is realized only once, and convergence in mean square sense or in probability remain rather abstract and unsatisfactory concepts, while almost sure convergence results are the most meaningful, since we are mainly interested in that single realized path. This approach is also discussed by Dieter Sondermann in his book Introduction to stochastic calculus for finance.

Let (x_t) be the integrator and (y_t) the integrand function. When (x_t) has finite variation, that is x_t = x_t^⊕ − x_t^⊖, where x^⊕, x^⊖ are non-decreasing (and therefore Borel-measurable), and (y_t) is Borel-measurable and bounded, the Lebesgue-Stieltjes integral is well defined:

∫_0^t y_s dx_s = ∫_0^t y_s dx_s^⊕ − ∫_0^t y_s dx_s^⊖

When y_s is also piecewise continuous, or has finite variation on compacts, the Lebesgue-Stieltjes and Riemann-Stieltjes integrals coincide. The differential calculus is first order: for F(·) ∈ C^1(R),

F(x_t) = F(x_0) + ∫_0^t F_x(x_s) dx_s + Σ_{s≤t} ( F(x_s) − F(x_{s−}) − F_x(x_{s−})( x_s − x_{s−} ) )

with correction terms appearing at the discontinuities of x_t. What happens when the integrator x_t has infinite total variation? Can we make sense of the limit of Riemann sums for some class of integrands?

For a path x_t of infinite total variation we can do the following: by summing p-powers of small increments for some p > 1 and taking the supremum we define the p-power variation of a continuous path x_t as

v_t^{(p)}(x) = sup_Π Σ_{t_i∈Π} | x_{t_{i+1}} − x_{t_i} |^p

Since the increments are small, there is a chance that v_t^{(p)}(x) < ∞ even in the case where the total variation v_t(x) = v_t^{(1)}(x) = ∞. In Ito calculus we consider p = 2, but we use a weaker notion of p-variation, where instead of taking a supremum over all finite partitions Π, we take the limit along a given sequence of partitions. Consider a sequence of partitions {Π_n} where

Π_n = { 0 = t_0^n < t_1^n < ··· < t_k^n < ··· },   lim_{k→∞} t_k^n = ∞ ∀n,
∀t > 0,   ∆(Π_n, t) = sup_{t_k^n∈Π_n} ( t_{k+1}^n ∧ t − t_k^n ∧ t ) → 0 for n → ∞,

where t ∧ s := min{t, s}. Usually we will use the dyadic partitions

D_n = { t_k^n = k 2^{−n} : k ∈ N },   n ∈ N

Definition 7.0.1. A continuous function x : [0, ∞) → R has pathwise quadratic variation [x, x]_t along the sequence {Π_n}, when

lim_{n→∞} Σ_{t_i∈Π_n} ( x_{t_{i+1}∧t} − x_{t_i∧t} )² = [x, x]_t   ∀t < ∞   (7.0.1)

and t ↦ [x, x]_t is continuous.

Remark For each n the approximating function

ξ_n(t) = Σ_{t_i∈Π_n} ( x_{t_{i+1}∧t} − x_{t_i∧t} )²

is continuous since t ↦ x_t is continuous. However, in order to show that the limit [x, x]_t is continuous, we would need the stronger uniform convergence of ξ_n(t) to [x, x]_t on compact intervals, which is not guaranteed if nothing else is known about the continuous path x_t; that is why we need to include continuity in the definition of [x, x]_t.
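Along the dyadic partitions D_n the definition can be tried out on a simulated Brownian path, whose quadratic variation on [0, 1] should be close to 1, and on a smooth (finite variation) path, whose squared-increment sums tend to 0. A Python sketch (the seed and the resolution are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 16                                        # finest dyadic level: 2^n steps on [0, 1]
dW = rng.normal(scale=2.0 ** (-n / 2), size=2 ** n)
W = np.concatenate(([0.0], np.cumsum(dW)))    # Brownian path sampled on D_n

# squared-increment sums along the refining dyadic partitions D_m, m = 8..n
qv = [np.sum(np.diff(W[::2 ** (n - m)]) ** 2) for m in range(8, n + 1)]

# a smooth path has vanishing quadratic variation
x = np.sin(2 * np.pi * np.linspace(0.0, 1.0, 2 ** n + 1))
qv_smooth = np.sum(np.diff(x) ** 2)
```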

Lemma 7.0.1. When it exists, t ↦ [x, x]_t is non-decreasing with [x, x]_0 = 0. For a constant c, [cx, cx]_t = c² [x, x]_t. In particular, the quadratic variation is reflection invariant: [−x, −x]_t = [x, x]_t.

Proof. Let u < v and, for each n large enough, let i_n < j_n be such that

t^n_{i_n−1} < u ≤ t^n_{i_n} ≤ t^n_{j_n−1} < v ≤ t^n_{j_n}.

Then

ξ_n(v) − ξ_n(u) = ( x_{t^n_{i_n}} − x_{t^n_{i_n−1}} )² − ( x_u − x_{t^n_{i_n−1}} )² + Σ_{k=i_n+1}^{j_n−1} ( x_{t^n_k} − x_{t^n_{k−1}} )² + ( x_v − x_{t^n_{j_n−1}} )²
≥ ( x_{t^n_{i_n}} − x_{t^n_{i_n−1}} )² − ( x_u − x_{t^n_{i_n−1}} )².

As n → ∞ the last expression vanishes, since x is uniformly continuous on compact intervals, and [x, x]_v ≥ [x, x]_u follows. □

Lemma 7.0.2 (Characterization). A continuous path t ↦ x_t has quadratic variation [x, x]_t along the sequence {Π_n} if and only if the sequence of discrete measures

ξ_n(dt) = Σ_{t_i∈Π_n} ( x_{t_{i+1}} − x_{t_i} )² δ_{t_i}(dt)

converges weakly¹ on compact intervals to a Radon measure² ξ(dt) without atoms, which means that ξ({t}) = 0 ∀t.

¹ Weak convergence on compacts (also called vague convergence) of ξ_n → ξ means that for all continuous functions s ↦ y_s with compact support

∫ y_s ξ_n(ds) → ∫ y_s ξ(ds)

² A Radon measure ξ lives on the Borel σ-algebra of a Hausdorff space, it is locally finite (every point has a neighbourhood of finite measure), and it is inner regular, that is ξ(A) = sup{ ξ(K) : compact K ⊆ A }.

Proof (Sufficiency) Consider a continuous integrand ys. Since y is uni- formly continuous on the compact [0, 1], ∀ε > 0, there are k, m, τ1, . . . , τm such that the piecewise constant function

m ε X ε y (s) = yτj 1(τj ,τj+1](s) satisfies sup |y (s) − y(s)| < ε s≤t j=1 It follows Z t X 2 y (x n − x n ) − y d[x, x] ≤ ti ti+1 ti s s 0 ti∈πn:ti≤t Z t X ε 2 X 2 y (x n − x n ) − y d[x, x] + ε (x n − x n ) ti ti+1 ti s s ti+1 ti 0 ti∈πn:ti≤t ti∈πn m Z t X X 2 X 2 = y (x n − x n ) − y d[x, x] + ε (x n − x n ) τj ti+1 ti s s ti+1 ti n n 0 j=1 ti ∈πn:τj

j=1 0 Z t ε = (ys − ys)d[x, x]s + ε[x, x]t as n → ∞ . 0 and as ε → 0, from the definition of Riemann-Stieltjes integral it follows Z t X 2 lim y (x n − x ) = y d[x, x] , ti ti+1 ti s s n→∞ 0 ti∈πn and in the definition we have assumed that the non-decreasing function t 7→ [x, x]t is continuous, the corresponding measure ξ(dt) has no atoms. Proof of necessity: We approximate pointwise the indicator 1[0,t](s) by piecewise linear continuous functions  1 s ≤ t  1 s ≤ t − ε ε   y (s) = 1 + (t − s)/ε t < s ≤ t + ε , yε(s) = (t − s)/ε t − ε < s ≤ t  0 s > t + ε  0 s > t such that ε yε(s) ≤ 1[0,t](s) ≤ y (s), (7.0.2) which implies Z Z ε yε(s)ξn(ds) ≤ ξn([0, t]) ≤ y (s)ξn(ds) 119

As $n \to \infty$,
$$\int y_\varepsilon(s)\, d[x,x]_s \le \liminf_n \xi_n([0,t]) \le \limsup_n \xi_n([0,t]) \le \int y^\varepsilon(s)\, d[x,x]_s,$$
which implies, for all $\varepsilon > 0$,
$$\limsup_n \xi_n([0,t]) - \liminf_n \xi_n([0,t]) \le \int \bigl(y^\varepsilon(s) - y_\varepsilon(s)\bigr)\, d[x,x]_s \le \xi\bigl((t-\varepsilon, t+\varepsilon]\bigr)$$
$$\Longrightarrow \limsup_n \xi_n([0,t]) - \liminf_n \xi_n([0,t]) \le \xi(\{t\}) = 0,$$
since by assumption the measure $\xi(dt)$ has no atoms. $\Box$

Remark 7.0.1. Note that for $s < t < u$,

$$|x_u - x_s| \le |x_u - x_t| + |x_t - x_s|,$$
but
$$(x_u - x_s)^2 = (x_u - x_t)^2 + (x_t - x_s)^2 + 2(x_u - x_t)(x_t - x_s),$$
which is not necessarily smaller than $(x_u - x_t)^2 + (x_t - x_s)^2$. The quadratic variation behaves differently from the first variation: under refinement of the partition the approximating sums are not necessarily monotone. That is why in the definition of the first variation we can take the supremum over all partitions, while with this definition of quadratic variation we follow a given sequence of partitions.

Remark 7.0.2. When $x_t$ is continuous with finite total variation on $[0,t]$, it follows that $[x,x]_t = 0$:
$$\sum_{t_i \in \Pi_n : t_i \le t} (x_{t_{i+1}} - x_{t_i})^2 \le \sup_{t_i \in \Pi_n : t_i \le t} |x_{t_{i+1}} - x_{t_i}| \sum_{t_i \in \Pi_n : t_i \le t} |x_{t_{i+1}} - x_{t_i}| \le \sup_{t_i \in \Pi_n : t_i \le t} |x_{t_{i+1}} - x_{t_i}|\; v_t(x) \to 0 \quad \text{as } n \to \infty,$$
where $v_t(x) < \infty$ is the first variation of the path. If for some sequence of partitions $\{\Pi_n\}$ there exists strictly positive quadratic variation $[x,x]_t > 0$, necessarily the first variation is $v_t(x) = \infty$.
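This remark can be checked numerically (a minimal sketch, not part of the notes, assuming only numpy): for the smooth path $x_t = \sin t$, which has finite first variation on $[0,1]$, the approximating sums for $[x,x]_1$ vanish as the partition mesh is refined.

```python
import numpy as np

def qv_sum(path):
    """Approximating sum for the quadratic variation along the sampled partition."""
    return np.sum(np.diff(path) ** 2)

# Smooth path x_t = sin(t): finite first variation, hence [x, x]_1 = 0.
for n in (2 ** 8, 2 ** 12, 2 ** 16):
    t = np.linspace(0.0, 1.0, n + 1)
    qv = qv_sum(np.sin(t))
    print(f"n = {n:6d}   sum of squared increments = {qv:.3e}")
```

The sums decay like the mesh times the first variation, in line with the bound in the remark.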

We show that for continuous paths with quadratic variation a second order differential calculus holds.

Proposition 7.0.1 (Föllmer 1979). Let $t \mapsto x_t$ be a continuous path with pathwise quadratic variation along $\{\Pi_n\}$ with $\Delta(\Pi_n, t) \to 0$ for all $t$, and let $F \in C^2(\mathbb R)$. Then the Ito formula holds:
$$F(x_t) = F(x_0) + \int_0^t F_x(x_s)\, dx_s + \frac 12 \int_0^t F_{xx}(x_s)\, d[x,x]_s, \quad t > 0, \tag{7.0.3}$$
where the pathwise Ito-Föllmer integral with respect to $x$ exists as the limit of Riemann sums along the sequence $\{\Pi_n\}$:
$$\int_0^t F_x(x_s)\, d\overleftarrow{x}_s := \lim_n \sum_{t_i \in \Pi_n : t_i \le t} F_x(x_{t_i})(x_{t_{i+1}} - x_{t_i}).$$
This is also called the pathwise forward integral.

Proof. Take telescopic sums
$$F(x_t) - F(x_0) = \lim_n \sum_{t_i \in \Pi_n : t_i \le t} \bigl(F(x_{t_{i+1}}) - F(x_{t_i})\bigr)$$
and use the Taylor expansion
$$\sum_{t_i \in \Pi_n : t_i \le t} \bigl(F(x_{t_{i+1}}) - F(x_{t_i})\bigr) = \sum F_x(x_{t_i})(x_{t_{i+1}} - x_{t_i}) + \frac 12 \sum F_{xx}(x_{t_i})(x_{t_{i+1}} - x_{t_i})^2 + \sum r(x_{t_i}, x_{t_{i+1}})(x_{t_{i+1}} - x_{t_i})^2,$$
where by the mean value theorem
$$r(x_{t_i}, x_{t_{i+1}}) = F_{xx}(x^*_i) - F_{xx}(x_{t_i})$$
for some $x^*_i$ between $x_{t_i}$ and $x_{t_{i+1}}$. Note that
$$R_n(t) := \sup\bigl\{ |r(x_{t_i}, x_{t_{i+1}})| : t_i \in \Pi_n \cap [0,t] \bigr\} \longrightarrow 0 \tag{7.0.4}$$
uniformly as $\Delta(\Pi_n) \to 0$, since the map $t \mapsto F_{xx}(x_t)$ is uniformly continuous on compacts. As $n \uparrow \infty$, by definition of quadratic variation the second Riemann sum converges towards
$$\frac 12 \int_0^t F_{xx}(x_s)\, d[x,x]_s,$$
and the remainder term is dominated by
$$R_n(t) \sum_{t_i \in \Pi_n : t_i \le t} (x_{t_{i+1}} - x_{t_i})^2 \to 0 \cdot [x,x]_t \quad \text{as } n \to \infty.$$
Therefore the limit of Riemann sums along $\{\Pi_n\}$ exists, and it is given by
$$\int_0^t F_x(x_s)\, d\overleftarrow{x}_s := \lim_n \sum_{t_i \in \Pi_n : t_i \le t} F_x(x_{t_i})(x_{t_{i+1}} - x_{t_i}) = F(x_t) - F(x_0) - \frac 12 \int_0^t F_{xx}(x_s)\, d[x,x]_s. \quad \Box$$
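The telescoping identity behind the proof can be checked exactly for $F(x) = x^2$, where the Taylor expansion has no remainder: on any sampled path, the forward Riemann sum plus $\frac 12 F_{xx} = 1$ times the squared-increment sum reproduces $F(x_t) - F(x_0)$ to machine precision. A minimal numerical sketch (not part of the notes, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2 ** 12
dx = rng.normal(0.0, np.sqrt(1.0 / n), size=n)   # Brownian-like increments on [0, 1]
x = np.concatenate([[0.0], np.cumsum(dx)])       # sampled path with x_0 = 0

# F(x) = x^2, F_x = 2x, F_xx = 2: the Taylor expansion is exact for this F,
# so forward sum + (1/2) * 2 * squared-increment sum telescopes to x_t^2 - x_0^2.
forward_sum = np.sum(2.0 * x[:-1] * np.diff(x))
qv_sum = np.sum(np.diff(x) ** 2)
lhs = x[-1] ** 2 - x[0] ** 2
print(lhs, forward_sum + qv_sum)
```

For a general $F \in C^2$ the identity holds only in the limit, with the remainder controlled by $R_n(t)$ as in the proof.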

Remark 7.0.3. 1. In general the existence and the value of such a pathwise forward integral may depend on the particular sequence of partitions. When $[x,x]$ exists for all sequences $\{\Pi_n\}$ with $\Delta(\Pi_n) \to 0$ and its value does not depend on the particular sequence, the forward integral $\int F_x(x_s)\, d\overleftarrow{x}_s$ is well defined independently of the sequence of partitions.

2. The existence of the quadratic variation in the sense of weak convergence on compacts was the minimal assumption which we used to derive the Ito formula.

3. We have the following extension of the Ito formula: if $F(x,a) \in C^{2,1}$ and $t \mapsto a_t$ is continuous with finite variation, then
$$\int_0^t F_x(x_s, a_s)\, d\overleftarrow{x}_s := \lim_n \sum_{t_i \in \Pi_n : t_i \le t} F_x(x_{t_i}, a_{t_i})(x_{t_{i+1}} - x_{t_i}) = F(x_t, a_t) - F(x_0, a_0) - \int_0^t F_y(x_s, a_s)\, da_s - \frac 12 \int_0^t F_{xx}(x_s, a_s)\, d[x,x]_s. \quad \Box$$

4. If $x_t$ and $a_t$ are continuous, $a_t$ has finite first variation on compacts and $x_t$ has quadratic variation $[x,x]_t$ along $(\Pi_n)$, then $y_t = x_t + a_t$ also has quadratic variation along $(\Pi_n)$, with $[y,y]_t = [x,x]_t$. Proof:
$$(\Delta x + \Delta a)^2 = (\Delta x)^2 + (\Delta a)^2 + 2\, \Delta a\, \Delta x.$$
Therefore
$$\sum_{t^n_i \in \Pi_n} \bigl(y_{t^n_i \wedge t} - y_{t^n_{i-1} \wedge t}\bigr)^2 = \sum_{t^n_i \in \Pi_n} \bigl(x_{t^n_i \wedge t} - x_{t^n_{i-1} \wedge t}\bigr)^2 + \sum_{t^n_i \in \Pi_n} \bigl(a_{t^n_i \wedge t} - a_{t^n_{i-1} \wedge t}\bigr)^2 + 2 \sum_{t^n_i \in \Pi_n} \bigl(x_{t^n_i \wedge t} - x_{t^n_{i-1} \wedge t}\bigr)\bigl(a_{t^n_i \wedge t} - a_{t^n_{i-1} \wedge t}\bigr) \longrightarrow [x,x]_t,$$
where, since $a$ has first variation $v_t(a) < \infty$, $[a,a]_t = 0$ and
$$\Bigl| \sum_{t^n_i \in \Pi_n} \bigl(x_{t^n_i \wedge t} - x_{t^n_{i-1} \wedge t}\bigr)\bigl(a_{t^n_i \wedge t} - a_{t^n_{i-1} \wedge t}\bigr) \Bigr| \le \max_i \bigl|x_{t^n_i \wedge t} - x_{t^n_{i-1} \wedge t}\bigr|\; v_t(a) \longrightarrow 0. \quad \Box$$

5. When $F \in C^1(\mathbb R)$ and $x$ is continuous with pathwise quadratic variation along $\{\Pi_n\}$, the function $w_t := F(x_t)$ also has quadratic variation along $\{\Pi_n\}$, given by
$$[w,w]_t = \int_0^t F_x(x_s)^2\, d[x,x]_s.$$
Proof: by Taylor expansion and Lemma 7.0.2,
$$\sum_{t^n_i \in \Pi_n : t^n_i \le t} \bigl(F(x_{t^n_{i+1}}) - F(x_{t^n_i})\bigr)^2 = \sum_i F_x(x_{t^n_i})^2 \bigl(x_{t^n_{i+1}} - x_{t^n_i}\bigr)^2 + \sum_i r\bigl(x_{t^n_i}, x_{t^n_{i+1}}\bigr)\bigl(x_{t^n_{i+1}} - x_{t^n_i}\bigr)^2 \longrightarrow \int_0^t F_x(x_s)^2\, d[x,x]_s \quad \text{as } n \to \infty,$$
where for some $t^{*n}_i \in [t^n_i, t^n_{i+1}]$,
$$r\bigl(x_{t^n_i}, x_{t^n_{i+1}}\bigr) = F_x(x_{t^{*n}_i})^2 - F_x(x_{t^n_i})^2 \longrightarrow 0,$$
uniformly on the compact interval $[0,t]$, since $s \mapsto F_x(x_s)^2$ is uniformly continuous.

6. Note that since
$$z_t = \int_0^t F_x(x_s)\, d\overleftarrow{x}_s = F(x_t) - F(x_0) - \frac 12 \int_0^t F_{xx}(x_s)\, d[x,x]_s,$$
the difference $z_t - F(x_t)$ has finite variation on compacts, and it follows that
$$[z]_t = [F(x)]_t = \int_0^t F_x(x_s)^2\, d[x,x]_s.$$

7. We have defined the pathwise forward integral
$$\int_0^t y_s\, d\overleftarrow{x}_s$$
for integrands $y_t = F(x_t, z_t)$ with $F \in C^{2,1}$ and $z_t$ of finite variation. What about more general integrands?

Let $(\Pi_n)$ be a sequence of partitions with $\Delta(\Pi_n) \to 0$ and $y \in C([0,t], \mathbb R)$. Note that
$$I^n_t(y) := \sum_{t_i \in \Pi_n : t_i \le t} y_{t_i}(x_{t_{i+1}} - x_{t_i})$$
is a linear operator. When $x_t$ has infinite total variation, in particular when $[x,x]_t > 0$ along the sequence $(\Pi_n)$, the integral operator
$$I_t(y) := \int_0^t y_s\, d\overleftarrow{x}_s \tag{7.0.5}$$
is not well defined for all continuous integrands (meaning the case where $y_t$ has infinite variation but is not of the form $f(x_t, t)$ with $f \in C^1$), and $I_t$ in (7.0.5) is not a continuous operator on $(C([0,t], \mathbb R), |\cdot|_\infty)$.

Proposition 7.0.2 (from Protter's book). If for all $y \in C([0,t], \mathbb R)$ there exists
$$I_t(y) := \lim_n I^n_t(y),$$
it follows that $x_t$ has finite first variation, and therefore $[x,x]_t = 0$.

Proof. For every $n$ there is a continuous function $y^{(n)}$ such that
$$y^{(n)}(t^{(n)}_i) = \mathrm{sign}\bigl(x_{t^{(n)}_{i+1}} - x_{t^{(n)}_i}\bigr) \quad \forall t^{(n)}_i \in \Pi_n,$$
and $|y^{(n)}|_\infty = 1$. For the operator norm,
$$\|I_n\| \ge |I_n(y^{(n)})| = \sum_{t^{(n)}_i \in \Pi_n} \mathrm{sign}\bigl(x_{t^{(n)}_{i+1} \wedge t} - x_{t^{(n)}_i \wedge t}\bigr)\bigl(x_{t^{(n)}_{i+1} \wedge t} - x_{t^{(n)}_i \wedge t}\bigr) = \sum_{t^{(n)}_i \in \Pi_n} \bigl|x_{t^{(n)}_{i+1} \wedge t} - x_{t^{(n)}_i \wedge t}\bigr|,$$
and
$$\sup_n \|I_n\| \ge v_t(x),$$
since
$$v_t(x) = \lim_{n\to\infty} \sum_{t_i \in \Pi_n} |x_{t_{i+1} \wedge t} - x_{t_i \wedge t}|$$
for any sequence of partitions with $\Delta(\Pi_n) \to 0$. If for all $y \in C([0,t], \mathbb R)$ there exists $I(y) = \lim_n I_n(y) < \infty$ along $\{\Pi_n\}$, necessarily $\sup_n |I_n(y)| < \infty$, and by the Banach-Steinhaus theorem$^3$ from functional analysis it follows that $\sup_n \|I_n\| < \infty$, which implies $v_t(x) < \infty$. Therefore if $x_t$ is a continuous path with infinite total variation $v_t(x) = +\infty$ on $[0,t]$, we cannot define the pathwise integral $\int_0^t y_s\, dx_s$ for all continuous integrands $y_s$. However, it may be possible to define the pathwise integral on some subspace of integrands.
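The operator-norm lower bound from the proof can be watched at work on a simulated rough path (a sketch outside the notes, assuming numpy): evaluating $\sum_i |x_{t_{i+1}} - x_{t_i}|$ on finer and finer nested sub-partitions of one Brownian-like path, the bound keeps growing, as expected when $v_t(x) = +\infty$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2 ** 16
fine = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / N), N))])

# Lower bound ||I_n|| >= sum |x_{t_{i+1}} - x_{t_i}|, evaluated on nested
# coarsenings of the same path: refining the partition only increases it.
bounds = []
for step in (2 ** 8, 2 ** 4, 1):
    coarse = fine[::step]
    tv = np.sum(np.abs(np.diff(coarse)))
    bounds.append(tv)
    print(f"{len(coarse) - 1:6d} intervals: lower bound {tv:8.2f}")
```

For a path with positive quadratic variation the bound is of order (number of intervals)$^{1/2}$, so it diverges along any refining sequence.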

7.0.1 Ito-Föllmer calculus for random paths

Definition 7.0.2. Let $(X_t(\omega) : t \ge 0)$ be a stochastic process with almost surely continuous paths, defined on the probability space $(\Omega, \mathcal F, P)$. We say that $X$ has stochastic quadratic variation process $([X,X]_t(\omega) : t \ge 0)$ when for all sequences of finite partitions $\{\Pi_n\}$ with $\Delta(\Pi_n, t) \to 0$
$$\sum_{t_i \in \Pi_n} (X_{t_{i+1} \wedge t} - X_{t_i \wedge t})^2 \xrightarrow{\;P\;} [X,X]_t,$$
with convergence in probability.

It follows that for any sequence of finite partitions $\{\Pi_n\}$ with $\Delta(\Pi_n) \to 0$ there is a deterministic subsequence$^4$ $\{\Pi_{n(m)}\}$ such that (first for all $t \in \mathbb Q \cap [0,\infty)$, and then by continuity of $[X,X]$ for all $t \ge 0$)
$$\sum_{t_i \in \Pi_{n(m)} : t_i \le t} (X_{t_{i+1}}(\omega) - X_{t_i}(\omega))^2 \longrightarrow [X,X]_t(\omega) \quad P\text{-almost surely}, \tag{7.0.6}$$

$^3$ Let us recall the Banach-Steinhaus theorem: let $(I_\nu : \nu \in J)$ be a family of linear continuous operators $I_\nu : X_1 \to X_2$, where $(X_i, |\cdot|_{X_i})$, $i = 1, 2$, are normed spaces. If for all $y \in X_1$
$$\sup_{\nu \in J} |I_\nu(y)|_{X_2} < \infty,$$
then $\sup_{\nu \in J} \|I_\nu\| < \infty$, where $\|I_\nu\| := \sup\bigl\{ |I_\nu(y)|_{X_2} / |y|_{X_1} : y \in X_1, y \ne 0 \bigr\}$ is the strong operator norm.

$^4$ Recall that $\xi_n \xrightarrow{P} 0$ (in probability) if and only if for every subsequence $(n_k)$ there is a further subsequence $(n_{k_\ell})$ such that $\xi_{n_{k_\ell}}(\omega) \to 0$ $P$-almost surely. The $P$-null set where convergence fails may depend on the subsequence.

i.e. the stochastic quadratic variation and the pathwise quadratic variation along $\{\Pi_{n(m)}\}$ coincide $P$-almost surely. We also obtain a stochastic Ito formula where the stochastic forward integral is defined as a limit in probability of Riemann sums:

Proposition 7.0.3. Let $X_t(\omega)$ be a stochastic process which has continuous paths $P$-almost surely and with stochastic quadratic variation in the sense of convergence in probability. Then the Ito formula (7.0.3) holds, where the stochastic forward integral is defined as the limit in probability of Riemann sums for any sequence of partitions $(\Pi_n)$ with $\Delta(\Pi_n, t) \to 0$:
$$P\text{-}\lim_{n\to\infty} \sum_{t^n_i \in \Pi_n} F_x(X_{t^n_{i-1}})\bigl(X_{t^n_i \wedge t} - X_{t^n_{i-1} \wedge t}\bigr) \tag{7.0.7}$$
$$= \int_0^t F_x(X_s)\, d\overleftarrow{X}_s = F(X_t) - F(X_0) - \frac 12 \int_0^t F_{xx}(X_s)\, d[X,X]_s. \tag{7.0.8}$$

Proof. For any sequence of partitions $(\Pi_n)$ and any subsequence $(n_k)$, there is a further subsequence $(n_{k_\ell})$ such that $P$-almost surely
$$\sum_{t_i \in \Pi_{n_{k_\ell}}} \bigl(X_{t_i \wedge t} - X_{t_{i-1} \wedge t}\bigr)^2 \longrightarrow [X,X]_t$$
in the pathwise sense. With probability one the Ito formula (7.0.3) holds pathwise, where the pathwise forward integral
$$\lim_{\ell\to\infty} \sum_{t_i \in \Pi_{n_{k_\ell}}} F_x(X_{t_{i-1}})\bigl(X_{t_i \wedge t} - X_{t_{i-1} \wedge t}\bigr) = \int_0^t F_x(X_s)\, d\overleftarrow{X}_s = F(X_t) - F(X_0) - \frac 12 \int_0^t F_{xx}(X_s)\, d[X,X]_s$$
is defined with respect to the sequence of partitions $(\Pi_{n_{k_\ell}})$, and it does not depend on the partitions $(\Pi_n)$. The stochastic Ito formula (7.0.7) follows by the subsequence characterization of convergence in probability. $\Box$

Consider the dyadic partitions
$$D_n = \{ t^n_k = k 2^{-n} : k = 0, \dots, n 2^n \}.$$

Proposition 7.0.4 (P. Lévy). Brownian motion has $P$-almost surely pathwise quadratic variation $[B,B]_t = t$ along the dyadic sequence $\{D_n\}$, which is also the stochastic quadratic variation in the sense of convergence in probability.

Proof. The variance of the approximating sums is
$$E\Bigl[ \Bigl( \sum_{t^n_k \le t} (B_{t^n_{k+1}} - B_{t^n_k})^2 - (t^n_{k+1} - t^n_k) \Bigr)^2 \Bigr] = \sum_{t^n_k \le t} E\Bigl[ \bigl( (B_{t^n_{k+1}} - B_{t^n_k})^2 - (t^n_{k+1} - t^n_k) \bigr)^2 \Bigr]$$
(since the increments are independent, the cross-product terms have zero expectation)
$$= \sum_{t^n_k \le t} \Bigl( E\bigl[ (\Delta B_{t^n_k})^4 \bigr] + (\Delta t^n_k)^2 - 2 (\Delta t^n_k)\, E\bigl[ (\Delta B_{t^n_k})^2 \bigr] \Bigr) = 2 \sum_{t^n_k \le t} (t^n_{k+1} - t^n_k)^2 = 2 \lfloor t 2^n \rfloor 2^{-2n} \le 2 t 2^{-n},$$
using $E[(\Delta B)^4] = 3 (\Delta t)^2$. Let $\varepsilon > 0$ and
$$A^\varepsilon_n = \Bigl\{ \omega : \Bigl| t - \sum_{t^n_k \le t} \bigl( B_{t^n_{k+1}}(\omega) - B_{t^n_k}(\omega) \bigr)^2 \Bigr| > \varepsilon \Bigr\};$$
by the Chebychev inequality
$$P(A^\varepsilon_n) \le 2 t 2^{-n} \varepsilon^{-2}.$$
Therefore
$$\sum_n P(A^\varepsilon_n) \le \varepsilon^{-2}\, 4t < \infty.$$
Applying the Borel-Cantelli lemma, for all $\varepsilon > 0$
$$P\bigl( \limsup_n A^\varepsilon_n \bigr) = 0,$$
equivalently
$$P\bigl( \liminf_n (A^\varepsilon_n)^c \bigr) = 1.$$
Taking $\varepsilon = 1/m$, $m \in \mathbb N$, and a countable intersection of the complements,
$$P\Bigl( \bigcap_{m \ge 1} \bigcup_{k \ge 0} \bigcap_{n \ge k} (A^{1/m}_n)^c \Bigr) = 1,$$
which is the probability that the path $t \mapsto B_t(\omega)$ has pathwise quadratic variation $[B,B]_t = t$ when we take the limit along the dyadic sequence. $\Box$
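A simulation of the proposition (a sketch outside the notes, assuming numpy): at dyadic level $n = 16$ the $L^2$ bound $2 t 2^{-n}$ from the proof makes the approximating sum very close to $t$.

```python
import numpy as np

rng = np.random.default_rng(2)
t, n = 1.0, 16                                   # dyadic level n, mesh 2^{-n}
dB = rng.normal(0.0, np.sqrt(t * 2.0 ** (-n)), size=2 ** n)
qv = np.sum(dB ** 2)                             # approximating sum for [B, B]_t
print(qv)                                        # close to t = 1
```

The standard deviation of the sum is $\sqrt{2 t 2^{-n}} \approx 0.006$ here, so the deviation from $t$ is tiny.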

Remark 7.0.4. 1. Essentially we used
$$\sum_n \Bigl( \sum_{t^n_k \le t} (t^n_{k+1} - t^n_k)^2 \Bigr) < \infty.$$
In order to obtain almost sure convergence starting from convergence in probability, it is enough to have
$$\sum_{n \in \mathbb N} \Delta(\Pi_n, t) < \infty.$$

2. The set of measure zero where convergence fails may well depend on the sequence of partitions. Since the collection of partition sequences is uncountable, we do not get almost sure convergence if we take the supremum over partitions. When the partitions are nested, $\Pi_n \subseteq \Pi_{n+1}$, we also have almost sure convergence in Proposition 7.0.4:

Proposition 7.0.5 (Revuz and Yor, Continuous Martingales and Brownian Motion, Proposition 2.12). Brownian motion has $P$-almost surely pathwise quadratic variation $[B,B]_t = t$ along any sequence of refining partitions $\{\Pi_n\}$ with $\Pi_n \subseteq \Pi_{n+1}$ and $\Delta(\Pi_n) \to 0$.

Proof. It is enough to consider the canonical Brownian motion $B_t(\omega) = \omega_t$, defined on the space $\Omega = C_0([0,1] \to \mathbb R)$ of continuous functions $f$ with $f(0) = 0$. Let $\Pi_n = \{0 = t^n_0 < \dots < t^n_k < t\} \subseteq \Pi_{n+1}$, and define
$$M_0(\omega) := B_t(\omega)^2, \qquad M_{-n}(\omega) := \sum_{t^n_i \in \Pi_n} \bigl( B_{t^n_i \wedge t}(\omega) - B_{t^n_{i-1} \wedge t}(\omega) \bigr)^2, \quad n \in \mathbb N.$$
Given $\Pi_n$, consider $\varepsilon^n = (\varepsilon^n_0, \varepsilon^n_1, \dots, \varepsilon^n_k)$, where the $\varepsilon^n_i \in \{-1, 1\}$ are binary variables. For such $\Pi_n$ and $\varepsilon^n$, consider the transformation $\omega \mapsto (\theta_{\varepsilon^n}\omega) \in \Omega$ such that
$$(\theta_{\varepsilon^n}\omega)_{t^n_i \wedge t} - (\theta_{\varepsilon^n}\omega)_{t^n_{i-1} \wedge t} = \bigl( \omega_{t^n_i \wedge t} - \omega_{t^n_{i-1} \wedge t} \bigr) \varepsilon^n_i.$$
It means that in each interval $(t^n_{i-1} \wedge t, t^n_i \wedge t]$ the Brownian increment is reflected when $\varepsilon^n_i = -1$. Define $\mathcal G_0 = \mathcal F = \sigma(C)$, and for $n \in \mathbb N$
$$\mathcal G_{-n} = \sigma\bigl( X \text{ random variables such that } X(\omega) = X(\theta_{\varepsilon^n}\omega)\ \forall \varepsilon^n \bigr),$$
that is, the smallest $\sigma$-algebra which contains the random variables invariant under the transformations $\omega \mapsto \theta_{\varepsilon^n}\omega$ for all possible $\varepsilon^n$ corresponding to the partition $\Pi_n$.

Note that since $\Pi_n \subseteq \Pi_{n+1}$, a path transformation based on the partition $\Pi_n$ corresponds to a path transformation based on the next partition $\Pi_{n+1}$. Since the set of these path transformations grows with $n$, the corresponding set of transformation-invariant random variables becomes smaller and smaller, and $\mathcal G_{-n} \supseteq \mathcal G_{-(n+1)}$. These $\sigma$-algebras form a filtration $\mathbb G = (\mathcal G_{-n} : n \in \mathbb N)$ indexed by the negative integers. Note also that $M_{-n}(\omega) = M_{-n}(\theta_{\varepsilon^n}\omega)$ for all $\varepsilon^n$: $M_{-n}$ is invariant under all the $\theta_{\varepsilon^n}$ transformations, and it is $\mathcal G_{-n}$-measurable. We show that
$$M_{-n} = E\bigl( B_t^2 \mid \mathcal G_{-n} \bigr), \quad n \in \mathbb N,$$
which is a uniformly integrable $\mathbb G$-martingale. We show this in detail for $n = 1$; by using the independence of the increments the same argument works for all $n$. For $0 = t^1_0 < t^1_1 < t$, let $\Delta B' = B_{t^1_1}$ and $\Delta B'' = B_t - B_{t^1_1}$, with $B_t = \Delta B' + \Delta B''$:
$$E\bigl( B_t^2 \mid \mathcal G_{-1} \bigr) = (\Delta B')^2 + (\Delta B'')^2 + 2 E\bigl( \Delta B' \Delta B'' \mid (\Delta B')^2, (\Delta B'')^2 \bigr) = M_{-1} + 2 E\bigl( \Delta B' \mid (\Delta B')^2 \bigr) E\bigl( \Delta B'' \mid (\Delta B'')^2 \bigr) = M_{-1},$$
where by symmetry
$$P\bigl( \Delta B = \pm\sqrt{(\Delta B)^2} \,\big|\, (\Delta B)^2 \bigr) = 1/2 \quad \text{and} \quad E\bigl( \Delta B \mid (\Delta B)^2 \bigr) = 0.$$
By the Doob backward martingale convergence theorem it follows that $M_{-\infty}(\omega) = \lim_{n\to\infty} M_{-n}(\omega)$ exists $P$-almost surely and in $L^1(P)$. However, we know already that $\lim_{n\to\infty} M_{-n} = t$ in the $L^2(P)$ sense (Proposition 7.0.4); therefore the $P$-almost sure limit is $M_{-\infty}(\omega) = t$. $\Box$

7.0.2 Cross-variation

Definition 7.0.3. Let $x_t, y_t$ be continuous paths with pathwise quadratic variations $[x,x]_t$ and $[y,y]_t$ along the sequence of partitions $(\Pi_n)$ with $\Delta(\Pi_n, t) \to 0$ for all $t$. We define their pathwise cross-variation along the sequence of partitions $(\Pi_n)$ as
$$[x,y]_t = [y,x]_t = \lim_{n\to\infty} \sum_{t^n_i \in \Pi_n} \bigl( x_{t^n_i \wedge t} - x_{t^n_{i-1} \wedge t} \bigr)\bigl( y_{t^n_i \wedge t} - y_{t^n_{i-1} \wedge t} \bigr),$$
when the limit exists and $t \mapsto [x,y]_t$ is continuous.

Lemma 7.0.3. When the continuous paths $x_t + y_t$ and $x_t - y_t$ have pathwise quadratic variation along the sequence of partitions $(\Pi_n)$, their cross-variation along $(\Pi_n)$ exists and is given by the polarization formula
$$[x,y]_t = \frac 14 \bigl( [x+y, x+y]_t - [x-y, x-y]_t \bigr). \tag{7.0.9}$$
Therefore the cross-variation has finite variation on compacts, since it is the difference of two non-decreasing functions.

Proof. Observe that (7.0.9) holds for the approximating sums before taking limits, since
$$\Delta x\, \Delta y = \frac 14 \bigl( (\Delta x + \Delta y)^2 - (\Delta x - \Delta y)^2 \bigr). \quad \Box$$
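The polarization identity holds exactly for the approximating sums, before any limit is taken, as a quick check shows (a sketch outside the notes, assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(3)
dx = 0.03 * rng.normal(size=1000)                # increments of two arbitrary paths
dy = 0.03 * rng.normal(size=1000)

cross = np.sum(dx * dy)
polar = 0.25 * (np.sum((dx + dy) ** 2) - np.sum((dx - dy) ** 2))
print(cross, polar)                              # equal up to rounding
```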

Lemma 7.0.4. When the continuous paths $x_t, y_t$ have cross-variation $[x,y]_t$ along $(\Pi_n)$, the sequence of signed measures
$$\xi_n(dt) = \sum_{t^n_i \in \Pi_n} \delta_{t^n_{i-1}}(dt)\, \bigl( x_{t^n_i \wedge t} - x_{t^n_{i-1} \wedge t} \bigr)\bigl( y_{t^n_i \wedge t} - y_{t^n_{i-1} \wedge t} \bigr)$$
converges vaguely to the measure $\xi(dt)$ with $\xi((s,t]) = [x,y]_t - [x,y]_s$.

Proof. By polarization and Lemma 7.0.2. $\Box$

Proposition 7.0.6. When $x_t, y_t$ are continuous with pathwise quadratic variations $[x,x]_t$, $[y,y]_t$ and cross-variation $[x,y]_t$ along the sequence of partitions $(\Pi_n)$, and $f(x,y) \in C^{2,2}$, the following Ito-Föllmer formula holds:
$$f(x_t, y_t) = f(x_0, y_0) + \int_0^t \nabla f(x_s, y_s) \cdot \begin{pmatrix} d\overleftarrow{x}_s \\ d\overleftarrow{y}_s \end{pmatrix} + \frac 12 \int_0^t f_{xx}(x_s, y_s)\, d[x,x]_s + \frac 12 \int_0^t f_{yy}(x_s, y_s)\, d[y,y]_s + \int_0^t f_{xy}(x_s, y_s)\, d[x,y]_s,$$
where
$$\int_0^t \nabla f(x_s, y_s) \cdot \begin{pmatrix} d\overleftarrow{x}_s \\ d\overleftarrow{y}_s \end{pmatrix} := \int_0^t f_x(x_s, y_s)\, d\overleftarrow{x}_s + \int_0^t f_y(x_s, y_s)\, d\overleftarrow{y}_s = \lim_{n\to\infty} \sum_{t^n_i \in \Pi_n} \Bigl( f_x(x_{t^n_{i-1}}, y_{t^n_{i-1}})\bigl( x_{t^n_i} - x_{t^n_{i-1}} \bigr) + f_y(x_{t^n_{i-1}}, y_{t^n_{i-1}})\bigl( y_{t^n_i} - y_{t^n_{i-1}} \bigr) \Bigr).$$
Remark. Note that at this stage we are not able to define separately the pathwise integrals
$$\int_0^t f_x(x_s, y_s)\, d\overleftarrow{x}_s \quad \text{and} \quad \int_0^t f_y(x_s, y_s)\, d\overleftarrow{y}_s$$
when $[x]_t [y]_t > 0$, but their sum is well defined.

Proof. As before, use telescopic sums and a second order Taylor approximation, together with Lemma 7.0.4. $\Box$

Proposition 7.0.7. For $x_t, y_t$ continuous with respective quadratic variations $[x]_t$, $[y]_t$ and cross-variation $[x,y]_t$ along the sequence of partitions $(\Pi_n)$, and for $F, G \in C^2$, the Ito-Föllmer integrals
$$w_t = \int_0^t F_x(x_s)\, d\overleftarrow{x}_s, \qquad z_t = \int_0^t G_y(y_s)\, d\overleftarrow{y}_s$$
also have cross-variation along the sequence of partitions $(\Pi_n)$, given by
$$[w, z]_t = \Bigl[ \int_0^\cdot F_x(x_s)\, d\overleftarrow{x}_s,\ \int_0^\cdot G_y(y_s)\, d\overleftarrow{y}_s \Bigr]_t = \bigl[ F(x_\cdot), G(y_\cdot) \bigr]_t = \int_0^t F_x(x_s)\, G_y(y_s)\, d[x,y]_s.$$
Proof: as in Remark 7.0.3(5). $\Box$

Proposition 7.0.8. Let $B_t$ and $W_t$ be independent Brownian motions. Then $P$-almost surely their pathwise cross-variation along the dyadic partitions $(D_n)$ exists, and $[B,W]_t = 0$.

Proof. By definition $(B_t + W_t)/\sqrt 2$ and $(B_t - W_t)/\sqrt 2$ are Brownian motions. Therefore $[B+W, B+W]_t = [B-W, B-W]_t = 2t$, and the result follows by polarization. $\Box$
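A simulation of this proposition (a sketch outside the notes, assuming numpy): the cross-variation sum of two independent Brownian paths on $[0,1]$ is near zero, and polarization gives the same value exactly.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2 ** 16
dB = rng.normal(0.0, np.sqrt(1.0 / n), size=n)   # increments of B on [0, 1]
dW = rng.normal(0.0, np.sqrt(1.0 / n), size=n)   # independent increments of W

cross = np.sum(dB * dW)                          # approximating sum for [B, W]_1
via_polar = 0.25 * (np.sum((dB + dW) ** 2) - np.sum((dB - dW) ** 2))
print(cross, via_polar)                          # both near 0
```

The standard deviation of the cross sum is $n^{-1/2} \approx 0.004$ at this mesh.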

7.0.3 Pathwise Stratonovich calculus

If in the approximating Riemann sums we evaluate the integrand at the midpoint rather than at the left point, we obtain
$$\sum_{t_i \in D_n : t_i \le t} F_x(B_{(t_{i+1}+t_i)/2})(B_{t_{i+1}} - B_{t_i})$$
$$= \sum F_x(B_{t_i})(B_{t_{i+1}} - B_{t_i}) + \sum \bigl( F_x(B_{(t_{i+1}+t_i)/2}) - F_x(B_{t_i}) \bigr)(B_{t_{i+1}} - B_{t_i})$$
$$= \sum F_x(B_{t_i})(B_{t_{i+1}} - B_{t_i}) + \sum F_{xx}(B_{t_i})(B_{(t_{i+1}+t_i)/2} - B_{t_i})(B_{t_{i+1}} - B_{t_i}) + \sum r(B_{(t_{i+1}+t_i)/2}, B_{t_i})(B_{(t_{i+1}+t_i)/2} - B_{t_i})(B_{t_{i+1}} - B_{t_i})$$
$$= \sum F_x(B_{t_i})(B_{t_{i+1}} - B_{t_i}) + \sum F_{xx}(B_{t_i})(B_{(t_{i+1}+t_i)/2} - B_{t_i})^2 + \sum F_{xx}(B_{t_i})(B_{(t_{i+1}+t_i)/2} - B_{t_i})(B_{t_{i+1}} - B_{(t_{i+1}+t_i)/2}) + \sum r(B_{(t_{i+1}+t_i)/2}, B_{t_i})(B_{(t_{i+1}+t_i)/2} - B_{t_i})(B_{t_{i+1}} - B_{t_i}).$$

Lemma 7.0.5. For the Brownian path,
$$\sum_{t_i \in D_n} \bigl( B_{(t_{i+1}+t_i)/2 \wedge t} - B_{t_i \wedge t} \bigr)^2 \longrightarrow \frac 12 [B,B]_t = \frac 12 t, \tag{7.0.10}$$
$$\sum_{t_i \in D_n} \bigl( B_{(t_{i+1}+t_i)/2 \wedge t} - B_{t_i \wedge t} \bigr)\bigl( B_{t_{i+1} \wedge t} - B_{(t_{i+1}+t_i)/2 \wedge t} \bigr) \longrightarrow 0. \tag{7.0.11}$$
Proof: hint: along the lines of Proposition 7.0.4.

It follows that the Riemann sums converge $P$-almost surely along the dyadics $(D_n)$ to the pathwise Stratonovich integral
$$\int_0^t F_x(B_s) \circ dB_s := \int_0^t F_x(B_s)\, d\overleftarrow{B}_s + \frac 12 \int_0^t F_{xx}(B_s)\, ds = F(B_t) - F(B_0) - \frac 12 \int_0^t F_{xx}(B_s)\, ds + \frac 12 \int_0^t F_{xx}(B_s)\, ds = F(B_t) - F(B_0),$$
which follows the ordinary first order calculus. By evaluating the integrand in the Riemann sums at the right point we obtain the pathwise backward integral
$$\int_0^t F_x(B_s)\, d\overrightarrow{B}_s := \lim_{n\to\infty} \sum_{t^n_i \in D_n} F_x(B_{t^n_{i+1}})\bigl( B_{t^n_{i+1} \wedge t} - B_{t^n_i} \bigr) = F(B_t) - F(B_0) + \frac 12 \int_0^t F_{xx}(B_s)\, ds = \int_0^t F_x(B_s)\, d\overleftarrow{B}_s + \int_0^t F_{xx}(B_s)\, ds.$$
Proof: exercise.
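The first order chain rule for the midpoint sums can be seen numerically for $F(x) = x^2$ (a sketch outside the notes, assuming numpy); sampling $B$ on a doubled grid makes every interval midpoint a grid point.

```python
import numpy as np

rng = np.random.default_rng(5)
m = 2 ** 16                                      # number of Stratonovich intervals
# Sample B on 2m steps of [0, 1] so each interval midpoint is a grid point.
B = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(1.0 / (2 * m)), 2 * m))])

left, mid, right = B[0:-1:2], B[1::2], B[2::2]
# F(x) = x^2, F_x(x) = 2x: the midpoint (Stratonovich) sums follow the
# ordinary chain rule, giving F(B_1) - F(B_0) = B_1^2.
strat = np.sum(2.0 * mid * (right - left))
print(strat, B[-1] ** 2)
```

The same path with left-point sums would instead give $B_1^2 - 1$, the Ito value, which is the content of the correction term $\frac 12 \int F_{xx}\, ds$.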

References

H. Föllmer, "Calcul d'Itô sans probabilités" (1980). Séminaire de Probabilités XV, pp. 143-149, Springer.

D. Sondermann, "Introduction to Stochastic Calculus for Finance", Springer.