Convex Stochastic optimization WiSe 2019/2020

Ari-Pekka Perkki¨o

February 17, 2020

Contents

1 Practicalities and background 3

2 Introduction 3 2.1 Examples ...... 4

3 Convex sets 6 3.1 Separation theorems ...... 8

4 Convex functions 10 4.1 Lower semicontinuity, recessions and directional derivatives . . . 12 4.2 Continuity of convex functions ...... 15

5 Convex normal integrands 17

6 Integral functionals 28

7 Dynamic programming 33 7.1 Conditional expectation of a normal integrand ...... 35 7.2 Existence of solutions ...... 42

1 7.3 Kernels and conditioning ...... 46 7.4 Examples ...... 50

8 Conjugate duality 57 8.1 Convex conjugates ...... 57 8.2 Conjugate duality in optimization ...... 62 8.3 Dual spaces of random variables ...... 64 8.4 Conjugate duality in stochastic optimization ...... 65 8.5 Relaxation ...... 67

9 Examples 72

10 Absence of duality gap 75

2 1 Practicalities and background

Upon passing the exam, attending and solving the exercises give a bonus to the final grade. We assume that the following concepts are familiar:

1. Vector spaces, topology.

2. Probability space, random variables, expectation, convergence theorems. 3. Conditional expectations, martingales. 4. The fundamentals of discrete time financial mathematics.

2 Introduction

T Let (Ω, F,P ) be a probability space with a filtration (Ft)t=0 of sub-σ-algebras of F and consider the dynamic stochastic optimization problem Z minimize Eh(x, u) := h(x(ω), ω)dP (ω) over x ∈ N , (SP ) where, for given integers nt and m,

T 0 nt N = {(xt)t=0 | xt ∈ L (Ω, Ft,P ; R )} is the space of adapted processes, h is an extended real-valued B(Rn) ⊗ F- measurable function, where n := n0 + ... + nT . Here and in what follows, we define the expectation of a measurable function φ as +∞ unless the positive part φ+ is integrable1. The function Eh is thus well-defined extended real-valued function on N . We will assume throughout that the function h(·, ω) is convex for every ω ∈ Ω. Then Eh is a convex function of N and (SP ) is a convex stochastic optimization problem on the space of adapted processes. The aim of this course is to analyze (SP ) using dynamic programming and con- jugate duality. These lead to characterizations of optimal solutions of (SP ) via ”Bellman equations”and ”Karush-Kuhn-Tucker conditions”, both starting points of various modern numerical methods. Duality theory also leads to characteri- zations and lower bounds of the optimal value of (SP ), classical example being that the superhedging price of an option in a liquid market can be computed via ”martingale measures”.

1In particular, the sum of extended real numbers is defined as +∞ if any of the terms equals +∞.

3 2.1 Examples

Example 2.1 (Mathematical programming). Consider the problem

minimize Ef0(x) over x ∈ N

subject to fj(x) ≤ 0 P -a.s., j = 1, . . . , m, where fj are normal integrands. The problem fits the general framework with ( f (x, ω) if f (x, ω) ≤ 0 for j = 1, . . . , m, h(x, ω) = 0 j +∞ otherwise.

When each f0(·, ω), . . . , fm(·, ω) is an affine function, the problem becomes a linear stochastic optimization problem.

T J Example 2.2 (Optimal investment). Let s = (st)t=0 be an adapted R -valued stochastic process describing the unit prices or assets in a perfectly liquid fi- nancial market. Consider the problem of finding a dynamic trading strategy T x = (xt)t=0 that provides the “best hedge” against a financial liability of deliver- ing a random amount c ∈ L0 cash at time T . If we measure our risk preferences over random cash-flows with the “expected shortfall” associated with a nonde- creasing convex “loss function” V : R → R, the problem can be written as T −1 ! X minimize EV u − xt · ∆st+1 over x ∈ ND, (2.1) t=0

T where ND denotes the set of adapted trading strategies x = (xt)t=0 that satisfy the portfolio constraints x ∈ Dt for all t = 0,...,T almost surely. Here Dt is a random Ft-measurable set consisting of the portfolios we are allowed to hold over time period (t, t + 1]. The problem fits the general framework with (   V u(ω) − PT −1 x · ∆s (ω) if x ∈ D (ω) for t = 0,...,T h(x, ω) = t=0 t t+1 t t +∞ otherwise.

This example can be extended to a semi-static hedging problem, where some (or all) of the assets are allowed to be traded only at the initial time t = 0. It is also possible to allow some of the assets to be ”American type options”. The above could also be readily extended by allowing the loss function V to be random or by adding transaction costs. Example 2.3 (Stochastic control). The problem

" T # X minimize E Lt(Xt,Ut) over X,U ∈ N , t=0

subject to ∆Xt = AtXt−1 + BtUt−1 + ut t = 1,...,T

4 fits the general framework with x = (X,U), ( PT L (x , ω) if π∆x − A¯ (ω)x = u (ω) for t = 1,...,T, h(x, ω) = t=0 t t t t t−1 t +∞ otherwise, where π = [I 0] and A¯t = [At Bt]. Here the stochastic process X is the ”state” and U is the ”control”. Example 2.4 (Problems of Bolza). Consider the problem

T X minimize E Kt(xt, ∆xt), (2.2) x∈N t=0 where nt = d, ∆xt := xt − xt−1, x−1 := 0. The problem fits the general framework with

T X h(x, ω) = Kt(xt, ∆xt, ω). t=0

For instance, currency market models fit into this framework, where components of xt describe different currencies in the portfolio hold at time (t, t + 1]. Example 2.5 (Risk measures). Some modern optimization problems in finance are given in terms of risk measures that do not a priori fit into (SP ). However, some of them can be expressed as (SP ) by introducing additional variables. Consider the problem minimize V(x) (2.3) x∈N for V : L0 → R defined by V(x) = inf Ec(α, x), α∈R where c(·, ·, ω) is convex on R × Rn. 0 Extending N¯ := L (F−1) × N for a trivial F−1 and denoting x¯ = (α, x), the problem fits the general framework with

h(¯x, ω) = c(α, x, ω).

Assume now that c(α, x, ω) = α + θ(g(x, ω) − α, ω) for convex θ and g. When θ is nondecreasing with θ(0) = 0 and 1 ∈ ∂θ(0),

V(x) = inf E[α + θ(g(x) − α)] α is known as optimized certainty equivalent of the random variable g(x). When θ(u) = et − 1, we get the entropic risk

V(x) = log Eeg(x)

5 u+ while for θ(u) = γ with γ ∈ (0, 1) we obtain the conditional value at risk at γ 1 V(x) = inf E[α + (g(x) − α)+]. α γ

1 2 When g is affine and θ(u) = 2 u + u, we obtain the ”Mean Variance” risk measure 1 V(x) = E[(g(x) − Eg(x))2] + Eg(x). 2

3 Convex sets

Let U be a real . A set A ⊆ U is convex if

λu + (1 − λ)u0 ∈ A. for every u, u0 ∈ A and λ ∈ (0, 1). For λ ∈ R and sets A and B, we define the scalar multiplication and a sum of sets as

λA := {λu | u ∈ A} A + B := {u + u0 | u ∈ A, u0 ∈ B}.

The summation is also known as Minkowski addition. With this notation, A is convex if and only if

λA + (1 − λ)A ⊆ A ∀λ ∈ (0, 1).

Convex sets are stable under many algebraic operations. Let X be another linear vector space.

j Theorem 3.1. Let J be an arbitrary index set, (A )j∈J a collection of convex sets and A ⊂ X × U a . Then,

1. for λ ∈ R+, the scaled set λA is convex, P j 2. for finite J , the sum j∈J A is convex,

T j 3. the intersection j∈J A is convex, 4. the projection {u ∈ U | ∃x :(x, u) ∈ A} is convex.

Proof. Exercise.

Next we turn to topological properties of convex sets. Let τ be a topology on U (the collection of open sets, their complements are called closed sets) and let A ⊂ U. The interior int A of A is the union of all open sets contained in A and closure cl A is the intersection of closed sets containing A.

6 The set A is a neighborhood of u if u ∈ int A. We denote the collection of neighborhoods of u by Hu and the collection of open neighborhoods of u by o Hu. Note that A is a neighborhood of u if and only if A contains an open neighborhood of u.

o Exercise 3.0.1. For A ⊂ U, u ∈ cl A if and only if A ∩ O 6= ∅ for all O ∈ Hu.

A function g from U to another topological space V is continuous at a point u if the preimage of every neighborhood of g(u) is a neighborhood of u. A function f is continuous if it is continuous at every point.

Exercise 3.0.2. A function is continuous if and only if the preimage of every is open.

A collection E of neighborhoods of u is called a neighborhood base if every neigh- 0 borhood of u contains and element of E. Evidently Hu is a neighborhood base.

Exercise 3.0.3. Given local bases Eu of u and Ev of v = g(u), g is continuous at u if and only if the preimage of every element of Ev contains an element of Eu.

Given another topological space (U 0, τ 0), the product topology on U × U 0 is the smallest topology containing all the sets {O × O0 | O ∈ τ, O0 ∈ τ 0}. We al- ways equip products of topological spaces with the product topology. Clearly 0 o o 0 {(O,O ) ∈ Hu × Hu0 } is a neighborhood basis of (u, u ). Exercise 3.0.4. Let p : U × U 0 → V be continuous. For every u0 ∈ U 0, u 7→ p(u, u0) is continuous.

The space (U, τ) is a (TVS) if (u, u0) → u + u0 is contin- uous from U × U to U and (u, α) → αu is continuous from U × R to U. Exercise 3.0.5. In a topological vector space U,

0 o 1. αO ∈ H0 for all α 6= 0 and O ∈ H0, o o 2. for all u ∈ U and O ⊂ U, (O + u) ∈ Hu if and only if O ∈ H0, 3. sum of a nonempty open set with any set is open,

o 0 o 0 4. for every O ∈ H0, there exists O ∈ H0 such that 2O ⊂ O,

5. αA ∈ H0 for all α 6= 0 and A ∈ H0,

6. for all u ∈ U and A ∈ U, (A + u) ∈ Hu if and only if A ∈ H0,

0 0 7. for every A ∈ H0, there exists A ∈ H0 such that 2A ⊂ A.

A set C is symmetric if x ∈ C implies −x ∈ C.

7 Lemma 3.2. In a topological vector space, every (resp. convex) neighborhood of the origin contains a symmetric (resp. convex) neighborhood of the origin.

Proof. Let A ∈ H0. By continuity of p(α, u) := αu from R × U to U, there is 0 o 0 S α and O ∈ H0 such that αO ⊂ A for all |α| ≤ α . The set B := |α|≤α0 (αO) is the sought neighborhood. Assume additionally that A is convex. The set A ∩ (−A) is symmetric, so, since B ⊂ A is symmetric as well, B ⊂ A ∩ (−A). Hence A ∩ (−A) is a symmetric convex set containing a neighborhood of the origin. Lemma 3.3. Let C be a convex set in a TVS. Then int C and cl C are convex.

Proof. Let λ ∈ (0, 1). We have int C ⊂ C, so λ(int C) + (1 − λ) int C ⊂ C. Since sums and strictly positive scalings of open sets are open, we see that λ(int C) + (1 − λ) int C ⊂ int C, since int C is the largest open set contained in C. Since λ ∈ (0, 1) was arbitrary, this means that int C is convex. To prove that cl C is closed we use results from Exercises 3.0.1 and 3.0.5. Let 0 ˜ o u, u ∈ cl C, λ ∈ (0, 1) and O ∈ H0. It suffices to show that λu + (1 − λ)u0 + O˜ ∩ C 6= ∅.

0 o 0 ˜ There are O,O ∈ H0 with λO + (1 − λ)O ⊂ O andu ˜ ∈ C ∩ (u + O) and u˜0 ∈ C ∩ (u0 + O0). Thus

λu˜ + (1 − λ)˜u0 ⊂ λ(u + O) + (1 − λ)(u0 + O0) ⊂ λu + (1 − λ)u0 + O˜ where the left side belongs to C.

3.1 Separation theorems

Sets of of the form {u ∈ U | l(u) = α} are called hyper-planes, where l is a real-valued linear function and α ∈ R. Each hyperplane generates two half-spaces (opposite sides of the plane)

{u ∈ U | l(u) ≤ α}, {u ∈ U | l(u) ≥ α}.

A hyperplane separates sets C1 and C2 if they belong to the opposite sides of the hyperplane. The separation is proper unless both sets are contained in the hyperplane. In other words, proper separation means that

sup{l(u1 − u2) | ui ∈ Ci} ≤ 0 and inf{l(u1 − u2) | ui ∈ Ci} < 0.

A set C ⊂ U is called algebraically open if {α ∈ R | u + αu0 ∈ C} is open for any u, u0 ∈ U. The set C is algebraically closed if its complement is open, or equivalently, if the set {α ∈ R | u + αu0 ∈ C} is closed for any u, u0 ∈ U.

8 Exercise 3.1.1. In a topological vector space, open (resp. closed) sets are alge- braically open (resp. closed), and the sum of a nonempty algebraically open set with any set is algebraically open.

The following separation theorem states that the origin and an algebraically open convex set not containing the origin and can be properly separated. Theorem 3.4. Assume that C in a linear vector space U is an algebraically open convex set with 0 ∈/ C. Then there exists a linear l : U → R such that sup{l(u) | u ∈ C} ≤ 0, inf{l(u) | u ∈ C} < 0. In particular, l(u) < 0 for all u ∈ C.

Proof. This is an application of Zorn’s lemma. Omitted.

The above separation theorem implies a series of other separation theorems for convex sets. In the locally convex setting below, we get separation theorems in terms of continuous linear functionals, or equivalently, in terms of closed hyperspaces as the next exercise shows. A real-valued function g is bounded from above on B ⊂ U if there is M ∈ R such that g(u) < M for all u ∈ B. If g is continuous at u ∈ dom g, then it is bounded from above on a neighborhood at u. Indeed, choose a neighborhood g−1((−∞,M)) for some M > g(u). Theorem 3.5. Assume that l is a real-valued linear function on a topological vector space. Then the following are equivalent:

1. l is bounded from above in a neighborhood of the origin. 2. l is continuous.

3. {u ∈ U | l(u) = α} is closed for all α ∈ R. 4. {u ∈ U | l(u) = 0} is closed.

Proof. Exercise.

A topological vector space is locally convex (LCTVS) if every neighborhood of the origin contains a convex neighborhood of the origin. Theorem 3.6. Assume that C is a closed convex set in a LCTVS and u∈ / C. Then there is a continuous linear functional separating properly u and C.

C o Proof. The origin belongs to the open set (C −u) , so there is a convex O ∈ H0 such that 0 ∈/ C − u + O. By Theorem 3.4, there is a linear l such that l(u0) < 0 ∀ u0 ∈ C − u + O. This means that l(u0) < l(u) for all u0 ∈ C+O, so l is continuous by Theorem 3.5.

9 The following corollary is very important in the sequel. For instance, it will give the biconjugate theorem that is the basis of duality theory in convex optimiza- tion. Corollary 3.7. The closure of convex set in a LCTVS is the intersection of all closed hyperplanes containing the set.

Proof. By Lemma 3.3, cl C is convex for convex C. For any u∈ / cl C, there is, by the above theorem, a closed half-space Hu such that cl C ⊂ Hu and u∈ / Hu. We get \ cl C = Hu. u/∈C

4 Convex functions

Throughout the course, R = R∪{±∞} is the extended real line. For a, b ∈ R, the ordinary summation is extended as a + b = +∞ if a = +∞, and as a + b = −∞ if a 6= +∞ and b = −∞. Let g : U → R. The function g is convex if g(λu + (1 − λ)u0) ≤ λg(u) + (1 − λ)g(u0) for all u, u0 ∈ U and λ ∈ [0, 1]. A function is convex if and only if its epigraph

epi g := {(u, α) ∈ U × R | g(u) ≤ α} is a convex set. Applying the last part of Theorem 3.1 to epi g, we see that the domain dom g := {u ∈ U | g(u) < ∞} is convex when g is convex.

Exercise 4.0.1. A function g is convex if and only if the strict epigraph

o epi g = {(u, α) ∈ U × R | g(u) < α} is convex.

Many algebraic operations also preserve convexity of functions.

j Theorem 4.1. Let J be an arbitrary index set, (g )j∈J a collection of convex functions, and p : X × U → R a convex function. Then,

j P j j 1. for finite J and strictly positive (λ )j∈J , the sum j∈J λ g is convex,

10 2. the infimal X X x 7→ inf{ gj(uj) | uj = u} is convex,

j 3. the supremum u 7→ supj∈J g (u) is convex,

4. the marginal function u 7→ infx p(x, u) is convex.

Proof. Exercise.

The function g is called positively homogeneous if g(αu) = αg(u) ∀ u ∈ dom g and ∀ α > 0, and sublinear if g(α1u1 + α2u2) ≤ α1g(u1) + α2g(u2) ∀ ui ∈ dom g and ∀ αi > 0. The second part in the following exercise shows that norms are convex. Exercise 4.0.2. Let g be an extended real-valued function on U.

1. If g is positively homogeneous and convex, then it is sublinear. 2. If g is positively homogeneous, then it is convex if and only if g(u1 + u2) ≤ g(u1) + g(u2) ∀ ui ∈ dom g.

3. If g is convex, then ( λg(u/λ) if λ > 0, G(λ, u) = +∞ otherwise

is positively homogeneous and convex on R × U. In particular, p(u) = inf G(λ, u) λ>0 is positively homogeneous and convex on U.

The third part above is sometimes a surprising source of convexity. It also implies properties for recession functions and directional derivatives introduced later on. Let G be a function from a subset dom G of X to U and let K ⊂ U be a . The function G is K-convex if epi G := {(x, u) | x ∈ dom G, G(x) − u ∈ K} K is a convex set in X × U. Note that dom G is convex, being the projection of epiK G to X. When G : X → R, G is convex if and only if it is K-convex for K = R−, in which case epiK G = epi G.

11 Lemma 4.2. The function G is K-convex if and only if dom G is convex and

G(λx1 + (1 − λ)x2) − λG(x1) − (1 − λ)G(x2) ∈ K for every xi ∈ dom G and λ ∈ (0, 1)

Proof. Exercise.

The composition g◦G of g and G is defined by

dom(g◦G) := {x ∈ dom G | G(x) ∈ dom g} (g◦G)(x) := g(G(x)) ∀x ∈ dom(g◦G). The range of G is denoted by rge G.

Theorem 4.3. If G is K-convex and g is convex such that g(u1) ≤ g(u2) whenever u1 ∈ rge G and u1 − u2 ∈ K, then g◦G is convex.

Proof. Exercise. Exercise 4.0.3. Let G a convex function on X and h a nondecreasing convex function on rge G. Then h◦G is convex.

n λi n Exercise 4.0.4. The function g(u) = Πi=1ui is concave on R , where u = Pn (u1, . . . , un), λi > 0 and i=1 λi < 1. Pn Hint: When i=1 λi = 1, apply Exercise 4.0.2. For the general case, combine with the composition rule.

4.1 Lower semicontinuity, recessions and directional deriva- tives

Assume now that g is an extended real-valued function on U. The function g is said to be proper if it is not identically +∞ and if it never takes the value −∞. The function g is lower semicontinuous (lsc) if the level-set

levα g := {u ∈ U | g(u) ≤ α} is closed for each α ∈ R. Equivalently, g is lsc if its epigraph is closed, or if, for every u ∈ U, sup inf g(u0) ≥ g(u). u0∈A A∈Hu For sequences, a lsc function g satisfies lim inf g(uν ) ≥ g(u). uν →u When U is ”sequential” (e.g., a ), this property is equivalent to lower semicontinuity. We denote argmin g := {u ∈ U | g(u) = inf g(u0)}. u0∈U

12 j Theorem 4.4. Let J be an arbitrary index set, g a lsc function and (g )j∈J a collection of lsc functions on a topological vector space U. Then,

1. for a continuous F : V → U, g◦F is lsc.

j P j j 2. for finite J and strictly positive (λ )j∈J , the sum j∈J λ g is lsc,

j 3. the supremum u 7→ supj∈J g (u) is lsc.

Proof. Exercise.

Let g be a lsc convex function. Givenu ¯ ∈ dom g, the function ( λ(g(¯u + u/λ) − g(¯u)) if λ > 0, G(λ, u) := +∞ otherwise. is positively homogeneous and convex on R × U by Exercise 4.0.2. The function G(·, u) is a decreasing on R+, i.e., g(¯u + λu) − g(¯u) λ 7→ λ is increasing on R+. The function g(¯u + λu) − g(¯u) u 7→ g0(¯u; u) = lim λ&0 λ gives the directional derivative of g atu ¯. We have g(¯u + λu) − g(¯u) g0(¯u; u) = inf , λ&0 λ so g0(¯u, ·) is positively homogeneous and convex by Exercise 4.0.2. The function g(¯u + λu) − g(¯u) g∞(u) = lim λ%∞ λ is called the recession function of g. Note that g∞ is independent of the choice u¯, and g(¯u + λu) − g(¯u) g∞(u) = sup , λ>0 λ so g∞ is positive homogeneous and convex. Since g is lsc, g∞ is lsc as well. Theorem 4.5. For a proper lsc convex function g, the function  λ(g(¯u + u/λ) − g(¯u)) if λ > 0,  G(λ, u) := g∞(u) if λ = 0, +∞ otherwise. is lsc.

13 Proof. Exercise.

For a convex C, the set

C∞ := {x | x0 + λx ∈ C ∀x0 ∈ C, λ > 0}

∞ is called the of C. For a closed C, we have δC = δC∞ , so the recession cone is closed for a closed C. This is a consequence of the following lemma. Lemma 4.6. For a closed convex C in a topological vector space,

C∞ := {u | u¯ + λu ∈ C ∀λ > 0} for any u¯ ∈ C.

Proof. It suffices show that the right side is a subset of C∞. Let u 6= 0 and u¯ ∈ C be such thatu ¯ + λu ∈ C for all λ > 0. Let u0 ∈ C and λ0 > 0. For any λ ≥ λ0, λ0 λ0 λ0 u0 + λ0u + (u − u0) = (1 − )u0 + (¯u + λu) ∈ C λ λ λ by convexity. Since C is closed, letting λ % ∞ gives u0 + λ0u ∈ C. Exercise 4.1.1. If (xν ) is a sequence in a closed convex C, λν & 0 and λν xν → x¯, then x¯ ∈ C∞.

In a topological vector space, a set C is bounded if for any neighborhood A of the origin, C ⊂ λA for some λ > 0. In a normed space, like Rn, this means that C is contained in some ball.

Theorem 4.7. A convex set C in Rd is bounded if and only if (cl C)∞ = {0}.

Proof. Exercise. Remark 4.8. In a general LCTVS, a C need not be bounded even though C∞ = {0}. Consider, e.g. U = L∞, the space of essentially bounded random variables equipped with a topology generated by the essential supremum . The set C = {u ∈ L∞ | E|u| ≤ 1} is closed and C∞ = {0}. If there ν ν ν ν are non-null A ∈ F with P (A ) & 0, then u := 1Aν /P (A ) belongs to C but ν ν ku kL∞ = P (A ) % ∞, so C is not bounded.

Recall that in Rd, a set is compact if and only if it is bounded and closed. d Theorem 4.9. Let g : R → R be a proper lsc convex function. Then lev≤α g is bounded (and hence compact) for every α if and only if

{x | g∞(x) ≤ 0} = {0}.

In this case, argmin g is nonempty and compact.

14 Proof. Exercise.

Exercise 4.1.2. Assume that g : Rn × Rm → R is a proper lsc convex function such that n ∞ L := {x ∈ R | g (x, 0) ≤ 0} is a linear space. Then g(x + x0, u) = g(x, u) for every x0 ∈ L.

Theorem 4.10. Assume that g : Rn × Rm → R is a proper lsc convex function such that n ∞ L := {x ∈ R | g (x, 0) ≤ 0} is a linear space. Then the infimum in

p(u) := inf g(x, u) n x∈R is attained, p is a proper lsc convex function and

p∞(u) = inf g∞(x, u). n x∈R

Proof. To prove that p that is lsc, it suffices to show that lev≤β p∩B is compact (and hence closed) for every closed ball B. That g(x+x0) = g(x) for all x0 ∈ L is ⊥ n 0 0 left as an exercise. Let L := {x ∈ | x · x = 0 ∀ x ∈ L} andg ¯ = g + δ ⊥ R L ×B n so that p(u) + δB(u) =p ¯(u) := infx∈R g¯(x, u). n For a proper lsc convex function f : R → R, we have that lev≤β f is bounded (and hence compact) for every β if and only if {x | f ∞(x) ≤ 0} = {0} (an exercise). Applying this to x 7→ g¯(x, u), this function has inf-compact level sets and thus the infimum in the definition ofp ¯ and in that of p is attained for every u. In particular, we have lev≤β p ∩ B = Π(lev≤β g¯) where Π is the projection from Rn × Rm to Rm. We have {(x, u) | g¯∞(x, u) ≤ 0} = {(x, u) | u = 0, x ∈ L⊥, g∞(x, u) ≤ 0} = {(0, 0)}, sog ¯ has compact-level sets as well. Thus lev≤β p∩B is a projection of a compact set and hence compact. The proof of the recession formula is left as an exercise.

4.2 Continuity of convex functions

Given a set A ⊂ U,

core A = {u ∈ U | ∀u0 ∈ U ∃λ : u + λ0u ∈ A ∀λ ∈ (0, λ)} is known as the core (or ) of A and its elements are called internal points, not to be confused with interior points. We always have int A ⊂ core A.

15 Theorem 4.11. If a convex function is bounded from above on an open set, then it is continuous throughout the core of its domain.

Proof. Let g be a function that is bounded from above on a neighborhood O ofu ¯. We show first that this implies that g is continuous atu ¯. Replacing g by g(u +u ¯) − g(¯u), we may assume thatu ¯ = 0 and that g(0) = 0. Hence there exists M > 0 such that g(u) ≤ M for all u ∈ O. Let  > 0 be arbitrary and choose λ ∈ (0, 1) with λM < . We have g(λu) <  for each u ∈ O by convexity. Moreover

0 = g((1/2((−λu) + (1/2)λu) ≤ (1/2)g(−λu) + (1/2)g(λu), which implies that −g(−λu) ≤ g(λu) for all u ∈ O ∩ (−O). Thus, |g(u)| <  for all u ∈ α(O ∩ (−O)), so g is continuous at the origin. Assume now that g is bounded from above on an open set A, i.e., there is M such that g(u) ≤ M for each u ∈ A. By above, it suffices to show that g is bounded from above at each u ∈ core dom g. Let u0 ∈ A. There isu ¯ ∈ dom g and λ ∈ (0, 1) with u = λu¯ + (1 − λ)u0. We have

g(λu¯ + (1 − λ)˜u)) ≤ λg(¯u) + (1 − λ)M ∀u˜ ∈ A, so g is bounded from above on a open neighborhood λu¯ + (1 − λ)A of u.

In Rd, geometric intuition suggests that a convex function is continuous on the core of its domain. This idea extends to lsc convex functions on a barreled space. The LCTVS space U is barreled if every closed convex symmetric is a neighborhood of the origin2. A set C is called absorbent if S (αC) = U. α∈R+ A set is absorbent if and only if the origin belongs to its core. For example, every Banach space is barreled3. Lemma 4.12. Let A ⊂ U be a convex set. Then int A = core A under any of the following conditions:

1. U is finite dimensional, 2. int A 6= ∅, 3. U is barreled and A is closed.

Proof. We leave the first case as an exercise. To prove the second, it suffices to show that, for u ∈ core A, we have u ∈ int A. Let u0 ∈ int A. There is λ ∈ (0, 1) andu ¯ ∈ A with u = λu0 + (1 − λ)¯u. Now

u + (1 − λ)(int A − u0) =u ¯ + (1 − λ) int A ⊂ A

2It is an exercise to show that in a LCTVS, every neighborhood of the origin is absorbing and contains a closed convex symmetric neighborhodhood of the origin. 3An application of the Baire category theorem: if U = S (nC) for a closed C, then n∈N int C 6= ∅

16 where the left side is an open neighborhood of u. To prove the last claim, let U be barreled and A closed. Again, it suffices to show that, for u ∈ core A, we have u ∈ int A. Let B = A−u. Now 0 ∈ core B, so 0 ∈ core(B∩(−B)). Thus (B∩(−B)) is a closed convex symmetric absorbing set and hence it is a neighborhood of the origin. Thus 0 ∈ int B and x ∈ int A. Theorem 4.13. A convex function g is continuous on core dom g in the follow- ing situations:

1. U is finite dimensional 2. U is barreled and g is lower semicontinuous.

Proof. We leave the first part as an exercise. To prove the second, let u ∈ core dom g and α > g(u). For u0 ∈ U, the function λ 7→ g(u + λu0) is continuous at the origin by the first part, so u ∈ core levα g. By Lemma 4.12 and lower semicontinuity of g, int levα g 6= ∅. Thus continuity follows from Theorem 4.11.

5 Convex normal integrands

Throughout, every finite dimensional space is equipped with the usual topology d and the usual Borel-σ-algebra, and S :Ω ⇒ R is a set-valued mapping, i.e., for every ω, S(ω) ⊂ Rd. The mapping S is measurable if the preimage S−1(O) := {ω ∈ Ω | S(ω) ∩ O 6= ∅} of every open O ⊂ Rd is measurable, i.e. S−1(O) ∈ F for every open O. The mapping S is (resp. convex, cone, etc.) closed-valued when S(ω) is (resp convex,conical, et.c) closed for each ω ∈ Ω. The set

dom S := {ω | S(ω) 6= ∅} is the domain of S. Being the preimage of the whole space, it is measurable as soon as S is measurable. Evidently, if S is measurable, then its image-closure mapping ω 7→ cl S(ω) is measurable, since its preimages of open sets coincide with those of S. The function

d(x, A) = inf |x − x0| x0∈A is the distance mapping.

n Theorem 5.1. Let S :Ω ⇒ R be closed-valued. The following are equivalent.

1. S is measurable,

17 2. S−1(C) ∈ F for every compact set C, 3. S−1(C) ∈ F for every closed set C, 4. S−1(B) ∈ F for every closed ball B,

5. S−1(O) ∈ F for every open ball O, 6. {ω ∈ Ω | S(ω) ⊂ O} ∈ F for every open O, 7. {ω ∈ Ω | S(ω) ⊂ C} ∈ F for every closed C,

8. ω 7→ d(x, S(ω)) is measurable for every x ∈ Rn.

−1 T −1 Proof. Assuming 1, for a compact C we have S (C) = S (C + B(0, 1/ν)) S ν which proves 2. For a closed C, we have C = (C ∩ B(0, ν), so 3 implies ν S 4. Clearly, 3 implies 4. For any open ball O, O = q,r{B(q, r) |, B(q, r) ⊂ n O, (q, r) ∈ Q , Q+} , so 4 implies 5. Any open O is a countable union of open balls, so 5 implies 1. Since {ω | S(ω) ⊆ A} = Ω\S−1(AC ), 3 and 6 are equivalent, and 4 and 7 are equivalent. For any α ∈ R+, d(x, S(ω) ≤ α if and only if S(ω) ∩ B(x, α) 6= ∅, so 4 and 8 are equivalent.

n For a set-valued mapping S :Ω ⇒ R ,

n gph S := {(x, ω) ∈ R × Ω | x ∈ S(ω)} is the graph of S.

Corollary 5.2. The graph of a measurable closed-valued mapping is B(Rn)⊗F- measurable.

Proof. Exercise. Hint: consider open balls with rational centers and rational radii.

Measurability is preserved under many algebraic operations.

Theorem 5.3. Let J be a countable index set and let each Sj, j ∈ J , be a measurable set-valued mapping. Then

T j j 1. ω 7→ j∈J S (ω) is measurable if each S is closed,

S j 2. ω 7→ j∈J S (ω) is measurable,

P j j j 3. ω 7→ j∈J λ S (ω) is measurable for finite J , where λ ∈ R, 4. ω 7→ (S1(ω),...,Sj(ω)) is measurable for finite J ; here we may allow j d S :Ω ⇒ R j .

18 Proof. 4. Let R(ω) = (S1(ω),...,Sj(ω)). Every open set O in the product j −1 space is expressible as a union of rectangular open sets ×jOν . Thus R (O) = S j −1 j j −1 j ν (∩j(S ) (Oν )), where each (S ) (Oν ) is measurable by the assumption. PJ j j 3. Let R(ω) = j=1 λ S (ω). For an open O, the set

J X O0 = {(x1, . . . , xJ )| λν xj ∈ O} j=1

J d is open in ×j=1R . Now

J X R−1(O) = {ω | ( λjSj)(ω) ∩ O 6= ∅} j=1 = {ω | (S1(ω),...,SJ (ω)) ∩ O0 6= ∅}, where the set on the right-hand side is measurable by part 4. S j −1 S j −1 2. ( j∈J S ) (O) = j∈J (S ) (O) for any open O. 1. Assume first that J = {1, 2}. Take any compact C ⊂ Rd, and denote Rν (ω) = Sν (ω) ∩ C, then (S1 ∩ S2)−1(C) = {ω | S1(ω) ∩ S2(ω) ∩ C 6= ∅} = {ω | 0 ∈ R1(ω) − R2(ω)} = (R1 − R2)−1({0}). Here R1 − R2 is measurable by part 3; let us show that it is closed-valued as well. Since Sν are closed-valued, Rν are compact-valued, so R1 − R2 is compact valued. (an exercise). Hence S1 ∩ S2 is measurable. The case of finite J follows from by induction. ˜µ Tµ ν Suppose finally that J is countable, J = {1, 2, 3,... }. Denote S = ν=1 S . T∞ ν T∞ ˜µ ˜µ Note that ν=1 S (ω) = µ=1 S (ω), and that S are measurable by preceding. The proof is complete as soon as we show ∞ \ \ ( Sν )−1(C) = (S˜µ)−1(C). µ=1 T ν −1 T∞ ˜µ −1 If ω ∈ ( S ) (C), it is straight-forward to check that ω ∈ µ=1(S ) (C). T∞ ˜µ −1 ˜µ ∞ For the converse, take ω ∈ µ=1(S ) (C). Since (S (ω) ∩ C)µ=1 is a nested T∞ ˜µ T∞ ν sequence of nonempty compact sets, µ=1(S (ω) ∩ C) 6= ∅. By ν=1 S (ω) = T∞ ˜µ T ν −1 µ=1 S (ω) this means that ω ∈ ( S ) (C). n m Theorem 5.4. If M(·, ω): R ⇒ R is such that gph M(ω) := {(x, u) | u ∈ M(x, ω)} defines a measurable closed-valued mapping, then the following mappings are measurable,

19 n 1. R(ω) := M(S(ω), ω), where S :Ω ⇒ R is measurable and closed-valued, n m 2. ω 7→ {x ∈ R | M(x, ω) ∩ S(ω) 6= ∅}, where S :Ω ⇒ R is measurable and closed-valued.

Proof. Let Π : Rn × Rm → Rm be the projection mapping. We have R = Π ◦ Q for Q(ω) := [S(ω) × Rm] ∩ gph M(ω), which is measurable by Theorem 5.1. Since R−1(O) = Q−1(Π−1(O)), where Π−1(O) is open for any open O, R is measurable. To prove 2, we have {x | M(x, ω)∩S(ω) 6= ∅} = Γ(S(ω)) for Γ(ω) := M −1(·, ω). Here gph Γ(ω) = {(x, u) | u ∈ M(x, ω)}, so the result follows from 1.

A function h :Ω × Rn → R is a normal integrand on Rn if

n epi h(ω) := {(x, α) ∈ R × R | h(x, ω) ≤ α} defines a measurable closed-valued mapping. A normal integrand is convex (pos- itively homogeneous etc.), if, for all ω, h(·, ω) is convex (positively homogeneous etc). The indicator function of S, ( 0 if x ∈ S(ω), δ (x) := δS(x, ω) := S(ω) +∞ otherwise defines a normal integrand on Rn if and only if S is closed-valued and measur- able. A function h is a Caratheodory integrand if h(·, ω) is continuous for each ω ∈ Ω and h(x, ·) is measurable for each x ∈ Rn. Theorem 5.5. A Caratheodory integrand is a normal integrand.

Proof. Let {xν | ν ∈ N} be a dense set in Rd and define αν,q(ω) = h(xν , ω) + q, ˆ where q ∈ Q+. Since h(·, ω) is continuous, the set O = {(x, α) | h(x, ω) < α} is open. For any (x, α) ∈ epi h(·, ω) and for any open neighborhood O of (x, α), O ∩ Oˆ is open and nonempty, and there exists (xν , αν,q) ∈ O ∩ Oˆ, i.e., {(xν , αν,q(ξ) | ν ∈ N, q ∈ Q} is dense in epi h(·, ω). Thus for any open set O ∈ U × R, [ {ω | epi h(ω) ∩ O 6= ∅} = {ω | (xν , αν,q(ω)) ∈ O} ν,q is measurable.

Let o n Sh(ω) := {(x, α) ∈ R × R | h(x, ω) < α}. o Since preimages of open sets under Sh and Sh are the same, one is measurable if and only if the other is so.

20 Theorem 5.6. For a normal integrand h on Rn and β ∈ L0, the level-set mapping d ω 7→ {x ∈ R | h(x, ω) ≤ β(ω)} is measurable and closed-valued. A function h : Rn × Ω → R is a normal integrand if and only if

n lev≤β h(ω) := {x ∈ R | h(x, ω) ≤ β} is measurable and closed-valued for every β ∈ R.

Proof. Let S(ω) := {x ∈ Rd | h(x, ω) ≤ β(ω)}. For a closed C ⊂ Rn, R(ω) := C × {α | α ≤ β(ω)} is measurable and closed-valued (an exercise). Now

S−1(C) = {ω | S(ω) ∩ C 6= ∅}

= {ω | Sh(ω) ∩ R(ω) 6= ∅} = dom(epi h ∩ R) which shows the measurability of S. To prove the second claim, note first that the closedness of the level sets implies that epi h is closed-valued. Let (βν ) be a dense sequence in R. Since countable unions of measurable mappings are measurable,

n ν lev<βν h(ω) := {x ∈ R | h(x, ω) < β } are measurable. We have

o [ ν epi h (ω) = (lev<βν h(ω) × [β , ∞)) , ν so Sh is measurable. Theorem 5.7. The following are normal integrands:

i i 1. h(x, ω) = supi∈J h (x, ω), where h are normal integrands and J is count- able.

Pn i i 2. h(x, ω) = i=1 h (x, ω), where h are normal integrands. 0 0 0 3. h(·, ω) = α(ω)h (·, ω), where h is a normal integrand and α ∈ L (R+). When α(ω) = 0, the scalar multiplication is defined as α(ω)h0(·, ω) = cl dom h0(·, ω).

4. h(x, ω) = f(x, u(ω), ω), where f : Rn ×Rm ×Ω → R is a normal integrand and u ∈ L0(Ω; Rm).

Proof. In 1, epi h = ∩ epi hi, so the claim follows directly from Theorem 5.3. Below, in each case, M satisfies the assumption of Theorem 5.4, which can be verified using Theorems 5.5, 5.6 and 5.3.

21 To prove 2, let S = epih1 × · · · × epihJ and ( (x, α1 + ··· + αJ ) if x1 = ··· = xJ = x M(x1, α1, . . . , xJ , αJ ) = ∅ otherwise so that epi h = M ◦S and the claim follows from Theorem 5.4.1. The multipli- cation 3 is an exercise. In 4, we have epi h(ω) = {x | M(x, ω) ∈ epi f(·, ·, ω)} for M(x, α, ω) := (x, u(ω), α), so Theorem 5.4.2. applies.

Theorem 5.8. Assume that f : Rn × Rm × Ω → R is a normal integrand on Rn × Rm and let p(u, ω) := inf f(x, u, ω). n x∈R m The function defined by clu p(u, ω) is a normal integrand on R . In particular, if p(·, ω) is lsc for every ω, p is a normal integrand.

Proof. Let Π(x, u, α) = (u, α) be the projection from Rn × Rm × R to Rm × R. It is easy to check that Π epi f o(ω) = epi po(ω), so, for an open O ⊂ Rn × R, (epi po)−1(O) = (epi f o)−1(Π−1(O)), where the right side is measurable, since f is a normal integrand and Π is continuous. Thus epi p is measurable. Moreover, ω 7→ cl epi p is measurable, which shows that (u, ω) 7→ (lscu p)(u, ω) is a normal integrand. If p is lsc in the first argument, it is thus a normal integrand.

Corollary 5.9. For a normal integrand h, p(ω) := infx h(x, ω) is measurable, and S(ω) := argmin h(x, ω) x is measurable and closed-valued.

Proof. The first claim follows from Theorem 5.8. We have

S(ω) = {x | h(x, ω) ≤ p(ω)}, so S is measurable by Theorem 5.6. Since h(·, ω) is lsc, S is closed-valued.

d 0 d Example 5.10. For a measurable closed-valued S :Ω ⇒ R and x ∈ L (R ), the projection mapping

0 ω 7→ PS(ω)(x(ω)) := argmin |x − x(ω)| x0∈S(ω) is measurable and closed-valued. Indeed, this follows from Corollary 5.9 applied 0 0 0 to h(x , ω) := δS(ω)(x ) + |x − x(ω)|.

22 Corollary 5.11. Let h be a normal integrand such that h(x) ≥ −ρ|x| − m for some ρ, m ∈ L0. The functions

hν (x, ω) := inf {h(x0, ω) + νρ(ω)|x − x0|} ν ∈ 0 d N x ∈R are Caratheodory integrands, (νρ)-Lipschitz with hν (x) ≥ ρ|x| − m and as ν increases, they increase pointwise to h.

n Proof. For any x1, x2 ∈ R ,

ν 0 0 h (x1, ω) ≤ inf{h(x , ω) + νρ|x − x2| + νρ(ω)|x2 − x1|} y ν = h (x2, ω) + νρ(ω)|x2 − x1|.

By symmetry, hν (·, ω) is νρ(ω)-Lipschitz continuous. For every ν and  > 0, there is a yν such that

hν (x, ω) ≥ h(yν , ω) + νρ(ω)|yν − x| −  ≥ −ρ(ω)|yν | − m(ω) + νρ(ω)|yν − x| −  ≥ −ρ(ω)|x| − m(ω) + (ν − 1)ρ(ω)|yν − x| − .

Thus, either hν (x, ω) → ∞ or yν → x as ν → ∞. In the latter case,

lim inf hν (x, ω) ≥ h(x, ω) by lower semicontinuity of h.

A function x :Ω → Rd is called a selection of S if x(ω) ∈ S(ω) for all ω ∈ dom S. The set of measurable selections of S is denoted by L0(S). The sequence (xν ) of measurable selections of S in the following theorem is known as a Castaing representation of S.

d Theorem 5.12 (Castaing representation). Let S :Ω ⇒ R be closed-valued. Then S is measurable if and only if dom S is measurable and there exists a sequence (xν ) ∈ L0(Rd) such that, for all ω ∈ dom S, S(ω) = cl{xν (ω) | ν = 1, 2,... }.

Proof. Assuming the Castaing representation exists, we have, for an open O,

∞ [ S−1(O) = (xν )−1(O), ν=1 so S is measurable. Assume now that S is measurable. Let J be the countable d collection of q = (q0, q1, . . . , qd), q ∈ Q such that {q0, q1, . . . , qd} are affinely q,0 independent. For each q ∈ J , we define recursively S (ω) := PS(ω)(q0) and

q,i S (ω) := PSq,i−1(ω)(qi).

23 These mappings are measurable and closed-valued, by Example 5.10. Moreover, q,d q q,d S (ω) is a singleton, a point in S(ω) nearest to q0. Setting x (ω) := S (ω), (xq) is a Castaing representation of S. Let us verify that Sq,d is single-valued. We fix ω and omit it from the nota- tion. By the recursive definition of Sq,i, for each qi, there is ri ≥ 0 such that Sq,d ⊂ ∂B(qi, ri). Thus, for any x ∈ Sq,d, |x − qi|2 = (ri)2 for all i. By affine independence, these equations have a unique solution. Corollary 5.13 (Measurable selection theorem). Any measurable closed-valued n S :Ω ⇒ R admits a measurable selection. Corollary 5.14 (Doob–Dynkin, set-valued version). Assume that S is σ(ξ)- measurable closed random set for ξ ∈ L0(Rd). Then there exists measurable ˜ d n ˜ closed-valued S : R ⇒ R such that S(ω) = S(ξ(ω)).

Proof. Let (xν ) be a σ(ξ)-measurable Castaing representation of S. By Doob- Dynkin lemma, there exist Borel-measurable gν : Rd → Rn such that xν (ω) = gν (ξ(ω)). Let S˜(y) = cl{gν (y) | ν = 1, 2,... } so that S(ω) = cl(gν (ξ(ω)) | ν = 1, 2,... )} = S˜(ξ(ω)).

Corollary 5.15. If h is a σ(ξ)-normal integrand on Rn for an Rd-valued ran- dom variable ξ, then h(x, ω) = H(x, ξ(ω)) for a B(Rd)-normal integrand H on Rn.

Proof. Apply Corollary 5.14 to the epigraphical mapping of h.

n Corollary 5.16. A closed-valued mapping S :Ω ⇒ R is measurable if and only if there exists a sequence (xν ) ∈ L0(Rd) such that

1. for each ν, {ω | xν (ω) ∈ S(ω)} is measurable,

2. for each ω, S(ω) ∩ {xν (ω) | ν ∈ N} is dense in S(ω).

Proof. By Theorem 5.12, it suffices to show that the necessity is equivalent to the existence of a Castaing representation. If a Castaing representation, exists, it satisfies 1 and 2. When (xν ) ∈ L0(Rd) satisfies 1 and 2,...?? Theorem 5.17 (Section theorem). For normal integrands h and h˜, we have h ≤ h˜ almost surely everywhere if and only if, for every w ∈ L∞(F), h(w) ≤ h˜(w) almost surely. In particular, h = h˜ almost surely everywhere if and only if h(w) = h˜(w) for every w ∈ L∞(F) almost surely.

24 Proof. Necessity is obvious. Let hν and h˜ν be the respective Lipschitz-regularizations. For w ∈ L∞, we have h(w) ≤ h˜(w) if and only if hν (w) ≤ h˜ν (w) for every ν. Thus it suffices to prove the claim for Lipschitz integrands h and h˜. We assume for a contradiction that, for some constants α and β with α < β, the domain of

d S(ω) = {x ∈ R | h(x, ω) ≥ β > α ≥ h˜(x, ω)} is not evanescent. Since −h is a normal integrand and ˜ S = (lev≤α h) ∩ (lev≤−β(−h)), S is measurable and closed-valued by Theorems 5.6 and 5.3. By Corollary 5.13, there is a measurablew ˆ withw ˆ ∈ S on dom S. For ν large enough, w :=w ˆ1|wˆ|≤ν and A := {|wˆ| ≤ ν} ∩ dom S satisfy P (A) > 0 which is a contradiction with the assumption that h(w) ≤ h˜(w) almost surely.

A function M : Rn ×Ω → Rm is a Caratheodory mapping if M(·, ω) is continuous for all ω and M(x, ·) is measurable for all x ∈ Rn. Theorem 5.18. When M is a Caratheodory mapping,

gph M(·, ω) := {(x, u) | u ∈ M(x, ω)}.

Proof. For any countable dense D ⊂ Rn, {(x, M(x, ω)) | x ∈ D} is a Castaing representation of gph M.

The rest of the section is concerned with convexity. For a convex-valued S, we define S∞ as the ω-wise recession cone of S, i.e.

S∞(ω) = {x | x¯ + λx ∈ S(ω) ∀ x¯ ∈ S(ω), λ > 0}.

Theorem 5.19. For a measurable closed convex-valued S, S∞ is measurable closed convex-valued.

Proof. By Corollary 5.13, there isx ¯ ∈ L0(Rn) such thatx ¯ ∈ S on dom S. By Theorem ?? in the appendix,

∞ \ 1 S∞(ω) = (S(ω) − x¯(ω)), ν ν=1 for ω ∈ dom S. The measurability now follows from Theorem 5.3.

Given a convex normal integrand h, we define h∞ scenariowise as

h∞(·, ω) := h(·, ω)∞; see the Appendix.

25 Theorem 5.20. For a convex normal integrand h, h∞ is a normal integrand. If h is proper, then

∞ h(¯x(ω) + λx) − h(¯x(ω), ω) n h (x, ω) = sup ∀(x, ω) ∈ R × Ω λ>0 λ for every x¯ ∈ L0(dom h).

Proof. By Theorem 5.19, h∞ is a normal integrand. Theorem 5.21. Assume that f is a convex normal integrand and that the set- valued mapping n ∞ N(ω) = {x ∈ R | f (x, 0, ω) ≤ 0} is linear-valued. Then p(u, ω) := inf f(x, u, ω) n x∈R is a normal integrand with p∞(u, ω) = inf f ∞(x, u, ω). n x∈R Moreover, given a u ∈ L0(F), there is an x ∈ L0(F) with x(ω) ⊥ N(ω) and p(u(ω), ω) = f(x(ω), u(ω), ω).

Proof. By Theorem 4.10, the linearity condition implies that the infimum in the definition of p is attained and that p(·, ω) is a lower semicontinuous convex function with p∞(u, ω) = inf f ∞(x, u, ω). n x∈R By Theorem 5.8, the lower semicontinuity implies that p is a normal integrand. By Theorem ??, there is anx ¯t that attains the minimum for every ω. By Theo- rem 4.10, we may replacex ¯t(ω) by its projection to the orthogonal complement of Nt(ω). By Example 5.10, such a projection preserves measurability.

Given an extended real-valued function g on Rm and an Rm-valued function H on a subset dom H of Rn, we define their composition as the extended real- valued function ( g(H(x)) if x ∈ dom H, (g◦H)(x) := +∞ if x∈ / dom H. Given a convex cone K ⊂ Rm, the function H is said to be K-convex if the set epi H := {(x, u) | x ∈ dom H,H(x) − u ∈ K} K is convex. A K-convex function is closed if epiK H is a closed set. It is easily verified (see the proof below) that if g is convex and H is K-convex then h◦H is convex if H(x) − u ∈ K =⇒ g(H(x)) ≤ g(u) ∀x ∈ dom H. (5.1)

26 n m We say that H : R ×Ω → R is a K-convex normal function if ω 7→ epiK H(·, ω) is closed convex-valued and measurable. Theorem 5.22. The following are convex normal integrands:

0 0 0 1. h(·, ω) = α(ω)h (·, ω), where α ∈ L+ and h is a convex normal integrand. When α(ω) = 0, the scalar multiplication is defined as α(ω)h0(·, ω) = cl dom h0(·, ω). Pn i i 2. h(x, ω) = i=1 h (x, ω), where h are convex normal integrands. 3. h = g◦H, where g is a convex normal integrand and H is a K-convex normal function such that (5.1) holds almost surely and (−K) ∩ {u ∈ Rm | g∞(u, ω) ≤ 0} is linear. 4. h(x, ω) = g(A(ω)x, ω) where g is a convex normal integrand and A : Rn × Ω → Rd is an affine Caratheodory mapping.

Proof. The first two parts follow from Theorem 5.7 and the fact that scalar multiplication and the sum preserve convexity. By 2, h(x, u, ω) := g(u, ω) + δ (x, u), epiK H(·,ω) defines a convex normal integrand. The growth condition gives

(g◦H)(x, ω) = inf h(x, u, ω) m u∈R while the linearity condition implies, by Theorem 5.21, that this expression is a normal integrand. Part 4, follows from 3 by choosing H = A and K = {0}.

For a normal integrand h, we define h∗ scenariowise, that is, h∗(y, ω) := sup{u · y − h(u, ω)}. u Theorem 5.23. For a normal integrand h, h∗ is a convex normal integrand.

Proof. Let (xν , αν ) be a Castaing representation of epi h. On dom epi h, ( sup {xν (ω) · y − αν (ω)} ω ∈ dom epi h, h∗(y, ω) = ν −∞ ω∈ / dom epi h, this formula will be justified later on in the Section ”Conjugate duality”. Being a countable supremum of normal integrands (in fact, Caratheodory integrands), h∗ is normal.

In particular, for a measurable S,

σS(y, ω) := σS(ω)(v) := sup x · v x∈S(ω) is a normal integrand.

27 Exercise 5.0.1. Show that, for a convex normal integrand h,  αh(x/α, ω) if α > 0,  H(x, α, ω) := h∞(x, ω) if α = 0, +∞ otherwise

0 is a convex normal integrand and that, for α ∈ L (R+),  α(ω)h(x/α(ω), ω) if α(ω) > 0,  h0(x, ω) := h∞(x, ω) if α(ω) = 0, +∞ otherwise is a convex normal integrand. Hint: Compute the conjugate H∗ of H and then (H∗)∗.

6 Integral functionals

Given a normal integrand h, the associated integral functional Eh : L0 → R is defined by Z Eh(x) := h(x(ω), ω)dP (ω).

Here and what follows, the expectation of an extended real-valued random vari- able is defined as +∞ unless the positive part is integrable. Likewise, the sum of αi ∈ R is defined as +∞ if αi = +∞ for some i. That the integral is well-defined for any x ∈ L0 follows from the following lemma.

Lemma 6.1. A normal integrand h is B(Rn) ⊗ F-measurable from Rn × Ω to R. In particular, ω 7→ h(x(ω), ω) is measurable for any x ∈ L0(Rn).

Proof. Note that {[−∞, β] | β ∈ R} generate the σ-algebra of R. For β ∈ R, lev≤β h is closed-valued and measurable by Theorem 5.6, so {(x, ω) | h(x, ω) ≤ β} is measurable by Corollary 5.2. The second claim follows from the fact the compositions of measurable functions are measurable.

0 ∞ A vector space X ⊆ L is decomposable if L ⊂ X and 1Ax ∈ X whenever A ∈ F and x ∈ X . Equivalently, X is decomposable if

0 1Ax + 1Ω\Ax ∈ U whenever A ∈ F, x ∈ X and x0 ∈ L∞. Examples of decomposable spaces include Lp-spaces with p ≥ 0 and Orlicz spaces.

28 Theorem 6.2 (Interchange rule). For a normal integrand h :Ω × Rn → R and a decomposable X , we have

inf Eh(x) = E[ inf h(x)] n x∈X x∈R if X = L0 or the left side is less than +∞. As long as this common value is finite, argmin Eh = {x ∈ X | x ∈ argmin h P -a.s.} x∈X

Proof. By Theorem 5.8, the function p defined by

p(ω) := inf h(x, ω) m x∈R is measurable. Let α > Ep and  > 0 be small enough so that Eβ < α for β :=  + max{p, −1/}. The mapping

S(ω) := {x | h(x, ω) ≤ β(ω)} is measurable and closed-valued with P (dom S) = 1, so, by Corollary 5.13, there exists x ∈ L0(Rn) such that h(x) ≤ β almost surely and thus Eh(x) ≤ Eβ < α. Letx ¯ ∈ X be such that Eh(¯x) < ∞, and define

ν x = 1|x|≤ν x + 1|x|>ν x.¯

By construction, xν ∈ X , and h(xν ) ≤ max{h(x), h(¯x)} for all ν, so, by Fatou’s lemma, Eh(xν ) < E[h(x)] +  for ν large enough. Since α > Ep was arbitrary, this proves the first claim. Assume now that Ep is finite. If x0 ∈ argmin h almost surely, then x0 ∈ argmin Eh. If x0 6∈ argmin h, then P (A) > 0 for A := {h(x0) >  + p} for some  > 0, and, as above, there is x such that h(x) < h(x0) on A, so 0 0 0 Eh(1Ax + 1AC x ) < Eh(x ), which means that x ∈/ argmin Eh.

We equip L0 with the translation invariant metric

d(x, x0) := Eρ(|x0 − x|), where ρ is a bounded nondecreasing sub-additive function vanishing only at the origin. A sequence (xν ) in L0 convergences in measure to an x ∈ L0 if

lim P ({|xν − x| ≥ } = 0 ν→∞ for all  > 0. Lemma 6.3. The space L0 is a complete metric vector space where a sequence converges if and only if it converges in probability. A sequence converges in probability if and only if every subsequence has an almost surely convergent subsequence with a common limit.

29 Proof. Exercise. Exercise 6.0.1. Show that, for Ω = [0, 1], F = B([0, 1]) and the Lebesque measure P , L0 is not locally convex. Hint: show that any nonempty open convex set is the whole L0.

We say that a normal integrand h is bounded from below if there is an m ∈ L1 such that n h(x, ω) ≥ m(ω) ∀x ∈ R , ω ∈ A, where A ∈ F is of full measure. Theorem 6.4. If h is a normal integrand bounded from below, then Eh is lsc on L0.

Proof. Let xν → x in L0. By Lemma 6.3, we may assume, by passing to a subsequence if necessary, that xν → x almost surely. By lower semicontinuity of h(·, ω), lim inf h(xν (ω), ω) ≥ h(x(ω), ω) ν→∞ almost surely so lower semicontinuity follows from Fatou’s lemma. Exercise 6.0.2 (2 points). Show that the lower boundedness assumption in Theorem 6.4 can be relaxed as follows. Recall that A is an atom of P if for every B ⊂ A, P (B) = 0 or P (B) = P (A). Given a normal integrand h such that Eh is proper on L0, Eh is lsc on L0 if and only if A := {ω ∈ Ω | inf h(x, ω) = −∞} x contains only atoms of P and at most finitely many of them, and there exists m ∈ L1 such that h ≥ −m on AC . If Eh is not lsc, lsc Eh = −∞ on dom Eh.

When restricted to the space of N of adapted processes, the lower boundedness in Theorem 6.4 can be relaxed significantly using the following very useful result. We denote N ⊥ = N ∩ L∞ and

⊥ 1 n +...n ∞ N := {v ∈ L (R 0 T ) | E[x · v] = 0 ∀x ∈ N }. Lemma 6.5. Let x ∈ N and v ∈ N ⊥. If E[x · v]+ ∈ L1, then E[x · v] = 0.

ν ν ∞ Proof. Assume first that T = 0. Defining x := 1{|x|≤ν}x, we have x ∈ L , so E[xν · v] = 0 and

E[x · v]− ≤ lim inf E[xν · v]− = lim inf E[xν · v]+ ≤ E[x · v]+, ν→∞ ν→∞ where the inequalities follow from Fatou’s lemma. By dominated convergence, E[x · v] = lim E[xν · v] = 0.

30 Assume now that the claim holds for every (T − 1)-period model. Defining ν 1 x := {|x0|≤ν}x, we have

T X ν + ν + ν − + ν − [ xt · vt] ≤ [x · v] + [x0 · v0] ≤ [x · v] + [x0 · v0] , t=1

PT ν ν ∞ so E[ t=1 xt · vt] = 0, by the induction hypothesis. Since x0 ∈ L , we get E[xν · v] = 0. It then follows that E[x · v] = 0 just like in the case T = 0. Example 6.6. If a martingale s and x ∈ N are such that

T −1 X + E[ xt · ∆st+1] < ∞, t=0

PT −1 ⊥ then E[ t=0 xt ·∆st+1] = 0. This follows from Lemma 6.5 with v ∈ N defined by vt = ∆st+1.

Lemma 6.7. Given extended real-valued random variables ξ1 and ξ2, we have

E[ξ1 + ξ2] = E[ξ1] + E[ξ2] under any of the following:

+ + 1 1. ξ1 , ξ2 ∈ L , 1 1 2. ξ1 ∈ L or ξ2 ∈ L ,

− − 1 3. ξ1 , ξ2 ∈ L .

4. ξ1 or ξ2 is {0, +∞}-valued.

Proof. Exercise.

The following is an immediate corollary of Lemma 6.7.

Lemma 6.8. Given normal integrands h1 and h2, we have

E[h1 + h2] = E[h1] + E[h2] under any of the following:

1. the integrands are bounded from below.

2. h1 or h2 is an indicator function of a measurable closed-valued mapping. Lemma 6.9. Given a normal integrand h and v ∈ L0, the following are equiv- alent.

31 1. h(x, ω) ≥ λx · v(ω) − m(ω) for some m ∈ L1 and two different values of λ ∈ R. 2. λv ∈ dom Eh∗ for two different values of λ ∈ R. ⊥ Definition 6.10. A normal integrand h is Nv -bounded if it satisfies one of the equivalent conditions of Lemma 6.9 with v ∈ N ⊥. A normal integrand is ⊥ ⊥ ⊥ N -bounded if it is Nv -bounded fro some v ∈ N . ⊥ Theorem 6.11. Assume that h is an Nv -bounded normal integrand with h(x, ω) ≥ λx · v(ω) − m(ω) where m ∈ L1 and λ ∈ R. Let k(x, ω) := h(x, ω) − λx · v(ω). Then Ek is lsc on L0 and Ek = Eh on N .

Proof. Theorem 6.4 gives the lower semicontinuity. If x ∈ dom Eh ∩ N , we have E[x · v] = 0, by Lemma 6.5, so Ek(x) = Eh(x). On the other hand, by assumption, there exists λ0 6= λ such that h(x, ω) ≥ λ0x · v(ω) − m(ω) so k(x, ω) ≥ (λ0 − λ)x · v(ω) − m(ω). By Lemma 6.5 again, Ek = Eh on dom Ek ∩ N .

The assumption in Theorem 8.19 may seem somewhat strange but it is satisfied in many interesting applications as we will see in the following section. If h is a convex normal integrand, then Eh is a convex function on L0. Theorem 6.12. If h is a convex normal integrand such that Eh is lsc and proper, then (Eh)∞ = Eh∞.

Proof. Letx ¯ ∈ dom Eh. Since Eh is lsc and h(¯x) is integrable, Theorem ?? and Lemma 6.7 give h(λx +x ¯) − h(¯x) (Eh)∞(x) = sup E . λ>0 λ The difference quotients h(λx(ω) +x ¯(ω), ω) − h(¯x(ω), ω) hλ(x(ω), ω) := λ increase pointwise to h∞(x(ω), ω). Thus, (Eh)∞ ≤ Eh∞. If x +x ¯ ∈ / dom Eh, then (Eh)∞(x) = +∞. If x+x ¯ ∈ dom Eh, the claim follows from the monotone convergence theorem.

32 7 Dynamic programming

The purpose of this section is to prove a general dynamic programming recursion which generalizes the classical Bellman equation for convex stochastic optimiza- tion. We will use the notion of a conditional expectation of a normal integrand. An extended real-valued random variable X is said to quasi-integrable if either X+ or X− is integrable. For a quasi-integrable X, there exists an extended real- valued G-measurable random variable EGX, unique almost surely, such that

 G  ∞ E α(E X) = E [αX] ∀α ∈ L+ (Ω, G,P ).

The random variable EGX is known as the G-conditional expectation of X. Definition 7.1. Given a normal integrand h, a G-integrand EGh is a G-conditional expectation of h if (EGh)(x) = EG[h(x)] a.s. for all x ∈ L0(G) such that h(x) is quasi-integrable.

t Ft We will use notations x = (x0, . . . , xt) and Et = E . We say that an adapted T n0+···+nt sequence (ht)t=0 of normal integrands ht : R × Ω → R solves the gen- eralized Bellman equations if

hT = ET h, h˜ (xt−1, ω) := inf h (xt−1, x , ω), t−1 n t t (BE) xt∈R t ˜ ht−1 = Et−1ht−1 ˜ for t = T,..., 1. It is implicitly assumeed here that ht are normal integrands since otherwise the last equation is not well defined.

In the above formulation, one does not separate the decision variables xt into “state” and “control” like in the classical dynamic programming models. A for- mulation closer to the classical dynamic programming equations will be given in Corollary ?? below. Given a collection C ⊂ L0 of random variables, a random variable ξ ∈ L0 is said to be the essential infimum of C if ξ ≤ ξ0 almost surely for every ξ0 ∈ C and if ξ ∈ L0 is another random variable with this property, then ξ ≤ ξ almost surely. The essential infimum ξ exists, is unique almost surely and denoted by 0 ess infξ0∈C ξ . Denote by N t the projection of N to its first t components. When h is bounded from below and EGh exists, we have E(EGh) = Eh. Theorem 7.2. Assume that h is bounded from below, (SP ) is feasible and that T the Bellman equations (BE) admit a solution (ht)t=0. Then t  t t t 0 ht(x ) = ess infx˜∈N Eth(˜x) x˜ = x ∀x ∈ L (Ft)

33 and

t inf (SP ) = inf Eht(x ) xt∈N t for all t = 0,...,T and an x ∈ N is a solution of (SP ) if and only if

t−1 xt(ω) ∈ argmin ht(x (ω), xt, ω) P -a.s. t = 0,...,T. (OC) n xt∈R t

t Proof. By the definition of ht, for every Ft-measurable x ,

t t t 0 ht(x ) = Et[ inf ht+1(x , xt+1)] = ess infxt+1∈L (Ft+1) Et[ht+1(x , xt+1)], xt+1 where the last equality follows from Corollary 5.13 applied to -argmin; hence

t t+1 t+1 0 Etht+1(x , xt+1) = EtEt+1 inf ht+2(x , xt+2) = ess infxt+2∈L (Ft+2) Etht+2(x , xt+2) xt+2

0 and the first statement follows from recursion and the fact that if C = {ξα,α0 | α ∈ J , α0 ∈ J 0}, then

0 0 ess infξ0∈C ξ = ess infα∈J (ess infα0∈J 0 ξα,α0 ).

Let x ∈ N . Theorem 6.2 gives, for any t, ˜ t−1 t−1 Eht−1(x ) = inf Eht(x , xt). 0 xt∈L (Ft)

˜ ˜ t−1 Since ht−1 is bounded from below, Eht−1 = Eht−1 on N and thus

t−1 t inf Eht−1(x ) = inf Eht(x ). xt−1∈N t−1 xt∈N t

t By induction, inf (SP ) = infxt∈N t Eht(x ). We have

t t x ∈ argmin Eht(x ) xt∈N t if and only if

t−1 t−1 t−1 x ∈ argmin Eht−1(x ) and xt ∈ argmin Eht(x , xt). t−1 t−1 0 x ∈N xt∈L (Ft) By the second part of Theorem 6.2,

t−1 0 t−1 argmin Eht(x , xt) = {xt ∈ L (Ft) | xt ∈ argmin ht(x , xt) a.s.}. 0 n xt∈L (Ft) xt∈R t By induction, x ∈ N is optimal if and only if (OC) holds.

In Sections 7.1 and 7.2, we study existence of solutions to (BE) and (SP ) and extend Theorem 7.2 to settings where the objective h is not bounded from below.

34 7.1 Conditional expectation of a normal integrand

We say that two set-valued mappings S and S¯ are indistinguishable if there exists a P -null set N such that S = S¯ outside N. Normal integrands are said to indistinguishable if their epigraphical mappings are so. Note that if h˜ is a G- conditional expectation of h, then any h¯ that is indistinguishable from h˜ is also a G-conditional expectation of h. Uniqueness and equality of normal integrands means indistinguishable. For an integrable Rd-valued random variable X, EGX is defined componentwise as the conditional exectation of X. We say that a normal integrand h is L-bounded if there exist ρ, m ∈ L1 such that

n h(x) ≥ −ρ|x| − m ∀x ∈ R . (L) Lemma 7.3. A convex normal integrand h is L-bounded if and only if dom Eh∗∩ L1 6= ∅.

Proof. Let v ∈ L1 such that Eh∗(v) < ∞ is finite. By Fenchel’s inequality,

h(x, ω) ≥ x · v − h∗(v, ω) ≥ −|v||x| − h∗(v, ω), so we may choose ρ = |v| and m = h∗(v)+. On the other hand, h ≥ −ρ| · | − m can be written as (h + ρ| · |)∗(0) ≤ m. By Corollary ?? in the appendix, this means that inf {h∗(v) + δ (v/ρ)} ≤ m, n B v∈R where the infimum is attained. By Corollary 5.13, there is a v ∈ L0 with |v| ≤ ρ and h∗(v) ≤ m.

For an L-bounded normal integrand, EG[h(x)] is well-defined for every x ∈ L∞(G). The following lemma shows that, in the definition of conditional expec- tation, for L-bounded normal integrands, it suffices to test with x ∈ L∞(G). Lemma 7.4. For an L-bounded normal integrand h, a G-normal integrand h˜ is the unique G-conditional expectation of h if

h˜(x) = EG[h(x)] ∀x ∈ L∞(G).

Proof. If there is x ∈ L0(G) such that h(x) is quasi-integrable and h˜(x) 6= EG[h(x)], then, for ν large enough, ˜ G G 1{|x|≤ν}h(1{|x|≤ν}x) 6= 1{|x|≤ν}E [h(1{|x|≤ν}x)] = E [1{|x|≤ν}h(1{|x|≤ν}x)] which is a contradiction. Uniqueness follows from Theorem 5.17. Example 7.5. Let h(x, ω) = x · v(ω) for v ∈ L1. Then (EGh)(x, ω) = x · (EGv)(ω).

35 Theorem 7.6. Let h be a Caratheodory function such that there exist x ∈ Rn and ρ ∈ L1 with h(x) ∈ L1 and

0 0 0 n |h(x) − h(x )| ≤ ρ|x − x | ∀x, x ∈ R . Then EGh exists, is unique and characterized by

G G n (E h)(x) = E [h(x)] ∀x ∈ R . Moreover, G 0 G 0 0 d |(E h)(x − x )| ≤ (E ρ)|x − x | ∀x, x ∈ R .

Proof. The assumptions imply that h(x) ∈ L1 for all x ∈ Rd. Let D be a countable dense set in Rd and, for each x ∈ D, let h˜(x) be the conditional expectation of h(x). There is a P -null set N such that, outside N,

|h˜(x) − h˜(x0)| ≤ (EGρ)|x − x0| for each x, x0 ∈ D. Outside N, h˜ extends to whole Rd by continuity while on N we set h˜ = 0. PN n1 n For a simple x = n=1 x An , where A ∈ G form a disjoint partition of Ω and xn ∈ D, we have

N N N G G X n G X n X ˜ n ˜ E [h(x)] = E [h( x 1An )] = E [ 1An h(x )] = 1An h(x ) = h(x) n=1 n=1 n=1 Any x ∈ L∞(G) is a pointwise limit of such simple random variables bounded by kxk, so dominated convergence for conditional expectation and scenariowise continuity of h and h˜ imply that EG[h(x)] = h˜(x). Thus Lemma 7.4 implies the claim. Exercise 7.1.1. Find a solution to (BE) and a solution of the corresponding stochastic optimization problem when T = 0, n = 1

h(x, ω) = V (x0(s1 − s0)), where si have standard normal distributions, s0 ∈ F0 (in particular, F0 is not trivial), s1 is independent of F0, and ( exp(x) − 1 if x ≤ 0, V (x) = x if x > 1.

Lemma 7.7. Let h1 and h2 be L-bounded normal integrands with h1 ≤ h2. Then EGh1 ≤ EGh2 whenever the conditional expectations exist.

Proof. For any x ∈ L∞(G), h1(x) and h2(x) are quasi-integrable, so h1 ≤ h2 implies EGh1(x) ≤ EGh2(x) and the result follows from Theorem 5.17.

36 The following result is a monotone convergence theorem for conditional expec- tations of integrands.

ν ∞ Theorem 7.8. Let (h )ν=1 be a nondecreasing sequence of L-bounded normal integrands and h = sup hν . ν If each EGhν exists, then EGh exists and

EGh = sup EGhν . ν

∞ ∞ Proof. For any x ∈ L (G) and α ∈ L (G; R+), monotone convergence and Lemma 7.7 imply that

E[αh(x)] = E[α sup hν (x)] = sup E[αhν (x)] ν ν = sup E[αEG[hν (x)]] = E[α sup EG[hν (x)]]. ν ν

G ν G Thus supν E h = E h.

The following gives existence and uniqueness of the conditional expectation of general L-bounded normal integrand. Theorem 7.9. An L-bounded normal integrand has a unique conditional expec- tation and that is L-bounded as well.

Proof. Let h be an L-bounded normal integrand. Assume first that h ≤ n for some constant n > 0. By Corollary 5.11,

hν (x, ω) := inf{h(x0, ω) + νρ(ω)|x − x0|} x0 form a nondecreasing sequence of Caratheodory functions increasing pointwise to h. By Theorems 7.6 and 7.8, EGh exists. To remove the assumption h ≤ n, consider the nondecreasing sequence hn(x) := min{h(x), n} and apply Theo- rem 7.8 again.

G Corollary 7.10. Assume that S is a closed random set. Then E δS = EδS˜, where S˜ is a unique closed random set such that L0(G; S˜) = L0(G; S).

Proof. By Theorem 7.9, the normal integrand h = δS has a unique conditional expectation EGh. Since the conditional expectation of a {0, +∞}-valued random variable is {0, +∞}-valued as well, we have, for every x ∈ L0(G) and a ≥ 0, that G G G if E h(x) ≤ a, then E h(x) = 0. Hence, by Theorem 5.17, lev≤a(E h) = G G lev≤0(E h) for all a ≥ 0, so E h is the indicator of its domain S˜ and the claim follows from the definition of EGh.

In the above lemma, S˜ is known as the G-measurable core of S.

37 Theorem 7.11 (Tower property). Assume that h is an L-bounded normal in- tegrand, and G˜ is a sub-σ-algebra of G. Then

EG˜(EGh) = EG˜h.

Proof. By Theorem 7.9, EGh exists and is L-bounded, so the result follows from the usual tower property of conditional expectation and Lemma 7.4.

0 Lemma 7.12. Let ξ1, ξ2 ∈ L (F; R) be quasi-integrable. Then

G G G 1. E [ξ1 + ξ2] = E [ξ1] + E [ξ2] if ξ1, ξ2 satisfy any of the conditions in Lemma 6.7.

G G ∞ 2. E [ξ1ξ2] = ξ1E [ξ2] if ξ1 ∈ L (G).

∞ Proof. Let α ∈ L+ (G). To prove 1, Lemma 6.7 gives

G G G G E[α(ξ1 +ξ2)] = E[αξ1]+E[αξ2] = E[αE ξ1]+E[αE ξ2] = E[α(E ξ1 +E ξ2)].

The second claim is clear. Theorem 7.13 (Conditional expectation in operations). Let h, h1 and h2 be L-bounded normal integrands.

1. h1 + h2 is L-bounded and EG(h1 + h2) = EGh1 + EGh2.

∞ G G 2. If α ∈ L+ (G), then αh is L-bounded and E (αh) = αE h.

3. If M :Ω × Rn → Rd is a G-measurable Caratheodory mapping such that |M(x)| ≤ α|x| + b for α, β ∈ L0 with ρα and ρβ integrable, then h1◦M is L-bounded and EG(h◦M) = (EGh)◦M.

Proof. For any x ∈ L∞(G), h1(x)− and h2(x)− are integrable, so 1 follows from Lemmas 7.4 and 7.12. The second follows similarly from Lemmas 7.4 and 7.12. To prove , assumptions imply that h◦M satisfies the assumption of Theorem 7.9 and that h(M(x) is quasi-integrable for every x ∈ L∞. Thus Theorem 7.9 implies EG(h(M(x)) = (EGh)(M(x)).

Next we focus on convex normal integrands. Theorem 7.14. For a convex L-bounded normal integrand h, EGh is a convex L-bounded normal integrand. If h is positively homogeneous, so is EGh.

Proof. Exercise.

38 Theorem 7.15. Let h be a convex normal integrand and F a K-convex G- normal function such that (5.1) holds and (−K) ∩ {u | h∞(u, ω) ≤ 0} is linear almost surely, and there exists y ∈ dom Eh∗ ∩ L1 such that y · F is L-bounded. Then h◦F is an L-bounded convex normal integrand,

EG(h◦F ) = (EGh)◦F and EG(h◦F ) is convex.

Proof. By Fenchel’s inequality,

h(F (x, ω), ω) ≥ y(ω) · F (x, ω) − h∗(y), so the composition is L-bounded and the claim follows from Theorem 7.9. Theorem 7.16. Assume that h is a convex normal integrand such that there exists x ∈ dom Eh ∩ L0(G) and v ∈ dom Eh∗ ∩ L1 with (x · v)− ∈ L1. Then

EG(h∞) = (EGh)∞.

Proof. The difference quotients

h(λx0 + x(ω), ω) − h(x(ω), ω) hλ(x0, ω) := λ

λ ∞ ∞ define a nondecreasing sequence of integrands (h )λ=1 with pointwise limit h . Fenchel’s inequality gives

hλ(x0, ω) ≥ x0 · v(ω) + x · v(ω) − h∗(v(ω), ω)) − h(x(ω), ω), so Theorems 7.8 and 7.13 imply the claim.

n A G-conditional expectation of an F-measurable set-valued mapping S :Ω ⇒ R is a G-measurable closed-valued mapping EGS such that

L1(G; EGS) = cl{EGv | v ∈ L1(F; S)}, where the closure is in L1. The following lemma gives the existence.

Lemma 7.17. Assume that S is closed convex-valued mapping and admits at least one integrable selection. Then EGS exists, is unique, closed-convex valued and characterized by G ∗ δEG S = (E σS) .

G Proof. By Theorem 7.14, E σS is a positively homogeneous convex normal in- tegrand. By Theorem 5.23 and ??, its conjugate is the indicator function of some G-measurable closed convex-valued Γ.

39 Let C = {EGv | v ∈ L1(F; S)}. For every x ∈ L∞(G),

G σC (x) = EσS(x) = E[E σS(x)] = E[σΓ(x)] = σL1(Γ)(x), where the first and the last equality follow from Theorem 6.2 and the second from Theorem 7.20. By the biconjugate theorem in Section ??, cl C = L1(Γ), so Γ = EGS by definition. Lemma 7.18. Assume that h is a convex normal integrand such that there exists x ∈ dom Eh ∩ L0(G) and v ∈ dom Eh∗ ∩ L1 with (x · v)− ∈ L1. Then  αh(x/α, ω) if α > 0,  H(x, α, ω) := h∞(x, ω) if α = 0, +∞ otherwise

∗ is an L-bounded convex normal integrand with H = δepi h∗ and  αEGh(x/α, ω) if α > 0,  EGH(x, α, ω) = (EGh)∞(x, ω) if α = 0, +∞ otherwise

Proof. Since v ∈ dom Eh∗ ∩ L1, H is L-bounded. 0 0 Assume for a contradiction that there exists (x, α) ∈ L (G)+ × L (G) such that G E σepi h∗ (x, α) 6= σepi(EG h)∗ (α, x). Let

Aν := {ω | α(ω) ∈ {0} ∪ [1/ν, ν], |x(ω)| ≤ ν}. Now L-boundedness of h gives

G E [σepi h∗ (x, α)] = σepi(EG h)∗ (α, x) on Aν .

Since P (Aν ) % 1, this leads to a contradiction.

For a convex normal integrand h, the mapping EG epi h is an epigraph of a G-normal integrand. We denote this G-normal integrand by Gh and call it the epi-conditional expectation of h. Theorem 7.19. Assume that h is a convex normal integrand such that there exists x ∈ dom Eh ∩ L0(G) and v ∈ dom Eh∗ ∩ L1 with (x · v)− ∈ L1. Then

(EGh)∗ = G(h∗).

Proof. By assumption, there exists v ∈ L1 such that (v, h∗(v)+) is an integrable selection of epi h∗. Applying Lemma 7.17 to epi h∗, we get that EG epi h∗ exists and is characterized by

G ∗ δEG epi h∗ = (E σepi h∗ ) . Thus the claim follows from Lemma 7.18.

40 Exercise 7.1.2. Give an example of a convex normal integrand h for which

EGh 6= Gh.

When h is bounded from below, then Eh = E(EGh) on L0(G) and, by Fatou’s lemma, Eh is lsc on L0(G). More generally, we have the following. Theorem 7.20. Assume that h is an L-bounded normal integrand. Then Eh = E(EGh) on dom Eh ∩ L0(G) and

inf E(EGh)(x) = inf Eh(x). x∈L0(G) x∈L0(G)

If Eh is proper and lsc on L0(G), then E(EGh) = Eh on L0(G) and, in partic- ular, argmin Eh ∩ L0(G) = L0(G; argmin EGh).

Proof. Letx ¯ ∈ dom Eh ∩ L0(G). Then h(¯x) is quasi-integrable, so Eh(¯x) = E[EGh(¯x)] = E(EGh)(¯x). Thus Eh = E(EGh) on dom Eh ∩ L0(G). Let x ∈ dom E(EGh) ∩ L0(G). We have

− − − h(x1{|x|≤ν} +x ¯1{|x|>ν}) ≤ max{h(x1{|x|≤ν}) , h(¯x) }, where the right side is integrable when h is L-bounded and Eh is proper. Thus

G Eh(x1{|x|≤ν} +x ¯1{|x|>ν}) = E(E h)(x1{|x|≤ν} +x ¯1{|x|>ν}).

Since

G + G + G + (E h)(x1{|x|≤ν} +x ¯1{|x|>ν}) ≤ max{(E h)(x) , (E h)(¯x) }, where the right side in integrable, Fatou’s lemma gives

G lim sup Eh(x1{|x|≤ν} +x ¯1{|x|>ν}) ≤ E(E h)(x), ν→∞ proving the second claim. If Eh is proper and lsc on L0(G), the last in- equality gives Eh ≤ E(EGh) on L0(G). In this case, argmin Eh ∩ L0(G) = argmin E(EGh) ∩ L0(G), where the latter equals L0(G; argmin EGh) by Theo- rem 6.2.

Lemma 7.21. Assume that h is a convex normal integrand such that there exist v ∈ L∞(G)⊥, m ∈ L1 and two different values of λ ∈ R such that h(x, ω) ≥ λx · v(ω) − m(ω).

∞ G ∞ Then lev0 h and lev0(E h) have the same G-measurable selections.

41 Proof. If h∞(x) ≤ 0, then (EGh)∞(x) ≤ 0 by Theorem 7.16. We have

h∞(x) ≥ (λ0 − λ) max{x · v, 0} − λx · v, so (EGh)∞ ≥ 0, by Theorem 7.16 and Theorem 7.13.1. If (EGh)∞(x) ≤ 0, then Eh∞(x) ≤ 0 by the previous theorem. In this case, the lower bound and Lemma 6.5 imply x · v = 0, so h∞(x) ≥ 0 which together with Eh∞(x) ≤ 0 gives h∞(x) = 0.

The next example shows that we can have E(EGh) 6= Eh if the lower semicon- tinuity assumption in Theorem ?? is omitted. Example 7.22. Let h(x, ω) = x · v(ω) for v ∈ L∞(G)⊥ with v 6= 0 almost surely. Then EGh ≡ 0, but Eh(x) = +∞ for some x ∈ L0(G) if L∞(G) is not finite-dimensional while Eh is proper by Lemma 6.5. Hence Eh 6= E(EGh) on L0(G).

7.2 Existence of solutions

T n0+···+nt Recall that an adapted sequence (ht)t=0 of normal integrands ht : R × Ω → R solves the generalized Bellman equations if

hT = ET h, h˜ (xt−1, ω) := inf h (xt−1, x , ω), t−1 n t t (BE) xt∈R t ˜ ht−1 = Et−1ht−1 for t = T,..., 1. The first part of the following result gives sufficient condi- tions for the existence of a solution. The second part uses the solution and Theorem 7.2 to prove the existence of a solution to (SP ). Theorem 7.23. Assume that h is convex, (SP ) is feasible and that

1. there exist v ∈ N ⊥, λ ∈ R and m ∈ L1 such that h(x, ω) ≥ λx · v(ω) − m(ω), (7.1)

2. the set nt ∞ t t−1 Nt(ω) := {xt ∈ R | ht (x , ω) ≤ 0, x = 0}

is linear-valued whenever ht is a well-defined normal integrand.

T Then (BE) has a unique solution (ht)t=0 with

t t t ht(x , ω) ≥ λx · (Etv )(ω) − (Etm)(ω) (7.2) for all t = 0,...,T .

42 If h is N ⊥-bounded, then

t inf (SP ) = inf Eht(x ) xt∈N t for all t = 0,...,T , (SP ) has a solution x ∈ N with xt ⊥ Nt for every t = 0,...,T and, moreover, the solutions of (SP ) are characterized by

t−1 xt(ω) ∈ argmin ht(x (ω), xt, ω) P -a.s. t = 0,...,T. (OC) xt

Proof. Assume that ht+1 is well-defined and that

t+1 t+1 t+1 ht+1(x , ω) ≥ λx · (Et+1v )(ω) − (Et+1m)(ω). ˜ t+1 ˜ By Theorem 5.21, ht is well-defined. Since Et+1vt+1 = 0, ht is L-bounded, so ht exists and is unique by Theorem 7.9. By the same theorem, the lower bound admits a conditional expectation, so we get

t t ht(x, ω) ≥ λx · (Etv )(ω) − (Etm)(ω).

By induction, (BE) admits a unique solution satisfying (7.2). Assume now that h is N ⊥-bounded. Let

k(x, ω) := h(x, ω) − λx · v(ω). and (kt) be the corresponding solution to (BE). By Theorem 8.19, we have t Ek = Eh on N . Since Etvt = 0, Theorem 7.13 gives, recursively backward in time, that t t t kt(x, ω) := ht(x , ω) − λx · Etv (ω). t Thus, Ekt = Eht on N , by Theorem 8.19 again. Since k is bounded from below, Theorem 7.2 says that

t inf (SP ) = inf Ekt(x ) xt∈N t and that an x ∈ N is a solution of (SP ) associated with k if and only if

t−1 xt(ω) ∈ argmin kt(x (ω), xt, ω) ∀t = 0,...,TP -a.s.. xt

t Since Etvt = 0 we also have

t−1 t−1 argmin kt(x (ω), xt, ω) = argmin ht(x (ω), xt, ω) xt xt which gives the optimality condition. Applying Theorem 5.21 recursively for- ward in time gives an optimal x ∈ N with xt ⊥ Nt.

The following example shows that assumptions 1 and 2 in Theorem 7.23 are not sufficient for the last statement of the theorem.

43 2 ⊥ Example 7.24. Let α ∈ L (F0) and v ∈ N be such that Eαv = ∞ and consider 1 h(x, ω) := |x − α(ω)|2 + x v(ω). 2 0 0 1 2 Here h0(x, ω) = 2 |x0 − α(ω)| , so x ∈ N satisfies (OC) if and only if x0 = α. + − For such x, neither h (x) nor h (x) is integrable, so E0h(x) does not exist. Hence (SP ) does not have a solution albeit (BE) has a unique solution and there is a unique x satisfying (OC).

The following result shows that the linearity condition of Theorem 7.2 can be stated in terms of the original normal integrand h directly. We will use the consequence of Theorem 5.17 that if C is closed-valued and G-measurable, then it is almost surely linear-valued if and only if the set of its measurable selectors is a linear space. Theorem 7.25. Assume that h is an N ⊥-bounded convex normal integrand. Then assumptions of Theorem 7.23 hold if and only if L = {x ∈ N | h∞(x(ω), ω) ≤ 0 a.s.} is a linear space.

Proof. We proceed by induction on T . When T = 0, Theorem 7.9 implies that hT is well defined and the second part of Lemma 7.21 gives ∞ L = {x ∈ N | hT (x(ω), ω) ≤ 0 a.s.}, 0 which, by definition of NT , equals L (FT ; NT ). Since NT is FT -measurable, by Theorem ??, the linearity of L is equivalent to NT being linear-valued. Let now T be arbitrary and assume that the claim holds for every (T − 1)-period model. 0 ∞ If L is linear then L := {x ∈ N | x0 = 0, h (x(ω), ω) ≤ 0 a.s.} is linear as well. Applying the induction hypothesis to the (T − 1)-period model obtained by fixing x0 ≡ 0, we get that ht is well defined and Nt is linear for t = T,..., 1. By Theorem 5.21 and Theorem 7.9, h0 is well defined. By Theorem 5.21 and the second part of Lemma 7.21,

0 0 ∞ L (F0; N0) = {x0 ∈ L (F0) | h0 (x0(ω), ω) ≤ 0 a.s.} 0 ˜∞ = {x0 ∈ L (F0) | h0 (x0(ω), ω) ≤ 0 a.s.} 0 ∞ = {x0 ∈ L (F0) | inf h1 (x0(ω), x1, ω) ≤ 0 a.s.} x1 0 ∞ 1 = {x0 ∈ L (F0) | ∃x˜ ∈ N :x ˜0 = x0, h1 (˜x (ω), ω) ≤ 0 a.s.}, where the last equality follows by applying the last part of Theorem 5.21 to the normal integrand h∞. Repeating the argument for t = 1,...,T , we get

0 0 ∞ L (F0; N0) = {x0 ∈ L (F0) | ∃x˜ ∈ N :x ˜0 = x0, hT (˜x(ω), ω) ≤ 0 a.s.} 0 ∞ = {x0 ∈ L (F0) | ∃x˜ ∈ N :x ˜0 = x0, h (˜x(ω), ω) ≤ 0 a.s.} 0 = {x0 ∈ L (F0) | ∃x˜ ∈ L :x ˜0 = x0}. (7.3)

44 0 The linearity of L thus implies that of L (F0; N0) which is equivalent to N0 being linear-valued.

Assume now that Nt is linear-valued for t = T,..., 0 and let x ∈ L. Expression 0 (7.3) for L (F0; N0) is again valid so, by linearity of N0, there is anx ˜ ∈ L ∞ withx ˜0 = −x0. Since h is sublinear, L is a cone, so that x +x ˜ ∈ L. Since 0 x0 +x ˜0 = 0, we also have x +x ˜ ∈ L . Since, by the induction assumption, L0 is linear and since L0 ⊆ L, we get −x − x˜ ∈ L. Since L is a cone, we get −x =x ˜ − x − x˜ ∈ L. Thus, L is linear. For t = 0, the last claim follows directly from expression (7.3). The general case follows by applying this to the (T − t)-period model obtained by fixing xt−1 ≡ 0.

¯ T 0 n¯t Given N := {(¯xt)t=0 | x¯t ∈ L (Ft; R },n ¯ :=n ¯0 ... n¯T we say that a mapping A : Rn × Ω → Rn¯ is adapted if Ax ∈ N¯ for every x ∈ N . Theorem 7.26. Let h be an N ⊥-bounded convex normal integrand such that

h(x, ω) ≥ λx · v(ω) − m(ω) for v ∈ N ⊥, m ∈ L1 and two different values of λ ∈ R, and that L = {x ∈ N | h∞(x) ≤ 0} is a linear space. Then each h¯ below is an N ⊥-bounded convex normal integrand such that L¯ = {x ∈ N¯ | h¯∞(x) ≤ 0} is a linear space.

1. h¯(¯x, ω) := inf{h(x, ω) | A(ω)x =x ¯} as soon as h¯ is a normal integrand and A : Rn × Ω → Rn¯ is an adapted affine map such that v = AT v¯ for some v¯ ∈ N¯ ⊥.

2. h¯(¯x, ω) := h(A(ω)¯x, ω), where A : Rn¯ × Ω → Rn is an adapted affine map such that AT v ∈ L1.

Proof. In 1, L¯ = {x¯ ∈ N¯ | ∃x ∈ L : Ax ∈ L}, so L¯ is a linear space. Since AT v¯ = v, h¯(¯x) ≥ inf{λx · AT v¯ − m | Ax =x ¯} ≥ λx¯ · v¯ − m, x so h¯ is N ⊥-bounded. In 2, L¯ = {x¯ ∈ N¯ | Ax¯ ∈ L}, so L¯ is a linear space. The lower bound becomes

h¯(¯x) ≥ λx¯ · AT v − m

By assumption, AT v ∈ L1 so that, for anyx ¯ ∈ N¯ ∞, E[¯x · AT v] = E[Ax¯ · v] = 0 by Lemma 6.5. Thus AT v ∈ N¯ ⊥ and h¯ is N ⊥-bounded.

45 7.3 Kernels and conditioning

Given a measurable space (S, S), a mapping µ : (Ω, G)×S → R is a (probability) kernel if µ(A) := ω 7→ µ(ω, A) is G-measurable for each A ∈ S and A 7→ µ(ω, A) is a (probability) measure for each ω ∈ Ω. A probability kernel µ is a G- conditional regular distribution of η ∈ L0(Ω; S) if, for all B ∈ G and A ∈ S,

E[1B1η∈A] = E[1Bµ(A)].

This means that µ(A) is a G-conditional expectation of 1η∈A. Theorem 7.27 (Disintegration). Let (S, S) be a measurable space, H : Rn × S → R be B(Rn) ⊗ S-measurable and η ∈ L0(Ω; S) with a G-conditional regular distribution µ. The function Z h˜(x, ω) := H(x, s)µ(ω, ds) is B(Rn) × G-measurable. If H(x, η) is quasi-integrable for x ∈ L0(Ω, G; Rn), then, almost surely, EG[H(x, η)] = h˜(x).

0 Proof. The claims holds if H = 1B1A for B ∈ S and A ∈ S. By the monotone class theorem, the claims holds for every S0 ×S-measurable nonnegative H. The general case follows by taking differences.

In the case where (S, S) = (Ω, F) and the coordinate mapping η(ω) = ω has a G-conditional regular distribution µ, Theorem 7.27 gives, almost surely Z EG[X](ω) = X(ω0)µ(ω, dω0) for any quasi-integrable random variable X. Theorem 7.28. Let h be an L-bounded normal integrand and assume that the coordinate mapping η(ω) = ω has a G-conditional regular distribution µ. Then Z h˜(x, ω) := h(x, ω0)µ(ω, dω0) is a G-conditional expectation of h.

Proof. By Theorem 7.27, Z EG[h(x)](ω) = h(x(ω), ω0)µ(ω, dω0) = h˜(x(ω), ω) for every x ∈ L∞(G). Letρ ˜(ω) := R ρ(ω0)µ(ω, dω0) for ρ from the definition of L-boundedness of h. If

|h(x, ω) − h(x0, ω)| ≤ ρ(ω)|x − x0|,

46 then |h˜(x, ω) − h˜(x0, ω)| ≤ ρ˜(ω)|x − x0|, so h˜(·, ω) is continuous for every ω and, since h˜ is measurable, h˜ is a Caratheodory integrand and hence a normal integrand. Let

hν (x, ω) = inf{h(x0, ω) + νρ(ω)|x − x0|}. x0

By above, each h˜ν (x, ω) := R hν (x, ω0)µ(ω, dω0) is a normal integrand. Since ˜ ˜ν ˜ h = supν h , h is a normal integrand by Theorem 7.8.

Theorem 7.29. Let H : Rn × Rm → R be a B(Rm)-normal integrand and η ∈ L0(F; Rd) with a regular G-conditional distribution µ such that there exist measurable M,R : Rm → R with M(η),R(η) ∈ L1, and H(x, p) ≥ −R(p)|x| − M(p).

For h(x, ω) := H(x, η(ω)) we have

(EGh)(x, ω) = h˜(x, ω), where h˜ is the normal integrand Z h˜(x, ω) := H(x, p)µ(ω, dp).

Proof. When h˜ is a B(Rd)-normal integrand, Theorem 7.27 gives Z EG[H(x, η)] = H(x, p)µ(ω, dp) = h˜(x, ω) for every x ∈ L∞(G). Denotingρ ˜(ω) := R R(p)µ(ω, dp), if

|H(x, p) − H(x0, p)| ≤ R(p)|x − x0|, then |h˜(x, ω) − h˜(x0, ω)| ≤ ρ˜(ω)|x − x0|, so h˜(·, ω) is continuous for every s ∈ Rd and, since h˜ is measurable, h˜ is a Caratheodory integrand and hence a normal integrand. Let

Hν (x, p) = inf{H(x0, p) + νR(p)|x − x0|}. x0

By above, each h˜ν (x, ω) := R Hν (x, p)µ(ω, dp) is a normal integrand. Since ˜ ˜ν ˜ h = supν h , h is a normal integrand by Theorem 7.8.

47 Given ξ ∈ L0(Rd) and η ∈ L0(Rm), let ν be a probability kernel on Rd × B(Rm) σ(ξ) n such that E [1η∈B] = ν(ξ, B) for all B ∈ B(R ) almost surely. The kernel ν is unique L(ξ)-a.e. and called the regular ξ-conditional distribution of η. Given a measurable R : Rm → R with R(η) ∈ L1, there exists a version of ν such that Z R˜(s) := R(p)ν(s, dp) < ∞ for all s.

Indeed, R˜(s) := R R(p)ν(s, dp) < ∞ L(ξ)-a.e., so it suffices to redefine ν to be zero on a L(ξ)-null Borel set that contains {s | R˜(s) = ∞}.

Theorem 7.30. Let H : Rn × Rm → R be a B(Rm)-normal integrand, η ∈ L0(F; Rd) such that there exist measurable M,R : Rm → R with M(η),R(η) ∈ L1 and H(x, p) ≥ −R(p)|x| − M(p) and let ν be a regular ξ-conditional distribution of η for which Z R˜(s) := R(p)ν(s, dp) < ∞ for all s.

For h(x, ω) := H(x, η(ω)) we have

(Eσ(ξ)h)(x, ω) = H˜ (x, ξ(ω)), where H˜ is the normal integrand Z H˜ (x, s) := H(x, p)ν(s, dp).

Proof. Theorem 7.27 gives Z EG[H(x, η)] = H(x, p)ν(ξ, dp) = H˜ (x, ξ) for every x ∈ L∞(σ(ξ)), so it suffices to prove that H˜ is a B(Rd)-normal inte- grand. If |H(x, p) − H(x0, p)| ≤ R(p)|x − x0|, then |H˜ (x, s) − H˜ (x0, s)| ≤ R˜(s)|x − x0|, so H˜ (·, s) is continuous for every s ∈ Rd and, since H˜ is measurable, H˜ is a Caratheodory integrand and hence a normal integrand. Let

Hν (x, p) = inf{H(x0, p) + νR(p)|x − x0|}. x0

By above, each H˜ ν (x, s) := R Hν (x, p)ν(s, dp) is a normal integrand. Since ˜ ˜ ν ˜ H = supν H , H is a normal integrand by Theorem 7.8.

48 Given G0, H ⊂ F, the σ-algebras G0 and G are H-conditionally independent if

H H H E [1A0 1A] = E [1A0 ]E [1A] for every A0 ∈ G0 and A ∈ G such that the conditional expectations exist.

Lemma 7.31. For σ-algebras G0, G and H, the following are equivalent:

1. G0 and G are H-conditionally independent 2. EH[w0w] = EH[w0]EH[w] for every w0 ∈ L1(G0) and w ∈ L∞(G).

3. EG∨H[w0] = EH[w0] for every w0 ∈ L1(G0).

Proof. The first implies the second by two applications of the monotone class theorem. When the second holds, we have, for any w0 ∈ L1(G0), A ∈ G and B ∈ H,

H 0 H 0 0 G∨H 0 E[E [w ]1A∩B] = E[E [w 1A]1B] = E[w 1A1B] = E[E [w ]1A∩B], and, by the monotone class theorem, this extends from sets of the form A ∩ B to any set in G ∨ H. Thus the second implies the third. Assuming the third, we have, for A0 ∈ G0 and A ∈ G,

H H G∨H H G∨H E [1A1A0 ] = E [E [1A1A0 ]] = E [1AE 1A0 ] H H H H = E [1AE 1A0 ] = E [1A]E [1A0 ], so the first holds.

A random variable w is H-conditionally independent of G if σ(w) is so. Likewise, we say that a normal integrand is H-conditionally independent of G if σ(h) is H-conditionally independent of G. Here σ(h) is the smallest σ-algebra under which epi h is measurable. Theorem 7.32. Let h be an L-bounded normal integrand H-conditionally in- dependent of G. Then EG∨Hh = EHh. In particular, if h is independent of G, then EGh is deterministic.

Proof. Assume first that h satisfies the assumptions of Theorem 7.6. Then EGh is characterized by G G d E (h(x)) = (E h)(x) ∀x ∈ R and likewise for EG∨Hh. Thus EG∨Hh = EHh, by Lemma 7.31. The first claim now follows using Lipschitz regularizations as in the proof of Theorem 7.9. The second claim follows by taking H the trivial σ-algebra.

49 7.4 Examples

Example 7.33 (Financial mathematics continued). In the optimal investment problem in Example 2.2, we have

T −1 ! T −1 X X h(x, ω) = V u(ω) − xt · ∆st+1(ω) + δDt (xt, ω). t=0 t=0

Here h is N ⊥-bounded if there exists a P -absolutely continuous martingale measure Q of the price process s such that, for y := dQ/dP , yu ∈ L1 and EV ∗(λiy) < ∞ for two different λi ∈ R. Indeed, Fenchel’s inequality then gives

T −1 i X i ∗ i h(x) ≥ −λ y xt · ∆st+1 − λ yu − V (λ y), t=0

⊥ so h is N -bounded with vt := y∆st+1 and

m = max{λ1yu + V ∗(λ1y), λ2yu + V ∗(λ2y)}.

Exercise 7.4.1. In the setting of Example 7.33, assume that V is not a constant function (i.e., V is a convex nondecreasing non-constant extended real-valued function on the real line) and that there are no portfolio constraints (Dt is the whole space for all t). Show that the linearity condition in Theorem 7.25 is PT −1 equivalent to the no-arbitrage condition: the terminal wealth t=0 xt · ∆st+1 is nonnegative almost surely if and only if it is zero almost surely. Exercise 7.4.2. Assume that V is a finite differentiable convex function on the real line. Show that ( v+x if x ≥ 0, V ∞(x) = v−x if x < 0,

+ 0 − 0 0 where v := supx V (x), v := infx V (x) and V is the derivative of V . T d Exercise 7.4.3 (Variance optimal hedging). Let s = (st)t=0 be an R -valued T 0 (Ft)t=0-adapted stochastic process, u ∈ L (Ω, F,P ; R) and consider the problem of minimizing T −1 !2 X E α + zt · ∆st+1 − u t=0 d over α ∈ R and Ft-measurable R -valued functions zt. Here α is interpreted as an initial value of a self-financing trading strategy where zt is the portfolio of risky assets held over period [t, t + 1]. Show that optimal solutions exists.

The remaining examples are concerned with the dynamic programming princi- ple.

50 Theorem 7.34 (Problems of Bolza). Assume that

T X h(x, ω) = Kt(xt−1, ∆xt, ω) + k(xT , ω) t=0 for some fixed initial state x−1 and Ft-measurable convex normal integrands Kt such that

1. There exists two different values of λ ∈ R and a, b ∈ N 1 with

Kt(xt−1, ∆xt) ≥ λ(at · xt−1 + bt · xt) − mt,

k(xT , ω) ≥ λaT +1 · xT − mT +1

and Etat+1 + bt = 0.

PT ∞ ∞ 2. {x ∈ N | t=0 Kt (xt−1, ∆xt) + k (xT ) ≤ 0} is a linear space.

nt Then the functions Vt : R × Ω → R defined by

VT = ET k, V˜ (x , ω) = inf {K (x , ∆x , ω) + V (x , ω)}, t−1 t−1 n t t−1 t t t (7.4) xt∈R t

Vt−1 = Et−1V˜t−1, are L-bounded convex normal integrands,

nt ∞ ∞ Nt(ω) = {xt ∈ R | Kt (0, xt, ω) + Vt (xt, ω) ≤ 0} are linear-valued,

" t # X inf (SP ) = inf E Ks(xs−1, ∆xs) + Vt(xt) t = 0,...,T, xt∈N t s=0 an x ∈ N is optimal if and only if

xt(ω) ∈ argmin{Kt(xt−1(ω), ∆xt, ω) + Vt(xt, ω)} P -a.s. t = 0,...,T, n xt∈R t and there is an optimal solution x ∈ N with xt ⊥ Nt for every t = 0,...,T .

Proof. Summing up the lower bounds,

T T +1 X X h(x) ≥ λ xt · (at+1 + bt) − mt t=0 t=0 so h is N ⊥-bounded. By Theorems 7.25 and 7.23, it suffices to show that the solution of (BE) satisfies

t t X ht(x , ω) = Ks(xs−1, ∆xs, ω) + Vt(xt, ω) (7.5) s=0

51 for every t = 0,...,T . For t = T , (7.5) is obvious since VT = ET k by definition. PT +1 Assuming that (7.5) holds for t and that Vt(xt) ≥ xt · Etat+1 − Et s=t+1 ms, we get

h˜ (xt−1, ω) = inf h (xt−1, x , ω) t−1 n t t xt∈R t t−1 X = K (x , ∆x , ω) + inf {K (x , ∆x , ω) + V (x , ω)}. s s−1 s n t t−1 t t t xt∈ t s=0 R The lower bounds give

T +1 X V˜t−1(xt−1) ≥ xt−1 · at − mt − Et ms, s=t+1 so V˜t−1 is L-bounded, while, by Theorem 5.21, the linearity of Nt implies that V˜t−1 is a well-defined convex normal integrand. We obtained

t−1 ˜ t−1 X ht−1(x , ω) = Ks(xs−1, ∆xs, ω) + V˜t−1(xt−1, ω). s=0

Since for s = 0, . . . , t − 1, Ks is Ft−1-measurable and L-bounded, Theorem 7.13 gives

t−1 t−1 X ht−1(x , ω) = Ks(xs−1, ∆xs, ω) + Vt−1(xt−1, ω), s=0 so the claim follows by induction on t. Corollary 7.35 (Stochastic control continued). Consider the problem

" T # X minimize E Lt(Xt,Ut) over (X,U) ∈ N , t=0 (SC)

subject to ∆Xt = AtXt−1 + BtUt−1 + ut and assume that

1 1. There exists two different values of λ ∈ R and a, b ∈ N with Etat+1 +bt = T T 0, At bt,Bt bt and bt · ut integrable and

Lt−1(Xt−1,Ut−1) ≥ λ(at · Xt−1 + bt · Xt) − mt,

LT (XT ,UT ) ≥ λaT +1 · XT − mT +1

for all feasible (X,U).

PT ∞ 2. {(X,U) ∈ N | t=0 Lt (Xt,Ut) ≤ 0, ∆Xt = AtXt−1 + BtUt−1} is a linear space.

52 lt lt+mt Then functions Jt : R × Ω → R and It : R × Ω → R defined recursively by

IT +1 := 0 J (X ) := inf (L + E I )(X ,U ), t t m t t t+1 t t Ut∈R t It(Xt−1,Ut−1) := Jt(Xt−1 + AtXt−1 + BtUt−1 + ut), are L-bounded convex normal integrands,

∞ ∞ Nt(ω) := {Ut | Lt (0,Ut, ω) + (EtIt+1)(0,Ut, ω) ≤ 0} are linear-valued, the optimum value of (SC) coincides with that of

" t # X t t t minimize E Ls(Xs,Us) + Jt(Xt) over (X ,U ) ∈ N , s=0

subject to ∆Xs = AsXs−1 + BsUs−1 + us s = 0, . . . , t for all t = 0,...,T , an U ∈ N is optimal if and only if

Ut(ω) ∈ argmin{Lt(Xt,Ut, ω) + (EtIt+1)(Xt,Ut, ω)} m Ut∈R t and there is an optimal solution (X,U) ∈ N with Ut ⊥ Nt for every t = 0,...,T .

Proof. Omitted.

Exercise 7.4.4. Prove Corollary 7.35 in the case when Lt ≥ m for all t for some m ∈ L1.

Theorem 7.36. In the setting of Theorem 7.34, assume that Kt+1 is Kt- conditionally independent of Ft and k is KT -conditionally independent of FT . ˜ σ(Kt) ˜ T Then Vt is σ(Kt+1)-measurable and Vt = E Vt. In particular, if (Kt)t=0 and k are mutually independent, then each Vt is deterministic.

Proof. By Theorem 7.32, VT = ET k is σ(KT )-measurable. If Vt+1 is σ(Kt+1)- σ(Kt) measurable, then V˜t is so as well, and Vt = E V˜t, by Theorem 7.32. Thus the claim follows by induction.

T T A stochastic process (ξt)t=0 is (Ft)t=0-Markov if ξt+1 is ξt-conditionally inde- pendent of Ft. For each t, let νt be the regular ξt-conditional distribution νt of dt dt+1 ˆ n dt dt+1 ξt+1. For a B(R × R )-normal integrand V : R × R × R → R, we define Eξt Vˆ : Rn × Rdt → R by Z ξt (E Vˆ )(x, st) := Vˆ (x, st, st+1)dνt(st, st+1).

ξt This means that (E Vˆ )(x, ξt(ω)) is the σ(ξt)-conditional expectation of

Vˆ (x, ξt(ω), ξt+1(ω)). Thus we get the following consequence of Theorem 7.34.

53 Theorem 7.37 (Problems of Bolza under Markovian dynamics). In the setting of Theorem 7.34, assume that

Kt(xt−1, ∆xt, ω) = Kˆt(xt−1, ∆xt, ξt−1(ω), ξt(ω)) ˆ k(xT , ω) = k(xT , ξT (ω))

T ˆ ˆ for an (Ft)t=0-Markov process ξ and convex normal integrands Kt and k. Then Vt(xt, ω) = Vˆt(xt, ξt(ω)), where Vˆt are given by the recursion ˆ VˇT = k,

Vˇt−1(xt−1, st−1, st) = inf{Kˆt(xt−1, xt, st−1, st) + Vˆt(xt, st)}, xt

ξt−1 Vˆt−1 = E Vˇt−1.

Corollary 7.38 (Stochastic control under Markovian dynamics). In the setting of Corollary 7.35, assume that

Lt(XT ,UT , ω) = Lˆt(XT ,UT , ξt(ω))

T ˆ for an (Ft)t=0-Markov process ξ, normal integrands Lt and Borel measurable Aˆt, Bˆt, uˆt. Then Jt(Xt, ω) = Jˆt(Xt, ξt(ω)), where

Jˆ (X , s ) = inf Lˆ (X ,U , s ), T T T m T T T T UT ∈R T

Iˆt+1(Xt,Ut, st+1) := Jt+1(Xt + At+1(st+1)Xt + Bt+1(st+1)Ut + ut+1(st+1), st+1), Jˆ (X , s ) = inf (Lˆ + Eξt Iˆ )(X ,U , s ). t t t m t t+1 t t t Ut∈R t Example 7.39 (Financial mathematics continued). We can write the optimal investment problem as a stochastic control problem by defining the rate of return i i i process Rt := ∆st/st−1 of the price process s so that the problem becomes

minimize EV (u − XT ) over U ∈ ND˜ , d X j subject to ∆Xt = Rt+1 · Ut, Ut − Xt. j=0

i i i where U is the portfolio process in terms of wealth invested (Ut := stxt) and ˜ i i D(ω) = {U | Ut ∈ Dt(ω)/St(ω)}. We assume that ξ = (S, R) is a Markov pro- cess, the claim is of the form u(ω) = c(ST ) and that each Dt is σ(ξt)-measurable so that D˜(ω) = Dˆ(ξt(ω)) for some measurable closed convex-valued Dˆ. This fits Corollary 7.35 with

d X L (X ,U , ξ ) = δ ( U j − X ) + δ (U ) t t t t {0} t t Dˆ t(ξt) t j=0

LT (XT , ξT ) = V (c(ST ) − XT ).

54 Assuming that there are no portfolio constraints Dt, the recursion in Corol- lary 7.35 becomes

JT (XT , ξT ) = V (c(sT ) − XT ), d (7.6) ξt X j Jt(Xt, ξt) = inf {E Jt+1(Xt + Rt+1 · Ut, st+1)) | Ut = Xt}. U ∈ d+1 t R j=0 Using the constraint to substitute out U 0, we get

d ξt 0 X j 0 j Jt(Xt, ξt) = inf E Jt+1((1 + Rt+1)Xt + (Rt+1 − Rt+1)Ut , st+1). U ∈ d t R j=1

If the returns Rt+1 are independent of (St,Rt), we see, using the fact that st+1 = st(1 + Rt+1), that Jt does not depend on Rt. If in addition the claim c(ST ) = 0 0 and R = 0, we see by induction that Jt are independent of ξt and

JT (XT ) = V (−XT ),

ξt Jt(Xt) = inf {E Jt+1(Xt + Rt+1 · Ut)}. d Ut∈R

In addition to European type claims c(sT ) that depend only on the terminal values sT of the underlyings s, many path-dependent claims can be handled P similarly. For example, an Asian c( t st) can be expressed using a new process Pt At = t0=0 st with dynamics At+1 = At + (1 + Rt+1)st, the running maximum Mt = supt0≤t st0 with Mt+1 = max(Mt, (1 + Rt+1)st). The joint process ξ = (s, A, M, R) will be Markov as soon as R is so. Exercise 7.4.5. In the setting of Example 7.39, assume that there are no port- 0 folio constraints, Rt+1 are independent of (St,Rt), c(ST ) = 0 and that R = 0. For V (u) = exp(u) − 1, show that

JT (XT ) = exp(−XT ) − 1,

ξt Jt(Xt) = exp(−Xt)αt+1 inf {E exp(−Rt+1 · Ut)} − 1 d Ut∈R where αT = 1 and

ξt αt = αt+1 inf {E exp(−Rt+1 · Ut)}. d Ut∈R 1 d Note that here the optimal risky allocation (Ut ,...,Ut ) does not even depend on the current wealth Xt. On the other hand, if we make the change of variables pj = U j/X (proportion of wealth invested in j), (7.6) becomes

JT (XT ) = V (c(sT ) − XT ), d ξt X j Jt(Xt, ξt) = inf {E Wt+1(Xt(1 + Rt+1 · p), st+1)) | p = 1}. p∈ d+1 R j=0

55 For V (u) = − ln(−u), show that

JT (XT ) = − ln(XT ), d ξt X j Jt(Xt) = − ln(Xt) + αt+1 + inf {E [− ln((1 + Rt+1 · p))] | p = 1}, p∈ d+1 R j=0 where αT = 0 and

d ξt X j αt = αt+1 + inf {E [− ln((1 + Rt+1 · p)) | p = 1}. p∈ d+1 R j=0

Likewise, for V (u) = −(−u)q for q ∈ (0, 1), show that

q JT (XT ) = −(XT ) , d q ξt q X j Jt(Xt) = −X αt+1 inf {E [−((1 + Rt+1 · p)) ] | p = 1}, p∈ d+1 R j=0 where αT = 1 and

d ξt q X j αt = αt+1 inf {E [−((1 + Rt+1 · p)) ] | p = 1}. p∈ d+1 R j=0

Note that, for both the logarithmic and the power loss function, the proportions of wealth invested pt do not even depend on the current wealth Xt.

56 8 Conjugate duality

8.1 Convex conjugates

From now on, we assume that U and Y are vector spaces that are in separating duality under the hu, yi. That the bilinear form is separating means that for every u 6= u0, there is y ∈ Y with hu − u0, yi= 6 0. On U the σ(U, Y ) is the weakest locally convex topology under which each u 7→ hu, yi is continuous. That is, σ(U, Y ) is generated by sets of the form {u ∈ U | |hu, yi| < α} where α > 0 and y ∈ Y . Under σ(U, Y ), U is a locally convex topological space. The Mackey topology τ(U, Y ) is the strongest locally convex topology under which each continuous linear functional can be identified with an element of Y . The Mackey topology is generated by sets of the form {u ∈ U | suphu, yi < 1} (8.1) y∈K where K is convex symmetric and σ(Y,U)-compact. Turning the idea around, when U is a locally convex topological vector space, a natural choice for Y is the of continuous linear functionals on U. By Theorem 3.6, the bilinear form is separating. Especially for Banach spaces, σ(U, Y ) is called the weak topology and σ(Y,U) the weak∗-topology, and the Mackey topology τ(U, Y ) coincides with the norm topology. We call both these topologies simply weak topologies, when the spaces in question are clear. When U = Rd, we always choose Y = Rd and the bilinear form as just the usual inner product. Example 8.1. Recall that, for p ∈ [1, ∞), the Lebesque space Lp := Lp(Ω, F,P ) is a Banach space under the norm kuk := (E|up|)1/p.

0 For p > 1, its continuous dual can be identified with Lp for p0 satisfying 1/p + 1/p0 = 1. For p = 1, we set p0 = ∞, and the continuous dual of L1 is the space L∞ = L∞(Ω, F,P ) of essentially bounded random variables. For all p = [1, ∞), 0 the bilinear form between Lp and Lp is given by hq, ci := E[qc]. Note, however, than when L∞ is equipped with the essential supremum norm, its continuous dual is not L1 (but the space of finitely additive measures on (Ω, F)). In particular, τ(L∞,L1) is a weaker topology than the one generated by the essential supremum norm.

57 Given an extended real-valued function g on U, its conjugate g∗ : Y → R is g∗(y) := sup{hu, yi − g(u)}. u∈U

The function g∗ is also known as Legendre-Fenchel transform, polar function, or of g. Since g∗ is a supremum of lower semicontinuous functions, g∗ is a lower semicontinuous function on Y . The Fenchel inequality

g(u) + g∗(y) ≥ hu, yi follows directly from the definition of the convex conjugate. In the exercises, we will familiarize ourselves with this transformation by calculating conjugates of convex functions defined on Rd. The biconjugate of g is the function

g∗∗(c) = sup{hu, yi − g∗(y)}. y

By the Fenchel inequality, we always have

g ≥ g∗∗.

The following biconjugate theorem is the fundamental theorem on convex con- jugates. The lower semicontinuous hull lsc g is a function defined via

epi(lsc g) := cl epi g.

Theorem 8.2 (Biconjugate theorem). Assume that g is a convex extended real- valued function on U such that (lsc g)(u) > −∞ for all u. Then lsc g = g∗∗, i.e., ∗ (lsc g)(u) = supy∈Y {hu, yi − g (y)}. In particular, if g is a lsc proper convex function, then g = g∗∗, i.e., g has the dual representation g)(u) = sup{hu, yi − g∗(y)}. y∈Y

Proof. Recall that the closure of convex set in a locally convex space is the intersection of all closed half spaces containing the set. Applying this to the epigraph of lsc g, we get that lsc g is the supremum of all affine continuous functions less than g, i.e.,

(lsc g)(u) = sup {hu, yi − α | hu, yi − α ≤ g(u)}. y∈Y,α∈R

Let y ∈ Y be fixed. Then u 7→ hu, yi − α is smaller than g if hu, yi − α ≤ g(u) for all u, i.e. α ≥ sup{hu, yi − g(u)} = g∗(y). u

58 We got that, if g∗(y) < ∞, then the largest affine function less than g with the slope y is u → hu, yi−g∗(y), whereas if g∗(y) = ∞, then there are no such affine functions. Since this is valid for any y, we get that u 7→ sup{hu, yi − g∗(y)} is the supremum of all affine functions less than g. Combining this with the first paragraph of the proof, we get the result. Exercise 8.1.1. A proper convex function is σ(U, Y )-lsc if and only if it is τ(U, Y )-lsc.

Given a set C ⊂ U, the function

σC (y) = suphu, yi u∈C is known as the support function of C,

jC (u) := inf {λ | u ∈ λC} λ>0 as the gauge of C, and the set ◦ C := {y | σC (y) ≤ 1} as the polar of C. Note that σC is the conjugate of the indicator function ( 0 if u ∈ C, δC (u) = +∞ otherwise.

In the following theorem, cl C = C◦◦ is known as the . ∗ Theorem 8.3. If a convex set C contains the origin, then σC = jC◦ , jC = δC◦ and cl C = C◦◦.

Proof. Exercise. Theorem 8.4. The set C ⊂ U is a Mackey neighborhood of the origin if and only if {y ∈ Y | σC (y) ≤ α} is weakly compact for some α > 0. In this case, {y ∈ Y | σC (y) ≤ α} is weakly compact for all α. In particular, C is a Mackey neighborhood of the origin if and only if C◦ is weakly compact.

Proof. Let C be a Mackey neighborhood of the origin. By (8.1), K◦ ⊂ C for some convex weakly compact K for which ◦◦ {y ∈ Y | σC (y) ≤ α} ⊂ {y ∈ Y | σK◦ (y) ≤ α} = αK = αK. Thus the closed set on the left side belongs to a weakly compact set and is thus compact for all α > 0. ◦ To prove the converse, fix α > 0 with αC = {y ∈ Y | σC (y) ≤ α} weakly compact. The convex K := co(αC◦ ∪(−αC◦)) is weakly compact as well (an exercise). By (8.1), K◦ is a Mackey neighborhood of the origin. Since C◦ ⊂ K/α, we have αK◦ ⊂ C◦◦ = cl C, so C is a neighborhood of the origin.

59 Theorem 8.5. Let g be a proper convex lower semicontinuous function on U. The following are equivalent:

1. g is bounded from above on a τ(U, Y )- neighborhood of u¯,

2. for every α ∈ R, {y ∈ Y | g∗(y) − hu,¯ yi ≤ α} is σ(Y,U)-compact.

Here 1 implies 2 even when g is not lsc.

Proof. By translations, we may assume thatu ¯ = 0 and g(0) = 0. To prove that 1. implies 2., note that we have (see the exercises)

∗ g(u) ≤ γ + δO(u) ∀ u ⇐⇒ g (y) ≥ σO(y) − γ ∀ y, so, for any α ∈ R, ∗ {y ∈ Y | g (y) ≤ α} ⊂ {y ∈ Y | σO(y) ≤ α + γ}, where the set on the right side is weakly compact when O is a Mackey neigh- borhood of the origin. ∗ ∗ That 2. implies 1., we may again do translations so that g (0) = infy g (y) = 0; the details are left as an exercise. Let γ > 0 and denote K := {y ∈ Y | g∗(y) ≤ γ}. If y∈ / K, we have

jK (y) := inf {λ | y ∈ λK} λ>0 = inf {λ | y ∈ λK} λ>1 = inf {λ | g∗(y/λ) ≤ γ} λ>1 ≤ inf {λ | g∗(y)/λ ≤ γ} λ>1 = g∗(y)/γ.

∗ If y ∈ K, we have jK (y) ≤ 1, so putting these together we get that g (y) ≥ γjK (y) − γ. Conjugating, we get

g(u) ≤ δK◦ (u/γ) + γ. Therefore, g is bounded above in a neighborhood of the origin, since K◦ is the polar of a weakly compact set.

8.1.1 Exercises

Exercise 8.1.2. Let f be a convex lower semicontinous function on the real line. Convince yourself that, given a ”slope” v, f ∗(v) is the smallest constant α such that the affine function x → vx−α is majorized by f. What does this mean geometrically?

60 Exercise 8.1.3. Calculate the conjugates of the following functions on the real line:

1. f(x) = |x|

2. f(x) = δB(x), where B = {x | |x| ≤ 1}. 1 p 3. f(x) = p |x| , for p > 1. 4. V (x) = (eax − 1)/a. Exercise 8.1.4. Let V be a nondecreasing convex function on the real line. Analyze V ∗ using the geometric idea from the first exercise.

1. Is V ∗ positive? 2. Is V ∗ zero somewhere?

3. Is V ∗ monotone? 4. Where is V ∗ finite? 5. Is V ∗ necessarily finite at the origin?

Hint: The answers depends on your choice of V .

8.1.2 Subgradients

A vector y ∈ Y is a subgradient of g at u ∈ dom g if

g(u0) ≥ g(u) + hu0 − u, yi ∀u0 ∈ U.

The subdifferential ∂g(u) is the set of all subgradients of g at u. Note that we avoided defining subgradients outside the domain. Exercise 8.1.5. We have y ∈ ∂g(u) if and only if

g(u) + g∗(y) = hu, yi.

We say that g is subdifferentiable at u if ∂g(u) 6= ∅. Exercise 8.1.6. Assume that g is a differentiable convex function on the real line. Show that ∂g(u) = {g0(u)}, the derivative of g at u. Give an example of a proper lsc convex extended real-valued function on the real line that is not subdifferentiable at a point in its domain. Theorem 8.6. Assume that g is proper and bounded from above in a neigh- borhood of u. Then ∂g(u) is nonempty and weakly compact, and the directional derivative g0(u, ·) is the support function of ∂g(u).

61 Proof. Exercise. Hint: show first that g0(u, ·) is bounded above in a neighbor- hood of the origin.

∗ ∞ Theorem 8.7. For a proper convex g, we have (g ) = σdom g. If g is also lsc, ∞ then g = σdom g∗ .

Proof. Exercise.

8.2 Conjugate duality in optimization

Now we turn our attention to convex optimization problems. Given a vector space X and a locally convec topological vector space U, assume that F is a jointly convex extended real-valued function on X × U such that F (x, ·) is lsc for all x. The value function

ϕ(u) := inf F (x, u) x∈ X gives the optimal value of the convex minimization problem

minimize F (x, u) over x ∈ X (Pu) as a function of u. The value function is always convex, so, when it is proper and lower semicontinuous, the biconjugate theorem gives

inf F (x, u) = ϕ(u) = sup{hu, yi − ϕ∗(y)}. x∈X

The optimization problem (Pu) is called the primal problem and

∗ maximize hu, yi − ϕ (y) over y ∈ Y (Du) is called the dual problem. When ϕ is proper and lower semicontinuous, their optimal values coincide, and one says that there is no duality gap, or there is strong duality between (Pu) and (Du). Even when ϕ is not lsc, the inequality ϕ ≥ ϕ∗∗ means that, at least, the dual problem gives lower bounds to the optimal value of the primal problem. This is sometimes called as weak duality between (Pu) and (Du). A simple condition for the absence of duality gap and the existence of dual solutions is given by continuity of ϕ. Theorem 8.8. Assume that ϕ is proper and bounded from above in a neighbor- hood of u. Then the optimal values of (Pu) and (Du) coincide, and (Du) has a solution.

Proof. Exercise.

When also X is a LCTVS and F is jointly lsc, we have the following symmetry between the primal and the dual formulations.

62 Theorem 8.9. Assume that F : X × U → R is proper lsc convex function on LCTVSs X and U and let V be the dual space of X. The conjugate of x 7→ F (x, u) is (lsc γu), where

∗ γu(v) := inf{F (v, y) − hu, yi}. y

The function ϕ is lsc at u if and only if γu is lsc at the origin.

Proof. Exercise.

Many fundamental theorems in duality theory correspond to minimax theorems of convex-concave functions. The function L : X × Y → R defined by L(x, y) := inf {F (x, u) − hu, yi} u∈U is called the Lagrangian associated with F . It is a convex-concave function in the sense that L(·, y) is convex for all y ∈ Y and L(x, ·) is concave for all x ∈ X. We have inf L(x, y) = −ϕ∗(y) x∈X and sup{L(x, y) + hu, yi} = F (x, u). y∈Y

Denoting Lu(x, y) := hu, yi + L(x, y) we have the elementary inequalities

∗ F (x, u) ≥ Lu(x, y) ≥ hu, yi − ϕ (y) and thus

∗ inf sup Lu(x, y) = F (x, u) ≥ sup{hu, yi − ϕ (y)} = sup inf Lu(x, y). x y y y x

If we have an equality above, the common value is called a saddle-value of Lu. A pair (x, y) ∈ X × Y is called a saddle-point of Lu if

0 0 0 0 Lu(x, y ) ≤ Lu(x, y) ≤ Lu(x , y) ∀(x , y ) ∈ X × Y.

This means that the pair (x, y) satisfies the Karush-Kuhn-Tucker (KKT) con- dition 0 ∈ ∂xL(x, y) u ∈ ∂y(−L)(x, y).

Theorem 8.10. Assume that F : X × U → R is jointly convex with F (x, ·) lsc for every x ∈ X. The following are equivalent:

1. inf(Pu) = sup(Du) 2. ϕ is lsc at u.

3. The saddle-value of Lu exists.

63 Furthermore, the following are equivalent:

4 x solves (Pu), y solves (Du) and inf(Pu) = sup(Du). 5 The pair (x, y) satisfies the KKT-condition.

Proof. Exercise.

Exercise 8.2.1. Let X = U = R. Find a lsc convex function F : X × U such that the primal and the dual problem have solutions but there is a duality gap.

8.3 Dual spaces of random variables

We assume that the space U is decomposable in the sense that 0 1Au + 1Ω\Au ∈ U whenever A ∈ F, u ∈ U and u0 ∈ L∞(Ω, F,P ; Rm). Note that decomposability of the spaces U and Y implies that the separation property holds automatically for the bilinear form hu, yi := E[u · y] Moreover, we have the following relations for relative topologies. Lemma 8.11. If U and Y are decomposable, then L∞ ⊆ U ⊆ L1 and 1 ∞ ∞ 1 σ(L ,L )|U ⊆ σ(U, Y), σ(U, Y)|L∞ ⊆ σ(L ,L ), 1 ∞ ∞ 1 τ(L ,L )|U ⊆ τ(U, Y), τ(U, Y)|L∞ ⊆ τ(L ,L ).

Proof. Exercise.

The following is a corollary of Theorem 6.2. Corollary 8.12 (Jensen’s inequality). Let h be a G-measurable convex normal integrand such that Eh∗ is proper on EGY and that EGU ⊂ U and EGY ⊂ Y. Then Eh(EGu) ≤ Eh(u) for every u ∈ U.

Proof. Applying Theorem 6.2 twice, we get Eh(EGu) = sup E[y · EGu − h∗(y)] Y ∈Y(G) = sup E[y · u − h∗(y)] Y ∈Y(G) ≤ sup E[y · u − h∗(y)] Y ∈Y = Eh(u).

64 Theorem 8.13. Let U and Y be decomposable and h be a convex normal in- tegrand. Then Eh : U → R and Eh∗ : Y → R are conjugates of each other as soon as they are proper. Moreover, y ∈ ∂Eh(u) if and only if y ∈ ∂h(u) almost surely.

Proof. Exercise.

8.4 Conjugate duality in stochastic optimization

We will study (SP ) within the conjugate duality framework by introducing an additional parameter z ∈ L∞ := L∞(Ω, F,P, Rn) and the extended real-valued convex function F on L∞ × L∞ × U defined by

F (x, z, u) = Ef(x, u) + δN (x − z). The space L∞ is in separating duality with L1 := L1(Ω, F,P ; Rn) under the bilinear form hz, pi := E[z · p].

We denote the associated optimum value function by ϕ(z, u) := inf {Ef(x, u) | x − z ∈ N }. x∈L∞ The function l(x, y, ω) := inf {f(x, u, ω) − u · y} m u∈R is the ”scenariowise Lagrangian” associated with f. Theorem 8.14. If Ef is proper on L∞ × U, then  +∞ if x∈ / dom F,  1 ⊥ L(x, p, y) = E[l(x, y) − x · p] if x ∈ dom1 F and p ∈ N , −∞ otherwise, ∗ ∗ F (v, p, y) = Ef (v + p, y) + δN ⊥ (p), ∗ ∗ ϕ (p, y) = Ef (p, y) + δN ⊥ (p). The KKT conditions mean that x ∈ N ∞, p ∈ N ⊥ and

p ∈ ∂xl(x, y) and u ∈ ∂y(−l)(x, y) P -almost surely.

Proof. By definition, L(x, p, y) = inf {F (x, z, u) − hz, pi − hu, yi} (z,u)∈L∞×U = inf {E[f(x, u) − z · p − u · y] | x − z ∈ N } (z,u)∈L∞×U = inf {E[f(x, u) − (x − z0) · p − u · y] | z0 ∈ N }, (z0,u)∈L∞×U

65 so the expression for L follows from Theorem ??. When Ef is proper on L∞ ×U, the interchange rule gives

F ∗(v, p, y) = sup {hx, vi − L(x, p, y)} x∈N ∞ = sup {hx, vi − E[f(x, u) − (x − z0) · p − u · y] | z0 ∈ N } x∈L∞,z0∈ L∞,u∈U = sup {E[x · (v + p) + u · y − f(x, u) − z0 · p] | z0 ∈ N } x∈L∞,z0∈ L∞,u∈U ∗ = Ef (v + p, y) + δN ⊥ (p),

The KKT condition (0, p, y) ∈ ∂F (x, 0, u) means that F (x, 0, u) + F ∗(0, p, y) = hu, yi. Using the expressions for F and F ∗, this can be written as x ∈ N ∞, p ∈ N ⊥ and

Ef(x, u) + Ef ∗(p, y) = E[x · p] + E[u · y].

Since f(x, u, ω) + f ∗(p, y, ω) ≥ x · p + u · y for all x, p ∈ Rn and u, y ∈ Rm, this is equivalent to the given subdifferential condition.

The dual of (SP ) can be written as

∗ ⊥ minimize E[f (p, y) − u · y] over (p, y) ∈ N × Y. (Du)

Remark 8.15. In the deterministic setting, N ⊥ = {0} so the dual problem becomes ∗ m minimize f (0, y) − u · y over y ∈ R (Du) and the optimality conditions in Theorem 8.14 become

0 ∈ ∂xl(x, y),

u ∈ ∂y(−l)(x, y) which are the classical KKT conditions.

Given a convex function H on L∞, we say that p ∈ L1 is a shadow price of information (spi) for H if p is a subgradient at the origin of the function

φ(z) := inf {H(x) + δN (x − z)} x∈L∞ on L∞. In particular, if (p, y) ∈ ∂ϕ(0, u), then p is an spi of x 7→ Ef(x, u).

Lemma 8.16. A p ∈ L1 is an spi for H : L∞ → R if and only if p ∈ N ⊥ and inf H(x) = inf {H(x) − hx, pi}. x∈N ∞ x∈L∞

66 Proof. The condition p ∈ ∂φ(0) can be written as H(x0 + z) ≥ inf H(x) + hp, zi ∀z ∈ L∞, x0 ∈ N ∞ x∈N ∞ or, equivalently, H(z0) − hp, z0 − x0i ≥ inf H(x) ∀z0 ∈ L∞, x0 ∈ N ∞, x∈N ∞ which proves the claim.

Let

L˜0(x, y) := inf{Ef(x, u) − hu, yi} u ( +∞ if x∈ / dom Ef, = 1 E[l(x, y)] otherwise. Theorem 8.17. The following are equivalent

1. (p, y) ∈ ∂ϕ(0, u),

2. y ∈ ∂uϕ(0, u) and p is an spi of L˜0(·, y),

3. p ∈ ∂zϕ(0, u) and y ∈ Y satisfies y ∈ ∂kp(u) almost surely, where

kp(u, ω) = inf {f(x, u, ω) − x · p(ω)}. n x∈R

Proof. Exercise. Hint: Check the KKT conditions of the general setting for the function F (z, u) = ϕ(z, u).

8.5 Relaxation

In order to obtain existence of solutions and the absence of duality gap, we will extend the the space of strategies to the linear space N of all (not necessarily bounded) adapted strategies. The objective in the larger space is defined as the lower semicontinuous hull F¯ of F on L0 ×L∞ ×U where L0 is endowed with the topology of convergence in measure. We denote the associated value function by ϕ¯(z, u) = inf F¯(x, z, u). x∈L0 Taking the lsc hull does not affect the infimum of a function, so ϕ¯∗(p, y) = sup{hz, pi + hu, yi − ϕ¯(z, u)} z,u = sup {hz, pi + hu, yi − F¯(x, z, u)} x,z,u = sup {hz, pi + hu, yi − F (x, z, u)} x,z,u = ϕ∗(p, y).

67 The following theorem gives a simple expression for F¯. More importantly, the theorem gives that ϕ¯(0, u) = inf Ef(x, u) x∈N which is precisely (SP ). Theorem 8.18. Assume that F is proper. We have

F¯(x, z, u) = Ef(x, u) + δN (x − z) as soon as the right side is lsc and proper on L0 × L∞ × U. In particular, the optimum value of (SP ) equals ϕ¯(0, u).

Proof. By assumption,

Ef(x, u) + δN (x − z) ≤ F¯(x, z, u) for all (x, z, u) ∈ L0 ×L∞ ×U. To prove the converse, let (¯x, z, u) ∈ L0 ×L∞ ×U be such that the left side is finite. Define Q by dQ/dP = φ, where φ := (1 + |x¯|2)−1. We havex ¯ ∈ X , where X := L2(Ω, F,Q; Rn) is paired with V = X . We have F¯(¯x, z, u) ≤ F˜(¯x, z, u), where F˜ denotes the closure of F with respect to the locally convex space X × L∞ × U (the topology of which is stronger than that of L0 × L∞ × U). By the biconjugate theorem, it thus suffices to show that

∗∗ F (¯x, z, u) ≤ Ef(¯x, u) + δN (¯x − z), where the conjugates are defined with respect to the pairing of X × L∞ × U with V × L1 × Y. Clearly, it suffices to consider the casex ¯ − z ∈ N . For any (v, p, y) ∈ V × L1 × Y,

F ∗(v, p, y) = sup {hx, vi + hz, pi + hu, yi − Ef(x, u) | x − z ∈ N } (x,z,u)∈L∞×L∞×U = sup {E [x · (φv) + z · p + u · y − f(x, u)] | x − z ∈ N } (x,z,u)∈L∞×L∞×U = sup {E [x · (φv) + (x − z0) · p + u · y − f(x, u)] | z0 ∈ N } (x,z0,u)∈L∞×L∞×U ∗ = Ef (φv + p, y) + δN ⊥ (p), where the last equality follows from the interchange rule. Hence,

F ∗∗(¯x, z, u) = sup {hx,¯ vi + hz, pi + hu, yi − Ef ∗(φv + p, y)} (v,p,y)∈V×N ⊥×Y = sup {E[¯x · (φv) + z · p + u · y − f ∗(φv + p, y)]} (v,p,y)∈V×N ⊥×Y = sup {E[¯x · (φv + p) + (z − x¯) · p + u · y − f ∗(φv + p, y)]}. (v,p,y)∈V×N ⊥×Y

68 By Fenchel’s inequality, f(¯x, u) + f ∗(φv + p, y) ≥ x¯ · (φv + p) + u · y, so [¯x · p]+ ∈ L1 and [(z − x¯) · p]+ ∈ L1. Thus, by Lemma 6.5, E[(z − x¯) · p] = 0 and then, F ∗∗(¯x, z, u) = sup {E[¯x · (φv + p) + u · y − f ∗(φv + p, y)]} (v,p,y)∈V×N ⊥×Y ≤ sup E sup[¯x · (φv + p) + u · y − f ∗(φv + p, y)] p∈N ⊥ v,y = sup Ef(¯x, u), p∈N ⊥ which completes the proof. Theorem 8.19. Let h be a normal integrand such that there exists a v ∈ N ⊥, normal integrand g and two different values of λ ∈ R such that f(x, u, ω) ≥ λx · v(ω) − g(u, ω) and Eg is continuous on U. Then

(x, z, y) 7→ Ef(x, u) + δN (x − z) is lsc on N × L∞ × U.

Proof. Let k(x, u, ω) := f(x, u, ω) − λx · v(ω) + g(u, ω) and K(x, z, u) = Ek(x, u) + δN (x − z). By Theorem 6.4, K is lsc on L0 × L∞ × U. If (x, z, u) ∈ dom F , we have E[x · v] = E[z · v], by Lemma 6.5, so K(x, z, u) = F (x, z, u) − λhz, vi + Eg(u). (8.2) On the other hand, by assumption, there exists λ0 6= λ such that f(x, u, ω) ≥ λ0x · v(ω) + g(u, ω) so k(x, u, ω) ≥ (λ0 − λ)x · v(ω) + g(u, ω) By Lemma 6.5 again, (8.2) holds on dom K. Remark 8.20. The assumption of Theorem ?? means that there exists a p ∈ N ⊥ such that inf Ef ∗(λp, y) < ∞ y∈Y for two different values of λ ∈ R. This holds, in particular, if f is bounded from below. In the context of portfolio optimization, it holds when the utility function satisfies the so called “reasonable asymptotic elasticity”.

69 Example 8.21. This is example shows that

F¯(x, z, u) < Ef(x + z, u) + δN (x − z)

2 when the right side is not lsc. We continue Example ?? and let α ∈ L (F0) and v ∈ N ⊥ be such that Eαv = ∞ and consider 1 f(x, u, ω) := |x − α(ω)|2 + x v(ω). 2 0 0 Here 1 1 F (x, z, u) = E[ |x −α|2 +x v]+δ (x−z) = E[ |x −α|2]+hz , vi+δ (x−z) 2 0 0 N 2 0 0 N so 1 F¯(x, z, u) = E[ |x − α|2] + hz , vi + δ (x − z). 2 0 0 N Now F¯(α, 0, 0) = 0 while Ef(α, 0) = ∞. Example 8.22. There exists f such that F (x, u) := Ef(x, u) is lsc on N × U and proper on N ∞ × U, but

cl(F + δN ∞×U ) 6= F.

That is, choose T = 1, n0 = n1 = 1, trivial F0, and ( 0 if x = x ξ(ω), f(x, u, ω) := 1 0 +∞ otherwise.

∞ where ξ∈ / L . Then cl(F + δN ∞×U ) = δ{0}×U which does not coincide with F .

We define the Lagrangian associated with the relaxed problem by

L¯(x, p, y) := inf{F¯(x, z, u) − hz, pi − hu, yi}.

Note that the interchange rule gives the expression

L¯(x, p, y) = E[x · p − f ∗(p, y)].

Theorem 8.23. Assuming that

F¯(x, z, u) = Ef(x, u) + δN (x − z) and that the primal and the dual are feasible, the following are equivalent

(a) x solves (SP ), (p, y) solves the (Du) and there is no duality gap, (b) x and (p, y) is a saddle-point of L¯, (c) x is primal feasible, (p, y) is dual feasible and

(p, y) ∈ ∂f(x, u) P -a.s.

70 (d) x is primal feasible, (p, y) is dual feasible and

p ∈ ∂xl(x, y), u ∈ ∂y[−l](x, y) P -a.s.

Proof. Let x and (v, y) be feasible. By Fenchel’s inequality,

f(x, u) + f ∗(v, y) − u · y ≥ x · v P -a.s. so Ef(x, u) + E[f ∗(p, y) − u · y] ≥ E[x · p] and the latter holds as an equality if and only if the former does which, in turn, is equivalent to the subdifferential condition. It now suffices to note that E[x · p] = 0, by Lemma 6.5. Theorem 8.24. Assume that F is proper and that

F¯(x, z, u) = Ef(x, u) + δN (x − z) is lsc and proper. The following are equivalent.

1. (p, y) ∈ ∂ϕ¯(0, u).

2. y ∈ ∂uϕ¯(0, u) and

inf L˜0(x, y) = inf {L˜0(x, y) − E[x · p]}. x∈N x∈L0

3. p ∈ ∂zϕ¯(0, u) and y ∈ Y satisfies y ∈ ∂φ(u) almost surely, where

φ(u, ω) = inf{f(x, u, ω) − x · p(ω)}. x

Proof. Omitted.

1 2 Example 8.25. Let f(x, u, ω) = 2 (x0 − 1) + δ{0}(x0ξ(ω) − x1), F0 trivial, 0 ∞ ξ ∈ L (F1) with ξ∈ / L . We have 1 f ∗(v, y, ω) = δ (y) + v + v ξ(ω) + (v + v ξ(ω))2. {0} 0 1 2 0 1 Clearly Ef ∗(p, y) ≥ 0 for all (p, y) ∈ N ⊥ × Y, so the dual optimum is attained at the origin. Clearly ϕ¯ is lsc, so (0, 0) ∈ ∂ϕ¯(0, 0). Here L˜0(x, 0) = Ef(x, 0), ∞ so x ∈ N in dom1 L˜0 if and only if x = 0. Thus

inf L˜0(x, 0) = 1 x∈N ∞ while, choosing x = (1, ξ), we see that

inf L˜0(x, 0) = 0. x∈L∞

71 9 Examples

Exercise 9.0.1 (Mathematical programming). Consider the mathematical pro- gramming problem

minimize Ef0(x) over x ∈ N

subject to fj(x) ≤ uj P -a.s., j = 1, . . . , m, where fj are convex normal integrands. Embed the problem in to our duality framework, show that the corresponding f is a convex normal integrand, write down l and the optimality conditions for u = 0. You may assume that f satisfies all the conditions of the main theorems. Exercise 9.0.2 (Linear stochastic programming). Consider Exercise 9.0.1 in the case where f0(x, ω) = c(ω)·x and fj(x, ω) = aj(ω)·x−bj(ω) for j = 1, . . . , m. The problem can then be written in the matrix notation as minimize E[x · c] over x ∈ N subject to Ax ≤ b P -a.s. Assume that c is adapted and the rows of A∗ define adapted processes. Write down l, the dual problem and the optimality conditions. For every y, minimize over p in the dual problem. Exercise 9.0.3 (Problems of Bolza). Consider the problem of Bolza

T X minimize E[ Kt(xt−1, ∆xt + ut) + k(xT )], (9.1) x∈N t=0 where Kt and k are convex normal integrands. Embed the problem into the duality framework and write down f, l and f ∗. Write down l in terms of the associated Hamiltonian

H_t(x_{t−1}, y_t, ω) := inf_{u_t∈R^d} {K_t(x_{t−1}, u_t, ω) − u_t · y_t}.

When K_t(x_{t−1}, u) = K_t^1(x_{t−1}) + K_t^2(u) for F_{t−1}-adapted K_t^1 and F_t-adapted K_t^2, use Jensen's inequality to show that, for u = 0, the dual problem reduces to

inf_{p,y} ϕ^*(p, y) = inf_{y adapted} E[Σ_{t=0}^T K_t^*(E_{t−1}Δy_t, y_t) + k^*(−y_T)].

You may assume that the K_t and k are bounded from below; a sketch of the Jensen step follows.
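On the role of Jensen's inequality in the reduction above (a heuristic sketch; the precise argument is part of the exercise): under the separable form, K_t^*(v, w) = (K_t^1)^*(v) + (K_t^2)^*(w), and (K_t^1)^* is an F_{t−1}-measurable convex normal integrand, so the conditional Jensen inequality from Section 7.1 suggests

E[(K_t^1)^*(v_t)] ≥ E[(K_t^1)^*(E_{t−1}v_t)],

i.e. conditioning the first argument can only improve the dual objective; minimizing over p ∈ N^⊥ then leaves the terms E_{t−1}Δy_t.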

Example 9.1 (Financial mathematics, continued). The optimal investment problem (2.1) fits the duality framework with

f(x, u) = V(u − Σ_{t=0}^{T−1} x_t · Δs_{t+1}) + Σ_{t=0}^{T−1} δ_{D_t(ω)}(x_t),

so

l(x, y) = −y Σ_{t=0}^{T−1} x_t · Δs_{t+1} − V^*(y) + Σ_{t=0}^{T−1} δ_{D_t(ω)}(x_t),

l̄ = l, and

f^*(v, y) = V^*(y) + Σ_{t=0}^{T−1} σ_{D_t(ω)}(v_t + yΔs_{t+1}).

Thus the dual problem becomes

" T −1 # ∗ X ⊥ minimize E V (y) + σDt(ω)(pt + y∆st+1) − uy over p ∈ N , y ∈ Y t=0 while the optimality conditions are p ∈ N ⊥ and

u − Σ_{t=0}^{T−1} x_t · Δs_{t+1} ∈ ∂V^*(y),

p_t + yΔs_{t+1} ∈ N_{D_t}(x_t), t = 0, . . . , T.

Since D_t is adapted, Corollary 8.12 gives that, given y, the optimal p satisfies p_t = E_t[yΔs_{t+1}] − yΔs_{t+1}, so the dual reduces to

" T −1 # ∗ X minimize E V (y) + σDt (Et[y∆st+1]) − uy over y ∈ Y. t=0 If the dual problem has a solution, then x is optimal if and only if it is feasible and there is y such that

u − Σ_{t=0}^{T−1} x_t · Δs_{t+1} ∈ ∂V^*(y),

E_t[yΔs_{t+1}] ∈ N_{D_t}(x_t), t = 0, . . . , T.
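To make the reduced dual concrete (a sketch with data chosen here for illustration: exponential loss and unconstrained trading), take V(c) = e^c − 1 and D_t ≡ R^J. Then V^*(y) = y log y − y + 1 for y ≥ 0 (and +∞ for y < 0), while σ_{R^J}(w) = δ_{{0}}(w), so the reduced dual becomes

minimize E[y log y − y + 1 − uy] over y ∈ Y with E_t[yΔs_{t+1}] = 0, t = 0, . . . , T − 1,

a relative-entropy-type minimization over nonnegative multiples of martingale densities.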

Example 9.2 (Stochastic control, continued). The problem

" T # X minimize E Lt(Xt,Ut) over X,U ∈ N , t=0

subject to ΔX_t = A_tX_{t−1} + B_tU_{t−1} + u_t, t = 1, . . . , T

fits the general framework (equality constraints) with x = (X,U), u = (u1, . . . , uT ),

f(x, u) = Σ_{t=0}^T L_t(x_t) + Σ_{t=1}^T δ_{{0}}(πΔx_t − Ā_tx_{t−1} − u_t),

where π = [I 0] and Ā_t = [A_t B_t]. Denoting y_0 = y_{T+1} = 0 and Ā_{T+1} = 0, we get

l(x, y) = Σ_{t=0}^T L_t(x_t) − Σ_{t=1}^T (πΔx_t − Ā_tx_{t−1}) · y_t
        = Σ_{t=0}^T [L_t(x_t) + x_t · (π^*Δy_{t+1} + Ā^*_{t+1}y_{t+1})],

f^*(v, y) = sup_x Σ_{t=0}^T {x_t · (v_t − π^*Δy_{t+1} − Ā^*_{t+1}y_{t+1}) − L_t(x_t)}
         = Σ_{t=0}^T L_t^*(v_t − π^*Δy_{t+1} − Ā^*_{t+1}y_{t+1}).
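For instance (a purely illustrative quadratic case), if L_t(x_t) = (1/2)|x_t|^2, then L_t^*(v) = (1/2)|v|^2, so

f^*(v, y) = Σ_{t=0}^T (1/2)|v_t − π^*Δy_{t+1} − Ā^*_{t+1}y_{t+1}|^2

and the dual becomes an explicit quadratic problem in (p, y), as in linear-quadratic control.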

Optimality conditions become p ∈ N ⊥, and, for all t,

p_t − (Δy_{t+1} + A^*_{t+1}y_{t+1}, B^*_{t+1}y_{t+1}) ∈ ∂L_t(X_t, U_t),

∆Xt = AtXt−1 + BtUt−1 + ut.

When each Lt and ut is adapted, Corollary 8.12 gives

inf_{p∈N^⊥} Ef^*(p, y) = E[Σ_{t=0}^T L_t^*(−E_t[π^*Δy_{t+1} + Ā^*_{t+1}y_{t+1}])].

Moreover, if A and B are adapted and the dual optimum is attained, then x is optimal if and only if it is feasible and there is an adapted y (adapted by Jensen) such that

−E_t[(Δy_{t+1} + A^*_{t+1}y_{t+1}, B^*_{t+1}y_{t+1})] ∈ ∂L_t(X_t, U_t),

∆Xt = AtXt−1 + BtUt−1 + ut.

Note that the first condition can be written as (an exercise)

−(E_tΔy_{t+1}, 0) ∈ ∂_{(X_t,U_t)}H_t(X_t, U_t, y_{t+1}),

where H_t is the "Hamiltonian" defined by

H_t(X_t, U_t, y_{t+1}) := L_t(X_t, U_t) + E_t[y_{t+1} · (A_{t+1}X_t + B_{t+1}U_t)].
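A sketch of that exercise: the conditional expectation term in H_t is linear in (X_t, U_t), so

∂_{(X_t,U_t)}H_t(X_t, U_t, y_{t+1}) = ∂L_t(X_t, U_t) + (E_t[A^*_{t+1}y_{t+1}], E_t[B^*_{t+1}y_{t+1}]),

and −(E_tΔy_{t+1}, 0) ∈ ∂_{(X_t,U_t)}H_t is therefore equivalent to −E_t[(Δy_{t+1} + A^*_{t+1}y_{t+1}, B^*_{t+1}y_{t+1})] ∈ ∂L_t(X_t, U_t).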

If L_t is of the form L_t(X_t, U_t) = L_t^0(X_t, U_t) + L_t^X(X_t) + L_t^U(U_t) with a differentiable L_t^0, all F_t-measurable, then the optimality conditions can be written as

Ut ∈ argmin Ht(Xt, ·, yt+1),

−Et∆yt+1 ∈ ∂X Ht(Xt,Ut, yt+1),

∆Xt = AtXt−1 + BtUt−1 + ut.

This can be viewed as a stochastic version of the classical "maximum principle" in discrete time. Recall the "cost-to-go" function

W_t(X_t) = inf_{U_t} E_t[L_t(X_t, U_t) + W_{t+1}(X_t + A_{t+1}X_t + B_{t+1}U_t + u_{t+1})]

from Corollary 7.35. We have

∂W_t(X_t) = {V_t | ∃U_t : (V_t, 0) ∈ ∂E_t[L_t(X_t, U_t) + W_{t+1}(X_t + A_{t+1}X_t + B_{t+1}U_t + u_{t+1})]}.

Thus, if (X, U, y) satisfy the optimality conditions and Et+1yt+1 ∈ ∂Wt+1(Xt+1), then

L_t(X'_t, U'_t) + W_{t+1}(X'_t + A_{t+1}X'_t + B_{t+1}U'_t + u_{t+1})
≥ L_t(X_t, U_t) + W_{t+1}(X_t + A_{t+1}X_t + B_{t+1}U_t + u_{t+1})
+ E_{t+1}[y_{t+1}] · (X'_t + A_{t+1}X'_t + B_{t+1}U'_t − X_t − A_{t+1}X_t − B_{t+1}U_t)
− E_t[(Δy_{t+1} + A^*_{t+1}y_{t+1}, B^*_{t+1}y_{t+1})] · (X'_t − X_t, U'_t − U_t).

Taking conditional expectations gives

E_t[L_t(X'_t, U'_t) + W_{t+1}(X'_t + A_{t+1}X'_t + B_{t+1}U'_t + u_{t+1})]
≥ E_t[L_t(X_t, U_t) + W_{t+1}(X_t + A_{t+1}X_t + B_{t+1}U_t + u_{t+1})] + E_t[y_t] · (X'_t − X_t),

i.e.

(E_t[y_t], 0) ∈ ∂E_t[L_t(X_t, U_t) + W_{t+1}(X_t + A_{t+1}X_t + B_{t+1}U_t + u_{t+1})],

so E_t[y_t] ∈ ∂W_t(X_t). We have thus shown that dual optimizers are subgradients of the cost-to-go function.

10 Absence of duality gap

Assumption 10.1. There exists p ∈ N^⊥ such that

inf_{y∈Y} Ef^*(λp, y) < ∞

for two different values of λ ∈ R.

Assumption 10.2. The set L = {x ∈ N | f^∞(x, 0) ≤ 0} is linear.
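As a quick illustration (not from the notes): if f grows superlinearly in x, say f(x, u, ω) ≥ (1/2)|x|^2, then f^∞(x, 0) = +∞ for every x ≠ 0, so L = {0} and Assumption 10.2 holds trivially. The assumption only bites when f(·, 0, ω) has nontrivial directions of recession; compare the no-arbitrage condition (NA) in the next section.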

The following theorem combines the results of this section with the main theorems of the previous one.

Theorem 10.1. Under Assumptions 10.1 and 10.2, ϕ̄ is proper and lsc, (SP) admits solutions, and its optimum value equals

sup_{p∈N^⊥, y∈Y} E[u · y − f^*(p, y)].

Moreover, ϕ̄^*(p, y) = Ef^*(p, y) + δ_{N^⊥}(p).

Proof. Omitted.

11 Optimal investment and indifference pricing

Recall the optimal investment problem

minimize EV(u − Σ_{t=0}^{T−1} x_t · Δs_{t+1}) over x ∈ N_0,    (ALM)

where N_0 = {x ∈ N | x_T = 0 P-a.s.} and V is a nondecreasing nonconstant convex function with V(0) = 0. The market satisfies the no-arbitrage condition if

{Σ_{t=0}^{T−1} x_t · Δs_{t+1} | Σ_{t=0}^{T−1} x_t · Δs_{t+1} ≥ 0, x ∈ N} = {0}.    (NA)

We denote by Q the set of nonnegative multiples of absolutely continuous martingale densities of s. It is an exercise to verify that a nonnegative y ∈ Y belongs to Q if and only if E_t[yΔs_{t+1}] = 0 for all t = 0, . . . , T − 1 (a sketch follows Theorem 11.1 below). The following theorem is from the exercise section above.

Theorem 11.1. Assume that the market satisfies (NA) and that there exists a martingale measure Q ≪ P of s such that EV^*(λ dQ/dP) < ∞ for two different λ ≥ 0. If dQ/dP ∈ Y, then the value function of (2.1) is closed in U and (2.1) has a solution for all u ∈ U.
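A sketch of the exercise just mentioned (forward direction only): if y = λ dQ/dP with λ ≥ 0 and Q a martingale measure of s, write D_t := E_t[dQ/dP] for the density process. By the generalized Bayes rule,

E_t[yΔs_{t+1}] = λE_t[D_{t+1}Δs_{t+1}] = λD_t E_t^Q[Δs_{t+1}] = 0 on {D_t > 0},

while on {D_t = 0} the nonnegative martingale D vanishes from time t on, so the left side is zero there as well. The converse amounts to reversing these steps after normalizing y by Ey when Ey > 0.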

11.1 Pricing formulas for indifference prices

Given a current financial position ū, the indifference price of a claim u is given by

π(u; ū) := inf_{α∈R} {α | ϕ(ū + u − α) ≤ ϕ(ū)}.

This is the least amount of cash the agent needs to cover the claim u without worsening her risk profile. The indifference price can be expressed as

π(u; ū) = inf{α | u − α ∈ A(ū)},

where the acceptance set

A(ū) := {u ∈ U | ϕ(u + ū) ≤ ϕ(ū)}

describes the claims that the agent finds "acceptable" given her risk preferences and her ability to trade in the market.
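As a sanity check (a degenerate sketch, not from the notes): if there were no trading opportunities and the loss were linear, so that ϕ(u) = E[u], then ϕ(ū + u − α) ≤ ϕ(ū) reads E[u] − α ≤ 0, and hence π(u; ū) = E[u]: the indifference price collapses to the expected value, independently of the position ū.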

Theorem 11.2. Assume that F_0 is the trivial σ-field, ϕ is proper and lower semicontinuous, and that the price process is arbitrage-free. Then

π(u; ū) = sup_{y∈Y} {E[uy] − σ_{A(ū)}(y) | Ey = 1}.

If inf ϕ < ϕ(ū), then

π(u; ū) = sup_{y∈Q, λ>0} {E[uy − λV^*(y/λ) + ūy] − λϕ(ū) | Ey = 1}.

Proof. We have

π^*(y; ū) = sup_{u∈U} {E[uy] − π(u; ū)}
= sup_{α∈R, u∈U} {E[uy] − α | u − α ∈ A(ū)}
= sup_{α∈R, ũ∈U} {E[(ũ + α)y] − α | ũ ∈ A(ū)}
= sup_{α∈R, ũ∈U} {E[ũy] + α(Ey − 1) | ũ ∈ A(ū)}
= σ_{A(ū)}(y) if Ey = 1, and +∞ otherwise.

Thus the first formula follows from the biconjugate theorem since π(·; ū) is proper and lower semicontinuous; this will be a topic in the exercises.

The support function σ_{A(ū)} satisfies

σ_{A(ū)}(y) = sup_{u∈U} {⟨u, y⟩ | ϕ(u + ū) ≤ ϕ(ū)}
= inf_{λ≥0} sup_{u∈U} {⟨u, y⟩ − λ(ϕ(ū + u) − ϕ(ū))}    (11.1)
= inf_{λ≥0} sup_{ũ∈U} {⟨ũ − ū, y⟩ − λ(ϕ(ũ) − ϕ(ū))}
= inf_{λ≥0} {λϕ^*(y/λ) − E[ūy] + λϕ(ū)}.

Here the second equality is based on Lagrangian duality and the assumption that inf ϕ < ϕ(ū) (exercise below). Thus the result follows from Theorem ?? once we recall that taking the lsc-hull does not change the infimum.

Exercise 11.1.1. Prove that π(·; ū) is proper and lsc. Hint: Prove sequential lower semicontinuity using the "direct method"; obtain precompactness of "minimizing sequences" using recession analysis, recalling that ϕ^∞(u) = inf_{x∈N} Ef^∞(x, u).

Exercise 11.1.2. Prove formula (11.1). Hint: Apply conjugate duality to

F(α, u) = ⟨u, y⟩ + δ_{R_−}(ϕ(u + ū) − ϕ(ū) + α),

where α ∈ R is paired with λ ∈ R, ⟨α, λ⟩ := αλ.

Example 11.3 (Superhedging prices). For superhedging prices, we have V =

δ_{R_−} and ū = 0, so the above formula reduces to

π(u) = sup_{y∈Q} {E[uy] | Ey = 1}.

We have recovered the pricing formula for superhedging prices in terms of absolutely continuous martingale measures of s.
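A worked one-period illustration (numbers chosen here for illustration, not from the notes): let T = 1, s_0 = 1 and s_1 ∈ {2, 1/2}, each with positive probability, and consider the claim u = (s_1 − 1)_+. A density y ∈ Q with Ey = 1 corresponds to a probability measure Q with E^Q[Δs_1] = 0, i.e. 2q + (1/2)(1 − q) = 1 where q := Q(s_1 = 2), giving q = 1/3. The formula thus yields

π(u) = E^Q[u] = (1/3) · 1 + (2/3) · 0 = 1/3.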
