Faculty of Science Department of Applied Mathematics, Computer Science and Statistics

Utility based risk measures

Jasmine Maes

Promotor: Prof. Dr. D. Vyncke Supervisor: H. Gudmundsson

Thesis submitted to obtain the academic degree of Master of Science: Applied Mathematics

Academic year 2015–2016

Acknowledgements

First of all I would like to thank my supervisor Mr. Gudmundsson for letting me come by his office whenever I felt I needed it, for coming up with good ideas and for supporting me throughout the thesis. I would also like to thank my promotor Prof. Vyncke for giving me advice when I asked for it, while still allowing me a lot of freedom. Last but not least I would like to thank my friends for listening to all my complaints when things didn’t go as planned and when I got stuck, and my parents for their financial support during my education.

The author gives permission to make this master thesis available for consultation and to copy parts of this master thesis for personal use. In the case of any other use, the limitations of the copyright have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.

Ghent, 1 June 2016.


Contents

Preface

1 Mathematical representation of risk
1.1 Definitions and properties
1.2 The acceptance set of a risk measure
1.3 The penalty function
1.4 Robust representation of convex risk measures

2 An introduction to decision theory
2.1 The axioms of von Neumann-Morgenstern
2.2 Risk and ...
2.3 Certainty equivalents
2.3.1 The ordinary certainty equivalent
2.3.2 The optimised certainty equivalent
2.3.3 The u-Mean certainty equivalent
2.4 The exponential utility function
2.5 Stochastic dominance

3 ... and ...
3.1 Value at Risk
3.1.1 General properties
3.1.2 Consistency with expected utility maximisation
3.2 Expected shortfall
3.2.1 General properties
3.2.2 Consistency with expected utility maximisation

4 Utility based risk measures
4.1 Utility based shortfall risk measures
4.2 Divergence risk measures
4.2.1 Construction and representation
4.2.2 The coherence of divergence risk measures
4.2.3 Examples
4.3 The ordinary certainty equivalent as risk measure

5 Utility functions
5.1 The power utility functions
5.2 The exponential utility functions
5.3 The polynomial utility functions
5.4 The SAHARA utility functions
5.5 The κ-utility functions

Conclusion

A Dutch summary

B Additional computations
B.1 Computations regarding the SAHARA utility class
B.1.1 Computation of the utility function
B.1.2 Computation of the divergence function
B.2 Computations regarding the κ-utility class
B.2.1 Determining the asymptotic behaviour
B.2.2 Computation of the divergence function

Preface

When choosing between different investment opportunities it is tempting to select the one which offers the highest expected return. However, this strategy would ignore the risk associated with that investment. Generally speaking, the larger the expected return of an investment, the larger the risk associated with it. Taking into account the risk of a particular investment is not only necessary to pick the best investment, but also to set up capital requirements. These capital requirements should create a buffer for potential losses of the investments.

But how do we describe and measure this risk? We could of course try to describe the cumulative distribution or the density function of the investment. Although this would give us a lot of information about the risk involved, it could still be very difficult to compare different investment opportunities in terms of risk. A more important problem is that it gives us, in some sense, too much information. Therefore it would be useful to summarize the distribution of the investment in a single number which represents the risk. These numbers can then be used to determine the necessary capital requirements. More formally, if the stochastic variable X models the returns of an investment, then a risk measure is a mapping ρ : X ↦ ρ(X) such that ρ(X) ∈ R.

Because a stochastic variable can be viewed as a function, a risk measure can be interpreted as a functional. We could therefore study risk measures by looking at them as purely mathematical objects. Using techniques and ideas from mathematical analysis we could then analyse properties of these functionals. This is exactly what we will do in the first chapter of this thesis.

Studying risk measures from a purely mathematical point of view has the downside that it ignores the intuition behind them. The attitude towards risky alternatives is a subjective matter determined by personal preferences. These personal preferences can be represented using so-called utility functions. Utility functions are commonly used in economics to model how people make decisions under uncertainty. In the second chapter we will therefore introduce this decision theoretic framework and explain the necessary concepts of economics.

Armed with both a strong mathematical and economic framework we will then apply these concepts to two risk measures commonly used in industry, Value at Risk and Expected Shortfall. This analysis will be the subject of the third chapter.

The fourth chapter combines the axiomatic approach from the first chapter and the economic ideas from the second chapter and describes different ways in which utility functions can be used to construct risk measures. We will introduce utility based shortfall risk measures and divergence risk measures. Using ideas from mathematical optimisation we will link different utility based risk measures and discuss different representations of these risk measures.

After this discussion the question arises which utility function we should use to construct these utility based risk measures. Because utility functions represent personal preferences we do not believe that there is a straightforward answer to this question. However, the properties of the utility function used in a risk measure do affect this risk measure. The last chapter takes a closer look at different classes of utility functions which appear in the literature and assesses their properties in the context of utility based risk measures.

1 Mathematical representation of risk

In this chapter we will look at risk measures from a solely mathematical point of view. We will define what a risk measure is, and what properties it should have. We will take a closer look at the concepts of the acceptance set and the penalty function. Finally we will introduce the robust representation of a risk measure. The content of this chapter is largely based on the theorems found in [11].

1.1 Definitions and properties

Consider a probability space (Ω, F, P), where Ω represents the set of all possible scenarios, F is a σ-algebra and P is a probability measure. The future value of a scenario is uncertain and can be represented by a stochastic variable X. This is a function from the set of all possible scenarios to the real numbers, X : Ω → R. Let 𝒳 denote a given linear space of functions X : Ω → R including the constants. A risk measure ρ is a mapping ρ : 𝒳 → R. Our goal is to define ρ in such a way that it quantifies the risk of a market position X and can serve as a measure to determine the capital requirement of X, that is, the amount of capital which, when invested in a risk-free manner, makes the position acceptable. Using this interpretation of ρ(X), we would like to have a risk measure with some desirable properties.

First of all, if the value of the portfolio X is almost surely smaller than the value of the portfolio Y, then it is logical that more money is needed to make the position X acceptable than to make the position Y acceptable. This property is called monotonicity.

Property 1. (Monotonicity) If X ≤ Y, then ρ(X) ≥ ρ(Y).

Secondly, it is logical to assume that to make the position X + m acceptable, where m is a risk-free amount, we need ρ(X) − m. This is precisely the amount ρ(X) needed to make the position X acceptable, reduced by the risk-free amount m we already had. This property is called translation invariance or cash invariance.

Property 2. (Translation invariance) If m ∈ R, then ρ(X + m) = ρ(X) − m.

Definition 1.1. A mapping ρ : 𝒳 → R is called a monetary risk measure if ρ satisfies both monotonicity and translation invariance.

Here we would like to point out that some authors define a monetary risk measure such that ρ can also take the values +∞ and −∞. They then impose the additional property that ρ(0) is finite, or even normalized, ρ(0) = 0.

In [11] we found the following lemma.

Lemma 1.1. Any monetary risk measure ρ is Lipschitz continuous with respect to the supremum norm ‖·‖, i.e. for all X, Y ∈ 𝒳 we have

|ρ(X) − ρ(Y)| ≤ ‖X − Y‖. (1.1)

Proof. We have that X − Y ≤ sup_{ω∈Ω} |X(ω) − Y(ω)| = ‖X − Y‖, hence X ≤ Y + ‖X − Y‖. Using monotonicity we find that ρ(X) ≥ ρ(Y + ‖X − Y‖). Using translation invariance we get ρ(X) ≥ ρ(Y) − ‖X − Y‖. This gives us that ρ(X) − ρ(Y) ≥ −‖X − Y‖ or equivalently, ρ(Y) − ρ(X) ≤ ‖X − Y‖. We also have that Y − X ≤ sup_{ω∈Ω} |Y(ω) − X(ω)| = ‖Y − X‖. Again using monotonicity and translation invariance we find that ρ(Y) ≥ ρ(X) − ‖Y − X‖. From this we conclude that ρ(X) − ρ(Y) ≤ ‖Y − X‖ = ‖X − Y‖. Hence we have that ρ(Y) − ρ(X) ≤ ‖X − Y‖ and ρ(X) − ρ(Y) ≤ ‖X − Y‖. This leads us to conclude that |ρ(X) − ρ(Y)| ≤ ‖X − Y‖.
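The inequality (1.1) can be checked numerically on a finite scenario set. The sketch below is only an illustration under assumptions that are not part of the text: positions are lists of payoffs, one per scenario, and the risk measure is the worst-case measure ρ(X) = max_ω(−X(ω)), which is monotone and translation invariant. The function names are hypothetical.

```python
import random

def worst_case_risk(position):
    """Worst-case risk measure rho(X) = max over scenarios of -X(omega) (assumed example)."""
    return max(-payoff for payoff in position)

def sup_norm_distance(x, y):
    """Supremum norm ||X - Y|| on a finite scenario set."""
    return max(abs(a - b) for a, b in zip(x, y))

def lipschitz_holds(rho, x, y, tol=1e-12):
    """Check |rho(X) - rho(Y)| <= ||X - Y|| up to a small rounding tolerance."""
    return abs(rho(x) - rho(y)) <= sup_norm_distance(x, y) + tol

# Draw random pairs of positions on six scenarios and test inequality (1.1).
rng = random.Random(0)
pairs = [
    ([rng.uniform(-5, 5) for _ in range(6)], [rng.uniform(-5, 5) for _ in range(6)])
    for _ in range(1000)
]
all_ok = all(lipschitz_holds(worst_case_risk, x, y) for x, y in pairs)
print(all_ok)
```

A random test of this kind cannot replace the proof, but it is a cheap sanity check when implementing a new monetary risk measure.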

An important subclass of monetary risk measures are the convex risk measures. These risk measures have the extra property of being convex.

Property 3. (Convexity) ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y), for 0 ≤ λ ≤ 1.

We will prove in lemma 1.2 that for monetary risk measures this property is equivalent to the property of quasi convexity.

Property 4. (Quasi convexity) ρ(λX + (1 − λ)Y) ≤ max(ρ(X), ρ(Y)), for 0 ≤ λ ≤ 1.

Definition 1.2. A convex risk measure is a monetary risk measure satisfying the convexity property.

We can easily interpret the property of quasi convexity. Consider an investor who can invest his resources in such a way that he obtains X, or in another way so that he obtains Y. If he spends only a fraction λ of his resources on the first investment strategy and the rest on the second, he will obtain λX + (1 − λ)Y. This diversification strategy will give him a risk of ρ(λX + (1 − λ)Y). The property of quasi convexity then states that the risk of this diversified portfolio cannot be greater than the risk of the riskiest investment strategy. In [11, p. 178] we find the following statement, which we will prove in this thesis.

Lemma 1.2. A monetary risk measure is convex if and only if it is quasi convex.

Proof. First consider a risk measure satisfying convexity, hence ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y), for 0 ≤ λ ≤ 1. Without loss of generality we can assume that ρ(X) ≥ ρ(Y) and hence max(ρ(X), ρ(Y)) = ρ(X). We find that

ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) ≤ λρ(X) + (1 − λ)ρ(X) = ρ(X) = max(ρ(X), ρ(Y)).

From this we can conclude that convexity of a monetary risk measure implies quasi convexity. Now consider a monetary risk measure satisfying quasi convexity. For all X, Y ∈ 𝒳 we can define X′ := X + ρ(X) and Y′ := Y + ρ(Y). Then it is clear that X′, Y′ ∈ 𝒳. Without loss of generality we can suppose that ρ(Y′) ≥ ρ(X′). Because ρ is quasi convex we have that ρ(λX′ + (1 − λ)Y′) ≤ ρ(Y′). Rewriting this expression in terms of X and Y we find that ρ(λX + λρ(X) + (1 − λ)Y + (1 − λ)ρ(Y)) ≤ ρ(Y + ρ(Y)). Now using the fact that ρ satisfies translation invariance we have that

ρ(λX + (1 − λ)Y) − λρ(X) − (1 − λ)ρ(Y) ≤ ρ(Y + ρ(Y)) = ρ(Y) − ρ(Y) = 0.

We can conclude that ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) for all X, Y ∈ 𝒳, i.e. ρ is a convex risk measure.

We can define a special subclass of convex risk measures using the notion of positive homogeneity. Consider an investor who invests his wealth using an investment strategy that replicates X, with an associated risk ρ(X). If he only invests a fraction λ of his wealth in the same investment strategy he will obtain λX, with an associated risk of ρ(λX). If this risk equals the proportional risk of the initial investment, we say that the risk measure satisfies the property of positive homogeneity.

Property 5. (Positive homogeneity) If λ ≥ 0, then ρ(λX) = λρ(X).

Definition 1.3. A coherent risk measure is a convex risk measure satisfying positive homogeneity.

Coherent risk measures can also be defined by using the sub-additivity property. If a risk measure is sub-additive, you can decentralize the task of managing the risk of different positions. Consider an investor who has invested his wealth in a contingent claim X + Y. If the risk measure is sub-additive, the risk ρ(X + Y) will never be greater than ρ(X) + ρ(Y).

Property 6. (Sub-additivity) ρ(X + Y) ≤ ρ(X) + ρ(Y).

It is stated in [11] that a coherent risk measure is a monetary risk measure satisfying positive homogeneity and sub-additivity. We now prove this equivalent definition.

Lemma 1.3. For a monetary risk measure that satisfies positive homogeneity, the convexity property is equivalent to the sub-additivity property.

Proof. First assume ρ is sub-additive, X, Y ∈ 𝒳, and 0 ≤ λ ≤ 1. We find that:

ρ(λX + (1 − λ)Y) ≤ ρ(λX) + ρ((1 − λ)Y) = λρ(X) + (1 − λ)ρ(Y).

The inequality follows from the fact that ρ is sub-additive; the equality uses the assumption that ρ satisfies positive homogeneity. Note that λX ∈ 𝒳 and (1 − λ)Y ∈ 𝒳 because of the assumed linearity of 𝒳. Now assume ρ is convex. Then for a fixed λ, 0 < λ < 1, define X′ := (1/λ)X and Y′ := (1/(1 − λ))Y. Notice that X′, Y′ ∈ 𝒳. It follows from the convexity property and the positive homogeneity that

ρ(X + Y) = ρ(λX′ + (1 − λ)Y′) ≤ λρ(X′) + (1 − λ)ρ(Y′) = ρ(λX′) + ρ((1 − λ)Y′) = ρ(X) + ρ(Y).

This proves that ρ satisfies the sub-additivity property.

So far we have defined a coherent risk measure as a risk measure which satisfies the following four properties:

1. (Monotonicity) If X ≤ Y , then ρ(X) ≥ ρ(Y ).

2. (Translation invariance) If m ∈ R, then ρ(X + m) = ρ(X) − m.

3. (Positive homogeneity) If λ ≥ 0, then ρ(λX) = λρ(X).

4. (Convexity) ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y), for 0 ≤ λ ≤ 1.

Here the convexity property can be replaced by the sub-additivity property. However, some authors, like [1] and [2], use a positivity axiom instead of the monotonicity axiom.

Property 7. (Positivity) X ≥ 0 ⇒ ρ(X) ≤ 0.

In general positivity and monotonicity are not equivalent. However, it turns out that they are equivalent when a risk measure satisfies the positive homogeneity property and the sub-additivity property. The reason for using the positivity property instead of the monotonicity property is that positivity is often easier to prove.

Lemma 1.4. If a risk measure is translation invariant, sub-additive and positive homogeneous, then it is positive if and only if it is monotone.

Proof. First suppose the risk measure is positive homogeneous, translation invariant, sub-additive and positive. Because of positivity we have that

(X − Y ) ≥ 0 ⇒ ρ(X − Y ) ≤ 0. (1.2)

Using the sub-additivity property we find that

ρ(X) = ρ(X − Y + Y ) ≤ ρ(X − Y ) + ρ(Y ). (1.3)

Combining equation 1.2 and equation 1.3 we find that

X ≥ Y ⇒ ρ(X) ≤ ρ(Y ). (1.4)

We conclude that the risk measure is monotone. Now suppose the risk measure is positive homogeneous, translation invariant, sub-additive and monotone. We need to show that it is positive. Using monotonicity we find that

X ≥ 0 ⇒ ρ(X) ≤ ρ(0). (1.5)

Using positive homogeneity we find that for all λ > 0,

ρ(0) = ρ(λ0) = λρ(0). (1.6)

Because this is true for all λ > 0 we can conclude that ρ(0) = 0. Using equation 1.5 we can conclude that

X ≥ 0 ⇒ ρ(X) ≤ 0. (1.7)

This proves positivity.

Remark 1.1. Using lemma 1.3 and lemma 1.4 we see that a risk measure is coherent if and only if it satisfies the following properties for X, Y ∈ 𝒳.

1. (Positivity) X ≥ 0 ⇒ ρ(X) ≤ 0.

2. (Sub-additivity) ρ(X + Y) ≤ ρ(X) + ρ(Y).

3. (Positive homogeneity) ∀λ > 0: ρ(λX) = λρ(X).

4. (Translation invariance) ∀m ∈ R: ρ(X + m) = ρ(X) − m.
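The four properties of remark 1.1 lend themselves to a quick randomized test on a finite scenario set. The sketch below is an illustration under assumptions of our own: positions are lists of payoffs, the worst-case measure ρ(X) = max_ω(−X(ω)) serves as a coherent example, and the entropic measure (1/β) log E[exp(−βX)] with equally likely scenarios serves as a convex example that is not positively homogeneous. Neither measure is introduced in this section, and all function names are hypothetical.

```python
import math
import random

def worst_case_risk(position):
    """rho(X) = max over scenarios of -X(omega); a coherent example (assumed)."""
    return max(-payoff for payoff in position)

def entropic_risk(position, beta=1.0):
    """(1/beta) * log E[exp(-beta X)] with equally likely scenarios (assumed)."""
    n = len(position)
    return math.log(sum(math.exp(-beta * x) for x in position) / n) / beta

def satisfies_remark_axioms(rho, trials=500, size=4, tol=1e-9, seed=1):
    """Randomized check of the four properties listed in remark 1.1."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-3, 3) for _ in range(size)]
        y = [rng.uniform(-3, 3) for _ in range(size)]
        lam = rng.uniform(0.1, 4.0)
        m = rng.uniform(-3, 3)
        positivity = rho([abs(v) for v in x]) <= tol
        sub_additive = rho([a + b for a, b in zip(x, y)]) <= rho(x) + rho(y) + tol
        homogeneous = abs(rho([lam * a for a in x]) - lam * rho(x)) <= tol
        cash_invariant = abs(rho([a + m for a in x]) - (rho(x) - m)) <= tol
        if not (positivity and sub_additive and homogeneous and cash_invariant):
            return False
    return True

coherent_ok = satisfies_remark_axioms(worst_case_risk)
entropic_ok = satisfies_remark_axioms(entropic_risk)
print(coherent_ok, entropic_ok)
```

Passing the test is only evidence, not a proof, of coherence. A single failing draw, on the other hand, is a genuine counterexample, which is how the entropic measure is rejected here.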

1.2 The acceptance set of a risk measure

In the previous section we interpreted ρ(X) as the amount of capital which, if invested in a risk-free manner, makes the position X acceptable. In this section we will define the acceptance set of a risk measure. This is the set of all positions which do not require surplus capital. We will also demonstrate the relationship between the properties of the risk measure and the corresponding acceptance set.

Definition 1.4. The acceptance set induced by a monetary risk measure ρ is defined by

A_ρ := {X ∈ 𝒳 | ρ(X) ≤ 0}. (1.8)

The following theorem was taken from [11] and shows that there is a clear connection between the properties of a monetary risk measure and the associated acceptance set. We have worked out the proof.

Theorem 1.1. If ρ is a monetary risk measure with acceptance set A := A_ρ then

1. A is non-empty.

2. A is closed in 𝒳 with respect to the supremum norm ‖·‖.

3. inf{m ∈ R | m ∈ A} > −∞.

4. X ∈ A, Y ∈ 𝒳, Y ≥ X ⇒ Y ∈ A.

5. ρ can be recovered from A:

ρ(X) = inf{m ∈ R | m + X ∈ A}. (1.9)

6. If ρ is a convex risk measure, then A is a convex set.

7. If ρ is positively homogeneous, then A is a cone. In particular, if ρ is a coherent risk measure, A is a convex cone.

Proof. 1. Consider m = ρ(0), then m ∈ 𝒳 because 𝒳 contains the constants. We will prove that m ∈ A. By translation invariance, m ∈ A ⇔ ρ(m) ≤ 0 ⇔ ρ(0) − m ≤ 0 ⇔ ρ(0) ≤ m, which holds with equality for m = ρ(0).

2. Consider a sequence X_n ∈ A such that X_n → X with respect to the supremum norm ‖·‖. We need to prove that X ∈ A. Suppose ρ(X) > 0. Since ρ(X_n) ≤ 0 for all n, we have |ρ(X_n) − ρ(X)| ≥ ρ(X) =: c > 0, and using lemma 1.1 we find that ‖X_n − X‖ ≥ |ρ(X_n) − ρ(X)| ≥ c > 0. If X_n converges to X in the supremum norm the left-hand side goes to 0. This gives us a contradiction. Hence ρ(X) ≤ 0, and therefore X ∈ A.

3. For every constant m ∈ A we have: m ∈ A ⇔ ρ(m) ≤ 0 ⇔ ρ(0) − m ≤ 0 ⇔ ρ(0) ≤ m. Hence ρ(0) is a lower bound for all m ∈ A. This concludes the proof since ρ(0) is finite for a monetary risk measure.

4. We know that X ∈ A ⇒ ρ(X) ≤ 0 and, using monotonicity, Y ≥ X ⇒ ρ(Y) ≤ ρ(X). Combining those two facts we find that ρ(Y) ≤ ρ(X) ≤ 0. Finally we can conclude that Y ∈ A.

5. Notice that inf{m ∈ R | m + X ∈ A} = inf{m ∈ R | ρ(m + X) ≤ 0} = inf{m ∈ R | ρ(X) ≤ m} = ρ(X).

6. We need to prove that ∀X, Y ∈ A and ∀λ ∈ [0, 1] we have that λX + (1 − λ)Y ∈ A. It is sufficient to prove that ρ(λX + (1 − λ)Y) ≤ 0. Since X, Y ∈ A and λ ∈ [0, 1] we have λρ(X) ≤ 0 and (1 − λ)ρ(Y) ≤ 0. Since ρ is convex we have ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) ≤ 0. This is what we needed to prove.

7. To prove that A is a cone it is sufficient to prove that ∀X ∈ A and ∀λ > 0, we have that λX ∈ A. Because ρ is positively homogeneous we have that λX ∈ A ⇔ ρ(λX) ≤ 0 ⇔ λρ(X) ≤ 0 ⇔ ρ(X) ≤ 0 ⇔ X ∈ A. This proves that A is a cone. From the above proofs it follows directly that if ρ is a coherent risk measure then A is a convex cone.
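Items 6 and 7 of theorem 1.1 can be illustrated on a small example. The sketch below assumes, purely for illustration, two equally likely scenarios and the entropic measure ρ(X) = (1/β) log E[exp(−βX)] with β = 1, which is convex but not positively homogeneous. Its acceptance set should therefore be convex without being a cone. The names and parameter values are hypothetical.

```python
import math
import random

PROBS = [0.5, 0.5]  # assumed scenario probabilities
BETA = 1.0          # assumed risk-aversion parameter

def entropic_risk(position):
    """Entropic risk measure on a finite scenario set (assumed example)."""
    return math.log(sum(p * math.exp(-BETA * x) for p, x in zip(PROBS, position))) / BETA

def is_acceptable(position, tol=0.0):
    """Membership test for the acceptance set {X : rho(X) <= 0}."""
    return entropic_risk(position) <= tol

def convex_combination(x, y, lam):
    return [lam * a + (1 - lam) * b for a, b in zip(x, y)]

# Collect acceptable positions by rejection sampling.
rng = random.Random(2)
accepted = []
while len(accepted) < 200:
    candidate = [rng.uniform(-1, 3) for _ in PROBS]
    if is_acceptable(candidate):
        accepted.append(candidate)

# Item 6: convex combinations of acceptable positions stay acceptable
# (a tiny tolerance absorbs floating point rounding).
convexity_ok = all(
    is_acceptable(convex_combination(x, y, rng.random()), tol=1e-12)
    for x in accepted
    for y in accepted
)

# Item 7 fails without positive homogeneity: X is acceptable but 2X is not.
x = [1.0, -0.45]
cone_counterexample = is_acceptable(x) and not is_acceptable([2 * v for v in x])
print(convexity_ok, cone_counterexample)
```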

In definition 1.4 we defined for each monetary risk measure the associated acceptance set. We can also do the opposite and define for each acceptance set an associated risk measure.

Definition 1.5. ρ_A(X) := inf{m ∈ R | m + X ∈ A}

This is a very intuitive definition for a risk measure. If X is a financial position, then ρ_A(X) is the minimal amount of money needed to make the position X acceptable. The next theorem shows that the properties of the acceptance set are linked to the properties of the associated risk measure. This theorem was found in [11] and we have worked out the proof.

Theorem 1.2. If A is a non-empty subset of 𝒳 such that properties 3 and 4 from theorem 1.1 are both satisfied, then the functional ρ_A has the following properties:

1. ρ_A is a monetary risk measure.

2. If A is a convex set, then ρ_A is a convex risk measure.

3. If A is a cone, then ρ_A is positively homogeneous. In particular if A is a convex cone then ρ_A is a coherent risk measure.

4. A ⊆ A_{ρ_A}, and A = A_{ρ_A} if and only if A is ‖·‖-closed in 𝒳.

Proof. 1. To prove that ρ_A is a monetary risk measure, we need to check that ρ_A(X) is finite for all X ∈ 𝒳 and that ρ_A satisfies monotonicity and translation invariance.

- (Translation invariance) We need to prove that for X ∈ 𝒳 and m ∈ R, ρ_A(X + m) = ρ_A(X) − m. This follows almost immediately from the properties of the infimum, since ρ_A(X) − m = inf{l ∈ R | l + X ∈ A} − m = inf{l ∈ R | l + X + m ∈ A} = ρ_A(X + m).

- (Monotonicity) X ≤ Y ⇒ m + X ≤ m + Y for all m ∈ R. By property 4 of theorem 1.1, m + X ∈ A then implies m + Y ∈ A, so that inf{m ∈ R | m + X ∈ A} ≥ inf{m ∈ R | m + Y ∈ A}. Using the definition of ρ_A we conclude that X ≤ Y ⇒ ρ_A(X) ≥ ρ_A(Y).

- (ρ_A(X) is finite) Since A ≠ ∅, we can find a Y ∈ A. Fix this Y and let X ∈ 𝒳. From the assumptions on 𝒳 we have that X and Y are both bounded, hence there exists an m ∈ R such that m + X > Y. Using that Y ∈ A, monotonicity and translation invariance we find 0 ≥ ρ_A(Y) ≥ ρ_A(X + m) = ρ_A(X) − m. We conclude that ρ_A(X) ≤ m < +∞ for all X ∈ 𝒳. Because we have assumed property 3 from theorem 1.1, we have ρ_A(0) > −∞. We need to prove that ρ_A(X) > −∞ for all X ∈ 𝒳. Take m′ ∈ R such that X + m′ ≤ 0. Using translation invariance and monotonicity we find that ρ_A(X + m′) = ρ_A(X) − m′ ≥ ρ_A(0) > −∞. From this we can conclude that ρ_A(X) > −∞ for an arbitrary X ∈ 𝒳.

2. We need to prove that if A is convex, then ∀X_1, X_2 ∈ 𝒳 and ∀λ ∈ [0, 1], ρ_A(λX_1 + (1 − λ)X_2) ≤ λρ_A(X_1) + (1 − λ)ρ_A(X_2). Because of translation invariance we find for i ∈ {1, 2} that ρ_A(X_i + ρ_A(X_i)) = ρ_A(X_i) − ρ_A(X_i) = 0, hence ρ_A(X_i) + X_i ∈ A. Because A is a convex set we have λ(ρ_A(X_1) + X_1) + (1 − λ)(ρ_A(X_2) + X_2) ∈ A. Using this we find that

0 ≥ ρ_A(λ(ρ_A(X_1) + X_1) + (1 − λ)(ρ_A(X_2) + X_2))

= ρ_A(λX_1 + (1 − λ)X_2) − λρ_A(X_1) − (1 − λ)ρ_A(X_2).

From this we can conclude that ∀X_1, X_2 ∈ 𝒳, λ ∈ [0, 1], ρ_A(λX_1 + (1 − λ)X_2) ≤ λρ_A(X_1) + (1 − λ)ρ_A(X_2), which is precisely what we needed to prove.

3. If A is a cone we need to prove that ∀X ∈ 𝒳 and ∀λ ≥ 0, ρ_A(λX) = λρ_A(X). We first prove ρ_A(λX) ≤ λρ_A(X). Since ρ_A(X) + X ∈ A and A is a cone, we know that λ(ρ_A(X) + X) ∈ A. Hence we have 0 ≥ ρ_A(λ(ρ_A(X) + X)) = ρ_A(λX) − λρ_A(X). This proves that ρ_A(λX) ≤ λρ_A(X). To prove the opposite inequality take m such that m < ρ_A(X). Then m + X ∉ A, which, because A is a cone, also implies that λm + λX ∉ A for λ > 0. This in turn implies λm ≤ ρ_A(λX). We have thus shown that λm < λρ_A(X) ⇒ λm ≤ ρ_A(λX). This can only be true if λρ_A(X) ≤ ρ_A(λX). Finally we can conclude that λρ_A(X) = ρ_A(λX).

4. First we’ll prove the inclusion A ⊆ A_{ρ_A}. For this take an X ∈ A; then it is clear that ρ_A(X) = inf{m ∈ R | m + X ∈ A} ≤ 0 and therefore X ∈ A_{ρ_A}.

Secondly, from part 2 of theorem 1.1 we know that if A = A_{ρ_A}, then A is ‖·‖-closed in 𝒳. Finally assume that A is ‖·‖-closed in 𝒳. We need to prove that A_{ρ_A} ⊆ A, hence we need to prove that X ∈ A_{ρ_A} ⇒ X ∈ A. This is equivalent to X ∉ A ⇒ X ∉ A_{ρ_A}. Take an X ∉ A; it is sufficient to prove that ρ_A(X) > 0. To prove this, take m > ‖X‖. Since A is ‖·‖-closed in 𝒳, 𝒳 \ A is ‖·‖-open in 𝒳. Because X ∈ 𝒳 \ A we can find a λ ∈ (0, 1) such that λm + (1 − λ)X ∉ A. Therefore we have

0 ≤ ρ_A(λm + (1 − λ)X) = ρ_A((1 − λ)X) − λm.

Because ρ_A is a monetary risk measure we can apply lemma 1.1. We find that

|ρ_A((1 − λ)X) − ρ_A(X)| ≤ ‖(1 − λ)X − X‖ = λ‖X‖.

Using the two inequalities which we have obtained above, we can conclude that

ρ_A(X) ≥ ρ_A((1 − λ)X) − λ‖X‖ ≥ λ(m − ‖X‖) > 0.

This is precisely what we needed to prove.
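Definition 1.5 also suggests a direct way to compute ρ_A when only a membership test for A is available. The sketch below is an illustration under our own assumptions: three scenarios with fixed probabilities, and the hypothetical acceptance set A = {X : E[X] ≥ 0 and min X ≥ −c}. This set is convex and satisfies properties 3 and 4 of theorem 1.1, but it is not a cone when c > 0. Because m + X ∈ A is monotone in m, the infimum can be located by bisection.

```python
PROBS = [0.2, 0.5, 0.3]  # assumed scenario probabilities
MAX_LOSS = 1.0           # assumed loss limit c of the hypothetical acceptance set

def in_acceptance_set(position):
    """A = {X : E[X] >= 0 and min X >= -c}; convex, but not a cone for c > 0."""
    mean = sum(p * x for p, x in zip(PROBS, position))
    return mean >= 0.0 and min(position) >= -MAX_LOSS

def risk_from_acceptance_set(position, accept, iterations=60):
    """Approximate rho_A(X) = inf{m : m + X in A} by bisection on m."""
    bound = max(abs(x) for x in position) + 1.0
    lo, hi = -bound, bound  # lo + X is rejected, hi + X is accepted
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if accept([x + mid for x in position]):
            hi = mid
        else:
            lo = mid
    return hi

def closed_form(position):
    """For this particular A: rho_A(X) = max(E[-X], max(-X) - c)."""
    expected_loss = sum(p * (-x) for p, x in zip(PROBS, position))
    return max(expected_loss, max(-x for x in position) - MAX_LOSS)

x = [1.0, -3.0, 2.0]
print(risk_from_acceptance_set(x, in_acceptance_set), closed_form(x))
```

In this example ρ_A(2X) differs from 2ρ_A(X), which is consistent with part 3 of theorem 1.2: positive homogeneity is only guaranteed when A is a cone.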

We have connected the concepts of monetary risk measures, convex risk measures and coherent risk measures to the concept of the acceptance set. The acceptance set contains all acceptable financial positions. But what is an acceptable position? This is subjective and can depend on the risk-aversion of the portfolio-holder, or on the regulations of a supervisory agency.

1.3 The penalty function

In 1921 the economist Frank Knight formulated a distinction between risk and uncertainty. Risk only applies to situations where, although we do not know the outcome of an event, we can accurately assign a probability measure to the different outcomes. This situation might occur when tossing a fair coin. Although you do not know if the coin will land heads up or not, you know (with certainty) that this will happen with probability 1/2.

Uncertainty in Knight’s work is different. It applies to situations in which we do not have all the information to accurately assign a probability measure to the different outcomes. This type of uncertainty, named after Knight, is called Knightian uncertainty. Knightian uncertainty is very common in real world situations. Consider for example the future return of a stock. The return of the stock is uncertain and we cannot accurately assign a probability measure to the different returns. From historical returns of the stock we could estimate such a probability measure. But would this be the correct probability measure? Obviously not.

In this section we consider the case of Knightian uncertainty where we have a measurable space (Ω, F) but without a fixed probability measure assigned to this space. Let 𝒳 be the space of all bounded measurable functions on (Ω, F) endowed with the supremum norm ‖·‖. It is straightforward to show that 𝒳 is a Banach space. Let M_1 := M_1(Ω, F) be the set of all probability measures on (Ω, F) and denote by M_{1,f} the set of all functions Q : F → [0, 1] which are normalized, i.e. Q(Ω) = 1, and which are finitely additive. It is clear that M_1 ⊂ M_{1,f} and that the elements of M_{1,f} are not necessarily probability measures since it is not guaranteed that they satisfy σ-additivity. In what follows we use the notation E_Q[X] with Q ∈ M_{1,f} for ∫ X dQ, where the integral is understood to be a Lebesgue integral.

Definition 1.6. A penalty function for ρ on M_{1,f} is a functional α : M_{1,f} → R ∪ {+∞} such that

inf_{Q∈M_{1,f}} α(Q) ∈ R. (1.10)

Penalty functions are strongly linked to convex risk measures. Each penalty function defines a convex risk measure and convex risk measures can be represented by using a penalty function. We will prove this in the next two theorems.

Theorem 1.3. The functional

ρ(X) := sup_{Q∈M_{1,f}} (E_Q[−X] − α(Q)) (1.11)

defines a convex risk measure on 𝒳, such that ρ(0) = − inf_{Q∈M_{1,f}} α(Q).

Proof. For each Q ∈ M_{1,f} we define for all X in 𝒳 the functional ρ_Q(X) := E_Q[−X] − α(Q). We will first show that ρ_Q satisfies monotonicity and translation invariance. Monotonicity follows from

X ≤ Y ⇒ −X ≥ −Y ⇒ E_Q[−X] ≥ E_Q[−Y] ⇒ E_Q[−X] − α(Q) ≥ E_Q[−Y] − α(Q) ⇒ ρ_Q(X) ≥ ρ_Q(Y).

To prove that ρ_Q satisfies translation invariance take X ∈ 𝒳 and m ∈ R. We have that

ρ_Q(m + X) = E_Q[−(X + m)] − α(Q) = E_Q[−X] − α(Q) − m = ρ_Q(X) − m,

where we have used that Q is normalized. We also want to prove that the functional ρ_Q is convex. From lemma 1.2 we know that it is sufficient to prove that ∀X, Y ∈ 𝒳 and ∀λ ∈ [0, 1] we have that ρ_Q(λX + (1 − λ)Y) ≤ max(ρ_Q(X), ρ_Q(Y)). We can assume without loss of generality that E_Q[−X] ≤ E_Q[−Y], then ρ_Q(X) ≤ ρ_Q(Y) and therefore max(ρ_Q(X), ρ_Q(Y)) = ρ_Q(Y). We also have that

ρ_Q(λX + (1 − λ)Y) = E_Q[−(λX + (1 − λ)Y)] − α(Q) = λE_Q[−X] + (1 − λ)E_Q[−Y] − α(Q) ≤ λE_Q[−Y] + (1 − λ)E_Q[−Y] − α(Q) = E_Q[−Y] − α(Q) = ρ_Q(Y) = max(ρ_Q(X), ρ_Q(Y)).

The properties monotonicity, translation invariance and convexity are satisfied for all Q ∈ M_{1,f}. Hence the functional defined by (1.11) also satisfies these properties, since they are preserved when taking the supremum over all Q ∈ M_{1,f}. Because of the definition of a penalty function and the fact that X ∈ 𝒳 is bounded, ρ(X) only takes finite values. We can conclude that ρ is a convex risk measure. The fact that ρ(0) = − inf_{Q∈M_{1,f}} α(Q) follows immediately from the properties of supremum and infimum.
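To make the representation (1.11) concrete, the sketch below evaluates it on a finite scenario set, where every finitely additive normalized set function is simply a probability vector. As an assumed example that is not taken from this section, we use the penalty α(Q) = H(Q|P)/β, with H the relative entropy with respect to a reference measure P. For this penalty the supremum is known to equal the entropic risk (1/β) log E_P[exp(−βX)], attained at Q* proportional to P·exp(−βX). All names and parameter values are hypothetical.

```python
import math
import random

PROBS = [0.2, 0.5, 0.3]       # assumed reference measure P on three scenarios
BETA = 2.0                    # assumed risk-aversion parameter
POSITION = [1.0, -0.5, 0.25]  # assumed payoffs X(omega)

def relative_entropy(q, p):
    """H(Q|P) = sum q_i log(q_i / p_i), with the convention 0 log 0 = 0."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0)

def penalised_expected_loss(position, q):
    """The term E_Q[-X] - alpha(Q) inside the supremum of (1.11)."""
    expected_loss = sum(qi * (-x) for qi, x in zip(q, position))
    return expected_loss - relative_entropy(q, PROBS) / BETA

def entropic_risk(position):
    """Closed form (1/beta) log E_P[exp(-beta X)]."""
    return math.log(sum(p * math.exp(-BETA * x) for p, x in zip(PROBS, position))) / BETA

def optimal_measure(position):
    """Maximiser Q* with weights proportional to p_i exp(-beta x_i)."""
    weights = [p * math.exp(-BETA * x) for p, x in zip(PROBS, position)]
    total = sum(weights)
    return [w / total for w in weights]

def random_measure(rng):
    """A random probability vector on the three scenarios."""
    draws = [rng.expovariate(1.0) for _ in PROBS]
    total = sum(draws)
    return [d / total for d in draws]

closed_form_value = entropic_risk(POSITION)
value_at_optimum = penalised_expected_loss(POSITION, optimal_measure(POSITION))
rng = random.Random(3)
sampled_values = [penalised_expected_loss(POSITION, random_measure(rng)) for _ in range(2000)]
print(closed_form_value, value_at_optimum, max(sampled_values))
```

The randomly sampled measures never exceed the value at Q*, in line with the supremum in (1.11). Since α(P) = 0 is the smallest value of this penalty, the example also agrees with ρ(0) = − inf α(Q) = 0.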

The next theorem shows that we can represent each convex risk measure using a penalty function. The proof of this theorem is not easy and uses results from functional analysis. For the convenience of the reader we state these results without proof.

Theorem 1.4. (Separating hyperplane theorem) In a topological vector space E, any two disjoint convex sets B and C, one of which has an interior point, can be separated by a non-zero continuous linear functional l on E, i.e.,

l(x) ≤ l(y) ∀x ∈ C, ∀y ∈ B. (1.12)

Proof. Without proof, see [11, p.508].

Theorem 1.5. (Riesz representation theorem) There is a one-to-one correspondence between the set functions Q ∈ M_{1,f} and the continuous linear functionals l on 𝒳 such that l(1) = 1 and l(X) ≥ 0 for X ≥ 0. The correspondence is defined by l(X) = E_Q[X] = ∫ X dQ for all X ∈ 𝒳.

Proof. Without proof, see [11, p.506].

Theorem 1.6. Any convex risk measure ρ on 𝒳 is of the form

ρ(X) = max_{Q∈M_{1,f}} (E_Q[−X] − α^min(Q)), X ∈ 𝒳, (1.13)

where the penalty function α^min is given by

α^min(Q) := sup_{X∈A_ρ} E_Q[−X] for Q ∈ M_{1,f}. (1.14)

Moreover, α^min is the minimal penalty function which represents ρ, i.e., any penalty function α for which (1.11) holds satisfies α(Q) ≥ α^min(Q) for all Q ∈ M_{1,f}.

Proof. Step 1: We will prove that

ρ(X) ≥ sup_{Q∈M_{1,f}} (E_Q[−X] − α^min(Q)), ∀X ∈ 𝒳. (1.15)

Let X′ := ρ(X) + X, then because of the translation invariance property we have that ρ(X′) = ρ(ρ(X) + X) = ρ(X) − ρ(X) = 0. Hence X′ ∈ A_ρ. Because of the definition of α^min(Q) and X′ ∈ A_ρ, we have for all Q ∈ M_{1,f} that α^min(Q) ≥ E_Q[−X′] = E_Q[−X] − ρ(X). This leads us to conclude that ρ(X) ≥ sup_{Q∈M_{1,f}} (E_Q[−X] − α^min(Q)), which is what we wanted to prove.

Step 2: For a given X we will construct a Q_X ∈ M_{1,f} such that

ρ(X) ≤ E_{Q_X}[−X] − α^min(Q_X). (1.16)

In combination with (1.15) from the first part, this will prove (1.13). It is sufficient to prove this for X ∈ 𝒳 with ρ(X) = 0. Indeed, if ρ(X) = m, then ρ(X + m) = 0 and we have that ρ(X) − m = ρ(X + m) ≤ E_{Q_X}[−(X + m)] − α^min(Q_X) = E_{Q_X}[−X] − α^min(Q_X) − m. We can also assume without loss of generality that ρ(0) = 0. Consider the set

B := {Y ∈ 𝒳 | ρ(Y) < 0}. (1.17)

It is clear that B is non-empty. We’ll prove that B is open in 𝒳. To prove this it is sufficient to prove that 𝒳 \ B is closed in 𝒳. Take a sequence X_n ∈ 𝒳 \ B, i.e. ρ(X_n) ≥ 0 for all n, such that X_n → X. Because of lemma 1.1, ρ is Lipschitz continuous with respect to the supremum norm and ρ(X_n) → ρ(X). We find that ρ(X) ≥ 0, i.e. X ∈ 𝒳 \ B. The set B is also convex: if we take X, Y ∈ B and λ ∈ [0, 1], then because of the convexity of ρ we have ρ(λX + (1 − λ)Y) ≤ λρ(X) + (1 − λ)ρ(Y) < 0, i.e. λX + (1 − λ)Y ∈ B. Since X ∉ B and a singleton is a convex set we can apply theorem 1.4 to find a non-zero continuous linear functional l on 𝒳 such that

l(X) ≤ inf_{Y∈B} l(Y) =: b. (1.18)

To construct Q_X we’ll use theorem 1.5. For this we will first need to prove that Y ≥ 0 ⇒ l(Y) ≥ 0. Take Y ≥ 0; for all λ > 0 we have λY ≥ 0, and because of monotonicity and ρ(0) = 0 we have ρ(λY) ≤ 0. Furthermore, because of translation invariance we find ρ(1 + λY) = ρ(λY) − 1 < 0. We find that (1 + λY) ∈ B for all λ > 0. Because of the linearity of l we have that l(X) ≤ l(1) + λl(Y) for all λ > 0. If l(Y) < 0 you could, by choosing λ large enough, make sure that l(1) + λl(Y) < l(X), a contradiction. We conclude that l(Y) ≥ 0 if Y ≥ 0.

Now we will prove that l(1) > 0. Since l is a non-zero continuous linear functional, there exists a Y ∈ 𝒳 such that l(Y) ≠ 0 and also l(−Y) = −l(Y) ≠ 0. Hence we can find a Y⁺ and a Y⁻ with Y⁺ ≥ 0 and Y⁻ ≥ 0 such that 0 < l(Y) = l(Y⁺) − l(Y⁻). This representation of Y is not unique, and because of the linearity of l we can pick Y with l(Y) > 0 and a representation of this Y such that ‖Y⁺‖ < 1. Then because 1 − Y⁺ ≥ 0 and the positivity of l we have l(1 − Y⁺) ≥ 0 and l(Y⁺) > 0. Using linearity we find that l(1) = l(Y⁺) + l(1 − Y⁺) > 0.

Now we can use 1.5 to find a $Q_X \in \mathcal{M}_{1,f}$ such that

$$E_{Q_X}[Y] = \frac{l(Y)}{l(1)} \quad \forall Y \in \mathcal{X}. \tag{1.19}$$

It is clear from the definitions of B and Aρ that B ⊂ Aρ. Therefore we have

$$\alpha^{\min}(Q_X) = \sup_{Y \in \mathcal{A}_\rho} E_{Q_X}[-Y] \ge \sup_{Y \in B} E_{Q_X}[-Y] = \sup_{Y \in B} \frac{-l(Y)}{l(1)} = -\inf_{Y \in B} \frac{l(Y)}{l(1)} = \frac{-b}{l(1)}. \tag{1.20}$$

Moreover, we know that Y + ε ∈ B for any $Y \in \mathcal{A}_\rho$ and any ε > 0. Therefore we can conclude, using the epsilon characterisation of the supremum, that the above inequality is an equality, hence $\alpha^{\min}(Q_X) = -b/l(1)$. Using the assumption that ρ(X) = 0 and the fact that l(X) ≤ b, we can conclude that

$$E_{Q_X}[-X] - \alpha^{\min}(Q_X) = \frac{1}{l(1)}\,(b - l(X)) \ge 0 = \rho(X). \tag{1.21}$$

This is what we needed to prove.

The only part which is left to prove is the fact that $\alpha^{\min}$ is the minimal penalty function which represents ρ. Let α be an arbitrary penalty function which represents ρ. Then we need to prove that $\alpha(Q) \ge \alpha^{\min}(Q)$ for all $Q \in \mathcal{M}_{1,f}$. We have that $\rho(X) \ge E_Q[-X] - \alpha(Q)$ for all $X \in \mathcal{X}$ and $Q \in \mathcal{M}_{1,f}$. Therefore

$$\alpha(Q) \ge \sup_{X \in \mathcal{X}} \left(E_Q[-X] - \rho(X)\right) \ge \sup_{X \in \mathcal{A}_\rho} \left(E_Q[-X] - \rho(X)\right) \ge \alpha^{\min}(Q).$$

This concludes the proof.

We have learned that each convex risk measure can be represented using a penalty function. Because coherent risk measures are by definition convex, this is also true for coherent risk measures. We now show that the penalty function of a coherent risk measure has some interesting properties and that the representation 1.13 can be further specified.

Theorem 1.7. The minimal penalty function $\alpha^{\min}$ of a coherent risk measure ρ takes only the values 0 and +∞. In particular a coherent risk measure can be represented by

$$\rho(X) = \max_{Q \in \mathcal{Q}^{\max}} E_Q[-X], \tag{1.22}$$

where $\mathcal{Q}^{\max}$ is defined as

$$\mathcal{Q}^{\max} := \{Q \in \mathcal{M}_{1,f} \mid \alpha^{\min}(Q) = 0\}. \tag{1.23}$$

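Proof. We know from theorem 1.1 that $\mathcal{A}_\rho$ is a convex cone. Hence for all $X \in \mathcal{A}_\rho$ and all λ > 0, $\lambda X \in \mathcal{A}_\rho$. Using theorem 1.6 we know that

$$\alpha^{\min}(Q) = \sup_{X \in \mathcal{A}_\rho} E_Q[-X] = \sup_{\lambda X \in \mathcal{A}_\rho} E_Q[-\lambda X] = \lambda \sup_{\lambda X \in \mathcal{A}_\rho} E_Q[-X] = \lambda\, \alpha^{\min}(Q).$$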

Because this equation must hold for all Q ∈ M1,f and for all λ > 0, we have that αmin(Q) = 0 or αmin(Q) = +∞. It is now clear that 1.22 holds. We would like to remind the reader that in the representation 1.13 of a convex risk measure the Q is not necessarily a probability measure. In the next section we will impose some extra conditions with respect to the space X and the continuity of ρ to obtain an analogous representation in which Q is indeed a probability measure.
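As a small numerical illustration of representation 1.22, the following Python sketch evaluates a worst-case expectation over a finite set of models. The sample space with three outcomes, the set `models` (standing in for $\mathcal{Q}^{\max}$) and the payoff vectors are hypothetical choices made only for this illustration.

```python
def expectation(q, x):
    """Expected value of the payoff vector x under the probability weights q."""
    return sum(qi * xi for qi, xi in zip(q, x))


def worst_case_risk(x, models):
    """rho(X) = max over Q in `models` of E_Q[-X], in the spirit of 1.22."""
    return max(expectation(q, [-xi for xi in x]) for q in models)


# Hypothetical finite set of models on a three-point sample space.
models = [
    (0.5, 0.3, 0.2),
    (0.2, 0.3, 0.5),
    (1 / 3, 1 / 3, 1 / 3),
]

# Hypothetical net payoffs of two positions.
X = [4.0, -1.0, -3.0]
Y = [-2.0, 1.0, 2.0]

rho_X = worst_case_risk(X, models)
rho_Y = worst_case_risk(Y, models)
rho_sum = worst_case_risk([a + b for a, b in zip(X, Y)], models)
rho_2X = worst_case_risk([2 * a for a in X], models)
rho_shift = worst_case_risk([a + 1.0 for a in X], models)

print(rho_X, rho_Y, rho_sum, rho_2X, rho_shift)
```

Under these assumptions the sketch reproduces subadditivity (`rho_sum` does not exceed `rho_X + rho_Y`), positive homogeneity (`rho_2X` equals `2 * rho_X`) and translation invariance (`rho_shift` equals `rho_X - 1`) up to rounding.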

1.4 Robust representation of convex risk measures

In the previous section we considered the situation in which there was no probability measure fixed to the space (Ω, F). In this section we fix a probability measure P on the space (Ω, F) and let $\mathcal{X} = L^\infty := L^\infty(\Omega, \mathcal{F}, P)$. Theorem 1.6 gave us a representation for any convex risk measure. In this section we will only consider risk measures ρ such that

$$\text{if } X = Y \ P\text{-almost surely, then } \rho(X) = \rho(Y). \tag{1.24}$$

We introduce the notion of absolute continuity.

Definition 1.7. $Q \in \mathcal{M}_{1,f}$ is absolutely continuous with respect to $P \in \mathcal{M}_{1,f}$ on the σ-algebra F, and we write $Q \ll P$, if for all A ∈ F

P (A) = 0 ⇒ Q(A) = 0. (1.25)

Notice that if P and Q are probability measures then this definition reduces to the definition of absolute continuity of two probability measures.

Lemma 1.5. Let ρ be a convex risk measure that satisfies 1.24 and which is represented by a penalty function α as in 1.11. Then α(Q) = +∞ for any $Q \in \mathcal{M}_{1,f}(\Omega, \mathcal{F})$ which is not absolutely continuous with respect to P.

Proof. Take $Q \in \mathcal{M}_{1,f}(\Omega, \mathcal{F})$ such that Q is not absolutely continuous with respect to P. Then because Q : F → [0, 1] there exists an A ∈ F such that

$$Q(A) > 0 \quad \text{and} \quad P(A) = 0. \tag{1.26}$$

Take any $X \in \mathcal{A}_\rho$ and define $X_n := X - n I_A$, with

$$I_A(\omega) = \begin{cases} 1, & \text{if } \omega \in A, \\ 0, & \text{if } \omega \notin A. \end{cases}$$

Because P(A) = 0, A is a null-set of P. Since X and $X_n$ only differ on a null-set of P and we have assumed that ρ satisfies 1.24, we have that $\rho(X_n) = \rho(X)$ for all n. Using theorem 1.6 we have that

$$\alpha(Q) \ge \alpha^{\min}(Q) \ge E_Q[-X_n] = E_Q[-X + n I_A] = E_Q[-X] + nQ(A). \tag{1.27}$$

Because Q(A) > 0 we have that EQ [−X] + nQ(A) → +∞ if n → +∞. We can conclude that α(Q) = +∞.

Now let $\mathcal{M}_1 := \mathcal{M}_1(\Omega, \mathcal{F}, P)$ denote the set of all probability measures which are absolutely continuous with respect to P. From theorem 1.6 we know that each convex risk measure can be represented by a minimal penalty function $\alpha^{\min}$, but in the representation the supremum is taken over all $Q \in \mathcal{M}_{1,f}$. The following theorem characterizes a class of convex risk measures in which the Q is indeed a probability measure and $\alpha^{\min}$ is a penalty function concentrated on $\mathcal{M}_1(P)$. The proof of this theorem is very technical and is outside the scope of this thesis.

Theorem 1.8. Suppose $\rho : L^\infty \to \mathbb{R}$ is a convex risk measure, then the following conditions are equivalent:

1. ρ satisfies the following Fatou property: for any bounded sequence $(X_n)$ which converges P-a.s. to some X,

$$\rho(X) \le \liminf_{n \uparrow \infty} \rho(X_n).$$

2. ρ can be represented by the restriction of the minimal penalty function $\alpha^{\min}$ to the set $\mathcal{M}_1(P)$:

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - \alpha^{\min}(Q)\right), \quad X \in L^\infty. \tag{1.28}$$

Proof. Without proof; the proof can be found in [11].

Instead of proving this theorem we’ll try to give an intuition behind the seemingly technical formula 1.28. Consider the situation where you have some subjective belief P. Consider also the set $\mathcal{M}_1(P)$ of all other probabilistic models which have the property that, if under your subjective belief it is impossible for an event A to happen, then A cannot happen under these other probabilistic models either. For a fixed probabilistic model Q we can interpret $E_Q[X]$ as the expected value of the portfolio under this probabilistic model. Using the interpretation of a risk measure as a capital requirement we can interpret $\rho(X) = E_Q[-X]$ as the risk-free capital you should hold so that your total expected wealth, which consists of the portfolio and the risk-free capital, equals zero. If your portfolio has a positive expected value under Q, then the position X is acceptable. Hence ρ(X) ≤ 0 and $X \in \mathcal{A}_\rho$.

But which probability measure Q should we pick in the definition of ρ? Instead of focusing on a specific probabilistic model, we could consider all plausible probabilistic models $\mathcal{M}_1(P)$. We could define ρ as the capital requirement needed in the worst-case scenario of all these probabilistic models $\mathcal{M}_1(P)$ to make sure our total expected wealth is always at least zero, i.e.

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} E_Q[-X]. \tag{1.29}$$

But we did have some beliefs about the probabilistic model. So we would like to give more importance to probabilistic models which are “more similar” to P than to the models which deviate a lot from P. This is where the penalty function comes in. If we let α(Q) be such that it assigns higher values to probabilistic models Q which deviate a lot from our model P, then they have a smaller influence on the supremum. Now the question is: how do we measure the similarity between two probability measures? One way of doing this is using the notion of relative entropy, or Kullback-Leibler divergence.

Definition 1.8. The Kullback-Leibler divergence or relative entropy for a probability measure Q which is absolutely continuous with respect to a probability measure P is defined as

$$KL(Q|P) = E_Q\left[\ln \frac{dQ}{dP}\right] = \int \frac{dQ}{dP} \ln \frac{dQ}{dP}\, dP, \tag{1.30}$$

where $\frac{dQ}{dP}$ is the Radon-Nikodým derivative of Q with respect to P. If we want to use relative entropy as a penalty function, it is important to check that it takes a minimal value for Q = P. We want to penalize the probabilistic model P the least. This is what we prove in the following lemma.

Lemma 1.6. For all Q ∈ M1(P ) we have KL(Q|P ) ≥ 0. Furthermore we have KL(P |P ) = 0.

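Proof. Let f(x) = x ln(x). Then f is a convex function. By definition we have that

$$\begin{aligned}
KL(Q|P) &= E_P\left[\frac{dQ}{dP} \ln \frac{dQ}{dP}\right] \\
&= E_P\left[f\left(\frac{dQ}{dP}\right)\right] \\
&\ge f\left(E_P\left[\frac{dQ}{dP}\right]\right) \\
&= E_P\left[\frac{dQ}{dP}\right] \ln\left(E_P\left[\frac{dQ}{dP}\right]\right) \\
&= 1 \cdot \ln(1) = 0,
\end{aligned}$$

where the inequality follows from Jensen’s inequality. It is clear that

$$KL(P|P) = E_P\left[\frac{dP}{dP} \ln \frac{dP}{dP}\right] = E_P\left[1 \cdot \ln(1)\right] = 0.$$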
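To make lemma 1.6 concrete, the following Python sketch evaluates the Kullback-Leibler divergence on a finite sample space, where 1.30 reduces to the sum of $q_i \ln(q_i / p_i)$. The reference model `P` and the two alternative models are hypothetical, and the function name `kl_divergence` is our own choice.

```python
import math


def kl_divergence(q, p):
    """KL(Q|P) = sum_i q_i * ln(q_i / p_i) on a finite sample space.

    Requires Q << P: p_i == 0 must imply q_i == 0. Terms with q_i == 0
    contribute nothing, following the convention 0 * ln(0) = 0.
    """
    total = 0.0
    for qi, pi in zip(q, p):
        if qi == 0.0:
            continue
        if pi == 0.0:
            raise ValueError("Q is not absolutely continuous with respect to P")
        total += qi * math.log(qi / pi)
    return total


P = [0.5, 0.3, 0.2]           # hypothetical reference model
Q_near = [0.45, 0.35, 0.20]   # small deviation from P
Q_far = [0.05, 0.05, 0.90]    # large deviation from P

kl_same = kl_divergence(P, P)
kl_near = kl_divergence(Q_near, P)
kl_far = kl_divergence(Q_far, P)

print(kl_same, kl_near, kl_far)
```

In this example the divergence vanishes at Q = P and grows as Q moves away from P, which is the behaviour we want from a penalty function.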

Using the Kullback-Leibler entropy as penalty function we get the following risk measure:

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - KL(Q|P)\right). \tag{1.31}$$

This risk measure is known as an entropic risk measure. We’ll study this risk measure in more detail in chapters four and five. We would like to point out that each risk measure that is defined as

$$\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - \alpha^{\min}(Q)\right), \quad X \in L^\infty, \tag{1.32}$$

is a convex risk measure. To see this it is sufficient to notice that the proof of theorem 1.11 still works if we take the supremum over all $Q \in \mathcal{M}_1(P)$ instead of over all $Q \in \mathcal{M}_{1,f}$. The representation of a convex risk measure in the form of 1.28 is often called the robust representation of a convex risk measure. This name refers to the fact that we don’t pick a fixed probability measure Q to calculate the risk, but consider all possible scenarios. Sometimes the supremum in the representation 1.28 is actually a maximum.

Theorem 1.9. Let ρ be a convex risk measure on $\mathcal{X}$. If ρ is continuous from below, which means that for all sequences $X_n$

$$X_n \nearrow X \text{ pointwise on } \Omega \ \Rightarrow\ \rho(X_n) \searrow \rho(X),$$

then the supremum in 1.28 is a maximum and we have that

$$\rho(X) = \max_{Q \in \mathcal{M}_1(P)} \left(E_Q[-X] - \alpha^{\min}(Q)\right), \quad X \in \mathcal{X}. \tag{1.33}$$

Proof. Without proof, see [11, p192]
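To make the entropic risk measure 1.31 and the attainment of the supremum more tangible, the following Python sketch works on a finite sample space. It relies on the standard variational formula $\sup_Q (E_Q[-X] - KL(Q|P)) = \ln E_P[\exp(-X)]$, with maximiser $Q^*$ proportional to $P \exp(-X)$. This formula is not derived in this chapter and is used here as an assumption. The model `P`, the payoff `X` and all function names are hypothetical.

```python
import math
import random


def kl(q, p):
    """Kullback-Leibler divergence on a finite sample space (0 * ln 0 := 0)."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def objective(q, p, x):
    """E_Q[-X] - KL(Q|P), the expression inside the supremum of 1.31."""
    return sum(-qi * xi for qi, xi in zip(q, x)) - kl(q, p)


P = [0.5, 0.3, 0.2]   # hypothetical reference model with full support
X = [1.0, 0.0, -2.0]  # hypothetical net payoffs

# Assumed closed form of the supremum: ln E_P[exp(-X)].
closed_form = math.log(sum(pi * math.exp(-xi) for pi, xi in zip(P, X)))

# Candidate maximiser: density proportional to exp(-X) with respect to P.
weights = [pi * math.exp(-xi) for pi, xi in zip(P, X)]
normaliser = sum(weights)
q_star = [w / normaliser for w in weights]
value_at_q_star = objective(q_star, P, X)

# Randomly drawn models should never beat the candidate maximiser.
random.seed(0)
best_random = -math.inf
for _ in range(2000):
    raw = [random.random() for _ in P]
    total = sum(raw)
    q = [r / total for r in raw]
    best_random = max(best_random, objective(q, P, X))

print(closed_form, value_at_q_star, best_random)
```

In this example the value at `q_star` agrees with the closed form, and none of the randomly drawn models exceeds it, so the supremum is attained as a maximum.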

2 An introduction to decision theory

2.1 The axioms of von Neumann-Morgenstern

When dealing with risk, individual preferences matter. In the first chapter we defined the acceptance set of a risk measure. The acceptance set contained all acceptable positions $X \in \mathcal{X}$. But we never gave a clear explanation of what an acceptable position is. This is because whether or not you find a specific position acceptable depends on your individual preferences and therefore your risk aversion. In this section we repeat some basic notions from expected utility theory. A central question in this discussion is: “How does a rational investor choose between different portfolios?” This choice is risky, because the returns of the portfolios are uncertain and can be modelled using stochastic variables. The attitude of the investor towards risk can be studied using expected utility theory. Crucial to this theory is the concept of a preference order over a set of lotteries L.

Definition 2.1. A lottery L is defined as a probability measure over a set of outcomes, called the outcome space.

In our case the outcome space will be the real axis. These are all the possible net payoffs of the portfolio, X. The different probability distributions of the net payoffs of the different portfolios are given by the lotteries.

Definition 2.2. A preference order on the set of lotteries L is defined as a binary relation ≽ with the following two axioms:

Axiom 1. (Completeness) ∀L1, L2 ∈ L:

L1 ≽ L2 or L2 ≽ L1.

Axiom 2. (Transitivity) ∀L1, L2, L3 ∈ L:

L1 ≽ L2 and L2 ≽ L3 ⇒ L1 ≽ L3.

Sometimes a preference order has a numerical representation.

Definition 2.3. A numerical representation of a preference order ≽ is a function U : L → R such that

L1 ≽ L2 ⇔ U(L1) ≥ U(L2). (2.1)

A numerical representation is called affine if for all L1, L2 ∈ L and α ∈ [0, 1]:

U(αL1 + (1 − α)L2) = αU(L1) + (1 − α)U(L2). (2.2)

To be sure that there exists a numerical representation and that it is affine, we have to impose two extra axioms.

Axiom 3. (Independence) ∀L1, L2, L3 ∈ L and α ∈ (0, 1], we have that

L1 ≽ L2 ⇒ αL1 + (1 − α)L3 ≽ αL2 + (1 − α)L3.

Using the concept of a compound lottery, we can give an interpretation of the independence axiom. The compound lottery is represented by the distribution αL1 + (1 − α)L3, and can be interpreted as a two-step procedure where first a choice is made between lottery L1 and lottery L3, with probabilities α and 1 − α respectively, and then the chosen lottery is played. The axiom of independence states that if we prefer lottery L1 to lottery L2, we must prefer the compound lottery αL1 + (1 − α)L3 to αL2 + (1 − α)L3.

Axiom 4. (Continuity) ∀L1, L2, L3 ∈ L, the following sets are closed:

{α ∈ [0, 1] | αL1 + (1 − α)L2 ≽ L3} ⊂ [0, 1],

{α ∈ [0, 1] | L3 ≽ αL1 + (1 − α)L2} ⊂ [0, 1].

Theorem 2.1. If ≽ is a preference order that satisfies the axiom of independence and the axiom of continuity, then there exists an affine numerical representation U of ≽. This U is not unique, but it is unique up to an affine transformation. This means that another affine numerical representation $\tilde{U}$ of ≽ is such that $\tilde{U} = aU + b$, with a > 0 and b ∈ R.

Proof. Without proof, see e.g. [11, p. 58].

Sometimes this affine numerical representation has a special form, called the von Neumann-Morgenstern representation.

Definition 2.4. A numerical representation U of a preference order ≽ is a von Neumann-Morgenstern representation if it is of the form

$$U(L) = \int u(x)\, L(dx) \quad \forall L \in \mathcal{L}, \tag{2.3}$$

where u is a real function of the outcomes. We will call this function u the utility function.

In the case that the outcome space is not finite it is generally not guaranteed that the numerical representation will be of the von Neumann-Morgenstern form. But if there is a von Neumann-Morgenstern representation then both U and u are only unique up to affine transformation.

For an interpretation of the von Neumann-Morgenstern representation and to understand why it is useful, consider a fixed preference relation ≽ which has a von Neumann-Morgenstern representation $U(L) = \int u(x)\, L(dx)$. In our context the lottery L can be interpreted as the probability distribution that characterizes the returns of our investment, modelled by a stochastic variable X. We will assume that we can make a loss or a profit, such that the outcome space is the whole real line. Taking the integral over our outcome space R gives us that U(L) = E[u(X)]. In the expected utility framework a rational investor with utility function u will rank different portfolios based on their expected utility:

X1 ≽ X2 ⇔ E[u(X1)] ≥ E[u(X2)]. (2.4)

2.2 Risk and utility

In this section we will only consider investors whose preference order admits a von Neumann-Morgenstern utility representation. The utility function u of such an investor reveals his attitude towards risk.

Definition 2.5. We will call a preference order ≽ (strictly) risk averse if and only if u is (strictly) concave.

This definition should not come as a surprise. From Jensen’s inequality we know that if u is a concave function we have that

u(E[X]) ≥ E[u(X)]. (2.5)

If u is strictly concave, the inequality is strict whenever X is not almost surely constant. In the expected utility context, Jensen’s inequality states that when a rational risk averse investor has to choose between taking a gamble X or getting a certain amount equal to the expected payoff of the gamble E[X], he will prefer the certain amount. This is because his utility for taking the certain amount, u(E[X]), is higher than the expected utility he receives when he takes the gamble, i.e. E[u(X)]. Similarly, if the utility function of an investor is convex this means the investor is risk loving. And if the utility function of an investor is a linear function it means that the investor is risk neutral. This gives us another way to look at utility functions in the context of financial mathematics. Concave utility functions can be viewed as a way to make risky investments less valuable. Conceptually it is comparable to the discount factor used to make future payoffs less valuable. If the utility function u is twice differentiable, we could analyse the concavity of a utility function using the second derivative of this function. However, two remarks must be made about this approach. Firstly, the second derivative is a local measure and will reflect the local risk aversion. Secondly, it is impossible to compare the risk aversion of two utility functions using only the second derivative. This second

problem is a direct consequence of the fact that the von Neumann-Morgenstern utility representation is only unique up to an affine transformation. The utility functions $u_1(x) = -x^2$ and $u_2(x) = -2x^2$ have as second derivatives respectively −2 and −4. But both utility functions can represent the same preference order. One way to deal with this problem is to use the Arrow-Pratt coefficient of absolute risk aversion.

Definition 2.6. The Arrow-Pratt coefficient of absolute risk aversion of a twice differentiable utility function u is defined as

$$r_A(x) = -\frac{u''(x)}{u'(x)}. \tag{2.6}$$

We will use the notation $r_A^u(x)$ if we want to specify the used utility function u. By dividing u''(x) by u'(x) we have made sure that all affine transformations of a utility function u have the same coefficient of absolute risk aversion. The minus sign makes sure that positive values of the coefficient of absolute risk aversion reflect a risk averse investor. For an interpretation of the numerical value of this coefficient consider

$$r_A(x) = -\frac{u''(x)}{u'(x)} = -\frac{du'/dx}{u'} = -\frac{du'/u'}{dx}, \tag{2.7}$$

where u'(x) is called the marginal utility. The marginal utility measures the increase of utility per unit of increase in payoff of the portfolio. It would be rational to assume non-saturation, that is u'(x) ≥ 0, which means that the utility is non-decreasing when the payoff of the portfolio increases. The Arrow-Pratt coefficient of absolute risk aversion can be interpreted as the percentage decrease in marginal utility per unit of increase in net payoffs of the portfolio. E.g. if $r_A = 0.01$ this means that in the neighbourhood of x the investor’s marginal utility is decreasing at the rate of 1% per unit of increase in the net payoff. As a little remark we would like to point out that if the net payoff of the portfolio is expressed in euro, then the unit of $r_A(x)$ is 1/euro. However, generally the units of the Arrow-Pratt measure of absolute risk aversion are omitted, and we’ll do the same. Given the functional form of the Arrow-Pratt measure of risk aversion, you could find a utility function satisfying it by solving the following second order linear differential equation:

$$u''(x) + r_A(x)\,u'(x) = 0. \tag{2.8}$$

Solving this equation is fairly straightforward and we’ll do this in the next theorem.

Theorem 2.2. The solutions to equation 2.8 are given by

$$u(x) = C_1 \int_1^x \exp\left(\int_1^\eta -r_A(\zeta)\, d\zeta\right) d\eta + C_2, \tag{2.9}$$

where $C_1$ and $C_2$ are two constants and $r_A(x)$ is the Arrow-Pratt coefficient of absolute risk aversion.

Proof. Using the substitution v(x) = u'(x) in equation 2.8, we find

$$v'(x) = -r_A(x)\, v(x).$$

Rewriting this we find

$$\frac{v'(x)}{v(x)} = -r_A(x) \iff \left(\ln v(x)\right)' = -r_A(x).$$

Integrating both sides we get

$$\ln(v(\eta)) = \int_1^\eta -r_A(\zeta)\, d\zeta + C.$$

This gives us, with $C_1 = e^C$,

$$v(\eta) = C_1 \exp\left(\int_1^\eta -r_A(\zeta)\, d\zeta\right).$$

Since v(x) = u'(x) we find that

$$u(x) = C_1 \int_1^x \exp\left(\int_1^\eta -r_A(\zeta)\, d\zeta\right) d\eta + C_2.$$
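Formula 2.9 can also be checked numerically. The following Python sketch evaluates the double integral with a composite Simpson rule for a hypothetical constant coefficient `a = 0.5`, compares the result with the closed form of the integral for that case, and then recovers the coefficient from the computed utility by finite differences. The quadrature rule, step sizes and function names are our own choices for this illustration.

```python
import math


def simpson(f, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def utility_from_risk_aversion(r_A, x, C1=1.0, C2=0.0):
    """Evaluate formula 2.9 by numerical quadrature."""
    def marginal_utility(eta):
        # v(eta) = exp(-integral of r_A from 1 to eta), with C1 factored out
        return math.exp(-simpson(r_A, 1.0, eta))
    return C1 * simpson(marginal_utility, 1.0, x) + C2


a = 0.5  # hypothetical constant coefficient of absolute risk aversion


def r_const(zeta):
    return a


def u(x):
    return utility_from_risk_aversion(r_const, x)


def arrow_pratt(utility, x, h=1e-2):
    """Finite-difference approximation of -u''(x) / u'(x)."""
    first = (utility(x + h) - utility(x - h)) / (2 * h)
    second = (utility(x + h) - 2 * utility(x) + utility(x - h)) / (h * h)
    return -second / first


x0 = 2.0
numeric = u(x0)
closed_form = (1 - math.exp(-a * (x0 - 1))) / a  # integral of exp(-a(eta - 1))
recovered = arrow_pratt(u, x0)

print(numeric, closed_form, recovered)
```

The recovered coefficient agrees with `a` up to the discretisation error of the finite differences, which is what theorem 2.2 predicts for this choice of $r_A$.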

2.3 Certainty equivalents

Certainty equivalents will play a crucial role in this thesis, since they are strongly linked to risk measures. In this section we will take a look at three different certainty equivalents. We will consider the ordinary certainty equivalent, the optimised certainty equivalent and the certainty equivalent resulting from the zero-utility principle.

2.3.1 The ordinary certainty equivalent

Definition 2.7. The (ordinary) certainty equivalent $CE_u(X)$ of a stochastic variable X with distribution L is the amount of money for which an individual is indifferent between X and the certain amount $CE_u(X)$. This means

$$u\left(CE_u(X)\right) = \int u(x)\, L(dx) = E[u(X)]. \tag{2.10}$$

Sometimes we will use the notation $CE_u^L(X)$ if we explicitly want to specify the distribution L of X.

If an investor is risk averse we have that u(E[X]) ≥ E[u(X)]. Assuming u is strictly increasing, so that $u^{-1}$ exists, this means that

$$CE_u(X) = u^{-1}\left(E[u(X)]\right) \le E[X]. \tag{2.11}$$

This coincides with our intuition about risk aversion. When faced with the choice between a gamble X and a risk free amount $CE_u(X)$, it is possible that a risk averse investor will choose the certain amount even if it’s less than the expected payoff of the gamble. So far we have seen different ways to assess the attitude of an investor towards risk. The following theorem links these different concepts.
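The inequality 2.11 can be illustrated with a small Python sketch. The lottery below and the logarithmic utility function are hypothetical choices. The logarithm is only defined for positive payoffs, so this example restricts the outcome space to the positive half-line, unlike the general setting of this chapter.

```python
import math


def certainty_equivalent(outcomes, probs, u, u_inv):
    """CE_u(X) = u^{-1}(E[u(X)]), as in equation 2.11."""
    expected_utility = sum(p * u(x) for x, p in zip(outcomes, probs))
    return u_inv(expected_utility)


# Hypothetical lottery with strictly positive payoffs.
outcomes = [50.0, 100.0, 200.0]
probs = [0.25, 0.50, 0.25]

expected_payoff = sum(p * x for x, p in zip(outcomes, probs))

# Concave, strictly increasing utility u = ln with inverse exp.
ce = certainty_equivalent(outcomes, probs, math.log, math.exp)

# Amount of expected payoff the investor gives up for certainty.
gap = expected_payoff - ce

print(expected_payoff, ce, gap)
```

For this lottery the expected payoff is 112.5, while the certainty equivalent is approximately 100, the geometric mean of the payoffs, so the risk averse investor accepts a certain amount below the expected payoff.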

Theorem 2.3. Given two investors with utility functions $u_1$ and $u_2$ respectively, then the following statements are equivalent.

1. $r_A^{u_2}(x) \ge r_A^{u_1}(x)$ for every x.

2. There exists an increasing concave function φ(·) such that $u_2(x) = \varphi(u_1(x))$ at all x. This means that $u_2$ can be seen as a concave transformation of $u_1$.

3. $CE_{u_2}^L(X) \le CE_{u_1}^L(X)$ for all L.

4. Whenever the second investor with utility function $u_2$ finds a lottery L at least as good as a risk free outcome $\bar{x}$, then the first investor with utility function $u_1$ also finds the lottery L at least as good as $\bar{x}$. So $\int u_2(x)\, L(dx) \ge u_2(\bar{x}) \Rightarrow \int u_1(x)\, L(dx) \ge u_1(\bar{x})$ for all L and $\bar{x}$.

Proof. Without proof, see [20, p191].

All expressions in theorem 2.3 reflect the fact that the second investor is more risk averse than the first investor. Remember that in expected utility theory, a rational investor is able to rank different portfolios using their expected utility. It is easy to see that if the investor has an increasing utility function, this ranking can also be obtained when the investor uses his ordinary certainty equivalent, because:

$$X_1 \succcurlyeq X_2 \iff E[u(X_1)] \ge E[u(X_2)] \iff u^{-1}\left(E[u(X_1)]\right) \ge u^{-1}\left(E[u(X_2)]\right). \tag{2.12}$$

2.3.2 The optimised certainty equivalent

A certainty equivalent which will play an important role in the study of so-called divergence risk measures is the optimised certainty equivalent.

Definition 2.8. The optimised certainty equivalent $OCE_u(X)$ of a stochastic variable X of an investor with a concave utility function u is defined as

$$OCE_u(X) = \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right). \tag{2.13}$$

Before we give the economic intuition behind this certainty equivalent, we will take a closer look at equation 2.13. When we do this, we immediately notice a problem. Consider $u_1(x)$ and $u_2(x) = u_1(x) + b$ with b ≠ 0. From the von Neumann-Morgenstern utility theory we know that both utility functions represent the same preferences. However, because of the linearity of the expected value we have $OCE_{u_2}(X) = OCE_{u_1}(X) + b \ne OCE_{u_1}(X)$. This means that the $OCE_u$ is not invariant under an affine transformation of the utility function, which is an undesirable property for a certainty equivalent and makes the interpretation difficult. The authors of [5], who introduced this certainty equivalent, only defined the optimised certainty equivalent for a limited class $U_0$ of “normalised” utility functions.

Definition 2.9. Let $u : \mathbb{R} \to [-\infty, +\infty)$ be a proper¹ closed concave and non-decreasing utility function with effective domain $\operatorname{dom} u = \{t \in \mathbb{R} \mid u(t) > -\infty\} \ne \emptyset$. Then u is contained in the class of normalised utility functions $U_0$ if u satisfies u(0) = 0 and 1 ∈ ∂u(0), where ∂u(·) is the subdifferential² map of u.

If u is differentiable at 0 then the two normalisation properties of definition 2.9 yield u(0) = 0 and u'(0) = 1. Since the utility functions in $U_0$ are non-decreasing and u(0) = 0, we have for $u \in U_0$ that u(x) ≥ 0 for all x ≥ 0. We also have that for $u \in U_0$ and for all x ∈ R, u(x) ≤ x because of the concavity of the utility function and 1 ∈ ∂u(0).

We will now try to give an intuition behind the definition of the optimised certainty equivalent. Let X denote the net payoff of our portfolio. Then E[u(X)] can be interpreted as the sure present value of the net payoff of our portfolio. Now consider an investor who can choose to receive a part η of the future net payoff of the portfolio X, giving him a total present value of η + E[u(X − η)]. If the investor were to optimise this choice, he would receive $\max_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right)$. However, since it is not always guaranteed that a maximum exists, the optimised certainty equivalent is defined using a supremum. From [4] we have the following properties and proofs.

Theorem 2.4. For $u \in U_0$ the optimised certainty equivalent has the following properties:

1. (Monotonicity) X ≤ Y ⇒ OCEu(X) ≤ OCEu(Y )

2. (Shift additive) For all c ∈ R we have OCEu(X + c) = OCEu(X) + c.

3. (Risk aversion) u(x) ≤ x for all x if and only if OCEu(X) ≤ E [X].

4. (Concavity) For all stochastic variables X1 and X2 and all λ ∈ [0, 1] we have

OCEu(λX1 + (1 − λ)X2) ≥ λ OCEu(X1) + (1 − λ) OCEu(X2).

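Proof. The proofs of these properties follow from straightforward calculations, as demonstrated below.

1. (Monotonicity) Because u is non-decreasing we have that

$$\begin{aligned}
X \le Y &\Rightarrow X - \eta \le Y - \eta \\
&\Rightarrow E[u(X - \eta)] \le E[u(Y - \eta)] \\
&\Rightarrow \eta + E[u(X - \eta)] \le \eta + E[u(Y - \eta)] \\
&\Rightarrow \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right) \le \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(Y - \eta)]\right) \\
&\Rightarrow OCE_u(X) \le OCE_u(Y).
\end{aligned}$$

¹ This means that u(·) is such that u(x) < +∞ for all x and u(x) > −∞ for at least one x.

² For a concave function the subdifferential at $x_0$ is defined as the set $\partial u(x_0) = \{c \in \mathbb{R} \mid u(x) - u(x_0) \le c(x - x_0) \text{ for all } x\}$. In this case 1 ∈ ∂u(0) ⇔ u(x) ≤ x.

2. (Shift additive)

$$\begin{aligned}
OCE_u(X + c) &= \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X + c - \eta)]\right) \\
&= \sup_{\eta \in \mathbb{R}} \left(\eta - c + E[u(X - (\eta - c))]\right) + c \\
&= \sup_{(\eta - c) \in \mathbb{R}} \left(\eta - c + E[u(X - (\eta - c))]\right) + c \\
&= OCE_u(X) + c.
\end{aligned}$$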

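3. (Risk aversion) First suppose u(x) ≤ x for all x, then:

$$\begin{aligned}
OCE_u(X) &= \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right) \\
&\le \sup_{\eta \in \mathbb{R}} \left(\eta + E[X - \eta]\right) \\
&= \sup_{\eta \in \mathbb{R}} \left(\eta + E[X] - \eta\right) \\
&= \sup_{\eta \in \mathbb{R}} E[X] \\
&= E[X].
\end{aligned}$$

Now suppose $OCE_u(X) \le E[X]$ for all X. Then we have:

$$\begin{aligned}
\sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right) \le E[X]
&\Rightarrow \eta + E[u(X - \eta)] \le E[X] \quad \forall \eta \in \mathbb{R} \\
&\Rightarrow E[u(X - \eta)] \le E[X - \eta] \quad \forall \eta \in \mathbb{R} \\
&\Rightarrow E[u(X)] \le E[X].
\end{aligned}$$

Since this is true for all X, it is in particular true for all constants x ∈ R. Hence we can conclude that u(x) ≤ x.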

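4. (Concavity) For all λ ∈ [0, 1], let $X_\lambda = \lambda X_1 + (1 - \lambda) X_2$. Because of the concavity of u we have for all $\eta_1, \eta_2 \in \mathbb{R}$ that

$$E[u(\lambda X_1 + (1 - \lambda) X_2 - \lambda \eta_1 - (1 - \lambda) \eta_2)] \ge \lambda E[u(X_1 - \eta_1)] + (1 - \lambda) E[u(X_2 - \eta_2)].$$

Notice that $\lambda \eta_1 + (1 - \lambda) \eta_2 \in \mathbb{R}$. Adding this to both sides we find that

$$\lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(X_\lambda - \lambda \eta_1 - (1 - \lambda) \eta_2)] \ge \lambda \eta_1 + \lambda E[u(X_1 - \eta_1)] + (1 - \lambda) \eta_2 + (1 - \lambda) E[u(X_2 - \eta_2)].$$

Taking the supremum of both sides we get

$$\sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{\lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(X_\lambda - \lambda \eta_1 - (1 - \lambda) \eta_2)]\right\} \ge \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{\lambda \left(\eta_1 + E[u(X_1 - \eta_1)]\right) + (1 - \lambda) \left(\eta_2 + E[u(X_2 - \eta_2)]\right)\right\}.$$

Because the mapping $(\eta_1, \eta_2) \mapsto \lambda \eta_1 + (1 - \lambda) \eta_2$ defines a surjection from $\mathbb{R}^2$ to $\mathbb{R}$, we have that

$$\begin{aligned}
OCE_u(X_\lambda) &= \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{\lambda \eta_1 + (1 - \lambda) \eta_2 + E[u(X_\lambda - \lambda \eta_1 - (1 - \lambda) \eta_2)]\right\} \\
&\ge \sup_{\eta_1, \eta_2 \in \mathbb{R}} \left\{\lambda \left(\eta_1 + E[u(X_1 - \eta_1)]\right) + (1 - \lambda) \left(\eta_2 + E[u(X_2 - \eta_2)]\right)\right\} \\
&= \lambda \sup_{\eta_1 \in \mathbb{R}} \left(\eta_1 + E[u(X_1 - \eta_1)]\right) + (1 - \lambda) \sup_{\eta_2 \in \mathbb{R}} \left(\eta_2 + E[u(X_2 - \eta_2)]\right) \\
&= \lambda\, OCE_u(X_1) + (1 - \lambda)\, OCE_u(X_2).
\end{aligned}$$

This proves the concavity property of the optimised certainty equivalent.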
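The four properties of theorem 2.4 can be observed numerically. The following Python sketch approximates the supremum in 2.13 by a ternary search, which is justified here because η ↦ η + E[u(X − η)] is concave. The utility function below, quadratic up to its saturation point, is a hypothetical member of $U_0$ chosen for this illustration, as are the scenario probabilities and payoff vectors.

```python
def u(t):
    """Hypothetical utility in U_0: u(0) = 0, u'(0) = 1, concave, non-decreasing."""
    return t - 0.5 * t * t if t <= 1.0 else 0.5


def oce(x, p, iterations=200):
    """Approximate sup over eta of (eta + E[u(X - eta)]) by ternary search."""
    def f(eta):
        return eta + sum(pi * u(xi - eta) for pi, xi in zip(p, x))

    # For this u the maximiser lies in [min(x) - 1, max(x)], so a wider
    # bracket is safe.
    lo, hi = min(x) - 10.0, max(x) + 10.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f(0.5 * (lo + hi))


# Four equally likely scenarios with hypothetical payoffs.
p = [0.25, 0.25, 0.25, 0.25]
X1 = [-0.5, 0.0, 0.5, 1.0]
X2 = [1.0, 0.2, -0.3, 0.1]
Y = [-0.5, 0.1, 0.7, 1.0]  # Y >= X1 in every scenario

mean_X1 = sum(pi * xi for pi, xi in zip(p, X1))
oce_X1 = oce(X1, p)
oce_X2 = oce(X2, p)
oce_Y = oce(Y, p)
oce_shifted = oce([xi + 2.0 for xi in X1], p)

lam = 0.4
mix = [lam * a + (1 - lam) * b for a, b in zip(X1, X2)]
oce_mix = oce(mix, p)

print(oce_X1, oce_Y, oce_shifted, oce_mix)
```

For this particular utility and payoff vector the maximiser is η = E[X1], which gives `oce_X1` = E[X1] − Var(X1)/2 = 0.09375. This closed form is specific to the example and is not a general statement.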

It is now natural to ask if the optimised certainty equivalent provides the same ranking on the portfolios as the ordinary certainty equivalent, or the expected utility criterion. It is stated in [5] that this will not always be the case. Theorem 2.6 links the optimised certainty equivalent and the ordinary certainty equivalent. In this theorem we need to assume that the supremum in the definition of $OCE_u(X)$ is attained for an η ∈ R. This will be the case if the support of X is a closed bounded interval. From [5] we have the following theorems and proof.

Theorem 2.5. If $u \in U_0$ and if X is a stochastic variable whose support is a closed bounded interval, then the supremum in the definition of the optimised certainty equivalent is attained, i.e.

$$\exists \eta \in \mathbb{R} : OCE_u(X) = \eta + E[u(X - \eta)].$$

Proof. Without proof, see [5].

Theorem 2.6. If X and Y are stochastic variables with a compact support, then

OCEu(X) ≥ OCEu(Y ) ∀u ∈ U0 ⇔ CEu(X) ≥ CEu(Y ) ∀u ∈ U0. (2.14)

Proof. First assume that $CE_u(X) \ge CE_u(Y)$ for all $u \in U_0$. Using that u is nondecreasing we find that for all $u \in U_0$ and η ∈ R:

$$\begin{aligned}
CE_u(X) \ge CE_u(Y) &\Rightarrow E[u(X)] \ge E[u(Y)] \\
&\Rightarrow \eta + E[u(X - \eta)] \ge \eta + E[u(Y - \eta)] \\
&\Rightarrow \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(X - \eta)]\right) \ge \sup_{\eta \in \mathbb{R}} \left(\eta + E[u(Y - \eta)]\right) \\
&\Rightarrow OCE_u(X) \ge OCE_u(Y),
\end{aligned}$$

where the first two implications follow from the fact that u is nondecreasing. Now assume that $OCE_u(X) \ge OCE_u(Y)$ for all $u \in U_0$. Because X and Y have compact supports, the supremum in the optimised certainty equivalents is attained. Hence for every $u \in U_0$ there exist $\eta_X^u$ and $\eta_Y^u$ such that $OCE_u(X) = \eta_X^u + E[u(X - \eta_X^u)]$ and $OCE_u(Y) = \eta_Y^u + E[u(Y - \eta_Y^u)]$. We have for any $u \in U_0$ that

$$OCE_u(X) = \eta_X^u + E[u(X - \eta_X^u)] \ge \eta_Y^u + E[u(Y - \eta_Y^u)] \ge \eta_X^u + E[u(Y - \eta_X^u)].$$

We conclude that for any $u \in U_0$, $E[u(X - \eta_X^u)] \ge E[u(Y - \eta_X^u)]$, which implies that E[u(X)] ≥ E[u(Y)]. Again using the fact that u is nondecreasing this implies that $CE_u(X) \ge CE_u(Y)$. We would like to remark that we do not necessarily need the fact that both X and Y have compact support. We only need to be sure that the supremum is attained for all utility functions u.

2.3.3 The u-Mean certainty equivalent

Definition 2.10. The u-Mean certainty equivalent, Mu(X) of a stochastic variable X is defined by the equation

E [u (X − Mu(X))] = 0. (2.15) Where u is a strictly increasing utility function. The equation 2.15 is known as the principle of zero utility.

Notice that the u-Mean certainty equivalent also has the problem that it is not invariant under an affine transformation of the utility function u. When u is non-decreasing one can give a more general definition of the u-Mean certainty equivalent,

$$M_u(X) = \sup\{m \in \mathbb{R} \mid E[u(X - m)] \ge 0\}. \tag{2.16}$$

In the fourth chapter of this thesis we will derive a relation between the u-Mean certainty equivalent and the optimised certainty equivalent. We will also show that under reasonable assumptions, the zero-utility principle has a unique solution.

2.4 The exponential utility function

The exponential utility function will be an important utility function in this thesis, since it is strongly connected with the concept of an entropic risk measure. In this section I will therefore apply the concepts described above to the exponential utility function. The exponential utility function occurs when we model an investor with constant absolute risk aversion. Let a, with a > 0 be the coefficient of absolute risk aversion of a risk averse investor. Using theorem 2.2 we find that

$$u(x) = C_1 \int_1^x \exp\left(\int_1^\eta -a\, d\zeta\right) d\eta + C_2.$$

Calculating these integrals we have that

$$u(x) = \frac{-C_1}{a} \left(\exp(-a(x - 1)) - 1\right) + C_2.$$

We will choose the constants $C_1$ and $C_2$ such that $u \in U_0$. The condition u'(0) = 1 gives us that $C_1 = \exp(-a)$, and the condition u(0) = 0 gives us that $C_2 = \frac{1}{a}\left(1 - \exp(-a)\right)$. Using these constants the utility function becomes

$$u(x) = \frac{1 - \exp(-ax)}{a}. \tag{2.17}$$

We will take a look at the different kinds of certainty equivalents. It is stated in [5] that for the exponential utility function 2.17 the ordinary certainty equivalent, the optimised certainty equivalent and the u-Mean certainty equivalent coincide. We will provide a proof of this statement in this thesis.

Theorem 2.7. If $u(x) = \frac{1 - \exp(-ax)}{a}$ then

$$CE_u(X) = OCE_u(X) = M_u(X) = -\frac{1}{a} \ln\left(E[\exp(-aX)]\right). \tag{2.18}$$

Proof. Using the definitions from section 2.3, we’ll compute all the different certainty equivalents.

• The ordinary certainty equivalent $CE_u(X)$

Notice that $u^{-1}(x) = \frac{-\ln(1 - ax)}{a}$. Hence

$$\begin{aligned}
CE_u(X) &= u^{-1}\left(E[u(X)]\right) \\
&= \frac{-\ln\left(1 - a E\left[\frac{1 - \exp(-aX)}{a}\right]\right)}{a} \\
&= \frac{-1}{a} \ln\left(1 - E[1 - \exp(-aX)]\right) \\
&= \frac{-1}{a} \ln\left(E[\exp(-aX)]\right).
\end{aligned}$$

• The optimised certainty equivalent $OCE_u(X)$

Consider the function f(η) = η + E[u(X − η)]. For the optimised certainty equivalent we’re interested in the supremum of this function. If f(η) has a maximum, this maximum will be equal to the supremum. The first order condition gives

$$\begin{aligned}
0 &= \frac{d}{d\eta} \left(\eta + E[u(X - \eta)]\right) \\
&= 1 + \frac{d}{d\eta} E\left[\frac{1 - \exp(-a(X - \eta))}{a}\right] \\
&= 1 + \frac{d}{d\eta} \frac{1 - \exp(a\eta) E[\exp(-aX)]}{a} \\
&= 1 - \exp(a\eta) E[\exp(-aX)].
\end{aligned}$$

Solving this last equation for η we find

$$\eta^* = \frac{-\ln E[\exp(-aX)]}{a}.$$

To show that the function f(η) attains a maximum at η* it is sufficient to notice that for a > 0 the second derivative $f''(\eta) = -a \exp(a\eta) E[\exp(-aX)]$ is negative. We can conclude that

$$\begin{aligned}
OCE_u(X) &= \eta^* + E\left[\frac{1 - \exp(-a(X - \eta^*))}{a}\right] \\
&= \frac{-\ln E[\exp(-aX)]}{a} + E\left[\frac{1 - \exp\left(-a\left(X + \frac{\ln E[\exp(-aX)]}{a}\right)\right)}{a}\right] \\
&= \frac{-\ln E[\exp(-aX)]}{a} + \frac{1}{a} - \frac{1}{a} E\left[\exp(-aX) \exp\left(-\ln E[\exp(-aX)]\right)\right] \\
&= \frac{-\ln E[\exp(-aX)]}{a} + \frac{1}{a} - \frac{1}{a} \frac{E[\exp(-aX)]}{E[\exp(-aX)]} \\
&= \frac{-\ln E[\exp(-aX)]}{a}.
\end{aligned}$$

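• The u-Mean $M_u(X)$

We will show that $M_u(X) = \frac{-\ln E[\exp(-aX)]}{a}$ satisfies the principle of zero utility.

$$\begin{aligned}
E[u(X - M_u(X))] &= \frac{1}{a} E\left[1 - \exp\left(-aX - \ln E[\exp(-aX)]\right)\right] \\
&= \frac{1}{a} E\left[1 - \exp(-aX) \exp\left(-\ln E[\exp(-aX)]\right)\right] \\
&= \frac{1}{a} \left(1 - \frac{E[\exp(-aX)]}{E[\exp(-aX)]}\right) \\
&= 0.
\end{aligned}$$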

Because u is a strictly increasing and continuous function Mu(X) is the unique solution of the zero-utility principle.
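Theorem 2.7 can be verified numerically on a small example. The following Python sketch computes the three certainty equivalents independently for a hypothetical discrete lottery and a hypothetical coefficient `a = 2`: the ordinary one through $u^{-1}$, the optimised one by a ternary search over η, and the u-Mean by bisection on the zero-utility equation 2.15. The search brackets and iteration counts are our own choices.

```python
import math

a = 2.0  # hypothetical coefficient of absolute risk aversion

# Hypothetical discrete lottery.
outcomes = [-1.0, 0.0, 0.5, 2.0]
probs = [0.1, 0.4, 0.3, 0.2]


def u(x):
    """Exponential utility function 2.17."""
    return (1.0 - math.exp(-a * x)) / a


def u_inv(y):
    return -math.log(1.0 - a * y) / a


def expect(g):
    """E[g(X)] for the discrete lottery above."""
    return sum(p * g(x) for x, p in zip(outcomes, probs))


# Closed form 2.18.
closed_form = -math.log(expect(lambda x: math.exp(-a * x))) / a

# Ordinary certainty equivalent, equation 2.11.
ce = u_inv(expect(u))


# Optimised certainty equivalent: ternary search on the concave map
# eta -> eta + E[u(X - eta)].
def f(eta):
    return eta + expect(lambda x: u(x - eta))


lo, hi = min(outcomes) - 1.0, max(outcomes) + 1.0
for _ in range(200):
    m1 = lo + (hi - lo) / 3
    m2 = hi - (hi - lo) / 3
    if f(m1) < f(m2):
        lo = m1
    else:
        hi = m2
oce = f(0.5 * (lo + hi))

# u-Mean: bisection on m -> E[u(X - m)], which is decreasing in m.
lo, hi = min(outcomes), max(outcomes)
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if expect(lambda x: u(x - mid)) >= 0.0:
        lo = mid
    else:
        hi = mid
m_u = 0.5 * (lo + hi)

print(closed_form, ce, oce, m_u)
```

All four printed numbers agree up to numerical tolerance in this example, as the theorem states.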

2.5 Stochastic dominance

Expected utility theory states that an investor with a utility function u would prefer X1 to X2 if and only if E [u(X1)] ≥ E [u(X2)]. In this section we will take a look at a situation where X1 is preferred to X2, not only by one investor with a specific utility function u, but by a whole class of investors with different utility functions. This study can be done using the concept of stochastic dominance. The main goal of this section is to introduce the necessary concepts and definitions regarding stochastic dominance so that we can apply them later on in concrete situations. The definitions and theorems in this section are taken from [29]. As always the stochastic variable X will be the net payoff of a portfolio, and it can take negative as well as positive values. We will assume that the distribution function of X is given by F (x) and that its derivative exists, so that the probability density of X is well defined. We will further assume that the utility function u is sufficiently differentiable.

An important concept in the study of stochastic dominance is the n-th order distribution of a stochastic variable X. The n-th order distribution can be defined inductively as follows.

Definition 2.11. The n-th order distribution function, $F^{(n)}(x)$, of a stochastic variable X is inductively defined by
\[ F^{(1)}(x) := F(x), \qquad F^{(n)}(x) := \int_{-\infty}^{x} F^{(n-1)}(u)\,du, \tag{2.19} \]
where $F(x)$ is the cumulative distribution function of X.

Using the notion of n-th order distribution functions we can define the concept of n-th order stochastic dominance.

Definition 2.12. If $X_1$ and $X_2$ are random variables, then $X_1$ dominates $X_2$ in the sense of n-th order stochastic dominance, $X_1 \geq_{SD(n)} X_2$, if
\[ F_1^{(n)}(x) \leq F_2^{(n)}(x) \quad \forall x \in \mathbb{R}, \tag{2.20} \]
where $F_1^{(n)}(x)$ and $F_2^{(n)}(x)$ are the n-th order distributions of $X_1$ and $X_2$.

There is an important link between the concept of n-th order stochastic dominance and expected utility maximisation. This link is given by the following theorem.

Theorem 2.8.
\[ X_1 \geq_{SD(n)} X_2 \iff \mathbb{E}[u(X_1)] \geq \mathbb{E}[u(X_2)] \tag{2.21} \]
for all utility functions $u(x)$ for which $(-1)^k u^{(k)}(x) \leq 0$ for $k \in \{1, 2, 3, \ldots, n\}$ and for all $x$ (with at least one utility function satisfying the inequality).

When we take a closer look at the special case of second order stochastic dominance, we see that it has a clear and easy economic interpretation.

Theorem 2.9.
\[ X_1 \geq_{SD(2)} X_2 \iff \mathbb{E}[u(X_1)] \geq \mathbb{E}[u(X_2)] \tag{2.22} \]
for all utility functions $u(x)$ with $u'(x) \geq 0$ and $u''(x) \leq 0$ for all $x$, where there is at least one utility function $u(x)$ with the property that $u'(x) > 0$ and $u''(x) < 0$.

Second order dominance means that $X_1$ is ranked above $X_2$ if for all risk averse ($u''(x) \leq 0$) and non-saturated ($u'(x) \geq 0$) investors the expected utility of $X_1$ is at least as large as the expected utility of $X_2$. Risk measures can induce an ordering on different portfolios, that is, $X$ is ranked below $Y$ whenever $\rho(X) \geq \rho(Y)$. When this ordering agrees with the ordering that all non-saturated risk averse investors obtain using the von Neumann-Morgenstern criterion, we will say that the risk measure is consistent with second order stochastic dominance. More generally we have the following definition.

Definition 2.13. A risk measure $\rho(X)$ is consistent with n-th order stochastic dominance if
\[ X_1 \geq_{SD(n)} X_2 \Rightarrow \rho(X_1) \leq \rho(X_2). \tag{2.23} \]

Using the definition of the n-th order distribution it is easy to see that one has the following inclusion.

Theorem 2.10. $X_1 \geq_{SD(n)} X_2 \Rightarrow X_1 \geq_{SD(n+1)} X_2$.

Proof. Without proof, see [29, Theorem 4].

Using definition 2.13 and theorem 2.10 we can conclude that the following theorem holds.

Theorem 2.11. A risk measure consistent with (n + 1)-th order stochastic dominance is also consistent with n-th order stochastic dominance.

Proof. Without proof, see [29, Theorem 6].
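The definitions above can be explored numerically. The Python sketch below is only an illustration under assumptions of our own: it uses discrete payoffs (whereas this section assumes that a density exists), it checks the weak inequality (2.20) on a finite grid only, and it relies on the standard identity that the integral of F up to x equals E[(x − X)^+]. The distributions and function names are hypothetical.

```python
def cdf(values, probs, x):
    """First order distribution F(x) = P[X <= x] of a discrete variable."""
    return sum(p for v, p in zip(values, probs) if v <= x)

def second_order(values, probs, x):
    """Second order distribution: integral of F up to x, i.e. E[(x - X)^+]."""
    return sum(p * max(x - v, 0.0) for v, p in zip(values, probs))

def dominates(first, second, order, grid, tol=1e-12):
    """Check F1^(n)(x) <= F2^(n)(x) on the grid for n = 1 or n = 2."""
    fn = cdf if order == 1 else second_order
    return all(fn(*first, x) <= fn(*second, x) + tol for x in grid)

grid = [0.25 * i for i in range(-12, 17)]  # points from -3 to 4

# A payoff and the same payoff shifted up by 1 (assumed example).
base = ([-1.0, 0.0, 2.0], [0.3, 0.4, 0.3])
shifted = ([0.0, 1.0, 3.0], [0.3, 0.4, 0.3])

# A sure payoff of 0 against a fair gamble on -1 and 1 (assumed example).
sure = ([0.0], [1.0])
gamble = ([-1.0, 1.0], [0.5, 0.5])

print(dominates(shifted, base, 1, grid), dominates(shifted, base, 2, grid))
print(dominates(sure, gamble, 1, grid), dominates(sure, gamble, 2, grid))
```

The shifted payoff dominates at first order and therefore also at second order, as theorem 2.10 predicts. The sure payoff dominates the gamble at second order only, which shows that the converse of theorem 2.10 fails.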

3 Value at Risk and Expected shortfall

“Value at Risk is like an air bag that works well all the time except when you have an accident.”

David Einhorn

In this chapter we will take a closer look at two commonly used risk measures, Value at Risk (VaR) and Expected shortfall (ES). We will use the concepts described in chapters one and two to check whether these risk measures have desirable mathematical properties as well as desirable decision theoretic properties.

3.1 Value at Risk

Value at Risk is perhaps the most famous risk measure in existence. Its definition is based on the quantiles of a stochastic variable.

Definition 3.1. The lower $\alpha$ quantile of a stochastic variable X is defined by
\[ x_\alpha := q_\alpha(X) := \inf\{x \in \mathbb{R} \mid P[X \leq x] \geq \alpha\}. \tag{3.1} \]

Definition 3.2. The upper $\alpha$ quantile of a stochastic variable X is defined by
\[ x^\alpha := q^\alpha(X) := \inf\{x \in \mathbb{R} \mid P[X \leq x] > \alpha\}. \tag{3.2} \]

Because $\{x \in \mathbb{R} \mid P[X \leq x] > \alpha\} \subset \{x \in \mathbb{R} \mid P[X \leq x] \geq \alpha\}$ we have that $q_\alpha(X) \leq q^\alpha(X)$. Equality does not hold in general. However in [1] it is stated that equality holds iff $P[X \leq x] = \alpha$ for at most one $x$. We follow [1], [11] and [29] in defining $\mathrm{VaR}_\alpha$ as the smallest value such that the probability of an absolute loss being at most this value is at least $1 - \alpha$. More formally we have the following definition.

Definition 3.3. Fix $\alpha \in (0, 1)$. Then the Value at Risk of a portfolio, where the net payoff is modelled by X, at a level $\alpha$¹ is defined as
\[ \mathrm{VaR}_\alpha(X) = -\inf\{x \in \mathbb{R} \mid P[X \leq x] > \alpha\}. \tag{3.3} \]
We have illustrated the concept of Value at Risk in figure 3.1, where we have assumed that $X \sim N(0, 1)$ and the total blue area equals $\alpha$. In this case the upper and lower quantile are the same.

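Figure 3.1: Value at Risk. [Plot of the density of X against x; the point $-\mathrm{VaR}_\alpha$ is marked on the horizontal axis and the tail area to its left equals $\alpha$.]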
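Definitions 3.1 to 3.3 can be made concrete for a payoff with finitely many outcomes. The Python sketch below is an illustration only: the dictionary representation of a distribution, the function names and the fair coin example are our own assumptions. For such a payoff the infimum in each definition is attained at one of the outcomes, so it suffices to walk through the sorted outcomes.

```python
def lower_quantile(dist, alpha):
    """q_alpha(X) = inf{x : P[X <= x] >= alpha} for dist = {payoff: probability}."""
    cumulative = 0.0
    for payoff in sorted(dist):
        cumulative += dist[payoff]
        if cumulative >= alpha:
            return payoff
    return max(dist)  # only reached through floating point rounding

def upper_quantile(dist, alpha):
    """q^alpha(X) = inf{x : P[X <= x] > alpha} for dist = {payoff: probability}."""
    cumulative = 0.0
    for payoff in sorted(dist):
        cumulative += dist[payoff]
        if cumulative > alpha:
            return payoff
    return max(dist)  # only reached through floating point rounding

def value_at_risk(dist, alpha):
    """VaR_alpha(X) as in (3.3): minus the upper alpha quantile."""
    return -upper_quantile(dist, alpha)

# Fair coin paying 0 or 1 (assumed example): P[X <= x] = 0.5 on all of [0, 1).
coin = {0.0: 0.5, 1.0: 0.5}
print(lower_quantile(coin, 0.5), upper_quantile(coin, 0.5))    # 0.0 1.0
print(lower_quantile(coin, 0.25), upper_quantile(coin, 0.25))  # 0.0 0.0
```

At the level 0.5 the two quantiles differ because P[X ≤ x] = 0.5 holds for more than one x, while at the level 0.25 they coincide. The comparisons with `alpha` are exact floating point comparisons, so probabilities that are not exactly representable may need a small tolerance.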

3.1.1 General properties

We will now apply properties of general risk measures from chapter one to analyse the properties of Value at Risk. It is easy to check that Value at Risk is a monetary risk measure.

Theorem 3.1. VaRα(X) is a monetary risk measure.

Proof. To prove that VaRα is a monetary risk measure, we need to prove that it satisfies both monotonicity and translation invariance. 1. Monotonicity Take X ≤ Y . We need to prove that for all α ∈ (0, 1) we have that VaRα(X) ≥ VaRα(Y ). Notice that

VaRα(X) ≥ VaRα(Y ) ⇔ inf{x|P [X ≤ x] > α} ≤ inf{x|P [Y ≤ x] > α}.

To prove the inequality on the right side we will show that

{x ∈ R|P [Y − x ≤ 0] > α} ⊂ {x ∈ R|P [X − x ≤ 0] > α}. (3.4)

From our assumption X ≤ Y we have that ∀x ∈ R, X − x ≤ Y − x. From this it follows that P [X − x ≤ 0] ≥ P [Y − x ≤ 0]. Take an arbitrary x ∈ {x ∈ R|P [Y − x ≤ 0] > α}, then we have P [Y − x ≤ 0] > α. From this it follows that P [X − x ≤ 0] ≥ P [Y − x ≤ 0] > α. We can conclude that x ∈ {x ∈ R|P [X − x ≤ 0] > α}. This proves (3.4). We conclude that if X ≤ Y then VaRα(X) ≥ VaRα(Y ).

¹ This means on a 100(1 − α) percent confidence level.

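2. Translation invariance
We need to prove that $\mathrm{VaR}_\alpha(X + m) = \mathrm{VaR}_\alpha(X) - m$. This follows from a straightforward calculation.
\begin{align*}
\mathrm{VaR}_\alpha(X + m) &= -\inf\{x \in \mathbb{R} \mid P[X + m \leq x] > \alpha\} \\
&= -\inf\{x \in \mathbb{R} \mid P[X \leq x - m] > \alpha\} \\
&= -\inf\{x + m \in \mathbb{R} \mid P[X \leq x] > \alpha\} \\
&= -\inf\{x \in \mathbb{R} \mid P[X \leq x] > \alpha\} - m \\
&= \mathrm{VaR}_\alpha(X) - m.
\end{align*}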

This concludes the proof that VaRα(X) is a monetary risk measure.

It is now natural to ask whether Value at Risk is a convex risk measure. From theorem 1.1 of chapter one, we know that if $\mathrm{VaR}_\alpha$ is a convex risk measure, then the associated acceptance set should be convex as well. Unfortunately it turns out that $\mathrm{VaR}_\alpha$ is not a convex risk measure. This is undesirable because it means that Value at Risk can penalise more diversified portfolios. We will illustrate the fact that $\mathrm{VaR}_\alpha$ is not a convex risk measure using a simplified example. Let the risk-free rate be 0%. Consider a zero-coupon bond which costs 100, pays out 101 and has a default probability of 0.0095. Denote the net payoff of an investment in the bond by X. Then
\[ X = \begin{cases} -100, & \text{when the bond defaults,} \\ 1, & \text{otherwise.} \end{cases} \tag{3.5} \]

It is easy to see that the Value at Risk at the 99% confidence level is $\mathrm{VaR}_{0.01}(X) = -1$. This is because the probability of default is below the 1% level at which we calculate the Value at Risk. The default is considered too unlikely to be taken into account by $\mathrm{VaR}_{0.01}$. Because $\mathrm{VaR}_{0.01}(X) = -1 \leq 0$, we have that X is an acceptable position, i.e. $X \in \mathcal{A}_{\mathrm{VaR}_{0.01}}$. Now consider a second bond which has exactly the same default risk, payoff and price as the first bond. If the net payoff of the investment in this second bond is modelled by Y, then it is clear that $\mathrm{VaR}_{0.01}(Y) = -1$ and thus $Y \in \mathcal{A}_{\mathrm{VaR}_{0.01}}$. Hence both X and Y are acceptable positions. Now assume that the default of the first bond is independent of the default of the second bond. If Value at Risk were a convex risk measure, then the more diversified portfolio with payoff $P = \frac{1}{2}X + \frac{1}{2}Y$ should be an acceptable position as well. Using the independence of X and Y we find that P has the following distribution.
\[ P = \begin{cases} -100, & \text{when both bonds default, } p = (0.0095)^2, \\ -\frac{99}{2}, & \text{when precisely one bond defaults, } p = 2 \cdot 0.0095(1 - 0.0095), \\ 1, & \text{otherwise.} \end{cases} \tag{3.6} \]
The probability that at least one bond will default equals $(0.0095)^2 + 2 \cdot 0.0095(1 - 0.0095) = 0.0189$, which exceeds 0.01, while the probability that both bonds default does not. Hence the Value at Risk of the diversified portfolio is $\mathrm{VaR}_{0.01}(P) = \frac{99}{2}$. Because this value is positive, the portfolio P is not an acceptable position, and the acceptance set $\mathcal{A}_{\mathrm{VaR}_{0.01}}$ is not convex.
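The bond example can be replayed in a few lines of Python. This is a sketch for illustration: the dictionary representation and the helper `value_at_risk` are our own hypothetical choices, and the function implements definition (3.3) for payoffs with finitely many outcomes only.

```python
from collections import defaultdict
from itertools import product

def value_at_risk(dist, alpha):
    """VaR_alpha(X) = -inf{x : P[X <= x] > alpha} for dist = {payoff: probability}."""
    cumulative = 0.0
    for payoff in sorted(dist):
        cumulative += dist[payoff]
        if cumulative > alpha:
            return -payoff
    return -max(dist)  # only reached through floating point rounding

p_default = 0.0095
bond = {-100.0: p_default, 1.0: 1.0 - p_default}  # distribution (3.5)

# Distribution (3.6) of P = X/2 + Y/2 for two independent copies of the bond.
portfolio = defaultdict(float)
for (x, px), (y, py) in product(bond.items(), repeat=2):
    portfolio[0.5 * x + 0.5 * y] += px * py

at_least_one_default = 1.0 - portfolio[1.0]

print(value_at_risk(bond, 0.01))       # -1.0
print(value_at_risk(portfolio, 0.01))  # 49.5
print(round(at_least_one_default, 4))  # 0.0189
```

Both single bonds are acceptable at the level 0.01, whereas the equally weighted portfolio has a positive Value at Risk, which reproduces the failure of convexity described above.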

Although we constructed a concrete example for α = 0.01, it is possible to construct such an example for all α ∈ (0, 1) by choosing the default probability of the bonds small enough. The previous example also reveals another important problem of Value at Risk: it can ignore potentially very large losses. Consider again the example of a bond which costs b, has a positive net return r and defaults with probability p < α. Then VaRα(X) = −r ≤ 0 no matter the value of b. If the bond defaults the payoff is −b, but because the default probability is too low, Value at Risk is not affected by this potentially very large loss.

Figure 3.2: Density function of S1.

Figure 3.3: Standard normal density function.

Like all risk measures in this thesis, VaR tries to summarise the distribution of a portfolio into one number which should reflect the level of risk. Hence it is inevitable that some information regarding the complete distribution of the stochastic variable is lost. It is however important to be aware of this information loss. Because Value at Risk is defined as a quantile, it does not incorporate well the information about the shape of the left tail of the density function. We will illustrate this with a theoretical example. Remember that the normal density is given by
\[ f(x, \mu, \sigma^2) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right). \tag{3.7} \]

Now consider $S_1$ such that the net payoffs have the following density function
\[ g(x) = 0.99 f(x, 0, 1.002974) + 0.01 f(x, -8, 0.04). \tag{3.8} \]
This density function is plotted in figure 3.2. Next to this figure the density of the standard normal is plotted. Apart from the spike which occurs around −8, both density functions are very similar. In fact if we calculate the Value at Risk at a 95% confidence level of $S_1$ we find that $\mathrm{VaR}_{0.05}(S_1) = 1.64485$. If we now consider $S_2$, for which the net payoffs have a standard normal distribution, then we find that the Value at Risk at a 95% confidence level is the same, i.e. $\mathrm{VaR}_{0.05}(S_2) = 1.64485$. Although the Value at Risk at the 95% confidence level is the same, the risk associated with both investments is definitely not. For $S_1$ there is a 0.5% probability that the loss exceeds 8, while for $S_2$ this probability is negligible². This problem of Value at Risk is addressed by other risk measures such as Expected shortfall.

² $p = 6.22 \cdot 10^{-16}$

3.1.2 Consistency with expected utility maximisation

In this subsection we will take a closer look at some of the results reported in [29] regarding the consistency of Value at Risk with expected utility maximisation. For this we will rely on the definitions and theorems introduced in chapter two regarding stochastic dominance. In [29] it is stated that Value at Risk is consistent with first order stochastic dominance. This should not be surprising, since Value at Risk is defined as a quantile.

Theorem 3.2. VaR is consistent with first-order stochastic dominance. This means
\[ X_1 \geq_{SD(1)} X_2 \Rightarrow \mathrm{VaR}_\alpha(X_1) \leq \mathrm{VaR}_\alpha(X_2). \tag{3.9} \]

Theorem 3.3. Without proof, follows from [19, Theorem 1’].

Instead of copying the proof of this theorem, we will give an example which illustrates the fact that $X_1 \geq_{SD(1)} X_2$ is a very strong assumption. Consider two stocks and let $X_1$ and $X_2$ denote the net payoffs of these stocks. Assume $X_1 \sim N(1, 1)$ and $X_2 \sim N(0, 1)$. It is known that the cumulative distribution function of a normal distribution with mean $\mu$ and standard deviation $\sigma$ is given by
\[ F(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right], \quad \text{where } \operatorname{erf}(x) := \frac{1}{\sqrt{\pi}} \int_{-x}^{x} \exp(-t^2)\,dt. \tag{3.10} \]
We have that
\begin{align*}
F_1(x) &= \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - 1}{\sqrt{2}}\right)\right], \\
F_2(x) &= \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x}{\sqrt{2}}\right)\right].
\end{align*}
Because the error function is an increasing function, we have that $F_1(x) \leq F_2(x)$ for all $x \in \mathbb{R}$. By definition of first order stochastic dominance this implies that $X_1 \geq_{SD(1)} X_2$. We have plotted these cumulative distributions in figure 3.4 together with the line $y = 0.05$. By definition the x-coordinate of the intersection of this line with the cumulative distribution of $X_1$ equals $-\mathrm{VaR}_{0.05}(X_1) = -0.6449$. Similarly we find that $-\mathrm{VaR}_{0.05}(X_2) = -1.6449$. We have that $\mathrm{VaR}_{0.05}(X_1) \leq \mathrm{VaR}_{0.05}(X_2)$. From figure 3.4 it is clear that this inequality would hold for any $\alpha \in (0, 1)$. Hence we have that
\[ X_1 \geq_{SD(1)} X_2 \Rightarrow \mathrm{VaR}_\alpha(X_1) \leq \mathrm{VaR}_\alpha(X_2). \tag{3.11} \]
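The numbers in this example can be reproduced with the `NormalDist` class from the Python standard library (available from Python 3.8). The sketch below is only an illustration: the grid on which the distribution functions are compared and the list of levels α are arbitrary choices of ours, and the helper name is hypothetical.

```python
from statistics import NormalDist

x1 = NormalDist(mu=1.0, sigma=1.0)
x2 = NormalDist(mu=0.0, sigma=1.0)

def value_at_risk(dist, alpha):
    """VaR_alpha = -quantile; for a continuous distribution the upper and
    lower alpha quantiles coincide, so inv_cdf can be used directly."""
    return -dist.inv_cdf(alpha)

# Compare the two distribution functions on a finite grid from -6 to 7.
grid = [i / 10 for i in range(-60, 71)]
first_order = all(x1.cdf(x) <= x2.cdf(x) for x in grid)

# Compare Value at Risk at several levels alpha.
levels = [0.01, 0.05, 0.25, 0.5, 0.75, 0.95]
var_ordered = all(value_at_risk(x1, a) <= value_at_risk(x2, a) for a in levels)

print(first_order, var_ordered)
print(round(value_at_risk(x1, 0.05), 4), round(value_at_risk(x2, 0.05), 4))
```

Since the two distributions differ only by a shift of one unit, the Value at Risk of X1 is exactly one unit below that of X2 at every level.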

Figure 3.4: First order stochastic dominance and VaR

We realize the above example is a rather theoretical one. However it does illustrate an important point. The condition that $F_1(x) \leq F_2(x)$ for all $x \in \mathbb{R}$ is very restrictive. A less severe restriction would be that $F_1^{(2)}(x) \leq F_2^{(2)}(x)$ for all $x \in \mathbb{R}$, where $F^{(2)}$ denotes the second order distribution. In [29] it is stated that Value at Risk is in general not consistent with second order stochastic dominance. This means that in general we have that
\[ X_1 \geq_{SD(2)} X_2 \nRightarrow \mathrm{VaR}_\alpha(X_1) \leq \mathrm{VaR}_\alpha(X_2). \tag{3.12} \]
However in [29] we also find an important exception.

Theorem 3.4. VaR is consistent with second order stochastic dominance when portfolios’ profits and losses have an elliptical distribution³ with finite variance and the same mean.

Proof. Without proof, see [29, Theorem 14].

Again we will not repeat the proof here, but we will construct an example to get a better understanding of the concept of second order stochastic dominance and the assumptions made in theorem 3.4. Consider again two stocks such that their net payoffs are given by $X_1$ and $X_2$ respectively. Assume that $X_1 \sim N(\mu, \sigma_1^2)$ and $X_2 \sim N(\mu, \sigma_2^2)$, with $\sigma_1 < \sigma_2$, and denote by $F_1^{(1)}$ and $F_2^{(1)}$ their cumulative distributions. It is known that a normal distribution is an elliptical distribution. Then for $i = 1$ and $i = 2$ we have that
\[ F_i^{(1)}(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma_i\sqrt{2}}\right)\right]. \tag{3.13} \]

³ An n-dimensional random vector $R = [R_1, R_2, \ldots, R_n]^T$ has an elliptical distribution if the density function of R (denoted by $f(R)$) can be represented with a function $\varphi(\cdot, n)$ as
\[ f(R, \theta, \Sigma) = \frac{1}{|\Sigma|^{1/2}}\,\varphi\left((R - \theta)^T \Sigma^{-1} (R - \theta), n\right), \]
where $\Sigma$ is an $n \times n$ positive definite matrix and $\theta$ is an n-dimensional column vector.

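By assumption we also have that
\[ \sigma_1 < \sigma_2 \Rightarrow \frac{1}{\sigma_1\sqrt{2}} > \frac{1}{\sigma_2\sqrt{2}}. \tag{3.14} \]
From this we can conclude that
\[ \begin{cases} \frac{x - \mu}{\sigma_1\sqrt{2}} > \frac{x - \mu}{\sigma_2\sqrt{2}}, & \text{when } x > \mu, \\ \frac{x - \mu}{\sigma_1\sqrt{2}} < \frac{x - \mu}{\sigma_2\sqrt{2}}, & \text{when } x < \mu, \\ \frac{x - \mu}{\sigma_1\sqrt{2}} = \frac{x - \mu}{\sigma_2\sqrt{2}}, & \text{when } x = \mu. \end{cases} \tag{3.15} \]
Using the fact that the error function erf is an increasing function we find that
\[ \begin{cases} F_1^{(1)}(x) > F_2^{(1)}(x), & \text{when } x > \mu, \\ F_1^{(1)}(x) < F_2^{(1)}(x), & \text{when } x < \mu, \\ F_1^{(1)}(x) = F_2^{(1)}(x), & \text{when } x = \mu. \end{cases} \tag{3.16} \]
From this we can conclude that the necessary condition for first order stochastic dominance is not satisfied. Graphically this is illustrated in figure 3.5, in which we have taken $\mu = 0$, $\sigma_1 = 1$ and $\sigma_2 = 3$. Although the condition for first order stochastic dominance is not fulfilled, the condition for second order stochastic dominance is. The second order distributions are plotted in figure 3.6. We will have that
\[ X_1 \sim N(\mu, \sigma_1^2) \text{ and } X_2 \sim N(\mu, \sigma_2^2), \text{ with } \sigma_1 < \sigma_2 \Rightarrow F_2^{(2)}(x) \geq F_1^{(2)}(x) \quad \forall x \in \mathbb{R}. \tag{3.17} \]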

Figure 3.5: First order distributions for X1 ∼ N(0, 1) and X2 ∼ N(0, 9).

Figure 3.6: Second order distributions for X1 ∼ N(0, 1) and X2 ∼ N(0, 9).

To see why (3.17) is true, remember that for $i = 1$ and $i = 2$ we have by definition that
\[ F_i^{(2)}(x) = \int_{-\infty}^{x} F_i^{(1)}(u)\,du. \tag{3.18} \]
Using equations (3.16) we see that $F_1^{(1)}$ and $F_2^{(1)}$ intersect each other at $x = \mu$. Using the assumption that both $X_1$ and $X_2$ have the same mean and the properties of the normal distribution we see that this intersection happens at $F_1^{(1)}(\mu) = F_2^{(1)}(\mu) = 0.5$. When calculating the second order distribution, we in fact calculate the area under the first order distribution from $-\infty$ to some point $x$. We know from (3.16) that for $x < \mu$ we have that $F_2^{(1)}(x) > F_1^{(1)}(x)$. Hence when calculating the second order distribution up to some $x < \mu$ we accumulate extra area; this difference in area is labelled A in figure 3.5. We can also clearly see this accumulating effect in figure 3.6, where the difference between $F_2^{(2)}$ and $F_1^{(2)}$ grows until $x = \mu$. In this same figure we also notice that after this point the difference between $F_2^{(2)}$ and $F_1^{(2)}$ decreases again, but it never becomes negative. This effect results from the fact that for $x > \mu$ we have that $F_1^{(1)}(x) > F_2^{(1)}(x)$, which implies that the excess area between $F_2^{(1)}(x)$ and $F_1^{(1)}(x)$ is negative for $x > \mu$. Using the symmetry property of the normal distribution we see that
\[ \int_{-\infty}^{\mu} \left(F_2^{(1)}(x) - F_1^{(1)}(x)\right) dx = -\int_{\mu}^{+\infty} \left(F_2^{(1)}(x) - F_1^{(1)}(x)\right) dx. \tag{3.19} \]
Graphically this means that in figure 3.5 the areas A and B have the same size, but a different sign. Hence when integrating the first order distributions from $-\infty$ to some point $x$, one first accumulates the extra area A, and then loses part of this excess area when $x > \mu$. However it is impossible to lose more than the already accumulated excess area because the absolute value of the area A is the same as that of area B, a fact which follows from equation (3.19). Hence we conclude that (3.17) holds and by definition of second order stochastic dominance we have that:

\[ X_1 \sim N(\mu, \sigma_1^2),\ X_2 \sim N(\mu, \sigma_2^2) \text{ with } \sigma_1 < \sigma_2 \Rightarrow X_1 \geq_{SD(2)} X_2. \tag{3.20} \]

This conclusion coincides with our intuition that the riskiest of two portfolios X1 and X2, with the same expected return, but with different variance, is the portfolio with the largest variance. Applying theorem 3.4 we find that

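\[ X_1 \sim N(\mu, \sigma_1^2),\ X_2 \sim N(\mu, \sigma_2^2) \text{ with } \sigma_1 < \sigma_2 \Rightarrow \mathrm{VaR}_\alpha(X_1) \leq \mathrm{VaR}_\alpha(X_2). \tag{3.21} \]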

We want to stress that in this previous example the fact that X1 and X2 had the same mean, is a crucial assumption. If this constraint would not be fulfilled it would not be guaranteed that Value at Risk is consistent with second order stochastic dominance. We can conclude that although Value at Risk has an easy definition, it also has a lot of shortcomings both from a mathematical point of view and from a decision theoretic point of view.
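The area argument behind (3.17) can be checked numerically for the case plotted in figures 3.5 and 3.6. The Python sketch below is a rough illustration under assumptions of our own: the integral in (3.18) is approximated by a cumulative trapezoidal rule that starts at −30 instead of −∞, with a step of 0.01, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def second_order_on_grid(dist, grid):
    """Cumulative trapezoidal approximation of F^(2)(x), the integral of the
    distribution function from the first grid point up to x."""
    values = [0.0]
    for left, right in zip(grid, grid[1:]):
        area = 0.5 * (dist.cdf(left) + dist.cdf(right)) * (right - left)
        values.append(values[-1] + area)
    return values

x1 = NormalDist(mu=0.0, sigma=1.0)
x2 = NormalDist(mu=0.0, sigma=3.0)

grid = [-30.0 + 0.01 * i for i in range(6001)]  # from -30 to 30
f1 = second_order_on_grid(x1, grid)
f2 = second_order_on_grid(x2, grid)
gap = [b - a for a, b in zip(f1, f2)]  # F2^(2) - F1^(2) on the grid

largest = max(gap)
location = grid[gap.index(largest)]
print(min(gap), largest, location)
```

Up to discretisation error the gap never becomes negative, it is largest near x = µ = 0, and it returns to approximately zero at the right end of the grid, which matches the cancellation of the areas A and B expressed by (3.19).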

3.2 Expected shortfall

In this section we will look at an improvement of Value at Risk called Expected shortfall. The theorems and definitions used in this section are taken from [1].

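Definition 3.4. Assume $\mathbb{E}[X^-] < +\infty$. Then the expected shortfall at a level $\alpha \in (0, 1)$ is defined as
\[ \mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\left(\mathbb{E}[X\,I(X \leq x_\alpha)] + x_\alpha\left(\alpha - P(X \leq x_\alpha)\right)\right), \tag{3.22} \]
where $I(\cdot)$ denotes the indicator function.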

An interesting representation of expected shortfall is the integral representation.

Theorem 3.5. If X is a real valued random variable on the probability space $(\Omega, \mathcal{F}, P)$ with $\mathbb{E}[X^-] < +\infty$ and $\alpha \in (0, 1)$ is fixed, then
\[ \mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\int_0^\alpha x_u\,du = -\frac{1}{\alpha}\int_0^\alpha q_u(X)\,du. \tag{3.23} \]

Proof. Without proof, see [1].

At this point we would like to mention that sometimes the definition
\[ \mathrm{TCE}^\alpha(X) := -\mathbb{E}[X \mid X \leq -\mathrm{VaR}_\alpha(X)] \tag{3.24} \]
is used as a synonym of expected shortfall. When the distribution of X is continuous, this definition is equivalent to the definition of expected shortfall given by equation (3.22), see [1]. However in general the equality $\mathrm{TCE}^\alpha(X) = \mathrm{ES}_\alpha(X)$ does not hold. The risk measure defined in equation (3.24) is known as upper tail conditional expectation. It is stated in [1] that it is not guaranteed that this risk measure is coherent, because it sometimes lacks the sub-additivity property.
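To see the difference between (3.22), (3.23) and (3.24) on a distribution with atoms, the Python sketch below evaluates all three for the bond of equation (3.5) at the level α = 0.01. It is an illustration only: the function names are hypothetical, the distribution is passed as a dictionary, and the quantile helper assumes finitely many outcomes.

```python
def quantile(dist, alpha, strict):
    """Lower quantile (strict=False) or upper quantile (strict=True)."""
    cumulative = 0.0
    for x in sorted(dist):
        cumulative += dist[x]
        if cumulative > alpha or (not strict and cumulative >= alpha):
            return x
    return max(dist)  # only reached through floating point rounding

def expected_shortfall(dist, alpha):
    """Definition (3.22) with x_alpha the lower alpha quantile."""
    x_a = quantile(dist, alpha, strict=False)
    tail_mean = sum(p * x for x, p in dist.items() if x <= x_a)
    tail_prob = sum(p for x, p in dist.items() if x <= x_a)
    return -(tail_mean + x_a * (alpha - tail_prob)) / alpha

def expected_shortfall_integral(dist, alpha):
    """Representation (3.23): average of the quantile function over (0, alpha)."""
    total, used = 0.0, 0.0
    for x in sorted(dist):
        weight = min(dist[x], alpha - used)
        if weight <= 0.0:
            break
        total += weight * x
        used += weight
    return -total / alpha

def tail_conditional_expectation(dist, alpha):
    """Definition (3.24), using -VaR_alpha(X) = upper alpha quantile."""
    threshold = quantile(dist, alpha, strict=True)
    tail = [(x, p) for x, p in dist.items() if x <= threshold]
    return -sum(p * x for x, p in tail) / sum(p for _, p in tail)

bond = {-100.0: 0.0095, 1.0: 0.9905}  # distribution (3.5)
print(expected_shortfall(bond, 0.01))
print(expected_shortfall_integral(bond, 0.01))
print(tail_conditional_expectation(bond, 0.01))
```

Definition (3.22) and the integral representation both give approximately 94.95, so expected shortfall registers the default that Value at Risk ignored. The tail conditional expectation equals −E[X] = −0.0405 here, because the conditioning event has probability one, which illustrates that the two notions can differ when the distribution is not continuous.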

3.2.1 General properties

The most important property of expected shortfall is that it is a coherent risk measure. This is an improvement upon Value at Risk since Value at Risk was not even convex. To prove this we will use the alternative characterisation of a coherent risk measure described in remark 1.1 of the first chapter. There we find that we need to show that for all X, Y and α ∈ (0, 1) we have that:

1. (Positivity) X ≥ 0 ⇒ ESα(X) ≤ 0

2. (Positive homogeneous) ∀λ > 0 ESα(λX) = λ ESα(X)

3. (Translation invariant) ∀m ∈ R ESα(X + m) = ESα(X) − m.

4. (Sub-additivity) ESα(X + Y ) ≤ ESα(X) + ESα(Y )

Of all these properties the sub-additivity property is the most difficult to prove. We will work out the sub-additivity proof from [1]. For this we need to define the following function.
\[ I^\alpha(X \leq x) := \begin{cases} I(X \leq x), & \text{if } P(X = x) = 0, \\ I(X \leq x) + \frac{\alpha - P(X \leq x)}{P(X = x)}\,I(X = x), & \text{if } P(X = x) > 0. \end{cases} \tag{3.25} \]

In [1] we find following lemma.

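Lemma 3.1. We have the following properties.

1. $I^\alpha\left(X \leq x_{(\alpha)}\right) \in [0, 1]$

2. $\mathbb{E}\left[I^\alpha\left(X \leq x_{(\alpha)}\right)\right] = \alpha$

3. $\frac{1}{\alpha}\,\mathbb{E}\left[X\,I^\alpha\left(X \leq x_{(\alpha)}\right)\right] = -\mathrm{ES}_\alpha(X)$

We will use this lemma to prove the following lemma.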

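Lemma 3.2.
\[ \begin{cases} I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right) \geq 0, & \text{if } X > x_{(\alpha)}, \\ I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right) \leq 0, & \text{if } X < x_{(\alpha)}. \end{cases} \tag{3.26} \]

Proof. If $X > x_{(\alpha)}$ or if $X < x_{(\alpha)}$ we have that $I\left(X = x_{(\alpha)}\right) = 0$. Using definition (3.25) we then have that $I^\alpha\left(X \leq x_{(\alpha)}\right) = I\left(X \leq x_{(\alpha)}\right)$. Hence we have
\[ \begin{cases} I^\alpha\left(X \leq x_{(\alpha)}\right) = 0, & \text{if } X > x_{(\alpha)}, \\ I^\alpha\left(X \leq x_{(\alpha)}\right) = 1, & \text{if } X < x_{(\alpha)}. \end{cases} \]
From lemma 3.1 we have that $I^\alpha\left(Z \leq z_{(\alpha)}\right) \in [0, 1]$. Hence we can conclude that
\[ \begin{cases} I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right) \geq 0, & \text{if } X > x_{(\alpha)}, \\ I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right) \leq 0, & \text{if } X < x_{(\alpha)}, \end{cases} \]
which is what we needed to prove.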

Theorem 3.6. Expected shortfall is a coherent risk measure.

Proof. 1. (Positivity) Take $X \geq 0$, then for all $\alpha \in (0, 1)$ we have that $q_\alpha(X) \geq 0$. Using the integral representation of expected shortfall we find that
\[ \mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\int_0^\alpha q_u(X)\,du \leq 0. \tag{3.27} \]
This proves the positivity property.

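2. (Positive homogeneity) Take $\lambda > 0$, then for all $\alpha \in (0, 1)$ we have that
\begin{align*}
q_{(\alpha)}(\lambda X) &= \inf\{x \in \mathbb{R} \mid P(\lambda X \leq x) \geq \alpha\} \\
&= \inf\left\{x \in \mathbb{R} \mid P\left(X \leq \tfrac{x}{\lambda}\right) \geq \alpha\right\} \\
&= \inf\{\lambda x \in \mathbb{R} \mid P(X \leq x) \geq \alpha\} \\
&= \lambda \inf\{x \in \mathbb{R} \mid P(X \leq x) \geq \alpha\} \\
&= \lambda q_{(\alpha)}(X).
\end{align*}
Using the integral representation of expected shortfall we get that
\begin{align*}
\mathrm{ES}_\alpha(\lambda X) &= -\frac{1}{\alpha}\int_0^\alpha q_{(u)}(\lambda X)\,du \\
&= -\frac{1}{\alpha}\int_0^\alpha \lambda q_{(u)}(X)\,du \\
&= \lambda\,\mathrm{ES}_\alpha(X).
\end{align*}
This proves positive homogeneity.

3. (Translation invariance) Let $m \in \mathbb{R}$, then we have for all $\alpha \in (0, 1)$
\begin{align*}
q_{(\alpha)}(X + m) &= \inf\{x \in \mathbb{R} \mid P(X + m \leq x) \geq \alpha\} \\
&= \inf\{x \in \mathbb{R} \mid P(X \leq x - m) \geq \alpha\} \\
&= \inf\{x + m \in \mathbb{R} \mid P(X \leq x) \geq \alpha\} \\
&= \inf\{x \in \mathbb{R} \mid P(X \leq x) \geq \alpha\} + m \\
&= q_{(\alpha)}(X) + m.
\end{align*}
Using the integral representation of expected shortfall we have that
\begin{align*}
\mathrm{ES}_\alpha(X + m) &= -\frac{1}{\alpha}\int_0^\alpha q_{(u)}(X + m)\,du \\
&= -\frac{1}{\alpha}\int_0^\alpha \left(q_{(u)}(X) + m\right) du \\
&= -\frac{1}{\alpha}\int_0^\alpha q_{(u)}(X)\,du - \frac{m}{\alpha}\int_0^\alpha du \\
&= -\frac{1}{\alpha}\int_0^\alpha q_{(u)}(X)\,du - m \\
&= \mathrm{ES}_\alpha(X) - m.
\end{align*}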

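4. (Sub-additivity) Take X and Y, then we need to show that the following inequality holds.
\[ \mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) - \mathrm{ES}_\alpha(X + Y) \geq 0. \tag{3.28} \]
Let $Z := X + Y$ and take $\alpha \in (0, 1)$. From lemma 3.1 we have that $\alpha\,\mathrm{ES}_\alpha(X) = -\mathbb{E}\left[X\,I^\alpha\left(X \leq x_{(\alpha)}\right)\right]$. We find that:
\[ \alpha\left(\mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) - \mathrm{ES}_\alpha(Z)\right) = \mathbb{E}\left[Z\,I^\alpha\left(Z \leq z_{(\alpha)}\right) - X\,I^\alpha\left(X \leq x_{(\alpha)}\right) - Y\,I^\alpha\left(Y \leq y_{(\alpha)}\right)\right]. \]
Using the fact that $Z = X + Y$, we can rewrite this as
\[ \mathbb{E}\left[X\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right)\right)\right] + \mathbb{E}\left[Y\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(Y \leq y_{(\alpha)}\right)\right)\right]. \tag{3.29} \]
Lemma 3.2 shows that $\left(X - x_{(\alpha)}\right)\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right)\right) \geq 0$, and the same holds with Y in the place of X. Taking expectations we obtain the following inequalities
\begin{align*}
\mathbb{E}\left[X\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right)\right)\right] &\geq x_{(\alpha)}\,\mathbb{E}\left[I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right)\right], \\
\mathbb{E}\left[Y\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(Y \leq y_{(\alpha)}\right)\right)\right] &\geq y_{(\alpha)}\,\mathbb{E}\left[I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(Y \leq y_{(\alpha)}\right)\right].
\end{align*}
Using the second property of lemma 3.1 we conclude that
\begin{align*}
&\mathbb{E}\left[X\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right)\right)\right] + \mathbb{E}\left[Y\left(I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(Y \leq y_{(\alpha)}\right)\right)\right] \\
&\quad \geq x_{(\alpha)}\,\mathbb{E}\left[I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(X \leq x_{(\alpha)}\right)\right] + y_{(\alpha)}\,\mathbb{E}\left[I^\alpha\left(Z \leq z_{(\alpha)}\right) - I^\alpha\left(Y \leq y_{(\alpha)}\right)\right] \\
&\quad = x_{(\alpha)}(\alpha - \alpha) + y_{(\alpha)}(\alpha - \alpha) \\
&\quad = 0.
\end{align*}
We conclude that expected shortfall satisfies the sub-additivity property.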

3.2.2 Consistency with expected utility maximisation

We will now look at the consistency of expected shortfall with expected utility maximisation. For this we will need a result from [19, Theorem 5’], where we find the following theorem.

Theorem 3.7. Let $q_\alpha(X_1)$ and $q_\alpha(X_2)$ be quantiles of $X_1$ and $X_2$ respectively, then the following expressions are equivalent

1. $X_1 \geq_{SD(2)} X_2$

2. $\int_0^\alpha q_u(X_1)\,du \geq \int_0^\alpha q_u(X_2)\,du$ for all $\alpha \in [0, 1]$ and a strict inequality holds for some $\alpha$.

Proof. Without proof, see [19, Theorem 5’].

In [29] we find the following theorem with a proof based on theorem 3.7 and the integral representation of expected shortfall.

Theorem 3.8. Expected shortfall is consistent with second-order stochastic dominance.

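Proof. By definition of consistency with second order stochastic dominance we need to show that for all $\alpha \in (0, 1)$
\[ X_1 \geq_{SD(2)} X_2 \Rightarrow \mathrm{ES}_\alpha(X_1) \leq \mathrm{ES}_\alpha(X_2). \tag{3.30} \]
From theorem 3.7 we have that
\begin{align*}
X_1 \geq_{SD(2)} X_2 &\Rightarrow \int_0^\alpha q_u(X_1)\,du \geq \int_0^\alpha q_u(X_2)\,du \\
&\Rightarrow -\frac{1}{\alpha}\int_0^\alpha q_u(X_1)\,du \leq -\frac{1}{\alpha}\int_0^\alpha q_u(X_2)\,du \\
&\Rightarrow \mathrm{ES}_\alpha(X_1) \leq \mathrm{ES}_\alpha(X_2).
\end{align*}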

This concludes the proof.

The fact that expected shortfall is consistent with second order stochastic dominance means that if all risk averse and non-saturated investors prefer X1 to X2, then the expected shortfall of X1 is at most the expected shortfall of X2. This theorem shows that expected shortfall is not only an improvement upon Value at Risk from a mathematical point of view but also from an economic point of view. We would like to point out that the condition that all risk averse investors prefer X1 to X2 is a rather severe one. When this condition is not fulfilled, consistency with expected utility maximisation cannot be guaranteed. To illustrate the severity of the assumption of second order stochastic dominance we will give a numerical example. Consider two investors A and B with the following utility functions.

\begin{align*}
u_A(x) &:= \frac{1 - \exp(-0.02x)}{0.02} \tag{3.31} \\
u_B(x) &:= 1 + x - \sqrt{1 + x^2} \tag{3.32}
\end{align*}

Notice that both utility functions are increasing and concave. Consider two portfolios $X_1$ and $X_2$ such that their net payoffs are given by
\[ X_1 = \begin{cases} 2, & p = 0.99 \\ -25, & p = 0.0075 \\ -50, & p = 0.0025 \end{cases} \tag{3.33} \]
and
\[ X_2 = \begin{cases} 5, & p = 0.55 \\ 2, & p = 0.44 \\ -75, & p = 0.01. \end{cases} \tag{3.34} \]
We have calculated the expected utility for both investors as well as the expected shortfall at a 0.01 level of both portfolios. These results are summarised in table 3.1. We notice that on the basis of expected utility investor A prefers $X_2$ to $X_1$ while investor B prefers $X_1$ to $X_2$. We conclude that second order stochastic dominance cannot order $X_2$ and $X_1$.

Table 3.1: Summary of results

                      X1      X2      Conclusion
Expected utility A    1.48    1.74    X2
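The quantities in this example can be recomputed directly from (3.31) to (3.34). The Python sketch below is an illustration with hypothetical function names; the expected shortfall is evaluated through the integral representation (3.23) for a payoff with finitely many outcomes. The values for investor B and for the expected shortfall are computed here and are not quoted from the table.

```python
import math

def u_a(x):
    """Utility function (3.31) of investor A."""
    return (1.0 - math.exp(-0.02 * x)) / 0.02

def u_b(x):
    """Utility function (3.32) of investor B."""
    return 1.0 + x - math.sqrt(1.0 + x * x)

def expected_utility(dist, utility):
    """E[u(X)] for dist = {payoff: probability}."""
    return sum(p * utility(x) for x, p in dist.items())

def expected_shortfall(dist, alpha):
    """Representation (3.23): minus the average of the worst alpha fraction."""
    total, used = 0.0, 0.0
    for x in sorted(dist):
        weight = min(dist[x], alpha - used)
        if weight <= 0.0:
            break
        total += weight * x
        used += weight
    return -total / alpha

x1 = {2.0: 0.99, -25.0: 0.0075, -50.0: 0.0025}  # portfolio (3.33)
x2 = {5.0: 0.55, 2.0: 0.44, -75.0: 0.01}        # portfolio (3.34)

for name, utility in (("A", u_a), ("B", u_b)):
    print(name, expected_utility(x1, utility), expected_utility(x2, utility))
print(expected_shortfall(x1, 0.01), expected_shortfall(x2, 0.01))
```

The expected utilities of investor A round to 1.48 and 1.74, in agreement with the first row of table 3.1, while investor B obtains a higher expected utility from X1. At the 0.01 level the expected shortfall is approximately 31.25 for X1 and 75 for X2.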

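Definition 3.5. Assume $\mathbb{E}[X^-] < +\infty$. Then the conditional value at risk at level $\alpha$ of X is defined as
\[ \mathrm{CVaR}_\alpha(X) = \inf_{s \in \mathbb{R}}\left\{\frac{1}{\alpha}\,\mathbb{E}\left[(X - s)^-\right] - s\right\}. \tag{3.35} \]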

In [1, Corollary 4.3] it is stated that under a mild integrability condition, expected shortfall and conditional value at risk are the same object. More formally following theorem is stated.

Theorem 3.9. Let X be a real integrable random variable on some probability space (Ω, F,P ) and α ∈ (0, 1) be fixed. Then

ESα(X) = CVaRα(X) (3.36)

Proof. Without proof, see [1, Corollary 4.3].
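Theorem 3.9 can be checked numerically for payoffs with finitely many outcomes. The Python sketch below is only an illustration with hypothetical names. It uses the observation that for such a payoff the function minimised in (3.35) is convex and piecewise linear in s with kinks at the outcomes, so that for α ∈ (0, 1) the infimum is attained at one of the outcomes. The expected shortfall is computed from the integral representation (3.23), and the two distributions are those of (3.33) and (3.34).

```python
def cvar_objective(dist, alpha, s):
    """E[(X - s)^-] / alpha - s, where (X - s)^- = max(s - X, 0)."""
    return sum(p * max(s - x, 0.0) for x, p in dist.items()) / alpha - s

def conditional_value_at_risk(dist, alpha):
    """Minimise the objective of (3.35) over the outcomes of X."""
    return min(cvar_objective(dist, alpha, s) for s in dist)

def expected_shortfall(dist, alpha):
    """Representation (3.23): minus the average of the worst alpha fraction."""
    total, used = 0.0, 0.0
    for x in sorted(dist):
        weight = min(dist[x], alpha - used)
        if weight <= 0.0:
            break
        total += weight * x
        used += weight
    return -total / alpha

x1 = {2.0: 0.99, -25.0: 0.0075, -50.0: 0.0025}
x2 = {5.0: 0.55, 2.0: 0.44, -75.0: 0.01}

for dist in (x1, x2):
    for alpha in (0.01, 0.05, 0.5):
        print(alpha, conditional_value_at_risk(dist, alpha), expected_shortfall(dist, alpha))
```

For each distribution and each level the two printed numbers should coincide up to rounding, in line with (3.36).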

It is interesting to notice that CVaRα can be rewritten in the form of an optimised certainty equivalent.

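\begin{align*}
\mathrm{CVaR}_\alpha(X) &= \inf_{s \in \mathbb{R}}\left\{\frac{1}{\alpha}\,\mathbb{E}\left[(X - s)^-\right] - s\right\} \\
&= \inf_{s \in \mathbb{R}}\left\{-\left(s - \frac{1}{\alpha}\,\mathbb{E}\left[(X - s)^-\right]\right)\right\} \\
&= -\sup_{s \in \mathbb{R}}\left\{s + \mathbb{E}\left[-\frac{1}{\alpha}(X - s)^-\right]\right\} \\
&= -OCE_u(X),
\end{align*}
where $u(x) = -\frac{1}{\alpha}\max(0, -x)$. Notice that $u(0) = 0$, that $1 \in \partial u(0)$ and that $u(x)$ is increasing. Furthermore, because $0 < \alpha < 1$, we have that $u$ is a concave function: for $\lambda \in [0, 1]$,
\begin{align*}
u(\lambda x + (1 - \lambda)y) &= -\frac{1}{\alpha}\max(0, -\lambda x - (1 - \lambda)y) \\
&\geq -\frac{1}{\alpha}\max(0, -\lambda x) - \frac{1}{\alpha}\max(0, -(1 - \lambda)y) \\
&= -\frac{\lambda}{\alpha}\max(0, -x) - \frac{1 - \lambda}{\alpha}\max(0, -y) \\
&= \lambda u(x) + (1 - \lambda)u(y).
\end{align*}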

To get a better understanding of this utility function we have plotted it in figure 3.7. We notice that, locally, the investor is risk neutral because the utility function is a piecewise linear function. The interpretation of expected shortfall as the optimised certainty equivalent of an investor with the utility function $u(x) = -\frac{1}{\alpha}\max(0, -x)$ reveals a potential criticism. When an investor with this utility function knows he will lose money, he is indifferent between an uncertain loss $X$ and a certain loss $\mathbb{E}[X]$. And when this investor knows he will gain money, his utility score does not depend upon the amount he eventually gains.

Figure 3.7: Utility function for CVaRα with α = 0.05.


4 Utility based risk measures

In this chapter we will discuss how utility functions can be incorporated in financial risk measures. The stochastic variable X will model, as always, the net payoffs of the portfolio. We will again assume that X ∈ L∞ (Ω, F,P ) and that X can take positive as well as negative values. When studying risk, we are especially interested in the losses of our portfolio. Instead of using the utility function u to study these losses we will use the associated loss function l defined as: l(x) = −u(−x). (4.1) Non-saturated and risk averse investors are modelled using increasing and concave utility functions. Using the relation 4.1 we can see that this implies that their associated loss function will be increasing and convex.

4.1 Utility based shortfall risk measures

The first class of utility based risk measures that we will discuss was introduced in [13] and [11]. These risk measures are called utility based shortfall risk and can be constructed using the notion of a loss function. In this section we will explain the construction of this utility based risk measure and show the link to the u-Mean certainty equivalent.

Definition 4.1. A function l : R → R is called a loss function if it is increasing and not identically constant.

Loss functions can induce risk measures in a natural way using the notion of an acceptance set. Take x0 in the interior of l (R). We can now define the following acceptance set:

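\begin{align*}
\mathcal{A} &:= \{X \in L^\infty(\Omega, \mathcal{F}, P) \mid \mathbb{E}[l(-X)] \leq x_0\} \\
&= \{X \in L^\infty(\Omega, \mathcal{F}, P) \mid \mathbb{E}[-u(X)] \leq x_0\} \\
&= \{X \in L^\infty(\Omega, \mathcal{F}, P) \mid \mathbb{E}[u(X)] \geq -x_0\}.
\end{align*}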

Hence a position X is acceptable if its expected utility is at least a given amount, or equivalently if the expected loss is at most a given amount. Using this acceptance set we are able to define the risk measure associated with it.

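\begin{align*}
\rho_{\mathcal{A}}(X) &:= \inf\{m \in \mathbb{R} \mid m + X \in \mathcal{A}\} \\
&= \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X - m)] \leq x_0\} \\
&= \inf\{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] \geq -x_0\}.
\end{align*}
The risk measure defined above is called utility based shortfall risk and we will denote it by $SF^l_{x_0}(X)$. In [5, p. 473] a link between utility based shortfall risk measures and u-Mean certainty equivalents is mentioned. We will derive this link here.
\begin{align*}
SF^l_{x_0}(X) &= \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X - m)] \leq x_0\} \\
&= -\sup\{-m \in \mathbb{R} \mid \mathbb{E}[l(-X - m)] \leq x_0\} \\
&= -\sup\{m \in \mathbb{R} \mid \mathbb{E}[l(-X + m)] \leq x_0\} \\
&= -\sup\{m \in \mathbb{R} \mid \mathbb{E}[u(X - m)] \geq -x_0\} \\
&= -\sup\{m \in \mathbb{R} \mid \mathbb{E}[\tilde{u}(X - m)] \geq 0\} \\
&= -M_{\tilde{u}}(X),
\end{align*}
where $\tilde{u}(x) = u(x) + x_0 = -l(-x) + x_0$. We conclude that
\[ SF^l_{x_0}(X) = -M_{\tilde{u}}(X), \quad \text{with } \tilde{u}(x) = u(x) + x_0 = -l(-x) + x_0. \tag{4.2} \]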

The value of $x_0$ has an influence on the utility based shortfall risk. Suppose $x_1 \geq x_0$ and take $m \in \{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] + x_0 \geq 0\}$. Then we have that $\mathbb{E}[u(X + m)] + x_1 \geq \mathbb{E}[u(X + m)] + x_0 \geq 0$. Hence $m \in \{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] + x_1 \geq 0\}$. We can conclude that $\{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] + x_0 \geq 0\} \subset \{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] + x_1 \geq 0\}$. This implies that $\inf\{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] + x_0 \geq 0\} \geq \inf\{m \in \mathbb{R} \mid \mathbb{E}[u(X + m)] + x_1 \geq 0\}$. We conclude that
\[ x_1 \geq x_0 \Rightarrow SF^l_{x_1}(X) \leq SF^l_{x_0}(X). \tag{4.3} \]
In [11, p. 247] it is stated without proof that $SF^l_{x_0}(X)$ is a monetary risk measure and that if $l$ is a convex loss function this risk measure is convex. In this thesis we will prove these claims.

Theorem 4.1. The utility based shortfall risk measure
\[ SF^l_{x_0}(X) = \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X - m)] \leq x_0\} \tag{4.4} \]
is a monetary risk measure.

Proof. To prove that $SF^l_{x_0}(X)$ is a monetary risk measure we will prove that it satisfies the monotonicity property and the translation invariance property.

1. (Monotonicity) Take $X \leq Y$. Because $l$ is increasing we have that
\[ X \leq Y \Rightarrow -X - m \geq -Y - m \Rightarrow \mathbb{E}[l(-X - m)] \geq \mathbb{E}[l(-Y - m)]. \]

55 Now take m ∈ {m ∈ R|E [l(−X − m)] ≤ x0}, then we have that

E [l(−Y − m)] ≤ E [l(−X − m)] ≤ x0.

From this we conclude that m ∈ {m ∈ R|E [l(−Y − m)] ≤ x0}. Hence we find that

{m ∈ R|E [l(−X − m)] ≤ x0} ⊂ {m ∈ R|E [l(−Y − m)] ≤ x0}. We can conclude that

inf{m ∈ R|E [l(−X − m)] ≤ x0} ≥ inf{m ∈ R|E [l(−Y − m)] ≤ x0}.

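This proves that if $X \le Y$ then $SF^{l}_{x_0}(X) \ge SF^{l}_{x_0}(Y)$.

2. (Translation invariance) We have that

\begin{align*}
SF^{l}_{x_0}(X + k) &= \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X-k-m)] \le x_0\} \\
&= \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X-(k+m))] \le x_0\} \\
&= \inf\{m - k \in \mathbb{R} \mid \mathbb{E}[l(-X-m)] \le x_0\} \\
&= \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X-m)] \le x_0\} - k \\
&= SF^{l}_{x_0}(X) - k.
\end{align*}

This proves the translation invariance property.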
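The definition (4.4) and the two properties just proven can be checked numerically. The following is a minimal sketch, assuming a discrete position $X$ with hypothetical atoms and probabilities and the exponential loss $l(x) = \exp(x) - 1$. The function names, the bisection bracket and all numbers are illustrative assumptions and are not taken from the thesis. Because $m \mapsto \mathbb{E}[l(-X-m)]$ is non-increasing, the infimum can be located by bisection. For this particular loss the result can be compared with the closed form $\ln(\mathbb{E}[\exp(-X)]/(1+x_0))$, which follows directly from solving $\mathbb{E}[\exp(-X-m)] - 1 \le x_0$ for $m$.

```python
import math

def expected_loss(loss, outcomes, probs, m):
    """E[l(-X - m)] for a discrete position X with the given atoms and probabilities."""
    return sum(p * loss(-x - m) for x, p in zip(outcomes, probs))

def shortfall_risk(loss, outcomes, probs, x0, lo=-50.0, hi=50.0, tol=1e-12):
    """inf{m : E[l(-X - m)] <= x0} by bisection.

    Assumes the loss is non-decreasing, so m -> E[l(-X - m)] is non-increasing,
    and that the (hypothetical) bracket [lo, hi] contains the answer.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected_loss(loss, outcomes, probs, mid) <= x0:
            hi = mid   # mid is acceptable: the infimum is at most mid
        else:
            lo = mid   # mid is not acceptable: the infimum lies above mid
    return hi

def exp_loss(x):
    """Exponential loss l(x) = exp(x) - 1, i.e. u(x) = 1 - exp(-x)."""
    return math.exp(x) - 1.0

# Hypothetical three-point position X (illustrative numbers only).
outcomes = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]
x0 = 0.5

sf = shortfall_risk(exp_loss, outcomes, probs, x0)

# Closed form for the exponential loss: ln(E[exp(-X)] / (1 + x0)).
closed_form = math.log(
    sum(p * math.exp(-x) for x, p in zip(outcomes, probs)) / (1.0 + x0)
)

# Translation invariance: adding cash k lowers the risk by k.
k = 3.0
sf_shifted = shortfall_risk(exp_loss, [x + k for x in outcomes], probs, x0)

# Monotonicity: a pointwise larger position has a smaller (or equal) risk.
sf_larger = shortfall_risk(exp_loss, [-0.5, 0.5, 4.0], probs, x0)

# Monotonicity in the threshold, as in (4.3): x1 >= x0 gives a smaller risk.
sf_looser = shortfall_risk(exp_loss, outcomes, probs, x0 + 1.0)

print(sf, closed_form, sf_shifted, sf_larger, sf_looser)
```

Bisection is used here only because it needs nothing beyond the monotonicity of the expected loss in $m$; any one-dimensional root finder would serve equally well.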

We now prove that utility based shortfall risk is a convex risk measure if $l$ is a convex loss function, or equivalently if $u$ is a concave utility function.

Theorem 4.2. If the loss function $l$ is convex, then the utility based shortfall risk measure

\[
SF^{l}_{x_0}(X) = \inf\{m \in \mathbb{R} \mid \mathbb{E}[l(-X-m)] \le x_0\} \tag{4.5}
\]

is a convex risk measure.

Proof. From theorem 1.1 from chapter 1 we know that it is sufficient to prove that the acceptance set $A = \{X \in L^{\infty}(\Omega, \mathcal{F}, P) \mid \mathbb{E}[l(-X)] \le x_0\}$ is convex. Take arbitrary $X \in A$, $Y \in A$ and $\lambda \in [0, 1]$. We need to prove that $\lambda X + (1-\lambda)Y \in A$. Because the loss function $l$ is convex, we have that $\mathbb{E}[l(-(\lambda X + (1-\lambda)Y))] \le \mathbb{E}[\lambda l(-X) + (1-\lambda) l(-Y)] = \lambda \mathbb{E}[l(-X)] + (1-\lambda)\mathbb{E}[l(-Y)]$. Since $X \in A$ and $Y \in A$ we have $\mathbb{E}[l(-X)] \le x_0$ and $\mathbb{E}[l(-Y)] \le x_0$. We can conclude that $\mathbb{E}[l(-(\lambda X + (1-\lambda)Y))] \le \lambda x_0 + (1-\lambda) x_0 = x_0$. This means that $\lambda X + (1-\lambda)Y \in A$, which is what we needed to prove.

It is now natural to ask whether this utility based shortfall risk measure is a coherent risk measure. Unfortunately this will not always be the case. We will demonstrate this using a numerical example. Consider a bond which at time $t = 0$ costs 100, and will pay 105 at time $t = 1$. Assume a risk free interest rate of 2%, and a default probability of the bond of 1%. Then the net payoff $X$ of this investment is $-100/1.02 = -98.04$ with probability 1% and $5/1.02 = 4.90$ with probability 99%. If we use the exponential utility function $u(x) = 1 - \exp(-x)$ and take $x_0 = 0$, then $SF^{l}_{0}(X) = -M_u(X) = \ln \mathbb{E}[\exp(-X)]$, where the last equality follows from theorem 2.7 in chapter 2. If this risk measure was coherent it would satisfy the property of positive homogeneity, i.e. $\lambda \ln \mathbb{E}[\exp(-X)] = \ln \mathbb{E}[\exp(-\lambda X)]$ for all $\lambda > 0$. However if we take $\lambda = 2$, we find that $2 \ln \mathbb{E}[\exp(-X)] = 186.87$ and $\ln \mathbb{E}[\exp(-2X)] = 191.47$, which shows that this risk measure is not always coherent.

From [27, p.101] we have the following fact.

Lemma 4.1. A convex function $l : \mathbb{R} \to \mathbb{R}$ is continuous.

From [11, p.248] we have the following lemma. We will work out the proof of this lemma.

Lemma 4.2. If $l$ is a convex loss function, then the equation

E [l (−z − X)] = x0 (4.6)

has a unique solution $z = SF^{l}_{x_0}(X)$.

Proof. Consider a sequence $z_n$ with $z_n \in \{z \in \mathbb{R} \mid \mathbb{E}[l(-z-X)] = x_0\}$¹ such that $z_n \to SF^{l}_{x_0}(X) = \inf\{z \in \mathbb{R} \mid \mathbb{E}[l(-z-X)] = x_0\}$ if $n \to +\infty$. Then $\lim_{n \to +\infty} \mathbb{E}[l(-z_n - X)] = x_0$. Now we will use that because $X \in L^{\infty}(\Omega, \mathcal{F}, P)$, $X$ is a bounded measurable function, which means that $\exists M \in \mathbb{R} : \forall \omega \in \Omega : |X(\omega)| \le M$. Because $l : \mathbb{R} \to \mathbb{R}$ is continuous and the convergent sequence $z_n$ is bounded, we have that

\[
\exists M' \in \mathbb{R}, \forall \omega \in \Omega, \forall n \ge 0 : |l(-z_n - X(\omega))| \le M' < +\infty.
\]

Using bounded convergence we have that $\mathbb{E}\left[\lim_{n \to +\infty} l(-z_n - X)\right] = x_0$. Using the fact that $l$ is an increasing and convex function and thus continuous we have that

\[
\mathbb{E}\left[l\left(\lim_{n \to +\infty} (-z_n - X)\right)\right] = \mathbb{E}\left[l\left(-SF^{l}_{x_0}(X) - X\right)\right] = x_0.
\]

Hence $SF^{l}_{x_0}(X)$ is a solution to 4.6. To show that the solution is unique it is sufficient to notice that, because $x_0$ is an interior point of the range of the increasing, convex and non-identically constant function $l$, the function $l$ is strictly increasing in $(l^{-1}(x_0) - \epsilon, +\infty)$ for some $\epsilon > 0$. Because any solution of 4.6 has to lie in this interval we have that the solution is unique.

In [11, p.248] we found the following theorem and proof.

Theorem 4.3. The utility-based shortfall risk measure $SF^{l}_{x_0}(X)$ is continuous from below. Hence $SF^{l}_{x_0}(X)$ can be represented in the form

\[
SF^{l}_{x_0}(X) = \max_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q(-X) - \alpha^{\min}(Q) \right). \tag{4.7}
\]

Proof. If $SF^{l}_{x_0}(X)$ is continuous from below, representation 4.7 follows directly from theorem 1.9 from the first chapter. Take a sequence $X_n \in L^{\infty}(\Omega, \mathcal{F}, P)$ such that $X_n \nearrow X$ pointwise. Then $SF^{l}_{x_0}(X_n) \searrow R \in \mathbb{R}$. We need to show that $R = SF^{l}_{x_0}(X)$. Just as in the proof of lemma 4.2 above we can use bounded convergence to obtain that

\[
\lim_{n \to +\infty} \mathbb{E}\left[l(-SF^{l}_{x_0}(X_n) - X_n)\right] = \mathbb{E}\left[\lim_{n \to +\infty} l(-SF^{l}_{x_0}(X_n) - X_n)\right].
\]

Again using the fact that $l$ is continuous we have

\[
\mathbb{E}\left[\lim_{n \to +\infty} l(-SF^{l}_{x_0}(X_n) - X_n)\right] = \mathbb{E}[l(-R - X)].
\]

Because $\mathbb{E}\left[l(-SF^{l}_{x_0}(X_n) - X_n)\right] = x_0$ for all $n$ we have that $\lim_{n \to +\infty} \mathbb{E}\left[l(-SF^{l}_{x_0}(X_n) - X_n)\right] = x_0$. This implies that $\mathbb{E}[l(-R - X)] = x_0$. Since we know that the only solution to equation 4.6 is $SF^{l}_{x_0}(X)$, we have that $R = SF^{l}_{x_0}(X)$.

¹ We know this set is not empty because $x_0$ is assumed to be an internal point of $l$.

Equation 4.2 states the link between the utility based shortfall risk measure and the u-Mean certainty equivalent. We will work out the idea of using strong Lagrangian duality proposed in [5] to derive a relation between the optimised certainty equivalent and the u-Mean certainty equivalent. From [14, p60] we have the following theorem concerning strong Lagrangian duality.

Theorem 4.4. Let $X$ be a non-empty convex set in $\mathbb{R}^n$. Let $f : \mathbb{R}^n \to \mathbb{R}$ and $g : \mathbb{R}^n \to \mathbb{R}^m$ be convex and $h : \mathbb{R}^n \to \mathbb{R}^l$ be affine. Suppose that the following constraint is satisfied: there exists an $\hat{x} \in X$ such that $g(\hat{x}) < 0$ and $h(\hat{x}) = 0$ with $0 \in \operatorname{int}(h(X))$ where $h(X) = \{h(x) \mid x \in X\}$. Then

\[
\inf\{f(x) \mid x \in X, g(x) \le 0, h(x) = 0\} = \sup\{\theta(u, v) \mid u \ge 0\}, \tag{4.8}
\]

where $\theta(u, v) = \inf\{f(x) + u^T g(x) + v^T h(x) \mid x \in X\}$. Furthermore if the infimum is finite, then $\sup\{\theta(u, v) \mid u \ge 0\}$ is achieved at $(\hat{u}, \hat{v})$ with $\hat{u} \ge 0$. If the infimum is achieved at $\hat{x}$, then $\hat{u}^T g(\hat{x}) = 0$.

For $\lambda > 0$, denote with $OCE_{\lambda u}(X) := \sup_{\eta \in \mathbb{R}} (\eta + \lambda \mathbb{E}[u(X - \eta)])$. By definition we have that:

\[
SF^{l}_{x_0} = \inf\{\eta \in \mathbb{R} \mid \mathbb{E}[l(-X - \eta)] \le x_0\}.
\]

Using the translation $\tilde{l}(x) = l(x) - x_0$, we have that

\[
SF^{\tilde{l}}_{0} = \inf\{\eta \in \mathbb{R} \mid \mathbb{E}[\tilde{l}(-X - \eta)] \le 0\}.
\]

The Lagrange function for this problem is given by $L(\lambda) = \eta + \lambda \mathbb{E}[\tilde{l}(-X - \eta)]$. We want to apply the strong duality theorem. We have that $f(\eta) = \eta$ is a convex function. We also have that $g(\eta) := \mathbb{E}[\tilde{l}(-X - \eta)]$ is a convex function. This follows from the convexity of $l$. To formally prove this, take $\eta_1, \eta_2 \in \mathbb{R}$ and $t \in [0, 1]$, then we have that

\begin{align*}
g(t\eta_1 + (1-t)\eta_2) &= \mathbb{E}[\tilde{l}(-X - t\eta_1 - (1-t)\eta_2)] \\
&= \mathbb{E}[\tilde{l}(-tX - (1-t)X - t\eta_1 - (1-t)\eta_2)] \\
&= \mathbb{E}[\tilde{l}(t(-X - \eta_1) + (1-t)(-X - \eta_2))] \\
&\le \mathbb{E}[t\tilde{l}(-X - \eta_1) + (1-t)\tilde{l}(-X - \eta_2)] \\
&= t\mathbb{E}[\tilde{l}(-X - \eta_1)] + (1-t)\mathbb{E}[\tilde{l}(-X - \eta_2)] \\
&= t g(\eta_1) + (1-t) g(\eta_2).
\end{align*}

To apply strong Lagrangian duality we need to show that there exists an internal solution. That is, we need to find an $\hat{\eta}$ such that $g(\hat{\eta}) < 0$. Because $x_0$ is an internal point of $l(\mathbb{R})$, $0$ is an internal point of $\tilde{l}(\mathbb{R})$. Using the same arguments as with equation 4.6 we find that there exists an $\epsilon > 0$ such that the equation $\mathbb{E}[\tilde{l}(-X - \eta)] = -\epsilon$ has a solution $\hat{\eta}$. From this we can conclude there exists an $\hat{\eta}$ such that $g(\hat{\eta}) = \mathbb{E}[\tilde{l}(-X - \hat{\eta})] = -\epsilon < 0$, which proves the existence of the internal solution. Denote with $\tilde{u}$ the utility function associated with $\tilde{l}$. Because $\tilde{l}$ is a continuous and non-decreasing function, the restriction $\mathbb{E}[\tilde{l}(-X - \eta)] \le 0$ will be binding. Hence we can assume $\lambda > 0$. We can now apply the strong duality theorem. Using that $\theta(\lambda) = \inf_{\eta \in \mathbb{R}} \left(\eta + \lambda \mathbb{E}[\tilde{l}(-X - \eta)]\right)$ we find that

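\begin{align*}
SF^{\tilde{l}}_{0} &= \inf\{\eta \in \mathbb{R} \mid \mathbb{E}[\tilde{l}(-X - \eta)] \le 0\} \\
&= \sup_{\lambda > 0} \left( \inf_{\eta \in \mathbb{R}} \left(\eta + \lambda \mathbb{E}[\tilde{l}(-X - \eta)]\right) \right) \\
&= \sup_{\lambda > 0} \left( \inf_{\eta \in \mathbb{R}} \left(\eta - \lambda \mathbb{E}[\tilde{u}(X + \eta)]\right) \right) \\
&= \sup_{\lambda > 0} \left( \inf_{\eta \in \mathbb{R}} \left(-\eta - \lambda \mathbb{E}[\tilde{u}(X - \eta)]\right) \right) \\
&= \sup_{\lambda > 0} \left( -\sup_{\eta \in \mathbb{R}} \left(\eta + \lambda \mathbb{E}[\tilde{u}(X - \eta)]\right) \right) \\
&= -\inf_{\lambda > 0} \left( \sup_{\eta \in \mathbb{R}} \left(\eta + \lambda \mathbb{E}[\tilde{u}(X - \eta)]\right) \right) \\
&= -\inf_{\lambda > 0} \left( OCE_{\lambda \tilde{u}}(X) \right).
\end{align*}

We already showed that $SF^{\tilde{l}}_{0} = -M_{\tilde{u}}$. Hence we have that

\[
M_{\tilde{u}} = \inf_{\lambda > 0} \left( OCE_{\lambda \tilde{u}}(X) \right). \tag{4.9}
\]

From this we conclude that

\[
M_u \le OCE_u(X). \tag{4.10}
\]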
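Relations (4.9) and (4.10) can be illustrated numerically. The sketch below assumes the exponential utility $u(x) = 1 - \exp(-x)$ with $x_0 = 0$ (so that $\tilde{u} = u$) and a hypothetical three-point position. The names `oce` and `u_mean`, the search brackets and the grid of $\lambda$ values are illustrative assumptions, not part of the thesis. The optimised certainty equivalent is computed by a ternary search, which is valid because $\eta \mapsto \eta + \lambda \mathbb{E}[u(X-\eta)]$ is concave. The u-Mean is computed by bisection, because $m \mapsto \mathbb{E}[u(X-m)]$ is decreasing.

```python
import math

# Hypothetical three-point position X (illustrative numbers only).
outcomes = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

def u(x):
    """Exponential utility u(x) = 1 - exp(-x); with x0 = 0 we have u~ = u."""
    return 1.0 - math.exp(-x)

def expected_utility(shift):
    """E[u(X - shift)]."""
    return sum(p * u(x - shift) for x, p in zip(outcomes, probs))

def oce(lam, lo=-20.0, hi=20.0, iters=200):
    """OCE_{lam*u}(X) = sup_eta (eta + lam * E[u(X - eta)]).

    The objective is concave in eta, so a ternary search on [lo, hi] is used.
    """
    def objective(eta):
        return eta + lam * expected_utility(eta)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))

def u_mean(lo=-20.0, hi=20.0, iters=200):
    """M_u(X) = sup{m : E[u(X - m)] >= 0}; E[u(X - m)] is decreasing in m."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if expected_utility(mid) >= 0.0:
            lo = mid
        else:
            hi = mid
    return lo

lambdas = [0.25 + 0.01 * k for k in range(376)]   # grid on [0.25, 4.0]
inf_over_lambda = min(oce(lam) for lam in lambdas)

print(u_mean(), inf_over_lambda, oce(1.0))
```

For this particular utility the infimum over $\lambda$ is attained at $\lambda = 1$, so (4.10) holds with equality here, while other values such as $\lambda = 2$ give a strictly larger value.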

4.2 Divergence risk measures

Apart from utility based shortfall risk measures there is another way to incorporate utility functions into risk measures. Although less obvious, divergence risk measures are another example of utility based risk measures. The next section is devoted to the study of this class of risk measures.

4.2.1 Construction and representation

Divergence risk measures are based on the robust representation of a convex risk measure, something which we have discussed in the first chapter. The robust representation of a risk measure has the following form

\[
\rho(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[-X] - \alpha(Q) \right). \tag{4.11}
\]

In this representation we have taken some probabilistic models more seriously than others using the penalty function $\alpha(Q)$. In divergence based risk measures this penalty function will be the $\varphi$-divergence. We will make the following assumptions on the function $\varphi$:

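1. $\varphi : \mathbb{R} \to (-\infty, +\infty]$ is a proper² closed convex function.

2. $\varphi$ is lower semicontinuous.

3. If the effective domain³ is denoted by $\operatorname{dom} \varphi$, then $1 \in \operatorname{int}(\operatorname{dom} \varphi)$.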

4. The minimum of φ is 0 which is attained at 1.

The class of functions for which these properties are satisfied will be denoted with Φ. We will call the function φ a divergence function.

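Definition 4.2. For $\varphi \in \Phi$ the $\varphi$-divergence of the probability measure $Q$ with respect to $P$ is defined as

\[
I_{\varphi}(Q|P) =
\begin{cases}
\int_{\Omega} \varphi\left(\frac{dQ}{dP}\right) dP, & \text{if } Q \in \mathcal{M}_1(P) \\
+\infty, & \text{otherwise}
\end{cases} \tag{4.12}
\]

where $\frac{dQ}{dP}$ denotes the Radon-Nikodym derivative.

² Which means there exists an $x \in \mathbb{R}$ such that $\varphi(x) < +\infty$.
³ The effective domain of the proper function is the set $\{x \mid \varphi(x) < +\infty\}$.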

Note that if the probability measure $Q$ would not be absolutely continuous with respect to $P$ then the Radon-Nikodym derivative would not be well defined. Using the $\varphi$-divergence as a penalty function we can define divergence based risk measures.

Definition 4.3. The $\varphi$-divergence based risk measure is defined as

\[
D_{\varphi}(X) = \sup_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[-X] - I_{\varphi}(Q|P) \right). \tag{4.13}
\]

In what follows we will often use the Legendre transform. This transform is sometimes also called the Fenchel-Legendre transform.

Definition 4.4. The Legendre transform of a convex function $l : \mathbb{R} \to \mathbb{R} \cup \{+\infty\}$ is defined as

\[
l^*(y) := \sup_{x \in \mathbb{R}} (yx - l(x)), \quad y \in \mathbb{R}. \tag{4.14}
\]

At first sight it might not be clear why divergence risk measures are also utility based risk measures. However, it turns out that divergence risk measures are in fact negative optimised certainty equivalents. The negative of the optimised certainty equivalent can be viewed as the dual optimisation problem of the divergence risk measure, i.e. for $u(x) = -\varphi^*(-x)$ we have that

\[
\sup_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[-X] - I_{\varphi}(Q|P) \right) = -\sup_{\eta \in \mathbb{R}} \left( \eta + \mathbb{E}[u(X - \eta)] \right). \tag{4.15}
\]

We want to make the remark that in the optimisation problem on the left hand side of 4.15 we optimise over an infinite dimensional space, while on the right hand side the optimisation happens over a finite dimensional space. In [5] the authors use strong Lagrangian duality to obtain this link. However we are not convinced that they checked all necessary assumptions to conclude that strong duality holds. Therefore we have added the assumption that $\varphi$ is a lower semicontinuous function, an assumption also made in [11, p.256]. Using this extra assumption and the ideas proposed in [5] we have reworked the proof of 4.15. Instead of using strong Lagrangian duality we will use the closely related concept of Fenchel duality to prove this connection. Using this type of duality explains why the Fenchel-Legendre transformation turns up in some of the equations. We will need the concept of the core of a set.

Definition 4.5. If $X$ is a normed space then the core of a set $A \subset X$ is defined by $x \in \operatorname{core}(A)$ if for each $h \in \{x \in X \mid \|x\| = 1\}$ there exists a $\delta > 0$ such that $x + th \in A$ for all $0 \le t \le \delta$.

Lemma 4.3. If A is a set then int(A) ⊂ core(A).

Proof. Without proof, see [9].

Our proof will be based on the following duality theorem regarding Fenchel duality with equality constraints. From [7, Corollary 1.3] we have the following theorem.

Theorem 4.5. (Fenchel Duality theorem for linear constraints) Let $X$ and $Y$ be Banach spaces. Given any $f : X \to (-\infty, +\infty]$, any bounded linear map $A : X \to Y$ and any element $b \in Y$, the following weak duality holds:

\[
\inf_{x \in X} \{f(x) \mid Ax = b\} \ge \sup_{\mu \in Y^*} \{\langle b, \mu \rangle - f^*(A^*\mu)\}. \tag{4.16}
\]

If $f$ is lower semicontinuous and $b \in \operatorname{core}(A \operatorname{dom} f)$, then we have equality, and the supremum is attained if finite.

In [16] we found the following Fatou property.

Theorem 4.6. (Fatou property) Let $g, f_n$ for $n \in \mathbb{N}$ be measurable functions such that $f_n \ge g$ for all $n$ and $\int g \, d\mu > -\infty$, then

\[
\liminf_{n \to \infty} \int f_n \, d\mu \ge \int \liminf_{n \to \infty} f_n \, d\mu. \tag{4.17}
\]

We also have the following result.

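Theorem 4.7. Let $\Omega$ be a $\sigma$-finite measure space, and $X := L^p(\Omega, \mathcal{F}, P)$, $p \in [1, +\infty]$. Let $g : \mathbb{R} \times \Omega \to (-\infty, +\infty]$ be a normal integrand, and define on $X$ the integral function $I_g(x) := \int_{\Omega} g(x(\omega), \omega) \, dP(\omega)$. Then,

\[
\inf_{x \in X} \int_{\Omega} g(x(\omega), \omega) \, dP(\omega) = \int_{\Omega} \inf_{s \in \mathbb{R}} g(s, \omega) \, dP(\omega), \tag{4.18}
\]

provided the left-hand side is finite. Moreover,

\[
\bar{x} \in \arg\min_{x \in X} I_g(x) \Leftrightarrow \bar{x}(\omega) \in \arg\min_{s \in \mathbb{R}} g(s, \omega), \text{ a.e.} \tag{4.19}
\]

Proof. Theorem from [5, p20].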

Theorem 4.8. Let f : R × Ω → (−∞, +∞]. If f(·, ω) is (convex) and closed for almost all ω, and measurable in ω for each x such that dom f(·, ω) has a non-empty interior for every ω, then f is a normal (convex) integrand.

Proof. Without proof, see [5].

Theorem 4.9. For all $p \in [1, \infty]$ the spaces $L^p(\Omega, \mathcal{F}, P)$ are Banach spaces.

Proof. Without proof, see [11, p207].

Before we prove relation 4.15 we will prove some lemmas which will make the final proof easier.

Lemma 4.4. If $z \in L^1(\Omega, \mathcal{F}, P)$ then the functional $B : L^1 \to \mathbb{R}$ defined by $B(z) = \int_{\Omega} z(\omega) \, dP(\omega)$ is continuous and linear.

Proof. Take $z_1, z_2 \in L^1(\Omega, \mathcal{F}, P)$. We need to show that $\forall \epsilon > 0$ $\exists \delta > 0$ such that if $\|z_1 - z_2\|_{L^1} < \delta$ then $|B(z_1) - B(z_2)| < \epsilon$. We have that $\|z_1 - z_2\|_{L^1} = \int_{\Omega} |z_1(\omega) - z_2(\omega)| \, dP(\omega) < \delta$. We can now conclude that

\[
|B(z_1) - B(z_2)| = \left| \int_{\Omega} z_1(\omega) \, dP(\omega) - \int_{\Omega} z_2(\omega) \, dP(\omega) \right| \le \int_{\Omega} |z_1(\omega) - z_2(\omega)| \, dP(\omega) < \delta. \tag{4.20}
\]

Hence for each $\epsilon > 0$ we can pick $\delta$ such that $\delta = \epsilon$. The linearity of the functional $B$ follows from the fact that the Lebesgue integral is linear. A standard result from functional analysis yields that a linear operator between normed spaces is bounded if and only if it is a continuous linear operator. From this we can conclude that $B$ is a bounded functional.

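Lemma 4.5. If $A$ and $B$ are sets such that $A \subset B$ then $\operatorname{int}(A) \subset \operatorname{int}(B)$.⁴

Proof. Take $a \in \operatorname{int}(A)$, then there exists a neighbourhood $U$ of $a$ with $U \subset A$. Because $A \subset B$ we have $U \subset B$. Hence $U$ is a neighbourhood of $a$ in $B$. We conclude that $a \in \operatorname{int}(B)$. Because $a$ was chosen arbitrarily, we can conclude that $\operatorname{int}(A) \subset \operatorname{int}(B)$.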

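Lemma 4.6. If $X \in L^{\infty}(\Omega, \mathcal{F}, P)$, then the function $g : L^1(\Omega, \mathcal{F}, P) \to \mathbb{R}$ defined by $g(z) := \int_{\Omega} X(\omega) z(\omega) \, dP(\omega)$ is continuous.

Proof. Take $z_1, z_2 \in L^1(\Omega, \mathcal{F}, P)$, such that for $\delta > 0$, $\|z_1 - z_2\|_{L^1} < \delta$. This means that $\int_{\Omega} |z_1(\omega) - z_2(\omega)| \, dP(\omega) < \delta$. Then we have that

\begin{align*}
|g(z_1) - g(z_2)| &= \left| \int_{\Omega} X(\omega)(z_1(\omega) - z_2(\omega)) \, dP(\omega) \right| \\
&\le \int_{\Omega} |X(\omega)(z_1(\omega) - z_2(\omega))| \, dP(\omega) \\
&\le \sup_{\omega} |X(\omega)| \int_{\Omega} |z_1(\omega) - z_2(\omega)| \, dP(\omega) \\
&< \sup_{\omega} |X(\omega)| \, \delta.
\end{align*}

Because $X \in L^{\infty}(\Omega, \mathcal{F}, P)$ we have $\sup_{\omega} |X(\omega)| < +\infty$. Hence if we pick $\delta = \frac{\epsilon}{\sup_{\omega} |X(\omega)|} > 0$ then $\|z_1 - z_2\|_{L^1} < \delta$ implies $|g(z_1) - g(z_2)| < \epsilon$.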

We can now prove the main theorem of this section.

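Theorem 4.10. Let $\varphi \in \Phi$ and let $X \in L^{\infty}(\Omega, \mathcal{F}, P)$. Then

\[
\inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + I_{\varphi}(Q|P) \right) = \sup_{\eta \in \mathbb{R}} \left( \eta - \mathbb{E}_P[\varphi^*(\eta - X)] \right). \tag{4.21}
\]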

⁴ $\operatorname{int}(A)$ denotes the interior of the set $A$.

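Therefore with $u(t) := -\varphi^*(-t)$, we have

\begin{align*}
OCE_u(X) &= \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + I_{\varphi}(Q|P) \right) \tag{4.22} \\
&= -\sup_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[-X] - I_{\varphi}(Q|P) \right) \tag{4.23} \\
&= -D_{\varphi}(X). \tag{4.24}
\end{align*}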

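Proof. Take $\varphi$ in $\Phi$. Let $v := \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + I_{\varphi}(Q|P) \right)$. Now fix $Q \in \mathcal{M}_1(P)$. Then by definition of $\mathcal{M}_1(P)$, $Q$ is absolutely continuous with respect to $P$. Using the Radon-Nikodym theorem we have that this is equivalent with the existence of a density $z(\omega) := \frac{dQ(\omega)}{dP(\omega)}$. We have that $z \ge 0$ a.e. and it is clear that $\int_{\Omega} |z(\omega)| \, dP(\omega) = 1$. Hence we have that $z \in L^1(\Omega, \mathcal{F}, P)$.

\begin{align*}
v &= \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + I_{\varphi}(Q|P) \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \mathbb{E}_P\left[\varphi\left(\frac{dQ}{dP}\right)\right] \right) \\
&= \inf_{z \in L^1} \left\{ \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_{\Omega} z(\omega) \, dP(\omega) = 1, \; z \ge 0 \text{ a.e.} \right\} \\
&= \inf_{z \in L^1} \left\{ \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_{\Omega} z(\omega) \, dP(\omega) = 1 \right\}.
\end{align*}

The last equality follows from the fact that if $z(\omega) < 0$ for a set $S \subset \Omega$ with $P(S) > 0$ then $z$ can not correspond to the Radon-Nikodym derivative of a certain probability measure $Q$ with respect to $P$. By definition of $\varphi$-divergence we have that $\int_{\Omega} \varphi(z(\omega)) \, dP(\omega) = +\infty$. Furthermore we always have that $-\infty < \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \le \sup_{\omega \in \Omega} |X(\omega)| \int_{\Omega} z(\omega) \, dP(\omega) < +\infty$. From all this it follows that if $z(\omega) < 0$ for a set $S \subset \Omega$ with $P(S) > 0$ then $\int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) = +\infty$. Therefore we can conclude that the last equality holds.

We want to apply theorem 4.5 regarding Fenchel duality for linear constraints. In the context of this theorem let $f : L^1(\Omega, \mathcal{F}, P) \to (-\infty, +\infty]$ be defined by

\[
f(z) := \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \tag{4.26}
\]

and let $A : L^1(\Omega, \mathcal{F}, P) \to \mathbb{R}$:

\[
A(z) := \int_{\Omega} z(\omega) \, dP(\omega). \tag{4.27}
\]

Then $A$ is linear because the Lebesgue integral is linear. It is bounded because it is also a continuous functional, see lemma 4.4. Let $b = 1$ and note that $\mathbb{R}^* = \mathbb{R}$. We will now calculate

\[
d := \sup_{\mu \in \mathbb{R}} \left( \langle b, \mu \rangle - f^*(A^*\mu) \right). \tag{4.28}
\]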

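We have that

\begin{align*}
f^*(A^*\mu) &= \sup_{z \in L^1} \left( \langle A^*\mu, z \rangle - f(z) \right) \\
&= \sup_{z \in L^1} \left( \langle \mu, Az \rangle - f(z) \right) \\
&= \sup_{z \in L^1} \left( \mu \int_{\Omega} z(\omega) \, dP(\omega) - \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) - \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \right) \\
&= \sup_{z \in L^1} \left( -\int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} (\mu - X(\omega)) z(\omega) \, dP(\omega) \right) \\
&= -\inf_{z \in L^1} \left( \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) - \int_{\Omega} (\mu - X(\omega)) z(\omega) \, dP(\omega) \right) \\
&= -\inf_{z \in L^1} \left( \int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega) \right).
\end{align*}

We now want to apply theorem 4.7. To be able to apply this theorem we first need to check that $I(s, \omega) := \varphi(s) - (\mu - X(\omega))s$ is a normal integrand. For this we can use theorem 4.8. $I(s, \omega)$ is convex and closed in $s$ for almost all $\omega$ because $\varphi$ is convex and closed. $1 \in \operatorname{int}(\operatorname{dom} I(\cdot, \omega))$ for every $\omega$, because $1 \in \operatorname{int}(\operatorname{dom} \varphi)$. We also need to prove that

\[
\inf_{z \in L^1} \left( \int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega) \right) \tag{4.29}
\]

is finite. By assumption the minimum of $\varphi$ is 0 which is attained at 1. Hence we have for all $z$

\[
-\infty < \int_{\Omega} (\mu - X(\omega)) \, dP(\omega) \le \int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega). \tag{4.30}
\]

The first strict inequality follows from the fact that $\mathbb{E}_P[X]$ is finite. Hence $\int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega)$ is bounded from below, which implies that $-\infty < \inf_{z \in L^1} \left( \int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega) \right)$. Because $z = 1$ is a possible solution we have

\[
\inf_{z \in L^1} \left( \int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega) \right) \le -\int_{\Omega} (\mu - X(\omega)) \, dP(\omega) < +\infty.
\]

We can conclude that

\[
\inf_{z \in L^1} \left( \int_{\Omega} \left( \varphi(z(\omega)) - (\mu - X(\omega)) z(\omega) \right) dP(\omega) \right)
\]

is finite and that we can apply theorem 4.7. We find that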

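\begin{align*}
f^*(A^*\mu) &= -\int_{\Omega} \inf_{s \in \mathbb{R}} \left( \varphi(s) - (\mu - X(\omega)) s \right) dP(\omega) \\
&= \int_{\Omega} \sup_{s \in \mathbb{R}} \left( (\mu - X(\omega)) s - \varphi(s) \right) dP(\omega) \\
&= \int_{\Omega} \varphi^*(\mu - X(\omega)) \, dP(\omega).
\end{align*}

Using the fact that $b = 1$ we can conclude that

\begin{align*}
d &= \sup_{\mu \in \mathbb{R}} \left( \mu - f^*(A^*\mu) \right) \\
&= \sup_{\mu \in \mathbb{R}} \left( \mu - \int_{\Omega} \varphi^*(\mu - X(\omega)) \, dP(\omega) \right).
\end{align*}

Using theorem 4.5 we can conclude that we have weak duality which means that

\begin{align*}
&\inf_{z \in L^1} \left\{ \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_{\Omega} z(\omega) \, dP(\omega) = 1 \right\} \\
&\qquad \ge \sup_{\mu \in \mathbb{R}} \left( \mu - \int_{\Omega} \varphi^*(\mu - X(\omega)) \, dP(\omega) \right).
\end{align*}

We now want to show that the equality holds. We need to show that $1 \in \operatorname{core}(A \operatorname{dom} f)$; this will follow from the assumption that $1 \in \operatorname{int}(\operatorname{dom} \varphi)$. We will first show that $\operatorname{dom} \varphi \cap \mathbb{R} \subset A \operatorname{dom} f$. Take $w \in \operatorname{dom} \varphi \cap \mathbb{R}$. Then by definition of the effective domain $M := \varphi(w) < +\infty$ and we have that $\int_{\Omega} \varphi(w) \, dP(\omega) + \int_{\Omega} X(\omega) w \, dP(\omega) = M + w \mathbb{E}_P[X] < +\infty$. Hence $w \in \operatorname{dom} f$. Because $Aw = w$ we have $w \in A \operatorname{dom} f$. By assumption we have $1 \in \operatorname{int}(\operatorname{dom} \varphi \cap \mathbb{R})$. Using lemma 4.5 we can conclude that $1 \in \operatorname{int}(A \operatorname{dom} f)$. Using lemma 4.3 we conclude that $1 \in \operatorname{core}(A \operatorname{dom} f)$. We also need to show that $f$ is lower semicontinuous, which means we need to prove that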

\[
\liminf_{z_n \to z_0} f(z_n) \ge f(z_0). \tag{4.31}
\]

Denote with $h(z) := \int_{\Omega} \varphi(z(\omega)) \, dP(\omega)$ and with $g(z) := \int_{\Omega} X(\omega) z(\omega) \, dP(\omega)$. Then $f(z) = h(z) + g(z)$. Using the super-additivity property of the limit inferior we find that

\[
\liminf_{z_n \to z_0} f(z_n) \ge \liminf_{z_n \to z_0} h(z_n) + \liminf_{z_n \to z_0} g(z_n).
\]

We know from lemma 4.6 that $g(z)$ is continuous. This implies that $g(z)$ is lower semicontinuous and we can conclude that $\liminf_{z_n \to z_0} g(z_n) \ge g(z_0)$. For each sequence $z_n \in L^1$, we can define a sequence $\varphi_n(\omega) := \varphi(z_n(\omega))$ such that $\varphi_0(\omega) := \varphi(z_0(\omega))$. Then

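\[
\liminf_{z_n \to z_0} \int_{\Omega} \varphi(z_n(\omega)) \, dP(\omega) = \liminf_{n \to \infty} \int_{\Omega} \varphi_n(\omega) \, dP(\omega).
\]

We have assumed that the minimum of $\varphi$ is 0. Therefore we have that $\varphi_n(\omega) = \varphi(z_n(\omega)) \ge 0$. Because $\int_{\Omega} 0 \, dP(\omega) = 0$ and the $\varphi_n$ are measurable functions because $\varphi$ is a measurable function⁵ we can use Fatou's lemma. We have that

\begin{align*}
\liminf_{n \to \infty} \int_{\Omega} \varphi_n(\omega) \, dP(\omega) &\ge \int_{\Omega} \liminf_{n \to \infty} \varphi_n(\omega) \, dP(\omega) \\
&= \int_{\Omega} \liminf_{z_n \to z_0} \varphi(z_n(\omega)) \, dP(\omega) \\
&\ge \int_{\Omega} \varphi(z_0(\omega)) \, dP(\omega) \\
&= h(z_0).
\end{align*}

Where in the last inequality we have used that $\varphi$ is lower semicontinuous. We find that

\[
\liminf_{z_n \to z_0} f(z_n) \ge \liminf_{z_n \to z_0} h(z_n) + \liminf_{z_n \to z_0} g(z_n) \ge h(z_0) + g(z_0) = f(z_0), \tag{4.32}
\]

which proves the lower semicontinuity of $f$. We can conclude that

\[
\inf_{z \in L^1} \left\{ \int_{\Omega} \varphi(z(\omega)) \, dP(\omega) + \int_{\Omega} X(\omega) z(\omega) \, dP(\omega) \;\middle|\; \int_{\Omega} z(\omega) \, dP(\omega) = 1 \right\} = \sup_{\mu \in \mathbb{R}} \left( \mu - \int_{\Omega} \varphi^*(\mu - X(\omega)) \, dP(\omega) \right), \tag{4.33}
\]

which means that

\[
\inf_{Q \in \mathcal{M}_1(P)} \left( I_{\varphi}(Q|P) + \mathbb{E}_Q[X] \right) = \sup_{\mu \in \mathbb{R}} \left( \mu - \mathbb{E}[\varphi^*(\mu - X)] \right). \tag{4.34}
\]

This concludes the proof.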
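The duality (4.21) can be illustrated on a finite probability space, where both sides are easy to evaluate. The sketch below assumes the divergence function $\varphi(t) = t \ln t - t + 1$ (relative entropy), whose Legendre transform is $\varphi^*(y) = \exp(y) - 1$. The atoms, probabilities and function names are hypothetical choices made for illustration only. For this $\varphi$ the infimum on the left hand side is known to be attained at the exponentially tilted measure $q_i \propto p_i \exp(-x_i)$, which the code uses as a candidate minimiser. The right hand side is maximised by a ternary search, since it is concave in $\eta$.

```python
import math
import random

# Hypothetical three-point position X under P (illustrative numbers only).
outcomes = [-1.0, 0.5, 2.0]
probs = [0.2, 0.5, 0.3]

def phi(t):
    """phi(t) = t*ln(t) - t + 1 for t > 0, with phi(0) = 1 (only t >= 0 is used here)."""
    return t * math.log(t) - t + 1.0 if t > 0.0 else 1.0

def phi_star(y):
    """Legendre transform of phi: phi*(y) = exp(y) - 1."""
    return math.exp(y) - 1.0

def primal(q):
    """E_Q[X] + I_phi(Q|P) for a probability vector q on the atoms of X."""
    mean_q = sum(qi * x for qi, x in zip(q, outcomes))
    divergence = sum(p * phi(qi / p) for qi, p in zip(q, probs))
    return mean_q + divergence

def dual(eta):
    """eta - E_P[phi*(eta - X)]."""
    return eta - sum(p * phi_star(eta - x) for x, p in zip(outcomes, probs))

def maximise_dual(lo=-20.0, hi=20.0, iters=200):
    """Ternary search; the dual objective is concave in eta."""
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if dual(a) < dual(b):
            lo = a
        else:
            hi = b
    return dual(0.5 * (lo + hi))

# Candidate minimiser: the tilted measure q_i proportional to p_i * exp(-x_i).
weights = [p * math.exp(-x) for x, p in zip(outcomes, probs)]
tilted = [w / sum(weights) for w in weights]

primal_value = primal(tilted)
dual_value = maximise_dual()

# Weak duality check: no randomly drawn Q should fall below the dual value.
random.seed(0)
random_values = []
for _ in range(1000):
    raw = [random.random() for _ in outcomes]
    q = [r / sum(raw) for r in raw]
    random_values.append(primal(q))

print(primal_value, dual_value, min(random_values))
```

In this example both sides equal $-\ln \mathbb{E}_P[\exp(-X)]$, which is the optimised certainty equivalent for $u(x) = -\varphi^*(-x) = 1 - \exp(-x)$. The left hand side requires a search over all probability vectors, whereas the right hand side is a one-dimensional problem, as remarked after (4.15).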

Using the relationship between optimised certainty equivalents and $\varphi$-divergence risk measures and the relationship between u-Mean certainty equivalents and utility based shortfall risk measures we can derive the robust representation of a utility based shortfall risk measure. This representation was found in [13]. Their proof is rather technical and is outside the scope of this thesis. Therefore we will use the proof suggested in [5] to obtain this result. For this we will first derive some elementary properties of the Legendre transform. In [11] we found the following result.

Theorem 4.11. If $\varphi$ is a proper convex function which is lower semicontinuous, then $\varphi^{**} = \varphi$. I.e.

\[
\varphi(t) = \sup_{x \in \mathbb{R}} \left( xt - \varphi^*(x) \right). \tag{4.35}
\]

⁵ Because it is lower semicontinuous.

Proof. Without proof, see [11, p479].

From this result we can conclude that if $\varphi$ is the divergence function which is linked to the utility function $u$ by $u(t) = -\varphi^*(-t)$, or equivalently to the loss function by $\varphi^*(t) = l(t)$, then $\varphi$ can be obtained by $\varphi(t) = l^*(t)$.

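Lemma 4.7. Let $f$ be a convex function and let $f^*$ denote its Legendre transform. If $\lambda > 0$ then $(\lambda f)^*(t) = \lambda f^*\left(\frac{t}{\lambda}\right)$.

Proof.

\[
(\lambda f)^*(t) = \sup_{x \in \mathbb{R}} \left( xt - \lambda f(x) \right) = \lambda \sup_{x \in \mathbb{R}} \left( x \frac{t}{\lambda} - f(x) \right) = \lambda f^*\left(\frac{t}{\lambda}\right).
\]

Lemma 4.8. Let $l$ be a convex function and define $\tilde{l} := l - x_0$ with $x_0 \in \mathbb{R}$. Then $\tilde{l}^* = l^* + x_0$.

Proof.

\[
\tilde{l}^*(t) = \sup_{x \in \mathbb{R}} \left( xt - \tilde{l}(x) \right) = \sup_{x \in \mathbb{R}} \left( xt - l(x) + x_0 \right) = \sup_{x \in \mathbb{R}} \left( xt - l(x) \right) + x_0 = l^*(t) + x_0.
\]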
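Both rules for the Legendre transform can be sanity-checked numerically. The following sketch assumes the exponential loss $l(x) = \exp(x) - 1$, whose transform is $l^*(y) = y \ln y - y + 1$ for $y > 0$, and approximates the supremum in definition (4.14) by a ternary search over a hypothetical bracket. The helper name `conjugate` and the values of $\lambda$, $x_0$ and $t$ are illustrative assumptions.

```python
import math

def conjugate(f, y, lo=-30.0, hi=30.0, iters=200):
    """Numerical Legendre transform f*(y) = sup_x (x*y - f(x)) of a convex f.

    x -> x*y - f(x) is concave, so a ternary search on [lo, hi] is used; the
    bracket is assumed to contain the maximiser.
    """
    def objective(x):
        return x * y - f(x)
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(a) < objective(b):
            lo = a
        else:
            hi = b
    return objective(0.5 * (lo + hi))

def loss(x):
    """Exponential loss l(x) = exp(x) - 1."""
    return math.exp(x) - 1.0

# Illustrative parameter values (assumptions, not taken from the thesis).
lam, x0, t = 3.0, 0.5, 2.0

# Lemma 4.7: (lam * l)*(t) = lam * l*(t / lam).
lhs_scaling = conjugate(lambda x: lam * loss(x), t)
rhs_scaling = lam * conjugate(loss, t / lam)

# Lemma 4.8: (l - x0)* = l* + x0.
lhs_shift = conjugate(lambda x: loss(x) - x0, t)
rhs_shift = conjugate(loss, t) + x0

print(lhs_scaling, rhs_scaling, lhs_shift, rhs_shift)
```

The check is only meaningful for arguments $t$ at which the supremum is attained inside the bracket, which is the case for $t > 0$ with this loss.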

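Theorem 4.12. For any convex loss function $l$, the minimal penalty function in the representation (4.7) is given by

\[
\alpha^{\min}(Q) = \inf_{\lambda > 0} \frac{1}{\lambda} \left( x_0 + \mathbb{E}_P\left[ l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right), \quad Q \in \mathcal{M}_1(P). \tag{4.36}
\]

In particular we have

\[
SF^{l}_{x_0}(X) = \max_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q(-X) - \inf_{\lambda > 0} \frac{1}{\lambda} \left( x_0 + \mathbb{E}_P\left[ l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right) \right), \quad X \in L^{\infty}. \tag{4.37}
\]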

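Proof. By definition we have that

\begin{align*}
SF^{l}_{x_0}(X) &= \inf\{\eta \in \mathbb{R} \mid \mathbb{E}_P[l(-X - \eta)] \le x_0\} \\
&= \inf\{\eta \in \mathbb{R} \mid \mathbb{E}_P[l(-X - \eta) - x_0] \le 0\} \\
&= \inf\{\eta \in \mathbb{R} \mid \mathbb{E}_P[\tilde{l}(-X - \eta)] \le 0\} \\
&= SF^{\tilde{l}}_{0}(X),
\end{align*}

where $\tilde{l} = l - x_0$. Denote with $\tilde{u}$ the associated utility function. From equation 4.2 we have that $SF^{l}_{x_0}(X) = -M_{\tilde{u}}(X)$. Using equation 4.9 and theorem 4.10 we have that

\begin{align*}
M_{\tilde{u}}(X) &= \inf_{\lambda > 0} \left( OCE_{\lambda \tilde{u}}(X) \right) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + I_{\tilde{\varphi}}(Q|P) \right) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \mathbb{E}_P\left[ \tilde{\varphi}\left( \frac{dQ}{dP} \right) \right] \right) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \mathbb{E}_P\left[ \lambda \tilde{l}^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{\lambda > 0} \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \mathbb{E}_P\left[ \lambda x_0 + \lambda l^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \inf_{\lambda > 0} \mathbb{E}_P\left[ \lambda x_0 + \lambda l^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \inf_{\lambda > 0} \lambda \mathbb{E}_P\left[ x_0 + l^*\left( \frac{dQ}{\lambda \, dP} \right) \right] \right) \\
&= \inf_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[X] + \inf_{\lambda > 0} \frac{1}{\lambda} \mathbb{E}_P\left[ x_0 + l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right) \\
&= -\sup_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[-X] - \inf_{\lambda > 0} \frac{1}{\lambda} \mathbb{E}_P\left[ x_0 + l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right).
\end{align*}

In the fourth equality we have used that $\tilde{\varphi}^*(t) = -\lambda \tilde{u}(-t)$ and hence $\tilde{\varphi}(t) = (\lambda \tilde{l})^*(t)$. Using lemma 4.7 we have that $(\lambda \tilde{l})^*(t) = \lambda \tilde{l}^*\left(\frac{t}{\lambda}\right)$. In the fifth equality we have used lemma 4.8. In the eighth equality we have replaced $\lambda$ by $1/\lambda$, which does not change the infimum over $\lambda > 0$. Using the relation $SF^{l}_{x_0} = -M_{\tilde{u}}(X)$ we can conclude that

\[
SF^{l}_{x_0} = \sup_{Q \in \mathcal{M}_1(P)} \left( \mathbb{E}_Q[-X] - \inf_{\lambda > 0} \frac{1}{\lambda} \mathbb{E}_P\left[ x_0 + l^*\left( \lambda \frac{dQ}{dP} \right) \right] \right), \tag{4.38}
\]

which proves the theorem.

We have shown the relation of utility based risk measures with the certainty equivalents defined in chapter two. We have also discussed the relation between these risk measures. We summarize the main results in table 4.1.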

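Table 4.1: Summary of utility based risk measures

                       | $SF^{l}_{x_0}(X)$ | $D_{\varphi}(X)$
Certainty equivalent   | $-M_{\tilde{u}}(X)$ with $\tilde{u}(x) = -l(-x) + x_0$ | $-OCE_u(X)$ with $-u(-x) = \varphi^*(x)$
Utility representation | $-\sup\{m \in \mathbb{R} \mid \mathbb{E}[u(X - m)] \ge -x_0\}$ | $-\sup_{\eta \in \mathbb{R}} (\eta + \mathbb{E}[u(X - \eta)])$
Penalty function       | $\inf_{\lambda > 0} \frac{1}{\lambda} \mathbb{E}_P\left[x_0 + l^*\left(\lambda \frac{dQ}{dP}\right)\right]$ | $\mathbb{E}_P\left[\varphi\left(\frac{dQ}{dP}\right)\right]$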

At this point we will take a closer look at the assumptions we have made. One of these assumptions was that the utility functions we use are normalised. We followed [5] by only considering the subset of utility functions which are non-decreasing and concave and for which $u(0) = 0$ and $1 \in \partial u(0)$. The authors of [5] give no clear explanation why they chose this normalisation. However they do state that they need this normalisation to be able to give a clear economic interpretation of the optimised certainty equivalent. They interpret the optimised certainty equivalent as a decision problem and use the utility function to 'discount' an uncertain payoff. If $X$ is an uncertain payoff, then $\mathbb{E}[u(X)]$ is the value of this payoff. If you give the investor the possibility to consume a part $\eta$ of this uncertain income in advance, then he gets $\eta + \mathbb{E}[u(X - \eta)]$. The investor tries to optimise the decision on how much to consume in advance. Using this normalisation they guarantee that $u(x) \le x$. As we have shown in the second chapter this is equivalent with $OCE_u(X) \le \mathbb{E}[X]$, a condition which reflects risk aversion. If the investor consumes too much in advance then $u(\cdot)$ will penalise this. If on the other hand the investor consumes too little in advance then this can be seen as a missed opportunity. All his money (or more) is stuck in the uncertain payoff and since he is risk averse, this would also be penalised by $u(\cdot)$. The investor's optimal allocation results in the optimised certainty equivalent. Remember that under the von Neumann-Morgenstern axioms the utility function of an investor is only unique up to an affine transformation. This has the undesirable effect that the same investor, modelled by two different utility functions, can have different optimised certainty equivalents, because the optimised certainty equivalent is not invariant under an affine transformation of the utility function $u$. This makes the optimised certainty equivalent not a 'real' certainty equivalent.
From an economic point of view the standardisation of the utility functions is essential to give a clear interpretation to the optimised certainty equivalent, and to bypass the problem that the optimised certainty equivalent is not invariant under an affine transformation of the utility function. From a mathematical point of view however, the dependence of the optimised certainty equivalent on the specific standardisation of the utility function is not really a problem but can be viewed as an opportunity. By wisely choosing a standardisation of the utility function, one can alter the optimised certainty equivalent, and thus the risk measure. Now a new question arises: "What would be a good standardisation of the utility function from a mathematical point of view?" To answer this question we will need to further examine the connection between the utility function and the divergence function. In the robust representation of the divergence risk measure one can observe that the divergence is a penalty function. A good penalty function would heavily penalise models $Q$ which deviate a lot from the fixed model $P$, while lightly penalising models which are very close to $P$. Therefore it would be intuitive to assume that the divergence penalises the model $P$ the least. That is, $\varphi(t)$ attains its minimum for $t = 1$. Using that $-u(-x) = \varphi^*(x)$ we have that

\[
u(0) = -\sup_{t \in \mathbb{R}} \left( 0 \cdot t - \varphi(t) \right) = \inf_{t \in \mathbb{R}} \varphi(t).
\]

Assuming the infimum is attained we can conclude that $u(0)$ is the minimal penalty given. In what follows we will assume that $u \in C^1$. We are interested in what the condition $u'(0) = 1$ imposes on the divergence function $\varphi$. Notice that $\varphi(1) = \sup_{x \in \mathbb{R}} (1 \cdot x + u(-x))$. If $u'(0) = 1$, then $1 - u'(-x) = 0$ has a solution $x = 0$. Because $u$ is concave, the map $x \mapsto x + u(-x)$ is concave as well, so this stationary point is a maximum. Therefore $\varphi(1) = u(0)$, which again states that the penalty given to $P$ equals $u(0)$. Hence the standardisation $u(0) = 0$ and $u'(0) = 1$ implies that $\varphi$ attains its minimum at 1 and $\varphi(1) = 0$.

4.2.2 The coherence of divergence risk measures

We know that divergence risk measures are always convex. This follows easily from the properties of the optimised certainty equivalent proven in theorem 2.4 of the second chapter. However divergence risk measures are not always coherent. In chapter one, theorem 1.7, we have seen that the penalty function of a coherent risk measure can only take the values 0 or $+\infty$. In the case of divergence risk measures this would imply that the divergence is either 0 or $+\infty$. This is a rather restrictive condition. It is now natural to ask which utility functions give rise to coherent divergence risk measures. This question was answered in [5]. For this the authors considered the class of strongly risk averse utility functions $U_0^{<}$, i.e.

\[
u \in U_0^{<} \text{ if and only if } u \in U_0 \text{ and } u(t) < t \;\; \forall t \ne 0.
\]

We will further assume that u is continuous. In [5, lemma 2.1] we find following lemma.

Lemma 4.9. Let $u : \mathbb{R} \to [-\infty, +\infty]$ be a proper closed and concave function. Then the right and left derivatives $u'_+$ and $u'_-$ exist as extended real numbers, and

1. for all $a < t < b$ we have that $u'_+(a) \ge u'_-(t) \ge u'_+(t) \ge u'_-(b)$, and

2. the subdifferential is given by

\[
\partial u(t) = \{s \in \mathbb{R} \mid u'_+(t) \le s \le u'_-(t)\}. \tag{4.39}
\]

Denote with $g(\eta) := \eta + \mathbb{E}[u(X - \eta)]$, then it is stated in [5, proposition 2.1] that

\[
\eta^* \in \arg\max(g(\eta)) \Leftrightarrow \mathbb{E}\left[u'_+(X - \eta^*)\right] \le 1 \le \mathbb{E}\left[u'_-(X - \eta^*)\right]. \tag{4.40}
\]

Here the authors have assumed that they can freely interchange the derivative and the expectation operator. They claim this is the case when the one-sided derivatives of $u$ are continuous and when the associated expected values are finite. Hence if $u \in C^1$ we have that

\[
\eta^* \in \arg\max(g(\eta)) \Leftrightarrow \mathbb{E}[u'(X - \eta^*)] = 1. \tag{4.41}
\]

In [5, Theorem 3.1] we find the following theorem which characterises the utility functions for which the associated divergence risk measure is coherent.

Theorem 4.13. In the class $U_0^{<}$ of strongly risk-averse utility functions that are finite valued, the divergence risk measure $D_{\varphi}(X) = -OCE_u(X)$ is a coherent risk measure if and only if $u$ is the piecewise linear function given by

\[
u(x) =
\begin{cases}
\gamma_2 x, & \text{if } x \le 0 \\
\gamma_1 x, & \text{if } x > 0
\end{cases} \tag{4.42}
\]

for some $\gamma_2 > 1 > \gamma_1 \ge 0$.

Proof. The proof of this theorem was taken from [5, propositions 3.1, 3.2] and consists of two parts, theorem 4.14 and 4.15.
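The contrast between a piecewise linear utility and the exponential utility can be made concrete with the bond position from the numerical example earlier in this chapter. The sketch below is illustrative only: the function names and the choice $\gamma_1 = 0.5$, $\gamma_2 = 2$ are assumptions. For the piecewise linear utility, $\eta \mapsto \eta + \mathbb{E}[u(X - \eta)]$ is concave and piecewise linear with kinks at the atoms of $X$, so for a discrete $X$ the supremum is attained at one of the atoms. For the exponential utility $u(x) = 1 - \exp(-x)$ the standard closed form $OCE_u(X) = -\ln \mathbb{E}[\exp(-X)]$ is used.

```python
import math

# Bond position from the earlier example: discounted net payoff and its probabilities.
outcomes = [-100.0 / 1.02, 5.0 / 1.02]
probs = [0.01, 0.99]

def oce_piecewise_linear(xs, ps, gamma1, gamma2):
    """OCE for u(x) = gamma2*x (x <= 0), gamma1*x (x > 0), gamma2 > 1 > gamma1 >= 0.

    The objective eta + E[u(X - eta)] is concave and piecewise linear with kinks
    at the atoms of X, so it suffices to evaluate it at the atoms.
    """
    def u(x):
        return gamma2 * x if x <= 0 else gamma1 * x
    return max(eta + sum(p * u(x - eta) for x, p in zip(xs, ps)) for eta in xs)

def oce_exponential(xs, ps):
    """OCE for u(x) = 1 - exp(-x), i.e. -ln E[exp(-X)].

    A log-sum-exp shift keeps the exponentials in a safe numerical range.
    """
    shift = max(-x for x in xs)
    return -(shift + math.log(sum(p * math.exp(-x - shift) for x, p in zip(xs, ps))))

doubled = [2.0 * x for x in outcomes]

# Divergence risk measure D_phi = -OCE_u for the piecewise linear utility.
risk_pl = -oce_piecewise_linear(outcomes, probs, 0.5, 2.0)
risk_pl_doubled = -oce_piecewise_linear(doubled, probs, 0.5, 2.0)

# The same for the exponential utility.
risk_exp = -oce_exponential(outcomes, probs)
risk_exp_doubled = -oce_exponential(doubled, probs)

print(risk_pl_doubled, 2.0 * risk_pl)      # these agree: positive homogeneity holds
print(risk_exp_doubled, 2.0 * risk_exp)    # these differ: 191.47 versus 186.87
```

Doubling the position doubles the risk under the piecewise linear utility, whereas under the exponential utility the risk of the doubled position exceeds twice the original risk, in line with the figures 186.87 and 191.47 reported in the earlier example.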

< Theorem 4.14. Let u ∈ U0 . Then OCEu(X) is positively homogeneous for all random variables X if and only if u is positively homogeneous.

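Proof. First suppose that $u$ is positively homogeneous. Then we need to show that $OCE_u$ is positively homogeneous. Take $\lambda > 0$. Substituting $\lambda \eta$ for $\eta$ in the supremum, we have that

\begin{align*}
OCE_u(\lambda X) &= \sup_{\eta \in \mathbb{R}} \left( \eta + \mathbb{E}[u(\lambda X - \eta)] \right) \\
&= \sup_{\eta \in \mathbb{R}} \left( \lambda \eta + \mathbb{E}[u(\lambda X - \lambda \eta)] \right) \\
&= \sup_{\eta \in \mathbb{R}} \left( \lambda \eta + \lambda \mathbb{E}[u(X - \eta)] \right) \\
&= \lambda \sup_{\eta \in \mathbb{R}} \left( \eta + \mathbb{E}[u(X - \eta)] \right) \\
&= \lambda \, OCE_u(X),
\end{align*}

which proves that the OCE is positively homogeneous.

Conversely, suppose that $OCE_u$ is positively homogeneous. Take $\alpha > 0 > \beta$, and consider the random variable $X$ such that $P(X = \alpha) = p$ and $P(X = \beta) = 1 - p$. Now denote with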

g(η) := η + pu(α − η) + (1 − p)u(β − η). (4.43)

Then the optimised certainty equivalent is given by

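\[
OCE_u(X) = \sup_{\eta \in \mathbb{R}} \left( \eta + p u(\alpha - \eta) + (1 - p) u(\beta - \eta) \right) = \sup_{\eta \in \mathbb{R}} g(\eta). \tag{4.44}
\]

Because $u \in U_0^{<}$ we have that $1 \in \partial u(0)$. Hence by lemma 4.9 we have that

\[
u'_+(0) \le 1 \le u'_-(0).
\]

Because $\alpha > 0 > \beta$ we can again apply lemma 4.9 such that

\[
u'_-(\alpha) \ge u'_+(\alpha) \ge u'_-(0) \ge 1 \ge u'_+(0) \ge u'_-(\beta) \ge u'_+(\beta) > 0. \tag{4.45}
\]

We know from equation 4.40 that

\[
\eta^* \in \arg\max(g(\eta)) \Leftrightarrow \mathbb{E}\left[u'_-(X - \eta^*)\right] \ge 1 \ge \mathbb{E}\left[u'_+(X - \eta^*)\right]. \tag{4.46}
\]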

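Hence $0 \in \arg\max(g(\eta))$ if and only if

\[
p\,u'_-(\alpha) + (1 - p)\,u'_-(\beta) \ge 1 \ge p\,u'_+(\alpha) + (1 - p)\,u'_+(\beta).
\]

From this we have that

\[
p\,u'_-(\alpha) + (1 - p)\,u'_-(\beta) \ge 1 \iff p \ge \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)},
\]
and
\[
p\,u'_+(\alpha) + (1 - p)\,u'_+(\beta) \le 1 \iff p \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}.
\]
Hence

\[
0 \in \arg\max(g(\eta)) \iff \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} \le p \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}. \tag{4.47}
\]
We will now check whether the right hand side of equation 4.47 is well defined. Because $u'_-(\alpha) > 1 > u'_-(\beta) > 0$, and similarly $u'_+(\alpha) > 1 > u'_+(\beta) > 0$, we have that
\[
0 < \frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} < 1 \quad \text{and} \quad 0 < \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)} < 1.
\]
We conclude that we always have that $p \in (0, 1)$. We also need to show that

\[
\frac{1 - u'_-(\beta)}{u'_-(\alpha) - u'_-(\beta)} \le \frac{1 - u'_+(\beta)}{u'_+(\alpha) - u'_+(\beta)}. \tag{4.48}
\]
We will prove that

\[
(1 - u'_+(\beta))(u'_-(\alpha) - u'_-(\beta)) - (1 - u'_-(\beta))(u'_+(\alpha) - u'_+(\beta)) \ge 0. \tag{4.49}
\]

We have that

\begin{align*}
&(1 - u'_+(\beta))(u'_-(\alpha) - u'_-(\beta)) - (1 - u'_-(\beta))(u'_+(\alpha) - u'_+(\beta)) \\
&\quad= u'_-(\alpha) - u'_-(\beta) - u'_-(\alpha)u'_+(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad= u'_-(\alpha)\left(1 - u'_+(\beta)\right) - u'_-(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad\ge u'_+(\alpha)\left(1 - u'_+(\beta)\right) - u'_-(\beta) - u'_+(\alpha) + u'_+(\beta) + u'_+(\alpha)u'_-(\beta) \\
&\quad= u'_+(\alpha)\left(1 - u'_+(\beta) - 1 + u'_-(\beta)\right) - u'_-(\beta) + u'_+(\beta) \\
&\quad= u'_+(\alpha)\left(u'_-(\beta) - u'_+(\beta)\right) - \left(u'_-(\beta) - u'_+(\beta)\right) \\
&\quad= \left(u'_-(\beta) - u'_+(\beta)\right)\left(u'_+(\alpha) - 1\right) \ge 0.
\end{align*}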

In the first inequality we have used the fact that $u'_-(\alpha) \ge u'_+(\alpha)$. We can conclude that the expression on the right side of 4.47 is well defined. Now take $p_0$ such that it satisfies 4.47. Then $\eta^* = 0$ is an optimal solution and $\mathrm{OCE}_u(X) = p_0 u(\alpha) + (1 - p_0)u(\beta)$. Take $\lambda \in (0, 1)$. Because we do not necessarily know that $\eta^* = 0$ is an optimal solution for $\mathrm{OCE}_u(\lambda X)$, we only have the following inequalities.

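\begin{align*}
\mathrm{OCE}_u(\lambda X) &= \sup_{\eta \in \mathbb{R}} \left(\eta + \mathbb{E}[u(\lambda X - \eta)]\right) \\
&\ge p_0 u(\lambda\alpha) + (1 - p_0)u(\lambda\beta) \\
&= p_0 u(\lambda\alpha + (1 - \lambda)0) + (1 - p_0)u(\lambda\beta + (1 - \lambda)0) \\
&\ge \lambda\left(p_0 u(\alpha) + (1 - p_0)u(\beta)\right) + (1 - \lambda)\left(p_0 u(0) + (1 - p_0)u(0)\right) \\
&= \lambda\left(p_0 u(\alpha) + (1 - p_0)u(\beta)\right) \\
&= \lambda\, \mathrm{OCE}_u(X),
\end{align*}

where we have used that $u$ is a concave function such that $u(0) = 0$. Because we assumed that the optimised certainty equivalent is positively homogeneous, all inequalities must hold with equality. Hence we have that

\[
p_0 u(\lambda\alpha) + (1 - p_0)u(\lambda\beta) = \lambda p_0 u(\alpha) + \lambda(1 - p_0)u(\beta).
\]
We can rewrite this and find that

\[
p_0\left(u(\lambda\alpha) - \lambda u(\alpha)\right) + (1 - p_0)\left(u(\lambda\beta) - \lambda u(\beta)\right) = 0. \tag{4.50}
\]

Because $u$ is a concave utility function for which $u(0) = 0$, we have that for all $x \in \mathbb{R}$ and $\lambda \in [0, 1]$
\[
u(\lambda x) = u(\lambda x + (1 - \lambda)0) \ge \lambda u(x) + (1 - \lambda)u(0) = \lambda u(x).
\]

Because $p_0 \in (0, 1)$ and $u(\lambda x) - \lambda u(x) \ge 0$ for all $x \in \mathbb{R}$, both terms in the sum 4.50 are non-negative, and because they sum to zero they must both be zero. Hence we can conclude that
\[
\begin{cases} u(\lambda\alpha) = \lambda u(\alpha), & \forall \alpha > 0 \\ u(\lambda\beta) = \lambda u(\beta), & \forall \beta < 0. \end{cases} \tag{4.51}
\]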

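Because $u(0) = 0$, we conclude that

\[
u(\lambda x) = \lambda u(x), \quad \forall \lambda \in [0, 1],\ \forall x \in \mathbb{R}. \tag{4.52}
\]
If $\lambda > 1$, then there exists a $\mu \in (0, 1)$ such that $\lambda = \frac{1}{\mu}$. We then have that $u(\mu x) = \mu u(x)$, which means that $u\left(\frac{1}{\lambda}x\right) = \frac{1}{\lambda}u(x)$. Because this holds for all $x \in \mathbb{R}$, it also holds for $\lambda x$, so that $u\left(\frac{1}{\lambda}\lambda x\right) = \frac{1}{\lambda}u(\lambda x)$, that is $u(x) = \frac{1}{\lambda}u(\lambda x)$. We can conclude that

\[
u(\lambda x) = \lambda u(x), \quad \forall \lambda > 0,\ \forall x \in \mathbb{R}. \tag{4.53}
\]
This means that $u$ is positively homogeneous, which concludes our proof.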

The next theorem characterises the positively homogeneous utility functions from $U_0^<$.

Theorem 4.15. Let $u \in U_0^<$ be a finite positively homogeneous utility function. Then $u$ is a piecewise linear function, i.e.
\[
u(x) = \begin{cases} \gamma_2 x, & x \le 0 \\ \gamma_1 x, & x > 0, \end{cases} \tag{4.54}
\]

where $\gamma_2 > 1 > \gamma_1 \ge 0$.

The proof of this theorem is based on a lemma found in [25, corollary 13.2.1].

Lemma 4.10. Let $f$ be any positively homogeneous convex function which is not identically $+\infty$. Then $\mathrm{cl}(f)$ is the support function of a certain closed convex set $C$, namely
\[
C := \{y \mid \forall x,\ \langle x, y\rangle \le f(x)\}. \tag{4.55}
\]
Proof. Without proof, see [25, corollary 13.2.1].

Furthermore it is stated in [25, p. 51] that for proper convex functions the closedness property $\mathrm{cl}(f) = f$ is equivalent to lower semicontinuity.

Proof. (theorem 4.15) Denote with $l(x) := -u(-x)$; then $l$ is a positively homogeneous convex function. $l$ is also continuous because $u$ is. Using lemma 4.10 we know that $l$ is the support function of a closed convex subset of $\mathbb{R}$, i.e. an interval $[\gamma_1, \gamma_2]$ with $\gamma_1 \le \gamma_2$. Because $[\gamma_1, \gamma_2] = \{y \mid \forall x,\ \langle x, y\rangle \le l(x)\}$, we have the following representation for $l$.

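\[
l(x) = -u(-x) = \sup_{\gamma_1 \le y \le \gamma_2} (xy). \tag{4.56}
\]
Hence we have that
\[
l(x) = \begin{cases} \gamma_1 x, & x \le 0 \\ \gamma_2 x, & x \ge 0. \end{cases} \tag{4.57}
\]
Then the utility function $u$ is given by
\[
u(x) = \begin{cases} \gamma_2 x, & x \le 0 \\ \gamma_1 x, & x \ge 0. \end{cases} \tag{4.58}
\]

Because $u \in U_0^<$ we have that $u(x) < x$ for $x \neq 0$; this implies that $\gamma_2 > 1 > \gamma_1$. Because $u$ is non-decreasing we also know that $\gamma_1 \ge 0$.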
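The positive homogeneity of the optimised certainty equivalent for a piecewise linear utility can be checked numerically on a small sample. The sketch below is only an illustration: the payoffs, the slopes `gamma1` and `gamma2` and the scaling factor `lam` are hypothetical values, and the expectation is taken with respect to the empirical distribution of the sample.

```python
def oce_piecewise_linear(sample, gamma1, gamma2):
    """Sample OCE_u for u(x) = gamma2*x if x <= 0 and gamma1*x if x > 0."""

    def u(x):
        return gamma2 * x if x <= 0 else gamma1 * x

    def g(eta):
        return eta + sum(u(x - eta) for x in sample) / len(sample)

    # g is concave and piecewise linear with kinks only at the sample points.
    # With gamma2 > 1 > gamma1 it increases on the far left and decreases on
    # the far right, so the supremum is attained at one of the sample points.
    return max(g(eta) for eta in sample)


returns = [-1.5, -0.4, 0.1, 0.3, 0.8, 1.2]   # hypothetical payoffs
gamma1, gamma2 = 0.5, 2.0                    # illustrative slopes
lam = 3.0                                    # illustrative scaling factor

base = oce_piecewise_linear(returns, gamma1, gamma2)
scaled = oce_piecewise_linear([lam * x for x in returns], gamma1, gamma2)
print(base, scaled, lam * base)
```

Up to rounding, the last two printed values should agree, which is what theorem 4.14 predicts for this choice of utility.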

4.2.3 Examples

In this section we will try to clarify the concept of divergence based risk measures further by calculating the corresponding utility function of some known divergence functions. The $\chi^2$-divergence is given by $\phi(t) = (t - 1)^2$. We have that
\begin{align*}
-u(-x) &= \sup_{t \in \mathbb{R}} \left(xt - \phi(t)\right) \\
&= \sup_{t \in \mathbb{R}} \left(xt - (t - 1)^2\right).
\end{align*}

The first order condition yields that $x = 2(t - 1)$. The second order condition is satisfied because the second derivative equals $-2 < 0$. From this we conclude that $xt - (t - 1)^2$ attains a maximum for $t = \frac{x}{2} + 1$. Hence:
\begin{align*}
-u(-x) &= x\left(\frac{x}{2} + 1\right) - \left(\frac{x}{2}\right)^2 \\
&= \frac{x^2}{4} + x.
\end{align*}

We conclude that the corresponding utility function is given by $u(x) = -\frac{x^2}{4} + x$. The Hellinger divergence is given by $\phi(t) = \left(\sqrt{t} - 1\right)^2$. We have that

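\begin{align*}
-u(-x) &= \sup_{t \in \mathbb{R}} \left(xt - \phi(t)\right) \\
&= \sup_{t \in \mathbb{R}} \left(xt - \left(\sqrt{t} - 1\right)^2\right).
\end{align*}

The first order condition yields that $x = 1 - \frac{1}{\sqrt{t}}$, or $t = \frac{1}{(1 - x)^2}$. The second order condition is satisfied because $-\frac{1}{2\sqrt{t^3}} < 0$. We find that

\begin{align*}
-u(-x) &= \frac{x}{(1 - x)^2} - \left(\frac{1}{1 - x} - 1\right)^2 \\
&= \frac{x}{(1 - x)^2} - \left(\frac{x}{1 - x}\right)^2 \\
&= \frac{x - x^2}{(1 - x)^2} \\
&= \frac{x}{1 - x}.
\end{align*}

We can conclude that $u(x) = \frac{x}{1 + x}$. The reader might notice we have not included the Kullback-Leibler divergence. In the next chapter we will show that the associated utility function is the exponential utility function.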
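Both closed forms can be compared with a direct numerical evaluation of the supremum. The following sketch assumes that the supremum may be restricted to a bounded interval that contains the maximiser (the intervals and the test points are arbitrary choices), and it uses a plain ternary search, which is sufficient because the objective is concave in $t$.

```python
import math


def maximise_concave(f, lo, hi, iters=200):
    """Ternary search for the maximal value of a concave function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2.0)


def chi2(t):
    return (t - 1.0) ** 2


def hellinger(t):
    return (math.sqrt(t) - 1.0) ** 2


def neg_u_neg(phi, x, lo, hi):
    """Numerical value of -u(-x) = sup_t (x*t - phi(t)), with t in [lo, hi]."""
    return maximise_concave(lambda t: x * t - phi(t), lo, hi)


# chi^2-divergence: compare with x^2/4 + x
for x in (-3.0, -0.5, 1.0, 3.0):
    print(x, neg_u_neg(chi2, x, -10.0, 10.0), x * x / 4.0 + x)

# Hellinger divergence (requires t >= 0 and x < 1): compare with x/(1 - x)
for x in (-2.0, -0.5, 0.5):
    print(x, neg_u_neg(hellinger, x, 0.0, 50.0), x / (1.0 - x))
```

In each printed row the numerical and the closed-form value should coincide up to rounding.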

4.3 The ordinary certainty equivalent as risk measure

In previous sections we have found two ways to construct convex risk measures using utility functions: utility based shortfall risk and divergence risk measures. Using a duality theorem from mathematical optimisation we found that these utility based risk measures were the dual optimisation problems of negative certainty equivalents.

The link with certainty equivalents is not surprising at all. Certainty equivalents try to define a risk-free amount that is equivalent to an uncertain gamble. If this amount is negative, this means you are willing to pay some amount to not have to incur the risk of the gamble. This amount is then used as the risk measure. It is now natural to ask whether we could use the ordinary certainty equivalent as a risk measure. That is, "would $\rho(X) = -\mathrm{CE}_u(X)$ be a good risk measure?" In the first chapter we defined some axioms which a "good" risk measure should satisfy. First of all $-\mathrm{CE}_u(X)$ would need to be a monetary risk measure. For this it needs to satisfy the translation property. This means we need to have

\[
\mathrm{CE}_u(X + m) = \mathrm{CE}_u(X) + m \quad \forall m \in \mathbb{R}. \tag{4.59}
\]
It turns out that the restriction 4.59 is a rather severe restriction on the possible utility functions we can use. In what follows we will further assume that $u$ is strictly increasing and $u \in C^2$. Now define for all $m \in \mathbb{R}$ $u_m(x) := u(x + m)$; then $u_m$ is also strictly increasing and $u_m \in C^2$. Because both $u$ and $u_m$ are strictly increasing, the inverse functions $u^{-1}$ and $u_m^{-1}$ are well defined. For all $m \in \mathbb{R}$ we have:

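\begin{align*}
\mathrm{CE}_u(X + m) = \mathrm{CE}_u(X) + m &\iff u^{-1}\left(\mathbb{E}[u(X + m)]\right) = u^{-1}\left(\mathbb{E}[u(X)]\right) + m \\
&\implies \mathbb{E}[u(X + m)] = u\left(u^{-1}\left(\mathbb{E}[u(X)]\right) + m\right) \\
&\implies \mathbb{E}[u_m(X)] = u_m\left(\mathrm{CE}_u(X)\right) \\
&\implies u_m^{-1}\left(\mathbb{E}[u_m(X)]\right) = \mathrm{CE}_u(X) \\
&\implies \mathrm{CE}_{u_m}(X) = \mathrm{CE}_u(X).
\end{align*}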

From theorem 2.3 of chapter 2 we know that for all $m \in \mathbb{R}$

\[
\mathrm{CE}_{u_m}(X) = \mathrm{CE}_u(X) \iff r_A^{u_m}(x) = r_A^{u}(x) \quad \forall x \in \mathbb{R},
\]

where $r_A^u(x)$ denotes the Arrow-Pratt coefficient of absolute risk aversion. We have that
\[
r_A^{u_m}(x) = \frac{-u_m''(x)}{u_m'(x)} = \frac{-u''(x + m)}{u'(x + m)} = r_A^u(x + m).
\]
This means we need to have $r_A^u(x + m) = r_A^u(x)$ for all $m \in \mathbb{R}$ and $x \in \mathbb{R}$. From this we derive that the Arrow-Pratt coefficient of absolute risk aversion $r_A^u(x)$ is independent of $x$, i.e. it is constant. Thus $u$ is a linear or an exponential utility function. Because linear utility functions imply a risk neutral attitude, they are not desirable for constructing a risk measure. In theorem 2.7 of chapter two we have shown that for the (normalised) exponential utility function all three different certainty equivalents coincide.
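A small numerical experiment illustrates how restrictive the translation property 4.59 is. The sketch below compares the exponential utility with the logarithmic utility, whose absolute risk aversion $1/x$ is not constant; the payoffs, the shift `m` and the risk aversion `a` are hypothetical values chosen only for illustration.

```python
import math


def certainty_equivalent(u, u_inv, sample, shift=0.0):
    """CE_u(X + shift) = u^{-1}(E[u(X + shift)]) under the empirical measure."""
    mean_utility = sum(u(x + shift) for x in sample) / len(sample)
    return u_inv(mean_utility)


a = 2.0  # illustrative coefficient of absolute risk aversion


def u_exp(x):
    return (1.0 - math.exp(-a * x)) / a


def u_exp_inv(y):
    return -math.log(1.0 - a * y) / a


def u_log(x):            # only defined for x > 0
    return math.log(x)


def u_log_inv(y):
    return math.exp(y)


payoffs = [0.5, 1.0, 2.0, 4.0]   # hypothetical positive payoffs
m = 1.5                          # hypothetical cash amount

exp_gap = (certainty_equivalent(u_exp, u_exp_inv, payoffs, m)
           - certainty_equivalent(u_exp, u_exp_inv, payoffs))
log_gap = (certainty_equivalent(u_log, u_log_inv, payoffs, m)
           - certainty_equivalent(u_log, u_log_inv, payoffs))
print(exp_gap, log_gap)
```

For the exponential utility the difference equals `m` up to rounding, whereas for the logarithmic utility it does not.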

5 Utility functions

In this chapter we will take a closer look at some of the classes of utility functions we encountered in the literature. We will discuss their general properties and whether they are suitable to be used in utility based risk measures. For each of the utility functions we will calculate the associated divergence function using the Legendre transform. Furthermore we will also illustrate the effect of the parameters that occur in both the utility based shortfall risk and the divergence risk. For this we have simulated 10000 returns from a normal distribution with mean 0.25 and standard deviation $\sigma$. One can think of the log-returns of a stock which follows a Brownian motion with a drift of 0.25 and a volatility of $\sigma$. Because the volatility of a stock is linked to the riskiness of this stock, we are interested to see the effect of $\sigma$ on the risk measure. We expect to see that higher values of $\sigma$ coincide with higher values of the risk measures. Inspired by the results of these simulations we can state the following lemma, which does not assume a specific distribution of the returns.

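Lemma 5.1. Let $u_\alpha : \mathbb{R} \to \mathbb{R}$ be a class of utility functions with a parameter $\alpha$. If $u_{\alpha_1}(x) \ge u_{\alpha_2}(x)$ for all $x \in \mathbb{R}$, then we have for $X \in L^\infty(\Omega, \mathcal{F}, P)$ that

1. $D_{\phi_{\alpha_1}}(X) \le D_{\phi_{\alpha_2}}(X)$,

2. $\mathrm{SF}^{l_{\alpha_1}}_{x_0}(X) \le \mathrm{SF}^{l_{\alpha_2}}_{x_0}(X)$, for all $x_0$ in the interior of $l_{\alpha_1}$ and $l_{\alpha_2}$,

where $\phi_{\alpha_1}$ and $\phi_{\alpha_2}$ denote the associated divergence functions of $u_{\alpha_1}$ and $u_{\alpha_2}$ respectively, and where $l_{\alpha_1}$ and $l_{\alpha_2}$ denote the associated loss functions.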

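Proof. 1. Let $\eta \in \mathbb{R}$ and let $X \in L^\infty(\Omega, \mathcal{F}, P)$. Then

\begin{align*}
& u_{\alpha_1}(x - \eta) \ge u_{\alpha_2}(x - \eta), \quad \forall x \in \mathbb{R} \\
&\implies \mathbb{E}[u_{\alpha_1}(X - \eta)] \ge \mathbb{E}[u_{\alpha_2}(X - \eta)] \\
&\implies \eta + \mathbb{E}[u_{\alpha_1}(X - \eta)] \ge \eta + \mathbb{E}[u_{\alpha_2}(X - \eta)] \\
&\implies \sup_{\eta \in \mathbb{R}} \left(\eta + \mathbb{E}[u_{\alpha_1}(X - \eta)]\right) \ge \sup_{\eta \in \mathbb{R}} \left(\eta + \mathbb{E}[u_{\alpha_2}(X - \eta)]\right) \\
&\implies -\mathrm{OCE}_{u_{\alpha_1}}(X) \le -\mathrm{OCE}_{u_{\alpha_2}}(X) \\
&\implies D_{\phi_{\alpha_1}}(X) \le D_{\phi_{\alpha_2}}(X).
\end{align*}

The last implication follows from theorem 4.10 and uses the assumption that $X \in L^\infty(\Omega, \mathcal{F}, P)$.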

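2. To prove the effect on the utility based shortfall risk measure, take $X \in L^\infty(\Omega, \mathcal{F}, P)$ and assume that $u_{\alpha_1}(x) \ge u_{\alpha_2}(x)$ for all $x \in \mathbb{R}$. Then we have that

\[
\mathbb{E}[u_{\alpha_1}(X + m)] \ge \mathbb{E}[u_{\alpha_2}(X + m)], \quad \forall m \in \mathbb{R}.
\]

To prove that $\mathrm{SF}^{l_{\alpha_1}}_{x_0} \le \mathrm{SF}^{l_{\alpha_2}}_{x_0}$ for all $x_0$ in the interior of $l_{\alpha_1}$ and $l_{\alpha_2}$, we will show that

\[
\{m \in \mathbb{R} \mid \mathbb{E}[u_{\alpha_2}(X + m)] \ge -x_0\} \subset \{m \in \mathbb{R} \mid \mathbb{E}[u_{\alpha_1}(X + m)] \ge -x_0\}. \tag{5.1}
\]

Take $m \in \{m \in \mathbb{R} \mid \mathbb{E}[u_{\alpha_2}(X + m)] \ge -x_0\}$. We have that

\[
\mathbb{E}[u_{\alpha_1}(X + m)] \ge \mathbb{E}[u_{\alpha_2}(X + m)] \ge -x_0.
\]

Hence we can conclude that $m \in \{m \in \mathbb{R} \mid \mathbb{E}[u_{\alpha_1}(X + m)] \ge -x_0\}$. Because the infimum over a larger set cannot be larger, this proves that

\[
\inf\{m \in \mathbb{R} \mid \mathbb{E}[u_{\alpha_1}(X + m)] \ge -x_0\} \le \inf\{m \in \mathbb{R} \mid \mathbb{E}[u_{\alpha_2}(X + m)] \ge -x_0\}. \tag{5.2}
\]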
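The second part of the lemma can be illustrated with a short computation. The sketch below evaluates the utility based shortfall risk by bisection on $m$, under the assumption that the bracket `[lo, hi]` contains the infimum; the sample and the two exponential utilities (which are ordered pointwise) are hypothetical choices.

```python
import math


def shortfall_risk(u, sample, x0=0.0, lo=-10.0, hi=10.0, iters=100):
    """inf{m : E[u(X + m)] >= -x0} by bisection, for a non-decreasing u."""

    def acceptable(m):
        return sum(u(x + m) for x in sample) / len(sample) >= -x0

    # Assumes that lo is not acceptable and hi is acceptable.
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if acceptable(mid):
            hi = mid
        else:
            lo = mid
    return hi


def exp_utility(a):
    return lambda x: (1.0 - math.exp(-a * x)) / a


returns = [-0.9, -0.2, 0.1, 0.4, 0.7, 1.3]   # hypothetical payoffs

# The utility with a = 0.5 dominates the utility with a = 3.0 pointwise.
sf_low = shortfall_risk(exp_utility(0.5), returns)
sf_high = shortfall_risk(exp_utility(3.0), returns)
print(sf_low, sf_high)
```

The pointwise larger utility should yield the smaller shortfall risk, in line with inequality 5.2.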

The first class of utility functions we will study are the power utility functions, of which we found a brief description in [15]. These utility functions belong to a larger class of utility functions called the HARA class. The acronym HARA stands for hyperbolic absolute risk aversion. A utility function belongs to the HARA class if the Arrow-Pratt coefficient of absolute risk aversion is given by
\[
r_A(x) = \frac{1}{a + bx} \quad \forall x \in D, \tag{5.3}
\]

where $b \ge 0$ and $a > 0$ if $b = 0$. The domain is $D = \mathbb{R}$ if $b = 0$. If $b \neq 0$ then $D = \left(\frac{-a}{b}, +\infty\right)$.

5.1 The power utility functions

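Assume that $b > 0$; then we can reconstruct the utility function using theorem 2.2 from the second chapter. First assume that $b \neq 1$. We have that
\begin{align*}
u(x) &= C_1 \int_1^x \exp\left(\int_1^\eta -r_A(\zeta)\,d\zeta\right) d\eta + C_2 \\
&= C_1 \int_1^x \exp\left(\int_1^\eta \frac{-1}{a + b\zeta}\,d\zeta\right) d\eta + C_2 \\
&= C_1 \int_1^x \exp\left(\ln\left(\frac{a + b\eta}{a + b}\right)^{-\frac{1}{b}}\right) d\eta + C_2 \\
&= C_1 \int_1^x \left(\frac{a + b\eta}{a + b}\right)^{-\frac{1}{b}} d\eta + C_2 \\
&= C_1 \frac{a + b}{b - 1}\left[\left(\frac{a + bx}{a + b}\right)^{1 - \frac{1}{b}} - 1\right] + C_2 \\
&= \frac{D}{b - 1}(a + bx)^{1 - \frac{1}{b}} + E,
\end{align*}

where $D$ and $E$ are integration constants. If $b = 1$ we have that $r_A(x) = \frac{1}{a + x}$. Then the utility function is given by

\[
u(x) = D\ln(a + x) + E. \tag{5.4}
\]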

The utility function

\[
u(x) = \begin{cases} \dfrac{D}{b - 1}(a + bx)^{1 - \frac{1}{b}} + E, & b \neq 1 \\ D\ln(a + x) + E, & b = 1 \end{cases} \tag{5.5}
\]
is called the extended power utility. If we have that $a > 0$ then we can standardise this utility function in the usual way such that $u(0) = 0$ and $u'(0) = 1$. To see this, first suppose $b \neq 1$. From the condition $u'(0) = 1$ we conclude that $D = a^{\frac{1}{b}}$, and from $u(0) = 0$ we can conclude that $E = \frac{-D}{b - 1}a^{1 - \frac{1}{b}} = \frac{-a}{b - 1}$. Now consider the case where $b = 1$; then $u(0) = 0$ implies that $E = -D\ln(a)$ and $u'(0) = 1$ implies that $D = a$. Therefore we can conclude that when $a > 0$ and $b > 0$ we have the following standardised utility function:

\[
u(x) = \begin{cases} \dfrac{a^{\frac{1}{b}}}{b - 1}(a + bx)^{1 - \frac{1}{b}} - \dfrac{a}{b - 1}, & b \neq 1 \\ a\ln(a + x) - a\ln(a), & b = 1, \end{cases} \tag{5.6}
\]

where the domain is given by $D = \left(\frac{-a}{b}, +\infty\right)$. When $a = 0$ the extended power utility function becomes the narrow power utility function, which takes the following form:
\[
u(x) = \begin{cases} D\dfrac{x^{1 - \gamma}}{1 - \gamma} + E, & \gamma \neq 1 \\ D\ln(x) + E, & \gamma = 1. \end{cases} \tag{5.7}
\]

This utility function is defined for all $x > 0$. The parameter $\gamma$ is known as the coefficient of relative risk aversion $r_R$, which can be obtained using the following formula:
\[
r_R(x) = -x\frac{u''(x)}{u'(x)}. \tag{5.8}
\]
It is clear that if $a \le 0$ the power utility cannot be standardised in the usual way. However there is a bigger problem when using the power utility in the context of utility based risk measures. The power utility is only defined for values greater than $-\frac{a}{b}$. Throughout this thesis however we have looked at the stochastic variable $X$ which modelled the net payoffs of a portfolio. When trying to quantify the risk of this portfolio we are especially interested in the potential losses of the portfolio. Because we have to evaluate these potential losses with a utility function, it is important that the utility function is defined for those negative values. This can be problematic with power utility.

There are however functions in the HARA class which do not have this problem. This occurs whenever $b = 0$, because then the coefficient of absolute risk aversion is a constant. As we have seen in chapter two of this thesis, the corresponding utility function is the exponential utility function.
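The standardisation and the limited domain of the extended power utility can be made concrete with a few lines of code. The sketch below implements formula 5.6; the helper `power_marginal_utility` is not taken from the text but is obtained by differentiating 5.6, and the parameter values are illustrative only.

```python
import math


def power_utility(x, a, b):
    """Standardised extended power utility (5.6); needs a > 0, b > 0, x > -a/b."""
    if a + b * x <= 0:
        raise ValueError("x lies outside the domain (-a/b, +inf)")
    if b == 1:
        return a * math.log(a + x) - a * math.log(a)
    return a ** (1.0 / b) / (b - 1.0) * (a + b * x) ** (1.0 - 1.0 / b) - a / (b - 1.0)


def power_marginal_utility(x, a, b):
    """u'(x) = a^(1/b) * (a + b*x)^(-1/b), the derivative of (5.6)."""
    return a ** (1.0 / b) * (a + b * x) ** (-1.0 / b)


a, b = 2.0, 0.5                     # illustrative parameters
print(power_utility(0.0, a, b))     # standardisation: u(0) = 0
print(power_marginal_utility(0.0, a, b))   # standardisation: u'(0) = 1

# A loss below -a/b cannot be evaluated at all.
try:
    power_utility(-5.0, a, b)
except ValueError as err:
    print("not defined:", err)
```

The last call shows the problem discussed above: a sufficiently large loss falls outside the domain of the utility function.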

5.2 The exponential utility functions

The standardised exponential utility function is given by

\[
u(x) = \frac{1 - \exp(-ax)}{a}, \tag{5.9}
\]
where the parameter $a$ denotes the coefficient of absolute risk aversion. In theorem 2.7 of the second chapter we have proven that for the exponential utility function all certainty equivalents coincide. We will now derive the divergence function associated with this utility function:

\begin{align*}
\phi(t) &\equiv \sup_{x \in \mathbb{R}} \left(xt + u(-x)\right) \\
&= \sup_{x \in \mathbb{R}} \left(xt + \frac{1 - \exp(ax)}{a}\right).
\end{align*}

The first order condition yields that in a maximum $x = \frac{1}{a}\ln(t)$. The second order condition for a maximum is fulfilled because for all $x$ we have that $-a\exp(ax) < 0$. Hence we find that the divergence function is given by

\begin{align*}
\phi(t) &= \frac{t}{a}\ln(t) + \frac{1 - \exp(\ln(t))}{a} \\
&= \frac{t}{a}\ln(t) + \frac{1}{a} - \frac{t}{a}.
\end{align*}

If $Q \in M_1(P)$, then the divergence associated with this is
\begin{align*}
I_\phi(Q|P) &= \int_\Omega \frac{1}{a}\left(\frac{dQ}{dP}\ln\frac{dQ}{dP} + 1 - \frac{dQ}{dP}\right) dP \\
&= \frac{1}{a}\int_\Omega \frac{dQ}{dP}\ln\frac{dQ}{dP}\,dP + \frac{1}{a}\int_\Omega dP - \frac{1}{a}\int_\Omega dQ \\
&= \frac{1}{a}\int_\Omega \frac{dQ}{dP}\ln\frac{dQ}{dP}\,dP \\
&= \frac{1}{a}\mathrm{KL}(Q|P).
\end{align*}
We find that the divergence associated with the exponential utility function is the Kullback-Leibler entropy. We have already encountered this entropy in the first chapter, where we used it as a penalty function. We note that the divergence risk measure associated with it, i.e.
\[
\sup_{Q \in M_1(P)} \left(\mathbb{E}_Q(-X) - \frac{1}{a}\mathrm{KL}(Q|P)\right), \tag{5.10}
\]
is called the entropic risk measure. This definition of the entropic risk measure was given in [11, p 201], where we also find the definition in the form of a negative optimised certainty equivalent. Using theorem 2.7 from the second chapter and theorem 4.10 from the fourth chapter we can see that this entropic risk measure has the following representation:
\[
\mathrm{ER}_a(X) = \sup_{Q \in M_1(P)} \left(\mathbb{E}_Q(-X) - \frac{1}{a}\mathrm{KL}(Q|P)\right) = \frac{1}{a}\ln\left(\mathbb{E}[\exp(-aX)]\right). \tag{5.11}
\]
To study the effect of the coefficient of absolute risk aversion on the entropic risk measure, it is important to notice that if $a_1 \le a_2$ then $u_{a_1}(x) \ge u_{a_2}(x)$ for all $x \in \mathbb{R}$. We will formally prove this fact by showing that $u_a(x)$ is a decreasing function of $a$. We have that
\[
\frac{\partial u_a(x)}{\partial a} = \frac{\exp(-ax)(xa + 1) - 1}{a^2} \le 0,
\]
where the inequality follows from the fact that $(xa + 1) \le \exp(ax)$.¹ Hence we can use lemma 5.1 to conclude that the entropic risk measure is increasing in the coefficient of absolute risk aversion, i.e.

\[
a_1 \le a_2 \implies \mathrm{ER}_{a_1}(X) \le \mathrm{ER}_{a_2}(X). \tag{5.12}
\]
To illustrate this relationship we have simulated different sets of 10000 returns each. These returns were generated from a normal distribution with mean 0.25 and different standard deviations $\sigma$. We calculated the entropic risk measure for different values of absolute risk aversion and plotted our results. The results can be found in figure 5.1. In this figure we can clearly observe that an increase of absolute risk aversion corresponds to an increase in entropic risk. Furthermore we also notice that a higher standard deviation corresponds to a higher risk.

¹ This is a known inequality which follows from the fact that $xa + 1$ is the tangent line to $\exp(ax)$ at $ax = 0$, and $\exp(\cdot)$ is a convex function.

Figure 5.1: Influence of the absolute risk aversion a on the exponential divergence risk measure.
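A simulation of this kind can be sketched in a few lines. The code below is not the code used for figure 5.1; it is a minimal sketch that assumes a fixed random seed, a single standard deviation `sigma = 0.4` and an arbitrary grid of risk aversions. For normally distributed returns the moment generating function gives the closed form $-\mu + a\sigma^2/2$, which serves as a rough check on the sample estimate.

```python
import math
import random


def entropic_risk(sample, a):
    """ER_a(X) = (1/a) * ln E[exp(-a X)], estimated on a sample."""
    mean_exp = sum(math.exp(-a * x) for x in sample) / len(sample)
    return math.log(mean_exp) / a


rng = random.Random(0)            # seed chosen only for reproducibility
mu, sigma = 0.25, 0.4             # sigma is an illustrative choice
returns = [rng.gauss(mu, sigma) for _ in range(10000)]

aversions = (0.5, 1.0, 2.0, 4.0)  # illustrative grid
risks = [entropic_risk(returns, a) for a in aversions]
for a, risk in zip(aversions, risks):
    # second column: closed form for exactly normal returns
    print(a, risk, -mu + a * sigma ** 2 / 2.0)
```

The estimated risks should increase with `a`, and for moderate values of `a` they should lie close to the closed form.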

We would like to point out that because of theorem 2.7 we know that $\mathrm{SF}^l_0(X) = \mathrm{ER}_a(X)$. This means that for the exponential utility function the utility based shortfall risk for $x_0 = 0$ equals the entropic risk, i.e. the exponential divergence risk. Therefore we will not study the effect of the absolute risk aversion on the exponentially based shortfall risk. As we have deduced in the fourth chapter, increasing the parameter $x_0$ of a utility based shortfall risk measure always results in a decrease of the risk, and this holds independently of the utility function. We know from theorem 4.13 that the entropic risk measure is not coherent, because the utility function is not a piecewise linear function. However there exists a coherent version of this entropic risk measure, which is described in detail in [3]. This risk measure is called Entropic Value at Risk or EVaR. From [3, definition 3.1] we have that

Definition 5.1. Entropic Value at Risk at a $(1 - \alpha)100\%$ confidence level is defined as
\[
\mathrm{EVaR}_\alpha(X) := \inf_{z > 0} \left\{\frac{1}{z}\ln\left(\frac{M_X(z)}{\alpha}\right)\right\}, \tag{5.13}
\]
where $M_X$ denotes the moment generating function of $X$.

We will not work out any details of this article as it is outside the scope of this thesis. However we will report some key results from this paper and explain how these results are closely linked to the entropic risk measure and divergence based risk measures in general. This is interesting because these seemingly technical theorems could be used to construct coherent alternatives to the divergence risk measures discussed in chapter four. In [3, Theorem 3.3] we find the following robust representation theorem regarding Entropic Value at Risk.

Theorem 5.1. For $X \in L^\infty(\Omega, \mathcal{F}, P)$ we have that
\[
\mathrm{EVaR}_\alpha = \sup_{Q \in I} \mathbb{E}_Q[-X] = \inf_{z > 0}\left(\sup_{Q \in M_1(P)}\left(\mathbb{E}_Q[X] - \frac{1}{z}\mathrm{KL}(Q|P)\right) - \frac{1}{z}\ln(\alpha)\right), \tag{5.14}
\]
where $I = \{Q \in M_1(P) \mid \mathrm{KL}(Q|P) \le -\ln(\alpha)\}$.

It is easy to see that any risk measure which has a robust representation in the form of $\sup_{Q \in I} \mathbb{E}_Q[-X]$, where $I$ denotes a set of probability measures, is a coherent risk measure. This follows from the properties of the expected value and the supremum. When we compare this representation of Entropic Value at Risk to equation 5.11 we can clearly see and interpret the different approach.

In the divergence based approach we considered all probability measures $Q$ which are absolutely continuous with respect to $P$. We then looked at the expected losses under each of these probability measures $Q$, $\mathbb{E}_Q[-X]$. Using the Kullback-Leibler entropy we penalised these expected losses depending on how similar the probability measure $Q$ was to $P$. We concluded our computation by taking the supremum over all these penalised expected losses.

In the coherent approach however, we only consider the probability measures $Q$ which have a Kullback-Leibler distance with respect to $P$ smaller than a given amount. We then take the supremum over all the expected losses with respect to those probability measures $Q$. No penalty functions are used here, and every probability measure $Q$ for which $\mathrm{KL}(Q|P) \le -\ln(\alpha)$ is taken to be equally important.

This idea, which is used to construct Entropic Value at Risk, could be generalised to all divergence risk measures using the definition of a $\phi$-entropic risk measure with divergence level $\beta$, which we found in [3, definition 5.1].

Definition 5.2. Let $\phi$ be a convex function with $\phi(1) = 0$, and $\beta$ a non-negative number. The $\phi$-entropic risk measure with divergence level $\beta$ is defined as

\[
\mathrm{ER}_{\phi,\beta}(X) := \sup_{Q \in I} \mathbb{E}_Q[-X], \tag{5.15}
\]
where $I := \{Q \in M_1(P) \mid I_\phi(Q|P) \le \beta\}$.

This defines a class of coherent risk measures which shows a lot of similarities to the divergence risk measures, which were defined as

\[
\sup_{Q \in M_1(P)} \left(\mathbb{E}_Q[-X] - I_\phi(Q|P)\right).
\]
So far we have discussed the power utility and the exponential utility, both of which are contained in the HARA class. These utility functions are commonly used in economics. There exist however a lot of other classes of utility functions. One such class is the class of the polynomial utility functions.

5.3 The polynomial utility functions

The following class of utility functions was found in [10].

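Definition 5.3. For $\gamma > 1$ with $\gamma \in \mathbb{N}$ the polynomial utility function is defined as
\[
u(x) = \begin{cases} \dfrac{1 - (1 - x)^\gamma}{\gamma} & \text{if } x \le 1 \\ \dfrac{1}{\gamma} & \text{elsewhere.} \end{cases} \tag{5.16}
\]
We have plotted this utility function for different values of $\gamma$ in figure 5.2. The associated loss function is given by

\[
l(x) = \begin{cases} \dfrac{(1 + x)^\gamma - 1}{\gamma} & \text{if } x \ge -1 \\ \dfrac{-1}{\gamma} & \text{elsewhere.} \end{cases} \tag{5.17}
\]
The first derivative is given by
\[
u'(x) = \begin{cases} (1 - x)^{\gamma - 1} & x \le 1 \\ 0 & x > 1. \end{cases} \tag{5.18}
\]
The Arrow-Pratt measure of absolute risk aversion is not well defined for $x \ge 1$. For $x < 1$ we have that $r_A(x) = \frac{\gamma - 1}{1 - x} > 0$, which implies a risk averse attitude. Because the utility function is constant for $x \ge 1$, we can say that for $x \ge 1$ the utility function implies risk neutrality. We will now calculate the divergence function associated with this utility function.

\begin{align*}
\phi(t) &= \sup_{x \in \mathbb{R}} \left(xt - l(x)\right) \\
&= \max\left\{\sup_{x \ge -1}\left(xt - \frac{(1 + x)^\gamma - 1}{\gamma}\right),\ \sup_{x < -1}\left(xt + \frac{1}{\gamma}\right)\right\}.
\end{align*}
First consider the case that $t < 0$; then we have that $\sup_{x < -1}\left(xt + \frac{1}{\gamma}\right) = +\infty$. Hence for $t < 0$ we have that $\phi(t) = +\infty$. Now assume that $t \ge 0$. Then we have that
\[
\sup_{x < -1}\left(xt + \frac{1}{\gamma}\right) = -t + \frac{1}{\gamma}. \tag{5.19}
\]
We will now calculate $\sup_{x \ge -1}\left(xt - \frac{(1 + x)^\gamma - 1}{\gamma}\right)$. The first order condition yields that $t - (1 + x)^{\gamma - 1} = 0$. Hence we have that $t^{\frac{1}{\gamma - 1}} - 1 = x$. Notice that the second order condition for a maximum is also fulfilled in this point. Hence we have that

\begin{align*}
\sup_{x \ge -1}\left(xt - \frac{(1 + x)^\gamma - 1}{\gamma}\right) &= \left(t^{\frac{1}{\gamma - 1}} - 1\right)t - \frac{t^{\frac{\gamma}{\gamma - 1}} - 1}{\gamma} \\
&= \frac{1}{\gamma}\left(\gamma t^{\frac{1}{\gamma - 1} + 1} - \gamma t - t^{\frac{\gamma}{\gamma - 1}} + 1\right) \\
&= \frac{1}{\gamma}\left(\gamma t^{\frac{\gamma}{\gamma - 1}} - \gamma t - t^{\frac{\gamma}{\gamma - 1}} + 1\right) \\
&= \frac{1}{\gamma}\left((\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1\right).
\end{align*}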

Figure 5.2: Polynomial utility function for different values of γ.

Figure 5.3: Divergence function for different values of γ.

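Now it is sufficient to notice that if $t \ge 0$ then $\frac{\gamma - 1}{\gamma}t^{\frac{\gamma}{\gamma - 1}} \ge 0$. Hence we have that

\[
\frac{1}{\gamma}\left((\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1\right) \ge -t + \frac{1}{\gamma}. \tag{5.20}
\]
From this it follows that for $t \ge 0$

\[
\phi(t) = \frac{1}{\gamma}\left((\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1\right). \tag{5.21}
\]
We conclude that the associated divergence is given by

\[
\phi(t) = \begin{cases} \dfrac{1}{\gamma}\left((\gamma - 1)t^{\frac{\gamma}{\gamma - 1}} - \gamma t + 1\right) & \text{if } t \ge 0 \\ +\infty & \text{elsewhere.} \end{cases} \tag{5.22}
\]
We have plotted this divergence function for different values of $\gamma$ in figure 5.3.

We will now study the effect of the parameter $\gamma$ on both the polynomial utility based shortfall risk and on the polynomial divergence risk. An excellent starting point for this is figure 5.2. This figure lets us suspect that, for a fixed return $x$, if $\gamma_1 \ge \gamma_2$ then $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$. We can see this in the following way. Assume that $\gamma_1 \ge \gamma_2$. For $x \ge 1$ we have that $\frac{1}{\gamma_1} \le \frac{1}{\gamma_2}$. Hence for $x \ge 1$ we have that $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$. Now consider an arbitrary but fixed $x < 1$; then $1 - x > 0$. Hence if $\gamma_1 \ge \gamma_2$ we have that $\frac{1 - (1 - x)^{\gamma_1}}{\gamma_1} \le \frac{1 - (1 - x)^{\gamma_2}}{\gamma_2}$. From this we can conclude that $u_{\gamma_1}(x) \le u_{\gamma_2}(x)$ for all $x \in \mathbb{R}$.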
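Both the closed form 5.22 and the ordering of the polynomial utilities can be checked numerically. The sketch below evaluates the Legendre transform of the loss function 5.17 by a ternary search over $x \in [-1, 10]$; the upper bound is an assumption that is adequate for the small values of $t$ used here, and the grids of $\gamma$, $t$ and $x$ are arbitrary.

```python
def poly_utility(x, gamma):
    """Polynomial utility (5.16)."""
    return (1.0 - (1.0 - x) ** gamma) / gamma if x <= 1.0 else 1.0 / gamma


def poly_loss(x, gamma):
    """Associated loss function (5.17)."""
    return ((1.0 + x) ** gamma - 1.0) / gamma if x >= -1.0 else -1.0 / gamma


def poly_divergence(t, gamma):
    """Closed form (5.22)."""
    if t < 0:
        return float("inf")
    return ((gamma - 1.0) * t ** (gamma / (gamma - 1.0)) - gamma * t + 1.0) / gamma


def legendre_transform(t, gamma, lo=-1.0, hi=10.0, iters=200):
    """sup of x*t - l(x) over [lo, hi] by ternary search (concave objective)."""
    def f(x):
        return x * t - poly_loss(x, gamma)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return f((lo + hi) / 2.0)


for gamma in (2, 3, 5):
    for t in (0.0, 0.5, 1.0, 2.0):
        print(gamma, t, legendre_transform(t, gamma), poly_divergence(t, gamma))

# Ordering of the utilities: a larger gamma gives a pointwise smaller utility.
grid = [k / 10.0 for k in range(-30, 31)]
ordered = all(poly_utility(x, 4) <= poly_utility(x, 2) + 1e-12 for x in grid)
print(ordered)
```

For $t \ge 0$ the supremum over $x < -1$ never exceeds the value at $x = -1$, which is why the search can be restricted to $x \ge -1$.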

We can use lemma 5.1 to conclude that both the polynomial divergence risk and the polynomial utility based shortfall risk will increase when the parameter γ increases. We have illustrated this relationship using a simulation. We generated 10000 returns from a normal distribution with mean 0.25. We did this for different standard deviations. For each of these sets of returns we computed the divergence risk and the utility based shortfall risk for different values of γ. We have listed the results in table 5.1 and 5.2 respectively.

Table 5.1: Divergence risk of the polynomial utility for different values of γ and σ

          γ = 2    γ = 3    γ = 4    γ = 5    γ = 6
σ = 0.2   -0.226   -0.206   -0.187   -0.167   -0.148
σ = 0.4   -0.166   -0.092   -0.022    0.045    0.107
σ = 0.6   -0.073    0.077    0.214    0.340    0.457
σ = 0.8    0.046    0.289    0.514    0.732    0.943
σ = 1.0    0.206    0.545    0.836    1.095    1.330

Table 5.2: Utility based shortfall risk of the polynomial utility with x0 = 0 for different values of γ and σ

          γ = 2    γ = 3    γ = 4    γ = 5    γ = 6
σ = 0.2   -0.226   -0.206   -0.186    0.167   -0.148
σ = 0.4   -0.163   -0.086   -0.015    0.052    0.115
σ = 0.6   -0.058    0.095    0.234    0.361    0.479
σ = 0.8    0.083    0.332    0.562    0.784    0.977
σ = 1.0    0.276    0.614    0.903    1.159    1.393

When looking at these tables we further notice that the larger the standard deviation, the larger both risk measures. This coincides with our intuition. When we compare values across both tables, we notice that the divergence risk is always smaller than the utility based shortfall risk with $x_0 = 0$. This should not be surprising, because this theoretical result follows directly from the results which were proven in the fourth chapter.² In the fourth chapter we showed that if $x_0 = 0$ then $\mathrm{SF}^l_0(X) = -M_u(X)$. We also obtained a general inequality which stated that $M_u(X) \le \mathrm{OCE}_u(X)$. From those results we can conclude that

\[
\mathrm{SF}^l_0(X) \ge D_\phi(X). \tag{5.23}
\]
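The qualitative content of the two tables can be reproduced with a short script. The sketch below is not the code that generated tables 5.1 and 5.2, so its numbers will differ from the tabulated ones: it assumes a fixed seed, a smaller sample of 2000 returns (to keep a pure Python run fast), a single standard deviation of 0.8 and only three values of $\gamma$. The divergence risk is computed as $-\mathrm{OCE}_u(X)$ by a ternary search over $\eta$, and the shortfall risk with $x_0 = 0$ by bisection.

```python
import random


def poly_utility(x, gamma):
    """Polynomial utility (5.16)."""
    return (1.0 - (1.0 - x) ** gamma) / gamma if x <= 1.0 else 1.0 / gamma


def mean_utility(sample, gamma, shift):
    return sum(poly_utility(x + shift, gamma) for x in sample) / len(sample)


def divergence_risk(sample, gamma, iters=60):
    """D_phi(X) = -OCE_u(X); the concave g(eta) is maximised by ternary search."""
    def g(eta):
        return eta + mean_utility(sample, gamma, -eta)
    # The optimal eta lies between min(sample) - 1 and max(sample).
    lo, hi = min(sample) - 1.0, max(sample)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return -g((lo + hi) / 2.0)


def shortfall_risk(sample, gamma, iters=60):
    """SF_0(X) = inf{m : E[u(X + m)] >= 0}, by bisection."""
    lo, hi = -max(sample), -min(sample)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mean_utility(sample, gamma, mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi


rng = random.Random(1)                       # seed chosen for reproducibility
returns = [rng.gauss(0.25, 0.8) for _ in range(2000)]
results = {gam: (divergence_risk(returns, gam), shortfall_risk(returns, gam))
           for gam in (2, 3, 4)}
for gam, (div, sf) in results.items():
    print(gam, div, sf)
```

Both columns should increase with $\gamma$, and in every row the shortfall risk should be at least as large as the divergence risk, as stated in 5.23.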

A special case of the polynomial utility is the quadratic utility. This utility function is obtained by taking γ = 2.

\[
u(x) = \begin{cases} -\dfrac{x^2}{2} + x & \text{if } x \le 1 \\ \dfrac{1}{2} & \text{elsewhere.} \end{cases} \tag{5.24}
\]
The divergence associated with the quadratic utility is

² See equation 4.2.

\[
\phi(t) = \begin{cases} \dfrac{1}{2}(t - 1)^2 & \text{if } t \ge 0 \\ +\infty & \text{elsewhere.} \end{cases} \tag{5.25}
\]
The divergence function $(t - 1)^2$ is called the $\chi^2$-divergence function. It is stated in [5] that the optimised certainty equivalent associated with the quadratic utility function of a stochastic variable $X$ for which $x_{\max} \le 1 + \mathbb{E}[X]$ is
\[
\mathrm{OCE}_u(X) = \mathbb{E}[X] - \frac{1}{2}\mathrm{Var}(X). \tag{5.26}
\]
We will verify this claim. We have that $u$ is a differentiable function and
\[
u'(x) = \begin{cases} 1 - x, & x \le 1 \\ 0, & x > 1. \end{cases} \tag{5.27}
\]
Now we will use equation 4.41, which characterises the optimal allocation $\eta^*$ of the optimised certainty equivalent.

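\[
\eta^* = \mathbb{E}[X] \iff \mathbb{E}\left[u'(X - \eta^*)\right] = 1.
\]

Because we know that $x_{\max} \le 1 + \mathbb{E}[X]$, we have that $X(\omega) - \mathbb{E}[X] \le 1$ for all $\omega \in \Omega$. Hence we have that

\[
\mathbb{E}\left[u'(X - \mathbb{E}[X])\right] = \mathbb{E}\left[1 - X + \mathbb{E}[X]\right] = 1 - \mathbb{E}[X] + \mathbb{E}[X] = 1.
\]

This proves that $\eta^* = \mathbb{E}[X]$ is an optimal allocation. We can now conclude that the optimised certainty equivalent is given by

\begin{align*}
\mathrm{OCE}_u(X) &= \eta^* + \mathbb{E}\left[u(X - \eta^*)\right] \\
&= \mathbb{E}[X] + \mathbb{E}\left[u(X - \mathbb{E}[X])\right] \\
&= \mathbb{E}[X] + \mathbb{E}\left[(X - \mathbb{E}[X]) - \frac{1}{2}(X - \mathbb{E}[X])^2\right] \\
&= \mathbb{E}[X] + \mathbb{E}[X] - \mathbb{E}[X] - \frac{1}{2}\mathbb{E}\left[(X - \mathbb{E}[X])^2\right] \\
&= \mathbb{E}[X] - \frac{1}{2}\mathrm{Var}(X).
\end{align*}
This proves the result.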
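The mean-variance form 5.26 can also be confirmed numerically on a small example. In the sketch below the payoffs are hypothetical and chosen such that the largest payoff does not exceed one plus the mean; expectation and variance are taken with respect to the empirical distribution, and the supremum over $\eta$ is found by a ternary search.

```python
def quadratic_utility(x):
    """Quadratic utility (5.24)."""
    return x - 0.5 * x * x if x <= 1.0 else 0.5


def oce(sample, u, iters=200):
    """Sample OCE_u(X) = sup_eta (eta + E[u(X - eta)]) by ternary search."""
    def g(eta):
        return eta + sum(u(x - eta) for x in sample) / len(sample)
    # For this utility the maximiser lies in [min(sample) - 1, max(sample)].
    lo, hi = min(sample) - 1.0, max(sample)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if g(m1) < g(m2):
            lo = m1
        else:
            hi = m2
    return g((lo + hi) / 2.0)


payoffs = [-0.3, 0.0, 0.2, 0.5, 0.6]   # hypothetical; max <= 1 + mean holds
mean = sum(payoffs) / len(payoffs)
variance = sum((x - mean) ** 2 for x in payoffs) / len(payoffs)

numeric = oce(payoffs, quadratic_utility)
closed_form = mean - 0.5 * variance
print(numeric, closed_form)
```

The two printed values should agree up to rounding.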

5.4 The SAHARA utility functions

When we took a closer look at the application of the power utility to risk measures, we highlighted the problem that this utility function might not be defined for large negative values. The class of SAHARA utility functions was introduced in [8] to deal with the problem of the limited domain of certain HARA functions. The acronym SAHARA stands for symmetric asymptotic hyperbolic absolute risk aversion. Originally this class of utility functions was used for option pricing. In this thesis we will take a closer look at the properties of this class in the context of utility based risk measures. Just like the HARA class, the SAHARA class is defined using the Arrow-Pratt measure of absolute risk aversion.

Definition 5.4. A utility function $u$ with domain $\mathbb{R}$ belongs to the SAHARA class if the absolute risk aversion $r_A(x) = \frac{-u''(x)}{u'(x)}$ is given by
\[
r_A(x) = \frac{a}{\sqrt{b^2 + (x - d)^2}} \tag{5.28}
\]
with $a > 0$, $b > 0$ and $d \in \mathbb{R}$.

We have plotted this risk aversion for several values of the parameters $a$, $b$ and $d$ in figure 5.4. We can see in these figures that the absolute risk aversion is a strictly positive symmetric function which attains a maximum for $x = d$. It is easy to prove these facts using equation 5.28. Furthermore we find that
\begin{align}
\lim_{x \to +\infty} r_A(x) &= \lim_{x \to +\infty} \frac{a}{\sqrt{b^2 + (x - d)^2}} = 0, \tag{5.29} \\
\lim_{x \to -\infty} r_A(x) &= \lim_{x \to -\infty} \frac{a}{\sqrt{b^2 + (x - d)^2}} = 0. \tag{5.30}
\end{align}

This implies that an investor with the SAHARA utility has an almost risk neutral attitude towards very large losses and very large gains. We will call the point $d$ at which the absolute risk aversion attains its maximum the threshold loss.³ When $d$ is approached from above, the absolute risk aversion is increasing. This implies that an investor or financial institution will become increasingly risk averse and will try to avoid falling below the threshold loss. In the context of risk measures the parameter $d$ could be used to model a loss which, if exceeded, will cause the financial institution significant problems. Using this interpretation it is not difficult to see why $\lim_{x \to -\infty} r_A(x) = 0$ is not an unreasonable assumption. If the losses are so large that they ensure the bankruptcy of the financial institution, there is no reason to be risk averse any more. Unlike in [8] we will not assume that $d = 0$, because there is no reason to assume that the threshold loss is 0. This makes the computation of the associated utility function tedious. To ensure the readability of this section we will only report the results; the calculations can be found in appendix B. If the Arrow-Pratt measure of absolute risk aversion is given by equation 5.28, then the associated utility function is given by

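\[
u(x) = \begin{cases}
\dfrac{-C_1}{a^2 - 1}\left(\sqrt{b^2 + (x - d)^2} + (x - d)\right)^{-a}\left(a\sqrt{b^2 + (x - d)^2} + (x - d)\right) + C_2, & a \neq 1 \\[2ex]
\dfrac{C_1}{2}\left(\ln\left(\sqrt{b^2 + (x - d)^2} + (x - d)\right) - \dfrac{\left(\sqrt{b^2 + (x - d)^2} - (x - d)\right)^2}{2b^2}\right) + C_2, & a = 1
\end{cases} \tag{5.31}
\]
for some constants $C_1$ and $C_2$. For all $a > 0$ the marginal utility is given by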

³ This is different from the definition given in [8], because in our context the stochastic variable X does not model the total wealth.

Figure 5.4: Absolute risk aversion of SAHARA utility.

Figure 5.5: Absolute risk aversion for varying values of a with b = 1 and d = 2 fixed.

Figure 5.6: Absolute risk aversion for varying values of b with a = 0.5 and d = 2 fixed.

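\[
u'(x) = C_1\left(\sqrt{b^2 + (x - d)^2} + (x - d)\right)^{-a}. \tag{5.32}
\]

We can determine the constants $C_1$ and $C_2$ such that the utility function is standardised, i.e. such that $u(0) = 0$ and $u'(0) = 1$. We find that
\[
C_1 = \left(\sqrt{b^2 + d^2} - d\right)^a, \tag{5.33}
\]
and
\[
C_2 = \begin{cases}
\dfrac{C_1}{a^2 - 1}\left(\sqrt{b^2 + d^2} - d\right)^{-a}\left(a\sqrt{b^2 + d^2} - d\right), & a \neq 1 \\[2ex]
-\dfrac{C_1}{2}\left(\ln\left(\sqrt{b^2 + d^2} - d\right) - \dfrac{\left(\sqrt{b^2 + d^2} + d\right)^2}{2b^2}\right), & a = 1.
\end{cases} \tag{5.34}
\]
We have plotted the standardised SAHARA utility functions for different values of $a$, $b$ and $d$ in figure 5.7.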

Figure 5.7: The SAHARA utility function.

Figure 5.8: b = 1 and d = 2. Figure 5.9: a = 2 and d = 0. Figure 5.10: a = 2 and b = 1.
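The standardised SAHARA utility is straightforward to evaluate once the constants are fixed. The following sketch implements 5.31 and 5.32 with $C_1$ from 5.33; instead of typing out 5.34 it sets $C_2$ to minus the unshifted utility at zero, which is the same condition $u(0) = 0$. The function names and the parameter values are illustrative assumptions.

```python
import math


def sahara_marginal_utility(x, a, b, d):
    """Marginal utility (5.32) with the standardising constant C1 from (5.33)."""
    c1 = (math.sqrt(b * b + d * d) - d) ** a
    w = math.sqrt(b * b + (x - d) ** 2) + (x - d)
    return c1 * w ** (-a)


def sahara_utility(x, a, b, d):
    """Standardised SAHARA utility (5.31), with u(0) = 0 and u'(0) = 1."""
    c1 = (math.sqrt(b * b + d * d) - d) ** a

    def unshifted(y):
        s = math.sqrt(b * b + (y - d) ** 2)
        w = s + (y - d)
        if a == 1:
            v = s - (y - d)
            return 0.5 * c1 * (math.log(w) - v * v / (2.0 * b * b))
        return -c1 / (a * a - 1.0) * w ** (-a) * (a * s + (y - d))

    c2 = -unshifted(0.0)     # enforces u(0) = 0, equivalent to (5.34)
    return unshifted(x) + c2


a, b, d = 2.0, 1.0, 2.0      # illustrative parameters
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, sahara_utility(x, a, b, d), sahara_marginal_utility(x, a, b, d))
```

Unlike the power utility, this function can be evaluated for arbitrarily large losses, which is the reason the class is of interest here.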

The calculation of the associated divergence function can be found in appendix B. We found that the associated divergence function is given by

  1 −1 −1 1  b2t a C a t a C a  t 1 − 1 − 2d + C a 6= 1  2 1+ 1 1− 1 2 φ(t) = a a (5.35)  2  1 C1  t  C1 b t  C1 ln − − t d + − + C2 a = 1.  2 t 2 2t 2C1

Where the constants C1 and C2 are given by 5.33 and 5.34 respectively. We have plotted the divergence function in figure 5.11.

Figure 5.11: Divergence function of SAHARA utility with a = 2, b = 2 and d = −1.

The SAHARA class of utility functions is the most complicated class of utility functions in this chapter. Unlike in the case of the exponential utility functions and the polynomial utility functions, we will only illustrate the effect of the parameters of the SAHARA utility on the divergence risk measures in a concrete example. For the divergence risk measures we worked with different sets of returns which we generated from a normal distribution with mean 0.25 and different standard deviations. The effects of the parameters a, b and d were plotted in figures 5.12, 5.14 and 5.16 respectively.

When we look at the effect of the parameter a on the SAHARA divergence risk, we see that in this case the divergence risk is increasing in the parameter a. Although we do not provide a formal proof, we do not think this relationship is purely coincidental. If we look at figure 5.8 we suspect that if a1 ≥ a2 and all other parameters are fixed, then ua1 (x) ≤ ua2 (x). Hence the relationship between the parameter a and the divergence risk measure might be due to lemma 5.1. The same relationship is observed between the parameter a and the utility based shortfall risk. In figure 5.13 we have plotted both the divergence risk and the utility based shortfall risk with x0 = 0 of 10000 returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. Although both the divergence risk and the utility based shortfall risk were plotted, only one graph is visible. This is not a mistake. It turns out that in this example both risk measures yield very similar results, which makes it difficult to distinguish between them.

Figure 5.12: Effect of the parameter a on the SAHARA divergence risk with b = 2 and d = 0.

Figure 5.13: Effect of parameter a on utility based shortfall risk with b = 2 and d = 0.

Using the same sets of returns as in the illustration of the effect of the parameter a, we have illustrated the effect of the parameter b on the divergence risk. For the computations we have taken a = 2 and d = 0. The results are shown in figure 5.14. Here we observe that an increase in the parameter b corresponds to a decrease in the divergence risk measure. We also observe that a higher standard deviation leads to a higher divergence risk. These results are not surprising when we look at figure 5.9, where the effect of the parameter b on the SAHARA utility function is shown. To illustrate the effect on the utility based shortfall risk we have calculated this risk measure on a set of 10000 returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. We took a = 2, d = 0 and x0 = 0. At the same time we also calculated the divergence risk of the same set of returns. The results are shown in figure 5.15. In this figure we observe not only that the SAHARA utility based shortfall risk is decreasing in the parameter b, but also that the difference between this risk measure and the divergence risk measure is very small.

Figure 5.14: Influence of the parameter b on the SAHARA divergence risk with a = 2 and d = 0.

Figure 5.15: Influence of the parameter b on the utility based shortfall risk with a = 2, d = 0 and x0 = 0.

Up to this point each parameter we looked at had a monotone effect on the risk measure. However, in figure 5.16 we notice a non-monotone relationship between the parameter d and the associated divergence risk. This figure was generated, as before, using different sets of returns, each taken from a normal distribution with mean 0.25 and a different standard deviation. For the calculations we put a = 2 and b = 2. The same relationship is observed in figure 5.17. Here we have computed both the SAHARA utility based shortfall risk and the SAHARA divergence risk of a set of returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. For the computations we took a = 2, b = 2 and $x_0 = 0$. Again we notice that there is almost no difference between the utility based shortfall risk measure and the divergence risk measure.

Notice that the non-monotonicity is a consequence of the way we standardised the SAHARA utility function. If we had taken $C_1 = 1$ and $C_2 = 0$, for example, then an increasing relationship between the parameter d and the divergence risk would follow directly from the shift additivity of the optimised certainty equivalent. To see this, denote with $u_{d_1}(x)$ the SAHARA utility function where $d = d_1$. Under the alternative standardisation $u_d(x) = u_0(x - d)$, and therefore

\[
\mathrm{OCE}_{u_d}(X) = \sup_{\eta\in\mathbb{R}} \left(\eta + E[u_d(X-\eta)]\right) = \sup_{\eta\in\mathbb{R}} \left(\eta + E[u_0(X-d-\eta)]\right) = \mathrm{OCE}_{u_0}(X-d) = \mathrm{OCE}_{u_0}(X) - d.
\]

Hence we would have that $D_{\phi_d}(X) = D_{\phi_0}(X) + d$. Using the standardisation $C_1 = 1$ and $C_2 = 0$ would also cause an increasing relationship between the parameter d and the utility based shortfall risk, a result which follows directly from the translation invariance of utility based shortfall risk measures.
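The shift argument can be checked on simulated data. The following is a minimal sketch under stated assumptions: it uses the exponential utility $u_0(x) = 1 - e^{-x}$ as a stand-in for a utility with $u_d(x) = u_0(x-d)$ (it is not the SAHARA utility), a hand-written golden-section search instead of an optimisation package, and hypothetical names such as `golden_max`, `oce` and `shift`.

```python
import math
import random

def golden_max(f, lo, hi, tol=1e-10):
    """Maximum value of a unimodal function f on [lo, hi] (golden-section search)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1 = hi - ratio * (hi - lo)
    x2 = lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:                      # maximiser lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
        else:                            # maximiser lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
    return f(0.5 * (lo + hi))

def oce(u, xs, lo=-20.0, hi=20.0):
    """Sample optimised certainty equivalent sup_eta (eta + mean(u(x - eta)))."""
    n = len(xs)
    return golden_max(lambda eta: eta + sum(u(x - eta) for x in xs) / n, lo, hi)

def u0(x):
    return 1.0 - math.exp(-x)            # stand-in utility, not the SAHARA utility

shift = 0.7                               # plays the role of the parameter d

def u_shifted(x):
    return u0(x - shift)                  # u_d(x) = u_0(x - d)

random.seed(0)
returns = [random.gauss(0.25, 0.8) for _ in range(2000)]
oce_0 = oce(u0, returns)
oce_d = oce(u_shifted, returns)
print(oce_d, oce_0 - shift)               # the two numbers should agree closely
print(-oce_d, -oce_0 + shift)             # divergence risk: D_d = D_0 + d
```

The search interval [-20, 20] is an assumption that comfortably contains the maximiser for returns of this size; for other data it would have to be adapted.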

Figure 5.16: Influence of the parameter d on the SAHARA divergence risk with a = 2 and b = 2.

Figure 5.17: Influence of the parameter d on the different risk measures with a = 2, b = 2 and x0 = 0. Based on 10000 returns generated from a normal distribution with µ = 0.25 and σ = 0.8.

5.5 The κ-utility functions

In [15] we found the following class of utility functions, which we will call the κ-utility functions. For each κ > 0,

\[
u(x) = \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right) \tag{5.36}
\]

denotes a κ-utility function. Notice that this utility function is standardised such that $u(0) = 0$ and $u'(0) = 1$ for all κ > 0. We have plotted this function for several values of κ in figure 5.18. We fix κ and calculate the first and second derivatives with respect to x. We find that

\[
u'(x) = 1 - \frac{\kappa x}{\sqrt{1+\kappa^{2}x^{2}}}. \tag{5.37}
\]

Since $|\kappa x| < \sqrt{1+\kappa^{2}x^{2}}$, we can conclude that $u'(x) > 0$ for all κ > 0 and for all x. This implies that all utility functions from the class (5.36) are strictly increasing. For the second derivative we find that

\[
u''(x) = \frac{-\kappa}{\left(\sqrt{1+\kappa^{2}x^{2}}\right)^{3}}. \tag{5.38}
\]

Because κ > 0 we have that $u''(x) < 0$ for all κ and x. This means that all utility functions from the class (5.36) are strictly concave everywhere. Now we can easily deduce the Arrow-Pratt measure of absolute risk aversion:

\[
r_a(x) = -\frac{u''(x)}{u'(x)} = \frac{\kappa}{\left(1+\kappa^{2}x^{2}\right)\left(\sqrt{1+\kappa^{2}x^{2}} - \kappa x\right)}. \tag{5.39}
\]
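As a sanity check on these derivatives, the sketch below compares the closed forms of $u'$ and $u''$ with central finite differences and evaluates the absolute risk aversion $-u''/u'$. The helper names, the step size and the evaluation points are assumptions chosen for the illustration.

```python
import math

def u(x, kappa):
    """The kappa-utility (5.36)."""
    return (1.0 + kappa * x - math.sqrt(1.0 + (kappa * x) ** 2)) / kappa

def du(x, kappa):
    """First derivative (5.37)."""
    return 1.0 - kappa * x / math.sqrt(1.0 + (kappa * x) ** 2)

def d2u(x, kappa):
    """Second derivative (5.38)."""
    return -kappa / (1.0 + (kappa * x) ** 2) ** 1.5

def abs_risk_aversion(x, kappa):
    """Arrow-Pratt coefficient -u''(x) / u'(x)."""
    return -d2u(x, kappa) / du(x, kappa)

def central_diff(f, x, h=1e-5):
    """Symmetric finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

kappa = 2.0
points = [-3.0, -0.5, 0.0, 0.5, 3.0]
errors_first = [abs(du(x, kappa) - central_diff(lambda y: u(y, kappa), x)) for x in points]
errors_second = [abs(d2u(x, kappa) - central_diff(lambda y: du(y, kappa), x)) for x in points]

print("max error u' :", max(errors_first))
print("max error u'':", max(errors_second))
for x in points:
    print(x, abs_risk_aversion(x, kappa))
```

At x = 0 the absolute risk aversion equals κ, which matches the observation that a larger κ means a larger risk aversion near zero.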

Figure 5.18: Utility function 5.36.

Figure 5.19: Absolute risk aversion.

Figure 5.20: Skew asymptote, κ = 2.

We have plotted this absolute risk aversion for several values of κ in figure 5.19. Using this figure we can make several hypotheses about the class of κ-utility functions. We notice that when κ gets larger, the maximum of the absolute risk aversion gets larger. In the neighbourhood of zero, the larger κ is, the larger the absolute risk aversion. We also see that for all values of κ the absolute risk aversion tends to zero in the left tail, and the larger κ is, the faster this happens. Hence in the left tail the agent tends to risk-neutrality. This will also be the case for the right tail. We can see this in figure 5.20, where we have plotted the same utility functions as in figure 5.18 but on a larger domain. We can see that when x becomes larger, the utility functions tend to some horizontal asymptote. In the same figure we see that when x gets smaller the utility function tends to some skew asymptote y = ax + b for some a and b.

The computations regarding the asymptotic behaviour of the κ-utility can be found in appendix B. To improve the readability of this text we will only discuss the results. It turns out that the skew asymptote is given by $y = 2x + \frac{1}{\kappa}$. This is illustrated in figure 5.20. The equation of the horizontal asymptote is given by $y = \frac{1}{\kappa}$.

The class of utility functions (5.36) is defined for all κ > 0. We will now discuss what happens when κ tends to 0 and when κ tends to +∞. We have that

\[
\lim_{\kappa\to+\infty} u(x) = x - \sqrt{x^{2}}. \tag{5.40}
\]

From this we can conclude that

\[
\lim_{\kappa\to+\infty} u(x) =
\begin{cases}
0, & x \geq 0,\\
2x, & x < 0.
\end{cases}
\tag{5.41}
\]

We recognise the utility function used in CVaR for a confidence level α = 0.5. The limit when κ tends to zero is given by

\[
\lim_{\kappa\to 0} u(x) = x, \tag{5.42}
\]

which gives us the utility function of a risk neutral investor.

The determination of the associated divergence function is a tedious task. All computations can be found in appendix B. We obtained that the divergence function is given by

\[
\phi(t) =
\begin{cases}
-\dfrac{1}{\kappa}\sqrt{1-(t-1)^{2}} + \dfrac{1}{\kappa}, & \text{if } 0 \leq t \leq 2,\\[1.5ex]
+\infty, & \text{if } t > 2.
\end{cases}
\tag{5.43}
\]

In figure 5.21 we have plotted this divergence function.

Figure 5.21: The divergence function of the κ-utility for different values of κ.

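To understand the effect of the parameter κ on both the divergence risk measure and the utility based shortfall risk measure we will again use lemma 5.1. The plots of the κ-utility suggest that if $\kappa_1 \geq \kappa_2$ then $u_{\kappa_1}(x) \leq u_{\kappa_2}(x)$ for all $x \in \mathbb{R}$. To verify this, write $s = 1/\kappa$, so that

\[
u_{\kappa}(x) = \frac{1}{\kappa} + x - \frac{\sqrt{1+\kappa^{2}x^{2}}}{\kappa} = s + x - \sqrt{s^{2}+x^{2}}.
\]

For every fixed x we have

\[
\frac{\partial}{\partial s}\left(s + x - \sqrt{s^{2}+x^{2}}\right) = 1 - \frac{s}{\sqrt{s^{2}+x^{2}}} \geq 0,
\]

so $u_{\kappa}(x)$ is non-decreasing in $s = 1/\kappa$. Assume that $\kappa_1 \geq \kappa_2 > 0$. Then for all $x \in \mathbb{R}$ we have that

\[
\begin{aligned}
\kappa_1 \geq \kappa_2 > 0 &\Rightarrow \frac{1}{\kappa_1} \leq \frac{1}{\kappa_2}\\
&\Rightarrow \frac{1}{\kappa_1} + x - \sqrt{\frac{1}{\kappa_1^{2}} + x^{2}} \leq \frac{1}{\kappa_2} + x - \sqrt{\frac{1}{\kappa_2^{2}} + x^{2}}\\
&\Rightarrow \frac{1}{\kappa_1} + x - \frac{\sqrt{1+\kappa_1^{2}x^{2}}}{\kappa_1} \leq \frac{1}{\kappa_2} + x - \frac{\sqrt{1+\kappa_2^{2}x^{2}}}{\kappa_2}\\
&\Rightarrow u_{\kappa_1}(x) \leq u_{\kappa_2}(x).
\end{aligned}
\]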

From this we can conclude that an increase in the parameter κ results in an increase of the divergence risk and of the utility based shortfall risk. This effect of the parameter κ on the divergence risk is illustrated in figure 5.22. We have constructed this figure in the same fashion as before, i.e. sets of 10000 returns were simulated from normal distributions with mean 0.25, each set with a different standard deviation.

Figure 5.22: Influence of κ-parameter on the divergence risk.

In figure 5.23 we have plotted the divergence risk and the utility based shortfall risk with x0 = 0 of a set of returns generated from a normal distribution with mean 0.25 and standard deviation 0.8. This figure illustrates the fact that utility based shortfall risk is increasing in the parameter κ. We can also observe that there is a clear difference between the utility based shortfall risk with x0 = 0 and the divergence risk, and that the difference between the two risk measures increases if κ increases.
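The two risk measures compared here can be approximated on a simulated sample with elementary one-dimensional searches. The sketch below is an assumption-laden illustration, not the code used for figure 5.23: the divergence risk is computed as $-\sup_\eta(\eta + E[u(X-\eta)])$ by golden-section search, the utility based shortfall risk with $x_0 = 0$ as $\inf\{m : E[u(X+m)] \geq 0\}$ by bisection, and the sample size, seed, search intervals and function names are hypothetical choices.

```python
import math
import random

def kappa_utility(x, kappa):
    """The kappa-utility (5.36)."""
    return (1.0 + kappa * x - math.sqrt(1.0 + (kappa * x) ** 2)) / kappa

def golden_max(f, lo, hi, tol=1e-10):
    """Maximum value of a unimodal function f on [lo, hi] (golden-section search)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
    return f(0.5 * (lo + hi))

def divergence_risk(xs, kappa, lo=-20.0, hi=20.0):
    """Sample version of D_phi(X) = -sup_eta (eta + E[u(X - eta)])."""
    n = len(xs)
    return -golden_max(
        lambda eta: eta + sum(kappa_utility(x - eta, kappa) for x in xs) / n, lo, hi)

def shortfall_risk(xs, kappa, lo=-20.0, hi=20.0, iterations=60):
    """Sample version of SF_0(X) = inf{m : E[u(X + m)] >= 0}, found by bisection."""
    n = len(xs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if sum(kappa_utility(x + mid, kappa) for x in xs) / n >= 0.0:
            hi = mid                      # mid is acceptable, so the infimum is at most mid
        else:
            lo = mid                      # mid is not acceptable, more capital is needed
    return hi

random.seed(1)
returns = [random.gauss(0.25, 0.8) for _ in range(1000)]
kappas = [0.5, 1.0, 2.0, 5.0]
div_risks = [divergence_risk(returns, k) for k in kappas]
sf_risks = [shortfall_risk(returns, k) for k in kappas]
for k, dr, sf in zip(kappas, div_risks, sf_risks):
    print(f"kappa={k}: divergence risk={dr:.4f}, shortfall risk={sf:.4f}")
```

Both searches rely on structure of the problem: the objective in η is concave, and the expected utility is increasing in m, so a unimodal search and a bisection are sufficient for this sketch.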

Figure 5.23: Comparison of the divergence risk and the utility based shortfall risk of the κ-utility.

When looking at the divergence of the κ-utility we notice that for t > 2 the divergence becomes +∞. That is, for values of t larger than the slope of the skew asymptote the divergence is +∞. We have seen something similar when we looked at the divergence associated with CVaR. In that case the utility function is given by $u(x) = -\frac{1}{\alpha}\max(0,-x)$ and $y = \frac{1}{\alpha}x$ is a skew asymptote for $x \to -\infty$. The associated divergence was given by

\[
\phi(t) =
\begin{cases}
0, & \text{if } 0 \leq t \leq \frac{1}{\alpha},\\
+\infty, & \text{if } t > \frac{1}{\alpha}.
\end{cases}
\tag{5.44}
\]

This turns out not to be a coincidence. Because the slope of the skew asymptote is an upper bound of the slope of the utility function, or equivalently of the slope of the loss function, the Legendre transform of the loss function becomes infinite for values larger than this upper bound. This effect is illustrated in figure 5.24, where we have used κ = 5. We have formalised this intuition in a lemma.

Lemma 5.2. Let u(x) be an increasing and concave utility function such that y = ax + b is a skew asymptote for $x \to -\infty$. Then the associated divergence $\phi = l^{*}$, with $l(x) = -u(-x)$, satisfies $\phi(t) = +\infty$ for all t > a.

Proof. Because y = ax + b is a skew asymptote of u(x) when x tends to −∞, we have that y = ax − b is a skew asymptote of l(x) = −u(−x) when x tends to +∞. Because l(x) is a convex and increasing function, the slope of l(x) is increasing. Hence a is an upper bound for the slope of the loss function. Because ax − b is a skew asymptote when x tends to +∞ we have that

\[
\forall \varepsilon > 0,\ \exists \delta > 0 \text{ such that } x > \delta \Rightarrow |l(x) - (ax - b)| < \varepsilon.
\]

Figure 5.24: Effect of skew asymptote on the divergence function.

Hence, fixing some ε > 0 and taking γ = ε, there exists a δ such that $l(x) - ax + b < \gamma$ for all x > δ. We need to calculate $\sup_{x\in\mathbb{R}}(xt - l(x))$. We will show that for t > a the function xt − l(x) is unbounded from above. Suppose that xt − l(x) is bounded from above for some t > a; then there exists an $M \in \mathbb{R}$ such that $xt - l(x) \leq M$ for all $x \in \mathbb{R}$. For x > δ we have $l(x) < \gamma + ax - b$, hence $xt \leq l(x) + M < \gamma + ax - b + M$. From this we obtain $x(t-a) < \gamma - b + M$ for all x > δ. Because t > a we find that $x < \frac{\gamma - b + M}{t-a}$ for all x > δ, which is a contradiction. Hence xt − l(x) is unbounded and $\phi(t) = l^{*}(t) = \sup_{x\in\mathbb{R}}(xt - l(x)) = +\infty$.
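The mechanism of the proof can be observed numerically for the κ-utility, whose skew asymptote has slope a = 2. The sketch below evaluates $xt - l(x)$ along growing x for one value of t above the slope and one below it; κ = 5 follows figure 5.24, while the chosen values of t and the evaluation points are assumptions for illustration.

```python
import math

def kappa_loss(x, kappa):
    """Loss function l(x) = -u(-x) of the kappa-utility (5.36)."""
    return (kappa * x + math.sqrt(1.0 + (kappa * x) ** 2) - 1.0) / kappa

kappa = 5.0
xs = [10.0 ** k for k in range(7)]            # x = 1, 10, ..., 10^6

t_above = 2.5                                  # larger than the asymptote slope a = 2
t_below = 1.5                                  # smaller than the asymptote slope
values_above = [x * t_above - kappa_loss(x, kappa) for x in xs]
values_below = [x * t_below - kappa_loss(x, kappa) for x in xs]

# Closed-form divergence (5.43) at t_below, which bounds x*t - l(x) from above.
phi_below = (1.0 - math.sqrt(1.0 - (t_below - 1.0) ** 2)) / kappa

print(values_above)                            # keeps growing, roughly like x * (t - 2)
print(max(values_below), "<=", phi_below)      # stays below the finite supremum
```

For t above the slope the linear term eventually dominates the loss function, which is exactly the contradiction used in the proof.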

Conclusion

In this master's thesis we looked at different ways to incorporate utility functions in risk measures. We focused on two classes of risk measures: utility based shortfall risk measures and divergence risk measures. Both of these risk measures are convex, which means that they satisfy the properties of monotonicity, translation invariance and convexity. However, they are generally not coherent because they lack the positive homogeneity property. Like all convex risk measures, these risk measures have a robust representation of the form

\[
\sup_{Q\in M_1(P)} \left(E_Q[-X] - \alpha(Q)\right),
\]

where α(·) is a penalty function. In the case of divergence risk measures this representation is often used and the penalty function is taken to be the φ-divergence $I_{\phi}(Q|P) = E\left[\phi\left(\frac{dQ}{dP}\right)\right]$, where the convex function φ is called the divergence function. The effect of this divergence function is difficult to analyse and to interpret. Fortunately the strong Fenchel duality theorem from mathematical optimisation allowed us to reformulate the robust representation of a divergence risk measure as a more comprehensible formula. We obtained that each divergence risk measure could be interpreted as the negative of an optimised certainty equivalent, where the utility function u was linked to the divergence function φ through the Fenchel-Legendre transform. More formally we obtained that if $\phi^{*}(x) = -u(-x)$ then

\[
D_{\phi}(X) = \sup_{Q\in M_1(P)} \left(E_Q[-X] - I_{\phi}(Q|P)\right) = -\sup_{\eta\in\mathbb{R}} \left(\eta + E[u(X-\eta)]\right) = -\mathrm{OCE}_u(X).
\]

The utility based shortfall risk measures were defined as the negative of the u-Mean certainty equivalent. For $l(x) = -u(-x) = -\tilde{u}(-x) + x_0$ we had that

\[
SF^{l}_{x_0}(X) = \inf\{m\in\mathbb{R} \mid E[l(-X-m)] \leq x_0\} = -\sup\{m\in\mathbb{R} \mid E[\tilde{u}(X-m)] \geq 0\} = -M_{\tilde{u}}(X).
\]

Both utility based risk measures have a representation as an optimisation problem with regard to a utility function. These optimisation problems were linked using strong Lagrangian duality. We obtained that $SF^{l}_{0}(X) \geq D_{\phi}(X)$.

Because both the divergence risk measure and the utility based shortfall risk measure can be interpreted as the negative of some certainty equivalent, we looked into the possibility of using the negative of the ordinary certainty equivalent as a risk measure. It turned out that to get a translation invariant risk measure, only linear or exponential utility functions could be used. Since we also noted that for the exponential utility function all certainty equivalents coincide, we did not obtain any new interesting convex risk measures.

After reading this thesis, the reader might feel there remains an important unanswered question, namely: "Which utility function should be used in a utility based risk measure?" We do not give an answer to this question.

One of the reasons for not proposing a specific utility function is that utility functions model preferences and these preferences are subjective. Another important reason is computability. In this thesis we did not look at how these risk measures could be efficiently computed. Although we did compute some utility based risk measures in the last chapter, we did this by using packages for constrained and unconstrained optimisation in Python. This computational aspect should be taken into account when choosing a suitable utility function.

Although we did not put forward a specific utility function that should be used in utility based risk measures, we did study some of them in the last chapter. Here we tried to illustrate how the parameters of the utility functions affect both the utility based shortfall risk and the divergence risk. For each of the utility functions we also computed the associated divergence function.

A Dutch summary (translated into English)

To set capital requirements for financial institutions it is necessary to be able to determine the risk of the portfolios of these institutions. This risk can be determined by means of risk measures. In the first chapter we study these risk measures from a mathematical point of view and formulate some properties that a good risk measure should have. Using these properties we can construct a class of convex risk measures. Within this class we then pay extra attention to the subclass of coherent risk measures. After this we study the concept of acceptance sets. These are sets containing all possible portfolios whose risk we consider acceptable. Using these sets we can define risk measures in a simple way. We then introduce the robust representation of convex risk measures and study the associated penalty function.

Studying risk measures only from a mathematical point of view would completely ignore the subjective character of risk. What is too high a risk for one person is acceptable for another. The second chapter therefore gives an introduction to decision theory. In it we introduce the von Neumann-Morgenstern framework for making decisions under uncertainty. We explain what utility functions are and how they can model the different attitudes towards risk. We then pay attention to different types of certainty equivalents: the ordinary certainty equivalent ($\mathrm{CE}_u$), the optimised certainty equivalent ($\mathrm{OCE}_u$) and the u-mean certainty equivalent ($M_u$) are discussed. We close this chapter with an introduction to the concept of stochastic dominance.

Armed with both the mathematical concepts from the first chapter and the economic concepts from the second chapter, we can now analyse concrete risk measures. This happens in the third chapter, in which we apply these concepts to both Value at Risk and Expected shortfall. Here we note that Value at Risk, one of the most widely used risk measures, shows some important shortcomings both mathematically and economically.

In the fourth chapter we go deeper into the main question of this thesis: "How can utility functions be incorporated in risk measures in a good way?" We give two possible answers to this question. First there are the utility based shortfall risk measures ($SF^{l}_{x_0}$). These risk measures are constructed from acceptance sets. These acceptance sets contain all portfolios whose expected utility exceeds a certain bound. If we define the loss function l as l(x) = −u(−x) we have that

\[
SF^{l}_{x_0} = \inf\{m\in\mathbb{R} \mid E[u(X+m)] \geq -x_0\}.
\]

Here we noted that we can rewrite this formula using a u-mean certainty equivalent. We have that $SF^{l}_{0} = -M_u(X)$.

A second type of risk measures in which utility functions are incorporated are the so-called divergence risk measures ($D_{\phi}$). In contrast to the utility based shortfall risk measures, these risk measures are not constructed from acceptance sets; instead, use is made of the robust representation of convex risk measures. We have that

\[
D_{\phi}(X) = \sup_{Q \ll P} \left(E_Q[-X] - I_{\phi}(Q|P)\right).
\]

Characteristic of divergence risk measures is that the penalty function has the form $I_{\phi}(Q|P) = E_P\left[\phi\left(\frac{dQ}{dP}\right)\right]$, where φ is a convex function¹ that is called the divergence function. One of the best-known examples of a divergence risk measure is entropic risk. Here one takes as divergence the Kullback-Leibler entropy, $E_P\left[\frac{dQ}{dP}\ln\frac{dQ}{dP}\right]$. The interpretation of divergence risk measures is not easy if one only has the corresponding robust representation at one's disposal. Fortunately the strong Fenchel duality theorem offers a solution. If we take as utility function $u(x) = -\phi^{*}(-x)$, where $\phi^{*}$ is the Fenchel-Legendre transform of the divergence function, we find that

\[
D_{\phi}(X) = -\sup_{\eta\in\mathbb{R}} \left(\eta + E[u(X-\eta)]\right) = -\mathrm{OCE}_u(X).
\]

This representation is much easier to interpret than the robust representation. For instance, entropic risk can be interpreted as the negative optimised certainty equivalent of an individual with an exponential utility function.

Inspired by the fact that both divergence risk measures and utility based shortfall risk measures can be interpreted as negative certainty equivalents, we wondered whether the ordinary certainty equivalent would also give rise to a good risk measure in this way. This idea turned out to be of little success, since in many cases these risk measures did not have the desired mathematical properties.

Although entropic risk is a very well-known example of a divergence risk measure, from an economic point of view there is little reason why we should use the exponential utility function for the construction of risk measures. In the last chapter we therefore studied several utility functions in the context of risk measures. For each of these utility functions we computed the associated divergence function and investigated the influence of the parameters on the different risk measures.

¹ Which may also take the values +∞ and −∞, but is finite in at least one point.
B Additional computations

B.1 Computations regarding the SAHARA utility class

The class of SAHARA utility functions is defined using the following coefficient of absolute risk aversion:

\[
r_A(x) = \frac{a}{\sqrt{b^{2}+(x-d)^{2}}} \tag{B.1}
\]

with a > 0, b > 0 and $d \in \mathbb{R}$.

B.1.1 Computation of the utility function

Let $v(x) = u'(x)$; then we have that $\frac{dv(x)}{v(x)} = \frac{-a}{\sqrt{b^{2}+(x-d)^{2}}}\,dx$. Integrating both sides, with $y = x - d$, gives

\[
\begin{aligned}
\ln(v(x)) &= \int \frac{-a}{\sqrt{b^{2}+(x-d)^{2}}}\,dx\\
&= \int \frac{-a}{\sqrt{b^{2}+y^{2}}}\,dy\\
&= -a\ln\left(\sqrt{b^{2}+y^{2}}+y\right) + C\\
&= \ln\left(\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right)^{-a} + C.
\end{aligned}
\]

We conclude that

\[
u'(x) = C_1\left(\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right)^{-a} \tag{B.2}
\]

for some integration constant $C_1$. Then we have that

\[
u(x) = C_1\int \left(\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right)^{-a} dx.
\]

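Let y = x − d; then we have that

\[
u(y) = C_1\int \left(\sqrt{b^{2}+y^{2}}+y\right)^{-a} dy.
\]

Now consider the following substitution:

\[
y = \frac{z^{2}-b^{2}}{2z}, \qquad dy = \frac{z^{2}+b^{2}}{2z^{2}}\,dz.
\]

Then we have that

\[
\begin{aligned}
u(z) &= C_1\int \left(\sqrt{b^{2}+\left(\frac{z^{2}-b^{2}}{2z}\right)^{2}} + \frac{z^{2}-b^{2}}{2z}\right)^{-a} \frac{z^{2}+b^{2}}{2z^{2}}\,dz\\
&= C_1\int \left(\sqrt{\frac{4b^{2}z^{2}+z^{4}-2z^{2}b^{2}+b^{4}}{4z^{2}}} + \frac{z^{2}-b^{2}}{2z}\right)^{-a} \frac{z^{2}+b^{2}}{2z^{2}}\,dz\\
&= C_1\int \left(\sqrt{\left(\frac{z^{2}+b^{2}}{2z}\right)^{2}} + \frac{z^{2}-b^{2}}{2z}\right)^{-a} \frac{z^{2}+b^{2}}{2z^{2}}\,dz\\
&= C_1\int \left(\frac{2z^{2}}{2z}\right)^{-a} \frac{z^{2}+b^{2}}{2z^{2}}\,dz\\
&= C_1\int \frac{1}{2}\,z^{-a-2}\left(z^{2}+b^{2}\right) dz.
\end{aligned}
\]

Now consider the case that a ≠ 1; then

\[
\begin{aligned}
u(z) &= C_1\int \frac{1}{2}\,z^{-a-2}\left(z^{2}+b^{2}\right) dz\\
&= \frac{C_1}{2}\left(\frac{z^{-a+1}}{-a+1} + \frac{b^{2}z^{-a-1}}{-a-1}\right) + C_2\\
&= \frac{-C_1 z^{-a}}{2(a^{2}-1)}\left(z(a+1) + b^{2}z^{-1}(a-1)\right) + C_2.
\end{aligned}
\]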

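Using the substitution $y = \frac{z^{2}-b^{2}}{2z}$, or equivalently $z = \sqrt{b^{2}+y^{2}}+y$, we have that

\[
\begin{aligned}
u(y) &= \frac{-C_1\left(\sqrt{b^{2}+y^{2}}+y\right)^{-a}}{2(a^{2}-1)}\left(\left(\sqrt{b^{2}+y^{2}}+y\right)(a+1) + b^{2}\left(\sqrt{b^{2}+y^{2}}+y\right)^{-1}(a-1)\right) + C_2\\
&= \frac{-C_1\left(\sqrt{b^{2}+y^{2}}+y\right)^{-a}}{2(a^{2}-1)}\left(\left(\sqrt{b^{2}+y^{2}}+y\right)(a+1) + \left(\sqrt{b^{2}+y^{2}}-y\right)(a-1)\right) + C_2\\
&= \frac{-C_1\left(\sqrt{b^{2}+y^{2}}+y\right)^{-a}}{a^{2}-1}\left(a\sqrt{b^{2}+y^{2}}+y\right) + C_2.
\end{aligned}
\]

We conclude that for a ≠ 1 we have that

\[
u(x) = \frac{-C_1}{a^{2}-1}\left(\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right)^{-a}\left(a\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right) + C_2. \tag{B.3}
\]

Now consider the case that a = 1; then we have that

\[
u(z) = \int \left(\frac{C_1}{2}z^{-1} + \frac{C_1}{2}b^{2}z^{-3}\right) dz = \frac{C_1}{2}\left(\ln(z) - \frac{b^{2}}{2z^{2}}\right) + C_2.
\]

Now using the substitution $z = \sqrt{b^{2}+y^{2}}+y$ we find that

\[
\begin{aligned}
u(y) &= \frac{C_1}{2}\left(\ln\left(\sqrt{b^{2}+y^{2}}+y\right) - \frac{b^{2}}{2\left(\sqrt{b^{2}+y^{2}}+y\right)^{2}}\right) + C_2\\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^{2}+y^{2}}+y\right) - \frac{b^{2}\left(\sqrt{b^{2}+y^{2}}-y\right)^{2}}{2\left(\sqrt{b^{2}+y^{2}}+y\right)^{2}\left(\sqrt{b^{2}+y^{2}}-y\right)^{2}}\right) + C_2\\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^{2}+y^{2}}+y\right) - \frac{b^{2}\left(\sqrt{b^{2}+y^{2}}-y\right)^{2}}{2b^{4}}\right) + C_2\\
&= \frac{C_1}{2}\left(\ln\left(\sqrt{b^{2}+y^{2}}+y\right) - \frac{\left(\sqrt{b^{2}+y^{2}}-y\right)^{2}}{2b^{2}}\right) + C_2.
\end{aligned}
\]

Using that y = x − d we conclude that if a = 1 we have that

\[
u(x) = \frac{C_1}{2}\left(\ln\left(\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right) - \frac{\left(\sqrt{b^{2}+(x-d)^{2}}-(x-d)\right)^{2}}{2b^{2}}\right) + C_2. \tag{B.4}
\]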

Now we will determine the constants $C_1$ and $C_2$ such that $u(0) = 0$ and $u'(0) = 1$. From (B.2) we have that the condition $u'(0) = 1$ yields

\[
C_1 = \left(\sqrt{b^{2}+d^{2}}-d\right)^{a}.
\]

Notice that $C_1 > 0$. The condition $u(0) = 0$ yields that when a ≠ 1

\[
C_2 = \frac{C_1}{a^{2}-1}\left(\sqrt{b^{2}+d^{2}}-d\right)^{-a}\left(a\sqrt{b^{2}+d^{2}}-d\right).
\]

When a = 1 we have that

\[
C_2 = -\frac{C_1}{2}\left(\ln\left(\sqrt{b^{2}+d^{2}}-d\right) - \frac{\left(\sqrt{b^{2}+d^{2}}+d\right)^{2}}{2b^{2}}\right).
\]

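B.1.2 Computation of the divergence function

Now we will derive the divergence function associated with the standardised SAHARA utility. First suppose that a ≠ 1; then the utility function is given by (B.3). Denote

\[
u_1(x) := \frac{-1}{a^{2}-1}\left(\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right)^{-a}\left(a\sqrt{b^{2}+(x-d)^{2}}+(x-d)\right). \tag{B.5}
\]

Then $u(x) = C_1u_1(x) + C_2$. We will calculate the divergence function associated with $u_1$ and denote this function $\phi_1$. There is a clear relation between $\phi_1$ and the divergence function φ associated with u. Because $C_1 > 0$ we have that

\[
\begin{aligned}
\phi(t) &= \sup_{x\in\mathbb{R}} \left(xt + u(-x)\right)\\
&= \sup_{x\in\mathbb{R}} \left(u(x) - xt\right)\\
&= \sup_{x\in\mathbb{R}} \left(C_1u_1(x) + C_2 - xt\right)\\
&= C_1\sup_{x\in\mathbb{R}} \left(u_1(x) - x\frac{t}{C_1}\right) + C_2\\
&= C_1\phi_1\left(\frac{t}{C_1}\right) + C_2.
\end{aligned}
\]

We have that $\phi_1(t) = \sup_{x\in\mathbb{R}}\left(u_1(x) - xt\right)$. The first order condition gives that $u_1'(x^{*}) = t$, where $x^{*}$ denotes the optimal value:

\[
\left(\sqrt{b^{2}+(x^{*}-d)^{2}}+(x^{*}-d)\right)^{-a} = t
\quad\Leftrightarrow\quad
(x^{*}-d) = \frac{1}{2}\left(t^{-1/a} - b^{2}t^{1/a}\right).
\]

Then we have that

\[
\begin{aligned}
u_1(x^{*}) &= \frac{-1}{a^{2}-1}\left(\sqrt{b^{2}+(x^{*}-d)^{2}}+(x^{*}-d)\right)^{-a}\left(a\sqrt{b^{2}+(x^{*}-d)^{2}}+(x^{*}-d)\right)\\
&= \frac{-t}{a^{2}-1}\left(a\sqrt{b^{2}+(x^{*}-d)^{2}}+(x^{*}-d)\right)\\
&= \frac{-t}{a^{2}-1}\left(a\sqrt{b^{2}+\left(\frac{t^{-1/a}}{2}-\frac{b^{2}t^{1/a}}{2}\right)^{2}} + \frac{t^{-1/a}}{2} - \frac{b^{2}t^{1/a}}{2}\right)\\
&= \frac{-t}{a^{2}-1}\left(a\sqrt{\frac{4b^{2}+t^{-2/a}-2b^{2}+b^{4}t^{2/a}}{4}} + \frac{t^{-1/a}}{2} - \frac{b^{2}t^{1/a}}{2}\right)\\
&= \frac{-t}{a^{2}-1}\left(a\sqrt{\left(\frac{t^{-1/a}+b^{2}t^{1/a}}{2}\right)^{2}} + \frac{t^{-1/a}}{2} - \frac{b^{2}t^{1/a}}{2}\right)\\
&= \frac{-t}{a^{2}-1}\left(a\,\frac{t^{-1/a}+b^{2}t^{1/a}}{2} + \frac{t^{-1/a}}{2} - \frac{b^{2}t^{1/a}}{2}\right)\\
&= \frac{-t}{2(a^{2}-1)}\left(t^{-1/a}(a+1) + b^{2}t^{1/a}(a-1)\right)\\
&= \frac{-t}{2}\left(\frac{t^{-1/a}}{a-1} + \frac{b^{2}t^{1/a}}{a+1}\right).
\end{aligned}
\]

We can conclude that

\[
\begin{aligned}
\phi_1(t) &= u_1(x^{*}) - x^{*}t\\
&= \frac{-t}{2}\left(\frac{t^{-1/a}}{a-1} + \frac{b^{2}t^{1/a}}{a+1}\right) - \frac{t}{2}\left(2d + t^{-1/a} - b^{2}t^{1/a}\right)\\
&= \frac{-t}{2}\left(t^{-1/a}\left(\frac{1}{a-1}+1\right) + b^{2}t^{1/a}\left(\frac{1}{a+1}-1\right)\right) - td\\
&= \frac{t}{2}\left(\frac{b^{2}t^{1/a}}{1+\frac{1}{a}} - \frac{t^{-1/a}}{1-\frac{1}{a}} - 2d\right).
\end{aligned}
\]

Hence we have that

\[
\phi(t) = C_1\phi_1\left(\frac{t}{C_1}\right) + C_2. \tag{B.6}
\]

When a = 1, let $u_1$ denote the right-hand side of (B.4) with $C_1 = 1$ and $C_2 = 0$, so that again $u(x) = C_1u_1(x) + C_2$ and (B.6) holds. The first order condition yields that

\[
t = \left(\sqrt{b^{2}+(x^{*}-d)^{2}}+(x^{*}-d)\right)^{-1}.
\]

From which we can conclude that

\[
(x^{*}-d) = \frac{t^{-1}}{2} - \frac{b^{2}t}{2}, \qquad x^{*} = d + \frac{t^{-1}}{2} - \frac{b^{2}t}{2}.
\]

Since $\sqrt{b^{2}+(x^{*}-d)^{2}} = \frac{t^{-1}+b^{2}t}{2}$, we have that $\sqrt{b^{2}+(x^{*}-d)^{2}} - (x^{*}-d) = b^{2}t$ and therefore

\[
\begin{aligned}
u_1(x^{*}) &= \frac{1}{2}\left(\ln\left(\sqrt{b^{2}+(x^{*}-d)^{2}}+(x^{*}-d)\right) - \frac{\left(\sqrt{b^{2}+(x^{*}-d)^{2}}-(x^{*}-d)\right)^{2}}{2b^{2}}\right)\\
&= \frac{1}{2}\left(\ln(t^{-1}) - \frac{\left(b^{2}t\right)^{2}}{2b^{2}}\right)\\
&= \frac{1}{2}\left(\ln(t^{-1}) - \frac{b^{2}t^{2}}{2}\right).
\end{aligned}
\]

Then we have that

\[
\begin{aligned}
\phi_1(t) &= u_1(x^{*}) - tx^{*}\\
&= \frac{1}{2}\left(\ln(t^{-1}) - \frac{b^{2}t^{2}}{2}\right) - t\left(d + \frac{t^{-1}}{2} - \frac{b^{2}t}{2}\right).
\end{aligned}
\]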
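For a ≠ 1, the relation $\phi(t) = C_1\phi_1(t/C_1) + C_2$ with $\phi_1(t) = \frac{t}{2}\left(\frac{b^{2}t^{1/a}}{1+1/a} - \frac{t^{-1/a}}{1-1/a} - 2d\right)$ can be compared with a direct numerical evaluation of $\sup_x(u(x) - xt)$. The sketch below is a hedged check of this kind, not part of the original computations; the parameters a = 2, b = 2, d = −1 follow figure 5.11, whereas the helper names, the search interval and the test values of t are assumptions.

```python
import math

def golden_max(f, lo, hi, tol=1e-10):
    """Maximum value of a unimodal function f on [lo, hi] (golden-section search)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
    return f(0.5 * (lo + hi))

def make_sahara(a, b, d):
    """Return (u, C1, C2) for the standardised SAHARA utility (B.3), a != 1."""
    def raw(y):                                  # this is u_1 written in y = x - d
        s = math.sqrt(b * b + y * y)
        return -((s + y) ** (-a)) * (a * s + y) / (a * a - 1.0)
    c1 = (math.sqrt(b * b + d * d) - d) ** a     # from u'(0) = 1
    c2 = -c1 * raw(-d)                           # from u(0) = 0
    return (lambda x: c1 * raw(x - d) + c2), c1, c2

def phi_closed(t, a, b, d, c1, c2):
    """Closed-form divergence phi(t) = C1 * phi_1(t / C1) + C2 for a != 1."""
    r = t / c1
    phi1 = 0.5 * r * (b * b * r ** (1.0 / a) / (1.0 + 1.0 / a)
                      - r ** (-1.0 / a) / (1.0 - 1.0 / a) - 2.0 * d)
    return c1 * phi1 + c2

a, b, d = 2.0, 2.0, -1.0
u, c1, c2 = make_sahara(a, b, d)
t_values = [0.25, 0.5, 1.0, 2.0, 4.0]
closed = [phi_closed(t, a, b, d, c1, c2) for t in t_values]
numeric = [golden_max(lambda x, t=t: u(x) - x * t, -50.0, 50.0) for t in t_values]
for t, pc, pn in zip(t_values, closed, numeric):
    print(t, pc, pn)
```

Because of the standardisation $u(0) = 0$ and $u'(0) = 1$, both columns should vanish at t = 1, which gives a quick plausibility check of the constants.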

B.2 Computations regarding the κ-utility class

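B.2.1 Determining the asymptotic behaviour

For κ > 0 the κ-utility is given by

\[
u(x) = \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right). \tag{B.7}
\]

We will now determine the equation of the skew asymptote of the κ-utility. The skew asymptote is given by y = ax + b. Using Cauchy's formulas we find that

\[
\begin{aligned}
a &= \lim_{x\to-\infty} \frac{u(x)}{x}\\
&= \lim_{x\to-\infty} \frac{1}{\kappa}\left(\frac{1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}}{x}\right)\\
&= 0 + 1 + \lim_{x\to-\infty} \frac{-\sqrt{1+\kappa^{2}x^{2}}}{\kappa x}\\
&= 1 + \lim_{x\to-\infty} \frac{-\sqrt{x^{2}}\sqrt{\frac{1}{x^{2}}+\kappa^{2}}}{\kappa x}\\
&= 1 + \lim_{x\to-\infty} \frac{x\sqrt{\frac{1}{x^{2}}+\kappa^{2}}}{\kappa x}\\
&= 1 + \lim_{x\to-\infty} \frac{\sqrt{\frac{1}{x^{2}}+\kappa^{2}}}{\kappa}\\
&= 1 + \frac{\sqrt{\kappa^{2}}}{\kappa} = 2,
\end{aligned}
\]

\[
\begin{aligned}
b &= \lim_{x\to-\infty} \left[u(x) - ax\right]\\
&= \lim_{x\to-\infty} \left[\frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right) - 2x\right]\\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[x - \frac{1}{\kappa}\sqrt{1+\kappa^{2}x^{2}} - 2x\right]\\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[-x - \frac{1}{\kappa}\sqrt{1+\kappa^{2}x^{2}}\right]\\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[-x - \frac{\sqrt{x^{2}}}{\kappa}\sqrt{\frac{1}{x^{2}}+\kappa^{2}}\right]\\
&= \frac{1}{\kappa} + \lim_{x\to-\infty} \left[-x + \frac{x}{\kappa}\sqrt{\frac{1}{x^{2}}+\kappa^{2}}\right]\\
&= \frac{1}{\kappa} - \lim_{x\to-\infty} x\left[1 - \frac{1}{\kappa}\sqrt{\frac{1}{x^{2}}+\kappa^{2}}\right]\\
&= \frac{1}{\kappa} - \lim_{x\to-\infty} \frac{1 - \frac{1}{\kappa}\sqrt{\frac{1}{x^{2}}+\kappa^{2}}}{\frac{1}{x}}\\
&= \frac{1}{\kappa} - \lim_{z\to 0} \frac{1 - \frac{1}{\kappa}\sqrt{z^{2}+\kappa^{2}}}{z}\\
&= \frac{1}{\kappa} - \lim_{z\to 0} \frac{-z}{\kappa\sqrt{z^{2}+\kappa^{2}}} = \frac{1}{\kappa}.
\end{aligned}
\]

We can conclude that the skew asymptote is given by $y = 2x + \frac{1}{\kappa}$. We will now derive the equation of the horizontal asymptote.

\[
\begin{aligned}
\lim_{x\to+\infty} u(x) &= \lim_{x\to+\infty} \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{1}{\kappa}\left(\kappa x - \kappa x\sqrt{1+\frac{1}{\kappa^{2}x^{2}}}\right)\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} x\left(1 - \sqrt{1+\frac{1}{\kappa^{2}x^{2}}}\right)\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{1 - \sqrt{1+\frac{1}{\kappa^{2}x^{2}}}}{\frac{1}{x}}\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{\dfrac{1}{\kappa^{2}x^{3}\sqrt{1+\frac{1}{\kappa^{2}x^{2}}}}}{\dfrac{-1}{x^{2}}}\\
&= \frac{1}{\kappa} - \lim_{x\to+\infty} \frac{1}{\kappa^{2}x\sqrt{1+\frac{1}{\kappa^{2}x^{2}}}}\\
&= \frac{1}{\kappa}.
\end{aligned}
\]

We conclude that if x tends to infinity the utility function tends to the horizontal asymptote with equation $y = \frac{1}{\kappa}$.

The class of utility functions (5.36) is defined for all κ > 0. We will now show what happens when κ tends to zero and when κ tends to +∞.

\[
\begin{aligned}
\lim_{\kappa\to+\infty} u(x) &= \lim_{\kappa\to+\infty} \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)\\
&= \lim_{\kappa\to+\infty} \left(\frac{1}{\kappa} + x - \sqrt{\frac{1}{\kappa^{2}}+x^{2}}\right)\\
&= x - \lim_{\kappa\to+\infty} \sqrt{\frac{1}{\kappa^{2}}+x^{2}}\\
&= x - \sqrt{x^{2}}.
\end{aligned}
\]

\[
\begin{aligned}
\lim_{\kappa\to 0} u(x) &= \lim_{\kappa\to 0} \frac{1}{\kappa}\left(1 + \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)\\
&= \lim_{\kappa\to 0} \left(x - \frac{\kappa x^{2}}{\sqrt{1+\kappa^{2}x^{2}}}\right)\\
&= x.
\end{aligned}
\]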
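The asymptotes and the two limiting utilities derived in this section lend themselves to a quick numerical plausibility check. The sketch below evaluates the distance between the κ-utility and its asymptotes far out in both tails, and compares the utility for a very large and a very small κ with $x - |x|$ and $x$ respectively. The value κ = 2, the evaluation points and the extreme values 1e6 and 1e-6 are assumptions for illustration.

```python
import math

def kappa_utility(x, kappa):
    """The kappa-utility (B.7)."""
    return (1.0 + kappa * x - math.sqrt(1.0 + (kappa * x) ** 2)) / kappa

kappa = 2.0
points = [10.0, 100.0, 1000.0]

# Distance to the skew asymptote y = 2x + 1/kappa in the left tail.
gap_skew = [abs(kappa_utility(-x, kappa) - (2.0 * (-x) + 1.0 / kappa)) for x in points]
# Distance to the horizontal asymptote y = 1/kappa in the right tail.
gap_horizontal = [abs(kappa_utility(x, kappa) - 1.0 / kappa) for x in points]

xs = [-2.0, -0.5, 0.0, 0.5, 2.0]
# kappa -> +infinity: the utility approaches x - |x| (0 for x >= 0, 2x for x < 0).
err_large_kappa = max(abs(kappa_utility(x, 1e6) - (x - abs(x))) for x in xs)
# kappa -> 0: the utility approaches the risk neutral utility u(x) = x.
err_small_kappa = max(abs(kappa_utility(x, 1e-6) - x) for x in xs)

print("gap to skew asymptote      :", gap_skew)
print("gap to horizontal asymptote:", gap_horizontal)
print("error for large kappa:", err_large_kappa)
print("error for small kappa:", err_small_kappa)
```

The gaps shrink roughly like $1/(2\kappa^{2}|x|)$, which is consistent with the limits computed above.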

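B.2.2 Computation of the divergence function

Fix some κ > 0; then the associated loss function is given by l(x) = −u(−x):

\[
l(x) = -\frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right). \tag{B.8}
\]

The associated divergence function is given by $\phi(t) = l^{*}(t)$:

\[
\phi(t) = \sup_{x\in\mathbb{R}} \left(xt + \frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)\right). \tag{B.9}
\]

The first order condition yields that

\[
t - \frac{1}{\kappa}\left(\kappa + \frac{\kappa^{2}x}{\sqrt{1+\kappa^{2}x^{2}}}\right) = 0. \tag{B.10}
\]

We will only derive the divergence for t > 0. First assume that t ≥ 2. Then there does not exist an $x \in \mathbb{R}$ such that equation (B.10) holds. We can see this easily when we rewrite this equation as

\[
t = 1 + \frac{\kappa x}{\sqrt{1+\kappa^{2}x^{2}}}. \tag{B.11}
\]

Because $\kappa x < \sqrt{1+\kappa^{2}x^{2}}$ for all $x \in \mathbb{R}$ and κ > 0, we have that $\frac{\kappa x}{\sqrt{1+\kappa^{2}x^{2}}} < 1$. Hence if equation (B.11) holds for some x and some κ, then t < 2. If t ≥ 2 then the first derivative satisfies $t - \frac{1}{\kappa}\left(\kappa + \frac{\kappa^{2}x}{\sqrt{1+\kappa^{2}x^{2}}}\right) > 0$. This implies that the function $xt + \frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)$ is increasing. To determine the supremum over all x of this function we will study the limit when x tends to +∞. First consider the case when t = 2. We have that

\[
\begin{aligned}
\lim_{x\to+\infty} \left[2x + \frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)\right] &= \frac{1}{\kappa} + \lim_{x\to+\infty} \left[x - \frac{1}{\kappa}\sqrt{1+\kappa^{2}x^{2}}\right]\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} x\left(1 - \sqrt{1+\frac{1}{\kappa^{2}x^{2}}}\right)\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} \frac{1 - \sqrt{1+\frac{1}{\kappa^{2}x^{2}}}}{\frac{1}{x}}\\
&= \frac{1}{\kappa} - \lim_{x\to+\infty} \frac{1}{\kappa^{2}x\sqrt{1+\frac{1}{\kappa^{2}x^{2}}}}\\
&= \frac{1}{\kappa}.
\end{aligned}
\]

We conclude that

\[
\phi(2) = \frac{1}{\kappa}. \tag{B.12}
\]

Now consider the case where t > 2. Calculating the limit gives us

\[
\begin{aligned}
\lim_{x\to+\infty} \left[tx + \frac{1}{\kappa}\left(1 - \kappa x - \sqrt{1+\kappa^{2}x^{2}}\right)\right] &= \frac{1}{\kappa} + \lim_{x\to+\infty} \left[(t-1)x - \frac{1}{\kappa}\sqrt{1+\kappa^{2}x^{2}}\right]\\
&= \frac{1}{\kappa} + \lim_{x\to+\infty} x\left((t-1) - \sqrt{1+\frac{1}{\kappa^{2}x^{2}}}\right)\\
&= +\infty\cdot(t-1-1) = +\infty.
\end{aligned}
\]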

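We find that

\[
\phi(t) = +\infty, \qquad t > 2. \tag{B.13}
\]

When 0 < t < 2 equation (B.11) can hold for some x. We will now determine this x as a function of t. For t ≠ 1 we find that

\[
\begin{aligned}
t = 1 + \frac{\kappa x}{\sqrt{1+\kappa^{2}x^{2}}} &\Rightarrow (t-1) = \frac{\kappa x}{\sqrt{1+\kappa^{2}x^{2}}}\\
&\Rightarrow (t-1)^{2} = \frac{\kappa^{2}x^{2}}{1+\kappa^{2}x^{2}}\\
&\Rightarrow \frac{1}{(t-1)^{2}} = \frac{1+\kappa^{2}x^{2}}{\kappa^{2}x^{2}}\\
&\Rightarrow \frac{1}{(t-1)^{2}} - 1 = \frac{1}{\kappa^{2}x^{2}}\\
&\Rightarrow \pm\sqrt{\frac{1}{(t-1)^{2}} - 1} = \frac{1}{\kappa x}\\
&\Rightarrow x = \frac{\pm 1}{\kappa\sqrt{\frac{1}{(t-1)^{2}} - 1}}.
\end{aligned}
\]

In what follows we will use the following notation:

\[
\begin{aligned}
x_{+} &= \frac{1}{\kappa\sqrt{\frac{1}{(t-1)^{2}} - 1}} = \frac{\sqrt{(t-1)^{2}}}{\kappa\sqrt{1-(t-1)^{2}}},\\
x_{-} &= \frac{-1}{\kappa\sqrt{\frac{1}{(t-1)^{2}} - 1}} = \frac{-\sqrt{(t-1)^{2}}}{\kappa\sqrt{1-(t-1)^{2}}},\\
\phi^{+}(t) &= x_{+}t + \frac{1}{\kappa}\left(1 - \kappa x_{+} - \sqrt{1+\kappa^{2}x_{+}^{2}}\right),\\
\phi^{-}(t) &= x_{-}t + \frac{1}{\kappa}\left(1 - \kappa x_{-} - \sqrt{1+\kappa^{2}x_{-}^{2}}\right).
\end{aligned}
\]

We remark that if t = 1 then $x_{+} = x_{-} = 0$. Hence for t = 1 we have that $\phi(1) = \phi^{+}(1) = \phi^{-}(1) = \frac{1}{\kappa}(1 - 1) = 0$. The second order condition follows from a straightforward calculation:

\[
\frac{d}{dx}\left[t - \frac{1}{\kappa}\left(\kappa + \frac{\kappa^{2}x}{\sqrt{1+\kappa^{2}x^{2}}}\right)\right] = \frac{d}{dx}\left[\frac{-\kappa x}{\sqrt{1+\kappa^{2}x^{2}}}\right] = \frac{-\kappa}{\left(1+\kappa^{2}x^{2}\right)^{3/2}} < 0.
\]

To calculate the divergence we need to calculate

\[
\phi(t) = \max\left\{\phi^{+}(t), \phi^{-}(t)\right\}. \tag{B.14}
\]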

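\[
\begin{aligned}
\phi^{+}(t) &= x_{+}t + \frac{1}{\kappa}\left(1 - \kappa x_{+} - \sqrt{1+\kappa^{2}x_{+}^{2}}\right)\\
&= \frac{\sqrt{(t-1)^{2}}\,t}{\kappa\sqrt{1-(t-1)^{2}}} + \frac{1}{\kappa}\left(1 - \frac{\kappa\sqrt{(t-1)^{2}}}{\kappa\sqrt{1-(t-1)^{2}}} - \sqrt{1+\frac{(t-1)^{2}}{1-(t-1)^{2}}}\right)\\
&= \frac{\sqrt{(t-1)^{2}}\,t}{\kappa\sqrt{1-(t-1)^{2}}} + \frac{1}{\kappa}\left(1 - \frac{\sqrt{(t-1)^{2}}}{\sqrt{1-(t-1)^{2}}} - \frac{1}{\sqrt{1-(t-1)^{2}}}\right)\\
&= \frac{1}{\kappa}\left(\frac{\sqrt{(t-1)^{2}}\,t - \sqrt{(t-1)^{2}} - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}\\
&= \frac{1}{\kappa}\left(\frac{\sqrt{(t-1)^{2}}\,(t-1) - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}.
\end{aligned}
\]

We need to distinguish two cases: when t > 1 we have that $\sqrt{(t-1)^{2}} = (t-1)$, and when t < 1 we have $\sqrt{(t-1)^{2}} = -(t-1)$. Hence if t > 1 we have

\[
\begin{aligned}
\phi^{+}(t) &= \frac{1}{\kappa}\left(\frac{(t-1)^{2} - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}\\
&= \frac{-1}{\kappa}\left(\frac{1-(t-1)^{2}}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}\\
&= \frac{-1}{\kappa}\sqrt{1-(t-1)^{2}} + \frac{1}{\kappa}.
\end{aligned}
\]

And if t < 1 we have

\[
\phi^{+}(t) = \frac{1}{\kappa}\left(\frac{-(t-1)^{2} - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}.
\]

We will now calculate $\phi^{-}(t)$.

\[
\begin{aligned}
\phi^{-}(t) &= x_{-}t + \frac{1}{\kappa}\left(1 - \kappa x_{-} - \sqrt{1+\kappa^{2}x_{-}^{2}}\right)\\
&= \frac{-\sqrt{(t-1)^{2}}\,t}{\kappa\sqrt{1-(t-1)^{2}}} + \frac{1}{\kappa}\left(1 + \frac{\kappa\sqrt{(t-1)^{2}}}{\kappa\sqrt{1-(t-1)^{2}}} - \sqrt{1+\frac{(t-1)^{2}}{1-(t-1)^{2}}}\right)\\
&= \frac{-\sqrt{(t-1)^{2}}\,t}{\kappa\sqrt{1-(t-1)^{2}}} + \frac{1}{\kappa}\left(1 + \frac{\sqrt{(t-1)^{2}}}{\sqrt{1-(t-1)^{2}}} - \frac{1}{\sqrt{1-(t-1)^{2}}}\right)\\
&= \frac{1}{\kappa}\left(\frac{-\sqrt{(t-1)^{2}}\,t + \sqrt{(t-1)^{2}} - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}\\
&= \frac{1}{\kappa}\left(\frac{-\sqrt{(t-1)^{2}}\,(t-1) - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}.
\end{aligned}
\]

We again distinguish two cases. For t > 1 we find that

\[
\phi^{-}(t) = \frac{1}{\kappa}\left(\frac{-(t-1)^{2} - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa},
\]

and for t < 1 we have

\[
\begin{aligned}
\phi^{-}(t) &= \frac{1}{\kappa}\left(\frac{(t-1)^{2} - 1}{\sqrt{1-(t-1)^{2}}}\right) + \frac{1}{\kappa}\\
&= \frac{-1}{\kappa}\sqrt{1-(t-1)^{2}} + \frac{1}{\kappa}.
\end{aligned}
\]

If t > 1 we have that $\phi^{+}(t) \geq \phi^{-}(t)$ and for t < 1 we find that $\phi^{-}(t) \geq \phi^{+}(t)$. We also have that $\lim_{t\nearrow 2}\left(\frac{-1}{\kappa}\sqrt{1-(t-1)^{2}} + \frac{1}{\kappa}\right) = \frac{1}{\kappa}$. We can conclude that

\[
\phi(t) =
\begin{cases}
\dfrac{-1}{\kappa}\sqrt{1-(t-1)^{2}} + \dfrac{1}{\kappa}, & \text{if } 0 \leq t \leq 2,\\[1.5ex]
+\infty, & \text{if } t > 2.
\end{cases}
\]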
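The closed form obtained for 0 < t < 2 can be cross-checked against a direct numerical maximisation of $xt - l(x)$. The sketch below does this with a hand-written golden-section search; the function names, the search interval [-100, 100] and the tested values of κ and t are assumptions made for the illustration.

```python
import math

def kappa_loss(x, kappa):
    """Loss function (B.8): l(x) = -u(-x) for the kappa-utility."""
    return (kappa * x + math.sqrt(1.0 + (kappa * x) ** 2) - 1.0) / kappa

def golden_max(f, lo, hi, tol=1e-10):
    """Maximum value of a unimodal function f on [lo, hi] (golden-section search)."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = f(x2)
        else:
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = f(x1)
    return f(0.5 * (lo + hi))

def phi_closed(t, kappa):
    """Closed-form divergence on 0 <= t <= 2."""
    return (1.0 - math.sqrt(1.0 - (t - 1.0) ** 2)) / kappa

def phi_numeric(t, kappa, lo=-100.0, hi=100.0):
    """Numerical Legendre transform sup_x (x * t - l(x)), valid for 0 < t < 2."""
    return golden_max(lambda x: x * t - kappa_loss(x, kappa), lo, hi)

checks = []
for kappa in (0.5, 2.0, 5.0):
    for t in (0.1, 0.5, 1.0, 1.5, 1.9):
        checks.append((kappa, t, phi_closed(t, kappa), phi_numeric(t, kappa)))

for kappa, t, exact, approx in checks:
    print(f"kappa={kappa}, t={t}: closed={exact:.8f}, numeric={approx:.8f}")
```

The objective is concave in x because l is convex, which is what justifies a unimodal search here; for t ≥ 2 the supremum is not attained and this numerical approach would not be appropriate.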

Bibliography

[1] C. Acerbi, D. Tasche, On the coherence of expected shortfall, Journal of Banking and Finance, Vol. 26, Issue 7, 2002, 1487-1503.

[2] C. Acerbi, Spectral measures of risk: A coherent representation of subjective risk aversion, Journal of Banking and Finance, Vol. 26, 2002, 1505-1518.

[3] A. Ahmadi-Javid, Entropic Value-at-Risk: A new coherent risk measure, J Optim Theory Appl, Vol. 155, Issue 3, 2012, 1105-1123.

[4] A. Ben-Tal, A. Ben-Israel, A recourse certainty equivalent for decisions under uncertainty, Annals of Operations Research, Vol. 30, Issue 1, 1991, 1-44.

[5] A. Ben-Tal, M. Teboulle, An old-new concept of convex risk measures: the optimised certainty equivalent, Mathematical Finance, Vol. 17, Issue 3, 2007, 449-476.

[6] J.M. Borwein, A.S. Lewis, Partially finite convex programming, Part I: Quasi relative interiors and duality theory, Mathematical Programming, Vol. 57, 1992, 15-48.

[7] J.M. Borwein, D.R. Luke, Duality and convex programming, Handbook of Mathematical Methods in Imaging, edited by O. Scherzer, Springer, 1992, 229-270.

[8] A. Chen, A. Pelsser, M. Vellekoop, Modelling non-monotone risk aversion using SAHARA utility functions, Journal of Economic Theory, Vol. 146, Issue 5, 2011, 2075-2092.

[9] E.R. Csetnek, Overcoming the failure of classical generalized interior-point regularity conditions in convex optimisation, Logos Verlag Berlin GmbH, 2010.

[10] S. Drapeau, M. Kupper, A. Papapantoleon, A Fourier Approach to the Computation of CV@R and Optimized Certainty Equivalents, Journal of Risk, Vol. 16, 2013, 3-29.

[11] H. Föllmer, A. Schied, Stochastic finance: An introduction in discrete time, De Gruyter, 2010.

[12] H. Föllmer, A. Schied, Convex and coherent risk measures, unpublished paper.

[13] H. Föllmer, A. Schied, Convex measures of risk and trading constraints, Finance and Stochastics, Vol. 6, Issue 4, 2002, 429-447.

[14] G. C. Goodwin, M. M. Seron, J. A. de Doná, Constrained Control and Estimation: An Optimisation Approach, Springer Science and Business Media, 2006.

[15] V. Henderson, D. Hobson, Utility indifference Pricing: an Overview, Volume on Indifference Pricing, 2004.

[16] O. Hernandez-Lerma, J. B. Lasserre, Markov Chains and Invariant Probabilities, Birkhäuser, 2012.

[17] V. Jose, R. Nau, R. Winkler, Scoring Rules, Generalized Entropy and Utility maximisation, Operations Research, Vol. 56, Issue 5, 2008, 1146-1157.

[18] T. Knispel, H. Föllmer, Convex Risk Measures: Basic Facts, Law-invariance and beyond, Asymptotics for Large Portfolios, Handbook of the Fundamentals of Financial Decision Making: In 2 Parts, edited by L. MacLean, W. Ziemba, World Scientific, 2013, 507-555.

[19] H. Levy, Y. Kroll, Ordering Uncertain Options with Borrowing and Lending, The Journal of Finance, Vol. 33, Issue 2, 1978, 553-574.

[20] A. Mas-Colell, M. Whinston, J. Green, Microeconomic theory, Oxford University Press, 1995.

[21] K. Martin, C.T. Ryan, M. Stern, The Slater Conundrum: Duality and Pricing in Infinite Dimensional Optimization, SIAM J. Optim., Vol. 26, Issue 1, 2016, 111-138.

[22] R. Nau, R. Jose, R. Winkler, Duality between maximization of expected utility and minimization of Relative entropy when probabilities are imprecise, Int. Symp. on imprecise probability, 2009.

[23] J. Ponstein, Approaches in the theory of optimisation, Cambridge University Press, 1980.

[24] R. Raskin, M. Cochran, Interpretations and transformations of scale for the Pratt-Arrow absolute risk aversion coefficient: implications for generalized stochastic dominance, Western Journal of Agricultural Economics, Vol. 11, Issue 2, 1986, 204-210.

[25] R.T. Rockafellar, Convex Analysis, Princeton University press, 1970.

[26] L. Rogers, D. Williams Diffusions, markov processes and martingales: Volume 1: Foundations, Cambridge University Press, 2000.

[27] W.Rudin, Principles of Mathematical Analysis , Third edition, Mc.Graw-Hill, 1964.

[28] Y. Syau, A note on convex functions,International J. Math. and Math. Sci., Vol.22, 1998, 525-534.

120 [29] Y.Yamai, T. Yoshiba, Comparative analyses of expected shortfall and value at risk: expected utility maximisation and ., Monetary and Economic Studies, 2002, 95-116.

121