arXiv:cond-mat/0104295v5 [cond-mat.stat-mech] 2 May 2002

On the coherence of Expected Shortfall

Carlo Acerbi∗ Dirk Tasche†

April 19, 2002

Abstract

Expected Shortfall (ES) in several variants has been proposed as remedy for the deficiencies of Value-at-Risk (VaR) which in general is not a coherent risk measure. In fact, most definitions of ES lead to the same results when applied to continuous loss distributions. Differences may appear when the underlying loss distributions have discontinuities. In this case even the coherence property of ES can get lost unless one took care of the details in its definition. We compare some of the definitions of Expected Shortfall, pointing out that there is one which is robust in the sense of yielding a coherent risk measure regardless of the underlying distributions. Moreover, this Expected Shortfall can be estimated effectively even in cases where the usual estimators for VaR fail.

Key words: Expected Shortfall; Risk measure; worst conditional expectation; tail conditional expectation; value-at-risk (VaR); conditional value-at-risk (CVaR); tail mean; coherence; quantile; sub-additivity.

∗ Abaxbank, Corso Monforte 34, 20122 Milano, Italy; E-mail: [email protected]
† Deutsche Bundesbank, Postfach 10 06 02, 60006 Frankfurt, Germany; E-mail: [email protected]
The contents of this paper do not necessarily reflect opinions shared by Deutsche Bundesbank.

1 Introduction

Value-at-Risk (VaR) as a risk measure is heavily criticized for not being sub-additive (see [7] for an overview of the criticism). This means that the risk of a portfolio can be larger than the sum of the stand-alone risks of its components when measured by VaR (cf. [2], [3], [15], or [1]). Hence, managing risk by VaR may fail to stimulate diversification. Moreover, VaR does not take into account the severity of an incurred damage event. As a response to these deficiencies the notion of coherent risk measures was introduced in [2], [3], and [5]. An important example for a risk measure of this kind is the worst conditional expectation (WCE) (cf. Definition 5.2 in [3]). This notion is closely related to the tail conditional expectation (TCE) from Definition 5.1 in [3], but in general does not coincide with it (see section 5 below). Unfortunately, a somewhat misleading formulation in [2] suggests this coincidence to be true. Meanwhile, several authors (e.g. [17], [13], or [1]) proposed modifications to TCE, thereby increasing the confusion, since the relation of these modifications to TCE and WCE remained obscure to a certain degree.

The identification of TCE and WCE is to a certain degree a temptation, though the authors of [3] actually did their best to warn the reader. WCE is in fact coherent but very useful only in a theoretical setting since it requires the knowledge of the whole underlying probability space, while TCE lends itself naturally to practical applications but is not coherent (see Example 5.4 below). The goal to construct a risk measure which is both coherent and easy to compute and to estimate was however achieved in [1]. The definition of Expected Shortfall (ES) at a specified level α in [1] (Definition 2.6 below) is the literal mathematical transcription of the concept “average loss in the worst 100α% cases”. We rely on this definition of Expected Shortfall in the present paper, despite the fact that in the literature this term was already used sometimes in another meaning.

With the paper at hand we strive primarily for making transparent the relations between the notions developed in [5], [3], [13], and [1]. We present four characterizations of Expected Shortfall: as integral of all the quantiles below the corresponding level (eq. (3.3)), as limit in a tail strong law of large numbers (Proposition 4.1), as minimum of a certain functional introduced in [13] (Corollary 4.3 below), and as maximum of WCEs when the underlying probability space varies (Corollary 6.3). This way, we will show that the ES definition in [1] is complementary and even in some aspects superior to the other notions. Moreover, in a certain sense any law invariant coherent risk measure has a representation with ES as the main building block (see [11]).

Some hints on the organization of the paper: In section 2 we give precise mathematical definitions to the five notions to be discussed. These are WCE, TCE, CVaR (conditional value-at-risk), ES, and its negative, the so-called α-tail mean (TM). Section 3 presents useful properties of α-tail mean and ES, namely the integral representation (3.3), continuity and monotonicity in the level α as well as coherence for ES. In section 4 we show first that α-tail mean arises naturally as limit of the average of the 100α% worst cases in a sample. Then we point out that in fact ES and CVaR are two different names for the same object. Section 5 is devoted to inequalities and examples clarifying the relations between ES, TCE, and WCE. In Section 6 we deal with the question how to state a general representation of ES in terms of WCE. Section 7 concludes the paper.

2 Basic definitions

We have to arrange a minimum set of definitions to be consistent with the notions used in [5], [13], and [1]. Fix for this section some real-valued random variable X on a probability space (Ω, A, P). X is considered the random profit or loss of some asset or portfolio. For the purpose of this paper, we are mainly interested in losses, i.e. low values of X. By E[. . .] we will denote expectation with respect to P. Fix also some confidence level α ∈ (0, 1). We will often make use of the indicator function

\[ \mathbf{1}_A(a) = \mathbf{1}_A = \begin{cases} 1, & a \in A \\ 0, & a \notin A . \end{cases} \tag{2.1} \]

Definition 2.1 (Quantiles)
$x_{(\alpha)} = q_\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] \ge \alpha\}$ is the lower α-quantile of X,
$x^{(\alpha)} = q^\alpha(X) = \inf\{x \in \mathbb{R} : P[X \le x] > \alpha\}$ is the upper α-quantile of X.
We use the x-notation if the dependence on X is evident, otherwise the q-notation. □

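Note that $x^{(\alpha)} = \sup\{x \in \mathbb{R} : P[X \le x] \le \alpha\}$. From $\{x \in \mathbb{R} : P[X \le x] > \alpha\} \subset \{x \in \mathbb{R} : P[X \le x] \ge \alpha\}$ it is clear that $x_{(\alpha)} \le x^{(\alpha)}$. Moreover, it is easy to see that

\[ x_{(\alpha)} = x^{(\alpha)} \quad \text{if and only if} \quad P[X \le x] = \alpha \ \text{for at most one } x , \tag{2.2} \]

and in case $x_{(\alpha)} < x^{(\alpha)}$

\[ \{x \in \mathbb{R} : \alpha = P[X \le x]\} = \begin{cases} [x_{(\alpha)}, x^{(\alpha)}) , & P[X = x^{(\alpha)}] > 0 \\ [x_{(\alpha)}, x^{(\alpha)}] , & P[X = x^{(\alpha)}] = 0 . \end{cases} \tag{2.3} \]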

(2.2) and (2.3) explain why it is difficult to say that there is an obvious definition for value-at-risk (VaR). Following [5], we take as $\mathrm{VaR}^\alpha$ the smallest value such that the probability of the absolute loss being at most this value is at least 1 − α. As this is hard to grasp when said in words, here is the formal definition:

Definition 2.2 (Value-at-risk)
$\mathrm{VaR}^\alpha = \mathrm{VaR}^\alpha(X) = -x^{(\alpha)} = q_{1-\alpha}(-X)$ is the value-at-risk at level α of X. □

The definition of tail conditional expectation (TCE) given in [3], Definition 5.1, depends on the choice of quantile taken for VaR (and on some discount factor we neglect here for reasons of simplicity). But as there is a choice for VaR there is also a choice for TCE. That is why we consider a lower and an upper TCE. Denote the positive part of a number x by

\[ x^+ = \begin{cases} x , & x > 0 \\ 0 , & x \le 0 , \end{cases} \]

and its negative part by $x^- = (-x)^+$.

Definition 2.3 (Tail conditional expectations)
Assume $E[X^-] < \infty$. Then $\mathrm{TCE}_\alpha = \mathrm{TCE}_\alpha(X) = -E[X \mid X \le x_{(\alpha)}]$ is the lower tail conditional expectation at level α of X. $\mathrm{TCE}^\alpha = \mathrm{TCE}^\alpha(X) = -E[X \mid X \le x^{(\alpha)}]$ is the upper tail conditional expectation at level α of X. □

$\mathrm{TCE}^\alpha$ is (up to a discount factor) the tail conditional expectation from Definition 5.1 in [3]. “Lower” and “upper” here correspond to the quantiles used for the definitions, but not to the proportion of the quantities. In fact,

\[ \mathrm{TCE}_\alpha \ge \mathrm{TCE}^\alpha \tag{2.4} \]

is obvious.

As Delbaen says in the proof of Theorem 6.10 in [5], $\mathrm{TCE}^\alpha$ in general does not define a sub-additive risk measure (see Example 5.4 below). For this reason, in [3], Definition 5.2, the worst conditional expectation (WCE) was introduced. Here is the definition (up to a discount factor) in our terms:

Definition 2.4 (Worst conditional expectation)
Assume $E[X^-] < \infty$. Then $\mathrm{WCE}_\alpha = \mathrm{WCE}_\alpha(X) = -\inf\{E[X \mid A] : A \in \mathcal{A},\ P[A] > \alpha\}$ is the worst conditional expectation at level α of X. □

Observe that under the assumption $E[X^-] < \infty$ the value of $\mathrm{WCE}_\alpha$ is always finite since then $\lim_{t\to\infty} P[X \le x^{(\alpha)} + t] = 1$ implies that there is some event $A = \{X \le x^{(\alpha)} + t\}$ with $P[A] > \alpha$ and $E[\,|X|\,\mathbf{1}_A] < \infty$. We will see in section 5 that Definition 2.4 has to be treated with care nevertheless because the notation $\mathrm{WCE}_\alpha(X)$ hides the fact that it depends not only on the distribution of X but also on the structure of the underlying probability space. From the definition it is clear that for any random variables X and Y on the same probability space

\[ \mathrm{WCE}_\alpha(X + Y) \le \mathrm{WCE}_\alpha(X) + \mathrm{WCE}_\alpha(Y) , \]

i.e. WCE is sub-additive. Moreover, Proposition 5.1 in [3] says $\mathrm{WCE}_\alpha \ge \mathrm{TCE}^\alpha$. Hence $\mathrm{WCE}_\alpha$ is a majorant to $\mathrm{TCE}^\alpha \ge \mathrm{VaR}^\alpha$. It is in fact the smallest coherent risk measure dominating $\mathrm{VaR}^\alpha$ and only depending on X through its distribution if the underlying probability space is “rich” enough (see Theorem 6.10 in [5] for details).

This is a nice result, but to a certain degree unsatisfactory since the infimum does not seem too handy. This observation might have been the reason for introducing the conditional value-at-risk (CVaR) in [17] (see also the references therein) and [13]. CVaR can be used as a base for very efficient optimization procedures. We quote here, up to the sign of the random variable and the corresponding change from α to 1 − α (cf. Definition 2.2), equation (1.2) from [13].

Definition 2.5 (Conditional value-at-risk)
Assume $E[X^-] < \infty$. Then

\[ \mathrm{CVaR}^\alpha = \mathrm{CVaR}^\alpha(X) = \inf\left\{ \frac{E[(X-s)^-]}{\alpha} - s : s \in \mathbb{R} \right\} \]

is the conditional value-at-risk at level α of X. □

Note that by Proposition 4.2 and (4.9), CVaR is well-defined. But beware: Pflug states in equation (1.3) of [13] (translated to our setting, i.e. −X instead of Y and 1 − α instead of α) the relation $\mathrm{CVaR}^\alpha(X) = \mathrm{TCE}^\alpha(X)$ without any assumption. Corollary 5.3 in connection with Corollary 4.3 shows that this is only true if $P[X < x^{(\alpha)}] = 0$ or $P[X < x^{(\alpha)}] > 0$, $P[X \le x^{(\alpha)}] = \alpha$ (in particular if the distribution of X is continuous).

The last definition we need is that of α-tail mean from [1]. In order to make it comparable to the risk measures defined so far, we define it in two variants: the tail mean, which is likely to be negative but appears in a statistical context (cf. Proposition 4.1 below), and the Expected Shortfall, representing potential loss as an in most cases positive number. The advantage of tail mean is the explicit representation allowing an easy proof of super-additivity (hence sub-additivity for its negative) independent of the distributions of the underlying random variables (cf. the theorem in the appendix of [1]). We will see below (Corollary 4.3) that the Expected Shortfall is in fact identical with CVaR and enjoys properties such as coherence and continuity and monotonicity in the confidence level (section 3). Moreover, it is in a specific sense the largest possible value WCE can take (Corollary 6.3).

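Definition 2.6 (Tail mean and Expected Shortfall)
Assume $E[X^-] < \infty$. Then

\[ \bar{x}_{(\alpha)} = \mathrm{TM}_\alpha(X) = \alpha^{-1} \Bigl( E[X\, \mathbf{1}_{\{X \le x_{(\alpha)}\}}] + x_{(\alpha)} \bigl(\alpha - P[X \le x_{(\alpha)}]\bigr) \Bigr) \]

is the α-tail mean at level α of X. $\mathrm{ES}_\alpha = \mathrm{ES}_\alpha(X) = -\bar{x}_{(\alpha)}$ is the Expected Shortfall (ES) at level α of X. □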

Note that by Corollary 4.3 α-tail mean and ESα only depend on the distribution of X and the level α but not on a particular definition of quantile.
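As a minimal sketch of Definitions 2.1 to 2.6 for a random variable with finitely many values, the following Python code computes the two quantiles, VaR, both TCEs, the tail mean, and ES. The dictionary representation `dist` (value mapped to probability), all function names, and the two-point example (a loss of 100 with probability 4%, level α = 5%) are illustrative assumptions and do not come from the paper.

```python
from fractions import Fraction

def cdf(dist, x):
    """P[X <= x] for a finite distribution given as {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)

def lower_quantile(dist, alpha):
    """x_(alpha) = inf{x : P[X <= x] >= alpha}; attained at an atom."""
    return min(v for v in dist if cdf(dist, v) >= alpha)

def upper_quantile(dist, alpha):
    """x^(alpha) = inf{x : P[X <= x] > alpha}; attained at an atom."""
    return min(v for v in dist if cdf(dist, v) > alpha)

def value_at_risk(dist, alpha):
    """VaR^alpha = -x^(alpha) (Definition 2.2)."""
    return -upper_quantile(dist, alpha)

def tce(dist, alpha, upper=False):
    """Lower (default) or upper tail conditional expectation (Definition 2.3)."""
    q = upper_quantile(dist, alpha) if upper else lower_quantile(dist, alpha)
    tail_expectation = sum(v * p for v, p in dist.items() if v <= q)
    return -tail_expectation / cdf(dist, q)

def tail_mean(dist, alpha):
    """alpha-tail mean (Definition 2.6)."""
    q = lower_quantile(dist, alpha)
    tail_expectation = sum(v * p for v, p in dist.items() if v <= q)
    return (tail_expectation + q * (alpha - cdf(dist, q))) / alpha

def expected_shortfall(dist, alpha):
    """ES_alpha = -(alpha-tail mean)."""
    return -tail_mean(dist, alpha)

# Hypothetical example: lose 100 with probability 4%, otherwise nothing.
loss = {Fraction(-100): Fraction(4, 100), Fraction(0): Fraction(96, 100)}
alpha = Fraction(5, 100)
print(value_at_risk(loss, alpha), tce(loss, alpha), expected_shortfall(loss, alpha))
# prints: 0 4 80
```

In this example VaR ignores the loss entirely, TCE averages over the whole distribution, and ES reports the average loss in the worst 5% of cases. Exact rational arithmetic (`Fraction`) is used so that comparisons with α at the atoms are not disturbed by rounding.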

3 Useful properties of tail mean and Expected Shortfall

The most important property of ES (Definition 2.6) might be its coherence.

Proposition 3.1 (Coherence of ES) Let α ∈ (0, 1) be fixed. Consider a set V of real-valued random variables on some probability space (Ω, A, P) such that E[X−] < ∞ for all X ∈ V . Then

ρ : V → R with ρ(X) =ESα(X) for X ∈ V is a coherent risk measure in the sense of Definition 2.1 in [5], i.e. it is

(i) monotonous: X ∈ V, X ≥ 0 ⇒ ρ(X) ≤ 0,

(ii) sub-additive: X,Y,X + Y ∈ V ⇒ ρ(X + Y ) ≤ ρ(X)+ ρ(Y ),

(iii) positively homogeneous: X ∈ V, h> 0, h X ∈ V ⇒ ρ(h X)= h ρ(X), and

(iv) translation invariant: X ∈ V, a ∈ R ⇒ ρ(X + a)= ρ(X) − a.

Proof. See Proposition A.1 in the Appendix for an elementary proof of (ii). To check (i), (iii) and (iv) is an easy exercise (cf. also Proposition 3.2). □

In the financial industry there is a growing necessity to deal with random variables with discontinuous distributions. Examples are portfolios of not-traded loans (purely discrete distributions) or portfolios containing derivatives (mixtures of continuous and discrete distributions). One problem with measures like VaR, TCE, and WCE, when applied to discontinuous distributions, may be their sensitivity to small changes in the confidence level α. In other words, they are not in general continuous with respect to the confidence level α (see Example 5.4).

In contrast, ESα is continuous with respect to α. Hence, regardless of the underlying distributions, one can be sure that the risk measured by ESα will not change dramatically when there is a switch in the confidence level by, say, some basis points. We are going to derive this insensitivity property in Corollary 3.3 below as a consequence of an alternative representation of tail mean. This integral representation (Proposition 3.2), which was already given in [4] for the case of continuous distributions, might be of interest on its own. Another almost self-evident but important property of ESα is its monotonicity in α: the smaller the level α, the greater the risk. We show this formally in Proposition 3.4.

Proposition 3.2 If X is a real-valued random variable on a probability space (Ω, A, P) with $E[X^-] < \infty$ and α ∈ (0, 1) is fixed, then

\[ \bar{x}_{(\alpha)} = \alpha^{-1} \int_0^\alpha x_{(u)} \, du , \]

with $\bar{x}_{(\alpha)}$ and $x_{(u)}$ as in Definitions 2.6 and 2.1, respectively.

Proof. By switching to another probability space if necessary, we can assume that there is a real random variable U on (Ω, A, P) that is uniformly distributed on (0, 1), i.e. P[U ≤ u] = u, u ∈ (0, 1). It is well-known that then the random variable Z = x(U) has the same distribution as X.

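Since $u \mapsto x_{(u)}$ is non-decreasing we have

\[ \{U \le \alpha\} \subset \{Z \le x_{(\alpha)}\} \tag{3.1} \]

and

\[ \{U > \alpha\} \cap \{Z \le x_{(\alpha)}\} \subset \{Z = x_{(\alpha)}\} . \tag{3.2} \]

By (3.1) and (3.2) we obtain

\[ \begin{aligned} \int_0^\alpha x_{(u)} \, du &= E[Z\, \mathbf{1}_{\{U \le \alpha\}}] \\ &= E[Z\, \mathbf{1}_{\{Z \le x_{(\alpha)}\}}] - E[Z\, \mathbf{1}_{\{U > \alpha\} \cap \{Z \le x_{(\alpha)}\}}] \\ &= E[X\, \mathbf{1}_{\{X \le x_{(\alpha)}\}}] + x_{(\alpha)} \bigl(\alpha - P[X \le x_{(\alpha)}]\bigr) . \end{aligned} \]

Dividing by α now yields the assertion. □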

Note that by definition of Expected Shortfall, Proposition 3.2 implies the representation

\[ \mathrm{ES}_\alpha(X) = -\alpha^{-1} \int_0^\alpha q_u(X) \, du . \tag{3.3} \]

Eq. (3.3) shows that ES is the coherent risk measure used in [11] as main building block for the representation of law invariant coherent risk measures.
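As a worked illustration of the integral representation (our own example, using the two-point variable that reappears in Example 5.4), let X take the value −N with probability p and the value 0 otherwise, and let p < α < 1. The lower quantile function is then a step function and the integral reduces to a one-line computation:

```latex
q_u(X) =
\begin{cases}
  -N , & 0 < u \le p \\
  0 ,  & p < u < 1
\end{cases}
\qquad\Longrightarrow\qquad
\mathrm{ES}_\alpha(X)
  = -\alpha^{-1} \int_0^\alpha q_u(X)\, du
  = -\alpha^{-1} \bigl( -N p + 0 \cdot (\alpha - p) \bigr)
  = \frac{N p}{\alpha} .
```

For α ≤ p the same computation gives ES = N, so the two branches agree at α = p, as the continuity of ES in α requires.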

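Corollary 3.3 If X is a real-valued random variable with $E[X^-] < \infty$, then the mappings $\alpha \mapsto \bar{x}_{(\alpha)}$ and $\alpha \mapsto \mathrm{ES}_\alpha$ are continuous on (0, 1).

Proof. Immediate from Proposition 3.2 and (3.3). □

For some of the results below and in particular the subsequent proposition on monotonicity of the tail mean and ES, a further representation for $\bar{x}_{(\alpha)}$ is useful (cf. Appendix in [1]). Let for x ∈ R

\[ \mathbf{1}^{(\alpha)}_{\{X \le x\}} = \begin{cases} \mathbf{1}_{\{X \le x\}} , & \text{if } P[X = x] = 0 \\ \mathbf{1}_{\{X \le x\}} + \dfrac{\alpha - P[X \le x]}{P[X = x]} \, \mathbf{1}_{\{X = x\}} , & \text{if } P[X = x] > 0 . \end{cases} \tag{3.4} \]

Then a short calculation shows

\[ \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \in [0, 1] , \tag{3.5} \]
\[ E\bigl[\mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\bigr] = \alpha , \quad \text{and} \tag{3.6} \]
\[ \alpha^{-1} E\bigl[X\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\bigr] = \bar{x}_{(\alpha)} . \tag{3.7} \]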

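Proposition 3.4 If X is a real-valued random variable with $E[X^-] < \infty$, then for any α ∈ (0, 1) and any ε > 0 with α + ε < 1 we have the following inequalities:

\[ \bar{x}_{(\alpha+\epsilon)} \ge \bar{x}_{(\alpha)} \quad \text{and} \quad \mathrm{ES}_{\alpha+\epsilon}(X) \le \mathrm{ES}_\alpha(X) . \]

Proof. We adopt the representation (3.7). This yields

\[ \begin{aligned} \bar{x}_{(\alpha+\epsilon)} - \bar{x}_{(\alpha)} &= E\Bigl[X \Bigl( (\alpha+\epsilon)^{-1} \mathbf{1}^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - \alpha^{-1} \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr)\Bigr] \\ &= (\alpha(\alpha+\epsilon))^{-1} E\Bigl[X \Bigl( \alpha\, \mathbf{1}^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon)\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr)\Bigr] \\ &\ge (\alpha(\alpha+\epsilon))^{-1} E\Bigl[x_{(\alpha)} \Bigl( \alpha\, \mathbf{1}^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon)\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr)\Bigr] \\ &= \frac{x_{(\alpha)}}{\alpha(\alpha+\epsilon)} \Bigl( \alpha\, E\bigl[\mathbf{1}^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}}\bigr] - (\alpha+\epsilon)\, E\bigl[\mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\bigr] \Bigr) \\ &= \frac{x_{(\alpha)}}{\alpha(\alpha+\epsilon)} \bigl( \alpha(\alpha+\epsilon) - (\alpha+\epsilon)\alpha \bigr) = 0 . \end{aligned} \]

The inequality is due to the fact that by (3.5)

\[ \alpha\, \mathbf{1}^{(\alpha+\epsilon)}_{\{X \le x_{(\alpha+\epsilon)}\}} - (\alpha+\epsilon)\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \ \begin{cases} \le 0 , & \text{if } X < x_{(\alpha)} \\ \ge 0 , & \text{if } X > x_{(\alpha)} . \end{cases} \]

□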
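The continuity and monotonicity of ES in the level α, and the contrasting jump of VaR, can be checked numerically. The sketch below is an illustration under our own assumptions: a two-point distribution with loss 1 of probability p = 1/4, and a grid of levels from 20% to 30%. The function names are hypothetical.

```python
from fractions import Fraction

def cdf(dist, x):
    """P[X <= x] for a finite distribution given as {value: probability}."""
    return sum(p for v, p in dist.items() if v <= x)

def value_at_risk(dist, alpha):
    """VaR^alpha = -(upper alpha-quantile)."""
    return -min(v for v in dist if cdf(dist, v) > alpha)

def expected_shortfall(dist, alpha):
    """ES_alpha as the negative alpha-tail mean of Definition 2.6."""
    q = min(v for v in dist if cdf(dist, v) >= alpha)  # lower alpha-quantile
    tail = sum(v * p for v, p in dist.items() if v <= q)
    return -(tail + q * (alpha - cdf(dist, q))) / alpha

# Assumed example: loss of 1 with probability 1/4, otherwise 0.
p = Fraction(1, 4)
bond = {Fraction(-1): p, Fraction(0): 1 - p}
levels = [Fraction(k, 100) for k in range(20, 31)]  # 20%, 21%, ..., 30%

es = [expected_shortfall(bond, a) for a in levels]
var = [value_at_risk(bond, a) for a in levels]

# ES never increases in alpha and moves in small steps on this grid,
# while VaR drops from 1 to 0 in a single step at alpha = p.
es_steps = [a - b for a, b in zip(es, es[1:])]
var_steps = [a - b for a, b in zip(var, var[1:])]
print(max(es_steps), max(var_steps))
```

Refining the grid makes the largest ES step arbitrarily small, whereas the VaR step at α = p stays equal to 1.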

4 Motivation for tail mean and Expected Shortfall

Assume that we want to estimate the lower α-quantile $x_{(\alpha)}$ of some random variable X. Let some sample $(X_1, \ldots, X_n)$, drawn from independent copies of X, be given. Denote by $X_{1:n} \le \ldots \le X_{n:n}$ the components of the ordered n-tuple $(X_1, \ldots, X_n)$. Denote by ⌊x⌋ the integer part of the number x ∈ R, hence

\[ \lfloor x \rfloor = \max\{n \in \mathbb{Z} : n \le x\} . \]

Then the order statistic $X_{\lfloor n\alpha \rfloor : n}$ appears as natural estimator for $x_{(\alpha)}$. Nevertheless, it is well known that in case of a non-unique quantile (i.e. $x_{(\alpha)} < x^{(\alpha)}$) the quantity $X_{\lfloor n\alpha \rfloor : n}$ does not converge to $x_{(\alpha)}$. This follows for instance from Theorem 1 in [8] which says that

\[ 1 = P[X_{\lfloor n\alpha \rfloor : n} \le x_{(\alpha)} \ \text{infinitely often}] = P[X_{\lfloor n\alpha \rfloor : n} \ge x^{(\alpha)} \ \text{infinitely often}] . \]

Surprisingly, we get a well-determined limit when we replace the single order statistic by an average over the left tail of the sample. Recall the definition (2.1) of an indicator function.

Proposition 4.1 Let α ∈ (0, 1) be fixed, X a real random variable with $E[X^-] < \infty$ and $(X_1, X_2, \ldots)$ an independent sequence of random variables with the same distribution as X. Then with probability 1

\[ \lim_{n \to \infty} \frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}}{\lfloor n\alpha \rfloor} = \bar{x}_{(\alpha)} . \tag{4.1} \]

If X is integrable, then the convergence in (4.1) holds in $L^1$, too.

Proof. Due to Proposition 3.2, the “with probability 1” part of Proposition 4.1 is essentially a special case of Theorem 3.1 in [18] with $0 = t_0 < \alpha = t_1 < t_2 = 1$, $J(t) = \mathbf{1}_{(0,\alpha]}(t)$, $J_N(t) = \mathbf{1}_{(0, (\lfloor N\alpha \rfloor + 1)/N]}(t)$, $g(t) = F^{-1}(t)$, and $p_1 = p_2 = \infty$. Concerning the $L^1$-convergence note that

\[ \Bigl| \sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n} \Bigr| \le \sum_{i=1}^{n} |X_i| . \]

By the strong law of large numbers $n^{-1} \sum_{i=1}^{n} |X_i|$ converges in $L^1$. This implies uniform integrability for $n^{-1} \sum_{i=1}^{n} |X_i|$ and for $n^{-1} \sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}$. Together with the already proven almost sure convergence this implies the assertion. □

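To see how a direct proof of the almost sure convergence in Proposition 4.1 would work consider the following heuristic computation. Observe first that

\[ \begin{aligned} \frac{\sum_{i=1}^{\lfloor n\alpha \rfloor} X_{i:n}}{\lfloor n\alpha \rfloor} &= \frac{1}{\lfloor n\alpha \rfloor} \Bigl( \sum_{i=1}^{n} X_{i:n}\, \mathbf{1}_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} + \sum_{i=1}^{n} X_{i:n} \bigl( \mathbf{1}_{\{1,\ldots,\lfloor n\alpha \rfloor\}}(i) - \mathbf{1}_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} \bigr) \Bigr) \\ &= \frac{1}{\lfloor n\alpha \rfloor} \Bigl( \sum_{i=1}^{n} X_{i}\, \mathbf{1}_{\{X_{i} \le X_{\lfloor n\alpha \rfloor : n}\}} + X_{\lfloor n\alpha \rfloor : n} \sum_{i=1}^{n} \bigl( \mathbf{1}_{\{1,\ldots,\lfloor n\alpha \rfloor\}}(i) - \mathbf{1}_{\{X_{i:n} \le X_{\lfloor n\alpha \rfloor : n}\}} \bigr) \Bigr) \\ &= \frac{1}{\lfloor n\alpha \rfloor} \Bigl( \sum_{i=1}^{n} X_{i}\, \mathbf{1}_{\{X_{i} \le X_{\lfloor n\alpha \rfloor : n}\}} + X_{\lfloor n\alpha \rfloor : n} \Bigl( \lfloor n\alpha \rfloor - \sum_{i=1}^{n} \mathbf{1}_{\{X_{i} \le X_{\lfloor n\alpha \rfloor : n}\}} \Bigr) \Bigr) . \end{aligned} \tag{4.2} \]

If we now had

\[ \lim_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x_{(\alpha)} \tag{4.3} \]

with probability 1, in connection with $\lim_{n\to\infty} n / \lfloor n\alpha \rfloor = 1/\alpha$ it would be plausible to obtain (4.1). Unfortunately (4.3) is not true in general, but only

\[ \liminf_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x_{(\alpha)} \quad \text{and} \quad \limsup_{n \to \infty} X_{\lfloor n\alpha \rfloor : n} = x^{(\alpha)} . \tag{4.4} \]

Nevertheless the proof could be completed on the basis of (4.2) by using (4.4) together with the Glivenko-Cantelli theorem and Corollary 4.3 below.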
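The estimator in (4.1) is straightforward to implement: sort the sample and average its ⌊nα⌋ smallest values. The sketch below also implements a TCE-type estimator that averages all observations not exceeding the order statistic with index ⌊nα⌋, for comparison. Function names, the random seed, and the simulated two-point loss (100 with probability 4%, level α = 5%) are assumptions chosen for illustration.

```python
import math
import random

def tail_mean_estimate(sample, alpha):
    """Average of the floor(n * alpha) smallest observations, as in (4.1)."""
    k = math.floor(len(sample) * alpha)
    if k < 1:
        raise ValueError("need floor(n * alpha) >= 1")
    return sum(sorted(sample)[:k]) / k

def tce_type_estimate(sample, alpha):
    """Negative average of all observations <= the floor(n*alpha)-th order statistic."""
    k = math.floor(len(sample) * alpha)
    if k < 1:
        raise ValueError("need floor(n * alpha) >= 1")
    threshold = sorted(sample)[k - 1]
    tail = [x for x in sample if x <= threshold]
    return -sum(tail) / len(tail)

# Assumed simulation set-up: X = -100 with probability 0.04, else 0.
random.seed(0)
n, p, alpha, nominal = 100_000, 0.04, 0.05, 100.0
sample = [-nominal if random.random() < p else 0.0 for _ in range(n)]

es_hat = -tail_mean_estimate(sample, alpha)   # should be near N * p / alpha = 80
tce_hat = tce_type_estimate(sample, alpha)    # should be near N * p = 4
print(es_hat, tce_hat)
```

Here the α-quantile is unique (equal to 0), so both estimators settle down, but to different limits: the first approaches ES = Np/α and the second approaches TCE = Np. Note that `math.floor(n * alpha)` is computed in floating point; for levels where n·α is meant to be an integer, passing α as a `fractions.Fraction` avoids rounding surprises.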

Proposition 4.1 validates the interpretation given to α-tail mean in [1] as mean of the worst 100α% cases. This concept, which seems very natural from an insurance or risk management point of view, has so far appeared in the literature in the form of different kinds of conditional expectation beyond VaR, which is a different concept for discrete distributions. “Tail conditional expectation”, “worst conditional expectation”, and “conditional value-at-risk” all bear in their names the fact that they are conditional expected values of the random variable X (note that concerning CVaR, by Corollary 4.3 below this is a misinterpretation). For $\mathrm{TCE}_\alpha$, for instance, the natural estimator is not given by the one analyzed in (4.1) or its negative, but rather by

\[ - \frac{\sum_{i=1}^{n} X_i\, \mathbf{1}_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}}}{\sum_{i=1}^{n} \mathbf{1}_{\{X_i \le X_{\lfloor n\alpha \rfloor : n}\}}} \tag{4.5} \]

which however has problems of convergence in case $x_{(\alpha)} < x^{(\alpha)}$.

This is the reason why we avoid the term “conditional” in our definition of α-tail mean. In fact, it is not very hard to see (cf. Example 5.4 below) that α-tail mean does not admit a general representation in terms of a conditional expectation of X given some event A ∈ σ(X) (i.e. some event only depending on X). Hence it is not possible to give a definition of the type

\[ \bar{x}_{(\alpha)} = E[X \mid A] \quad \text{for some } A \in \sigma(X) , \tag{4.6} \]

unless the event A is chosen in a σ-algebra $\mathcal{A}' \supset \sigma(X)$ on an artificial new probability space (see Corollary 6.2 below).

In order to make visible the coincidence of CVaR and tail mean, the following proposition collects some facts on quantiles which are well-known in probability theory (cf. Exercise 3 in ch. 1 of [9] or Problem 25.9 of [10] for the here cited version):

Proposition 4.2 Let X be a real integrable random variable on some probability space (Ω, A, P). Fix α ∈ (0, 1) and define the function $H_\alpha : \mathbb{R} \to [0, \infty)$ by

\[ H_\alpha(s) = \alpha\, E[(X - s)^+] + (1 - \alpha)\, E[(X - s)^-] . \tag{4.7} \]

Then the function $H_\alpha$ is convex (and hence continuous) with $\lim_{|s| \to \infty} H_\alpha(s) = \infty$. The set $M_\alpha$ of minimizers to $H_\alpha$ is a compact interval, namely

\[ M_\alpha = [x_{(\alpha)}, x^{(\alpha)}] = \{s \in \mathbb{R} : P[X < s] \le \alpha \le P[X \le s]\} . \tag{4.8} \]

Note the following equivalent representations for $H_\alpha$:

\[ H_\alpha(s) = \alpha \Bigl( E[X] + \frac{E[(X - s)^-]}{\alpha} - s \Bigr) \tag{4.9} \]
\[ \phantom{H_\alpha(s)} = \alpha \Bigl( E[X] - \Bigl( \frac{E[X\, \mathbf{1}_{\{X \le s\}}]}{\alpha} + s\, \frac{\alpha - P[X \le s]}{\alpha} \Bigr) \Bigr) . \tag{4.10} \]

From Definitions 2.5 and 2.6 for CVaR and ES, respectively, and by (4.10), in connection with Proposition 4.2, we obtain the following corollary to the proposition.

Corollary 4.3 Let X be a real integrable random variable on some probability space (Ω, A, P) and α ∈ (0, 1) be fixed. Then

\[ \mathrm{ES}_\alpha(X) = \mathrm{CVaR}^\alpha(X) = -\alpha^{-1} \bigl( E[X\, \mathbf{1}_{\{X \le s\}}] + s\, (\alpha - P[X \le s]) \bigr) , \quad s \in [x_{(\alpha)}, x^{(\alpha)}] . \tag{4.11} \]

□

A further representation of ES or CVaR, respectively, as expectation of a suitably modified tail distribution is given in the recent research report [14] (cf. Def. 3 therein).

In Definitions 2.5 and 2.6 only E[X−] < ∞ is required for X. Indeed, this integrability condition would suffice to guarantee (4.11). We formulated Corollary 4.3 with full integrability of X because we wanted to rely on Proposition 4.2 for the proof.
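The coincidence of CVaR and ES can be checked directly for a finite distribution. For such a distribution the function s ↦ E[(X − s)⁻]/α − s is piecewise linear and convex with kinks only at the atoms, so its infimum is attained at an atom; the sketch below relies on this observation. The three-point distribution and the function names are assumptions made for illustration.

```python
from fractions import Fraction

def cvar_objective(dist, alpha, s):
    """E[(X - s)^-] / alpha - s, the expression minimised in Definition 2.5."""
    return sum(p * max(s - v, 0) for v, p in dist.items()) / alpha - s

def cvar(dist, alpha):
    """CVaR^alpha by minimising over the atoms of a finite distribution."""
    return min(cvar_objective(dist, alpha, s) for s in dist)

def expected_shortfall(dist, alpha):
    """ES_alpha via the explicit formula of Definition 2.6."""
    def cdf(x):
        return sum(p for v, p in dist.items() if v <= x)
    q = min(v for v in dist if cdf(v) >= alpha)  # lower alpha-quantile
    tail = sum(v * p for v, p in dist.items() if v <= q)
    return -(tail + q * (alpha - cdf(q))) / alpha

# Hypothetical three-point profit-and-loss distribution.
pnl = {Fraction(-10): Fraction(1, 10),
       Fraction(-2): Fraction(3, 10),
       Fraction(5): Fraction(6, 10)}

for alpha in (Fraction(1, 5), Fraction(2, 5)):
    print(alpha, cvar(pnl, alpha), expected_shortfall(pnl, alpha))
```

At α = 2/5 the quantile is not unique (P[X ≤ −2] = α), and the objective takes the same minimal value at both endpoints −2 and 5 of the quantile interval, in line with (4.8).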

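Note that by a simple calculation one can show that (4.11) is equivalent to

\[ \mathrm{ES}_\alpha(X) = -\alpha^{-1} \bigl( E[X\, \mathbf{1}_{\{X < s\}}] + s\, (\alpha - P[X < s]) \bigr) , \quad s \in [x_{(\alpha)}, x^{(\alpha)}] . \tag{4.12} \]

5 Inequalities and counter-examples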

In this section we compare the Expected Shortfall with the risk measures TCE and WCE defined in section 2. Moreover, we present an example showing that VaR and TCE are not sub-additive in general. By the same example we show that there is not a clear relationship between WCE and lower TCE. We start with a result in the spirit of the Neyman-Pearson lemma.

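Proposition 5.1 Let α ∈ (0, 1) be fixed and X be a real-valued random variable on some probability space (Ω, A, P). Suppose that there is some function f : R → R such that $E[(f \circ X)^-] < \infty$, $f(x) \le f(x_{(\alpha)})$ for $x < x_{(\alpha)}$, and $f(x) \ge f(x_{(\alpha)})$ for $x > x_{(\alpha)}$. Let A ∈ A be an event with P[A] ≥ α and $E[\,|f \circ X|\, \mathbf{1}_A] < \infty$. Then

(i) $\mathrm{TM}_\alpha(f \circ X) \le E[f \circ X \mid A]$,

(ii) $\mathrm{TM}_\alpha(f \circ X) = E[f \circ X \mid A]$ if $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and

\[ P[X < x_{(\alpha)}] = 0 \quad \text{or} \tag{5.1} \]
\[ P[X < x_{(\alpha)}] > 0 , \quad P[\Omega \backslash A \cap \{X < x_{(\alpha)}\}] = 0 , \quad \text{and} \quad P[A] = \alpha , \tag{5.2} \]

(iii) if $f(x) < f(x_{(\alpha)})$ for $x < x_{(\alpha)}$ and $f(x) > f(x_{(\alpha)})$ for $x > x_{(\alpha)}$, then $\mathrm{TM}_\alpha(f \circ X) = E[f \circ X \mid A]$ implies $P[A \cap \{X > x_{(\alpha)}\}] = 0$ and either (5.1) or (5.2).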

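Proof. Note that by assumption

\[ \{X \le x_{(\alpha)}\} \subset \{f \circ X \le f(x_{(\alpha)})\} \quad \text{and} \quad \{X < x_{(\alpha)}\} \supset \{f \circ X < f(x_{(\alpha)})\} . \]

Hence we see from (4.8) that

\[ P[f \circ X \le f(x_{(\alpha)})] \ge \alpha \quad \text{and} \quad P[f \circ X < f(x_{(\alpha)})] \le \alpha , \]

and therefore

\[ q_\alpha(f \circ X) \le f(x_{(\alpha)}) \le q^\alpha(f \circ X) . \tag{5.3} \]

Moreover, the assumption implies

\[ \{f \circ X \le f(x_{(\alpha)})\} \backslash \{X \le x_{(\alpha)}\} \subset \{f \circ X = f(x_{(\alpha)})\} . \tag{5.4} \]

By Corollary 4.3, (5.3), (5.4), (3.6), and (3.7), we can calculate similarly to the proof of Proposition 3.4

\[ \begin{aligned} E[f \circ X \mid A] - \mathrm{TM}_\alpha(f \circ X) &= E\Bigl[ f \circ X \Bigl( P[A]^{-1} \mathbf{1}_A - \alpha^{-1} \mathbf{1}^{(\alpha)}_{\{f \circ X \le f(x_{(\alpha)})\}} \Bigr) \Bigr] \\ &= E\Bigl[ f \circ X \Bigl( P[A]^{-1} \mathbf{1}_A - \alpha^{-1} \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr) \Bigr] \\ &= (\alpha\, P[A])^{-1} \Bigl( f(x_{(\alpha)})\, E\Bigl[ \alpha\, \mathbf{1}_A - P[A]\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr] \\ &\qquad\qquad + E\Bigl[ \bigl(f \circ X - f(x_{(\alpha)})\bigr) \Bigl( \alpha\, \mathbf{1}_A - P[A]\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr) \Bigr] \Bigr) \\ &= (\alpha\, P[A])^{-1} E\Bigl[ \bigl(f \circ X - f(x_{(\alpha)})\bigr) \Bigl( \alpha\, \mathbf{1}_A - P[A]\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr) \Bigr] \\ &\ge 0 . \end{aligned} \tag{5.5} \]

Here, we obtain inequality (5.5) from the assumption on f since

\[ \alpha\, \mathbf{1}_A - P[A]\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \ \begin{cases} \le 0 , & \text{if } X < x_{(\alpha)} \\ \ge 0 , & \text{if } X > x_{(\alpha)} . \end{cases} \tag{5.6} \]

This proves (i). The sufficiency and necessity respectively of the conditions in (ii) and (iii) for equality in (5.5) are easily obtained by careful inspection of (5.5). □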

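Note that the condition

\[ P[A \cap \{X > x_{(\alpha)}\}] = 0 \quad \text{and} \quad P[\Omega \backslash A \cap \{X < x_{(\alpha)}\}] = 0 \tag{5.7} \]

says that, up to sets of probability zero,

\[ \{X < x_{(\alpha)}\} \subset A \subset \{X \le x_{(\alpha)}\} . \tag{5.8} \]

In particular, (5.7) is implied by (5.8).

The proof of Proposition 5.1 is the hardest work in this section. Equipped with its result we are in a position to derive without effort a couple of conclusions pointing out the relations between TCE, WCE and ES. Recall $\mathrm{ES}_\alpha = -\mathrm{TM}_\alpha$.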

Corollary 5.2 Let α ∈ (0, 1) and X a real-valued random variable on some probability space (Ω, A, P) with $E[X^-] < \infty$. Then

\[ \mathrm{TCE}^\alpha(X) \le \mathrm{TCE}_\alpha(X) \le \mathrm{ES}_\alpha(X) , \quad \text{and} \tag{5.9} \]
\[ \mathrm{TCE}^\alpha(X) \le \mathrm{WCE}_\alpha(X) \le \mathrm{ES}_\alpha(X) . \tag{5.10} \]

Proof. The first inequality in (5.9) is obvious (formally it follows from Lemma 5.1 in [16]). The second follows from Proposition 5.1 (i) by setting f(x) = x, $A = \{X \le x_{(\alpha)}\}$, and observing $P[X \le x_{(\alpha)}] \ge \alpha$. The first inequality in (5.10) was proven in Proposition 5.1 of [3]. The second follows again from Proposition 5.1 (i) since all the events in the definition of WCE have probabilities > α. □

The following corollary to Proposition 5.1 presents in particular in (i) a first sufficient condition for WCE and ES to coincide, namely continuity of the distribution of X.

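Corollary 5.3 Let α and X be as in Corollary 5.2. Then

(i) $P[X \le x^{(\alpha)}] = \alpha$, $P[X < x^{(\alpha)}] > 0$ or $P[X \le x^{(\alpha)}, X \ne x_{(\alpha)}] = 0$ if and only if

\[ \mathrm{ES}_\alpha(X) = \mathrm{WCE}_\alpha(X) = \mathrm{TCE}_\alpha(X) = \mathrm{TCE}^\alpha(X) . \tag{5.11} \]

In particular, (5.11) holds if the distribution of X is continuous, i.e. P[X = x] = 0 for all x ∈ R.

(ii) $P[X \le x_{(\alpha)}] = \alpha$ or $P[X < x_{(\alpha)}] = 0$ if and only if $\mathrm{ES}_\alpha(X) = \mathrm{TCE}_\alpha(X)$. □

Proof. Concerning (i) apply Proposition 5.1 (ii) and (iii) with $A = \{X \le x^{(\alpha)}\}$ and Corollary 5.2. In order to obtain (ii) apply Proposition 5.1 (ii) and (iii) with $A = \{X \le x_{(\alpha)}\}$. □

Corollary 5.2 leaves open the relation between $\mathrm{TCE}_\alpha(X)$ and $\mathrm{WCE}_\alpha(X)$. The implication

\[ P[X \le x_{(\alpha)}] > \alpha \ \Rightarrow \ \mathrm{TCE}_\alpha(X) \le \mathrm{WCE}_\alpha(X) \tag{5.12} \]

is obvious. Corollary 5.3 (ii) shows that

\[ P[X \le x_{(\alpha)}] = \alpha \ \Rightarrow \ \mathrm{TCE}_\alpha(X) \ge \mathrm{WCE}_\alpha(X) . \tag{5.13} \]

The following example shows that all the inequalities between TCE, WCE, and ES in (5.9), (5.10), (5.12), and (5.13) can be strict. Moreover, it shows that none of the quantities $-q_\alpha$, $\mathrm{VaR}^\alpha$, $\mathrm{TCE}_\alpha$, or $\mathrm{TCE}^\alpha$ defines a sub-additive risk measure in general.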

Example 5.4

Consider the probability space (Ω, A, P) with Ω = {ω1,ω2,ω3}, A the set of all subsets of Ω and P specified by

\[ P[\{\omega_1\}] = P[\{\omega_2\}] = p , \quad P[\{\omega_3\}] = 1 - 2p , \]

and choose 0 < p < 1/3. Fix some positive number N and let $X_i$, i = 1, 2, be two random variables defined on (Ω, A, P) with

\[ X_i(\omega_j) = \begin{cases} -N , & \text{if } i = j \\ 0 , & \text{otherwise.} \end{cases} \]

Choose α such that 0 < α < 2p. Then it is straightforward to obtain Table 1 with the values of the risk measures interesting to us.

                 p < α < 2p         p = α              p > α
Risk measure     X1,2    X1 + X2    X1,2    X1 + X2    X1,2    X1 + X2
−q_α             0       N          N       N          N       N
VaR^α            0       N          0       N          N       N
TCE^α            Np      N          Np      N          N       N
TCE_α            Np      N          N       N          N       N
WCE_α            N/2     N          N/2     N          N       N
ES_α             Np/α    N          N       N          N       N

Table 1: Values of risk measures for Example 5.4.
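The entries of Table 1 can be reproduced mechanically on the three-point probability space of the example, with WCE obtained by brute-force enumeration of all events. The sketch below is an illustration with assumed concrete numbers (p = 1/4, N = 1, and α = 3/8 for the case p < α < 2p); the list-based representation of Ω and all function names are our own.

```python
from fractions import Fraction
from itertools import combinations

def prob(P, event):
    """P[A] for an event A given as a collection of atom indices."""
    return sum(P[i] for i in event)

def cdf(P, X, x):
    return sum(P[i] for i in range(len(P)) if X[i] <= x)

def lower_q(P, X, alpha):
    return min(x for x in X if cdf(P, X, x) >= alpha)

def upper_q(P, X, alpha):
    return min(x for x in X if cdf(P, X, x) > alpha)

def tce(P, X, q):
    """-E[X | X <= q] for a given quantile q."""
    tail = [i for i in range(len(P)) if X[i] <= q]
    return -sum(P[i] * X[i] for i in tail) / prob(P, tail)

def wce(P, X, alpha):
    """-inf{E[X | A] : P[A] > alpha}, enumerating every event A."""
    omega = range(len(P))
    cond = [sum(P[i] * X[i] for i in A) / prob(P, A)
            for r in range(1, len(P) + 1)
            for A in combinations(omega, r)
            if prob(P, A) > alpha]
    return -min(cond)

def es(P, X, alpha):
    q = lower_q(P, X, alpha)
    tail = sum(P[i] * X[i] for i in range(len(P)) if X[i] <= q)
    return -(tail + q * (alpha - cdf(P, X, q))) / alpha

def risk_measures(P, X, alpha):
    lo, up = lower_q(P, X, alpha), upper_q(P, X, alpha)
    return {"-q": -lo, "VaR": -up,
            "TCE_upper": tce(P, X, up), "TCE_lower": tce(P, X, lo),
            "WCE": wce(P, X, alpha), "ES": es(P, X, alpha)}

# Assumed numbers: p = 1/4 (< 1/3), N = 1, alpha = 3/8 in (p, 2p).
p, alpha, N = Fraction(1, 4), Fraction(3, 8), 1
P = [p, p, 1 - 2 * p]
X1, X2 = [-N, 0, 0], [0, -N, 0]
X_sum = [a + b for a, b in zip(X1, X2)]

single = risk_measures(P, X1, alpha)
portfolio = risk_measures(P, X_sum, alpha)
for name in single:
    print(name, single[name], portfolio[name])
```

With these numbers the single-bond column reads 0, 0, 1/4, 1/4, 1/2, 2/3 and the portfolio column is 1 throughout, matching the first block of Table 1 (Np = 1/4, N/2 = 1/2, Np/α = 2/3). Only ES satisfies ρ(X1 + X2) ≤ ρ(X1) + ρ(X2) with room to spare here; WCE does so with equality.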

In case p < α < 2p we see from Table 1 that

\[ \begin{aligned} -q_\alpha(X_1) - q_\alpha(X_2) &< -q_\alpha(X_1 + X_2) \\ \mathrm{VaR}^\alpha(X_1) + \mathrm{VaR}^\alpha(X_2) &< \mathrm{VaR}^\alpha(X_1 + X_2) \\ \mathrm{TCE}_\alpha(X_1) + \mathrm{TCE}_\alpha(X_2) &< \mathrm{TCE}_\alpha(X_1 + X_2) \\ \mathrm{TCE}^\alpha(X_1) + \mathrm{TCE}^\alpha(X_2) &< \mathrm{TCE}^\alpha(X_1 + X_2) . \end{aligned} \]

These inequalities show that none of the notions $-q_\alpha$, $\mathrm{VaR}^\alpha$, $\mathrm{TCE}_\alpha$, or $\mathrm{TCE}^\alpha$ can be used to define a sub-additive risk measure. In case p < α < 2p we have also

\[ \begin{aligned} \mathrm{TCE}_\alpha(X_1) &< \mathrm{ES}_\alpha(X_1) \\ \mathrm{TCE}^\alpha(X_1) = \mathrm{TCE}_\alpha(X_1) &< \mathrm{WCE}_\alpha(X_1) \\ \mathrm{WCE}_\alpha(X_1) &< \mathrm{ES}_\alpha(X_1) . \end{aligned} \tag{5.14} \]

Hence the second inequalities in (5.9) and (5.10) and the inequality in (5.12) may be strict, as can be the first inequality in (5.10). In case p = α we have from Table 1 that

\[ \mathrm{TCE}^\alpha(X_1) < \mathrm{TCE}_\alpha(X_1) \quad \text{and} \quad \mathrm{TCE}_\alpha(X_1) > \mathrm{WCE}_\alpha(X_1) . \]

Thus, also the first inequality in (5.9) and the inequality in (5.13) can be strict. In particular, we see that there is not any clear relationship between $\mathrm{TCE}_\alpha$ and WCE. Besides the inequalities, from the comparison with the results in the region p > α, we get an example for the fact that all the measures but ES may have discontinuities in α. Moreover, in case p < α we have a stronger version of (5.14), namely

\[ -\inf\{E[X_1 \mid A] : A \in \mathcal{A},\ P[A] \ge \alpha\} < \mathrm{ES}_\alpha(X_1) , \]

which shows that even if one replaces “>” by “≥” in Definition 2.4, strict inequality may appear in the relation between WCE and ES. □

We finally observe that Example 5.4 is not so academic as it may seem at first glance since the $X_i$'s may be thought of as two risky bonds of nominal N with non-overlapping default states $\omega_i$ of probability p.

6 Representing ES in terms of WCE

By Example 5.4 we know that WCE and ES may differ in general. Nevertheless, we are going to show in the last part of the paper that this phenomenon can only occur when the underlying probability space is too “small” in the sense of not allowing a suitable representation of the random variable under consideration as function of a continuous random variable. Moreover, as long as only finitely many random variables are under consideration it is always possible to switch to a “larger” probability space in order to make WCE and ES coincide. Finally, we state a general representation of ES in terms of related WCEs.

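Proposition 6.1 Let X and Y be real-valued random variables on a probability space (Ω, A, P) such that $E[Y^-] < \infty$. Fix some α ∈ (0, 1). Assume that Y is given by $Y = f \circ X$ where f satisfies $f(x) \le f(x_{(\alpha)})$ for $x < x_{(\alpha)}$ and $f(x) \ge f(x_{(\alpha)})$ for $x > x_{(\alpha)}$.

(i) If $P[X \le x_{(\alpha)}] = \alpha$ then

\[ \mathrm{ES}_\alpha(Y) = - \inf_{A \in \mathcal{A},\ P[A] \ge \alpha} E[Y \mid A] . \]

(ii) If the distribution function of X is continuous then also

\[ \mathrm{ES}_\alpha(Y) = \mathrm{WCE}_\alpha(Y) . \]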

Proof. Concerning (i), by Proposition 5.1 (i) we only have to show

\[ \mathrm{TM}_\alpha(Y) = E[Y \mid X \le x_{(\alpha)}] . \tag{6.1} \]

With the choice $A = \{X \le x_{(\alpha)}\}$ this follows from Proposition 5.1 (ii).

Concerning Proposition 6.1 (ii), by (6.1), we have to show that there is a sequence $(A_n)_{n \in \mathbb{N}}$ in A with $P[A_n] > \alpha$ for all n ∈ N such that

\[ \lim_{n \to \infty} E[Y \mid A_n] = E[Y \mid X \le x_{(\alpha)}] . \]

By continuity of the distribution of X and integrability of $Y^-$ we obtain such a sequence with the definition $A_n = \{X \le x^{(\alpha)} + 1/n\}$. □

Corollary 6.2 Let $(X_1, \ldots, X_d)$ be an $\mathbb{R}^d$-valued random vector on a probability space (Ω, A, P) such that $E[X_i^-] < \infty$, i = 1, ..., d. Fix α ∈ (0, 1). Then there is a random vector $(X'_1, \ldots, X'_d)$ on some probability space (Ω′, A′, P′) with the following two properties:

(i) The distributions of $(X_1, \ldots, X_d)$ and $(X'_1, \ldots, X'_d)$ are equal, i.e.

\[ P[X_1 \le x_1, \ldots, X_d \le x_d] = P'[X'_1 \le x_1, \ldots, X'_d \le x_d] \quad \text{for all } (x_1, \ldots, x_d) \in \mathbb{R}^d . \]

(ii) Worst conditional expectation and Expected Shortfall coincide for all i = 1, ..., d, i.e.

\[ \mathrm{WCE}_\alpha(X'_i) = \mathrm{ES}_\alpha(X'_i) , \quad i = 1, \ldots, d . \]

□

Proof. By Sklar’s theorem (cf. Theorem 2.10.9 in [12]) we get the existence of a random vector $(U_1, \ldots, U_d)$ where each $U_i$ is uniformly distributed on (0, 1) such that (i) holds with $X'_i = q_{U_i}(X_i)$, i = 1, ..., d. Since $q_\alpha$ is non-decreasing in α the assertion now follows from Proposition 6.1. □

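Corollary 6.2 yields another proof for the sub-additivity of Expected Shortfall: in order to prove $\mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) \ge \mathrm{ES}_\alpha(X + Y)$, apply the corollary to the underlying random vector (X, Y, X + Y). On the new probability space ES coincides with WCE for all three components, WCE is sub-additive, and ES only depends on the distributions.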

As a final consequence of Corollary 5.2 and Corollary 6.2 we note:

Corollary 6.3 Let X be a real-valued random variable on some probability space (Ω, A, P) with $E[X^-] < \infty$. Fix α ∈ (0, 1). Then

\[ \mathrm{ES}_\alpha(X) = \max\bigl\{ \mathrm{WCE}_\alpha(X') : X' \text{ random variable on } (\Omega', \mathcal{A}', P') \text{ with } P'[X' \le x] = P[X \le x] \text{ for all } x \in \mathbb{R} \bigr\} , \]

where the maximum is taken over all random variables X′ on probability spaces (Ω′, A′, P′) such that the distributions of X and X′ are equal. □

Corollary 6.3 shows that Expected Shortfall in the sense of Definition 2.6 may be considered a robust version of worst conditional expectation (Definition 2.4), making the latter insensitive to the underlying probability space.
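The dependence of WCE on the underlying probability space can be made visible by realising one and the same distribution on finer and finer finite spaces. The sketch below is our own illustration: it takes the distribution of X1 from Example 5.4 with assumed values p = 1/4, N = 1, α = 3/8 and places it on spaces of 4, 8 and 16 equally likely atoms, computing WCE by brute force over all events. The function name and the choice of spaces are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def wce_equiprobable(values, alpha):
    """WCE on a space of len(values) equally likely atoms, by brute force.

    values[i] is the value of the random variable on atom i.
    """
    n = len(values)
    worst = None
    for r in range(1, n + 1):
        if Fraction(r, n) <= alpha:      # events must satisfy P[A] > alpha
            continue
        for event_values in combinations(values, r):
            cond_exp = Fraction(sum(event_values), r)   # E[X | A]
            if worst is None or cond_exp < worst:
                worst = cond_exp
    return -worst

alpha = Fraction(3, 8)
es_value = Fraction(2, 3)   # N * p / alpha for p = 1/4, N = 1

results = {}
for n in (4, 8, 16):
    # A quarter of the atoms carry the loss -1, so P[X = -1] = 1/4 on every space.
    values = [-1] * (n // 4) + [0] * (n - n // 4)
    results[n] = wce_equiprobable(values, alpha)
    print(n, results[n], es_value)
```

On these spaces the worst admissible event consists of all loss atoms plus just enough zero atoms to push its probability above α, which gives WCE = (n/4)/(⌊3n/8⌋ + 1). This value stays below 2/3 for every finite n and approaches 2/3 as n grows, while ES equals 2/3 on all of them. The maximum in Corollary 6.3 is attained only on a sufficiently rich space, such as the one constructed in Corollary 6.2.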

7 Conclusion

In the paper at hand we have shown that simply taking a conditional expectation of losses beyond VaR can fail to yield a coherent risk measure when there are discontinuities in the loss distributions. Already existing definitions for some kind of expected shortfall, redressing this drawback, as those in [3] or [13], did not provide representations suitable for efficient computation and estimation in the general case. We have clarified the relations between these definitions and the explicit one from [1], thereby pointing out that it is the definition which is most appropriate for practical purposes.

A Appendix: Subadditivity of Expected Shortfall

We give here for the sake of completeness the proof of subadditivity for Expected Shortfall which was originally given in Appendix A of [1].

For the proof it is convenient to adopt the representation of eq. (3.7) for the tail mean and write the Expected Shortfall as

\[ \mathrm{ES}_\alpha(X) = -\frac{1}{\alpha}\, E\bigl[X\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}}\bigr] \tag{A.1} \]

with the function $\mathbf{1}^{(\alpha)}_{\{X \le s\}}$ defined in (3.4).

Proposition A.1 (Subadditivity of Expected Shortfall) Given two random variables X and Y with E[X−] < ∞ and E[Y −] < ∞ the following inequality holds:

(A.2) ESα(X + Y ) ≤ ESα(X)+ESα(Y ) for any α ∈ (0, 1]

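Proof. Defining Z = X + Y, we obtain by virtue of (3.6)

\[ \begin{aligned} \alpha \bigl( \mathrm{ES}_\alpha(X) + \mathrm{ES}_\alpha(Y) - \mathrm{ES}_\alpha(Z) \bigr) &= E\Bigl[ Z\, \mathbf{1}^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - X\, \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} - Y\, \mathbf{1}^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \Bigr] \\ &= E\Bigl[ X \Bigl( \mathbf{1}^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr) + Y \Bigl( \mathbf{1}^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - \mathbf{1}^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \Bigr) \Bigr] \\ &\ge x_{(\alpha)}\, E\Bigl[ \mathbf{1}^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \Bigr] + y_{(\alpha)}\, E\Bigl[ \mathbf{1}^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - \mathbf{1}^{(\alpha)}_{\{Y \le y_{(\alpha)}\}} \Bigr] \\ &= x_{(\alpha)} (\alpha - \alpha) + y_{(\alpha)} (\alpha - \alpha) = 0 , \end{aligned} \tag{A.3} \]

which proves the thesis. In the inequality above we used the fact that

\[ \mathbf{1}^{(\alpha)}_{\{Z \le z_{(\alpha)}\}} - \mathbf{1}^{(\alpha)}_{\{X \le x_{(\alpha)}\}} \ \begin{cases} \ge 0 & \text{if } X > x_{(\alpha)} \\ \le 0 & \text{if } X < x_{(\alpha)} , \end{cases} \tag{A.4} \]

which in turn is a consequence of (3.4) and (3.5). □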
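As a numerical sanity check of Proposition A.1 (not a substitute for the proof), one can evaluate ES exactly on randomly generated finite probability spaces and confirm that the sub-additivity inequality is never violated. The sketch below makes its own assumptions: six atoms with random rational weights, integer-valued X and Y, levels α on a 1% grid, and a fixed random seed.

```python
import random
from fractions import Fraction

def expected_shortfall(P, X, alpha):
    """ES_alpha of X on a finite space with atom probabilities P (Definition 2.6)."""
    def cdf(x):
        return sum(p for p, v in zip(P, X) if v <= x)
    q = min(v for v in X if cdf(v) >= alpha)          # lower alpha-quantile
    tail = sum(p * v for p, v in zip(P, X) if v <= q)
    return -(tail + q * (alpha - cdf(q))) / alpha

def random_probabilities(rng, n_atoms):
    """Random strictly positive rational probabilities summing to one."""
    weights = [rng.randint(1, 10) for _ in range(n_atoms)]
    total = sum(weights)
    return [Fraction(w, total) for w in weights]

rng = random.Random(1)
violations = 0
for _ in range(1000):
    P = random_probabilities(rng, 6)
    X = [rng.randint(-20, 20) for _ in P]
    Y = [rng.randint(-20, 20) for _ in P]
    Z = [x + y for x, y in zip(X, Y)]
    alpha = Fraction(rng.randint(1, 99), 100)
    lhs = expected_shortfall(P, Z, alpha)
    rhs = expected_shortfall(P, X, alpha) + expected_shortfall(P, Y, alpha)
    if lhs > rhs:
        violations += 1
print("violations of sub-additivity:", violations)
```

Because all quantities are exact rationals, a reported violation could not be blamed on rounding; the proposition guarantees that the counter stays at zero.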

References

[1] Acerbi, C., Nordio, C., Sirtori, C. (2001) Expected Shortfall as a Tool for Financial Risk Management. Working paper. http://www.gloriamundi.org/var/wps.html

[2] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D. (1997) Thinking coherently. RISK 10(11).

[3] Artzner, P., Delbaen, F., Eber, J.-M., Heath, D. (1999) Coherent measures of risk. Math. Fin. 9(3), 203–228.

[4] Bertsimas, D., Lauprete, G.J., Samarov, A. (2000) Shortfall as a risk measure: properties, optimization and applications. Working paper, Sloan School of Management, MIT, Cambridge.

[5] Delbaen, F. (1998) Coherent risk measures on general probability spaces. Working paper, ETH Zürich. http://www.math.ethz.ch/~delbaen/

[6] Delbaen, F. (2001) Coherent risk measures. Lecture Notes, Scuola Normale Superiore di Pisa.

[7] Embrechts, P. (2000) Extreme Value Theory: Potential and Limitations as an Integrated Risk Management Tool. Working paper, ETH Zürich. http://www.math.ethz.ch/~embrechts/

[8] Feldman, D., Tucker, H.G. (1966) Estimation of non-unique quantiles. Ann. Math. Stat. 37, 451–457.

[9] Ferguson, T.G. (1967) Mathematical Statistics: a Decision Theoretic Approach. Academic Press: London.

[10] Hinderer, K. (1972) Grundbegriffe der Wahrscheinlichkeitstheorie (Fundamental terms of probability theory). Springer: Berlin.

[11] Kusuoka, S. (2001) On law invariant coherent risk measures. In Advances in Mathematical Economics, volume 3, pages 83–95. Springer: Tokyo.

[12] Nelsen, R.B. (1999) An introduction to copulas. Springer: New York.

[13] Pflug, G. (2000) Some remarks on the value-at-risk and the conditional value-at-risk. In, Uryasev, S. (Editor). 2000. Probabilistic Constrained Optimization: Methodology and Applications. Kluwer Academic Publishers. http://www.gloriamundi.org/var/pub.html

[14] Rockafellar, R.T., Uryasev, S. (2001) Conditional value-at-risk for general loss distributions. Research report 2001-5, ISE Dept., University of Florida. http://www.ise.ufl.edu/uryasev/pubs.html

[15] Rootzén, H., Klüppelberg, C. (1999) A single number can’t hedge against economic catastrophes. Ambio 28(6), 550–555. Royal Swedish Academy of Sciences.

[16] Tasche, D. (2000) Conditional expectation as quantile derivative. Working paper, TU München. http://www.ma.tum.de/stat/

[17] Uryasev, S. (2000) Conditional Value-at-Risk: Optimization Algorithms and Applications. Financial Engineering News 2 (3). http://www.gloriamundi.org/var/pub.html

[18] Van Zwet, W.R. (1980) A strong law for linear functions of order statistics. Ann. Probab. 8, 986–990.
