arXiv:2012.04521v1 [math.OC] 8 Dec 2020

MINIMIZING SPECTRAL RISK MEASURES APPLIED TO MARKOV DECISION PROCESSES

NICOLE BÄUERLE AND ALEXANDER GLAUNER

Abstract. We study the minimization of a spectral risk measure of the total discounted cost generated by a Markov Decision Process (MDP) over a finite or infinite planning horizon. The MDP is assumed to have Borel state and action spaces and the cost function may be unbounded above. The optimization problem is split into two minimization problems using an infimum representation for spectral risk measures. We show that the inner minimization problem can be solved as an ordinary MDP on an extended state space and give sufficient conditions under which an optimal policy exists. Regarding the infinite dimensional outer minimization problem, we prove the existence of a solution and derive an algorithm for its numerical approximation. Our results include the special case that the risk measure is Expected Shortfall and the findings in Bäuerle and Ott (2011). As an application, we present a dynamic extension of the classical static optimal reinsurance problem, where an insurance company minimizes its cost of capital.

Key words: Risk-Sensitive Markov Decision Process; Spectral Risk Measure; Expected Shortfall; Dynamic Reinsurance

AMS subject classifications: 90C40, 91G70, 91G05

1. Introduction

In the last decade, there have been various proposals to replace the expectation in the optimization of Markov Decision Processes (MDPs) by risk measures. The idea behind it is to take the risk-sensitivity of the decision maker into account. Using simply the expectation models a risk-neutral decision maker whose optimal policy can sometimes be very risky, for an example see e.g. Bäuerle and Ott (2011). The literature here can be divided into two streams: those papers which apply risk measures recursively and those where the risk measure is applied to the total cost. The recursive approach can be found for example in Ruszczyński (2010); Chu and Zhang (2014); Bäuerle and Glauner (2020). The theory for these models is rather different to the one for the second stream, since in the recursive approach we still get a recursive solution procedure. In this paper, we contribute to the second model class, i.e. the risk measure is applied to the total cost directly. More precisely, we assume that the decision maker aims at minimizing the risk measure applied to the total discounted cost which is generated by a cost process over discrete time, either over a finite time horizon or over an infinite time horizon. The class of risk measures we consider here are so-called spectral risk measures, which form a class of coherent risk measures including the Expected Shortfall or Conditional Value-at-Risk. More precisely, spectral risk measures are mixtures of Expected Shortfall at different levels.

For Expected Shortfall, the problem has already been treated in e.g. Bäuerle and Ott (2011); Chow et al. (2015); Uğurlu (2017). Whereas Bäuerle and Ott (2011) use the representation of Expected Shortfall as the solution of a global optimization problem shown in Rockafellar and Uryasev (2000), where interchanging the resulting two infima from the optimization problem and the representation over a real-valued parameter, see e.g. Pflug and Pichler (2016), yields a two-step solution method, the authors of Chow et al. (2015) use a decomposition approach. Using the recent representation of spectral risk measures as an optimization problem over convex functions involving the convex conjugate in Pichler (2015), we follow a similar approach here. The problem can again be decomposed into an inner and an outer optimization problem. The inner problem is to minimize the expected convex function of the total cost. It can be solved with MDP techniques after a suitable extension of the original state space. Note that already here we get some differences to the Expected Shortfall case. In contrast to the findings in Bäuerle and Ott (2011) who assume bounded cost, or Uğurlu (2017) who assumes L^1 cost, we only require the cost to be bounded from below. No further integrability assumption is necessary here. Moreover, we allow for general Borel state and action spaces and give continuity and compactness conditions under which an optimal policy exists. The major challenge is now the outer optimization problem, since we have to minimize over a function space and the dependence of the value function of the MDP on the functions is involved. However, we are again able to prove the existence of an optimal policy and an optimal function in the representation of the spectral risk measure. Moreover, by approximating the function space in the right way, we are able to reduce the outer optimization problem to a finite dimensional problem with a predetermined error bound. This yields an algorithm for the solution of the original optimization problem. Using an example from optimal reinsurance we show how our results can be applied. Note that for Expected Shortfall, the authors in Chow and Ghavamzadeh (2014); Tamar et al. (2015) have developed gradient-based methods for the numerical computation of the optimal value and policy. For finite state and action spaces, Li et al. (2017) provide an algorithm for quantile minimization of MDPs, which is a similar problem. However, the outer optimization problem for spectral risk measures is much more demanding, since it is infinite dimensional.

The paper is organized as follows: In the next section, we summarize definitions and properties of risk measures and introduce in particular the class of spectral risk measures which we consider here. In Section 3, we introduce the Markov Decision Model and give continuity and compactness assumptions which will later guarantee the existence of optimal policies. At the end of this section we formulate the spectral risk minimization problem of total cost. We also give some interpretations and show relations to other problems. In Section 4, we consider the inner problem with a finite time horizon. The necessary state space extension is explained as well as the recursive solution algorithm. Moreover, the existence of optimal policies is shown.
In the next section we turn to the inner problem with infinite time horizon. We characterize the value function and show how optimal policies are obtained. In Section 6, we discuss our model assumptions. In case the state space is the real line, we show that the restrictive assumption of the continuity of the transition function, which we need in the general model, can be replaced by semicontinuity if some further monotonicity assumptions are satisfied. In Section 7, we finally treat the outer optimization problem and prove the existence of an optimal function in the representation of the spectral risk measure. The next section then deals with the numerical treatment of this problem. We show here that the infinite dimensional optimization problem can be approximated by a finite dimensional one. Last but not least, in the final section we apply our findings to an optimal dynamic reinsurance problem. Problems of this type have been treated in a static setting before, see e.g. Chi and Tan (2013); Cui et al. (2013); Lo (2017); Bäuerle and Glauner (2018), but we will consider them in a dynamic framework for the first time. The aim is to minimize the solvency capital calculated with a spectral risk measure by actively choosing reinsurance contracts for the next period. When the premium for the reinsurance contract is calculated by the expected premium principle, we show that the optimal reinsurance contracts are of stop-loss type.

2. Spectral Risk Measures

Let (Ω, A, P) be a probability space and L^0 = L^0(Ω, A, P) the vector space of real-valued random variables. By L^1 we denote the subspace of integrable random variables and by L^0_{≥0} the subspace which consists of non-negative random variables. We follow the convention of the actuarial literature that positive realizations of random variables represent losses and negative ones gains. Let X ⊆ L^0 be a convex cone. A risk measure is a functional ρ : X → R̄. The following properties are relevant in this paper.

Definition 2.1. A risk measure ρ : X → R̄ is called

a) law-invariant if ρ(X) = ρ(Y) for X, Y with the same distribution.
b) monotone if X ≤ Y implies ρ(X) ≤ ρ(Y).
c) translation invariant if ρ(X + m) = ρ(X) + m for all m ∈ R and X ∈ X.
d) positive homogeneous if ρ(λX) = λρ(X) for all λ ∈ R_+.
e) comonotonic additive if ρ(X + Y) = ρ(X) + ρ(Y) for all comonotonic X, Y.
f) subadditive if ρ(X + Y) ≤ ρ(X) + ρ(Y) for all X, Y.

A risk measure is referred to as monetary if it is monotone and translation invariant. It appears to be consensus in the literature that these two properties are a necessary minimal requirement for any risk measure. Monetary risk measures which are additionally positive homogeneous and subadditive are called coherent. We will focus on the following class of risk measures. Here, F_X(x) = P(X ≤ x), x ∈ R, denotes the distribution function and F_X^{-1}(u) = inf{x ∈ R : F_X(x) ≥ u}, u ∈ [0, 1], the quantile function of a random variable X.

Definition 2.2. An increasing function φ : [0, 1] → R_+ with ∫_0^1 φ(u) du = 1 is called spectrum and the functional ρ_φ : L^0_{≥0} → R̄ with

    ρ_φ(X) = ∫_0^1 F_X^{-1}(u) φ(u) du

is referred to as spectral risk measure.

Spectral risk measures were introduced by Acerbi (2002). They have all the properties listed in Definition 2.1. Properties a)-e) follow directly from respective properties of the quantile function. Verifying subadditivity is more involved, see Dhaene et al. (2000). As part of the proof they showed that spectral risk measures preserve the increasing convex order. Spectral risk measures belong to the larger class of distortion risk measures.

Definition 2.3. An increasing right-continuous function ϕ : [0, 1] → [0, 1] with ϕ(0) = 0 and ϕ(1) = 1 is called distortion function and the functional ρ_ϕ : L^0_{≥0} → R̄ with

    ρ_ϕ(X) = ∫_0^1 F_X^{-1}(u) dϕ(u)

is referred to as distortion risk measure.

In the special case of a spectral risk measure, the distortion function is given by

    ϕ(u) = ∫_0^u φ(s) ds,  u ∈ [0, 1],   (2.1)

and is convex. This also shows that it is no restriction to assume φ to be right-continuous (as the right derivative of a convex function). Conversely, for a convex distortion function without a jump in 1, which implies continuity on [0, 1], one can always find a representation as in (2.1) with φ being a spectrum. Consequently, all distortion risk measures with convex and continuous distortion function are spectral. It has been proven by Dhaene et al. (2000) that the convexity of ϕ is equivalent to ρ_ϕ being subadditive.

Note that ρ_φ is finite on L^1_{≥0} if the spectrum φ is bounded. On L^0_{≥0} the value +∞ is possible. Shapiro (2013) has shown that a finite risk measure on L^1_{≥0} with all the properties in Definition 2.1 is already spectral with bounded spectrum.

Example 2.4. The most widely used spectral risk measure is Expected Shortfall

    ES_α(X) = 1/(1−α) ∫_α^1 F_X^{-1}(u) du,  α ∈ [0, 1).

Its spectrum φ(u) = 1/(1−α) · 1_{[α,1]}(u) is bounded. Especially in optimization, an infimum representation of Expected Shortfall going back to Rockafellar and Uryasev (2000) is very useful:

    ES_α(X) = inf_{q∈R} { q + 1/(1−α) E[(X − q)^+] },  X ∈ L^1_{≥0}.   (2.2)

The infimum is attained at q = F_X^{-1}(α). Henceforth, we assume w.l.o.g. that φ is right-continuous. Then ν([0, t]) = φ(t) defines a Borel measure on [0, 1]. Let us define a further measure µ by dµ/dν(α) = (1 − α). Every spectral risk measure can be expressed as a mixture of Expected Shortfall over different confidence levels, see e.g. Proposition 8.18 in McNeil et al. (2015).

Proposition 2.5. Let ρ_φ be a spectral risk measure. Then µ is a probability measure on [0, 1] and ρ_φ has the representation

    ρ_φ(X) = ∫_0^1 ES_α(X) µ(dα),  X ∈ L^0_{≥0}.

If we allowed taking the supremum over all probability measures µ on the r.h.s., we would obtain the superclass of coherent risk measures, see Kusuoka (2001).

Using Proposition 2.5, the infimum representation (2.2) of Expected Shortfall can be generalized to spectral risk measures.
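As a quick numerical illustration of (2.2) before we state the generalization, the following self-contained Python sketch (our own toy example with lognormal losses; all variable names are illustrative) computes ES_α once as an average of empirical quantiles and once via the infimum representation; the two values agree and the minimizer is approximately F_X^{-1}(α).

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)   # non-negative losses
alpha = 0.95

# ES as the average of the quantile function over [alpha, 1] (Example 2.4).
u = np.linspace(alpha, 1.0, 2_000, endpoint=False)
es_quantile = np.mean(np.quantile(X, u))

# ES via the infimum representation (2.2), minimized over a grid of q.
qs = np.linspace(0.0, np.quantile(X, 0.9999), 500)
vals = np.array([q + np.mean(np.maximum(X - q, 0.0)) / (1 - alpha) for q in qs])
print(es_quantile, vals.min(), qs[vals.argmin()], np.quantile(X, alpha))
```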

Proposition 2.6. Let ρ_φ be a spectral risk measure with bounded spectrum. We denote by G the set of increasing convex functions g : R → R. Then it holds for X ∈ L^0_{≥0}

    ρ_φ(X) = inf_{g∈G} { E[g(X)] + ∫_0^1 g*(φ(u)) du },

where g* is the convex conjugate of g ∈ G.

Proof. For X ∈ L^1_{≥0} the assertion has been proven by Pichler (2015). For non-integrable X ∈ L^0_{≥0} it follows from Proposition 2.5 that

    ρ_φ(X) = ∫_0^1 ES_α(X) µ(dα) ≥ ES_0(X) = E[X] = ∞.

Now let g ∈ G and let U_X ∼ U(0, 1) be the generalized distributional transform of X, i.e. F_X^{-1}(U_X) = X a.s. By the definition of the convex conjugate it holds g(X) + g*(φ(U_X)) ≥ X φ(U_X). Hence, we have

    E[g(X)] + E[g*(φ(U_X))] ≥ E[X φ(U_X)] = E[F_X^{-1}(U_X) φ(U_X)] = ρ_φ(X) = ∞.

Since g ∈ G was arbitrary, the assertion follows. □

Remark 2.7. The proof by Pichler (2015) shows that for X ∈ L^1_{≥0} the infimum is attained in

    g_{φ,X} : R → R,  g_{φ,X}(x) = ∫_0^1 [ F_X^{-1}(α) + 1/(1−α) ( x − F_X^{-1}(α) )^+ ] µ(dα)

with µ from Proposition 2.5, and that the derivative of this function is g'_{φ,X}(x) = φ(F_X(x)) a.e.

3. Markov Decision Model

We consider the following standard Markov Decision Process with general Borel state and action space. The state space E is a Borel space with Borel σ-algebra B(E) and the action space A is a Borel space with Borel σ-algebra B(A). The possible state-action combinations at time n form a measurable subset D_n of E × A such that D_n contains the graph of a measurable mapping E → A. The x-section of D_n,

    D_n(x) = {a ∈ A : (x, a) ∈ D_n},

is the set of admissible actions in state x ∈ E at time n. Note that the sets D_n(x) are non-empty. We assume that the dynamics of the MDP are given by measurable transition functions T_n : D_n × Z → E and depend on disturbances Z_1, Z_2, ..., which are independent random elements on a common probability space (Ω, A, P) with values in a measurable space (Z, Z). When the current state is x_n, the controller chooses action a_n ∈ D_n(x_n) and z_{n+1} is the realization of Z_{n+1}, then the next state is given by

    x_{n+1} = T_n(x_n, a_n, z_{n+1}).
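To fix ideas, this transition mechanism can be simulated directly; the following minimal Python sketch is our own illustration, with user-supplied stand-ins T, policy and sample_Z (these names are not from the paper).

```python
import numpy as np

def simulate_trajectory(x0, T, policy, sample_Z, N, rng):
    """Roll out x_{n+1} = T_n(x_n, a_n, z_{n+1}) for N steps.
    T(n, x, a, z) -> next state, policy(n, x) -> action,
    sample_Z(n, rng) -> disturbance realization."""
    xs, actions = [x0], []
    for n in range(N):
        a = policy(n, xs[-1])
        z = sample_Z(n, rng)
        xs.append(T(n, xs[-1], a, z))
        actions.append(a)
    return xs, actions

# Toy example: controlled random walk x' = x + a + z with z ~ N(0, 1).
rng = np.random.default_rng(1)
xs, _ = simulate_trajectory(
    0.0,
    T=lambda n, x, a, z: x + a + z,
    policy=lambda n, x: -0.5 * x,          # mean-reverting Markov policy
    sample_Z=lambda n, rng: rng.normal(),
    N=10,
    rng=rng,
)
print(xs)
```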

The one-stage cost function c_n : D_n × E → R_+ gives the cost c_n(x, a, x') for choosing action a if the system is in state x at time n and the next state is x'. The terminal cost function c_N : E → R_+ gives the cost c_N(x) if the system terminates in state x. Note that instead of non-negative cost we can equivalently consider cost which is bounded from below. The model data is supposed to have the following continuity and compactness properties.

Assumption 3.1.
(i) The sets D_n(x) are compact and E ∋ x ↦ D_n(x) is upper semicontinuous, i.e. if x_k → x and a_k ∈ D_n(x_k), k ∈ N, then (a_k) has an accumulation point in D_n(x).
(ii) The transition functions T_n are continuous in (x, a).
(iii) The one-stage cost functions c_n and the terminal cost function c_N are lower semicontinuous.

Under a finite planning horizon N ∈ N, we consider the model data for n = 0, ..., N − 1. The decision model is called stationary if D, T, c do not depend on n and the disturbances are identically distributed. If the model is stationary and the terminal cost is zero, we allow for an infinite time horizon N = ∞. For n ∈ N_0 we denote by H_n the set of feasible histories of the decision process up to time n:

    h_n = x_0, if n = 0,  and  h_n = (x_0, a_0, x_1, ..., x_n), if n ≥ 1,

where a_k ∈ D_k(x_k) for k ∈ N_0. In order for the controller's decisions to be implementable, they must be based on the information available at the time of decision making, i.e. be functions of the history of the decision process.

Definition 3.2.
a) A measurable mapping d_n : H_n → A with d_n(h_n) ∈ D_n(x_n) for every h_n ∈ H_n is called decision rule at time n. A finite sequence π = (d_0, ..., d_{N−1}) is called N-stage policy and a sequence π = (d_0, d_1, ...) is called policy.
b) A decision rule at time n is called Markov if it depends on the current state only, i.e. d_n(h_n) = d_n(x_n) for all h_n ∈ H_n. If all decision rules are Markov, the (N-stage) policy is called Markov.
c) An (N-stage) policy π is called stationary if π = (d, ..., d) or π = (d, d, ...), respectively, for some Markov decision rule d.

With Π and Π^M we denote the sets of all policies and Markov policies, respectively. It will be clear from the context whether N-stage or infinite-stage policies are meant. An admissible policy always exists as D_n contains the graph of a measurable mapping. Since risk measures are defined as real-valued mappings of random variables, we will work with a functional representation of the decision process. The law of motion does not need to be specified explicitly. We define for an initial state x_0 ∈ E and a policy π ∈ Π

    X_0^π = x_0,  X_{n+1}^π = T_n(X_n^π, d_n(H_n^π), Z_{n+1}).

Here, the process (H_n^π)_{n∈N_0} denotes the history of the decision process viewed as a random element, i.e.

    H_0^π = x_0,  H_1^π = (X_0^π, d_0(X_0^π), X_1^π),  ...,  H_n^π = (H_{n−1}^π, d_{n−1}(H_{n−1}^π), X_n^π).

Under a Markov policy the recourse on the random history of the decision process is not needed. Even though the model is non-stationary, we will explicitly introduce discounting by a factor β > 0, since for the following state space extension it is relevant whether there is discounting. Otherwise, stationary models with discounting would have to be treated separately. For a finite planning horizon N ∈ N, the total discounted cost generated by a policy π ∈ Π if the initial state is x ∈ E is given by

    C_N^{πx} = Σ_{k=0}^{N−1} β^k c_k(X_k^π, d_k(H_k^π), X_{k+1}^π) + β^N c_N(X_N^π).

If the model is stationary and the planning horizon infinite, the total discounted cost is given by

    C_∞^{πx} = Σ_{k=0}^∞ β^k c(X_k^π, d_k(H_k^π), X_{k+1}^π).

For a generic total cost regardless of the planning horizon we write C^{πx}. Our aim is to find a policy π ∈ Π which attains

    inf_{π∈Π} ρ_φ(C_N^{πx})   (3.1)

or

    inf_{π∈Π} ρ_φ(C_∞^{πx}),   (3.2)

respectively, for a fixed spectral risk measure ρ_φ : L^0_{≥0} → R̄ with φ(1) < ∞, i.e. φ is bounded. We can apply Proposition 2.6 to reformulate the optimization problems (3.1) and (3.2) as

    inf_{π∈Π} ρ_φ(C^{πx}) = inf_{π∈Π} inf_{g∈G} { E[g(C^{πx})] + ∫_0^1 g*(φ(u)) du }
                          = inf_{g∈G} inf_{π∈Π} { E[g(C^{πx})] + ∫_0^1 g*(φ(u)) du }
                          = inf_{g∈G} { inf_{π∈Π} E[g(C^{πx})] + ∫_0^1 g*(φ(u)) du }.   (3.3)

For fixed g ∈ G we will refer to

    inf_{π∈Π} E[g(C^{πx})]   (3.4)

as the inner optimization problem. In the following two sections, we solve (3.4) as an ordinary MDP on an extended state space. If C^{πx} ∈ L^0_{≥0} but not in L^1, then ρ_φ(C^{πx}) = ∞. These policies are not interesting and can be excluded from the optimization.

Since an increasing convex function g : R → R can be viewed as a disutility function, optimality criterion (3.4) means that the expected disutility of the total discounted cost is minimized. If g is strictly increasing, the optimization problem is not changed by applying g^{-1}, i.e. by minimizing the corresponding certainty equivalent g^{-1}(E[g(C^{πx})]). For bounded one-stage cost functions, such problems are solved in Bäuerle and Rieder (2014). The special case of the exponential disutility function g(x) = exp(γx), γ > 0, has been studied first by Howard and Matheson (1972) in a decision model with finite state and action space. The term risk-sensitive MDP goes back to them. The certainty equivalent corresponding to an exponential disutility is the entropic risk measure

    ρ(X) = 1/γ · log E[e^{γX}].

It has been shown by Müller (2007) that an exponential disutility is the only case where the certainty equivalent defines a monetary risk measure apart from the expectation itself (linear disutility).

The concepts of spectral risk measures and expected disutilities (or corresponding certainty equivalents) can be combined to so-called rank-dependent expected disutilities of the form ρ_φ(u(X)), where u is a disutility function. The corresponding certainty equivalent is u^{-1}(ρ_φ(u(X))). In fact, this concept works more generally for distortion risk measures and incorporates both expected disutilities (identity as distortion function) and distortion risk measures (identity as disutility function). The idea is that the expected disutility is calculated w.r.t. a distorted probability instead of the original probability measure. As long as the distortion risk measure is spectral, using a rank-dependent expected disutility instead of ρ_φ leads to structurally the same inner problem as (3.4), only g is replaced by g(u(·)). Our results apply here, too. The certainty equivalent of a rank-dependent expected disutility combining an exponential disutility with a spectral risk measure is itself a convex (but not coherent) risk measure. It has been introduced by Tsanakas and Desli (2003) as the distortion-exponential risk measure.
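To see the decomposition (3.3) in action numerically, the following self-contained Python sketch evaluates the outer objective E[g(C)] + ∫_0^1 g*(φ(u)) du by Monte Carlo for a fixed g; the helper names (outer_objective, conjugate) and the exponential stand-in for the total cost distribution are our own illustration. For the Expected Shortfall spectrum and the optimal g from Remark 2.7, the value recovers ES_α(C).

```python
import numpy as np

def conjugate(y, s_grid, g_vals):
    # Discrete Legendre transform: g*(y) = max_s { s*y - g(s) } over the grid.
    return np.max(s_grid * y - g_vals)

def outer_objective(g_vals, s_grid, cost_samples, phi, n_u=1000):
    """Monte Carlo estimate of E[g(C)] + integral_0^1 g*(phi(u)) du for a
    piecewise linear g given by values g_vals on s_grid; assumes the grid
    covers the support of the cost samples."""
    u = (np.arange(n_u) + 0.5) / n_u                     # midpoint rule on [0, 1]
    integral = np.mean([conjugate(phi(ui), s_grid, g_vals) for ui in u])
    return np.mean(np.interp(cost_samples, s_grid, g_vals)) + integral

alpha = 0.9
phi = lambda u: (u >= alpha) / (1 - alpha)               # ES spectrum, bounded
rng = np.random.default_rng(2)
C = rng.exponential(scale=1.0, size=50_000)              # stand-in total cost
s = np.linspace(0.0, float(C.max()), 201)
q = np.quantile(C, alpha)
g_es = q + np.maximum(s - q, 0.0) / (1 - alpha)          # optimal g (Remark 2.7)
es_direct = q + np.mean(np.maximum(C - q, 0.0)) / (1 - alpha)
print(outer_objective(g_es, s, C, phi), es_direct)       # the two values agree
```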

4. Inner Problem: Finite Planning Horizon

Under a finite planning horizon N ∈ N, we consider the non-stationary version of the decision model and our aim is to solve

    inf_{π∈Π} E[g(C_N^{πx})]   (4.1)

for an arbitrary but fixed increasing convex function g ∈ G. We assume that for all x ∈ E there is at least one policy π s.t. C_N^{πx} ∈ L^1. Problem (4.1) is well-defined since the target function is bounded from below by g(0). W.l.o.g. we assume g ≥ 0. Note that the value +∞ is possible. As the functions g ∈ G are in general non-linear, the optimization problem cannot be solved with MDP techniques directly. This can be overcome by extending the state space to

    Ê = E × R_+ × (0, ∞)

with corresponding Borel σ-algebra, following Bäuerle and Rieder (2014). A generic element of Ê is denoted by (x, s, t). The idea is that s summarizes the cost accumulated so far and that t keeps track of the discounting. The action space A and the admissible state-action combinations D_n, n = 0, ..., N − 1, remain unchanged. Formally, one defines

    D̂_n = {(x, s, t, a) ∈ Ê × A : a ∈ D_n(x)},  n = 0, ..., N − 1,

implying D̂_n(x, s, t) = D_n(x), (x, s, t) ∈ Ê. The transition function on the new state space is given by T̂_n : D̂_n × Z → Ê,

    T̂_n(x, s, t, a, z) = ( T_n(x, a, z), s + t c_n(x, a, T_n(x, a, z)), βt ),  n = 0, ..., N − 1.

Feasible histories of the decision model with extended state space up to time n have the form

    h_n = (x_0, s_0, t_0), if n = 0,  and  h_n = (x_0, s_0, t_0, a_0, x_1, s_1, t_1, a_1, ..., x_n, s_n, t_n), if n ≥ 1,

where a_k ∈ D̂_k(x_k, s_k, t_k), k = 0, ..., N − 1, and the set of such histories is denoted by Ĥ_n. With Π̂ and Π̂^M we denote the sets of history-dependent and Markov policies for the decision model with extended state space. We will write E_{n h_n}^π for a conditional expectation given Ĥ_n = h_n, h_n ∈ Ĥ_n. The value of a policy π ∈ Π̂ at time n = 0, ..., N is defined as

    V_{Nπ}(h_N) = g(s_N + t_N c_N(x_N)),
    V_{nπ}(h_n) = E_{n h_n}^π [ g( s_n + t_n ( Σ_{k=n}^{N−1} β^{k−n} c_k(X_k^π, d_k(H_k^π), X_{k+1}^π) + β^{N−n} c_N(X_N^π) ) ) ],   (4.2)

where h_n ∈ Ĥ_n. The corresponding value functions are

    V_n(h_n) = inf_{π∈Π̂} V_{nπ}(h_n),  h_n ∈ Ĥ_n.   (4.3)

In the end, the quantity of interest is V_0(x, 0, 1), which agrees with the infimal value of the original inner optimization problem (4.1). But how do we get an optimal policy for problem (4.1)? When starting in (x_0, 0, 1) ∈ Ê, a history (x_0, a_0, x_1, a_1, ..., x_N) ∈ H_N of the original decision model uniquely determines the history (x_0, s_0, t_0, a_0, x_1, s_1, t_1, a_1, ..., x_N, s_N, t_N) ∈ Ĥ_N of the decision model with extended state space through

    s_n = Σ_{k=0}^{n−1} β^k c_k(x_k, a_k, x_{k+1})  and  t_n = β^n,  n = 0, ..., N.

Hence, for the initial state (x_0, 0, 1) ∈ Ê, a Markov policy π̂ = (d̂_0, ..., d̂_{N−1}) ∈ Π̂^M with d̂_n : Ê → A, which will turn out to be optimal for (4.3), can be perceived as a history-dependent policy π' = (d_0', ..., d_{N−1}') ∈ Π of the original decision model, since we can find measurable functions d_n' : H_n → A satisfying d_n'(h_n) ∈ D_n(x_n) and

    d_n'(x_0, a_0, x_1, ..., x_n) = d̂_n( x_n, Σ_{k=0}^{n−1} β^k c_k(x_k, a_k, x_{k+1}), β^n ).

Analogously, a history-dependent policy π ∈ Π̂ can be regarded as a history-dependent policy of the original decision model. We can now proceed to deriving an iteration for the policy values (4.2).

Proposition 4.1. The value of a policy π ∈ Π̂ can be calculated recursively for n = 0, ..., N − 1 and h_n ∈ Ĥ_n as

    V_{Nπ}(h_N) = g(s_N + t_N c_N(x_N)),

    V_{nπ}(h_n) = E[ V_{n+1,π}( h_n, d_n(h_n), T̂_n(x_n, s_n, t_n, d_n(h_n), Z_{n+1}) ) ]
                = E[ V_{n+1,π}( h_n, d_n(h_n), T_n(x_n, d_n(h_n), Z_{n+1}), s_n + t_n c_n(x_n, d_n(h_n), T_n(x_n, d_n(h_n), Z_{n+1})), βt_n ) ].

Proof. The proof is by backward induction. At time N there is nothing to show. Now assume the assertion holds for n + 1; then the tower property of conditional expectation yields

    V_{nπ}(h_n) = E_{n h_n}^π [ g( s_n + t_n ( Σ_{k=n}^{N−1} β^{k−n} c_k(X_k^π, d_k(H_k^π), X_{k+1}^π) + β^{N−n} c_N(X_N^π) ) ) ]

    = E_{n h_n}^π [ g( s_n + t_n c_n(x_n, d_n(h_n), T_n(x_n, d_n(h_n), Z_{n+1}))
        + t_n β ( Σ_{k=n+1}^{N−1} β^{k−(n+1)} c_k(X_k^π, d_k(H_k^π), X_{k+1}^π) + β^{N−(n+1)} c_N(X_N^π) ) ) ]

    = E_{n h_n}^π [ E_{n+1, (h_n, d_n(h_n), T̂_n(x_n, s_n, t_n, d_n(h_n), Z_{n+1}))}^π [
        g( s_n + t_n c_n(x_n, d_n(h_n), T_n(x_n, d_n(h_n), Z_{n+1}))
        + t_n β ( Σ_{k=n+1}^{N−1} β^{k−(n+1)} c_k(X_k^π, d_k(H_k^π), X_{k+1}^π) + β^{N−(n+1)} c_N(X_N^π) ) ) ] ]

    = E[ V_{n+1,π}( h_n, d_n(h_n), T̂_n(x_n, s_n, t_n, d_n(h_n), Z_{n+1}) ) ]. □

Remark 4.2. If there is no discounting or if the discounting is included in the non-stationary one-stage cost functions, the second summary variable t is obviously not needed. In the special case that ρ_φ is the Expected Shortfall, one only has to consider the functions g_q(x) = (x − q)^+, q ∈ R. Due to their positive homogeneity in (x, q), it suffices to extend the state space by only one real-valued summary variable even if there is discounting, cf. Bäuerle and Ott (2011).

Let us now consider specifically Markov policies π ∈ Π̂^M. The function space

    M = { v : Ê → R_+ | v is lower semicontinuous, v(x, ·, ·) is increasing for all x ∈ E, v(x, s, t) ≥ g(s) for (x, s, t) ∈ Ê }

turns out to be the set of potential value functions under such policies. In order to simplify the notation, we introduce the usual operators on M. All v ∈ M are non-negative and thus at least quasi-integrable.

Definition 4.3. For v ∈ M and a Markov decision rule d : Ê → A we define

    L_n v(x, s, t, a) = E[ v( T̂_n(x, s, t, a, Z_{n+1}) ) ]
                      = E[ v( T_n(x, a, Z_{n+1}), s + t c_n(x, a, T_n(x, a, Z_{n+1})), βt ) ],  (x, s, t, a) ∈ D̂_n,

    T_{n,d} v(x, s, t) = L_n v(x, s, t, d(x, s, t)),  (x, s, t) ∈ Ê,

    T_n v(x, s, t) = inf_{a∈D_n(x)} L_n v(x, s, t, a),  (x, s, t) ∈ Ê.

Note that the operators are monotone in v. Under a Markov policy π = (d_0, ..., d_{N−1}) ∈ Π̂^M the value iteration can be expressed with the operators. In order to distinguish from the history-dependent case, we denote the Markov value functions with J. Setting J_{Nπ}(x, s, t) = g(s + t c_N(x)), (x, s, t) ∈ Ê, we obtain for n = 0, ..., N − 1 and (x, s, t) ∈ Ê

    J_{nπ}(x, s, t) = E[ J_{n+1,π}( T_n(x, d_n(x), Z_{n+1}), s + t c_n(x, d_n(x), T_n(x, d_n(x), Z_{n+1})), βt ) ]
                    = T_{n,d_n} J_{n+1,π}(x, s, t).

The corresponding Markov value functions are defined for n = 0,...,N as

    J_n(x, s, t) = inf_{π∈Π̂^M} J_{nπ}(x, s, t),  (x, s, t) ∈ Ê.

The next result shows that V_n satisfies a Bellman equation and proves that an optimal policy exists and is Markov.

Theorem 4.4. Let Assumption 3.1 be satisfied. Then, for n = 0, ..., N the value functions V_n only depend on (x_n, s_n, t_n), i.e. V_n(h_n) = J_n(x_n, s_n, t_n) for all h_n ∈ Ĥ_n, lie in M and satisfy the Bellman equation

    J_N(x, s, t) = g(s + t c_N(x)),

    J_n(x, s, t) = T_n J_{n+1}(x, s, t),  (x, s, t) ∈ Ê.

Furthermore, for n = 0, ..., N − 1 there exist Markov decision rules d_n* : Ê → A with T_{n,d_n*} J_{n+1} = T_n J_{n+1}, and every sequence of such minimizers constitutes an optimal policy π* = (d_0*, ..., d_{N−1}*) ∈ Π̂^M for problem (4.3).

Proof. The proof is by backward induction. At time N we have V_N(h_N) = J_N(x_N, s_N, t_N) = g(s_N + t_N c_N(x_N)), h_N ∈ Ĥ_N, which is
• lower semicontinuous, since g is increasing and continuous (as a convex function on R) and c_N is lower semicontinuous,
• increasing in (s_N, t_N), since g is increasing and c_N is non-negative,
• bounded below by g(s_N), since g is increasing and t_N c_N(x_N) ≥ 0,
i.e. in M. Assuming the assertion holds at time n + 1, we have at time n for h_n ∈ Ĥ_n

    V_n(h_n) = inf_{π∈Π̂} V_{nπ}(h_n)
             = inf_{π∈Π̂} E[ V_{n+1,π}( h_n, d_n(h_n), T̂_n(x_n, s_n, t_n, d_n(h_n), Z_{n+1}) ) ]
             ≥ inf_{π∈Π̂} E[ V_{n+1}( h_n, d_n(h_n), T̂_n(x_n, s_n, t_n, d_n(h_n), Z_{n+1}) ) ]
             = inf_{π∈Π̂} E[ J_{n+1}( T̂_n(x_n, s_n, t_n, d_n(h_n), Z_{n+1}) ) ]
             = inf_{a_n∈D_n(x_n)} E[ J_{n+1}( T̂_n(x_n, s_n, t_n, a_n, Z_{n+1}) ) ].   (4.4)

The last equality holds since the minimization does not depend on the entire policy but only on a_n = d_n(h_n). Here, objective and constraint depend on the history of the process only through x_n. Thus, given existence of a minimizing Markov decision rule d_n*, (4.4) equals T_{n,d_n*} J_{n+1}(x_n, s_n, t_n). Again by the induction hypothesis, there exists an optimal Markov policy π* ∈ Π̂^M such that J_{n+1} = J_{n+1,π*}. Hence, we have

    V_n(h_n) ≥ T_{n,d_n*} J_{n+1,π*}(x_n, s_n, t_n) = J_{nπ*}(x_n, s_n, t_n) ≥ J_n(x_n, s_n, t_n) ≥ V_n(h_n).

It remains to show the existence of a minimizing Markov decision rule d_n* and that J_n ∈ M. We want to apply Proposition 2.4.3 of Bäuerle and Rieder (2011). The set-valued mapping Ê ∋ (x, s, t) ↦ D_n(x) is compact-valued and upper semicontinuous. Next, we show that D̂_n ∋ (x, s, t, a) ↦ L_n v(x, s, t, a) is lower semicontinuous for every v ∈ M. Let {(x_k, s_k, t_k, a_k)}_{k∈N} be a convergent sequence in D̂_n with limit (x*, s*, t*, a*) ∈ D̂_n. The mapping

    D̂_n ∋ (x, s, t, a) ↦ v( T_n(x, a, Z_{n+1}(ω)), s + t c_n(x, a, T_n(x, a, Z_{n+1}(ω))), βt )

is lower semicontinuous. Since v ≥ g ≥ 0, we can apply Fatou's Lemma, which yields

    lim inf_{k→∞} L_n v(x_k, s_k, t_k, a_k)
    = lim inf_{k→∞} E[ v( T_n(x_k, a_k, Z_{n+1}), s_k + t_k c_n(x_k, a_k, T_n(x_k, a_k, Z_{n+1})), βt_k ) ]
    ≥ E[ lim inf_{k→∞} v( T_n(x_k, a_k, Z_{n+1}), s_k + t_k c_n(x_k, a_k, T_n(x_k, a_k, Z_{n+1})), βt_k ) ]
    ≥ E[ v( T_n(x*, a*, Z_{n+1}), s* + t* c_n(x*, a*, T_n(x*, a*, Z_{n+1})), βt* ) ]
    = L_n v(x*, s*, t*, a*),

i.e. L_n v is lower semicontinuous. The existence of a minimizing decision rule d_n* and the lower semicontinuity of T_n v now follow from Proposition 2.4.3 of Bäuerle and Rieder (2011). Now fix x ∈ E. The fact that (s, t) ↦ T_n v(x, s, t) is increasing follows as in Theorem 2.4.14 in Bäuerle and Rieder (2011). The inequality T_n v(x, s, t) ≥ g(s), (x, s, t) ∈ Ê, is obvious. Taken together, we have T_n v ∈ M and the proof is complete. □

Remark 4.5. From Theorem 4.4 it follows that the sequence {(x_n, s_n, t_n)}_{n=0}^{N−1} with

    (s_n, t_n) = ( Σ_{k=0}^{n−1} β^k c_k(x_k, a_k, x_{k+1}), β^n )

is a sufficient statistic of the decision model with the original state space in the sense of Hinderer (1970).
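To make the backward induction of Theorem 4.4 concrete, here is a minimal Python sketch for a finite toy model. It exploits that, along trajectories started in (x, 0, 1), t_n = β^n is deterministic, so t is carried by the stage index; the accumulated cost s is discretized on a grid with linear interpolation (a numerical approximation). The toy kernel P, the cost functions and all names are our own assumptions, not the paper's.

```python
import numpy as np

def inner_value_iteration(g, states, actions, P, cost, cN, beta, N, s_grid):
    """Backward induction for the inner problem (4.1) on the extended state
    space. P[x][a] is a list of (prob, next_state) pairs; cost(x, a, x2) the
    one-stage cost; cN the terminal cost. Assumes s_grid covers reachable s."""
    J = np.array([[g(s + beta**N * cN(x)) for s in s_grid] for x in states])
    policy = []
    for n in reversed(range(N)):
        t = beta**n                                   # deterministic t_n
        Q = np.empty((len(states), len(s_grid), len(actions)))
        for ix, x in enumerate(states):
            for ia, a in enumerate(actions):
                for js, s in enumerate(s_grid):
                    val = 0.0
                    for p, x2 in P[ix][ia]:
                        s2 = s + t * cost(x, a, x2)   # updated summary variable
                        val += p * np.interp(s2, s_grid, J[states.index(x2)])
                    Q[ix, js, ia] = val
        policy.insert(0, Q.argmin(axis=2))            # minimizer d_n*(x, s)
        J = Q.min(axis=2)
    return J, policy                                  # J[ix, 0] ~ J_0(x, 0, 1)

# Toy example: two states, two actions, random transitions.
states, actions = [0, 1], [0, 1]
P = {0: {0: [(1.0, 0)], 1: [(0.5, 0), (0.5, 1)]},
     1: {0: [(1.0, 1)], 1: [(0.7, 0), (0.3, 1)]}}
J, pol = inner_value_iteration(
    g=lambda s: s**2, states=states, actions=actions, P=P,
    cost=lambda x, a, x2: 1.0 + 0.5 * x + 0.2 * a,
    cN=lambda x: 0.0, beta=0.9, N=10,
    s_grid=np.linspace(0.0, 20.0, 101))
print(J[0, 0])   # approx. inf_pi E[g(C_10^{pi, x=0})]
```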

5. Inner Problem: Infinite Planning Horizon

In this section, we consider the inner optimization problem (3.4) of the risk-sensitive total cost minimization under an infinite planning horizon. This is reasonable if the terminal period is unknown or if one wants to approximate a model with a large but finite planning horizon. Solving the infinite horizon problem will turn out to be easier since it admits a stationary optimal policy. We study the stationary version of the decision model with no terminal cost, i.e. D, T, c do not depend on n, c_N ≡ 0 and the disturbances are identically distributed. Let Z be a representative of the disturbance distribution. Our aim is to solve

    inf_{π∈Π} E[g(C_∞^{πx})]   (5.1)

for an arbitrary but fixed increasing convex function g ∈ G. As in the previous section, we assume w.l.o.g. that g ≥ 0 and that for all x ∈ E there exists a policy π such that C_∞^{πx} ∈ L^1. The remarks in Section 4 regarding connections to the minimization of (rank-dependent) expected disutilities and corresponding certainty equivalents apply in the infinite horizon case as well.

In order to obtain a value iteration, the state space is extended to Ê = E × R_+ × (0, ∞) as in Section 4. The action space A and the admissible state-action combinations D remain unchanged, i.e. D̂ = {(x, s, t, a) ∈ Ê × A : a ∈ D(x)} and D̂(x, s, t) = D(x), (x, s, t) ∈ Ê. The transition function on the new state space is given by T̂ : D̂ × Z → Ê,

    T̂(x, s, t, a, z) = ( T(x, a, z), s + t c(x, a, T(x, a, z)), βt ).

Since the model with infinite planning horizon will be derived as a limit of the one with finite horizon, the consideration can be restricted to Markov policies π = (d_0, d_1, ...) ∈ Π̂^M due to Theorem 4.4. For the relevant initial state (x_0, 0, 1) ∈ Ê, a Markov policy π ∈ Π̂^M can be perceived as a history-dependent policy of the original decision model, cf. Section 4. When calculating limits, it is more convenient to index the value functions with the distance to the time horizon rather than the point in time. This is also referred to as the forward form of the value iteration and is only possible under Markov policies in a stationary model. There, the two ways of indexing are equivalent. The value of a policy π = (d_0, d_1, ...) ∈ Π̂^M up to a planning horizon N ∈ N is

    J_{0π}(x, s, t) = g(s),
    J_{Nπ}(x, s, t) = E_{0x}[ g( s + t Σ_{k=0}^{N−1} β^k c(X_k^π, d_k(X_k^π, s_k^π, t_k^π), X_{k+1}^π) ) ],

where (X_n^π, s_n^π, t_n^π)_{n∈N_0} is the extended decision process under policy π ∈ Π̂^M with initial state (x, s, t) ∈ Ê. The change of indexing makes it necessary to write the value iteration in terms of the shifted policy ~π = (d_1, d_2, ...) corresponding to π = (d_0, d_1, ...) ∈ Π̂^M:

    J_{Nπ}(x, s, t) = E[ J_{N−1,~π}( T(x, d_0(x, s, t), Z), s + t c(x, d_0(x, s, t), T(x, d_0(x, s, t), Z)), βt ) ]
                    = T_{d_0} J_{N−1,~π}(x, s, t),  (x, s, t) ∈ Ê.   (5.2)

The value function for finite planning horizon N ∈ N is given by

    J_N(x, s, t) = inf_{π∈Π̂^M} J_{Nπ}(x, s, t),  (x, s, t) ∈ Ê,

and satisfies due to Theorem 4.4 the Bellman equation

    J_N(x, s, t) = T J_{N−1}(x, s, t) = T^N g(x, s, t),  (x, s, t) ∈ Ê.


The value of a policy π ∈ Π̂^M under an infinite planning horizon is defined as

    J_{∞π}(x, s, t) = E_{0x}[ g( s + t Σ_{k=0}^∞ β^k c(X_k^π, d_k(X_k^π, s_k^π, t_k^π), X_{k+1}^π) ) ],  (x, s, t) ∈ Ê.

Note that J_{∞π} is well-defined since c ≥ 0 and hence J_{Nπ} is increasing in N. The infinite horizon value function is

    J_∞(x, s, t) = inf_{π∈Π̂^M} J_{∞π}(x, s, t),  (x, s, t) ∈ Ê.   (5.3)

The limit J(x, s, t) = lim_{N→∞} J_N(x, s, t), (x, s, t) ∈ Ê, which again exists since J_N is increasing in N, is referred to as the limit value function. Note that M is closed under pointwise convergence and hence J ∈ M.

Theorem 5.1. Let Assumption 3.1 be satisfied. Then it holds:

a) The infinite horizon value function J_∞ is the smallest fixed point of the Bellman operator T in M and J_∞ = J.
b) There exists a Markov decision rule d* such that T_{d*} J_∞ = T J_∞, and each stationary policy π* = (d*, d*, ...) induced by such a decision rule is optimal for optimization problem (5.3).

Proof. a) First, we show that J_∞ = J. For all N ∈ N we have J_{Nπ} ≥ J_N. Taking the limit N → ∞ we obtain J_{∞π} ≥ J for all policies π ∈ Π̂^M. Thus J_∞ ≥ J.
For the reverse inequality we start with J_{Nπ} ≤ J_{∞π}, which is true for all policies π ∈ Π̂^M due to the fact that c ≥ 0. Taking the infimum over all policies yields J_N ≤ J_∞ and taking the limit N → ∞ we obtain J ≤ J_∞. In total, we have J = J_∞.
Let now v ∈ M be another fixed point of T, i.e. v = T v. Iterating this equality yields v = T^n v for all n ∈ N. Since v ∈ M we have v ≥ g and because of the monotonicity of the Bellman operator we get v = T^n v ≥ T^n g. Letting n → ∞ finally implies v ≥ J = J_∞; thus J_∞ is the smallest fixed point of the Bellman operator.
b) Since J_∞ ∈ M, the existence of a minimizing Markov decision rule follows as in the proof of Theorem 4.4. Furthermore, it holds J_∞(x, s, t) ≥ g(s), (x, s, t) ∈ Ê, since J_∞ ∈ M. Consequently, we have

    J_∞ = lim_{N→∞} T_{d*}^N J_∞ ≥ lim_{N→∞} T_{d*}^N g = lim_{N→∞} J_{Nπ*} = J_{∞π*} ≥ J_∞,

i.e. π* is optimal. The first equality is by part a), the inequality thereafter by the monotonicity of the operator T_{d*} and the second equality by the value iteration (5.2). □
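The fixed-point characterization in Theorem 5.1 a) suggests approximating J_∞ by iterating the Bellman operator starting from g. A hedged Python sketch for a finite toy model follows; since t now genuinely varies, it is discretized as β^k, k = 0, ..., K, and frozen at β^K, which is harmless for β ∈ (0, 1) and large K because the remaining discount weight is negligible. All names, including the kernel format P, are our own illustration (cf. the finite-horizon sketch after Remark 4.5).

```python
import numpy as np

def bellman_fixed_point(g, states, actions, P, cost, beta, s_grid,
                        K=60, tol=1e-8, max_iter=500):
    """Approximate J_infinity = lim_N T^N g by value iteration. Extended
    state (x, s, t) discretized: s on s_grid, t in {beta**0, ..., beta**K};
    beyond index K the weight beta**K is so small that s effectively freezes."""
    t_vals = beta ** np.arange(K + 1)
    # Start with T^0 g, i.e. (x, s, t) -> g(s).
    J = np.array([[[g(s)] * (K + 1) for s in s_grid] for x in states], dtype=float)
    for _ in range(max_iter):
        J_new = np.empty_like(J)
        for ix, x in enumerate(states):
            for js, s in enumerate(s_grid):
                for k, t in enumerate(t_vals):
                    k2 = min(k + 1, K)               # next discount index
                    best = np.inf
                    for a in actions:
                        val = 0.0
                        for p, x2 in P[ix][a]:
                            s2 = s + t * cost(x, a, x2)
                            val += p * np.interp(s2, s_grid, J[states.index(x2), :, k2])
                        best = min(best, val)
                    J_new[ix, js, k] = best
        if np.max(np.abs(J_new - J)) < tol:
            return J_new
        J = J_new
    return J   # J[ix, js, 0] approximates J_infinity(x, s_grid[js], 1)
```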

6. Relaxed Assumptions for Monotone Models

The model has been introduced in Section 3 with a general Borel space as state space. In order to solve the optimization problem in Sections 4 and 5, we needed a continuous transition function despite having a semicontinuous model. This assumption on the transition function can be relaxed to semicontinuity if the state space is the real line and the transition and one-stage cost functions have some form of monotonicity. For notational convenience, we consider the stationary model with no terminal cost under both finite and infinite horizon in this section. We replace Assumption 3.1 by

Assumption 6.1.
(i) The state space is the real line E = R.
(ii) The sets D(x) are compact and R ∋ x ↦ D(x) is upper semicontinuous and decreasing, i.e. D(x) ⊇ D(y) for x ≤ y.
(iii) The transition function T is lower semicontinuous in (x, a) and increasing in x.
(iv) The one-stage cost c(x, a, T(x, a, z)) is lower semicontinuous in (x, a) and increasing in x.

Requiring that the one-stage cost function c is lower semicontinuous in (x, a, x') and increasing in (x, x') is sufficient for Assumption 6.1 (iv) to hold, due to part (iii) of the assumption.

How do the modified continuity assumptions affect the validity of the results in Sections 4 and 5? The only two results that were proven using the continuity of the transition function

T in (x, a), and not only its measurability, are Theorems 4.4 and 5.1. All other statements are unaffected.

Proposition 6.2. The assertions of Theorems 4.4 and 5.1 hold under Assumption 6.1, too. Moreover, the value functions J_n and J_∞ are increasing. The set of potential value functions can therefore be replaced by

    M = { v : Ê → R | v is lower semicontinuous and increasing, v(x, s, t) ≥ g(s) for (x, s, t) ∈ Ê }.

Proof. In Theorem 4.4, the continuity of T is used to show that D̂ ∋ (x, s, t, a) ↦ L v(x, s, t, a) is lower semicontinuous for every v ∈ M. Due to the monotonicity assumptions, the mapping

    D̂ ∋ (x, s, t, a) ↦ v( T(x, a, Z(ω)), s + t c(x, a, T(x, a, Z(ω))), βt )

is lower semicontinuous for every ω ∈ Ω as a composition of an increasing lower semicontinuous function with a lower semicontinuous one. Now, the lower semicontinuity of D̂ ∋ (x, s, t, a) ↦ L v(x, s, t, a) and the existence of a minimizing decision rule follow as in the proof of Theorem 4.4. The fact that T v is increasing for every v ∈ M follows as in Theorem 2.4.14 in Bäuerle and Rieder (2011). In Theorem 5.1, the continuity of T is only used indirectly through Theorem 4.4. Note that J_∞ ∈ M since the pointwise limit of increasing functions remains increasing. □

The monotonicity properties of Assumption 6.1 can be used to construct a convex model.

Lemma 6.3. Let Assumption 6.1 be satisfied, let A be a subset of a real vector space, the admissible state-action combinations D be a convex set, the transition function T be convex in (x, a) and the one-stage cost D ∋ (x, a) ↦ c(x, a, T(x, a, z)) be a convex function for every z ∈ Z. Then the value functions J_n(·, ·, t) and J_∞(·, ·, t) are convex for every t > 0.

Proof. We prove by induction that J_n is convex in (x, s) for n ∈ N_0. Then J_∞ is convex as a pointwise limit of convex functions. For n = 0 we know that J_0(x, s, t) = g(s) is convex in (x, s). Now assume that J_n is convex in (x, s). Recall that J_n is increasing by Proposition 6.2. Hence, for every ω ∈ Ω and t > 0 the function

    (x, s, a) ↦ J_n( T(x, a, Z(ω)), s + t c(x, a, T(x, a, Z(ω))), βt )

is convex as a composition of an increasing convex function with a convex one. By the linearity of expectation, (x, s, a) ↦ L J_n(x, s, t, a) is convex, too, for every t > 0. Now, the convexity of J_{n+1} follows from Proposition 2.4.18 in Bäuerle and Rieder (2011). □

If c is increasing in x', it is sufficient to require that c and T are convex in (x, a). The monotonicity requirements in Assumption 6.1 are only one option. The following alternative is relevant i.a. for the dynamic reinsurance model in Section 9. For a proof see Section 6.1.3 in Glauner (2020).

Corollary 6.4. Change Assumption 6.1 (ii)-(iv) to
(ii') The sets D(x) are compact and R ∋ x ↦ D(x) is upper semicontinuous and increasing.
(iii') T is upper semicontinuous in (x, a) and increasing in x.
(iv') c(x, a, T(x, a, z)) is lower semicontinuous in (x, a) and decreasing in x.

Then, the assertions of Theorems 4.4 and 5.1 still hold with the value functions J_n and J_∞ being decreasing in x and increasing in (s, t). If furthermore A is a subset of a real vector space, D a convex set, T concave in (x, a) and D ∋ (x, a) ↦ c(x, a, T(x, a, z)) convex for every z ∈ Z, then the value functions J_n(·, ·, t) and J_∞(·, ·, t) are convex for every t > 0.

7. Outer Problem: Existence

In this section, we study the existence of a solution to the outer optimization problem (3.3) under both finite and infinite planning horizon. Given a solution of the respective inner problem for every g ∈ G, the two outer problems are essentially the same and therefore treated together. We have assumed in both cases that for all x ∈ E there exists a policy π such that C^{πx} ∈ L^1 and thus ρ_φ(C^{πx}) =: ρ̄ < ∞. Hence, in what follows we can restrict to policies π such that ρ_φ(C^{πx}) ≤ ρ̄. In this case, we can further restrict the set G in the representation of Proposition 2.6.

Lemma 7.1. It is sufficient to consider functions g ∈ G in the representation of Proposition 2.6 which are φ(1)-Lipschitz and satisfy 0 ≤ g(x) ≤ ḡ(x), x ∈ R, where ḡ(x) = φ(1) x^+ + ρ̄. The space of such functions is denoted by G.

Proof. Set C = C^{πx} to simplify the notation and assume that ρ_φ(C) ≤ ρ̄. We know from Remark 2.7 that the optimal g ∈ G corresponding to C is

    g_{φ,C}(x) = ∫_0^1 [ F_C^{-1}(α) + 1/(1−α) ( x − F_C^{-1}(α) )^+ ] µ(dα),  x ∈ R,

with µ from Proposition 2.5. Since C ≥ 0 it follows

    g_{φ,C}(x) ≥ ∫_0^1 F_C^{-1}(α) µ(dα) ≥ 0.

Furthermore, we have

    g_{φ,C}(x) = ∫_0^1 F_C^{-1}(α) µ(dα) + ∫_0^1 1/(1−α) ( x − F_C^{-1}(α) )^+ µ(dα)
              ≤ ∫_0^1 ES_α(C) µ(dα) + x^+ ∫_0^1 1/(1−α) µ(dα)
              = ρ_φ(C) + φ(1) x^+ ≤ ρ̄ + φ(1) x^+ = ḡ(x).

The first inequality uses F_C^{-1}(α) = VaR_α(C) ≤ ES_α(C) and C ≥ 0. The identity

    ∫_0^1 1/(1−α) µ(dα) = φ(1)

is by definition of µ. As a convex function, g_{φ,C} is almost everywhere differentiable with derivative g'_{φ,C}(x) = φ(F_C(x)) ≤ φ(1), cf. Remark 2.7. This establishes the Lipschitz continuity with constant L = φ(1). □

For a fixed policy π ∈ Π̂^M, the optimal solution of the outer problem is already given by Remark 2.7 as

    g_{φ,C_N^{πx}}(s) = ∫_0^1 [ F_{C_N^{πx}}^{-1}(α) + 1/(1−α) ( s − F_{C_N^{πx}}^{-1}(α) )^+ ] µ(dα),  s ∈ R,  N ∈ N ∪ {∞}.

However, we solved the inner problem for arbitrary but fixed g ∈ G. Hence, the optimal policy depends on g and Proposition 2.6 is not helpful. As a first step in ensuring the existence of a solution of the outer problem, we study the dependence of the value functions of the inner problem on g. In order to do so, we need some structure on G.

Lemma 7.2. (G, m) is a compact metric space, where

    m(g_1, g_2) = Σ_{j=1}^∞ 2^{−j} · ( max_{|s|≤j} |g_1(s) − g_2(s)| ) / ( 1 + max_{|s|≤j} |g_1(s) − g_2(s)| )

is the metric of compact convergence.

Proof. Since G ⊆ C(R, R), it suffices to show that G is closed w.r.t. m and to verify the assumptions of the Arzelà-Ascoli theorem. Note that convergence w.r.t. m implies pointwise convergence. Convexity, monotonicity, the common Lipschitz constant φ(1), non-negativity and the pointwise upper bound ḡ are all preserved even under pointwise convergence. Hence, G is closed w.r.t. m. Moreover, G is pointwise bounded and the common Lipschitz constant implies that it is uniformly equicontinuous. □

For clarity, we index the value functions with g. The value functions J_0^g of the finite horizon inner problem and J_∞^g of the infinite horizon inner problem depend semicontinuously on g.

Lemma 7.3. Let Assumption 3.1 be satisfied. Then the functional G × Ê ∋ (g, x, s, t) ↦ J_n^g(x, s, t) is lower semicontinuous for all n = 0, ..., N.

Proof. The proof is by backward induction. At time N we have to verify that J_N^g(x, s, t) = g(s + t c_N(x)) is lower semicontinuous in (g, x, s, t). First, note that G × R_+ ∋ (g, s) ↦ g(s) is continuous, since if (g_k, s_k) → (g, s), then g_k converges in particular pointwise and

    |g_k(s_k) − g(s)| = |g_k(s_k) − g_k(s) + g_k(s) − g(s)| ≤ |g_k(s_k) − g_k(s)| + |g_k(s) − g(s)|

    ≤ φ(1) |s_k − s| + |g_k(s) − g(s)| → 0 as k → ∞.

Now let (g_k, x_k, s_k, t_k) → (g, x, s, t) and define the increasing sequence {c_k}_{k∈N} through c_k = inf_{ℓ≥k} c_N(x_ℓ).
Case 1: {c_k}_{k∈N} is bounded above and therefore convergent with limit ĉ. Then

    ĉ = lim_{k→∞} c_k = lim_{k→∞} inf_{ℓ≥k} c_N(x_ℓ) = lim inf_{k→∞} c_N(x_k) ≥ c_N(x),

since c_N is lower semicontinuous. As the functions {g_k}_{k∈N} and g are all increasing, we get

    lim inf_{k→∞} g_k(s_k + t_k c_N(x_k)) ≥ lim_{k→∞} g_k(s_k + t_k c_k) = g(s + t ĉ) ≥ g(s + t c_N(x)).

Case 2: {c_k}_{k∈N} is unbounded above. Then there exists K ∈ N such that c_k ≥ c_N(x) for all k ≥ K and

    lim inf_{k→∞} g_k(s_k + t_k c_N(x_k)) ≥ lim inf_{k→∞} g_k(s_k + t_k c_k) ≥ lim_{k→∞} g_k(s_k + t_k c_N(x)) = g(s + t c_N(x)).

Now assume the assertion holds for n + 1. By Theorem 4.4 we have at time n

    J_n^g(x, s, t) = inf_{a∈D_n(x)} E[ J_{n+1}^g( T_n(x, a, Z_{n+1}), s + t c_n(x, a, T_n(x, a, Z_{n+1})), βt ) ].

The integrand J_{n+1}^g( T_n(x, a, Z_{n+1}(ω)), s + t c_n(x, a, T_n(x, a, Z_{n+1}(ω))), βt ) is lower semicontinuous in (g, x, s, t, a) for every ω ∈ Ω by the induction hypothesis. Hence, if (g_k, x_k, s_k, t_k, a_k) → (g, x, s, t, a), Fatou's lemma and the monotonicity of expectation yield

    lim inf_{k→∞} E[ J_{n+1}^{g_k}( T_n(x_k, a_k, Z_{n+1}), s_k + t_k c_n(x_k, a_k, T_n(x_k, a_k, Z_{n+1})), βt_k ) ]
    ≥ E[ lim inf_{k→∞} J_{n+1}^{g_k}( T_n(x_k, a_k, Z_{n+1}), s_k + t_k c_n(x_k, a_k, T_n(x_k, a_k, Z_{n+1})), βt_k ) ]
    ≥ E[ J_{n+1}^g( T_n(x, a, Z_{n+1}), s + t c_n(x, a, T_n(x, a, Z_{n+1})), βt ) ],

i.e. (g, x, s, t, a) ↦ L_n J_{n+1}^g(x, s, t, a) is lower semicontinuous. As the set-valued mapping E ∋ x ↦ D_n(x) is compact-valued and upper semicontinuous,

    (g, x, s, t) ↦ J_n^g(x, s, t) = inf_{a∈D_n(x)} L_n J_{n+1}^g(x, s, t, a)

is lower semicontinuous by Proposition 2.4.3 in Bäuerle and Rieder (2011). □

Lemma 7.4. Let Assumption 3.1 be satisfied. Then the functional G × Ê ∋ (g, x, s, t) ↦ J_∞^g(x, s, t) is lower semicontinuous.

Proof. Under an infinite planning horizon, we consider a stationary model and use forward indexing for the value functions J_N^g. They are lower semicontinuous in (g, x, s, t) by Lemma 7.3. Note that the induction basis holds in particular for c_N ≡ 0. Since J_N^g ↑ J_∞^g as N → ∞, the assertion follows from Lemma A.1.4 in Bäuerle and Rieder (2011). □

For initial state x ∈ E and finite planning horizon N ∈ N the outer problem is given by

    inf_{g∈G} { J_0^g(x, 0, 1) + ∫_0^1 g*(φ(u)) du }

and for infinite planning horizon by

    inf_{g∈G} { J_∞^g(x, 0, 1) + ∫_0^1 g*(φ(u)) du }.

In the following, we will only use the semicontinuity of the value functions in g. Hence, we write

    inf_{g∈G} { J(g) + ∫_0^1 g*(φ(u)) du }   (7.1)

for a generic outer problem and suppress initial state and planning horizon.

Theorem 7.5. Under Assumption 3.1 there exists a solution of the outer optimization problem (7.1).

Proof. We want to apply Weierstraß' extreme value theorem. In view of Lemmata 7.2, 7.3 and 7.4 it suffices to show that the functional G ∋ g ↦ ∫_0^1 g*(φ(u)) du is lower semicontinuous. Let {g_k}_{k∈N} ⊆ G be a convergent sequence with limit g ∈ G. It holds for all u ∈ [0, 1]

    lim inf_{k→∞} g_k*(φ(u)) = lim_{k→∞} inf_{ℓ≥k} g_ℓ*(φ(u)) = lim_{k→∞} inf_{ℓ≥k} sup_{s∈R} { φ(u)s − g_ℓ(s) }
    ≥ lim_{k→∞} sup_{s∈R} inf_{ℓ≥k} { φ(u)s − g_ℓ(s) } = sup_{s∈R} lim_{k→∞} inf_{ℓ≥k} { φ(u)s − g_ℓ(s) }
    = sup_{s∈R} { φ(u)s − lim sup_{k→∞} g_k(s) } = sup_{s∈R} { φ(u)s − g(s) } = g*(φ(u)).   (7.2)

The inequality holds generally for the interchange of infimum and supremum, the equality thereafter by Lemma A.1.6 in Bäuerle and Rieder (2011) and the last but one equality since the sequence {g_k}_{k∈N} is in particular pointwise convergent. Moreover, note that for all k ∈ N and u ∈ [0, 1] it holds

    g_k*(φ(u)) = sup_{s∈R} { φ(u)s − g_k(s) } ≥ −g_k(0) ≥ −ḡ(0) > −∞.

Now, Fatou's lemma and (7.2) yield with

    lim inf_{k→∞} ∫_0^1 g_k*(φ(u)) du ≥ ∫_0^1 lim inf_{k→∞} g_k*(φ(u)) du ≥ ∫_0^1 g*(φ(u)) du

the assertion. □
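For completeness, the metric of compact convergence from Lemma 7.2 can be evaluated numerically by truncating the series; the following short sketch (helper names are ours) does this for two admissible candidates and notes the truncation error.

```python
import numpy as np

def metric_m(g1, g2, j_max=30, pts_per_unit=50):
    """Truncated approximation of the metric m from Lemma 7.2 for callables
    g1, g2 on R; the truncation error is at most 2**-j_max."""
    total = 0.0
    for j in range(1, j_max + 1):
        s = np.linspace(-j, j, 2 * j * pts_per_unit + 1)
        d = np.max(np.abs(g1(s) - g2(s)))       # max over |s| <= j (on a grid)
        total += 2.0**-j * d / (1.0 + d)
    return total

# Example: two increasing convex 1-Lipschitz candidates.
print(metric_m(lambda s: np.maximum(s, 0.0), lambda s: np.maximum(s - 1.0, 0.0)))
```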

8. Outer Problem: Numerical Approximation

Now that we know a solution to the outer optimization problem (7.1) exists, this section aims to determine the solution numerically. The idea is to approximate the functions g ∈ G by piecewise linear ones and thereby obtain a finite dimensional optimization problem which can be solved with classical methods of global optimization. We are going to show that the minimal values converge when the approximation is continuously refined, and give an error bound. Regarding the second summand of the objective function (7.1), our method coincides with the Fast Legendre-Fenchel Transform (FLT) algorithm studied i.a. by Corrias (1996).

For unbounded cost C_N^{πx} with N ∈ N ∪ {∞}, π ∈ Π, x ∈ E, the functions g ∈ G would have to be approximated on the whole non-negative real line. This is numerically not feasible.

Assumption 8.1. If N ∈ N, we require additionally to Assumption 3.1 that c is bounded from above by a constant c̄. If N = ∞, we also assume that β ∈ (0, 1).

Consequently, it holds 0 ≤ C_N^{πx} ≤ ĉ for all N ∈ N ∪ {∞}, π ∈ Π and x ∈ E, where we define

    ĉ = Σ_{k=0}^N β^k c̄ for finite planning horizon N ∈ N,  and  ĉ = c̄/(1−β) for infinite planning horizon N = ∞.

The bounded cost allows for a further reduction of the feasible set of the outer problem. On the reduced feasible set, the second summand of the objective function is guaranteed to be finite and easier to calculate. Recall that the convex conjugate of g ∈ G is the R̄-valued function defined by g*(y) = sup_{s∈R} {sy − g(s)}, y ∈ R.

Lemma 8.2.
a) Under Assumption 8.1, a minimizer of the outer optimization problem (7.1) lies in

    Ĝ = { g ∈ G : g(s) = g(0) for s < 0 and g(s) = g(ĉ) + φ(1)(s − ĉ) for s > ĉ }.

b) For g ∈ Ĝ and y ∈ [0, φ(1)] it holds g*(y) = max_{s∈[0,ĉ]} {sy − g(s)} < ∞.

Proof. a) Fix π ∈ Π, x ∈ E and set C = C_N^{πx} to simplify the notation. We know from Remark 2.7 that the optimal g ∈ G corresponding to C is

    g_{φ,C}(s) = ∫_0^1 [ F_C^{-1}(α) + 1/(1−α) ( s − F_C^{-1}(α) )^+ ] µ(dα),  s ∈ R,

with µ from Proposition 2.5. Clearly, it is sufficient to consider functions g ∈ G which are optimal for at least one C = C_N^{πx}. Since 0 ≤ C ≤ ĉ we have 0 ≤ F_C^{-1}(α) ≤ ĉ. Consequently, it holds for s < 0

    g_{φ,C}(s) = ∫_0^1 F_C^{-1}(α) µ(dα) = g_{φ,C}(0).

As a convex function, g_{φ,C} is almost everywhere differentiable with derivative g'_{φ,C}(s) = φ(F_C(s)), cf. Remark 2.7, and for s > ĉ it holds F_C(s) = 1.
b) Let g ∈ Ĝ and y ∈ [0, φ(1)]. For s ≥ ĉ the function s ↦ sy − g(s) = (y − φ(1))s − g(ĉ) + φ(1)ĉ is decreasing and for s ≤ 0 the function s ↦ sy − g(s) = sy − g(0) is increasing. Hence, it suffices to consider the supremum over [0, ĉ]. □

The fact that the supremum of the convex conjugate reduces to the maximum of a continuous function over a compact set opens the door for a numerical approximation with the FLT algorithm. By definition of Ĝ, it is sufficient to approximate the functions g ∈ Ĝ on the interval [0, ĉ]. For the value iteration in Proposition 4.1 and equation (5.2) it may be necessary to evaluate g in some s > ĉ, but here the function is determined as a linear continuation with slope φ(1). On the interval I = [0, ĉ], the metric of compact convergence reduces to the supremum norm ‖·‖_∞. For the piecewise linear approximation we consider equidistant partitions 0 = s_1 < s_2 < · · · < s_m = ĉ, i.e. s_k = (k − 1) ĉ/(m−1), k = 1, ..., m, m ≥ 2. Let us define the mapping

    p_m(g)(s) = g(s_k) + ( g(s_{k+1}) − g(s_k) )/( s_{k+1} − s_k ) · (s − s_k),  s ∈ [s_k, s_{k+1}],  k = 1, ..., m − 1,

which projects a function g ∈ Ĝ to its piecewise linear approximation, and its image Ĝ_m = {p_m(g) : g ∈ Ĝ}. For considering the restriction of the outer optimization problem (7.1) to Ĝ_m it is convenient to define for g ∈ Ĝ

    K_m(g) = J(p_m(g)) + ∫_0^1 p_m(g)*(φ(u)) du  and  K(g) = J(g) + ∫_0^1 g*(φ(u)) du.

Proposition 8.3. It holds

    | inf_{g∈Ĝ} K_m(g) − inf_{g∈Ĝ} K(g) | ≤ sup_{g∈Ĝ} |K_m(g) − K(g)| ≤ 2φ(1) ĉ/(m−1).

Proof. The first inequality is obvious and it remains to prove the second. We have for N ∈ N ∪ {∞}, x ∈ E and g ∈ Ĝ

    |J(p_m(g)) − J(g)| = | inf_{π∈Π} E[p_m(g)(C_N^{πx})] − inf_{π∈Π} E[g(C_N^{πx})] | ≤ sup_{π∈Π} E| p_m(g)(C_N^{πx}) − g(C_N^{πx}) |

    ≤ sup_{s∈I} |p_m(g)(s) − g(s)|.

Moreover, it holds for y ∈ [0, φ(1)]

    |p_m(g)*(y) − g*(y)| = | sup_{s∈I} {sy − g(s)} − sup_{s∈I} {sy − p_m(g)(s)} | ≤ sup_{s∈I} |p_m(g)(s) − g(s)|.

Finally, the assertion follows with

    |K_m(g) − K(g)| ≤ |J(p_m(g)) − J(g)| + ∫_0^1 |p_m(g)*(φ(u)) − g*(φ(u))| du ≤ 2 sup_{s∈I} |p_m(g)(s) − g(s)|
    = 2 max_{k=1,...,m−1} max_{s∈[s_k,s_{k+1}]} | g(s) − g(s_k) − ( g(s_{k+1}) − g(s_k) )/( s_{k+1} − s_k ) · (s − s_k) |

    ≤ 2 max_{k=1,...,m−1} |g(s_{k+1}) − g(s_k)| ≤ 2φ(1) ĉ/(m−1). □

The proposition shows that the infimum of K_m converges to the one of K. The error of restricting the outer problem (7.1) to Ĝ_m is bounded by 2φ(1) ĉ/(m−1). The piecewise linear functions g ∈ Ĝ_m are uniquely determined by their values in the kinks s_1, ..., s_m. Hence, we can identify Ĝ_m with the compact set

    Γ_m = { (y_1, ..., y_m) ∈ R^m : y_1 ∈ I,  0 ≤ (y_2 − y_1)/(s_2 − s_1) ≤ · · · ≤ (y_m − y_{m−1})/(s_m − s_{m−1}) ≤ φ(1) }.
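In an implementation, a point y ∈ Γ_m can be validated and turned into the corresponding function g_y ∈ Ĝ_m (constant left of 0, slope φ(1) right of ĉ) as in the following sketch; the function names are our own.

```python
import numpy as np

def in_Gamma_m(y, s, phi1, tol=1e-12):
    """Check membership in Gamma_m: y_1 in I and increasing slopes in [0, phi(1)]."""
    slopes = np.diff(y) / np.diff(s)
    return (slopes[0] >= -tol and np.all(np.diff(slopes) >= -tol)
            and slopes[-1] <= phi1 + tol and 0.0 <= y[0] <= s[-1])

def make_g(y, s, phi1):
    """g_y on the grid s (numpy arrays), continued constant below 0 and with
    slope phi(1) above c_hat = s[-1], matching the definition of G_hat."""
    def g(x):
        x = np.asarray(x, dtype=float)
        inner = np.interp(np.clip(x, s[0], s[-1]), s, y)
        return inner + phi1 * np.maximum(x - s[-1], 0.0)
    return g
```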

Note that due to translation invariance of ρ_φ it holds under Assumption 8.1 for g ∈ Ĝ that g(0) ≤ ḡ(0) = ρ̄ ≤ ρ_φ(ĉ) = ĉ. Thus, the outer problem (7.1) restricted to Ĝ_m becomes finite dimensional:

    inf_{y∈Γ_m} { J(g_y) + ∫_0^1 g_y*(φ(u)) du },   (8.1)

where g_y ∈ Ĝ_m is the piecewise linear function induced by y ∈ Γ_m, i.e.

    g_y(s) = y_k + ( y_{k+1} − y_k )/( s_{k+1} − s_k ) · (s − s_k),  s ∈ [s_k, s_{k+1}],  k = 1, ..., m − 1.

How to evaluate J(·) in g_y, y ∈ Γ_m, has been discussed in Sections 4 and 5. The next lemma simplifies the evaluation of the second summand of the objective function (8.1) to calculating the integrals ∫_{u_k}^{u_{k+1}} φ(u) du, where u_0 = 0, u_k = φ^{-1}( (y_{k+1} − y_k)/(s_{k+1} − s_k) ), k = 1, ..., m − 1, and u_m = 1.

Lemma 8.4. The convex conjugate of g_y, y ∈ Γ_m, in ξ ∈ [0, φ(1)] is given by

    g_y*(ξ) = −y_1,  for 0 ≤ ξ < (y_2 − y_1)/(s_2 − s_1),
    g_y*(ξ) = s_{k+1} ξ − y_{k+1},  for (y_{k+1} − y_k)/(s_{k+1} − s_k) ≤ ξ ≤ (y_{k+2} − y_{k+1})/(s_{k+2} − s_{k+1}),  k = 1, ..., m − 2,
    g_y*(ξ) = s_m ξ − y_m,  for (y_m − y_{m−1})/(s_m − s_{m−1}) < ξ ≤ φ(1).

Proof. By Lemma 8.2 b) we have g_y*(ξ) = max_{s∈I} {sξ − g_y(s)}. Note that the slopes c_k = (y_{k+1} − y_k)/(s_{k+1} − s_k), k = 1, ..., m − 1, are increasing. It follows

    g_y*(ξ) = sup_{s∈[0,ĉ]} {sξ − g_y(s)} = max_{k=1,...,m−1} max_{s∈[s_k,s_{k+1}]} { s(ξ − c_k) − y_k + c_k s_k }.

Let us distinguish three cases. Firstly, assume ξ ∈ [c_ℓ, c_{ℓ+1}] for some ℓ ∈ {1, ..., m − 2}. Then

    g_y*(ξ) = max{ max_{k=1,...,ℓ} { s_{k+1}(ξ − c_k) − y_k + c_k s_k }, max_{k=ℓ+1,...,m−1} { s_k(ξ − c_k) − y_k + c_k s_k } }
            = max{ max_{k=1,...,ℓ} { s_{k+1} ξ − y_{k+1} }, max_{k=ℓ+1,...,m−1} { s_k ξ − y_k } }
            = s_{ℓ+1} ξ − y_{ℓ+1}.

The last equality holds since c_1 ≤ · · · ≤ c_{m−1}, and c_ℓ ≤ ξ ≤ c_{ℓ+1} is equivalent to ξ s_ℓ − y_ℓ ≤ ξ s_{ℓ+1} − y_{ℓ+1} ≥ ξ s_{ℓ+2} − y_{ℓ+2}. Secondly, assume ξ < c_1. Then

    g_y*(ξ) = max_{k=1,...,m−1} { s_k(ξ − c_k) − y_k + c_k s_k } = max_{k=1,...,m−1} { s_k ξ − y_k } = s_1 ξ − y_1 = −y_1.

Again, ξ < c_1 is equivalent to ξ s_2 − y_2 < ξ s_1 − y_1. Since c_1 ≤ · · · ≤ c_{m−1}, this implies the last equality. The third case, c_{m−1} < ξ, is analogous. □

The results of this section can be used to set up an algorithm for the problems in (3.1) and (3.2). First, we have to set m = ⌈2φ(1)ĉ/ε⌉ + 1 when we want to solve the problem with error estimate ε. Then choose y^0 ∈ Γ_m and solve the inner problem with g_{y^0}. Use a global optimization procedure, e.g. simulated annealing, to determine the next iterate y^1 (note that we do not have convexity of (8.1) in y), and iterate in order to determine the optimal value of (8.1).
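A hedged end-to-end sketch of this algorithm might look as follows: solve_inner is an assumed callable returning the inner value inf_π E[g_y(C^{πx})] (e.g. via the backward induction sketched after Remark 4.5), the conjugate is evaluated with Lemma 8.4, and plain random search stands in for a proper simulated annealing scheme; all names are ours.

```python
import numpy as np

def conjugate_vals(y, s, xi):
    """g_y*(xi) per Lemma 8.4 for numpy arrays y, s (0-based indexing): with
    slopes c = diff(y)/diff(s), the active piece at xi is the largest index j
    with c_j <= xi, and g_y*(xi) = s[j]*xi - y[j]; j = 0 gives -y[0] as s[0] = 0."""
    c = np.diff(y) / np.diff(s)
    j = np.searchsorted(c, xi, side="right")
    return s[j] * xi - y[j]

def outer_value(y, s, phi, solve_inner, n_u=2000):
    # Objective (8.1): inner MDP value plus the conjugate integral (midpoint rule).
    u = (np.arange(n_u) + 0.5) / n_u
    return solve_inner(y, s) + np.mean(conjugate_vals(y, s, phi(u)))

def random_search(s, phi, phi1, solve_inner, n_iter=200, seed=0):
    """Naive global search over Gamma_m, a stand-in for simulated annealing:
    sorted slopes in [0, phi(1)] guarantee increasing convex candidates g_y."""
    rng = np.random.default_rng(seed)
    best_y, best_val = None, np.inf
    for _ in range(n_iter):
        slopes = np.sort(rng.uniform(0.0, phi1, len(s) - 1))
        y = rng.uniform(0.0, s[-1]) + np.concatenate(([0.0], np.cumsum(slopes * np.diff(s))))
        val = outer_value(y, s, phi, solve_inner)
        if val < best_val:
            best_y, best_val = y, val
    return best_y, best_val
```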

9. Dynamic Optimal Reinsurance

As an application, we present a dynamic extension in discrete time of the static optimal reinsurance problem

    min_{f∈F} r_CoC · ρ( f(Y) + π_R(f) ).   (9.1)

In this setting, an insurance company incurs an aggregate loss Y ∈ L^1_{≥0} at the end of a fixed period due to insurance claims. In order to reduce its risk, the insurer may cede a portion of it to a reinsurance company and retain only f(Y). Here, the reinsurance treaty f determines the retained loss f(Y(ω)) in each scenario ω ∈ Ω. For the risk transfer, the insurer has to compensate the reinsurer with a reinsurance premium π_R(f) = π_R(Y − f(Y)), where π_R : L^1_{≥0} → R̄ is a premium principle with properties similar to a risk measure. Most widely used is the expected premium principle π_R(X) = (1 + θ) E[X] with safety loading θ > 0. In order to preclude moral hazard, it is standard in the actuarial literature to assume that both f and the ceded loss function id_{R_+} − f are increasing. Hence, the set of admissible retained loss functions is

    F = { f : R_+ → R_+ | f(t) ≤ t for all t ∈ R_+, f increasing, id_{R_+} − f increasing }.

The insurer's target is to minimize its cost of solvency capital, which is calculated as the cost of capital rate r_CoC ∈ (0, 1] times the solvency capital requirement determined by applying the risk measure ρ to the insurer's effective risk after reinsurance.

First research on the optimal reinsurance problem (9.1) dates back to the 1960s. Borch (1960) proved that a stop-loss reinsurance treaty minimizes the variance of the retained loss of the insurer, given the premium is calculated with the expected value principle. A similar result has been derived in Arrow (1963), where the expected utility of terminal wealth of the insurer has been maximized. Since then, a lot of generalizations of this problem have been considered. For a comprehensive literature overview, we refer to Albrecher et al. (2017). Since the 2000s, Expected Shortfall has become of special interest. Chi and Tan (2013) identified layer reinsurance contracts as optimal for Expected Shortfall under general premium principles. Their results were extended to general distortion risk measures by Cui et al. (2013). Other generalizations concerned additional constraints, see e.g. Lo (2017), or multidimensional settings induced by a macroeconomic perspective, see Bäuerle and Glauner (2018). We are not aware of any dynamic generalizations in the literature. Reinsurance treaties are typically written for one year, cf. Albrecher et al. (2017). Hence, it is appropriate to model such an extension in discrete time. The insurer's annual surplus has the dynamics

$$X_0 = x, \qquad X_{n+1} = X_n + Z_{n+1} - f_n(Y_{n+1}) - \pi_R(f_n),$$

where the bounded, non-negative random variable $Z_{n+1}\in L^\infty_{\ge 0}$ represents the insurer's premium income in the $n$-th period. The premium principle $\pi_R: L^p_{\ge 0}\to\bar{\mathbb R}$ of the reinsurer is assumed to be law-invariant, monotone, normalized and to have the Fatou property. Normalization means that $\pi_R(0) = 0$, and the Fatou property is lower semicontinuity w.r.t. dominated convergence. The Markov Decision Model is given by the state space $E = \mathbb R$, the action space $A = \mathcal F$, either no constraint or a budget constraint $D(x) = \{f\in\mathcal F : \pi_R(f)\le x^+\}$, the independent disturbances $(Y_n,Z_n)_{n\in\mathbb N}$ with $Y_n\in L^1_{\ge 0}$ and $Z_n\in L^\infty_{\ge 0}$, the transition function $T(x,f,y,z) = x - f(y) - \pi_R(f) + z$ and the one-stage cost function $c(x,f,x') = x - x'$. The insurance company's target is to minimize its solvency cost of capital for the total discounted loss

$$\inf_{\pi\in\Pi}\; r_{CoC}\cdot\rho_\phi\Big(\sum_{k=0}^{N-1}\beta^k\big(d_k(H_k^\pi)(Y_{k+1}) + \pi_R(d_k(H_k^\pi)) - Z_{k+1}\big)\Big), \qquad (9.2)$$
where $\rho_\phi$ is a spectral risk measure with bounded spectrum $\phi$, $\beta\in(0,1]$ and $N\in\mathbb N$. As it is irrelevant for the minimization, we will in the sequel omit the cost of capital rate $r_{CoC}$ and instead minimize the capital requirement. For $\beta = 1$ we have

$$\sum_{k=0}^{N-1}\big(d_k(H_k^\pi)(Y_{k+1}) + \pi_R(d_k(H_k^\pi)) - Z_{k+1}\big) = \sum_{k=0}^{N-1}\big(X_k^\pi - X_{k+1}^\pi\big) = x - X_N^\pi,$$
i.e. due to the translation invariance of spectral risk measures, the objective reduces to minimizing the capital requirement for the loss (negative surplus) $-X_N^\pi$ at the planning horizon. This is reminiscent of the static reinsurance problem (9.1); here, however, the loss distribution at the planning horizon can be controlled by interim action. Throughout, we have required that the one-stage cost $c(x,f,T(x,f,Y,Z)) = f(Y) + \pi_R(f) - Z$ is non-negative. As $f(Y)$ and $\pi_R(f)$ are non-negative for all $f\in\mathcal F$ and $c(x,\operatorname{id}_{\mathbb R_+}, T(x,\operatorname{id}_{\mathbb R_+},Y,Z)) = Y - Z$ due to the normalization of $\pi_R$, the premium income $Z$ would have to be non-positive. This makes no sense from an actuarial point of view, but since $\rho_\phi$ is translation invariant and $Z\in L^\infty$, we can add $\sum_{k=0}^{N-1}\beta^k \operatorname{ess\,sup}(Z)$ without influencing the minimization. This means that the one-stage cost function is changed to $\hat c(x,f,x') = x - x' + \operatorname{ess\,sup}(Z)$. The economic interpretation is that the one-stage cost

$$\hat c(x,f,T(x,f,Y,Z)) = f(Y) + \pi_R(f) + \operatorname{ess\,sup}(Z) - Z$$
now depends on the deviation from the maximal possible income instead of the actual income. For brevity, we write $\hat z = \operatorname{ess\,sup}(Z)$. As in (3.3), we separate an inner and an outer reinsurance problem. For a structural analysis, we focus on the inner optimization problem

$$\inf_{\pi\in\Pi}\; \mathbb E\Big[g\Big(\sum_{k=0}^{N-1}\beta^k\big(d_k(H_k^\pi)(Y_{k+1}) + \pi_R(d_k(H_k^\pi)) + \hat z - Z_{k+1}\big)\Big)\Big] \qquad (9.3)$$
with arbitrary $g\in G$, cf. Lemma 7.1. Note that for $\pi = (f,f,\dots)$ with $f = \operatorname{id}_{\mathbb R_+}$ we obtain $\rho_\phi(C_N^{\pi x}) < \infty$. On the extended state space $\hat E = \mathbb R\times\mathbb R_+\times(0,1]$, the value of a policy $\pi\in\Pi$ is

defined as

$$V_{N\pi}(h_N) = g(s_N),$$
$$V_{n\pi}(h_n) = \mathbb E_{nh_n}\Big[g\Big(s_n + t_n\sum_{k=n}^{N-1}\beta^{k-n}\big(d_k(H_k^\pi)(Y_{k+1}) + \pi_R(d_k(H_k^\pi)) + \hat z - Z_{k+1}\big)\Big)\Big],$$
for $n = 0,\dots,N$ and $h_n\in\hat H_n$. The corresponding value functions are

$$V_n(h_n) = \inf_{\pi\in\Pi} V_{n\pi}(h_n), \qquad h_n\in\hat H_n.$$
Due to the real state space, we want to apply Corollary 6.4 for solving the optimization problem. Note that the one-stage cost $\hat c$ is non-negative and the spectrum $\phi$ is bounded by assumption. The following lemma shows that the monotonicity, continuity and compactness assumptions of Corollary 6.4 are also satisfied by the dynamic reinsurance model.

Lemma 9.1.
a) The retained loss functions $f\in\mathcal F$ are Lipschitz continuous with constant $L \le 1$. Moreover, $\mathcal F$ is a Borel space as a compact subset of the metric space $(C(\mathbb R_+), m)$ of continuous real-valued functions on $\mathbb R_+$ with the metric of compact convergence.
b) The functional $\pi_R:\mathcal F\to\mathbb R_+$, $f\mapsto\pi_R(f)$, is lower semicontinuous.
c) The transition function $T$ is upper semicontinuous and increasing in $x$.
d) $D(x)$ is a compact subset of $\mathcal F$ for all $x\in\mathbb R$ and the set-valued mapping $\mathbb R\ni x\mapsto D(x)$ is upper semicontinuous and increasing.
e) The one-stage cost $D\ni(x,a)\mapsto c(x,a,T(x,a,y,z))$ is lower semicontinuous and decreasing in $x$.

Proof. a) Let $f\in\mathcal F$. Since $\operatorname{id}_{\mathbb R_+} - f$ is increasing, it holds for $0\le x\le y$ that $x - f(x)\le y - f(y)$. Rearranging and using that $f$ is increasing, too, yields $|f(x) - f(y)| = f(y) - f(x)\le y - x = |x - y|$, i.e. Lipschitz continuity with common constant $1$. Moreover,

$\mathcal F$ is pointwise bounded by $\operatorname{id}_{\mathbb R_+}$ and closed under pointwise convergence. Hence, $(\mathcal F, m)$ is a compact metric space by the Arzelà-Ascoli theorem and as such also complete and separable, i.e. a Borel space.

b) Let $\{f_k\}_{k\in\mathbb N}$ be a sequence in $\mathcal F$ such that $f_k\to f\in\mathcal F$. In particular, it holds that $f_k(x)\to f(x)$ for all $x\in\mathbb R_+$ and $Y - f_k(Y)\to Y - f(Y)$ $P$-a.s. Since $Y - f_k(Y)\le Y\in L^1$ for all $k\in\mathbb N$, the Fatou property of $\pi_R$ implies

$$\liminf_{k\to\infty}\pi_R(f_k) = \liminf_{k\to\infty}\pi_R\big(Y - f_k(Y)\big)\ \ge\ \pi_R\big(Y - f(Y)\big) = \pi_R(f).$$

c) We show that the mapping $\mathcal F\times\mathbb R_+\ni(f,y)\mapsto f(y)$ is continuous. Then the transition function $T$ is upper semicontinuous as a sum of upper semicontinuous functions due to part b). Let $\{(f_k,y_k)\}_{k\in\mathbb N}$ be a convergent sequence in $\mathcal F\times\mathbb R_+$ with limit $(f,y)$. Since convergence w.r.t. the metric $m$ implies pointwise convergence and all $f_k$ have the Lipschitz constant $L = 1$, it follows that

$$|f_k(y_k) - f(y)|\le |f_k(y_k) - f_k(y)| + |f_k(y) - f(y)|\le |y_k - y| + |f_k(y) - f(y)|\to 0.$$
The fact that $T$ is increasing in $x$ is obvious.

d) Due to a), we only have to consider the budget-constrained case. Since $\mathcal F$ is compact, it suffices to show that $D(x) = \{f\in\mathcal F : \pi_R(f)\le x^+\}$ is closed. This is the case since $D(x)$ is a sublevel set of the lower semicontinuous function $\pi_R:\mathcal F\to\mathbb R_+$, cf. Lemma A.1.3 in Bäuerle and Rieder (2011). Furthermore, we show that $D$ is closed to obtain the upper semicontinuity from Lemma A.2.2 in Bäuerle and Rieder (2011). From the lower semicontinuity of $\pi_R$ it follows that the epigraph

$$\operatorname{epi}(\pi_R) = \{(f,x)\in\mathcal F\times\mathbb R_+ : \pi_R(f)\le x\}$$

is closed. Thus, $D = \{(x,f) : (f,x)\in\operatorname{epi}(\pi_R)\}\cup(\mathbb R_-\times D(0))$ is closed, too. That $x\mapsto D(x)$ is increasing is clear.

e) The one-stage cost $c(x,f,T(x,f,y,z)) = x - T(x,f,y,z) = f(y) + \pi_R(f) - z$ is lower semicontinuous in $(x,f)$ as a sum of lower semicontinuous functions and decreasing in $x$, since it does not depend on $x$. $\Box$
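As a side illustration of the model just analyzed, a minimal Monte Carlo simulation of the surplus recursion $X_{n+1} = X_n + Z_{n+1} - f_n(Y_{n+1}) - \pi_R(f_n)$ under a stationary stop-loss policy might look as follows; the distributions of $Y$ and $Z$, the retention level and the safety loading are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)

def simulate_surplus(x0, a, n_periods, n_paths, theta=0.2):
    """Paths of X_{n+1} = X_n + Z_{n+1} - min(Y_{n+1}, a) - pi_R(f) for the
    stop-loss treaty f(y) = min(y, a) under the expected premium principle."""
    Y = rng.lognormal(1.0, 0.8, size=(n_paths, n_periods))   # claims
    Z = rng.uniform(2.0, 4.0, size=(n_paths, n_periods))     # bounded income
    # premium for the ceded part (Y - a)^+, estimated from a reference sample
    Y_ref = rng.lognormal(1.0, 0.8, size=200_000)
    pi_R = (1 + theta) * np.mean(np.maximum(Y_ref - a, 0.0))
    X = np.full(n_paths, float(x0))
    paths = [X.copy()]
    for n in range(n_periods):
        X = X + Z[:, n] - np.minimum(Y[:, n], a) - pi_R
        paths.append(X.copy())
    return np.array(paths)            # shape (n_periods + 1, n_paths)

paths = simulate_surplus(x0=10.0, a=6.0, n_periods=5, n_paths=10_000)
loss = -paths[-1]                     # terminal loss -X_N as in the beta = 1 case
```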

Now, Corollary 6.4 yields that it is sufficient to minimize over all Markov policies and the value functions satisfy the Bellman equation

$$J_N(x,s,t) = g(s),$$

$$J_n(x,s,t) = \inf_{f\in D(x)} \mathbb E\Big[J_{n+1}\Big(x - f(Y) - \pi_R(f) + Z,\; s + t\big(f(Y) + \pi_R(f) + \hat z - Z\big),\; \beta t\Big)\Big]$$
for $(x,s,t)\in\hat E$ and $n = 0,\dots,N-1$. Moreover, there exists a Markov decision rule $d_n^*:\hat E\to\mathcal F$ minimizing $J_{n+1}$, and every sequence $\pi^* = (d_0^*,\dots,d_{N-1}^*)\in\Pi^M$ of such minimizers is a solution to (9.3).

All structural properties of the optimal policy which do not depend on $g$ are inherited by the optimal solution of the cost of capital minimization problem (9.2). The structural properties we will focus on in the rest of this section are induced by convexity. Therefore, we assume that the premium principle $\pi_R$ is convex and that there is no budget constraint; note that with a budget constraint, the set $D$ would in general not be convex, even for convex $\pi_R$. Then we have indeed a convex model: $D$ is trivially convex, the transition function $T(x,f,y,z) = x - f(y) - \pi_R(f) + z$ is concave in $(x,f)$ as a sum of concave functions, and the one-stage cost $(x,f)\mapsto\hat c(x,f,T(x,f,y,z)) = f(y) + \pi_R(f) + \hat z - z$ is convex as a sum of convex functions. Now, Corollary 6.4 yields that the value functions $J_n$ are convex. Under the widely used expected premium principle, the optimization problem can be reduced to finite dimension.
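For illustration, here is a schematic backward induction for this Bellman equation. Since $J_N(x,s,t) = g(s)$ does not depend on $x$ and, without a budget constraint, neither the admissible set nor the $(s,t)$-transition does, the recursion can be carried out in $(s,t)$ alone, with $t$ evolving deterministically via $t\mapsto\beta t$. The sketch restricts actions to stop-loss retention levels (anticipating Example 9.2), discretizes the $s$-component on a grid and replaces the expectation by a fixed Monte Carlo sample; all distributions, grids and parameters are illustrative assumptions and no accuracy guarantee is claimed.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- illustrative ingredients (assumptions, not taken from the paper) ---
beta, theta, z_hat = 0.95, 0.2, 4.0
Y = rng.lognormal(1.0, 0.8, size=5_000)        # claim sample for the expectation
Z = rng.uniform(2.0, z_hat, size=5_000)        # bounded premium income, ess sup = z_hat
a_grid = np.linspace(0.0, 25.0, 60)            # stop-loss retention levels
s_grid = np.linspace(0.0, 80.0, 400)           # grid for the s-component

def premium(a):
    """Expected premium principle for the ceded part (Y - a)^+."""
    return (1 + theta) * np.mean(np.maximum(Y - a, 0.0))

def backward_induction(g, N, t0=1.0):
    """Compute J_0(s, t0) on s_grid; without a budget constraint the
    x-component drops out of the Bellman equation above."""
    J = g(s_grid)                               # terminal value J_N = g(s)
    for n in reversed(range(N)):
        t_n = beta**n * t0
        best = np.full(s_grid.shape, np.inf)
        for a in a_grid:
            # sample of the shifted one-stage cost c-hat (non-negative)
            cost = np.minimum(Y, a) + premium(a) + z_hat - Z
            s_next = s_grid[:, None] + t_n * cost[None, :]
            # linear interpolation of the next-stage value in s' (clamped at grid ends)
            val = np.interp(s_next, s_grid, J).mean(axis=1)
            best = np.minimum(best, val)
        J = best
    return J

# e.g. the Rockafellar-Uryasev integrand g(s) = q + (s - q)^+ / (1 - alpha),
# here with the illustrative choices q = 10 and alpha = 0.95
J0 = backward_induction(lambda s: 10.0 + np.maximum(s - 10.0, 0.0) / 0.05, N=5)
```

Recording the minimizing retention level for each grid point $s$ would yield a discretized version of the Markov decision rules $d_n^*$.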

Example 9.2. Let $\pi_R(\cdot) = (1+\theta)\mathbb E[\cdot]$ be the expected premium principle with safety loading $\theta > 0$ and assume there is no budget constraint. We will now show that the optimal reinsurance treaties (i.e. retained loss functions) can be chosen from the class of stop-loss treaties $f(x) = \min\{x, a\}$, $a\in[0,\infty]$.
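Before giving the argument, the following one-period sketch illustrates the resulting finite dimensional problem: for stop-loss treaties only the retention level $a$ has to be optimized, here for a spectral risk measure estimated by $\phi$-weighted order statistics. The lognormal claim distribution, the Expected Shortfall spectrum and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
Y = rng.lognormal(1.0, 0.8, size=100_000)      # illustrative claim sample
theta = 0.2

def spectral_risk(sample, phi):
    """Plug-in estimate of rho_phi(X) = int_0^1 phi(u) F_X^{-1}(u) du."""
    x = np.sort(sample)
    u = (np.arange(len(x)) + 0.5) / len(x)     # midpoint rule on (0, 1)
    return np.mean(phi(u) * x)

def objective(a, phi):
    """One-period objective: rho_phi of retained loss plus premium."""
    retained = np.minimum(Y, a)                        # f(Y) = min(Y, a)
    prem = (1 + theta) * np.mean(Y - retained)         # expected premium principle
    return spectral_risk(retained + prem, phi)

phi_es = lambda u: (u >= 0.95) / 0.05          # Expected Shortfall at level 0.95
a_star = min(np.linspace(0.0, 30.0, 301), key=lambda a: objective(a, phi_es))
```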

Due to the convexity of $J_{n+1}$, we can infer from the Bellman equation that reinsurance treaty $f_1$ is better than $f_2$ if

$$f_1(Y) + \pi_R(f_1)\le_{cx} f_2(Y) + \pi_R(f_2),$$
where $\le_{cx}$ denotes the convex order. Since $Y_1\le_{cx} Y_2$ implies $\mathbb E[Y_1] = \mathbb E[Y_2]$, it suffices to find an $a_f\in[0,\infty]$ such that

$$\min\{Y, a_f\}\le_{cx} f(Y). \qquad (9.4)$$

The mapping $[0,\infty]\to\mathbb R_+$, $a\mapsto\min\{Y(\omega), a\}$, is continuous for all $\omega\in\Omega$ and $0\le\min\{Y,a\}\le Y\in L^1$. Thus, it follows from dominated convergence that $[0,\infty]\to\mathbb R_+$, $a\mapsto\mathbb E[\min\{Y,a\}]$, is continuous. Furthermore,
$$\mathbb E[\min\{Y, 0\}]\le\mathbb E[f(Y)]\le\mathbb E[\min\{Y, \operatorname{ess\,sup}(Y)\}].$$

Hence, by the intermediate value theorem, there is an $a_f\in[0,\infty]$ such that $\mathbb E[f(Y)] = \mathbb E[\min\{Y, a_f\}]$. Let us compare the survival functions:

$$S_{\min\{Y,a_f\}}(y) = P(\min\{Y, a_f\} > y) = P(Y > y)\,\mathbb 1_{\{a_f > y\}},$$
$$S_{f(Y)}(y) = P(f(Y) > y)\le P(Y > y).$$

The inequality holds since $f\le\operatorname{id}_{\mathbb R_+}$. Hence, we have $S_{\min\{Y,a_f\}}(y)\ge S_{f(Y)}(y)$ for $y < a_f$ and $S_{\min\{Y,a_f\}}(y)\le S_{f(Y)}(y)$ for $y\ge a_f$. The cut criterion 1.5.17 in Müller and Stoyan (2002) implies $\min\{Y, a_f\}\le_{icx} f(Y)$, and due to the equality in expectation, (9.4) follows, cf. Theorem 1.5.3 in Müller and Stoyan (2002). So the inner optimization problem (9.3) is reduced to finding an optimal non-negative parameter of a stop-loss treaty at every stage.
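The existence argument for $a_f$ is in fact constructive: since $a\mapsto\mathbb E[\min\{Y,a\}]$ is continuous and increasing, $a_f$ can be computed by bisection, and the increasing convex order can be checked empirically via the stop-loss transforms $t\mapsto\mathbb E[(X-t)^+]$, cf. Müller and Stoyan (2002). The claim distribution and the competing treaty $f$ in the sketch below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
Y = rng.lognormal(1.0, 0.8, size=200_000)      # illustrative claim sample
f = lambda y: 0.7 * y                          # admissible: f and id - f increasing
target = np.mean(f(Y))

# bisection for a_f with E[min(Y, a_f)] = E[f(Y)]
lo, hi = 0.0, float(Y.max())
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if np.mean(np.minimum(Y, mid)) < target:
        lo = mid
    else:
        hi = mid
a_f = 0.5 * (lo + hi)

# empirical check of min(Y, a_f) <=_icx f(Y) via stop-loss transforms
stop_loss = lambda X, t: np.mean(np.maximum(X - t, 0.0))
ts = np.linspace(0.0, np.quantile(Y, 0.999), 200)
assert all(stop_loss(np.minimum(Y, a_f), t) <= stop_loss(f(Y), t) + 1e-6 for t in ts)
```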

References

Acerbi, C. (2002). Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking & Finance, 26(7):1505–1518.
Albrecher, H., Beirlant, J., and Teugels, J. L. (2017). Reinsurance: Actuarial and Statistical Aspects. John Wiley & Sons, Hoboken, NJ.
Arrow, K. (1963). Uncertainty and the welfare economics of medical care. The American Economic Review, 53(5):941–973.
Bäuerle, N. and Glauner, A. (2018). Optimal risk allocation in reinsurance networks. Insurance: Mathematics and Economics, 82:37–47.
Bäuerle, N. and Glauner, A. (2020). Markov decision processes with recursive risk measures. arXiv:2010.07220.
Bäuerle, N. and Ott, J. (2011). Markov decision processes with Average-Value-at-Risk criteria. Mathematical Methods of Operations Research, 74(3):361–379.
Bäuerle, N. and Rieder, U. (2011). Markov Decision Processes with Applications to Finance. Springer-Verlag, Berlin Heidelberg.
Bäuerle, N. and Rieder, U. (2014). More risk-sensitive Markov decision processes. Mathematics of Operations Research, 39(1):105–120.
Borch, K. (1960). An attempt to determine the optimum amount of stop loss reinsurance. In Transactions of the XVI International Congress of Actuaries, volume I, pages 597–610.
Chi, Y. and Tan, K. S. (2013). Optimal reinsurance with general premium principles. Insurance: Mathematics and Economics, 52(2):180–189.
Chow, Y. and Ghavamzadeh, M. (2014). Algorithms for CVaR optimization in MDPs. In Advances in Neural Information Processing Systems, pages 3509–3517.
Chow, Y., Tamar, A., Mannor, S., and Pavone, M. (2015). Risk-sensitive and robust decision-making: a CVaR optimization approach. In Advances in Neural Information Processing Systems, pages 1522–1530.
Chu, S. and Zhang, Y. (2014). Markov decision processes with iterated coherent risk measures. International Journal of Control, 87(11):2286–2293.
Corrias, L. (1996). Fast Legendre-Fenchel transform and applications to Hamilton-Jacobi equations and conservation laws. SIAM Journal on Numerical Analysis, 33(4):1534–1558.
Cui, W., Yang, J., and Wu, L. (2013). Optimal reinsurance minimizing the distortion risk measure under general reinsurance premium principles. Insurance: Mathematics and Economics, 53(1):74–85.
Dhaene, J., Wang, S., Young, V., and Goovaerts, M. J. (2000). Comonotonicity and maximal stop-loss premiums. Bulletin of the Swiss Association of Actuaries, 2000(2):99–113.
Glauner, A. (2020). Robust and Risk-sensitive Markov Decision Processes with Applications to Dynamic Optimal Reinsurance. PhD thesis, Karlsruhe Institute of Technology. DOI: 10.5445/IR/1000126170.
Hinderer, K. (1970). Foundations of Non-stationary Dynamic Programming with Discrete Time Parameter. Springer-Verlag, Berlin Heidelberg.
Howard, R. A. and Matheson, J. E. (1972). Risk-sensitive Markov decision processes. Management Science, 18(7):356–369.
Kusuoka, S. (2001). On law invariant coherent risk measures. In Advances in Mathematical Economics, pages 83–95. Springer.
Li, X., Zhong, H., and Brandeau, M. L. (2017). Quantile Markov decision process. arXiv preprint arXiv:1711.05788.
Lo, A. (2017). A Neyman-Pearson perspective on optimal reinsurance with constraints. ASTIN Bulletin, 47(2):467–499.
McNeil, A. J., Frey, R., and Embrechts, P. (2015). Quantitative Risk Management: Concepts, Techniques and Tools. Princeton University Press, revised edition.
Müller, A. (2007). Certainty equivalents as risk measures. Brazilian Journal of Probability and Statistics, 21(1):1–12.
Müller, A. and Stoyan, D. (2002). Comparison Methods for Stochastic Models and Risks. John Wiley & Sons, Chichester.
Pflug, G. C. and Pichler, A. (2016). Time-consistent decisions and temporal decomposition of coherent risk functionals. Mathematics of Operations Research, 41(2):682–699.
Pichler, A. (2015). Premiums and reserves, adjusted by distortions. Scandinavian Actuarial Journal, 2015(4):332–351.

Rockafellar, R. T. and Uryasev, S. (2000). Optimization of Conditional Value-at-Risk. Journal of Risk, 2(3):21–41.
Ruszczyński, A. (2010). Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2):235–261.
Shapiro, A. (2013). On Kusuoka representation of law invariant risk measures. Mathematics of Operations Research, 38(1):142–152.
Tamar, A., Chow, Y., Ghavamzadeh, M., and Mannor, S. (2015). Policy gradient for coherent risk measures. In Advances in Neural Information Processing Systems, pages 1468–1476.
Tsanakas, A. and Desli, E. (2003). Risk measures and theories of choice. British Actuarial Journal, 9(4):959–991.
Uğurlu, K. (2017). Controlled Markov decision processes with AVaR criteria for unbounded costs. Journal of Computational and Applied Mathematics, 319:24–37.

(N. Bäuerle) Department of Mathematics, Karlsruhe Institute of Technology (KIT), D-76128 Karlsruhe, Germany
Email address: [email protected]

(A. Glauner) Department of Mathematics, Karlsruhe Institute of Technology (KIT), D-76128 Karlsruhe, Germany
Email address: [email protected]