Moment-Based Approximation with Mixed Distributions

by Hélène Cossette, David Landriault, Etienne Marceau, and Khouzeima Moutanabbir

ABSTRACT Moment-based approximations have been extensively analyzed over the years (see, e.g., Osogami and Harchol-Balter 2006 and references therein). A number of specific phase-type (and non phase-type) distributions have been considered to tackle the moment-matching problem (see, for instance, Johnson and Taaffe 1989). Motivated by the development of more flexible moment- based approximation methods, we develop and examine the use of finite mixture of Erlangs with a common rate parameter for the moment-matching problem. This is primarily motivated by Tijms (1994) who shows that this class of distributions can approximate any continuous positive distribution to an arbitrary level of accu- racy, as well as the tractability of this class of distributions for various problems of interest in quantitative risk management. We consider separately situations where the rate parameter is either known or unknown. For the former case, a direct connection with a discrete moment-matching problem is established. A parallel to the s-convex stochastic order (e.g., Denuit et al. 1998) is also drawn. Numerical examples are considered throughout.

KEYWORDS Risk theory, mixed Erlang distributions, moment-matching, distribution fitting, phase-type approximation

166 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

1. Introduction moment-based approximation of Whitt (1982) when CV > 1 using a Coxian distribution. Alternatively, Mixed Erlang distributions are known to yield ana- Johnson and Taaffe (1989) considered a mixture of lytic solutions to many risk management problems of Erlangs with a common shape (order) parameter as interest. This is primarily due to the tractable features their moment-based approximation. of this distributional class. Among others, the class Most predominantly, there exists a substantial body of mixed Erlang distributions is closed under various of literature on the three-moment approximation operations such as convolutions and Esscher transfor- within the phase-type class of distributions (e.g., Telek mations (e.g., Willmot and Woo 2007 and Willmot and and Heindl 2002, Bobbio et al. 2005, and references Lin 2011). As such, risk aggregation and ruin problems therein). Matching the first three moments is often can more easily be tackled under mixed Erlang assump- viewed as effective to provide a reasonable approxi- tions (e.g., Cheung and Woo 2016, Cossette et al. 2012, mation to the underlying system (e.g., Osogami and and Landriault and Willmot 2009). Also, Tijms (1994) Harchol-Balter 2006 and references therein). How- showed that the class of mixed Erlang distributions is ever, as illustrated in this paper and many others, three dense in the set of all continuous and positive dis- moments does not always suffice, triggering the devel- tributions. Therefore, we consider a moment-based opment of more flexible moment-based approxima- approximation method which capitalizes on the afore- tions. Among others, we mention the work of Johnson mentioned properties of the mixed Erlang distribution. and Taaffe (1989) on mixed Erlang distributions of More precisely, we propose to approximate a distri- common order. Also, Dufresne (2007) proposes two bution with known moments by a moment-matching approximation techniques based on Jacobi polynomial mixed Erlang distribution. Moment-based approxi- expansions and the logbeta distribution to fit combina- mations have been extensively developed in various tions of exponential distributions. This paper is com- research areas, including performance evaluation, plementary to the aforementioned ones by considering , and risk theory, to name a few. the family of finite mixture of Erlangs with common Osogami and Harchol-Balter (2006) identify the rate parameter to approximate a distribution on R+, as following four criteria to evaluate moment-matching theoretically justified in the continuous case by Tijms algorithms: (1) the number of moments matched; (1994, Theorem 3.9.1). The reader is also referred to (2) the computational efficiency of the algorithm; Lee and Lin (2010) where fitting of the same class of (3) the generality of the solution; and (4) the mini- distributions is considered using the EM algorithm mality of the number of parameters (phases). It also (which relies on the knowledge of the approximated seems desirable for the approximation to be in itself distribution rather than only its moments). a distribution. This is not mentioned in Osogami It is worth pointing out that other non-phase type and Harchol-Balter (2006) for the obvious reason approximation methods have been widely used in that they consider phase-type distributions as their actuarial science. A good survey paper on this topic moment-based approximation class. There exists an is Chaubey et al. (1998). One of these approximation extensive literature on the approximation of distribu- classes are refinements to the normal approxima- tions by a specific subset of phase-type distributions tion such as the normal power and the Cornish Fisher using moment-based techniques. For instance, Whitt approximations (e.g., Ramsay 1991, Daykin et al. (1982) proposed a mixture of two exponential dis- 1994, and Lee and Lin 1992). These approximations tributions or a generalized Erlang distribution as are based on the first few moments. However, the a moment-based approximation when either the coef- resulting approximation is often not a proper distribu- ficient of variation (CV) is greater than or less than 1, tion. Other moment-based distributional approxima- respectively. Also, both Altiok (1985) and Vanden tions are the translated (e.g., Seal Bosch et al. (2000) proposed an alternative to the 1977), translated inverse Gaussian distribution (e.g.,

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 167 Advancing the Science of Risk

Chaubey et al. 1998) and the generalized Pareto dis- As stated in Courtois and Denuit (2007), there tribution (e.g., Venter 1983). It should be noted that exists a non-negative random variable (rv) with dis- all these approximation methods are designed to fit a tribution function (df) F and first m moments lm if specific number of moments, and thus lack the flex- and only if the following two conditions are satisfied: ibility to match an arbitrary number of moments. • det Pk > 0, for k = 1, . . . , (m - 1)/2; The rest of the paper is constructed as follows. In • det Qk > 0, for k = 1, . . . , m/2; Section 2, a brief review on admissible moments, mixed Erlang distributions and the approximation where x holds for the integer part of x. In what fol- method of Johnson and Taaffe (1989) is provided. lows, we silently assume the moment set lm is from + Section 3 is devoted to our class of finite mixture of a on R . Erlangs with common rate parameter. Theoretical and 2.2. Mixed Erlang distribution practical considerations related to the approximation method are drawn. Various examples are considered We now review some known properties of mixed to examine the quality of the resulting approximation. Erlang distributions with common rate parameter. A In Section 4, we consider applications of our moment- more elaborate review of this class of distributions based approximations of Section 3 when the underly- can be found in Willmot and Woo (2007), Lee and ing distribution is of mixed Erlang form with known Lin (2010), and Willmot and Lin (2011). rate parameter. A parallel is drawn with a discrete Let W be a mixed Erlang rv with common rate moment-matching problem and certain stochastic parameter b > 0 and df orderings, notably the s-convex stochastic order (e.g., FxW ()=ζ∑ k Hx();,k β ,(1) Denuit et al. 1998). An application of Cossette et al. kA∈ l

(2002) will be examined in more detail. l where Al = {1, 2, . . . , l}, and {zk}k=1 is the probability mass function (pmf) of a discrete rv K with support

2. Background Al for a given l ∈ {1, 2, . . .}  {∞}. The Erlang df H is defined as 2.1. Admissible moments k−1 i x ()βx Karlin and Studden (1966) provide the necessary Hx();,kHβ≡1;−β ()xk,1=−e−β ∑ , and sufficient conditions for a set of (raw) moments i=0 i! lm = (µ1, . . . , µm) to be from a probability distribution x ≥ 0, (2) defined on R+. To state this result, define the matrices

Pk and Qk (k ≥ 1) as where the parameters k and b of the Erlang df are known as the shape and rate parameters, respectively. 1  µµ1 k  An alternative and useful representation of the mixed   Erlang rv W is WC= K where {C } are iid expo-  µµ12 µk+1 ∑ k=1 k k k≥1 Pk =   ; nential rv’s with mean 1/b, independent of K, i.e., the   rv W follows a compound distribution.    µµkk+12 µ k  Remark 1. As in, e.g., Willmot and Woo (2007), we consider the class of mixed Erlang dfs (1) rather than  µµ12 µk+1    the more general class of combinations of Erlangs  µµ23 µk+2  where some z ’s are possibly negative. For the latter Q =.k k   class, additional constraints on {z }l exist to ensure   k k=1   that the right-hand side of (1) is a non-decreasing  µµkk++12 µ21k+  function in x. This presents additional challenges in

168 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

the subsequent moment-matching application, chal- for more details). In general, these moment-based lenges which do not arise in the mixed Erlang case. approximations propose to work with a specific sub- class of all finite and infinite mixed Erlang dis- It is well known that the j-th moment of W is given j -j ∞ j-1 tributions. Among them, we recall the method of by E[W ] = b Σk=1z {∏ (k + i)}. Of particular impor- k i=0 Johnson and Taaffe (1989), who will be used later for tance in actuarial science and quantitative risk man- comparative purposes. agement (see, e.g., McNeil et al. 2005 and references therein) are the VaR and TVaR risk measures. For 2.3. Method of Johnson and Taaffe (1989) the mixed Erlang rv W, there is in general no closed Johnson and Taaffe (1989) investigated the use of form expression for VaRk(W) = inf{x ∈ R : FW(x) ≥ k} where 0 ≤ k < 1, but its value can be obtained using a mixtures of Erlang distributions with common shape routine numerical procedure. As for its TVaR, Lee and parameter for moment-matching purposes. More pre- Lin (2010) showed that cisely, mixtures of n (or fewer) Erlangs with common shape parameters are used to match the first (2n - 1) 1 1 moments (whenever the set of moments is within the TVaR WW≡ VaRu du κ() ∫κ () 1 −κ feasible set). For the three-moment matching problem, 1 ∞ k Johnson and Taaffe (1989) generalized the approxima- = ∑ ζk HW()VaR;κ()k +β1, .(3) tion of Whitt (1982) and Altiok (1985) by enlarging the 1 −κk=1 β set of feasible moments l3 when CV > 1. Their method Another quantity of interest is the stop-loss premium is also valid for some combinations of l3 when CV < 1. defined as pW(b) = E[(W - b)+] with (x)+ = max{x, 0}. Their three-moment approximation is a mixture For the mixed Erlang df (1), we have of two Erlangs with common r (see Theorem 3 of Johnson and Taaffe 1989), i.e., 1 ∞  k  π=W ()b ∑ζk  Hb();1kb+β,;−βHb()k,, 1 −κk=1  β  Fx()=βpH ()xr;,12+−()1;pH()xr,,β (5) b ≥ 0  µ1 −−1  1 −1 2 where p =  −β2  ()β−1 β2 , and {}1 βi i=1 are (see also Willmot and Woo (2007, Eq. 3.6) for the  r  higher-order stop-loss moments). Tijms (1994) showed the solutions of As2 + Bs + C = 0 with that this class of distributions can approximate any continuous positive distribution with an arbitrary  r + 1 2  Ar=+()r 2 µµ12 − µ1  level of accuracy. For completeness, the theoretical  r  foundation of this result is given next.   r + 1 2   r  µµ13− µ2  Theorem 2. (Tijms 1994, Theorem 3.9.1). Let F be   r + 2     the df of a positive rv. For any given h > 0, define  2   rr()+ 2  r + 1 2   ∞ B =− +  µ−21µ   1  r + 1  r   Fxh ()=−()Fk()hF()()kh− 1;Hxk, .(4) ∑  h   k=1   2  r + 1 2   ++()n 2 µµ1  21− µ   Then, lim Fxh ()= Fx() for any continuity point x of F.  r  h→0   Note that F in (4) is a mixed Erlang df of the form h r + 1 C  2  . (1) with zk = (F(kh) - F((k - 1)h)) (k = 1, 2, . . .) and =µ11 µµ32− µ   r + 2  rate parameter b = 1/h. Several approximation methods motivated by Tijms’ The choice of the shape parameter r is discussed in theorem were proposed over the years (see Section 1 Johnson and Taaffe (1989, Proposition 4).

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 169 Variance Advancing the Science of Risk

where ( , , . . . , )T, M ( µ , 2µ , . . . , 3. Moment-based approximation ym = zi1 zi2 zim b = b 1 b 2 m T with mixed Erlang distribution b µm) , and

In this section, we propose to use a different subclass  ii12 im  of mixed Erlang distributions to examine moment-    ii11(1++)(ii22 1) iimm(1+ ) based approximation techniques. G   . m =     3.1. Description of the approach  m−−1 m 1 m−1   ()ii++()ii ()iim +  +  ∏∏1 2 ∏  For a given l ∈ N , let (lm, Al) be the set of i=0 i=0 i=0 all finite mixture of Erlangs with df (1) and first m It follows that y = G-1M under the constraint moments l . From Section 2.2, this consists in the m m b m that y e = 1, where e is the vector 1’s. Note that identification of all solutions to the problem m T -1 T yme = (Gm Mb) e is a polynomial of degree (at most)

j 1 m in b. Thus, we only consider the real and positive l − ()ki+ ∏i=0 solutions (in ) of (G-1M )T e 1, and complete their ∑ ζk j =µj ,1jm= ,...,, (6) b m b = k=1 β mixed Erlang representation with the identification of the mixing weights y = G-1M . The procedure is l m m b under the constraints that b > 0 and {zk}k=1 is a prob- l systematically repeated for all (m) possible sets of m ability measure on Al. distinct elements in Al. Remark 3. For a rv W with df (1) and first m Remark 4. Given that the above procedure is moments lm, we indifferently write W ∈ (lm, Al) l repeated (m) times, the computational efficiency of the or FW ∈ (lm, Al). This will also apply to the other proposed methodology is mostly driven by this num- distributional classes. ber, and hence the parameters m and l should be cho-

res sen accordingly. For a given number of moments, m, Also, let  (lm, Al) be the (restricted) subset of we observe that: (a) larger values of l result in a more (lm, Al) with at most m non-zero mixing probabili- l res time-consuming numerical procedure; (b) however, ties {zk}k=1. Given that  (lm, Al) has a finite num- res l should be chosen large enough for the approxima- ber of solutions, we propose to use  (lm, Al) as tion class res(l , A ) to have a reasonable number our approximation class. It is clear that  res(l , A ) m l m l of members (to legitimally produce a “good” approxi- ⊆ res(l , A ) for l ≤ l′. m l′ mation). From our numerical studies, we observe that Note that for a continuous positive distribution with the selection of l (for a given m) can be problem- moments l , we know from Theorem 2 that there exists m specific, and thus this tradeoff in the choice of l should a l large enough such that (l , A ) is not empty. m l be handled with care. However, as a rule of thumb, Even though no formal conclusion can be reached for when m is relatively small (i.e., m ≤ 6) which is tra- the restricted class  res(l , A ), all our numerical m l ditionally in moment-matching exercises, a value of l studies have shown that this set has a large number of between 50 and 100 leads to reasonable mixed Erlang distributions (see, for instance, the examples of sub- approximations. We refer the reader to the numerical sections 3.2.1 and 3.2.2) for a given m when l is chosen illustrations and subsequent remarks in Section 3.2 large enough. for a more detailed discussion on this topic. res Distributions in the  (lm, Al) class are identi- m res Among all  (lm, Al) distributions, we propose fied as follows: for a given set {ik}k=1 ⊂ Al with 1 ≤ i1 < . . . to use as a criterion of the quality of these approxima- i2 < < im ≤ l, (6) can be rewritten in matrix form as tions the Kolmogorov-Smirnov (KS) distance defined

GMmmy = β , for two rv’s S and W (with respective dfs FS and FW)

170 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

as d (S,W) sup F (x) F (x) . The KS distance KS = x≥0 S - W  FxW3,70 ()= 0.8209Hx();6, 6.3219 is commonly used in the context of continuous dis- 0.1727Hx;12, 6.3219 tributions (e.g., Denuit et al. 2005). Therefore, + () the chosen mixed Erlang approximation within + 0.0064Hx();26, 6.3219 , res  (lm, Al) is the one minimizing the KS dis- Fx0.6350Hx;7, 8.3334 tance with the true df F . We denote by F this W4,70 ()= () S Wm,l approximation, i.e., + 0.2950Hx();12, 8.3334 dS,WFsup xFx , KS ()ml, =−infres SW() () + 0.0672Hx();20,8.3334 FAW ∈ME ()lml, x≥0 where W is a rv with df F . This requires the calcu- + 0.0029Hx();40,8.3334 , m,l Wm,l lation of the KS distance for each mixed Erlang distri- and res bution in  (lm, Al) to identify its minimizer Wm,l. FxW ()= 0.6273Hx();7, 8.3608 In general, an explicit expression for this KS distance 5,70 does not exist, and hence we propose to numerically + 0.3063Hx();12, 8.3608 find this value by evaluating the distance between the + 0.0609Hx();20,8.3608 two dfs over all multiples (up to a given high value) of a small discretization span. + 0.0055Hx();34,8.3608 Note that other distances such as the stop-loss 0.0001Hx;69, 8.3608 , distance (e.g., Gerber 1979) could have been used + () to select our approximation distribution. Alterna- for x ≥ 0 with respective KS distances of dKS(S,W3,70) tively, one could have relied on another criterion = 0.0040, dKS(S,W4,70) = 0.0018 and dKS(S,W5,70) = to identify this approximation distribution (for 0.0011. Note that the quality of the mixed Erlang res instance, select the distribution in  (lm, Al) approximation (as measured by the KS distance) with the closest subsequent moment to the true increases with the number of moments matched. For distribution). comparative purposes, the three-moment approxima- tion (5) of Johnson and Taaffe (1989) is given by 3.2. Numerical examples FxW ()= 0.0087Hx();4,1.2804 We consider a few simple examples to illustrate JT the quality of the approximation. For comparative + 0.9913Hx();4, 3.5855 , purposes, other approximation methods will also be In Figure 1, we compare the density function of W discussed. Some concluding remarks on the mixed m,70 (m = 3, 4, 5) W and S. Erlang approximation method are later made based JT All three mixed Erlang approximations provide an on the numerical experiment conducted next. overall good fit to the exact distribution. To further 3.2.1. Lognormal distribution: Dufresne examine the tail fit, specific values of VaR and TVaR (2007, Example 5.4) for the exact and approximated distributions are pro- Let S = exp(Z) where Z is a normal rv with mean 0 vided in Tables 1 and 2, respectively. and variance 0.25. The first 5 moments of S are We observe that the VaR and TVaR values of l5 = (1.1331, 1.6487, 3.0802, 7.3891, 22.7599). We con- the mixed Erlang approximations compare very well res sider the class of mixed Erlang distributions  (lm, to their lognormal counterparts, especially for the

A70) (m = 3, 4, 5) which have a total of 13198, 89294 5-moment approximation. This is particularly true and 290422 distributions, respectively. The resulting given that the lognormal distribution is known to mixed Erlang approximations F (m 3, 4, 5) are have a heavier tail than the mixed Erlang distribution. Wm,70 =

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 171 Variance Advancing the Science of Risk

Figure 1. Density function: Lognormal vs. Approximations

Note that the improvement is indeed not monotone ()3.2se2.6 −3.2s fsS ()= 0.2 with the number of moments matched, as increasing sΓ()2.6 this number does not necessarily lead to a higher qual- ity approximation in moment-matching techniques. ()1.2se6.3 −1.2s + 0.8 ,0s ≥ , sΓ()6.3 3.2.2. Mixture of two gamma distributions:

Lee and Lin (2010, Section 5, Example 1) and first 6 moments l6 = (4.3623, 25.7308, 176.9624, Let S be a mixture of two gamma distributions 1369.8272, 11754.2149, 110674.4154). We consider res with density the class of mixed Erlang distributions  (µm, A70)

Table 1. Values of VaRj for WJT, Wm,70 (m = 3, 4, 5) and S

k VaRk (WJT) VaRk (W3,70) VaRk (W4,70) VaRk (W5,70) VaRk (S) 0.9 1.8906 1.9129 1.9056 1.8936 1.8980 0.95 2.2119 2.2692 2.2918 2.2791 2.2760 0.99 2.9953 3.1223 3.0991 3.1812 3.2001 0.995 3.4239 3.6892 3.4811 3.6572 3.6252 0.999 5.0672 4.9237 5.0623 4.7241 4.6885

Table 2. Values of TVaRj for WJT, Wm,70 (m = 3, 4, 5) and S

k TVaRk (WJT) TVaRk (W3,70) TVaRk (W4,70) TVaRk (W5,70) TVaRk (S) 0.9 2.3931 2.4540 2.4601 2.4616 2.4616 0.95 2.7528 2.8350 2.8431 2.8600 2.8586 0.99 3.7939 3.9007 3.8125 3.8579 3.8413 0.995 4.4115 4.4455 4.3654 4.3323 4.2957 0.999 6.2260 5.4245 5.6195 5.3151 5.4341

172 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

Table 3. Values of VaRj for Wm,70 (m = 3, 4, 5, 6), WEM, and S

k VaRk (WEM) VaRk (W3,70) VaRk (W4,70) VaRk (W5,70) VaRk (W6,70) VaRk (S) 0.9 7.7069 7.8183 7.6726 7.6943 7.6969 7.6859 0.95 8.7494 8.8676 8.7595 8.7172 8.7825 8.7666 0.99 10.7708 10.8451 11.0385 11.0462 10.9514 11.0023 0.995 11.5349 11.5835 11.9366 12.0611 11.8566 11.8925 0.999 13.1604 13.1473 13.8488 13.9701 13.9651 13.8551 for m = 3, 4, 5, 6 which are composed of 16000, 83797, + 0.2739Hx(;11, 3.0731) 494532 and 1928919 distributions, respectively. The + 0.3928Hx;17, 3.0731 resulting mixed Erlang approximations F (m 3, 4, () Wm,70 = 5, 6) are + 0.1188Hx(); 25, 3.0731

FxW3,70 ()= 0.2140Hx(;2, 2.0835) + 0.0052Hx(); 37, 3.0731 , + 0.5215Hx(;9, 2.0835) with respective KS distances of dKS(S,W3,70) = 0.0148, d (S,W ) 0.0050, d (S,W ) 0.0035 and + 0.2645Hx(;15, 2.0835), KS 4,70 = KS 5,70 = dKS(S,W6,70) = 0.0024. Lee and Lin (2010) used the EM

FxW4,70 ()= 0.2266Hx(;2, 2.0469) algorithm to fit a mixed Erlang distribution to the same distribution, which resulted in the following model: + 0.4440Hx(;9, 2.0469) Fx0.2282Hx(;2,1.9603) + 0.2963Hx(;13, 2.0469) WEM ()= 0.5430Hx(;9,1.9603) + 0.0330Hx(;19, 2.0469), + + 0.2288Hx(;14,1.9603), FxW5,70 ()= 0.2023Hx(;3, 3.7271)

+ 0.2091Hx(;12, 3.7271) with KS distance dKS(S, WEM) = 0.0094. Note that the KS distance for the 3-moment approximation is greater + 0.3936Hx(;19, 3.7271) than the KS distance obtained using the EM estima- + 0.1805Hx(;28, 3.7271) tion. We recall that the EM algorithm finds maximum likelihood estimates using the approximated distribu- + 0.0145Hx(;42, 3.7271), tion as an input, while our method is based on only partial information on the approximated distribution and (e.g., its first m moments). We observe that the fit improves and the KS distance decreases when more FxW ()= 0.0768Hx(;2, 3.0731) 6,70 moments are included in the approximation. For illus- + 0.1325Hx(;3, 3.0731) trative purposes, we also provide in Tables 3 and 4

Table 4. Values of TVaRj for Wm,70 (m = 3, 4, 5, 6), WEM and S

k TVaRk (WEM) TVaRk (W3,70) TVaRk (W4,70) TVaRk (W5,70) TVaRk (W6,70) TVaRk (S) 0.9 9.0866 9.1909 9.1596 9.1404 9.1634 9.1598 0.95 9.9929 10.0845 10.1583 10.1234 10.1378 10.1469 0.99 11.8253 11.8620 12.2766 12.3665 12.2440 12.2528 0.995 12.5377 12.5481 13.1132 13.2302 13.1371 13.1065 0.999 14.0774 14.0266 14.9062 14.8770 15.0978 15.0069

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 173 Variance Advancing the Science of Risk

some values of VaR and TVaR for Wm,70 (m = 3, 4, 5, 6), and

WEM and S. FxW4,90 ()= 0.0347Hx(;35,1.0972) 3.2.3. + 0.0834Hx(;66,1.0972) The Gompertz distribution with df FS(x) = 1 - exp{-B(cx - 1)/ln c} (x > 0) has been extensively + 0.1009Hx(;67,1.0972) applied in various life contingency contexts (e.g., Bowers et al. 1997). Lenart (2014) provides an expres- + 0.7810Hx(;90,1.0972), sion for the j-th moment, namely with KS distances dKS(S,W3,90) = 0.0188, and dKS(S,W4,90) B j! j−1  B  = 0.0239. In Figure 2, we compare the fit of the µ= eElnc ,(7) j ()lnc j 1  lnc 3- and 4-moment approximations to the exact dis- tribution by plotting their densities (left) and dfs j ∞ ()ln x where Ezj = xe−−szxdx is the generalized (right). Overall, we observe that the fit is quite s () ∫1 j! reasonable. integro-exponential function (see Milgram 1985). Here, Note that the KS distance increases from the we consider an example of Melnikov and Romaniuk 3-moment to the 4-moment approximation. We notice (2006) on the 1959–1999 USA mortality data of the that both F and F use the Erlang-90 df, where W3,90 W4,90 human mortality database where the parameters B and 90 is the largest element of A90. As such, a mixed 5 c were estimated to B = 6.148 × 10- and c = 1.09159. Erlang approximation with a smaller KS distance can Using (7), the first five moments are l5 = (76.3437, likely be found in both cases by choosing a larger 6037.202, 489676.3, 40524308, 3410245408). We support Al (i.e., l > 90). consider the class of mixed Erlang distributions res 3.2.4. Some remarks  (lm, A90). The resulting mixed Erlang approxi- mations F and F are To provide insight on the quality of the moment- W3,90 W4,90 based mixed Erlang approximation proposed in this

FxW3,90 ()= 0.0154Hx(;22,1.0928) section, we briefly revisit the results of the above three examples. In the first two (lognormal and mixture + 0.2210Hx(;65,1.0928) of two gammas), the mixed Erlang approximation is + 0.7637Hx(;90,1.0928) easy to implement and provides a very satisfactory fit

Figure 2. Density (left) and df (right): 3- and 4-moment approximations vs. Gompertz distribution

174 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

to the true distribution. As none of the resulting mixed FxW4,5:200 ()= 0.0168Hx(;40,1.6422) Erlang approximations uses the Erlang-70 df (as + 0.0971Hx(;85,1.6422) l = 70 in the first two examples), it is unlikely that a better approximation (from the viewpoint of KS dis- + 0.3044Hx(;115,1.6422) tance) can be found by increasing the value of l. Given + 0.5818Hx(;140,1.6422), that the best KS-fit is found from a mixture of Erlang distributions with relatively small shape parameter and (which corresponds to the parameter k in the Erlang df (2), the proposed method seems particularly well FxW5,5:200 ()= 0.0038Hx(;20,1.9095) suited for these two cases. + 0.0393Hx(;75,1.9095) As for any approximation method, limitations can also be found, as evidenced by the Gompertz exam- + 0.1262Hx(;110,1.9095) ple. Indeed, for this example, both the 3-moment + 0.3275Hx(;140,1.9095) and 4-moment mixed Erlang approximations use the Erlang-90 df (recall l = 90 for this example). As men- + 0.5032Hx(;165,1.9095), tioned earlier, one can likely reduce the KS distance (if so desired) of the resulting mixed Erlang approximation with KS distances dKS(S,W3,5:200) = 0.0171, dKS(S,W4,5:200) by increasing the value of l (in light of the comments = 0.0086, and dKS(S,W5,5:200) = 0.0072. Note that the KS in Remark 4). This implies that the KS-optimal mixed distance in the 4-moment approximation is consider- Erlang approximation would likely involve Erlang ably lower than for dKS(S,W4,90) = 0.0239. In Figure 3, distributions with large shape parameters (which have we compare the fit of the 5-moment approximation W to the Gompertz distribution by plotting their smaller for a given rate parameter b). 5,5:200 In general, distributions with negative densities (left) and dfs (right). We can see that the fit and sharp density peak(s) may require the use of is quite acceptable and of a better quality than the two Erlang distributions with large shape parameters to approximations displayed in Figure 3. provide a good approximation. Computational time of the proposed mixed Erlang methodology may become 4. Moment-based mixed Erlang a non-negligible issue in these cases (especially as the approximation with known a number of moments matched increases). However, a 4.1. Basic definitions slight adjustment to the proposed methodology may be considered to address this time-consuming issue. We consider here a slightly different context than the one of Section 3. Instead of approximating a general df Indeed, one may replace the set Al = {1, 2, . . . , l} in F with known moments , we assume that the df F is (1) by a set of the form {a, 2a, . . . , ja} for positive S lm S integers a and j (note that the two sets coincide when known to be of mixed Erlang form (1) with given rate parameter > 0, and first m moments . However, a = 1 and j = l). To illustrate this, we have re-considered b lm l the mixing weights {zk}k=1 are assumed unknown or the Gompertz example by replacing the set A90 by the set {5, 10, 15, . . . . , 200}. The resulting mixed Erlang difficult to obtain. Various applications in risk theory and credit risk fall into this context (e.g., Cossette et al. approximation, denoted by the rv Wm,5:200 when m 2002, Lindskog and McNeil 2003, and McNeil et al. moments are matched (m = 3,4,5), are given by 2005; see also the application of Section 4.4).

FxW3,5:200 ()= 0.0447Hx(;45,1.3185) For a given rate parameter b > 0, let (lm, b) be the set of all mixed Erlang distributions with df + 0.2573Hx(;85,1.3185) (1) (as l → ∞) and first m moments lm. Also, define

+ 0.6980Hx(;110,1.3185), (lm, Al, b) to be the subset of (lm, b) with

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 175 Variance Advancing the Science of Risk

Figure 3. Density (left) and df (right): 5-moment approximation vs Gompertz distribution

l + ext n df (1) for a given l ∈ N , and let  (lm, Al, b) be for j = 1, . . . , m where kn = ∑k=1 zk k . In matrix T m a further subset of (lm, Al, b) such that at most form, we have Mb = sjm, where s = {s( j, n)}j,n=1 l (m + 1) of the mixing weights {zk}k=1 are non-zero. (with s( j, n) = 0 for n > j) and jm = (k1, . . . , km).

Note that a distribution in (lm, Al, b) can be Isolating jm yields expressed as a convex combination of distributions −1 T ext sM .(9) in  (lm, Al, b) (see, e.g., De Vylder 1996). j m = ()β For a given function f, we consider two approaches -1 to derive bounds and approximations for E[f(S)] when From Comtet (1974, p. 213) we know that s ≡ i+j m c = {(-1) c(i, j)}i,j=1 where the c(i, j)’s are the Stir- S ∈(lm, Al, b) (in cases when the expectation exists). The first approach is based on discrete s-convex extre- ling numbers of the second kind defined as c(i, j) -1 j j-k j i mal distributions while the second is based on moment = ( j!) ∑k=0(-1) (k)k (e.g., Abramowitz and Stegun bounds on discrete expected stop-loss transforms. 1972).

Thus, for a given b > 0, the class (lm, Al, b) can Remark 5. Naturally, the set ( , A , ) tends  lm l b be found through the identification of all discrete dis- to (lm, b) as l → ∞. As such, when S ∈(lm, b), tributions with support Al and first m moments given bounds for risk measures on (l , b) can be m by the right-hand side of (9). Equivalently, the class approximated by their counterparts in (l , A , b) m l  ext(l , A , b) can be identified by restricting the for l reasonably large. m l discrete distributions on Al to have at most (m + 1) For a mixed Erlang rv S with df (1), its j-th moment non-zero mass points. This argument is formalized is known to satisfy (6). Using the identity in the next section.

j−1 j ∏()ki+=∑ sj(),,nkn i=0 n=1 4.2. Discrete s-convex extremal distributions where the s( j, n)’s are the (signed) Stirling numbers of Let ( , A ) be all discrete distributions with the first kind (e.g., Abramowitz and Stegun 1972), (6) `m l support A and first m moments ( , , . . . , ). becomes l am = `1 a2 am ext Also, denote by  (`m, Al) the subset of (`m, Al) j j with distributions having at most (m + 1) non-zero βµjn=κ∑ sj(),,n (8) n=1 mass points. For a given b > 0, each distribution

176 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

ext in (jm, Al) (and  (jm, Al)) with jm as defined in We mention that the definition of s-convex func- (9) corresponds to a mixed Erlang distribution tion, which refers to higher-convexity, should not be ext in (lm, Al, b) (and  (lm, Al, b)). This is a confused with the one for Schur-convex function, one-to-one correspondence (e.g., De Vylder 1996, also known as S-convex function. part 2). Definition 8. Denuit, Lefèvre and Shaked (2000). Remark 6. Because of this one-to-one correspon- For two rv’s X and Y defined on , X is smaller than  dence between (jm, Al) and (lm, Al, b), condi- Y in the s-convex sense, namely Xs-cxY, if E[f(X)] tions under which (lm, Al, b) is not empty can ≤ E[f(Y)] for all s-convex functions f (provided the be found from its discrete counterpart (jm, Al). We expectation exists). refer the reader to, e.g., De Vylder (1996), Marceau We mention that the 1-convex order corresponds (1996), or Courtois and Denuit (2009). to the usual stochastic dominance order, and the This allows us to make use of the theory devel- 2-convex order is the usual convex order (see Müller oped in Prékopa (1990), Denuit and Lefèvre (1997), and Stoyan (2002), Denuit et al. (2005), and Shaked Denuit, Lefèvre and Mesfioui (1999), and Courtois and Shanthikumar (2007) for a review on stochastic et al. (2006) to derive bounds/approximations for orders). Also, as stated in Theorem 1.6.3 of Müller

E[f(S)] when S ∈(lm, Al, b). and Stoyan (2002), the s-convex order can only First, we briefly recall the definitions of s-convex be used to compare rv’s with the same first (s - 1) function and s-convex order introduced by Denuit moments (which explains why s is chosen to be m + 1 et al. (1998). in what follows). Examples of s-convex functions s+j Definition 7. Denuit, Lefèvre and Shaked (1998, are f(x) = x for j ∈N and f(x) = exp(cx) for c ≥ 0. For a general treatment of the s-convex order, 2000). Let  be a subinterval of R or a subset of N, see, e.g., Denuit, Lefèvre, and Mesfioui (1999) and and f a function on . For a positive integer s and x0 . . . Denuit, Lefèvre, and Shaked (2000) and section 1.6 < x1 < < xs ∈ , we recursively define the divided differences as of Müller and Stoyan (2002).

Let K(m+1)-min and K(m+1)-max be the (m + 1)-extremum

[]xx12, ,...,,xxk φ − []01xx,..., kk−1 φ rv’s on (jm, Al), i.e., those which satisfy []xx01, ,..., xk φ = xxk − 0 EK[]φ≤()()m+−1 min EK[]φ() k φ(xi ) = ,,k = 1, 2,...,,s ∑ k ≤φEK()m+−1max , i=1 []() ∏ ( xxij− ) jj=≠0, i for any (m + 1)-convex function f and K ∈ (jm, Al). where [xk]f = f(xk) for k = 0,1, . . . , s. The function f The general distribution forms of K(m+1)-min and K(m+1)-max is s-convex if [x0, x1, . . . , xs]f ≥ 0 for all x0 < x1 are given in Prékopa (1990) (see also Courtois et al. . . . < < xs ∈ . (2006, Section 4)), and are repeated here:

m + 1 even m + 1 odd     support of K(m+1)-min  jj11,1++,...,,jjmm++1 1 1 1, jj11,1++,...,,jjmm1  2 2   22      support of K(m+1)-max 1, jj11,1++,...,,jjmm−−1 1 1, l  jj11,1++,...,,jjmm1, l  2 2   22  (10)

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 177 Variance Advancing the Science of Risk

. . . where 1 < j1 < j1 + 1 < j2 < < l. From (10), it bound, we define the corresponding rv Km-down on Al is clear that the support of K(m+1)-min(max) has at most via the df m + 1 elements. K infKAD , EK()− k Let W = ∑ C be a mixed Erlang rv, i.e., {C }   ∈ ()j ml []+  K j=1 j j j≥1 1 −   , are a sequence of iid exponential rv’s with mean 1/b,   inf1EK k    −−KA∈D()j ml, []()− +  independent of K. The following result of Denuit, FkKm−down ()=  Lefèvre and Utev (1999, Property 5.7) relates to the  kl=−1, 2,...,1,  stability of the s-convex order under compounding. 1, kl=

Al R+ Lemma 9 If Ks-cx K′, then WKs-cxWK′. (12) We apply Lemma 9 to define the mixed Erlang for k ∈ Al. Similarly, Km-up is defined as in (12) by rv’s W and W , which are the (m + 1)- K(m+1)-min K(m+1)-max replacing ‘inf’ by ‘sup’ in the definition. Given that, extremum rv’s on (lm, Al, b). It is immediate for K ∈ (jm, Al), that, for W ∈ (lm, Al, b),

EK[]()m−downm−≤kE++[]()Kk −≤EK()−up − k , RR+ +  +  WWKm()mm+−1 min ≺≺()+−11cx ()mc+−xKW ()+−1max .(11)

for all k ∈ Al, it implies that Km-down(up) is smaller For instance, using (11), the (m + 1)-convex func- (larger) than K under the increasing convex order, tions f(x) = x m+1+j ( j ∈ N) and f(x) = exp(cx) (c ≥ 0) namely Km-down icx Kicx Km-up (see, e.g., Courtois yield and Denuit 2009). Note that Km-down and Km-up do not

mj11mj mj1 belong to (j , A ), but both have first moment k . The EW++ ≤≤ EW++ EW ++ m l 1 []K()mm+−1 min [][]K()+−1max increasing convex order is stable under compounding, and and thus

WWKi≺≺cx Kicx WK .(13) EcexpeWExp cW m−−downmup []()K()m+−1 min ≤ []() Then, from Denuit et al. (2005, Proposition 3.4.8), it ≤ Ecexp,W []()K()m+−1max follows that respectively. TVaRκκ()WWKKm−down ≤ TVaR ()

4.3. Moment bounds on discrete TVaR W ,(14) ≤ κ ()Kmu− p expected stop-loss transforms for k ∈ (0, 1). Clearly, the rv’s W and W Extrema on the (m + 1)-convex order yield bounds for Km-down Km-up will most likely not belong to (lm, Al, b). E[f(W)] when f is (m + 1)-convex and W ∈ (lm, Al, b). However, this approach is not appropriate to derive Remark 10. Note that, when neither of the two bounds on TVaR and the stop-loss premium when the aforementioned approaches are applicable, we pro- number of known moments is greater than 2. It is well pose to derive approximate bounds for E[f(S)] with known that two rv’s with the same mean and variance ext S ∈  (lm, Al, b) by calculating E[f(W)] for all cannot be compared under the convex order. ext W ∈  (lm, Al, b) and choosing Consequently, we use an approach inspired from

EW[]φ=()min() max infs()up EW[]φ(). Courtois and Denuit (2009) (see also Hürlimann 2002) ext WA∈βME ()lml,, to derive bounds on TVaR and the stop-loss premium.

We consider (jm, Al) and determine lower and upper Obviously, E[f(S)] does not necessarily lie between bounds for E[(K - k)+] for all k ∈ Al. From the lower E[f(W)]min and E[f(W)]max. However, the ‘interval’

178 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

estimate [E[f(W)]min, E[f(W)]max] may give an idea FxW ()= 0.4860Hx();1,1 K5m− ax of the variability of all solutions on (lm, Al, b). + 0.3981Hx();2,1 4.4. Example: Portfolio of dependent risks + 0.0621Hx();4,1 We consider a portfolio of n dependent risks as described in the common mixture model of Cossette + 0.0537Hx();5,1 . . . et al. (2002). Let S = X1 + + Xn be the aggregate + 0.0000Hx(); 20,1, claim amount with Xi = BiIi. Conditional on a common mixture rv Q with pmf aj = (Q = j) for j = 1, 2, . . . , FxW ()= 0.5126Hx();1,1 P K6−min n {Ii}i=1 are assumed to form a sequence of independent j + 0.3179Hx();2,1 Bernoulli rv’s with P(Ii = 1Q = j) = 1 - (ri) for ri ∈ (0, 1). As for the Bi’s, they are assumed to form a + 0.0531Hx();3,1 sequence of iid exponential rv’s of mean 1, indepen- 20 + 0.1082Hx();4,1 dent of {Ii}i=1 and Q. In this context, it is clear that S is a two-point mix- + 0.0059Hx();7,1 ture of a degenerate rv at 0 and a mixed Erlang rv of the form (1) with l = n and b = 1, i.e., its Laplace + 0.0023Hx();8,1 , transform is given by and Ee[]−−tS ≡−1 pp+ Ee[]tY FxW ()= 0.5322Hx();1,1 K6m− ax ∞ 1 ar n  j 1 r j  ,0t , =+∑ ji∏i=1()()− ()i   ≥ + 0.2264Hx();2,1 j=1   1 + t  

∞ n j + 0.2110Hx();3,1 where p = 1 - ∑j=1 aj∏i=1(ri) . We perform the moment- based approximation on the rv Y = (SS > 0) rather + 0.0004Hx();5,1 than S. 0.0299Hx;6,1 For illustrative purposes, we assume n = 20 and a + () j logarithmic pmf for Q, namely aj = (0.5) /( j ln 2) for + 0.0000Hx(); 20,1. j ≥ 1. Also, the constants ri are set such that the (uncon- Q . . . Let X be a rv with df F (x) = 1 - p ditional) mean of Ii is qi = 1 - E[(ri) ] with q1 = K(m+1)-min(max) XK(m+1)-min(max) . . . + pF (x) for x ≥ 0. It follows from Section 4.2 = q10 = 0.1 and q11 = = q20 = 0.02. Under the above WK(m+1)-min(max) assumptions, the first five moments of Y are found to be that lower (upper) bounds for the higher-order moments E[S j] (j 4, 5, . . .) and the exponential l5 = (1.7999, 6.2270, 31.4785, 208.1258, 1693.7077). = hS Using the approach on discrete (m + 1)-convex premium principle defined as jh(S) = (ln E[e ])/h extremal distributions, the dfs F and F (h > 0) can be found from their counterparts for WK WK (m+1)-min (m+1)-max X . A few numerical values are provided in for m = 4, 5 are: K(m+1)-min(max) Tables 5 and 6, respectively. FxW ()= 0.5365Hx();1,1 K5−min As expected, the bounds get sharper as the number + 0.2127Hx();2,1 of moments involved increases. As for the second approach based on moment + 0.2232Hx();3,1 bounds with discrete expected stop-loss transforms, Table 7 presents the values of TVaR for X + 0.0245Hx();6,1 Km-down(up) (m = 4, 5) with df F (x) = 1 - p + pF (x) XKm-down(up) WKm-down(up) + 0.0030Hx();7,1 , for x ≥ 0.

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 179 Variance Advancing the Science of Risk

Table 5. Higher-order moments of S, XK and XK (m = 4, 5) (m+1)-min (m+1)-max j EXj EXj ESj EXj EXj []K5m− in []K6m− in [] []K6m− ax []K5m− ax 4 138.7579 138.7579 138.7579 138.7579 138.7579 5 1125.9592 1129.1880 1129.1880 1129.1880 1149.9348 6 10748.5738 10873.8020 10881.2732 10922.7337 11993.6176

Table 6. Exponential premiums of S, XK and XK (m = 4, 5) (m+1)-min (m+1)-max X X h ϕη ()XK5m− in ϕη ()XK6m− in ϕη ()S ϕη ()K6m− ax ϕη ()K5m− ax 0.2 1.5545 1.5546 1.5546 1.5548 1.5564 0.1 1.3536 1.3536 1.3536 1.3536 1.3536 0.01 1.2137 1.2137 1.2137 1.2137 1.2137

Table 7. Values of TVaR for S, X and X (m = 4, 5) Km-down Km-up Exact 4 moments 5 moments TVaR S TVaR X TVaR X TVaR X TVaR X k κ () κ()K4−down κ ()K4−up κ()K5−down κ()K5−up 0.9 5.0696 4.9222 5.2062 4.9800 5.1490 0.95 6.2214 5.9708 6.4548 6.0594 6.3642 0.99 8.8460 8.2899 9.3301 8.4655 9.1767 0.995 9.9589 9.2500 10.5629 9.4679 10.3775 0.999 12.5066 11.4122 13.4854 11.7323 13.1382

ext Table 8. Minimal/Maximal values of VaR within ME (µm, A20, 1) (m = 4,5)

vs VaRj (S)

ext ext Exact  (µ4, A20, 1)  (µ5, A20, 1)

k VaRk (S) VaRk (X)min VaRk (X)max VaRk (X)min VaRk (X)max 0.9 3.3965 3.3896 3.4031 3.3897 3.4001 0.95 4.5704 4.5539 4.6030 4.5584 4.5730 0.99 7.2334 7.2182 7.2690 7.2251 7.2533 0.995 8.3604 8.3143 8.3946 8.3539 8.3925 0.999 10.9388 10.7191 10.9781 10.9226 10.9536

As expected, the inequality (14) is verified. Also, Also, the spread between the minimal and maxi- we observe that the interval estimate of TVaRk (S) mal values of VaR are reduced when we go from shrinks as the number of moments matched increases. 4 to 5 moments. An identical exercise for the TVaR Finally, given that neither method is applicable for risk measure resulted in the same conclusions. the VaR risk measure, we make use of the technique discussed in Remark 10. We find the numerical values References provided in Table 8. For this example, we observe that the exact val- Abramowitz, M., and I. Stegun, Handbook of Mathematical ues of VaR (S) are within the minimal and maxi- Functions, 9th ed., Washington, DC: US Government Printing k Office, 1972. mal values of their corresponding risk measures Altiok, T., “On the Phase-Type Approximations of General Dis- ext among all members of ME (lm, A20, 1) for m = 4, 5. tributions,” IIE Transactions 17:2, 1985, pp. 110–116.

180 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1 Moment-Based Approximation with Mixed Erlang Distributions

Bobbio, A., A. Horváth, and M. Telek, “Matching Three De Vylder, F. E., Advanced Risk Theory. A Self-Contained Intro- Moments with Minimal Acyclic Phase Type Distributions,” duction. Editions de l’Université Libre de Bruxelles—Swiss Stochastic Models 21, 2005, pp. 303–326. Association of Actuaries, Bruxelles, 1996. Bowers, N. L., H. U. Gerber, G. C. Hickman, D. A. Jones, and C. J. Dufresne, D., “Fitting Combinations of Exponential to Probabil- Nesbit, Actuarial Mathematics, Schaumburg, Illinois: Society ity Distributions,” Applied Stochastic Models in Business and of Actuaries, 1997. Industry 23, 2007, pp. 23–48. Chaubey, Y. P., J. Garrido, and S. Trudeau, “On the Computation Gerber, H. U., An Introduction to Mathematical Risk Theory. of Aggregate Claims Distributions: Some New Approxima- S. S. Huebner Foundation. Philadelphia: University of Penn- tions,” Insurance: Mathematics and Economics 23:3, 1998, sylvania, 1979. pp. 215–230. Hürlimann, W., “Analytical Bounds for Two Value-at-Risk Func- Cheung, E. C. K., and J. K. Woo, “On the Discounted Aggregate tionals,” ASTIN Bulletin 32, 2002, pp. 235–265. Claim Costs until Ruin in Dependent Sparre Andersen Risk Johnson, M. A., and M. R. Taaffe, “Matching Moments to Phase Processes,” Scandinavian Actuarial Journal 2016:1, 2016, Distributions: Mixtures of Erlang Distribution of Common pp. 63–91. Order,” Stochastic Models 5, 1989, pp. 711–743. Comtet, L., Advanced Combinatorics, Boston: Dordrecht- Karlin, S., and W. J. Studden, Tchebycheff Systems: With Holland, 1974. Applications in Analysis and Statistics. New York: Wiley, Cossette, H., M. Mailhot, and E. Marceau, “T-Var Based Capital 1966. Allocation for Multivariate Compound Distributions,” Insur- Landriault, D., and G. E. Willmot, “On the Joint Distributions of ance: Mathematics and Economics 50:2, 2012, pp. 247–256. the Time to Ruin, the Surplus Prior to Ruin and the Deficit at Cossette, H., P. Gaillardetz, and E. Marceau, “Common Mix- Ruin in the Classical Risk Model,” North American Actuarial tures in the Individual Risk Model,” Bulletin de l’Association Journal 13:2, 2009, pp. 252–270. suisse des actuaires, 2002, pp. 131–157. Lee, S. C., and X. S. Lin, “Modeling and Evaluating Insurance Courtois, C., and M. Denuit, “Moment Bounds on Discrete Losses via Mixtures of Erlang Distributions,” North Ameri- Expected Stop-Loss Transforms, with Applications,” Meth- can Actuarial Journal 14:1, 2010, pp. 107–130. odology and Computing in Applied Probability 11, 2009, Lee, Y. S., and T. K. Lin, “Higher-Order Cornish Fisher Expan- pp. 307–338. sion,” Applied Statistics 41, 1992, pp. 233–240. Courtois, C., and M. Denuit, “Bounds on Convex Reliability Lenart, A., “The Moments of the Gompertz Distribution and Functions with Known First Moments,” European Journal of Maximum Likelihood Estimation of its Parameters,” Scandi- Operational Research 177, 2007, pp. 365–377. navian Actuarial Journal 2014, pp. 255–277. Courtois, C., M. Denuit, and S. Van Bellegem, “Discrete Lindskog, F., and A. J. McNeil, “Common Poisson Shock Mod- S-convex Extremal Distributions: Theory and Applications,” els: Applications to Insurance and Credit Risk Modelling,” Applied Mathematics Letters 19, 2006, pp. 1367–1377. ASTIN Bulletin 33:2, 2003, pp. 209–238. Daykin, C. D., T. Pentikäinen, and H. Pesonen, Practical Risk Marceau, E., Classical Risk Theory and Schmitter’s Problems. PhD Theory for Actuaries, New York: Chapman-Hall, 1994. Thesis. Université Catholique de Louvain, Louvain-la-Neuve, Denuit, M., J. Dhaene, M. J. Goovaerts, and R. Kaas, Actuarial 1996. Theory for Dependent Risks—Measures, Orders and Models, McNeil, A. J., R. Frey, and P. Embrechts, Quantitative Risk Man- New York: Wiley, 2005. agement. Princeton University Press, Princeton, 2005. Denuit, M., and M. Lefèvre, “Some New Classes of Stochastic Melnikov, A., and Y. Romaniuk, “Evaluating the Performance Order Relations Among Arithmetic Random Variables, with of Gompertz, Makeham and Lee–Carter Mortality Models for Applications in Actuarial Sciences,” Insurance: Mathematics Risk Management with Unit-Linked Contracts,” Insurance: and Economics 20, 1997, pp. 197–214. Mathematics and Economics 39, 2006, pp. 310–329. Denuit, M., C. Lefèvre, and M. Shaked, “The s-convex Orders Milgram, M., “The Generalized Integro-Exponential Function,” among Real Random Variables, with Applications,” Math- Mathematics of Computation 44:170, 1985, pp. 443–458. ematical Inequalities and Applications 1, 1998, pp. 585–613. Müller, A., and D. Stoyan, Comparison Methods for Stochastic Denuit, M., M. Lefèvre, and M. Mesfioui, “On s-convex Stochas- Models and Risks, New York: Wiley, 2002. tic Extrema for Arithmetic Risks,” Insurance: Mathematics Osogami, T., and M. Harchol-Balter, “Closed Form Solutions for and Economics 25, 1999, pp. 143–155. Mapping General Distributions to Quasi-Minimal PH Distri- Denuit, M., C. Lefèvre, and M. Shaked, “Stochastic Convexity butions,” Performance Evaluation 63, 2006, pp. 524–552. of the Poisson Mixture Model,” Methodology and Computing Prékopa, A., “The Discrete Moment Problem and Linear Program- in Applied Probability 2:3, 2000, pp. 231–254. ming,” Discrete Applied Mathematics 27, 1990, pp. 235–254. Denuit, M., C. Lefèvre, and S. Utev, “Generalized Stochastic Ramsay, C., “A Note on the Normal Power Approximation,” Convexity and Stochastic Orderings of Mixtures,” Probabil- ASTIN Bulletin 21, 1991, pp. 147–150. ity in the Engineering and Informational Sciences 13, 1999, Seal, H. L., “Approximation to Risk Theory’s F(x,t) by Means of pp. 275–291. the Gamma Distribution,” ASTIN Bulletin 9, 1977, pp. 213–218.

VOLUME 10/ISSUE 1 CASUALTY ACTUARIAL SOCIETY 181 Variance Advancing the Science of Risk

Shaked, M., and J. G. Shanthikumar, Stochastic Orders and their Venter, G., “Transformed Beta and Gamma Distributions and Applications, 2nd ed., New York: Springer-Verlag, 2007. Aggregate Losses” Proceedings of the Casualty Actuarial Telek, M., and A. Heindl, “Matching Moments for Acyclic Dis- Society, 1983, pp. 156–193. crete and Continuous Phase-Type Distributions of Second Whitt, W., “Approximating a Point Process by a Renewal Pro- Order,” International Journal of Simulation Systems, Science cess, I: Two Basic Methods,” Operation Research 30:1, 1982, and Technology 3:3–4, 2002, pp. 47–57. pp. 125–147. Tijms, H. C., Stochastic Models: An Algorithmic Approach. Willmot, G. E., and J. K. Woo, “On the Class of Erlang Mixtures Chichester: John Wiley, 1994. with Risk Theoretic Applications,” North American Actuarial Vanden Bosch, P. M., D. C. Dietz, and E. A. Pohl, “Moment Journal 11:2, 2007, pp. 99–115. Matching Using a Family of Phase-type Distributions,” Com- Willmot, G. E., and X. S. Lin, “Risk Modelling with the Mixed munications in Statistics—Stochastic Models 16:3–4, 2000, Erlang Distribution,” Applied Stochastic Models in Business pp. 391–398. and Industry 27, 2011, pp. 2–16.

182 CASUALTY ACTUARIAL SOCIETY VOLUME 10/ISSUE 1