FITTING COMBINATIONS OF EXPONENTIALS TO PROBABILITY DISTRIBUTIONS

Daniel Dufresne Centre for Actuarial Studies University of Melbourne First version: 3 February 2005 This version: 8 November 2005

Abstract

Two techniques are described for approximating distributions on the positive half-line by combinations of exponentials. One is based on Jacobi polynomial expansions, the other on the log-beta distribution. The techniques are applied to some well-known distributions (degenerate, uniform, Pareto, lognormal and others). In theory the techniques yield sequences of combinations of exponentials that always converge to the true distribution, but their numerical performance depends on the particular distribution being approximated. An error bound is given in the case of the log-beta approximations.

Keywords: Sums of lognormals; Jacobi polynomials; Combinations of exponentials; Log-beta distribution

1. Introduction

The exponential distribution simplifies many calculations, for instance in risk theory, queueing theory, and so on. Often, combinations of exponentials also lead to simpler calculations, and, as a result, there is an interest in approximating arbitrary distributions on the positive half-line by combinations of exponentials. The class of approximating distributions considered in this paper consists of those with a distribution function $F$ which can be written as

$$1 - F(t) = \sum_{j=1}^{n} a_j e^{-\lambda_j t}, \qquad t \ge 0,\ \lambda_j > 0,\ j = 1,\dots,n,\ 1 \le n < \infty.$$

When $a_j > 0$, $j = 1,\dots,n$, this distribution is a mixture of exponentials (also called a "hyper-exponential" distribution); in this paper, however, we allow some of the $a_j$ to be negative. The class we are considering is then a subset of the rational family of distributions (also called the matrix-exponential, or Cox family), which comprises those distributions with a Laplace transform which is a rational function (a ratio of two polynomials). Another subset of the rational family is the class of phase-type distributions (Neuts, 1981), which includes the hyper-exponentials but not all combinations of exponentials. There is an expanding literature on these classes of distributions; see Asmussen & Bladt (1997) for more details and a list of references. For our purposes, the important property of combinations of exponentials is that they are dense in the set of probability distributions on $[0, \infty)$.

The problem of fitting distributions from one of the above classes to a given distribution is far from new; in particular, the fitting of phase-type distributions has attracted attention recently, see Asmussen (1997). However, this author came upon the question in particular contexts (see below) where combinations of exponentials are the ideal tool, and where there is no reason to restrict the search to hyper-exponentials or phase-type distributions. Known methods rely on least squares, moment matching, and so on. The advantage of the methods described in this paper is that they apply to all distributions, though they perform better for some distributions than for others. For a different fitting technique and other references, see the paper by Feldmann & Whitt (1998); they fit hyper-exponentials to a certain class of distributions (a subset of the class of distributions with a tail fatter than the exponential).

The references cited above are related to queueing theory, where there is a lot of interest in phase-type distributions. However, this author has come across approximations by combinations of exponentials in relation to the following three problems.

Risk theory. It has been known for some time that the probability of ruin is simpler to compute if the distribution of the claims, or that of the inter-arrival times of claims, is rational, see Dufresne (2001) for details and references. The simplifications which occur in risk theory have been well-known in the literature on random walks (see, for instance, the comments in Feller (1971)) and are also related to queueing theory.

Convolutions. The distribution of the sum of independent random variables with a lognormal or Pareto distribution is not known in simple form. Therefore, a possibility is to approximate the distribution of each of the variables involved by a combination of exponentials, and then proceed with the convolution, which is a relatively straightforward affair with combinations of exponentials. This idea is explored briefly in Section 7.

Financial mathematics. Suppose an amount of money is invested in the stock market and is used to pay an annuity to a pensioner. What is the probability that the fund will last until the pensioner dies? The answer to this question is known in closed form if the rates of return are lognormal and the lifetime of the pensioner is exponential; this is due to Yor (1992), see also Dufresne (2005).
In practice, however, the duration of human life is not exponentially distributed, and the distribution of the required "stochastic life annuity" can be found by expressing the future lifetime distribution as a combination of exponentials. The methods presented below are used to solve this problem in a separate paper (Dufresne, 2004a).

Here is the layout of the paper. Section 2 defines the Jacobi polynomials and gives some of their properties, while Section 3 shows how these polynomials may be used to obtain a convergent series of exponentials that represents a continuous distribution function exactly. In many applications it is not necessary that the approximating combination of exponentials be a true probability distribution, but in other cases this would be a problem. Two versions (A and B) of the method are described; method A yields a convergent sequence of combinations of exponentials which are not precisely density functions, while method B produces true density functions. A different idea is presented in Section 4. It is shown that sequences of log-beta distributions converge to the Dirac mass, and that this can be used to find a different sequence of combinations of exponentials which converges to any distribution on a finite interval. Sections 5 and 6 apply the methods described to the approximation of some of the usual distributions: degenerate, uniform, Pareto, lognormal, Makeham and others. The Pareto and lognormal are fat-tailed distributions, while the Makeham (used by actuaries to represent future lifetimes) has a thin tail. The performance of the methods is discussed in those sections and in the Conclusion. Section 7 shows how the results of Section 3 may be applied to approximating the distribution of the sum of independent random variables.


Notation and vocabulary. The cdf of a probability distribution is its cumulative distribution function, the ccdf is the complement of the cdf (one minus the cdf), and the pdf is the probability density function. An atom is a point $x \in \mathbb{R}$ where a measure has positive mass, or, equivalently, where its distribution function has a discontinuity. A Dirac mass is a measure which assigns mass 1 to some point $x$, and no mass elsewhere; it is denoted $\delta_x(\cdot)$. The "big oh" notation has the usual meaning: $a_n = O(b_n)$ if $\lim_{n\to\infty} a_n/b_n$ exists and is finite.

Finally, $(z)_n$ is Pochhammer's symbol:

$$(z)_0 = 1, \qquad (z)_n = z(z+1)\cdots(z+n-1), \quad n \ge 1.$$

2. Jacobi polynomials

The Jacobi polynomials are usually defined as
$$P_n^{(\alpha,\beta)}(x) = \frac{(\alpha+1)_n}{n!}\, {}_2F_1\!\left(-n,\, n+\alpha+\beta+1;\, \alpha+1;\, \frac{1-x}{2}\right), \qquad n = 0, 1, \dots,$$
for $\alpha, \beta > -1$. These polynomials are orthogonal over the interval $[-1, 1]$, for the weight function

$$(1-x)^\alpha (1+x)^\beta.$$

We will use the “shifted” Jacobi polynomials, which have the following equivalent definitions, once again for α, β > −1:

$$R_n^{(\alpha,\beta)}(x) = P_n^{(\alpha,\beta)}(2x-1)$$
$$= \frac{(\alpha+1)_n}{n!}\, {}_2F_1(-n,\, n+\alpha+\beta+1;\, \alpha+1;\, 1-x)$$
$$= (-1)^n\, \frac{(\beta+1)_n}{n!}\, {}_2F_1(-n,\, n+\alpha+\beta+1;\, \beta+1;\, x)$$
$$= \frac{(-1)^n}{n!}\, (1-x)^{-\alpha} x^{-\beta}\, \frac{d^n}{dx^n}\left[(1-x)^{\alpha+n} x^{\beta+n}\right]$$
$$= \sum_{j=0}^{n} \rho_{nj}\, x^j,$$
where
$$\rho_{nj} = \frac{(-1)^n (\beta+1)_n (-n)_j (n+\lambda)_j}{(\beta+1)_j\, n!\, j!}, \qquad \lambda = \alpha + \beta + 1.$$
The shifted Jacobi polynomials are orthogonal on $[0, 1]$, for the weight function

$$w^{(\alpha,\beta)}(x) = (1-x)^\alpha x^\beta.$$
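For readers who wish to experiment, the polynomial form $R_n^{(\alpha,\beta)}(x) = \sum_j \rho_{nj} x^j$ translates directly into code. The following is a minimal sketch (our own Python; the computations reported later in this paper were done in Mathematica, and the function names here are ours):

```python
# Sketch: evaluate the shifted Jacobi polynomial R_n^{(alpha,beta)}(x)
# on [0, 1] through the coefficients rho_{nj} defined above.
from math import factorial

def pochhammer(z, n):
    """(z)_n = z (z+1) ... (z+n-1), with (z)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def shifted_jacobi(n, alpha, beta, x):
    """R_n^{(alpha,beta)}(x) = sum_{j=0}^n rho_{nj} x^j."""
    lam = alpha + beta + 1.0
    value = 0.0
    for j in range(n + 1):
        rho = ((-1) ** n * pochhammer(beta + 1.0, n) * pochhammer(-n, j)
               * pochhammer(n + lam, j)
               / (pochhammer(beta + 1.0, j) * factorial(n) * factorial(j)))
        value += rho * x ** j
    return value

# Sanity check: the 2F1 form gives R_n(1) = (alpha+1)_n / n!,
# so for alpha = 0 the value at x = 1 is always 1.
print(shifted_jacobi(3, 0.0, 0.0, 1.0))   # prints 1.0
```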

The following result is stated in Luke (1969, p. 284) (the result is given there in terms of the $\{P_n^{(\alpha,\beta)}(x)\}$, but we have rephrased it in terms of the $\{R_n^{(\alpha,\beta)}(x)\}$).


Theorem 2.1. If $\varphi(x)$ is continuous in the closed interval $0 \le x \le 1$ and has a piecewise continuous derivative, then, for $\alpha, \beta > -1$, the Jacobi series
$$\varphi(x) = \sum_{n=0}^{\infty} c_n R_n^{(\alpha,\beta)}(x), \qquad c_n = \frac{1}{h_n}\int_0^1 \varphi(x)(1-x)^\alpha x^\beta R_n^{(\alpha,\beta)}(x)\, dx,$$
$$h_n = \int_0^1 (1-x)^\alpha x^\beta \left[R_n^{(\alpha,\beta)}(x)\right]^2 dx = \frac{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}{(2n+\lambda)\, n!\, \Gamma(n+\lambda)},$$
converges uniformly to $\varphi(x)$ in every closed subinterval of $(0, 1)$.

This means that the Fourier series

$$s_N(x) = \sum_{n=0}^{N} c_n R_n^{(\alpha,\beta)}(x)$$
converges to $\varphi(x)$ as $N \to \infty$, for all $x$ in $(0,1)$. In the case of functions with discontinuities the limit of the series is
$$\tfrac{1}{2}\left[\varphi(x-) + \varphi(x+)\right];$$
see Chapter 8 of Luke (1969) for more details. Another classical result is the following (see Higgins, 1977).

Theorem 2.2. If $\varphi \in L^2((0,1), w^{(\alpha,\beta)})$, that is, if $\varphi$ is measurable and
$$\int_0^1 \varphi^2(x)\, w^{(\alpha,\beta)}(x)\, dx < \infty,$$

then $s_N$ converges in $L^2((0,1), w^{(\alpha,\beta)})$ and

$$\lim_{N\to\infty} \left\|\varphi - s_N\right\|_{L^2((0,1),\, w^{(\alpha,\beta)})} = 0.$$

3. Fitting combinations of exponentials using Jacobi polynomials

Method A: direct Jacobi approximation of the ccdf

The goal of this section is to approximate the complement of the distribution function by a combination of exponentials:
$$\bar F(t) = 1 - F(t) = \mathbb{P}(T > t) \approx \sum_j a_j e^{-\lambda_j t}, \qquad t \ge 0. \tag{3.1}$$

This can be attempted in many ways, for instance by matching moments. What we suggest is to transform the function to be approximated in the following way:
$$g(x) = \bar F\!\left(-\frac{1}{r}\log x\right), \qquad 0 < x \le 1,$$

where $r > 0$. We have thus mapped the interval $[0,\infty)$ onto $(0,1]$, $t = 0$ corresponding to $x = 1$, and $t \to \infty$ corresponding to $x \to 0+$; since $\bar F(\infty) = 0$, it is justified to set $g(0) = 0$. If we find constants $\alpha, \beta, p$ and $\{b_k\}$ such that (informally)
$$g(x) = x^p \sum_k b_k R_k^{(\alpha,\beta)}(x), \qquad 0 < x \le 1,$$
then, letting $x = e^{-rt}$,
$$\bar F(t) = e^{-prt} \sum_k b_k R_k^{(\alpha,\beta)}(e^{-rt}) = \sum_j a_j e^{-(j+p)rt}. \tag{3.2}$$

This expression agrees with (3.1), if $\lambda_j = (j+p)r$, $j = 0, 1, 2, \dots$ (The interchange of summation order is justified for finite sums, but not necessarily for infinite ones.) Classical Hilbert space theory says that the constants $\{b_k\}$ can be found by
$$b_k = \frac{1}{h_k}\int_0^1 x^{-p} g(x) R_k^{(\alpha,\beta)}(x) (1-x)^\alpha x^\beta\, dx = \frac{r}{h_k}\int_0^\infty e^{-(\beta-p+1)rt}(1-e^{-rt})^\alpha R_k^{(\alpha,\beta)}(e^{-rt})\, \bar F(t)\, dt. \tag{3.3}$$

This is a combination of terms of the form
$$\int_0^\infty e^{-(\beta-p+j+1)rt}(1-e^{-rt})^\alpha\, \bar F(t)\, dt, \qquad j = 0, 1, \dots, k.$$

The reason for using some $p > 0$ is that it automatically gets rid of any constant which would otherwise appear in the series. If $\alpha = 0$, then this integral is the Laplace transform of $\bar F(\cdot)$, which may be expressed in terms of the Laplace transform of the distribution of $T$:

$$\int_0^\infty e^{-\nu t}\, \bar F(t)\, dt = -\frac{1}{\nu}\int_0^\infty \bar F(t)\, de^{-\nu t} = \frac{1}{\nu}\left[1 - \mathbb{E}\, e^{-\nu T}\right], \qquad \nu > 0. \tag{3.4}$$
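To make the procedure concrete when $\alpha = 0$ (in which case the formula for $h_n$ in Theorem 2.1 reduces to $h_k = 1/(2k+\beta+1)$), here is a minimal sketch of Method A that uses only values of the Laplace transform of $T$, per (3.3) and (3.4). It is our own illustration, not the Mathematica code used for the examples below, and the function names are ours:

```python
# Sketch of Method A with alpha = 0: build (a_j, lambda_j) such that
# bar F(t) is approximated by sum_j a_j exp(-lambda_j t), as in (3.1)-(3.4).
from math import exp, factorial

def pochhammer(z, n):
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def rho(n, j, beta):
    """Coefficient of x^j in R_n^{(0,beta)}(x); here lambda = beta + 1."""
    lam = beta + 1.0
    return ((-1) ** n * pochhammer(beta + 1.0, n) * pochhammer(-n, j)
            * pochhammer(n + lam, j)
            / (pochhammer(beta + 1.0, j) * factorial(n) * factorial(j)))

def method_a_ccdf(laplace, N, beta, p, r):
    """laplace(nu) must return E exp(-nu T).  Uses h_k = 1/(2k+beta+1)
    and (3.4): the transform of the ccdf is (1 - laplace(nu))/nu."""
    b = []
    for k in range(N):
        s = 0.0
        for j in range(k + 1):
            nu = (beta - p + j + 1.0) * r
            s += rho(k, j, beta) * (1.0 - laplace(nu)) / nu
        b.append(r * (2 * k + beta + 1.0) * s)
    # Regroup e^{-prt} sum_k b_k R_k(e^{-rt}) as a combination of exponentials.
    a = [sum(b[k] * rho(k, j, beta) for k in range(j, N)) for j in range(N)]
    lam = [(j + p) * r for j in range(N)]
    return a, lam

# Test on a known case: T ~ Exp(1), with Laplace transform 1/(1 + nu).
a, lam = method_a_ccdf(lambda nu: 1.0 / (1.0 + nu), 5, beta=0.0, p=0.1, r=0.7)
approx = lambda t: sum(ai * exp(-li * t) for ai, li in zip(a, lam))
print(approx(1.0))   # should be close to exp(-1) ~ 0.3679
```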

The next result is an immediate consequence of Theorem 2.1.

Theorem 3.1. Suppose $\alpha, \beta > -1$, that $\bar F(\cdot)$ is continuous on $[0, \infty)$, and that the function

$$e^{prt}\, \bar F(t) \tag{3.5}$$
has a finite limit as $t$ tends to infinity for some $p \in \mathbb{R}$ (this is always verified if $p \le 0$). Then the integral in (3.3) converges for every $k \in \mathbb{N}$ and
$$\bar F(t) = e^{-prt} \sum_{k=0}^{\infty} b_k R_k^{(\alpha,\beta)}(e^{-rt}) \tag{3.6}$$
for every $t$ in $(0, \infty)$, and the convergence is uniform over every interval $[a, b]$, $0 < a < b < \infty$.

Not all distributions satisfy the condition that the limit of (3.5) exists for some p>0. The next result, a consequence of Theorem 2.2, does not need this assumption.


Theorem 3.2. Suppose $\alpha, \beta > -1$ and that for some $p \in \mathbb{R}$ and $r > 0$
$$\int_0^\infty e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, \bar F(t)^2\, dt < \infty$$
(this is always true if $p < (\beta+1)/2$). Then
$$\lim_{N\to\infty} \int_0^\infty \left[\bar F(t) - e^{-prt}\sum_{k=0}^{N} b_k R_k^{(\alpha,\beta)}(e^{-rt})\right]^2 e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, dt = 0,$$
and (3.6) holds almost everywhere, with $\{b_k;\ k \ge 0\}$ given by (3.3).

For the purpose of approximating probability distributions by truncating the series, we will restrict our attention to $0 < p < (\beta+1)/2$.

Method B: Modified Jacobi approximations which are true distribution functions

This alternative method applies only to distributions which have a density $f(\cdot)$. It consists of (1) fitting a combination of exponentials to the square root of the density; (2) squaring it; (3) rescaling it so it integrates to 1. We thus consider an expansion of the type
$$\sqrt{f(t)} = e^{-pt} \sum_j a_j \lambda_j e^{-\lambda_j t}.$$

The square of this expression is also a combination of exponentials. The following application of Theorem 2.2 is immediate.

Theorem 3.3. Suppose $\alpha, \beta > -1$, $p \in \mathbb{R}$, $r > 0$, and that $f(\cdot)$ is the density of a random variable $T \ge 0$. Suppose moreover that
(i) $\mathbb{E}\, e^{-(\beta+1-2p)rT} < \infty$. This is always satisfied if $p \le (\beta+1)/2$.
(ii) $\mathbb{E}\,(T \wedge 1)^\alpha < \infty$. This is always satisfied if $\alpha \ge 0$.
Then
$$\sqrt{f(t)} = e^{-prt} \sum_{k=0}^{\infty} b_k R_k^{(\alpha,\beta)}(e^{-rt})$$
almost everywhere, with
$$b_k = \frac{r}{h_k}\int_0^\infty e^{-(\beta-p+1)rt}(1-e^{-rt})^\alpha R_k^{(\alpha,\beta)}(e^{-rt})\, \sqrt{f(t)}\, dt, \qquad k = 0, 1, \dots$$

When truncated, the series for $\sqrt{f}$ may be negative in places, but squaring it yields a combination of exponentials which is everywhere non-negative. It then only needs to be multiplied by the right constant (one over its integral) to become a true density function. Examples are given in Section 5; a sketch of this squaring-and-rescaling step is given below.

Observe that the integral of the squared truncated series may not converge to one. The usual $L^2$ results (e.g. Higgins, 1977) say that the norm of the truncated series converges to the norm of the limit, which in this case means that, if we let
$$\phi_N(t) = e^{-prt} \sum_{k=0}^{N} b_k R_k^{(\alpha,\beta)}(e^{-rt})$$
be the $(N+1)$-term approximation of $\sqrt{f}$, then
$$\lim_{N\to\infty} \int_0^\infty \phi_N(t)^2\, e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, dt = \int_0^\infty f(t)\, e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, dt.$$
Only in the case where $\beta + 1 - 2p = \alpha = 0$ does this guarantee that the integral of $\phi_N^2$ tends to 1.

A variation on the same theme consists of finding a combination of exponentials which approximates
$$g(t) = \int_t^\infty \sqrt{f(s)}\, ds,$$
differentiating with respect to $t$ and then squaring as before (provided of course that the integral above converges). The advantage is that the same computer code may be used as for Method A.
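The following is a minimal sketch (ours; the paper's computations were done in Mathematica) of steps (2) and (3), assuming the truncated series for $\sqrt{f}$ has been regrouped as $\sum_k c_k e^{-\mu_k t}$:

```python
# Sketch of Method B's squaring-and-rescaling step: square a combination
# of exponentials and divide by its integral so it becomes a true pdf.
from collections import defaultdict

def square_and_rescale(c, mu):
    """Given sqrt(f)(t) ~ sum_k c_k exp(-mu_k t), return (weights, rates)
    of the normalized square: a pdf sum_m w_m exp(-rate_m t), which is
    non-negative by construction and integrates to one."""
    terms = defaultdict(float)                       # rate -> coefficient
    for ck, mk in zip(c, mu):
        for cl, ml in zip(c, mu):
            terms[round(mk + ml, 12)] += ck * cl     # round: merge equal rates
    total = sum(coef / rate for rate, coef in terms.items())   # its integral
    rates = sorted(terms)
    weights = [terms[rate] / total for rate in rates]
    return weights, rates

# With base frequencies .07, .77, 1.47 (as in Example 5.2 below) there are
# 2*3 - 1 = 5 distinct rates, because .07 + 1.47 = .77 + .77.
w, r = square_and_rescale([1.0, -0.5, 0.2], [0.07, 0.77, 1.47])
print(len(r))                                        # 5
print(sum(wi / ri for wi, ri in zip(w, r)))          # 1.0: integrates to one
```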

4. Fitting combinations of exponentials using the log-beta distribution

This section considers approximating distributions over a finite interval by combinations of exponentials. It will be shown that a converging sequence of approximations can be constructed by starting with any sequence of combinations of exponentials which converges in distribution to a constant. A convenient sequence is obtained from the log-beta distribution, as will be seen below.

Suppose the goal is to find a sequence of combinations of exponentials which converges to the distribution of a variable $T$, which takes values in $[0, t_0]$ only. Suppose that a sequence $\{X_n\}$ of variables with densities which are combinations of exponentials converges in distribution to $t_0$ as $n \to \infty$; suppose moreover that $\{X_n\}$ is independent of $T$. Then $T_n = T + X_n - t_0$ converges in distribution to $T$. Denote the density of $X_n$ by
$$f_{X_n}(t) = \sum_j a_j^{(n)} \lambda_j^{(n)} e^{-\lambda_j^{(n)} t}\, 1_{\{t>0\}}.$$
Each variable $T_n$ thus has a density
$$f_{T_n}(t) = \int f_{X_n}(t-u)\, dF_{T-t_0}(u) = \sum_{j=0}^{n-1} a_j^{(n)} \lambda_j^{(n)} e^{-\lambda_j^{(n)} t} \int e^{\lambda_j^{(n)} u}\, 1_{\{u<t\}}\, dF_{T-t_0}(u).$$


The integral in the last expression has the same value for all $t \ge 0$ (but not for $t < 0$). Hence, $f_{T_n}(t)$ is a combination of exponentials for all $t \ge 0$. Since $\mathbb{P}(T_n < 0)$ tends to 0 as $n$ increases, the following result is obtained.

Theorem 4.1. Under the previous assumptions, the distributions with pdf's equal to the combinations of exponentials
$$f_{C_n}(t) = \sum_{j=0}^{n-1} b_j^{(n)} \lambda_j^{(n)} e^{-\lambda_j^{(n)} t}\, 1_{\{t \ge 0\}}, \qquad b_j^{(n)} = a_j^{(n)}\, e^{-\lambda_j^{(n)} t_0}\, \mathbb{E}\, e^{\lambda_j^{(n)} T} \big/ \mathbb{P}(T_n \ge 0)$$
converge to the distribution of $T$.

It is possible to find an upper bound for the absolute difference between the true distribution function of $T$ and the distribution function $F_{C_n}(\cdot)$ of the combination of exponentials in Theorem 4.1, when $T$ has a continuous distribution.

Theorem 4.2. Under the previous assumptions, if it is moreover assumed that $T$ has a pdf $f_T(\cdot)$,

$$\sup_z |F_T(z) - F_{C_n}(z)| \le \left[\sup_t f_T(t)\right] \mathbb{E}|X_n - t_0| + F_{T_n}(0) \le \left[\sup_t f_T(t)\right]\left[\mathbb{E}(X_n - t_0)^2\right]^{\frac12} + F_{T_n}(0).$$

Proof. First,

$$|\bar F_T(z) - \bar F_{C_n}(z)| = \left|\bar F_T(z) - \frac{\bar F_{T_n}(z)}{\bar F_{T_n}(0)}\right| \le |\bar F_T(z) - \bar F_{T_n}(z)| + \bar F_{T_n}(z)\, \frac{1 - \bar F_{T_n}(0)}{\bar F_{T_n}(0)} \le |\bar F_T(z) - \bar F_{T_n}(z)| + F_{T_n}(0).$$

Second, let $Y_n = X_n - t_0$. Then
$$|\bar F_T(z) - \bar F_{T_n}(z)| = |\bar F_T(z) - \mathbb{E}\,\bar F_T(z - Y_n)| \le \int \left|\int_{z-y}^{z} f_T(u)\, du\right| dF_{Y_n}(y) \le \left[\sup_t f_T(t)\right] \int |y|\, dF_{Y_n} = \left[\sup_t f_T(t)\right] \mathbb{E}|Y_n|.$$

In applications, the mean of $X_n$ will be close to $t_0$, and $F_{T_n}(0)$ will be small, so the error bound should be close to
$$\left[\sup_t f_T(t)\right] (\mathrm{Var}\, X_n)^{\frac12}.$$
The log-beta family of distributions will now be introduced, because it includes explicit combinations of exponentials which converge to a constant.


Definition. For parameters a, b, c > 0, the log-beta distribution has density

$$f_{a,b,c}(x) = K_{a,b,c}\, e^{-bx}(1 - e^{-cx})^{a-1}\, 1_{\{x>0\}}.$$

This law is denoted LogBeta(a, b, c). The name chosen is justified by the fact that

X ∼ LogBeta(a, b, c) ⇔ e−cX ∼ Beta(b/c, a).
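This equivalence is easy to check numerically; the following sketch (our own illustration, assuming numpy and scipy are available) samples $X$ through a beta variable and compares the sample mean with the closed-form mean given in Theorem 4.3(c) below:

```python
# Sanity check of X ~ LogBeta(a,b,c)  <=>  exp(-cX) ~ Beta(b/c, a):
# simulate U ~ Beta(b/c, a), set X = -log(U)/c, and compare moments.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(seed=1)
a, b, c = 3.0, 2.0, 0.5
u = rng.beta(b / c, a, size=200_000)
x = -np.log(u) / c                                 # samples of X
print(x.mean())                                    # Monte Carlo mean
print((digamma(b / c + a) - digamma(b / c)) / c)   # exact E X, Thm 4.3(c)
```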

The distribution is then a transformed beta distribution that includes the gamma as a limiting case, since
$$\mathrm{LogBeta}(1,b,c) = \mathrm{Exp}(b), \qquad \lim_{c\to 0+} \mathrm{LogBeta}(a,b,c) = \mathrm{Gamma}(a,b).$$
The log-beta distribution is unimodal, just like the beta distribution. (N.B. It would be possible to define the $\mathrm{LogBeta}(a,b,c)$ for $a, b > 0$ and $-b/a < c < 0$, since

$$(1 - e^{-cx})^{a-1} e^{-bx} = (-1)^{a-1}(1 - e^{cx})^{a-1} e^{-(b+(a-1)c)x}$$
is integrable over $(0,\infty)$. However, this law would then be identical to the $\mathrm{LogBeta}(a,\, b+(a-1)c,\, -c)$, and no greater generality would be achieved.) The next theorem collects some properties of the log-beta distributions; the proof may be found in the Appendix. The digamma function is defined as

$$\psi(z) = \frac{1}{\Gamma(z)}\frac{d}{dz}\Gamma(z) = \frac{d}{dz}\log\Gamma(z).$$

It is known that (Lebedev, 1972, p.7)

$$\psi(z) = -\gamma + \sum_{k=0}^{\infty}\left(\frac{1}{k+1} - \frac{1}{k+z}\right), \tag{4.1}$$
where $\gamma$ is Euler's constant. The derivatives $\psi'(\cdot), \psi''(\cdot), \psi^{(3)}(\cdot), \dots$ of the digamma function are known as the polygamma functions, and can be obtained by differentiating (4.1) term by term.

Theorem 4.3. (Properties of the log-beta distribution) Suppose $X \sim \mathrm{LogBeta}(a,b,c)$, $a, b, c > 0$.

(a) $K_{a,b,c} = \dfrac{c\, \Gamma(\frac{b}{c} + a)}{\Gamma(\frac{b}{c})\,\Gamma(a)}$.

(b) $\mathbb{E}\, e^{-sX} = \dfrac{\Gamma(\frac{b}{c} + a)\,\Gamma(\frac{b+s}{c})}{\Gamma(\frac{b+s}{c} + a)\,\Gamma(\frac{b}{c})}$, $s > -b$. If $\kappa > 0$ and $Y = \kappa X$, then $Y \sim \mathrm{LogBeta}(a, b/\kappa, c/\kappa)$.

(c) The cumulants of the distribution are
$$\kappa_n = \frac{(-1)^{n-1}}{c^n}\left[\psi^{(n-1)}\!\left(\frac{b}{c} + a\right) - \psi^{(n-1)}\!\left(\frac{b}{c}\right)\right], \qquad n = 1, 2, \dots.$$


In particular,
$$\mathbb{E}\, X = \frac{1}{c}\left[\psi\!\left(\frac{b}{c}+a\right) - \psi\!\left(\frac{b}{c}\right)\right]; \qquad \mathrm{Var}\, X = \frac{1}{c^2}\left[\psi'\!\left(\frac{b}{c}\right) - \psi'\!\left(\frac{b}{c}+a\right)\right];$$
$$\mathbb{E}(X - \mathbb{E}X)^3 = \frac{1}{c^3}\left[\psi''\!\left(\frac{b}{c}+a\right) - \psi''\!\left(\frac{b}{c}\right)\right].$$

(d) If $c > 0$ is fixed and $b = \kappa a$ for a fixed constant $\kappa > 0$, then
$$X \xrightarrow{d} x_0 = \frac{1}{c}\log\left(1 + \frac{c}{\kappa}\right) \quad \text{as } a \to \infty.$$
Moreover,
$$\lim_{a\to\infty} \mathbb{E}\, X^k = x_0^k, \qquad k = 1, 2, \dots$$

(e) If $a = n \in \mathbb{N}_+$, then
$$X \overset{d}{=} Y_0 + \cdots + Y_{n-1},$$
where $Y_j \sim \mathrm{Exp}(b + jc)$, $j = 0, \dots, n-1$, are independent, and

$$\mathbb{E}\, X = \sum_{j=0}^{n-1}\frac{1}{b+jc}, \qquad \mathrm{Var}\, X = \sum_{j=0}^{n-1}\frac{1}{(b+jc)^2}, \qquad \mathbb{E}(X - \mathbb{E}X)^3 = 2\sum_{j=0}^{n-1}\frac{1}{(b+jc)^3}.$$

(f) If a = n ∈ N+, the density of X is

$$f_{n,b,c}(x) = \sum_{j=0}^{n-1} \alpha_j \lambda_j e^{-\lambda_j x},$$
where
$$\alpha_j = \frac{c\left(\frac{b}{c}\right)_n (-1)^j}{j!\,(n-1-j)!\,(b+jc)}, \qquad \lambda_j = b + jc, \qquad j = 0, \dots, n-1.$$

(g) Under the same assumptions as in (d),

$$\frac{X - \mathbb{E}X}{\sqrt{\mathrm{Var}\, X}} \xrightarrow{d} N(0,1) \quad \text{as } a \to \infty.$$

(h) Suppose $X_1 \sim \mathrm{LogBeta}(a_1, b, c)$ and $X_2 \sim \mathrm{LogBeta}(a_2, b + a_1 c, c)$ are independent. Then $X_1 + X_2 \sim \mathrm{LogBeta}(a_1 + a_2, b, c)$.


Corollary 4.4. For $n \in \mathbb{N}_+$ and any fixed $\kappa, c, y_0 > 0$, the $\mathrm{LogBeta}(n,\, n\kappa x_0/y_0,\, cx_0/y_0)$ distribution has a pdf which is a combination of exponentials, and it tends to $\delta_{y_0}$ as $n \to \infty$, if $x_0 = \log(1 + c/\kappa)/c$. The same holds if $x_0$ is replaced with the mean of the $\mathrm{LogBeta}(n, n\kappa, c)$ distribution.

Remark. The classical result that combinations of exponentials are dense in the set of distri- butions on R+ may thus be proved as follows: approximate the given distribution by a discrete distribution with a finite number of atoms; then approximate each atom by a log-beta distribution as in Corollary 4.4.
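For readers who want to reproduce the log-beta approximations of Section 6, the following sketch (our own; the paper's computations were done in Mathematica) assembles the combination of exponentials of Corollary 4.4 from the explicit coefficients of Theorem 4.3(f), working on a log scale because the individual $\alpha_j$ are astronomically large. Here $y_0 = 1$ and $\kappa = c/(e^c - 1)$, which makes $x_0 = 1$, as in Example 6.1:

```python
# Sketch: ccdf of the LogBeta(n, n*kappa, c) combination of exponentials,
# which approximates a unit mass at y0 = 1 (Corollary 4.4).
from math import exp, lgamma, log

def logbeta_combination(n, b, c):
    """Coefficients (alpha_j, lambda_j) from Theorem 4.3(f), via lgamma:
    alpha_j = c (b/c)_n (-1)^j / (j! (n-1-j)! (b+jc)), lambda_j = b+jc."""
    q = b / c
    alphas, rates = [], []
    for j in range(n):
        log_mag = (log(c) + lgamma(q + n) - lgamma(q)
                   - lgamma(j + 1) - lgamma(n - j) - log(b + j * c))
        alphas.append((-1.0) ** j * exp(log_mag))
        rates.append(b + j * c)
    return alphas, rates

n, c = 5, 0.1
kappa = c / (exp(c) - 1.0)               # makes x0 = 1
alphas, rates = logbeta_combination(n, n * kappa, c)
# Since the pdf is sum_j alpha_j lambda_j exp(-lambda_j t), the ccdf is
# sum_j alpha_j exp(-lambda_j t); it falls from ~1 to ~0 around t = 1,
# more and more sharply as n grows.  Large n needs extended-precision
# arithmetic because the alternating coefficients nearly cancel.
for t in (0.5, 1.0, 1.5):
    print(t, sum(a * exp(-l * t) for a, l in zip(alphas, rates)))
```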

Section 6 applies Corollary 4.4 to some of the usual distributions. It is possible to be a little more explicit about the maximum error made by using the combination of exponentials obtained (see Theorem 4.2). The proof of the next result is in the Appendix.

Theorem 4.5. Suppose $X_n \sim \mathrm{LogBeta}(n,\, n\kappa x_0/t_0,\, cx_0/t_0)$ as in Corollary 4.4. Then

$$\mathrm{Var}\, X_n \sim \frac{1}{n}\, \frac{t_0^2}{x_0^2}\, \frac{1}{\kappa(\kappa+c)} \quad \text{as } n \to \infty.$$

If, moreover, the variable $T$ ($0 \le T \le t_0$) has a pdf $f_T(\cdot)$, and $\mathbb{E}\, T^{-p} < \infty$ for some $p > 1$, then $F_{T_n}(0) = O(n^{-\frac{p\wedge 2}{2}})$, and the upper bound in Theorem 4.2 becomes

$$\sup_z |F_T(z) - F_{C_n}(z)| \le \left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12} + F_{T_n}(0) \sim \left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12} = O(1/\sqrt{n}).$$

Hence, under the assumptions of this theorem, a rule of thumb is that the maximum error is approximately
$$\frac{1}{\sqrt{n}}\left[\sup_t f_T(t)\right] \frac{t_0}{x_0}\, \frac{1}{\sqrt{\kappa(\kappa+c)}}.$$

5. Examples of Jacobi approximations

We apply the ideas above to some parametric distributions. The analysis focuses on the cdf or ccdf, rather than on the pdf. For Methods A and B some trial and error is required to determine "good" parameters $\alpha, \beta, p$ and $r$. All the computations were performed with Mathematica. It is not suggested that the particular approximations presented are the best in any sense. In the case of continuous distributions, a measure of the accuracy of an approximation $\hat F$ of a distribution function $F$ is the maximum absolute difference between the two, which we denote

$$\|F - \hat F\| = \sup_{t \ge 0} |F(t) - \hat F(t)|.$$
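In the examples, this supremum was estimated numerically; a trivial grid-search helper along the following lines (our own code, not the paper's) suffices for continuous distributions that are flat beyond some point:

```python
# Estimate ||F - Fhat|| = sup_t |F(t) - Fhat(t)| by searching a fine grid
# on [0, t_max].
def sup_norm_error(F, Fhat, t_max, n_grid=100_000):
    return max(abs(F(i * t_max / n_grid) - Fhat(i * t_max / n_grid))
               for i in range(n_grid + 1))
```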

Some of the low-order approximations are shown in graphs, as their larger errors make the curves distinguishable.


Example 5.1. The degenerate distribution at x =1

In this case the Jacobi polynomials lead to highly oscillatory approximations. This can be seen in Figure 1. The log-beta approximations do better, but at the cost of a larger number of exponentials. See next section.

Example 5.2. The uniform distribution

This is not known to be the easiest distribution to approximate by exponentials. Figure 2 shows the ccdf of the U(1,2) distribution, as well as two approximations found for it using Method A. The parameters used are $\alpha = 0$, $\beta = 0$, $r = .7$, $p = .1$. The 3-term approximation of the ccdf (Method A) is

$$\bar F_3^A(t) = -0.364027\, e^{-0.07t} + 3.497327\, e^{-0.77t} - 2.137705\, e^{-1.47t}$$
and is not good. The 10-term approximation does better. Both approximations are sometimes smaller than 0, sometimes greater than 1, and sometimes increasing, all things that a ccdf should not do. The accuracies do improve with the number of terms used:
$$\|F - F_3^A\| = .31, \quad \|F - F_5^A\| = .16, \quad \|F - F_{10}^A\| = .065, \quad \|F - F_{20}^A\| = .038.$$
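Using the coefficients just displayed, the first of these accuracies can be reproduced approximately (a self-contained sketch of our own):

```python
# Reproduce ||F - F_3^A|| for the U(1,2) ccdf and the 3-term Method A
# approximation printed above.
from math import exp

def ccdf_uniform12(t):
    """ccdf of U(1,2): 1 for t < 1, then 2 - t on [1,2], then 0."""
    return 1.0 if t < 1.0 else (2.0 - t if t < 2.0 else 0.0)

def ccdf_A3(t):
    return (-0.364027 * exp(-0.07 * t) + 3.497327 * exp(-0.77 * t)
            - 2.137705 * exp(-1.47 * t))

ts = [i * 10.0 / 100_000 for i in range(100_001)]
print(max(abs(ccdf_uniform12(t) - ccdf_A3(t)) for t in ts))
# close to the value .31 reported above
```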

Method B is illustrated in Figure 3. In this particular case there is a very convenient simplification, since $\sqrt{f} = f$. The same Jacobi approximation is thus used for $\sqrt{f}$. Differentiating $\bar F_3$, squaring and rescaling yields

$$\bar F_5^B(t) = 0.01014\, e^{-0.14t} - 0.357203\, e^{-0.84t} + 10.522656\, e^{-1.54t} - 16.518849\, e^{-2.24t} + 7.343256\, e^{-2.94t}.$$

There are now $2 \times 3 - 1 = 5$ distinct frequencies $\lambda_j$. Some of the maximum errors observed are:
$$\|F - F_3^B\| = .8, \quad \|F - F_5^B\| = .26, \quad \|F - F_{11}^B\| = .28, \quad \|F - F_{21}^B\| = .11.$$
The B approximations are all true distributions (here the 11-term B approximation happens to be a little worse than the 5-term). The 5-term B approximation does slightly better than the 3-term A approximation, although it is really based on the same 3-term Jacobi polynomial. However, the 5-term A approximation has better precision, although it is not a true ccdf. Overall, the Jacobi approximation is not terribly efficient for the uniform distribution, as the maximum errors above show. The graphs also reveal that the cdf (and therefore the pdf) oscillates significantly. Other computations also indicate that the accuracy of the approximation varies with the interval chosen for the uniform distribution. The log-beta approximations are much smoother; see Section 6.

Example 5.3. The Pareto distribution

The tail of the Pareto distribution is heavier than that of a combination of exponentials, and it is not obvious a priori that the approximations we are considering would perform well. We consider a Pareto distribution with ccdf
$$\bar F(t) = \frac{1}{1+t}, \qquad t \ge 0.$$


Figure 4 compares the 3 and 5-term A approximations for the ccdf, while Figure 5 shows the same 3-term A approximation against the 5-term B approximation. The parameters are

$\alpha = 0$, $\beta = 0$, $r = .1$, $p = .1$.

The accuracy increases with the number of terms used:

$$\|F - F_3^A\| = .31, \quad \|F - F_5^A\| = .12, \quad \|F - F_{10}^A\| = .006, \quad \|F - F_{20}^A\| = .0015.$$
Finding the B approximations was easy, because the square root of the pdf of this distribution is precisely its ccdf, so the same Mathematica code could be used. The same parameters $\alpha, \beta, r$ and $p$ were used, and produced:

$$\|F - F_3^B\| = .31, \quad \|F - F_5^B\| = .24, \quad \|F - F_{11}^B\| = .018, \quad \|F - F_{21}^B\| = .0038.$$

Example 5.4. The lognormal distribution

The lognormal distribution has a tail which is heavier than the exponential, but lighter than the Pareto. This may explain why combinations of exponentials appear to do a little better with the lognormal than with the Pareto (see Figures 6 and 7). The parameters of the lognormal are 0 and .25 (meaning that the logarithm of a variable with this distribution has a normal distribution with mean 0 and variance .25). The first parameter of the lognormal has no importance (it only affects the scale of the distribution), but the numerical experiments performed appear to show that smaller values of the second parameter lead to worse approximations. Figure 6 shows that the 3-term approximation does not do well at all, but that the 5-term one is quite good (the exact and approximate curves may be difficult to distinguish). The parameters used are

$\alpha = 0$, $\beta = .5$, $r = 1$, $p = .2$, and some of the precisions obtained are

$$\|F - F_3^A\| = .11, \quad \|F - F_5^A\| = .014, \quad \|F - F_{10}^A\| = .0003, \quad \|F - F_{20}^A\| = 1.5 \times 10^{-6}.$$

The Method B approximations were found by first noting that the square root of the pdf of the Lognormal$(\mu, \sigma^2)$ distribution is a constant times the pdf of a Lognormal$(\mu+\sigma^2, 2\sigma^2)$:
$$\left[\frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\log x - \mu)^2/(2\sigma^2)}\right]^{1/2} = (8\pi\sigma^2)^{1/4}\, e^{(\sigma^2+2\mu)/4}\, \frac{1}{x\sigma\sqrt{2}\,\sqrt{2\pi}}\, e^{-[\log x - (\mu+\sigma^2)]^2/(4\sigma^2)}.$$
It is possible to find an approximating combination of exponentials by computing a Jacobi expansion for the pdf on the right, or else by finding one for the ccdf of the same distribution, and then differentiating. The two possibilities were compared, and the differences were minute. It was therefore decided to adopt the latter approach. The parameters used are

$\alpha = 0$, $\beta = -.3$, $r = .7$, $p = .1$, and some of the results found are

$$\|F - F_3^B\| = .46, \quad \|F - F_5^B\| = .23, \quad \|F - F_{11}^B\| = .0065, \quad \|F - F_{21}^B\| = .0003.$$


These are less accurate than the corresponding A approximations, but they do improve as the number of terms increases. Figure 7 compares the true ccdf with the 5- and 11-term B approximations. Overall, the approximations did better for the lognormal than for the Pareto in the specific cases chosen, but no general conclusion can be made based on two specific cases.
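The square-root identity displayed above is also easy to verify numerically. A short sketch (our own illustration):

```python
# Check: sqrt of the Lognormal(mu, s2) pdf equals
# (8*pi*s2)^(1/4) * exp((s2 + 2*mu)/4) times the Lognormal(mu+s2, 2*s2) pdf.
from math import exp, log, pi, sqrt

def ln_pdf(x, mu, s2):
    """Lognormal pdf; s2 is the variance of log X."""
    return exp(-(log(x) - mu) ** 2 / (2.0 * s2)) / (x * sqrt(2.0 * pi * s2))

mu, s2 = 0.0, 0.25                        # the parameters of Example 5.4
const = (8.0 * pi * s2) ** 0.25 * exp((s2 + 2.0 * mu) / 4.0)
for x in (0.5, 1.0, 2.0):
    print(sqrt(ln_pdf(x, mu, s2)), const * ln_pdf(x, mu + s2, 2.0 * s2))
    # the two printed values agree at each x
```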

Example 5.5. The Makeham distribution

This distribution is used to represent the lifetime of insureds. It has force of mortality
$$\mu_x = A + Bc^x.$$

We chose the parameters used in Bowers et al. (1997, p.78):

$A = .0007$, $B = 5 \times 10^{-5}$, $c = 10^{.04}$.

Figure 8 shows how the 3 and 7-term approximations do. The parameters chosen were

$\alpha = 0$, $\beta = 0$, $r = .08$, $p = .2$, and higher-degree Jacobi approximations have accuracies

$$\|F - F_3^A\| = .08, \quad \|F - F_5^A\| = .04, \quad \|F - F_{10}^A\| = .0065, \quad \|F - F_{20}^A\| = .00024.$$

The accuracy is worse than with the lognormal. An explanation may be that the tail of the Makeham is much thinner than that of the exponential.

6. Examples of log-beta approximations

Example 6.1. The degenerate distribution at x =1

Figure 9 shows two log-beta approximations for a Dirac mass at $x = 1$. The combinations of exponentials used are the $\mathrm{LogBeta}(n, n\kappa, c)$ distributions with $n = 20$ and 200, $c = .1$ and $\kappa = .1/(e^{.1} - 1)$. The approximate ccdf's are very smooth, by comparison with the Jacobi approximations (Figure 1).

Example 6.2. The uniform distribution

The pdf and ccdf of some log-beta approximations for the U(1,2) distribution are shown in Figures 10 and 11. The $\mathrm{LogBeta}(n, n\kappa, c)$ distribution with $n = 20$ and 200, $c = .1$ and $\kappa = .1/(e^{.2} - 1)$ was used. The approximations do not oscillate at all, as the Jacobi approximations do (Figures 2 and 3). The precisions attained were

$$\|F_T - F_{C_{20}}\| = .178, \qquad \|F_T - F_{C_{200}}\| = .0567.$$

The error bounds given by Theorem 4.2 are 0.451 and 0.142, respectively. In this case, $\mathbb{E}\, T^{-p} < \infty$ for any $p$, and $\mathbb{P}(T_n < 0)$ (which is a term in the error bound in Theorem 4.2) tends to 0 very


quickly: $\mathbb{P}(T_{20} < 0) = .000265$, while $\mathbb{P}(T_{200} < 0)$ is of the order of $10^{-9}$. The asymptotic expression for $\mathrm{Var}\, X_n$ (Theorem 4.5) is extremely good here:
$$\mathrm{Var}\, X_{20} = 0.202698, \qquad \frac{1}{20}\,\frac{1}{\kappa(\kappa+c)} = 0.200668;$$
$$\mathrm{Var}\, X_{200} = 0.020087, \qquad \frac{1}{200}\,\frac{1}{\kappa(\kappa+c)} = 0.0200668.$$

Example 6.3. Squared sine distribution

In order to test this method for a density which may not be as friendly as the previous two, consider the pdf
$$f_T(t) = 2\sin^2(2\pi t)\, 1_{(0,1)}(t).$$
The $\mathrm{LogBeta}(n, n\kappa, c)$ distribution with $n = 20$ and 200, $c = .1$ and $\kappa = .1/(e^{.1} - 1)$ was used. The results are shown in Figures 12 and 13. The precisions were

$$\|F_T - F_{C_{20}}\| = .125, \qquad \|F_T - F_{C_{200}}\| = .0296.$$

The error bounds given by Theorem 4.2 are .521 and .142, respectively. In this case $\mathbb{E}\, T^{-2} < \infty$, $\mathbb{P}(T_{20} < 0) = 0.0725$, and $\mathbb{P}(T_{200} < 0) = .00590$. The asymptotic expression for $\mathrm{Var}\, X_n$ (Theorem 4.5) is excellent in this case too:
$$\mathrm{Var}\, X_{20} = 0.0502929, \qquad \frac{1}{20}\,\frac{1}{\kappa(\kappa+c)} = 0.0500417;$$
$$\mathrm{Var}\, X_{200} = 0.00500667, \qquad \frac{1}{200}\,\frac{1}{\kappa(\kappa+c)} = 0.00500417.$$

7. Application to sums of independent variables

Textbook examples of explicit distributions of sums of independent random variables are the normal and the gamma distributions, in the latter case only if the scale parameters are the same. In applications, however, convolutions of other distributions routinely appear, and some numerical technique is needed to approximate the distribution of the sum of two random variables. This explains why numerical convolutions (or simulations) are often required. The convolutions of combinations of exponentials being relatively easy to calculate, one might ask whether the approximations of Section 5 could be used to approximate convolutions of distributions. Table 1 presents the results of some experiments performed with the lognormal distribution. The exact distribution function of the sum of two independent lognormal random variables with first parameter 0 and second parameter $\sigma^2$ was computed numerically, as well as its $N$-term combination-of-exponentials approximation (the convolution of the approximation found as in Section 5). The last column shows the maximum difference between the distribution functions. It is interesting to note that under Method A the error for the convolution is very nearly double what it is for the original distribution, while under Method B the two errors are about the same. This author does not have a good explanation for this difference between the methods, and does not know whether the same holds more generally. More work is required here.


Another possible technique is the following. Section 3 showed that when $\alpha = 0$ the Fourier coefficients are a weighted sum of values of the Laplace transform of the ccdf of the distribution (see (3.3)). The latter is given by (3.4). Hence, the Jacobi expansion for the ccdf of the convolution of probability distributions may be found directly from the Laplace transforms of each of the individual distributions, since
$$\mathbb{E}\, e^{-\mu(T_1 + \cdots + T_m)} = \prod_{k=1}^{m} \mathbb{E}\, e^{-\mu T_k}.$$
For more on sums of lognormals, the reader is referred to Dufresne (2004b).
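For completeness, here is why the convolution step itself is "a relatively straightforward affair" once each pdf is a combination of exponentials (a sketch of our own; it assumes all rates are distinct, since a repeated rate would produce a $t e^{-\lambda t}$ term outside this representation):

```python
# If f1(t) = sum_i w_i exp(-l_i t) and f2(t) = sum_j v_j exp(-m_j t), then
# for l_i != m_j,
#   (f1 * f2)(t) = sum_{i,j} w_i v_j (exp(-m_j t) - exp(-l_i t)) / (l_i - m_j),
# again a combination of exponentials.
from collections import defaultdict

def convolve(w, l, v, m):
    """Convolve two pdfs given as combinations of exponentials; all the
    rates l_i and m_j are assumed distinct."""
    out = defaultdict(float)                 # rate -> weight
    for wi, li in zip(w, l):
        for vj, mj in zip(v, m):
            out[mj] += wi * vj / (li - mj)
            out[li] -= wi * vj / (li - mj)
    return list(out.values()), list(out.keys())

# Example: Exp(1) * Exp(2) should give 2(e^{-t} - e^{-2t}).
weights, rates = convolve([1.0], [1.0], [2.0], [2.0])
print(sorted(zip(rates, weights)))           # [(1.0, 2.0), (2.0, -2.0)]
```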

Table 1. Accuracy of convolutions of lognormals

$\sigma^2$   Approx.   $N$ terms   $\|F - F_N\|$   $\|F^{*2} - F_N^{*2}\|$
1            A         3           .050            .102
1            A         7           .018            .037
1            A         11          .003            .006
.25          A         3           .105            .204
.25          A         7           .004            .008
.25          A         11          .00025          .0005
.25          B         5           .223            .205
.25          B         9           .050            .051
.25          B         13          .0015           .001

Conclusion

Two advantages of the Jacobi Methods A and B presented are, first, their simplicity, and, second, the fact that all that is needed is the Laplace transform of the distribution at a finite number of points. Two disadvantages may be mentioned. First, although the sequence converges to the true value of the distribution function at all its continuity points, the truncated series under Method A is in general not a true distribution function. Second, distributions which have atoms are more difficult to approximate. A numerical problem which arises with these methods is the magnitude of the Fourier coefficients as the order of the approximation increases; for instance, some 20-term approximations have been seen to include coefficients of the order of $10^{25}$. For completely monotone distributions, there is an alternative method, presented in Feldmann & Whitt (1998) (but going back to de Prony (1795)) which yields a hyper-exponential approximation (this technique applies to some Weibull distributions and to the Pareto distribution, but not to the lognormal or uniform). The coefficients of a hyper-exponential being all positive and summing to one, they are necessarily all small.

The log-beta family includes explicit sequences which are combinations of exponentials and also converge to the Dirac mass at any $y_0 \ge 0$; as shown in Section 4, this leads to a technique

for approximating any distribution on a finite interval by combinations of exponentials. These approximations do not have the oscillatory behaviour of the Jacobi approximations. The examples in Section 6 show that it is possible to use log-beta combinations of exponentials of fairly high order (the author has used $n = 600$ in some cases) without incurring numerical difficulties. The precisions achieved are modest, by comparison with the Jacobi methods. I have found, however, that the log-beta method required far less trial and error than the Jacobi methods. The main disadvantage of the log-beta approximations is that they apply only to distributions over a finite interval. A positive point is, however, that there is a simple, computable error bound (Theorems 4.2 and 4.5) for the log-beta approximations. Another advantage is that the log-beta approximating pdf's are automatically positive everywhere.

APPENDIX: PROOFS OF THEOREMS 4.3 AND 4.5

Proof of Theorem 4.3. Parts (a) and (b) follow from (if $s > -b$)
$$\int_0^\infty e^{-(b+s)x}(1 - e^{-cx})^{a-1}\, dx = \frac{1}{c}\int_0^1 u^{\frac{b+s}{c}-1}(1-u)^{a-1}\, du = \frac{\Gamma(\frac{b+s}{c})\,\Gamma(a)}{c\, \Gamma(\frac{b+s}{c} + a)}.$$

Part (c) is obtained by differentiating the logarithm of the Laplace transform in the usual way. To prove the convergence in distribution in (d), one approach is to recall that $e^{-cX}$ has a beta distribution, which yields explicit formulas for the mean and variance of that variable. The mean equals $\kappa/(c+\kappa)$, and the variance tends to 0. Alternatively, use the formula (Lebedev, 1972, p. 15)

$$\frac{\Gamma(z+\alpha)}{\Gamma(z+\beta)} = z^{\alpha-\beta}\left[1 + \frac{(\alpha-\beta)(\alpha+\beta-1)}{2z} + O(|z|^{-2})\right] \quad \text{as } |z| \to \infty,$$
where $\alpha$ and $\beta$ are arbitrary constants and $|\arg z| \le \pi - \delta$, for some $\delta > 0$. It can be seen that it implies
$$\lim_{a\to\infty} \mathbb{E}\, e^{-sX} = \lim_{a\to\infty} \frac{\Gamma(\frac{\kappa a}{c} + a)\,\Gamma(\frac{\kappa a + s}{c})}{\Gamma(\frac{\kappa a + s}{c} + a)\,\Gamma(\frac{\kappa a}{c})} = \left(\frac{\kappa}{\kappa+c}\right)^{s/c}$$
for all $s \in \mathbb{R}$. The convergence of moments results from the convergence of cumulants, which itself is a consequence of the expression in (c) together with the asymptotic formulas (Abramowitz & Stegun, 1972, Chapter 6):

$$\psi(z) \sim \log z - \frac{1}{2z} - \frac{1}{12z^2} + \cdots, \qquad \psi^{(n)}(z) \sim (-1)^{n-1}\left[\frac{(n-1)!}{z^n} + \frac{n!}{2z^{n+1}} + \cdots\right], \quad n = 1, 2, \dots \tag{A.1}$$

(valid as $z \to \infty$ if $|\arg z| < \pi$).

Next, turn to (e). If $a = n \in \mathbb{N}_+$, then
$$\mathbb{E}\, e^{-sX} = \frac{\left(\frac{b}{c}\right)_n}{\left(\frac{b+s}{c}\right)_n} = \frac{b}{b+s}\cdot\frac{b+c}{b+c+s}\cdots\frac{b+(n-1)c}{b+(n-1)c+s}.$$


This proves the representation of X as the sum of independent exponential variables. The mean, variance and third central moment then follow by the additivity of cumulants, or else by using (4.1) and part (c) of this theorem. The expression in (f) is found by expanding the density using the binomial theorem. To prove (g), first observe that, using (b), given κ>0,

X ∼ LogBeta(a, κa, c) ⇔ κX ∼ LogBeta(a, a, c/κ).

It is then sufficient to prove the result for κ =1. First let a = n ∈ N, and express X as

$$X \overset{d}{=} Y_{n,0} + \cdots + Y_{n,n-1},$$
where the variables on the right are independent, and $Y_{n,j} \sim \mathrm{Exp}(n + jc)$. We will use the Lyapounov central limit theorem (Billingsley, 1987, p. 371), which says that the normal limit holds if
$$\lim_{n\to\infty} \frac{1}{S_n^{2+\delta}} \sum_{j=0}^{n-1} \mathbb{E}|Y_{n,j} - \mathbb{E}Y_{n,j}|^{2+\delta} = 0 \tag{A.2}$$
for some $\delta > 0$, with
$$S_n^2 = \sum_{j=0}^{n-1} \mathrm{Var}\, Y_{n,j}.$$
Let $\delta = 2$. If $Y \sim \mathrm{Exp}(1)$, then

$$\mathrm{Var}\, Y = 1, \qquad \mathbb{E}(Y-1)^4 = 9.$$

This means that
$$\mathrm{Var}\, Y_{n,j} = \frac{1}{(n+jc)^2}, \qquad \mathbb{E}(Y_{n,j} - \mathbb{E}Y_{n,j})^4 = \frac{9}{(n+jc)^4}.$$
Then
$$S_n^2 \ge \frac{1}{n(1+c)^2}, \qquad \sum_{j=0}^{n-1}\mathbb{E}(Y_{n,j} - \mathbb{E}Y_{n,j})^4 \le \frac{9}{n^3},$$
which implies (A.2). For arbitrary $a$, let $n = \lfloor a \rfloor$, $r = a - n$, and note that
$$\mathbb{E}\, e^{-sX} = \left[\prod_{j=0}^{n-1}\frac{a+(r+j)c}{a+(r+j)c+s}\right]\frac{\Gamma(\frac{a}{c}+r)\,\Gamma(\frac{a+s}{c})}{\Gamma(\frac{a+s}{c}+r)\,\Gamma(\frac{a}{c})}.$$

Hence,
$$X \overset{d}{=} Y_{a,0} + \cdots + Y_{a,n-1} + Y_{a,n},$$
where $Y_{a,j} \sim \mathrm{Exp}(a + (r+j)c)$, $j = 0, \dots, n-1$, and $Y_{a,n} \sim \mathrm{LogBeta}(r, a, c)$. The Lyapounov central limit theorem can again be applied, the only real difference being the additional term $Y_{a,n}$. This term does not change the limit, since it can be observed (see (4.1) and part (c)) that
$$\mathbb{E}\, Y_{a,n} < \frac{1}{a}, \qquad \mathrm{Var}\, Y_{a,n} < \frac{1}{a^2}.$$


This implies that

$$\lim_{a\to\infty}\frac{\mathrm{Var}(Y_{a,0}+\cdots+Y_{a,n-1})}{\mathrm{Var}(Y_{a,0}+\cdots+Y_{a,n})} = 1, \qquad \frac{Y_{a,n} - \mathbb{E}\, Y_{a,n}}{\sqrt{\mathrm{Var}(Y_{a,0}+\cdots+Y_{a,n})}} \xrightarrow{d} 0 \quad \text{as } a \to \infty.$$

Hence, including or excluding $Y_{a,n}$ in the normalised expression for $X$ does not change the limit distribution. Finally, (h) may be seen to follow from writing the product of the Laplace transforms of $X_1$ and $X_2$:

$$\frac{\Gamma(\frac{b}{c}+a_1)\,\Gamma(\frac{b+s}{c})}{\Gamma(\frac{b+s}{c}+a_1)\,\Gamma(\frac{b}{c})} \times \frac{\Gamma(\frac{b}{c}+a_1+a_2)\,\Gamma(\frac{b+s}{c}+a_1)}{\Gamma(\frac{b+s}{c}+a_1+a_2)\,\Gamma(\frac{b}{c}+a_1)} = \frac{\Gamma(\frac{b}{c}+a_1+a_2)\,\Gamma(\frac{b+s}{c})}{\Gamma(\frac{b+s}{c}+a_1+a_2)\,\Gamma(\frac{b}{c})},$$
or else may be obtained by performing the convolution explicitly: for $y > 0$,
$$\int_0^y e^{-b(y-x)}(1-e^{-c(y-x)})^{a_1-1}\, e^{-(b+a_1c)x}(1-e^{-cx})^{a_2-1}\, dx$$
$$= e^{-by}\int_0^y e^{-cx}(e^{-cx}-e^{-cy})^{a_1-1}(1-e^{-cx})^{a_2-1}\, dx$$
$$= \frac{e^{-by}}{c}\int_{e^{-cy}}^1 (u-e^{-cy})^{a_1-1}(1-u)^{a_2-1}\, du$$
$$= \frac{1}{c}\, e^{-by}(1-e^{-cy})^{a_1+a_2-1}\int_0^1 v^{a_1-1}(1-v)^{a_2-1}\, dv.$$

Proof of Theorem 4.5. Applying Theorem 4.3 (c),

$$\mathrm{Var}\, X_n = \frac{t_0^2}{c^2 x_0^2}\left[\psi'\!\left(\frac{n\kappa}{c}\right) - \psi'\!\left(\frac{n\kappa}{c} + n\right)\right].$$
The asymptotic behaviour of the function $\psi'(z)$ as $z \to \infty$ is given by (A.1), and implies
$$\mathrm{Var}\, X_n \sim \frac{t_0^2}{c^2 x_0^2}\left[\frac{c}{n\kappa} - \frac{c}{n(\kappa+c)}\right] = \frac{1}{n}\,\frac{t_0^2}{x_0^2}\,\frac{1}{\kappa(\kappa+c)}.$$

Next, suppose $\mathbb{E}\, T^{-p} < \infty$ for some $p > 1$. If $p > 2$, then necessarily $\mathbb{E}\, T^{-2} < \infty$, and $p$ may be replaced with 2 in what follows. We have
$$\mathbb{P}(T_n \le 0) = \mathbb{P}(T + X_n - t_0 \le 0) = \int_0^{t_0} \mathbb{P}(X_n - t_0 \le -t)\, f_T(t)\, dt \le \int_0^{t_0} \mathbb{P}(|X_n - t_0| \ge t)\, f_T(t)\, dt \le \int_0^{t_0} \frac{f_T(t)}{t^p}\, \mathbb{E}|X_n - t_0|^p\, dt$$
by Markov's inequality. Now, let the $L^r$-norm of a random variable $Y$ be
$$\|Y\|_r = \left[\mathbb{E}(|Y|^r)\right]^{1/r}.$$

19 Fitting combinations of exponentials

By applying Hölder's inequality,
$$\mathbb{E}|X_n - t_0|^p = \|X_n - t_0\|_p^p \le \|X_n - t_0\|_2^p = \left[\mathbb{E}(X_n - t_0)^2\right]^{p/2},$$
and, moreover,
$$\mathbb{E}(X_n - t_0)^2 = \mathrm{Var}\, X_n + (\mathbb{E}X_n - t_0)^2.$$

We now need to know the asymptotic behaviour of $\mathbb{E}X_n - t_0$ as $n \to \infty$. From Theorem 4.3(c) and (A.1),
$$\mathbb{E}X_n - t_0 = \frac{t_0}{cx_0}\left[\psi\!\left(\frac{n\kappa}{c}+n\right) - \psi\!\left(\frac{n\kappa}{c}\right)\right] - t_0 \sim \frac{t_0}{cx_0}\left[\log\left(\frac{n\kappa}{c}+n\right) - \log\frac{n\kappa}{c}\right] - \frac{t_0}{2x_0}\left[\frac{1}{n\kappa+cn} - \frac{1}{n\kappa}\right] - t_0$$
$$= \frac{t_0}{cx_0}\log\frac{\kappa+c}{\kappa} + \frac{ct_0}{2nx_0}\,\frac{1}{\kappa(\kappa+c)} - t_0 = \frac{ct_0}{2nx_0}\,\frac{1}{\kappa(\kappa+c)}.$$
Putting all this together, we get

$$\mathbb{E}|X_n - t_0|^p = O(n^{-p/2}).$$

This implies that

$$\left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12} + F_{T_n}(0) \sim \left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12}.$$

References

Abramowitz, M., and Stegun, I.A. (1972). Handbook of Mathematical Functions. (Tenth printing.) Dover, New York.

Asmussen, S. (1997). Phase-type distributions and related point processes: Fitting and recent advances. In: Matrix-Analytic Methods in Stochastic Models, Eds. S.R. Chakravarthy and A.S. Alfa, Marcel Dekker Inc., New York.

Asmussen, S., and Bladt, M. (1997). Renewal theory and queueing algorithms for matrix-exponential distributions. In: Matrix-Analytic Methods in Stochastic Models, Eds. S.R. Chakravarthy and A.S. Alfa, Marcel Dekker Inc., New York.

Billingsley, P. (1987). Probability and Measure. (Second Edition.) Wiley, New York.

Bowers, N.L., Gerber, H.U., Hickman, J.C., Jones, D.A., and Nesbitt, C.J. (1997). Actuarial Mathematics. (Second Edition.) Society of Actuaries, Itasca, Illinois.

Dufresne, D. (2001). On a general class of risk models. Australian Actuarial Journal 7: 755-791.

Dufresne, D. (2004a). Stochastic life annuities. Research Paper no. 118, Centre for Actuarial Studies, University of Melbourne, 14 pages. Available at: http://www.economics.unimelb.edu.au/actwww/wps2004.html.


Dufresne, D. (2004b). The log-normal approximation in financial and other computations. Advances in Applied Probability 36: 747-773.

Dufresne, D. (2005). Bessel Processes and a Functional of Brownian Motion. In: Numerical Methods in Finance, H. Ben-Ameur and M. Breton (Eds.), Kluwer Academic Publisher.

Feldmann, A., and Whitt, W. (1998). Fitting mixtures of exponentials to long-tail distributions to analyze network performance models. Performance Evaluation 31: 245-279.

Feller, W. (1971). An Introduction to Probability Theory and its Applications, II. (Second Edition.) Wiley, New York.

Higgins, J.R. (1977). Completeness and Basis Properties of Sets of Special Functions. Cambridge University Press.

Lebedev, N.N. (1972). Special Functions and their Applications. Dover, New York.

Luke, Y.L. (1969). The Special Functions and their Approximations. Academic Press, New York.

Neuts, M.F. (1981). Matrix-Geometric Solutions in Stochastic Models. The Johns Hopkins University Press, Baltimore.

de Prony, Baron Gaspard Riche (1795). Essai expérimental et analytique: sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. Journal de l'École Polytechnique 1, cahier 22: 24-76. (Available at www.essi.fr/leroux/PRONY.pdf.)

Yor, M. (1992). On some exponential functionals of Brownian motion. Adv. Appl. Prob. 24: 509-531. (Reproduced in Yor (2001).)

Daniel Dufresne Centre for Actuarial Studies Department of Economics University of Melbourne Tel.: (0)3 8344 5324 Fax: (0)3 8344 6899 [email protected]

[Figure 1. Example 5.1, degenerate distribution: exact ccdf vs. 15- and 30-term A approximations.]
[Figure 2. Example 5.2, uniform distribution: exact ccdf vs. 3- and 10-term A approximations.]
[Figure 3. Example 5.2, uniform distribution: exact ccdf vs. 3-term A and 5-term B approximations.]
[Figure 4. Example 5.3, Pareto distribution: exact ccdf vs. 3- and 5-term A approximations.]
[Figure 5. Example 5.3, Pareto distribution: exact ccdf vs. 3-term A and 5-term B approximations.]
[Figure 6. Example 5.4, lognormal distribution: exact ccdf vs. 3- and 5-term A approximations.]
[Figure 7. Example 5.4, lognormal distribution: exact ccdf vs. 5- and 9-term B approximations.]
[Figure 8. Example 5.5, Makeham distribution: exact ccdf vs. 3- and 7-term A approximations.]
[Figure 9. Example 6.1, degenerate distribution: exact ccdf vs. 20- and 400-term log-beta approximations.]
[Figure 10. Example 6.2, uniform distribution: exact pdf vs. 20- and 200-term log-beta approximations.]
[Figure 11. Example 6.2, uniform distribution: exact ccdf vs. 20- and 200-term log-beta approximations.]
[Figure 12. Example 6.3, squared sine distribution: exact pdf vs. 20- and 200-term log-beta approximations.]
[Figure 13. Example 6.3, squared sine distribution: exact ccdf vs. 20- and 200-term log-beta approximations.]