FITTING COMBINATIONS OF EXPONENTIALS TO PROBABILITY DISTRIBUTIONS

Daniel Dufresne Centre for Actuarial Studies University of Melbourne First version: 3 February 2005 This version: 8 November 2005

Abstract

Two techniques are described for approximating distributions on the positive half-line by combinations of exponentials. One is based on Jacobi polynomial expansions, the other on the log-beta distribution. The techniques are applied to some well-known distributions (degenerate, uniform, Pareto, lognormal and others). In theory the techniques yield sequences of combinations of exponentials that always converge to the true distribution, but their numerical performance depends on the particular distribution being approximated. An error bound is given in the case of the log-beta approximations.

Keywords: Sums of lognormals; Jacobi polynomials; Combinations of exponentials; Log-beta distribution

1. Introduction

The exponential distribution simplifies many calculations, for instance in risk theory, queueing theory, and so on. Often, combinations of exponentials also lead to simpler calculations, and, as a result, there is an interest in approximating arbitrary distributions on the positive half-line by combinations of exponentials. The class of approximating distributions considered in this paper consists of those with a distribution function $F$ which can be written as

$$1 - F(t) = \sum_{j=1}^{n} a_j e^{-\lambda_j t}, \qquad t \ge 0,\ \lambda_j > 0,\ j = 1,\dots,n,\ 1 \le n < \infty.$$

When $a_j > 0$, $j = 1,\dots,n$, this distribution is a mixture of exponentials (also called a "hyper-exponential" distribution); in this paper, however, we allow some of the $a_j$ to be negative. The class we are considering is then a subset of the rational family of distributions (also called the matrix-exponential, or Cox family), which comprises those distributions with a Laplace transform which is a rational function (a ratio of two polynomials). Another subset of the rational family is the class of phase-type distributions (Neuts, 1981), which includes the hyper-exponentials but not all combinations of exponentials. There is an expanding literature on these classes of distributions; see Asmussen & Bladt (1997) for more details and a list of references. For our purposes, the important property of combinations of exponentials is that they are dense in the set of probability distributions on $[0, \infty)$.

The problem of fitting distributions from one of the above classes to a given distribution is far from new; in particular, the fitting of phase-type distributions has attracted attention recently, see Asmussen (1997). However, this author came upon the question in particular contexts (see below) where combinations of exponentials are the ideal tool, and where there is no reason to restrict the search to hyper-exponentials or phase-type distributions. Known methods rely on least squares, moment matching, and so on. The advantage of the methods described in this paper is that they apply to all distributions, though they perform better for some distributions than for others. For a different fitting technique and other references, see the paper by Feldmann & Whitt (1998); they fit hyper-exponentials to a certain class of distributions (a subset of the class of distributions with a tail fatter than the exponential).

The references cited above are related to queueing theory, where there is a lot of interest in phase-type distributions. However, this author has come across approximations by combinations of exponentials in relation to the following three problems.

Risk theory. It has been known for some time that the probability of ruin is simpler to compute if the distribution of the claims, or that of the inter-arrival times of claims, is rational, see Dufresne (2001) for details and references. The simplifications which occur in risk theory have been well-known in the literature on random walks (see, for instance, the comments in Feller (1971)) and are also related to queueing theory.

Convolutions. The distribution of the sum of independent random variables with a lognormal or Pareto distribution is not known in simple form. Therefore, a possibility is to approximate the distribution of each of the variables involved by a combination of exponentials, and then proceed with the convolution, which is a relatively straightforward affair with combinations of exponentials. This idea is explored briefly in Section 7.

Financial mathematics. Suppose an amount of money is invested in the stock market and is used to pay an annuity to a pensioner. What is the probability that the fund will last until the pensioner dies? The answer to this question is known in closed form if the rates of return are lognormal and the lifetime of the pensioner is exponential; this is due to Yor (1992), see also Dufresne (2005).
In practice, however, the duration of human life is not exponentially distributed, and the distribution of the required "stochastic life annuity" can be found by expressing the future lifetime distribution as a combination of exponentials. The methods presented below are used to solve this problem in a separate paper (Dufresne, 2004a).

Here is the layout of the paper. Section 2 defines the Jacobi polynomials and gives some of their properties, while Section 3 shows how these polynomials may be used to obtain a convergent series of exponentials that represents a continuous distribution function exactly. In many applications it is not necessary that the approximating combination of exponentials be a true probability distribution, but in other cases this would be a problem. Two versions (A and B) of the method are described; method A yields a convergent sequence of combinations of exponentials which are not precisely density functions, while method B produces true density functions. A different idea is presented in Section 4. It is shown that sequences of log-beta distributions converge to the Dirac mass, and that this can be used to find a different sequence of combinations of exponentials which converges to any distribution on a finite interval. Sections 5 and 6 apply the methods described to the approximation of some of the usual distributions: degenerate, uniform, Pareto, lognormal, Makeham and others. The Pareto and lognormal are fat-tailed distributions, while the Makeham (used by actuaries to represent future lifetimes) has a thin tail. The performance of the methods is discussed in those sections and in the Conclusion. Section 7 shows how the results of Section 3 may be applied to approximating the distribution of the sum of independent random variables.


Notation and vocabulary. The cdf of a probability distribution is its cumulative distribution function, the ccdf is the complement of the cdf (one minus the cdf), and the pdf is the probability density function. An atom is a point $x \in \mathbb{R}$ where a measure has positive mass, or, equivalently, where its distribution function has a discontinuity. A Dirac mass is a measure which assigns mass 1 to some point $x$, and no mass elsewhere; it is denoted $\delta_x(\cdot)$. The "big oh" notation has the usual meaning: $a_n = O(b_n)$ if $\lim_{n\to\infty} a_n/b_n$ exists and is finite.

Finally, $(z)_n$ is Pochhammer's symbol:

$$(z)_0 = 1, \qquad (z)_n = z(z+1)\cdots(z+n-1), \quad n \ge 1.$$

2. Jacobi polynomials

The Jacobi polynomials are usually defined as
$$P_n^{(\alpha,\beta)}(x) = \frac{(\alpha+1)_n}{n!}\, {}_2F_1\!\left(-n,\, n+\alpha+\beta+1;\, \alpha+1;\, \frac{1-x}{2}\right), \qquad n = 0, 1, \dots,$$
for $\alpha, \beta > -1$. These polynomials are orthogonal over the interval $[-1, 1]$, for the weight function

$$(1-x)^\alpha (1+x)^\beta.$$

We will use the “shifted” Jacobi polynomials, which have the following equivalent definitions, once again for α, β > −1:

$$R_n^{(\alpha,\beta)}(x) = P_n^{(\alpha,\beta)}(2x-1)$$
$$= \frac{(\alpha+1)_n}{n!}\, {}_2F_1(-n,\, n+\alpha+\beta+1;\, \alpha+1;\, 1-x)$$
$$= (-1)^n\, \frac{(\beta+1)_n}{n!}\, {}_2F_1(-n,\, n+\alpha+\beta+1;\, \beta+1;\, x)$$
$$= \frac{(-1)^n}{n!}\, (1-x)^{-\alpha} x^{-\beta}\, \frac{d^n}{dx^n}\left[(1-x)^{\alpha+n} x^{\beta+n}\right]$$
$$= \sum_{j=0}^{n} \rho_{nj}\, x^j,$$
where
$$\rho_{nj} = \frac{(-1)^n (\beta+1)_n (-n)_j (n+\lambda)_j}{(\beta+1)_j\, n!\, j!}, \qquad \lambda = \alpha + \beta + 1.$$
The shifted Jacobi polynomials are orthogonal on $[0, 1]$, for the weight function

$$w^{(\alpha,\beta)}(x) = (1-x)^\alpha x^\beta.$$
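For readers who wish to experiment, the polynomial form $R_n^{(\alpha,\beta)}(x) = \sum_j \rho_{nj} x^j$ translates directly into code. The following is a minimal sketch (our own Python; the computations reported later in this paper were done in Mathematica, and the function names here are ours):

```python
# Sketch: evaluate the shifted Jacobi polynomial R_n^{(alpha,beta)}(x)
# on [0, 1] through the coefficients rho_{nj} defined above.
from math import factorial

def pochhammer(z, n):
    """(z)_n = z (z+1) ... (z+n-1), with (z)_0 = 1."""
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def shifted_jacobi(n, alpha, beta, x):
    """R_n^{(alpha,beta)}(x) = sum_{j=0}^n rho_{nj} x^j."""
    lam = alpha + beta + 1.0
    value = 0.0
    for j in range(n + 1):
        rho = ((-1) ** n * pochhammer(beta + 1.0, n) * pochhammer(-n, j)
               * pochhammer(n + lam, j)
               / (pochhammer(beta + 1.0, j) * factorial(n) * factorial(j)))
        value += rho * x ** j
    return value

# Sanity check: the 2F1 form gives R_n(1) = (alpha+1)_n / n!,
# so for alpha = 0 the value at x = 1 is always 1.
print(shifted_jacobi(3, 0.0, 0.0, 1.0))   # prints 1.0
```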

The following result is stated in Luke (1969, p. 284) (the result is given there in terms of the $\{P_n^{(\alpha,\beta)}(x)\}$, but we have rephrased it in terms of the $\{R_n^{(\alpha,\beta)}(x)\}$).


Theorem 2.1. If $\varphi(x)$ is continuous in the closed interval $0 \le x \le 1$ and has a piecewise continuous derivative, then, for $\alpha, \beta > -1$, the Jacobi series
$$\varphi(x) = \sum_{n=0}^{\infty} c_n R_n^{(\alpha,\beta)}(x), \qquad c_n = \frac{1}{h_n}\int_0^1 \varphi(x)(1-x)^\alpha x^\beta R_n^{(\alpha,\beta)}(x)\, dx,$$
$$h_n = \int_0^1 (1-x)^\alpha x^\beta \left[R_n^{(\alpha,\beta)}(x)\right]^2 dx = \frac{\Gamma(n+\alpha+1)\Gamma(n+\beta+1)}{(2n+\lambda)\, n!\, \Gamma(n+\lambda)},$$
converges uniformly to $\varphi(x)$ in every closed subinterval of $(0, 1)$.

This means that the Fourier series

$$s_N(x) = \sum_{n=0}^{N} c_n R_n^{(\alpha,\beta)}(x)$$
converges to $\varphi(x)$ as $N \to \infty$, for all $x$ in $(0,1)$. In the case of functions with discontinuities the limit of the series is
$$\tfrac{1}{2}\left[\varphi(x-) + \varphi(x+)\right];$$
see Chapter 8 of Luke (1969) for more details. Another classical result is the following (see Higgins, 1977).

Theorem 2.2. If $\varphi \in L^2((0,1), w^{(\alpha,\beta)})$, that is, if $\varphi$ is measurable and
$$\int_0^1 \varphi^2(x)\, w^{(\alpha,\beta)}(x)\, dx < \infty,$$

then $s_N$ converges in $L^2((0,1), w^{(\alpha,\beta)})$ and

$$\lim_{N\to\infty} \left\|\varphi - s_N\right\|_{L^2((0,1),\, w^{(\alpha,\beta)})} = 0.$$

3. Fitting combinations of exponentials using Jacobi polynomials

Method A: direct Jacobi approximation of the ccdf

The goal of this section is to approximate the complement of the distribution function by a combination of exponentials:
$$\bar F(t) = 1 - F(t) = \mathbb{P}(T > t) \approx \sum_j a_j e^{-\lambda_j t}, \qquad t \ge 0. \tag{3.1}$$

This can be attempted in many ways, for instance by matching moments. What we suggest is to transform the function to be approximated in the following way:
$$g(x) = \bar F\!\left(-\frac{1}{r}\log x\right), \qquad 0 < x \le 1,$$

where $r > 0$. We have thus mapped the interval $[0,\infty)$ onto $(0,1]$, $t = 0$ corresponding to $x = 1$, and $t \to \infty$ corresponding to $x \to 0+$; since $\bar F(\infty) = 0$, it is justified to set $g(0) = 0$. If we find constants $\alpha, \beta, p$ and $\{b_k\}$ such that (informally)
$$g(x) = x^p \sum_k b_k R_k^{(\alpha,\beta)}(x), \qquad 0 < x \le 1,$$
then, letting $x = e^{-rt}$,
$$\bar F(t) = e^{-prt} \sum_k b_k R_k^{(\alpha,\beta)}(e^{-rt}) = \sum_j a_j e^{-(j+p)rt}. \tag{3.2}$$

This expression agrees with (3.1), if $\lambda_j = (j+p)r$, $j = 0, 1, 2, \dots$ (The interchange of summation order is justified for finite sums, but not necessarily for infinite ones.) Classical Hilbert space theory says that the constants $\{b_k\}$ can be found by
$$b_k = \frac{1}{h_k}\int_0^1 x^{-p} g(x) R_k^{(\alpha,\beta)}(x) (1-x)^\alpha x^\beta\, dx = \frac{r}{h_k}\int_0^\infty e^{-(\beta-p+1)rt}(1-e^{-rt})^\alpha R_k^{(\alpha,\beta)}(e^{-rt})\, \bar F(t)\, dt. \tag{3.3}$$

This is a combination of terms of the form
$$\int_0^\infty e^{-(\beta-p+j+1)rt}(1-e^{-rt})^\alpha\, \bar F(t)\, dt, \qquad j = 0, 1, \dots, k.$$

The reason for using some $p > 0$ is that it automatically gets rid of any constant which would otherwise appear in the series. If $\alpha = 0$, then this integral is the Laplace transform of $\bar F(\cdot)$, which may be expressed in terms of the Laplace transform of the distribution of $T$:

$$\int_0^\infty e^{-\nu t}\, \bar F(t)\, dt = -\frac{1}{\nu}\int_0^\infty \bar F(t)\, de^{-\nu t} = \frac{1}{\nu}\left[1 - \mathbb{E}\, e^{-\nu T}\right], \qquad \nu > 0. \tag{3.4}$$
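To make the procedure concrete when $\alpha = 0$ (in which case the formula for $h_n$ in Theorem 2.1 reduces to $h_k = 1/(2k+\beta+1)$), here is a minimal sketch of Method A that uses only values of the Laplace transform of $T$, per (3.3) and (3.4). It is our own illustration, not the Mathematica code used for the examples below, and the function names are ours:

```python
# Sketch of Method A with alpha = 0: build (a_j, lambda_j) such that
# bar F(t) is approximated by sum_j a_j exp(-lambda_j t), as in (3.1)-(3.4).
from math import exp, factorial

def pochhammer(z, n):
    out = 1.0
    for i in range(n):
        out *= z + i
    return out

def rho(n, j, beta):
    """Coefficient of x^j in R_n^{(0,beta)}(x); here lambda = beta + 1."""
    lam = beta + 1.0
    return ((-1) ** n * pochhammer(beta + 1.0, n) * pochhammer(-n, j)
            * pochhammer(n + lam, j)
            / (pochhammer(beta + 1.0, j) * factorial(n) * factorial(j)))

def method_a_ccdf(laplace, N, beta, p, r):
    """laplace(nu) must return E exp(-nu T).  Uses h_k = 1/(2k+beta+1)
    and (3.4): the transform of the ccdf is (1 - laplace(nu))/nu."""
    b = []
    for k in range(N):
        s = 0.0
        for j in range(k + 1):
            nu = (beta - p + j + 1.0) * r
            s += rho(k, j, beta) * (1.0 - laplace(nu)) / nu
        b.append(r * (2 * k + beta + 1.0) * s)
    # Regroup e^{-prt} sum_k b_k R_k(e^{-rt}) as a combination of exponentials.
    a = [sum(b[k] * rho(k, j, beta) for k in range(j, N)) for j in range(N)]
    lam = [(j + p) * r for j in range(N)]
    return a, lam

# Test on a known case: T ~ Exp(1), with Laplace transform 1/(1 + nu).
a, lam = method_a_ccdf(lambda nu: 1.0 / (1.0 + nu), 5, beta=0.0, p=0.1, r=0.7)
approx = lambda t: sum(ai * exp(-li * t) for ai, li in zip(a, lam))
print(approx(1.0))   # should be close to exp(-1) ~ 0.3679
```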

The next result is an immediate consequence of Theorem 2.1.

Theorem 3.1. Suppose $\alpha, \beta > -1$, that $\bar F(\cdot)$ is continuous on $[0, \infty)$, and that the function

$$e^{prt}\, \bar F(t) \tag{3.5}$$
has a finite limit as $t$ tends to infinity for some $p \in \mathbb{R}$ (this is always verified if $p \le 0$). Then the integral in (3.3) converges for every $k \in \mathbb{N}$ and
$$\bar F(t) = e^{-prt} \sum_{k=0}^{\infty} b_k R_k^{(\alpha,\beta)}(e^{-rt}) \tag{3.6}$$
for every $t$ in $(0, \infty)$, and the convergence is uniform over every interval $[a, b]$, $0 < a < b < \infty$.

Not all distributions satisfy the condition that the limit of (3.5) exists for some p>0. The next result, a consequence of Theorem 2.2, does not need this assumption.


Theorem 3.2. Suppose $\alpha, \beta > -1$ and that for some $p \in \mathbb{R}$ and $r > 0$
$$\int_0^\infty e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, \bar F(t)^2\, dt < \infty$$
(this is always true if $p < (\beta+1)/2$). Then
$$\lim_{N\to\infty} \int_0^\infty \left[\bar F(t) - e^{-prt}\sum_{k=0}^{N} b_k R_k^{(\alpha,\beta)}(e^{-rt})\right]^2 e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, dt = 0,$$
and (3.6) holds almost everywhere, with $\{b_k;\ k \ge 0\}$ given by (3.3).

For the purpose of approximating probability distributions by truncating the series, we will restrict our attention to $0 < p < (\beta+1)/2$.

Method B: Modified Jacobi approximations which are true distribution functions

This alternative method applies only to distributions which have a density $f(\cdot)$. It consists of (1) fitting a combination of exponentials to the square root of the density; (2) squaring it; (3) rescaling it so it integrates to 1. We thus consider an expansion of the type
$$\sqrt{f(t)} = e^{-pt} \sum_j a_j \lambda_j e^{-\lambda_j t}.$$

The square of this expression is also a combination of exponentials. The following application of Theorem 2.2 is immediate.

Theorem 3.3. Suppose $\alpha, \beta > -1$, $p \in \mathbb{R}$, $r > 0$, and that $f(\cdot)$ is the density of a random variable $T \ge 0$. Suppose moreover that
(i) $\mathbb{E}\, e^{-(\beta+1-2p)rT} < \infty$. This is always satisfied if $p \le (\beta+1)/2$.
(ii) $\mathbb{E}\,(T \wedge 1)^\alpha < \infty$. This is always satisfied if $\alpha \ge 0$.
Then
$$\sqrt{f(t)} = e^{-prt} \sum_{k=0}^{\infty} b_k R_k^{(\alpha,\beta)}(e^{-rt})$$
almost everywhere, with
$$b_k = \frac{r}{h_k}\int_0^\infty e^{-(\beta-p+1)rt}(1-e^{-rt})^\alpha R_k^{(\alpha,\beta)}(e^{-rt})\, \sqrt{f(t)}\, dt, \qquad k = 0, 1, \dots$$

When truncated, the series for $\sqrt{f}$ may be negative in places, but squaring it yields a combination of exponentials which is everywhere non-negative. It then only needs to be multiplied by the right constant (one over its integral) to become a true density function. Examples are given in Section 5; a sketch of this squaring-and-rescaling step is given below.

Observe that the integral of the squared truncated series may not converge to one. The usual $L^2$ results (e.g. Higgins, 1977) say that the norm of the truncated series converges to the norm of the limit, which in this case means that, if we let
$$\phi_N(t) = e^{-prt} \sum_{k=0}^{N} b_k R_k^{(\alpha,\beta)}(e^{-rt})$$
be the $(N+1)$-term approximation of $\sqrt{f}$, then
$$\lim_{N\to\infty} \int_0^\infty \phi_N(t)^2\, e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, dt = \int_0^\infty f(t)\, e^{-(\beta+1-2p)rt}(1-e^{-rt})^\alpha\, dt.$$
Only in the case where $\beta + 1 - 2p = \alpha = 0$ does this guarantee that the integral of $\phi_N^2$ tends to 1.

A variation on the same theme consists of finding a combination of exponentials which approximates
$$g(t) = \int_t^\infty \sqrt{f(s)}\, ds,$$
differentiating with respect to $t$ and then squaring as before (provided of course that the integral above converges). The advantage is that the same computer code may be used as for Method A.
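The following is a minimal sketch (ours; the paper's computations were done in Mathematica) of steps (2) and (3), assuming the truncated series for $\sqrt{f}$ has been regrouped as $\sum_k c_k e^{-\mu_k t}$:

```python
# Sketch of Method B's squaring-and-rescaling step: square a combination
# of exponentials and divide by its integral so it becomes a true pdf.
from collections import defaultdict

def square_and_rescale(c, mu):
    """Given sqrt(f)(t) ~ sum_k c_k exp(-mu_k t), return (weights, rates)
    of the normalized square: a pdf sum_m w_m exp(-rate_m t), which is
    non-negative by construction and integrates to one."""
    terms = defaultdict(float)                       # rate -> coefficient
    for ck, mk in zip(c, mu):
        for cl, ml in zip(c, mu):
            terms[round(mk + ml, 12)] += ck * cl     # round: merge equal rates
    total = sum(coef / rate for rate, coef in terms.items())   # its integral
    rates = sorted(terms)
    weights = [terms[rate] / total for rate in rates]
    return weights, rates

# With base frequencies .07, .77, 1.47 (as in Example 5.2 below) there are
# 2*3 - 1 = 5 distinct rates, because .07 + 1.47 = .77 + .77.
w, r = square_and_rescale([1.0, -0.5, 0.2], [0.07, 0.77, 1.47])
print(len(r))                                        # 5
print(sum(wi / ri for wi, ri in zip(w, r)))          # 1.0: integrates to one
```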

4. Fitting combinations of exponentials using the log-beta distribution

This section considers approximating distributions over a finite interval by combinations of exponentials. It will be shown that a converging sequence of approximations can be constructed by starting with any sequence of combinations of exponentials which converges in distribution to a constant. A convenient sequence is obtained from the log-beta distribution, as will be seen below.

Suppose the goal is to find a sequence of combinations of exponentials which converges to the distribution of a variable $T$, which takes values in $[0, t_0]$ only. Suppose that a sequence $\{X_n\}$ of variables with densities which are combinations of exponentials converges in distribution to $t_0$ as $n \to \infty$; suppose moreover that $\{X_n\}$ is independent of $T$. Then $T_n = T + X_n - t_0$ converges in distribution to $T$. Denote the density of $X_n$ by
$$f_{X_n}(t) = \sum_j a_j^{(n)} \lambda_j^{(n)} e^{-\lambda_j^{(n)} t}\, 1_{\{t>0\}}.$$
Each variable $T_n$ thus has a density
$$f_{T_n}(t) = \int f_{X_n}(t-u)\, dF_{T-t_0}(u) = \sum_{j=0}^{n-1} a_j^{(n)} \lambda_j^{(n)} e^{-\lambda_j^{(n)} t} \int e^{\lambda_j^{(n)} u}\, 1_{\{u<t\}}\, dF_{T-t_0}(u).$$


The integral in the last expression has the same value for all $t \ge 0$ (but not for $t < 0$). Hence, $f_{T_n}(t)$ is a combination of exponentials for all $t \ge 0$. Since $\mathbb{P}(T_n < 0)$ tends to 0 as $n$ increases, the following result is obtained.

Theorem 4.1. Under the previous assumptions, the distributions with pdf's equal to the combinations of exponentials
$$f_{C_n}(t) = \sum_{j=0}^{n-1} b_j^{(n)} \lambda_j^{(n)} e^{-\lambda_j^{(n)} t}\, 1_{\{t \ge 0\}}, \qquad b_j^{(n)} = a_j^{(n)}\, e^{-\lambda_j^{(n)} t_0}\, \mathbb{E}\, e^{\lambda_j^{(n)} T} \big/ \mathbb{P}(T_n \ge 0)$$
converge to the distribution of $T$.

It is possible to find an upper bound for the absolute difference between the true distribution function of $T$ and the distribution function $F_{C_n}(\cdot)$ of the combination of exponentials in Theorem 4.1, when $T$ has a continuous distribution.

Theorem 4.2. Under the previous assumptions, if it is moreover assumed that $T$ has a pdf $f_T(\cdot)$,

$$\sup_z |F_T(z) - F_{C_n}(z)| \le \left[\sup_t f_T(t)\right] \mathbb{E}|X_n - t_0| + F_{T_n}(0) \le \left[\sup_t f_T(t)\right]\left[\mathbb{E}(X_n - t_0)^2\right]^{\frac12} + F_{T_n}(0).$$

Proof. First,

$$|\bar F_T(z) - \bar F_{C_n}(z)| = \left|\bar F_T(z) - \frac{\bar F_{T_n}(z)}{\bar F_{T_n}(0)}\right| \le |\bar F_T(z) - \bar F_{T_n}(z)| + \bar F_{T_n}(z)\, \frac{1 - \bar F_{T_n}(0)}{\bar F_{T_n}(0)} \le |\bar F_T(z) - \bar F_{T_n}(z)| + F_{T_n}(0).$$

Second, let $Y_n = X_n - t_0$. Then
$$|\bar F_T(z) - \bar F_{T_n}(z)| = |\bar F_T(z) - \mathbb{E}\,\bar F_T(z - Y_n)| \le \int \left|\int_{z-y}^{z} f_T(u)\, du\right| dF_{Y_n}(y) \le \left[\sup_t f_T(t)\right] \int |y|\, dF_{Y_n} = \left[\sup_t f_T(t)\right] \mathbb{E}|Y_n|.$$

In applications, the mean of $X_n$ will be close to $t_0$, and $F_{T_n}(0)$ will be small, so the error bound should be close to
$$\left[\sup_t f_T(t)\right] (\mathrm{Var}\, X_n)^{\frac12}.$$
The log-beta family of distributions will now be introduced, because it includes explicit combinations of exponentials which converge to a constant.


Definition. For parameters a, b, c > 0, the log-beta distribution has density

$$f_{a,b,c}(x) = K_{a,b,c}\, e^{-bx}(1 - e^{-cx})^{a-1}\, 1_{\{x>0\}}.$$

This law is denoted LogBeta(a, b, c). The name chosen is justified by the fact that

X ∼ LogBeta(a, b, c) ⇔ e−cX ∼ Beta(b/c, a).
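This equivalence is easy to check numerically; the following sketch (our own illustration, assuming numpy and scipy are available) samples $X$ through a beta variable and compares the sample mean with the closed-form mean given in Theorem 4.3(c) below:

```python
# Sanity check of X ~ LogBeta(a,b,c)  <=>  exp(-cX) ~ Beta(b/c, a):
# simulate U ~ Beta(b/c, a), set X = -log(U)/c, and compare moments.
import numpy as np
from scipy.special import digamma

rng = np.random.default_rng(seed=1)
a, b, c = 3.0, 2.0, 0.5
u = rng.beta(b / c, a, size=200_000)
x = -np.log(u) / c                                 # samples of X
print(x.mean())                                    # Monte Carlo mean
print((digamma(b / c + a) - digamma(b / c)) / c)   # exact E X, Thm 4.3(c)
```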

The distribution is then a transformed beta distribution that includes the gamma as a limiting case, since
$$\mathrm{LogBeta}(1,b,c) = \mathrm{Exp}(b), \qquad \lim_{c\to 0+} \mathrm{LogBeta}(a,b,c) = \mathrm{Gamma}(a,b).$$
The log-beta distribution is unimodal, just like the beta distribution. (N.B. It would be possible to define the $\mathrm{LogBeta}(a,b,c)$ for $a, b > 0$ and $-b/a < c < 0$, since

$$(1 - e^{-cx})^{a-1} e^{-bx} = (-1)^{a-1}(1 - e^{cx})^{a-1} e^{-(b+(a-1)c)x}$$
is integrable over $(0,\infty)$. However, this law would then be identical to the $\mathrm{LogBeta}(a,\, b+(a-1)c,\, -c)$, and no greater generality would be achieved.) The next theorem collects some properties of the log-beta distributions; the proof may be found in the Appendix. The digamma function is defined as

$$\psi(z) = \frac{1}{\Gamma(z)}\frac{d}{dz}\Gamma(z) = \frac{d}{dz}\log\Gamma(z).$$

It is known that (Lebedev, 1972, p.7)

$$\psi(z) = -\gamma + \sum_{k=0}^{\infty}\left(\frac{1}{k+1} - \frac{1}{k+z}\right), \tag{4.1}$$
where $\gamma$ is Euler's constant. The derivatives $\psi'(\cdot), \psi''(\cdot), \psi^{(3)}(\cdot), \dots$ of the digamma function are known as the polygamma functions, and can be obtained by differentiating (4.1) term by term.

Theorem 4.3. (Properties of the log-beta distribution) Suppose $X \sim \mathrm{LogBeta}(a,b,c)$, $a, b, c > 0$.

(a) $K_{a,b,c} = \dfrac{c\, \Gamma(\frac{b}{c} + a)}{\Gamma(\frac{b}{c})\,\Gamma(a)}$.

(b) $\mathbb{E}\, e^{-sX} = \dfrac{\Gamma(\frac{b}{c} + a)\,\Gamma(\frac{b+s}{c})}{\Gamma(\frac{b+s}{c} + a)\,\Gamma(\frac{b}{c})}$, $s > -b$. If $\kappa > 0$ and $Y = \kappa X$, then $Y \sim \mathrm{LogBeta}(a, b/\kappa, c/\kappa)$.

(c) The cumulants of the distribution are
$$\kappa_n = \frac{(-1)^{n-1}}{c^n}\left[\psi^{(n-1)}\!\left(\frac{b}{c} + a\right) - \psi^{(n-1)}\!\left(\frac{b}{c}\right)\right], \qquad n = 1, 2, \dots.$$


In particular,
$$\mathbb{E}\, X = \frac{1}{c}\left[\psi\!\left(\frac{b}{c}+a\right) - \psi\!\left(\frac{b}{c}\right)\right]; \qquad \mathrm{Var}\, X = \frac{1}{c^2}\left[\psi'\!\left(\frac{b}{c}\right) - \psi'\!\left(\frac{b}{c}+a\right)\right];$$
$$\mathbb{E}(X - \mathbb{E}X)^3 = \frac{1}{c^3}\left[\psi''\!\left(\frac{b}{c}+a\right) - \psi''\!\left(\frac{b}{c}\right)\right].$$

(d) If $c > 0$ is fixed and $b = \kappa a$ for a fixed constant $\kappa > 0$, then
$$X \xrightarrow{d} x_0 = \frac{1}{c}\log\left(1 + \frac{c}{\kappa}\right) \quad \text{as } a \to \infty.$$
Moreover,
$$\lim_{a\to\infty} \mathbb{E}\, X^k = x_0^k, \qquad k = 1, 2, \dots$$

(e) If $a = n \in \mathbb{N}_+$, then
$$X \overset{d}{=} Y_0 + \cdots + Y_{n-1},$$
where $Y_j \sim \mathrm{Exp}(b + jc)$, $j = 0, \dots, n-1$, are independent, and

$$\mathbb{E}\, X = \sum_{j=0}^{n-1}\frac{1}{b+jc}, \qquad \mathrm{Var}\, X = \sum_{j=0}^{n-1}\frac{1}{(b+jc)^2}, \qquad \mathbb{E}(X - \mathbb{E}X)^3 = 2\sum_{j=0}^{n-1}\frac{1}{(b+jc)^3}.$$

(f) If a = n ∈ N+, the density of X is

$$f_{n,b,c}(x) = \sum_{j=0}^{n-1} \alpha_j \lambda_j e^{-\lambda_j x},$$
where
$$\alpha_j = \frac{c\left(\frac{b}{c}\right)_n (-1)^j}{j!\,(n-1-j)!\,(b+jc)}, \qquad \lambda_j = b + jc, \qquad j = 0, \dots, n-1.$$

(g) Under the same assumptions as in (d),

$$\frac{X - \mathbb{E}X}{\sqrt{\mathrm{Var}\, X}} \xrightarrow{d} N(0,1) \quad \text{as } a \to \infty.$$

(h) Suppose $X_1 \sim \mathrm{LogBeta}(a_1, b, c)$ and $X_2 \sim \mathrm{LogBeta}(a_2, b + a_1 c, c)$ are independent. Then $X_1 + X_2 \sim \mathrm{LogBeta}(a_1 + a_2, b, c)$.


Corollary 4.4. For $n \in \mathbb{N}_+$ and any fixed $\kappa, c, y_0 > 0$, the $\mathrm{LogBeta}(n,\, n\kappa x_0/y_0,\, cx_0/y_0)$ distribution has a pdf which is a combination of exponentials, and it tends to $\delta_{y_0}$ as $n \to \infty$, if $x_0 = \log(1 + c/\kappa)/c$. The same holds if $x_0$ is replaced with the mean of the $\mathrm{LogBeta}(n, n\kappa, c)$ distribution.

Remark. The classical result that combinations of exponentials are dense in the set of distri- butions on R+ may thus be proved as follows: approximate the given distribution by a discrete distribution with a finite number of atoms; then approximate each atom by a log-beta distribution as in Corollary 4.4.
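For readers who want to reproduce the log-beta approximations of Section 6, the following sketch (our own; the paper's computations were done in Mathematica) assembles the combination of exponentials of Corollary 4.4 from the explicit coefficients of Theorem 4.3(f), working on a log scale because the individual $\alpha_j$ are astronomically large. Here $y_0 = 1$ and $\kappa = c/(e^c - 1)$, which makes $x_0 = 1$, as in Example 6.1:

```python
# Sketch: ccdf of the LogBeta(n, n*kappa, c) combination of exponentials,
# which approximates a unit mass at y0 = 1 (Corollary 4.4).
from math import exp, lgamma, log

def logbeta_combination(n, b, c):
    """Coefficients (alpha_j, lambda_j) from Theorem 4.3(f), via lgamma:
    alpha_j = c (b/c)_n (-1)^j / (j! (n-1-j)! (b+jc)), lambda_j = b+jc."""
    q = b / c
    alphas, rates = [], []
    for j in range(n):
        log_mag = (log(c) + lgamma(q + n) - lgamma(q)
                   - lgamma(j + 1) - lgamma(n - j) - log(b + j * c))
        alphas.append((-1.0) ** j * exp(log_mag))
        rates.append(b + j * c)
    return alphas, rates

n, c = 5, 0.1
kappa = c / (exp(c) - 1.0)               # makes x0 = 1
alphas, rates = logbeta_combination(n, n * kappa, c)
# Since the pdf is sum_j alpha_j lambda_j exp(-lambda_j t), the ccdf is
# sum_j alpha_j exp(-lambda_j t); it falls from ~1 to ~0 around t = 1,
# more and more sharply as n grows.  Large n needs extended-precision
# arithmetic because the alternating coefficients nearly cancel.
for t in (0.5, 1.0, 1.5):
    print(t, sum(a * exp(-l * t) for a, l in zip(alphas, rates)))
```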

Section 6 applies Corollary 4.4 to some of the usual distributions. It is possible to be a little more explicit about the maximum error made by using the combination of exponentials obtained (see Theorem 4.2). The proof of the next result is in the Appendix.

Theorem 4.5. Suppose $X_n \sim \mathrm{LogBeta}(n,\, n\kappa x_0/t_0,\, cx_0/t_0)$ as in Corollary 4.4. Then

$$\mathrm{Var}\, X_n \sim \frac{1}{n}\, \frac{t_0^2}{x_0^2}\, \frac{1}{\kappa(\kappa+c)} \quad \text{as } n \to \infty.$$

If, moreover, the variable $T$ ($0 \le T \le t_0$) has a pdf $f_T(\cdot)$, and $\mathbb{E}\, T^{-p} < \infty$ for some $p > 1$, then $F_{T_n}(0) = O(n^{-\frac{p\wedge 2}{2}})$, and the upper bound in Theorem 4.2 becomes

$$\sup_z |F_T(z) - F_{C_n}(z)| \le \left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12} + F_{T_n}(0) \sim \left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12} = O(1/\sqrt{n}).$$

Hence, under the assumptions of this theorem, a rule of thumb is that the maximum error is approximately
$$\frac{1}{\sqrt{n}}\left[\sup_t f_T(t)\right] \frac{t_0}{x_0}\, \frac{1}{\sqrt{\kappa(\kappa+c)}}.$$

5. Examples of Jacobi approximations

We apply the ideas above to some parametric distributions. The analysis focuses on the cdf or ccdf, rather than on the pdf. For Methods A and B some trial and error is required to determine "good" parameters $\alpha, \beta, p$ and $r$. All the computations were performed with Mathematica. It is not suggested that the particular approximations presented are the best in any sense. In the case of continuous distributions, a measure of the accuracy of an approximation $\hat F$ of a distribution function $F$ is the maximum absolute difference between the two, which we denote

$$\|F - \hat F\| = \sup_{t \ge 0} |F(t) - \hat F(t)|.$$
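In the examples, this supremum was estimated numerically; a trivial grid-search helper along the following lines (our own code, not the paper's) suffices for continuous distributions that are flat beyond some point:

```python
# Estimate ||F - Fhat|| = sup_t |F(t) - Fhat(t)| by searching a fine grid
# on [0, t_max].
def sup_norm_error(F, Fhat, t_max, n_grid=100_000):
    return max(abs(F(i * t_max / n_grid) - Fhat(i * t_max / n_grid))
               for i in range(n_grid + 1))
```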

Some of the low-order approximations are shown in graphs, as their larger errors make the curves distinguishable.


Example 5.1. The degenerate distribution at x =1

In this case the Jacobi polynomials lead to highly oscillatory approximations. This can be seen in Figure 1. The log-beta approximations do better, but at the cost of a larger number of exponentials. See next section.

Example 5.2. The uniform distribution

This is not known to be the easiest distribution to approximate by exponentials. Figure 2 shows the ccdf of the U(1,2) distribution, as well as two approximations found for it using Method A. The parameters used are $\alpha = 0$, $\beta = 0$, $r = .7$, $p = .1$. The 3-term approximation of the ccdf (Method A) is

$$\bar F_3^A(t) = -0.364027\, e^{-0.07t} + 3.497327\, e^{-0.77t} - 2.137705\, e^{-1.47t}$$
and is not good. The 10-term approximation does better. Both approximations are sometimes smaller than 0, sometimes greater than 1, and sometimes increasing, all things that a ccdf should not do. The accuracies do improve with the number of terms used:
$$\|F - F_3^A\| = .31, \quad \|F - F_5^A\| = .16, \quad \|F - F_{10}^A\| = .065, \quad \|F - F_{20}^A\| = .038.$$
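Using the coefficients just displayed, the first of these accuracies can be reproduced approximately (a self-contained sketch of our own):

```python
# Reproduce ||F - F_3^A|| for the U(1,2) ccdf and the 3-term Method A
# approximation printed above.
from math import exp

def ccdf_uniform12(t):
    """ccdf of U(1,2): 1 for t < 1, then 2 - t on [1,2], then 0."""
    return 1.0 if t < 1.0 else (2.0 - t if t < 2.0 else 0.0)

def ccdf_A3(t):
    return (-0.364027 * exp(-0.07 * t) + 3.497327 * exp(-0.77 * t)
            - 2.137705 * exp(-1.47 * t))

ts = [i * 10.0 / 100_000 for i in range(100_001)]
print(max(abs(ccdf_uniform12(t) - ccdf_A3(t)) for t in ts))
# close to the value .31 reported above
```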

Method B is illustrated in Figure 3. In this particular case there is a very convenient simplification, since $\sqrt{f} = f$. The same Jacobi approximation is thus used for $\sqrt{f}$. Differentiating $\bar F_3$, squaring and rescaling yields

$$\bar F_5^B(t) = 0.01014\, e^{-0.14t} - 0.357203\, e^{-0.84t} + 10.522656\, e^{-1.54t} - 16.518849\, e^{-2.24t} + 7.343256\, e^{-2.94t}.$$

There are now $2 \times 3 - 1 = 5$ distinct frequencies $\lambda_j$. Some of the maximum errors observed are:
$$\|F - F_3^B\| = .8, \quad \|F - F_5^B\| = .26, \quad \|F - F_{11}^B\| = .28, \quad \|F - F_{21}^B\| = .11.$$
The B approximations are all true distributions (here the 11-term B approximation happens to be a little worse than the 5-term). The 5-term B approximation does slightly better than the 3-term A approximation, although it is really based on the same 3-term Jacobi polynomial. However, the 5-term A approximation has better precision, although it is not a true ccdf. Overall, the Jacobi approximation is not terribly efficient for the uniform distribution, as the maximum errors above show. The graphs also reveal that the cdf (and therefore the pdf) oscillates significantly. Other computations also indicate that the accuracy of the approximation varies with the interval chosen for the uniform distribution. The log-beta approximations are much smoother; see Section 6.

Example 5.3. The Pareto distribution

The tail of the Pareto distribution is heavier than that of a combination of exponentials, and it is not obvious a priori that the approximations we are considering would perform well. We consider a Pareto distribution with ccdf
$$\bar F(t) = \frac{1}{1+t}, \qquad t \ge 0.$$


Figure 4 compares the 3 and 5-term A approximations for the ccdf, while Figure 5 shows the same 3-term A approximation against the 5-term B approximation. The parameters are

$\alpha = 0$, $\beta = 0$, $r = .1$, $p = .1$.

The accuracy increases with the number of terms used:

$$\|F - F_3^A\| = .31, \quad \|F - F_5^A\| = .12, \quad \|F - F_{10}^A\| = .006, \quad \|F - F_{20}^A\| = .0015.$$
Finding the B approximations was easy, because the square root of the pdf of this distribution is precisely its ccdf, so the same Mathematica code could be used. The same parameters $\alpha, \beta, r$ and $p$ were used, and produced:

$$\|F - F_3^B\| = .31, \quad \|F - F_5^B\| = .24, \quad \|F - F_{11}^B\| = .018, \quad \|F - F_{21}^B\| = .0038.$$

Example 5.4. The lognormal distribution

The lognormal distribution has a tail which is heavier than the exponential, but lighter than the Pareto. This may explain why combinations of exponentials appear to do a little better with the lognormal than with the Pareto (see Figures 6 and 7). The parameters of the lognormal are 0 and .25 (meaning that the logarithm of a variable with this distribution has a normal distribution with mean 0 and variance .25). The first parameter of the lognormal has no importance (it only affects the scale of the distribution), but the numerical experiments performed appear to show that smaller values of the second parameter lead to worse approximations. Figure 6 shows that the 3-term approximation does not do well at all, but that the 5-term one is quite good (the exact and approximate curves may be difficult to distinguish). The parameters used are

$\alpha = 0$, $\beta = .5$, $r = 1$, $p = .2$, and some of the precisions obtained are

$$\|F - F_3^A\| = .11, \quad \|F - F_5^A\| = .014, \quad \|F - F_{10}^A\| = .0003, \quad \|F - F_{20}^A\| = 1.5 \times 10^{-6}.$$

The Method B approximations were found by first noting that the square root of the pdf of the Lognormal$(\mu, \sigma^2)$ distribution is a constant times the pdf of a Lognormal$(\mu+\sigma^2, 2\sigma^2)$:
$$\left[\frac{1}{x\sigma\sqrt{2\pi}}\, e^{-(\log x - \mu)^2/(2\sigma^2)}\right]^{1/2} = (8\pi\sigma^2)^{1/4}\, e^{(\sigma^2+2\mu)/4}\, \frac{1}{x\sigma\sqrt{2}\,\sqrt{2\pi}}\, e^{-[\log x - (\mu+\sigma^2)]^2/(4\sigma^2)}.$$
It is possible to find an approximating combination of exponentials by computing a Jacobi expansion for the pdf on the right, or else by finding one for the ccdf of the same distribution, and then differentiating. The two possibilities were compared, and the differences were minute. It was therefore decided to adopt the latter approach. The parameters used are

$\alpha = 0$, $\beta = -.3$, $r = .7$, $p = .1$, and some of the results found are

$$\|F - F_3^B\| = .46, \quad \|F - F_5^B\| = .23, \quad \|F - F_{11}^B\| = .0065, \quad \|F - F_{21}^B\| = .0003.$$


These are less accurate than the corresponding A approximations, but they do improve as the number of terms increases. Figure 7 compares the true ccdf with the 5- and 11-term B approximations. Overall, the approximations did better for the lognormal than for the Pareto in the specific cases chosen, but no general conclusion can be made based on two specific cases.
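The square-root identity displayed above is also easy to verify numerically. A short sketch (our own illustration):

```python
# Check: sqrt of the Lognormal(mu, s2) pdf equals
# (8*pi*s2)^(1/4) * exp((s2 + 2*mu)/4) times the Lognormal(mu+s2, 2*s2) pdf.
from math import exp, log, pi, sqrt

def ln_pdf(x, mu, s2):
    """Lognormal pdf; s2 is the variance of log X."""
    return exp(-(log(x) - mu) ** 2 / (2.0 * s2)) / (x * sqrt(2.0 * pi * s2))

mu, s2 = 0.0, 0.25                        # the parameters of Example 5.4
const = (8.0 * pi * s2) ** 0.25 * exp((s2 + 2.0 * mu) / 4.0)
for x in (0.5, 1.0, 2.0):
    print(sqrt(ln_pdf(x, mu, s2)), const * ln_pdf(x, mu + s2, 2.0 * s2))
    # the two printed values agree at each x
```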

Example 5.5. The Makeham distribution

This distribution is used to represent the lifetime of insureds. It has force of mortality
$$\mu_x = A + Bc^x.$$

We chose the parameters used in Bowers et al. (1997, p.78):

$A = .0007$, $B = 5 \times 10^{-5}$, $c = 10^{.04}$.

Figure 8 shows how the 3 and 7-term approximations do. The parameters chosen were

$\alpha = 0$, $\beta = 0$, $r = .08$, $p = .2$, and higher-degree Jacobi approximations have accuracies

$$\|F - F_3^A\| = .08, \quad \|F - F_5^A\| = .04, \quad \|F - F_{10}^A\| = .0065, \quad \|F - F_{20}^A\| = .00024.$$

The accuracy is worse than with the lognormal. An explanation may be that the tail of the Makeham is much thinner than that of the exponential.

6. Examples of log-beta approximations

Example 6.1. The degenerate distribution at x =1

Figure 9 shows two log-beta approximations for a Dirac mass at $x = 1$. The combinations of exponentials used are the $\mathrm{LogBeta}(n, n\kappa, c)$ distributions with $n = 20$ and 200, $c = .1$ and $\kappa = .1/(e^{.1} - 1)$. The approximate ccdf's are very smooth, by comparison with the Jacobi approximations (Figure 1).

Example 6.2. The uniform distribution

The pdf and ccdf of some log-beta approximations for the U(1,2) distribution are shown in Figures 10 and 11. The $\mathrm{LogBeta}(n, n\kappa, c)$ distribution with $n = 20$ and 200, $c = .1$ and $\kappa = .1/(e^{.2} - 1)$ was used. The approximations do not oscillate at all, as the Jacobi approximations do (Figures 2 and 3). The precisions attained were

$$\|F_T - F_{C_{20}}\| = .178, \qquad \|F_T - F_{C_{200}}\| = .0567.$$

The error bounds given by Theorem 4.2 are 0.451 and 0.142, respectively. In this case, $\mathbb{E}\, T^{-p} < \infty$ for any $p$, and $\mathbb{P}(T_n < 0)$ (which is a term in the error bound in Theorem 4.2) tends to 0 very


quickly: $\mathbb{P}(T_{20} < 0) = .000265$, while $\mathbb{P}(T_{200} < 0)$ is of the order of $10^{-9}$. The asymptotic expression for $\mathrm{Var}\, X_n$ (Theorem 4.5) is extremely good here:
$$\mathrm{Var}\, X_{20} = 0.202698, \qquad \frac{1}{20}\,\frac{1}{\kappa(\kappa+c)} = 0.200668;$$
$$\mathrm{Var}\, X_{200} = 0.020087, \qquad \frac{1}{200}\,\frac{1}{\kappa(\kappa+c)} = 0.0200668.$$

Example 6.3. Squared sine distribution

In order to test this method for a density which may not be as friendly as the previous two, consider the pdf
$$f_T(t) = 2\sin^2(2\pi t)\, 1_{(0,1)}(t).$$
The $\mathrm{LogBeta}(n, n\kappa, c)$ distribution with $n = 20$ and 200, $c = .1$ and $\kappa = .1/(e^{.1} - 1)$ was used. The results are shown in Figures 12 and 13. The precisions were

$$\|F_T - F_{C_{20}}\| = .125, \qquad \|F_T - F_{C_{200}}\| = .0296.$$

The error bounds given by Theorem 4.2 are .521 and .142, respectively. In this case $\mathbb{E}\, T^{-2} < \infty$, $\mathbb{P}(T_{20} < 0) = 0.0725$, and $\mathbb{P}(T_{200} < 0) = .00590$. The asymptotic expression for $\mathrm{Var}\, X_n$ (Theorem 4.5) is excellent in this case too:
$$\mathrm{Var}\, X_{20} = 0.0502929, \qquad \frac{1}{20}\,\frac{1}{\kappa(\kappa+c)} = 0.0500417;$$
$$\mathrm{Var}\, X_{200} = 0.00500667, \qquad \frac{1}{200}\,\frac{1}{\kappa(\kappa+c)} = 0.00500417.$$

7. Application to sums of independent variables

Textbook examples of explicit distributions of sums of independent random variables are the normal and the gamma distributions, in the latter case only if the scale parameters are the same. In applications, however, convolutions of other distributions routinely appear, and some numerical technique is needed to approximate the distribution of the sum of two random variables. This explains why numerical convolutions (or simulations) are often required. The convolutions of combinations of exponentials being relatively easy to calculate, one might ask whether the approximations of Section 5 could be used to approximate convolutions of distributions. Table 1 presents the results of some experiments performed with the lognormal distribution. The exact distribution function of the sum of two independent lognormal random variables with first parameter 0 and second parameter $\sigma^2$ was computed numerically, as well as its $N$-term combination-of-exponentials approximation (the convolution of the approximation found as in Section 5). The last column shows the maximum difference between the distribution functions. It is interesting to note that under Method A the error for the convolution is very nearly double what it is for the original distribution, while under Method B the two errors are about the same. This author does not have a good explanation for this difference between the methods, and does not know whether the same holds more generally. More work is required here.


Another possible technique is the following. Section 3 showed that when $\alpha = 0$ the Fourier coefficients are a weighted sum of values of the Laplace transform of the ccdf of the distribution (see (3.3)). The latter is given by (3.4). Hence, the Jacobi expansion for the ccdf of the convolution of probability distributions may be found directly from the Laplace transforms of each of the individual distributions, since
$$\mathbb{E}\, e^{-\mu(T_1 + \cdots + T_m)} = \prod_{k=1}^{m} \mathbb{E}\, e^{-\mu T_k}.$$
For more on sums of lognormals, the reader is referred to Dufresne (2004b).
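For completeness, here is why the convolution step itself is "a relatively straightforward affair" once each pdf is a combination of exponentials (a sketch of our own; it assumes all rates are distinct, since a repeated rate would produce a $t e^{-\lambda t}$ term outside this representation):

```python
# If f1(t) = sum_i w_i exp(-l_i t) and f2(t) = sum_j v_j exp(-m_j t), then
# for l_i != m_j,
#   (f1 * f2)(t) = sum_{i,j} w_i v_j (exp(-m_j t) - exp(-l_i t)) / (l_i - m_j),
# again a combination of exponentials.
from collections import defaultdict

def convolve(w, l, v, m):
    """Convolve two pdfs given as combinations of exponentials; all the
    rates l_i and m_j are assumed distinct."""
    out = defaultdict(float)                 # rate -> weight
    for wi, li in zip(w, l):
        for vj, mj in zip(v, m):
            out[mj] += wi * vj / (li - mj)
            out[li] -= wi * vj / (li - mj)
    return list(out.values()), list(out.keys())

# Example: Exp(1) * Exp(2) should give 2(e^{-t} - e^{-2t}).
weights, rates = convolve([1.0], [1.0], [2.0], [2.0])
print(sorted(zip(rates, weights)))           # [(1.0, 2.0), (2.0, -2.0)]
```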

Table 1. Accuracy of convolutions of lognormals

$\sigma^2$   Approx.   $N$ terms   $\|F - F_N\|$   $\|F^{*2} - F_N^{*2}\|$
1            A         3           .050            .102
1            A         7           .018            .037
1            A         11          .003            .006
.25          A         3           .105            .204
.25          A         7           .004            .008
.25          A         11          .00025          .0005
.25          B         5           .223            .205
.25          B         9           .050            .051
.25          B         13          .0015           .001

Conclusion

Two advantages of the Jacobi Methods A and B presented are, first, their simplicity, and, second, the fact that all that is needed is the Laplace transform of the distribution at a finite number of points. Two disadvantages may be mentioned. First, although the sequence converges to the true value of the distribution function at all its continuity points, the truncated series under Method A is in general not a true distribution function. Second, distributions which have atoms are more difficult to approximate. A numerical problem which arises with these methods is the magnitude of the Fourier coefficients as the order of the approximation increases; for instance, some 20-term approximations have been seen to include coefficients of the order of $10^{25}$. For completely monotone distributions, there is an alternative method, presented in Feldmann & Whitt (1998) (but going back to de Prony (1795)) which yields a hyper-exponential approximation (this technique applies to some Weibull distributions and to the Pareto distribution, but not to the lognormal or uniform). The coefficients of a hyper-exponential being all positive and summing to one, they are necessarily all small.

The log-beta family includes explicit sequences which are combinations of exponentials and also converge to the Dirac mass at any $y_0 \ge 0$; as shown in Section 4, this leads to a technique

for approximating any distribution on a finite interval by combinations of exponentials. These approximations do not have the oscillatory behaviour of the Jacobi approximations. The examples in Section 6 show that it is possible to use log-beta combinations of exponentials of fairly high order (the author has used $n = 600$ in some cases) without incurring numerical difficulties. The precisions achieved are modest, by comparison with the Jacobi methods. I have found, however, that the log-beta method required far less trial and error than the Jacobi methods. The main disadvantage of the log-beta approximations is that they apply only to distributions over a finite interval. A positive point is, however, that there is a simple, computable error bound (Theorems 4.2 and 4.5) for the log-beta approximations. Another advantage is that the log-beta approximating pdf's are automatically positive everywhere.

APPENDIX: PROOFS OF THEOREMS 4.3 AND 4.5

Proof of Theorem 4.3. Parts (a) and (b) follow from (if $s > -b$)
$$\int_0^\infty e^{-(b+s)x}(1 - e^{-cx})^{a-1}\, dx = \frac{1}{c}\int_0^1 u^{\frac{b+s}{c}-1}(1-u)^{a-1}\, du = \frac{\Gamma(\frac{b+s}{c})\,\Gamma(a)}{c\, \Gamma(\frac{b+s}{c} + a)}.$$

Part (c) is obtained by differentiating the logarithm of the Laplace transform in the usual way. To prove the convergence in distribution in (d), one approach is to recall that $e^{-cX}$ has a beta distribution, which yields explicit formulas for the mean and variance of that variable. The mean equals $\kappa/(c+\kappa)$, and the variance tends to 0. Alternatively, use the formula (Lebedev, 1972, p. 15)

$$\frac{\Gamma(z+\alpha)}{\Gamma(z+\beta)} = z^{\alpha-\beta}\left[1 + \frac{(\alpha-\beta)(\alpha+\beta-1)}{2z} + O(|z|^{-2})\right] \quad \text{as } |z| \to \infty,$$
where $\alpha$ and $\beta$ are arbitrary constants and $|\arg z| \le \pi - \delta$, for some $\delta > 0$. It can be seen that it implies
$$\lim_{a\to\infty} \mathbb{E}\, e^{-sX} = \lim_{a\to\infty} \frac{\Gamma(\frac{\kappa a}{c} + a)\,\Gamma(\frac{\kappa a + s}{c})}{\Gamma(\frac{\kappa a + s}{c} + a)\,\Gamma(\frac{\kappa a}{c})} = \left(\frac{\kappa}{\kappa+c}\right)^{s/c}$$
for all $s \in \mathbb{R}$. The convergence of moments results from the convergence of cumulants, which itself is a consequence of the expression in (c) together with the asymptotic formulas (Abramowitz & Stegun, 1972, Chapter 6):

$$\psi(z) \sim \log z - \frac{1}{2z} - \frac{1}{12z^2} + \cdots, \qquad \psi^{(n)}(z) \sim (-1)^{n-1}\left[\frac{(n-1)!}{z^n} + \frac{n!}{2z^{n+1}} + \cdots\right], \quad n = 1, 2, \dots \tag{A.1}$$

(valid as $z \to \infty$ if $|\arg z| < \pi$).

Next, turn to (e). If $a = n \in \mathbb{N}_+$, then
$$\mathbb{E}\, e^{-sX} = \frac{\left(\frac{b}{c}\right)_n}{\left(\frac{b+s}{c}\right)_n} = \frac{b}{b+s}\cdot\frac{b+c}{b+c+s}\cdots\frac{b+(n-1)c}{b+(n-1)c+s}.$$


This proves the representation of X as the sum of independent exponential variables. The mean, variance and third central moment then follow by the additivity of cumulants, or else by using (4.1) and part (c) of this theorem. The expression in (f) is found by expanding the density using the binomial theorem. To prove (g), first observe that, using (b), given κ>0,

X ∼ LogBeta(a, κa, c) ⇔ κX ∼ LogBeta(a, a, c/κ).

It is then sufficient to prove the result for κ =1. First let a = n ∈ N, and express X as

$$X \overset{d}{=} Y_{n,0} + \cdots + Y_{n,n-1},$$
where the variables on the right are independent, and $Y_{n,j} \sim \mathrm{Exp}(n + jc)$. We will use the Lyapounov central limit theorem (Billingsley, 1987, p. 371), which says that the normal limit holds if
$$\lim_{n\to\infty} \frac{1}{S_n^{2+\delta}} \sum_{j=0}^{n-1} \mathbb{E}|Y_{n,j} - \mathbb{E}Y_{n,j}|^{2+\delta} = 0 \tag{A.2}$$
for some $\delta > 0$, with
$$S_n^2 = \sum_{j=0}^{n-1} \mathrm{Var}\, Y_{n,j}.$$
Let $\delta = 2$. If $Y \sim \mathrm{Exp}(1)$, then

$$\mathrm{Var}\, Y = 1, \qquad \mathbb{E}(Y-1)^4 = 9.$$

This means that
$$\mathrm{Var}\, Y_{n,j} = \frac{1}{(n+jc)^2}, \qquad \mathbb{E}(Y_{n,j} - \mathbb{E}Y_{n,j})^4 = \frac{9}{(n+jc)^4}.$$
Then
$$S_n^2 \ge \frac{1}{n(1+c)^2}, \qquad \sum_{j=0}^{n-1}\mathbb{E}(Y_{n,j} - \mathbb{E}Y_{n,j})^4 \le \frac{9}{n^3},$$
which implies (A.2). For arbitrary $a$, let $n = \lfloor a \rfloor$, $r = a - n$, and note that
$$\mathbb{E}\, e^{-sX} = \left[\prod_{j=0}^{n-1}\frac{a+(r+j)c}{a+(r+j)c+s}\right]\frac{\Gamma(\frac{a}{c}+r)\,\Gamma(\frac{a+s}{c})}{\Gamma(\frac{a+s}{c}+r)\,\Gamma(\frac{a}{c})}.$$

Hence,
$$X \overset{d}{=} Y_{a,0} + \cdots + Y_{a,n-1} + Y_{a,n},$$
where $Y_{a,j} \sim \mathrm{Exp}(a + (r+j)c)$, $j = 0, \dots, n-1$, and $Y_{a,n} \sim \mathrm{LogBeta}(r, a, c)$. The Lyapounov central limit theorem can again be applied, the only real difference being the additional term $Y_{a,n}$. This term does not change the limit, since it can be observed (see (4.1) and part (c)) that
$$\mathbb{E}\, Y_{a,n} < \frac{1}{a}, \qquad \mathrm{Var}\, Y_{a,n} < \frac{1}{a^2}.$$


This implies that

$$\lim_{a\to\infty}\frac{\mathrm{Var}(Y_{a,0}+\cdots+Y_{a,n-1})}{\mathrm{Var}(Y_{a,0}+\cdots+Y_{a,n})} = 1, \qquad \frac{Y_{a,n} - \mathbb{E}\, Y_{a,n}}{\sqrt{\mathrm{Var}(Y_{a,0}+\cdots+Y_{a,n})}} \xrightarrow{d} 0 \quad \text{as } a \to \infty.$$

Hence, including or excluding $Y_{a,n}$ in the normalised expression for $X$ does not change the limit distribution. Finally, (h) may be seen to follow from writing the product of the Laplace transforms of $X_1$ and $X_2$:

$$\frac{\Gamma(\frac{b}{c}+a_1)\,\Gamma(\frac{b+s}{c})}{\Gamma(\frac{b+s}{c}+a_1)\,\Gamma(\frac{b}{c})} \times \frac{\Gamma(\frac{b}{c}+a_1+a_2)\,\Gamma(\frac{b+s}{c}+a_1)}{\Gamma(\frac{b+s}{c}+a_1+a_2)\,\Gamma(\frac{b}{c}+a_1)} = \frac{\Gamma(\frac{b}{c}+a_1+a_2)\,\Gamma(\frac{b+s}{c})}{\Gamma(\frac{b+s}{c}+a_1+a_2)\,\Gamma(\frac{b}{c})},$$
or else may be obtained by performing the convolution explicitly: for $y > 0$,
$$\int_0^y e^{-b(y-x)}(1-e^{-c(y-x)})^{a_1-1}\, e^{-(b+a_1c)x}(1-e^{-cx})^{a_2-1}\, dx$$
$$= e^{-by}\int_0^y e^{-cx}(e^{-cx}-e^{-cy})^{a_1-1}(1-e^{-cx})^{a_2-1}\, dx$$
$$= \frac{e^{-by}}{c}\int_{e^{-cy}}^1 (u-e^{-cy})^{a_1-1}(1-u)^{a_2-1}\, du$$
$$= \frac{1}{c}\, e^{-by}(1-e^{-cy})^{a_1+a_2-1}\int_0^1 v^{a_1-1}(1-v)^{a_2-1}\, dv.$$

Proof of Theorem 4.5. Applying Theorem 4.3 (c),

$$\mathrm{Var}\, X_n = \frac{t_0^2}{c^2 x_0^2}\left[\psi'\!\left(\frac{n\kappa}{c}\right) - \psi'\!\left(\frac{n\kappa}{c} + n\right)\right].$$
The asymptotic behaviour of the function $\psi'(z)$ as $z \to \infty$ is given by (A.1), and implies
$$\mathrm{Var}\, X_n \sim \frac{t_0^2}{c^2 x_0^2}\left[\frac{c}{n\kappa} - \frac{c}{n(\kappa+c)}\right] = \frac{1}{n}\,\frac{t_0^2}{x_0^2}\,\frac{1}{\kappa(\kappa+c)}.$$

Next, suppose $\mathbb{E}\, T^{-p} < \infty$ for some $p > 1$. If $p > 2$, then necessarily $\mathbb{E}\, T^{-2} < \infty$, and $p$ may be replaced with 2 in what follows. We have
$$\mathbb{P}(T_n \le 0) = \mathbb{P}(T + X_n - t_0 \le 0) = \int_0^{t_0} \mathbb{P}(X_n - t_0 \le -t)\, f_T(t)\, dt \le \int_0^{t_0} \mathbb{P}(|X_n - t_0| \ge t)\, f_T(t)\, dt \le \int_0^{t_0} \frac{f_T(t)}{t^p}\, \mathbb{E}|X_n - t_0|^p\, dt$$
by Markov's inequality. Now, let the $L^r$-norm of a random variable $Y$ be
$$\|Y\|_r = \left[\mathbb{E}(|Y|^r)\right]^{1/r}.$$

19 Fitting combinations of exponentials

By applying Hölder's inequality,
$$\mathbb{E}|X_n - t_0|^p = \|X_n - t_0\|_p^p \le \|X_n - t_0\|_2^p = \left[\mathbb{E}(X_n - t_0)^2\right]^{p/2},$$
and, moreover,
$$\mathbb{E}(X_n - t_0)^2 = \mathrm{Var}\, X_n + (\mathbb{E}X_n - t_0)^2.$$

We now need to know the asymptotic behaviour of $\mathbb{E}X_n - t_0$ as $n \to \infty$. From Theorem 4.3(c) and (A.1),
$$\mathbb{E}X_n - t_0 = \frac{t_0}{cx_0}\left[\psi\!\left(\frac{n\kappa}{c}+n\right) - \psi\!\left(\frac{n\kappa}{c}\right)\right] - t_0 \sim \frac{t_0}{cx_0}\left[\log\left(\frac{n\kappa}{c}+n\right) - \log\frac{n\kappa}{c}\right] - \frac{t_0}{2x_0}\left[\frac{1}{n\kappa+cn} - \frac{1}{n\kappa}\right] - t_0$$
$$= \frac{t_0}{cx_0}\log\frac{\kappa+c}{\kappa} + \frac{ct_0}{2nx_0}\,\frac{1}{\kappa(\kappa+c)} - t_0 = \frac{ct_0}{2nx_0}\,\frac{1}{\kappa(\kappa+c)}.$$
Putting all this together, we get

$$\mathbb{E}|X_n - t_0|^p = O(n^{-p/2}).$$

This implies that

$$\left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12} + F_{T_n}(0) \sim \left[\sup_t f_T(t)\right][\mathrm{Var}\, X_n]^{\frac12}.$$

References

Abramowitz, M., and Stegun, I.A. (1972). Handbook of Mathematical Functions. (Tenth printing.) Dover, New York.

Asmussen, S. (1997). Phase-type distributions and related point processes: Fitting and recent advances. In: Matrix-Analytic Methods in Stochastic Models, Eds. S.R. Chakravarthy and A.S. Alfa, Marcel Dekker Inc., New York.

Asmussen, S., and Bladt, M. (1997). Renewal theory and queueing algorithms for matrix-exponential distributions. In: Matrix-Analytic Methods in Stochastic Models, Eds. S.R. Chakravarthy and A.S. Alfa, Marcel Dekker Inc., New York.

Billingsley, P. (1987). Probability and Measure. (Second Edition.) Wiley, New York.

Bowers, N.L., Gerber, H.U., Hickman, J.C., Jones, D.A., and Nesbitt, C.J. (1997). Actuarial Mathematics. (Second Edition.) Society of Actuaries, Itasca, Illinois.

Dufresne, D. (2001). On a general class of risk models. Australian Actuarial Journal 7: 755-791.

Dufresne, D. (2004a). Stochastic life annuities. Research Paper no. 118, Centre for Actuarial Studies, University of Melbourne, 14 pages. Available at: http://www.economics.unimelb.edu.au/actwww/wps2004.html.


Dufresne, D. (2004b). The log-normal approximation in financial and other computations. Advances in Applied Probability 36: 747-773.

Dufresne, D. (2005). Bessel Processes and a Functional of Brownian Motion. In: Numerical Methods in Finance, H. Ben-Ameur and M. Breton (Eds.), Kluwer Academic Publisher.

Feldmann, A., and Whitt, W. (1998). Fitting mixtures of exponentials to long-tail distributions to analyze network performance models. Performance Evaluation 31: 245-279.

Feller, W. (1971). An Introduction to Probability Theory and its Applications, II. (Second Edition.) Wiley, New York.

Higgins, J.R. (1977). Completeness and Basis Properties of Sets of Special Functions. Cambridge University Press.

Lebedev, N.N. (1972). Special Functions and their Applications. Dover, New York.

Luke, Y.L. (1969). The Special Functions and their Approximations. Academic Press, New York.

Neuts, M.F. (1981). Matrix-Geometric Solutions in Stochastic Models. The Johns Hopkins University Press, Baltimore.

de Prony, Baron Gaspard Riche (1795). Essai expérimental et analytique: sur les lois de la dilatabilité des fluides élastiques et sur celles de la force expansive de la vapeur de l'eau et de la vapeur de l'alkool, à différentes températures. Journal de l'École Polytechnique 1, cahier 22: 24-76. (Available at www.essi.fr/leroux/PRONY.pdf.)

Yor, M. (1992). On some exponential functionals of Brownian motion. Adv. Appl. Prob. 24: 509-531. (Reproduced in Yor (2001).)

Daniel Dufresne Centre for Actuarial Studies Department of Economics University of Melbourne Tel.: (0)3 8344 5324 Fax: (0)3 8344 6899 [email protected]

[Figure 1. Example 5.1, degenerate distribution: exact ccdf vs. 15- and 30-term A approximations.]
[Figure 2. Example 5.2, uniform distribution: exact ccdf vs. 3- and 10-term A approximations.]
[Figure 3. Example 5.2, uniform distribution: exact ccdf vs. 3-term A and 5-term B approximations.]
[Figure 4. Example 5.3, Pareto distribution: exact ccdf vs. 3- and 5-term A approximations.]
[Figure 5. Example 5.3, Pareto distribution: exact ccdf vs. 3-term A and 5-term B approximations.]
[Figure 6. Example 5.4, lognormal distribution: exact ccdf vs. 3- and 5-term A approximations.]
[Figure 7. Example 5.4, lognormal distribution: exact ccdf vs. 5- and 9-term B approximations.]
[Figure 8. Example 5.5, Makeham distribution: exact ccdf vs. 3- and 7-term A approximations.]
[Figure 9. Example 6.1, degenerate distribution: exact ccdf vs. 20- and 400-term log-beta approximations.]
[Figure 10. Example 6.2, uniform distribution: exact pdf vs. 20- and 200-term log-beta approximations.]
[Figure 11. Example 6.2, uniform distribution: exact ccdf vs. 20- and 200-term log-beta approximations.]
[Figure 12. Example 6.3, squared sine distribution: exact pdf vs. 20- and 200-term log-beta approximations.]
[Figure 13. Example 6.3, squared sine distribution: exact ccdf vs. 20- and 200-term log-beta approximations.]