Institutionen för Matematik och Fysik Code: MdH-IMa-2005:018
MASTER THESIS IN MATHEMATICS /APPLIED MATHEMATICS
Modelling Insurance Claim Sizes using the Mixture of Gamma & Reciprocal Gamma Distributions
by
Ying Ni
Magisterarbete i matematik / tillämpad matematik
DEPARTMENT OF MATHEMATICS AND PHYSICS MÄLARDALEN UNIVERSITY SE-721 23 VÄSTERÅS, SWEDEN
Master thesis in mathematics / applied mathematics
Date: 2005-12-20
Project name: Modelling Insurance Claim Sizes using the Mixture of Gamma & Reciprocal Gamma Distributions
Author: Ying Ni
Supervisors: Dmitrii Silvestrov, Anatoliy Malyarenko
Examiner: Dmitrii Silvestrov
Comprising: 20 points
Abstract
A gamma-reciprocal gamma mixture distribution is used to model the claim size distribution. It is specially designed to handle both ordinary and extremely large claims. The gamma distribution has a lighter tail and is meant as a model for the frequent small and moderate claims, while the reciprocal gamma distribution covers the large but infrequent claims.
We begin by introducing the gamma distribution and motivating its role in modelling small and moderate claim sizes, followed by a discussion of the reciprocal gamma distribution and its application in modelling large claim sizes; finally we present the mixture of gamma and reciprocal gamma as the generally applicable statistical model.
Two parameter estimation techniques, the method of moments and maximum likelihood estimation, are provided to fit the single gamma, single reciprocal gamma, and general mixture models. We explain in detail how the resulting moment and likelihood equations are solved, and we also evaluate the proposed moment estimation algorithm.
Two Java applications, namely the Java Mixture Model Simulation Program and the Java Mixture Model Estimation Program are developed to facilitate the analysis. While the former simulates samples that follow the mixture distribution given known parameters, the latter provides the parameter estimation for simplified cases given samples.
Keywords: Gamma distribution; Reciprocal gamma distribution; Mixture of gamma and reciprocal gamma; Claim size distribution; Large claim size distribution; Method of moments; Maximum likelihood estimation; Estimating the parameters of gamma; Estimating the parameters of reciprocal gamma
Notation & Abbreviations

Notation

Pr(…)    the probability of …
Γ(α)     gamma function, Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx
ψ(α)     digamma function, ψ(α) = (d/dα) ln Γ(α)
ψ'(α)    trigamma function
γ(α; x)  "lower" incomplete gamma function, γ(α; x) = ∫_0^x t^(α−1) e^(−t) dt
Γ(α; x)  "upper" incomplete gamma function, Γ(α; x) = ∫_x^∞ t^(α−1) e^(−t) dt
L        likelihood function
ln L     log likelihood function
χ²_a     chi-square distribution with a degrees of freedom

Abbreviations

cdf      cumulative distribution function
mgf      moment-generating function
MLE      maximum likelihood estimation, maximum likelihood estimator, or maximum likelihood estimate
pdf      probability density function
R-gamma  reciprocal gamma (distribution)
Table of Contents
1. Introduction
2. The Gamma Distribution
   2.1. Definition, Moments & Properties
   2.2. Applications in Claim Size Modelling
   2.3. Estimation of Parameters
      2.3.1. Method of Moments Estimation
      2.3.2. Maximum Likelihood Estimation
3. The Reciprocal Gamma Distribution
   3.1. Definition, Moments & Properties
   3.2. Applications in Claim Size Modelling
   3.3. Estimation of Parameters
      3.3.1. Method of Moments Estimation
      3.3.2. Maximum Likelihood Estimation
4. The Mixture of Gamma & Reciprocal Gamma
   4.1. The Mixture of Gamma & Reciprocal Gamma
   4.2. Estimation of Parameters
      4.2.1. Method of Moments Estimation
      4.2.2. Maximum Likelihood Estimation
5. Simulation Studies
   5.1. Simulation
   5.2. Estimation
6. Conclusion
7. References
8. Appendixes
   Appendix A: Java Mixture Model Simulation Program - Users' Guide
   Appendix B: Java Mixture Model Estimation Program - Users' Guide
   Appendix C: Java Source Code: MixtureModelSimulation.java
   Appendix D: Java Source Code: MixtureModelEstimation.java
   Appendix E: Java Source Code: Newton_Solver.java
1. Introduction
It is not unusual that, in addition to a large number of "ordinary" policies, an insurance portfolio also contains a small number of policies under which much larger claim sizes are likely. Those large claims usually represent the greatest part of the indemnities paid by the insurer. It is therefore essential for the actuary to have a good model that takes those extremely large claims into consideration.
In this paper, we introduce a new and generally applicable statistical model for individual insurance claim sizes, the mixture of gamma and reciprocal gamma model, which is specially designed to handle both ordinary and extremely large claims. This model uses the gamma distribution as the first component, with mixing probability p, and the reciprocal gamma distribution as the second, with mixing probability 1 − p. The gamma distribution has a lighter tail and is meant as a model for the frequent small claims, while the reciprocal gamma distribution covers the large but infrequent claims.
If we set p = 1, the general model reduces to the single gamma model, which is by itself important as a model for ordinary claim sizes, that is, data without heavy tails. At p = 0, it reduces to the single reciprocal gamma model, which, because of its heavy tail, can model extremely large claims.
We shall discuss the statistical properties of the two component distributions, the gamma and the reciprocal gamma, and their advantages in modelling these two types of claims, namely the ordinary and the extremely large claims.
We shall also be interested in the estimation of parameters: the parameters are seldom known a priori and must be estimated from claim data before the proposed distribution can be applied to any particular actuarial problem. The method of moments and maximum likelihood estimation are applied to this task. These two estimation techniques are illustrated in detail for the gamma, the reciprocal gamma, and finally their mixture.
The aim of this paper is to motivate the general mixture model, specify and study its two components, and discuss the related parameter estimation techniques. The remainder of the paper proceeds as follows. In section 2, we describe the gamma distribution, its applications in claim size modelling, and the estimation of its parameters using the method of moments and maximum likelihood estimation. In section 3, we repeat the procedure for the reciprocal gamma distribution. The general mixture is presented in section 4, where the two estimation techniques for this relatively complicated case are also discussed. Section 5 illustrates the simulation of the relevant random variables with the help of our Java Mixture Model Simulation Program, and evaluates the proposed moment estimation algorithm for simplified cases using the Java Mixture Model Estimation Program. Finally, section 6 concludes.
2. The Gamma Distribution
2.1 Definition, Moments & Properties
Probability density function (pdf)
The gamma distribution is a well-known, flexible continuous distribution that can model individual claim size. It has the pdf
(2.1)  f(x | α, λ) = λ^α x^(α−1) e^(−λx) / Γ(α),  0 ≤ x < ∞, α > 0, λ > 0,

where Γ(α) is the gamma function defined by

(2.2)  Γ(α) = ∫_0^∞ x^(α−1) e^(−x) dx.
When α = 1, direct integration verifies that Γ(1) = 1, and we obtain the exponential distribution.
Integration by parts verifies that Γ(α + 1) = αΓ(α) for any α > 0. In particular, when n is a positive integer, Γ(n) = (n − 1)!; for example, Γ(6) = 120. The gamma distribution is also called the Erlang distribution in this special case. If α is not an integer, Γ(α) can be found approximately by interpolating between the factorials, or from tables and software programs. Figures 1a and 1b show pdf plots for selected combinations of the parameters α and λ. Figures 1a, 1b, and 2 were produced using MATLAB.
Figure 1a. Examples of the gamma pdf with fixed λ =0.5 and various α
Figure 1b. Examples of the gamma pdf with fixed α = 3 and various λ
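As noted above, Γ(α) must be evaluated numerically for non-integer α; in practice, software uses a series approximation rather than interpolation in factorial tables. The sketch below is a minimal Java illustration (it is not code from the attached programs; the coefficients are the widely published g = 7 Lanczos set):

```java
public class GammaFunction {
    // g = 7, n = 9 Lanczos coefficients (standard published values)
    private static final double[] C = {
        0.99999999999980993, 676.5203681218851, -1259.1392167224028,
        771.32342877765313, -176.61502916214059, 12.507343278686905,
        -0.13857109526572012, 9.9843695780195716e-6, 1.5056327351493116e-7
    };

    public static double gamma(double x) {
        if (x < 0.5) {
            // reflection formula: Γ(x) Γ(1 − x) = π / sin(πx)
            return Math.PI / (Math.sin(Math.PI * x) * gamma(1.0 - x));
        }
        x -= 1.0;
        double a = C[0];
        double t = x + 7.5;           // x + g + 1/2
        for (int i = 1; i < C.length; i++) a += C[i] / (x + i);
        return Math.sqrt(2 * Math.PI) * Math.pow(t, x + 0.5) * Math.exp(-t) * a;
    }

    public static void main(String[] args) {
        System.out.println(gamma(6.0));   // Γ(6) = 5! = 120
        System.out.println(gamma(0.5));   // Γ(1/2) = √π
    }
}
```

The reflection formula extends the approximation to arguments below 1/2, and the recurrence Γ(α + 1) = αΓ(α) is what makes Γ(n) = (n − 1)! for integer n.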
Alternatively, the gamma distribution can be parameterized in terms of a shape parameter α (the same as α in (2.1)) and a scale parameter β equal to 1/λ in (2.1); λ in (2.1) is the inverse of the scale parameter, sometimes called the rate parameter:

(2.3)  f(x | α, β) = x^(α−1) e^(−x/β) / (β^α Γ(α)),  0 ≤ x < ∞, α > 0, β > 0.
Both (2.1) and (2.3) are commonly used; however, (2.1) is the form that will be used throughout the paper and in the attached Java programs.
Cumulative distribution function (cdf)
The gamma distribution does not have a closed-form cdf; in other words, its cdf is not expressible in terms of elementary functions. When α is not an integer, it is impossible to obtain areas under the density function by direct integration. The best we can do is to express the cdf in terms of Γ(α) and γ(α; x), where γ(α; x) is the "lower" incomplete gamma function defined by

(2.4)  γ(α; x) = ∫_0^x t^(α−1) e^(−t) dt.

The cdf is then (Weisstein n.d.)

(2.5)  F(x) = ∫_0^x f(u) du = γ(α; λx) / Γ(α).
Figure 2. Examples of the gamma cdf
In the special case of α being an integer, there is an important relationship between the gamma and the Poisson distributions, which may simplify the evaluation problem, namely
(2.6)  Pr(X ≤ x) = Pr(Y ≥ α) for any x, where Y ~ Poisson(λx).

(2.6) can be established by successive integration by parts; see Casella and Berger (1990) for details.
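For integer α, relation (2.6) turns the cdf evaluation into a finite Poisson sum: Pr(X ≤ x) = Pr(Y ≥ α) = 1 − Σ_(k=0)^(α−1) e^(−λx) (λx)^k / k!. The following Java sketch is illustrative only and is not taken from the attached programs:

```java
public class GammaCdfInteger {
    /** Pr(X <= x) for X ~ gamma(alpha, lambda) with integer shape alpha,
     *  using relation (2.6): Pr(X <= x) = Pr(Y >= alpha), Y ~ Poisson(lambda * x). */
    public static double cdf(int alpha, double lambda, double x) {
        double m = lambda * x;
        double term = Math.exp(-m);   // Pr(Y = 0)
        double below = 0.0;           // accumulates Pr(Y < alpha)
        for (int k = 0; k < alpha; k++) {
            below += term;
            term *= m / (k + 1);      // Pr(Y = k+1) from Pr(Y = k)
        }
        return 1.0 - below;
    }

    public static void main(String[] args) {
        // alpha = 1 reduces to the exponential cdf, F(x) = 1 - e^(-lambda x)
        System.out.println(cdf(1, 0.5, 2.0));
        System.out.println(1.0 - Math.exp(-1.0));
    }
}
```

The recursive update of the Poisson term avoids computing factorials directly, which keeps the sum stable for moderate α.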
The expression (2.5) indicates a potential disadvantage of using the gamma distribution: a numerical algorithm may be needed to evaluate the gamma cdf. However, most spreadsheet programs and many statistical and numerical analysis programs have the gamma distribution built in¹.
In the unlikely event that only tables are available², the evaluation problem can be solved by using readily available tables of the χ² distribution and the fact that, if X ~ gamma(α, λ), then 2λX ~ χ²_(2α). It generally will be necessary to interpolate in the χ² table to obtain the desired values (Kaas et al. 2001, p. 36). For example, if X has a gamma distribution with α = 1.5 = 3/2 and λ = 1/4, then 2λX = X/2 has a χ²_3 distribution. Thus, Pr(X < 3.5) = Pr(X/2 < 1.75) can be found by using χ² tables.
¹ Note that in many applications, for example MS Excel, the parameter λ should be replaced by 1/λ.
² Tables of the general gamma distribution are not so readily available.
Moments & Moment-generating function (mgf)
The mgf for a gamma-distributed random variable is

(2.7)  m(t) = (λ^α / Γ(α)) · [ (1/(λ − t))^α Γ(α) ] = 1 / (1 − t/λ)^α,  for t < λ.
The reader is referred to Wackerly, Mendenhall, and Scheaffer (2002) for a formal derivation.
One way of finding the mean and variance of the gamma distribution is to differentiate (2.7) with respect to t and obtain

(2.8)  E(X) = m'(0) = (α/λ) (1 − t/λ)^(−(α+1)) |_(t=0) = α/λ

(2.9)  E(X²) = m''(0) = (α(α+1)/λ²) (1 − t/λ)^(−(α+2)) |_(t=0) = α(α+1)/λ²

(2.10)  V(X) = E(X²) − [E(X)]² = α(α+1)/λ² − α²/λ² = α/λ²
We can also calculate the moments directly,
(2.11)  E(X^n) = ∫_0^∞ x^n · λ^α x^(α−1) e^(−λx) / Γ(α) dx = Γ(n + α) / (λ^n Γ(α))
Properties
The versatility of the distribution is evident in Figures 1a and 1b. When α = 1, the pdf takes its largest value at x = 0 and declines thereafter (for α < 1 it is in fact unbounded near zero). For α > 1, f(x) is zero at x = 0, rises to a maximum, and then falls away again. The distribution is obviously not symmetrical: it is positively skewed, but as α increases the skewness decreases and the distribution becomes more symmetrical. The sum of n independent gamma random variables, each with pdf (2.1), can be shown to have the gamma distribution with parameters nα and λ. For large n, the Central Limit Theorem tells us that this distribution will be effectively normal with mean nα/λ and variance nα/λ² (Hossack, Pollard, and Zehnwirth 1999, p. 86).
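The sum property also suggests a simple way to simulate gamma variates when α is an integer (the Erlang case): add α independent exponential(λ) variables, each obtained by inversion as −ln(1 − U)/λ. The sketch below is a simplified stand-in for the attached Java Mixture Model Simulation Program, written only to check the mean α/λ and variance α/λ² empirically:

```java
import java.util.Random;

public class ErlangSimulation {
    /** Simulate n Erlang(alpha, lambda) variates as sums of alpha exponentials;
     *  returns {sample mean, sample variance}. */
    public static double[] simulate(int alpha, double lambda, int n, long seed) {
        Random rng = new Random(seed);
        double sum = 0.0, sumSq = 0.0;
        for (int i = 0; i < n; i++) {
            double x = 0.0;
            for (int j = 0; j < alpha; j++) {
                // inversion method; 1 - u avoids log(0) since nextDouble() is in [0, 1)
                x += -Math.log(1.0 - rng.nextDouble()) / lambda;
            }
            sum += x;
            sumSq += x * x;
        }
        double mean = sum / n;
        return new double[]{mean, sumSq / n - mean * mean};
    }

    public static void main(String[] args) {
        double[] s = simulate(3, 0.5, 1_000_000, 42L);
        // theory: mean = alpha/lambda = 6, variance = alpha/lambda^2 = 12
        System.out.printf("mean %.3f, variance %.3f%n", s[0], s[1]);
    }
}
```

For non-integer α a rejection method would be needed instead; the inversion-of-exponentials trick only covers the Erlang case.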
2.2 Applications in Claim Size Modelling
Gamma distribution as a model for ordinary claims
The gamma distribution is important as a model for individual claim sizes because it is moderately skewed to the right and has a non-negative range, both features of claim size distributions.
The reader is warned, however, that there may be problems with the extreme tail of this distribution, and the issue of the shape of the right-hand tail is a critical one for the actuary. If the gamma distribution is suitable, then we may obtain a plausible estimate of the tail probability. If, on the other hand, the true underlying pdf fades away to zero more slowly than the gamma, our estimate of the probability will be too low.
Klugman, Panjer, and Willmot (1998) argued that, if the possibility of exceptionally large claims were remote, there might be little interest in the purchase of insurance. On the other hand, many loss processes produce large claims with enough regularity to inspire people and companies to purchase insurance, and to create problems for the provider of insurance. For any distribution that extends probability to infinity, the issue is how quickly the density function approaches zero as the loss approaches infinity. The slower this happens, the more probability is pushed onto higher values, and we then say the distribution has a heavy, thick, or fat tail (p. 85).
Therefore, it is worth reiterating that, when large claims may occur, the tail of the claim size distribution needs special attention. Reinsurers in particular need to be cautious not to underestimate the tail. Clearly, a reinsurer needs to fit a "tail" which does not decay to zero too quickly. A heavy-tailed distribution³, for instance the Pareto distribution, is therefore a more plausible model if extremely large claims are possible. We return to this problem in the coming sections.
Klugman, Panjer, and Willmot (1998) have demonstrated how to study the tail behaviour of distributions and distinguish the heavy-tailed distributions from the light-tailed ones. They also compared the tail weights of the gamma, lognormal, and Pareto distributions and found that the gamma has a lighter tail than the other two (p. 86). In general, the gamma distribution is considered by actuaries to be a light-tailed distribution and is therefore a reasonable model for small and moderate-sized claims (so-called ordinary or standard claims), that is, for data without large (right-hand) tails. In addition, the gamma distribution has a relatively heavy left-hand tail, which makes it intuitively appealing as a model for small claims.
2.3 Estimation of Parameters
Method of moments vs. Maximum likelihood estimation (MLE)
We have now chosen the gamma family to describe the population of ordinary claims. Our next concern is to determine the values of the parameters α and λ, because the parameters are seldom known a priori: before the proposed gamma distribution can be applied to any particular actuarial problem, α and λ need to be estimated from claims data.
Statisticians usually choose estimators based on criteria such as whether the estimator is unbiased, consistent, and efficient. Of the various standard methods, the method of moments is perhaps the oldest and most readily understood; it finds estimators of unknown parameters by equating corresponding sample and population moments. However, MLE, which selects as estimates the values of the parameters that maximize the likelihood of the observed sample, is by far the most popular estimation technique.
The method of moments is sometimes preferred because of its ease of use and the fact that it sometimes yields estimators with reasonable properties. Unfortunately, moment estimators are
³ When we say light-tailed or heavy-tailed distributions, we implicitly mean the right-hand tail of the distribution unless stated otherwise.
consistent but usually not very efficient. In many circumstances, this method yields estimators that may be improved upon, whereas MLE usually provides estimators that are quite satisfactory as far as the above criteria are concerned. It is well known that MLEs are consistent, asymptotically normal, and asymptotically efficient for large samples under quite general conditions; they are often biased, but the bias is generally removable by a simple adjustment.
In some cases, the method of moments is a good place to start. As we will see later, when it is not possible to maximize the likelihood function by analytic means, the method of moments can provide the first approximation for the iterative solution of the likelihood equations. We will demonstrate both methods.
2.3.1. Method of Moments Estimation
Method of moments estimators are obtained by equating the first k population moments to the corresponding k sample moments and solving the resulting system of simultaneous equations. More specifically, they are the solutions of μ'_k = m'_k for k = 1, 2, …, t, where μ'_k = E(X^k) is the kth population moment, m'_k = (1/n) Σ_(i=1)^n X_i^k is the kth sample moment, and t is the number of parameters to be estimated.
Suppose we have a random sample of n observations, X₁, X₂, …, X_n, selected from a population where X_i ~ gamma(α, λ) (i = 1, 2, …, n). To find method of moments estimators for the two parameters α and λ, we must equate two pairs of population and sample moments. The first two moments of the gamma distribution were given in (2.8) and (2.9).
μ'₁ = α/λ = m'₁ = X̄

μ'₂ = α(α + 1)/λ² = m'₂ = (1/n) Σ_(i=1)^n X_i²
From the first equation we obtain λ̂ = α̂ / X̄. Substituting into the second equation and solving for α̂, we obtain

(2.12)  α̂ = X̄² / ( (1/n) Σ X_i² − X̄² ) = n X̄² / ( Σ X_i² − n X̄² ).

Substituting α̂ into the first equation yields

(2.13)  λ̂ = α̂ / X̄ = n X̄ / ( Σ X_i² − n X̄² ).
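Formulas (2.12) and (2.13) translate directly into code. The following Java sketch computes both estimates; the sample values are made up for illustration, and the attached Java Mixture Model Estimation Program handles the general mixture case:

```java
public class GammaMoments {
    /** Method-of-moments estimates {alphaHat, lambdaHat} for gamma(alpha, lambda),
     *  from formulas (2.12) and (2.13). */
    public static double[] estimate(double[] x) {
        int n = x.length;
        double sum = 0.0, sumSq = 0.0;
        for (double xi : x) { sum += xi; sumSq += xi * xi; }
        double mean = sum / n;
        double denom = sumSq - n * mean * mean;        // sum of x_i^2 minus n * xbar^2
        double alphaHat  = n * mean * mean / denom;    // (2.12)
        double lambdaHat = n * mean / denom;           // (2.13)
        return new double[]{alphaHat, lambdaHat};
    }

    public static void main(String[] args) {
        double[] claims = {1.2, 0.7, 3.4, 2.1, 0.9, 1.8, 2.6, 1.1};  // illustrative data
        double[] est = estimate(claims);
        System.out.printf("alpha-hat = %.4f, lambda-hat = %.4f%n", est[0], est[1]);
    }
}
```

Note that the estimates automatically satisfy α̂/λ̂ = X̄, which matches the first moment equation exactly.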
2.3.2 Maximum Likelihood Estimation
As stated above, MLEs are those values of the parameters that maximize the likelihood function L(x₁, x₂, …, x_n | θ₁, θ₂, …, θ_k). Clearly, the likelihood function is a function of the parameters θ₁, θ₂, …, θ_k; we can therefore write it as L(θ₁, θ₂, …, θ_k), or briefly as L.
It is often easier to work with the natural logarithm of L , ln L , (known as the log likelihood), than it is to work with L directly. This is possible because the log function is strictly increasing on (0, ∞), which implies that the maximum values of L and ln L will occur at the same points.
Bowman and Shenton (1988) found that if X₁, …, X_n are independent and identically distributed as gamma(α, λ), then the MLEs of α and λ are unique but do not have closed-form expressions. Let us work this out.
Suppose we have a sample of n observations, the likelihood function is then
L = ∏_(i=1)^n [ λ^α x_i^(α−1) e^(−λ x_i) / Γ(α) ] = { ∏_(i=1)^n x_i^(α−1) } e^(−λ Σ x_i) λ^(nα) Γ(α)^(−n)

  = { ∏_(i=1)^n x_i }^α { ∏_(i=1)^n x_i }^(−1) e^(−λ Σ x_i) λ^(nα) Γ(α)^(−n)

After dropping the uninformative factor { ∏_(i=1)^n x_i }^(−1), we obtain

(2.14)  L = { ∏_(i=1)^n x_i }^α e^(−λ Σ x_i) λ^(nα) Γ(α)^(−n)

The corresponding log likelihood function of (2.14) is

ln L = α Σ ln x_i − λ Σ x_i + nα ln λ − n ln Γ(α)
Taking partial derivatives with respect to α, λ and setting to zero we obtain,
(2.15)  ∂ln L/∂α = Σ ln x_i + n ln λ − n (d/dα) ln Γ(α) = 0

(2.16)  ∂ln L/∂λ = −Σ x_i + nα/λ = 0.
(2.15), (2.16) are called the likelihood equations, or sometimes normal equations.
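(2.16) gives λ = α/X̄ immediately, but α enters (2.15) through ln Γ(α), so no closed-form solution exists. Substituting λ = α/X̄ into (2.15) and dividing by n reduces the system to the single equation ln α − ψ(α) = ln X̄ − (1/n) Σ ln x_i, which can be solved by Newton's method with the moment estimate (2.12) as the starting value. The sketch below is a simplified stand-in for the attached Newton_Solver.java; the digamma and trigamma routines use standard recurrence-plus-asymptotic-series approximations that are not part of the thesis:

```java
public class GammaMLE {
    /** Digamma ψ(x): recurrence pushes the argument above 10, then asymptotic series. */
    public static double digamma(double x) {
        double r = 0.0;
        while (x < 10.0) { r -= 1.0 / x; x += 1.0; }   // ψ(x) = ψ(x+1) − 1/x
        double f = 1.0 / (x * x);
        return r + Math.log(x) - 0.5 / x
                 - f * (1.0 / 12 - f * (1.0 / 120 - f / 252));
    }

    /** Trigamma ψ'(x): same recurrence-plus-series scheme. */
    public static double trigamma(double x) {
        double r = 0.0;
        while (x < 10.0) { r += 1.0 / (x * x); x += 1.0; }  // ψ'(x) = ψ'(x+1) + 1/x²
        double f = 1.0 / (x * x);
        return r + 1.0 / x + 0.5 * f
                 + (f / x) * (1.0 / 6 - f * (1.0 / 30 - f / 42));
    }

    /** MLE of (alpha, lambda): Newton iteration on g(α) = ln α − ψ(α) − c,
     *  c = ln(xbar) − mean(ln x_i); lambda = alpha / xbar from (2.16). */
    public static double[] fit(double[] x) {
        int n = x.length;
        double sum = 0, sumSq = 0, sumLog = 0;
        for (double xi : x) { sum += xi; sumSq += xi * xi; sumLog += Math.log(xi); }
        double xbar = sum / n;
        double c = Math.log(xbar) - sumLog / n;                     // >= 0 by Jensen
        double alpha = n * xbar * xbar / (sumSq - n * xbar * xbar); // moment start (2.12)
        for (int it = 0; it < 50; it++) {
            double step = (Math.log(alpha) - digamma(alpha) - c)
                        / (1.0 / alpha - trigamma(alpha));
            alpha -= step;
            if (Math.abs(step) < 1e-12 * alpha) break;
        }
        return new double[]{alpha, alpha / xbar};
    }

    public static void main(String[] args) {
        double[] claims = {1.2, 0.7, 3.4, 2.1, 0.9, 1.8, 2.6, 1.1};  // illustrative data
        double[] est = fit(claims);
        System.out.printf("alpha-hat = %.4f, lambda-hat = %.4f%n", est[0], est[1]);
    }
}
```

Since g(α) = ln α − ψ(α) − c is strictly decreasing from +∞ to 0 on (0, ∞), the root is unique for c > 0, consistent with the uniqueness result of Bowman and Shenton (1988).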