Institutionen för Matematik och Fysik Code: MdH-IMa-2005:018

MASTER THESIS IN MATHEMATICS /APPLIED MATHEMATICS

Modelling Insurance Claim Sizes using the Mixture of Gamma & Reciprocal Gamma Distributions

by

Ying Ni

Master thesis in mathematics / applied mathematics

DEPARTMENT OF MATHEMATICS AND PHYSICS MÄLARDALEN UNIVERSITY SE-721 23 VÄSTERÅS, SWEDEN

DEPARTMENT OF MATHEMATICS AND PHYSICS


Master thesis in mathematics / applied mathematics

Date: 2005-12-20

Project name: Modelling Insurance Claim Sizes using the Mixture of Gamma & Reciprocal Gamma Distributions

Author: Ying Ni

Supervisors: Dmitrii Silvestrov, Anatoliy Malyarenko

Examiner: Dmitrii Silvestrov

Comprising: 20 points

Abstract

A gamma-reciprocal gamma mixture distribution is used to model the claim size distribution. It is specially designed to handle both ordinary and extremely large claims. The gamma distribution has the lighter tail and is meant to be a model for the frequent, small and moderate claims, while the reciprocal gamma distribution covers the large but infrequent claims.

We begin by introducing the gamma distribution and motivating its role in modelling small and moderate claim sizes, followed by a discussion of the reciprocal gamma distribution and its application in modelling large claim sizes; finally, we present the mixture of gamma and reciprocal gamma as the generally applicable statistical model.

Two parameter estimation techniques, namely the method of moments and maximum likelihood estimation, are provided to fit the single gamma, single reciprocal gamma, and general mixture models. We explain in detail how the resulting moment equations and likelihood equations are solved, and we also evaluate the proposed moment estimation algorithm.

Two Java applications, namely the Java Mixture Model Simulation Program and the Java Mixture Model Estimation Program are developed to facilitate the analysis. While the former simulates samples that follow the mixture distribution given known parameters, the latter provides the parameter estimation for simplified cases given samples.

Keywords: Gamma distribution; Reciprocal gamma distribution; Mixture of gamma and reciprocal gamma; Claim size distribution; Large claim size distribution; Method of moments; Maximum likelihood estimation; Estimating the parameters of gamma; Estimating the parameters of reciprocal gamma

Notation & Abbreviations

Notation

Pr(…): the probability of …
$\Gamma(\alpha)$: gamma function, $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$
$\psi(\alpha)$: digamma function, $\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha)$
$\psi'(\alpha)$: trigamma function
$\gamma(\alpha; x)$: “lower” incomplete gamma function, $\gamma(\alpha; x) = \int_0^x t^{\alpha-1} e^{-t}\,dt$
$\Gamma(\alpha; x)$: “upper” incomplete gamma function, $\Gamma(\alpha; x) = \int_x^\infty t^{\alpha-1} e^{-t}\,dt$
$L$: likelihood function
$\ln L$: log likelihood function
$\chi^2_\alpha$: chi-square distribution with α degrees of freedom

Abbreviations

cdf: cumulative distribution function
mgf: moment-generating function
MLE: maximum likelihood estimation, maximum likelihood estimator, or maximum likelihood estimate
pdf: probability density function
R-gamma: reciprocal gamma (distribution)

Table of Contents

1. Introduction

2. The Gamma Distribution
2.1. Definition, Moments & Properties
2.2. Applications in Claim Size Modelling
2.3. Estimation of Parameters
2.3.1. Method of Moments Estimation
2.3.2. Maximum Likelihood Estimation

3. The Reciprocal Gamma Distribution
3.1. Definition, Moments & Properties
3.2. Applications in Claim Size Modelling
3.3. Estimation of Parameters
3.3.1. Method of Moments Estimation
3.3.2. Maximum Likelihood Estimation

4. The Mixture of Gamma & Reciprocal Gamma
4.1. The Mixture of Gamma & Reciprocal Gamma
4.2. Estimation of Parameters
4.2.1. Method of Moments Estimation
4.2.2. Maximum Likelihood Estimation

5. Simulation Studies
5.1. Simulation
5.2. Estimation

6. Conclusion

7. References

8. Appendixes
Appendix A: Java Mixture Model Simulation Program - Users’ Guide
Appendix B: Java Mixture Model Estimation Program - Users’ Guide
Appendix C: Java Source Code: MixtureModelSimulation.java
Appendix D: Java Source Code: MixtureModelEstimation.java
Appendix E: Java Source Code: Newton_Solver.java

1. Introduction

It is not unusual that, in addition to a large number of “ordinary” policies, an insurance portfolio also contains a small number of policies under which much larger claim sizes are likely. Those large claims usually represent the greatest part of the indemnities paid by the insurer. It is therefore essential for the actuary to have a good model that takes those extremely large claims into consideration.

In this paper, we introduce a new and generally applicable statistical model for individual insurance claim sizes, the mixture of gamma and reciprocal gamma model, which is specially designed to handle both ordinary and extremely large claims. The model uses the gamma distribution as its first component, with mixing probability p, and the reciprocal gamma distribution as its second, with mixing probability 1 − p. The gamma distribution has the lighter tail and is meant to model the frequent, small claims, while the reciprocal gamma distribution covers the large but infrequent claims.

If we set p = 1, the general model reduces to the single gamma model, which is important in its own right as a model for ordinary claim sizes, that is, data without heavy tails. At p = 0, it reduces to the single reciprocal gamma model, which, because of its heavy tail, can model the extremely large claims.

We shall discuss the statistical properties of the two component distributions, gamma and reciprocal gamma distributions and their advantages in modelling these two types of claims, namely, the ordinary and extra large claims.

We shall also be interested in the estimation of parameters, because the parameters are seldom known a priori and need to be estimated from claim data before the proposed distribution can be applied to any particular actuarial problem. The method of moments and maximum likelihood estimation are applied to these problems. Both estimation techniques are illustrated in detail for the gamma distribution, the reciprocal gamma distribution, and finally their mixture.

The aim of this paper is to motivate the general mixture model, specify and study its two components, and discuss the related parameter estimation techniques. The remainder of the paper proceeds as follows. In section 2, we describe the gamma distribution, its applications in claim size modelling, and the estimation of its parameters using the method of moments and maximum likelihood estimation. In section 3, we repeat the procedure for the reciprocal gamma distribution. The general mixture is presented in section 4, together with the two estimation techniques for this relatively complicated case. Section 5 illustrates the simulation of the relevant random variables with the help of our Java Mixture Model Simulation Program, and evaluates the proposed moment estimation algorithm for simplified cases using the Java Mixture Model Estimation Program. Finally, section 6 concludes.


2. The Gamma Distribution

2.1 Definition, Moments & Properties

Probability density function (pdf)

The gamma distribution is a well-known, flexible continuous distribution that can model individual claim size. It has the pdf

$$f(x \mid \alpha, \lambda) = \frac{\lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}, \qquad 0 \le x < \infty,\ \alpha > 0,\ \lambda > 0, \tag{2.1}$$

where Γ(α) is the gamma function defined by

$$\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx. \tag{2.2}$$

When α = 1, direct integration verifies that Γ(1) = 1, and (2.1) then reduces to the exponential distribution.

Integration by parts verifies that Γ(α + 1) = αΓ(α) for any α > 0. In particular, when n is a positive integer, Γ(n) = (n − 1)!; for example, Γ(6) = 5! = 120. When α is a positive integer, the gamma distribution is also called the Erlang distribution. If α is not an integer, Γ(α) can be found approximately by interpolating between the factorials, or from tables and software programs. Figures 1a and 1b show the pdf for selected combinations of the parameters α and λ. Figures 1a, 1b and 2 were produced using MATLAB.

Figure 1a. Examples of the gamma pdf with fixed λ =0.5 and various α


Figure 1b. Examples of the gamma pdf with fixed α = 3 and various λ

Alternatively, the gamma distribution can be parameterized in terms of a shape parameter α (the same α as in (2.1)) and a scale parameter β = 1/λ. The parameter λ in (2.1) is thus the inverse of the scale parameter, sometimes called the rate parameter:

$$f(x \mid \alpha, \beta) = \frac{x^{\alpha-1} e^{-x/\beta}}{\beta^{\alpha}\,\Gamma(\alpha)}, \qquad 0 \le x < \infty,\ \alpha > 0,\ \beta > 0. \tag{2.3}$$

Both (2.1) and (2.3) are commonly used; however, (2.1) is the form used throughout this paper and in the attached Java programs.

Cumulative distribution function (cdf)

The gamma distribution does not have a closed-form cdf; in other words, its cdf cannot in general be expressed in terms of elementary functions. When α is not an integer, it is impossible to obtain areas under the density function by direct integration. The best we can do is to express the cdf in terms of Γ(α) and γ(α; x), where γ(α; x) is the “lower” incomplete gamma function defined by

$$\gamma(\alpha; x) = \int_0^x t^{\alpha-1} e^{-t}\,dt. \tag{2.4}$$

The cdf is then (Weisstein n.d.)

$$F(x) = \int_0^x f(u)\,du = \frac{\gamma(\alpha; \lambda x)}{\Gamma(\alpha)}. \tag{2.5}$$

Figure 2. Examples of the gamma cdf

In the special case of α being an integer, there is an important relationship between the gamma and the Poisson distributions, which may simplify the evaluation problem, namely

$$\Pr(X \le x) = \Pr(Y \ge \alpha) \tag{2.6}$$

for any x, where X ~ gamma(α, λ) and Y ~ Poisson(λx). Relation (2.6) can be established by successive integration by parts; see Casella and Berger (1990) for details.
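Relation (2.6) can be turned into a few lines of code. The following is a minimal sketch, assuming an integer shape α ≥ 1; the class and method names are hypothetical and are not taken from the Java programs attached to this thesis. It evaluates Pr(X ≤ x) as one minus the Poisson probability of observing fewer than α events.

```java
/** Gamma cdf for integer shape via the Poisson identity (2.6). Illustrative sketch only. */
final class GammaCdfIntegerShape {

    /**
     * Pr(X <= x) for X ~ gamma(alpha, lambda) with integer alpha >= 1,
     * computed as Pr(Y >= alpha) = 1 - sum_{k=0}^{alpha-1} e^{-m} m^k / k!,
     * where Y ~ Poisson(m) and m = lambda * x.
     */
    static double cdf(int alpha, double lambda, double x) {
        if (alpha < 1 || lambda <= 0.0) {
            throw new IllegalArgumentException("need integer alpha >= 1 and lambda > 0");
        }
        if (x <= 0.0) {
            return 0.0;
        }
        double m = lambda * x;
        double term = Math.exp(-m); // Poisson probability of k = 0
        double below = 0.0;         // accumulates Pr(Y <= alpha - 1)
        for (int k = 0; k < alpha; k++) {
            below += term;
            term *= m / (k + 1);    // advance to the probability of k + 1
        }
        return 1.0 - below;
    }

    public static void main(String[] args) {
        // gamma(2, 0.5) evaluated at its mean, x = 4
        System.out.println(cdf(2, 0.5, 4.0));
    }
}
```

For α = 1 the loop has a single term and the result is 1 − exp(−λx), the exponential cdf, which is a convenient sanity check. For non-integer α the identity does not apply and a numerical routine for the incomplete gamma function is still needed.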

Expression (2.5) indicates a potential disadvantage of the gamma distribution: a numerical algorithm may be needed to evaluate its cdf. However, most spreadsheet programs and many statistical and numerical analysis programs have the gamma distribution built in.¹

In the unlikely event that only tables are available,² the evaluation problem can be solved by using readily available tables of the χ² distribution and the fact that, if X ~ gamma(α, λ), then $2\lambda X \sim \chi^2_{2\alpha}$. It will generally be necessary to interpolate in the χ² table to obtain the desired values (Kaas et al. 2001, p. 36). For example, if X has a gamma distribution with α = 1.5 = 3/2 and λ = 1/4, then 2λX = X/2 has a $\chi^2_3$ distribution. Thus Pr(X < 3.5) = Pr(X/2 < 1.75) can be found from χ² tables.

¹ Note that in many applications, for example MS Excel, the parameter λ should be replaced by 1/λ.
² Tables of the general gamma distribution are not so readily available.

Moments & Moment-generating function (mgf)

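The mgf of a gamma-distributed random variable is

$$m(t) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\left[\left(\frac{1}{\lambda - t}\right)^{\alpha}\Gamma(\alpha)\right] = \frac{1}{(1 - t/\lambda)^{\alpha}} \quad \text{for } t < \lambda. \tag{2.7}$$

The reader is referred to Wackerly, Mendenhall, and Scheaffer (2002) for a formal derivation.

One way of finding the mean and variance of the gamma distribution is to differentiate (2.7) with respect to t and obtain

$$E(X) = m'(0) = \left.\frac{\alpha/\lambda}{(1 - t/\lambda)^{\alpha+1}}\right|_{t=0} = \frac{\alpha}{\lambda}, \tag{2.8}$$

$$E(X^2) = m''(0) = \left.\frac{d}{dt}\left(\frac{\alpha/\lambda}{(1 - t/\lambda)^{\alpha+1}}\right)\right|_{t=0} = \left.\frac{\alpha(\alpha+1)/\lambda^{2}}{(1 - t/\lambda)^{\alpha+2}}\right|_{t=0} = \frac{\alpha(\alpha+1)}{\lambda^{2}}, \tag{2.9}$$

$$V(X) = \frac{\alpha(\alpha+1)}{\lambda^{2}} - \frac{\alpha^{2}}{\lambda^{2}} = \frac{\alpha}{\lambda^{2}}. \tag{2.10}$$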

We can also calculate the moments directly,

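$$E(X^n) = \int_0^\infty x^{n}\,\frac{\lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}\,dx = \frac{\Gamma(n+\alpha)}{\lambda^{n}\,\Gamma(\alpha)}. \tag{2.11}$$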
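As a quick check of (2.11), a worked step added here for illustration, setting n = 1 and n = 2 and applying Γ(α + 1) = αΓ(α) reproduces the moments (2.8) and (2.9) obtained from the mgf:

```latex
\begin{aligned}
E(X)   &= \frac{\Gamma(\alpha+1)}{\lambda\,\Gamma(\alpha)} = \frac{\alpha}{\lambda}, \\
E(X^2) &= \frac{\Gamma(\alpha+2)}{\lambda^{2}\,\Gamma(\alpha)}
        = \frac{(\alpha+1)\,\alpha\,\Gamma(\alpha)}{\lambda^{2}\,\Gamma(\alpha)}
        = \frac{\alpha(\alpha+1)}{\lambda^{2}}.
\end{aligned}
```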

Properties

The versatility of the distribution is evident in Figures 1a and 1b. When α = 1, the pdf takes its largest value at x = 0 and declines thereafter. When α > 1, f(x) is zero at x = 0, rises to a maximum and then falls away again. (When α < 1, the pdf is unbounded near x = 0 and decreases throughout.) The distribution is not symmetrical: it is positively skewed, but as α increases the skewness decreases and the distribution becomes more symmetrical. The sum of n independent gamma random variables, each with pdf (2.1), has the gamma distribution with parameters nα and λ. For large n, the Central Limit Theorem tells us that this distribution is approximately normal with mean nα/λ and variance nα/λ² (Hossack, Pollard, and Zehnwirth 1999, p. 86).

2.2 Applications in Claim Size Modelling

Gamma distribution as a model for ordinary claims

The Gamma distribution is important as a model for the individual claim size, because it is moderately skewed to the right and has a non-negative range, which is a feature of claim size distributions.

The reader is warned, however, that there may be problems with the extreme tail of this distribution, and the shape of the right-hand tail is a critical issue for the actuary. If the gamma distribution is suitable, then we may have obtained a plausible estimate of the tail probability. If, on the other hand, the true underlying pdf fades away to zero more slowly than the gamma, our estimate of the probability will be too low.

Klugman, Panjer, and Willmot (1998) argued that, if the possibility of exceptional large claims were remote, there might be little interest in the purchase of insurance. On the other hand, many loss processes produce large claims with enough regularity to inspire people and companies to purchase insurance and to create problems for the provider of insurance. For any distribution that extends probability to infinity, the issue is one of how quickly the density function approaches zero as the loss approaches infinity. The slower this happens, the more probability is pushed onto higher values and we then say the distribution has a heavy, thicker, or fat tail (p. 85).

Therefore, it is worth reiterating that, when large claims may occur, the tail of the claim size distribution needs special attention. Reinsurers in particular need to be careful not to underestimate the tail. Clearly, a reinsurer needs to fit a “tail” that does not decay to zero too quickly. A heavy-tailed distribution³ is therefore a more plausible model when extremely large claims are possible. We return to this problem in the coming sections.

Klugman, Panjer, and Willmot (1998) have demonstrated how to study the tail behaviour of distributions and how to distinguish heavy-tailed from light-tailed distributions. They also compared the tail weights of the gamma, lognormal and Pareto distributions and found that the gamma has a lighter tail than the other two (p. 86). In general, actuaries consider the gamma distribution to be light-tailed and therefore a reasonable model for small and moderate-sized claims (so-called ordinary or standard claims), that is, for data without large right-hand tails. In addition, the gamma distribution has a relatively heavy left-hand tail, which makes it intuitively appealing as a model for small claims.

2.3 Estimation of Parameters

Method of moments vs. maximum likelihood estimation (MLE)

We have now chosen the gamma family to describe the population of ordinary claims. Our next concern is to determine the values of the parameters α and λ, because they are seldom known a priori. Before the proposed gamma distribution can be applied to any particular actuarial problem, α and λ need to be estimated from claims data.

Statisticians usually choose estimators based on criteria such as whether the estimator is unbiased, consistent and efficient. Of the various standard methods, the method of moments is perhaps the oldest and most readily understood: it finds estimators of unknown parameters by equating corresponding sample and population moments. However, MLE, which selects as estimates the parameter values that maximize the likelihood of the observed sample, is by far the most popular estimation technique.

The method of moments is sometimes preferred because of its ease of use and because it sometimes yields estimators with reasonable properties. Unfortunately, moment estimators are consistent but usually not very efficient. In many circumstances this method yields estimators that can be improved upon, whereas MLE usually provides estimators that are quite satisfactory as far as the above criteria are concerned. It is well known that, under quite general conditions, MLEs are consistent, asymptotically normal and asymptotically efficient for large samples. They are often biased, but the bias can generally be removed by a simple adjustment.

³ When we say light-tailed or heavy-tailed distributions, we implicitly mean the right-hand tail of the distribution unless stated otherwise.

In some cases the method of moments is a good place to start. As we will see later, when the likelihood function cannot be maximized analytically, the method of moments can provide the first approximation for the iterative solution of the likelihood equations. We will demonstrate both methods.

2.3.1. Method of Moments Estimation

Method of moments estimators are obtained by equating the first t population moments to the corresponding t sample moments and solving the resulting system of simultaneous equations. More specifically, they are the solutions of $\mu'_k = m'_k$ for k = 1, 2, …, t, where $\mu'_k = E(X^k)$ is the kth population moment, $m'_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$ is the kth sample moment, and t is the number of parameters to be estimated.

Suppose we have a random sample of n observations X1, X2, …, Xn, selected from a population where $X_i \sim \text{gamma}(\alpha, \lambda)$, i = 1, 2, …, n. To find the method of moments estimators of the two parameters α and λ, we must equate two pairs of population and sample moments. The first two moments of the gamma distribution were given in (2.8) and (2.9):

$$\mu'_1 = \frac{\alpha}{\lambda} = m'_1 = \bar{X},$$

$$\mu'_2 = \frac{\alpha(\alpha+1)}{\lambda^{2}} = m'_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2.$$

From the first equation we obtain $\hat\lambda = \hat\alpha / \bar{X}$. Substituting into the second equation and solving for $\hat\alpha$, we obtain

$$\hat\alpha = \frac{\bar{X}^2}{\left(\sum X_i^2\right)/n - \bar{X}^2} = \frac{n\bar{X}^2}{\sum X_i^2 - n\bar{X}^2}. \tag{2.12}$$

Substituting $\hat\alpha$ into the first equation yields

$$\hat\lambda = \frac{\hat\alpha}{\bar{X}} = \frac{n\bar{X}}{\sum X_i^2 - n\bar{X}^2}. \tag{2.13}$$
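The estimators (2.12) and (2.13) need only the sample size, the sample mean and the sum of squares. The following is a minimal sketch; the class and method names are hypothetical and are not those of the Java Mixture Model Estimation Program described in the appendixes.

```java
/** Method of moments estimates for gamma(alpha, lambda), following (2.12) and (2.13). */
final class GammaMomentEstimator {

    /** Returns {alphaHat, lambdaHat} for a sample of claim sizes. */
    static double[] estimate(double[] x) {
        int n = x.length;
        double sum = 0.0;
        double sumSq = 0.0;
        for (double v : x) {
            sum += v;
            sumSq += v * v;
        }
        double mean = sum / n;
        // Common denominator of (2.12) and (2.13): sum(x_i^2) - n * mean^2.
        double denom = sumSq - n * mean * mean;
        if (n < 2 || denom <= 0.0) {
            throw new IllegalArgumentException("need at least two distinct observations");
        }
        double alphaHat = n * mean * mean / denom;
        double lambdaHat = n * mean / denom;
        return new double[] {alphaHat, lambdaHat};
    }

    public static void main(String[] args) {
        // Toy sample, chosen only so the arithmetic is easy to follow by hand.
        double[] est = estimate(new double[] {1.0, 2.0, 3.0});
        System.out.println("alpha = " + est[0] + ", lambda = " + est[1]);
    }
}
```

For the toy sample {1, 2, 3}, the mean is 2 and the sum of squares is 14, so the denominator is 14 − 12 = 2 and the estimates are α̂ = 12/2 = 6 and λ̂ = 6/2 = 3.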

2.3.2 Maximum Likelihood Estimation

As stated above, MLEs are those values of the parameters that maximize the likelihood function $L(x_1, x_2, \ldots, x_n \mid \theta_1, \theta_2, \ldots, \theta_k)$. Since the likelihood function is a function of the parameters $\theta_1, \theta_2, \ldots, \theta_k$, we can write it as $L(\theta_1, \theta_2, \ldots, \theta_k)$, or sometimes briefly as L.

It is often easier to work with the natural logarithm of L, ln L (known as the log likelihood), than with L directly. This is possible because the logarithm is strictly increasing on (0, ∞), which implies that L and ln L attain their maxima at the same points.

Bowman and Shenton (1988) showed that if X1, …, Xn are independent and identically distributed as gamma(α, λ), then the MLEs of α and λ are unique but do not have closed-form expressions. We work through the derivation below.

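Suppose we have a sample of n observations. The likelihood function is then

$$L = \prod_{i=1}^{n} \frac{\lambda^{\alpha} x_i^{\alpha-1} e^{-\lambda x_i}}{\Gamma(\alpha)} = \left\{\prod_{i=1}^{n} x_i\right\}^{\alpha-1} e^{-\lambda\sum x_i}\,\lambda^{n\alpha}\,\Gamma(\alpha)^{-n} = \left\{\prod_{i=1}^{n} x_i\right\}^{-1}\left\{\prod_{i=1}^{n} x_i\right\}^{\alpha} e^{-\lambda\sum x_i}\,\lambda^{n\alpha}\,\Gamma(\alpha)^{-n}.$$

After dropping the uninformative factor $\left\{\prod_{i=1}^{n} x_i\right\}^{-1}$, we obtain

$$L = \left\{\prod_{i=1}^{n} x_i\right\}^{\alpha} e^{-\lambda\sum x_i}\,\lambda^{n\alpha}\,\Gamma(\alpha)^{-n}. \tag{2.14}$$

The log likelihood function corresponding to (2.14) is

$$\ln L = \alpha\sum \ln x_i - \lambda\sum x_i + n\alpha\ln\lambda - n\ln\Gamma(\alpha).$$

Taking partial derivatives with respect to α and λ and setting them to zero, we obtain

$$\frac{\partial \ln L}{\partial \alpha} = \sum \ln x_i + n\ln\lambda - n\left(\frac{d}{d\alpha}\ln\Gamma(\alpha)\right) = 0, \tag{2.15}$$

$$\frac{\partial \ln L}{\partial \lambda} = -\sum x_i + \frac{n\alpha}{\lambda} = 0. \tag{2.16}$$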

(2.15), (2.16) are called the likelihood equations, or sometimes normal equations.

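Solving (2.16) for λ gives

$$\frac{1}{\hat\lambda} = \frac{\sum x_i}{n\hat\alpha} = \frac{\bar{x}}{\hat\alpha} \;\Longrightarrow\; \hat\lambda = \frac{\hat\alpha}{\bar{x}}, \tag{2.17}$$

where $\hat\alpha$ and $\hat\lambda$ are the MLEs of α and λ respectively and $\bar{x} = \sum x_i / n$ is the sample mean. Equation (2.15) then turns into

$$\overline{\ln x} - \ln\bar{x} + \ln\hat\alpha - \psi(\hat\alpha) = 0, \tag{2.18}$$

where $\overline{\ln x} = \left(\sum \ln x_i\right)/n$ and $\psi(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha)$ (the logarithmic derivative of the gamma function) is called the digamma function.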

It is readily seen that the system of equations (2.15) and (2.16) cannot be solved analytically because of the digamma and logarithm functions; this confirms that the MLEs do not have a closed form. They are, however, unique, and numerical methods must be applied to obtain them. We may first determine $\hat\alpha$ using an iterative search such as the Newton-Raphson method; $\hat\alpha$ is then substituted into (2.17) to determine $\hat\lambda$, as demonstrated in the following example.

A Numerical Example: Newton-Raphson method⁴

Suppose we have a sample known to come from gamma(2, 0.5), each value of which was obtained by adding four consecutive entries from a table of realizations of $\chi^2_1$ variables.⁵ The sample is as follows:

Table 1. A sample of gamma (2,0.5) distributed random numbers

5.6135   2.2197   5.8971   3.8243   6.5021
0.8590   3.5452   2.4983   2.0567   0.7797
1.0184  11.3595   2.1279   2.0924   4.1648
4.9673   0.3939   4.7217   2.9399   6.8468
5.6229   3.6467   6.0812   1.9336   4.4899

$$\sum x_i = 96.1934, \qquad \sum \ln x_i = 27.59314, \qquad \sum x_i^2 = 517.5038,$$

$$\bar{x} = 3.8477, \qquad \overline{\ln x} = 1.1037, \qquad \overline{x^2} = 20.702, \qquad \ln\bar{x} = 1.3475, \qquad \ln\bar{x} - \overline{\ln x} = 0.2438.$$

For this sample, given that $\ln\bar{x} - \overline{\ln x} = 0.2438$, (2.18) turns into

$$h(\alpha) = \ln\alpha - \psi(\alpha) - 0.2438 = 0. \tag{2.19}$$

⁴ This example is adapted from Handbook of Applicable Mathematics, pp. 316-318.
⁵ Recall from section 2.1 that the sum of n independent gamma(α, λ) variables is distributed as gamma(nα, λ). Since $\chi^2_1$ is essentially gamma(0.5, 0.5), the sum of 4 independent $\chi^2_1$ variables is gamma(2, 0.5).


We first solve (2.19) for $\hat\alpha$ and then substitute the result into (2.17) to obtain

$$\hat\lambda = \frac{\hat\alpha}{3.8477}.$$

As discussed before, we can use the method of moments estimates, obtained from (2.12) and (2.13), as the first approximation:

$$\hat\alpha = \frac{n\bar{x}^2}{\sum x_i^2 - n\bar{x}^2} = 2.52, \qquad \hat\lambda = \frac{n\bar{x}}{\sum x_i^2 - n\bar{x}^2} = 0.65.$$

These give the first approximations for the iterative solution of the likelihood equations (2.15) and (2.16).

We will use the Newton-Raphson method. The formula is

$$x_{n+1} = x_n - \frac{h(x_n)}{h'(x_n)} \tag{2.20}$$

(See Adams and Robert 2003, p.273)

Applying (2.20), the iteration proceeds as follows,

Table 2. Solving the likelihood equation of gamma distribution using Newton-Raphson method

| Iteration No. | α | ψ(α) | h(α) | ψ′(α) | h′(α) | Correction (−h(α)/h′(α)) |
|---|---|---|---|---|---|---|
| 1 | 2.52 | 0.713 | -0.032 | 0.49 | -0.09 | -0.36 |
| 2 | 2.160 | 0.521 | 0.0052 | 0.590 | -0.123 | 0.042 |
| 3 | 2.202 | 0.5454 | 0.00013 | 0.5723 | -0.1182 | 0.0011 |
| 4 | 2.203 | | | | | |

Thus $\hat\alpha = 2.203$, whence $\hat\lambda = 0.573$, which are quite satisfactory results. They are also closer to the true values (α = 2, λ = 0.5) than the method of moments estimates.

In the above computations, ψ(α) is the digamma function and ψ′(α) is its first derivative, known as the trigamma function. The nth derivative of ψ(α) is called the polygamma function, denoted $\psi^{(n)}(\alpha)$. These functions can be evaluated in mathematical software; for example, in Mathematica, ψ(α) is returned by the function PolyGamma[z] or PolyGamma[0, z], and $\psi^{(n)}(\alpha)$ is implemented as PolyGamma[n, z].

Approximating MLEs in closed-form

Alternatively, ψ(α) can be approximated, very accurately using the following Equation (see Shorack 2000, p.548).

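$$\psi(\alpha) = \ln\alpha - \frac{1}{2\alpha} - \frac{1}{12\alpha^{2}} + \frac{1}{120\alpha^{4}} - \frac{1}{252\alpha^{6}} + \frac{1}{240\alpha^{8}} - \cdots \tag{2.21}$$

Taking the first derivative of (2.21) yields

$$\psi'(\alpha) = \frac{1}{\alpha} + \frac{1}{2\alpha^{2}} + \frac{1}{6\alpha^{3}} - \frac{1}{30\alpha^{5}} + \frac{1}{42\alpha^{7}} - \frac{1}{30\alpha^{9}} + \cdots \tag{2.22}$$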
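The series (2.21) and (2.22) are enough to run the Newton-Raphson iteration (2.20) on h(α) = ln α − ψ(α) − s without special-function libraries. The sketch below is an illustration under stated assumptions, not the Newton_Solver.java of the appendixes: the class and method names are hypothetical, the series are truncated after the terms shown above, and the standard recurrences ψ(x) = ψ(x + 1) − 1/x and ψ′(x) = ψ′(x + 1) + 1/x² are used first to shift the argument to at least 6 (an arbitrary cutoff chosen here so that the truncated series is accurate).

```java
/** Newton-Raphson solution of the gamma likelihood equation (2.18). Illustrative sketch only. */
final class GammaMleNewton {

    /** Digamma via upward recurrence, then the truncated series (2.21). */
    static double digamma(double x) {
        if (x <= 0.0) throw new IllegalArgumentException("x must be positive");
        double shift = 0.0;
        while (x < 6.0) {          // psi(x) = psi(x + 1) - 1/x
            shift -= 1.0 / x;
            x += 1.0;
        }
        double inv2 = 1.0 / (x * x);
        double series = Math.log(x) - 0.5 / x
                - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 * (1.0 / 252 - inv2 / 240)));
        return shift + series;
    }

    /** Trigamma via upward recurrence, then the truncated series (2.22). */
    static double trigamma(double x) {
        if (x <= 0.0) throw new IllegalArgumentException("x must be positive");
        double shift = 0.0;
        while (x < 6.0) {          // psi'(x) = psi'(x + 1) + 1/x^2
            shift += 1.0 / (x * x);
            x += 1.0;
        }
        double inv = 1.0 / x;
        double inv2 = inv * inv;
        double series = inv + 0.5 * inv2
                + inv * inv2 * (1.0 / 6 - inv2 * (1.0 / 30 - inv2 * (1.0 / 42 - inv2 / 30)));
        return shift + series;
    }

    /** Solves ln(a) - psi(a) - s = 0 for a, where s = ln(mean of x) - mean of ln(x). */
    static double solveAlpha(double s, double start) {
        double a = start;
        for (int i = 0; i < 50; i++) {
            double h = Math.log(a) - digamma(a) - s;
            double hPrime = 1.0 / a - trigamma(a);
            double next = a - h / hPrime;      // Newton-Raphson step (2.20)
            if (next <= 0.0) next = a / 2.0;   // keep the iterate positive
            if (Math.abs(next - a) < 1e-12) return next;
            a = next;
        }
        return a;
    }

    public static void main(String[] args) {
        // s and the starting value are taken from the numerical example above.
        double alphaHat = solveAlpha(0.2438, 2.52);
        double lambdaHat = alphaHat / 3.8477;  // (2.17)
        System.out.println("alpha = " + alphaHat + ", lambda = " + lambdaHat);
    }
}
```

With s = 0.2438 and the moment estimate 2.52 as the starting point, the iteration should settle close to the values reported in Table 2, α̂ ≈ 2.203 and λ̂ ≈ 0.573. The guard that halves the iterate is a design choice of this sketch to keep α positive if a Newton step overshoots.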

Hory and Martin (2002) truncated the approximation (2.21) of ψ(α) after its first three terms, and in this way derived an analytical approximation to the solution of the likelihood equations (2.15) and (2.16). The resulting approximate MLEs, in closed form, are

$$\hat\alpha = \frac{1 + \sqrt{1 + 4(\ln u_2 - u_1)/3}}{4(\ln u_2 - u_1)}, \qquad \hat\lambda = \frac{1 + \sqrt{1 + 4(\ln u_2 - u_1)/3}}{4u_2(\ln u_2 - u_1)}, \tag{2.23}$$

where $u_1 = \overline{\ln x}$ and $u_2 = \bar{x}$.

Applying (2.23) directly to our numerical example gives the approximate MLEs

$$\hat\alpha = \frac{1 + \sqrt{1 + 4(\ln(3.8477) - 1.1037)/3}}{4(\ln(3.8477) - 1.1037)} = 2.206,$$

$$\hat\lambda = \frac{1 + \sqrt{1 + 4(\ln(3.8477) - 1.1037)/3}}{4(3.8477)(\ln(3.8477) - 1.1037)} = 0.573.$$

In this case, the approximation is very satisfactory. In general, Hory and Martin (2002)’s study has shown that (2.23) often leads to slightly over-estimated parameters.
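The closed form (2.23) is straightforward to code. The following is a minimal sketch with hypothetical class and method names; it takes the two sample statistics u1 (mean of ln x) and u2 (mean of x) as inputs.

```java
/** Closed-form approximate MLEs for gamma(alpha, lambda), following (2.23). Illustrative sketch only. */
final class GammaApproxMle {

    /** Returns {alphaHat, lambdaHat}; u1 is the mean of ln(x), u2 is the mean of x. */
    static double[] estimate(double u1, double u2) {
        // s = ln(mean) - mean(ln) is positive unless all observations are equal.
        double s = Math.log(u2) - u1;
        if (!(s > 0.0)) {
            throw new IllegalArgumentException("ln(u2) - u1 must be positive");
        }
        double root = 1.0 + Math.sqrt(1.0 + 4.0 * s / 3.0);
        double alphaHat = root / (4.0 * s);
        double lambdaHat = root / (4.0 * u2 * s);
        return new double[] {alphaHat, lambdaHat};
    }

    public static void main(String[] args) {
        // Sample statistics from the numerical example above.
        double[] est = estimate(1.1037, 3.8477);
        System.out.println("alpha = " + est[0] + ", lambda = " + est[1]);
    }
}
```

Because (2.23) is the positive root of the quadratic obtained from 1/(2α) + 1/(12α²) = ln u2 − u1, the result can also serve as a starting value for an iterative solution of (2.18).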

3. The Reciprocal Gamma Distribution

3.1 Definition, Moments & Properties of the Distribution

Probability density function (pdf)

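The random variable Y = 1/X is reciprocal gamma (R-gamma) distributed if X is gamma distributed. Its pdf can be derived from the gamma pdf (2.1) through the transformation Y = g(X) = 1/X:

$$f_Y(y) = f_X\!\left(g^{-1}(y)\right)\left|\frac{d}{dy}\,g^{-1}(y)\right| = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\left(\frac{1}{y}\right)^{\alpha-1}\exp\!\left(\frac{-\lambda}{y}\right)\frac{1}{y^{2}} = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\left(\frac{1}{y}\right)^{\alpha+1}\exp\!\left(\frac{-\lambda}{y}\right) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,y^{-\alpha-1}\exp\!\left(\frac{-\lambda}{y}\right).$$

So the R-gamma pdf is

$$f(x \mid \alpha, \lambda) = \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1}\exp\!\left(\frac{-\lambda}{x}\right), \qquad 0 < x < \infty,\ \alpha > 0,\ \lambda > 0, \tag{3.1}$$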

where α is known as the shape parameter, and λ the scale parameter.

Figure 3 (a, b) shows the pdf plots for selected combinations of the parameters α and λ. Figure 3a, 3b and 4 are produced using MATLAB.

Figure 3a. Examples of the R-gamma pdf with fixed λ = 1 and various α

Figure 3b. Examples of the R-gamma pdf with fixed α = 3 and various λ

Cumulative distribution function (cdf)

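The cdf of the R-gamma distribution can be expressed in terms of Γ(α) and Γ(α; x), where Γ(α; x) is the “upper” incomplete gamma function defined by

$$\Gamma(\alpha; x) = \int_x^\infty t^{\alpha-1} e^{-t}\,dt. \tag{3.2}$$

Note that, by definition, the lower and upper incomplete gamma functions satisfy

$$\Gamma(\alpha; x) + \gamma(\alpha; x) = \Gamma(\alpha).$$

Writing $F_{\text{gamma}}$ for the gamma cdf (2.5), the R-gamma cdf is

$$F(x \mid \alpha, \lambda) = 1 - F_{\text{gamma}}(1/x \mid \alpha, \lambda) = 1 - \frac{\gamma(\alpha; \lambda/x)}{\Gamma(\alpha)} = \frac{\Gamma(\alpha; \lambda/x)}{\Gamma(\alpha)}, \tag{3.3}$$

since, for X following the R-gamma distribution, 1/X is gamma distributed and

$$F(x \mid \alpha, \lambda) = P(X \le x) = P(1/X \ge 1/x) = 1 - F_{\text{gamma}}(1/x \mid \alpha, \lambda).$$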

Figure 4. Examples of the R-gamma cdf

Moments

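The moments of the reciprocal gamma distribution are

$$E(X^n) = \int_0^\infty x^{n}\,\frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1}\exp\!\left(\frac{-\lambda}{x}\right)dx = \frac{\lambda^{n}\,\Gamma(\alpha-n)}{\Gamma(\alpha)}, \qquad \alpha - n > 0. \tag{3.4}$$

We can find the mean and variance of the R-gamma distribution from (3.4):

$$E(X) = \frac{\lambda}{\alpha-1}, \qquad \alpha > 1, \tag{3.5}$$

$$E(X^2) = \frac{\lambda^{2}}{(\alpha-2)(\alpha-1)}, \qquad \alpha > 2, \tag{3.6}$$

$$V(X) = \frac{\lambda^{2}}{(\alpha-2)(\alpha-1)} - \frac{\lambda^{2}}{(\alpha-1)^{2}} = \frac{\lambda^{2}}{(\alpha-2)(\alpha-1)^{2}}, \qquad \alpha > 2. \tag{3.7}$$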
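To illustrate (3.5) to (3.7) with concrete numbers (the parameter values α = 3 and λ = 1 are chosen here purely for illustration):

```latex
\begin{aligned}
E(X)   &= \frac{1}{3-1} = \frac{1}{2}, \\
E(X^2) &= \frac{1^{2}}{(3-2)(3-1)} = \frac{1}{2}, \\
V(X)   &= \frac{1}{2} - \left(\frac{1}{2}\right)^{2} = \frac{1}{4}
        = \frac{1^{2}}{(3-2)(3-1)^{2}}.
\end{aligned}
```

Both routes to the variance agree. Note that the conditions attached to (3.5) to (3.7) matter in practice: for α ≤ 2 the variance does not exist, and for α ≤ 1 neither does the mean.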

3.2 Applications in Claim Size Modelling

Reciprocal gamma as a model for extremely large claims

In section 2.2, we argued that the gamma distribution is a suitable model for small and moderate claims because of its plausible shape and its light tail. However, in several lines of insurance business extremely large claims occur, and in recent years actuaries have become more and more aware of the potential risk inherent in large claim sizes. Studies have shown that heavy-tailed distributions need to be used for modelling large claims. Unlike the gamma distribution, the tail behaviour of the R-gamma has been studied in few existing works, so it is worth explaining here.

For heavy-tailed distributions, the survival function

$$S(x) = 1 - F(x) = \int_x^\infty f(t)\,dt,$$

which gives the probability of a loss exceeding the value x, takes (relatively) large values for given x. The pdf itself can also be used to test tail behaviour. For some distributions, S(x) and/or f(x) are simple functions of x and we can immediately determine what the tail looks like. If they are complex functions of x, we may be able to obtain a simple function that has similar behaviour as x gets large. We use the notation

$$a(x) \sim b(x), \qquad x \to \infty,$$

to mean

$$\lim_{x\to\infty} \frac{a(x)}{b(x)} = 1$$

(Klugman, Panjer, and Willmot 1998, p86)

Therefore, as x gets large, the R-gamma pdf given in (3.1) satisfies

$$f(x \mid \alpha, \lambda) \sim \frac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1}, \qquad x \to \infty. \tag{3.8}$$

Clearly, the tail of the R-gamma pdf for large x is a power law with exponent −α − 1. Since a power tail is a special case of a heavy tail, the R-gamma is a heavy-tailed distribution and thus a reasonable model for large claims. The parameter α in (3.8) controls the power-law decay for large x. Intuitively, since we already know that the gamma distribution has a light right-hand tail and a heavy left-hand tail, taking its reciprocal essentially maps the left tail into the right tail.
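The step from (3.1) to (3.8) can be made explicit with the definition of the ~ notation. Dividing the R-gamma pdf by the power function on the right-hand side of (3.8) leaves only the exponential factor, which tends to one:

```latex
\lim_{x\to\infty}
\frac{\dfrac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1}\exp\!\left(-\lambda/x\right)}
     {\dfrac{\lambda^{\alpha}}{\Gamma(\alpha)}\,x^{-\alpha-1}}
= \lim_{x\to\infty}\exp\!\left(-\frac{\lambda}{x}\right) = e^{0} = 1 .
```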

3.3 Estimation of Parameters

3.3.1 Method of Moments Estimation

Suppose we have a random sample of n observations Y1, Y2, …, Yn, selected from a population where $Y_i \sim \text{R-gamma}(\alpha, \lambda)$, i = 1, 2, …, n. The first two moments of the R-gamma distribution were given in (3.5) and (3.6); equating them to the corresponding sample moments gives

$$\mu'_1 = \frac{\lambda}{\alpha-1} = m'_1 = \bar{Y}, \qquad \alpha > 1,$$

$$\mu'_2 = \frac{\lambda^{2}}{(\alpha-2)(\alpha-1)} = m'_2 = \frac{1}{n}\sum_{i=1}^{n} Y_i^2, \qquad \alpha > 2.$$

Solving for α and λ in terms of the first two moments yields

$$\hat\alpha = \frac{2\sum Y_i^2 - n\bar{Y}^2}{\sum Y_i^2 - n\bar{Y}^2}, \qquad \hat\lambda = \frac{\bar{Y}\sum Y_i^2}{\sum Y_i^2 - n\bar{Y}^2}. \tag{3.9}$$
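The estimators (3.9) can be sketched in the same way as their gamma counterparts. The class and method names below are hypothetical and not taken from the thesis's Java programs.

```java
/** Method of moments estimates for R-gamma(alpha, lambda), following (3.9). Illustrative sketch only. */
final class RGammaMomentEstimator {

    /** Returns {alphaHat, lambdaHat} for a sample of claim sizes. */
    static double[] estimate(double[] y) {
        int n = y.length;
        double sum = 0.0;
        double sumSq = 0.0;
        for (double v : y) {
            sum += v;
            sumSq += v * v;
        }
        double mean = sum / n;
        // Common denominator of both estimators in (3.9).
        double denom = sumSq - n * mean * mean;
        if (n < 2 || denom <= 0.0) {
            throw new IllegalArgumentException("need at least two distinct observations");
        }
        double alphaHat = (2.0 * sumSq - n * mean * mean) / denom;
        double lambdaHat = mean * sumSq / denom;
        return new double[] {alphaHat, lambdaHat};
    }

    public static void main(String[] args) {
        // Toy sample, chosen only so the arithmetic is easy to follow by hand.
        double[] est = estimate(new double[] {1.0, 2.0, 3.0});
        System.out.println("alpha = " + est[0] + ", lambda = " + est[1]);
    }
}
```

For the toy sample {1, 2, 3}, the denominator is 14 − 12 = 2, so α̂ = (28 − 12)/2 = 8 and λ̂ = 2 · 14/2 = 14. Rewriting α̂ as 2 + nȲ²/(ΣY_i² − nȲ²) shows that the moment estimate of α always exceeds 2, consistent with the requirement α > 2 for the second moment to exist.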

3.3.2 Maximum Likelihood Estimation

From the R-gamma pdf (3.1), we can derive the likelihood function of α, λ on the above sample of n observations, which is

$$L = \prod_{i=1}^{n} \frac{\lambda^{\alpha} y_i^{-\alpha-1} e^{-\lambda/y_i}}{\Gamma(\alpha)} = \left\{\prod_{i=1}^{n} y_i\right\}^{-\alpha-1} e^{-\lambda\sum 1/y_i}\,\lambda^{n\alpha}\,\Gamma(\alpha)^{-n} = \left\{\prod_{i=1}^{n} y_i\right\}^{-1}\left\{\prod_{i=1}^{n} \frac{1}{y_i}\right\}^{\alpha} e^{-\lambda\sum 1/y_i}\,\lambda^{n\alpha}\,\Gamma(\alpha)^{-n}. \tag{3.10}$$

After dropping the uninformative factor $\left\{\prod_{i=1}^{n} y_i\right\}^{-1}$ and setting $x_i = 1/y_i$, i = 1, 2, …, n, it is readily seen that the resulting likelihood function coincides with its gamma counterpart (2.14).

It follows that the MLE analysis of the R-gamma distribution is carried out by replacing each observation $y_i$ by its reciprocal $x_i = 1/y_i$ and proceeding as for the gamma distribution.

Similarly, the R-gamma version of (2.23), that is, the approximate MLEs for the R-gamma distribution, is

$$\hat\alpha = \frac{1 + \sqrt{1 + 4(\ln u_2 - u_1)/3}}{4(\ln u_2 - u_1)}, \qquad \hat\lambda = \frac{1 + \sqrt{1 + 4(\ln u_2 - u_1)/3}}{4u_2(\ln u_2 - u_1)}, \tag{3.11}$$

where $u_1 = \overline{\ln(1/y)}$ and $u_2 = \overline{1/y}$.
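The reciprocal substitution and the closed form (3.11) can be combined in one short routine. This is a minimal sketch with hypothetical names; it assumes all observations are strictly positive.

```java
/** Closed-form approximate MLEs for R-gamma(alpha, lambda), following (3.11). Illustrative sketch only. */
final class RGammaApproxMle {

    /** Returns {alphaHat, lambdaHat} for a sample y of strictly positive claim sizes. */
    static double[] estimate(double[] y) {
        int n = y.length;
        double sumLogInv = 0.0;
        double sumInv = 0.0;
        for (double v : y) {
            if (!(v > 0.0)) {
                throw new IllegalArgumentException("observations must be positive");
            }
            double x = 1.0 / v;            // replace each observation by its reciprocal
            sumLogInv += Math.log(x);
            sumInv += x;
        }
        double u1 = sumLogInv / n;         // mean of ln(1/y)
        double u2 = sumInv / n;            // mean of 1/y
        double s = Math.log(u2) - u1;
        if (!(s > 0.0)) {
            throw new IllegalArgumentException("need at least two distinct observations");
        }
        double root = 1.0 + Math.sqrt(1.0 + 4.0 * s / 3.0);
        return new double[] {root / (4.0 * s), root / (4.0 * u2 * s)};
    }

    public static void main(String[] args) {
        // Toy sample whose reciprocals are 1, 2 and 4.
        double[] est = estimate(new double[] {1.0, 0.5, 0.25});
        System.out.println("alpha = " + est[0] + ", lambda = " + est[1]);
    }
}
```

The same reciprocal step could instead feed an iterative solution of (2.18) when the exact MLE is wanted rather than the approximation.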

4. The Mixture of Gamma & Reciprocal Gamma

4.1 The Mixture of Gamma & Reciprocal Gamma

Mixture of gamma & R-gamma as a model for heterogeneous portfolio

In section 2, we proposed the gamma distribution as the model for ordinary claim sizes; in section 3, the reciprocal gamma was introduced to model extra large claim sizes. Very often, the insurance portfolio is heterogeneous in the sense that, in addition to a large number of “ordinary” policies, it also contains a small number of policies under which much larger claim sizes are likely (Hossack, Pollard, and Zehnwirth 1999, p.256).

Klugman, Panjer, and Willmot (1998) presented an example of a heterogeneous portfolio: a hospital’s general liability coverage. In this case, one process is carelessness in patient handling, which includes lost items such as dentures and eyeglasses as well as small incidents such as wheelchair collisions. The other process takes place in the operating room, where surgeons may create events with costly consequences (p. 105).

Under those circumstances, the population of claims consists of two components (groups), G1 and G2, corresponding to ordinary and extra large claims. If we cannot separate the data into these two components, we cannot model them separately. Following our previous discussion of the applications of the gamma and R-gamma distributions in claim size modelling, when there is a mixture of ordinary and large claims it is intuitively appealing to use a mixture of gamma and R-gamma as the model.

The pdf of the mixture is simply the weighted sum of the gamma(α1, λ1) and R-gamma(α2, λ2) densities (for an introduction to mixture distributions see Titterington et al. 1985),

$$f(x \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p) = p\,\frac{\lambda_1^{\alpha_1}}{\Gamma(\alpha_1)}\,x^{\alpha_1-1} e^{-\lambda_1 x} + (1-p)\,\frac{\lambda_2^{\alpha_2}}{\Gamma(\alpha_2)}\,x^{-\alpha_2-1}\exp\!\left(\frac{-\lambda_2}{x}\right), \qquad 0 < p < 1. \tag{4.1}$$

The weights p and 1 − p are the proportions of ordinary claims and extra large claims respectively. From another point of view, p is the probability that a particular claim falls into the component G1 (ordinary claims), and 1 − p is the probability that it falls into G2 (extra large claims). The gamma & R-gamma mixture thus has five parameters: α1, λ1, α2, λ2, and the so-called mixing parameter p.

In this mixture model, the gamma distribution has a lighter tail. This might be a model for the frequent, small claims while the R-gamma distribution covers the large, but infrequent claims. The previously discussed single gamma and single R-gamma models can be seen as the special cases of the general mixture model at p = 1, and p = 0.
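The density (4.1) can be evaluated numerically along the following lines. This is only a minimal sketch: the class and method names are hypothetical, and the Lanczos approximation used for ln Γ is a standard numerical device that is assumed here, not something taken from the thesis programs.

```java
/** Illustrative sketch of the mixture density (4.1); all names are hypothetical. */
final class MixtureDensitySketch {

    // Lanczos coefficients for ln Gamma(x), x > 0 (assumed standard approximation).
    private static final double[] COF = {
        76.18009172947146, -86.50532032941677, 24.01409824083091,
        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5
    };

    /** Approximates ln Gamma(x) for x > 0. */
    static double logGamma(double x) {
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : COF) {
            y += 1.0;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** Gamma(alpha, lambda) density, the first component of (4.1). */
    static double gammaPdf(double x, double alpha, double lambda) {
        if (x <= 0.0) return 0.0;
        return Math.exp(alpha * Math.log(lambda) - logGamma(alpha)
                + (alpha - 1.0) * Math.log(x) - lambda * x);
    }

    /** R-gamma(alpha, lambda) density, the second component of (4.1). */
    static double reciprocalGammaPdf(double x, double alpha, double lambda) {
        if (x <= 0.0) return 0.0;
        return Math.exp(alpha * Math.log(lambda) - logGamma(alpha)
                - (alpha + 1.0) * Math.log(x) - lambda / x);
    }

    /** Weighted sum p * gamma + (1 - p) * R-gamma. */
    static double mixturePdf(double x, double a1, double l1,
                             double a2, double l2, double p) {
        return p * gammaPdf(x, a1, l1) + (1.0 - p) * reciprocalGammaPdf(x, a2, l2);
    }
}
```

Working on the log scale before exponentiating keeps the evaluation stable when λ2 is large, which is the typical situation for the R-gamma component.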

Mean, Variance & Moments

Denote by \(f_1(x)\) the gamma density with mean \(\mu_1\) and variance \(\sigma_1^2\), and by \(f_2(x)\) the R-gamma density with mean \(\mu_2\) and variance \(\sigma_2^2\). Then (4.1) can be rewritten simply as

\[
f(x) = p f_1(x) + (1-p) f_2(x). \tag{4.2}
\]

The moments of the mixture distribution \(f(x)\) satisfy

\[
E(X^n) = \int_{-\infty}^{\infty} x^n f(x)\,dx = \int_{-\infty}^{\infty} x^n \left[ p f_1(x) + (1-p) f_2(x) \right] dx
= p \int_{-\infty}^{\infty} x^n f_1(x)\,dx + (1-p) \int_{-\infty}^{\infty} x^n f_2(x)\,dx = p\,\mu'_{n1} + (1-p)\,\mu'_{n2}, \tag{4.3}
\]

where \(\mu'_{n1}\) is the nth moment of the gamma distribution and \(\mu'_{n2}\) that of the R-gamma distribution. Formula (4.3) is a useful tool when we apply the moment estimation in the next section.

From (4.3) and the relation \(E(X_i^2) = \mu_i^2 + \sigma_i^2,\ i = 1, 2\), it is easy to verify that

\[
E(X) = p\mu_1 + (1-p)\mu_2, \qquad V(X) = p\sigma_1^2 + (1-p)\sigma_2^2 + p(1-p)\left[\mu_1 - \mu_2\right]^2. \tag{4.4}
\]
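Formulas (4.3) and (4.4) can be checked numerically with a few lines of code. The sketch below assumes the raw moment formulas of the two components in the form in which they enter the moment equations of section 4.2.1 (gamma: α(α+1)...(α+n-1)/λ^n; R-gamma: λ^n/((α-1)...(α-n)), which requires α > n). The class and method names are hypothetical.

```java
/** Illustrative sketch of (4.3) and (4.4); all names are hypothetical. */
final class MixtureMomentsSketch {

    /** n-th raw moment of gamma(alpha, lambda). */
    static double gammaMoment(int n, double alpha, double lambda) {
        double m = 1.0;
        for (int k = 0; k < n; k++) {
            m *= (alpha + k) / lambda;
        }
        return m;
    }

    /** n-th raw moment of R-gamma(alpha, lambda); it exists only for alpha > n. */
    static double reciprocalGammaMoment(int n, double alpha, double lambda) {
        if (alpha <= n) {
            throw new IllegalArgumentException("moment exists only for alpha > n");
        }
        double m = 1.0;
        for (int k = 1; k <= n; k++) {
            m *= lambda / (alpha - k);
        }
        return m;
    }

    /** n-th raw moment of the mixture, formula (4.3). */
    static double mixtureMoment(int n, double a1, double l1,
                                double a2, double l2, double p) {
        return p * gammaMoment(n, a1, l1) + (1.0 - p) * reciprocalGammaMoment(n, a2, l2);
    }

    /** Variance of the mixture, formula (4.4); requires alpha2 > 2. */
    static double mixtureVariance(double a1, double l1, double a2, double l2, double p) {
        double mu1 = gammaMoment(1, a1, l1);
        double mu2 = reciprocalGammaMoment(1, a2, l2);
        double var1 = gammaMoment(2, a1, l1) - mu1 * mu1;
        double var2 = reciprocalGammaMoment(2, a2, l2) - mu2 * mu2;
        return p * var1 + (1.0 - p) * var2 + p * (1.0 - p) * (mu1 - mu2) * (mu1 - mu2);
    }
}
```

For α1 = 2, λ1 = 1, α2 = 4, λ2 = 60 and p = 0.9, the first two mixture moments are 3.8 and 65.4, so the variance from (4.4) should equal 65.4 - 3.8² = 50.96, which gives a quick consistency check between the two formulas.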

4.2 Estimation of Parameters in the Gamma-Reciprocal Gamma Mixture

4.2.1 Method of Moments Estimation

In the mixture model (4.1), there are five parameters overall, namely α1, λ1, α2, λ2, and p. Therefore we must equate five pairs of population and sample moments, provided that the first five moments exist (α2 > 5). The moment formulas for the single gamma and R-gamma distributions are given in (2.11) and (3.4); applying (4.3) as well, we can establish the first five moments of (4.1) and then equate them to the first five sample moments.

A simplified case with known α1 and α2 (> 3)

To present the idea, we consider a simplified case in which α1 and α2 (> 3) are known a priori, so that only three parameters remain to be estimated.

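\[
\begin{cases}
\mu'_1 = p\,\dfrac{\alpha_1}{\lambda_1} + (1-p)\,\dfrac{\lambda_2}{\alpha_2 - 1} = m'_1 \\[2ex]
\mu'_2 = p\,\dfrac{\alpha_1(\alpha_1+1)}{\lambda_1^2} + (1-p)\,\dfrac{\lambda_2^2}{(\alpha_2-1)(\alpha_2-2)} = m'_2 \\[2ex]
\mu'_3 = p\,\dfrac{\alpha_1(\alpha_1+1)(\alpha_1+2)}{\lambda_1^3} + (1-p)\,\dfrac{\lambda_2^3}{(\alpha_2-1)(\alpha_2-2)(\alpha_2-3)} = m'_3
\end{cases} \tag{4.5}
\]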

What is shown above is a system of three non-linear simultaneous equations, which must be solved to give the estimates of the three unknown parameters p, λ1, and λ2. Even for this simplified case the algebra is heavy. The solution can be obtained by a numerical method, for example the Newton-Raphson method. We introduced the Newton-Raphson method for one equation in one unknown in detail in section 2.3.2. As we will see, this method can easily be extended to systems of m equations in m variables. The process goes as follows, with the iteration stopping when the desired accuracy has been achieved.

1. For one single equation f(x) = 0, we iterate \(x_n \to x_{n+1}\) following

\[
x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.
\]

This is the formula we used in section 2.3.2 (formula (2.20)). The starting value \(x_0\) should be chosen suitably close to the root.

2. System with m equations and m unknowns,

\[
f_i(x_1, \ldots, x_m) = 0, \qquad i = 1, \ldots, m,
\]

or in vector form

\[
\mathbf{f}(\mathbf{x}) = \mathbf{0}.
\]

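Let the vector \(\mathbf{x} = [x_1, \ldots, x_m]\). We iterate \(\mathbf{x}^n \to \mathbf{x}^{n+1}\) by solving the following linear system

\[
J(\mathbf{x}^n)(\mathbf{x}^{n+1} - \mathbf{x}^n) + \mathbf{f}(\mathbf{x}^n) = \mathbf{0},
\]

\[
\mathbf{x}^{n+1} = \mathbf{x}^n - J(\mathbf{x}^n)^{-1}\,\mathbf{f}(\mathbf{x}^n), \tag{4.6}
\]

where J(x) is the Jacobian matrix defined as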

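\[
\left( \frac{\partial f_i}{\partial x_j} \right) =
\begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_m} \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_m}
\end{bmatrix}.
\]

We will denote the determinant of the Jacobian matrix by J,

\[
J = \det\left( \frac{\partial f_i}{\partial x_j} \right) =
\begin{vmatrix}
\dfrac{\partial f_1}{\partial x_1} & \cdots & \dfrac{\partial f_1}{\partial x_m} \\
\vdots & & \vdots \\
\dfrac{\partial f_m}{\partial x_1} & \cdots & \dfrac{\partial f_m}{\partial x_m}
\end{vmatrix}.
\]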

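For our system (4.5), the vector of functions f = [f, g, h] is

\[
\begin{cases}
f = p\,\dfrac{\alpha_1}{\lambda_1} + (1-p)\,\dfrac{\lambda_2}{\alpha_2 - 1} - m'_1 = 0 \\[2ex]
g = p\,\dfrac{\alpha_1(\alpha_1+1)}{\lambda_1^2} + (1-p)\,\dfrac{\lambda_2^2}{(\alpha_2-1)(\alpha_2-2)} - m'_2 = 0 \\[2ex]
h = p\,\dfrac{\alpha_1(\alpha_1+1)(\alpha_1+2)}{\lambda_1^3} + (1-p)\,\dfrac{\lambda_2^3}{(\alpha_2-1)(\alpha_2-2)(\alpha_2-3)} - m'_3 = 0.
\end{cases}
\]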

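The vector of unknowns is \(\mathbf{x} = [p, \lambda_1, \lambda_2]\), and the Jacobian matrix is given by

\[
J(p, \lambda_1, \lambda_2) =
\begin{bmatrix}
\dfrac{\partial f}{\partial p} & \dfrac{\partial f}{\partial \lambda_1} & \dfrac{\partial f}{\partial \lambda_2} \\[2ex]
\dfrac{\partial g}{\partial p} & \dfrac{\partial g}{\partial \lambda_1} & \dfrac{\partial g}{\partial \lambda_2} \\[2ex]
\dfrac{\partial h}{\partial p} & \dfrac{\partial h}{\partial \lambda_1} & \dfrac{\partial h}{\partial \lambda_2}
\end{bmatrix},
\]

where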

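\[
\begin{aligned}
\frac{\partial f}{\partial p} &= \frac{\alpha_1}{\lambda_1} - \frac{\lambda_2}{\alpha_2 - 1}, &
\frac{\partial f}{\partial \lambda_1} &= -p\,\frac{\alpha_1}{\lambda_1^2}, &
\frac{\partial f}{\partial \lambda_2} &= (1-p)\,\frac{1}{\alpha_2 - 1}, \\[1ex]
\frac{\partial g}{\partial p} &= \frac{\alpha_1(\alpha_1+1)}{\lambda_1^2} - \frac{\lambda_2^2}{(\alpha_2-1)(\alpha_2-2)}, &
\frac{\partial g}{\partial \lambda_1} &= -2p\,\frac{\alpha_1(\alpha_1+1)}{\lambda_1^3}, &
\frac{\partial g}{\partial \lambda_2} &= 2(1-p)\,\frac{\lambda_2}{(\alpha_2-1)(\alpha_2-2)}, \\[1ex]
\frac{\partial h}{\partial p} &= \frac{\alpha_1(\alpha_1+1)(\alpha_1+2)}{\lambda_1^3} - \frac{\lambda_2^3}{(\alpha_2-1)(\alpha_2-2)(\alpha_2-3)}, &
\frac{\partial h}{\partial \lambda_1} &= -3p\,\frac{\alpha_1(\alpha_1+1)(\alpha_1+2)}{\lambda_1^4}, &
\frac{\partial h}{\partial \lambda_2} &= 3(1-p)\,\frac{\lambda_2^2}{(\alpha_2-1)(\alpha_2-2)(\alpha_2-3)}.
\end{aligned}
\]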

Having chosen a suitable vector of starting values \(\mathbf{x}^0 = [p^0, \lambda_1^0, \lambda_2^0]\), formula (4.6) is applied successively to find the next x values until convergence. We can solve (4.6) either by directly inverting the Jacobian matrix or by using Cramer's rule, which yields

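\[
p^{n+1} = p^n - \frac{1}{J}
\begin{vmatrix} f & f_2 & f_3 \\ g & g_2 & g_3 \\ h & h_2 & h_3 \end{vmatrix}_{(p^n, \lambda_1^n, \lambda_2^n)}
\]

\[
\lambda_1^{n+1} = \lambda_1^n - \frac{1}{J}
\begin{vmatrix} f_1 & f & f_3 \\ g_1 & g & g_3 \\ h_1 & h & h_3 \end{vmatrix}_{(p^n, \lambda_1^n, \lambda_2^n)}
\]

\[
\lambda_2^{n+1} = \lambda_2^n - \frac{1}{J}
\begin{vmatrix} f_1 & f_2 & f \\ g_1 & g_2 & g \\ h_1 & h_2 & h \end{vmatrix}_{(p^n, \lambda_1^n, \lambda_2^n)}
\]

where \(f_1 = \partial f / \partial p\), \(f_2 = \partial f / \partial \lambda_1\), \(f_3 = \partial f / \partial \lambda_2\) (and similarly for g and h), and J, as said above, is the determinant of the Jacobian matrix.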

The method of Cramer’s rule is used in the Java Mixture Model Estimation Program, see Appendix E: Java source code- Newton_Solver.java.
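The iteration (4.6) with Cramer's rule can be sketched as follows for system (4.5). This is not the Newton_Solver.java of Appendix E; it is a minimal illustration written under the assumptions stated in the comments, and the class name, method names, iteration cap and tolerance are hypothetical.

```java
/** Illustrative Newton-Raphson sketch for system (4.5); all names are hypothetical. */
final class MomentNewtonSketch {

    /** Determinant of a 3x3 matrix by cofactor expansion along the first row. */
    static double det3(double[][] a) {
        return a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
             - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
             + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]);
    }

    /**
     * Solves (4.5) for x = [p, lambda1, lambda2], given known alpha1 and
     * alpha2 (> 3) and the target moments m = [m1, m2, m3].
     */
    static double[] solve(double a1, double a2, double[] m, double[] start,
                          int maxIter, double tol) {
        // Constant factors of the first three gamma and R-gamma moments.
        double[] c = { a1, a1 * (a1 + 1), a1 * (a1 + 1) * (a1 + 2) };
        double[] d = { a2 - 1, (a2 - 1) * (a2 - 2), (a2 - 1) * (a2 - 2) * (a2 - 3) };
        double[] x = start.clone();
        for (int iter = 0; iter < maxIter; iter++) {
            double p = x[0], l1 = x[1], l2 = x[2];
            double[] fx = new double[3];
            double[][] jac = new double[3][3];
            for (int k = 1; k <= 3; k++) {
                double g = c[k - 1] / Math.pow(l1, k);   // k-th gamma moment
                double r = Math.pow(l2, k) / d[k - 1];   // k-th R-gamma moment
                fx[k - 1] = p * g + (1 - p) * r - m[k - 1];
                jac[k - 1][0] = g - r;                   // derivative w.r.t. p
                jac[k - 1][1] = -k * p * g / l1;         // derivative w.r.t. lambda1
                jac[k - 1][2] = k * (1 - p) * r / l2;    // derivative w.r.t. lambda2
            }
            double detJ = det3(jac);
            if (detJ == 0.0) {
                throw new ArithmeticException("singular Jacobian");
            }
            double maxStep = 0.0;
            double[] next = new double[3];
            for (int col = 0; col < 3; col++) {
                // Cramer's rule: replace one column of the Jacobian by f(x).
                double[][] jc = new double[3][];
                for (int row = 0; row < 3; row++) {
                    jc[row] = jac[row].clone();
                    jc[row][col] = fx[row];
                }
                double step = det3(jc) / detJ;
                next[col] = x[col] - step;
                maxStep = Math.max(maxStep, Math.abs(step));
            }
            x = next;
            if (maxStep < tol) {
                return x;
            }
        }
        throw new IllegalStateException("Newton-Raphson did not converge");
    }
}
```

The sketch stops when the largest component of the Newton step falls below the tolerance and throws an exception otherwise, so that non-convergence is reported rather than silently returning the last iterate.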

In general, convergence of the Newton-Raphson method is not guaranteed for a given system of equations. To check whether the method works well for system (4.5), we test it on the following numerical example.

Example: Suppose we know α1 = 2, λ1 = 1, α2 = 4, λ2 = 60, p = 0.9. The corresponding population moments are therefore μ1 = 3.8, μ2 = 65.4, μ3 = 3621.6 (calculated by (4.3) or (4.5)).

If we let m1, m2, m3 equal μ1, μ2, μ3, the solutions for p, λ1, λ2 should be exactly their true values 0.9, 1, 60. We can now test the Newton-Raphson method by solving (4.5) for p, λ1, λ2 and then comparing the results to the true values. The Java Mixture Model Estimation Program is an ideal tool for this purpose. The results are obtained within one second. A complete users' guide of the program is given in Appendix B.

Our inputs are α1 = 2, α2 = 4, m1 (= μ1) = 3.8, m2 (= μ2) = 65.4, m3 (= μ3) = 3621.6. The starting values are chosen as \(\mathbf{x}^0 = [p^0, \lambda_1^0, \lambda_2^0] = [0.7, 0.8, 50]\). The output is shown in the following graph.

Figure 5. Applying the Newton-Raphson method: an example

The convergence is very fast; after six iterations we get the expected results. We can record the estimated values at every iteration and plot them in Excel, see Figure 6 (a, b, c). Experiments show that the method is equally effective if we use slightly "poorer" starting values.

Figure 6a. Convergence of p (estimated p against iteration number, iterations 1 to 6)

Figure 6b. Convergence of λ1 (estimated λ1 against iteration number, iterations 1 to 6)

Figure 6c. Convergence of λ2 (estimated λ2 against iteration number, iterations 1 to 6)

We can try more examples with various parameters. For example, we may worry whether the Newton-Raphson method works well when the mean of the reciprocal gamma component is much larger than that of the gamma component, or when p is almost 1 (so that the weight of the reciprocal gamma component is almost zero). The results of those experiments show that the Newton-Raphson method is a good choice: convergence almost always happens within 15 iterations.

Therefore, it is reasonable to hope that if the sample moments are good approximations of the population moments, our moment estimation algorithm will provide satisfactory estimation for the unknown parameters p, λ1, λ2, given α1, and α2 > 3. In section 5, we will simulate samples of the mixture model using the Java Mixture Model Simulation Program, and then evaluate our moment estimation algorithm (for the simplified case) using the Java Mixture Model Estimation Program.


4.2.2 Maximum Likelihood Estimation

As discussed previously, MLEs are well known to have desirable asymptotic properties, and it is natural to consider this method for estimating the parameters of our mixture model (4.1). (For an introduction to the MLE method for mixture distributions, see Everitt and Hand 1981 or McLachlan and Peel 2000.) Unfortunately, in practice the MLE for our mixture model is more difficult and time-consuming to apply than the moment estimation. However, it is rather simple to derive the solution concept of this method theoretically.

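The likelihood function is given by

\[
L = \prod_{i=1}^{n} f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p) = \prod_{i=1}^{n} \left[ p f_1(x_i \mid \alpha_1, \lambda_1) + (1-p) f_2(x_i \mid \alpha_2, \lambda_2) \right],
\]

where \(f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)\) denotes the mixture distribution (4.1),

\[
f_1(x_i \mid \alpha_1, \lambda_1) = \frac{\lambda_1^{\alpha_1}}{\Gamma(\alpha_1)}\, x_i^{\alpha_1 - 1} e^{-\lambda_1 x_i}
\]

is the gamma component, and

\[
f_2(x_i \mid \alpha_2, \lambda_2) = \frac{\lambda_2^{\alpha_2}}{\Gamma(\alpha_2)}\, x_i^{-\alpha_2 - 1} \exp\!\left( -\frac{\lambda_2}{x_i} \right)
\]

is the R-gamma component.

The corresponding log likelihood function is

\[
\ln L = \ln \prod_{i=1}^{n} f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p) = \sum_{i=1}^{n} \ln \left[ p f_1(x_i \mid \alpha_1, \lambda_1) + (1-p) f_2(x_i \mid \alpha_2, \lambda_2) \right]. \tag{4.7}
\]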
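The log likelihood (4.7) can be evaluated as in the following sketch. The names are hypothetical, the Lanczos approximation of ln Γ is an assumed standard numerical device, and the log-sum-exp step is a common stabilisation that is added here for illustration; it is not described in the thesis.

```java
/** Illustrative evaluation of the log likelihood (4.7); all names are hypothetical. */
final class MixtureLogLikelihoodSketch {

    // Lanczos coefficients for ln Gamma(x), x > 0 (assumed standard approximation).
    private static final double[] COF = {
        76.18009172947146, -86.50532032941677, 24.01409824083091,
        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5
    };

    static double logGamma(double x) {
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : COF) {
            y += 1.0;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    /** ln f1(x | alpha, lambda), the log of the gamma component. */
    static double logGammaPdf(double x, double alpha, double lambda) {
        return alpha * Math.log(lambda) - logGamma(alpha)
                + (alpha - 1.0) * Math.log(x) - lambda * x;
    }

    /** ln f2(x | alpha, lambda), the log of the R-gamma component. */
    static double logReciprocalGammaPdf(double x, double alpha, double lambda) {
        return alpha * Math.log(lambda) - logGamma(alpha)
                - (alpha + 1.0) * Math.log(x) - lambda / x;
    }

    /** ln L of (4.7) for positive observations xs and 0 < p < 1. */
    static double logLikelihood(double[] xs, double a1, double l1,
                                double a2, double l2, double p) {
        double sum = 0.0;
        for (double x : xs) {
            double a = Math.log(p) + logGammaPdf(x, a1, l1);
            double b = Math.log1p(-p) + logReciprocalGammaPdf(x, a2, l2);
            double mx = Math.max(a, b);
            // log-sum-exp: ln(e^a + e^b) without underflow of the smaller term
            sum += mx + Math.log(Math.exp(a - mx) + Math.exp(b - mx));
        }
        return sum;
    }
}
```

A function of this kind is what an iterative search for the MLEs would maximise over the five parameters.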

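The likelihood equations, which are obtained by equating the first partial derivatives of (4.7) with respect to p, α1, λ1, α2, λ2 to zero, are given by

\[
\frac{\partial \ln L}{\partial p} = \sum_{i=1}^{n} \frac{f_1(x_i \mid \alpha_1, \lambda_1) - f_2(x_i \mid \alpha_2, \lambda_2)}{f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)} = 0 \tag{4.8}
\]

\[
\frac{\partial \ln L}{\partial \alpha_1} = \sum_{i=1}^{n} p\, \frac{\partial f_1(x_i \mid \alpha_1, \lambda_1) / \partial \alpha_1}{f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)} = 0 \tag{4.9}
\]

\[
\frac{\partial \ln L}{\partial \lambda_1} = \sum_{i=1}^{n} p\, \frac{\partial f_1(x_i \mid \alpha_1, \lambda_1) / \partial \lambda_1}{f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)} = 0 \tag{4.10}
\]

\[
\frac{\partial \ln L}{\partial \alpha_2} = \sum_{i=1}^{n} (1-p)\, \frac{\partial f_2(x_i \mid \alpha_2, \lambda_2) / \partial \alpha_2}{f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)} = 0 \tag{4.11}
\]

\[
\frac{\partial \ln L}{\partial \lambda_2} = \sum_{i=1}^{n} (1-p)\, \frac{\partial f_2(x_i \mid \alpha_2, \lambda_2) / \partial \lambda_2}{f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)} = 0 \tag{4.12}
\]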

As the gamma component is assumed to exist in a fixed proportion p in the mixture (and the R-gamma component in proportion 1-p), we may talk about the probability that a particular observation xi belongs to one of the components. By Bayes' rule, the probability that a given xi comes from the gamma component is

\[
\Pr(\text{component gamma} \mid x_i) = \frac{p f_1(x_i \mid \alpha_1, \lambda_1)}{f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)}. \tag{4.13}
\]

For convenience, we will sometimes use \(f(x_i)\), \(f_1(x_i)\), \(f_2(x_i)\) as shortened notations for \(f(x_i \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p)\), \(f_1(x_i \mid \alpha_1, \lambda_1)\), \(f_2(x_i \mid \alpha_2, \lambda_2)\).

If we manipulate (4.8), an interesting result, \(\sum_{i=1}^{n} f_2(x_i) / f(x_i) = n\), can be obtained as follows.

First multiply (4.8) by p,

\[
\sum_{i=1}^{n} \frac{p}{f(x_i)} \bigl( f_1(x_i) - f_2(x_i) \bigr) = 0. \tag{4.14}
\]

Using the fact that \(f(x) = p f_1(x) + (1-p) f_2(x) \Rightarrow p f_1(x) - p f_2(x) = f(x) - f_2(x)\), (4.14) can be written as

\[
\sum_{i=1}^{n} \frac{f(x_i) - f_2(x_i)}{f(x_i)} = 0, \qquad
\sum_{i=1}^{n} \left( 1 - \frac{f_2(x_i)}{f(x_i)} \right) = 0, \qquad
\left( \sum_{i=1}^{n} \frac{f_2(x_i)}{f(x_i)} \right) - n = 0,
\]

\[
\sum_{i=1}^{n} \frac{f_2(x_i)}{f(x_i)} = n. \tag{4.15}
\]

Now we multiply (4.8) by p again, this time in order to write its solution in terms of Pr(component gamma | xi).

\[
\sum_{i=1}^{n} \frac{p}{f(x_i)} \bigl( f_1(x_i) - f_2(x_i) \bigr) = 0
\]

\[
\sum_{i=1}^{n} \left( \frac{p f_1(x_i)}{f(x_i)} - \frac{p f_2(x_i)}{f(x_i)} \right) = 0
\]

\[
\sum_{i=1}^{n} \frac{p f_1(x_i)}{f(x_i)} - p \sum_{i=1}^{n} \frac{f_2(x_i)}{f(x_i)} = 0
\]

\[
\sum_{i=1}^{n} \Pr(\text{component gamma} \mid x_i) - np = 0 \qquad \text{(by (4.13) and (4.15))}
\]

\[
\hat{p} = \frac{1}{n} \sum_{i=1}^{n} \Pr(\text{component gamma} \mid x_i). \tag{4.16}
\]

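Also, using (4.13) in (4.9)-(4.12), we can express them as

\[
\sum_{i=1}^{n} \Pr(\text{component gamma} \mid x_i)\, \frac{\partial \ln f_1(x_i \mid \alpha_1, \lambda_1)}{\partial \alpha_1} = 0 \tag{4.17}
\]

\[
\sum_{i=1}^{n} \Pr(\text{component gamma} \mid x_i)\, \frac{\partial \ln f_1(x_i \mid \alpha_1, \lambda_1)}{\partial \lambda_1} = 0 \tag{4.18}
\]

\[
\sum_{i=1}^{n} \Pr(\text{component R-gamma} \mid x_i)\, \frac{\partial \ln f_2(x_i \mid \alpha_2, \lambda_2)}{\partial \alpha_2} = 0 \tag{4.19}
\]

\[
\sum_{i=1}^{n} \Pr(\text{component R-gamma} \mid x_i)\, \frac{\partial \ln f_2(x_i \mid \alpha_2, \lambda_2)}{\partial \lambda_2} = 0 \tag{4.20}
\]

Note that \(\Pr(\text{component R-gamma} \mid x_i) = 1 - \Pr(\text{component gamma} \mid x_i)\).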

The forms in (4.17)-(4.20) are intuitively attractive. The likelihood equations for estimating α1, λ1, α2, λ2 are weighted averages of the likelihood equations arising from the gamma or the R-gamma distribution considered separately. The weights are the probabilities of membership of each sample point xi in each class.

Of course, equations (4.16)-(4.20) do not give the estimators analytically; instead they must be solved using some type of iterative procedure, for instance the EM algorithm (see McLachlan and Peel 2000). The simplest way is a method (see Everitt and Hand 1981) that is essentially an application of the EM algorithm. Applying this method, we first find initial values of the parameters (obtained by some other method or simply by experience), then insert them into (4.13) to obtain the first estimates of Pr(component gamma | xi). These are then inserted into equations (4.16) to (4.20) to give revised parameter estimates, and the process is continued until convergence.
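The iterative scheme just described can be sketched for a simplified setting in which α1 and α2 are assumed known, so that only p, λ1 and λ2 are updated. This restriction is an assumption made for the illustration and is not a case treated in the text. With the shapes fixed, (4.18) and (4.20) have closed-form solutions, since ∂ln f1/∂λ1 = α1/λ1 - xi and ∂ln f2/∂λ2 = α2/λ2 - 1/xi: writing wi for Pr(component gamma | xi), we get λ1 = α1 Σwi / Σwi xi and λ2 = α2 Σ(1-wi) / Σ(1-wi)/xi. All names in the code are hypothetical.

```java
/** Illustrative EM-type iteration with known shapes; all names are hypothetical. */
final class EmKnownShapesSketch {

    // Lanczos coefficients for ln Gamma(x), x > 0 (assumed standard approximation).
    private static final double[] COF = {
        76.18009172947146, -86.50532032941677, 24.01409824083091,
        -1.231739572450155, 0.1208650973866179e-2, -0.5395239384953e-5
    };

    static double logGamma(double x) {
        double y = x;
        double tmp = x + 5.5;
        tmp -= (x + 0.5) * Math.log(tmp);
        double ser = 1.000000000190015;
        for (double c : COF) {
            y += 1.0;
            ser += c / y;
        }
        return -tmp + Math.log(2.5066282746310005 * ser / x);
    }

    static double gammaPdf(double x, double a, double l) {
        return Math.exp(a * Math.log(l) - logGamma(a) + (a - 1.0) * Math.log(x) - l * x);
    }

    static double reciprocalGammaPdf(double x, double a, double l) {
        return Math.exp(a * Math.log(l) - logGamma(a) - (a + 1.0) * Math.log(x) - l / x);
    }

    /** Log likelihood (4.7) for theta = [p, lambda1, lambda2] and known shapes. */
    static double logLikelihood(double[] xs, double a1, double a2, double[] theta) {
        double sum = 0.0;
        for (double x : xs) {
            sum += Math.log(theta[0] * gammaPdf(x, a1, theta[1])
                    + (1.0 - theta[0]) * reciprocalGammaPdf(x, a2, theta[2]));
        }
        return sum;
    }

    /** One pass: membership probabilities (4.13), then (4.16), (4.18), (4.20). */
    static double[] update(double[] xs, double a1, double a2, double[] theta) {
        double sumW = 0.0, sumWX = 0.0, sumV = 0.0, sumVInvX = 0.0;
        for (double x : xs) {
            double g = theta[0] * gammaPdf(x, a1, theta[1]);
            double r = (1.0 - theta[0]) * reciprocalGammaPdf(x, a2, theta[2]);
            double w = g / (g + r);          // Pr(component gamma | x), (4.13)
            sumW += w;
            sumWX += w * x;
            sumV += 1.0 - w;
            sumVInvX += (1.0 - w) / x;
        }
        return new double[] {
            sumW / xs.length,                // (4.16)
            a1 * sumW / sumWX,               // solves (4.18) for lambda1
            a2 * sumV / sumVInvX             // solves (4.20) for lambda2
        };
    }
}
```

Calling update repeatedly from an initial guess and monitoring logLikelihood gives the iteration; when the shape parameters are unknown as well, (4.17) and (4.19) involve the digamma function and need a numerical solver inside each pass.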

5. Simulation Studies

5.1 Simulation

Based on our gamma & R-gamma mixture model (or the single gamma and single R-gamma models as two special cases), a Java Mixture Model Simulation Program is specially designed for the simulation purpose. A complete users’ guide of the program is given in Appendix A. All the figures in this section are produced by the program.

The basic idea of the simulation for our model is straightforward: we simulate random numbers that follow either a gamma (α1, λ1) distribution (with probability p) or an R-gamma (α2, λ2) distribution (with probability 1-p). To generate a random observation from the R-gamma distribution, all one has to do is generate a random observation from the corresponding gamma distribution and take the reciprocal of the result.
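The sampling idea can be sketched as follows. The thesis does not say how its simulation program draws gamma variates; the sketch swaps in the Marsaglia-Tsang rejection method, restricted to shape α ≥ 1, as one common choice. The class and method names are hypothetical.

```java
import java.util.Random;

/** Illustrative sampler for the mixture model; all names are hypothetical. */
final class MixtureSamplerSketch {

    /**
     * Draws from gamma(alpha, lambda), with lambda a rate parameter, using
     * the Marsaglia-Tsang method. This sketch covers alpha >= 1 only.
     */
    static double sampleGamma(Random rng, double alpha, double lambda) {
        if (alpha < 1.0) {
            throw new IllegalArgumentException("sketch covers alpha >= 1 only");
        }
        double d = alpha - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double z = rng.nextGaussian();
            double v = 1.0 + c * z;
            if (v <= 0.0) {
                continue;                    // reject and draw again
            }
            v = v * v * v;
            double u = rng.nextDouble();
            if (u < 1.0 - 0.0331 * z * z * z * z
                    || Math.log(u) < 0.5 * z * z + d * (1.0 - v + Math.log(v))) {
                return d * v / lambda;       // scale by the rate
            }
        }
    }

    /** Gamma draw with probability p, otherwise the reciprocal of a gamma draw. */
    static double sampleMixture(Random rng, double a1, double l1,
                                double a2, double l2, double p) {
        if (rng.nextDouble() < p) {
            return sampleGamma(rng, a1, l1);
        }
        return 1.0 / sampleGamma(rng, a2, l2);   // R-gamma(alpha2, lambda2)
    }
}
```

If Y follows gamma (α2, λ2), then 1/Y has the R-gamma (α2, λ2) density used in (4.1), which is why the same parameters are passed to the gamma sampler in the second branch.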

Since the mean of the gamma component is supposed to be much smaller than the mean of the R-gamma component⁶, an ordinary histogram does not look very illustrative, especially when p is close to one, as shown in Figure 7.

Figure 7. General histogram of 10,000 random observations on the mixture model

α1 = 3.0; λ1 = 0.01; α2 = 3.0; λ2 = 6000 and p = 0.75

A better alternative is to produce the histograms in logarithmic scale. Figure 8 (a, b, c) presents log histograms for different values of parameters α1, λ1, α2, λ2, and p.

6 This is because in the mixture model the gamma component covers the claims that are much smaller than what the R-gamma component does.

Figure 8. Log histograms of 10,000 random observations on the mixture model

(a) α1 = 3.0; λ1 = 0.01; α2 = 3.0; λ2 = 60000 and p = 0.9

(b) α1 = 3.0; λ1 = 0.01; α2 = 3.0; λ2= 60000 and p = 0.75

(c) α1 = 3.0; λ1 = 0.01; α2 = 3.0; λ2 = 6000 and p = 0.75

Figure 8 (a, b, c) illustrates how our mixture of gamma and reciprocal gamma distributions can create a bi-modal distribution.

5.2 Estimation

This section evaluates the method of moments estimation derived in section 4.2.1. Our goal is therefore limited to the simplified case where α1 and α2 (> 3) are known. The evaluation proceeds as follows. First, we use the Java simulation program to simulate samples (with the parameters α1, α2, λ1, λ2 and p chosen by us) and compute the first three sample moments (m1, m2, m3) of each sample; we generate 20 samples for each of the sample sizes n = 100, 200, 500 and 1000. Next, we take the average of m1, m2, m3 over the 20 samples for each n. Finally, we use the Java estimation program to estimate p, λ1, and λ2 for each sample size n, given α1, α2 and the corresponding average m1, m2, m3, and compare the estimates to the true values.
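The moment computations in this procedure are simple enough to sketch directly. The code below is an illustration only, with hypothetical class and method names; it computes the first three raw sample moments of one sample and averages such triples over replicate samples.

```java
/** Illustrative helpers for the evaluation procedure; all names are hypothetical. */
final class SampleMomentsSketch {

    /** Returns [m1, m2, m3], the first three raw sample moments. */
    static double[] firstThree(double[] sample) {
        double s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (double x : sample) {
            s1 += x;
            s2 += x * x;
            s3 += x * x * x;
        }
        int n = sample.length;
        return new double[] { s1 / n, s2 / n, s3 / n };
    }

    /** Averages the moment triples of several replicate samples. */
    static double[] average(double[][] moments) {
        double[] avg = new double[3];
        for (double[] m : moments) {
            for (int k = 0; k < 3; k++) {
                avg[k] += m[k] / moments.length;
            }
        }
        return avg;
    }
}
```

The averaged triple is what would be passed, together with α1 and α2, to the moment equations (4.5).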

In our mixture model, the reciprocal gamma component is supposed to have a larger mean than the gamma component, and the weight (1-p) of the reciprocal gamma component is reasonably assumed to be very small, e.g. 0.1 or 0.05. We will therefore focus on such cases.

Case One: Samples generated by the following parameters

α1 = 2.0; α2 = 4.0; p = 0.9; λ1 = 1 and λ2 = 60

The means of the gamma and the R-gamma components are therefore 2 and 20 respectively.

Table 3. Results (m1, m2, m3) of the Java Mixture Model Simulation Program (for case 1)

| Sample No. | m1 (n = 100) | m2 (n = 100) | m3 (n = 100) | m1 (n = 200) | m2 (n = 200) | m3 (n = 200) |
|---|---|---|---|---|---|---|
| 1 | 3.3566 | 46.1222 | 1461.0833 | 3.4795 | 45.1929 | 1222.5706 |
| 2 | 3.6025 | 44.2636 | 984.0578 | 3.3594 | 40.7412 | 874.1341 |
| 3 | 3.5380 | 49.3988 | 1179.8444 | 3.6581 | 52.8973 | 1574.8707 |
| 4 | 3.1809 | 32.0837 | 568.4237 | 3.1864 | 29.7927 | 482.3741 |
| 5 | 3.6396 | 61.3733 | 2208.3176 | 3.6409 | 43.6022 | 1004.6762 |
| 6 | 3.6767 | 44.4214 | 941.4238 | 3.6889 | 47.9849 | 1324.5586 |
| 7 | 3.1150 | 31.0644 | 560.2580 | 4.0897 | 80.4335 | 3933.2784 |
| 8 | 3.2579 | 28.5210 | 404.4903 | 3.4122 | 44.5275 | 1185.9261 |
| 9 | 3.5199 | 51.7304 | 1476.0766 | 4.8066 | 156.0900 | 11117.1673 |
| 10 | 3.7619 | 35.4741 | 533.2758 | 3.8587 | 49.1643 | 1178.9222 |
| 11 | 3.4535 | 44.2780 | 1397.3511 | 2.9903 | 29.6945 | 610.0595 |
| 12 | 3.9242 | 51.6918 | 1251.7661 | 3.3378 | 45.3884 | 1648.6347 |
| 13 | 4.3139 | 106.6350 | 6544.6139 | 4.8034 | 148.6120 | 9890.9770 |
| 14 | 3.8655 | 54.2321 | 1321.9429 | 2.9647 | 30.7577 | 749.8292 |
| 15 | 3.2286 | 50.8587 | 1718.8177 | 3.7998 | 57.2028 | 1975.5029 |
| 16 | 3.5957 | 38.1963 | 653.0345 | 3.4472 | 38.4896 | 825.6060 |
| 17 | 5.8454 | 241.6242 | 19646.9762 | 3.4866 | 38.4534 | 810.8807 |
| 18 | 3.7679 | 70.5558 | 2587.3584 | 3.3275 | 42.2316 | 1016.8013 |
| 19 | 3.9504 | 46.6159 | 906.3644 | 5.0210 | 201.8802 | 23409.7195 |
| 20 | 3.7669 | 51.7127 | 1451.4800 | 3.0108 | 28.3996 | 494.5522 |
| Average | 3.7181 | 59.0427 | 2389.8478 | 3.6685 | 62.5768 | 3266.5521 |

| Sample No. | m1 (n = 500) | m2 (n = 500) | m3 (n = 500) | m1 (n = 1000) | m2 (n = 1000) | m3 (n = 1000) |
|---|---|---|---|---|---|---|
| 1 | 3.4635 | 46.6483 | 1280.3454 | 3.4649 | 42.4453 | 1031.7251 |
| 2 | 3.4663 | 38.2422 | 783.1049 | 3.9712 | 75.6400 | 3747.9705 |
| 3 | 3.7571 | 61.5391 | 2446.8983 | 3.5792 | 62.3311 | 2975.0006 |
| 4 | 4.1853 | 89.7410 | 5049.0427 | 3.6586 | 69.8909 | 5311.5120 |
| 5 | 3.6073 | 74.6707 | 4454.5788 | 3.8538 | 69.3384 | 3419.7651 |
| 6 | 3.5511 | 49.9915 | 1495.4225 | 3.5827 | 60.0450 | 3378.7429 |
| 7 | 3.5457 | 41.3303 | 916.3673 | 3.8599 | 58.7343 | 1765.9694 |
| 8 | 3.7716 | 98.4515 | 9706.6566 | 3.7997 | 48.9598 | 1202.3852 |
| 9 | 3.6711 | 74.4775 | 4633.2935 | 3.7006 | 59.5563 | 2370.4893 |
| 10 | 4.0364 | 64.1992 | 2206.2368 | 3.9337 | 74.3765 | 4864.7646 |
| 11 | 3.5119 | 43.6283 | 1231.7904 | 3.7684 | 55.5385 | 1677.0514 |
| 12 | 3.6535 | 76.4618 | 5525.6954 | 3.4867 | 49.4763 | 1885.2028 |
| 13 | 3.9650 | 61.2046 | 1820.6430 | 4.0130 | 104.6849 | 12967.1571 |
| 14 | 3.7548 | 56.2639 | 1711.2957 | 3.7941 | 68.0993 | 2957.7027 |
| 15 | 4.2924 | 67.0258 | 1900.3655 | 3.3847 | 42.0378 | 1091.8070 |
| 16 | 3.3070 | 30.8939 | 504.4048 | 3.8145 | 60.8667 | 1915.8447 |
| 17 | 3.5405 | 54.7646 | 2034.2454 | 3.7830 | 69.0677 | 3493.4036 |
| 18 | 3.8607 | 64.3479 | 2706.7333 | 3.6688 | 54.6478 | 1668.6038 |
| 19 | 3.8374 | 49.6190 | 1158.2886 | 3.8651 | 85.6260 | 6447.6714 |
| 20 | 4.0300 | 99.1340 | 8571.2406 | 3.6468 | 56.3769 | 2065.6042 |
| Average | 3.7404 | 62.1318 | 3006.8325 | 3.7315 | 63.3870 | 3311.9187 |

We then estimate p, λ1 and λ2 using the Java estimation program.

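Table 4. Results (estimated p, λ1, λ2) of the Java Estimation Program (for case 1)

| Parameter | Real | Estimate (n = 100) | Estimate (n = 200) | Estimate (n = 500) | Estimate (n = 1000) | Relative error⁷ (n = 100) | Relative error (n = 200) | Relative error (n = 500) | Relative error (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| p | 0.9 | 0.8017 | 0.8831 | 0.8664 | 0.887 | 10.92% | 1.88% | 3.73% | 1.44% |
| λ1 | 1 | 1.6601 | 1.1534 | 1.1879 | 1.0921 | 66.01% | 15.34% | 18.79% | 9.21% |
| λ2 | 60 | 41.6361 | 54.8304 | 51.2326 | 55.9284 | 30.61% | 8.62% | 14.61% | 6.79% |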

For example, for n =100, we input α1 = 2.0; α2 = 4.0, m1 = 3.7181, m2 = 59.0427, and m3 = 2389.8478 and reasonably close initial guess of p, λ1 and λ2 , as shown in the following graph.

Figure 9. Results of the Java Mixture Model Simulation Program for n = 100

To get a clearer picture, we can plot the relative errors against size n in Excel.

7 The relative error is defined as RE = |T − A| / |T|, where T is the exact value of some quantity and A is an approximation to this value.

Figure 10. A plot of relative estimation errors against sample size n (for case 1). Series: relative error of p, relative error of lambda1, relative error of lambda2; horizontal axis: n = 100, 200, 500, 1000; vertical axis: relative error, 0% to 70%.

As Figure 10 shows, in this case, the estimation quality generally improves as n increases.

We now look at case two, where all parameters are the same as case one, except that, (1-p), the weight of the reciprocal gamma component, is even smaller.

Case Two: Samples generated by the following parameters

α1 = 2.0; α2 = 4.0; p = 0.95; λ1 = 1 and λ2 = 60

Table 5. Results (m1, m2, m3) of the Java Mixture Model Simulation Program (for case 2)

| Sample No. | m1 (n = 100) | m2 (n = 100) | m3 (n = 100) | m1 (n = 200) | m2 (n = 200) | m3 (n = 200) |
|---|---|---|---|---|---|---|
| 1 | 2.811956 | 23.08607 | 421.2255 | 2.73215 | 22.97709 | 460.0533 |
| 2 | 2.652344 | 22.86811 | 498.8811 | 2.399115 | 15.99389 | 239.3109 |
| 3 | 2.213545 | 12.11216 | 154.334 | 2.897493 | 33.6646 | 1068.223 |
| 4 | 2.584686 | 19.87563 | 324.2878 | 2.458403 | 16.07034 | 231.3746 |
| 5 | 2.971137 | 45.52974 | 1810.865 | 2.957405 | 25.97815 | 500.201 |
| 6 | 2.823848 | 21.79947 | 325.5811 | 2.70745 | 16.30414 | 197.5334 |
| 7 | 2.143992 | 12.17417 | 200.5771 | 2.756049 | 25.13949 | 500.4621 |
| 8 | 2.772815 | 19.96651 | 262.1722 | 2.501017 | 17.38893 | 254.5381 |
| 9 | 2.807652 | 27.90763 | 657.9461 | 3.686255 | 108.0554 | 8588.083 |
| 10 | 3.107159 | 24.04866 | 342.456 | 2.83921 | 23.84927 | 522.6723 |
| 11 | 2.608867 | 12.78013 | 103.6608 | 2.616091 | 21.47443 | 421.0139 |
| 12 | 2.806033 | 19.82815 | 291.4059 | 2.496273 | 16.02748 | 200.0329 |
| 13 | 2.598703 | 19.99865 | 296.275 | 3.702027 | 117.8293 | 9014.833 |
| 14 | 2.913395 | 30.28034 | 704.6493 | 2.608077 | 21.89915 | 525.5812 |
| 15 | 2.05928 | 8.406268 | 55.5521 | 3.151629 | 41.78698 | 1571.124 |
| 16 | 2.942755 | 26.37159 | 453.524 | 2.955429 | 27.04032 | 542.1265 |
| 17 | 4.585482 | 179.1298 | 15887.52 | 3.033717 | 28.04239 | 570.494 |
| 18 | 2.787028 | 36.98108 | 1288.642 | 2.445397 | 17.83242 | 274.1988 |
| 19 | 2.814397 | 20.29374 | 299.9783 | 3.409659 | 55.17347 | 2826.11 |
| 20 | 2.864023 | 27.40479 | 745.3663 | 2.382044 | 14.71677 | 201.8509 |
| Average | 2.793455 | 30.54213 | 1256.245 | 2.836744 | 33.3622 | 1435.491 |

| Sample No. | m1 (n = 500) | m2 (n = 500) | m3 (n = 500) | m1 (n = 1000) | m2 (n = 1000) | m3 (n = 1000) |
|---|---|---|---|---|---|---|
| 1 | 2.646733 | 24.69434 | 641.9187 | 2.688913 | 22.93682 | 499.8326 |
| 2 | 2.731093 | 21.17929 | 357.7465 | 2.897996 | 38.14745 | 2012.658 |
| 3 | 2.597255 | 18.25871 | 290.3086 | 2.914819 | 43.80346 | 2346.517 |
| 4 | 3.198737 | 58.03619 | 3735.007 | 2.845249 | 28.56107 | 882.9561 |
| 5 | 2.910747 | 53.67358 | 3632.141 | 3.038571 | 35.74051 | 1197.224 |
| 6 | 2.918892 | 33.93335 | 1060.893 | 2.621883 | 19.3048 | 378.6913 |
| 7 | 3.02053 | 27.90005 | 546.1518 | 2.977892 | 33.98273 | 944.1954 |
| 8 | 2.669968 | 29.2221 | 1219.76 | 2.848272 | 24.7763 | 469.7652 |
| 9 | 3.013579 | 40.741 | 1653.164 | 2.996051 | 34.61389 | 1029.422 |
| 10 | 3.063562 | 30.74002 | 741.284 | 2.883081 | 27.32992 | 579.2428 |
| 11 | 2.608604 | 16.93131 | 231.3851 | 2.703239 | 23.14884 | 498.6979 |
| 12 | 2.635162 | 21.67829 | 525.9976 | 2.544855 | 18.35891 | 327.8484 |
| 13 | 3.196486 | 39.87805 | 1094.379 | 2.848149 | 27.65608 | 702.0109 |
| 14 | 2.759298 | 28.0874 | 794.0121 | 2.96168 | 44.51211 | 2167.515 |
| 15 | 3.019576 | 31.9273 | 718.3333 | 2.704447 | 24.52304 | 593.5765 |
| 16 | 2.676967 | 17.6253 | 221.1972 | 3.018948 | 39.91168 | 1316.85 |
| 17 | 2.877046 | 33.29981 | 1104.262 | 3.060125 | 45.03499 | 2416.506 |
| 18 | 3.115056 | 35.92797 | 954.5829 | 2.690702 | 23.43278 | 513.0984 |
| 19 | 2.944356 | 27.38943 | 552.4702 | 2.882462 | 34.40854 | 1150.552 |
| 20 | 2.821806 | 27.27041 | 606.0155 | 2.874173 | 37.28282 | 1531.215 |
| Average | 2.871273 | 30.91969 | 1034.05 | 2.850075 | 31.37334 | 1077.919 |


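Table 6. Results (estimated p, λ1, λ2) of the Java Estimation Program (for case 2)

| Parameter | Real | Estimate (n = 100) | Estimate (n = 200) | Estimate (n = 500) | Estimate (n = 1000) | Relative error (n = 100) | Relative error (n = 200) | Relative error (n = 500) | Relative error (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| p | 0.95 | 0.933 | 0.928 | 0.8783 | 0.8816 | 1.79% | 2.32% | 7.55% | 7.20% |
| λ1 | 1 | 1.0849 | 1.1188 | 1.2801 | 1.2993 | 8.49% | 11.88% | 28.01% | 29.93% |
| λ2 | 60 | 48.0374 | 49.0967 | 36.9624 | 37.8281 | 19.94% | 18.17% | 38.40% | 36.95% |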

Figure 11. A plot of relative estimation errors against sample size n (for case 2). Series: RE of p, RE of lambda1, RE of lambda2 (RE: Relative Error); horizontal axis: n = 100, 200, 500, 1000; vertical axis: relative error, 0% to 45%.

In this case, where (1-p) is as small as 0.05 (1-0.95), the estimation quality does not improve as n increases (in fact it declines), which is an undesirable result.

In the following two cases, we will increase the difference between the means of the two components (thus increase λ2). We will again compare the case p = 0.9 to the case p=0.95.

Case Three: Samples generated by the following parameters

α1 = 2.0; α2 = 4.0; p = 0.9; λ1 = 1 and λ2 = 600

The means of the gamma and the R-gamma components are therefore 2 and 200 respectively.

Table 7. Results (m1, m2, m3) of the Java Mixture Model Simulation Program (for case 3)

| Sample No. | m1 (n = 100) | m2 (n = 100) | m3 (n = 100) | m1 (n = 200) | m2 (n = 200) | m3 (n = 200) |
|---|---|---|---|---|---|---|
| 1 | 15.53122 | 3976.963 | 1431548 | 16.83022 | 3860.099 | 1191063 |
| 2 | 18.12923 | 3743.235 | 950577.7 | 17.27083 | 3511.969 | 847372.1 |
| 3 | 18.67859 | 4331.777 | 1147453 | 19.89637 | 4790.008 | 1558023 |
| 4 | 15.86307 | 2692.161 | 547291.4 | 15.96468 | 2463.025 | 461516.9 |
| 5 | 19.03629 | 5582.994 | 2188192 | 19.64184 | 3783.012 | 980289.9 |
| 6 | 20.75644 | 3997.021 | 927854.7 | 19.04477 | 4192.369 | 1301066 |
| 7 | 16.08797 | 2660.002 | 544181.5 | 26.0584 | 7584.478 | 3916565 |
| 8 | 15.84138 | 2266.047 | 378852.3 | 17.90898 | 3866.055 | 1158487 |
| 9 | 18.93416 | 4632.735 | 1454381 | 31.41486 | 15021.39 | 1.11E+07 |
| 10 | 20.34951 | 2933.289 | 506199.2 | 21.32733 | 4316.248 | 1154656 |
| 11 | 17.0904 | 3828.416 | 1373403 | 13.67546 | 2468.284 | 592164.1 |
| 12 | 20.99914 | 4556.321 | 1228728 | 17.88255 | 4033.807 | 1628792 |
| 13 | 30.11917 | 10321.42 | 6534291 | 33.2496 | 14387.61 | 9872388 |
| 14 | 21.99763 | 4847.535 | 1298840 | 13.34488 | 2570.221 | 730907 |
| 15 | 16.1494 | 4519.713 | 1694021 | 21.59589 | 5149.496 | 1951519 |
| 16 | 19.66857 | 3212.397 | 622952.1 | 17.16694 | 3247.672 | 798718.2 |
| 17 | 40.5215 | 23501.43 | 1.96E+07 | 16.45465 | 3200.599 | 784307.2 |
| 18 | 22.30821 | 6541.35 | 2566370 | 19.36098 | 3833.707 | 1003700 |
| 19 | 22.7269 | 4106.676 | 884736.6 | 34.38003 | 19639.92 | 2.34E+07 |
| 20 | 19.92775 | 4525.819 | 1424576 | 13.7315 | 2281.961 | 470633.4 |
| Average | 20.53583 | 5338.865 | 2366171 | 20.31004 | 5710.096 | 3244110 |

| Sample No. | m1 (n = 500) | m2 (n = 500) | m3 (n = 500) | m1 (n = 1000) | m2 (n = 1000) | m3 (n = 1000) |
|---|---|---|---|---|---|---|
| 1 | 17.44768 | 4065.426 | 1253012 | 17.92079 | 3681.623 | 1007653 |
| 2 | 18.39389 | 3297.819 | 762293.6 | 23.15087 | 6996.108 | 3724689 |
| 3 | 21.27115 | 5614.681 | 2425857 | 19.94968 | 5721.883 | 2955154 |
| 4 | 25.03059 | 8377.535 | 5023521 | 20.21882 | 6440.771 | 5288944 |
| 5 | 20.39182 | 6970.116 | 4435988 | 22.1965 | 6383.683 | 3397149 |
| 6 | 19.50754 | 4473.651 | 1474319 | 18.85711 | 5431.324 | 3355545 |
| 7 | 17.95484 | 3531.746 | 891124.5 | 22.19659 | 5316.706 | 1742958 |
| 8 | 22.48279 | 9349.796 | 9686764 | 22.051 | 4376.253 | 1181897 |
| 9 | 20.64254 | 6919.369 | 4611851 | 20.73018 | 5420.577 | 2349234 |
| 10 | 23.75046 | 5847.996 | 2182447 | 24.04579 | 6957.802 | 4846654 |
| 11 | 18.03449 | 3777.23 | 1207670 | 21.06881 | 4993.222 | 1653634 |
| 12 | 19.67973 | 7085.417 | 5503420 | 18.71552 | 4432.667 | 1865501 |
| 13 | 23.12022 | 5553.797 | 1796734 | 23.9148 | 9942.363 | 1.29E+07 |
| 14 | 21.27296 | 5079.614 | 1689181 | 21.97166 | 6303.859 | 2938155 |
| 15 | 26.97796 | 6161.347 | 1878142 | 17.32478 | 3647.515 | 1068603 |
| 16 | 17.12404 | 2591.159 | 485651.9 | 22.92985 | 5609.913 | 1897487 |
| 17 | 19.31522 | 4957.726 | 2013647 | 21.34209 | 6350.238 | 3470116 |
| 18 | 22.14513 | 5883.429 | 2684822 | 19.88448 | 4910.16 | 1646471 |
| 19 | 22.64703 | 4446.228 | 1137561 | 21.90593 | 8011.02 | 6425843 |
| 20 | 25.44456 | 9469.377 | 8555747 | 20.61231 | 5114.886 | 2043976 |
| Average | 21.13173 | 5672.673 | 2984988 | 21.04938 | 5802.129 | 3290327 |

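Table 8. Results (estimated p, λ1, λ2) of the Java Estimation Program (for case 3)

| Parameter | Real | Estimate (n = 100, note 1) | Estimate (n = 200) | Estimate (n = 500, note 2) | Estimate (n = 1000) | Relative error (n = 100) | Relative error (n = 200) | Relative error (n = 500) | Relative error (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| p | 0.9 | 0.8369 | 0.8939 | 0.8771 | 0.8918 | 7.01% | 0.68% | 2.54% | 0.91% |
| λ1 | 1 | 1.809 | 8.5377 | 2.3082 | 3.0195 | 80.90% | 753.77% | 130.82% | 201.95% |
| λ2 | 600 | 443.1974 | 568.1431 | 526.2048 | 567.1468 | 26.13% | 5.31% | 12.30% | 5.48% |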

1. The Newton-Raphson method does not converge, the estimated values are approximated by the values obtained at iteration 9.

2. The Newton-Raphson method does not converge, the estimated values are approximated by values obtained at iteration 11.

As Table 8 shows, in this case, the estimation quality is very poor, especially for λ1. For p and λ2, it seems that the estimated values are more or less acceptable for n larger than 100.

Case Four: Samples generated by the following parameters

α1 = 2.0; α2 = 4.0; p = 0.95; λ1 = 1 and λ2 = 600

Table 9. Results (m1, m2, m3) of the Java Mixture Model Simulation Program (for case 4)

| Sample No. | m1 (n = 100) | m2 (n = 100) | m3 (n = 100) | m1 (n = 200) | m2 (n = 200) | m3 (n = 200) |
|---|---|---|---|---|---|---|
| 1 | 9.788257 | 1664.487 | 391426.5 | 8.690964 | 1620.312 | 428036.5 |
| 2 | 7.593671 | 1576.137 | 464646.6 | 7.150295 | 1027.198 | 212320.8 |
| 3 | 5.084568 | 600.1045 | 121918.3 | 11.73739 | 2854.75 | 1051098 |
| 4 | 9.216023 | 1454.292 | 302723.2 | 7.654497 | 1064.319 | 209842.9 |
| 5 | 11.71334 | 3983.477 | 1790369 | 11.89763 | 1991.461 | 474855 |
| 6 | 11.76144 | 1726.023 | 311827.4 | 8.267358 | 984.407 | 172134.2 |
| 7 | 5.117395 | 739.3176 | 183701.8 | 11.1329 | 1993.822 | 481077.5 |
| 8 | 10.1916 | 1389.321 | 235984 | 7.8554 | 1112.127 | 224767.5 |
| 9 | 11.35124 | 2234.16 | 635619.1 | 19.55287 | 10200.5 | 8.56E+06 |
| 10 | 12.44401 | 1748.762 | 314090.9 | 10.09365 | 1752.867 | 497269.1 |
| 11 | 7.294228 | 610.247 | 76158.83 | 9.490685 | 1634.919 | 402823.6 |
| 12 | 9.240487 | 1358.567 | 268109.5 | 8.530887 | 1072.35 | 179513.5 |
| 13 | 11.06191 | 1591.265 | 283508 | 21.848 | 11304.67 | 8996186 |
| 14 | 11.20389 | 2396.378 | 678646.9 | 9.456421 | 1675.269 | 506372 |
| 15 | 4.116149 | 269.2539 | 30665.12 | 14.44944 | 3591.129 | 1546684 |
| 16 | 11.59465 | 1955 | 418869.9 | 11.5124 | 2078.076 | 514380.3 |
| 17 | 2.71E+01 | 17226.1 | 1.59E+07 | 11.32323 | 2137.147 | 542957.7 |
| 18 | 12.03858 | 3174.889 | 1267469 | 9.623635 | 1360.5 | 259593.6 |
| 19 | 10.61813 | 1460.894 | 278087.8 | 17.10218 | 4924.344 | 2.80E+06 |
| 20 | 9.569174 | 2044.84 | 716450.3 | 6.735436 | 890.2508 | 176936.8 |
| Average | 10.4033 | 2460.176 | 1231451 | 11.20526 | 2763.521 | 1412080 |

| Sample No. | m1 (n = 500) | m2 (n = 500) | m3 (n = 500) | m1 (n = 1000) | m2 (n = 1000) | m3 (n = 1000) |
|---|---|---|---|---|---|---|
| 1 | 8.679172 | 1855.699 | 614216.8 | 9.426154 | 1711.608 | 475230.7 |
| 2 | 10.17314 | 1567.517 | 336244.6 | 11.38044 | 3208.744 | 1987672 |
| 3 | 8.583334 | 1245.142 | 267417.7 | 12.75509 | 3855.668 | 2326316 |
| 4 | 14.17754 | 5172.345 | 3707926 | 11.25938 | 2278.063 | 859101.1 |
| 5 | 12.817 | 4855.189 | 3613157 | 13.14739 | 2991.815 | 1173225 |
| 6 | 12.69317 | 2856.147 | 1039474 | 8.425639 | 1330.494 | 354346.3 |
| 7 | 12.10324 | 2168.756 | 520158.1 | 12.51382 | 2815.774 | 920304.9 |
| 8 | 10.41551 | 2387.371 | 1198044 | 11.60625 | 1931.493 | 448372.8 |
| 9 | 13.26288 | 3518.878 | 1630762 | 12.80714 | 2897.359 | 1007155 |
| 10 | 13.0319 | 2464.752 | 715687.4 | 12.48313 | 2219.575 | 559912 |
| 11 | 8.278421 | 1090.705 | 206853.1 | 9.403661 | 1721.786 | 474011.8 |
| 12 | 8.572858 | 1570.283 | 501839.6 | 8.310056 | 1288.873 | 306864.3 |
| 13 | 14.69045 | 3400.094 | 1069815 | 11.54337 | 2216.922 | 6.81E+05 |
| 14 | 10.33719 | 2231.454 | 770795 | 12.83117 | 3920.023 | 2147067 |
| 15 | 13.21381 | 2622.532 | 695045.4 | 9.887588 | 1879.191 | 569873.1 |
| 16 | 9.998691 | 1240.455 | 201700.3 | 14.16588 | 3489.379 | 1297614 |
| 17 | 11.69973 | 2776.689 | 1082416 | 13.48451 | 3929.214 | 2392647 |
| 18 | 13.91455 | 3018.029 | 931894.4 | 9.511126 | 1772.906 | 490434.7 |
| 19 | 12.66666 | 2188.774 | 530408.1 | 11.34205 | 2866.281 | 1127845 |
| 20 | 12.29961 | 2250.376 | 589415.8 | 11.97147 | 3175.384 | 1508384 |
| Average | 11.58044 | 2524.059 | 1011163 | 11.41277 | 2575.028 | 1055352 |

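Table 10. Results (estimated p, λ1, λ2) of the Java Estimation Program (for case 4)

| Parameter | Real | Estimate (n = 100) | Estimate (n = 200) | Estimate (n = 500, note 1) | Estimate (n = 1000, note 2) | Relative error (n = 100) | Relative error (n = 200) | Relative error (n = 500) | Relative error (n = 1000) |
|---|---|---|---|---|---|---|---|---|---|
| p | 0.95 | 0.9411 | 0.9365 | 0.9056 | 0.908 | 0.94% | 1.42% | 4.67% | 4.42% |
| λ1 | 1 | 3.258 | 4.7973 | 1.1988 | 6.8924 | 225.80% | 379.73% | 19.88% | 589.24% |
| λ2 | 600 | 500.662 | 511.0164 | 400.6099 | 409.841 | 16.56% | 14.83% | 33.23% | 31.69% |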

1. The Newton-Raphson method does not converge, the estimated values are approximated by the values obtained at iteration 10.

2. The Newton-Raphson method does not converge, the estimated values are approximated by the values obtained at iteration 10

In this case, as Table 10 indicates, the moment estimation results are far from being satisfactory.

For cases where p is moderate and the difference between the means of the gamma and reciprocal gamma components is comparatively small, our experiments⁸ showed that the moment estimation provides relatively good results, and the accuracy generally increases as n gets larger.

Conclusion of the Estimation Experiments

For the simplified case where α1 and α2 (> 3) are known, the moment estimation of p, λ1, λ2 (with the help of the Newton-Raphson method) tends to provide poor and inconsistent results (the quality does not improve as n gets larger) when p gets very close to 1 (in other words, 1-p gets close to 0) and the mean of the reciprocal gamma component gets much larger than that of the gamma component. The Newton-Raphson method is not guaranteed to converge in such cases.

It seems that the moment estimation is an acceptable choice (if there are no better alternatives) when p is not very close to 1 and the mean of the reciprocal gamma component is not much larger than that of the gamma component, as for example in case one.

8 For example, the author has tried the case where p =0.6, α1=4, α2=4, λ1= 0.5, λ2 =5, the obtained estimation results were of higher accuracy level than the four cases given here, and as expected the accuracy level increases as n gets larger.

6. Conclusions

The light-tailed gamma distribution is a reasonable model for ordinary (small and moderate) claims. The method of moments estimation can provide the first approximation for the iterative solution of the gamma distributions’ likelihood equations. The likelihood equations may be solved using an iterative search like Newton-Raphson method. Alternatively, if we use an approximation of the digamma function ψ(α), a closed-form of gamma (approximate) MLEs can be derived.

The reciprocal gamma distribution has a heavy tail, since for large x values the tail is a power law with exponent (-α-1). It is therefore proposed as a model for large claims. The MLE of the reciprocal gamma distribution is carried out by replacing each observation \(y_i\) by its reciprocal \(x_i = 1/y_i\) and proceeding as for the gamma distribution.

When the insurance portfolio consists of both “ordinary” policies and policies under which much larger claim sizes are likely, it is intuitively appealing to use the mixture of gamma and R-gamma model. The gamma distribution has a lighter tail and can model the frequent, small claims, while the R-gamma distribution covers the large but infrequent claims.

In the mixture model there are five parameters to estimate in total. To apply the method of moments, we must equate five pairs of population and sample moments; the resulting system can be solved by the Newton-Raphson method for higher dimensions.

Applying the MLE method to the mixture distribution, we found that the likelihood equations for estimating α1, λ1, α2, λ2 are weighted averages of the likelihood equations arising from the gamma or R-gamma distribution considered separately. The weights are the probabilities of membership of each sample point xi in each class. Some type of iterative procedure can be used to solve the likelihood equations.

For the simplified case where α1 and α2 are known (with α2 > 3), empirical observation indicates that the moment estimation tends to provide poor and inconsistent results as p gets very close to 1 and the mean of the reciprocal gamma component gets much larger than that of the gamma component. The Newton-Raphson method is not guaranteed to converge in those extreme cases.
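As a worked illustration (written here from the standard raw moments of the two components, not copied from the main text), the k-th population moment of the mixture and the corresponding moment equation read

$$E[X^k] = p\,\frac{\Gamma(\alpha_1+k)}{\Gamma(\alpha_1)\,\lambda_1^{k}} + (1-p)\,\frac{\lambda_2^{k}\,\Gamma(\alpha_2-k)}{\Gamma(\alpha_2)} = \frac{1}{n}\sum_{i=1}^{n} x_i^{k}, \qquad k = 1, \dots, 5.$$

The k-th moment of the reciprocal gamma component is finite only when $\alpha_2 > k$, so all five equations are meaningful only if $\alpha_2 > 5$.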
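In symbols, and with the caveat that this is a sketch derived here from the mixture density rather than a quotation of the main text, let $f_1$ and $f_2$ denote the gamma and R-gamma component densities. The membership probability of $x_i$ in the gamma class is

$$w_i = \frac{p\,f_1(x_i)}{p\,f_1(x_i) + (1-p)\,f_2(x_i)},$$

and the weighted likelihood equations take the form

$$\sum_{i} w_i\left(\frac{\alpha_1}{\lambda_1} - x_i\right) = 0, \qquad \sum_{i} w_i\left(\ln\lambda_1 - \psi(\alpha_1) + \ln x_i\right) = 0,$$

$$\sum_{i} (1-w_i)\left(\frac{\alpha_2}{\lambda_2} - \frac{1}{x_i}\right) = 0, \qquad \sum_{i} (1-w_i)\left(\ln\lambda_2 - \psi(\alpha_2) - \ln x_i\right) = 0.$$

Each bracket is the single-distribution score for one observation, which is why the equations reduce to the ordinary gamma or R-gamma likelihood equations when all weights equal 1 or 0.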

7. References

Adams, Robert A. 2003, Calculus: A Complete Course (5th Ed), Pearson Education Canada Inc., Toronto.

Bowman, K.O. and Shenton, L.R. 1988, Properties of Estimators for the Gamma Distribution, Marcel Dekker, New York.

Casella, G. and Berger, R.L. 1990, Statistical Inference, Brooks Cole, Pacific Grove, Calif.

Everitt, B.S. & Hand, D.J. 1981, Finite Mixture Distributions, Chapman and Hall.

Handbook of Applicable Mathematics, 1984, Walter Ledermann (chief editor), Vol. 6, Part A, Emlyn Lloyd (Ed.), J. Wiley & Sons, Chichester.

Hory, C. & Martin, N. 2002, Maximum likelihood noise estimation for spectrogram segmentation control, Acoustics, Speech, and Signal Processing, 2002 IEEE International Conference, Vol. 2, pp. 1581-1584.

Hossack, I.B., Pollard, J.H. & Zehnwirth 1999, Introductory Statistics with Applications in General Insurance, 2nd edn, Cambridge University Press.

Kaas, R., Goovaerts, M., Dhaene, J. & Denuit, M. 2001, Modern Actuarial Risk Theory, Kluwer Academic Publishers.

Klugman, S.A., Panjer, H.H. and Willmot, G.E. 1998, Loss Models: From Data to Decisions, Wiley, New York.

McLachlan, G. & Peel, D. 2000, Finite Mixture Models, Wiley, New York.

Shorack, Galen R. 2000, Probability for Statisticians, Springer-Verlag New York, Inc., Secaucus, NJ, USA.

Titterington, D.M., Smith, A.F.M. & Makov, U.E. 1985, Statistical Analysis of Finite Mixture Distributions, Wiley, New York.

Wackerly, D.D., Mendenhall, W. & Scheaffer, R.L. 2002, Mathematical Statistics with Applications, 6th edn, Duxbury.

Weisstein, E.W. n.d., Gamma Distribution, CRC Press LLC, viewed 15 July 2005.


8. Appendixes

Appendix A: The Java Mixture Model Simulation Program-Users’ Guide

The Java Mixture Model Simulation Program is designed for simulating the general gamma and reciprocal gamma mixture model (4.1).

$$f(x \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p) = p\,\frac{\lambda_1^{\alpha_1}}{\Gamma(\alpha_1)}\,x^{\alpha_1-1}\exp(-\lambda_1 x) + (1-p)\,\frac{\lambda_2^{\alpha_2}}{\Gamma(\alpha_2)}\,x^{-\alpha_2-1}\exp\!\left(-\frac{\lambda_2}{x}\right), \qquad 0 < p < 1 \tag{A.1}$$

In this mixture model, the gamma distribution has a lighter tail and is therefore suitable as a model for ordinary claims, whereas the heavy-tailed R-gamma distribution covers the extra large claims. The so-called mixing parameter p is the probability that a particular claim falls into the ordinary claim category. If p = 1, (4.1) reduces to a single gamma model, and if p = 0 it reduces to a single reciprocal gamma model.

The Simulation Program accepts the user's inputs of the five parameters α1, λ1, α2, λ2, p, generates realizations of the resulting (claim size) random variable and produces histograms in different scales, just as if statistical data on a large number of real observed claims were available. The program also computes and shows the first three moments of the simulated samples.
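The sampling idea behind the program can be sketched without the SSJ library: draw a uniform number to choose the component, then draw either a gamma value or the reciprocal of a gamma value. The sketch below is an assumption-laden illustration, not the program's code: the class and method names are hypothetical, and the gamma generator is the Marsaglia-Tsang method substituted here for the SSJ generator used in Appendix C.

```java
import java.util.Random;

// Illustrative sketch only; class and method names are hypothetical.
final class MixtureSamplerSketch {

    /** Gamma(shape, rate 1) draw by the Marsaglia-Tsang method (a stand-in for SSJ). */
    static double standardGamma(Random rng, double shape) {
        if (shape < 1.0) {
            // Boost for shape < 1: Gamma(a) = Gamma(a + 1) * U^(1/a)
            return standardGamma(rng, shape + 1.0) * Math.pow(rng.nextDouble(), 1.0 / shape);
        }
        double d = shape - 1.0 / 3.0;
        double c = 1.0 / Math.sqrt(9.0 * d);
        while (true) {
            double z = rng.nextGaussian();
            double v = 1.0 + c * z;
            if (v <= 0.0) {
                continue;
            }
            v = v * v * v;
            double u = rng.nextDouble();
            if (Math.log(u) < 0.5 * z * z + d - d * v + d * Math.log(v)) {
                return d * v;
            }
        }
    }

    /** Simulate n values from the mixture; p is the weight of the gamma component. */
    static double[] sample(Random rng, int n, double p,
                           double alpha1, double lambda1,
                           double alpha2, double lambda2) {
        double[] x = new double[n];
        for (int i = 0; i < n; i++) {
            if (rng.nextDouble() < p) {
                x[i] = standardGamma(rng, alpha1) / lambda1;   // gamma component
            } else {
                x[i] = lambda2 / standardGamma(rng, alpha2);   // reciprocal of a Gamma(alpha2, lambda2) draw
            }
        }
        return x;
    }

    /** k-th raw sample moment. */
    static double rawMoment(double[] x, int k) {
        double sum = 0.0;
        for (double v : x) {
            sum += Math.pow(v, k);
        }
        return sum / x.length;
    }

    public static void main(String[] args) {
        double[] x = sample(new Random(2005L), 10000, 0.9, 2.0, 1.0, 4.0, 60.0);
        System.out.println("first sample moment: " + rawMoment(x, 1));
    }
}
```

With the program's default inputs (α1 = 2, λ1 = 1, α2 = 4, λ2 = 60, p = 0.9) the population mean is 0.9·2 + 0.1·60/3 = 3.8, so the printed first sample moment should fluctuate around that value.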

The user-friendly Graphical User Interface (GUI) makes the program simple and straightforward to use, as shown in Figure A.1.

Figure A.1. Graphical user interface of the program


In the upper part of the GUI, the user can specify the values of five parameters, define how many values to simulate, and choose type of the histogram they want to produce; the resulting histogram is shown in the lower part, and the graph can be printed directly or saved as a PNG image file for later use. The details of the procedure are given as follows.

Step 1. Specifying Input

Figure A.2. The upper part of the GUI- specifying the inputs

Specify parameters α1, λ1, α2, λ2, p

As said above, this is where you specify the values of the parameters. Under the title Gamma Distribution, the text fields alpha >0, lambda>0, and weights (0-1) refer to α1, λ1, and p in the mixture model (A.1). Numbers should be entered in a form like “5.0”.

Under the title Reciprocal Gamma Distribution, the text fields alpha >0, lambda>0, and weights (0-1) refer to α2, λ2, and 1-p. However, you do not have to enter any value in the last text field; it is calculated and shown automatically after you have specified p.

Choose type of histogram

Next you choose the type of histogram to produce by selecting one of the three alternatives (radio buttons): General Histogram, Log Histogram and Power Histogram. For example, to produce a histogram in logarithmic scale, simply click the circle to the left of the text “Log Histogram”. The program produces the general (normal) histogram by default.

Define the number of realizations

In the text field Sample Size>0 you define how many values to simulate. For example, typing “10000” in this text field tells the Java program to simulate 10000 realizations of the random variable which follows the mixture distribution (A.1).

Plot Histogram

Pressing the button Plot Histogram starts the simulation, after which a histogram appears in the histogram panel.

Clear all inputs & graphs

Pressing the button Reset clears all text fields and, if present, the histogram chart (graph) you generated previously.

If there are invalid inputs

Note that all text fields verify their input, so you do not have to worry that the program will crash if you enter an invalid value. If you try to enter, for example, “-8” in a positive-number-only text field like alpha >0, nothing will happen; the text field will not accept your input and will remain empty.

Get the tool tips

When you pause with the cursor over any of the program's text fields and buttons, the tool tip for that component comes up.

Step 2. Obtaining Output

Figure A.3. The lower part of the GUI- obtaining the output

Here you get the resulting histogram. The program also computes the first three moments of the simulated sample and shows them in a label at the bottom.

Moving the mouse pointer over the chart and clicking the right mouse button brings up a pop-up menu, as shown in the above figure.

Select Properties and a dialog with the chart properties will appear, as shown below.

Figure A.4. Chart properties

In this dialog you can check and change the properties of the histogram chart, like title and legend. For example, you can change the text, font or the colour of histogram title.

Select Save As to save the histogram chart as a PNG image file into a folder you specified. If needed you can later insert the image file into a text document, for example, a Microsoft Word document.

Select Print to print the chart.

Zoom in, Zoom out and Auto Range are used if you want to view the graph in different ranges.

Appendix B: The Java Mixture Model Estimation Program-Users’ Guide

The Java Mixture Model Estimation Program is designed for parameter estimation in the general gamma and reciprocal gamma mixture model (4.1). Assuming that the parameters α1 and α2 are already known (with α2 > 3), the program estimates the unknown parameters p, λ1, and λ2 by the method of moments. The system of moment estimation equations is solved by the Newton-Raphson iterative method.

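$$f(x \mid \alpha_1, \lambda_1, \alpha_2, \lambda_2, p) = p\,\frac{\lambda_1^{\alpha_1}}{\Gamma(\alpha_1)}\,x^{\alpha_1-1}\exp(-\lambda_1 x) + (1-p)\,\frac{\lambda_2^{\alpha_2}}{\Gamma(\alpha_2)}\,x^{-\alpha_2-1}\exp\!\left(-\frac{\lambda_2}{x}\right), \qquad 0 < p < 1 \tag{B.1}$$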

The Estimation Program accepts user’s inputs of the known parameters α1, α2, first three sample moments m1, m2, m3, and the initial guess of parameters to be estimated (p, λ1, and λ2). As output, it displays both the Newton-Raphson iteration process and the final estimated results.
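The estimation step can be sketched as a three-dimensional Newton-Raphson iteration on the first three moment equations. This is a hedged illustration only: the Newton_Solver class used by the program is not reproduced in this appendix, so the class below, its method names, the analytic Jacobian, and the use of Cramer's rule for the linear step are all assumptions made for the sketch.

```java
// Illustrative sketch only; not the program's Newton_Solver class.
final class MomentNewtonSketch {

    /** Raw moments E[X^k], k = 1, 2, 3, of Gamma(alpha, rate lambda). */
    static double[] gammaMoments(double alpha, double lambda) {
        double m1 = alpha / lambda;
        double m2 = m1 * (alpha + 1.0) / lambda;
        double m3 = m2 * (alpha + 2.0) / lambda;
        return new double[] { m1, m2, m3 };
    }

    /** Raw moments k = 1, 2, 3 of the reciprocal gamma; requires alpha > 3. */
    static double[] reciprocalGammaMoments(double alpha, double lambda) {
        double m1 = lambda / (alpha - 1.0);
        double m2 = m1 * lambda / (alpha - 2.0);
        double m3 = m2 * lambda / (alpha - 3.0);
        return new double[] { m1, m2, m3 };
    }

    static double det3(double[][] a) {
        return a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
             - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
             + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]);
    }

    /** Returns {p, lambda1, lambda2}; throws if the iteration fails. */
    static double[] solve(double alpha1, double alpha2, double[] m,
                          double[] start, int maxIter, double tol) {
        double p = start[0], l1 = start[1], l2 = start[2];
        for (int iter = 0; iter < maxIter; iter++) {
            double[] g = gammaMoments(alpha1, l1);
            double[] r = reciprocalGammaMoments(alpha2, l2);
            double[] f = new double[3];
            double[][] jac = new double[3][3];
            for (int k = 0; k < 3; k++) {
                int order = k + 1;
                f[k] = p * g[k] + (1.0 - p) * r[k] - m[k];   // moment equation residual
                jac[k][0] = g[k] - r[k];                     // derivative w.r.t. p
                jac[k][1] = -order * p * g[k] / l1;          // derivative w.r.t. lambda1
                jac[k][2] = order * (1.0 - p) * r[k] / l2;   // derivative w.r.t. lambda2
            }
            double det = det3(jac);
            if (det == 0.0 || Double.isNaN(det)) {
                throw new ArithmeticException("singular Jacobian");
            }
            // Cramer's rule for jac * delta = -f
            double[] delta = new double[3];
            for (int col = 0; col < 3; col++) {
                double[][] tmp = new double[3][3];
                for (int i = 0; i < 3; i++) {
                    for (int j = 0; j < 3; j++) {
                        tmp[i][j] = (j == col) ? -f[i] : jac[i][j];
                    }
                }
                delta[col] = det3(tmp) / det;
            }
            p += delta[0];
            l1 += delta[1];
            l2 += delta[2];
            if (Math.abs(delta[0]) + Math.abs(delta[1]) + Math.abs(delta[2]) < tol) {
                return new double[] { p, l1, l2 };
            }
        }
        throw new ArithmeticException("Newton-Raphson did not converge");
    }

    public static void main(String[] args) {
        double[] m = { 5.466667, 49.666667, 584.333333 };   // moments for p=0.6, lambda1=0.5, lambda2=5
        double[] est = solve(4.0, 4.0, m, new double[] { 0.58, 0.52, 4.8 }, 50, 1e-9);
        System.out.println("p=" + est[0] + " lambda1=" + est[1] + " lambda2=" + est[2]);
    }
}
```

As the experiments in the main text indicate, convergence depends on the starting values and on the parameter configuration; the sketch simply throws an exception when the iteration limit is reached.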

Figure B.1. Graphical user interface of the program

Step 1. Specifying Input

Specify parameters α1, α2, m1, m2, m3

The text fields alpha1 and alpha2 refer to the known parameters α1, α2 in the mixture model (B.1); m1, m2, m3 refer to the first three sample moments. Note that it is required that alpha1 > 0 and alpha2 > 3. Numbers should be entered in a form like “5.0”.

Give initial guess of parameters to be estimated

In the text fields p and lambda1 and lambda2 you specify your initial guess of to-be-estimated parameters p, λ1, λ2.

Clear all inputs

Pressing the button Reset clears all text fields.

Get the tool tips

When you pause with the cursor over any of the program's text fields and buttons, the tool tip for that component comes up.

Step 2. Obtaining Output

Pressing the button Estimate initiates the estimation process, see Figure B.2. The final results are given in the text label at the bottom. The Newton-Raphson iterations are shown in the scroll pane.

Click the arrow buttons on the scroll bars to read the full text in the scroll pane.

To copy the text and paste it into another text document, for example a Word document, first select the text to be copied, then click Edit and select Copy, and finally paste it into the target document.

Figure B.2. Graphical user interface of the program-obtaining the output

Appendix C: Java Source Code: MixtureModelSimulation.java

/**
 * @(#) MixtureModelSimulation.java 1.0 05/10/16
 */

/**
 * MixtureModelSimulation.java is a 1.4 Application that requires libraries:
 * ssj, jfreechart, jcommon
 *
 * The Application generates a random variable that follows a mixture
 * distribution of Gamma and Reciprocal Gamma and produces the resulting histogram.
 * It also computes and displays the first three moments of the simulated sample.
 *
 * The parameters for Gamma and Reciprocal Gamma distributions,
 * alpha1, lambda1, alpha2, lambda2
 * and the weights on the two distributions are determined by the user inputs.
 *
 * The application is one of the two applications designed for D-Project:
 * Modelling the Insurance Claim Sizes
 * using the Mixture of Gamma & Reciprocal Distribution.
 * It can be used independently or in combination with the other application:
 * MixtureModelEstimation.java
 *
 * @version 1.0 16OCT 2005
 * @author Ying Ni Mälardalen University
 * @Copyright: Copyright (c) 2005, Ying Ni
 * @email of Ying: [email protected]
 */
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.border.*;
import javax.swing.text.NumberFormatter;
import java.util.Locale;
import java.text.*;
import java.beans.PropertyChangeListener;
import java.beans.PropertyChangeEvent;
import umontreal.iro.lecuyer.rng.*;
import umontreal.iro.lecuyer.probdist.*;
import umontreal.iro.lecuyer.randvar.*;
import org.jfree.data.statistics.*;
import org.jfree.chart.*;
import org.jfree.chart.plot.*;
import org.jfree.util.*;

public class MixtureModelSimulation implements ActionListener {

    JFrame frame;

    JPanel mainPane, inputAndCommandPane, inputPane, commandPane, outputPane, progressPane;

    JFormattedTextField alphaTextField, lambdaTextField, weightGammaTextField,
            alphaRTextField, lambdaRTextField, weightGammaRTextField, sampleSizeTextField;

    JButton resetButton, plotHistoButton;
    ButtonGroup group;
    JProgressBar progressBar;
    JLabel testLabel, sampleInfoLabel;

    // format for FormattedTextFields
    DecimalFormat myFormatter;
    DecimalFormatSymbols symbols;

    // parameters for Gamma & Reciprocal Gamma; firstProbability is the weight of Gamma
    double alpha1, alpha2, lambda1, lambda2, firstProbability;
    int size;
    String typeOfHistogram = "General Histogram";

    RandMrg rng = new RandMrg();
    UniformDist ud = new UniformDist();
    UniformGen ug = new UniformGen(rng, ud);

    double firstMoment, secondMoment, thirdMoment;

    public static void main(String[] args) {
        // Schedule a job for the event-dispatching thread:
        // creating and showing this application's GUI.
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                createAndShowGUI();
            }
        });
    }

    /**
     * Create the GUI and show it. For thread safety,
     * this method should be invoked from the
     * event-dispatching thread.
     */
    private static void createAndShowGUI() {
        // Create and set up the window.
        JFrame frame = new JFrame("Mixture of Gamma & Reciprocal Gamma Distribution");
        frame.setSize(1010, 750);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        MixtureModelSimulation mixture = new MixtureModelSimulation();
        mixture.mainPane.setOpaque(true); // content panes must be opaque
        frame.setContentPane(mixture.mainPane);

        // Display the window.
        // frame.pack();
        frame.setVisible(true);
    }

    /**
     * Constructor: creates the mainPane, which consists of inputAndCommandPane
     * (inputPane + commandPane), outputPane and progressPane.
     * The inputPane is itself divided into two sub-panels,
     * the leftSubPane and the rightSubPane.
     */
    public MixtureModelSimulation() {
        mainPane = new JPanel();
        mainPane.setLayout(new BoxLayout(mainPane, BoxLayout.PAGE_AXIS));

        // put inputPane and commandPane into one panel called inputAndCommandPane,
        // in order to draw a common line border around them.
        inputAndCommandPane = new JPanel();
        inputAndCommandPane.setLayout(new BoxLayout(inputAndCommandPane, BoxLayout.PAGE_AXIS));
        inputAndCommandPane.setBorder(BorderFactory.createCompoundBorder(
                BorderFactory.createLineBorder(Color.blue.darker(), 1),
                BorderFactory.createEmptyBorder(5, 5, 5, 5)));

        setInputPane();
        inputAndCommandPane.add(inputPane);

        setCommandPane();
        inputAndCommandPane.add(commandPane);

        // add sub-panels to the mainPane
        mainPane.add(inputAndCommandPane);

        setOutputPane();
        setProgressPane();
        mainPane.add(outputPane);
        mainPane.add(progressPane);
    }

    private void setInputPane() {

        // set up inputPane
        inputPane = new JPanel();
        inputPane.setLayout(new GridLayout(0, 2));
        inputPane.setMaximumSize(new Dimension(900, 50));

        // set up the leftSubPane: labels and text fields for Gamma Distribution
        JPanel leftSubPane = new JPanel();
        leftSubPane.setLayout(new GridLayout(1, 6));

        // border for leftSubPane
        TitledBorder border = new TitledBorder("Gamma Distribution");
        border.setTitleColor(Color.blue.darker());
        leftSubPane.setBorder(border);

        JLabel alpha = new JLabel(" alpha>0:");
        alpha.setForeground(Color.blue.darker());
        alpha.setToolTipText("Shape parameter for Gamma distribution");

        JLabel lambda = new JLabel(" lambda>0:");
        lambda.setForeground(Color.blue.darker());
        lambda.setToolTipText("Inverse of Scale parameter for Gamma distribution");

        JLabel weightGamma = new JLabel("Weight(0~1):");
        weightGamma.setForeground(Color.blue.darker());
        weightGamma.setToolTipText("weight of Gamma Distribution");

        // Create the format myFormatter, which is used for many FormattedTextFields
        symbols = new DecimalFormatSymbols();
        symbols.setDecimalSeparator('.');
        symbols.setGroupingSeparator(',');
        myFormatter = new DecimalFormat("##0.0##", symbols);

        // Create and format the input text fields alphaTextField, lambdaTextField,
        // weightGammaTextField
        alphaTextField = new JFormattedTextField(myFormatter);
        alphaTextField.setColumns(5);
        alphaTextField.setValue(new Double(2.0));
        alphaTextField.setToolTipText("Shape parameter for Gamma distribution");

        // The FormattedTextField itself has powerful input verification, but
        // it is not sufficient here; the inner class myInputVerifyListener
        // specifies that the inputs must be > 0.
        alphaTextField.addPropertyChangeListener(new myInputVerifyListener());

        lambdaTextField = new JFormattedTextField(myFormatter);
        lambdaTextField.setColumns(5);
        lambdaTextField.setValue(new Double(1));
        lambdaTextField.setToolTipText("Inverse of Scale parameter for Gamma distribution");
        lambdaTextField.addPropertyChangeListener(new myInputVerifyListener());

        weightGammaTextField = new JFormattedTextField(myFormatter);
        weightGammaTextField.setColumns(5);
        weightGammaTextField.setValue(new Double(0.9));
        weightGammaTextField.setToolTipText("weight of Gamma Distribution");

        // Tell accessibility tools about label/textfield pairs
        alpha.setLabelFor(alphaTextField);
        lambda.setLabelFor(lambdaTextField);
        weightGamma.setLabelFor(weightGammaTextField);

        // add components to leftSubPane
        leftSubPane.add(alpha);
        leftSubPane.add(alphaTextField);
        leftSubPane.add(lambda);
        leftSubPane.add(lambdaTextField);
        leftSubPane.add(weightGamma);
        leftSubPane.add(weightGammaTextField);

        // set up the rightSubPane: labels and text fields for Reciprocal Gamma Distribution
        JPanel rightSubPane = new JPanel();
        rightSubPane.setLayout(new GridLayout(1, 6));

        // rightSubPane has exactly the same border as leftSubPane
        TitledBorder border2 = new TitledBorder("Reciprocal Gamma Distribution");
        border2.setTitleColor(Color.blue.darker());
        rightSubPane.setBorder(border2);

        JLabel alphaR = new JLabel(" alpha>0:");
        alphaR.setForeground(Color.blue.darker());
        alphaR.setToolTipText("Shape parameter for Reciprocal Gamma");

        JLabel lambdaR = new JLabel(" lambda>0:");
        lambdaR.setForeground(Color.blue.darker());
        lambdaR.setToolTipText("Scale parameter for Reciprocal Gamma");

        JLabel weightGammaR = new JLabel("Weight(0~1):");
        weightGammaR.setForeground(Color.blue.darker());
        weightGammaR.setToolTipText("weight of Reciprocal Gamma");

        // input text fields for the right sub-panel
        alphaRTextField = new JFormattedTextField(myFormatter);
        alphaRTextField.setColumns(5);
        alphaRTextField.setValue(new Double(4.0));
        alphaRTextField.setToolTipText("Shape parameter for Reciprocal Gamma");
        alphaRTextField.addPropertyChangeListener(new myInputVerifyListener());

        lambdaRTextField = new JFormattedTextField(myFormatter);
        lambdaRTextField.setColumns(5);
        lambdaRTextField.setValue(new Double(60.0));
        lambdaRTextField.setToolTipText("Scale parameter for Reciprocal Gamma");
        lambdaRTextField.addPropertyChangeListener(new myInputVerifyListener());

        weightGammaRTextField = new JFormattedTextField(myFormatter);
        weightGammaRTextField.setColumns(5);

        // weightGammaRTextField is not editable; its value is updated together
        // with weightGammaTextField.

        // The input verification for weightGammaTextField is done separately below;
        // weightGammaRTextField is also updated in this anonymous inner class.
        weightGammaRTextField.setEditable(false);
        weightGammaRTextField.setToolTipText("weight of Reciprocal Gamma");
        weightGammaTextField.addPropertyChangeListener(new PropertyChangeListener() {

            public void propertyChange(PropertyChangeEvent evt) {

                String c = weightGammaTextField.getText();
                boolean updateWeightGammaRTextField = false;

                if (c.length() > 0) {
                    if (Character.isDigit(c.charAt(0))) {
                        updateWeightGammaRTextField = true;
                    } else {
                        weightGammaTextField.setValue(null);
                        weightGammaRTextField.setValue(null);
                    }
                }

                if (updateWeightGammaRTextField) {
                    // parentheses balanced: cast to Number first, then read the double value
                    double q = ((Number) weightGammaTextField.getValue()).doubleValue();

                    if (q >= 0 && q <= 1) {
                        weightGammaRTextField.setValue(new Float(1 - q));
                        firstProbability = q;
                    } else {
                        weightGammaTextField.setValue(null);
                        weightGammaRTextField.setValue(null);
                        // default value
                        firstProbability = 0.5;
                    }
                }
            }
        });

        // Tell accessibility tools about label/textfield pairs
        alphaR.setLabelFor(alphaRTextField);
        lambdaR.setLabelFor(lambdaRTextField);
        weightGammaR.setLabelFor(weightGammaRTextField);

        // add components to the rightSubPane
        rightSubPane.add(alphaR);
        rightSubPane.add(alphaRTextField);
        rightSubPane.add(lambdaR);
        rightSubPane.add(lambdaRTextField);
        rightSubPane.add(weightGammaR);
        rightSubPane.add(weightGammaRTextField);

        // add left & right subPane to inputPane; inputPane is now completed
        inputPane.add(leftSubPane);
        inputPane.add(rightSubPane);
    }

    // set up the commandPane, consisting of radio buttons, one resetButton,
    // one plotHistoButton and one sampleSizeTextField
    // in which the user can specify how many realizations he or she wants.
    private void setCommandPane() {

        commandPane = new JPanel();
        commandPane.setMaximumSize(new Dimension(900, 40));
        commandPane.setLayout(new GridLayout(1, 0));
        commandPane.setBorder(BorderFactory.createEmptyBorder(6, 0, 6, 0));

        // set up the radio buttons
        JRadioButton generalHisto = new JRadioButton("General Histogram");
        generalHisto.setActionCommand("General Histogram");
        generalHisto.setSelected(true);
        generalHisto.setForeground(Color.blue.darker());

        JRadioButton logHisto = new JRadioButton("Log Histogram");
        logHisto.setActionCommand("Logarithmic Histogram");
        //logHisto.setEnabled(false);
        logHisto.setForeground(Color.blue.darker());

        JRadioButton powerHisto = new JRadioButton("Power Histogram");
        powerHisto.setActionCommand("Power Histogram");
        powerHisto.setEnabled(false);
        //powerHisto.addActionListener(this);
        powerHisto.setForeground(Color.blue.darker());

        // Group the radio buttons.
        group = new ButtonGroup();
        group.add(generalHisto);
        group.add(logHisto);
        group.add(powerHisto);

        JLabel sampleSizeLabel = new JLabel(" Sample Size >0:");
        sampleSizeLabel.setToolTipText("Specify How many realizations you want");
        sampleSizeLabel.setForeground(Color.blue.darker());

        sampleSizeTextField = new JFormattedTextField(
                NumberFormat.getIntegerInstance(java.util.Locale.US));
        sampleSizeTextField.setColumns(5);
        sampleSizeTextField.setValue(new Integer(100));
        sampleSizeTextField.setToolTipText("Specify How many realizations you want");
        sampleSizeTextField.addPropertyChangeListener(new myInputVerifyListener());

        // Tell accessibility tools about label/textfield pairs
        sampleSizeLabel.setLabelFor(sampleSizeTextField);

        // Two buttons: Reset and Plot Histogram
        resetButton = new JButton("Reset");
        resetButton.setActionCommand("Reset");
        resetButton.setForeground(Color.blue.darker());
        resetButton.addActionListener(this);

        plotHistoButton = new JButton("Plot Histogram!");
        plotHistoButton.setActionCommand("plot");
        plotHistoButton.setForeground(Color.blue.darker());
        plotHistoButton.addActionListener(this);

        // add components to commandPane; commandPane is completed now.
        commandPane.add(generalHisto);
        commandPane.add(logHisto);
        commandPane.add(powerHisto);
        commandPane.add(sampleSizeLabel);
        commandPane.add(sampleSizeTextField);
        commandPane.add(plotHistoButton);
        commandPane.add(resetButton);
    }

    // actions generated by pressing plotHistoButton and resetButton
    public void actionPerformed(ActionEvent e) {

        // check whether the user wants to reset the input values or plot now
        if ("Reset".equals(e.getActionCommand())) {

            alphaRTextField.setValue(null);
            lambdaRTextField.setValue(null);
            weightGammaRTextField.setValue(null);

            alphaTextField.setValue(null);
            lambdaTextField.setValue(null);
            weightGammaTextField.setValue(null);

            sampleSizeTextField.setValue(null);
            testLabel.setVisible(false);
            sampleInfoLabel.setVisible(false);

            outputPane.removeAll();

        } else if ("plot".equals(e.getActionCommand())) {

            outputPane.removeAll();
            generateResult();

            String testLabelString = "Gamma: alpha1= " + alpha1 + " lambda1= " + lambda1
                    + " Reciprocal Gamma: alpha2= " + alpha2 + " lambda2= " + lambda2
                    + " Weight of Gamma: " + firstProbability
                    + " Generated: " + size + " realizations! ";

            String sampleInfoLabelString = "The first moment: "
                    + ((alpha2 > 1) ? myFormatter.format(firstMoment) : " does not exist")
                    + " The second moment: "
                    + ((alpha2 > 2) ? myFormatter.format(secondMoment) : " does not exist")
                    + " The third moment: "
                    + ((alpha2 > 3) ? myFormatter.format(thirdMoment) : " does not exist");

            testLabel.setText(testLabelString);
            sampleInfoLabel.setText(sampleInfoLabelString);
            testLabel.setVisible(true);
            sampleInfoLabel.setVisible(true);
        }
    }

    private void generateResult() {
        String titleOfChart = "Histogram of our data";
        getInputValues();
        GammaDist gd1 = new GammaDist(alpha1, lambda1);
        GammaDist gd2 = new GammaDist(alpha2, lambda2);

        GammaGen gg1 = new GammaGen(rng, gd1);
        GammaGen gg2 = new GammaGen(rng, gd2);

        double[] array = new double[size];

        // with probability firstProbability draw from the gamma component,
        // otherwise take the reciprocal of a gamma draw (reciprocal gamma component)
        for (int i = 0; i <= size - 1; i++) {
            double uniform = ug.nextDouble(rng, 0, 1);
            if (uniform < firstProbability) {
                array[i] = gg1.nextDouble(rng, alpha1, lambda1);
            } else {
                array[i] = 1.0 / gg2.nextDouble(rng, alpha2, lambda2);
            }
        }

        // first three raw sample moments
        firstMoment = 0;
        secondMoment = 0;
        thirdMoment = 0;

        for (int j = 0; j <= size - 1; j++) {
            firstMoment = firstMoment + array[j];
            secondMoment = secondMoment + Math.pow(array[j], 2);
            thirdMoment = thirdMoment + Math.pow(array[j], 3);
        }
        firstMoment = firstMoment / size;
        secondMoment = secondMoment / size;
        thirdMoment = thirdMoment / size;

        // check which type of histogram is selected
        // (strings are compared with equals rather than ==)
        if ("Logarithmic Histogram".equals(typeOfHistogram)) {

            titleOfChart = "Log Histogram of our data";

            for (int j = 0; j <= size - 1; j++) {
                array[j] = Math.log(array[j]);
            }
        }
        //else if(histogramType=="Power Histogram"){
        //    titleOfChart="Power Histogram of our data";
        //}

        HistogramDataset dataset = new HistogramDataset();
        dataset.addSeries("Our data", array, 80);

        JFreeChart chart = ChartFactory.createHistogram(
                titleOfChart, "", "", dataset,
                PlotOrientation.VERTICAL, true, true, false);
        ChartPanel chartPanel = new ChartPanel(chart);
        chartPanel.setPreferredSize(new Dimension(800, 500));
        outputPane.add(chartPanel);
        chartPanel.setVisible(true);
    }

    private void getInputValues() {

        // default values
        alpha1 = 5.0;
        lambda1 = 3.0;
        alpha2 = 5.0;
        lambda2 = 3.0;
        firstProbability = 0.5;
        size = 100;

        // get user inputs
        // check which type of histogram is selected
        typeOfHistogram = group.getSelection().getActionCommand();

        // parameters and number of realizations to simulate
        if (!(alphaTextField.getValue() == null)) {
            alpha1 = ((Number) alphaTextField.getValue()).doubleValue();
        }
        if (!(alphaRTextField.getValue() == null)) {
            alpha2 = ((Number) alphaRTextField.getValue()).doubleValue();
        }
        if (!(lambdaTextField.getValue() == null)) {
            lambda1 = ((Number) lambdaTextField.getValue()).doubleValue();
        }
        if (!(lambdaRTextField.getValue() == null)) {
            lambda2 = ((Number) lambdaRTextField.getValue()).doubleValue();
        }
        if (!(weightGammaTextField.getValue() == null)) {
            firstProbability = ((Number) weightGammaTextField.getValue()).doubleValue();
        }
        if (!(sampleSizeTextField.getValue() == null)) {
            size = ((Number) sampleSizeTextField.getValue()).intValue();
        }
    }

    // outputPane is the place to show the histogram
    private void setOutputPane() {

        outputPane = new JPanel();
        TitledBorder border3 = new TitledBorder(new LineBorder(Color.blue.darker()), "Histogram");
        border3.setTitleColor(Color.blue.darker());

        outputPane.setBorder(border3);
        outputPane.setMaximumSize(new Dimension(920, 600));
    }

    private void setProgressPane() {

        progressPane = new JPanel();

        // testLabel shows the parameters actually used, so the user can check
        // that they match the inputs; sampleInfoLabel presents the first
        // three sample moments of the simulated sample
        testLabel = new JLabel("");
        sampleInfoLabel = new JLabel("");
        testLabel.setForeground(Color.blue.darker());
        sampleInfoLabel.setForeground(Color.red.darker());
        //testLabel.setVisible(false);

        progressPane.add(testLabel);
        progressPane.add(sampleInfoLabel);
    }

    // here we specify that the formatted text fields accept only positive values
    class myInputVerifyListener implements PropertyChangeListener {

        public void propertyChange(PropertyChangeEvent e) {

            JFormattedTextField source = (JFormattedTextField) e.getSource();
            String c = source.getText();
            boolean isOk = false;

            // Formatted text fields have a feature: when the first character (or more)
            // is a valid input followed by invalid inputs, the whole input is
            // accepted with the invalid parts trimmed away
            if (c.length() > 0) {
                if (Character.isDigit(c.charAt(0))) {
                    isOk = true;
                } else {
                    source.setValue(null);
                }
            }

            if (isOk) {

                float q = (float) ((((Number) source.getValue()).doubleValue()));

                // reject non-positive values, as the field labels (">0") require
                if (!(q > 0)) {
                    source.setValue(null);
                }
            }
        }
    }

}

Appendix D: Java Source Code: MixtureModelEstimation.java

/**
 * @(#) MixtureModelEstimation.java 1.0 05/10/16
 */

/**
 * MixtureModelEstimation.java is a 1.4 application that requires this file:
 * Class Newton_Solver.java
 *
 * The application estimates (using moment method) three unknown parameters-
 * p, lambda1, lambda2 for a mixture distribution of Gamma and Reciprocal Gamma,
 * given the other two parameters alpha1>0, alpha2>3 are already known.
 *
 * The known parameters, alpha1, alpha2, and the first three sample moments are
 * determined by the user inputs.
 *
 * The application is one of the two applications designed for D-Project:
 * Modelling the Insurance Claim Sizes using the Mixture of Gamma & Reciprocal Distribution.
 * It can be used independently or in combination with the other application:
 * MixtureModelSimulation.java
 *
 * @version 1.0 16OCT 2005
 * @author Ying Ni Mälardalen University
 * @Copyright: Copyright (c) 2005, Ying Ni
 * @email of Ying: yni02001@student.mdh.se
 */
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.text.*;
import java.text.*;
import javax.swing.text.NumberFormatter;

public class MixtureModelEstimation implements ActionListener {

    double alpha1, alpha2, m1, m2, m3, p, lambda1, lambda2;

    JPanel mainPanel, commandPanel, displayPanel, resultsPanel;

    JFormattedTextField alpha1Field, alpha2Field, m1_Field, m2_Field, m3_Field,
            initialP_Field, initialLambda1_Field, initialLambda2_Field;

    JButton startButton, resetButton;

    JTextArea textArea;
    JLabel resultsLabel;

    JMenuItem menuItem;
    JMenuBar menuBar;
    JMenu mainMenu;

    DecimalFormat myFormatter;
    DecimalFormatSymbols symbols;
    final static int GAP = 10;
    final static int n = 3;
    private final static String newline = "\n";

    public MixtureModelEstimation() {

        // commandPanel is where the user inputs alpha1, alpha2, and the sample moments.
        // displayPanel shows the process of the Newton-Raphson method.
        // resultsPanel just shows the estimated results.
        commandPanel = new JPanel();
        displayPanel = new JPanel();
        resultsPanel = new JPanel();

        textArea = new JTextArea(25, 60);

        JScrollPane scrollPanel = new JScrollPane(textArea,
                JScrollPane.VERTICAL_SCROLLBAR_ALWAYS,
                JScrollPane.HORIZONTAL_SCROLLBAR_ALWAYS);
        textArea.setEditable(false);

        displayPanel.add(scrollPanel);

        // Add various widgets to the sub panels.
        addWidgets();

        // Create the main panel to contain the three sub panels.
        mainPanel = new JPanel();
        mainPanel.setLayout(new BoxLayout(mainPanel, BoxLayout.PAGE_AXIS));
        mainPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));

        // Add the command, display and results panels to the main panel.
        mainPanel.add(commandPanel);
        mainPanel.add(displayPanel);
        mainPanel.add(resultsPanel);
    }

    /*
     * Set up the widgets.
     */
    private void addWidgets() {

        /*
         * Create a label for displaying the estimated results.
         */
        resultsLabel = new JLabel();
        resultsLabel.setHorizontalAlignment(JLabel.LEFT);
        resultsLabel.setVerticalAlignment(JLabel.BOTTOM);
        resultsLabel.setVerticalTextPosition(JLabel.BOTTOM);
        resultsLabel.setHorizontalTextPosition(JLabel.LEFT);

        startButton = new JButton("Estimate");
        startButton.setActionCommand("Estimate");
        startButton.addActionListener(this);
        startButton.setToolTipText("start to estimate");

        resetButton = new JButton("Reset");
        resetButton.setActionCommand("Reset");
        resetButton.addActionListener(this);
        resetButton.setToolTipText("clear all textfields");

        // commandPanel is divided into upper and lower parts
        JPanel upper = new JPanel();
        JPanel lower = new JPanel();

        // Add borders around the two parts of the command panel.
        upper.setBorder(BorderFactory.createTitledBorder(
                "Known parameters (alpha1, alpha2) & sample moments"));
        lower.setBorder(BorderFactory.createTitledBorder(
                "Initial guess of unknown parameters (p, lambda1, lambda2)"));

        // Add a border around the display panel.
        displayPanel.setBorder(BorderFactory.createCompoundBorder(
                BorderFactory.createTitledBorder("Newton-Raphson Iteration"),
                BorderFactory.createEmptyBorder(5, 5, 5, 5)));

        upper.add(this.createEntryFields1());
        lower.add(this.createEntryFields2());
        lower.add(startButton);
        lower.add(resetButton);
        commandPanel.add(upper);
        commandPanel.add(lower);
        commandPanel.setLayout(new BoxLayout(commandPanel, BoxLayout.PAGE_AXIS));

        resultsPanel.add(resultsLabel);
    }

    protected JComponent createEntryFields1() {

        symbols = new DecimalFormatSymbols();
        symbols.setDecimalSeparator('.');
        symbols.setGroupingSeparator(',');
        myFormatter = new DecimalFormat("##0.0###", symbols);

        JPanel panel = new JPanel();

        String[] labelStrings = {
            "alpha1: ", "alpha2: ", "m1: ", "m2: ", "m3: "
        };

        JLabel[] labels = new JLabel[labelStrings.length];
        JComponent[] fields = new JComponent[labelStrings.length];
        int fieldNum = 0;

        // Create the text fields and set them up.
        alpha1Field = new JFormattedTextField(myFormatter);
        alpha1Field.setValue(Double.valueOf(2.0));
        alpha1Field.setColumns(5);
        alpha1Field.setToolTipText("alpha1>0,shape parameter for gamma component");
        fields[fieldNum++] = alpha1Field;

        alpha2Field = new JFormattedTextField(myFormatter);
        alpha2Field.setColumns(5);
        alpha2Field.setValue(Double.valueOf(4.0));
        alpha2Field.setToolTipText("alpha2>3,shape parameter for R-gamma component");
        fields[fieldNum++] = alpha2Field;

        m1_Field = new JFormattedTextField(myFormatter);
        m1_Field.setColumns(5);
        m1_Field.setValue(Double.valueOf(3.8));
        m1_Field.setToolTipText("first sample moment");
        fields[fieldNum++] = m1_Field;

        m2_Field = new JFormattedTextField(myFormatter);
        m2_Field.setColumns(6);
        m2_Field.setValue(Double.valueOf(65.4));
        m2_Field.setToolTipText("second sample moment");
        fields[fieldNum++] = m2_Field;

        m3_Field = new JFormattedTextField(myFormatter);
        m3_Field.setColumns(8);
        m3_Field.setValue(Double.valueOf(3621.6));
        m3_Field.setToolTipText("third sample moment");
        fields[fieldNum++] = m3_Field;

        // Associate label/field pairs, add everything, and lay it out.
        for (int i = 0; i < labelStrings.length; i++) {
            labels[i] = new JLabel(labelStrings[i], JLabel.TRAILING);
            labels[i].setLabelFor(fields[i]);
            panel.add(labels[i]);
            panel.add(fields[i]);
        }

        return panel;
    }

    protected JComponent createEntryFields2() {

        JPanel panel = new JPanel();

        String[] labelStrings = { "p: ", "lambda1: ", "lambda2: " };

        JLabel[] labels = new JLabel[labelStrings.length];
        JComponent[] fields = new JComponent[labelStrings.length];
        int fieldNum = 0;

        initialP_Field = new JFormattedTextField(myFormatter);
        initialP_Field.setColumns(5);
        initialP_Field.setValue(Double.valueOf(0.7));
        initialP_Field.setToolTipText("initial guess of p");
        fields[fieldNum++] = initialP_Field;

        initialLambda1_Field = new JFormattedTextField(myFormatter);
        initialLambda1_Field.setColumns(5);
        initialLambda1_Field.setValue(Double.valueOf(0.8));
        initialLambda1_Field.setToolTipText("initial guess of lambda1");
        fields[fieldNum++] = initialLambda1_Field;

        initialLambda2_Field = new JFormattedTextField(myFormatter);
        initialLambda2_Field.setColumns(5);
        initialLambda2_Field.setValue(Double.valueOf(50.0));
        initialLambda2_Field.setToolTipText("initial guess of lambda2");
        fields[fieldNum++] = initialLambda2_Field;

        // Associate label/field pairs, add everything, and lay it out.
        for (int i = 0; i < labelStrings.length; i++) {
            labels[i] = new JLabel(labelStrings[i], JLabel.TRAILING);
            labels[i].setLabelFor(fields[i]);
            panel.add(labels[i]);
            panel.add(fields[i]);
        }
        return panel;
    }


    public void actionPerformed(ActionEvent e) {

        if ("Reset".equals(e.getActionCommand())) {

            textArea.setText(null);
            alpha1Field.setText(null);
            alpha2Field.setText(null);
            m1_Field.setText(null);
            m2_Field.setText(null);
            m3_Field.setText(null);
            initialP_Field.setText(null);
            initialLambda1_Field.setText(null);
            initialLambda2_Field.setText(null);

        } else if ("Estimate".equals(e.getActionCommand())) {
            textArea.setText(null);
            doEstimation();
        }
    }

    public void doEstimation() {
        getInputValues();

        double[][] jacobian = new double[n][n];

        // Precision for convergence.
        double eps = 1e-6;
        // Current iterate, starting from the initial guesses.
        double[] x = {p, lambda1, lambda2};
        // The function vector.
        double[] f = new double[n];
        double errXi, errX, errF;
        // The corrections (increments).
        double[] dx = new double[n];

        Newton_Solver aSolver = new Newton_Solver(alpha1, alpha2, m1, m2, m3);

        // Do at most 20 iterations, but stop as soon as the iteration has converged.
        for (int j = 0; j < 20; j++) {

            textArea.append(" ITERATION No. " + (j + 1) + newline + " ------" + newline);

            // Compute f, the Jacobian and the increments for this iteration, and update x.
            aSolver.getOneIteration(x, f, jacobian, dx);

            // Display the f values for this iteration.
            textArea.append(newline + " f values" + newline);
            for (int i = 0; i <= n - 1; i++) {
                textArea.append(" " + Double.toString(f[i]) + newline);
            }

            // Display the Jacobian matrix, one row per line.
            textArea.append(newline + " Jacobian matrix" + newline);
            for (int row = 0; row < jacobian.length; row++) {
                for (int col = 0; col < jacobian[row].length; col++) {
                    textArea.append(" " + Double.toString(jacobian[row][col]) + " ");
                }
                textArea.append(newline);
            }

            // Display the increments.
            textArea.append(newline + " Increment values" + newline);
            for (int i = 0; i <= n - 1; i++) {
                textArea.append(" " + Double.toString(dx[i]) + newline);
            }

            // Display the updated x values for this iteration.
            textArea.append(newline + " Resulting x values (p,lambda1, lambda2)" + newline);
            for (int i = 0; i <= n - 1; i++) {
                textArea.append(" " + Double.toString(x[i]) + newline);
            }
            textArea.append(newline);

            // Reset the errors.
            errX = errF = errXi = 0.0;

            // Check for convergence: find the maximum errors, then break if both are <= eps.
            for (int i = 0; i <= n - 1; i++) {
                if (x[i] != 0.) {
                    errXi = Math.abs(dx[i] / x[i]);
                } else {
                    errXi = Math.abs(dx[i]);
                }
                if (errXi > errX) {
                    errX = errXi;              // solution error
                }
                if (Math.abs(f[i]) > errF) {
                    errF = Math.abs(f[i]);     // function error
                }
            }
            if ((errX <= eps) && (errF <= eps)) {
                break;
            }
        }

        resultsLabel.setText("Estimated Results: p = " + myFormatter.format(x[0])
                + " lambda1 = " + myFormatter.format(x[1])
                + " lambda2 = " + myFormatter.format(x[2]));
    }

    private void getInputValues() {

        // Each value is read only if its own field holds a valid number.
        if (alpha1Field.getValue() != null) {
            alpha1 = ((Number) alpha1Field.getValue()).doubleValue();
        }
        if (alpha2Field.getValue() != null) {
            alpha2 = ((Number) alpha2Field.getValue()).doubleValue();
        }
        if (m1_Field.getValue() != null) {
            m1 = ((Number) m1_Field.getValue()).doubleValue();
        }
        if (m2_Field.getValue() != null) {
            m2 = ((Number) m2_Field.getValue()).doubleValue();
        }
        if (m3_Field.getValue() != null) {
            m3 = ((Number) m3_Field.getValue()).doubleValue();
        }
        if (initialP_Field.getValue() != null) {
            p = ((Number) initialP_Field.getValue()).doubleValue();
        }
        if (initialLambda1_Field.getValue() != null) {
            lambda1 = ((Number) initialLambda1_Field.getValue()).doubleValue();
        }
        if (initialLambda2_Field.getValue() != null) {
            lambda2 = ((Number) initialLambda2_Field.getValue()).doubleValue();
        }
    }

    // Create an Edit menu to support cut/copy/paste.
    public JMenuBar createMenuBar() {
        menuItem = null;
        menuBar = new JMenuBar();
        mainMenu = new JMenu("Edit");
        mainMenu.setMnemonic(KeyEvent.VK_E);

        menuItem = new JMenuItem(new DefaultEditorKit.CutAction());
        menuItem.setText("Cut");
        menuItem.setMnemonic(KeyEvent.VK_T);
        mainMenu.add(menuItem);

        menuItem = new JMenuItem(new DefaultEditorKit.CopyAction());
        menuItem.setText("Copy");
        menuItem.setMnemonic(KeyEvent.VK_C);
        mainMenu.add(menuItem);

        menuItem = new JMenuItem(new DefaultEditorKit.PasteAction());
        menuItem.setText("Paste");
        menuItem.setMnemonic(KeyEvent.VK_P);
        mainMenu.add(menuItem);

        menuBar.add(mainMenu);
        return menuBar;
    }

    /**
     * Create the GUI and show it. For thread safety,
     * this method should be invoked from the
     * event-dispatching thread.
     */
    private static void createAndShowGUI() {

        // Create a new instance of MixtureModelEstimation.
        MixtureModelEstimation estimator = new MixtureModelEstimation();

        // Create and set up the window.
        JFrame aFrame = new JFrame("Moment Estimation-using Newtons Method");
        aFrame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        aFrame.setContentPane(estimator.mainPanel);
        aFrame.setJMenuBar(estimator.createMenuBar());

        // Display the window.
        aFrame.pack();
        aFrame.setVisible(true);
    }

    public static void main(String[] args) {
        // Schedule a job for the event-dispatching thread:
        // creating and showing this application's GUI.
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                createAndShowGUI();
            }
        });
    }

}

Appendix E: Java Source Code: Newton_Solver.java
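For orientation, the system solved by the listing in this appendix can be written out as follows. The equations are read off directly from the methods getFValue and getDx in Newton_Solver.java; the notation x = (p, λ1, λ2), f = (f1, f2, f3) and J for the Jacobian is chosen here for illustration only. The third equation requires α2 > 3, in line with the tooltip of the alpha2 input field.

```latex
\begin{align*}
f_1(p,\lambda_1,\lambda_2) &= \frac{\alpha_1\,p}{\lambda_1}
  + \frac{(1-p)\,\lambda_2}{\alpha_2-1} - m_1,\\
f_2(p,\lambda_1,\lambda_2) &= \frac{\alpha_1(\alpha_1+1)\,p}{\lambda_1^{2}}
  + \frac{(1-p)\,\lambda_2^{2}}{(\alpha_2-1)(\alpha_2-2)} - m_2,\\
f_3(p,\lambda_1,\lambda_2) &= \frac{\alpha_1(\alpha_1+1)(\alpha_1+2)\,p}{\lambda_1^{3}}
  + \frac{(1-p)\,\lambda_2^{3}}{(\alpha_2-1)(\alpha_2-2)(\alpha_2-3)} - m_3.
\end{align*}

% One Newton-Raphson step: solve the linear system for the increment, then update.
\[
J\bigl(x^{(k)}\bigr)\,\Delta x = -f\bigl(x^{(k)}\bigr),
\qquad
x^{(k+1)} = x^{(k)} + \Delta x,
\qquad
J_{ij} = \frac{\partial f_i}{\partial x_j}.
\]

% Cramer's rule for the increment: J_i is J with its i-th column replaced by f.
\[
\Delta x_i = -\frac{\det J_i}{\det J}, \qquad i = 1, 2, 3.
\]
```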

/**
 * @(#) Newton_Solver.java 1.0 05/10/16
 */

/**
 * Newton_Solver uses the Newton-Raphson method to solve
 * the system of 3 nonlinear moment estimation equations
 * of the mixture model of gamma & reciprocal gamma.
 *
 * The user should already know the parameters alpha1, alpha2 and
 * the first three sample moments. The parameters to be estimated
 * are p, lambda1, lambda2, corresponding to x[0], x[1], and x[2]
 * in Newton_Solver.java.
 *
 * @version 1.0 16 OCT 2005
 * @author Ying Ni, Mälardalen University
 * @Copyright: Copyright (c) 2005, Ying Ni
 * @email of Ying: yni02001@student.mdh.se
 */

public class Newton_Solver {

    private static final int n = 3;
    private double m1;
    private double m2;
    private double m3;
    private double alpha1;
    private double alpha2;

    public Newton_Solver(double alpha1, double alpha2, double m1, double m2, double m3) {
        this.m1 = m1;
        this.m2 = m2;
        this.m3 = m3;
        this.alpha1 = alpha1;
        this.alpha2 = alpha2;
    }

    // Note: only works for a system of three equations in three unknowns.
    public void getOneIteration(double[] x, double[] f, double[][] jacobian, double[] dx) {

        // Compute the F vector = the n equations minus their right-hand sides.
        getFValue(x, f);

        // Compute the array of derivatives dFi/dXj.
        getJacobianValue(x, jacobian);

        // Compute the corrections.
        getDx(f, jacobian, dx);

        // Apply the corrections to x.
        for (int i = 0; i <= n - 1; i++) {
            x[i] = x[i] + dx[i];
        }
    }

    public void getFValue(double x[], double f[]) {

        // n = 3 equations; x[0] = p, x[1] = lambda1, x[2] = lambda2.
        f[0] = alpha1 * x[0] / x[1]
                + (1 - x[0]) * x[2] / (alpha2 - 1.0)
                - m1;
        f[1] = (alpha1 * (alpha1 + 1.0)) * x[0] / Math.pow(x[1], 2)
                + ((1 - x[0]) * Math.pow(x[2], 2)) / ((alpha2 - 2) * (alpha2 - 1))
                - m2;
        f[2] = (alpha1 * (alpha1 + 1.0) * (alpha1 + 2.0)) * x[0] / Math.pow(x[1], 3)
                + ((1 - x[0]) * Math.pow(x[2], 3.0))
                  / ((alpha2 - 3.0) * (alpha2 - 2.0) * (alpha2 - 1.0))
                - m3;
    }

    public void getJacobianValue(double x[], double jacobian[][]) {

        // Compute the derivatives of the n = 3 equations.

        // dF1/dXj
        jacobian[0][0] = alpha1 / x[1] - x[2] / (alpha2 - 1.0);
        jacobian[0][1] = -alpha1 * x[0] / (Math.pow(x[1], 2));
        jacobian[0][2] = (1.0 - x[0]) / (alpha2 - 1.0);

        // dF2/dXj
        jacobian[1][0] = ((alpha1 + 1.0) * (alpha1)) / (Math.pow(x[1], 2))
                - (Math.pow(x[2], 2)) / ((alpha2 - 2.0) * (alpha2 - 1.0));
        jacobian[1][1] = (-2.0 * (alpha1 + 1) * alpha1) * x[0] / (Math.pow(x[1], 3));
        jacobian[1][2] = 2.0 * (1.0 - x[0]) * x[2] / ((alpha2 - 2.0) * (alpha2 - 1.0));

        // dF3/dXj
        jacobian[2][0] = (alpha1 * (alpha1 + 1.0) * (alpha1 + 2.0)) / (Math.pow(x[1], 3))
                - (Math.pow(x[2], 3)) / ((alpha2 - 3.0) * (alpha2 - 2.0) * (alpha2 - 1.0));
        jacobian[2][1] = (-3 * alpha1 * (alpha1 + 1.0) * (alpha1 + 2.0)) * x[0]
                / (Math.pow(x[1], 4));
        jacobian[2][2] = 3.0 * (1.0 - x[0]) * (Math.pow(x[2], 2))
                / ((alpha2 - 3.0) * (alpha2 - 2.0) * (alpha2 - 1.0));
    }

    // Solve for dx using Cramer's rule.
    public static void getDx(double[] f, double[][] jac, double[] dx) {

        double[][] jacColOneReplaced = new double[n][n];
        double[][] jacColTwoReplaced = new double[n][n];
        double[][] jacColThreeReplaced = new double[n][n];

        // Replace the first, second, and third columns of the Jacobian matrix
        // so as to get three matrices: jacColOneReplaced, jacColTwoReplaced,
        // jacColThreeReplaced. Start from three copies of the Jacobian.
        for (int i = 0; i < jac.length; i++) {
            for (int j = 0; j < n; j++) {
                jacColOneReplaced[i][j] = jac[i][j];
                jacColTwoReplaced[i][j] = jac[i][j];
                jacColThreeReplaced[i][j] = jac[i][j];
            }
        }

        for (int row = 0; row < n; row++) {
            jacColOneReplaced[row][0] = f[row];
        }

        for (int row = 0; row < n; row++) {
            jacColTwoReplaced[row][1] = f[row];
        }

        for (int row = 0; row < n; row++) {
            jacColThreeReplaced[row][2] = f[row];
        }

        // dxi (i = 1,2,3) = minus the determinant of the Jacobian matrix with its
        // ith column replaced by f1, f2, f3, divided by the determinant of the
        // Jacobian matrix.
        double denominator = detMatrixOf3(jac);
        double numerator1 = detMatrixOf3(jacColOneReplaced);
        double numerator2 = detMatrixOf3(jacColTwoReplaced);
        double numerator3 = detMatrixOf3(jacColThreeReplaced);

        dx[0] = -numerator1 / denominator;
        dx[1] = -numerator2 / denominator;
        dx[2] = -numerator3 / denominator;
    }

    // Compute the determinant of a 3*3 matrix using the definition of the determinant.
    public static double detMatrixOf3(double[][] m) {
        double det;

        det = (m[0][0] * m[1][1] * m[2][2])
            + (m[0][1] * m[1][2] * m[2][0])
            + (m[0][2] * m[1][0] * m[2][1])
            - (m[0][2] * m[1][1] * m[2][0])
            - (m[0][0] * m[1][2] * m[2][1])
            - (m[0][1] * m[1][0] * m[2][2]);

        return det;
    }

}
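The solver above can also be exercised without the Swing front end. The following is a minimal headless sketch, not part of the thesis programs: the class name NewtonMomentSketch, its method names, and the parameter values in main are illustrative assumptions. It restates the moment equations, the Jacobian, the Cramer's-rule step and the stopping rule of doEstimation in compact form, builds the three moments from assumed "true" parameters, and runs the iteration from a nearby starting point.

```java
import java.util.Arrays;

/** Headless sketch of the Newton-Raphson moment solver (illustrative names only). */
public class NewtonMomentSketch {

    /** Theoretical moments m1..m3 of the mixture for x = {p, lambda1, lambda2}. */
    public static double[] moments(double a1, double a2, double[] x) {
        double p = x[0], l1 = x[1], l2 = x[2];
        return new double[] {
            a1 * p / l1 + (1 - p) * l2 / (a2 - 1),
            a1 * (a1 + 1) * p / (l1 * l1)
                + (1 - p) * l2 * l2 / ((a2 - 1) * (a2 - 2)),
            a1 * (a1 + 1) * (a1 + 2) * p / (l1 * l1 * l1)
                + (1 - p) * l2 * l2 * l2 / ((a2 - 1) * (a2 - 2) * (a2 - 3))
        };
    }

    /** Analytic Jacobian of the three moment equations (same terms as the listing). */
    public static double[][] jacobian(double a1, double a2, double[] x) {
        double p = x[0], l1 = x[1], l2 = x[2];
        double c1 = a1, c2 = a1 * (a1 + 1), c3 = a1 * (a1 + 1) * (a1 + 2);
        double d1 = a2 - 1, d2 = d1 * (a2 - 2), d3 = d2 * (a2 - 3);
        return new double[][] {
            {c1 / l1 - l2 / d1, -c1 * p / (l1 * l1), (1 - p) / d1},
            {c2 / (l1 * l1) - l2 * l2 / d2, -2 * c2 * p / Math.pow(l1, 3),
                2 * (1 - p) * l2 / d2},
            {c3 / Math.pow(l1, 3) - Math.pow(l2, 3) / d3, -3 * c3 * p / Math.pow(l1, 4),
                3 * (1 - p) * l2 * l2 / d3}
        };
    }

    /** Determinant of a 3x3 matrix by the rule of Sarrus. */
    public static double det3(double[][] m) {
        return m[0][0] * m[1][1] * m[2][2] + m[0][1] * m[1][2] * m[2][0]
             + m[0][2] * m[1][0] * m[2][1] - m[0][2] * m[1][1] * m[2][0]
             - m[0][0] * m[1][2] * m[2][1] - m[0][1] * m[1][0] * m[2][2];
    }

    /** Solve J * dx = -f by Cramer's rule. */
    public static double[] cramerStep(double[][] j, double[] f) {
        double det = det3(j);
        double[] dx = new double[3];
        for (int c = 0; c < 3; c++) {
            double[][] jc = new double[3][];
            for (int r = 0; r < 3; r++) {
                jc[r] = j[r].clone();
                jc[r][c] = f[r];          // replace column c by f
            }
            dx[c] = -det3(jc) / det;
        }
        return dx;
    }

    /** Newton iteration with the same stopping rule as doEstimation. */
    public static double[] solve(double a1, double a2, double[] m,
                                 double[] x0, double eps, int maxIter) {
        double[] x = x0.clone();
        for (int it = 0; it < maxIter; it++) {
            double[] mu = moments(a1, a2, x);
            double[] f = {mu[0] - m[0], mu[1] - m[1], mu[2] - m[2]};
            double[] dx = cramerStep(jacobian(a1, a2, x), f);
            double errX = 0.0, errF = 0.0;
            for (int i = 0; i < 3; i++) {
                x[i] += dx[i];
                double rel = (x[i] != 0.0) ? Math.abs(dx[i] / x[i]) : Math.abs(dx[i]);
                errX = Math.max(errX, rel);
                errF = Math.max(errF, Math.abs(f[i]));
            }
            if (errX <= eps && errF <= eps) {
                break;
            }
        }
        return x;
    }

    public static void main(String[] args) {
        double a1 = 2.0, a2 = 4.0;
        double[] truth = {0.7, 0.8, 50.0};      // assumed parameters, for illustration
        double[] m = moments(a1, a2, truth);    // exact moments implied by 'truth'
        double[] start = {0.69, 0.81, 49.5};    // assumed starting guess near 'truth'
        System.out.println(Arrays.toString(solve(a1, a2, m, start, 1e-6, 20)));
    }
}
```

Feeding the solver moments computed from known parameters, rather than sample moments, gives a simple consistency check: any converged result can be compared with the parameters that generated the input. As with the GUI program, convergence depends on the quality of the initial guess and is not guaranteed.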
