Risk theory
Harri Nyrhinen, University of Helsinki, Fall 2017

Contents
1 Introduction
2 Background from Probability Theory
3 The number of claims
  3.1 Poisson distribution and process
  3.2 Mixed Poisson variable
  3.3 The number of claims of a single policy-holder
  3.4 Mixed Poisson process
4 Total claim amount
5 Viewpoints on claim size distributions
  5.1 Tabulation method
  5.2 Analytical methods
  5.3 On the estimation of the tails of the distribution
6 Calculation and estimation of the total claim amount
  6.1 Panjer method
  6.2 Approximation of the compound distributions
    6.2.1 Limiting behaviour of compound distributions
    6.2.2 Refinements of the normal approximation
    6.2.3 Applications of approximation methods
  6.3 Simulation of compound distributions
    6.3.1 Producing observations
    6.3.2 Estimation
    6.3.3 Increasing efficiency of simulation of small probabilities
  6.4 An upper bound for the tail probability
  6.5 Modelling dependence
    6.5.1 Mixing models
    6.5.2 Copulas
7 Reinsurance
  7.1 Excess of loss (XL)
  7.2 Quota share (QS)
  7.3 Surplus
  7.4 Stop loss (SL)
8 Outstanding claims
  8.1 Development triangles
  8.2 Chain-Ladder method
  8.3 Predicting of the unknown claims
  8.4 Credibility estimates for outstanding claims
9 Solvency in the long run
  9.1 Classical ruin problem
  9.2 Practical long run modelling of the capital development
10 Insurance from the viewpoint of utility theory
  10.1 Utility functions
  10.2 Utility of insurance
1 Introduction
Consider briefly the nature of the insurance industry and the motivation for buying insurance contracts. As an example, think about a collection of houses and the associated risk of fires. For a single house-owner, a fire means a huge economic loss. Hedging against the risk by building up a bank account with a sufficient amount of money is not realistic. Usually the problem is solved by means of an insurance contract. This means that each house-owner pays a premium to an insurance company. The premium corresponds roughly to the mean level of the losses caused by fires for the house-owner in one year. By the contract, the company compensates these losses. Thus the risk has moved to the insurance company and the house-owners are protected against random large losses by means of a deterministic moderate premium. The company typically makes a large number of similar contracts. The law of large numbers can be applied to see that the company is able to manage the compensations by means of moderate premiums.

We have already introduced two important cash flows associated with the insurance business, namely, the compensations and the premiums. There are many other cash flows, as illustrated in the following picture.
[Diagram: cash flows of an insurance company. Premiums, returns on the investments and new capital flow into the company; compensations, administration costs, and dividends, taxes, etc. flow out.]
The course is focussed on the analysis of compensations. Examples of the goals are:
- how should we describe the compensation process
- how should we estimate the solvency of an insurance company.

The course can be viewed as a study of risks associated with non-life insurance companies. The main source for the course is part I of the book

DPP: Daykin, C., Pentikäinen, T. and Pesonen, M. (1994). Practical Risk Theory for Actuaries. Chapman & Hall, London.

The reader is referred to this book especially to get more applied discussion of various topics. More detailed references to appropriate chapters will be given during the course.
2 Background from Probability Theory
The central subject of our interest is the compensation process, called later the claims process. We will consider it as a random variable or as a stochastic process. We list some concepts and facts from probability theory which are assumed to be known.

1. Probability space

A probability space is a triple (Ω, S, P) where Ω is the sample space, S is a sigma-algebra on Ω, and P is a probability measure. The sets of S are called events.

2. Random variable

A measurable map ξ : (Ω, S) → (R, B) is called a random variable, where B is the Borel sigma-algebra of R. In the sequel, the measurability of a real-valued function refers to the measurability with respect to the Borel sigma-algebra B.

3. Distribution

The distribution P of the random variable ξ is the probability measure on (R, B) such that

P(B) = P(ξ^{-1}(B)) = P(ω ∈ Ω | ξ(ω) ∈ B)

for every B ∈ B. If the random variables ξ and η have the same distribution then we write
ξ =_L η.
4. Distribution function, density function, probability mass function

The distribution function F : R → R of the random variable ξ is defined by

F(x) = P(ξ ≤ x) = P((−∞, x]).

The function f : R → R is the density (function) of ξ if

F(x) = \int_{-∞}^{x} f(t)\,dt

for every x ∈ R. In this case, ξ is a continuous random variable. If there exists a countable subset {x_1, x_2, ...} of R such that

P(ξ ∈ {x_1, x_2, ...}) = 1

then ξ is discrete. Then the probability mass function g : R → R of ξ is defined by

g(x) = P(ξ = x).
In the sequel we often consider mixtures of continuous and discrete distributions. Then the distribution function has the form

F(x) = \int_{-∞}^{x} f(t)\,dt + \sum_{x_i ≤ x} P(ξ = x_i)   (2.1)

for all x ∈ R.

5. Expectation, variance, higher order moments

The expectation of a random variable ξ is defined by

E(ξ) = \int_Ω ξ(ω)\,dP(ω)

under the assumption that E(|ξ|) < ∞. If ξ ≥ 0 almost surely we also allow +∞ as the value of the expectation. Thus E(ξ) is defined for every non-negative random variable ξ. Let h : R → R be a measurable function. If E(|h(ξ)|) < ∞ and F is the distribution function of ξ then

E(h(ξ)) = \int_{-∞}^{∞} h(x)\,dF(x).

If ξ has the density f then

E(h(ξ)) = \int_{-∞}^{∞} f(x)h(x)\,dx.

If ξ is discrete and the probability mass function is g then
E(h(ξ)) = \sum_{i=1}^{∞} g(x_i)h(x_i),

where it is assumed that \sum_{i=1}^{∞} g(x_i) = 1. For the mixture (2.1) of a continuous and a discrete distribution, it holds that
E(h(ξ)) = \int_{-∞}^{∞} f(x)h(x)\,dx + \sum_{i=1}^{∞} P(ξ = x_i)h(x_i).
The nth (origin) moment a_n of ξ is defined by

a_n = E(ξ^n) = \int_{-∞}^{∞} x^n\,dF(x)

if E(|ξ|^n) < ∞, n = 1, 2, .... Hence, a_1 = E(ξ). We also often write µ = E(ξ) or µ_ξ = E(ξ). The nth central moment µ_n is defined by

µ_n = E((ξ − a_1)^n), n ≥ 2.
The variance of ξ is

σ_ξ^2 = Var(ξ) = µ_2

and the standard deviation is σ_ξ = \sqrt{µ_2}. The skewness γ_ξ is defined by

γ_ξ = E((ξ − a_1)^3)/σ_ξ^3 = µ_3/σ_ξ^3.
6. Moment generating function

The moment generating function M = M_ξ of ξ is a function R → R ∪ {+∞} which is determined by

M_ξ(s) = E(e^{sξ}).

The cumulant generating function c = c_ξ is a function R → R ∪ {+∞} determined by

c_ξ(s) = log M_ξ(s).
Both of the functions are always defined. The following results hold.

a) If the moment generating functions of two random variables coincide and are finite in a non-empty open subset of R then the distributions of the random variables coincide.

b) Let ξ and η be independent random variables, that is,

P(ξ ∈ A, η ∈ B) = P(ξ ∈ A)P(η ∈ B)

for every A, B ∈ B, denoted by ξ ⊥⊥ η. Then
M_{ξ+η}(s) = M_ξ(s)M_η(s) and c_{ξ+η}(s) = c_ξ(s) + c_η(s)

for every s ∈ R.

c) The moment generating function has derivatives of all orders in the interior of its domain. If s is in that interior then the nth derivative M_ξ^{(n)}(s) is

M_ξ^{(n)}(s) = E(ξ^n e^{sξ}).
In particular, if M_ξ is finite in a neighbourhood of the origin then

M_ξ^{(n)}(0) = E(ξ^n)

for every n ∈ N. Furthermore,

c_ξ'(0) = E(ξ) and c_ξ^{(n)}(0) = E((ξ − a_1)^n) = µ_n, n = 2, 3.
If P(ξ ≥ 0) = 1 then always

lim_{s→0−} M_ξ^{(n)}(s) = E(ξ^n).

7. Conditional expectation

Let ξ and η be random variables. Assume that E(ξ) exists and is finite. Let σ(η) be the sigma-algebra generated by η, that is, σ(η) is the smallest sub-sigma-algebra of S with respect to which η is measurable. The conditional expectation of ξ with respect to η is the random variable E(ξ | η) which satisfies
(i) E(ξ | η) is σ(η)-measurable
(ii) E{E(ξ | η)1(η ∈ B)} = E(ξ1(η ∈ B)) for every B ∈ B.

In (ii), 1 is the indicator function, that is, 1(η ∈ B)(ω) = 1 when η(ω) ∈ B and 0 otherwise. It can be shown that E(ξ | η) exists and is unique a.s. (almost surely). Furthermore, there exists a measurable map h : R → R such that E(ξ | η) = h(η). Intuitively, h(y) represents the mean of ξ given that η = y. We often write E(ξ | η = y) = h(y). The conditional expectation has the following properties (assuming that the required expectations exist).
a) E(aξ_1 + bξ_2 | η) = aE(ξ_1 | η) + bE(ξ_2 | η) for every a, b ∈ R
b) E{E(ξ | η)} = E(ξ)
c) If f : R → R is measurable then E(f(η)ξ | η) = f(η)E(ξ | η)
d) If ξ ⊥⊥ η then E(ξ | η) = E(ξ)
e) If ξ_1 ≤ ξ_2 then E(ξ_1 | η) ≤ E(ξ_2 | η).

All these results hold a.s. For a given B ∈ B, the conditional probability of B with respect to η is defined by

P(ξ ∈ B | η) = E(1(ξ ∈ B) | η).
Write also P(ξ ∈ B | η = y) = k(y) if P(ξ ∈ B | η) = k(η). Let F_η be the distribution function of η. There exists a family of distribution functions {F_{ξ|η}(· | y) | y ∈ R}, the so-called regular conditional distribution of ξ with respect to η, such that

(i) F_{ξ|η}(· | y) is a distribution function for every y ∈ R
(ii) F_{ξ|η}(x | ·) is measurable for every x ∈ R
(iii) P(ξ ≤ x, η ≤ y) = \int_{u ≤ y} F_{ξ|η}(x | u)\,dF_η(u) for every x, y ∈ R.

If h : R → R is measurable and h(ξ) ∈ L¹ then

E(h(ξ) | η = y) = \int_{-∞}^{∞} h(x)\,dF_{ξ|η}(x | y).
In simple cases, the conditional expectation and probability can be determined in an elementary way. For example, let ξ and η both be discrete. Assume that ξ is concentrated on {x_1, x_2, ...} and η on {y_1, y_2, ...}. Then

P(ξ = x_i | η = y_j) = P(ξ = x_i, η = y_j)/P(η = y_j),

F_{ξ|η}(x | y_j) = P(ξ ≤ x | η = y_j) = P(ξ ≤ x, η = y_j)/P(η = y_j)

and

E(h(ξ) | η = y_j) = \sum_{i=1}^{∞} P(ξ = x_i | η = y_j)h(x_i)

for every i, j = 1, 2, ... and x ∈ R.

The conditional expectation can be defined with respect to a collection of random variables, or more generally, with respect to a sub-sigma-algebra of S in the following way. Let F be a sub-sigma-algebra of S. The random variable E(ξ | F) is the conditional expectation of ξ with respect to F if

(i) E(ξ | F) is F-measurable
(ii) E{E(ξ | F)1(A)} = E(ξ1(A)) for every A ∈ F.

The conditional expectation with respect to a random variable η is obtained by taking F = σ(η) where σ(η) = {η^{-1}(B) | B ∈ B}. Also the general conditional expectation exists and is unique a.s. if E(ξ) is finite. The properties a), b) and e) above still hold. In addition, if F is a sub-sigma-algebra of G then
E(ξ | F) = E(E(ξ | G) | F).

Property c) takes the form

c') If ζ is an F-measurable random variable then E(ζξ | F) = ζE(ξ | F).

If, in particular, η_1, ..., η_N are random variables and F = σ(η_1, ..., η_N) is the sigma-algebra generated by these variables then there exists a measurable map h : R^N → R such that

E(ξ | σ(η_1, ..., η_N)) = h(η_1, ..., η_N).

Here R^N is equipped with its Borel sets. The regular conditional distribution also exists in this case. Write in short

E(ξ | σ(η_1, ..., η_N)) = E(ξ | η_1, ..., η_N).
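To make these discrete formulas concrete, here is a small numerical sketch (the joint probability mass table is invented for illustration) that computes E(ξ | η = y_j) and checks property b):

```python
import numpy as np

# Hypothetical joint pmf of (xi, eta): rows index values of xi, columns values of eta.
x_vals = np.array([0.0, 1.0, 2.0])
y_vals = np.array([10.0, 20.0])
joint = np.array([[0.10, 0.20],
                  [0.30, 0.10],
                  [0.10, 0.20]])   # entries P(xi = x_i, eta = y_j), summing to 1

p_eta = joint.sum(axis=0)          # marginal P(eta = y_j)
cond = joint / p_eta               # P(xi = x_i | eta = y_j), columnwise
h = x_vals @ cond                  # E(xi | eta = y_j) for each j

for y, hy in zip(y_vals, h):
    print(f"E(xi | eta = {y}) = {hy:.4f}")

# Property b): E{E(xi | eta)} = E(xi)
print(np.dot(h, p_eta), np.dot(x_vals, joint.sum(axis=1)))
```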
8. Laws of large numbers

Let ξ_1, ξ_2, ... be independent and identically distributed (i.i.d.) random variables and let a_1 = E(ξ_1) exist and be finite. Then

P( lim_{n→∞} (ξ_1 + ··· + ξ_n)/n = a_1 ) = 1.

This result is known as the strong law of large numbers (SLLN). In addition, for every ε > 0,

lim_{n→∞} P( |(ξ_1 + ··· + ξ_n)/n − a_1| ≥ ε ) = 0.

This is known as the weak law of large numbers (WLLN).

9. Central limit theorem

Let ξ_1, ξ_2, ... be as in section 8. Assume in addition that σ^2 = Var(ξ_1) < ∞. Then

lim_{n→∞} P( (ξ_1 + ··· + ξ_n − na_1)/(σ\sqrt{n}) ≤ x ) = φ(x)

for every x ∈ R where φ is the distribution function of the standard normal variable,

φ(x) = \int_{-∞}^{x} \frac{1}{\sqrt{2π}} e^{−t^2/2}\,dt.
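As a quick illustration of items 8 and 9, the following sketch (with arbitrary parameter choices) simulates exponential samples: the sample means settle near a_1 and the standardized sums are approximately standard normal.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
a1, n, reps = 2.0, 1000, 20000          # mean of the Exp distribution, sample size, repetitions

samples = rng.exponential(scale=a1, size=(reps, n))
means = samples.mean(axis=1)            # SLLN: concentrated around a1
print("mean of sample means:", means.mean())

sigma = a1                              # Var of Exp with mean a1 is a1**2
z = (samples.sum(axis=1) - n * a1) / (sigma * np.sqrt(n))
print("P(Z <= 1) empirical:", (z <= 1.0).mean(), "vs normal:", norm.cdf(1.0))
```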
3 The number of claims
It is natural to describe the number of claims during a given time interval by means of a random variable whose possible values are non-negative integers. It is also useful to consider the accumulation of the claims as a stochastic process. For wider applied discussions, we refer to DPP, Section 2.

Let K(t) be the number of claims occurring during the time interval (0, t] in a given insurance portfolio (this means a fixed collection of insurance contracts). Let further K(t, u) be the number of claims occurring during the time interval (t, u] where 0 ≤ t < u. Thus K(t, u) = K(u) − K(t).

A random variable K is called a counting variable if P(K ∈ {0, 1, 2, ...}) = 1. A stochastic process {K(t) | t ≥ 0} is called a counting process if for each t ≥ 0, K(t) is a random variable on a fixed probability space (Ω, S, P) (which does not depend on t), and the following conditions are satisfied:

(i) K(0) = 0 a.s.
(ii) K(t) is a counting variable for each t ≥ 0
(iii) The realizations of the process are right continuous and have left limits. That is, the map f_ω : [0, ∞) → R, f_ω(t) = K(t)(ω), is right continuous and the limit lim_{h→0+} f_ω(t − h) exists for every ω ∈ Ω (f_ω is a càdlàg function)
(iv) P(K(t) − K(t−) = 0 or 1, ∀t > 0) = 1 where K(t−) = lim_{h→0+} K(t − h).

Condition (iii) is technical. What is essential is that K(t) is integer-valued and that the jump size is always 1 (conditions (ii) and (iv)). These properties are natural when the development of the numbers of claims in time is modelled.
[Figure: a realization of the counting process — a step function of time t, increasing by one at each claim occurrence.]
The jump times (the occurrence times of the claims) are random in the model. At each time point, at most one claim can occur. This property can be criticized: for example, in a collision of two cars, a claim occurs at the same time for both of the participants. The problem can be solved by interpreting the collision as one claim. In this way, the applicability of the model can be made better.
3.1 Poisson distribution and process

A random variable K has the Poisson distribution with the parameter λ ≥ 0 if

P(K = k) = e^{−λ} \frac{λ^k}{k!}, k = 0, 1, 2, ....
If λ = 0 then by convention, P(K = 0) = 1.

A stochastic process {K(t) | t ≥ 0} is a Poisson process with the intensity λ ≥ 0 if {K(t)} is a counting process and

a) K(t, u) has the Poisson distribution with the parameter λ(u − t) for every 0 ≤ t < u
b) for all time points 0 ≤ t_1 < u_1 ≤ t_2 < u_2 ≤ ··· ≤ t_n < u_n, the increments K(t_1, u_1), ..., K(t_n, u_n) are independent.

The Poisson process is perhaps the simplest model to describe the development of the numbers of the claims in time. More flexibility can be obtained by means of the non-homogeneous Poisson process. This is defined in the following way. Let Λ : [0, ∞) → [0, ∞) be an increasing function such that Λ(0) = 0. The process {K(t) | t ≥ 0} is the Poisson process with the intensity function Λ if {K(t)} is a counting process and

a') K(t, u) has the Poisson distribution with the parameter Λ(u) − Λ(t) for every 0 ≤ t < u
b') condition b) holds.

The usual Poisson process is obtained by choosing Λ(t) = λt for every t ≥ 0. Theoretical motivation for the use of the Poisson process in modelling will be given in the sequel. We begin by listing some basic properties of the Poisson distribution.

Theorem 3.1.1. Let K be a Poisson distributed random variable with the parameter λ. Then the moment generating function M_K, the expectation E(K), the variance Var(K), and the skewness γ_K have the forms
M_K(s) = e^{λ(e^s − 1)}, ∀s ∈ R, (3.1.1)

E(K) = Var(K) = λ (3.1.2)

and

γ_K = 1/\sqrt{λ}. (3.1.3)
Proof. The proofs of the claims concerning the moments are left to the reader. Let s ∈ R. Then

M_K(s) = \sum_{k=0}^{∞} P(K = k)e^{sk} = e^{−λ} \sum_{k=0}^{∞} e^{sk} \frac{λ^k}{k!} = e^{−λ} \sum_{k=0}^{∞} \frac{(λe^s)^k}{k!} = e^{−λ}e^{λe^s} = e^{λ(e^s − 1)}.
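A minimal numerical check of (3.1.2) and (3.1.3), using scipy's Poisson distribution with an illustrative parameter value:

```python
import numpy as np
from scipy.stats import poisson

lam = 4.0
mean, var, skew = poisson.stats(lam, moments='mvs')
print(mean, var, skew)                  # lam, lam, 1/sqrt(lam)
print(lam, lam, 1 / np.sqrt(lam))
```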
The Poisson distribution is a limit of binomial distributions in the following way.

Lemma 3.1.1. Let ξ_n have the Bin(n, p_n) distribution,

P(ξ_n = k) = \binom{n}{k} p_n^k (1 − p_n)^{n−k}, k = 0, 1, 2, ..., n.

Assume that lim_{n→∞} np_n = λ. Then

lim_{n→∞} P(ξ_n = k) = e^{−λ} \frac{λ^k}{k!}, k = 0, 1, 2, ....
Proof. Clearly,

P(ξ_n = k) = \frac{n(n − 1) ··· (n − k + 1)}{k!} \left( \frac{λ + o(1)}{n} \right)^k \left( 1 − \frac{λ + o(1)}{n} \right)^{n−k},

where o(1) → 0 as n → ∞. This proves the lemma because lim_{n→∞} (1 − (λ + o(1))/n)^n = e^{−λ}.
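The convergence in Lemma 3.1.1 can also be seen numerically; the sketch below compares the Bin(n, λ/n) probabilities with the Poisson limit (parameter values are illustrative):

```python
import numpy as np
from scipy.stats import binom, poisson

lam, k = 3.0, np.arange(8)
for n in (10, 100, 1000):
    err = np.max(np.abs(binom.pmf(k, n, lam / n) - poisson.pmf(k, lam)))
    print(f"n = {n:5d}: max |Bin - Poisson| over k < 8 = {err:.5f}")
```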
Consider next counting processes.

Theorem 3.1.2. Let {K(t)} be a counting process which satisfies

(i) independence of the increments: for every 0 ≤ t_1 < u_1 ≤ t_2 < u_2 ≤ ··· ≤ t_n < u_n, the increments K(t_1, u_1), ..., K(t_n, u_n) are independent
(ii) stationarity of the increments: for every t, r ≥ 0, K(r) and K(t + r) − K(t) are equally distributed.

Then there exists λ ≥ 0 such that {K(t) | t ≥ 0} is the Poisson process with the intensity λ.

The conditions of Theorem 3.1.2 are useful when the Poisson process is considered as a candidate for modelling. The following stronger result is more convenient in the sequel.

Theorem 3.1.3. Let {K(t)} be a counting process which satisfies condition (i) of Theorem 3.1.2 and the condition

(ii)' P(K(r) = 0) = P(K(t + r) − K(t) = 0) for every t, r ≥ 0.

Then {K(t) | t ≥ 0} is the Poisson process with the intensity λ = − log P(K(1) = 0). Write in short
p_k(t) = P(K(t) = k) for t ≥ 0, k = 0, 1, 2, ..., (3.1.4)

when the counting process {K(t)} in question is clear from the context.

Proof of Theorem 3.1.3. Let r, t ≥ 0. By (i) and (ii)',
p_0(t + r) = P(K(t + r) = 0) = P(K(t) = 0, K(t, t + r) = 0) = P(K(t) = 0)P(K(t, t + r) = 0) = p_0(t)p_0(r).

Let m, n ∈ N. Then

p_0(m/n) = [p_0(1/n)]^m = [p_0(1)^{1/n}]^m = [p_0(1)]^{m/n}.
Now p_0(t) is decreasing in t so that

p_0(t) = [p_0(1)]^t

for every t ≥ 0. If p_0(1) = 1 then P(K(t) = 0) = 1 for every t ≥ 0. Thus {K(t)} is a Poisson process (the intensity is λ = 0). Assume henceforth that p_0(1) < 1. Suppose for a while that p_0(1) = 0. For any 0 ≤ a < b ≤ 1, we then have

P(K(a, b) ≥ 1) = P(K(b) − K(a) ≥ 1) = 1 − P(K(b) − K(a) = 0) = 1 − p_0(b − a) = 1 − [p_0(1)]^{b−a} = 1.

This implies that P(K(1) < ∞) = 0, a contradiction.
By the above observations, we can assume in the sequel that p_0(1) ∈ (0, 1). Write

λ = − log(p_0(1))

so that p_0(t) = e^{−λt} for all t ≥ 0. Let now t > 0 and let k be an arbitrary non-negative integer. Consider the probability p_k(t). Let n ∈ N and n > k. The intervals

I_ν^n = \left( \frac{ν − 1}{n} t, \frac{ν}{n} t \right], ν = 1, ..., n,

constitute a partition of (0, t], namely, they are pairwise disjoint and (0, t] = I_1^n ∪ ··· ∪ I_n^n. Let A_k^n be the collection of the realizations of {K(t)} which have jumps in exactly k of the intervals I_ν^n,

A_k^n = \bigcup_{1 ≤ ν_1 < ··· < ν_k ≤ n} \left[ \bigcap_{i=1}^{k} \left\{ K\left( \frac{ν_i − 1}{n} t, \frac{ν_i}{n} t \right) > 0 \right\} ∩ \bigcap_{ν ∉ \{ν_1, ..., ν_k\}} \left\{ K\left( \frac{ν − 1}{n} t, \frac{ν}{n} t \right) = 0 \right\} \right].
Let further B^n be the collection of the realizations which have at least two jumps in some interval I_ν^n,

B^n = \bigcup_{ν=1}^{n} \left\{ K\left( \frac{ν − 1}{n} t, \frac{ν}{n} t \right) ≥ 2 \right\}.

By (i) and (ii)', P(A_k^n) is binomial,

P(A_k^n) = P(ξ_n = k),

where ξ_n has the Bin(n, 1 − e^{−λt/n}) distribution. By Lemma 3.1.1,

lim_{n→∞} P(A_k^n) = e^{−λt} \frac{(λt)^k}{k!}. (3.1.5)

Consider now the set B^n as n → ∞. Every counting process has a finite number of jumps in finite time intervals and the jump sizes all equal 1. Because the realizations are right continuous, any such realization lies outside B^n for large n. By the dominated convergence,

lim_{n→∞} P(B^n) = 0. (3.1.6)

Clearly,

A_k^n \setminus B^n ⊆ {K(t) = k} ⊆ A_k^n ∪ B^n

so that

P(A_k^n) − P(B^n) ≤ p_k(t) ≤ P(A_k^n) + P(B^n).

By (3.1.5) and (3.1.6),

p_k(t) = lim_{n→∞} P(A_k^n) = e^{−λt} \frac{(λt)^k}{k!}.

Let now 0 ≤ t < u. We have to show that K(u) − K(t) has the Poisson distribution with the parameter λ(u − t). Let s ∈ R. By (i),
M_{K(u)}(s) = M_{K(t)+K(u)−K(t)}(s) = M_{K(t)}(s)M_{K(u)−K(t)}(s).

By Theorem 3.1.1 and by the first part of the proof,

M_{K(u)−K(t)}(s) = \frac{M_{K(u)}(s)}{M_{K(t)}(s)} = e^{λu(e^s − 1)} e^{−λt(e^s − 1)} = e^{λ(u−t)(e^s − 1)}.

Thus K(u) − K(t) has the Poisson distribution with the parameter λ(u − t).

The next result gives an alternative way to define the Poisson process. We skip the proof.
Theorem 3.1.3.1. Let ξ, ξ_1, ξ_2, ... be independent exponentially distributed random variables with the parameter λ > 0. Thus

P(ξ ≤ x) = 1 − e^{−λx}, x ≥ 0.

Define the stochastic process {K(t) | t ≥ 0} by

K(t) = \sup\{k \mid ξ_1 + ··· + ξ_k ≤ t\} if ξ_1 ≤ t, and K(t) = 0 if ξ_1 > t.
Then {K(t) | t ≥ 0} is a Poisson process with the intensity λ. Conversely, if {K(t) | t ≥ 0} is a Poisson process with the intensity λ > 0 and

T_k = \inf\{t \mid K(t) ≥ k\}, k = 1, 2, ...,

then T_1, T_2 − T_1, T_3 − T_2, ... are independent exponentially distributed random variables with the parameter λ.
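Theorem 3.1.3.1 gives a direct simulation recipe for the Poisson process: cumulate i.i.d. exponential interarrival times. A minimal sketch, with an illustrative intensity:

```python
import numpy as np

rng = np.random.default_rng(7)

def poisson_process_jump_times(lam: float, horizon: float) -> np.ndarray:
    """Jump times of a Poisson process on (0, horizon], via Theorem 3.1.3.1:
    partial sums of i.i.d. Exp(lam) interarrival times."""
    times = []
    t = rng.exponential(1.0 / lam)
    while t <= horizon:
        times.append(t)
        t += rng.exponential(1.0 / lam)
    return np.array(times)

# K(t) is the number of jump times up to t; check E(K(1)) ≈ lam.
lam = 2.5
counts = [len(poisson_process_jump_times(lam, 1.0)) for _ in range(10000)]
print(np.mean(counts), "≈", lam)
```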
We end the section with a generalization of Theorem 3.1.2 in which, in essence, the stationarity assumption (ii) is dropped.

Theorem 3.1.4. Let {K(t)} be a counting process and let

p_k(t) = P(K(t) = k), k = 0, 1, 2, ....

Assume that {K(t)} satisfies condition (i) of Theorem 3.1.2. Suppose that p_0(t) ∈ (0, 1] for every t ≥ 0 and that p_0 : [0, ∞) → (0, 1] is continuous. Then {K(t)} is a Poisson process with the intensity function Λ where

Λ(t) = − log p_0(t) (3.1.7)

for every t ≥ 0.
We note that if p_0 is not continuous at t_0 then, from the applied point of view, a claim occurs at the time t_0 with a positive probability. This is not very natural in non-life insurance.

Proof of Theorem 3.1.4. We only give the proof under the simplified assumption that p_0 is strictly decreasing in [0, ∞) and lim_{t→∞} p_0(t) = 0. Let
p_0^{−1} : (0, 1] → [0, ∞)

be the inverse of p_0. Let µ > 0 be fixed and

τ(t) = p_0^{−1}(e^{−µt}), t ∈ [0, ∞).
Then τ is continuous and strictly increasing. Define the counting process {K∗(t)} by
K∗(t) = K(τ(t)), t ∈ [0, ∞).
Obviously, {K*(t)} satisfies condition (i) of Theorem 3.1.2. We show that also condition (ii)' of Theorem 3.1.3 is satisfied. Now p_0(τ(t)) = e^{−µt} so that

P(K*(t) = 0) = P(K(τ(t)) = 0) = e^{−µt}.

For arbitrary t, r ≥ 0,
P(K*(t + r) = 0) = P(K*(t) = 0) P(K*(t + r) − K*(t) = 0),

so that

P(K*(t + r) − K*(t) = 0) = \frac{P(K*(t + r) = 0)}{P(K*(t) = 0)} = e^{−µr} = P(K*(r) = 0).

This is (ii)'. By Theorem 3.1.3, {K*(t)} is a Poisson process with the intensity

− log P(K*(1) = 0) = µ.
For given 0 ≤ t_1 < t_2,

K(t_2) − K(t_1) = K*(τ^{−1}(t_2)) − K*(τ^{−1}(t_1)).

Thus K(t_2) − K(t_1) has the Poisson distribution with the parameter

µ(τ^{−1}(t_2) − τ^{−1}(t_1)) = − log p_0(t_2) + log p_0(t_1) = Λ(t_2) − Λ(t_1).
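The time change in the proof also suggests a simulation recipe for a non-homogeneous Poisson process: map the arrival times of a unit-rate Poisson process through Λ^{−1}. A sketch with the illustrative choice Λ(t) = t²:

```python
import numpy as np

rng = np.random.default_rng(3)

def nonhomogeneous_jump_times(horizon: float) -> np.ndarray:
    """Jump times on (0, horizon] of a Poisson process with Lambda(t) = t**2,
    obtained by mapping unit-rate arrival times s through Lambda^{-1}(s) = sqrt(s)."""
    s, times = 0.0, []
    while True:
        s += rng.exponential(1.0)            # unit-rate arrivals
        t = np.sqrt(s)                       # Lambda^{-1}
        if t > horizon:
            return np.array(times)
        times.append(t)

# K(2) should be Poisson with parameter Lambda(2) = 4.
counts = [len(nonhomogeneous_jump_times(2.0)) for _ in range(10000)]
print(np.mean(counts), np.var(counts))       # both ≈ 4
```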
When the applicability of the Poisson process as the model for the claim numbers is considered, one has to think about the independence and stationarity of the increments. Theorem 3.1.4 shows that the stationarity is not very problematic. The assumption of independence of the increments is more problematic: in practice, it can be difficult to justify this property. For example, economic cycles may cause consecutive bad or good years. In the next section, we describe a modification of the Poisson process which allows dependence between the increments.
3.2 Mixed Poisson variable

Consider the number of claims during a fixed time interval, for example, during one year. The essential feature of the following model is that it is able to describe more variation in the number of claims than the ordinary Poisson distribution.

Let Q be a non-negative random variable and let λ > 0 be a constant. Assume that E(Q) = 1. The counting variable K is called a mixed Poisson variable with the parameter (λ, Q) if

F_{K|Q}(k | q) = P(K ≤ k | Q = q) = \sum_{h=0}^{k} e^{−λq} \frac{(λq)^h}{h!}

for every q ≥ 0 and k = 0, 1, 2, .... The variable Q is called the mixing variable or the structure variable. Heuristically, we can think that at the beginning of the year, the value of the structure variable is drawn from the distribution of Q. If it is q then the Poisson parameter in force during the year is λq. As an applied example, the variations obtained in this way may correspond to the number of slippery days in car insurance. Let H be the distribution function of Q,
H(q) = P(Q ≤ q), q ∈ R.

The probability mass function of K is determined by

P(K = k) = \int_0^{∞} P(K = k | Q = q)\,dH(q) = \int_0^{∞} e^{−λq} \frac{(λq)^k}{k!}\,dH(q), (3.2.1)

k = 0, 1, 2, .... The moment generating function at the point s is

M_K(s) = E{E{e^{sK} | Q}} = E(e^{λQ(e^s − 1)}) = \int_0^{∞} e^{λq(e^s − 1)}\,dH(q). (3.2.2)

Hence,

M_K(s) = M_Q(λ(e^s − 1)) = M_Q(c(s)), (3.2.3)

where c is the cumulant generating function of the Poisson distribution with the parameter λ.

Theorem 3.2.1. Let K be a mixed Poisson variable with the parameter (λ, Q). Assume that M_Q is finite in a neighbourhood of the origin. Then
E(K) = λ,

σ_K^2 = λ + λ^2 σ_Q^2

and

γ_K = (λ + 3λ^2 σ_Q^2 + λ^3 γ_Q σ_Q^3)/σ_K^3.
The proof is left to the reader.
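A quick Monte Carlo check of the first two formulas of Theorem 3.2.1, using a gamma-distributed Q with mean 1 (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
lam, r = 5.0, 2.0                               # Poisson level; gamma shape
n = 200000

q = rng.gamma(shape=r, scale=1.0 / r, size=n)   # E(Q) = 1, Var(Q) = 1/r
k = rng.poisson(lam * q)                        # mixed Poisson draws

print("mean:", k.mean(), "theory:", lam)
print("var :", k.var(), "theory:", lam + lam**2 / r)
```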
It is often useful to consider the numbers of claims separately in appropriate sub-portfolios of the company. Then the question arises what happens if the sub-portfolios are considered together. The following result focusses on this question.

Let K_i be a mixed Poisson variable with the parameter (λ_i, Q_i), i = 1, 2. Dependence between the sub-portfolios will be allowed via the structure variables. We will assume that

P(K_1 = k_1, K_2 = k_2 | Q_1, Q_2) = P(K_1 = k_1 | Q_1) P(K_2 = k_2 | Q_2) (3.2.4)

for every k_1, k_2 = 0, 1, 2, .... Let H be the joint distribution function of Q_1 and Q_2. Then for all Borel sets B_1 and B_2,

P(K_1 = k_1, K_2 = k_2, Q_1 ∈ B_1, Q_2 ∈ B_2) = \int_{B_1 × B_2} e^{−λ_1 q_1} \frac{(λ_1 q_1)^{k_1}}{k_1!} e^{−λ_2 q_2} \frac{(λ_2 q_2)^{k_2}}{k_2!}\,dH(q_1, q_2).
Theorem 3.2.2. Let K_i be a mixed Poisson variable with the parameter (λ_i, Q_i) for i = 1, 2, and let K = K_1 + K_2. Assume that (3.2.4) holds for every k_1, k_2 = 0, 1, 2, .... Then K is a mixed Poisson variable with the parameter (λ_1 + λ_2, Q) where

Q = \frac{λ_1 Q_1 + λ_2 Q_2}{λ_1 + λ_2}.

Proof. By (3.2.4),
M_K(s) = E{E{e^{sK} | Q_1, Q_2}} = E(e^{λ_1 Q_1(e^s − 1)} e^{λ_2 Q_2(e^s − 1)}) = E(e^{(λ_1 + λ_2)Q(e^s − 1)}).
This and (3.2.3) prove the theorem.

It is quite common that the structure variable is not directly observable so that the estimation of the distribution is not easy. A practical approach is to find out by statistical methods the 'best one' from an appropriate family of distributions. In some situations, it is sufficient to know a couple of the lowest moments of Q, which helps the estimation problem. The gamma distribution is a popular approximation of the distribution of Q. The counting variable K will then have a generalized negative binomial distribution, which is also called the Pólya distribution. We study this model in the following example.

Example 3.2.1. Assume that Q has the gamma-(r, α) distribution where α and r are positive constants. The density of Q is

f(x) = \frac{α^r}{Γ(r)} e^{−αx} x^{r−1}

for x ≥ 0 where Γ is Euler's gamma function,

Γ(r) = \int_0^{∞} e^{−u} u^{r−1}\,du.

We assume that E(Q) = 1 which means that we have to take α = r. The distribution function of Q is then

H(q) = \int_0^{q} \frac{r^r}{Γ(r)} e^{−rx} x^{r−1}\,dx = \frac{1}{Γ(r)} \int_0^{rq} e^{−x} x^{r−1}\,dx

for q ≥ 0. We will show that
P(K = k) = \frac{Γ(r + k)}{Γ(r)Γ(k + 1)} \left( \frac{r}{r + λ} \right)^r \left( \frac{λ}{r + λ} \right)^k (3.2.5)

for k = 0, 1, 2, .... In the particular case where r ∈ N, write p = r/(r + λ) to see that

P(K = k) = \binom{r + k − 1}{k} p^r (1 − p)^k.

This is known as the negative binomial distribution. The proof of (3.2.5) is straightforward by using representation (3.2.1):
P(K = k) = \int_0^{∞} e^{−λq} \frac{(λq)^k}{k!}\,dH(q) = \int_0^{∞} e^{−λq} \frac{(λq)^k}{k!} \frac{r^r}{Γ(r)} e^{−rq} q^{r−1}\,dq = \frac{r^r λ^k}{k!\,Γ(r)} \int_0^{∞} e^{−(r+λ)q} q^{r+k−1}\,dq.

The last integrand is, up to a multiplicative constant, the density of the gamma-(r + k, r + λ) distribution. By this fact, and because Γ(k + 1) = k!, we conclude that

P(K = k) = \frac{r^r λ^k}{k!\,Γ(r)} \frac{Γ(r + k)}{(r + λ)^{r+k}} = \frac{Γ(r + k)}{Γ(r)Γ(k + 1)} \left( \frac{r}{r + λ} \right)^r \left( \frac{λ}{r + λ} \right)^k.

This is (3.2.5).
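Formula (3.2.5) is the negative binomial pmf with n = r and p = r/(r + λ) in scipy's parametrization; a minimal check against a simulated gamma-mixed Poisson sample (illustrative parameters):

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(5)
lam, r, n = 4.0, 2.0, 200000

# Gamma-(r, r) mixing variable with mean 1, then Poisson(lam * q).
k = rng.poisson(lam * rng.gamma(shape=r, scale=1.0 / r, size=n))
emp = np.bincount(k, minlength=10)[:10] / n

theory = nbinom.pmf(np.arange(10), r, r / (r + lam))   # (3.2.5)
print(np.round(emp, 4))
print(np.round(theory, 4))
```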
3.3 The number of claims of a single policy-holder

In applications, it is often necessary to consider the claims process of a single policy-holder. A good example is the pricing of insurance contracts (which is a topic of tariff theory). The mixed Poisson variable is also useful here.

Consider a fixed portfolio. Poisson processes may also be reasonable models for the numbers of claims of the individual policy-holders in the portfolio (and the Poisson distributions during a fixed time interval). We make this assumption. It is not natural to assume that the Poisson parameter is equal for every policy-holder. For example in car insurance, differences appear because the driving skills of the policy-holders are not equal. We model the portfolio in the following way.
Policy-holder       1      2      ...    N
Poisson parameter   λq_1   λq_2   ...    λq_N
Here λ describes the average expectation of the number of claims in the portfolio. The coefficient q_i describes the expectation of the policy-holder i relative to this average. We thus assume that (q_1 + ··· + q_N)/N = 1.

Choose a policy-holder from the portfolio at random. Hence, if L denotes this policy-holder then P(L = i) = 1/N for every i. By the law of total probability,
P(K = k) = \sum_{i=1}^{N} P(L = i) P(K = k | L = i) = \sum_{i=1}^{N} \frac{1}{N} e^{−λq_i} \frac{(λq_i)^k}{k!}.

Take a random variable Q such that
H(q) := P(Q ≤ q) = \frac{\#\{i \mid q_i ≤ q\}}{N}

for every q ∈ R. Then

P(K = k) = \int_0^{∞} e^{−λq} \frac{(λq)^k}{k!}\,dH(q),

k = 0, 1, 2, .... It is seen that K is a mixed Poisson variable. In this application, Q describes the heterogeneity of the portfolio. In the above consideration, the q-coefficients were taken as known. In real life, they are not completely known but have to be estimated. A small numerical illustration is sketched below.
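A minimal sketch of the above mixture for a hypothetical three-policy portfolio; the q-coefficients are invented and average to one:

```python
import numpy as np
from scipy.stats import poisson

lam = 0.2                                   # portfolio-average claim frequency
q = np.array([0.5, 1.0, 1.5])               # relative risk levels, mean 1
k = np.arange(4)

# P(K = k) for a randomly chosen policy-holder: average of Poisson(lam * q_i) pmfs.
pmf = poisson.pmf(k[:, None], lam * q).mean(axis=1)
print(np.round(pmf, 5))
```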
3.4 Mixed Poisson process

Mixed Poisson distributions are also useful in continuous time considerations. In particular, the increments of the resulting process will no longer be independent. We take a Poisson process as the starting point, but let the intensity be random. Let Q_1, Q_2, ... be the yearly mixing variables. We assume that they are i.i.d. with the common distribution function H. Let λ > 0 be fixed. It will describe the mean of the number of claims in each year.
Let Q be a generic variable which has the same distribution as Q1, and let N be a positive integer. We consider claims during the time interval [0,N]. Under the above notations and assumptions, we call the counting process
{K(t) | t ∈ [0,N]} a mixed Poisson process with the parameter (λ, Q) if conditionally, given that
Q1 = q1,...,QN = qN , the process {K(t) | t ∈ [0,N]} is a Poisson process with the intensity function Λ where
Λ(t) = \sum_{n=1}^{⌊t⌋} λq_n + (t − ⌊t⌋)λq_{⌊t⌋+1}, t ∈ [0, N], (3.4.1)

and where ⌊t⌋ is the integer part of t. The conditional intensity Λ increases in (3.4.1) linearly inside the years. The rate of increase is allowed to vary from year to year. By considering more complicated intensity functions, it would be possible to obtain more complicated counting processes. We will only consider models where (3.4.1) holds.

Let us illustrate the finite dimensional distributions of the process by means of an example. Let k_1 ≤ k_2 ≤ k_3 be non-negative integers and t_1, t_2 ∈ (0, 1) and t_3 ∈ (1, 2). Then

P(K(t_1) = k_1, K(t_2) = k_2, K(t_3) = k_3)
= P(K(t_1) = k_1, K(t_2) − K(t_1) = k_2 − k_1, K(t_3) − K(t_2) = k_3 − k_2)

= \int_{q_1=0}^{∞} \int_{q_2=0}^{∞} e^{−λq_1 t_1} \frac{(λq_1 t_1)^{k_1}}{k_1!} e^{−λq_1(t_2−t_1)} \frac{(λq_1(t_2 − t_1))^{k_2−k_1}}{(k_2 − k_1)!} · e^{−λ(q_1(1−t_2)+q_2(t_3−1))} \frac{(λ(q_1(1 − t_2) + q_2(t_3 − 1)))^{k_3−k_2}}{(k_3 − k_2)!}\,dH(q_1)\,dH(q_2).

The finite dimensional distributions in turn determine the whole process (at least if the minimal possible sigma-algebra is taken in the definition of the process). It is easy to see that the increments of the process in different years are independent (for example, K(1) − K(1/2) and K(5/3) − K(4/3) are independent). However, there is dependence inside the years.

We end the section by showing that the increments of a mixed Poisson process {K(t)} are independent in the sense of condition (i) of Theorem 3.1.2 if and only if {K(t)} is a Poisson process.
Let 0 ≤ t < u ≤ 1. Obviously, K(u) − K(t) is a mixed Poisson variable with the parameter (λ(u − t), Q). By Theorem 3.2.1,

Var(K(u) − K(t)) = λ(u − t) + λ^2(u − t)^2 Var(Q).

In particular,

Var(K(1)) = λ + λ^2 Var(Q),

Var(K(1/2)) = \frac{1}{2}λ + \frac{1}{4}λ^2 Var(Q)

and

Var(K(1) − K(1/2)) = \frac{1}{2}λ + \frac{1}{4}λ^2 Var(Q).

If the increments are independent then
Var(K(1)) = Var(K(1/2)) + Var(K(1) − K(1/2)).
This is true only if Var(Q) = 0. This means that P(Q = 1) = 1 so that {K(t)} is a Poisson process.
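The dependence inside a year can also be seen numerically. The sketch below simulates the two half-year increments of a mixed Poisson year; given Q = q they are independent Poisson(λq/2) variables, and their covariance (λ/2)²Var(Q) is positive whenever Var(Q) > 0. Parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(13)
lam, r, n = 6.0, 2.0, 100000

q = rng.gamma(shape=r, scale=1.0 / r, size=n)   # yearly mixing variable, mean 1, Var 1/r
k_first = rng.poisson(0.5 * lam * q)            # K(1/2)
k_second = rng.poisson(0.5 * lam * q)           # K(1) - K(1/2), driven by the same q

cov = np.cov(k_first, k_second)[0, 1]
print("covariance:", cov, "theory:", (lam / 2) ** 2 * (1.0 / r))
```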
4 Total claim amount
Every occurrence of a claim means that the company has to pay a compensation which basically covers the economic loss of the policy-holder in question. It is natural to model the compensation as a non-negative random variable. We also call it the claim size. We will study the sum of these random variables in a year in a given portfolio. The sum is called the total claim amount (in the year). For wider applied discussions, we refer to DPP, Section 3.
Denote by Zi the size of the ith claim. If the number of claims occurred in the year is K then the total claim amount X has the form
X = Z1 + ··· + ZK . (4.1)
Both the number of claims and the claim sizes are random. The understanding and estimation of the total claim amount is a central topic of risk theory. Let S be a distribution function. The random variable in (4.1) is called a compound variable with the parameter (K, S) if
K, Z_1, Z_2, ... are independent (4.2)

and

the distribution of each Z_1, Z_2, ... is S. (4.3)

We will assume throughout the section that X is a compound variable even if conditions (4.2) and (4.3) are usually only approximately satisfied. For example, inflation may change the distribution of the claim sizes. We will also assume that P(K > 0) > 0. Let Z be a generic variable which has the distribution function S. Thus
P(Z ≤ z) = P(Z_i ≤ z) = S(z)

for every z ∈ R and i = 1, 2, .... Assume in the sequel that S(0−) = P(Z < 0) = 0. Thus we do not allow negative claim sizes. Then also X is non-negative. Write p_k = P(K = k) for k = 0, 1, 2, ....

Let X be a compound variable with the parameter (K, S). If K has the Poisson distribution with the parameter λ then we call X a compound Poisson variable with the parameter (λ, S). Similarly, if K has the mixed Poisson distribution with the parameter (λ, Q) then we call X a compound mixed Poisson variable with the parameter (λ, Q, S).

When the compound variable X is analysed, it is convenient to study separately the number of claims and the claim sizes. To clarify this, let us consider the estimation of the distribution function of X from the data. A natural starting point is to assume that all the past observations come from the same distribution. Then there are not many useful observations of X itself because the environment changes all the time. In contrast to this, there are typically a lot of observations of the claim sizes. The estimation of the distribution of the number of claims is usually easier in other respects, too. For example, if we agree that it has a Poisson distribution then only one parameter has to be estimated. In summary, the model for the structure of X is of high value in the estimation problem. Let F be the distribution function of X. By the properties of the compound variable,
F(x) = P(X ≤ x) = \sum_{k=0}^{∞} p_k S^{k*}(x) (4.4)

where S^{k*} is the kth convolution of S,

S^{0*}(x) = 0 if x < 0, and S^{0*}(x) = 1 if x ≥ 0,

S^{k*}(x) = \int_{-∞}^{∞} S^{(k−1)*}(x − y)\,dS(y), k = 1, 2, ....
In principle, it is possible to determine F by this equation if the probabilities p_k and the distribution function S are known. This can be very time consuming if the mean of the number of claims is large. Observe also that

P(X ≤ x, K = k) = S^{k*}(x) p_k

so that

F_{X|K}(x | k) = S^{k*}(x), x ∈ R, k = 0, 1, 2, ....

In the moment generating function, the number of claims and the claim sizes are separated in the following way.

Theorem 4.1. Let X be a compound variable with the parameter (K, S) and let Z have the distribution S. Let M_K be the moment generating function of K and c_Z the cumulant generating function of Z. Then the moment generating function of X, denoted by M_X, is determined by

M_X(s) = M_K(c_Z(s)), s ∈ R,

where by convention, M_X(s) = ∞ if c_Z(s) = ∞.

Proof. By (4.2) and (4.3),
M_X(s) = \sum_{k=0}^{∞} E(e^{sX} 1(K = k)) = \sum_{k=0}^{∞} p_k E(e^{s(Z_1+···+Z_k)}) = \sum_{k=0}^{∞} p_k M_Z(s)^k = \sum_{k=0}^{∞} p_k e^{k c_Z(s)} = M_K(c_Z(s)).
If, in particular, X is a compound Poisson variable with the parameter (λ, S) then

M_X(s) = e^{λ(M_Z(s) − 1)}, (4.5)

and if X is a compound mixed Poisson variable with the parameter (λ, Q, S) then

M_X(s) = M_Q(λ(M_Z(s) − 1)). (4.6)

The lowest moments of X can often be derived by means of the moment generating function. Let a_i be the ith origin moment of Z,

a_i = E(Z^i) = \int_0^{∞} z^i\,dS(z). (4.7)

Theorem 4.2. Let X be a compound mixed Poisson variable with the parameter (λ, Q, S). Then the mean, the variance, and the skewness of X are
µ_X = E(X) = λa_1,

σ_X^2 = Var(X) = λa_2 + λ^2 a_1^2 σ_Q^2

and

γ_X = [λa_3 + 3λ^2 a_1 a_2 σ_Q^2 + λ^3 a_1^3 γ_Q σ_Q^3]/σ_X^3

under the assumption that the right hand side is well defined. The proof is left to the reader. If X is a compound Poisson variable with the parameter (λ, S) then
µ_X = λa_1, (4.8)

σ_X^2 = λa_2, (4.9)

µ_3 = E((X − µ_X)^3) = λa_3 (4.10)

and

γ_X = µ_3/σ_X^3 = \frac{a_3}{a_2^{3/2} \sqrt{λ}}. (4.11)
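A Monte Carlo check of (4.8) and (4.9) with exponentially distributed claim sizes, for which a_1 = m and a_2 = 2m² when the mean claim size is m (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(17)
lam, m, n = 10.0, 2.0, 100000            # claim frequency, mean claim size, sample size

k = rng.poisson(lam, size=n)
# Total claim amount: sum of k exponential claim sizes for each simulated year.
x = np.array([rng.exponential(m, size=kk).sum() for kk in k])

a1, a2 = m, 2 * m**2                     # first two moments of Exp with mean m
print("mean:", x.mean(), "theory:", lam * a1)
print("var :", x.var(), "theory:", lam * a2)
```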
We end the section by proving an additivity property of the compound Poisson distribution. By taking Z ≡ 1 in the following result, it is seen that the sum of two independent Poisson variables is also a Poisson variable.

Theorem 4.3. Let X_i be a compound Poisson variable with the parameter (λ_i, S_i) for i = 1, 2. Assume that X_1 and X_2 are independent. Then X = X_1 + X_2 is a compound Poisson variable with the parameter (λ_1 + λ_2, S) where

S(z) = \frac{λ_1 S_1(z) + λ_2 S_2(z)}{λ_1 + λ_2}, z ∈ R. (4.12)
Proof. Let M_i be the moment generating function of the claim size associated with X_i,

M_i(s) = \int_{-∞}^{∞} e^{sz}\,dS_i(z), s ∈ R.

By the independence and by Theorem 4.1,

M_X(s) = E(e^{sX}) = E(e^{sX_1}) E(e^{sX_2}) = e^{λ_1(M_1(s) − 1)} e^{λ_2(M_2(s) − 1)}.

Thus

M_X(s) = e^{(λ_1 + λ_2)\left( \frac{λ_1}{λ_1 + λ_2} M_1(s) + \frac{λ_2}{λ_1 + λ_2} M_2(s) − 1 \right)}. (4.13)

A straightforward calculation shows that the moment generating function associated with the distribution function (4.12) is determined by the formula

\frac{λ_1}{λ_1 + λ_2} M_1(s) + \frac{λ_2}{λ_1 + λ_2} M_2(s), s ∈ R.
Thus (4.13) and Theorem 4.1 complete the proof.

It is also useful to consider the total claim amount as a stochastic process in continuous time. Let {K(t)} be a counting process and let Z_1, Z_2, ... be the claim sizes as earlier. Assume that Z_1, Z_2, ... are i.i.d. and that they are independent of {K(t)} in all respects. The total claim amount {X(t) | t ≥ 0} in continuous time is defined by

X(t) = Z_1 + ··· + Z_{K(t)}. (4.14)

If {K(t)} is a Poisson process with the intensity λ and the claim size distribution is S then {X(t)} is called a compound Poisson process with the parameter (λ, S). It is clear that the increments of every compound Poisson process are independent and stationary in the sense of Theorem 3.1.2 (this means that {X(t)} is a Lévy process). Similarly, if {K(t)} is a mixed Poisson process with the parameter (λ, Q) then {X(t)} is called a compound mixed Poisson process with the parameter (λ, Q, S).
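A path of a compound Poisson process can be simulated directly from definition (4.14): Poisson jump times combined with i.i.d. claim sizes. A minimal sketch with exponential claims and illustrative parameters:

```python
import numpy as np

rng = np.random.default_rng(19)

def compound_poisson_path(lam: float, mean_claim: float, horizon: float):
    """Jump times and running totals X(t) of a compound Poisson process on (0, horizon]."""
    t, x, times, totals = 0.0, 0.0, [], []
    while True:
        t += rng.exponential(1.0 / lam)      # next claim occurrence
        if t > horizon:
            return np.array(times), np.array(totals)
        x += rng.exponential(mean_claim)     # claim size Z_i
        times.append(t)
        totals.append(x)

times, totals = compound_poisson_path(lam=3.0, mean_claim=1.5, horizon=10.0)
print(len(times), "claims; X(10) =", totals[-1] if len(totals) else 0.0)
```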
5 Viewpoints on claim size distributions
We consider in this section the estimation of claim size distributions and related problems. As a starting point, we take a collection of past observations from one insurance line or sub-line (claims associated with fires, car accidents, ...). We assume that inflation and other trends have been eliminated from the data so that it is reasonable to assume that the observations, called Z_1, Z_2, ..., are i.i.d. For wider applied discussions, we refer to DPP, Section 3.
5.1 Tabulation method

If the number of observations is large, it might be possible to rely on the data as such. This means that the empirical distribution is used as the estimate of S. Denote the corresponding distribution function by S^e. Let N be the number of observations Z_i. Then by definition,

S^e(z) = \#\{i ≤ N \mid Z_i ≤ z\}/N (5.1)

for every z ∈ R. There are often only a few large claims, which means that the estimate of the right tail of the distribution may be inaccurate.
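Computing (5.1) for a data set is a one-liner; in the sketch below, simulated log-normal claims stand in for real observations:

```python
import numpy as np

rng = np.random.default_rng(23)
Z = rng.lognormal(mean=7.0, sigma=1.2, size=500)    # hypothetical claim sizes

def S_e(z: float) -> float:
    """Empirical distribution function (5.1)."""
    return float(np.mean(Z <= z))

for z in (500.0, 1000.0, 5000.0, 20000.0):
    print(f"S_e({z:>8.0f}) = {S_e(z):.3f}")
```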
5.2 Analytical methods

It is often convenient to work with analytically given distributions rather than the empirical distribution. The idea is to fit S^e to some appropriate mathematical distribution. An additional motivation is caused by the fact that the claim statistics concerning large claims are often sparse. Popular analytical distributions for the estimation are the following.

- Gamma distribution with the parameters r, α where r > 0, α > 0. The density is

s(z) = \frac{α^r}{Γ(r)} e^{−αz} z^{r−1}

for z ≥ 0 where

Γ(r) = \int_0^{∞} e^{−u} u^{r−1}\,du.

- Log-normal distribution with the parameters µ, σ where µ ∈ R, σ > 0. The density is

s(z) = \frac{1}{σz\sqrt{2π}} e^{−\frac{1}{2σ^2}(\log z − µ)^2}

for z > 0. It is easy to see that if Y has the N(µ, σ^2) distribution (normal distribution with expectation µ and variance σ^2) then Z = e^Y has the log-normal distribution with the parameters µ and σ.

- Pareto distribution with the parameters α, r where α > 0, r > 0. The density is

s(z) = α r^{α} z^{−α−1}

for z ≥ r. The distribution function is

S(z) = 1 − \left( \frac{z}{r} \right)^{−α}

for z ≥ r.

The parameters are usually determined by some statistical method, for example, by the maximum likelihood or by the moment method. Concerning the right tails, the above three families of distributions represent three different types. The gamma distribution is light tailed, which means that its moment generating function is finite for some positive value of the argument. In contrast to this, the log-normal distribution is heavy tailed; still, all of its moments are finite. The Pareto distribution is also heavy tailed, and the origin moment a_n is finite only if n < α. From the applied point of view, the Pareto distribution is the most and the gamma distribution the least dangerous one among these three types, as the numerical sketch below illustrates.
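To see the ordering of the tails concretely, the following sketch evaluates the tail probabilities P(Z > z) for one distribution from each family, with parameters chosen (illustratively) so that all three have mean 1:

```python
import numpy as np
from scipy.stats import gamma, lognorm, pareto

# All three distributions scaled to mean 1 (illustrative parameter choices).
g = gamma(a=2.0, scale=0.5)                 # r = 2, alpha = 2, mean r/alpha = 1
ln = lognorm(s=1.0, scale=np.exp(-0.5))     # mu = -1/2, sigma = 1, mean e^{mu + sigma^2/2} = 1
pa = pareto(b=3.0, scale=2.0 / 3.0)         # alpha = 3, r = 2/3, mean alpha*r/(alpha - 1) = 1

for z in (2.0, 5.0, 10.0, 50.0):
    print(f"z = {z:4.0f}: gamma {g.sf(z):.2e}, log-normal {ln.sf(z):.2e}, Pareto {pa.sf(z):.2e}")
```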
A wide class of Pareto type distributions is obtained in the following way. A function f : (0, ∞) → (0, ∞) is called regularly varying with index α ∈ R if for every z > 0,

lim_{t→∞} \frac{f(tz)}{f(t)} = z^α.

If α = 0 then f is called slowly varying. Obviously, f is regularly varying with index α if and only if

f(z) = z^α f_0(z), z > 0,

where f_0 is slowly varying.

Theorem 5.1. The function f : (0, ∞) → (0, ∞) is slowly varying if and only if there exist positive constants b and c such that

f(z) = a(z) \exp\left( \int_b^z \frac{e(y)}{y}\,dy \right)

for every z ≥ b where e(z) → 0 and a(z) → c as z → ∞.

The proof can be found in Bingham et al. (1987). The next result is an immediate consequence.

Corollary 5.1. If f : (0, ∞) → (0, ∞) is slowly varying then for every ε > 0, there exists z_ε > 0 such that

z^{−ε} ≤ f(z) ≤ z^{ε}, ∀z ≥ z_ε.
Write in short S̄(z) = 1 − S(z) for z ∈ R. Let α ≥ 0. We say that the (right) tail of Z is regularly varying with index −α if S̄ (restricted to (0, ∞)) is regularly varying with index −α. By Corollary 5.1, we can then associate the power z^{−α} with S̄(z) for large z. The obtained family of distributions is rather wide and extensively studied.

In general, the risk associated with the right tail can be roughly described by means of the following characteristics. Let α_S ∈ [0, ∞] be such that

S̄(z) ≈ e^{−α_S z}

for large z. The exact meaning is that

lim sup_{z→∞} z^{−1} \log S̄(z) = −α_S.

In words, α_S describes the (exponential) rate of convergence of S̄(z) to zero. It is clear that α_S = 0 for every Pareto distribution. In these circumstances, the polynomial rate of convergence is more useful. This means that for some β_S ∈ [0, ∞],

S̄(z) ≈ z^{−β_S}

for large z, or more precisely, that

lim sup_{z→∞} (\log z)^{−1} \log S̄(z) = −β_S.
Lemma 5.1. Let S be the distribution function of Z and let Z^+ = max(Z, 0). Then

α_S = \sup\{s ≥ 0 \mid E(e^{sZ}) < ∞\} (5.2)

and

β_S = \sup\{s ≥ 0 \mid E((Z^+)^s) < ∞\}. (5.3)

Proof. Write

κ = \sup\{s ≥ 0 \mid E(e^{sZ}) < ∞\}.

We have to show that α_S = κ. We first prove that α_S ≥ κ. We can assume that κ > 0. Let s ∈ (0, κ) so that E(e^{sZ}) < ∞. By Chebyshev's inequality,