
ACTS 4304 FORMULA SUMMARY

Lesson 1. Basic Probability Functions

$$F(x) = \Pr(X \le x) \qquad S(x) = 1 - F(x) \qquad f(x) = \frac{dF(x)}{dx}$$
$$H(x) = -\ln S(x) \qquad h(x) = \frac{dH(x)}{dx} = \frac{f(x)}{S(x)}$$

Functions of random variables

$$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$$
n-th raw moment: $\mu'_n = E[X^n]$
n-th central moment: $\mu_n = E[(X - \mu)^n]$
Variance: $\sigma^2 = E[(X - \mu)^2] = E[X^2] - \mu^2$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\sigma^3} = \dfrac{\mu'_3 - 3\mu'_2\mu + 2\mu^3}{\sigma^3}$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\sigma^4} = \dfrac{\mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 3\mu^4}{\sigma^4}$
Moment generating function: $M(t) = E[e^{tX}]$
Probability generating function: $P(z) = E[z^X]$

(1) The 100p-th percentile $\pi_p$ is any point satisfying $F_X(\pi_p^-) \le p$ and $F_X(\pi_p) \ge p$.

If $F_X$ is continuous, it is the unique point satisfying $F_X(\pi_p) = p$.

If $F_X$ is discrete, then $\pi_p$ is the point that satisfies
$$\Pr(X \le \pi_p) \ge p \quad \text{and} \quad \Pr(X \ge \pi_p) \ge 1 - p$$

(2) The median is the 50-th percentile; the n-th quartile is the 25n-th percentile.

More concepts
(1) The standard deviation (σ) is the positive square root of the variance.
(2) The coefficient of variation is CV = σ/µ.
(3) The mode is the x which maximizes f(x).
(4) $M_X^{(n)}(0) = E[X^n]$, where $M^{(n)}$ is the n-th derivative.
(5) $\dfrac{P_X^{(n)}(0)}{n!} = \Pr(X = n)$
(6) $P_X^{(n)}(1)$ is the n-th factorial moment of X.
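As a numerical companion to these definitions, here is a minimal Python sketch (the gamma distribution and its parameters are arbitrary choices for illustration, and `scipy` is assumed to be available):

```python
import numpy as np
from scipy import stats

# Illustrative distribution: gamma with shape alpha = 3, scale theta = 2
alpha, theta = 3.0, 2.0
X = stats.gamma(a=alpha, scale=theta)

mu = X.mean()                      # first raw moment
var = X.moment(2) - mu**2          # sigma^2 = E[X^2] - mu^2
sigma = var**0.5
cv = sigma / mu                    # coefficient of variation
skew = (X.moment(3) - 3*X.moment(2)*mu + 2*mu**3) / sigma**3

print(mu, var, cv, skew)           # gamma checks: mean = alpha*theta, variance = alpha*theta^2
```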

Bayes' Theorem
$$\Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)}$$

$$f_X(x|y) = \frac{f_Y(y|x)\,f_X(x)}{f_Y(y)}$$

Law of total probability

If $B_i$ is a set of exhaustive (in other words, $\Pr(\cup_i B_i) = 1$) and mutually exclusive (in other words, $\Pr(B_i \cap B_j) = 0$ for $i \ne j$) events, then for any event A,
$$\Pr(A) = \sum_i \Pr(A \cap B_i) = \sum_i \Pr(B_i)\Pr(A|B_i)$$
For continuous distributions,
$$\Pr(A) = \int \Pr(A|x)\,f(x)\,dx$$

Conditional Expectation and Variance Formulae

Double expectation: $E_X[X] = E_Y[E_X[X|Y]]$

Conditional variance: $Var_X[X] = E_Y[Var_X[X|Y]] + Var_Y(E_X[X|Y])$

Bernoulli Shortcut
If a random variable X can only assume the two values a and b, with probabilities q and 1 − q respectively, then its variance is
$$Var(X) = (b - a)^2\,q(1 - q)$$
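A quick simulation check of the Bernoulli shortcut (the values and probability below are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, q = 100.0, 250.0, 0.3                      # arbitrary illustrative values
x = np.where(rng.random(1_000_000) < q, b, a)    # X = b with probability q, else a

print(x.var())                    # simulated variance
print((b - a)**2 * q * (1 - q))   # shortcut: 150^2 * 0.3 * 0.7 = 4725
```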

Lesson 2. Parametric Distributions

Forms of probability density functions for common distributions:

Distribution                Probability density function f(x)
Uniform                     $c$, $x \in [d, u]$
Beta                        $c\,x^{a-1}(\theta - x)^{b-1}$, $x \in [0, \theta]$
Exponential                 $c\,e^{-x/\theta}$, $x \ge 0$
Weibull                     $c\,x^{\tau-1}e^{-x^\tau/\theta^\tau}$, $x \ge 0$
Gamma                       $c\,x^{\alpha-1}e^{-x/\theta}$, $x \ge 0$
Pareto                      $\dfrac{c}{(x+\theta)^{\alpha+1}}$, $x \ge 0$
Single-parameter Pareto     $\dfrac{c}{x^{\alpha+1}}$, $x \ge \theta$
Lognormal                   $\dfrac{c\,e^{-(\ln x-\mu)^2/2\sigma^2}}{x}$, $x > 0$

Summary of Parametric Distribution Concepts
• If X is a member of a scale family with scale parameter θ having value s, then cX is in the same family and has the same parameter values as X, except that the scale parameter θ has value cs.
• All distributions in the tables are scale families with scale parameter θ except for the lognormal and the inverse Gaussian.
• If X is lognormal with parameters µ and σ, then cX is lognormal with parameters µ + ln c and σ.
• See the above table to learn the forms of commonly occurring distributions. Useful facts are:
  Uniform on [d, u]:  $E[X] = \dfrac{d + u}{2}$, $Var(X) = \dfrac{(u - d)^2}{12}$
  Uniform on [0, u]:  $E[X^2] = \dfrac{u^2}{3}$
  Gamma:              $Var(X) = \alpha\theta^2$
• If Y is single-parameter Pareto with parameters α and θ, then Y − θ is two-parameter Pareto with the same parameters α and θ.
• X is in the linear exponential family if its probability density function can be expressed as
$$f(x; \theta) = \frac{p(x)\,e^{r(\theta)x}}{q(\theta)}$$

Lesson 3. Variance

For any random variables X and Y,
$$E[aX + bY] = aE[X] + bE[Y]$$
$$Var(aX + bY) = a^2\,Var(X) + 2ab\,Cov(X, Y) + b^2\,Var(Y)$$
For independent random variables $X_1, X_2, \dots, X_n$,
$$Var\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n Var(X_i)$$
For independent identically distributed (i.i.d.) random variables $X_1, X_2, \dots, X_n$,
$$Var\left(\sum_{i=1}^n X_i\right) = n\,Var(X)$$
The sample mean: $\bar X = \frac{1}{n}\sum_{i=1}^n X_i$
The variance of the sample mean: $Var(\bar X) = \frac{1}{n}Var(X)$
Double expectation: $E_X[X] = E_Y[E_X[X|Y]]$

Conditional variance: $Var_X[X] = E_Y[Var_X[X|Y]] + Var_Y(E_X[X|Y])$

Lesson 4. Mixtures and Splices

• If X is a mixture of n random variables with weights $w_i$ such that $\sum_{i=1}^n w_i = 1$, then the following can be expressed as weighted averages:
  Cumulative distribution function: $F_X(x) = \sum_{i=1}^n w_i F_{X_i}(x)$
  Probability density function: $f_X(x) = \sum_{i=1}^n w_i f_{X_i}(x)$
  k-th raw moment: $E[X^k] = \sum_{i=1}^n w_i E[X_i^k]$
• Conditional variance:

$$Var_X[X] = E_I[Var_X[X|I]] + Var_I(E_X[X|I])$$
• For a frailty model, given the hazard rate function for each individual, $h(x|\Lambda) = \Lambda a(x)$, the survival function is of the form
$$S_X(x) = E_\Lambda\left[e^{-\Lambda A(x)}\right] = M_\Lambda(-A(x)),$$
where $A(x) = \int_0^x a(t)\,dt$ and $M_\Lambda$ is the moment generating function of the random variable Λ.
• Splices: For a spliced distribution, the sum of the probabilities of being in each splice must add up to 1.
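A minimal sketch of the mixture formulas, with made-up weights and components (any two scipy distributions would do):

```python
from scipy import stats

# Illustrative two-component mixture; weights sum to 1
w = [0.6, 0.4]
components = [stats.expon(scale=500), stats.gamma(a=2, scale=300)]

def mixture_cdf(x):
    # F_X(x) = sum_i w_i F_{X_i}(x)
    return sum(wi * ci.cdf(x) for wi, ci in zip(w, components))

def mixture_raw_moment(k):
    # E[X^k] = sum_i w_i E[X_i^k]
    return sum(wi * ci.moment(k) for wi, ci in zip(w, components))

print(mixture_cdf(1000.0))
print(mixture_raw_moment(1), mixture_raw_moment(2))
```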

Lesson 5. Property/Casualty Insurance Coverages

Let I(X) be the insurance amount for loss X, SI be the Sum Insured, and FV be the Full Value at the time of the loss. Then if the insurance must equal at least a fraction α of the value of the house at the time of the loss, the insurance amount is calculated as:
$$I(X) = \min\left(SI,\ \frac{SI}{\alpha \cdot FV} \cdot X\right)$$
Typically α = 80%.
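A direct translation of the coinsurance-clause formula into Python (the house value, sum insured, and loss below are invented for illustration):

```python
def insurance_payment(loss, sum_insured, full_value, alpha=0.80):
    """Coinsurance clause: I(X) = min(SI, SI/(alpha*FV) * X)."""
    return min(sum_insured, sum_insured / (alpha * full_value) * loss)

# House worth 200,000 insured for 120,000, a 50,000 loss:
# required insurance is 0.8*200,000 = 160,000, so the ratio 120/160 applies
print(insurance_payment(50_000, 120_000, 200_000))   # -> 37,500
```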

Lesson 6. Health Insurance

Types of Coverage Modifications:
1. Allowed charges
2. Deductibles
3. Coinsurance
4. Out-of-pocket limits
5. Maximum limits
6. Internal limits
7. Copays

Variations on Major Medical:
1. Comprehensive major medical coverage
2. Catastrophic medical
3. Short term medical
4. High risk pool plans

Dental Insurance: Types of Coverage:
I. Preventive (X-rays, cleaning)
II. Basic (fillings, extractions)
III. Prosthetic (inlays, crowns)
IV. Orthodontia

Lesson 7. Loss Reserving: Basic Methods

Let
R     be the Total Reserve
ELR   be the Expected Loss Ratio
EP    be the Earned Premium
PTD   be the Amount of Claims Paid-to-Date
$f_{ult}$  be the ultimate development factor
$f_j$  be the link ratio from year j − 1 to year j

Expected Loss Ratio Method:
$$R_{LR} = ELR \cdot EP - PTD$$
Chain Ladder Method:
$$R_{CL} = f_{ult} \cdot PTD - PTD = PTD\,(f_{ult} - 1)$$
Bornhuetter-Ferguson Method:
$$f_{ult} = \prod_j f_j$$
$$R_{BF} = EP \cdot ELR \cdot \left(1 - \frac{1}{f_{ult}}\right)$$
Connection Between the Three Reserve Methods:
$$R_{BF} = \left(1 - \frac{1}{f_{ult}}\right) R_{LR} + \frac{1}{f_{ult}}\,R_{CL}$$
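The three methods and the identity connecting them, as a small Python sketch (all inputs are illustrative):

```python
def reserves(elr, ep, ptd, link_ratios):
    """Expected Loss Ratio, Chain Ladder, and Bornhuetter-Ferguson reserves."""
    f_ult = 1.0
    for f in link_ratios:
        f_ult *= f                        # f_ult = product of the link ratios
    r_lr = elr * ep - ptd                 # Expected Loss Ratio method
    r_cl = ptd * (f_ult - 1)              # Chain Ladder method
    r_bf = ep * elr * (1 - 1 / f_ult)     # Bornhuetter-Ferguson method
    # identity: R_BF = (1 - 1/f_ult) R_LR + (1/f_ult) R_CL
    assert abs(r_bf - ((1 - 1/f_ult) * r_lr + r_cl / f_ult)) < 1e-9
    return r_lr, r_cl, r_bf

print(reserves(elr=0.70, ep=10_000, ptd=4_000, link_ratios=[1.5, 1.2, 1.1]))
```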

Lesson 8. Loss Reserving: Other Methods

Method of Projecting Claim Counts and Severity Separately:
1) List the Cumulative Payments Loss triangle.
2) List the Cumulative Closed Claims triangle.
3) Obtain the Cumulative Average Claim Size triangle by dividing the Cumulative Loss Payments by the corresponding Cumulative Closed Claims.
4) Using the volume-weighted average technique, calculate the link factors for the Average Severities.
5) Using the volume-weighted average technique, calculate the link factors for the Closed Claims.
6) Project Average Severities through Development Years.
7) Project Closed Claims through Development Years.
8) List the Ultimate Average Severities from the last DY column for each AY.
9) List the Ultimate Closed Claims from the last DY column for each AY.
10) Multiply the Ultimate Average Severities by the Ultimate Closed Claims to obtain the Ultimate Losses for each AY.
11) List the PTD claims from the Cumulative Payments Loss triangle, the same way as in the CL method.
12) Obtain the Reserve by subtracting the PTD from the Ultimate Losses for each AY.
13) The Total Reserve is the sum of all Reserves over the AYs.

Closure Method:
1) List the Cumulative Payments Loss Triangle.
2) Create the corresponding Incremental Payments Loss Triangle.
3) List the Cumulative Closed Claims Triangle.
4) Create the corresponding Incremental Closed Claims Triangle.
5) Working with the Incremental Payments and Closed Claims, create the Incremental Severity Triangle by dividing incremental claim counts into incremental payments.
6) Trend all average severity numbers to the last AY by multiplying by (1 + trend) raised to the corresponding powers, and then calculate their averages for each development year.
7) List these averages as projected incremental severities for the last AY, development years 1 through n − 1 (leave the AY n, DY 0 number the same), and detrend the averages back through accident years for each DY by dividing by (1 + trend) raised to the corresponding powers.
8) Working with the Ultimate Claim Counts and the Incremental Closed Claims, create a triangle showing the percentage of closed claims. The Ultimate Closed Claims are calculated by working with the cumulative closed claims triangle and projecting using the volume-weighted averages.
9) Using these percentages, complete the Projected Incremental Closed Claims triangle.
10) Multiplying the projected incremental severities obtained in Step 7 by the annual incremental closed claims obtained in Step 9, find the projected loss payments.
11) Add up the corresponding products by AY to obtain the reserve for each AY and then the total reserve.

Method of Discounted Loss Reserves:
1) List the Projected Cumulative Payments using the Chain Ladder method.
2) Calculate the Projected Incremental Payments by subtracting the projected cumulative payments in DY t − 1 from the projected cumulative payments in DY t. You will obtain a lower-hand triangle.
3) Discount the Projected Incremental Payments.
4) Add up the results by AY to obtain the reserve for each AY and then the total reserve.
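The volume-weighted link-ratio projection that underlies the triangle methods above, as a minimal sketch (the toy triangle is invented; NaN marks future cells):

```python
import numpy as np

# Toy cumulative paid-loss triangle (rows = accident years, cols = development years)
tri = np.array([
    [1000., 1500., 1800.],
    [1100., 1700., np.nan],
    [1200., np.nan, np.nan],
])

n = tri.shape[1]
f = []                                    # volume-weighted link ratios DY j-1 -> DY j
for j in range(1, n):
    mask = ~np.isnan(tri[:, j])
    f.append(tri[mask, j].sum() / tri[mask, j - 1].sum())

reserve = 0.0
for i in range(tri.shape[0]):             # project each AY to ultimate
    last = int(np.where(~np.isnan(tri[i]))[0][-1])
    ult = tri[i, last] * np.prod(f[last:])
    reserve += ult - tri[i, last]         # chain-ladder reserve = ultimate - PTD
print(f, reserve)
```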

Lesson 9. Ratemaking: Preliminary Calculations

The Method of Least Squares
Given the data points $(x_i, y_i)_{i=1}^n$, one can fit the "best fit" line $y = mx + b$ through these data points, where
$$b = \frac{\sum_{i=1}^n y_i \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i y_i\right)\sum_{i=1}^n x_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}$$
$$m = \frac{n\sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i}{n\sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}$$

Gross Rate Formulae
Let V be the expense ratio, the proportion of premium needed for expenses, contingencies, and profit; it includes all expenses other than LAE as a percentage of the Gross Rate. Let L be the loss cost, developed and trended and including LAE. Let R be the permissible loss ratio: R = 1 − V. Then the gross rate G is:
$$G = \frac{L}{1 - V} = \frac{L}{R}$$
For separated fixed and variable expenses:
$$G = \frac{L + F}{1 - V} = \frac{L + F}{R}$$
If F is an amount fixed by state regulation and the regulation does not allow F to be grossed up by the loss ratio, then
$$G = \frac{L}{R} + F$$
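A direct implementation of the closed-form least-squares formulas (the trend data points are made up):

```python
def fit_line(xs, ys):
    """Least-squares fit of y = m*x + b using the closed-form expressions above."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    denom = n * sxx - sx ** 2
    m = (n * sxy - sx * sy) / denom
    b = (sy * sxx - sxy * sx) / denom
    return m, b

# Illustrative severities by year (x = year index)
print(fit_line([0, 1, 2, 3], [100.0, 108.0, 115.0, 124.0]))   # -> (7.9, 99.9)
```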

Credibility Factor
$$Z = \min\left(\sqrt{\frac{n}{1082}},\ 1\right)$$
$$Z = \min\left(\sqrt{\frac{n}{n_F}},\ 1\right), \text{ where } n_F \text{ is the number of expected claims needed for full credibility}$$
$$Z = \frac{E}{E + K}, \text{ where E is a measure of exposure and K is a constant related to the distribution of the claims}$$

Bühlmann Credibility
$$Z = \frac{n}{n + k}, \quad \text{where } k = \frac{\nu}{a}$$
• µ, or EHM, is the expected value of the hypothetical mean, or the overall mean: µ = E[E[X|θ]].
• a, or VHM, is the variance of the hypothetical mean: a = Var(E[X|θ]).
• ν, or EPV, is the expected value of the process variance: ν = E[Var(X|θ)].

Lesson 10. Ratemaking: Rate Changes and Individual Risk Rating Plans

Overall Average Rate Change

The Loss Cost Method
$$\text{Average Loss Cost} = \frac{\text{Expected Losses, Trended and Developed}}{\text{Number of Earned Exposures}}$$
$$\text{Average Gross Rate} = \frac{\text{Average Loss Cost} + F}{R},$$
where F is the fixed expense per policy and R is the permissible loss ratio.
$$\text{Indicated Rate Change} = \frac{\text{Average Gross Rate}}{\text{Current Average Gross Rate}} - 1$$

The Loss Ratio Method

$$\text{Indicated Rate Change} = \frac{\text{Effective Loss Ratio} + \text{Fixed Expense Ratio}}{R} - 1,$$
where
$$\text{Effective Loss Ratio} = \frac{\text{Expected Losses, Trended and Developed}}{\text{Earned Premium at Current Rates}}$$
$$\text{Fixed Expense Ratio} = \frac{\text{Fixed Expenses per Exposure}}{\text{EP at Current Rates}/\text{Number of Earned Exposures}} = \frac{\text{Total Fixed Expenses}}{\text{Earned Premium at Current Rates}}$$

Updating Class Differentials
The Loss Ratio method:
$$\text{Indicated Differential}_i = \text{Existing Differential}_i \cdot \frac{R_i}{R_{base}},$$
where the R's are the experience loss ratios.
The Loss Cost method:
$$\text{Indicated Differential}_i = \frac{\text{Loss Cost}_i}{\text{Loss Cost}_{base}}$$

Balancing Back
1. After changing differentials, the resulting loss cost will not balance back to the expected loss cost because the average of the differentials is not 1.
2. We thus must multiply the rates by a factor.
3. The numerator of the factor is the weighted average of the existing differentials.
4. The denominator of the factor is the weighted average of the proposed differentials.
5. The weights are the earned exposures.

Lesson 11. Policy Limits

All formulas assume $\Pr(X < 0) = 0$.
$$E[X] = \int_0^\infty S(x)\,dx$$
$$E[X \wedge u] = \int_0^u x f(x)\,dx + u(1 - F(u)) = \int_0^u S(x)\,dx$$
$$E[X^k] = \int_0^\infty k x^{k-1} S(x)\,dx$$
$$E[(X \wedge u)^k] = \int_0^u x^k f(x)\,dx + u^k(1 - F(u)) = \int_0^u k x^{k-1} S(x)\,dx$$
For inflation, if Y = (1 + r)X, then
$$E[Y \wedge u] = (1 + r)\,E\left[X \wedge \frac{u}{1 + r}\right]$$
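A numerical sketch of the limited expected value via the survival-function integral (exponential severity and limit are illustrative; `scipy` assumed):

```python
import math
from scipy import integrate, stats

def limited_ev(dist, u):
    """E[X ∧ u] = ∫_0^u S(x) dx, computed numerically."""
    val, _ = integrate.quad(dist.sf, 0, u)
    return val

X = stats.expon(scale=1000.0)          # exponential severity with theta = 1000
print(limited_ev(X, 500))              # numeric
print(1000 * (1 - math.exp(-0.5)))     # exponential closed form: theta*(1 - e^{-u/theta})
```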

Lesson 12. Deductibles

Payment per Loss:
$$F_{Y^L}(x) = F_X(x + d), \quad \text{if } Y^L = (X - d)_+$$
$$E[(X - d)_+] = \int_d^\infty (x - d)f(x)\,dx$$
$$E[(X - d)_+] = \int_d^\infty S(x)\,dx$$
$$E[X] = E[X \wedge d] + E[(X - d)_+]$$

Payment per Payment:

$$F_{Y^P}(x) = \frac{F_X(x + d) - F_X(d)}{1 - F_X(d)}, \quad \text{if } Y^P = (X - d)_+ \,|\, X > d$$

$$S_{Y^P}(x) = \frac{S_X(x + d)}{S_X(d)}, \quad \text{if } Y^P = (X - d)_+ \,|\, X > d$$
$$e_X(d) = \frac{E[(X - d)_+]}{S(d)} = \frac{E[X] - E[X \wedge d]}{S(d)} \quad \text{(mean excess loss)}$$
$$e_X(d) = \frac{\int_d^\infty (x - d)f(x)\,dx}{S(d)}$$
$$e_X(d) = \frac{\int_d^\infty S(x)\,dx}{S(d)}$$
$$E[X] = E[X \wedge d] + e(d)\,(1 - F(d))$$

Mean excess loss for different distributions:

$$e_X(d) = \theta \quad \text{for exponential}$$
$$e_X(d) = \frac{\theta - d}{2}, \ d < \theta \quad \text{for uniform on } [0, \theta]$$
$$e_X(d) = \frac{\theta - d}{1 + b}, \ d < \theta \quad \text{for beta with parameters } a = 1, b, \theta$$
$$e_X(d) = \frac{\theta + d}{\alpha - 1} \quad \text{for two-parameter Pareto}$$
$$e_X(d) = \begin{cases} \dfrac{d}{\alpha - 1} & d \ge \theta \\[2mm] \dfrac{\alpha(\theta - d) + d}{\alpha - 1} & d \le \theta \end{cases} \quad \text{for single-parameter Pareto}$$
If $Y^L, Y^P$ are the loss and payment random variables for a franchise deductible of d, and $X^L, X^P$ are the loss and payment random variables for an ordinary deductible of d, then
$$E[Y^L] = E[X^L] + d\,S(d)$$
$$E[Y^P] = E[X^P] + d$$
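A numerical check of the Pareto mean excess loss (parameters and deductible are illustrative; scipy's `lomax` is the two-parameter Pareto starting at 0):

```python
import numpy as np
from scipy import integrate, stats

def mean_excess(dist, d):
    """e(d) = ∫_d^∞ S(x) dx / S(d), computed numerically."""
    num, _ = integrate.quad(dist.sf, d, np.inf)
    return num / dist.sf(d)

alpha, theta = 3.0, 1000.0
X = stats.lomax(c=alpha, scale=theta)   # two-parameter Pareto
print(mean_excess(X, 500))              # closed form: (theta + d)/(alpha - 1) = 750
```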

Lesson 13. Loss Elimination Ratio

The Loss Elimination Ratio is defined as the proportion of the expected loss which the insurer doesn't pay as a result of an ordinary deductible d:
$$LER(d) = \frac{E[X \wedge d]}{E[X]} = 1 - \frac{E[(X - d)_+]}{E[X]}$$

Loss Elimination Ratio for Certain Distributions:

$$LER(d) = 1 - e^{-d/\theta} \quad \text{for an exponential}$$
$$LER(d) = 1 - \left(\frac{\theta}{d + \theta}\right)^{\alpha - 1} \quad \text{for a Pareto with } \alpha > 1$$
$$LER(d) = 1 - \frac{(\theta/d)^{\alpha - 1}}{\alpha} \quad \text{for a single-parameter Pareto with } \alpha > 1,\ d \ge \theta$$

Lesson 14. Increased Limits Factors and Increased Deductible Relativities

Increased Limits Factors
Let X be the random variable for severity. Then the increased limits factor for a policy limit of U with the basic policy limit of B is:
$$ILF(U) = \frac{E[X \wedge U]}{E[X \wedge B]}$$
In the presence of risk loads, the increased limits factor for policy limit U with base limit B is:
$$ILF(U) = \frac{LAS(U) + \text{Risk Load}(U)}{LAS(B) + \text{Risk Load}(B)},$$
where LAS is the limited average severity.

Deductible Relativities
The indicated deductible relativity is the ratio of the payment per loss with a deductible d to the payment per loss with the basic deductible b:
$$IDR(d) = \frac{E[(X - d)_+]}{E[(X - b)_+]} = \frac{E[X] - E[X \wedge d]}{E[X] - E[X \wedge b]}$$
The following definition of the Loss Elimination Ratio is sometimes used:
$$LER(d) = \frac{E[X \wedge d] - E[X \wedge b]}{E[X] - E[X \wedge b]}$$
It is the proportion of losses eliminated relative to the basic deductible. Then the indicated deductible relativity can be expressed through the LER as:
$$IDR(d) = 1 - LER(d)$$
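A short sketch computing an ILF from limited expected values (the Pareto severity and limits are assumptions for illustration):

```python
from scipy import integrate, stats

def limited_ev(dist, u):
    # E[X ∧ u] = ∫_0^u S(x) dx
    val, _ = integrate.quad(dist.sf, 0, u)
    return val

def ilf(dist, U, B):
    # ILF(U) = E[X ∧ U] / E[X ∧ B]
    return limited_ev(dist, U) / limited_ev(dist, B)

X = stats.lomax(c=2.0, scale=5000.0)      # illustrative Pareto (alpha = 2, theta = 5000)
print(ilf(X, U=1_000_000, B=100_000))
```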

Lesson 15. Reinsurance

The proportion of losses in layer (a, b) is equal to:

$$\frac{ILF_b - ILF_a}{ILF_\infty}$$
If all losses are below U, we can replace $ILF_\infty$ with $ILF_U$ to obtain:
$$\frac{ILF_b - ILF_a}{ILF_U} = \frac{ILF_b}{ILF_U} - \frac{ILF_a}{ILF_U}$$

Lesson 16. Risk Measures and Tail Weight

Value-at-Risk: $VaR_p(X) = \pi_p = F_X^{-1}(p)$
Tail-Value-at-Risk:
$$TVaR_p(X) = E[X \mid X > VaR_p(X)] = \frac{\int_{VaR_p(X)}^{\infty} x f(x)\,dx}{1 - F(VaR_p(X))} = \frac{\int_p^1 VaR_y(X)\,dy}{1 - p}$$
$$= VaR_p(X) + e_X(VaR_p(X)) = VaR_p(X) + \frac{E[X] - E[X \wedge VaR_p(X)]}{1 - p}$$

Value-at-Risk and Tail-Value-at-Risk measures for some distributions:

Exponential:  $VaR_p(X) = -\theta\ln(1-p)$;   $TVaR_p(X) = \theta(1 - \ln(1-p))$
Pareto:       $VaR_p(X) = \theta\left((1-p)^{-1/\alpha} - 1\right)$;   $TVaR_p(X) = E[X]\left(1 + \alpha\left((1-p)^{-1/\alpha} - 1\right)\right)$
Normal:       $VaR_p(X) = \mu + z_p\sigma$;   $TVaR_p(X) = \mu + \sigma\,\dfrac{e^{-z_p^2/2}}{(1-p)\sqrt{2\pi}}$
Lognormal:    $VaR_p(X) = e^{\mu + z_p\sigma}$;   $TVaR_p(X) = E[X]\,\dfrac{\Phi(\sigma - z_p)}{1-p}$
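The exponential row of the table as a one-function sketch (parameter and confidence level are illustrative):

```python
import math

def var_tvar_exponential(theta, p):
    """VaR and TVaR for an exponential severity, per the closed forms above."""
    var_p = -theta * math.log(1 - p)
    tvar_p = theta * (1 - math.log(1 - p))   # = VaR_p + theta (mean excess of exponential)
    return var_p, tvar_p

print(var_tvar_exponential(theta=1000.0, p=0.95))   # ≈ (2995.7, 3995.7)
```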

Lesson 17. Other Topics in Severity Coverage Modifications

Policy limit: the maximum amount that the coverage will pay. In the presence of a deductible or other modifications, perform the other modifications first, then apply the policy limit.
Maximum covered loss u is the stipulated amount considered in calculating the payment. Apply this limit first, and then the deductible. If u is the maximum covered loss and d the deductible, then
$$Y^L = X \wedge u - X \wedge d$$
Coinsurance α is the portion of each loss reimbursed by the insurance. In the presence of all three modifications,
$$E[Y^L] = \alpha\left(E[X \wedge u] - E[X \wedge d]\right)$$
If r is the inflation factor,
$$E[Y^L] = \alpha(1 + r)\left(E\left[X \wedge \frac{u}{1+r}\right] - E\left[X \wedge \frac{d}{1+r}\right]\right)$$

Lesson 18. Bonuses

A typical bonus is a portion of the excess of r% of premiums over losses. If c is the portion of the excess, r is the loss ratio, P is earned premium, and X is losses, then
$$B = \max\left(0,\ c(rP - X)\right) = c\,rP - c\min(rP, X) = c\,rP - c\,(X \wedge rP)$$
For a two-parameter Pareto distribution with α = 2 and θ,
$$E[X \wedge d] = \frac{\theta d}{d + \theta}$$

Lesson 19. Discrete Distributions

For the (a, b, 0) class of distributions,

$$\frac{p_k}{p_{k-1}} = a + \frac{b}{k}, \qquad p_k = \Pr(X = k)$$

Poisson:            $p_n = e^{-\lambda}\dfrac{\lambda^n}{n!}$;   mean $= \lambda$;   variance $= \lambda$;   $a = 0$;   $b = \lambda$
Binomial:           $p_n = \binom{m}{n} q^n (1-q)^{m-n}$;   mean $= mq$;   variance $= mq(1-q)$;   $a = -\dfrac{q}{1-q}$;   $b = (m+1)\dfrac{q}{1-q}$
Negative binomial:  $p_n = \binom{n+r-1}{n}\dfrac{\beta^n}{(1+\beta)^{n+r}}$;   mean $= r\beta$;   variance $= r\beta(1+\beta)$;   $a = \dfrac{\beta}{1+\beta}$;   $b = (r-1)\dfrac{\beta}{1+\beta}$
Geometric:          $p_n = \dfrac{\beta^n}{(1+\beta)^{n+1}}$;   mean $= \beta$;   variance $= \beta(1+\beta)$;   $a = \dfrac{\beta}{1+\beta}$;   $b = 0$

If N is a random variable having an (a, b, 0) distribution, then
$$E[N] = \frac{a + b}{1 - a}$$
$$Var(N) = \frac{a + b}{(1 - a)^2}$$

If $\mu_{(j)}$ is the j-th factorial moment of N:
$$\mu_{(j)} = E[N(N-1)\cdots(N-j+1)],$$
then
$$\mu_{(j)} = \frac{aj + b}{1 - a}\,\mu_{(j-1)}$$

For the (a, b, 1) class of distributions, $p_0$ is arbitrary and
$$\frac{p_k}{p_{k-1}} = a + \frac{b}{k} \quad \text{for } k = 2, 3, 4, \dots$$
Zero-truncated distributions:
$$p_n^T = \frac{p_n}{1 - p_0}, \quad n > 0$$
Zero-modified distributions:
$$p_n^M = (1 - p_0^M)\,p_n^T$$
$$E[N] = cm$$
$$Var(N) = c(1 - c)m^2 + cv,$$
where
• c is $1 - p_0^M$
• m is the mean of the corresponding zero-truncated distribution
• v is the variance of the corresponding zero-truncated distribution
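The (a, b, 0) recursion translated directly into code (the Poisson example at the end is an illustrative choice):

```python
import math

def ab0_probabilities(a, b, p0, n_max):
    """Generate p_0..p_{n_max} via the recursion p_k = (a + b/k) p_{k-1}."""
    probs = [p0]
    for k in range(1, n_max + 1):
        probs.append((a + b / k) * probs[-1])
    return probs

# Poisson(λ = 2): a = 0, b = λ, p_0 = e^{-λ}
print(ab0_probabilities(a=0.0, b=2.0, p0=math.exp(-2.0), n_max=4))
```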

Lesson 20. Poisson/Gamma

Assume that in a portfolio of insureds, loss frequency follows a Poisson distribution with parameter λ, but λ is not fixed and varies by insured. Suppose λ varies according to a gamma distribution over the portfolio of insureds. Then, although the conditional loss frequency of each insured is Poisson with parameter λ, the unconditional loss frequency for an insured picked at random is a negative binomial.
The parameters of the negative binomial (r, β) match the parameters of the gamma distribution (α, θ): r = α, β = θ.
For a gamma distribution with parameters (α, θ), the mean is αθ and the variance is αθ².
For a negative binomial distribution, the mean is rβ and the variance is rβ(1 + β).
If the Poisson parameter for one hour has a gamma distribution with parameters (α, θ), the Poisson parameter for k hours will have a gamma distribution with parameters (α, kθ).

Lesson 21. Frequency: Exposure and Coverage Modifications

Let X be the severity, d a deductible, and v the probability of paying a claim.

Model               Original Parameters            Exposure Modification            Coverage Modification
                    (exposure $n_1$,               (exposure $n_2$,                 (exposure $n_1$,
                    $\Pr(X > d) = 1$)              $\Pr(X > d) = 1$)                $\Pr(X > d) = v$)
Poisson             $\lambda$                      $(n_2/n_1)\lambda$               $v\lambda$
Binomial            $m, q$                         $(n_2/n_1)m, q$ ¹                $m, vq$
Negative binomial   $r, \beta$                     $(n_2/n_1)r, \beta$              $r, v\beta$

¹ Note that $(n_2/n_1)m$ must be an integer for the exposure modification formula to work.

These adjustments work for (a, b, 1) distributions as well as (a, b, 0) distributions. For (a, b, 1) distributions, $p_0^M = 1 - \sum_{k=1}^{\infty} p_k^M$ is adjusted as follows:
$$1 - p_0^{M*} = \left(1 - p_0^M\right)\left(\frac{1 - p_0^*}{1 - p_0}\right),$$
where asterisks indicate distributions with revised parameters.

Lesson 22. Aggregate Loss Models: Compound Variance

For the collective risk model the aggregate losses are defined as:
$$S = \sum_{i=1}^{N} X_i,$$
where N is the number of claims and $X_i$ is the size of each claim.
For the individual risk model the aggregate losses are defined as:
$$S = \sum_{i=1}^{n} X_i,$$
where n is the number of insureds in the group and $X_i$ is the aggregate claims of each individual member.
For the collective risk model, we assume that aggregate losses have a compound distribution, with frequency being the primary distribution and severity being the secondary distribution.
$$E[S] = E[N]\,E[X]$$
$$Var(S) = E[N]\,Var(X) + Var(N)\,E[X]^2$$
For a Poisson primary distribution:
$$Var(S) = \lambda E[X^2]$$
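A simulation check of the compound Poisson variance formula (the frequency and severity parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, theta = 4.0, 250.0          # Poisson frequency and exponential severity mean

n = rng.poisson(lam, size=100_000)
s = np.array([rng.exponential(theta, k).sum() for k in n])   # S = X_1 + ... + X_N

ex, ex2 = theta, 2 * theta**2    # exponential raw moments E[X], E[X^2]
print(s.mean(), lam * ex)        # E[S] = E[N]E[X] = 1000
print(s.var(), lam * ex2)        # Var(S) = λE[X^2] = 500,000 for a Poisson primary
```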

Lesson 23. Aggregate Loss Models: Approximating Distribution

The aggregate distribution may be approximated with a normal distribution:
$$F_S(s) = \Pr(S \le s) = \Pr\left(\frac{S - E[S]}{\sigma_S} \le \frac{s - E[S]}{\sigma_S}\right) \approx \Phi\left(\frac{s - E[S]}{\sigma_S}\right)$$
If severity is discrete, then the aggregate loss distribution is discrete, and a continuity correction is required: if X assumes values a and b, but no value in between, all of the following statements are equivalent:
$$X > a, \quad X \ge b, \quad X > c \text{ for any } c \in (a, b)$$
$$X \le a, \quad X < b, \quad X < c \text{ for any } c \in (a, b)$$
To evaluate probabilities, assume:
$$\Pr(X > a) = \Pr(X \ge b) = \Pr\left(X > \frac{a + b}{2}\right)$$
$$\Pr(X \le a) = \Pr(X < b) = \Pr\left(X < \frac{a + b}{2}\right)$$
If severity has a continuous distribution, no continuity correction is made.

Lesson 24. Aggregate Loss Models: Severity Modifications

Lesson 25. Discrete Aggregate Loss Models: The Recursive Formula

Let
$$p_n = \Pr(N = n) = f_N(n)$$
$$f_n = \Pr(X = n) = f_X(n)$$
$$g_n = \Pr(S = n) = f_S(n)$$
Then $F_S(x) = \sum_{n \le x} g_n$ and
$$g_n = \sum_{k=0}^{\infty} p_k \sum_{i_1 + i_2 + \cdots + i_k = n} f_{i_1} f_{i_2} \cdots f_{i_k},$$
where $\sum_{i_1 + \cdots + i_k = n} \prod_{m=1}^{k} f_{i_m} = f^{*k}(n)$ is the k-fold convolution of the f's.

If N belongs to the (a, b, 0) class, $g_n$ can be calculated recursively:
$$g_0 = P_N(f_0), \text{ where } P_N(z) \text{ is the probability generating function of the primary distribution}$$
$$g_k = \frac{1}{1 - a f_0} \sum_{j=1}^{k} \left(a + \frac{bj}{k}\right) f_j\,g_{k-j}, \quad k = 1, 2, 3, \dots$$
In particular, for a Poisson distribution, where a = 0 and b = λ,
$$g_k = \frac{\lambda}{k} \sum_{j=1}^{k} j f_j\,g_{k-j}, \quad k = 1, 2, 3, \dots$$

If N belongs to the (a, b, 1) class, $g_n$ can be calculated recursively as well:
$$g_0 = P_N(f_0), \text{ where } P_N(z) \text{ is the probability generating function of the primary distribution}$$
$$g_k = \frac{\left(p_1 - (a + b)p_0\right)f_k}{1 - a f_0} + \frac{1}{1 - a f_0} \sum_{j=1}^{k} \left(a + \frac{bj}{k}\right) f_j\,g_{k-j}, \quad k = 1, 2, 3, \dots$$
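A runnable sketch of the recursion for the Poisson special case above (the severity distribution and λ are invented for illustration):

```python
import math

def panjer_poisson(lam, f, n_max):
    """Aggregate probabilities g_0..g_{n_max} for a compound Poisson model.

    f[j] = Pr(X = j) for the discrete severity; with a = 0 and b = λ,
    g_0 = P_N(f_0) = e^{-λ(1 - f_0)} and g_k = (λ/k) Σ_{j=1}^{k} j f_j g_{k-j}."""
    g = [math.exp(-lam * (1 - f[0]))]
    for k in range(1, n_max + 1):
        g.append(lam / k * sum(j * f[j] * g[k - j]
                               for j in range(1, min(k, len(f) - 1) + 1)))
    return g

# Illustrative severity: Pr(X=1)=0.5, Pr(X=2)=0.3, Pr(X=3)=0.2; Poisson λ = 2
print(panjer_poisson(lam=2.0, f=[0.0, 0.5, 0.3, 0.2], n_max=5))
```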

Lesson 26. Aggregate Losses: Aggregate Deductible

$$E[(S - d)_+] = E[S] - E[S \wedge d]$$
$$p_n = \Pr(N = n) = f_N(n)$$
$$f_n = \Pr(X = n) = f_X(n)$$
$$g_n = \Pr(S = n) = f_S(n)$$
$$F_S(x) = \sum_{n \le x} g_n$$
Determine $S_S(x) = 1 - F_S(x)$ and apply
$$E[S \wedge d] = \int_0^d S_S(x)\,dx$$

Lesson 27. Aggregate Losses: Miscellaneous Topics

Coverage Modifications
If there is a per-policy deductible, the expected annual aggregate payment is either

$$E[S] = E[N] \cdot E[(X - d)_+] \quad \text{or} \quad E[S] = E[N^P] \cdot e(d),$$
where $N^P$ is the expected number of payments per year and e(d) is the expected payment per payment.

Exact Calculation of the Aggregate Loss Distribution
The distribution function of aggregate losses at x is the sum over n of the probabilities that the claim count equals n and the sum of n loss sizes is less than or equal to x.

(1) Normal Distribution of Severities. If n random variables $X_i$ are independent and normally distributed with parameters µ and σ², their sum is normally distributed with parameters nµ and nσ².
(2) Exponential and Gamma (Erlang) Distribution of Severities. The sum of n exponential random variables with common mean θ is a gamma distribution with parameters α = n and θ. For an integer α the gamma distribution is also called an Erlang distribution. The probability that n events occur before time x is $F_{S|N=n}(x)$, where S|N = n is Erlang(n, θ) and
$$F_{S|N=n}(x) = 1 - e^{-x/\theta}\sum_{j=0}^{n-1} \frac{(x/\theta)^j}{j!}$$
If S is a compound model with exponential severities,
$$F_S(x) = \sum_{n=0}^{\infty} p_n F_{S|N=n}(x)$$
(3) Negative Binomial/Exponential Compound Models. A compound model with negative binomial frequency with parameters r (an integer) and β, and exponential severities with parameter θ, is equivalent to a compound model with binomial frequency with parameters m = r and q = β/(1 + β) and exponential severities with parameter θ(1 + β).
(4) Compound Poisson Models. Suppose $S_j$ are a set of compound Poisson models with Poisson parameters $\lambda_j$ and severity random variables $X_j$. Then the sum $S = \sum_{j=1}^{n} S_j$ is a compound Poisson model with Poisson parameter $\lambda = \sum_{j=1}^{n} \lambda_j$ and severity having a weighted average, or mixture, distribution of the individual severities $X_j$. The weights are $\lambda_j/\lambda$.

Discretizing
The recursive method for calculating the aggregate distribution, as well as the direct convolution method, requires a discrete severity distribution. Usually the severity distribution is continuous, so discretization is needed.

(1) Method of rounding. If h is the span,
$$f_{kh} = F\big((k + 0.5)h - 0\big) - F\big((k - 0.5)h - 0\big),$$
where F(x − 0) denotes the left-hand limit.
(2) Method of local moment matching. For the interval $x_k = x_0 + kh$ and masses $m_0^k$ and $m_1^k$, solve the following system:
$$\begin{cases} m_0^k + m_1^k = F((k+1)h) - F(kh) \\ x_k m_0^k + x_{k+1} m_1^k = \int_{kh}^{(k+1)h} x f(x)\,dx \end{cases}$$
Then
$$f_{kh} = m_0^k + m_1^{k-1}$$
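The method of rounding for a continuous severity, as a short sketch (the exponential distribution, span, and cutoff are illustrative):

```python
from scipy import stats

def discretize_rounding(dist, h, n_max):
    """Method of rounding: f_0 = F(h/2), f_k = F((k+0.5)h) - F((k-0.5)h).

    For a continuous F the left-hand limits equal F itself."""
    probs = [dist.cdf(h / 2)]
    for k in range(1, n_max + 1):
        probs.append(dist.cdf((k + 0.5) * h) - dist.cdf((k - 0.5) * h))
    return probs

X = stats.expon(scale=1000.0)
f = discretize_rounding(X, h=500.0, n_max=4)
print(f, sum(f))    # the tail beyond (n_max + 0.5)h is not yet assigned
```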

Lesson 29. Maximum Likelihood Estimators

Likelihood formulas

Data type                                        Likelihood contribution
Discrete distribution, individual data           $p_x$
Continuous distribution, individual data         $f(x)$
Grouped data                                     $F(c_j) - F(c_{j-1})$
Individual data censored from above at u         $1 - F(u)$ for censored observations
Individual data censored from below at d         $F(d)$ for censored observations
Individual data truncated from above at u        $\dfrac{f(x)}{F(u)}$
Individual data truncated from below at d        $\dfrac{f(x)}{1 - F(d)}$

Lesson 30. Maximum Likelihood Estimators: Special Techniques

Summary of maximum likelihood formulas
In this table, n is the number of uncensored observations, c is the number of censored observations, $d_i$ is the truncation point for each observation (0 if untruncated), and $x_i$ is the observation if uncensored or the censoring point if censored. The last column (CT?) indicates whether the estimator may be used for right-censored or left-truncated data.

Exponential:                       $\hat\theta = \frac{1}{n}\sum_{i=1}^{n+c}(x_i - d_i)$   [CT: Yes]
Lognormal:                         $\hat\mu = \frac{1}{n}\sum_{i=1}^{n}\ln x_i$, $\hat\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\ln x_i)^2 - \hat\mu^2}$   [CT: No]
Inverse exponential:               $\hat\theta = \dfrac{n}{\sum_{i=1}^{n} 1/x_i}$   [CT: No]
Weibull, fixed τ:                  $\hat\theta = \sqrt[\tau]{\frac{1}{n}\sum_{i=1}^{n+c}(x_i^\tau - d_i^\tau)}$   [CT: Yes]
Uniform [0, θ], individual data:   $\hat\theta = \max x_i$   [CT: No]
Uniform [0, θ], grouped data:      $\hat\theta = c_j (n/n_j)$, where $c_j$ is the upper bound of the highest finite interval and $n_j$ is the number of observations below $c_j$   [CT: No]
Two-parameter Pareto, fixed θ:     $\hat\alpha = -n/K$, $K = \sum_{i=1}^{n+c}\ln(\theta + d_i) - \sum_{i=1}^{n+c}\ln(\theta + x_i)$   [CT: Yes]
Single-parameter Pareto, fixed θ:  $\hat\alpha = -n/K$, $K = \sum_{i=1}^{n+c}\ln(\max(\theta, d_i)) - \sum_{i=1}^{n+c}\ln x_i$   [CT: Yes]
Beta, fixed θ, b = 1:              $\hat a = -n/K$, $K = \sum_{i=1}^{n}\ln x_i - n\ln\theta$   [CT: No]
Beta, fixed θ, a = 1:              $\hat b = -n/K$, $K = \sum_{i=1}^{n}\ln(\theta - x_i) - n\ln\theta$   [CT: No]

Common likelihood functions, and their resulting estimates

When the likelihood function is ...          Then the MLE is ...
$L(\theta) = \theta^{-a} e^{-b/\theta}$      $\hat\theta = b/a$
$L(\theta) = \theta^{a} e^{-b\theta}$        $\hat\theta = a/b$
$L(\theta) = \theta^{a} (1 - \theta)^{b}$    $\hat\theta = \dfrac{a}{a + b}$
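The exponential row of the special-techniques table in code (the data, censoring pattern, and truncation points are invented for illustration):

```python
def mle_exponential(xs, ds, censored):
    """θ̂ = (1/n) Σ (x_i − d_i), where n counts only uncensored observations.

    xs: observation or censoring point; ds: truncation points (0 if none);
    censored: True where the observation is right-censored."""
    n = sum(1 for c in censored if not c)
    return sum(x - d for x, d in zip(xs, ds)) / n

# Three observed losses and one policy censored at 5000, no truncation
print(mle_exponential(xs=[1000, 2500, 4000, 5000],
                      ds=[0, 0, 0, 0],
                      censored=[False, False, False, True]))   # 12500/3 ≈ 4166.7
```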

Lesson 31. Variance of Maximum Likelihood Estimators

Information matrix and Cramér-Rao asymptotic variance
The information matrix for a single parameter θ is:
$$I(\theta) = -E_X\left[\frac{d^2 l}{d\theta^2}\right] = E_X\left[\left(\frac{dl}{d\theta}\right)^2\right]$$
The asymptotic variance is the inverse of the information matrix.

Asymptotic variance of MLEs for common distributions
Let n be the sample size and Var the asymptotic variance.

Exponential:         $Var(\hat\theta) = \dfrac{\theta^2}{n}$
Uniform [0, θ]:      $Var(\hat\theta) = \dfrac{n\theta^2}{(n+1)^2(n+2)}$
Weibull, fixed τ:    $Var(\hat\theta) = \dfrac{\theta^2}{n\tau^2}$
Pareto, fixed θ:     $Var(\hat\alpha) = \dfrac{\alpha^2}{n}$
Pareto, fixed α:     $Var(\hat\theta) = \dfrac{(\alpha+2)\theta^2}{n\alpha}$
Lognormal:           $Var(\hat\mu) = \dfrac{\sigma^2}{n}$, $Var(\hat\sigma) = \dfrac{\sigma^2}{2n}$, $Cov(\hat\mu, \hat\sigma) = 0$

Delta Method
The delta method estimates the variance of a function of a random variable from the variance of the random variable.
1. Delta Method Formula, One Variable:
$$Var(g(X)) \approx Var(X)\left(\frac{dg}{dx}\right)^2$$
2. Delta Method Formula, Two Variables:
$$Var(g(X, Y)) \approx Var(X)\left(\frac{\partial g}{\partial x}\right)^2 + 2\,Cov(X, Y)\,\frac{\partial g}{\partial x}\frac{\partial g}{\partial y} + Var(Y)\left(\frac{\partial g}{\partial y}\right)^2$$
3. Delta Method Formula, General:
$$Var(g(X)) \approx (\partial g)'\,\Sigma\,(\partial g), \qquad \partial g = \left(\frac{\partial g}{\partial x_1}, \frac{\partial g}{\partial x_2}, \cdots, \frac{\partial g}{\partial x_n}\right)'$$
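A worked one-variable delta method example (the estimates, sample size, and the choice g(θ) = e^{−d/θ} are assumptions for illustration):

```python
import math

# X = θ̂, the exponential MLE, with Var(θ̂) = θ²/n; g(θ) = e^{-d/θ} = S(d)
theta_hat, n, d = 1000.0, 100, 500.0

var_theta = theta_hat**2 / n
dg = (d / theta_hat**2) * math.exp(-d / theta_hat)   # dg/dθ of e^{-d/θ}
var_g = dg**2 * var_theta                            # Var(g(X)) ≈ g'(θ)² Var(θ̂)
print(var_g, math.sqrt(var_g))                       # variance and standard error of Ŝ(d)
```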

Lesson 32. Fitting Discrete Distributions

1. For a Poisson with complete data, the method of moments and maximum likelihood estimators of λ are both $\bar x$.
2. For a negative binomial with complete data:
   a. The method of moments estimators are
   $$\hat\beta = \frac{\hat\sigma^2 - \bar x}{\bar x}, \qquad \hat r = \frac{\bar x^2}{\hat\sigma^2 - \bar x}$$
   b. Maximum likelihood sets $\hat r\hat\beta = \bar x$. If one of them is known, the other one is set equal to $\bar x$ divided by the known one.
3. For a binomial with complete data, the method of moments may not set m equal to an integer. Maximum likelihood proceeds by calculating a likelihood profile for each $m \ge \max x_i$. The maximum likelihood estimate of q given m is $\bar x/m$. When the maximum likelihood for m + 1 is less than the one for m, the maximum overall is attained at m.
4. For modified (a, b, 1) distributions, $\hat p_0^M = n_0/n$ and the mean is set equal to the sample mean.
5. Fitting λ of a zero-modified Poisson requires numerical techniques.
6. Fitting q for a zero-modified binomial for fixed m requires solving a high-degree polynomial unless m ≤ 3.
7. Fitting β for a zero-modified negative binomial for fixed r requires numerical techniques except for special values of r, like 1.
8. If you are given a table with varying exposures and claims, and individual claims have a Poisson distribution with the same λ, the maximum likelihood estimate of λ is the sample mean, or the sum of all claims over the sum of all exposures.
9. To choose between (a, b, 0) distributions to fit to data, two methods are available:
   a. Compare the sample variance $\hat\sigma^2$ to the sample mean $\bar x$. Choose binomial if it is less, Poisson if equal, and negative binomial if greater.
   b. Calculate $k n_k/n_{k-1}$, and observe the slope as a function of k. Choose binomial if negative, Poisson if zero, and negative binomial if positive.

Lesson 33. Hypothesis Tests: Graphic Comparison

These plots are constructed to assess how well the model fits the data.

1. D(x) plots

Let $f_n$ be the empirical density function and $F_n$ be the empirical distribution function. Then for a sample $x_1 \le x_2 \le \cdots \le x_n$:
$$F_n(x) = \frac{\text{number of } x_j \le x}{n} \quad \text{and} \quad F_n(x_j) = \frac{j}{n}$$
Let $F^*$ be the fitted distribution function:
$$F^*(x) = \frac{F(x) - F(d)}{1 - F(d)}$$
if observed data are left-truncated at d. Note $F^*(x) = F(x)$ for untruncated data.
Then the D(x) plot is the graph of the function
$$D(x) = F_n(x) - F^*(x)$$

2. p-p plots

Let $F_n$ be the empirical distribution function: for a sample $x_1 \le x_2 \le \cdots \le x_n$,
$$F_n(x_j) = \frac{j}{n + 1}$$
Then the p-p plot is the graph linearly connecting the points
$$\left(F_n(x_j),\ F^*(x_j)\right)$$

Note the difference in the definition of $F_n(x_j)$ in a D(x) plot and a p-p plot.

Lesson 34. Hypothesis Tests: Kolmogorov-Smirnov

Let $F^*$ be the fitted distribution function:
$$F^*(x) = \frac{F(x) - F(d)}{1 - F(d)}$$
if observed data are left-truncated at d, and
$$F^*(x) = \frac{F(x)}{F(u)}$$
if observed data are right-truncated at u. Note $F^*(x) = F(x)$ for untruncated data.

For a sample $x_1 \le x_2 \le \cdots \le x_n$, the Kolmogorov-Smirnov statistic D is defined as $D = \max_j D_j$, where
$$D_j = \max\left(\left|F^*(x_j) - \frac{j}{n}\right|,\ \left|F^*(x_j) - \frac{j-1}{n}\right|\right), \quad \text{if } x_j \ne x_{j+1}, \text{ and}$$
$$D_j = \max\left(\left|F^*(x_j) - \frac{j-1}{n}\right|,\ \left|F^*(x_j) - \frac{j+1}{n}\right|\right), \quad \text{if } x_j = x_{j+1}$$
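A minimal sketch of the statistic for untruncated data without ties (the sample and the fitted exponential are illustrative):

```python
import math

def ks_statistic(xs, fitted_cdf):
    """D = max_j max(|F*(x_j) - j/n|, |F*(x_j) - (j-1)/n|), assuming no ties."""
    xs = sorted(xs)
    n = len(xs)
    return max(max(abs(fitted_cdf(x) - (j + 1) / n), abs(fitted_cdf(x) - j / n))
               for j, x in enumerate(xs))

# Test a small sample against an exponential with theta = 1000
theta = 1000.0
print(ks_statistic([200, 500, 900, 1600, 3000], lambda x: 1 - math.exp(-x / theta)))
```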

Lesson 35. Hypothesis Tests: Chi-square

Chi-square Statistic
Suppose the data are divided into k groups, and let n be the total number of observations. Let $p_j$ be the probability that X is in the j-th group under the hypothesis, $O_j$ be the number of observations in group j, and $E_j = np_j$ be the expected number of observations in group j. Then the chi-square statistic is:
$$Q = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{E_j} = \sum_{j=1}^{k} \frac{O_j^2}{E_j} - n$$

Degrees of freedom
If a distribution with specified parameters is given, or is fitted by a formal approach like maximum likelihood but using a different set of data, then there are k − 1 degrees of freedom. If r parameters are fitted from the data, then there are k − 1 − r degrees of freedom.

Approximation
The chi-square test assumes that the number of observations in each group is approximately normally distributed. To make this approximation work, each group should have at least 5 expected (not actual) observations.

Distribution
The chi-square statistic is a sum of squares of independent standard normal random variables. A chi-square random variable has a gamma distribution with parameters θ = 2 and α = d/2, where d is the number of degrees of freedom. If d = 2, then it is exponential.

If exposures and claims are given for several periods and each period is assumed to be independent, the chi-square statistic is:
$$Q = \sum_{j=1}^{k} \frac{(O_j - E_j)^2}{V_j},$$
where $E_j$ is the fitted expected number and $V_j$ is the fitted variance of the observations in group j. The number of degrees of freedom in this case is k − r, where r is the number of parameters fitted from the data.

Comparison of the three methods of testing goodness of fit

• Data: Kolmogorov-Smirnov and Anderson-Darling should be used only for individual data; chi-square may be used for individual or grouped data.
• Fits: Kolmogorov-Smirnov and Anderson-Darling are only for continuous fits; chi-square works for continuous or discrete fits.
• Censoring (u < ∞): the critical value should be lowered for Kolmogorov-Smirnov and Anderson-Darling; no adjustment of the critical value is needed for chi-square.
• Fitted parameters: the critical value should be lowered for Kolmogorov-Smirnov and Anderson-Darling; the chi-square critical value is automatically adjusted if parameters are fitted.
• Sample size: the Kolmogorov-Smirnov critical value declines with larger sample size; the Anderson-Darling and chi-square critical values are independent of sample size.
• Discretion: no discretion for Kolmogorov-Smirnov or Anderson-Darling; chi-square allows discretion in the grouping of data.
• Weighting: Kolmogorov-Smirnov puts uniform weight on all parts of the distribution; Anderson-Darling puts higher weight on the tails; chi-square puts higher weight on intervals with low fitted probability.

Lesson 36. Likelihood Ratio Test and Algorithm, Penalized Loglikelihood Tests

There are two types of methods for selecting a model: judgment-based and score-based. The highest value of the likelihood function at the maximum, or the likelihood ratio method, is one of the score-based methods.
A free parameter is one that is not specified, and that is therefore estimated using maximum likelihood. The number of free parameters to be estimated is denoted by r.
The number of degrees of freedom for the likelihood ratio test is the number of free parameters in the alternative model, the model of the alternative hypothesis, minus the number of free parameters in the base model, the model of the null hypothesis.
The Likelihood Ratio Test (LRT) accepts the alternative model if its loglikelihood exceeds the loglikelihood of the base model by one-half of the appropriate chi-square percentile (1 minus the significance level of the test) at the number of degrees of freedom for the test: the alternative model is accepted if
$$2(\ln L_1 - \ln L_0) > c, \quad \text{where } \Pr(X > c) = \alpha$$
for X a chi-square random variable with the number of degrees of freedom for the test.
For every number of parameters, the model with the highest loglikelihood is selected.
The Schwarz Bayesian Criterion (SBC)/Bayesian Information Criterion (BIC) subtracts $\frac{r}{2}\ln n$ from the loglikelihood of the model (which is always negative). Then the score is:
$$\ln L - \frac{r}{2}\ln n$$
The model with the highest score is selected.
The Akaike Information Criterion (AIC) subtracts r from the loglikelihood of the model. Then the score is:
$$\ln L - r$$
The model with the highest score is selected.
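The two penalized scores in code, using this sheet's sign convention (the loglikelihoods and sample size are invented for illustration):

```python
import math

def sbc(loglik, r, n):
    # SBC/BIC score: ln L − (r/2) ln n
    return loglik - r / 2 * math.log(n)

def aic(loglik, r):
    # AIC score: ln L − r
    return loglik - r

# Compare a 1-parameter fit against a 2-parameter fit on n = 100 observations
print(sbc(-520.3, 1, 100), sbc(-518.0, 2, 100))   # pick the higher score
print(aic(-520.3, 1), aic(-518.0, 2))
```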

Lesson 38. Limited Fluctuation or Classical Credibility: Poisson Frequency

Let
• $e_F$ be the exposure needed for full credibility
• µ and σ be the expected aggregate claims and the standard deviation per exposure
• $y_p$ be the coefficient from the standard normal distribution for the desired confidence interval, $y_p = \Phi^{-1}((1+p)/2)$
• k be the maximum accepted fluctuation
Then
$$e_F = n_0 \cdot CV^2, \quad \text{where } n_0 = \left(\frac{y_p}{k}\right)^2$$
and $CV = \sigma/\mu$ is the coefficient of variation for the aggregate distribution.

If the claim frequency is Poisson with mean λ, and $\mu_S$, $\sigma_S$ and $CV_S$ are the mean, standard deviation and coefficient of variation of claim severity, then the credibility formulas can be summarized in the following table:

Experience expressed in    Number of claims    Claim size (severity)          Aggregate losses/Pure premium
Exposure units             $n_0/\lambda$       $\dfrac{n_0}{\lambda}CV_S^2$   $\dfrac{n_0}{\lambda}(1 + CV_S^2)$
Number of claims           $n_0$               $n_0 CV_S^2$                   $n_0(1 + CV_S^2)$
Aggregate losses           $n_0\mu_S$          $n_0\mu_S CV_S^2$              $n_0\mu_S(1 + CV_S^2)$

The horizontal axis of the table fills in the * in the statement "You want * to be within k of expected P of the time." The vertical axis of the table fills in the * in the statement "How many * are needed for full credibility?" Also, note that
$$\frac{1 + CV_S^2}{\lambda} = \frac{\mu_S^2 + \sigma_S^2}{\lambda\mu_S^2} = \frac{\sigma^2}{\mu^2}$$
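The full-credibility standard in code (default p and k reproduce the classic 1082-claim standard; the severity CV is an illustrative input):

```python
from scipy import stats

def full_credibility_claims(p=0.90, k=0.05, cv_severity=0.0):
    """Expected claims for full credibility with Poisson frequency.

    n_0 = (y_p/k)^2 with y_p = Φ^{-1}((1+p)/2); multiplied by (1 + CV_S^2)
    when the experience measured is aggregate losses."""
    y_p = stats.norm.ppf((1 + p) / 2)
    n0 = (y_p / k) ** 2
    return n0 * (1 + cv_severity ** 2)

print(full_credibility_claims())                  # ≈ 1082 claims
print(full_credibility_claims(cv_severity=1.0))   # doubled when CV_S = 1
```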

Lesson 39. Limited Fluctuation or Classical Credibility: Non-Poisson Frequency

Using the same notation as in the previous lesson, with the additional notation that $\mu_f$, $\sigma_f$ and $CV_f$ are the mean, standard deviation and coefficient of variation of claim frequency, the credibility formulas can be summarized in the following table:

Experience expressed in    Number of claims                            Claim size (severity)                           Aggregate losses/Pure premium
Exposure units             $n_0\dfrac{\sigma_f^2}{\mu_f^2}$            $n_0\dfrac{\sigma_s^2}{\mu_s^2\mu_f}$           $n_0\left(\dfrac{\sigma_f^2}{\mu_f^2} + \dfrac{\sigma_s^2}{\mu_s^2\mu_f}\right)$
Number of claims           $n_0\dfrac{\sigma_f^2}{\mu_f}$              $n_0\dfrac{\sigma_s^2}{\mu_s^2}$                $n_0\left(\dfrac{\sigma_f^2}{\mu_f} + \dfrac{\sigma_s^2}{\mu_s^2}\right)$
Aggregate losses           $n_0\mu_s\dfrac{\sigma_f^2}{\mu_f}$         $n_0\dfrac{\sigma_s^2}{\mu_s}$                  $n_0\mu_s\left(\dfrac{\sigma_f^2}{\mu_f} + \dfrac{\sigma_s^2}{\mu_s^2}\right)$

Lesson 40. Limited Fluctuation or Classical Credibility: Partial Credibility

Let
• Z be the credibility factor
• M be the manual premium or the prior estimate of total loss (pure premium)
• $\bar X$ be the observed total loss (pure premium)

Then the credibility premium $P_C$ is:
$$P_C = Z\bar X + (1 - Z)M = M + Z(\bar X - M)$$
For n expected claims and $n_F$ expected claims needed for full credibility,
$$Z = \sqrt{\frac{n}{n_F}}$$

Lesson 41. Bayesian Methods: Discrete Prior

Bayes' Theorem:
$$\Pr(A|B) = \frac{\Pr(B|A)\Pr(A)}{\Pr(B)}$$
where the left side is the posterior probability, B is the observations, and A is the prior distribution.
We answer two questions:
1. What is the probability that this risk belongs to some class?
2. What is the expected size of the next loss for this risk?
We'll construct a 4-line table to solve the first type of problem, with 2 additional lines for solving the second type of problem. The table has one column for each type of risk.
1. Prior probability that the risk is in each class.
2. The likelihood of the experience given the class.
3. The probability of being in the class and having the observed experience, or the joint probability. Product of the first two rows. Sum up the entries in the third row. Each entry of the third row is a numerator in the expression for the posterior probability of being in the class given the experience given by Bayes' Theorem, while the sum is the denominator in this expression.
4. Posterior probability of being in each class given the experience. Quotient of the third row over its sum.
5. Expected value, given that the risk is in the class. Also known as the hypothetical means.
6. Expected size of the next loss for this risk, given the experience. Also known as the Bayesian premium. Product of the 4th and 5th rows. Sum up the entries of the 6th row.
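The six-row table as a function (priors, likelihoods, and hypothetical means below are made up for illustration):

```python
def bayesian_premium(priors, likelihoods, hypothetical_means):
    """Rows 3-6 of the table: joint = prior*likelihood, posterior = joint/total,
    premium = Σ posterior * hypothetical mean."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)
    posterior = [j / total for j in joint]
    premium = sum(q * hm for q, hm in zip(posterior, hypothetical_means))
    return posterior, premium

# Two classes, equally likely a priori
print(bayesian_premium(priors=[0.5, 0.5],
                       likelihoods=[0.2, 0.1],
                       hypothetical_means=[100.0, 300.0]))   # posterior (2/3, 1/3), premium 166.67
```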

Lesson 42. Bayesian Methods: Continuous Prior

If the prior distribution is continuous, Bayes' Theorem becomes

$$\pi(\theta|x_1, x_2, \dots, x_n) = \frac{\pi(\theta)f(x_1, x_2, \dots, x_n|\theta)}{f(x_1, x_2, \dots, x_n)} = \frac{\pi(\theta)f(x_1, x_2, \dots, x_n|\theta)}{\int \pi(\theta)f(x_1, x_2, \dots, x_n|\theta)\,d\theta}$$
Here
$$f(x_1, x_2, \dots, x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta)$$
$$f(x_{n+1}|x_1, x_2, \dots, x_n) = \int f(x_{n+1}|\theta)\,\pi(\theta|x_1, x_2, \dots, x_n)\,d\theta$$

Lesson 43. Bayesian Credibility: Poisson/Gamma

Suppose claim frequency is Poisson, with parameter λ varying by insured according to a gamma distribution with parameters α and θ: λ ~ Γ(α, θ). Let γ = 1/θ. Suppose there are n exposures and x claims. Then the posterior distribution of λ is a gamma distribution with parameters α* = α + x and γ* = γ + n, θ* = 1/γ*: (λ|N) ~ Γ(α*, θ*). The posterior mean is
$$P_C = \frac{\alpha_*}{\gamma_*} = \frac{\alpha + n\bar x}{\gamma + n} = \frac{\gamma}{\gamma + n}\cdot\frac{\alpha}{\gamma} + \frac{n}{\gamma + n}\,\bar x,$$
where $Z = n/(\gamma + n)$ is the credibility factor.
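The posterior update and credibility blend in code (the prior parameters and the observed experience are illustrative):

```python
def poisson_gamma_update(alpha, theta, n, x):
    """Posterior gamma parameters and credibility premium for Poisson/gamma.

    alpha, theta: prior gamma; n: exposures; x: total observed claims."""
    gamma = 1 / theta
    alpha_post, gamma_post = alpha + x, gamma + n
    z = n / (gamma + n)
    premium = alpha_post / gamma_post        # = Z*(x/n) + (1 - Z)*alpha*theta
    return alpha_post, 1 / gamma_post, z, premium

# Prior Gamma(2, 0.5) (mean 1 claim/year); 3 years with 5 claims in total
print(poisson_gamma_update(alpha=2.0, theta=0.5, n=3, x=5))   # Z = 0.6, premium = 1.4
```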

Lesson 44. Bayesian Credibility: Normal/Normal

The normal distribution as a prior distribution is the conjugate prior of a model having the normal distribution with a fixed variance. Suppose that the model has a normal distribution with mean θ and fixed variance ν. The prior hypothesis is that θ has a normal distribution with mean µ and variance a. Then the posterior distribution is also normal, with mean µ* and variance a*, where
$$\mu_* = \frac{\nu\mu + na\bar x}{\nu + na} \quad \text{and} \quad a_* = \frac{a\nu}{\nu + na}$$
Here n is the number of claims or person-years and $\bar x$ is the sample mean. If $Z = na/(\nu + na)$ is the credibility factor, then
$$\mu_* = Z\bar x + (1 - Z)\mu = \left(\frac{na}{\nu + na}\right)\bar x + \left(\frac{\nu}{\nu + na}\right)\mu$$

The predictive distribution is also normal, with mean µ* and variance ν + a*.

Lesson 45. Bayesian Credibility: Bernoulli/Beta

1. Bernoulli/Beta
If the prior distribution is a beta with parameters a and b, and you observe n Bernoulli trials with k 1's (successes), then the posterior distribution is beta with parameters a* = a + k and b* = b + n − k. The posterior mean is
$$E[\theta|x] = \frac{a_*}{a_* + b_*}$$
If Z = n/(n + a + b) is the credibility factor, then
$$E[\theta|x] = Z\,\frac{k}{n} + (1 - Z)\,\frac{a}{a + b} = \left(\frac{n}{n + a + b}\right)\frac{k}{n} + \left(\frac{a + b}{n + a + b}\right)\frac{a}{a + b}$$
The predictive distribution for the next claim is also Bernoulli, with mean $q = a_*/(a_* + b_*)$.

2. Negative Binomial/Beta
If the model has a negative binomial distribution with
$$f_{x|p}(x|p) = \binom{r + x - 1}{x}\,p^r(1 - p)^x, \quad x = 0, 1, 2, \dots, \quad p = 1/(1 + \beta)$$
and the distribution of p is beta with parameters a, b and θ = 1, then if you have n observations $x_1, \dots, x_n$ with mean $\bar x$, the posterior distribution is beta with parameters a* = a + nr and b* = b + n\bar x. The predictive mean is:
$$E[\theta|x] = \frac{r b_*}{a_* - 1}$$
If Z = nr/(nr + a − 1) is the credibility factor, then
$$E[\theta|x] = Z\bar x + (1 - Z)\,\frac{rb}{a - 1} = \left(\frac{nr}{nr + a - 1}\right)\bar x + \left(\frac{a - 1}{nr + a - 1}\right)\frac{rb}{a - 1}$$

Lesson 46. Bayesian Credibility: Exponential/Inverse Gamma

1. Assume that claim size has an exponential distribution with mean θ: X|θ ~ Exp(θ):
$$f(x|\theta) = \frac{1}{\theta}e^{-x/\theta}$$
Assume that θ varies by insured according to an inverse gamma distribution with parameters α and β:
$$\pi(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\cdot\frac{e^{-\beta/\theta}}{\theta^{\alpha+1}}$$
If n claims $x_1, \dots, x_n$ are observed, the parameters of the posterior inverse gamma distribution are α* = α + n and β* = β + n\bar x. The predictive mean is:
$$E[\theta|x] = \frac{\beta_*}{\alpha_* - 1}$$
If Z = n/(n + α − 1) is the credibility factor, then
$$E[\theta|x] = Z\bar x + (1 - Z)\mu = \left(\frac{n}{n + \alpha - 1}\right)\bar x + \left(\frac{\alpha - 1}{n + \alpha - 1}\right)\frac{\beta}{\alpha - 1}$$
The predictive distribution is a two-parameter Pareto with the same parameters α, β.

2. If the claim size has an exponential distribution with rate Δ: X|Δ ~ Exp(1/Δ):
$$f(x|\Delta) = \Delta e^{-x\Delta}$$
Assume that Δ varies by insured according to a gamma distribution with parameters α and β; then θ = 1/Δ follows an inverse gamma distribution with parameters α and 1/β. The posterior for θ is inverse gamma with α* = α + n and β* = 1/β + n\bar x, and the posterior for Δ is gamma with (α*, 1/β*).

3. Assume that claim size has a gamma distribution with parameters η and θ: X|θ ~ Gamma(η, θ):
$$f(x|\theta) = \frac{1}{\Gamma(\eta)\theta^\eta}\,x^{\eta-1}e^{-x/\theta}$$
Assume that θ varies by insured according to an inverse gamma distribution with parameters α and β:
$$\pi(\theta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\cdot\frac{e^{-\beta/\theta}}{\theta^{\alpha+1}}$$
If n claims $x_1, \dots, x_n$ are observed, the parameters of the posterior inverse gamma distribution are α* = α + ηn and β* = β + n\bar x.

Lesson 47. Bühlmann Credibility: Basics

The Bühlmann method is a linear approximation of the Bayesian method.
• µ, or EHM, is the expected value of the hypothetical mean, or the overall mean: µ = E[E[X|θ]].
• a, or VHM, is the variance of the hypothetical mean: a = Var(E[X|θ]).
• ν, or EPV, is the expected value of the process variance: ν = E[Var(X|θ)].
• Var(X) = a + ν
• Bühlmann's k: k = ν/a
• Bühlmann's credibility factor Z = n/(n + k), where n is the number of observations: the number of periods when studying frequency or aggregate losses, the number of claims when studying severity.
• $P_C$ is the Bühlmann credibility expectation (see the sketch below):
$$P_C = Z\bar x + (1 - Z)\mu = \mu + Z(\bar x - \mu)$$
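The whole Bühlmann chain in one small function (µ, ν, a, and the experience below are illustrative inputs):

```python
def buhlmann_premium(mu, nu, a, n, x_bar):
    """Bühlmann credibility: k = ν/a, Z = n/(n + k), P_C = Z·x̄ + (1 − Z)·µ."""
    k = nu / a
    z = n / (n + k)
    return z * x_bar + (1 - z) * mu

# Overall mean 0.2 claims, EPV 0.25, VHM 0.05; 4 years averaging 0.5 claims
print(buhlmann_premium(mu=0.2, nu=0.25, a=0.05, n=4, x_bar=0.5))   # k = 5, Z = 4/9
```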

Lesson 48. Bühlmann Credibility: Discrete Prior

The Bayesian method calculates the true expected value. The Bühlmann method is only an approximation.

Lesson 49. Bühlmann Credibility: Continuous Prior

Bühlmann credibility with a continuous prior is no different in principle from Bühlmann credibility with a discrete prior. The task is to identify the hypothetical mean and process variance, then to calculate the mean and variance of the former (µ and a) and the mean of the latter (ν). From there, one can calculate k, Z, and the credibility premium. However, since the prior is continuous, the moments of the hypothetical mean and process variance may require integration rather than summation.

Lesson 50. Bühlmann-Straub Credibility

Generalizations of Bühlmann credibility. The Bühlmann credibility model assumes one exposure in every period.
Bühlmann-Straub: there are $m_j$ exposures in period j.
Hewitt model: an extension of the Bühlmann-Straub model.

Lesson 51. Exact Credibility

Priors, posteriors, predictives, and Bühlmann ν, a, and k for linear exponential model/conjugate prior pairs:

Poisson(λ) model, Gamma(α, γ = 1/θ) prior:
  Posterior: Gamma with $\alpha_* = \alpha + n\bar x$, $\gamma_* = \gamma + n$
  Predictive: Negative binomial with $r = \alpha_*$, $\beta = 1/\gamma_*$
  Bühlmann: $\nu = \alpha\theta$, $a = \alpha\theta^2$, $k = 1/\theta$

Bernoulli(q) model, Beta(a, b) prior:
  Posterior: Beta with $a_* = a + n\bar x$, $b_* = b + n(1 - \bar x)$
  Predictive: Bernoulli with $q = \dfrac{a_*}{a_* + b_*}$
  Bühlmann: $\nu = \dfrac{ab}{(a+b)(a+b+1)}$, $a = \dfrac{ab}{(a+b)^2(a+b+1)}$, $k = a + b$

Normal(θ, ν) model, Normal(µ, a) prior:
  Posterior: Normal with $\mu_* = \dfrac{\nu\mu + na\bar x}{na + \nu}$, $a_* = \dfrac{a\nu}{na + \nu}$
  Predictive: Normal with $\mu = \mu_*$, $\sigma^2 = a_* + \nu$
  Bühlmann: $\nu = \nu$, $a = a$, $k = \nu/a$

Exponential(θ) model, Inverse gamma(α, θ) prior:
  Posterior: Inverse gamma with $\alpha_* = \alpha + n$, $\theta_* = \theta + n\bar x$
  Predictive: Pareto with $\alpha = \alpha_*$, $\theta = \theta_*$
  Bühlmann: $\nu = \dfrac{\theta^2}{(\alpha-1)(\alpha-2)}$, $a = \dfrac{\theta^2}{(\alpha-1)^2(\alpha-2)}$, $k = \alpha - 1$

Lesson 52. Bühlmann as Least Squares Estimate of Bayes

Let $X_i$ be the observations, $Y_i$ the Bayesian predictions, and $\hat Y_i$ the Bühlmann predictions. Suppose we'd like to estimate $Y_i$ by $\hat Y_i$, a linear function of $X_i$: $\hat Y_i = \alpha + \beta X_i$, and we'd like to select α and β in such a way as to minimize the weighted least squares difference:
$$\sum_i p_i\left(\hat Y_i - Y_i\right)^2$$

If $p_i = \Pr(X_i, Y_i)$, then
$$\beta = \frac{Cov(X, Y)}{Var(X)}, \qquad \alpha = E[Y] - \beta E[X]$$
Moreover,
$$E[Y] = E[X]$$
$$Var(X) = \sum p_i X_i^2 - E[X]^2$$
$$Cov(X, Y) = \sum p_i X_i Y_i - E[X]E[Y]$$
Also,
$$Cov(X_i, X_j) = Var(\mu(\Theta)) = a$$
$$Var(X_i) = E[\nu(\Theta)] + Var(\mu(\Theta)) = \nu + a$$

Lesson 53. Empirical Bayes Non-Parametric Methods

Suppose there are r policyholder groups, and each one is followed for $n_i$ years, where $n_i$ may vary by group, i = 1, 2, ..., r. Experience is provided by year.

Uniform exposures:
  $\hat\mu = \bar x$
  $\hat\nu = \dfrac{1}{r(n-1)}\sum_{i=1}^{r}\sum_{j=1}^{n}(x_{ij} - \bar x_i)^2$
  $\hat a = \dfrac{1}{r-1}\sum_{i=1}^{r}(\bar x_i - \bar x)^2 - \dfrac{\hat\nu}{n}$
  $Z = \dfrac{n}{n + k}$
  $P_C = (1 - Z)\hat\mu + Z\bar X_i$

Non-uniform exposures:
  $\hat\mu = \bar x = \dfrac{\sum_i m_i \bar X_i}{\sum_i m_i}$
  $\hat\nu = \dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}(x_{ij} - \bar x_i)^2}{\sum_{i=1}^{r}(n_i - 1)}$
  $\hat a = \left(m - \dfrac{\sum_{i=1}^{r} m_i^2}{m}\right)^{-1}\left(\sum_{i=1}^{r} m_i(\bar x_i - \bar x)^2 - \hat\nu(r - 1)\right)$
  $Z_i = \dfrac{m_i}{m_i + k}$
  $P_C = (1 - Z_i)\bar X + Z_i\bar X_i$
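The uniform-exposure estimators as a runnable sketch (the two-group, three-year data set is invented for illustration):

```python
import numpy as np

def empirical_bayes_uniform(data):
    """Nonparametric empirical Bayes with uniform exposures.

    data: r x n array, one row per policyholder group, one column per year.
    Assumes the resulting â is positive."""
    data = np.asarray(data, dtype=float)
    r, n = data.shape
    row_means = data.mean(axis=1)
    mu = data.mean()                                                  # µ̂
    nu = ((data - row_means[:, None]) ** 2).sum() / (r * (n - 1))     # EPV ν̂
    a = ((row_means - mu) ** 2).sum() / (r - 1) - nu / n              # VHM â
    z = n / (n + nu / a)
    return mu, nu, a, [(1 - z) * mu + z * xb for xb in row_means]

print(empirical_bayes_uniform([[3, 5, 4], [1, 2, 0]]))
```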

Lesson 54. Empirical Bayes Semi-Parametric Methods

I. Poisson Models
$$\hat\mu = \hat\nu = \bar x; \qquad \hat a = s^2 - \frac{\hat\nu}{n}; \qquad s^2 = \frac{\sum (\bar X_i - \bar X)^2}{r - 1}$$
â estimated using empirical Bayes semi-parametric methods may be non-positive. In this case the method fails, and no credibility is given. For non-uniform exposures, use the formulae from Lesson 53 to estimate the values of $\bar x$ and $\hat a$.

II. Non-Poisson Models
If the model is not Poisson, but there is a linear relationship between µ and ν, use the same technique as for a Poisson model. For example:
a) Negative binomial with fixed β: $E[N|r] = r\beta$, $Var[N|r] = r\beta(1 + \beta)$ ⟹ $\hat\mu = \bar x$, $\hat\nu = \bar x(1 + \beta)$
b) Gamma with fixed θ: $E[N|\alpha] = \alpha\theta$, $Var[N|\alpha] = \alpha\theta^2$ ⟹ $\hat\mu = \bar x$, $\hat\nu = \bar x\theta$

III. Which Bühlmann method should be used
The following six Bühlmann methods have been discussed in preparation for this Exam:
1. Bühlmann
2. Bühlmann-Straub
3. Empirical Bayes non-parametric with uniform exposures
4. Empirical Bayes non-parametric with non-uniform exposures
5. Empirical Bayes semi-parametric with uniform exposures
6. Empirical Bayes semi-parametric with non-uniform exposures
The first two methods can only be used if you have a model specifying risk classes with means and variances. The second two methods must be used if all you have is data. The last two methods should be used if, in addition to data, you have a hypothesis that each exposure has a Poisson (or some other) distribution.
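A sketch of the semi-parametric Poisson case for the simplest setting of one year of experience per insured (the claim-count data are invented):

```python
import numpy as np

def semiparametric_poisson(claim_counts):
    """Semi-parametric empirical Bayes, one year per insured, Poisson counts."""
    x = np.asarray(claim_counts, dtype=float)
    mu = nu = x.mean()                 # Poisson: mean = variance, so ν̂ = x̄
    a = x.var(ddof=1) - nu             # â = s² − ν̂; method fails if â ≤ 0
    if a <= 0:
        return mu, 0.0                 # no credibility given
    z = 1 / (1 + nu / a)               # Z = n/(n + k) with n = 1
    return mu, z

# Illustrative portfolio of annual claim counts
print(semiparametric_poisson([0, 0, 1, 0, 2, 0, 0, 1, 3, 0]))
```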