
ACTS 4304 FORMULA SUMMARY

Lesson 1: Basic Probability

Summary of Probability Concepts

Probability Functions

$F(x) = \Pr(X \le x)$
$S(x) = 1 - F(x)$
$f(x) = \dfrac{dF(x)}{dx}$
$H(x) = -\ln S(x)$
$h(x) = \dfrac{dH(x)}{dx} = \dfrac{f(x)}{S(x)}$

Functions of random variables:
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\,dx$

n-th raw moment: $\mu'_n = E[X^n]$
n-th central moment: $\mu_n = E[(X - \mu)^n]$
Variance: $\sigma^2 = E[(X - \mu)^2] = E[X^2] - \mu^2$
Skewness: $\gamma_1 = \dfrac{\mu_3}{\sigma^3} = \dfrac{\mu'_3 - 3\mu'_2\mu + 2\mu^3}{\sigma^3}$
Kurtosis: $\gamma_2 = \dfrac{\mu_4}{\sigma^4} = \dfrac{\mu'_4 - 4\mu'_3\mu + 6\mu'_2\mu^2 - 3\mu^4}{\sigma^4}$
Moment generating function: $M(t) = E[e^{tX}]$
Probability generating function: $P(z) = E[z^X]$

More concepts
• Standard deviation $\sigma$ is the positive square root of the variance.
• Coefficient of variation is $CV = \sigma/\mu$.
• The 100p-th percentile $\pi_p$ is any point satisfying $F(\pi_p^-) \le p$ and $F(\pi_p) \ge p$. If $F$ is continuous, it is the unique point satisfying $F(\pi_p) = p$.
• Median is the 50th percentile; the n-th quartile is the 25n-th percentile.
• Mode is the $x$ which maximizes $f(x)$.
• $M_X^{(n)}(0) = E[X^n]$, where $M^{(n)}$ is the n-th derivative.
• $\dfrac{P_X^{(n)}(0)}{n!} = \Pr(X = n)$
• $P_X^{(n)}(1)$ is the n-th factorial moment of $X$.

Bayes' Theorem
$\Pr(A|B) = \dfrac{\Pr(B|A)\Pr(A)}{\Pr(B)}$

$f_X(x|y) = \dfrac{f_Y(y|x)\,f_X(x)}{f_Y(y)}$

Law of total probability

If $B_i$ is a set of exhaustive (in other words, $\Pr(\cup_i B_i) = 1$) and mutually exclusive (in other words, $\Pr(B_i \cap B_j) = 0$ for $i \ne j$) events, then for any event $A$,
$\Pr(A) = \sum_i \Pr(A \cap B_i) = \sum_i \Pr(B_i)\Pr(A|B_i)$
Correspondingly, for continuous distributions,
$\Pr(A) = \int \Pr(A|x)\,f(x)\,dx$

Conditional Expectation Formula

$E_X[X] = E_Y[E_X[X|Y]]$

Lesson 2: Parametric Distributions

Forms of probability density functions for common distributions

Distribution               Probability density function $f(x)$
Uniform                    $c$, $x \in [d, u]$
Beta                       $c\,x^{a-1}(\theta - x)^{b-1}$, $x \in [0, \theta]$
Exponential                $c\,e^{-x/\theta}$, $x \ge 0$
Weibull                    $c\,x^{\tau-1} e^{-x^\tau/\theta^\tau}$, $x \ge 0$
Gamma                      $c\,x^{\alpha-1} e^{-x/\theta}$, $x \ge 0$
Pareto                     $\dfrac{c}{(x+\theta)^{\alpha+1}}$, $x \ge 0$
Single-parameter Pareto    $\dfrac{c}{x^{\alpha+1}}$, $x \ge \theta$
Lognormal                  $\dfrac{c\,e^{-(\ln x - \mu)^2/2\sigma^2}}{x}$, $x > 0$

Summary of Parametric Distribution Concepts
• If $X$ is a member of a scale family with scale parameter $\theta$ having value $s$, then $cX$ is in the same family and has the same parameter values as $X$, except that the scale parameter $\theta$ has value $cs$.
• All distributions in the tables are scale families with scale parameter $\theta$, except for the lognormal and the inverse Gaussian.
• If $X$ is lognormal with parameters $\mu$ and $\sigma$, then $cX$ is lognormal with parameters $\mu + \ln c$ and $\sigma$.
• See the above table to learn the forms of commonly occurring distributions. Useful facts are:
  Uniform on $[d, u]$: $E[X] = \dfrac{d + u}{2}$, $Var(X) = \dfrac{(u - d)^2}{12}$
  Uniform on $[0, u]$: $E[X^2] = \dfrac{u^2}{3}$
  Gamma: $Var(X) = \alpha\theta^2$
• If $Y$ is single-parameter Pareto with parameters $\alpha$ and $\theta$, then $Y - \theta$ is two-parameter Pareto with parameters $\alpha$ and $\theta$.
• $X$ is in the linear exponential family if its probability density function can be expressed as
  $f(x; \theta) = \dfrac{p(x)\,e^{r(\theta)x}}{q(\theta)}$
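The following minimal sketch (mine, not part of the original summary; it assumes NumPy is available) checks the scale-family property numerically for the exponential distribution, whose scale parameter is $\theta$; the values of theta and c are illustrative:

```python
import numpy as np

# Hedged illustration: for a scale family, c*X stays in the family with the
# scale parameter multiplied by c.  theta and c are made-up values.
rng = np.random.default_rng(seed=1)
theta, c = 100.0, 1.05
x = rng.exponential(scale=theta, size=1_000_000)

# c*X should be exponential with scale c*theta, so its sample mean ~ c*theta.
print(np.mean(c * x))   # ~ 105
print(c * theta)        # 105.0
```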

Lesson 3: Variance

For any random variables $X$ and $Y$:
$E[aX + bY] = aE[X] + bE[Y]$
$Var(aX + bY) = a^2 Var(X) + 2ab\,Cov(X, Y) + b^2 Var(Y)$
For independent random variables $X_1, X_2, \ldots, X_n$:
$Var\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n Var(X_i)$
For independent identically distributed (i.i.d.) random variables $X_1, X_2, \ldots, X_n$:
$Var\left(\sum_{i=1}^n X_i\right) = n\,Var(X)$
The sample mean: $\bar{X} = \dfrac{1}{n}\sum_{i=1}^n X_i$
The variance of the sample mean: $Var(\bar{X}) = \dfrac{1}{n}Var(X)$
Double expectation: $E_X[X] = E_Y[E_X[X|Y]]$

Conditional variance: $Var_X(X) = E_Y[Var_X(X|Y)] + Var_Y(E_X[X|Y])$
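As an illustration of the double expectation and conditional variance formulas (a sketch of mine, not from the summary; all parameter values are made up), consider a Poisson count whose mean $\Lambda$ is itself gamma-distributed:

```python
import numpy as np

rng = np.random.default_rng(seed=2)
alpha, theta = 3.0, 2.0                        # illustrative gamma parameters
lam = rng.gamma(alpha, theta, size=1_000_000)  # hypothetical mean Lambda
n = rng.poisson(lam)                           # N | Lambda ~ Poisson(Lambda)

# Double expectation: E[N] = E[E[N|Lambda]] = E[Lambda] = alpha*theta
print(n.mean(), alpha * theta)
# Conditional variance: Var(N) = E[Var(N|Lambda)] + Var(E[N|Lambda])
#                              = alpha*theta + alpha*theta**2
print(n.var(), alpha * theta + alpha * theta**2)
```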

Lesson 4: Mixtures and Splices
• If $X$ is a mixture of $n$ random variables $X_i$ with weights $w_i$ such that $\sum_{i=1}^n w_i = 1$, then the following can be expressed as weighted averages:
  Cumulative distribution function: $F_X(x) = \sum_{i=1}^n w_i F_{X_i}(x)$
  Probability density function: $f_X(x) = \sum_{i=1}^n w_i f_{X_i}(x)$
  k-th raw moment: $E[X^k] = \sum_{i=1}^n w_i E[X_i^k]$
• Conditional variance:
  $Var_X(X) = E_I[Var_X(X|I)] + Var_I(E_X[X|I])$
• Splices: For a spliced distribution, the sum of the probabilities of being in each splice must add up to 1.

Lesson 5: Policy Limits
All formulas assume $\Pr(X < 0) = 0$.
$E[X] = \int_0^\infty S(x)\,dx$
$E[X \wedge u] = \int_0^u x f(x)\,dx + u\left(1 - F(u)\right) = \int_0^u S(x)\,dx$
$E[X^k] = \int_0^\infty k x^{k-1} S(x)\,dx$
$E[(X \wedge u)^k] = \int_0^u x^k f(x)\,dx + u^k\left(1 - F(u)\right) = \int_0^u k x^{k-1} S(x)\,dx$
For inflation, if $Y = (1 + r)X$, then
$E[Y \wedge u] = (1 + r)\,E\left[X \wedge \dfrac{u}{1+r}\right]$
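To make the limited expected value concrete, here is a small numerical check (my own sketch, assuming NumPy; the scale and limit are illustrative) of $E[X \wedge u] = \int_0^u S(x)\,dx$ against the exponential closed form $\theta(1 - e^{-u/\theta})$:

```python
import numpy as np

theta, u = 1000.0, 500.0                   # illustrative scale and policy limit
closed = theta * (1 - np.exp(-u / theta))  # exponential: E[X ^ u]

# Trapezoidal evaluation of the integral of S(x) = exp(-x/theta) on [0, u].
xs = np.linspace(0.0, u, 100_001)
ys = np.exp(-xs / theta)
numeric = float(((ys[:-1] + ys[1:]) / 2 * np.diff(xs)).sum())
print(closed, numeric)                     # both ~ 393.47
```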

Lesson 6: Deductibles
Payment per Loss:
$F_{Y^L}(x) = F_X(x + d)$, if $Y^L = (X - d)_+$
$E[(X - d)_+] = \int_d^\infty (x - d) f(x)\,dx$
$E[(X - d)_+] = \int_d^\infty S(x)\,dx$
$E[X] = E[X \wedge d] + E[(X - d)_+]$
Payment per Payment:
$F_{Y^P}(x) = \dfrac{F_X(x + d) - F_X(d)}{1 - F_X(d)}$, if $Y^P = (X - d)_+\,|\,X > d$
$S_{Y^P}(x) = \dfrac{S_X(x + d)}{S_X(d)}$, if $Y^P = (X - d)_+\,|\,X > d$
Mean excess loss:
$e_X(d) = \dfrac{E[(X - d)_+]}{S(d)} = \dfrac{E[X] - E[X \wedge d]}{S(d)}$
$e_X(d) = \dfrac{\int_d^\infty (x - d) f(x)\,dx}{S(d)}$
$e_X(d) = \dfrac{\int_d^\infty S(x)\,dx}{S(d)}$
$E[X] = E[X \wedge d] + e(d)\left(1 - F(d)\right)$
Mean excess loss for different distributions:
$e_X(d) = \theta$ for exponential
$e_X(d) = \dfrac{\theta - d}{2}$, $d < \theta$, for uniform on $[0, \theta]$
$e_X(d) = \dfrac{\theta - d}{1 + b}$, $d < \theta$, for beta with parameters $a = 1$, $b$, $\theta$
$e_X(d) = \dfrac{\theta + d}{\alpha - 1}$ for two-parameter Pareto
$e_X(d) = \begin{cases}\dfrac{d}{\alpha - 1}, & d \ge \theta\\ \dfrac{\alpha(\theta - d) + d}{\alpha - 1}, & d \le \theta\end{cases}$ for single-parameter Pareto
If $Y^L, Y^P$ are the loss and payment random variables for a franchise deductible of $d$, and $X^L, X^P$ are the loss and payment random variables for an ordinary deductible of $d$, then
$E[Y^L] = E[X^L] + d\,S(d)$
$E[Y^P] = E[X^P] + d$
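A quick simulation check (mine, with illustrative parameters) of the two-parameter Pareto mean excess loss $e(d) = (\theta + d)/(\alpha - 1)$:

```python
import numpy as np

rng = np.random.default_rng(seed=3)
alpha, theta, d = 3.0, 2000.0, 500.0   # made-up Pareto parameters, deductible
# Pareto by inversion: X = theta * (U^(-1/alpha) - 1), U ~ Uniform(0, 1)
x = theta * (rng.random(2_000_000) ** (-1.0 / alpha) - 1.0)

excess = x[x > d] - d                  # payment per payment
print(excess.mean())                   # simulated e(d)
print((theta + d) / (alpha - 1))       # closed form: 1250
```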

Lesson 7: Loss Elimination Ratio
The Loss Elimination Ratio is defined as the proportion of the expected loss which the insurer doesn't pay as a result of an ordinary deductible $d$:
$LER(d) = \dfrac{E[X \wedge d]}{E[X]} = 1 - \dfrac{E[(X - d)_+]}{E[X]}$
Loss Elimination Ratio for Certain Distributions:
$LER(d) = 1 - e^{-d/\theta}$ for an exponential
$LER(d) = 1 - \left(\dfrac{\theta}{d + \theta}\right)^{\alpha - 1}$ for a Pareto with $\alpha > 1$
$LER(d) = 1 - \dfrac{(\theta/d)^{\alpha - 1}}{\alpha}$ for a single-parameter Pareto with $\alpha > 1$, $d \ge \theta$

Lesson 8: Risk Measures and Tail Weight
Value-at-Risk: $VaR_p(X) = \pi_p = F_X^{-1}(p)$
Tail-Value-at-Risk:
$TVaR_p(X) = E[X \mid X > VaR_p(X)] = \dfrac{\int_{VaR_p(X)}^\infty x f(x)\,dx}{1 - F(VaR_p(X))}$
$= \dfrac{\int_p^1 VaR_y(X)\,dy}{1 - p} = VaR_p(X) + e_X(VaR_p(X))$
$= VaR_p(X) + \dfrac{E[X] - E[X \wedge VaR_p(X)]}{1 - p}$
Value-at-Risk and Tail-Value-at-Risk measures for some distributions:

Distribution   $VaR_p(X)$                                     $TVaR_p(X)$
Exponential    $-\theta\ln(1-p)$                              $\theta\left(1 - \ln(1-p)\right)$
Pareto         $\theta\left((1-p)^{-1/\alpha} - 1\right)$     $E[X]\left(1 + \alpha\left((1-p)^{-1/\alpha} - 1\right)\right)$
Normal         $\mu + z_p\sigma$                              $\mu + \sigma\,\dfrac{e^{-z_p^2/2}}{(1-p)\sqrt{2\pi}}$
Lognormal      $e^{\mu + z_p\sigma}$                          $E[X]\,\dfrac{\Phi(\sigma - z_p)}{1-p}$
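The exponential row of the table can be verified by simulation; this sketch is mine and the parameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(seed=4)
theta, p = 1000.0, 0.95
x = np.sort(rng.exponential(theta, size=1_000_000))

var_sim = x[int(len(x) * p)]          # empirical 95th percentile
tvar_sim = x[x > var_sim].mean()      # mean loss beyond VaR

print(var_sim, -theta * np.log(1 - p))        # ~ 2995.7
print(tvar_sim, theta * (1 - np.log(1 - p)))  # ~ 3995.7
```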

Lesson 9: Other Topics in Severity Coverage Modifications
Policy limit: the maximum amount that the coverage will pay. In the presence of a deductible or other modifications, perform the other modifications first, then apply the policy limit.
Maximum coverage loss: the stipulated amount considered in calculating the payment. Apply this limit first, and then the deductible. If $u$ is the maximum coverage loss and $d$ the deductible, then
$Y^L = X \wedge u - X \wedge d$
Coinsurance of $\alpha$: the portion of each loss reimbursed by insurance. In the presence of all three modifications,
$E[Y^L] = \alpha\left(E[X \wedge u] - E[X \wedge d]\right)$
If $r$ is the inflation factor,
$E[Y^L] = \alpha(1+r)\left(E\left[X \wedge \dfrac{u}{1+r}\right] - E\left[X \wedge \dfrac{d}{1+r}\right]\right)$

Lesson 10: Bonuses
A typical bonus is a portion of the excess of $r\%$ of premiums over losses. If $c$ is the portion of the excess, $r$ is the loss ratio, $P$ is earned premium, and $X$ is losses, then
$B = \max\left(0, c(rP - X)\right) = crP - c\min(rP, X) = crP - c(X \wedge rP)$
For a two-parameter Pareto with $\alpha = 2$ and $\theta$,
$E[X \wedge d] = \dfrac{\theta d}{d + \theta}$

Lesson 11: Discrete Distributions
For the (a, b, 0) class of distributions,
$\dfrac{p_k}{p_{k-1}} = a + \dfrac{b}{k}$, where $p_k = \Pr(X = k)$

            Poisson                               Binomial                         Negative binomial                                                                             Geometric
$p_n$       $e^{-\lambda}\dfrac{\lambda^n}{n!}$   $\binom{m}{n}q^n(1-q)^{m-n}$     $\binom{n+r-1}{n}\left(\dfrac{1}{1+\beta}\right)^r\left(\dfrac{\beta}{1+\beta}\right)^n$     $\dfrac{\beta^n}{(1+\beta)^{n+1}}$
Mean        $\lambda$                             $mq$                             $r\beta$                                                                                      $\beta$
Variance    $\lambda$                             $mq(1-q)$                        $r\beta(1+\beta)$                                                                             $\beta(1+\beta)$
$a$         $0$                                   $-\dfrac{q}{1-q}$                $\dfrac{\beta}{1+\beta}$                                                                      $\dfrac{\beta}{1+\beta}$
$b$         $\lambda$                             $(m+1)\dfrac{q}{1-q}$            $(r-1)\dfrac{\beta}{1+\beta}$                                                                 $0$
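The recursion $p_k/p_{k-1} = a + b/k$ can be checked directly; below is a minimal sketch (mine, not from the summary) for the Poisson member, where $a = 0$ and $b = \lambda$:

```python
import math

lam = 2.5                 # illustrative Poisson mean
a, b = 0.0, lam           # (a, b, 0) parameters for the Poisson

p = math.exp(-lam)        # p_0
for k in range(1, 6):
    p *= a + b / k        # recursion step: p_k = p_{k-1} * (a + b/k)
    direct = math.exp(-lam) * lam**k / math.factorial(k)
    print(k, round(p, 6), round(direct, 6))
```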

For the (a, b, 1) class of distributions, $p_0$ is arbitrary and
$\dfrac{p_k}{p_{k-1}} = a + \dfrac{b}{k}$ for $k = 2, 3, 4, \ldots$
Zero-truncated distributions:
$p_n^T = \dfrac{p_n}{1 - p_0}$, $n > 0$
Zero-modified distributions:
$p_n^M = \left(1 - p_0^M\right)p_n^T$
$E[N] = cm$
$Var(N) = c(1 - c)m^2 + cv$, where
• $c$ is $1 - p_0^M$
• $m$ is the mean of the corresponding zero-truncated distribution
• $v$ is the variance of the corresponding zero-truncated distribution

Lesson 12: Poisson/Gamma
Assume that in a portfolio of insureds, loss frequency follows a Poisson distribution with parameter $\lambda$, but $\lambda$ is not fixed and varies by insured. Suppose $\lambda$ varies according to a gamma distribution over the portfolio of insureds. If the conditional loss frequency of an insured, when you are not given who the insured is, is Poisson with parameter $\lambda$, then the unconditional loss frequency for an insured picked at random is a negative binomial. The parameters of the negative binomial $(r, \beta)$ are the same as the parameters of the gamma distribution $(\alpha, \theta)$: $r = \alpha$, $\beta = \theta$.
For a gamma distribution with parameters $(\alpha, \theta)$, the mean is $\alpha\theta$ and the variance is $\alpha\theta^2$. For a negative binomial distribution, the mean is $r\beta$ and the variance is $r\beta(1 + \beta)$.
If the Poisson parameter for one hour has a gamma distribution with parameters $(\alpha, \theta)$, the Poisson parameter for $k$ hours will have a gamma distribution with parameters $(\alpha, k\theta)$.

Lesson 13: Frequency Exposure and Coverage Modifications
Let $X$ be the severity, $d$ a deductible, and $v$ the probability of paying the claim.

Model                Original Parameters    Exposure Modification     Coverage Modification
                     Exposure $n_1$         Exposure $n_2$            Exposure $n_1$
                     $\Pr(X > d) = 1$       $\Pr(X > d) = 1$          $\Pr(X > d) = v$
Poisson              $\lambda$              $(n_2/n_1)\lambda$        $v\lambda$
Binomial¹            $m, q$                 $(n_2/n_1)m, q$           $m, vq$
Negative binomial    $r, \beta$             $(n_2/n_1)r, \beta$       $r, v\beta$

¹ Note that $(n_2/n_1)m$ must be an integer for the exposure modification formula to work.

These adjustments work for (a, b, 1) distributions as well as (a, b, 0) distributions. For (a, b, 1) distributions, $p_0^M = 1 - \sum_{k=1}^\infty p_k$ is adjusted as follows:
$1 - p_0^{M*} = \left(1 - p_0^M\right)\left(\dfrac{1 - p_0^*}{1 - p_0}\right)$,
where asterisks indicate distributions with revised parameters.

Lesson 14: Aggregate Loss Models: Compound Variance
For the collective risk model the aggregate losses are defined as
$S = \sum_{i=1}^N X_i$,
where $N$ is the number of claims and $X_i$ is the size of each claim.
For the individual risk model the aggregate losses are defined as
$S = \sum_{i=1}^n X_i$,
where $n$ is the number of insureds in the group and $X_i$ is the aggregate claims of each individual member.
For the collective risk model, we assume that aggregate losses have a compound distribution, with frequency being the primary distribution and severity being the secondary distribution.
$E[S] = E[N]E[X]$
$Var(S) = E[N]Var(X) + Var(N)E[X]^2$
For a Poisson primary distribution: $Var(S) = \lambda E[X^2]$

Lesson 15: Aggregate Loss Models: Approximating Distribution
The aggregate distribution may be approximated with a normal distribution:
$F_S(s) = \Pr(S \le s) = \Pr\left(\dfrac{S - E[S]}{\sigma_S} \le \dfrac{s - E[S]}{\sigma_S}\right) \approx \Phi\left(\dfrac{s - E[S]}{\sigma_S}\right)$
If severity is discrete, then the aggregate loss distribution is discrete, and a continuity correction is required: if $X$ assumes values $a$ and $b$, but no value in between, all of the following statements are equivalent:
$X > a$, $X \ge b$, $X > c$ for any $c \in (a, b)$
$X \le a$, $X < b$, $X < c$ for any $c \in (a, b)$
To evaluate probabilities, assume:
$\Pr(X > a) = \Pr(X \ge b) = \Pr\left(X > \dfrac{a + b}{2}\right)$
$\Pr(X \le a) = \Pr(X < b) = \Pr\left(X < \dfrac{a + b}{2}\right)$
If severity has a continuous distribution, no continuity correction is made.
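A small worked check of the continuity correction (my sketch; the numbers are illustrative) for counts, where $S = N$ is Poisson so that $E[S] = \lambda$ and $\sigma_S = \sqrt{\lambda}$:

```python
import math

lam, s = 25.0, 30                 # illustrative Poisson mean and target point
mean, sd = lam, math.sqrt(lam)

# Continuity correction: Pr(S <= 30) = Pr(S < 31) ~ Phi((30.5 - mean)/sd)
z = (s + 0.5 - mean) / sd
approx = 0.5 * (1 + math.erf(z / math.sqrt(2)))

exact = sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(s + 1))
print(approx, exact)              # both ~ 0.86
```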

Lesson 16: Discrete Aggregate Loss Models: Convolution and Recursive Formulae
Let
$p_n = \Pr(N = n) = f_N(n)$
$f_n = \Pr(X = n) = f_X(n)$
$g_n = \Pr(S = n) = f_S(n)$
Then $F_S(x) = \sum_{n \le x} g_n$ and
$g_n = \sum_{k=0}^{\infty} p_k \sum_{i_1 + i_2 + \cdots + i_k = n} f_{i_1} f_{i_2} \cdots f_{i_k}$,
where $\sum_{i_1 + \cdots + i_k = n} \prod_{m=1}^{k} f_{i_m} = f^{*k}(n)$ is the k-fold convolution of the $f$'s.
If $N$ belongs to the (a, b, 0) class, $g_n$ can be calculated recursively:
$g_0 = P_N(f_0)$, where $P_N(z)$ is the probability generating function of the primary distribution,
$g_k = \dfrac{1}{1 - af_0}\sum_{j=1}^{k}\left(a + \dfrac{bj}{k}\right) f_j\, g_{k-j}$, $k = 1, 2, 3, \ldots$
In particular, for a Poisson distribution, where $a = 0$ and $b = \lambda$,
$g_k = \dfrac{\lambda}{k}\sum_{j=1}^{k} j f_j\, g_{k-j}$, $k = 1, 2, 3, \ldots$

If $N$ belongs to the (a, b, 1) class, $g_n$ can be calculated recursively as well:
$g_0 = P_N(f_0)$, where $P_N(z)$ is the probability generating function of the primary distribution,
$g_k = \dfrac{\left(p_1 - (a + b)p_0\right) f_k + \sum_{j=1}^{k}\left(a + \dfrac{bj}{k}\right) f_j\, g_{k-j}}{1 - af_0}$, $k = 1, 2, 3, \ldots$
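Below is a minimal implementation sketch (mine, not from the summary) of the recursion for a compound Poisson, using the simplified form $g_k = (\lambda/k)\sum_j j f_j g_{k-j}$ from above; the severity pmf is made up:

```python
import math

lam = 3.0                          # illustrative Poisson mean
f = [0.0, 0.5, 0.3, 0.2]           # illustrative severity pmf on 0..3

g = [math.exp(-lam * (1 - f[0]))]  # g_0 = P_N(f_0) = exp(-lam*(1 - f_0))
for k in range(1, 11):
    total = sum(j * f[j] * g[k - j] for j in range(1, min(k, len(f) - 1) + 1))
    g.append(lam / k * total)

print(g[:5])                       # aggregate probabilities g_0 .. g_4
print(sum(g))                      # partial sum of the pmf, approaching 1
```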

Lesson 17: Aggregate Losses: Aggregate Deductible

$E[(S - d)_+] = E[S] - E[S \wedge d]$

$p_n = \Pr(N = n) = f_N(n)$
$f_n = \Pr(X = n) = f_X(n)$
$g_n = \Pr(S = n) = f_S(n)$
$F_S(x) = \sum_{n \le x} g_n$

Determine $S_S(x) = 1 - F_S(x)$ and apply
$E[S \wedge d] = \int_0^d S_S(x)\,dx$

Lesson 18: Aggregate Losses: Miscellaneous Topics
Coverage Modifications
If there is a per-policy deductible, the expected annual aggregate payment is either
$E[S] = E[N] \cdot E[(X - d)_+]$ or $E[S] = E[N^P] \cdot e(d)$,
where $E[N^P]$ is the expected number of payments per year and $e(d)$ is the expected payment per payment.
Exact Calculation of the Aggregate Loss Distribution
The distribution function of aggregate losses at $x$ is the sum over $n$ of the probabilities that the claim count equals $n$ and the sum of $n$ loss sizes is less than or equal to $x$.
(1) Normal Distribution of Severities. If $n$ random variables $X_i$ are independent and normally distributed with parameters $\mu$ and $\sigma^2$, their sum is normally distributed with parameters $n\mu$ and $n\sigma^2$.
(2) Exponential and Gamma (Erlang) Distribution of Severities. The sum of $n$ exponential random variables with common mean $\theta$ has a gamma distribution with parameters $\alpha = n$ and $\theta$. For an integer $\alpha$, the gamma distribution is also called an Erlang distribution. The probability that $n$ events occur before time $x$ is $F_{S|N=n}(x)$, where $S|N=n$ is Erlang$(n, \theta)$ and
$F_{S|N=n}(x) = 1 - e^{-x/\theta}\sum_{j=0}^{n-1}\dfrac{(x/\theta)^j}{j!}$
If $S$ is a compound model with exponential severities,
$F_S(x) = \sum_{n=0}^\infty p_n F_{S|N=n}(x)$
(3) Negative Binomial/Exponential Compound Models. A compound model with negative binomial frequency, with parameters $r$ (an integer) and $\beta$, and exponential severities with parameter $\theta$ is equivalent to a compound model with binomial frequency with parameters $m = r$ and $q = \beta/(1+\beta)$ and exponential severities with parameter $\theta(1+\beta)$.
(4) Compound Poisson Models. Suppose $S_j$ are a set of compound Poisson models with Poisson parameters $\lambda_j$ and severity random variables $X_j$. Then the sum $S = \sum_{j=1}^n S_j$ is a compound Poisson model with Poisson parameter $\lambda = \sum_{j=1}^n \lambda_j$ and severity having a weighted average, or mixture, distribution of the individual severities $X_j$. The weights are $\lambda_j/\lambda$.
Discretizing
The recursive method for calculating the aggregate distribution, as well as the direct convolution method, requires a discrete severity distribution. Usually the severity distribution is continuous, so discretization is needed.
(1) Method of rounding. If $h$ is the span,
$f_{kh} = F\big((k + 0.5)h - 0\big) - F\big((k - 0.5)h - 0\big)$
(2) Method of local moment matching. For the points $x_k = x_0 + kh$ and masses $m_0^k$ and $m_1^k$, solve the following system:
$m_0^k + m_1^k = F((k+1)h) - F(kh)$
$x_k m_0^k + x_{k+1} m_1^k = \int_{kh}^{(k+1)h} x f(x)\,dx$
Then
$f_{kh} = m_0^k + m_1^{k-1}$
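As an illustration of the method of rounding (my own sketch with made-up parameters), here is a discretization of an exponential severity on a span-$h$ grid:

```python
import math

theta, h, n_points = 100.0, 50.0, 200   # illustrative scale, span, grid size

def F(x):
    return 1 - math.exp(-x / theta) if x > 0 else 0.0

# F is continuous here, so F(x - 0) = F(x); the mass at 0 uses F(h/2).
pmf = [F(0.5 * h)]
pmf += [F((k + 0.5) * h) - F((k - 0.5) * h) for k in range(1, n_points)]

print(sum(pmf))                                   # ~ 1 (tail beyond grid ignored)
print(sum(k * h * p for k, p in enumerate(pmf)))  # ~ theta
```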

Lesson 19: Review of Mathematical Statistics
Bias: $\text{bias}_{\hat\theta}(\theta) = E[\hat\theta|\theta] - \theta$
The sample mean is unbiased. The sample variance (with division by $n - 1$) is unbiased.
Consistency: We say that the estimator is consistent if, with probability 1, it is arbitrarily close to the true value of the parameter if the sample is large enough, i.e. for all $\delta > 0$,
$\lim_{n\to\infty} \Pr\left(|\hat\theta_n - \theta| < \delta\right) = 1 \quad\Leftrightarrow\quad \lim_{n\to\infty} \Pr\left(|\hat\theta_n - \theta| \ge \delta\right) = 0$
A sufficient condition for consistency is that the estimator is asymptotically unbiased and the variance of the estimator goes to 0 as $n \to \infty$.
Mean square error: $MSE_{\hat\theta}(\theta) = E[(\hat\theta - \theta)^2|\theta] = \text{bias}_{\hat\theta}^2(\theta) + Var(\hat\theta)$

Lesson 20: The Empirical Distribution for Complete Data
Individual Data
(1) $F_n(x)$ - empirical cumulative distribution function.
(2) $f_n(x)$ - empirical probability density function. Equal to $k/n$, where $k$ is the number of $x_i$ in the sample equal to $x$.
(3) $H_n(x) = -\ln S_n(x)$ - empirical cumulative hazard function.
Grouped Data
(1) $F_n(x)$ - empirical cumulative distribution function, called the ogive.
(2) $f_n(x)$ - empirical probability density function, called the histogram:
$f_n(x) = \dfrac{n_j}{n(c_j - c_{j-1})}$,
where $x$ is in the interval $[c_{j-1}, c_j)$, there are $n_j$ points in the interval, and $n$ points altogether.

Lesson 21: Variance of Empirical Estimators with Complete Data
Individual Data
Let $n_x$ be the observed number of survivors past time $x$. Then
$S_n(x) = \dfrac{n_x}{n}$, $F_n(x) = 1 - \dfrac{n_x}{n} = \dfrac{n - n_x}{n}$, and
$\widehat{Var}(S_n(x)) = \widehat{Var}(F_n(x)) = \dfrac{n_x(n - n_x)}{n^3}$
Let $n_x, n_y$ be the observed numbers of survivors past times $x$ and $y$ respectively. Then
$_{y-x}p_x = \dfrac{n_y}{n_x}$, and
$\widehat{Var}(_{y-x}\hat{p}_x \mid n_x) = \widehat{Var}(_{y-x}\hat{q}_x \mid n_x) = \dfrac{(n_x - n_y)n_y}{n_x^3}$
If $n_x$ is the number of observations equal to $x$, then
$\widehat{Var}(f_n(x)) = \dfrac{n_x(n - n_x)}{n^3}$
Grouped Data
For a point $x$ in the interval $[c_{j-1}, c_j)$, if $Y$ is the number of observations less than or equal to $c_{j-1}$ and $Z$ is the number of observations in the interval $[c_{j-1}, c_j)$, then
$\widehat{Var}(S_n(x)) = \dfrac{\widehat{Var}(Y)(c_j - c_{j-1})^2 + \widehat{Var}(Z)(x - c_{j-1})^2 + 2\widehat{Cov}(Y, Z)(c_j - c_{j-1})(x - c_{j-1})}{n^2(c_j - c_{j-1})^2}$
$\widehat{Var}(f_n(x)) = \dfrac{\widehat{Var}(Z)}{n^2(c_j - c_{j-1})^2}$, where
$\widehat{Var}(Y) = \dfrac{Y(n - Y)}{n}$
$\widehat{Var}(Z) = \dfrac{Z(n - Z)}{n}$
$\widehat{Cov}(Y, Z) = -\dfrac{YZ}{n}$

Lesson 22: Kaplan-Meier and Nelson-Aalen Estimators
A Note about Incomplete Data
• When no information is provided for certain ranges of data (observations below a certain number are not available), the data is called truncated. An example is a deductible.
• When a range of values rather than an exact value is provided, the data is said to be censored. An example is a policy limit.

In a typical mortality study, the following notation is used for an individual i:

• $d_i$ entry time
• $u_i$ withdrawal time
• $x_i$ death time
Kaplan-Meier Product Limit Estimator
$\hat{S}(t) = \prod_{i=1}^{j-1}\left(1 - \dfrac{s_i}{r_i}\right)$, $y_{j-1} \le t < y_j$
Nelson-Aalen Estimator
$\hat{H}(t) = \sum_{i=1}^{j-1}\dfrac{s_i}{r_i}$, $y_{j-1} \le t < y_j$
$\hat{S}(t) = e^{-\hat{H}(t)}$
Exponential extrapolation
$\hat{S}(t) = \hat{S}(t_0)^{t/t_0}$, $t \ge t_0$
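A compact sketch of both estimators (mine; the study data below are invented), driven by triples of event time, risk set, and deaths:

```python
import math

# Each tuple is (event time y_j, risk set r_j, deaths s_j) -- illustrative data.
events = [(1.0, 10, 1), (3.0, 8, 2), (5.0, 5, 1)]

s_km, h_na = 1.0, 0.0
for y, r, s in events:
    s_km *= 1 - s / r      # Kaplan-Meier product-limit factor
    h_na += s / r          # Nelson-Aalen increment
    print(f"from t = {y}: S_KM = {s_km:.4f}, exp(-H_NA) = {math.exp(-h_na):.4f}")
```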

Lesson 23: Estimation of Related Quantities
Complete Data
The variance of the empirical distribution is
$\sigma^2 = \dfrac{\sum_{i=1}^n (x_i - \bar{x})^2}{n}$
Grouped Data
(1) $f_n(x)$ - empirical probability density function (the histogram):
$f_n(x) = \dfrac{n_j}{n(c_j - c_{j-1})}$,
where $x$ is in the interval $[c_{j-1}, c_j)$, there are $n_j$ points in the interval, and $n$ points altogether.
(2) $F_n(x)$ - empirical cumulative distribution function, called the ogive.
(3) $E[X]$ and $Var(X)$ are found from the definitions:
$E[X] = \int_0^\infty x f_n(x)\,dx = \int_0^\infty S_n(x)\,dx$
$E[X \wedge d] = \int_0^d x f(x)\,dx + d\left(1 - F(d)\right) = \int_0^d S(x)\,dx$
$E[X^2] = \int_0^\infty x^2 f_n(x)\,dx$
$E[(X \wedge d)^2] = \int_0^d x^2 f(x)\,dx + d^2\left(1 - F(d)\right)$
$Var(X) = E[X^2] - E[X]^2$
Incomplete Data
Use the Kaplan-Meier Product Limit Estimator
$\hat{S}(t) = \prod_{i=1}^{j-1}\left(1 - \dfrac{s_i}{r_i}\right)$, $y_{j-1} \le t < y_j$,
or the Nelson-Aalen Estimator
$\hat{H}(t) = \sum_{i=1}^{j-1}\dfrac{s_i}{r_i}$, $y_{j-1} \le t < y_j$, $\hat{S}(t) = e^{-\hat{H}(t)}$,
to find $\hat{S}(t)$ and $\hat{f}(t)$, and find $E[X]$ and $Var(X)$ from the basic definitions.

Lesson 24: Variance of Kaplan-Meier and Nelson-Aalen Estimators
Greenwood formula for the variance of the Kaplan-Meier estimator:
$\widehat{Var}(S_n(t)) = S_n(t)^2 \sum_{y_j \le t} \dfrac{s_j}{r_j(r_j - s_j)}$
Formula for the variance of the Nelson-Aalen estimator:
$\widehat{Var}\left(\hat{H}(t)\right) = \sum_{y_j \le t} \dfrac{s_j}{r_j^2}$
Linear confidence intervals for $S(t)$ and $H(t)$:
$\left(S_n(t) - z_{(1+p)/2}\sqrt{\widehat{Var}(S_n(t))},\; S_n(t) + z_{(1+p)/2}\sqrt{\widehat{Var}(S_n(t))}\right)$
$\left(\hat{H}(t) - z_{(1+p)/2}\sqrt{\widehat{Var}(\hat{H}(t))},\; \hat{H}(t) + z_{(1+p)/2}\sqrt{\widehat{Var}(\hat{H}(t))}\right)$
Log-transformed confidence interval for $S(t)$:
$\left(S_n(t)^{1/U},\; S_n(t)^U\right)$, where
$U = \exp\left(\dfrac{z_{(1+p)/2}\sqrt{\widehat{Var}(S_n(t))}}{S_n(t)\ln S_n(t)}\right)$
Log-transformed confidence interval for $H(t)$:
$\left(\dfrac{\hat{H}(t)}{U},\; \hat{H}(t)U\right)$, where
$U = \exp\left(\dfrac{z_{(1+p)/2}\sqrt{\widehat{Var}(\hat{H}(t))}}{\hat{H}(t)}\right)$
Conditional probabilities
To calculate conditional probabilities like $_tp_x$, treat the study as if it started at time $x$ and use the Greenwood approximation.

Lesson 25: Kernel Smoothing
Notation and terminology
• Kernel density methods are methods of using kernels to smooth the distribution, even if these methods are used to evaluate the distribution function rather than the density function.
• The kernel density function is $k_{x_i}(x)$.
• The kernel distribution function is $K_{x_i}(x)$.
• The kernel is the distribution for which $k_{x_i}(x)$ and $K_{x_i}(x)$ are the density and distribution functions.
• The kernel-smoothed density function is $\hat{f}(x)$.
• The kernel-smoothed distribution function is $\hat{F}(x)$.
• The kernel-smoothed distribution is the distribution for which $\hat{f}(x)$ and $\hat{F}(x)$ are the density and distribution functions.
$\hat{f}(x) = \sum_{i=1}^n f_n(x_i)\,k_{x_i}(x)$
$\hat{F}(x) = \sum_{i=1}^n f_n(x_i)\,K_{x_i}(x)$
Typically $f_n(x_i) = \dfrac{1}{n}$.
Uniform kernel
$k_{x_i}(x) = \begin{cases}\dfrac{1}{2b} & x_i - b \le x \le x_i + b\\[4pt] 0 & \text{otherwise}\end{cases}$
$K_{x_i}(x) = \begin{cases}0 & x \le x_i - b\\[4pt] \dfrac{x - (x_i - b)}{2b} & x_i - b \le x \le x_i + b\\[4pt] 1 & x > x_i + b\end{cases}$
Triangular kernel
$k_{x_i}(x) = \begin{cases}\dfrac{b - |x - x_i|}{b^2} & x_i - b \le x \le x_i + b\\[4pt] 0 & \text{otherwise}\end{cases}$
$K_{x_i}(x) = \begin{cases}0 & x \le x_i - b\\[4pt] \dfrac{\left(x - (x_i - b)\right)^2}{2b^2} & x_i - b \le x \le x_i\\[4pt] 1 - \dfrac{(x_i + b - x)^2}{2b^2} & x_i \le x \le x_i + b\\[4pt] 1 & x \ge x_i + b\end{cases}$
Epanechnikov kernel
$k_{x_i}(x) = \begin{cases}\dfrac{3}{4b}\left(1 - \left(\dfrac{x - x_i}{b}\right)^2\right) & x_i - b \le x \le x_i + b\\[4pt] 0 & \text{otherwise}\end{cases}$

Moments of kernel-smoothed distributions
Let $X$ be the random variable with the kernel-smoothed distribution, $Y$ the random variable with the original (un-smoothed) distribution, and $b$ the bandwidth. Then $E[X] = E[Y]$ and
$Var(X) = \begin{cases}Var(Y) + \dfrac{b^2}{3} & \text{for a uniform kernel-smoothed density}\\[4pt] Var(Y) + \dfrac{b^2}{6} & \text{for a triangular kernel-smoothed density}\end{cases}$

Lesson 26: Approximations for Large Data Sets
Notation and terminology
• Assume data are grouped into intervals, each one starting at $c_j$ and ending at $c_{j+1}$, where $c_0 < c_1 < \cdots < c_k$.
• $d_j$ is the number of left-truncated observations in the interval $[c_j, c_{j+1})$, i.e. the number of new entrants to the study in this interval.
• $u_j$ is the number of right-censored observations in the interval $(c_j, c_{j+1}]$. For a mortality study, this includes withdrawals or termination of the study: any cause of leaving the study other than death.
• $v_j$ is the number of surrenders in the interval $(c_j, c_{j+1}]$.
• $w_j$ is the number of survivals to the end of the study.
• $u_j = v_j + w_j$
• $x_j$ is the number of events in the interval $(c_j, c_{j+1}]$.
• $r_j$ is the risk set to use for calculating the conditional mortality rate in the interval $(c_j, c_{j+1}]$.
• $P_j$ is the population, or the number of lives, at time $c_j$.
All entries/withdrawals at endpoints
$P_{j+1} = P_j + d_j - u_j - x_j$
$r_j = P_j + d_j$
All entries/withdrawals uniformly distributed
$P_{j+1} = P_j + d_j - u_j - x_j$
$r_j = P_j + 0.5(d_j - v_j)$
Between interval ends
Once the survival function is calculated, it is interpolated (usually linearly) between interval ends rather than extended as a step function (as was done when large data set estimates were not used).

Lesson 27: Method of Moments
Let
$m = \dfrac{\sum_{i=1}^n x_i}{n}$ and $t = \dfrac{\sum_{i=1}^n x_i^2}{n}$
be the first two sample moments of the fitted distribution. Then the parameter estimators for various distributions are as follows:

Distribution   Formulas
Exponential    $\hat\theta = m$
Gamma          $\hat\alpha = \dfrac{m^2}{t - m^2}$, $\hat\theta = \dfrac{t - m^2}{m}$
Pareto         $\hat\alpha = \dfrac{2(t - m^2)}{t - 2m^2}$, $\hat\theta = \dfrac{mt}{t - 2m^2}$
Lognormal      $\hat\mu = 2\ln m - 0.5\ln t$, $\hat\sigma^2 = -2\ln m + \ln t$
Uniform        $\hat\theta = 2m$
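A short check (mine, with simulated data; the true parameters are made up) of the gamma method-of-moments formulas from the table:

```python
import numpy as np

rng = np.random.default_rng(seed=5)
x = rng.gamma(shape=2.0, scale=500.0, size=100_000)  # true alpha=2, theta=500

m = x.mean()                 # first sample moment
t = (x**2).mean()            # second sample moment
alpha_hat = m**2 / (t - m**2)
theta_hat = (t - m**2) / m
print(alpha_hat, theta_hat)  # ~ 2.0 and ~ 500
```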

Lesson 28: Percentile Matching
Percentile matching formulas for some distributions

Distribution          Estimators
Exponential           $\hat\theta = -\dfrac{\pi_p}{\ln(1 - p)}$
Inverse exponential   $\hat\theta = -\pi_p\ln p$
Weibull               $\hat\tau = \dfrac{\ln\left(\ln(1-p)/\ln(1-q)\right)}{\ln(\pi_p/\pi_q)}$, $\hat\theta = \dfrac{\pi_p}{\sqrt[\hat\tau]{-\ln(1-p)}}$
Lognormal             $\hat\sigma = \dfrac{\ln\pi_q - \ln\pi_p}{z_q - z_p}$, $\hat\mu = \ln\pi_p - z_p\hat\sigma$

Lesson 29. Maximum Likelihood Estimators
Likelihood formulas

Situation                                      Likelihood factor
Discrete distribution, individual data         $p_x$
Continuous distribution, individual data       $f(x)$
Grouped data                                   $F(c_j) - F(c_{j-1})$
Individual data censored from above at $u$     $1 - F(u)$ for censored observations
Individual data censored from below at $d$     $F(d)$ for censored observations
Individual data truncated from above at $u$    $\dfrac{f(x)}{F(u)}$
Individual data truncated from below at $d$    $\dfrac{f(x)}{1 - F(d)}$

Lesson 30: Maximum Likelihood Estimators - Special Techniques
Summary of maximum likelihood formulas
In this table, $n$ is the number of uncensored observations, $c$ is the number of censored observations, $d_i$ is the truncation point for each observation (0 if untruncated), and $x_i$ is the observation if uncensored or the censoring point if censored. The last column (CT?) indicates whether the estimator may be used for right-censored or left-truncated data.

Distribution                               Formula                                                                                                                    CT?
Exponential                                $\hat\theta = \dfrac{1}{n}\sum_{i=1}^{n+c}(x_i - d_i)$                                                                     Yes
Lognormal                                  $\hat\mu = \dfrac{1}{n}\sum_{i=1}^{n}\ln x_i$, $\hat\sigma = \sqrt{\dfrac{1}{n}\sum_{i=1}^{n}(\ln x_i)^2 - \hat\mu^2}$     No
Inverse exponential                        $\hat\theta = \dfrac{n}{\sum_{i=1}^{n} 1/x_i}$                                                                             No
Weibull, fixed $\tau$                      $\hat\theta = \sqrt[\tau]{\dfrac{1}{n}\sum_{i=1}^{n+c}(x_i^\tau - d_i^\tau)}$                                              Yes
Uniform $[0, \theta]$, individual data     $\hat\theta = \max x_i$                                                                                                    No
Uniform $[0, \theta]$, grouped data        $\hat\theta = c_j(n/n_j)$, where $c_j$ is the upper bound of the highest finite interval and $n_j$ is the number of observations below $c_j$   No
Two-parameter Pareto, fixed $\theta$       $\hat\alpha = -n/K$, $K = \sum_{i=1}^{n+c}\ln(\theta + d_i) - \sum_{i=1}^{n+c}\ln(\theta + x_i)$                           Yes
Single-parameter Pareto, fixed $\theta$    $\hat\alpha = -n/K$, $K = \sum_{i=1}^{n+c}\ln(\max(\theta, d_i)) - \sum_{i=1}^{n+c}\ln x_i$                                Yes
Beta, fixed $\theta$, $b = 1$              $\hat{a} = -n/K$, $K = \sum_{i=1}^{n}\ln x_i - n\ln\theta$                                                                 No
Beta, fixed $\theta$, $a = 1$              $\hat{b} = -n/K$, $K = \sum_{i=1}^{n}\ln(\theta - x_i) - n\ln\theta$                                                       No

Common likelihood functions, and their resulting estimates

When the likelihood function is ⋯            Then the MLE is ⋯
$L(\theta) = \theta^{-a}e^{-b/\theta}$       $\hat\theta = \dfrac{b}{a}$
$L(\theta) = \theta^{a}e^{-b\theta}$         $\hat\theta = \dfrac{a}{b}$
$L(\theta) = \theta^{a}(1 - \theta)^{b}$     $\hat\theta = \dfrac{a}{a + b}$

Lesson 31. Variance of Maximum Likelihood Estimators
Asymptotic variance of MLEs for common distributions. Here $n$ is the sample size and Var means asymptotic variance.

Distribution              Formula
Exponential               $Var(\hat\theta) = \dfrac{\theta^2}{n}$
Uniform $[0, \theta]$     $Var(\hat\theta) = \dfrac{n\theta^2}{(n+1)^2(n+2)}$
Weibull, fixed $\tau$     $Var(\hat\theta) = \dfrac{\theta^2}{n\tau^2}$
Pareto, fixed $\theta$    $Var(\hat\alpha) = \dfrac{\alpha^2}{n}$
Pareto, fixed $\alpha$    $Var(\hat\theta) = \dfrac{(\alpha+2)\theta^2}{n\alpha}$
Lognormal                 $Var(\hat\mu) = \dfrac{\sigma^2}{n}$, $Cov(\hat\mu, \hat\sigma) = 0$, $Var(\hat\sigma) = \dfrac{\sigma^2}{2n}$

Delta Method
The delta method is a method of estimating the variance of a function of a random variable from the variance of the random variable.
1. Delta Method Formula - One Variable:
$Var(g(X)) \approx Var(X)\left(\dfrac{dg}{dx}\right)^2$
2. Delta Method Formula - Two Variables:
$Var(g(X, Y)) \approx Var(X)\left(\dfrac{\partial g}{\partial x}\right)^2 + 2\,Cov(X, Y)\,\dfrac{\partial g}{\partial x}\dfrac{\partial g}{\partial y} + Var(Y)\left(\dfrac{\partial g}{\partial y}\right)^2$
3. Delta Method Formula - General:
$Var(g(\mathbf{X})) \approx (\partial g)'\,\Sigma\,(\partial g)$, where $\partial g = \left(\dfrac{\partial g}{\partial x_1}, \dfrac{\partial g}{\partial x_2}, \ldots, \dfrac{\partial g}{\partial x_n}\right)'$

Lesson 32. Fitting Discrete Distributions
1. For a Poisson with complete data, the method of moments and maximum likelihood estimators of $\lambda$ are both $\bar{x}$.
2. For a negative binomial with complete data:
a. The method of moments estimators are
$\hat\beta = \dfrac{\hat\sigma^2 - \bar{x}}{\bar{x}}$, $\hat{r} = \dfrac{\bar{x}^2}{\hat\sigma^2 - \bar{x}}$
b. Maximum likelihood sets $\hat{r}\hat\beta = \bar{x}$. If one of them is known, the other one is set equal to $\bar{x}$ divided by the known one.
3. For a binomial with complete data, the method of moments may not set $m$ equal to an integer. Maximum likelihood proceeds by calculating a likelihood profile for each $m \ge \max x_i$. The maximum likelihood estimate of $q$ given $m$ is $\bar{x}/m$. When the maximum likelihood for $m + 1$ is less than the one for $m$, the maximum overall is attained at $m$.
4. For modified (a, b, 1) distributions, $\hat{p}_0^M = n_0/n$ and the mean is set equal to the sample mean.
5. Fitting $\lambda$ of a zero-modified Poisson requires numerical techniques.
6. Fitting $q$ for a zero-modified binomial for fixed $m$ requires solving a high-degree polynomial unless $m \le 3$.
7. Fitting $\beta$ for a zero-modified negative binomial for fixed $r$ requires numerical techniques except for special values of $r$, like 1.
8. If you are given a table with varying exposures and claims, and individual claims have a Poisson distribution with the same $\lambda$, the maximum likelihood estimate of $\lambda$ is the sample mean, or the sum of all claims over the sum of all exposures.
9. To choose between (a, b, 0) distributions to fit to data, two methods are available:
a. Compare the sample variance $\hat\sigma^2$ to the sample mean $\bar{x}$. Choose binomial if it is less, Poisson if equal, and negative binomial if greater.
b. Calculate $k n_k/n_{k-1}$, and observe the slope as a function of $k$. Choose binomial if negative, Poisson if zero, and negative binomial if positive.

Lesson 33. Hypothesis Tests: Graphic Comparison
These plots are constructed to assess how well the model fits the data.

1. D(x) plots

Let $f_n$ be the empirical probability density function and $F_n$ the empirical distribution function. Then for a sample $x_1 \le x_2 \le \cdots \le x_n$:
$F_n(x) = \dfrac{\text{number of } x_j \le x}{n}$ and $F_n(x_j) = \dfrac{j}{n}$
Let $F^*$ be the fitted distribution function:
$F^*(x) = \dfrac{F(x) - F(d)}{1 - F(d)}$
if observed data are left-truncated at $d$. Note $F^*(x) = F(x)$ for untruncated data. Then the D(x) plot is the graph of the function
$D(x) = F_n(x) - F^*(x)$

2. p − p plots

Let $F_n$ be the empirical distribution function: for a sample $x_1 \le x_2 \le \cdots \le x_n$,
$F_n(x_j) = \dfrac{j}{n + 1}$
Then the p-p plot is the graph linearly connecting the points
$\left(F_n(x_j), F^*(x_j)\right)$

Note the difference in the definition of $F_n(x_j)$ in a D(x) plot and a p-p plot.

Lesson 34. Hypothesis Tests: Kolmogorov-Smirnov

For a sample $x_1 \le x_2 \le \cdots \le x_n$, the Kolmogorov-Smirnov statistic $D$ is defined as $D = \max_j D_j$, where
$D_j = \max\left(\left|F^*(x_j) - \dfrac{j}{n}\right|,\; \left|F^*(x_j) - \dfrac{j-1}{n}\right|\right)$, if $x_j \ne x_{j+1}$, and
$D_j = \max\left(\left|F^*(x_j) - \dfrac{j-1}{n}\right|,\; \left|F^*(x_j) - \dfrac{j+1}{n}\right|\right)$, if $x_j = x_{j+1}$
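For distinct observations the statistic reduces to comparing $F^*(x_j)$ with $j/n$ and $(j-1)/n$; here is a minimal sketch (mine, with simulated exponential data and an assumed-true fitted scale):

```python
import numpy as np

rng = np.random.default_rng(seed=6)
theta = 100.0
x = np.sort(rng.exponential(theta, size=200))   # continuous data: no ties
n = len(x)

f_star = 1 - np.exp(-x / theta)                 # fitted cdf at order statistics
j = np.arange(1, n + 1)
D = np.maximum(np.abs(f_star - j / n), np.abs(f_star - (j - 1) / n)).max()
print(D)
```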

Lesson 35. Hypothesis Tests: Anderson-Darling

Let $F_n$ be the empirical distribution function: for a sample $y_1 \le y_2 \le \cdots \le y_n$,
$F_n(y_j) = \dfrac{j}{n}$, $F_n(y) = \dfrac{\text{number of } y_j \le y}{n}$, $S_n(y) = 1 - F_n(y)$,
and let $F^*$ be the fitted distribution function:
$F^*(y) = \dfrac{F(y) - F(t)}{1 - F(t)}$, $S^*(y) = 1 - F^*(y)$
if observed data are left-truncated at $t$. Note $F^*(y) = F(y)$ for untruncated data. Then the Anderson-Darling statistic $A^2$ is:
$A^2 = n\int_t^u \dfrac{\left(F_n(x) - F^*(x)\right)^2}{F^*(x)\left(1 - F^*(x)\right)}\,f^*(x)\,dx$
For individual data, let $t = y_0 < y_1 < y_2 < \cdots < y_k < y_{k+1} = u$, where $t$ is the truncation point and $u$ is the censoring point. Then
$A^2 = -nF^*(u) + n\sum_{j=0}^{k} S_n(y_j)^2\left(\ln S^*(y_j) - \ln S^*(y_{j+1})\right) + n\sum_{j=1}^{k} F_n(y_j)^2\left(\ln F^*(y_{j+1}) - \ln F^*(y_j)\right)$

Lesson 36. Hypothesis Tests: Chi-square
Chi-square Statistic
Suppose the data is divided into $k$ groups, and let $n$ be the total number of observations. Let $p_j$ be the probability that $X$ is in the $j$th group under the hypothesis, let $O_j$ be the number of observations in group $j$, and let $E_j = np_j$ be the expected number of observations in group $j$. Then the chi-square statistic is
$Q = \sum_{j=1}^{k}\dfrac{(O_j - E_j)^2}{E_j} = \sum_{j=1}^{k}\dfrac{O_j^2}{E_j} - n$

Degrees of freedom
If a distribution with parameters is given, or is fitted by a formal approach like maximum likelihood but using a different set of data, then there are $k - 1$ degrees of freedom. If $r$ parameters are fitted from the data, then there are $k - 1 - r$ degrees of freedom.

Approximation
The chi-square test assumes that the number of observations in each group is approximately normally distributed. To make this approximation work, each group should have at least 5 expected (not actual) observations.

Distribution
The chi-square statistic is a sum of squares of independent standard normal random variables. A chi-square random variable has a gamma distribution with parameters $\theta = 2$ and $\alpha = d/2$, where $d$ is the number of degrees of freedom. If $d = 2$, it is exponential.

If exposures and claims are given for several periods and each period is assumed to be independent, the chi-square statistic is
$Q = \sum_{j=1}^{k}\dfrac{(O_j - E_j)^2}{V_j}$
where $E_j$ is the fitted expected number and $V_j$ is the fitted variance of observations in group $j$. The number of degrees of freedom in this case is $k - r$, where $r$ is the number of parameters fitted from the data.
Comparison of the three methods of testing goodness of fit

Kolmogorov-Smirnov:
• Should be used only for individual data
• Only for continuous fits
• Critical value should be lowered if $u < \infty$
• Critical value should be lowered if parameters are fitted
• Critical value declines with larger sample size
• No discretion
• Uniform weight on all parts of the distribution

Anderson-Darling:
• Should be used only for individual data
• Only for continuous fits
• Critical value should be lowered if $u < \infty$
• Critical value should be lowered if parameters are fitted
• Critical value independent of sample size
• No discretion
• Higher weight on tails of the distribution

Chi-square:
• May be used for individual or grouped data
• For continuous or discrete fits
• No adjustment of critical value is needed for $u < \infty$
• Critical value is automatically adjusted if parameters are fitted
• Critical value independent of sample size
• Discretion in grouping of data
• Higher weight on intervals with low fitted probability

Lesson 37. Likelihood Ratio Algorithm, Schwarz Bayesian Criterion (SBC)/Bayesian Information Criterion (BIC), Akaike Information Criterion (AIC)
There are two types of methods for selecting a model: judgment-based and score-based. Selecting the model with the highest value of the likelihood function at its maximum, or the likelihood ratio method, is one of the score-based methods.
A free parameter is one that is not specified, and that is therefore estimated by maximum likelihood. The number of degrees of freedom for the likelihood ratio test is the number of free parameters in the alternative model (the model of the alternative hypothesis) minus the number of free parameters in the base model (the model of the null hypothesis).
The Likelihood Ratio Test (LRT) accepts the alternative model if its loglikelihood exceeds the loglikelihood of the base model by one-half of the appropriate chi-square percentile (1 minus the significance level of the test) at the number of degrees of freedom for the test: the alternative model is accepted if
$2\left(\ln L_1 - \ln L_0\right) > c$, where $\Pr(X > c) = \alpha$
for $X$ a chi-square random variable with the number of degrees of freedom for the test. For every number of parameters, the model with the highest loglikelihood is selected.
The Schwarz Bayesian Criterion (SBC)/Bayesian Information Criterion (BIC) subtracts $\dfrac{r}{2}\ln n$ from the loglikelihood of the model (which is always negative). The score is then
$\ln L - \dfrac{r}{2}\ln n$
The model with the highest score is selected.
The Akaike Information Criterion (AIC) subtracts $r$ from the loglikelihood of the model. The score is then
$\ln L - r$
The model with the highest score is selected.

Lesson 38. Limited Fluctuation Credibility: Poisson Frequency
Limited Fluctuation Credibility is also known as classical credibility. Let
• $e_F$ be the exposure needed for full credibility,
• $\mu$ and $\sigma$ be the expected aggregate claims and the standard deviation per exposure,
• $y_p$ be the coefficient from the standard normal distribution for the desired confidence interval, $y_p = \Phi^{-1}\left((1+p)/2\right)$,
• $k$ be the maximum accepted fluctuation.
Then
$e_F = n_0 \cdot CV^2$, where $n_0 = \left(\dfrac{y_p}{k}\right)^2$ and
$CV = \dfrac{\sigma}{\mu}$ is the coefficient of variation for the aggregate distribution.
If the claim frequency is Poisson with mean $\lambda$, and $\mu_S$, $\sigma_S$ and $CV_S$ are the mean, standard deviation and coefficient of variation of claim severity, then the credibility formulas can be summarized in the following table:

Experience expressed in   Number of claims          Claim size (severity)             Aggregate losses/Pure premium
Exposure units            $\dfrac{n_0}{\lambda}$    $\dfrac{n_0}{\lambda}CV_S^2$      $\dfrac{n_0}{\lambda}\left(1 + CV_S^2\right)$
Number of claims          $n_0$                     $n_0 CV_S^2$                      $n_0\left(1 + CV_S^2\right)$
Aggregate losses          $n_0\mu_S$                $n_0\mu_S CV_S^2$                 $n_0\mu_S\left(1 + CV_S^2\right)$

The horizontal axis of the table fills in the ∗ in the statement "You want ∗ to be within $k$ of expected $P$ of the time." The vertical axis of the table fills in the ∗ in the statement "How many ∗ are needed for full credibility?" Also, note that
$\dfrac{1 + CV_S^2}{\lambda} = \dfrac{\mu_S^2 + \sigma_S^2}{\lambda\mu_S^2} = \dfrac{\sigma^2}{\mu^2}$

Lesson 39. Limited Fluctuation Credibility: Non-Poisson Frequency

Using the same notation as in the previous lesson, with the additional notation that $\mu_f$, $\sigma_f$ and $CV_f$ are the mean, standard deviation and coefficient of variation of claim frequency, the credibility formulas can be summarized in the following table:

Experience expressed in   Number of claims                           Claim size (severity)                       Aggregate losses/Pure premium
Exposure units            $n_0\dfrac{\sigma_f^2}{\mu_f^2}$           $n_0\dfrac{\sigma_s^2}{\mu_s^2\mu_f}$       $n_0\left(\dfrac{\sigma_f^2}{\mu_f^2} + \dfrac{\sigma_s^2}{\mu_s^2\mu_f}\right)$
Number of claims          $n_0\dfrac{\sigma_f^2}{\mu_f}$             $n_0\dfrac{\sigma_s^2}{\mu_s^2}$            $n_0\left(\dfrac{\sigma_f^2}{\mu_f} + \dfrac{\sigma_s^2}{\mu_s^2}\right)$
Aggregate losses          $n_0\mu_s\dfrac{\sigma_f^2}{\mu_f}$        $n_0\dfrac{\sigma_s^2}{\mu_s}$              $n_0\mu_s\left(\dfrac{\sigma_f^2}{\mu_f} + \dfrac{\sigma_s^2}{\mu_s^2}\right)$

Lesson 40. Limited Fluctuation Credibility: Partial Credibility
Let
• $Z$ be the credibility factor,
• $M$ be the manual premium or the prior estimate of total loss (pure premium),
• $\bar{X}$ be the observed total loss (pure premium).
Then the credibility premium $P_C$ is:
$P_C = Z\bar{X} + (1 - Z)M = M + Z(\bar{X} - M)$
For $n$ expected claims and $n_F$ expected claims needed for full credibility,
$Z = \sqrt{\dfrac{n}{n_F}}$

Lesson 41. Bayesian Estimation and Credibility – Discrete Prior
Bayes' Theorem:
$\Pr(A|B) = \dfrac{\Pr(B|A)\Pr(A)}{\Pr(B)}$
where the left side is the posterior probability, $B$ represents the observations, and $A$ the prior class. We answer two questions:
1. What is the probability that this risk belongs to some class?
2. What is the expected size of the next loss for this risk?
We'll construct a 4-line table to solve the first type of problem, with 2 additional lines for solving the second type of problem. The table has one column for each type of risk.
1. Prior probability that the risk is in each class.
2. The likelihood of the experience given the class.
3. The probability of being in the class and having the observed experience, or the joint probability: the product of the first two rows. Sum up the entries in the third row. Each entry of the third row is a numerator in the expression given by Bayes' Theorem for the posterior probability of being in the class given the experience, while the sum is the denominator in this expression.
4. Posterior probability of being in each class given the experience: the quotient of the third row over its sum.
5. Expected value, given that the risk is in the class. Also known as hypothetical means.
6. Expected size of the next loss for this risk, given the experience. Also known as the Bayesian premium: the product of the 4th and 5th rows. Sum up the entries of the 6th row.

Lesson 42. Bayesian Estimation and Credibility – Continuous Prior
If the prior distribution is continuous, Bayes' Theorem becomes
$\pi(\theta|x_1, x_2, \ldots, x_n) = \dfrac{\pi(\theta)f(x_1, x_2, \ldots, x_n|\theta)}{f(x_1, x_2, \ldots, x_n)} = \dfrac{\pi(\theta)f(x_1, x_2, \ldots, x_n|\theta)}{\int \pi(\theta)f(x_1, x_2, \ldots, x_n|\theta)\,d\theta}$
Here
$f(x_1, x_2, \ldots, x_n|\theta) = \prod_{i=1}^{n} f(x_i|\theta)$
$f(x_{n+1}|x_1, x_2, \ldots, x_n) = \int f(x_{n+1}|\theta)\,\pi(\theta|x_1, x_2, \ldots, x_n)\,d\theta$

Lesson 43. Bayesian Credibility: Poisson/Gamma
Suppose claim frequency given $\lambda$ is Poisson with parameter $\lambda$, varying by insured according to a gamma distribution with parameters $\alpha$ and $\theta$: $\lambda \sim \Gamma(\alpha, \theta)$. Let $\gamma = 1/\theta$. Suppose there are $n$ exposures and $x = n\bar{x}$ claims. Then the posterior distribution of $\lambda$ is a gamma distribution with parameters $\alpha^* = \alpha + x$ and $\gamma^* = \gamma + n$, $\theta^* = 1/\gamma^*$: $\lambda|N \sim \Gamma(\alpha^*, \theta^*)$. The posterior mean is
$P_C = \dfrac{\alpha^*}{\gamma^*} = \dfrac{\alpha + n\bar{x}}{\gamma + n} = \dfrac{\gamma}{\gamma + n}\cdot\dfrac{\alpha}{\gamma} + \dfrac{n}{\gamma + n}\,\bar{x}$
where $Z = n/(\gamma + n)$ is the credibility factor.
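A tiny numerical illustration (mine; the prior and data values are invented) that the posterior mean equals the credibility-weighted premium:

```python
alpha, theta = 2.0, 0.5          # illustrative prior: lambda ~ Gamma(2, 0.5)
gamma = 1.0 / theta

n, x = 4, 7                      # 4 exposures with 7 total claims
alpha_post = alpha + x           # conjugate posterior parameters
gamma_post = gamma + n

Z = n / (gamma + n)              # credibility factor
cred = Z * (x / n) + (1 - Z) * (alpha / gamma)
print(alpha_post / gamma_post, cred)   # both 1.5
```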

Lesson 44. Bayesian Credibility: Normal/Normal
The normal distribution as a prior distribution is the conjugate prior of a model having the normal distribution with a fixed variance. Suppose that the model has a normal distribution with mean $\theta$ and fixed variance $\nu$. The prior hypothesis is that $\theta$ has a normal distribution with mean $\mu$ and variance $a$. Then the posterior distribution is also normal, with mean $\mu_*$ and variance $a_*$, where
$\mu_* = \dfrac{\nu\mu + na\bar{x}}{\nu + na}$ and $a_* = \dfrac{a\nu}{\nu + na}$,
$n$ is the number of claims or person-years, and $\bar{x}$ is the sample mean. If $Z = \dfrac{na}{\nu + na}$ is the credibility factor, then
$\mu_* = Z\bar{x} + (1 - Z)\mu = \left(\dfrac{na}{\nu + na}\right)\bar{x} + \left(\dfrac{\nu}{\nu + na}\right)\mu$
The predictive distribution is also normal, with mean $\mu_*$ and variance $\nu + a_*$.

Lesson 45. Bayesian Credibility: Bernoulli/Beta, Negative Binomial/Beta
1. Bernoulli/Beta
If the prior distribution is beta with parameters $a$ and $b$, and you observe $n$ Bernoulli trials with $k$ successes (1's), then the posterior distribution is beta with parameters $a^* = a + k$ and $b^* = b + n - k$. The posterior mean is
$E[\theta|x] = \dfrac{a^*}{a^* + b^*}$
If $Z = n/(n + a + b)$ is the credibility factor, then
$E[\theta|x] = Z\,\dfrac{k}{n} + (1 - Z)\,\dfrac{a}{a + b} = \left(\dfrac{n}{n + a + b}\right)\dfrac{k}{n} + \left(\dfrac{a + b}{n + a + b}\right)\dfrac{a}{a + b}$
The predictive distribution for the next claim is also Bernoulli, with mean $q = a^*/(a^* + b^*)$.
2. Negative Binomial/Beta
If the model has a negative binomial distribution with
$f_{x|p}(x|p) = \binom{r + x - 1}{x}p^r(1 - p)^x$, $x = 0, 1, 2, \ldots$, $p = 1/(1 + \beta)$,
and the distribution of $p$ is beta with parameters $a$, $b$ and $\theta = 1$, then if you have $n$ observations $x_1, \ldots, x_n$ with mean $\bar{x}$, the posterior distribution is beta with parameters $a^* = a + nr$ and $b^* = b + n\bar{x}$. The predictive mean is
$\dfrac{rb^*}{a^* - 1}$
If $Z = nr/(nr + a - 1)$ is the credibility factor, then the predictive mean is
$Z\bar{x} + (1 - Z)\,\dfrac{rb}{a - 1} = \left(\dfrac{nr}{nr + a - 1}\right)\bar{x} + \left(\dfrac{a - 1}{nr + a - 1}\right)\dfrac{rb}{a - 1}$

Lesson 46. Bayesian Credibility: Exponential/Inverse Gamma
1. Assume that claim size has an exponential distribution with mean $\theta$: $X|\theta \sim \text{Exp}(\theta)$:
$f(x|\theta) = \dfrac{1}{\theta}e^{-x/\theta}$
Assume that $\theta$ varies by insured according to an inverse gamma distribution with parameters $\alpha$ and $\beta$:
$\pi(\theta) = \dfrac{\beta^\alpha}{\Gamma(\alpha)}\cdot\dfrac{e^{-\beta/\theta}}{\theta^{\alpha + 1}}$
If $n$ claims $x_1, \ldots, x_n$ are observed, the parameters of the posterior inverse gamma distribution are $\alpha^* = \alpha + n$ and $\beta^* = \beta + n\bar{x}$. The predictive mean is
$E[\theta|x] = \dfrac{\beta^*}{\alpha^* - 1}$
If $Z = n/(n + \alpha - 1)$ is the credibility factor, then
$E[\theta|x] = Z\bar{x} + (1 - Z)\mu = \left(\dfrac{n}{n + \alpha - 1}\right)\bar{x} + \left(\dfrac{\alpha - 1}{n + \alpha - 1}\right)\dfrac{\beta}{\alpha - 1}$
The predictive distribution is a two-parameter Pareto with parameters $\alpha^*$ and $\beta^*$.
2. If the claim size has an exponential distribution with mean $1/\Delta$: $X|\Delta \sim \text{Exp}(1/\Delta)$:
$f(x|\Delta) = \Delta e^{-x\Delta}$
Assume that $\Delta$ varies by insured according to a gamma distribution with parameters $\alpha$ and $\beta$; then $\theta = 1/\Delta$ follows an inverse gamma distribution with parameters $\alpha$ and $1/\beta$. The posterior for $\theta$ is inverse gamma with $\alpha^* = \alpha + n$ and $\beta^* = 1/\beta + n\bar{x}$, and the posterior for $\Delta$ is gamma with $(\alpha^*, 1/\beta^*)$.
3. Assume that claim size has a gamma distribution with parameters $\eta$ and $\theta$: $X|\theta \sim \Gamma(\eta, \theta)$:
$f(x|\theta) = \dfrac{x^{\eta - 1}e^{-x/\theta}}{\Gamma(\eta)\theta^\eta}$
Assume that $\theta$ varies by insured according to an inverse gamma distribution with parameters $\alpha$ and $\beta$:
$\pi(\theta) = \dfrac{\beta^\alpha}{\Gamma(\alpha)}\cdot\dfrac{e^{-\beta/\theta}}{\theta^{\alpha + 1}}$
If $n$ claims $x_1, \ldots, x_n$ are observed, the parameters of the posterior inverse gamma distribution are $\alpha^* = \alpha + \eta n$ and $\beta^* = \beta + n\bar{x}$.

Lesson 47. Bühlmann Credibility: Basics
The Bühlmann method is a linear approximation of the Bayesian method.
• $\mu$, or EHM, is the expected value of the hypothetical mean, or the overall mean: $\mu = E[E[X|\theta]]$.
• $a$, or VHM, is the variance of the hypothetical mean: $a = Var(E[X|\theta])$.
• $\nu$, or EPV, is the expected value of the process variance: $\nu = E[Var(X|\theta)]$.
• $Var(X) = a + \nu$
• Bühlmann's $k$: $k = \nu/a$
• Bühlmann's credibility factor $Z = n/(n + k)$, where $n$ is the number of observations: the number of periods when studying frequency or aggregate losses, the number of claims when studying severity.
• $P_C$ is the Bühlmann credibility expectation:
$P_C = Z\bar{x} + (1 - Z)\mu = \mu + Z(\bar{x} - \mu)$
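A minimal worked example (mine; the two-class model is invented) computing EHM, EPV, VHM, and the credibility premium:

```python
# Two equally likely risk classes with known means and process variances.
classes = [
    {"prob": 0.5, "mean": 0.2, "var": 0.2},
    {"prob": 0.5, "mean": 0.6, "var": 0.4},
]

mu = sum(c["prob"] * c["mean"] for c in classes)              # EHM
nu = sum(c["prob"] * c["var"] for c in classes)               # EPV
a = sum(c["prob"] * c["mean"] ** 2 for c in classes) - mu**2  # VHM

k = nu / a
n, xbar = 3, 1.0                 # three observed periods averaging 1.0
Z = n / (n + k)
print(Z * xbar + (1 - Z) * mu)   # Buhlmann credibility premium
```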

Lesson 48. Bühlmann Credibility: Discrete Prior
The Bayesian method calculates the true expected value. The Bühlmann method is only an approximation.

Lesson 49. Bühlmann Credibility: Continuous Prior
Bühlmann credibility with a continuous prior is no different in principle from Bühlmann credibility with a discrete prior. The task is to identify the hypothetical mean and process variance, then to calculate the mean and variance of the former ($\mu$ and $a$) and the mean of the latter ($\nu$). From there, one can calculate $k$, $Z$, and the credibility premium. However, since the prior is continuous, the moments of the hypothetical mean and process variance may require integration rather than summation.

Lesson 50. Bühlmann-Straub Credibility
Generalizations of Bühlmann credibility. The Bühlmann credibility model assumes one exposure in every period.
Bühlmann-Straub: there are $m_j$ exposures in period $j$.
Hewitt model: an extension of the Bühlmann-Straub model.

Lesson 51. Exact Credibility
Priors, posteriors, predictives, and Bühlmann $\nu$, $a$, and $k$ for linear exponential model/conjugate prior pairs:

Poisson($\lambda$)
  Prior: Gamma($\alpha$, $\theta$), $\gamma = 1/\theta$
  Posterior: Gamma, $\alpha^* = \alpha + n\bar{x}$, $\gamma^* = \gamma + n$
  Predictive: Negative binomial, $r = \alpha^*$, $\beta = 1/\gamma^*$
  Bühlmann: $\nu = \alpha\theta$, $a = \alpha\theta^2$, $k = 1/\theta$

Bernoulli($q$)
  Prior: Beta($a$, $b$)
  Posterior: Beta, $a^* = a + n\bar{x}$, $b^* = b + n(1 - \bar{x})$
  Predictive: Bernoulli, $q = \dfrac{a^*}{a^* + b^*}$
  Bühlmann: $\nu = \dfrac{ab}{(a + b)(a + b + 1)}$, $a = \dfrac{ab}{(a + b)^2(a + b + 1)}$, $k = a + b$

Normal($\theta$, $\nu$)
  Prior: Normal($\mu$, $a$)
  Posterior: Normal, $\mu_* = \dfrac{\nu\mu + na\bar{x}}{na + \nu}$, $a_* = \dfrac{a\nu}{na + \nu}$
  Predictive: Normal, $\mu = \mu_*$, $\sigma^2 = a_* + \nu$
  Bühlmann: $\nu = \nu$, $a = a$, $k = \nu/a$

Exponential($\theta$)
  Prior: Inverse gamma($\alpha$, $\theta$)
  Posterior: Inverse gamma, $\alpha^* = \alpha + n$, $\theta^* = \theta + n\bar{x}$
  Predictive: Pareto, $\alpha = \alpha^*$, $\theta = \theta^*$
  Bühlmann: $\nu = \dfrac{\theta^2}{(\alpha - 1)(\alpha - 2)}$, $a = \dfrac{\theta^2}{(\alpha - 1)^2(\alpha - 2)}$, $k = \alpha - 1$

Lesson 52. Bühlmann as Least Squares Estimate of Bayes
Let $X_i$ be the observations, $Y_i$ the Bayesian predictions, and $\hat{Y}_i$ the Bühlmann predictions. Suppose we'd like to estimate $Y_i$ by $\hat{Y}_i$, a linear function of $X_i$: $\hat{Y}_i = \alpha + \beta X_i$, and we'd like to select $\alpha$ and $\beta$ so as to minimize the weighted least squares difference
$\sum_i p_i\left(\hat{Y}_i - Y_i\right)^2$
If $p_i = \Pr(X_i, Y_i)$, then
$\beta = \dfrac{Cov(X, Y)}{Var(X)}$, $\alpha = E[Y] - \beta E[X]$
Moreover,
$E[Y] = E[X]$
$Var(X) = \sum_i p_i X_i^2 - E[X]^2$
$Cov(X, Y) = \sum_i p_i X_i Y_i - E[X]E[Y]$
Also,
$Cov(X_i, X_j) = Var(\mu(\Theta)) = a$ for $i \ne j$
$Var(X_i) = E[\nu(\Theta)] + Var(\mu(\Theta)) = \nu + a$

Lesson 53. Empirical Bayes non-parametric methods
Suppose there are $r$ policyholder groups, and each one is followed for $n_i$ years, where $n_i$ may vary by group, $i = 1, 2, \ldots, r$. Experience is provided by year.

Quantity     Uniform Exposures                                                               Non-Uniform Exposures
$\hat\mu$    $\bar{x}$                                                                       $\bar{x} = \dfrac{\sum_i m_i\bar{X}_i}{\sum_i m_i}$
$\hat\nu$    $\dfrac{1}{r(n - 1)}\sum_{i=1}^{r}\sum_{j=1}^{n}(x_{ij} - \bar{x}_i)^2$         $\dfrac{\sum_{i=1}^{r}\sum_{j=1}^{n_i} m_{ij}(x_{ij} - \bar{x}_i)^2}{\sum_{i=1}^{r}(n_i - 1)}$
$\hat{a}$    $\dfrac{1}{r - 1}\sum_{i=1}^{r}(\bar{x}_i - \bar{x})^2 - \dfrac{\hat\nu}{n}$    $\left(\sum_{i=1}^{r} m_i(\bar{x}_i - \bar{x})^2 - \hat\nu(r - 1)\right)\left(m - \dfrac{\sum_{i=1}^{r} m_i^2}{m}\right)^{-1}$
$Z$          $\dfrac{n}{n + k}$                                                              $Z_i = \dfrac{m_i}{m_i + k}$
$P_C$        $(1 - Z)\hat\mu + Z\bar{X}_i$                                                   $(1 - Z_i)\bar{X} + Z_i\bar{X}_i$

Lesson 54. Empirical Bayes semi-parametric methods
I. Poisson Models
$\hat\mu = \hat\nu = \bar{x}$; $\hat{a} = s^2 - \hat\nu$; $s^2 = \sum_i \dfrac{(\bar{X}_i - \bar{X})^2}{r - 1}$
$\hat{a}$ estimated using empirical Bayes semi-parametric methods may be non-positive. In this case the method fails, and no credibility is given.
For non-uniform exposures, use the formulae from Lesson 53 to estimate the values of $\bar{x}$ and $\hat{a}$.
II. Non-Poisson Models
If the model is not Poisson, but there is a linear relationship between $\mu$ and $\nu$, use the same technique as for a Poisson model. For example:
a) Negative binomial with fixed $\beta$: $E[N|r] = r\beta$, $Var(N|r) = r\beta(1 + \beta)$ $\Rightarrow$ $\hat\mu = \bar{x}$, $\hat\nu = \bar{x}(1 + \beta)$
b) Gamma with fixed $\theta$: $E[N|\alpha] = \alpha\theta$, $Var(N|\alpha) = \alpha\theta^2$ $\Rightarrow$ $\hat\mu = \bar{x}$, $\hat\nu = \bar{x}\theta$
III. Which Bühlmann method should be used
The following six Bühlmann methods have been discussed in preparation for this exam:
1. Bühlmann
2. Bühlmann-Straub
3. Empirical Bayes non-parametric with uniform exposures
4. Empirical Bayes non-parametric with non-uniform exposures
5. Empirical Bayes semi-parametric with uniform exposures
6. Empirical Bayes semi-parametric with non-uniform exposures
The first two methods can only be used if you have a model specifying risk classes with means and variances. The second two methods must be used if all you have is data. The last two methods should be used if, in addition to data, you have a hypothesis that each exposure has a Poisson (or some other) distribution.

Lesson 55. Simulation - Inversion Method
1. Given a continuous density function $f(x)$, calculate the corresponding cumulative distribution function $F(x)$:
$F(x) = \int_{-\infty}^{x} f(t)\,dt$
and then its inverse $F^{-1}(x)$. Then for any randomly generated $u$ from the uniform distribution on $[0, 1]$, the corresponding simulated $x$ is
$x = F^{-1}(u)$
2. If $F$ does not assume certain values because it jumps, so that $F(c^-) = a$ and $F(c) = b$, and you generate a uniform random number in the interval $[a, b)$, you generate the random number $x = c$.
3. If $F(x) = a$ in a range $[x_1, x_2)$, you then map the uniform random number $a$ to $x_2$.
4. For a discrete distribution with $p_i = \Pr(X = x_i)$, the distribution function is
$F(x) = \sum_{x_i \le x} p_i$
Then a uniform random number in the interval $\left[\sum_{i=1}^{j-1} p_i,\; \sum_{i=1}^{j} p_i\right)$ gets mapped to $x_j$.
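A one-line application of the inversion method (my sketch; the mean is illustrative) to the exponential distribution, whose inverse cdf is $F^{-1}(u) = -\theta\ln(1 - u)$:

```python
import numpy as np

rng = np.random.default_rng(seed=7)
theta = 250.0                    # illustrative exponential mean
u = rng.random(1_000_000)        # uniform random numbers on [0, 1)
x = -theta * np.log(1 - u)       # simulated exponential values
print(x.mean())                  # ~ 250
```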

Lesson 56. Simulation - Number of Data Values to Generate
In the following table:
(1) In the "Confidence Interval" column, the confidence level is $p$.
(2) In the "Number of Runs" column, the estimated quantity should be within $100k\%$ of actual with confidence level $p$.
(3) $\pi = (1 + p)/2$
(4) $z_\pi$ is the $100\pi$th percentile of the standard normal distribution.
(5) $s_n$ is the square root of the unbiased sample variance after $n$ runs:
$\dfrac{s_n^2}{n} = \dfrac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n(n - 1)} = \dfrac{\sum_{i=1}^{n} x_i^2}{n(n - 1)} - \dfrac{\bar{x}^2}{n - 1}$
(6) $P_n$ is the number of runs out of $n$ with values no greater than $x$.
(7) $Y_m$ is the $m$th order statistic.
(8) $n_0 = \left(\dfrac{z_\pi}{k}\right)^2$
(9) $CV^2 = \dfrac{\sigma^2}{\mu^2}$, where $CV$ is the coefficient of variation.

Estimated Item   Confidence Interval                                                                                                                 Number of Runs
Mean             $\left(\bar{x} - z_\pi s_n/\sqrt{n},\; \bar{x} + z_\pi s_n/\sqrt{n}\right)$                                                         $n_0 CV^2$
$F(x)$           $\left(\dfrac{P_n}{n} - z_\pi\sqrt{\dfrac{P_n(n - P_n)}{n^3}},\; \dfrac{P_n}{n} + z_\pi\sqrt{\dfrac{P_n(n - P_n)}{n^3}}\right)$     $n_0(n - P_n)/P_n$
$\pi_q$          $[Y_a, Y_b]$, where $a = \lfloor nq + 0.5 - z_\pi\sqrt{nq(1-q)}\rfloor$ and $b = \lceil nq + 0.5 + z_\pi\sqrt{nq(1-q)}\rceil$       Run until $\hat\pi_q - Y_a \le k\hat\pi_q$ and $Y_b - \hat\pi_q \le k\hat\pi_q$

Lesson 57. Simulation - Applications
Statistical Analysis of Risk Measures
1. From a simulation with $n$ runs, the estimator for VaR is
$\widehat{VaR}_q(X) = Y_k$, where $k = \lfloor nq \rfloor + 1$
and $Y_k$ is the $k$th order statistic from the sample.
2. A $p$-confidence interval is $(Y_a, Y_b)$, where
$a = \lfloor nq + 0.5 - z_{(1+p)/2}\sqrt{nq(1-q)}\rfloor$ and $b = \lceil nq + 0.5 + z_{(1+p)/2}\sqrt{nq(1-q)}\rceil$
3. The estimator for $TVaR_q(X)$ is the sample mean of the order statistics from $\lfloor nq \rfloor + 1$ to $n$.
4. The sample variance of those observations is
$s_q^2 = \dfrac{\sum_{j=k}^{n}\left(Y_j - \widehat{TVaR}_q(X)\right)^2}{n - k}$
5. An asymptotically unbiased estimator of the variance is
$\widehat{Var}\left(\widehat{TVaR}_q(X)\right) = \dfrac{s_q^2 + q\left(\widehat{TVaR}_q(X) - \widehat{VaR}_q(X)\right)^2}{n - k + 1}$

Lesson 58. Bootstrap Approximation