Parameter Estimation for Tweedie Distribution with MCMC Approach Topic Presentation

Tairan Ye
4/12/2018
University of Connecticut

Summary

1. Markov Chain Monte Carlo
2. Tweedie Distribution
3. Parameter Estimation for Tweedie Model with MCMC Approach

Markov Chain Monte Carlo

Introduction

• Fact: The desired parameter estimates often have no analytical form or are extremely hard to solve for.
• Solution: Sampling algorithms such as Markov chain Monte Carlo (MCMC).
• Reason: By constructing a Markov chain, we can sample from the desired distribution by observing the chain after a number of iterations. The distribution of the samples matches the desired distribution more closely as the number of iterations increases.

Example: Monte Carlo Integration

General problem: evaluating $E[h(X)] = \int h(x)\,p(x)\,dx$ can be difficult. However, if we can draw samples

$$X^{(1)}, X^{(2)}, \ldots, X^{(N)} \sim p(x) \tag{1}$$

then we can estimate

$$E[h(X)] \approx \frac{1}{N}\sum_{t=1}^{N} h(X^{(t)}) \tag{2}$$

This is Monte Carlo (MC) integration.

Markov Chain

A Markov chain is generated by sampling

$$X^{(t+1)} \sim p(x \mid x^{(t)}), \quad t = 1, 2, \ldots \tag{3}$$

Here $p$ is the transition kernel, so $X^{(t+1)}$ depends only on $X^{(t)}$ and not on $X^{(0)}, X^{(1)}, \ldots, X^{(t-1)}$. In summary:

$$p(X^{(t+1)} \mid X^{(t)}, X^{(t-1)}, \ldots, X^{(0)}) = p(X^{(t+1)} \mid X^{(t)}) \tag{4}$$

An Example of a Markov Chain

A first-order autoregressive process with lag-1 autocorrelation 0.5:

$$X^{(t+1)} \mid X^{(t)} \sim N(0.5\,x^{(t)}, 1) \tag{5}$$

A simulation is run from two different starting points:

Figure 1: Simulation with two starting points

The chains appear to have forgotten their starting positions.

Sampling Algorithm Overview

The most fundamental MCMC sampling algorithms are:

• Gibbs sampling
  1. Requires the analytic form of the full conditional distributions.
  2. Highly efficient: the sequence converges in few iterations.
• Metropolis-Hastings algorithm
  1. More flexible: the full conditional distributions need not be derived.
  2. Setting up the proposal distribution is challenging: experience is needed to keep the acceptance rate between 20% and 30%.

Full Conditional Distribution and Gibbs Sampling

• Full conditional distributions: The distributions $p(\theta \mid \sigma^2, y_1, \ldots, y_n)$ and $p(\sigma^2 \mid \theta, y_1, \ldots, y_n)$ are called the full conditional distributions of $\theta$ and $\sigma^2$ respectively, as each is the conditional distribution of a parameter given everything else.
• Gibbs sampler: Given a current state of the parameters $\phi^{(s)} = \{\theta^{(s)}, \sigma^{2(s)}\}$, we generate a new state as follows:
  1. sample $\theta^{(s+1)} \sim p(\theta \mid \sigma^{2(s)}, y_1, \ldots, y_n)$;
  2. sample $\sigma^{2(s+1)} \sim p(\sigma^2 \mid \theta^{(s+1)}, y_1, \ldots, y_n)$;
  3. update $\phi^{(s+1)} = \{\theta^{(s+1)}, \sigma^{2(s+1)}\}$.

Generalized Gibbs Sampler

Suppose you have a vector of parameters $\phi = \{\phi_1, \ldots, \phi_p\}$, and your information about $\phi$ is measured with $p(\phi) = p(\phi_1, \ldots, \phi_p)$. For example, in the normal model $\phi = \{\theta, \sigma^2\}$, and the probability measure of interest is $p(\theta, \sigma^2 \mid y_1, \ldots, y_n)$. Given a starting point $\phi^{(0)} = \{\phi_1^{(0)}, \ldots, \phi_p^{(0)}\}$, the Gibbs sampler generates $\phi^{(s)}$ from $\phi^{(s-1)}$ as follows:

• sample $\phi_1^{(s)} \sim p(\phi_1 \mid \phi_2^{(s-1)}, \phi_3^{(s-1)}, \ldots, \phi_p^{(s-1)})$
• sample $\phi_2^{(s)} \sim p(\phi_2 \mid \phi_1^{(s)}, \phi_3^{(s-1)}, \ldots, \phi_p^{(s-1)})$
  ......
• sample $\phi_p^{(s)} \sim p(\phi_p \mid \phi_1^{(s)}, \phi_2^{(s)}, \ldots, \phi_{p-1}^{(s)})$

Markov property: In this sequence, $\phi^{(s)}$ depends on $\phi^{(0)}, \ldots, \phi^{(s-1)}$ only through $\phi^{(s-1)}$, i.e. $\phi^{(s)}$ is conditionally independent of $\phi^{(0)}, \ldots, \phi^{(s-2)}$ given $\phi^{(s-1)}$.

Metropolis Algorithm

The Metropolis algorithm proceeds by sampling a proposal value $\theta^*$ near the current value $\theta^{(s)}$ using a symmetric proposal distribution $J(\theta^* \mid \theta^{(s)})$. Usually $J(\theta^* \mid \theta^{(s)})$ is very simple, with samples from it being near $\theta^{(s)}$ with high probability. Examples include:

• $J(\theta^* \mid \theta^{(s)}) = U(\theta^{(s)} - \delta, \theta^{(s)} + \delta)$
• $J(\theta^* \mid \theta^{(s)}) = N(\theta^{(s)}, \delta^2)$

Metropolis Algorithm

The algorithm is as follows:

1. Sample $\theta^* \sim J(\theta^* \mid \theta^{(s)})$.
2. Compute the acceptance ratio:

$$r = \frac{p(\theta^* \mid y)}{p(\theta^{(s)} \mid y)} = \frac{p(y \mid \theta^*)\,p(\theta^*)}{p(y \mid \theta^{(s)})\,p(\theta^{(s)})} \tag{6}$$

3. Update:

$$\theta^{(s+1)} = \begin{cases} \theta^*, & \text{with probability } \min(r, 1) \\ \theta^{(s)}, & \text{with probability } 1 - \min(r, 1) \end{cases} \tag{7}$$

• Step 3 can be accomplished by sampling $u \sim U(0, 1)$ and setting $\theta^{(s+1)} = \theta^*$ if $u < r$, and $\theta^{(s+1)} = \theta^{(s)}$ otherwise.
• In many cases, computing the ratio $r$ directly is numerically unstable, a problem that can often be remedied by computing the logarithm of $r$ instead, which follows from (6), and accepting when $\log u < \log r$:

$$\log r = \log p(y \mid \theta^*) + \log p(\theta^*) - \log p(y \mid \theta^{(s)}) - \log p(\theta^{(s)})$$

Metropolis-Hastings Algorithm

The Metropolis-Hastings algorithm generalizes both Gibbs sampling and the Metropolis algorithm and extends naturally to several parameters. Consider a simple example where the target distribution is $p_0(u, v)$, a bivariate distribution for two random variables $U$ and $V$.

1. Update $U$:
  • Sample $u^* \sim J_u(u^* \mid u^{(s)}, v^{(s)})$.
  • Compute the acceptance ratio:

$$r = \frac{p_0(u^*, v^{(s)})}{p_0(u^{(s)}, v^{(s)})} \times \frac{J_u(u^{(s)} \mid u^*, v^{(s)})}{J_u(u^* \mid u^{(s)}, v^{(s)})} \tag{8}$$

  • Update:

$$u^{(s+1)} = \begin{cases} u^*, & \text{with probability } \min(r, 1) \\ u^{(s)}, & \text{with probability } 1 - \min(r, 1) \end{cases} \tag{9}$$

  • In practice, sample $w \sim U(0, 1)$ and set $u^{(s+1)} = u^*$ if $w < r$, and $u^{(s+1)} = u^{(s)}$ otherwise.

Metropolis-Hastings Algorithm

After updating $U$, we continue with $V$:

2. Update $V$:
  • Sample $v^* \sim J_v(v^* \mid u^{(s+1)}, v^{(s)})$.
  • Compute the acceptance ratio:

$$r = \frac{p_0(u^{(s+1)}, v^*)}{p_0(u^{(s+1)}, v^{(s)})} \times \frac{J_v(v^{(s)} \mid u^{(s+1)}, v^*)}{J_v(v^* \mid u^{(s+1)}, v^{(s)})} \tag{10}$$

  • Update:

$$v^{(s+1)} = \begin{cases} v^*, & \text{with probability } \min(r, 1) \\ v^{(s)}, & \text{with probability } 1 - \min(r, 1) \end{cases} \tag{11}$$

  • In practice, sample $w \sim U(0, 1)$ and set $v^{(s+1)} = v^*$ if $w < r$, and $v^{(s+1)} = v^{(s)}$ otherwise.

Proposal Distribution

Choosing the proposal distribution is the most challenging part of running the Metropolis algorithm.

• Without restriction: normal distribution, $\theta^* \sim N(\theta_{t-1}, \sigma_p^2)$.
• With restriction: random walk on a transformed scale, $f(\theta^*) = f(\theta_{t-1}) + \varepsilon$ with $\varepsilon \sim N(0, \sigma_p^2)$.
  Example: if $\theta > 0$, apply the log function; $\theta^*$ can then be generated from $\text{lognormal}(\log(\theta_{t-1}), \sigma_p^2)$.

Combination

In complex models, conditional distributions are often available for some parameters but not for others.
In these situations, we can combine Gibbs and Metropolis-type updates to generate a Markov chain that approximates the joint distribution of all the parameters.

More to Talk About

1. Stationarity: As $t \to \infty$, the Markov chain converges to its stationary distribution.
   Technique: trace plot, effective sample size (ESS)
   Package: 'coda' in R
2. Burn-in: Typically, we discard the initial simulated values as burn-in, since they are most likely nonstationary, and study the remaining part.
3. Gap (thinning): To reduce the dependence between successive draws, we keep only the simulated values separated by a fixed gap.
4. Chains: We also run more than one chain to check that the sampler is representative of the target distribution. Note that computing the deviance information criterion (DIC) in JAGS requires at least two chains.

Example

Figure 2: The trace plot of the raw MCMC

Example

Figure 3: The trace plot of the updated sequence

Software

• WinBUGS (Bayesian inference Using Gibbs Sampling)
• JAGS (Just Another Gibbs Sampler)
• Stan: interfaces with most data science software across multiple platforms
• R packages: rjags, coda, mcmc, BRugs, etc.

Tweedie Distribution

Tweedie Distribution Overview

The Tweedie distribution can be written as $Tw(\mu, \phi, p)$.

• $\mu$ is the mean of the Tweedie distribution.
• $\phi$ is the dispersion parameter.
• $p$, the power parameter, determines which distribution within the family it is.

Properties:

• A special case of exponential dispersion models.
• Variance $= \phi\mu^p$.
• The case of most interest is $p \in (1, 2)$, which defines the compound Poisson-gamma distribution.

Compound Poisson-gamma Distribution

Because of its special property, a point mass at zero and a continuous distribution on the positive values, the Tweedie model with $p \in (1, 2)$, i.e. the compound Poisson-gamma distribution, is popular in the insurance industry. Denote the total auto insurance claim amount in dollars by $Y$, the claim frequency by $N$, and the severity of each claim by $Z$.
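Then we have:

$$Y = \sum_{n=1}^{N} Z_n \tag{12}$$

$$N \sim \text{Poisson}(\lambda) \tag{13}$$

$$Z \sim \text{gamma}(\alpha, \beta) \tag{14}$$

$$N \perp Z \tag{15}$$

Compound Poisson-gamma Distribution

We say $Y \sim Tw(\mu, \phi, p)$. Matching the mean and the variance gives:

$$\lambda\alpha\beta = \mu \tag{16}$$

$$\lambda\alpha(\alpha + 1)\beta^2 = \phi\mu^p \tag{17}$$

Here $\beta$ is the scale of the gamma distribution, so that $E[Z] = \alpha\beta$. Solving for the compound Poisson-gamma parameters:

$$\lambda = \frac{\mu^{2-p}}{\phi(2 - p)} \tag{19}$$

$$\alpha = \frac{2 - p}{p - 1} \tag{20}$$

$$\beta = \phi(p - 1)\mu^{p-1} \tag{21}$$

Compound Poisson-gamma Distribution

The probability function of the Tweedie distribution with power $p \in (1, 2)$ is:

1. For $y = 0$:

$$P[Y = 0] = \exp\{-\lambda\} = \exp\left\{-\frac{\mu^{2-p}}{\phi(2 - p)}\right\} \tag{22}$$

2. For $y > 0$:

$$f_Y(y) = \sum_{n=1}^{\infty} \frac{1}{\Gamma(n\alpha)\,\beta^{n\alpha}}\, y^{n\alpha - 1} e^{-y/\beta} \cdot \frac{\lambda^n}{n!}\, e^{-\lambda}$$

$$= \exp\left\{-\frac{y}{\phi(p - 1)\mu^{p-1}} - \frac{\mu^{2-p}}{\phi(2 - p)}\right\} \frac{1}{y} \sum_{j=1}^{\infty} \frac{\left[\left(\frac{y}{\phi(p-1)}\right)^{\frac{2-p}{p-1}} \frac{1}{2-p}\right]^{j}}{\phi^{j}\,\Gamma\!\left(\frac{(2-p)j}{p-1}\right) j!} \tag{23}$$

Parameter Estimation for Tweedie Model with MCMC Approach

Model

To study the key factors that affect auto insurance claims, we can set up:

$$\log(\mu_s) = X_s\gamma + \varepsilon_{\text{random}} \tag{24}$$

$$\varepsilon_{\text{random}} \sim N(0, \sigma^2) \tag{25}$$

where $s$ indexes the location; $X_s$ contains the information about policyholders, including personal as well as vehicle information; $\gamma$ holds the coefficients of the factors we are interested in; and $\varepsilon_{\text{random}}$ covers all other information not discussed here.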
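The relations (16), (17) and (19) to (21) for the compound Poisson-gamma distribution can be checked numerically. The following is a minimal Python sketch, not part of the original slides: the function names `tweedie_to_cpg`, `sample_poisson` and `sample_tweedie` and the parameter values are illustrative assumptions, and the gamma distribution is assumed to be parameterized by shape $\alpha$ and scale $\beta$. It converts $(\mu, \phi, p)$ to $(\lambda, \alpha, \beta)$, simulates $Y$ as a Poisson number of gamma claims, and compares the sample mean and the fraction of zeros with $\mu$ and $\exp\{-\lambda\}$.

```python
import math
import random


def tweedie_to_cpg(mu, phi, p):
    """Map Tweedie (mu, phi, p) with 1 < p < 2 to (lam, alpha, beta)."""
    if not 1.0 < p < 2.0:
        raise ValueError("p must lie strictly between 1 and 2")
    lam = mu ** (2.0 - p) / (phi * (2.0 - p))   # Poisson rate, eq. (19)
    alpha = (2.0 - p) / (p - 1.0)               # gamma shape, eq. (20)
    beta = phi * (p - 1.0) * mu ** (p - 1.0)    # gamma scale, eq. (21)
    return lam, alpha, beta


def sample_poisson(lam, rng):
    """Draw N ~ Poisson(lam) by counting unit-rate arrivals before time lam."""
    n = 0
    t = rng.expovariate(1.0)
    while t < lam:
        n += 1
        t += rng.expovariate(1.0)
    return n


def sample_tweedie(mu, phi, p, rng):
    """Draw Y as the sum of N gamma(alpha, scale=beta) claims, eq. (12)."""
    lam, alpha, beta = tweedie_to_cpg(mu, phi, p)
    n_claims = sample_poisson(lam, rng)
    # An empty sum (no claims) yields exactly 0, the point mass at zero.
    return sum(rng.gammavariate(alpha, beta) for _ in range(n_claims))


# Illustrative (assumed) parameter values.
mu, phi, p = 2.0, 1.5, 1.5
lam, alpha, beta = tweedie_to_cpg(mu, phi, p)

rng = random.Random(2018)
draws = [sample_tweedie(mu, phi, p, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
zero_fraction = sum(d == 0 for d in draws) / len(draws)

print("mean check (16):", lam * alpha * beta, "vs mu =", mu)
print("variance check (17):", lam * alpha * (alpha + 1) * beta ** 2,
      "vs phi * mu^p =", phi * mu ** p)
print("sample mean:", sample_mean)
print("fraction of zeros:", zero_fraction, "vs exp(-lam) =", math.exp(-lam))
```

The Poisson draw uses exponential inter-arrival times only to keep the sketch within the standard library; for large $\lambda$ a dedicated Poisson sampler would be a better choice.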
