
ADVERSARIAL VARIATIONAL BAYES METHODS FOR TWEEDIE COMPOUND POISSON MIXED MODELS

Yaodong Yang∗, Rui Luo, Yuanyuan Liu

American International Group Inc.

[Fig. 1: Graphical model of the Tweedie mixed-effects model. Panel (a): the Tweedie density function (shown for λ = 1.9, α = 49, β = 0.03); panel (b): the Tweedie mixed-effects model.]

ABSTRACT

The Tweedie Compound Poisson-Gamma model is routinely used for modeling non-negative continuous data with a discrete probability mass at zero. Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data. An important application of Tweedie mixed models is pricing insurance policies, e.g. car insurance. However, the intractable likelihood function, the unknown variance function, and the hierarchical structure of mixed effects have presented considerable challenges for drawing inferences on Tweedie.

In this study, we tackle the Bayesian Tweedie mixed-effects models via variational inference approaches. In particular, we empower the posterior approximation by implicit models trained in an adversarial setting. To reduce the variance of gradients, we reparameterize random effects, and integrate out one local latent variable of Tweedie. We also employ a flexible hyper prior to ensure the richness of the approximation. Our method is evaluated on both simulated and real-world data. Results show that the proposed method has smaller estimation bias on the random effects compared to traditional inference methods including MCMC; it also achieves a state-of-the-art predictive performance, meanwhile offering a richer estimation of the index parameter.

Index Terms— Tweedie model, variational inference, insurance policy pricing

1. INTRODUCTION

Tweedie models [1, 2] are special members of the exponential dispersion family; they specify a power-law relationship between the variance and the mean: Var(Y) = φ E(Y)^P. For arbitrary positive P, the index parameter of the variance function, outside the interval of (0, 1), Tweedie corresponds to a particular distribution, e.g., Gaussian (P = 0), Poisson (P = 1), Gamma (P = 2), Inverse Gaussian (P = 3). When P lies in the range of (1, 2), the Tweedie model is equivalent to the compound Poisson-Gamma distribution [3], hereafter Tweedie for simplicity. Tweedie serves as a special Gamma mixture model, with the number of mixture components determined by a Poisson-distributed random variable; it is parameterized by {λ, α, β} and denoted as Y = Σ_{i=1}^{T} G_i, with T ∼ Poisson(λ) and G_i ∼ Gamma(α, β) i.i.d.
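To make the compound Poisson-Gamma construction concrete, the sketch below (our own illustration, not code from the paper; it uses numpy and the parameter values shown in Fig. 1a) draws samples of Y:

```python
import numpy as np

def sample_tweedie_cpg(lam, alpha, beta, size, rng=None):
    """Draw samples of Y = sum_{i=1}^T G_i, T ~ Poisson(lam), G_i ~ Gamma(alpha, beta).

    `alpha` is the Gamma shape and `beta` the Gamma scale, matching the
    {lambda, alpha, beta} parameterization used in the text.
    """
    rng = np.random.default_rng(rng)
    counts = rng.poisson(lam, size=size)            # T: number of arrivals per observation
    # The sum of T i.i.d. Gamma(alpha, beta) variables is Gamma(T * alpha, beta);
    # observations with T = 0 get an exact zero, producing the point mass at zero.
    samples = np.where(counts > 0,
                       rng.gamma(np.maximum(counts, 1) * alpha, beta),
                       0.0)
    return samples

# Example with the parameter setting shown in Fig. 1a
y = sample_tweedie_cpg(lam=1.9, alpha=49.0, beta=0.03, size=10_000, rng=0)
print((y == 0).mean(), y.mean())   # share of exact zeros, empirical mean (≈ lam * alpha * beta)
```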
Tweedie is heavily used for modeling non-negative heavy-tailed continuous data with a discrete probability mass at zero (see Fig. 1a). As a result, Tweedie gains its importance in multiple domains [4, 5], including actuarial science (aggregated loss/premium modeling), ecology (species biomass modeling), and meteorology (precipitation modeling). On the other hand, in many field studies that require manual data collection, for example in insurance underwriting, the sampling heterogeneity arising from a hierarchy of groups/populations has to be considered. Mixed-effects models can represent the covariance structure related to the grouping hierarchy in the data by assigning common random effects to the observations that have the same level of a grouping variable; therefore, estimating the random effects is also an important component in Tweedie modeling.

Despite the importance of Tweedie mixed-effects models, the intractable likelihood function, the unknown variance index P, and the hierarchical structure of mixed effects (see Fig. 1b) hinder inference on Tweedie models. Unsurprisingly, there has been little work devoted to full-likelihood based inference on the Tweedie mixed model, let alone Bayesian treatments. In this work, we employ variational inference to solve Bayesian Tweedie mixed models. The goal is to introduce an accurate and efficient inference algorithm to estimate the posterior distribution of the fixed-effect parameters, the variance index, and the random effects.

2. RELATED WORK

To date, most practice of Tweedie modeling is conducted within the quasi-likelihood (QL) framework [6], where the density function is approximated by the first and second order moments. QL does not come with a schema for estimating the variance function, even though the estimation of the index parameter plays an important role in hypothesis testing and uncertainty evaluation [7]. Extended quasi-likelihood (EQL) [8], together with the profile likelihood method [9], was proposed to fill the gap of index estimation for QL. Nonetheless, EQL is not capable of handling exact zero values.

It was not until recently that numerical approximations to the Tweedie density function appeared. Well-known approaches include the series expansion method [10] and the Fourier inversion method [11]. Yang et al. [12] and Chen et al. [13] applied gradient boosted trees that try to incorporate nonlinear interactions in estimating the mean value of the Tweedie model. Qian et al. [14] investigated the grouped elastic net and Lasso methods for the Tweedie model. Nonetheless, the dependence on density approximations also makes the above algorithms vulnerable to numerical issues. Both the series expansion and Fourier inversion methods are noted to suffer from unbounded increments in the number of required terms.

When the Tweedie model is extended to incorporate random effects, the profiled quasi-likelihood (PQL) method still suffers from these numerical restrictions. Likelihood-based inference methods require approximating the integral over the distribution of the random effect. Advances in numerical approximation such as the Laplace approximation [15] and adaptive Gauss-Hermite quadrature (AGQ) [16] are then applied to solve the mixed model. However, the complex density of the Tweedie model still bothers the profiled likelihood methods.

The expectation of a latent variable under an intractable likelihood function can typically be estimated using the EM algorithm in an empirical Bayes approach, or MCMC methods in a full Bayesian approach. Simsekli et al. [3, 17] proposed solving the Tweedie mixed-effects models by explicitly exploiting the latent variable in the Bayesian formulation. Both MCMC and Monte Carlo EM methods have been applied. Zhang et al. [17] compared the latent variable approach with the density function approximations mentioned above, and found that MCMC is computationally demanding on high dimensional datasets and that dedicated supervision is needed in order to adjust the Monte Carlo error and the sampling variance [18].

3. TWEEDIE MIXED-EFFECTS MODELS

The Tweedie EDM f_Y(y | µ, φ, P) equivalently describes the compound Poisson-Gamma distribution when P ∈ (1, 2). Tweedie assumes Poisson arrivals of events, and a Gamma-distributed "cost" for each individual event. Judging on whether the aggregated number of arrivals N is zero, the joint density for each observation can be written as:

P(Y = y, N = n | λ, α, β) = d_0(y) · e^{−λ} · 1_{n=0} + ( y^{nα−1} e^{−y/β} / (β^{nα} Γ(nα)) ) · ( λ^n e^{−λ} / n! ) · 1_{n>0},   (1)

where d_0(·) is the Dirac delta function at zero.
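For illustration, Eq. (1) can be transcribed directly; the helper below is a sketch of ours (using scipy's log-gamma), not the authors' implementation:

```python
import numpy as np
from scipy.special import gammaln

def log_joint_density(y, n, lam, alpha, beta):
    """log P(Y = y, N = n | lam, alpha, beta) following Eq. (1).

    For n = 0 the model is a point mass at y = 0 with probability exp(-lam);
    for n > 0 it is the Gamma(n * alpha, beta) density of y times the Poisson
    probability of observing n arrivals.
    """
    if n == 0:
        return -lam if y == 0 else -np.inf
    log_gamma = ((n * alpha - 1) * np.log(y) - y / beta
                 - n * alpha * np.log(beta) - gammaln(n * alpha))
    log_poisson = n * np.log(lam) - lam - gammaln(n + 1)
    return log_gamma + log_poisson

# e.g. log-density of observing y = 2.5 together with n = 3 arrivals
print(log_joint_density(2.5, 3, lam=1.9, alpha=49.0, beta=0.03))
```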
The connection of the parameters {λ, α, β} of the Tweedie compound Poisson model with the parameters of the general Tweedie EDM model, parameterized by the mean, the dispersion and the index parameter {µ, P, φ}, is denoted as:

λ = µ^{2−P} / (φ(2−P)),    µ = λαβ,
α = (2−P)/(P−1),            P = (α+2)/(α+1),      (2)
β = φ(P−1) µ^{P−1},         φ = λ^{1−P} (αβ)^{2−P} / (2−P).
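The mapping in Eq. (2) round-trips exactly; the two functions below are a small sketch of ours (function names are illustrative, not from the paper):

```python
import numpy as np

def tweedie_to_cpg(mu, p, phi):
    """Map Tweedie EDM parameters (mu, P, phi), 1 < P < 2, to compound
    Poisson-Gamma parameters (lambda, alpha, beta), as in Eq. (2)."""
    lam = mu ** (2 - p) / (phi * (2 - p))
    alpha = (2 - p) / (p - 1)
    beta = phi * (p - 1) * mu ** (p - 1)
    return lam, alpha, beta

def cpg_to_tweedie(lam, alpha, beta):
    """Inverse mapping of Eq. (2): (lambda, alpha, beta) -> (mu, P, phi)."""
    mu = lam * alpha * beta
    p = (alpha + 2) / (alpha + 1)
    phi = lam ** (1 - p) * (alpha * beta) ** (2 - p) / (2 - p)
    return mu, p, phi

# Round trip with the Fig. 1a parameters
mu, p, phi = cpg_to_tweedie(1.9, 49.0, 0.03)
print(np.allclose(tweedie_to_cpg(mu, p, phi), (1.9, 49.0, 0.03)))   # True
```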
A mixed-effects model contains both fixed effects and random effects; the graphical model is shown in Fig. 1b. We denote the mixed model as κ(µ_i) = f_w(X_i) + U_i^T b_i, with b_i ∼ N(0, Σ_b), where X_i is the i-th row of the design matrix of fixed-effects covariates, and w are the parameters of the fixed-effects model f_w, which could represent a linear function or a deep neural network. U_i is the i-th row of the design matrix associated with the random effects, and b_i is the vector of random-effect coefficients, which is usually assumed to follow a multivariate normal distribution with zero mean and covariance Σ_b; κ(·) is the link function, and µ_i is the mean of the i-th response variable Y_i. In this work, we consider random effects on the intercept.

In the context of conducting Bayesian inference on Tweedie mixed models, we define 1) the observed data D = (x_i, u_i, y_i)_{i=1,...,M}; 2) the global latent variables {w, σ_b}, where we assume Σ_b is a diagonal matrix with diagonal elements σ_b; and 3) the local latent variables {n_i, b_i}, indicating the number of arrivals and the random effect. The parameters of Tweedie are thus denoted by (λ_i, α_i, β_i) = f_{λ,α,β}(x_i, u_i | w, σ_b). The latent variables thus contain both local and global ones, z = (w, n_i, b_i, σ_b)_{i=1,...,M}, and they are assigned a prior distribution P(z). The joint log-likelihood is computed by summing over the M observations: Σ_{i=1}^{M} log[ P(y_i | n_i, λ_i, α_i, β_i) · P(n_i | λ_i) · P(b_i | σ_b) ]. The goal here is to find the posterior distribution P(z | D) and to make future predictions via P(y_pred | D, x_pred, u_pred) = ∫ P(y_pred | z, x_pred, u_pred) P(z | D) dz.

4. METHODS

Adversarial Variational Bayes. Variational inference (VI) [19] approximates the posterior distribution, often complicated and intractable, by proposing a class of probability distributions Q_θ(z) (so-called inference models), and then finding the best set of parameters θ by minimizing the KL divergence between the proposal and the true distribution, i.e., KL(Q_θ(z) || P(z | D)). Minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO) [20], expressed as Eq. 3. Optimizing the ELBO requires the gradient information ∇_θ ELBO. In our experiments, we find that the model-free gradient estimator REINFORCE [21] fails to yield reasonable results due to its unacceptably high variance [22], even when equipped with the baseline trick [23] or local expectation gradients [24]. We also attribute the unsatisfactory results to the over-simplified proposal distributions used in traditional VI. Here we try to solve these two issues by employing implicit inference models together with variance reduction tricks.

θ* = arg max_θ E_{Q_θ(z)}[ −log( Q_θ(z) / P(z) ) + log P(D | z) ]   (3)

AVB [25, 26] empowers VI methods by using neural networks as the inference models; this allows more efficient and accurate approximations to the posterior distribution. Since a neural network is a black box, the inference model is implicit and thus has no closed-form expression, even though this does not prevent drawing samples from it. To circumvent the issue of computing gradients through implicit models, the mechanism of adversarial learning is introduced: an additional discriminative network T_φ is used to model the first term in Eq. 3. By building a model to distinguish the latent variables sampled from the prior distribution P(z) from those sampled from the inference network Q_θ(z), namely by optimizing the equation below (where σ(x) is the sigmoid function),

φ* = arg min_φ [ −E_{Q_θ(z)} log σ(T_φ(z)) − E_{P(z)} log(1 − σ(T_φ(z))) ],   (4)

the ratio is estimated as E_{Q_θ(z)}[ log( Q_θ(z) / P(z) ) ] = E_{Q_θ(z)}[ T_{φ*}(z) ]. AVB considers optimizing Eq. 3 and Eq. 4 as a two-player mini-max game. We apply stochastic gradient descent alternately to find a Nash equilibrium. Such a Nash equilibrium, if reached, is a global optimum of the ELBO [26].

Variance Reduction. We find that integrating out the latent variable n_i in Eq. 1 by Monte Carlo summation gives significantly lower variance in computing the gradients, i.e.,

P(y_i | w, b, Σ) = Σ_{j=1}^{T} P(b, Σ) · P(y_i | n_j, w, b, Σ) · P(n_j | w, b, Σ).   (5)

If n_i is instead treated as a Poisson-generated sample, the variance explodes in the cases where Y_i is zero but the sampled n_i is positive, or Y_i is positive but the sampled n_i is zero; this also accounts for why the direct application of the REINFORCE algorithm fails to work. In practice, we find that limiting the number of Monte Carlo samples to between 2 and 10, depending on the dataset, gives similar performance to summing over a larger number of terms.

Reparameterizable Random Effect. In mixed-effects models, the random effect is conventionally assumed to be b_i ∼ N(0, Σ_b). In fact, it is reparameterizable. As such, we model the random effects by the reparameterization trick [27]; b_i is now written as b_i = 0 + σ_b ⊙ ε_i, where ε_i ∼ N(0, I). Here σ_b is a set of latent variables generated by the inference network, and the random effects become deterministic given the auxiliary noise ε_i.

Theorem 1 (Reparameterizable Random Effect). The random effect is naturally reparameterizable. With the reparameterization trick, the random effect is no longer restricted to be normally distributed; the same construction applies to any "location-scale" family distribution (Student's t, Laplace, elliptical, etc.).

Proof. See Section 2.4 in [27].

Note that when computing the gradients, we no longer need to sample the random effect directly; instead, we can backpropagate the path-wise gradients, which can dramatically reduce the variance [28].

Hyper Priors. The priors of the latent variables are fixed in traditional VI. Here, however, we parameterize the prior P_ψ(w) by ψ and make ψ also trainable when optimizing Eq. 3. We refer to ψ as a kind of hyper prior on w. The intuition is not to constrain the posterior approximation by one over-simplified prior. We would like the prior to be adjusted so as to make the posterior Q_θ(z) close to a set of prior distributions, i.e., a self-adjustable prior; this further ensures the expressiveness of Q_θ(z). We can apply the same trick whenever the class of priors is reparameterizable.

Algorithm 1 AVB for the Bayesian Tweedie mixed model
1: Input: data D = (x_i, u_i, y_i), i = 1, ..., M
2: while θ not converged do
3:   for N_T iterations do
4:     Sample noises ε_{i,j=1,...,M} ∼ N(0, I)
5:     Map noise to the prior: w_j^P = P_ψ(ε_j) = µ + σ ⊙ ε_j
6:     Reparameterize the random effect: b_j = 0 + σ_b ⊙ ε_j
7:     Map noise to the posterior: z_i^Q = (w_i, b_i) = Q_θ(ε_i)
8:     Minimize Eq. 4 over φ via the gradients:
        ∇_φ (1/M) Σ_{i=1}^{M} [ −log σ(T_φ(z_i^Q)) − log(1 − σ(T_φ(z_i^P))) ]
9:   end for
10:  Sample noises ε_{i=1,...,M} ∼ N(0, I)
11:  Map noise to the posterior: z_i^Q = (w_i, b_i) = Q_θ(ε_i)
12:  Sample a mini-batch of (x_i, y_i) from D
13:  Minimize Eq. 3 over θ, ψ via the gradients:
       ∇_{θ,ψ} (1/M) Σ_{i=1}^{M} [ −T_{φ*}(z_i^Q) + log Σ_{j=1}^{T} P(y_i | n_j, w, b, σ_b; x_i, u_i) · P(n_j | w; x_i, u_i) · P(b | σ_b) ]
14: end while
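Below is a minimal PyTorch sketch of the alternating updates in Algorithm 1. The toy data, network sizes, and the particular way z parameterizes (λ_i, α, β) through a log link are our own illustrative assumptions, not the authors' released code; for brevity, the random effects are reparameterized inside the likelihood rather than emitted by Q_θ, and α, β are treated as global.

```python
import torch
import torch.nn as nn

def tweedie_log_lik(y, lam, alpha, beta, n_max=10):
    """log P(y | lam, alpha, beta) with the Poisson count summed out (Eq. 5),
    truncating the sum at n_max as discussed under Variance Reduction."""
    n = torch.arange(1, n_max + 1, dtype=y.dtype).unsqueeze(-1)          # shape (n_max, 1)
    log_pois = n * torch.log(lam) - lam - torch.lgamma(n + 1)
    log_gam = ((n * alpha - 1) * torch.log(y.clamp_min(1e-10)) - y / beta
               - n * alpha * torch.log(beta) - torch.lgamma(n * alpha))
    log_pos = torch.logsumexp(log_pois + log_gam, dim=0)                 # y > 0 branch
    return torch.where(y > 0, log_pos, -lam)                             # y = 0 branch: log P(N = 0) = -lam

# Toy data: one fixed-effect covariate and one grouping factor with 6 levels (cf. "CARTYPE")
M, G = 512, 6
x = torch.randn(M)
grp = torch.randint(0, G, (M,))
y = (torch.rand(M) < 0.7).float() * torch.distributions.Gamma(2.0, 1.0).sample((M,))

D_noise, D_z = 8, 5          # z = (w0, w1, log alpha, log beta, log sigma_b)
q_net = nn.Sequential(nn.Linear(D_noise, 64), nn.ReLU(), nn.Linear(64, D_z))   # implicit inference model Q_theta
t_net = nn.Sequential(nn.Linear(D_z, 64), nn.ReLU(), nn.Linear(64, 1))         # discriminator T_phi
prior_mu = torch.zeros(D_z, requires_grad=True)                                 # trainable hyper prior P_psi
prior_log_sig = torch.zeros(D_z, requires_grad=True)
opt_t = torch.optim.Adam(t_net.parameters(), lr=1e-3)
opt_q = torch.optim.Adam(list(q_net.parameters()) + [prior_mu, prior_log_sig], lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def log_lik_given_z(z):
    """log P(D | z): reparameterize the group random effects (b = 0 + sigma_b * eps)
    and evaluate the marginalized Tweedie likelihood under a log link on lambda."""
    w0, w1, log_a, log_b, log_sb = z.unbind(-1)
    b_group = torch.exp(log_sb) * torch.randn(G)
    lam = torch.exp(w0 + w1 * x + b_group[grp])
    return tweedie_log_lik(y, lam, torch.exp(log_a), torch.exp(log_b)).sum()

for step in range(200):
    # Discriminator step (Eq. 4): separate posterior samples from (hyper-)prior samples.
    z_q = q_net(torch.randn(64, D_noise))
    z_p = prior_mu + torch.exp(prior_log_sig) * torch.randn(64, D_z)
    loss_t = bce(t_net(z_q.detach()).squeeze(-1), torch.ones(64)) + \
             bce(t_net(z_p.detach()).squeeze(-1), torch.zeros(64))
    opt_t.zero_grad(); loss_t.backward(); opt_t.step()

    # Inference step (Eq. 3): maximize the ELBO surrogate E_Q[-T_phi(z) + log P(D | z)]
    # (theta and the hyper-prior parameters psi are grouped together, as in Algorithm 1, line 13).
    z_q = q_net(torch.randn(1, D_noise)).squeeze(0)
    loss_q = t_net(z_q).squeeze() - log_lik_given_z(z_q)
    opt_q.zero_grad(); loss_q.backward(); opt_q.step()
```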
5. EXPERIMENTS AND RESULTS

We compare our method with six traditional inference methods for Tweedie. We evaluate our method on two public datasets: modeling the aggregate loss for auto insurance policies [31], and modeling the length density of fine roots in ecology [32]. We split each dataset into 50%/25%/25% for train/validation/test respectively. Considering the sparsity and right-skewness of the data, we use the ordered Lorenz curve and its corresponding Gini index [33, 34] as the metric. Assume that, for the i-th of N observations, y_i is the ground truth, p_i is the prediction from the baseline model, and ŷ_i is the prediction from the model under evaluation. We sort all the samples by the relativity r_i = ŷ_i / p_i in increasing order, and then compute the empirical cumulative distributions F̂_p(r) = Σ_{i=1}^{N} p_i 1(r_i ≤ r) / Σ_{i=1}^{N} p_i and F̂_y(r) = Σ_{i=1}^{N} y_i 1(r_i ≤ r) / Σ_{i=1}^{N} y_i. The plot of (F̂_p(r), F̂_y(r)) is an ordered Lorenz curve, and the Gini index is twice the area between the Lorenz curve and the 45° line.
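The metric just described can be computed in a few lines; the helper below is our own sketch of the ordered Lorenz curve and Gini index, not the authors' evaluation code:

```python
import numpy as np

def ordered_gini(y_true, y_pred, baseline):
    """Gini index from the ordered Lorenz curve described above.

    Sort observations by the relativity r_i = y_pred_i / baseline_i, accumulate the
    baseline and ground-truth shares, and take twice the area between the resulting
    Lorenz curve and the 45-degree line.
    """
    y_true, y_pred, baseline = map(np.asarray, (y_true, y_pred, baseline))
    order = np.argsort(y_pred / baseline)                  # increasing relativity
    f_p = np.cumsum(baseline[order]) / baseline.sum()      # empirical F_p(r)
    f_y = np.cumsum(y_true[order]) / y_true.sum()          # empirical F_y(r)
    dp = np.diff(np.concatenate(([0.0], f_p)))             # widths along the F_p axis
    return 2.0 * np.sum((f_p - f_y) * dp)

# Toy usage: a model whose predictions track the truth scores a positive Gini index
# against an uninformative flat baseline.
rng = np.random.default_rng(0)
truth = rng.gamma(1.0, 2.0, 1000) * (rng.random(1000) < 0.6)
model = 0.8 * truth + rng.random(1000)
flat_baseline = np.full(1000, truth.mean())
print(ordered_gini(truth, model, flat_baseline))
```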

Table 1: The pairwise Gini index comparison with standard errors (in parentheses), based on 20 random splits; rows give the baseline model and columns the model being evaluated.

AutoClaim
BASELINE    GLM [29]       PQL [29]       LAPLACE [15]   AGQ [16]       MCMC [17]      TDBOOST [30]   AVB
GLM         /              -2.97 (6.28)   1.75 (5.68)    1.75 (5.68)    -15.02 (7.06)  1.61 (6.32)    9.84 (5.80)
PQL         7.37 (5.67)    /              7.50 (6.26)    7.50 (5.72)    6.73 (5.95)    0.81 (6.22)    9.02 (6.07)
LAPLACE     2.10 (4.52)    -1.00 (5.94)   /              8.84 (5.36)    4.00 (4.61)    21.45 (4.84)   20.61 (4.54)
AGQ         2.10 (4.52)    -1.00 (5.94)   8.84 (5.36)    /              4.00 (4.61)    21.45 (4.84)   20.61 (4.54)
MCMC        14.75 (6.80)   -1.06 (6.41)   3.12 (5.99)    3.12 (5.99)    /              7.82 (5.83)    11.88 (5.50)
TDBOOST     17.52 (4.80)   17.08 (5.36)   19.30 (5.19)   19.30 (5.19)   11.61 (4.58)   /              20.30 (4.97)
AVB         -0.17 (4.70)   0.049 (5.62)   3.41 (4.94)    3.41 (4.94)    0.86 (4.62)    11.49 (4.93)   /

FineRoot
BASELINE    GLM [29]       PQL [29]       LAPLACE [15]   AGQ [16]       MCMC [17]      TDBOOST [30]   AVB
GLM         /              23.18 (9.30)   35.87 (6.52)   35.87 (6.52)   -15.73 (10.63) 35.71 (6.52)   35.64 (6.53)
PQL         -21.61 (8.34)  /              30.90 (8.86)   30.90 (8.86)   -21.45 (8.38)  24.19 (8.36)   28.97 (8.92)
LAPLACE     -12.55 (7.37)  -14.72 (8.81)  /              15.07 (6.41)   -12.55 (7.37)  15.33 (7.36)   10.61 (7.20)
AGQ         -12.55 (7.37)  -14.72 (8.81)  15.07 (6.41)   /              -12.55 (7.37)  15.33 (7.36)   10.61 (7.20)
MCMC        17.27 (10.25)  22.53 (9.31)   35.10 (6.54)   35.10 (6.54)   /              35.10 (6.54)   34.87 (6.55)
TDBOOST     22.47 (6.80)   8.50 (9.09)    22.63 (6.80)   22.63 (6.80)   11.61 (6.80)   /              22.39 (6.80)
AVB         -8.26 (17.66)  -10.88 (68.98) 2.13 (7.28)    2.13 (7.28)    -8.26 (7.66)   11.00 (7.74)   /

The Gini index is not symmetrical if different baselines are used. In short, a model with a larger Gini index indicates a superior capability of separating the observations. In the insurance context, the Gini index profiles the model's capability of distinguishing policy holders with different risk levels.

The Gini comparison is presented in Table 1. Here only fixed effects are considered. The AVB method shows state-of-the-art performance on both the AutoClaim and FineRoot datasets, even comparatively better than the boosting tree method, which is considered to have a strong performance. By inspecting the posterior estimation of P in Fig. 2, we find that AVB yields a richer representation of the index parameter, with two modes at 1.20 and 1.31. As P uniquely characterizes a Tweedie distribution, compared with the single value that the traditional profiled likelihood method offers, flexible choices of P enable customized descriptions for insurance policies that may have different underlying risk profiles. Also note that AVB uses a neural sampler that does not involve any rejection procedures; unlike MCMC, it holds the potential for large-scale predictions on high dimensional datasets.

[Fig. 2: Posterior estimation of the index parameter P on the AutoClaim data: density of P over the value range 1.00-2.00, comparing MCMC and AVB.]

To compare the estimations of random effects, we add "CARTYPE" in the AutoClaim data as the random effect, with 6 groups, and "PLANT" in the FineRoot data, with 8 groups.

Table 2: Estimation of P and σ_b² for both the AutoClaim and FineRoot data with random effects.

Algo.     P_auto    σ²_b,auto      P_root    σ²_b,root
PQL       /         8.94 · 10⁻⁵    /         1.0346 · 10⁻³
LAPLACE   1.3394    4.06 · 10⁻⁴    1.4202    6.401 · 10⁻³
AGQ       1.3394    4.027 · 10⁻⁴   1.4202    6.401 · 10⁻³
MCMC      1.3458    4.575 · 10⁻³   1.4272    3.490 · 10⁻²
AVB       1.3403    3.344 · 10⁻³   1.4237    2.120 · 10⁻²

From Table 2, we can see that all the algorithms estimate P in a reasonably similar range, while the results from MCMC and AVB are closer to each other. The estimation of σ_b² has a large variance: the estimates from the AGQ and Laplace methods are around ten times smaller than the MCMC and AVB results, while PQL produces even smaller estimates. This is consistent with our finding in the simulation study that the AGQ and Laplace methods tend to underestimate the random-effect variance. The AVB and MCMC estimations are considered more accurate; as the AVB estimate has a comparatively smaller variance than MCMC, the AVB estimation can be believed to be more accurate.

6. CONCLUSIONS

We present Adversarial Variational Bayes for solving Bayesian Tweedie mixed models. To empower the posterior approximation, we employ implicit models trained in an adversarial setting. To reduce the variance of the gradients, we sum over the local latent variable of the Tweedie model that is not reparameterizable. We also make the prior distribution trainable to guarantee the expressiveness of the posterior. Our method outperforms all traditional methods on both simulated and real-world data; meanwhile, the estimations of the variance parameter and the random effects are competitively accurate and more stable. The Tweedie model is a fundamental model that is widely used in insurance policy pricing. To the best of our knowledge, this is the first work that leverages recent progress in deep learning methods to tackle insurance problems.

7. REFERENCES

[1] Bent Jorgensen, The Theory of Dispersion Models, CRC Press, 1997.

[2] Bent Jorgensen, "Exponential dispersion models," Journal of the Royal Statistical Society. Series B (Methodological), pp. 127-162, 1987.
[3] Umut Simsekli, Ali Taylan Cemgil, and Yusuf Kenan Yilmaz, "Learning the beta-divergence in Tweedie compound Poisson matrix factorization models," in ICML (3), 2013, pp. 1409-1417.

[4] Bent Jørgensen, Exponential Dispersion Models, Number 48, Instituto de Matematica Pura e Aplicada, 1991.

[5] Gordon K Smyth and Bent Jørgensen, "Fitting Tweedie's compound Poisson model to insurance claims data: dispersion modelling," Astin Bulletin, vol. 32, no. 01, pp. 143-157, 2002.

[6] Robert WM Wedderburn, "Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method," Biometrika, vol. 61, no. 3, pp. 439-447, 1974.

[7] Marie Davidian and Raymond J Carroll, "Variance function estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079-1091, 1987.

[8] John A Nelder and Daryl Pregibon, "An extended quasi-likelihood function," Biometrika, vol. 74, no. 2, pp. 221-232, 1987.

[9] DR Cox and N Reid, "A note on the calculation of adjusted profile likelihood," Journal of the Royal Statistical Society. Series B (Methodological), pp. 467-471, 1993.

[10] Peter K Dunn and Gordon K Smyth, "Series evaluation of Tweedie exponential dispersion model densities," Statistics and Computing, vol. 15, no. 4, pp. 267-280, 2005.

[11] Peter K Dunn and Gordon K Smyth, "Evaluation of Tweedie exponential dispersion model densities by Fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73-86, 2008.

[12] Yi Yang, Wei Qian, and Hui Zou, "A boosted Tweedie compound Poisson model for insurance premium," arXiv preprint arXiv:1508.06378, 2015.

[13] Tianqi Chen and Carlos Guestrin, "XGBoost: A scalable tree boosting system," arXiv preprint arXiv:1603.02754, 2016.
[14] Wei Qian, Yi Yang, and Hui Zou, "Tweedie's compound Poisson model with grouped elastic net," Journal of Computational and Graphical Statistics, vol. 25, no. 2, pp. 606-625, 2016.

[15] Luke Tierney and Joseph B Kadane, "Accurate approximations for posterior moments and marginal densities," Journal of the American Statistical Association, vol. 81, no. 393, pp. 82-86, 1986.

[16] Qing Liu and Donald A Pierce, "A note on Gauss-Hermite quadrature," Biometrika, vol. 81, no. 3, pp. 624-629, 1994.

[17] Yanwei Zhang, "Likelihood-based and Bayesian methods for Tweedie compound Poisson linear mixed models," Statistics and Computing, vol. 23, no. 6, pp. 743-757, 2013.

[18] Dani Gamerman and Hedibert F Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, CRC Press, 2006.

[19] Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183-233, 1999.

[20] Christopher M Bishop, Pattern Recognition and Machine Learning, Springer, 2006.

[21] Ronald J Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, no. 3-4, pp. 229-256, 1992.

[22] Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal, "The 'wake-sleep' algorithm for unsupervised neural networks," Science, vol. 268, no. 5214, p. 1158, 1995.

[23] Andriy Mnih and Karol Gregor, "Neural variational inference and learning in belief networks," arXiv preprint arXiv:1402.0030, 2014.

[24] Michalis Titsias and Miguel Lázaro-Gredilla, "Local expectation gradients for black box variational inference," in Advances in Neural Information Processing Systems, 2015, pp. 2638-2646.

[25] Ferenc Huszár, "Variational inference using implicit distributions," arXiv preprint arXiv:1702.08235, 2017.

[26] Lars Mescheder, Sebastian Nowozin, and Andreas Geiger, "Adversarial variational Bayes: Unifying variational autoencoders and generative adversarial networks," arXiv preprint arXiv:1701.04722, 2017.

[27] Diederik P Kingma and Max Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.

[28] Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei, "Automatic differentiation variational inference," arXiv preprint arXiv:1603.00788, 2016.

[29] Peter McCullagh, "Generalized linear models," European Journal of Operational Research, vol. 16, no. 3, pp. 285-292, 1984.

[30] Yi Yang, Wei Qian, and Hui Zou, "Insurance premium prediction via gradient tree-boosted Tweedie compound Poisson models," Journal of Business & Economic Statistics, pp. 1-15, 2017.

[31] Karen CH Yip and Kelvin KW Yau, "On modeling claim frequency data in general insurance with extra zeros," Insurance: Mathematics and Economics, vol. 36, no. 2, pp. 153-163, 2005.

[32] HN De Silva, AJ Hall, DS Tustin, and PW Gandar, "Analysis of distribution of root length density of apple trees on different dwarfing rootstocks," Annals of Botany, vol. 83, no. 4, pp. 335-345, 1999.

[33] Edward W Frees, Glenn Meyers, and A David Cummings, "Summarizing insurance scores using a Gini index," Journal of the American Statistical Association, vol. 106, no. 495, pp. 1085-1098, 2011.

[34] Edward W Jed Frees, Glenn Meyers, and A David Cummings, "Insurance ratemaking and a Gini index," Journal of Risk and Insurance, vol. 81, no. 2, pp. 335-366, 2014.