ADVERSARIAL VARIATIONAL BAYES METHODS FOR TWEEDIE COMPOUND POISSON MIXED MODELS
Yaodong Yang∗, Rui Luo, Yuanyuan Liu
American International Group Inc.
arXiv:1706.05446v5 [stat.ML] 3 Feb 2019

ABSTRACT

The Tweedie Compound Poisson-Gamma model is routinely used for modeling non-negative continuous data with a discrete probability mass at zero. Mixed models with random effects account for the covariance structure related to the grouping hierarchy in the data. An important application of Tweedie mixed models is pricing insurance policies, e.g. car insurance. However, the intractable likelihood function, the unknown variance function, and the hierarchical structure of mixed effects have presented considerable challenges for drawing inferences on Tweedie. In this study, we tackle Bayesian Tweedie mixed-effects models via variational inference approaches. In particular, we empower the posterior approximation by implicit models trained in an adversarial setting. To reduce the variance of gradients, we reparameterize the random effects and integrate out one local latent variable of Tweedie. We also employ a flexible hyper prior to ensure the richness of the approximation. Our method is evaluated on both simulated and real-world data. Results show that the proposed method has smaller estimation bias on the random effects compared to traditional inference methods including MCMC; it also achieves state-of-the-art predictive performance, meanwhile offering a richer estimation of the variance function.

Index Terms— Tweedie model, variational inference, insurance policy pricing

[Fig. 1: Graphical model of the Tweedie mixed-effects model. (a) Tweedie density function (λ = 1.9, α = 49, β = 0.03); (b) Tweedie mixed-effects model.]

1. INTRODUCTION

Tweedie models [1, 2] are special members of the exponential dispersion family; they specify a power-law relationship between the variance and the mean: Var(Y) = E(Y)^P. For arbitrary positive P, the index parameter of the variance function, outside the interval of (0, 1), Tweedie corresponds to a particular stable distribution, e.g., Gaussian (P = 0), Poisson (P = 1), Gamma (P = 2), Inverse Gaussian (P = 3). When P lies in the range of (1, 2), the Tweedie model is equivalent to the Compound Poisson-Gamma distribution [3], hereafter Tweedie for simplicity. Tweedie serves as a special Gamma mixture model, with the number of mixtures determined by a Poisson-distributed random variable; it is parameterized by {λ, α, β} and denoted as

    Y = Σ_{i=1}^{T} G_i,   T ∼ Poisson(λ),   G_i ∼ i.i.d. Gamma(α, β).

Tweedie is heavily used for modeling non-negative heavy-tailed continuous data with a discrete probability mass at zero (see Fig. 1a). As a result, Tweedie gains its importance in multiple domains [4, 5], including actuarial science (aggregated loss/premium modeling), ecology (species biomass modeling), and meteorology (precipitation modeling). On the other hand, in many field studies that require manual data collection, for example in insurance underwriting, the sampling heterogeneity arising from a hierarchy of groups/populations has to be considered. Mixed-effects models can represent the covariance structure related to the grouping hierarchy in the data by assigning common random effects to observations that have the same level of a grouping variable; therefore, estimating the random effects is also an important component of Tweedie modeling.

Despite the importance of Tweedie mixed-effects models, the intractable likelihood function, the unknown variance index P, and the hierarchical structure of mixed effects (see Fig. 1b) hinder inference on Tweedie models. Unsurprisingly, there has been little work devoted to full-likelihood based inference on the Tweedie mixed model, let alone Bayesian treatments. In this work, we employ variational inference to solve Bayesian Tweedie mixed models. The goal is to introduce an accurate and efficient inference algorithm to estimate the posterior distribution of the fixed-effect parameters, the variance index, and the random effects.

2. RELATED WORK

To date, most practice of Tweedie modeling is conducted within the quasi-likelihood (QL) framework [6], where the

∗Correspondence to:
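The compound Poisson-Gamma representation of Tweedie lends itself to direct simulation: draw T from a Poisson, then sum T i.i.d. Gamma variates (the sum is itself Gamma distributed with shape Tα). A minimal NumPy sketch, with illustrative parameter values rather than any used in the paper's experiments:

```python
import numpy as np

def sample_tweedie(lam, alpha, beta, size, rng):
    """Sample Y = sum_{i=1}^T G_i with T ~ Poisson(lam) and
    G_i ~ i.i.d. Gamma(shape=alpha, rate=beta).

    The sum of T i.i.d. Gamma(alpha, beta) draws is Gamma(T*alpha, beta),
    and T = 0 yields an exact zero, producing the point mass at zero.
    """
    T = rng.poisson(lam, size=size)
    # Guard against shape = 0: draw with shape max(T, 1)*alpha, then mask T = 0 back to 0.
    g = rng.gamma(shape=np.maximum(T, 1) * alpha, scale=1.0 / beta)
    return np.where(T > 0, g, 0.0)

rng = np.random.default_rng(0)
y = sample_tweedie(lam=1.5, alpha=2.0, beta=1.0, size=100_000, rng=rng)

# Implied moments: E[Y] = lam*alpha/beta = 3.0 and P(Y = 0) = exp(-lam) ~ 0.223;
# the variance-power index of this member is P = (alpha + 2)/(alpha + 1) = 4/3.
print(y.mean(), (y == 0).mean())
```

The exact zeros produced when T = 0 are what distinguish Tweedie from a plain Gamma model and make it suitable for claim data in which many policies incur no loss.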
Table 1: Pairwise Gini index comparison (each entry scores the row model against the column model taken as baseline; standard errors in parentheses).

AutoClaim
MODEL    | GLM [29]      | PQL [29]       | LAPLACE [15]  | AGQ [16]      | MCMC [17]      | TDBOOST [30]  | AVB
GLM      | /             | -2.97 (6.28)   | 1.75 (5.68)   | 1.75 (5.68)   | -15.02 (7.06)  | 1.61 (6.32)   | 9.84 (5.80)
PQL      | 7.37 (5.67)   | /              | 7.50 (6.26)   | 7.50 (5.72)   | 6.73 (5.95)    | 0.81 (6.22)   | 9.02 (6.07)
LAPLACE  | 2.10 (4.52)   | -1.00 (5.94)   | /             | 8.84 (5.36)   | 4.00 (4.61)    | 21.45 (4.84)  | 20.61 (4.54)
AGQ      | 2.10 (4.52)   | -1.00 (5.94)   | 8.84 (5.36)   | /             | 4.00 (4.61)    | 21.45 (4.84)  | 20.61 (4.54)
MCMC     | 14.75 (6.80)  | -1.06 (6.41)   | 3.12 (5.99)   | 3.12 (5.99)   | /              | 7.82 (5.83)   | 11.88 (5.50)
TDBOOST  | 17.52 (4.80)  | 17.08 (5.36)   | 19.30 (5.19)  | 19.30 (5.19)  | 11.61 (4.58)   | /             | 20.30 (4.97)
AVB      | -0.17 (4.70)  | 0.049 (5.62)   | 3.41 (4.94)   | 3.41 (4.94)   | 0.86 (4.62)    | 11.49 (4.93)  | /

FineRoot
MODEL    | GLM [29]      | PQL [29]       | LAPLACE [15]  | AGQ [16]      | MCMC [17]      | TDBOOST [30]  | AVB
GLM      | /             | 23.18 (9.30)   | 35.87 (6.52)  | 35.87 (6.52)  | -15.73 (10.63) | 35.71 (6.52)  | 35.64 (6.53)
PQL      | -21.61 (8.34) | /              | 30.90 (8.86)  | 30.90 (8.86)  | -21.45 (8.38)  | 24.19 (8.36)  | 28.97 (8.92)
LAPLACE  | -12.55 (7.37) | -14.72 (8.81)  | /             | 15.07 (6.41)  | -12.55 (7.37)  | 15.33 (7.36)  | 10.61 (7.20)
AGQ      | -12.55 (7.37) | -14.72 (8.81)  | 15.07 (6.41)  | /             | -12.55 (7.37)  | 15.33 (7.36)  | 10.61 (7.20)
MCMC     | 17.27 (10.25) | 22.53 (9.31)   | 35.10 (6.54)  | 35.10 (6.54)  | /              | 35.10 (6.54)  | 34.87 (6.55)
TDBOOST  | 22.47 (6.80)  | 8.50 (9.09)    | 22.63 (6.80)  | 22.63 (6.80)  | 11.61 (6.80)   | /             | 22.39 (6.80)
AVB      | -8.26 (17.66) | -10.88 (68.98) | 2.13 (7.28)   | 2.13 (7.28)   | -8.26 (7.66)   | 11.00 (7.74)  | /
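The pairwise Gini indices in Table 1 follow the ordered Lorenz curve construction of Frees et al. [33, 34]: policies are sorted by the relativity (model score divided by baseline premium), the cumulative share of baseline premium is traced against the cumulative share of actual losses, and the index is twice the area between the 45-degree line and that curve. A sketch of this computation (function and variable names are my own, not from the paper; the table entries appear to be scaled by 100):

```python
import numpy as np

def ordered_gini(baseline, score, loss):
    """Gini index of `score` against `baseline` via the ordered Lorenz curve:
    positive when the score separates losses better than the baseline does."""
    baseline, score, loss = map(np.asarray, (baseline, score, loss))
    order = np.argsort(score / baseline, kind="stable")  # sort by relativity
    # Cumulative shares of baseline premium (x) and actual losses (y), starting at (0, 0).
    F = np.concatenate(([0.0], np.cumsum(baseline[order]) / baseline.sum()))
    L = np.concatenate(([0.0], np.cumsum(loss[order]) / loss.sum()))
    # Gini = 1 - 2 * (trapezoidal area under the ordered Lorenz curve).
    auc = np.sum(np.diff(F) * (L[1:] + L[:-1]) / 2.0)
    return 1.0 - 2.0 * auc

# A score that ranks losses perfectly against a flat baseline gives a positive index;
# a score identical to the baseline gives zero (ties keep the input order).
print(ordered_gini([1, 1, 1, 1], [4, 3, 2, 1], [4, 3, 2, 1]))  # close to 0.25
print(ordered_gini([1, 2, 3], [1, 2, 3], [1, 2, 3]))           # close to 0.0
```

The asymmetry noted in the text falls out of this construction: swapping which model plays the score and which plays the baseline changes the sort order, so the index is not symmetric in its arguments, and negative values are possible when the score ranks losses worse than the baseline.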
The Gini index is not symmetrical if different baselines are used. In short, a model with a larger Gini index indicates its superior capability of separating the observations. In the insurance context, the Gini index profiles the model's capability of distinguishing policy holders with different risk levels.

The Gini comparison is presented in Table 1. Here only fixed effects are considered. The AVB method shows state-of-the-art performance on both the AutoClaim and FineRoot datasets, comparing favorably even against the boosting tree method, which is considered to have strong performance. By inspecting the posterior estimation of P in Fig. 2, we find that AVB yields a richer representation of the index parameter, with two modes at 1.20 and 1.31. As P uniquely characterizes a Tweedie distribution, compared with the single value that the traditional profiled likelihood method offers, flexible choices of P enable customized descriptions for insurance policies that may have different underlying risk profiles. Also note that AVB uses a neural sampler that does not involve any rejection procedures; unlike MCMC, it holds the potential for large-scale prediction on high-dimensional datasets.

To compare the estimations of the random effects, we add "CARTYPE" in the AutoClaim data as the random effect, with 6 groups, and "PLANT" in the FineRoot data, with 8 groups.

Algo.    | P (AutoClaim) | σ_b^2 (AutoClaim) | P (FineRoot) | σ_b^2 (FineRoot)
PQL      | /             | 8.94·10^-5        | /            | 1.0346·10^-3
LAPLACE  | 1.3394        | 4.06·10^-4        | 1.4202       | 6.401·10^-3
AGQ      | 1.3394        | 4.027·10^-4       | 1.4202       | 6.401·10^-3
MCMC     | 1.3458        | 4.575·10^-3       | 1.4272       | 3.490·10^-2
AVB      | 1.3403        | 3.344·10^-3       | 1.4237       | 2.120·10^-2

Table 2: Estimation of P and σ_b^2 for both the AutoClaim and FineRoot data with random effects.

[Fig. 2: Posterior estimation of the index parameter P on the AutoClaim data (posterior densities over P in [1, 2] for MCMC and AVB).]

From Table 2, we can see that all the algorithms estimate P in a reasonably similar range, while the results from MCMC and AVB are closer. The estimation of σ_b^2 has a large variance. The estimates from the AGQ and Laplace methods are around ten times smaller than the MCMC and AVB results, while PQL produces even smaller estimates. This is consistent with our finding in the simulation study that the AGQ and Laplace methods tend to underestimate the random-effect variance. The AVB and MCMC estimations are considered more accurate; as the estimated variance of AVB is comparatively smaller than that of MCMC, the AVB estimation can be believed to be more accurate.

6. CONCLUSIONS

We present Adversarial Variational Bayes for solving Bayesian Tweedie mixed models. To empower the posterior approximation, we employ implicit models trained in an adversarial setting. To reduce the variance of the gradients, we sum over the local latent variable of the Tweedie model that is not reparameterizable. We also make the prior distribution trainable to guarantee the expressiveness of the posterior. Our method outperforms all traditional methods on both simulated and real-world data; meanwhile, the estimations of the variance parameter and the random effects are competitively accurate and more stable.

The Tweedie model is a fundamental model widely used in insurance policy pricing. To the best of our knowledge, this is the first work that leverages recent progress in deep learning methods to tackle insurance problems.

7. REFERENCES

[1] Bent Jorgensen, The Theory of Dispersion Models, CRC Press, 1997.
[2] Bent Jorgensen, "Exponential dispersion models," Journal of the Royal Statistical Society, Series B (Methodological), pp. 127–162, 1987.
[3] Umut Simsekli, Ali Taylan Cemgil, and Yusuf Kenan Yilmaz, "Learning the beta-divergence in tweedie compound poisson matrix factorization models," in ICML (3), 2013, pp. 1409–1417.
[4] Bent Jørgensen, Exponential Dispersion Models, Number 48, Instituto de Matematica Pura e Aplicada, 1991.
[5] Gordon K Smyth and Bent Jørgensen, "Fitting tweedie's compound poisson model to insurance claims data: dispersion modelling," Astin Bulletin, vol. 32, no. 01, pp. 143–157, 2002.
[6] Robert WM Wedderburn, "Quasi-likelihood functions, generalized linear models, and the gauss–newton method," Biometrika, vol. 61, no. 3, pp. 439–447, 1974.
[7] Marie Davidian and Raymond J Carroll, "Variance function estimation," Journal of the American Statistical Association, vol. 82, no. 400, pp. 1079–1091, 1987.
[8] John A Nelder and Daryl Pregibon, "An extended quasi-likelihood function," Biometrika, vol. 74, no. 2, pp. 221–232, 1987.
[9] DR Cox and N Reid, "A note on the calculation of adjusted profile likelihood," Journal of the Royal Statistical Society, Series B (Methodological), pp. 467–471, 1993.
[10] Peter K Dunn and Gordon K Smyth, "Series evaluation of tweedie exponential dispersion model densities," Statistics and Computing, vol. 15, no. 4, pp. 267–280, 2005.
[11] Peter K Dunn and Gordon K Smyth, "Evaluation of tweedie exponential dispersion model densities by fourier inversion," Statistics and Computing, vol. 18, no. 1, pp. 73–86, 2008.
[12] Yi Yang, Wei Qian, and Hui Zou, "A boosted tweedie compound poisson model for insurance premium," arXiv preprint arXiv:1508.06378, 2015.
[13] Tianqi Chen and Carlos Guestrin, "Xgboost: A scalable tree boosting system," arXiv preprint arXiv:1603.02754, 2016.
[14] Wei Qian, Yi Yang, and Hui Zou, "Tweedie's compound poisson model with grouped elastic net," Journal of Computational and Graphical Statistics, vol. 25, no. 2, pp. 606–625, 2016.
[15] Luke Tierney and Joseph B Kadane, "Accurate approximations for posterior moments and marginal densities," Journal of the American Statistical Association, vol. 81, no. 393, pp. 82–86, 1986.
[16] Qing Liu and Donald A Pierce, "A note on gauss–hermite quadrature," Biometrika, vol. 81, no. 3, pp. 624–629, 1994.
[17] Yanwei Zhang, "Likelihood-based and bayesian methods for tweedie compound poisson linear mixed models," Statistics and Computing, vol. 23, no. 6, pp. 743–757, 2013.
[18] Dani Gamerman and Hedibert F Lopes, Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference, CRC Press, 2006.
[19] Michael I Jordan, Zoubin Ghahramani, Tommi S Jaakkola, and Lawrence K Saul, "An introduction to variational methods for graphical models," Machine Learning, vol. 37, no. 2, pp. 183–233, 1999.
[20] Christopher M Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
[21] Ronald J Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Machine Learning, vol. 8, no. 3-4, pp. 229–256, 1992.
[22] Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal, "The 'wake-sleep' algorithm for unsupervised neural networks," Science, vol. 268, no. 5214, p. 1158, 1995.
[23] Andriy Mnih and Karol Gregor, "Neural variational inference and learning in belief networks," arXiv preprint arXiv:1402.0030, 2014.
[24] Michalis Titsias and Miguel Lázaro-Gredilla, "Local expectation gradients for black box variational inference," in Advances in Neural Information Processing Systems, 2015, pp. 2638–2646.
[25] Ferenc Huszár, "Variational inference using implicit distributions," arXiv preprint arXiv:1702.08235, 2017.
[26] Lars Mescheder, Sebastian Nowozin, and Andreas Geiger, "Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks," arXiv preprint arXiv:1701.04722, 2017.
[27] Diederik P Kingma and Max Welling, "Auto-encoding variational bayes," arXiv preprint arXiv:1312.6114, 2013.
[28] Alp Kucukelbir, Dustin Tran, Rajesh Ranganath, Andrew Gelman, and David M Blei, "Automatic differentiation variational inference," arXiv preprint arXiv:1603.00788, 2016.
[29] Peter McCullagh, "Generalized linear models," European Journal of Operational Research, vol. 16, no. 3, pp. 285–292, 1984.
[30] Yi Yang, Wei Qian, and Hui Zou, "Insurance premium prediction via gradient tree-boosted tweedie compound poisson models," Journal of Business & Economic Statistics, pp. 1–15, 2017.
[31] Karen CH Yip and Kelvin KW Yau, "On modeling claim frequency data in general insurance with extra zeros," Insurance: Mathematics and Economics, vol. 36, no. 2, pp. 153–163, 2005.
[32] HN De Silva, AJ Hall, DS Tustin, and PW Gandar, "Analysis of distribution of root length density of apple trees on different dwarfing rootstocks," Annals of Botany, vol. 83, no. 4, pp. 335–345, 1999.
[33] Edward W Frees, Glenn Meyers, and A David Cummings, "Summarizing insurance scores using a gini index," Journal of the American Statistical Association, vol. 106, no. 495, pp. 1085–1098, 2011.
[34] Edward W Jed Frees, Glenn Meyers, and A David Cummings, "Insurance ratemaking and a gini index," Journal of Risk and Insurance, vol. 81, no. 2, pp. 335–366, 2014.