Actuarial Applications of Hierarchical Modeling
Total Page:16
File Type:pdf, Size:1020Kb
Actuarial Applications of Hierarchical Modeling CAS RPM Seminar Jim Guszcza Chicago Deloitte Consulting LLP March, 2010 Antitrust Notice The Casualty Actuarial Society is committed to adhering strictly to the letter and spirit of the antitrust laws. Seminars conducted under the auspices of the CAS are designed solely to provide a forum for the expression of various points of view on topics described in the programs or agendas for such meetings . Under no circumstances shall CAS seminars be used as a means for competing companies or firms to reach any understanding – expressed or implied – that restricts competition or in any way impairs the ability of members to exercise independent business judgment regarding matters affecting competition. It is the responsibility of all seminar participants to be aware of antitrust regulations, to prevent any written or verbal discussions that appear to violate these laws, and to adhere in every respect to the CAS antitrust compliance policy. Copyright © 2009 Deloitte Development LLC. All rights reserved. 1 Topics Hierarchical Modeling Theory Sample Hierarchical Model Hierarchical Models and Credibilityyy Theory Case Study: Poisson Regression Copyright © 2009 Deloitte Development LLC. All rights reserved. 2 Hierarchical Model Theory Hierarchical Model Theory What is Hierarchical Modeling? • Hierarchical modeling is used when one’s data is grouped in some important way. • Claim experience by state or territory • Workers Comp claim experience by class code • Income by profession • Claim severity by injury type • Churn rate by agency • Multiple years of loss experience by policyholder. •… • Often grouped data is modeled either by: • Pooling the data and introducing dummy variables to reflect the groups • Building separate models by group • Hierarchical modeling offers a “third way”. • Parameters reflecting group mem bers hip enter one ’s mod el t hroug h appropriately specified probability sub-models. Copyright © 2009 Deloitte Development LLC. All rights reserved. 5 What’s in a Name? • Hierarchical models go by many different names • Mixed effects models • Random effects models • Multilevel models • Longitudinal models • Panel data models • I prefer the “hierarchical/multilevel model” terminology bkhdlbecause it evokes the way models-within-modldels are use d to reflect levels-within-levels of ones data. • An important special case of hierarchical models involves multiple observations through time of each unit. • Here group membership is the repeated observations belonging to each idiidlindividual. • Time is the covariate. Copyright © 2009 Deloitte Development LLC. All rights reserved. 6 Varying Slopes and Intercepts Random Intercept / Random Intercept Random Slope Random Slope Model Model Model • Intercept varies with group • Intercept stays constant • Intercept and slope vary • Slope stays constant • Slope varies by group by group • Each line represents a different group Copyright © 2009 Deloitte Development LLC. All rights reserved. 7 Common Hierarchical Models •Notation: • Data points (Xi, Yi)i=1…N • j[i]]p: data point i belongggps to group j. α β ε • Classical Linear Model Yi = + Xi + i α β σ2 • Equivalently: Yi ~ N( + Xi, ) •Same α and β for every data point α β ε • Random Intercept Model Yi = j[i] + Xi + i α μ σ2 ε σ2 • Where j ~ N( α, α)&) & i ~ N(0, ) •Same β for every data point; but α varies by group α β ε • Random Intercept and Slope Model Yi = j[i] + j[i]Xi + i α β Μ ε σ2 • Where ( j, j) ~ N( , ) & i ~ N(0, ) •Both α and β vary by group 2 α μα σ σ Y ~ N()α + β ⋅ X ,σ 2 where j ~ N ,Σ , Σ = α αβ i j[i] j[i] i β μ σ σ 2 j β αβ β Copyright © 2009 Deloitte Development LLC. All rights reserved. 8 Parameters and Hyperparameters • We can rewrite the random intercept model this way: α + β σ 2 α μ σ 2 Yi ~ N( j[i] X i , ) j ~ N( α , α ) • Suppose there are 100 levels: j = 1, 2, …, 100 (e.g. SIC bytes 1-2) α α α β • This model contains 101 parameters: { 1, 2, …, 100, }. • And it contains 4 hyperparameters: {μα, β, σ, σα}. • Here is how the hyperparameters relate to the parameters: n j αˆ = Z ⋅(y − βx ) + (1− Z )⋅ μˆα where Z = j j j j j j +σ 2 n j 2 σα • Does this formula look familiar? Copyright © 2009 Deloitte Development LLC. All rights reserved. 9 Sample Hierarchical Model Example • Suppose we wish to model a company’s policies in force, by region, for the years 2005-08. • 8 * 4 = 32 data points. Policies in Force by Year and Region • One way to visualize the data: 34 0 2700 – Plot all of the data points on the 00 3 same graph, use different 2 3 colors/symbols to represent 4 7 region. 4 2 0250026 00 8 4 pif •Alternate way: 78 561 – Use a trellis-style display, with 2300 24 785 one ppplot per reg ion 00 8 5 – More immediate representation of 2 7 6 61 the data’s hierarchical structure. 2 531 – (see next slide) 2100 220 6 00 1 200 2005 2006 2007 2008 Copyright © 2009 Deloitte Development LLC. All rights reserved. 11 year pif pif •must reflect the factthatPIFvariesbyregion. We •wish to build We time. a modelthatcaptures thechangeinPIF over Trellis-Style Data Display Copyright © 2009 DevelopmentrightsDeloitte All LLC.reserved. 2000 220024002600 2000 2200 2400 2600 2005 2006 2007 2008 2005 2006 2007 2008 region5 region1 region2 region3 region4 y y ear ear pif pif 2000 220024002600 2000 2200 2400 2600 2005 2006 2007 2008 2005 2006 2007 2008 region6 y y ear ear pif pif 2000 220024002600 2000 2200 2400 2600 2005 2006 2007 2008 2005 2006 2007 2008 region7 y y ear ear pif pif 2000 220024002600 2000 2200 2400 2600 2005 2006 2007 2008 2005 2006 2007 2008 region8 y y ear ear 12 Option 1: Simple Regression • The easiest thing to do is to pool the data across groups -- i.e. simply ignore region • Fit a simple linear model PIF = α + βt + ε • Alas, this model is not appropriate for all regions region1 region2 region3 region4 2600 2600 2600 2600 pif pif pif pif 2200 2400 2200 2400 2200 2400 2200 2400 2000 2000 2000 2000 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 year year year year region5 region6 region7 region8 2400 2600 2400 2600 2400 2600 2400 2600 pif pif pif pif 2000 2200 2000 2200 2000 2200 2000 2200 Copyright © 2009 Deloitte Development LLC. All rights reserved. 13 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 Option 2: Separate Models by Region • At the other extreme, we can fit a separate simple linear model for each region. k k k {}PIF = α + β t + ε = • Each model is fit with 4 data points. k 1,2,..,8 • Introduces danger of over -fitting the data. region1 region2 region3 region4 2600 2600 2600 2600 pif pif pif pif 2200 2400 2200 2400 2200 2400 2200 2400 2000 2000 2000 2000 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 year year year year region5 region6 region7 region8 2400 2600 2400 2600 2400 2600 2400 2600 pif pif pif pif 2000 2200 2000 2200 2000 2200 2000 2200 14 Copyright ©2005 2009 Deloitte 2006 Development 2007 2008 LLC.2005 All rights 2006 reserved. 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 Option 3: Random Intercept Hierarchical Model • Compromise: Reflect the region group structure using a hierarchical model. α + β σ 2 α μ σ 2 PIF ~ N( j[i] t, ) j ~ N( α , α ) region1 region2 region3 region4 2600 2600 2600 2600 pif pif pif pif 2200 2400 2200 2400 2200 2400 2200 2400 00 00 00 00 200 200 200 200 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 year year year year region5 region6 region7 region8 2400 2600 2400 2600 2400 2600 2400 2600 pif pif pif pif 2000 2200 2000 2200 2000 2200 2000 2200 Copyright2005 © 2009 2006 Deloitte 2007 Development 2008 LLC.2005 All rights 2006 reserved. 2007 2008 2005 2006 2007 2008 2005 2006 2007 2008 15 Compromise Between Complete Pooling & No Pooling = α + β + ε {}= α k + β k + ε k PIF t PIF t k =1,2,..,8 Complete Pooling No Pooling •Ignore group structure • Estimating one model for each altogether group Compromise Hierarchical Model • EtiEstima tes parame ters using a compromise between complete pooling and no pooling methodologies α + β σ 2 α μ σ 2 PIF ~ N( j[i] t, ) j ~ N( α , α ) Copyright © 2009 Deloitte Development LLC. All rights reserved. 16 Option 1b: Adding Dummy Variables • Question: of course it’d be crazy to fit a separate SLR for each region. • But what about adding 8 region dummy variables into the SLR? = γ + γ + + γ + β + ε PIF 1R1 2 R2 ... 8R8 t • If we do this, we need to estimate 9 parameters instead of 2. • In contrast , the random intercept model contains 4 hyperparameters: μα, β, σ, σα • Now suppose our example contained 800 regions . If we use dummy variables, our SLR potentially requires that we estimate 801 parameters. • But the random intercept model will contain the same 4 hyperparameters. Copyright © 2009 Deloitte Development LLC. All rights reserved. 17 Varying Slopes • The random intercept model is a compromise between a “pooled” SLR and a separate SLR by region. α + β σ 2 α μ σ 2 PIF ~ N( j[i] t, ) j ~ N( α , α ) • But there is nothing sacred about the intercept term: we can also allow the sl opes t o vary b y regi on.