Quick viewing(Text Mode)

A General Framework for Forecasting Numbers of Claims

A General Framework for Forecasting Numbers of Claims

Ageneralframeworkforforecastingnumbersofclaims Topic2 PricingRisk(RiskMargins) Author Wright,Thomas,MA,CStat,FIA Deloitte&ToucheLLP 1StonecutterStreet London EC4A4TR Tel:+44(0)2073036240 [email protected] Abstract In applications of the collective risk model, significantly more attention is often given to modellingseveritythan modelling frequency. Sometimes, frequency modellingisneglectedto theextentofusingaPoissondistributionforthenumberofclaims.ThePoissondistributionhas equalto,andtherearemultiplereasonswhythisisalmostneverappropriatewhen forecastingnumbersofnonlifeinsuranceclaims. Theinappropriatenessof the Poissondistribution for forecasting has longbeen recognisedby many, and collective risk algorithms (Panjer (1980), Heckman & Meyers (1983)) have been developedthat workjust aswellwithotherfrequency distributions,inparticular the Negative Binomial.However,tocalibrateaNegativeBinomialmodelrequirestwo,equivalent tospecifyingbothmeanandvariance.Theauthorbelievesthatonereasonfortheprevalenceof Poissonmodelsislackofknowledgeabouthowtoobjectivelyquantifythevarianceaswellas themean.Thispaperaimstocontributeinthisarea. Themainreasonswhythevarianceshouldexceedtheexpectednumberofclaimsareidentified asestimationerror,heterogeneity,contagion,andfutureexposureuncertainty.While all these factors have long been recognised by some practitioners, this paper provides a frameworkfortheirsystematicanalysisandquantification.Amathematicalmodelisdeveloped in which these concepts are precisely defined, and statistical methods are developed for the quantification of these factors from claim frequency data. The model also shows how these factorsinteracttoproducetheoverallvarianceforforecasts. It is not claimed that the particular form of model presented will be appropriate in all circumstances, but where necessary, modifications will often be possible within the general frameworkpresentedhere. Keywords Bayesian forecasting, claim frequency, collective risk model, contagion, heterogeneity, maximumlikelihoodestimation,negativebinomialdistribution,parameteruncertainty,poisson distribution,riskloading. 1 Motivation

1.1 WhyisthePoissonassumption‘never’appropriateforforecasting?

1.1.1 Introduction Claimnumberprobabilitydistributionsareusedinnonlifeinsuranceforforecastinganunknown future number of claims, whether for pricingor reserving purposes. For these purposes, the probability distribution should include all sources of uncertainty, otherwise riskmargins in reservesand/orpremiumswillbeinadequate. Itisusefultodistinguishaleatoricandepistemicuncertainty.Aleatoricuncertaintyisalsoknown asprocessuncertainty:epistemiccomprisesbothparameterandmodeluncertainty.Toillustrate theseconcepts,weconsidertheexampleofrollinganormalsixsideddie(which,incidentally,is theproblemthatpromptedtheinventionofprobabilitytheorybyPascalandFermatin1654).

1.1.2 Process,parameterandmodeluncertainty:dierollingexample Weaimtopredictthenumberofadieshowsa‘six’whenrolledafixednumberoftimes. Ifwearesurethatthedieisperfectlybalanced,thenthenumberof‘sixes’followsaBinomial distributionwithparameters nand p,where nisthenumberofthrowsand pis1/6.Wecanuse thisBinomialdistributiontomakeprobabilisticpredictionstatements,forexample:“ifweroll thedie4times,thechanceof4sixesis1in1,296”(thatis, 1/6 4).TheBinomialdistribution representspurealeatoric(orprocess)uncertaintyinthisexample. Theremayalsobeuncertaintyarisingfromdoubtastowhetherthedieisperfectlybalanced.The die may be weighted in such a way as to make a ‘six’ more or less likely. If it is weighted towardsshowingasix,thenclearlythechanceofobserving4sixesin4throwsisgreaterthan 1/1,296: if it is weighted against, then the chance of observing 4 sixes is less than 1/1,296. Perhaps less obviously: if we are uncertain as to whether the die is weighted for or against showingasix,thenthechanceofobserving4sixesin4throwsisusuallygreaterthan1/1,296 becauseofthepossibilitythatthedieisweightedinfavour. Suppose, forexample,thatbeforeobservinganythrowsofthedie,webelievethereisa40% chancethatitisperfectlybalanced( p =1/6or2/12),a30%chancethatinthelongrunitwill show1sixinevery12throws( p=1/12),anda30%chancethatinthelongrunitwillshow1 sixinevery4throws( p =¼or3/12). Thechanceofobtaininga‘six’inasinglethrowisnow: 0.30*1/12+0.40*2/12+0.30*3/12=1/6,whichisthesameasforaperfectlybalanceddie. However,thechanceof4sixesin4throwsis: 0.30*(1/12) 4+0.40*(2/12) 4+0.30*(3/12) 4=1/669,whichisnearlydoublethechanceof obtaining4sixesfromadiethatisknowntobeperfectlybalanced. Uncertainty about whether or not the die is perfectly balanced is an example of epistemic uncertainty. This type of epistemic uncertainty is also known as parameter uncertainty: it is uncertaintyaboutthevalueoftheparameterp.Theexampleillustratesthatparameteruncertainty generallyincreasesthechancesofextremeoutcomes.

1 Another type of epistemic uncertainty is model uncertainty. In the above calculations we assumedthatthediehasafixedshapeandinternalstructuresothatthechanceofthrowingasix remainsconstant.Perhapsthediehasasoftcentreofnonhomogeneousviscousfluid,soasit restswithasixuppermost,itscentreofmassshiftsdownmakingfurthersixesmorelikely.In this case our mathematical model, in which the probability p of a six remains constant, is incorrect.Thispossibilityfurtherincreasesuncertaintyaboutfutureoutcomes.

1.1.3 Backtoinsurance:Poissonprocess In the dierolling example, the appropriate probability distribution representing aleatoric uncertaintyistheBinomialbecausethenumberofsixesislimitedbythenumberofthrows.For numbersofnonlifeinsuranceclaims,aPoissondistributionisusuallymoreappropriate.Thisis because most nonlife policies provide cover for a fixed period of with no limit on the number of claims. There are exceptions to this, but where there is a contractual limit on the number of claims (as when cover terminates following the first claim, for example) the probabilityofaclaimoneachindividualpolicyisusuallysolowthatthePoissondistribution providesagoodapproximation.(RecallthattheBinomialdistributionwithparameters n and p approachesaPoissondistributionwithparameter λ=n.p as ptendstozero.) Somepoliciesdonotplaceacontractuallimitonthenumberofclaimsbutmaybesubjectto practicallimits.Forexample,inautoinsurance,acarisusuallyofftheroadforrepairsfollowing aclaim,whichcreatesanupperlimitofperhapsoneclaimeverycoupleofweeks.However,the probability of a claim in any two week period isusually so low that the Poissondistribution againprovidesaverygoodapproximation. SoforaleatoricuncertaintyweusethePoissondistribution.APoissondistributioniscompletely specified by its mean, which wedenoteλ. ThePoissondistribution has theproperty that the variance is equal to the mean. However, in the realworld, the parameter λ is never known precisely,soevenifthePoissonmodelisbasicallycorrect,wealsohaveepistemic(parameter) uncertainty. For this reason, the Poisson distribution should not be used for forecasting: we shouldinsteaduseadistributionwithvariancegreaterthanmean,suchastheNegativeBinomial.

1.2 Fourtypesofparameteruncertainty There are many possible sources of parameter uncertainty in nonlife insurance. These are consideredinthispaperasfourmaintypes: • Estimationuncertainty • Heterogeneity • Contagion • Exposureuncertainty

1.2.1 Estimationuncertainty Estimationuncertaintyisperhapsthepurestformofparameteruncertainty.Intherealworlditis alwayspresentandusuallymaterial,soestimationuncertaintyaloneprovidesastrongargument for‘never’usingthePoissondistributioninforecastingclaimnumbers.

2 Backtothedierollingexample:supposewehavetheopportunitytoobserve12throwsofthedie beforehavingtoforecastthenumberofsixesthatwillappearinafurther4throws.Inthe12 throws,weobserve6sixes. Thechanceof6ormore sixes appearing if thedie is perfectly balancedcanbecalculatedfromtheBinomialdistributionwith p=1/6aslessthan0.8%.We might reasonably conclude from this that p is probably notequal to1/6. In this case, having observed 6 sixes in 12 throws, we would naturally estimate p as ½. We could now base probabilisticpredictionsforthenext4throwsontheBinomialdistributionwith p=½.However, intheabsenceofanyadditional(‘prior’)information,theestimate p=½isquiteunreliable:the standarddeviationofthisestimateis0.13andwecanbeonly90%confidentthatthetruevalue of p isbetween0.29and0.71.(ThisisaBayesianconfidenceintervalbasedonauniformprior distribution:seeSectionA.1.1oftheappendixfordetails.) Instead of using a single Binomial distribution for forecasting, we should use a mixture of Binomials, allowing for the full rangeof possible values of theparameter p. The appropriate distributionforforecastinghereisknownasthe‘BetaBinomial’distributionasitisamixtureof BinomialsbasedonaBetadistributionfor p(furtherdetailsinSectionA.1.2oftheappendix). TheprinciplesofmixingwereillustratedbythenumericalexampleinSection1.1.2,wherewe foundthatparameteruncertaintyincreasesthechanceofextremeoutcomes. InthecaseofaPoissonclaimgeneratingprocess:supposetherewere6claimslastyear.Inthe absence of any other information, we estimate λ=6 for the annual frequency. The Bayesian posteriordistributionfor λisaGammadistribution,soinforecastingthenumberofclaimsnext year,weshoulduseaGammamixtureofPoissondistributions,whichis wellknown tobe a NegativeBinomialdistribution. Notethatthroughoutthispaper,“NegativeBinomialdistribution”referstothegeneralizedform inwhich theparameter cantake anypositive realvalue,notnecessarily restrictedto integer values.FurtherdetailsaregiveninSectionA.2.2oftheappendix.

1.2.2 Heterogeneity Heterogeneityreferstodifferentbasicunitsofexposure,andthecommonphenomenonthatnot allareequallyrisky.IntermsofthePoissonmodel:theparameterλisusuallynotthesamefor allbasicriskunits.Forexample,inautoinsurance,somedriversaremoreaccidentpronethan others:inprofessionalindemnityinsurance,somelawyersarelessscrupulousthanothers,etc. Heterogeneityalsoencompassessituationswheretheriskinessofasinglepolicychangeswith time.Forexample,anewlyqualifieddrivermaybecomelessaccidentproneduringthefirstyear ofdriving:aprofessionalpersonmaybemorelikelytomakemistakesduringbusyperiodsthan quiet periods, etc. In these situations, we couldconsider the basic unitof exposure tobe the “policymonth” (or some other suitable time period during which risk might reasonably be approximated as constant): we are then again concerned with the phenomenon that not all exposureunitshavethesamePoissonparameterλ. “Heterogeneity”canalsobestretchedtoaccommodatenonindependenceoflossesarisingfrom asinglebasicunitofexposure.Ifoneclaimtendstoleadtoanotheronthesamepolicy,thisis much thesamesituationasheterogeneity over time for a fixed riskunit.(An exampleofthis occursinhouseholdtheftinsurancewherethievessometimesreturnforthereplacementgoods providedbytheinsurer.)Itisalsopossiblethat,followingaclaim,thechanceofafurtherclaim fromthesameexposureunitistemporarilyreduced(aswhenacarisofftheroadforrepairs,for

3 example).Thiscausesapartialshifttowards a Binomialclaim generatingprocess, andas we shallseelater,canbeaccommodatedas“negativeheterogeneity”.However,allothercauseslead topositiveheterogeneity,andwhenallcausesarecombined,theoveralllevelofheterogeneityis nearlyalwayspositiveinnonlifeinsurance. Notethatheterogeneityinitspurestform(iedifferencesbetweenthebasicriskunits)isonlyan issuewhenforecastingifsomeofthefutureriskunitshavenotyetbeenobserved.Ifallriskunits havebeenselectedandpreviouslyobserved(forexample,whenconsideringrenewalofcurrent policies)heterogeneitydoesnotcausethevarianceofthenumberofclaimstoexceedthemean. Thisisbecause,if n1and n2arePoissonrandomvariableswithdifferentparametersλ1andλ2, thentheirsum n1+n 2isalsoPoisson(withparameterλ1+λ 2)sohasvarianceequaltomean. Heterogeneitybecomesanissueinforecastingifweexpectsomenewriskunits,butdon’tknow whethertheirPoissonparameterswillbeλ 1orλ 2.

1.2.3 Contagion In this paper, contagion refers to the possibility that the Poisson parameter will temporarily increase (ordecrease) for many risk unitssimultaneously, as a result of a common cause. In manypropertyclasses(auto,homeowners, fire,marineetc)amajorcauseofcontagionisthe weather. For example, while roads are icy, claim frequency temporarily increases for diverse categories of driver (whether they have high or low Poisson rates in normal conditions). In liabilityclasses,contagionmightbecausedbyalegalrulingthatleadstomultipleclaims. Thedifferencebetweenheterogeneity andcontagion isthatheterogeneityrefers todifferences between exposure units at a particular point in time (or within a particular epoch, such as a calendaryear)butcontagionreferstodifferencesinriskatdifferentpointsintime(orbetween differentdefinedepochs). Economicconditionsareanotherpossiblecauseoftimedependencyinmanylinesofinsurance (eg creditinsurance, workers compensation). However, these tend tocause gradual trends, or cyclesofseveralyearsduration,which,totheextentthattheyarepredictableatthestartofeach year, do not lead to increased parameter uncertainty. Contagion refers to unpredictable time effects:trendsarehandledseparatelyinthemathematicalmodeldevelopedlaterinthispaper. Trends are unpredictable to the extent that the parameters that describe them have to be estimated,butthisistakenintoaccountasparameterestimationuncertainty.

1.2.4 Exposureuncertainty Ifλrepresentstheexpectednumberofclaimsperunitofexposure,and xrepresentstheexpected exposure,thentheexpectednumberofclaimsis λ.x (assuminguncertaintyinthesetwofactorsis mutuallyindependent).Clearlysomeuncertaintyintheactualtotalnumberofclaimsisinduced byuncertaintyin x. Indierolling,exposureuncertaintyislikenotknowingthenumberofthrows nwhentryingto forecastthenumberofsixesthatwilloccur.Perhapswearetoldonlythatthediewillbethrown repeatedly for one minute. This is analogous to a reinsurance treaty that covers all primary policiesinforceforoneyear.Inreinsurancetreatypricing,thereisoftenuncertaintyinthefuture levelofexposurethatwillbecovered.Theextenttowhichthisneedstobeexplicitlytakeninto accountdependsonhowthetreatywillbepriced.Iftheaimistoproduceapremiumrateper unitofexposure,thenexposureuncertaintyhasonlysecondordereffects.

4 1.2.5 Summary Attheriskorstatingtheobvious:eachofthefoursourcesofparameteruncertaintydescribed aboveincreasesuncertaintywhenforecastingnumbersofclaims.IfthePoissondistributionisa reasonable model for aleatoric uncertaintyinclaimnumbers, thenthepresence ofany oneof thesefourtypesofparameteruncertaintywillincreasethevariancesothatitexceedsthemean.It isimportantthatallfourtypesofparameteruncertaintyareconsideredotherwisevariancemay beunderstated,leadingtoinadequateriskloadsinpremiumsand/orreserves. In Section 2 of this paper, we develop a fairly general mathematical model that incorporates Poisson process error and the four types of parameter uncertainty: estimation error, heterogeneity, contagion, andexposureuncertainty. Section 3outlines statistical methods that canbeusedtocalibratethemodelbasedonpastfrequencyandexposuredata.First,inSection 1.3,wegivesomeexamplestohighlighttheimportanceoftakingcarewiththecalibrationofthe claimnumberdistribution.

1.3 Effectofusingwrongclaimnumberdistribution

1.3.1 Lowfrequencyexamplewithestimationerror Consider a catastrophe excess of loss reinsurance treaty covering a specific type of natural catastropheforoneyear.Thecedenthasdecidedtobuycoverwiththepereventretentionsetat the level of a 10year return period. To assess the amount of sideways cover required (the numberofreinstatements)thecedentisinterestedinquestionssuchas:whatistheprobabilityof twoormoresuchlargeeventsoccurringinasingleyear?Theanswertothisquestionclearly dependsontheprobabilitydistributionassumedforthenumberofsuchlargeeventsinoneyear. Bydefinitionofeventswitha10yearreturnperiod,themeanofthisdistributionis0.1. Wenowsupposethatonlyearthquakeswillbecoveredbythetreatyandthecedentcarriesout calculations using a Poisson distribution in the belief (quite widely supported among seismologists)thatthisisareasonablemodelforearthquakeoccurrence. ThemiddlecolumnofTable1givestheprobabilityofneventsforaPoissondistributionwith meanequalto0.1.Theprobabilitythatinasingleyeartwoormoreeventsoccureachproducing alossinexcessofthe10yearreturnvalueis0.468%(=100%90.484%9.048%). Table1–Probabilityofnclaims–effectofparameteruncertainty No allowance for parameter Allowance for parameter estimation n uncertainty(Poisson) error(NegativeBinomial) 0 90.484% 90.555% 1 9.048% 8.913% 2 0.452% 0.509% 3 0.015% 0.022% 4 0.000% 0.001%

5 However,thiscalculationignoresparameteruncertainty.Thecedentmostlikelyobtainedthe10 year return period loss amount by using a catastrophe modelling system: let us suppose this producesa10yearlossfigureforthecedentof$100million,sothisisthepereventretention beingconsidered.Becauseofparameteruncertaintyinthecatmodel,thetrueannualfrequency ofquakescausingagrosslosstothecedentinexcessof$100millionisnotknownwithcertainty tobe precisely 0.1. A typical95%confidence interval for the true frequency wouldbe from 0.038to0.192(correspondingtoareturnperiodofbetween5.2and27years).Whenthisistaken intoaccount,thevarianceofthenumberofeventsexceeding$100millionincreasesfrom0.1to 0.1016(detailsaregiveninSectionA.2.3oftheAppendix).Thechanceoftwoormorelossesin excessof$100mincreasesfrom0.468%to0.532%asshowninthefinalcolumnofTable1. Toseewhateffectthisparameteruncertaintyshouldhaveontechnicalpremiums,wecalculate theaggregatelossdistributionforatreatycoveringthelayer$100mexcessof$100meachevent. We assume that the severity distribution for the part of cat losses in excess of the $100m retentionisParetowithscale$100mandshapeparameter2.Thisimpliesthat,foreachlossevent impactingthetreaty,themeanamountrecoverableis$50mandthereisa25%chancethatthe full$100mwillberecoverable. The aggregate lossunder the treaty hasbeencalculatedusingboth the Poisson and Negative Binomial models for the number of events exceeding $100m gross. Results are shown in columns2and3ofthetablebelow.Forbothmodels,theexpectednumberofeventsis0.1and themeanlosstothetreatypereventis$50msothemeanaggregatelossis$5m.Howeverthe standarddeviationoftheaggregatelossisslightlyhigherusingtheNegativeBinomialmodel. Thefinaltwocolumnsrelatetotheaggregatelossdistributionforthesameeventlimits($100m excessof $100m) but with an aggregatedeductibleof $100m (inother words, the treaty is a ‘backup’treaty).Inthiscase,theuseofaPoissondistributionunderstatesthemeanaggregate lossaswellasthestandarddeviation. Table2Aggregateloss($m)fortreatycoveringlayer$100mxs$100mperevent Noaggregatedeductible Inneraggregate£100m Poisson NegBin Poisson NegBin Mean $5.00m $5.00m $0.11m $0.12m StdDev $19.65m $19.76m $2.72m $2.95m Premium $14.83m $14.88m $1.46m $1.60m Thefinalrowofthetableshowsatechnicalpremiumcalculatedasmeanplushalfthestandard deviation. In the case of the treaty with no aggregate deductible, it is just the risk load (the standard deviation) that is slightly understated by ignoring estimation error in the frequency parameter,leadingtoasmall,probablyimmaterial,understatementoftechnicalpremium.Inthe case of the backup treaty (aggregate deductible of £100m) both the mean and the standard deviationareunderstatedbyignoringparameterestimationerrorinthefrequency,leadingtoan understatementinthetechnicalpremiumofapproximately8%. Inthecaseofeventssuchashurricanesthathaveatendencytooccurinclusters,theuseofa Poissondistributionwouldhaveamoresignificantimpactbecauseitwouldignorecontagionas wellasparameterestimationerror.Contagionisconsideredinthenextexample.

6 1.3.2 Highfrequencyexamplewithcontagion We consider the aggregate loss arising from a homeowners portfolio. The individual loss severitydistributionisLogNormalwithmeanequalto$1,000andstandarddeviationequalto $5,000:thishasacoefficientof140andimplies1in100lossesexceeds$13,000,1in 1,000exceeds$52,000and1in10,000exceeds$161,000.Wesupposethereisapolicylimitof $100,000 per claim: this reduces the mean, standard deviation and skewness of the loss distributionto$978.05,$3,632and14.1respectively.Themeanclaimfrequencyisestimatedas 0.2perpolicy year, andwehaveaportfolioof1millionpolicies, sotheexpectednumberof claimsnextyearis200,000. A NegativeBinomialdistribution(whichhas variance greaterthan mean) ismoreappropriate thanaPoissondistribution(varianceequaltomean)forallthereasonsdiscussedinSection1.2, but contagion will be the dominant cause of superPoisson uncertainty in this case. The assumptionofindependentincrementsrequiredforaPoissonprocessisclearlyviolated:claims fromahomeownersportfoliodonotoccurindependentlybecausemanyclaimsmayresultfrom asinglecause(oftenweatherrelated). Hopefully, nobody would ever use a Poisson distribution in this situation as it is so clearly wrong.ThePoissondistributionhasstandarddeviationequaltosquarerootofthemean,sothe variationcoefficient(defined as the ratio ofstandarddeviation to mean) tends to zero as the meanincreases.Thisisaninstanceofthe‘lawoflargenumbers’,whichappliesherebecauseof theindependenceassumptionimplicitinthePoissonassumption. Inthisexample,themeanis 200,000,sothePoissonstandarddeviationwouldbe447.2,givingavariationcoefficientofonly 0.22%forthenumberofclaims.Theindividuallossdistribution(thecappedLogNormal)isso skewedinthisexamplethatmostoftheuncertaintyintheaggregatelosswouldarisefromthis ratherthanfromvariationinthenumberofclaims.(Thevariationcoefficientoftheaggregate loss calculated using a Poisson assumption is in fact 0.86%: nearly four times the value attributabletoPoissonclaimnumbervariation.) When there are multiple units of exposure as in this example, Poisson parameter uncertainty causedbyestimationerrororcontagionimpactsallexposureunitsinthesamedirection,sothe variation coefficient of the total number of claims does not tend to zero as the size of the portfolioincreases.Thevariationcoefficientinsteadtendstoafiniteconstantdeterminedbythe amount of estimation error and contagion. (Here it is assumed that the degree of contagion remainsthesameastheportfoliogrows:ifcontagioniscausedprimarilybyweathereffects,then thedegreeofcontagionwillfallifthegeographicspreadoftheportfolioincreases.)Thegeneral model developed in Section 2 shows that the squared variation coefficient of the number of 2 2 2 2 claimsapproaches ρ e +ρ + ρ e .ρ c as theportfolio sizeincreases,where ρe andρ c are the variation coefficients for estimation error and contagion respectively. In this example, we suppose there is no estimation error (ρ e = 0: never true in the real world!) so the variation coefficientforclaimnumbersapproachesρ c.Theprecisedefinitionof ρc iselaboratedinSection 2:forthetimebeing,thereaderisaskedtoacceptthat5%isarealisticvalueforthisexample. We therefore use a Negative Binomial distribution with mean equal to 200,000 and standard deviationequalto10,000torepresentthetotalnumberofclaimsthatwilloccurnextyearonthe portfolio.(IntermsoftheNegativeBinomialparametersusedintheappendix:p=0.002,r= 400.5016.)Table3belowgivesmomentsoftheaggregatelossdistributioncalculatedusingthe collectiveriskmodel.

7 Table3–Momentsofaggregatelossdistribution CappedLogNormal Constantclaimseverity Mean $195,609,000 $195,609,000 StdDeviation $9,914,450 $9,780,450 VariationCoefficient 5.07% 5.00% Skewness 0.10002 0.09990 0.01499 0.01497 Quintessence 1.0032 1.0020 Sixth 0.3257 0.3251 Results inthe middle columnarebasedon the cappedLogNormaldistributionforindividual claimamounts(whichhascoefficientsofvariationandskewnessof3.71and14.1respectively). ResultsinthefinalcolumnarebasedonthesameNegativeBinomialdistributionforthenumber of claims, but a constant claim severity of $978.05: the variation coefficient, skewness and highermomentsgiveninthelastcolumnarejustthoseoftheNegativeBinomialdistribution usedtomodelthenumberofclaims. ThestrikingfeatureoftheseresultsishowclosetheresultsbasedontheLogNormalseverity are to those calculated using a constant severity. This illustrates the fact that where there is contagion, the shape of the individual loss distribution becomes irrelevant as the expected numberofclaimsincreases .(Foraformalproof,seeLundberg(1964).)Thesameapplieswhere there is parameter estimation or exposure uncertainty. Intuitively, this can be seen as a consequence of the lawof large numbers: as the number of claimsbecomes large, their total approaches their number multiplied by the mean of the severity distribution: variation in the amounts of individual claims cancels out until it is negligible compared to variation in the numberofclaims.ThisisnotthecaseinaPoissonmodelbecauseinthatcasethecoefficientof variationinthenumberofclaimsalsotendstozero. The practical consequence is: in the presence of contagion and/or estimation error and/or exposureuncertainty(iealways),whentheexpectednumberofclaimsisverylarge,weshould notwasteourtimeandeffortdevelopingaveryrefinedmodelofindividuallossamounts.We shouldinsteaddevotemostofourefforttoensuringthatwehaveagoodmodelforthenumberof claims,becausetheaggregatelossdistributionultimatelydependsonlyonthis.

8 2 Generalformulaforsingleepochforecasts

2.1 Overview In this section, general formulas for the mean and variance of a claim number forecasting distributionaredeveloped.Theformulastakeaccountofallfivetypesofuncertaintyidentified inSection1: • Poissonprocessuncertainty • Parameterestimationuncertainty • Heterogeneityofriskunits • Contagioneffects • Exposureuncertainty Notethatweareconcernedinthissectiononlywiththefirsttwomoments(meanandvariance) of the claim number distribution. These are sufficient to calibrate a Negative Binomial distribution for the forecast number of claims. Further comments on the use of a Negative BinomialdistributionforthispurposearegiveninSection2.3.4. ThenumberofclaimsbeingforecastisdenotedN,anditsmeanandvariancearedenotedE(N) andVar(N). GeneralformulasforE(N) andVar(N)arederivedbyrepeatedapplicationofthe followingresultsfromthetheoryofconditionalprobability: E(X) =E(E(X|Y)) 1 Var(X) =E(Var(X|Y))+Var(E(X|Y)) 2 WestartwiththePoissonmodelofaleatoricuncertainty,thenapplyformulas1and2oncefor eachofthefourtypesofparameteruncertainty.Thefinalmodeldoesnotdependontheorderin which the four types of parameter uncertainty are taken into account. In the following we introduce them in the order: heterogeneity, exposure uncertainty, contagion, and finally parameteruncertainty.

2.2 Derivationofthegeneralformula

2.2.1 Poissonprocessuncertainty AsdiscussedinSection1.1.3,ourbasicmodelofaleatoricuncertaintyisthePoissonprocess. ThePoissonparameter mayvarybetweenriskunits within anepoch(heterogeneity) andmay varyfromoneepochtothenextthroughcontagioneffects.Soweuseλ it todenotethePoisson parameterrepresentingthemeanclaimfrequencyforexposureunitiinepocht. Initially, we consider a single epoch, and to simplify the notation, we temporarily drop the subscriptt.Considerasingleriskunitforwhichthemeanfrequencyλiisknown.Sinceweare assumingthatthenumberofclaimsthatwillarise,Ni,hasaPoissondistributionwehave:

E(N i |λ i) =Var(N i |λ i)=λ i 3

9 2.2.2 Heterogeneity

2.2.2.1 Singleriskunitdrawnatrandom Still within a single epoch, imagine that we draw a single risk unit at random from a heterogeneouspopulationofriskunits.λandρhdenotethemeanandvariationcoefficient(across allriskunits)ofthePoissonparameters.λinowrepresentsthePoissonparameterofthesingle randomlyselectedriskunit:Ni representsthenumberofclaimsthatwillarisefromthisrandomly selectedunit.Ifthepopulationparametersλ,ρhareknown: 2 E(λ i)=λ and Var(λ i)=(ρ h.λ) bydefinitionofλandρ h 4

Wenowapplyformulas1and2toobtainthemeanandvarianceofN i:

E(N i) =E(E(Ni|λi)) byequation1

=E(λi) byequation3 =λ byequation4 5

Var(Ni) =E(Var(Ni|λi))+Var(E(Ni|λi)) byequation2

=E(λi)+Var(λi) byequation3 2 =λ+(ρh.λ) byequation4 2 =λ.(1+φ.λ) whereφ=ρ h 6 Laterwewillallownegativevaluesofφ.Clearlyifφisnegative,itcannotrepresentthesquared variationcoefficientofheterogeneityasdefinedabove.Instead,asdiscussedinSection1.2.2,a negative value of φ may arise from nonPoisson aleatoricuncertainty with variance less than mean. If aleatoric uncertainty is better approximated as Binomial, with r being the maximum possiblenumberofclaimsperriskunit(risassumedthesameforallriskunits),theninsteadof Var(N i|λ i)=λ i(fromequation3)wehaveVar(N i|λi)=λ i.(1λ i/r),(becauseλ iisnowtheBinomial 2 2 mean: λ i = r.p i say). The above calculation then gives Var(N i)=λ.(1+{ρh – (1+ ρh )/r}.λ), 2 2 whichisthesameasequation6butwithφ=ρh –(1+ρh )/r.Equation6canbeviewedasthe special case in which r is infinite. For smaller values of r, the combined effects of pure heterogeneity (as represented by ρ h) and thenonPoisson aleatoric uncertainty may produce a negativevalueforφ.Thisishoweverunusualinnonlifeinsurance.

2.2.2.2 Multipleriskunitsdrawnatrandom Now suppose that x units of exposure are selected independently at random from the heterogeneouspopulation,and thateachunit generatesclaimsindependently. We also assume thatthepopulationisverylargesotheparametersλandρhoftheremainingpopulationhardly change as risk units are withdrawn. Ndenotes the total numberof claims from all xunitsof exposure.Themeanisthesumofthe,and(byindependence)thevarianceisthesumof the,sofromtheaboveresultsforasingleunit ofexposure (equations5 and6) we have: E(N|x) =x.λ 7 Var(N|x) =x.λ.(1+φ.λ) 8

10 2.2.2.3 Sampledrawnpartlyatrandom Moregenerally,supposeaproportionqofthexunitsofexposureisselectedatrandomfromthe heterogeneous population, the remainder being a fixed sample with mean Poisson parameter equaltoλ.Inthiscasewehave: Var(N|x) =(1q).x.λ+q.x.λ.(1+φ.λ) =x.λ.(1+q.φ.λ) 9 Thefirsttermisforthe(1q).xexposureunitsthatarefixedwithaveragePoissonrateequaltoλ. Eventhoughtheremaybeheterogeneitywithinthisgroup,thetotalnumberofclaimsisthesum ofthePoissonnumbersgeneratedbyeachunitofthegroup,soisalsoPoissonwithparameter equal tothe sumof theindividual Poissonparameters,that is(1q).xtimes the meanPoisson parameter.Heterogeneitywithinthisfixedpartoftheportfolioaffectsthereliabilitywithwhich the mean Poisson rateλcan be estimated from pastdata for the sameportfolio, but this is a separateissuetakenintoaccountlater(parameterestimationerror,Section2.2.5). Equation9 approximates theusualsituationin experiencerating applicationsofthecollective riskmodel:wearepricingfora futureperiodhavingobservedclaimnumbersinpastperiods. Thepastdataareusedtoestimateλ.Ifthefutureexposureismostlyrenewalbusiness,itcan usuallybeexpectedtohaveapproximatelythesameunderlyingparameterλasinthepast(unless thereisreasontobelievethatthelapsingbusinesswillnotbearepresentativecrosssectionof theportfolio). Anynewbusinesshasnot beenobservedin thepast, sothere istheadditional uncertaintyofdrawingthisfromaheterogeneouspopulation.

2.2.3 Exposureuncertainty We nowsupposethat thereisuncertaintyin thelevelofexposure x:weuse mtodenote the expectedexposureandρ xtodenotethevariationcoefficient: 2 E(x)=mand Var(x)=(m.ρ x) 10 Applyingequations1and2againwehave: E(N) =E(E(N|x)) byequation1 =E(x.λ) byequation7 =m.λ byequation10 11 Var(N) =E(Var(N|x))+Var(E(N|x)) byequation2 =E(x.λ.(1+q.φ.λ))+Var(x.λ) byequations7and9 2 =m.λ.(1+q.φ.λ)+(m.λ.ρ x) byequation10 2 2 =m.λ+(m.λ) {q.φ/m+ρ x } =m.λ+α.(m.λ) 2 12 2 where:α =q.φ/m+ρ x 13

11 2.2.4 Contagion Wenowconsidermorethanoneepoch(forexample,accidentyears)sowereintroducesubscript t(fortime)todistinguishepochs.Inplaceofλwenowhaveλ t,representingthemean(overall units of exposure) of the Poisson rate in epoch t. (The squared variation coefficient for 2 heterogeneity,φ=ρ h ,mayalsodependontime:thispossibilityisconsideredinSection3.)

Weusetodenote theunderlyingmean frequency, andρ c todenotethe variationcoefficient causedbycontagioneffects: 2 E(λ t)=and Var(λ t)=(.ρ c) 14 fromwhichwehave: 2 2 E(λ t ) =E(λ t) +Var(λ t) bydefinitionofvariance 2 2 = .(1+ρc ) bydefinitionofandρ c 15 Contagion effects are assumed to be stochastically independent for different epochs. The possibility of trends is also considered in Section 3: to allow for trends, is replaced by t. Contagion refers only to the random component of λ t that occurs in addition to any trend changes. Nevertheless, the assumption of stochastic independence of contagion effects across epochsisastrongassumptionandshouldbecriticallyconsideredinanymultiepochapplication ofthemodeldevelopedhere. Introducingsubscriptt,andmakingtheconditioningonλ t explicit,equations11and12become:

E(N t|λt) =m.λ t 16 2 Var(N t|λt)=m.λt+α.(m.λ t) 17 Applyingequations1and2again:

E(N t) =E(E(N t|λt)) byequation1

=E(m.λ t) byequation16 =m. byequation14 18

Var(N t) =E(Var(N t|λt))+Var(E(N t|λt)) byequation2 2 =E(m.λt+α.(m.λt) )+Var(m.λ t) byequations16and17 2 2 2 2 =m.+α.m . .(1+ρc )+(m..ρ c) byequation14 2 2 2 =m.+(m.) .{α.(1+ρc )+ρ c } 2 2 =m.+(m.) .{(1+α).(1+ρc )1} 19

12 2.2.5 Parameterestimationuncertainty Finally,recognizingthatequations18and19areconditionalontheunderlyingmeanfrequency beingknownprecisely,theuncertaintyincanbeintroducedbyapplyingformulas1and2 onemoretime.Using’torepresentanunbiasedestimateof,andρe todenotethevariation coefficientoftheestimationuncertainty,wehave: 2 E()=’ and Var()=(ρ e.’) 20 NotethatwetakeaBayesianperspectivehere:theestimate’isknownsoisnotconsideredto bearandomvariable:thetruevalueisunknownsoisconsideredtobearandomvariable. Fromequation20wehave: 2 2 2 E( ) =’ .(1+ρ e ) 21 Sofinallywehave:

E(N t) =E(E(N t|)) byequation1 =E(m.) byequation18 =m.’ byequation20 22

Var(N t) =E(Var(N t|))+Var(E(N t|)) byequation2 2 2 =E(m.+(m.) .{(1+α).(1+ρc )1})+Var(m.) byequations19and20 2 2 2 2 =m.’+(m.’) .(1+ρ e ).{(1+α).(1+ρc )1}+(ρ e.m.’) byequation21 2 2 2 =m.’+(m.’) .{(1+α).(1+ρc ).(1+ρ e )1} 23 Equation23istheproposedgeneralformulaforthevarianceofthenumberofclaimsinasingle future epoch, taking account of all the main sources of uncertainty. This can alternatively be expressedin termsof the squared variation coefficient(Vco) whichisdefinedas the variance (equation23)dividedbythesquareofthemean(equation22): 2 2 2 2 2 Vco (N t) =1/(m.’)+{(1+ρ x +ρh .q/m).(1+ρc ).(1+ρ e )1} 24 Inequation24,αhasbeenreplacedbyitsdefinition(equation13).

2.3 Commentsonthegeneralformula

2.3.1 Comparisonwithotherclaimnumberformulasinactuarialliterature Many texts on risk theory allow for the overall effect of parameter uncertainty but do not distinguish the variouscauses in a general mathematical model. Here, weconsider two well known works: Heckman and Meyers (1983) and Daykin, Pentikäinen and Pesonen (1994), denotedH&MandDP&Pinthefollowing.

13 H&M discuss contagion, heterogeneity, and parameter estimation error, then use a single parameterc(whichtheycallthecontagionparameter)torepresenttheoveralleffectthroughthe equation: Var(N) =E(N).{1+c.E(N)} 25 Comparingthistothegeneralformuladevelopedhere(equations22and23above)weseethat: 2 2 2 2 c =(1+ρ x +ρh .q/m).(1+ρc ).(1+ρ e )–1 26 sotheH&Mcontagionparametercanbeseenasrepresentingthecombinedeffectofthefour typesofparameteruncertaintyconsideredinthepresentpaper. DP&P consider what we have called ‘contagion’, although they use the terms ‘shortperiod oscillations’ and ‘contamination’. To model this they use a multiplicative ‘mixing variable’ 2 denoted qwithE( q)=1,Var( q)=σq ,andarriveattheformula(theirequation2.4.12): 2 Var(N) =E(N).{1+σq .E(N)} 27 2 Inthepresentpaperwemodelcontagion(Section2.2.4)using:E(λ t)=andVar(λ t)=(.ρ c) .If 2 wedefine qt=λ t/,thenclearlywehaveλ t= qt.withE( qt)=1andVar( qt)=ρ c ,fromwhichit is clear that this formulation is equivalent to that used by DP&P, with ρ c = σq. Comparing equation 27 with the general formula (equation 24) we see that the formula of DP&P is the specialcaseρx=ρ h=ρ e=0.DP&Palsodiscussheterogeneity,andpointoutthatamultiplicative mixingvariablecanalsobeusedtomodelthis(althoughtheyprefertheterm‘structurevariable’ inthiscase)andtheyciteAmmeter(1948)andBuhlmann(1970)inthisconnection.However 2 theymakeitclearthat,fortheirpurposes,σq representscontagiononly,stating“formostofthe applicationsdealtwithinthisbooktheinnervariationinthecollectiveisnotrelevant”.Theydo not explicitly consider the effects of parameter estimation error and exposure uncertainty on claimnumberforecasts. By explicitly distinguishing the various causes of parameter uncertainty in a general mathematicalmodel,thepresentpaperaimstoensurethatallcausesaretakenintoaccountso thecombinedeffectisnotunderstated.

2.3.2 Limitingandspecialcasesofthegeneralformula

2.3.2.1 Limitasexpectedexposureincreases As the expected exposure m increases, with allother quantitiesconstant, the general formula (equation24)forthevariationcoefficientofaclaimnumberforecastapproaches: 2 2 2 2 Vco (N) =(1+ρ x ).(1+ρc ).(1+ρ e )1 28

ThissupportsthecommentmadeintheexampleofSection1.3.2,wherewealsoassumedρ x=0.

2.3.2.2 Limitasparameteruncertaintydecreases

Ifthefourcomponentvariationcoefficientsareallmuchsmallerthanone(exceptρ h,whichneed onlybesmallcomparedtothesquarerootofm/q),thenproducttermsbecomeimmaterialand equation24becomesapproximately: 2 2 2 2 2 Vco (N) =1/(m.’)+ρx +ρh .q/m+ρc +ρ e 29

14 Thisisjustthesumoffiveterms,onerelatingtoprocessuncertainty,andonerelatingtoeachof thefourtypesofparameteruncertainty. Ingeneral(whenthecomponentvariationcoefficients are not necessarily so small that cross terms can be neglected) the total uncertainty can be consideredasthesumofthefivetermsofequation29,plusanadditionaltermrepresentingthe interactionsbetweenthefivesourcesofuncertainty.

2.3.2.3 Caseofnocontagion,heterogeneityorexposureuncertainty

In the realworld there is always parameter estimation error: that is, we always have ρ e > 0. However,itispossible,thoughunusual,tohavenocontagion,noheterogeneityandnoexposure uncertainty:thatisρ x=ρ h=ρ c=0.Itisinterestingtoexaminetheformulainthisspecialcase becausethisgivesthesmallestpossibleforecastingvariancewhenaleatoricvariationisPoisson. (SmallerforecastingvarianceispossibleifaleatoricuncertaintyhaslowervariancethanPoisson becausethenitispossiblefor“heterogeneity”tobenegative,asdiscussedinSection1.2.2.) Thegeneralformula(equation23)forthevarianceofthenumberNofclaimsthatwillarisefrom mexposureunitsreducesinthisspecialcaseto: 2 2 Var(N) =m.’+(m.’) .ρ e 2 =m.’+m .Var(’) bydefinitionofρ e 30 CalibrationoftheformulainthegeneralcaseiscoveredinSection3.However,inthisspecial case,calibrationfromexperiencedataisparticularlysimple,ifitcanbeassumedthatρ h=ρ c=0 inthepastaswellasinthefuture.Underthisassumption,thePoissonparameteristhesamefor allriskunitsandinallepochs:λ it =foralliandt.Soifweobservedm 0riskunitsandthe numberofclaimswask,thentheobviousestimateofis’=k/m0.Estimationuncertaintyis 2 thenVar(’)=Var(k)/m0 =/m0 (becauseVar(k)=m 0.bythePoissonassumption).So (replacingwithitsestimate’)equation30becomes: 2 2 Var(N) =m.k/m0+m .k/m0

=g.(1+g).k whereg=m/m0 31 The first term (g.k) represents Poisson process uncertainty: the second term (g 2.k) represents parameterestimationuncertainty. Tointerpretequation31, we firstconsiderthe specialcase g=1,that is,bothestimation and predictionarebasedonthesamenumberofriskunits.(Forexample,wehaveobservedoneyear ofexperienceandaimtopredictthenumberofclaimsinonefutureyear,intheknowledgethat thesizeoftheportfoliowillnotchange.)Inthiscase,equation31becomesVar(N)=2.k.Thisis easilyinterpretedbynotingthatVar(N)representstheexpectedsquareddifferencebetweenthe best estimate k and the actual number N of future claims. Since these are, by assumption, independentPoissonvariables,theirexpectedsquareddifferenceisthesumoftheirvariances, whichisthesumofthetwoPoissonparameters.Againbyassumption,theybothhavethesame Poissonparameter,andtheobviousestimateofthisistheobservednumberk,sothesumofthe twovariancesis2.k. For gnotequal to1, the obvious estimate of N is g.k, so the Poissonprocess componentof Var(N)becomesg.k.Theestimationerrorcomponentvarieswiththesquareofgbecause,fora givenvalueofg:Var(g.k)=g 2.Var(k).

15 Notethatifgismuchgreaterthanone,theProcessuncertainty(g.k)maybenegligiblecompared to the estimation uncertainty (g 2.k). Conversely, if g is much less than one, the process uncertaintywilldominateandtheestimationuncertaintymaybenegligible:thisisillustratedby theexampleinSection1.3.1. Theabovederivationofequation31fromequation30appeals tointuitiononwhatis a good estimateof. Moreformalargumentscan leadtoslightlydifferent results, depending on the assumptionsmade.Theappendix(SectionA.2.2)givesaBayesianargumentleadingto:

Var(N) =g.(1+g).(k+1)insteadofequation31 32 TheBayesianargumentgivenintheappendixalsoshowsthat,undercertainstatedconditions (allowingforprocessvariationandparameterestimationerroronly),theforecastingdistribution ofNispreciselyNegativeBinomial.

2.3.2.4 Caseofnoparameterestimationuncertaintyandnoexposureuncertainty Here we consider variation in the number of claims N caused only by process variation, heterogeneity and contagion. This is needed for the development of the statistical calibration methodsdescribedinSection3. Ifthereisnoexposureuncertainty(ρ x =0)thenthequantityα(definedbyequation13)becomes: α = q.φ /m. Equation 19 (which does not include parameter estimation uncertainty) then becomes: 2 2 Var(N t) =m.+(m.) .{(1+q.φ/m).(1+ρc )1} 33 Thiscanalternativelybeobtainedfromthegeneralformula(equation24)bysettingρ x =ρ e =0.

2.3.3 Applicationofthegeneralformula Equations22and23givethefirsttwomomentsoftheforecastingdistributionforthenumberof claimsN.ThefirsttwomomentsaresufficienttospecifyaNegativeBinomialdistribution.An advantage of using the Negative Binomial for forecasting claim numbers is that this enables established compounding methods (those studied by Panjer (1980) and Heckman & Meyers (1983))tobeusedincalculatingaggregatelossdistributions.UsingaNegativeBinomialisalso clearly better than using a Poisson distribution: the Negative Binomial allows two moments (meanandvariance)tobecorrectlyspecified,thePoissonallowsonlyonemoment(mean)tobe correctlyspecified. In the appendix (Section A.2.2) the Negative Binomial distribution is parameterised using parametersrandp.Theseparametersarerelatedtothefirsttwomomentsthrough: p=E(N)/Var(N)andr=E(N).p/(1p) 34 ItwasnotedinSection2.3.1thatthegeneralformulaforVar(N)(equation23)canbeexpressed asVar(N)=E(N).{1+c.E(N)}whereE(N)=m.’andcisgivenbyequation26.Therequired NegativeBinomialparametersaretherefore: p=1/{1+c.m.’}andr=1/c 35 Although the Negative Binomial is exact in a few, somewhat artificial, special cases (for example:(a)thecaseofnocontagion,noexposureuncertaintyandnoheterogeneitydiscussedin SectionA.2.2oftheappendix;and(b)thecaseofnocontagion,noexposureuncertainty,no

16 parameterestimationuncertaintyandGammadistributedheterogeneity),theNegativeBinomial isingeneralonlyanapproximation.Ingeneral,allowingforallsourcesofuncertaintyproduces a mixtureof NegativeBinomialdistributions.(InSection3.3onesuchmixturedistribution is developed that allows for process error, heterogeneity and contagion only.) The Negative Binomialprovidesanadequateapproximationinmanycircumstancesbutofcourseitwillnot always be appropriate. In particular, when the expected number of claims is very large, it is advisabletoconsiderwhethersomeotherdistribution(egBetaBinomial,oramixtureorsumof NegativeBinomials)wouldbebetter.Thisisbecause(asillustratedbytheexampleinSection 1.3.2),astheexpectednumberofclaimsincreases,theaggregatelossdistributionapproachesthe assumedclaimnumberdistribution,sothisbecomescritical. However,regardlessof whetherthe Negative Binomialorsomeotherdistributionisused, the generalformulaforthevariancedevelopedinthispaper(equation23)isequallyvalid.

3 Calibrationoftheformulafrompastdata

3.1 Introduction Toapplythegeneralformulafortheuncertaintyofclaimnumberforecasts(equation24),itfirst needstobecalibrated,thatis,valuesneedtobeassignedtothesevenquantitiesappearinginthe formula:thefourvariationcoefficients(ρx,ρ e,ρ h, ρc)andthequantitiesm,qand. Inthissectionweconsidertheuseofdataonpastclaimfrequenciestocalibratethemodel,on theassumptionthatthefuturewillbesimilartothepast.Claimfrequencydataisoflittleorno relevance for determining the quantities m, ρ x and q (representing expected future exposure, uncertainty in exposure, and proportion of exposure that will be new business). We do not considerthesethreeotherthantosaythattheyshouldbebasedonwhateverrelevantinformation isavailable:qcouldbeestimatedbylookingatdataonpastrenewalandlapserates,mandρ x by lookingpastgrowthrates.Forallthree,businessplansarelikelytobemorerelevantthanany past data, and judgement more important than statistical analysis. The statistical methods developedinthissectionareforestimationoftheotherfourparameters,(,ρ e,ρh,ρc)frompast claimnumbersandcorrespondingexposures.

Theseparameterswillallbesubjecttoestimationerror.ρ e representstheestimationerrorinbut willitselfbesubjecttoestimationerror,aswilltheestimatesofρ h andρc.Wedonotattemptto quantifytheestimationerrorinthequantitiesρ e,ρh andρc:thiswouldbetogivetoomuchweight to past data. Instead, we develop statistical methods that produce point estimates of these quantities:the methods arebasedonsound statisticalprinciples(such asmaximumlikelihood estimation)sothepointestimatesarebelievedtobeaboutasreliableaspossible.Toallowfor estimation error, the statistical estimates of ρe, ρ h and ρ c should be considered and adjusted judgementally if necessary. For example, if the past experience shows little evidence of contagion,thestatisticalmethodsdevelopedherewillproducealowfigure forρ c.Itcouldbe thatlargecontagioneffectsarepossible,butsimplythatnonehaveoccurredintherecentpast. Becauseof this possibility, an analysis of the data to obtain an objective assessment of the reliabilityof thepointestimateρ c wouldbespurious. No amountof statisticalanalysiswould reliably indicate the potential for contagion effects larger than any that have occurred, so judgementisrequired.Thestatisticalmethodsdescribedhereprovideanobjectivestartingpoint fortheapplicationofjudgement.

17 We restrictattentionin thispaper toshorttailclasses for which actualnumbersof claims are knownwithsomecertaintyforallpastexposureperiods:thedatasetisassumedtoconsistofthe followingthreeitemsforeachdatapoint: • pastepocht

• exposurex tk

• correspondingactualnumberofclaimsn tk

A triple of values (t, x tk , n tk ) is called a datapoint or an observation. A dataset will usually containmanydatapoints,butmaycontainjustone.Thesubscriptkallowsforthepossibilityof morethanonedatapointforthesameepoch.Datapointsdonotnecessarilyrelatetosingle‘risk units’(howeverthesemightbedefined):thenumberofriskunitsisindicatedbyx tk . Ifthereismorethanoneobservationinanyoneepochitispossibletoestimatetheheterogeneity 2 parameterφ(=ρ h ).Ifthedatasetalsocoversmorethanonepastepochitispossibletoestimate thecontagionparameterρ c.Ifthereisonlyoneobservationforeachofseveralpastepochs,then itisimpossibletodistinguishtheeffectsofheterogeneityfromtheeffectsofcontagion. Therearethreemainstagesinthecalibrationmethoddevelopedhere:

1. Estimationofλ tforeachpastepoch,andφ tforthoseepochswithmorethanoneobservation. (λtdenotesthetruemeanPoissonrateinepochtincludingcontagioneffects:φtdenotesthe heterogeneityparameterforepocht.) 2. Analysis of changes in frequency across epochs to estimate trend parameters and the contagionparameterρ c.Wealsoinvestigatewhetherthereisevidencethattheheterogeneity parameterφ tvariesacrossepochs,andifnot,estimateaconstantvalue,denotedφ. 3. Projection of trends to the required future epoch to determine the expected future claim frequencyanditsestimationuncertaintyρ e.IfStage2showsevidencethatφ tvarieswitht, wealsoconsiderwhatvalueofφisappropriateforthefuture. ThesethreestagesaredescribedinSections3.2,3.3and3.4respectively.

3.2 Estimationofλ tandφ tforeachepochseparately

3.2.1 Positiveheterogeneity

3.2.1.1 Gammaassumptionforheterogeneity

Estimation of λ t and φ t is carried out for eachepoch separately, so we drop the subscript t initially. From Section 2.2.2 (equations 7 and 8), the number of claims N arising from x randomlyselectedriskunits(withx,λandφallknown)has: E(N|x,λ,φ) =x.λ 36 Var(N|x,λ,φ)=x.λ.(1+φ.λ) 37 If we assume that heterogeneity follows approximately a Gamma distribution, then the total PoissonrateforxunitsofexposureisalsoGamma(becausethesumofindependentidentically distributedGammavariablesisalsoGamma).ThedistributionofNisthenaGammamixtureof Poissons,whichisNegativeBinomial:thiswellknownresultisprovedinSectionA.2.2ofthe

18 appendix. (Alternatively: from the Gamma assumption for heterogeneity it follows that the numberofclaimsonasinglerandomlyselectedriskunitisNegativeBinomial,sothenumberof claimsNarisingfromxunitsofexposureisalsoNegativeBinomialasthesumofindependent identicallydistributedNegativeBinomials.)So,byassumingaGammadistributionofPoisson ratesintheheterogeneouspopulation,wehaveaNegativeBinomialdistributionforN: Γ(n + r) Pr(N = n | x,λ,ϕ) = ()1− p n .p r forsomepbetween0and1,andr>0. 38 n!.Γ(r) Ingeneral,aNegativeBinomialdistributionhas:p=E(N)/Var(N)andr=E(N).p/(1p)(see theappendix)so,fromequations36and37wehave: p=1/(1+φ.λ)andr=x/φ 39

3.2.1.2 Maximumlikelihoodestimationoftruemeanfrequencyλforsinglepastepoch

Given data pairs (x k, n k), the parameters λ and φ canbe estimated by maximum likelihood estimation (MLE): that is, we find the values λ’ and φ’ that maximise the product over all observations k of the NegativeBinomial probabilities (equation38). (An alternative is to use Bayesian estimation: this is considered in Section 3.2.1.3.) For a single datapair (x, n), the NegativeBinomialloglikelihoodLisobtainedbytakingthenaturallogarithmoftheNegative Binomialprobability(equation38).Writinglg(.)forln(Γ(.))thisgives: L =lg(n+r)–lg(n+1)–lg(r)+n.ln(1p)+r.ln(p) =lg(n+x/φ)–lg(n+1)–lg(x/φ)+n.ln(φ.λ)(n+x/φ).ln(1+φ.λ) 40

Formultipleobservations(xk,n k)(k=1,2,…),thetotalloglikelihoodisthesumoverkofthe aboveexpressionforL.Maximumlikelihoodestimatesofφandλarethevaluesthatmaximise this,andcanbefoundbysettingthederivatives∂L/∂λand∂L/∂φtozero.Fromequation40: ∂L/∂λ =n/λ–(φ.n+x)/(1+φ.λ) =(n–λ.x)/{λ.(1+φ.λ)} 41

Iftherearemultipleobservations(x k,n k)thismustbesummedoverkbeforesettingtozero.The denominatoristhesameforallobservations,soonlythenumeratorneedstobesummed,and settingthistozerogivesthemaximumlikelihoodestimate(MLE): + + λ’ =(Σ kn k)/(Σ kx k)=n /x 42 + + wheren denotesΣ kn kandx denotesΣ kx kNotethattheMLEofλisjusttheoverallobserved meanclaimfrequencyintheepochconcerned.Sincen+isthenumberofclaimsfromx+unitsof exposure,equation37appliesandwehave: Var(n+) =x+.λ.(1+φ.λ) 43 Fromwhich: Var(λ’) =λ.(1+φ.λ)/x+ =λ 2.(φ+1/λ)/x+ 44 Replacingtheunknownquantityλinthisexpressionbytheestimateλ’(equation42)givesthe followingapproximateexpressionforthesquaredvariationcoefficientofλ’: Vco 2(λ’)≈(φ+1/λ’)/x+=φ/x++1/n+ 45

19 3.2.1.3 Bayesianestimationoftruemeanfrequencyλforsinglepastepoch AsanalternativetoMLE,wealsoconsiderBayesianestimation.FromaBayesianperspective, MLEisequivalenttousingan‘uninformativeprior’thenestimatingtheparameterasthe of theposterior distribution. Here we specify a suitable form of informativeprior for λ, then determinetheposterior distributionbyBayes’theorem.Thebestestimateofλis takenasthe meanoftheposteriordistribution(thisisbestinthesenseofminimisingthemeansquareerror): estimationuncertaintyisrepresentedbythevariationcoefficientoftheposteriordistribution. Substitutingfromequations39intoequation38,theNegativeBinomialprobabilitybecomes: Γ(n + x /ϕ).(ϕ.λ) n Pr(N = n | x,λ,ϕ) = 46 n!.Γ(x /ϕ).(1+ϕ.λ) n+x /ϕ

Given multiple datapoints (x k, n k), the likelihood is the product,over all datapoints, of this expression. If π(λ) denotes the prior probability density for the parameter λ, then by Bayes’ theorem, theposteriordensityf(λ) isproportionalto theproductofthis andthefactorsof the likelihoodthatdependonλ:

+ π (λ).(ϕ.λ) n f (λ) ∝ + + 47 1( +ϕ.λ) n + x / ϕ Forthepriordistributionweproposeπ(λ)proportionalto1/λ γforsomeparameterγbetweenzero andone.Thisisanimproperpriorinthesensethatitgivesaninfiniteprobabilityforλ>0. Howeverifπ(λ)issettozeroforλ>Λ, whereΛissuitablylarge,thenπ(λ)definesaproper priordistributionforλintherangeofwhatisconsideredpossible.π(λ)decreasesasλincreases, meaning basically that high values of λ are considered less likely than lower values. More 1γ 1γ 1/(1γ) precisely:theprobabilitythatλisbetweenλ 0and{λ 0 +P.Λ } doesnotdependonλ 0(it isequaltoPforallλ 0).Forexample,puttingγ=½andΛ=10impliesthat,apriori,λhasan equalchance(10%)ofbeingbetweenanyadjacenttwoofthefollowing: zero,0.1,0.4,0.9,1.6,2.5,3.6,4.9,6.4,8.1,10. Viewedasaprobabilitydistributionfortheparameterλ,equation47(with1/λ γinplaceofπ(λ)) isaPearsonVIprobabilitydensityfunction.Ingeneral,thePearsonVIpdfcanbeexpressed: Γ(a + b).(λ / s) a−1 f (λ) = 48 s.Γ(a).Γ(b).(1+ λ / s) a+b andthemeanandvariationcoefficientaregivenby: E(λ) =s.a/(b1) (providedb>1) 49 Vco 2(λ) ={1+s/E(λ)}/(b2) (providedb>2) 50 (Notethatsisascaleparameter,aandbareshapeparameters.) Comparingequations47and48wehave: s=1/φ,a=n++(1γ),b=x+/φ–(1γ) 51 Therefore(from49and50)theposteriormeanandvariationcoefficientforλare: E(λ) ={n++(1γ)}/{x+–(2γ).φ} 52

20 Vco 2(λ) ={1+s/λ’}/(b2) whereλ’=E(λ) ={φ+1/λ’}/{x+–(3γ).φ} 53 ComparingtoMLE,weseethattheBayesianmean(equation52)ishigherthantheMLEofλ (equation 42). Comparing equations 45 and 53 for the squared variation coefficient: the differenceisthepresenceoftheterm(3γ).φinthedenominatoroftheBayesianformula. Thecondition(b>2)forthevarianceoftheposteriordistributiontobefiniteis:x+>(3γ).φ.If this inequality is not satisfied, a finite variance can be calculated by truncating the posterior distributionatthemaximumvalueΛthatisbelievedtobepossible.Thecomparisonofexposure x+withheterogeneity φmay atfirst seemstrangebecausewecanchangetheleftsideofthis inequalityjustby changingthedefinitionofan‘exposureunit’.For example,ifwedefinean ‘exposureunit’tobe10policyyearsinsteadofasinglepolicyyear,thenx+willdecreasebya factorof10.However,inthiscasethemeanclaimfrequencyincreasesbyafactorof10,andthe varianceinclaimfrequencies(duetoheterogeneity)alsoincreasesbythesamefactor(because, by independence, the variance is the sum of the variances). Therefore, the squared variation coefficientofheterogeneity(φ)decreasesbythesamefactor. Insteadoftheimproperpriorπ(λ)proportionalto1/λ γ,wecoulduseaPearsonVIdistribution withscales=1/φfortheprior:equation47stillyieldsaPearsonVIposteriordistribution.

3.2.1.4 Estimationofheterogeneityφfrommultipleobservations(singlepastepoch) FortheestimationoftheheterogeneityparameterweconsideronlyMLEbecausemoregeneral Bayesianmethodsarerelativelyintractable. Differentiating theloglikelihood(equation40) with respectto theheterogeneity parameterφ, andwritingΨ(x)forthederivativedlg(x)/dx(wherelg(x)=ln(Γ(x))wehave: φ2.∂L/∂φ =x.Ψ(x/φ)x.Ψ(n+x/φ)+x.ln(1+φ.λ)+φ.(n–x.λ)/(1+φ.λ) 54 Itcanbeshownthatthisislessthanzeroforallpositivevaluesofφ,soforasingleobservation (n,x)theMLEofφisalwayszero.Inthecaseofmultipleobservations,thereissometimesa positivevalueφ’suchthatthesumoverallobservationsofequation54iszero.Butsometimes, evenwithmultipleobservations,theloglikelihood(equation40summedoverallobservations k)increasesindefinitelyasφapproacheszero.Inthiscaseφshouldbesettozeroortoanegative valueasdescribedinthenextsubsection.Inanycase,theMLEφ’cannotbewritteninclosed form,butwhenapositivesolutionexistsitcanbedeterminedfromequation54bynumerical methods. ThefunctionΨ(x)is knownas thedigamma functionand there are goodnumerical algorithmsforevaluatingit.

3.2.2 Negativeandzeroheterogeneity

3.2.2.1 Negativeheterogeneity:Binomialdistribution AsdiscussedinSections1.2.2and2.2.2.1,“heterogeneity”coversseveraldistinctphenomena, anditispossiblethattheoveralleffectisanegativevalueforφ.Negativeheterogeneitycanbe accommodatedusingaBinomialdistributionforN|x(insteadofNegativeBinomial). TheBinomialprobabilityfunctioncanbeexpressed: P(N=n) =p n.(1p) rn.Г(r+1)/{Г(n+1).Г(rn+1)} 55

21 andthemeanandvarianceare: E(N) =r.p Var(N)=r.p.(1p) 56 Equatingtoequations36and37gives: p=φ.λr=x/φ 57 Clearly,fromequation37,wemusthaveφ.λ>1,whichensuresp<1. Takingnaturallogsof55andsubstitutingfrom57,theloglikelihoodLis: L =lg(1x/φ)lg(n+1)lg(1nx/φ)+n.ln(φ.λ)(n+x/φ).ln(1+φ.λ) 58 ThetermscontainingλarethesameasintheNegativeBinomialcase(equation40): L =n.ln(λ)(n+x/φ).ln(1+φ.λ)+(termsnotinvolvingλ) 59 Equation59givesthesocalled‘quasilikelihood’forλ:thiscanbedeterminedsolelyfromthe meanandvarianceassumptionsforn(equations36and37).BecauseboththeNegativeBinomial andBinomialloglikelihoodsareequivalenttothequasilikelihoodforλ,∂L/∂λisthesamein bothcases (equation41), sothe MLEofλ isthesame(equation42), and theformula for the estimationuncertaintyisalsothesame(equation45). TofindtheMLEofφ,equation58hastobedifferentiatedwithrespecttoφ: φ2.∂L/∂φ =x.Ψ(1x/φ)x.Ψ(1nx/φ)+x.ln(1+φ.λ)+φ.(n–x.λ)/(1+φ.λ) 60 ThisisnotthesameasintheNegativeBinomialcase(equation54).TheBinomialMLEofφis thenegativevalueφ’atwhichthesumoverallobservations(n k,x k)ofthisexpressioniszero.As intheNegativeBinomialcase,thereisnotnecessarilyasolution,andwhenasolutiondoesexist itcannotbewritteninclosedformbutcanbefoundbynumericalmethods.Ifasolutionexiststo bothequations54and60,thechoicebetweenthepositiveandnegativevaluesofφcanbebased onthemaximisedlikelihoods(equations40and58respectively). It is also possible to calculate a Bayesian estimate of λ in the Binomial case. The Binomial probability (equation 55) can be regarded as a scaled Beta distribution for λ, and a similar sequence followed as in Section 3.2.1.3 for theNegative Binomial.This is not donehere for reasonsof.

3.2.2.2 Zeroheterogeneity:Poissondistribution AzerovalueforφcorrespondstothePoissondistributionforN(givenxandλ): (x.λ) n Pr(N = n | x,λ) = exp(−x.λ) 61 n! Fromwhich,theloglikelihoodis:

L =x.λ +n.ln(x.λ)lg(n+1) 62 Differentiatingthiswithrespecttoλandsettingtozero,producesthesameMLEforλasinthe NegativeBinomialandBinomialcases(equation42).

22 3.3 Estimationoftimeeffects:trendsandcontagion

3.3.1 Testingforconstantheterogeneity

In general, we may have data covering more than one epoch with datapoints (t, x tk, n tk). Equations36 and 37 continue to apply but in general all quantities may vary with timeso a subscripttisrequiredthroughout.IntheprevioussectiontheMLEofλ t foreachepochtwas + + + + showntobeλ t’=nt /x t (wheren t =Σkn tk andx t =Σkx tk ).Inthecasewhenbothφandλare thesameinallepochs,andallexposureunitsarerandomlyselectedineachepoch(ieq=1)then allobservationscanbepooledintoasingleclassandtheMLEofλisλ’=(Σ tk n tk )/(Σ tk x tk ). Equations54and60canalsobesummedoverallobservations(kandt)andsolvedtofindthe MLEofφontheassumptionthatitisthesameinallepochs.Thiscanbedoneforφusingeither the constant estimate λ’or the varying estimates λt’. Using the varying estimatesλ t’, we can calculatethemaximumlikelihood(sumofequation40or58overallobservations)basedon: a) asingleconstantMLEφ’

b) varyingMLEsφ t’

WedenotethemaximumlikelihoodsL aandL brespectively.Thehypothesisthatφisconstant canthenbetestedusingWilksgeneralizedlikelihoodratiotest:ifφisconstant,2.(L b–L a)has approximatelyachisquareddistributionwithT1degreesoffreedom(whereTisthenumberof differentφ t’estimatesin(b)).Soif2.(L b–L a)couldplausiblyhavecomefromthischisquared distribution(ifitislessthanthe95%pointofthedistribution,forexample)thehypothesisthatφ isconstantcanbeaccepted,butif2.(L b–L a)isimplausiblylargethehypothesisisrejected.

A likelihood ratio test could be used for the hypothesis that λt is constant, but more flexible methodsthatdistinguishdifferenttypesofvariationinλtaredevelopedinthenextsubsection.

3.3.2 Estimationoftrendsandcontagion

3.3.2.1 Introduction

Wehypothesiseanunderlyingmeanfrequency tthatwouldpertainiftherewerenocontagion effects,andwhichmayvarywithtime(t)asaresultoftrendsorlongtermcycles.Contagion relatestoadditionalshorttermeffectsthatcausetheactualmeanfrequencyλt(acrosstheentire heterogeneousriskpopulation)todifferfrom t.Forcontagionweusethemodelintroducedin Section2.2.4(equations14and15)butwith tinplaceof.

Theaimhereis toestimatetheunderlying mean frequency tand thecontagionparameterρc fromananalysisofthevariationinobservedfrequencyacrossepochs.Variationacrossepochsin theobservedfrequenciesλt’(fromequation42)hasthreepossiblecauses:

a) Trendchangesintheunderlyingfrequency t

b) Contagioneffects(causingshorttermdifferencesbetweenλtand t)

c) Estimationerror(differencesbetweentheestimatesλt’andtruevaluesλt).

Wehaveassessedthemagnitudeofestimationerror(equation45or53):anyvariationinλt’that cannotreasonablybeattributedtothiswillbeattributedtocauses(a)and(b).Gradualchanges are attributedtocause(a),shortterm variation (inexcessofwhatcanbeexplainedby(c)) is attributedto(b).

23 3.3.2.2 MLEoftrendandcontagionparameters + + + Weusethenotationn t =Σ kn tk andx t =Σkx tk ,thatis,n t denotesthetotalnumberofclaims + + observedinepocht,andx t denotesthetotalexposuregivingrisetotheseclaims.Sincent isthe + numberofclaimsarisingfromxt unitsofexposure,equation37appliesandwehave: + + Var(nt )=xt .λ t.(1+φ t.λ t) 63 Also,assumingthatheterogeneityfollowsapproximatelyaGammadistribution(asdescribedin + + Section 3.2.1.1) we have a Negative Binomial distribution for n t (given x t , φt and λt), with NegativeBinomialparametersp tandr tgivenby: + pt=1/(1+φt.λt)andr t=x t /φ t 64 fromwhichwehave:

+ + nt + + Γ(nt + rt ) ()ϕt .λt Pr(nt | xt ,ϕt ,λt ) = + . + 65 Γ(n +1).Γ(r ) nt +rt t t ()1+ ϕt .λt (Thisisessentiallythesameasequation46,withthesubscriptttodistinguishepochs,andwith + + totalsn t andx t inplaceofnandx.)

Weaimtoestimatethecontagionparameterρcdefinedbyequation14(Section2.2.4).Wealso allowforpossibletrendsbyusingtinplaceofsoequation14becomes: 2 E(λ t)= t and Var(λ t)=( t.ρ c) 66 Asuitableformofmodelfortrendsmightbe: 2 t =exp(β0+β 1.t+β 2.t ) 67

Theexponentiationensuresthattisgreaterthanzero.Otherlinearformscouldalternativelybe usedintheexponent. + + Giventhedatapairs(n t ,xt )andestimatesφtobtainedasdescribedinSections3.2and3.3.1,we aimtoestimatethecontagionparameterρc andthetrendparameters(β0,β 1,β2).TouseMLE,we need to assume a full probability distribution for contagion: we have previously made only + + secondmomentassumptionsasinequation14.Sincethedistributionofn t givenx t ,φtandλtis + NegativeBinomial(asdescribedbyequation65above),thedistributionofn t allowingforthe uncertaintyinλtcausedbycontagioneffectsisamixtureofNegativeBinomials:iff t(λt)denotes theprobabilitydensityfunctionrepresentingcontagioneffects,wehave:

+ n+ ∞ t + + Γ(nt + rt ) ()ϕt .λt Pr(nt | xt ,ϕt ) = . + . ft (λt ).dλt 68 + ∫λ =0 n +r Γ(n +1).Γ(r ) t t t t t ()1+ ϕt .λt

Thisintegralistractableifweassumethatf t(λ t)isaPearsonVIdistributionwithscaleparameter equalto1/φ t.ThePearsonVIfamilyissufficientlyflexible,havingtwoshapeparameters,for themeanandvariancetobeasrequired.ThePearsonVIprobabilitydensitycanbeexpressed: Γ(a + b).(λ / s) a−1 f (λ) = 69 s.Γ(a).Γ(b).(1+ λ / s) a+b andthemeanandvariationcoefficientaregivenby:

24 E(λ) =s.a/(b1) (providedb>1) 70 Vco 2(λ) =(a+b–1)/{a.(b2)} (providedb>2) 71 Notethatsisascaleparameter,aandbareshapeparameters.Ifaandbbothtendtoinfinity witha/b,constantthevariationcoefficienttendstozerowhilethemeantendstos.a/b. 2 Fromequation66,werequirethemeanandsquaredvariationcoefficienttobe tandρc .Setting s=1/φ tthensolvingfora tandb tgives: 2 2 2 2 at={1+φt. t.(1+ρ c )}/ρ c andb t={1+2.ρ c +1/(φ t. t)}/ρ c 72 Evaluatingtheintegralinequation68givesthefollowingprobabilitydistribution forthetotal numberofclaimsinepocht(giventhetotalexposureandtheheterogeneityparameter):

+ + + + Γ(nt + rt ) Γ(at + bt ).Γ(nt + at ).Γ(rt + bt ) Pr(nt | xt ,ϕt ) = + . + 73 Γ(nt +1).Γ(rt ) Γ(at ).Γ(bt ).Γ(nt + rt + at + bt )

Thiscanbeusedtodetermineρ c andthetrendparameters(β 0,β 1,β 2)byMLE.Theparametersa t + andb tdependonρ c and(β 0,β 1,β 2)(throughequations72and67)butn t andr t(equation64)do not.Takingnaturallogs,andignoringtermsthatdonotdependonρc and(β 0,β 1,β 2)gives: + + Lt =lg(a t+bt)+lg(n t +a t)+lg(r t+bt)–lg(at)–lg(bt)–lg(nt +rt+a t+b t) 74

Usingθtodenoteanyoneoftherequiredparameters(ρcβ0,β 1,β2)wehave: + + ∂Lt/∂θ ={ψ(at+bt)+ψ(nt +at)ψ(nt +rt+at+bt)ψ(at)}.∂at/∂θ + +{ψ(at+bt)+ψ(rt +bt)ψ(nt +rt+at+bt)ψ(b t)}.∂bt/∂θ 75

Thepartialderivatives∂a t/∂θand∂b t/∂θareobtainedfromequations72and67.Equations74 and 75 have to be summed over all epochs t. The matrix of second derivatives (the Hessian matrix)canbeobtainedbydifferentiatingasecondtime.TheNewtonRaphsonmethodcanthen beusedtofindMLEsoftheparameters(ρc β0, β 1, β2). The approximate variancecovariance matrix of theparameter estimates isobtained as negative the inverse Hessian matrix. Further detailsarenotgivenhereforreasonsofspace.

3.4 Projectionoftrends

3.4.1 Generalcaseofsingleepochforecast Forsingleepochforecasts,weneedtoapplyequations22and24.Thissectionisconcernedwith 2 determining appropriate values of ’ and ρe for the future epoch concerned. Also, if the likelihood ratio test of Section 3.3.1 shows evidence that φ t varies with t, it is necessary to 2 considertheappropriatevalueofφ(denotedρh inequation24)forthefutureepoch. Fromequation67wehave: 2 t =exp(β0+β 1.t+β 2.t ) Ifwedefine: 2 ηt =β0+β 1.t+β 2.t =(1,t,t2). β 76

25 (where β isthecolumn vectoroftheβparameters andthedotdenotesmatrixmultiplication), thenwehave: t =exp(η t) 77 Using β’ todenotemaximumlikelihoodestimatesoftheβparametersdeterminedasoutlinedin theprevioussubsection,wecancalculateanestimateη t’ofη tforanyfutureepocht: 2 ηt’ =(1,t,t ). β’ 78 Atthispointwehavetoconsiderwhethertoprojectanytrendsforwards(byusingthevalueof timetcorrespondingtotherequiredfutureepochinequation78).Anotheroptionistoassume that thetherewillbenofurthertrendchangesbetweenthe latestepochinthedatasetand the futureepoch for which forecasts arerequired. In thiscase,thevalueoftusedinequation78 shouldbethatcorrespondingtothelatestepochinthedata.Judgementisobviouslyrequiredin makingthisdecision.Iftheestimatedtrendparametersβ1’andβ 2’aresmall,whetherornotthe trendsareprojectedmaybeimmaterial.Ifthevaluesarematerial,thenitisnecessarytoconsider whatmighthavecausedthetrendsinthepast:thiswillhelpindecidingwhetherornotthetrends are likely to continue in the future. Because we will not necessarily always use the actual chronologicaltimeofthefutureepochequation78itisreplacedby: η’ =(1,t 1,t2). β’ 79 2 wheret1andt 2denotethevaluesselectedtoreplacetandt .Similarlyequation76becomes: η =(1,t 1,t2). β 80 Assumingtheestimate β’isapproximatelyunbiased(onthegroundstheitisanMLEandMLEs are asymptotically unbiased) we have E(η’) = η. Equation 79 implies the following for the estimationuncertaintyinη’arisingfromestimationuncertaintyin β’: T 2 Var(η’) =(1,t 1,t2).Var( β’).(1,t 1,t2) =σ e say 81 whereVar( β’)isestimatedusingnegativetheinverseHessianmatrix(fromequation74),andT denotesmatrixtranspose. By asymptotic normality of MLEs, we approximate the estimates β’ as multivariate normal. Therefore(fromequation79)theestimateη’isapproximatelynormallydistributedandexp(η’) isapproximatelylognormal.Usingstandardresultsforthelognormalwehave: 2 E(exp(η’)) =exp(η+σ e /2) 82 2 Vco(exp(η’))=exp(σ e )1 83 Soifwedefine: 2 ’ =exp(η’σ e /2) 84 2 thenusingρ e todenotethevariationcoefficientof’,wehave: E(’) =exp(η)= byequation82 85 2 Var(’) =exp(σ e )1 byequation79 86 (Thevariationcoefficientof’isthesameasthevariationcoefficientofexp(η’)because’is 2 2 justexp(η’)multipliedbytheknownfactorexp(σe /2):weignoreestimationerrorinσ e ).

26 2 Equations84and86givethevalues’andρ e that arerequired forthegeneral singleepoch forecastingformulas(equations22and24).

IfthelikelihoodratiotestofSection3.3.1showsevidencethatφ tvarieswitht,itisnecessaryto 2 consider the appropriate value of φ (denoted ρh in equation24) for the future epoch. This is probablybestlefttojudgment,basedoninformationonpossiblecausesofheterogeneity,whyit mighthavechanged,andfuturebusinessplansthatmightaffectthemixofrisks.

4 Conclusion Forreasonsofspace,ithasnotbeenpossibletoincludeinthepresentpaper: • NumericalexamplesofthestatisticalcalibrationmethodsdescribedinSection3. • Statisticalcalibrationmethodsforusewithlongtailclassesinwhichtheultimatenumber ofclaimsn tforpastperiodsisnotknownforseveralyears.Insuchcases,modelcalibration iscarriedoutfromadevelopmenttriangleofclaimnumbers. • GeneralisationofthesingleepochforecastingformulasdevelopedinSection2formulti epochforecasting. Theauthorintendstopublishfurtherworkintheseareas.

5 References Ammeter (1948): “A generalization of the collective theory of risk in regard to fluctuating basic probabilities”,ScandinavianActuarialJournal31,171198 Bühlmann(1970):“Mathematicalmethodsofrisktheory”,SpringerVerlag Daykin,PentikäinenandPesonen(1994):“Practicalrisktheoryforactuaries”,Chapman&Hall HeckmanandMeyers(1983):“Thecalculationofaggregatelossdistributionsfromclaimseverityand claimamountdistributions”,PCASLXX,2261 LundbergO(1964):“Onrandomprocessesandtheirapplicationtosicknessandaccident” Panjer(1980):“Theaggregateclaimdistributionandstoplossreinsurance”,TransactionsoftheSociety ofActuaries,VolXXXII,pages523545.

27 AppendixA Bayesianforecasting Thisappendixreviewssomewellknownmathematicalresultsandderivesrelatedresultsthatare referredtoinSections1and2.

A.1 Binomialprocess

A.1.1 PosteriordistributionforBinomialparameterp ABinomialrandomvariablekhasprobabilities: k nk Pr(k|n,p) =n!/{k!(nk)!}p .(1p) (k=0,1,2,…n) where p is the chance of success in each of n independent trials, and k is the number of successes. Bayes’theoremgivestheposteriorprobabilitydensityforp(havingobservedk)as: f(p|k) proportionto:π(p).p k.(1p) nkwhereπ(p)isthepriordistribution Ifwehaveauniformprior(π(p)=1)thenf(p|k)isaBetadistributionforp. Moregenerally,ifthepriorbeliefscanbeapproximatedusingaBetadistribution(usuallythe caseastheBetafamilyisveryflexible,havingtwoshapeparameters): π(p) proportionalto:pα1.(1p) β1 thentheposteriorisagainaBetadistribuion: f(p|k) proportionalto:pα+k1.(1p) β+n(k+1) Themean,modeandvarianceofaBetadistributionasspecifiedaboveforπ(p)are: E(p) =α/(α+β) mode(p) =(α1)/(α+β2) Var(p) =α.β/{(α+β) 2.(α+β+1)} Sothemeanandmodeoftheposteriordistributionf(p|k)are: E(p|k) =(α+k)/(α+β+n) mode(p|k)=(α+k1)/(α+β+n2) Thecaseofauniformpriorcanbeconsideredtobethespecialcaseα=β=1.Inthiscase,the posterior meanis (k+1)/(n+2)and the posteriormode isk/n(whichis theclassicalmaximum likelihoodestimate). A Bayesian confidence interval (sometimes called a “credible interval”) is an interval constructedfromtheposteriordistribution.InSection1.2.1weuseda90%Bayesianconfidence intervalforpbasedonf(p|k)aboveinthecaseα=β=1,n=12andk=6.

A.1.2 ForecastingforaBinomialprocess:BetaBinomialdistribution WhenaleatoricuncertaintyisBinomial,assumingaBetapriorfortheparameterp(egauniform prior)impliesaBetaposteriordistributionforpasdescribedabove.Forecastsoffutureoutcomes

28 fromthesameprocessshouldthereforebebasedonamixtureofBinomialdistributionsinwhich the‘mixingweights’aregivenbytheBetaposteriordistributionforp.Thisiscalledthe‘Beta Binomial’distribution.

A.2 Poissonprocess

A.2.1 PosteriordistributionforPoissonparameterλ APoissonrandomvariablekhasprobabilities: Pr(k|λ) =e λ.λk/k!(k=0,1,2,…) SobyBayestheoremtheposteriordistributionforλ(havingobservedk)is: f(λ|k) proportionalto:π(λ).eλ.λk/k! Ifwehaveanuninformativepriordistribution(π(λ)=1)thisisaGammadistributionforλ,with E(λ)=Var(λ)=k+1. Moregenerally,ifthepriorbeliefscanbeapproximatedusingaGammadistribution: π(λ) proportionalto:eβ.λ.λα1 thentheposteriorisagainaGammadistribution: f(λ|k) proportionalto:e(β+1).λ.λα+k1 Themean,modeandvarianceofaGammadistributionasspecifiedaboveforπ(λ)are: E(λ) =α/β mode(λ) =(α1)/β Var(λ) =α/β 2 Sothemeanandmodeoftheposteriordistributionf(λ|k)are: E(λ|k) =(α+k)/(β+1) mode(λ|k)=(α+k1)/(β+1) Thecaseofanuninformativepriorcanbeconsideredtobethelimitasβtendstozerowithα fixedatone.Thevariationcoefficientoftheprioristhen1,butthemeanandstandarddeviation bothtendtoinfinity.Thisgivesaposteriormeanofk+1andposteriormodek(whichisalsothe classicalmaximumlikelihoodestimateofλ).

A.2.2 ForecastingforaPoissonprocess:NegativeBinomialdistribution WhenaleatoricuncertaintyisPoisson,assumingaGammapriorfortheparameterλimpliesa Gammaposteriordistributionforλasdescribedabove.Forecastsoffutureoutcomesfromthe same process should therefore be based on a mixture of Poisson distributions in which the ‘mixingweights’aregivenbytheGammaposteriordistributionforλ.Itisshownbelowthatthis producesaNegativeBinomialdistribution.WeassumethatkhasbeenobservedfromaPoisson processwithparameterλ,butforecastingisrequiredforaPoissonprocesswithparameterg.λ, thatis:Pr(n|g.λ)=e g.λ.(g.λ) n/n!.Initiallyweassumeanuninformativeprior(thecaseβ=0,α =1),sotheposteriordistributionisf(λ|k)=e.λ.λ k/k!.Themixeddistributionappropriatefor forecastingisthen:

29 ∞ Pr(n) = P(n | g.λ). f (λ | k).dλ ∫0

1 ∞ = exp(−g.λ).(g.λ) n .λk .exp(−λ).dλ n!.k! ∫0

n g ∞ = exp{−(g +1).λ}.λn+k dλ n!.k! ∫0 g n Γ(n + k + )1 = n!.k! (g + )1 n+k +1 Γ(n + r) = ()1− p n .p r wherep=1/(g+1)andr=k+1 n!.Γ(r) ThisistheNegativeBinomialdistribution:ncanbeinterpretedasthenumberoffailuresbefore successnumberrinindependenttrialseachwithchancepofsuccess. ThenegativeBinomialhas: E(n)=r.(1p)/p Var(n)=E(n)/p Sointhiscase,theforecastnumberofclaimsnhas: E(n)=g.(k+1) Var(n)=g.(1+g).(k+1) This should be compared with Equation 30 in Section 2.3.3.3, which was derived from the generalmodelusinganonBayesianargument. InthecaseofaninformativeGammaprior,theabovederivationiseasilymodifiedtoproducea NegativeBinomialwithp=(1+β)/(1+β+g)andr=k+α,andtherefore: E(n)=g.(k+α)/(1+β) Var(n)=g.(1+g+β).(k+α)/(1+β)2 From thiswesee thattheparameterrcantakeany positivevalue,notnecessarily aninteger. Whengeneralizedinthisway,theNegativeBinomialissometimescalledthePolyadistribution. ThecasewhenrisrestrictedtopositiveintegersissometimescalledthePascaldistribution. A Negative Binomial distribution exists for any positive values of mean and variance with variancegreaterthanmean.Theparameterspandraregivenby: p=E(N)/Var(N)and r=E(N).p/(1p)

A.2.3 NumericalexampleofSection1.3.1 InSection1.3.1wehaveaposteriordistributionforλwithameanof0.1and95%confidence interval (0.038, 0.192). This is corresponds to a posterior Gamma distribution for λ with parametersα’=6.25,β’=62.5(where,inthenotationabove:α’=(α+k)andβ’=(β+1)).The 2 varianceofthisGammadistributionisα’/β’ =0.0016.Thevarianceofthenumberofclaimsn isthengivenby: Var(n) =E(Var(n|λ))+Var(E(n|λ)) =E(λ)+Var(λ)becausen|λhasaPoissondistributionwithparameterλ =0.1+0.0016

30