Long-Term Actuarial Mathematics Study Note


EDUCATION COMMITTEE OF THE SOCIETY OF ACTUARIES

LONG-TERM ACTUARIAL MATHEMATICS STUDY NOTE

CHAPTERS 10–12 OF LOSS MODELS, FROM DATA TO DECISIONS, FIFTH EDITION

by Stuart A. Klugman, Harry H. Panjer and Gordon E. Willmot

Copyright © 2018. Posted with permission of the authors.

The Education Committee provides study notes to persons preparing for the examinations of the Society of Actuaries. They are intended to acquaint candidates with some of the theoretical and practical considerations involved in the various subjects. While varying opinions are presented where appropriate, limits on the length of the material and other considerations sometimes prevent the inclusion of all possible opinions. These study notes do not, however, represent any official opinion, interpretation, or endorsement of the Society of Actuaries or its Education Committee. The Society is grateful to the authors for their contributions in preparing the study notes.

LTAM-22-18

Printed in U.S.A.

CONTENTS

Review of mathematical statistics
  10.1 Introduction and four data sets
  10.2 Point estimation
    10.2.1 Introduction
    10.2.2 Measures of quality
    10.2.3 Exercises
  10.3 Interval estimation
    10.3.1 Exercises
  10.4 Construction of Parametric Estimators
    10.4.1 Method of moments and percentile matching
    10.4.2 Exercises
  10.5 Tests of hypotheses
    10.5.1 Exercise
  10.6 Solutions to Exercises

Maximum likelihood estimation
  11.1 Introduction
  11.2 Individual data
    11.2.1 Exercises
  11.3 Grouped data
    11.3.1 Exercises
  11.4 Truncated or censored data
    11.4.1 Exercises
  11.5 Variance and interval estimation for maximum likelihood estimators
    11.5.1 Exercises
  11.6 Functions of asymptotically normal estimators
    11.6.1 Exercises
  11.7 Nonnormal confidence intervals
    11.7.1 Exercise
  11.8 Solutions to Exercises

Estimation Based on Empirical Data
  12.1 The Empirical Distribution
  12.2 Empirical distributions for grouped data
    12.2.1 Exercises
  12.3 Empirical estimation with right censored data
    12.3.1 Exercises
  12.4 Empirical estimation of moments
    12.4.1 Exercises
  12.5 Empirical estimation with left truncated data
    12.5.1 Exercises
  12.6 Kernel density models
    12.6.1 Exercises
  12.7 Approximations for large data sets
    12.7.1 Introduction
    12.7.2 Using individual data points
    12.7.3 Interval-based methods
    12.7.4 Exercises
  12.8 Maximum likelihood estimation of decrement probabilities
    12.8.1 Exercise
  12.9 Estimation of transition intensities
  12.10 Solutions to Exercises

References

10 REVIEW OF MATHEMATICAL STATISTICS

10.1 Introduction and four data sets

Before studying empirical models and then parametric models, we review some concepts from mathematical statistics. Mathematical statistics is a broad subject that includes many topics not covered in this chapter. For those topics that are covered, it is assumed that the reader has had some prior exposure. The topics of greatest importance for constructing actuarial models are estimation and hypothesis testing. Because the Bayesian approach to statistical inference is often either ignored or treated lightly in introductory mathematical statistics texts and courses, it receives more in-depth coverage in this text in Chapter ??. Bayesian methodology also provides the basis for the credibility methods covered in Chapter ??.
To see the need for methods of statistical inference, consider the case where your supervisor needs a model for basic dental payments. One option is to simply announce the model. You proclaim that it is the lognormal distribution with µ = 5.1239 and σ = 1.0345. (The many decimal places are designed to give your proclamation an aura of precision.) When your supervisor, a regulator, or an attorney who has put you on the witness stand asks you how you know that to be so, it will likely not be sufficient to answer "I just know these things," "trust me, I am a trained statistician," "it is too complicated, you wouldn't understand," or "my friend at Gamma Dental uses that model."

An alternative is to collect some data and use it to formulate a model. Most distributional models have two components. The first is a name, such as "Pareto." The second is the set of parameter values that complete the specification. Matters would be simpler if modeling could be done in that order, but most of the time we need to fix the parameters that go with a named model before we can decide if we want to use that model.

Because the parameter estimates are based on a sample from the population and not the entire population, the results will not be the true values. It is important to have an idea of the potential error. One way to express this error is with an interval estimate: rather than announcing a particular value, a range of plausible values is presented.

When named parametric distributions are used, the parameterizations used are those from Appendixes ?? and ??.

Alternatively, you may want to construct a nonparametric model (also called an empirical model), where the goal is to determine a model that essentially reproduces the data. Such models are discussed in Chapter 12.

At this point we present four data sets, referred to as Data Sets A, B, C, and D. They will be used several times, some in this chapter and some in later chapters.

Data Set A  This data set is well known in the casualty actuarial literature. It was first analyzed in the paper [5] by Dropkin in 1959. He collected data from 1956–1958 on the number of accidents by one driver in one year. The results for 94,935 drivers are in Table 10.1.

Table 10.1  Data Set A.

Number of accidents    Number of drivers
0                                 81,714
1                                 11,306
2                                  1,618
3                                    250
4                                     40
5 or more                              7

Data Set B  These numbers are artificial. They represent the amounts paid on workers compensation medical benefits but are not related to any particular policy or set of policyholders. These payments are the full amount of the loss. A random sample of 20 payments is given in Table 10.2.

Table 10.2  Data Set B.

   27     82    115    126    155    161    243    294    340     384
  457    680    855    877    974  1,193  1,340  1,884  2,558  15,743

Data Set C  These observations represent payments on 227 claims from a general liability insurance policy. The data are in Table 10.3.

Table 10.3  Data Set C.

Payment range        Number of payments
0–7,500                              99
7,500–17,500                         42
17,500–32,500                        29
32,500–67,500                        28
67,500–125,000                       17
125,000–300,000                       9
Over 300,000                          3

Data Set D  This data set is from the experience of five-year term insurance policies. The study period is a fixed time period. The columns are interpreted as follows: (1) i is the policy number, 1–40. (2) di is the duration at which the insured was first observed. Thus, policies 1–30 were observed from when the policy was sold. The remaining policies were issued prior to the start of the observation period and were known to be alive at that duration. (3) xi is the duration at which the insured was observed to die. Those who did not die have "–" in that column. (4) ui is the last duration at which those who did not die were observed. That could be because they surrendered their policy before the five years elapsed, reached the end of the five-year term, or the study ended while the policy was still in force. The data are in Table 10.4.

Table 10.4  Data Set D.

 i     di    xi    ui         i      di    xi    ui
 1     0     –     0.1        16     0     4.8   –
 2     0     –     0.5        17     0     –     4.8
 3     0     –     0.8        18     0     –     4.8
 4     0     0.8   –          19–30  0     –     5.0
 5     0     –     1.8        31     0.3   –     5.0
 6     0     –     1.8        32     0.7   –     5.0
 7     0     –     2.1        33     1.0   4.1   –
 8     0     –     2.5        34     1.8   3.1   –
 9     0     –     2.8        35     2.1   –     3.9
10     0     2.9   –          36     2.9   –     5.0
11     0     2.9   –          37     2.9   –     4.8
12     0     –     3.9        38     3.2   4.0   –
13     0     4.0   –          39     3.4   –     5.0
14     0     –     4.0        40     3.9   –     5.0
15     0     –     4.1
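Data Set D combines left truncation (policies 31–40 enter observation at a duration di > 0) and right censoring (the ui column), features that drive much of the estimation theory in Chapters 11 and 12. As an aside that is not part of the study note, the sketch below shows one way such records might be held in code; the Policy class and the particular rows reproduced from Table 10.4 are choices made only for this illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Policy:
    """One Data Set D record: entry, death, and censoring durations."""
    i: int                     # policy number
    d: float                   # duration at which observation began
    x: Optional[float] = None  # duration at observed death; None if death not observed
    u: Optional[float] = None  # last duration observed alive; None if death observed

# A few rows of Table 10.4 for illustration (the full table has 40 policies).
policies = [
    Policy(i=1, d=0.0, u=0.1),   # left observation almost immediately
    Policy(i=4, d=0.0, x=0.8),   # observed to die at duration 0.8
    Policy(i=33, d=1.0, x=4.1),  # entered at duration 1.0 (left truncated), died at 4.1
    Policy(i=40, d=3.9, u=5.0),  # entered at 3.9, reached the end of the 5-year term
]

deaths = [p for p in policies if p.x is not None]
print(f"{len(deaths)} observed deaths among {len(policies)} sample records")
```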
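To preview how the estimation ideas of this chapter and Chapter 11 turn data into a fitted model, consider the dental example again. The sketch below is an illustration, not the authors' code: it fits a lognormal to Data Set B by maximum likelihood, relying on the fact that for the lognormal distribution the maximum likelihood estimates of µ and σ are the sample mean and the biased (divide-by-n) standard deviation of the logged observations.

```python
import math

# Data Set B: the 20 payments from Table 10.2.
data_b = [27, 82, 115, 126, 155, 161, 243, 294, 340, 384,
          457, 680, 855, 877, 974, 1193, 1340, 1884, 2558, 15743]

logs = [math.log(x) for x in data_b]
n = len(logs)
mu_hat = sum(logs) / n
sigma_hat = math.sqrt(sum((y - mu_hat) ** 2 for y in logs) / n)  # MLE divides by n
print(f"fitted lognormal: mu = {mu_hat:.4f}, sigma = {sigma_hat:.4f}")
```

Because the fit is based on only 20 observations, point estimates like these should be accompanied by an interval estimate, as noted above.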
10.2 Point estimation

10.2.1 Introduction

Regardless of how a model is estimated, it is extremely unlikely that the estimated model will exactly match the true distribution. Ideally, we would like to be able to measure the error we will be making when using the estimated model. But doing so is clearly impossible! If we knew the amount of error we had made, we could adjust our estimate by that amount and then have no error at all. The best we can do is discover how much error is inherent in repeated use of the procedure, as opposed to how much error we made with our current estimate.
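Section 10.2.2 makes "error inherent in repeated use of the procedure" precise through measures such as bias, variance, and mean squared error. As a preview, the simulation sketch below (not an example from the text; the true parameter values, sample size, and trial count are arbitrary assumptions) approximates those quantities for the lognormal σ estimator used earlier by applying it to many samples drawn from a known model.

```python
import math
import random

random.seed(2018)

def sigma_mle(sample):
    """Maximum likelihood estimate of the lognormal sigma (divide-by-n form)."""
    logs = [math.log(x) for x in sample]
    mu = sum(logs) / len(logs)
    return math.sqrt(sum((y - mu) ** 2 for y in logs) / len(logs))

true_mu, true_sigma = 5.0, 1.0   # assumed "true" model for the simulation
n, trials = 20, 10_000
estimates = [
    sigma_mle([math.exp(random.gauss(true_mu, true_sigma)) for _ in range(n)])
    for _ in range(trials)
]

mean_est = sum(estimates) / trials
bias = mean_est - true_sigma
mse = sum((e - true_sigma) ** 2 for e in estimates) / trials
print(f"average estimate {mean_est:.4f}, bias {bias:+.4f}, MSE {mse:.4f}")
```

Averaging over many simulated samples approximates the procedure's long-run behavior, which is exactly what cannot be observed from the single sample actually at hand.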