An Introduction to Markov Chain Monte Carlo Methods and Their Actuarial Applications
DAVID P. M. SCOLLNIK
Department of Mathematics and Statistics
University of Calgary

Abstract

This paper introduces the readers of the Proceedings to an important class of computer-based simulation techniques known as Markov chain Monte Carlo (MCMC) methods. General properties characterizing these methods will be discussed, but the main emphasis will be placed on one MCMC method known as the Gibbs sampler. The Gibbs sampler permits one to simulate realizations from complicated stochastic models in high dimensions by making use of the model's associated full conditional distributions, which will generally have a much simpler and more manageable form. In its most extreme version, the Gibbs sampler reduces the analysis of a complicated multivariate stochastic model to the consideration of that model's associated univariate full conditional distributions.

In this paper, the Gibbs sampler will be illustrated with four examples. The first three of these examples serve as rather elementary yet instructive applications of the Gibbs sampler. The fourth example describes a reasonably sophisticated application of the Gibbs sampler in the important arena of credibility for classification ratemaking via hierarchical models, and involves the Bayesian prediction of frequency counts in workers compensation insurance.

1. INTRODUCTION

The purpose of this paper is to acquaint the readership of the Proceedings with a class of simulation techniques known as Markov chain Monte Carlo (MCMC) methods. These methods permit a practitioner to simulate a dependent sequence of random draws from very complicated stochastic models. The main emphasis will be placed on one MCMC method known as the Gibbs sampler.
It is no overstatement to say that several hundred papers relating to the Gibbs sampling methodology have appeared in the statistical literature since 1990. Yet the Gibbs sampler has made only a handful of appearances within the actuarial literature to date. Carlin [3] used the Gibbs sampler in order to study the Bayesian state-space modeling of non-standard actuarial time series, and Carlin [4] used it to develop various Bayesian approaches to graduation. Klugman and Carlin [19] also used the Gibbs sampler in the arena of Bayesian graduation, this time concentrating on a hierarchical version of Whittaker-Henderson graduation. Scollnik [24] studied a simultaneous equations model for insurance ratemaking, and conducted a Bayesian analysis of this model with the Gibbs sampler.

This paper reviews the essential nature of the Gibbs sampling algorithm and illustrates its application with four examples of varying complexity. The paper is primarily expository, although references are provided to important theoretical results in the published literature. The reader is presumed to possess at least a passing familiarity with the material relating to statistical computing and stochastic simulation in the syllabus for CAS Associateship Examination Part 4B. The theoretical content of the paper is mainly concentrated in Section 2, which provides a brief discussion of Markov chains and the properties of MCMC methods. Except for noting Equations 2.1 and 2.2 along with their interpretation, the reader may skip over Section 2 on a first reading of this paper. Section 3 formally introduces the Gibbs sampler and illustrates it with an example. Section 4 discusses some of the practical considerations related to the implementation of a Gibbs sampler. In Section 5, some aspects of Bayesian inference using Gibbs sampling are considered, and two final examples are presented.
The first of these concerns the Bayesian estimation of the parameter for a size-of-loss distribution when grouped data are observed. The second addresses credibility for classification ratemaking via hierarchical models, and involves the Bayesian prediction of frequency counts in workers compensation insurance. In Section 6 we conclude our presentation and point out some areas of application to be explored in the future. Since the subject of MCMC methods is still foreign to most actuaries at this time, we will conclude this section with a simple introductory example, which we will return to in Section 3.

Example 1

This example starts by recalling that a generalized Pareto distribution can be constructed by mixing one gamma distribution with another gamma distribution in a certain manner. (See, for example, Hogg and Klugman [16, pp. 53-54].) More precisely, if a loss random variable X has a conditional gamma (k, θ) distribution with density

    f(x | θ) = [θ^k / Γ(k)] x^(k−1) exp(−θx),   0 < x < ∞,   (1.1)

and the mixing random variable θ has a marginal gamma (α, λ) distribution with density

    f(θ) = [λ^α / Γ(α)] θ^(α−1) exp(−λθ),   0 < θ < ∞,

then X has a marginal generalized Pareto (α, λ, k) distribution with density

    f(x) = Γ(α + k) λ^α x^(k−1) / [Γ(α) Γ(k) (λ + x)^(α+k)],   0 < x < ∞.

It also follows that the conditional distribution of θ given X is itself a gamma distribution, namely,

    f(θ | x) ~ gamma (α + k, λ + x),   0 < θ < ∞.   (1.2)

We will perform the following iterative sampling algorithm, which is based upon the conditional distributions appearing in Equations 1.1 and 1.2:

1. Select arbitrary starting values X^(0) and θ^(0).

2. Set the counter index i = 0.

3. Sample X^(i+1) from f(x | θ^(i)) ~ gamma (k, θ^(i)).

4. Sample θ^(i+1) from f(θ | X^(i+1)) ~ gamma (α + k, λ + X^(i+1)).

5. Set i = i + 1 and return to Step 3.
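The five steps above can be sketched in a few lines of code. The sketch below is our own illustration, not code from the paper; note that the text uses the gamma(shape, rate) parameterization of Equations 1.1 and 1.2, whereas NumPy's gamma generator takes a scale argument, so we pass scale = 1/rate. The function name and default starting values are illustrative choices.

```python
import numpy as np

def gibbs_sample(alpha, lam, k, n_iter, x0=20.0, theta0=10.0, seed=0):
    """Run Steps 1-5: alternate draws from the two full conditionals."""
    rng = np.random.default_rng(seed)
    x, theta = x0, theta0                        # Step 1: arbitrary starting values
    xs, thetas = [x], [theta]
    for _ in range(n_iter):                      # Steps 3-5, repeated
        # Step 3: X^(i+1) ~ gamma(k, theta^(i)); NumPy wants scale = 1/rate
        x = rng.gamma(shape=k, scale=1.0 / theta)
        # Step 4: theta^(i+1) ~ gamma(alpha + k, lam + X^(i+1))
        theta = rng.gamma(shape=alpha + k, scale=1.0 / (lam + x))
        xs.append(x)
        thetas.append(theta)
    return np.array(xs), np.array(thetas)
```

For example, `xs, thetas = gibbs_sample(5, 1000, 2, 500)` produces a run of the kind described in the illustration that follows.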
à For the sake of illustration, we assigned the model parameters ® = 5, ¸ = 1000 and k = 2 so that the marginal distribution of µ is gamma (5,1000) with mean 0.005 and the marginal distribu- tion of X is generalized Pareto (5,1000,2) with mean 500. We then ran the algorithm described above on a fast computer for a total of 500 iterations and stored the sequence of generated val- ues X(0), µ(0), X(1), µ(1),:::, X(499), µ(499), X(500), µ(500). It must be emphasized that this sequence of random draws is clearly not independent, since X(1) depends upon µ(0), µ(1) depends upon X(1), and so forth. Our two starting values were arbitrarily se- lected to be X(0) = 20 and µ(0) = 10. The sequence of sampled i values for X( ) is plotted in Figure 1, along with the sequence i of sampled values for µ( ), for iterations 100 through 200. Both sequences do appear to be random, and some dependencies be- tween successive values are discernible in places. In Figure 2, we plot the histograms of the last 500 val- ues appearing in each of the two sequences of sampled values (the starting values X(0) and µ(0) were discarded at this point). 118 AN INTRODUCTION TO MARKOV CHAIN MONTE CARLO METHODS FIGURE 1 X i µ i SAMPLE PATHS FOR ( ) AND ( ) IN EXAMPLE 1 In each plot, we also overlay the actual density curve for the marginal distribution of either X or µ. Surprisingly, the depen- dent sampling scheme we implemented, which was based upon the full conditional distributions f(µ x) and f(x µ), appears to have generated random samples fromj the underljying marginal distributions. 
Now, notice that the marginal distribution of X may be interpreted as the average of the conditional distribution of X given θ, taken with respect to the marginal distribution of θ; that is,

    f(x) = ∫ f(x | θ) f(θ) dθ.

[FIGURE 2: Histograms of sample values for X and θ in Example 1]

Since the sampled values of θ^(i) appear to constitute a random sample of sorts from the marginal distribution of θ, this suggests that a naive estimate of the value of the marginal density function for X at the point x might be constructed by taking the empirical average of f(x | θ^(i)) over the sampled values for θ^(i). If θ^(1) = 0.0055, for example, then f(x | θ^(1)) = 0.0055² x exp(−0.0055x). One does a similar computation for the other values of θ^(i) and averages to get

    f̂(x) = (1/500) Σ_{i=1}^{500} f(x | θ^(i)).   (1.3)

Similarly, we might construct

    f̂(θ) = (1/500) Σ_{i=1}^{500} f(θ | X^(i))   (1.4)

as a density estimate of f(θ). These estimated density functions are plotted in Figure 3 along with their exact counterparts, and it is evident that the estimates are excellent.

2. MARKOV CHAIN MONTE CARLO

In the example of the previous section, we considered an iterative simulation scheme that generated two dependent sequences of random variates. Apparently, we were able to use these sequences to capture characteristics of the underlying joint distribution that defined the simulation scheme in the first place. In this section, we will discuss a few properties of simulation schemes that generate dependent sequences of random variates, and note in what manner these dependent sequences may be used for making useful statistical inferences. The main results are given by Equations 2.1 and 2.2.

Before we begin, it may prove useful to quickly, and very informally, review some elementary Markov chain theory.
Tierney [31] provides a much more detailed and rigorous discussion of this material. A Markov chain is just a collection of random variables {X_n; n ≥ 0}, with the distribution of the random variables on some space Ω ⊆ R^k governed by the transition probabilities

    Pr(X_{n+1} ∈ A | X_0, ..., X_n) = K(X_n, A),

where A ⊆ Ω.

[FIGURE 3: Estimated and exact marginal densities for X and θ in Example 1]
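To make the definition concrete, the X-sequence from Example 1 is itself a Markov chain on Ω = (0, ∞): each new value is produced by a rule that consults only the current state, which is exactly what the transition kernel K(X_n, A) formalizes. A minimal sketch of the one-step transition, our own illustration rather than anything from the paper, might look as follows (NumPy's gamma generator takes scale = 1/rate).

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, lam, k = 5.0, 1000.0, 2.0   # the parameters used in Example 1

def transition(x):
    """One step of the X-chain from Example 1.  The distribution of the
    next state depends only on the current state x (the Markov property);
    implicitly, K(x, A) = Pr(next state in A | current state x)."""
    theta = rng.gamma(shape=alpha + k, scale=1.0 / (lam + x))  # draw theta | x
    return rng.gamma(shape=k, scale=1.0 / theta)               # draw x' | theta

x = 20.0
for _ in range(5):
    x = transition(x)   # the update never consults past states
```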