
The Log-Gamma Distribution and Non-Normal Error

by Leigh J. Halliwell

ABSTRACT Because insured losses are positive, loss distributions start from zero and are right-tailed. However, residuals, or errors, are centered about a mean of zero and have both right and left tails. Seldom do error terms from models of insured losses seem normal. Usually they are positively skewed, rather than symmetric. And their right tails, as measured by their asymptotic failure rates, are heavier than that of the normal. As an error distribution suited to actuarial modeling this paper presents and recommends the log-gamma and its linear combinations, especially the combination known as the generalized logistic distribution. To serve as an example, a generalized logistic distribution is fitted by maximum likelihood to the standardized residuals of a loss-triangle model. Much theory is required for, and occasioned by, this presentation, most of which appears in three appendices along with some related mathematical history.

KEYWORDS Log-gamma, digamma, logistic, Euler-Mascheroni, cumulant, maximum likelihood, robust, bootstrap

VOLUME 13/ISSUE 2 CASUALTY ACTUARIAL SOCIETY 173 Variance Advancing the Science of Risk

1. Introduction

Since the late 1980s actuarial science has been moving gradually from methods to models. The movement was made possible by personal computers; it was made necessary by insurance competition. Actuaries, who used to be known for fitting curves to data and extrapolating from them, are now likely to be fitting models and explaining data.

Statistical modeling seeks to explain a block of observed data as a function of known variables. The model makes it possible to predict what will be observed from new instances of those variables. However, rarely, if ever, does the function perfectly explain the data. A successful model, as well as seeming reasonable, should explain the observations tolerably well. So the deterministic model y = f(X) gives way to the approximation y ≈ f(X), which is restored to equality with the addition of a random error term: y = f(X) + e. The simplest model is the homoskedastic linear statistical model (LSM) with vector form y = Xβ + e, in which E[e] = 0 and Var[e] = σ²I. According to LSM theory, β̂ = (X′X)⁻¹X′y is the best linear unbiased estimator (BLUE) of β, even if the elements of error vector e are not normally distributed.¹

The fact that the errors resultant in most actuarial models are not normally distributed raises two questions. First, does non-normality in the randomness affect the accuracy of a model's predictions? The answer is "Yes, sometimes seriously." Second, can models be made "robust,"² i.e., able to deal properly with non-normal error? Again, "Yes." Three attempts to do so are 1) to relax parts of BLUE, especially linearity and unbiasedness (robust estimation), 2) to incorporate explicit distributions into models (GLM), and 3) to bootstrap.³ Bootstrapping a linear model begins with solving it conventionally. One obtains β̂ and the expected observation vector ŷ = Xβ̂. Gleaning information from the residual vector ê = y − ŷ, one can simulate proper, or more realistic, "pseudo-error" vectors e*ᵢ and pseudo-observations yᵢ = ŷ + e*ᵢ. Iterating the model over the yᵢ will produce pseudo-estimates β̂ᵢ and pseudo-predictions in keeping with the apparent distribution of error. Of the three attempts, bootstrapping is the most commonsensical.

Our purpose herein is to introduce a distribution for non-normal error that is suited to bootstrapping in general, but especially as regards the asymmetric and skewed data that actuaries regularly need to model.
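The residual-bootstrap loop described above can be sketched in a few lines. The data and the one-predictor model here are hypothetical, not the paper's; the sketch only illustrates the sequence ŷ = Xβ̂, ê = y − ŷ, y* = ŷ + e*, refit.

```python
import random

# Hypothetical toy data for a one-predictor linear model y ≈ a + b*x;
# a minimal sketch of the residual bootstrap, not the paper's model.
random.seed(0)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.3, 11.9, 14.2, 15.8]

def ols(x, y):
    # closed-form least squares for intercept and slope
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

a_hat, b_hat = ols(xs, ys)
fitted = [a_hat + b_hat * u for u in xs]       # y_hat = X beta_hat
resid = [v - f for v, f in zip(ys, fitted)]    # e_hat = y - y_hat

boot_slopes = []
for _ in range(1000):
    e_star = [random.choice(resid) for _ in xs]       # pseudo-error vector
    y_star = [f + e for f, e in zip(fitted, e_star)]  # pseudo-observations
    boot_slopes.append(ols(xs, y_star)[1])            # pseudo-estimate

print(b_hat, sum(boot_slopes) / len(boot_slopes))
```

Resampling here is from the empirical residuals; the paper's point is that one may instead resample from a parametric error distribution fitted to them.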

¹For explanations of the linear statistical model and BLUE see Judge [1988, §§5.5 and 5.7] and Halliwell [2015]. Although normality is not assumed in the model, it is required for the usual tests of significance, the t-test and the f-test. It would divert us here to argue whether randomness arises from reality or from our ignorance thereof. Between the world wars physicists Niels Bohr and Albert Einstein argued this at length–technically to a stalemate, although most physicists give the decision to Bohr. Either way, actuaries earn their keep by dealing with what the insurance industry legitimately perceives as randomness ("risk"). One reviewer commented, "From a Bayesian perspective, there is no real concept of randomness in the sense of an outcome that is not the result of a cause-effect relationship." This pushes conditional probability too far, to the tautology that Prob[X = a|X = b] = δab. Information about some variables from a group of jointly distributed random variables can affect the distribution of the others, as in the classic case of the positive-definite multivariate normal. Such information tightens or narrows the "wave function," without altogether collapsing it. The reviewer's comment presumes the notion of Einstein's hidden variables. Bayes is neither here nor there, for a variable that remains hidden forever, or ontologically, really is random. The truth intended by the comment is that modelers should not be lazy, that subject to practical constraints they should incorporate all relevant information into their models, even the so-called "collateral" information. Judge [1988, Chapter 7] devotes fifty pages to the Bayesian version of the LSM. Of course, whether randomness be ontological or epistemological, the goal of science is to mitigate it, if not to dispel it. This is especially the goal of actuarial science with regard to risk.

²"Models that perform well even when the population does not conform precisely to the parametric family are said to be robust" (Klugman [1998, §2.4.3]). "A robust estimator . . . produces estimates that are 'good' (in some sense) under a wide variety of possible data-generating processes" (Judge [1988, Introduction to Chapter 22]). Chapter 22 of Judge contains thirty pages on robust estimation.

³For descriptions of bootstrapping see Klugman [1998, Example 2.19 in §2.4.3] and Judge [1988, Chapter 9, §9.A.1]. However, they both describe bootstrapping, or resampling, from the empirical distribution of residuals. Especially when observations are few (as in the case of our example in Section 6 with eighty-five observations) might the modeler want to bootstrap/resample from a parametric error distribution. Not to give away "free cover" would require bootstrapping from a parametric distribution, unless predictive errors were felt to be well represented by the residuals. A recent CAS monograph that combines GLM and bootstrapping is Shapland [2016].


As useful candidates for non-normal error, Sections 2 and 3 will introduce the log-gamma random variable and its linear combinations. Section 4 will settle on a linear combination that arguably maximizes the ratio of versatility to complexity, the generalized logistic random variable. Section 5 will examine its special cases. Finally, Section 6 will estimate by maximum likelihood the parameters of one such distribution from actual data. The most mathematical and theoretical subjects are relegated to Appendices A-C.

2. The log-gamma random variable

If X ∼ Gamma(α, θ), then Y = ln X is a random variable whose support is the entire real line.⁴ Hence, the logarithm converts a one-tailed distribution into a two-tailed. Although a leftward shift of X would move probability onto the negative real line, such a left tail would be finite. The logarithm is a natural way, even the natural way, to transform one infinite tail into two infinite tails.⁵ Because the logarithm function strictly increases, the probability density function of Y ∼ Log-Gamma(α, θ) is:⁶

$$f_Y(u) = f_X(e^u)\frac{de^u}{du} = \frac{1}{\Gamma(\alpha)}\,e^{-e^u/\theta}\left(\frac{e^u}{\theta}\right)^{\alpha-1}\frac{1}{\theta}\,e^u = \frac{1}{\Gamma(\alpha)}\,e^{-e^u/\theta}\left(\frac{e^u}{\theta}\right)^{\alpha}$$

Figure 1 contains a graph of the probability density functions of both X and Y = ln X for X ∼ Gamma(1, 1) ∼ Exponential(1). The log-gamma tails are obviously infinite, and the density itself is skewed to the left (negative skewness).

The log-gamma moments can be derived from its moment generating function:

$$M_Y(t) = E[e^{tY}] = E[e^{t\ln X}] = E[X^t] = \frac{\Gamma(\alpha+t)}{\Gamma(\alpha)}\,\theta^t$$

Even better is to switch from moments to cumulants by way of the cumulant generating function:⁷

$$\psi_Y(t) = \ln M_Y(t) = \ln\Gamma(\alpha+t) - \ln\Gamma(\alpha) + t\ln\theta$$
$$\psi_Y(0) = \ln M_Y(0) = 0$$

The cumulants (κₙ) are the derivatives of this function evaluated at zero, the first four of which are:

$$\psi_Y'(t) = \psi(\alpha+t) + \ln\theta \qquad\qquad \kappa_1 = E[Y] = \psi_Y'(0) = \psi(\alpha) + \ln\theta$$
$$\psi_Y''(t) = \psi'(\alpha+t) \qquad\qquad \kappa_2 = \mathrm{Var}[Y] = \psi_Y''(0) = \psi'(\alpha) > 0$$
$$\psi_Y'''(t) = \psi''(\alpha+t) \qquad\qquad \kappa_3 = \mathrm{Skew}[Y] = \psi_Y'''(0) = \psi''(\alpha) < 0$$
$$\psi_Y^{iv}(t) = \psi'''(\alpha+t) \qquad\qquad \kappa_4 = \mathrm{XsKurt}[Y] = \psi_Y^{iv}(0) = \psi'''(\alpha) > 0$$

The scale factor θ affects only the mean.⁸ The alternating inequalities of κ₂, κ₃, and κ₄ derive from the polygamma formulas of Appendix A.3. The variance, of course, must be positive; the negative skewness confirms the appearance of the log-gamma density in Figure 1. The positive excess kurtosis means that the log-gamma distribution is "leptokurtic"; its kurtosis is more positive than that of the normal distribution.

⁴Appendix A treats the gamma function and its family of related functions.

⁵Meyers [2013], also having seen the need for two-tailed loss distributions, is intrigued with skew normal and "mixed lognormal-normal" distributions, which in our opinion are not as intelligible and versatile as log-gamma distributions.

⁶With Leemis ["Log-gamma distribution"], we use the 'log' prefix for Y = ln(Gamma), rather than for ln(Y) = Gamma. Hogg [1984, §2.6] defines his "loggamma" as Y = e^Gamma, after analogy with the lognormal.

⁷Moments, cumulants, and their generating functions are reviewed in Appendix B. The customary use of 'ψ' for both the cumulant generating function and the digamma function is unfortunate; however, the presence or absence of a subscripted random variable resolves the ambiguity.

⁸See footnote 16 in Appendix A.1.
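These cumulant formulas can be spot-checked numerically. The sketch below assumes stand-in digamma and trigamma routines built from numerical derivatives of `math.lgamma` (much as the paper later differentiates Excel's GAMMALN), and compares a simulated log-gamma sample against κ₁ = ψ(α) + ln θ and κ₂ = ψ′(α); the parameter values are arbitrary.

```python
import math, random

def digamma(a, h=1e-6):
    # numerical first derivative of lgamma, a stand-in for psi
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def trigamma(a, h=1e-4):
    # numerical second derivative of lgamma, a stand-in for psi'
    return (math.lgamma(a + h) - 2 * math.lgamma(a) + math.lgamma(a - h)) / h ** 2

alpha, theta = 2.0, 1.5          # arbitrary illustrative parameters
random.seed(1)
ys = [math.log(random.gammavariate(alpha, theta)) for _ in range(200_000)]
mean = sum(ys) / len(ys)
var = sum((y - mean) ** 2 for y in ys) / len(ys)

print(mean, digamma(alpha) + math.log(theta))  # kappa_1 = psi(alpha) + ln(theta)
print(var, trigamma(alpha))                    # kappa_2 = psi'(alpha)
```

As a further sanity check, the numerical trigamma at 1 should be close to ψ′(1) = π²/6.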


Figure 1. Probability densities of X ∼ Exponential(1) and Y = ln X, plotted over [−4, 4].

Since the logarithm is a concave downward function, it follows from Jensen's inequality:

$$E[Y] = E[\ln X] \le \ln E[X]$$
$$\psi(\alpha) + \ln\theta \le \ln(\alpha\theta) = \ln(\alpha) + \ln(\theta)$$
$$\psi(\alpha) \le \ln(\alpha)$$

Because the probability is not amassed at a single point, the inequality is strict: ψ(α) < ln(α) for α > 0. However, when E[X] = αθ is fixed at unity and as α → ∞, the variance of X approaches zero. Hence, $\lim_{\alpha\to\infty}(\ln(\alpha) - \psi(\alpha)) = \ln 1 = 0$. It is not difficult to prove that $\lim_{\alpha\to 0^+}(\ln(\alpha) - \psi(\alpha)) = \infty$, as well as that ln(α) − ψ(α) strictly decreases. Therefore, for every y > 0 there exists exactly one α > 0 for which y = ln(α) − ψ(α).

The log-gamma random variable becomes an error term when its expectation equals zero. This requires the parameters to satisfy the equation E[Y] = ψ(α) + ln θ = 0, or θ = e^{−ψ(α)}. Hence, the simplest of all log-gamma error distributions is Y ∼ Log-Gamma(α, e^{−ψ(α)}) = ln(X ∼ Gamma(α, e^{−ψ(α)})).

3. Weighted sums of log-gamma random variables

Multiplying the log-gamma random variable by negative one reflects its distribution about the y-axis. This does not affect the even moments or cumulants; but it reverses the signs of the odd ones. For example, the skewness of −Y = −ln X = ln X⁻¹ is positive.

Now let Y be a γ-weighted sum of independent log-gamma random variables Yₖ, which resolves into a product of powers of independent gamma random variables Xₖ ∼ Gamma(αₖ, θₖ):

$$Y = \sum_k \gamma_k Y_k = \sum_k \gamma_k \ln X_k = \sum_k \ln X_k^{\gamma_k} = \ln\prod_k X_k^{\gamma_k}$$

Although the Xₖ must be independent of one another, their parameters need not be identical. Because of the independence:

$$\psi_Y(t) = \ln E[e^{tY}] = \ln E\!\left[e^{t\sum_k \gamma_k Y_k}\right] = \ln E\!\left[\prod_k e^{t\gamma_k Y_k}\right] = \sum_k \ln E\!\left[e^{t\gamma_k Y_k}\right] = \sum_k \psi_{Y_k}(\gamma_k t)$$

The nth cumulant of the weighted sum is:

$$\kappa_n(Y) = \left.\frac{d^n\psi_Y(t)}{dt^n}\right|_{t=0} = \sum_k \left.\frac{d^n\psi_{Y_k}(\gamma_k t)}{dt^n}\right|_{t=0} = \sum_k \gamma_k^n\,\psi_{Y_k}^{[n]}(0) = \sum_k \gamma_k^n\,\kappa_n(Y_k)$$

So the nth cumulant of a weighted sum of independent random variables is the weighted sum of the cumulants of the random variables, the weights being raised to the nth power.⁹
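The zero-mean construction above, θ = e^{−ψ(α)}, is easy to verify by simulation; the shape α here is arbitrary, and digamma is again a numerical-derivative stand-in.

```python
import math, random

def digamma(a, h=1e-6):
    # numerical derivative of lgamma, a stand-in for psi
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

alpha = 0.7                           # arbitrary shape parameter
theta = math.exp(-digamma(alpha))     # theta = e^{-psi(alpha)} forces E[Y] = 0
random.seed(3)
ys = [math.log(random.gammavariate(alpha, theta)) for _ in range(300_000)]
print(sum(ys) / len(ys))              # should be near zero
```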


Using the cumulant formulas from the previous section, we have:

$$\kappa_1(Y) = \sum_k \gamma_k\,\kappa_1(Y_k) = \sum_k \gamma_k\,\psi(\alpha_k) + \sum_k \gamma_k\ln(\theta_k) = \sum_k \gamma_k\,\psi(\alpha_k) + C$$
$$\kappa_2(Y) = \sum_k \gamma_k^2\,\kappa_2(Y_k) = \sum_k \gamma_k^2\,\psi'(\alpha_k)$$
$$\kappa_3(Y) = \sum_k \gamma_k^3\,\kappa_3(Y_k) = \sum_k \gamma_k^3\,\psi''(\alpha_k)$$
$$\kappa_4(Y) = \sum_k \gamma_k^4\,\kappa_4(Y_k) = \sum_k \gamma_k^4\,\psi'''(\alpha_k)$$

In general, $\kappa_{m+1}(Y) = \sum_k \gamma_k^{m+1}\,\psi^{[m]}(\alpha_k) + \mathrm{IF}(m=0,\,C,\,0)$. A weighted sum of n independent log-gamma random variables would provide 2n + 1 degrees of freedom for a method-of-cumulants fitting. All the scale parameters θₖ would be unity. As the parameter of an error distribution, C would lose its freedom, since the mean must then equal zero. Therefore, with no loss of generality, we may write $Y = \ln\prod_k X_k^{\gamma_k} + C$ for Xₖ ∼ Gamma(αₖ, 1).

4. The generalized logistic random variable

Although any finite weighted sum is tractable, four cumulants should suffice in most practice. So let Y = γ₁ ln X₁ + γ₂ ln X₂ + C = ln(X₁^{γ₁}X₂^{γ₂}) + C. Even then, one gamma should be positive and the other negative; in fact, letting one be the opposite of the other will allow Y to be symmetric in special cases. Therefore, Y = γ ln X₁ − γ ln X₂ + C = γ ln(X₁/X₂) + C for γ > 0 should be a useful form of intermediate complexity. Let the parameterization for this purpose be X₁ ∼ Gamma(α, 1) and X₂ ∼ Gamma(β, 1). Contributing to the usefulness of this form is the fact that X₁/X₂ is a generalized Pareto random variable, whose probability density function is:¹⁰

$$f_{X_1/X_2}(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{u}{u+1}\right)^{\alpha-1}\left(\frac{1}{u+1}\right)^{\beta-1}\frac{1}{(u+1)^2}$$

Since $e^Y = e^C(X_1/X_2)^{\gamma}$, for −α < γt < β:

$$M_Y(t) = E[e^{tY}] = e^{Ct}\,E\!\left[(X_1/X_2)^{\gamma t}\right] = e^{Ct}\int_{u=0}^{\infty} u^{\gamma t}\,\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{u}{u+1}\right)^{\alpha-1}\left(\frac{1}{u+1}\right)^{\beta-1}\frac{du}{(u+1)^2}$$
$$= e^{Ct}\,\frac{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}{\Gamma(\alpha)\Gamma(\beta)}\int_{u=0}^{\infty}\frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}\left(\frac{u}{u+1}\right)^{\alpha+\gamma t-1}\left(\frac{1}{u+1}\right)^{\beta-\gamma t-1}\frac{du}{(u+1)^2}$$
$$= e^{Ct}\,\frac{\Gamma(\alpha+\gamma t)\Gamma(\beta-\gamma t)}{\Gamma(\alpha)\Gamma(\beta)}$$

since the last integrand is the generalized Pareto density with parameters α + γt and β − γt (whose sum is still α + β), and so integrates to unity. Hence, the cumulant generating function and its derivatives are:

$$\psi_Y(t) = \ln M_Y(t) = Ct + \ln\Gamma(\alpha+\gamma t) - \ln\Gamma(\alpha) + \ln\Gamma(\beta-\gamma t) - \ln\Gamma(\beta)$$
$$\psi_Y'(t) = C + \gamma\big(\psi(\alpha+\gamma t) - \psi(\beta-\gamma t)\big)$$
$$\psi_Y''(t) = \gamma^2\big(\psi'(\alpha+\gamma t) + \psi'(\beta-\gamma t)\big)$$

⁹The same is true of all the ordinary, or non-central, moments. It is true also of the first three central moments, but only because they are identical to the first three cumulants. Halliwell [2011, §4] explains why the fourth and higher central moments of independent random variables are not additive.

¹⁰Hogg [1984, §2.6] derives this by the change-of-variable technique. Venter [1983, Appendix D] uses the mixing technique, since Gamma(α, 1)/Gamma(β, 1) ∼ Gamma(α, θ = 1/Gamma(β, 1)). See also Klugman [1998, §2.7.3.4 and Appendix A.2.2.1]. Our formulation assumes a scale factor of unity in the generalized Pareto.
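The moment identity behind this MGF, $E[(X_1/X_2)^t] = \Gamma(\alpha+t)\Gamma(\beta-t)/(\Gamma(\alpha)\Gamma(\beta))$ for −α < t < β, can be spot-checked by simulating the gamma ratio; the parameter choices below are arbitrary.

```python
import math, random

alpha, beta, t = 2.0, 3.0, 0.5     # arbitrary, with -alpha < t < beta
random.seed(7)
n = 400_000
acc = 0.0
for _ in range(n):
    ratio = random.gammavariate(alpha, 1.0) / random.gammavariate(beta, 1.0)
    acc += ratio ** t
sim = acc / n
# exact moment Gamma(alpha+t)Gamma(beta-t)/(Gamma(alpha)Gamma(beta)), in logs
exact = math.exp(math.lgamma(alpha + t) + math.lgamma(beta - t)
                 - math.lgamma(alpha) - math.lgamma(beta))
print(sim, exact)
```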


$$\psi_Y'''(t) = \gamma^3\big(\psi''(\alpha+\gamma t) - \psi''(\beta-\gamma t)\big)$$
$$\psi_Y^{iv}(t) = \gamma^4\big(\psi'''(\alpha+\gamma t) + \psi'''(\beta-\gamma t)\big)$$

And so, the cumulants are:

$$\kappa_1 = E[Y] = \psi_Y'(0) = C + \gamma\big(\psi(\alpha) - \psi(\beta)\big)$$
$$\kappa_2 = \mathrm{Var}[Y] = \psi_Y''(0) = \gamma^2\big(\psi'(\alpha) + \psi'(\beta)\big) > 0$$
$$\kappa_3 = \mathrm{Skew}[Y] = \psi_Y'''(0) = \gamma^3\big(\psi''(\alpha) - \psi''(\beta)\big)$$
$$\kappa_4 = \mathrm{XsKurt}[Y] = \psi_Y^{iv}(0) = \gamma^4\big(\psi'''(\alpha) + \psi'''(\beta)\big) > 0$$

The three parameters α, β, γ could be fitted to empirical cumulants κ₂, κ₃, and κ₄. For an error distribution C would equal γ(ψ(β) − ψ(α)). Since κ₄ > 0, the random variable Y is leptokurtic.

Since Y = γ ln(X₁/X₂) + C, Z = (Y − C)/γ = ln(X₁/X₂) may be considered a reduced form. From the generalized Pareto density above, we can derive the density of Z:

$$f_Z(u) = f_{X_1/X_2}(e^u)\,\frac{de^u}{du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha-1}\left(\frac{1}{e^u+1}\right)^{\beta-1}\frac{e^u}{(e^u+1)^2} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta}$$

Z ∼ ln(Gamma(α, 1)/Gamma(β, 1)) is a "generalized logistic" random variable (Wikipedia [Generalized logistic distribution]). The probability density functions of generalized logistic random variables are skew-symmetric:

$$f_{Z=\ln X_2/X_1}(-u) = \frac{\Gamma(\beta+\alpha)}{\Gamma(\beta)\Gamma(\alpha)}\left(\frac{e^{-u}}{e^{-u}+1}\right)^{\beta}\left(\frac{1}{e^{-u}+1}\right)^{\alpha} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^{u}}{e^{u}+1}\right)^{\alpha}\left(\frac{1}{e^{u}+1}\right)^{\beta} = f_{Z=\ln X_1/X_2}(u)$$

In Figure 2 are graphed three generalized logistic probability density functions. The density is symmetric if and only if α = β; the gray curve is that of the logistic density, for which α = β = 1. The mode of the generalized-logistic(α, β) density is u_mode = ln α/β. Therefore, the mode is positive [or negative] if and only if α > β [or α < β]. Since the digamma and tetragamma functions ψ, ψ″ strictly increase over the positive reals, the signs of E[Z] = ψ(α) − ψ(β) and Skew[Z] = ψ″(α) − ψ″(β) are the same as the sign of α − β. The positive mode of the orange curve (2, 1) implies positive mean and skewness, whereas for the blue curve (1, 4) they are negative.

Figure 2. Generalized logistic densities over [−4, 4]: the logistic (α = β = 1), (α, β) = (1, 4), and (α, β) = (2, 1).


5. Special cases

Although the probability density function of the generalized logistic random variable is of closed form, the form of its cumulative distribution function is not closed, save for the special cases of α = 1 and β = 1. The special case of α = 1 reduces X₁/X₂ to an ordinary Pareto. In this case, the cumulative distribution is

$$F_Z(u) = 1 - \left(\frac{1}{e^u+1}\right)^{\beta}$$

Likewise, the special case of β = 1 reduces X₁/X₂ to an inverse Pareto. In that case,

$$F_Z(u) = \left(\frac{e^u}{e^u+1}\right)^{\alpha}$$

It would be easy to simulate values of Z in both cases by the inversion method¹¹ (Klugman [1998, Appendix H.2]).

Quite interesting is the special case α = β. Z is symmetric about its mean if and only if α = β, in which case all the odd cumulants higher than the mean equal zero. Therefore, in this case:

$$f_Z(u) = \frac{\Gamma(2\alpha)}{\Gamma(\alpha)^2}\,\frac{(e^u)^{\alpha}}{(e^u+1)^{2\alpha}}$$

Symmetry is confirmed inasmuch as f_Z(−u) = f_Z(u). When α = β = 1, $f_Z(u) = \frac{e^u}{(e^u+1)^2}$, whose cumulative distribution is

$$F_Z(u) = \frac{e^u}{e^u+1} = \left(\frac{e^u}{e^u+1}\right)^{\alpha=1} = 1 - \left(\frac{1}{e^u+1}\right)^{\beta=1}$$

In this case, Z has the logistic distribution. Its mean and skewness are zero. As for its even cumulants:¹²

$$\kappa_2 = \mathrm{Var}[Z] = \psi'(\alpha) + \psi'(\beta) = 2\psi'(1) = 2\sum_{k=1}^{\infty}\frac{1}{k^2} = 2\,\zeta(2) = 2\cdot\frac{\pi^2}{6} = \frac{\pi^2}{3} \approx 3.290$$
$$\kappa_4 = \mathrm{XsKurt}[Z] = \psi'''(\alpha) + \psi'''(\beta) = 2\psi'''(1) = 2\cdot 6\sum_{k=1}^{\infty}\frac{1}{k^4} = 12\,\zeta(4) = 12\cdot\frac{\pi^4}{90} = \frac{2\pi^4}{15} \approx 12.988$$

Instructive also is the special case α = β = ½. Since Γ(½) = √π, the probability density function in this case is

$$f_Z(x) = \frac{1}{\pi}\,\frac{\sqrt{e^x}}{e^x+1}$$

The constant 1/π suggests a connection with the Cauchy density $\frac{1}{\pi}\frac{1}{u^2+1}$ (Wikipedia [Cauchy distribution]). Indeed, the density function of the random variable W = e^{Z/2} is:

$$f_W(u) = f_Z(2\ln u)\,\frac{d(2\ln u)}{du} = \frac{1}{\pi}\,\frac{u}{u^2+1}\cdot\frac{2}{u} = \frac{2}{\pi}\,\frac{1}{u^2+1}$$

This is the density function of the absolute value of the standard Cauchy random variable.¹³
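For the closed-form special cases, the inversion method mentioned above is a one-liner. This sketch uses the α = 1 case, inverting $F_Z(u) = 1 - (1/(e^u+1))^{\beta}$ to u = ln((1 − p)^{−1/β} − 1); with β = 1 the result should reproduce the logistic moments (mean 0, variance π²/3).

```python
import math, random

beta = 1.0                  # alpha = 1 and beta = 1: the logistic case
random.seed(5)
zs = []
for _ in range(200_000):
    p = random.random()
    # inversion of F_Z(u) = 1 - (1/(e^u + 1))^beta
    zs.append(math.log((1 - p) ** (-1 / beta) - 1))
m = sum(zs) / len(zs)
v = sum((z - m) ** 2 for z in zs) / len(zs)
print(m)                     # logistic mean: 0
print(v, math.pi ** 2 / 3)   # logistic variance: pi^2/3
```

For general (α, β) there is no closed-form quantile, and one would instead simulate ln of a ratio of gamma variates, as the footnoted Excel formulas do with GAMMA.INV.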

6. A maximum-likelihood example

Table 1 shows eighty-five standardized¹⁴ error terms from an additive-incurred model.

¹¹Workable Excel formulas are Z = LN(−1 + (1 − RAND( ))^(−1/β)) and Z = −LN(RAND( )^(−1/α) − 1). To simulate the generalized Pareto and logistic random variables requires gamma or beta inverse functions. Two equivalent forms are Z1 = LN(GAMMA.INV(RAND(), α, 1)) − LN(GAMMA.INV(RAND(), β, 1)) and Z2 = −LN(−1 + 1/BETA.INV(RAND(), α, β)). Mathematically, Z1 ∼ Z2; but the Z2 formula overflows when RAND() ≈ 1.

¹²See Wikipedia [Logistic distribution] and Wikipedia [Riemann zeta function]. Often cited is the coefficient of excess kurtosis: γ₂ = κ₄/κ₂² = (2π⁴/15)/(π²/3)² = 1.2. Havil [2003, Chapter 4] shows how Euler in 1735 proved that $\sum_{k=1}^{\infty} 1/k^2$ converges to the value π²/6. Since the mid-1600s, determining the value of $\sum_{k=1}^{\infty} 1/k^2$ had eluded Wallis, Leibniz, and the Bernoullis. Because the latter lived in Basel, it became known as the Basel problem.

¹³Another derivation is:

$$W = e^{Z/2} = \left(e^{Z}\right)^{1/2} = \left(\frac{\mathrm{Gamma}(\tfrac12,1)}{\mathrm{Gamma}(\tfrac12,1)}\right)^{1/2} = \left(\frac{\mathrm{Gamma}(\tfrac12,2)}{\mathrm{Gamma}(\tfrac12,2)}\right)^{1/2} = \sqrt{\frac{N(0,1)^2}{N(0,1)^2}} = \frac{|N(0,1)|}{|N(0,1)|}$$

The Gamma(½, 2) ∼ χ²₁ random variable is the square of the standard normal. Wikipedia [Cauchy distribution] and Hogg [1984, pp. 47, 49] show that the standard Cauchy random variable is the quotient of two independent standard normals.

¹⁴The generic model of the observations is y = Xβ + e, where Var[e] = σ²Φ. The variance of the residuals is Var[ê] = Var[y − Xβ̂] = σ²(Φ − X(X′Φ⁻¹X)⁻¹X′), as derived by Halliwell [1996, Appendix D]. The standardized residual of the ith element is then $\hat{e}_i/\sqrt{\mathrm{Var}[\hat{e}]_{ii}}$. Usually the estimate σ̂² must be used for σ².


Table 1. Ordered error sample (85 = 17 × 5)

–2.287  –0.585  –0.326  –0.009  0.658
–1.575  –0.582  –0.317   0.009  0.663
–1.428  –0.581  –0.311   0.010  0.698
–1.034  –0.545  –0.298   0.012  0.707
–1.022  –0.544  –0.267   0.017  0.821
–1.011  –0.514  –0.260   0.019  0.998
–0.913  –0.503  –0.214   0.038  1.009
–0.879  –0.501  –0.202   0.090  1.061
–0.875  –0.487  –0.172   0.095  1.227
–0.856  –0.486  –0.167   0.123  1.559
–0.794  –0.453  –0.165   0.222  1.966
–0.771  –0.435  –0.162   0.255  1.973
–0.726  –0.429  –0.115   0.362  2.119
–0.708  –0.417  –0.053   0.367  2.390
–0.670  –0.410  –0.050   0.417  2.414
–0.622  –0.384  –0.042   0.417  2.422
–0.612  –0.381  –0.038   0.488  2.618

A triangle of incremental losses by accident-year row and evaluation-year column was modeled from the exposures of its thirteen accident years. Variance by column was assumed to be proportional to exposure. A complete triangle would have $\sum_{k=1}^{13} k = 91$ observations; but we happened to exclude six.

The sample mean is nearly zero at 0.001. Three of the five columns are negative, and the positive errors are more dispersed than the negative. Therefore, this sample is positively skewed, or skewed to the right. Other sample cumulants are 0.850 (variance), 1.029 (skewness), and 1.444 (excess kurtosis). The coefficients of skewness and of kurtosis are 1.314 and 1.999.

By maximum likelihood we wished to explain the sample as coming from Y = γZ + C, where Z = ln(X₁/X₂) ∼ ln(Gamma(α, 1)/Gamma(β, 1)), the generalized logistic variable of Section 4 with distribution:

$$f_Z(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^u}{e^u+1}\right)^{\alpha}\left(\frac{1}{e^u+1}\right)^{\beta}$$

So, defining z(u; C, γ) = (u − C)/γ, whereby z(Y) = Z, we have the distribution of Y:

$$f_Y(u) = f_Z(z(u))\,z'(u) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\left(\frac{e^{z(u)}}{e^{z(u)}+1}\right)^{\alpha}\left(\frac{1}{e^{z(u)}+1}\right)^{\beta}\frac{1}{\gamma}$$

The logarithm, or the log-likelihood, is:

$$\ln f_Y(u) = \ln\Gamma(\alpha+\beta) - \ln\Gamma(\alpha) - \ln\Gamma(\beta) + \alpha\,z(u) - (\alpha+\beta)\ln\big(e^{z(u)}+1\big) - \ln\gamma$$

This is a function in four parameters; C and γ are implicit in z(u). With all four parameters free, the likelihood of the sample could be maximized. Yet it is both reasonable and economical to estimate Y as a "standard-error" distribution, i.e., as having zero mean and unit variance. In Excel it sufficed us to let the Solver add-in maximize the log-likelihood with respect to α and β, giving due consideration that C and γ, constrained by zero mean and unit variance, are themselves functions of α and β. As derived in Section 4, E[Y] = C + γ(ψ(α) − ψ(β)) and Var[Y] = γ²(ψ′(α) + ψ′(β)). Hence, standardization requires that

$$\gamma(\alpha,\beta) = \frac{1}{\sqrt{\psi'(\alpha) + \psi'(\beta)}}$$

and that C(α, β) = γ(α, β)·(ψ(β) − ψ(α)). The log-likelihood maximized at α̂ = 0.326 and β̂ = 0.135. Numerical derivatives of the GAMMALN function approximated the digamma and trigamma functions at these values. The remaining parameters for a standardized distribution must be Ĉ = −0.561 and γ̂ = 0.122. Figure 3 graphs the maximum-likelihood result.

The maximum absolute difference between the empirical and the fitted, |max| = 0.066, occurs at x = 0.1. We tested its significance with a "KS-like" (Kolmogorov-Smirnov, cf. Hogg [1984, p. 104]) statistic. We simulated one thousand samples of eighty-five instances from the fitted distribution, i.e., from the Y distribution with its four estimated parameters. Each simulation provided an empirical cumulative distribution, whose maximum absolute deviation from the fitted distribution we derived over the interval [−4, 4] in steps of 0.1. Figure 4 contains the graph of the cumulative density function of |max|.
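The standardization constraints and the log-likelihood can be sketched as follows, with digamma and trigamma taken (as in the paper) from numerical derivatives of the log-gamma function. Evaluated at the paper's estimates α̂ = 0.326 and β̂ = 0.135, the constrained C and γ should land near −0.561 and 0.122, and a crude numerical integration should confirm unit mass, zero mean, and unit variance.

```python
import math

def digamma(a, h=1e-6):
    # numerical first derivative of lgamma (cf. the paper's use of GAMMALN)
    return (math.lgamma(a + h) - math.lgamma(a - h)) / (2 * h)

def trigamma(a, h=1e-4):
    # numerical second derivative of lgamma
    return (math.lgamma(a + h) - 2 * math.lgamma(a) + math.lgamma(a - h)) / h ** 2

def standardized(alpha, beta):
    # gamma and C constrained so that E[Y] = 0 and Var[Y] = 1
    g = 1 / math.sqrt(trigamma(alpha) + trigamma(beta))
    c = g * (digamma(beta) - digamma(alpha))
    return c, g

def log_f(u, alpha, beta, c, g):
    # log-likelihood of one observation, per the density of Y = gamma*Z + C
    z = (u - c) / g
    return (math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
            + alpha * z - (alpha + beta) * math.log1p(math.exp(z)) - math.log(g))

a_hat, b_hat = 0.326, 0.135
c_hat, g_hat = standardized(a_hat, b_hat)
print(c_hat, g_hat)   # about -0.56 and 0.12

# numerical check that the standardized density has mass 1, mean 0, variance 1
h, total, mean, var = 0.01, 0.0, 0.0, 0.0
for i in range(-1500, 1501):
    u = i * h
    w = math.exp(log_f(u, a_hat, b_hat, c_hat, g_hat)) * h
    total += w
    mean += u * w
    var += u * u * w
print(total, mean, var - mean ** 2)
```

Maximizing the sample log-likelihood over (α, β) subject to these constraints is then a two-dimensional search, which the paper delegates to Excel's Solver.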


Figure 3. Empirical and fitted CDFs over [−4, 4], with |max| marked and the cumulative standard-normal CDF ("NormS") for comparison.

Figure 4. Distribution of |max| (horizontal axis from 0.00 to 0.20).

The actual deviation of 0.066 coincides with the thirty-first percentile, or about with the lower tercile. The dashed line in Figure 3 (legend "NormS") represents the cumulative standard-normal distribution. The empirical distribution has more probability below zero and a heavier right tail. Simulations with these logistic error terms are bound to be more accurate than simulations defaulted to normal errors.¹⁵

7. Conclusion

Actuaries are well schooled in loss distributions, which are non-negative, positively skewed, and right-tailed. The key to a versatile distribution of error is to combine logarithms of loss distributions. Because most loss distributions are transformations of the gamma distribution, the log-gamma distribution covers most of the possible combinations. The generalized logistic distribution strikes a balance between versatility and complexity. It should be a recourse to the actuary seeking to bootstrap a model whose residuals are not normally distributed.

¹⁵Taking trigamma and tetragamma values from www.easycalculation.com//polygamma-function.php, we calculated the higher cumulants: κ₃ = Skew[Y] = 1.496 and κ₄ = XsKurt[Y] = 4.180. Since Y is standardized, these values double as coefficients of skewness and kurtosis. That they, especially the kurtosis, are greater than the empirical coefficients, 1.314 and 1.999, is evidence for the inferiority of the method of moments.


References

Artin, E., The Gamma Function, New York: Dover, 2015 (ET of "Einführung in die Theorie der Gammafunktion," Hamburger Mathematische Einzelschriften, 1931).
Daykin, C. D., T. Pentikäinen, and M. Pesonen, Practical Risk Theory for Actuaries, London: Chapman & Hall, 1994.
Halliwell, L. J., "Loss Prediction by Generalized Least Squares," Proceedings of the Casualty Actuarial Society 83 (1996), pp. 436–489, www.casact.org/pubs/proceed/proceed96/96436.pdf.
Halliwell, L. J., "Conditional Probability and the Collective Risk Model," Casualty Actuarial Society E-Forum, Spring 2011, www.casact.org/pubs/forum/11spforum/Halliwell.pdf.
Halliwell, L. J., "The Gauss-Markov Theorem: Beyond the BLUE," Casualty Actuarial Society E-Forum, Fall 2015, www.casact.org/pubs/forum/15fforum/Halliwell_GM.pdf.
Havil, J., Gamma: Exploring Euler's Constant, Princeton University Press, 2003.
Hogg, R. V., and S. A. Klugman, Loss Distributions, New York: Wiley, 1984.
Judge, G. G., R. C. Hill, W. E. Griffiths, H. Lütkepohl, and T. Lee, Introduction to the Theory and Practice of Econometrics, 2nd ed., New York: Wiley, 1988.
Klugman, S. A., H. H. Panjer, and G. E. Willmot, Loss Models: From Data to Decisions, New York: Wiley, 1998.
Leemis, L., "Log-Gamma Distribution," www.math.wm.edu/~leemis/chart/UDR/PDFs/Loggamma.pdf.
Meyers, G., "The Skew Normal Distribution and Beyond," Actuarial Review, May 2013, p. 15, www.casact.org/newsletter/pdfUpload/ar/AR_May2013_1.pdf.
Shapland, M. R., Using the ODP Bootstrap Model: A Practitioner's Guide, CAS Monograph Series Number 4, Arlington, VA: Casualty Actuarial Society, 2016, www.casact.org/pubs/monographs/papers/04-shapland.pdf.
Venter, G. G., "Transformed Beta and Gamma Distributions and Aggregate Losses," Proceedings of the Casualty Actuarial Society 70 (1983), pp. 156–193, www.casact.org/pubs/proceed/proceed83/83156.pdf.
Wikipedia contributors, "Bohr–Mollerup Theorem," https://en.wikipedia.org/wiki/Bohr–Mollerup_theorem (accessed March 2017).
Wikipedia contributors, "Cauchy Distribution," https://en.wikipedia.org/wiki/Cauchy_distribution (accessed March 2017).
Wikipedia contributors, "Gamma Function," https://en.wikipedia.org/wiki/Gamma_function (accessed March 2017).
Wikipedia contributors, "Generalized Logistic Distribution," https://en.wikipedia.org/wiki/Generalized_logistic_distribution (accessed March 2017).
Wikipedia contributors, "Logistic Distribution," https://en.wikipedia.org/wiki/Logistic_distribution (accessed March 2017).
Wikipedia contributors, "Polygamma Function," https://en.wikipedia.org/wiki/Polygamma_function (accessed March 2017).
Wikipedia contributors, "Riemann Zeta Function," https://en.wikipedia.org/wiki/Riemann_zeta_function (accessed March 2017).

Appendix A
The gamma, log-gamma, and polygamma functions

A.1. The gamma function as an integral

The modern form of the gamma function is $\Gamma(\alpha) = \int_{x=0}^{\infty} e^{-x} x^{\alpha-1}\,dx$. The change of variable t = e^{−x} (or x = −ln t = ln 1/t) transforms it into:

$$\Gamma(\alpha) = \int_{t=1}^{0}\left(\ln\frac{1}{t}\right)^{\alpha-1} t\left(-\frac{dt}{t}\right) = \int_{t=0}^{1}\left(\ln\frac{1}{t}\right)^{\alpha-1} dt$$

According to Havil [2003, p. 53], Euler used the latter form in his pioneering work during 1729–1730. It was not until 1809 that Legendre named it the gamma function and denoted it with the letter 'Γ'. The integrand records the struggle between the exponential function e^{−x} and the power function x^{α−1}, in which the former ultimately prevails and forces the convergence of the integral for α > 0.

Of course, $\Gamma(1) = \int_{x=0}^{\infty} e^{-x}\,dx = 1$. The well-known recurrence formula Γ(α + 1) = αΓ(α) follows from integration by parts:

$$\alpha\Gamma(\alpha) = \alpha\int_{x=0}^{\infty} e^{-x} x^{\alpha-1}\,dx = \int_{x=0}^{\infty} e^{-x}\,d(x^{\alpha}) = \left.e^{-x}x^{\alpha}\right|_{0}^{\infty} - \int_{x=0}^{\infty} x^{\alpha}\,d(e^{-x}) = 0 + \int_{x=0}^{\infty} e^{-x} x^{(\alpha+1)-1}\,dx = \Gamma(\alpha+1)$$

It is well to note that in order for $e^{-x}x^{\alpha}\big|_{0}^{\infty}$ to equal zero, α must be positive. For positive-integral values of α, Γ(α) = (α − 1)!; so the gamma function extends the factorial to the positive real numbers. In this domain the function is continuous, even differentiable, as well as positive.


Although $\lim_{\alpha\to 0^+}\Gamma(\alpha) = \infty$, $\lim_{\alpha\to 0^+}\alpha\Gamma(\alpha) = \lim_{\alpha\to 0^+}\Gamma(\alpha+1) = \Gamma(1) = 1$. So Γ(α) ∼ 1/α as α → 0⁺. As α increases away from zero, the function decreases to a minimum of approximately Γ(1.462) = 0.886, beyond which it increases. So, over the positive real numbers the gamma function is 'U' shaped, or concave upward. Moreover, the simple algebra

$$1 = \frac{\Gamma(\alpha)}{\Gamma(\alpha)} = \frac{\int_{x=0}^{\infty} e^{-x} x^{\alpha-1}\,dx}{\Gamma(\alpha)} = \int_{x=0}^{\infty}\frac{1}{\Gamma(\alpha)}\,e^{-x} x^{\alpha-1}\,dx$$

indicates that $f(x) = \frac{1}{\Gamma(\alpha)}\,e^{-x} x^{\alpha-1}$ is the probability density function of what we will call a Gamma(α, 1)-distributed random variable, or simply Gamma(α)-distributed with θ = 1 understood.¹⁶ For k > −α, the kth moment of X ∼ Gamma(α) is:

$$E[X^k] = \int_{x=0}^{\infty} x^k\,\frac{1}{\Gamma(\alpha)}\,e^{-x} x^{\alpha-1}\,dx = \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)}\int_{x=0}^{\infty}\frac{1}{\Gamma(\alpha+k)}\,e^{-x} x^{\alpha+k-1}\,dx = \frac{\Gamma(\alpha+k)}{\Gamma(\alpha)}$$

Therefore, for k = 1 and 2, E[X] = α and E[X²] = α(α + 1). Hence, Var[X] = α.

A.2. The gamma function as an infinite product

Euler found that the gamma function could be expressed as the infinite product:

$$\Gamma(x) = \lim_{n\to\infty}\frac{n^x\,n!}{x(x+1)\cdots(x+n)} = \lim_{n\to\infty}\frac{n^x}{x\left(1+\frac{x}{1}\right)\cdots\left(1+\frac{x}{n}\right)}$$

As a check, $\Gamma(1) = \lim_{n\to\infty}\frac{n\cdot n!}{(n+1)!} = \lim_{n\to\infty}\frac{n}{n+1} = 1$. Moreover, the recurrence formula is satisfied:

$$\Gamma(x+1) = \lim_{n\to\infty}\frac{n^{x+1}\,n!}{(1+x)\cdots(n+x)(n+1+x)} = x\lim_{n\to\infty}\frac{n^x\,n!}{x(1+x)\cdots(n+x)}\cdot\frac{n}{n+1+x} = x\,\Gamma(x)$$

Hence, by induction, the integral and infinite-product definitions are equivalent for positive-integral values of x. It is the purpose of this section to extend the equivalence to all positive real numbers.

The proof is easier in logarithms. The log-gamma function is $\ln\Gamma(\alpha) = \ln\int_{x=0}^{\infty} e^{-x} x^{\alpha-1}\,dx$. The form of its recursive formula is ln Γ(α + 1) = ln(αΓ(α)) = ln Γ(α) + ln α. Its first derivative is:

$$(\ln\Gamma)'(\alpha) = \frac{d}{d\alpha}\ln\Gamma(\alpha) \equiv \frac{\Gamma'(\alpha)}{\Gamma(\alpha)} = \frac{\int_{x=0}^{\infty} e^{-x} x^{\alpha-1}\ln x\,dx}{\Gamma(\alpha)} = \int_{x=0}^{\infty}\frac{1}{\Gamma(\alpha)}\,e^{-x} x^{\alpha-1}\ln x\,dx = E[Y]$$

So the first derivative of ln Γ(α) is the expectation of Y ∼ Log-Gamma(α), as defined in Section 2.

¹⁶Similarly, Venter [1983, p. 157]: "The percentage of this integral reached by integrating up to some point x defines a probability distribution, i.e., the probability of being less than or equal to x." The multiplication of this random variable by a scale factor θ > 0 is Gamma(α, θ)-distributed with density $f_{\mathrm{Gamma}(\alpha,\theta)}(x) = \frac{1}{\Gamma(\alpha)}\,e^{-x/\theta}\left(\frac{x}{\theta}\right)^{\alpha-1}\frac{1}{\theta}$, as found under equivalent forms in Hogg [1984, p. 226] and Klugman [1998, Appendix A.3.2.1]. Because this paper deals with logarithms of gamma random variables, in which products turn into sums, we will often ignore scale factors.


= Γ″(α)/Γ(α) − [Γ′(α)/Γ(α)]²

= ∫_{x=0}^{∞} (1/Γ(α)) e^{−x} x^{α−1} (ln x)² dx − E[Y]²

= E[Y²] − E[Y]² = Var[Y] > 0

Because the second derivative is positive, the log-gamma function is concave upward over the positive real numbers.

Now let n ∈ {2, 3, . . .} and x ∈ (0, 1]. So 0 < n − 1 < n < n + x ≤ n + 1. We consider the slopes of the log-gamma secants over the intervals [n − 1, n], [n, n + x], and [n, n + 1]. Due to the upward concavity, the order of these slopes must be:

[lnΓ(n) − lnΓ(n − 1)]/[n − (n − 1)] < [lnΓ(n + x) − lnΓ(n)]/[(n + x) − n] ≤ [lnΓ(n + 1) − lnΓ(n)]/[(n + 1) − n]

Using the recurrence formula and multiplying by positive x, we continue:

x ln(n − 1) < lnΓ(n + x) − lnΓ(n) ≤ x ln n

x ln(n − 1) + lnΓ(n) < lnΓ(n + x) ≤ x ln n + lnΓ(n)

But lnΓ(n) = Σ_{k=1}^{n−1} ln k and lnΓ(n + x) = lnΓ(x) + ln x + Σ_{k=1}^{n−1} ln(k + x). Hence:

x ln(n − 1) + Σ_{k=1}^{n−1} ln k < lnΓ(x) + ln x + Σ_{k=1}^{n−1} ln(k + x) ≤ x ln n + Σ_{k=1}^{n−1} ln k

x ln(n − 1) − ln x − Σ_{k=1}^{n−1} ln(1 + x/k) < lnΓ(x) ≤ x ln n − ln x − Σ_{k=1}^{n−1} ln(1 + x/k)

This string of inequalities, in the middle of which is lnΓ(x), is true for n ∈ {2, 3, . . .}. As n approaches infinity, lim_{n→∞}[x ln n − x ln(n − 1)] = x lim_{n→∞} ln[n/(n − 1)] = x ln[lim_{n→∞} n/(n − 1)] = x ln 1 = 0. So lnΓ(x) is sandwiched between two series that have the same limit. Hence:

lnΓ(x) = −ln x + lim_{n→∞} [x ln(n − 1) − Σ_{k=1}^{n−1} ln(1 + x/k)]

= −ln x + lim_{n→∞} [x ln n − Σ_{k=1}^{n} ln(1 + x/k)]

If the limit did not converge, then lnΓ(x) would not converge. Nevertheless, Appendix C constructs a proof of the limit's convergence. But for now we can finish this section by exponentiation:

Γ(x) = e^{lnΓ(x)}

= e^{−ln x} ⋅ lim_{n→∞} e^{x ln n − Σ_{k=1}^{n} ln(1 + x/k)}

= lim_{n→∞} (1/x) n^x/[(1 + x/1) . . . (1 + x/n)]

= lim_{n→∞} n^x/[x(1 + x/1) . . . (1 + x/n)]

= lim_{n→∞} n! n^x/[x(1 + x) . . . (n + x)]
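As a numerical sanity check on this equivalence, the partial Euler product can be compared with a library gamma function. The following Python sketch is illustrative only (Python is not among the tools the paper mentions); `math.lgamma` stands in for the integral definition, and the product is accumulated in logarithms to avoid overflow:

```python
import math

def gamma_product(x, n=100000):
    """Partial Euler product n! * n**x / (x * (1 + x) * ... * (n + x)),
    computed in logarithms to avoid overflow."""
    log_val = math.lgamma(n + 1) + x * math.log(n) - math.log(x)
    for k in range(1, n + 1):
        log_val -= math.log(k + x)
    return math.exp(log_val)

# The product converges to the integral-form gamma function at rate O(1/n)
approx = gamma_product(0.3)
exact = math.gamma(0.3)
```

At n = 100,000 the two agree to roughly five decimal places; the slow O(1/n) convergence is one reason the product form serves proofs better than computation.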


to the integral form.17 With the infinite-product form, Havil [2003, pp. 58f] easily derives the complement formula Γ(x)Γ(1 − x) = π/sin(πx). Consequently, Γ(½)Γ(½) = π/sin(π/2) = π.

A.3. Derivatives of the log-gamma function

The derivative of the gamma function is Γ′(α) = (d/dα) ∫_{x=0}^{∞} e^{−x} x^{α−1} dx = ∫_{x=0}^{∞} e^{−x} [d(x^{α−1})/dα] dx = ∫_{x=0}^{∞} e^{−x} x^{α−1} ln x dx. So, for X ∼ Gamma(α):

E[ln X] = ∫_{x=0}^{∞} ln x ⋅ (1/Γ(α)) e^{−x} x^{α−1} dx

= [∫_{x=0}^{∞} e^{−x} x^{α−1} ln x dx]/Γ(α)

= Γ′(α)/Γ(α) = d lnΓ(α)/dα

Because the derivative of lnΓ(α) has proven useful, mathematicians have named it the "digamma" function: ψ(α) = d lnΓ(α)/dα. Therefore, E[ln X] = ψ(α). Its derivatives are the trigamma, tetragamma, and so on. Unfortunately, Excel does not provide these functions, although one can approximate them from GAMMALN by numerical differentiation. The R and SAS programming languages have them, but stop at the second derivative of lnΓ(α). The remainder of this appendix deals with the derivatives of the log-gamma function.

According to the previous section:

lnΓ(x) = −ln x + lim_{n→∞} [x ln n − Σ_{k=1}^{n} ln(1 + x/k)]

Hence the digamma function is:

ψ(x) = d lnΓ(x)/dx = −1/x + lim_{n→∞} [ln n − Σ_{k=1}^{n} 1/(k + x)]

= lim_{n→∞} [ln n − Σ_{k=0}^{n} 1/(k + x)]

Its recurrence formula is:

ψ(x + 1) = lim_{n→∞} [ln n − Σ_{k=0}^{n} 1/(k + x + 1)]

= lim_{n→∞} [ln n − Σ_{k=0}^{n} 1/(k + x)] + 1/(0 + x) = ψ(x) + 1/x

Also, ψ(1) = lim_{n→∞} [ln n − Σ_{k=0}^{n} 1/(k + 1)] = lim_{n→∞} {ln n − Σ_{k=1}^{n+1} 1/k} = −lim_{n→∞} {Σ_{k=1}^{n} 1/k − ln n} = −γ. Euler was the first to see that lim_{n→∞} {Σ_{k=1}^{n} 1/k − ln n} converged.18 He named it 'C' and estimated its value as 0.577218. Eventually it was called the Euler-Mascheroni constant γ, whose value to six places is really 0.577216.

17 Here is an example of the increasing standard of mathematical rigor. Euler assumed that the agreement of the integral and the infinite product over the natural numbers implied their agreement over the positive reals. After all, the gamma function does not "serpentine" between integers, for it is concave. Only later was proof felt necessary, especially after Cantor demonstrated that ℵ₀ + ℵ₀ = ℵ₀. This implies that the agreement of two analytic functions at ℵ₀ points does not require the ℵ₀ derivatives in their expansions to be identical. (It would if ℵ₀ of those points were contained within a finite interval. But here the points of agreement are at least one unit distant from one another.) According to Wikipedia [Bohr–Mollerup theorem], when Harald Bohr and Johannes Mollerup published a proof of the integral/infinite-product equivalence in their 1922 textbook on complex analysis, they did not realize that their proof was original. Harald Bohr was the younger brother of Niels Bohr. A decade later, in the preface to his classic monograph on the gamma function, Artin [2015 (1931)] acknowledged the fundamental nature of this theorem.

18 Reformulate it as: lim_{n→∞} {Σ_{k=1}^{n} 1/k − ln n} = lim_{n→∞} {Σ_{k=1}^{n} 1/k − ln(n + 1)} = lim_{n→∞} Σ_{k=1}^{n} ∫_{x=k}^{k+1} (1/k − 1/x) dx = Σ_{k=1}^{∞} ∫_{x=k}^{k+1} (1/k − 1/x) dx. The value of each unit-integral is positive, but less than 1/k − 1/(k + 1) = 1/k(k + 1) < 1/k². So the series increases, but is bounded by Σ_{k=1}^{∞} 1/k² = ζ(2) = π²/6, where ζ is the zeta function [Wikipedia, Riemann zeta function]. See also Appendix C.
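The GAMMALN workaround mentioned above is easy to sketch. Here, as an illustration outside the paper itself, Python's `math.lgamma` plays the role of Excel's GAMMALN, and a central difference approximates ψ; the sketch reproduces ψ(1) = −γ and the recurrence ψ(x + 1) = ψ(x) + 1/x:

```python
import math

def digamma(x, h=1e-5):
    """Approximate psi(x) = d/dx lnGamma(x) by a central difference of
    lnGamma -- the numerical-differentiation idea suggested for GAMMALN."""
    return (math.lgamma(x + h) - math.lgamma(x - h)) / (2 * h)

euler_gamma = 0.577216          # Euler-Mascheroni constant, to six places
psi_one = digamma(1.0)          # theory: psi(1) = -gamma

# recurrence check: psi(x + 1) = psi(x) + 1/x, here at x = 3.5
lhs = digamma(4.5)
rhs = digamma(3.5) + 1 / 3.5
```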


The successive derivatives of the digamma function are:

ψ′(x) = (d/dx) lim_{n→∞} [ln n − Σ_{k=0}^{n} 1/(k + x)] = Σ_{k=0}^{∞} 1/(k + x)²

ψ″(x) = −2 Σ_{k=0}^{∞} 1/(k + x)³

ψ‴(x) = 6 Σ_{k=0}^{∞} 1/(k + x)⁴

The general formula for the nth derivative is ψ^[n](x) = (−1)^{n−1} n! Σ_{k=0}^{∞} 1/(k + x)^{n+1}, with recurrence formula:

ψ^[n](x + 1) = (−1)^{n−1} n! Σ_{k=0}^{∞} 1/(k + x + 1)^{n+1}

= (−1)^{n−1} n! [Σ_{k=0}^{∞} 1/(k + x)^{n+1} − 1/x^{n+1}]

= ψ^[n](x) + (−1)^n n!/x^{n+1}

Finally, ψ^[n](1) = (−1)^{n−1} n! Σ_{k=0}^{∞} 1/(k + 1)^{n+1} = (−1)^{n−1} n! Σ_{k=1}^{∞} 1/k^{n+1} = (−1)^{n−1} n! ζ(n + 1). A graph of the digamma through pentagamma functions appears in Wikipedia [Polygamma function].
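Truncating the series gives a direct check of these formulas. A Python sketch (illustrative; the truncation point is an assumption that a few hundred thousand terms suffice) reproduces ψ′(1) = ζ(2) = π²/6 and ψ″(1) = −2ζ(3):

```python
import math

def polygamma(n, x, terms=200000):
    """psi^[n](x) = (-1)**(n - 1) * n! * sum_{k>=0} 1/(k + x)**(n + 1),
    for n >= 1, truncated at `terms` summands."""
    s = sum(1.0 / (k + x) ** (n + 1) for k in range(terms))
    return (-1) ** (n - 1) * math.factorial(n) * s

trigamma_one = polygamma(1, 1.0)    # theory: zeta(2) = pi**2 / 6
tetragamma_one = polygamma(2, 1.0)  # theory: -2 * zeta(3)
zeta3 = 1.2020569                   # Apery's constant, to seven places
```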

Appendix B
Moments versus cumulants19

Actuaries and statisticians are well aware of the moment generating function (mgf) of random variable X:

M_X(t) = E[e^{tX}] = ∫ e^{tx} f_X(x) dx

Obviously, M_X(0) = E[1] = 1. It is called the moment generating function because its derivatives at zero, if convergent, equal the (non-central) moments of X, i.e.:

M_X^[n](0) = (d^n/dt^n) E[e^{tX}]|_{t=0} = E[(d^n e^{tX}/dt^n)|_{t=0}] = E[X^n e^{0⋅X}] = E[X^n]

The moment generating function for the normal random variable is M_{N(µ, σ²)}(t) = e^{µt + σ²t²/2}. The mgf of a sum of independent random variables equals the product of their mgfs:

M_{X+Y}(t) = E[e^{t(X+Y)}] = E[e^{tX} e^{tY}] = E[e^{tX}] E[e^{tY}] = M_X(t) M_Y(t)

The cumulant generating function (cgf) of X, or ψ_X(t), is the logarithm of the moment generating function. So ψ_X(0) = 0; and for independent summands:

ψ_{X+Y}(t) = ln M_{X+Y}(t) = ln M_X(t) M_Y(t) = ln M_X(t) + ln M_Y(t) = ψ_X(t) + ψ_Y(t)

The nth cumulant of X, or κ_n(X), is the nth derivative of the cgf evaluated at zero. The first two cumulants are identical to the mean and the variance:

κ_1(X) = d ln M_X(t)/dt |_{t=0} = M_X′(t)/M_X(t) |_{t=0} = E[X]

κ_2(X) = (d/dt)[M_X′(t)/M_X(t)] |_{t=0} = [M_X(t) M_X″(t) − M_X′(t) M_X′(t)]/M_X(t)² |_{t=0} = E[X²] − E[X]E[X] = Var[X]

One who performs the somewhat tedious third derivative will find that the third cumulant is identical to the skewness. So the first three cumulants are equal to the first three central moments. This amounts to a proof that the first three central moments are additive.

19 For moment generating functions see Hogg [1984, pp. 39f] and Klugman [1998, §3.9.1]. For cumulants and their generation see Daykin [1994, p. 23] and Halliwell [2011, §4].
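These properties can be illustrated numerically. For X ∼ Gamma(α, 1), the mgf is M_X(t) = (1 − t)^{−α} for t < 1 (a standard result, assumed here rather than derived), so the cgf is ψ_X(t) = −α ln(1 − t). Finite differences at t = 0 then recover κ_1 = E[X] = α and κ_2 = Var[X] = α, as this Python sketch shows:

```python
import math

alpha = 3.0

def cgf(t):
    """cgf of X ~ Gamma(alpha, 1): ln M(t) = -alpha * ln(1 - t), for t < 1."""
    return -alpha * math.log(1.0 - t)

h = 1e-3
# central differences for the first two derivatives of the cgf at t = 0
kappa1 = (cgf(h) - cgf(-h)) / (2 * h)                # theory: E[X] = alpha
kappa2 = (cgf(h) - 2 * cgf(0.0) + cgf(-h)) / h**2    # theory: Var[X] = alpha
```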


However, the fourth and higher cumulants do not equal their corresponding central moments. In fact, defining the central moments as µ_n = E[(X − µ)^n], the next three cumulant relations are:

κ_4 = µ_4 − 3µ_2² = µ_4 − 3σ⁴

κ_5 = µ_5 − 10µ_2µ_3

κ_6 = µ_6 − 15µ_2µ_4 − 10µ_3² + 30µ_2³

The cgf of a normal random variable is ψ_{N(µ, σ²)}(t) = ln e^{µt + σ²t²/2} = µt + σ²t²/2. Its first two cumulants equal µ and σ², and its higher cumulants are zero. Therefore, the third and higher cumulants are relative to the zero values of the corresponding normal cumulants. So third and higher cumulants could be called "excess of the normal," although this in practice is done only for the fourth cumulant. Because µ_4 = E[(X − µ)⁴] is often called the kurtosis, ambiguity is resolved by calling κ_4 the "excess kurtosis," or the kurtosis in excess of µ_4 = E[N(0, σ²)⁴] = E[N(0, 1)⁴] ⋅ σ⁴ = 3σ⁴.

Appendix C
Log-gamma convergence

In Appendix A.2 we proved that for real α > 0:

lnΓ(α) = ln ∫_{x=0}^{∞} e^{−x} x^{α−1} dx = −ln α + lim_{n→∞} [α ln n − Σ_{k=1}^{n} ln(1 + α/k)]

This presumes the convergence of the limit on the right side; otherwise, the equality would be limited or qualified. Nevertheless, it is valuable to examine the convergence of the log-gamma limit, as we will now do.

Consider the function f(x) = x − ln(1 + x). The two functions y = x and y = ln(1 + x) are tangent to each other at x = 0, so f(0) = f′(0) = 0. Otherwise, the line is above the logarithm and f(x ≠ 0) > 0. Since f″(x) = 1/(1 + x)² must be positive, f(x) is concave upward with a global minimum over (−1, ∞) at x = 0.

For now, let us investigate the convergence of φ(x) = Σ_{n=1}^{∞} a_n(x) = Σ_{n=1}^{∞} [x/n − ln(1 + x/n)]. Obviously, φ(x = 0) = 0. But if x ≠ 0, for some ξ ∈ (min(n, n + x), max(n, n + x)):

a_n(x) = x/n − ln(1 + x/n)

= x/n − ln[(n + x)/n]

= x/n − ∫_{u=n}^{n+x} du/u

= x/n − (1/ξ)[(n + x) − n]

= x/n − x/ξ

= x(1/n − 1/ξ)

Regardless of the sign of x:

|1/n − 1/ξ| < |1/n − 1/(n + x)| = |x|/[n(n + x)]

and n(n + x) must be positive, since x > −1. Therefore, for x ≠ 0:

0 < a_n(x) = x(1/n − 1/ξ) < x²/[n(n + x)]


Now absolute convergence, or the convergence of Σ_{n=1}^{∞} |a_n(x)|, is stricter than simple convergence. If Σ_{n=1}^{∞} |a_n(x)| converges, then so too does φ(x) = Σ_{n=1}^{∞} a_n(x). And Σ_{n=1}^{∞} |a_n(x)| converges, if it is bounded from above. The following string of inequalities establishes the upper bound:

Σ_{n=1}^{∞} |a_n(x)| ≤ Σ_{n=1}^{∞} x²/[n(n + x)]

≤ x²/[1(1 + x)] + Σ_{n=2}^{∞} x²/[n(n + x)]

≤ x²/(1 + x) + x² Σ_{n=2}^{∞} 1/(n − 1)²

= x²/(1 + x) + x² Σ_{n=1}^{∞} 1/n² = x² [1/(1 + x) + ζ(2)] = x² [1/(1 + x) + π²/6]

Thus have we demonstrated the convergence of φ(x) = Σ_{n=1}^{∞} a_n(x) = Σ_{n=1}^{∞} [x/n − ln(1 + x/n)]. As a special case:20

lim_{n→∞} Σ_{k=1}^{n} a_k(1) = lim_{n→∞} Σ_{k=1}^{n} [1/k − ln(1 + 1/k)]

= lim_{n→∞} Σ_{k=1}^{n} [1/k − ln((k + 1)/k)]

= lim_{n→∞} [Σ_{k=1}^{n} 1/k − ln(n + 1)]

= lim_{n→∞} [Σ_{k=1}^{n} 1/k − ln n] = γ ≈ 0.577

Returning to the limit in the log-gamma formula and replacing 'α' with 'x', we have:

lim_{n→∞} [x ln n − Σ_{k=1}^{n} ln(1 + x/k)]

= lim_{n→∞} {x ln n − Σ_{k=1}^{n} x/k + Σ_{k=1}^{n} [x/k − ln(1 + x/k)]}

= −x lim_{n→∞} {Σ_{k=1}^{n} 1/k − ln n} + lim_{n→∞} Σ_{k=1}^{n} [x/k − ln(1 + x/k)]

= −γx + Σ_{k=1}^{∞} a_k(x)

Thus, independently of the integral definition, we have proven for α > 0 the convergence of lnΓ(α) = −ln α + lim_{n→∞} [α ln n − Σ_{k=1}^{n} ln(1 + α/k)].21

This is but one example of a second-order convergence. To generalize, consider the convergence of φ(x) = lim_{n→∞} Σ_{k=1}^{n} f(x/k). As with the specific f(x) = x − ln(1 + x) above, the general function f must be analytic over an open interval about zero, a < 0 < b. Likewise, its value and first derivative at zero must be zero, i.e., f(0) = f′(0) = 0. According to the second-order mean value theorem, for x ∈ (a, b), there exists a real ξ between zero and x such that:

f(x) = f(0) + f′(0)x + f″(ξ)x²/2 = f″(ξ)x²/2

20 See also Appendix A.3 on the Euler-Mascheroni constant γ.

21 Actually, we proved the convergence of the second term, the one with the limit, for α > −1. It is the first term, the logarithm, that contracts the domain of lnΓ(α) to α > 0. As a complex function lnΓ(z) converges for all complex z ∉ {0, −1, −2, . . .}. However, ln z is multivalent, or unique only to an integral multiple of 2πi. That exponentiation removes this multivalence is the reason why e^{lnΓ(z)} = Γ(z) is analytic, except at z ∈ {0, −1, −2, . . .}. The infinite product Γ(z) = lim_{n→∞} e^{z ln n}/[z(1 + z/1) . . . (1 + z/n)] converges, unless some factor in the denominator equals zero, i.e., unless z ∈ {0, −1, −2, . . .}.
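Both routes to γ — the defining sequence Σ 1/k − ln n and the positive, 1/k²-bounded series Σ a_k(1) just derived — are easy to compare numerically. A Python sketch (illustrative only; the cutoff n = 100,000 is an arbitrary choice):

```python
import math

n = 100000

# the series of positive terms a_k(1) = 1/k - ln(1 + 1/k), each below 1/k**2
series = sum(1.0 / k - math.log(1.0 + 1.0 / k) for k in range(1, n + 1))

# the defining sequence: partial harmonic sum minus ln(n)
sequence = sum(1.0 / k for k in range(1, n + 1)) - math.log(n)

euler_gamma = 0.577216  # to six places
```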


Therefore, there exist ξ_k between zero and x/k such that:

φ(x) = lim_{n→∞} Σ_{k=1}^{n} f(x/k) = lim_{n→∞} Σ_{k=1}^{n} [f″(ξ_k)/2] (x/k)²

Again, we appeal to absolute convergence. If lim_{n→∞} Σ_{k=1}^{n} |[f″(ξ_k)/2] (x/k)²| = (x²/2) lim_{n→∞} Σ_{k=1}^{n} |f″(ξ_k)|/k² converges, then φ(x) converges. But every ξ_k belongs to the closed interval Ξ between 0 and x. And since f is analytic over (a, b) ⊃ Ξ, f″ is continuous over Ξ. Since a continuous function over a closed interval is bounded, 0 ≤ |f″(ξ_k)| ≤ M for some real M. So, at length:

0 ≤ lim_{n→∞} Σ_{k=1}^{n} |f″(ξ_k)|/k² ≤ lim_{n→∞} Σ_{k=1}^{n} M/k² = M lim_{n→∞} Σ_{k=1}^{n} 1/k² = Mζ(2) = Mπ²/6

Consequently, lim_{n→∞} Σ_{k=1}^{n} f(x/k) converges to a real number for every function f that is analytic over some domain about zero and for which f(0) = f′(0) = 0. The convergence must be at least of the second order; a first-order convergence would involve lim_{n→∞} Σ_{k=1}^{n} M/k and the divergence of the harmonic series. The same holds true for analytic functions over complex domains. Another potentially important second-order convergence is φ(z) = lim_{n→∞} Σ_{k=1}^{n} [e^{z/k} − (1 + z/k)].
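The last example can be probed numerically at z = 1. Its terms e^{1/k} − (1 + 1/k) are positive and of order 1/(2k²), so the partial sums settle at the second-order rate. The Python sketch below (illustrative; no closed-form value is claimed) confirms that doubling the number of terms moves the partial sum by only about 1/(2n):

```python
import math

def term(z, k):
    """k-th term of the second-order series sum_k (e**(z/k) - (1 + z/k))."""
    return math.exp(z / k) - (1.0 + z / k)

partial_20k = sum(term(1.0, k) for k in range(1, 20001))
partial_40k = sum(term(1.0, k) for k in range(1, 40001))
gap = partial_40k - partial_20k  # positive, roughly 1/40000 - 1/80000
```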
