Method of L-moment estimation for the generalized lambda distribution
Paul J. van Staden and M.T. (Theodor) Loots
Department of Statistics, University of Pretoria, Pretoria, 0002, SOUTH AFRICA; www.up.ac.za/pauljvanstaden
Abstract
The generalized lambda distribution (GLD) is a flexible distribution for statistical modelling, but existing estimation methodologies for the GLD are computationally difficult, rendering the GLD impractical for many practitioners. We derive a parameterization of the GLD with closed-form expressions for the method of L-moment estimators. A numerical example involving the age of coronary heart disease subjects is presented.
Keywords: Generalized Pareto distribution, L-moment ratio diagram, quantile function, skew-logistic distribution
Third Annual ASEARC Conference, December 7–8, 2009, Newcastle, Australia

1. Introduction
Although there is no consensus in the literature on the birth date of Tukey’s lambda distribution, with [1] and [2] popular “guesstimates”, the distribution has fathered various offspring, collectively referred to as generalized lambda distributions (GLDs). Among them, two parameterizations have found favour among statisticians and practitioners. Both parameterizations are highly flexible with respect to distributional shape and are hence applied in diverse fields of research. A few recent applications include biochemistry [3], economics [4], forestry [5] and queuing theory [6]. Furthermore, since random variates for Monte Carlo simulation studies can easily be generated via the quantile function (QF) of the GLD, the GLD is often employed in such studies; see for instance [7-9].

Unfortunately, estimating the parameters of the GLD is not straightforward. Various estimation methodologies have been proposed in the literature, but all of them require numerical optimization techniques. We approach the problem from a different angle: rather than proposing a new estimation method, we derive an alternative parameterization of the GLD for which closed-form expressions for the method of L-moment (MoLM) estimators are available.

A brief discussion of the theory of L-moments is presented in Section 2. In Section 3 we consider the two parameterizations of the GLD which are currently used in statistical modelling. In Section 4 we derive our parameterization of the GLD and present some of its properties. Section 5 contains a numerical example and the paper concludes with Section 6.

2. The Quantile Function (QF) and L-Moments
The QF of the probability distribution of a continuous stochastic variable, say X, is defined as
$$Q_X(p) = \text{the value of } x \text{ such that } F_X(x) = p, \tag{1}$$
where $0 \le p \le 1$ and $F_X(x)$ is the cumulative distribution function of X. The theory of L-moments was compiled by [10]. In terms of the QF, the rth L-moment is defined by
$$L_r = \int_0^1 Q_X(p)\, P^*_{r-1}(p)\, dp, \tag{2}$$
where
$$P^*_r(p) = \sum_{k=0}^{r} (-1)^{r-k} \binom{r}{k} \binom{r+k}{k} p^k \tag{3}$$
is the rth shifted Legendre polynomial. Note that, analogous to [11], we denote the rth L-moment by $L_r$ instead of $\lambda_r$, as is done for example in [10], to avoid confusion with the parameters of the GLD. As proven by [10], if the mean of X, µ, exists, then all the L-moments exist. $L_1 = \mu$ is a measure of location, while $L_2$ is a measure of spread. L-moment ratios are defined as
$$\tau_r = \frac{L_r}{L_2}, \quad r = 3, 4, \ldots \tag{4}$$
The L-skewness ratio, $\tau_3$, and the L-kurtosis ratio, $\tau_4$, are measures of shape. L-moment ratios are bounded, which simplifies their interpretation. In particular,
$$-1 < \tau_3 < 1 \quad\text{and}\quad \tfrac{1}{4}\left(5\tau_3^2 - 1\right) \le \tau_4 < 1. \tag{5}$$
Let $x_{1:n} \le x_{2:n} \le \cdots \le x_{n:n}$ denote an ordered data set of size n. The rth sample L-moment is then given by
$$l_r = \binom{n}{r}^{-1} \sum_{1 \le i_1 < i_2 < \cdots < i_r \le n} \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k}\, x_{i_{r-k}:n}, \tag{6}$$
and the sample L-moment ratios by
$$t_r = \frac{l_r}{l_2}, \quad r = 3, 4, \ldots, n. \tag{7}$$

3. The Generalized Lambda Distribution (GLD)
In order to provide an algorithm for generating symmetric and asymmetric random variables in Monte Carlo simulations, [12, 13] developed the Ramberg-Schmeiser (RS) parameterization of the GLD, defined by
$$Q_X(p) = \lambda_1 + \frac{p^{\lambda_3} - (1-p)^{\lambda_4}}{\lambda_2}, \tag{8}$$
where $\lambda_1$ is a location parameter, $\lambda_2$ is a spread parameter and $\lambda_3$ and $\lambda_4$ are shape parameters. If $\lambda_3 = \lambda_4$, the GLD is symmetrical. See [14] for a detailed discussion of the parameter space and properties of the RS parameterization. The FMKL parameterization of the GLD, introduced by [15], has QF
$$Q_X(p) = \lambda_1 + \frac{1}{\lambda_2}\left[\frac{p^{\lambda_3} - 1}{\lambda_3} - \frac{(1-p)^{\lambda_4} - 1}{\lambda_4}\right]. \tag{9}$$
To estimate the four parameters of the GLD, one can apply an estimation method that uses four measures, namely a measure of location, a measure of spread and two measures of shape. Method of moment estimation was developed by [16]. Percentile-based methods have been proposed by [17] and [4], while [18] considered the use of shape functionals. Recently [11] presented MoLM estimation.

With all the above-mentioned estimation techniques, the four chosen population measures are equated to the corresponding sample statistics, resulting in four equations with four unknowns which must be solved simultaneously. Since no closed-form expressions exist for the shape estimators of either the RS or the FMKL parameterization, numerical optimization techniques must be used. The reason is that $\lambda_3$ and $\lambda_4$ jointly account for the skewness and the kurtosis of the GLD, irrespective of the shape measures used. For a detailed discussion of the computational difficulties in fitting the GLD to a data set, the reader is referred to [19].

4. An Alternative Parameterization for the GLD
The standard form (with spread parameter set equal to one) of the QF of the generalized Pareto distribution (GPD) is
$$Q_X(p) = \begin{cases} \dfrac{1 - (1-p)^{\lambda}}{\lambda}, & \lambda \ne 0,\\[4pt] -\ln(1-p), & \lambda = 0, \end{cases} \tag{10}$$
where λ is a shape parameter. Applying the reflection rule in quantile model building to the GPD (see [21] for this rule) gives the QF of the reflected generalized Pareto distribution (R-GPD),
$$Q_X(p) = \begin{cases} \dfrac{p^{\lambda} - 1}{\lambda}, & \lambda \ne 0,\\[4pt] \ln p, & \lambda = 0. \end{cases} \tag{11}$$
Consider the special case λ = 0 in Equations (10) and (11), which corresponds to the QFs of the exponential (EXP) and reflected exponential (R-EXP) distributions respectively. The QF of the skew-logistic (S-LOG) distribution,
$$Q_X(p) = (1-\delta)\left[\ln p\right] + \delta\left[-\ln(1-p)\right], \tag{12}$$
is obtained by taking a weighted sum of the QFs of the R-EXP and EXP distributions, with $0 \le \delta \le 1$ a weight parameter. The QF in Equation (12) reduces to the QFs of the R-EXP, symmetric logistic (LOG) and EXP distributions for δ = 0, δ = 1/2 and δ = 1 respectively.

Taking the weighted sum of the QFs of the R-GPD and the GPD, with equal (but not necessarily zero) values of λ, yields the QF of our proposed parameterization of the GLD. Adding location and spread parameters, say α and β > 0, in the usual way (see again [21] for details) produces the four-parameter QF
$$Q_X(p) = \alpha + \beta\left[(1-\delta)\frac{p^{\lambda} - 1}{\lambda} - \delta\frac{(1-p)^{\lambda} - 1}{\lambda}\right], \tag{13}$$
where, for λ = 0, the bracketed term is taken as its limit, the S-LOG QF in Equation (12). Table 1 summarizes the support (distributional range) of this distribution. Apart from the special cases already mentioned above, the QF in Equation (13) simplifies to the QF of the uniform (UNIF) distribution for λ = 1, and also for δ = 1/2 and λ = 2.

Table 1. The support of the GLD with QF as given in Equation (13).
δ | λ | Support
δ = 0 | λ ≤ 0 | (−∞, α]
δ = 0 | λ > 0 | [α − β/λ, α]
0 < δ < 1 | λ ≤ 0 | (−∞, ∞)
0 < δ < 1 | λ > 0 | [α − (1 − δ)β/λ, α + δβ/λ]
δ = 1 | λ ≤ 0 | [α, ∞)
δ = 1 | λ > 0 | [α, α + β/λ]

All L-moments exist for λ > −1, and are given by
$$L_1 = \alpha - \frac{\beta(1-2\delta)}{\lambda+1}, \tag{14}$$
$$L_2 = \frac{\beta}{(\lambda+1)(\lambda+2)}, \tag{15}$$
$$L_r = \frac{\beta(1-2\delta)^s \prod_{i=1}^{r-2}(\lambda - i)}{\prod_{i=1}^{r}(\lambda + i)}, \quad r = 3, 4, \ldots, \tag{16}$$
where s = 1 for r odd and s = 0 for r even. The L-skewness and L-kurtosis ratios are given by
$$\tau_3 = \frac{(\lambda-1)(1-2\delta)}{\lambda+3} \quad\text{and}\quad \tau_4 = \frac{(\lambda-1)(\lambda-2)}{(\lambda+3)(\lambda+4)}, \tag{17}$$
and are represented graphically in the L-moment ratio diagram in Figure 1. As with the RS and FMKL parameterizations of the GLD, there is not always a one-to-one relation between the parameter values and the L-moments. In the dark grey region in Figure 1, two distinct pairs of values for δ and λ give the same pair of L-skewness and L-kurtosis ratios, while a one-to-one relation exists in the light grey regions.

It follows from Equation (17) that, irrespective of the value of λ, if δ = 1/2, then $\tau_3 = 0$ and the GLD is symmetrical. The L-kurtosis ratio, $\tau_4$, depends only on the value of λ, as do all $\tau_{2r}$ for r ≥ 2. The minimum value of $\tau_4$ is obtained for $\lambda = \sqrt{6} - 1$.

Figure 1. L-moment ratio diagram for the GLD with QF as given in Equation (13).

MoLM estimation is applied to a data set by equating $L_1, L_2, \tau_3, \tau_4$ to $l_1, l_2, t_3, t_4$ and solving for the unknown parameters in the system of four equations. The advantage of our parameterization over the RS and FMKL parameterizations is that the MoLM estimates can be calculated sequentially using
$$\hat\lambda = \frac{3 + 7t_4 \pm \sqrt{t_4^2 + 98t_4 + 1}}{2(1 - t_4)}, \tag{18}$$
$$\hat\delta = \begin{cases} \dfrac{1}{2}\left(1 - \dfrac{t_3(\hat\lambda + 3)}{\hat\lambda - 1}\right), & \hat\lambda \ne 1,\\[4pt] \dfrac{1}{2}, & \hat\lambda = 1, \end{cases} \tag{19}$$
$$\hat\beta = l_2(\hat\lambda + 1)(\hat\lambda + 2), \tag{20}$$
$$\hat\alpha = l_1 + \frac{\hat\beta(1 - 2\hat\delta)}{\hat\lambda + 1}. \tag{21}$$

5. A Numerical Example
The coronary heart disease (CHD) data set in [22] contains the age in years of 100 subjects, assumed to have a LOG distribution in a logistic regression framework. A histogram of the data is given in Figure 2 and suggests that the data are approximately symmetric, but have shorter tails than the LOG distribution. These visual deductions are confirmed by the sample L-moments given in Table 2: the sample L-skewness $t_3 = -0.00224$ is close to zero, and, since λ = 0 for the LOG distribution, Equation (17) gives $\tau_4 = 1/6 \approx 0.1667$, whereas $t_4 = 0.0305$ for the data.

The generalized secant hyperbolic (GSH) distribution and a short-tailed symmetric (STS) distribution were fitted to the data by [23] and [24] respectively. We fitted our parameterization of the GLD to the data set. Since $(t_3, t_4)$ lies in the dark grey region of Figure 1, we obtained two GLDs, denoted GLD 1 and GLD 2. Table 2 presents their MoLM parameter estimates, while their density curves are plotted in Figure 2. Both fitted GLDs have bounded support, which agrees well with the data set.

Figure 2. Histogram of the age (in years) of 100 CHD subjects with fitted density curves overlaid.

Table 2. Sample L-moments and MoLM estimates of the fitted GLDs for the age (in years) of 100 CHD subjects.
L-moments | $l_1$ | $l_2$ | $t_3$ | $t_4$
Sample | 44.380 | 6.777 | −0.00224 | 0.0305
MoLM estimates | α̂ | β̂ | δ̂ | λ̂
GLD 1 | 44.768 | 28.964 | 0.489 | 0.627
GLD 2 | 44.140 | 117.177 | 0.504 | 2.688
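Evaluating Equation (6) literally means averaging over every r-subset of the sample, which for n = 100 and r = 4 is about 3.9 million subsets. A standard equivalent route (see [10]) combines unbiased probability-weighted moments $b_0, \ldots, b_3$ with the shifted Legendre coefficients of Equation (3). The Python sketch below is our own illustration, not code from this paper; the function name `sample_lmoments` is hypothetical. It returns $l_1, l_2, t_3, t_4$, which should match Equations (6) and (7) up to floating-point error.

```python
from math import comb


def sample_lmoments(data):
    """Return (l1, l2, t3, t4) for a sample of at least four values.

    Sketch only: uses unbiased probability-weighted moments
        b_r = n^{-1} * sum_j C(j-1, r) / C(n-1, r) * x_{j:n},
    combined with the shifted Legendre coefficients of Equation (3);
    this gives the same l_r as the subset average in Equation (6).
    """
    x = sorted(float(v) for v in data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least four observations")
    # b_0..b_3; j is the 1-based rank, and comb(j - 1, r) is 0 when j - 1 < r.
    b = [sum(comb(j - 1, r) * x[j - 1] for j in range(1, n + 1)) / (n * comb(n - 1, r))
         for r in range(4)]
    # l_{r+1} = sum_k p*_{r,k} b_k with p*_{r,k} taken from Equation (3).
    l1 = b[0]
    l2 = 2 * b[1] - b[0]
    l3 = 6 * b[2] - 6 * b[1] + b[0]
    l4 = 20 * b[3] - 30 * b[2] + 12 * b[1] - b[0]
    # Sample L-moment ratios, Equation (7).
    return l1, l2, l3 / l2, l4 / l2
```

Applied to the 100 CHD ages of [22], this should reproduce the sample row of Table 2. The data themselves are not reproduced here.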
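To make Equations (13) to (16) concrete, the following Python sketch evaluates the QF of Equation (13) and the closed-form L-moments of Equations (14) to (16). It is our own illustration and the names `gld_quantile` and `gld_lmoments` are hypothetical. The λ = 0 branch uses the S-LOG limit of Equation (12), and the L-moment routine refuses λ ≤ −1, where the L-moments do not exist. A useful sanity check is to compare its output with a direct numerical evaluation of Equation (2).

```python
import math


def gld_quantile(p, alpha, beta, delta, lam):
    """QF of Equation (13) for 0 < p < 1.

    lam == 0 uses the limiting S-LOG form of Equation (12).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if lam == 0:
        core = (1 - delta) * math.log(p) - delta * math.log1p(-p)
    else:
        core = ((1 - delta) * (p**lam - 1) - delta * ((1 - p)**lam - 1)) / lam
    return alpha + beta * core


def gld_lmoments(alpha, beta, delta, lam, rmax=4):
    """Closed-form L_1..L_rmax from Equations (14)-(16); requires lam > -1."""
    if lam <= -1:
        raise ValueError("the L-moments exist only for lam > -1")
    lmom = [alpha - beta * (1 - 2 * delta) / (lam + 1),   # Equation (14)
            beta / ((lam + 1) * (lam + 2))]              # Equation (15)
    for r in range(3, rmax + 1):                         # Equation (16)
        s = r % 2                                        # s = 1 for odd r, 0 for even r
        num = math.prod(lam - i for i in range(1, r - 1))
        den = math.prod(lam + i for i in range(1, r + 1))
        lmom.append(beta * (1 - 2 * delta) ** s * num / den)
    return lmom
```

Because Equation (13) is an explicit QF, the same `gld_quantile` function can also generate random variates by inverse transform sampling, evaluated at uniform draws strictly inside (0, 1).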
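For readers who want to check Equation (18) and the location of the minimum of $\tau_4$, the working below is our own algebra from Equation (17), not reproduced from the paper. It rearranges the L-kurtosis expression into a quadratic in λ and differentiates it:

$$
\begin{aligned}
\tau_4(\lambda+3)(\lambda+4) = (\lambda-1)(\lambda-2)
&\iff (1-\tau_4)\lambda^2 - (3+7\tau_4)\lambda + (2-12\tau_4) = 0,\\
(3+7\tau_4)^2 - 4(1-\tau_4)(2-12\tau_4) &= \tau_4^2 + 98\tau_4 + 1,\\
\frac{d\tau_4}{d\lambda} = \frac{10(\lambda^2 + 2\lambda - 5)}{(\lambda+3)^2(\lambda+4)^2} = 0
&\iff \lambda = \sqrt{6} - 1 \quad (\text{taking } \lambda > -1),\\
\tau_4\big|_{\lambda=\sqrt{6}-1} = \frac{12 - 5\sqrt{6}}{12 + 5\sqrt{6}} &= 20\sqrt{6} - 49 \approx -0.0102.
\end{aligned}
$$

Solving the quadratic with $t_4$ in place of $\tau_4$ gives Equation (18). The discriminant $t_4^2 + 98t_4 + 1$ vanishes at $t_4 = -49 \pm 20\sqrt{6}$, so it is negative, and no real $\hat\lambda$ exists, exactly when $t_4$ falls below the minimum $20\sqrt{6} - 49$ (for any practically relevant $t_4 > -1$). Since $\tau_4 \to 1$ as $\lambda \to -1$ or $\lambda \to \infty$, the attainable L-kurtosis range of the family is $20\sqrt{6} - 49 \le \tau_4 < 1$.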
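The sequential estimators in Equations (18) to (21) need no optimizer, which is the practical point of the parameterization. The Python sketch below is our own illustration and the name `molm_fit` is hypothetical. It returns every admissible root of Equation (18), smaller λ̂ first. Keeping only roots with λ̂ > −1 (so that the L-moments exist) and 0 ≤ δ̂ ≤ 1 (so that δ̂ is a valid weight) is our assumption about admissibility, which we believe corresponds to the dark grey (two solutions) and light grey (one solution) regions of Figure 1.

```python
import math


def molm_fit(l1, l2, t3, t4):
    """Sequential MoLM estimates from Equations (18)-(21).

    Returns a list of (alpha, beta, delta, lam) tuples, one per admissible
    root of Equation (18), smaller lam first. Admissibility rule (our
    assumption): keep a root only if lam > -1 and 0 <= delta <= 1.
    """
    if not l2 > 0:
        raise ValueError("l2 must be positive")
    if not t4 < 1:
        raise ValueError("t4 must be below 1")
    disc = t4 * t4 + 98 * t4 + 1
    if disc < 0:
        return []  # t4 is below the smallest L-kurtosis the family can reach
    signs = (1,) if disc == 0 else (-1, 1)
    fits = []
    for sign in signs:
        lam = (3 + 7 * t4 + sign * math.sqrt(disc)) / (2 * (1 - t4))  # Eq. (18)
        if lam <= -1:
            continue
        if math.isclose(lam, 1.0):
            delta = 0.5                                               # Eq. (19), lam = 1
        else:
            delta = 0.5 * (1 - t3 * (lam + 3) / (lam - 1))            # Eq. (19)
        if not 0 <= delta <= 1:
            continue
        beta = l2 * (lam + 1) * (lam + 2)                             # Eq. (20)
        alpha = l1 + beta * (1 - 2 * delta) / (lam + 1)               # Eq. (21)
        fits.append((alpha, beta, delta, lam))
    return fits
```

With the rounded sample L-moments of Table 2, `molm_fit(44.380, 6.777, -0.00224, 0.0305)` returns two parameter vectors. Their α̂, δ̂ and λ̂ agree with GLD 1 and GLD 2 to within about 0.001. β̂ differs only in the fourth significant figure, because $l_2$ and $t_4$ are reported to just a few decimals.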
6. Conclusion
Using the GPD as a building block, we have derived a parameterization of the GLD which can easily be fitted to a data set using MoLM estimation. In future research we will focus on statistical inference for this new member of the Tukey lambda family.

References
[1] C. Hastings Jr., F. Mosteller, J.W. Tukey, C.P. Winsor, “Low moments for small samples: a comparative study of order statistics”, The Annals of Mathematical Statistics, 18 (3), 413–426, 1947.
[2] J.W. Tukey, “The practical relationship between the common transformations of percentages of counts and of amounts”, Technical Report 36, Statistical Techniques Research Group, Princeton University, 1960.
[3] A. Ramos-Fernández, A. Paradela, R. Navajas, J.P. Albar, “Generalized method for probability-based peptide and protein identification from tandem mass spectrometry data and sequence database searching”, Molecular & Cellular Proteomics, 7 (9), 1748–1754, 2008.
[4] H.N. Haridas, N.U. Nair, K.R.M. Nair, “Modelling income using the generalised lambda distribution”, Journal of Income Distribution, 17 (2), 37–51, 2008.
[5] M. Ivković, P. Rozenberg, “A method for describing and modelling of within-ring wood density distribution in clones of three coniferous species”, Annals of Forest Science, 61 (8), 759–769, 2004.
[6] L.W. Robinson, R.R. Chen, “Scheduling doctors’ appointments: optimal and empirically-based heuristic policies”, IIE Transactions, 35 (3), 295–307, 2003.
[7] F. Bautista, E. Gómez, “Una exploración de robustez de tres pruebas: dos de permutación y la de Mann-Whitney”, Revista Colombiana de Estadística, 30 (2), 177–185, 2007.
[8] R. Cao, G. Lugosi, “Goodness-of-fit tests based on the kernel density estimator”, Scandinavian Journal of Statistics, 32 (4), 599–616, 2005.
[9] O. Thas, J.C.W. Rayner, D.J. Best, “Tests for symmetry based on the one-sample Wilcoxon signed rank statistic”, Communications in Statistics: Simulation and Computation, 34 (4), 957–973, 2005.
[10] J.R.M. Hosking, “L-moments: analysis and estimation of distributions using linear combinations of order statistics”, Journal of the Royal Statistical Society: Series B (Methodological), 52 (1), 105–124, 1990.
[11] J. Karvanen, A. Nuutinen, “Characterizing the generalized lambda distribution by L-moments”, Computational Statistics & Data Analysis, 52 (4), 1971–1983, 2008.
[12] J.S. Ramberg, B.W. Schmeiser, “An approximate method for generating symmetric random variables”, Communications of the Association for Computing Machinery, 15 (11), 987–990, 1972.
[13] J.S. Ramberg, B.W. Schmeiser, “An approximate method for generating asymmetric random variables”, Communications of the Association for Computing Machinery, 17 (2), 78–82, 1974.
[14] Z.A. Karian, E.J. Dudewicz, Fitting Statistical Distributions: The Generalized Lambda Distribution and Generalized Bootstrap Methods, Chapman and Hall / CRC Press, Boca Raton, Florida, 2000.
[15] M. Freimer, G.S. Mudholkar, G. Kollia, C.T. Lin, “A study of the generalized Tukey lambda family”, Communications in Statistics: Theory and Methods, 17 (10), 3547–3567, 1988.
[16] J.S. Ramberg, P.R. Tadikamalla, E.J. Dudewicz, E.F. Mykytka, “A probability distribution and its uses in fitting data”, Technometrics, 21 (2), 201–214, 1979.
[17] Z.A. Karian, E.J. Dudewicz, “Fitting the generalized lambda distribution to data: a method based on percentiles”, Communications in Statistics: Simulation and Computation, 28 (3), 793–819, 1999.
[18] R. King, H. MacGillivray, “Fitting the generalized lambda distribution with location and scale-free shape functionals”, American Journal of Mathematical and Management Sciences, 27 (3–4), 441–460, 2007.
[19] Z.A. Karian, E.J. Dudewicz, “Computational issues in fitting statistical distributions to data”, American Journal of Mathematical and Management Sciences, 27 (3–4), 319–349, 2007.
[20] J.R.M. Hosking, J.R. Wallis, “Parameter and quantile estimation for the generalized Pareto distribution”, Technometrics, 29 (3), 339–349, 1987.
[21] W. Gilchrist, Statistical Modelling with Quantile Functions, Chapman and Hall / CRC Press, Boca Raton, Florida, 2000.
[22] D.W. Hosmer, S. Lemeshow, Applied Logistic Regression, 2nd edition, John Wiley & Sons, Inc., New York, 2000.
[23] D.C. Vaughan, “The generalized secant hyperbolic distribution and its properties”, Communications in Statistics: Theory and Methods, 31 (2), 219–238, 2002.
[24] A.D. Akkaya, M.L. Tiku, “Short-tailed distributions and inliers”, Test, 17 (2), 282–296, 2008.