
Method of L-moment estimation for the generalized lambda distribution

Paul J. van Staden and M.T. (Theodor) Loots

Department of Statistics, University of Pretoria, Pretoria, 0002, SOUTH AFRICA
www.up.ac.za/pauljvanstaden

Abstract

The generalized lambda distribution (GLD) is a flexible distribution for statistical modelling, but existing estimation methodologies for the GLD are computationally difficult, rendering the GLD impractical for many practitioners. We derive a parameterization of the GLD with closed-form expressions for the method of L-moment (MoLM) estimators. A numerical example involving the age of coronary heart disease subjects is presented.

Keywords: Generalized Pareto distribution, L-moment ratio diagram, skew-logistic distribution

Third Annual ASEARC Conference, December 7–8, 2009, Newcastle, Australia

1. Introduction

Although there is no consensus in the literature on the birth date of Tukey's lambda distribution, with [1] and [2] popular "guesstimates", the distribution has fathered various offspring, collectively referred to as generalized lambda distributions (GLDs). Among them, two parameterizations have found favour among researchers and practitioners. Both these parameterizations of the GLD are highly flexible with respect to distributional shape and are hence applied in diverse fields of research. A few recent applications include biochemistry [3], economics [4], forestry [5] and queuing theory [6]. Furthermore, since random variates for Monte Carlo simulation studies can easily be generated via the quantile function (QF) of the GLD, the GLD is often employed in such studies; see for instance [7–9].

Unfortunately, estimating the parameters of the GLD is not straightforward. Various estimation methodologies have been proposed in the literature, but all of them require numerical optimization techniques. We approach the problem from a different angle: rather than proposing a new estimation method, we derive an alternative parameterization of the GLD for which closed-form expressions for the method of L-moment (MoLM) estimators are available.

A brief discussion of the theory of L-moments is presented in Section 2. In Section 3 we consider the two parameterizations of the GLD which are currently used in statistical modelling. In Section 4 we derive our parameterization of the GLD and present some of its properties. Section 5 contains a numerical example and the paper concludes with Section 6.

2. The Quantile Function (QF) and L-Moments

The QF of the distribution of a continuous random variable, say $X$, is defined as

$$Q_X(p) = \text{the value of } x \text{ such that } F_X(x) = p, \tag{1}$$

where $0 \le p \le 1$ and $F_X(x)$ is the cumulative distribution function of $X$. The theory of L-moments was compiled by [10]. In terms of the QF, the $r$th L-moment is defined by

$$L_r = \int_0^1 Q_X(p)\, P^*_{r-1}(p)\, dp, \tag{2}$$

where

$$P^*_r(p) = \sum_{k=0}^{r} (-1)^{r-k} \binom{r}{k} \binom{r+k}{k} p^k \tag{3}$$

is the $r$th shifted Legendre polynomial. Note that, analogous to [11], we denote the $r$th L-moment by $L_r$ instead of $\lambda_r$, as is done for example in [10], to avoid confusion with the parameters of the GLD.
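As a numerical check on Equations (2) and (3), the following Python sketch (ours, not the authors') evaluates $L_r$ for any QF supplied as a function. The helper names `shifted_legendre` and `lmoment_from_qf` are hypothetical, and the integral is approximated with a composite midpoint rule. This choice is an assumption: it never evaluates the QF at $p = 0$ or $p = 1$, where the QF may be infinite. The standard exponential QF, $Q_X(p) = -\ln(1-p)$, is a convenient test case because its L-moments are known to be $L_1 = 1$, $L_2 = 1/2$, $L_3/L_2 = 1/3$ and $L_4/L_2 = 1/6$.

```python
from math import comb, log


def shifted_legendre(r, p):
    """Shifted Legendre polynomial P*_r(p), Equation (3)."""
    return sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * p ** k
               for k in range(r + 1))


def lmoment_from_qf(qf, r, n=20000):
    """Approximate L_r = integral over (0, 1) of Q(p) P*_{r-1}(p) dp, Equation (2).

    Composite midpoint rule with n cells: the QF is only evaluated at the
    interior points p = (i + 0.5) / n, so QFs that are infinite at p = 0
    or p = 1 can be used as long as the integral converges.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        p = (i + 0.5) * h
        total += qf(p) * shifted_legendre(r - 1, p)
    return total * h


if __name__ == "__main__":
    # Standard exponential QF, Q(p) = -ln(1 - p)
    L = [lmoment_from_qf(lambda p: -log(1.0 - p), r) for r in range(1, 5)]
    print("L1 =", L[0], "L2 =", L[1], "L3/L2 =", L[2] / L[1], "L4/L2 =", L[3] / L[1])
```

Because of the logarithmic singularity at $p = 1$, the midpoint rule converges only slowly for unbounded QFs. With the default 20 000 cells, the exponential values above are reproduced to well within 0.001.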

As proven by [10], if the mean of $X$, $\mu$, exists, then all the L-moments exist. $L_1 = \mu$ is a measure of location, while $L_2$ is a measure of spread. L-moment ratios are defined as

$$\tau_r = \frac{L_r}{L_2}, \qquad r = 3, 4, \ldots \tag{4}$$

The L-skewness ratio, $\tau_3$, and the L-kurtosis ratio, $\tau_4$, are measures of shape. L-moment ratios are bounded, which simplifies their interpretation. In particular,

$$-1 < \tau_3 < 1 \quad\text{and}\quad \tfrac{1}{4}\left(5\tau_3^2 - 1\right) \le \tau_4 < 1. \tag{5}$$

Let $x_{1:n} \le x_{2:n} \le \cdots \le x_{n:n}$ denote an ordered data set of size $n$. The $r$th sample L-moment is then given by

$$l_r = \binom{n}{r}^{-1} \sum_{1 \le i_1 < i_2 < \cdots < i_r \le n} \frac{1}{r} \sum_{k=0}^{r-1} (-1)^k \binom{r-1}{k} x_{i_{r-k}:n}, \qquad r = 1, 2, \ldots, n, \tag{6}$$

and the sample L-moment ratios by

$$t_r = \frac{l_r}{l_2}, \qquad r = 3, 4, \ldots, n. \tag{7}$$

3. The Generalized Lambda Distribution (GLD)

In order to provide an approximate method for generating symmetric and asymmetric random variables in Monte Carlo simulations, [12, 13] developed the Ramberg-Schmeiser (RS) parameterization of the GLD, defined by

$$Q_X(p) = \lambda_1 + \frac{p^{\lambda_3} - (1-p)^{\lambda_4}}{\lambda_2}, \tag{8}$$

where $\lambda_1$ is a location parameter, $\lambda_2$ is a spread parameter, and $\lambda_3$ and $\lambda_4$ are shape parameters. If $\lambda_3 = \lambda_4$, the GLD is symmetrical. See [14] for a detailed discussion of the parameter space and properties of the RS parameterization. The FMKL parameterization of the GLD, introduced by [15], has QF

$$Q_X(p) = \lambda_1 + \frac{1}{\lambda_2}\left(\frac{p^{\lambda_3} - 1}{\lambda_3} - \frac{(1-p)^{\lambda_4} - 1}{\lambda_4}\right). \tag{9}$$

To estimate the four parameters of the GLD, one can apply an estimation method that utilizes four measures, namely a measure of location, a measure of spread and two measures of shape. Method of moment estimation was developed by [16]. Percentile-based methods have been proposed by [17] and [4], while [18] considered the use of shape functionals. Recently [11] presented MoLM estimation.

With all the above-mentioned estimation techniques, the four chosen population measures are equated to the corresponding sample statistics, resulting in four equations with four unknowns which must be solved simultaneously. Since no closed-form expressions exist for the shape parameter estimators of either the RS or the FMKL parameterization, numerical optimization techniques must be used. The reason is that $\lambda_3$ and $\lambda_4$ jointly account for the skewness and the kurtosis of the GLD, irrespective of the shape measures used. For a detailed discussion of the computational difficulties in fitting the GLD to a data set, the reader is referred to [19].

4. An Alternative Parameterization for the GLD

The standard form (with location set equal to zero and spread set equal to one) of the QF of the generalized Pareto distribution (GPD) [20] is

$$Q_X(p) = \begin{cases} \dfrac{1 - (1-p)^{\lambda}}{\lambda}, & \lambda \ne 0, \\[1ex] -\ln(1-p), & \lambda = 0, \end{cases} \tag{10}$$

where $\lambda$ is a shape parameter. Applying the reflection rule in quantile model building to the GPD (see [21] for this rule) gives the QF of the reflected generalized Pareto distribution (R-GPD),

$$Q_X(p) = \begin{cases} \dfrac{p^{\lambda} - 1}{\lambda}, & \lambda \ne 0, \\[1ex] \ln p, & \lambda = 0. \end{cases} \tag{11}$$

Consider the special case $\lambda = 0$ in Equations (10) and (11), which corresponds to the QFs of the exponential (EXP) and reflected exponential (R-EXP) distributions respectively. The QF of the skew-logistic (S-LOG) distribution,

$$Q_X(p) = (1-\delta)\ln p + \delta\left[-\ln(1-p)\right], \tag{12}$$

is obtained by taking a weighted sum of the QFs of the R-EXP and EXP distributions, with $0 \le \delta \le 1$ a weight parameter. The QF in Equation (12) reduces to the QFs of the R-EXP, symmetric logistic (LOG) and EXP distributions for $\delta = 0$, $\delta = \tfrac{1}{2}$ and $\delta = 1$ respectively.

Taking the weighted sum of the QFs of the R-GPD and the GPD, with equal (but not necessarily zero) values of $\lambda$, yields the QF of our proposed parameterization of the GLD. Adding location and spread parameters, say $\alpha$ and $\beta > 0$, in the usual way (see again [21] for details) produces the four-parameter QF, given by

$$Q_X(p) = \alpha + \beta\left[(1-\delta)\left(\frac{p^{\lambda} - 1}{\lambda}\right) - \delta\left(\frac{(1-p)^{\lambda} - 1}{\lambda}\right)\right]. \tag{13}$$
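Because Equation (13) is an explicit QF, random variates can be generated by inversion, which is the property that makes GLDs attractive for the Monte Carlo studies mentioned in Section 1. The sketch below is illustrative only: `gld_qf` and `gld_sample` are hypothetical names, it restricts $p$ to $0 < p < 1$ to avoid infinite endpoints, and it treats $\lambda = 0$ as the limit $(p^{\lambda} - 1)/\lambda \to \ln p$, which yields a location-spread version of the S-LOG QF in Equation (12).

```python
import math
import random


def gld_qf(p, alpha, beta, delta, lam):
    """QF of Equation (13) for 0 < p < 1.

    lam == 0 is treated as the limiting case, giving
    alpha + beta * [(1 - delta) ln p - delta ln(1 - p)], cf. Equation (12).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if beta <= 0 or not 0.0 <= delta <= 1.0:
        raise ValueError("need beta > 0 and 0 <= delta <= 1")
    if lam == 0:
        r_gpd = math.log(p)        # R-GPD part, Equation (11) with lam = 0
        gpd = -math.log1p(-p)      # GPD part, Equation (10) with lam = 0
    else:
        r_gpd = (p ** lam - 1.0) / lam
        gpd = (1.0 - (1.0 - p) ** lam) / lam
    return alpha + beta * ((1.0 - delta) * r_gpd + delta * gpd)


def gld_sample(n, alpha, beta, delta, lam, seed=None):
    """Draw n variates by inverse transform: X = Q(U) with U ~ Uniform(0, 1)."""
    rng = random.Random(seed)
    out = []
    while len(out) < n:
        u = rng.random()           # lies in [0, 1); skip the single value 0.0
        if u > 0.0:
            out.append(gld_qf(u, alpha, beta, delta, lam))
    return out
```

For example, `gld_sample(1000, 44.768, 28.964, 0.489, 0.627, seed=1)` simulates from GLD 1 of Table 2; since $0 < \delta < 1$ and $\lambda > 0$ there, every draw lies in the bounded support listed in Table 1.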

Table 1 summarizes the support of this distribution. Apart from the special cases already mentioned above, the QF in Equation (13) simplifies to the QF of the uniform (UNIF) distribution for $\lambda = 1$, and also for $\delta = \tfrac{1}{2}$ and $\lambda = 2$.
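Both uniform special cases follow directly from Equation (13). The short derivation below is added for illustration; in each case $Q_X(p)$ is linear in $p$, and the resulting intervals agree with the bounded supports in Table 1.

```latex
\lambda = 1:\quad
Q_X(p) = \alpha + \beta\left[(1-\delta)(p-1) - \delta\bigl((1-p)-1\bigr)\right]
       = \alpha + \beta\,(p - 1 + \delta),
\qquad \text{UNIF on } [\alpha-(1-\delta)\beta,\ \alpha+\delta\beta];

\delta = \tfrac{1}{2},\ \lambda = 2:\quad
Q_X(p) = \alpha + \frac{\beta}{4}\left[(p^2-1) - \bigl((1-p)^2-1\bigr)\right]
       = \alpha + \frac{\beta}{4}\,(2p-1),
\qquad \text{UNIF on } [\alpha-\beta/4,\ \alpha+\beta/4].
```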

Table 1. The support of the GLD with QF as given in Equation (13).

| $\delta$ | $\lambda$ | Support |
|---|---|---|
| $\delta = 0$ | $\lambda \le 0$ | $(-\infty, \alpha]$ |
| $\delta = 0$ | $\lambda > 0$ | $[\alpha - \beta/\lambda,\ \alpha]$ |
| $0 < \delta < 1$ | $\lambda \le 0$ | $(-\infty, \infty)$ |
| $0 < \delta < 1$ | $\lambda > 0$ | $[\alpha - (1-\delta)\beta/\lambda,\ \alpha + \delta\beta/\lambda]$ |
| $\delta = 1$ | $\lambda \le 0$ | $[\alpha, \infty)$ |
| $\delta = 1$ | $\lambda > 0$ | $[\alpha,\ \alpha + \beta/\lambda]$ |

All L-moments exist for $\lambda > -1$, and are given by

$$L_1 = \alpha - \frac{\beta(1-2\delta)}{\lambda+1}, \tag{14}$$

$$L_2 = \frac{\beta}{(\lambda+1)(\lambda+2)}, \tag{15}$$

$$L_r = \frac{\beta(1-2\delta)^s \prod_{i=1}^{r-2}(\lambda - i)}{\prod_{i=1}^{r}(\lambda + i)}, \qquad r = 3, 4, \ldots, \tag{16}$$

where $s = 1$ for $r$ odd and $s = 0$ for $r$ even. The L-skewness and L-kurtosis ratios are given by

$$\tau_3 = \frac{(\lambda-1)(1-2\delta)}{\lambda+3} \quad\text{and}\quad \tau_4 = \frac{(\lambda-1)(\lambda-2)}{(\lambda+3)(\lambda+4)}, \tag{17}$$

and are represented graphically in the L-moment ratio diagram in Figure 1. As with the RS and FMKL parameterizations of the GLD, there is not always a one-to-one relation between the parameter values and the L-moments. In the dark grey region in Figure 1, two distinct pairs of values for $\delta$ and $\lambda$ give the same pair of L-skewness and L-kurtosis ratios, while a one-to-one relation exists in the light grey regions.

Figure 1. L-moment ratio diagram for the GLD with QF as given in Equation (13).

It follows from Equation (17) that, irrespective of the value of $\lambda$, if $\delta = \tfrac{1}{2}$, then $\tau_3 = 0$ and the GLD is symmetrical. The L-kurtosis ratio, $\tau_4$, depends only on the value of $\lambda$, as do all $\tau_{2r}$ for $r \ge 2$. The minimum value of $\tau_4$ is obtained for $\lambda = \sqrt{6} - 1$.

MoLM estimation is applied to a data set by equating $L_1, L_2, \tau_3, \tau_4$ to $l_1, l_2, t_3, t_4$ and solving for the unknown parameters in the system of four equations. The advantage of our parameterization over the RS and FMKL parameterizations is that the MoLM estimates can be calculated sequentially using

$$\hat\lambda = \frac{3 + 7t_4 \pm \sqrt{t_4^2 + 98t_4 + 1}}{2(1 - t_4)}, \tag{18}$$

$$\hat\delta = \begin{cases} \dfrac{1}{2}\left[1 - t_3\,\dfrac{\hat\lambda + 3}{\hat\lambda - 1}\right], & \hat\lambda \ne 1, \\[1ex] \dfrac{1}{2}, & \hat\lambda = 1, \end{cases} \tag{19}$$

$$\hat\beta = l_2(\hat\lambda + 1)(\hat\lambda + 2), \tag{20}$$

$$\hat\alpha = l_1 + \frac{\hat\beta(1 - 2\hat\delta)}{\hat\lambda + 1}. \tag{21}$$

5. A Numerical Example

The coronary heart disease (CHD) data set in [22] contains the age in years of 100 subjects, assumed to have a LOG distribution in a logistic regression framework. A histogram of the data is given in Figure 2 and suggests that the data are approximately symmetric, but with tails shorter than those of the LOG distribution. These visual deductions are confirmed by the sample L-moments, given in Table 2. Note that, since $\lambda = 0$ for the LOG distribution, $\tau_4 = \tfrac{1}{6} = 0.1667$ from Equation (17), whereas $t_4 = 0.0305$ for the data.

The generalized secant hyperbolic (GSH) distribution and a short-tailed symmetric (STS) distribution were fitted to the data by [23] and [24] respectively. We fitted our parameterization of the GLD to the data set. Since $(t_3, t_4)$ lies in the dark grey region of Figure 1, we obtained two GLDs, denoted GLD 1 and GLD 2. Table 2 presents their MoLM parameter estimates, while their density curves are plotted in Figure 2. Both fitted GLDs have bounded support, which agrees well with the data set.
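The sequential structure of Equations (18)–(21) makes MoLM fitting a few lines of code. The Python sketch below is a hedged illustration rather than the authors' implementation: all function names are hypothetical, and the screening of roots (keeping only $\hat\lambda > -1$ and $0 \le \hat\delta \le 1$) is our assumption about which solutions are admissible. Equation (18) arises from rewriting $\tau_4$ in Equation (17) as the quadratic $(1 - t_4)\lambda^2 - (3 + 7t_4)\lambda + (2 - 12t_4) = 0$, whose discriminant $t_4^2 + 98t_4 + 1$ is negative exactly when $t_4$ lies below the minimum $\tau_4$ attained at $\lambda = \sqrt{6} - 1$. Sample L-moments are computed through probability-weighted moments, an equivalent form of Equation (6) that is standard in the L-moment literature.

```python
import math
from math import comb


def sample_lmoments(data):
    """Return (l1, l2, t3, t4) for a data set, cf. Equations (6) and (7).

    Uses the unbiased probability-weighted moments b_k combined with the
    coefficients of Equation (3), which avoids summing over all r-subsets.
    """
    x = sorted(data)
    n = len(x)
    if n < 4:
        raise ValueError("need at least 4 observations")
    # b_k = n^{-1} sum_{j=1}^{n} C(j-1, k) / C(n-1, k) x_{j:n}; enumerate gives j-1
    b = [sum(comb(j, k) * xj for j, xj in enumerate(x)) / (n * comb(n - 1, k))
         for k in range(4)]
    # l_{r+1} = sum_{k=0}^{r} (-1)^{r-k} C(r, k) C(r+k, k) b_k
    l = [sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * b[k]
             for k in range(r + 1))
         for r in range(4)]
    return l[0], l[1], l[2] / l[1], l[3] / l[1]


def lmoment_ratios(delta, lam):
    """L-skewness and L-kurtosis ratios (tau3, tau4) of Equation (17)."""
    tau3 = (lam - 1) * (1 - 2 * delta) / (lam + 3)
    tau4 = (lam - 1) * (lam - 2) / ((lam + 3) * (lam + 4))
    return tau3, tau4


def molm_estimates(l1, l2, t3, t4):
    """Sequential MoLM estimates, Equations (18)-(21).

    Returns a list of (alpha, beta, delta, lam) tuples sorted by lam.
    Assumption: a root is kept only if lam > -1 (so the L-moments exist)
    and 0 <= delta <= 1 (the admissible weight range).
    """
    if not t4 < 1:
        raise ValueError("t4 must be below 1")
    disc = t4 * t4 + 98 * t4 + 1
    if disc < 0:  # t4 is below the minimum tau4, attained at lam = sqrt(6) - 1
        return []
    root = math.sqrt(disc)
    fits = []
    for lam in sorted({(3 + 7 * t4 - root) / (2 * (1 - t4)),
                       (3 + 7 * t4 + root) / (2 * (1 - t4))}):  # Equation (18)
        if lam <= -1:
            continue
        # Equation (19)
        delta = 0.5 if lam == 1 else 0.5 * (1 - t3 * (lam + 3) / (lam - 1))
        if not 0 <= delta <= 1:
            continue
        beta = l2 * (lam + 1) * (lam + 2)                # Equation (20)
        alpha = l1 + beta * (1 - 2 * delta) / (lam + 1)  # Equation (21)
        fits.append((alpha, beta, delta, lam))
    return fits


if __name__ == "__main__":
    # Sample L-moments of the CHD ages as printed in Table 2
    for alpha, beta, delta, lam in molm_estimates(44.380, 6.777, -0.00224, 0.0305):
        print(f"alpha={alpha:.3f}  beta={beta:.3f}  delta={delta:.3f}  lambda={lam:.3f}")
```

With the rounded inputs from Table 2, this should return two solutions close to the GLD 1 and GLD 2 rows. Small discrepancies in the last digits (most visibly in $\hat\beta$ for GLD 2) are to be expected, presumably because the printed sample L-moments are rounded. Given the raw ages, `sample_lmoments` would supply the inputs directly.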

Table 2. Sample L-moments and MoLM estimates of the fitted GLDs for the age (in years) of 100 CHD subjects.

Sample L-moments:

| $l_1$ | $l_2$ | $t_3$ | $t_4$ |
|---|---|---|---|
| 44.380 | 6.777 | −0.00224 | 0.0305 |

MoLM estimates:

| Fitted GLD | $\hat\alpha$ | $\hat\beta$ | $\hat\delta$ | $\hat\lambda$ |
|---|---|---|---|---|
| GLD 1 | 44.768 | 28.964 | 0.489 | 0.627 |
| GLD 2 | 44.140 | 117.177 | 0.504 | 2.688 |

Figure 2. Histogram of the age (in years) of 100 CHD subjects with fitted density curves overlaid.

6. Conclusion

Using the GPD as a building block, we have derived a parameterization of the GLD which can easily be fitted to a data set using MoLM estimation. In future research we will be focusing on … for this new member of the Tukey lambda family.

References

[1] C. Hastings Jr., F. Mosteller, J.W. Tukey, C.P. Winsor, "Low moments for small samples: a comparative study of order statistics", The Annals of Mathematical Statistics, 18 (3), 413–426, 1947.
[2] J.W. Tukey, "The practical relationship between the common transformations of percentages of counts and of amounts", Technical Report 36, Statistical Techniques Research Group, Princeton University, 1960.
[3] A. Ramos-Fernández, A. Paradela, R. Navajas, J.P. Albar, "Generalized method for probability-based peptide and protein identification from tandem mass spectrometry data and sequence database searching", Molecular & Cellular Proteomics, 7 (9), 1748–1754, 2008.
[4] H.N. Haridas, N.U. Nair, K.R.M. Nair, "Modelling income using the generalised lambda distribution", Journal of Income Distribution, 17 (2), 37–51, 2008.
[5] M. Ivković, P. Rozenberg, "A method for describing and modelling of within-ring wood density distribution in clones of three coniferous species", Annals of Forest Science, 61 (8), 759–769, 2004.
[6] L.W. Robinson, R.R. Chen, "Scheduling doctors' appointments: optimal and empirically-based heuristic policies", IIE Transactions, 35 (3), 295–307, 2003.
[7] F. Bautista, E. Gómez, "Una exploración de robustez de tres pruebas: dos de permutación y la de Mann-Whitney", Revista Colombiana de Estadística, 30 (2), 177–185, 2007.
[8] R. Cao, G. Lugosi, "Goodness-of-fit tests based on the kernel density estimator", Scandinavian Journal of Statistics, 32 (4), 599–616, 2005.
[9] O. Thas, J.C.W. Rayner, D.J. Best, "Tests for symmetry based on the one-sample Wilcoxon signed rank statistic", Communications in Statistics: Simulation and Computation, 34 (4), 957–973, 2005.
[10] J.R.M. Hosking, "L-moments: analysis and estimation of distributions using linear combinations of order statistics", Journal of the Royal Statistical Society: Series B (Methodological), 52 (1), 105–124, 1990.
[11] J. Karvanen, A. Nuutinen, "Characterizing the generalized lambda distribution by L-moments", Computational Statistics & Data Analysis, 52 (4), 1971–1983, 2008.
[12] J.S. Ramberg, B.W. Schmeiser, "An approximate method for generating symmetric random variables", Communications of the Association for Computing Machinery, 15 (11), 987–990, 1972.
[13] J.S. Ramberg, B.W. Schmeiser, "An approximate method for generating asymmetric random variables", Communications of the Association for Computing Machinery, 17 (2), 78–82, 1974.
[14] Z.A. Karian, E.J. Dudewicz, Fitting Statistical Distributions: The Generalized Lambda Distribution and Generalized Bootstrap Methods, Chapman and Hall/CRC Press, Boca Raton, Florida, 2000.
[15] M. Freimer, G.S. Mudholkar, G. Kollia, C.T. Lin, "A study of the generalized Tukey lambda family", Communications in Statistics: Theory and Methods, 17 (10), 3547–3567, 1988.
[16] J.S. Ramberg, P.R. Tadikamalla, E.J. Dudewicz, E.F. Mykytka, "A probability distribution and its uses in fitting data", Technometrics, 21 (2), 201–214, 1979.
[17] Z.A. Karian, E.J. Dudewicz, "Fitting the generalized lambda distribution to data: a method based on percentiles", Communications in Statistics: Simulation and Computation, 28 (3), 793–819, 1999.
[18] R. King, H. MacGillivray, "Fitting the generalized lambda distribution with location and scale-free shape functionals", American Journal of Mathematical and Management Sciences, 27 (3–4), 441–460, 2007.
[19] Z.A. Karian, E.J. Dudewicz, "Computational issues in fitting statistical distributions to data", American Journal of Mathematical and Management Sciences, 27 (3–4), 319–349, 2007.
[20] J.R.M. Hosking, J.R. Wallis, "Parameter and quantile estimation for the generalized Pareto distribution", Technometrics, 29 (3), 339–349, 1987.
[21] W. Gilchrist, Statistical Modelling with Quantile Functions, Chapman and Hall/CRC Press, Boca Raton, Florida, 2000.
[22] D.W. Hosmer, S. Lemeshow, Applied Logistic Regression, 2nd edition, John Wiley & Sons, Inc., New York, 2000.
[23] D.C. Vaughan, "The generalized secant hyperbolic distribution and its properties", Communications in Statistics: Theory and Methods, 31 (2), 219–238, 2002.
[24] A.D. Akkaya, M.L. Tiku, "Short-tailed distributions and inliers", Test, 17 (2), 282–296, 2008.
