
International Journal of Statistics and Probability; Vol. 8, No. 5; September 2019 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education

On Generalized Gamma Distribution and Its Application to Survival Data

Kiche J1, Oscar Ngesa2 & George Orwa3 1 Mathematics Department, Pan African University, Institute of Basic Sciences, Technology, and Innovation, Kenya 2 Mathematics and Informatics Department, Taita-Taveta University, Kenya 3 Statistics and Actuarial Sciences Department, Jomo Kenyatta University of Agriculture and Technology, Kenya Correspondence: Kiche J, Mathematics Department, Pan African University, Institute of Basic Sciences, Technology, and Innovation, Kenya. E-mail: [email protected]

Received: July 7, 2019 Accepted: August 27, 2019 Online Published: August 30, 2019 doi:10.5539/ijsp.v8n5p85 URL: https://doi.org/10.5539/ijsp.v8n5p85

Abstract
The generalized gamma distribution is a continuous probability distribution with three parameters. It is a generalization of the two-parameter gamma distribution. Since many distributions commonly used for parametric models in survival analysis (such as the exponential distribution, the Weibull distribution and the gamma distribution) are special cases of the generalized gamma, it is sometimes used to determine which parametric model is appropriate for a given set of data. The generalized gamma distribution is also one of the distributions used in frailty modeling. In this study, it is shown that the generalized gamma distribution has three sub-families, and its application to the analysis of a survival data set is explored. A parametric modeling approach has been carried out to find the expected results.
Keywords: generalized gamma, sub-families, Kullback-Leibler divergence
1. Introduction
The early generalization of the gamma distribution can be traced back to Amoroso (1925), who discussed a generalized gamma distribution and applied it to fit income rates. Johnson et al. gave a four-parameter generalized gamma distribution which reduces to the generalized gamma distribution defined by Stacy (1962) when the location parameter is set to zero. The generalized gamma defined by Stacy (1962) is a three-parameter exponentiated gamma distribution. Mudholkar and Srivastava (1993) introduced the exponentiated method to derive a distribution. Agarwal and Al-Saleh (2001) applied the generalized gamma to study hazard rates. Balakrishnan and Peng (2006) applied this distribution to develop a generalized gamma frailty model. Nadarajah and Gupta (2007) proposed another type of generalized gamma distribution with application to fitting drought data. Cordeiro et al. (2012) derived another generalization of Stacy's generalized gamma distribution using the exponentiated method, and applied it to lifetime and survival analysis.
The generalized gamma distribution presents a flexible family in the varieties of shapes and hazard functions for modeling duration. Distributions that are used in duration analysis in economics include the exponential, log-normal, gamma, and Weibull. The generalized gamma family, which encompasses the exponential, gamma, and Weibull as subfamilies, and the log-normal as a limiting distribution, has been used in economics by Jaggia, Yamaguchi, and Allenby et al. Some authors have argued that the flexibility of the generalized gamma makes it suitable for duration analysis, while others have advocated the use of simpler models because of estimation difficulties caused by the complexity of the generalized gamma parameter structure. Obviously, there would be no need to endure the costs associated with the application of a complex generalized gamma model if the data do not discriminate between the generalized gamma and members of its subfamilies, or if the fit of a simpler model to the data is as good as that for the complex generalized gamma. Estimation difficulties reported by Hager and Bain inhibited applications of the generalized gamma model. Prentice resolved the convergence problem using a nonlinear transformation of the generalized gamma model. Maximum likelihood estimation of the parameters, and quasi maximum likelihood estimators for its subfamily (the two-parameter gamma distribution), can be found in the literature. Hwang, T. et al. introduced a new moment estimation of the parameters of the generalized gamma distribution using its characterization.
In information theory, thus far a maximum entropy (ME) derivation of the generalized gamma is found in Kapur, where it is referred to as the generalized Weibull distribution, and the entropy of GG has appeared in the context of flexible families of distributions. Some concepts of this family in information theory have been introduced by Dadpay et al.
2. Parametric Distributions and Their Properties
2.1 Log-Normal Distribution
A log-normal distribution is a continuous probability distribution of a random variable whose logarithm is normally distributed. If the random variable X is log-normally distributed, then Y = ln(X) has a normal distribution. Likewise, if Y has a normal distribution, then the exponential function of Y, X = exp(Y), has a log-normal distribution. A random variable which is log-normally distributed takes only positive real values. A log-normal process is the statistical realization of the multiplicative product of many independent random variables, each of which is positive. This is justified by considering the central limit theorem in the log domain. The log-normal distribution is the maximum entropy probability distribution for a random variate X for which the mean and variance of ln(X) are specified. A positive random variable X is log-normally distributed if the logarithm of X is normally distributed, ln(X) ∼ N(µ, σ²). The probability density function of the log-normal distribution is given by:

fX(x) = (1/(xσ√(2π))) exp(−(ln x − µ)²/(2σ²))  (1)

The cumulative distribution function is given as
FX(x) = Φ((ln x − µ)/σ)  (2)
which may also be expressed as:
FX(x) = (1/2)[1 + erf((ln x − µ)/(σ√2))]  (3)
where erf is the error function. An approximating formula for the characteristic function φ(t) can be given as

φ(t) ≈ exp(−(ω(u)² + 2ω(u))/(2σ²)) / √(1 + ω(u)),  u = −itσ²e^µ  (4)
where ω is the Lambert W function. For a log-normal random variable, the partial expectation is given by

g(k) = e^(µ+σ²/2) Φ((µ + σ² − ln k)/σ)  (5)

The quantile function for the log-normal distribution is given by
exp(µ + √2 σ erf⁻¹(2F − 1))  (6)
while the variance is given as
[exp(σ²) − 1] exp(2µ + σ²)  (7)

The skewness is given by the formula
(e^(σ²) + 2)√(e^(σ²) − 1)  (8)

The excess kurtosis is then given by
exp(4σ²) + 2 exp(3σ²) + 3 exp(2σ²) − 6  (9)

The support is over x ∈ (0, +∞), the mean is exp(µ + σ²/2), the mode is exp(µ − σ²), and the entropy is given as log2(σ e^(µ+1/2) √(2π)).
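To make the formulas above concrete, the following sketch (an illustrative addition, assuming NumPy and SciPy are available) cross-checks the variance (7) and skewness (8) against scipy.stats.lognorm, which parameterizes the distribution with s = σ and scale = e^µ.

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.5, 0.75
# scipy's lognorm uses s = sigma and scale = exp(mu)
X = lognorm(s=sigma, scale=np.exp(mu))

mean = np.exp(mu + sigma**2 / 2)                               # closed-form mean
var = (np.exp(sigma**2) - 1) * np.exp(2*mu + sigma**2)         # equation (7)
skew = (np.exp(sigma**2) + 2) * np.sqrt(np.exp(sigma**2) - 1)  # equation (8)

assert np.isclose(mean, X.mean())
assert np.isclose(var, X.var())
assert np.isclose(skew, float(X.stats(moments='s')))
```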


2.2 Weibull Distribution
The Weibull distribution is a continuous probability distribution. It is named after the Swedish mathematician Waloddi Weibull, who described it in detail in 1951, although it was first identified by Frechet (1927) and first applied by Rosin and Rammler (1933) to describe a particle size distribution. The Weibull distribution is one of the most widely used lifetime distributions in reliability engineering. It is a versatile distribution that can take on the characteristics of other types of distributions, based on the value of the shape parameter. The probability density function of a Weibull random variable is
f(x; ξ, ν) = (ν/ξ)(x/ξ)^(ν−1) e^(−(x/ξ)^ν),  x ≥ 0  (10)
where ν > 0 is the shape parameter and ξ > 0 is the scale parameter of the distribution. The cumulative distribution function for the Weibull distribution is
F(x; ξ, ν) = 1 − e^(−(x/ξ)^ν)  (11)
The failure rate h (or hazard function) is given by
h(x; ξ, ν) = (ν/ξ)(x/ξ)^(ν−1)  (12)
The quantile (inverse cumulative distribution) function for the Weibull distribution is

Q(p; ξ, ν) = ξ(−ln(1 − p))^(1/ν)  (13)
The moment generating function of the logarithm of a Weibull distributed random variable is given by
E[e^(t log X)] = ξ^t Γ(t/ν + 1)  (14)
The mean and variance of a Weibull random variable can be expressed respectively as

E(X) = ξ Γ(1 + 1/ν)  (15)
and
Var(X) = ξ²[Γ(1 + 2/ν) − (Γ(1 + 1/ν))²]  (16)
The skewness is given by
γ1 = (Γ(1 + 3/ν)ξ³ − 3µσ² − µ³)/σ³  (17)
where the mean is denoted by µ and the standard deviation is denoted by σ. The excess kurtosis is given by
γ2 = (−6Γ1⁴ + 12Γ1²Γ2 − 3Γ2² − 4Γ1Γ3 + Γ4)/[Γ2 − Γ1²]²  (18)
where Γi = Γ(1 + i/ν). The information entropy is given by

H(ξ, ν) = γ(1 − 1/ν) + ln(ξ/ν) + 1  (19)
where γ is the Euler-Mascheroni constant. The Kullback-Leibler divergence between two Weibull distributions is given by
DKL(Weib1 ∥ Weib2) = log(ν1/ξ1^ν1) − log(ν2/ξ2^ν2) + (ν1 − ν2)[log ξ1 − γ/ν1] + (ξ1/ξ2)^ν2 Γ(ν2/ν1 + 1) − 1  (20)
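The divergence in equation (20) is easy to check numerically. The sketch below (an illustrative addition, assuming NumPy and SciPy) implements the closed form and compares it against a Monte Carlo estimate of E_f1[ln f1(X) − ln f2(X)].

```python
import numpy as np
from scipy.special import gamma as G

def kl_weibull(nu1, xi1, nu2, xi2):
    """Closed-form Kullback-Leibler divergence DKL(Weib1 || Weib2), equation (20)."""
    return (np.log(nu1 / xi1**nu1) - np.log(nu2 / xi2**nu2)
            + (nu1 - nu2) * (np.log(xi1) - np.euler_gamma / nu1)
            + (xi1 / xi2)**nu2 * G(nu2 / nu1 + 1) - 1)

def logpdf(x, nu, xi):
    # log of the Weibull density, equation (10)
    return np.log(nu / xi) + (nu - 1) * np.log(x / xi) - (x / xi)**nu

rng = np.random.default_rng(0)
nu1, xi1, nu2, xi2 = 1.5, 2.0, 2.5, 1.0
x = xi1 * rng.weibull(nu1, size=200_000)  # sample from Weib1
mc = np.mean(logpdf(x, nu1, xi1) - logpdf(x, nu2, xi2))
print(kl_weibull(nu1, xi1, nu2, xi2), mc)  # the two values should agree closely
```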

The variances and covariances of ν̂ and ξ̂ are estimated from the inverse of the local Fisher information matrix:
( Var(ν̂), Cov(ν̂, ξ̂) ; Cov(ν̂, ξ̂), Var(ξ̂) ) = ( −∂²Λ/∂ν², −∂²Λ/∂ν∂ξ ; −∂²Λ/∂ν∂ξ, −∂²Λ/∂ξ² )^(−1), evaluated at ν = ν̂, ξ = ξ̂  (21)


2.3 Exponential Distribution
The exponential model, with only one unknown parameter, is the simplest of all life distribution models. The exponential distribution is one of the most widely used continuous distributions. It is often used to model the time elapsed between events; that is, the exponential distribution is often concerned with the amount of time until some specific event occurs. For example, the amount of time (beginning now) until an earthquake occurs has an exponential distribution. Other examples include the length, in minutes, of long distance business telephone calls, and the amount of time, in months, a car battery lasts. An interesting property of the exponential distribution is that it can be viewed as a continuous analogue of the geometric distribution. The probability density function of the exponential distribution is given by

f (t, θ) = θe−θt, t ≥ 0 (22)

The cumulative distribution function is thus denoted by

F(t) = 1 − e−θt (23)

The reliability and failure rate of the exponential distribution are given respectively as R(t) = e^(−θt) and h(t) = θ, while the mean and variance are given by mean = 1/θ and variance = 1/θ². Note that the failure rate reduces to the constant θ for any time; the exponential distribution is the only distribution to have a constant failure rate. Also, another name for the exponential mean is the Mean Time To Failure or MTTF, and we have MTTF = 1/θ. The Kullback-Leibler divergence is given by
DKL(exp0 ∥ exp1) = log(θ0) − log(θ1) + θ1/θ0 − 1  (24)
The most important property is that the exponential distribution is memoryless. Specifically, the memoryless property says that an exponentially distributed random variable T obeys the relation:

Pr(T > a + t | T > a) = Pr(T > t), ∀a, t ≥ 0  (25)

Proof

Pr(T > a + t | T > a) = Pr(T > a + t ∩ T > a)/Pr(T > a)  (26)
= Pr(T > a + t)/Pr(T > a)  (27)
= e^(−θ(a+t))/e^(−θa)  (28)
= e^(−θt)  (29)
= Pr(T > t)  (30)
hence the proof. The memoryless property says that knowledge of what has occurred in the past has no effect on future probabilities.
2.4 Gamma Distribution
The gamma distribution is another widely used distribution. Its importance is largely due to its relation to the exponential and normal distributions. The gamma distribution is a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. The gamma distribution is used to model errors in multi-level Poisson regression models, because the combination of the Poisson distribution and a gamma distribution is a negative binomial distribution. The gamma distribution is widely used as a conjugate prior in Bayesian statistics. It is the conjugate prior for the precision (i.e. inverse of the variance) of a normal distribution. It is also the conjugate prior for the exponential distribution. The gamma distribution can be parameterized in terms of a shape parameter φ = κ and an inverse scale parameter β = 1/θ, called a rate parameter. A random variable X that is gamma-distributed with shape φ and rate β is denoted by X ∼ Γ(φ, β) ≡ Gamma(φ, β). The corresponding probability density function in the shape-rate parametrization is
f(x; φ, β) = β^φ x^(φ−1) e^(−βx)/Γ(φ)  for x > 0 and φ, β > 0  (31)
or, in the shape-scale parametrization,
f(x; φ, θ) = x^(φ−1) e^(−x/θ)/(θ^φ Γ(φ))  for x > 0 and φ, θ > 0
where Γ(φ) is the gamma function. Both parameterizations are common because either can be more convenient depending on the situation. The cumulative distribution function is the regularized gamma function:
F(x; φ, β) = ∫0^x f(u; φ, β) du = γ(φ, βx)/Γ(φ)  (32)
where γ(φ, βx) is the lower incomplete gamma function. If φ is a positive integer (i.e., the distribution is an Erlang distribution), the cumulative distribution function has the following series expansion:
F(x; φ, β) = 1 − e^(−βx) Σ_{i=0}^{φ−1} (βx)^i/i! = e^(−βx) Σ_{i=φ}^{∞} (βx)^i/i!  (33)
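The Erlang series in equation (33) can be verified directly against a library implementation of the gamma CDF. A minimal sketch (assuming SciPy; note that scipy.stats.gamma uses the shape-scale form, so scale = 1/β):

```python
from math import exp, factorial
from scipy.stats import gamma

phi, beta, x = 4, 2.0, 1.3  # integer shape, so this is an Erlang distribution
# Left-hand series of equation (33)
series = 1 - exp(-beta * x) * sum((beta * x)**i / factorial(i) for i in range(phi))
cdf = gamma.cdf(x, a=phi, scale=1 / beta)  # regularized gamma function, equation (32)
print(series, cdf)  # the two values agree
```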

The Kullback-Leibler divergence (KL-divergence) of Γ(φp, βp) (the "true" distribution) from Γ(φq, βq) (the "approximating" distribution) is given by
DKL(φp, βp; φq, βq) = (φp − φq)ψ(φp) − log Γ(φp) + log Γ(φq) + φq(log βp − log βq) + φp (βq − βp)/βp  (34)
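As an illustration (an added sketch, assuming SciPy), equation (34) translates directly into code; gammaln and digamma supply log Γ and ψ:

```python
import numpy as np
from scipy.special import gammaln, digamma

def kl_gamma(phi_p, beta_p, phi_q, beta_q):
    """DKL of Gamma(phi_p, beta_p) from Gamma(phi_q, beta_q), equation (34)."""
    return ((phi_p - phi_q) * digamma(phi_p)
            - gammaln(phi_p) + gammaln(phi_q)
            + phi_q * (np.log(beta_p) - np.log(beta_q))
            + phi_p * (beta_q - beta_p) / beta_p)

print(kl_gamma(2.0, 1.0, 2.0, 1.0))  # 0.0 for identical distributions
print(kl_gamma(3.0, 2.0, 2.0, 1.5))  # strictly positive for distinct parameters
```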

Written using the shape-scale (k, θ) parameterization, the KL-divergence of Γ(kp, θp) from Γ(kq, θq) is given by
DKL(kp, θp; kq, θq) = (kp − kq)ψ(kp) − log Γ(kp) + log Γ(kq) + kq(log θq − log θp) + kp (θp − θq)/θq  (35)
The Laplace transform of the gamma probability density function is
F(s) = (1 + θs)^(−φ) = β^φ/(s + β)^φ  (36)

Parameterization 1: If Xk ∼ Γ(φk, θk) are independent, then (φ2θ2 X1)/(φ1θ1 X2) ∼ F(2φ1, 2φ2), or equivalently, X1/X2 ∼ β′(φ1, φ2, 1, θ1/θ2).

Parameterization 2: If Xk ∼ Γ(φk, βk) are independent, then (φ2β1 X1)/(φ1β2 X2) ∼ F(2φ1, 2φ2), or equivalently, X1/X2 ∼ β′(φ1, φ2, 1, β2/β1).

If X ∼ Γ(φ, θ) and Y ∼ Γ(β, θ) are independently distributed, then X/(X + Y) has a beta distribution with parameters φ and β, and X/(X + Y) is independent of X + Y, which is Γ(φ + β, θ)-distributed.
3. Methods
3.1 Generalized Gamma Distribution
The generalized gamma has three parameters, given in this case as κ > 0, ϕ > 0, and β > 0. For non-negative x, the probability density function of the generalized gamma is:
f(x; κ, ϕ, β) = (β/κ^ϕ) x^(ϕ−1) e^(−(x/κ)^β)/Γ(ϕ/β)  (37)
where Γ(·) denotes the gamma function.
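Equation (37) corresponds to scipy.stats.gengamma with a = ϕ/β, c = β and scale = κ, which the following sketch (an illustrative addition, assuming SciPy) confirms pointwise:

```python
import numpy as np
from scipy.special import gamma as G
from scipy.stats import gengamma

def gg_pdf(x, kappa, phi, beta):
    """Generalized gamma density, equation (37)."""
    return (beta / kappa**phi) * x**(phi - 1) * np.exp(-(x / kappa)**beta) / G(phi / beta)

x = np.linspace(0.1, 5, 50)
kappa, phi, beta = 2.0, 1.5, 0.8
assert np.allclose(gg_pdf(x, kappa, phi, beta),
                   gengamma.pdf(x, a=phi / beta, c=beta, scale=kappa))
```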


3.2 The Cumulative Distribution Function
The cumulative distribution function (CDF) of a random variable X, denoted by F(x), is defined as F(x) = Pr(X ≤ x). Using the identity for the probability of disjoint events, if X is a discrete random variable, then
F(x) = Σ_{k=1}^{n} Pr(X = xk)  (38)
where xn gives the largest possible value of X that is less than or equal to x. The cumulative distribution function for a random variable at x gives the probability that the random variable X is less than or equal to that number x. Note that in the formula for CDFs of discrete random variables, we always have n ≤ N, where N is the number of possible outcomes of X. This function, CDF(x), simply gives the probability of measuring any value up to and including x. The cumulative distribution function ("c.d.f.") of a continuous random variable X is defined as:
F(x) = ∫_{−∞}^{x} f(t) dt  for −∞ < x < ∞  (39)

3.3 Properties of the CDF
• 0 ≤ F(x) ≤ 1, ∀x
• lim_{x→−∞} F(x) = 0
• lim_{x→∞} F(x) = 1
• F(x) is a non-decreasing function of x.
A function f(x) is said to be non-decreasing if f(x1) ≤ f(x2) whenever x1 < x2.

The cumulative distribution function of the generalized gamma distribution is given by:
F(x; κ, ϕ, β) = γ(ϕ/β, (x/κ)^β)/Γ(ϕ/β)  (40)
where γ(·) denotes the lower incomplete gamma function. The GG is a three-parameter (β, σ > 0, k) family whose survival function is given as
S_GG(t) = 1 − Γ[k^(−2) exp(k[log(t) − β]/σ); k^(−2)]  (41)
which may equivalently be written as
S_GG(t) = 1 − Γ[k^(−2) (e^(−β) t)^(k/σ); k^(−2)]  (42)
where Γ[·; k^(−2)] denotes the cumulative distribution function of the gamma distribution with shape k^(−2).
3.4 Kullback-Leibler Divergence
The Kullback-Leibler divergence (also called relative entropy) is a measure of how one probability distribution diverges from a second, expected probability distribution. Applications include characterizing the relative (Shannon) entropy in information systems, randomness in continuous time-series, and information gain when comparing statistical models of inference. In contrast to variation of information, it is a distribution-wise asymmetric measure and thus does not qualify as a statistical metric of spread. In the simple case, a Kullback-Leibler divergence of 0 indicates that we can expect similar, if not the same, behavior of two different distributions, while a Kullback-Leibler divergence of 1 indicates that the two distributions behave in such a different manner that the expectation given the first distribution approaches zero.

If f1 and f2 are the probability density functions of two generalized gamma distributions, then their Kullback-Leibler divergence is given by:

DKL(f1 ∥ f2) = ∫0^∞ f1(x; κ1, ϕ1, β1) ln[f1(x; κ1, ϕ1, β1)/f2(x; κ2, ϕ2, β2)] dx  (43)

= ln[β1 κ2^ϕ2 Γ(ϕ2/β2) / (β2 κ1^ϕ1 Γ(ϕ1/β1))] + [ψ(ϕ1/β1)/β1 + ln κ1](ϕ1 − ϕ2) + [Γ((ϕ1 + β2)/β1)/Γ(ϕ1/β1)](κ1/κ2)^β2 − ϕ1/β1  (44)
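A numerical sanity check of equation (44) is straightforward: compute the closed form and compare it with direct numerical integration of equation (43). The sketch below is an illustrative addition assuming NumPy and SciPy.

```python
import numpy as np
from scipy import integrate
from scipy.special import gammaln, digamma

def gg_logpdf(x, kappa, phi, beta):
    # log of equation (37)
    return (np.log(beta) - phi * np.log(kappa) + (phi - 1) * np.log(x)
            - (x / kappa)**beta - gammaln(phi / beta))

def kl_gg(k1, p1, b1, k2, p2, b2):
    """Closed form of equation (44): DKL(f1 || f2) for two generalized gammas."""
    return (np.log(b1) + p2 * np.log(k2) + gammaln(p2 / b2)
            - np.log(b2) - p1 * np.log(k1) - gammaln(p1 / b1)
            + (digamma(p1 / b1) / b1 + np.log(k1)) * (p1 - p2)
            + np.exp(gammaln((p1 + b2) / b1) - gammaln(p1 / b1)) * (k1 / k2)**b2
            - p1 / b1)

th1, th2 = (1.0, 2.0, 1.5), (1.3, 1.8, 1.2)  # (kappa, phi, beta) for f1 and f2
num, _ = integrate.quad(lambda x: np.exp(gg_logpdf(x, *th1))
                        * (gg_logpdf(x, *th1) - gg_logpdf(x, *th2)), 0, np.inf)
print(kl_gg(*th1, *th2), num)  # closed form and numerical integral should agree
```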


3.5 Moments of Generalized Gamma Distribution
If X has a generalized gamma distribution as above, then
E(X^r) = κ^r Γ((ϕ + r)/β)/Γ(ϕ/β)  (45)
The mean of a generalized gamma distribution is given by

κ Γ((ϕ + 1)/β)/Γ(ϕ/β)  (46)

The mode of a generalized gamma distribution is given by

κ((ϕ − 1)/β)^(1/β)  for ϕ > 1, otherwise 0  (47)

The variance of the generalized gamma distribution is given by

κ²(Γ((ϕ + 2)/β)/Γ(ϕ/β) − (Γ((ϕ + 1)/β)/Γ(ϕ/β))²)  (48)

While the entropy is given by

ln(κ Γ(ϕ/β)/β) + ϕ/β + (1/β − ϕ/β) ψ(ϕ/β)  (49)

Skewness is given by

[Γ(ϕ + 3/β)Γ(ϕ)² − 3Γ(ϕ + 2/β)Γ(ϕ + 1/β)Γ(ϕ) + 2Γ(ϕ + 1/β)³] / [Γ(ϕ + 2/β)Γ(ϕ) − Γ(ϕ + 1/β)²]^(3/2)  (50)
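The moment formula (45), and the mean and variance derived from it, are easy to confirm against scipy.stats.gengamma (an illustrative sketch; the mapping a = ϕ/β, c = β, scale = κ is as above):

```python
import numpy as np
from scipy.special import gamma as G
from scipy.stats import gengamma

kappa, phi, beta = 2.0, 3.0, 1.5
X = gengamma(a=phi / beta, c=beta, scale=kappa)

def gg_moment(r):
    """E[X^r] for the generalized gamma distribution, equation (45)."""
    return kappa**r * G((phi + r) / beta) / G(phi / beta)

assert np.isclose(gg_moment(1), X.mean())                   # mean, equation (46)
assert np.isclose(gg_moment(2) - gg_moment(1)**2, X.var())  # variance, equation (48)
```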

3.6 Properties of Generalized Gamma Distribution
The generalized gamma family is flexible in that it includes several well-known models as subfamilies. The subfamilies of the generalized gamma thus far considered in the literature are, in the (α, τ, λ) parametrization used below, the exponential (α = τ = 1), the gamma (τ = 1), and the Weibull (α = 1). The log-normal distribution is also obtained as a limiting distribution when α → ∞. An important property of the generalized gamma family for information analysis is that the family is closed under power transformation. That is, if X ∼ GG(α, τ, λ) then

Y = X^s ∼ GG(α, τ/s, λ^s),  s > 0  (51)

It also has the property that Z = ηX has a GG(α, τ, ηλ) distribution. We use the probability density function of the generalized gamma distribution to show that the generalized gamma is flexible, with the exponential, Weibull and gamma distributions as subfamilies. That is, we let the pdf of the generalized gamma distribution GG(α, τ, λ) be given by

f(x; α, τ, λ) = (τ/(λΓ(α)))(x/λ)^(ατ−1) e^(−(x/λ)^τ)  (52)

Then if α = τ = 1, the subfamily is the exponential distribution.
Proof I
From the probability density function of the generalized gamma distribution given by


f(x; α, τ, λ) = (τ/(λΓ(α)))(x/λ)^(ατ−1) e^(−(x/λ)^τ)  (53)
if we replace α = τ = 1, then we have that

f(x; 1, 1, λ) = (1/(λΓ(1)))(x/λ)^(1−1) e^(−x/λ)  (54)
= (1/λ) e^(−x/λ)  (55)
which is the probability density form of an exponential distribution with parameter λ, hence the proof.
Proof II
Similarly, if we now let τ = 1, then, from the probability density function of the generalized gamma distribution given by:

f(x; α, τ, λ) = (τ/(λΓ(α)))(x/λ)^(ατ−1) e^(−(x/λ)^τ)  (56)
we have that

f(x; α, 1, λ) = (1/(λΓ(α)))(x/λ)^(α−1) e^(−x/λ)  (57)
simplifying gives

= (1/(λΓ(α))) x^(α−1) (1/λ)^α (1/λ)^(−1) e^(−x/λ)  (58)
= (1/λ)^α x^(α−1) e^(−x/λ)/Γ(α)  (59)
which is the probability density function of a gamma distribution with the parameters α and λ.
Proof III
Following the same approach, if we now let α = 1, then from the probability density function of the generalized gamma distribution given by

f(x; α, τ, λ) = (τ/(λΓ(α)))(x/λ)^(ατ−1) e^(−(x/λ)^τ)  (60)
we show that the generalized gamma distribution reduces to the Weibull distribution as a sub-family. That is, we have

f(x; 1, τ, λ) = (τ/(λΓ(1)))(x/λ)^(τ−1) e^(−(x/λ)^τ)  (61)
This reduces to

= (τ/λ) x^(τ−1) (1/λ)^τ (1/λ)^(−1) e^(−(x/λ)^τ)  (62)
= τ (1/λ)^τ x^(τ−1) e^(−(x/λ)^τ)  (63)
= (τ/λ)(x/λ)^(τ−1) e^(−(x/λ)^τ)  (64)
which is the probability density function of a Weibull distribution with the parameters λ and τ, hence the proof.
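These three reductions can also be checked numerically. In SciPy's parametrization, GG(α, τ, λ) of equation (52) is gengamma(a=α, c=τ, scale=λ); the sketch below (an illustrative addition) confirms Proofs I-III pointwise:

```python
import numpy as np
from scipy.stats import gengamma, expon, gamma, weibull_min

x = np.linspace(0.1, 5, 50)
lam = 2.0

# Proof I: alpha = tau = 1 gives the exponential with scale lambda
assert np.allclose(gengamma.pdf(x, a=1, c=1, scale=lam),
                   expon.pdf(x, scale=lam))
# Proof II: tau = 1 gives the gamma with shape alpha and scale lambda
assert np.allclose(gengamma.pdf(x, a=2.5, c=1, scale=lam),
                   gamma.pdf(x, a=2.5, scale=lam))
# Proof III: alpha = 1 gives the Weibull with shape tau and scale lambda
assert np.allclose(gengamma.pdf(x, a=1, c=1.7, scale=lam),
                   weibull_min.pdf(x, c=1.7, scale=lam))
```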


3.7 Parameter Estimation Using Maximum Likelihood Estimation
Maximum likelihood estimation (MLE) is a method of estimating the parameters of a distribution by maximizing a likelihood function, so that under the assumed statistical model the observed data is most probable. The point in the parameter space that maximizes the likelihood function is called the maximum likelihood estimate. The logic of maximum likelihood is both intuitive and flexible, and as such the method has become a dominant means of statistical inference. If the likelihood function is differentiable, the derivative test for determining maxima can be applied. In some cases, the first-order conditions of the likelihood function can be solved explicitly; for instance, the ordinary least squares estimator maximizes the likelihood of the linear regression model. Under most circumstances, however, numerical methods will be necessary to find the maximum of the likelihood function.

From a statistical standpoint, the observations y = (y1, y2, ..., yn) are a random sample from an unknown population. The goal is to make inferences about the population that is most likely to have generated the sample, specifically the probability distribution corresponding to the population. Associated with each probability distribution is a unique vector θ = [θ1, θ2, . . . , θk]^T of parameters that index the probability distribution within a parametric family { f(· ; θ) | θ ∈ Θ }. As θ changes in value, different probability distributions are generated. The idea of maximum likelihood is to re-express the joint probability of the sample data f(y1, y2, ..., yn; θ) as a likelihood function L(θ ; y) that treats θ as a variable. For independent and identically distributed random variables, the likelihood function is defined as

L(θ ; y) = f(y1; θ) f(y2; θ) ··· f(yn; θ) = ∏_{i=1}^{n} f(yi; θ)  (65)
and evaluated at the observed data sample. The goal is then to find the values of the model parameter that maximize the likelihood function over the parameter space Θ. Intuitively, this selects the parameter values that make the observed data most probable. The problem is thus to find the supremum value of the likelihood function by choice of the parameter, L(θ̂ ; y) = sup_{θ∈Θ} L(θ ; y), where the estimator θ̂ = θ̂(y) is a function of the sample. A sufficient but not necessary condition for its existence is for the likelihood function to be continuous over a parameter space Θ that is compact. For an open Θ the likelihood function may increase without ever reaching a supremum value. In practice, it is often convenient to work with the natural logarithm of the likelihood function, called the log-likelihood: ℓ(θ ; y) = ln L(θ ; y). Since the logarithm is a monotonic function, the maximum of ℓ(θ ; y) occurs at the same value of θ as does the maximum of L. If ℓ(θ ; y) is differentiable in θ, the necessary conditions for the occurrence of a maximum (or a minimum) are

∂ℓ/∂θ1 = 0, ∂ℓ/∂θ2 = 0, ..., ∂ℓ/∂θk = 0,  (66)
known as the likelihood equations. For some models, these equations can be explicitly solved for θ̂, but in general no closed-form solution to the maximization problem is known or available, and an MLE can only be found via numerical optimization. Another problem is that in finite samples, there may exist multiple roots for the likelihood equations. Whether the identified root θ̂ of the likelihood equations is indeed a (local) maximum depends on whether the matrix of second-order partial and cross-partial derivatives,

H(θ̂) = ( ∂²ℓ/∂θ1², ∂²ℓ/∂θ1∂θ2, ..., ∂²ℓ/∂θ1∂θk ; ∂²ℓ/∂θ2∂θ1, ∂²ℓ/∂θ2², ..., ∂²ℓ/∂θ2∂θk ; ... ; ∂²ℓ/∂θk∂θ1, ∂²ℓ/∂θk∂θ2, ..., ∂²ℓ/∂θk² ), evaluated at θ = θ̂,  (67)
known as the Hessian matrix, is negative semi-definite at θ̂, which indicates local concavity. Conveniently, most common probability distributions, in particular the exponential family, are logarithmically concave. Note that the probability density function using the shape-scale parameterization for the gamma distribution is given by

f(x; k, θ) = x^(k−1) e^(−x/θ)/(θ^k Γ(k))  for x > 0 and k, θ > 0.  (68)

Here Γ(k) is the gamma function evaluated at k. The likelihood function for N iid observations (x1, ..., xN ) is

L(k, θ) = ∏_{i=1}^{N} f(xi; k, θ), from which we calculate the log-likelihood function

ℓ(k, θ) = (k − 1) Σ_{i=1}^{N} ln(xi) − Σ_{i=1}^{N} xi/θ − Nk ln(θ) − N ln(Γ(k))  (69)
Finding the maximum with respect to θ by taking the derivative and setting it equal to zero yields the maximum likelihood estimator of the θ parameter:
θ̂ = (1/(kN)) Σ_{i=1}^{N} xi
Substituting this into the log-likelihood function gives
ℓ = (k − 1) Σ_{i=1}^{N} ln(xi) − Nk − Nk ln(Σ xi/(kN)) − N ln(Γ(k))  (70)
Finding the maximum with respect to k by taking the derivative and setting it equal to zero yields
ln(k) − ψ(k) = ln((1/N) Σ_{i=1}^{N} xi) − (1/N) Σ_{i=1}^{N} ln(xi)  (71)
An initial value of k can be found either using the method of moments, or using the approximation
ln(k) − ψ(k) ≈ (1/(2k))(1 + 1/(6k + 1))  (72)

If we let
s = ln((1/N) Σ_{i=1}^{N} xi) − (1/N) Σ_{i=1}^{N} ln(xi)  (73)
then k is approximately
k ≈ (3 − s + √((s − 3)² + 24s))/(12s)  (74)
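Equations (73)-(74) give a convenient closed-form starting point for the gamma MLE. A small sketch (illustrative, assuming NumPy):

```python
import numpy as np

def gamma_mle_closed_form(x):
    """Approximate MLE of the gamma shape k and scale theta via equations (73)-(74)."""
    s = np.log(np.mean(x)) - np.mean(np.log(x))            # equation (73)
    k = (3 - s + np.sqrt((s - 3)**2 + 24 * s)) / (12 * s)  # equation (74)
    theta = np.mean(x) / k                                 # theta-hat = (1/(kN)) sum(x)
    return k, theta

rng = np.random.default_rng(1)
x = rng.gamma(shape=2.5, scale=1.8, size=10_000)
print(gamma_mle_closed_form(x))  # close to the true values (2.5, 1.8)
```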

Suppose the random variable X is exponentially distributed with rate parameter θ, and x1, ..., xn are n independent samples from X, with sample mean x̄. The likelihood function for θ, given an independent and identically distributed sample x = (x1, ..., xn) drawn from the variable, is:
L(θ) = ∏_{i=1}^{n} θ exp(−θxi) = θ^n exp(−θ Σ_{i=1}^{n} xi) = θ^n exp(−θn x̄)  (75)
where x̄ = (1/n) Σ_{i=1}^{n} xi is the sample mean. The derivative of the likelihood function's logarithm is:
d/dθ ln(L(θ)) = d/dθ (n ln(θ) − θn x̄) = n/θ − n x̄, which is > 0 for 0 < θ < 1/x̄, = 0 for θ = 1/x̄, and < 0 for θ > 1/x̄.  (76)
Consequently, the maximum likelihood estimate for the rate parameter is:
θ̂ = 1/x̄ = n/Σ_i xi  (77)
Although this is not an unbiased estimator of θ, x̄ is an unbiased MLE estimator of 1/θ = β, where β is the scale parameter and the distribution mean.

The bias of θ̂_mle is equal to
b ≡ E[(θ̂_mle − θ)] = θ/n  (78)

which yields the bias-corrected maximum likelihood estimator
θ̂*_mle = θ̂_mle − b̂  (79)
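A quick simulation (illustrative, assuming NumPy) shows the bias of equation (78) and the effect of the correction (79), with b estimated as θ̂_mle/n:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 2.0, 10, 100_000
x = rng.exponential(scale=1 / theta, size=(reps, n))

theta_mle = 1 / x.mean(axis=1)           # equation (77)
theta_star = theta_mle - theta_mle / n   # equation (79) with b-hat = theta_mle / n

print(theta_mle.mean() - theta)   # positive bias, roughly theta/n as in equation (78)
print(theta_star.mean() - theta)  # close to zero after the correction
```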

The Fisher information, denoted I(θ), for an estimator of the rate parameter θ is given as:
I(θ) = E[(∂/∂θ log f(x; θ))² | θ] = ∫ (∂/∂θ log f(x; θ))² f(x; θ) dx  (80)
Substituting in the distribution and solving gives:
I(θ) = ∫0^∞ (∂/∂θ log(θe^(−θx)))² θe^(−θx) dx = ∫0^∞ (1/θ − x)² θe^(−θx) dx = θ^(−2)  (81)
This determines the amount of information each independent sample of an exponential distribution carries about the unknown rate parameter θ.
3.8 Joint Moments of i.i.d. Exponential Order Statistics

Let X1, ..., Xn be n independent and identically distributed exponential random variables with rate parameter θ. Let X(1), ..., X(n) denote the corresponding order statistics. For i < j, the joint moment E[X(i)X(j)] of the order statistics X(i) and X(j) is given by
E[X(i)X(j)] = (1/((n − j + 1)θ)) E[X(i)] + E[X(i)²]  (82)
This can be seen by invoking the law of total expectation and the memoryless property:
E[X(i)X(j)] = ∫0^∞ E[X(i)X(j) | X(i) = x] f_X(i)(x) dx
= ∫0^∞ x E[X(j) | X(j) ≥ x] f_X(i)(x) dx  (since X(i) = x ⟹ X(j) ≥ x)
= ∫0^∞ x [1/((n − j + 1)θ) + x] f_X(i)(x) dx  (by the memoryless property)
= (1/((n − j + 1)θ)) E[X(i)] + E[X(i)²]  (83)
The first equation follows from the law of total expectation. The second equation exploits the fact that once we condition on X(i) = x, it must follow that X(j) ≥ x. The third equation relies on the memoryless property to replace E[X(j) | X(j) ≥ x] with E[X(j)] + x.
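A Monte Carlo check of equation (82), illustrated here for adjacent order statistics (j = i + 1), where the conditional gap beyond X(i) is a single Exp((n − j + 1)θ) variable (an illustrative sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(3)
n, theta, reps = 5, 1.5, 200_000
i, j = 2, 3  # 1-based indices, adjacent order statistics

x = np.sort(rng.exponential(scale=1 / theta, size=(reps, n)), axis=1)
xi, xj = x[:, i - 1], x[:, j - 1]

lhs = np.mean(xi * xj)  # E[X(i) X(j)] estimated by simulation
# Right-hand side of equation (82), with E[X(i)] and E[X(i)^2] estimated empirically
rhs = np.mean(xi) / ((n - j + 1) * theta) + np.mean(xi**2)
print(lhs, rhs)  # the two estimates agree up to Monte Carlo error
```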

3.9 Sum of Two Independent Exponential Random Variables
The sum of two independent random variables corresponds to the convolution of their probability distributions. If X1 and X2 are independent exponential random variables with respective rate parameters θ1 and θ2, then the probability density of Z = X1 + X2 is given by
fZ(z) = ∫0^z fX1(x1) fX2(z − x1) dx1 = ∫0^z θ1θ2 exp[−θ2z] exp[(θ2 − θ1)x1] dx1 = (θ1θ2/(θ2 − θ1))(exp[−θ1z] − exp[−θ2z])  (84)
The mean and variance of Z = X1 + X2 are found to be
E[Z] = ∫ z dFZ(z) = θ1^(−1) + θ2^(−1)
var(Z) = E[Z²] − E[Z]² = ∫ z² dFZ(z) − (θ1^(−1) + θ2^(−1))² = θ1^(−2) + θ2^(−2)  (85)


The cumulative distribution function of the sum of two independent exponential random variables then follows as
FZ(z) = Pr(Z ≤ z) = ∫0^z fZ(t) dt = 1 + (θ1/(θ2 − θ1)) exp[−θ2z] − (θ2/(θ2 − θ1)) exp[−θ1z]  (86)
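The closed forms (84)-(86) are simple to verify by simulation (illustrative, assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(4)
t1, t2 = 1.0, 3.0  # rate parameters theta1 and theta2
z = rng.exponential(1 / t1, 500_000) + rng.exponential(1 / t2, 500_000)

mean = 1 / t1 + 1 / t2       # from equation (85)
var = 1 / t1**2 + 1 / t2**2  # from equation (85)
cdf = lambda x: 1 + t1 / (t2 - t1) * np.exp(-t2 * x) - t2 / (t2 - t1) * np.exp(-t1 * x)  # equation (86)

print(z.mean(), mean)               # both close to 4/3
print(z.var(), var)                 # both close to 10/9
print((z <= 1.5).mean(), cdf(1.5))  # empirical vs closed-form CDF
```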

For the Weibull distribution, the maximum likelihood estimator for the ξ parameter given ν is
ξ̂^ν = (1/n) Σ_{i=1}^{n} xi^ν  (87)
The maximum likelihood estimator for ν is the solution for ν of the following equation:
0 = (Σ_{i=1}^{n} xi^ν ln xi)/(Σ_{i=1}^{n} xi^ν) − 1/ν − (1/n) Σ_{i=1}^{n} ln xi  (88)
Since this equation defines ν̂ only implicitly, one must generally solve for ν by numerical means.
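A root-finding sketch for equations (87)-(88) (illustrative, assuming NumPy and SciPy; the bracket [0.01, 100] is an assumed search interval for the shape):

```python
import numpy as np
from scipy.optimize import brentq

def weibull_mle(x):
    """Solve equation (88) for the shape nu, then recover the scale xi from (87)."""
    lnx = np.log(x)
    def score(nu):
        w = (x / x.max())**nu  # normalized to avoid overflow; the ratio is unchanged
        return np.sum(w * lnx) / np.sum(w) - 1 / nu - lnx.mean()
    nu = brentq(score, 0.01, 100)     # numerical root of the implicit equation (88)
    xi = np.mean(x**nu)**(1 / nu)     # equation (87) gives xi-hat to the power nu
    return nu, xi

rng = np.random.default_rng(5)
x = 3000 * rng.weibull(1.8, size=5_000)
print(weibull_mle(x))  # close to the true shape 1.8 and scale 3000
```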

When x1 > x2 > ··· > xN are the N largest observed samples from a dataset of more than N samples, then the maximum likelihood estimator for the ξ parameter given ν is
ξ̂^ν = (1/N) Σ_{i=1}^{N} (xi^ν − xN^ν)  (89)
Also given that condition, the maximum likelihood estimator for ν satisfies
0 = (Σ_{i=1}^{N} (xi^ν ln xi − xN^ν ln xN))/(Σ_{i=1}^{N} (xi^ν − xN^ν)) − (1/N) Σ_{i=1}^{N} ln xi  (90)
Again, this being an implicit function, one must generally solve for ν by numerical means.
4. Statistical Results and Data Analysis
An application of the distributions discussed above was carried out on the ovarian survival data, in which a randomized trial was performed comparing two treatments for ovarian cancer. The specified distributions were then compared parametrically, and the best choice was based on the minimum value of the AIC results. There are different methods for selecting the most appropriate model in statistical analysis. The most commonly used methods include information and likelihood based criteria. To compare the different sub-families of a distribution used in the study, information based criteria are applied. The most commonly used model selection criteria are the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). AIC is given by the expression

AIC = −2 log(L) + 2k  (91)
where L is the maximized likelihood value and k is the number of parameters in the model. BIC is given by the expression
BIC = −2 log(L) + k ln(N)  (92)
where N is the total sample size. The model with the smallest AIC value is considered a better fit.
Table 1. Results when the generalized gamma is used as the distribution

Parameter   Estimate   L95%     U95%    SE
κ           6.426      4.984    7.868   0.736
ϕ           1.426      0.888    2.292   0.345
β           -0.766     -3.340   1.807   1.313
Log-likelihood: -96.94907    AIC: 199.8981


[Figure: fitted curve; y-axis 0.0-1.0, x-axis 0-1200]
Figure 1. Fitted generalized gamma (gengamma) distribution

Table 2. Results when the exponential is used as the distribution

Parameter   Estimate   L95%       U95%       SE
rate        0.000770   0.000437   0.001356   0.000222
Log-likelihood: -98.0322    AIC: 198.0644

Table 3. Results when the Weibull is used as the distribution

Parameter   Estimate   L95%      U95%       SE
shape       1.108      0.674     1.822      0.281
scale       1225.419   690.421   2174.979   358.714
Log-likelihood: -97.9539    AIC: 199.9078
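For readers who want to reproduce this kind of comparison, the sketch below fits the three candidate families to a vector of survival times and ranks them by the AIC of equation (91). It is an illustrative outline only: it assumes SciPy, uses simulated stand-in times rather than the ovarian trial data, and ignores the right-censoring that a full survival analysis of that data set would have to handle.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
t = 1225 * rng.weibull(1.1, size=26)  # stand-in, uncensored survival times

candidates = {
    "generalized gamma": stats.gengamma,
    "weibull": stats.weibull_min,
    "exponential": stats.expon,
}
for name, dist in candidates.items():
    params = dist.fit(t, floc=0)                 # fix the location at 0
    loglik = np.sum(dist.logpdf(t, *params))
    k = len(params) - 1                          # floc=0 is not a free parameter
    print(name, round(-2 * loglik + 2 * k, 2))   # AIC, equation (91); smallest wins
```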

5. Discussion and Conclusion
Because of its constant failure rate property, the exponential distribution is an excellent model for the long flat "intrinsic failure" portion of the bathtub curve. Since most components and systems spend most of their lifetimes in this portion of the bathtub curve, this justifies frequent use of the exponential distribution (when early failures or wear-out are not a concern). Just as it is often useful to approximate a curve by piecewise straight line segments, we can approximate any failure rate curve by week-by-week or month-by-month constant rates that are the average of the actual changing rate during the respective time durations. That way we can approximate any model by piecewise exponential distribution segments patched together. Some natural phenomena have a constant failure rate (or occurrence rate) property; for example, the arrival rate of cosmic ray alpha particles or Geiger counter ticks. The exponential model works well for inter-arrival times (while the Poisson distribution describes the total number of events in a given period). When these events trigger failures, the exponential life distribution model will naturally apply. A probability plot of normalized exponential data was generated, so that a perfect exponential fit is a diagonal line with slope 1. The probability plot for 200 normalized


[Figure: y-axis 0.0-1.0, x-axis 0-1200]
Figure 2. Weibull distribution superimposed on the generalized gamma distribution

[Figure: y-axis 0.0-1.0, x-axis 0-1200]
Figure 3. Exponential distribution superimposed on the generalized gamma distribution


[Figure: panels "Generalized gamma: AFT" and "Generalized gamma: PH"; y-axis: hazard ratio (status); x-axis: period]
Figure 4. Generalized gamma graph with AFT and PH


[Figure: Weibull Q-Q plot; y-axis ln(1/(1 − F(t))), x-axis time]
Figure 5. Weibull Q-Q plot

[Figure: quantile plot; x-axis theoretical quantiles]
Figure 6. Exponential distribution quantile plot

random exponential observations (θ = 0.02) is as shown in the graph above. The exponential PDF and CDF at 200 hours for the case where θ = 0.02 were computed and found to be 0.0003663128 (PDF value) and 0.9816844 (CDF value).
Because of its flexible shape and ability to model a wide range of failure rates, the Weibull has been used successfully in many applications as a purely empirical model. The Weibull model can be derived theoretically as a form of extreme value distribution, governing the time to occurrence of the "weakest link" of many competing failure processes. This may explain why it has been so successful in applications such as capacitor, ball bearing, relay and material strength failures. Another special case of the Weibull occurs when the shape parameter is 2. The distribution is then called the Rayleigh distribution, and it turns out to be the theoretical probability model for the magnitude of radial error when the x and y coordinate errors are independent normals with 0 mean and the same standard deviation. In case the data follow a Weibull distribution, the points should follow a straight line. The PDF and CDF values for failure time T = 2000 were computed, using the Weibull distribution with κ = 1.8 and λ = 3000 with 200 generated values. The PDF value is 0.0002678883 and the CDF value is 0.3824452. Special case: when κ = 1, the Weibull reduces to the exponential model, with λ = 1/θ = the mean time to failure (MTTF).
The Weibull distribution has the ability to assume the characteristics of many different types of distributions. This has made it extremely popular among engineers and quality practitioners, who have made it the most commonly used distribution for modeling reliability data. Most researchers like incorporating the Weibull distribution into their data analysis because it is flexible enough to model a variety of data sets. The Weibull distribution can also model hazard functions that are decreasing, increasing or constant, allowing it to describe any phase of an item's lifetime. The Weibull distribution is used extensively in reliability applications to model failure times.
Acknowledgements
Our appreciation goes to PAUISTI and AU as a whole for their support during the research study. I would also like to thank the supervisors for their tireless effort during the study.
References
Stacy, E. W., & Mihram, G. A. (1965). Parameter estimation for a generalized gamma distribution. Technometrics, 7, 349-358. https://doi.org/10.2307/1266594
Shukla, G., & Kumar, V. (2006). Use of generalized gamma type model in life failure data. Indian Journal of Applied Statistics, 10, 13-20.
Lawless, J. F. (1980). Inference in the generalized gamma and log gamma distributions. Technometrics, 22(3), 409-419. https://doi.org/10.1080/00401706.1980.10486173
Cox, C., Chu, H., Schneider, M. F., & Munoz, A. (2007). Parametric survival analysis and taxonomy of hazard functions for the generalized gamma distribution. Statistics in Medicine, 26, 4252-4374.
Carroll, R. J., & Stefanski, L. A. (1990). Approximate quasilikelihood estimation in models with surrogate predictors. J. Amer. Statist. Assoc., 85, 652-663. https://doi.org/10.2307/2290000
Stacy, E. W. (1962). A generalization of the gamma distribution. Annals of Mathematical Statistics, 33, 1187-92. https://doi.org/10.1214/aoms/1177704481
Nadarajah, S., & Gupta, A. K. (2007). A generalized gamma distribution with application to drought data. Mathem. Comput. Simulation, 74, 1-7. https://doi.org/10.1016/j.matcom.2006.04.004
Dadpay, A., Soofi, E. S., & Soyer, R. (2007). Information measures for generalized gamma family. J. Econometrics, 138, 568-585. https://doi.org/10.1016/j.jeconom.2006.05.010
Ghitany, M. E. (1998). On the recent generalization of gamma distribution. Communication in Statistics-Theory and Methods, 27(1), 223-233. https://doi.org/10.1080/03610929808832662
Khodabin, M., & Ahmadabadi, A. R. (2010). Some properties of generalized gamma distribution. Mathematical Sciences, 4(1).
Leipnik, R. B. (1991). On lognormal random variables: I - The characteristic function. Journal of the Australian Mathematical Society Series B, 32(3), 327-347. https://doi.org/10.1017/S0334270000006901
Jiang, R., & Murthy, D. N. P. (2011). A study of Weibull shape parameter: Properties and significance. Reliability Engineering and System Safety, 96(12), 1619-26. https://doi.org/10.1016/j.ress.2011.09.003
Bauckhage, C. (2013). Computing the Kullback-Leibler divergence between two Weibull distributions.
Sharif, M. N., & Islam, M. N. (1980). The Weibull distribution as a general model for forecasting technological change. Technological Forecasting and Social Change, 18(3), 247-56. https://doi.org/10.1016/0040-1625(80)90026-8


Johnson, N. L., Kotz, S., & Balakrishnan, N. (1995). Continuous Univariate Distributions, volume 1, chapter 21. Wiley, New York.
Weibull, W. (1951). A statistical distribution function of wide applicability. Journal of Applied Mechanics.
Park, S. Y., & Bera, A. K. (2009). Maximum entropy autoregressive conditional heteroskedasticity model. Journal of Econometrics, 150(2), 219-230.
Zhi-Sheng, Y., & Nan, C. (2017). Closed-form estimators for the gamma distribution derived from likelihood equations. The American Statistician, 71(2), 177-181.
Ahrens, J. H., & Dieter, U. (1974). Computer methods for sampling from gamma, beta, Poisson and binomial distributions. Computing, 12(3), 223-246. https://doi.org/10.1007/BF02293108
Choi, S. C., & Wette, R. (1969). Maximum likelihood estimation of the parameters of the gamma distribution and their bias. Technometrics, 11(4), 683-690. https://doi.org/10.1080/00401706.1969.10490731
Marsaglia, G., & Tsang, W. W. (2000). A simple method for generating gamma variables. ACM Transactions on Mathematical Software, 26(3), 363-372. https://doi.org/10.18637/jss.v005.i08
Minka, T. P. (2002). Estimating a gamma distribution.
Rossi, R. J. (2018). Mathematical Statistics: An Introduction to Likelihood Based Inference. New York: John Wiley and Sons. p. 227. https://doi.org/10.1002/9781118771075
Myung, I. J. (2003). Tutorial on maximum likelihood estimation. Journal of Mathematical Psychology, 47(1), 90-100. https://doi.org/10.1016/S0022-2496(02)00028-7
Gould, W., Pitblado, J., & Poi, B. (2010). Maximum Likelihood Estimation with Stata (Fourth ed.). College Station: Stata Press. pp. 13-20.
Pratt, J. W. (1976). F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum likelihood estimation. The Annals of Statistics, 4(3), 501-514. https://doi.org/10.1214/aos/1176343457
Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912-1922. Statistical Science, 12(3), 162-176. https://doi.org/10.1214/ss/1030037906

Copyrights Copyright for this article is retained by the author(s), with first publication rights granted to the journal. This is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
