Parameter Estimations of the Generalized Extreme Value Distributions for Small Sample Size
Total Page:16
File Type:pdf, Size:1020Kb
Mathematics and Statistics 8(2A): 47-51, 2020 http://www.hrpub.org DOI: 10.13189/ms.2020.081308 Parameter Estimations of the Generalized Extreme Value Distributions for Small Sample Size RaziraAniza Roslan*, Chin Su Na, Darmesah Gabda Faculty of Science and Natural Resources, Universiti Malaysia Sabah, Malaysia Received July 28, 2019; Revised September 27, 2019; Accepted February 20, 2020 Copyright©2020 by authors, all rights reserved. Authors agree that this article remains permanently open access under the terms of the Creative Commons Attribution License 4.0 International License Abstract The standard method of the maximum maxima (BM) and peak over the threshold (POT). In BM, likelihood has poor performance in GEV parameter the period will be divided into equal section and the estimates for small sample data. This study aims to explore maximum of each will be selected. The approach is usually the Generalized Extreme Value (GEV) parameter going to pair with generalized extreme value (GEV). While estimation using several methods focusing on small sample POT will select every value that exceeds a certain threshold size of an extreme event. We conducted simulation study and this approach leads to generalized Pareto distribution to illustrate the performance of different methods such as (GPD) [1]. the Maximum Likelihood (MLE), probability weighted GEV distribution was introduced by Jenkinson [3] and moment (PWM) and the penalized likelihood method has been used in many research areas such as in civil (PMLE) in estimating the GEV parameters. Based on the engineering design [4], in hydrology [2], to estimate air simulation results, we then applied the superior method in quality [15] and also in finance [14]. The GEV distribution modelling the annual maximum stream flow in Sabah. The consists of three parameters; shape (휉), scale (휎) and result of the simulation study shows that the PMLE gives location µ. This parameter estimation of GEV distribution better estimate compared to MLE and PMW as it has small can be obtained using several statistical methods such as bias and root mean square errors, RMSE. For an application, the Maximum Likelihood Estimator (MLE), Probability we can then compute the estimate of return level of river Weighted Moments (PWM), Penalized Maximum flow in Sabah. Likelihood Estimator (PMLE) and L-moment. The aim of this study is to model the annual maximum stream flow Keywords Extreme Value Theory (EVT), using the GEV distribution focusing on small sample size Generalized Extreme Value (GEV), Maximum Likelihood data. We apply several methods to estimate the GEV Estimation (MLE), Probability Weighted Moments (PWM), parameters. Penalized Maximum Likelihood (PMLE), L-Moment Each method of parameter estimation has its advantages and disadvantages. But to get ideal parameter estimation, it can be explained in terms of unbiasedness, efficiency and consistency. It is said that the parameter estimation must be 1. Introduction unbiased where the estimated parameter closed to the true parameter and the parameter estimation is efficient. The Extreme Value Theory (EVT) is a statistics field that method with the smallest of the root mean square error concentrates on any possible event that can be led to more (RMSE) shows an efficient estimator. Other than that, the extreme than it is normally happening. Usually, EVT is parameter estimation must be consistent where the function used to measure safety during catastrophic events, of estimation is well converged [12]. sometimes if we do not pay attention to the risk of an event MLE is the method that is mostly used to estimate the because it just has a low occurrence it will cause huge GEV parameter because MLE has good asymptotic losses. Therefore we can use EVT in a specific location to properties such as consistency and efficiency. MLE is easy estimate the frequency and cost of such events over a period to adapt to model change [12]. Besides that, MLE can be of time. EVT has been widely used in various fields such used in a complex model such as the non-stationary model, as geophysical variable, insurance, risk management and temporal dependence and covariate effect. However, this hydrology [19]. There are two approaches used when it parameter estimation can only be used in large sample data comes to analyzing the extreme value, which is Block and the result will become uncertain if the data is less than 48 Parameter Estimations of the Generalized Extreme Value Distributions for Small Sample Size 50 values (minimum) [7]. This is confirmed by Hosking et distribution, uncertainty will be ignored [12]. al. (1985) that MLE shows a poor performance due to the The GEV distribution having the non-degenerate small sample size. Considering MLE cannot perform well distribution function fulfills where 푎푛and 푏푛 are constant in small sample size, Coles & Dixon [18] show an with 푎푛> 0 investigation about how to improve MLE by proposing an 푥−푏푛 ∗ Pr (Mn≤푥)≈ 퐺 ( ) = 퐺 (푥) (1) alternative method called PMLE. Their study stated that 푎푛 PMLE will not only maintain model flexibility and large The cumulative distribution function (CDF) of GEV sample optimality of MLE, also help to improve it on small distribution is denoted as follows [13]: properties. This may be concluded that PMLE is given an 1 improved smoother estimation along with better accuracy 푥−µ exp [− ⌈1 + 휉 ( )⌉휉] , 휉 ≠ 0 thandirect estimate without penalties [20]. G(x) ={ 휎 (2) PWM was probably advantageous for small set data 푥−µ 푒푥푝 {−푒푥푝 (− )} , 휉 = 0 because it has smaller uncertainty than ordinary moment 휎 [19] and has lower variance than others [12]. But when the x- µ Where푥: 1+ ( ) > 0, -∞< µ <∞, 휎 > 0 and -∞<휉<∞, shape parameter is large, this parameter estimation σ performs poorly [18] and upper quantile will show PWM is in this model 휉,휎 and µ are the parameters for shape, scale, biased. But PWM is still preferable than MLE for small and location. By equation (2) GEV distribution for sample size data. On the other hand, MLE is more flexible Frẻchetξ>0and Weibullξ<0, while for Gumbel distribution than PWM because covariate can be easily added in ξ=0 taken as ξ→0. parameterization [11]. PWM is equivalent to L-moment and it performs better 2.2. Maximum Likelihood Estimation (MLE) than MLE in terms of bias and RMSE [17,10].L moment is the summary statistic, where it provides a measure of Generally, MLE is the most popular estimation method location, kurtosis, skewness or any aspects of shape that in EVT because MLE is having good asymptotic properties explain about probability distribution and data sample. such as consistency, efficiency, and normality. MLE can be Although L-moment produces bias, but it is still preferable applied to complex modeling situations such as temporal due to having a smaller variance than MLE [13] as MLE dependence, non-stationary and covariate effects [12]. The produced a very large variance and error for estimation [9]. likelihood function can be written as 푛 However, L-moment parameter estimation can only be L(휃/푥)= ∏푖=1 (푥) (3) used to estimate the stationary process [8]. Therefore L- where g is probability density function of GEV Moment and MLE can be "mixed' to produce a better result for GEV parameter estimator. The outcome of this L(휃/푥)= 1 1 − −( )−1 combination helps to reduce variance and bias [13]. In this 푛 1 x−µ 휉 x−µ 휉 ∏푖=1 exp {− ⌈1 + ξ ( )⌉ } ⌈1 + ξ ( )⌉ , 휉 ≠ 0 study, we will illustrate the GEV parameter estimations { 휎 σ σ using simulation study. The superior method then will be 1 x−µ x−µ ∏푛 푒푥푝 (− ) exp {−exp (− )} , 휉 = 0 applied to model the annual maximum stream flow in 푖=1 휎 σ σ Sabah. (4) When the sample size rises to infinity, it is said that the MLE shows consistent estimator and the variance will go 2. Methodology to zero. The asymptotic theory allows MLE to be normally distributed as the sample size rises. MLE was chosen due The previous study has shown that PMLE is more to the stable performance in a large sample size (n>50) [5]. superior to other methods. In this study, we will illustrate The parameter estimation for GEV can be obtained by the GEV parameter estimation using 3 parameter estimates maximizing log likelihood function with respect to such as MLE, PWM, and PMLE. We conducted simulation parameters. study for methods comparison using R software with our own written code. From the result, then we will apply this method to model the annual maximum river flow in Sabah. 2.3. Penalized Maximum Likelihood Estimator (PMLE) 2.1. GEV Distribution The following Penalization Maximum Likelihood was introduced byColes & Dixon [18]. With the penalized The GEV distribution is a family distribution consisting likelihood function can be written as equation 5 [18]: of three distributions called as Gumbel, Fréchet, and L ,, = L ,, P Weibull. These distributions can fit the extreme data set pen( ) ( ) ( ) (5) with high accuracy. Choosing only one of family GEV distribution may cause bias in data and the term of Where L(,, ) is the standard likelihood function of Mathematics and Statistics 8(2A): 47-51, 2020 49 MLE from equation (4) and the penalty function p (훏) is 4. Simulation Study shown in equation (6): We illustrate the comparison of GEV parameter 1 estimations using a simulation study. For this purpose, we α if ξ 0 ξ simulate extreme events from GEV distribution, X~GEV p(ξ) = exp −λ −1 if 0 ξ 1 (6) ξ −1 for (0,1,0.15) with a sample size of n=30. We repeat this if ξ 1 simulation for 1000 times. For each case, we estimate the 0 parameter estimation using MLE, PWM, and PML.