Selecting the Best Probability Distribution for At-Site Flood
Research Article

Selecting the best probability distribution for at-site flood frequency analysis; a study of Torne River

Mahmood Ul Hassan1 · Omar Hayat2 · Zahra Noreen3

Received: 26 May 2019 / Accepted: 29 October 2019 / Published online: 18 November 2019
© The Author(s) 2019 OPEN

* Mahmood Ul Hassan, [email protected] | 1Department of Statistics, Stockholm University, Stockholm, Sweden. 2Department for Education, London, UK. 3Division of Science and Technology, University of Education, Lahore, Pakistan.

SN Applied Sciences (2019) 1:1629 | https://doi.org/10.1007/s42452-019-1584-z

Abstract

At-site flood frequency analysis is a direct method of estimating flood frequency at a particular site. The appropriate selection of a probability distribution and a parameter estimation method is important for at-site flood frequency analysis. Generalized extreme value, three-parameter log-normal, generalized logistic, Pearson type-III and Gumbel distributions have been considered to describe the annual maximum stream flow at five gauging sites of the Torne River in Sweden. To estimate the parameters of the distributions, maximum likelihood estimation and L-moments methods are used. The performance of these distributions is assessed based on goodness-of-fit tests and accuracy measures. At most sites, the best-fitted distributions are obtained with the LM estimation method. Finally, the most suitable distribution at each site is used to predict the maximum flood magnitude for different return periods.

Keywords Flood frequency analysis · L-moments · Maximum likelihood estimation

1 Introduction

Floods are natural hazards and cause extreme damage throughout the world. The main causes of floods are extreme rainfall, ice and snow melting, dam breakage and the lack of capacity of the river watercourse to convey the excess water. Floods are natural phenomena which cause disasters such as destruction of infrastructure, damage to environmental and agricultural lands, mortality and economic losses. Many frequency distribution models have been developed for determination of hydraulic frequency, but none of them is accepted as a universal distribution to describe the flood frequency at any gauging site. The selection of a suitable distribution usually depends on the characteristics of the available data at a particular site. We need to estimate the flood magnitude at a particular site for various purposes, including construction of hydraulic structures (barrages, canals, bridges, dams, embankments, reservoirs and spillways), insurance studies, and planning of flood management and rescue operations. A number of methods are available in the literature for flood magnitude estimation, but at-site flood frequency analysis remains the most direct method of estimating flood frequency at a particular site.

To describe the flood frequency at a particular site, the choice of an appropriate probability distribution and parameter estimation method is of immense importance. The probability distributions used in this study include the generalized extreme value (GEV) distribution, Pearson type-III (P3) distribution, generalized logistic (GLO) distribution, Gumbel (GUM) distribution and three-parameter log-normal (LN3) distribution. These distributions are recommended for at-site flood frequency analysis in various countries [2, 4, 24]. Furthermore, these distributions are the ones most commonly found in the hydrological literature for at-site and regional flood frequency analysis.

Cicioni et al. [3] conducted at-site flood frequency analysis in Italy using 107 stations. They identified LN3 and GEV as the best-fitting distributions based on the Kolmogorov–Smirnov (KS), Anderson–Darling (AD) and Cramér–von Mises (CVM) goodness-of-fit tests. Saf [23] found the GLO to be the most suitable distribution for the Upper-West Mediterranean subregion in Turkey. Mkhandi et al. [17] used annual maximum flood data of 407 stations from 11 countries of Southern Africa to conduct a regional frequency analysis. They identified the Pearson type-III (P3) distribution with the probability weighted moments (PWM) method and the log-Pearson type-III (LP3) distribution with the method of moments (MOM) as the most suitable distributions for the regions. Młyński et al. [18] identified the log-normal distribution as the most suitable for the upper Vistula Basin region (Poland). There have been many studies in the literature comparing various probability distributions with different parameter estimation methods for at-site flood frequency analysis, e.g. see [1, 5, 7, 22] (Fig. 1).

The most commonly used methods for estimation of parameters in flood frequency analysis are the maximum likelihood estimation (MLE) method, the method of moments (MOM), the L-moments (LM) method and the probability weighted moments (PWM) method. The MLE method is an efficient and the most widely used method for estimation of parameters. Recently, the LM method has gained more attention in the hydrological literature for estimation of parameters of probability distributions. In this study, we used the LM and MLE methods for estimation of the parameters of the candidate probability distributions.

The methods usually used for selection of the best distribution are goodness-of-fit (GOF) tests (e.g. Anderson–Darling and Cramér–von Mises), accuracy measures (e.g. root mean square error and root mean squared percentage error), goodness-of-fit indices (e.g. AIC and BIC) and graphical methods (e.g. Q–Q plot and L-moment ratio diagram). In the hydrological literature, researchers have used different methods in order to find the best probability distribution. To identify the best-suited distribution at each site of the Torne River, we have used GOF tests and accuracy measures. The GOF tests are used to test whether the data come from a specific distribution. The accuracy measures provide a term-by-term comparison of the deviation between the hypothetical distribution and the empirical distribution of the data. The details of the accuracy measures and goodness-of-fit tests used in this study are described in Sect. 3.

The estimation of flood frequency for high return periods is of great interest in flood frequency analysis. The flood frequency estimation of return periods is always associated with uncertainties. Uncertainty in flood frequency analysis arises from many sources. Uncertainties included in water resources management can be distinguished into data uncertainties, structural uncertainties and model/parameter uncertainties, see e.g. [13, 14]. Furthermore, there is uncertainty in the estimation of flood frequency for return periods much larger than the actual records, particularly in the type of probability density function (PDF) and its parameters. This is particularly true in the right tail of the PDF, the region of interest for flooding. In addition, there is uncertainty in the measurements. For example, see [15] for an in-depth discussion on epistemic uncertainty (reducible uncertainty) and natural uncertainty (irreducible uncertainty). Flood estimates for high return periods are always associated with high uncertainties. In this study, we quantify the uncertainty of a given quantile estimate for a specific fitted distribution by using the parametric bootstrap method.

In this research paper, the flood frequency calculation, using statistical distributions, is addressed for gauged catchments for which a relatively long-term hydrological time series is available. The choice of an appropriate probability distribution and associated parameter estimation method is vital for at-site flood frequency analysis. The core objective of this study is to find the best-fit distribution among the candidate probability distributions with a particular method of estimation (MLE or LM) for annual maximum peak flow data at each site of the Torne River by using goodness-of-fit (GOF) tests and accuracy measures. We are also interested in whether there is a single best overall distribution and fitting method for these five sites of the Torne River. Finally, we estimate the quantiles of flood magnitude for return periods of 5, 10, 25, 50, 100, 200 and 500 years, with the corresponding non-exceedance probabilities, at each site of the river using the best-fit probability distribution. To address the uncertainty of the flood estimates, we estimate the standard error of the estimated quantiles and construct 95% confidence intervals of the flood quantiles for each return period using the parametric bootstrap method. This is the first study of at-site flood frequency analysis of the Torne River.

This paper is organized as follows: Sect. 2 describes the study area and the data available for the analysis. Section 3 deals with the model description, parameter estimation methods and model comparison methods. Section 4 provides the results and discussion of the application of the L-moments and maximum likelihood estimation methods to flood frequency analysis of five gauging sites on the Torne River. Finally, Sect. 5 concludes the article.

2 Study area and data

The Torne River forms a border between northern Sweden and Finland, with a total catchment area of 40157 km2, of which 60% is within the Swedish border and the remaining ... [Swedish Meteorological and Hydrological Institute (SMHI)]. The data of annual maximum flow at five gauging sites of the Torne River (Swedish: Torneälven) are considered in this study. The data have been collected from SMHI (www.smhi.se). The length of the data series varies from 34 to 108 years. A summary of the characteristics of the Torne River gauging sites is presented in Table 1.

Table 1 Summary of Torne River gauging sites

Station name         Station no.   Latitude   Longitude   Catchment area (km2)   Period for time series
Kukkolankoski Övre   16722         65.98      24.06       33,929.60              1911–2018
Pajala Pumphus       2012          67.21      23.40       11,038.10              1969–2018
Abisko               2357          68.19      19.99       3345.50                1984–2018
Junosuando           04            67.43      22.55       4348.00                1967–2018
Övre Abiskojokk      957           68.36      18.78       566.30                 1985–2018
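
The workflow outlined in the Introduction (fit a candidate distribution, convert a return period T into the non-exceedance probability 1 - 1/T, read off the flood quantile, and bootstrap its uncertainty) can be illustrated with a minimal sketch. It assumes the Gumbel distribution fitted by L-moments, chosen here only because both its quantile function and its L-moment estimators have closed forms; it is not presented as the best-fit model for any Torne River site. All function names are hypothetical, the flow series is synthetic rather than SMHI data, and the percentile form of the 95% interval is an assumption, since the text above specifies only a parametric bootstrap.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def sample_l_moments(data):
    """First two sample L-moments (l1, l2) from unbiased probability weighted moments."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum over ranks i = 1..n of ((i - 1) / (n - 1)) * x_(i)
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0

def fit_gumbel_lm(data):
    """Gumbel location and scale by L-moments: l2 = scale*ln(2), l1 = loc + gamma*scale."""
    l1, l2 = sample_l_moments(data)
    scale = l2 / math.log(2.0)
    loc = l1 - EULER_GAMMA * scale
    return loc, scale

def gumbel_quantile(loc, scale, p):
    """Inverse Gumbel CDF at non-exceedance probability p, with 0 < p < 1."""
    return loc - scale * math.log(-math.log(p))

def return_level(loc, scale, T):
    """Flood quantile for a return period of T years (p = 1 - 1/T)."""
    return gumbel_quantile(loc, scale, 1.0 - 1.0 / T)

def bootstrap_return_level(data, T, n_boot=1000, seed=0):
    """Parametric bootstrap: standard error and 95% percentile interval of the T-year quantile."""
    rng = random.Random(seed)
    loc, scale = fit_gumbel_lm(data)
    n = len(data)
    estimates = []
    for _ in range(n_boot):
        # Simulate a series of the same length from the fitted model, then refit.
        # rng.random() can return 0.0, so clamp it to keep the logarithm defined.
        sim = [gumbel_quantile(loc, scale, max(rng.random(), 1e-12)) for _ in range(n)]
        estimates.append(return_level(*fit_gumbel_lm(sim), T))
    estimates.sort()
    se = statistics.stdev(estimates)
    lower = estimates[int(0.025 * n_boot)]
    upper = estimates[int(0.975 * n_boot) - 1]
    return se, lower, upper

if __name__ == "__main__":
    # Synthetic annual maxima (m3/s) drawn from an arbitrary Gumbel model; NOT observed data.
    rng = random.Random(42)
    flows = [gumbel_quantile(1500.0, 400.0, max(rng.random(), 1e-12)) for _ in range(50)]
    loc, scale = fit_gumbel_lm(flows)
    for T in (5, 10, 25, 50, 100, 200, 500):
        se, lower, upper = bootstrap_return_level(flows, T)
        estimate = return_level(loc, scale, T)
        print(f"T={T:>3} yr  estimate={estimate:8.1f}  SE={se:6.1f}  95% CI=({lower:.1f}, {upper:.1f})")
```

In such a sketch the interval widens as T grows beyond the record length, which is the behaviour the Introduction describes for high return periods. Swapping in another candidate distribution or the MLE method would only require replacing the fitting and quantile functions.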