Computational Engineering in Systems Applications (Volume II)

GPD Models for Extreme Rainfall in Dobrudja

JUDICAEL DEGUENON, Université d’Abomey – Calavi, École Polytechnique d’Abomey – Calavi 01 BP 2009 Cotonou BENIN [email protected]

ALINA BĂRBULESCU Department of Mathematics and Computer Sciences Ovidius University of Constanța 124, Bd., 900527, Constanța [email protected] www.math-modeling.ro

MAKHTAR SARR Abu Dhabi University, College of Arts and Sciences Abu Dhabi, UAE [email protected]

Abstract: Extreme values of monthly precipitation were modelled systematically with the generalized Pareto distribution (GPD) for ten monthly series collected in the Dobrudja region in the period 1965 – 2005. The peak over threshold method from extreme value theory was used to model the rainfall above a suitably chosen threshold, and the return level was determined. Two hypotheses were tested: the existence of change points in the monthly rainfall extremes (Pettitt test) and non-stationarity (KPSS test). For the non-stationary series, the existence of cyclical (harmonic) trends was also studied and, where a cyclical trend exists, a GPD re-parameterization by the orthogonal method was carried out, using the scale as Poisson parameter. The GPD and the re-parameterized models were compared by the likelihood ratio test. We also give the 100-year effective return level at the change points.

Key-Words: - Extreme values, Maximum likelihood, Generalized Pareto Distribution, Likelihood ratio.

1 Introduction
The study of extremes is of major importance in the natural sciences, since extreme events such as rainfall, storms, earthquakes and very low temperatures can have disastrous effects. Establishing a probability distribution that provides a good fit to daily rainfall depth has long been a topic of interest in hydrology, meteorology and other fields. Investigations into the daily rainfall distribution are primarily spread over three main research areas, namely stochastic precipitation models, frequency analysis of precipitation, and precipitation trends related to global change [11].
Different distributions have been used to study extreme precipitation. The GEV was chosen in [17] for a study of rainfall extremes in Louisiana and by Bonnin et al. [6] to model daily and hourly rainfall data in the USA. Park and Jung [18] used the Kappa distribution to generate extreme precipitation quantile maps in Korea, and Deidda and Puliga [9] applied the Generalized Pareto Distribution to left-censored records in Sardinia. Annual flood series were often found to be skewed, which led to the development and use of many skewed distributions, the most commonly applied being the Gumbel (EV1), the Generalized Extreme Value (GEV), the Log Pearson Type III (LP3) and the 3-parameter Lognormal [1], [20].
Analyzing the literature, we remark that several methods of modelling the extreme values can be applied:
(i) Generalized Extreme Value distributions (Fréchet, Gumbel or Weibull), whose parameters can be estimated either by maximum likelihood or by probability weighted moments. However, the hypotheses required by these models are not always satisfied. Also, except for the Weibull distribution, the estimation of return levels above 50 years gives unusable confidence intervals.

ISBN: 978-1-61804-014-5

(ii) The Peak Over Threshold method and the Generalized Pareto Distribution [12], for which the threshold determination is not easy and the hypotheses of application are restrictive [15], [16].
In spite of the threshold selection difficulties, the GPD model presents some advantages, such as flexibility in the description of several types of tail behaviour.
Since a systematic study of the extreme rainfall values in the Dobrudja region (Romania) has not been done yet, this article completes our studies [2] – [6] of precipitation in this region.
In this article we present the results of a study of extreme rainfall performed on a database formed by the monthly precipitation series collected at ten meteorological stations situated in Dobrudja in the period 1965 – 2005.

2 Methodology
Let $X_1, X_2, \ldots, X_n$ be a series of identically distributed random variables (in our case, monthly rainfall) with an unknown underlying distribution $F(x)$. Our interest lies in estimating the behaviour of rainfall over a given high threshold $u$. This can be approached by estimating the excess distribution, $F_u(y)$, which represents the probability that a heavy rainfall event with value $X_i$ exceeds the threshold $u$ by at most an amount $y$, given that it exceeds the threshold $u$ [15]. In our case, the problem is solved using the Generalized Pareto Distribution (GPD), which has the following distribution function:

$$G_{\xi,\sigma}(x) = \begin{cases} 1 - (1 + \xi x/\sigma)^{-1/\xi}, & \xi \neq 0, \\ 1 - \exp(-x/\sigma), & \xi = 0, \end{cases} \qquad (1)$$

where $\sigma$ is the scale parameter and $\xi$ the shape parameter, with $\sigma > 0$, and $x \geq 0$ if $\xi \geq 0$, and $0 \leq x \leq -\sigma/\xi$ if $\xi < 0$.
Given a threshold $u$, we define the number of exceedances, $N_u$, as the number of data points, out of the total number of points, that exceed the threshold $u$ and to which the GPD was fitted by the maximum likelihood (MLE) method.
The following steps were followed to carry out the work:
1. Assuming that for a certain threshold $u$, $F_u(x) \approx G_{\xi,\sigma}(x)$ [16], the parameter estimation is performed by maximum likelihood. The threshold $u$ is determined as the lowest value for which the estimates $\hat{\xi}$ and $\hat{\sigma}^{*}$ of $\xi$ and $\sigma^{*} = \sigma_u - \xi u$ (the re-parameterized scale parameter) remain nearly constant in the charts obtained by plotting $\hat{\xi}$ and $\hat{\sigma}^{*}$ together with their confidence intervals.
2. The Pettitt test [19] is used to identify the change point years in each of the maximum monthly rainfall series.
3. The Kwiatkowski – Phillips – Schmidt – Shin (KPSS) test [14] is applied to test the null hypothesis that the series is stationary around a deterministic trend. For all the non-stationary series, the orthogonal approach [21] is used, that is, the annual cycle in the Poisson rate parameter is fitted by:

$$\log(\lambda(t)) = \lambda_0 + \lambda_1 \sin(2\pi t/T) + \lambda_2 \cos(2\pi t/T), \quad T = 12, \qquad (2)$$

and a new GPD is built with the cycle in the scale parameter:

$$\log(\sigma(t)) = \sigma_0 + \sigma_1 \sin(2\pi t/T) + \sigma_2 \cos(2\pi t/T), \quad T = 12. \qquad (3)$$

4. The new models with the cycle in the scale parameter are compared to the initial ones, using the likelihood ratio [7].
5. Finally, the effective 100-year return levels are found for each value of the harmonics used in the fit for the change point year.

3 Results and discussions
To model our data, the R packages ismev, extRemes and evir [7], [8], [10], [22] were used.
In Table 1, the intervals of possible thresholds for fitting the GPD models (called Model_1), together with the chosen thresholds, are presented for the monthly series. The lower limit of each interval was taken as the chosen threshold, u, in the GPD models. For example, for Adamclisi, the chosen threshold is 55.

Table 1. Threshold choice
Station      Interval of possible threshold (mm)   u (mm)
Adamclisi    [55, 58]                              55
Cernavoda    [22, 25]                              22
Medgidia     [35, 35.5]                            35
Harsova      [41, 43]                              41
Corugea      [35, 45]                              35
Tulcea       [57, 61]                              57
Constanta    [21.5, 23.5]                          21.5
Mangalia     [32, 45]                              32
Jurilovca    [50, 60]                              50
Sulina       [10, 30]                              10

Once the threshold is known, the number of exceedances and the parameter estimates, together with the corresponding standard errors (Std. err. scale, Std. err. shape), are provided in Table 2.
We remark that the standard errors lie in the intervals [1.5523, 4.8829] for the scale and [0.04829, 0.10590] for the shape.
The quality of the models is revealed by analyzing Figs. 1 – 10. In Figs. 1 – 5 the residual probability plots and the residual quantile plots (on exponential scale) are drawn.


The line represents the theoretical residual values in the convergence case in part a. of each figure, and the theoretical residual quantiles in the convergence case in part b. Since the plots lie along the first bisector, we accept the hypothesis that the model fits well. In Figs. 6 – 10, the probability plots (a) and the quantile plots (b) are presented together with the return level plot (c) and the histogram of the data with the fitted density (d).
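The diagnostics described above compare empirical and model probabilities (or quantiles) of the threshold excesses. The following is a minimal sketch of how such plot coordinates can be computed from the distribution function in Eq. (1), for the plain (non-residual) stationary case. The function names, the plotting positions i/(n+1) and the numerical values in the usage lines are illustrative assumptions, not the routines or estimates used by the authors.

```python
import math

def gpd_cdf(x, sigma, xi):
    """Distribution function of Eq. (1) for an excess x over the threshold."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if x <= 0:
        return 0.0
    if abs(xi) < 1e-12:                  # xi = 0: exponential case
        return 1.0 - math.exp(-x / sigma)
    z = 1.0 + xi * x / sigma
    if z <= 0:                           # xi < 0 and x beyond -sigma/xi
        return 1.0
    return 1.0 - z ** (-1.0 / xi)

def gpd_quantile(p, sigma, xi):
    """Inverse of gpd_cdf for a probability 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must lie in [0, 1)")
    if abs(xi) < 1e-12:
        return -sigma * math.log(1.0 - p)
    return sigma / xi * ((1.0 - p) ** (-xi) - 1.0)

def diagnostic_points(series, u, sigma, xi):
    """Return (pp, qq) coordinate lists for the excesses of series over u.

    pp pairs are (empirical probability, model probability);
    qq pairs are (model quantile, empirical quantile).
    """
    ys = sorted(x - u for x in series if x > u)
    n = len(ys)
    pp = [((i + 1) / (n + 1), gpd_cdf(y, sigma, xi)) for i, y in enumerate(ys)]
    qq = [(gpd_quantile((i + 1) / (n + 1), sigma, xi), y) for i, y in enumerate(ys)]
    return pp, qq

# Hypothetical data: excesses placed exactly on GPD quantiles, so both
# plots fall on the first bisector (sigma = 12, xi = 0.1 are made up).
n = 30
synthetic = [55.0 + gpd_quantile(i / (n + 1), 12.0, 0.1) for i in range(1, n + 1)]
pp, qq = diagnostic_points(synthetic, 55.0, 12.0, 0.1)
```

For real data the points scatter around the bisector, and systematic departures in the upper tail suggest that the threshold or the fitted parameters should be reconsidered.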

Table 2. Parameter estimation by the maximum likelihood method for the GPD models

Fig.1. Diagnostic plots - Model_1 Adamclisi. a. Residual probability plot; b. Residual quantile plot

Fig.2. Diagnostic plots - Model_1 Medgidia. a. Residual probability plot; b. Residual quantile plot

Fig.3. Diagnostic plots - Model_1 Corugea. a. Residual probability plot; b. Residual quantile plot

Fig.4. Diagnostic plots - Model_1 Jurilovca. a. Residual probability plot; b. Residual quantile plot

Fig.5. Diagnostic plots - Model_1 Cernavodă. a. Residual probability plot; b. Residual quantile plot

In Figs. 6 – 10 (a) and (b), the theoretical curves are represented by solid lines and the empirical ones by points. In (c), the solid central curve is the return level estimate, and the other two curves bound the 95% confidence interval. The empirical return periods of the extreme rainfalls observed in the studied period are shown as points. Analysing the diagnostic plots, we remark that the GPD Model_1 fits the excesses over the threshold well for all the series. Defining the MLE Poisson rate parameter as the number of exceedances per year, the following values have been obtained in Model_1: 2.4146 for Tulcea, 5.2927 for Mangalia, and 8.1463 for Sulina.
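As a worked illustration of the rate defined above, the sketch below computes the number of exceedances per year and plugs it into the usual peak over threshold return level formula, x_N = u + (σ/ξ)((Nλ)^ξ − 1). The exceedance counts 99, 217 and 334 over 41 years are back-calculated assumptions that reproduce the reported rates; they are not stated in the text. The scale and shape values in the last line are hypothetical, and only the Sulina threshold of 10 mm comes from Table 1.

```python
import math

def poisson_rate(n_exceedances, n_years):
    """Maximum likelihood estimate of the yearly exceedance rate."""
    return n_exceedances / n_years

def return_level(n_years, u, sigma, xi, rate):
    """Level exceeded on average once every n_years under a GPD tail."""
    m = n_years * rate               # expected number of exceedances in n_years
    if m <= 1.0:
        raise ValueError("n_years * rate must exceed 1")
    if abs(xi) < 1e-12:              # xi = 0: exponential tail
        return u + sigma * math.log(m)
    return u + sigma / xi * (m ** xi - 1.0)

# Assumed counts over the 41 years 1965 - 2005 (not given in the text).
rates = {
    "Tulcea": poisson_rate(99, 41),
    "Mangalia": poisson_rate(217, 41),
    "Sulina": poisson_rate(334, 41),
}

# Hypothetical scale and shape; threshold for Sulina taken from Table 1.
x100 = return_level(100, u=10.0, sigma=15.0, xi=0.1, rate=rates["Sulina"])
```

By construction, the fitted tail assigns probability 1/(Nλ) to an excess above x_N − u, so the level is exceeded once in N years on average.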

Fig.6. Diagnostic plots: Model_1 Harsova. a. Probability plot, b. Quantile plot, c. Return level plot, d. Density plot

Fig.7. Diagnostic plots: Model_1 Constanta. a. Probability plot, b. Quantile plot, c. Return level plot, d. Density plot

Fig.8. Diagnostic plots: Model_1 Tulcea. a. Probability plot, b. Quantile plot, c. Return level plot, d. Density plot

Fig.9. Diagnostic plots: Model_1 Mangalia. a. Probability plot, b. Quantile plot, c. Return level plot, d. Density plot

Fig.10. Diagnostic plots: Model_1 Sulina. a. Probability plot, b. Quantile plot, c. Return level plot, d. Density plot

The results of the KPSS stationarity test, together with the associated p-values, are given in Table 3. For the series with a p-value less than 0.05, the stationarity hypothesis was rejected. Therefore, only the Tulcea, Mangalia and Sulina series are stationary.

Table 3. Results of the KPSS test
Station      Stationarity (significance level of 5%)   p-value
Adamclisi    Rejected                                  0.03277
Cernavoda    Rejected                                  < 0.01
Medgidia     Rejected                                  0.01572
Harsova      Rejected                                  < 0.01
Corugea      Rejected                                  < 0.01
Tulcea       Accepted                                  0.0902
Constanta    Rejected                                  < 0.01
Mangalia     Accepted                                  0.0948
Jurilovca    Rejected                                  0.02422
Sulina       Accepted                                  > 0.1

For the seven non-stationary series, the annual cycle in the Poisson rate parameter was fitted.
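The stationarity decisions in Table 3 rest on the KPSS statistic [14]. The sketch below shows one way to compute it, assuming a Bartlett-weighted long-run variance and a lag truncation of int(4 (n/100)^{1/4}). The text does not say which variant or lag rule the authors used, so both are assumptions. The critical values 0.463 (level) and 0.146 (trend) are the 5% values tabulated in [14], and the two test series are made up.

```python
def kpss_statistic(series, trend=False, lags=None):
    """KPSS statistic for level (trend=False) or trend (trend=True) stationarity."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    y_mean = sum(series) / n
    if trend:
        # Residuals from an ordinary least squares line in time.
        t_mean = (n - 1) / 2.0
        sxx = sum((t - t_mean) ** 2 for t in range(n))
        sxy = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
        slope = sxy / sxx
        resid = [y - y_mean - slope * (t - t_mean) for t, y in enumerate(series)]
    else:
        # Residuals from the sample mean.
        resid = [y - y_mean for y in series]
    if lags is None:
        lags = int(4 * (n / 100.0) ** 0.25)   # assumed truncation rule
    # Long-run variance estimate with Bartlett weights.
    lrv = sum(e * e for e in resid) / n
    for s in range(1, lags + 1):
        weight = 1.0 - s / (lags + 1.0)
        gamma = sum(resid[i] * resid[i - s] for i in range(s, n)) / n
        lrv += 2.0 * weight * gamma
    # Sum of squared partial sums of the residuals.
    partial, total = 0.0, 0.0
    for e in resid:
        partial += e
        total += partial * partial
    return total / (n * n * lrv)

CRITICAL_5PCT = {"level": 0.463, "trend": 0.146}

# Made-up series: one oscillating around a fixed level, one with a drift.
oscillating = [(-1) ** i for i in range(100)]
drifting = [0.5 * i for i in range(100)]
reject_oscillating = kpss_statistic(oscillating) > CRITICAL_5PCT["level"]
reject_drifting = kpss_statistic(drifting) > CRITICAL_5PCT["level"]
```

Stationarity is rejected when the statistic exceeds the critical value, which corresponds to a p-value below 0.05 in Table 3.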


The fitted parameters are given in Table 4, together with the corresponding standard deviations.
The significance tests lead us to the conclusion that some coefficients are not significant; they are marked by * in Table 4. Therefore, the Adamclisi, Medgidia, Corugea and Jurilovca series present harmonic trends, the Cernavoda and Harsova series present a cosine component, and the Constanta series has no cyclic component.

Table 4. Parameters in the Poisson model
* means that the coefficient is not significant

For the series that present a harmonic cyclic trend, GPD models with an annual cycle in the scale parameter have been fitted. The estimated parameters and the corresponding standard deviations (in brackets) are given in Table 5. The new model is called Model_2.

Table 5. Orthogonal approach: fitting GPD with annual cycle in scale parameter
Station      $\sigma_0$       $\sigma_1$       $\sigma_2$       $\xi$
Adamclisi    3.294 (0.142)    -0.093 (0.144)   -0.250 (0.132)   0.020 (0.098)
Medgidia     3.223 (0.111)    -0.276 (0.108)   -0.354 (0.098)   0.009 (0.080)
Corugea      3.229 (0.112)    -0.288 (0.119)   -0.189 (0.100)   0.001 (0.072)
Jurilovca    3.747 (0.173)    -0.036 (0.120)   0.292 (0.095)    -0.354 (0.126)

To compare Model_1 and Model_2, the likelihood ratio test was performed. The test results are presented in Table 6, where the $\chi^2$ - values are accompanied by the test conclusions: Significant means that Model_2 performs better than Model_1, that is, Model_2 explains the data at the given station better than Model_1.

Table 6. Results of the likelihood ratio test
Station      $\chi^2$ - value   Conclusion
Adamclisi    0.1525109          Not Significant
Medgidia     0.0000512          Significant
Corugea      0.0139127          Significant
Jurilovca    0.0705064          Not Significant

The hypothesis of the existence of change points in the extremes has been verified by the Pettitt test. The change point years are presented in Table 7. The year 1975 is an artificial change point for the Constanta series, being due to the relocation of this station; therefore only 1981 will be considered.

Table 7. Extreme change point years
Station      Change point (year)   p-value
Adamclisi    1973                  0.41
Cernavoda    1972                  0.15
Medgidia     1990                  0.34
Harsova      1978                  0.26
Corugea      1972                  0.12
Tulcea       1995                  0.23
Constanta    1975 and 1981         0.16
Mangalia     1994                  0.09
Jurilovca    1978                  0.08
Sulina       1980                  0.20

The 100-year effective return levels have also been determined and are presented in Fig. 11, for comparison, for a part of the non-stationary series.

Fig.11. Effective return level: a. Adamclisi, b. Cernavoda, c. Medgidia, d. Harsova series

The values of the return levels and the confidence intervals have also been calculated. The return levels are influenced by the geographical situation of each station, the smallest being for Sulina (situated 8 km offshore) and Corugea (which has the highest altitude).
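To make the annual cycle of Model_2 concrete, the sketch below evaluates the scale of Eq. (3) month by month and plugs it into the stationary return level formula. It assumes that $\sigma_1$ multiplies the sine term and $\sigma_2$ the cosine term, and that an effective return level can be read as the return level obtained with the month-specific scale; neither is confirmed by the text. The Adamclisi threshold and coefficients come from Tables 1 and 5, while the yearly exceedance rate of 3.0 is a hypothetical placeholder.

```python
import math

def harmonic_scale(t, s0, s1, s2, period=12):
    """Scale sigma(t) from the log-linear annual cycle of Eq. (3); t is the month."""
    angle = 2.0 * math.pi * t / period
    return math.exp(s0 + s1 * math.sin(angle) + s2 * math.cos(angle))

def effective_return_level(t, n_years, u, s0, s1, s2, xi, rate):
    """Return level for month t, using the month-specific scale (assumed reading)."""
    sigma_t = harmonic_scale(t, s0, s1, s2)
    m = n_years * rate               # expected exceedances in n_years
    if abs(xi) < 1e-12:
        return u + sigma_t * math.log(m)
    return u + sigma_t / xi * (m ** xi - 1.0)

# Adamclisi: u from Table 1, coefficients from Table 5, hypothetical rate.
levels = [
    effective_return_level(t, 100, 55.0, 3.294, -0.093, -0.250, 0.020, rate=3.0)
    for t in range(1, 13)
]
```

The log link keeps the scale positive for any coefficient values, and the period of 12 makes the scale repeat from one year to the next.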

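The change point years in Table 7 come from the Pettitt test [19]. A minimal sketch of that test is given below, assuming the usual rank-based statistic and the common approximation p ≈ 2 exp(−6K²/(n³ + n²)) for the p-value. The function name, the returned tuple and the example series are illustrative, not the authors' implementation.

```python
import math

def pettitt_test(series):
    """Return (k, K, p): index of the last point before the most likely
    change, the statistic K = max |U_t|, and an approximate p-value."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    u = 0
    best_k, best_abs = 0, -1
    for t in range(n - 1):
        # U_t = U_{t-1} + sum over j of sign(x_t - x_j)
        for xj in series:
            d = series[t] - xj
            u += (d > 0) - (d < 0)
        if abs(u) > best_abs:
            best_abs, best_k = abs(u), t
    p = min(1.0, 2.0 * math.exp(-6.0 * best_abs ** 2 / (n ** 3 + n ** 2)))
    return best_k, best_abs, p

# Made-up series with a level shift after the tenth value.
shifted = [1.0] * 10 + [5.0] * 10
k, K, p = pettitt_test(shifted)
```

With a list of years aligned to the series, the change point year would be read off as years[k], the last year before the shift.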

The highest return levels are found for Mangalia, situated in the southern part of Dobrudja, on the littoral.
Testing for the existence of a trend in the extreme series after the change point leads us to the conclusion that there is an increasing trend only for the Corugea, Jurilovca, Cernavoda, Mangalia and Sulina series. No trend was detected after the change point for the other extreme series.

4 Conclusion
In this article GPD models have been built for monthly precipitation series in the Dobrudja region, and a procedure for selecting the series that present a cyclic harmonic trend was used. The analysis indicated that Mangalia, Tulcea and Sulina are stationary. Among the non-stationary series, Constanţa does not present a significant cyclic trend, Cernavoda and Harsova present a cosine trend, and Adamclisi, Medgidia, Corugea and Jurilovca present a significant harmonic trend. For Medgidia and Corugea, the likelihood ratio test reveals that taking the harmonic trend into account gives a significant improvement in the re-parameterization of the GPD model via orthogonal techniques.
For the Constanta, Cernavoda and Harsova series it is possible that the climate change is influenced by the latitude and longitude of the meteorological stations.
The inhomogeneity of the return levels of extreme rainfall has also been revealed.

Acknowledgements. This article was partially supported by CNCSIS – UEFISCSU under Grant PN II ID_262.

References:
[1] H. Abida, M. Ellouze, Probability distribution of flood flows in Tunisia, Hydrol. Earth Syst. Sci., 12, 2008, pp. 703 – 714.
[2] A. Bărbulescu, E. Băutu, Alternative Models for Time Series, An. St. Univ. Ovidius, Mat., 3, 2009, pp. 45 – 68.
[3] A. Bărbulescu, E. Băutu, Meteorological Time Series Modelling Based on Gene Expression Programming, In: Recent Advances in Evolutionary Computing, WSEAS Press, 2009, pp. 17 – 23.
[4] A. Bărbulescu, E. Băutu, ARIMA Models versus Gene Expression Programming In Precipitation Modeling, In: Recent Advances in Evolutionary Computing, WSEAS Press, 2009, pp. 112 – 117.
[5] A. Bărbulescu, E. Pelican, ARIMA models for the analysis of the precipitation evolution, In: Recent Advances in Computers, WSEAS Press, 2009, pp. 221 – 226.
[6] G.M. Bonnin, D. Martin, B. Lin, T. Parzyok, M. Yekta, D. Riley, Precipitation-Frequency Atlas of the United States, Vol.1, Version 4.0: Semiarid Southwest (Arizona, Southeast California, Nevada, New Mexico), NOAA Atlas 14, 2006.
[7] S. G. Coles, An Introduction to Statistical Modeling of Extreme Values, Springer, 2001.
[8] S.G. Coles, A. Stephenson, ismev: An Introduction to Statistical Modeling of Extreme Values, 2010, http://www.ral.ucar.edu/~ericg/softextreme.php
[9] R. Deidda, M. Puliga, Sensitivity of goodness-of-fit statistics to rainfall data rounding off, Physics and Chemistry of the Earth, 31, 2006, pp. 1240 – 1251.
[10] E. Gilleland, R. Katz, G. Young, extRemes: Extreme value toolkit, R package, 2009, http://www.assessment.ucar.edu/toolkit/
[11] L.S. Hanson, R. Vogel, The Probability Distribution of Daily Rainfall in the United States, 2008, http://engineering.tufts.edu/cee/people/vogel/publications/DailyRainfall.pdf
[12] J. R. Hosking, J. R. Wallis, Parameter and quantile estimation for the generalized Pareto distribution, Technometrics, 29, 1987, pp. 339 – 349.
[13] IPCC, In: Climate Change, The Scientific Basis, J. T. Houghton, Y. Ding, D. J. Griggs, M. Noguer, P.J. van der Linden, X. Dai, K. Maskell, C.A. Johnson (Eds.), Cambridge University Press, 2001, pp. 34 – 44.
[14] D. Kwiatkowski, P. C. B. Phillips, P. Schmidt, Y. Shin, Testing the Null Hypothesis of Stationarity against the Alternative of a Unit Root, Journal of Econometrics, 54, 1992, pp. 159 – 178.
[15] I. Li, W. Cai, E.P. Campbell, Statistical Modeling of Extreme Rainfall in Southwest Western Australia, Journal of Climate, 18, 2005, pp. 852 – 863.
[16] A. J. McNeil, Extreme value theory for risk managers, in Extreme and Integrated Risk Management, P. Embrechts, Ed., UBS Warburg, 2005, pp. 1 – 35.
[17] B. Naghavi, F.X. Yu, Regional frequency analysis of extreme precipitation in Louisiana, Journal of Hydraulic Engineering, 121, 1995, pp. 819 – 827.
[18] J.S. Park, H.S. Jung, Modelling Korean extreme rainfall using a Kappa distribution and maximum likelihood estimate, Theoretical and Applied Climatology, 72, 2002, pp. 55 – 64.
[19] A. N. Pettitt, A nonparametric approach to the change point problem, Appl. Stat., 28, 1979, pp. 126 – 135.
[20] P.J. Pilon, J. D. Harvey, Consolidated frequency analysis, Reference manual, Environment Canada, Ottawa, Canada, 1994.
[21] Soyoung J., Data Analysis in Extreme Value Theory: Non-stationary Case, 2008, www.unc.edu/~rls/s890/SoyoungWriteup.pdf
[22] A. Stephenson, E. Gilleland, Software for the analysis of extreme events: The current state and future directions, Extremes, 8, 2006, pp. 87 – 109.
