Interrelation of Droughts and Floods through Outlier Detection on Rivers in Serbia

Borislava Blagojevic1, Aleksandra Ilic2, Stevan Prohaska2
1 Faculty of Civil Engineering and Architecture, Nis, Serbia
2 Jaroslav Černi Institute for the Development of Water Resources, Belgrade, Serbia

Abstract

As floods and droughts are normal, recurrent features of climate, this paper presents an introduction to an in-depth analysis of the interrelation of droughts and floods on rivers in Serbia. Flow data from 144 gauge stations are available for analysis. These stations form the observation network of the Hydro-Meteorological Service of the Republic of Serbia. Gauge stations in the Province of Vojvodina will not be taken into consideration, due to the unnatural flow regime caused by the Danube-Tisa-Danube canal. The data sets to be analyzed are annual minima, annual maxima, annual average flows and 30-day minima. Outliers will be identified using the Pilot and Harvey test. Extremely wet/dry years are expected to appear as synchronized outliers at a gauge station, as either high or low outliers, in each of the tested series. Taking the existence of an outlier as the only drought/flood criterion, new series of extreme events will be formed: data for years in which a low outlier appears in the series of annual minima will be excluded from the series of annual maxima, and vice versa. This approach should lead to series with data from the same population. Statistics of the series of extreme events will be compared for the observed series, the series from the same population and the series of 30-day minima. The significance of the differences in the statistics of these series will be analyzed and presented. Regions prone to hydrologic droughts, floods, or both are expected to be identified for the territory of Serbia.

Key words: floods, hydrologic drought, outliers

Introduction

This paper presents an introduction to an in-depth analysis of the interrelation of droughts and floods on rivers in Serbia. The research is part of the interdisciplinary project ‘Extreme Hydrologic Features: Floods and Droughts’. Taking outliers in hydrologic series as the only indicators of extreme floods and droughts, the following working hypothesis was set: extremely wet/dry years would appear as synchronized outliers at a gauge station, as either high or low outliers, in each of the tested series. It was also expected that regions prone to hydrologic droughts, floods, or both could be identified on the territory of Serbia.

Study area and input data

The study area was the territory of Serbia south of the Danube (Dunav) and Sava rivers. Within this territory, rivers belonging to the Black Sea and Aegean Sea basins were investigated. Flow records from hydrologic gauge stations (G.S.) of the Surface Water Observation Network of the Republic of Serbia Hydro-Meteorological Service (RHMZS) were used. Figure 1 shows the river network and its division into sub-basins. Calculations and analyses were performed on the following series (data sets): annual minima (AMIN), annual maxima (AMAX), annual average flows (AAVG) and 30-day minima (30DMIN). Data sets with less than 25 years of observation, with an unnatural river regime, or with zero flow in the AMIN series were not taken into consideration.
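The four data sets and the selection rules above can be illustrated with a short Python sketch that derives them from daily flows. It is not the authors' procedure: the paper does not say how 30DMIN was extracted, so the sketch assumes it is the lowest 30-day moving-average flow within a calendar year, and that each year has a complete daily record. The function names `annual_series` and `usable_record` are hypothetical.

```python
from statistics import fmean


def annual_series(daily_by_year: dict[int, list[float]]) -> dict[str, dict[int, float]]:
    """Derive AMIN, AMAX, AAVG and 30DMIN from daily flows keyed by year.

    Assumption: 30DMIN is taken as the lowest 30-day moving-average flow in a year.
    """
    out: dict[str, dict[int, float]] = {"AMIN": {}, "AMAX": {}, "AAVG": {}, "30DMIN": {}}
    for year, q in daily_by_year.items():
        if len(q) < 30:
            raise ValueError(f"year {year} has fewer than 30 daily values")
        out["AMIN"][year] = min(q)
        out["AMAX"][year] = max(q)
        out["AAVG"][year] = fmean(q)
        # Slide a 30-day window along the year, keeping the smallest window sum
        window = sum(q[:30])
        lowest = window
        for i in range(30, len(q)):
            window += q[i] - q[i - 30]
            lowest = min(lowest, window)
        out["30DMIN"][year] = lowest / 30
    return out


def usable_record(series: dict[str, dict[int, float]], natural_regime: bool) -> bool:
    """Selection rules stated in the paper: >= 25 years, natural regime, no zero flow in AMIN."""
    amin = series["AMIN"]
    return natural_regime and len(amin) >= 25 and all(v > 0 for v in amin.values())
```

The natural-regime flag is passed in by hand, since whether a regime is "unnatural" is a hydrological judgement rather than something computed from the flows.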

BALWOIS 2010 - Ohrid, Republic of Macedonia - 25, 29 May 2010 1

Figure 1 River network of Serbia and division into sub-basins. Study area: south of the Danube (Dunav) and Sava rivers, excluding the sub-basin of the Beli Drim river in the south. Image source: http://www.hidmet.gov.rs/eng/hidrologija/povrsinske/index.php

Methodology

Outliers were detected using the Pilot and Harvey test, following earlier research on historic floods in Serbia (Prohaska et al., 2009).

Extreme events testing

In hydrological practice, historic events are values that, in an uninterrupted time series ranked in ascending order, considerably exceed or deviate from the neighboring values of the variable under consideration. The Pilot and Harvey test is generally used for objective detection of historic events (outliers) under extreme hydrologic conditions (floods and droughts). It assumes that the quantitative characteristics of these conditions follow the Log-Pearson type III (LPT3) probability distribution. Under this assumption, the upper and lower outlier limits are computed using formulas (1) and (2):

• Upper limit

$$Y_H = Y_{av} + K_N S_y \qquad (1)$$

• Lower limit

$$Y_L = Y_{av} - K_N S_y \qquad (2)$$

These equations apply if −0.4 ≤ Csy ≤ 0.4, where:
YH – log of the upper outlier limit;
YL – log of the lower outlier limit;
Yav – average value of the time series Y;
Y = log X, where X is the observed time series;
Sy – standard deviation of the time series Y;
Csy – skewness coefficient of the time series Y;
KN – frequency factor (critical value) for the risk level α = 10%;
N – total number of members of the Y series for which the statistical parameters are calculated.

The frequency factor KN is computed using formula (3):

$$K_N = -3.6220 + 6.2844\,N^{0.25} - 2.49835\,N^{0.5} + 0.491436\,N^{0.75} - 0.037911\,N \qquad (3)$$

The detection procedure itself compares the empirical distribution function of the series with the defined outlier limits. If an empirical point falls outside the upper or lower limit, it is deemed, with probability 1 − α = 0.90, to represent a historic event.

In compliance with the test conditions, only data sets satisfying −0.4 ≤ Csy ≤ 0.4 were taken into consideration.
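The test as described can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the sample standard deviation (n − 1 denominator), the usual bias-corrected sample skewness coefficient, base-10 logarithms, and that a point lying exactly on a limit is not an outlier. The function names are hypothetical.

```python
import math
from statistics import fmean, stdev
from typing import Optional


def frequency_factor(n: int) -> float:
    """K_N of formula (3) for risk level alpha = 10 %, with the coefficients printed above."""
    return (-3.6220 + 6.2844 * n ** 0.25 - 2.49835 * n ** 0.5
            + 0.491436 * n ** 0.75 - 0.037911 * n)


def sample_skew(y: list[float]) -> float:
    """Sample skewness coefficient with the usual n / ((n-1)(n-2)) correction."""
    n, m, s = len(y), fmean(y), stdev(y)
    return n * sum((v - m) ** 3 for v in y) / ((n - 1) * (n - 2) * s ** 3)


def pilot_harvey(flows: list[float]) -> Optional[tuple[list[int], list[int]]]:
    """Indices of (low, high) outliers, or None if |Csy| > 0.4 (test not applied)."""
    y = [math.log10(q) for q in flows]          # Y = log X, flows must be > 0
    if abs(sample_skew(y)) > 0.4:
        return None
    k_n = frequency_factor(len(y))
    y_av, s_y = fmean(y), stdev(y)
    y_h = y_av + k_n * s_y                      # formula (1), upper limit
    y_l = y_av - k_n * s_y                      # formula (2), lower limit
    low = [i for i, v in enumerate(y) if v < y_l]
    high = [i for i, v in enumerate(y) if v > y_h]
    return low, high
```

As a check, `frequency_factor(10)` evaluates to about 2.04 and `frequency_factor(40)` to about 2.68. Points are compared with the limits on the log scale, as the definitions of YH and YL require.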

Calculations of statistical parameters and return periods of outliers

To estimate the return periods of outliers detected by the Pilot and Harvey test, the procedure below has to be performed, because statistical parameters calculated directly from the observed series do not reflect the actual characteristics of the analyzed processes. When outliers are detected, the statistical parameters have to be adjusted depending on whether the outliers occurred during the monitoring period or not, assuming that the random variable X follows the Pearson type III (PT3) or LPT3 distribution. If an outlier occurs outside the observation period n and is not exceeded during a longer period of time N, then the empirical probabilities Pi (i = 1, 2, 3, ..., n+1) of the tested series are computed using formula (4):

$$P_1 = \frac{1}{N+1},\; P_2 = \frac{1}{n+1},\; P_3 = \frac{2}{n+1},\; P_4 = \frac{3}{n+1},\; \ldots,\; P_{n+1} = \frac{n}{n+1} \qquad (4)$$

If two outliers have occurred, one during the monitoring period n and the other outside it, and neither has been exceeded during the longer period N, then the empirical probabilities Pi of the series are computed using formula (5):

$$P_1 = \frac{1}{N+1},\; P_2 = \frac{2}{N+1},\; P_3 = \frac{3}{n+1},\; P_4 = \frac{4}{n+1},\; \ldots,\; P_{n+1} = \frac{n}{n+1} \qquad (5)$$

If the random variable X follows the PT3 or LPT3 distribution, the adjusted statistical parameters for a single outlier outside the monitoring period are computed using formulas (6), (7) and (8):

• Average value – Xav,N

$$X_{av,N} = \frac{1}{N}\left[X_N + \frac{N-1}{n}\sum_{i=1}^{n} X_i\right] \qquad (6)$$

where:
XN is the value of the outlier (random variable X) that has not been exceeded during the time period N, and
Xi are the values of the members of the basic data series for the monitoring period n.


• Coefficient of variation – Cv,N

$$C_{v,N} = \sqrt{\frac{1}{N-1}\left[(k_N - 1)^2 + \frac{N-1}{n}\sum_{i=1}^{n}(k_i - 1)^2\right]} \qquad (7)$$

where:
kN = XN / Xav,N is the modulus coefficient of the outlier, and
ki = Xi / Xav,N are the modulus coefficients of the random variable X during the monitoring period.

• Skewness coefficient – Cs,N

$$C_{s,N} = \frac{N}{(N-1)(N-2)\,C_{v,N}^{3}}\left[(k_N - 1)^3 + \frac{N-1}{n}\sum_{i=1}^{n}(k_i - 1)^3\right] \qquad (8)$$

If there are two outliers, one during and the other outside the monitoring period, the statistical parameters are computed using formulas (9), (10) and (11):

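• Average value – Xav,N

$$X_{av,N} = \frac{1}{N}\left[X_N + X_{N-1} + \frac{N-2}{n-1}\sum_{i=2}^{n} X_i\right] \qquad (9)$$

where XN−1 is the outlier that occurred within the monitoring period, so the sum runs over the remaining n − 1 members of the basic series.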

• Coefficient of variation – Cv,N

$$C_{v,N} = \sqrt{\frac{1}{N-1}\left[(k_N - 1)^2 + (k_{N-1} - 1)^2 + \frac{N-2}{n-1}\sum_{i=2}^{n}(k_i - 1)^2\right]} \qquad (10)$$

• Skewness coefficient – Cs,N

$$C_{s,N} = \frac{N}{(N-1)(N-2)\,C_{v,N}^{3}}\left[(k_N - 1)^3 + (k_{N-1} - 1)^3 + \frac{N-2}{n-1}\sum_{i=2}^{n}(k_i - 1)^3\right] \qquad (11)$$

If outliers occur within or outside a series that follows the LPT3 distribution, the following procedure is used to adjust the statistical parameters. The weight coefficient W is determined from the number of events that fall outside the outlier limits, using formula (12):

$$W = \frac{N - Z}{n + L} \qquad (12)$$

where:
Z is the number of high outliers (above the upper outlier limit), and
L is the number of low outliers (below the lower outlier limit).


In the case of both high and low outliers, the adjusted values of the statistical parameters are determined from the log values of the basic random variable X (i.e. from the random variable Y), as follows:

• Average value – Ȳ*L

$$\bar{Y}^{*}_{L} = \frac{W\sum_{i=1}^{n} Y_{i,L} + \sum_{j=1}^{Z} Y_{j,L}}{N - WL} \qquad (13)$$

• Variance – (S*L)²

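$$\left(S^{*}_{L}\right)^{2} = \frac{W\sum_{i=1}^{n}\left(Y_{i,L} - \bar{Y}^{*}_{L}\right)^{2} + \sum_{j=1}^{Z}\left(Y_{j,L} - \bar{Y}^{*}_{L}\right)^{2}}{N - WL - 1} \qquad (14)$$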

• Skewness coefficient – G*L

$$G^{*}_{L} = \frac{N - WL}{(N - WL - 1)(N - WL - 2)}\cdot\frac{W\sum_{i=1}^{n}\left(Y_{i,L} - \bar{Y}^{*}_{L}\right)^{3} + \sum_{j=1}^{Z}\left(Y_{j,L} - \bar{Y}^{*}_{L}\right)^{3}}{\left(S^{*}_{L}\right)^{3}} \qquad (15)$$

Empirical probabilities are calculated using expression (16):

$$P = \frac{m^{*}}{N+1} \qquad (16)$$

with
m* = m for 1 ≤ m ≤ Z,
m* = Wm − (W − 1)(Z + 0.5) for (Z + 1) ≤ m ≤ (Z + n + L),

where m* is the weighted order and m is the order of the data point in the series. The adjusted values of the statistical parameters defined above are used to calculate the theoretical probabilities (or return periods) of the recorded outliers, based on the PT3 distribution, P(XN), or the LPT3 distribution, P(XN = 10^YL):

$$T(X_N) = \frac{1}{P(X_N)} \quad \text{(in years)}$$
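Formulas (12) to (16) have the same form as the historic-period weighting of U.S. Bulletin 17B, with N as the long period and n as the systematic record. The Python sketch below is a minimal illustration under the assumption, taken from that bulletin's convention, that the n systematic log values passed in exclude both the Z high outliers and the L low outliers; the function names are hypothetical.

```python
def weighted_lpt3_moments(y_sys: list[float], y_high: list[float],
                          n_low: int, big_n: int) -> tuple[float, float, float, float]:
    """Return W, adjusted mean, variance and skew of Y = log X, formulas (12)-(15).

    y_sys  -- log values of the n systematic members that are neither high nor low outliers
    y_high -- log values of the Z high outliers (each counted once, weight 1)
    n_low  -- L, number of low outliers (they enter only through W and N - WL)
    big_n  -- N, length of the longer period in years
    """
    n, z = len(y_sys), len(y_high)
    w = (big_n - z) / (n + n_low)                          # (12)
    d = big_n - w * n_low                                  # N - WL
    mean = (w * sum(y_sys) + sum(y_high)) / d              # (13)

    def central_sum(power: int) -> float:
        # Weighted sum of deviations from the adjusted mean
        return (w * sum((v - mean) ** power for v in y_sys)
                + sum((v - mean) ** power for v in y_high))

    var = central_sum(2) / (d - 1)                         # (14)
    skew = d / ((d - 1) * (d - 2)) * central_sum(3) / var ** 1.5   # (15)
    return w, mean, var, skew


def weighted_plotting_positions(z: int, n: int, n_low: int, w: float, big_n: int) -> list[float]:
    """Formula (16): P = m*/(N + 1) for ranks m = 1 .. Z + n + L, largest value first."""
    probs = []
    for m in range(1, z + n + n_low + 1):
        m_star = m if m <= z else w * m - (w - 1) * (z + 0.5)
        probs.append(m_star / (big_n + 1))
    return probs
```

When N = n + Z and L = 0, W equals 1 and the weighted moments reduce to the ordinary sample mean, variance and skewness of all values. The return period of a ranked event then follows as 1/P, in years.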

For the calculation of the statistical parameters and return periods of outliers, the theoretical probability was obtained as follows: N was set to 100 years for the AMIN, AMAX and AAVG data sets, while for 30DMIN, N was set to the observation period n of each series.
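For the PT3 case with a single outlier outside the monitoring period, formulas (4) and (6) to (8) translate directly into code. With N = 100 years, as used for AMIN, AMAX and AAVG, the outlier receives the empirical probability 1/101. This is a minimal Python sketch, not the authors' implementation: it assumes that Cv,N is the square root of the bracketed term in (7), so that it is a coefficient of variation rather than its square, and the function names are hypothetical.

```python
import math


def empirical_probs_single_outlier(n: int, big_n: int) -> list[float]:
    """Formula (4): P1 = 1/(N+1) for the outlier, then 1/(n+1), ..., n/(n+1)."""
    return [1 / (big_n + 1)] + [i / (n + 1) for i in range(1, n + 1)]


def adjusted_pt3_stats(x_out: float, xs: list[float], big_n: int) -> tuple[float, float, float]:
    """Formulas (6)-(8): X_av,N, Cv,N and Cs,N for one outlier unexceeded in N years.

    xs are the n members of the basic series (monitoring period n), outlier excluded.
    Assumption: Cv,N is the square root of the bracketed expression in (7).
    """
    n = len(xs)
    weight = (big_n - 1) / n                                # (N - 1) / n
    x_av = (x_out + weight * sum(xs)) / big_n               # (6)
    k_out = x_out / x_av                                    # modulus coefficient k_N
    ks = [x / x_av for x in xs]                             # modulus coefficients k_i
    cv = math.sqrt(((k_out - 1) ** 2
                    + weight * sum((k - 1) ** 2 for k in ks)) / (big_n - 1))   # (7)
    cs = big_n * ((k_out - 1) ** 3 + weight * sum((k - 1) ** 3 for k in ks)) / (
        (big_n - 1) * (big_n - 2) * cv ** 3)                # (8)
    return x_av, cv, cs
```

With N = n + 1 (the outlier simply appended to the record), the weight becomes 1 and the three formulas reduce to the ordinary sample mean, coefficient of variation and skewness, which is a useful sanity check.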


‘Normalization’ of series

It has been suggested that data from the same population are obtained if the AMIN series is formed without data from extremely wet years, and the AMAX series without data from extremely dry years. Taking the existence of a high or low outlier as an indicator of an extremely wet or dry year, respectively, several AMIN, AMAX and AAVG data sets were subjected to this kind of ‘normalization’.
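The ‘normalization’ step amounts to dropping flagged years and recomputing the log-series statistics reported later in Tables 5 to 8. A minimal Python sketch, assuming series are stored as year-to-flow mappings and that the skewness uses the usual bias-corrected sample formula; `normalize` and `log_stats` are hypothetical names.

```python
import math
from statistics import fmean, stdev


def normalize(series: dict[int, float], flagged_years: set[int]) -> dict[int, float]:
    """Drop the years flagged by opposite-type outliers in the companion series."""
    return {yr: q for yr, q in series.items() if yr not in flagged_years}


def log_stats(series: dict[int, float]) -> dict[str, float]:
    """N, yavg, Sy, Cv,y and Cs,y of Y = log10 X for an LPT3 fit."""
    y = [math.log10(q) for q in series.values()]
    n, m, s = len(y), fmean(y), stdev(y)
    cs = n * sum((v - m) ** 3 for v in y) / ((n - 1) * (n - 2) * s ** 3)
    return {"N": n, "yavg": m, "Sy": s, "Cv,y": s / m, "Cs,y": cs}
```

For example, an AMAX series of G.S. Stapari held in a hypothetical mapping `amax` would be normalized with `normalize(amax, {1975, 1985, 1990})`, and `log_stats` applied to both versions gives the observed and ‘normalized’ statistics side by side.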

Results

Outliers detected by the Pilot and Harvey test were mapped. Three maps show G.S. with high outliers only (Figure 2), G.S. with low outliers and zero flow (Figure 3), and G.S. where both high and low outliers were detected (Figure 4). Symbols denoting the type of data set and the detected outlier are plotted at the gravity center of the basin belonging to each G.S. The maps are accompanied by Tables 1-4, which also present the return periods of the outliers, calculated according to the procedure described above. Four examples were used to investigate the ‘normalization’ of data sets of extreme events:
a. G.S. Stapari, R. Djetinja: AMAX statistics (data set without 1975 and 1985 (low AMIN outliers) and 1990 (low AAVG outlier) data);
b. G.S. Magovo, R. Toplica: AMIN and 30DMIN statistics (data set without 1979 and 1986 (high AMAX outliers) data);
c. G.S. Mercez, R. Lukovska: AMIN and 30DMIN statistics (data set without 1955 (high AAVG outlier) and 1979 (high AMAX outlier) data);
d. G.S. Priboj, R. Lim: AAVG statistics (data set without 1979 (high AMAX outlier) and 1969, 1991 (low AMIN outliers) data).
These statistics are shown in Tables 5-8. Flows for a few characteristic probabilities (0.001, 0.002, 0.005 and 0.01) are also given for the observed and ‘normalized’ series, together with the relative difference of the theoretical probability flows, calculated according to the expression:

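$$dx = 100\,\frac{x_p - x_p^{*}}{x_p} \quad [\%]$$

where xp and xp* are the theoretical flows of probability p for the observed and the ‘normalized’ series, respectively.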
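The probability flows and the relative difference dx can be reproduced approximately from the rounded statistics in the tables. The Python sketch below swaps in the Wilson-Hilferty approximation of the Pearson III frequency factor, which may differ slightly from the exact quantiles behind Tables 5 to 8; it treats p as an exceedance probability, which fits the AMAX example (for the low-flow series AMIN and 30DMIN, p would be read as a non-exceedance probability, an assumption here). The function names are hypothetical.

```python
from statistics import NormalDist


def pearson3_k(p_exceed: float, skew: float) -> float:
    """Pearson III frequency factor via the Wilson-Hilferty approximation."""
    z = NormalDist().inv_cdf(1.0 - p_exceed)     # standard normal quantile
    if abs(skew) < 1e-9:
        return z
    return (2.0 / skew) * ((1.0 + skew * z / 6.0 - skew ** 2 / 36.0) ** 3 - 1.0)


def lpt3_quantile(p_exceed: float, y_avg: float, s_y: float, cs_y: float) -> float:
    """Flow x_p with exceedance probability p for an LPT3 fit to Y = log10 X."""
    return 10.0 ** (y_avg + pearson3_k(p_exceed, cs_y) * s_y)


def relative_difference(x_p: float, x_p_star: float) -> float:
    """dx = 100 (x_p - x_p*) / x_p, in percent."""
    return 100.0 * (x_p - x_p_star) / x_p


# Table 5, G.S. Stapari (AMAX): observed vs 'normalized' statistics at p = 0.001
x_obs = lpt3_quantile(0.001, 1.666, 0.332, 0.111)
x_norm = lpt3_quantile(0.001, 1.686, 0.303, 0.169)
dx = relative_difference(x_obs, x_norm)
```

With the rounded statistics of Table 5 this gives roughly 556 and 497 m3/s and dx of about 10.6 %, in line with the tabulated values.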

Table 1 List of G.S. with high outliers detected. For each G.S., the basin area (A [km2]) and observation period (n [years]) are given. The outlier return period T [years] is shown within the series where it was detected.

No. River G.S. A [km2] n [years] OUTLIERS (H): AMIN, AAVG, AMAX, 30DMIN
1 Drina B. Bašta 14797 81 163
2 Čedovo 501 48 146
3 Ljubatska Bosiljgrad 198.6 43 664
4 Pčinja Barbace 457 49 1186
5 Ibar L.Lakat 7818 59 271
6 Studenica Ušće 540 53 960
7 Peštan Zeoke 125 49 263;132
8 Toplica Magovo 180 33 120;74
9 Lukovska Merćez 112.6 40 164 98
10 Z.Morava Jasika 14721 56 189
11 Moravica Ivanjica 475 64 489


Figure 2 High outliers, mapped at the gravity center of the basin where they were detected. Outlier details are given in Table 1.


Figure 3 G.S. with low outliers and zero flow, mapped at the gravity center of the basin where they were detected. Outlier details are given in Table 2 and Table 3.


Table 2 List of G.S. with low outliers detected and zero flow present in the AMIN series. For each G.S., the basin area (A [km2]) and observation period (n [years]) are given. The outlier return period T [years] is shown within the series where it was detected. Sub-basins: Drina, Dunav, Ibar, .

No. River G.S. A [km2] n [years] OUTLIERS (L): AMIN, AAVG, AMAX, 30DMIN
1 Lim 2762 44 200
2 Lešnica 959 47 1312 204;58
3 Jadar Zavlaka 313 47 576
4 Bistrica 79 42 224
5 Topčiderska 138 36 130
6 Pek Kučevo 849.5 53 137
7 Pek Kusiće 1220 41 112
8 Crnajka Crnajka 96 42 121
9 Rašanac/V.Selo 1124 56 1177 70 782
10 Vitovnica Kula 243 32
11 Jošanica 265 51 246
12 Studenica Devići 191.4 43 1507 149
13 Studenica Mlanča 310 44 148
14 115.8 47 116 249
15 Ribnica 102 38 115 1191
16 J.Morava Mojsinje 15390 56 254 583
17 J.Morava 3782 58 133 203;126
18 J.Morava V. Han 3052 54 221 563
19 Veternica 500 56 73
20 Lužnica Svodje 319.3 45 101
21 Vlasotince 879 47 240 146
22 Kozarska Tupalovce 98.1 46 103
23 Pečenjevce 891 34
24 Pusta Pukovac 561 53
25 Kolubara Valjevo 340 49 176 1526
26 Kolubara Slovac 995 51 416 322
27 Kolubara B. Brod 1896 48 1325
28 Obnica B. Polje 185 54 185
29 Degurić 159 51 1367
30 Koceljeva 209 46


Table 3 List of G.S. with low outliers detected and zero flow present in the AMIN series. For each G.S., the basin area (A [km2]) and observation period (n [years]) are given. The outlier return period T [years] is shown within the series where it was detected. Sub-basins: Nisava, Timok, Toplica, V. Morava.

No. River G.S. A [km2] n [years] OUTLIERS (L): AMIN, AAVG, AMAX, 30DMIN
31 Nišava Dimitrovgrad 482 48 320 99
32 Nišava Pirot 1745 55 282 539
33 Nišava B. Palanka 3087 56 202 1537
34 Nišava Niš 3870 55 92;92
35 Jerma T. Odorovci 557 46 131 97
36 Jerma Sukovo 795 42 111
37 Jerma Strazimirovci 95 46 156
38 Kutinska R.Bara 205 42
39 Visočica Brajćevci 227 46
40 B. Timok Vratarnica 1771 57 807
41 B.Timok Zaječar 2150 57 1131
42 C.Timok Zaječar/Gamzigrad 1213 57 125
43 Sikolska Mokranja 114 28
44 Toplica D.Selova 353 55 726
45 Toplica Pepeljevac 986 56 154
46 370 47 157 136
47 Toplica Prokuplje 1774 52 2498
48 V.Morava Varvarin 31548 59 155;153
49 V.Morava Bagrdan 33446 57 178
50 V.Morava Lj. Most 37320 59 216
51 Ćuprija 162.8 46 413
52 Jasenica D. Šatornja 83.6 47 383
53 193 46
54 Jasenica S.Palanka 496 45
55 Kubrušnica S.Palanka 743.2 42
56 Jagodina/Majur 427 44
57 Djetinja Stapari 332 41 118;118 154
58 Djetinja /Šengolj 511 55 138 199
59 Z.Morava Kraljevo/Miločaj 4658 52 303
60 V.Rzav /Radobudja 451.8 44 83
61 V.Rzav 564 50 111
62 Čemernica Preljina 625 47 357 257
63 Prijevor 201 45 230


Figure 4 Mixed outlier occurrence: both high and low outliers, mapped at the gravity center of the basin where they were detected. Outlier details are given in Table 4.

Table 4 List of G.S. with mixed outlier occurrence: both high and low outliers. For each G.S., the basin area (A [km2]) and observation period (n [years]) are given. The outlier return period T [years] is shown within the series where it was detected.

No. River G.S. A [km2] n [years] OUTLIERS (H, L): AMIN, AAVG, AMAX, 30DMIN
1 Drina Radalj 17490 41 193 554 787
2 Lim 3160 81 503 408
3 Lim Priboj 3684 44 250;147 265
4 Brankovačka Ribarce 160 46 219;219 500
5 Raška Raška 1036 59 295 811 1299
6 Vlasina Svodje 350 52 124 1810
7 Banjska V. 108.3 43 121 223
8 Ribnica Paš./ 107.8 50 180 86


Table 5 Statistics of observed and ‘normalized’ AMAX series, without dry years judged by low outlier occurrence (data set without 1975 and 1985 (low AMIN outliers) and 1990 (low AAVG outlier) data). G.S. Stapari, River Djetinja; theoretical distribution LPT3.

AMAX statistics | observed | ‘normalized’
N | 42 | 39
yavg | 1.666 | 1.686
Sy | 0.332 | 0.303
Cv,y | 0.199 | 0.180
Cs,y | 0.111 | 0.169

p | x,lpt3 | x,lpt3* | dx [%]
0.001 | 556 | 497 | 10.6
0.002 | 464 | 418 | 10.0
0.005 | 360 | 327 | 9.1
0.01 | 292 | 268 | 8.2

Table 6 Statistics of observed and ‘normalized’ AAVG series, without both wet and dry years judged by outlier occurrence (data set without 1979 (high AMAX outlier) and 1969, 1991 (low AMIN outliers) data). G.S. Priboj, River Lim; theoretical distribution LPT3.

AAVG statistics | observed | ‘normalized’
N | 44 | 41
yavg | 1.959 | 1.958
Sy | 0.098 | 0.097
Cv,y | 0.050 | 0.049
Cs,y | -0.502 | -0.587

p | x,lpt3 | x,lpt3* | dx [%]
0.001 | 157 | 152 | 3.2
0.002 | 153 | 148 | 2.9
0.005 | 147 | 143 | 2.4
0.01 | 142 | 139 | 2.1

Table 7 Statistics of observed and ‘normalized’ AMIN and 30DMIN series, without wet years judged by high outlier occurrence (data set without 1979 and 1986 (high AMAX outliers) data). G.S. Magovo, River Toplica.

AMIN, theoretical distribution PT3:

AMIN statistics | observed | ‘normalized’
N | 34 | 32
xavg | 0.273 | 0.262
Sx | 0.108 | 0.101
Cv,x | 0.397 | 0.386
Cs,x | 1.107 | 1.302

p | x,pt3 | x,pt3* | dx [%]
0.001 | 0.092 | 0.113 | -22.4
0.002 | 0.096 | 0.114 | -19.7
0.005 | 0.102 | 0.118 | -15.7
0.01 | 0.109 | 0.123 | -12.4

30DMIN, theoretical distribution LPT3:

30DMIN statistics | observed | ‘normalized’
N | 40 | 38
yavg | -0.427 | -0.440
Sy | 0.183 | 0.176
Cv,y | -0.429 | -0.400
Cs,y | 0.418 | 0.467

p | x,lpt3 | x,lpt3* | dx [%]
0.001 | 0.130 | 0.134 | -3.7
0.002 | 0.137 | 0.141 | -3.2
0.005 | 0.149 | 0.152 | -2.4
0.01 | 0.160 | 0.162 | -1.7


Table 8 Statistics of observed and ‘normalized’ AMIN and 30DMIN series, without wet years judged by high outlier occurrence (data set without 1955 (high AAVG outlier) and 1979 (high AMAX outlier) data). G.S. Mercez, River Lukovska.

AMIN, theoretical distribution PT3:

AMIN statistics | observed | ‘normalized’
N | 41 | 39
xavg | 0.322 | 0.314
Sx | 0.108 | 0.100
Cv,x | 0.336 | 0.317
Cs,x | 0.753 | 0.567

p | x,pt3 | x,pt3* | dx [%]
0.001 | 0.097 | 0.084 | 14.2
0.002 | 0.106 | 0.094 | 11.0
0.005 | 0.119 | 0.110 | 7.5
0.01 | 0.131 | 0.124 | 5.2

30DMIN, theoretical distribution PT3:

30DMIN statistics | observed | ‘normalized’
N | 40 | 38
xavg | 0.425 | 0.411
Sx | 0.157 | 0.149
Cv,x | 0.370 | 0.361
Cs,x | 1.242 | 1.495

p | x,pt3 | x,pt3* | dx [%]
0.001 | 0.184 | 0.216 | -17.6
0.002 | 0.187 | 0.217 | -16.0
0.005 | 0.194 | 0.220 | -13.5
0.01 | 0.202 | 0.224 | -11.2

Concluding remarks

The working hypothesis, ‘Extremely wet/dry years appear as synchronized outliers at a gauge station, as either high or low outliers, in each of the tested series’, is rejected: there are only a few examples of such an occurrence. Nevertheless, these outliers were tested by only one test and fitted to either the PT3 or the LPT3 theoretical distribution. Application of other outlier tests (Dixon-Thompson, Rosner, etc. (McCuen, 2003)) could lead to more reliable conclusions about the behaviour of basins. For instance, many data sets were excluded based on the |Csy| > 0.4 criterion, even though their basins are known for floods/droughts. The nature of the series tested for outliers suggests that two characteristics of extreme events could be judged by outlier presence: the magnitude of the extreme event (AMIN, AMAX) and the severity or persistence of the wet/dry period (30DMIN, AAVG). Combining these indicators of extreme events could lead to a better understanding of the hazards threatening some regions. Such an analysis could more thoroughly reveal regions prone to floods, droughts or both, in terms of flash floods or long-lasting floods, and instantaneous or persistent hydrologic droughts. Combined with the return periods of the detected outliers, flood and drought zoning could also be performed.

The examples (Tables 5-8) show that the statistics differ when data from extremely wet/dry years are removed from the appropriate series. Consequently, the theoretical probability flows also differ (judged by the relative difference). These differences are significant for AMIN, AMAX and one of the 30DMIN series, and insignificant for AAVG and the other 30DMIN series. Even when outliers are taken as the only indicators of wet/dry years, the recommendation to exclude data from extremely wet/dry years (‘normalization’) should be followed. It was also observed that the data removed from a set as dry-year data were not the lowest flows in that set, and vice versa.
Finally, it is no overstatement that any generalization of flood/drought regions depends on the ability of the observation network to represent the river network of the study area, both in terms of data quality for deriving flow indicators and in terms of spatial coverage.

Acknowledgement

The research presented in this paper was funded by the Ministry of Science and Technological Development of the Republic of Serbia within scientific project no. 22005A ‘Extreme Hydrologic Features: Floods and Droughts’.

References

Prohaska S., Ilić A., Miloradović B., Petković T., 2009: Detection and classification of Serbia’s historic floods. International Conference LAND CONSERVATION, Tara Mountain, Serbia, Book of Conference Abstracts, p. 121.

McCuen R. H., 2003: Modeling Hydrologic Change: Statistical Methods. CRC Press LLC, Boca Raton, Florida.
