
Aquatic Procedia 4 ( 2015 ) 957 – 963

INTERNATIONAL CONFERENCE ON WATER RESOURCES, COASTAL AND OCEAN ENGINEERING (ICWRCOE 2015) Regression Analysis of Annual Maximum Daily Rainfall and Stream Flow for Flood Forecasting in Vellar River Basin

P. Supriya a,*, M. Krishnaveni a, M. Subbulakshmi a

a Centre for Water Resources, Anna University, Chennai-600025,

Abstract

Floods are a destructive natural phenomenon, and forecasting them is of high importance for reducing their impact. On the other hand, flood water also serves as a major source of water when conserved through proper means. For both purposes, estimation of flood magnitude is of utmost importance. The present study employed regression analysis between weighted maximum rainfall and maximum stream flow for the flood events of the respective catchments of the Vellar river basin. A regression equation was framed between Annual Maximum Daily Rainfall, Stream Flow and area for each catchment. Based on the equations, a flood prioritization rank was given to each catchment: Lower Vellar is ranked first and Upper Vellar is ranked last. This study reveals that Lower Vellar is the most vulnerable catchment and needs flood control measures.

© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). Peer-review under responsibility of organizing committee of ICWRCOE 2015.

Keywords: Flood Forecasting; Regression; Prioritization; Vulnerable

* Corresponding author. E-mail address: [email protected]

2214-241X © 2015 The Authors. Published by Elsevier B.V. doi: 10.1016/j.aqpro.2015.02.120

1. Introduction

The Vellar River basin is located between the Ponnaiyar and Cauvery basins. The river originates in the Chitteri hills of the Eastern Ghats in the Salem district of Tamil Nadu. The river flows for a total length of 210 km in the Salem and districts and drains into the Bay of Bengal near Parangipettai. Every year, the Vellar river basin faces serious seasonal flood problems during the months of November and December. In 2009, heavy flooding occurred in both the Vellar and Manimuktha rivers due to unforeseen precipitation, with a discharge of 207000 Cusecs near the confluence point at Kudalayathur village. The flood water enters the adjacent cultivable lands and poses a serious threat to the livelihood of the local village people (Needhidasan et al., 2013).

Regression analysis explores the relationship between one dependent variable and one or more independent variables under consideration. Regression methods are widely used for many hydrological applications. Tangborn and Rasmussen (1976) derived a regression equation based upon the relationship between winter precipitation and annual runoff. The regression coefficients were estimated with the help of mass balance equations. The accuracy of the model, framed using the full record of precipitation and annual runoff, was assessed by the standard error of estimate. M. A. Diaz-Granados and Rafael L. Bras (1982) investigated computer-based regression techniques such as ordinary least squares (OLS) and iterative Generalised Least Squares (GLS) to perform parameter estimation and residual analysis for seasonal stream flow. The residual statistics were calculated using regression equations. The results revealed that the iterative GLS performs better than the OLS in forecasting, based upon the value of the mean square error. Huynh Ngoc Phien et al. (1990) compared five regression methods: ordinary least squares, ridge, principal components, stepwise and least absolute value regression.
The results revealed that the ordinary least squares, ridge and principal components regression methods are more efficient in forecasting than the stepwise and least absolute value regression methods. Lynne Tolland (1998) discussed the available empirical methods for the computation of peak flows in forest hydrology utilizing urban hydrology techniques. The rational and empirical methods need to be improved to determine design floods for forests. Waltemeyer (2002) analyzed the magnitude and frequency of the 4-day annual low flow and developed two regression equations for estimating the 4-day, 3-year (4Q3) low-flow frequency at ungaged sites on unregulated streams in New Mexico. The first, a statewide equation for estimating the 4Q3 low-flow frequency from drainage area and average basin mean winter precipitation, was developed from the data for 50 stream flow-gaging stations that had non-zero 4Q3 low-flow frequency. The 4Q3 low-flow frequency for the 50 gaging stations ranged from 0.08 to 18.7 cubic feet per second. For this statewide equation, the average standard error of estimate was 126 percent and the coefficient of determination was 0.48. The second, an equation for estimating the 4Q3 low-flow frequency in mountainous regions from drainage area, average basin mean winter precipitation, and average basin slope, was developed from the data for 40 gaging stations located above 7,500 feet in elevation. For this regression equation, the average standard error of estimate was 94 percent and the coefficient of determination was 0.66. Bhatt (2008) developed regression equations of mean annual peak discharge for 5 and 10 return periods and found that channel cross-section area as the independent variable provides a greater correlation coefficient and adequate flood estimates.
Roland and Stuckey (2008) developed regression equations for estimating flood flows at selected recurrence intervals for ungaged streams in Pennsylvania. These equations were developed utilizing peak-flow data from 322 stream flow-gaging stations within Pennsylvania and surrounding states. All stations used in the development of the equations had 10 or more years of record and included active and discontinued continuous-record as well as crest-stage partial-record stations. The state was divided into four regions, and regional regression equations were developed to estimate the 2-, 5-, 10-, 50-, 100-, and 500-year recurrence-interval flood flows. The equations were developed by means of a regression analysis that utilized basin characteristics and flow data associated with the stations. Significant explanatory variables at the 95-percent confidence level for one or more regression equations included the following basin characteristics: drainage area; mean basin elevation; and the percentages of carbonate bedrock, urban area, and storage within a basin. The regression equations can be used to predict the magnitude of flood flows for specified recurrence intervals for most streams in the state; however, they are not valid for streams with drainage areas generally greater than 2000 square miles or with substantial regulation, diversion, or mining activity within the basin.

K. Engeland (2009) estimated the low flow index in ungauged catchments using regional regression models and the HBV regional hydrological model. The regression models provided better estimates of the lowest discharge values than the HBV method. Saeid Eslamian (2010) attempted to estimate the low flow index using Principal Component Regression (PCR) based upon physiographic and hydrologic variables. The yearly minimum discharge data of the respective gauging stations were ranked and assessed on the basis of distribution fit. M. Taylor (2011) compared the Quantile Regression Technique (QRT) and the Parameter Regression Technique (PRT), two regression-based Regional Flood Frequency Analysis techniques. In QRT, the individual flood quantiles are regressed against catchment characteristics. In PRT, the parameters of a probability distribution are regressed against catchment characteristics. An ordinary least squares regression method was used to develop the prediction equations. The equations produced by both methods gave reasonable predictions and are easily applied using catchment area and rainfall intensity. Jianzhu Li et al. (2013) performed multi-linear regression analysis with flood volume and flood peak as dependent variables and changes in rainfall depth, intensity, land use area and watershed area as independent variables. The effect on flood peak and flood volume was quantified using four regression equations.

Nomenclature

AMDR    Annual Maximum Daily Rainfall
AMDSF   Annual Maximum Daily Stream Flow
ANFIS   Adaptive Network-Based Fuzzy Inference System
ANN     Artificial Neural Network
GLS     Generalised Least Squares
HBV     Regional Hydrological Model
MLR     Multiple Regression Model
MNLR    Multiple Non-Linear Regression Model
OLS     Ordinary Least Squares
PCR     Principal Component Regression
PRT     Parameter Regression Technique
PWD     Public Works Department
QRT     Quantile Regression Technique
WRD     Water Resources Department
X1      Annual Maximum Daily Rainfall in mm
X2      Catchment Area in km²
Y       Annual Maximum Daily Stream Flow in Cusecs

2. Purpose and Scope

This paper presents regression equations that describe the relation between weighted Annual Maximum Daily Rainfall (AMDR), catchment area and Annual Maximum Daily Stream Flow (AMDSF). The flood flow regression equations are developed using the discharge data of Tholuthur, Pelandurai, Memathur and Sethiyathope, and the three variables are used together in framing the regression equations for flood forecasting. The Vellar basin experienced severe flooding in the year 2005, which inundated most of the areas of .

3. Study Area and Data Sets

The Vellar river basin is located in the northern part of Tamil Nadu State in South India, between latitudes 11° 13' N - 12° 00' N and longitudes 78° 13' E - 79° 47' E, as shown in Figure 1. The total area of the basin is 7520.87 km². The total length of the river is about 150 km. The Vellar basin lies entirely within the state of Tamil Nadu and covers a portion of Dharmapuri, Salem, Namakkal, Perambalur, Trichy, Villupuram and Cuddalore districts.

The basin lies between the Ponnaiyar, Paravanar and Cauvery river basins. The discharge data from the Tholuthur, Pelandurai, Memathur and Sethiyathope stream gauge stations were obtained from the Public Works Department (PWD) and the Water Resources Department (WRD). Daily rainfall data covering 32 years for the 26 influencing rain gauge stations, located in the sub basins as well as nearby, were acquired from PWD and WRD.

Fig. 1. Index Map of Vellar Basin

4. Methods

4.1. Regression Analysis

In regression analysis, one variable is taken as the dependent variable and the other as the independent variable, which makes it possible to study the cause and effect relationship. A regression model that involves more than one independent variable is called a Multiple Regression Model (MLR). Multiple regression analysis was used to establish the statistical relation between one dependent variable Y and one or more independent variables X1, X2, ..., Xp, which is of the form (Jeya Rami Reddy, 2013)

$y = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_p x_p$    (1)

This is the multiple regression equation. The term linear is used because equation (1) is a linear function of the unknown parameters $b_0, b_1, b_2, \ldots, b_p$.

In non-linear regression analysis, the observational data are modeled by a function which is a non-linear combination of the model parameters and depends upon one or more independent variables (Bilgili, 2010). The Multiple Non-Linear Regression model (MNLR) was found to be a simple and efficient method that produced more accurate maximum daily stream flow predictions than ANNs, ANFIS and MLR (Rezaeianzadeh et al., 2013). The multiple non-linear regression equation is generally of the form (Jeya Rami Reddy, 2013)

$y = b_0 \, x_1^{b_1} x_2^{b_2} \cdots x_p^{b_p}$    (2)

where $b_0, \ldots, b_p$ are the parameters of the non-linear relation. The multiple non-linear regression problem can be brought to a linear form by taking the logarithm of both sides of equation (2) (Jeya Rami Reddy, 2013),

$\ln y = \ln b_0 + b_1 \ln x_1 + b_2 \ln x_2 + \cdots + b_p \ln x_p$    (3)

and so the regression of $\ln y$ on $\ln x_1, \ln x_2, \ldots, \ln x_p$ is used for estimating $b_0, b_1, b_2, \ldots, b_p$.
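The logarithmic transformation described above can be illustrated with a minimal sketch for the single-predictor case (p = 1), where the power law y = b0 * x^b1 becomes a straight line in (ln x, ln y). The function name `fit_power_law` and the data are assumptions made for illustration only; the data are synthetic and noise-free, not taken from the study, and the study itself used DataFit 9 rather than this code.

```python
import math

def fit_power_law(x, y):
    """Fit y = b0 * x**b1 by ordinary least squares on (ln x, ln y)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_ly = sum(ly) / n
    # Slope and intercept of the straight line ln y = ln b0 + b1 * ln x
    sxx = sum((u - mean_lx) ** 2 for u in lx)
    sxy = sum((u - mean_lx) * (v - mean_ly) for u, v in zip(lx, ly))
    b1 = sxy / sxx
    b0 = math.exp(mean_ly - b1 * mean_lx)  # back-transform the intercept
    return b0, b1

# Synthetic, noise-free data generated from y = 2 * x**1.5 (illustrative only)
x = [10.0, 20.0, 40.0, 80.0, 160.0]
y = [2.0 * v ** 1.5 for v in x]

b0, b1 = fit_power_law(x, y)
print(round(b0, 6), round(b1, 6))
```

With noisy data the back-transformed estimates minimise the error in ln y rather than in y, which is one reason a direct non-linear fit can give slightly different parameters.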


4.2. DataFit 9 Software

DataFit 9 is a software tool which simplifies regression analysis, statistical tasks and data plotting. Its graphical interface is easy to understand and allows regression analysis to be performed more quickly than with a programming-based approach. The variables can be typed in directly or imported from other files. It can solve multivariate linear and non-linear regression models with up to 20 independent variables. DataFit 9 includes Forward Selection, Backward Elimination, Stepwise Selection and Manual variable selection modes to select the independent variables. The regression model can be selected from the predefined regression models based on its detailed results and R² ranking, as shown in Table 2. DataFit 9 was selected because it can compare and analyze a list of 242 non-linear regression models and automatically rank the best fit model first. Compared to other statistical software packages, DataFit 9 is less time consuming and user friendly.

4.3. Generation of Thiessen Polygons

The Thiessen polygon method is a graphical technique which calculates station weights based on the relative area of each measurement station's polygon in the Thiessen polygon network. Each polygon area is influenced by the rain gauge station located inside it. The effective uniform depth of precipitation is obtained by assigning these weights to the rain gauge stations. Thiessen polygons were created for the influencing rain gauge stations using ArcMap software, as shown in Figure 2.

Table 1. Thiessen Weightage for Rain Gauge Stations

Catchment                      Catchment Area (km²)   Rain Gauge Station      Weightage
Upper Vellar and Swethanadhi   2828.01                Attur                   0.503
                                                      Rasipuram               0.172
                                                      Tholudur                0.246
                                                      Thuraiyur               0.078
Gomuki and Manimuktha          1940.68                Attur                   0.303
                                                      Kattumylore             0.224
                                                      Manimuktha Reservoir    0.034
                                                      Memathur                0.147
                                                      Tholudur                0.293
Anaivari Odai and Chinnar      999.6                  Chettikulam             0.229
                                                      Keelacheruvai           0.258
                                                      Pelandurai              0.136
                                                      Tholudur                0.349
                                                      Thuraiyur               0.027
Lower Vellar                   1752.69                Pelandurai              0.225
                                                      Tholudur                0.033
                                                      Keelacheruvai           0.110
                                                      Kattumylore             0.057
                                                      Memathur                0.242
                                                      Sethiyathope Anicut     0.148
                                                      Kothavacheri            0.024
                                                      Annamalai University    0.017
                                                      Ulundurpettai           0.144
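As a rough illustration of how the Thiessen weights of Table 1 turn station readings into a catchment rainfall depth, the sketch below applies the Upper Vellar and Swethanadhi weights to a set of rainfall values. The rainfall depths are hypothetical and chosen only for the example, and the helper name `weighted_rainfall` is an assumption. Dividing by the sum of the weights is a small safeguard added here because the tabulated weights are rounded and sum to 0.999.

```python
# Thiessen weights for Upper Vellar and Swethanadhi, taken from Table 1
weights = {
    "Attur": 0.503,
    "Rasipuram": 0.172,
    "Tholudur": 0.246,
    "Thuraiyur": 0.078,
}

# Hypothetical daily rainfall depths in mm (illustrative, not observed data)
rainfall_mm = {
    "Attur": 100.0,
    "Rasipuram": 50.0,
    "Tholudur": 80.0,
    "Thuraiyur": 20.0,
}

def weighted_rainfall(weights, rainfall):
    """Return the Thiessen-weighted mean rainfall depth over a catchment."""
    total_weight = sum(weights.values())
    weighted_sum = sum(weights[s] * rainfall[s] for s in weights)
    # Normalise so that rounding in the tabulated weights does not bias the mean
    return weighted_sum / total_weight

depth = weighted_rainfall(weights, rainfall_mm)
print(f"Weighted rainfall depth: {depth:.2f} mm")
```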


Fig. 2. Thiessen Polygon Map

5. Results and Discussions

The Thiessen weightage of the rain gauge stations of each catchment was computed from their geographical location using ArcMap software and is given in Table 1. The effective rainfall depth for every catchment was obtained by assigning these weights to the influencing rain gauge stations. The stream flow data measured at the confluence point of each catchment were used to generate the regression equations. The non-linear regression equations were framed by considering catchment area and weighted rainfall as independent variables and stream flow as the dependent variable. The relationships between weighted Annual Maximum Daily Rainfall, catchment area and Annual Maximum Daily Stream Flow of the corresponding year were developed using DataFit 9 software and are shown as equations in Table 2. The performance of these relationships was tested on the basis of R². The R² value was close to one for all four catchments, which signifies that a strong relationship exists between the variables. The DataFit 9 software generates best fit non-linear regression equations and ranks them based on R².
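For readers who want to reproduce the goodness-of-fit measure, the sketch below computes R² in its usual form, one minus the ratio of the residual sum of squares to the total sum of squares. The observed and predicted values are hypothetical, the function name `r_squared` is an assumption, and DataFit 9 may report the statistic with additional options that are not covered here.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical stream flows (illustrative only, not data from the study)
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

r2 = r_squared(observed, predicted)
print(f"R^2 = {r2:.3f}")
```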

Table 2. Regression Equations for Sub Basins

Catchment                     Equation ID           a        b       c       Regression equation             R²
Lower Vellar                  Y = a·x1^b·c^x2       5.232    1.219   1.000   Y = 5.232·x1^1.219              0.90
Gomuki & Manimuktha           Y = a·x1^b·c^x2       1.998    1.521   1.000   Y = 1.998·x1^1.521              0.95
Chinnar & Anaivari odai       Y = a·x1^b·c^x2       0.0654   2.827   1.000   Y = 0.0654·x1^2.827             0.97
Upper Vellar & Swethanadhi    Y = a·x1^b·c^x2       0.168    2.587   0.999   Y = 9.919 × 10^-3·x1^2.587      0.90
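The equations of Table 2 can be evaluated with a short sketch, assuming the Equation ID reads Y = a * x1^b * c^x2 with x1 the weighted rainfall in mm and x2 the catchment area in km² from Table 1. This reading is an assumption, but it reproduces the reduced Upper Vellar & Swethanadhi coefficient of about 9.919 × 10^-3 from a = 0.168, c = 0.999 and x2 = 2828.01. The function names and the 100 mm rainfall value are hypothetical.

```python
# (a, b, c) from Table 2 and catchment area x2 in km^2 from Table 1
CATCHMENTS = {
    "Lower Vellar": (5.232, 1.219, 1.000, 1752.69),
    "Gomuki & Manimuktha": (1.998, 1.521, 1.000, 1940.68),
    "Chinnar & Anaivari odai": (0.0654, 2.827, 1.000, 999.6),
    "Upper Vellar & Swethanadhi": (0.168, 2.587, 0.999, 2828.01),
}

def effective_coefficient(name):
    """Collapse a * c**x2 into the single coefficient of the reduced equation."""
    a, _, c, area = CATCHMENTS[name]
    return a * c ** area

def predict_flow(name, rainfall_mm):
    """Estimate stream flow in Cusecs, assuming Y = a * x1**b * c**x2."""
    a, b, c, area = CATCHMENTS[name]
    return a * rainfall_mm ** b * c ** area

# Reduced coefficient for Upper Vellar & Swethanadhi
coef = effective_coefficient("Upper Vellar & Swethanadhi")
print(f"Upper Vellar coefficient: {coef:.3e}")

# Flow for a hypothetical weighted rainfall of 100 mm on Lower Vellar
flow = predict_flow("Lower Vellar", 100.0)
print(f"Lower Vellar flow at 100 mm: {flow:.0f} Cusecs")
```

Because c is 1.000 for three of the four catchments, the area term drops out there and only the Upper Vellar & Swethanadhi equation is affected by catchment area.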

5.1. Flood Prioritization

The accuracy of the results was measured by the coefficient of determination R², which is a commonly adopted measure in the regression analysis of floods. The coefficient of the best fit regression equation of each catchment was used for flood prioritization. The analysis of the regression equations revealed that Lower Vellar has a high flood risk and Upper Vellar has a low flood risk compared to the other catchments.

The catchments are ranked by flood risk as shown in Table 3.

Table 3. Ranking Sub Basins Based on Flood Prioritization

Rank   Catchment
1      Lower Vellar
2      Gomuki & Manimuktha
3      Chinnar & Anaivari odai
4      Upper Vellar & Swethanadhi

Prioritizing the flood risk catchments is essential to safeguard the lives of local people, cultivable lands, hydraulic structures and roads of the Vellar river basin, and to limit economic losses. This analysis reveals that Lower Vellar is the most vulnerable catchment and needs flood control measures.

6. Conclusions

The regression method is an effective statistical method of estimating flood discharge compared to other methods based upon catchment characteristics. A regression equation was framed between annual maximum daily rainfall, discharge and area for each catchment. This study reveals that the regression equations provide flood magnitudes which are adequate for the planning and design of various hydraulic structures. Based on the equations, a flood prioritization rank was assigned to the four catchments. The Lower Vellar catchment was assigned the first rank and the Upper Vellar & Swethanadhi catchment the last rank. Flood control measures are needed in Lower Vellar, which is the most vulnerable of the catchments. The accuracy of the flood estimates can be further improved by considering more independent variables, such as catchment slope and length of main stream, in framing the regression equations.

References

Bhatt, V. K., Tiwari, A. K., 2008. Estimation of peak streamflows through channel geometry. Hydrological Sciences Journal 53(2), 401–408.

Bilgili, M., 2010. Prediction of soil temperature using regression and artificial neural network models. Meteorology and Atmospheric Physics 110, 59–70.

Engeland, K., Hisdal, H., 2009. A Comparison of Low Flow Estimates in Ungauged Catchments Using Regional Regression and the HBV-Model. Water Resources Management 23(12), 2567–2586.

Huynh Ngoc Phien, Bui Khanh Huong, Phan Dinh Loi, 1990. Daily flow forecasting with regression analysis. Water SA 16(3), 179–184.

Jeya Rami Reddy, P., 2013. Stochastic Hydrology. Laxmi Publications Private Limited, Ajit Printing Press, New Delhi, pp. 122–130.

Jianzhu Li, Ping Feng, Zhaozhen Wei, 2013. Incorporating the data of different watersheds to estimate the effects of land use change on flood peak and volume using multi-linear regression. Mitigation and Adaptation Strategies for Global Change 18(8), 1183–1196.

Lynne Tolland, Jaime G. Cathcart, S. O. Denis Russell, 1998. Estimating the Q100 in British Columbia: A practical problem in forest Hydrology. Journal of the American Water Resources Association 34(4), 787–794.

Mario A. Diaz-Granados, Rafael L. Bras, 1982. Identification and Estimation of a Monthly Multivariate Stochastic Stream flow model for the Nile River Basin. Technology Adaptation Program, Report No. 283, July 1982.

Needhidasan, S., Chenchu Babu, K., Natrayan, M., Naveen, D. C., Vinothkumar, S., 2013. Preserving the Environment due to the Flash Floods in Vellar River at TV Puthur, taluk, Tamil Nadu: A Case Study. International Journal of Structural and Civil Engineering Research 2(3), 104–114.

Rezaeianzadeh, M., Tabari, H., Arabi Yazdi, A., Isik, A., Kalin, L., 2013. Flood flow forecasting using ANN, ANFIS and regression models. Neural Computing and Applications, Springer-Verlag London, published online. DOI 10.1007/s00521-013-1443-6.

Roland, M. A., Stuckey, M. H., 2008. Regression Equations for Estimating Flood Flows at Selected Recurrence Intervals for Ungaged Streams in Pennsylvania. Scientific Investigation Report 2008-5102, US Department of the Interior, US Geological Survey.

Saeid Eslamian, Mehdi Ghasemizadeh, Monireh Biabanaki, Mansoor Talebizadeh, 2010. A Principal Component Regression Method for Estimating Low Flow Index. Water Resources Management 24(11), 2553–2566.

Taylor, M., Haddad, K., Zaman, M., Rahman, A., 2011. Regional flood modelling in Western Australia: Application of regression based methods using ordinary least squares. 19th International Congress on Modelling and Simulation, Perth, Australia, 3803–3810. http://mssanz.org.au/modsim2011

Waltemeyer, S. D., 2002. Analysis of the magnitude and frequency of the 4-day annual low flow and regression equations for estimating the 4-day, 3-year low-flow frequency at ungaged sites on unregulated streams in New Mexico. US Department of the Interior, US Geological Survey.

Wendell V. Tangborn, Lowell A. Rasmussen, 1976. Hydrology of the North Cascades Region, Washington - A Proposed Hydrometeorological Streamflow Prediction Method. Water Resources Research 12(2), 203–216.