DEVELOPMENT OF ARTIFICIAL INTELLIGENCE BASED REGIONAL FLOOD FREQUENCY ANALYSIS TECHNIQUE

Kashif Aziz, BScEng, MEng
Student ID 16658598

A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy in Civil Engineering

Supervisory Panel:
Assoc Prof Ataur Rahman
Assoc Prof Gu Fang
Assoc Prof Surendra Shrestha

School of Computing, Engineering and Mathematics
University of Western Sydney, December 2014

Artificial Intelligence Based RFFA Aziz

ABSTRACT

Floods are among the worst natural disasters: they disrupt services, damage infrastructure, crops and property, and sometimes cause loss of human life. In Australia, the average annual flood damage is over $377 million, and infrastructure requiring design flood estimates is worth over $1 billion per annum. The devastating 2010-11 flood in Queensland alone caused damage of over $5 billion.

Design flood estimation is required in numerous engineering applications, e.g., the design of bridges, culverts, weirs, spillways, detention basins, flood protection levees and highways, as well as floodplain modelling, flood insurance studies and flood damage assessment. The most direct method of design flood estimation is flood frequency analysis, which requires a long period of recorded streamflow data at the site of interest. This is not feasible at many locations because streamflow records are absent or too short. For these ungauged or poorly gauged catchments, regional flood frequency analysis (RFFA) is adopted, which enables the transfer of flood characteristics information from gauged to ungauged catchments. RFFA essentially consists of two principal steps: (i) formation of regions; and (ii) development of prediction equations.

For developing the regional flood prediction equations, the commonly used techniques include the rational method, the index flood method and the quantile regression technique. These techniques adopt a linear method of transforming inputs to outputs. Since hydrologic systems are non-linear, RFFA techniques based on non-linear methods can be a better alternative to linear methods. Among the non-linear methods, artificial intelligence based techniques have been widely applied to various water resources engineering problems. However, their application to RFFA is quite limited. Hence, this research focuses on the development of artificial intelligence based RFFA methods for Australia. The non-linear techniques considered in this thesis include artificial neural network (ANN), genetic algorithm based artificial neural network (GAANN), gene-expression programming (GEP) and co-active neuro fuzzy inference system (CANFIS).

This study uses data from 452 small to medium sized catchments from eastern Australia. In the development/training of the artificial intelligence based RFFA models, the selected 452 catchments are divided randomly into two parts: (i) a training data set consisting of 362 catchments; and (ii) a validation data set consisting of 90 catchments. It has been found that an RFFA model with two predictor variables, i.e., catchment area and design rainfall intensity, provides more accurate flood quantile estimates than other models with a greater number of predictor variables. The results show that when the data from all the eastern Australian states are combined to form one region, the resulting ANN based RFFA model performs better than the models for other candidate regions, such as regions based on state boundaries, geographical and climatic boundaries, and regions formed in the catchment characteristics data space.
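The random split-sample division described above can be illustrated with a minimal Python sketch. It is not the procedure used in the thesis, whose implementation is not stated here: the function name `split_catchments`, the fixed seed and the placeholder catchment IDs 1 to 452 are all assumptions made for illustration.

```python
import random


def split_catchments(catchment_ids, n_validation, seed=0):
    """Randomly partition catchment IDs into training and validation sets.

    The seed is a hypothetical choice that only makes the sketch repeatable.
    """
    ids = list(catchment_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    # The first n_validation shuffled IDs are held out for validation;
    # the remainder are used for training.
    validation = sorted(ids[:n_validation])
    training = sorted(ids[n_validation:])
    return training, validation


# Placeholder IDs standing in for the 452 study catchments.
catchments = range(1, 453)
training, validation = split_catchments(catchments, n_validation=90)
print(len(training), len(validation))  # 362 90
```

Holding the validation catchments out of training entirely is what makes the later comparison an independent test of the models.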

In the training of the four artificial intelligence based RFFA models, no single model performs best for all six average recurrence intervals across all the adopted statistical criteria. Overall, the ANN based RFFA model performs better than the three other models in the training/calibration.

In this research, it has also been found that non-linear artificial intelligence based RFFA techniques can be applied successfully to eastern Australian catchments. Among the four artificial intelligence based models considered in this study, the ANN based RFFA model has demonstrated the best performance based on independent split-sample validation, followed by the GAANN based RFFA model. The ANN based RFFA model has been found to outperform the ordinary least squares based RFFA model. Based on independent validation, the median relative error values for the ANN based RFFA model are found to be in the range of 35% to 44% for eastern Australia, which is comparable to those of the generalised least squares regression and region-of-influence based RFFA approach. The ANN based RFFA model exhibits no noticeable spatial trend in the relative error values. Furthermore, the relative error values of the ANN based RFFA model are found to be independent of catchment area.
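The two validation statistics named above, the median relative error and the median Qpred/Qobs ratio, can be sketched as follows. The definition RE = 100 |Qpred - Qobs| / Qobs is a commonly used form and is assumed here rather than taken from this section; the function names and the three sample quantile values are hypothetical and are not results from the thesis.

```python
from statistics import median


def relative_error_percent(q_pred, q_obs):
    """Absolute relative error in percent (assumed definition)."""
    return 100.0 * abs(q_pred - q_obs) / q_obs


def summarise(q_pred, q_obs):
    """Return (median RE %, median Qpred/Qobs ratio) over test catchments."""
    re_values = [relative_error_percent(p, o) for p, o in zip(q_pred, q_obs)]
    ratios = [p / o for p, o in zip(q_pred, q_obs)]
    return median(re_values), median(ratios)


# Hypothetical observed and predicted flood quantiles for three catchments.
q_obs = [100.0, 200.0, 400.0]
q_pred = [150.0, 180.0, 400.0]
median_re, median_ratio = summarise(q_pred, q_obs)
print(median_re, median_ratio)  # 10.0 1.0
```

A median ratio close to 1 indicates little systematic over- or under-estimation, while the median RE summarises the typical size of the error regardless of its sign.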

The findings of this research would help to recommend the most appropriate RFFA techniques in the 4th edition of Australian Rainfall and Runoff, which is due to be published in 2015.

STATEMENT OF AUTHENTICITY

I certify that all materials presented in this thesis are my own contribution, and that any work adopted from other sources is duly cited and referenced as such. This thesis contains no material that has been submitted for any award or degree at any other university or institution.

Kashif Aziz

ACKNOWLEDGMENTS

I would like to express my heartfelt gratitude to Associate Professor Ataur Rahman, who is not only a mentor of mine but a role model as well. This work would not have been possible without his support, encouragement and, most importantly, patience during the completion of this work. I am also grateful to Associate Professor Gu Fang and Associate Professor Surendra Shrestha for their valuable advice, support and constructive feedback towards the completion of this research. I could not be prouder of my academic roots and hope that I can in turn pass on the research values and the dreams that my supervisors have given to me.

I would not have contemplated this road if not for my parents, Mr. and Mrs. Choudhry Abdul Aziz (late), who instilled within me a love of knowledge and a spirit of struggle to achieve the goal, all of which finds a place in this thesis. To my parents, thank you. I sincerely acknowledge and appreciate the support and patience of my wife Rabia Rehman, who looked after me and our children during this study. I am also thankful to my family and friends in Australia and overseas for their prayers and encouragement.

To the staff and fellow students at the University of Western Sydney’s School of Computing, Engineering and Mathematics, I am grateful for your help, encouragement and the company I have enjoyed during my candidature. Thank you for welcoming me as a friend and for your moral support.

I would like to acknowledge the technical and financial support of all the related Government agencies for providing the resources towards the completion of this research.

Publications made (until June 2015) from this study

Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2014). Application of Artificial Neural Networks in Regional Flood Frequency Analysis: A Case Study for Australia, Stochastic Environmental Research and Risk Assessment, 28, 3, 541-554.

Aziz, K., Rai, S., Rahman, A. (2014). Design flood estimation in ungauged catchments using genetic algorithm based artificial neural network (GAANN) technique for Australia, Natural Hazards, 77, 2, 805-821.

Aziz, K., Rahman, A., Shamseldin, A.Y., Shoaib, M. (2013). Co-Active Neuro Fuzzy Inference System for Regional Flood Estimation in Australia, Journal of Hydrology and Environment Research, 1, 1, 11-20.

Aziz, K., Sohail, R., Rahman, A. (2014). Application of Artificial Neural Networks and Genetic Algorithm for Regional Flood Estimation in Eastern Australia, 35th Hydrology and Water Resources Symposium, Perth, Engineers Australia, 24-27 Feb, 2014.

Aziz, K., Rahman, A., Shamseldin, A., Shoaib, M. (2013). Regional flood estimation in Australia: Application of gene expression programming and artificial neural network techniques, 20th International Congress on Modelling and Simulation, 1 to 6 December, 2013, Adelaide, Australia, 2283-2289.

Aziz, K., Rahman, A., Fang, G., Shrestha, S. (2012). Comparison of Artificial Neural Networks and Adaptive Neuro-fuzzy Inference System for Regional Flood Estimation in Australia, Hydrology and Water Resources Symposium, Engineers Australia, 19-22 Nov 2012, Sydney, Australia.

Aziz, K., Rahman, A., Shrestha, S., Fang, G. (2011). Derivation of optimum regions for ANN based RFFA in Australia, 34th IAHR World Congress, 26 June – 1 July 2011, Brisbane, 17-24.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2011). Application of Artificial Neural Networks in Regional Flood Estimation in Australia: Formation of Regions Based on Catchment Attributes, The Thirteenth International Conference on Civil, Structural and Environmental Engineering Computing and CSC2011: The Second International Conference on Soft Computing Technology in Civil, Structural and Environmental Engineering, Chania, Crete, Greece, 6-9 September, 2011, 13 pp.

Aziz, K., Rahman, A., Fang, G., Haddad, K. and Shrestha, S. (2010). Design flood estimation for ungauged catchments: Application of artificial neural networks for eastern Australia, World Environmental and Water Resources Congress 2010, American Society of Civil Engineers (ASCE), 16-20 May 2010, Providence, Rhode Island, USA, pp. 2841-2850.

TABLE OF CONTENTS

ABSTRACT ...... II
STATEMENT OF AUTHENTICITY ...... IV
ACKNOWLEDGMENTS ...... V
PUBLICATIONS MADE (UNTIL DECEMBER 2014) FROM THIS STUDY ...... VI
LIST OF FIGURES ...... XII
LIST OF TABLES ...... XVIII
LIST OF SYMBOLS ...... XX
LIST OF ABBREVIATIONS ...... XXII

CHAPTER 1 ...... 1
INTRODUCTION ...... 1
  1.1 General ...... 1
  1.2 Background ...... 1
  1.3 Need for this research ...... 5
  1.4 Scope and objectives of the study ...... 6
  1.5 Research questions ...... 7
  1.6 Summary of research undertaken in this thesis ...... 8
  1.7 Outline of the thesis ...... 9

CHAPTER 2 ...... 12
REVIEW OF REGIONAL FLOOD FREQUENCY ANALYSIS METHODS ...... 12
  2.1 General ...... 12
  2.2 Design flood estimation methods ...... 12
    2.2.1 Streamflow-based flood estimation methods ...... 13
  2.3 Techniques for RFFA ...... 15
    2.3.1 Linear techniques ...... 15
    2.3.2 Non-linear RFFA techniques ...... 21
  2.4 Summary ...... 32

CHAPTER 3 ...... 34
METHODOLOGY ...... 34
  3.1 General ...... 34
  3.2 Methods adopted in the study ...... 34
    3.2.1 Artificial neural network (ANN) ...... 35
    3.2.2 Genetic algorithm based ANN (GAANN) ...... 39
    3.2.3 Gene-expression programming ...... 45
    3.2.4 Co-active neuro fuzzy inference system (CANFIS) ...... 47
    3.2.5 Quantile regression technique (QRT) ...... 51
    3.2.6 Cluster analysis ...... 53
    3.2.7 Principal component analysis (PCA) ...... 55
    3.2.8 Model validation technique ...... 55
  3.3 Summary ...... 56

CHAPTER 4 ...... 57
SELECTION OF STUDY AREA AND DATA PREPARATION ...... 57
  4.1 General ...... 57
  4.2 Selection of study area ...... 57
  4.3 Selection of study catchments ...... 58
    4.3.1 Factors considered for selection of catchments ...... 58
  4.4 Streamflow data preparation ...... 59
    4.4.1 Methods of streamflow data preparation ...... 59
    4.4.2 Tests for outliers ...... 60
    4.4.3 Trend analysis ...... 60
    4.4.4 Rating error analysis ...... 61
  4.5 Selection of catchment characteristics ...... 62
    4.5.1 Selection criteria ...... 62
    4.5.2 Catchment characteristics considered in this thesis ...... 63
    4.5.3 Rainfall intensity ...... 63
    4.5.4 Mean annual rainfall ...... 64
    4.5.5 Catchment area ...... 64
    4.5.6 Slope S1085 ...... 65
    4.5.7 Mean annual evapo-transpiration ...... 66
  4.6 Streamflow data preparation for various states ...... 66
    4.6.1 NSW and ACT ...... 66
    4.6.3 Queensland ...... 73
    4.6.4 Victoria ...... 76
  4.5 Flood frequency analysis ...... 81
  4.6 Summary of catchment characteristics data ...... 82
  4.7 Summary ...... 83

CHAPTER 5 ...... 84
SELECTION OF PREDICTOR VARIABLES FOR ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ...... 84
  5.1 General ...... 84
  5.2 Initial selection of predictor variables for artificial intelligence based RFFA models ...... 84
  5.3 Selection of predictor variables for ANN based RFFA models ...... 88
  5.4 Selection of predictor variables based on GEP models ...... 91
  5.5 Summary ...... 95

CHAPTER 6 ...... 96
SELECTION OF REGIONS ...... 96
  6.1 General ...... 96
  6.2 Description of candidate regions ...... 96
    6.2.1 Selection of the best performing region based on state, geographic and climatic boundaries ...... 97
  6.3 Regions based on catchment characteristics data ...... 100
    6.3.1 Cluster analysis ...... 100
    6.3.2 Principal component analysis ...... 105
  6.4 Summary ...... 111

CHAPTER 7 ...... 113
DEVELOPMENT OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ...... 113
  7.1 General ...... 113
  7.2 Training of artificial intelligence based RFFA models ...... 114
  7.3 Comparison of training and validation results ...... 120
    7.3.1 ANN ...... 120
    7.3.2 GAANN ...... 123
    7.3.3 GEP ...... 126
    7.3.4 CANFIS ...... 129
  7.4 Selection of the best performing artificial intelligence based RFFA model based on training ...... 131
  7.5 Summary ...... 133

CHAPTER 8 ...... 134
VALIDATION OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS ...... 134
  8.1 General ...... 134
  8.2 Validation of RFFA models ...... 134
    8.2.1 ANN ...... 134
    8.2.2 GAANN ...... 138
    8.2.3 GEP ...... 140
    8.2.4 CANFIS ...... 143
  8.3 Comparison of RFFA models based on validation data set ...... 145
    8.3.1 Median Qpred/Qobs ratio ...... 145
    8.3.2 Median RE (%) ...... 147
    8.3.3 Median CE ...... 149
    8.3.5 Comparison of RFFA models based on RE (%) ranges ...... 151
    8.3.6 Selection of the best performing artificial intelligence based RFFA model ...... 152
  8.4 Performance of the finally selected artificial intelligence based RFFA model ...... 153
    8.4.1 Spatial distribution of RE (%) of the ANN based RFFA model ...... 154
    8.4.2 Catchment area vs RE ...... 157
  8.5 Comparison with QRT ...... 158
  8.6 Summary ...... 159

CHAPTER 9 ...... 161
SUMMARY, CONCLUSIONS AND RECOMMENDATIONS ...... 161
  9.1 General ...... 161
  9.2 Summary of the research undertaken in this thesis ...... 161
  9.3 Conclusions ...... 163
  9.4 Recommendations for further research ...... 164

REFERENCES ...... 166
REFERENCES ...... 167
APPENDICES ...... 182
APPENDIX A ...... 183
APPENDIX B ...... 205

List of Figures

Figure 1.1 Flooding at Ipswich, Queensland 2011 (ABC News, Australia) ...... 2

Figure 1.2 Aerial view of the flooded south western town of Wagga Wagga, NSW in March 2012 (ABC News, 2012) ...... 3

Figure 1.3 Illustration of major steps in this research ...... 9

Figure 2.1 Various design flood estimation methods (modified from Rahman et al., 1998) ...... 13

Figure 3.1 Different RFFA techniques adopted in this study ...... 34

Figure 3.2 Structure of typical natural neuron (Source: http://staff.itee.uq.edu.au/janetw/cmc/chapters/Introduction/) ...... 35

Figure 3.3 Configuration of Feedforward Three-Layer ANN (ASCE, 2000) ...... 36

Figure 3.4 Basic idea of genetic algorithm (Sohail et al., 2005) ...... 43

Figure 3.5 Flow chart showing steps in GAANN model ...... 44

Figure 3.6 An example of assigning gene values of a chromosome to the respective synaptic weights of ANN architecture during a GAANN modelling ...... 45

Figure 3.7 GEP expression tree (ET) ...... 47

Figure 3.8 Fuzzy inference system (FIS) (Shi and Mizumoto, 2000) ...... 48

Figure 3.9 A typical structure of CANFIS (Parthiban and Subramanian, 2009) ...... 50

Figure 4.1 Location of the selected study area (coloured parts of the map) ...... 57

Figure 4.2 Result of trend analysis (Station 219001). Here Vk is the CUSUM test statistic defined in McGilchrist and Woodyer, 1975 ...... 68

Figure 4.3 Result of trend analysis – time series plot (Station 219001) ...... 69

Figure 4.4 Histogram of rating ratios for 106 stations from NSW ...... 69

Figure 4.5 Distribution of streamflow record lengths of 96 stations from NSW and ACT ...... 70

Figure 4.6 Distribution of catchment areas of 96 stations from NSW and ACT ...... 70

Figure 4.7 Geographical distributions of 96 catchments from NSW and ACT ...... 71

Figure 4.8 Distribution of streamflow record lengths of the selected stations from Tasmania ...... 72

Figure 4.9 Distribution of catchment areas of the selected stations from Tasmania ...... 72

Figure 4.10 Locations of selected catchments from Tasmania ...... 73

Figure 4.11 Distribution of streamflow record lengths of the selected 172 stations from QLD ...... 75

Figure 4.12 Distribution of catchment areas of the selected 172 stations from QLD ...... 75

Figure 4.13 Locations of the selected 172 stations from QLD ...... 76

Figure 4.14 Time series graph showing significant trends after 1995 ...... 78

Figure 4.15 CUSUM test plot showing significant trends after 1995 ...... 78

Figure 4.15 Histogram of rating ratios (RR) of AM flood data in Victoria (stations with record lengths > 25 years) ...... 79

Figure 4.16 Distributions of streamflow record lengths of the selected 131 stations from Victoria ...... 81

Figure 4.17 Distributions of catchment areas of the selected 131 catchments from Victoria ...... 81

Figure 4.18 Geographical distributions of the selected 131 catchments from Victoria ...... 82

Figure 4.19 Locations of the study catchments ...... 82

Figure 6.1 Plot of median Qpred/Qobs ratio values for different ARIs for selected regions ...... 99

Figure 6.2 Median relative error (%) values for different ARIs for selected regions ...... 100

Table 6.4 Regions/groups formation by cluster analysis...... 101

Figure 6.3 Dendrogram using average linkage between groups ...... 102

Figure 6.3 (a) Section of Dendrogram using average linkage between groups ...... 103

Figure 6.3 (b) Section of Dendrogram using average linkage between groups ...... 104

Figure 6.4 Scree plot from principal component analysis ...... 107

Figure 6.5 Grouping derived from PC1 vs PC2 plot based on PC1 ...... 107

Figure 6.6 Grouping derived from PC1 vs PC2 plot based on PC2 ...... 108

Figure 6.7 Median Qpred/Qobs ratio values for different ARIs for candidate regions ...... 110

Figure 6.8 Median relative error (%) values for different ARIs for candidate regions ...... 111

Figure 6.9 Comparison of median relative error (%) values between combined data set and grouping based on K-Means cluster analysis ...... 111

Figure 7.1 Plot of CE values of four artificial intelligence based RFFA models based on training data set ...... 114

Figure 7.2 Plot of median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on training data set ...... 115

Figure 7.3 Plot of median RE (%) values of four artificial intelligence based RFFA models based on training data set ...... 116

Figure 7.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20 (training data set) ...... 117

Figure 7.5 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20 (training data set) ...... 118

Figure 7.6 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20 (training data set) ...... 119

Figure 7.7 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20 (training data set) ...... 119

Figure 7.8 Plot comparing the CE values given by the training and validation data sets for the ANN based RFFA model ...... 121

Figure 7.9 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the ANN based RFFA model ...... 121

Figure 7.10 Plot comparing the median RE (%) values given by the training and validation data sets for the ANN based RFFA model...... 122

Figure 7.11 Regression plot comparing the training and validation of the ANN based RFFA model for Q20 .....122

Figure 7.12 Plot showing the training state of the ANN based RFFA model for Q20 ...... 123

Figure 7.13 Plot between Qobs and Qpred for the ANN based RFFA model for the validation data set ...... 123

Figure 7.14 Plot comparing the CE values given by the training and validation data sets for the GAANN based RFFA model ...... 125

Figure 7.15 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the GAANN based RFFA model ...... 125

Figure 7.16 Plot comparing the median RE (%) values given by the training and validation data sets for the GAANN based RFFA model ...... 126

Figure 7.17 Plot comparing the CE values given by the training and validation data sets for the GEP based RFFA model ...... 127

Figure 7.18 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the GEP based RFFA model ...... 128

Figure 7.19 Plot comparing the median RE (%) values given by the training and validation data sets for the GEP based RFFA model...... 128

Figure 7.20 Plot comparing the CE values given by the training and validation data sets for the CANFIS based RFFA model ...... 130

Figure 7.21 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the CANFIS based RFFA model ...... 130

Figure 7.22 Plot comparing the median RE (%) values given by the training and validation data sets for the CANFIS based RFFA model ...... 131

Figure 8.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20 ...... 135

Figure 8.2 Boxplot of relative error (RE) values for ANN based RFFA model ...... 136

Figure 8.3 Boxplot of Qpred/Qobs ratio values for ANN based RFFA model ...... 137

Figure 8.4 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20 .....138

Figure 8.5 Boxplot of relative error (RE) values for GAANN based RFFA model ...... 139

Figure 8.6 Boxplot of Qpred/Qobs ratio values for GAANN based RFFA model ...... 140

Figure 8.7 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20 ...... 141

Figure 8.8 Boxplot of relative error (RE) values for GEP based RFFA model ...... 142

Figure 8.9 Boxplot of Qpred/Qobs ratio values for GEP based RFFA model ...... 143

Figure 8.10 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20 ...144

Figure 8.11 Boxplot of relative error (RE) values for CANFIS based RFFA model ...... 145

Figure 8.12 Boxplot of Qpred/Qobs ratio values for CANFIS based RFFA model ...... 146

Figure 8.13 Plot of median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models ...... 148

Figure 8.14 Plot of median RE (%) values for the four artificial intelligence based RFFA models ...... 150

Figure 8.15 Plot of median CE values for the four artificial intelligence based RFFA models ...... 151

Figure 8.16 Spatial distribution of RE of ANN based model across NSW ...... 154

Figure 8.17 Spatial distribution of RE of ANN based model across VIC ...... 155

Figure 8.18 Spatial distribution of RE of ANN based model across North QLD ...... 156

Figure 8.19 Spatial distribution of RE of ANN based model across Southeast QLD ...... 156

Figure 8.20 Spatial distribution of RE of ANN based model across QLD ...... 157

Figure 8.21 Spatial distribution of RE of ANN based model across TAS ...... 157

Figure 8.22 Plot between catchment area and RE (%) values for ANN based RFFA model for 90 test catchments ...... 158

Figure B.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q2 (training data set) ...... 206

Figure B.2 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q5 (training data set) ...... 206

Figure B.3 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q10 (training data set) ...... 207

Figure B.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q50 (training data set) ...... 207

Figure B.5 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q100 (training data set) ...... 208

Figure B.6 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q2 (training data set) ...... 208

Figure B.7 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q5 (training data set) ...... 209

Figure B.8 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q10 (training data set) ...... 209

Figure B.9 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q50 (training data set) ...... 210

Figure B.10 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q100 (training data set) ...... 210

Figure B.11 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q2 (training data set) ...... 211

Figure B.12 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q5 (training data set) ...... 211

Figure B.13 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q10 (training data set) ...... 212

Figure B.14 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q50 (training data set) ...... 212

Figure B.15 Comparison of observed and predicted flood quantiles (training) for GEP based RFFA model for Q100 (training data set) ...... 213

Figure B.16 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model for Q2 (training data set) ...... 213

Figure B.17 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model for Q5 (training data set) ...... 214

Figure B.18 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model for Q10 (training data set) ...... 214

Figure B.19 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model for Q50 (training data set) ...... 215

Figure B.20 Comparison of observed and predicted flood quantiles (training) for CANFIS based RFFA model for Q100 (training data set) ...... 215

Figure B.21 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q2 ...... 216

Figure B.22 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q5 ...... 216

Figure B.23 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q10 ...... 217

Figure B.24 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q50 ...... 217

Figure B.25 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q100 ...... 218

Figure B.26 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q2 ...... 218

Figure B.27 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q5 ...... 219

Figure B.28 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q10 ...... 219

Figure B.29 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q50 ...... 220

Figure B.30 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q100 ...... 220

Figure B.31 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q2 ...... 221

Figure B.32 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q5 ...... 221

Figure B.33 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q10 ...... 222

Figure B.34 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q50 ...... 222

Figure B.35 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q100 ...... 223

Figure B.36 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q2 ...... 223

Figure B.37 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q5 ...... 224

Figure B.38 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q10 ...... 224

Figure B.39 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q50 ...... 225

Figure B.40 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q100 ...... 225

Figure B.41 Regression plot comparing the training and validation of the ANN based RFFA model for Q2 ...... 226

Figure B.42 Regression plot comparing the training and validation of the ANN based RFFA model for Q5 ...... 226

Figure B.43 Regression plot comparing the training and validation of the ANN based RFFA model for Q10 ....227

Figure B.44 Regression plot comparing the training and validation of the ANN based RFFA model for Q50 ....227

Figure B.45 Regression plot comparing the training and validation of the ANN based RFFA model for Q100 ...228

Figure B.46 Section of Dendrogram using average linkage between groups ...... 229

Figure B.47 Section of Dendrogram using average linkage between groups ...... 230

List of Tables

Table 3.1 Parameters used per run in GEP model ...... 47

Table 4.1 Summary statistics of the catchment characteristics data ...... 83

Table 5.1 Catchment characteristics predictor variables used in some previous RFFA studies ...... 85

Table 5.2 Various candidate models and catchment characteristics used ...... 87

Table 5.3 Comparison of eight different ANN based RFFA models using 90 independent test catchments ...... 89

Table 5.4 Rating on the basis of median Qpred/Qobs ratio ...... 90

Table 5.5 Grouping of stations on the basis of median Qpred/Qobs ratio using the criteria of Table 5.4 (ANN based RFFA models) ...... 90

Table 5.6 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio value using 90 independent test catchments (ANN based RFFA models) ...... 91

Table 5.7 Comparison of Model 1 and Model 2 on the basis of median relative error (RE) values using 90 independent test catchments (ANN based RFFA models) ...... 91

Table 5.8 Comparison of eight different GEP based RFFA models using 90 independent test catchments ...... 92

Table 5.9 Grouping of stations on the basis of median Qpred/Qobs ratio values using the criteria of Table 5.4 for GEP based RFFA models ...... 94

Table 5.10 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio values using 90 independent test catchments (for GEP based RFFA models) ...... 94

Table 5.11 Comparison of Model 1 and Model 2 on the basis of RE values using 90 independent test catchments (for GEP based RFFA models) ...... 94

Table 6.1 Description of candidate regions ...... 97

Table 6.2 Median Qpred/Qobs ratio values for seven ANN based candidate regions ...... 98

Table 6.3 Median relative error values (%) for seven ANN-based candidate regions ...... 99

Table 6.4 Regions/groups formation by cluster analysis...... 101

Table 6.5 ANN based RFFA model performances for cluster groupings A1 & A2...... 105

Table 6.6 ANN- based RFFA model performances for cluster groupings B1 & B2 ...... 105

Table 6.7 Eigenvalues and variance explained by the principal components ...... 106

Table 6.8 Component matrix in principal component analysis ...... 106

Table 6.9 Descriptive statistics of standardised variables ...... 107

Table 6.10 Grouping based on principal component analysis ...... 109

Table 6.11 Median Qpred/Qobs ratio values for seven candidate regions ...... 110

University of Western Sydney XIX

Artificial Intelligence Based RFFA Aziz

Table 6.12 Median relative error (%) ...... 110

Table 7.1 CE values of four artificial intelligence based RFFA models based on training data set ...... 114

Table 7.2 Median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on training data set ...... 115

Table 7.3 Median RE (%) values of four artificial intelligence based RFFA models (training) ...... 116

Table 7.4 Comparison of training and validation results for the ANN based RFFA model...... 120

Table 7.5 Comparison of training and validation results for the GAANN based RFFA model ...... 124

Table 7.6 Comparison of training and validation results for the GEP based RFFA model ...... 127

Table 7.8 Comparison of training and validation results for the CANFIS based RFFA model ...... 129

Table 7.9 Ranking of the four artificial intelligence based RFFA models with respect to training ...... 132

Table 7.10 Ranking of the four artificial intelligence based RFFA models with respect to agreement between training and validation ...... 132

Table 8.1 Median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models ...... 147

Table 8.2 Median RE (%) values for the four artificial intelligence based RFFA models ...... 149

Table 8.3 Median CE values of the four artificial intelligence based RFFA models ...... 150

Table 8.4 Grouping of 90 test catchments based on RE (%) ranges for the four artificial intelligence based RFFA models ...... 152

Table 8.5 Ranking of the four artificial intelligence based RFFA models for eastern Australia...... 153

Table 8.6 Median Qpred/Qobs ratio values for seven ANN based candidate regions and QRT ...... 159

Table 8.7 Median relative error values (%) for seven ANN based candidate regions and QRT ...... 159

Table 8.8 Coefficient of efficiency (CE) values for seven ANN based candidate regions and QRT ...... 159

University of Western Sydney XX

List of symbols

A Catchment area (km²)
bj The threshold value associated with the node j
β0 Regression coefficient
C Runoff coefficient
CY Dimensionless runoff coefficient for ARI of Y years
d Sub-storm duration (h)
Ei Elevation at ith level (m)
E Mean annual areal evapotranspiration (mm)
f Activation function
g The binary gene
I Rainfall intensity (mm/s)
Itc,Y Average rainfall intensity for time of concentration tc and Y years ARI (mm/h)
J Node in neural networks
L Mainstream length (km)
l Length of a chromosome
n Number of samples and points
pc Crossover rate
pm Mutation rate
Q Flood discharge (m³/s)
Q2 Flood peak discharge for 2 years average recurrence interval (ARI) (m³/s)
QE Estimated flow (m³/s)
QM Maximum measured flow (m³/s)
Qobs Observed flood quantile (m³/s)
Q̄ Mean of Qobs (m³/s)
Qpred Predicted flood quantile (m³/s)
QY Peak flow rate for an ARI of Y years (m³/s)
QT Peak flow rate for T years (m³/s)
R2 Coefficient of determination
R Mean annual rainfall (mm/h)
S1085 Slope of central 75% of mainstream (m/km)
tc Time of concentration (h)
T Return period (average recurrence interval) (year)
Vk CUSUM test statistic
Wj An input vector of jth node
wij The connection weight from the ith node
w Value of a synaptic weight
xmaxabs Absolute maximum difference
xn nth input variable
X An input vector

List of abbreviations

BoM Bureau of Meteorology
ACT Australian Capital Territory
AEP Annual exceedance probability
AM Annual maximum
ANFIS Adaptive neuro fuzzy inference system
ANN Artificial neural network
ARI Average recurrence interval
ARMA Autoregressive Moving Average
ARR Australian Rainfall and Runoff
AUSIFD Software for Intensity-frequency-duration
BP Backpropagation
BGLS Bayesian Generalised Least Square
BGLS-ROI Bayesian Generalised Least Square - Region-of-influence
BITRE Bureau of Infrastructure, Transport and Regional Economics
CANFIS Co-active neuro fuzzy inference system
CE Coefficient of Efficiency
CD Compact Disc
DOW Department of Water
Elman Elman partial recurrent neural network
ETs Expression Trees
FFBP Feedforward Backpropagation
FFA Flood frequency analysis
FFN Fuzzy Neural Network
FIS Fuzzy inference system
FLIKE Flood frequency analysis software
GA Genetic algorithm
GAANN Genetic algorithm based artificial neural network
GB Grubbs and Beck
GEP Gene expression programming
GIUH Geomorphologic Instantaneous Unit Hydrograph
GLS Generalised Least Square
IFM Index Flood Method
I. E. Australia Institution of Engineers Australia
IFD Intensity-frequency-duration or design rainfall depth
IM Instantaneous Maximum
LGP Linear Genetic Programming
LM Levenberg-Marquardt
LP3 Log Pearson Type 3
LR Logistic Regression
MATLAB MATrix LABoratory
MF Membership Function
MINITAB Statistical Package
MLFN Multilayer Feedforward Neural Network
MMD Monthly Maximum Mean Daily
MSE Mean Squared Error
NCWE National Committee on Water Engineering
NFS Neuro Fuzzy System
NSW New South Wales
NRW Department of Natural Resources & Water
OLS Ordinary Least Square
PCA Principal Component Analysis
pdf Probability density function
PRM Probabilistic Rational Method
QRT Quantile Regression Technique
r Ratio of predicted and observed flood quantile
RE Relative error
RFFA Regional flood frequency analysis
ROI Region of Influence
RR(s) Rating ratio(s)
SDRR Summer Dominated Rainfall Region
SWMM Storm Water Management Model
TAS Tasmania
TDNN Time Delay Neural Network
TSK Takagi, Sugeno and Kang (Fuzzy model)
UK United Kingdom
USGS United States Geological Survey
VIC Victoria
WDRR Winter Dominated Rainfall Region

CHAPTER 1

INTRODUCTION

1.1 General

This thesis focuses on regional flood estimation using non-linear techniques based on artificial intelligence. The techniques considered are artificial neural network (ANN), genetic algorithm based artificial neural network (GAANN), gene-expression programming (GEP) and co-active neuro fuzzy inference system (CANFIS). The thesis aims to explore and enhance these techniques so that they can be applied to ungauged and poorly gauged catchments in Australia to obtain accurate design flood estimates. This chapter presents the background to the research, the need for the research, the research questions investigated, the research tasks undertaken and an outline of the thesis.

1.2 Background

Flooding is one of the worst natural disasters: it disrupts services, damages infrastructure, crops and property, and sometimes causes loss of human life. For example, the 2010-11 floods in Queensland caused 35 deaths. The effects on industry and other production units, and the health costs associated with flooding, add to the overall losses to the Australian economy. In Australia, the average annual flood damage is worth over $377 million, and infrastructure requiring design flood estimates is worth over $1 billion per annum (BITRE, Australia). The state of New South Wales (NSW) alone has an average annual flood damage cost of over $172 million, which is almost 46% of the average annual cost for Australia. Queensland ranks second in terms of flood damage, with an average annual cost of $125 million. Importantly, the devastating 2010-11 flood in Queensland caused damage of over $5 billion (Queensland Reconstruction Authority, 2011). Figure 1.1 shows the flooding of Ipswich in Queensland during the 2010-11 floods. Figure 1.2 shows an aerial view of the flooded south-western town of Wagga Wagga, NSW, in March 2012.


Figure 1.1 Flooding at Ipswich, Queensland 2011 (ABC News, Australia)

Floods are caused by factors such as heavy rainfall, snowmelt, dam break and cyclones. Catchment and land use characteristics determine the magnitude of flooding from a given rainfall event. Urbanisation and clearing increase the flood risk for a given catchment. Flooding is a serious problem not only in rural areas but also in urban areas, where increased impervious area raises the runoff volume and shortens the response time. Climate change has increased the frequency and magnitude of extreme rainfall events, resulting in many devastating floods in recent years (Ishak et al., 2013; Ishak and Rahman, 2014). The Australian Bureau of Meteorology (BOM), in its State of the Climate 2014 report, stated: “An increase in the number and intensity of extreme rainfall events is projected for most regions”. This implies that there will be more extreme floods in most regions of Australia (BOM, 2014).

Flood damage can be minimised by providing drainage infrastructure with optimum capacity. Under-design of these structures increases flood damage costs, whereas over-design incurs unnecessary expense. The optimum design of drainage infrastructure depends largely on reliable estimation of the design flood, which is the flood discharge associated with a given annual exceedance probability (AEP).

Design flood estimation is required in numerous engineering applications, e.g. the design of bridges, culverts, weirs, spillways, detention basins, flood protection levees and highways, as well as floodplain modelling, flood insurance studies and flood damage assessment. For design flood estimation, the most direct method is flood frequency analysis, which requires a long period of recorded streamflow data at the site of interest. This is not a feasible option at many locations because streamflow records are absent or limited. As at 1993, of the 12 drainage divisions in Australia, seven did not have a stream with 20 or more years of data (Vogel et al., 1993). Australian Rainfall and Runoff (ARR) 1987 recommended various design flood estimation techniques for ungauged catchments in different regions of Australia (I. E. Aust., 1987, 2001). The methods in ARR have not been upgraded since 1987, although an additional 20 years of streamflow data have become available and there have been notable developments in both at-site and regional flood frequency analysis techniques in Australia and internationally.

Figure 1.2 Aerial view of the flooded south western town of Wagga Wagga, NSW in March 2012 (ABC News, 2012)

Different regional flood estimation methods have been proposed for different parts of Australia (I. E. Aust., 1987). Among these, various forms of the rational method and the index flood method are the most common. However, these methods have not been updated since 1987. Because of changing climatic conditions and improvements in regional flood estimation methods in recent years, there is a need to look for new regional flood estimation techniques for Australia. Recent developments in regional flood estimation in Australia include the L-moments based index flood method (Bates et al., 1998; Rahman et al., 1999), various forms of regression techniques (Rahman, 2005; Haddad et al., 2006, 2008, 2009, 2014; Haddad and Rahman, 2012; Hackelbush et al., 2009; Zaman et al., 2012; Micevski et al., 2014) and regional Monte Carlo simulation (Rahman et al., 2002; Caballero and Rahman, 2014).

Regional flood frequency analysis (RFFA) is the generic name for techniques that utilise streamflow data from gauged catchments in a region to estimate design floods for poorly gauged or ungauged catchments. The use of RFFA enables the “transfer” of flood characteristics information from gauged to ungauged catchments (Bloschl and Sivapalan, 1997; Pallard et al., 2009). The most commonly adopted RFFA methods are described in Cunnane (1988) and Hosking and Wallis (1997). RFFA essentially consists of two principal steps: (a) formation of regions and (b) development of prediction equations.

Regions have traditionally been formed based on geographic, political, administrative or physiographic boundaries (e.g. NERC, 1975; I. E. Aust., 1987). Regions have also been formed in catchment characteristics data space using multivariate statistical techniques (e.g. Acreman and Sinclair, 1986; Nathan and McMahon, 1990; Rao and Srinivas, 2008; Guse et al., 2010). Regions can also be formed using a region-of-influence approach where a certain number of catchments based on proximity in geographic or catchment attributes space are pooled together based on some objective function to form an optimum region (e.g. Burn, 1990; Zrinji and Burn, 1994; Kjeldsen and Jones, 2009; Haddad and Rahman, 2012).

For developing the regional flood prediction equations, the commonly used techniques include the rational method, the index flood method and the quantile regression technique (QRT). The rational method has been widely adopted for estimating design floods for small ungauged catchments (e.g. Mulvany, 1851; I. E. Aust., 1987; Jiapeng et al., 2003; Pegram and Parak, 2004; Rahman et al., 2011). The index flood method, which relies heavily on the identification of homogeneous regions, has been widely adopted in many countries (Dalrymple, 1960; Hosking and Wallis, 1993; Bates et al., 1998; Rahman et al., 1999; Kjeldsen and Jones, 2010; Ishak et al., 2011). The QRT, proposed by the United States Geological Survey (USGS), has been applied by many researchers using either an Ordinary Least Square (OLS) or a Generalised Least Square (GLS) regression technique (e.g. Benson, 1962; Thomas and Benson, 1970; Tasker, 1980; Stedinger and Tasker, 1985; Tasker et al., 1986; Madsen et al., 1997; Pandey and Nguyen, 1999; Bayazit and Onoz, 2004; Rahman, 2005; Griffis and Stedinger, 2007; Ouarda et al., 2008; Kjeldsen and Jones, 2009; Haddad and Rahman, 2011; Haddad et al., 2011, 2012).

Most of the above RFFA methods assume a linear relationship between flood statistics and predictor variables in the log domain when developing the regional prediction equations. However, most hydrologic processes are non-linear and exhibit a high degree of spatial and temporal variability, and a simple log transformation cannot guarantee linearity in modelling. Artificial intelligence based methods such as artificial neural networks (ANN), genetic algorithm based ANN (GAANN), gene expression programming (GEP) and co-active neuro-fuzzy inference system (CANFIS) have therefore been applied in water resources engineering, for example in rainfall runoff modelling and hydrologic forecasting. However, there have been relatively few studies applying these techniques to RFFA (e.g. Daniell, 1991; Muttiah et al., 1997; Shu and Burn, 2004; Kothyari, 2004; Dawson et al., 2006; Shu and Ouarda, 2007, 2008). Importantly, there has been no known application of artificial intelligence based techniques to RFFA in Australia. Applying these techniques may help to develop new and improved RFFA techniques for Australia. Unlike the regression based approach, artificial intelligence based techniques do not impose a fixed model structure on the data; rather, the data itself identifies the model form.
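To make the log-domain linearity assumption concrete, the following minimal sketch fits a prediction equation of the form log10(Q) = b0 + b1 log10(A) + b2 log10(I) by ordinary least squares. It is an illustration only: the data are synthetic, and the function names, the choice of two predictors (catchment area A and rainfall intensity I) and the coefficient values are assumptions, not the prediction equations developed in this thesis.

```python
import numpy as np

def fit_log_linear_qrt(area_km2, intensity_mm_h, q_obs):
    """Fit log10(Q) = b0 + b1*log10(A) + b2*log10(I) by ordinary least squares."""
    area_km2 = np.asarray(area_km2, dtype=float)
    intensity_mm_h = np.asarray(intensity_mm_h, dtype=float)
    # Design matrix: intercept column plus log-transformed predictors.
    X = np.column_stack([
        np.ones_like(area_km2),
        np.log10(area_km2),
        np.log10(intensity_mm_h),
    ])
    y = np.log10(np.asarray(q_obs, dtype=float))
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def predict_quantile(coeffs, area_km2, intensity_mm_h):
    """Back-transform the log-domain prediction to a flood quantile."""
    b0, b1, b2 = coeffs
    return 10.0 ** (b0 + b1 * np.log10(area_km2) + b2 * np.log10(intensity_mm_h))

# Synthetic, noise-free demonstration data (hypothetical values).
rng = np.random.default_rng(0)
area = rng.uniform(10.0, 1000.0, size=50)
intensity = rng.uniform(5.0, 60.0, size=50)
true_coeffs = np.array([-0.5, 0.7, 1.2])
q_true = predict_quantile(true_coeffs, area, intensity)

coeffs = fit_log_linear_qrt(area, intensity, q_true)
```

Because the synthetic quantiles are generated from exactly this log-linear form, the fit recovers the assumed coefficients. With real catchment data, any departure from this form ends up in the residuals, which is the limitation that motivates the non-linear techniques.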

This research seeks to fill the knowledge gap in RFFA by undertaking development and testing of artificial intelligence based RFFA models using the most extensive and comprehensive database that has become available in Australia as a part of the on-going revision of the Australian Rainfall and Runoff.

1.3 Need for this research

Flooding is one of the worst natural disasters, causing millions of dollars of damage each year in Australia. To reduce flood damage, accurate design flood estimates are needed for the design of infrastructure such as bridges, culverts and flood protection levees. Australia is the sixth largest country in the world and has numerous streams. Most of these streams are ungauged or poorly gauged, as monitoring such a large number of streams is too expensive. Moreover, many of these streams are located far from townships. Design flood estimation in small to medium sized ungauged catchments is of great economic significance (Pilgrim and Cordery, 1993). Flood estimation for ungauged catchments is one of the most important tasks in hydrologic practice, as it covers a large number of catchments where hundreds of structures are built each year in Australia. The accuracy of flood estimates for ungauged catchments is important, as over-estimation results in higher construction costs and under-estimation increases flood damage. Hence, the development of new and more accurate RFFA techniques is important, since it will help in designing adequate infrastructure that allows flood water to pass safely.

In Australia, only linear modelling techniques have been adopted so far in developing RFFA models. The application of non-linear techniques such as artificial intelligence based methods may provide a viable alternative RFFA technique for Australia. It would also allow the results of traditional RFFA models to be benchmarked against those derived from artificial intelligence based RFFA models.

The findings of this research would help to recommend the most appropriate RFFA techniques in the 4th edition of Australian Rainfall and Runoff, which is due to be published in 2015.

1.4 Scope and objectives of the study

The study focuses on the regional flood estimation problem; in particular, it investigates whether artificial intelligence based RFFA techniques can be applied to eastern Australia. This requires a critical literature review of RFFA techniques, selection of study catchments, collation of flood, climatic and catchment characteristics data, delineation of regions, identification of the best set of predictor variables, training and validation of artificial intelligence based RFFA models, and comparison with other RFFA techniques.

The objectives of this study are:

• To carry out a critical literature review of RFFA methods, with a particular emphasis on non-linear artificial intelligence based techniques, and to identify the gaps in the current state of knowledge and further research opportunities in applying artificial intelligence based techniques to the regional flood estimation problem.

• To select the study area and catchments from eastern Australia, to collate streamflow data, to select catchment characteristics that govern the flood generation process and to prepare the climatic and catchment characteristics data set for RFFA modelling.

• To select the best performing set of predictor variables for the artificial intelligence based RFFA models.

• To form different candidate regions based on (i) state boundaries, (ii) climatic and geographical boundaries and (iii) catchment characteristics data using multivariate statistical techniques, and to identify the best performing region(s) for artificial intelligence based RFFA modelling.

• To train the artificial intelligence based RFFA models based on ANN, GAANN, GEP and CANFIS.

• To validate the artificial intelligence based RFFA models using the validation data set and select the best performing model.

• To compare the best performing artificial intelligence based RFFA model with the linear quantile regression technique.

• To draw conclusions based on the results obtained in the study.

1.5 Research questions

This thesis is devoted to answering the following research questions in relation to the development of artificial intelligence based RFFA models for Australia.

• Can artificial intelligence based techniques be applied to RFFA in Australia?

• What is the best set of predictor variables for the development of artificial intelligence based RFFA models in Australia?

• Which region(s) perform best in artificial intelligence based RFFA modelling for Australia, considering regions based on state boundaries, climatic and geographical boundaries, and regions formed in catchment characteristics data space using multivariate statistical techniques?

• How can the various artificial intelligence based RFFA models be trained/calibrated?

• Among the different artificial intelligence based RFFA models (ANN, GAANN, GEP and CANFIS), which one provides the most accurate flood quantile estimates for eastern Australia?

• How do artificial intelligence based RFFA models compare with the linear quantile regression technique?

1.6 Summary of research undertaken in this thesis

The main research tasks undertaken in this thesis to answer the research questions posed in Section 1.5 are outlined below. Figure 1.3 illustrates the major steps in this research.

• Perform a literature review on RFFA and critically examine the advantages, disadvantages, limitations and assumptions associated with various RFFA techniques, with a particular emphasis on non-linear artificial intelligence based techniques. Based on the literature review, identify the gaps in the current state of knowledge and further research opportunities in applying non-linear artificial intelligence based techniques to regional flood estimation.

• Select the study area and catchments. Prepare streamflow data by filling gaps in the annual maximum flood series and checking for outliers, rating curve error and trends. Select catchment characteristics that govern flood generation and prepare the climatic and catchment characteristics data set.

• Select the best performing set of predictor variables for the artificial intelligence based RFFA models by comparing various combinations of the initially selected candidate catchment characteristics variables.

• Form different candidate regions based on (i) state boundaries, (ii) climatic and geographical boundaries and (iii) catchment characteristics data using multivariate statistical techniques. Compare the performances of the candidate regions and select the best performing region for artificial intelligence based RFFA modelling.

• Develop artificial intelligence based RFFA models based on ANN, GAANN, GEP and CANFIS. Train each model using the training data set (80% of the selected catchments) by minimising the mean squared error between the observed and predicted flood quantiles for a given ARI. Evaluate the training of the model based on a number of statistical criteria: plot of predicted and observed flood quantiles, median ratio of predicted and observed flood quantiles, median relative error and coefficient of efficiency.

• Validate the artificial intelligence based RFFA models using the validation data set (20% of the selected catchments) and select the best performing model.

• Compare the best performing artificial intelligence based RFFA model with the linear quantile regression technique.

Figure 1.3 Illustration of major steps in this research
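The numerical evaluation criteria named in the research tasks above (median ratio of predicted to observed quantiles, median relative error and coefficient of efficiency) can be sketched as follows. This is a minimal illustration that assumes the conventional definitions: ratio = Qpred/Qobs, RE = |Qpred - Qobs|/Qobs × 100, and CE = 1 - Σ(Qobs - Qpred)²/Σ(Qobs - Q̄)², where Q̄ is the mean of Qobs. The function name and the sample values are hypothetical, and the exact definitions adopted later in the thesis may differ in detail.

```python
import numpy as np

def evaluate_quantile_predictions(q_pred, q_obs):
    """Return median Qpred/Qobs ratio, median relative error (%) and CE."""
    q_pred = np.asarray(q_pred, dtype=float)
    q_obs = np.asarray(q_obs, dtype=float)

    ratio = q_pred / q_obs
    # Absolute relative error, expressed as a percentage (assumed definition).
    re_percent = np.abs(q_pred - q_obs) / q_obs * 100.0
    # Coefficient of efficiency: 1 minus the ratio of the error sum of squares
    # to the sum of squares of Qobs about its mean.
    ce = 1.0 - np.sum((q_obs - q_pred) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

    return {
        "median_ratio": float(np.median(ratio)),
        "median_re_percent": float(np.median(re_percent)),
        "ce": float(ce),
    }

# Hypothetical observed and predicted flood quantiles (m3/s) for three catchments.
q_obs = [100.0, 200.0, 400.0]
q_pred = [110.0, 180.0, 400.0]
scores = evaluate_quantile_predictions(q_pred, q_obs)
```

A median ratio close to 1, a small median relative error and a CE close to 1 would together indicate good agreement between predicted and observed quantiles.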

1.7 Outline of the thesis

The research undertaken in this study is presented in nine chapters and four appendices, as outlined below.

Chapter 1 presents a brief introduction to the research, including its background. It also presents the need for this research, the research questions examined and the main research tasks undertaken to answer them.

Chapter 2 presents a critical review of RFFA techniques with a particular emphasis on non-linear techniques such as artificial neural network (ANN), co-active neuro-fuzzy inference system (CANFIS), genetic algorithm (GA) based ANN (GAANN) and gene-expression programming (GEP). At the beginning, various methods of flood estimation are discussed. Linear methods, including the rational method, the index flood method and the regression method, are then reviewed. The non-linear artificial intelligence based methods are then discussed with a particular emphasis on their applications to hydrology. The assumptions, limitations, advantages and disadvantages of each of the RFFA methods are discussed. The current state of knowledge in RFFA, in particular of the artificial intelligence based methods, is ascertained and the scope for further research is identified.

Chapter 3 describes the mathematical tools adopted in this study. First, ANN is discussed, which is followed by a description of GAANN, GEP and CANFIS. The quantile regression technique is then discussed. The principles of cluster analysis and principal component analysis are then presented. Finally, the adopted model validation technique is discussed.

Chapter 4 presents selection of study area, study catchments and data preparation. First, criteria for selection of study catchments are presented. The methods of streamflow data preparation are discussed which include gap filling, outlier detection, trend analysis and rating curve error analysis. Selection of catchment characteristics is then presented. The preparation of annual maximum flood series data is then described. Estimation of flood quantiles for average recurrence intervals of 2, 5, 10, 20, 50 and 100 years for the selected gauged catchments by at-site flood frequency analysis is then presented. Finally, a summary of the catchment characteristics data is provided.

Chapter 5 presents the results of selecting the set of predictor variables for the development of artificial intelligence based RFFA models. First, an initial selection is made based on the findings of previous studies. These candidate sets of predictor variables are then evaluated using ANN and GEP based RFFA models. The final set of predictor variables is then selected.

Chapter 6 presents the formation of regions using ANN based RFFA modelling technique. Regions/groupings are first formed on the basis of state, geographical and climatic boundaries. In the second step, the regions are formed in the catchment characteristics data space based on cluster analysis and principal component analysis. All these candidate regions are then compared and the best performing region is finally selected.

Chapter 7 presents the development of artificial intelligence based RFFA models using ANN, GAANN, GEP and CANFIS, based on the predictor variables selected in Chapter 5 and the optimum region identified in Chapter 6. The model development involves training each model using a randomly selected part of the data set. For this purpose, 80% (362 catchments) of the total 452 catchments are used to train the model (training data set) and the remaining 20% (90 catchments) are used to validate the model (validation data set). A number of statistical criteria are adopted to assess the training of the four artificial intelligence based RFFA models.
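The random 80/20 partition described for Chapter 7 can be sketched as below. This is only an illustration of one way to obtain 362 training and 90 validation catchments from 452: the integer catchment identifiers, the function name and the random seed are assumptions, and the thesis does not state that the split was produced this way.

```python
import numpy as np

def split_catchments(catchment_ids, train_fraction=0.8, seed=42):
    """Randomly split catchment identifiers into training and validation sets."""
    ids = np.asarray(catchment_ids)
    rng = np.random.default_rng(seed)      # fixed seed so the split is repeatable
    shuffled = rng.permutation(ids)        # shuffled copy; input order is untouched
    n_train = int(round(train_fraction * len(ids)))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers 1..452 standing in for the study catchments.
catchment_ids = np.arange(1, 453)
train_ids, validation_ids = split_catchments(catchment_ids)
```

Keeping the validation catchments entirely out of training is what allows them to serve as independent test catchments when the models are compared.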

Chapter 8 presents the validation of the artificial intelligence based RFFA models and quantile regression technique. Initially the four artificial intelligence based RFFA models are compared with each other to select the best artificial intelligence based RFFA model. Secondly, the best performing artificial intelligence based RFFA model is compared with the quantile regression technique. The spatial distribution of the relative error for the finally selected model is evaluated. Finally, the relationship of the relative error with catchment area is investigated.

Chapter 9 presents the summary of the research undertaken in this thesis, conclusions and recommendations for further research.

Appendix A presents the list of the study catchments. This provides the area of each catchment and the period of streamflow records.

Appendix B presents additional results to supplement the discussion presented in the main body of the thesis.


CHAPTER 2

REVIEW OF REGIONAL FLOOD FREQUENCY ANALYSIS METHODS

2.1 General

Regional flood frequency analysis (RFFA) is the generic name for techniques that utilise data from gauged catchments (donors) in a region to estimate design floods for poorly gauged and ungauged catchments (receivers). There are many RFFA techniques, ranging from simple approximate methods to complex artificial intelligence based techniques. The rational method is based on runoff coefficients, which are developed and applied on the principle of geographical contiguity. The index flood method is based on the concept of homogeneous regions that share a common set of growth factors, while regression based approaches rely on regional prediction equations. These methods are generally developed using linear models; however, there are also non-linear RFFA methods based on artificial intelligence, such as the artificial neural network (ANN). This chapter presents a review of various RFFA methods, in particular the non-linear artificial intelligence based techniques, with emphasis on the limitations of the various methods, recent advancements and scope for further development.

2.2 Design flood estimation methods

Different methods can be used to estimate a design flood for a given annual exceedance probability (AEP), average recurrence interval (ARI) or return period (T). The ARI of the annual peak streamflow at a given location changes if there are significant changes in the flow patterns at that location, possibly caused by an impoundment or diversion of flow. The effect of development (change of land use from forested or agricultural uses to commercial, residential and industrial uses) on peak flows is generally much greater for low ARIs than for higher ones. During larger floods, the upper soil column is generally fully saturated and does not have the capacity to absorb much additional rainfall. Under these conditions, essentially all of the rain that falls, whether on paved surfaces or on saturated soil, runs off and becomes streamflow. The selection of a flood estimation method for a given application largely depends on data availability and the purpose of the flood estimates (Hoang, 2001). Lumb and James (1976), Feldman (1979), James and Robinson (1986) and Australian Rainfall and Runoff (ARR) (I. E. Australia, 1987) classified design flood estimation methods into two broad categories: streamflow-based methods and rainfall-based methods. These are discussed below and illustrated in Figure 2.1.

2.2.1 Streamflow-based flood estimation methods

Streamflow-based flood estimation methods base the analysis entirely on recorded data from the stream gauging station in question and are applicable to gauged catchments with a considerably long streamflow record. In these methods, the design floods for a given AEP are estimated by undertaking a flood frequency analysis (FFA) of the observed streamflow data. In this context, a gauged catchment is one where records of flood height and flood flow exist at the location of interest over a considerable period of time, normally 20 years or longer, so that the parameters of the assumed probability distribution can be estimated with a reasonably high degree of confidence. Gauging locations are generally found within a large catchment at points of interest such as the confluence of two major creeks or the outlet of the catchment. FFA and regional flood frequency analysis (RFFA) are the most common streamflow-based methods and are discussed below. It should be noted that RFFA methods generally consider catchment characteristics in the estimation, whereas FFA depends solely on streamflow records.

Figure 2.1 Various design flood estimation methods (modified from Rahman et al., 1998)

Flood frequency analysis (FFA)

Flood frequency analysis (FFA) is a procedure for analysing recorded flood data using statistical methods. Statistical techniques such as FFA are used to estimate the AEP of flood or rainfall events. The ARI gives a general indication of how frequently a given discharge or rainfall will be exceeded on average over a long period of time. The main objective of this statistical analysis is to develop a relationship between the magnitude of extreme flood events and their frequency of occurrence through the use of probability distributions (Chow et al., 1988). For the analysis to be of practical use, simpler distributions are often used to characterise the relation between flood magnitudes and their frequencies (Rao and Hamed, 2000). The discussion here deals mainly with direct frequency analysis, where a record of floods at or near the design site is available. These methods are primarily applied to flood peaks. They may sometimes be applied to flood volumes or even monthly maximum floods; however, little evidence is available on appropriate types of probability distributions in these cases (I. E. Australia, 1998). In terms of flood data, annual maximum flood data are more frequently adopted in FFA than partial series flood data.
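As an illustration of at-site FFA, the sketch below fits a Gumbel (EV1) distribution to an annual maximum series by the method of moments and reads off flood quantiles. The choice of distribution and fitting method, the function names and the flood series itself are assumptions made for illustration only; they are not the procedure adopted in this thesis, and the AEP is taken here simply as the reciprocal of the return period T.

```python
import math
from statistics import mean, stdev

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Estimate Gumbel location and scale by the method of moments."""
    scale = stdev(annual_maxima) * math.sqrt(6.0) / math.pi
    location = mean(annual_maxima) - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period_years):
    """Flood magnitude whose annual exceedance probability is 1/T."""
    aep = 1.0 / return_period_years
    return location - scale * math.log(-math.log(1.0 - aep))


# Hypothetical 20-year annual maximum flood series (m^3/s), illustration only.
annual_maxima = [120.0, 85.0, 210.0, 160.0, 95.0, 300.0, 140.0, 175.0,
                 110.0, 230.0, 130.0, 190.0, 105.0, 260.0, 150.0, 90.0,
                 205.0, 170.0, 125.0, 240.0]

location, scale = fit_gumbel_moments(annual_maxima)
for t in (2, 10, 50, 100):
    q = gumbel_quantile(location, scale, t)
    print(f"T = {t:>3} years: {q:.1f} m^3/s")
```

With only 20 years of record, the estimates for T = 50 and T = 100 years are extrapolations well beyond the data, which is one reason regional information is used to support at-site estimates.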

Regional flood frequency analysis (RFFA)

Regional flood frequency analysis (RFFA) is a means of transferring flood frequency information from gauged catchments to another site on the basis of similarity in catchment characteristics (I. E. Aust., 1987). This procedure is important for estimating design floods at ungauged sites. It can also stabilise site estimates using regional relationships, particularly for parameters such as skew, which are more prone to small-sample errors and data extremes. In addition, regional relationships can mitigate the effects of outliers and can lead to more reliable extrapolation of the flood frequency curve to rarer frequencies. Although RFFA is more commonly applied to ungauged catchments, it can also be adopted to enhance design flood estimates at gauged sites where the record length is limited.

The use of RFFA enables the transfer of flood characteristics information from gauged to ungauged sites if the donor catchments are hydrologically similar to the receiver ungauged site. The last couple of decades have seen extensive research on RFFA, with the aim of developing new, improved and reliable techniques for flood estimation. Because of the breadth of the field and the diversity of climatic conditions and site characteristics, different researchers have emphasised different issues relevant to RFFA. In the seventies and early eighties much effort was spent on developing efficient at-site FFA procedures, but the late eighties proved to be of considerable significance in developing new and improved RFFA techniques (e.g. Greis and Wood, 1983; Potter, 1987; Kirby and Moss, 1987; Cunnane, 1987; NRC, 1988; and WMO, 1989). In the late eighties, many researchers suggested comparing the existing RFFA methods and looking for better information/data instead of developing new methods (Potter, 1987; Bobee et al., 1993a).

Many RFFA methods involve two major steps: (1) grouping of sites into homogeneous regions, and (2) developing a regional estimation method. The grouping of sites into homogeneous regions is a major factor in the performance of many regional estimation methods, in particular the index flood method. Geographically contiguous regions have been used for a long time in hydrology, but have been criticised for being arbitrary. In fact, geographical proximity does not guarantee hydrological similarity. During the last five to ten years researchers have attempted to develop methods in which similarity between sites is defined in a multidimensional space of catchment or statistical characteristics (Douglas, 1995).

RFFA is needed to estimate design floods at locations where there is a lack of sufficient recorded flood data. Recorded flood data are insufficient at many locations because stream gauges are quite expensive to operate and many streams are in remote locations. Regional analyses can, to some extent, compensate for the lack of temporal data, but they introduce a spatial dimension which is not always well understood. Classical flood frequency analysis, be it at-site or regional, has been criticised for lacking balance, for putting too much emphasis on mathematical rigour while completely neglecting the understanding of the physical factors that cause flood events (Klemes, 1993). According to Klemes (1993), “If more light is to be shed on the probabilities of hydrological extremes then it will have to come from more information on the physics of the phenomena involved, not from more mathematics.” This is difficult to argue against. RFFA, in particular the identification of the physical or meteorological catchment characteristics that cause similarity in flood response, is a step in the right direction (e.g. Bates et al., 1998).

2.3 Techniques for RFFA

2.3.1 Linear techniques

Three linear RFFA methods are very common and are currently in use in most parts of the world:

• Rational method;

• Index flood method; and

• Regression method.

Rational method

The rational method is a simple technique for estimating a design discharge from a small watershed. It was developed by Mulvany (1851) for small drainage basins in urban areas, and has been widely regarded as a deterministic method for estimating the peak discharge from an individual storm. In Australian Rainfall and Runoff (ARR), a probabilistic form of the rational method, known as the Probabilistic Rational Method, has been recommended (I. E. Aust., 1987). Application of the rational method is based on a simple formula that relates peak discharge to the average rainfall intensity over a particular length of time (the time of concentration) and the catchment area. The formula is:

$$Q_Y = 0.278 \, C_Y \, I_{t_c,Y} \, A \qquad (3.1)$$

where

$Q_Y$ = peak discharge (m³/s) for an average recurrence interval (ARI) of Y years;

$C_Y$ = runoff coefficient (dimensionless) for an ARI of Y years;

$A$ = area of catchment (km²); and

$I_{t_c,Y}$ = average rainfall intensity (mm/h) for a design duration of $t_c$ hours and an ARI of Y years.
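A short worked example may help to fix the units in the formula. The factor 0.278 is the unit conversion 1/3.6: a rainfall intensity of 1 mm/h over 1 km² is equivalent to 1000 m³ per 3600 s, or about 0.278 m³/s. The function name and the input values below are hypothetical and chosen for illustration only.

```python
def rational_peak_discharge(runoff_coeff, intensity_mm_per_h, area_km2):
    """Peak discharge (m^3/s) from the rational formula Q = 0.278 * C * I * A."""
    return 0.278 * runoff_coeff * intensity_mm_per_h * area_km2


# Hypothetical catchment: C = 0.35, I = 42 mm/h, A = 12.5 km^2.
q = rational_peak_discharge(0.35, 42.0, 12.5)
print(f"{q:.1f} m^3/s")  # prints "51.1 m^3/s"
```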

This model is based on the following assumptions:

• The rainfall occurs uniformly over the drainage area;

• The peak rate of runoff can be reflected by the rainfall intensity averaged over a time period equal to the time of concentration of the drainage area; and

• The frequency of runoff is the same as the frequency of the rainfall used in the equation.

The use of the rational formula is subject to several limitations and procedural issues:

• The most important limitation is that the only output from the method is a peak discharge (the method provides only an estimate of a single point on the runoff hydrograph).

• Even the simplest application of the method permits, and requires, wide latitude for subjective judgment by the user. The results are therefore difficult to replicate.

• The average rainfall intensities used in the formula have no time sequence relation to the actual rainfall pattern during the storm.

• The computation of tc should include the overland flow time plus the time of flow in open and/or closed channels to the point of design.

• The runoff coefficient, CY, is usually estimated from a map of runoff coefficients, which is produced on the assumption of geographical contiguity, i.e. that the runoff coefficients of nearby catchments vary in a smooth fashion. This assumption is unlikely to be satisfied as there is no guarantee that two nearby catchments are hydrologically similar.

• Many users assume that the entire drainage area is the value to be entered in the rational method equation. In some cases, the runoff from only the interconnected impervious area yields the larger peak flow rate.

In Australia, the Probabilistic Rational Method has been researched by Pilgrim and McDermott (1982), Adams (1987), Weeks (1991), Rahman et al. (2008; 2011) and Pirozzi et al. (2009). There has been limited independent validation of the Probabilistic Rational Method, and the user has little idea of the uncertainty in the flood quantiles estimated by this method (Rahman and Hollerbach, 2003).

There have been a few attempts to improve the rational method using a more advanced statistical treatment, such as that of Franchini et al. (2005).

Index flood method

The index flood method (IFM), introduced by Dalrymple (1960), is the most widely used method of RFFA. It is based on the identification of a homogeneous region, within which the probability distribution of annual maximum peak flows is invariant except for a scale factor represented by the index flood (either the mean or median flood). Homogeneity with regard to the index flood relies on the concept that the standardised flood peaks from individual sites in the region follow a common probability distribution with identical parameter values. Of all the methods discussed in this thesis, this approach involves the strongest assumption of homogeneity.

The flood peak discharge with an assigned return period T at the selected site is expressed as the product of two terms: the scale factor of the site (the index flood) and the dimensionless growth factor, which has regional validity, i.e. it is fixed within a region. In general, the index flood is taken as the average of the annual maximum flood peaks at the site of interest. For an ungauged site, the index flood is estimated from a regional prediction equation that uses climate and catchment characteristics as predictor variables.
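The product form described above can be sketched in a few lines. The growth factors and the annual maximum series below are hypothetical values chosen for illustration; in practice the growth curve would come from a regional analysis, and at an ungauged site the index flood would come from a regional prediction equation rather than from observed data.

```python
from statistics import mean


def index_flood_quantiles(annual_maxima, growth_factors):
    """Scale regional growth factors by the at-site index flood (mean annual flood)."""
    index_flood = mean(annual_maxima)
    return {t: index_flood * g for t, g in growth_factors.items()}


# Hypothetical regional growth curve: return period T (years) -> growth factor.
growth_factors = {2: 0.9, 10: 1.8, 50: 2.9, 100: 3.4}
# Hypothetical annual maximum floods (m^3/s) at the site; their mean is 100.
annual_maxima = [80.0, 120.0, 100.0, 140.0, 60.0]

quantiles = index_flood_quantiles(annual_maxima, growth_factors)
for t, q in quantiles.items():
    print(f"T = {t:>3} years: {q:.0f} m^3/s")
```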

The literature contains numerous studies on the identification of homogeneous groups of catchments and the estimation of the growth factor (Reed et al., 1999; Burn and Goel, 2000; Castellarin et al., 2001), and relatively few on estimating the index flood. Recent studies in Australia (Bates et al., 1998; Rahman et al., 1999) assigned ungauged catchments to a particular homogeneous group, identified through the use of L-moments (Hosking and Wallis, 1993), on the basis of catchment and climatic characteristics as opposed to geographical proximity. However, a deficiency of this approach was that it required 12 catchment/climatic descriptors. Its practical use is therefore somewhat limited by its complexity and the time needed to gather the relevant data. At the international level, Fill and Stedinger (1998) and Jeong et al. (2008) both demonstrated that the IFM can provide improved quantile estimation when different sources of error, such as sampling error and error due to inter-station correlation, are reduced. As Australian hydrology is extremely diverse, there is greater heterogeneity among catchments, and the use of the IFM in Australia is limited (Bates et al., 1998) because its results would be subject to substantial error. A method is therefore needed in Australia in which the assumption of homogeneity can be relaxed and heterogeneity can be accounted for by capturing the variability from site to site within a region. One such approach is the quantile regression technique, which is discussed below.

Australian Rainfall and Runoff (ARR) (I. E. Aust., 1987) did not favour the IFM as a design flood estimation technique. The IFM has been criticised on the basis that the coefficient of variation of the flood series may vary approximately inversely with catchment area, thus resulting in flatter flood frequency curves for larger catchments. This had been noticed particularly in the case of humid catchments that differed greatly in size (Dawdy, 1961; Benson, 1962; Riggs, 1973; Smith, 1992).

L-moments based index flood methods have been widely researched in recent years (e.g. Bates et al., 1998; Rahman et al., 1999; Zhang and Hall, 2004; Saf, 2009).

Regression method

The quantile regression technique (QRT) for flood estimation was proposed by the United States Geological Survey (USGS). In this method a large number of gauged catchments are selected from a region and flood quantiles are estimated from recorded streamflow data. These quantiles are then regressed against the climatic and catchment variables that are most likely to govern the flood generation process. Benson (1962) suggested that the T-year flood quantile could be estimated directly from catchment characteristics data by multiple regression analysis. Unlike the index flood approach, this method does not assume a constant coefficient of variation (Cv) of the annual maximum flood series in the region. It has been noted that the method can give design flood estimates that do not vary smoothly with T; however, hydrological judgment can be exercised in such situations to adjust the flood frequency curve so that it increases smoothly with T.

The regression coefficients in the QRT are generally estimated by two methods:

• Ordinary least squares approach (OLS)

• Generalised least squares approach (GLS)

The OLS approach has traditionally been used by hydrologists to estimate the regression coefficients in regional hydrological models. However, for the OLS model to be statistically efficient and robust, the annual maximum flood series in the region must be uncorrelated, all the sites in the region should have equal record lengths and all estimates of T-year events should have equal variance. Since the annual maximum flow data in a region do not generally satisfy these assumptions, the assumption that the model residual errors in OLS are homoscedastic is violated, and the OLS approach can provide distorted estimates of the model’s predictive precision (model error) and of the precision with which the regression model coefficients are estimated (Stedinger and Tasker, 1985).
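A minimal sketch of an OLS fit for the QRT is given below. It assumes a log-log model with catchment area as the only predictor, log10(Q_T) = b0 + b1 log10(A); the model form, the function names and the synthetic data are assumptions made for illustration, and a practical QRT would use several predictor variables and, as discussed in this section, would need to account for unequal record lengths and inter-site correlation.

```python
import math


def fit_loglog_ols(areas_km2, quantiles):
    """OLS fit of log10(Q_T) = b0 + b1 * log10(A); returns (b0, b1)."""
    x = [math.log10(a) for a in areas_km2]
    y = [math.log10(q) for q in quantiles]
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict_quantile(b0, b1, area_km2):
    """Back-transform the fitted log-log equation to a flood quantile."""
    return 10.0 ** (b0 + b1 * math.log10(area_km2))


# Synthetic, error-free data generated from Q10 = 5 * A^0.7 (illustration only).
areas = [10.0, 50.0, 100.0, 400.0, 900.0]
q10 = [5.0 * a ** 0.7 for a in areas]

b0, b1 = fit_loglog_ols(areas, q10)
print(f"Q10 = {10 ** b0:.2f} * A^{b1:.2f}")
print(f"Estimate for a 200 km^2 ungauged catchment: {predict_quantile(b0, b1, 200.0):.1f} m^3/s")
```

Because the synthetic data contain no error, the fit recovers the generating coefficient and exponent; with real quantile estimates the residuals would reflect both model error and sampling error.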

Stedinger and Tasker (1985) proposed the GLS procedure to overcome the above-mentioned problems with OLS. This approach can be used to estimate the parameters of regional hydrologic regression models and can produce more accurate results than OLS, in particular when the record length varies widely from site to site. In the GLS model, the assumptions of equal variance of the T-year events and zero cross-correlation for concurrent flows are relaxed. A number of studies have dealt with the QRT in a GLS regression framework (e.g. Tasker, 1980; Kuczera, 1983; Tasker et al., 1986; Rosbjerg and Madsen, 1995; Madsen et al., 1997; Pandey and Nguyen, 1999; Bayazit and Onoz, 2004; Griffis and Stedinger, 2007; and Kjeldsen and Jones, 2009). All of these studies have looked at ways of minimising uncertainty in flood quantile estimation.

Regression based methods for estimating flood quantiles have been a focus of research in Australia in recent years, for example the quantile regression technique (Rahman, 2005; Haddad et al., 2006, 2008, 2009, 2014) and the parameter regression technique (Hackelbusch et al., 2009; Haddad and Rahman, 2012).

Different regional flood estimation methods have been proposed for different parts of Australia (I. E. Aust., 1987, 2001). Among these, various forms of the rational method and the index flood method are the most common. However, these methods have not been updated since 1987. Because of changing climatic conditions and improvements in regional flood estimation methods in recent years, there is a need to look for new regional flood estimation techniques for different parts of Australia. Recent developments in regional flood estimation in Australia include the L-moments based index flood method (Bates et al., 1998; Rahman et al., 1999) and various forms of regression techniques (Rahman, 2005; Haddad et al., 2006, 2008, 2009; Hackelbusch et al., 2009).

Most of the above RFFA methods assume a linear relationship between flood statistics and predictor variables. However, most hydrologic processes are nonlinear and exhibit a high degree of spatial and temporal variability. Non-linear methods such as artificial neural network (ANN), adaptive neuro fuzzy inference system (ANFIS), co-active neuro fuzzy inference system (CANFIS), gene expression programming (GEP), genetic algorithm (GA) and genetic algorithm based artificial neural network (GAANN) have been applied in hydrology in different parts of the world. However, there has not been any notable application of these techniques to the RFFA problem in Australia. The application of nonlinear techniques may help develop new and improved regional flood estimation methods for Australia. Unlike regression based approaches, these techniques do not impose a fixed model structure on the data; rather, the data themselves identify the model form through the use of artificial intelligence. Various nonlinear RFFA methods are discussed below.

2.3.2 Non-linear RFFA techniques

a) Artificial neural network (ANN)

An ANN is a mathematical or computational model that simulates the structure and/or functional aspects of biological neural networks. Structurally, it is an interconnected group of artificial neurons that process information using a connectionist approach to computation. In most cases, an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. An important aspect of ANN is its ability to model complex relationships between inputs and outputs or to find patterns in data.
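The idea of an adaptive network of interconnected neurons can be illustrated with a very small example. The sketch below trains a network with one hidden layer (tanh hidden neurons, linear output) by full-batch gradient descent on the mean squared error, with gradients obtained by back-propagation. The architecture, learning rate, function names and synthetic data are all assumptions made for illustration; they are not the ANN configuration developed in this thesis.

```python
import math
import random


def train_ann(inputs, targets, n_hidden=3, epochs=500, lr=0.1, seed=0):
    """Train a one-hidden-layer network; returns (weights, loss history)."""
    rng = random.Random(seed)
    n_in = len(inputs[0])
    n = len(inputs)
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b2 = 0.0
    history = []
    for _ in range(epochs):
        gw1 = [[0.0] * n_in for _ in range(n_hidden)]
        gb1 = [0.0] * n_hidden
        gw2 = [0.0] * n_hidden
        gb2 = 0.0
        loss = 0.0
        for x, t in zip(inputs, targets):
            # Forward pass through the hidden layer and the linear output neuron.
            h = [math.tanh(sum(w * xi for w, xi in zip(w1[j], x)) + b1[j])
                 for j in range(n_hidden)]
            y = sum(w2[j] * h[j] for j in range(n_hidden)) + b2
            err = y - t
            loss += err * err / n
            # Backward pass: accumulate gradients of the mean squared error.
            d_out = 2.0 * err / n
            gb2 += d_out
            for j in range(n_hidden):
                gw2[j] += d_out * h[j]
                d_hid = d_out * w2[j] * (1.0 - h[j] ** 2)
                gb1[j] += d_hid
                for i in range(n_in):
                    gw1[j][i] += d_hid * x[i]
        history.append(loss)
        # Gradient descent update of all weights and biases.
        b2 -= lr * gb2
        for j in range(n_hidden):
            w2[j] -= lr * gw2[j]
            b1[j] -= lr * gb1[j]
            for i in range(n_in):
                w1[j][i] -= lr * gw1[j][i]
    return (w1, b1, w2, b2), history


def predict(weights, x):
    """Network output for a single input vector."""
    w1, b1, w2, b2 = weights
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(w1, b1)]
    return sum(w * hj for w, hj in zip(w2, h)) + b2


# Synthetic predictors scaled to [0, 1] and a non-linear target (illustration only).
inputs = [[0.1, 0.2], [0.4, 0.3], [0.5, 0.9], [0.8, 0.6], [0.9, 0.1], [0.3, 0.7]]
targets = [0.5 * a + 0.3 * b * b for a, b in inputs]

weights, history = train_ann(inputs, targets)
print(f"MSE at first epoch: {history[0]:.4f}, at last epoch: {history[-1]:.4f}")
```

In an RFFA setting the inputs would be scaled catchment and climatic characteristics and the target a flood quantile, with separate data held back for validation and testing.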

The development of ANN began in the 1940s (McCulloch and Pitts, 1943), inspired by a desire to understand the human brain and emulate its functioning. Within the last decade, it has experienced a huge resurgence due to the development of more sophisticated algorithms and the emergence of powerful computational tools. Extensive research has been devoted to investigating the potential of ANN as a computational tool that acquires, represents and computes a mapping from one multivariate input space to another (Wasserman, 1989).

ANN techniques experienced a renaissance in the eighties due to the work of Hopfield (1982) on iterative auto-associative neural networks. Interest in this computational mechanism has grown tremendously since Rumelhart et al. (1986) rediscovered a mathematically rigorous theoretical framework for neural networks, i.e. the back-propagation algorithm. Consequently, ANN has been applied in various fields such as neurophysiology, physics, biomedical engineering, electrical engineering, computer science, acoustics, cybernetics, robotics, image processing and finance.

In the early nineties, ANN was applied successfully in hydrology. Initially, it was used for rainfall-runoff modelling, streamflow forecasting, groundwater modelling, water quality, water management policy, precipitation forecasting, hydrologic time series modelling and reservoir operations.

Application of ANN in hydrology

Most hydrologic processes are highly nonlinear and exhibit a high degree of spatial and temporal variability. They are further complicated by uncertainty in parameter estimates. Hydrologists are often confronted with problems of prediction and estimation of quantities such as runoff, precipitation, contaminant concentrations, and water stages. This kind of information is required in hydrologic and hydraulic engineering design as well as water resources management (ASCE, 2000).

Application of neural networks in hydrological modeling was inspired by the work on forecasting mapping (predictors) for chaotic dynamic systems (Farmer and Sidorowich, 1987). It followed a theorem, proven by Takens et al. (1981), that there exists a smooth function that is a predictor of a dynamic system featuring an attractor with a finite fractal dimension.

The ASCE Task Committee on Application of ANN in Hydrology (ASCE, 2000) stated that ANN would have to be classified as empirical models. The approach is called a “model” as it has many features in common with other modelling approaches in hydrology. Empirical models treat hydrologic systems (such as a watershed) as a black box and try to find a relationship between historical inputs (rainfall, temperature, etc.) and outputs (such as watershed runoff measured at a stream gauge). Lumped catchment models fall under this category (Blackie and Eeles, 1985). These methods need long historical records and have no physical basis and, as such, are not applicable to ungauged catchments. It was suggested that physical understanding can be useful in selecting the appropriate neural network (ASCE, 2000). As ANN is a heavily data based technique, the committee suggested that optimal data may be provided with limitation and with certain conditions based on existing sites.

An improvement over these kinds of models is the geomorphology-based model (e.g., Gupta and Waymire, 1993; and Corradini and Singh, 1985). These models represent the watershed structure and the stream network well, but various assumptions need to be made concerning the linearity of the response of individual watershed units (streams and overland sections).

ANN has been used in many rainfall and runoff forecasting applications. For example, Luk et al. (2001) used ANN models for rainfall forecasting in Australia. They identified three types of ANN suitable for this application: multilayer feedforward neural network (MLFN), Elman partial recurrent neural network (Elman) and time delay neural network (TDNN). They found that these ANN models can make reasonable forecasts of rainfall one time step (15 minutes) ahead for 16 gauges concurrently.

A different approach was taken by Zhang and Govindaraju (2003), who applied geomorphology based ANN (GANN) to the estimation of direct runoff over watersheds in Indiana, USA. They concluded that GANN offer a promising step towards elevating ANN from purely empirical models to models that are based on geomorphology. In their analysis, they found that GANN outperformed geomorphologic instantaneous unit hydrograph (GIUH) models.

Abrahart and See (2007) concluded that the power of an ANN model depends on the use of a reduced set of inputs. Chokmani et al. (2008) compared the results from ANN and multiple regression techniques for ice-affected streamflow estimation in Canada. They used nine different variables as inputs and found that ANN outperformed the regression techniques.

There have been some applications of ANN models in RFFA. Muttiah et al. (1997) used ANN for 2-year flood prediction in catchments across the USA. For each gauging station, the 2-year peak discharge, drainage area, basin elevation and average slope were extracted from a file containing 150 variables per gauging station for statistical and neural network analysis. They concluded that ANN can provide reasonable estimates of the Q2 discharge with simpler input requirements (input vector reductions). Kothyari (2004) used ANN for flood estimation in ungauged catchments in India. He selected data from 97 catchments spread over a large part of India, with areas ranging from 14.5 km² to 935,000 km². He considered five different catchment characteristics as predictor variables, including mean annual flood discharge, area, slope of catchment, rainfall and vegetation cover. He compared two scenarios: scenario 1 with 12 neurons in the hidden layer and scenario 2 with only 1 neuron in the hidden layer. He found that scenario 2 provided the best results, with minimum error and the best R² values for the training, validation and testing data sets. He also reported that an ANN model with a more complex architecture than the one used in scenario 1 did not produce any better results. He suggested that the results from ANN models can be improved if the region is formed on the basis of hydro-meteorological similarity.

Dawson et al. (2006) applied ANN using different site descriptors for flood estimation at ungauged catchments in the UK. They found that ANN could be used to estimate flood statistics for ungauged catchments quite successfully. While the ANN in their study were trained to model T-year flood magnitudes derived from the Gumbel distribution, they could just as easily be trained to model floods derived from any other distribution. Although it would have been possible to use conventional statistical approaches to build models for predicting T-year flood events, the ANN proved to be superior in their study. However, a few caveats were noted. Firstly, the ANN was heavily data dependent. This was highlighted by the improvements in skill achieved by training the ANN on the full available data set instead of a limited (urban) data set. Secondly, the ANN could not explicitly account for physical processes, reducing confidence in model predictions. Finally, despite limiting the analysis to those sites that had at least ten years of record, the limited data at certain sites meant that some T-year flood events and index floods could be grossly under- or over-estimated. This was exacerbated when the data included periods of long-term drought or above average long-term rainfall. In those cases, the ANN might be predicting the T-year flood event accurately but, with only limited observed data, evaluation of skill could be problematic. Dawson et al. (2006) recommended the partitioning of data on the basis of size, geology and climatic conditions. They also recommended the application of other models such as radial basis function networks and support vector machines.

Turan and Yurdusev (2009) applied feedforward backpropagation neural networks, generalized regression neural networks and fuzzy logic to estimate unmeasured data using the data of four runoff gauging stations on the River Birs in Switzerland. The performance of these models was measured by the mean square error, coefficient of determination and coefficient of efficiency to choose the best-fit model. Of these techniques, they found that the feedforward backpropagation (FFBP) model should be preferred over the other models for predicting the flows at a station. Based on the findings of this study, it was concluded that the best method for modelling river flows should be sought case by case, taking into account the flow values considered as well as the specific characteristics of the basin that feeds the river and the climatic conditions, which may vary from year to year. Such exercises may be useful in practice to estimate the missing values of a downstream station from those of upstream stations.
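The three performance measures named above can be sketched as follows. The source does not give their formulas, so the definitions used here are assumptions: the coefficient of efficiency is taken in its Nash-Sutcliffe form and the coefficient of determination as the squared Pearson correlation between observed and simulated values. The function names and the sample values are hypothetical.

```python
def mean_square_error(observed, simulated):
    """Average squared difference between observed and simulated values."""
    return sum((o - s) ** 2 for o, s in zip(observed, simulated)) / len(observed)


def coefficient_of_efficiency(observed, simulated):
    """Nash-Sutcliffe form: 1 - SSE / squared deviations of observations from their mean."""
    o_bar = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - sse / sst


def coefficient_of_determination(observed, simulated):
    """Squared Pearson correlation between observed and simulated values."""
    o_bar = sum(observed) / len(observed)
    s_bar = sum(simulated) / len(simulated)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(observed, simulated))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    return cov * cov / (var_o * var_s)


# Hypothetical observed and simulated flows (m^3/s), illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
simulated = [12.0, 18.0, 33.0, 37.0]

print(f"MSE = {mean_square_error(observed, simulated):.2f}")
print(f"CE  = {coefficient_of_efficiency(observed, simulated):.3f}")
print(f"R2  = {coefficient_of_determination(observed, simulated):.3f}")
```

Under these definitions the coefficient of determination is insensitive to systematic bias, whereas the mean square error and coefficient of efficiency penalise it, which is one reason for reporting more than one measure.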

The application of ANN requires careful consideration, as highlighted by Maier and Dandy (2000), who reviewed 43 papers dealing with the use of ANN models for the prediction and forecasting of water resources variables. They found that feedforward networks were used in all but two of the papers reviewed. The vast majority of these networks were trained using the backpropagation algorithm. They noted that issues relating to the optimal division of the available data, data pre-processing and the choice of appropriate model inputs were seldom considered. In addition, the process of choosing appropriate stopping criteria and optimising network geometry and internal network parameters was generally described poorly or carried out inadequately. All of the above factors could result in non-optimal model performance and an inability to draw meaningful comparisons between different models.

However, one limitation of ANN is that, like other empirical methods, they are unable to reliably extrapolate beyond the range of the data used for model calibration (Flood and Kartam, 1994; Minns and Hall, 1996; Tokar and Johnson, 1999). This well-known limitation of data-driven models arises primarily because they are not based on the underlying physics. Physically-based models tend to perform better at extrapolation for inputs outside the range of the calibration data, as the mass and energy constraints they comply with may still result in an appropriate response. Accordingly, it can be very difficult to determine when data-driven models such as ANN will fail to generalise, and to understand the range of applicability of the model. This is true for all RFFA techniques, e.g. the index flood method, the rational method and regression based methods.
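One simple safeguard against unintended extrapolation is to check whether the predictor values of a target catchment lie within the range of the calibration data before a data-driven model is applied. The sketch below performs such a check one variable at a time; the function name, variable names and values are hypothetical. A univariate range check is only a necessary condition, since a combination of predictors can be unusual even when each value is individually within range.

```python
def outside_calibration_range(calibration_rows, new_row):
    """Return the names of predictors in new_row outside the calibration min/max."""
    flagged = []
    for name, value in new_row.items():
        values = [row[name] for row in calibration_rows]
        if value < min(values) or value > max(values):
            flagged.append(name)
    return flagged


# Hypothetical calibration catchments and one ungauged target catchment.
calibration = [
    {"area_km2": 12.0, "intensity_mm_h": 30.0},
    {"area_km2": 450.0, "intensity_mm_h": 55.0},
    {"area_km2": 900.0, "intensity_mm_h": 41.0},
]
ungauged = {"area_km2": 1500.0, "intensity_mm_h": 38.0}

print(outside_calibration_range(calibration, ungauged))  # prints ['area_km2']
```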

ANN has been used in various parts of the world; however, its application to RFFA is very limited. In Australia, ANN has been applied to hydrological problems other than RFFA, but to the author’s knowledge there is no notable ANN based RFFA study in Australia.

b) Genetic algorithm based artificial neural network (GAANN)

The genetic algorithm (GA) was invented by John Holland during the 1960s and 1970s (Holland, 1975) and was later popularised by one of his students, who solved a difficult problem involving the control of gas pipeline transmission for his dissertation (Goldberg, 1989). The concept of GA evolved from the biological evolutionary process. The major difference between GA and classical optimisation search techniques is that GA works with a population of possible solutions, whereas the classical techniques work with a single solution (Jain et al., 2005). GA is based on a Darwinian-type survival of the fittest strategy, whereby potential solutions to a problem compete and mate with each other in order to produce increasingly stronger individuals. Each individual in the population represents a potential solution to the problem to be solved and is referred to as a chromosome (Rooji et al., 1996).

A number of selection techniques have been developed by various researchers, such as ‘roulette wheel’ (Holland, 1975), ‘stochastic universal sampling’ (Baker, 1987), ‘sigma scaling or truncation’ (Goldberg, 1989), ‘Boltzmann selection’ (de la Maza and Tidor, 1993), ‘rank selection’ (Baker, 1985) and ‘tournament selection’ (Goldberg and Deb, 1991); however, their success and utility depend upon the nature of the problem at hand.
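Two of the selection schemes named above can be illustrated with a minimal Python sketch. It assumes non-negative fitness values where larger is better; the function names, the toy population and the tournament size are illustrative assumptions and are not taken from the cited studies.

```python
import random

def roulette_wheel_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness.

    Assumes all fitness values are non-negative and at least one is positive.
    """
    total = sum(fitnesses)
    pick = rng.random() * total          # a point on the "wheel", in [0, total)
    running = 0.0
    for individual, fitness in zip(population, fitnesses):
        running += fitness
        if pick < running:
            return individual
    return population[-1]                # guard against floating-point round-off

def tournament_select(population, fitnesses, rng, k=2):
    """Draw k distinct individuals at random and return the fittest of them."""
    contenders = rng.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitnesses[i])
    return population[best]

# Toy population (hypothetical): four candidate solutions and their fitness.
rng = random.Random(0)
population = ["a", "b", "c", "d"]
fitnesses = [1.0, 2.0, 3.0, 4.0]
parent_1 = roulette_wheel_select(population, fitnesses, rng)
parent_2 = tournament_select(population, fitnesses, rng, k=2)
```

Roulette-wheel selection is sensitive to the scale of the fitness values, whereas tournament selection depends only on their ranking, which is one reason the suitability of a scheme depends on the problem.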

Application of GA in hydrology

In the fields of hydrology and water resources, GA techniques have been used widely to solve a number of problems (Wang, 1991; Franchini, 1996; Franchini and Galeati 1997; Savic et al., 1999; Khu et al., 2001; Cheng et al., 2002); however, the combined use of GA and ANN, i.e. GAANN, has not yet attracted much attention from researchers. A probable reason is that the backpropagation (BP) algorithm is simpler and easier to understand than GA; hence, most ANN applications in the literature have used BP. Hybrid GA and ANN applications in the water resources field are therefore limited. In one such study, Jain and Srinivasulu (2004) demonstrated that GA is better than BP for training an ANN model to predict daily flows accurately.

Morshed and Kaluarachchi (1998) conducted experiments to compare GA and BP in streamflow and transport simulations. They reported better performance of BP over GA, but noted that their results were based on a single set of simulations and that more research is needed to develop GA as a complement to BP for situations where BP may fail. See and Openshaw (1999) recombined a series of neural networks via a rule based fuzzy logic model that had been optimized using a GA. Abrahart et al. (1999) also used a GA to optimize the inputs to an ANN model used to forecast runoff from a small catchment. Rao and Jamieson (1997) used a hybrid neural network and genetic algorithm approach to investigate the minimum-cost design of a pump-and-treat aquifer remediation scheme. Wu and Chau (2006) applied neural networks and GA in flood forecasting for a reach in the middle section of the Yangtze River in China. All three techniques, i.e. ANN, GA and GAANN, were applied separately. They concluded that, when care was taken to avoid over-fitting, the hybrid GAANN model produced more accurate flood predictions for the channel. According to the authors, models such as ANN and GAANN could be considered feasible alternatives to conventional models, and different types of hybrid techniques are worth exploring.

In the field of RFFA there are a few studies using backpropagation ANN (BPANN) (Dawson et al., 2005; Aziz et al., 2013), but to the best of the author’s knowledge there has been no notable application of GAANN in RFFA, especially using Australian conditions and data.

b) Gene-expression programming (GEP)

Gene-expression programming (GEP) is, like GA and genetic programming (GP), a genetic algorithm in the broad sense, as it uses a population of individuals, selects them according to fitness, and introduces genetic variation using one or more genetic operators (Mitchell, 1996). The fundamental difference between the three algorithms resides in the nature of the individuals: in GA the individuals are linear strings of fixed length (chromosomes); in GP the individuals are nonlinear entities of different sizes and shapes (parse trees); and in GEP the individuals are encoded as linear strings of fixed length (the genome or chromosomes) which are afterwards expressed as nonlinear entities of different sizes and shapes (i.e., simple diagram representations or expression trees).

GEP is an evolutionary computing method that generates a ‘transparent’ and structured representation of the rainfall-runoff system being studied. The nature of GEP allows the user to gain additional information on how the system performs, i.e., gives an insight into the relationship between input (e.g. rainfall and evaporation) and output (flood runoff) data. One of the additional advantages of this approach over the neural combination method is the model’s ability to represent itself in the form of mathematical expressions (Fernando et al., 2009).

GEP (which is an extension of GP (Koza, 1992)), is a search technique that evolves computer programs (e.g., mathematical expressions, decision trees, polynomial constructs, and logical expressions). Computer programs generated by GEP are encoded in linear chromosomes and are then expressed or translated into expression trees (ETs). GEP is a comprehensive genotype/phenotype system, with the genotype totally separated from the phenotype, whereas in GP, genotype and phenotype are mixed together in a simple replicator system (Ferreira, 2001a, b; Guven and Aytek, 2009).
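The translation from a linear chromosome to an expression tree can be sketched in a few lines of Python. This is a minimal illustration, assuming a single gene, a small hypothetical function set (`+`, `-`, `*` and `Q` for square root) and a breadth-first reading of the gene; it is not the GEP software used in the cited studies.

```python
import math
import operator

# Hypothetical function set: symbol -> (arity, implementation).
# Any symbol not listed here is treated as a terminal (an input variable).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
    "Q": (1, math.sqrt),
}

def decode_gene(gene):
    """Translate a linear gene into a nested expression tree.

    The gene is read left to right and the tree is filled level by level
    (breadth-first). Symbols beyond the last one needed are non-coding.
    Each node is a list of the form [symbol, child, child, ...].
    """
    root = [gene[0]]
    queue = [root]          # nodes whose children still have to be attached
    position = 1
    while queue:
        node = queue.pop(0)
        arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
        for _ in range(arity):
            child = [gene[position]]
            position += 1
            node.append(child)
            queue.append(child)
    return root

def evaluate(tree, variables):
    """Evaluate an expression tree for given values of the terminals."""
    symbol, children = tree[0], tree[1:]
    if symbol in FUNCTIONS:
        args = [evaluate(child, variables) for child in children]
        return FUNCTIONS[symbol][1](*args)
    return variables[symbol]

# The gene "Q*+-abcd" decodes to sqrt((a + b) * (c - d)).
tree = decode_gene("Q*+-abcd")
value = evaluate(tree, {"a": 3.0, "b": 1.0, "c": 6.0, "d": 2.0})
print(value)
```

Because the genotype is a fixed-length string, genetic operators can modify it freely, and the decoded tree is what gets evaluated for fitness.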

Application of GEP in hydrology

In water resources engineering, GP has been successfully applied in a few cases to solve various problems. Giustolisi (2004) used GP to determine the Chezy resistance coefficient in corrugated channels; Rabunal et al. (2007) applied GP and ANN to determine the unit hydrograph of a typical urban basin; Guven et al. (2008) used the linear genetic programming (LGP) approach for time-series modelling of daily flow rate; Guven and Gunal (2008) successfully applied the GEP approach to predict local scour downstream of hydraulic structures. These studies have drawn hydrologists to investigate the use of GP in estimating river flow (Guven, 2009; Guven and Talu, 2010; Guven and Kisi, 2011).

Most recently, Kisi and Shiri (2011) forecast precipitation using wavelet-genetic programming; Azamathulla and Ghani (2011) predicted the longitudinal dispersion coefficients in streams; and Azamathulla et al. (2011) developed stage-discharge rating curves for the Pahang River using GEP.

In the context of rainfall-runoff modelling, the combination modelling approach advocates the synchronous use of simulated discharges obtained from a number of rainfall-runoff models to produce an overall combined/integrated discharge output which can be used as an alternative to that produced by a single rainfall-runoff model. At present only a limited number of studies have dealt with the multi-model combination of hydrological models (Coulibaly et al., 2005; See and Openshaw, 2002; Shamseldin and O'Connor, 1999; Shamseldin et al., 1997). The emerging conclusion from these pioneering studies is that the combination modelling approach has tremendous potential for improving the accuracy and reliability of hydrological modelling forecasts and predictions. However, in these studies no attempts had been made to explore the nature of the combination function and their inner workings. Further, no explanation had been provided to account for the drivers behind the improvements in the modelling results essential to advance the use of combination modeling approaches in the field of hydrology.

Savic et al. (1999) applied a GP approach to rainfall-runoff modelling, using the Kirkton catchment in Scotland (UK) for flow prediction. They concluded that the results of the data-driven approaches (GP and ANN) showed very good agreement with the results of a conceptual model whose parameters were optimised using the best available optimisation techniques. However, GP seems to give more insight into the form of the rainfall-runoff relationship than ANN because it explicitly gives the form of the function identified. It also partially alleviates the problem of identifying the large number of parameters necessary for conceptual model calibration. The number of GP parameters (population size, crossover and mutation probability) is much smaller and does not necessarily need to change for different rainfall-runoff problems.

Fernando et al. (2012) used GEP to forecast river flow for different catchments in China and Ireland. They investigated the application of GEP, a novel data driven technique, to develop one-day-ahead flow forecasting models for catchments with widely differing characteristics. The outcome of the study was positive: although no comparisons were made with forecasts from other models, the fact that these are transparent models that can serve the general purpose of producing daily forecasts of high accuracy is valuable. Fernando et al. (2009) applied GEP to develop a combined runoff estimate model from conventional rainfall-runoff model outputs. They investigated the structure of the combined model (ANN and GEP) and also the use of GEP to develop a combination rainfall-runoff model through the process of symbolic regression. They developed the GEP model using the daily simulated river flows of four other rainfall-runoff models for the Chu catchment located in Vietnam. They found that GEP can be successfully used to combine outputs from other basic rainfall-runoff models to develop a model with greater accuracy. The combination allows an insight into the components that make up the model in terms of mathematical expressions, which distinguishes the GEP model from the “black-box” counterparts that have been used in the past to develop combination models. The mathematical expressions generated by the programming process can subsequently be applied to other data sets not used in the model development, as well as to investigate further the contributions from each of the sub-models.

The most relevant study to RFFA has been conducted by Seckin and Guven (2012); where GEP and linear genetic programming (LGP), which are extensions to GP, in addition to logistic regression (LR) were employed in order to forecast peak flood discharges. The data from 543 gauged sites across Turkey was used for the study. Drainage area, elevation, latitude, longitude, and return period were used as the inputs while the peak flood discharge was the output. They found that the proposed LGP and GEP models provided a fast and practical way of estimating the peak flood discharges. The results of their study indeed encourage the use of genetic programming in other aspects of water resources engineering studies. The proposed LGP and GEP models offer no restriction since they do not employ predefined functions unlike most regression-based models. The results of their study suggest that both genetic programming techniques, LGP and GEP can be successfully applied in estimating the peak discharges of floods in RFFA.

As discussed, the application of GEP based techniques in RFFA is very limited, and there is no significant GEP based RFFA study using Australian data.

c) Co-active neuro-fuzzy inference system (CANFIS)

Fuzzy logic is a form of multi-valued logic derived from fuzzy set theory to deal with reasoning that is approximate rather than precise. In contrast with "crisp logic", where binary sets have binary logic, fuzzy logic variables may have a membership value other than 0 or 1; that is, the degree of truth of a statement can range between 0 and 1 and is not constrained to the two truth values of classic propositional logic. Furthermore, when linguistic variables are used, these degrees may be managed by specific functions (Novak et al., 1999).
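The contrast between crisp and fuzzy membership can be made concrete with a small Python sketch. The triangular membership function and the linguistic variable ("medium" catchment area with hypothetical break points of 200, 400 and 600 km²) are illustrative assumptions only and are not values used in this thesis.

```python
def crisp_membership(x, left, right):
    """Classical (crisp) set: a value is either in the set (1) or not (0)."""
    return 1.0 if left <= x <= right else 0.0

def triangular_membership(x, left, peak, right):
    """Fuzzy set with a triangular shape: membership varies from 0 to 1.

    Assumes left < peak < right.
    """
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical linguistic variable: "medium" catchment area in km^2.
for area in (100.0, 250.0, 400.0, 550.0, 700.0):
    print(area,
          crisp_membership(area, 200.0, 600.0),
          triangular_membership(area, 200.0, 400.0, 600.0))
```

A catchment of 250 km² is fully "medium" under the crisp definition but only partly so under the fuzzy one, which is the graded degree of truth described above.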

In the field of artificial intelligence, neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic. Neuro-fuzzy was proposed by Jang (1993). Neuro-fuzzy hybridization results in a hybrid intelligent system that synergizes these two techniques by combining the human-like reasoning style of fuzzy systems with the learning and connectionist structure of neural networks. Neuro-fuzzy hybridization is widely termed Fuzzy Neural Network (FNN) or Neuro-Fuzzy System (NFS) in the literature. The adaptive neuro-fuzzy inference system (ANFIS) is a soft computing technique which makes use of the benefits of both ANN and fuzzy systems. ANFIS serves as a basis for constructing a set of fuzzy if-then rules with appropriate membership functions to generate the stipulated input-output pairs.

The generalized form of ANFIS is called CANFIS. In CANFIS, both the neural network (NN) and the fuzzy inference system (FIS) play an active role in reaching a specific goal. CANFIS extends the single-output notion of ANFIS to produce multiple outputs.
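The idea of fuzzy if-then rules producing several outputs can be sketched as below. This is a simplified first-order Sugeno-type inference for a single input, written under the assumption that all outputs share the same rule firing strengths and differ only in their linear consequents; the Gaussian membership functions and every parameter value are hypothetical. In ANFIS or CANFIS these parameters would be learned from data, which is not shown here.

```python
import math

def gaussian_membership(x, centre, width):
    """Degree to which x belongs to a fuzzy set centred at `centre`."""
    return math.exp(-0.5 * ((x - centre) / width) ** 2)

def sugeno_inference(x, rules):
    """First-order Sugeno inference for one input and several outputs.

    Each rule is (centre, width, consequents), where consequents holds one
    (slope, intercept) pair per output. The rule firing strengths are
    computed once and shared by all outputs.
    """
    strengths = [gaussian_membership(x, centre, width)
                 for centre, width, _ in rules]
    total = sum(strengths)
    n_outputs = len(rules[0][2])
    outputs = []
    for k in range(n_outputs):
        weighted = sum(s * (rule[2][k][0] * x + rule[2][k][1])
                       for s, rule in zip(strengths, rules))
        outputs.append(weighted / total)   # normalised weighted average
    return outputs

# Two hypothetical rules ("x is low", "x is high"), two outputs each.
rules = [
    (0.0, 1.0, [(1.0, 0.0), (0.0, 1.0)]),
    (4.0, 1.0, [(2.0, 1.0), (0.0, 3.0)]),
]
print(sugeno_inference(2.0, rules))
```

At x = 2 both rules fire equally, so each output is the plain average of the two rule consequents evaluated at that point.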

Application of CANFIS in hydrology

Hydrologic analysis is complicated by uncertainties caused by nature (e.g., climate, land characteristics), limited data, and imprecise modelling. For instance, aquifer parameters are obtained from a few locations that represent a small fraction of the total volume. Definition of system boundaries and initial conditions also introduces uncertainty. Future stresses on the system are also imprecisely known. The stochastic approach to uncertainty analysis considers aquifer properties as random variables with known distributions. Thus, the outputs from a stochastic model are also characterized by the statistical moments or the full probability density function. However, a point in favour of fuzzy logic is that, despite the theoretical development of the stochastic approach, its practical application is rather limited, especially if a point process model needs to be upscaled (Bogardi et al., 2003). Hydrological sciences require temporal and spatial data sources for a proper understanding of the phenomenon concerned. This information provides the foundation for the preparation, interpretation and deduction of logically acceptable conclusions. In many hydrological studies, numerical data are fed into mathematical models, especially through readily available computer software, which may produce unreliable results if the working mechanism of the natural hydrological phenomenon is not appreciated qualitatively through verbal information (Sen, 2009).

Nayak et al. (2003) applied a neuro-fuzzy system to model the flow of the Baitarani River in India and compared its performance with ANN and autoregressive moving average (ARMA) models. The appropriate input was selected by testing different combinations of flows at different time lags. The study also investigated the transformation of input data (into the normal domain) by comparing the performance of models developed on transformed and non-transformed data. It was observed that model performance increased significantly when the transformed input data were used. The results of the study showed that the neuro-fuzzy model performed slightly better than ANN and outperformed the ARMA model in terms of all performance indices.

Jacquin and Shamseldin (2006) developed two types of fuzzy rainfall runoff models based on Takagi-Sugeno fuzzy inference systems. The developed models are applied to the data of six catchments of diverse climatic characteristics. The results of the developed models are compared with those of Simple Linear Model, the Linear Perturbation Model and the Nearest Neighbour Linear Perturbation Model. The study concluded that the FIS is a suitable alternative to the traditional methods of modelling non-linear rainfall and runoff.

Talei et al. (2010a) evaluated rainfall-runoff modelling for a sub-catchment of the Kranji basin in Singapore using a neuro-fuzzy computational technique. The results of the ANFIS were compared with those of a physically based model, the storm water management model (SWMM). It was found that the model with two inputs (rainfall at time t and at time t-1) had the maximum coefficient of efficiency, and that the ANFIS model was comparable to SWMM in terms of goodness of fit. The potential of ANFIS for hydrological modelling was assessed by applying an ANFIS model to monthly inflows of Bhakara Dam in India (Lohani et al., 2012). The proposed ANFIS models were compared with ANN and autoregressive (AR) models in order to determine their performance. Karimi et al. (2013) employed two data driven models, ANFIS and ANN, for predicting hourly sea levels for Darwin Harbor, Australia.

Firat and Gungor (2007) applied a neuro-fuzzy technique for flow estimation of the River Great Menderes in Turkey. The results were compared with the observed flows in order to evaluate the performance of the model in training and testing. Using a data set of 5844 daily runoff values, it was found that the ANFIS models were accurate, reliable and highly efficient, with minimum root mean square error values.

Oarda and Shu (2007) developed models for RFFA at ungauged sites using the neuro-fuzzy approach for the hydrometric station network of southern Quebec, Canada. They used 15 years of historical data from 151 gauging stations. It was found that the neuro-fuzzy approach provided a mechanism for integrating the two major steps of RFFA, regionalisation and estimation, into one system.

A comparative study of ANN and neuro-fuzzy in continuous modelling of the daily and hourly behaviour of runoff was performed by Aqil et al. (2007). The data were derived from the Cilalawi River basin in Indonesia. The total drainage area of the Cilalawi River basin is approximately 60.17 km². Forest, paddy field and perennial plantation dominate the land use in the river basin, accounting for 85% of the area. Two types of three-layer feedforward neural network (FFNN) models, each with one input layer, one hidden layer and an output layer, were developed in this study. Three different network architectures and training algorithms were investigated, namely Levenberg–Marquardt-FFNN, Bayesian regularization-FFNN, and neuro-fuzzy. Compared with the Levenberg–Marquardt-FFNN and the Bayesian regularization-FFNN, the neuro-fuzzy model showed better generalization capability and adaptability in modelling complex rainfall–runoff dynamics.

ANFIS has been used in the field of hydrology in various parts of the world, but its application in RFFA has been very limited so far. Australia's unique climatic and geographical conditions set it apart from the rest of the world for the application of ANFIS and model development. To date, there is no evidence of its application in RFFA in Australia.

2.4 Summary

This chapter has discussed various regional flood frequency analysis (RFFA) techniques with a particular emphasis on non-linear techniques, i.e. artificial neural network (ANN), co-active neuro-fuzzy inference system (CANFIS), genetic algorithm (GA) and gene-expression programming (GEP). It has been found that RFFA is widely used in design flood estimation for ungauged catchments. There are many RFFA methods in the literature, each with specific assumptions and data requirements. In Australia (in particular in New South Wales and Victoria), a linear method, i.e. the Probabilistic Rational Method, has been the method of choice since 1987, which is likely to change in the new version of Australian Rainfall and Runoff. More recently, regression based RFFA methods have been widely investigated in Australia. The linear RFFA methods assume a linear relationship between flood statistics and predictor variables. However, most hydrologic processes are nonlinear and exhibit a high degree of spatial and temporal variability, so a simple log transformation (the most common form of transformation) cannot guarantee linearity in RFFA modelling. The modelling is further complicated by uncertainty in parameter estimates. Increased computing power has created new opportunities for hydrologists to solve complex problems using non-linear artificial intelligence based techniques such as ANN, CANFIS, GA and GEP. These non-linear techniques have been widely used in rainfall and streamflow forecasting; however, there have been only a few studies on RFFA based on these techniques. In particular, there has been no major RFFA research in Australia based on these non-linear techniques. Non-linear techniques for regional flood estimation could be powerful modelling methods as they do not impose a model structure on the data (i.e. they are model free techniques).

The choice of non-linear model structure, grouping of data into meaningful regions, selection of appropriate predictor variables, carefully designed model training, testing and validation methods are key to the development of successful RFFA models based on various non-linear techniques discussed in this chapter.

Non-linear techniques, especially ANN, have risen to prominence as a viable alternative to many traditional water resources models, particularly in the field of forecasting hydrologic variables. Some of the important features that have contributed to their popularity include their ease of implementation, their ability to learn from examples without explicit knowledge of the underlying physics, and their powerful generalization abilities. One limitation of the non-linear techniques is that they are data dependent, data driven models. However, unlike the most commonly used regression based models, non-linear techniques do not impose a fixed model structure.

As the Australian climate and geography differ from those of the rest of the world, and Australian hydrology is among the most variable, it is important to investigate the applicability of these non-linear techniques to RFFA problems. Hence, this research focuses on the development and testing of artificial intelligence based RFFA methods for Australia.


CHAPTER 3

METHODOLOGY

3.1 General

This chapter presents the statistical and mathematical tools adopted in this study to develop the artificial intelligence based RFFA models and the quantile regression technique. Cluster analysis and principal component analysis, which are used to group the data in the catchment characteristics data space, are also described. First, the artificial neural network (ANN) method is presented, followed by genetic algorithm based ANN, gene-expression programming, co-active neuro fuzzy inference system, quantile regression technique, cluster analysis and principal component analysis. Finally, the adopted validation technique is presented.

3.2 Methods adopted in the study

Initially, the RFFA methods based on artificial intelligence are discussed in detail. This covers the features, fundamental concepts, mathematical equations and input data requirements for each of these methods. Later, the linear techniques are discussed with major emphasis on QRT. These methods are presented in Figure 3.1.

Figure 3.1 Different RFFA techniques adopted in this study

3.2.1 Artificial neural network (ANN)

There are various types of ANN and their applications are found in many different fields of science and engineering. Since the first neural model by McCulloch and Pitts (1943), hundreds of different models considered as ANN have been developed. They may differ in their functions, accepted values, topology, learning algorithms and the like. Since the function of ANN is to process information, they are used mainly in fields related to information processing. A wide variety of ANN are used to model real neural networks and to study behaviour and control in animals and machines, while others are used for engineering purposes such as pattern recognition, forecasting and data compression.

ANN modelling is inspired by natural neurons, which receive signals through synapses located on the dendrites or membrane of the neuron, as shown in Figure 3.2. When the signals received are strong enough (surpass a certain threshold), the neuron is activated and emits a signal through the axon. This signal might be sent to another synapse, and might activate other neurons.

Figure 3.2 Structure of typical natural neuron (Source: http://staff.itee.uq.edu.au/janetw/cmc/chapters/Introduction/)

Features and strengths of ANN

1. The most important aspect of ANN is its non-linearity.

2. ANN has the ability to perform input-output mapping in an intelligent manner. This helps in developing a relationship between the input and the desired output. ANN can adjust its parameters, known as weights, so that the difference between the actual output from the ANN and the desired output for a given input is minimized. ANN and regression modelling are similar in that both find an optimum set of coefficients to achieve the input-output transformation; however, ANN can use complex non-linear models in making such a transformation.

3. Adaptivity is a main characteristic of ANN. They can adapt their free parameters to changes in the surrounding environment.

Working structure of artificial neural network (ANN)

A neural network comprises the neuron and weight building blocks. The behaviour of the network depends largely on the interaction between these building blocks. There are three types of neuron layers: input, hidden and output layers. Two layers of neuron communicate via a weight connection network. There are four types of weighted connections: feedforward, feedback, lateral, and time-delayed connections. A typical configuration of a feedforward three layer ANN can be seen in Figure 3.3.

Figure 3.3 Configuration of Feedforward Three-Layer ANN (ASCE, 2000)

Various forms of architecture of ANN are discussed below:

Feedforward connections: For all neural models, data from neurons of a lower layer are propagated forward to neurons of an upper layer via feedforward connection networks.

Feedback connections: Feedback networks bring data from neurons of an upper layer back to neurons of a lower layer. In other words, signals are passed between nodes through connection links in the reverse direction.

Lateral connections: These connect neurons within the same layer. The strength of each connection is represented by the weight associated with the link. One typical example of a lateral network is the winner-takes-all circuit, which serves the important role of selecting the winner.

Time-delayed connections: Delay elements may be incorporated into the connections to yield temporal dynamics models. They are more suitable for temporal pattern recognition.

The architecture of an ANN represents the pattern of connection between nodes, its method of determining the connection weights, and the activation function. Alkon (1989), Fausett (1994) and Caudill (1987, 1988 and 1989) presented comprehensive descriptions of ANN. As mentioned above, a typical ANN consists of a number of nodes, and these nodes are arranged in a particular order analogous to that of biological neurons.

One way of classifying ANN is by the number of layers: single (Hopfield nets), bilayer (Carpenter/Grossberg adaptive resonance networks), and multilayer (most backpropagation networks). ANN can also be categorised based on the direction of information flow and processing. In a feedforward network, the nodes are generally arranged in layers, starting from a first input layer and ending at the final output layer. There can be several hidden layers, with each layer having one or more nodes. Information passes from the input to the output side. The nodes in one layer are connected to those in the next, but not to those in the same layer. Thus, the output of a node in a layer depends only on the inputs it receives from previous layers and the corresponding weights. On the other hand, in a recurrent ANN, information flows through the nodes in both directions, from the input to the output side and vice versa. Sometimes, lateral connections are used where nodes within a layer are also connected (Smith, 1993; Wasserman, 1993; Lawrence, 1994; Bishop, 1995).

The input or first layer receives the input variables for the problem at hand. This consists of all quantities that can influence the output. The input layer is thus transparent and is a means of providing information to the network. The last or output layer consists of values predicted by the network and thus represents the model output. The number of hidden layers and the number of nodes in each hidden layer are usually determined by a trial-and-error procedure. The nodes within neighbouring layers of the network are fully connected by links. A synaptic weight is assigned to each link to represent the relative connection strength of the two nodes at its ends in predicting the input-output relationship. These kinds of ANN can be used in solving a wide variety of problems, such as storing and recalling data, classifying patterns, performing general mapping from input pattern (space) to output pattern (space), grouping similar patterns, or finding solutions to constrained optimization problems. The system input vector is composed of a number of causal variables that influence system behaviour, and the system output vector is composed of a number of resulting variables that represent the system behaviour (Theodoridis and Koutroumbas, 2009).

Mathematical treatment of ANN

The overall output value of a neuron can be expressed as below:

$$y_j = f(X \cdot W_j - b_j) \qquad (3.1)$$

where the inputs in the first layer form an input vector:

$$X = [x_1, \ldots, x_i, \ldots, x_n] \qquad (3.2)$$

The sequence of weights leading to node j forms a weight vector:

$$W_j = [w_{1j}, \ldots, w_{ij}, \ldots, w_{nj}] \qquad (3.3)$$

where j = 1, 2, …, m and m is the number of neurons.

Here, $w_{ij}$ represents the connection weight from the ith node in the preceding layer to the jth node. The output of node j, $y_j$, is obtained by computing the value of the function f with respect to the inner product of the vectors X and $W_j$ minus $b_j$, where $b_j$ is the threshold value, also called the bias, associated with this node. In ANN parlance, the bias $b_j$ of the node must be exceeded before it can be activated.
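The node output defined above, the activation function applied to the inner product of inputs and weights minus the bias, can be written as a short Python sketch. The input values, weights and biases below are hypothetical, and the hyperbolic tangent is used as the activation function purely for illustration.

```python
import math

def neuron_output(inputs, weights, bias, activation):
    """Output of a single node j: f(X . W_j - b_j)."""
    net = sum(x * w for x, w in zip(inputs, weights)) - bias
    return activation(net)

def layer_output(inputs, weight_vectors, biases, activation):
    """Outputs of all m nodes in a layer, one weight vector W_j per node."""
    return [neuron_output(inputs, w_j, b_j, activation)
            for w_j, b_j in zip(weight_vectors, biases)]

# Hypothetical example: n = 2 inputs feeding m = 2 nodes.
X = [0.5, 0.2]
W = [[0.4, -0.6],    # W_1
     [0.1, 0.9]]     # W_2
b = [0.1, -0.2]
y = layer_output(X, W, b, math.tanh)
print(y)
```

Stacking several such layers, with the outputs of one layer used as the inputs of the next, gives the feedforward network described in this section.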

The sigmoid function is a bounded, monotonic, non-decreasing function that provides a graded, nonlinear response. This function enables a network to map any nonlinear process. The popularity of the sigmoid function is partially attributed to the simplicity of its derivative that will be used during the training process. Some researchers also employ the bipolar sigmoid and hyperbolic tangent as activation functions, both of which are transformed from the sigmoid function. A number of such nodes are organized to form an ANN.

The function f in Equation 3.1 is called an activation function. Its functional form determines the response of a node to the total input signal it receives. Typically the sigmoid function is expressed as below:

$$f(x) = \frac{1 - e^{-x}}{1 + e^{-x}} \qquad (3.4)$$

In the ANN modelling adopted in this study, the Levenberg-Marquardt method was used as the training algorithm to minimize the mean squared error (MSE). The purpose of training an ANN with a set of input and output data is to adjust the weights in the ANN to minimize the MSE between the desired outputs and the ANN outputs. The degree of error increases with the number of layers in the network and with the percentage change in the weights. However, the degree of error is essentially independent of the number of weights per neuron and the number of neurons per layer, as long as these numbers are large (close to 100 or more). The data set was split into training and validation sub-sets. In this study, the testing data set was selected randomly to produce a reasonable sample of different catchment types and sizes. A feedforward ANN consisting of three layers (input, hidden and output layers) was used with the training algorithm known as ‘backpropagation of error’. Three hidden-layered neural networks were selected with 7, 3 and 1 neurons in these three layers, respectively. Two inputs, catchment area (A) and rainfall intensity with duration equal to the time of concentration (tc) and a given average recurrence interval (ARI), were used in the input layer, and the output layer had one output, the predicted flood quantile (Qpred). The transfer function used for the hidden layers and the output layer was the hyperbolic tangent sigmoid function (Equation 3.4). Transfer functions calculate a layer’s output from its net input. A maximum of 20,000 training iterations was adopted. Each predictor and predictand was standardized to the range (0.05, 0.95), so that extreme flood events exceeding the range of the training data set could still be modelled within the boundaries (0, 1) during testing. A learning rate of 0.05 was used together with a momentum constant of 0.95. MATLAB was used to perform the ANN training.
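The standardisation of each variable to the range (0.05, 0.95) can be sketched as a linear min-max scaling based on the training data. This is a minimal Python illustration and not the MATLAB code used in the study; the function names and the flood quantile values are hypothetical.

```python
def fit_range(training_values):
    """Return the (min, max) of the training data, used as scaling reference."""
    return min(training_values), max(training_values)

def scale(value, v_min, v_max, low=0.05, high=0.95):
    """Map [v_min, v_max] linearly onto [low, high]."""
    return low + (high - low) * (value - v_min) / (v_max - v_min)

def unscale(scaled, v_min, v_max, low=0.05, high=0.95):
    """Inverse of scale(): recover the value in its original units."""
    return v_min + (scaled - low) * (v_max - v_min) / (high - low)

# Hypothetical training flood quantiles in m^3/s.
training_q = [12.0, 85.0, 240.0, 1012.0]
q_min, q_max = fit_range(training_q)
scaled_training = [scale(q, q_min, q_max) for q in training_q]

# A test value slightly above the training maximum still maps below 1.
scaled_extreme = scale(1050.0, q_min, q_max)
print(scaled_training, scaled_extreme)
```

Leaving a margin of 0.05 at each end means that a test value moderately outside the training range still falls inside (0, 1), which is the motivation given above for this choice of range.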
To select the best performing model, different combinations of hidden layers, training algorithm and number of neurons were compared on the basis of the MSE value. The MSE between the observed and predicted flood quantiles was calculated and the training was undertaken to minimise this error. To avoid over-training of the ANN model, the MSE values were also calculated for the testing data set. If the testing MSE was increasing while the training MSE was still decreasing, the training of the ANN was terminated. This ensured the training quality of the ANN and avoided over-fitting.
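The stopping rule described above can be sketched as follows. This is an illustrative Python fragment under stated assumptions: the `patience` parameter (the number of consecutive epochs for which the condition must hold) and the MSE histories are hypothetical and are not taken from this study.

```python
def stopping_epoch(train_mse, test_mse, patience=3):
    """Return the epoch at which training would be terminated.

    Training stops once the testing MSE has risen while the training MSE
    fell for `patience` consecutive epochs. If that never happens, the
    last epoch index is returned.
    """
    rising = 0
    for epoch in range(1, len(train_mse)):
        test_up = test_mse[epoch] > test_mse[epoch - 1]
        train_down = train_mse[epoch] < train_mse[epoch - 1]
        rising = rising + 1 if (test_up and train_down) else 0
        if rising >= patience:
            return epoch
    return len(train_mse) - 1


# Hypothetical MSE histories (illustration only)
train_hist = [0.50, 0.30, 0.20, 0.15, 0.12, 0.10, 0.09]
test_hist = [0.55, 0.35, 0.25, 0.24, 0.26, 0.28, 0.31]
stop_at = stopping_epoch(train_hist, test_hist, patience=3)
```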

3.2.2 Genetic algorithm based ANN (GAANN)

In this study the analysis was done using two different types of ANN, one using the backpropagation technique and the other using genetic algorithm (GA) technique for optimization.


The major difference between GA and the classical optimization search techniques is that the GA works with a population of possible solutions, whereas the classical optimization techniques work with a single solution (Jain et al., 2005). GA is based on the Darwinian-type survival of the fittest strategy, whereby potential solutions to a problem compete and mate with each other in order to produce increasingly stronger individuals. Each individual in the population represents a potential solution to the problem that is to be solved and is referred to as a chromosome (Rooji et al., 1996). The basic working of GA is summarised in the diagram shown in Figure 3.4. An initial population of individuals (also called chromosomes) is created and the fitness values of all chromosomes are evaluated according to the objective function. From this initial population, parents are selected who mate to produce offspring (also called children). The genes of parents and children are mutated. The fittest among parents and children are sent to a new pool. The whole procedure is repeated until either of the two stopping criteria is met, i.e. the required number of generations has been reached or convergence has been achieved.

Chromosomes are the basic units of the population and represent possible solution vectors; they are assembled from a set of genes that are generally binary digits, integers or real numbers (Mitchell, 1996; Randy and Sue, 1998). A chromosome can be thought of as a vector x consisting of l genes gl:

$$x = (g_1, g_2, \ldots, g_l), \quad g_l \in G \tag{3.5}$$

where l is referred to as the length of the chromosome. Here g represents binary genes (G = {0, 1}), integer genes (G = {…, -2, -1, 0, 1, 2, …}) or real-valued genes (G = R). In the last case, the real values are stored in a gene by means of a floating point representation (Rooji et al., 1996).

The three genetic operators in GA, selection, crossover (mating) and mutation, are the primary forces that produce new and unique offspring having the same number of genes as their parents. The selection operator is used to select parents from the pool. The crossover (mating) operator is used to produce offspring from the selected parents. The parent chromosomes are mated to produce new offspring representing new solution vectors. As with the selection operator, various crossover techniques have been developed over the years, of which the best known are single-point crossover (simple crossover), two-point crossover and uniform crossover (Mitchell, 1996). In single-point crossover, a crossover point is selected arbitrarily at the identical location in two parents and the two alternate halves of the two parents are recombined to form two children having new combinations of gene values. The mutation operator is used to introduce changes in the genes of a chromosome. Mutation maintains the diversity in the genes of a population and prevents premature convergence (Bowden et al., 2005). In the traditional binary GA, which uses binary digits as the gene values (i.e. 0 and 1), the value of the selected gene is inverted in mutation, i.e. a value of 0 is mutated to 1 and vice versa. In a real-coded GA, however, two genes in a chromosome are selected and their values are swapped to introduce mutation.
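The crossover and mutation operators described above can be illustrated with a short Python sketch. It is an illustration only and is not the LibGA code used in this study; the function names and the example chromosomes are hypothetical.

```python
import random


def single_point_crossover(p1, p2, rng):
    """Cut both parents at the same random point and swap the tails."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:], p2[:point] + p1[point:]


def uniform_crossover(p1, p2, rng):
    """At each gene position a 'toss' decides which parent feeds which child."""
    c1, c2 = [], []
    for g1, g2 in zip(p1, p2):
        if rng.random() < 0.5:
            c1.append(g1)
            c2.append(g2)
        else:
            c1.append(g2)
            c2.append(g1)
    return c1, c2


def bit_flip_mutation(chrom, rng):
    """Binary GA: invert one randomly chosen gene (0 -> 1 or 1 -> 0)."""
    out = list(chrom)
    i = rng.randrange(len(out))
    out[i] = 1 - out[i]
    return out


def swap_mutation(chrom, rng):
    """Real-coded GA: exchange the values of two randomly chosen genes."""
    out = list(chrom)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out


rng = random.Random(42)                      # fixed seed for repeatability
parent_a, parent_b = [0] * 6, [1] * 6        # hypothetical binary parents
child_1, child_2 = single_point_crossover(parent_a, parent_b, rng)
```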

Combination of genetic algorithm and artificial neural network (GAANN)

The flow chart of the GAANN model is shown in Figure 3.5. An initial population is created with n chromosomes, where n is referred to as the population size. An objective function comprising a feedforward ANN model with a complete description of its architecture is defined. It reads the training patterns once at the start of the model run and stores them in memory for application to each chromosome. The total number of genes l of each chromosome equals the total number of synaptic weights of the ANN model.

$$\{g_1, g_2, \ldots, g_l\} = \{w(i_f h_r),\ w(ib\,h_r),\ w(h_f o_r),\ w(hb\,o_r)\} \tag{3.6}$$

where ‘w’ represents the value of a synaptic weight, ‘i’ represents a node of the input layer, ‘h’ a node of the hidden layer and ‘o’ a node of the output layer; the subscript ‘f’ is the serial number of the node which forwards the information (i.e. f = 1, 2, 3, …), the subscript ‘r’ is the serial number of the node which receives the information (i.e. r = 1, 2, 3, …), ‘ib’ represents the bias node of the input layer and ‘hb’ the bias node of the hidden layer.

At the start of the model run, the fitness values of all the chromosomes of the population are evaluated by the ANN function. The real values stored in the genes of a chromosome are read as the respective weights of the ANN model. Figure 3.6 shows an example of the translation of the genes of a chromosome into the respective synaptic weights of an ANN model. The ANN performs feedforward calculations with the weights read from the genes of the forwarded chromosome as per Equation 3.6, and calculates the MSE. The inverse of the MSE is regarded as the fitness value of the chromosome. In this way, the fitness values of all chromosomes of the initial population are calculated by the ANN function.
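The translation of genes into synaptic weights and the evaluation of fitness as the inverse of the MSE can be sketched as below. This is a minimal illustration, assuming a hypothetical network with two inputs, two hidden nodes and one output, a tanh transfer function, and genes ordered as in Equation 3.6; it is not the C code used in this study.

```python
import math

N_IN, N_HID = 2, 2                                # hypothetical architecture
N_GENES = N_IN * N_HID + N_HID + N_HID + 1        # 9 weights in total


def decode(chrom):
    """Read the genes as weights in the order of Equation 3.6."""
    k = 0
    w_ih = [[chrom[k + f * N_HID + r] for r in range(N_HID)] for f in range(N_IN)]
    k += N_IN * N_HID
    b_h = chrom[k:k + N_HID]                      # input-layer bias node -> hidden
    k += N_HID
    w_ho = chrom[k:k + N_HID]                     # hidden -> output
    k += N_HID
    b_o = chrom[k]                                # hidden-layer bias node -> output
    return w_ih, b_h, w_ho, b_o


def predict(chrom, x):
    """Feedforward pass using the weights stored in the chromosome."""
    w_ih, b_h, w_ho, b_o = decode(chrom)
    hidden = [
        math.tanh(sum(x[f] * w_ih[f][r] for f in range(N_IN)) + b_h[r])
        for r in range(N_HID)
    ]
    return math.tanh(sum(h * w for h, w in zip(hidden, w_ho)) + b_o)


def fitness(chrom, patterns):
    """Fitness value FV = 1 / MSE (an MSE of exactly zero is not handled here)."""
    mse = sum((predict(chrom, x) - y) ** 2 for x, y in patterns) / len(patterns)
    return 1.0 / mse


# Hypothetical scaled training patterns: ((A, I), Q)
patterns = [((0.1, 0.2), 0.5), ((0.3, 0.4), -0.5)]
fv_zero_weights = fitness([0.0] * N_GENES, patterns)
```

With all weights equal to zero the network output is zero, so the MSE is 0.25 and the fitness value is 4.0.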

The selection operator selects two parent chromosomes randomly. The roulette wheel operator with elitism is used in this model. Elitism is a scheme in which the best chromosome of each generation is carried over to the next generation in order to ensure that the best chromosome is not lost during the calculations. The selected parents are mated to produce two children having the same number of genes. The uniform crossover operator is used with a crossover rate of pc = 1.0. In uniform crossover, a toss is made at each gene position of an offspring and, depending upon the result of the toss, the gene value of the first or the second parent is copied to the offspring. The genes of the children are then mutated with the swap mutation operator with a mutation rate of pm = 0.8. The mutated children are then evaluated by the ANN function to obtain their fitness values. The fitness values of all four chromosomes (two parents and two children) are compared, and the two chromosomes with the highest fitness values are sent to a new population while the other two are discarded. The evolutionary operators continue this loop of selection, crossover, mutation and replacement until the population size of the new pool is the same as that of the old pool. One generation cycle is complete at this stage, and the process is repeated until either of the two stopping criteria is fulfilled, i.e. the maximum number of generations is reached or convergence has been achieved. The best chromosome tracked through all generations is then sent to the ANN function. The genes of the best chromosome are read as the weights of the ANN model and represent its optimised weights. With these weights, the model is said to be fully trained. Finally, the training and testing sets are simulated using these weights (Sohail et al., 2005).
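One generation cycle of the procedure described above can be sketched in Python as follows. This is an illustrative fragment and not the LibGA-based C code used in this study. The fitness function `toy_fitness` is hypothetical (it stands in for the inverse of the ANN MSE), and the way elitism is applied here (copying the best chromosome into the new pool first) is an assumption.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one chromosome with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for chrom, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return chrom
    return population[-1]                 # guard against round-off


def uniform_crossover(p1, p2, rng):
    """Crossover rate pc = 1.0: every selected pair is mated."""
    pairs = [(a, b) if rng.random() < 0.5 else (b, a) for a, b in zip(p1, p2)]
    return [a for a, _ in pairs], [b for _, b in pairs]


def swap_mutation(chrom, rng, pm):
    """With probability pm, exchange the values of two random genes."""
    out = list(chrom)
    if rng.random() < pm:
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out


def next_generation(population, fitness_fn, rng, pm=0.8):
    fitnesses = [fitness_fn(c) for c in population]
    new_pool = [list(max(population, key=fitness_fn))]   # elitism (assumed form)
    while len(new_pool) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        c1, c2 = uniform_crossover(p1, p2, rng)
        family = [p1, p2, swap_mutation(c1, rng, pm), swap_mutation(c2, rng, pm)]
        family.sort(key=fitness_fn, reverse=True)
        new_pool.extend(list(c) for c in family[:2])      # keep the 2 fittest of 4
    return new_pool[:len(population)]


def toy_fitness(chrom):
    """Hypothetical stand-in for FV = 1 / MSE."""
    return 1.0 / (1e-9 + sum((g - 0.5) ** 2 for g in chrom))


rng = random.Random(1)
pop = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(10)]
initial_best = max(toy_fitness(c) for c in pop)
for _ in range(5):                        # maximum number of generations
    pop = next_generation(pop, toy_fitness, rng)
final_best = max(toy_fitness(c) for c in pop)
```

Because the best chromosome is always carried over, the best fitness value cannot decrease from one generation to the next.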

The GAANN was coded in the C language, and some subroutines of the LibGA package (Arthur and Rogers, 1995) for the evolutionary operators of GA were used, with alterations to read and process negative real values.

[Flow chart: create population → evaluate fitness values → selection → crossover → mutation → test convergence (No: repeat; Yes: stop)]

Figure 3.4 Basic idea of genetic algorithm (Sohail et al., 2005)

[Flow chart: start → create initial population of individuals, population size (PS) = n → define feed forward ANN and evaluate fitness values, FV = 1.0 / MSE → select parents by roulette wheel method → crossover parents by uniform crossover method with pc = 1.0 → mutate genes of children by swapping with pm = 0.8 → send 2 fittest individuals among 2 parents and 2 children to a new pool → PS of new pool = n? (No: repeat) → termination criteria satisfied? (No: repeat) → select best individual in all generations]

Figure 3.5 Flow chart showing steps in GAANN model

[Diagram: (a) a chromosome of six genes holding w(i1,h1), w(i1,h2), w(i2,h1), w(i2,h2), w(h1,o1) and w(h2,o1); (b) the corresponding ANN with input nodes i1 and i2, hidden nodes h1 and h2 and output node o1, with each weight placed on its connection]

Figure 3.6 An example of assigning gene values of a chromosome to the respective synaptic weights of ANN architecture during a GAANN modelling

3.2.3 Gene-expression programming

Gene-expression programming (GEP) is used to perform a non-parametric symbolic regression. Although symbolic regression is very similar to traditional parametric regression, it does not start with a known function relating the dependent and independent variables, as the latter does. GEP programs are encoded as linear strings of fixed length (the genome or chromosomes), which are afterwards expressed as nonlinear entities of different sizes and shapes (Ferreira 2001a, b, 2006).

GEP automatically generates algorithms and expressions for the solution of problems, which are coded as a tree structure with its leaves (terminals) and nodes (functions). The generated candidates (programs) are evaluated against a ‘fitness function’ and the candidates with higher performance are then modified and re-evaluated. This modification-evaluation cycle is repeated until an optimum solution is achieved. In GEP, a population of individual combined model solutions is created initially, in which each individual solution is described by genes (sub-models) that are linked together using a predefined mathematical operation (e.g. addition). In order to create the next generation of model solutions, individual solutions from the current generation are selected according to fitness, which is based on the pre-chosen objective function. These selected individual solutions are allowed to evolve using evolutionary dynamics to create the individual solutions of the next generation. This process of creating new generations is repeated until a certain stopping criterion is met (Fernando et al., 2009).

Two important components of GEP are the chromosomes and the expression trees (ETs). The ETs are the expression of the genetic information encoded in the chromosomes. The process of decoding information from the chromosomes to the ETs is called translation, which is based on a kind of code and a set of rules. There exist very simple one-to-one relationships between the symbols of the chromosome and the functions or terminals they represent in the genetic code. To predict the flood quantiles, the set of independent variables (predictor variables) to be used in the individual prediction equation is first identified. Then a set of functions (e.g. e^x, x^a, sin(x), cos(x), ln(x), log(x), 10^x, etc.) and arithmetic operations (+, -, /, *) is defined. The terminals and the functions form the junctions in the tree of a program.

In GEP, k-expressions (from Karva notation), which are fixed-length lists of symbols, are used to represent an ET. These symbols are called chromosomes, and the list is a gene. The gene “sqrt, *, +, -, a, b, c and d” can be represented as an ET as shown in Figure 3.7. The GEP gene contains a head and a tail. Symbols that represent both functions and terminals are present in the head, while the tail contains only terminals. The length of the head of the gene, h, is selected for each problem, while the length of the tail is a function of the length of the head.
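The translation of a k-expression into an expression tree can be illustrated with a short Python sketch. It is a minimal illustration, not the GeneXproTools implementation. It assumes that the gene intended in the text is Ferreira's standard Karva example sqrt, *, +, -, a, b, c, d, which is read level by level (breadth first); the terminal values are hypothetical. The tail-length relation t = h(n_max - 1) + 1, where n_max is the largest number of arguments taken by any function, follows Ferreira (2001a).

```python
import math
import operator
from collections import deque

# Function symbols with their number of arguments and implementation
ARITY = {"sqrt": 1, "+": 2, "-": 2, "*": 2, "/": 2}
FUNCS = {"sqrt": math.sqrt, "+": operator.add, "-": operator.sub,
         "*": operator.mul, "/": operator.truediv}


def build_tree(symbols):
    """Fill the tree level by level; each node is [symbol, children]."""
    it = iter(symbols)
    root = [next(it), []]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for _ in range(ARITY.get(node[0], 0)):   # terminals take no children
            child = [next(it), []]
            node[1].append(child)
            queue.append(child)
    return root


def evaluate(node, env):
    """Evaluate the expression tree for the terminal values in env."""
    symbol, children = node
    if symbol in ARITY:
        return FUNCS[symbol](*(evaluate(c, env) for c in children))
    return env[symbol]


def tail_length(head_length, max_arity):
    """t = h (n_max - 1) + 1."""
    return head_length * (max_arity - 1) + 1


gene = ["sqrt", "*", "+", "-", "a", "b", "c", "d"]
tree = build_tree(gene)
# Hypothetical terminal values: sqrt((3 + 1) * (6 - 2)) = 4
value = evaluate(tree, {"a": 3.0, "b": 1.0, "c": 6.0, "d": 2.0})
```

For a head size of 6 and functions of at most two arguments, `tail_length(6, 2)` gives 7, which agrees with the head and tail sizes listed in Table 3.1.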

In order to obtain the best GEP model, the mean squared error between the observed and predicted flood quantiles was used as the ‘fitness function’, and the training was undertaken to minimise this error. The combined model was developed in GeneXproTools® using the parameter settings in Table 3.1.

[Diagram: expression tree with sqrt at the root, function nodes + and -, and terminals a, b, c and d as leaves]

Figure 3.7 GEP expression tree (ET)

Table 3.1 Parameters used per run in GEP model

Parameter   Description                     Value
P1          Chromosomes                     20
P2          No. of genes                    5
P3          Head size                       6
P4          Tail size                       7
P5          Fitness function error type     MSE
P6          Linking function                Subtraction
P7          Mutation rate                   0.044
P8          Function set                    +, -, *, /, x^2, x^3, sqrt, Exp, Ln, Sin, Cos, 3Rt, Atan, Pow, Pow10, Log, Log2
P9          Inversion rate                  0.1
P10         Gene recombination rate         0.1
P11         One point recombination rate    0.3
P12         Two point recombination rate    0.1
P13         Gene transposition rate         0.1
P14         Data type                       Floating-type

3.2.4 Co-active neuro fuzzy inference system (CANFIS)

Fuzzy logic provides a different way to approach a control or classification problem. This method focuses on what the system should do rather than trying to model how it works. The procedure of developing a fuzzy inference system (FIS) using the framework of an adaptive neural network is called an adaptive neuro fuzzy inference system (ANFIS). A typical FIS is shown in Figure 3.8.

Consider the example of simple FIS with only two inputs x and y and one output z and suppose that the rule base contains two fuzzy if-then rules of Takagi and Sugeno (1983).


Let A be a crisp set. An individual x from a universal set X is determined either to be a member of A or a non-member of A. This can be expressed by:

$$\mu_A(x) : X \to \{0, 1\} \tag{3.7}$$

Figure 3.8 Fuzzy inference system (FIS) (Shi and Mozimoto, 2000)

Fuzzy logic can be best understood using set membership, where the membership values represent the degrees to which each object is associated with the properties that are distinctive to the collection. Formally, a fuzzy set A is defined as a collection of objects with membership values between 0 (complete exclusion) and 1 (complete membership). The membership grade of each element in X is determined through a membership function μA, which maps the elements of a universe of discourse X to the unit interval [0, 1]:

$$\mu_A : X \to [0, 1] \tag{3.8}$$

By using approximate reasoning, a fuzzy logic description can be used to effectively model the uncertainty and nonlinearity of a system (Shu et al., 2008). Approximate reasoning provides decision support and expert systems bound by a minimum of rules, and it is the most obvious implementation in the field of artificial intelligence.

Rule 1: If x is A1 and y is B1, then f1 = p1x + q1y + r1,

Rule 2: If x is A2 and y is B2, then f2 = p2x + q2y + r2,

where A1, A2 and B1, B2 are the membership functions of inputs x and y, respectively, and p1, q1, r1 and p2, q2, r2 are the parameters of the output functions. The node functions in the same layer are of the same function family, as described below:

Layer 1: Each node in this layer performs fuzzification and generates the membership grade of a fuzzy set (A, B, C or D), which specifies the degree to which the given input belongs to one of the fuzzy sets. The fuzzy sets are defined by membership functions (MFs).

Layer 2: Each node in this layer determines the MF of the whole input vector by aggregating the fuzzified results of the individual scalar functions of every input variable. The output of each node in this layer is obtained by multiplying the incoming signals and represents the firing strength of a rule.

Layer 3: This layer has two components. The upper component applies the MFs to each of the inputs, while the lower component is a representation of the modular network that computes, for each input, the sum of all the normalized firing strengths (Parthiban and Subramanian, 2009).

Layer 4: The fourth layer calculates the weight normalization of the output of the two components from the third layer and produces the output of the CANFIS network.
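The two-rule first-order Takagi-Sugeno inference described above can be sketched in Python as follows. This is a minimal illustration, not the CANFIS implementation used in this study. It assumes generalized bell membership functions of the form 1 / (1 + |(x - c)/a|^(2b)) and a firing-strength-weighted average as the final output; all parameter values are hypothetical.

```python
def bell(x, a, b, c):
    """Generalized bell membership function with width a, slope b, centre c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


def tsk_output(x, y, rules):
    """Weighted average of the rule outputs f_i = p_i x + q_i y + r_i.

    The firing strength of each rule is the product of the membership
    grades of x and y (multiplication of the incoming signals).
    """
    numerator = denominator = 0.0
    for rule in rules:
        w = bell(x, *rule["A"]) * bell(y, *rule["B"])
        p, q, r = rule["pqr"]
        numerator += w * (p * x + q * y + r)
        denominator += w
    return numerator / denominator


# Hypothetical rule base: (a, b, c) for each MF, (p, q, r) for each output function
rules = [
    {"A": (0.5, 1, 0.0), "B": (0.5, 1, 0.0), "pqr": (1.0, 1.0, 0.0)},  # Rule 1
    {"A": (0.5, 1, 1.0), "B": (0.5, 1, 1.0), "pqr": (2.0, 0.0, 1.0)},  # Rule 2
]
z = tsk_output(0.5, 0.5, rules)
```

At x = y = 0.5 both rules fire with equal strength (0.25 each), so the output is the mean of f1 = 1.0 and f2 = 2.0, i.e. 1.5.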

Fuzzy rules and fuzzy sets in the CANFIS capture and store the regional information. The training algorithm tunes the system parameters over the entire data space according to the hybrid learning rules. This approach provides a general framework that combines two techniques, the ANN and fuzzy systems. The CANFIS model provides nonlinear modelling capability and requires no assumption about the underlying model. By utilizing fuzzy techniques, the linguistic relationship between the input and output can be expressed using fuzzy rules. Unlike the initialization of an ANN, which may require several rounds of random selection, the initialization of a CANFIS can be performed using the one-pass subtractive clustering algorithm. A typical CANFIS model is shown in Figure 3.9.

The fuzzy neuron that applies membership functions (MFs) to the inputs is the fundamental component of CANFIS. The general bell and Gaussian functions are the two commonly used MFs (Principe et al., 2000). The bell-shaped membership function is used in this study. The normalized axon/neuron in the network is used to expand the output into the range of 0 to 1. One of the advantages associated with the fuzzy axon is that its MFs can be modified through backpropagation during network training, which expedites convergence. The modular neural network that applies functional rules to the inputs is the second major component of CANFIS. The number of modular networks equals the number of network outputs, and the number of processing elements in each network corresponds to the number of MFs.

Figure 3.9 A typical structure of CANFIS (Parthiban and Subramanian, 2009)

The CANFIS also has a combiner axon that applies the MF outputs to the modular network outputs (Roger et al., 1997; Alecsandru et al., 2004). Finally, the combined outputs are channelled through a final output layer and the error is backpropagated to both the MFs and the modular networks. There are a total of five layers in the CANFIS, similar to ANFIS, and the function of each layer is summarised as follows. The fuzzification of the input is performed by each node in layer 1. Each node in this layer gives the membership grade of a fuzzy set (A1, A2, B1 or B2) and specifies the degree to which the given input belongs to one of the fuzzy sets. The input to layer 2 is the product of all the output pairs from layer 1. Two components are present in the third layer of the network. The upper component of this layer applies the membership functions to each of the inputs, while the lower component is a representation of the modular network that computes, for each output, the sum of all the firing strengths. The weight normalization of the outputs of the two components of the third layer is performed in the fourth layer of the network, and this produces the final output of the network (Ishak and Trifiro, 2007).

The CANFIS model integrates adaptable fuzzy inputs with a modular neural network to rapidly and accurately approximate complex functions. The TSK fuzzy model proposed by Takagi, Sugeno and Kang (Takagi and Sugeno, 1985; Sugeno and Kang, 1988) is used in the present study, since this type of fuzzy model best fits the multi-input, single output system (Aytek, 2009).

For the CANFIS model development, model catchments were clustered based on model variables (A, Itc_ARI) into several class values in layer 1 to build up fuzzy rules, and each fuzzy rule was constructed through several parameters of membership function in layer 2. A fuzzy inference system structure was generated from the data using subtractive clustering. This was used in order to establish the rule base relationship between the inputs.

In order to obtain the best CANFIS models, the MSE between the observed and predicted flood quantiles was used as the ‘fitness function’, and the training was undertaken to minimise this error. The Levenberg-Marquardt (LM) method was used as the training algorithm to minimize the MSE. The CANFIS model was trained with a set of input and output data to adjust the weights and to minimize the MSE between the desired outputs and the model outputs. The testing data set was selected randomly to produce a reasonable sample of different catchment types and sizes. Two inputs (A, Itc_ARI) were used in the input layer, and the output layer had one output (Qpred).

In the case of CANFIS, the bell membership function and the TSK neuro fuzzy model were used, as this type of fuzzy model best fits the multi-input, single output system (Aytek, 2009). The LM algorithm was used for the training of the CANFIS model. Training was set to stop after a maximum of 1000 epochs or when the MSE dropped to the threshold value of 0.01.

3.2.5 Quantile regression technique (QRT)

A flood quantile is a probabilistic flood estimate for a selected ARI. The United States Geological Survey (USGS) proposed a quantile regression technique (QRT) in which a large number of gauged catchments are selected from a region and flood quantiles are estimated from recorded streamflow data, which are then regressed against the catchment variables that are most likely to govern the flood generation process. Studies by Benson (1962) suggested that T-year flood peak discharges could be estimated directly from catchment characteristics (predictor variables, X) by multiple regression analysis (Thomas and Benson, 1970; Stedinger and Tasker, 1985; Haddad and Rahman, 2012):

$$Q_T = \beta_0 X_1^{\beta_1} X_2^{\beta_2} \cdots \tag{3.9}$$

where the regression coefficients β are generally estimated by using an ordinary least squares (OLS) or generalised least squares (GLS) regression. Various techniques and many applications of regression models have been adopted for hydrological regression. Most of these methods are derived from the methodology set out by the USGS as described above. The USGS has been applying the QRT for several decades. A well-known study using the QRT with an OLS procedure was carried out by Thomas and Benson (1970). The study tested four regions in the United States for design flood estimation using multiple regression techniques that related streamflow characteristics to drainage-basin characteristics.

The OLS estimator has traditionally been used by hydrologists to estimate the regression coefficients β in regional hydrological models. However, for the OLS model to be statistically efficient and robust, the annual maximum flood series in the region must be uncorrelated, all the sites in the region should have equal record lengths and all estimates of T-year events must have equal variance. Since the annual maximum flow data in a region do not generally satisfy these assumptions, the OLS approach can provide very distorted estimates of the model’s predictive precision (model error) and of the precision with which the regression model coefficients are being estimated (Stedinger and Tasker, 1985).

In developing the QRT in this study, both the dependent and independent variables were log-transformed to linearize Equation 3.9. An OLS regression was adopted to develop prediction equations for each of the six flood quantiles using two predictor variables (A, Itc_ARI). The OLS approach is easy to implement, whereas GLS needs specialised software. However, both provide very similar results unless the data are highly correlated (Haddad et al., 2008). The data sets for building and independently testing the QRT model were the same as those used for the non-linear models. The MINITAB 14 software was used to develop the QRT models.
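The log-linear OLS fit can be sketched in Python as below. Taking logarithms of Equation 3.9 with two predictors gives log Q_T = log β0 + β1 log A + β2 log I, which is linear in the coefficients. This fragment is an illustration only and is not the MINITAB procedure used in this study: base-10 logarithms are an assumption, the normal equations are solved with a small Gaussian elimination routine, and the data are synthetic values generated from known coefficients purely to show that the fit recovers them.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_qrt(area, intensity, q):
    """OLS fit of log10(Q) on log10(A) and log10(I); returns (beta0, beta1, beta2)."""
    rows = [[1.0, math.log10(a), math.log10(i)] for a, i in zip(area, intensity)]
    y = [math.log10(v) for v in q]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    log_b0, b1, b2 = solve(xtx, xty)
    return 10.0 ** log_b0, b1, b2


# Synthetic data (illustration only): Q = 0.5 * A^0.7 * I^1.2
area = [10.0, 50.0, 120.0, 400.0, 900.0]
intensity = [30.0, 22.0, 45.0, 18.0, 27.0]
q = [0.5 * a ** 0.7 * i ** 1.2 for a, i in zip(area, intensity)]
beta0, beta1, beta2 = fit_qrt(area, intensity, q)
```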

3.2.6 Cluster analysis

In the process of formation of regions and to identify the groups of catchments in catchment characteristics data space, two methods were adopted in this study: cluster analysis and principal component analysis.

Clustering algorithms are generally grouped into two categories: partitional and hierarchical. Partitional clustering algorithms divide the data set into non-overlapping groups; algorithms such as k-means, bisecting k-means and k-modes fall under this category. Partitional clustering algorithms employ an iterative approach to group the data into a predetermined number k of clusters by minimising a cost function. Hierarchical clustering, in contrast, involves creating clusters that have a predetermined ordering from top to bottom.

A number of methods of cluster analysis with different distance measures are used. One problem in cluster analysis is that it generates different groupings with different methods of cluster analysis. The question then arises which of these groupings is to be selected as the ‘acceptable grouping’. In selecting the ‘acceptable grouping’ the criterion was used that there should be no chaining effect in the final clusters and there should be well defined grouping in the final sets of clusters/groupings.

To overcome the problem arising from the different dimensional units of the variables in cluster analysis, the variables were standardized. The variables were transformed to z-scores (mean = 0 and standard deviation = 1). It was assumed that there could be two groupings in the cluster analysis, so that each group contains a relatively large number of stations, which is needed for successful calibration of the RFFA model using non-linear techniques.
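The z-score transformation can be illustrated with a few lines of Python. This is a sketch only; the use of the sample standard deviation (divisor n - 1) is an assumption, and the catchment areas are hypothetical.

```python
import statistics


def z_scores(values):
    """Transform values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


# Hypothetical catchment areas in km2 (illustration only)
area_z = z_scores([12.0, 85.0, 240.0, 610.0, 980.0])
```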

The hierarchical cluster analysis

There are numerous ways in which clusters can be formed. Hierarchical clustering is one of the most straightforward methods. A key component of the analysis is the repeated calculation of distance measures between objects, and between clusters once objects begin to be grouped into clusters. The outcome is represented graphically as a dendrogram. For this study, hierarchical clustering was used with the following methods:

• Wards;

• Median;

• Baverage;

• Waverage; and

• Centroid.

Because the goal of this cluster analysis is to form groups of similar catchments, a criterion for measuring similarity or distance needs to be selected. Distance is a measure of how far apart two objects are, while similarity measures how alike two objects are. For cases that are alike, distance measures are smaller and similarity measures are larger. Some measures, like the Euclidean distance, are suitable only for continuous variables, while others are suitable only for categorical variables. There are also many specialized measures for binary variables. In this study, different measures were adopted and the method giving the best clusters with the fewest outliers was selected for ANN modelling. For each of the above methods, the following distance measure options were adopted:

• Block;

• Euclid;

• Seuclid;

• Correlation;

• Cosine;

• Chebychev;

• Minkowski; and

• Power.

Based on the above-mentioned criteria for selecting the best grouping, the cluster method ‘Wards’ with the distance measure option ‘Block’ was adopted for the selection of regions based on the hierarchical cluster analysis.
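Several of the distance measures listed above can be written out in a few lines of Python. This sketch is illustrative only. The reading of the option names is an assumption: ‘Block’ is taken as the city-block (Manhattan) distance, ‘Euclid’ as the Euclidean distance, ‘Seuclid’ as the squared Euclidean distance and ‘Chebychev’ as the maximum absolute difference. The two points are hypothetical standardized catchment characteristics.

```python
def block(u, v):
    """City-block distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(u, v))


def seuclid(u, v):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def euclid(u, v):
    """Euclidean distance."""
    return seuclid(u, v) ** 0.5


def chebychev(u, v):
    """Largest absolute difference over all variables."""
    return max(abs(a - b) for a, b in zip(u, v))


def minkowski(u, v, p):
    """Minkowski distance of order p (p = 1 gives block, p = 2 gives Euclid)."""
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


# Two hypothetical catchments described by two standardized variables
site_1, site_2 = [0.0, 0.0], [3.0, 4.0]
d_block = block(site_1, site_2)
```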

K-means clustering

K-means clustering is a partitioning method. The k-means function partitions the data into k mutually exclusive clusters and returns the index of the cluster to which it has assigned each observation. Unlike hierarchical clustering, k-means clustering operates on actual observations (rather than the larger set of dissimilarity measures), and creates a single level of clusters. These distinctions mean that k-means clustering is often more suitable than hierarchical clustering for large amounts of data.
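The k-means procedure can be illustrated with a minimal Python sketch of Lloyd's algorithm. It is not the software routine used in this study. As a simplification, the first k observations are assumed as the initial cluster centres, and the observations are hypothetical.

```python
def k_means(points, k, max_iter=100):
    """Partition points into k clusters; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]     # simplistic initialisation
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: nearest centroid by squared Euclidean distance
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:                           # keep old centroid if empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


# Hypothetical observations forming two well-separated groups
obs = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cluster_index, centres = k_means(obs, k=2)
```

The function returns the cluster index of each observation, giving a single level of clusters rather than a hierarchy.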

3.2.7 Principal component analysis (PCA)

At the second stage of selecting an acceptable grouping as part of the formation of regions, principal component analysis (PCA) was undertaken. PCA is basically a variable-reduction technique that shares many similarities with exploratory factor analysis. Its aim is to reduce a larger set of variables into a smaller set of artificial variables, called 'principal components', which account for most of the variance in the original variables. The PCA transforms a set of correlated variables into a new set of uncorrelated components, such that the first component accounts for the largest amount of the total variation in the data; the second component, which is uncorrelated with the first, accounts for the maximum amount of the remaining total variation not already accounted for by the first component, and so on. In this study, PCA was undertaken using the statistical package SPSS. The variables used in PCA are discussed in Chapters 4 and 5.
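As a small worked illustration (not a result of this study), consider PCA applied to two standardized variables with correlation coefficient r. The symbols R, λ and v below are introduced here only for this example, and the value r = 0.8 is hypothetical.

```latex
% Correlation matrix of two standardized variables
R = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}

% Eigenvalues (variances of the two principal components), for r > 0
\lambda_1 = 1 + r, \qquad \lambda_2 = 1 - r

% Corresponding eigenvectors (component loadings)
v_1 = \tfrac{1}{\sqrt{2}} (1,\ 1)^{\mathsf T}, \qquad
v_2 = \tfrac{1}{\sqrt{2}} (1,\ -1)^{\mathsf T}

% Proportion of total variance explained by the first component
\frac{\lambda_1}{\lambda_1 + \lambda_2} = \frac{1 + r}{2}
\quad\Rightarrow\quad r = 0.8 \ \text{gives}\ 0.9
```

The two components are uncorrelated, and the stronger the correlation between the original variables, the larger the share of the total variation captured by the first component.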

3.2.8 Model validation technique

In this study, models/prediction equations were developed for each of six flood quantiles, corresponding to average recurrence intervals (ARIs) of 2, 5, 10, 20, 50 and 100 years. A split-sample validation technique was adopted to test the performance of the developed models/prediction equations, in which the data set was divided into two parts: (i) a training/modelling data set, which includes 80% of the study catchments; and (ii) a validation/testing data set, which includes 20% of the study catchments. The artificial intelligence based RFFA models and the QRT were first developed using the training/modelling data set and were then tested using the validation/testing data set. This enabled an independent testing of the models/prediction equations developed in this study.
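A random 80/20 split of the study catchments can be sketched in Python as follows. The function name, the random seed and the catchment identifiers are hypothetical and are used only for illustration.

```python
import random


def split_sample(catchment_ids, test_fraction=0.2, seed=0):
    """Randomly divide catchments into training and testing sets."""
    rng = random.Random(seed)             # fixed seed for a repeatable split
    shuffled = list(catchment_ids)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# 100 hypothetical catchment identifiers
ids = [f"C{n:03d}" for n in range(100)]
training_ids, testing_ids = split_sample(ids)
```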

3.3 Summary

This chapter provides a description of the statistical and mathematical tools adopted in this study. These include ANN, GAANN, GEP, CANFIS, cluster analysis, principal component analysis and quantile regression technique (QRT). The fundamental concepts, mathematical equations and input data requirements for each of these methods are presented in this chapter. The adopted split-sample validation technique is also described, which allowed an independent testing of the models/prediction equations developed in this study.


CHAPTER 4

SELECTION OF STUDY AREA AND DATA PREPARATION

4.1 General

This thesis focuses on design flood estimation in ungauged catchments using artificial intelligence based methods. The regional flood frequency analysis (RFFA) method is based on the streamflow and catchment characteristics data of a set of selected gauged catchments. It is important that an appropriate set of catchments is selected and that the data are prepared following standard procedures. This chapter presents the selection of the study area and catchments, and the collation of the streamflow and catchment characteristics data used in this research.

4.2 Selection of study area

This study selects eastern Australia as the study area since this part of Australia has the highest density of stream gauging stations with good quality data. Eastern Australia covers the states of Queensland (QLD), New South Wales (NSW), Victoria (VIC) and Tasmania (TAS), and the Australian Capital Territory (ACT). The selected study area is shown in Figure 4.1.

Figure 4.1 Location of the selected study area (coloured parts of the map)

4.3 Selection of study catchments

4.3.1 Factors considered for selection of catchments

The following factors were considered in making the initial selection of the study catchments:

Catchment area

The flood frequency behaviour of large catchments has been shown to differ significantly from that of smaller catchments. Since the RFFA method is intended for small to medium sized catchments, it should be developed using data from such catchments. Australian Rainfall and Runoff (ARR) (I. E. Aust., 1987) suggests an upper limit of 1000 km2 for small to medium sized catchments, which seems reasonable and was adopted in this thesis.

Record length

For a stream gauging station, a long enough streamflow record is ideally needed to characterize the underlying flood probability distribution with reasonable accuracy. In most practical situations, streamflow records at many gauging stations in a given study area are not long enough and hence a balance is required between obtaining a sufficient number of stations (which captures greater spatial information) and a reasonably long record length (which enhances accuracy of at-site flood frequency analysis). Selection of a cut-off record length appears to be difficult as this can affect the total number of stations available to develop the RFFA technique in a study area. For this study, the stations having a minimum of 10 years of annual instantaneous maximum flow records were selected initially as ‘candidate stations’.

Regulation

Ideally, the selected streams should be unregulated, since major regulation affects the rainfall- runoff relationship significantly (storage effects). Streams with minor regulation, such as small farm dams and diversion weirs, may be included because this type of regulation is unlikely to have a significant effect on annual maximum (AM) floods. Gauging stations on streams subject to major upstream regulation were not included in this thesis.


Urbanisation

Urbanisation can affect flood behaviour dramatically (e.g. decreased infiltration losses and increased flow velocity). Therefore catchments with more than 10% of the area affected by urbanisation were not included in this thesis.

Landuse change

Major landuse changes, such as the clearing of forests or changing agricultural practices modify the flood generation mechanisms and make streamflow records heterogeneous over the period of record length. Catchments which have undergone major landuse changes over the period of streamflow records were not included in the data set.

Quality of data

Most of the statistical analyses of flood data assume that the available data are essentially error free; at some stations, this assumption may be grossly violated. Stations graded as ‘poor quality’ or with specific comments by the gauging authority regarding quality of the data were assessed in greater detail; if they were deemed to be of ‘low quality’, they were excluded from the study.

4.4 Streamflow data preparation

4.4.1 Methods of streamflow data preparation

Missing observations in streamflow records at gauging locations are very common, and one of the elementary steps in any hydrological data analysis is to decide how to deal with these missing data points. Missing records in the AM flood series were in-filled where the extra data points could be estimated with sufficient accuracy to contribute additional information rather than ‘noise’. In this research, the following two methods were applied, following the approach of Rahman (1997) and Haddad et al. (2010).

Method 1

In this method, the monthly instantaneous maximum (IM) data were compared with the monthly maximum mean daily (MMD) data at the same station for years with data gaps. If a month with a missing instantaneous maximum flow corresponded to a month of very low maximum mean daily flow, this was taken to indicate that the AM did not occur during the missing month.

Method 2

This method involved a linear regression of the AM mean daily flow series against the annual instantaneous maximum series of the same station. Gaps in the IM record were in-filled using the developed regression equations. The regression was not used to extend the overall period of record of instantaneous flow data, but only to in-fill the missing data points.

As Method 1 is more directly based on observed data for the missing month and involves fewer assumptions, it was preferred over Method 2.
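The regression-based in-filling of Method 2 can be sketched as follows. This is a minimal illustration, assuming an ordinary least-squares fit in untransformed flow units; whether the regressions were fitted in real or logarithmic space is not stated here, and the function names and flow values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def infill_im(mmd_by_year, im_by_year):
    """Fill missing annual IM flows (None) from the same year's AM mean daily flow.

    Only years already present in the IM record are filled, so the
    overall period of record is not extended.
    """
    paired = [(mmd_by_year[yr], im) for yr, im in im_by_year.items()
              if im is not None and yr in mmd_by_year]
    slope, intercept = fit_line([p[0] for p in paired], [p[1] for p in paired])
    filled = dict(im_by_year)
    for yr, im in im_by_year.items():
        if im is None and yr in mmd_by_year:
            filled[yr] = slope * mmd_by_year[yr] + intercept
    return filled

# Hypothetical flows (m3/s): the 1993 instantaneous maximum is missing.
mmd = {1990: 100.0, 1991: 200.0, 1992: 300.0, 1993: 250.0}
im = {1990: 150.0, 1991: 300.0, 1992: 450.0, 1993: None}
filled = infill_im(mmd, im)
```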

4.4.2 Tests for outliers

In a set of annual maximum (AM) flood series there is a possibility of outliers being present. An outlier is an observation that deviates significantly from the bulk of the data, which may be due to errors in data collection or recording, or due to natural causes.

The method for treating outliers suggested in ARR (I. E. Aust., 1987) was not adopted here, as it includes an adjustment for skew, employing somewhat ‘circular’ logic. Instead, the Grubbs and Beck (1972) method was adopted. This method determines high and low outlier threshold values by applying a one-sided 10% significance level test that accounts for the sample size. The test was developed by Grubbs and Beck (1972) for detecting a single outlier from a normal distribution, but has been shown to be also applicable to the LP3 distribution.
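A minimal sketch of a Grubbs and Beck type threshold calculation is given below. It rests on two assumptions that are not confirmed by the text: the test is applied to the base-10 logarithms of the AM flows, and the one-sided 10% critical value K_N is taken from the polynomial approximation commonly quoted alongside US Bulletin 17B (generally used for sample sizes of about 10 to 149). The flow values are hypothetical.

```python
import math

def grubbs_beck_kn(n):
    """Approximate one-sided 10% critical value K_N for sample size n.

    Polynomial approximation (an assumption of this sketch, not taken
    from the thesis).
    """
    return (-3.62201 + 6.28446 * n ** 0.25 - 2.49835 * n ** 0.5
            + 0.491436 * n ** 0.75 - 0.037911 * n)

def grubbs_beck_thresholds(flows):
    """Return (low, high) outlier thresholds in flow units."""
    logs = [math.log10(q) for q in flows]
    n = len(logs)
    mean = sum(logs) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in logs) / (n - 1))
    kn = grubbs_beck_kn(n)
    return 10 ** (mean - kn * sd), 10 ** (mean + kn * sd)

def flag_outliers(flows):
    """Split flows into low and high outliers relative to the thresholds."""
    low, high = grubbs_beck_thresholds(flows)
    return [q for q in flows if q < low], [q for q in flows if q > high]

# Hypothetical AM series (m3/s) containing one drought-year value.
am_flows = [120, 150, 90, 200, 170, 130, 110, 160, 140, 180, 0.5]
low_outliers, high_outliers = flag_outliers(am_flows)
```

Flows flagged as low outliers would then be treated as censored values rather than discarded, in line with the treatment described for the state data sets.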

4.4.3 Trend analysis

Hydrological data for any flood frequency analysis, be it at-site or regional, should be stationary, consistent and homogeneous. The AM flow series should not show any time trend if it is to satisfy the basic assumption of stationarity in traditional flood frequency analysis methods. Thus, in this study, a trend analysis was carried out where possible to identify stations showing significant trend; only the stations which did not show any significant trend were included in the primary data set. Two tests were initially applied to detect trend, the Mann-Kendall test (Kendall, 1970) and the distribution-free CUSUM test (McGilchrist and Woodyer, 1975); both tests were applied at the 5% significance level. The Mann-Kendall test is concerned with testing whether there is an increase or decrease in a time series, whereas the CUSUM test concentrates on whether the means in two parts of a record are significantly different. As a useful guide and in addition to the trend tests, a simple time series plot and a cumulative flow graph of the station were also used to detect shifts in the AM flood data. It should be noted that trends in time series data do not necessarily mean non-stationarity. In climate change research, non-stationarity means significant changes in the statistical properties of the time series data of a hydro-meteorological variable over time. Trends may not change the statistical properties (such as mean and variance) of a time series significantly. Therefore, trend analysis cannot be used as a stationarity test; however, trends may be an indicator of non-stationarity.
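The Mann-Kendall test mentioned above can be sketched in a few lines. This is a simplified version that assumes no tied values (no tie correction in the variance) and uses the normal approximation for the two-sided p-value; the example series is hypothetical.

```python
import math

def mann_kendall(series):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    Simplified sketch: the variance formula omits the correction for ties.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of each pairwise difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return s, z, p_value

# Hypothetical steadily decreasing AM series (m3/s).
s_stat, z_stat, p_val = mann_kendall([10, 9, 8, 7, 6, 5, 4, 3, 2, 1])
significant_at_5pct = p_val < 0.05
```

A negative S with a p-value below 0.05 would indicate a significant decreasing trend at the 5% level adopted in this study.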

4.4.4 Rating error analysis

The rating curve used to convert measured flood levels to flood flow rates is based on periodic measurements of flow areas and velocities over a range of flow magnitudes. However, the range of observed flood levels generally exceeds the range of ‘measured’ flows, thus requiring different degrees of extrapolation of well-established rating curves.

Any rating curve extrapolation errors are directly transferred into the largest observations in the AM flood series, and use of extrapolated data in flood frequency analysis can thus result in grossly inaccurate flood frequency estimates.

To assess the degree of rating curve related error for a given station, the AM flood series data point for each year (estimated flow QE) was divided by the maximum measured flow (QM) to obtain a rating ratio (RR) (see Equation 4.1). If the RR value is below or near 1, the corresponding AM flow may be considered to be free of rating curve extrapolation error. However, a RR value well above 1 indicates a rating curve error that can cause notable errors in flood frequency analysis.

$$\mathrm{RR} = \frac{Q_E}{Q_M} \qquad (4.1)$$

For any RFFA, a large number of stations with reasonably long record lengths are required and hence a trade-off needs to be made between an extensive data set that includes stations with very large RR values (and thus lower accuracy) and a smaller data set with RR values restricted to what could be considered to be a “reasonable upper limit” of rating curve errors.

A working method to decide on a cut-off RR value was determined by looking at the average RR value and the maximum RR value for each station in a region/state. Based on the results from Victoria and NSW, the following cut-off values were found to represent a reasonable compromise between accuracy at individual sites and total size of the regional data set: an average RR value of 4 and a maximum RR value of 20.
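The rating ratio screening can be sketched as below. The sketch assumes that a station is retained only if both limits are met (average RR not above 4 and maximum RR not above 20); this reading of the cut-off rule, the function names and the flow values are assumptions made for illustration.

```python
def rating_ratios(am_flows, max_measured_flow):
    """Rating ratio RR = QE / QM for each annual maximum flow."""
    return [q / max_measured_flow for q in am_flows]

def passes_rr_screen(am_flows, max_measured_flow, avg_cutoff=4.0, max_cutoff=20.0):
    """True if the station meets both the average and the maximum RR limits.

    Requiring both limits to be met is an assumption of this sketch.
    """
    rr = rating_ratios(am_flows, max_measured_flow)
    return sum(rr) / len(rr) <= avg_cutoff and max(rr) <= max_cutoff

# Hypothetical stations, each with a maximum measured (gauged) flow of 100 m3/s.
station_a_ok = passes_rr_screen([50.0, 100.0, 150.0], 100.0)  # RR = 0.5, 1.0, 1.5
station_b_ok = passes_rr_screen([500.0, 2500.0], 100.0)       # RR = 5, 25
```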

4.5 Selection of catchment characteristics

Identification of the most relevant catchment characteristics is difficult as there is no objective method for doing this. Also, many catchment characteristics are highly correlated; including many of them in a model can cause problems in statistical analysis, such as multicollinearity, without providing any extra useful information.

The initial selection of candidate characteristics should be based on an evaluation of the success of catchment characteristics used in past studies. All the possible catchment/climatic characteristics from the past studies must be considered in making the selection for a given study. This can increase the validity of the model to be developed. Rahman (1997) considered this aspect in detail, drawing on over 20 previous studies to develop a reasonable starting point. In RFFA, however, the significance of characteristics may vary from region to region; therefore, no general inference about the significance of a particular catchment characteristic can be made for a given region based on the findings of other studies.

4.5.1 Selection criteria

The following guidelines were adopted in this study to select the catchment characteristics, following the approach of Rahman (1997):

• The characteristic should have a plausible role in flood generation.
• It should be unambiguously defined.
• It should be easily obtainable. When a simpler characteristic and a complex one are correlated and have similar effects, the simpler characteristic should be chosen.
• If a derived/combined characteristic is used, it should have a simple physical interpretation.
• The selected characteristics should not be highly correlated because this introduces unstable parameters in multiple regression analysis.
• The prediction performance of a particular characteristic in other regionalisation studies should be examined as this might provide some information regarding the importance of a characteristic.
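The guideline on avoiding highly correlated characteristics can be checked with a simple pairwise correlation screen, sketched below. The threshold of 0.7, the characteristic names and the data values are hypothetical and chosen only for illustration; they are not taken from this study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def highly_correlated_pairs(characteristics, threshold=0.7):
    """List pairs of characteristics whose |r| meets or exceeds the threshold."""
    names = sorted(characteristics)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson_r(characteristics[first], characteristics[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs

# Hypothetical values for five catchments.
data = {
    "area": [10, 50, 200, 400, 900],            # km2
    "stream_length": [5, 14, 30, 45, 70],       # km
    "rain": [1200, 800, 1500, 700, 1000],       # mm
}
flagged = highly_correlated_pairs(data)
```

In this made-up example only area and stream length are flagged, in which case the simpler and more easily obtained characteristic would be retained.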

4.5.2 Catchment characteristics considered in this thesis

The following five catchment characteristics were selected in this thesis on the basis of the criteria listed in Section 4.5.1. They are described in detail in the following sections.

The candidate catchment/climatic characteristics are:

• Design rainfall intensity (I_tc_ARI, mm/h);
• Mean annual rainfall (R, mm);
• Mean annual evapo-transpiration (E, mm);
• Catchment area (A, km2); and
• Slope of central 75% of mainstream S1085 (S, m/km).

4.5.3 Rainfall intensity

Rainfall intensity, with some appropriate duration and average recurrence interval (ARI), has been found to be the most influential climatic characteristic in the previous RFFA studies. There is no doubt that it is significant in the flood generation process. It is also quite easy to obtain.

The use of rainfall intensity requires the selection of an appropriate duration and ARI. It seems logical to use rainfall intensity with a duration equal to the time of concentration (tc), as applied in the rational method. However, the time of concentration (tc) differs between the catchments in the study area due to variability in size and shape; i.e. it is virtually impossible to select a single storm duration that is representative of every catchment in this thesis. It was therefore decided to include the following design rainfall intensities in this study:

• (tc) duration, 2 years ARI (I_tc_2, mm/h);
• (tc) duration, 5 years ARI (I_tc_5, mm/h);
• (tc) duration, 10 years ARI (I_tc_10, mm/h);
• (tc) duration, 20 years ARI (I_tc_20, mm/h);
• (tc) duration, 50 years ARI (I_tc_50, mm/h); and
• (tc) duration, 100 years ARI (I_tc_100, mm/h).

All the basic design rainfall intensity data for the selected catchments were obtained from ARR, Vol. 2 (I. E. Aust., 1987) and the software AUSIFD was used to obtain other design rainfall intensities. AUSIFD is widely used software in Australia to derive design rainfalls. For consistency, and ease of application, the formula recommended in ARR 1987 for Victoria and eastern NSW, given by Equation 4.2, was adopted in this thesis to estimate the time of concentration tc (hours) from catchment area A (km2).

$$t_c = 0.76\,A^{0.38} \qquad (4.2)$$
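As a quick numerical illustration of Equation 4.2, the short sketch below evaluates the time of concentration for a hypothetical 100 km2 catchment; the function name is illustrative only.

```python
def time_of_concentration_hours(area_km2):
    """Time of concentration tc (hours) from catchment area A (km2), Equation 4.2."""
    return 0.76 * area_km2 ** 0.38

# Hypothetical catchment of 100 km2: tc is roughly 4.4 hours.
tc_100 = time_of_concentration_hours(100.0)
```

The design rainfall intensity for each ARI would then be read for a storm duration equal to this tc.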

4.5.4 Mean annual rainfall

Mean annual rainfall has been adopted in many previous studies; although it may not have a direct influence on or link with flood peaks, it can still have a secondary effect by acting as a surrogate for other catchment characteristics (e.g. vegetation). It is also quite easy to obtain. Thus, mean annual rainfall was included as a candidate predictor variable in this study. The mean annual rainfall data were obtained from the Australian Bureau of Meteorology CD. For all the catchments, the mean annual rainfall value for the rainfall station closest to the centroid of each catchment was extracted.

4.5.5 Catchment area

Catchment area is the most frequently adopted morphometric characteristic and the main scaling factor in flood process studies, since it has a direct impact on the possible flood magnitude from a given storm event. Almost all of the reported RFFA studies have found catchment area to be very significant. One of the reasons why the area variable has been so useful in statistical hydrology is its association with other significant morphometric characteristics like slope, stream length and stream order. Catchment areas of the selected catchments were measured by planimeter from 1:100,000 topographic maps. The derived areas were also compared to the values provided in the catchment database that contained the streamflow data provided by the stream gauging authority. Area was characterised by Anderson (1957) as the ‘devil’s own variable’, because almost every watershed characteristic is correlated with it. As in the case of area, the mean annual flood is directly proportional to other morphometric characteristics, which are again directly proportional to area (e.g. stream order, stream length). The total volume of runoff (Q) is related to the area of the catchment (A) by a relationship of the general form:

$$Q = cA^{m} \qquad (4.3)$$

where the exponent m varies from 0.5 to 1.0. Catchment area was included in this study as a candidate predictor variable.

4.5.6 Slope S1085

Of the different measures of slope, S1085 is easily obtainable and has been reported to be the best measure for prediction of mean flood (Benson, 1959). Thus, S1085 was used in this study. The S1085 measure excludes the extremes of slope that can be found at either end of the mainstream. It is the ratio of the difference in elevation of the stream bed at 85% and 10% of the mainstream length from the catchment outlet to 75% of the mainstream length.

The following methodology was adopted to derive the S1085 values:

• Catchment boundaries were plotted on 1:100,000 topographic maps for each gauged station.
• The mainstream length was measured using an electronic map wheel. The mainstream length was taken as the total distance from the outlet to the point where the stream intersects the catchment boundary. The longest path was chosen for each catchment as the mainstream of that catchment.
• Elevations were then derived for the 10% and 85% mainstream length positions. The elevations were interpolated from either 10 m or 20 m contours.
• S1085 values were determined from Equation 4.4.

$$\mathrm{S1085} = \frac{E_2 - E_1}{0.75\,L} \qquad (4.4)$$

where E2 is the elevation at the 0.85L position, E1 is the elevation at the 0.10L position and L is the mainstream length; S1085 is expressed in m/km. The slope S1085 is referred to as S henceforth.
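Equation 4.4 can be evaluated directly, as in the short sketch below. The elevations and mainstream length are hypothetical values chosen for illustration, with elevations in metres and length in kilometres so that the result is in m/km.

```python
def s1085(elev_at_85pct_m, elev_at_10pct_m, mainstream_length_km):
    """Slope S1085 (m/km) from elevations at 0.85L and 0.10L (Equation 4.4)."""
    return (elev_at_85pct_m - elev_at_10pct_m) / (0.75 * mainstream_length_km)

# Hypothetical catchment: E2 = 400 m, E1 = 250 m, L = 20 km.
slope = s1085(400.0, 250.0, 20.0)
```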

4.5.7 Mean annual evapo-transpiration

Mean annual evapo-transpiration is the third influential climatic characteristic considered in the flood generation process. Evapo-transpiration does not affect the flood peak directly but can have a secondary effect by being a surrogate for other catchment characteristics. Evapo- transpiration can be defined as the water lost from a water body through the combined effects of evaporation and transpiration from catchment vegetation. In this study mean annual areal potential evapo-transpiration data was used.

For this, the data was obtained from the Australian Bureau of Meteorology CD. For all the catchments the value at the centroid of each catchment was extracted.

4.6 Streamflow data preparation for various states

4.6.1 NSW and ACT

A total of 635 stations were selected from NSW and ACT initially. For in-filling the gaps, Method 1 was preferred over Method 2 (see Section 4.4.1 for description of these methods) for different catchments in NSW.

Trend analysis

Initially the Mann-Kendall test was applied to the stations. The results showed that some 11% of the stations had a decreasing trend generally after 1990. Given the magnitude of the number of stations showing trend, time series plots and mass curves were prepared for the stations showing trend to detect visually if significant changes in slope could be identified. A typical plot is shown in Figure 4.2. A simple time series plot (Figure 4.3) is useful in addition to trend tests in detecting and confirming shifts in data. With an indication from these tests that flood data are not independently and identically distributed from year to year, there needs to be caution applied when using short records in estimating long term risks.

The fact that the last 10–15 years of data (after the late 1980s) showed a significant downward trend for many stations makes the inclusion of stations with short record length in flood frequency analysis questionable, as this could introduce significant bias in the results. Hence, it was decided that a station should have at least 25 years of streamflow data. The number of eligible stations in NSW and ACT after the introduction of a cut-off record length of 25 years dropped to 106.

Checking for outliers in the AM flood series

The Grubbs and Beck (1972) method was adopted to check for the outliers. While the data checking revealed many ‘outliers’ in the flood series, these did not preclude the use of the remaining flood data in RFFA. The results of the outlier detection procedure are summarised below:

• 40% of the stations were found to have low outliers. The maximum number of low outliers detected in a data series was 9 and never exceeded 21% of the total number of data points in a series.
• Most of the detected low outliers occurred for stations located in low rainfall areas, especially in the western parts of NSW.
• 31% of low outliers occurred in the years 1982, 1967 and 1994. This is not surprising as there were severe droughts during these years; the maximum flows that occurred in many rivers in these years were merely base flows, and not due to flood events.
• 47% of the stations did not show any outliers.
• Only 5 stations had a high outlier.

The detected low outliers were treated as censored flows in flood frequency analysis using ARR FLIKE (Kuczera and Franks, 2005).

Rating curve error

To assess the degree of rating curve related error for a given station, the rating ratio (RR) (see Equation 4.1) was adopted. In the remaining data set of 106 stations from NSW, many had RR values considerably greater than 1 (Figure 4.4). A cut-off RR value of 20 was adopted; any station having an average RR value greater than 4 and a maximum RR value greater than 20 was rejected. This reduced the eligible number of stations from 106 to 96.

Final data set from NSW and ACT

A total of 635 stations were initially selected. After in-filling the gaps in the AM flood series, trend analysis, introduction of a cut-off record length of 25 years, and consideration of rating curve errors, only 96 stations remained, which represent about 15% of the initially selected stations. The statistics of AM streamflow record lengths of these 96 stations are summarised below:

• Record lengths range from 25 to 74 years, mean 34 years, median 31 years and standard deviation 10 years;
• 77% of the stations have record lengths in the range 25-35 years;
• 18% of the stations have record lengths in the range 40-55 years; and
• 5% of the stations have record lengths in the range 60-75 years.

The histogram of streamflow record lengths of the 96 stations from NSW and ACT is shown in Figure 4.5.

Figure 4.2 Result of trend analysis (Station 219001). Here Vk is the CUSUM test statistic defined in McGilchrist and Woodyer (1975). [Plot of Vk against year, 1940 to 2010, annotated ‘Significant shift downwards’]

Figure 4.3 Result of trend analysis – time series plot (Station 219001). [Plot of annual maximum flow (m3/s) against year, 1940 to 2010, annotated ‘Decrease in flow magnitude’]

Figure 4.4 Histogram of rating ratios for 106 stations from NSW. [Frequency (logarithmic scale) against rating ratio (RR), annotated ‘Over 95% of rating ratios between 1 & 20’]

The statistics of catchment areas of the selected 96 stations are summarised below:

• Catchment areas range from 8 to 1010 km2, with an average value of 353 km2, median of 267 km2 and a standard deviation of 276 km2;
• 53% of catchments have areas smaller than 300 km2;
• 38% of stations have areas in the range of 301 km2 to 800 km2; and
• 10% of stations have areas in the range of 801 km2 to 1010 km2.

Figure 4.5 Distribution of streamflow record lengths of 96 stations from NSW and ACT. [Histogram of frequency against record length (years)]

The distribution of catchment areas is shown in Figure 4.6. The geographical distribution of the finally selected 96 stations is shown in Figure 4.7. There is no station in far western NSW that passed the selection criteria.

Figure 4.6 Distribution of catchment areas of 96 stations from NSW and ACT. [Histogram of frequency against catchment area (km2)]

4.6.2 Tasmania

A total of 73 stations were selected as candidates from Tasmania, each having a minimum of 10 years of streamflow record. For in-filling the gaps in the AM flood series, Method 1 was preferred over Method 2 (these methods are described in Section 4.4.1). The following points summarise the results of the in-filling of the AM flood series data for Tasmania:

• 18 data points from 23 stations were in-filled by comparing flow records (Method 1);
• 27 data points from 12 stations were in-filled by regression (Method 2); and
• 20% of stations did not have any missing record.

After in-filling the gaps, the stations were then checked for possible trends (Section 4.4.3 details the method). Only three stations showed trends. The relevant data for checking the rating ratios for Tasmania was largely unavailable, and hence no rating error analysis was undertaken. About 9% of the stations showed low outliers. The maximum number of low outliers detected in a data series was one and never exceeded 4% of the total number of data points in a series. The low outliers occurred in the years 1967, 1982 and 2001. About 75% of the stations did not show any outliers. About 14% of the stations showed high outliers; however, these data points were not removed as no data error was detected.

While obtaining catchment characteristics data, 7 stations were found to have significant proportions of lake areas, and were thus excluded; this reduced the dataset to 56 stations. From this, 3 catchments over 1590 km2 were excluded, thus the final dataset contained 53 stations.

Figure 4.7 Geographical distributions of 96 catchments from NSW and ACT

The streamflow record lengths of the selected stations range from 10 to 58 years (median: 21 years and mean: 24 years). Figure 4.8 shows the distribution of record lengths. Figure 4.9 presents the distribution of catchment areas of the selected catchments. The catchment areas range from 4.6 to 1590 km2 (median: 102 km2 and mean: 240 km2). Figure 4.10 shows the locations of the selected stations. There is a lack of stations in the southern and eastern parts of the state.

Figure 4.8 Distribution of streamflow record lengths of the selected stations from Tasmania. [Histogram of frequency against record length (years)]

Figure 4.9 Distribution of catchment areas of the selected stations from Tasmania. [Histogram of frequency against catchment area (km2)]


Figure 4.10 Locations of selected catchments from Tasmania

4.6.3 Queensland

The streamflow data were obtained from the Department of Natural Resources & Water (NRW), QLD. A total of 351 active and historical streamflow gauging station records were provided by NRW. Gauge station metadata, AM flow records as well as the monthly and daily records were supplied by the NRW for each station. Based on the adopted selection criteria, the number of eligible stations was reduced to 289.

The streamflow data were in-filled by comparing flow records (Method 1) and/or regression (Method 2). Method 1 was preferred over Method 2. Some years’ data could not be filled due to many missing records. Some important statistics regarding the gap filling are:

• 81 data points were in-filled for 47 stations using Method 1;
• 413 data points were in-filled for 104 stations using Method 2; and
• 16% of stations did not have any missing records.

To check for outliers, the Grubbs and Beck (1972) method was used. Some important statistics about the outlier detection are:

• 39% of stations were found to have low outliers; the maximum number of outliers detected in a data series was 4 and never exceeded 10% of the total number of data points in a series.
• Most of the detected low outliers occurred in the mid-western and top parts of Queensland.
• The bulk of the low outliers occurred in the years 1967, 1982 and 2001.
• 61% of stations did not have any outliers.

A total of 117 stations (7% of the stations) showed a significant trend, and were removed from the database. As a result, 265 stations were retained.

Furthermore, only stations with a streamflow record length of 25 years or more were selected. After the introduction of this cut-off, the number of catchments from QLD dropped to 172. Figure 4.11 provides a histogram of the record lengths of the 172 stations. Some important statistics of the streamflow record lengths are provided below:

The distribution of catchment areas of these catchments is shown in Figure 4.12. Some important statistics of the catchment areas are summarised below:

• 24 catchments (9%) are smaller than 50 km2;
• 67 catchments (25%) are smaller than 100 km2;
• 47 catchments (18%) are in the range of 101 to 200 km2; and
• 37 catchments (14%) are larger than 600 km2.

The locations of the selected 172 stations are shown in Figure 4.13. There is no suitable station located in the south-western part of Queensland.

Figure 4.11 Distribution of streamflow record lengths of the selected 172 stations from QLD. [Histogram of frequency against record length (years)]

Figure 4.12 Distribution of catchment areas of the selected 172 stations from QLD. [Histogram of frequency against catchment area (km2)]


Figure 4.13 Locations of the selected 172 stations from QLD

4.6.4 Victoria

Based on the adopted selection criteria, a total of 415 stations were initially selected as candidates from Victoria each having a minimum of 10 years of streamflow record.

For in-filling the gaps in the AM flood series, Method 1 was preferred over Method 2. The following points summarise the results of the in-filling of the AM flood series data in Victoria.

• 273 data points from 187 stations were in-filled by comparing flow records (Method 1);
• 60 data points from 44 stations were in-filled by regression (Method 2);
• Regression equations used in gap filling showed high R2 values (range 0.82 – 0.99, mean = 0.93 and SD = 0.041); and
• 10% of stations did not have any missing records.

After in-filling the gaps, the stations were then checked for possible trends, as discussed below.

Trend analysis:

Initially the Mann-Kendall test was applied to the stations. The results were rather surprising as they revealed that some 20% of the stations had a decreasing trend. Given the magnitude of the number of stations showing trend, time series plots and mass curves were prepared for the stations showing trend to detect visually if significant changes in slope could be identified.

As an example, Figure 4.14 shows a significant overall downward trend for Station 230210, supporting the result from the Mann-Kendall test, and a noticeable decrease in AM flows from the late 1980s. In order to clarify this further the CUSUM test was applied; the result was similar, with the plotted graph as seen in Figure 4.15 showing a downward shift in the mean from 1995 onwards.
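The Vk statistic plotted in the CUSUM figure can be sketched as follows. The exact CUSUM variant applied in this study is not stated here, so the sketch assumes a distribution-free form in which Vk is the cumulative sum of the signs of departures from the series median; the function names and the example series are hypothetical.

```python
from statistics import median


def cusum_vk(series):
    """Cumulative sum of the signs of departures from the series median.

    Assumed form: Vk = sum over the first k values of sign(x - median).
    """
    med = median(series)
    vk = []
    running = 0
    for value in series:
        running += (value > med) - (value < med)
        vk.append(running)
    return vk


def likely_change_index(vk):
    """Index at which |Vk| is largest; a shift in the mean is most likely after it."""
    return max(range(len(vk)), key=lambda k: abs(vk[k]))


# Hypothetical AM flood series (ML/d): high values early, low values late.
am_flows = [9800, 9100, 8700, 8800, 7600, 7900, 6400, 5200, 5600, 4100]
vk = cusum_vk(am_flows)
print(vk, likely_change_index(vk))
```

A Vk trace that climbs and then falls steadily, as in this example, indicates that values lie above the median early in the record and below it later, which is the pattern of a downward shift in the mean.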

Simple time series plots were used in addition to the trend tests to detect and confirm shifts in the data. Since these tests indicate that the flood data are not independently and identically distributed from year to year, caution is needed when using short records to estimate long-term risks. The fact that the last 10–15 years of data (after the late 1980s) showed a significant downward trend for many stations (presumably due to the drier climate epoch we have entered) makes the inclusion of stations with short records in regionalization studies quite questionable.

Finally, 21 stations from Victoria were removed due to the presence of a significant trend. After the application of the trend tests and the introduction of a cut-off record length of 25 years, the number of eligible stations dropped to 144, which is only 35% of the initially selected 415 stations. This result shows that the effective data set for RFFA in a given region is likely to be substantially smaller than the primary data set.

Impact of rating curve error on flood frequency analysis:

In the remaining data set of 144 stations, many had rating ratios (RR) considerably greater than 1 (RR is defined by Equation 4.1). For any RFFA study, a large number of stations with reasonably long record lengths are required and hence a trade-off needs to be made between an extensive data set that includes stations with very large RR values and a smaller data set with RR values restricted to what could be considered to be a “reasonable upper limit”.

A working method to decide on a cut-off RR value was determined by looking at the average RR value and the maximum RR value for each station. From the histogram of RR values shown in Figure 4.15 it can be seen that 90% of the RR values for all the recorded annual maxima fall between 1 and 20. Thus it was decided that a cut-off RR value of 20 would be reasonable, and that any station having an average RR value greater than 4 and a maximum RR value greater than 20 would be rejected. Rating ratios significantly greater than one could magnify the errors in flood frequency quantile estimates but, on the other hand, rejecting all stations with RR greater than one would reduce the number of stations below the minimum required for meaningful RFFA to be undertaken. Adopting the cut-off values of RR mentioned above reduced the eligible number of stations from 144 to 131.
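The rejection rule stated above can be expressed as a short screening function. This is only a sketch: it assumes the RR values of each annual maximum have already been computed from Equation 4.1, and the function name, station labels and RR values are hypothetical.

```python
from statistics import mean


def reject_station(rating_ratios, avg_limit=4.0, max_limit=20.0):
    """True when the average RR exceeds 4 and the maximum RR exceeds 20."""
    return mean(rating_ratios) > avg_limit and max(rating_ratios) > max_limit


# Hypothetical RR values for the AM series of three stations.
stations = {
    "station_a": [0.8, 1.0, 1.5, 3.0],   # low RR throughout: kept
    "station_b": [1.0] * 9 + [21.0],     # one large RR but low average: kept
    "station_c": [2.0, 6.0, 12.0, 30.0], # high average and high maximum: rejected
}
kept = [name for name, rr in stations.items() if not reject_station(rr)]
print(kept)
```

Because both conditions must hold, a station with a single very large RR value but an otherwise well-rated record is retained, which is consistent with the trade-off between data quality and sample size discussed above.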

[Time series plot for Station 230210: Annual Maximum Flow (ML/d) versus Year, 1970 to 2010, annotated "Decrease in flow magnitude"]

Figure 4.14 Time series graph showing significant trends after 1995

[CUSUM plot for Station 230210: Vk versus Year, 1970 to 2005, annotated "Significant shift downwards"]

Figure 4.15 CUSUM test plot showing significant trends after 1995


[Histogram of Rating Ratio Values: Frequency (logarithmic scale) versus Rating Ratio (RR), annotated "90% of rating ratios between 1 & 20"]

Figure 4.15 Histogram of rating ratios (RR) of AM flood data in Victoria (stations with record lengths > 25 years)

Outlier identification results

While the data checking revealed many ‘outliers’ in the flood series, these do not preclude the use of the remaining flood data in RFFA. The results of the outlier detection procedure for Victoria are summarised below.

• 43% of the stations were found to have low outliers. The maximum number of low outliers detected in a data series was 5, and the low outliers never exceeded 19% of the total number of data points in a series.

• Most of the detected low outliers occurred at stations located in low rainfall areas, especially in the western part of Victoria.

• 31% of low outliers occurred in the years 1982 and 1967. This is not surprising as there were severe droughts during these two years; the maximum annual flows that occurred in many rivers in these years were merely base flows, and not due to flood events.

• 55% of the stations did not show any outliers. Even the values in drought years (1982 and 1967) were not low enough to be treated as low outliers. The locations of most of these stations are in the south-eastern part of Victoria.

• Only 1 station shows a high outlier.

The detected outliers were treated as censored flows in flood frequency analysis using FLIKE (that is, the information that there was no flood in that year was taken into account).

Final data set from Victoria:

As noted earlier, a total of 415 stations, each with a minimum record length of 10 years, was initially selected. After in-filling the gaps in the AM flood series, trend analysis, introduction of a cut-off record length of 25 years and screening on rating ratios, only 131 stations remained, which represents about one-third of the initially selected stations. The distribution of streamflow record lengths of the selected 131 stations is shown in Figure 4.16. The statistics of record lengths of these 131 stations are summarised below.

• Record lengths range from 25 to 52 years, mean 32 years, median 32 years and standard deviation 5 years;

• 87% of the stations have record lengths in the range 25-35 years;

• 8% of the stations have record lengths in the range 35-45 years; and

• 5% of the stations have record lengths in the range 50-55 years.

The catchment areas of the selected 131 catchments range from 3 to 997 km2 (mean: 321 km2 and median: 289 km2). The distribution of catchment areas is shown in Figure 4.17. The statistics of catchment areas of the selected 131 catchments are summarised below:

• 15 catchments (11%) are in the range of 3 to 50 km2;

• 11 catchments (8%) are in the range of 51 to 100 km2;

• 78 catchments (60%) are in the range of 101 to 499 km2; and

• 27 catchments (21%) are in the range of 500 to 997 km2.

The geographical distribution of the finally selected 131 stations is shown in Figure 4.18. There is no station in north-western Victoria that passed the selection criteria. This region is characterized by very low runoff and ephemeral streams.

4.5 Flood frequency analysis

For each of the selected stations, at-site flood frequency analysis was carried out using the ARR FLIKE software (Kuczera, 1999). The detected low flows were censored using the in-built facility in FLIKE. An LP3 distribution with the Bayesian fitting method was adopted to estimate flood quantiles for ARIs of 2, 5, 10, 20, 50 and 100 years. These flood quantiles were used as dependent/target variables in the RFFA adopted in this thesis.
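To illustrate what an LP3 quantile estimate involves, the sketch below fits the distribution by the method of moments on log10-transformed AM flows and uses the Wilson-Hilferty approximation for the frequency factor. This is a simplified stand-in, not the Bayesian fitting procedure of FLIKE adopted in this study: it ignores censoring of low flows, takes the annual non-exceedance probability as 1 - 1/ARI, and uses hypothetical function names and flow values.

```python
import math
from statistics import NormalDist, mean, stdev


def lp3_quantiles(am_flows, aris=(2, 5, 10, 20, 50, 100)):
    """Method-of-moments LP3 quantiles (illustrative only, needs n >= 3)."""
    logs = [math.log10(q) for q in am_flows]
    n = len(logs)
    m = mean(logs)
    s = stdev(logs)
    # Sample skew coefficient of the log-transformed flows.
    g = n * sum((x - m) ** 3 for x in logs) / ((n - 1) * (n - 2) * s ** 3)
    quantiles = {}
    for ari in aris:
        # Standard normal variate for non-exceedance probability 1 - 1/ARI.
        z = NormalDist().inv_cdf(1.0 - 1.0 / ari)
        if abs(g) < 1e-6:
            k = z  # zero skew: LP3 reduces to the log-normal case
        else:
            # Wilson-Hilferty approximation of the Pearson 3 frequency factor.
            k = (2.0 / g) * ((1.0 + g * z / 6.0 - g * g / 36.0) ** 3 - 1.0)
        quantiles[ari] = 10 ** (m + k * s)
    return quantiles


# Hypothetical AM flood series (m3/s) for one station.
flows = [120.0, 85.0, 310.0, 45.0, 190.0, 66.0, 540.0, 98.0, 150.0, 230.0]
for ari, q in lp3_quantiles(flows).items():
    print(ari, round(q, 1))
```

The quantiles produced this way would differ from those of a Bayesian fit with censored low flows, so the sketch serves only to show how the six target quantiles relate to the moments of the log-transformed series.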

[Histogram: Frequency versus Record Length (years), in bins from 25 - 29 to 51 - 55]

Figure 4.16 Distributions of streamflow record lengths of the selected 131 stations from Victoria

[Histogram: Frequency versus Catchment Area (km2), in bins from 0 - 25 to 901 - 1000]

Figure 4.17 Distributions of catchment areas of the selected 131 catchments from Victoria


Figure 4.18 Geographical distributions of the selected 131 catchments from Victoria

4.6 Summary of catchment characteristics data

For each of the selected catchments, five catchment characteristics data were obtained following the procedures mentioned in section 4.5.2. Figure 4.19 shows the selected catchments from NSW, ACT, VIC, QLD and TAS. The catchments from NSW and ACT will be considered and discussed in this thesis as NSW.

Figure 4.19 Locations of the study catchments

The summary statistics of the catchment characteristics data set of the selected catchments are provided in Table 4.1.

Table 4.1 Summary statistics of the catchment characteristics data

Variables | Range | Median | Mean | Standard deviation
Catchment area (A), km2 | 1.3 to 1900 | 255.5 | 329.4 | 277.3
Mean annual areal evapo-transpiration (E), mm/y | 410.1 to 1543.3 | 998.5 | 977.8 | 188.9
Mean annual rainfall (R), mm | 416 to 4348 | 1005.6 | 1185.8 | 603.5
Main stream slope (S), m/km | 0 to 197.7 | 7.7 | 11.3 | 16.8
Design rainfall intensity - 2 years ARI and time of concentration of tc hours (I_tc_2), mm/h | 2.9 to 43.1 | 8.9 | 10.9 | 6.1
Design rainfall intensity - 5 years ARI and time of concentration of tc hours (I_tc_5), mm/h | 3.6 to 54.5 | 11.4 | 13.9 | 8.0
Design rainfall intensity - 10 years ARI and time of concentration of tc hours (I_tc_10), mm/h | 4.0 to 235.8 | 12.9 | 16.3 | 13.7
Design rainfall intensity - 20 years ARI and time of concentration of tc hours (I_tc_20), mm/h | 4.6 to 70.1 | 15.0 | 18.3 | 10.5
Design rainfall intensity - 50 years ARI and time of concentration of tc hours (I_tc_50), mm/h | 5.4 to 757 | 17.7 | 23.4 | 36.7
Design rainfall intensity - 100 years ARI and time of concentration of tc hours (I_tc_100), mm/h | 6.0 to 91 | 20.1 | 24.5 | 14.0

4.7 Summary

A total of 452 catchments from eastern Australia have been selected as the study catchments. Among them, 96, 131, 172 and 53 catchments have been selected from the states of NSW, VIC, QLD and TAS, respectively. The locations of these catchments are shown in Figure 4.19. The streamflow data have been prepared for these catchments. At-site flood quantiles have been estimated using the ARR FLIKE software for ARIs of 2, 5, 10, 20, 50 and 100 years, adopting an LP3 distribution with Bayesian fitting. For each of the selected catchments, data on five catchment characteristics have been extracted. These data will now be applied in the following chapters to develop and test artificial intelligence based RFFA techniques.


CHAPTER 5

SELECTION OF PREDICTOR VARIABLES FOR ARTIFICIAL INTELLIGENCE BASED RFFA MODELS

5.1 General

The focus of this thesis is to develop regional prediction models for design flood estimation using various artificial intelligence based techniques, namely artificial neural networks (ANN), adaptive neuro-fuzzy inference system (ANFIS), genetic algorithm (GA) and gene expression programming (GEP). In Chapter 4, five candidate predictor variables were selected for RFFA. This chapter focuses on the selection of the final set of predictor variables from these candidates for use in developing the artificial intelligence based RFFA models. In this chapter, predictor variables are selected based on the ANN and GEP based RFFA modelling, and it is assumed that the same set of predictor variables will be applicable to the GA and ANFIS based RFFA models.

5.2 Initial selection of predictor variables for artificial intelligence based RFFA models

The variables adopted by similar previous RFFA studies were first examined (see Table 5.1). It was found that all the mentioned previous studies adopted catchment area and mean annual rainfall as predictor variables, and hence these were included as candidate predictor variables in this thesis. Design rainfall intensity and evaporation were adopted by three previous Australian studies, and hence these were included in this study. Main stream slope was adopted by all but one study and hence it was included in this study. To use the design rainfall intensity, one needs the duration of rainfall and the average recurrence interval (ARI); in this study, 6 different combinations of durations and ARIs were adopted. Hence, this study included a total of 10 predictor variables; however, six of them represent design rainfall intensity for different durations and ARIs. The correlations of these 10 variables are plotted in Figure 5.1, which shows that the 6 rainfall intensities are highly correlated. This indicates that the use of only one design rainfall intensity is desirable in the final prediction equation, since highly correlated variables do not add any extra information to the model. At the first stage of model development, different models based on various combinations of the initially selected predictor variables (A, I_tc_ARI, R, S, and E) were formed. The candidate models are shown in Table 5.2.
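The kind of correlation screening that motivates retaining a single design rainfall intensity can be sketched as follows. The Pearson correlation coefficient is used here as one plausible measure of the bi-variate relationships; the threshold of 0.9, the function names and the small data set are hypothetical and are not taken from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.9):
    """List variable pairs whose absolute correlation meets an assumed threshold."""
    names = list(columns)
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = pearson_r(columns[names[i]], columns[names[j]])
            if abs(r) >= threshold:
                pairs.append((names[i], names[j], r))
    return pairs


# Hypothetical values for four catchments.
data = {
    "I_tc_2": [3.0, 9.0, 11.0, 20.0],
    "I_tc_5": [4.0, 11.0, 14.0, 26.0],
    "area": [500.0, 20.0, 300.0, 100.0],
}
for a, b, r in highly_correlated_pairs(data):
    print(a, b, round(r, 3))
```

In this toy example only the two rainfall intensities are flagged, mirroring the situation in which the intensities for different ARIs carry largely the same information.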

Table 5.1 Catchment characteristics predictor variables used in some previous RFFA studies

Authors | Country | Predictor variables adopted
Flavell (2012) | Australia | Catchment area, mean annual rainfall, mainstream slope, main-channel length, and 12 and 24 hours statistical rainfall totals.
Griffis and Stedinger (2007) | USA | Catchment area, mean annual rainfall, runoff measured, mainstream slope, main-channel length, forest cover, and storage measured as the percent of catchment area.
Haddad and Rahman (2012) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual evapo-transpiration, stream density, mainstream slope, stream length, and forest cover.
Muttiah et al. (1997) | USA | Catchment areas, mean annual rainfall, and mean basin elevation.
Rahman (2005) | Australia | Catchment area, design rainfall intensity, mean annual rainfall, mean annual rain days, mean annual Class A pan evaporation, mainstream slope, river bed elevation at the gauging station, maximum elevation difference in the basin, stream density, forest cover, and fraction quaternary sediment area.
Shu and Ouarda (2008) | Canada | Catchment area, mean annual rainfall, mainstream slope, fraction of the basin area covered with lakes and annual mean degree-days.
Riad et al. (2004) | Morocco | Catchment area and mean annual rainfall.

For the five predictor variables, there could be 31 different models (2^5 - 1 non-empty combinations). However, not all of these models are necessarily useful, since some combinations of variables would only result in weaker RFFA models. For example, catchment area has been found to be the most important predictor variable in almost all the previous RFFA studies, as shown in Table 5.1. The second most important predictor variable has been reported to be design rainfall intensity (e.g. Javelle et al., 2002; Jingyi and Hall, 2004). Hence, the combination of these two predictor variables is likely to result in a more significant prediction equation than that delivered by any other two variables. In fact, previous Australian RFFA studies have found that these two predictor variables generate the best RFFA prediction equation (e.g. Haddad and Rahman, 2012; Haddad et al., 2014).

[Matrix plot: pairwise scatter plots of evap, rain, area, slope, I_tc_2, I_tc_5, I_tc_10, I_tc_20, I_tc_50 and I_tc_100]

Figure 5.1 Plot representing bi-variate correlations of the candidate predictor variables

In this study, eight different models are considered, as shown in Table 5.2; each contains catchment area and design rainfall intensity together with one of the eight possible combinations of the other three predictor variables. This approach, however, assumes that no other combination of predictor variables (from these five variables) would deliver a better model than any one of these eight models. This assumption seems to be justified, given the importance of these two variables reported in the previous studies noted above.
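The counts of 31 possible models and eight retained models can be checked by enumerating the subsets of the five candidate variables. The sketch below is illustrative only; the function names are hypothetical, and the order in which the eight combinations are generated does not necessarily match the Model IDs of Table 5.2.

```python
from itertools import combinations

CANDIDATES = ["A", "I_tc_ARI", "R", "S", "E"]


def all_subsets(variables):
    """Every non-empty combination of the candidate variables."""
    return [
        combo
        for size in range(1, len(variables) + 1)
        for combo in combinations(variables, size)
    ]


def subsets_containing(variables, required):
    """Combinations that always include the required variables."""
    optional = [v for v in variables if v not in required]
    return [
        tuple(required) + combo
        for size in range(len(optional) + 1)
        for combo in combinations(optional, size)
    ]


print(len(all_subsets(CANDIDATES)))  # 2**5 - 1 non-empty subsets
for model in subsets_containing(CANDIDATES, ["A", "I_tc_ARI"]):
    print(", ".join(model))
```

Fixing A and I_tc_ARI leaves three optional variables, each either included or excluded, which gives the 2^3 = 8 candidate models.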

ANN and GEP based RFFA models were developed for each of the eight combinations of predictor variables based on 362 training/model catchments. The details of the training of the ANN and GEP based RFFA models are presented in Chapter 7. The developed models were then tested using 90 validation/test catchments. A prediction equation was developed for each of the 2, 5, 10, 20, 50 and 100 years ARI flood quantiles. The set of predictor variables giving the best results for the 90 independent test catchments was finally selected.

Table 5.2 Various candidate models and catchment characteristics used

Model ID | Variables
1 | A, I_tc_ARI
2 | A, I_tc_ARI, S
3 | A, I_tc_ARI, E
4 | A, I_tc_ARI, R
5 | A, I_tc_ARI, S, E
6 | A, I_tc_ARI, R, E
7 | A, I_tc_ARI, R, S
8 | A, I_tc_ARI, R, S, E

Description of variables (details in section 5.2): A: catchment area; I_tc_ARI: design rainfall intensity; S: slope; E: evapo-transpiration; R: mean annual rainfall.

The following statistical measures were used to compare various RFFA models:

• Ratio between predicted and observed flood quantiles:

\[ \text{Ratio of predicted and observed flood quantile} = \frac{Q_{pred}}{Q_{obs}} \tag{5.1} \]

• Relative error (RE):

\[ RE\ (\%) = \mathrm{Abs}\left( \frac{Q_{pred} - Q_{obs}}{Q_{obs}} \right) \times 100 \tag{5.2} \]

• Coefficient of efficiency (CE):

\[ CE = 1 - \frac{\sum_{i=1}^{n} (Q_{obs} - Q_{pred})^{2}}{\sum_{i=1}^{n} (Q_{pred} - \bar{Q})^{2}} \tag{5.3} \]

where Qpred is the flood quantile estimate from the ANN based or GEP based RFFA model, Qobs is the at-site flood frequency estimate obtained from the LP3 distribution using a Bayesian parameter fitting procedure (Kuczera, 1999), and Q̄ is the mean of Qobs. The median relative error and median ratio values were used to measure the relative accuracy of a model. A Qpred/Qobs ratio of 1 indicates a perfect match between the observed and predicted values, so a median ratio closer to 1 and a smaller median relative error are desirable for a model. A CE value closer to 1 is better; a value greater than 0.5 is considered acceptable.
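The three measures can be computed for a set of test catchments as sketched below. The ratio and RE follow Equations 5.1 and 5.2 directly. For CE, the sketch assumes the conventional Nash-Sutcliffe form, in which the denominator is the sum of squared deviations of the observed quantiles about their mean; this choice, the function names and the example values are assumptions of the sketch.

```python
from statistics import mean, median


def ratio(q_pred, q_obs):
    """Qpred/Qobs for one catchment (Equation 5.1)."""
    return q_pred / q_obs


def relative_error(q_pred, q_obs):
    """Absolute relative error in percent for one catchment (Equation 5.2)."""
    return abs((q_pred - q_obs) / q_obs) * 100.0


def coefficient_of_efficiency(q_pred, q_obs):
    """CE over all catchments, assuming the Nash-Sutcliffe form.

    Assumption: the denominator uses deviations of observed values
    about the mean of the observed values.
    """
    q_bar = mean(q_obs)
    numerator = sum((o - p) ** 2 for p, o in zip(q_pred, q_obs))
    denominator = sum((o - q_bar) ** 2 for o in q_obs)
    return 1.0 - numerator / denominator


def summarise(q_pred, q_obs):
    """Median ratio, median RE (%) and CE for one model and one ARI."""
    ratios = [ratio(p, o) for p, o in zip(q_pred, q_obs)]
    errors = [relative_error(p, o) for p, o in zip(q_pred, q_obs)]
    return median(ratios), median(errors), coefficient_of_efficiency(q_pred, q_obs)


# Hypothetical quantiles (m3/s) for three test catchments.
observed = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]
print(summarise(predicted, observed))
```

The ratio and RE are evaluated per catchment and then summarised by their medians, whereas CE is a single value per model and ARI, which is why Table 5.3 reports medians only for the first two measures.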

5.3 Selection of predictor variables for ANN based RFFA models

In the first stage, various ANN based RFFA models were compared based on median Qpred/Qobs ratio, RE and CE values. Table 5.3 shows the median Qpred/Qobs ratio, RE and CE values for the various ANN based RFFA models. The median Qpred/Qobs ratio values range from 0.94 (Model 3 and Model 8) to 1.69 (Model 6), with the best median Qpred/Qobs ratio value of 1.01 (Model 4) and a good but slightly under-predicted value of 0.99 (Model 1). Models 2, 3, 4, 5, 6, 7 and 8 produce some very good median Qpred/Qobs ratio values, but for some ARIs they show notable variation, e.g. Model 6 produces a median Qpred/Qobs ratio value of 1.02 for Q50 but 1.24 and 1.57 for Q2 and Q20, respectively. Similarly, the Model 6 median Qpred/Qobs ratio values range from 1.03 to 1.69. Model 7 produces an overall median Qpred/Qobs ratio value of 1.17, with 36% over-prediction for Q2 and 4% under-prediction for Q100. A clear inconsistency can be found in these models, with overall median Qpred/Qobs ratio values of 1.10 to 1.27. In the case of Model 2, reasonably good median Qpred/Qobs ratio values can be seen for all the ARIs except Q10, which shows an overestimation of 31%, with an overall median Qpred/Qobs ratio value of 1.11. Model 1, consisting of A and I_tc_ARI, outperforms the other models, producing an overall median Qpred/Qobs ratio value of 1.06 and ranging from 0.99 for Q5 to 1.14 for Q50. This model is ranked number 1 on the basis of median Qpred/Qobs ratio, as it shows consistent and good estimates for all the ARIs.

The RE values for the ANN based RFFA models for different ARIs range from 30.65% (Model 2) to 78.77% (Model 6), as shown in Table 5.3. Notably higher values can be seen for Models 5, 6, 7 and 8, ranging from 40.28% to 78.77%. Models 3 and 4 produce RE values in the range of 39.35% to 60.08%, but these two models are unable to maintain consistency for higher ARIs, especially for Q50 and Q100, with RE values of 55% and 60%. Models 1 and 2 outperform the other models, with RE values ranging from 30.65% to 50.01% and overall values of 39.74% to 44.07%. In the case of Model 2, higher RE values can be seen for smaller ARIs, but it produces a good result for the 20 years ARI. Model 1 dominates Model 2 in terms of consistency and competitive RE values for all the ARIs. Hence, Model 1 is regarded as the top model in terms of RE value.

Furthermore, when comparing different models for CE values, it can be found that Models 1, 2, 3 and 4 outperform the remaining four models. A poor performance can be seen in the case of Models 5, 6, 7 and 8 for both smaller and higher ARIs. Models 3 and 4 perform closely except for Q10, where the CE value is 0.72 for Model 3 as compared to 0.56 for Model 4. Overall, Models 1 and 2 are found to perform well, each with an average CE value of 0.66. However, Model 1 exhibits more consistency and better CE values for different ARIs when compared with the closely performing Model 2, as shown in Table 5.3. On the basis of the results shown in Table 5.3, Model 1 (two variables) can be ranked as the top model, followed by Model 2 (three variables).

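Table 5.3 Comparison of eight different ANN based RFFA models using 90 independent test catchments

Measure | Quantile | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8
CE | Q2 | 0.73 | 0.68 | 0.78 | 0.76 | 0.37 | 0.21 | 0.40 | 0.70
CE | Q5 | 0.61 | 0.52 | 0.65 | 0.65 | 0.79 | 0.28 | 0.71 | 0.70
CE | Q10 | 0.63 | 0.78 | 0.72 | 0.56 | 0.46 | 0.40 | 0.39 | 0.67
CE | Q20 | 0.71 | 0.72 | 0.68 | 0.68 | 0.57 | 0.67 | 0.27 | -0.19
CE | Q50 | 0.68 | 0.68 | 0.59 | 0.62 | -0.33 | 0.36 | 0.54 | 0.14
CE | Q100 | 0.52 | 0.57 | 0.44 | 0.55 | 0.37 | 0.33 | 0.45 | 0.37
CE | Average | 0.66 | 0.66 | 0.64 | 0.63 | 0.37 | 0.38 | 0.46 | 0.40
CE | Median | 0.64 | 0.68 | 0.67 | 0.64 | 0.42 | 0.35 | 0.43 | 0.52
Qpred/Qobs (median) | Q2 | 1.04 | 1.09 | 0.94 | 1.26 | 1.37 | 1.19 | 1.36 | 1.20
Qpred/Qobs (median) | Q5 | 0.99 | 1.03 | 1.02 | 1.13 | 1.13 | 1.09 | 1.08 | 1.24
Qpred/Qobs (median) | Q10 | 1.02 | 1.31 | 1.31 | 1.07 | 1.34 | 1.41 | 1.21 | 1.06
Qpred/Qobs (median) | Q20 | 1.04 | 1.09 | 1.06 | 1.01 | 1.26 | 1.07 | 1.19 | 0.94
Qpred/Qobs (median) | Q50 | 1.14 | 1.06 | 1.41 | 1.16 | 1.17 | 1.69 | 1.22 | 1.11
Qpred/Qobs (median) | Q100 | 1.10 | 1.09 | 1.14 | 1.30 | 1.37 | 1.03 | 0.96 | 1.03
Qpred/Qobs (median) | Average | 1.06 | 1.11 | 1.15 | 1.16 | 1.27 | 1.25 | 1.17 | 1.10
Qpred/Qobs (median) | Median | 1.04 | 1.09 | 1.14 | 1.16 | 1.27 | 1.19 | 1.19 | 1.10
RE (%) (median) | Q2 | 37.56 | 49.93 | 44.22 | 46.98 | 55.75 | 40.28 | 61.36 | 44.05
RE (%) (median) | Q5 | 40.39 | 50.01 | 39.60 | 44.25 | 49.56 | 57.66 | 38.28 | 46.78
RE (%) (median) | Q10 | 44.63 | 43.98 | 55.26 | 39.35 | 49.87 | 55.01 | 55.20 | 44.68
RE (%) (median) | Q20 | 35.62 | 30.65 | 49.42 | 40.69 | 47.48 | 51.66 | 46.90 | 52.95
RE (%) (median) | Q50 | 39.09 | 44.00 | 55.01 | 41.10 | 69.61 | 78.77 | 46.66 | 66.80
RE (%) (median) | Q100 | 44.53 | 44.13 | 51.18 | 60.08 | 55.75 | 53.11 | 53.72 | 49.20
RE (%) (median) | Median | 39.74 | 44.07 | 50.30 | 42.68 | 52.81 | 54.06 | 50.31 | 47.99
RE (%) (median) | Average | 40.30 | 43.78 | 49.12 | 45.41 | 54.67 | 56.08 | 50.35 | 50.74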

In the second stage, the ANN based RFFA models are ranked on the basis of median Qpred/Qobs ratio values. A criterion is developed to rank the models for different ARIs, and the catchments are rated as ‘good’, ‘reasonable’, ‘bad’ and ‘very bad’ as shown in Tables 5.4 and 5.5. In this stage, the two top ranked models found in the first stage (i.e. Models 1 and 2) are selected for comparison.

From Table 5.5, it is clear that Model 1 outperforms Model 2 in terms of ‘good’ groupings except for Q10 and Q20, where the difference is very small. Model 1 also shows a higher number of stations in the ‘reasonable’ groupings and a lower number of stations in the ‘bad’ and ‘very bad’ groupings. Thus it can be concluded that Model 1 outperforms Model 2 when catchments are rated on the basis of median Qpred/Qobs ratio. Table 5.6 and Table 5.7 show the comparison between the best performing Models 1 and 2. As shown in Table 5.6, Model 1 provides a median Qpred/Qobs ratio value closer to 1 as compared to Model 2 except for Q50. Similarly, as shown in Table 5.7, Model 1 shows much smaller values of median RE for Q2, Q5 and Q50, similar median RE values for Q10 and Q100, and a higher median RE value for Q20. These results demonstrate that overall Model 1 outperforms Model 2 for the ANN based RFFA models.

Table 5.4 Rating on the basis of median Qpred/Qobs ratio

Group | Ratios (median)
Very bad | less than 0.25 and above 4
Bad | 0.26-0.49 and 2-4
Reasonable | 0.5-0.69 and 1.41-2
Good | 0.7-1.4
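The rating criteria of Table 5.4 can be applied to a set of Qpred/Qobs ratios as sketched below. The table leaves small gaps between adjacent ranges (for example between 0.25 and 0.26) and lists the value 2 in two groups; the sketch assumes continuous boundaries and assigns a ratio of exactly 2 to the ‘reasonable’ group. These boundary choices, the function names and the example ratios are assumptions.

```python
from collections import Counter

GROUPS = ("very bad", "bad", "reasonable", "good")


def rate_ratio(ratio):
    """Assign a Qpred/Qobs ratio to a group using assumed continuous boundaries."""
    if ratio < 0.25 or ratio > 4.0:
        return "very bad"
    if ratio < 0.5 or ratio > 2.0:
        return "bad"
    if ratio < 0.7 or ratio > 1.4:
        return "reasonable"
    return "good"


def group_counts(ratios):
    """Number of catchments falling in each group."""
    counts = Counter(rate_ratio(r) for r in ratios)
    return {group: counts.get(group, 0) for group in GROUPS}


# Hypothetical ratios for ten test catchments.
ratios = [0.1, 0.3, 0.6, 0.9, 1.0, 1.4, 1.41, 2.0, 3.0, 5.0]
print(group_counts(ratios))
```

Applying such a function to the ratios of the 90 test catchments for each ARI would give station counts per group of the kind used to compare the candidate models.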

Table 5.5 Grouping of stations on the basis of median Qpred/Qobs ratio using the criteria of Table 5.4 (ANN based RFFA models)

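Number of stations in each group:

Quantile | Model 1 Very bad | Model 1 Bad | Model 1 Reasonable | Model 1 Good | Model 2 Very bad | Model 2 Bad | Model 2 Reasonable | Model 2 Good
Q2 | 6 | 18 | 27 | 39 | 7 | 25 | 27 | 31
Q5 | 6 | 20 | 24 | 40 | 10 | 24 | 23 | 33
Q10 | 5 | 21 | 31 | 33 | 5 | 19 | 30 | 36
Q20 | 5 | 20 | 24 | 41 | 9 | 20 | 15 | 46
Q50 | 14 | 19 | 19 | 38 | 11 | 21 | 20 | 38
Q100 | 8 | 23 | 27 | 32 | 11 | 23 | 23 | 33
Overall (%) | 9.7 | 26.8 | 33.6 | 49.3 | 11.7 | 29.2 | 30.5 | 48.0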


Table 5.6 Comparison of Model 1 and Model 2 on the basis of median Qpred/Qobs ratio value using 90 independent test catchments (ANN based RFFA models)

Median Qpred/Qobs ratio:

Quantiles | Model 1 | Model 2
Q2 | 1.04 | 1.09
Q5 | 0.99 | 1.03
Q10 | 1.02 | 1.31
Q20 | 1.04 | 1.09
Q50 | 1.14 | 1.06
Q100 | 1.10 | 1.09

Table 5.7 Comparison of Model 1 and Model 2 on the basis of median relative error (RE) values using 90 independent test catchments (ANN based RFFA models)

RE (median) (%):

Quantiles | Model 1 | Model 2
Q2 | 37.56 | 49.93
Q5 | 40.39 | 50.01
Q10 | 44.63 | 43.98
Q20 | 35.62 | 30.65
Q50 | 39.09 | 44.01
Q100 | 44.53 | 44.13

5.4 Selection of predictor variables based on GEP models

In the first stage, various GEP based RFFA models are compared based on median Qpred/Qobs ratio, RE and CE values. Table 5.8 shows the median Qpred/Qobs ratio, RE and CE values for the various GEP based RFFA models. The median Qpred/Qobs ratio values range from 0.06 (Model 5) to 2.07 (Model 8), with the best median Qpred/Qobs ratio value of 1.02 (for Model 1 and Model 7). Other models produce some very good median Qpred/Qobs ratio values, but for some ARIs they show notable variation, e.g. Model 4 produces a median Qpred/Qobs ratio value of 0.99 for Q5 but 1.49 and 1.42 for Q20 and Q10, respectively. Similarly, the Model 8 median Qpred/Qobs ratio values range from 0.02 to 1.50. Model 3 produces an overall median Qpred/Qobs ratio value of 0.97, with 57% over-prediction for Q50 and 89% under-prediction for Q100. A clear inconsistency can be found in these models, with overall median Qpred/Qobs ratio values of 1.10 to 1.27. In the case of Models 2 and 3, reasonably good values can be seen for all the ARIs except Q20 for Model 2, which shows an overestimation of 54%, and Q100, where performance is poor (median Qpred/Qobs ratio values of 0.22 for Model 2 and 0.11 for Model 3). Model 1, consisting of the variables A and I_tc_ARI, outperforms the other models, producing an overall median Qpred/Qobs ratio value of 1.06 with a range from 1.02 for Q20 and Q100 to 1.10 for Q5. Hence Model 1 can be ranked number 1 on the basis of median Qpred/Qobs ratio.

Table 5.8 Comparison of eight different GEP based RFFA models using 90 independent test catchments

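Measure | Quantile | Model 1 | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8
CE | Q2 | 0.49 | 0.63 | 0.59 | 0.65 | 0.59 | 0.62 | 0.73 | 0.55
CE | Q5 | 0.67 | 0.64 | 0.67 | 0.68 | 0.27 | 0.56 | 0.69 | 0.46
CE | Q10 | 0.56 | 0.39 | -9.59 | -5.10 | -14.09 | 0.25 | 0.46 | -0.77
CE | Q20 | 0.67 | -2.86 | 0.52 | 0.56 | 0.62 | 0.63 | 0.61 | 0.58
CE | Q50 | 0.63 | 0.33 | -0.87 | 0.54 | 0.49 | -17.74 | 0.10 | 0.10
CE | Q100 | 0.67 | -0.01 | -0.28 | -0.29 | -0.44 | -0.20 | -0.20 | -0.27
CE | Average | 0.61 | -0.15 | -1.49 | -0.49 | -2.09 | -2.65 | 0.40 | 0.11
CE | Median | 0.65 | 0.36 | 0.12 | 0.55 | 0.38 | 0.41 | 0.53 | 0.28
Qpred/Qobs (median) | Q2 | 1.07 | 1.30 | 0.94 | 1.13 | 0.93 | 1.07 | 0.98 | 1.21
Qpred/Qobs (median) | Q5 | 1.10 | 0.98 | 0.99 | 0.99 | 1.24 | 1.05 | 1.02 | 1.50
Qpred/Qobs (median) | Q10 | 1.04 | 1.13 | 1.08 | 1.42 | 0.94 | 1.29 | 1.35 | 1.18
Qpred/Qobs (median) | Q20 | 1.02 | 1.54 | 1.13 | 1.49 | 1.26 | 1.20 | 1.32 | 1.33
Qpred/Qobs (median) | Q50 | 1.05 | 1.25 | 1.57 | 1.18 | 1.20 | -0.62 | 1.97 | 2.07
Qpred/Qobs (median) | Q100 | 1.02 | 0.22 | 0.11 | 0.26 | -0.06 | 0.27 | 0.10 | 0.02
Qpred/Qobs (median) | Average | 1.02 | 1.07 | 0.97 | 1.08 | 0.92 | 0.71 | 1.12 | 1.22
Qpred/Qobs (median) | Median | 1.03 | 1.19 | 1.03 | 1.16 | 1.07 | 1.06 | 1.17 | 1.27
RE (%) (median) | Q2 | 45.87 | 50.28 | 81.35 | 70.28 | 76.48 | 66.28 | 43.78 | 70.09
RE (%) (median) | Q5 | 44.95 | 56.16 | 57.29 | 46.01 | 85.03 | 46.63 | 49.06 | 58.08
RE (%) (median) | Q10 | 42.08 | 64.72 | 43.42 | 55.57 | 56.01 | 91.40 | 56.11 | 45.78
RE (%) (median) | Q20 | 41.53 | 93.67 | 46.93 | 51.31 | 43.72 | 43.09 | 47.08 | 51.56
RE (%) (median) | Q50 | 37.87 | 61.25 | 70.60 | 50.44 | 61.30 | 218.36 | 96.53 | 107.00
RE (%) (median) | Q100 | 44.47 | 82.18 | 88.77 | 78.55 | 107.73 | 76.78 | 90.15 | 98.19
RE (%) (median) | Median | 43.27 | 62.98 | 63.94 | 53.44 | 68.89 | 71.53 | 52.59 | 64.09
RE (%) (median) | Average | 42.97 | 68.04 | 64.73 | 58.69 | 71.71 | 90.42 | 63.79 | 71.78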

The RE values for the various GEP based RFFA models for different ARIs range from 37.87% (Model 1) to 218.36% (Model 6), as can be seen in Table 5.8. Notably higher RE values can be seen for Models 5, 6 and 8, ranging from 43% to 218%. Models 4 and 7 produce RE values in the range of 43% to 96%. Despite comparatively higher RE values, a consistency can be found for these two models; however, for higher ARIs they are unable to maintain this consistency, especially for Q50 and Q100, with RE values of 96% and 88%. Models 1, 2 and 3 outperform the other models, with RE values ranging from 37% to 88% and overall values of 43% to 63%. In the case of Model 2, higher RE values can be seen for medium to higher ARIs, but it produces good results for Q2 and Q5. Overall, Model 1 dominates Model 2 in terms of consistency and competitive RE values for all the ARIs, and hence Model 1 is ranked number 1.

Furthermore, when comparing different models with respect to CE values, it is found that Models 1, 2 and 7 outperform the remaining five models, which show poor performance. Models 2 and 7 perform well for small to medium ARIs; however, they perform poorly for higher ARIs. Overall, Model 1 is found to perform well, with an average CE value of 0.61, and it exhibits more consistency and better CE values for different quantiles when compared with the closely performing Models 2 and 7, as shown in Table 5.8.

Hence, for GEP based RFFA models, Model 1 with two predictor variables (A, I_tc_ARI) outperforms other models with respect to median Qpred/Qobs ratio, RE and CE values as demonstrated in Table 5.8.

At the second stage, the GEP based RFFA models are ranked on the basis of median Qpred/Qobs ratio values as shown in Table 5.9. Similar to the ANN based RFFA models, the criterion of Table 5.4 is used to rank the models for different ARIs, and the catchments are rated as ‘good’, ‘reasonable’, ‘bad’ and ‘very bad’ as shown in Table 5.9. In this stage, the two top ranked models from stage 1 (Model 1 and Model 2) are selected for comparison. From Table 5.9, it is clear that Model 1 outperforms Model 2 in terms of ‘good’ groupings except for smaller ARIs. Also, Model 1 shows a higher number of stations in the ‘reasonable’ grouping and a lower number of stations in the ‘bad’ and ‘very bad’ groupings. Hence, it can be concluded that Model 1 outperforms Model 2 when catchments are rated on the basis of median Qpred/Qobs ratio. Tables 5.10 and 5.11 show the comparison between the best performing Models 1 and 2.

As shown in Table 5.10, Model 1 provides a median Qpred/Qobs ratio value closer to 1 as compared to Model 2 for all the ARIs except Q5. Similarly, as shown in Table 5.11, Model 1 shows much smaller values of median RE for all the ARIs.

On the basis of results shown in Table 5.8, Model 1 (A, I_tc_ARI) can be ranked as top model followed by Model 2 (A, I_tc_ARI, S) for the GEP based RFFA models.


Table 5.9 Grouping of stations on the basis of median Qpred/Qobs ratio values using the criteria of Table 5.4 for GEP based RFFA models

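Number of stations in each group:

Quantile | Model 1 Very bad | Model 1 Bad | Model 1 Reasonable | Model 1 Good | Model 2 Very bad | Model 2 Bad | Model 2 Reasonable | Model 2 Good
Q2 | 20 | 24 | 17 | 29 | 7 | 28 | 23 | 32
Q5 | 14 | 21 | 23 | 32 | 15 | 25 | 18 | 32
Q10 | 13 | 21 | 24 | 32 | 15 | 25 | 21 | 29
Q20 | 24 | 32 | 19 | 15 | 29 | 21 | 18 | 16
Q50 | 18 | 23 | 21 | 27 | 13 | 23 | 23 | 31
Q100 | 17 | 21 | 28 | 24 | 31 | 26 | 14 | 19
Overall (%) | 23.5 | 31.4 | 29.2 | 35.2 | 24.3 | 32.7 | 25.9 | 35.2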

Table 5.10 Comparison of Models 1 and 2 on the basis of median Qpred/Qobs ratio values using 90 independent test catchments (for GEP based RFFA models)

Quantile   Median Qpred/Qobs ratio
           Model 1    Model 2
Q2         1.07       1.30
Q5         1.10       0.98
Q10        1.04       1.13
Q20        1.02       1.54
Q50        1.05       1.25
Q100       1.02       0.22

Table 5.11 Comparison of Models 1 and 2 on the basis of RE values using 90 independent test catchments (for GEP based RFFA models)

Quantile   RE (median) (%)
           Model 1    Model 2
Q2         45.87      50.28
Q5         44.95      56.16
Q10        42.08      64.72
Q20        41.53      93.67
Q50        37.87      61.25
Q100       44.47      82.18

5.5 Summary

This chapter has examined various combinations of predictor variables to select the best set to be adopted in the RFFA modelling. Two artificial intelligence based modelling techniques (ANN and GEP) are used to develop the prediction equations using data of the selected 362 catchments. Independent testing is performed using 90 test catchments. Models are assessed based on the ratio between predicted and observed flood quantiles, percent relative error and coefficient of efficiency. Based on the independent testing, it has been found that the ANN and GEP based RFFA models with only two predictor variables (catchment area and design rainfall intensity) outperform other models with a greater number of predictor variables. Such a model is also easier to apply in practice, as the data for the two predictor variables can be obtained relatively easily from published maps and government websites. In the subsequent analyses presented in the next chapters, these two predictor variables (catchment area and design rainfall intensity) will be used.

CHAPTER 6

SELECTION OF REGIONS

6.1 General

In regional flood frequency analysis (RFFA), one of the key steps is to identify the acceptable/optimum region(s) which consist(s) of a set of gauged catchments that may be treated as homogeneous. Previous chapters cover the selection of study area, catchment data and the predictor variables to be used in the RFFA presented in this study. This chapter focuses on the formation and comparison of regions based on state, geographic and climatic boundaries as well as based on the catchment attributes. These regions are tested by developing RFFA models using artificial neural network (ANN) technique and the best performing region is then selected (as the optimum region) based on the results of the comparison of the alternative regions. This optimum region is then used to develop RFFA models using all the selected artificial intelligence based RFFA methods considered in this thesis.

6.2 Description of candidate regions

To identify the optimum regions for RFFA modelling in eastern Australia, a number of candidate regions are formed as discussed below.

Regions based on state and geographic boundaries

Initially, each of the states of Victoria (VIC), New South Wales (NSW), Queensland (QLD) and Tasmania (TAS) is treated as a separate region. The data for each of these regions are discussed in detail in Section 4.6. These states cover the eastern part of Australia (Figure 4.1). These candidate regions are shown in Table 6.1.

Regions based on climatic boundaries

The northern part of Australia is dominated by summer rainfall and the southern part is mainly dominated by winter rainfall. In this step, the data set is divided into two sub-sets, i.e. summer dominated rainfall region (SDRR) and winter dominated rainfall region (WDRR).

Combined data set

Here, the data for all four states are combined to form one region. Details of all the candidate regions based on state boundaries, geographic and climatic conditions are shown in Table 6.1.

Table 6.1 Description of candidate regions

Region label  Description of region             No. of stations  Abbreviated region name
1             New South Wales                   96               NSW
2             Victoria                          131              VIC
3             Queensland                        172              QLD
4             Tasmania                          53               TAS
5             Combined Data Set                 452              Combined
6             Summer Dominated Rainfall Region  203              SDRR
7             Winter Dominated Rainfall Region  249              WDRR

6.2.1 Selection of the best performing region based on state, geographic and climatic boundaries

In each of these candidate regions, the available data set is divided into two parts: (i) 80% for training (training data set); and (ii) 20% for testing/validation (validation data set). These sets are selected randomly from the respective grouping. For each grouping, the ANN-based RFFA model is built and used to predict 2, 5, 10, 20, 50 and 100 years ARI flood quantiles for the selected 20% test catchments. The structure, algorithm and other criteria of ANN based analyses are kept uniform throughout the analysis and are explained in Chapter 3.
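The random 80%/20% division described above can be sketched as follows. This is a minimal illustration only and not the code used in this study; the function name `split_train_validation`, the fixed seed and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def split_train_validation(n_catchments, train_fraction=0.8, seed=0):
    """Randomly split catchment indices into training and validation sets.

    Hypothetical helper: the name, the seed and the rounding rule are
    assumptions, not details taken from the thesis.
    """
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_catchments)
    n_train = round(train_fraction * n_catchments)
    return np.sort(shuffled[:n_train]), np.sort(shuffled[n_train:])

# Combined data set of 452 catchments (Table 6.1)
train_idx, valid_idx = split_train_validation(452)
print(len(train_idx), len(valid_idx))
```

For the combined data set of 452 catchments this split gives 362 training and 90 validation catchments, which matches the counts quoted in this thesis.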

Three statistical measures i.e. Qpred/Qobs ratio, relative error (RE) and coefficient of efficiency (CE) (as mentioned in section 5.2) are used to assess the model performance.
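The three measures can be computed along the following lines. The formal definitions are given in Section 5.2 and are not repeated here, so this sketch assumes the usual forms: the ratio is Qpred/Qobs, RE is the absolute relative error in percent, and CE is taken to be the Nash-Sutcliffe form. The function names and the three-catchment data are hypothetical.

```python
import numpy as np

def median_ratio(q_pred, q_obs):
    """Median of the Qpred/Qobs ratio over the test catchments."""
    return float(np.median(q_pred / q_obs))

def median_relative_error(q_pred, q_obs):
    """Median of the absolute relative error, in percent (assumed form)."""
    return float(np.median(np.abs(q_pred - q_obs) / q_obs * 100.0))

def coefficient_of_efficiency(q_pred, q_obs):
    """CE in the Nash-Sutcliffe form (an assumption; see Section 5.2)."""
    residual = np.sum((q_obs - q_pred) ** 2)
    total = np.sum((q_obs - np.mean(q_obs)) ** 2)
    return float(1.0 - residual / total)

# Hypothetical flood quantiles (m3/s) for three test catchments
q_obs = np.array([100.0, 200.0, 400.0])
q_pred = np.array([110.0, 180.0, 400.0])

print(median_ratio(q_pred, q_obs))
print(median_relative_error(q_pred, q_obs))
print(coefficient_of_efficiency(q_pred, q_obs))
```

A ratio close to 1, an RE close to 0% and a CE close to 1 indicate good agreement between predicted and observed quantiles.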

Table 6.2 summarises the median Qpred/Qobs ratio values for the seven candidate regions. For the NSW candidate region, the median Qpred/Qobs ratio for Q10 is very small (0.17), which indicates a significant under-estimation. Also, for this region, Q50 shows remarkable over-estimation with a median Qpred/Qobs ratio of 1.82. For the VIC candidate region, all the median Qpred/Qobs ratios seem to be reasonable, with a range of 0.86 to 1.49. For the QLD region, both Q50 and Q100 show an excellent median Qpred/Qobs ratio close to 1.00, and the median Qpred/Qobs ratios are in the range of 0.98 to 1.48, which appears to be reasonable. For the TAS region, Q50 shows notable over-estimation with a median Qpred/Qobs ratio value of 2.46.


For SDRR and WDRR, results are better than for the individual states except for Q50 for the WDRR, which shows a notable over-estimation with a median Qpred/Qobs ratio of 2.02. It seems that when the region size increases, the median Qpred/Qobs ratio values are more consistent over different ARIs. When all the data sets are combined, the median Qpred/Qobs ratio values show remarkable improvement with a range of 0.99 to 1.14, which appears to be satisfactory. The differences in the median Qpred/Qobs ratio values across various ARIs are smaller for the combined data set than for the other regions, as illustrated in Figure 6.1.

Table 6.2 Median Qpred/Qobs ratio values for seven ANN based candidate regions

Quantile  Candidate regions based on state, geographic and climatic boundaries
          NSW    VIC    QLD    TAS    SDRR   WDRR   Combined
Q2        1.38   1.06   1.28   1.08   1.14   1.25   1.04
Q5        0.84   1.13   1.48   1.56   1.21   1.06   0.99
Q10       0.17   0.86   1.11   1.65   1.38   1.26   1.02
Q20       1.53   1.49   1.11   0.74   0.84   1.28   1.04
Q50       1.82   1.17   0.98   2.46   1.32   2.02   1.14
Q100      1.22   1.24   1.00   1.05   1.21   1.33   1.10

In terms of the median of the absolute relative error values (Table 6.3), for NSW, Q10 and Q50 show very high median relative error values, which are 91% and 82%, respectively. The best results are found for Q20 and Q100, with median relative error values close to 50%. For the VIC region, median relative error values for Q2, Q50 and Q100 are in the range of 66% to 78%, which appear to be quite high. For the QLD region, median relative error values are in the range of 37% to 58%, which is consistent across various ARIs and is the best result among the individual states. For the TAS region, Q50 has a very high median relative error value (146%); for other ARIs, results are quite reasonable. There is a sharp increase in the median relative error value at Q50 followed by a sharp decrease at Q100, which is unexpected. This indicates that for a very small data set (the TAS region has only 53 stations), the ANN-based RFFA model provides inconsistent results across various ARIs.

For SDRR and WDRR, the median relative error values are in the range of 29% to 57% and 43% to 102%, respectively. Here all the median relative error values are in the reasonable range except for Q50 for the WDRR region. When all the data are combined, the median relative error values are consistent across all the ARIs (in the range of 37% to 44%). The differences in the median relative error values across various ARIs are smaller for the combined data set than for the other regions, as illustrated in Figure 6.2. These results show that the combined data set provides the smallest overall median relative error value (40.30%) among all the seven candidate regions, which is consistent with the findings for the median Qpred/Qobs ratio values discussed before.

Table 6.3 Median relative error values (%) for seven ANN-based candidate regions

Quantile  Candidate regions based on state, geographic and climatic boundaries
          NSW      VIC      QLD      TAS      SDRR     WDRR     Combined
Q2        48.21    78.05    42.42    65.77    52.40    48.50    37.56
Q5        51.94    40.89    50.24    55.52    29.87    53.03    40.39
Q10       91.52    39.75    37.67    64.61    52.79    43.88    44.63
Q20       53.17    55.58    37.67    38.19    43.12    52.75    35.62
Q50       82.08    73.75    57.90    146.47   57.66    102.13   39.09
Q100      50.00    66.88    58.45    15.28    54.85    67.72    44.53
Overall   62.82    59.15    47.39    64.31    48.45    61.34    40.30
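The 'Overall' row of Table 6.3 appears to be the arithmetic mean of the six per-ARI median relative error values. This is an inference from the tabulated numbers rather than a statement made in the text, and the short Python check below (variable names are illustrative) recomputes the row under that assumption.

```python
# Median RE (%) per ARI from Table 6.3, in the order Q2, Q5, Q10, Q20, Q50, Q100
median_re = {
    "NSW":      [48.21, 51.94, 91.52, 53.17, 82.08, 50.00],
    "VIC":      [78.05, 40.89, 39.75, 55.58, 73.75, 66.88],
    "QLD":      [42.42, 50.24, 37.67, 37.67, 57.90, 58.45],
    "TAS":      [65.77, 55.52, 64.61, 38.19, 146.47, 15.28],
    "SDRR":     [52.40, 29.87, 52.79, 43.12, 57.66, 54.85],
    "WDRR":     [48.50, 53.03, 43.88, 52.75, 102.13, 67.72],
    "Combined": [37.56, 40.39, 44.63, 35.62, 39.09, 44.53],
}

# Assumed rule: overall value = mean of the six per-ARI values
overall = {region: sum(values) / len(values) for region, values in median_re.items()}

for region, value in overall.items():
    print(f"{region:9s} {value:6.2f}")

best_region = min(overall, key=overall.get)
print("Smallest overall median RE:", best_region)
```

Under this assumption the combined data set has the smallest overall value, in line with the discussion above.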

Figure 6.1 Plot of median Qpred/Qobs ratio values for different ARIs for selected regions


Figure 6.2 Median relative error (%) values for different ARIs for selected regions

6.3 Regions based on catchment characteristics data

To identify regions/groups of catchments in catchment characteristics data space, two methods are adopted in this thesis: cluster analysis and principal component analysis. These methods have been discussed in Chapter 3. In the cluster and principal component analyses, five catchment characteristics variables (catchment area, design rainfall intensity, mean annual evapo-transpiration, mean annual rainfall and main stream slope) are adopted.

6.3.1 Cluster analysis

The hierarchical cluster analysis

Hierarchical clustering is one of the most straightforward clustering methods. In this study, hierarchical clustering is applied using the Wards-Block combination, as discussed in Chapter 3.

K-means clustering

In this method all variables are given equal weights. The best results obtained from cluster analysis are summarised in Table 6.4, which deliver two groupings from each method: A1 (405 stations) and A2 (45 stations) from Wards-Block clustering, and B1 (362 stations) and B2 (90 stations) from K-Means clustering.

Table 6.4 Regions/groups formation by cluster analysis

Method                           Total no. of stations  Grouping   Grouping  Out of cluster stations
Wards-Block cluster combination  452                    405 (A1)   45 (A2)   2
K-Means cluster                  452                    362 (B1)   90 (B2)   0
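A K-means grouping with equally weighted variables can be sketched as below. This is a generic illustration (Lloyd's algorithm on standardised variables), not the software or settings used in this study; the function names, the seed and the synthetic 452 x 5 data matrix are assumptions, so the resulting group sizes will not reproduce Table 6.4.

```python
import numpy as np

def standardise(x):
    """Scale each variable to zero mean and unit variance (equal weights)."""
    return (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

def k_means(x, k=2, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns cluster labels and cluster centres."""
    rng = np.random.default_rng(seed)
    centres = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        # Euclidean distance from every point to every centre
        dist = np.linalg.norm(x[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster becomes empty
        new_centres = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Synthetic stand-in for 452 catchments and five catchment characteristics
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=1.0, size=(452, 5))

labels, centres = k_means(standardise(data), k=2)
print(np.bincount(labels, minlength=2))  # number of catchments in each group
```

Standardising first is one way of giving all variables equal weight; without it, variables with large numerical ranges (such as catchment area) would dominate the distance calculation.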

Figure 6.3 Dendrogram using average linkage between groups


Figure 6.3 (a) Section of Dendrogram using average linkage between groups


Figure 6.3 (b) Section of Dendrogram using average linkage between groups

In terms of median Qpred/Qobs ratio values, for individual ARIs grouping A1 outperforms the other groupings (A2, B1 and B2) except for Q20, where A2 performs better than A1, as shown in Tables 6.5 and 6.6. When comparing the overall Qpred/Qobs ratio values, A1, B1 and B2 perform similarly (with median Qpred/Qobs ratio values of 1.1 or 1.2), whereas A2 performs quite poorly with a median Qpred/Qobs ratio value of 1.9. In terms of median relative error, grouping A1 produces consistent and reasonable results. For grouping A2, median relative error values for Q50 and Q100 are very high (164% and 191%, respectively); similarly high values can be seen for Q50 for grouping B1 and for Q5 and Q10 for grouping B2 in Tables 6.5 and 6.6. Overall, grouping A1 shows the best results among the individual cluster groupings. However, if groupings A1 and A2 (generated by the Wards-Block cluster analysis method) are compared together against groupings B1 and B2 (generated by the K-means cluster analysis method), groupings B1 and B2 perform better than groupings A1 and A2. This shows that the K-means cluster analysis method has generated better groupings than the Wards-Block cluster analysis method.

Table 6.5 ANN based RFFA model performances for cluster groupings A1 & A2

          Grouping A1 (405 stations)                  Grouping A2 (45 stations)
Quantile  Qpred/Qobs ratio (median)  RE (median) (%)  Qpred/Qobs ratio (median)  RE (median) (%)
Q2        1.0                        44.6             2.3                        132.4
Q5        1.2                        45.4             1.5                        48.7
Q10       1.1                        44.4             1.1                        41.6
Q20       1.4                        56.0             1.1                        41.4
Q50       1.3                        54.5             2.6                        164.6
Q100      1.3                        47.5             2.9                        191.3
Overall   1.2                        48.7             1.9                        103.3

Table 6.6 ANN based RFFA model performances for cluster groupings B1 & B2

          Grouping B1 (362 stations)                  Grouping B2 (90 stations)
Quantile  Qpred/Qobs ratio (median)  RE (median) (%)  Qpred/Qobs ratio (median)  RE (median) (%)
Q2        0.9                        52.6             1.3                        55.8
Q5        1.1                        57.9             1.0                        71.0
Q10       0.9                        38.6             1.7                        75.0
Q20       0.8                        39.1             0.7                        41.5
Q50       1.3                        61.5             1.1                        14.6
Q100      1.4                        46.7             1.1                        56.1
Overall   1.1                        49.4             1.2                        52.3

6.3.2 Principal component analysis

At the second stage, principal component analysis (PCA) is undertaken. The eigenvalue and the percentage of variance explained for each of the five derived principal components are listed in Table 6.7. The first two components have eigenvalues greater than 1 and account for about 60% of the total variance. Component 3 has an eigenvalue not significantly different from 1 (0.957). However, component one (PC1) and component two (PC2) together account for more than 50% of the variation in the data; hence, PC1 and PC2 may be deemed adequate in capturing the bulk of the information in the data. The plots of PC1 vs PC2 are shown in Figures 6.5 and 6.6. In Figure 6.5, two groups are formed based on PC1: Group C1 with PC1 ≥ 0 and Group C2 with PC1 < 0. In Figure 6.6, two groups are similarly formed based on PC2: Group D1 with PC2 ≥ 0 and Group D2 with PC2 < 0. Table 6.10 summarises these groupings. Table 6.8 presents the component matrix for PC1 and PC2. Table 6.9 presents the descriptive statistics of the variables used in this study.
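The formation of groups from the sign of a principal component score can be illustrated with the following sketch. It assumes that the PCA is carried out on the correlation matrix of the standardised catchment characteristics; the function name `pca_scores` and the synthetic data are hypothetical, and the code is not the procedure used to produce Figures 6.5 and 6.6.

```python
import numpy as np

def pca_scores(x):
    """PCA on the correlation matrix; returns eigenvalues and component scores."""
    z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
    corr = np.corrcoef(z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]         # re-sort to descending order
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    return eigvals, z @ eigvecs

# Synthetic stand-in for 452 catchments and five catchment characteristics
rng = np.random.default_rng(0)
data = rng.normal(size=(452, 5))

eigvals, scores = pca_scores(data)
explained = 100.0 * eigvals / eigvals.sum()   # % of variance per component

# Grouping on the first component: C1 where PC1 >= 0, C2 where PC1 < 0
group_c1 = scores[:, 0] >= 0
group_c2 = ~group_c1
print(explained.round(1), int(group_c1.sum()), int(group_c2.sum()))
```

The sign of an eigenvector is arbitrary, so which half of the catchments is labelled C1 depends on the sign convention of the software used; the split itself is unaffected. The groups D1 and D2 follow in the same way from `scores[:, 1]`.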

Table 6.7 Eigenvalues and variance explained by the principal components

Component  Initial eigenvalues
           Total    % of variance  Cumulative %
1          1.758    35.160         35.160
2          1.236    24.718         59.878
3          0.957    19.149         79.027
4          0.774    15.481         94.508
5          0.275    5.492          100.000
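The percentages in Table 6.7 follow from the eigenvalues, since for five standardised variables the eigenvalues sum to 5. The short Python check below recomputes them from the rounded eigenvalues and applies the eigenvalue-greater-than-one rule mentioned in the text; the variable names are illustrative, and small differences in the third decimal place arise from rounding of the tabulated eigenvalues.

```python
# Eigenvalues of the five principal components (Table 6.7)
eigenvalues = [1.758, 1.236, 0.957, 0.774, 0.275]

total = sum(eigenvalues)                      # close to 5 for five variables
percent = [100.0 * e / total for e in eigenvalues]

cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Components with eigenvalue greater than 1
retained = [i + 1 for i, e in enumerate(eigenvalues) if e > 1.0]

for i, (p, c) in enumerate(zip(percent, cumulative), start=1):
    print(f"PC{i}: {p:6.2f}% of variance, {c:6.2f}% cumulative")
print("Components with eigenvalue > 1:", retained)
```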

Table 6.8 Component matrix in principal component analysis

           Component
           1        2
Zevap      -0.042   0.451
ZI_12_2    0.899    -0.209
Zrain      0.906    -0.156
Zarea      -0.253   -0.708
Zslope     0.249    0.68


Figure 6.4 Scree plot from principal component analysis

Table 6.9 Descriptive statistics of standardised variables

           Mean      Standard deviation  No. of data points
Zevap      0.0141    1.019               360
ZI_12_2    -0.0413   0.976               360
Zrain      -0.025    0.995               360
Zarea      0.017     1.036               360
Zslope     0.025     1.085               360


Figure 6.5 Grouping derived from PC1 vs PC2 plot based on PC1


Figure 6.6 Grouping derived from PC1 vs PC2 plot based on PC2

In each of these accepted candidate groupings, the available data set is divided into 80% for training, and 20% for testing. Similar assessment criteria are used as mentioned in Section 5.4.1.

Table 6.10 shows the results of the performance assessment of the PCA-based groupings. With respect to median Qpred/Qobs ratio values, grouping D1 outperforms the other PCA-based groupings. With respect to median relative error values, grouping D1 is also the best performer. Overall, the groupings based on PC2 (i.e. groupings D1 and D2) outperform the groupings based on PC1 (i.e. groupings C1 and C2).

Now, if the best groupings based on cluster analysis (B1 and B2) are compared with the best PCA-based groupings (D1 and D2) in terms of relative error, they perform quite similarly, with slightly better performance for the cluster analysis groupings B1 and B2. Hence, it can be concluded that K-means cluster analysis generates the best performing groups/regions in the catchment characteristics data space.

In Tables 6.11 and 6.12, the results of the best catchment characteristics based groupings (which are B1 and B2) are compared with various geographic regions as discussed in Section 6.2.1.

In the last step, the better performing groups from the cluster analysis and PCA are compared with the candidate regions based on geographic/state boundaries (Section 6.2.1). Tables 6.11 and 6.12 summarise the results for the different candidate regions.

Table 6.10 Grouping based on principal component analysis

(Ratio = median Qpred/Qobs ratio; RE (%) = median relative error)

          Grouping based on PC1                   Grouping based on PC2
          Grouping C1         Grouping C2         Grouping D1         Grouping D2
Quantile  Ratio     RE (%)    Ratio     RE (%)    Ratio     RE (%)    Ratio     RE (%)
Q2        1.3       48.1      1.4       55.1      1.5       52.3      1.8       80.7
Q5        1.4       64.0      1.2       62.5      1.4       48.4      1.0       47.8
Q10       1.4       44.8      0.9       51.6      1.1       48.7      1.2       35.4
Q20       1.3       59.7      1.4       54.4      1.2       41.1      1.5       45.9
Q50       1.2       58.3      1.2       53.0      1.1       50.3      1.4       60.7
Q100      0.5       91.5      1.2       44.1      1.5       53.5      0.9       68.5
Overall   1.2       61.1      1.2       53.5      1.3       49.1      1.3       56.5

In terms of median Qpred/Qobs ratio values, the groupings based on both cluster analysis and PCA outperform the groupings based on individual states, as shown in Table 6.11. Grouping A1 performs better than grouping D1 except for Q20 and Q100, and in terms of consistency and the overall value of median Qpred/Qobs ratio, grouping A1 is found to perform well. Finally, grouping A1 is compared with the combined data set. Both perform almost similarly except for Q2 and Q10, but for the other ARIs the combined data set outperforms grouping A1. The combined data set also shows an overall consistency and a better average value of median Qpred/Qobs ratio. Hence, on the basis of median Qpred/Qobs ratio values, it can be concluded that the combined data set performs better than all other candidate regions.

Table 6.12 shows the median relative error values for groupings A1 and D1, the individual states and the combined data set. All the groups based on state boundaries show poor performance except for QLD, which shows better results for small to medium ARIs. However, this region shows an overall poor performance. When A1 is compared with D1, it is noticed that overall both groups perform approximately similarly. A1 performs better for smaller ARIs while D1 performs well for higher ARIs; overall, the grouping based on cluster analysis outperforms the grouping based on PCA. Moreover, when the median relative error values are compared between grouping A1 and the combined data set, the latter is found to perform better except for Q2, as shown in Figure 6.8.


Hence, on the basis of median Qpred/Qobs ratio and median relative error values, it can be concluded that the combined data set performs better than all other candidate regions and can be used for final model development.

Table 6.11 Median Qpred/Qobs ratio values for seven candidate regions

Quantile  Grouping A1 (cluster analysis)  Grouping D1 (PCA)  NSW    VIC    QLD    TAS    Combined
Q2        1.0                             1.5                1.4    1.1    1.3    1.1    1.4
Q5        1.2                             1.4                0.8    1.1    1.5    1.6    1.1
Q10       1.1                             1.1                0.2    0.9    1.1    1.6    1.2
Q20       1.4                             1.2                1.5    1.5    1.1    0.7    1.1
Q50       1.3                             1.1                1.8    1.2    1.0    2.5    1.1
Q100      1.3                             1.5                1.2    1.2    1.0    1.0    1.1
Overall   1.2                             1.3                1.2    1.2    1.2    1.4    1.1

Table 6.12 Median relative error (%)

Quantile  Grouping A1 (cluster analysis)  Grouping D1 (PCA)  NSW    VIC    QLD    TAS    Combined
Q2        44.6                            52.3               48.2   78.1   42.4   65.8   56.2
Q5        45.4                            48.4               51.9   40.9   50.2   55.5   41.4
Q10       44.4                            48.7               91.5   39.8   37.7   64.6   39.1
Q20       56.0                            41.1               53.2   55.6   37.7   38.2   37.2
Q50       54.5                            50.3               82.1   73.7   57.9   146.5  40.0
Q100      47.5                            53.5               50.0   66.9   58.4   15.3   39.6
Overall   48.7                            49.1               62.8   59.2   47.4   64.3   42.3

Figure 6.7 Median Qpred/Qobs ratio values for different ARIs for candidate regions

Figure 6.8 Median relative error (%) values for different ARIs for candidate regions

Figure 6.9 Comparison of median relative error (%) values between the combined data set and the grouping based on K-Means cluster analysis

6.4 Summary

This chapter has focused on the application of artificial neural network (ANN) based regional flood frequency analysis (RFFA) in eastern Australia with a particular focus on the formation of regions. Regions/groupings are first formed on the basis of state/geographic boundaries and climatic boundaries. In the second step, the regions are formed in the catchment characteristics data space based on cluster analysis and principal component analysis. It has been found that K-Means cluster analysis generates the best performing groups/regions in the catchment characteristics data space. When compared with the geographic regions, some state-based groupings perform more poorly than the K-Means cluster groupings. Overall, the best ANN based RFFA model is achieved when the data of all 452 catchments are combined, which gives an RFFA model with median relative error of 37% to 44%. Since all the stations combined form the best performing region, this region will be used in the subsequent chapters for building the other artificial intelligence based RFFA models.


CHAPTER 7

DEVELOPMENT OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS

7.1 General

The previous two chapters have presented the selection of predictor variables and the optimum region for the development of artificial intelligence based RFFA models for eastern Australia. This chapter presents the development of RFFA models based on the selected predictor variables and optimum region using four artificial intelligence based methods: artificial neural networks (ANN), genetic algorithm based artificial neural networks (GAANN), gene-expression programming (GEP) and co-active neuro fuzzy inference system (CANFIS). A description of these methods has been provided in Chapter 3.

The model development presented in this chapter involves training of a model using a randomly selected part of the data set. For this purpose, 80% (362 catchments) of the total 452 catchments are used to train the model (training data set) and the remaining 20% (90 catchments) are used to validate the model (validation data set). In the traditional hydrological model building sense, the training/calibration of a model involves identification of a set of model parameters that allows satisfactory transformation of selected model input(s) to model output(s). In the case of hydrological models, the calibration is generally carried out by a ‘trial and error’ method.

In this study, the artificial intelligence based models, which are basically black box type models, are trained/calibrated using the training data set based on minimisation of the mean squared error between the observed and predicted flood quantiles by the model (being trained) for a given ARI for the training data set. The artificial intelligence based RFFA models are also evaluated based on four criteria: median Qpred/Qobs ratio, plot of Qobs and Qpred, median relative error (RE) and coefficient of efficiency (CE). This is initially done for the training data set and then repeated for the validation data set. Models are ranked based on their relative performances in relation to these criteria to identify the best trained/calibrated model.
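Training by minimisation of the mean squared error can be illustrated with a very small neural network. The sketch below is a generic one-hidden-layer network trained by batch gradient descent in Python; it is not the MATLAB code, network structure or training algorithm adopted in this study (these are described in Chapter 3). The function names, the number of hidden units, the learning rate and the synthetic log-transformed data are all assumptions made for the example.

```python
import numpy as np

def train_mlp(x, y, n_hidden=4, lr=0.05, epochs=2000, seed=0):
    """Train a one-hidden-layer tanh network by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(scale=0.5, size=(x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(scale=0.5, size=n_hidden)
    b2 = 0.0
    n = len(y)
    history = []
    for _ in range(epochs):
        hidden = np.tanh(x @ w1 + b1)            # forward pass, hidden layer
        pred = hidden @ w2 + b2                  # forward pass, output
        err = pred - y
        history.append(float(np.mean(err ** 2)))  # MSE before this update
        # Backward pass: gradients of the MSE with respect to each parameter
        grad_pred = 2.0 * err / n
        grad_w2 = hidden.T @ grad_pred
        grad_b2 = grad_pred.sum()
        grad_a = np.outer(grad_pred, w2) * (1.0 - hidden ** 2)
        grad_w1 = x.T @ grad_a
        grad_b1 = grad_a.sum(axis=0)
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2
    return (w1, b1, w2, b2), history

def predict(params, x):
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

# Synthetic training set: 362 catchments, two predictors (log10 of area and
# log10 of rainfall intensity) and a made-up log10 flood quantile as target
rng = np.random.default_rng(1)
log_area = rng.uniform(0.0, 3.0, size=362)
log_intensity = rng.uniform(0.5, 2.0, size=362)
x = np.column_stack([log_area, log_intensity])
x = (x - x.mean(axis=0)) / x.std(axis=0)
y = 0.8 * log_area + 1.2 * log_intensity + rng.normal(scale=0.1, size=362)

params, history = train_mlp(x, y)
print(history[0], history[-1])  # MSE at the first and last epoch
```

The recorded MSE history plays the same role as a training-state plot: training is judged by how far the error on the training data set has been reduced, after which the fitted model is assessed on the validation data set.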

7.2 Training of artificial intelligence based RFFA models

At the beginning, each of the four artificial intelligence based RFFA models is trained using MATLAB codes (developed as a part of this research) by minimising the mean squared error between the observed and predicted flood quantiles for each of six ARIs (2, 5, 10, 20, 50 and 100 years). This is done using the training data set consisting of 362 catchments, as mentioned in Section 7.1. Table 7.1 and Figure 7.1 show the CE values for the ANN, GAANN, GEP and CANFIS based RFFA models. Among these four models, the GAANN is found to have the highest CE values for ARIs of 2, 5, 10 and 20 years. For ARIs of 50 and 100 years, the ANN has the highest CE values. Considering all the six ARIs, GAANN has the highest CE value (0.71) and the three other models have similar CE values in the range of 0.66 to 0.67.

Table 7.1 CE values of four artificial intelligence based RFFA models based on training data set

ARI (years)  ANN    GAANN  GEP    CANFIS
2            0.59   0.76   0.69   0.64
5            0.73   0.79   0.72   0.67
10           0.64   0.76   0.73   0.75
20           0.71   0.76   0.65   0.73
50           0.70   0.57   0.61   0.53
100          0.64   0.63   0.57   0.62
Overall      0.67   0.71   0.66   0.66

Figure 7.1 Plot of CE values of four artificial intelligence based RFFA models based on training data set


Table 7.2 and Figure 7.2 show the median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models. The ANN based RFFA model shows the best performance (i.e. Qpred/Qobs ratio value is closest to 1.00) for ARIs of 20, 50 and 100 years. Considering all the six ARIs, the ANN outperforms the other three models with an overall Qpred/Qobs ratio value of 1.09. The second best performance is demonstrated by the GEP (1.19), while the GAANN and CANFIS perform similarly. In terms of consistency over the ARIs, GAANN, GEP and CANFIS show very high Qpred/Qobs ratio values for some ARIs as can be seen in Table 7.2. Here again, the ANN shows the best consistency over the ARIs.

Table 7.2 Median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on training data set

ARI (years)  ANN    GAANN  GEP    CANFIS
2            1.03   1.22   0.99   1.76
5            1.12   1.20   1.08   0.99
10           1.06   1.02   1.08   0.87
20           1.10   1.11   1.17   1.26
50           1.08   1.52   1.45   1.04
100          1.15   1.18   1.39   1.36
Overall      1.09   1.21   1.19   1.21

Figure 7.2 Plot of median Qpred/Qobs ratio values of four artificial intelligence based RFFA models based on training data set

Table 7.3 and Figure 7.3 show the median of the absolute relative error values for the ANN, GAANN, GEP and CANFIS based RFFA models. It can be seen that the ANN based RFFA model outperforms the other models with a median RE value of 42.07% over all the six ARIs. In some cases, i.e. for ARIs of 2, 5, 20 and 100 years, the GAANN based RFFA model performs better than or equal to the ANN based model; however, for 50 years ARI it shows a very high RE (60%). In terms of consistency over the ARIs, ANN outperforms the other three models. Both GEP and CANFIS have quite high RE values (GEP = 54.02%, CANFIS = 59.46%). Importantly, CANFIS shows very high RE values for 2 years ARI (94.02%) and 50 years ARI (71.94%). Overall, in terms of RE value, the ANN is the best performer, followed by the GAANN, GEP and CANFIS.

Table 7.3 Median RE (%) values of four artificial intelligence based RFFA models (training)

ARI (years)  ANN    GAANN  GEP    CANFIS
2            43.75  40.92  73.3   94.02
5            39.53  39.31  43.91  43.55
10           39.14  41.01  43.25  45.27
20           40.38  40.29  54.61  46.07
50           43.32  60.00  54.22  71.94
100          46.30  45.28  54.82  55.89
Overall      42.07  44.47  54.02  59.46

Figure 7.3 Plot of median RE (%) values of four artificial intelligence based RFFA models based on training data set

The predicted and observed flood quantiles for the ANN based RFFA model for 20 years ARI are shown in Figure 7.4 (the plots for the other five ARIs can be seen in Appendix B, Figures B.1 to B.5). The 20 years ARI is adopted here because it is the most frequently applied ARI in design. These plots generally present a good agreement between the predicted and observed flood quantiles; however, there is some over-estimation by the ANN-based RFFA model when the observed flood quantiles are smaller than about 50 m³/s for all the ARIs except 50 years. Most of the training catchments are within a narrow range of variability from the 45-degree line except for a few outliers, in particular for higher discharges. Overall, the ANN based RFFA model shows better training results for higher discharges.

Figure 7.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20 (training data set)

Figure 7.5 shows the plot of predicted flood quantiles by the GAANN-based RFFA model and the observed flood quantiles for 20 years ARI (the plots for the other five ARIs can be seen in Appendix B, Figures B.6 to B.10). These plots show that the GAANN based RFFA model generally presents a good agreement between the observed and predicted flood quantiles; however, for ARI of 50 years (Figure B.9) (and to some degree for ARI of 5 years), there is a notable over-estimation by the GAANN based RFFA model. Also, the 100 years ARI (Figure B.10) shows a notable scatter around the 45-degree line, in particular for small and medium discharges. Overall, the GAANN based RFFA model shows better training results for higher discharges.


Figure 7.5 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20 (training data set)

Figure 7.6 compares the predicted flood quantiles by the GEP based RFFA model with the observed flood quantiles for 20 years ARI (Q20) (the plots for the other five ARIs can be seen in Appendix B, Figures B.11 to B.15). Figure 7.6 generally presents a good agreement between the predicted and observed flood quantiles. For the 2 and 5 years ARIs (Figures B.11 and B.12, respectively), there are a few outliers, and for 50 and 100 years ARIs (Figures B.14 and B.15, respectively), there is noticeable over-estimation by the GEP based RFFA model for small to medium discharges. Overall, the GEP based RFFA model shows better training results for higher discharges.

Figure 7.7 shows the plot of predicted flood quantiles by the CANFIS based RFFA model and the observed flood quantiles for 20 years ARI (the plots for other ARIs can be seen in Appendix B, Figures B.16 to B.20). Figure 7.7 shows an over-estimation by the CANFIS based RFFA model for smaller discharges for 20 years ARI. A very similar pattern can be seen for ARI of 5 years (Figure B.17) and ARI of 100 years (Figure B.20). For ARI of 2 years (Figure B.16) and ARI of 10 years (Figure B.18), a number of outliers can be seen, together with a noticeable scatter around the 45-degree line. For 50 years ARI (Figure B.19), the scatter around the 45-degree line is significant. Overall, the CANFIS based RFFA model shows better training results for higher discharges for all the ARIs except 50 years.


Figure 7.6 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20 (training data set)

Figure 7.7 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20 (training data set)

7.3 Comparison of training and validation results

7.3.1 ANN

The CE, median Qpred/Qobs ratio and median relative error values are compared in Table 7.4 for the training and validation data sets for the ANN based RFFA model. Figures 7.8, 7.9 and 7.10 compare the CE, median Qpred/Qobs ratio values and median relative error values, respectively, for the ANN based RFFA model. In terms of CE value, the best agreement between the training and validation data sets is found for ARIs of 10, 20 and 50 years, a reasonable degree of agreement is found for ARIs of 2 and 5 years, and relatively poor agreement is found for the ARI of 100 years, where the CE value for the validation data set is markedly smaller. With respect to median Qpred/Qobs ratio value, the best agreement between the training and validation data sets is found for 2 years ARI, a moderate agreement is noticed for 10, 20, 50 and 100 years ARIs and a poor agreement is found for 5 years ARI. However, for 5 years ARI the validation data set gives a very good Qpred/Qobs ratio value (0.99). In relation to the median relative error values, the best agreement between the training and validation data sets is found for ARIs of 5 and 100 years, a moderate agreement for ARI of 50 years and poor agreement for ARIs of 2 and 10 years. From these results, it is noted that the ANN based RFFA model shows different degrees of agreement between the training and validation data sets for different ARIs across the three criteria adopted here.

Table 7.4 Comparison of training and validation results for the ANN based RFFA model

               Training                                 Validation
ARI (years)    CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                      (median)           (median)              (median)           (median)
2              0.59   1.03               43.75          0.69   1.04               37.56
5              0.73   1.12               39.53          0.59   0.99               40.39
10             0.64   1.06               39.14          0.63   1.02               44.63
20             0.71   1.10               40.38          0.69   1.04               35.62
50             0.70   1.08               43.32          0.68   1.14               39.09
100            0.64   1.15               46.30          0.40   1.10               44.53
Overall        0.67   1.09               42.07          0.61   1.06               40.30
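The degree of agreement discussed above can be illustrated by computing the absolute gap between the training and validation statistics for each ARI. The following is a minimal sketch for the CE values in Table 7.4. The names `ce_training`, `ce_validation` and `agreement_gaps` are illustrative only, and no numerical thresholds for "best", "moderate" or "poor" agreement are assumed, since none are stated in the text.

```python
# CE values transcribed from Table 7.4 (ANN based RFFA model).
ce_training = {2: 0.59, 5: 0.73, 10: 0.64, 20: 0.71, 50: 0.70, 100: 0.64}
ce_validation = {2: 0.69, 5: 0.59, 10: 0.63, 20: 0.69, 50: 0.68, 100: 0.40}


def agreement_gaps(training, validation):
    """Return {ARI: |training - validation|}, rounded to 2 decimals."""
    return {ari: round(abs(training[ari] - validation[ari]), 2) for ari in training}


gaps = agreement_gaps(ce_training, ce_validation)
# ARIs ordered from the smallest gap (closest agreement) to the largest gap;
# ties are broken by ARI so the ordering is deterministic.
ordered = sorted(gaps, key=lambda ari: (gaps[ari], ari))
for ari in ordered:
    print(f"ARI {ari:>3} years: CE gap = {gaps[ari]:.2f}")
```

The resulting order (10, 20 and 50 years first, 100 years last) is consistent with the qualitative grouping given in the text for the CE criterion.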

Figures 7.11 to 7.13 show some example plots generated during the training of the ANN based RFFA model. Figure 7.11 shows the regression plot for the ANN based RFFA model for the training and validation data sets for Q20 (the plots for other ARIs can be seen in Appendix B, Figures B.41 to B.45). Figure 7.12 shows the training state of the ANN based RFFA model for Q20 using 20,000 epochs and Figure 7.13 shows the plot for validation of results for Q20.

Figure 7.8 Plot comparing the CE values given by the training and validation data sets for the ANN based RFFA model

Figure 7.9 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the ANN based RFFA model


Figure 7.10 Plot comparing the median RE (%) values given by the training and validation data sets for the ANN based RFFA model

Figure 7.11 Regression plot comparing the training and validation of the ANN based RFFA model for Q20


Figure 7.12 Plot showing the training state of the ANN based RFFA model for Q20

[Plot: NN output vs actual; series: NN output, Actual; horizontal axis: Test Catchments (0 to 90)]

Figure 7.13 Plot between Qobs and Qpred for the ANN based RFFA model for the validation data set

7.3.2 GAANN

In Table 7.5, the CE, median Qpred/Qobs ratio and median relative error values are compared for the training and validation data sets for the GAANN based RFFA model. Figures 7.14, 7.15 and 7.16 compare the CE, median Qpred/Qobs ratio and median relative error values, respectively, for the GAANN based RFFA model. In terms of CE value, the best agreement between the training and validation data sets is found for ARIs of 2, 5, 20 and 100 years, a moderate degree of agreement is found for ARI of 10 years and a relatively poor agreement is found for the ARI of 50 years (for this ARI, the CE value for the validation data set is 0.38, which is remarkably low). With respect to median Qpred/Qobs ratio value, the best agreement between the training and validation data sets is found for ARIs of 10 and 100 years, a moderate agreement is noticed for ARIs of 2 and 20 years and a poor agreement is found for ARIs of 5 and 50 years. For 50 years ARI, the validation data set shows a good Qpred/Qobs ratio value (0.95) as compared with a very high value (1.52) for the training data set. This shows that a poor performance during the training does not always give a poor performance in the validation. In relation to the median relative error values, the best agreement between the training and validation data sets is found for ARIs of 50 and 100 years, a moderate agreement for ARI of 20 years and a very poor agreement for ARIs of 2, 5 and 10 years. In particular, the relative error values for the validation data set are remarkably high compared with the training data set for ARIs of 2, 5 and 10 years. This shows that a good performance during model training does not guarantee a similarly good performance during validation.

Table 7.5 Comparison of training and validation results for the GAANN based RFFA model

               Training                                 Validation
ARI (years)    CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                      (median)           (median)              (median)           (median)
2              0.76   1.22               40.92          0.72   1.08               65.13
5              0.79   1.20               39.31          0.75   0.89               61.48
10             0.76   1.02               41.01          0.63   0.98               72.56
20             0.76   1.11               40.29          0.71   0.93               48.19
50             0.57   1.52               60.00          0.38   0.95               55.93
100            0.63   1.18               45.28          0.65   1.17               47.08
Overall        0.71   1.21               44.47          0.64   1.00               58.40


Figure 7.14 Plot comparing the CE values given by the training and validation data sets for the GAANN based RFFA model

Figure 7.15 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the GAANN based RFFA model


Figure 7.16 Plot comparing the median RE (%) values given by the training and validation data sets for the GAANN based RFFA model

7.3.3 GEP

The CE, median Qpred/Qobs ratio and median relative error values for the GEP based RFFA model are compared in Table 7.6 for the training and validation data sets. Figures 7.17, 7.18 and 7.19 compare the CE, median Qpred/Qobs ratio and median relative error values, respectively, for the GEP based RFFA model. In terms of CE value, the best agreement between the training and validation data sets is found for ARIs of 5, 20 and 50 years, a moderate degree of agreement is found for ARIs of 10 and 100 years and a relatively poor agreement is found for the ARI of 2 years. The CE value for the validation data set for ARI of 2 years is quite low (0.49). With respect to median Qpred/Qobs ratio value, the best agreement between the training and validation data sets is found for ARIs of 2, 5 and 10 years and a moderate agreement is noticed for ARIs of 20, 50 and 100 years. In relation to median relative error values, the best agreement between the training and validation data sets is found for ARIs of 2, 5 and 10 years, a moderate agreement for ARIs of 20 and 100 years and a poor agreement for ARI of 50 years. It should be noted that for 2 years ARI, both the training and validation data sets exhibit a very high relative error value (73.30% and 69.38%, respectively).

Table 7.6 Comparison of training and validation results for the GEP based RFFA model

               Training                                 Validation
ARI (years)    CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                      (median)           (median)              (median)           (median)
2              0.69   0.99               73.30          0.49   1.07               69.38
5              0.72   1.08               43.91          0.67   1.10               44.95
10             0.73   1.08               43.25          0.56   1.04               42.08
20             0.65   1.17               54.61          0.67   0.89               47.61
50             0.61   1.45               54.22          0.63   1.05               37.87
100            0.57   1.39               54.82          0.67   1.02               44.47
Overall        0.66   1.19               54.02          0.61   1.03               44.47

Figure 7.17 Plot comparing the CE values given by the training and validation data sets for the GEP based RFFA model


Figure 7.18 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the GEP based RFFA model

Figure 7.19 Plot comparing the median RE (%) values given by the training and validation data sets for the GEP based RFFA model

7.3.4 CANFIS

The CE, median Qpred/Qobs ratio and median relative error values are compared in Table 7.8 for the training and validation data sets for the CANFIS based RFFA model. Figures 7.20, 7.21 and 7.22 compare the CE, median Qpred/Qobs ratio and median relative error values, respectively, for the CANFIS based RFFA model. In terms of CE value, the best agreement between the training and validation data sets is found for ARIs of 20, 50 and 100 years, a reasonable degree of agreement is found for ARIs of 5 and 10 years and a significant disagreement is found for the ARI of 2 years, where the CE value for the validation data set is -0.09, which is much smaller than 0.64 (the CE value for the training data set). With respect to median Qpred/Qobs ratio value, the performance for both the training and validation data sets is found to be in the acceptable range for all the ARIs except for 2 years. For 2 years ARI, the Qpred/Qobs ratio value is relatively high for both the training data set (1.76) and the validation data set (2.81). Similarly, in relation to median relative error value for the training data set, the performance for 5 and 10 years ARIs is found to be the best and the worst performance is observed for 2 years ARI. For the validation data set, the best performance is found for 20 years ARI, followed by 100 years ARI. The 2 years ARI shows a relatively high median relative error value for the validation data set (180.77%). These results show that the CANFIS based RFFA model is poorly trained/calibrated for 2 years ARI.

Table 7.8 Comparison of training and validation results for the CANFIS based RFFA model

               Training                                 Validation
ARI (years)    CE     Qpred/Qobs ratio   RE (%)         CE     Qpred/Qobs ratio   RE (%)
                      (median)           (median)              (median)           (median)
2              0.64   1.76               94.02          -0.09  2.81               180.77
5              0.67   0.99               43.55          0.54   0.95               48.92
10             0.75   0.87               45.27          0.67   0.79               51.97
20             0.73   1.26               46.07          0.72   1.18               34.48
50             0.53   1.04               71.94          0.55   0.93               59.20
100            0.62   1.36               55.89          0.59   1.31               42.63
Overall        0.66   1.21               59.46          0.50   1.33               69.66


Figure 7.20 Plot comparing the CE values given by the training and validation data sets for the CANFIS based RFFA model

Figure 7.21 Plot comparing the median Qpred/Qobs ratio values given by the training and validation data sets for the CANFIS based RFFA model


Figure 7.22 Plot comparing the median RE (%) values given by the training and validation data sets for the CANFIS based RFFA model

7.4 Selection of the best performing artificial intelligence based RFFA model based on training

The training of the four artificial intelligence based RFFA models has been presented in Section 7.2 using 362 catchments. It has been found that none of the four models performs the best in all the adopted assessment criteria over the six ARIs, which makes it difficult to select the best trained/calibrated model. The performances of the four models are therefore assessed in a heuristic manner, in which each model is ranked on the four criteria shown in Table 7.9. Four ranks are used, with a relative score ranging from 4 to 1: a model ranked 1 for a criterion scores 4, and ranks of 2, 3 and 4 score 3, 2 and 1, respectively.

Table 7.9 shows that the ANN based RFFA model has the highest score of 15, followed by the GAANN with a score of 12. The GEP receives a score of 10, while the CANFIS receives only 7, making it the least favourable model in terms of its performance during training. The ANN based model is ranked 1 in three of the four criteria. Hence, it is decided that the ANN based RFFA model is the best performing artificial intelligence based model in terms of training/calibration of the model.
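The rank-based scoring described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used to compile Table 7.9: the function name `overall_score` is hypothetical, and it is assumed here that models tied at a rank all receive that rank's score. The ranks for the ANN and CANFIS based models are read from Table 7.9.

```python
# Rank-to-score mapping stated in the text: rank 1 scores 4, rank 4 scores 1.
RANK_TO_SCORE = {1: 4, 2: 3, 3: 2, 4: 1}


def overall_score(ranks):
    """Sum the scores corresponding to a model's ranks across the criteria."""
    return sum(RANK_TO_SCORE[rank] for rank in ranks)


# Ranks read from Table 7.9, in the order: scatter plot, median Qpred/Qobs,
# median RE, median CE. Tied models are assumed to share the tied rank.
ann_ranks = [1, 1, 1, 2]
canfis_ranks = [3, 3, 4, 3]

print("ANN:", overall_score(ann_ranks))        # 4 + 4 + 4 + 3
print("CANFIS:", overall_score(canfis_ranks))  # 2 + 2 + 1 + 2
```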

Table 7.10 shows the ranking of the four artificial intelligence based RFFA models based on the agreement between the training and validation using three criteria. Four different ranks are used with a relative score ranging from 4 to 1 as mentioned earlier. It is found that the ANN and GEP based RFFA models both score 9, followed by the GAANN and CANFIS.

Table 7.9 Ranking of the four artificial intelligence based RFFA models with respect to training

Criterion                        Rank 1    Rank 2    Rank 3          Rank 4
Scatter plot of Qobs vs Qpred    ANN       GAANN     CANFIS          GEP
Median Qpred/Qobs                ANN       GEP       GAANN/CANFIS    #
Median RE                        ANN       GAANN     GEP             CANFIS
Median CE                        GAANN     ANN       GEP/CANFIS      #

Overall Score: ANN-15, GAANN-12, GEP-10, CANFIS-7

Table 7.10 Ranking of the four artificial intelligence based RFFA models with respect to agreement between training and validation

Criterion: Median Qpred/Qobs
Rank 1: CANFIS (Best agreement: Q5, Q10, Q20, Q50, Q100; Moderate agreement: none; Very poor agreement: Q2)
Rank 2: GEP (Best agreement: Q2, Q5, Q10, Q20; Moderate agreement: Q50, Q100; Poor agreement: none)
Rank 3: ANN (Best agreement: Q2, Q10, Q100; Moderate agreement: Q20, Q50; Very poor agreement: Q5)
Rank 4: GAANN (Best agreement: Q2, Q10, Q20, Q100; Moderate agreement: Q5; Poor agreement: Q50)

Criterion: Median RE (%)
Rank 1: GAANN (Best agreement: Q50, Q100; Moderate agreement: Q20; Very poor agreement: Q2, Q5, Q10)
Rank 2: CANFIS (Best agreement: Q5, Q10, Q20, Q50, Q100; Moderate agreement: none; Significantly poor agreement: Q2)
Rank 3: ANN (Best agreement: Q5, Q100; Moderate agreement: Q50, Q20; Poor agreement: Q2, Q10)
Rank 4: GEP (Best agreement: Q2, Q5, Q10, Q20, Q100; Moderate agreement: Q50; Poor agreement: none)

Criterion: Median CE
Rank 1: ANN (Best agreement: Q10, Q20, Q50; Moderate agreement: Q2, Q5; Poor agreement: Q100)
Rank 2: GEP (Best agreement: Q5, Q20, Q50; Moderate agreement: Q10, Q100; Poor agreement: Q2)
Rank 3: GAANN (Best agreement: Q2, Q5, Q20, Q100; Moderate agreement: Q10; Poor agreement: Q50)
Rank 4: CANFIS (Best agreement: Q10, Q20, Q50, Q100; Moderate agreement: Q5; Poor agreement: Q2)

Overall Score: ANN-9, GEP-9, GAANN-7, CANFIS-5

Overall, the ANN based RFFA model shows the best training/calibration and the CANFIS based RFFA model the least favourable.

7.5 Summary

In this chapter, four artificial intelligence based RFFA models (ANN, GAANN, GEP and CANFIS) are developed. Some 80% (362 catchments) of the total 452 catchments are used to train the models (training data set) and the remaining 20% (90 catchments) are used to validate the models (validation data set). The selected artificial intelligence based models are essentially black box type models. They are trained/calibrated using the training data set by minimising the mean squared error between the observed flood quantiles and the flood quantiles predicted by the model being trained for a given ARI. The artificial intelligence based RFFA models are also evaluated based on four criteria: median Qpred/Qobs ratio, plot of Qobs and Qpred, median relative error (RE) and coefficient of efficiency (CE). This is initially done for the training data set and then repeated for the validation data set. Models are ranked based on their relative performances in relation to these criteria to identify the best trained/calibrated model.
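The data split and training objective summarised above can be sketched as follows. This is an illustration under stated assumptions, not the procedure used in the thesis: the helper names `split_catchments` and `mean_squared_error` are hypothetical, the catchment identifiers are placeholders, and a seeded random split is assumed because this section does not state how the 90 validation catchments were selected.

```python
import random


def split_catchments(catchment_ids, n_validation, seed=0):
    """Randomly hold out n_validation catchments; the rest form the training set."""
    rng = random.Random(seed)
    shuffled = list(catchment_ids)
    rng.shuffle(shuffled)
    return shuffled[n_validation:], shuffled[:n_validation]


def mean_squared_error(q_obs, q_pred):
    """Mean squared error between observed and predicted flood quantiles."""
    if len(q_obs) != len(q_pred) or not q_obs:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((o - p) ** 2 for o, p in zip(q_obs, q_pred)) / len(q_obs)


# 452 placeholder catchment identifiers (hypothetical; not the thesis data).
training_ids, validation_ids = split_catchments(range(452), n_validation=90)
print(len(training_ids), len(validation_ids))  # 362 90

# Two hypothetical catchments: errors of -10 and +20 m3/s give (100 + 400) / 2.
print(mean_squared_error([100.0, 200.0], [110.0, 180.0]))  # 250.0
```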

It has been found that there is no model which performs the best for all the six ARIs over all the adopted criteria. Overall, the ANN based RFFA model outperforms the three other models (in terms of training/calibration). Hence, the ANN based RFFA model is the best calibrated model.


CHAPTER 8

VALIDATION OF ARTIFICIAL INTELLIGENCE BASED RFFA MODELS

8.1 General

Chapter 6 has discussed the formation of regions and selection of the best performing region for RFFA in eastern Australia using artificial intelligence based methods. Based on the available data of 452 natural catchments in NSW, VIC, QLD and TAS, it has been found that the best results in RFFA can be obtained when data from these states are combined to form one region. Chapter 5 has discussed the selection of the best set of predictor variables for the RFFA model development. It has been found that two predictor variables, i.e. catchment area (A) and design rainfall intensity (Itc_ARI), deliver the best results in RFFA for eastern Australia. Chapter 7 has developed/trained the RFFA models based on four artificial intelligence based methods (ANN, GAANN, GEP and CANFIS) using data from 362 catchments. This chapter presents the validation of these four RFFA models based on 90 independent test catchments. The chapter initially presents results for each of the four artificial intelligence based models, followed by an inter-comparison of these methods. Finally, the best performing artificial intelligence based RFFA model is compared with the QRT based RFFA model.

8.2 Validation of RFFA models

8.2.1 ANN

Figure 8.1 compares the predicted flood quantiles for the selected 90 test catchments from the ANN based RFFA model with the observed flood quantiles for 20 years ARI (Q20). The observed flood quantiles are estimated using an LP3 distribution and Bayesian parameter estimation procedure as discussed in Chapter 4. It should be noted here that the observed flood quantiles are not free from error; these are subject to data error (such as rating curve extrapolation error), sampling error (due to limited record length of annual maximum flood series data), error due to choice of flood frequency distribution and error due to selection of parameter estimation method. These errors limit the usefulness of the validation statistics (e.g. RE); nevertheless, the statistics provide an indication of the possible error of the developed RFFA model as far as its practical application is concerned. The ratio Qpred/Qobs and RE values are used for the assessment of models; however, the CE value is not very useful here as the mean of observed flood quantile is not known.
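The two validation statistics named above can be sketched as follows. The definitions used here are assumptions based on common usage (signed RE as 100 × (Qpred − Qobs)/Qobs, and the ratio as Qpred/Qobs), and should be checked against the definitions given earlier in the thesis. The function names and the five pairs of Q20 values are hypothetical and are not taken from the test catchments.

```python
from statistics import median


def relative_errors(q_obs, q_pred):
    """Signed relative error (%) per catchment: 100 * (Qpred - Qobs) / Qobs."""
    return [100.0 * (p - o) / o for o, p in zip(q_obs, q_pred)]


def ratios(q_obs, q_pred):
    """Qpred/Qobs ratio per catchment."""
    return [p / o for o, p in zip(q_obs, q_pred)]


# Hypothetical observed and predicted Q20 values (m3/s) for five test catchments.
q_obs = [40.0, 120.0, 300.0, 800.0, 1500.0]
q_pred = [60.0, 108.0, 330.0, 760.0, 1650.0]

re_values = relative_errors(q_obs, q_pred)
ratio_values = ratios(q_obs, q_pred)

print("median signed RE (%):", median(re_values))
print("median absolute RE (%):", median(abs(r) for r in re_values))
print("median Qpred/Qobs:", median(ratio_values))
```

Under these assumed definitions, Qpred/Qobs = 1 + RE/100 for each catchment, so a median ratio above 1 and a positive median RE both indicate overestimation by the model.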

Figure 8.1 shows a good agreement overall between the predicted and observed flood quantiles; however, there is some overestimation by the ANN based RFFA model when the observed flood quantiles are smaller than about 50 m3/s. Most of the test catchments are within a narrow range of variability from the 45-degree line except for a few outliers. The plots of predicted and observed flood quantiles for other ARIs can be seen in Appendix B (Figures B.22 to B.25). The results are very similar for ARIs of 2, 5, 10 and 20 years. Results for ARIs of 50 and 100 years (Figures B.24 and B.25, respectively) exhibit some overestimation by the ANN based RFFA model for smaller to medium discharges.

Figure 8.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q20

Figure 8.2 shows the boxplot of relative error (RE) values of the selected test catchments for the ANN based RFFA model for different flood quantiles. It can be seen from Figure 8.2 that the median RE values (represented by the thick black lines within the boxes) are located very close to the zero RE line (indicated by the 0 – 0 horizontal line in Figure 8.2), in particular for ARIs of 2, 5, 10 and 20 years. However, for ARIs of 50 and 100 years, the median RE values are located above the zero line, with ARI of 50 years showing the highest departure, which indicates an overestimation by the ANN based RFFA model. Overall, the ANN based RFFA model produces nearly unbiased estimates of flood quantiles, as the median RE values match the zero RE line quite closely, as can be seen in Figure 8.2.

In terms of the spread of the RE (represented by the width of the box), ARIs of 50 and 100 years present the highest RE band and ARIs of 2 and 5 years present the smallest RE band, followed by ARIs of 20 and 10 years. The RE bands for 50 and 100 years ARIs are almost double those for 2 and 5 years ARIs. This implies that the ANN based RFFA model provides the most accurate flood quantile estimates for 2 and 5 years ARIs, and the least accurate flood quantile estimates for ARIs of 50 and 100 years. Overall, the boxplot in Figure 8.2 shows that better results in terms of RE values are achieved for the smaller ARIs (i.e. 2, 5, 10 and 20 years) as compared to higher ARIs for the ANN based RFFA model. Some outliers (evidenced by notable overestimation with a positive RE) can be seen for all the ARIs, which may need to be examined more closely for data errors or issues regarding the hydrology and physical characteristics of these catchments. If these catchments are deemed to be genuine outliers, they should be removed to enhance the ANN based RFFA model; however, this has not been undertaken in this thesis.
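The quantities read from the boxplot above (the median line and the RE band) can be computed as sketched below. It is assumed here that each box spans the interquartile range, as in a conventional boxplot; the thesis does not state the boxplot convention in this section. The function name `box_summary` and the RE values are hypothetical and are not the thesis results.

```python
from statistics import median, quantiles


def box_summary(re_values):
    """Return (median, interquartile range) of a list of RE values (%)."""
    q1, _, q3 = quantiles(re_values, n=4, method="inclusive")
    return median(re_values), q3 - q1


# Hypothetical signed RE values (%) for two ARIs; not the thesis results.
re_by_ari = {
    2: [-40.0, -10.0, 0.0, 30.0, 120.0],
    100: [-80.0, -20.0, 15.0, 60.0, 250.0],
}

for ari, values in re_by_ari.items():
    med, iqr = box_summary(values)
    print(f"ARI {ari:>3} years: median RE = {med:.1f}%, RE band (IQR) = {iqr:.1f}%")
```

In this illustrative data set the RE band for the 100 years ARI is twice that for the 2 years ARI, and its median lies above zero, which is the kind of pattern the text describes for the higher ARIs.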

[Boxplot: vertical axis RE (%), -300 to 300; horizontal axis ARI (years): 2, 5, 10, 20, 50, 100]

Figure 8.2 Boxplot of relative error (RE) values for ANN based RFFA model


Figure 8.3 shows the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments for the ANN based RFFA model for different ARIs. The median Qpred/Qobs ratio values (represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the horizontal line in Figure 8.3), in particular for ARIs of 2, 5, 10 and 20 years. However, for ARI of 50 years (and to a lesser degree for ARI of 100 years), the median Qpred/Qobs ratio value is clearly located above the 1 – 1 line. These results indicate that the ANN based RFFA model generally provides reasonably accurate flood quantiles with the expected Qpred/Qobs ratio value very close to 1.00, although there is a noticeable overestimation for ARIs of 50 and 100 years. In terms of the spread of the Qpred/Qobs ratio values, ARIs of 2 and 5 years provide the lowest spread, followed by ARIs of 20, 10, 100 and 50 years.

Considering the RE and Qpred/Qobs ratio values as discussed above, it can be concluded that the ANN based RFFA model generally provides unbiased flood estimates for smaller to medium ARIs (2 to 20 years); however, the model slightly overestimates the observed flood quantiles for higher ARIs (50 to 100 years).

[Boxplot: vertical axis Ratio (Qpred/Qobs), -3 to 3; horizontal axis ARI (years): 2, 5, 10, 20, 50, 100]

Figure 8.3 Boxplot of Qpred/Qobs ratio values for ANN based RFFA model

8.2.2 GAANN

Figures 8.4, 8.5 and 8.6 show the validation results for the GAANN based RFFA model. Figure 8.4 shows the plot of predicted flood quantiles by the GAANN based RFFA model and the observed flood quantiles for 20 years ARI. Figure 8.4 shows a greater scatter than Figure 8.1 (which represents the ANN based RFFA model); in particular, there is an underestimation of the flood quantiles by the GAANN based RFFA model for a few test catchments. Overall, the scatter around the 45-degree line in Figure 8.4 is deemed reasonable for most of the test catchments. The plots of predicted and observed flood quantiles for other ARIs can be seen in Appendix B (Figures B.26 to B.30). The results are very similar for ARIs of 2, 5, 10 and 20 years. Results for ARIs of 50 and 100 years (Figures B.29 and B.30, respectively) exhibit relatively better results by the GAANN based RFFA model, in particular for the higher discharges.

Figure 8.4 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q20

Figure 8.5 shows the boxplot of RE (%) values for the GAANN based RFFA model. The median RE values (represented by the black line within the boxes) match the 0 – 0 line very well for ARI of 10 years and reasonably well for ARIs of 2, 20 and 50 years. For ARIs of 5 and 100 years, a noticeable underestimation and overestimation, respectively, are shown by the GAANN based RFFA model. In terms of the RE band (represented by the spread of the box), ARI of 20 years shows the lowest spread, followed by ARIs of 2, 5, 10, 50 and 100 years. The RE band for 100 years ARI is about double that for ARIs of 2 and 20 years. These results show that, in terms of RE, the best result overall is achieved for 20 years ARI for the GAANN based RFFA model. Similar to the ANN based RFFA model, the performance of the GAANN based RFFA model is relatively poor for the higher ARIs (i.e. 50 to 100 years). This is not unexpected, as estimation of flood quantiles for higher ARIs is associated with a greater degree of uncertainty (e.g. Haddad and Rahman, 2012; Rahman et al., 2011).

[Boxplot: vertical axis RE (%), -300 to 300; horizontal axis ARI (years): 2, 5, 10, 20, 50, 100]

Figure 8.5 Boxplot of relative error (RE) values for GAANN based RFFA model

Figure 8.6 presents the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments for the GAANN based RFFA model for different ARIs. It is found that the median Qpred/Qobs ratio values (represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the horizontal line in Figure 8.6), in particular for ARIs of 2, 10, 20 and 50 years (the best agreement is for ARI of 10 years). However, for ARI of 5 years, the median Qpred/Qobs ratio value is located a short distance below the 1 – 1 line and for ARI of 100 years, the median Qpred/Qobs ratio value is located a short distance above the 1 – 1 line. These results indicate a noticeable overall underestimation of the flood quantiles by the GAANN based RFFA model for 5 years ARI and a noticeable overall overestimation for 100 years ARI. In terms of the spread of the Qpred/Qobs ratio values, ARI of 20 years exhibits the lowest spread, followed by ARIs of 5, 2, 50, 10 and 100 years. Furthermore, the spreads of the Qpred/Qobs ratio values for 10 and 100 years are very similar, and are remarkably larger than those for 2, 5 and 20 years.

[Boxplot: vertical axis Ratio (Qpred/Qobs), -3 to 3; horizontal axis ARI (years): 2, 5, 10, 20, 50, 100]

Figure 8.6 Boxplot of Qpred/Qobs ratio values for GAANN based RFFA model

8.2.3 GEP

Figure 8.7 compares the predicted flood quantiles for the selected 90 test catchments by the GEP based RFFA model with the observed flood quantiles for 20 years ARI (Q20). Figure 8.7 generally presents a good agreement between the predicted and observed flood quantiles; however, there is some overestimation by the GEP based RFFA model when the observed flood quantiles are smaller than about 100 m3/s. Most of the test catchments are within a narrow range of variability from the 45-degree line except for a few outliers. The plots of predicted and observed flood quantiles for other ARIs can be seen in Appendix B (Figures B.31 to B.35). The results are very similar for ARIs of 2, 5, 10 and 20 years. Results for ARIs of 50 and 100 years (Figures B.34 and B.35, respectively) exhibit some overestimation by the GEP based RFFA model for smaller to medium discharges.

Figure 8.7 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q20

Figure 8.8 shows the boxplot of relative error (RE) values of the selected test catchments for the GEP based RFFA model for different flood quantiles. It can be seen from Figure 8.8 that the median RE values (represented by the thick black lines within the boxes) are located very close to the zero RE line (indicated by the 0 – 0 horizontal line in Figure 8.8), in particular for ARIs of 2 and 10 years. However, for ARIs of 20, 50 and 100 years, the median RE values are located above the zero line, with ARI of 100 years showing the highest departure, which indicates an overestimation by the GEP based RFFA model. Overall, the GEP based RFFA model shows some overestimation bias in flood quantile estimates for higher ARIs.

In terms of the spread of the RE (represented by the width of the box), ARIs of 20, 50 and 100 years present the highest RE band and ARIs of 5 and 10 years present the smallest RE band, followed by ARI of 2 years. The RE bands for 20, 50 and 100 years ARIs are almost double those for 5 and 10 years ARIs. This implies that the GEP based RFFA model provides the most accurate flood quantile estimates for 5 and 10 years ARIs, and the least accurate flood quantile estimates for ARIs of 20, 50 and 100 years. Overall, the boxplot in Figure 8.8 shows that better results in terms of RE values are achieved for the smaller ARIs (i.e. 2, 5 and 10 years) as compared to higher ARIs. Some outliers (evidenced by notable overestimation with a positive RE) can be seen for all the ARIs, which may need to be examined more closely for data errors or issues regarding the hydrology and physical characteristics of these catchments. If these catchments are deemed to be genuine outliers, they should be removed to enhance the GEP based RFFA model; however, this has not been undertaken in this thesis.

[Boxplot: vertical axis RE (%), -300 to 300; horizontal axis ARI (years): 2, 5, 10, 20, 50, 100]

Figure 8.8 Boxplot of relative error (RE) values for GEP based RFFA model

Figure 8.9 shows the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments for the GEP based RFFA model for different ARIs. The median Qpred/Qobs ratio values (represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the horizontal line in Figure 8.9), in particular for ARIs of 2, 5 and 10 years. However, for ARIs of 20, 50 and 100 years, the median Qpred/Qobs ratio value is clearly located above the 1 – 1 line. These results indicate that the GEP based RFFA model generally provides reasonably accurate flood quantiles, with the expected Qpred/Qobs ratio value very close to 1.00 for smaller ARIs. However, there is a noticeable overestimation for ARIs of 20, 50 and 100 years. In terms of the spread of the Qpred/Qobs ratio values, ARIs of 5 and 10 years provide the lowest spread, followed by ARIs of 2, 20, 50 and 100 years.

Considering the RE and Qpred/Qobs ratio values as discussed above, it can be concluded that the GEP based RFFA model generally provides unbiased flood estimates for smaller to medium ARIs (5 and 10 years); however, the model slightly overestimates the observed flood quantiles for higher ARIs (50 to 100 years) and slightly underestimates them for 2 years ARI. Some outliers can be seen in the case of higher ARIs (e.g. 100 years), which may need to be examined more closely for data errors or issues regarding the hydrology of the catchment. If these are deemed to be genuine outliers, they should be removed from the model; however, this has not been done in this thesis.

[Boxplot: vertical axis Ratio (Qpred/Qobs), -4 to 4; horizontal axis ARI (years): Q2, Q5, Q10, Q20, Q50, Q100]

Figure 8.9 Boxplot of Qpred/Qobs ratio values for GEP based RFFA model

8.2.4 CANFIS

Figures 8.10, 8.11 and 8.12 show the validation results for the CANFIS based RFFA model. Figure 8.10 shows the plot of predicted flood quantiles by the CANFIS based RFFA model and the observed flood quantiles for 20 years ARI. Figure 8.10 shows a greater scatter than Figure 8.1 (which represents the ANN based RFFA model) for flood events smaller than about 100 m3/s (Qobs); in particular, there is an overestimation of the flood quantiles by the CANFIS based RFFA model for the test catchments with Qobs values smaller than 100 m3/s. Overall, the scatter around the 45-degree line in Figure 8.10 is deemed reasonable for most of the test catchments with Qobs values greater than 100 m3/s. The plots of predicted and observed flood quantiles for other ARIs can be seen in Appendix B (Figures B.36 to B.40). The result for 2 years ARI is quite poor, as can be seen in Figure B.36, with significant overestimation by the CANFIS based RFFA model. The results for ARIs of 5, 10 and 20 years are very similar. Results for ARIs of 50 and 100 years (Figures B.39 and B.40, respectively) exhibit some overestimation by the CANFIS based RFFA model for smaller to medium discharges. For ARI of 50 years (Figure B.39), there is noticeable scatter at smaller discharges.

Figure 8.10 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q20

Figure 8.11 shows the boxplot of RE (%) values for the CANFIS based RFFA model. The median RE values (represented by the black line within the boxes) match the 0 – 0 line very well for ARIs of 5 and 50 years and reasonably well for ARI of 20 years. For ARIs of 2 and 100 years, a noticeable overestimation is shown by the CANFIS based RFFA model. In terms of the RE band (represented by the spread of the box), ARIs of 5, 10 and 20 years show the lowest spread, followed by ARIs of 50, 100 and 2 years. The RE band for 100 years ARI is about double that for ARIs of 5 and 10 years, and the RE band for 2 years ARI is about four times that for ARIs of 5 and 10 years. These results show that, in terms of RE, the best result overall is achieved for 10 years ARI for the CANFIS based RFFA model.

Figure 8.12 presents the boxplot of the Qpred/Qobs ratio values of the selected 90 test catchments for the CANFIS based RFFA model for different ARIs. The median Qpred/Qobs ratio values (represented by the thick black lines within the boxes) are located close to the 1 – 1 line (the horizontal line in Figure 8.12), in particular for ARIs of 2, 5, 10 and 20 years (the best agreement is for ARI of 10 years). However, for ARIs of 50 and 100 years, the median Qpred/Qobs ratio value is located a short distance above the 1 – 1 line. These results indicate a noticeable overall overestimation of the predicted flood quantiles by the CANFIS based RFFA model for 50 years and 100 years ARI. In terms of the spread of the Qpred/Qobs ratio values, ARIs of 2 and 5 years exhibit the lowest spread, followed by ARIs of 20, 10, 100 and 50 years. Furthermore, the spreads of the Qpred/Qobs ratio values for 50 and 100 years are very similar and are remarkably larger than those for 2, 5 and 20 years.

Figure 8.11 Boxplot of relative error (RE) values for CANFIS based RFFA model (vertical axis: RE (%); horizontal axis: ARI (years))

8.3 Comparison of RFFA models based on validation data set

To select the best performing RFFA model, it is important to compare the results of these models for independent test catchments. The following sub-sections compare the four artificial intelligence based RFFA models based on Qpred/Qobs ratio, RE and CE values.

8.3.1 Median Qpred/Qobs ratio

Table 8.1 summarises the median Qpred/Qobs ratio values for the four different RFFA models. For the ANN, the median Qpred/Qobs ratio values range from 0.99 to 1.14. For Q5 the median Qpred/Qobs ratio value is 0.99, which indicates a small under-estimation by the ANN based model. Also, for this model, Q50 and Q100 show over-estimation with median Qpred/Qobs ratio values of 1.14 and 1.10, respectively. The best result is obtained for Q10 with a median Qpred/Qobs ratio value of 1.02. In summary, the ANN based model shows a good median Qpred/Qobs ratio value over all the ARIs (1.06) (at rank 3 among all the four models) and also consistent median Qpred/Qobs ratio values for ARIs of 2, 5, 10 and 20 years.

Figure 8.12 Boxplot of Qpred/Qobs ratio values for CANFIS based RFFA model (vertical axis: Ratio (Qpred/Qobs); horizontal axis: ARI (years))

In case of the GAANN based RFFA model, the median Qpred/Qobs ratio values range from 0.89 (Q5) to 1.17 (Q100); all the median Qpred/Qobs ratio values seem to be within acceptable range except for Q5, which is 0.89 indicating an underestimation by 11%. Similar to the ANN based model, the best GAANN model is found for Q10 in terms of median Qpred/Qobs ratio value. The GAANN based RFFA model provides an overall median Qpred/Qobs ratio value of 1, which is at rank 1 among the four models, but for the individual ARIs, lesser consistency can be seen compared with the ANN based model.

In case of the GEP based RFFA model, all the flood quantiles seem to be performing well in terms of median Qpred/Qobs ratio value except for Q20 and Q5, where 11% underestimation and 10% overestimation, respectively, can be seen. The best median Qpred/Qobs ratio value (1.02) is achieved for Q100 for the GEP based model, followed by Q10 (1.04) and Q50 (1.05). These results show that the GEP based model provides better results for higher ARIs (in particular for 100 years ARI) as compared with the three other models. The overall median ratio value for all ARIs is 1.03, which is at rank 2 among the four models.

For the CANFIS based RFFA model, the best results in terms of median Qpred/Qobs ratio value are obtained in case of Q5 (0.95) followed by Q50 (0.93). This model performs poorly for very small and very high ARIs; however, for medium ARIs the performance of this model is quite good. Overall, the CANFIS based model provides median ratio values in the range of 0.79 to 2.81, which is the highest degree of fluctuation among the four models. The overall median Qpred/Qobs ratio value for the CANFIS model is 1.33, which is at rank 4 among the four models.

Figure 8.13 plots the median Qpred/Qobs ratio values of all the four artificial intelligence based RFFA models. It can be seen that in terms of consistency, the GEP based model is at rank 1 and ANN based model is at rank 2. The CANFIS based model is the poorest where the degree of fluctuation among the ARIs is the highest.

Table 8.1 Median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models

Median ratio (Qpred/Qobs)

ARI (years) ANN GAANN GEP CANFIS

2 1.04 1.08 1.07 2.81

5 0.99 0.89 1.10 0.95

10 1.02 0.98 1.04 0.79

20 1.04 0.93 0.89 1.18

50 1.14 0.95 1.05 0.93

100 1.10 1.17 1.02 1.31

Overall 1.06 1.00 1.03 1.33
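The Overall row of Table 8.1 appears to be consistent with the arithmetic mean of the six per-ARI medians, and the ranks quoted above follow from the distance of that value from the ideal ratio of 1. A minimal Python sketch of this check is given below. The averaging rule is inferred from the tabulated numbers rather than stated in the text, and the names `median_ratio`, `overall` and `ranking` are illustrative only.

```python
from statistics import mean

# Median Qpred/Qobs ratios from Table 8.1, ordered by ARI = 2, 5, 10, 20, 50, 100 years.
median_ratio = {
    "ANN":    [1.04, 0.99, 1.02, 1.04, 1.14, 1.10],
    "GAANN":  [1.08, 0.89, 0.98, 0.93, 0.95, 1.17],
    "GEP":    [1.07, 1.10, 1.04, 0.89, 1.05, 1.02],
    "CANFIS": [2.81, 0.95, 0.79, 1.18, 0.93, 1.31],
}

# Assumption: the "Overall" value is the arithmetic mean over the six ARIs.
overall = {model: mean(values) for model, values in median_ratio.items()}

# Rank the models by how far the overall ratio departs from the ideal value of 1.
ranking = sorted(overall, key=lambda model: abs(overall[model] - 1.0))

for rank, model in enumerate(ranking, start=1):
    print(f"rank {rank}: {model} (overall ratio about {overall[model]:.3f})")
```

Under this assumption the ordering is GAANN, GEP, ANN, CANFIS, which agrees with the ranks reported in this sub-section.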

8.3.2 Median RE (%)

Table 8.2 summarises the median RE (%) values of the ANN, GAANN, GEP and CANFIS based RFFA models. The median RE values are calculated based on the absolute RE values of the individual test catchments. In the case of the ANN, median RE values range from 35.62% to 44.63%. The smallest and highest median RE values are found for ARIs of 20 and 10 years, respectively. The ANN model shows the smallest median RE values for ARIs of 2 and 5 years among the four models. The ANN based RFFA model shows an overall median RE value of 40.3%, which places it at rank 1 among the four RFFA models.

For the GAANN based RFFA model, higher median RE values can be observed for smaller ARIs (2 to 10 years), whereas a better performance can be seen in case of higher ARIs. The best value is obtained in case of Q100 (47.08%), whereas the highest median RE (%) value is obtained for Q10 (72.56%). The overall median RE value over all the 6 ARIs for the GAANN based model is found to be 58.4% (Table 8.2), which places it at rank 3 among the four RFFA models.

Figure 8.13 Plot of median Qpred/Qobs ratio values for the four artificial intelligence based RFFA models (vertical axis: Median (Qpred/Qobs) ratio; horizontal axis: ARI (years); series: ANN, GAANN, GEP, CANFIS)

In case of the GEP model, median RE values range from 37.87% (Q50) to 69.38% (Q2). The GEP based RFFA model seems to be performing well for higher ARIs. For 2 years ARI, it performs very poorly. The GEP model shows the smallest median RE values for ARIs of 10 and 50 years among the four models. The overall median RE value over all the 6 ARIs for the GEP based model is found to be 44.47% (Table 8.2), which places it at rank 2 among the four RFFA models.

The CANFIS based RFFA model shows median RE values in the range of 34.48% (Q20) to 180.77% (Q2). The CANFIS model shows the smallest median RE values for ARIs of 20 and 100 years among the four models. The overall median RE value over all the 6 ARIs for the CANFIS based model is found to be 69.66% (Table 8.2), which places it at rank 4 among the four RFFA models.

Figure 8.14 plots the median RE values of all the four artificial intelligence based RFFA models. It shows that the ANN based model has the smallest degree of fluctuation in the median RE values over all the six ARIs. The GAANN and GEP models show a similar degree of fluctuation and the CANFIS shows the highest degree of fluctuation.

Table 8.2 Median RE (%) values for the four artificial intelligence based RFFA models

Median RE (%)

ARI (years) ANN GAANN GEP CANFIS

2 37.56 65.13 69.38 180.77

5 40.39 61.48 44.95 48.92

10 44.63 72.56 42.08 51.97

20 35.62 48.19 47.61 34.48

50 39.09 55.93 37.87 59.20

100 44.53 47.08 44.47 42.63

Overall 40.30 58.40 44.47 69.66
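The two statistics compared so far can be illustrated with a short Python sketch. It assumes the usual definition RE = (Qpred - Qobs)/Qobs x 100, which is not restated in this section, and it takes the median over absolute RE values as described in Section 8.3.2. The flood quantile pairs are hypothetical and are not taken from the thesis data set.

```python
from statistics import median

def ratio(q_pred, q_obs):
    """Qpred/Qobs ratio; values above 1 indicate overestimation."""
    return q_pred / q_obs

def relative_error(q_pred, q_obs):
    """Signed RE (%), assuming RE = (Qpred - Qobs) / Qobs * 100."""
    return (q_pred - q_obs) / q_obs * 100.0

# Hypothetical (Qobs, Qpred) pairs in m^3/sec for five test catchments.
pairs = [(100.0, 120.0), (250.0, 200.0), (40.0, 70.0), (500.0, 450.0), (80.0, 80.0)]

ratios = [ratio(q_pred, q_obs) for q_obs, q_pred in pairs]
signed_re = [relative_error(q_pred, q_obs) for q_obs, q_pred in pairs]

# Median ratio uses the raw ratios; median RE uses absolute RE values.
median_ratio_value = median(ratios)
median_abs_re = median(abs(re) for re in signed_re)

print(f"median Qpred/Qobs ratio: {median_ratio_value:.2f}")
print(f"median absolute RE (%): {median_abs_re:.1f}")
```

Taking the median of absolute values prevents overestimation and underestimation from cancelling each other, which is why a model can show a median ratio close to 1 and still have a large median RE.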

8.3.3 Median CE

Table 8.3 summarises the median CE values of the ANN, GAANN, GEP and CANFIS based RFFA models. In case of the ANN based RFFA model, the median CE values range from 0.40 (Q100) to 0.69 (Q2 and Q20). Overall, the ANN based model is consistent across the ARIs except for Q100. The best results are obtained in the cases of Q2, Q20 and Q50. The ANN model shows the highest median CE value for 50 years ARI among all the four models. In terms of overall median CE value, the ANN is placed at rank 2 (jointly with the GEP model).

The GAANN based model shows median CE values in the range of 0.38 (Q50) to 0.75 (Q5). The GAANN model shows the highest median CE values for ARI of 2 and 5 years. In terms of overall median CE value, the GAANN is placed at rank 1 among the four models.

The GEP based model shows median CE values in the range of 0.49 (Q2) to 0.67 (Q5, Q20, Q100). The GEP model shows the highest median CE value for ARI of 100 years. In terms of overall median CE value, the GEP model is at rank 2 (jointly with the ANN based model).

Figure 8.14 Plot of median RE (%) values for the four artificial intelligence based RFFA models

The CANFIS based model provides poor results for Q2, with a negative median CE value. For the other ARIs, the median CE values are in the range of 0.54 to 0.72. The CANFIS model shows the highest median CE values for ARIs of 10 and 20 years. In terms of overall median CE value, the CANFIS model is at rank 4 among the four models.

Figure 8.15 plots the median CE values of all the four artificial intelligence based RFFA models. This plot shows that the lowest degree of fluctuation in the median CE values is demonstrated by the GEP model followed by the ANN based model and the highest degree of fluctuation is provided by the CANFIS model.

Table 8.3 Median CE values of the four artificial intelligence based RFFA models

Median CE values

ARI (years) ANN GAANN GEP CANFIS

2 0.69 0.72 0.49 -0.09

5 0.59 0.75 0.67 0.54

10 0.63 0.63 0.56 0.67

20 0.69 0.71 0.67 0.72

50 0.68 0.38 0.63 0.55

100 0.40 0.65 0.67 0.59

Overall 0.61 0.64 0.61 0.50

Figure 8.15 Plot of median CE values for the four artificial intelligence based RFFA models

8.3.5 Comparison of RFFA models based on RE (%) ranges

When comparing different RFFA models, it is important to observe how many test catchments fall within specified ranges of RE. For this purpose, RE (%) values (considering its sign), are grouped into four classes as shown in Table 8.4. The selected arbitrary ranges of RE (%) are (-10 to 10), (-20 to 20), (-50 to 50) and (-100 to 100).

In the range of -10 to 10 of RE (%), the ANN is placed at rank 1 with 22% of the 90 test catchments falling in this range, followed by CANFIS (14%) and GAANN (12%). In the case of GEP, however, only 9 test catchments (10%) fall in this range.

In case of -20 to 20 of RE (%), a total of 32 (35%) test catchments fall under this category for the ANN based model. Some 27% (25) of the test catchments fall in the range of -20 to 20 in case of CANFIS based RFFA model, which is higher than the GEP (25%) and GAANN (22%) based models. In this case, the ANN based model is placed again at rank 1 and the GAANN at rank 4 among the four models.

In the range of -50 to 50 of RE (%), the CANFIS is placed at rank 1 with 61% of the test catchments falling in this range, very closely followed by the ANN model where 60% of the test catchments fall in this range. The GEP is placed at rank 3 among the four models, with 55% of the test catchments falling in this range, followed by the GAANN at rank 4 with 47 test catchments (52% of the total) falling in this category.

In the range of -100 to 100 of RE (%), the ANN based RFFA model is again placed at rank 1 with 92% of the catchments falling in this range. This is followed by the CANFIS (77%), GAANN (76%) and GEP (70%, i.e. 63 test catchments) based RFFA models.

Overall, the ANN based model outperforms the three other models in terms of the distributions of RE (%) values, which is followed by the CANFIS based model.

Table 8.4 Grouping of 90 test catchments based on RE (%) ranges for the four artificial intelligence based RFFA models

Models (-10 to 10) (-20 to 20) (-50 to 50) (-100 to 100)

ANN 20 32 54 83

% of test catchments 22 35 60 92

GAANN 11 20 47 69

% of test catchments 12 22 52 76

GEP 9 22 49 63

% of test catchments 10 25 55 70

CANFIS 13 25 55 70

% of test catchments 14 27 61 77
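The grouping used in Table 8.4 can be sketched in Python as follows. The bands are nested, so a catchment counted in (-10 to 10) is also counted in the wider bands. The RE values below are hypothetical, the function names are illustrative, and treating the band limits as inclusive is an assumption.

```python
def count_within(re_values, limit):
    """Number of signed RE (%) values lying in [-limit, +limit] (inclusive limits assumed)."""
    return sum(1 for re in re_values if -limit <= re <= limit)

def group_by_ranges(re_values, limits=(10, 20, 50, 100)):
    """Return {limit: (count, percent of catchments)} for each symmetric RE band."""
    n = len(re_values)
    grouped = {}
    for limit in limits:
        count = count_within(re_values, limit)
        grouped[limit] = (count, 100.0 * count / n)
    return grouped

# Hypothetical signed RE (%) values for ten test catchments.
re_values = [-5.0, 8.0, 15.0, -18.0, 35.0, -45.0, 60.0, -90.0, 120.0, 250.0]

groups = group_by_ranges(re_values)
for limit, (count, percent) in groups.items():
    print(f"({-limit} to {limit}): {count} catchments, {percent:.0f}% of total")
```

Because the sign of RE is retained here, a symmetric band treats overestimation and underestimation of the same magnitude alike.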

8.3.6 Selection of the best performing artificial intelligence based RFFA model

The four artificial intelligence based RFFA models have been compared in Section 8.2 and Sections 8.3.1 to 8.3.5 based on the results from application of these models to 90 test catchments. It has been found that none of the four models performs the best in all the assessment criteria, which makes it difficult to select the best model. The performances of the four models are therefore assessed in a heuristic manner, with each model ranked on the seven different criteria shown in Table 8.5. Four different ranks are used with a relative score ranging from 4 to 1. If a model is ranked 1 for a criterion, it scores 4. For ranks 2, 3 and 4, scores of 3, 2 and 1, respectively, are used. Table 8.5 shows that the ANN based RFFA model has the highest score of 25, followed by the GAANN with a score of 19. The GEP receives a score of 17, while the CANFIS receives only 10, making it the least favourable model. The ANN based model is placed at rank 1 in 5 out of 7 criteria. Hence, it is reasonable to conclude that the ANN based RFFA model is the best performing artificial intelligence based model for eastern Australia.

Table 8.5 Ranking of the four artificial intelligence based RFFA models for eastern Australia

Criteria Rank 1 Rank 2 Rank 3 Rank 4

Scatter plot of Qobs Vs Qpred ANN GEP GAANN CANFIS

Box plot of RE ANN GEP GAANN CANFIS

Box plot of Qpred/Qobs ANN GAANN CANFIS GEP

Median Qpred/Qobs GAANN GEP ANN CANFIS

Median RE ANN GEP GAANN CANFIS

Median CE GAANN ANN/GEP # CANFIS

RE (%) ranges ANN CANFIS GAANN GEP

Overall Scoring: ANN: 25, GAANN: 19, GEP: 17, CANFIS: 10
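The scoring scheme described in Section 8.3.6 can be reproduced from the rank orderings in Table 8.5 with the following Python sketch. The dictionary and function names are illustrative. For the Median CE criterion, the sketch assumes that the two models tied at rank 2 (ANN and GEP) both receive the rank 2 score and that rank 3 is left unassigned; this treatment is inferred from the reported totals.

```python
# Rank orderings from Table 8.5. Each inner list holds the models occupying that rank.
table_8_5 = {
    "Scatter plot of Qobs vs Qpred": [["ANN"], ["GEP"], ["GAANN"], ["CANFIS"]],
    "Box plot of RE":                [["ANN"], ["GEP"], ["GAANN"], ["CANFIS"]],
    "Box plot of Qpred/Qobs":        [["ANN"], ["GAANN"], ["CANFIS"], ["GEP"]],
    "Median Qpred/Qobs":             [["GAANN"], ["GEP"], ["ANN"], ["CANFIS"]],
    "Median RE":                     [["ANN"], ["GEP"], ["GAANN"], ["CANFIS"]],
    "Median CE":                     [["GAANN"], ["ANN", "GEP"], [], ["CANFIS"]],
    "RE (%) ranges":                 [["ANN"], ["CANFIS"], ["GAANN"], ["GEP"]],
}

# Rank 1 scores 4; ranks 2, 3 and 4 score 3, 2 and 1, respectively.
RANK_TO_SCORE = {1: 4, 2: 3, 3: 2, 4: 1}

def total_scores(rankings):
    """Sum the rank-based scores of each model over all criteria."""
    scores = {}
    for ranked in rankings.values():
        for rank, models in enumerate(ranked, start=1):
            for model in models:
                scores[model] = scores.get(model, 0) + RANK_TO_SCORE[rank]
    return scores

scores = total_scores(table_8_5)
rank_one_count = sum(1 for ranked in table_8_5.values() if "ANN" in ranked[0])

for model, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{model}: {score}")
print(f"ANN is at rank 1 in {rank_one_count} of {len(table_8_5)} criteria")
```

With these inputs the totals are ANN 25, GAANN 19, GEP 17 and CANFIS 10, matching the overall scoring in Table 8.5.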

8.4 Performance of the finally selected artificial intelligence based RFFA model

This section further evaluates the performance of the best performing artificial intelligence based RFFA model, which is the ANN based RFFA model. First, the spatial distributions of the relative error (RE) values of the ANN based RFFA model for the 90 test catchments are evaluated. Second, the relation between the RE and catchment area is investigated.

8.4.1 Spatial distribution of RE (%) of the ANN based RFFA model

Figure 8.16 shows the spatial distribution of RE values across NSW. Most of the test catchments fall in the eastern part of NSW since not many catchments from western NSW qualified for the study data set. Overall, the catchments in north-eastern NSW exhibit smaller RE values. Most importantly, Figure 8.16 does not show any notable spatial pattern, and in general test catchments with higher RE values are surrounded by catchments with relatively small RE values.

Figure 8.16 Spatial distribution of RE of ANN based model across NSW

Figure 8.17 shows the distribution of RE values across the state of Victoria. Similar to NSW, there is no noticeable spatial trend of the RE values across the state. Figures 8.18, 8.19 and 8.20 show the spatial distribution of RE values across QLD. Figure 8.18 shows the RE values across the northern and northeastern parts of QLD. Generally, the catchments in this part of QLD show relatively small RE values. Figure 8.19 shows the catchments in the southern and southeastern parts of QLD. Most of the test catchments fall near the coastal area of QLD. The catchments close to the NSW and QLD border exhibit better results, with quite small RE values. Figure 8.20 shows a full view of the spatial distribution of RE values across QLD, which shows that there is no noticeable spatial trend in the RE values for QLD.

Figure 8.21 shows the spatial distribution of RE values across the state of TAS. Most of the test catchments in TAS fall in the middle of the state and away from its coastal regions. No spatial trend is observed in the RE values over TAS.

It should also be noted that there are some outlier catchments where RE values are quite high; these catchments may need further investigation, which however is not undertaken in this thesis.

Figure 8.17 Spatial distribution of RE of ANN based model across VIC


Figure 8.18 Spatial distribution of RE of ANN based model across North QLD

Figure 8.19 Spatial distribution of RE of ANN based model across Southeast QLD


Figure 8.20 Spatial distribution of RE of ANN based model across QLD

Figure 8.21 Spatial distribution of RE of ANN based model across TAS

8.4.2 Catchment area vs RE

Figure 8.22 shows a plot of RE values against the area of the test catchments. Catchments with areas in the range of 1 to 200 km² fall within the minimum RE group. In the range of 200 to 400 km², most catchments show smaller RE values except two outliers where RE values are greater than 500%. Of importance, there is no statistically significant relationship between RE and catchment area, as the coefficient of determination (R²) of the fitted regression line in Figure 8.22 is only 0.06 (6%).
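For a straight line fitted by least squares, the coefficient of determination equals the squared correlation between the two variables. The Python sketch below shows how such a value could be computed for RE against catchment area. The area and RE values are hypothetical and do not reproduce the data in Figure 8.22, and the function name `r_squared` is illustrative.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # For simple linear regression, R^2 is the squared Pearson correlation.
    return (sxy * sxy) / (sxx * syy)

# Hypothetical catchment areas (km^2) and absolute RE (%) values.
areas = [10.0, 50.0, 120.0, 300.0, 450.0, 800.0]
re_values = [40.0, 25.0, 60.0, 30.0, 55.0, 35.0]

r2 = r_squared(areas, re_values)
print(f"R^2 between catchment area and RE: {r2:.3f}")
```

A value close to zero, as reported for Figure 8.22, means that catchment area explains very little of the variation in RE.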

8.5 Comparison with QRT

Finally, the ANN based RFFA model is compared with the QRT based models. Here, the same data set is used for building and testing the ANN and QRT models. Based on the median Qpred/Qobs ratio values shown in Table 8.6, the ANN based RFFA model shows median Qpred/Qobs ratio values closer to 1.00 than the QRT model for all the 6 ARIs. Similarly, as shown in Table 8.7, the ANN based RFFA model shows smaller median RE values than the QRT model for all the ARIs. Furthermore, in Table 8.8, the ANN based RFFA model outperforms the QRT models with respect to CE values. These results demonstrate that the ANN based RFFA model outperforms the QRT model considering all the three evaluation statistics. It should be noted here that the median RE values for the best ANN based RFFA model developed here range from 35% to 44% (with a few cases where RE > 100%), which is typical of Australian regional flood estimation methods (e.g., see Haddad et al., 2011; Haddad and Rahman, 2012). Since RE is independent of catchment area, the model can be applied to smaller as well as larger catchments up to 1000 km².

Figure 8.22 Plot between catchment area and RE (%) values for ANN based RFFA model for 90 test catchments


Table 8.6 Median Qpred/Qobs ratio values for seven ANN based candidate regions and QRT

Median ratio (Qpred/Qobs)

Flood quantile ANN QRT

Q2 1.04 1.15

Q5 0.99 1.06

Q10 1.02 1.35

Q20 1.04 1.13

Q50 1.14 1.19

Q100 1.10 1.28

Table 8.7 Median relative error values (%) for seven ANN based candidate regions and QRT

Median RE (%)

Flood quantile ANN QRT

Q2 37.56 65.38

Q5 40.39 45.35

Q10 44.63 57.62

Q20 35.62 42.64

Q50 39.09 48.71

Q100 44.53 51.72

Table 8.8 Coefficient of efficiency (CE) values for seven ANN based candidate regions and QRT

CE

Flood quantile ANN QRT

Q2 0.73 0.35

Q5 0.61 0.37

Q10 0.63 0.30

Q20 0.71 0.37

Q50 0.68 -8.42

Q100 0.52 0.38

8.6 Summary

In this chapter, four artificial intelligence based RFFA models (ANN, GAANN, GEP and CANFIS) have been validated based on 90 independent test catchments. It has been found that no model performs the best for all the six ARIs and for all the seven criteria (Table 8.5). Overall, the ANN based RFFA model is the best performing model among the four artificial intelligence based RFFA models. The ANN based RFFA model is also found to outperform the ordinary least squares based quantile regression technique.

The median relative error values for the finally selected ANN based RFFA model range from 35% to 44%, which is slightly higher than the GLS regression based region-of-influence approach (parameter regression technique) reported by Haddad and Rahman (2012) (relative error ranges from 29% to 45%). However, the relative error values of both techniques are within the expected error/variability of RFFA models, which depends on at-site flood frequency analysis estimates (which have a high degree of sampling variability).

The ANN based RFFA model shows that there is no noticeable spatial trend in the relative error values across four states in eastern Australia. Furthermore, the relative error values are independent of catchment area.

There are a few catchments where the ANN based RFFA model shows relatively high relative error values (similar to the results by Haddad and Rahman, 2012). These catchments may need further investigation, which however is not undertaken in this thesis.

To enhance the accuracy of regional flood estimation methods in eastern Australia, a larger data set with longer streamflow record lengths would be needed, as Australia is characterised by a highly variable hydrology/flood regime. It is expected that the availability of such a larger data set in future would enhance the accuracy of artificial intelligence based RFFA models in eastern Australia.

CHAPTER 9

SUMMARY, CONCLUSIONS AND RECOMMENDATIONS

9.1 General

This thesis has focused on the development and testing of non-linear artificial intelligence based regional flood frequency analysis (RFFA) models. For this purpose, a database of 452 small to medium sized catchments from eastern Australia has been used. Four different artificial intelligence based RFFA models have been considered in this research. These non-linear RFFA models have also been compared with the linear ordinary least squares based regression model. This chapter presents a summary of the research undertaken in this thesis, conclusions and recommendations for further study.

9.2 Summary of the research undertaken in this thesis

Selection of study catchments and data preparation: This research selects eastern Australia as the study area since it has the highest density of stream gauging stations in Australia. A total of 452 catchments were selected from the study area, consisting of 96 catchments from New South Wales and Australian Capital Territory, 131 catchments from Victoria, 172 catchments from Queensland and 53 catchments from Tasmania. The geographical locations of the selected 452 catchments can be seen in Figure 4.19. These catchments are not affected by major regulation and land use changes. These are small to medium-sized catchments, with catchment areas in the range of 1.3 to 1900 km² (mean: 329.4 km²). The annual maximum flood series of the selected stations were prepared by adopting standard procedures (e.g. by filling gaps in the data and by checking for rating curve error and trends). The annual maximum flood record lengths of the selected stations range from 25 to 75 years (mean: 33 years). For each of the selected stations, at-site flood frequency analysis was carried out using FLIKE software (Kuczera, 1999). The detected low flows were censored using the in-built facility in FLIKE. An LP3 distribution with the Bayesian parameter estimation procedure was adopted to estimate flood quantiles for six average recurrence intervals (i.e. 2, 5, 10, 20, 50 and 100 years). These flood quantiles were used as dependent/target variables in the development of models using linear and non-linear methods. For each of the selected catchments, data for five catchment characteristics were abstracted, which are catchment area, mean annual areal evapo-transpiration, mean annual rainfall, main stream slope and design rainfall intensity. The summary of these catchment characteristics data can be seen in Table 4.1.

Selection of predictor variables: From the selected five candidate catchment characteristics variables, eight different combinations were formed. Each of these combinations contained catchment area and design rainfall intensity together with a combination of the remaining three predictor variables (mean annual areal evapo-transpiration, mean annual rainfall and main stream slope), as can be seen in Table 5.2. Two artificial intelligence based RFFA techniques (ANN and GEP) were then used to develop prediction equations. From the selected 452 catchments, 90 catchments were selected randomly as test catchments and the remaining 362 catchments were used to develop models. Models were assessed based on the ratio between predicted and observed flood quantiles, percent relative error and coefficient of efficiency. Based on the independent testing, it was found that the ANN and GEP based RFFA models with only two predictor variables (catchment area and design rainfall intensity) outperformed other models with a greater number of predictor variables. Such a model would be easier to apply in practice as these two predictor variables can be obtained relatively easily from published maps and government websites. In the subsequent analyses, these two predictor variables (catchment area and design rainfall intensity) were used.
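Keeping two predictor variables fixed and adding every possible subset of the remaining three gives 2³ = 8 candidate sets, which agrees with the eight combinations mentioned above. A minimal Python sketch of this enumeration follows. The constant and function names are illustrative, and whether the enumerated sets correspond row for row to Table 5.2 is an assumption.

```python
from itertools import combinations

# Predictors present in every combination.
FIXED = ("catchment area", "design rainfall intensity")
# Predictors that may or may not be included.
OPTIONAL = (
    "mean annual areal evapo-transpiration",
    "mean annual rainfall",
    "main stream slope",
)

def predictor_combinations(fixed=FIXED, optional=OPTIONAL):
    """Return every predictor set made of the fixed variables plus a subset of the optional ones."""
    combos = []
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            combos.append(fixed + extra)
    return combos

combos = predictor_combinations()
for number, combo in enumerate(combos, start=1):
    print(number, ", ".join(combo))
```

The first set in this enumeration contains only catchment area and design rainfall intensity, the pair of predictor variables that was finally adopted.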

Formation of regions: From the selected 452 catchments covering four eastern Australian states, different regions/groupings were formed. In the first step, regions were formed on the basis of state/geographical and climatic boundaries. Here, seven different regions were considered, as can be seen in Table 6.1. In the second step, the regions were formed in the catchment characteristics data space based on cluster analysis and principal component analysis. Here, two regions were formed based on cluster analysis and four regions were formed based on principal component analysis. It was found that K-Means cluster analysis generated the best performing groupings in the catchment characteristics data space. When compared with the geographical regions, some state-based regions performed more poorly than the K-Means cluster groupings. Overall, the best ANN-based RFFA model was achieved when the data from all 452 catchments were combined to form a single region.

Development of artificial intelligence based RFFA models: In the development/training of the artificial intelligence based RFFA models, the selected 452 catchments were divided randomly into two parts: (i) a training data set consisting of 362 catchments; and (ii) a validation data set consisting of 90 catchments. The artificial intelligence based RFFA models were evaluated based on four criteria: median Qpred/Qobs ratio, plot of Qobs and Qpred, median relative error and coefficient of efficiency (Tables 7.9 and 7.10). It was found that no model performed the best for all the six ARIs over all the adopted criteria. Overall, the ANN based RFFA model outperformed the three other models in the training/calibration.
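The random division of the 452 catchments into training and validation sets can be sketched in Python as follows. The catchment identifiers, the function name and the seed are hypothetical; the sketch does not reproduce the particular random selection made in this thesis.

```python
import random

def split_catchments(catchment_ids, n_test=90, seed=0):
    """Randomly hold out n_test catchments for validation; the rest form the training set."""
    rng = random.Random(seed)  # fixed seed so that the split is repeatable
    test = set(rng.sample(catchment_ids, n_test))
    train = [cid for cid in catchment_ids if cid not in test]
    return train, sorted(test)

# Hypothetical identifiers standing in for the 452 study catchments.
catchment_ids = list(range(1, 453))

train_ids, test_ids = split_catchments(catchment_ids)
print(len(train_ids), "training catchments;", len(test_ids), "validation catchments")
```

Fixing the seed is a design choice that keeps the same 90 catchments out of training for every model, so that all four models are validated on an identical independent set.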

Validation of the artificial intelligence based RFFA models: The four artificial intelligence based RFFA models (ANN, GAANN, GEP and CANFIS) were validated using the 90 independent test catchments. In the first step, the four artificial intelligence based RFFA models were compared with each other. It was found that no model performed the best for all the six ARIs based on all the seven criteria (Table 8.5). Overall, the ANN based RFFA model was the best performing model among the four artificial intelligence based RFFA models. In the second step, the ANN based RFFA model was compared with the ordinary least squares based quantile regression technique. It was found that the ANN based RFFA model outperformed the quantile regression technique.

The median relative error values for the finally selected ANN based RFFA model were found to be in the range of 35% to 44%, which is comparable to the generalised least squares regression and region-of-influence approach (parameter regression technique), which reported relative error values in the range of 29% to 45% for eastern Australia (Haddad and Rahman, 2012). The ANN based RFFA model exhibited no noticeable spatial trend in the relative error values on the map of the selected study area. Furthermore, the relative error values were found to be independent of catchment area. There are a few catchments where the ANN based RFFA model (and the other three artificial intelligence based RFFA models and the quantile regression technique) showed quite high relative error values (similar to the results by Haddad and Rahman, 2012). These catchments need further investigation, which however was not undertaken in this thesis.

9.3 Conclusions

The following conclusions can be made from this research:

• It has been found that non-linear artificial intelligence based RFFA techniques can be applied successfully to eastern Australian catchments. Among the four artificial intelligence based models, the ANN based RFFA model has been found to be the best performing model, followed by the GAANN based RFFA model. The ANN based RFFA model has been found to outperform the ordinary least squares based RFFA model.

• It has been shown that in the training of the four artificial intelligence based RFFA models, no model performs the best for all the six ARIs over all the adopted criteria. Overall, the ANN based RFFA model is found to outperform the three other models in the training/calibration.

• Based on independent validation, the median relative error values for the ANN based RFFA model are observed to be in the range of 35% to 44% for eastern Australia, which is comparable to the generalised least squares regression and region-of-influence based RFFA approach.

• It has been demonstrated that an RFFA model with two predictor variables, i.e. catchment area and design rainfall intensity, provides more accurate flood quantile estimates than other models with a greater number of predictor variables. The finally selected ANN based RFFA model would be easier to apply in practice since data for these two predictor variables can be obtained relatively easily from published maps and government websites.

• It has been shown that when the data from all the eastern Australian states are combined to form one region, the resulting ANN based RFFA model performs better than other candidate regions such as regions based on state boundaries, geographical and climatic boundaries and the regions formed in the catchment characteristics data space.

• The ANN based RFFA model exhibits no noticeable spatial trend in the relative error values. Furthermore, the relative error values of the ANN based RFFA model are found to be independent of catchment area.

9.4 Recommendations for further research The ANN based RFFA model developed in this study is based on the catchments in the states of New South Wales, Victoria, Queensland and Tasmania. In future research, the ANN based RFFA model should be tested to other Australian states.

The ANN based RFFA model developed in this study is based on design rainfall data from Australian Rainfall and Runoff (ARR) 1987. The model should be calibrated and tested with the design rainfall data recently released by the Australian Bureau of Meteorology.

In future research, a detailed investigation should be made of the catchments where relative error values were found to be quite high for all the modelling techniques adopted in this research. In particular, the streamflow data of these catchments should be checked. It should also be checked whether these catchments have special features that make them significantly different from the other catchments in the data set.

To enhance the accuracy of the ANN based RFFA model, a larger data set consisting of a greater number of catchments and additional predictor variables (when these become available) should be used to develop and test the model.

In future research, leave-one-out validation and Monte Carlo cross validation techniques should be adopted to train and validate the ANN based RFFA model.
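The two validation schemes recommended above differ only in how the catchments are split into training and test sets. The following Python sketch generates the index splits for both schemes. The function names, the number of catchments, the number of repeats and the test fraction are hypothetical, and the fitting of the ANN itself is omitted.

```python
import random


def leave_one_out_splits(n_sites):
    """Yield (train, test) index lists, holding out one catchment at a time."""
    for i in range(n_sites):
        train = [j for j in range(n_sites) if j != i]
        yield train, [i]


def monte_carlo_cv_splits(n_sites, n_repeats, test_fraction, seed=0):
    """Yield (train, test) index lists from repeated random splits.

    Each repeat draws a fresh random test set without replacement, so a
    catchment may appear in the test set of several repeats or of none.
    """
    rng = random.Random(seed)
    n_test = max(1, round(n_sites * test_fraction))
    indices = list(range(n_sites))
    for _ in range(n_repeats):
        test = sorted(rng.sample(indices, n_test))
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test


# Hypothetical sizes, for illustration only.
loo = list(leave_one_out_splits(5))
mccv = list(monte_carlo_cv_splits(n_sites=10, n_repeats=3, test_fraction=0.2))
print(len(loo), len(mccv))
```

In each split, the model would be trained on the training catchments and its relative errors evaluated on the held-out catchments. Leave-one-out uses every catchment exactly once as a test site, whereas Monte Carlo cross validation trades that guarantee for control over the test-set size and the number of repeats.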

REFERENCES

ABC News (2011). Aerial shot of the flooded Queensland town of Ipswich. (accessed on 5th August 2013). Accessible at http://www.abc.net.au.

ABC News (2011). Aerial shot of the flooded New South Wales town of Wagga Wagga. (accessed on 5th August 2013). Accessible at http://www.abc.net.au.

Abrahart, R.J., See, L. and Kneale, P.E. (1999). Using pruning algorithms and genetic algorithms to optimize network architectures and forecasting inputs in a neural network rainfall-runoff model. Journal of Hydroinformatics, 1, 103-114.

Acreman, M.C. and Sinclair, C.D. (1986). Classification of drainage basins according to their physical characteristics and application for flood frequency analysis in Scotland, Journal of Hydrology, 84(3), 365-380.

Adams, C.A. (1987). Design flood estimation for ungauged rural catchments in Victoria. Road Construction Authority, Victoria, Draft Technical Bulletin.

Alkon, D.L. (1989). Memory storage and neural systems. Scientific American, 261(1), 42-50.

Alecsandru, C. and Ishak, S. (2004). Hybrid model-based and memory-based traffic prediction system. Transportation Research Record: Journal of the Transportation Research Board, 1879(1), 59-70.

Arthur, L.C. and Roger, L.W. (1995). LibGA for solving combinatorial optimization problems. In L. Chambers (ed.), Practical handbook of Genetic Algorithms, CRC Press, Inc.

ASCE Task Committee (2000). Artificial neural networks in hydrology-I: Preliminary concepts. Journal of Hydrologic Engineering, ASCE, 5(2), 115–123.

Aytek, A. (2009). Co-Active neuro-fuzzy inference system for evapotranspiration modelling. Soft Computing, 13(7), 691-700.

Azamathulla, H.M., Ghani, A.A., Leow, C.S., Chang, C.K. and Zakaria, N.A. (2011). Gene-expression programming for the development of a stage-discharge curve of the Pahang River. Water Resources Management, 25(11), 2901–2916.

Azamathulla, H.M. and Ghani, A.A. (2011). Genetic programming for longitudinal dispersion coefficients in streams. Water Resources Management, 25(6), 1537–1544.

Aziz, K., Rahman, A., Fang, G., Haddad, K. and Shrestha, S. (2010). Design flood estimation for ungauged catchments: Application of artificial neural networks for eastern Australia. World Environment and Water Resources Congress, ASCE, Providence, Rhode Island, USA.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2011). Artificial neural networks based regional flood estimation methods for eastern Australia: Identification of optimum regions. 33rd Hydrology and Water Resources Symposium, 26 June-1 July 2011, Brisbane, Australia.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2012). Comparison of artificial neural networks and adaptive neuro-fuzzy inference system for regional flood estimation in Australia, Hydrology and Water Resources Symposium, Engineers Australia, 19-22 Nov 2012, Sydney, Australia.

Aziz, K., Rahman, A., Fang, G. and Shrestha, S. (2013). Application of artificial neural networks in regional flood frequency analysis: A case study for Australia, Stochastic Environmental Research and Risk Assessment, 28(3), 541-554.

Baker, J.E. (1985). Adaptive selection method for genetic algorithms. Proceedings of an International Conference on Genetic Algorithms and their Applications, 100-111.

Baker, J.E. (1987). Reducing bias and inefficiency in the selection algorithm. In J.J. Grefenstette (ed.), Genetic algorithms and their applications, Proceedings of the second international conference on genetic algorithms, Erlbaum.

Bates, B.C., Rahman, A., Mein, R.G. and Weinmann, P.E. (1998). Climatic and physical factors that influence the homogeneity of regional floods in south-eastern Australia. Water Resources Research, 34(12), 3369-3382.

Bayazit, M. and Onoz, B. (2004). Sampling variances of regional flood quantiles affected by inter-site correlation, Journal of Hydrology, 291, 42-51.

Benson, M.A. (1962). Evolution of methods for evaluating the occurrence of floods. U.S. Geological Survey Water Supply Paper, 30, 1580-A.

Bureau of Infrastructure, Transport and Regional Economics (BITRE) (2008). Analysis of the Emergency Management Australia database. About Australia’s Regions, Department of Infrastructure, Transport, Regional Development and Local Government, Australian Government, Canberra, Table 30, 44 pp.

Bureau of Meteorology (2014). State of the Climate 2014. http://www.bom.gov.au/state-of-the-climate/.

Bishop, C.M. (1995). Neural networks for pattern recognition, Oxford University Press.

Blöschl, G. and Sivapalan, M. (1997). Process controls on regional flood frequency: Coefficient of variation and basin scale, Water Resources Research, 33, 2967-2980.

Blackie, J.R. and Eeles, C.W.O. (1985). Lumped catchment models. In: Hydrological Forecasting (ed. by M.G. Anderson and T.P. Burt), 311-346. Wiley.

Bogardi, I., Bardossy, A., Duckstein, L. and Pongrácz, R. (2003). Fuzzy logic in hydrology and water resources. In: Demicco, R.V., Klir, G.J. (Eds.), Fuzzy Logic in Geology. Elsevier Academic Press, 153–190.

Bowden, G.J., Dandy, G.C. and Maier, H.R. (2005). Input determination for neural network models in water resources applications. Part 1-background and methodology. Journal of Hydrology, 301, 75-92.

Burn, D.H. (1990). Evaluation of regional flood frequency analysis with a region of influence approach, Water Resources Research, 26(10), 2257-2265.

Burn, D.H. and Goel, N.K. (2000). The formation of groups for regional flood frequency analysis, Hydrological Sciences Journal, 45(1), 97-112.

Caballero, W.L. and Rahman, A. (2014). Development of regionalized joint probability approach to flood estimation: A case study for New South Wales, Australia, Hydrological Processes, 28, 4001-4010.

Castellarin, A., Burn, D.H. and Brath, A. (2001). Assessing the effectiveness of hydrological similarity measures for regional flood frequency analysis, Journal of Hydrology, 241(3-4), 270-285.

Cheng, C.T., Ou, C.P. and Chau, K.W. (2002). Combining a fuzzy optimal model with a genetic algorithm to solve multi-objective rainfall-runoff model calibration. Journal of Hydrology, 268, 72-86.

Chokmani, K., Ouarda, T.B.M.J., Hamilton, S., Ghedira, M.H. and Gingras, H. (2008). Comparison of ice-affected streamflow estimates computed using artificial neural networks and multiple regression techniques. Journal of Hydrology, 349, 383–396.

Caudill, M. (1987). Neural networks primer, Part I, AI Expert, December, 46-52.

Caudill, M. (1988). Neural networks primer, Part II, AI Expert, February, 55-61.

Caudill, M. (1989). Neural networks primer, Part VII, AI Expert, May, 51-58.

Chow, V.T., Maidment, D.R. and Mays, L.W. (1988). Applied Hydrology, McGraw-Hill, New York, NY.

Corradini, C. and Singh, V.P. (1985). Effect of spatial variability of effective rainfall on direct runoff by a geomorphologic approach. Journal of Hydrology, 81, 27-43.

Cunnane, J.R. (1987). Review of Statistical Methods for Flood Frequency Estimation. V. P. Singh (Ed.), in Hydrologic Frequency Modeling, D. Reidel, Dordrecht.

Cunnane, C. (1988). Methods and merits of regional flood frequency analysis, Journal of Hydrology, 100, 269-290.

Dalrymple, T. (1960). Flood frequency analyses. U.S. Geological Survey Water Supply Paper, 1543-A, 11-51.

Daniell, T.M. (1991). Neural networks – applications in hydrology and water resources engineering. International Hydrology & Water Resources Symposium, Perth, Australia, 2-4 October 1991.

de la Maza, M. and Tidor, B. (1993). An analysis of selection procedures with particular attention paid to proportional and Boltzmann selection. In S. Forrest (ed.), Proceedings of the fifth international conference on genetic algorithms.

Dawson, C.W., Abrahart, R.J., Shamseldin, A.Y. and Wilby, R.L. (2006). Flood estimation at ungauged sites using artificial neural networks, Journal of Hydrology, 319, 391–409.

Dawdy, D.R. (1961). Variation of flood ratios with size of drainage area. U. S. Geol. Surv. Prof. Pap. 424-C, Paper C36.

Douglas, B.C. (1995). U.S. National Report to IUGG, 1991-1994. Reviews of Geophysics, 33 Supplement. Online; available at http://www.agu.org/revgeophys/dougla01/dougla01. (Accessed on 13 Nov, 2009).

Efron, B. and Tibshirani, R.J. (1993). An introduction to the bootstrap. Monographs on Statistics and Applied Probability. Chapman and Hall, New York.

Fausett, L. (1994). Fundamentals of neural networks, Englewood Cliffs, NJ: Prentice Hall.

Farmer, J.D. and Sidorowich, J. (1987). Predicting chaotic time series. Physical Review Letters, 59(8), 845-848.

Feldman, A.D. (1979). Flood hydrograph and peak flow frequency analysis. (Technical Paper No. 62). US Army Corps of Engineers, Institute for Water Resources, The Hydrologic Engineering Centre.

Fernando, D.A.K., Shamseldin, A.Y. and Abrahart, R.J. (2009). Using gene expression programming to develop a combined runoff estimate model from conventional rainfall-runoff model outputs. 18th World IMACS / MODSIM Congress, Cairns, Australia 13-17 July 2009.

Ferreira, C. (2001a). Gene expression programming in problem solving, 6th Online World Conference on Soft Computing in Industrial Applications (invited tutorial).

Ferreira, C. (2001b). Gene expression programming: a new adaptive algorithm for solving problems. Complex Systems 13(2), 87–129.

Ferreira, C. (2006). Gene-expression programming: mathematical modeling by an artificial intelligence. Springer, Berlin, Heidelberg, New York.

Flavell, D. (2012). Design flood estimation in Western Australia. Australian Journal of Water Resources, Vol. 16 (1), 1-20.

Flood, I. and Kartam, N. (1994). Neural networks in civil engineering; Principles and understanding. Journal of Computing in Civil Engineering, 8(2), 131-148, 194.

Franchini, M. (1996). Using a genetic algorithm combined with a local search method for the automatic calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal 41(1), 21-40.

Franchini, M. and Galeati, G. (1997). Comparing several genetic algorithm schemes for the calibration of conceptual rainfall-runoff models. Hydrological Sciences Journal, 42(3), 357-379.

Franchini, M., Galeati, G. and Lolli, M. (2005). Analytical derivation of the flood frequency curve through partial duration series analysis and a probabilistic representation of the runoff coefficient, Journal of Hydrology, 303, 1–15.

Giustolisi, O. (2004). Using genetic programming to determine Chèzy resistance coefficient in corrugated channels. Journal of Hydroinformatics, 6(3), 157–173.

Griffis, V.W. and Stedinger, J.R. (2007). The use of GLS regression in regional hydrologic analyses. Journal of Hydrology, 344, 82-95.

Grubbs, F.E. and Beck, G. (1972). Extension of sample sizes and percentage points for significance tests of outlying observations. Technometrics, 14, 847–854.

Goldberg, D.E. (1989). Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Reading, MA.

Goldberg, D.E. and Deb, K. (1991). A comparative analysis of selection schemes used in genetic algorithms. In G. Rawlins (ed.), Foundations of genetic algorithms.

Gupta, V.K. and Waymire, E. (1993). A statistical analysis of mesoscale rainfall as a random cascade. Journal of Applied Meteorology, 32(2), 251-267.

Guven, A. and Talu, N.E. (2010). Gene-expression programming for estimating suspended sediment in Middle Euphrates Basin, Turkey. Clean Soil Air and Water, 38(12), 1159–1168.

Guven, A. and Kisi, O. (2011). Estimation of suspended sediment yield in natural rivers using machine-coded linear genetic programming. Water Resources Management, 25(2), 691–704.

Hackelbusch A., Micevski T., Kuczera G., Rahman A. and Haddad K. (2009). Regional flood frequency analysis for eastern New South Wales: A region of influence approach using generalized least squares based parameter regression. In Proc. 31st Hydrology and Water Resources Symp., Newcastle, Australia.

Haddad, K., Rahman, A. and Weinmann, P.E. (2006). Design flood estimation in ungauged catchments by quantile regression technique: ordinary least squares and generalised least squares compared. 30th Hydrology and Water Resources Symposium, The Institution of Engineers Australia, 4-7 Dec 2006, Launceston.

Haddad, K., Rahman, A. and Weinmann, P.E. (2008). Development of a generalised least squares based quantile regression technique for design flood estimation in Victoria, 31st Hydrology and Water Resources Symp., Adelaide, 15-17 April 2008, 2546-2557.

Haddad, K., Pirozzi, J., McPherson, G., Rahman, A. and Kuczera, G. (2009). Regional flood estimation technique for NSW: Application of generalised least squares quantile regression technique. In Proc. 31st Hydrology and Water Resources Symp., Newcastle, Australia.

Haddad, K., Rahman, A., Weinmann, P.E., Kuczera, G. and Ball, J.E. (2010). Streamflow data preparation for regional flood frequency analysis: Lessons from south-east Australia. Australian Journal of Water Resources, 14, 1, 17-32.

Haddad, K., Rahman, A. and Stedinger, J.R. (2011). Regional flood frequency analysis using Bayesian generalized least squares: A comparison between quantile and parameter regression techniques, Hydrological Processes, 25, 1-14.

Haddad, K. and Rahman, A. (2011). Regional flood estimation in New South Wales Australia using generalised least squares quantile regression. Journal of Hydrologic Engineering, ASCE, 16 (11), 920-925.

Haddad, K. and Rahman, A. (2012). Regional flood frequency analysis in eastern Australia: Bayesian GLS regression-based methods within fixed region and ROI framework – Quantile Regression vs. Parameter Regression Technique, Journal of Hydrology, 430-431, 142-161.

Haddad, K., Rahman, A., Ling, F. (2014). Regional flood frequency analysis method for Tasmania, Australia: A case study on the comparison of fixed region and region-of-influence approaches, Hydrological Sciences Journal, DOI:10.1080/02626667.2014.950583.

Holland, J.H. (1975). Adaptation in natural and artificial systems. University of Michigan Press, Ann Arbor, MI, 183 pp.

Hoang, T.M.T. (2001). Joint probability approach to design flood estimation, PhD thesis, Department of Civil Engineering, Monash University, Australia.

Hosking, J.R.M. and Wallis, J.R. (1993). Some statistics useful in regional frequency analysis, Water Resources Research, 29(2), 271–281.

Hosking, J.R.M. and Wallis J.R. (1997). Regional frequency analysis – An approach based on L-moments, Cambridge University Press, New York, 224 pp.

Hopfield, J. (1982). Neural networks and physical systems with emergent collective computational abilities. Proceedings of the National Academy of Sciences of the USA, 79(8), 2554-2558.

Institution of Engineers Australia (I.E. Aust.) (1987, 2001). Australian rainfall and runoff: A guide to flood estimation. Editor: D.H. Pilgrim, Vol.1, I. E. Aust., Canberra.

Ishak, E., Rahman, A., Westra, S., Sharma, A. and Kuczera, G. (2013). Evaluating the non-stationarity of Australian annual maximum floods. Journal of Hydrology, 494, 134-145.

Ishak, E., Rahman, A. (2014). Detection of changes in flood data in Victoria, Australia over 1975-2011, Hydrology Research, doi:10.2166/nh.2014.064.

Ishak, E., Haddad, K., Zaman, M. and Rahman, A. (2011). Scaling property of regional floods in New South Wales Australia, Natural Hazards, 58, 1155-1167.

Jain, A. and Srinivasulu, S. (2004). Development of effective and efficient rainfall-runoff models using integration of deterministic, real-coded genetic algorithms and artificial neural network techniques. Water Resources Research, 40, W04302.

Jain, A., Srinivasulu, S. and Bhattacharjya, R.K. (2005). Determination of an optimal unit pulse response function using real-coded genetic algorithm. Journal of Hydrology, 303, 199-214.

James, W. and Robinson, M.A. (1986). Continuous deterministic urban runoff modelling, in C. Maksimovic and M. Radojkovic (Edition), Proceedings of the International Symposium on Comparison of Urban Drainage Models with Real Catchment Data, Dubrovnik, Yugoslavia, Pergamon Press, Oxford.

Jang, J.S.R. (1993). ANFIS: adaptive-network-based fuzzy inference system. IEEE Transactions on Systems, Man and Cybernetics, 23(3), 665-685.

Jang, J.S.R., Sun, C.T. and Mizutani, E. (1997). Neuro-fuzzy and soft computing. Prentice-Hall, New Jersey.

Javelle, P., Ouarda, T.B.M.J., Lang, M., Bobee, B., Galea, G. and Gresillon, J.M. (2002). Development of regional flood-duration-frequency curves based on the index flood method, Journal of Hydrology, 258, 249-259.

Jeong, D.I., Stedinger, J.R., Kim, Y., Sung, J.H. and Yoon, S.Y. (2008). Reflecting a Climate Change Factor in Flood Frequency Analysis for Korean River Basins. Water Down Under, Adelaide, Australia, 14-17 April.

Jiapeng, H., Zhongmin, L. and Zhongbo, Y. (2003). A modified rational formula for flood design in small basins, Journal of the American Water Resources Association, 39(5), 1017-1025.

Jingyi, Z. and Hall, M.J. (2004). Regional flood frequency analysis for the Gan-Ming River basin in China, Journal of Hydrology, 296, 98–117.

Kendall, M.G. (1970). Rank Correlation Methods, 2nd Ed., New York: Hafner.

Khu, S.T., Liong, S.Y., Babovic, V., Madsen, H. and Muttil, N. (2001). Genetic programming and its application in real-time runoff forecasting. Journal of the American Water Resources Association, 37(2), 439-451.

Kirby, W. and Moss, M. (1987). Summary of flood frequency analysis in the United States. Journal of Hydrology, 96, 5-14.

Kisi, O. and Shiri, J. (2011). Precipitation forecasting using wavelet-genetic programming and wavelet-neuro-fuzzy conjunction models. Water Resources Management, 25(13), 3135–3152.

Kjeldsen, T.R. and Jones, D.A. (2010). Predicting the index flood in ungauged UK catchments: On the link between data-transfer and spatial model error structure, Journal of Hydrology, 387(1-2), 1-9.

Kjeldsen, T.R. and Jones, D. (2009). An exploratory analysis of error components in hydrological regression modelling. Water Resources Research, 45, W02407.

Klemes, V. (1993). Probability of extreme hydrometeorological events - A different approach. In: Extreme Hydrological Events: Precipitation, Floods and Droughts, IAHS Publ., 167-176.

Kothyari, U.C. (2004). Estimation of mean annual flood from ungauged catchments using artificial neural networks. Hydrology: Science and Practice for the 21st Century. Volume 1, British Hydrological Society.

Kuczera, G. (1983). A Bayesian surrogate for regional skew in flood frequency analysis. Water Resources Research, 19, 3, 832-837.

Kuczera, G. (1999). Comprehensive at-site flood frequency analysis using Monte Carlo Bayesian inference. Water Resources Research, 35, 5, 1551-1557.

Lawrence, W.T. (1994). Comparative analysis of data acquired by three narrow-band airborne spectroradiometers over subboreal vegetation. Remote Sens. Environ., 47, 204-215.

Luk, K.C., Ball, J.E. and Sharma, A. (2001). An application of artificial neural networks for rainfall forecasting. Mathematical and Computer Modelling, 33, 683-693.

Lumb, A.M. and James, L.D. (1976). Runoff files for flood hydrograph simulation. Journal of the Hydraulics Division, ASCE, 1515-1531.

Madsen, H., Rosbjerg, D. and Harremoes, P. (1995). Application of the Bayesian approach in regional analysis of extreme rainfalls. Stochastic Hydrology and Hydraulics, 9, 77-88.

Madsen, H., Pearson, C.P. and Rosbjerg, D. (1997). Comparison of annual maximum series and partial duration series for modeling extreme hydrological events—2. Regional modeling. Water Resources Research, 33(4), 771–781.

Maier, H.R. and Dandy, G.C. (2000). Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environmental Modelling and Software, 15(1), 101-123.

McCulloch, W.S. and Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.

Micevski, T., Hackelbusch, A., Haddad, K., Kuczera, G., Rahman, A. (2014). Regionalisation of the parameters of the log-Pearson 3 distribution: a case study for New South Wales, Australia, Hydrological Processes, DOI: 10.1002/hyp.10147.

Minns, A. and Hall, M. (1996). Artificial neural networks as rainfall-runoff models. Hydrological Sciences Journal, 41, 399-417.

Morshed, J. and Kaluarachchi, J.J. (1998). Application of artificial neural network and genetic algorithm in flow and transport simulations. Journal of Advances in Water Resources, 22(2), 145-158.

Mitchell, M. (1996). An Introduction to Genetic Algorithms. MIT Press.

Muttiah, R.S., Srinivasan, R. and Allen, P.M. (1997). Prediction of two year peak stream discharges using neural networks. Journal of the American Water Resources Association, 33 (3), 625–630.

Mulvany, T.J. (1851). On the use of self-registering rain and flood gauges, Inst. Civ. Eng. (Ireland) Trans, 4(2), 1-8.

Nathan, R.J. and McMahon, T.A. (1990). Identification of homogeneous regions for the purpose of regionalisation, Journal of Hydrology, 121, 217-238.

National Research Council (NRC). (1988). Estimating probabilities of extreme floods: methods and recommended research. National Academy Press, Washington, D.C., 141.

Nayak, P.C. and Sudheer, K.P. (2004). A neuro-fuzzy computing technique for modelling hydrological time series. Journal of Hydrology, 291(1–2), 52-66.

NERC (1975). Flood studies report, Natural Environment Research Council (NERC), London.

Novak, V., Perfilieva, I. and Mockor, J. (1999). Mathematical principles of fuzzy logic. Dordrecht: Kluwer Academic. ISBN 0-7923-8595-0.

Ouarda, T.B.M.J., Bâ, K.M., Diaz-Delgado, C., Cârsteanu, C., Chokmani, K., Gingras, H., Quentin, E., Trujillo, E. and Bobée, B. (2008). Intercomparison of regional flood frequency estimation methods at ungauged sites for a Mexican case study, Journal of Hydrology, 348, 40-58.

Pallard, B., Castellarin, A. and Montanari, A. (2009). A look at the links between drainage density and flood statistics, Hydrology and Earth System Sciences (HESS), 13, 1019-1029.

Pandey, G.R. and Nguyen, V.T.V. (1999). A comparative study of regression based methods in regional flood frequency analysis. Journal of Hydrology, 225, 92-101.

Parthiban, L. and Subramanian, R. (2009). CANFIS: A computer aided diagnostic tool for cancer detection. Journal of Biomedical Science and Engineering, 2, 323-335.

Pegram, G.G.S. and Parak, M. (2004). A review of the regional maximum flood and rational formula using geomorphological information and observed floods, Water South Africa, 30(3), 377-392. ISSN 0378-4738.

Pilgrim, D.H. and McDermott, G.E. (1982). Design floods for small rural catchments in eastern New South Wales. Civil Engg Trans, Inst. Engrs Aust., CE24, 226-234.

Pilgrim, D.H. and Cordery, I. (1993). Flood Runoff, in Maidment, D.R., ed., Handbook of Hydrology, McGraw-Hill, New York, 9, 9.1-9.42.

Pirozzi, J., Ashraf, M., Rahman, A. and Haddad, K. (2009). Design flood estimation for ungauged catchments in eastern NSW: Evaluation of the probabilistic rational method. In Proc. 31st Hydrology and Water Resources Symposium, Newcastle, Australia.

Principe, J.C., Euliano, N.R. and Lefebvre, W.C. (2000). Neural and adaptive systems, John Wiley & Sons, Inc.

Queensland Reconstruction Authority (2011). Operation Queenslander: The State Community, Economic and Environmental Recovery and Reconstruction Plan 2011–2013. Queensland Reconstruction Authority, Queensland, Australia, March 2011, 48 pp.

Rahman, A. (1997). Flood estimation for ungauged catchments: A regional approach using flood and catchment characteristics, PhD thesis, Department of Civil Engineering, Monash University.

Rabunal, J.R., Puertas, J., Suarez, J. and Rivero, D. (2007). Determination of the unit hydrograph of a typical urban basin using genetic programming and artificial neural networks. Hydrological Processes, 21(4), 476–485.

Rahman, A. (2005). A quantile regression technique to estimate design floods for ungauged catchments in South-east Australia. Australian Journal of Water Resources, 9(1), 81-89.

Rahman, A., Bates, B.C., Mein, R.G. and Weinmann, P.E. (1999). Regional flood frequency analysis for ungauged basins in south-eastern Australia. Australian Journal of Water Resources, 3(2), 199-207. ISSN 1324-1583.

Rahman, A., Weinmann, P.E. and Mein, R.G. (1999). At-site frequency analysis: LP3-product moment, GEV-L moment and GEV-LH moment procedures compared. Water 99 Joint Congress, 715-720.

Rahman, A., Weinmann, P.E., Hoang, T.M.T. and Laurenson, E.M. (2002). Monte Carlo simulation of flood frequency curves from rainfall. Journal of Hydrology, 256(3-4), 196-210. ISSN 0022-1694.

Rahman, A. and Hollerbach, D. (2003). Study of runoff coefficients associated with the probabilistic rational method for flood estimation in South-east Australia. In Proc. 28th Intl. Hydrology and Water Resources Symp., I. E. Aust., Wollongong, Australia, 10-13 Nov. 2003, 1, 199-203.

Rahman, A., Haddad, K., Caballero, W. and Weinmann, P.E. (2008). Progress on the enhancement of the Probabilistic Rational Method for Victoria in Australia. 31st Hydrology and Water Resources Symp., Adelaide, 15-17 April 2008, 940-951.

Rahman, A., Haddad, K., Kuczera, G. and Weinmann, P.E. (2009). Regional flood methods for Australia: data preparation and exploratory analysis. Australian Rainfall and Runoff Revision Projects, Project 5 Regional Flood Methods, Report No. P5/S1/003, Nov 2009, Engineers Australia, Water Engineering, pp. 181.

Rahman, A., Haddad, K., Zaman, M., Kuczera, G. and Weinmann, P.E. (2011). Design flood estimation in ungauged catchments: A comparison between the Probabilistic Rational Method and Quantile Regression Technique for NSW. Australian Journal of Water Resources, 14, 2, 127-137.

Rahman, A., Haddad, K., Zaman, M., Ishak, E., Kuczera, G. and Weinmann, P.E. (2012). Australian Rainfall and Runoff Revision Projects, Project 5 Regional flood methods, Stage 2 Report No. P5/S2/015, Engineers Australia, Water Engineering, pp. 319.

Rao, Z.F. and Jamieson, D.G. (1997). The use of neural networks and genetic algorithms for design of groundwater remediation schemes. Hydrology and Earth System Sciences, 1(2), 345-356.

Rao, A.R. and Hamed, K.H. (2000). Flood frequency analysis. CRC Press, Florida, USA.

Riggs, H.C. (1973). Regional analyses of streamflow characteristics. Techniques of water resources investigations of the U.S. Geol. Surv., Book 4, Chapter B3, U.S. Geol. Surv., Washington D.C.

Reed, D.W. and Robson, A.J. (1999). Flood estimation handbook, vol. 3. Centre for Ecology and Hydrology, UK.

Roger, J.S., Chuen-Tsai, S. and Eiji, M. (1997). Neuro-fuzzy and soft computing, Englewood Cliffs, Prentice Hall.

Rooij, A.J.F.V., Jain, L.C. and Johnson, R.P. (1996). Neural network training using genetic algorithms. World Scientific Publishing Co. Pty. Ltd., pp. 130.

Rumelhart, D.E., Hinton, G.E. and Williams, R.J. (1986). Learning internal representations by error propagation. In Rumelhart, D.E., McClelland, J.L. and the PDP Research Group, editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1, 318-362. The MIT Press, Cambridge, MA.

Saf, B. (2009). Regional flood frequency analysis using L-Moments for the West Mediterranean Region of Turkey. Water Resources Management, 23(3), 531–551.

Savic, D.A., Walters, G.A. and Davidson, J.W. (1999). A genetic programming approach to rainfall-runoff modelling. Water Resources Management, 12, 219-231.

See, L., and Openshaw, S. (1999). Applying soft computing approaches to river level forecasting. Hydrological Sciences Journal, 44(5), 763-778.

Sekin, N. and Guven, A. (2012). Estimation of peak flood discharges at ungauged sites across Turkey, Water Resources Management, 26, 2569–2581.

Shamseldin, A.Y. (1997). Application of a neural network technique to rainfall-runoff modeling. Journal of Hydrology, 199, 272–294.

Shi, Y. and Mizumoto, M. (2000). Some considerations on conventional neuro-fuzzy learning algorithms by gradient descent method. Fuzzy Sets and Systems, 112, 51-63.

Shu, C. and Burn, D.H. (2004). Artificial neural network ensembles and their application in pooled flood frequency analysis, Water Resources Research, 40(9), W09301, doi:10.1029/2003WR002816.

Shu, C. and Ouarda, T.B.M.J. (2007). Flood frequency analysis at ungauged sites using artificial neural networks in canonical correlation analysis physiographic space, Water Resources Research, 43, W07438, doi:10.1029/2006WR005142.

Shu, C. and Ouarda, T.B.M.J. (2008). Regional flood frequency analysis at ungauged sites using the adaptive neuro-fuzzy inference system. Journal of Hydrology, 349, 31-43.

Simonovic, S.P. (1992). Reservoir systems analysis: Closing gap between theory and practice. Journal of Water Resources Planning and Management, 118(3), 262–280.

Smith, J.A. (1992). Representation of basin scale in flood peak distributions. Water Resources Research, 28 (11), 2993-2999.

Smith, J.A. (1993). LAI inversion using a back-propagation neural network trained with a multiple scattering model. IEEE Transactions on Geoscience and Remote Sensing, 31(5), 1102-1106.

Stedinger, J.R. and Tasker, G.D. (1985). Regional hydrologic analysis - 1. Ordinary, weighted and generalized least squares compared. Water Resources Research, 21, 1421-1432.

Takens, F. (1981). Detecting strange attractors in turbulence. In: D.A. Rand and L.-S. Young, Editors, Dynamical systems and turbulence, Lecture Notes in Mathematics. Vol. 898, Springer-Verlag, Berlin, pp. 366–381.

Takagi, T. and Sugeno, M. (1983). Derivation of fuzzy control rules from human operator’s control actions. Proceedings of the IFAC symposium on fuzzy information, knowledge representation and decision analysis.

Takagi, T. and Sugeno, M. (1985). Fuzzy identification of systems and its applications to modeling and control. IEEE Transactions on Systems, Man and Cybernetics, 15(1), 116-132.

Takagi, H. and Hayashi, I. (1991). Neural network driven fuzzy reasoning. International Journal of Approximate Reasoning, 5(3), 191-212.

Talei, A. and Chua, L.H.C. (2010a). A novel application of a neuro-fuzzy computational technique in event-based rainfall–runoff modelling. Expert Systems with Applications, 37(12), 7456-7468.

Tasker, G.D. (1980). Hydrologic regression with weighted least squares. Water Resources Research, 16(6), 1107-1113.

Tasker, G.D., Eychaner, J.H. and Stedinger J.R. (1986). Application of generalised least squares in regional hydrologic regression analysis. US Geological Survey Water Supply Paper, 2310, 107–115.

Tasker, G.D., Hodge, S.A. and Barks, C.S. (1996). Region of influence regression for estimating the 50-year flood at ungauged sites. Water Resources Bulletin, 32(1), 163-170.

Thandaveswara, B.S. and Sajikumar, N. (2000). Classification of river basins using artificial neural networks. Journal of Hydrologic Engineering, 5 (3), 290–298.

Theodoridis, S. and Koutroumbas, K. (2009). Pattern Recognition, 4th Edition, Academic Press, ISBN: 978-1-59749-272-0.

Thomas, D.M. and Benson, M.A. (1970). Generalization of streamflow characteristics from drainage-basin characteristics. U.S. Geological Survey Water Supply Paper 1975, US Government Printing Office.

Tokar, A.S. and Johnson, P.A. (1999). Rainfall-runoff modeling using artificial neural networks. Journal of Hydrologic Engineering, ASCE, 4(3), 232-239.

Turan, M.E. and Yurdusev, M.A. (2009). River flow estimation from upstream flow records by artificial intelligence methods. Journal of Hydrology, 369, 71–77.

Vogel, R.M., McMahon, T.A. and Chiew, F.H.S. (1993). Flood flow frequency model selection in Australia. Journal of Hydrology, 146, 421-449.

Wang, Q.J. (1991). The genetic algorithm and its application to calibrating conceptual rainfall-runoff models. Water Resources Research, 27(9), 2467-2471.

Wasserman, P.D. (1989). Neural computing: theory and practice. Van Nostrand Reinhold, New York.

Wasserman, P. (1993). Advanced methods in neural computing, Van Nostrand Reinhold, ISBN 0-442-00461-3.

Weeks, W.D. (1991). Design floods for small rural catchments in Queensland, Civil Engineering Transactions, IEAust, 33(4), 249-260.


Wu, C.L. and Chau, K.W. (2006). A flood forecasting neural network model with genetic algorithm. International Journal of Environment and Pollution, 28(3-4), 261.

Zaman, M., Rahman, A., Haddad, K. (2012). Regional flood frequency analysis in arid regions: A case study for Australia. Journal of Hydrology, 475, 74-83.

Zhang, B. and Govindaraju, R.S. (2003). Geomorphology-based artificial neural networks for estimation of direct runoff over watersheds. Journal of Hydrology, 273 (1), 18–34.

Zhang, Z. and Hall, D.B. (2004). Marginal models for zero inflated clustered data. Statistical Modelling, 4, 161–180.

Zrinji, Z. and Burn, D.H. (1994). Flood frequency analysis for ungauged sites using a region of influence approach. Journal of Hydrology, 153(1-4), 1-21.


APPENDICES


APPENDIX A

List of selected study catchments


Appendix A List of selected catchments

Table A1 Selected catchments from New South Wales

Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
201001 Eungella Oxley -28.36 153.29 213 49 1958 - 2006
203002 Repentance Coopers Ck -28.64 153.41 62 30 1977 - 2006
203012 Binna Burra Byron Ck -28.71 153.50 39 29 1978 - 2006
203030 Rappville Myrtle Ck -29.11 153.00 332 27 1980 - 2006
204025 Karangi Orara -30.26 153.03 135 37 1970 - 2006
204026 Bobo Nursery Bobo -30.25 152.85 80 29 1956 - 1984
204030 Aberfoyle Aberfoyle -30.26 152.01 200 29 1978 - 2006
204036 Sandy Hill(below Snake Cre Cataract Ck -28.93 152.22 236 54 1953 - 2006
204037 Clouds Ck Clouds Ck -30.09 152.63 62 35 1972 - 2006
204056 Gibraltar Range Dandahra Ck -29.49 152.45 104 31 1976 - 2006
204906 Glenreagh Orara -30.07 152.99 446 34 1973 - 2006
206009 Tia Tia -31.19 151.83 261 53 1955 - 2007
206025 near Dangar Falls Salisbury Waters -30.68 151.71 594 34 1973 - 2006
206026 Newholme Sandy Ck -30.42 151.66 8 33 1975 - 2007
207006 Birdwood(Filly Flat) Forbes -31.39 152.33 363 32 1976 - 2007
208001 Bobs Crossing Barrington -32.03 151.47 20 52 1955 - 2006
209001 Monkerai Karuah -32.24 151.82 203 34 1946 - 1979
209002 Crossing Mammy Johnsons -32.25 151.98 156 31 1976 - 2006
209003 Booral Karuah -32.48 151.95 974 38 1969 - 2006
209006 Willina Wang Wauk -32.16 152.26 150 36 1970 - 2005
209018 Dam Site Karuah -32.28 151.90 300 27 1980 - 2006
210011 Tillegra Williams -32.32 151.69 194 75 1932 - 2006
210014 Rouchel Brook (The Vale) Rouchel Brook -32.15 151.05 395 42 1960 - 2001
210017 Moonan Brook Moonan Brook -31.94 151.28 103 67 1941 - 2007


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
210022 Halton Allyn -32.31 151.51 205 65 1941 - 2005
210040 Wybong Wybong Ck -32.27 150.64 676 50 1956 - 2005
210042 Ravensworth Foy Brook -32.40 151.05 170 30 1967 - 1996
210044 Middle Falbrook(Fal Dam Si Glennies Ck -32.45 151.15 466 51 1957 - 2007
210068 Pokolbin Site 3 Pokolbin Ck -32.80 151.33 25 41 1965 - 2005
210076 Liddell Antiene Ck -32.34 150.98 13 37 1969 - 2005
210079 Gostwyck Paterson -32.55 151.59 956 33 1975 - 2007
210080 U/S Glendon Brook West Brook -32.47 151.28 80 31 1977 - 2007
211009 Gracemere Wyong -33.27 151.36 236 35 1973 - 2007
211013 U/S Weir Ourimbah Ck -33.35 151.34 83 30 1977 - 2006
212008 Bathurst Rd Coxs -33.43 150.08 199 55 1952 - 2006
212018 Glen Davis Capertee -33.12 150.28 1010 35 1972 - 2006
212040 Pomeroy Kialla Ck -34.61 149.54 96 27 1980 - 2004
213005 Briens Rd Toongabbie Ck -33.80 150.98 70 27 1980 - 2006
215004 Hockeys Corang -35.15 150.03 166 75 1930 - 2004
218002 Belowra Tuross -36.20 149.71 556 29 1955 - 1983
218005 D/S Wadbilliga R Junct Tuross -36.20 149.76 900 42 1965 - 2006
218007 Wadbilliga Wadbilliga -36.26 149.69 122 33 1975 - 2005
219003 Morans Crossing Bemboka -36.67 149.65 316 64 1944 - 2007
219017 near Brogo Double Ck -36.60 149.81 152 41 1967 - 2007
219022 Candelo Dam Site Tantawangalo Ck -36.73 149.68 202 36 1972 - 2007
219025 Angledale Brogo -36.62 149.88 717 30 1977 - 2006
220001 New Buildings Br Towamba -36.96 149.56 272 26 1955 - 1980
220003 Lochiel Pambula -36.94 149.82 105 41 1967 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
220004 Towamba Towamba -37.07 149.66 745 37 1971 - 2007
221002 Princes HWY Wallagaraugh -37.37 149.71 479 36 1972 - 2007
222004 Wellesley (Rowes) Little Plains -37.00 149.09 604 65 1942 - 2006
222007 Woolway Wullwye Ck -36.42 148.91 520 57 1950 - 2006
222009 The Falls Bombala -36.92 149.21 559 43 1952 - 1994
222015 Jacobs Ladder Jacobs -36.73 148.43 187 27 1976 - 2002
222016 The Barry Way Pinch -36.79 148.40 155 31 1976 - 2006
222017 The Hut Maclaughlin -36.66 149.11 313 28 1979 - 2006
401009 Maragle Maragle Ck -35.93 148.10 220 56 1950 - 2005
401013 Jingellic Jingellic Ck -35.90 147.69 378 33 1973 - 2005
401015 Yambla Bowna Ck -35.92 146.98 316 31 1975 - 2005
410038 Darbalara Adjungbilly Ck -35.02 148.25 411 37 1969 - 2005
410048 Ladysmith Kyeamba Ck -35.20 147.51 530 48 1939 - 1986
410057 Lacmalac Goobarragandra -35.33 148.35 673 49 1958 - 2006
410061 Batlow Rd Adelong Ck -35.33 148.07 155 60 1948 - 2007
410062 Numeralla School Numeralla -36.18 149.35 673 43 1965 - 2007
410076 Jerangle Rd Strike-A-Light C -35.92 149.24 212 31 1975 - 2005
410088 Brindabella (No.2&No.3-Cab Goodradigbee -35.42 148.73 427 38 1968 - 2005
410112 Jindalee Jindalee Ck -34.58 148.09 14 30 1976 - 2005
410114 Wyangle Killimcat Ck -35.24 148.31 23 30 1977 - 2006
411001 Bungendore Mill Post Ck -35.28 149.39 16 25 1960 - 1984
411003 Butmaroo Butmaroo Ck -35.26 149.54 65 28 1979 - 2006
412050 Narrawa North Crookwell -34.31 149.17 740 34 1970 - 2003
412063 Gunning Lachlan -34.74 149.29 570 39 1961 - 1999


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
412081 near Neville Rocky Br Ck -33.80 149.19 145 33 1969 - 2001
412083 Tuena Tuena Ck -34.02 149.33 321 33 1969 - 2001
416003 Clifton Tenterfield Ck -29.03 151.72 570 28 1979 - 2006
416008 Haystack Beardy -29.22 151.38 866 35 1972 - 2006
416016 Inverell (Middle Ck) Macintyre -29.79 151.13 726 35 1972 - 2006
416020 Coolatai Ottleys Ck -29.23 150.76 402 28 1979 - 2006
416023 Bolivia Deepwater -29.29 151.92 505 28 1979 - 2006
418005 Kimberley Copes Ck -29.92 151.11 259 35 1972 - 2006
418014 Yarrowyck Gwydir -30.47 151.36 855 37 1971 - 2007
418017 Molroy Myall Ck -29.80 150.58 842 29 1979 - 2007
418021 Laura Laura Ck -30.23 151.19 311 29 1978 - 2006
418025 Bingara Halls Ck -29.94 150.57 156 28 1980 - 2007
418027 Horton Dam Site Horton -30.21 150.43 220 36 1972 - 2007
418034 Black Mountain Boorolong Ck -30.30 151.64 14 29 1976 - 2004
419010 Woolbrook Macdonald -30.97 151.35 829 28 1980 - 2007
419016 Mulla Crossing Cockburn -31.06 151.13 907 33 1974 - 2006
419029 Ukolan Halls Ck -30.71 150.83 389 27 1979 - 2005
419051 Avoca East Maules Ck -30.50 150.08 454 31 1977 - 2007
419053 Black Springs Manilla -30.42 150.65 791 33 1975 - 2007
419054 Limbri Swamp Oak Ck -31.04 151.17 391 33 1975 - 2007
420003 Warkton (Blackburns) Belar Ck -31.39 149.20 133 30 1976 - 2005
421026 Sofala Turon -33.08 149.69 883 34 1974 - 2007
421036 below Dam Site Duckmaloi -33.75 149.94 112 25 1956 - 1980
421050 Molong Bell -33.03 148.95 365 33 1975 - 2007

Table A2 Selected catchments from Victoria

Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
221207 Errinundra Errinundra -37.45 148.91 158 35 1971 - 2005
221209 Weeragua Cann(East Branch -37.37 149.20 154 33 1973 - 2005
221210 The Gorge Genoa -37.43 149.53 837 34 1972 - 2005
221211 Combienbar Combienbar -37.44 148.98 179 32 1974 - 2005
221212 Princes HWY Bemm -37.61 148.90 725 31 1975 - 2005
222202 Sardine Ck Brodribb -37.51 148.55 650 41 1965 - 2005
222206 Buchan Buchan -37.50 148.18 822 32 1974 - 2005
222210 Deddick (Caseys) Deddick -37.09 148.43 857 35 1970 - 2005
222213 Suggan Buggan Suggan Buggan -36.95 148.33 357 35 1971 - 2005
222217 Jacksons Crossing Rodger -37.41 148.36 447 30 1976 - 2005
223202 Swifts Ck Tambo -37.26 147.72 943 32 1974 - 2005
223204 Deptford Nicholson -37.60 147.70 287 32 1974 - 2005
224213 Lower Dargo Rd Dargo -37.50 147.27 676 33 1973 - 2005
224214 Tabberabbera Wentworth -37.50 147.39 443 32 1974 - 2005
225213 Beardmore Aberfeldy -37.85 146.43 311 33 1973 - 2005
225218 Briagalong Freestone Ck -37.81 147.09 309 35 1971 - 2005
225219 Glencairn Macalister -37.52 146.57 570 39 1967 - 2005
225223 Gillio Rd Valencia Ck -37.73 146.98 195 35 1971 - 2005
225224 The Channel Avon -37.80 146.88 554 34 1972 - 2005
226204 Willow Grove Latrobe -38.09 146.16 580 35 1971 - 2005
226205 Noojee Latrobe -37.91 146.02 290 46 1960 - 2005
226209 Darnum Moe -38.21 146.00 214 34 1972 - 2005
226217 Hawthorn Br Latrobe -37.98 146.08 440 34 1955 - 1988
226218 Thorpdale Narracan Ck -38.27 146.19 66 35 1971 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
226222 Near Noojee (U/S Ada R Jun Latrobe -37.88 145.89 62 35 1971 - 2005
226226 Tanjil Junction Tanjil -38.01 146.20 289 46 1960 - 2005
226402 Trafalgar East Moe Drain -38.18 146.21 622 31 1975 - 2005
227200 Yarram Tarra -38.46 146.69 25 41 1965 - 2005
227205 Calignee South Merriman Ck -38.36 146.65 36 31 1975 - 2005
227210 Carrajung Lower Bruthen Ck -38.40 146.74 18 33 1973 - 2005
227211 Toora Agnes -38.64 146.37 67 32 1974 - 2005
227213 Jack Jack -38.53 146.53 34 36 1970 - 2005
227219 Loch Bass -38.38 145.56 52 32 1973 - 2004
227225 Fischers Tarra -38.47 146.56 16 33 1973 - 2005
227226 Dumbalk North Tarwineast Branc -38.50 146.16 127 36 1970 - 2005
227231 Glen Forbes South Bass -38.47 145.51 233 32 1974 - 2005
227236 D/S Foster Ck Jun Powlett -38.56 145.71 228 27 1979 - 2005
228212 Tonimbuk Bunyip -38.03 145.76 174 30 1975 - 2004
228217 Pakenham Toomuc Ck -38.07 145.46 41 29 1974 - 2002
229218 Watsons Ck Watsons Ck -37.67 145.26 36 26 1974 - 1999
230202 Sunbury Jackson Ck -37.58 144.74 337 31 1975 - 2005
230204 Riddells Ck Riddells Ck -37.47 144.67 79 32 1974 - 2005
230205 Bulla (D/S of Emu Ck Jun) Deep Ck -37.63 144.80 865 32 1974 - 2005
230211 Clarkefield Emu Ck -37.47 144.75 93 31 1975 - 2005
231200 Bacchus Marsh Werribee Ck -37.68 144.43 363 28 1978 - 2005
231213 Sardine Ck- O'Brien Cro Lerderderg Ck -37.50 144.36 153 47 1959 - 2005
231225 Ballan (U/S Old Western H) Werribee Ck -37.60 144.25 71 33 1973 - 2005
231231 Melton South Toolern Ck -37.91 144.58 95 27 1979 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
232200 Little Little Ck -37.96 144.48 417 32 1974 - 2005
232210 Lal Lal Mooraboolwest Br -37.65 144.04 83 33 1973 - 2005
232213 U/S of Bungal Dam Lal Lal Ck -37.66 144.03 157 29 1977 - 2005
233211 Ricketts Marsh Birregurra Ck -38.30 143.84 245 31 1975 - 2005
233214 Forrest (above Tunnel) Barwoneast Branc -38.53 143.73 17 28 1978 - 2005
234200 Pitfield Woady Yaloak -37.81 143.59 324 34 1972 - 2005
235202 Upper Gellibrand Gellibrand -37.56 143.64 53 31 1975 - 2005
235203 Curdie Curdies -38.45 142.96 790 31 1975 - 2005
235204 Beech Forest Little Aire Ck -38.66 143.53 11 30 1976 - 2005
235205 Wyelangta Arkins Ck West B -38.65 143.44 3 28 1978 - 2005
235227 Bunkers Hill Gellibrand -38.53 143.48 311 32 1974 - 2005
235233 Apollo Bay- Paradise Barhameast Branc -38.76 143.62 43 29 1977 - 2005
235234 Gellibrand Love Ck -38.49 143.57 75 27 1979 - 2005
236205 Woodford Merri -38.32 142.48 899 32 1974 - 2005
236212 Cudgee Brucknell Ck -38.35 142.65 570 31 1975 - 2005
237207 Heathmere Surry -38.25 141.66 310 31 1975 - 2005
238207 Jimmy Ck Wannon -37.37 142.50 40 32 1974 - 2005
238219 Morgiana Grange Burn -37.71 141.83 997 33 1973 - 2005
401208 Berringama Cudgewa Ck -36.21 147.68 350 41 1965 - 2005
401209 Omeo Livingstone Ck -37.11 147.57 243 27 1968 - 1994
401210 below Granite Flat Snowy Ck -36.57 147.41 407 38 1968 - 2005
401212 Upper Nariel Nariel Ck -36.45 147.83 252 52 1954 - 2005
401215 Uplands Morass Ck -36.87 147.70 471 35 1971 - 2005
401216 Jokers Ck Big -36.95 141.47 356 52 1952 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
401217 Gibbo Park Gibbo -36.75 147.71 389 35 1971 - 2005
401220 McCallums Tallangatta Ck -36.21 147.50 464 30 1976 - 2005
402203 Mongans Br Kiewa -36.60 147.10 552 36 1970 - 2005
402204 Osbornes Flat Yackandandah Ck -36.31 146.90 255 39 1967 - 2005
402206 Running Ck Running Ck -36.54 147.05 126 31 1975 - 2005
402217 Myrtleford Rd Br Flaggy Ck -36.39 146.88 24 36 1970 - 2005
403205 Bright Ovens Rivers -36.73 146.95 495 35 1971 - 2005
403209 Wangaratta North Reedy Ck -36.33 146.34 368 33 1973 - 2005
403213 Greta South Fifteen Mile Ck -36.62 146.24 229 33 1973 - 2005
403221 Woolshed Reedy Ck -36.31 146.60 214 30 1975 - 2004
403222 Abbeyard Buffalo -36.91 146.70 425 33 1973 - 2005
403224 Bobinawarrah Hurdle Ck -36.52 146.45 158 31 1975 - 2005
403226 Angleside Boggy Ck -36.61 146.36 108 32 1974 - 2005
403227 Cheshunt King -36.83 146.40 453 33 1973 - 2005
403233 Harris Lane Buckland -36.72 146.88 435 34 1972 - 2005
404206 Moorngag Broken -36.80 146.02 497 33 1973 - 2005
404207 Kelfeera Holland Ck -36.61 146.06 451 31 1975 - 2005
405205 Murrindindi above Colwells Murrindindi -37.41 145.56 108 31 1975 - 2005
405209 Taggerty Acheron -37.32 145.71 619 33 1973 - 2005
405212 Tallarook Sunday Ck -37.10 145.05 337 31 1975 - 2005
405214 Tonga Br Delatite -37.15 146.13 368 49 1957 - 2005
405215 Glen Esk Howqua -37.23 146.21 368 32 1974 - 2005
405217 Devlins Br Yea -37.38 145.48 360 31 1975 - 2005
405218 Gerrang Br Jamieson -37.29 146.19 368 47 1959 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
405219 Dohertys Goulburn -37.33 146.13 694 39 1967 - 2005
405226 Moorilim Pranjip Ck -36.62 145.31 787 32 1974 - 2005
405227 Jamieson Big Ck -37.37 146.06 619 36 1970 - 2005
405229 Wanalta Wanalta Ck -36.64 144.87 108 36 1969 - 2005
405230 Colbinabbin Cornella Ck -36.61 144.80 259 33 1973 - 2005
405231 Flowerdale King Parrot Ck -37.35 145.29 181 32 1974 - 2005
405237 Euroa Township Seven Creeks -36.76 145.58 332 33 1973 - 2005
405240 Ash Br Sugarloaf Ck -37.06 145.05 609 33 1973 - 2005
405241 Rubicon Rubicon -37.29 145.83 129 33 1973 - 2005
405245 Mansfield Ford Ck -37.04 146.05 115 36 1970 - 2005
405248 Graytown Major Ck -36.86 144.91 282 35 1971 - 2005
405251 Ancona Brankeet Ck -36.97 145.78 121 33 1973 - 2005
405263 U/S of Snake Ck Jun Goulburn -37.46 146.25 327 31 1975 - 2005
405264 D/S of Frenchman Ck Jun Big -37.52 146.08 333 31 1975 - 2005
405274 Yarck Home Ck -37.11 145.60 187 29 1977 - 2005
406213 Redesdale Campaspe -37.02 144.54 629 30 1975 - 2004
406214 Longlea Axe Ck -36.78 144.43 234 34 1972 - 2005
406215 Lyal Coliban -36.96 144.49 717 32 1974 - 2005
406216 Sedgewick Axe Ck -36.90 144.36 34 26 1975 - 2005
406224 Runnymede Mount Pleasant C -36.55 144.64 248 30 1975 - 2004
406226 Derrinal Mount Ida Ck -36.88 144.65 174 28 1978 - 2005
407214 Clunes Creswick Ck -37.30 143.79 308 31 1975 - 2005
407217 Vaughan at D/S Fryers Ck Loddon -37.16 144.21 299 38 1968 - 2005
407220 Norwood Bet Bet Ck -37.00 143.64 347 33 1973 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
407221 Yandoit Jim Crow Ck -37.21 144.10 166 33 1973 - 2005
407222 Clunes Tullaroop Ck -37.23 143.83 632 33 1973 - 2005
407230 Strathlea Joyces Ck -37.17 143.96 153 33 1973 - 2005
407246 Marong Bullock Ck -36.73 144.13 184 33 1973 - 2005
407253 Minto Piccaninny Ck -36.45 144.47 668 33 1973 - 2005
415207 Eversley Wimmera -37.19 143.19 304 31 1975 - 2005
415217 Grampians Rd Br Fyans Ck -37.26 142.53 34 33 1973 - 2005
415220 Wimmera HWY Avon -36.64 142.98 596 32 1974 - 2005
415226 Carrs Plains Richardson -36.75 142.79 130 31 1971 - 2001
415237 Stawell Concongella Ck -37.02 142.82 239 29 1977 - 2005
415238 Navarre Wattle Ck -36.90 143.10 141 30 1976 - 2005


Table A3 Selected catchments from Tasmania

Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
76 at Ballroom Offtake North Esk -41.50 147.39 335.0 74 1923 - 1996
159 D/S Rapid Arthur -41.12 145.08 1600.0 42 1955 - 1996
473 D/S Crossing Rv Davey -43.14 145.95 680.0 34 1964 - 1997
499 at Newbury Tyenna -42.71 146.71 198.0 33 1965 - 1997
852 at Strathbridge Meander -41.49 146.91 1025.0 24 1985 - 2008
1012 3.5 Km U/S Esperance Peak Rivulet -43.32 146.90 35.0 23 1975 - 1997
1200 at Whitemark Water Supply South Pats -40.09 148.02 21.0 22 1969 - 1990
2200 at The Grange Swan -42.05 148.07 440.0 33 1964 - 1996
2204 U/S Coles Bay Rd Bdg Apsley -41.94 148.24 157.0 24 1969 - 1992
2206 U/S Scamander Water Supply Scamander -41.45 148.18 265.0 28 1969 - 1996
2207 3 Km U/S Tasman Hwy Little Swanport -42.34 147.90 600.0 19 1971 - 1989
2208 at Swansea Meredith -42.12 148.04 88.0 27 1970 - 1996
2209 Tidal Limit Carlton -42.87 147.70 136.0 28 1969 - 1996
2211 U/S Brinktop Rd Orielton Rivulet -42.76 147.54 46.0 24 1973 - 1996
2213 D/S McNeils Rd Goatrock Ck -42.14 147.92 1.3 22 1975 - 1996
3203 at Baden Coal -42.43 147.45 55.0 26 1971 - 1996
4201 at Mauriceton Jordan -42.53 147.12 730.0 36 1966 - 2001
5200 at Summerleas Rd Br Browns -42.96 147.27 15.0 30 1963 - 1992
6200 D/S Grundys Ck Mountain -42.94 147.13 42.0 29 1968 - 1996
7200 Dover Ws Intake Esperance -43.34 146.96 174.0 29 1965 - 1993
14206 1.5 Km U/S of Mouth Sulphur Ck -41.11 146.03 23.0 29 1964 - 1992
14207 at Bannons Br Leven -41.25 146.09 495.0 35 1963 - 1997
14210 U/S Flowerdale R Juncti Inglis -41.00 145.63 170.0 21 1968 - 1988
14215 at Moorleah Flowerdale -40.97 145.61 150.0 31 1966 - 1996


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
14217 at Sprent Claytons Rivulet -41.26 146.17 13.5 26 1970 - 1995
14220 U/S Bass HWY Seabrook Ck -41.01 145.77 40.0 20 1977 - 1996
16200 U/S Old Bass Hwy Don -41.19 146.31 130.0 24 1967 - 1990
17200 at Tidal Limit Rubicon -41.26 146.57 255.0 31 1967 - 1997
17201 1.5KM U/S Tidal Limit Franklin Rivulet -41.26 146.61 131.0 20 1975 - 1994
18201 0.5 Km U/S Tamar Supply -41.26 146.94 135.0 19 1965 - 1983
18221 D/S Jackeys Marsh Jackeys Ck -41.68 146.66 29.0 27 1982 - 2008
18312 D/S Elizabeth R Junctio Macquarie -41.91 147.39 1900.0 19 1989 - 2007
19200 2.6KM U/S Tidal Limit Brid -41.02 147.37 134.0 32 1965 - 1996
19201 2KM U/S Forester Rd Bdg Great Forester -41.11 147.61 195.0 27 1970 - 1996
19204 D/S Yarrow Ck Pipers -41.07 147.11 292.0 25 1972 - 1996
304040 U/S Derwent Junction Florentine River -42.44 146.52 435.8 58 1951 - 2008
304125 Below Lagoon Travellers Rest River -42.07 146.25 43.6 25 1949 - 1973
304597 At Lake Highway Pine Tree Rivulet Ck -41.80 146.68 19.4 40 1969 - 2008
308145 At Mount Ficham Track -42.24 145.77 757.0 56 1953 - 2008
308183 Below Jane River Franklin River -42.47 145.76 1590.3 22 1957 - 1978
308225 Below Darwin Dam Andrew River -42.22 145.62 5.3 21 1988 - 2008
308446 Below Huntley -42.66 146.37 458.0 27 1953 - 1979
308799 B/L Alma Collingwood Ck -42.16 145.93 292.5 28 1981 - 2008
308819 Above Rd Andrew River -42.22 145.62 4.6 26 1983 - 2008
310061 At Que River -41.58 145.68 18.4 22 1987 - 2008
310148 Above Sterling Murchison River -41.76 145.62 756.3 28 1955 - 1982
310149 Below Sophia River -41.72 145.63 523.2 27 1954 - 1980
310472 Below Bulgobac Creek Que River -41.62 145.58 119.1 32 1964 - 1995


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
315074 At Moina Wilmot River -41.47 146.07 158.1 46 1923 - 1968
315450 U/S Lemonthyme -41.61 146.13 311.0 46 1963 - 2008
316624 Above Mersey Arm River -41.69 146.21 86.0 37 1972 - 2008
318065 Below Deloraine -41.53 146.66 474.0 28 1969 - 1996
318350 Above Rocky Creek Whyte River -41.63 145.19 310.8 33 1960 - 1992

Table A4 Selected catchments from Queensland

Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
102101A Fall Ck Pascoe -12.88 142.98 651 33 1968 - 2005
104001A Telegraph Rd Stewart -14.17 143.39 470 32 1970 - 2005
105105A Developmental Rd East Normanby -15.77 145.01 297 34 1970 - 2005
107001B Flaggy Endeavour -15.42 145.07 337 43 1959 - 2004
108002A Bairds Daintree -16.18 145.28 911 29 1969 - 2000
108003A China Camp Bloomfield -15.99 145.29 264 32 1971 - 2004
110003A Picnic Crossing Barron -17.26 145.54 228 80 1926 - 2005
110011B Recorder Flaggy Ck -16.78 145.53 150 44 1956 - 2003
110101B Freshwater Freshwater Ck -16.94 145.70 70 37 1922 - 1958
111001A Gordonvale Mulgrave -17.10 145.79 552 43 1917 - 1972
111003C Aloomba Behana Ck -17.13 145.84 86 28 1943 - 1970
111005A The Fisheries Mulgrave -17.19 145.72 357 34 1967 - 2004
111007A Peets Br Mulgrave -17.14 145.76 520 31 1973 - 2004
111105A The Boulders Babinda Ck -17.35 145.87 39 29 1967 - 2003
112001A Goondi North Johnstone -17.53 145.97 936 39 1929 - 1967
112002A Nerada Fisher Ck -17.57 145.91 15.7 75 1929 - 2004
112003A Glen Allyn North Johnstone -17.38 145.65 165 46 1959 - 2004
112004A Tung Oil North Johnstone -17.55 145.93 925 31 1967 - 2004
112101B U/S Central Mill South Johnstone -17.61 145.98 400 81 1917 - 2003
113004A Powerline Cochable Ck -17.75 145.63 95 32 1967 - 2001
114001A Upper Murray Murray -18.11 145.80 156 31 1971 - 2003
116005B Peacocks Siding Stone -18.69 145.98 368 36 1936 - 1971
116008B Abergowrie Gowrie Ck -18.45 145.85 124 51 1954 - 2004
116010A Blencoe Falls Blencoe Ck -18.20 145.54 226 40 1961 - 2000


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
116011A Ravenshoe Millstream -17.60 145.48 89 42 1963 - 2004
116012A 8.7KM Cameron Ck -18.07 145.34 360 41 1962 - 2002
116013A Archer Ck Millstream -17.65 145.34 308 42 1962 - 2003
116014A Silver Valley Wild -17.63 145.30 591 44 1962 - 2005
116015A Wooroora Blunder Ck -17.74 145.44 127 38 1967 - 2004
116017A Running Ck Stone -18.77 145.95 157 33 1971 - 2004
117002A Bruce HWY Black -19.24 146.63 256 31 1974 - 2004
117003A Bluewater Bluewater Ck -19.18 146.55 86 30 1974 - 2003
118101A Gleesons Weir Ross -19.32 146.74 797 44 1916 - 1959
118106A Allendale Alligator Ck -19.39 146.96 69 30 1975 - 2004
119006A Damsite Major Ck -19.67 147.02 468 25 1979 - 2003
120014A Oak Meadows Broughton -20.18 146.32 182 28 1971 - 1998
120102A Keelbottom Keelbottom Ck -19.37 146.36 193 38 1968 - 2005
120120A Mt. Bradley Running -19.13 145.91 490 30 1976 - 2005
120204B Crediton Recorder Broken -21.17 148.51 41 31 1957 - 1987
120206A Mt Jimmy Pelican Ck -20.60 147.69 545 27 1961 - 1987
120216A Old Racecourse Broken -21.19 148.45 100 36 1970 - 2005
120307A Pentland Cape -20.48 145.47 775 34 1970 - 2003
121001A Ida Ck Don -20.29 148.12 604 48 1958 - 2005
121002A Guthalungra Elliot -19.94 147.84 273 32 1974 - 2005
122004A Lower Gregory Gregory -20.30 148.55 47 33 1973 - 2005
124001A Caping Siding O'Connell -20.63 148.57 363 35 1970 - 2004
124002A Calen StHelens Ck -20.91 148.76 118 32 1974 - 2005
124003A Jochheims Andromache -20.58 148.47 230 29 1977 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
125002C Sarich's Pioneer -21.27 148.82 757 43 1961 - 2005
125004B Gargett Cattle Ck -21.18 148.74 326 38 1968 - 2005
125005A Whitefords Blacks Ck -21.33 148.83 506 32 1974 - 2005
125006A Dam Site Finch Hatton Ck -21.11 148.63 35 29 1977 - 2005
126003A Carmila Carmila Ck -21.92 149.40 84 31 1974 - 2004
129001A Byfield Waterpark Ck -22.84 150.67 212 48 1953 - 2005
130004A Old Stn Raglan Ck -23.82 150.82 389 41 1964 - 2004
130108B Curragh Blackwater Ck -23.50 148.88 776 31 1973 - 2005
130207A Clermont Sandy Ck -22.80 147.58 409 40 1966 - 2005
130208A Ellendale Theresa Ck -22.98 147.58 758 37 1965 - 2001
130215A Lilyvale Lagoon Crinum Ck -23.21 148.34 252 29 1977 - 2005
130319A Craiglands Bell Ck -24.15 150.52 300 44 1961 - 2004
130321A Mt. Kroombit Kroombit Ck -24.41 150.72 373 41 1964 - 2004
130334A Pump Stn South Kariboe Ck -24.56 150.75 284 33 1973 - 2005
130335A Wura Dee -23.77 150.36 472 34 1972 - 2005
130336A Folding Hills Grevillea Ck -24.58 150.62 233 33 1973 - 2005
130348A Red Hill Prospect Ck -24.45 150.42 369 30 1976 - 2005
130349A Kingsborough Don -23.97 150.39 593 28 1977 - 2005
130413A Braeside Denison Ck -21.77 148.79 757 34 1972 - 2005
133003A Marlua Diglum Ck -24.19 151.16 203 36 1969 - 2004
135002A Springfield Kolan -24.75 151.59 551 40 1966 - 2005
135004A Dam Site Gin Gin Ck -24.97 151.89 531 40 1966 - 2005
136006A Dam Site Reid Ck -25.27 151.52 219 40 1966 - 2005
136102A Meldale Three Moon Ck -24.69 150.96 310 32 1949 - 1980


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
136107A Cania Gorge Three Moon Ck -24.73 151.01 370 26 1963 - 1988
136108A Upper Monal Monal Ck -24.61 151.11 92 43 1963 - 2005
136111A Dakiel Splinter Ck -24.75 151.26 139 41 1965 - 2005
136112A Yarrol Burnett -24.99 151.35 370 40 1966 - 2005
136202D Litzows Barambah Ck -26.30 152.04 681 85 1921 - 2005
136203A Brooklands Barker Ck -26.74 151.82 249 64 1941 - 2005
136301B Weens Br Stuart -26.50 151.77 512 66 1936 - 2005
137001B Elliott Elliott -24.99 152.37 220 52 1949 - 2004
137003A Dr Mays Crossing Elliott -24.97 152.42 251 30 1975 - 2004
137101A Burrum HWY Gregory -25.09 152.24 454 36 1967 - 2004
137201A Bruce HWY Isis -25.27 152.37 446 38 1967 - 2004
138002C Brooyar Wide Bay Ck -26.01 152.41 655 94 1910 - 2005
138003D Glastonbury Glastonbury Ck -26.22 152.52 113 81 1921 - 2006
138009A Tagigan Rd Tinana Ck -26.08 152.78 100 31 1975 - 2005
138010A Kilkivan Wide Bay Ck -26.08 152.22 322 97 1910 - 2006
138101B Kenilworth Mary -26.60 152.73 720 52 1921 - 1972
138102C Zachariah Amamoor Ck -26.37 152.62 133 83 1921 - 2005
138103A Knockdomny Kandanga Ck -26.40 152.64 142 34 1921 - 1954
138104A Kidaman Obi Obi Ck -26.63 152.77 174 42 1921 - 1963
138106A Baroon Pocket Obi Obi Ck -26.71 152.86 67 39 1941 - 1986
138107B Cooran Six Mile Ck -26.33 152.81 186 58 1948 - 2005
138110A Bellbird Ck Mary -26.63 152.70 486 45 1960 - 2004
138111A Moy Pocket Mary -26.53 152.74 820 39 1964 - 2004
138113A Hygait Kandanga Ck -26.39 152.64 143 34 1972 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
140002A Coops Corner Teewah Ck -26.06 153.04 53 27 1975 - 2005
141001B Kiamba South Maroochy -26.59 152.90 33 65 1938 - 2004
141003C Warana Br Petrie Ck -26.62 152.96 38 41 1959 - 2004
141004B Yandina South Maroochy -26.56 152.94 75 27 1959 - 2004
141006A Mooloolah Mooloolah -26.76 152.98 39 33 1972 - 2004
142001A Upper Caboolture Caboolture -27.10 152.89 94 40 1966 - 2005
142201D Cashs Crossing South Pine -27.34 152.96 178 46 1918 - 1963
142202A Drapers Crossing South Pine -27.35 152.92 156 39 1966 - 2005
143010B Boat Mountain Emu Ck -26.98 152.29 915 31 1967 - 2005
143015B Damsite Cooyar Ck -26.74 152.14 963 35 1969 - 2005
143101A Mutdapily Warrill Ck -27.75 152.69 771 39 1915 - 1953
143102B Kalbar No.2 Warrill Ck -27.92 152.60 468 55 1913 - 1970
143103A Moogerah Reynolds Ck -28.04 152.55 190 36 1918 - 1953
143107A Walloon Bremer -27.60 152.69 622 36 1962 - 1999
143108A Amberley Warrill Ck -27.67 152.70 914 36 1962 - 2004
143110A Adams Br Bremer -27.83 152.51 125 29 1972 - 2004
143113A Loamside Purga Ck -27.68 152.73 215 28 1974 - 2004
143203C Helidon Number 3 Lockyer Ck -27.54 152.11 357 74 1927 - 2004
143208A Dam Site Fifteen Mile Ck -27.46 152.10 87 26 1957 - 1985
143209B Mulgowie2 Laidley Ck -27.73 152.36 167 31 1958 - 2004
143303A Peachester Stanley -26.84 152.84 104 77 1928 - 2005
143307A Causeway Byron Ck -27.13 152.65 79 26 1976 - 2005
145002A Lamington No.1 Christmas Ck -28.24 152.99 95 43 1910 - 1953
145003B Forest Home Logan -28.20 152.77 175 83 1918 - 2005


Columns: Station ID, Station Name, River Name, Lat (°S), Long (°E), Area (km²), Record Length (years), Period of Record
145005A Avonmore Running Ck -28.30 152.91 89 30 1923 - 1952
145010A 5.8KM Deickmans Br Running Ck -28.25 152.89 128 40 1966 - 2005
145011A Croftby Teviot Brook -28.15 152.57 83 38 1967 - 2005
145012A The Overflow Teviot Brook -27.93 152.86 503 39 1967 - 2005
145018A Up Stream Maroon Dam Burnett Ck -28.22 152.61 82 32 1971 - 2005
145020A Rathdowney Logan -28.22 152.87 533 32 1974 - 2005
145101D Lumeah Number 2 Albert -28.06 153.04 169 43 1911 - 1953
145102B Bromfleet Albert -27.91 153.11 544 85 1919 - 2005
145103A Good Dam Site Cainbable Ck -28.09 153.08 42 32 1963 - 2004
145107A Main Rd Br Canungra Ck -28.00 153.16 101 32 1974 - 2005
146002B Glenhurst Nerang -28.00 153.31 241 85 1920 - 2005
146003B Camberra Number 2 Currumbin Ck -28.20 153.41 24 55 1928 - 1982
146004A Neranwood Little Nerang Ck -28.13 153.29 40 35 1927 - 1961
146005A Chippendale Tallebudgera Ck -28.16 153.40 55 27 1927 - 1953
146010A Army Camp Coomera -28.03 153.19 88 43 1963 - 2005
146012A Nicolls Br Currumbin Ck -28.18 153.42 30 31 1971 - 2005
146014A Beechmont Back Ck -28.12 153.19 7 31 1972 - 2004
146095A Tallebudgera Ck Rd Tallebudgera Ck -28.15 153.40 56 29 1971 - 2004
416303C Clearview Pike Ck -28.81 151.52 950 48 1935 - 1987
416305B Beebo Brush Ck -28.69 150.98 335 36 1969 - 2005
416312A Texas Oaky Ck -28.81 151.15 422 35 1970 - 2004
416404C Terraine Bracker Ck -28.49 151.28 685 45 1953 - 2001
416410A Barongarook Macintyre Brook -28.44 151.46 465 32 1968 - 2001
422210A Tabers Bungil Ck -26.41 148.78 710 32 1967 - 2004

Station ID | Station Name | River Name | Lat (°S) | Long (°E) | Area (km2) | Record Length (years) | Period of Record
422301A | Long Crossing | Condamine | -28.32 | 152.34 | 85 | 66 | 1912 - 1977
422302A | Killarney | Spring Ck | -28.35 | 152.34 | 21 | 45 | 1910 - 1954
422303A | Killarney | Spring Ck South | -28.36 | 152.34 | 10 | 45 | 1910 - 1954
422304A | Elbow Valley | Condamine | -28.37 | 152.16 | 275 | 56 | 1916 - 1971
422306A | Swanfels | Swan Ck | -28.16 | 152.28 | 83 | 85 | 1920 - 2004
422307A | Kings Ck | Kings Ck | -27.90 | 151.91 | 334 | 42 | 1921 - 1966
422313B | Emu Vale | Emu Ck | -28.23 | 152.23 | 148 | 58 | 1948 - 2005
422317B | Rocky Pond | Glengallan Ck | -28.13 | 151.92 | 520 | 38 | 1954 - 1991
422319B | Allora | Dalrymple Ck | -28.04 | 152.01 | 246 | 36 | 1969 - 2005
422321B | Killarney | Spring Ck | -28.35 | 152.33 | 35 | 45 | 1960 - 2004
422326A | Cranley | Gowrie Ck | -27.52 | 151.94 | 47 | 34 | 1970 - 2004
422332B | Oakey | Gowrie Ck | -27.47 | 151.74 | 142 | 25 | 1969 - 2006
422334A | Aides Br | Kings Ck | -27.93 | 151.86 | 516 | 35 | 1970 - 2004
422338A | Leyburn | Canal Ck | -28.03 | 151.59 | 395 | 27 | 1975 - 2004
422341A | Brosnans Barn | Condamine | -28.33 | 152.31 | 92 | 29 | 1977 - 2005
422394A | Elbow Valley | Condamine | -28.37 | 152.14 | 325 | 32 | 1973 - 2004
913010A | 16 Mile Waterhole | Fiery Ck | -18.88 | 139.36 | 722 | 29 | 1973 - 2004
915011A | Mt Emu Plains | Porcupine Ck | -20.18 | 144.52 | 540 | 31 | 1972 - 2004
915206A | Railway Crossing | Dugald | -20.20 | 140.22 | 660 | 31 | 1970 - 2004
915211A | Landsborough HWY | Williams | -20.87 | 140.83 | 415 | 31 | 1971 - 2003
917104A | Roseglen | Etheridge | -18.31 | 143.58 | 867 | 32 | 1967 - 2005
917107A | Mount Surprise | Elizabeth Ck | -18.13 | 144.31 | 651 | 32 | 1969 - 2002
919005A | Fonthill | Rifle Ck | -16.68 | 145.23 | 366 | 32 | 1969 - 2004
919013A | Mulligan HWY | McLeod | -16.50 | 145.00 | 532 | 25 | 1973 - 2005

Station ID | Station Name | River Name | Lat (°S) | Long (°E) | Area (km2) | Record Length (years) | Period of Record
919201A | Goldfields | Palmer | -16.11 | 144.78 | 533 | 30 | 1968 - 2004
919305B | Nullinga | Walsh | -17.18 | 145.30 | 326 | 35 | 1957 - 1991
922101B | Racecourse | Coen | -13.96 | 143.17 | 172 | 32 | 1968 - 2004
926002A | Dougs Pad | Dulhunty | -11.83 | 142.42 | 332 | 30 | 1971 - 2004

APPENDIX B

Additional results on training and validation of RFFA models

Figure B.1 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q2 (training data set)

Figure B.2 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q5 (training data set)

Figure B.3 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q10 (training data set)

Figure B.4 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q50 (training data set)

Figure B.5 Comparison of observed and predicted flood quantiles for ANN based RFFA model for Q100 (training data set)

Figure B.6 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q2 (training data set)

Figure B.7 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q5 (training data set)

Figure B.8 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q10 (training data set)

Figure B.9 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q50 (training data set)

Figure B.10 Comparison of observed and predicted flood quantiles for GAANN based RFFA model for Q100 (training data set)

Figure B.11 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q2 (training data set)

Figure B.12 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q5 (training data set)

Figure B.13 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q10 (training data set)

Figure B.14 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q50 (training data set)

Figure B.15 Comparison of observed and predicted flood quantiles for GEP based RFFA model for Q100 (training data set)

Figure B.16 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q2 (training data set)

Figure B.17 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q5 (training data set)

Figure B.18 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q10 (training data set)

Figure B.19 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q50 (training data set)

Figure B.20 Comparison of observed and predicted flood quantiles for CANFIS based RFFA model for Q100 (training data set)

Figure B.21 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q2

Figure B.22 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q5

Figure B.23 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q10

Figure B.24 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q50

Figure B.25 Comparison of observed and predicted flood quantiles (validation) for ANN based RFFA model for Q100

Figure B.26 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q2

Figure B.27 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q5

[Plot: Qpred (m3/sec) against Qobs (m3/sec), logarithmic axes from 1 to 10000]

Figure B.28 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q10

Figure B.29 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q50

[Plot: Qpred (m3/sec) against Qobs (m3/sec), logarithmic axes from 1 to 10000]

Figure B.30 Comparison of observed and predicted flood quantiles (validation) for GAANN based RFFA model for Q100

Figure B.31 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q2

Figure B.32 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q5

Figure B.33 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q10

Figure B.34 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q50

Figure B.35 Comparison of observed and predicted flood quantiles (validation) for GEP based RFFA model for Q100

Figure B.36 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q2

Figure B.37 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q5

Figure B.38 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q10

Figure B.39 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q50

Figure B.40 Comparison of observed and predicted flood quantiles (validation) for CANFIS based RFFA model for Q100

Figure B.41 Regression plot comparing the training and validation of the ANN based RFFA model for Q2

Figure B.42 Regression plot comparing the training and validation of the ANN based RFFA model for Q5

Figure B.43 Regression plot comparing the training and validation of the ANN based RFFA model for Q10

Figure B.44 Regression plot comparing the training and validation of the ANN based RFFA model for Q50

Figure B.45 Regression plot comparing the training and validation of the ANN based RFFA model for Q100

Figure B.46 Section of dendrogram using average linkage between groups

Figure B.47 Section of dendrogram using average linkage between groups
