
M.Eng(Hons) Thesis

Design Flood Estimation for Ungauged Catchments in Victoria: Ordinary and Generalised Least Squares Methods Compared

By Khaled Haddad, BEng (Hons) Civil, Student ID: 98072705

Principal Supervisor: Dr Ataur Rahman
Associate Supervisors: Associate Professor Surendra Shrestha and Associate Professor Chin Leo

School of Engineering, University of Western Sydney, February 2008


STATEMENT OF AUTHENTICATION

I hereby declare that the work presented in this thesis is solely my own work and that to the best of my knowledge the work is original except where otherwise indicated by references to other authors or works. No part of this thesis has been submitted for any other degree or diploma.

Signature: ………………………….. Date: 23/06/2008


ACKNOWLEDGEMENTS

The author would like to gratefully acknowledge:
• His principal supervisor, Dr Ataur Rahman, for his excellent guidance, inspiration, invaluable suggestions, timely advice and willingness to help at any time throughout the course of this research.
• Mr Erwin Weinmann of Monash University for his constructive comments, valuable guidance, advice and encouragement throughout this research.
• His associate supervisors, Associate Professor Surendra Shrestha and Associate Professor Chin Leo, for their encouragement and advice.
• The Department of Sustainability and Environment and Thiess Services Victoria for providing the streamflow data.
• The Bureau of Meteorology for providing climatic data CDs.
• Professor George Kuczera, Associate Professor James Ball, Mr Mark Babister, Mr Robert French and Dr William Weeks for their suggestions and input to the project.
• Mr Wilfredo Caballero for providing some of the data used in this study, which were abstracted as part of his BEng (Honours) thesis at the University of Western Sydney.
• His parents and family for being very encouraging and proud of what he has achieved.


ABSTRACT

Design flood estimation in small to medium sized ungauged catchments is frequently required in hydrologic analysis and design and is of notable economic significance. For this task Australian Rainfall and Runoff (ARR) 1987, the national guideline for design flood estimation, recommends the Probabilistic Rational Method (PRM) for general use in South-East Australia. However, recent developments have indicated significant potential to provide more meaningful and accurate design flood estimation in small to medium sized ungauged catchments. These include the L moments based index flood method and a range of quantile regression techniques.

This thesis focuses on the quantile regression techniques and compares two methods: the ordinary least squares (OLS) and generalised least squares (GLS) based regression techniques. It also makes a comparison with the currently recommended Probabilistic Rational Method. The OLS model is commonly used by hydrologists to estimate the parameters of regional hydrological models. However, recent studies have indicated that the resulting parameter estimates are often unstable and that the OLS procedure often violates the assumption of homoskedasticity. The GLS based regression procedure accounts for the varying sampling error, correlation between concurrent flows, correlation between the residuals and the fitted quantiles, and model error in the regional model; one would therefore expect more accurate flood quantile estimation from this method.

This thesis uses data from 133 catchments in the state of Victoria to develop prediction equations involving readily obtainable catchment characteristics data. The GLS regression procedure is explored further by carrying out a 4-stage generalised least squares analysis where the development of the prediction equations is based on relating hydrological statistics such as mean flows, standard deviations, skewness and flow quantiles to catchment characteristics.

This study also presents the validation of the two techniques by carrying out a split-sample validation on a set of independent test catchments. The PRM is also tested by deriving an updated PRM technique with the new data set and carrying out a split-sample validation on the test catchments.

The results show that GLS based regression provides more accurate design flood estimates than the OLS regression procedure and the PRM. Based on the average variance of prediction, the standard error of estimate, traditional and new evaluation statistics, rankings and the median relative error values, the GLS method provided more accurate flood frequency estimates, especially for the smaller catchments in the range of 1–300 km². The predictive ability of the GLS model is also evident in the regression coefficient values when compared with the OLS method. However, the performance of the PRM, particularly for the larger catchments, also appears to be satisfactory.


TABLE OF CONTENTS

Page

Front Cover i
Statement ii
Acknowledgements iii
Abstract iv
Table of Contents vi
List of Figures xiii
List of Tables xvii
List of Notations xviii
List of Abbreviations xx

CHAPTER 1: INTRODUCTION 1
1.1 Background to the proposed research 1
1.2 The need for this research 3
1.3 Objectives of this research 5
1.4 Outline of the thesis 5

CHAPTER 2: REVIEW OF REGIONAL FLOOD FREQUENCY ESTIMATION TECHNIQUES 9
2.1 General 9
2.2 Basic Issues 9
2.2.1 Flood Frequency Analysis 9
2.2.2 Regional Flood Frequency Analysis 11
2.2.3 Regional Homogeneity 11
2.2.4 Inter-Site Dependence 12
2.2.5 Distributional Choices 13
2.3 Methods for Identification of Homogenous Regions 14
2.4 Regional Flood Frequency Analysis Methods 15


2.4.1 Index Flood Method 15
2.4.2 Station Year Method 19
2.4.3 Bayesian Method 19
2.4.4 Probabilistic Rational Method 19
2.5 Quantile Regression Technique 21
2.5.1 Introduction 21
2.5.2 Generalised Least Squares and Weighted Least Squares 23
2.5.3 Application of Generalised Least Squares Regression 24
2.5.4 An Operational GLS Model for Hydrological Regression 25
2.5.5 Operational Bayesian GLS Regression for Regional Hydrological Analysis 25
2.5.6 The Use of GLS Regression in Regional Hydrologic Analysis 26
2.5.7 Application of Generalised Least Squares to Low-Flow Frequency Analysis 26
2.6 Quantile Regression Technique in Australia 28
2.7 Summary 30

CHAPTER 3: METHODOLOGY OF STATISTICAL TECHNIQUES USED IN THIS REGIONAL FLOOD FREQUENCY ANALYSIS STUDY 32
3.1 General 32
3.2 Methods for Assessing the Degree of Homogeneity of a Region 34
3.2.1 L-moments 35
3.2.2 Tests Based on L-moments (Goodness of Fit Tests) 35
3.3 Regional Homogeneity Tests 36
3.4 At-Site Flood Frequency Analysis 38
3.4.1 FLIKE 39
3.4.2 Log-Pearson Type 3 Distribution 40
3.5 Multiple Regression Analysis 41
3.5.1 Ordinary Least Squares 42
3.5.2 Dealing with Assumption Violations of Ordinary Least Squares 43


3.5.3 The Basic Problem – Generalised Least Squares 43
3.5.4 Weighted Least Squares 46
3.5.5 Dealing with Data Problems 48
3.6 Operational Generalised Least Squares – 4 Stage Generalised Least Squares Analysis 49
3.6.1 Regional Skew Analysis 53
3.6.2 Variance of Sample Estimators by Bootstrap Var(gi) 55
3.7 Ordinary Least Squares – Model Development Techniques Used 56
3.8 Generalised Least Squares – Model Development 59
3.8.1 Regression Model for Skewness 59
3.9 Setting up of the Residual Error Covariance Matrices 60
3.9.1 Regression Model for Standard Deviation 60
3.9.2 Regression Model for Mean 60
3.9.3 GLS Regression Model for the Quantiles 61
3.10 Measures of Model and Prediction Error 61
3.11 Development of the Probabilistic Rational Method 63
3.12 Summary 66

CHAPTER 4: STUDY AREA AND PREPARATION OF STREAMFLOW DATA 68
4.1 General 68
4.2 Study Area 68
4.3 Selection of Initial Candidate Catchments 69
4.4 Filling Missing Records in Annual Maximum Flood Series 71
4.5 Trend Analysis – Mann Kendall Test for Trend and Distribution Free CUSUM Test 74
4.6 Rating Curve Extrapolation Error 78
4.7 Impact of Rating Curve Error on Flood Frequency Analysis 80
4.8 Impacts of Rating Ratio on Flood Frequency Analysis – Sensitivity Analysis 81
4.9 Selected Catchments 82
4.10 Checking for Outliers in the Annual Maximum Series 84


4.11 Selection of Test Catchments 87
4.12 Summary 87

CHAPTER 5: SELECTION AND ESTIMATION OF CLIMATIC AND CATCHMENT CHARACTERISTICS 88

5.1 General 88
5.2 Categories of Catchment Characteristics Considered 89
5.2.1 Climatic Characteristics 89
5.2.2 Morphometric Characteristics 89
5.2.3 Catchment Cover and Land Use Characteristics 89
5.2.4 Geology 90
5.3 Selection Criteria 90
5.4 Catchment Characteristics Considered for the Proposed Research 91
5.4.1 Rainfall Intensity 91
5.4.2 Mean Annual Rainfall 92
5.4.3 Mean Annual Evapotranspiration 93
5.4.4 Catchment Area 93
5.4.5 Slope S1085 93
5.4.6 Stream Density 94
5.4.7 Fraction Forest Area 94
5.4.8 Quaternary Sediment Area 94
5.5 Exploratory Data Analysis – Transformations 100
5.6 Exploratory Data Analysis – Correlation Matrix 111
5.7 Summary 112

CHAPTER 6: SEARCHING FOR HOMOGENEOUS REGIONS 113
6.1 Selection Criteria 113
6.2 Formation of Homogeneous Groups 113
6.3 Measuring the Degree of Heterogeneity in a Group 114


6.4 Forming One Homogenous Group 115
6.5 Forming Two Homogenous Groups 115
6.5.1 Homogenous Regions Based on the North of the Great Dividing Range 116
6.5.2 Homogenous Regions Based on the South of the Great Dividing Range 116
6.6 Forming Three Homogenous Groups 117
6.6.1 Homogenous Regions Based on Eastern Victoria 118
6.6.2 Homogenous Regions Based on Northwest Victoria 118
6.6.3 Homogenous Regions Based on Southwest Victoria 118
6.7 Forming Four Homogenous Groups 119
6.7.1 Homogenous Regions Based on Northwest Victoria 119
6.7.2 Homogenous Regions Based on Southwest Victoria 120
6.7.3 Homogenous Regions Based on Northeast Victoria 120
6.7.4 Homogenous Regions Based on Southeast Victoria 120
6.8 Forming Homogenous Groups Based on Catchment Size 120
6.9 Summary 123

CHAPTER 7: DEVELOPMENT OF THE PROBABILISTIC RATIONAL METHOD 124

7.1 General 124

7.2 Development of Runoff Coefficient C10 Isopleth Maps 124
7.3 Development of Prediction Equation for C10 125
7.4 Derivation of Frequency Factors 129
7.5 Summary 130

CHAPTER 8: DEVELOPMENT OF PREDICTION EQUATIONS BY QUANTILE REGRESSION TECHNIQUE 131
8.1 Ordinary Least Squares – Procedures Adopted 131
8.2 Log Transformation of Data 132
8.3 Development of Prediction Equations – OLS 132


8.4 Q10 Model 132
8.5 Discussion 134
8.6 Results for Weighted Least Squares Regression Model for Skewness 138
8.7 Development of Prediction Equations Using Generalised Least Squares 138
8.8 Q10 Model 138
8.9 Discussion 140
8.10 Summary 142

CHAPTER 9: VALIDATION OF THE DEVELOPED PREDICTION EQUATIONS 144

9.1 Introduction 144
9.2 At-Site Flood Frequency Analysis 145
9.3 Quantile Regression Technique 145
9.4 Redeveloped Probabilistic Rational Method 146
9.5 Evaluation of Results 146
9.5.1 Comparison of Techniques for Various ARIs 147
9.5.2 Comparison of Techniques in Each Test Catchment 154
9.5.3 Comparison of Methods Using Median Errors and Boxplots 155
9.5.3.1 Median Relative Errors and Boxplots 155
9.5.4 Comparison of Methods on At-Site Flood Frequency Plots 160
9.5.5 Strengths of the OLS and GLS Models 162
9.6 Summary 162

CHAPTER 10: CONCLUSIONS 164
10.1 Introduction 164
10.2 Overview of the Study 164
10.3 Conclusions 166
10.4 Suggestions for Future Research 168

REFERENCES 170


APPENDIX A 177
APPENDIX B 180
APPENDIX C 181
APPENDIX D 183
APPENDIX E 191
APPENDIX F 200
APPENDIX G 219


LIST OF FIGURES

Page
Figure 3.1 Flow chart of statistical techniques/methods used 33
Figure 3.2 Relationship between the cross correlations among annual peaks and distance in South-East Australia 49
Figure 4.1 Study area in the state of Victoria in Australia highlighted 68
Figure 4.2 Geographical distributions of the candidate study catchments 70
Figure 4.3 Time series graph showing significant trends after 1995 77
Figure 4.4 CUSUM test plot showing significant trends after 1995 78
Figure 4.5 Plot of rating ratios for Station 222202 79
Figure 4.6 Histogram of rating ratios (RR) of annual maximum flood data in Victoria (stations with record lengths > 25 years) 80
Figure 4.7 Impact of considering rating curve error in flood frequency analysis 82
Figure 4.8 Distribution of record lengths of the selected 133 catchments 83
Figure 4.9 Geographical distributions of the selected 133 catchments 83
Figure 4.10 Distribution of catchment areas of the selected 133 catchments 84
Figure 4.11 The high outlier detected for station 405234 86
Figure 5.1 Histogram of Catchment Area 96
Figure 5.2 Histogram of Rainfall Intensity (I12:2) 96
Figure 5.3 Histogram of Mean Annual Rainfall 97
Figure 5.4 Histogram of Mean Annual Evapotranspiration 97
Figure 5.5 Histogram of Forest % 98
Figure 5.6 Histogram of Slope S1085 98
Figure 5.7 Histogram of Stream Density 99
Figure 5.8 Histogram of Quaternary Sediment Area 99
Figure 5.9 Variable Area before Transformation 103
Figure 5.10 Variable Area after Transformation 103
Figure 5.11 Variable I12:2 before Transformation 104
Figure 5.12 Variable I12:2 after Transformation 104
Figure 5.13 Variable rain before Transformation 105
Figure 5.14 Variable rain after Transformation 105
Figure 5.15 Variable evap before Transformation 106
Figure 5.16 Variable evap after Transformation 106
Figure 5.17 Variable forest before Transformation 107
Figure 5.18 Variable forest after Transformation 107
Figure 5.19 Variable S1085 before Transformation 108
Figure 5.20 Variable S1085 after Transformation 108
Figure 5.21 Variable sden before Transformation 109
Figure 5.22 Variable sden after Transformation 109
Figure 5.23 Variable qsa before Transformation 110
Figure 5.24 Variable qsa after Transformation 110
Figure 6.1 Victoria – Regions based on the Great Dividing Range 116
Figure 6.2 Victoria – Regions based on the East, Northwest and Southwest Victoria 117
Figure 6.3 Victoria – Regions based on the Northeast, Southeast, Northwest and Southwest Victoria 119
Figure 7.1 New C10 contour map for the PRM method in Victoria 127

Figure 8.1 – Histogram of Standardized Residuals for log Q10 133

Figure 8.2 – Normal Probability Plot of the Standardised Residuals for log Q10 133

Figure 8.3 – Standardised Residuals vs. Predicted Values for log Q10 133

Figure 8.4 – Histogram of Standardized Residuals (GLS) for log Q10 139

Figure 8.5 – Standardised Residuals vs. Predicted Values (GLS) for log Q10 139

Figure 9.1 – Comparison of flood quantiles Q1.25 147

Figure 9.2 – Comparison of flood quantiles Q2 148

Figure 9.3 – Comparison of flood quantiles Q5 148

Figure 9.4 – Comparison of flood quantiles Q10 149

Figure 9.5 – Comparison of flood quantiles Q20 149

Figure 9.6 – Comparison of flood quantiles Q50 150

Figure 9.7 – Comparison of flood quantiles Q100 150

Figure 9.8 – Comparison of flood quantiles Q200 151

Figure 9.9 Box plot of relative errors associated with the PRM 157
Figure 9.10 Box plots of relative errors in design flood estimates from the OLS and GLS methods 159
Figure 9.11 Typical comparison of OLS and GLS estimates for a test catchment (area < 300 km²) 161
Figure 9.12 Typical comparison of PRM estimates plotted on FFA plot 161
Figure B-1 CUSUM test plot showing significant trends after 1995 180
Figure B-2 Time series graph showing trends 180
Figure E-1 Plot of C10 coefficient, Estimation Set Observed and Predicted Values 199
Figure G-1 Comparison of FFA plots for different Flood Estimation Methods for Station 221207 220
Figure G-2 Comparison of FFA plots for different Flood Estimation Methods for Station 221210 220
Figure G-3 Comparison of FFA plots for different Flood Estimation Methods for Station 221212 221
Figure G-4 Comparison of FFA plots for different Flood Estimation Methods for Station 223202 221
Figure G-5 Comparison of FFA plots for different Flood Estimation Methods for Station 225224 222
Figure G-6 Comparison of FFA plots for different Flood Estimation Methods for Station 226209 222
Figure G-7 Comparison of FFA plots for different Flood Estimation Methods for Station 227200 223
Figure G-8 Comparison of FFA plots for different Flood Estimation Methods for Station 227210 223
Figure G-9 Comparison of FFA plots for different Flood Estimation Methods for Station 227219 224
Figure G-10 Comparison of FFA plots for different Flood Estimation Methods for Station 229218 224
Figure G-11 Comparison of FFA plots for different Flood Estimation Methods for Station 230204 225
Figure G-12 Comparison of FFA plots for different Flood Estimation Methods for Station 230213 225
Figure G-13 Comparison of FFA plots for different Flood Estimation Methods for Station 231231 226
Figure G-14 Comparison of FFA plots for different Flood Estimation Methods for Station 235205 226
Figure G-15 Comparison of FFA plots for different Flood Estimation Methods for Station 401210 227
Figure G-16 Comparison of FFA plots for different Flood Estimation Methods for Station 402217 227
Figure G-17 Comparison of FFA plots for different Flood Estimation Methods for Station 405229 228
Figure G-18 Comparison of FFA plots for different Flood Estimation Methods for Station 406200 228
Figure G-19 Comparison of FFA plots for different Flood Estimation Methods for Station 406213 229
Figure G-20 Comparison of FFA plots for different Flood Estimation Methods for Station 415238 229


LIST OF TABLES

Page
Table 4.1 In-filling annual maximum flood series (Method 1) 71
Table 4.2 Important Information on Method 2 of Gap Filling 73
Table 4.3 List of Test Catchments 87
Table 5.1 Summary statistics of the Catchment Characteristics data 95
Table 5.2 Test for normality of the catchment characteristics variables 101
Table 5.3 Test for normality of the log transformed catchment characteristics variables 102
Table 5.4 Correlation matrix of the transformed variables 111
Table 5.5 Correlation matrix of the log transformed variables 111
Table 6.1 Results of Homogeneity Test for the Hypothesised Regions 121

Table 7.1 Some important statistics relating to C10 prediction equation 126

Table 7.2 Interpolated values and regression estimate values for C10 of the test catchments 128
Table 7.3 Frequency factors for the new PRM for Victoria 129
Table 8.1 Important statistics for the developed prediction equations 134
Table 8.2 Relative importance of the predictor variables 137
Table 8.3 Important statistics for the developed GLS prediction equations 140
Table 9.1 Model comparison summary for various ARIs 152
Table 9.2 Comparison of Techniques in each test catchment 155
Table 9.3 Median relative errors (%) for quantile estimates from all methods 156
Table 9.4 Some important statistics of relative error values 156
Table A-1 Catchments and Streamflow Data 178
Table C-1 Stations in Each Hypothesised Homogeneous Region 182
Table D-1 Catchment Characteristics 184
Table E-1 Flood Frequency Quantiles for Estimation Set 192
Table E-2 Flood Frequency Quantiles for Validation Set 194
Table E-3 C10 Estimation Set 194
Table E-4 C-coefficient, Estimation Set and Frequency Factors 195
Table E-5 C10 Estimation Set: Observed & Predicted Values 198


NOTATIONS

A  catchment area
a, b  constants
Cs  coefficient of skewness
Cv  coefficient of variation
D  discordancy measure
e  error term in regression analysis
F(QT)  probability of non-exceedance of the flood quantile QT
G  regional mean skewness coefficient
g  n x 1 vector of true skews
H  heterogeneity measure
IN  identity matrix
K  number of parameters in regression model
L  total record lengths of all sites in a group
l  sample L moment
LCS  L coefficient of skewness
LCV  L coefficient of variation
M  total number of bootstrap samples generated
N  total number of streamflow records used in flood frequency analysis
n  total number of data sets in regression analysis
Nsim  number of simulated regions
p  probability
QT  flood quantile having return period of T years
R2  coefficient of determination used in OLS regression
T  return period
tc  time of concentration
Var()  variance
X  n x k matrix of basin characteristics
β  k x 1 vector of regression coefficients
β̂*  sample estimate of β with Λ known
β̂  sample estimate of β with Λ estimated
Gi  regional sample skew value at site i
ĝ  n x 1 vector of sample skews
si  sample standard deviation of logs of annual maxima at station i
Y  vector of dependent variables in regression model
xo  row vector of basin characteristics at site 0
γ2  model error variance
γ̂2  sample estimate of model error variance
Λ  covariance matrix of regression errors
Λ̂  data-based estimate of Λ
σ2  residual variance from OLS regression
λ  population L moment
Q̄  mean annual flood
µi  population mean at site i
σi  population standard deviation at site i
q̂i  sample mean of logs of annual maxima at station i
θ  parameter of probability distribution
KT  frequency factor for return period T in log-Pearson III flood frequency analysis
ρε∂  correlation between residuals in GLS regression
εi  residual error term associated with regression of the mean
∂i  residual error term associated with regression of the standard deviation
Σ̂  sampling error covariance matrix
ρ̂ε∂  estimated correlation between residuals
ρ̂ij  estimated correlation/distance relationship between stations
ρij  correlation/distance relationship between stations
R2  pseudo coefficient of determination used in GLS regression


ABBREVIATIONS

'A'  Anderson-Darling statistic for normality
ARR-FLIKE  Australian Rainfall and Runoff flood frequency analysis software
AUSIFD  Australian Intensity Frequency Duration software
AEP  Annual Exceedance Probability
Aust  Australia
ARR  Australian Rainfall and Runoff
ARI  Average Recurrence Interval
ASEV  Average Sampling Error Variance
BOM  Bureau of Meteorology
Area  Catchment area (km2)
CD  Compact disc
dtm  Digital terrain model
eni  Equivalent years of record length
FFA  Flood Frequency Analysis
FFY  Flood frequency factors
forest  Fraction of basin covered by medium to dense forest
qsa  Fraction quaternary sediment area
GEV  Generalised Extreme Value distribution
GLS  Generalised Least Squares
IFM  Index Flood Method
ID  Instantaneous Discharge
IFD  Intensity Frequency Duration
IM  Monthly Instantaneous Maximum Data
MMD  Monthly Maximum Mean Daily Data
LP3  Log-Pearson Type 3 distribution
MLE  Maximum Likelihood Estimator
evap  Mean annual evapotranspiration (mm)
rain  Mean annual rainfall (mm)
NERC  Natural Environment Research Council (UK)
NCWE  National Committee on Water Engineering
OLS  Ordinary Least Squares
PRM  Probabilistic Rational Method
PD, POT  Partial duration series / peaks over threshold
QRT  Quantile Regression Technique
RR  Rating Ratio
I12:2  Rainfall intensity of 12-hour duration and 2-year average recurrence interval (mm/hr)
RFFA  Regional Flood Frequency Analysis
C  Runoff coefficient used in the Rational Method
sden  Stream density (km/km2)
S1085  Slope of central 75% of mainstream (m/km)
SEE  Standard error of estimate
SEP  Standard error of prediction
r  Filliben's probability plot correlation coefficient
TS  Time Series
USGS  United States Geological Survey
VIF  Variance Inflation Factor
VIC  Victoria
Vol  Volume
VP  Average Variance of Prediction
WLS  Weighted Least Squares


CHAPTER 1: INTRODUCTION

This thesis is concerned with design flood estimation in ungauged catchments using a regional approach. The aim of this regional approach is to account for the errors often found in hydrological data using a Quantile Regression Technique (QRT) and hence provide reasonably accurate design flood estimates at ungauged sites. This study focuses on (1) the current regional flood estimation procedures and their merits and demerits; (2) collation of streamflow data for regional flood estimation for the state of Victoria; and (3) development of regional prediction equations based on the QRT and their comparison with the currently recommended Probabilistic Rational Method (PRM) for South-East Australia (ARR, IEAUST, 1987).

1.1 BACKGROUND TO THE RESEARCH

Regional flood frequency analysis serves two purposes. For sites where streamflow data are not available, the analysis is based on regional data (Cunnane, 1989). For sites with available data, the joint use of data measured at the site, called at-site data, and regional data from a number of stations in a region provides sufficient information to enable a probability distribution to be used with greater reliability. This type of analysis represents a substitution of space for time, where data from different locations in a region are used to compensate for short records at a single site (Stedinger et al., 1993).

The availability of streamflow data is an important aspect of any regional flood frequency analysis. The estimation of the probability of occurrence of extreme floods is an extrapolation based on limited data. Thus, the larger the database, the more accurate the estimates will be. From a statistical point of view, estimation from a small sample may give unreasonable or physically unrealistic parameter estimates, especially for probability distributions with a large number of parameters (three or more), because the large sampling variability associated with small samples can distort the estimates. In practice, however, data may be limited or in some cases may not be available for a site. In such situations, regional analysis is most useful.



Flood estimation in ungauged catchments is a common problem in hydrological design. Several methods are frequently adopted for this task, including the various forms of the Index Flood Method, the Rational Method and the Quantile Regression Technique (QRT). In South-East Australia, the PRM is recommended for general use in Australian Rainfall and Runoff (ARR), mainly because of its simplicity (I.E. Aust., 1987). The essential component of this method is a dimensionless runoff coefficient which (in ARR) is assumed to vary smoothly over geographical space, an assumption that may not be satisfied in many cases because two nearby catchments can exhibit quite different physical features. Also, the values of these runoff coefficients were estimated in ARR using conventional moment estimates with flow records of limited length (some sites had only 10 years of record). These values were therefore affected by considerable sampling variability, which can introduce significant bias and uncertainty into design floods estimated using the PRM. Criticism has also been directed at the way the runoff coefficients are mapped, which again rests on the assumption of geographical contiguity described above.

Rahman (2005) presented a QRT using ordinary least squares (OLS) for South-East Australia which can provide reasonably accurate design flood estimates for ungauged catchments in this region. The main focus of this thesis is to extend the QRT using a generalised least squares (GLS) regression technique that accounts for the cross-correlation and varying record lengths of the annual maximum flood series at different sites. This is done by developing a weighting matrix that includes both sampling and model errors, thus accounting for the heteroskedastic structure of the residual errors. Stedinger and Tasker (1985) showed, in a Monte Carlo simulation, that the GLS method provided regression parameters with smaller mean square errors than the competing OLS model and a more accurate estimate of the regional model error.
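To make the weighting idea concrete, the following is a minimal sketch (not the thesis routines) of the GLS estimator in the form proposed by Stedinger and Tasker (1985), written in R, the language used for the regression routines in this study. The error covariance matrix Lambda is assumed to combine a model error variance with the sampling error covariances of the at-site flood statistics; all inputs shown are hypothetical.

```r
# Minimal GLS sketch: beta_hat = (X' L^-1 X)^-1 X' L^-1 y, where L (Lambda) is the
# assumed error covariance matrix combining model error and sampling error.
gls_estimate <- function(X, y, Lambda) {
  Li   <- solve(Lambda)                       # inverse of the error covariance matrix
  XtLi <- t(X) %*% Li
  beta_hat <- solve(XtLi %*% X, XtLi %*% y)   # GLS regression coefficients
  cov_beta <- solve(XtLi %*% X)               # covariance of the estimated coefficients
  list(coefficients = beta_hat, covariance = cov_beta)
}

# Hypothetical example: 5 sites, intercept plus one catchment characteristic
X <- cbind(1, log10(c(150, 35, 400, 90, 210)))                     # e.g. log10(area)
y <- log10(c(120, 45, 310, 80, 150))                               # e.g. log10(Q10)
Lambda <- 0.02 * diag(5) + diag(c(0.01, 0.03, 0.02, 0.04, 0.01))   # model + sampling error
gls_estimate(X, y, Lambda)
```

When the sampling error covariances are set to zero and the model error variance is constant across sites, this expression reduces to the ordinary least squares estimator.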

The OLS estimator is generally used by hydrologists to estimate the parameters (β) of regional hydrological models. But for the OLS model to be statistically efficient and robust, the annual maximum flood series in the region must be uncorrelated, all the sites in the region should have equal record lengths, and all estimates of the T-year events should have equal variance. Since annual maximum flow data in a region do not generally satisfy these criteria, the assumption that the model residual errors in OLS are homoskedastic is violated and the OLS approach can provide distorted estimates of the model error and of the precision with which the regression model's parameters are being estimated (Stedinger and Tasker, 1985).

To overcome these problems with OLS, Stedinger and Tasker (1985) proposed a GLS procedure which can result in remarkable improvements in the precision with which the parameters of regional hydrologic regression models can be estimated, particularly when record lengths vary widely from site to site. In the GLS model, the assumptions of equal variance of the T-year events and of zero cross-correlation between concurrent flows are relaxed.

The GLS approach has not been applied in Regional Flood Frequency Analysis (RFFA) in Australia before. The principal objective of this thesis is to apply the QRT to the State of Victoria in Australia.

The main thrust of this method is to develop reasonably accurate and reliable prediction equations which can be used to predict design floods for ungauged catchments in Victoria. This approach involves using a statistically sound method which does not make any assumptions about runoff coefficients or geographical contiguity as with the currently proposed method (i.e. PRM) for Victoria.

If this study shows positive results for the study area, recommendations will be made to test the method in other Australian states and to consider replacing the current methods in ARR with the QRT for design flood estimation in small to medium sized ungauged catchments.

1.2 THE NEED FOR THIS RESEARCH

Australia is a large continent with many streams; many of these streams are ungauged or have little streamflow data. Of the 12 drainage divisions in Australia, seven do not have streamflow data extending 20 or more years (Rahman, 1997). Reasonably accurate design flood estimation methods for ungauged catchments are therefore highly important. The sizing of minor hydraulic structures such as culverts and bridges at sites with little flood information is a common problem faced by practising engineers. This problem is not of small scope, with the cost of these structures estimated at about $300 million per annum in Australia (Bates, 1994).

Australian Rainfall and Runoff, Chapter 5 (I.E. Aust., 1987) states that almost 50% of Australia's annual expenditure on projects requiring flood estimation relates to small to medium sized ungauged catchments. The smaller catchments typically have an upper limit of about 25 km2, while medium sized catchments have an upper limit of about 1000 km2.

Although many RFFA techniques have been proposed, tested and used over many decades, it is well understood amongst researchers that some of the currently recommended approaches are not based on hydrologically and statistically meaningful rationale. As there are flaws in these empirical approaches, further research is needed to find more suitable alternatives. Currently, within Australia and internationally, there is no single universally accepted method; instead, many site-specific approaches have been developed for particular regions of interest.

In Australia, ARR (I.E. Aust., 1987) made recommendations that vary from state to state; these include the PRM, the Index Flood Method (IFM) and a variety of regional approaches such as the synthetic unit hydrograph and the Main Roads methods for the State of Queensland.

Since ARR was published in 1987, there have been notable advances in at-site and regional flood frequency analysis (RFFA) techniques. There is also the added benefit of 20 years of additional streamflow data that can be incorporated into the regional model for more statistically meaningful results.

However, there has been limited research and no revision has been made to the recommendations in Book 4 of ARR, which were published in 1987. In response to this problem and to issues raised by practising hydrologists, the National Committee on Water Engineering (NCWE) and the ARR revision team are currently seeking to revise the chapter on design flood estimation in ungauged catchments. Positive results from this thesis could therefore inform or assist with the revision of ARR for the state of Victoria and possibly Australia wide.

1.3 OBJECTIVES OF THIS RESEARCH


The objectives of this research are summarised as follows:

• Carry out a literature review of the current techniques used for design flood estimation in ungauged catchments in Australia, with particular emphasis on Victoria. Both the advantages and disadvantages of these methods will be discussed. The main part of this literature review covers the QRT using both the OLS and GLS methods, and the PRM.

• Collate sufficient streamflow and catchment/climatic characteristics data to develop and test the QRT and PRM methods for the selected study area (which is Victoria).

• Develop regional prediction equations up to the 100 year average recurrence interval (ARI) which can be used to predict design floods at ungauged catchments in the study area. The applicability of the GLS based QRT will be assessed for Victoria.

• Validate the performance of the developed prediction equations (OLS & GLS) on a set of “independent test catchments” that will be selected randomly and will not be used in the development of prediction equations.

• Redevelop the PRM for the state of Victoria to allow for a "fair and unbiased" comparison of the developed prediction equations and the PRM.

1.4 OUTLINE OF THE THESIS

The investigations involved in the proposed research are presented in 10 chapters, as described below.

Chapter 1 gives a brief introduction to the overall study, highlighting the background considered most relevant and important to the study. The objectives and aims are also introduced and defined.

Chapter 2 provides a literature review on the various aspects of regional flood frequency analysis (RFFA): assumptions in flood frequency analysis; distributional choices; regional homogeneity; the index flood method; the Probabilistic Rational Method and the Quantile Regression Technique (both OLS and GLS). This chapter gives a summary of the associated merits and disadvantages of each approach.

Chapter 3 describes the statistical techniques used in the study e.g. L- Moments, at-site flood frequency analysis adopting a Bayesian methodology, multiple regression using both OLS and GLS and the statistical diagnostic tests to assess performance of developed equations. This chapter also covers in detail each technique, and outlines the underlying assumptions and limitations.

The assembly of streamflow data is an important step in any regional flood frequency analysis. Chapter 4 describes the various aspects of streamflow data collation: selection of the study area; filling of gaps in the annual maximum flood series; testing the data for suspected trends (since flood frequency analysis assumes the data are stationary and homogeneous); examining rating curve errors associated with the annual maximum data (hydrological data often carry notable error, so identifying it is important); and checking for outliers (both low and high outliers may be present in annual maximum flood data and must be identified and treated accordingly). This chapter also presents the final set of stations used in the study after all the above criteria have been satisfied. The independent test catchments are also selected in this chapter.
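As an illustration of the trend testing referred to above (Section 4.5), the following is a minimal sketch of the Mann-Kendall test statistic in R, using the normal approximation without a correction for tied values; the annual maximum series shown is hypothetical and is not from the study data set.

```r
# Minimal sketch of the Mann-Kendall trend test (normal approximation, no tie
# correction). A large |Z| (e.g. > 1.96) suggests a significant monotonic trend.
mann_kendall <- function(x) {
  n <- length(x)
  S <- 0
  for (k in 1:(n - 1)) S <- S + sum(sign(x[(k + 1):n] - x[k]))
  varS <- n * (n - 1) * (2 * n + 5) / 18
  Z <- if (S == 0) 0 else (S - sign(S)) / sqrt(varS)   # continuity correction
  c(S = S, Z = Z, p_value = 2 * (1 - pnorm(abs(Z))))
}

# Hypothetical annual maximum series (m3/s)
mann_kendall(c(150, 95, 210, 120, 180, 240, 160, 300, 220, 350))
```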

Chapter 5 covers the selection of the climatic and physical catchment characteristics that govern flood generation and behaviour. It discusses the issues considered in this selection and examines the best functional form of these variables for use in this RFFA study.

The aim of Chapter 6 is to identify homogeneous regions. The first step is to identify sites that are grossly discordant with the rest of the sites in a particular region. Secondly, an empirical approach for determining homogeneous groups is outlined and applied. This approach is then extended to smaller sets of catchments, based on regional proximity and catchment size, to determine whether any homogeneous groups could be formed. Homogeneity is not a prerequisite for the QRT; however, having homogeneous regions is advantageous as it would reduce the model error inherent in the regional model and would give more accurate prediction equations applicable to the region of interest.

In Chapter 7, the PRM is redeveloped with the current data set, with a view to comparing the performance of the developed prediction equations (from the QRT) with the PRM. Here the contour map of C10 is redeveloped following the ARR methodology.

Chapter 8 covers the development of the regional prediction equations for the study area using both the OLS and GLS approaches. The GLS regression technique is an extension of the OLS multiple linear regression method; it explicitly accounts for sampling variability and model error in the regional model using a residual error covariance weighting matrix. The regression analysis assumes linearity (after transformation) between the dependent variable (the flood quantile for a specific ARI) and the independent variables (climatic and catchment characteristics) and identifies the best functional and relational form between them. This is undertaken using software routines written as part of this study in the statistical language R.
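For orientation, a minimal sketch of the OLS form of such a prediction equation is given below; the data frame, variable names and numbers are hypothetical and are not the thesis data or routines, but the log-log model structure follows the approach described above.

```r
# Minimal OLS sketch of the Quantile Regression Technique: a log-log model
# relating a flood quantile to catchment characteristics (hypothetical data).
dat <- data.frame(
  q10   = c(120, 45, 310, 80, 150, 60, 220, 95),      # 10-year flood quantile (m3/s)
  area  = c(150, 35, 400, 90, 210, 55, 320, 120),     # catchment area (km2)
  i12_2 = c(3.1, 2.8, 3.4, 2.9, 3.2, 2.7, 3.3, 3.0),  # I12:2 rainfall intensity (mm/hr)
  rain  = c(900, 750, 1100, 820, 950, 700, 1050, 870) # mean annual rainfall (mm)
)
ols_fit <- lm(log10(q10) ~ log10(area) + log10(i12_2) + log10(rain), data = dat)
summary(ols_fit)                 # coefficients, R-squared, standard error of estimate

# Prediction for a hypothetical ungauged catchment, back-transformed to m3/s
new_site <- data.frame(area = 75, i12_2 = 3.0, rain = 880)
10^predict(ols_fit, newdata = new_site)
```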

Chapter 9 presents the testing of the developed regionalisation techniques on a set of independent test catchments. The testing covers flood quantile estimation with the OLS and GLS methods, comparison of the OLS and GLS based regional prediction equations with at-site flood frequency analysis, and comparison of the OLS and GLS methods with the PRM. The main issue is how well the regional equations can approximate the at-site flood frequency results.

A summary, conclusion and further recommendations from this research are presented in Chapter 10.

There are seven appendices, as follows. Appendix A presents the stations used in the study along with the period of streamflow record of the selected catchments. Appendix B presents the plots associated with identifying trends in the annual maximum data. Appendix C presents the hypothesised regions used to identify homogeneous regions. Appendix D presents the transformed catchment characteristics and flood quantile data. Appendix E includes various results associated with the Probabilistic Rational Method and flood frequency analysis. Appendix F presents the graphical results associated with the OLS and GLS based regression methods. Appendix G illustrates the flood frequency plots for each of the 20 test catchments for comparison of the RFFA methods.



CHAPTER 2: REVIEW OF REGIONAL FLOOD FREQUENCY ANALYSIS TECHNIQUES

2.1 GENERAL

The aim of this chapter is to review previous studies on regional flood frequency analysis techniques, with particular emphasis on the Quantile Regression Technique. First, basic issues in regional flood frequency analysis such as regional homogeneity, inter-site dependence and distributional choices are reviewed. A brief discussion is then presented on identifying homogeneous regions based on annual maximum series, followed by a review of regional flood frequency analysis methods. Advantages and limitations of the Quantile Regression Technique are also discussed. A summary of the findings from this review is given at the end of the chapter.

2.2 BASIC ISSUES

2.2.1 FLOOD FREQUENCY ANALYSIS

In flood frequency analysis, a unique relationship between a flood magnitude (Q) and the corresponding average recurrence interval (T) is sought. The task is to extract information from a set of streamflow records to estimate the relationship between Q and T. Statistical methods are generally used for flood frequency analysis as quantifying the physical processes that determine a flood magnitude is often associated with a high degree of uncertainty. Three different models may be considered for the purpose of flood frequency analysis. These models are (1) the annual maximum flood series (AM) model, (2) the partial duration series (PD) or the peaks over threshold (POT) model, and (3) the time series (TS) model.

In the AM series model, only the peak flow in each year of record is considered. Most flood frequency analysis techniques are based on the AM series. Flood peaks do not occur with any fixed pattern in time or magnitude, and the time intervals between floods vary. The return period is defined as the average of these inter-event times between flood events (Cunnane, 1989); this is also called the average recurrence interval (ARI).



Large floods naturally have large return periods and vice versa. The definition of the return period need not involve any reference to probability; however, a relationship between the probability of occurrence of a flood and its return period can be established. A given flood q with a return period T may be exceeded once in T years on average. Hence the probability of exceedance is P(QT > q) = 1/T, and the cumulative probability of non-exceedance, F(QT), is given by Equation 2.1:

F(QT) = P(QT ≤ q) = 1 − P(QT > q) = 1 − 1/T     (2.1)

Equation 2.1 is the basis for estimating the magnitude of a flood, QT, given the return period T.

Often, the observed flood series data are plotted on probability paper to check whether they follow a particular distribution, to detect data errors and to check for outliers. Probability plots require an initial estimate of the probability of non-exceedance, which is called a "plotting position". A commonly used plotting position formula is that given by Cunnane (1989):

F = (i − 0.4) / (N + 0.2)     (2.2)

where N is the sample size and i is the rank of the observations in ascending order. Some other commonly used plotting position formulas can be found in Cunnane (1989).
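As a quick illustration of Equations 2.1 and 2.2, the short R sketch below assigns Cunnane plotting positions and the corresponding ARIs to a hypothetical annual maximum series (the numbers are illustrative only).

```r
# Cunnane plotting positions (Equation 2.2) and ARIs (from Equation 2.1)
# for a hypothetical annual maximum flood series (m3/s).
am  <- c(210, 95, 340, 150, 120, 60, 480, 200, 175, 88)
q   <- sort(am)                    # observations in ascending order
N   <- length(q)
i   <- 1:N                         # rank of each observation
pp  <- (i - 0.4) / (N + 0.2)       # non-exceedance probability, Equation 2.2
ARI <- 1 / (1 - pp)                # average recurrence interval in years
cbind(flow = q, F = round(pp, 3), ARI = round(ARI, 1))
```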

The data used in flood frequency analysis are assumed to be independent and identically distributed. The flood data are considered to be stochastic, and it is further assumed that they have not been affected by natural or man-made changes in the hydrological regime.

In practice there are many pitfalls in these assumptions. The assumption that the data in a given system arise from a single parent distribution is open to question, especially when large catchments are being analysed; in such circumstances more than one type of rainfall or flow may contribute to the extreme events in a region of interest. These assumptions have been questioned and discussed extensively by Klemes (1987a, 1987b) and Yevjevich (1986).


2.2.2 REGIONAL FLOOD FREQUENCY ANALYSIS

The availability of streamflow data is an important aspect of any flood frequency analysis. The estimation of the probability of occurrence of extreme floods is an extrapolation based on limited data. Thus the larger the data set, the more accurate the estimates will be. From a statistical viewpoint, estimation from a small sample may give unreasonable or physically unrealistic parameter estimates, especially for distributions with a large number of parameters (three or more), as the large sampling variability associated with small samples causes the estimates to be biased. In practice, however, data may be limited or in some cases may not be available for a site. In such situations, regional flood frequency analysis (RFFA) is most useful.

RFFA is a technique for transferring information from gauged sites to ungauged sites. It serves two purposes. For sites where data are not available, the analysis is based on regional data (Cunnane, 1989). For sites with limited data, the joint use of data measured at the site, called at-site data, and regional data from a number of stations in a region provides sufficient information to enable a probability distribution to be used with greater reliability. This type of analysis represents a substitution of space for time, where data from different locations in a region are used to compensate for short records at a single site (National Research Council, 1988; Stedinger et al., 1993).

2.2.3 REGIONAL HOMOGENEITY

RFFA is based on the concept of regional homogeneity, which assumes that annual maximum flood populations at several sites in a region are similar in statistical characteristics and are not dependent on catchment size (Cunnane, 1989). Although this assumption may not be strictly valid, it is convenient and effective in most applications.

One of the simplest RFFA procedures, which has been used for a long time, is the index flood method. Its key assumption is that the distribution of floods at different sites within a region is the same except for a site-specific scale, or index flood, factor. Homogeneity with regard to the index flood relies on the concept that the standardised regional flood peaks have a common probability distribution with identical parameter values.



The identification of homogenous regions is an elementary step in RFFA (Bates et al., 1998). The application typically involves the allocation of an ungauged catchment to an appropriate homogenous group and the prediction of flood quantiles using developed models based on catchment characteristics (Bates et al., 1998). That is, the RFFA based on homogenous regions can transfer the information from similar gauged catchments to ungauged catchments to allow for flood prediction.

Many techniques have been developed to establish homogeneous regions. For example, the PRM uses geographical contiguity as an indication of homogeneity; that is, catchments near each other are assumed to have similar runoff coefficients.

From a theoretical point of view, two catchments may be treated as homogeneous with respect to flood behaviour if they satisfy two criteria: the inputs (such as rainfall) to the hydrological systems are identical, and the climatic and physical characteristics converting the input to flood peaks are the same. No two catchments can satisfy these criteria perfectly, because each catchment has unique physical characteristics and different climatic inputs. In the search for practical "homogeneity", therefore, one has to decide what degree of similarity or dissimilarity is acceptable and where to set the cut-off point at which a region is regarded as acceptably homogeneous, in view of the practical applications of the techniques.

In defining homogenous regions for use in RFFA, a balance has to be made between including more sites for increased information and maintaining an acceptable level of homogeneity. In most situations when more sites are added to a region, certainly more information is gained about the flood regime; however sites that are hydrologically dissimilar can increase the heterogeneity in the region.

2.2.4 INTER-SITE DEPENDENCE

Some RFFA methods make use of inter-site dependence while others do not. Inter-site dependence, as discussed by Cunnane (1988), means that streamflow records across a region tend to show similar behaviour within any given time frame. This implies that:

1) in some years the annual maximum flows at all sites are due to a single widespread meteorological event; and
2) in relatively dry years, peak flows are generally low over the entire region, in which case all annual maxima will be low.

To account for the effects of inter-site dependence in RFFA, previous studies have indicated that concurrent records of sufficient length should be adopted (Stedinger, 1983a).

2.2.5 DISTRIBUTIONAL CHOICES

The choice of an appropriate probability distribution to be used in flood frequency analysis has been a topic of interest for a long time and is of prime importance in at-site and RFFA. It has received widespread attention by researchers. Benson (1968) and NERC (1975) devote considerable attention to this problem. Cunnane (1989) summarised the distributions commonly used in hydrology, mentioning 14 different distributions.

In some countries, a common distribution has been selected to achieve uniformity between different design agencies. The U.S. Interagency Advisory Committee on Water Data (IACWD, 1982) and the Institution of Engineers, Australia (I.E. Aust., 1987) recommend the Log-Pearson Type 3 (LP3) distribution for use in the United States and Australia respectively. Other distributions that have received considerable attention include the Extreme Value Types 1, 2 and 3, the Generalised Extreme Value (GEV) (NERC, 1975), Wakeby (Houghton, 1978), Generalised Pareto (GPA) (Smith, 1987), Two-Component Extreme Value (Rossi et al., 1984) and Log-Logistic (Ahmad et al., 1988) distributions.

The use of a standard distribution has been criticised by Wallis and Wood (1985) and Potter and Lettenmaier (1990), who argue that a reassessment of the use of the LP3 distribution for practical flood design is overdue. Vogel et al. (1993) studied the suitability of a number of distributions (including the LP3) for Australia. They found that the Generalised Extreme Value (GEV) and Wakeby distributions provide the best approximation to flood flow data in the regions of Australia that are dominated by rainfall during the winter months; for the remainder of the continent, the Generalised Pareto (GPA) and Wakeby distributions provide better approximations. For the same data set, the LP3 performed satisfactorily, but not as well as either the GEV or the GPA distribution. The distributions that have attracted the most interest as possible alternatives to the LP3 are the GEV and Wakeby (Bates, 1994). Studies by Rahman et al. (1999b) and Haddad and Rahman (2007) showed that the GEV distribution fitted by the LH moments method provides better results than the LP3 distribution in South-East Australia.

2.3 METHODS FOR IDENTIFICATION OF HOMOGENEOUS REGIONS

The methods for obtaining homogeneous regions are based on geographical contiguity, on flood characteristics alone, or on catchment characteristics alone. The theoretical aspects, limitations and problems associated with the identification of homogeneous regions based on flood data (annual maximum series) are discussed below.

In this approach, the degree of homogeneity of a proposed group is judged on the basis of a dimensionless coefficient of the annual maximum flood series, such as the coefficient of variation ( Cv), coefficient of skewness ( Cs) or similar measures. Examples are given by Dalrymple (1960), Wiltshire (1986), Acreman & Sinclair (1986), Vogel and Kroll (1989), Chowdhury et al. (1991), Pilon and Adamowski (1992), Lu and Stedinger (1992), Hosking and Wallis (1993) and Fill and Stedinger (1995a, b).

Dalrymple (1960) proposed a homogeneity test based on the sampling distribution of the standardised 10 year annual maximum flow, assuming an EV1 as the parent distribution.

Wiltshire (1986a, b) presented a test based on the sampling distribution of Cv and also a distribution-based test. He introduced an F statistic, which is the ratio of between group variance of Cvs and within group variances of Cvs, to judge the degree of homogeneity. He tested the efficiency of the proposed test on simulated data and concluded that “it is clear that the test in its present form is unsuitable for use in assessing regional homogeneity”. The power of his distribution-based test was found to be low for smaller regions. He mentioned that for a typical region with 50 sites with 20 years of record length for each site, the test has only moderate power. Acreman and Sinclair (1986) used a likelihood ratio test based on the assumption of an underlying GEV distribution.

Hosking and Wallis (1991, 1993) proposed a heterogeneity measure based on the L moment ratios L-CV, L-CS and L-kurtosis. The advantage of this test is that it is based on L moments and is not distribution-specific like those mentioned above. The test has received considerable attention in recent years (e.g. Pearson, 1991; Thomas and Olsen, 1992; Alila et al., 1992; Guttman, 1993; Zrinji and Burn, 1996; Bates et al., 1998; Rahman et al., 1999b) and will be discussed further in Chapter 3.
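Since the heterogeneity measure is built from sample L-moment ratios, a minimal sketch of how these ratios can be computed for a single site is given below, using the unbiased probability weighted moment estimators of Hosking and Wallis; the annual maximum series is hypothetical and the sketch is for illustration only.

```r
# Sample L-moment ratios (L-CV, L-skewness, L-kurtosis) for one site, computed
# from unbiased probability weighted moments b0..b3 of the ordered sample.
sample_lmom_ratios <- function(x) {
  x  <- sort(x); n <- length(x); i <- 1:n
  b0 <- mean(x)
  b1 <- sum((i - 1) / (n - 1) * x) / n
  b2 <- sum((i - 1) * (i - 2) / ((n - 1) * (n - 2)) * x) / n
  b3 <- sum((i - 1) * (i - 2) * (i - 3) / ((n - 1) * (n - 2) * (n - 3)) * x) / n
  l1 <- b0                               # L-location (mean)
  l2 <- 2 * b1 - b0                      # L-scale
  l3 <- 6 * b2 - 6 * b1 + b0
  l4 <- 20 * b3 - 30 * b2 + 12 * b1 - b0
  c(L_CV = l2 / l1, L_CS = l3 / l2, L_CK = l4 / l2)
}

# Hypothetical annual maximum series (m3/s)
sample_lmom_ratios(c(210, 95, 340, 150, 120, 60, 480, 200, 175, 88))
```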

Cunnane (1988) mentioned that identification of a homogeneous region is necessarily based on statistical tests of hypothesis, the associated power of which, with currently available amounts of hydrological data, is low. Thus it is not possible to divide, with great assurance, a large number of catchments into homogeneous subgroups using flow records with limited lengths.

2.4 REGIONAL FLOOD FREQUENCY ANALYSIS METHODS

There are a number of RFFA methods based on streamflow data that have been reported. Some of the most commonly used methods are discussed below.

2.4.1 INDEX FLOOD METHOD

Most RFFA methods involve some sort of standardisation of the data. If the standardisation is of the form X = Q/b, where b is an index of the overall flood magnitudes on a catchment, then it is referred to as an index flood method. The underlying concept of this method is that the distributions of floods at sites of a homogeneous region are identical, apart from a site-specific scaling factor b (the "index flood"), which reflects the size, rainfall and runoff characteristics of each catchment. It is assumed that all the moments of order higher than one are identical after correction for scale. Usually b is taken as the mean flood Q̄ or the median flood Q̃ of the site. When b = Q̄, the variate X has the following properties (Cunnane, 1988):

E(X) = 1;  σX = Cv(Q);  gX = gQ

where E(X) = expected value of X; σX = standard deviation of X; Cv = coefficient of variation of Q; and g = skew coefficient. For small samples, X is the ratio of two random variables rather than a single scaled random variable. Stedinger (1983) showed that its effects can be quite marked in small samples, in that the distribution of X is quite different in form to that of Q. Further, if samples are small, the variance of Q̄ contributes appreciably to the sampling variance of the estimated X quantiles.

The dimensionless rescaled data $X_{ij} = Q_{ij}/b_j$ are the basis for estimating $X_T$, the regional growth factor for an ARI of T years. Dalrymple (1960) proposed an index flood method that used records of equal length for each site, which had been tested for homogeneity at the 10 year ARI level. This method, though widely used, has some limitations as mentioned in Section 2.3. A slightly different type of index flood method has been proposed by Hosking and Wallis (1990, 1993), in which it is assumed that the distribution of $X_T$ is known apart from the distribution parameters $\theta_k$ (k = 1, ..., p). These parameters are estimated separately at each site and combined to give regional estimates:

$$\hat{\theta}_k^R = \frac{\sum_{j=1}^{M} w_j \hat{\theta}_k^{(j)}}{\sum_{j=1}^{M} w_j} \qquad (2.3)$$

where $\hat{\theta}_k^{(j)}$ is the site j estimate of $\theta_k$ and $w_j = N_j$ are the weights. Equation 2.3 gives a weighted average, the site estimate being given a weight proportional to its record length because, for regular statistical models, the variance of $\hat{\theta}_k^{(j)}$ is inversely proportional to $N_j$ (Hosking and Wallis, 1993). Recent research (Lettenmaier and Potter, 1985; Wallis and Wood, 1985; Hosking and Wallis, 1986; Potter and Lettenmaier, 1990; Rahman, 1997 and Rahman et al., 1999b) has found that index flood procedures, coupled with probability-weighted moments or L moments, can yield reasonably accurate quantile estimation.
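As a simple illustration of Equation 2.3, the following minimal Python sketch computes a record-length-weighted regional estimate of a distribution parameter; the site estimates and record lengths used are hypothetical values, not data from this study.

# Sketch of Equation 2.3: record-length-weighted regional parameter estimate
# (hypothetical site estimates and record lengths).
site_theta = [0.21, 0.18, 0.25, 0.20]   # at-site estimates of a distribution parameter
record_len = [25, 40, 18, 32]           # record lengths N_j, used as the weights w_j

theta_regional = sum(w * t for w, t in zip(record_len, site_theta)) / sum(record_len)
print(theta_regional)                   # weighted regional estimate of the parameter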

In the Index Flood Method, the dimensionless regional growth curve is used to estimate $X_T$. The flood quantile having an ARI of T years is then obtained from:

$$Q_T = X_T \bar{Q} \qquad (2.4)$$

In the case of a gauged site, the at-site mean flood is used in Equation 2.4; for an ungauged site, $\bar{Q}$ is estimated using regional information. Equation 2.4 is based on the following variables:

$Q_T$, which is the flood quantile at a site with an ARI of T years; and

$X_T$, which is the regional growth factor; this defines the frequency distribution common to all the sites in a homogeneous region.



$\bar{Q}$, which is known as the index flood and is typically represented (in gauged catchments) by the mean of the at-site annual maximum flood series. Used as a scale parameter, it is the term which accounts for the differences in quantiles between individual sites within a homogeneous group.

Equation 2.4 is the essence of the index flood method. Where a site has gauging information, the at-site mean flood is used to represent $\bar{Q}$. However, when the method is applied to ungauged catchments, where there is little or no flow data available, the difficulty in estimating $\bar{Q}$ becomes evident. Such estimation is typically performed via multiple regression between the flood series (annual maximum series from gauged catchments in the region) and catchment/climatic characteristics within the region. The general form of the regression equation can be expressed as:

$$\bar{Q} = a B^b C^c D^d \ldots \qquad (2.5)$$

where B, C, D, … are catchment characteristics and a, b, c, d, … are parameters of the regression equation.
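Equation 2.5 is usually linearised by taking logarithms so that its parameters can be estimated by least squares in log space. The short Python sketch below illustrates this with hypothetical catchment data (area and a design rainfall intensity), not the Victorian data set used in this study.

import numpy as np

# Hypothetical data: mean annual flood (m^3/s), catchment area (km^2), rainfall intensity (mm/h)
Q_mean = np.array([12.0, 55.0, 30.0, 140.0, 8.0])
area = np.array([20.0, 150.0, 75.0, 400.0, 12.0])
inten = np.array([35.0, 40.0, 38.0, 45.0, 30.0])

# Linearise Q = a * area^b * inten^c as: log Q = log a + b*log(area) + c*log(inten)
X = np.column_stack([np.ones(len(Q_mean)), np.log(area), np.log(inten)])
y = np.log(Q_mean)

coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares in log space
log_a, b, c = coef
print(np.exp(log_a), b, c)                     # back-transform the intercept to recover a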

A significant amount of research has been conducted on the index flood method, both in the past and more recently. Dalrymple (1960) was one of the first to develop an index flood technique, which was used by the USGS prior to 1965. The method developed by Dalrymple related the annual flood series to catchment area for a particular region of interest. Relationships were then sought on the basis of geographical representation; the particular area was then divided into divisions based on similarity (Riggs, 1973).

The second part of Dalrymple's approach involved averaging the shapes of similar curves for the region to create one common regional curve; this method was relatively easy to implement as only one variable, catchment area, was required. As the approach is an empirical one, a number of limitations have been identified:

1). Arbitrary decisions are required at boundaries of regions with respect to mean annual flood and the shape of the frequency curve.



2). There was no consideration of other important factors which have been shown to be plausible/influential in the flood generation process (Riggs, 1973).

Australian Rainfall & Runoff (ARR) (I.E Aust, 1987) did not recognise the superiority of the index flood method (IFM) as a design flood estimation technique. ARR (I.E Aust, 1987) was also very critical of the way the IFM had been developed in the past, commenting that the regression techniques used were limited in their functional form and lacked a sound physical and statistical basis. Further research by Lettenmaier and Potter (1985), Wallis and Wood (1985), Hosking and Wallis (1986) and Potter and Lettenmaier (1990) showed that index flood procedures, coupled with probability weighted moments or L-moments, yielded more accurate quantile estimates.

The index flood method has been criticised on the grounds that the coefficient of variation of the flood series, Cv, may vary approximately inversely with catchment area, resulting in flatter flood frequency curves for larger catchments. This has been particularly noticeable in the case of humid catchments that differ greatly in size (Dawdy, 1961; Benson, 1962; Riggs, 1973; Smith, 1992).

Bates et al. (1998) and Rahman et al. (1999a) carried out recent studies in which the IFM was applied and tested for design flood estimation in ungauged catchments in South-east Australia. The method involved assigning an ungauged catchment to a homogeneous group, identified (through the use of L-moments) on the basis of catchment and climatic characteristics rather than geographical proximity. The relationships were developed using statistical procedures such as canonical correlation analysis, tree-based modelling and other multivariate statistical techniques. This allowed the development of a RFFA method using up to 12 independent climatic and catchment characteristics variables.

Although the results of this method showed promise when compared with the Probabilistic Rational Method (PRM), its limitations were evident in that it needed a large number of independent variables, which are time consuming to obtain. The results also depend on the correct assignment of a catchment to a homogeneous group; any wrong assignment would greatly increase the error in quantile estimation.



2.4.2 STATION YEAR METHOD

The standardised X values of all the sites in the region are treated as if they form a single random sample of size L from a common X parent population. The pooled standardised data are then fitted to a suitable distribution, and $X_T$ values are calculated. Since this method ignores inter-site dependence, it may lead to bias, especially at large return periods (Cunnane, 1988).

2.4.3 BAYESIAN METHOD

This method allows an unknown parameter to be estimated as a random variable rather than as a fixed unknown constant. It does not require standardisation of flood data or the assumption of regional homogeneity, like the index flood method. Cunnane and Nash (1974) and Cunnane (1988) discussed the Bayesian approach to RFFA. Kuczera (1982b) proposed a "Linear Empirical Bayes" approach for RFFA, but Lettenmaier and Potter (1985) found in a simulation study that this approach does not provide quantile estimates as precise as estimates obtained from index flood methods. Cunnane (1988) mentioned that the Bayes approach, though theoretically attractive, is not as 'precise' as the across-region averaging method based on probability weighted moments (PWMs).

2.4.4 PROBABILISTIC RATIONAL METHOD

The rational method has often been regarded as a deterministic representation of the flood generated from an individual storm. However, the rational method recommended in ARR (I.E. Aust., 1987) (see also Pilgrim and Cordery, 1993) is based on a probabilistic approach for use in estimating design floods. This "Probabilistic Rational Method" (PRM) is given by:

$$Q_T = 0.278\, C_T\, I_{t_c,T}\, A \qquad (2.6)$$

where $Q_T$ is the peak flow rate (m³/s) for an ARI of T years; $C_T$ is the runoff coefficient (dimensionless) for an ARI of T years; $I_{t_c,T}$ is the average rainfall intensity (mm/h) for a design duration equal to the time of concentration $t_c$ (hours) and an ARI of T years; and A is the catchment area (km²).
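Applying Equation 2.6 is a direct calculation once $C_T$, $I_{t_c,T}$ and A are known; the minimal sketch below uses hypothetical values (in practice $C_T$ is read from the runoff coefficient maps in ARR and the design rainfall intensity from design rainfall data).

def prm_peak_flow(c_t, intensity_mm_per_h, area_km2):
    # Probabilistic Rational Method, Equation 2.6: Q_T = 0.278 * C_T * I(tc, T) * A
    return 0.278 * c_t * intensity_mm_per_h * area_km2

# Hypothetical example: C_10 = 0.35, 10-year design intensity of 28 mm/h, 85 km^2 catchment
print(prm_peak_flow(0.35, 28.0, 85.0))   # peak flow in m^3/s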



The method may be regarded as a regional model, with design rainfall intensity $I_{t_c,T}$ and catchment area A as independent variables. The runoff coefficient $C_T$ is a factor which lumps the effects of climatic and physical characteristics other than catchment area and rainfall intensity. It is noteworthy that in ARR 1987 the values of $C_T$ were estimated using conventional moment estimates from flow records of limited length (e.g. some sites had only 10 years of record). Since conventional moment estimates are largely affected by sampling variability and extremes in the data, a higher degree of uncertainty in quantile estimation is likely to arise from $C_T$. The mapping and use of runoff coefficients are based on the assumption of geographical contiguity, an assumption that is unlikely to be satisfied, as mentioned in Section 2.3.

Lay (1989) reported on the PRM and its application to ungauged rural catchments in Victoria, confirming that the PRM is in accordance with the criteria set out by the National Committee on Water Resources. However, the method is only as good as the data from which it is derived; hence, to keep the PRM up to date and workable, it was recommended that the database within ARR be updated at regular intervals (e.g. every 5 years). This recommendation has not been fulfilled, as there has been no update of the PRM since 1987.

Rahman & Hollerbach (2003) investigated the physical significance of runoff coefficients and assessed the extent of uncertainty in design flood estimates obtained by the PRM. Following the method of derivation in ARR, runoff coefficients were estimated for 104 gauged catchments in South-east Australia. The mapping of these C10 coefficients over the area indicated that they show little spatial coherence. The C coefficients are mapped according to the position of the gauging station, and some interpolation is then required for areas where there is little or no data so that the contours can be developed. Error is introduced into the contours through the interpolation technique, because some regions are exposed to greater spatial changes in physical topography and other factors which directly affect the C10 coefficients.

Rahman (1997) stated that the underlying concept of contiguous regions, i.e. that nearby catchments are hydrologically similar, is true to the extent that contiguous areas are likely to have similar meteorological characteristics and therefore similar hydrological inputs. But geographical proximity cannot be a guarantee of hydrological similarity, as two nearby catchments may possess quite dissimilar physical characteristics. Geographical regions may cut across geologic, climatic and topographic boundaries, causing abrupt changes in hydrological parameters at their boundaries (Wiltshire, 1986). A geographical region that is called homogeneous may include catchments exhibiting a wide variety of catchment characteristics, along with very different flood characteristics (Wiltshire, 1986c; Acreman and Wiltshire, 1989). In a very similar fashion, Rahman and Hollerbach (2003) also stated that while nearby catchments show similar meteorological characteristics, they may possess quite dissimilar physical characteristics, which clearly indicates that the method of simple linear interpolation over geographical space on the C10 map in ARR (I.E Aust, 1987) has little validity.

Rahman and Hollerbach (2003) also examined the uncertainty associated with design flood estimation using the PRM with the developed C10 coefficients. The study showed that for about 40% of the catchments the PRM underestimated the observed flood quantiles. The developed C10 coefficients did, however, show reasonable correlation with pan evaporation, quaternary sediment area, stream density and mainstream slope. An attempt was made to develop prediction equations relating the C10 coefficients to catchment characteristics; however, this proved unsuccessful.

2.5 QUANTILE REGRESSION TECHNIQUE

2.5.1 INTRODUCTION

The United States Geological Survey (USGS) proposed a QRT in which a large number of gauged catchments are selected from a region and flood quantiles are estimated from the recorded streamflow data, which are then regressed against the catchment variables that are most likely to govern the flood generation process. Studies by Benson (1962) suggested that T-year flood peak discharges could be estimated directly from catchment characteristics data by multiple regression analysis.

The quantile regression technique can be expressed as follows:

$$Q_T = a B^b C^c D^d \ldots \qquad (2.7)$$



where B, C, D, … are catchment characteristics variables, $Q_T$ is the flood magnitude with a T year ARI (the flood quantile), and a, b, c, … are regression coefficients.

Unlike the index flood method, this method is not based on a constant coefficient of variation (Cv) of the annual maximum flood series across the region. It has been noted that the method can give design flood estimates that do not vary smoothly with ARI; however, hydrological judgement can be exercised in such situations to adjust the flood frequency curve so that it increases smoothly with T.

There have been various techniques and many applications of regression models that have been adopted for hydrological regression. Most of these methods are derived from the methodology set out by the USGS as described above.

The USGS has applied the QRT for a long time. A well-known study using the QRT with an Ordinary Least Squares (OLS) procedure was carried out by Thomas and Benson (1970). The study tested four regions in the United States for design flood estimation using multiple regression techniques that related streamflow characteristics to drainage-basin characteristics. It found that the QRT predicted flood quantiles quite accurately compared with previous methods adopted by the USGS. However, the point was still made that the equations lacked a statistically sound methodology.

The OLS estimator has traditionally been used by hydrologists to estimate the regression parameters β in regional hydrological models. But for the OLS model to be statistically efficient and robust, the annual maximum flood series in the region must be uncorrelated, all the sites in the region should have equal record lengths, and all estimates of the T year events must have equal variance.

Since the annual maximum flow data in a region do not generally satisfy these criteria, the assumption that the model residual errors in OLS are homoskedastic is violated and the OLS approach can provide very distorted estimates of the model’s predictive precision (model error) and the precision with which the regression model parameters are being estimated (Stedinger and Tasker, 1985).



To overcome the above problems in OLS, Stedinger and Tasker (1985) proposed the Generalised Least Squares (GLS) procedure which can result in remarkable improvements in the precision with which the parameters of regional hydrologic regression models can be estimated, in particular when the record length varies widely from site to site. In the GLS model, the assumptions of equal variance of the T year events and zero cross-correlation for concurrent flows are relaxed.

The GLS procedure as described by Stedinger and Tasker (1985) and Tasker and Stedinger (1989) requires an estimate of the covariance matrix of the residual errors, $\hat{\Sigma}(Y)$.
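To make the role of this covariance matrix concrete, a minimal Python sketch of a GLS estimator of the general form described by Stedinger and Tasker (1985) is given below. The error covariance matrix (model error variance plus sampling error covariances) is simply assumed here as an illustrative matrix; constructing it from record lengths, at-site standard deviations and cross-correlations is the substantive task addressed in Chapter 3, and all numerical values below are hypothetical.

import numpy as np

# Hypothetical design matrix (intercept + log area) and response (log flood quantiles) for 4 sites
X = np.array([[1.0, 3.0], [1.0, 4.2], [1.0, 2.5], [1.0, 5.0]])
y = np.array([2.1, 3.0, 1.8, 3.6])

# Hypothetical error covariance matrix: model error variance on the diagonal plus
# sampling error variances and covariances between sites
Lam = np.array([[0.06, 0.01, 0.00, 0.00],
                [0.01, 0.04, 0.01, 0.00],
                [0.00, 0.01, 0.08, 0.01],
                [0.00, 0.00, 0.01, 0.05]])

Lam_inv = np.linalg.inv(Lam)
beta_gls = np.linalg.solve(X.T @ Lam_inv @ X, X.T @ Lam_inv @ y)   # GLS parameter estimates
cov_beta = np.linalg.inv(X.T @ Lam_inv @ X)                        # covariance of the estimates
print(beta_gls)
print(cov_beta)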

2.5.2 GENERALISED LEAST SQUARES (GLS) AND WEIGHTED LEAST SQUARES (WLS)

As discussed above, the parameters of regional hydrological models have traditionally been estimated using the OLS procedure. However, regionalisation using hydrological data violates the assumption that the residual errors associated with the individual observations are homoskedastic and independently distributed (Stedinger and Tasker, 1985). In the case of hydrological data, variations in streamflow record length and cross-correlation among concurrent flows result in estimates of the T year events which vary in precision. Matalas and Benson (1961), Matalas and Gilroy (1968), Hardison (1971), Moss and Karlinger (1974) and Tasker and Moss (1979) have all examined the statistical properties and predictions of OLS procedures with hydrological data.

What has received great attention in the US is how best to estimate the parameters of a regional hydrological model given the limitations of OLS, which will not yield efficient estimates of the regression model parameters when the residual errors are not homoskedastic and independently distributed (Stedinger and Tasker, 1985).

Moreover, as shown in the studies cited above, OLS estimates of the standard error of prediction and of the estimated parameters are highly biased. Weighted and Generalised Least Squares techniques were developed to deal with situations like those encountered in hydrology, where a regression model's residuals are heteroscedastic and perhaps cross-correlated (Draper and Smith, 1981; Johnston, 1972). Tasker (1980), in fact, used a Weighted Least Squares (WLS) procedure to account for unequal record lengths. Marin (1983) and Kuczera (1982a, b, 1983) developed Bayesian and Empirical Bayesian methodologies which deal with these issues.

An obstacle to the use of WLS and GLS procedures with hydrological data is the need to provide an estimate of the covariance matrix of residual errors; that covariance matrix is a function of the precision with which the true model can predict values of the streamflow statistics of concern as well as the sampling error in the available estimates of that statistic. The discussions and examples in the works by Tasker (1980) and Kuczera (1983b) illustrate difficulties associated with estimation of this matrix.

Stedinger and Tasker (1985) compared the performance of the OLS and GLS procedures in a Monte Carlo simulation with synthetically generated flow sequences. In situations where the available streamflow records at gauged sites are of different and widely varying lengths and concurrent flows at different sites are cross-correlated, the GLS procedure provided more accurate parameter estimates, better estimates of the accuracy with which the regression model parameters were being estimated, and almost unbiased estimates of the variance of the underlying regression model residuals.

A simpler WLS procedure neglects the cross-correlations among concurrent flows. The WLS algorithm is shown to do as well as the GLS procedure when the cross-correlations among concurrent flows are relatively modest.

2.5.3 APPLICATION OF GENERALISED LEAST SQUARES REGRESSION

Tasker et al. (1986) compared the GLS estimation technique of Stedinger and Tasker (1985) with OLS estimation and polynomial estimation in a split-sample experiment in which real data from Pima County, Arizona were used. Two conclusions were drawn from their study. First, for the data sets considered, the differences between the model parameter estimates obtained with the OLS and GLS procedures were quite modest. This reflects the fact that most sites had less than 20 years of streamflow data, which, coupled with a large model error variance, meant that the sampling-error term had little effect on the analysis. In addition, most of the cross-correlations were very small. The second conclusion is that the GLS method provides a nearly unbiased estimate of the true variance of prediction, while the OLS approach substantially overestimates the true prediction variance. Nonetheless, Tasker et al. (1986) found that from a statistical standpoint the method is satisfying because it deals with the problem of cross-correlated data and unequal variance between sites in a sound and logical manner.

Tasker and Stedinger (1987) proposed an adjustment to the GLS model of Stedinger and Tasker (1985) to account for possible information about historical floods available at some stations in a region. The historical information is assumed to be in the form of observations of all peaks above a threshold during a long period outside the systematic record period. A Monte Carlo simulation experiment was performed to compare the GLS estimator adjusted for historical floods with the unadjusted GLS estimator and the OLS estimator. The results indicated that (1) using the GLS estimator adjusted for historical information significantly improves the regression model; and (2) the modified GLS method described in Tasker and Stedinger (1987) outperforms the widely used OLS method in estimating regression model parameters.

2.5.4 AN OPERATIONAL GLS MODEL FOR HYDROLOGIC REGRESSION

Monte Carlo simulation studies undertaken by Stedinger and Tasker (1985) documented the value of GLS procedures for estimating empirical relationships between streamflow statistics and physiographic basin characteristics. Tasker and Stedinger (1989) presented an extension of the GLS method that deals with the realities and complexities of regional hydrological data sets that were not addressed in the Monte Carlo simulation studies. These extensions include (1) a more realistic model of the underlying model error; (2) smoothed estimates of the cross-correlation of flows; (3) procedures for including historical flow data; and (4) diagnostic statistics describing leverage and influence for GLS regression. Thus, implementation of the GLS regression method of Stedinger and Tasker (1985) requires these extensions to be incorporated into the model, especially with regard to identifying the realistic model error associated with the GLS analysis.

2.5.5 OPERATIONAL BAYESIAN GLS REGRESSION FOR REGIONAL HYDROLOGIC ANALYSIS

Reis et al. (2003, 2005) introduced a Bayesian approach to parameter estimation for the GLS regional regression model developed by Stedinger and Tasker (1985) for hydrological analysis. The results presented in Reis et al. (2005) show that for cases in which the model error variance is small compared with the sampling error of the at-site estimates, which is often the case for regionalisation of a shape parameter, the Bayesian estimator provides a more reasonable estimate of the model error variance than the Method of Moments (MOM) and Maximum Likelihood (ML) estimators. Reis et al. (2005) also present regression statistics for WLS and GLS models, including a pseudo analysis of variance, a pseudo R2, the error variance ratio (EVR), the variance inflation ratio (VIR), and leverage and influence. The regression procedure was illustrated with two examples of regionalisation, and results obtained from the OLS, WLS and GLS procedures were compared. The OLS procedure provided very misleading results because it did not make any distinction between the variance due to model error and the variance due to sampling error. For the examples presented, the GLS method was found to provide the best framework because the cross-correlation between concurrent flows proved to be important. Both the leverage and influence statistics were very useful in identifying stations that had a significant impact on the analysis.

2.5.6 THE USE OF GLS REGRESSION IN REGIONAL HYDROLOGIC ANALYSIS

Griffis and Stedinger (2006) examined the GLS regression method in more detail. Previous studies by the US Geological Survey using the LP3 distribution had neglected the impact of uncertainty in the weighted skew on quantile estimation. The needed relationship was developed in their paper and its use illustrated in a regional flood study with 162 sites from South Carolina, with the performance of the model compared to separate models for each hydrological region tested. The results were both surprising and hydrologically reasonable. The paper also presents new statistical diagnostic metrics, such as a condition number to check for multicollinearity, a new pseudo R2 appropriate for use with GLS regression, and two error variance ratios.

2.5.7 APPLICATION OF GENERALISED LEAST SQUARES TO LOW-FLOW FREQUENCY ANALYSIS

Vogel and Kroll (1990) undertook a study to compare the GLS and OLS regression procedures in developing generalised low-flow frequency relationships for ungauged sites in Massachusetts. The GLS regression procedures led to almost identical regional regression model parameter estimates when compared with the OLS procedures. Although the GLS procedures led to only marginal gains in the prediction errors associated with the low-flow regional regression equations, that result only reflects the fact that all sites had at least eleven years of data, and most had more than twenty years of data. In addition, the large model error component of the total prediction errors implies that the sampling error had only a small impact on the analysis. Vogel and Kroll (1990) noted that the GLS procedure will have significant advantages over OLS procedures in studies which seek to include very short records, such as at partial record sites. In such instances, GLS procedures can lead to significant improvements because the number of sites included in the analyses can be increased considerably.

Kroll and Stedinger (1999) examined the development of regional regression relationships with censored data for low-streamflow statistics. The basic problem is that when no discharge record is available for a site, a regional regression relationship can be developed to estimate the low-flow quantiles, but problems arise in the derivation of such models when some at-site estimates are reported as zero. One concern is that quantile estimates reported as zero may lie anywhere in the range from zero to the measurement threshold. A second concern is that a logarithmic transformation cannot be used with zero quantile estimates, so traditional log-linear least squares estimators cannot be computed. The study by Kroll and Stedinger (1999) uses visual examples and Monte Carlo simulation to compare the performance of techniques for estimating the parameters of a regional regression model when some at-site quantile estimates are zero.

The OLS techniques employed in practice include adding a small constant to all at-site quantile estimates, or neglecting all observations reported as zero. Both these approaches performed poorly when compared to the use of a Tobit model, which is a maximum likelihood estimator (MLE) procedure that represents the below threshold estimates as a range from zero to the threshold level. A weighted Tobit model that accounts for the heteroscedasticity of the residuals in the regional regression model was also examined, but provided relatively little gain over the ordinary Tobit model.

Hewa et al. (2003) stated that model inferences using the OLS method would be misleading for highly correlated dependent variables. They pointed out that the error structure of a regional model (the error covariance matrix) is a powerful tool in deciding the most appropriate regression procedure, and the GLS methodology presented in their analysis is capable of estimating a more realistic error covariance matrix for regional hydrological models. Hewa et al. (2003) found that the estimated sampling variance of the 7D10YR (7-day, 10 year ARI) extreme low flows over the study area varied by four orders of magnitude and that 92.3% of the inter-correlation values were significantly different from zero at the 5% level of significance, indicating that the 7D10YR values over the study area are not equally reliable and are inter-correlated. Hence the GLS procedure was selected as the most appropriate methodology for this regionalisation study. Regional prediction equations based on the OLS analysis were objectively evaluated, and it was found that the GLS approach outperforms the OLS method.

2.6 QUANTILE REGRESSION TECHNIQUE IN AUSTRALIA

Recently, Rahman (2005) developed and tested a QRT for South-east Australia which is considered relatively easy to apply. The study involved the derivation of predictive equations for design flood estimates for the 2, 5, 10, 20, 50 and 100 year ARIs based on flood and catchment/climatic characteristics for South-east Australia. The study built on previous work by Rahman (1997), in which 12 predictor variables were adopted. These variables are readily obtainable and are considered to have a plausible role in the generation of floods. A database for the study area was developed by taking a list of candidate stations which satisfied certain criteria such as record length, urbanisation and regulation of the catchment. Stations which were considered to be of poor quality, had unusual features or were graded poor by the gauging authority were excluded from the analysis. The catchments selected were mainly unregulated, with no major land use changes occurring over the period of streamflow record (Rahman, 2005a).

Flood quantiles were derived using the unbiased plotting position formula (Cunnane, 1978), which involves an empirical distribution being fitted to each station. This method involves a degree of subjectivity, in particular for the higher ARIs. Annual exceedance probabilities (AEP) were plotted against observed floods on probability paper and flood quantiles were estimated from a line of best fit.

Following criteria similar to those preferred by the USGS, the flood quantiles from the gauged catchments were regressed against the 12 predictor variables using an OLS regression procedure. After a number of backward stepwise regressions, which involved removing the least significant variables, a number of different models for the region were developed for each ARI. The regression models with the highest R2 and lowest standard error of estimate were selected as the regional regression relationships.


The application of the regression equation is quite simple as it only involves substitution of the variables which are considered to be most influential. For example, in this study the variables included in the final prediction equations were (Rahman, 2005):
• area: obtained from topographic maps (1:100,000);
• sden: sum of drainage lines divided by catchment area;
• I12:2 (mm/h): rainfall intensity of 12-hour duration and 2-year ARI;
• evap (mm): mean annual evaporation, obtained from the Climatic Atlas of Australia (BOM);
• qsa: quaternary sediment area, obtained from geological maps of Australia (1:250,000).

The first four of these predictor variables were included in the 2, 5, 10 and 20 year ARI models, while the 50 and 100 year ARI models included all five predictor variables. Nonetheless, the developed equations satisfied the underlying model assumptions well, and the split-sample validation indicated reasonable accuracy of the developed equations.

As a continuation of the above QRT-OLS study by Rahman (2005), another assessment was carried out by Rijal and Rahman (2005) to compare the performance of the PRM and the QRT on the same test catchments within South-east Australia. Independent split-sample validation on 20 test catchments showed that the flood quantiles generally increase with ARI but that some hydrological smoothing is required. It was found that the QRT in general provides more accurate design flood estimates than the PRM: the 75th percentile values of the relative errors in design flood estimates for ARIs of 2 to 100 years range from 45% to 62% for the QRT, compared with 61% to 80% for the PRM. It was also concluded that there is approximately a 10% chance that the error in design flood estimates with the QRT and PRM would exceed 100%; hence users of these approaches should be aware of the potential for high errors.

The National Committee on Water Engineering intends to test the applicability of the GLS method for Australian catchments, which may form the basis of the revision of the regional flood frequency methods in ARR (Weinmann, 2006). More recently, Haddad et al. (2006) presented a pilot study comparing the OLS and GLS methods using the available data from south-east Australia, to assess the suitability of the GLS method for south-east Australian catchments. This study used the same data set as Rahman (2005), who developed prediction equations for design floods using an OLS procedure. The analysis examined a two-stage generalised least squares approach that aims to reduce the sampling error arising from the use of the at-site standard deviation in the error covariance matrix, and derived regional prediction equations based on the GLS method.

The study indicated that the GLS method provides a better estimate of the model’s predictive ability in that the average absolute difference is smaller for the GLS than the OLS method. This can be attributed to the fact that the GLS procedure accounted for the heteroscedastic structure of the residual errors.

2.7 SUMMARY

The estimation of flood behaviour at ungauged catchments is a common problem in hydrology. Regional flood frequency analysis (RFFA) is commonly used to "transfer" flood characteristics information from gauged catchments to ungauged ones. In this chapter, the literature review has covered currently applied RFFA techniques, with particular emphasis on the Quantile Regression Technique (QRT).

The Index Flood Method has been discussed; it assumes that the probability distribution of floods at the sites of a homogeneous region is identical except for a site-specific scaling factor. Recent studies of the L-moments based Index Flood Method in South-east Australia have shown positive results; however, the number of variables used in that work was too large for practical application.

The Probabilistic Rational Method is currently recommended in South-east Australia for design flood estimation in small to medium sized ungauged catchments. Though considered a regional method and easy to apply, it has been criticised by researchers because of the assumption of geographical contiguity, and the mapping and application of the runoff coefficients.

The Quantile Regression Technique is a multiple regression technique which relates flood quantiles to catchment characteristics, assuming linearity. The advantage of the quantile regression technique is that no assumptions are made about runoff coefficients or geographical contiguity, as with the Probabilistic Rational Method. The previous application of the Quantile Regression Technique in South-east Australia has also shown promising results due to its simple form and limited number of prediction variables.

The preferred form of the Quantile Regression Technique is the Generalised Least Squares approach, as further generalisations can be made with this method, such as accounting for sampling variability and model error. The Generalised Least Squares technique has been shown to perform better than the Ordinary Least Squares technique because, in hydrology, the assumption of homoskedasticity of the residuals is often violated and the Ordinary Least Squares method can then provide highly distorted estimates of the regional model's predictive ability.

The proposed research will look at both Generalised Least Squares and Ordinary Least Squares based Quantile Regression Techniques for design flood estimation in ungauged catchments. A comparison with the Probabilistic Rational Method will also be made in this thesis.



CHAPTER 3: METHODOLOGY OF STATISTICAL TECHNIQUES USED IN THIS REGIONAL FLOOD FREQUENCY ANALYSIS STUDY

3.1 GENERAL

This chapter provides a description of the statistical techniques used in this study. At the outset, the initial investigation involves the determination of homogeneous regions. L-moments are defined and a heterogeneity measure based on L-moments is discussed.

The statistical techniques used to fit flood frequency curves to the observed data with a Bayesian parameter fitting procedure are then presented.

A discussion of the quantile regression technique is then presented. While the basic theory has been introduced in Chapter 2, further emphasis is placed on the weighted and generalised least squares regression procedures; this is done to highlight the assumptions involved in this study and to give an overview of how errors in the data are dealt with. The statistics involved in the operational generalised least squares model are also defined, providing a more realistic setup of the residual error covariance matrix and the underlying model errors.

A flow chart (Figure 3.1) is provided below which summarises the statistical methods and methodology adopted in this study.



[Figure 3.1 is a flow chart with the following steps: identify research topic and literature review → pilot study (Hydrology Symposium) → methodology → data preparation and collection (streamflow data: filling of missing data, trend analysis, identification of rating curve errors, outlier test; catchment and climatic characteristics data) → regional homogeneity (determination of homogeneous regions based on L-moments) → at-site flood frequency analysis (LP3 distribution fitted using a Bayesian procedure, incorporating rating curve error) and regional skew estimation using WLS and bootstrapping → development of regional prediction equations using OLS and GLS procedures with statistical diagnostics, and redevelopment of the Probabilistic Rational Method → testing and validation on 20 test catchments (goodness of fit, comparison) → conclusions and recommendations.]

Figure 3.1 Flow chart showing statistical techniques / methods used in this study



3.2 METHODS FOR ASSESSING THE DEGREE OF HOMOGENEITY OF A REGION

There are many approaches to assessing the degree of homogeneity of a region. Some of these methods have been discussed in Section 2.3.

The methods based on RFFA assume that the standardised variable $q_T = Q_T / u_i$ at each station i has the same distribution at every site in the region under consideration, where $Q_T$ is the flood quantile and $u_i$ is the scaling factor. In particular, Cv(q) and Cs(q), the coefficient of variation and the coefficient of skewness of q, are considered to be constant across the region (Cunnane, 1988).

Departures from this assumption may lead to biased quantile estimates at some sites. Sites with Cv and Cs nearest to the regional average may not suffer from such bias, but large biases in quantile estimates are expected for sites whose Cv and Cs deviate from the average. Good results may be obtained by regionalisation, especially in cases of short records, provided that the degree of heterogeneity is not great; in such cases, the large number of sites contributing to the parameter estimation compensates for regional heterogeneity. However, in practical situations short records are used reluctantly in flood frequency analysis as they can lead to meaningless results which make neither statistical nor hydrological sense.

Other methods of assigning homogeneous regions are based on geographical similarity and geographical contiguity. Geographical similarity is based on soil types, climate and topography. The method based on geographical contiguity simply assumes that homogeneity is inversely related to distance; i.e. the degree of homogeneity between sites A and B is inversely proportional to the distance between them.

Since L-moments (defined in Section 3.2.1) suffer less from the effects of sampling variability, a homogeneity test based on L-moments is likely to provide more hydrologically and statistically meaningful groups. Of the methods based on L-moments, the one proposed by Hosking and Wallis (1993) seems to be the most viable option. This has been discussed in Section 2.3 and is used in this study to assess the degree of homogeneity among sites within the region. The statistical analysis is described below.



3.2.1 L-MOMENTS

Hosking (1990) introduced the L-moments, which are linear functions of probability weighted moments (PWMs). The L-moments are more convenient than PWMs because they can be directly interpreted as measures of scale and shape of probability distributions. In this respect they are analogous to conventional moments. L-moments are defined by Hosking in terms of PWMs α and β as

$$\lambda_{r+1} = (-1)^r \sum_{k=0}^{r} p^{*}_{r,k}\,\alpha_k = \sum_{k=0}^{r} p^{*}_{r,k}\,\beta_k \qquad (3.1)$$

where

$$p^{*}_{r,k} = (-1)^{r-k} \binom{r}{k} \binom{r+k}{k} \qquad (3.2)$$

Sample L-moments ($l_r$) are calculated by replacing α and β in Equation (3.1) by their sample estimates a and b, expressed as PWMs in Equations (3.3) and (3.4),

$$a_s = \hat{\alpha}_s = \hat{M}_{1,0,s} = \frac{1}{N}\sum_{i=1}^{N} x_i \binom{N-i}{s} \Big/ \binom{N-1}{s} \qquad (3.3)$$

N i −1  N −1 ) ) 1     bs = bs M ,1 r 0, = ∑ xi /  (3.4) N i=1  r   s 

3.2.2 TESTS BASED ON L-MOMENTS (GOODNESS OF FIT TESTS)

Hosking and Wallis (1991) present a goodness of fit measure based on $\hat{\tau}_4$, the regional average of the sample L-kurtosis, mainly for three-parameter distributions. Since all three-parameter distributions fitted to the data will have the same $\tau_3$ on the L-skewness vs L-kurtosis diagram, the quality of fit can be judged by the difference between the regional average $\hat{\tau}_4$ and the value $\tau_4^{DIST}$ for the fitted distribution. The statistic $Z^{DIST}$ is defined below:

$$Z^{DIST} = (\tau_4^{DIST} - \hat{\tau}_4)/\sigma_4 \qquad (3.5)$$

where $Z^{DIST}$ is the goodness of fit measure and $\sigma_4$ is the standard deviation of $\hat{\tau}_4$. The value of $\sigma_4$ can be obtained by simulation after fitting a Kappa distribution to the observations (Hosking, 1988). A fit is declared adequate if $Z^{DIST}$ is sufficiently close to zero, a reasonable criterion being $|Z^{DIST}| \leq 1.64$. For small samples (N ≤ 20) or large L-skewness ($\tau_3 \geq 0.4$) a correction of $\hat{\tau}_4$ is required: instead of $\hat{\tau}_4$, $\hat{\tau}_4 - \beta_4$ is used, where $\beta_4$ is the bias in the regional average L-kurtosis for regions with the same number of sites and the same record lengths as the observed data. $\beta_4$ can be obtained from the same simulations required to obtain $\sigma_4$.

The calculations for $Z^{DIST}$ can be made using the FORTRAN computer program "FORTRAN Routines for Use with the Method of L-moments" developed by Hosking (1991a).
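Once $\sigma_4$ (and, where needed, $\beta_4$) have been obtained from the Kappa-distribution simulations, the statistic itself is a one-line calculation; the small Python sketch below uses hypothetical values for all inputs.

def z_dist(tau4_dist, tau4_regional_avg, sigma4, bias4=0.0):
    # Goodness of fit statistic of Equation 3.5; bias4 is the small-sample correction to tau4
    return (tau4_dist - (tau4_regional_avg - bias4)) / sigma4

# Hypothetical values: L-kurtosis of a candidate distribution, regional average t4,
# and sigma4 / bias4 obtained from Kappa-distribution simulations
z = z_dist(tau4_dist=0.135, tau4_regional_avg=0.128, sigma4=0.018, bias4=0.002)
print(z, abs(z) <= 1.64)   # the fit is judged adequate if |Z| <= 1.64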

3.3 REGIONAL HOMOGENEITY TESTS

Hosking and Wallis (1991) developed two statistics which are used to test regional homogeneity. The first statistic is a discordancy measure, intended to identify those sites that are grossly discordant with the group as a whole. The discordancy measure $D_i$ estimates how far a given site is from the centre of the group. If $u_i = [t^{(i)}, t_3^{(i)}, t_4^{(i)}]^T$ is the vector containing the t, $t_3$ and $t_4$ values for site i, then the group average across NS sites is given by Equation (3.6).

$$\bar{u} = \frac{1}{NS}\sum_{i=1}^{NS} u_i \qquad (3.6)$$

The sample covariance matrix is given by Equation (3.7).

$$S = (NS - 1)^{-1} \sum_{i=1}^{NS} (u_i - \bar{u})(u_i - \bar{u})^T \qquad (3.7)$$

The discordancy measure is defined by Equation (3.8):

$$D_i = \frac{1}{3}(u_i - \bar{u})^T S^{-1} (u_i - \bar{u}) \qquad (3.8)$$

A site i is declared to be unusual if $D_i$ is large. A suitable criterion to classify a station as discordant is that $D_i$ should be greater than or equal to 3.
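The discordancy calculation in Equations 3.6 to 3.8 can be illustrated with a minimal Python sketch; the (t, t3, t4) values below are hypothetical and one site is made deliberately unusual.

import numpy as np

# Hypothetical (t, t3, t4) vectors u_i for five sites
U = np.array([[0.25, 0.18, 0.14],
              [0.28, 0.20, 0.15],
              [0.22, 0.15, 0.12],
              [0.40, 0.35, 0.28],    # a deliberately unusual site
              [0.26, 0.19, 0.13]])

u_bar = U.mean(axis=0)                                   # Equation 3.6
S = (U - u_bar).T @ (U - u_bar) / (len(U) - 1)           # Equation 3.7
S_inv = np.linalg.inv(S)
D = np.array([(u - u_bar) @ S_inv @ (u - u_bar) / 3.0 for u in U])   # Equation 3.8
print(D)   # a site with D_i >= 3 would be flagged as discordant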



The second statistic is a heterogeneity measure, intended to estimate the degree of homogeneity in a group of sites and to assess whether they might reasonably be treated as homogeneous. Specifically, the heterogeneity measure compares the between-site variation in sample L-moments for the group of sites with that expected for a homogeneous region.

Three measures of variability, $V_1$, $V_2$ and $V_3$, are available.

(1) Based on L-CV (t), the weighted standard deviation of t is given by Equation (3.10):

$$V_1 = \left[ \sum_{i=1}^{NS} N_i \left(t^{(i)} - \hat{t}\right)^2 \Big/ \sum_{i=1}^{NS} N_i \right]^{1/2} \qquad (3.10)$$

where NS in Equation 3.10 is the number of sites, $N_i$ is the record length at each site i, and $\hat{t}$ is the weighted average value of $t^{(i)}$ given by Equation (3.11).

$$\hat{t} = \sum_{i=1}^{NS} N_i t^{(i)} \Big/ \sum_{i=1}^{NS} N_i \qquad (3.11)$$

(2) Based on L-CV and L-skewness, the weighted average distance from each site to the group weighted mean on a t vs. $t_3$ graph is computed in Equation (3.12).

$$V_2 = \sum_{i=1}^{NS} N_i \left\{ \left(t^{(i)} - \hat{t}\right)^2 + \left(t_3^{(i)} - \hat{t}_3\right)^2 \right\}^{1/2} \Big/ \sum_{i=1}^{NS} N_i \qquad (3.12)$$

(3) Based on L-skewness ($t_3$) and L-kurtosis ($t_4$), the weighted average distance from each site to the group weighted mean on a $t_3$ vs. $t_4$ graph is computed in Equation (3.13).

$$V_3 = \sum_{i=1}^{NS} N_i \left\{ \left(t_3^{(i)} - \hat{t}_3\right)^2 + \left(t_4^{(i)} - \hat{t}_4\right)^2 \right\}^{1/2} \Big/ \sum_{i=1}^{NS} N_i \qquad (3.13)$$

To evaluate the heterogeneity measures, a Kappa distribution (Hosking, 1988) is fitted to the group average L-moment ratios 1, $\hat{t}$, $\hat{t}_3$, $\hat{t}_4$. Simulations of a large number of regions, $N_{sim}$, from this Kappa distribution are performed. The regions are assumed to be homogeneous and the data are assumed to have no cross-correlation or serial correlation. The sites are assumed to have the same record lengths as their real-world counterparts. For each simulated region, $V_i$ (where $V_i$ is any of the three measures $V_1$, $V_2$ or $V_3$ defined above) is calculated. From the simulated data the mean $\mu_v$ and standard deviation $\sigma_v$ of the $N_{sim}$ values of $V_i$ are determined. The heterogeneity measure is defined in Equation (3.14).

$$H_i = (V_i - \mu_v)/\sigma_v \qquad (3.14)$$

A region is declared to be heterogeneous if $H_i$ is sufficiently large. Hosking and Wallis (1991b) suggest that a region be regarded as acceptably homogeneous if $H_i$ is less than 1, possibly heterogeneous if it is between 1 and 2, and definitely heterogeneous if $H_i$ is greater than 2. Hosking and Wallis (1991) observed that the statistics $H_2$ and $H_3$, based on the measures $V_2$ and $V_3$, lack the power to discriminate between homogeneous and heterogeneous regions, and that $H_1$, based on $V_1$, has better discriminating power. Therefore the $H_1$ statistic based on $V_1$ is recommended as the principal indicator of heterogeneity. If a Kappa distribution cannot be fitted ($\hat{t}_4$ is too large relative to $\hat{t}_3$), the generalised logistic distribution, a special case of the Kappa distribution, is used for the simulation. Also, $H_1$ was found to be a better indicator of heterogeneity in large regions, but has a tendency to give false indications of homogeneity for small regions.
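The $H_1$ calculation can be sketched in a few lines of Python once $\mu_v$ and $\sigma_v$ have been obtained from the Kappa-distribution simulations; in the sketch below these simulation results, together with the at-site L-CV values and record lengths, are hypothetical inputs.

import numpy as np

def heterogeneity_h1(t_sites, record_lengths, mu_v, sigma_v):
    # H1 statistic of Equation 3.14 based on V1 (Equations 3.10 and 3.11).
    # mu_v and sigma_v are the mean and standard deviation of V1 over many
    # homogeneous regions simulated from a fitted Kappa distribution.
    t = np.asarray(t_sites, dtype=float)          # at-site L-CV values t^(i)
    n = np.asarray(record_lengths, dtype=float)   # record lengths N_i
    t_hat = np.sum(n * t) / np.sum(n)             # Equation 3.11
    v1 = np.sqrt(np.sum(n * (t - t_hat) ** 2) / np.sum(n))   # Equation 3.10
    return (v1 - mu_v) / sigma_v                  # Equation 3.14

# Hypothetical at-site L-CVs, record lengths, and simulated mu_v / sigma_v
print(heterogeneity_h1([0.25, 0.28, 0.22, 0.31, 0.26], [25, 40, 18, 32, 27],
                       mu_v=0.020, sigma_v=0.007))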

3.4 AT-SITE FLOOD FREQUENCY ANALYSIS

At-site flood frequency analysis is an elementary step in any RFFA study. The primary objective of flood frequency analysis is to relate the magnitude of extreme events to their frequency of occurrence through the use of probability distributions (Chow et al., 1988). Data observed over an extended period of time in a river system are analysed in frequency analysis. The data are assumed to be independent and identically distributed. The flood data are considered to be stochastic and may even be assumed to be space and time independent. Further, it is assumed that the floods have not been affected by natural or manmade changes in the hydrological regime of the system.

In flood frequency analysis, a unique relationship between a flood magnitude and the corresponding recurrence interval T is sought. The task is to extract information from a flow record to estimate the relationship between Q and T. Three different models may be considered for this purpose (Cunnane, 1989): (1) the Annual Maximum (AM) Series model, (2) the Partial Duration (PD) Series or Peaks Over Threshold (POT) model, and (3) the Time Series (TS) model. For this study the annual maximum (AM) series data are adopted.

ARR (I.E Aust, 1987) recommends the Log Pearson Type 3 (LP3) distribution fitted with the "method of moments" (MOM) for use in at-site flood frequency analysis in Australia. However, research has shown that a reassessment of the LP3 distribution is overdue (Wallis and Wood, 1985; Vogel et al., 1993). Alternative distributions such as the Generalised Extreme Value (GEV) distribution have shown better results in Australia (Vogel et al., 1993; Wang, 1997; Rahman et al., 1999b, 2005b).

However, this study adopts one probability distribution, the LP3 distribution fitted with a Bayesian procedure, for at-site flood frequency analysis. The recommendations currently being prepared by the National Committee on Water Engineering propose a change from the current MOM to Bayesian fitting procedures for estimating the parameters of the probability distributions used in flood frequency analysis (Kuczera and Franks, 2005). Hence this method is adopted to estimate at-site flood quantiles for the gauging stations within the study area. The LP3 Bayesian procedure has shown satisfactory results in the study area (e.g. Haddad and Rahman, 2007). For consistency with the other eastern states in the ongoing RFFA study, carried out as part of the revision of Book 4, the ARR Revision Team has recommended the use of the LP3 Bayesian procedure.

3.4.1 FLIKE

There is no universal agreement on which probability model best describes flood data. The final choice depends on the quantity and quality of data available to the hydrologist, and on the goodness-of-fit criteria used in model selection. Whichever model is selected, it is important to appreciate that extrapolation well beyond observational experience involves great uncertainty. The flood frequency estimation in this study is carried out with FLIKE, a computer program developed by Professor George Kuczera of the University of Newcastle which facilitates Bayesian analysis.



The following sections briefly describe the LP3 probability distribution model used in this study. They also describe how FLIKE (Kuczera, 2005) obtains initial parameter values when searching for the most probable values.

3.4.2 LOG PEARSON TYPE 3 DISTRIBUTION

The LP3 probability model has the PDF

$$f(\log_e x \mid \alpha, \beta, \tau) = \frac{\beta}{\Gamma(\alpha)} \left[\beta(\log_e x - \tau)\right]^{\alpha-1} \exp\left[-\beta(\log_e x - \tau)\right], \quad \alpha > 0$$

for β > 0, x > τ; for β < 0, x < τ  (3.15)

with Γ(α) being the gamma function.
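To make Equation 3.15 concrete, the minimal Python sketch below evaluates the LP3 density of $\log_e x$ for a given parameter set; the parameter values are hypothetical, and FLIKE itself works with the moments of $\log_e x$ rather than with (α, β, τ) directly.

from math import gamma, exp, log

def lp3_log_density(x, alpha, beta, tau):
    # LP3 density of y = log_e(x) as in Equation 3.15
    y = log(x)
    z = beta * (y - tau)
    if z <= 0.0:
        return 0.0                      # outside the support implied by the sign of beta
    return abs(beta) / gamma(alpha) * z ** (alpha - 1) * exp(-z)   # abs(beta) keeps the density positive when beta < 0

# Hypothetical parameters: shape alpha, scale beta, location tau (in log_e space)
print(lp3_log_density(150.0, alpha=4.0, beta=1.5, tau=2.0))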

The LP3 model has been widely accepted in practice because it consistently fits flood data as well as, if not better than, other probability models. When the skew of $\log_e x$ is zero, the model simplifies to the log normal.

The model, however, is not well-behaved from an inference perspective. Direct inference of the shape parameter α, the scale parameter β and the location parameter τ causes numerical problems. For example, when the skew of $\log_e x$ is close to zero, the shape parameter α tends to infinity. Experience indicates it is preferable to fit the first three moments of $\log_e x$ rather than α, β and τ (Kuczera, 2005).

Another problem arises when the skew of $\log_e x$ is negative: the upper bound on flows can cause problems. FLIKE avoids this problem by starting the search for the most probable parameters from log normal method-of-moments parameters fitted to the gauged data.

This strategy is quite robust because when the skew of $\log_e x$ is zero, the flow bounds are pushed all the way to infinity. As a result, the search starts in a region of parameter space well removed from the constraints imposed by the flow bounds.

A further, more serious problem arises when the absolute value of the skew of $\log_e x$ exceeds 2; that is, when α ≤ 1. When α > 1, the LP3 has a gamma-shaped density. However, when α ≤ 1, the density changes to a J-shaped function. Indeed, when α = 1, the pdf degenerates to that of an exponential distribution with scale parameter β and location parameter τ. For α ≤ 1, the J-shaped density seems to be over-parameterised with three parameters. The posterior density surface reveals extremely elongated contours which are suggestive of an over-parameterised model. In such circumstances, it is pointless to use the LP3 distribution, and it is suggested that either the GEV or GPA distributions be used as a substitute (Kuczera, 2005).

3.5 MULTIPLE REGRESSION ANALYSIS

Multiple regression analysis is fundamentally concerned with predicting the mean response Y (the at-site flood estimates $Q_2, \ldots, Q_{100}$) on the basis of the known values of a number of explanatory variables $X_i$ (in our case the catchment/climatic characteristics). The k-variable linear regression model involving the dependent variable Y and k independent variables $X_1, X_2, \ldots, X_k$ may be written as:

$Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_k X_{ki} + \varepsilon_i$   (3.16)

where $i = 1, 2, \ldots, n$ indexes the observations; $\beta_0$ denotes the intercept; $\beta_1, \ldots, \beta_k$ are the regression parameters estimated by ordinary least squares (OLS), weighted least squares (WLS) or generalised least squares (GLS); and $\varepsilon_i$ is the unexplained residual error associated with the ith observation.

In matrix notation the system of equations can be expressed as:

$\hat{Y} = X\beta + \varepsilon$   (3.17)

where $\hat{Y}$ is an ($n \times 1$) vector of flow characteristics at n sites, X is an ($n \times p$) matrix of ($p-1$) basin characteristics augmented by a column of ones, β is a ($p \times 1$) vector of regression parameters and ε is an ($n \times 1$) vector of random errors.

As stated earlier in Section 3.5, the parameters β of the model may be estimated by a number of methods, which include ordinary least squares (OLS), weighted least squares (WLS) and generalised least squares (GLS). Under certain assumptions, OLS has very appealing statistical properties that make it a very attractive method for estimating regression parameters.

3.5.1 ORDINARY LEAST SQUARES

The method of OLS is in general based on the following assumptions (Dillon and Goldstein, 1984):
• The expected value of the vector of residual errors is zero [i.e. $E(\varepsilon) = 0$].
• There is no serial correlation between the ith and jth residual terms [i.e. $E(\varepsilon_i\varepsilon_j) = 0$ for $i \neq j$].
• The residuals exhibit constant variance [$E(\varepsilon_i\varepsilon_j) = \sigma^2$ for $i = j$]. This is known as the assumption of homoskedasticity.
• The covariance between the Xs and the residual terms $\varepsilon_i$ is zero [i.e. $\mathrm{cov}(\varepsilon, X) = 0$].
• There are no exact linear relationships among the X variables. This is known as the assumption of no multicollinearity.

In the case of hydrologic regression, further assumptions are also made:
• There is no cross-correlation between concurrent flows at different sites.
• All annual maximum records have the same record length.
• At-site flood quantile estimates all have the same variance.

If all the above assumptions are satisfied, the OLS estimators are unbiased and have minimum variance in the class of all linear unbiased estimators. Multiple linear regression models with parameters estimated by OLS have traditionally been applied in hydrology. However, in many applications there has recently been greater awareness of, and attention paid to, the possible errors and misleading results of the OLS technique.

If the assumption that the residuals are normally distributed with zero mean and homoskedastic variance is violated, statistical diagnostics such as the coefficient of determination, t-test values and standard error of estimate are misleading (Stedinger and Tasker, 1985, 1986). Stedinger and Tasker (1985) and Draper and Smith (1981) also state that the assumption of homoskedasticity is often violated in hydrology.

3.5.2 ASSUMPTIONS OF ORDINARY LEAST SQUARES METHOD

What has been receiving greater attention recently is how best to estimate the parameters of regional hydrologic relationships, given that OLS procedures will not identify the most efficient estimates of a regression model's parameters when the residual errors are not homoskedastic and independently distributed. Past studies (e.g. Matalas and Benson, 1961; Tasker and Moss, 1979) have found that the OLS estimates can be highly biased.

Weighted (WLS) and generalised least squares (GLS) techniques were developed to deal with situations like those encountered in hydrology, where a regression model's residuals are heteroskedastic and perhaps cross-correlated (Draper and Smith, 1981; Johnston, 1972). Tasker (1980) has, in fact, used a weighted least squares procedure to account for unequal record lengths.

An obstacle to the use of WLS and GLS procedures with hydrological data is the need to provide an estimate of the matrix of residual errors; that covariance matrix is a function of the precision with which the true model can predict the values of the streamflow statistic of concern as well as the sampling error in the available estimates of that statistic. The discussions and examples in the works by Tasker (1980) and Kuczera (1983) illustrate difficulties associated with estimation of this matrix.

These problems are discussed below along with the procedures developed by Stedinger and Tasker (1985) for estimating the precision of the underlying regression model as well as sampling variability of streamflow statistic estimators and their cross-correlation for use in WLS and GLS techniques.

3.5.3 THE BASIC PROBLEM - GENERALISED LEAST SQUARES

Given a data set, the problem here is how to estimate the parameters of a linear regression model that expresses the T-year flow event (such as the 10-year flood peak) as a function of catchment characteristics. At each of the N sites in a region, the available data are summarised by a vector of basin characteristics, $x_i$, and an annual maximum streamflow record of $n_i$ values. Let us assume

$\hat{Y} = (q_1 + K_T s_1, \ldots, q_N + K_T s_N)$   (3.18)

is the vector of at-site flood quantile estimates of the T-year flow, where $q_i$ and $s_i$ are the sample estimates of the mean and standard deviation of the observed annual maximum floods at site i, and $K_T$ is the frequency factor associated with the flow distribution for an AEP of 1 in T. Moreover, let $Y = (\mu_1 + K_T\sigma_1, \ldots, \mu_N + K_T\sigma_N)$ be the vector of true 1 in T AEP flows, where $\mu_i$ and $\sigma_i$ are the true mean and standard deviation of the annual maximum floods at site i. Therefore the OLS regional regression model can be expressed as

$Y = X\beta + e$   (3.19)

where Y is an ($N \times 1$) vector of flow characteristics at the N sites, X is an ($N \times k$) matrix of catchment characteristics augmented by a column of ones, β is a ($k \times 1$) vector of regression parameters and e is an ($N \times 1$) vector of random errors assumed to be normally distributed with zero mean and covariance matrix of the form $I_N\sigma^2$, where $I_N$ is an N-dimensional identity matrix. The OLS estimate of β is

$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$   (3.20)

The sampling covariance matrix based on the above assumptions can be expressed as

$\mathrm{Var}(\hat{\beta}_{OLS}) = \sigma^2(X'X)^{-1}$   (3.21)

The OLS estimator is generally used by hydrologists to estimate the parameters β, as given by Equation 3.20. However, in order for the OLS model to be statistically efficient and robust, the annual maximum flood series in the region must be uncorrelated, all the sites in the region should have equal record lengths and all estimates of the T-year events must have equal variance.

Since the annual maximum flow data in a region do not generally satisfy these criteria, the assumption that the model residual errors in OLS are homoskedastic is violated and the OLS approach can provide very distorted estimates of the model’s predictive precision (model error) and the precision with which the regression model’s parameters are being estimated (Stedinger and Tasker, 1985). Hewa et al. (2003) stated that the model inferences using the OLS method would be misleading for the highly correlated dependent variables.
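As a point of reference for the comparisons that follow, the R sketch below applies Equations 3.20 and 3.21 directly to a small synthetic data set; the predictor names and coefficient values are purely illustrative.

# OLS estimator (Eq. 3.20) and its sampling covariance (Eq. 3.21) on synthetic data
set.seed(1)
n    <- 50
area <- runif(n, 10, 1000)                     # catchment area, km^2 (illustrative)
rain <- runif(n, 600, 1500)                    # rainfall-related predictor (illustrative)
X    <- cbind(1, log(area), log(rain))         # design matrix with a column of ones
Y    <- X %*% c(-5, 0.8, 1.2) + rnorm(n, sd = 0.3)   # synthetic log flood quantiles

beta_ols <- solve(t(X) %*% X, t(X) %*% Y)      # Equation 3.20
resid    <- Y - X %*% beta_ols
sigma2   <- sum(resid^2) / (n - ncol(X))       # residual variance estimate
cov_ols  <- sigma2 * solve(t(X) %*% X)         # Equation 3.21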


To overcome the above problems in OLS, Stedinger and Tasker (1985) proposed the GLS procedure which can result in remarkable improvements in the precision with which the parameters of regional hydrologic regression models can be estimated, in particular when the record length varies widely from site to site. In the GLS model, the assumptions of equal variance of the T year events and zero cross-correlation for concurrent flows are relaxed.

The GLS procedure as described by Stedinger and Tasker (1985) and Tasker and Stedinger (1989) requires an estimate of the covariance matrix of residual errors $\hat{\Sigma}(Y)$, whose elements are organised as follows:

 2 σi  2 (κ − )1  1+K  for (i = j) n  T 4   i ˆ  Σ(Y)= (3.22)  m σˆσˆ  ij i j  2 (κ − )1  ρ 1+ρ K  for (i ≠ j) ij nn  ij T 4   i j

where $\hat{\sigma}_i$ is an estimate of the standard deviation of the observed flows at site i, $K_T$ is the T-year frequency factor for the flow distribution, κ is the kurtosis of the flow distribution, $n_i$ is the record length at site i, $m_{ij}$ is the concurrent record length at sites i and j, and $\rho_{ij}$ is an estimate of the cross-correlation of concurrent flows at sites i and j.
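A minimal R sketch of assembling this matrix is given below; it assumes vectors of at-site standard deviations and record lengths, and matrices of concurrent record lengths and cross-correlations, are already available, and simply evaluates Equation 3.22 element by element.

# Sampling error covariance matrix of the quantile estimators (Eq. 3.22)
build_sigma_Y <- function(sd_hat, n_rec, m_conc, rho, K_T, kurt) {
  N <- length(sd_hat)
  S <- matrix(0, N, N)
  for (i in 1:N) {
    for (j in 1:N) {
      if (i == j) {
        S[i, j] <- sd_hat[i]^2 / n_rec[i] * (1 + K_T^2 * (kurt - 1) / 4)
      } else {
        S[i, j] <- rho[i, j] * m_conc[i, j] * sd_hat[i] * sd_hat[j] /
                   (n_rec[i] * n_rec[j]) *
                   (1 + rho[i, j] * K_T^2 * (kurt - 1) / 4)
      }
    }
  }
  S
}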

The greatest obstacle faced with the GLS method has been the estimation of $\hat{\Sigma}(Y)$. Tasker and Stedinger (1989) proposed a reasonable estimator that should be fairly independent of the sampling errors $(\hat{Y} - Y)$ and the associated model errors $(\hat{Y} - E[Y])$. Thus, to obtain efficient and unbiased estimators of β, $\sigma_i$ in the error covariance matrix in (3.22) was replaced using a generalised least squares procedure (Tasker and Stedinger, 1989) such that

$\hat{\sigma}_i = \beta_0 + \beta_1\ln(A_i)$   (3.23)

where Ai is a surrogate for the basin characteristics.

The GLS estimate of β is

$\hat{\beta}_{GLS} = (X^T\hat{\Lambda}^{-1}X)^{-1}X^T\hat{\Lambda}^{-1}\hat{Y}$   (3.24)

The sampling covariance matrix thus becomes

$\mathrm{Var}(\hat{\beta}_{GLS}) = (X^T\hat{\Lambda}^{-1}X)^{-1}$   (3.25)

where

$\hat{\Lambda} = \hat{\gamma}^2 I + \hat{\Sigma}(Y)$   (3.26)

The model error variance γˆ 2 is due to an imperfect model and is a measure of the precision of the true regression model. The model error variance is assumed to be independent of the catchment characteristics. Unfortunately the model error is not known and needs to be estimated. Stedinger and Tasker (1986) proposed a method of moments estimator where γˆ 2 can be solved iteratively by Equation (3.27)

$(\hat{Y} - X\hat{\beta}_{GLS})'\left[\hat{\gamma}^2 I + \hat{\Sigma}(Y)\right]^{-1}(\hat{Y} - X\hat{\beta}_{GLS}) = N - k$   (3.27)

where N and k are the dimensions of Y and β respectively.

Alternative methods for estimating the model error variance by maximum likelihood can be seen in Kuczera (1983).
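The R sketch below shows one way to couple the GLS estimator of Equation 3.24 with the moment condition of Equation 3.27, solving for the model error variance by a simple bracketing search; it assumes X, the at-site quantile estimates Y_hat and the sampling covariance matrix Sigma_Y (e.g. from the sketch above) are available, and it is not the software written for this study.

# GLS estimator (Eq. 3.24) with the iterative moment estimator of gamma^2 (Eq. 3.27)
gls_fit <- function(X, Y_hat, Sigma_Y) {
  N <- nrow(X); k <- ncol(X)

  wssr <- function(gamma2) {                       # weighted residual sum of squares
    Lambda_i <- solve(gamma2 * diag(N) + Sigma_Y)  # Eq. 3.26
    beta <- solve(t(X) %*% Lambda_i %*% X, t(X) %*% Lambda_i %*% Y_hat)  # Eq. 3.24
    r <- Y_hat - X %*% beta
    drop(t(r) %*% Lambda_i %*% r)
  }

  # Eq. 3.27: choose gamma2 >= 0 so that the weighted residual sum of squares = N - k
  if (wssr(0) <= N - k) {
    gamma2 <- 0
  } else {
    upper <- 1
    while (wssr(upper) > N - k) upper <- 2 * upper  # expand until a root is bracketed
    gamma2 <- uniroot(function(g2) wssr(g2) - (N - k), c(0, upper))$root
  }

  Lambda_i <- solve(gamma2 * diag(N) + Sigma_Y)
  beta <- solve(t(X) %*% Lambda_i %*% X, t(X) %*% Lambda_i %*% Y_hat)
  list(beta = beta, gamma2 = gamma2,
       cov_beta = solve(t(X) %*% Lambda_i %*% X))   # Eq. 3.25
}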

For the OLS model, the coefficient of determination and the standard error of estimate are used to assess the accuracy of prediction. For the GLS model, the coefficient of determination is not used; rather, the average variance of prediction, $\hat{\gamma}_p^2$, which is made up of the model error variance and the sampling error variance, is used. The standard error of estimate is also utilised in the GLS method.

3.5.4 WEIGHTED LEAST SQUARES

Tasker (1980) and Stedinger and Tasker (1984) developed weighted least squares procedures which account for the sampling error in each $\hat{Y}_i$ but not their cross-correlation. The WLS β estimator is:

$\hat{\beta}_{WLS} = (X^T\hat{W}X)^{-1}X^T\hat{W}\hat{Y}$   (3.28)

where

$w_{ij} = [\Lambda(\gamma^2)_{ii}]^{-1}$  for $i = j$
$w_{ij} = 0$  otherwise   (3.29)

Assuming $[\Lambda(\gamma^2)]^{-1}$ is indeed W (which is the case if $\rho_{ij} = 0$ for all $i \neq j$), the covariance matrix is

$\mathrm{Var}(\hat{\beta}_{WLS}) = (X^T\hat{W}X)^{-1}$   (3.30)

As with GLS, a difficulty encountered with the WLS estimation procedure is that the β estimator is defined in terms of the unknown model error variance $\gamma^2$. Two estimators of $\gamma^2$ are considered here for use with a WLS algorithm.

Tasker (1980) proposed a method of moments $\gamma^2$ estimator for use with WLS procedures. His estimator is based on a correction to the residual mean square error $s_r^2$.

In this instance the basic model is

$\hat{Y}_i = \beta_0 + \beta_1\ln A_i + \hat{\varepsilon}_i$   (3.31)

where

$\mathrm{Var}[\hat{\varepsilon}_i] = \gamma^2 + \mathrm{Var}[\hat{Y}_i]$
$\mathrm{Var}[\hat{Y}_i] = E[(\hat{Y}_i - Y_i)^2]$

As a result, for ρij = 0 ( i ≠ j)

$E[s_r^2] \cong \gamma^2 + \dfrac{1}{N}\sum_{i=1}^{N}\mathrm{Var}[\hat{Y}_i]$   (3.32)

Thus a method of moments estimator of $\gamma^2$ for the WLS model when $\hat{Y}_i = \bar{x}_i + K_T s_i$ would be

$\hat{\gamma}^2_{WLS-MM1} = s_r^2 - \dfrac{1}{N}\sum_{i=1}^{N}(1 + K_T^2/2)(s_i^2/n_i)$   (3.33)


Clearly, WLS is just a special case of GLS wherein $\Lambda(\gamma^2)$ is diagonal. Thus the method of moments estimators of $\gamma^2$ developed for the GLS algorithm can also be used with a WLS approach. In particular, a second WLS method of moments estimator $\hat{\gamma}^2_{WLS-MM2}$ of $\gamma^2$ can be obtained by solution of

$(\hat{Y} - X\hat{\beta})^T W(\hat{\gamma}^2)(\hat{Y} - X\hat{\beta}) = N - k$   (3.34)

with $\hat{\beta}$ given by Equation 3.28, $W(\gamma^2)$ given by Equation 3.29, k the dimension of β, and subject to $\hat{\gamma}^2 \geq 0$.

In summary, Equation 3.28 can be used to estimate β, with the diagonal weight matrix W defined by Equation 3.29. The model error variance can be estimated by Tasker's (1980) method of moments estimator $\hat{\gamma}^2_{WLS-MM1}$ in Equation 3.33, or by Stedinger and Tasker's (1985) method of moments estimator $\hat{\gamma}^2_{WLS-MM2}$ obtained by solving Equation 3.34.
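A compact R sketch of this WLS variant is given below. It assumes the at-site standard deviations, record lengths and the frequency factor are available, and uses the sampling variance $(1 + K_T^2/2)s_i^2/n_i$ implied by Equation 3.33; it is illustrative only.

# WLS estimator (Eq. 3.28) with Tasker's moment estimator of gamma^2 (Eq. 3.33)
wls_fit <- function(X, Y_hat, s_i, n_i, K_T) {
  N <- nrow(X); k <- ncol(X)

  beta_ols <- solve(t(X) %*% X, t(X) %*% Y_hat)
  sr2      <- sum((Y_hat - X %*% beta_ols)^2) / (N - k)   # residual mean square error
  gamma2   <- max(0, sr2 - mean((1 + K_T^2 / 2) * s_i^2 / n_i))   # Eq. 3.33

  # Diagonal weights (Eq. 3.29): inverse of model error plus sampling variance
  W <- diag(1 / (gamma2 + (1 + K_T^2 / 2) * s_i^2 / n_i))

  beta_wls <- solve(t(X) %*% W %*% X, t(X) %*% W %*% Y_hat)       # Eq. 3.28
  list(beta = beta_wls, gamma2 = gamma2,
       cov_beta = solve(t(X) %*% W %*% X))                        # Eq. 3.30
}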

3.5.5 DEALING WITH DATA PROBLEMS

If the GLS model is to work efficiently, one must obtain a reasonable estimate of the underlying cross-correlation structure $\rho_{ij}$ between flows at every pair of sites i and j.

Using a sample estimate of $\rho_{ij}$ based on observed flows often results in a $\hat{\Lambda}$ that cannot be inverted. The reason is that the sample estimates of $\rho_{ij}$ are imprecise given the short record lengths usually encountered in hydrological data. This can lead to estimates of $\rho_{ij}$ that make neither hydrological nor statistical sense.

To overcome this problem the sample correlations are smoothed by relating them to distance between gauging stations using a nonlinear regression model of the form:

$\hat{\rho}_{ij} = \theta^{d_{ij}/(\alpha d_{ij}+1)} = \exp\left[\dfrac{d_{ij}}{\alpha d_{ij}+1}\ln\theta\right]$   (3.35)

where $d_{ij}$ is the distance between stations and θ and α are parameters of the model. In Equation (3.35), $\hat{\rho}_{ij}$ is a convex, monotonically decreasing function of $d_{ij}$ when $0 < \theta < 1$ and $\alpha > 0$. Thus it has a maximum value of 1 at $d_{ij} = 0$, and approaches a minimum of $\theta^{1/\alpha}$ as $d_{ij}$ approaches infinity.

Figure 3.2 illustrates the cross-correlations among annual peaks in South-East Australia for station pairs with at least 24 years of concurrent data. Use of the smoothed values of $\hat{\rho}_{ij}$ has resulted in $\hat{\Lambda}$ matrices that can be inverted. Equation (3.35) can be refined by the addition of explanatory variables to complement $d_{ij}$.
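The smoothing itself amounts to a small nonlinear least squares fit of Equation 3.35. A hedged R sketch is shown below; rho_s and d are assumed vectors of sample cross-correlations and inter-station distances (km) for the qualifying station pairs, and the starting values are simply taken near the fitted line reported in Figure 3.2.

# Fit the correlation-distance model of Eq. 3.35 to sample cross-correlations
fit <- nls(rho_s ~ theta^(d / (alpha * d + 1)),
           start = list(theta = 0.97, alpha = 0.015))
coef(fit)

# Smoothed cross-correlation for any pair of sites a distance d apart
rho_smooth <- function(d, theta, alpha) theta^(d / (alpha * d + 1))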

[Figure 3.2 plots the estimated cross-correlation coefficient against the distance between stations (km); the fitted line is $\rho_{ij} = 0.970431^{d_{ij}/(0.015127\,d_{ij}+1)}$.]

Figure 3.2 Relationship between the cross correlations among annual peaks and distance in South - East Australia

3.6 OPERATIONAL GENERALISED LEAST SQUARES – 4 STAGE GENERALISED LEAST SQUARES ANALYSIS

The GLS estimator employed in this study makes some assumptions about the composition of the regression flood quantile model and the matrix of residual errors Λ. The assumption employed here is that the annual maximum flow at any site i may be transformed so that the mean $u_i$ and variance $\sigma_i^2$ arise from regression models of the form:

$\log u_i = \beta_u + \sum_{j=1}^{k_m}\beta_{uj}x_{ij} + \varepsilon_i$   (3.36)


$\log\sigma_i = \beta_\sigma + \sum_{k=1}^{k_s}\beta_{\sigma k}x_{ik} + \partial_i$   (3.37)

where $\varepsilon_i$ and $\partial_i$ are residual error terms with zero means and variances $\sigma_\varepsilon^2$ and $\sigma_\partial^2$, $x_{ij}$ is the jth catchment characteristic at site i, and $k_m$ and $k_s$ are the numbers of catchment characteristics in Equations 3.36 and 3.37. The log-linear model for $\sigma_i$ in Equation 3.37 implies that $\sigma_i$ is strictly positive, whereas the linear model for $\sigma_i$ employed in previous Monte Carlo studies by Stedinger and Tasker (1985) can result in negative values of $\sigma_i$.

However, linear models of both $u_i$ and $\sigma_i$ are theoretically consistent with the linear model in Equation 3.19 for the T-year flood quantiles. In practical terms, these log-linear models are only simplistic approximations of the complex regression functions for flood quantiles as a function of catchment/climatic characteristics.

All previous GLS analyses have made the assumption that the error terms in Equations 3.36 and 3.37 are uncorrelated. If it were true that $\varepsilon_i$ and $\partial_i$ are independent and uncorrelated, the model error $\gamma_T^2$ associated with the transformed T-year flow $\mu_i + K_T\sigma_i$ would be $\sigma_\varepsilon^2 + K_T^2\sigma_\partial^2$. Thus, $\gamma_T^2$ would increase with T, and one should see estimates of the model error steadily increase from regressions of the 5-year peak to regressions of the 50-year or 100-year peak. However, this is not always the case. Past studies (Thomas and Benson, 1970; Tasker and Stedinger, 1989) have shown that the model error for the lower values of T can be larger than that for the higher ones only when $\varepsilon_i$ and $\partial_i$ are negatively correlated. The correlation coefficient between $\varepsilon_i$ and $\partial_i$ is denoted by $\rho_{\varepsilon\partial}$. Then the model error in a regional hydrological regression model for the T-year flood would be:

$\hat{\gamma}^2 = \sigma_\varepsilon^2 + 2\rho_{\varepsilon\partial}k_i\sigma_\varepsilon\Delta + k_i^2\Delta^2$   (3.38)

where $\Delta^2 = \sigma_i^2\exp(\sigma_\partial^2/2)[\exp(\sigma_\partial^2) - 1]$ is the model error variance for the σ-model corresponding to Equation (3.37). This model error formulation for a regional regression of the T-year flood peak is consistent with the observed dip in the model error when $\rho_{\varepsilon\partial}$ is negative.


Consider a regression of the T-year peak flow on catchment and climatic characteristics. In this case the dependent variables, $\hat{y}_i$, are the logarithms of the T-year flood estimators obtained using the ARR (1987) guideline or FLIKE (Kuczera, 2005). That is:

$\hat{y}_i = \hat{\mu}_i + k_i\hat{s}_i$  for $i = 1, 2, 3, \ldots, N$   (3.39)

where $\hat{\mu}_i$ is the mean of the logs of the $n_i$ observed flood peaks, $\hat{s}_i$ is their standard deviation, and $k_i$ is the LP3 frequency factor for average recurrence interval T and skewness coefficient $g_i$. The $g_i$ is estimated from a separate regional analysis rather than the sample data, or it is estimated as a weighted average of the sample estimate and a separate regional estimate (see Section 3.6.1 for further details).

The parameters of the model error variance in Equation 3.38, $\sigma_\varepsilon^2$, $\sigma_\partial^2$ and $\rho_{\varepsilon\partial}$, are estimated in three steps.

1. First, $\sigma_\partial^2$ is estimated by substituting $s_i$ for $\sigma_i$ in Equation 3.37 and using the GLS regression method described in Section 3.5.3 and in Stedinger and Tasker (1985) to obtain estimates of $\beta_\sigma$ and $\sigma_\partial^2$; in this case the covariance matrix, Λ, is estimated as:

$\hat{\Lambda} = \sigma_\partial^2 I + \hat{\Sigma}$   (3.40)

where $\hat{\Sigma}$, the matrix of sampling covariances for $\log s_i$, has elements:

$\Sigma_{ij} = \dfrac{1}{2n_i}(1 + 0.75 g_i^2)$  for $i = j$
$\Sigma_{ij} = \dfrac{\hat{\rho}_{ij} m_{ij}}{2 n_i n_j}(\hat{\rho}_{ij} + 0.75 g_i g_j)$  for $i \neq j$   (3.41)

The required estimate of $\sigma_\partial^2$ is obtained by solution of

$(y - X\beta_\sigma)^T\Lambda^{-1}(y - X\beta_\sigma) = n - k_s$

as suggested by Stedinger and Tasker (1985, 1986) and Tasker and Stedinger (1989), if a positive solution for $\sigma_\partial$ exists. Otherwise, $\hat{\sigma}_\partial = 0$.

Regression estimates of si are:

$s_i = \exp\left(\hat{\alpha}_o + \sum_{l=1}^{k_s}\hat{\beta}_{ol}\,x_{il}\right)$   (3.42)

2. In the second step, $\sigma_\varepsilon^2$ is estimated by substituting $\hat{\mu}_i$ for $\mu_i$ in Equation 3.36 and once again using the GLS technique. In this case:

$\hat{\Lambda} = \sigma_\varepsilon^2 I + \hat{\Sigma}$   (3.43)

where $\hat{\Sigma}$, the matrix of sampling covariances for $\hat{\mu}_i$, has elements:

$\Sigma_{ij} = \dfrac{s_i^2}{n_i}$  for $i = j$
$\Sigma_{ij} = \dfrac{\hat{\rho}_{ij} m_{ij}\hat{s}_i\hat{s}_j}{n_i n_j}$  for $i \neq j$   (3.44)

and $\sigma_\varepsilon^2$ is obtained by solution of $(y - X\beta_\mu)^T\Lambda^{-1}(\sigma_\varepsilon^2)(y - X\beta_\mu) = n - k_m$.

3. Finally, for the quantile estimate in Equation (3.39), one can solve:

$\hat{\beta}_{GLS} = (X^T\hat{\Lambda}^{-1}X)^{-1}X^T\hat{\Lambda}^{-1}\hat{Y}$   (3.45)

and again:

$(\hat{Y} - X\beta)^T\Lambda^{-1}(\hat{Y} - X\beta) = n - p$   (3.46)

for $\rho_{\varepsilon\partial}$, where

$\hat{\Lambda} = \gamma(\hat{\rho}_{\varepsilon\partial}) + \hat{\Sigma}$   (3.47)

where $\gamma(\hat{\rho}_{\varepsilon\partial})$ is a diagonal matrix of model error variances with diagonal elements:

$\hat{\gamma}_i^2 = \hat{\sigma}_\varepsilon^2 + 2\hat{\rho}_{\varepsilon\partial}k_i\hat{\sigma}_\varepsilon\hat{\Delta} + k_i^2\hat{\Delta}^2$   (3.48)

with:

$\hat{\Delta} = \left\{\hat{s}_i^2\exp(\hat{\sigma}_\partial^2/2)\left[\exp(\hat{\sigma}_\partial^2) - 1\right]\right\}^{1/2}$   (3.49)

and $\hat{\Sigma}$ is the sampling error matrix with elements:

$\Sigma_{ij} = \dfrac{\hat{s}_i^2}{n_i}\left[1 + k_i g_i + \dfrac{k_i^2}{2}(1 + 0.75 g_i^2)\right]$  for $i = j$

$\Sigma_{ij} = \hat{\rho}_{ij}\dfrac{m_{ij}\hat{s}_i\hat{s}_j}{n_i n_j}\left[1 + \dfrac{k_i g_i}{2} + \dfrac{k_j g_j}{2} + \dfrac{k_i k_j}{2}(\hat{\rho}_{ij} + 0.75 g_i g_j)\right]$  for $i \neq j$   (3.50)

Estimation of $\rho_{\varepsilon\partial}$ requires a univariate search for a satisfactory estimate $\hat{\rho}_{\varepsilon\partial}$ within the statistically reasonable range [-1.0, +1.0].
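The sketch below illustrates one way such a univariate search could be organised in R. The helper functions model_error_matrix() and sampling_cov_matrix() are placeholders assumed to build $\gamma(\rho)$ (Equation 3.48) and $\hat{\Sigma}$ (Equation 3.50) from the quantities estimated in steps 1 and 2; they are not part of any published package.

# Univariate search for the rho satisfying Eq. 3.46
solve_rho <- function(X, Y_hat, model_error_matrix, sampling_cov_matrix) {
  n <- nrow(X); p <- ncol(X)

  lhs <- function(rho) {
    Lambda_i <- solve(model_error_matrix(rho) + sampling_cov_matrix())   # Eq. 3.47
    beta <- solve(t(X) %*% Lambda_i %*% X, t(X) %*% Lambda_i %*% Y_hat)  # Eq. 3.45
    r <- Y_hat - X %*% beta
    drop(t(r) %*% Lambda_i %*% r)                                        # Eq. 3.46, left side
  }

  # Search [-1, 1] for the rho that makes the weighted residual sum of squares n - p
  optimize(function(rho) (lhs(rho) - (n - p))^2, interval = c(-1, 1))$minimum
}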

3.6.1 ESTIMATING REGIONAL SKEW

For this study a regional weighted least squares (WLS) regression was used to estimate $g_i$. The WLS analysis requires that the vector $\hat{g}$ (whose components are the sample estimates of $g_i$ from the observed streamflow data) be unbiased if unbiased estimates of $g_i$ are to be obtained; in matrix notation this condition may be written $E(\hat{g}) = g$.

Also important is the sampling error likely to be associated with each $\hat{g}_i$. Let $\Sigma(\hat{g})$ be a diagonal matrix with the sampling variances $\mathrm{var}(\hat{g}_i)$ of $\hat{g}_i$ about $g_i$ on the diagonal.

The WLS procedure will assume that $\Sigma(\hat{g})$ is a reasonable approximation of the sampling covariance matrix of the vector estimator $\hat{g}$ of g. This will be the case if we can neglect any correlation between $\hat{g}_i$ and $\hat{g}_j$ for i not equal to j, as seems reasonable (Stedinger, 1983). Another basic assumption is that the skewness coefficient $g_i$ for a randomly selected site i is generated by a linear function of a set of catchment characteristics (such as catchment area, slope, percent forest or their logarithms) and an additive error, $e_i$, so that

$g_i = \beta_0 + \sum_{j=1}^{k-1}x_{ij}\beta_j + e_i$   (3.51)


in which the model error $e_i$ has variance $\gamma^2$ and mean 0. Unfortunately, only the estimator $\hat{g}_i$ of $g_i$ is available. Thus, we consider the more general model

$\hat{g}_i = \beta_0 + \sum_{j=1}^{k-1}x_{ij}\beta_j + \varepsilon_i$   (3.52)

which in vector notation is

$\hat{g} = X\beta + \varepsilon$   (3.53)

in which X = an ($n \times k$) matrix of ($k-1$) basin characteristics augmented by a column of ones; β = a ($k \times 1$) vector of unknown coefficients to be estimated; and ε = an unobserved ($n \times 1$) vector of model errors and sampling errors with $E(\varepsilon_i) = 0$ and variances given by

$\mathrm{var}(\varepsilon_i) = \mathrm{var}[e_i] + \mathrm{var}[\hat{g}_i] = \gamma^2 + \mathrm{var}(\hat{g}_i)$   (3.54)

in which $\mathrm{var}(\hat{g}_i)$ = the sampling variance of $\hat{g}_i$ about its mean $g_i$ at site i; and $\gamma^2$ is the model error variance defined by

$\mathrm{var}(e_i) = E\left[\left(g_i - \beta_0 - \sum_{j=1}^{k-1}x_{ij}\beta_j\right)^2\right] = \gamma^2$   (3.55)

In matrix notation, Equation (3.54) may be written as

$E(\varepsilon\varepsilon^T) = \gamma^2 I_n + \Sigma(\hat{g}) = W$   (3.56)

in which any correlation between $\varepsilon_i$ and $\varepsilon_j$ for $i \neq j$ is neglected; $I_n$ = the $n \times n$ identity matrix; and $\Sigma(\hat{g})$ contains $\mathrm{var}(\hat{g}_i)$ along its diagonal. Given the assumptions made in Equations 3.51 - 3.53, the minimum variance unbiased estimator of the β-coefficients can be found by minimising the weighted sum of squares given by

$\sum_{i=1}^{n}\dfrac{\left(\hat{g}_i - \hat{\beta}_0^* - \sum_{j=1}^{k-1}x_{ij}\hat{\beta}_j^*\right)^2}{\gamma^2 + \mathrm{var}(\hat{g}_i)}$   (3.57)

The solution of this optimisation problem can be written in vector notation as

$\hat{\beta}^* = (X^T\hat{W}^{-1}X)^{-1}X^T\hat{W}^{-1}\hat{g}$   (3.58)


The estimator βˆ * has minimum variance among all linear unbiased estimators of β when W, defined by Equation 3.56, is given. Unfortunately, W is not known and must be estimated from the data.

If one assumes an LP3 flow variable, as has been done in this study, $\mathrm{var}(\hat{g}_i)$ can be approximated (IACWD, 1982) as

$\mathrm{var}(\hat{g}_i) = \left[1 + \dfrac{6}{N_i}\right]10^{[a - b\log_{10}(N_i/10)]}$   (3.59)

in which

$a = -0.33 + 0.08G$ if $G < 0.90$;  $a = -0.52 + 0.30G$ if $G > 0.90$   (3.60)

$b = 0.94 - 0.26G$ if $G < 1.50$;  $b = 0.55$ if $G > 1.50$   (3.61)

In the above equation, G is a regional mean skew, not the sample at-site skew $\hat{g}_i$. Use of $\hat{g}_i$ in this equation would result in undesirable correlation between W and the sample residuals and a relatively poor estimate of the model parameters (Stedinger and Tasker, 1986). Alternatively, and as adopted for this study, non-parametric bootstrap estimators of $\mathrm{var}(\hat{g}_i)$, as described by Tung and Mays (1981), can be used. The methodology used for bootstrapping is described in the following section.

3.6.2 VARIANCE OF SAMPLE ESTIMATORS BY BOOTSTRAP, VAR($\hat{g}_i$)

Efron (1977) developed the bootstrap method, which can also be used to assess the accuracy of any estimate of interest derived from a sample. From the theoretical point of view, the bootstrap is more widely applicable than the jackknife, and also more dependable (Efron, 1978). The procedures of this method are outlined as follows:

1. Let Fˆ be an empirical distribution of N observed streamflow data points, { yi , i=1, …, N}, i.e., 1/ N is the probability of occurrence assigned to each of the observations.

2. Use a random number generator to draw N new points $\{y_i^*, i = 1, \ldots, N\}$ independently and with replacement from $\hat{F}$, so each new point is an independent random selection of one of the N original data points. This set of N new points is called the bootstrap sample, and is drawn (with replacement) from the original data points.

3. Compute the estimate $\theta_s^{*(m)}$ for the bootstrap sample $\{y_i^*, i = 1, \ldots, N\}$.

4. Repeat Steps 2 and 3 a large number of times, m = 1, …, M (in this study M was taken as 10,000), each time using an independent set of new random numbers to generate the new bootstrap sample. The resulting sequence of bootstrap statistics is $\theta_s^{*(m)}$, m = 1, …, M.

5. The variance of $\theta_s$ can be calculated as

$\mathrm{Var}(\theta_s) = \dfrac{1}{M}\sum_{m=1}^{M}\left(\theta_s^{*(m)} - \hat{\theta}_s^*\right)^2$   (3.62)

in which $\hat{\theta}_s^* = \dfrac{1}{M}\sum_{m=1}^{M}\theta_s^{*(m)}$.

The bootstrap method as described can be used to compute the variance of any number of statistics, e.g., the mean, standard deviation and skewness, derived from the sample.
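For a single site, the procedure reduces to a few lines of R, as sketched below for the sample skewness; the skewness definition used here is the usual bias-adjusted sample coefficient, and y is an assumed vector of (log) annual maximum flows.

# Bootstrap variance of the sample skewness (Eq. 3.62)
boot_var_skew <- function(y, M = 10000) {
  skew <- function(x) {                          # bias-adjusted sample skewness
    n <- length(x)
    n / ((n - 1) * (n - 2)) * sum(((x - mean(x)) / sd(x))^3)
  }
  theta_star <- replicate(M, skew(sample(y, length(y), replace = TRUE)))
  mean((theta_star - mean(theta_star))^2)        # Eq. 3.62
}

# Example with synthetic data
set.seed(1)
boot_var_skew(rnorm(30, mean = 5, sd = 0.8), M = 2000)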

3.7 ORDINARY LEAST SQUARES - MODEL DEVELOPMENT TECHNIQUES USED

Prediction equations for the flood quantiles $Q_{1.25}$, $Q_2$, $Q_5$, $Q_{10}$, $Q_{20}$, $Q_{50}$, $Q_{100}$ and $Q_{200}$ are to be developed using the OLS procedure. The statistical package MINITAB will be used to develop these equations. While the software will produce equations based on any input information, both user intervention and mathematical/hydrological judgment are required to ensure that the best and most appropriate prediction equations are derived.

The “regression” function of MINITAB will apply the theory of multiple linear regression (OLS) to develop an initial prediction equation. Within this regression analysis, the most significant independent variables are selected using the backward variable selection technique (the variable involved must achieve at least 10% significance). Once the initial prediction equations are produced, they are then investigated for outliers, normality of residuals, goodness-of-fit and influential data points.

The suitability and applicability of the prediction equations are evaluated based upon statistical parameters given in the model output; these include the following:

(a) Coefficient of multiple determination ($R^2$). This is the most commonly used measure of the goodness-of-fit of a linear model. It measures how well the estimated regression equation fits the observed data points. Typically, the higher the $R^2$, the more confidence one can have in the equation. Statistically, the coefficient of determination represents the proportion of the total variation in the dependent variable that is explained by the regression equation. It has a range of values between 0 and 1. An adjusted $R^2$ may also be used, which attempts to more closely reflect the goodness-of-fit of the model to the population (Dillon and Goldstein, 1984). The adjusted $R^2$ is defined by

$R_a^2 = R^2 - \dfrac{m'(1 - R^2)}{N - m' - 1}$   (3.63)

where N is the sample size and m′ is the number of variables in the equation.

(b) Standard error of estimate (SEE). The coefficient of determination alone is not a sufficient measure to judge the goodness-of-fit of a multiple linear regression model, since its value increases as more independent variables are added. For two models with the same $R^2$, the one having the smaller SEE is preferable. The standard error of estimate provides an estimate of the dispersion of the prediction errors when Y values are being predicted from X values in a regression analysis. Given that a single transformation (a log transformation) is being used, the unstandardised SEE of the model is expressed in percentage form by the following equation:

SEE (%) = SEE value obtained from the prediction equation × 100%   (3.64)

(c) Regression diagnostic plots. Plots should be examined to check the suitability of the linearity assumptions employed in ordinary least squares regression, that is, that the residuals should be normally distributed with constant variance (i.e. homoskedastic). To supplement the plots in assessing homoskedasticity, two test statistics will be used: (1) the Breusch-Pagan test (also known as the Cook-Weisberg test) for heteroskedasticity and (2) the Kolmogorov-Smirnov test for the normality of residuals. With the Breusch-Pagan test, the null hypothesis is that the variance of the residuals is constant. Two regression runs are needed: (1) Y ($Q_{1.25}$, $Q_2$, $Q_5$, $Q_{10}$, $Q_{20}$, $Q_{50}$, $Q_{100}$ and $Q_{200}$) against X (catchment characteristics), and (2) the squared residuals against X. Large values of the test statistic lead to the conclusion that the error variance is not constant (i.e. heteroskedasticity). The test statistic is given by:

$\chi^2_{BP} = \dfrac{SSR^*}{2} \div \left(\dfrac{SSE}{n}\right)^2$   (3.65)


It is distributed as chi-square with 1 degree of freedom under $H_0$; $SSR^*$ is the regression sum of squares for regressing the squared residuals against X, and SSE is the error sum of squares for regressing Y against X.

(d) Checking the degree of multicollinearity in the predictors. To measure the degree of multicollinearity, a statistic called the variance inflation factor (VIF) has been used, defined as:

$VIF_i = \dfrac{1}{1 - R_i^2}$   (3.66)

where $R_i$ is the multiple correlation coefficient when the ith independent variable is predicted from the other independent variables (Dillon and Goldstein, 1984). The value of VIF is ≥ 1, depending on the value of $R_i$; a value of one indicates complete independence. A variable showing a relatively high VIF is highly correlated with one or more of the other independent variables. For highly correlated variables, the VIF may be as high as 50 (Norusis, 1993).
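Although MINITAB is used for the OLS diagnostics in this study, the two checks can be sketched in R as follows; dat is an assumed data frame containing a log-transformed quantile (here labelled logQ10) and two illustrative predictors, and with more predictors each VIF would use the $R_i^2$ from regressing that predictor on all the others.

# Breusch-Pagan statistic (Eq. 3.65) and VIF (Eq. 3.66) for an OLS model
fit <- lm(logQ10 ~ log_area + log_rain, data = dat)
e2  <- residuals(fit)^2
aux <- lm(e2 ~ log_area + log_rain, data = dat)         # squared residuals against X

SSR_star <- sum((fitted(aux) - mean(e2))^2)             # regression SS of auxiliary fit
SSE      <- sum(residuals(fit)^2)
n        <- nrow(dat)
chi2_BP  <- (SSR_star / 2) / (SSE / n)^2                # Eq. 3.65; compare with qchisq(0.95, 1)

# Variance inflation factor (Eq. 3.66) for one predictor
vif_log_area <- 1 / (1 - summary(lm(log_area ~ log_rain, data = dat))$r.squared)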

While the above criteria give the user a strong indication of how well the equation fits the observed data, there remains a strong possibility that the equations developed have been distorted by data points which show up as discordant sites. These data points are known as “outliers” or “influential points”. Essentially, the parameters and prediction values of a developed equation should not be significantly affected by the exclusion of one single data point. It is therefore necessary to remove these influential points until a situation is achieved where the equation is no longer affected if further points are excluded.

The following techniques are used to identify possible outliers and influential points in the data series:
• Outliers are identified from the MINITAB screen output; these are data items with a standardised residual value greater than 3.
• Influential points are identified by a combination of two methods, the first being Cook's distance, which is commonly used to estimate the influence of a data point in a regression analysis. Using a dimensionless unit, data points yielding a Cook's distance of 1 or more are considered to warrant closer examination (or exclusion) from the analysis, depending on the source of discordancy.


By completing a simple chart of station serial number vs. Cook's distance, the influential stations can easily be identified and removed (if required). The second method used to identify influential points is the centered leverage value. Similar to Cook's distance, the leverage is a measure of the influence of a given observation on a regression; it arises from the observation's location in the space of the inputs. One rule of thumb is to compare the leverage to 2p/n, where n is the number of observations and p is the number of parameters in the model. By completing a simple chart of station serial number vs. leverage, the influential stations can easily be identified and removed (if required). As a general guide, it was deemed reasonable to reduce the leverages below 0.2.
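In R, the same screening can be sketched with the standard influence measures, continuing from the hypothetical OLS fit above; the thresholds simply restate the rules of thumb quoted in the text.

# Outlier and influence screening for the fitted OLS model
cd  <- cooks.distance(fit)
lev <- hatvalues(fit)                   # leverage (hat) values; the centred version subtracts 1/n
p   <- length(coef(fit))
n   <- nrow(dat)

which(abs(rstandard(fit)) > 3)          # outliers: standardised residual greater than 3
which(cd > 1)                           # Cook's distance rule of thumb
which(lev > 2 * p / n)                  # leverage rule of thumb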

3.8 GENERALISED LEAST SQUARES - MODEL DEVELOPMENT

Prediction equations for the flood quantiles $Q_{1.25}$, $Q_2$, $Q_5$, $Q_{10}$, $Q_{20}$, $Q_{50}$, $Q_{100}$ and $Q_{200}$ are to be developed using the GLS procedure. Specific software was written for this purpose in the statistical programming language R.

The goal of the generalised least squares (GLS) regression analysis in this study is to identify the best model one can develop for estimating flood quantiles at ungauged catchments as a function of catchment characteristics. The basin characteristics used in the OLS analysis will be used in the GLS analysis, as suggested by Tasker et al. (1987).

Due to the correlation between the residuals, the different record lengths at each site and the cross-correlation of annual maximum flows in the region, the traditional OLS analysis is not appropriate for regional hydrological regression. The GLS regression procedure should be used to relate the quantiles to the specified basin characteristics and to describe the errors. The GLS regression procedure applied for this study is given below, while further detail on the equations used can be seen in Section 3.6.

3.8.1 REGRESSION MODEL FOR SKEWNESS

The streamflow record available at a site is often limited. In this study the records have a minimum of 25 years and a maximum of 52 years. These short records yield sample skews which are sensitive to extreme events. To improve the accuracy of the skewness estimator, a regional weighted least squares regression that relates sample skews to basin characteristics is undertaken; details of the equations and methods adopted can be seen in Section 3.6. Because the sample skew is an estimate with an associated estimation error, the approximations in Equations 3.41 and 3.50 (the sampling covariance matrices of residual errors for the standard deviation and flood quantile respectively) should be extended to reflect all the sampling error in the quantile estimates.

3.9 SETTING UP OF THE RESIDUAL ERROR COVARIANCE MATRICES

3.9.1 REGRESSION MODEL FOR STANDARD DEVIATION

To avoid correlation between the residuals and the fitted quantiles, an estimator of the standard deviation $\sigma_i$ other than the at-site value $s_i$ is required to compute Σ. Unbiased estimators of the standard deviation are desired; the important requirement is that the estimator be independent of the model's residuals and time-sampling error. To estimate the coefficients of the standard deviation model, Tasker and Stedinger (1989) suggest a GLS procedure, given the possible cross-correlation between sites. Thus a GLS model of the standard deviation is adopted, where the standard deviation is regressed against the basin characteristics that are present in the final OLS model for each ARI.

The estimation procedure for the sampling error covariance matrix Σ(s) for the standard deviation has been explained in Section 3.6. To avoid correlation between the residuals and the estimated standard deviations:

(i) $\rho_{ij}$ is estimated as a function of the distance between sites i and j (see Section 3.6);
(ii) the regional skew, estimated from the WLS regression (see Section 3.6), is used in place of the population skew;
(iii) the standard deviations $\sigma_i$ and $\sigma_j$ are the log standard deviations of the observed annual maximum floods at sites i and j.

Estimates of the model parameters β are obtained by iteratively solving Equation (3.42).

3.9.2 REGRESSION MODEL FOR MEAN

To avoid correlation between the residuals and the fitted quantiles, an estimator of the mean $\mu_i$ other than the at-site value $q_i$ is required to compute Σ. Unbiased estimators of the mean flood are desired; the important requirement is that the estimator be independent of the model's residuals and time-sampling error. To estimate the coefficients of the mean flood model, Tasker and Stedinger (1989) suggest a GLS procedure, given the possible cross-correlation between sites. Thus a GLS model of the mean flood is adopted, where the mean flood is regressed against the basin characteristics that are present in the final OLS models for each ARI. Estimation of the sampling error covariance matrix Σ(µ) for the mean flood has been discussed in Section 3.6. To avoid correlation between the residuals and the estimated means:

(i) $\rho_{ij}$ is estimated as a function of the distance between sites i and j (see Section 3.6);
(ii) the standard deviations estimated from the GLS regression above are used instead of the observed standard deviations at sites i and j.

Estimates of the model parameters β are obtained by iteratively solving Equation (3.44).

3.9.3 GLS REGRESSION MODEL FOR THE QUANTILES

To avoid correlation between the residuals and the fitted quantiles, GLS estimators of the mean and standard deviation and a regional estimate of the skewness, rather than the at-site values, are required to compute Σ. The important requirement is that the quantile estimates be independent of the model's residuals and time-sampling error. To estimate the coefficients of the quantile model, Tasker and Stedinger (1989) suggest a GLS procedure, given the possible cross-correlation between sites. Thus a GLS model for the quantile is adopted, where the quantile is regressed against the basin characteristics that are present in the final OLS models for each ARI. The estimation procedure for the sampling error covariance matrix $\hat{\Sigma}(Y)$ for the quantile is explained in Section 3.6. To avoid correlation between the residuals and the estimated quantiles:

(i) $\rho_{ij}$ is estimated as a function of the distance between sites i and j (see Section 3.6);
(ii) the standard deviations estimated from the GLS regression above are used instead of the observed standard deviations at sites i and j;
(iii) the regional skew, estimated from the WLS regression (see Section 3.6), is used in place of the population skew; the regional skew is also used to derive the regional frequency factors for each ARI.

Estimates of the model parameters β are obtained by iteratively solving Equation (3.46).

3.10 MEASURES OF MODEL AND PREDICTION ERROR

The purpose of this GLS regression model is to estimate flood quantiles at ungauged sites and to compare it with the traditionally used OLS regression model. Therefore, given a site with basin characteristics $x_o$, a concern is how well the GLS regression model predicts the true quantile, $y_o$ (Tasker et al., 1986). Under the assumption that the observed data were collected at sites representative of those at which predictions will be made, the average variance of prediction (AVP) over the available dataset is a measure of how well the GLS regression model predicts the true quantile on average, where:

$\bar{V}_{p,GLS} = \hat{\gamma}^2 + \dfrac{1}{n_e}\sum_i\left\{x_i(X^T\hat{\Lambda}^{-1}X)^{-1}x_i'\right\}$   (3.67)

$\bar{V}_{p,OLS} = \dfrac{s_r^2}{n_e}\left[n_e + \sum_i x_i(X'X)^{-1}x_i'\right]$   (3.68)

Here the $x_i$ are row vectors containing the catchment characteristics of each site. When comparing hydrologic regression models, a smaller AVP is preferred (see studies cited in Reis et al., 2005). Additionally, if the residuals have a nearly normal distribution, then the standard error of prediction in percent (SEP) for the true flood quantile estimator is described by:

$SE_p = 100\left[\exp(\bar{V}_p) - 1\right]^{1/2}$   (3.69)

In order to assess the precision, suitability and applicability of the GLS regression model, the AVP and the model error variance are preferred over common measures such as the traditional $R^2$, which misrepresents the true power of the model. The traditional $R^2$ statistic measures the proportion of variance in the observed $\hat{y}_i$ values explained by the fitted model. Unfortunately, that proportion considers the total error, which includes the sampling error. However, our interest is actually to quantify the proportion of the variance among the unobserved $y_i$ explained by the model. Let $\hat{\gamma}^2(k)$ be the estimated model error variance for the regression model (as explained in Section 3.6) with k explanatory variables, and $\hat{\gamma}^2(0)$ be the estimated model error variance when no explanatory variables are employed. Then a pseudo $R^2$ appropriate for use with GLS regression (Reis et al., 2005) is:

$R^2_{GLS} = 1 - \dfrac{\hat{\gamma}^2(k)}{\hat{\gamma}^2(0)}$   (3.70)
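A brief R sketch of these prediction-error measures is given below; X, Lambda_inv and the estimated model error variances are assumed to come from GLS fits of the kind sketched earlier (gamma2_0 from a model with a constant only), and $n_e$ is taken as the number of sites used.

# Average variance of prediction for GLS (Eq. 3.67), SEP (Eq. 3.69) and pseudo R^2 (Eq. 3.70)
avp_gls <- function(X, Lambda_inv, gamma2) {
  C  <- solve(t(X) %*% Lambda_inv %*% X)
  sv <- rowSums((X %*% C) * X)          # x_i (X' Lambda^-1 X)^-1 x_i' for each site
  gamma2 + mean(sv)
}

sep_percent <- function(avp) 100 * sqrt(exp(avp) - 1)                 # Eq. 3.69
pseudo_R2   <- function(gamma2_k, gamma2_0) 1 - gamma2_k / gamma2_0   # Eq. 3.70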


Both the pseudo R 2 and the traditional adjusted R2 correct for the degrees-of-freedom lost when k parameters are estimated.

Another important measure used in GLS regression is the average sampling error variance (ASEV). The ASEV for GLS and OLS, respectively, is calculated as in Equations 3.71 and 3.72:

$\sum_i x_i(X'\hat{\Lambda}^{-1}X)^{-1}x_i'$   (3.71)

$\sum_i x_i(X'X)^{-1}x_i'$   (3.72)

The sampling error is the error in the prediction of $y_i$ due to estimating the true regression parameters β with b. The smaller the ASEV for a model, the more accurate the prediction will be. The GLS model is expected to give smaller ASEVs than the competing OLS model.

The equivalent years of record, $en_i$ (Hardison, 1971), expresses the accuracy of prediction in terms of the years of record required to achieve results of equal accuracy. It is calculated as:

$en_i = \dfrac{\hat{s}_i^2\left[1 + k_i g_i + \dfrac{k_i^2}{2}(1 + 0.75 g_i^2)\right]}{\hat{\gamma}_i^2 + \hat{\Sigma}_{ii}}$   (3.73)

where $g_i$ is estimated from the regional weighted least squares regression as discussed in Section 3.8.1, and $\hat{s}_i^2$ is the variance estimated from the GLS regression as discussed in Section 3.9.1.

3.11 DEVELOPMENT OF THE PROBABILISTIC RATIONAL METHOD

In the past, the Rational Method has often been regarded as a deterministic representation of the flood generated from an individual storm. It is presented in ARR 1987 as a probabilistic or statistical method for use in estimating design floods. The peak flow of a selected ARI is estimated from an average rainfall intensity of the same ARI derived from Book II Section 1 of ARR. The central component of the method is a runoff coefficient, the use of which involves a simple linear interpolation over the geographic space between the nearest contour lines of the runoff coefficients; this assumes that geographical proximity is a surrogate for hydrological similarity.

The Rational Method was recommended in ARR1987 for application to only small catchments below some arbitrary limit such as 25 km2. This range of validity was intended to reflect the inadequate manner in which the method considers physical factors, such as the effects of temporary storage on the catchment, and temporal and spatial variations of rainfall intensity. These physical considerations have little relevance to the probabilistic interpretation, where their effects are incorporated in the recorded floods, and hence in the flood frequency statistics and the derived parameter values. Procedures derived from observed data should be valid for catchment areas and ARIs up to and somewhat beyond the maximum areas and record lengths used in derivation (I. E. Aust., 1987).

The Probabilistic Rational Method is represented by:

$Q_T = 0.278\,C_T\,I_{tc,T}\,A$   (3.74)

where $Q_T$ is the peak flow rate (m³/s) for an ARI of T years; $C_T$ is the runoff coefficient (dimensionless) for an ARI of T years; $I_{tc,T}$ is the average rainfall intensity (mm/h) for a design duration equal to the time of concentration $t_c$ (hours) and an ARI of T years; and A is the catchment area (km²).

The runoff coefficient represents the ratio of a peak runoff intensity, determined from frequency analysis of flood peaks, to a rainfall intensity of selected duration and the same ARI, determined from frequency analysis of rainfalls (Equation 3.75). This is why Q, I and C in Equation 3.74 are subscripted by T to represent the ARI. This probabilistic interpretation of the Rational Method and the runoff coefficient exactly fits the way in which the method is used in design practice. Even when it is not recognised, estimation of a design flood from rainfall frequency data such as those in Book II Section 1 involves use of the Rational Method as a probabilistic model (I. E. Aust., 1987).

$C_T = \dfrac{Q_T}{0.278\,I_{tc,T}\,A}$   (3.75)


Values of $I_{tc,T}$ for all of Australia can be found in Book II Section 1 of ARR. For several regions with adequate streamflow data, flood frequency analyses were carried out for many small to medium sized catchments. From the $Q_T$ values obtained by those analyses, values of $C_T$ were determined, and the resulting design data and methods for those regions were included in the recommended procedures in ARR1987. The catchment and rainfall characteristics and conditions affecting the relation between $Q_T$ and $I_T$ are automatically incorporated in $C_T$. Derived values of $C_T$ have generally been found to vary in a reasonably regular or consistent manner over the range of ARI values on a given catchment, and for different catchments over a particular region (I. E. Aust., 1987).

Equation 3.75 shows that the value of $C_T$ depends on the duration of rainfall, and some design duration related to catchment characteristics must be specified to estimate $C_T$ as part of the overall procedure. A typical response time of flood runoff appears to be adequate, and the “time of concentration” is a convenient measure as far as practical application of the PRM is concerned. In this context, its accuracy regarding travel times is much less important than the consistency and reproducibility of the derived $C_T$ values, as suggested in ARR87. Also, values of $C_T$ cannot be compared unless consistent estimates of $t_c$ are used in their derivation (I. E. Aust., 1987). However, in the deterministic interpretation of the Rational Method, the critical rainfall duration is $t_c$, which is considered to be the travel time from the most remote point on the catchment to the outlet, or the time taken from the start of rainfall until all of the catchment is simultaneously contributing flow to the outlet. For the PRM, these physical measures are not directly relevant. In several of the Rational Method procedures recommended in ARR1987, equations are specified for estimating $t_c$. The specified equation must be used with the design data given for the particular procedure and region. One commonly adopted equation is

$t_c = 0.76\,A^{0.38}$   (3.76)

where $t_c$ is the time of concentration (hours) and A is the catchment area (km²).


In other cases where a complete procedure based on observed data is not available, the Bransby Williams formula was recommended in ARR1987 as an arbitrary but reasonable approach. This is:

$t_c = \dfrac{58\,L}{A^{0.1}\,S_e^{0.2}}$   (3.77)

where $t_c$ is the time of concentration (hours); L is the mainstream length measured to the catchment divide (km); A is the catchment area (km²); and $S_e$ is the equal area slope of the main stream projected to the catchment divide (m/km). This is the slope of a line drawn on a profile of the stream such that the line passes through the outlet and has the same area under it as the stream profile. In this study, Equation 3.77 was adopted for Victoria, as suggested in ARR1987.
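Putting Equations 3.74 and 3.77 together, a PRM estimate can be sketched in R as below; the runoff coefficient and design rainfall intensity must come from the ARR design data, so the values used in the example are purely illustrative.

# Probabilistic Rational Method peak flow (Eq. 3.74), in m^3/s
prm_peak <- function(C_T, I_tc_T, area_km2) 0.278 * C_T * I_tc_T * area_km2

# Bransby Williams time of concentration (Eq. 3.77), units as stated in the text
tc_bransby_williams <- function(L_km, area_km2, Se_m_per_km) {
  58 * L_km / (area_km2^0.1 * Se_m_per_km^0.2)
}

tc <- tc_bransby_williams(L_km = 20, area_km2 = 150, Se_m_per_km = 5)
prm_peak(C_T = 0.25, I_tc_T = 30, area_km2 = 150)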

In the development of the new prediction equations, test catchments have been excluded from the database used but there is a great possibility that the isopleths representing CT values in the PRM in ARR 1987 have included these data points within their derivation. As a result a direct comparison would show bias towards the PRM. Consequently to eliminate bias, it is deemed necessary to re-apply the existing PRM as detailed in ARR Vol. 1 (I.E Aust, 1987) based upon the latest streamflow data obtained. The theory of the PRM has been discussed in Chapter 2 while the key assumptions are discussed above.

3.12 SUMMARY

A number of statistical techniques and formulations to be used in this study have been presented in this chapter. Two statistics based on L-moments are of special importance. Firstly, the discordancy measure (D) will be used at the beginning of the analysis to identify sites with possible gross data errors. Secondly, the heterogeneity measure (H) will be used to assess the degree of homogeneity in the proposed region. This will be done to try to identify similar regions for more robust regional regression equations.

LP3 Bayesian methods will then be used to derive at-site flood frequency quantile estimates for the selected 8 ARIs. This will also incorporate error in the rating curve analysis, which will provide more accurate flood quantiles based on a Bayesian methodology of identifying the probability distribution of the error in incremental zones.

Multiple regression analysis using the ordinary least squares and generalised least squares procedures will then be undertaken to derive regional regression equations relating flood quantiles to catchment and climatic characteristics. The GLS regression procedure has been examined in more detail. The 4-stage generalised least squares method adopted in this study, specifically for relating hydrological statistics such as flow quantiles, means, standard deviations and skewness to catchment/climatic characteristics, has been discussed. The setting up of the residual error covariance matrices at each stage has also been discussed.

The next chapter will discuss the study area and different aspects of streamflow data collation and preparation.


CHAPTER 4: STUDY AREA AND PREPARATION OF STREAMFLOW DATA

4.1 GENERAL

The assembly and preparation of streamflow data is an important step in any regional flood frequency analysis study. This chapter describes various aspects of the streamflow data collation adopted for this work, e.g. selection of the study area, selection of stream gauging sites, checking annual maximum streamflow data, filling gaps in the streamflow data series, checking rating curve extrapolation errors associated with the streamflow data series, checking for outliers in the data series and testing for any significant trends that could undermine the purpose of flood frequency analysis. Finally, a set of independent test catchments is selected for validation and testing of the developed prediction equations and the PRM.

4.2 STUDY AREA

For this RFFA, the state of Victoria is selected as the study area. If this study shows the superiority of the QRT over the current PRM, it can then be extended to other states within Australia as a part of the revision of Book 4 of ARR. The selected study area is shown in Figure 4.1.

Figure 4.1 Study area in the state of Victoria in Australia highlighted


4.3 SELECTION OF INITIAL CANDIDATE CATCHMENTS

The following factors were considered in making the initial selection of the study catchments.

Catchment Area: The proposed regionalisation study aims at developing prediction equations for flood estimation in small to medium sized ungauged catchments in Victoria. Since the flood frequency behaviour of large catchments has been shown to differ significantly from that of smaller catchments, the proposed method should be based on small to medium sized catchments. ARR (I.E Aust, 1987) suggests an upper limit of 1000 km² for small to medium sized catchments, which seems reasonable and was adopted here.

Record Length: The streamflow record at a stream gauging location should be long enough to characterise the underlying probability distribution with reasonable accuracy. In most practical situations, streamflow records at many gauging stations in a given study area are not long enough and hence a balancing act is required between obtaining a sufficient number of stations (which captures greater spatial information) and a reasonably long record length (which enhances accuracy of at-site flood frequency analysis). Selection of a cut-off record length appears to be difficult as this can affect the total number of stations available in a study area. However for this study, the stations having a minimum of 10 years of annual instantaneous maximum flow records were selected initially as ‘candidate stations’.

Regulation: Ideally, the selected streams should be unregulated, since major regulation affects the rainfall-runoff relationship significantly (storage effects). Streams with minor regulation, such as small farm dams, may be included because this type of regulation is unlikely to have a significant effect on annual floods. Gauging stations subject to major regulation were not included.

Urbanisation: Urbanisation can affect flood behaviour dramatically (e.g. decreased infiltration losses and increased flow velocity). Therefore catchments with more than 10% of the area affected by urbanisation were not included in the study.


Landuse Change: Major landuse changes, such as the clearing of forests or changing agricultural practices modify the flood generation mechanisms and make streamflow records heterogeneous over the period of record length. Catchments which have undergone major land use changes over the period of streamflow records were not included in the data set.

Quality of Data: Most of the statistical analyses of flood flow data assume that the available data are essentially error free; at some stations this assumption may be grossly violated. Stations graded as ‘poor quality’ or with specific comments by the gauging authority regarding quality of the data were assessed in greater detail; if they were deemed ‘low quality’ they were excluded.

Based on the above criteria, a total of 415 stations were selected, each having a minimum of 10 years of streamflow record. The geographical distribution of the candidate stations can be seen in Figure 4.2. It is interesting to note that there is a lack of stations in the north-west of Victoria. This is not surprising, as there is usually little surface runoff during most years in this region and there is a lack of a well-defined stream network.

Figure 4.2 Geographical distribution of the candidate study catchments


4.4 FILLING MISSING RECORDS IN ANNUAL MAXIMUM FLOOD SERIES

Missing observations in streamflow records at gauging locations are very common, and one of the elementary steps in any hydrological data analysis is to make decisions about filling in these missing data points. A popular method adopted for this purpose is to correlate records of the gauging station in question with one or more nearby stations that exhibit hydrological similarity. This method, however, was not adopted in this study for the following reasons:
• there is no guarantee that nearby stations possess hydrological similarities in flood characteristics, unless it is proven so; and
• research has shown that the average standard error for a missing record reconstructed with flow data from other stream gauges may be as high as 52% (Fontaine, 1986).

Instead of the above method, missing records in the annual maximum flood series were in-filled where possible by one of the following methods, as shown in Rahman (1997).

Method 1: (a) Comparison of the monthly instantaneous maximum (IM) data with the monthly maximum mean daily (MMD) data at the same station for years with data gaps. If a missing month of instantaneous maximum flow corresponds to a month of very low maximum mean daily flow, then that was taken to show that the annual maximum did not occur during the missing month. As an example, consider the gap in 1965 at Station 222202, shown in Table 4.1. It can be seen that, during the missing months of IM data (i.e. January-May), the MMD flows are substantially lower than the IM flow in August. Therefore it can safely be assumed that the annual IM flow for 1965 was the value recorded in August.

Table 4.1 In-filling annual maximum flood series (Method 1)

Month:        Jan   Feb   Mar   Apr   May   Jun   Jul   Aug    Sep   Oct   Nov    Dec
MMD Series:   470   115   142   235   91    252   228   5155   198   360   1756   265
IM Series:    -     -     -     -     -     351   262   11534  221   626   3091   283


Method 2:

(b) Regression of the annual maximum mean daily flow series against the annual instantaneous maximum series at the same station. The regression equations developed showed strong correlations (0.82 ≤ R2 ≤ 0.99) and were used for filling gaps, but not to extend the overall period of record. For some of the stations, transformation of the flow data into natural logarithms improved the goodness-of-fit. Flow values equal or close to zero were ignored in developing the regression equations.
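A minimal sketch of how Method 2 could be implemented in R is given below (assuming vectors im and mmd of annual instantaneous maxima and annual maximum mean daily flows for one station, with NA marking missing years; the variable and function names are illustrative only):

```r
# Sketch of Method 2 gap filling: regress log(annual IM) on log(annual MMD)
# using years where both are available, then predict IM for years where only
# MMD exists. Zero or near-zero flows are excluded, as in the study.
fill_annual_max <- function(im, mmd) {
  d <- data.frame(im = im, mmd = mmd)
  cal <- d[!is.na(d$im) & !is.na(d$mmd) & d$im > 0 & d$mmd > 0, ]
  fit <- lm(log(im) ~ log(mmd), data = cal)
  cat("R-squared of gap-filling regression:", summary(fit)$r.squared, "\n")
  gap <- is.na(d$im) & !is.na(d$mmd) & d$mmd > 0
  # back-transform from log space (a simple sketch; ignores retransformation bias)
  d$im[gap] <- exp(predict(fit, newdata = d[gap, , drop = FALSE]))
  d$im
}
```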

Method 1 was preferred over Method 2. The following points are worth mentioning in connection with the infilling of the annual maximum series; it should be noted that the values below were obtained after the removal of stations showing significant trends and of stations that had no IM or MMD data available for infilling gaps.

• 273 data points from 187 stations were filled by comparing flow records (Method 1);
• 60 data points from 44 stations were filled by regression (Method 2);
• the regression equations used in gap filling showed high R2 values (range 0.82 - 0.99, mean = 0.93, SD = 0.041); and
• 10% of stations did not have any missing records.

Table 4.2 shows some useful information in relation to in-filling the gaps in the annual flood series by Method 2.


Table 4.2: Important information on Method 2 of gap filling in AM data series

Station   Missing points   Sample size for regression   Obtained R2   Obtained R
222202    1                20                           0.96          0.98
222206    1                30                           0.93          0.96
223202    1                27                           0.94          0.97
223204    1                33                           0.97          0.99
225213    1                31                           0.95          0.98
226007    5                15                           0.9           0.95
227210    1                30                           0.91          0.95
227219    1                31                           0.84          0.92
227231    1                31                           0.94          0.97
228212    2                28                           0.98          0.99
228217    1                27                           0.88          0.94
229214    2                32                           0.93          0.96
229218    1                22                           0.82          0.91
230204    1                30                           0.93          0.96
231212    7                13                           0.97          0.99
234203    1                32                           0.97          0.99
235212    2                15                           0.88          0.94
235216    1                34                           0.87          0.93
235221    1                11                           0.94          0.97
235223    1                14                           0.88          0.94
235229    1                14                           0.97          0.99
237202    1                37                           0.97          0.99
237206    1                32                           0.99          1.00
238207    1                31                           0.91          0.95
238208    1                37                           0.92          0.96
238219    1                32                           0.95          0.98
238221    1                24                           0.95          0.98
238233    1                12                           0.95          0.98
401208    1                40                           0.9           0.95
401216    2                50                           0.9           0.95
403205    1                34                           0.99          1.00
403209    1                32                           0.99          1.00
403213    1                32                           0.91          0.95
403214    1                31                           0.92          0.96
403216    1                11                           0.99          1.00
403222    1                31                           0.95          0.98
403227    1                32                           0.95          0.98
404206    1                32                           0.95          0.98
404208    1                31                           0.95          0.98
405209    1                32                           0.88          0.94
405219    1                38                           0.98          0.99
405231    1                30                           0.92          0.96
405241    1                31                           0.89          0.94
407209    3                27                           0.91          0.95


4.5 TREND ANALYSIS – MANN KENDALL TEST FOR TREND AND DISTRIBUTION FREE CUSUM TEST

A simple but effective procedure for screening hydrological data is to test the annual maximum flood series for trends, both increasing and decreasing, and for significant increases or decreases in the mean of the data set after a particular point in time.

Hydrological data for any flood frequency analysis work, be it at-site or regional, should be stationary, consistent and homogeneous. Annual maximum time series of hydrological data may exhibit jumps and trends owing to what Yevjevich and Jeng (1969) call inconsistency and non-homogeneity. Inconsistency is a change in the amount of systematic error associated with the recording of the data; it can arise from the use of different instruments and methods of observation. Non-homogeneity is a change in the statistical properties of the time series; its causes can be either natural (shifts in weather patterns/climate change) or man-made, such as changes in land use, relocation of the gauging station and implementation of flow diversions.

In this study, two trend tests were applied, the Mann–Kendall test and the distribution free CUSUM test; both tests were applied at the 5% significance level. The Mann-Kendall test is concerned with testing whether there is an increase or decrease in a time series, whereas the CUSUM test concentrates on whether the means in two parts of a record are significantly different. As a useful guide and in addition to the trend tests, a simple time series plot and a cumulative flow graph of the station were also used to detect shifts in data.

Continuing from above, there is also the possible influence of long-term persistence in flood data (e.g. flood and drought dominated regimes or inter-decadal variability as described by Kuczera and Franks (2005) in the revised ARR Chapter 2) but this has not been specifically considered in this study. While the use of data from different periods would tend to smooth out such persistence effects, it could also introduce additional ‘noise’ into flood series and mask true differences in catchment flood response.


A brief outline and some equations involved in the tests are outlined below. The analysis was undertaken using the TREND detection software based on the CRC for Catchment Hydrology publication Hydrological Recipes.

MANN–KENDALL TEST
This method tests whether there is a trend in the time series data. It is a non-parametric test. The n time series values (X_1, X_2, X_3, ..., X_n) are replaced by their relative ranks (R_1, R_2, R_3, ..., R_n), starting at 1 for the lowest value up to n.

The test statistic S is:

S = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} sgn(R_j − R_i)    (4.2)

where sgn(x) = 1 for x > 0, sgn(x) = 0 for x = 0 and sgn(x) = −1 for x < 0.

If the null hypothesis H0 is true, then S is approximately normally distributed with:

µ = 0 and σ² = n(n−1)(2n+5)/18    (4.3)

The z-statistic is therefore (critical test statistic values for various significance levels can be obtained from normal probability tables):

Z = S/σ    (4.4)

A positive value of S indicates an increasing trend and vice versa.
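The Mann–Kendall statistic defined by Equations 4.2 to 4.4 can be computed with a short base-R sketch such as the one below (the study itself used the TREND software; the sketch ignores the correction for tied values):

```r
# Mann-Kendall test for trend (Equations 4.2 to 4.4); no correction for ties
mann_kendall <- function(x) {
  n <- length(x)
  r <- rank(x)                                # replace values by their relative ranks
  s <- 0
  for (i in 1:(n - 1)) {
    s <- s + sum(sign(r[(i + 1):n] - r[i]))   # Eq. 4.2
  }
  sigma2 <- n * (n - 1) * (2 * n + 5) / 18    # variance of S under H0 (Eq. 4.3)
  z <- s / sqrt(sigma2)                       # Eq. 4.4
  p <- 2 * (1 - pnorm(abs(z)))                # two-sided p-value
  list(S = s, Z = z, p.value = p, significant_at_5pc = p < 0.05)
}
```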

DISTRIBUTION FREE CUSUM TEST
This method tests whether the means in two parts of a record are different (for an unknown time of change). It is a non-parametric test.

Given a time series (X_1, X_2, X_3, ..., X_n) the test statistic is defined as:

V_k = Σ_{i=1}^{k} sgn(x_i − x_median)    k = 1, 2, 3, ..., n    (4.5)

where sgn(x) = 1 for x > 0, sgn(x) = 0 for x = 0 and sgn(x) = −1 for x < 0, and x_median is the median value of the x_i data set.


The distribution of V_k follows the Kolmogorov–Smirnov two-sample statistic (KS = (2/n) max|V_k|), with the critical value of max|V_k| at the 5% significance level given by:

1.36 √n    (α = 0.05)    (4.6)

A negative value of V_k indicates that the latter part of the record has a higher mean than the earlier part, and vice versa.
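A corresponding base-R sketch of the distribution-free CUSUM test (Equations 4.5 and 4.6) is given below; again, the TREND software was used for the actual analysis in this study:

```r
# Distribution-free CUSUM test for a shift in the mean (Equations 4.5 and 4.6)
cusum_test <- function(x) {
  n <- length(x)
  v <- cumsum(sign(x - median(x)))   # Vk, k = 1, 2, ..., n (Eq. 4.5)
  crit <- 1.36 * sqrt(n)             # 5% critical value of max|Vk| (Eq. 4.6)
  list(Vk = v,
       max_abs_Vk = max(abs(v)),
       critical_value_5pc = crit,
       significant_shift = max(abs(v)) > crit)
}
```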

Initially the Mann-Kendall test was applied to the selected stations. The results were rather surprising, as they revealed that some 20% of the stations had a decreasing trend, generally after 1990. Given the large number of stations showing a trend, time series plots and mass curves were prepared for these stations to detect visually whether significant changes in slope could be identified.

As an example, Figure 4.3 shows a significant downward trend for Station 230210 after 1990, supporting the result from the Mann-Kendall test. In order to clarify this further the CUSUM test was applied; the result was much the same, with the plotted graph as seen in Figure 4.4 showing a decrease in the flow magnitude from 1995 onwards.

A simple time series plot was therefore useful, in addition to the trend tests, in detecting and confirming shifts in the data. With these tests indicating that flood data are not independently and identically distributed from year to year, caution needs to be applied when using short records to estimate long-term risks. The fact that the last 10-15 years of data (after the late 1980s) show a significant downward trend for many stations (presumably due to the drier climate epoch we have entered) makes the inclusion of stations with short records in regionalisation studies quite questionable.

It is important to incorporate these findings in the data collation for this regionalisation study. We can compensate for sampling variability in many RFFA methods but we cannot compensate for the bias that will be introduced into the model due to the systematic downward trend in annual maximum flood data encountered in the short records.

However, more recently Micevski et al. (2006) presented an alternative approach based on Bayesian hierarchical modelling that can quantify uncertainty in RFFA by allowing for interdecadal variability; this could certainly be an alternative future approach. In this study, the introduction of a cut-off record length appears to be appropriate, i.e. records shorter than 25 years and extending to near 2005 are likely to be affected by significant bias because of the persistent drought impacts since the early 1990s; they should thus be excluded from the database. Although this approach will remove more than half of the candidate stations and undermine spatial coverage, the remaining stations will be less affected by bias and thus will yield more accurate RFFA methods.

The number of eligible stations remaining after the trend tests and the introduction of a cut-off record length of 25 years dropped to 146, which is only 35% of the initially selected 415 stations. This result shows that the effective data set for RFFA in a given region is likely to be substantially smaller than the primary data set.

Figure 4.3 Time series graph of annual instantaneous maximum flow (ML/d) for Station 230210, showing significant trends (a decrease in flow magnitude) after 1995


Figure 4.4 CUSUM test plot (Vk against year) for Station 230210, showing a significant downward shift after 1995

Sample plots from the trend analysis can be seen in Appendix B.

4.6 RATING CURVE EXTRAPOLATION ERROR

The rating curve used to convert measured flood levels to flood flow rates is based on periodic measurements of flow areas and velocities over a range of flood levels and flood magnitudes. However, the range of observed flood levels generally exceeds the range of 'measured' flows, thus requiring different degrees of extrapolation of well-established rating curves. Different methods of rating curve extrapolation use a range of assumptions, from simple extension of fitted regression lines to hydraulic analysis methods requiring additional data. The magnitude of rating curve extrapolation errors depends on the stream and floodplain conditions near the gauging station, the strength of the assumptions made in extrapolating, and the degree of extrapolation beyond the range of measured flows (Kuczera, 1999; FLIKE HELP, Chapter 2).

Any rating curve extrapolation errors are directly transferred into the largest observations in the annual maximum flood series, and use of extrapolated data in flood frequency analysis can thus result in grossly inaccurate flood estimates. This is likely to translate into over- or under-design of hydraulic structures with great economic consequences.


To assess the degree of rating curve related error for a given station, the annual maximum flood series data point for each year (estimated flow QE) was divided by the maximum measured flow (QM) to obtain a rating ratio (RR), see Equation 4.7. If the RR value is below or near 1, the corresponding annual maximum flow may be considered to be free of rating curve extrapolation error. However, an RR value well above 1 indicates a rating curve error that can cause notable errors in flood frequency analysis.

Rating Ratio (RR) = QE / QM    (4.7)

As an example, for Station 222202, there are 11 data points with RR values greater than 1 (27% of total data points) and the maximum value of RR is 5.5 (Figure 4.5). This large degree of rating curve extrapolation is likely to affect flood frequency estimates at this station, unless appropriate measures are taken to allow for likely differences in accuracy of individual annual maxima. All the selected stations were examined for rating curve extrapolation error as described above.
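The screening can be expressed as a short base-R sketch (the names q_est and q_meas_max are illustrative only; q_est is the annual maximum flood series and q_meas_max the maximum measured flow at the station):

```r
# Rating ratio screening (Equation 4.7): RR = QE / QM for each annual maximum
rating_ratio_summary <- function(q_est, q_meas_max) {
  rr <- q_est / q_meas_max
  list(RR = rr,
       n_above_1 = sum(rr > 1),           # data points relying on extrapolation
       pct_above_1 = 100 * mean(rr > 1),
       max_RR = max(rr))
}
```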

Figure 4.5 Plot of rating ratios (QE/QM) against rank of data point for Station 222202, showing the data points subject to possible rating curve error


4.7 IMPACT OF RATING CURVE ERROR ON FLOOD FREQUENCY ANALYSIS

In the remaining data set of 146 stations, many had rating ratios (RR) considerably greater than 1. For any RFFA study, a large number of stations with reasonably long record lengths are required and hence a trade-off needs to be made between an extensive data set that includes stations with very large RR values and a smaller data set with RR values restricted to what could be considered to be a “reasonable upper limit”.

A working method to decide on a cut-off RR value was determined by looking at the average RR value and the maximum RR value for each station. From the histogram of RR values of the selected stations shown in Figure 4.6, it can be seen that 90% of the RR values for all the recorded annual maxima lie between 1 and 20. Thus it was decided that a cut-off RR value of 20 would be reasonable, and that any station having an average RR value greater than 4 and a maximum RR value greater than 20 would be rejected. Rating ratios significantly greater than one could magnify the errors in flood frequency quantile estimates but, on the other hand, rejecting all stations with RR greater than one would reduce the number of stations below the minimum required for meaningful RFFA to be undertaken. Hence it seems reasonable and justifiable to adopt the cut-off value of RR as mentioned above, which reduced the eligible number of stations to only 133.
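The adopted rejection rule can be written as a simple per-station check (a sketch with an illustrative function name; rr is the vector of rating ratios for one station):

```r
# Reject a station if its average rating ratio exceeds 4 AND its maximum
# rating ratio exceeds 20 (the cut-off rule adopted above)
reject_station <- function(rr) {
  mean(rr) > 4 && max(rr) > 20
}

reject_station(c(0.8, 1.2, 3.5, 18))   # FALSE: this hypothetical station is retained
```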

Figure 4.6 Histogram of rating ratios (RR) of annual maximum flood data in Victoria (stations with record lengths > 25 years); about 90% of the rating ratios lie between 1 and 20


4.8 IMPACTS OF RATING RATIO ON FLOOD FREQUENCY ANALYSIS – SENSITIVITY ANALYSIS

The FLIKE software, which implements the principles outlined in Kuczera and Franks (2005), was employed to fit the LP3 distribution using the Bayesian parameter fitting procedure with both the ‘no rating curve error’ and the ’rating curve error’ cases to assess the impact of rating curve errors. The flow that was closest to RR=1 was used as the “anchor point” in the FLIKE rating curve error model. A log normal error probability model was also adopted.

The number of error groups was taken as 2. To deal with the incremental error standard deviation, a percentage difference was estimated between the anchor flow, whose rating ratio is 1, and the measured flow (QM), whose rating ratio could be up to RR = 20. Station 225218 is used as an example to highlight the impact of RR on flood estimates (Figure 4.7). An incremental error percentage of 20% was used. The incremental error percentage represents the coefficient of variation of the ratio of the estimated flow and the anchor point flow for RR values greater than one.

The quantile estimate (100 year ARI) for the analysis ignoring the rating curve error was 99,200 ML/d, while the quantile estimate considering the rating curve error was 112,300 ML/d (a 13% increase). From a design point of view, adopting the flood frequency estimate without considering the rating curve error in this example would lead to an underestimation of the 100 year flood by about 13,000 ML/d. The FLIKE error model was adopted to account for the rating curve error for all the selected stations.


Figure 4.7 Impact of considering rating curve error in flood frequency analysis: LP3 distribution fitted to Station 225218 data with the Bayesian procedure, with and without rating curve error analysis (100 year ARI estimates of 112,300 ML/d and 99,200 ML/d respectively; 90% confidence limits shown)

4.9 SELECTED CATCHMENTS

As discussed in Section 4.3, a total of 415 stations, each with a minimum record length (N) of 10 years, was initially selected. After the gaps in the flood series were filled (Section 4.4), stations with significant rating curve extrapolation errors were identified (Section 4.6), trend testing was completed, and stations with a minimum of 25 years of record starting before 1980 and extending up to 2005 were identified (Section 4.5), only 133 stations remained.

The record lengths of the selected 133 stations are summarised in Appendix A, while the distribution of record lengths is presented in Figure 4.8. Some statistics are given below:

• Record lengths range from 25 to 52 years, with a mean of 32 years, median of 32 years and standard deviation of 5 years;
• 87% of the stations have record lengths in the range 25-35 years;
• 8% of the stations have record lengths in the range 35-45 years; and
• 5% of the stations have record lengths in the range 50-55 years.


Figure 4.8 Distribution of record lengths (years) of the selected 133 catchments

The geographical distribution of the 133 catchments can be seen in Figure 4.9. The areas of the selected catchments range from 3 to 997 km2. The distribution of catchment size is presented in Figure 4.10, which indicates that 78% of the catchments have an area less than 500 km2.

Figure 4.9 Geographical distribution of the selected 133 catchments


Figure 4.10 Distribution of catchment areas (km2) of the selected 133 catchments

4.10 CHECKING FOR OUTLIERS IN THE ANNUAL MAXIMUM SERIES

In a set of annual maximum flood series there is a possibility of outliers being present. An outlier is an observation that deviates significantly from the bulk of the data, which may be due to errors in data collection or recording, or due to natural causes (as identified in Section 4.5). The presence of outliers in the data causes difficulties when fitting a distribution to the data. Low and high outliers are both possibilities and have different effects on the analysis. For example, it is not easy to delete suspected high outliers, because these highest flows might be the most important data points in the series.

There are no appropriate criteria for detection and handling of outliers in flood data (Jackson 1981), and all of the available procedures for dealing with outliers are subjective and require mathematical and hydrological judgement.

The method for treating outliers suggested in ARR (I.E Aust., 1987) was not adopted here, as it includes an adjustment for skew. Instead, the Grubbs and Beck method (Rao and Hamed, 2000), which is also the method used in Bulletin 17B (USDIGS, 1981), was adopted.


In the Grubbs and Beck (G-B, 1972) test the quantities x_H and x_L are calculated by Equations 4.8 and 4.9:

x_H = exp(x̄ + k_N s)    (4.8)
x_L = exp(x̄ − k_N s)    (4.9)

where x̄ and s are the mean and standard deviation of the natural logarithms of the sample, respectively, and k_N is the G-B statistic tabulated for various sample sizes and significance levels. Sample values greater than x_H are considered to be high outliers, while those less than x_L are considered to be low outliers.
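A base-R sketch of the G-B screening is given below; the kN value must be supplied from the published tables for the relevant sample size and significance level, as it is not computed here:

```r
# Grubbs-Beck outlier thresholds (Equations 4.8 and 4.9) for an annual maximum
# flood series q; kN is the tabulated G-B statistic for the given sample size
# and significance level (supplied by the user).
grubbs_beck <- function(q, kN) {
  y <- log(q)                        # natural logarithms of the sample
  xH <- exp(mean(y) + kN * sd(y))    # high-outlier threshold (Eq. 4.8)
  xL <- exp(mean(y) - kN * sd(y))    # low-outlier threshold  (Eq. 4.9)
  list(xH = xH, xL = xL,
       high_outliers = q[q > xH],
       low_outliers  = q[q < xL])
}
```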

The results of the outlier detection are summarised below.

• 43% of the stations were found to have low outliers. The maximum number of low outliers detected in a data series was 5 and never exceeded 19% of the total number of data points in a series.

• Most of the detected low outliers occurred in the stations located in low rainfall areas, especially in the western part of Victoria.

• 31% of the low outliers occurred in the year 1982. This is not surprising as there were severe droughts during these years; the maximum flows that occurred in many rivers were merely base flows, and not due to flood activity. Similar results were found by Rahman (1997).

• 55% of the stations did not show any outliers. Even the values in drought years (1982, 1967) were not low enough to be treated as low outliers. The locations of most of these stations were in the south eastern parts of Victoria.

• Only 1 station was found to show a high outlier.

The detected outliers were treated as censored flows in flood frequency analysis using FLIKE (that is, the information that there was no flood in that year was taken into account). For the detected high outlier, a check was made to decide whether to delete this point or retain it. An example is given here. From the graphical representation of fitting an LP3 distribution to the data of Station 405234, the detected high outlier was found to be within the 95% confidence band (see Figure 4.11). Also, for Station 405234 the data of 1968 were compared with nearby stations, which showed similar flows. Thus, the detected high outlier point was deemed not to be due to data error or natural error (e.g. measurement, recording, transmission and processing error) and was retained.

Figure 4.11 High outlier detected for Station 405234 (gauged flows with fitted LP3 distribution and 95% confidence limits; discharge against annual exceedance probability)


4.11 SELECTION OF TEST CATCHMENTS

To provide an independent test of the performance of the developed QRT (ordinary least squares and generalised least squares) and the PRM, some of the study catchments were put aside and were not included in the development of the techniques. It was decided to keep 20 (i.e. about 15% of the total) as independent test catchments. These catchments were chosen randomly and are listed in Table 4.3.

Table 4.3 List of Test Catchments

Site ID   Catchment Name                    Area (km2)   Code
221207    Errinundra at Errinundra          158          T1
221210    Genoa at The Gorge                837          T2
221212    Bemm at Princes HWY               725          T3
223202    Tambo at Swifts Ck                943          T4
225224    Avon at The Channel               554          T5
226209    Moe at Darnum                     214          T6
227200    Tarra at Yarram                   218          T7
227210    Bruthen Ck at Carrajung Lower     18           T8
227219    Bass at Loch                      52           T9
229218    Watsons Ck at Watsons Ck          36           T10
230204    Riddells Ck at Riddells Ck        79           T11
230213    Turritable Ck at Mount Macedon    15           T12
231231    Toolern Ck at Melton South        95           T13
235205    Arkins Ck West B at Wyelangta     3            T14
401210    Snowy Ck at below Granite Flat    407          T15
402217    Flaggy Ck at Myrtleford Rd Br     24           T16
405229    Wanalta Ck at Wanalta             108          T17
406200    Coliban at Malmsbury              306          T18
406213    Campaspe at Redesdale             629          T19
415238    Wattle Ck at Navarre              141          T20

4.12 SUMMARY

A total of 133 catchments were selected from the State of Victoria. The annual instantaneous maximum flow series of the stations were collected, gaps were filled, rating curve extrapolation errors were identified, trends and shifts in the data were identified, and outlier points were censored. Twenty independent test catchments were selected randomly and put aside to be used for testing of the developed regional flood frequency analysis methods.


CHAPTER 5: SELECTION AND ESTIMATION OF CLIMATIC AND CATCHMENT CHARACTERISTICS

5.1 GENERAL

This study is primarily concerned with developing regional prediction equations for design flood estimation using a quantile regression technique; therefore an elementary step in any regional study such as this involves obtaining both climatic and catchment characteristics data. Identifying the most relevant catchment characteristics is difficult as there is no objective method for doing this; also, many catchment characteristics are highly correlated, so including many of them in the model can cause problems in the statistical analysis, such as multicollinearity, without providing any extra useful information.

Indeed, Rahman (1997) indicated that there is no objective method for selecting catchment characteristics; thus an initial selection of candidate characteristics should be based on an evaluation of the success of catchment characteristics used in past studies. Rahman (1997) considered in detail all possible climatic/catchment characteristics from over 20 previous studies to develop a reasonable starting point.

Nevertheless, no general inference about the significance of a particular catchment characteristic can be drawn from the fact that an investigator has found it to be significant, since in a regional study such as this the dominant characteristics may vary from region to region.

In this chapter, the climatic and catchment characteristics to be used in this research will be selected with the aim of developing a working database of catchment characteristics. Initially the selection of candidate catchment characteristics will be described in detail and aspects of how the data was collected will be discussed. An exploratory data analysis of the catchment characteristics will then be presented which will look at the best functional form of the catchment characteristic for use in the quantile regression technique.


5.2 CATEGORIES OF CATCHMENT CHARACTERISTICS CONSIDERED

The candidate catchment characteristics considered in this research come under the following headings: climatic characteristics, morphometric characteristics, catchment cover and land use characteristics, and geological and soil characteristics. These characteristics are briefly discussed below.

5.2.1 CLIMATIC CHARACTERISTICS

Climatic characteristics such as rainfall and evaporation can have direct and secondary effects in producing flows. Secondary effects can be attributed to both catchment morphology and vegetation. In regard to flood peaks, rainfall intensity is likely to be the dominant characteristic. Mean annual rainfall, though not affecting flow generation directly, can have a second order effect by being a surrogate for other catchment characteristics. Snowfall is of little importance in the study area and is not included in this research.

5.2.2 MORPHOMETRIC CHARACTERISTICS

Flood peaks are directly influenced by the morphometry of the drainage networks within a catchment. Many of the morphometric characteristics are highly correlated, with some being alternative measures of similar properties. For example, equal area slope, average land slope and main stream slope are highly correlated and are all measures of slope.

5.2.3 CATCHMENT COVER AND LAND USE CHARACTERISTICS

Flood peaks are influenced by both forestry and urban development. Percent forest and percent urbanisation are used as indicators of this. Forest reduces runoff through the interception of precipitation and through transpiration. Percent forest has been used in past studies as an indication of land use (Flavell, 1982). An increase in urbanisation means an increase in paved area, which results in an increase in flow velocity and a decrease in infiltration loss. A quicker flood response is a direct result of this.


5.2.4 GEOLOGY

Geological and soil characteristics affect catchment response in numerous ways: directly through infiltration loss and indirectly through vegetation and channel morphometry. Both vegetation and channel morphometry are closely related to soil characteristics. NERC (1975) found soil type to be a significant variable; however, there are impracticalities associated with the use of these characteristics since:
• soil properties have spatial and temporal variations which are very difficult to quantify;
• in almost all cases, data relating to soil type are difficult to obtain; and
• it is difficult to identify a catchment-wide representative value of soil type.

5.3 SELECTION CRITERIA

It is evident that there is no generic method of selecting catchment characteristics for a RFFA study. However, the following guidelines, adopted from past studies (e.g. Rahman, 1997) and by the ARR revision team, were used in making a reasonable selection. The ARR revision team consists of researchers and industry representatives who are contributing towards the upgrade of Book 4 of ARR.
• The characteristic should have a plausible role in flood generation.
• Characteristics should be unambiguously defined.
• Characteristics should be easily obtainable. When a simpler characteristic and a complex one are correlated and have similar effects, the simpler characteristic should be chosen.
• If a derived/combined characteristic is used, it should have a simple physical interpretation.
• The selected characteristics should not be highly correlated, because this introduces unstable parameters in multiple regression analysis.
• The prediction performance of a particular characteristic in other regionalisation studies should be examined, as this provides some information regarding the importance/inclusion of a characteristic.


5.4 CATCHMENT CHARACTERISTICS CONSIDERED FOR THE PROPOSED RESEARCH

On the basis of the above criteria, previous studies by Rahman (2005) and recommendations by the ARR revision team, a total of eight (8) catchment characteristics were selected for the proposed research, as listed below. They are also described in detail in the next section. (The final data set for the study catchments is presented in Appendix D.) The candidate catchment/climatic characteristics are:
• design rainfall intensity of 12-hour duration and 2-year average recurrence interval, I12:2 (mm/h);
• mean annual rainfall (rain, mm);
• mean annual evapotranspiration (evap, mm);
• catchment area (area, km2);
• slope of the central 75% of the mainstream, S1085 (slope, m/km);
• stream density (sden, km/km2);
• fraction of the basin covered by medium to dense forest (forest); and
• fraction quaternary sediment area (qsa).

These catchment characteristics are described in detail below.

5.4.1 RAINFALL INTENSITY

Rainfall intensity, with some appropriate duration and average recurrence interval (ARI), has been found to be the most influential climatic characteristic used in RFFA studies. There is no doubt that it is significant in the flood generation process. It is also quite easy to obtain.

The use of rainfall intensity requires the selection of an appropriate duration and ARI. It seems logical to use rainfall intensity with duration equal to the time of concentration (tc), as applied in the rational method. However, the time of concentration (tc) differs for the catchments in the study area due to variability in size and shape; i.e. it is virtually impossible to select a storm having a time of concentration which is representative of every catchment. It is therefore appropriate to analyse and derive rainfall intensities for the following combinations of duration and ARI:


• 1 hr duration, 2 year ARI (I1:2, mm/h);
• 12 hr duration, 2 year ARI (I12:2, mm/h);
• 12 hr duration, 50 year ARI (I12:50, mm/h);
• tc duration, 2 year ARI (Itc:2, mm/h);
• tc duration, 5 year ARI (Itc:5, mm/h);
• tc duration, 10 year ARI (Itc:10, mm/h);
• tc duration, 20 year ARI (Itc:20, mm/h);
• tc duration, 50 year ARI (Itc:50, mm/h); and
• tc duration, 100 year ARI (Itc:100, mm/h).

The regionalisation study will assess which of the rainfall intensities are statistically significant and which intensity shows the best functional and relational form with the flood quantiles in regression analysis.

All the basic design rainfall intensity data for the selected catchments were obtained from ARR, Vol. 2 (I.E Aust, 1987), and the software AUSIFD was used to obtain the other design rainfall intensities. AUSIFD is widely used software in Australia for deriving design rainfalls. The data were obtained from Caballero (2007).

5.4.2 MEAN ANNUAL RAINFALL

Mean annual rainfall has been adopted in many previous studies; although it may not have a direct influence on flood peaks, it can still have a secondary effect by acting as a surrogate for other catchment characteristics (e.g. vegetation). It is also quite easy to obtain. Thus, mean annual rainfall was included as a candidate predictor variable in this study. For the Victorian catchments the mean annual rainfall data were obtained from the Australian Bureau of Meteorology CD. For all the catchments, the mean annual rainfall value for the rainfall station closest to the centroid of each catchment was extracted. The data were obtained from Caballero (2007).


5.4.3 MEAN ANNUAL EVAPOTRANSPIRATION

Mean annual evapotranspiration is the third influential climatic characteristic considered in the flood generation process. Evapotranspiration does not affect the flood peak directly but can have a secondary effect by being a surrogate for other catchment characteristics. Evapotranspiration can be defined as the water lost from a water body through the combined effects of evaporation and transpiration from catchment vegetation. In this study mean annual areal potential evapotranspiration data were used.

For this, the data was obtained from the Australian Bureau of Meteorology CD. For all the catchments the value at the centroid of each catchment was extracted.

5.4.4 CATCHMENT AREA

Catchment area is the most frequently adopted morphometric characteristic in past RFFA studies, since it has a direct impact on the possible flood magnitude from a given storm event. Given the significance of catchment area in the flood generation process, it was included in this study. Catchment areas of the selected catchments were measured by planimeter from 1:100,000 topographic maps. The derived areas were also compared with the values provided in the catchment database containing the streamflow data provided by the stream gauging authority.

5.4.5 SLOPE S1085

Of the different measures of slope, S1085 is easily obtainable and is reported to be the best measure for prediction of the mean flood (Benson, 1959). Thus, S1085 is used in this study. The S1085 measure of slope excludes the extremes of slope that can be found at either end of the mainstream. It is the ratio of the difference in elevation of the stream bed at 85% and 10% of the mainstream length (measured from the catchment outlet) to 75% of the mainstream length.

The following methodology was adopted to derive the S1085 values:
• Catchment boundaries were plotted on 1:100,000 topographic maps for each gauged station;
• The mainstream length was measured using an electronic map wheel, where the mainstream was taken as the total distance from the outlet to where the stream intersects with the catchment boundary; the longest path was chosen for each catchment;
• Elevations were then derived for the 10% and 85% mainstream length positions, with the positions interpolated from either 10 m or 20 m contours; and
• S1085 values were determined from Equation 5.1.

S1085 = (E2 − E1) / (0.75 × L)    (5.1)

where E2 is the elevation at the 0.85L position, E1 is the elevation at the 0.10L position and L is the mainstream length; S1085 is expressed in m/km.
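Equation 5.1 reduces to a one-line calculation, sketched below with hypothetical elevations (in metres) and mainstream length (in kilometres):

```r
# S1085 (m/km) from bed elevations (m) at 85% and 10% of the mainstream length
# and the mainstream length L (km), as per Equation 5.1
s1085 <- function(e85_m, e10_m, L_km) {
  (e85_m - e10_m) / (0.75 * L_km)
}

s1085(e85_m = 420, e10_m = 180, L_km = 32)   # 10 m/km for this hypothetical reach
```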

5.4.6 STREAM DENSITY

Stream density is directly related to the drainage efficiency of a catchment, and is included in this study. The definition of stream density includes the total stream length, which was taken as the sum of the lengths of all the blue lines (representing streamlines) in a catchment as shown on 1:100,000 topographic maps. The length of the blue lines was measured by an electronic map wheel. Stream density was calculated as the total stream length divided by the catchment area. The unit of stream density is km/km2.

5.4.7 FRACTION FOREST AREA

Density of forest directly influences runoff. It has been used in previous studies (Rahman, 1997) and is considered important in this study. The fraction of a catchment covered by forest was determined from 1:100,000 topographic maps. Counting unit squares defined by a 1 km2 grid was considered sufficiently accurate for estimating those areas designated as dense and medium forest and dense and medium scrub on the map.

5.4.8 QUATERNARY SEDIMENT AREA

Storage directly affects the shape of the flood hydrograph; however, defining storage as a single parameter is difficult. Quaternary sediment area appears to be an influential surrogate for storage, because it is a good indicator of floodplain extent variability in a catchment. Values of quaternary sediment area were determined from 1:250,000 geological maps.


Summary statistics of the derived catchment characteristics data are presented in Table 5.1.

Table 5.1 Summary statistics of the catchment characteristics data

Catchment characteristic                                     Mean      Median    Mode     St. Dev   Min      Max
Catchment area, area (km2)                                   327.85    304.00    368.00   226.87    11.00    997.00
Rainfall intensity, I12:2 (mm/h)                             4.45      4.13      4.00     0.90      3.30     7.00
Mean annual rainfall, rain (mm)                              933.98    893.66    680.06   321.33    484.39   1760.81
Mean annual areal potential evapotranspiration, evap (mm)    1033.39   1030.20   986.40   41.77     925.90   1155.30
Fraction forest area, forest                                 0.58      0.64      1.00     0.36      0.01     1.00
Main stream slope, S1085 (m/km)                              11.85     8.41      8.60     11.27     0.80     69.90
Stream density, sden (km/km2)                                1.47      1.43      1.58     0.54      0.48     4.25
Fraction quaternary sediment area, qsa                       0.08      0.01      0.01     0.15      0.01     0.70

Figures 5.1-5.8 show the distribution of the catchment characteristics across the 113 model catchments (i.e. excluding the test catchments). The normal distribution function is superimposed on each histogram to investigate the departure from normality. If the diagrams show near normality then transformations are not necessary; however, as the figures show, some catchment characteristics depart significantly from normality, hence transformations will be required.


Figure 5.1 Histogram of area (km2)

Figure 5.2 Histogram of I12:2


Figure 5.3 Histogram of rain

Figure 5.4 Histogram of evap


Figure 5.5 Histogram of forest

Figure 5.6 Histogram of S1085


Figure 5.7 Histogram of sden

Figure 5.8 Histogram of qsa


5.5 EXPLORATORY DATA ANALYSIS – SEARCHING FOR THE MOST SUITABLE TRANSFORMATION OF THE CATCHMENT CHARACTERISTICS VARIABLES

When applying multiple regression techniques, the assumption is made that the variables follow a multivariate normal distribution. Conclusions drawn from data that deviate significantly from multivariate normality may have reduced statistical power. However, hydrological data are rarely normally distributed.

Studying the univariate normality of the observations for each variable is often sufficient and is good practice, but it does not guarantee multivariate normality. Where a variable deviates markedly from a normal distribution, it may be possible to transform the scale of measurement to obtain a distribution shape that is closer to normal.

There are many procedures, both graphical and non-graphical, for assessing univariate and multivariate normality. A popular graphical approach is the normal probability plot, where the observations are arranged in increasing order of magnitude and then plotted against normal distribution values. The plot should resemble a straight line if the normality assumption is approximately satisfied. More often than not it is appropriate and preferable to use a non-graphical test. In previous studies such as Rahman (1997) the normal probability plot correlation coefficient test proposed by Filliben (1975) was used and found to be useful.

In this study two methods have been used: the normal probability plot correlation coefficient (NPCC, r) proposed by Filliben (1975) and the Anderson–Darling test for normality. With n = 113 stations in the estimation set, the value of r for the NPCC should be at least 0.988 for a variable to be considered normally distributed at the 5% level of significance. With the Anderson–Darling test for normality, the null hypothesis that the data follow a normal distribution is assessed by the A statistic, which should be less than the critical value of 0.752 at the 5% level of significance.
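The two test statistics can be sketched in base R as follows. The Filliben plotting positions follow Filliben (1975); the Anderson–Darling statistic is computed with the usual formula using the sample mean and standard deviation, with the common small-sample adjustment applied (it is an assumption that this matches how the reported A values were obtained):

```r
# Filliben (1975) normal probability plot correlation coefficient r
filliben_r <- function(x) {
  n <- length(x)
  m <- ((1:n) - 0.3175) / (n + 0.365)   # Filliben median plotting positions
  m[1] <- 1 - 0.5^(1 / n)
  m[n] <- 0.5^(1 / n)
  cor(sort(x), qnorm(m))
}

# Anderson-Darling statistic for normality with mean and sd estimated from the
# sample; the small-sample adjustment (1 + 0.75/n + 2.25/n^2) is assumed here
anderson_darling_A <- function(x) {
  n <- length(x)
  z <- pnorm(sort((x - mean(x)) / sd(x)))
  i <- 1:n
  A2 <- -n - mean((2 * i - 1) * (log(z) + log(1 - rev(z))))
  A2 * (1 + 0.75 / n + 2.25 / n^2)
}
```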

The variables have been transformed, as shown in Table 5.2, to obtain r values as close as possible to 0.988 and A statistics less than or close to 0.752. The r and A values were computed using the statistical package R.


All the variables have an r ≥ 0.988, except tsden, ti12:2 and tqsa, which have r values of 0.954, 0.98 and 0.866 respectively. Here 't' implies a transformed variable. For ti12:2 this is very close to the limit of 0.988. No further improvement could be found for tsden and tqsa.

Most of the variables showed normality as tested by the Anderson–Darling statistic. All the A statistics were less than or just higher than 0.752, except for tsden, ti12:2 and train, which have A statistics of 3.1, 0.895 and 0.858 respectively. No further improvements to the A statistics could be found; as the r values achieved for ti12:2 and train were reasonable, the hypothesis of a normal distribution was accepted.

To visualise the degree of normality achieved by the transformations, normal probability plots were obtained for the variables before and after transformation. It can be seen from the plots that the assumption of normality is tenable through the transformations applied. For illustration, the normal probability plots for the variables are shown in Figures 5.9 - 5.24.

Table 5.2 Test for normality of the catchment characteristics variables

Variable (1)   Transformation                                      Filliben r (2)            Anderson-Darling A (3)
                                                                   Before       After        Before       After
tarea          sqrt(area)                                          0.971        0.995        1.95         0.379
tI12:2         (I12:2)^-2.0                                        0.958        0.987        3.196        0.895
train          log(rain)                                           0.969        0.989        2.244        0.859
tevap          log(evap)                                           0.995        0.996        0.362        0.308
tforest        ln((forest - 0.003)/(1 - (forest - 0.003)))         0.941        0.99         5.405        0.781
tS1085         log(s1085)                                          0.862        0.997        7.383        0.279
tsden          (sden)^0.112 (Box-Cox)                              0.902        0.956        4.078        3.1
tqsa           ln((qsa - 0.006114)^1.5/(0.70 - (qsa - 0.006114)^1.5))   0.733   0.866        24.19        12.37

Notes:
1: The letter 't' before a variable name indicates that the variable has been transformed.
2: r should be ≥ 0.988 to achieve normality at a minimum 5% level of significance.
3: A should be ≤ 0.752 to achieve normality at a minimum 5% level of significance.

However, since it is common practice in hydrology to adopt a single transformation type in regional prediction equations for simplicity's sake, the application of a log (base 10) transformation alone was also explored. The variables were log transformed, as shown in Table 5.3, to see whether normality could be approached by using one transformation for all variables. The r and A values were computed using the statistical package R.
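Using the filliben_r and anderson_darling_A functions sketched earlier in this section, the single log-transformation check summarised in Table 5.3 could be reproduced along the following lines (assuming a hypothetical data frame cc holding the eight catchment characteristics for the 113 model catchments):

```r
# Apply a log10 transformation to every candidate variable in the data frame cc
# and recompute the Filliben r and Anderson-Darling A statistics, for comparison
# against the 0.988 and 0.752 criteria (cc is a hypothetical data frame of the
# eight catchment characteristics; zero values would need special handling).
check_log_transform <- function(cc) {
  t(sapply(names(cc), function(v) {
    x <- log10(cc[[v]])
    c(r = filliben_r(x), A = anderson_darling_A(x))
  }))
}
```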


Table 5.3 Test for normality of the log transformed catchment characteristics variables

Variable (1)   Transformation   Filliben r (2)            Anderson-Darling A (3)
                                Before       After        Before       After
tarea          log(area)        0.971        0.962        1.95         2.579
tI12:2         log(I12:2)       0.958        0.976        3.196        1.835
train          log(rain)        0.969        0.989        2.244        0.859
tevap          log(evap)        0.995        0.996        0.362        0.308
tforest        log(forest)      0.941        0.892        5.405        8.55
tS1085         log(s1085)       0.862        0.997        7.383        0.279
tsden          log(sden)        0.902        0.954        4.078        3.283
tqsa           log(qsa)         0.733        0.866        24.19        12.37

Notes:
1: The letter 't' before a variable name indicates that the variable has been transformed.
2: r should be ≥ 0.988 to achieve normality at a minimum 5% level of significance.
3: A should be ≤ 0.752 to achieve normality at a minimum 5% level of significance.

Not all the variables showed normality as tested by the Anderson–Darling statistic and the plots shown in Figures 5.9 to 5.24. All the A statistics were higher than 0.752, except for tS1085 and tevap, which have A statistics of 0.279 and 0.308 respectively. Some variables actually performed worse when transformed by the log transformation; this can particularly be seen for tarea and tforest. For the rest of the variables, the r values achieved were reasonable, and thus the hypothesis of a normal distribution was accepted.

It was decided to stay with the log transformed variables for application with the quantile regression technique in this thesis. An alternative analysis could be undertaken with the transformations given in Table 5.2 to see the effects of the different transformations on the prediction equations. However, this study did not explore this option, as the log transformation is the most common for these types of RFFA studies (e.g. Rahman, 2005 and Stedinger and Tasker, 1985).


Figure 5.9 Variable area before transformation (normal probability plot; AD* = 1.95, correlation = 0.971)

Figure 5.10 Variable area after transformation, sqrt(area) (normal probability plot; AD* = 0.379, correlation = 0.995)


Figure 5.11 Variable I12:2 before transformation (normal probability plot; AD* = 3.196, correlation = 0.958)

Figure 5.12 Variable I12:2 after transformation, (I12:2)^-2 (normal probability plot; AD* = 0.895, correlation = 0.987)


Figure 5.13 Variable rain before transformation (normal probability plot, LSXY estimates with 95% CI; AD* = 2.244, correlation = 0.969)

Figure 5.14 Variable rain after transformation (normal probability plot of log(rain), LSXY estimates with 95% CI; AD* = 0.859, correlation = 0.989)


Figure 5.15 Variable evap before transformation (normal probability plot, LSXY estimates with 95% CI; AD* = 0.362, correlation = 0.995)

Figure 5.16 Variable evap after transformation (normal probability plot of log(evap), LSXY estimates with 95% CI; AD* = 0.308, correlation = 0.996)


Figure 5.17 Variable forest before transformation (normal probability plot, LSXY estimates with 95% CI; AD* = 5.405, correlation = 0.941)

Figure 5.18 Variable forest after transformation (normal probability plot of T(forest), LSXY estimates with 95% CI; AD* = 0.781, correlation = 0.99)


Figure 5.19 Variable S1085 before transformation (normal probability plot, LSXY estimates with 95% CI; AD* = 7.383, correlation = 0.862)

Figure 5.20 Variable S1085 after transformation (normal probability plot of log(S1085), LSXY estimates with 95% CI; AD* = 0.279, correlation = 0.997)


Figure 5.21 Variable sden before transformation (normal probability plot, LSXY estimates with 95% CI; AD* = 4.078, correlation = 0.902)

Figure 5.22 Variable sden after transformation (normal probability plot of Box-Cox transformed sden, LSXY estimates with 95% CI; AD* = 3.1, correlation = 0.956)


Figure 5.23 Variable qsa before transformation (normal probability plot, LSXY estimates with 95% CI; AD* = 24.19, correlation = 0.733)

Figure 5.24 Variable qsa after transformation (normal probability plot of log(qsa), LSXY estimates with 95% CI; AD* = 12.37, correlation = 0.866)


5.6 EXPLORATORY DATA ANALYSIS – CORRELATION MATRIX OF THE TRANSFORMED CATCHMENT CHARACTERISTICS

The correlation coefficients between the 8 transformed catchment characteristics variables are presented in Tables 5.4 and 5.5. The definitions of these variables can be found in Tables 5.2 and 5.3.

Table 5.4 Correlation matrix of the transformed variables

            t(area)  tI12:2  t(rain)  t(evap)  t(forest)  t(S1085)  t(sden)  t(qsa)
t(area)      1.00
tI12:2       0.11    1.00
t(rain)      0.00   -0.65    1.00
t(evap)      0.03    0.08   -0.13     1.00
t(forest)   -0.08   -0.64    0.41    -0.14     1.00
t(S1085)    -0.55   -0.59    0.35    -0.19     0.53       1.00
t(sden)     -0.19   -0.24    0.02     0.08     0.18       0.24      1.00
t(qsa)       0.13    0.21   -0.15     0.07    -0.08      -0.17     -0.07    1.00

Table 5.5 Correlation matrix of the log transformed variables

              log(area)  log(I12:2)  log(rain)  log(evap)  log(forest)  log(S1085)  log(sden)  log(qsa)
log(area)      1.00
log(I12:2)    -0.12       1.00
log(rain)      0.00       0.62        1.00
log(evap)      0.06      -0.04       -0.13       1.00
log(forest)   -0.09       0.51        0.35      -0.13       1.00
log(S1085)    -0.57       0.54        0.35      -0.19       0.50         1.00
log(sden)     -0.19       0.21        0.03       0.08       0.25         0.26        1.00
log(qsa)       0.16      -0.19       -0.16       0.00       0.01        -0.18       -0.06      1.00

In analysing the correlation matrix in Table 5.4, there does not appear to be any strong correlation among the predictor variables. However, the degree of negative correlation between tI12:2 and train is surprising, as one would expect these two variables to be positively correlated. This is probably due to the negative transformation used for rainfall intensity.

The correlation matrix of the log transformed variables (Table 5.5) was also examined; again, there does not appear to be any strong correlation among the predictors. There are moderate correlations, in the range of 0.5 to 0.62, between log(S1085) and log(I12:2) and between log(forest) and log(I12:2).


At first glance, neither Table 5.4 nor Table 5.5 shows any strongly correlated variables, so multicollinearity should not be an issue. Nevertheless, multicollinearity will be tested for during the model development. Given that the degree of correlation between the variables was not high, all the variables were retained for use in the quantile regression technique at this stage.
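The correlation and multicollinearity checks referred to above can be reproduced with a few lines of Python. This is a minimal sketch using pandas and statsmodels; the data frame X contains random placeholder values standing in for the log-transformed catchment characteristics, and the column names are assumptions chosen to mirror Table 5.5.

import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Placeholder log-transformed catchment characteristics (not the study data)
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(113, 4)),
                 columns=["log_area", "log_I12_2", "log_rain", "log_sden"])

# Pearson correlation matrix, as in Tables 5.4 and 5.5
print(X.corr().round(2))

# Variance inflation factors, later used to confirm multicollinearity is not an issue
Xc = np.column_stack([np.ones(len(X)), X.values])   # add an intercept column
vif = [variance_inflation_factor(Xc, i) for i in range(1, Xc.shape[1])]
print(dict(zip(X.columns, np.round(vif, 2))))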

5.7 SUMMARY

The candidate catchment characteristics for the proposed study were selected and the data obtained. All the variables listed in Table 5.1 will be used in the analysis. The variables were also examined to see whether they satisfy a normal distribution; a log transformation was undertaken, as indicated in Table 5.3, to achieve reasonable normality. The log transformation was adopted as it is the most common transformation used in RFFA.


CHAPTER 6: SEARCHING FOR HOMOGENEOUS REGIONS

6.1 GENERAL

As discussed in Chapter 4, annual maximum streamflow data sets were prepared for 133 catchments. From these, twenty catchments were selected at random and set aside as 'test catchments', leaving 113 for the proposed model development. The L-moments of the annual maximum flood series of all 133 catchments were calculated as discussed in Chapter 3. The aim of this chapter is to attempt to form regions that can be considered 'homogenous'. The first step is to identify sites that are grossly discordant with the rest of the sites in a particular group. Secondly, an empirical approach to determining homogenous groups is outlined and applied, initially to the 133 catchments and then to the 113 catchments. This approach is then extended to smaller sets of catchments based on regional proximity and catchment size to determine whether any homogenous groups could be formed.

6.2 FORMATION OF HOMOGENOUS GROUPS

In identifying homogenous groups from a large number of sites, a balance needs to be struck between a reasonably large group, which contains more information but is likely to show a lower degree of homogeneity, and a smaller group, which contains less information but shows a greater degree of homogeneity. The aim is to make the best use of the trade-off between group size and the degree of homogeneity achieved. It should be noted, however, that a small group showing good homogeneity may not be appropriate for use in a regional study, as it may not be able to provide statistically meaningful results.

Also, the H statistic used here to measure homogeneity (as explained in Chapter 3) has a tendency to give a false impression of homogeneity for small regions (further discussion can be found in Hosking and Wallis, 1993). Nonetheless, there do not appear to be any rules or guidelines on the minimum number of sites required to define a homogenous group or region.


It is worth mentioning that homogeneity is generally required for the index flood and similar methods. A group of sites forming a homogenous region means that the underlying probability distribution of the standardised flood variable is the same at each site, which implies that the standardised annual maximum flood series of the sites are samples from the same population. Given that this study is based on the quantile regression technique, homogeneity is not a strict prerequisite for the method. However, having homogenous regions is still advantageous, as this would reduce the model error inherent in the regional model and would give more accurate prediction equations applicable to the region of interest.

In attempting to obtain homogenous regions, one may start with a small homogenous region and keep adding sites close to the boundary of the assumed region while checking that H < 1. An alternative approach is to start with a large region having a very high degree of heterogeneity and to remove sites that are grossly discordant with the group until H < 1 or is reasonably close to it. The latter approach was adopted for this study.

The first step was to treat Victoria as one whole region. In the second step, Victoria was divided across the Great Dividing Range to see if two homogenous regions could be formed. In the third approach, Victoria was split through its centre into East and West to examine whether homogenous regions could be formed. A fourth strategy was to divide Victoria into four distinct groups (East Victoria, West Victoria, South Victoria and North Victoria), and a fifth approach was to group the catchments by catchment size.

6.3 MEASURING THE DEGREE OF HETEROGENEITY IN A GROUP

The method proposed by Hosking and Wallis (1993), discussed in Chapter 3, was used to measure the degree of heterogeneity of a region. This method assumes that in a possibly homogenous group all the sites have the same population L-moments, while their sample L-moments differ due to sampling variability. The test compares the dispersion of the sample L-moments of the proposed region with the expected dispersion for a homogenous region, estimated by simulation. In applying the procedure, 500 homogenous regions were simulated (i.e. Nsim = 500) and the heterogeneity measures H(1), H(2) and H(3) were computed using a FORTRAN program developed by Hosking (1991a).
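The Hosking (1991a) FORTRAN program is not reproduced here, but the structure of the H(1) calculation can be illustrated with the Python sketch below. It is a simplified sketch under stated assumptions: the simulated homogenous region is drawn from a GEV distribution fitted by L-moments to the regional average ratios, whereas Hosking and Wallis (1993) use a four-parameter kappa distribution, so the values will not reproduce those of the study; the data in the usage example are synthetic placeholders.

import numpy as np
from math import gamma, log
from scipy.stats import genextreme

def sample_lmoments(x):
    """Unbiased estimates of l1, l2 and t3 from a 1-D sample (via PWMs)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    j = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((j - 1) / (n - 1) * x) / n
    b2 = np.sum((j - 1) * (j - 2) / ((n - 1) * (n - 2)) * x) / n
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2

def weighted_V1(lcv, n):
    """Record-length-weighted standard deviation of at-site L-CVs (V1)."""
    lcv, n = np.asarray(lcv), np.asarray(n)
    tR = np.sum(n * lcv) / n.sum()
    return np.sqrt(np.sum(n * (lcv - tR) ** 2) / n.sum())

def gev_from_lmoments(l1, l2, t3):
    """GEV parameters from L-moments (Hosking, 1990); assumes k is not ~0."""
    c = 2.0 / (3.0 + t3) - log(2) / log(3)
    k = 7.8590 * c + 2.9554 * c * c
    sigma = l2 * k / ((1 - 2.0 ** (-k)) * gamma(1 + k))
    xi = l1 - sigma * (1 - gamma(1 + k)) / k
    return k, xi, sigma

def h1_statistic(series, nsim=500, seed=1):
    """H(1) heterogeneity measure for a list of annual maximum flood series."""
    rng = np.random.default_rng(seed)
    lmoms = [sample_lmoments(s) for s in series]
    n = np.array([len(s) for s in series])
    lcv = np.array([l2 / l1 for l1, l2, _ in lmoms])
    t3 = np.array([t for _, _, t in lmoms])
    v_obs = weighted_V1(lcv, n)
    # Regional weighted-average L-moment ratios, mean rescaled to 1
    tR = np.sum(n * lcv) / n.sum()
    t3R = np.sum(n * t3) / n.sum()
    k, xi, sigma = gev_from_lmoments(1.0, tR, t3R)
    v_sim = np.empty(nsim)
    for m in range(nsim):
        sim_lcv = []
        for ni in n:
            y = genextreme.rvs(k, loc=xi, scale=sigma, size=ni, random_state=rng)
            l1, l2, _ = sample_lmoments(y)
            sim_lcv.append(l2 / l1)
        v_sim[m] = weighted_V1(sim_lcv, n)
    return (v_obs - v_sim.mean()) / v_sim.std()

# Usage with three synthetic sites (illustration only)
rng = np.random.default_rng(0)
sites = [genextreme.rvs(0.1, loc=100, scale=30, size=ny, random_state=rng)
         for ny in (25, 40, 33)]
print(round(h1_statistic(sites, nsim=200), 2))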


6.4 FORMING ONE HOMOGENOUS GROUP

Firstly, it was hypothesised that all 133 sites (including the test catchments) form one region. The obtained H values were H(1) = 22.94, H(2) = 13.98 and H(3) = 7.40, indicating a highly heterogeneous group, as H is much greater than 1. Six discordant sites were identified, checked for data errors and removed. No data errors were discovered for these discordant sites, and their effect on the H statistics was not found to be significant.

Secondly, it was hypothesised that the 113 stations (excluding the test catchments) form one region. The obtained H values were H(1) = 19.85, H(2) = 13.01 and H(3) = 6.63, a slight improvement over the H statistics above but still quite heterogeneous given that H >> 1. Six discordant sites were removed, which left the region with 107 catchments. The deletion of these six catchments improved the H statistics slightly, with H(1) = 16.44, H(2) = 10.80 and H(3) = 5.85. In this run, however, three more sites were found to be discordant with the group as a whole and were removed; this did not have a significant bearing on the H statistics, and it was concluded that the region was "Definitely Heterogeneous". A summary of these results can be seen in Table 6.1.

6.5 FORMING TWO HOMOGENOUS GROUPS

There are many possible ways to obtain two homogenous regions from the 113 sites. However, there is only one logical combination of two regions that satisfies the criterion of each region containing a significant number of stations. To achieve this, the State of Victoria was divided into two regions, north and south of the Great Dividing Range, as shown in Figure 6.1. It seemed logical to start each region with a large number of catchments and then remove sites that were discordant with the rest of the region.


Figure 6.1 Victoria – Two regions based on the Great Dividing Range

6.5.1 HOMOGENOUS REGIONS BASED ON THE NORTH OF THE GREAT DIVIDING RANGE

The initial region started with 59 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 12.50, H(2) = 7.61 and H(3) = 4.67, which is quite large considering that H should be less than 1 or close to it. Three discordant sites were identified and removed with no significant change to the H statistics. A further run was completed with one more discordant site removed, which again did not have any impact on the H statistics. The results can be viewed in Table 6.1. Thus the region based on the north of the Great Dividing Range is considered to be "Definitely Heterogeneous".

6.5.2 HOMOGENOUS REGIONS BASED ON THE SOUTH OF THE GREAT DIVIDING RANGE

The initial region started with 55 stations; Table 6.1 gives further details. The calculated H values were H(1) = 13.76, H(2) = 7.88 and H(3) = 2.61, which shows no improvement over the region north of the Great Dividing Range. Three discordant sites were removed and the H statistics recalculated, with no significant improvement. Another three sites were removed over two more runs, leaving 49 stations, again with no improvement in the H statistics. Therefore it was concluded that a homogenous region based on the south of the Great Dividing Range is "Definitely Heterogeneous". Table 6.1 shows the details of the H statistics results.

6.6 FORMING THREE HOMOGENOUS GROUPS

Obtaining three homogenous regions from the 113 sites while retaining a good number of stations in each region is possible; however, allocating stations to a particular region is difficult. A logical combination of three regions that satisfies the criterion of a significant number of stations in each is to divide the State of Victoria into East, Northwest and Southwest regions, as shown in Figure 6.2. It was thought that there could be some hydrological relationship from station to station within each region and that, with a smaller number of stations per region, homogeneity could be achieved.

Figure 6.2 Victoria – Three regions based on the East, Northwest and Southwest Victoria


6.6.1 HOMOGENOUS REGION BASED ON EASTERN VICTORIA

The initial region started with 55 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 13.00, H(2) = 9.59 and H(3) = 5.05, which is remarkably high considering that H should be close to 1. A further four discordant sites were identified and removed over four simulations with no significant change to the H statistics. The results can be viewed in Table 6.1. Thus the region based on Eastern Victoria is considered to be "Definitely Heterogeneous".

6.6.2 HOMOGENOUS REGION BASED ON NORTHWEST VICTORIA

The initial region started with 31 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 10.65, H(2) = 6.29 and H(3) = 4.03, which again is high considering that H should be close to 1, but certainly an improvement over the H statistics for Eastern Victoria. One further discordant site was identified and removed over one simulation with no significant change to the H statistics. The results can be viewed in Table 6.1. Thus the region based on North-western Victoria is considered to be "Definitely Heterogeneous".

6.6.3 HOMOGENOUS REGION BASED ON SOUTHWEST VICTORIA

The initial region started with 31 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 9.09, H(2) = 4.35 and H(3) = 1.18, which again is high considering that H should be close to 1. H(3) is close to one, but given that H(3) lacks discriminatory power it is difficult to justify a homogenous region on this value alone. A further four discordant sites were identified and removed over four simulations with some slight improvement to the H statistics. The results can be viewed in Table 6.1. Thus the region based on South-western Victoria is considered to be "Definitely Heterogeneous".


6.7 FORMING FOUR HOMOGENOUS GROUPS

For four homogenous regions, the method used was similar to that discussed in Section 6.6. A logical combination of four regions that satisfies the criterion of a significant number of stations in each is to divide the State of Victoria into Northeast, Southeast, Northwest and Southwest regions, as shown in Figure 6.3. The boundaries of the regions were selected somewhat arbitrarily to assess the degree of homogeneity of the candidate regions; a change of boundary may not improve the results.

Figure 6.3 Victoria – Regions based on the Northeast, Southeast, Northwest and Southwest Victoria

6.7.1 HOMOGENOUS REGION BASED ON NORTHWEST VICTORIA

The initial region started with 32 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 10.33, H(2) = 4.88 and H(3) = 2.85, which is particularly high considering that H should be close to 1. A further six discordant sites were identified and removed over four simulations, with a significant change to the H statistics. The results show H(1) = 0.30, H(2) = 0.56 and H(3) = 1.17, thus signifying a homogenous region. This was examined further, as it is not unusual for homogeneity to improve when the number of sites is reduced. The stations in this region (Northwest Victoria) tend to show slightly less runoff than other parts of the state, which indicates that the hydrology of Northwest Victoria is quite different from the rest of the state. The results can be seen in Table 6.1.

6.7.2 HOMOGENOUS REGION BASED ON SOUTHWEST VICTORIA

The initial region started with 21 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 6.40, H(2) = 3.85 and H(3) = 1.74. One discordant site was identified and removed with no significant change to the H statistics. The results show that Southwest Victoria is "Definitely Heterogeneous". Further results can be seen in Table 6.1.

6.7.3 HOMOGENOUS REGION BASED ON NORTHEAST VICTORIA

The initial region started with 26 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 4.03, H(2) = 3.79 and H(3) = 3.78. No discordant sites were identified. The results show that Northeast Victoria is "Definitely Heterogeneous". Further results can be seen in Table 6.1.

6.7.4 HOMOGENOUS REGION BASED ON SOUTHEAST VICTORIA

The initial region started with 34 stations; the station ID numbers can be seen in Appendix C. The calculated H values were H(1) = 11.98, H(2) = 6.55 and H(3) = 1.70. A further four sites were found to be discordant over three more simulations. The results show that Southeast Victoria is "Definitely Heterogeneous". Further results can be seen in Table 6.1.

6.8 FORMING HOMOGENOUS GROUPS BASED ON CATCHMENT SIZE

A final effort was made to obtain homogenous regions based on catchment size, given that flood generation is highly dependent on catchment size. The following size classes seemed reasonable and were chosen: 0-100 km², 100-300 km², 300-500 km² and >500 km². The four groups were tested for homogeneity without success, with some results actually being poorer than the regional divisions adopted in the previous sections. The results of this analysis can be seen in Table 6.1.


Table 6.1 Results of Homogeneity Test for the Hypothesised Regions
(Each row gives: simulation; group and number of sites in the group; heterogeneity measures; discordant sites removed)

Simulation 1, Region 1 (all 133 stations): H(1) = 22.94, H(2) = 13.98, H(3) = 7.40; discordant sites: 226222, 226402, 227231, 235205, 405205, 407209
Simulation 1, Region 2 (113 stations, excluding test catchments): H(1) = 19.85, H(2) = 13.01, H(3) = 6.63; discordant sites: 226222, 226402, 227231, 235205, 405205, 407209
Simulation 2, Region 2 (107 stations, excluding discordant sites): H(1) = 16.44, H(2) = 10.80, H(3) = 5.85; discordant sites: 226205, 227236, 226217
Simulation 3, Region 2 (104 stations, excluding discordant sites): H(1) = 15.05, H(2) = 10.35, H(3) = 5.97; no discordant sites
Simulation 1, Region 3a (NGDR, 59 stations): H(1) = 12.50, H(2) = 7.61, H(3) = 4.67; discordant sites: 405205, 405229, 407209
Simulation 2, Region 3a (NGDR, 56 stations): H(1) = 9.78, H(2) = 6.69, H(3) = 4.36; discordant site: 406224
Simulation 3, Region 3a (NGDR, 55 stations): H(1) = 9.07, H(2) = 5.70, H(3) = 3.85; no discordant sites
Simulation 1, Region 3b (SGDR, 55 stations): H(1) = 13.76, H(2) = 7.88, H(3) = 2.61; discordant sites: 226222, 226402, 227231
Simulation 2, Region 3b (SGDR, 52 stations): H(1) = 10.82, H(2) = 5.41, H(3) = 2.61; discordant sites: 227236, 235205
Simulation 3, Region 3b (SGDR, 50 stations): H(1) = 9.99, H(2) = 4.89, H(3) = 0.90; discordant site: 234200
Simulation 4, Region 3b (SGDR, 49 stations): H(1) = 10.54, H(2) = 5.19, H(3) = 0.84; no discordant sites
Simulation 1, Region 4a (East Vic, 52 stations): H(1) = 13.00, H(2) = 9.59, H(3) = 5.05; discordant sites: 226222, 226402, 227231
Simulation 2, Region 4a (East Vic, 49 stations): H(1) = 10.55, H(2) = 7.39, H(3) = 3.46; discordant site: 227236
Simulation 3, Region 4a (East Vic, 48 stations): H(1) = 9.65, H(2) = 6.52, H(3) = 3.04; discordant site: 403224
Simulation 4, Region 4a (East Vic, 47 stations): H(1) = 9.92, H(2) = 6.69, H(3) = 3.06; no discordant sites
Simulation 1, Region 4b (Northwest Vic, 31 stations): H(1) = 10.65, H(2) = 6.29, H(3) = 4.03; discordant site: 407209
Simulation 2, Region 4b (Northwest Vic, 30 stations): H(1) = 8.46, H(2) = 5.67, H(3) = 4.50; no discordant sites
Simulation 1, Region 4c (Southwest Vic, 31 stations): H(1) = 9.09, H(2) = 4.35, H(3) = 1.18; discordant sites: 231200, 405205
Simulation 2, Region 4c (Southwest Vic, 29 stations): H(1) = 7.46, H(2) = 3.30, H(3) = 0.90; discordant sites: 234200, 235205
Simulation 3, Region 4c (Southwest Vic, 27 stations): H(1) = 7.19, H(2) = 2.95, H(3) = 0.48; no discordant sites
Simulation 1, Region 5a (Northwest Vic, 32 stations): H(1) = 10.33, H(2) = 4.88, H(3) = 2.85; discordant sites: 405205, 405229, 407209
Simulation 2, Region 5a (Northwest Vic, 29 stations): H(1) = 4.42, H(2) = 2.13, H(3) = 1.85; discordant sites: 405241, 406224
Simulation 3, Region 5a (Northwest Vic, 27 stations): H(1) = 2.05, H(2) = 1.10, H(3) = 1.09; discordant site: 405274
Simulation 4, Region 5a (Northwest Vic, 26 stations): H(1) = 0.30, H(2) = 0.56, H(3) = 1.17; no discordant sites
Simulation 1, Region 5b (Southwest Vic, 21 stations): H(1) = 6.40, H(2) = 3.85, H(3) = 1.74; discordant site: 235205
Simulation 2, Region 5b (Southwest Vic, 20 stations): H(1) = 6.43, H(2) = 3.88, H(3) = 1.83; no discordant sites
Simulation 1, Region 5c (Northeast Vic, 26 stations): H(1) = 4.03, H(2) = 3.79, H(3) = 3.78; no discordant sites
Simulation 1, Region 5d (Southeast Vic, 34 stations): H(1) = 11.98, H(2) = 6.55, H(3) = 1.70; discordant sites: 226222, 227231
Simulation 2, Region 5d (Southeast Vic, 32 stations): H(1) = 9.58, H(2) = 4.78, H(3) = 0.60; discordant sites: 226402, 227236
Simulation 3, Region 5d (Southeast Vic, 30 stations): H(1) = 7.69, H(2) = 3.14, H(3) = -0.65; no discordant sites
Simulation 1, Region 6a (0-100 km², 20 stations): H(1) = 9.14, H(2) = 4.76, H(3) = 1.11; no discordant sites
Simulation 1, Region 6b (100-300 km², 39 stations): H(1) = 11.79, H(2) = 6.66, H(3) = 3.91; discordant sites: 405205, 405229, 227231
Simulation 2, Region 6b (100-300 km², 36 stations): H(1) = 9.86, H(2) = 5.06, H(3) = 2.87; discordant sites: 406224, 407209
Simulation 3, Region 6b (100-300 km², 34 stations): H(1) = 7.47, H(2) = 4.03, H(3) = 2.61; discordant site: 226205
Simulation 4, Region 6b (100-300 km², 33 stations): H(1) = 6.84, H(2) = 3.66, H(3) = 2.31; no discordant sites
Simulation 1, Region 6c (300-500 km², 32 stations): H(1) = 11.78, H(2) = 7.80, H(3) = 4.25; no discordant sites
Simulation 1, Region 6d (>500 km², 23 stations): H(1) = 6.74, H(2) = 4.60, H(3) = 2.31; discordant site: 226402
Simulation 2, Region 6d (>500 km², 22 stations): H(1) = 5.49, H(2) = 3.26, H(3) = 1.48; no discordant sites


6.9 SUMMARY

An empirical approach was adopted to identify homogenous regions in Victoria. The approach involved delineating Victoria into various candidate regions, starting with a sufficiently large number of stations and then removing stations that were discordant with the rest of the group. Only one region, in Northwest Victoria (the driest part of the state), was found to be homogenous; otherwise the identification of homogenous regions in Victoria was unsuccessful. This may not affect the outcome of the study, as the QRT (adopted in this study to develop prediction equations) is not dependent on the underlying assumption of homogeneity, unlike the index flood approach.


CHAPTER 7: DEVELOPMENT OF THE PROBABILISTIC RATIONAL METHOD

7.1 GENERAL

This chapter describes the development of the Probabilistic Rational Method (PRM). A new isopleth map of the runoff coefficient C10 and a regression equation relating C10 to catchment and climatic characteristics are derived in this chapter. A new set of frequency factors is then obtained for use with the PRM.

7.2 DEVELOPMENT OF RUNOFF COEFFICIENT C10 ISOPLETH MAPS

The procedures adopted in deriving runoff coefficient C10 isopleth contours are summarised below:

• Runoff coefficients C10 are derived for the model catchments in Excel using Equation 3.75 (a sketch of this calculation is given after this list), where:
  - Q10 values are obtained from ARR-FLIKE (Kuczera and Franks, 2005) using the LP3 distribution with the Bayesian parameter fitting procedure, and the flood quantiles for various ARIs are noted (refer to Appendix E);
  - Itc,10 values are obtained using the ARR Vol. 2 (I.E. Aust, 1987) techniques, as discussed in Chapter 5; and
  - Catchment area is obtained from the gauging authority.
• Latitude, longitude and C10 values of the model catchments are exported into the geographical information system (GIS) program MAPINFO, where the z-value is represented by the C10 value;
• Within the MAPINFO software, triangulation is used to create a digital terrain model;
• From the digital terrain model, isopleth contours are projected and labelled accordingly using a linear solution; and
• Test catchment positions are located in MAPINFO, and C10 values for the test catchments are interpolated and extracted.
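Equation 3.75 is not reproduced in this chapter. Assuming it is the rational formula Q10 = 0.278 C10 Itc,10 A rearranged for the runoff coefficient (with Q10 in m³/s, Itc,10 in mm/h and A in km²), the first item of the list above amounts to the following minimal sketch; the numerical inputs are placeholders, not values from the study.

def runoff_coefficient_c10(q10_m3s, i_tc10_mmhr, area_km2):
    # Rational formula Q = 0.278 C I A, rearranged for C (assumed form of Equation 3.75)
    return q10_m3s / (0.278 * i_tc10_mmhr * area_km2)

# Placeholder values for a single model catchment (illustration only)
print(round(runoff_coefficient_c10(q10_m3s=85.0, i_tc10_mmhr=5.2, area_km2=320.0), 3))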


7.3 DEVELOPMENT OF PREDICTION EQUATION FOR C10

An alternative procedure was also explored, in which a prediction equation for C10 was developed using multiple linear regression analysis with catchment and climatic characteristics as the independent variables. Chapter 3 discusses the theory of multiple linear regression analysis.

The developed prediction equation for C10 is shown below.

C10 = −2.631 − 0.125 log(rain) + 1.02 log(evap) + 0.072 log(S1085) + 0.128 log(sden)    (7.1)

(R² = 26.7%)

Where rain is the mean annual rainfall (mm) obtained from the BOM CD (value near the catchment centre), evap is the mean annual areal potential evaporation (mm) obtained from the BOM CD (value near the catchment centre), S1085 is the slope of the central 75% of the mainstream (m/km) determined from a 1:100,000 topographic map and sden is the stream density (km/km²) obtained from a 1:100,000 topographic map.
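Applying Equation 7.1 is straightforward; the sketch below does so for a hypothetical catchment. Base-10 logarithms are assumed (consistent with the log transformations used elsewhere in the thesis) and the input values are placeholders.

import math

def c10_from_regression(rain_mm, evap_mm, s1085_m_per_km, sden_km_per_km2):
    # Equation 7.1; logarithms assumed to be base 10
    return (-2.631
            - 0.125 * math.log10(rain_mm)
            + 1.02  * math.log10(evap_mm)
            + 0.072 * math.log10(s1085_m_per_km)
            + 0.128 * math.log10(sden_km_per_km2))

# Hypothetical catchment (values are placeholders, not from the study data)
print(round(c10_from_regression(rain_mm=900.0, evap_mm=1050.0,
                                s1085_m_per_km=12.0, sden_km_per_km2=1.5), 3))

With these placeholder inputs the equation returns a value of roughly 0.18. The C10 values tabulated in Table 7.2 appear to be reported on a percentage-like scale (roughly the equation output multiplied by 100), although that scaling is an inference here rather than something stated in the text.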

It is worth noting that the underlying assumption of normality of the residuals was largely satisfied for prediction equation (7.1), and that the predicted C10 values for all the test catchments were positive, which is a good indication of the adequacy of the developed equation. The equation also showed a small standard error of estimate (SEE), and no notable multicollinearity was present among the predictor variables, as evidenced by the small variance inflation factors. The R² value of 26.7% is low, as expected, but the equation still provided reasonable results; compared with past studies (e.g. Rahman and Hollerbach, 2003), where poorer results had been obtained (including some negative predictions), this represents an improvement. Table 7.1 summarises the relevant statistics of prediction equation (7.1).


Table 7.1 Some important statistics relating to the C10 prediction equation

Predictor     Coefficient   T       P       VIF
Constant      2.631         2.19    0.031
log(rain)     -0.125        -2.63   0.010   1.3
log(evap)     1.02          2.58    0.012   1.1
log(S1085)    0.072         3.19    0.002   1.4
log(sden)     0.128         2.66    0.009   1.1

SEE = 6%, R² = 26.7%

Appendix E contains the C10 values derived for each of the 113 model catchments using Equation 3.75 and using the developed regression equation (7.1). Most of the C10 values seemed realistic, except for station 403209, which gave a C10 value of 100; this seems quite high compared with the other values.

The C10 contour map is shown in Figure 7.1. In comparison with the ARR87 contour map, the new contours generally provide better spatial coverage with greater resolution, except for the north-western part of Victoria where no reliable streamflow data are available. The C10 values do not reveal any regional pattern, and low values are surrounded by higher ones in many locations, similar to ARR87; this raises questions about the use of simple linear interpolation on the contour map when estimating the value of C10 for an ungauged catchment.


Figure 7.1 New C10 contour map for the PRM method in Victoria

C10 values for the 20 test catchments were obtained by interpolation in the MAPINFO program and from the developed prediction equation (7.1); three points that fell outside the range of the contour lines were interpolated manually. The results are shown in Table 7.2.


Table 7.2 Values of C10 of the test catchments from the C10 map (Figure 7.1) and the regression equation

CODE  STATION No.  Lat      Long     C10-Map  C10-Reg  % Diff
T1    221207       -37.450  148.913  17.74    19.72    11
T2    221210       -37.427  149.525  22.48    20.30    10
T3    221212       -37.608  148.900  18.08    18.38    2
T4    223202       -37.263  147.722  9.02     16.57    84
T5    225224       -37.803  146.883  20.77    20.65    1
T6    226209       -38.207  146.000  4.76     11.42    140
T7    227200       -38.458  146.693  36.74    13.58    63
T8    227210       -38.397  146.742  30.26    22.13    27
T9    227219       -38.375  145.558  9.47     17.58    86
T10   229218       -37.668  145.258  15.91    15.86    0
T11   230204       -37.467  144.667  28.4     25.12    12
T12   230213       -37.422  144.583  27.28    26.04    5
T13   231231       -37.912  144.577  19.16    17.38    9
T14   235205       -38.645  143.443  28.6     19.35    32
T15   401210       -36.570  147.412  12.06    15.81    31
T16   402217       -36.388  146.875  18.73    25.10    34
T17   405229       -36.637  144.868  13.3     13.20    1
T18   406200       -37.192  144.383  23.81    13.99    41
T19   406213       -37.017  144.542  20.63    19.52    5

Generally, the results of the two methods are very similar for most of the test catchments. The advantage of using the regression equation is that it avoids the need to estimate C10 by geographical interpolation from the C10 contour map, an approach that has been widely criticised. As shown in the % difference column of Table 7.2, the C10 values predicted by the two methods mostly agree; however, there are notable differences for a few catchments, which is not unexpected for this type of RFFA study. The two methods are based on different principles and are likely to provide different results for some catchments. Chapter 9 assesses the relative performance of the two types of C10 values by comparing the design floods estimated from them with at-site flood frequency estimates.


7.4 DERIVATION OF FREQUENCY FACTORS

The frequency factors (FFy) represent the variation in runoff coefficients across the various ARIs, as described in ARR (I.E. Aust, 1987). According to ARR, the FFy values also vary over space; for example, the FFy values recommended for use with the PRM in Victoria differ from those recommended for eastern NSW. The FFy values for the PRM in Victoria can be found in Book IV of ARR (I.E. Aust, 1987), Vol. 1. In this study, the derivation of the FFy values assumes Victoria to be a single region, similar to ARR.

The procedure used to derive the frequency factors was as follows:

• From the computed runoff coefficients (as previously discussed), the FFy values were computed by dividing the runoff coefficients Cy by C10. These were based on the 113 model catchments (a sketch of this calculation is given after this list).
• The average and the median of the 113 FFy values were obtained for each ARI. The median values were adopted for this study; nonetheless, the two types of FFy values are compared in Table 7.3.
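The frequency factor calculation is a simple element-wise ratio followed by a median across catchments, as sketched below; the runoff coefficient matrix C contains placeholder values for a handful of hypothetical catchments, not the 113 model catchments.

import numpy as np

# Placeholder runoff coefficients: rows are catchments, columns are ARIs of
# 2, 5, 10, 20, 50 and 100 years (values are illustrative only).
C = np.array([[0.10, 0.16, 0.20, 0.22, 0.24, 0.25],
              [0.08, 0.14, 0.18, 0.20, 0.22, 0.23],
              [0.12, 0.19, 0.23, 0.25, 0.28, 0.29]])

C10 = C[:, 2]                               # column for the 10-year ARI
FF = C / C10[:, None]                       # FFy = Cy / C10 for each catchment
print(np.round(np.median(FF, axis=0), 2))   # median FFy across catchments (adopted)
print(np.round(FF.mean(axis=0), 2))         # mean FFy across catchments (for comparison)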

The computed FFy values for the 113 catchments are presented in Appendix E. Table 7.3 shows that for ARIs of 10 to 100 years, the ARR1987 and the new FFy values are very similar. The differences at the 2 and 5 year ARIs arise from the difference between the annual maximum and partial duration series (see page 29, ARR 2001, Book IV): ARR1987 adopted a partial series approach to flood frequency analysis, whereas this study is based on the annual maximum flood series. It should be mentioned that the prediction equations and C10 maps developed in this thesis provide annual maximum series flood estimates, which can be converted to partial series estimates, if required, using Equation 2.1 in ARR Book IV, page 29.

Table 7.3 Frequency factors in ARR and new values compared

ARI (years)   FFy (ARR1987)   FFy (New PRM 2007)
2             0.75            0.48
5             0.9             0.81
10            1               1
20            1.1             1.10
50            1.2             1.20
100           1.3             1.27


7.5 SUMMARY

This chapter has described the derivation of a new Probabilistic Rational Method for the state of Victoria. The derivation includes a new C10 contour map and a C10 prediction equation.

The values of C10 predicted by the two methods mostly agree; however, there are notable differences for a few catchments, which is not unexpected for this type of RFFA study. Since the two methods are based on different principles, these variations in results are expected. The developed prediction equation requires four readily obtainable catchment and climatic characteristics. A new set of frequency factors has also been obtained. The new Probabilistic Rational Method is tested in Chapter 9.


CHAPTER 8: DEVELOPMENT OF PREDICTION EQUATIONS BY QUANTILE REGRESSION TECHNIQUE

8.1 ORDINARY LEAST SQUARES – SUMMARY OF PROCEDURES ADOPTED

In developing the prediction equations for the flood quantiles Q1.25, Q2, Q5, Q10, Q20, Q50, Q100 and Q200, the following procedure was adopted (a sketch of steps 3 to 9 is given after this list):

1. Twenty test catchments were initially selected at random and set aside for independent testing of the developed prediction equations, leaving 113 catchments for model development.
2. The catchment characteristics and flood quantile variables were log-transformed (see Chapter 5).
3. Ordinary least squares (OLS) linear regression analysis was then carried out with the 113 catchments for each of the eight quantiles using MINITAB:
   - the "backward" method of variable selection was used; and
   - diagnostic plot options were used, e.g. standardised predicted values on the X axis and standardised residuals on the Y axis.
4. Various model statistics were noted, e.g. the coefficient of determination (R²), the significance of the various predictor variables and the standard error of estimate (SEE).
5. The final models were selected mainly on the basis of the highest R² and lowest SEE values.
6. The initially selected model was then tested for influential data points and outliers. Influential data points were identified by plotting station serial number against Cook's distance and against leverage values. Outliers were identified as data points whose standardised residuals were greater than 3.
7. Influential/outlier data points were removed from the dataset and the regression analysis was repeated.
8. Steps 6 and 7 were repeated until all outliers and influential data points were removed.
9. Statistical tests were carried out to check for heteroskedasticity and non-normality of the residuals.
10. The final prediction equations and related model statistics were recorded. The limit on the removal of outlier stations was set at 10% of the total number of initial model catchments (i.e. 10% of 113 stations).
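Steps 3 to 9 of the procedure above can be sketched in Python using statsmodels, instead of the MINITAB package used in the study. The data frame df and its column names are placeholders, backward elimination is omitted for brevity, and the Breusch-Pagan and Kolmogorov-Smirnov statistics produced here will not reproduce the values in Table 8.1.

import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

# Placeholder data standing in for the 113-catchment model data set
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(113, 4)),
                  columns=["log_area", "log_I12_2", "log_rain", "log_sden"])
df["log_Q10"] = (0.5 * df.log_area + 0.9 * df.log_I12_2
                 + 0.45 * df.log_sden + rng.normal(scale=0.2, size=113))

X = sm.add_constant(df[["log_area", "log_I12_2", "log_rain", "log_sden"]])
fit = sm.OLS(df["log_Q10"], X).fit()

# Influential points and outliers (step 6)
influence = fit.get_influence()
cooks_d = influence.cooks_distance[0]
outliers = np.abs(influence.resid_studentized_internal) > 3

# Heteroskedasticity (Breusch-Pagan) and normality (Kolmogorov-Smirnov) of residuals (step 9)
bp_lm, bp_p, _, _ = het_breuschpagan(fit.resid, X)
ks_stat, ks_p = stats.kstest((fit.resid - fit.resid.mean()) / fit.resid.std(), "norm")

print(fit.params.round(3))
print(f"R2(adj) = {fit.rsquared_adj:.2f}, BP p-value = {bp_p:.2f}, KS p-value = {ks_p:.2f}")
print(f"max Cook's D = {cooks_d.max():.3f}, outliers flagged = {int(outliers.sum())}")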


8.2 LOG TRANSFORMATION OF DATA

A log transformation was applied to the dataset to change the scale of the variables and achieve linearity or near-linearity. The transformation was applied to both the flood quantiles (dependent variables) and the catchment characteristics (independent variables) (see Chapter 5). The transformed dataset is shown in Appendix D.

8.3 DEVELOPMENT OF PREDICTION EQUATIONS - OLS

Ordinary least squares (OLS) linear regression analysis was carried out using the 113-catchment database, following the criteria set out in Sections 8.1 and 3.7. The formulated prediction equations and the important model statistics are shown in Table 8.1. An example for the quantile Q10 is provided below to illustrate how the regression was undertaken; for the remaining quantiles, the various statistics and diagnostic plots are provided in Appendix F. All OLS modelling information is provided on the accompanying CD.

8.4 Q10 MODEL

Model development for Q10 involved the removal of 12 stations that were considered influential points or outliers on the basis of Cook's distance. These stations are 226218, 228212, 232210, 233214, 238207, 403209, 406216, 233211, 235204, 401209, 401215 and 405205; their removal took place over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure 8.1 shows that the histogram of the residuals is approximately normally distributed with mean zero and an approximately constant variance (on visual inspection); however, the Breusch-Pagan test indicates that slight heteroskedasticity is present, as shown by the Chi-squared statistic (see Table 8.1). Figure 8.2 presents the normal probability plot of the standardised residuals, which approximately follow a straight line, indicating that the residuals are approximately normally distributed with mean zero; this is supported by the non-significant Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure 8.3 the standardised residuals are plotted against the predicted values; the plot does not show any systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity graphically, although, as noted above, slight heteroskedasticity is statistically present. Refer to Table 8.1 for the final equation and the related model statistics.


Figure 8.1 – Histogram of Standardised Residuals for log Q10 (frequency vs. standardised residual)

Figure 8.2 – Normal Probability Plot of the Standardised Residuals for log Q10 (normal score vs. standardised residual)

Figure 8.3 – Standardised Residuals vs. Predicted Values for log Q10


Table 8.1 Important statistics for the developed prediction equations
(For each regression equation: coefficient β, SE(β) and VIF per predictor; SEE, R², Breusch-Pagan (BP) test, Kolmogorov-Smirnov (SK) test and number of data points)

log Q1.25 (99 data points): SEE = 19%, R² = 71%; BP test Chi² = 0.14 (Prob > Chi² = 0.71); SK test Chi² = 0.87 (Prob > Chi² = 0.65)
  (constant)  β = -4.25   SE(β) = 0.397
  log(area)   β = 0.519   SE(β) = 0.050   VIF = 1.0
  log(rain)   β = 1.37    SE(β) = 0.129   VIF = 1.0
  log(sden)   β = 0.352   SE(β) = 0.144   VIF = 1.0

log Q2 (99 data points): SEE = 16%, R² = 69%; BP test Chi² = 4.46 (Prob > Chi² = 0.11); SK test Chi² = 0.67 (Prob > Chi² = 0.41)
  (constant)  β = -1.92   SE(β) = 0.326
  log(area)   β = 0.515   SE(β) = 0.042   VIF = 1.0
  log(rain)   β = 0.724   SE(β) = 0.108   VIF = 1.0
  log(sden)   β = 0.495   SE(β) = 0.112   VIF = 1.1

log Q5 (100 data points): SEE = 18%, R² = 56%; BP test Chi² = 7.37 (Prob > Chi² = 0.03); SK test Chi² = 1.13 (Prob > Chi² = 0.29)
  (constant)  β = 0.224   SE(β) = 0.183
  log(area)   β = 0.504   SE(β) = 0.049   VIF = 1.0
  log(I12:2)  β = 0.555   SE(β) = 0.213   VIF = 1.0
  log(sden)   β = 0.591   SE(β) = 0.131   VIF = 1.1

log Q10 (101 data points): SEE = 20%, R² = 53%; BP test Chi² = 5.73 (Prob > Chi² = 0.07); SK test Chi² = 0.57 (Prob > Chi² = 0.45)
  (constant)  β = 1.32    SE(β) = 0.421
  log(area)   β = 0.548   SE(β) = 0.055   VIF = 1.1
  log(I12:2)  β = 0.942   SE(β) = 0.304   VIF = 1.7
  log(rain)   β = -0.43   SE(β) = 0.170   VIF = 1.7
  log(sden)   β = 0.457   SE(β) = 0.144   VIF = 1.1

log Q20 (101 data points): SEE = 22%, R² = 48%; BP test Chi² = 3.65 (Prob > Chi² = 0.161); SK test Chi² = 0.38 (Prob > Chi² = 0.54)
  (constant)  β = 1.72    SE(β) = 0.473
  log(area)   β = 0.543   SE(β) = 0.062   VIF = 1.1
  log(I12:2)  β = 1.145   SE(β) = 0.341   VIF = 1.7
  log(rain)   β = -0.568  SE(β) = 0.191   VIF = 1.7
  log(sden)   β = 0.467   SE(β) = 0.162   VIF = 1.1

log Q50 (98 data points): SEE = 26%, R² = 44%; BP test Chi² = 0.56 (Prob > Chi² = 0.45); SK test Chi² = 1.55 (Prob > Chi² = 0.46)
  (constant)  β = 2.02    SE(β) = 0.517
  log(area)   β = 0.514   SE(β) = 0.067   VIF = 1.1
  log(I12:2)  β = 1.434   SE(β) = 0.369   VIF = 1.7
  log(rain)   β = -0.671  SE(β) = 0.208   VIF = 1.7
  log(sden)   β = 0.445   SE(β) = 0.174   VIF = 1.1

log Q100 (98 data points): SEE = 26%, R² = 46%; BP test Chi² = 2.48 (Prob > Chi² = 0.11); SK test Chi² = 0.94 (Prob > Chi² = 0.62)
  (constant)  β = 2.27    SE(β) = 0.557
  log(area)   β = 0.554   SE(β) = 0.074   VIF = 1.1
  log(I12:2)  β = 1.75    SE(β) = 0.399   VIF = 1.7
  log(rain)   β = -0.846  SE(β) = 0.223   VIF = 1.6
  log(sden)   β = 0.591   SE(β) = 0.181   VIF = 1.1

log Q200 (98 data points): SEE = 29%, R² = 43%; BP test Chi² = 2.17 (Prob > Chi² = 0.15); SK test Chi² = 0.86 (Prob > Chi² = 0.65)
  (constant)  β = 2.37    SE(β) = 0.595
  log(area)   β = 0.539   SE(β) = 0.079   VIF = 1.1
  log(I12:2)  β = 1.91    SE(β) = 0.427   VIF = 1.7
  log(rain)   β = -0.884  SE(β) = 0.238   VIF = 1.6
  log(sden)   β = 0.58    SE(β) = 0.193   VIF = 1.1

8.5 DISCUSSION

After deriving these prediction equations, it is important to check the model assumptions to ensure they have been satisfied reasonably well. This gives an indication of how statistically meaningful the prediction equations are and whether the estimated regression coefficients are stable. Table 8.1 shows the adjusted coefficient of multiple determination (R²), which ranged from 43% to 71% for Q200 to Q1.25 respectively. It is not surprising that the R² of the smaller quantiles is higher than that of the larger quantiles: there is more variance in the higher ARI flows that cannot be accounted for by the catchment characteristics. All of the R² values are considered reasonable and indicate a relatively good linear fit for the prediction equations. For each of the prediction equations, R² improved once the outliers and influential data points were removed.

The standard error of estimate (SEE), expressed in terms of the log-space mean squared error, ranged from 16% to 29% across the ARIs; the lowest SEE (16%) was found for Q2 and the largest (29%) for Q200. Where the SEE is used as a measure of the prediction error of the model, these values are considered reasonable; the results can be seen in Table 8.1. Overall, the R² and SEE values obtained in this thesis are typical of these types of RFFA studies (e.g. Rahman, Rima and Weeks, 2008), reflecting the fact that regression analysis is a data-driven approach and there may be undetected noise in the data even after careful screening.

Multicollinearity among the predictors was also checked. A variable with a relatively very high VIF is highly correlated with one or more of the other independent variables. As can be seen in Table 8.1, none of the VIFs for the developed prediction equations was found to be high; the VIF values ranged from 1.0 to 1.7. Norusis (1993) notes that, for highly correlated variables, the VIF may be as high as 50. Thus multicollinearity is not an issue for the developed prediction equations.

The linearity and homogeneity of variance assumptions were also checked. Standardised residuals were plotted against the predicted values for the different ARIs, as can be seen in Figures 8.1 to 8.3 and Figures F.1 to F.21 in Appendix F. The plots do not show any systematic pattern between the predicted values and the residuals. However, statistical testing using the Breusch-Pagan test revealed slight to moderate heteroskedasticity for all quantiles except Q20, as evidenced by the results in Table 8.1. Thus the assumptions of a linear model are satisfied to an extent, but the homogeneity of variance assumption has not been fully met.

The normality of the residuals was assessed using a normal probability plot and the Kolmogorov-Smirnov test for normality. In the case of a perfect normal distribution, the points of the probability plot should lie on a straight line. For the study data, the points did not show much departure from a straight line, as shown in Figures 8.1 to 8.3 and Figures F.1 to F.21 in Appendix F. The results of the Kolmogorov-Smirnov test shown in Table 8.1 also support the normality of the residuals.

MINITAB generates beta weights (referred to here as DFbetas), which indicate the relative importance of the predictor variables; they are the coefficients of the independent variables when all of the variables are standardised to mean 0 and unit variance. The independent variables in the prediction equations are listed in Table 8.2 in order of importance as determined by these standardised coefficients. The following comments can be made:

• Catchment area appears in all the prediction equations and heads the list for all the ARIs; catchment area is therefore the most important variable in estimating Q.
• The climatic characteristics I12:2 and rain are the most important predictors after catchment area.
• The next most important predictor is sden, a function of the drainage network of the catchment.

The following points can be made from Table 8.2 with respect to the relative importance of the independent variables across the ARIs (a sketch of how these standardised coefficients can be computed is given after this list):

• The standardised coefficient of catchment area is relatively consistent throughout and, as noted, leads the list as the most important independent variable; catchment area mostly affects the magnitude of the flood.
• The equations show consistency in their independent variables across the ARIs; however, it is worth noting that rain is absent for Q5 and rainfall intensity is absent for Q1.25 and Q2.
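The standardised (beta) coefficients reported in Table 8.2 can be reproduced outside MINITAB by standardising every variable and refitting the regression, as in the short sketch below; the data frame and the coefficient values used to generate it are placeholders, not the study data.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder data; in the study these would be the log-transformed variables
rng = np.random.default_rng(7)
df = pd.DataFrame(rng.normal(size=(101, 3)), columns=["log_area", "log_I12_2", "log_sden"])
df["log_Q10"] = (0.55 * df.log_area + 0.9 * df.log_I12_2
                 + 0.45 * df.log_sden + rng.normal(scale=0.2, size=101))

# Standardise every variable to mean 0 and unit variance, then refit:
# the resulting coefficients are the beta weights of Table 8.2.
z = (df - df.mean()) / df.std()
beta = sm.OLS(z["log_Q10"], z[["log_area", "log_I12_2", "log_sden"]]).fit().params
print(beta.round(3).sort_values(ascending=False))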


Table 8.2 Relative importance of the predictor variables

ARI     Independent Variable   DFbetas
Q1.25   log(area)              0.581
        log(rain)              0.495
        log(sden)              0.181
Q2      log(area)              0.728
        log(rain)              0.166
        log(sden)              0.284
Q5      log(area)              0.751
        log(I12:2)             0.145
        log(sden)              0.221
Q10     log(area)              0.680
        log(I12:2)             0.252
        log(rain)              -0.188
        log(sden)              0.321
Q20     log(area)              0.671
        log(I12:2)             0.273
        log(rain)              -0.021
        log(sden)              0.280
Q50     log(area)              0.630
        log(I12:2)             0.295
        log(rain)              -0.240
        log(sden)              0.274
Q100    log(area)              0.607
        log(I12:2)             0.274
        log(rain)              -0.219
        log(sden)              0.264
Q200    log(area)              0.585
        log(I12:2)             0.275
        log(rain)              -0.214
        log(sden)              0.261


8.6 RESULTS FOR THE WEIGHTED LEAST SQUARES REGRESSION MODEL FOR SKEWNESS

A critical issue here is the precision of the regional skewness estimator. The skewness estimator developed here has an average variance of prediction equivalent to that which would be provided by an at-site skewness estimator based on 95 years of record. This shows that the regional skew values, at least in this study, can provide relatively stable estimates of a site's skewness coefficient in comparison with using the at-site sample skew alone. This skew analysis also avoids the correlation between the residuals and the fitted quantiles. The final equation is:

log(skew) = 23.65 + 2.28 log(rain) − 10.29 log(evap) + 0.246 log(forest)    (8.1)
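The weighted least squares step behind an equation of this form can be sketched as follows with statsmodels. This is a hedged illustration only: the weights (record lengths used as a simple proxy for the inverse sampling variance of the at-site skew) and the synthetic data are assumptions, and the sketch does not reproduce the thesis's 4-stage procedure or the fitted statistics behind Equation 8.1.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder data standing in for the 113 model catchments
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "log_rain":   rng.normal(3.0, 0.1, 113),
    "log_evap":   rng.normal(3.0, 0.03, 113),
    "log_forest": rng.normal(-0.2, 0.3, 113),
    "n_years":    rng.integers(20, 60, 113),   # record lengths (placeholders)
})
df["log_skew"] = (23.65 + 2.28 * df.log_rain - 10.29 * df.log_evap
                  + 0.246 * df.log_forest + rng.normal(scale=0.1, size=113))

# Weight each site by its record length: longer records give a more reliable
# at-site skew, i.e. a smaller sampling variance.
X = sm.add_constant(df[["log_rain", "log_evap", "log_forest"]])
wls_fit = sm.WLS(df["log_skew"], X, weights=df["n_years"]).fit()
print(wls_fit.params.round(3))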

8.7 DEVELOPMENT OF PREDICTION EQUATIONS USING GENERALISED LEAST SQUARES (GLS)

Generalised least squares (GLS) linear regression analysis was carried out using the same catchments identified for the ordinary least squares analysis, following the criteria set out in Sections 3.9.1 and 3.9.3. The formulated prediction equations are shown in Table 8.3 and are compared with the OLS prediction equations; the important model statistics are also provided in Table 8.3 for both the GLS and OLS models. An example for Q10 is provided below to illustrate how the GLS regression was undertaken (a sketch of the GLS estimator is also given below); for the rest of the ARIs, the various statistics and diagnostic plots can be seen in Appendix F.
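The core of the GLS step is the estimator β = (X'Λ⁻¹X)⁻¹X'Λ⁻¹y, where Λ combines a model error variance with the sampling error covariance of the at-site quantile estimates, in the spirit of Stedinger and Tasker (1985). The numpy sketch below illustrates that estimator only: the covariance values are placeholders, and the iterative estimation of the model error variance used in the 4-stage GLS procedure (Equations 3.56 onwards in the thesis) is omitted.

import numpy as np

def gls_coefficients(X, y, model_error_var, sampling_cov):
    """GLS estimator: beta = (X' Lambda^-1 X)^-1 X' Lambda^-1 y,
    with Lambda = model_error_var * I + sampling_cov."""
    n = len(y)
    Lam = model_error_var * np.eye(n) + sampling_cov
    Lam_inv = np.linalg.inv(Lam)
    XtLi = X.T @ Lam_inv
    beta = np.linalg.solve(XtLi @ X, XtLi @ y)
    cov_beta = np.linalg.inv(XtLi @ X)   # covariance of the estimated coefficients
    return beta, cov_beta

# Placeholder example: 113 sites, intercept plus three predictors
rng = np.random.default_rng(11)
X = np.column_stack([np.ones(113), rng.normal(size=(113, 3))])
y = X @ np.array([1.3, 0.55, 0.9, 0.45]) + rng.normal(scale=0.2, size=113)
samp_cov = np.diag(rng.uniform(0.01, 0.05, 113))   # placeholder sampling error variances
beta, cov_beta = gls_coefficients(X, y, model_error_var=0.02, sampling_cov=samp_cov)
print(np.round(beta, 3))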

8.8 Q10 MODEL

The following two figures show important properties of the residuals. Figure 8.4 shows that the histogram of the residuals is approximately normally distributed. In Figure 8.5 the standardised residuals are plotted against the predicted values; the plot does not show any systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Compared with the OLS diagnostic plots, the GLS diagnostic plots appear to give slightly better results. Refer to Table 8.3 for the final equation and the related model statistics.


Figure 8.4 – Histogram of Standardised Residuals (GLS) for log Q10 (frequency vs. standardised residual)

Figure 8.5 – Standardised Residuals vs. Fitted Values (GLS) for log Q10


Table 8.3 Important statistics for the developed GLS prediction equations

8.9 DISCUSSION

After deriving these prediction equations, it is appropriate to check the model assumptions to ensure they have been satisfied to a large extent. This provides an indication of how successful the GLS prediction equations are and whether the estimated regression coefficients are more stable than their OLS counterparts. Table 8.3 shows the adjusted coefficient of multiple determination (R²) for both the GLS and OLS methods; the GLS R² ranged from 50% to 78% for Q200 to Q1.25 respectively. The GLS R² is consistently and markedly higher than the OLS R², illustrating how the OLS R² understates the performance of the model. This shows that the OLS hydrologic regression procedure can give misleading results because it makes no distinction between the variance due to model error and the variance due to time sampling error. On this measure alone, one could argue that the increase in R² is sufficient evidence to adopt a GLS regression model for a hydrological regression exercise.


The average variance of prediction (VP) over the available dataset is a measure of how well the regression model predicts the true quantile on average. For the estimation set, the VP was calculated for both the OLS and GLS methods using Equations 3.67 and 3.68. As seen in Table 8.3, the VP for the GLS regression is consistently smaller than that of the competing OLS model; as noted by Tasker (1987), a smaller VP is preferred when comparing hydrological models. Therefore, on average, the GLS regression procedure predicts the true quantiles better than the OLS method. Averaging the VPs over all the ARIs in Table 8.3 gives 0.041 log units for the GLS method and 0.049 for OLS, a difference of about 17% relative to OLS. The average sampling error variance was also assessed using Equations 3.71 and 3.72 for the GLS and OLS models respectively. Table 8.3 shows a consistent pattern: the average sampling error variance of the GLS model is much smaller than that of the OLS model (of the order of 90% smaller, relative to OLS). This shows that sampling error has had a notable effect on the analysis, which is also evidenced by the differences in the model parameter estimates (see Table 8.3). The regression parameters of the OLS model are therefore not truly representative, as the OLS procedure neglects sampling error, model error and correlated residuals.

The standard error of prediction (Sep) can be used to describe the strength of the model. The Sep for the GLS and OLS estimation sets was calculated using Equation 3.69. The Sep values ranged from 13% to 27% for GLS and 16% to 29% for OLS. The lowest value of Sep was achieved for Q2 while the largest value was found for Q200, similar to the OLS method (see Table 8.3). Where the Sep is used as a measure of the prediction accuracy of the model, the Sep values for the GLS method are considered to be reasonable. The Sep over all the ARIs for the estimation set is 20% for the GLS method and 22% for the OLS method. Thus the results indicate that, regardless of which ARI is being estimated, the GLS method yielded, on average, slightly smaller errors in prediction.

The equivalent years of record, ERL (Hardison, 1971), expresses the accuracy of prediction in terms of the years of record required to achieve results of equal accuracy. It was calculated using Equation 3.73 for both the GLS and OLS estimation sets; the results can be seen in Table 8.3, which shows that the GLS method performs much better in expressing the accuracy of predictions. The GLS method consistently shows an increase in the equivalent record lengths. On average over all the ARIs the GLS model predicts a quantile with an accuracy of prediction equivalent to a record length of 11.3 years, while the OLS method only makes that prediction with an equivalent of 5 years. This can be attributed to the fact that the sampling error was significantly reduced with the GLS model.

Checking of the linearity and homogeneity of variance assumptions was also undertaken. Standardised residuals were plotted against the predicted values for the different ARIs, as can be seen in Figures 8.4 and 8.5 and F.22 to F.35 (see Appendix F). The plots do not show any systematic patterns between the predicted values and the residuals. Thus the assumptions of a linear model are to a large extent satisfied, and the homogeneity of variance has now largely been met.

The normality of the residuals was also assessed. If the residuals are approximately normal, about one-sixth of the residuals will fall above 1, about one-sixth will fall below -1, and 95% of the residuals will fall between -2 and +2. Figures 8.4 and 8.5 and F.22 to F.35 show the histograms of the standardised residuals. The figures show that the residuals follow an approximately normal distribution with mean 0. Also, for all the ARIs, 93-95% of the residuals fall between -2 and +2, satisfying the assumption of normality very well. All the above assessments support the stability of the regression coefficients. Confidence can be placed in the GLS regression coefficients, as they would be the best linear unbiased estimator given that GLS has accounted for the heteroskedasticity in the model.
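As a rough illustration of the residual checks described above, the following sketch counts the fraction of standardised residuals falling within the ±1 and ±2 bands and applies a simple normality test; the residual values are hypothetical (randomly generated), not those of the fitted models.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
std_resid = rng.standard_normal(113)   # hypothetical standardised GLS residuals

within_1 = np.mean(np.abs(std_resid) <= 1)   # ~68% expected for a normal distribution
within_2 = np.mean(np.abs(std_resid) <= 2)   # ~95% expected

# Kolmogorov-Smirnov test against the standard normal distribution
ks_stat, p_value = stats.kstest(std_resid, "norm")

print(f"Within +/-1: {within_1:.0%}, within +/-2: {within_2:.0%}")
print(f"KS statistic = {ks_stat:.3f}, p-value = {p_value:.3f}")
```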

8.10 SUMMARY

In this chapter, flood prediction equations have been developed using multiple linear regression adopting both Ordinary Least Squares (OLS) and Generalised Least Squares (GLS) regression methods. For the OLS method variables were selected using the backward variable selection technique, while outliers/influential data points were removed using Cook's distance and leverage for each of the ARI models. The GLS method used the variables identified in the OLS method to develop the prediction equations.

The developed equations for the OLS model satisfied the underlying model assumptions to some extent. The histograms of standardised residuals approximately followed a normal distribution, while the standardised residuals versus the predicted values did not reveal any marked trends. However, statistical testing revealed slight heteroskedasticity for the developed OLS model; this heteroskedasticity can cause the regression coefficients to be distorted. This chapter has also developed a practical GLS regression model that relates hydrological statistics to catchment characteristics. The OLS model has provided misleading results because it does not make any distinction between the variance due to the model error and the variance due to sampling error. As shown, the GLS model proved to be the best framework because the cross correlation of annual maximum flows and the correlated residuals that cause heteroskedasticity were accounted for. Regression diagnostics have also shown the GLS method to be more accurate.

In the next chapter, the performance of the developed prediction equations (both OLS and GLS) will be assessed using data from the 20 independent test catchments.


CHAPTER 9: VALIDATION OF THE DEVELOPED PREDICTION EQUATIONS AND PRM

9.1 INTRODUCTION

Chapter 8 developed flood prediction equations (for 8 ARIs, Q1.25 to Q200) using both Ordinary Least Squares and Generalised Least Squares regression procedures. The developed prediction equations allow design flood estimates to be made at ungauged catchments given the relevant catchment characteristics data for the ungauged catchment. To assess the performance of these prediction equations at ungauged catchments, they must be tested on a number of test catchments which were selected in Chapter 4 and were not included in the development of the prediction equations in Chapter 8.

As discussed in Chapters 2 and 7, the currently recommended method for prediction of peak floods in the study area (Victoria) is the Probabilistic Rational Method (PRM). Hence, testing is carried out to see how the developed prediction equations perform against the PRM. To provide a better basis for testing the PRM, new runoff coefficients were developed both in the conventional way, by mapping, and by a new approach in which a regression equation relates the runoff coefficients to catchment characteristics data. New frequency factors were also obtained in Chapter 7.

In the testing, observed flood quantiles for the test catchments are obtained by flood frequency analysis using FLIKE and then compared against (a) the redeveloped PRM as described in Chapter 7, i.e. C10 (map) and the new FFy values, (b) the redeveloped PRM as described in Chapter 7, i.e. C10 (regression) and the new FFy values, and (c) the newly developed prediction equations (for both the OLS and GLS methods, equations shown in Tables 8.1 and 8.3).


9.2 AT-SITE FLOOD FREQUENCY ANALYSIS

As discussed in Chapter 3, an LP3 distribution was fitted to each test station's annual maximum flood data adopting a Bayesian parameter fitting procedure (Kuczera and Franks, 2005). The quantiles calculated will be referred to as the observed quantiles. Flood frequency analysis (FFA) estimates were explored further by adopting the 'error in discharge' case facilitated in FLIKE to examine the effects of rating curve errors on flood estimates, as explained in Chapter 4.
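FLIKE's Bayesian fitting procedure is not reproduced here, but the sketch below shows the simpler moment-based form of an LP3 quantile in log space, using the Wilson-Hilferty approximation for the Pearson Type 3 frequency factor. The at-site log statistics are hypothetical, and this is only an illustration of the distribution being fitted, not of the Bayesian procedure or the rating curve error analysis.

```python
import numpy as np
from scipy import stats

def lp3_quantile(mean_log, std_log, skew_log, ari):
    """Moment-based LP3 quantile (log10 space) using the
    Wilson-Hilferty approximation for the frequency factor."""
    p_exceed = 1.0 / ari
    z = stats.norm.ppf(1 - p_exceed)          # standard normal variate
    g = skew_log
    if abs(g) < 1e-6:
        k = z
    else:
        k = (2.0 / g) * ((1.0 + g * z / 6.0 - g**2 / 36.0) ** 3 - 1.0)
    return 10 ** (mean_log + k * std_log)

# Hypothetical at-site log10 statistics of annual maximum flows (ML/d)
mean_log, std_log, skew_log = 3.2, 0.45, -0.2
for ari in (2, 10, 50, 100):
    print(f"Q{ari} ~ {lp3_quantile(mean_log, std_log, skew_log, ari):,.0f} ML/d")
```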

9.3 QUANTILE REGRESSION TECHNIQUE

The quantile regression techniques (both OLS and GLS) developed in this study (equations in Tables 8.1 and 8.3) were applied to the 20 test catchments. The characteristics used in these equations are: catchment area (area, km²), rainfall intensity (I12:2, mm/h), mean annual rainfall (rain, mm) and stream density (sden, km/km²).

These characteristics can easily be obtained for any ungauged catchment in future applications by following the extraction procedures described in Chapter 5. Here the predicted flood quantiles Q1.25, Q2, Q5, Q10, Q20, Q50, Q100 and Q200 are compared against the at-site flood frequency results. Although the at-site flow quantiles provide an imperfect estimate of the true but unknown flow quantile at the site, the at-site FFA estimates from the stations are likely to provide a reasonable benchmark, and it is usual practice to compare RFFA estimates with at-site FFA estimates. It should be noted, however, that the at-site FFA estimates are subject to sampling error, and hence this comparison gives an indication of the errors of the developed RFFA methods rather than a measure of true error. Further error analysis is beyond the scope of this thesis. The uncertainty in the QRT (both OLS and GLS) method is estimated by obtaining a relative error, defined in Equation 9.1. Although the relative error obtained from Equation 9.1 is by no means a measure of the true error, it may be taken as a 'reasonable estimate' of the error that might arise with the QRT method as far as practical application of the method is concerned.

Relative error (%) = [(Q_OLS/GLS − Q_FFA) / Q_FFA] × 100    (9.1)
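The form of this comparison can be sketched as follows. The regression coefficients and catchment characteristics below are hypothetical placeholders (the actual coefficients are those reported in Tables 8.1 and 8.3), and the relative error follows Equation 9.1.

```python
import math

# Hypothetical GLS regression coefficients for log10(Q10):
# log10(Q10) = b0 + b1*log10(area) + b2*log10(I12:2) + b3*log10(rain) + b4*log10(sden)
b = [0.30, 0.70, 0.80, 0.50, 0.20]

# Catchment characteristics for a hypothetical ungauged test catchment
area, i12_2, rain, sden = 250.0, 4.2, 900.0, 1.8   # km2, mm/h, mm, km/km2

log_q10 = (b[0] + b[1] * math.log10(area) + b[2] * math.log10(i12_2)
           + b[3] * math.log10(rain) + b[4] * math.log10(sden))
q10_pred = 10 ** log_q10

# Relative error against the at-site FFA estimate (Equation 9.1)
q10_ffa = 9500.0                                    # hypothetical FFA quantile, ML/d
rel_error = (q10_pred - q10_ffa) / q10_ffa * 100
print(f"Predicted Q10 = {q10_pred:,.0f} ML/d, relative error = {rel_error:.1f}%")
```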


9.4 REDEVELOPED PROBABILISTIC RATIONAL METHOD

In Chapter 7, the redevelopment of the PRM technique is described. By using Equation 3.74, following a similar procedure to ARR (I.E. Aust, 1987), flood quantiles are estimated for the 20 test catchments. The various input values are obtained as follows:

• Interpolated C10 values were obtained from the redeveloped C10 contour map (Figure 7.1), as shown in Table 7.2;
• C10 values were also obtained from the regression equation (Equation 7.1). The newly derived C10 values can be seen in Table 7.2;
• Frequency factor FFy values are taken from Table 7.3;
• Itc,y values are obtained using techniques described in Chapter 5; and
• Catchment area is taken from Chapter 4.

Now, the uncertainty in the PRM is estimated by obtaining a relative error, defined in Equation 9.1, where Q_OLS/GLS is replaced by Q_PRM (the estimate obtained from the PRM).
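A minimal sketch of the PRM quantile calculation is given below. Since Equation 3.74 is not reproduced in this chapter, it is assumed here to have the standard rational method form Q_Y = 0.278 · C_Y · I(tc,Y) · A with C_Y = FF_Y · C10 (A in km², I in mm/h, Q in m³/s), converted to ML/d for consistency with the figures; the C10, FF_Y and intensity values are hypothetical.

```python
# Probabilistic Rational Method sketch (assumed form of Equation 3.74):
#   Q_Y = 0.278 * C_Y * I(tc, Y) * A, with C_Y = FF_Y * C10
# A in km2, I in mm/h, giving Q in m3/s; converted to ML/d below.

def prm_quantile(c10, ff_y, intensity_tc_y, area_km2):
    c_y = ff_y * c10                      # runoff coefficient for an ARI of Y years
    q_m3s = 0.278 * c_y * intensity_tc_y * area_km2
    return q_m3s * 86.4                   # 1 m3/s = 86.4 ML/d

# Hypothetical inputs for one test catchment
c10 = 0.35                 # from the redeveloped contour map or Equation 7.1
area = 250.0               # km2, from Chapter 4
ff = {2: 0.55, 10: 1.0, 50: 1.6, 100: 1.9}      # hypothetical FF_Y ratios (FF_10 = 1)
i_tc = {2: 3.0, 10: 4.5, 50: 6.0, 100: 6.8}     # mm/h at the time of concentration

for ari in ff:
    print(f"Q{ari} ~ {prm_quantile(c10, ff[ari], i_tc[ari], area):,.0f} ML/d")
```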

9.5 EVALUATION OF RESULTS

The results of the testing included the following:
• 4 sets of results from the four different techniques (OLS, GLS, PRM (map) and PRM (reg));
• across 8 selected ARIs; and
• on 20 test catchments.

Due to the different regional flood frequency methods involved over a range of ARIs, it seems logical to assess the performance and success of the different methods visually in addition to numerical assessment. In statistical hydrology, visual assessment is widely adopted in practice. This evaluation is completed in a variety of ways:
a) Section 9.5.1 and Figures 9.1 to 9.8 compare the results of the various methods (PRM, OLS and GLS) for each selected ARI. Comments are then provided for the testing on Q1.25, Q2, Q5, Q10, Q20, Q50, Q100 and Q200. An evaluation is then given in Table 9.1, which ranks which regional flood estimation model gives the best results with respect to a particular ARI and the at-site FFA estimates.
b) Section 9.5.2 and Table 9.1 then present these results with respect to each test catchment. Here we can determine which technique worked best on each test catchment and draw conclusions based upon the associated catchment sizes; the results can be seen in Table 9.2;
c) Section 9.5.3 then assesses both the median errors and the spread of the errors using boxplots for each ARI;
d) Section 9.5.4 then assesses the flood quantile estimates (from all methods) plotted on the at-site FFA plots obtained from ARR-FLIKE to see whether the flood quantile estimates (from a method) fall within the FFA confidence limits.
e) Section 9.5.5 looks at the strengths of the OLS and GLS models using statistics such as the average variance of prediction and the standard error of prediction for the validation sets.

9.5.1 COMPARISON OF TECHNIQUES FOR VARIOUS ARIS

[Figure: FFA, QRT-OLS and QRT-GLS estimates of Q1.25 (ML/d) for test catchments T1 to T20]

Figure 9.1 – Comparison of flood quantiles Q1.25


[Figure: FFA, QRT-OLS, QRT-GLS, PRM_Map and PRM_Reg estimates of Q2 (ML/d) for test catchments T1 to T20]

Figure 9.2 – Comparison of flood quantiles Q2

[Figure: FFA, QRT-OLS, QRT-GLS, PRM_Map and PRM_Reg estimates of Q5 (ML/d) for test catchments T1 to T20]

Figure 9.3 – Comparison of flood quantiles Q5


[Figure: FFA, QRT-OLS, QRT-GLS, PRM_Map and PRM_Reg estimates of Q10 (ML/d) for test catchments T1 to T20]

Figure 9.4 – Comparison of flood quantiles Q10

[Figure: FFA, QRT-OLS, QRT-GLS, PRM_Map and PRM_Reg estimates of Q20 (ML/d) for test catchments T1 to T20]

Figure 9.5 – Comparison of flood quantiles Q20


[Figure: FFA, QRT-OLS, QRT-GLS, PRM_Map and PRM_Reg estimates of Q50 (ML/d) for test catchments T1 to T20]

Figure 9.6 – Comparison of flood quantiles Q50

[Figure: FFA, QRT-OLS, QRT-GLS, PRM_Map and PRM_Reg estimates of Q100 (ML/d) for test catchments T1 to T20]

Figure 9.7 – Comparison of flood quantiles Q100


[Figure: FFA, QRT-OLS and QRT-GLS estimates of Q200 (ML/d) for test catchments T1 to T20]

Figure 9.8 – Comparison of flood quantiles Q200

Each of Figures 9.1 to 9.8 compares the results of the various RFFA methods (PRM, OLS and GLS) for a particular ARI. The results in the figures show that the different methods mostly agree; however, there are some notable differences among methods for a few catchments, which is not unexpected as these methods differ in their principles. Nonetheless, the GLS method appears to reproduce the at-site FFA results better than the other methods.

Table 9.1 compares the results of the four techniques on the 20 test catchments based on the graphical representations above. The following scoring is used to indicate which technique performs best (a minimal sketch of this ranking logic is given after the list):
• "1" is assigned if the technique best predicted the FFA quantile;
• "2", "3" and "4" are assigned for the next closest estimates.


Table 9.1 Model comparison summary for various ARIs


From the results in Table 9.1, it is apparent that the GLS method performs much better than the other techniques. The GLS method shows consistent and better performance for ARIs Q1.25, Q2, Q5, Q50, Q100 and Q200. For Q1.25 the GLS method receives a score of 1 in 60% of cases, although it should be noted that for this ARI the GLS method is compared only with the OLS method. For Q2 and Q5 the GLS method receives a score of 1 in 70% and 40% of cases respectively, clearly showing that the GLS method predicts the observed quantile much more accurately than the other methods. However, at Q10 and Q20 the PRM-Map performs slightly better than the GLS method, receiving a score of 1 in 45% and 35% of cases respectively, while the GLS method receives a score of 1 in 30% and 35% of cases respectively. This shows that the PRM-Map method is preferable to the GLS method for the medium ARIs. For both Q100 and Q200 the GLS method is favourable, as can be seen in Table 9.1. In summary,

• For the Q1.25, Q2, Q5, Q50, Q100 and Q200 models, the best results are achieved with the QRT-GLS method.

• For Q10 and Q20 models, the redeveloped PRM-Map provides the best results.

9.5.2 COMPARISON OF TECHNIQUES FOR INDIVIDUAL TEST CATCHMENT

Table 9.2 uses the same information as Table 9.1, but evaluates it differently so that an assessment can be made of which technique worked best on each individual test catchment. For each test catchment, over the eight ARIs, the method that scores 1 most frequently is considered to be the best method for that catchment. From the results shown in Table 9.2, the following comments can be made:

• For 11 out of the 20 test catchments, spread over different catchment sizes, the GLS method performs consistently better over the 8 ARIs. However, the smaller sized catchments generally show the best results (for all ARIs) for the GLS method.
• For 7 out of the 20 test catchments, generally the larger catchments, the PRM Map method performs quite well and consistently better.


• Both the OLS and PRM methods receive a score of 1 in a smaller number of cases, suggesting that these methods are not really suitable; however, Table 9.1 shows that both these methods also score 2s and 3s, which suggests they are not performing that badly.
• Overall, the QRT-GLS and the redeveloped PRM Map are the outstanding performers, particularly the QRT-GLS method with a score of 1 for 63 cases out of a possible 80 cases.

Table 9.2 - Comparison of Techniques for individual test catchment

9.5.3 COMPARISON OF METHODS USING MEDIAN ERRORS AND BOXPLOTS

9.5.3.1 MEDIAN RELATIVE ERRORS & BOXPLOTS

Now, the uncertainty in the QRT (OLS and GLS) and PRM methods is examined by obtaining relative errors, defined in Equation 9.1. Relative error values are looked at in two different ways: (a) by ignoring the sign of the relative error, which allows the degree of error to be quantified without stating whether the method is underestimating or overestimating the observed FFA estimates; and (b) by considering the sign and magnitude of the relative error to measure the degree of bias (i.e. the possibility of under- or over-estimation) associated with the method.

The median relative errors associated with the different methods for the various ARIs are provided in Table 9.3, which shows that the 5-year ARI estimate for the PRM has the lowest median relative error, and that the median relative error then increases with increasing ARI. For the 2-year ARI, the median relative error is higher than those of the 5- and 10-year ARIs, which is somewhat unexpected for the PRM. When C10 is obtained from the regression equation the median relative error values are a little higher (except for the 10-year ARI) than those obtained based on the C10 contour map. Overall, when C10 is obtained from the contour map, the median error ranges from 17% to 38%, as opposed to 19% to 44% when C10 is obtained from the regression equation. Looking at the median relative errors associated with both quantile regression techniques, the OLS method has much larger relative errors than the PRM methods and the QRT-GLS method. The GLS method's relative errors are more in line with the PRM, even though the PRM shows slightly smaller values. However, for the larger ARIs the results for both the GLS and PRM methods suggest the uncertainty is quite similar, which is to be expected once we start extrapolating to the higher ARIs.

Table 9.3 Median relative errors (%) for quantile estimates for various methods

ARI (years)                               1.25    2     5    10    20    50   100   200
PRM, C10 obtained from contour map          -    28    17    26    30    35    38     -
PRM, C10 from regression (Equation 5)       -    31    19    23    36    41    44     -
QRT - OLS                                  42    44    43    44    53    61    50    53
QRT - GLS                                  35    42    33    36    44    50    44    49

Table 9.4 Some important statistics of relative error values

Relative error values      % of cases
greater than (%)           C10 from          C10 from                 QRT-OLS    QRT-GLS
                           contour map       regression equation
25                         65                62                       53         54
50                         35                28                       30         36
75                         14                12                       32         35
100                        6                 9                        16         9


Table 9.4 shows that there are 6%, 9%, 16% and 9% of cases respectively when the relative error values exceed 100%, and 35%, 28%, 30% and 35% of cases respectively for each method when the relative error values are smaller than 25%. As can also be seen in Table 9.4, the QRT-GLS method performs just as well as the PRM methods in the range of errors greater than 100%. It should not be forgotten that this comparison for the QRT-OLS and QRT-GLS methods has included the 1.25 and 200 year ARIs, which is why they show slightly higher values than the competing PRM methods.
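The statistics in Tables 9.3 and 9.4 are straightforward to compute once the relative errors of Equation 9.1 have been tabulated; a minimal sketch with hypothetical absolute relative error values for one method follows.

```python
import numpy as np

# Hypothetical absolute relative errors (%) for one method over the 20 test catchments
abs_rel_errors = np.array([12, 35, 48, 22, 8, 95, 60, 17, 130, 41,
                           28, 73, 19, 55, 33, 24, 88, 15, 46, 39])

median_error = np.median(abs_rel_errors)                        # Table 9.3 style entry
exceedance = {t: np.mean(abs_rel_errors > t) * 100 for t in (25, 50, 75, 100)}

print(f"Median relative error: {median_error:.0f}%")
for threshold, pct in exceedance.items():
    print(f"> {threshold}%: {pct:.0f}% of cases")                # Table 9.4 style entries
```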

Figure 9.9 shows the box plots of relative errors which consider the sign of the relative error, i.e. a negative value indicates that the PRM underestimates the observed flood quantiles and a positive value indicates an overestimation by the PRM. This figure reveals that for Q2, the overall error band is much higher when C10 is obtained from the contour map. For Q5 and Q10, the error bands are very similar. For the higher ARIs, the error band is smaller when C10 is obtained from the contour map. Generally, most of the relative error values are within the ±40% band, except for Q2. In terms of bias, except for Q2, the PRM underestimates for all the ARIs, which is more noticeable for the higher ARIs and much more obvious when the C10 values are obtained from the contour map. Overall, there are about 58% of cases (70 cases out of 120, representing 6 ARIs and 20 test catchments) where the PRM underestimates the observed floods.

[Figure: box plots of relative errors (%) for the PRM, Q2-PRM to Q100-PRM; upper panel: C10 from contour map; lower panel: C10 from regression]

Figure 9.9 Box plots of relative errors associated with the PRM. (Relative errors in the range of ±200% are displayed to highlight the errors that are most likely to occur.) Negative values of error indicate that the PRM underestimates the observed flood quantiles. The horizontal line in each box depicts the median, the lower and upper whiskers indicate the 5th and 95th percentiles respectively, and outliers are plotted as stars outside the whiskers.

Figure 9.10 shows the box plots of relative errors which consider the sign of the relative error, i.e. a negative value indicates that the QRT underestimates the observed flood quantiles and a positive value indicates an overestimation by the QRT. This figure reveals that for Q1.25 and Q2, the overall error band is much higher when the OLS method is used. The GLS procedure shows smaller error bands for the 2, 5, 10 and 20 year ARIs. For the higher ARIs both the QRT-GLS method and the PRM methods perform quite similarly. Overall, there are about 35% and 46% of cases (59 cases out of 160 and 73 out of 160, representing 8 ARIs and 20 test catchments) respectively for OLS and GLS where the QRT underestimates the observed floods.


[Figure: box plots of relative errors (%) in design flood estimates, Q1.25 to Q200; upper panel: QRT-OLS; lower panel: QRT-GLS]

Figure 9.10 Box plots of relative errors in design flood estimates from the OLS and GLS methods
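For reference, signed relative errors like those summarised in Figures 9.9 and 9.10 can be displayed as box plots with a few lines of matplotlib; the error values below are hypothetical (randomly generated), with a slight negative shift to mimic the tendency to underestimate at higher ARIs.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
aris = ["Q2", "Q5", "Q10", "Q20", "Q50", "Q100"]
# Hypothetical signed relative errors (%) for 20 test catchments per ARI
errors = [rng.normal(loc=-5 * i, scale=35, size=20) for i in range(len(aris))]

fig, ax = plt.subplots(figsize=(7, 4))
ax.boxplot(errors, whis=(5, 95))     # whiskers at the 5th and 95th percentiles
ax.set_xticklabels(aris)
ax.axhline(0, linewidth=0.8)         # zero line: no bias
ax.set_ylabel("Relative error (%)")
ax.set_ylim(-200, 200)
plt.tight_layout()
plt.show()
```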


9.5.4 COMPARISON OF METHODS USING AT-SITE FLOOD FREQUENCY ANALYSIS PLOTS

Both the OLS and GLS estimates were plotted and superimposed on the at-site flood frequency plots, firstly to get an idea of how they perform as compared to the at-site flood frequency analysis (FFA), and secondly to see whether the estimates fall within the confidence limits given by FLIKE for that catchment. Some of the results have kinks in the curve; however, this is not a concern, as it tends to happen with any quantile regression technique. Hydrological judgement would prevail in such circumstances and appropriate smoothing of the curve would be necessary. It is interesting to note that the GLS estimates (and most of the OLS estimates) for the smaller catchments fall within the 90% confidence limits provided by FLIKE, which is quite satisfying. Figure 9.11 shows typical results from the GLS and OLS methods superimposed on the FFA plot (the remaining plots can be seen in Appendix G). The results from these plots are found to be very positive, and generally the GLS method provided a better match than the OLS method. Considering all the stations, 11 stations (55% of the stations) show a very reasonable result where the QRT estimates fall within the confidence limits of the at-site FFA.
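A small sketch of this containment check is given below; the quantile estimates and FLIKE confidence limits are hypothetical placeholders.

```python
# Hypothetical 90% confidence limits from the at-site FFA (lower, upper), in ML/d,
# and QRT-GLS estimates for a few ARIs at one test catchment.
conf_limits = {10: (6500, 14500), 50: (11000, 32000), 100: (14000, 46000)}
gls_estimates = {10: 9800, 50: 21500, 100: 52000}

for ari, (lower, upper) in conf_limits.items():
    inside = lower <= gls_estimates[ari] <= upper
    status = "within" if inside else "outside"
    print(f"Q{ari}: GLS estimate {gls_estimates[ari]:,} ML/d is {status} the 90% limits")
```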

Also, the PRM estimates are plotted on the at-site FFA plots obtained from ARR-FLIKE to see whether the PRM estimates fall within the FFA confidence limits. Figure 9.12 shows one typical plot (the remaining plots can be seen in Appendix F). Considering all the stations, 17 stations (85% of the stations) show a very reasonable result where the PRM estimates fall within the confidence limits of the at-site FFA. Generally, better results are obtained when the C10 values are estimated from the regression equation as compared with the method based on the C10 contour map. There are only 2 test catchments where the PRM estimates fall outside the confidence limits of the at-site FFA, which is a surprisingly good result for the PRM.


[Figure: Station 221207, LP3 fitted with the Bayesian procedure and rating curve error analysis; gauged flow quantiles, 90% confidence limits, OLS and GLS regression estimates; discharge (ML/day) vs. annual exceedance probability (1 in Y)]

Figure 9.11 Typical comparison of OLS and GLS estimates for a test catchment (area < 300 km²)

[Figure: Station 227210; gauged flow quantiles (LP3), 90% confidence limits, PRM with C10 from contour map and PRM with C10 from regression; discharge (ML/day) vs. annual exceedance probability (1 in Y)]

Figure 9.12 Typical comparison of PRM estimates plotted on the FFA plot


9.5.5 STRENGTHS OF THE OLS AND GLS MODELS

The average variance of prediction, γ̂p², based on the test catchments for each ARI (where each ARI has a different number of catchments in the model), is then estimated by Equations 3.67 and 3.68.

Table 8.3 shows the average variance of prediction for the validation set in log (base 10) units. To get an idea of how the models are performing, the log units for the validation set were converted using Equation 3.69. It can clearly be seen in Table 8.3 that the GLS method shows, on average, smaller standard errors of prediction than the OLS method. Looking at the relative difference (compared to OLS) over all the ARIs, on average the GLS method has a 9% smaller standard error of prediction and a 30% smaller average variance of prediction for the validation set. Similar results were found by Stedinger and Tasker (1985), Tasker et al. (1986) and Haddad et al. (2006).
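Equation 3.69 is not reproduced here. However, the Sep values of about 20% and 22% reported for the estimation set are consistent with taking the square root of the corresponding average variances of prediction (0.041 and 0.049 log units) and expressing the result as a percentage; assuming that is the form of Equation 3.69, a minimal check is sketched below.

```python
import math

# Average variance of prediction (log10 units) reported for the estimation set
vp = {"GLS": 0.041, "OLS": 0.049}

# Assumed form of Equation 3.69: Sep(%) ~ 100 * sqrt(VP).
# This reproduces the reported ~20% (GLS) and ~22% (OLS).
for method, variance in vp.items():
    sep = 100 * math.sqrt(variance)
    print(f"{method}: VP = {variance:.3f} log units -> Sep ~ {sep:.0f}%")
```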

9.6 SUMMARY

In this chapter, independent testing of the QRT (both OLS and GLS) and the PRM has been carried out on the 20 test catchments. Based on the ranking method of Section 9.5.1, the GLS method performs best consistently over all ARIs, while the ranking scheme of Section 9.5.2, which looks at the same results in a different way, shows that the GLS method performs well in particular for the smaller sized catchments (<300 km²). For the larger catchments, the PRM Map method seems to perform better.

Based on the median relative error values, the Q2, Q5, Q10 and Q20 flood quantiles show the best results with the redeveloped PRM (from the contour map) (28%, 17% and 26% respectively), while for Q50 and Q100 the QRT (GLS) and PRM (from the contour map and regression equation) show similar results with median relative error values of 44%, 38% and 44% respectively. Both the PRM and the QRT show similar median relative error values for Q20. The box plots show similar findings. However, in terms of bias, the QRT seems to outperform the PRM for all the quantiles. Test catchments with areas smaller than 300 km² seemingly show the best results with the QRT-GLS method. However, the redeveloped PRM underestimates the flood quantiles for all the ARIs (58% underestimation as compared to 46% for the GLS method). Overall, it seems that the redeveloped PRM (either approach) is preferable for medium to larger catchments at all ARIs.


The QRT-GLS method is preferable for all the ARIs, in particular for the smaller catchments. The strengths of the OLS and GLS models were also assessed by looking at the average variance of prediction and the standard error of prediction; again, the GLS method yielded, on average, better results.

Finally, considering the rankings, median errors, bias, catchment size, spread of the errors, the various statistics associated with the OLS and GLS methods, and the theoretical aspects of the PRM and QRT (both OLS and GLS) (the PRM is based on a geographical interpolation method which is difficult to justify, and the OLS makes no allowance for sampling error and model error), the QRT-GLS method would be the best choice for the study area. However, for this method some smoothing of the final flood frequency curves may be required for some ungauged catchments.


CHAPTER 10: CONCLUSIONS

10.1 INTRODUCTION

Design flood estimation in small to medium sized catchments is frequently required in hydrologic analysis and design and is of notable economic significance. Australian Rainfall and Runoff (ARR) 1987 recommends the Probabilistic Rational Method for general use in South East Australia.

Other recent studies such as Rahman (2005a) and Haddad et al. (2006) have shown promising results in the generation of peak discharges via the quantile regression technique (QRT) for both the ordinary least squares (OLS) and generalised least squares (GLS) methods in the state of Victoria. The QRT (GLS) method is an extension of the OLS method that accounts for the sampling variability and the true model error structure in the regional model, thus providing more stable estimates of the hydrological model parameters.

10.2 OVERVIEW OF THIS STUDY

The investigations involved in developing the quantile regression technique and the Probabilistic Rational Method in the study area of Victoria are summarised in the following steps:

(i) Data selection and preparation: A total of 415 stations across the state of Victoria were initially selected for the study based on catchment size, streamflow record length (>10 years), general streamflow data quality, degree of regulation, urbanisation and land use change. Further examination indicated that many stations did not satisfy the criteria of homogeneity and representativeness for the purposes of regional flood frequency analysis. Furthermore, to reduce the potential effects of inter-decadal variability, the minimum length of records (after infilling of some missing records) was increased to 25 years. This was necessary due to the presence of a downward trend in the annual maximum flood series for many stations after the late 1980s. An outlier test was then conducted for each of the stations. Finally, the influence of errors on flood frequency curves from the extrapolation of rating curves was minimised by placing limits on the degree of extrapolation involved in estimating the largest observed flood events, using the in-built tool in the FLIKE software (which implements the principles outlined in Kuczera and Franks, 2005). At the end of this process, only 133 stations were left in the database. The study catchments are mainly rural with no known major land use changes over the periods of record. A total of 8 catchment characteristics that are perceived to mainly govern the flood generation process and are relatively easy to obtain were selected and extracted for each catchment (see Chapter 5 for details). Log transformation has been adopted for the RFFA in this thesis. However, other transformations could be adopted, but this is outside the scope of this thesis.

From the 133 catchments, 20 were selected at random and put aside for split-sample validation. The remaining 113 catchments were used to develop the prediction equations and the new Probabilistic Rational Method.

(ii) Searching for homogeneous regions: An empirical approach was adopted to try to identify homogeneous regions within Victoria. The approach involved delineating Victoria into possible regions, starting with a sufficient number of stations and then subtracting stations that were discordant with the rest of the group. The attempt proved to be largely unsuccessful.

(iii) Development of prediction equations (OLS): Flood prediction equations were then developed using multiple linear regression adopting the ordinary least squares method. The statistical package MINITAB was used to apply multiple linear regression using the log transformed variables. The success of the models was measured by the traditional R2, Sep and VIF. The assumptions employed in ordinary least squares regression, namely that the residuals are normally distributed with constant variance (i.e. homoskedastic), are often violated in hydrological regression. To supplement the traditional statistics in assessing the success of the developed model, two test statistics were adopted: (a) the Breusch-Pagan test (also known as the Cook-Weisberg test) for heteroskedasticity and (b) the Kolmogorov-Smirnov test for normality of the residuals.
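The MINITAB output is not reproduced in this thesis summary, but both diagnostic tests named above are available in standard statistical libraries; a minimal sketch with synthetic (hypothetical) data follows, using statsmodels' het_breuschpagan and scipy's kstest.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical log-transformed predictor (e.g. log area) and log flood quantile
x = rng.uniform(1.0, 3.5, size=113)
y = 0.8 + 0.7 * x + rng.normal(scale=0.2, size=113)

X = sm.add_constant(x)
fit = sm.OLS(y, X).fit()

# (a) Breusch-Pagan (Cook-Weisberg) test for heteroskedasticity
lm_stat, lm_pvalue, _, _ = het_breuschpagan(fit.resid, X)

# (b) Kolmogorov-Smirnov test for normality of the standardised residuals
std_resid = fit.resid / np.std(fit.resid, ddof=1)
ks_stat, ks_pvalue = stats.kstest(std_resid, "norm")

print(f"Breusch-Pagan p-value: {lm_pvalue:.3f}")
print(f"Kolmogorov-Smirnov p-value: {ks_pvalue:.3f}")
```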

(iv) Development of prediction equations (GLS): The goal of the generalised least squares (GLS) regression analysis in this study was to develop the best model for estimating flood quantiles at ungauged catchments as a function of catchment characteristics. The basin characteristics used in the OLS analysis were also used in the GLS analysis, as suggested by Tasker et al. (1987).

The GLS regression procedure was explored further by carrying out a 4-stage generalised least squares analysis where the development of the prediction equations was based on relating hydrological statistics such as mean flows, standard deviations, skewness and flow quantiles to catchment characteristics.

The success of the models was measured by a set of new statistics such as a pseudo R2, Sep, the average variance of prediction and the estimated equivalent years of record length. The results can be seen in Chapter 8. In summary, the GLS method outperforms the traditional method.

(v) Redevelopment of the Probabilistic Rational Method: The Probabilistic Rational Method was redeveloped using the steps described in ARR (I.E. Aust, 1987) to incorporate the latest streamflow data available as well as the latest flood frequency analysis techniques. From the data derived in Chapters 4 and 5, C10 values were generated for each of the 113 catchments; the GIS package MAPINFO was then used to create a series of isopleth contours. Flood frequency factors (the ratios of the various flood quantiles to the 10 year flood) (FFy) were also estimated for the study data set. An alternative procedure was also explored in which a prediction equation was developed between C10 and catchment and climatic characteristics as independent variables, using a multiple linear regression technique.

(vi) Testing and validation: In the testing, observed flood quantiles obtained by flood frequency analysis were compared against (a) the redeveloped PRM as described in Chapter 7 (i.e. the C10 map and the new FFy values), (b) the redeveloped PRM as described in Chapter 7 (i.e. C10 (regression) and the new FFy values) and (c) the newly developed prediction equations (for both the OLS and GLS methods).

10.3 CONCLUSIONS

This thesis examined the relative performances of the ordinary and generalised least squares methods and the Probabilistic Rational Method for regional flood frequency analysis in the state of Victoria in Australia. The following conclusions can be drawn:
• Eight regional flood prediction equations for recurrence intervals ranging from 1.25 to 200 years were developed for Victoria; they contain hydrologically meaningful climatic and catchment characteristics variables which can be obtained relatively easily from published climate data CDs and maps.


• The developed prediction equations obtained with GLS regression have satisfied the underlying assumptions of least squares based multiple linear regression very well, with evidence of residuals being approximately normally distributed, low values of standard error of prediction, low values of average variance of prediction and pseudo R2 values higher than the traditional OLS values. Thus, confidence can be placed in these equations.

• The weighted least squares analysis to estimate the skew value for each site was dealt with in a statistically meaningful manner that provided regional skew values with an average variance of prediction equivalent to a record length of 92 years, which is more reliable than using an at-site skew estimate from only 25 years to 50 years of streamflow data.

• Even though the cross correlations among concurrent annual maximum flows were quite small (average 0.30), there were still significant differences in the regression coefficient estimates obtained from the OLS and GLS methods. This can be attributed to the fact that the sampling error in this study had a notable effect on the analysis, and accounting for the correlated residuals provided parameter estimates that were less biased than those from the OLS method.

• The GLS method provided a better estimate of the model’s predictive ability, as evidenced by smaller values of average variance of prediction and percentage standard error of estimates for the GLS method than the OLS one. The average sampling error for the GLS method was also smaller than the OLS one.

• The median relative errors dropped significantly with the GLS method. Box plots show that the GLS procedure generally has smaller error bands, particularly for smaller to medium average recurrence intervals.

• The quantile regression technique, in particular the GLS-based method, provides reasonable quantile estimates for smaller sized catchments ranging up to 300 km² in Victoria.

• The updated PRM method is found to be capable of providing design flood estimates of reasonable accuracy for ungauged catchments in Victoria.


• The C10 values can be obtained from the new C10 contour map or alternatively from the developed prediction equation, which requires four readily obtainable climatic and catchment characteristics variables. Generally, this prediction equation provides better results than the method based on the C10 contour map.

• The median relative error values from the new PRM are in the range of 17% to 44%, with the lowest median error values for Q5, Q10 and Q20. However, there are about 7% of cases where the relative error values exceeded 100% and 13% of cases where they exceeded 75%.

• Except for Q2, the PRM tends to underestimate the observed floods; there were about 60% of cases for the 20 test catchments and 6 ARIs where the PRM underestimated the observed flood quantiles.

Overall, it seems that the QRT-GLS method should preferably be used on the smaller to medium sized catchments for all ARIs, while the PRM should preferably be used on the larger catchments for all ARIs.

Finally, considering the rankings, median errors, bias, catchment size, spread of the errors, various statistics associated with the OLS and GLS methods and theoretical aspects of the PRM and QRT (both OLS & GLS), the QRT - GLS would be the best choice for the study area.

10.4 SUGGESTIONS FOR FUTURE RESEARCH

• The GLS regression procedure showed better results on smaller sized catchments. These results were significant over all the ARIs, which suggests that the development of a GLS regression procedure for the set of smaller catchments would be more appropriate for practical application. This means that catchments greater than 300 km² should be excluded from the dataset.

• The input of skewness into the error covariance matrix is very important in minimising the sampling error even further and in reducing the residual correlation between the skewness and the fitted quantile. This study should be repeated using a generalised least squares skewness estimator that accounts for the varying sampling error, the cross correlation of concurrent flows at representative sites and the model error; a sensitivity analysis should be undertaken to compare the results.

• The generalised least squares procedure presented here appears to have great practical implications; its applicability to other states in Australia should therefore be investigated and assessed.

• This study adopted log transformation in developing the prediction equations, as is generally done in RFFA; however, other transformations should be tested to examine possible further improvement in the developed prediction equations.


REFERENCES Acreman, M.C. and Sinclair, C.D. (1986). Classification of drainage basins according to their physical characteristics: an application for flood frequency analysis in Scotland. J. Hydrology., 84:365-380.

Acreman, M.C. and Wiltshire, S.E. (1989). The regions are dead: long live the regions. Methods of identifying and dispensing with regions for flood frequency analysis. IAHS Publ., 187: 175-188.

Ahmad, M.I., Sinclair, C.D and Werrity, A. (1988). Log – logistic flood frequency analysis. J. Hydrology., 98:205-224.

Alila, Y.P., Adamowski, K. and Pilon, J. (1992). Regional Homogeneity testing of low- flows using L moments. In: Proceedings of 12 th conference on probability and statistics in the Atmospheric sciences, 5 th International Meeting on Statistical Climatology, Toronto, Ont., 22-26 June, 1992.

Bates, B.C. (1994). Regionalisation of Hydrological Data: A review. Report 94/5. CRC for Catchment Hydrology, Monash University, Australia, pp61.

Bates, B.C., Rahman, A. and Weinmann, P.E. (1997). Towards a New Regional Flood Frequency Analysis Procedure for South-east Australia. In Proc 24th Intl. Hydrology and Water Resour. Symp., 179-184.

Bates, B.C, Rahman, A, Mein, R.G and Weinmann, P.E. (1998). Climatic and physical factors that influence the homogeneity of regional floods in south-eastern Australia. Water Resour. Res., 34(12), 3369-3382.

Benson, M.A. (1968). Uniform flood frequency estimating methods for federal agencies, Water Resour. Res., 4(5): 981-908.

Benson, M.A. (1959). Channel slope factor in flood frequency analysis. J. of the hydraul. Div., ASCE, 85(HY4), 1-19.

Benson, M.A. (1962). Evolution of methods for evaluating the occurrence of floods. U.S. Geol Surv. Water Supply Paper, 1580-A, 30pp.

Caballero, W.L. (2007). Regional design flow estimation for the state of Victoria. Unpublished B.Eng(Honours) Thesis, School of Engineering, University of Western Sydney.

Chow, V.T. (1964). Handbook of Applied Hydrology. McGraw-Hill Book Company.

Chowdhury, J.U., Stedinger, J.R, and Lu, L.H. (1991). Goodness of fit tests for regional flood distributions. Water Resour. Res., 27(7): 1765-1776.

Cunnane, C. (1988). ‘Methods and merits of regional flood frequency analysis’. Journal of Hydrology, vol 100, pp 269-290


Cunnane, C. (1989). Statistical Distributions for Flood Frequency Analysis. World Meteorological Organisation, Operational Hydrology Report No. 33.

Cunnane, C. and Nash, J.E. (1974). Bayesian estimation of frequency of hydrological events. In: Mathematical Models in Hydrology, Vol. 1, Proceedings of the Warsaw symposium, IAHS Publ. No. 100: 47-55.

Dalrymple, Tate (1960). ‘Flood frequency analyses’. U.S. Geol. Surv. Water Supply, pap 1543-A,80 pp

Dawdy, D.R. (1961). Variation of flood ratios with size of drainage area. U. S. Geol. Surv. Prof. Pap. 424-C, Paper C36.

Dillon, R.W. and Goldstein, M. (1984). Multivariate Analysis. John Wiley and Sons, N. Y.

Draper, N.R., and H. Smith (1981): Applied regression analysis, 2 nd ed. John Wiley, New York.

Efron, B. (1977). “Bootstrap Methods: Another look at the Jacknife,” Technical report No. 32, Division of Biostatistics, Stanford University, Stanford, Calif.

Efron, B. (1978). "Computers and the Theory of Statistics: Thinking the Unthinkable." Technical report No. 39, Division of Biostatistics, Stanford University, Stanford, Calif.

Fill, D.H. and Stedinger J.R. (1995a). L moment and PPCC goodness-of-fit tests for the Gumbel distribution and effect of autocorrelation. Water Resour.Res., 31(1) 225-229.

Fill, D.H. and Stedinger J.R. (1995b). Homogeneity tests based upon Gumbel distribution and a critical appraisal of Darymple’s test. J. Hydrology., 166:81-105.

Filliben, J.J. (1975). The probability plot correlation coefficient test for normality, Technometrics, 17(1), 111-117.

Flavell, D.J. (1982). The rational method applied to small rural catchments in the south west of Western Australia. Hydrology and Water Resources Symposium, p49-53.

Fontaine, R.A. (1986). Comparison of two stream-discharge record reconstruction techniques for eight gauging stations in Maine. U.S. Geological Survey, Water Supply Paper 2290, 107-111.

Griffis, V.W and Stedinger J.R. (2007). The use of GLS regression in regional hydrologic analyses. J. Hydrology., 344:82-95.

Guttman, N.B. (1993). The use of L-moments in the determination of regional precipitation climates. J, Climate, 6:2309-2325.

Haddad, K., Rahman, A. and Weinmann, P.E. (2006). Design flood estimation in ungauged catchments by quantile regression technique: ordinary least squares and generalised least squares compared. In Proc. 30th Hydrology and Water Resources Symp., The Institution of Engineers Australia, 4-7 Dec 2006, Launceston, 6pp.

Haddad, K. and Rahman, A. (2007). Investigation on at-site flood frequency analysis in south-east Australia. Journal of the Institution of Engineers Malaysia.

Hardison, C.H. (1971). Prediction error of regression estimates of streamflow characteristics at ungauged sites. U.S. Geol. Pap., 750-C, C228-C236.

Hewa, G.A., McMahon, T.A., Peel, M.C. and Nathan, R.J. (2003). Identification of the most appropriate regression procedure to regionalise extreme low flows, 28 th Intl. Hydrology and Water Resour. Symp, 10-13 Nov., 2003.

Hosking, J.R.M. and Wallis, J.R. (1990). Regional flood frequency analysis using L-moments. IBM Math. Res. Rep. RC 15658, IBM T.J. Watson Research Centre, Yorktown Heights, N.Y., 12pp.

Hosking, J.R.M. and Wallis, J.R. (1991). Some statistics useful in regional flood frequency analysis. IBM Math. Res. Rep. RC 17096, IBM T.J. Watson Research Centre, Yorktown Heights, N.Y., 12pp.

Hosking, J.R.M and Wallis, J.R (1993). “Some statistics useful in regional flood frequency analysis.” Water Resour. Res., 29(2): 271-281.

Hosking, J.R.M. (1986). The theory of probability weighted moments. Res. Rep. RC 12210, IBM Res., Yorktown-Heights, N.Y.

Hosking, J.R.M. and Wallis, J.R. (1986). The value of historical data in flood frequency analysis. Water Resour. Res., 22(11): 1606-1612.

Houghton, J.C. (1978). Birth of a parent: The Wakeby distribution for modelling flows. Water Resour. Res, 14(6): 1105-1115.

Institution of Engineers Australia (I. E. Aust.) (1987, 2001). Australian Rainfall and Runoff: A Guide to Flood Estimation. Vol.1, I. E. Aust., Canberra.

Interagency Advisory Committee on Water Data. (1982). “Guidelines for Determining Flood Flow Frequency,” Bulletin #17 of the hydrology subcommittee, OWDC, US Geological Survey, Reston, VA.

Jackson, D.R. (1981). WRC standard flood frequency guidelines. J. Water Resources Planning Management. Div., ASCE., 107(WR1): 211-224.

Johnston, J. (1972). Econometric Methods, Mc-Graw Hill, New York.

Klemes, V. (1987 a,b). Hydrological and engineering relevance of flood frequency analysis. In V.P. Singh (ed.), Hydrologic frequency modelling. Kluwer Academic Publishers.


Kroll, C.N. and Stedinger, J.R (1999). Development of regional regression relationships with censored data. Water Resour. Res., (35)3, 775-784.

Kuczera, G. (1982a). Robust flood frequency models. Water Resour. Res., 18(2): 315-324.

Kuczera, G. (1982b). Combining site-specific and regional information: An empirical Bayes approach. Water Resour. Res., 18(2): 306-314.

Kuczera, G. (1983a). A Bayesian surrogate for regional skew in flood frequency analysis. Water Resour. Res., 19(3): 821-832.

Kuczera, G. (1983b). Effect of sampling uncertainty and spatial correlation on an empirical Bayes procedure for combining site and regional information. J. Hydrology., 65: 373-398.

Kuczera, G. (1999). FLIKE HELP, Chapter 2 FLIKE Notes, University of Newcastle.

Kuczera, G. and Franks, S. (2005). At-site flood frequency analysis. Australian Rainfall and Runoff, Book IV, Draft Chapter 2.

Lay, M. (1989). ‘Design flood estimation for ungauged rural catchments in Victoria’. Road Construction Authority, Tech Bulletin No. 38, pp 1-17.

Lettenmaier, D.P. and Potter, K.W. (1985). Testing flood frequency estimation methods using a regional flood generation model, Water Resour. Res., 21(12): 1903-1914.

Lu, LH and Stedinger, J.R (1992). ‘Sampling of variance of normalized GEV/PWM quantile estimators and a regional homogeneity test’. Journal of Hydrology, vol 138, pp 223-245.

Marin, C. (1983). Uncertainty in water resources planning, Ph.D. thesis, Harvard Univ., Cambridge Mass.

Matalas, N.C, and E.J., Gilroy (1968). Some comments on regionalisation in hydrological studies. Water Resour. Res., 4(6), 1361-1369.

Matalas, N.C. and Benson, M.A. (1961). Effect of interstation correlation on regression analysis. J. Geophys. Res., 66(10), 3285-3293.

Micevski, T., Kuczera, G. and Franks, S.W. (2006). A Bayesian Hierarchical Regional Flood Model. 30th Hydrology and Water Resources Symp., The Institution of Engineers Australia, 4-7 Dec 2006, Launceston, 6pp.

Moss, M.F. and M.R. Karlinger: (1974). Surface water network design by regression simulation. Water Resour Res., 10(3), 427-433.

National Research Council (1988). Committee on Techniques for Estimating Probabilities of Extreme Floods, "Estimating Probabilities of Extreme Floods, Methods and Recommended Research", National Academy Press, Washington, D.C.


Natural Environment Research Council (NERC) (1975). Flood Studies Report, NERC, London.

Norusis, M.J. (1993). SPSS for windows. SPSS Inc. Chicago, IL.

Pearson, C.P. (1991). New Zealand regional flood frequency analysis using L moments. J. Hydrology., New Zealand, Vol. 30(2): 53-64.

Pilgrim, DH and Cordery, I. (1993). ‘Flood Runoff.’ In: Maidment, D.R. (Ed.), Handbook of Hydrology, McGraw-Hill, New York, Chapter 9.

Pillon, P.J, and Adamowski, K. (1992). The value of regional information to flood frequency analysis using the method of L-moments. Can. J. Civ. Eng., 19(1): 137-147.

Potter, K.W. and Lettenmaier, D.P. (1990). ‘A comparison of regional flood frequency estimation mean using a resampling method’. Water Resour. Res, vol 26, iss 3 , pp 424.

Rahman, A. (2005). A Quantile Regression Technique to Estimate Design Floods for Ungauged Catchments in South-East Australia. Aust. Jour. of Water Resour. 9(1), 81-89.

Rahman, A, Mein, R.G, Bates, B.C and Weinmann (1999a). ‘An integrated Dataset of Climate, Geomorphological and Flood Characteristics for 104 Catchments in South-east Australia” Working Document 99/2. Cooperative Research Centre for Catchment Hydrology.

Rahman, A., Weinmann P.E. and Mein R.G. (1999b). ‘At-site flood frequency analysis: LP3- product moment, GEV-L moment and GEV-LH moment procedures compared In Proc 2nd Intl. Conference on Water Resour. and Env. Research,I.E Aust., 6-8 July, 1999; 2, pp715-720.

Rahman, A. (1997). Flood Estimation for ungauged catchments: A regional approach using flood and catchment characteristics, PhD thesis, Department of Civil Engineering, Monash University.

Rahman, A. and Hollerbach, D. (2003). Study of Runoff Coefficients Associated with the Probabilistic Rational Method for Flood Estimation in South-east Australia. In Proc. 28th Intl. Hydrology and Water Resources Symp., I.E. Aust., Wollongong, Australia, 10-13 Nov. 2003, Vol. 1, 199-203.

Rahman, A., Rima, K. and Weeks, W. (2008). Development of Regional Flood Estimation Methods Using Quantile Regression Technique: A Case Study for North-eastern Part of Queensland. 31st Hydrology and Water Resources Symp., Adelaide, 15-17 April 2008, 329-340.

Rao, A.R. and Hamed, K. (2000). Flood Frequency Analysis. CRC Press LLC, 2000 N.W. Corporate Blvd., Boca Raton, Florida.

Reis Jr., D.S., Stedinger, J.R. and Martins, E.S. (2003). Bayesian GLS regression with application to LP3 regional skew estimation. In: Bizier, P. and DeBarry, P. (Eds), Proceedings of the World Water and Environmental Resources Congress, Philadelphia, PA, June 23-26, American Society of Civil Engineers.

Reis Jr., D.S., Stedinger, J.R. and Martins, E.S. (2005). Bayesian GLS regression with application to LP3 regional skew estimation. Water Resour. Res., 41, W10419.

Riggs, H.C. (1973). Regional analyses of streamflow characteristics. Techniques of Water-Resources Investigations of the U.S. Geol. Surv., Book 4, Chapter B3, U.S. Geol. Surv., Washington, D.C.

Rijal, N. and Rahman, A. (2005). Design flood estimation in ungauged catchments: Quantile Regression Technique and Probabilistic Rational Method compared. In Proc. International Congress on Modelling and Simulation, 12-15 Dec 2005, Melbourne, ISBN 0-9758400-1-0, CD-ROM publication, 1887-1893.

Rossi, F., Fiorentino, M. and Versace, P. (1984). Two-component extreme value distribution for flood frequency analysis. Water Resour. Res., 20(7), 847-856.

Smith, J.A. (1987). Estimating the upper tail of flood frequency distributions. Water Resour. Res., 23(8), 1657-1666.

Stedinger, J.R. (1983). Estimating a regional flood frequency distribution. Water Resources Research, 19(2), 503-510.

Stedinger, J.R., Vogel, R.M. and Foufoula-Georgiou, E. (1993). Frequency analysis of extreme events. In: Handbook of Hydrology, edited by D.R. Maidment, McGraw-Hill, N.Y., Chapter 18.

Stedinger, J.R. and Tasker, G.D. (1985). Regional Hydrologic Analysis, 1. Ordinary, Weighted, and Generalised Least Squares Compared. Water Resources Research, 21(9), 1421-1432.

Stedinger, J.R. and Tasker, G.D. (1986). Regional Hydrologic Analysis, 2. Model Error Estimators, Estimation of Sigma and Log-Pearson Type 3 Distribution. Water Resources Research, 22(10), 1487-1499.

Tasker, G.D. and Stedinger, J.R. (1987). Regional regression of flood characteristics employing historical information. In: W.H. Kirby, S.Q. Hua and L.R. Beard (eds), Analysis of Extraordinary Flood Events. Journal of Hydrology, 96, 255-264.

Tasker, G.D. and Stedinger, J.R. (1989). An Operational GLS Model for Hydrologic Regression. Journal of Hydrology, 111, 361-375.

Tasker, G.D. and Stedinger, J.R. (1986). Estimating Generalised Skew with Weighted Least Squares Regression, Journal of Water Resources Planning and Management, Vol 112(2), pp 225-237.

Tasker, G.D., Eychaner, J.H. and Stedinger, J.R. (1986). Application of Generalised Least Squares in Regional Hydrologic Regression Analysis. U.S. Geological Survey Water Supply Paper 2310, 107-115.

Tasker, G.D., Lumb, A.M., Thomas, W.O. Jr. and Flynn, K.M. (1987). Computer Procedures for Hydrologic Regression and Network Analysis Using Generalised Least Squares. U.S. Geological Survey.

Tasker, G.D. and Moss, M.E. (1979). Analysis of Arizona flood data network for regional information. Water Resour. Res., 15(6), 1791-1796.

Tasker, G.D. (1980). Hydrologic Regression and Weighted Least Squares. Water Resour. Res., 16(6), 1107-1113.

Thomas, W.O. Jr. and Olsen, J. (1992). Regional analysis of minimum streamflow. In: Proceedings of the 12th Conference on Probability and Statistics in the Atmospheric Sciences, 5th International Meeting on Statistical Climatology, Toronto, Ont., 22-26 June 1992, 261-266.

Tung, Y.K. and Mays, L.W. (1981). Generalised Skew Coefficients for Flood Frequency Analysis. Water Resources Bulletin, 17(2), 262-269.

U.S. Department of the Interior, Geological Survey (1981). Guidelines for determining flood flow frequency. Bulletin 17 of the Hydrology Subcommittee, Office of Water Data Coordination, U.S. Geological Survey, Reston, Virginia.

Vogel, R.M. and Kroll, C.N. (1990). Generalised low-flow frequency relationships for ungauged sites in Massachusetts. Water Resour. Bull., 26(2), 241-253.

Vogel, R.M. and Kroll, C.N. (1989). Low-flow frequency analysis using probability-plot correlation coefficients. J. Water Resour. Plng. and Mgmt., ASCE, 115(3), 338-357.

Vogel, R.M., McMahon, T.A. and Chiew, F. (1993). Floodflow frequency model selection in Australia. Journal of Hydrology, 146, 421-449.

Wallis, J.R. and Wood, E.F. (1985). Relative accuracy of Log Pearson III procedures. Journal of Hydraulic Engineering, ASCE, 111, 1043-1057.

Wang, Q.J. (1997). LH moments for statistical analysis of extreme events. Water Resour. Res., 33(12), 2841-2848.

Wiltshire, S.E. (1986a). ‘Identification of homogeneous regions for flood frequency analysis’. Journal of Hydrology, vol 84, pp 287-302.

Wiltshire, S.E. (1986b). Regional flood frequency analysis II: Multivariate classification of drainage basins in Britain. Hydrol. Sci.J., vol 31(3): pp335-346.

Wiltshire, S.E. (1986c). Regional flood frequency analysis II: Multivariate classification of drainage basins in Britain. Hydrol. Sci.J., 31(3): 335-346

Yevjevich, V. (1986). Specificities of karst water resources. In: Karst Water Resources, Proc. Ankara-Antalya Symp., July 1985 (ed. Gunay, G. and Johnson, A.I.), IAHS Publ. No. 161, 3-36.

Yevjevich, V. and Jeng, R.I. (1969). Properties of Non-Homogeneous Hydrological Time Series. Hydrology Paper 32, Colorado State University Press, Fort Collins.

Zrinji, Z. and Burn, D.H. (1994). Flood frequency analysis for ungauged sites using a region of influence approach. Journal of Hydrology, 153, 1-21.

APPENDIX A List of Study Catchments and Summary of Streamflow Data

Khaled Haddad 98072705 University of Western Sydney A B C D E F G Lat Long Record length Station ID Code Station Name Map ID (degree) (degree) (Y) 221207 T1 Errinundra River @ Errinundra 8623 -37.450 148.913 35 221209 1 (East Branch) @ Weeragua 8723 -37.365 149.202 32 221210 T2 @ The Gorge -37.427 149.525 33 221211 2 Combienbar River @ Combienbar 8623 -37.442 148.982 31 221212 T3 @ Princes Highway 8622 -37.608 148.900 31 222202 3 @ Sandine Creek 8622 -37.513 148.548 41 222206 4 @ Buchan 8522 -37.500 148.175 31 222210 5 @ Deddick (Caseys) 8523 -37.092 148.425 35 222213 6 Suggan Buggan River @ Suggan Buggan 8524 -36.953 148.325 35 222217 7 @ Jacksons Crossing 8523 -37.412 148.360 30 223202 T4 @ Swifts Creek 8423, 8323, 8324, 8424 -37.263 147.722 32 223204 8 Nicholson River @ Deptford 8422, 8423, 8323, 8322 -37.595 147.697 34 224213 9 @ Lower Dargo Road 8323 -37.497 147.268 33 224214 10 Wentworth Riverr @ Tabberabbera 8323 -37.497 147.393 32 225213 11 @ Beardmore 8122 -38.757 146.423 33 225218 12 Freestone Creek @ Briagalong 8322 -37.813 147.093 34 225219 13 @ Glencairn 8222 -37.518 146.565 39 225223 14 Valencia Creek @ Gillio Road 8222 -37.728 146.980 35 225224 T5 Avon River @ The Channel 8222 -37.803 146.883 34 226204 15 @ Willow Grove 8121 -38.092 146.158 35 226205 16 Latrobe River @ Noojee 8122 -37.907 146.022 46 226209 T6 @ Darnum 8121 -38.207 146.000 34 226217 17 Latrobe River @ Hawthorn Bridge 8122 -37.975 146.083 34 226218 18 Narracan Creek @ Thorpdale 8121 -38.273 146.185 35 226222 19 Latrobe River @ near Nojee (US ADA R. Junction) 8022 -37.883 145.890 31 226226 20 @ Tanjil Junction 8121 -38.010 146.195 46 226402 21 Moe Drain @ Trafalgar East 8121 -38.178 146.212 31 227200 T7 @ Yarram 8220, 8221 -38.458 146.693 27 227205 22 Merriman Creek @ Calignee South 8221 -38.355 146.653 31 227210 T8 Bruthen Creek @ Carrajung Lower 8221 -38.397 146.742 33 227211 23 @ Toora 8120 -38.643 146.372 32 227213 24 Jack River @ Jack River 8220, 8120 -38.530 146.530 36 227219 T9 @ Loch 8021 -38.375 145.558 33 227225 25 Tarra River @ Fischers 8221 -38.472 146.555 33 227226 26 TarwinRiver East Branch @ Dumbalk North 8120, 8121 -38.500 146.158 36 227231 27 Bass River @ Glen Forbes South 8021 -38.468 145.513 31 227236 28 @ D/S Foster Creek Junction 8020, 8021 -38.562 145.707 27 228212 29 @ Tonimbuk 8021 -38.025 145.758 30 228217 30 Toomuc Creek @ Pakenham 7921 -38.068 145.462 28 Lat Long Record length Station ID Code Station Name Map ID (degree) (degree) (Y) 229218 T10 Watsons Creek @ Watsons Creek 7922 -37.668 145.258 25 230202 31 Jackson Creek @ Sunbury 7822 -37.582 144.743 31 230204 T11 Riddells Creek @ Riddells Creek 7823 -37.467 144.667 31 230205 32 Deep Creek @ Bulla (D/S of Emu Creek Junction)7822 -37.633 144.798 31 230211 33 Emu Creek @ Clarkefield 7823 -37.468 144.745 31 230213 T12 Turritable Creek @ Mount Macedon 7823 -37.422 144.583 31 231200 34 @ Bacchus Marsh 7722 -37.682 144.427 28 231213 35 @ Sardine Creek - O'Brien Crossing7723, 7722 -37.500 144.362 46 231225 36 W erribee River @ Ballan (U/S Old Western Highway7722 -37.600 144.250 32 231231 T13 Toolern Creek @ Melton South 7822 -37.912 144.577 27 232200 37 Little River @ Little River 7722 -37.958 144.483 32 232210 38 West Branch @ Lal Lal 7722 -37.645 144.040 33 232213 39 Lal Lal Creek @ U/S of Bungal Dam 7722 -37.657 144.033 28 233211 40 Birregurra Creek @ Ricketts Marsh 7621 -38.300 143.842 29 233214 41 East Branc @ Forrest (1030 hour readin7620gl) -38.533 143.730 28 234200 42 @ Pitfield 7622, 7522 -37.810 143.585 33 235202 43 @ Upper 
Gellibrand 7620 -37.560 143.642 31 235203 44 @ Curdie 7421, 7420, 7521, 7520 -38.447 142.960 30 235204 45 Little Aire Creek @ Beech Forest 7620 -38.657 143.528 30 235205 T14 Arkins Creek West Branch @ Wyelangta 7520 -38.645 143.443 28 235227 46 Gellibrand River @ Bunkers Hill 7520 -38.525 143.480 32 235233 47 East Branc @ Apollo Bay- Paradise 7620 -38.757 143.623 29 235234 48 Love Creek @ Gellibrand 7621 -38.485 143.570 26 236205 49 @ Woodford -38.320 147.482 32 236212 50 Brucknell Creek @ Cudgee 7421, 7521 -38.347 147.650 29 237207 51 Surry River @ Heathmere 7121, 7221 -38.250 141.662 30 238207 52 @ Jimmy Creek 7423, 7323 -37.367 142.500 31 238219 53 Grange Burn River @ Morgiana 7222 -37.710 141.830 33 401208 54 Cudgewa Creek @ Berringama 8425 -36.213 147.675 41 401209 55 Livingstone Creek @ Omeo 8423 -37.112 147.572 26

A B C D E F G 401210 T15 Snowy Creek @ below Granite Flat 8324, 8424 -36.570 147.412 38 401212 56 Nariel Creek @ Upper Nariel 8424, 8425 -36.447 147.828 52 401215 57 Morass Creek @ Uplands 8423, 8424 -36.868 147.702 35 401216 58 Big River @ Jokers Creek 8324, 8424 -36.945 141.473 52 401217 59 @ Gibbo Park 8424 -36.753 147.713 35 401220 60 Tallangatta Creek @ McCallums 8325 -36.208 147.498 29 402203 61 @ Mongans Bridge 8324 -36.597 147.098 36 402204 62 Yackandandah Creek @ Orbornes Flat 8225 -36.307 146.903 38 402206 63 Running Creek @ Running Creek 8324 -36.540 147.045 31 Lat Long Record length Station ID Code Station Name Map ID (degree) (degree) (Y) 402217 T16 Flaggy Creek @ Myrtleford Road Bridge 8225 -36.388 146.875 35 403205 64 Ovens Rivers @ Bright 8224 -36.727 146.950 35 403209 65 Reedy Creek @ Wangaratta North 8125 -36.333 146.343 32 403213 66 Fifteen Mile Creek @ Greta South 8124 -36.622 146.243 32 403221 67 Reedy Creek @ Woolshed 8225 -36.313 146.600 30 403222 68 @ Abbeyard 8223, 8224 -36.912 146.700 33 403224 69 Hurdle Creek @ Bobinawarrah 8124 -36.515 146.447 30 403226 70 Boggy Creek @ Angleside 8124 -36.607 146.360 30 403227 71 @ Cheshunt 8124 -36.833 146.397 33 403233 72 Buckland River @ Harris Lane 8224 -36.722 146.880 34 404206 73 @ Moorngag 8124 -36.800 146.017 33 404207 74 Holland Creek @ Kelfeera 8124 -36.613 146.057 31 405205 75 Murrindindi River @ Murrindindi above Colwells 8023, 8022 -37.412 145.558 31 405209 76 @ Taggerty 8023 -37.318 145.712 32 405212 77 Sunday Creek @ Tallarook 7823, 7923 -37.097 145.053 30 100 405214 78 @ Tonga Bridge 8123 -37.152 146.125 49 101 405215 79 @ Glen Esk 8123 -37.232 146.207 31 102 405217 80 @ Devlins Bridge 7923 -37.383 145.475 30 103 405218 81 @ Gerrang Bridge 8123 -37.292 146.187 46 104 405219 82 @ Dohertys 8122, 8123 -37.333 146.130 38 105 405226 83 Pranjip Creek @ Moorilim 7924, 8024 -36.623 145.305 32 106 405227 84 Big River @ Jamieson 8123 -37.367 146.055 35 107 405229 T17 Wanalta Creek @ Wanalta 7824 -36.637 144.868 34 108 405230 85 Cornella Creek @ Colbinabbin 7824 -36.605 144.803 30 109 405231 86 King Parrot Creek @ Flowerdale 7922, 7923 -37.348 145.288 32 110 405237 87 @ D/S Euroa Township 8024 -36.763 145.583 32 111 405240 88 Sugarloaf Creek @ Ash Bridge 7923 -37.062 145.052 33 112 405241 89 @ Rubicon 8023 -37.292 145.825 33 113 405245 90 Ford Creek @ Mansfield 8123 -37.040 146.050 35 114 405248 91 Major Creek @ Graytown 7824 -36.855 144.913 34 115 405251 92 Brankeet Creek @ Ancona 8024 -36.970 145.783 33 116 405263 93 Goulburn River @ U/S of Snake Creek Junction 8123 -37.463 146.247 31 117 405264 94 Big River @ D/S of Frenchman Creek Junction 8122 -37.523 146.077 30 118 405274 95 Home Creek @ Yarck 8023 -37.110 145.600 28 119 406200 T18 @ Malmsbury 7723 -37.192 144.383 33 120 406213 T19 @ Redesdale 7723, 7823 -37.017 144.542 30 121 406214 96 Axe Creek @ Longlea 7724 -36.775 144.427 33 122 406215 97 Coliban River @ Lyal 7724 -36.962 144.492 32 123 406216 98 Axe Creek @ Sedgewick 7724 -36.898 144.357 26 124 Lat Long Record length Station ID Code Station Name Map ID 125 (degree) (degree) (Y) 126 406224 99 Mount Pleasant Creek @ Runnymede 7824 -36.547 144.637 26 127 406226 100 Mount Ida Creek @ Derrinal 7824 -36.882 144.650 26 128 407214 101 Creswick Creek @ Clunes 7623, 7622, 7722, 7723 -37.298 143.788 30 129 407217 102 @ Vaughan @ D/S Fryers Creek 7723 -37.160 144.207 37 130 407220 103 Bet Bet Creek @ Norwood 7523, 7623, 7624 -36.997 143.642 33 131 407221 104 Jim Crow Creek @ Yandoit 7723 -37.208 144.100 32 132 407222 105 
Tullaroop Creek @ Clunes 7623 -37.230 143.833 33 133 407230 106 Joyces Creek @ Strathlea 7623, 7723 -37.167 143.958 31 134 407246 107 Bullock Creek @ Marong 7724 -36.730 144.133 31 135 407253 108 Piccaninny Creek @ Minto 7724, 7725, 7825 -36.453 144.467 31 136 415207 109 @ Eversley 7523 -37.185 143.192 31 137 415217 110 Fyans Creek @ Grampians Road Bridge 7423 -37.258 142.533 33 138 415220 111 Avon River @ Wimmera Highway 7424 -36.643 142.978 27 139 415226 112 Richardson River @ Carrs Plains 7424 -36.745 142.785 26 140 415237 113 Concongella Creek @ Stawell 7423 -37.027 142.820 27 141 415238 T20 Wattle Creek @ Navarre 7524 -36.903 143.097 27

APPENDIX B Sample Plots (CUSUM and Time Series) associated with identifying trends in annual maximum data.

[Figure: plot of the CUSUM statistic Vk against year for Station 229215]

Figure B-1 CUSUM test plot showing significant trends after 1995

[Figure: time series of annual maximum flow IMM (ML/d) against year for Station 229215]

Figure B-2 Time series graph showing trends
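
The trend screening illustrated in Figures B-1 and B-2 can be reproduced, in spirit, with a distribution-free CUSUM of the annual maximum series. The sketch below is a minimal illustration only; it assumes a sign-based CUSUM about the series median and uses a purely synthetic record, and is not the exact test implementation applied to the study catchments.

```python
import numpy as np

def cusum_vk(annual_max):
    """Distribution-free CUSUM: cumulative sum of the signs of departures
    from the series median (illustrative helper, not the thesis code)."""
    x = np.asarray(annual_max, dtype=float)
    return np.cumsum(np.sign(x - np.median(x)))  # V_k for k = 1..n

# Hypothetical annual maxima (ML/d), 1971-2005, with larger floods late in the record
rng = np.random.default_rng(0)
years = np.arange(1971, 2006)
flows = np.where(years <= 1995,
                 rng.gamma(2.0, 1500.0, years.size),
                 rng.gamma(2.0, 3000.0, years.size))
vk = cusum_vk(flows)
print(int(np.abs(vk).max()), "= max |V_k|; a sustained drift, as in Figure B-1, suggests a trend")
```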

APPENDIX C Proposed Homogeneous Regions

Table C-1 Stations in Each Hypothesised Homogeneous Region

NGDR = North of the Great Dividing Range, SGDR = South of the Great Dividing Range

APPENDIX D Sources of Data for the Thesis

Sources of data obtained for this study:

A. Streamflow data and flood quantiles: Streamflow data were obtained from the Department of Sustainability and Environment (Victoria) and Thiess Services Victoria. These data were checked and prepared, and flood frequency analysis was undertaken as part of this thesis.

B. Catchment and climatic data:

QSA data were obtained as a part of this Thesis.

Other catchment and climatic characteristics data were obtained from the following sources:

Caballero, W. (2007). Regional design flow estimation for the state of Victoria. Unpublished B.Eng(Honours) Thesis, School of Engineering, University of Western Sydney.

Rahman, A. (1997). Flood Estimation for ungauged catchments: A regional approach using flood and catchment characteristics, Unpublished PhD thesis, Department of Civil Engineering, Monash University.

Table D-1 Catchment Characteristics

Table D-2 Log Transformed Catchment Characteristics

APPENDIX E Various Results

Table E-1 Flood Frequency Quantiles for Estimation Set

1 Observed Flood Quantiles - FFA (ML/d) Code Station No. 2 QLP3 1.25 QLP3 2 QLP3 5 QLP3 10 QLP3 20 QLP3 50 QLP3 100 QLP3 200 3 1 221209 1088.43 3969.60 9713.94 13517.79 16710.09 20040.49 21981.70 23516.27 4 2 221211 1355.84 3656.46 8535.13 12589.33 16893.83 22874.35 27556.04 32327.49 5 3 222202 2346.28 8010.15 22208.87 35020.04 49113.99 69117.88 84919.90 101037.06 6 4 222206 1622.37 5149.65 14511.96 23827.97 35073.70 52897.76 68606.08 86210.62 7 5 222210 1046.65 4273.22 12558.05 19593.02 26753.87 35937.85 42458.68 48512.18 8 6 222213 644.83 1606.14 4016.80 6496.37 9670.68 15146.75 20439.04 26898.44 9 7 222217 818.71 3358.00 13037.02 25928.37 45247.40 83685.67 125230.72 180240.19 10 8 223204 656.34 3231.15 10836.15 17768.09 25057.58 34611.46 41477.43 47882.95 11 9 224213 2503.45 4485.53 8470.16 12064.74 16341.75 23281.09 29694.60 37295.55 12 10 224214 663.38 2361.93 7943.81 14646.03 23999.92 41338.27 58981.36 81256.02 13 11 225213 1079.09 3134.01 9893.27 18665.56 32097.97 60249.45 92750.60 138789.97 14 12 225218 1401.77 5344.50 17549.56 30857.67 47792.01 75879.02 101500.47 130907.57 15 13 225219 4298.69 8117.04 14478.98 19168.15 23899.57 30277.88 35210.48 40237.49 16 14 225223 474.50 2265.23 7154.89 11274.69 15343.73 20327.95 23687.28 26661.44 17 15 226204 1739.31 3311.06 6508.42 9387.50 12790.49 18252.98 23239.68 29080.10 18 16 226205 1001.58 1489.30 2387.62 3153.57 4035.76 5427.57 6687.32 8159.60 19 17 226217 1817.92 2737.74 4362.43 5696.86 7190.28 9472.96 11477.83 13761.97 20 18 226218 165.69 295.89 508.72 665.38 824.32 1040.67 1209.95 1384.42 21 19 226222 1830.21 2372.73 3161.38 3714.47 4268.72 5024.68 5623.44 6250.91 22 20 226226 2094.03 3600.07 6240.17 8346.23 10629.86 13981.41 16802.56 19895.93 23 21 226402 2590.75 4318.58 6292.60 7304.08 8081.24 8863.67 9321.03 9691.29 24 22 227205 324.39 882.82 2426.78 4133.60 6430.47 10597.94 14806.53 20126.82 25 23 227211 976.84 2149.94 4508.06 6514.39 8744.50 12055.46 14844.22 17884.15 26 24 227213 494.03 1322.74 3683.44 6392.10 10161.32 17276.96 24744.96 34508.59 27 25 227225 175.07 570.23 1938.16 3737.71 6487.12 12184.74 18657.60 27668.60 28 26 227226 1365.16 2936.87 5760.64 7909.01 10096.61 13047.81 15318.01 17613.01 29 27 227231 2063.70 3476.93 5078.05 5886.02 6498.14 7104.18 7452.29 7729.87 30 28 227236 2392.87 3757.56 5414.42 6348.77 7130.64 8000.42 8563.84 9061.96 31 29 228212 395.94 667.64 1146.94 1533.35 1956.52 2585.09 3120.61 3714.06 32 30 228217 410.96 927.79 1730.22 2236.83 2677.07 3173.20 3492.75 3770.89 33 31 230202 749.52 3434.87 11991.03 20833.87 31302.33 47067.03 60062.07 73682.69 34 32 230205 1621.51 6674.26 19614.22 30521.71 41541.94 55549.32 65404.69 74482.70 35 33 230211 202.62 1345.84 4904.97 7853.58 10585.21 13624.46 15462.30 16938.44 36 34 231200 428.81 1972.27 9142.32 20444.92 39804.57 84401.76 139448.09 220950.42 37 35 231213 1192.92 3542.09 7933.94 10940.14 13618.57 16654.64 18594.80 20259.39 38 36 231225 680.87 1922.43 4263.07 5924.13 7461.05 9285.56 10508.71 11602.42 39 37 232200 74.43 1029.58 6386.90 12574.22 19454.35 28332.53 34338.06 39535.85 40 38 232210 119.06 303.77 584.85 747.20 875.79 1005.14 1079.08 1137.11 41 39 232213 352.52 925.52 1847.28 2408.54 2870.95 3355.96 3644.64 3878.88 42 40 233211 98.59 406.60 965.94 1271.56 1485.48 1666.27 1751.10 1806.99 43 41 233214 251.91 877.58 2794.44 4943.86 7780.20 12718.58 17457.68 23151.63 44 42 234200 558.62 1718.97 4545.78 7135.76 10064.76 14388.14 17952.53 21732.18 45 43 235202 1212.59 2703.59 5892.50 8775.82 12138.24 17400.65 22059.23 27353.13 46 44 235203 2992.19 
7108.30 14387.55 19601.95 24589.49 30818.60 35244.60 39419.88 47 45 235204 440.62 723.58 1226.77 1637.67 2093.08 2779.32 3372.51 4038.26 48 46 235227 2378.13 4412.09 8426.83 11958.43 16065.49 22549.83 28382.76 35133.88 49 47 235233 609.06 1606.54 3955.62 6168.04 8781.70 12881.99 16492.08 20556.35 50 48 235234 387.42 756.34 1446.04 2012.54 2633.16 3547.35 4315.64 5154.28 51 49 236205 887.15 4485.61 12626.91 17877.92 21974.12 25820.04 27809.90 29225.15 52 50 236212 845.59 1834.38 3116.64 3781.64 4276.10 4745.48 5000.50 5193.14 53 51 237207 533.65 1197.23 2364.08 3216.06 4050.46 5125.09 5913.98 6680.00 54 52 238207 323.87 528.18 782.12 926.75 1048.09 1183.05 1270.30 1347.21 55 53 238219 566.32 2306.65 6543.01 9920.57 13180.67 17121.51 19763.25 22101.47 56 54 401208 1972.92 4456.96 8023.37 10073.43 11722.49 13427.05 14434.40 15250.22 57 55 401209 442.17 1393.82 2764.07 3413.26 3833.90 4166.38 4313.54 4406.41 58 56 401212 1169.88 2239.75 3990.19 5249.62 6494.19 8132.31 9369.63 10605.64 59 57 401215 569.70 1580.37 3292.93 4371.52 5277.56 6244.46 6828.43 7307.32

60 Observed Flood Quantiles - FFA (ML/d) Code Station No. 61 QLP3 1.25 QLP3 2 QLP3 5 QLP3 10 QLP3 20 QLP3 50 QLP3 100 QLP3 200 62 58 401216 4327.29 6877.47 11216.01 14636.12 18334.96 23770.76 28364.97 33428.74 63 59 401217 1238.83 2741.29 4966.10 6307.76 7434.09 8657.37 9417.72 10060.50 64 60 401220 1927.19 3738.71 6952.48 9458.24 12093.62 15805.68 18795.73 21946.14 65 61 402203 5793.65 9705.28 16410.59 21676.70 27331.07 35552.82 42419.31 49903.63 66 62 402204 2456.86 5557.71 9738.57 11961.32 13636.92 15246.94 16130.69 16803.35 67 63 402206 523.67 1071.82 1961.21 2578.84 3166.99 3906.46 4439.15 4949.71 68 64 403205 2134.13 4139.23 8254.90 11976.08 16380.87 23454.92 29910.94 37466.74 69 65 403209 8452.41 21151.94 47785.47 70358.43 94966.18 130381.85 159164.86 189501.91 70 66 403213 1640.77 3613.01 7537.38 10839.89 14477.99 19825.48 24286.55 29110.28 71 67 403221 1888.61 4725.64 9699.68 13141.54 16317.23 20111.87 22687.53 25022.71 72 68 403222 2351.17 4089.49 7532.13 10611.15 14258.39 20155.70 25594.13 32031.75 73 69 403224 1031.82 2363.44 4891.85 6884.17 8954.36 11797.40 14015.61 16280.72 74 70 403226 923.78 1931.43 3652.12 4905.00 6140.95 7754.20 8957.98 10145.36 75 71 403227 3440.97 7430.34 16744.59 26053.65 37878.54 58299.62 78181.24 102698.32 76 72 403233 2376.08 5101.44 10522.85 15126.28 20250.77 27885.73 34345.96 41420.43 77 73 404206 644.49 1974.87 7601.75 16924.97 34508.75 81451.56 149408.30 266722.94 78 74 404207 1889.16 5642.45 15100.86 24219.62 35026.50 51881.59 66552.07 82850.09 79 75 405205 514.49 718.81 1084.29 1388.47 1732.98 2267.43 2744.12 3294.86 80 76 405209 4143.81 6133.18 8958.18 10863.46 12705.01 15110.46 16933.88 18773.08 81 77 405212 1770.67 5955.41 13138.10 17258.53 20362.44 23251.23 24758.49 25847.50 82 78 405214 3252.16 6689.39 12802.65 17483.67 22302.28 28903.26 34067.09 39367.92 83 79 405215 2777.29 4477.87 6960.48 8642.06 10259.05 12348.88 13911.87 15467.67 84 80 405217 2174.81 4615.69 8839.11 11940.00 15013.50 19041.78 22057.45 25038.82 85 81 405218 3721.96 5873.84 9065.99 11276.67 13443.88 16308.09 18498.29 20720.11 86 82 405219 5055.28 8432.48 12717.21 15189.78 17279.11 19617.45 21136.53 22480.33 87 83 405226 569.88 2435.47 7140.75 10958.65 14672.01 19182.08 22212.50 24896.50 88 84 405227 4208.98 6887.03 10933.11 13757.37 16532.45 20200.81 23002.27 25838.41 89 85 405230 559.97 1780.98 4484.48 6669.63 8883.76 11765.59 13866.58 15874.51 90 86 405231 798.45 2062.64 4554.32 6497.65 8467.38 11074.47 13026.11 14944.50 91 87 405237 2001.10 4729.54 10001.71 14182.45 18530.95 24495.46 29136.64 33861.07 92 88 405240 2592.21 10088.73 25968.17 36905.34 46298.22 56301.80 62235.54 66986.98 93 89 405241 1805.08 2612.45 3852.27 4755.34 5681.32 6971.25 8010.86 9114.35 94 90 405245 1069.30 3088.48 7276.41 10563.59 13859.96 18133.39 21251.67 24242.32 95 91 405248 525.30 2732.29 8875.18 13935.75 18775.85 24458.18 28118.42 31233.14 96 92 405251 701.62 1989.79 4633.44 6701.23 8774.23 11464.78 13432.00 15322.88 97 93 405263 2010.24 3272.13 5199.12 6560.94 7912.72 9720.23 11116.11 12542.69 98 94 405264 1774.05 2920.34 4629.93 5805.75 6947.28 8436.07 9558.24 10681.88 99 95 405274 4084.12 6429.97 9944.40 12403.29 14833.17 18072.78 20570.88 23122.78 100 96 406214 531.65 2643.23 7608.41 11005.62 13798.91 16581.74 18110.11 19252.08 101 97 406215 1616.78 6924.00 18127.01 25404.10 31263.70 37031.04 40180.69 42530.98 102 98 406216 77.83 471.20 1452.13 2098.50 2598.06 3056.51 3286.72 3445.93 103 99 406224 412.87 1558.58 5100.47 8973.91 13922.13 22172.36 29739.81 38468.47 104 100 406226 
643.85 2183.12 5105.58 7002.01 8584.65 10234.78 11199.31 11964.97 105 101 407214 809.15 2573.10 5328.23 6774.33 7794.33 8680.16 9111.99 9407.22 106 102 407217 869.60 3339.29 9027.21 13401.84 17534.46 22434.17 25666.14 28493.22 107 103 407220 771.66 4051.12 12239.03 18100.12 23066.16 28160.07 31032.98 33224.20 108 104 407221 1159.02 3053.66 6406.41 8692.42 10760.04 13166.35 14753.87 16157.32 109 105 407222 990.52 4030.27 9771.60 13101.51 15557.74 17759.70 18857.04 19617.18 110 106 407230 737.74 2114.18 4873.19 6962.28 8997.78 11554.70 13364.85 15058.00 111 107 407246 410.37 1556.07 3531.00 4591.19 5332.05 5960.74 6257.93 6455.29 112 108 407253 926.64 2476.82 6567.46 10899.08 16532.98 26378.70 35980.80 47768.20 113 109 415207 478.52 1807.17 4845.34 7182.43 9396.16 12032.14 13779.63 15315.27 114 110 415217 334.02 699.23 1384.62 1936.71 2527.38 3370.83 4056.84 4784.02 115 111 415220 436.04 2134.45 6269.25 9251.39 11821.68 14522.50 16087.16 17309.11 116 112 415226 407.11 1300.78 3332.79 5023.11 6776.49 9118.45 10868.80 12576.48 117 113 415237 507.55 1935.79 5678.46 9047.97 12688.99 17709.34 21545.24 25338.03
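
The at-site quantiles QLP3 in Tables E-1 and E-2 come from Log Pearson Type 3 flood frequency analysis of the annual maximum series. As a minimal sketch of how such a quantile can be obtained, the example below uses simple method-of-moments fitting of the base-10 logs and a purely hypothetical flow record; it is not the Bayesian FLIKE-based fitting used for the study data.

```python
import numpy as np
from scipy.stats import pearson3, skew

def lp3_quantile(annual_max, ari_years):
    """LP3 flood quantile by method of moments in log10 space (sketch only)."""
    y = np.log10(np.asarray(annual_max, dtype=float))
    m, s = y.mean(), y.std(ddof=1)
    g = skew(y, bias=False)                      # sample skew of the logs
    k = pearson3.ppf(1.0 - 1.0 / ari_years, g)   # standardised LP3 frequency factor
    return 10.0 ** (m + k * s)

# Hypothetical annual maximum series (ML/d)
q = [820, 1450, 2300, 980, 3100, 1750, 640, 2900, 1200, 4100,
     1550, 2650, 890, 1980, 3600, 1320, 760, 2150, 2480, 1700]
print({T: round(lp3_quantile(q, T), 1) for T in (2, 10, 50, 100)})
```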

Table E-2 Flood Frequency Quantiles for Validation Set 1 Area Observed Flood Quantiles - FFA (ML/d) Code Station No. 2 (Km 2) QLP3 1.25 QLP3 2 QLP3 5 QLP3 10 QLP3 20 QLP3 50 QLP3 100 QLP3 200 3 T1 221207 158 1113.60 2817.61 7075.90 11416.37 16920.88 26305.48 35267.45 46090.18 4 T2 221210 837 5676.08 20265.46 55825.49 86208.96 117924.43 160211.46 191591.31 221927.45 5 T3 221212 725 3368.30 10233.08 27142.26 42917.66 61063.06 88390.06 111372.07 136162.40 6 T4 223202 943 1368.85 3602.13 8969.25 14141.01 20369.85 30358.46 39340.10 49634.26 7 T5 225224 554 1909.64 7716.44 26319.82 46857.63 73058.48 116440.09 155836.74 200806.50 8 T6 226209 214 1373.72 2396.02 3700.95 4444.32 5062.06 5738.33 6167.20 6538.53 9 T7 227200 25 239.27 837.17 2068.90 2938.03 3712.50 4579.14 5121.63 5577.12 10 T8 227210 18 177.81 463.94 1178.56 1898.56 2799.54 4309.20 5725.32 7408.16 11 T9 227219 52 946.71 1570.79 2555.02 3269.30 3991.57 4975.81 5749.11 6550.40 12 T10 229218 36 215.28 608.31 1561.35 2462.60 3521.37 5163.38 6588.65 8170.90 13 T11 230204 79 189.82 806.56 2457.35 3900.50 5397.36 7351.78 8761.04 10084.76 14 T12 230213 15 35.89 74.84 139.94 186.30 231.22 288.74 330.86 371.77 15 T13 231231 95 137.00 829.48 3794.05 7560.17 12686.75 21533.99 29728.74 39129.42 16 T14 235205 3 96.39 151.42 286.95 435.18 642.14 1046.42 1493.86 2114.68 17 T15 401210 407 2642.48 4732.66 8029.09 10365.62 12665.87 15695.04 17991.08 20295.65 18 T16 402217 24 328.05 655.43 1162.95 1501.51 1814.69 2196.68 2464.20 2714.81 19 T17 405229 108 247.54 828.72 2537.69 4400.09 6811.29 10931.98 14824.32 19443.68 20 T18 406200 306 141.60 1439.00 6177.44 9981.31 13204.72 16362.53 18020.75 19198.22 21 T19 406213 629 2366.36 7282.86 16806.28 23483.13 29518.95 36452.61 40934.22 44812.09 22 T20 415238 141 405.72 1708.81 3953.41 5081.86 5815.77 6386.86 6632.96 6784.10

Table E-3 C10 Estimation Set

1 Station No. Lat Long C10 Station No. Lat Long C10 Station No. Lat Long C10 Station No. Lat Long C10 2 221209 -37.365 149.202 23 237207 -38.250 141.662 7 237207 -38.250 141.662 7 407214 -37.298 143.788 12 3 221211 -37.442 148.982 18 238207 -37.367 142.500 7 238207 -37.367 142.500 7 407217 -37.160 144.207 24 4 222202 -37.513 148.548 21 238219 -37.710 141.830 8 238219 -37.710 141.830 8 407220 -36.997 143.642 29 5 222206 -37.500 148.175 15 401208 -36.213 147.675 13 401208 -36.213 147.675 13 407221 -37.208 144.100 22 6 222210 -37.092 148.425 14 401209 -37.112 147.572 6 401209 -37.112 147.572 6 407222 -37.230 143.833 13 7 222213 -36.953 148.325 8 401212 -36.447 147.828 7 401212 -36.447 147.828 7 407230 -37.167 143.958 20 8 222217 -37.412 148.360 27 401215 -36.868 147.702 5 401215 -36.868 147.702 5 407246 -36.730 144.133 12 9 223204 -37.595 147.697 23 401216 -36.945 141.473 18 401216 -36.945 141.473 18 407253 -36.453 144.467 11 10 224213 -37.497 147.268 10 401217 -36.753 147.713 7 401217 -36.753 147.713 7 415207 -37.185 143.192 13 11 224214 -37.497 147.393 16 401220 -36.208 147.498 11 401220 -36.208 147.498 11 415217 -37.258 142.533 17 12 225213 -38.757 146.423 31 402203 -36.597 147.098 19 402203 -36.597 147.098 19 415220 -36.643 142.978 10 13 225218 -37.813 147.093 44 402204 -36.307 146.903 20 402204 -36.307 146.903 20 415226 -36.745 142.785 17 14 225219 -37.518 146.565 15 402206 -36.540 147.045 7 402206 -36.540 147.045 7 415237 -37.027 142.820 19 15 225223 -37.728 146.980 20 403205 -36.727 146.950 12 403205 -36.727 146.950 12 16 226204 -38.092 146.158 9 403209 -36.333 146.343 103 403209 -36.333 146.343 103 17 226205 -37.907 146.022 5 403213 -36.622 146.243 21 403213 -36.622 146.243 21 18 226217 -37.975 146.083 6 403221 -36.313 146.600 29 403221 -36.313 146.600 29 19 226218 -38.273 146.185 4 403222 -36.912 146.700 10 403222 -36.912 146.700 10 20 226222 -37.883 145.890 19 403224 -36.515 146.447 18 403224 -36.515 146.447 18 21 226226 -38.010 146.195 14 403226 -36.607 146.360 17 403226 -36.607 146.360 17 22 226402 -38.178 146.212 7 403227 -36.833 146.397 25 403227 -36.833 146.397 25 23 227205 -38.355 146.653 32 403233 -36.722 146.880 15 403233 -36.722 146.880 15 24 227211 -38.643 146.372 32 404206 -36.800 146.017 17 404206 -36.800 146.017 17 25 227213 -38.530 146.530 47 404207 -36.613 146.057 29 404207 -36.613 146.057 29 26 227225 -38.472 146.555 45 405205 -37.412 145.558 5 405205 -37.412 145.558 5 27 227226 -38.500 146.158 25 405209 -37.318 145.712 9 405209 -37.318 145.712 9 28 227231 -38.468 145.513 12 405212 -37.097 145.053 28 405212 -37.097 145.053 28 29 227236 -38.562 145.707 14 405214 -37.152 146.125 25 405214 -37.152 146.125 25 30 228212 -38.025 145.758 3 405215 -37.232 146.207 11 405215 -37.232 146.207 11 31 228217 -38.068 145.462 17 405217 -37.383 145.475 18 405217 -37.383 145.475 18 32 230202 -37.582 144.743 32 405218 -37.292 146.187 13 405218 -37.292 146.187 13 33 230205 -37.633 144.798 23 405219 -37.333 146.130 11 405219 -37.333 146.130 11 34 230211 -37.468 144.745 29 405226 -36.623 145.305 10 405226 -36.623 145.305 10 35 231200 -37.682 144.427 29 405227 -37.367 146.055 10 405227 -37.367 146.055 10 36 231213 -37.500 144.362 27 405230 -36.605 144.803 13 405230 -36.605 144.803 13 37 231225 -37.600 144.250 28 405231 -37.348 145.288 14 405231 -37.348 145.288 14 38 232200 -37.958 144.483 18 405237 -36.763 145.583 21 405237 -36.763 145.583 21 39 232210 -37.645 144.040 3 405240 -37.062 145.052 39 405240 -37.062 145.052 39 40 232213 -37.657 144.033 6 405241 -37.292 145.825 12 405241 -37.292 
145.825 12 41 233211 -38.300 143.842 3 405245 -37.040 146.050 38 405245 -37.040 146.050 38 42 233214 -38.533 143.730 66 405248 -36.855 144.913 24 405248 -36.855 144.913 24 43 234200 -37.810 143.585 12 405251 -36.970 145.783 22 405251 -36.970 145.783 22 44 235202 -37.560 143.642 46 405263 -37.463 146.247 8 405263 -37.463 146.247 8 45 235203 -38.447 142.960 19 405264 -37.523 146.077 7 405264 -37.523 146.077 7 46 235204 -38.657 143.528 31 405274 -37.110 145.600 31 405274 -37.110 145.600 31 47 235227 -38.525 143.480 19 406214 -36.775 144.427 23 406214 -36.775 144.427 23 48 235233 -38.757 143.623 46 406215 -36.962 144.492 22 406215 -36.962 144.492 22 49 235234 -38.485 143.570 11 406216 -36.898 144.357 18 406216 -36.898 144.357 18 50 236205 -38.320 147.482 14 406224 -36.547 144.637 18 406224 -36.547 144.637 18 51 236212 -38.347 147.650 5 406226 -36.882 144.650 16 406226 -36.882 144.650 16

Table E-4 C Coefficients (C2 to C100) and Frequency Factors (FF2 to FF100), Estimation Set

1 C2 C5 C10 C20 C50 C100 FF2 FF5 FF10 FF20 FF50 FF100 2 3 0.10221 0.18758 0.22559 0.23652 0.23353 0.22425 0.45307 0.83153 1 1.04847 1.03521 0.99409 4 0.08177 0.14385 0.18187 0.20681 0.22932 0.24186 0.44963 0.79096 1 1.13710 1.26091 1.32982 5 0.07567 0.15564 0.20963 0.24770 0.28379 0.30218 0.36098 0.74242 1 1.18159 1.35373 1.44147 6 0.04700 0.10485 0.15277 0.19457 0.24808 0.28718 0.30763 0.68635 1 1.27365 1.62388 1.87982 7 0.04245 0.10001 0.13936 0.16578 0.18977 0.20026 0.30463 0.71765 1 1.18957 1.36171 1.43697 8 0.03036 0.05960 0.08493 0.10949 0.14361 0.17272 0.35743 0.70170 1 1.28918 1.69086 2.03365 9 0.04507 0.14087 0.26773 0.40916 0.64417 0.87044 0.16833 0.52614 1 1.52823 2.40601 3.25115 10 0.05823 0.15564 0.22810 0.27961 0.32816 0.35186 0.25527 0.68233 1 1.22584 1.43869 1.54260 11 0.04722 0.07389 0.09563 0.11437 0.14057 0.16329 0.49381 0.77266 1 1.19597 1.46996 1.70751 12 0.03540 0.09571 0.15767 0.22555 0.32924 0.42314 0.22454 0.60706 1 1.43055 2.08816 2.68372 13 0.07642 0.18866 0.31195 0.45810 0.71377 0.97004 0.24497 0.60478 1 1.46847 2.28805 3.10956 14 0.10977 0.28217 0.43673 0.58012 0.77452 0.91172 0.25135 0.64609 1 1.32833 1.77346 2.08762 15 0.09441 0.12819 0.14707 0.15586 0.16261 0.16592 0.64194 0.87165 1 1.05981 1.10572 1.12823 16 0.05785 0.14277 0.19571 0.22909 0.25233 0.26069 0.29560 0.72949 1 1.17056 1.28933 1.33203 17 0.04962 0.07535 0.09478 0.10995 0.12973 0.14506 0.52354 0.79507 1 1.16017 1.36877 1.53057 18 0.03289 0.04135 0.04801 0.05267 0.05903 0.06443 0.68514 0.86123 1 1.09709 1.22953 1.34207 19 0.04585 0.05655 0.06448 0.06935 0.07596 0.08045 0.71107 0.87695 1 1.07559 1.17808 1.24766 20 0.02430 0.03248 0.03714 0.03939 0.04129 0.04217 0.65430 0.87444 1 1.06055 1.11154 1.13527 21 0.17645 0.18460 0.19040 0.18858 0.18539 0.18331 0.92669 0.96951 1 0.99044 0.97367 0.96274 22 0.08558 0.11599 0.13617 0.14867 0.16243 0.17290 0.62850 0.85185 1 1.09184 1.19289 1.26975 23 0.06190 0.07055 0.07179 0.06804 0.06212 0.05777 0.86220 0.98274 1 0.94775 0.86535 0.80468 24 0.10009 0.21589 0.32300 0.42987 0.58925 0.72557 0.30989 0.66838 1 1.33085 1.82428 2.24634 25 0.15607 0.25700 0.32384 0.37218 0.42807 0.46352 0.48194 0.79360 1 1.14926 1.32185 1.43133 26 0.14084 0.30893 0.47152 0.64137 0.91189 1.15651 0.29870 0.65519 1 1.36023 1.93394 2.45273 27 0.10449 0.27114 0.45236 0.66457 1.02277 1.36757 0.23099 0.59939 1 1.46910 2.26093 3.02315 28 0.13409 0.20821 0.25172 0.27814 0.29912 0.31190 0.53269 0.82714 1 1.10495 1.18827 1.23906 29 0.10004 0.11861 0.12315 0.11824 0.10943 0.10243 0.81234 0.96310 1 0.96009 0.88857 0.83173 30 0.11512 0.13307 0.13867 0.13479 0.12703 0.12122 0.83019 0.95958 1 0.97200 0.91608 0.87417 31 0.02173 0.02859 0.03461 0.03775 0.04151 0.04444 0.62794 0.82591 1 1.09075 1.19937 1.28409 32 0.10263 0.14764 0.16701 0.16990 0.16695 0.16121 0.61449 0.88402 1 1.01729 0.99965 0.96527 33 0.07786 0.21013 0.31854 0.40792 0.51006 0.57078 0.24443 0.65964 1 1.28059 1.60123 1.79184 34 0.07268 0.16828 0.23062 0.26983 0.30245 0.31511 0.31515 0.72969 1 1.17004 1.31147 1.36639 35 0.07438 0.20715 0.28818 0.32908 0.34853 0.34610 0.25811 0.71882 1 1.14190 1.20941 1.20098 36 0.04367 0.15263 0.29348 0.48106 0.83450 1.19356 0.14880 0.52007 1 1.63919 2.84351 4.06695 37 0.13096 0.22536 0.27063 0.28506 0.28866 0.28268 0.48390 0.83271 1 1.05331 1.06661 1.04450 38 0.14126 0.23808 0.28474 0.30173 0.30937 0.30506 0.49612 0.83612 1 1.05966 1.08650 1.07135 39 0.02310 0.10863 0.18408 0.23979 0.28515 0.30073 0.12549 0.59014 1 1.30267 1.54909 1.63371 40 0.01992 0.02876 0.03150 0.03094 0.02898 0.02693 0.63240 
0.91318 1 0.98225 0.92000 0.85500 41 0.03764 0.05605 0.06262 0.06240 0.05894 0.05523 0.60115 0.89509 1 0.99658 0.94121 0.88199

42 C2 C5 C10 C20 C50 C100 FF2 FF5 FF10 FF20 FF50 FF100 43 44 0.01329 0.02322 0.02575 0.02475 0.02212 0.01984 0.51593 0.90148 1 0.96093 0.85894 0.77027 45 0.17473 0.43042 0.65802 0.87806 1.17098 1.39720 0.26554 0.65411 1 1.33439 1.77954 2.12333 46 0.04564 0.09042 0.12129 0.14306 0.16508 0.17883 0.37627 0.74552 1 1.17955 1.36103 1.47441 47 0.18962 0.34035 0.45654 0.55436 0.67667 0.76674 0.41535 0.74550 1 1.21427 1.48219 1.67947 48 0.09365 0.15287 0.18513 0.20029 0.21011 0.21203 0.50587 0.82573 1 1.08188 1.13493 1.14531 49 0.19423 0.26232 0.30992 0.34000 0.37569 0.40014 0.62672 0.84643 1 1.09707 1.21223 1.29112 50 0.09715 0.15142 0.19218 0.22426 0.26714 0.29684 0.50549 0.78791 1 1.16693 1.39007 1.54460 51 0.16944 0.33303 0.45588 0.55573 0.67786 0.76402 0.37168 0.73053 1 1.21902 1.48693 1.67592 52 0.05598 0.08549 0.10539 0.11788 0.13305 0.14176 0.53115 0.81110 1 1.11845 1.26242 1.34499 53 0.05299 0.11443 0.14057 0.14685 0.14235 0.13444 0.37699 0.81409 1 1.04466 1.01269 0.95638 54 0.03116 0.04247 0.04550 0.04424 0.04087 0.03789 0.68475 0.93332 1 0.97219 0.89824 0.83262 55 0.03436 0.05600 0.06845 0.07503 0.08032 0.08231 0.50192 0.81806 1 1.09615 1.17335 1.20242 56 0.05924 0.06728 0.06890 0.06572 0.06066 0.05650 0.85980 0.97646 1 0.95380 0.88038 0.82008 57 0.02582 0.05927 0.07997 0.09173 0.10000 0.10201 0.32290 0.74109 1 1.14704 1.25034 1.27557 58 0.08045 0.11754 0.13270 0.13538 0.13310 0.12910 0.60627 0.88575 1 1.02022 1.00302 0.97288 59 0.03446 0.05406 0.05913 0.05762 0.05288 0.04862 0.58277 0.91426 1 0.97446 0.89424 0.82228 60 0.04426 0.06339 0.07477 0.08128 0.08724 0.09000 0.59200 0.84779 1 1.08713 1.16687 1.20372 61 0.02425 0.04026 0.04753 0.04989 0.05018 0.04907 0.51026 0.84704 1 1.04973 1.05575 1.03246 62 0.11441 0.15341 0.18132 0.20040 0.22419 0.24213 0.63099 0.84609 1 1.10520 1.23642 1.33539 63 0.04144 0.06102 0.06974 0.07233 0.07239 0.07098 0.59419 0.87498 1 1.03714 1.03795 1.01779 64 0.05744 0.08616 0.10516 0.11769 0.13254 0.14172 0.54623 0.81934 1 1.11915 1.26035 1.34764 65 0.10578 0.15318 0.18749 0.21273 0.24377 0.26886 0.56419 0.81703 1 1.13463 1.30018 1.43397 66 0.12638 0.17805 0.19568 0.19530 0.18577 0.17675 0.64584 0.90990 1 0.99807 0.94936 0.90327 67 0.03988 0.06056 0.07221 0.07868 0.08437 0.08731 0.55229 0.83868 1 1.08956 1.16829 1.20906 68 0.05082 0.08646 0.11605 0.14233 0.17934 0.20965 0.43796 0.74508 1 1.22650 1.54542 1.80657 69 0.44813 0.79038 1.02577 1.19377 1.37856 1.50058 0.43687 0.77052 1 1.16378 1.34394 1.46289 70 0.09688 0.16313 0.21055 0.24600 0.28835 0.31766 0.46014 0.77480 1 1.16836 1.36951 1.50870 71 0.14663 0.23796 0.28630 0.30820 0.32337 0.32455 0.51215 0.83117 1 1.07650 1.12946 1.13358 72 0.05052 0.07816 0.10092 0.12041 0.14958 0.17291 0.50058 0.77450 1 1.19313 1.48217 1.71335 73 0.08759 0.14275 0.17784 0.19996 0.22205 0.23523 0.49252 0.80266 1 1.12435 1.24855 1.32270 74 0.09058 0.13803 0.16733 0.18210 0.19796 0.20555 0.54131 0.82487 1 1.08826 1.18304 1.22840 75 0.08755 0.17061 0.24711 0.32535 0.44282 0.54850 0.35430 0.69044 1 1.31663 1.79199 2.21966 76 0.06501 0.11419 0.15144 0.18114 0.22057 0.24903 0.42932 0.75403 1 1.19614 1.45654 1.64448 77 0.02597 0.08249 0.16680 0.30050 0.61470 1.02589 0.15570 0.49452 1 1.80155 3.68525 6.15044 78 0.09593 0.20500 0.29380 0.36996 0.46955 0.53892 0.32650 0.69777 1 1.25922 1.59820 1.83432 79 0.03421 0.04098 0.04654 0.04985 0.05532 0.05943 0.73501 0.88045 1 1.07115 1.18861 1.27686 80 0.06697 0.07970 0.08698 0.08920 0.09074 0.09185 0.76987 0.91624 1 1.02546 1.04321 1.05596 81 0.13525 0.23729 0.27726 0.28361 0.27357 0.26143 
0.48779 0.85585 1 1.02289 0.98669 0.94289 82 0.12981 0.20400 0.25197 0.28318 0.31747 0.33808 0.51518 0.80962 1 1.12385 1.25993 1.34174

83 C2 C5 C10 C20 C50 C100 FF2 FF5 FF10 FF20 FF50 FF100 84 85 0.07539 0.09488 0.10558 0.10950 0.11267 0.11405 0.71400 0.89858 1 1.03704 1.06709 1.08019 86 0.09831 0.14945 0.17863 0.19422 0.20775 0.21436 0.55032 0.83662 1 1.08723 1.16299 1.20001 87 0.09758 0.12053 0.13345 0.13827 0.14192 0.14433 0.73123 0.90316 1 1.03612 1.06350 1.08154 88 0.08334 0.10172 0.10900 0.10832 0.10508 0.10144 0.76458 0.93322 1 0.99372 0.96401 0.93063 89 0.03112 0.07223 0.09843 0.11431 0.12700 0.13173 0.31618 0.73384 1 1.16139 1.29035 1.33842 90 0.07050 0.09215 0.10491 0.11120 0.11713 0.11993 0.67205 0.87836 1 1.05991 1.11646 1.14318 91 0.05067 0.09794 0.12733 0.14542 0.16165 0.16886 0.39794 0.76921 1 1.14208 1.26952 1.32619 92 0.06544 0.11399 0.14371 0.16096 0.17814 0.18610 0.45537 0.79320 1 1.12006 1.23955 1.29499 93 0.09901 0.16858 0.21428 0.24461 0.27674 0.29706 0.46208 0.78673 1 1.14157 1.29149 1.38631 94 0.15026 0.30928 0.39238 0.42830 0.44343 0.43862 0.38295 0.78823 1 1.09154 1.13012 1.11786 95 0.09431 0.11201 0.12278 0.12822 0.13313 0.13752 0.76814 0.91227 1 1.04434 1.08431 1.12008 96 0.15770 0.29565 0.38243 0.43256 0.47918 0.50286 0.41237 0.77309 1 1.13108 1.25299 1.31489 97 0.06907 0.17173 0.23513 0.26913 0.29357 0.29865 0.29376 0.73035 1 1.14456 1.24852 1.27015 98 0.08926 0.16711 0.21549 0.24746 0.27586 0.28885 0.41423 0.77550 1 1.14836 1.28014 1.34045 99 0.05307 0.06810 0.07664 0.08060 0.08477 0.08683 0.69250 0.88864 1 1.05166 1.10607 1.13299 100 0.04657 0.05919 0.06599 0.06839 0.07079 0.07156 0.70575 0.89695 1 1.03644 1.07272 1.08442 101 0.22978 0.28276 0.31380 0.32697 0.33531 0.34178 0.73226 0.90108 1 1.04198 1.06854 1.08917 102 0.08265 0.18268 0.23091 0.24799 0.25002 0.24046 0.35793 0.79115 1 1.07397 1.08275 1.04135 103 0.08500 0.17750 0.22182 0.23761 0.23945 0.23331 0.38319 0.80018 1 1.07119 1.07946 1.05180 104 0.05834 0.14001 0.17845 0.19050 0.18808 0.18048 0.32694 0.78461 1 1.06755 1.05397 1.01137 105 0.04714 0.11778 0.18064 0.23971 0.31814 0.37823 0.26099 0.65202 1 1.32704 1.76121 2.09386 106 0.07337 0.13499 0.16425 0.17407 0.17492 0.17068 0.44666 0.82181 1 1.05979 1.06495 1.03913 107 0.06625 0.10530 0.11665 0.11415 0.10570 0.09698 0.56794 0.90267 1 0.97854 0.90617 0.83141 108 0.08627 0.18086 0.23503 0.26338 0.28142 0.28364 0.36705 0.76953 1 1.12065 1.19741 1.20683 109 0.09606 0.22182 0.28537 0.30991 0.31284 0.30519 0.33661 0.77731 1 1.08599 1.09626 1.06947 110 0.11534 0.18575 0.21932 0.23065 0.23420 0.23127 0.52589 0.84692 1 1.05166 1.06781 1.05446 111 0.06062 0.11313 0.13237 0.13415 0.12717 0.11831 0.45792 0.85463 1 1.01340 0.96067 0.89374 112 0.09074 0.15900 0.19673 0.21477 0.22784 0.23164 0.46124 0.80821 1 1.09171 1.15812 1.17744 113 0.06287 0.10885 0.12338 0.12236 0.11430 0.10567 0.50960 0.88224 1 0.99175 0.92641 0.85647 114 0.03958 0.07811 0.11154 0.14292 0.18789 0.22470 0.35486 0.70032 1 1.28128 1.68451 2.01450 115 0.04824 0.09831 0.12579 0.13927 0.14713 0.14629 0.38355 0.78155 1 1.10717 1.16966 1.16301 116 0.09128 0.13784 0.16701 0.18421 0.20135 0.21049 0.54656 0.82537 1 1.10302 1.20561 1.26037 117 0.03646 0.08140 0.10440 0.11359 0.11594 0.11328 0.34918 0.77968 1 1.08800 1.11050 1.08505 118 0.06920 0.13292 0.17237 0.19729 0.21793 0.22603 0.40147 0.77115 1 1.14462 1.26434 1.31132 119 0.06268 0.13835 0.19036 0.22578 0.25924 0.27597 0.32927 0.72678 1 1.18611 1.36187 1.44974
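
The frequency factors FF2 to FF100 in Table E-4 appear to be the ratio of each runoff coefficient to the 10-year coefficient, FF_T = C_T / C10, so that FF10 = 1 by construction. A minimal sketch of that calculation, assuming this interpretation and using the first data row of Table E-4:

```python
def frequency_factors(c_by_ari):
    """FF_T = C_T / C_10 (assumed definition; FF_10 = 1 by construction)."""
    c10 = c_by_ari[10]
    return {t: c / c10 for t, c in c_by_ari.items()}

# First data row of Table E-4 (C2 ... C100)
c_row = {2: 0.10221, 5: 0.18758, 10: 0.22559, 20: 0.23652, 50: 0.23353, 100: 0.22425}
for t, ff in frequency_factors(c_row).items():
    print(f"FF{t} = {ff:.5f}")   # agrees with the tabulated FF values to rounding
```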

Table E-5 C10 Coefficients, Estimation Set: Observed and Predicted Values

[Figure: observed and predicted C10 coefficients plotted against catchment number, estimation set]

Figure E-1 Plot of C10 Coefficients, Estimation Set: Observed and Predicted Values

APPENDIX F RESULTS ASSOCIATED WITH THE OLS AND GLS ANALYSES

F.1 Q1.25 MODEL

Model development for Q1.25 involved the removal of 14 stations which were identified as influential points or outliers on the basis of Cook's distance. These stations are 232000, 233211, 235204, 401216, 403209, 405274, 406216, 226218, 226222, 232210, 235202, 236212, 402206 and 404206. The removal took place over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.1 shows that the histogram of the standardised residuals is approximately normally distributed with mean zero and constant variance (upon visual inspection). Figure F.2 shows the normal probability plot of the standardised residuals; the points approximately follow a straight line, indicating that the residuals are close to normally distributed. In Figure F.3 the standardised residuals are plotted against the predicted values; the plot shows no systematic pattern that would indicate heteroskedasticity. Refer to Table 8.1 for the final equation and the related model statistics.
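
A minimal sketch of the Cook's-distance screening described above is given below. It uses statsmodels on a purely synthetic data set; the predictor names, the 4/n cut-off and the data are illustrative assumptions, not the variables, threshold or records actually used for the Table 8.1 models.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Synthetic log10-transformed quantiles and catchment descriptors (hypothetical)
rng = np.random.default_rng(1)
df = pd.DataFrame({"logA": rng.normal(2.3, 0.5, 60),     # e.g. log10(area)
                   "logI": rng.normal(1.5, 0.1, 60)})    # e.g. log10(rainfall intensity)
df["logQ"] = 0.7 * df["logA"] + 1.8 * df["logI"] + rng.normal(0.0, 0.2, 60)

X = sm.add_constant(df[["logA", "logI"]])
fit = sm.OLS(df["logQ"], X).fit()

# Cook's distance for each station; flag values above a common 4/n rule of thumb
cooks_d, _ = fit.get_influence().cooks_distance
flagged = df.index[cooks_d > 4.0 / len(df)].tolist()
print("Influential points:", flagged)
# In the thesis, flagged stations were removed and the model refitted
# over two runs until roughly 10% of stations had been dropped.
```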

[Figure: histogram of standardised residuals (response: log Q1.25); frequency versus standardised residual]

Figure F.1 – Histogram of Standardized Residuals for log Q1.25

[Figure: normal probability plot; normal score versus standardised residual]

Figure F.2 – Normal Probability Plot of the Standardised Residuals for log Q1.25

[Figure: standardised residuals plotted against fitted values]

Figure F.3 – Standardised Residuals vs. Predicted Values for log Q1.25

F.2 Q2 MODEL

Model development for Q2 involved the removal of 14 stations which were identified as influential points or outliers on the basis of Cook's distance. These stations are 226218, 228212, 233211, 235204, 403209, 405205, 406216, 232210, 235202, 235234, 401216, 401216, 402206 and 405274. The removal took place over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.4 shows that the histogram of the standardised residuals is approximately normally distributed with mean zero and an approximately constant variance (upon visual inspection); however, the Breusch-Pagan test, through its Chi-squared statistic, indicates that slight heteroskedasticity is present (see Table 8.1). Figure F.5 shows the normal probability plot of the standardised residuals; the points approximately follow a straight line, indicating that the residuals are close to normally distributed, which is supported by the Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure F.6 the standardised residuals are plotted against the predicted values; the plot shows no obvious systematic pattern, although, as noted above, slight heteroskedasticity is present. Refer to Table 8.1 for the final equation and the related model statistics.
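
The Breusch-Pagan check referred to above can be reproduced as in the self-contained sketch below. The data and the single predictor are assumptions for illustration only, not the Q2 model of Table 8.1.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Hypothetical log-log regression of a flood quantile on one descriptor
rng = np.random.default_rng(2)
x = rng.normal(2.3, 0.5, 100)                       # e.g. log10(area)
y = 0.75 * x + rng.normal(0.0, 0.25, 100)           # e.g. log10(Q2)
fit = sm.OLS(y, sm.add_constant(x)).fit()

# Breusch-Pagan regresses the squared residuals on the predictors; a large
# LM (Chi-squared) statistic, i.e. a small p-value, indicates heteroskedasticity.
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, fit.model.exog)
print(f"Breusch-Pagan LM = {lm_stat:.2f}, p-value = {lm_p:.3f}")
```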

[Figure: histogram of standardised residuals; frequency versus standardised residual]

Figure F.4 – Histogram of Standardized Residuals for log Q2

[Figure: normal probability plot; normal score versus standardised residual]

Figure F.5 – Normal Probability Plot of the Standardised Residuals for log Q2

[Figure: standardised residuals plotted against fitted values]

Figure F.6 – Standardised Residuals vs. Predicted Values for log Q2

F.3 Q5 MODEL

Model development for Q5 involved the removal of 13 stations which were identified as influential points or outliers on the basis of Cook's distance. These stations are 226218, 228212, 232210, 235204, 403209, 405205, 406216, 227225, 233211, 235203, 238207, 401215 and 402206. The removal took place over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.7 shows that the histogram of the standardised residuals is approximately normally distributed with mean zero and an approximately constant variance (upon visual inspection); however, the Breusch-Pagan test, through its Chi-squared statistic, indicates that slight to moderate heteroskedasticity is present (see Table 8.1). Figure F.8 shows the normal probability plot of the standardised residuals; the points approximately follow a straight line, indicating that the residuals are close to normally distributed with mean zero, which is supported by the Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure F.9 the standardised residuals are plotted against the predicted values; the plot shows no obvious systematic pattern, although, as indicated above, slight heteroskedasticity is present. Refer to Table 8.1 for the final equation and the related model statistics.
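
The Kolmogorov-Smirnov check of the standardised residuals can be sketched as below, again on synthetic data; it illustrates the test in general terms and is not the thesis computation.

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

# Hypothetical fit, then a one-sample K-S test of the standardised (internally
# studentised) residuals against the standard normal distribution.
rng = np.random.default_rng(3)
x = rng.normal(2.3, 0.5, 100)
y = 0.75 * x + rng.normal(0.0, 0.25, 100)
fit = sm.OLS(y, sm.add_constant(x)).fit()

std_resid = fit.get_influence().resid_studentized_internal
ks_stat, p_value = stats.kstest(std_resid, "norm")
print(f"K-S D = {ks_stat:.3f}, p-value = {p_value:.3f}")  # large p-value: normality not rejected
```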

[Figure: histogram of standardised residuals; frequency versus standardised residual]

Figure F.7 – Histogram of Standardized Residuals for log Q5

[Figure: normal probability plot; normal score versus standardised residual]

Figure F.8 – Normal Probability Plot of the Standardised Residuals for log Q5

[Figure: standardised residuals plotted against fitted values]

Figure F.9 – Standardised Residuals vs. Predicted Values for log Q5

F.4 Q20 MODEL

Model development for Q20 involved the removal of 12 stations which were identified as influential points or outliers on the basis of Cook's distance. These stations are 226218, 228212, 232210, 233214, 238207, 403204, 406216, 233211, 235204, 401209, 401215 and 405205. The removal took place over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.10 shows that the histogram of the standardised residuals is approximately normally distributed with mean zero and constant variance (upon visual inspection), and the Breusch-Pagan test, through its Chi-squared statistic, indicates that no heteroskedasticity is present (see Table 8.1). Figure F.11 shows the normal probability plot of the standardised residuals; the points approximately follow a straight line, indicating that the residuals are close to normally distributed with mean zero, which is supported by the Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure F.12 the standardised residuals are plotted against the predicted values; the plot shows no systematic pattern, consistent with the Breusch-Pagan result that no heteroskedasticity is present. Refer to Table 8.1 for the final equation and the related model statistics.
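
The three residual diagnostics used throughout this appendix (histogram, normal probability plot and residuals against fitted values) can be generated as in the sketch below. The data are synthetic and the resulting plots are generic stand-ins for Figures F.10 to F.12, not reproductions of them.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Hypothetical fit and the three standard residual diagnostics
rng = np.random.default_rng(4)
x = rng.normal(2.3, 0.5, 100)
y = 0.75 * x + rng.normal(0.0, 0.25, 100)
fit = sm.OLS(y, sm.add_constant(x)).fit()
std_resid = fit.get_influence().resid_studentized_internal

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))
axes[0].hist(std_resid, bins=12)                       # cf. Figure F.10
axes[0].set(xlabel="Standardized Residual", ylabel="Frequency")
sm.qqplot(std_resid, line="45", ax=axes[1])            # cf. Figure F.11
axes[2].scatter(fit.fittedvalues, std_resid)           # cf. Figure F.12
axes[2].axhline(0.0, linestyle="--")
axes[2].set(xlabel="Fitted Value", ylabel="Standardized Residual")
plt.tight_layout()
plt.show()
```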

Figure F.10 – Histogram of Standardized Residuals for log Q20

Figure F.11 – Normal Probability Plot of the Standardised Residuals for log Q20

Figure F.12 – Standardised Residuals vs. Predicted Values for log Q20
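The Breusch-Pagan Chi-squared statistic referred to in the discussion above can be computed with standard statistical software. The following is a minimal, self-contained sketch using the statsmodels library on synthetic data; the predictor names and values are placeholders, not the catchment characteristics used in this study.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic stand-in for the regression data set (illustration only).
rng = np.random.default_rng(0)
n = 120
log_area = rng.uniform(1.0, 3.5, n)          # assumed predictor
log_I = rng.uniform(1.0, 2.0, n)             # assumed predictor
logQ20 = 0.2 + 0.7 * log_area + 0.5 * log_I + rng.normal(0, 0.2, n)

X = sm.add_constant(np.column_stack([log_area, log_I]))
fit = sm.OLS(logQ20, X).fit()

# Breusch-Pagan test: the LM (Chi-squared) statistic tests whether the squared
# residuals are related to the predictors; a small p-value indicates heteroskedasticity.
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(fit.resid, X)
print(f"Chi-squared statistic = {lm_stat:.2f}, p-value = {lm_p:.3f}")
```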

F.5 Q 50 MODEL

Model development for Q50 involved the removal of 15 stations that were identified as influential points or outliers on the basis of Cook's distance. These stations are 226218, 228212, 231200, 232210, 233214, 238207, 403209, 406216, 225213, 233211, 235204, 401209, 401205, 404206 and 405205. The stations were removed over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.13 shows that the histogram of the residuals is approximately normally distributed with mean zero and, on visual inspection, an approximately constant variance; however, the Chi-squared statistic of the Breusch-Pagan test indicates that heteroskedasticity could be present (see Table 8.1). Figure F.14 shows the normal probability plot of the standardised residuals, which follow an approximately straight line, indicating that the residuals are close to normally distributed with mean zero; this is supported by the Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure F.15 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity graphically, although, as noted above, slight heteroskedasticity seems to be present. Refer to Table 8.1 for the final equation and the related model statistics.

Figure F.13 – Histogram of Standardized Residuals for log Q50

Figure F.14 – Normal Probability Plot of the Standardised Residuals for log Q50

Figure F.15 – Standardised Residuals vs. Predicted Values for log Q50

F.6 Q 100 MODEL

Model development for Q100 involved the removal of 15 stations that were identified as influential points or outliers on the basis of Cook's distance. These stations are 225213, 231200, 232210, 233214, 238207, 403209, 404206, 406216, 222217, 226218, 228212, 235204, 235233, 401209 and 401215. The stations were removed over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.16 shows that the histogram of the residuals is approximately normally distributed with mean zero and, on visual inspection, an approximately constant variance; however, the Chi-squared statistic of the Breusch-Pagan test indicates that slight heteroskedasticity could be present (see Table 8.1). Figure F.17 shows the normal probability plot of the standardised residuals, which follow an approximately straight line, indicating that the residuals are close to normally distributed with mean zero; this is supported by the Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure F.18 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity graphically, although, as noted above, slight heteroskedasticity seems to be present. Refer to Table 8.1 for the final equation and the related model statistics.

Figure F.16 – Histogram of Standardized Residuals for log Q100

Figure F.17 – Normal Probability Plot of the Standardised Residuals for log Q100

Figure F.18 – Standardised Residuals vs. Predicted Values for log Q100
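The three diagnostic plots shown for each quantile model (histogram of standardised residuals, normal probability plot and standardised residuals versus fitted values) can be generated as follows. This is an illustrative sketch on synthetic data; the axis labels follow the figures above, but the values are not thesis results.

```python
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt
from scipy import stats

# Synthetic stand-in data (illustration only).
rng = np.random.default_rng(2)
n = 120
log_area = rng.uniform(1.0, 3.5, n)
log_I = rng.uniform(1.0, 2.0, n)
logQ100 = 0.4 + 0.7 * log_area + 0.6 * log_I + rng.normal(0, 0.25, n)

X = sm.add_constant(np.column_stack([log_area, log_I]))
fit = sm.OLS(logQ100, X).fit()
std_resid = fit.get_influence().resid_studentized_internal

fig, axes = plt.subplots(1, 3, figsize=(12, 3.5))

# (a) Histogram of standardised residuals (cf. Figure F.16).
axes[0].hist(std_resid, bins=10)
axes[0].set_xlabel("Standardised residual")
axes[0].set_ylabel("Frequency")

# (b) Normal probability plot (cf. Figure F.17).
stats.probplot(std_resid, dist="norm", plot=axes[1])

# (c) Standardised residuals versus fitted values (cf. Figure F.18).
axes[2].scatter(fit.fittedvalues, std_resid)
axes[2].axhline(0.0, linestyle="--")
axes[2].set_xlabel("Fitted value")
axes[2].set_ylabel("Standardised residual")

plt.tight_layout()
plt.show()
```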


F.7 Q 200 MODEL

Model development for Q200 involved the removal of 15 stations that were identified as influential points or outliers on the basis of Cook's distance. These stations are 225213, 231200, 232210, 233214, 238207, 403209, 404206, 406216, 222217, 226218, 228212, 235204, 235233, 401209 and 401215. The stations were removed over two model runs, until approximately 10% of the stations had been removed. The following three figures show important properties of the residuals. Figure F.19 shows that the histogram of the residuals is approximately normally distributed with mean zero and, on visual inspection, an approximately constant variance; however, the Chi-squared statistic of the Breusch-Pagan test indicates that slight heteroskedasticity could be present (see Table 8.1). Figure F.20 shows the normal probability plot of the standardised residuals, which follow an approximately straight line, indicating that the residuals are close to normally distributed with mean zero; this is supported by the Kolmogorov-Smirnov test for normality (see Table 8.1). In Figure F.21 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity graphically, although, as noted above, slight heteroskedasticity seems to be present. Refer to Table 8.1 for the final equation and the related model statistics.

Figure F.19 – Histogram of Standardized Residuals for log Q200

Figure F.20 – Normal Probability Plot of the Standardised Residuals for log Q200

Figure F.21 – Standardised Residuals vs. Predicted Values for log Q200


F.8 Q 1.25 MODEL

The following two figures show important properties of the residuals. Figure F.22 shows that the histogram of the residuals is approximately normally distributed. In Figure F.23 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.22 – Histogram of Standardized Residuals (GLS) for log Q1.25

Figure F.23 – Standardised Residuals vs. Predicted Values (GLS) for log Q1.25
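The GLS residuals plotted in Figures F.22 and F.23 are standardised against the error covariance structure of the regional model rather than a single residual variance. The following is a minimal sketch of one common way to compute GLS coefficients and standardised residuals when the error covariance matrix (model error variance plus sampling error variance) is taken as known; the matrix values and variable names are placeholders and this is not the exact procedure used in this study.

```python
import numpy as np

def gls_fit(X, y, Lambda):
    """Generalised least squares fit with a known error covariance matrix Lambda.

    Returns the coefficient vector and residuals standardised by the diagonal
    of Lambda (one common convention; a sketch, not the thesis procedure).
    """
    Lambda_inv = np.linalg.inv(Lambda)
    # beta_hat = (X' Lambda^-1 X)^-1 X' Lambda^-1 y
    XtLi = X.T @ Lambda_inv
    beta = np.linalg.solve(XtLi @ X, XtLi @ y)
    resid = y - X @ beta
    std_resid = resid / np.sqrt(np.diag(Lambda))
    return beta, std_resid

# Tiny synthetic example (values are placeholders, not thesis data).
rng = np.random.default_rng(3)
n = 30
X = np.column_stack([np.ones(n), rng.uniform(1, 3, n)])
sigma_model2 = 0.02                            # assumed model error variance
sampling_var = rng.uniform(0.005, 0.05, n)     # assumed sampling error variances
Lambda = np.diag(sigma_model2 + sampling_var)  # cross-correlations ignored in this sketch
y = X @ np.array([0.3, 0.8]) + rng.normal(0, np.sqrt(np.diag(Lambda)))

beta, std_resid = gls_fit(X, y, Lambda)
print("GLS coefficients:", beta)
```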


F.9 Q 2 MODEL

The following two figures show important properties of the residuals. Figure F.24 shows that the histogram of the residuals is approximately normally distributed. In Figure F.25 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Compared with the OLS diagnostic plots, the GLS diagnostic plots appear to give slightly better results. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.24 – Histogram of Standardized Residuals (GLS) for log Q2

Figure F.25 – Standardised Residuals vs. Predicted Values (GLS) for log Q2

F.10 Q 5 MODEL

The following two figures show important properties of the residuals. Figure F.26 shows that the histogram of the residuals is approximately normally distributed. In Figure F.27 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Compared with the OLS diagnostic plots, the GLS diagnostic plots appear to give slightly better results. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.26 – Histogram of Standardized Residuals (GLS) for log Q5

Figure F.27 – Standardised Residuals vs. Predicted Values (GLS) for log Q5

F.11 Q 20 MODEL

The following two figures show important properties of the residuals. Figure F.28 shows that the histogram of the residuals is approximately normally distributed. In Figure F.29 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.28 – Histogram of Standardized Residuals (GLS) for log Q20

Figure F.29 – Standardised Residuals vs. Predicted Values (GLS) for log Q20

F.12 Q 50 MODEL

The following two figures show important properties of the residuals. Figure F.30 shows that the histogram of the residuals is approximately normally distributed. In Figure F.31 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.30 – Histogram of Standardized Residuals (GLS) for log Q50

Figure F.31 – Standardised Residuals vs. Predicted Values (GLS) for log Q50

F.13 Q 100 MODEL

The following two figures show important properties of the residuals. Figure F.32 shows that the histogram of the residuals is approximately normally distributed. In Figure F.33 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.32 – Histogram of Standardized Residuals (GLS) for log Q100

Figure F.33 – Standardised Residuals vs. Predicted Values (GLS) for log Q100

F.14 Q 200 MODEL

The following two figures show important properties of the residuals. Figure F.34 shows that the histogram of the residuals is approximately normally distributed. In Figure F.35 the standardised residuals are plotted against the predicted values. The plot shows no systematic pattern between the predicted values and the standardised residuals that would indicate heteroskedasticity. Refer to Table 8.3 for the final equation and the related model statistics.

Figure F.34 – Histogram of Standardized Residuals (GLS) for log Q200

Figure F.35 – Standardised Residuals vs. Predicted Values for log Q200

APPENDIX G Graphical Comparison of Flood Frequency Plots for all Flood Estimation Methods, for Test Catchments
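The figures in this appendix compare, for each test catchment, the at-site LP3 flood frequency estimates (fitted with the Bayesian procedure and rating curve error analysis, with 90% confidence limits) against the PRM, OLS regression and GLS regression quantile estimates. A minimal sketch of how one such comparison plot could be assembled is given below; all numerical values are placeholders for illustration and are not results from this study.

```python
import matplotlib.pyplot as plt

# Assumed illustrative discharges (ML/day); NOT results from this study.
ari_1_in_y = [1.25, 2, 5, 10, 20, 50, 100, 200]
lp3_fit = [900, 1500, 2800, 4000, 5500, 8000, 10500, 13500]
prm     = [950, 1600, 3000, 4300, 5800, 8400, 11200, 14000]
ols_reg = [800, 1400, 2600, 3700, 5200, 7600, 10000, 13000]
gls_reg = [850, 1450, 2700, 3900, 5400, 7900, 10300, 13300]

fig, ax = plt.subplots()
ax.plot(ari_1_in_y, lp3_fit, marker="o", label="LP3 (BAY-FIT)")
ax.plot(ari_1_in_y, prm, marker="d", label="PRM")
ax.plot(ari_1_in_y, ols_reg, marker="s", label="OLS REG")
ax.plot(ari_1_in_y, gls_reg, marker="^", label="GLS REG")
ax.set_xscale("log")
ax.set_yscale("log")
ax.set_xlabel("Annual Exceedance Probability (1 in Y)")
ax.set_ylabel("Discharge (ML/day)")
ax.legend()
plt.show()
```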

Each figure plots, for the station named in its caption, the gauged flow quantiles, the LP3 distribution fitted with the Bayesian procedure (with rating curve error analysis) and its 90% confidence limits, and the PRM (C10 - OLS), OLS regression and GLS regression quantile estimates, with discharge (ML/day) shown against annual exceedance probability (1 in Y).

Figure G-1 Comparison of FFA plots for different Flood Estimation Methods for Station 221207

Figure G-2 Comparison of FFA plots for different Flood Estimation Methods for Station 221210

Figure G-3 Comparison of FFA plots for different Flood Estimation Methods for Station 221212

Figure G-4 Comparison of FFA plots for different Flood Estimation Methods for Station 223202

Figure G-5 Comparison of FFA plots for different Flood Estimation Methods for Station 225224

Figure G-6 Comparison of FFA plots for different Flood Estimation Methods for Station 226209

Figure G-7 Comparison of FFA plots for different Flood Estimation Methods for Station 227200

Figure G-8 Comparison of FFA plots for different Flood Estimation Methods for Station 227210

Figure G-9 Comparison of FFA plots for different Flood Estimation Methods for Station 227219

Figure G-10 Comparison of FFA plots for different Flood Estimation Methods for Station 229218

Figure G-11 Comparison of FFA plots for different Flood Estimation Methods for Station 230204

Figure G-12 Comparison of FFA plots for different Flood Estimation Methods for Station 230213

Figure G-13 Comparison of FFA plots for different Flood Estimation Methods for Station 231231

Figure G-14 Comparison of FFA plots for different Flood Estimation Methods for Station 235205

Figure G-15 Comparison of FFA plots for different Flood Estimation Methods for Station 401210

Figure G-16 Comparison of FFA plots for different Flood Estimation Methods for Station 402217

Figure G-17 Comparison of FFA plots for different Flood Estimation Methods for Station 405229

Figure G-18 Comparison of FFA plots for different Flood Estimation Methods for Station 406200

Figure G-19 Comparison of FFA plots for different Flood Estimation Methods for Station 406213

Figure G-20 Comparison of FFA plots for different Flood Estimation Methods for Station 415238