WATER RESOURCES RESEARCH, VOL. 34, NO. 1, PAGES 121–128, JANUARY 1998

Regional hydrologic analysis: Ordinary and generalized least squares revisited

Charles N. Kroll
Environmental Resources and Forest Engineering, College of Environmental Science and Forestry, State University of New York, Syracuse

Jery R. Stedinger
School of Civil and Environmental Engineering, Cornell University, Ithaca, New York

Abstract. Generalized least squares (GLS) regional regression procedures have been developed for estimating river flow quantiles. A widely used GLS procedure employs a simplified model error structure and average covariances when constructing an approximate residual error covariance matrix. This paper compares that GLS estimator ($\hat{\beta}^{MC}_{GLS}$), an idealized GLS estimator ($\hat{\beta}^{E}_{GLS}$) based on the simplifying assumptions of $\hat{\beta}^{MC}_{GLS}$ with true underlying statistics in a region, the best possible GLS estimator ($\hat{\beta}^{T}_{GLS}$) obtained using the true residual error covariance matrix, and the ordinary least squares estimator ($\hat{\beta}^{T}_{OLS}$). Useful analytic expressions are developed for the variance of $\hat{\beta}^{E}_{GLS}$, $\hat{\beta}^{T}_{GLS}$, and $\hat{\beta}^{T}_{OLS}$. For previously examined cases the average sampling mean square error ($\mathrm{mse}_s$) of $\hat{\beta}^{E}_{GLS}$ was the same as the $\mathrm{mse}_s$ of $\hat{\beta}^{T}_{GLS}$, and the $\mathrm{mse}_s$ of $\hat{\beta}^{MC}_{GLS}$ usually was larger than the $\mathrm{mse}_s$ of both $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{GLS}$. The loss in efficiency of $\hat{\beta}^{MC}_{GLS}$ was mostly due to estimating streamflow statistics employed in the construction of the residual error covariance matrix rather than the simplifying assumptions in presently employed GLS estimators. The new analytic expressions were used to compare the performance of the OLS and GLS estimators for new cases representing greater model variability across sites as well as the effect return period has on the estimators' relative performance. For a more heteroscedastic model error variance and larger return periods, some increase in the $\mathrm{mse}_s$ of $\hat{\beta}^{E}_{GLS}$ relative to the $\mathrm{mse}_s$ of $\hat{\beta}^{T}_{GLS}$ was observed.

Copyright 1998 by the American Geophysical Union.
Paper number 97WR02685.
0043-1397/98/97WR-02685\$09.00

1. Introduction

Regional regression models are often used to estimate flow statistics at ungaged river sites. Relationships between flow statistics and geomorphic, geologic, climatic, and topographic parameters have been formulated in many regions for low flows [Thomas and Benson, 1970; Thomas and Cervione, 1970; Riggs, 1972; Vogel and Kroll, 1992] and for flood flows [Benson, 1962; Matalas and Gilroy, 1968; Thomas and Benson, 1970; Jennings et al., 1994; Tasker et al., 1996]. Traditionally, the parameters of these models were estimated using ordinary least squares (OLS) regression procedures employing data for gaged river sites [Thomas and Benson, 1970]. For OLS parameter estimators to be efficient the model residuals should be independent and homoscedastic.

Tasker [1980] developed a weighted least squares (WLS) regression technique to account for the varying sampling error in the at-site quantile estimators. Stedinger and Tasker [1985, 1986] extended this work by developing generalized least squares (GLS) regression techniques to account for the varying sampling error and the cross correlation among concurrent flows. Tasker and Stedinger [1989] discuss an implementation of GLS estimators which also accounts for varying model error variance among sites and variations in the cross correlation of concurrent observations. Using Monte Carlo simulation, Stedinger and Tasker [1985] demonstrated that GLS procedures provided more accurate parameter estimators, better estimators of parameter sampling variances, and an almost unbiased estimator of the model error variance. In particular, the average sampling mean square error ($\mathrm{mse}_s$) of the GLS estimators was smaller than the $\mathrm{mse}_s$ of the OLS estimators when the model error variance was small or the cross correlations among the annual flows were large. Moss and Tasker [1991] showed that GLS procedures describe model accuracy in regional analyses better than OLS procedures do.

This paper addresses a number of unresolved issues regarding GLS and OLS regional regression procedures. The paper (1) clarifies assumptions that have been made when implementing GLS and OLS regional regression procedures, (2) examines the loss of efficiency of OLS estimators and GLS estimators that employ a residual error covariance matrix different than the true residual error covariance matrix, (3) determines whether this loss in efficiency in GLS estimators is due to smoothing of the sampling covariance matrix or to implementing an inadequate model of the model error variance, and (4) determines when OLS estimators are adequate and when a GLS estimator which accounts for varying model error variance is needed to achieve efficient parameter estimates. To examine the efficiency of GLS estimators, the practical GLS estimator developed by Stedinger and Tasker [1985] is compared with an idealized GLS estimator that uses the true regional statistics and the simplifying assumptions Stedinger and Tasker used to construct the residual error covariance matrix, and an ideal GLS estimator that uses the true residual error covariance matrix.

This paper is structured as follows. Section 2 presents the regional regression problem. Section 3 describes Stedinger and Tasker's [1985] Monte Carlo experiments. Section 4 discusses the construction of the residual error covariance matrices. Section 5 compares the results for one of Stedinger and Tasker's [1985] Monte Carlo experiments with analytic expressions. Section 6 uses analytic expressions to compare the efficiency of OLS and GLS estimators for several new cases not considered by Stedinger and Tasker [1985, 1986]. Finally, section 7 presents our conclusions.

2. Regional Regression Model

Following the notation by Stedinger and Tasker [1985], let $\theta$ be a vector of the true flow statistics for river sites in a region and let $X$ be a matrix of drainage basin characteristics associated with the sites augmented by a column of ones. Assume that the relationship between $\theta$ and $X$ is described by the linear model

$$\theta = X\beta + \varepsilon \tag{1}$$

where $\beta$ contains model parameters and $\varepsilon$ contains the residual errors. Here $\mathrm{var}(\varepsilon_i) = \gamma_i^2$ is the model error variance. In practice the true flow statistic, $\theta$, is not known, and an estimator, $\hat{\theta}$, of the statistic of interest is obtained with available streamflow records. Assume that $\hat{\theta}$ is an unbiased estimator of $\theta$ so that

$$E[\hat{\theta}] = \theta \tag{2}$$

and

$$E[(\hat{\theta} - \theta)(\hat{\theta} - \theta)^T] = \Sigma \tag{3}$$

where $\Sigma$ is the sampling covariance matrix associated with the estimator $\hat{\theta}$. The covariance of $\hat{\theta}$ about the regression mean $X\beta$ defines the residual error covariance matrix,

$$E[(\hat{\theta} - X\beta)(\hat{\theta} - X\beta)^T] = \Lambda_T = \Gamma + \Sigma \tag{4}$$

where $\Gamma = \mathrm{diag}[\gamma_i^2]$, and $\gamma_i^2$ is the model error variance associated with site $i$.

In practice, OLS regression procedures have been used to estimate the parameters of linear models such as (1). If $\Lambda_T$ is equal to a diagonal matrix with a constant variance $\gamma^2$ along the diagonal, then OLS parameter estimators are efficient since they have minimum variance among all unbiased linear estimators [Johnston, 1984, p. 173]. In practice the variance, and therefore the diagonal elements of $\Lambda_T$, will differ from site to site. However, the assumption of a constant variance (called homoscedasticity) is often adequate for many practical problems [Draper and Smith, 1981].

The OLS estimator of the parameter vector is [Johnston, 1984, equation 5-26]

$$\hat{\beta}^{T}_{OLS} = (X^T X)^{-1} X^T \hat{\theta} \tag{5}$$

Because the residual errors associated with the model in (1) are not independent and identically distributed with equal variances, the standard relationship for the variance of the OLS parameter estimator [Johnston, 1984, equation 5-33]

$$\mathrm{var}(\hat{\beta}^{T}_{OLS}) = \gamma^2 (X^T X)^{-1} \tag{6}$$

does not describe the actual sampling variance of the OLS parameter estimator. Instead the correct expression is [Johnston, 1984, equation 8-13]

$$\mathrm{var}(\hat{\beta}^{T}_{OLS}) = (X^T X)^{-1} X^T \Lambda_T X (X^T X)^{-1} \tag{7}$$

Equation (7) involves $\Lambda_T$, the true residual error covariance matrix.

An extension of OLS parameter estimators are GLS parameter estimators that employ some estimate of $\Lambda_T$. These estimators weight each observation to reflect the variance of the residual error associated with that observation and the covariance of the residual error with other residual errors. If $\Lambda_T$ is known, one can employ the optimal GLS estimator [Johnston, 1984, equation 8-20]

$$\hat{\beta}^{T}_{GLS} = (X^T \Lambda_T^{-1} X)^{-1} X^T \Lambda_T^{-1} \hat{\theta} \tag{8}$$

whose sampling variance is [Johnston, 1984, equation 8-12]

$$\mathrm{var}(\hat{\beta}^{T}_{GLS}) = (X^T \Lambda_T^{-1} X)^{-1} \tag{9}$$

Both $\hat{\beta}^{T}_{GLS}$ and $\hat{\beta}^{T}_{OLS}$ are unbiased estimators of the parameters in the model given by (1); however, $\hat{\beta}^{T}_{GLS}$ has a smaller variance than $\hat{\beta}^{T}_{OLS}$ because the former correctly weights each of the observations. The parameter estimator $\hat{\beta}^{T}_{GLS}$ is the best (minimum variance) linear unbiased estimator (BLUE) of the model parameters [Greene, 1990; Johnston, 1984].

Stedinger and Tasker [1985] were interested in developing an estimator of the parameters of the model in (1) that had an efficiency approaching that of the GLS estimator $\hat{\beta}^{T}_{GLS}$. Employing $\hat{\beta}^{T}_{GLS}$ is not practical because $\Lambda_T$ is unknown; $\Lambda_T$ requires knowledge of the model error variance associated with each observation, $\gamma_i^2$, and the sampling covariance matrix, $\Sigma$, both of which need to be estimated. Stedinger and Tasker [1985] proposed constructing an approximation of $\Lambda_T$, which we denote as $\Lambda_E$, by approximating $\Gamma$ by $\mathrm{diag}(\gamma^2)$ with a constant value of $\gamma^2$ that must be estimated, and by approximating $\Sigma$ using averaged or smoothed estimates of the sampling variances associated with the at-site streamflow statistics of interest. An average value of the cross correlation between concurrent flows in the region was used to calculate the covariance terms which are the off-diagonal elements of $\Sigma$. In later work a relationship between the cross correlation and the distance between gaged river sites was employed, and $\gamma^2$ was allowed to vary across sites [Tasker and Stedinger, 1989].

If smoothed estimates of the variance of individual streamflow observations are available, and the average cross correlation of concurrent streamflows across sites and the average model error variance are known, one can construct $\Lambda_E$ and compute the corresponding idealized GLS estimator

$$\hat{\beta}^{E}_{GLS} = (X^T \Lambda_E^{-1} X)^{-1} X^T \Lambda_E^{-1} \hat{\theta} \tag{10}$$

For particular $X$ and $\Lambda_T$ matrices the sampling variance of this GLS parameter estimator is

$$\mathrm{var}(\hat{\beta}^{E}_{GLS}) = (X^T \Lambda_E^{-1} X)^{-1} X^T \Lambda_E^{-1} \Lambda_T \Lambda_E^{-1} X (X^T \Lambda_E^{-1} X)^{-1} \tag{11}$$

While Stedinger and Tasker wanted to employ $\Lambda_E$ as an approximation to $\Lambda_T$, as a practical matter they had to employ an estimator of $\Lambda_E$, which we denote as $\Lambda_{MC}$. Stedinger and Tasker [1985] encountered problems when at-site sample variances were used to construct an estimator of the sampling covariance matrix $\Sigma$, because those weights were then correlated with the at-site quantile estimators, the dependent variable in the regression equation. To avoid such problems, the variance of the individual streamflow observations used to compute the elements of $\Sigma$ was estimated using a regression
relationship developed between at-site sample variances and physiographic basin characteristics. $\Lambda_{MC}$ was constructed using the computed estimate of the variance of individual observations from this regression relationship, a computed average of the sample cross-correlation estimators, and a computed generalized mean square error estimator of the average model error variance. In Stedinger and Tasker's Monte Carlo experiment, for every replicate of the experiment a different value of $\Lambda_{MC}$ was constructed to allow computation of a GLS parameter estimator $\hat{\beta}^{MC}_{GLS}$ for that replicate. They also employed $\Lambda_{MC}$ to estimate the variance of the GLS parameter estimator as

$$\mathrm{var}(\hat{\beta}^{MC}_{GLS}) = (X^T \Lambda_{MC}^{-1} X)^{-1} \tag{12}$$

Thus Stedinger and Tasker had two estimators of the variance of their GLS parameter estimator: the observed empirical variability of the individual $\hat{\beta}^{MC}_{GLS}$ estimators across replicates of the Monte Carlo experiment and the average of (12) across replicates. The observed variability of the individual $\hat{\beta}^{MC}_{GLS}$ estimators across replicates should correspond to an average of equation (11) where $\Lambda_E$ is replaced by $\Lambda_{MC}$:

$$E[\mathrm{var}(\hat{\beta}^{MC}_{GLS})] = E[(X^T \Lambda_{MC}^{-1} X)^{-1} X^T \Lambda_{MC}^{-1} \Lambda_T \Lambda_{MC}^{-1} X (X^T \Lambda_{MC}^{-1} X)^{-1}] \tag{13}$$

and the expectation is taken over the joint distribution of $\Lambda_{MC}$, $\Lambda_T$, and $X$; for every replicate Stedinger and Tasker randomly generated new drainage areas for the region so that $X$ and the corresponding $\Lambda_T$ were both random. In the Monte Carlo experiments reported by Stedinger and Tasker [1985], the difference between (13) and the average across all Monte Carlo replicates of (12) was relatively small.

In the following sections we compare the variance of Stedinger and Tasker's GLS estimator, $\hat{\beta}^{MC}_{GLS}$, from their Monte Carlo experiment, with the variance of the GLS estimator using the true residual error covariance matrix, $\hat{\beta}^{T}_{GLS}$, and the variance of the GLS estimator, $\hat{\beta}^{E}_{GLS}$, based on a constructed residual error covariance matrix using the average variance of the streamflows at each site, the average cross correlation between concurrent flows, and the average model error variance. The parameter estimators $\hat{\beta}^{T}_{GLS}$ and $\hat{\beta}^{E}_{GLS}$ are based on population values of these statistics, whereas $\hat{\beta}^{MC}_{GLS}$ employed sample estimators of the corresponding parameters. The variance of $\hat{\beta}^{T}_{GLS}$ and $\hat{\beta}^{E}_{GLS}$ will be obtained using the analytic expression in (9) and (11), respectively.

In their Monte Carlo analysis, Stedinger and Tasker randomly generated new values of the drainage areas for every replicate. To account for the random drainage areas, the analytic expressions were averaged over 100 replicates, with drainage areas randomly generated for each replicate. This Monte Carlo analysis was also necessary to generate a true random log space standard deviation of the flows at each site which is needed to construct the true residual error covariance matrix. These issues are discussed in section 4.

3. Stedinger and Tasker's Monte Carlo Experiment

The results from Stedinger and Tasker's [1985] first Monte Carlo experiment will be used to compare the variance of their GLS parameter estimator, $\hat{\beta}^{MC}_{GLS}$, to the variance of $\hat{\beta}^{T}_{GLS}$ and $\hat{\beta}^{E}_{GLS}$. This experiment considered a regional regression model for 50-year floods. The region included 30 sites with drainage area randomly selected from a uniform distribution ranging in logarithmic space from 10 to 20,000 miles$^2$ (25.9 to 51,800 km$^2$). It was assumed that the annual maximum flows at each site are lognormally distributed. The log space mean, $\mu_i$, and standard deviation, $\sigma_i$, at each site were a function of drainage area at that site, $A_i$, following the models

$$\mu_i = \alpha_\mu + \beta_\mu \ln(A_i) + \nu_i \tag{14}$$

$$\sigma_i = [\alpha_\sigma + \beta_\sigma \ln(A_i)] \exp(\delta_i) \tag{15}$$

where $\alpha_\mu = 0$, $\beta_\mu = 0.75$, $\alpha_\sigma = 1.5$, $\beta_\sigma = -0.14$, and $\nu_i$ and $\delta_i$ are independent normally distributed random error terms with means 0 and $-0.03125\sigma_\nu^2$, and variances $\sigma_\nu^2$ and $\sigma_\delta^2 = 0.0625\sigma_\nu^2$, respectively. For each site an annual flow record of length $n_i$ was randomly generated from a lognormal distribution with moments given by (14) and (15), and the cross correlation among the logarithms of concurrent flows in a region equal to a constant value, $\rho$. Using the generated record for a site, the sample moment estimators, $\hat{\mu}_i$ and $\hat{\sigma}_i$, were calculated and used to compute an at-site estimator of a flow quantile

$$\hat{Q}_{p,i} = \hat{\mu}_i + z_p \hat{\sigma}_i \tag{16}$$

where $\hat{Q}_{p,i}$ is an at-site estimate of a flow quantile with a nonexceedance probability of $p$, and $z_p$ is the $p$th percentile of a standard normal distribution (for the 50-year flood $p = 0.98$ and $z_p = 2.054$). Combining (14), (15), and (16), the underlying regression model is

$$\hat{Q}_{p,i} = \alpha + \beta \ln(A_i) + \varepsilon_i \tag{17}$$

where $\alpha = \alpha_\mu + z_p \alpha_\sigma$, $\beta = \beta_\mu + z_p \beta_\sigma$, and $\varepsilon_i$ is the residual error.

In their first experiment, Stedinger and Tasker considered the cases where $\rho = 0.0$, 0.3, 0.6, and 0.9, and $\sigma_\nu = 0.0$, 0.1, 0.3, 0.5, and 0.9. For 10 sites the record length was set to $n_i = 50$, for 10 sites $n_i = 20$, and for the remaining 10 sites $n_i = 10$.

4. Construction of the Residual Error Covariance Matrix

The residual error covariance matrix constructed in Stedinger and Tasker's Monte Carlo experiment for their GLS estimator, $\Lambda_{MC}$, is an estimator of the covariance matrix, $\Lambda_E$, they proposed to employ. Both of these matrices differ from the true residual error covariance matrix, $\Lambda_T$. This section examines the assumptions employed to develop the sampling covariance matrix and model error variance for each of the GLS estimators. Table 1 summarizes those assumptions.

Table 1. Alternative Assumptions and Equations Employed to Construct Residual Error Covariance Matrices, $\Lambda$, for the GLS Estimators $\hat{\beta}^{T}_{GLS}$, $\hat{\beta}^{E}_{GLS}$, and $\hat{\beta}^{MC}_{GLS}$

| Assumption/Equation | $\hat{\beta}^{T}_{GLS}$ ($\Lambda_T$) | $\hat{\beta}^{E}_{GLS}$ ($\Lambda_E$) | $\hat{\beta}^{MC}_{GLS}$ ($\Lambda_{MC}$) |
| --- | --- | --- | --- |
| Model error variance for each parameter estimator | $\Gamma_{ii} = \gamma_i^2$, true model error variance (equation (32)) | $\Gamma_{ii} = \bar{\gamma}^2$, expected model error variance (equation (33)) | $\Gamma_{ii} = \hat{\gamma}^2$, estimator of average model error variance for each replicate (equation (34)) |
| Estimator of variance of observations, $\sigma_i^2$, employed to calculate sampling error associated with at-site quantile estimators, $\Sigma_{ii} = \mathrm{var}(\hat{\theta}_i)$ | $\sigma_i^2$, true value of variance (equation (15)) | $E[\sigma_i^2]$, expected value of variance (equation (28)) | $(E[\hat{\sigma}_i])^2$, square of estimated expected value of standard deviation (equation (29)) |
| Formula employed to compute the variance of at-site quantile estimator, $\Sigma_{ii} = \mathrm{var}(\hat{\theta}_i)$ | exact estimator (equation (18)) | first-order approximation (equation (20)) | first-order approximation (equation (20)) |
| Correlation coefficient used in first-order approximation (equation (21)) to compute covariances among at-site quantile estimators, $\Sigma_{ij} = \mathrm{cov}(\hat{\theta}_i, \hat{\theta}_j)$ | true value | true value | average regional estimate for each replicate |

4.1. Estimation of Sampling Covariance Matrix

The diagonal elements of the sampling covariance matrix, $\Sigma$, correspond to the variance of the at-site quantile estimators, and the off-diagonal elements correspond to the covariance among quantile estimators for different sites. When the individual observations are normally distributed, the correct diagonal elements of the sampling covariance matrix, $\Sigma_{ii}$, are

$$\Sigma_{ii} = \mathrm{var}(\hat{Q}_{p,i}) = \mathrm{var}(\hat{\mu}_i + z_p \hat{\sigma}_i) = \mathrm{var}(\hat{\mu}_i) + z_p^2\, \mathrm{var}(\hat{\sigma}_i) = \frac{\sigma_i^2}{n_i} + z_p^2 \sigma_i^2 \left[ 1 - \left( \frac{2}{n_i - 1} \right) \left( \frac{\Gamma(n_i/2)}{\Gamma((n_i - 1)/2)} \right)^2 \right] \tag{18}$$

Equation (18) incorporates an exact expression for the variance of the at-site estimator of the standard deviation [David, 1981] and is employed in $\Lambda_T$. Stedinger and Tasker employed the first-order approximation

$$\mathrm{var}(\hat{\sigma}_i) = \frac{\sigma_i^2}{2 n_i} \tag{19}$$

for the variance of the at-site estimator of the standard deviation when constructing the sampling covariance matrix [Stedinger, 1983]. Using this approximation, (18) becomes

$$\mathrm{var}(\hat{Q}_{p,i}) = \frac{\sigma_i^2}{n_i} \left[ 1 + \frac{z_p^2}{2} \right] \tag{20}$$

Stedinger and Tasker used a smoothed estimator of the at-site variance of the observations, $\sigma_i^2$, in (20) when constructing their residual covariance, $\Lambda_{MC}$. The idealized residual error covariance matrix, $\Lambda_E$, implements (20) with the expected value of $\sigma_i^2$ for each site $i$.

Stedinger and Tasker also employed a first-order approximation to the covariance among at-site quantile estimators which are the off-diagonal elements of the sampling covariance matrix,

$$\Sigma_{ij} = \frac{\rho_{ij}\, m_{ij}\, \sigma_i \sigma_j}{n_i n_j} \left[ 1 + \frac{\rho_{ij} z_p^2}{2} \right], \qquad i \neq j \tag{21}$$

where $m_{ij}$ is the number of concurrent years of data at sites $i$ and $j$, and $\rho_{ij}$ is the lag zero correlation coefficient between the annual flows at sites $i$ and $j$. The exact expression for the covariance among the at-site quantile estimators is quite complex, so (21) will be employed to compute $\Lambda_T$ and $\Lambda_E$ in the calculations reported here. $\Lambda_T$ and $\Lambda_E$ will employ the true value of the cross-correlation which was constant across the region in Stedinger and Tasker's Monte Carlo experiment. The matrix associated with the GLS estimator implemented by Stedinger and Tasker, $\Lambda_{MC}$, used a regional average of at-site estimators of the correlation coefficient.

In (18) and (20), which describe the sampling variance associated with the at-site quantile estimators, an estimate of the variance of the annual flows, $\sigma_i^2$, is required. The expected variance of the log-space annual flows is

$$E[\sigma_i^2] = \mathrm{var}[\sigma_i] + (E[\sigma_i])^2 \tag{22}$$

The formula for the at-site standard deviation given by (15) assumes that the standard deviation is lognormally distributed. In terms of the logarithm of (15)

$$\ln(\sigma_i) = \ln[\alpha_\sigma + \beta_\sigma \ln(A_i)] + \delta_i \tag{23}$$

the log space moments of the standard deviation are

$$E[\ln(\sigma_i)] = \ln[\alpha_\sigma + \beta_\sigma \ln(A_i)] + E[\delta_i] \tag{24}$$

and

$$\mathrm{var}[\ln(\sigma_i)] = \mathrm{var}[\delta_i] = \sigma_\delta^2 \tag{25}$$

The real-space moments for the lognormal distribution can be calculated using the log space moments [Loucks et al., 1981], resulting in

$$E[\sigma_i] = \alpha_\sigma + \beta_\sigma \ln(A_i) \tag{26}$$

and

$$\mathrm{var}[\sigma_i] = [\alpha_\sigma + \beta_\sigma \ln(A_i)]^2 [\exp(\sigma_\delta^2) - 1] \tag{27}$$

Substituting (26) and (27) into (22) yields

$$E[\sigma_i^2] = [\alpha_\sigma + \beta_\sigma \ln(A_i)]^2 \exp(\sigma_\delta^2) \tag{28}$$

Equation (28) corresponds to the average variance of the annual flows at a site with drainage area $A_i$ and is used in the residual error covariance matrix, $\Lambda_E$. Equation (28) is not the same as the variance of the flows in each replicate because $\delta_i$ in (15) is replaced by an expectation in (28). In this study a Monte Carlo analysis was employed to account for that difference.

In addition, for every replicate of their Monte Carlo experiment, Stedinger and Tasker randomly generated new drainage areas for the region. To account for the variability due to random drainage areas and the difference between $\sigma_i$ in (15)
and $E[\sigma_i^2]$ in (28), 100 random sets of different drainage areas and $\delta_i$ values were generated. In the construction of the true residual error covariance matrix, $\Lambda_T$, the random values of $\delta_i$ and $A_i$ were used to calculate the true value of $\sigma_i$ at each site using (15), and this value was used in (18) for constructing one realization of $\Sigma_{ii}$. This allows calculation of the average value over 100 replicates of the sampling variance of $\hat{\beta}^{T}_{GLS}$ and $\hat{\beta}^{E}_{GLS}$ in (9) and (11), respectively, using generated values of $X$ and associated $\Lambda_T$ and $\Lambda_E$ matrices. Stedinger and Tasker used

$$(E[\hat{\sigma}_i])^2 = (\hat{\alpha}_\sigma + \hat{\beta}_\sigma \ln(A_i))^2 \tag{29}$$

instead of (28) for an estimator of the variance of the observations. This estimator systematically underestimates the variance of the observations, but the bias was small in Stedinger and Tasker's Monte Carlo experiment because $\sigma_\delta^2$ was much smaller than 1, and thus $\mathrm{var}(\sigma_i)$ was much smaller than $(E[\sigma_i])^2$.

4.2. Estimation of Model Error Variance

The residual error covariance matrix also depends upon the model error variance, $\gamma_i^2$, for each site $i$. Assuming the annual flows are lognormally distributed, $\gamma_i^2$ is

$$\gamma_i^2 = \mathrm{var}[\ln(Q_{p,i})] = \mathrm{var}[\mu_i + z_p \sigma_i] = \mathrm{var}[\mu_i] + z_p^2\, \mathrm{var}[\sigma_i] + 2 z_p\, \mathrm{cov}[\mu_i, \sigma_i] \tag{30}$$

The regional model adopted in the experiment had $\mathrm{cov}(\mu_i, \sigma_i) = 0$. From (14)

$$\mathrm{var}[\mu_i] = \mathrm{var}[\nu_i] = \sigma_\nu^2 \tag{31}$$

The $\mathrm{var}(\sigma_i)$ is given in (27). Substituting (27) and (31) into (30) yields

$$\gamma_i^2 = \sigma_\nu^2 + z_p^2 [\alpha_\sigma + \beta_\sigma \ln(A_i)]^2 [\exp(\sigma_\delta^2) - 1] \tag{32}$$

The model error variance is a function of drainage area and thus varies across sites. Stedinger and Tasker [1985] proposed a GLS estimator that uses an average model error variance. This assumption was relatively good in their Monte Carlo experiment because the model error variance varied only slightly among sites. In that study $\sigma_\delta^2 = 0.0625\sigma_\nu^2$, so that the variance of the standard deviation of the observations, $\sigma_\delta^2$, was small compared to the variance of the mean of the observations, $\sigma_\nu^2$, and $\gamma_i^2$ in (32) never varied more than 25% from the average model error variance.

The average value of the model error variance is obtained by taking the expectation over the drainage areas of interest on the right-hand side of (32) to obtain

$$E[\gamma_i^2] = \sigma_\nu^2 + z_p^2 \{ \alpha_\sigma^2 + 2 \alpha_\sigma \beta_\sigma E[\ln(A_i)] + \beta_\sigma^2 E[(\ln(A_i))^2] \} [\exp(\sigma_\delta^2) - 1] \tag{33}$$

Stedinger and Tasker's experiments included six values of $\sigma_\nu$ which correspond to $E[\gamma_i^2] = 0.0$, 0.011, 0.102, 0.284, 0.557, and 0.922. These average values can be used to construct a diagonal model error variance matrix $\Gamma$ with constant elements to compute the residual error covariance matrix $\Lambda_E$.

When implementing the GLS estimator in their Monte Carlo experiment, Stedinger and Tasker computed a generalized mean square error estimator of the average model error variance by solving

$$(\hat{\theta} - X\hat{\beta})^T [\Lambda_{MC}]^{-1} (\hat{\theta} - X\hat{\beta}) = N - k \tag{34}$$

for $\hat{\gamma}^2$ where $\Lambda_{MC} = \hat{\gamma}^2 I_N + \hat{\Sigma}(\hat{\theta})$, $N$ is the number of sites, $k$ is the number of degrees of freedom in the model, and $\hat{\beta}$ is the parameter estimator for each replicate. This estimator of the average model error variance is dependent upon the at-site data and thus varies from replicate to replicate because of different sets of flows and drainage areas in each replicate.

5. Comparison of Stedinger and Tasker's Monte Carlo Results With Analytic Expressions

In their Monte Carlo analysis (experiment 1) Stedinger and Tasker [1985] computed estimates of the variance of the parameter estimators and the average sampling mean square error ($\mathrm{mse}_s$) of the OLS and GLS quantile estimators. The $\mathrm{mse}_s$ of a quantile estimator is the average over sites of the squared difference between the quantile estimator and its true value at such sites. The $\mathrm{mse}_s$ of unbiased quantile estimators is

$$\mathrm{mse}_s = E_{A,\hat{\alpha},\hat{\beta}}\{[\hat{\theta} - \theta]^2\} = \mathrm{var}(\hat{\alpha}) + 2E[\ln(A)]\, \mathrm{cov}(\hat{\alpha}, \hat{\beta}) + E[(\ln(A))^2]\, \mathrm{var}(\hat{\beta}) \tag{35}$$

In this study the Monte Carlo results from Stedinger and Tasker [1985] for the $\mathrm{mse}_s$ of $\hat{\beta}^{MC}_{GLS}$ are compared to calculated values of the $\mathrm{mse}_s$ of $\hat{\beta}^{T}_{GLS}$ and $\hat{\beta}^{E}_{GLS}$ computed using the average values over 100 replicates of the analytic expressions for the variance of the parameter estimators, (9) and (11), respectively. Record lengths of 10, 20, and 50 were randomly assigned to a third of the sites, as in Stedinger and Tasker's first experiment.

For this comparison the efficiencies of the estimators $\hat{\beta}^{MC}_{GLS}$ and $\hat{\beta}^{E}_{GLS}$ are computed as

$$\begin{aligned} \text{Efficiency } \hat{\beta}^{MC}_{GLS} &= \frac{\mathrm{mse}_s[\hat{\beta}^{T}_{GLS}]}{\mathrm{mse}_s[\hat{\beta}^{MC}_{GLS}]} \\ \text{Efficiency } \hat{\beta}^{E}_{GLS} &= \frac{\mathrm{mse}_s[\hat{\beta}^{T}_{GLS}]}{\mathrm{mse}_s[\hat{\beta}^{E}_{GLS}]} \end{aligned} \tag{36}$$

The efficiency of an estimator approaches unity when the $\mathrm{mse}_s$ of an estimator is almost as small as the $\mathrm{mse}_s$ of $\hat{\beta}^{T}_{GLS}$, which has the smallest possible $\mathrm{mse}_s$ for a linear unbiased estimator.

Figure 1a is a plot of these efficiencies over the range of cross correlations and average model error variances examined in Stedinger and Tasker's experiment 1. The first set of four columns correspond to cross correlations of 0.0, 0.3, 0.6, and 0.9, respectively, when the average model error variance $\bar{\gamma}^2 = 0.0$. Other sets of four columns correspond to different values of $\bar{\gamma}^2$. Figure 1a also contains the efficiencies of the OLS estimator $\hat{\beta}^{MC}_{OLS}$ reported by Stedinger and Tasker.

In general, the efficiency of $\hat{\beta}^{E}_{GLS}$ is nearly 100% and the efficiency of $\hat{\beta}^{MC}_{GLS}$ is greater than 90% for the cases examined. The apparent exception is the efficiency for $\hat{\beta}^{MC}_{GLS}$ when $\rho = 0$ and $\bar{\gamma}^2 = 0.0$, in which case the computed efficiency is only 80%. The computed $\mathrm{mse}_s$ of $\hat{\beta}^{MC}_{GLS}$ for this case (0.004) is small, and Stedinger and Tasker reported only one significant digit; thus this low efficiency is likely due to rounding error. The efficiency of $\hat{\beta}^{MC}_{GLS}$ based on the reported variance of the individual parameter estimators was approximately 90% [Kroll, 1996], which further confirms that this low efficiency is due to rounding error.

In the experiment reported in Figure 1a the high efficiencies of $\hat{\beta}^{E}_{GLS}$ indicate that the assumptions Stedinger and Tasker made when simplifying the residual error covariance matrix had relatively little effect on the performance of the estimator
Figure 1. Efficiency of GLS and OLS estimators versus average model error variance for 50-year return period.

compared to an estimator based on knowing the true residual error covariance matrix. The high efficiencies of $\hat{\beta}^{MC}_{GLS}$ indicate that the practical implementation of Stedinger and Tasker’s proposed GLS estimator $\hat{\beta}^{E}_{GLS}$ in the form of $\hat{\beta}^{MC}_{GLS}$ resulted in little reduction in the performance of the estimator, especially when $\bar{\gamma}^2$ was greater than 0.1.

Figure 1a also includes the efficiency of the OLS estimator $\hat{\beta}^{MC}_{OLS}$ compared to the best GLS estimator $\hat{\beta}^{T}_{GLS}$. For large values of $\bar{\gamma}^2$ the efficiency of the OLS estimator is close to the efficiency of $\hat{\beta}^{MC}_{GLS}$. For small $\bar{\gamma}^2$ the efficiency of the OLS estimator drops considerably. For moderate model error variances, as the cross correlation increases, the relative efficiency of the OLS estimator decreases. The efficiency of the OLS estimator depends on whether the elements along the diagonal of $\Lambda$ are close to constant and on the relative magnitude of the off-diagonal elements of $\Lambda$ compared to the diagonal elements. If the off-diagonal elements are relatively small and the diagonal elements are close to constant, the OLS estimator is almost as efficient as the GLS estimator. For small $\bar{\gamma}^2$ the effect of heteroscedasticity due to the sampling error in the at-site quantile estimators is greater than when $\bar{\gamma}^2$ is large, and thus the OLS estimator performs worse when $\bar{\gamma}^2$ is small. Kroll [1996] showed that the $\mathrm{mse}_s$ and variance of the OLS estimator reported in Stedinger and Tasker’s [1985] Monte Carlo analysis, $\hat{\beta}^{MC}_{OLS}$, were nearly identical to the average $\mathrm{mse}_s$ and variance of the OLS estimator using (11), $\hat{\beta}^{T}_{OLS}$, as they should be.

6. Comparison of OLS and GLS Estimators Using Analytic Expressions

Obtaining the $\mathrm{mse}_s$ of the OLS and GLS quantile estimators requires significantly less effort when using the average over 100 replicates of the analytic expressions (equations (7), (9), and (11)) rather than Stedinger and Tasker’s complete Monte Carlo simulation using randomly generated streamflows. Using analytic expressions, we examined how the $\mathrm{mse}_s$ of the OLS parameter estimator $\hat{\beta}^{T}_{OLS}$ compares to that of the GLS parameter estimator $\hat{\beta}^{E}_{GLS}$ for a number of different cases. The previous section demonstrated that $\hat{\beta}^{MC}_{GLS}$ performed almost as well as $\hat{\beta}^{E}_{GLS}$ in Stedinger and Tasker’s Monte Carlo experiment 1.

In experiment 1 the variances of the residual terms in (14) and (15) were related by $\sigma_\delta^2 = 0.0625\,\sigma_\upsilon^2$. This relationship determines the variability in the model error variance across sites, given by (32). In experiment 1 the maximum variation in the model error variance, $\gamma_i^2$, from the average model error variance, $\bar{\gamma}^2$, in the region was 25%. In this case the model error variance was relatively constant across sites, and thus $\hat{\beta}^{E}_{GLS}$, which employs a constant model error variance, performed well compared to $\hat{\beta}^{T}_{GLS}$. Of interest is the relative performance of $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{GLS}$ when the model error variance has greater variability across sites in a region, as well as the effect return period has on the estimators’ performance.

The case where $\sigma_\delta^2 = \sigma_\upsilon^2$ was examined, which yields a maximum variation of $\gamma_i^2$ from $\bar{\gamma}^2$ of 93%. This corresponds to a realistic but perhaps extreme case wherein the relative precision with which the median flood flow and the log space standard deviation of the flood flows can be estimated by their respective models is roughly the same. The same values of $\bar{\gamma}^2$ and $\rho$ examined in Stedinger and Tasker’s experiment 1 were examined for a quantile estimator with a 50-year return period. Figure 1b contains the efficiency of $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{OLS}$ relative to $\hat{\beta}^{T}_{GLS}$ for this case. As $\sigma_\delta^2$ increases relative to $\sigma_\upsilon^2$, the model error variance varies more across sites and we observe a drop in the relative efficiency of both $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{OLS}$. The decrease in efficiency of $\hat{\beta}^{E}_{GLS}$ is larger for cases with larger model error variances, because the heteroscedasticity along the diagonal terms of $\Lambda$ due to variations in the model error variances is greater for these cases. For the cases presented in Figure 1b the efficiency of $\hat{\beta}^{E}_{GLS}$ is generally greater than 90%. This result indicates that the assumption of a constant model error variance in regions where the model error variance varies as much as 90% from the average model error variance only produces a 10% drop in the efficiency of $\hat{\beta}^{E}_{GLS}$ for quantile estimators with a 50-year return period. As the cross correlation increases, the efficiency of $\hat{\beta}^{E}_{GLS}$ increases, since $\hat{\beta}^{E}_{GLS}$ correctly describes the cross correlation between the quantile estimators. For these cases one would expect the efficiency of $\hat{\beta}^{MC}_{GLS}$ to track that of $\hat{\beta}^{E}_{GLS}$ and be somewhere between the efficiency of $\hat{\beta}^{E}_{GLS}$ and the OLS estimator $\hat{\beta}^{T}_{OLS}$.

Also of interest is the effect of return period on the performance of $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{OLS}$. Figures 2a and 2b contain the efficiency of $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{OLS}$ relative to $\hat{\beta}^{T}_{GLS}$ for a quantile estimator with a 2-year return period when $\sigma_\delta^2 = 0.0625\,\sigma_\upsilon^2$ and $\sigma_\delta^2 = \sigma_\upsilon^2$, respectively. With a 2-year return period, $z_p = 0$ in (32), and thus the model error variance is constant across sites

Figure 2. Efficiency of GLS and OLS estimators versus average model error variance for 2-year return period.
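
The values of $z_p$ quoted in the text (0 for a 2-year and 2.326 for a 100-year return period) are consistent with standard normal quantiles evaluated at the nonexceedance probability $p = 1 - 1/T$ of a $T$-year event. The following is a minimal Python sketch under that assumption, using only the standard library; the function name `z_for_return_period` is illustrative and does not come from the paper.

```python
from statistics import NormalDist

def z_for_return_period(T):
    """Standard normal quantile z_p for a T-year event.

    Assumes the convention p = 1 - 1/T (annual nonexceedance probability).
    """
    return NormalDist().inv_cdf(1.0 - 1.0 / T)

# Return periods examined in Figures 1, 2, and 3.
for T in (2, 50, 100):
    print(f"T = {T:3d} years: z_p = {z_for_return_period(T):.3f}")
```

Because $z_p$ grows with the return period, any site-to-site differences that enter the model error variance through $z_p$ in (32) become more pronounced for the 50-year and 100-year quantiles than for the 2-year quantile.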

in a region regardless of the relationship between $\sigma_\delta^2$ and $\sigma_\upsilon^2$. In Figure 2b, where $\sigma_\delta^2 = \sigma_\upsilon^2$, we see a slight loss in efficiency in $\hat{\beta}^{E}_{GLS}$ compared to Figure 2a, for which $\sigma_\delta^2 = 0.0625\,\sigma_\upsilon^2$. This loss is due to errors incurred when modeling the sampling covariance matrix in $\hat{\beta}^{E}_{GLS}$. As $\sigma_\delta^2$ increases relative to $\sigma_\upsilon^2$, the variance of the at-site standard deviation increases, which increases the error in modeling the variance of the flows as $E[\sigma_i^2]$ by (28), as opposed to the true value $\sigma_i^2$ in (15). The relatively small loss in efficiency observed in Figure 2b indicates that modeling the variance of the flows as $E[\sigma_i^2]$ produces only minor loss of efficiency in the estimator $\hat{\beta}^{E}_{GLS}$.

In Figures 2a and 2b we also observe that the loss in efficiency of the OLS estimator $\hat{\beta}^{T}_{OLS}$ for a 2-year return period is much smaller than when the return period was 50 years (Figures 1a and 1b). This is most dramatic for larger model error variances and smaller cross correlations, when the efficiency of $\hat{\beta}^{T}_{OLS}$ is almost as large as the efficiency of $\hat{\beta}^{E}_{GLS}$. This is because with a 2-year return period the model error variance is constant across sites, so for larger average model error variances the diagonal term of the true residual error covariance matrix is nearly homoscedastic. The effect of cross correlation on the efficiency of $\hat{\beta}^{T}_{OLS}$ decreases as the average model error variance increases.

Figures 3a and 3b consider 100-year quantile estimators. For a 100-year return period $z_p = 2.326$, which produces more heteroscedasticity in the model error variance across the region than when the return period is 2 or 50 years. Because $\hat{\beta}^{E}_{GLS}$ models the model error variance as constant across sites, we observe a greater loss in efficiency in $\hat{\beta}^{E}_{GLS}$ as the return period increases and $\sigma_\delta^2$ increases relative to $\sigma_\upsilon^2$, as in Figure 3b, though the efficiency of $\hat{\beta}^{E}_{GLS}$ is always greater than 80% for the cases examined. This result indicates that a GLS estimator which accounts for varying model error variance, as proposed by Tasker and Stedinger [1989], should be implemented when the return period of the quantiles of interest is 100 years or greater if high efficiency is desired, especially when moderate to large average model error variances are present. In Figure 3b we also observe an increased loss in efficiency of $\hat{\beta}^{T}_{OLS}$ because of the increased heteroscedasticity.

7. Conclusions

Stedinger and Tasker [1985] used Monte Carlo simulation to compare the average sampling mean square error ($\mathrm{mse}_s$) of ordinary least squares (OLS) and generalized least squares (GLS) quantile estimators in regional hydrologic regression

Figure 3. Efficiency of GLS and OLS estimators versus average model error variance for 100-year return period.
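
The efficiencies plotted in Figures 1 through 3 compare the sampling error of an estimator with that of the best GLS estimator. As a rough illustration of why heteroscedasticity along the diagonal of the residual error covariance matrix penalizes OLS, the sketch below assumes the simplest possible setting: an intercept-only regression (a regional mean), a diagonal covariance matrix with no cross correlation, and efficiency defined as a ratio of variances. These simplifications and all function names are assumptions made for illustration; they are not the paper's equations (7), (9), and (11).

```python
def gls_variance(diag_cov):
    """Variance of the inverse-variance weighted (GLS) estimate of a mean.

    diag_cov holds the diagonal of an assumed diagonal covariance matrix.
    """
    return 1.0 / sum(1.0 / v for v in diag_cov)

def ols_variance(diag_cov):
    """Variance of the unweighted (OLS) mean under the same error variances."""
    n = len(diag_cov)
    return sum(diag_cov) / n**2

def ols_efficiency(diag_cov):
    """Efficiency of OLS relative to GLS, taken here as a variance ratio."""
    return gls_variance(diag_cov) / ols_variance(diag_cov)

# Hypothetical error variances at four sites.
homoscedastic = [1.0, 1.0, 1.0, 1.0]
heteroscedastic = [0.25, 0.5, 1.0, 2.0]

print("constant diagonal:", round(ols_efficiency(homoscedastic), 3))
print("varying diagonal: ", round(ols_efficiency(heteroscedastic), 3))
```

In this simplified setting the ratio equals one when the diagonal is constant and falls below one as the variances spread out, which mirrors the qualitative behavior described for the OLS estimator in the text.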

analyses. Stedinger and Tasker made a number of simplifying assumptions regarding the structure of the model error variance and the sampling error associated with at-site quantile estimators when constructing their residual error covariance matrix. They used smoothed estimators of the at-site variance of the flows, adopted an average regional cross correlation, and employed a single average generalized mean square model variance estimator in their ideal residual error covariance matrix, $\Lambda_E$. In practice, all of these parameters and statistics had to be estimated, so the residual error covariance matrix actually employed, $\Lambda_{MC}$, differs from $\Lambda_E$. Both of these matrices are different from the true residual error covariance matrix, $\Lambda_T$, associated with the underlying model in their Monte Carlo experiment.

The variance of the GLS parameter estimator implemented by Stedinger and Tasker in their Monte Carlo experiment, $\hat{\beta}^{MC}_{GLS}$, was compared to the variance of GLS parameter estimators based on the residual error covariance matrices $\Lambda_E$ and $\Lambda_T$, denoted $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{GLS}$, respectively. Since the parameter estimator $\hat{\beta}^{T}_{GLS}$ is based on $\Lambda_T$, this estimator is unbiased and has minimum variance among all linear unbiased estimators. Using the average over 100 replicates of the new analytic expressions for the $\mathrm{mse}_s$ of $\hat{\beta}^{E}_{GLS}$ and $\hat{\beta}^{T}_{GLS}$, it was shown that the $\mathrm{mse}_s$ of $\hat{\beta}^{E}_{GLS}$ is almost indistinguishable from the $\mathrm{mse}_s$ of $\hat{\beta}^{T}_{GLS}$ for the cases considered by Stedinger and Tasker. For these cases the approximations employed to obtain a smoothed covariance matrix would result in almost no loss of efficiency. In addition, the parameter estimator implemented by Stedinger and Tasker, $\hat{\beta}^{MC}_{GLS}$, had an $\mathrm{mse}_s$ almost as small as that of $\hat{\beta}^{E}_{GLS}$. In most cases the difference was less than 10%. Thus for this case the difference in efficiency between $\hat{\beta}^{T}_{GLS}$, which is the best linear unbiased GLS estimator one could implement, and $\hat{\beta}^{MC}_{GLS}$, which represents the practical GLS estimator employed by Stedinger and Tasker, is relatively small.

The $\mathrm{mse}_s$ of the OLS estimator from Stedinger and Tasker’s [1985] Monte Carlo experiment, $\hat{\beta}^{MC}_{OLS}$, was also compared to the GLS estimators. For large model error variances, the efficiency of the OLS estimator is close to the efficiency of $\hat{\beta}^{MC}_{GLS}$, but for a small model error variance the efficiency of the OLS estimator drops considerably. For moderate model error variances, the relative efficiency of the OLS estimator decreases as the cross correlation increases.

Using analytic expressions for the $\mathrm{mse}_s$, the performance of $\hat{\beta}^{T}_{OLS}$ and $\hat{\beta}^{E}_{GLS}$ relative to $\hat{\beta}^{T}_{GLS}$ was compared for a number of cases not considered by Stedinger and Tasker [1985]. In particular, a model with a more heteroscedastic model error variance was considered for return periods of 2, 50, and 100 years. For models where the model error variance varied considerably across sites, some loss in efficiency in $\hat{\beta}^{E}_{GLS}$ was observed (up to 20%). It was shown that the loss in efficiency of $\hat{\beta}^{E}_{GLS}$ resulted from using an average model error variance and not from smoothing the sampling covariance matrix. The loss of efficiency was particularly apparent for large return periods and moderate to large average model error variances. In this case a GLS estimator which accounts for varying model error variance, such as that developed by Tasker and Stedinger [1989], should be implemented. They used a three-parameter error model to account for correlation between the log space means and standard deviations of the flood flows.

Overall, for a small return period and moderate to large model error variance, the OLS estimator $\hat{\beta}^{T}_{OLS}$ performed nearly as well as $\hat{\beta}^{E}_{GLS}$, especially when the cross correlation of the flows was small. In this case an OLS estimator, which is much easier to implement than a GLS estimator, could be implemented with little or no loss in efficiency. One should note that the efficiency of the $\beta$ estimators is only one of the advantages of GLS procedures: Stedinger and Tasker [1985] observe that GLS estimators also provide more accurate estimators of model error variances and the precision of estimated parameters than do OLS analyses.

Acknowledgment. This material is based upon work partially supported by the Cooperative State Research Service, USDA, under project 97CRMS06102.

References

Benson, M. A., Factors influencing the occurrence of floods in a humid region of diverse terrain, U.S. Geol. Surv. Water Supply Pap. 1580-B, 61 pp., 1962.
David, H. A., Order Statistics, John Wiley, New York, 1981.
Draper, N. R., and H. Smith, Applied Regression Analysis, John Wiley, New York, 1981.
Greene, W., Econometric Analysis, Macmillan, New York, 1990.
Jennings, M. E., W. O. Thomas, and H. C. Riggs, Nationwide summary of U.S. Geological Survey regional regression equations for estimating magnitude and frequency of floods for ungaged sites, 1993, U.S. Geol. Surv. Water Resour. Invest. Rep. 94-4002, 1994.
Johnston, J., Econometric Methods, McGraw-Hill, New York, 1984.
Kroll, C. N., Censored data analyses in water resources, Ph.D. thesis, School of Civ. and Environ. Eng., Cornell Univ., Ithaca, N. Y., 1996.
Loucks, D. P., J. R. Stedinger, and D. A. Haith, Water Resource Systems Planning and Analysis, Prentice-Hall, Englewood Cliffs, N. J., 1981.
Matalas, N. C., and E. J. Gilroy, Some comments on regionalization in hydrologic studies, Water Resour. Res., 4(6), 1361–1369, 1968.
Moss, M. E., and G. D. Tasker, An intercomparison of hydrological network-design technologies, J. Hydrol. Sci., 36(3), 209–221, 1991.
Riggs, H. C., Low-Flow Investigations: U.S. Geological Survey Techniques of Water Resource Investigations, book 4, chap. B1, U.S. Geol. Surv., Denver, Colo., 1972.
Stedinger, J. R., Estimating a regional flood frequency distribution, Water Resour. Res., 19(2), 503–510, 1983.
Stedinger, J. R., and G. D. Tasker, Regional hydrologic analysis, 1, Ordinary, weighted, and generalized least squares compared, Water Resour. Res., 21(9), 1421–1432, 1985. (Correction, Water Resour. Res., 22(5), 844, 1986.)
Stedinger, J. R., and G. D. Tasker, Regional hydrologic analysis, 2, Model-error estimators, estimation of sigma and log-Pearson type 3 distribution, Water Resour. Res., 22(10), 1487–1499, 1986.
Tasker, G. D., Hydrologic regression and weighted least squares, Water Resour. Res., 16(6), 1107–1113, 1980.
Tasker, G. D., and J. R. Stedinger, An operational GLS model for hydrologic regression, J. Hydrol., 111, 361–375, 1989.
Tasker, G. D., S. A. Hodge, and C. S. Barks, Region of influence regression for estimating the 50-year flood at ungaged sites, Water Resour. Bull., 32(1), 163–170, 1996.
Thomas, D. M., and M. A. Benson, Generalization of streamflow characteristics from drainage-basin characteristics, U.S. Geol. Surv. Water Supply Pap. 1975, 1970.
Thomas, M. P., and M. A. Cervione, A proposed streamflow data program for Connecticut, Conn. Water Resour. Bull., 23, 1970.
Vogel, R. M., and C. N. Kroll, Generalized low-flow frequency relationships for ungaged sites in Massachusetts, Water Resour. Bull., 26(2), 241–253, 1990.

C. N. Kroll, Environmental Resources and Forest Engineering, College of Environmental Science and Forestry, State University of New York, Syracuse, NY 13210-2778. (e-mail: [email protected])
J. R. Stedinger, School of Civil and Environmental Engineering, Cornell University, Ithaca, NY 14850-3501.

(Received August 20, 1996; accepted September 22, 1997.)