
Article

Impact of Dataset Size on the Signature-Based Calibration of a Hydrological Model

Safa A. Mohammed 1,2, Dimitri P. Solomatine 2,3,4, Markus Hrachowitz 3 and Mohamed A. Hamouda 1,5,*

1 Department of Civil and Environmental Engineering, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates; [email protected]
2 IHE-Delft, Institute for Water Education, P.O. Box 3015, 2601 DA Delft, The Netherlands; [email protected]
3 Water Resources Section, Faculty of Civil Engineering and Applied Geosciences, Delft University of Technology, P.O. Box 5048, 2628 CN Delft, The Netherlands; [email protected] (D.P.S.); [email protected] (M.H.)
4 Water Problems Institute of RAS (Russian Academy of Sciences), 119991 Moscow, Russia
5 National Water and Energy Center, United Arab Emirates University, Al Ain P.O. Box 15551, United Arab Emirates
* Correspondence: [email protected]; Tel.: +971-3-713-5155

Abstract: Many calibrated hydrological models are inconsistent with the behavioral functions of catchments and do not fully represent the catchments’ underlying processes despite their seemingly adequate performance, if measured by traditional statistical error metrics. Using such metrics for calibration is hindered if only short-term data are available. This study investigated the influence of varying lengths of streamflow observation records on model calibration and evaluated the usefulness of a signature-based calibration approach in conceptual rainfall-runoff model calibration. Scenarios of continuous short-period observations were used to emulate poorly gauged catchments. Two approaches were employed to calibrate the HBV model for the Brue catchment in the UK. The first approach used single-objective optimization to maximize Nash–Sutcliffe efficiency (NSE) as a goodness-of-fit measure. The second approach involved multiobjective optimization based on maximizing the scores of 11 signature indices, as well as maximizing NSE. In addition, a diagnostic model evaluation approach was used to evaluate both model performance and behavioral consistency. The results showed that the HBV model was successfully calibrated using short-term datasets with a lower limit of approximately four months of data (10% FRD model). One formulation of the multiobjective signature-based optimization approach yielded the highest performance and hydrological consistency among all parameterization algorithms. The diagnostic model evaluation enabled the selection of consistent models reflecting catchment behavior and allowed an accurate detection of deficiencies in other models. It can be argued that signature-based calibration can be employed for building adequate models even in data-poor situations.

Keywords: HBV model; hydrological signatures; multiobjective optimization; diagnostic evaluation approach; dataset size; lumped model calibration; poorly gauged catchments; Brue catchment

Citation: Mohammed, S.A.; Solomatine, D.P.; Hrachowitz, M.; Hamouda, M.A. Impact of Dataset Size on the Signature-Based Calibration of a Hydrological Model. Water 2021, 13, 970. https://doi.org/10.3390/w13070970

Academic Editor: Xing Fang. Received: 8 March 2021; Accepted: 29 March 2021; Published: 31 March 2021.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Model calibration in a hydrological modeling context entails finding the most appropriate set of parameters to obtain the model outputs that best resemble the observed system’s behavior. Model calibration can be performed manually; however, this is an inefficient method because it is time-consuming and depends on the modeler’s experience. Therefore, much effort has been made over the past decades to develop effective and efficient calibration methods such as automated (computer-based) calibration, especially in view of advances in computer technology and algorithmic support for solving optimization problems [1,2]. Various metrics are used in model calibration. The most widely used


metrics are borrowed from classical statistical approaches, such as minimizing squared residuals (the difference between the observations and model simulation outputs), maximizing the correlation coefficient, or aggregating several metrics, as in the Kling–Gupta efficiency [1,3–5].

The calibration of a hydrological model requires multiobjective optimization because no single metric can fully describe a simulation error distribution [6,7]. For the past two decades, evolutionary-based multiobjective optimization has been used for hydrological models [8]. Multiobjective optimization has a broad range of applications in engineering and water-resource management, particularly in hydrological simulations [8,9]. For an introduction to this topic, readers are directed, e.g., to Efstratiadis and Koutsoyiannis (2010), who reviewed several case studies that included multiobjective applications in hydrology [10]. In hydrological model calibration, multiobjective optimization can trade off between conflicting calibration objectives and can define a solution corresponding to the Pareto front’s knee point, which is nearest to the optimum point and can be considered the best individual tradeoff solution [11,12].

Many researchers have argued that the calibration of rainfall-runoff models should not be limited to ensuring the fitness of model simulations to observations; it should also reproduce other hydrological variables to ensure robust model performance and consistency [13]. Martinez and Gupta (2011) approached the concept of hydrological consistency by recommending that the model structures and parameters produced by classical maximum likelihood estimation be constrained to replicate the hydrological features of the targeted process [14]. Euser et al. (2013) defined consistency as “the ability of a model structure to adequately reproduce several hydrological signatures simultaneously while using the same set of parameter values” [13].
To improve model calibration, hydrological signatures as objective functions have received increasing attention over the last decade [7,15–19]. Hydrological signatures reflect the functional behavior of a catchment [20], allowing the extraction of maximum information from the available data [21–26]. New model calibration metrics are continuously being developed to identify optimal solutions that are more representative of the employed hydrological signatures [7,16,17,19,27–29]. However, to the best of our knowledge, Shafii and Tolson (2015) were the first to consider numerous hydrological signatures with multiple levels of acceptability in the context of full multiobjective optimization to calibrate several models; they demonstrated the superiority of this approach over approaches based on optimizing residual-based measures [29].

Streamflow records of several years are necessary for calibrating hydrological models [30], making hydrological modeling challenging for poorly gauged catchments or in situations with considerable data gaps. Several studies have investigated the possibility of using both limited continuous and discontinuous periods of streamflow observations to calibrate hydrological models [31–43]. Tada and Beven (2012) proposed an effective method to extract information from short observation periods in three Japanese basins. They examined calibration periods spanning 4–512 days with randomly selected starting days and reported varying performances, concluding that pre-identifying the well-performing short periods in ungauged basins is challenging [37]. Sun et al. (2017) obtained performances similar to that of the full-length dataset model when calibrating a physically based distributed model with limited continuous daily streamflow records (less than one year) in data-sparse basins [32].
Previous studies have explored discontinuous streamflow data on two bases, which Reynolds et al. (2020) classified into two categories: in the first, the available data are limited to isolated spot measurements of discharge; in the second, continuous records of discharge are available but only for a few events [39]. In the context of the first category, Perrin et al. (2007) achieved robust parameter values for two rainfall-runoff models by randomly sampling 350 discontinuous calibration days, including dry and wet conditions, in climatically and hydrologically diverse basins in the USA [38]. They concluded that stable parameter values are harder to achieve in the driest catchments. Pool et al. (2017) investigated an optimal strategy for sampling runoff to constrain a rainfall-runoff model using only 12 daily runoff measurements in

one year in 12 basins in temperate and snow-covered regions throughout the east of the USA. They found that sampling strategies comprising high-flow magnitudes result in better hydrograph simulations, whereas strategies comprising low-flow magnitudes result in better flow duration curve (FDC) simulations [43]. In the context of the second category, Seibert and McDonnell (2015) investigated the significance of limited streamflow observations and soft data in the Maimai basin (New Zealand) [40]. They found that 10 discharge records sampled from high flows and used to inform the calibration of a simple rainfall-runoff model yield results similar to those obtained using three months of continuous discharge data for calibration. Reynolds et al. (2020) examined the hypothesis that limited flood-event hydrographs in a humid basin are adequate to calibrate a rainfall-runoff model. Their results indicate that two to four calibration events can substantially improve flood predictions in terms of accuracy and uncertainty reduction; however, adding more events resulted in limited performance improvements [39].

In the context of signature satisfaction and the absence of time series, Gharari et al. (2014) proposed an alternative approach in which parameter identification was based on constraints from prior experience (or professional knowledge). The parameter search algorithm was based on random search (stepwise Monte Carlo sampling) under these constraints. The parameter sets selected using this approach led to a consistent model, along with the potential to reproduce the functional behavior of the catchment [44]. In our study, we evaluate the hypothesis that using several hydrological signatures as objective functions will directly improve the calibration process in situations of limited data.
To conclude, much of the previous research has focused on identifying and evaluating the minimum data requirements for model calibration, on methods to find the most informative sections of hydrographs, and on approaches for optimal sampling strategies. This study predominantly focuses on the development and evaluation of a signature-based model calibration approach that incorporates several hydrological signatures to guide the parameter search toward regions of hydrological consistency in the search space under various data-availability scenarios. Different setups of the multiobjective signature-based (MO-SB) optimization approach are compared to single-objective (SO) calibration using traditional error metrics. The focus is primarily on cases where only continuous short-period observations are available. This study provides a practical solution for successfully calibrating conceptual rainfall-runoff models, in terms of both performance and consistency, for poorly gauged catchments.

2. Study Area and Datasets

Selection of the case study was driven by the necessity of having enough observational data to conduct experiments with a progressive reduction of data availability. The Brue catchment in the UK was chosen. It covers an area of 135 km2 in the southwest of England, starting from Brewham and ending at Burnham-on-Sea, with the outlet at Lovington. The catchment is covered by three weather radars and a dense network of rain gauges. Researchers have comprehensively studied the area for rainfall-runoff modeling and for precipitation forecasting, especially during the hydrological radar experiment [27,45–47]. The catchment is predominantly characterized by some low hills, discontinuous groups of rocks under clay soils, and major grasslands. Figure 1 shows the topography of the catchment and the outlet’s location.

Hourly precipitation data and discharge data from the Lovington gauging station are used in this study. Data were obtained from radar and rainfall gauges with a resolution of 15 min; in addition, data of potential evapotranspiration, computed using a modified Penman method recommended by the Food and Agricultural Organization, were obtained using automatic weather station data (temperature, solar radiation, humidity, wind speed) [48].

Figure 1. Brue catchment.

Three years and four months of hourly data from 1 September 1993 to 31 December 1996 were selected as the full dataset (FD) to calibrate the model, and one year and almost one month of data from 1 June 1997 to 30 June 1998 was selected as the validation dataset [45].

3. Methodology

The procedure for a signature-based model calibration followed sequential steps (Figure 2). The following subsections provide details on the implemented procedure.

Figure 2. Steps followed in the methodology.


3.1. Selection of Hydrological Signatures

The literature uses numerous hydrological signatures, whether in hydrological model evaluation and calibration or in catchment classification [20,29,49]. This study follows the guidelines for signature selection suggested by McMillan et al. (2017) [50]. The signatures were derived from the available time-series data as the basis of the analysis. The selection process yielded 11 hydrological signatures, listed in Table 1: three signatures extracted from three segments of the FDC, four signatures related to streamflow and precipitation, and four signatures characterizing the discharge statistics. The selected signatures have a distinct link to the hydrological process, leading to a better understanding of the catchment’s functional behavior. Moreover, their scales do not depend on the catchment size, as they represent different parts of the flow hydrograph.

Table 1. Summary of hydrological and statistical signatures used in the study.

FHV | High-flow segment volume of the flow duration curve (FDC)
  Equation: FHV = Σ_{h=1}^{H} Q_h
  Comments: h = 1, 2, 3, ..., H are the indices of high flows; their probability of exceedance is <0.02.
  References: [51]

FLV | Low-flow segment volume of the flow duration curve (FDC)
  Equation: FLV = −1 × Σ_{l=1}^{L} [log Q_l − log Q_L]
  Comments: l = 1, 2, ..., L are the indices of low flows; their probability of exceedance is between 0.7 and 1.0 (L is the index of the minimum flow).
  References: [51]

FMS | Medium-flow segment of the flow duration curve (FDC)
  Equation: FMS = log Q_{m1} − log Q_{m2}
  Comments: m1 and m2 are the lowest and highest flow exceedance probabilities within the mid-segment of the FDC (0.2 and 0.7, respectively, in this study).
  References: [51]

I_BF | Baseflow index
  Equation: QD_t = C·QD_{t−1} + ((1 + C)/2)·(Q_t − Q_{t−1}); QB_t = Q_t − QD_t; I_BF = Σ_{t=1}^{N} QB_t / Σ_{t=1}^{N} Q_t
  Comments: QD_t is the filtered surface runoff at time step t, Q_t is the total flow (original streamflow) at time step t, QB_t is the baseflow at time step t, C is the filter parameter (0.925), I_BF is the baseflow index, and N is the total number of time steps of the study period.
  References: [20,52]

R_QP | Runoff ratio
  Equation: R_QP = Q̄/P̄
  Comments: R_QP is the runoff ratio, Q̄ is the long-term average streamflow, and P̄ is the long-term average precipitation.
  References: [20,49,51]

RLD | Rising limb density
  Equation: RLD = N_RL/T_R
  Comments: N_RL is the number of rising limbs (number of peaks of the hydrograph) and T_R is the total time that the hydrograph is rising.
  References: [13,20,49,53]

E_QP | Streamflow elasticity
  Equation: E_QP = (dQ/Q)/(dP/P) = (dQ/dP)·(P̄/Q̄); E_QP = median[((Q_t − Q̄)/(P_t − P̄))·(P̄/Q̄)]
  Comments: dQ/Q is the proportional change in streamflow, dP/P is the proportional change in precipitation, Q_t and P_t are streamflow and precipitation, respectively, at time step t, and Q̄ and P̄ are the long-term means of streamflow and precipitation, respectively.
  References: [20,54]

Q_mean | Mean discharge
  Equation: Q_mean = Σ_{t=1}^{N} Q_t / N
  Comments: Q_t is the streamflow at time step t and N is the total number of time steps of the study period.
  References: [29,55,56]

Q_median | Median discharge
  Equation: Q_median = M(Q)
  References: [29]

DV(Q) | Discharge variance
  Equation: DV(Q) = sqrt( Σ_{t=1}^{N} (Q_t − Q̄)² / (N − 1) )
  Comments: Q_t is the streamflow at time step t, Q̄ is the mean streamflow, and N is the total number of time steps of the study period.
  References: [29]

Q_peak | Peak discharge
  Equation: Q_peak = P(Q)
  Comments: P(Q) is the peak of the streamflow data.
  References: [29]
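Several of the signatures in Table 1 translate directly into code. The sketch below, assuming NumPy arrays of equal length and the filter parameter C = 0.925 from Table 1, illustrates FHV, the runoff ratio, and the baseflow index; clipping negative quickflow to zero is a practical assumption of this sketch, not a step stated in the table.

```python
import numpy as np

def fhv(q, exceed=0.02):
    """High-flow segment volume (FHV): sum of flows whose exceedance probability is < 0.02."""
    q_desc = np.sort(q)[::-1]                       # flows in descending order
    n_high = max(1, int(np.ceil(exceed * len(q))))  # number of flows in the high-flow segment
    return float(q_desc[:n_high].sum())

def runoff_ratio(q, p):
    """R_QP: long-term mean streamflow divided by long-term mean precipitation."""
    return float(q.mean() / p.mean())

def baseflow_index(q, c=0.925):
    """I_BF via the one-parameter recursive digital filter of Table 1 (C = 0.925)."""
    qd = np.zeros_like(q, dtype=float)              # filtered surface runoff (quickflow)
    for t in range(1, len(q)):
        qd[t] = c * qd[t - 1] + (1 + c) / 2 * (q[t] - q[t - 1])
    qb = q - np.maximum(qd, 0.0)                    # baseflow; negative quickflow clipped (assumption)
    return float(qb.sum() / q.sum())
```

For a constant hydrograph the quickflow filter stays at zero, so the baseflow index is 1; flashier hydrographs push it toward 0.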

3.2. Data Setup

To meet the study objective, several scenarios were created to obtain different dataset sizes representing various levels of information deficiency. The following steps were followed to set up the data.

1. Select a long-period dataset as an FD for model calibration (benchmark dataset) (Table 2);
2. Select an additional dataset for model validation (Table 2);
3. Divide the FD into partial datasets progressively decreasing in size, from long-term to short-term data, using four scenarios (Table 2):
   1. Scenario 1: Each new data subset is composed by removing a certain amount of data (a certain percentage, e.g., 25% of the total data) from the end of the FD (Figure 3);
   2. Scenario 2: The new data subset is created by removing an equal amount of data from both the start and end of the FD (Figure 3);
   3. Scenario 3: A section of the FD represents a short continuous dry period (no precipitation);
   4. Scenario 4: A section of the FD represents a short continuous wet period (frequent and intensive precipitation).
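The subset construction for scenarios 1 and 2 reduces to index arithmetic on the full record. The function below is a minimal illustration (a hypothetical helper, not the study's code) of building Fraction Retained Datasets from any sequence of records.

```python
def frd_subsets(fd, fractions=(0.75, 0.50, 0.25, 0.10, 0.05)):
    """Build Fraction Retained Datasets (FRDs) from the full dataset `fd` (a sequence of records).

    Scenario 1 keeps the head of the record (trims the end only);
    scenario 2 trims an equal amount from both the start and the end.
    """
    n = len(fd)
    scenario1, scenario2 = {}, {}
    for f in fractions:
        keep = round(n * f)                     # number of records retained
        scenario1[f] = fd[:keep]                # scenario 1: drop the tail
        start = (n - keep) // 2
        scenario2[f] = fd[start:start + keep]   # scenario 2: drop head and tail equally
    return scenario1, scenario2
```

Applied to the 29,232-record FD, f = 0.75 retains 21,924 records, matching the 75% FRD rows of Table 2.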

Table 2. Datasets used in the study.

Dataset | Date (from–to) | Number of Data Records
FD | 1 September 1993 00:00–31 December 1996 23:00 | 29,232
Validation dataset | 1 June 1997 01:00–30 June 1998 23:00 | 9478
Scenario 1
75% FRD | 1 September 1993 00:00–2 March 1996 11:00 | 21,924
50% FRD | 1 September 1993 00:00–2 May 1995 22:00 | 14,615
25% FRD | 1 September 1993 00:00–2 July 1994 11:00 | 7308
10% FRD | 1 September 1993 00:00–31 December 1993 23:00 | 2928
5% FRD | 1 September 1993 00:00–31 October 1993 21:00 | 1462
Scenario 2
75% FRD | 31 January 1994 05:00–1 August 1996 17:00 | 21,924
50% FRD | 2 July 1994 10:00–2 March 1996 11:00 | 14,615
25% FRD | 1 December 1994 15:00–2 October 1995 05:00 | 7308
10% FRD | 2 March 1995 22:00–2 July 1995 21:00 | 2928
5% FRD | 2 April 1995 08:00–2 June 1995 10:00 | 1462
Scenario 3
Dry-period dataset | 1 June 1994 00:00–1 August 1994 23:00 | 1488
Scenario 4
Wet-period dataset | 27 December 1994 10:00–1 February 1995 13:00 | 868

Figure 3. Example of two scenarios to obtain the 75% Fraction Retained Dataset (FRD) of the full dataset (FD): scenario 1 (left) and scenario 2 (right).




3.3. HBV Model Setup

HBV is a conceptual model that can simulate runoff in different climate zones using precipitation, temperature, and potential evapotranspiration as inputs; it was developed by the Swedish Meteorological and Hydrological Institute and has been applied in more than 30 countries [57,58]. Several variants of the model have been suggested, e.g., HBV-Light by Seibert (1997) [59] and HBV-96 by Lindström et al. (1997) [60]. The model comprises several routines, namely, precipitation, snow, soil, response, and routing routines. Table 3 presents the HBV parameters that are calibrated herein using the following methodology.

Table 3. Parameters of the HBV model targeted for calibration in this study.

Parameter | Explanation | Unit

Precipitation Routine
LTT | Lower temperature threshold | °C
UTT | Upper temperature threshold | °C
RFCF | Rainfall corrector factor | –
SFCF | Snowfall corrector factor | –

Snow Routine
CFMAX | Day degree factor | mm °C−1 h−1
TTM | Temperature threshold for melting | °C
CFR | Refreezing factor | –
CWH | Water holding capacity | –

Soil and Evaporation Routine
FC | Maximum soil moisture | mm
ETF | Total potential evapotranspiration | mm h−1
LP | Soil moisture threshold for evaporation reduction (wilting point) | –
E_CORR | Evapotranspiration corrector factor | –
BETA | Shape coefficient | –
C_FLUX | Capillary flux in the root zone | mm h−1

Response Routine
K | Upper zone recession coefficient | h−1
K1 | Lower zone recession coefficient | h−1
PERC | Maximum percolation rate from the upper to the lower tank | mm h−1
ALPHA | Response box parameter | –

Routing Routine
MAXBAS | Routing, length of weighting function | h
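For automated calibration, the 19 parameters of Table 3 can be represented as a name-to-bounds mapping from which candidate parameter sets are sampled. All numeric ranges below are illustrative placeholders, not the bounds used in this study.

```python
import random

# Illustrative search bounds for the 19 HBV parameters of Table 3.
# Every numeric range here is a hypothetical placeholder, not a value from the study.
HBV_BOUNDS = {
    "LTT": (-3.0, 0.0),    "UTT": (0.0, 3.0),     "RFCF": (0.5, 1.5),
    "SFCF": (0.4, 1.2),    "CFMAX": (0.01, 0.4),  "TTM": (-2.0, 2.0),
    "CFR": (0.0, 0.1),     "CWH": (0.0, 0.2),     "FC": (50.0, 500.0),
    "ETF": (0.0, 0.3),     "LP": (0.3, 1.0),      "E_CORR": (0.5, 1.5),
    "BETA": (1.0, 6.0),    "C_FLUX": (0.0, 0.2),  "K": (0.001, 0.1),
    "K1": (0.0005, 0.05),  "PERC": (0.0, 0.5),    "ALPHA": (0.1, 3.0),
    "MAXBAS": (1.0, 24.0),
}

def random_parameter_set(rng=random):
    """Draw one candidate parameter set uniformly within the bounds (randomized search)."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in HBV_BOUNDS.items()}
```

A mapping like this doubles as the constraint definition for the optimizers described in Section 3.4: the keys name the decision variables and the tuples are their box constraints.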

3.4. Model Calibration Approaches

The calibration comprises two approaches: single-objective (SO) optimization and multiobjective signature-based (MO-SB) optimization.

3.4.1. Formulation of the SO Optimization Approach

In the SO approach, a constrained SO optimization algorithm is used to maximize the Nash–Sutcliffe efficiency (NSE), equivalent to minimizing the mean-squared error divided by the observation variance, as a goodness-of-fit measure. Nineteen parameters of the HBV model are the decision variables of the optimization problem. Their upper and lower boundaries are the constraints. The calibration approach is first used to calibrate the benchmark model (using the full dataset, FD) and then implemented for the datasets in the four scenarios (calibration of 12 datasets). The initial states of the models differ from one model to another, making it necessary to obtain the initial states of each model (which can be done by randomized search or simply by trial and error) at the beginning of the modeling process.

The Augmented Lagrangian Harmony Search Optimizer (ALHSO) algorithm (belonging to the class of randomized search algorithms) from the pyOpt Python library was used

to solve the optimization problem. It has been applied in complex and continuous problems, such as calibrating hydrologic models, and is efficient [61]. The ALHSO algorithm is suitable for solving an SO optimization problem and has few control parameters, without the need to set initial values of the decision variables [10,62].
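The SO formulation (maximize NSE over bounded decision variables) can be illustrated end to end. The sketch below substitutes a two-parameter toy linear reservoir for HBV and plain bounded random search for ALHSO; the toy model, its parameters, and the sample budget are all illustrative stand-ins, not the study's setup.

```python
import numpy as np

def nse(q_obs, q_sim):
    """Nash-Sutcliffe efficiency: 1 minus SSE over the variance of the observations."""
    return 1.0 - np.sum((q_obs - q_sim) ** 2) / np.sum((q_obs - q_obs.mean()) ** 2)

def toy_model(params, rain):
    """Two-parameter linear reservoir: a hypothetical stand-in for HBV."""
    k, scale = params
    q, store = np.zeros_like(rain), 0.0
    for t, r in enumerate(rain):
        store += scale * r          # recharge the reservoir
        q[t] = k * store            # linear outflow
        store -= q[t]
    return q

rng = np.random.default_rng(42)
rain = rng.gamma(0.3, 2.0, size=500)
q_obs = toy_model((0.2, 0.8), rain)             # synthetic "observations"

# Bounded random search maximizing NSE (ALHSO in the study; randomized search here).
bounds_lo, bounds_hi = np.array([0.01, 0.1]), np.array([0.99, 2.0])
best_p, best_nse = None, -np.inf
for _ in range(5000):
    p = rng.uniform(bounds_lo, bounds_hi)       # sample within the box constraints
    score = nse(q_obs, toy_model(p, rain))
    if score > best_nse:
        best_p, best_nse = p, score
```

With the true parameters inside the bounds, the best sampled NSE approaches 1 as the sample budget grows; the same loop structure applies unchanged to any number of decision variables.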

3.4.2. Formulation of the MO-SB Optimization Approach

In this approach, the calibration problem is solved by evaluating the extent of signature achievement in the parameter search process executed by an optimization algorithm. Signature achievement is measured by computing the signature score function, which compares the simulated and measured signatures. The multiobjective optimization problem was solved by maximizing 16 objective functions, comprising 15 individual hydrological signature score functions, each with a certain level of acceptability (threshold), and the NSE. The decision variables and constraints are the same as in the first approach (SO). The initial model states are known in this phase from the first experiment and can be reused. The nondominated sorting genetic algorithm (NSGA-II) [63] from the inspyred Python library was used to solve the optimization problem. NSGA-II is a multiobjective evolutionary algorithm belonging to the class of randomized search methods. NSGA-II has advantageous features compared to other multiobjective constrained optimizers regarding convergence to Pareto optimal solutions and ensuring their good spread in decision spaces, and it performs well in constrained problems [64].

The observed and simulated hydrological signatures were calculated for observations and model simulations, respectively. The signature deviations (Dev) between them were calculated individually (Equation (1)), consistent with past studies [29,51,65], and transferred to scores (normalizing values) using binary functions (Equation (3)). The idea of the binary score function is based on defining thresholds (+ or −) for the acceptable values of signatures (Equation (2)), meaning that if the value of the deviation is within the limits, the score equals 1 and, if not, the score equals 0, as implemented by [29,66].

Dev = (Signature_Observed − Signature_Simulated) / Signature_Observed    (1)

±Dev_threshold = ±(Acceptability threshold × Signature_Observed) / Signature_Observed    (2)

Score = 1 if |Dev| ≤ |±Dev_threshold|; Score = 0 if |Dev| > |±Dev_threshold|    (3)

In this study, two acceptability thresholds were used (10% and 20% deviation), similar to previous research [29]. In addition, to explore the algorithm's convergence (speed) and the diversity in Pareto optimal sets, two crossover types were implemented in setting up NSGA-II: the blend crossover (BC) and the uniform crossover (UC). In total, four parameterization algorithms were formulated and coded in Python:
• 10% acceptability threshold and BC, MO-BC (10%);
• 10% acceptability threshold and UC, MO-UC (10%);
• 20% acceptability threshold and BC, MO-BC (20%);
• 20% acceptability threshold and UC, MO-UC (20%).
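Equations (1)–(3) amount to a relative-deviation test against a fixed tolerance; note that the ratio defining the threshold in Equation (2) reduces to the acceptability level itself. A minimal sketch:

```python
def deviation(sig_obs, sig_sim):
    """Relative deviation between observed and simulated signature values (Equation (1))."""
    return (sig_obs - sig_sim) / sig_obs

def binary_score(sig_obs, sig_sim, acceptability=0.10):
    """Binary score function (Equations (2) and (3)).

    Equation (2)'s threshold, (acceptability * sig_obs) / sig_obs, reduces to the
    acceptability level itself, e.g. 0.10 or 0.20 as used in the study.
    """
    dev_threshold = acceptability
    return 1 if abs(deviation(sig_obs, sig_sim)) <= abs(dev_threshold) else 0
```

For example, with an observed signature of 100.0, a simulated value of 92.0 gives a deviation of 0.08 and a score of 1 at the 10% threshold, whereas 75.0 gives a deviation of 0.25 and a score of 0.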

3.5. The Diagnostic Model Evaluation Approach

The diagnostic model evaluation approach is based on validating (testing) both model performance and consistency. The performance was evaluated by calculating the performance measures for each model (Table 4). Consistency was evaluated by calculating the difference (error) between the simulated hydrological signatures and those calculated from observed measurements. In the MO-SB calibration approach, the solution is a Pareto set that contains a large number of solutions (100 solutions), making it difficult to evaluate them all. We propose herein the idea of choosing and further evaluating a single


best solution using a single aggregated criterion (score) after exploring the composition of the optimal Pareto set. We adopted the method of the ideal point, i.e., choosing the solution closest to the ideal point (for the considered problem, this is the point where all objective functions have a value of 1). In this study, NSE was used without normalization because, for the considered models, it was always between 0 and 1. Minimizing the distance to the ideal point is equivalent to maximizing the distance to 0; thus, the aggregated score can be written as:

AggregatedScore = sqrt( NSE² + Σ_{i=1}^{N} Score(Signature_i)² )    (4)
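The ideal-point selection over the Pareto set then reduces to maximizing Equation (4). A sketch, representing each Pareto solution as its NSE value together with its list of signature scores:

```python
import math

def aggregated_score(nse_value, signature_scores):
    """Equation (4): Euclidean distance from the origin in objective space."""
    return math.sqrt(nse_value ** 2 + sum(s ** 2 for s in signature_scores))

def select_best(pareto_set):
    """Pick the Pareto solution with the largest aggregated score (closest to the ideal point).

    `pareto_set` is a list of (nse, [signature scores]) tuples.
    """
    return max(pareto_set, key=lambda sol: aggregated_score(sol[0], sol[1]))
```

A solution satisfying more signatures can beat one with a slightly higher NSE: (0.6, [1, 1, 1]) scores sqrt(3.36), which exceeds the sqrt(2.64) of (0.8, [1, 1, 0]).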

Table 4. Performance measures matrix.

Symbol | Description | Formula | Optimal Value | References
NSE | Nash–Sutcliffe efficiency | NSE = 1 − Σ_{i=1}^{n} (Y_i^obs − Y_i^sim)² / Σ_{i=1}^{n} (Y_i^obs − Y^mean)² | 1 | [17,49,67,68]
RMSE | Root mean square error | RMSE = sqrt( (1/n) Σ_{t=1}^{n} (Y_t^obs − Y_t^sim)² ) | 0 | [15,49,69,70]
PBIAS | Percent bias (relative volume error) | PBIAS = Σ_{i=1}^{n} (Y_i^obs − Y_i^sim) × 100 / Σ_{i=1}^{n} Y_i^obs | 0 | [17,67,71]
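The three measures in Table 4 translate directly into code; a sketch assuming NumPy arrays of observed and simulated flows:

```python
import numpy as np

def nse(y_obs, y_sim):
    """Nash-Sutcliffe efficiency (optimal value 1)."""
    return 1.0 - np.sum((y_obs - y_sim) ** 2) / np.sum((y_obs - y_obs.mean()) ** 2)

def rmse(y_obs, y_sim):
    """Root mean square error (optimal value 0)."""
    return float(np.sqrt(np.mean((y_obs - y_sim) ** 2)))

def pbias(y_obs, y_sim):
    """Percent bias, i.e. relative volume error (optimal value 0).

    Positive values indicate underestimation of the total observed volume.
    """
    return float(100.0 * np.sum(y_obs - y_sim) / np.sum(y_obs))
```

For instance, a simulation that overestimates every observed flow by 10% yields PBIAS = −10, while a perfect simulation yields NSE = 1 and RMSE = 0.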

4. Results

One validation dataset (Table 2) with hourly data spanning one year (1 June 1997 01:00–30 June 1998 23:00) was used in all experiments, whereas the FD was used for calibration. Calibration was run under four scenarios. Scenarios 1 and 2 have the same number of partial datasets with different combinations, whereas scenarios 3 and 4 are limited to dry and wet periods, respectively (Table 2). Figure 4 shows the number of records of the partial datasets in the four scenario models.

Figure 4. Record numbers per (partial) dataset.

4.1. Diagnostic Evaluation of the SO Optimization Approach

4.1.1. General Characterization of Results

Although the evaluation criteria in this study were based on performance and consistency, it is worthwhile to visually inspect the simulated flow hydrographs from the calibrated models to provide an overall idea of the models’ ability to estimate the observed flow (peaks, low values). Figures 5–7 show the simulated hydrographs of the FD model, 50%-FRD model (scenario 2), and 5%-FRD model (scenario 2), respectively. Figures 8 and 9 show the simulated hydrographs for the dry- and wet-period models, respectively.

Figure 5. (a) Validation of the calibrated FD model using the single-objective (SO) approach; (b) zoomed in for a clear view.

Figure 6. (a) Validation of the calibrated 50%-FRD model; (b) zoomed in for a clear view.


Figure 7. (a) Validation of the calibrated 5%-FRD model; (b) zoomed in for a clear view.

Figure 8. (a) Validation of the calibrated dry-period model; (b) zoomed in for a clear view.


Figure 9. (a) Validation of the calibrated wet-period model; (b) zoomed in for a clear view.

The simulation graphs show that the FD and 50%-FRD models yield relatively good results (Figures 5 and 6); however, neither of them captured any peaks. For instance, the maximum observed peak flow was 31 m³/h, whereas the simulated flows of the FD and 50%-FRD models were 17.8 and 21.5 m³/h, respectively, much lower than the observed value. Based on the visual comparison, it is difficult to decide which model shows better results; thus, the performance metrics and signatures must be evaluated. The 5%-FRD and dry-period models (Figures 7 and 8, respectively) show poor results, representing a typical case in which short-period data do not hold enough information to help simulate the streamflow. However, the wet-period model (Figure 9) showed reasonable results with a general flow overestimation of 19.52 m³/h for the 2.6-m³/h observed flow. Some peaks were overestimated (43.7 m³/h vs. 22.3 m³/h observed), whereas other peaks were underestimated (7.2 m³/h vs. 30.1 m³/h observed).

4.1.2. Performance Evaluation

All models in scenario 1 showed similar NSE values in the calibration period, ranging between 0.87 and 0.96, with the superiority of the 5%-FRD model. However, in the validation period, the 5%-FRD model showed poor performance with an NSE of approximately zero, whereas the rest of the models showed good NSE values with an average deviation of 0.1 from the NSE in the calibration period (Table 5). Root mean square error (RMSE) values in the calibration and validation periods were small, ranging between 0.96 and 1.7 mm, except for the 5%-FRD model, which showed a 5.65-mm RMSE in the validation period (Table 5). All PBIAS values in the calibration period were positive, whereas the validation period exhibited negative values for three models (25% FRD, 10% FRD, and 5% FRD). Negative PBIAS values indicate an underestimation in the flow simulation. The PBIAS values of the 25%-FRD and 10%-FRD models were acceptable, but that of the 5%-FRD model was high (−106.47), which is unacceptable (Table 5), indicating that this model was built using data that were insufficient to simulate the flow.


Table 5. Results of performance measures for the four scenarios.

| Dataset | NSE Calibration | NSE Validation | RMSE Calibration | RMSE Validation | PBIAS Calibration | PBIAS Validation |
|---|---|---|---|---|---|---|
| Scenario 1 | | | | | | |
| FD | 0.87 | 0.77 | 1.14 | 1.7 | 9.16 | 10.2 |
| 75% FRD | 0.87 | 0.77 | 1.25 | 1.6 | 8.9 | 10.73 |
| 50% FRD | 0.89 | 0.81 | 1.31 | 1.52 | 8.42 | 9.61 |
| 25% FRD | 0.89 | 0.85 | 1.28 | 1.5 | 8.81 | −3.16 |
| 10% FRD | 0.94 | 0.82 | 1.14 | 1.51 | 10.03 | −7.78 |
| 5% FRD | 0.96 | 0.01 | 0.96 | 5.65 | 1.2 | −106.47 |
| Scenario 2 | | | | | | |
| FD | 0.87 | 0.7 | 1.12 | 1.35 | 7.5 | 11.2 |
| 75% FRD | 0.87 | 0.72 | 1.09 | 1.3 | 7.33 | 10.24 |
| 50% FRD | 0.9 | 0.69 | 1.02 | 1.8 | 7.63 | 15.98 |
| 25% FRD | 0.84 | 0.78 | 1.3 | 1.34 | 15.44 | 5.96 |
| 10% FRD | 0.86 | 0.69 | 0.41 | 1.6 | 6.92 | −22.64 |
| 5% FRD | 0.52 | 0.1 | 0.25 | 2.8 | −0.24 | −37.1 |
| Scenario 3 | | | | | | |
| Dry-period dataset | 0.72 | 0.18 | 0.1 | 2.65 | 6.58 | −36 |
| Scenario 4 | | | | | | |
| Wet-period dataset | 0.86 | 0.57 | 2 | 1.75 | 6.19 | 25.6 |

Similarly, in scenario 2 (Table 5), the 5%-FRD model showed poor NSE in the calibration and validation periods (0.52 and 0.1, respectively). NSE fluctuated without a clear pattern in the calibration and validation periods. Overall, the NSE values of scenario 2 were lower than those of scenario 1, ranging between 0.69 and 0.78 (excluding the 5%-FRD model) in the validation period. The 5%-FRD model showed the lowest performance in terms of the RMSE (2.8 mm), whereas the rest of the models showed acceptable RMSE values, ranging between 0.41 and 1.8 mm in the calibration and validation periods. The 75%-FRD and 25%-FRD models had similar RMSEs (with an average of 0.3) in the calibration and validation periods, whereas the RMSEs of the 50%-FRD and 10%-FRD models increased slightly in the validation period, reaching 1.8 and 1.6 mm, respectively. According to the PBIAS values, the minimum limit to acquire acceptable performance was 25% FRD, as the models of short-term data (10%-FRD and 5%-FRD models) resulted in high underestimations, as indicated by their PBIAS values (−22.64 and −37.1).

Scenarios 3 and 4 (Table 5) represent short-term data for the dry- and wet-period models. The wet-period model performed better than the dry-period model in terms of the NSE, RMSE, and PBIAS. Specifically, the wet-period model showed a higher NSE in the validation period (0.57) than the dry-period model (0.18), whereas the RMSE of the wet-period model was 1 mm less than that of the dry-period model (2.65 mm). Both models were inaccurate in the validation period, with either high overestimation (wet-period model) or high underestimation (dry-period model), as indicated by the PBIAS values for both scenarios. The consistency evaluation of the SO-calibrated models is discussed in Section 4.2.2, together with that of the MO-SB-calibrated models, to allow comparison.

4.2. Diagnostic Evaluation of the MO-SB Optimization Approach

In this section, a comparison between the four multiobjective optimization algorithms (MO-BC (10%), MO-BC (20%), MO-UC (10%), and MO-UC (20%)) and the SO optimization is provided. First, the performances of the models were evaluated; then, the consistency of the models was evaluated by comparing the difference between the observed and simulated values of each signature. The results presented in this section focus on scenarios 2, 3 (dry-period data), and 4 (wet-period data) of the dataset sectioning (Table 2). Scenario 1 is not presented because its results pertain to the simple case of a gradually decreasing dataset size.

4.2.1. Performance Evaluation of MO-SB

The evaluation was based on the closest solution to the ideal point in the Pareto set, i.e., the solution with the maximum aggregated score according to Equation (4). Figure 10 shows the NSE values of the five models in the validation period. The MO-BC (20%) algorithm parameterization gave the highest NSE in all models. The NSE obtained from the other three algorithm parameterizations varied from one model to another, but in most cases, MO-BC led to a higher NSE than the MO-UC algorithm parameterization did. The 5%-FRD and dry-period models resulted in low NSE, indicating poorer performance than the rest of the models; however, the wet-period model showed an acceptable NSE (average: 0.564). The RMSE values ranged between 1.18 and 2.8 mm for all models, which is relatively low. MO-BC (20%) yielded the lowest RMSEs in all models with different dataset sizes, whereas the highest RMSEs were observed for the 5%-FRD and dry-period models (Figure 11). A noticeable decrease in the PBIAS value occurred in all models after implementing MO-SB with the different parameterization algorithms (Figure 12). Poor performance was also observed for the models of short-term data (5%-FRD, dry-period, and wet-period models). The wet-period model showed negative PBIAS as the streamflow values were overestimated, because the same dataset was used as the validation dataset for all models instead of using a different dataset for the wet-period model.


Figure 10. NSE values of different models using five datasets via two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB) for each dataset.
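The selection rule used in Section 4.2.1 (the Pareto-set member closest to the ideal point) can be sketched as follows. Equation (4) itself is not reproduced in this excerpt, so the per-objective min-max rescaling and Euclidean distance below are assumptions standing in for the paper's exact aggregation, and the function name is illustrative:

```python
import numpy as np

def closest_to_ideal(objective_scores):
    """Pick the Pareto-set member nearest the ideal point.

    objective_scores: 2-D array-like, one row per candidate parameter
    set, one column per objective (all to be maximized). Each objective
    is rescaled to [0, 1], so the ideal point becomes all ones; the row
    with the smallest Euclidean distance to it is returned (as an index).
    """
    s = np.asarray(objective_scores, float)
    lo, hi = s.min(axis=0), s.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # guard against flat objectives
    scaled = (s - lo) / span
    dist = np.linalg.norm(1.0 - scaled, axis=1)
    return int(np.argmin(dist))

# Three hypothetical candidates scored on two objectives:
# the balanced one (index 1) lies closest to the ideal corner.
best = closest_to_ideal([[0.9, 0.2], [0.7, 0.7], [0.2, 0.9]])
```

With NSE plus the 11 signature scores as columns, the same routine would pick one parameter set out of the Pareto front for each MO-SB run.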


Figure 11. RMSE values of different models using five datasets via two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB) for each dataset.




Figure 12. PBIAS values of different models using five datasets via two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB) for each dataset.

4.2.2. Behavioral Consistency Evaluation

The differences between the observed and simulated signatures were calculated for each signature to evaluate the consistency of the output from the optimized calibrated models and their ability to simulate the catchment's behavior. This section presents the results of the consistency evaluation for each signature.

Baseflow index (I_BF): The observations and simulation results revealed high I_BF values (0.84–0.98), indicating a high baseflow in the catchment and a high groundwater contribution. I_BF values for the wet-period model were lower than those for the others, confirming that wet-period data contain high flows, therefore having more direct streamflow and, consequently, less baseflow than in other datasets. Furthermore, using the MO-SB approach with different parameterization algorithms did not improve the results significantly; however, MO-BC (20%) yielded the lowest errors for all models (Figure 13). The simulated I_BF values for all models were close to the observed values, meaning that all models were consistent with the baseflow index.


Figure 13. I_BF errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB).

Streamflow elasticity (E_QP): The value of E_QP calculated from observations was high (127.7), indicating that the streamflow is sensitive to precipitation and that the catchment is elastic. The results obtained from the simulated flow after implementing the different calibration algorithm parameterizations varied dramatically for each model, indicating the signature's sensitivity to the length of the records and the information held by the data. The 25%-FRD model was the most accurate at reflecting the streamflow's elasticity. The performances of SO and MO-SB were similar, with errors ranging between −2.7 and −11.3. The 5%-FRD and dry-period models showed small E_QP values, resulting in high errors (Figure 14). Wet-period model simulations also resulted in larger errors than other models but in the opposite direction. MO-BC (20%) was the best calibration parameterization approach as it enhanced the results in all models, especially in the 10%-FRD and wet-period models.
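For readers unfamiliar with E_QP, one widely used formulation (a Sankarasubramanian-style median elasticity over annual anomalies) can be sketched as below; the paper's exact definition is not shown in this excerpt, so this form, and the function name, are assumptions:

```python
def streamflow_elasticity(q_annual, p_annual):
    """Median streamflow elasticity E_QP.

    Computed here as the median over years of
    (dQ / Q_mean) / (dP / P_mean), where dQ and dP are deviations of
    annual streamflow and precipitation from their long-term means.
    Years where precipitation equals its mean are skipped to avoid
    division by zero.
    """
    qm = sum(q_annual) / len(q_annual)
    pm = sum(p_annual) / len(p_annual)
    ratios = [((q - qm) / qm) / ((p - pm) / pm)
              for q, p in zip(q_annual, p_annual) if p != pm]
    ratios.sort()
    n = len(ratios)
    mid = n // 2
    return ratios[mid] if n % 2 else 0.5 * (ratios[mid - 1] + ratios[mid])

# Streamflow exactly proportional to precipitation gives elasticity 1
e = streamflow_elasticity([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])
```

An elasticity above 1 means a relative change in precipitation is amplified in streamflow, which is the behavior the paper describes as an "elastic" catchment.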


Figure 14. E_QP errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB).

Rising limb density (R_LD): The values of the observed and simulated R_LD were small (0.02–0.04), indicating the smoothness of the flow hydrograph. The results after implementing the MO-SB algorithms were similar to those obtained using the SO approach; however, MO-BC (20%) reduced the errors marginally in the FD, 75%-FRD, 50%-FRD, 25%-FRD, and wet-period models (Figure 15).
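One common definition of R_LD is the number of rising limbs divided by the total number of time steps with rising flow; the sketch below implements that form, which is assumed to match the paper's Table 3 formulation:

```python
def rising_limb_density(q):
    """Rising limb density R_LD: number of distinct rising limbs divided
    by the total count of rising time steps. A small value means a few,
    long, smooth rises; a value near 1 means a jagged hydrograph."""
    rising_steps = 0
    limbs = 0
    in_limb = False
    for prev, cur in zip(q, q[1:]):
        if cur > prev:
            rising_steps += 1
            if not in_limb:          # a new rising limb starts here
                limbs += 1
                in_limb = True
        else:
            in_limb = False          # limb ends on a flat or falling step
    return limbs / rising_steps if rising_steps else 0.0

# Two short rises out of three rising steps: R_LD = 2/3
r = rising_limb_density([1.0, 2.0, 3.0, 2.0, 3.0])
```

A smooth single-peak event hydrograph (one long rise) gives a much smaller R_LD, consistent with the low observed values (0.02–0.04) reported above.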


Figure 15. R_LD errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB).

Runoff ratio (R_QP): The R_QP values were high for all models (10.9–21.6), indicating the domination of blue water in the catchment. Therefore, the streamflow is larger than evapotranspiration in the context of the water balance if we assume no change in the storage of the catchment. The dry-period model showed the lowest simulated R_QP and, consequently, the highest errors among the other models (Figure 16), whereas the wet-period model showed a high R_QP but in the opposite direction. The results confirm that data containing frequent and high events will result in a large R_QP, and vice versa. The 5%-FRD and 10%-FRD models also resulted in low R_QP. However, using MO-BC (20%) lowered the errors significantly from 6.5 and 5.3 (obtained via SO) to 1.5 and 0.4, respectively, which are acceptable values compared to the errors of the rest of the models. MO-BC (10%) enhanced the results for R_QP of the 5%-FRD and 10%-FRD models and ranked second after MO-BC (20%). Furthermore, MO-BC (20%) significantly improved the R_QP values in all models.




Figure 16. R_QP errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 algorithm parameterizations of MO-SB).

High-flow segment volume of the FDC (FHV (FDC)): The observed FHV (FDC) was high (2365.1), indicating that the catchment faces frequent flooding because of high streamflow. The simulated FHV (FDC) using long-period models (FD, 75% FRD, and 50% FRD) showed lower values than the observed FHV (FDC), although they were still acceptable. However, short-period models failed to reflect this signature, as their simulated values were too far from the observed values: either very small values, as simulated using the 5%-FRD and dry-period models, or very high values, as simulated using the wet-period model. The results confirm that short-term and dry-period data lack high-flow events, underestimating the volume of the very high flows, with the opposite being true for the wet-period model. Although the errors were high for short-period models, MO-BC (20%) was the best approach according to simulations of this signature (Figure 17).


Figure 17. FHV (FDC) errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

Low-flow segment volume of the FDC (FLV (FDC)): The high errors (overestimation) in Figure 18 show that the FLV (FDC) volume could not be simulated by any of the models; all models failed to reflect the FLV (FDC) in the Brue catchment. However, the wet-period models yielded the lowest errors. MO-BC (20%) reduced the errors significantly in the FD, 75%-FRD, and 50%-FRD models. Nevertheless, according to this signature, no model was consistent.



Figure 18. FLV (FDC) errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

Mid-flow segment slope of the FDC (FMS (FDC)): The FMS (FDC) was well simulated (Figure 19). The errors ranged between −0.2 and 0.5, with the 5%-FRD model exhibiting the best performance, with an error of 0.1 for all calibration approaches. The observed FDC slope was steep (0.8), indicating flashy runoff: moderate flows do not remain in the catchment. Signature-based calibration improved the results slightly. Overall, all models were consistent for this signature.
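The exact formulas used for the three FDC signatures are not spelled out in this excerpt. The sketch below assumes the commonly used Yilmaz et al. (2008)-style definitions: FHV as the volume of flows exceeded less than 2% of the time, FLV as a log-space low-flow volume over the segment beyond 70% exceedance, and FMS as the log-space slope of the FDC between 20% and 70% exceedance. Weibull plotting positions and the 2%/20%/70% thresholds are conventional choices, not values taken from the paper.

```python
import math

def fdc(q):
    """Flow duration curve: flows sorted descending, paired with
    Weibull plotting-position exceedance probabilities p_i = i / (n + 1)."""
    qs = sorted(q, reverse=True)
    n = len(qs)
    p = [(i + 1) / (n + 1) for i in range(n)]
    return p, qs

def fhv(q, h=0.02):
    """High-flow segment volume: sum of flows exceeded less than h of the time."""
    p, qs = fdc(q)
    return sum(qi for pi, qi in zip(p, qs) if pi < h)

def flv(q, l=0.7):
    """Low-flow segment volume in log space, relative to the minimum flow.
    Assumes strictly positive flows (log is undefined at zero)."""
    p, qs = fdc(q)
    qmin = min(qs)
    low = [qi for pi, qi in zip(p, qs) if pi > l]
    return -1 * sum(math.log(qi) - math.log(qmin) for qi in low)

def fms(q, p1=0.2, p2=0.7):
    """Mid-segment slope of the FDC in log space between exceedance p1 and p2.
    Assumes strictly positive flows."""
    p, qs = fdc(q)
    # flows at the tabulated exceedance probabilities closest to p1 and p2
    q1 = min(zip(p, qs), key=lambda t: abs(t[0] - p1))[1]
    q2 = min(zip(p, qs), key=lambda t: abs(t[0] - p2))[1]
    return (math.log(q1) - math.log(q2)) / (p2 - p1)
```

Given an observed and a simulated hourly series, the signature errors plotted in Figures 17-19 would then be the differences between the simulated and observed values of these quantities.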

Figure 19. FMS (FDC) errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

Mean discharge (Qmean): The observed and simulated values of mean streamflow resulting from the different calibration approaches ranged between 1.2 and 2.5, whereas the observed mean was 1.9. The range of the simulated mean flows was small and acceptable. However, most simulated mean flows were lower than the observed mean flow for all models, except the wet-period model, because of a lack of low flows in the wet-period data. The MO-SB algorithms provided more consistent models than the SO approach. MO-BC (20%) enhanced the results, particularly in the FD, 75%-FRD, 50%-FRD, and 25%-FRD models (errors were almost zero; see Figure 20).



Figure 20. Mean streamflow errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

Median discharge (Qmedian): The simulated median streamflow resulting from the long- and short-term period models using the different calibration approaches converged toward the observed median flow of the catchment: simulated values ranged between 0.77 and 1.4, whereas the observed value was 1. The wet-period model was the most consistent according to the Qmedian, as it showed zero errors for all parameterization approaches (Figure 21). Overall, the MO-SB algorithms improved the results for most models, except the 5%-FRD, dry-period, and wet-period models, for which the SO approach resulted in smaller errors than the MO-SB optimization algorithms.


Figure 21. Median discharge errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

Discharge variance (DV(Q)): The observed discharge variance was high, indicating varying streamflow. The simulated DV(Q) was close to the observed value, except for the 10%-FRD, 5%-FRD, and dry-period models, where the differences were slightly higher than for the other models (Figure 22). The 5%-FRD and dry-period models indicated no variability in streamflow, as their variances were small because their data were in the same range and close to the mean.
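The statistical signatures in this comparison (Qmean, Qmedian, DV(Q), and QPEAK) are simple statistics of the flow series, and the signature errors plotted in the figures are differences between simulated and observed values. A minimal sketch follows; the sign convention (simulated minus observed) and the use of the population variance for DV(Q) are assumptions, as the paper does not state them here.

```python
import statistics

def flow_signatures(q):
    """Simple statistical streamflow signatures: mean, median,
    variance, and peak discharge of the series."""
    return {
        "Qmean": statistics.mean(q),
        "Qmedian": statistics.median(q),
        "DV(Q)": statistics.pvariance(q),
        "Qpeak": max(q),
    }

def signature_errors(q_obs, q_sim):
    """Signature error per signature: simulated minus observed value
    (zero indicates a hydrologically consistent model for that signature)."""
    obs, sim = flow_signatures(q_obs), flow_signatures(q_sim)
    return {k: sim[k] - obs[k] for k in obs}
```

Applied to the observed series and each calibrated model's simulation, these differences reproduce the kind of per-dataset error bars shown in Figures 20-23.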



Figure 22. Discharge variance errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

Peak discharge (QPEAK): This signature is associated with the maximum peak observed in the catchment. The results presented in Figure 23 show a significant decrease in the simulated peaks of the 5%-FRD and dry-period models, whereas there was a significant increase in the simulated peaks of the wet-period model. The 25%-FRD model was the most consistent in simulating the peak discharge. The simulated peaks obtained using the SO and MO-SB algorithms were the same. Overall, the MO-SB improved the results slightly, especially in the 10%-FRD model.


Figure 23. Peak discharge errors based on observations and simulations of five datasets using two main calibration approaches (SO & 4 parameterization algorithms of MO-SB).

5. Discussion

Overall, the results showed that the HBV model was successfully calibrated using the SO and MO-SB optimization approaches with short-term datasets, with a lower limit of approximately four months of data (the 10%-FRD model). These results correlate with those obtained by Brath et al. (2004) and Perrin et al. (2007), indicating that calibrated models can generate reasonable results using data covering less than one year [38,41].

It is difficult to compare the results of this study to previous investigations of model performance under continuous short-term data because no previous study incorporated hydrological signatures in the parameter search using short-term data. Thus, the comparison is restricted to the general results of the models' performance, regardless of the calibration method. According to the performance measurements, MO-BC (20%) performed best among the MO-SB formulations, whereas both MO-UC (10% and 20%) optimization algorithms showed the lowest performance (below the SO approach), indicating the ineffectiveness of this setting (uniform crossover). This finding contradicts that of a study by Shafii and Tolson (2015), who concluded that model performance depends more on the formulation of the optimization problem than on the choice of the optimization algorithm [29].

In terms of the impact of dataset size on performance, the FD, 75%-FRD, and 50%-FRD models exhibited similar performance. The performance of the 10%-FRD model was lower but still acceptable. The obtained results are consistent with those presented in a study by Brath et al. (2004), who showed that data of less than one year can contribute to acceptable model performance [41]. The 5%-FRD and dry-period models exhibited the lowest performance under all calibration approaches, indicating that in both cases, short hourly records and a scarcity of events make the dataset insufficient to build a model that provides acceptable performance, even when using signature-based calibration. The results of the dry-period model correlate with previous findings, such as those of Li et al. (2010), who showed that dry catchments require a longer data-collection period for calibration to obtain stable parameters [36]. Perrin et al. (2007) noted the difficulty of estimating robust model parameters in dry catchments and recommended longer calibration periods to achieve stable parameters [38]. Pool et al. (2017) found that dry runoff periods, defined by mean and minimal flow samples, convey less information for hydrograph prediction than wet periods [43]. The situation is different when limited data, as in this study, are selected from a wet period, because the availability of multiple events results in good performance. This finding is consistent with previous research confirming that data containing sufficient high flows lead to better calibration and improved model performance [31,32,38].

Using the signatures in model evaluation allowed for developing a better understanding of the catchment processes. For example, the results provided insight into the catchment's baseflow. The baseflow index results show a high baseflow in the catchment, correlating with results from a study on the same basin [47]. Additionally, the EQP values demonstrated the streamflow's sensitivity to the observed precipitation. The RLD values indicate the smoothness of the flow hydrograph. The RQP values indicate the domination of blue water, meaning that the streamflow is larger than evapotranspiration in the context of the water balance, assuming no change in catchment storage. The volume of the high segment of the FDC indicates frequent flooding in the catchment because of the high streamflow, which is consistent with a previous investigation of the Brue catchment [72]. The FDC slope was steep, indicating flashy runoff; therefore, moderate flows do not remain in the catchment. Note also that all models (including the long-term data models) failed to reflect the FLV (FDC) in the Brue catchment, with significant errors. This result could be explained by the effects of vegetation growth in the Brue catchment, leading to poor simulations of low flows, as reported in previous studies [47]. Overall, the results show that five of the hydrological signatures were sensitive to dataset size, namely, the elasticity of streamflow, FHV (FDC), FLV (FDC), RQP, and the peak flow.

Finally, the study provides a quantitative estimation of the impact of dataset size on model performance and consistency for several metrics and signatures. Incorporating signatures in the model calibration process produced a consistent model with higher performance, improving the results in the case of limited data compared to using only goodness-of-fit measures. However, there is a lower limit on the length of the observation records needed to build a successful signature-based calibrated model. This is a matter of both dataset length and the hydrological information contained within these limited observations.
Note that the signature-based diagnostic evaluation approach enables the selection of consistent models reflecting critical characteristics of the catchment behavior and allows for identifying the deficiencies of various models.
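The discussion above interprets the baseflow index (BFI), but the separation method behind it is not specified in this excerpt. A widely used approach is the one-parameter Lyne-Hollick recursive digital filter; the sketch below uses it with the conventional filter parameter alpha = 0.925. Both the method and the parameter value are assumptions for illustration, not taken from the paper.

```python
def baseflow_index(q, alpha=0.925):
    """Baseflow index (BFI) via the one-parameter Lyne-Hollick filter:
    quickflow f_t = alpha * f_{t-1} + 0.5 * (1 + alpha) * (q_t - q_{t-1}),
    baseflow b_t = q_t - max(f_t, 0), constrained to 0 <= b_t <= q_t.
    BFI is the ratio of total baseflow to total streamflow."""
    f = 0.0
    base = [q[0]]  # assume the first step is all baseflow
    for t in range(1, len(q)):
        f = alpha * f + 0.5 * (1 + alpha) * (q[t] - q[t - 1])
        b = q[t] - max(f, 0.0)
        base.append(min(max(b, 0.0), q[t]))
    return sum(base) / sum(q)
```

A constant flow series yields BFI = 1 (all baseflow), while a flashy series with isolated peaks yields a BFI well below 1; operational implementations typically also apply the filter in multiple forward and backward passes, which is omitted here for brevity.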

6. Conclusions

In this study, the usefulness of incorporating hydrological signatures in calibrating a rainfall-runoff model (the HBV model) under different data-availability scenarios was assessed. In contrast to previous studies, the effect of dataset size on both the performance and the consistency of the calibrated HBV models was investigated. The results showed that a limited number of records can ensure reasonably good performance in calibration because the modeled hydrograph can fit the observations to an extent sufficient for hydrological practice. Contrarily, for the validation period, using limited data resulted in poor performance and consistency, as illustrated by the 5%-FRD model in both scenarios 1 and 2.

Nevertheless, the progressive reduction in dataset size deteriorates the model's performance, and dry periods do not provide the models with enough information. Therefore, model performance depends on whether the data contain enough climatic information and some extreme events that could help make the model more representative of the watershed; however, more accurate quantification of such influence would require more studies. The MO-SB calibration improved the model results more than the SO calibration approach. The diagnostic evaluation approach provides a powerful and meaningful basis for interpreting model results. However, for the considered case study, the improvement was not as large as expected, and more experiments are needed with various types of catchments.

This study has some limitations, which also allow us to formulate directions for additional research. We have not considered the uncertainty resulting from the level of confidence in the data, which is particularly important in areas where the data might not accurately represent reality. The set of signatures could be extended, and various combinations could be considered. The results of multiobjective calibration allow for a wider interpretation and provide many possibilities; however, only a single final solution was evaluated in this study. Further insights into the impact of dataset size on model performance would have been obtained if more data-partitioning approaches had been explored and investigated.
To extend the conclusions and validate the findings of this study, so that they would have more universal applicability, it is recommended to repeat the same investigation on more catchments with different hydrological regimes.

Author Contributions: Conceptualization, D.P.S.; methodology, D.P.S., M.H. and S.A.M.; software, S.A.M.; validation, D.P.S., M.H. and S.A.M.; formal analysis, S.A.M.; investigation, S.A.M.; resources, D.P.S.; data curation, D.P.S. and S.A.M.; writing—original draft preparation, S.A.M. and M.A.H.; writing—review and editing, S.A.M., D.P.S., M.H. and M.A.H.; visualization, S.A.M., D.P.S. and M.A.H.; supervision, D.P.S. and M.H.; project administration, D.P.S. and M.H.; funding acquisition, D.P.S. and M.A.H. All authors have read and agreed to the published version of the manuscript.

Funding: The APC was funded by the National Water and Energy Center at UAE University, Strategic Grant number 12R019.

Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.

Data Availability Statement: The data used in this study are available from the second author upon reasonable request.

Acknowledgments: The first author is grateful to the Netherlands Fellowship Programme (NFP) for providing the fellowship for the Master study in Hydroinformatics at IHE Delft Institute for Water Education, in the framework of which this research was carried out.

Conflicts of Interest: The authors declare no conflict of interest.

References
1. Gupta, H.V.; Sorooshian, S.; Yapo, P.O. Toward improved calibration of hydrologic models: Multiple and noncommensurable measures of information. Water Resour. Res. 1998, 34, 751–763. [CrossRef]
2. Solomatine, D.P.; Dibike, Y.B.; Kukuric, N. Automatic calibration of groundwater models using global optimization techniques. Hydrol. Sci. J. 1999, 44, 879–894. [CrossRef]
3. Lawrence, D.; Haddeland, I.; Langsholt, E. Calibration of HBV Hydrological Models Using PEST Parameter Estimation; Norwegian Water Resources and Energy Directorate: Oslo, Norway, 2009; ISBN 9788241006807.
4. Patil, S.D.; Stieglitz, M. Comparing spatial and temporal transferability of hydrological model parameters. J. Hydrol. 2015, 525, 409–417. [CrossRef]
5. Wöhling, T.; Samaniego, L.; Kumar, R. Evaluating multiple performance criteria to calibrate the distributed hydrological model of the upper Neckar catchment. Environ. Earth Sci. 2013, 69, 453–468. [CrossRef]

6. Fenicia, F.; Solomatine, D.P.; Savenije, H.H.G.; Matgen, P. Soft combination of local models in a multi-objective framework. Hydrol. Earth Syst. Sci. 2007, 11, 1797–1809. [CrossRef]
7. Sahraei, S.; Asadzadeh, M.; Unduche, F. Signature-based multi-modelling and multi-objective calibration of hydrologic models: Application in flood forecasting for Canadian Prairies. J. Hydrol. 2020, 588, 125095. [CrossRef]
8. Reed, P.M.; Hadka, D.; Herman, J.D.; Kasprzyk, J.R.; Kollat, J.B. Evolutionary multiobjective optimization in water resources: The past, present, and future. Adv. Water Resour. 2013, 51, 438–456. [CrossRef]
9. Zhou, A.; Qu, B.Y.; Li, H.; Zhao, S.Z.; Suganthan, P.N.; Zhang, Q. Multiobjective evolutionary algorithms: A survey of the state of the art. Swarm Evol. Comput. 2011, 1, 32–49. [CrossRef]
10. Efstratiadis, A.; Koutsoyiannis, D. One decade of multi-objective calibration approaches in hydrological modelling: A review. Hydrol. Sci. J. 2010, 55, 58–78. [CrossRef]
11. Kollat, J.B.; Reed, P.M.; Wagener, T. When are multiobjective calibration trade-offs in hydrologic models meaningful? Water Resour. Res. 2012, 48, 3520. [CrossRef]
12. Asadzadeh, M.; Tolson, B.A.; Burn, D.H. A new selection metric for multiobjective hydrologic model calibration. Water Resour. Res. 2014, 50, 7082–7099. [CrossRef]
13. Euser, T.; Winsemius, H.C.; Hrachowitz, M.; Fenicia, F.; Uhlenbrook, S.; Savenije, H.H.G. A framework to assess the realism of model structures using hydrological signatures. Hydrol. Earth Syst. Sci. 2013, 17, 1893–1912. [CrossRef]
14. Martinez, G.F.; Gupta, H.V. Hydrologic consistency as a basis for assessing complexity of monthly water balance models for the continental United States. Water Resour. Res. 2011, 47, W12540. [CrossRef]
15. van Werkhoven, K.; Wagener, T.; Reed, P.; Tang, Y. Sensitivity-guided reduction of parametric dimensionality for multi-objective calibration of watershed models. Adv. Water Resour. 2009, 32, 1154–1169. [CrossRef]
16. Pokhrel, P.; Yilmaz, K.K.; Gupta, H.V. Multiple-criteria calibration of a distributed watershed model using spatial regularization and response signatures. J. Hydrol. 2012, 418–419, 49–60. [CrossRef]
17. Pfannerstill, M.; Guse, B.; Fohrer, N. Smart low flow signature metrics for an improved overall performance evaluation of hydrological models. J. Hydrol. 2014, 510, 447–458. [CrossRef]
18. Asadzadeh, M.; Leon, L.; McCrimmon, C.; Yang, W.; Liu, Y.; Wong, I.; Fong, P.; Bowen, G. Watershed derived nutrients for Lake Ontario inflows: Model calibration considering typical land operations in Southern Ontario. J. Great Lakes Res. 2015, 41, 1037–1051. [CrossRef]
19. Chilkoti, V.; Bolisetti, T.; Balachandar, R. Multi-objective autocalibration of SWAT model for improved low flow performance for a small snowfed catchment. Hydrol. Sci. J. 2018, 63, 1482–1501. [CrossRef]
20. Sawicz, K.; Wagener, T.; Sivapalan, M.; Troch, P.A.; Carrillo, G. Catchment classification: Empirical analysis of hydrologic similarity based on catchment function in the eastern USA. Hydrol. Earth Syst. Sci. 2011, 15, 2895–2911. [CrossRef]
21. Seibert, J.; McDonnell, J.J. On the dialog between experimentalist and modeler in catchment hydrology: Use of soft data for multicriteria model calibration. Water Resour. Res. 2002, 38, 1241–1252. [CrossRef]
22. Hrachowitz, M.; Fovet, O.; Ruiz, L.; Euser, T.; Gharari, S.; Nijzink, R.; Freer, J.; Savenije, H.H.G.; Gascuel-Odoux, C. Process consistency in models: The importance of system signatures, expert knowledge, and process complexity. Water Resour. Res. 2014, 50, 7445–7469. [CrossRef]
23. McMillan, H.K.; Clark, M.P.; Bowden, W.B.; Duncan, M.; Woods, R.A. Hydrological field data from a modeller's perspective: Part 1. Diagnostic tests for model structure. Hydrol. Process. 2011, 25, 511–522. [CrossRef]
24. Clark, M.P.; McMillan, H.K.; Collins, D.B.G.; Kavetski, D.; Woods, R.A. Hydrological field data from a modeller's perspective: Part 2: Process-based evaluation of model hypotheses. Hydrol. Process. 2011, 25, 523–543. [CrossRef]
25. Reusser, D.E.; Zehe, E. Inferring model structural deficits by analyzing temporal dynamics of model performance and parameter sensitivity. Water Resour. Res. 2011, 47, W07550. [CrossRef]
26. Wagener, T.; Montanari, A. Convergence of approaches toward reducing uncertainty in predictions in ungauged basins. Water Resour. Res. 2011, 47, 453–460. [CrossRef]
27. Westerberg, I.K.; Guerrero, J.L.; Younger, P.M.; Beven, K.J.; Seibert, J.; Halldin, S.; Freer, J.E.; Xu, C.Y. Calibration of hydrological models using flow-duration curves. Hydrol. Earth Syst. Sci. 2011, 15, 2205–2227. [CrossRef]
28. Schaefli, B. Snow hydrology signatures for model identification within a limits-of-acceptability approach. Hydrol. Process. 2016, 30, 4019–4035. [CrossRef]
29. Shafii, M.; Tolson, B.A. Optimizing hydrological consistency by incorporating hydrological signatures into model calibration objectives. Water Resour. Res. 2015, 51, 3796–3814. [CrossRef]
30. Yapo, P.O.; Gupta, H.V.; Sorooshian, S. Automatic calibration of conceptual rainfall-runoff models: Sensitivity to calibration data. J. Hydrol. 1996, 181, 23–48. [CrossRef]
31. Kim, U.; Kaluarachchi, J.J. Hydrologic model calibration using discontinuous data: An example from the upper Blue Nile River Basin of Ethiopia. Hydrol. Process. 2009, 23, 3705–3717. [CrossRef]
32. Sun, W.; Wang, Y.; Wang, G.; Cui, X.; Yu, J.; Zuo, D.; Xu, Z. Physically based distributed hydrological model calibration based on a short period of streamflow data: Case studies in four Chinese basins. Hydrol. Earth Syst. Sci. 2017, 21, 251–265. [CrossRef]
33. McIntyre, N.R.; Wheater, H.S. Calibration of an in-river phosphorus model: Prior evaluation of data needs and model uncertainty. J. Hydrol. 2004, 290, 100–116. [CrossRef]

34. Tan, S.B.; Chua, L.H.; Shuy, E.B.; Lo, E.Y.-M.; Lim, L.W. Performances of Rainfall-Runoff Models Calibrated over Single and Continuous Storm Flow Events. J. Hydrol. Eng. 2008, 13, 597–607. [CrossRef]
35. Sorooshian, S.; Gupta, V.K.; Fulton, J.L. Evaluation of Maximum Likelihood Parameter estimation techniques for conceptual rainfall-runoff models: Influence of calibration data variability and length on model credibility. Water Resour. Res. 1983, 19, 251–259. [CrossRef]
36. Li, C.Z.; Wang, H.; Liu, J.; Yan, D.H.; Yu, F.L.; Zhang, L. Effect of calibration data series length on performance and optimal parameters of hydrological model. Water Sci. Eng. 2010, 3, 378–393.
37. Tada, T.; Beven, K.J. Hydrological model calibration using a short period of observations. Hydrol. Process. 2012, 26, 883–892. [CrossRef]
38. Perrin, C.; Oudin, L.; Andreassian, V.; Rojas-Serna, C.; Michel, C.; Mathevet, T. Impact of limited streamflow data on the efficiency and the parameters of rainfall-runoff models. Hydrol. Sci. J. 2007, 52, 131–151. [CrossRef]
39. Reynolds, J.E.; Halldin, S.; Seibert, J.; Xu, C.Y.; Grabs, T. Robustness of flood-model calibration using single and multiple events. Hydrol. Sci. J. 2020, 65, 842–853. [CrossRef]
40. Seibert, J.; McDonnell, J.J. Gauging the Ungauged Basin: Relative Value of Soft and Hard Data. J. Hydrol. Eng. 2015, 20, A4014004. [CrossRef]
41. Brath, A.; Montanari, A.; Toth, E. Analysis of the effects of different scenarios of historical data availability on the calibration of a spatially-distributed hydrological model. J. Hydrol. 2004, 291, 232–253. [CrossRef]
42. Seibert, J.; Beven, K.J. Gauging the ungauged basin: How many discharge measurements are needed? Hydrol. Earth Syst. Sci. 2009, 13, 883–892. [CrossRef]
43. Pool, S.; Viviroli, D.; Seibert, J. Prediction of hydrographs and flow-duration curves in almost ungauged catchments: Which runoff measurements are most informative for model calibration? J. Hydrol. 2017, 554, 613–622. [CrossRef]
44. Gharari, S.; Shafiei, M.; Hrachowitz, M.; Kumar, R.; Fenicia, F.; Gupta, H.V.; Savenije, H.H.G. A constraint-based search algorithm for parameter identification of environmental models. Hydrol. Earth Syst. Sci. 2014, 18, 4861–4870. [CrossRef]
45. Bell, V.A.; Moore, R.J. The sensitivity of catchment runoff models to rainfall data at different spatial scales. Hydrol. Earth Syst. Sci. 2000, 4, 653–667. [CrossRef]
46. Shrestha, D.L.; Solomatine, D.P. Data-driven approaches for estimating uncertainty in rainfall-runoff modelling. Int. J. River Basin Manag. 2008, 6, 109–122. [CrossRef]
47. Westerberg, I.K.; McMillan, H.K. Uncertainty in hydrological signatures. Hydrol. Earth Syst. Sci. 2015, 19, 3951–3968. [CrossRef]
48. Allen, R.G.; Pereira, L.S.; Raes, D.; Smith, M. Crop evapotranspiration—Guidelines for computing crop water requirements. In FAO Irrigation and Drainage; FAO: Rome, Italy, 1998.
49. Yadav, M.; Wagener, T.; Gupta, H. Regionalization of constraints on expected watershed response behavior for improved predictions in ungauged basins. Adv. Water Resour. 2007, 30, 1756–1774. [CrossRef]
50. McMillan, H.; Westerberg, I.; Branger, F. Five guidelines for selecting hydrological signatures. Hydrol. Process. 2017, 31, 4757–4761. [CrossRef]
51. Yilmaz, K.K.; Gupta, H.V.; Wagener, T. A process-based diagnostic approach to model evaluation: Application to the NWS distributed hydrologic model. Water Resour. Res. 2008, 44. [CrossRef]
52. Arnold, J.G.; Allen, P.M. Automated methods for estimating baseflow and ground water recharge from streamflow records. J. Am. Water Resour. Assoc. 1999, 35, 411–424. [CrossRef]
53. Shamir, E.; Imam, B.; Gupta, H.V.; Sorooshian, S. Application of temporal streamflow descriptors in hydrologic model parameter estimation. Water Resour. Res. 2005, 41, 1–16. [CrossRef]
54. Sankarasubramanian, A.; Vogel, R.M.; Limbrunner, J.F. Climate elasticity of streamflow in the United States. Water Resour. Res. 2001, 37, 1771–1781. [CrossRef]
55. Donnelly, C.; Andersson, J.C.M.; Arheimer, B. Using flow signatures and catchment similarities to evaluate the E-HYPE multi-basin model across Europe. Hydrol. Sci. J. 2016, 61, 255–273. [CrossRef]
56. Westerberg, I.K.; Wagener, T.; Coxon, G.; McMillan, H.K.; Castellarin, A.; Montanari, A.; Freer, J. Uncertainty in hydrological signatures for gauged and ungauged catchments. Water Resour. Res. 2016, 52, 1847–1865. [CrossRef]
57. Bergström, S. Development and Application of a Conceptual Runoff Model for Scandinavian Catchments; Report RHO 7; Swedish Meteorological and Hydrological Institute: Norrköping, Sweden, 1976; 134p.
58. Lindström, G.; Bergström, S. Improving the HBV and PULSE-models by use of temperature anomalies. Vannet i Norden 1992, 25, 16–23.
59. Seibert, J. Estimation of Parameter Uncertainty in the HBV Model. Nord. Hydrol. 1997, 28, 247–262. [CrossRef]
60. Lindström, G.; Johansson, B.; Persson, M.; Gardelin, M.; Bergström, S. Development and test of the distributed HBV-96 hydrological model. J. Hydrol. 1997, 201, 272–288. [CrossRef]
61. Geem, Z.W.; Kim, J.H.; Loganathan, G.V. A New Heuristic Optimization Algorithm: Harmony Search. Simulation 2001, 76, 60–68. [CrossRef]
62. Dai, X.; Yuan, X.; Zhang, Z. A self-adaptive multi-objective harmony search algorithm based on harmony memory variance. Appl. Soft Comput. J. 2015, 35, 541–557. [CrossRef]
63. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [CrossRef]

64. Wang, H.; Jiao, L.; Yao, X. Two_Arch2: An Improved Two-Archive Algorithm for Many-Objective Optimization. IEEE Trans. Evol. Comput. 2014, 19, 524–541. [CrossRef]
65. Blazkova, S.; Beven, K. A limits of acceptability approach to model evaluation and uncertainty estimation in flood frequency estimation by continuous simulation: Skalka catchment, Czech Republic. Water Resour. Res. 2009, 45, W00B16. [CrossRef]
66. Komuro, R.; Ford, E.D.; Reynolds, J.H. The use of multi-criteria assessment in developing a process model. Ecol. Modell. 2006, 197, 320–330. [CrossRef]
67. Zhang, H.; Huang, G.H.; Wang, D.; Zhang, X. Multi-period calibration of a semi-distributed hydrological model based on hydroclimatic clustering. Adv. Water Resour. 2011, 34, 1292–1303. [CrossRef]
68. Krause, P.; Boyle, D.P.; Bäse, F. Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci. 2005, 5, 89–97. [CrossRef]
69. Madsen, H. Automatic calibration of a conceptual rainfall-runoff model using multiple objectives. J. Hydrol. 2000, 235, 276–288. [CrossRef]
70. Boyle, D.P.; Gupta, H.V.; Sorooshian, S. Toward improved calibration of hydrologic models: Combining the strengths of manual and automatic methods. Water Resour. Res. 2000, 36, 3663–3674. [CrossRef]
71. Gupta, H.V.; Sorooshian, S.; Yapo, P.O. Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrol. Eng. 1999, 4, 135–143. [CrossRef]
72. Dogulu, N.; López López, P.; Solomatine, D.P.; Weerts, A.H.; Shrestha, D.L. Estimation of predictive hydrologic uncertainty using the quantile regression and UNEEC methods and their comparison on contrasting catchments. Hydrol. Earth Syst. Sci. 2015, 19, 3181–3201. [CrossRef]