Water Research 185 (2020) 116221

Contents lists available at ScienceDirect

Water Research

journal homepage: www.elsevier.com/locate/watres

River algal blooms are well predicted by antecedent environmental conditions

Rui Xia a, Gangsheng Wang b,c,∗, Yuan Zhang a, Peng Yang d, Zhongwen Yang a, Sen Ding a, Xiaobo Jia a, Chen Yang a, Chengjian Liu a, Shuqin Ma a, Jianing Lin a, Xiao Wang a, Xikang Hou a, Kai Zhang a, Xin Gao a, Pingzhou Duan a, Chang Qian a

a State Key Laboratory of Environmental Criteria and Risk Assessment, Chinese Research Academy of Environmental Sciences, Beijing 100012
b State Key Laboratory of Water Resources and Hydropower Engineering Sciences, University, Wuhan 430072, China
c Institute for Environmental Genomics, University of Oklahoma, Norman, OK 73019, USA
d School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China

Article info

Article history: Received 2 May 2020; Revised 2 July 2020; Accepted 22 July 2020; Available online 23 July 2020

Keywords: Diatom; Early warning; Gradient Boosting Machine (GBM); Han River; Nutrient; Water level; Water temperature; Yangtze River

Abstract

River algal blooms have become a challenging environmental problem worldwide due to strong interference of human activities and megaprojects (e.g., big dams and large-scale water transfer projects). Previous studies on algal blooms were mainly focused on relatively static water bodies (i.e., lakes and reservoirs), but less on the large rivers. As the largest tributary of the Yangtze River of China and the main freshwater source of the South-to-North Water Diversion Project (SNWDP), the Han River has experienced frequent algal blooms in recent decades. Here we investigated the algal blooms during a decade (2003–2014) in the Han River by two gradient boosting machine (GBM) models with k-fold cross validation, which used explanatory variables from current 10-day (GBMc model) or previous 10-day period (GBMp model). Our results advocate the use of GBMp due to its higher accuracy (median Kappa = 0.9) and practical predictability (using antecedent observations) compared to GBMc. We also revealed that the algal blooms in the Han River were significantly modulated by antecedent water levels in the Han River and the Yangtze River and water level variation in the Han River, whereas the nutrient concentrations in the Han River were usually above thresholds and not limiting algal blooms. This machine-learning-based study potentially provides scientific guidance for preemptive warning and risk management of river algal blooms through comprehensive regulation of water levels during the dry season by making use of water conservancy measures in large rivers.

©2020 Elsevier Ltd. All rights reserved.

∗ Corresponding author. E-mail address: [email protected] (G. Wang).
https://doi.org/10.1016/j.watres.2020.116221
0043-1354/© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

In recent decades, large-scale algal blooms have become a challenging environmental problem in many large rivers worldwide (Ha et al., 2003; Jeong et al., 2007; Mitrovic et al., 2007; Yang et al., 2012; Liu et al., 2016; Xia et al., 2016; Yang et al., 2017a). Algal blooms in rivers due to excessive nutrient enrichment were regulated by hydrological regimes influenced by intensified human activities and climate change (Zhu et al., 2008; Reichwaldt and Ghadouani, 2012; Yin et al., 2012; Zhang et al., 2016; Yang et al., 2017a). Compared to the eutrophication in relatively static water bodies (e.g., lakes and reservoirs), large river algal blooms could cause more serious consequences to the aquatic ecosystem and human health at a larger scale along rivers with lengths of hundreds of kilometers (Reichwaldt and Ghadouani, 2012; Zhang et al., 2017). For instance, the 1991 Darling River cyanobacterial bloom affected almost 1000 km of the Barwon-Darling River of Australia (Bowling and Baker, 1996). The Han River, the largest tributary of the Yangtze River of China, has experienced more than 10 consecutive algal blooms (defined by algal density ≥ 10⁷ cells L⁻¹) since 1992 (Xia et al., 2001; Zhang et al., 2017). The algal bloom occurring in the middle-lower Han River of China in 2008 spanned more than 400 km, which influenced the drinking water of more than twenty thousand people and resulted in the shutdown of many waterworks (Xia et al., 2012). The causal analysis and prediction of algal blooms in large hydrologically-regulated rivers has become a major concern, as the algal blooms in these rivers not only lead to deteriorating water quality and aquatic ecosystems, but also restrict socioeconomic development and threaten public health.

Previous studies on algal blooms were mainly focused on relatively static water bodies (i.e., lakes and reservoirs), but less on the large rivers (Schmale et al., 2019). Revealing the causes of river algal blooms is still challenging due to complex hydrological conditions, as well as interactions between multiple environmental factors in large river systems (Xia et al., 2019). It is not just excess nutrients and nutrient uptake that drive algal blooms in complicated river ecosystems (Mischke et al., 2011; Yin et al., 2012; Oliver et al., 2014). The diffusion and degradation processes of pollutants in rivers are affected by the horizontal and vertical mixing of water (Mac Donagh et al., 2008; Liu et al., 2012; Xia et al., 2012; Evtimova et al., 2014). Experimental studies have shown that hydrodynamic conditions (i.e., fluctuation and agitation of water) usually have significant effects on the migration, diffusion, growth and accumulation of algae (Chung et al., 2008; Lucas et al., 2009; Whitehead et al., 2015; Yang et al., 2017a). Changes in hydrological conditions may lead to altered light intensities and uneven distribution of nutrients in water bodies, resulting in changes in the nutrient absorption dynamics of algae, thereby affecting algae growth (McKiver and Neufeld, 2009; Istvánovics and Honti, 2012). Relatively static hydrological conditions will also increase the transparency of water bodies, which is beneficial to the growth of algae (Mitrovic et al., 2007; Zhang et al., 2017). Many studies have indicated that algal growth in rivers is sensitive to changes in hydrological conditions, particularly changes in water levels (Yang et al., 2012; Zhang et al., 2017; Cheng et al., 2019). Though hydrological conditions play an important role in triggering algal blooms in riverine ecosystems (Ferris and Lehman, 2007), no persuasive conclusions have been drawn based on traditional statistical or ecological modeling analysis (Long et al., 2011; Kim et al., 2014; Xia et al., 2016).

Here we compile decade-long multi-source field observations and attempt to identify the key driving factors for the occurrence of river algal blooms in the Han River using a machine learning method, i.e., the gradient boosting machine (GBM). GBMs are a family of powerful machine-learning techniques that have proven successful in a wide range of domains such as pattern recognition (Schütz et al., 2019), text classification (Natekin and Knoll, 2013), motion detection (Bouwman et al., 2020), and ecological and environmental studies (De’ath and Fabricius, 2000). The learning procedure in GBMs provides a more accurate estimate of the response variable by consecutively error-fitting (Natekin and Knoll, 2013). Algal blooms in the Han River of China were dominated by excessive diatom (Cyclotella sp.) growth, which led to deterioration of aquatic ecosystems and threatened drinking water security (Xia et al., 2019; Xin et al., 2020). As the nutrient (total nitrogen (TN) and total phosphorus (TP)) concentrations in the downstream Han River in spring were usually far above the eutrophication thresholds (i.e., TN = 0.2 mg N L⁻¹, TP = 0.02 mg P L⁻¹) as per China’s National Water Quality Standard (GB3838-2002) (Liang et al., 2012), we hypothesized that the algal blooms in the mid-downstream Han River were primarily regulated by hydrologic conditions in the Han River and the Yangtze River. We developed two GBM models, i.e., GBMc and GBMp, using measurements of explanatory variables (TN and TP concentrations, water temperature, and hydrologic variables) from the current and previous 10-day period, respectively. We expect to build a predictive model (i.e., the GBMp model using antecedent 10-day period data) for the algal blooms in the Han River, which could be of practical significance for the prevention and control of algal blooms in large rivers.

2. Materials and methods

2.1. Study area and data collection

The Han River is the largest tributary of the Yangtze River of China (Fig. 1). The length of the main river is about 1577 km, and its total drainage area covers 168,400 km². The Han River basin (30–34° N and 106–114° E) has a subtropical monsoon climate, which has distinct seasons and abundant rainfall. The average annual temperature is about 15 °C, and the average annual precipitation ranges from 700 to 1300 mm (Cheng et al., 2019; Xin et al., 2020). The Danjiangkou Reservoir, located in the upper-middle Han River, is the water source for the middle route of the South-to-North Water Diversion Project (SNWDP) (Xia et al., 2016). Flowing through the most economically developed region in Province, the middle-lower Han River has a length of about 652 km from the to the confluence of the Han River and the Yangtze River (Xia et al., 2012; Cheng et al., 2019; Xin et al., 2020). The width of the Han River is about 600–1200 m in the middle stream and 200–300 m in the downstream, with average water depths of 6–12 m (Xu, 1996; 1997; Xu et al., 2002). There are about 79 fish species in the middle-lower Han River and the largest group is Cyprinidae, followed by Anguidae (Yang et al., 2020). Major aquatic macrophytes are P. perfoliatus, P. malaianus, Vallisneria natans, Salvinia natans, A. philoxeroides, and Artemisia selengensis (Yang et al., 2017b).

The causes of algal blooms in the downstream Han River are very complicated, since the blooms are affected by the water environment in the Han River, which is regulated by water release from the upstream Danjiangkou Reservoir, and are closely related to the hydrological regime (backwater zone) in the Yangtze River (Fig. 1). In this study, we focus on the most serious algal blooms at the Wuhan section in the downstream portion of the Han River from 2003 to 2014. We used the hydrological data from the Hanchuan Station and the Hankou Station to represent the hydrological conditions in the Han River and the Yangtze River, respectively (Fig. 1). Hydrological data involved in this study include daily water levels, flow velocities, and streamflow rates. In addition, 10-day water quality data (i.e., TN and TP concentrations) and algae data (algae density (L⁻¹), Chlorophyll a (μg L⁻¹)) at three sections (Baihezui, Qinduankou and Zongguan, see Fig. 1) were provided by the Yangtze River Basin Ecological and Environmental Supervision Authority. We used the water quality and algae data averaged across these three sections for modeling.

2.2. Explanatory variables and response variable for the modeling

Based on available data in the Han River and the Yangtze River, we selected three categories of representative explanatory variables for the GBM modeling: (i) hydrologic variables in the Han River (HR) and the Yangtze River (YR): flow velocity (HRV and YRV, with units of m s⁻¹), streamflow discharge (HRQ and YRQ, m³ s⁻¹), water level (HRWL and YRWL, m), and coefficient of variation for water level (HRWLcv and YRWLcv, unitless); (ii) water temperature (WTMP, °C) in the Han River; and (iii) water quality in the Han River: TN (mg N L⁻¹) and TP (mg P L⁻¹). We excluded the streamflow discharges (HRQ and YRQ, m³ s⁻¹) from explanatory variables as they were highly correlated with water levels (correlation coefficient ρ = 0.78 for HR and 0.91 for YR, SI Fig. S1). This is understandable as streamflow discharge is usually estimated by the stage-discharge relation (Mansanarez et al., 2019). However, we kept both flow velocities (HRV and YRV) and water levels (HRWL and YRWL) in the GBM modeling because they represented water mobility and quantity, respectively, although these two variables were moderately correlated (ρ ≤ 0.7, SI Fig. S1).
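The correlation screening described above can be illustrated with a minimal Python sketch. It is not the authors' code (the paper's analysis was done in R); the function names, the toy values, and the use of |ρ| ≥ 0.7 as a hard flagging cutoff are assumptions made for illustration only. The sketch computes Spearman's ρ as the Pearson correlation of average ranks and lists variable pairs that exceed the cutoff.

```python
from itertools import combinations


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the two rank vectors.

    Assumes neither input is constant (a constant series has zero rank variance).
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def highly_correlated_pairs(table, cutoff=0.7):
    """List (name_a, name_b, rho) for every pair with |rho| >= cutoff (assumed rule)."""
    flagged = []
    for a, b in combinations(sorted(table), 2):
        rho = spearman_rho(table[a], table[b])
        if abs(rho) >= cutoff:
            flagged.append((a, b, round(rho, 2)))
    return flagged


# Hypothetical toy values, NOT data from the study.
toy = {
    "HRWL": [18.2, 18.9, 19.5, 20.1, 20.8, 21.3],   # water level, m
    "HRQ": [610, 700, 820, 900, 1100, 1250],        # discharge, m3 s-1
    "WTMP": [9.5, 12.0, 8.1, 14.2, 10.3, 11.0],     # water temperature, deg C
}

if __name__ == "__main__":
    print(highly_correlated_pairs(toy))
```

In this toy table only the discharge and water level pair is flagged, which mirrors the reasoning for dropping HRQ and YRQ while keeping the less strongly correlated velocity and water level variables.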

Fig. 1. Mid-downstream of the Han River, the largest tributary of the Yangtze River of China. Water levels were measured at the Hanchuan Station for the Han River and at the Hankou Station for the Yangtze River. Water quality and algae density were measured at three monitoring stations: Baihezui (BHZ), Qinduankou (QDK), and Zongguan (ZG). The Danjiangkou Reservoir serves as the freshwater source for the middle route of the South-to-North Water Diversion Project (SNWDP).

As algal blooms generally lasted around 10 days in the Han River (Xia et al., 2016; Cheng et al., 2019), we divided each month into three periods, i.e., the early 10-day period, the middle 10-day period, and the remaining days as the third (late) period. Previous studies have suggested using 10⁷ cells L⁻¹ as the indicator of algal blooms in the Han River (Lu et al., 2000; Dou et al., 2002; Xia et al., 2016; Cheng et al., 2019). In this study, we adopted the same threshold to determine the outbreak of algal blooms in the Han River, i.e., the presence and absence of blooms are defined by algal density ≥ 10⁷ cells L⁻¹ and < 10⁷ cells L⁻¹, respectively. In summary, the response variable is binary, i.e., the presence or absence of algal blooms. We attempted to explore the dependence of algal blooms on nine explanatory variables by the GBM modeling: HRV and YRV, HRWL and YRWL, HRWLcv and YRWLcv, WTMP, TN, and TP.

2.3. Gradient boosting machine modeling

The GBM is a powerful machine learning technique to train decision trees in a gradual, additive and sequential manner (Natekin and Knoll, 2013). The final GBM model is the weighted sum of the predictions generated by the previous tree models, where the classification process is repeated so that subsequent trees attempt to classify observed data that have not been well classified by the previous trees. The GBM algorithm has been used to solve a variety of practical applications including the binary classification problem (De’ath and Fabricius, 2000; Natekin and Knoll, 2013), which is the case for this study.

We examined two GBM models using the nine explanatory variables from the current or previous period. We named these two models GBMc and GBMp, respectively. The GBM modeling was conducted using the ‘gbm’ method with the k-fold cross validation in the R package ‘caret’ (Kuhn et al., 2020). To build a predictive model we need to implement model calibration and validation. Model validation can be done using data that are not used in model training (i.e., split-sample validation) or by the k-fold cross-validation approach (De’ath and Fabricius, 2000). Cross-validation also helps to avoid overfitting of the model (Natekin and Knoll, 2013). We adopted both strategies to validate our GBM models, where the k-fold cross-validation was embedded in the split-data validation procedure.

First, we used the full datasets during the study period to train the GBMc and GBMp models respectively, with the k-fold cross-validation method, where the training dataset was further partitioned into k (i.e., k = 10 in this study) subsets and each subset was held out while the model was trained on all other subsets (De’ath and Fabricius, 2000). We trained the models multiple times (i.e., 10,000 times) to assess the uncertainty in model performance and outputs (Nelson et al., 2018).

Second, for the model (GBMc or GBMp) that performed better in the first step, we further implemented the split-sample validation, i.e., splitting the full dataset into two subsets: one for model training and the other for testing (Natekin and Knoll, 2013). To examine the effects of data partitioning on model performance, we used six different percentages (50–100% with a 10% interval) to partition the training and testing subsets. During model training, the k-fold cross-validation was also used.

2.4. Relative importance of explanatory variables

Relative importance of each explanatory variable was estimated using the ‘varImp’ function in the ‘caret’ package (Kuhn et al., 2020). Variable importance was determined by calculating the relative influence of each variable: whether that variable was selected to split on during the tree building process, and how much the squared error (over all trees) improved (decreased) as a result. In each GBM model run, the variable importance scores were scaled to be between 0 (least important) and 100 (most important). We summarized the variable importance scores from ensemble (i.e., 10,000) model runs for GBMc and GBMp, respectively. The quantitative effects of important predictors on the occurrence of algal blooms were analyzed by the R package ‘pdp’ (Greenwell, 2017), which enables graphical examination of the dependence of the response variable on subsets of the explanatory variables. Specifically, we used the two-predictor partial dependence analysis (Greenwell, 2017) to quantify the interactions between important variables and their impacts on the occurrence probability of algal blooms in the Han River. We only show the partial dependence plot within the ranges of the training data to avoid extrapolation of such relationships to regions outside the area of the training data (Greenwell, 2017).

2.5. Statistics

Model performance was evaluated by accuracy, Cohen’s Kappa statistic, and the Area Under the Receiver Operating Characteristic curve (AUROC). The accuracy criterion evaluates the percentage of correctly classified instances out of all instances (Natekin and Knoll, 2013). The Kappa statistic quantitatively measures the magnitude of agreement on the presence or absence of algal blooms between model simulations (predictions) and observations, where Kappa = (observed accuracy − expected accuracy)/(1 − expected accuracy) (Viera and Garrett, 2005). Kappa values above 0.6 demonstrate substantial strength of agreement (Landis and Koch, 1977; Viera and Garrett, 2005). A Kappa value of 1 indicates complete agreement. The AUROC is another useful metric to show the trade-off between true positive rate and false positive rate across different decision thresholds (Schütz et al., 2019). An AUROC of 1 represents complete agreement and AUROC ≥ 0.8 is considered to be “excellent” (Hart, 2015).

We used the non-parametric Kruskal-Wallis (KW) method (Giraudoux, 2013) to test the significance of difference in each of the nine explanatory variables between the presence and absence of algal blooms at a significance level of 0.05. We used the Spearman Correlation Analysis (SCA) to quantify the degree of correlation between two variables, as SCA does not require normally distributed variables (Asuero et al., 2007). Based on the absolute value of the correlation coefficient (ρ), the strength of correlation is defined as (Asuero et al., 2007) (i) high if |ρ| ≥ 0.7 and p-value < 0.05; (ii) moderate if 0.5 ≤ |ρ| < 0.7 and p-value < 0.05; (iii) low or weak if 0.3 ≤ |ρ| < 0.5 and p-value < 0.05; (iv) marginal if 0.3 ≤ |ρ| < 0.5 and 0.05 ≤ p-value < 0.1; and (v) little or uncorrelated if 0 ≤ |ρ| < 0.3 or p-value > 0.1.

3. Results

3.1. Variation of algae densities in the downstream Han River

Based on the algal cell densities measured at three sections (SI Fig. S2) of the downstream Han River during 2003–2014, there were severe algal blooms in four consecutive years (i.e., 2008–2011) (Fig. 2a). The algae densities were highly correlated with the concentrations of chlorophyll a (Chl-a) (correlation coefficient = 0.73, see SI Fig. S3). The algal blooms generally occurred between February and April of these years (SI Fig. S2), with the mean algae density ranging from 1.0–5.1 × 10⁷ cells L⁻¹. In particular, the algal blooms in 2008 had a peak density of 5.3 × 10⁷ cells L⁻¹ (compared to the threshold of 1 × 10⁷ cells L⁻¹) in the Wuhan section, with a peak concentration of Chl-a of 30.93 μg L⁻¹, which was the most serious algal bloom event in the lower Han River during the 12-year study period. Although the highest algal densities (> 0.6 × 10⁷ cells L⁻¹) in 2004–2005 and 2012–2013 were also close to the threshold of algal blooms (10⁷ cells L⁻¹) (Yin et al., 2017; Cheng et al., 2019), the severity was quite different from that in 2008–2011. The maximum algal densities in the other three years (2006–2007 and 2015) were lower than 0.5 × 10⁷ cells L⁻¹ (Fig. 2a). The algal blooms mostly occurred from middle February (M-Feb) to middle March (M-Mar) through multi-year analysis (Fig. 2b). The algal densities usually reached their highest in middle February, whereas the average algae densities in late February were similar to those in middle March. In summary, there were 102 sets (periods) of data (only 3-period data available in the first year 2003) for modeling, among which there were 12 algal bloom events with algal density ≥ 10⁷ cells L⁻¹.

3.2. Predictive modeling of river algal blooms by gradient boosting machine

As mentioned in the methods, we asked whether it was possible to build a predictive model (i.e., GBMp) driven by input data from the previous 10-day period. Therefore, we tested two models (GBMp and GBMc) to see if there was any difference in modeling performance between them. If the GBMp model performed well compared to the GBMc model, we would have a model with a 10-day lead time, which would be valuable for the early warning of river algal blooms.

Overall, the performance of GBMp was significantly better than that of GBMc (Fig. 3a, p-value < 0.001) as per the Kruskal-Wallis test (R Core Team, 2018). The median Kappa statistic was 0.6 for GBMc and 0.9 for GBMp (Fig. 3a). The 50% confidence interval of Kappa was much higher for GBMp (0.78–0.95) than that of GBMc (0.43–0.71). Noting that Kappa values above 0.6 demonstrate substantial strength of agreement (Landis and Koch, 1977; Viera and Garrett, 2005), our results showed that 95% of the GBMp models achieved a Kappa above 0.64, whereas only 50% of the GBMc models yielded a Kappa above 0.6. In addition, among the 10,000 model runs, 19.5% of the GBMp models demonstrated complete agreement (i.e., Kappa = 1) between model predictions and observations, compared to 2.7% for the GBMc models (Fig. 3b). A Kappa value of 1 indicates that the presence and absence of algal blooms were correctly classified by the model. The excellent model performance of GBMp was further corroborated by the metric of AUROC (Hart, 2015), which was generally higher than 0.85.

We further applied the split-sample validation to the GBMp model. The GBMp model performed excellently (Kappa = 1) during the training process no matter what percentage (50–100%) of the data were used for training (Table S1). However, the split-sample testing (validation) of the GBMp model did not show substantial strength of agreement (i.e., Kappa < 0.6) until over 90% of the data were used for training, although all the accuracy values were greater than or equal to 0.85 (Table S1).

3.3. Identification of important factors for river algal blooms

We evaluated the variable importance (denoted by the “var_imp” values between 0 and 100) based on the ensemble simulations of both models (GBMc and GBMp) that achieved complete agreement (Kappa = 1). We found that water level and its variation were generally more important than flow velocity and nutrient (TN and TP) concentrations (Fig. 4) during the spring (February–April of 2003–2014). Water temperature was also more important than nutrient concentrations since the concentrations of TN and TP were high and not the limiting factors for algal growth. However, the ranks of the relative importance of the variables were different between the two models (GBMc and GBMp). As for the GBMc model, the most important factor was the water level in the Yangtze River (average var_imp of YRWL = 100), followed by water temperature (var_imp of WTMP = 32), water level (var_imp of HRWL = 29), and water level variation (var_imp of HRWLcv = 26) in the Han River (Fig. 4a). In the more accurate GBMp model, the water level variation in the Han River became the most important (var_imp of pHRWLcv = 92), followed by the water level in the Yangtze River (var_imp of pYRWL = 86), and the water level (var_imp of pHRWL = 68) and temperature (var_imp of pWTMP = 52) in the

Fig. 2. Algae density in the lower Han River in spring. (a) yearly scale, (b) 10-day scale. The prefixes E, M, and L before the month in (b) represent the early, middle, and late 10-day periods of each month, respectively. The square symbol in each box represents the mean value.
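The 10-day periods shown in Fig. 2b, the bloom threshold, and the pairing of antecedent predictors with the current response can be sketched in Python as follows. This is an illustrative sketch, not the authors' preprocessing: the record layout, the function names, and the assumption that consecutive records are adjacent 10-day periods are all hypothetical. Only the E/M/L split and the 10⁷ cells L⁻¹ threshold come from the text.

```python
from datetime import date

BLOOM_THRESHOLD = 1e7  # cells per litre; bloom threshold stated in the text
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def period_label(day):
    """Map a date to its 10-day period label, e.g. 'M-Feb'.

    Days 1-10 are early (E), 11-20 middle (M), the rest of the month late (L).
    """
    prefix = "E" if day.day <= 10 else "M" if day.day <= 20 else "L"
    return f"{prefix}-{MONTHS[day.month - 1]}"


def is_bloom(algal_density):
    """Binary response: True when density reaches the bloom threshold."""
    return algal_density >= BLOOM_THRESHOLD


def pair_with_previous(records):
    """Build GBMp-style rows: previous-period predictors, current-period response.

    `records` is a chronologically ordered list of dicts with the hypothetical
    keys 'period', 'predictors' and 'density'. Consecutive records are ASSUMED
    to be adjacent 10-day periods; real data with gaps would need a check.
    """
    rows = []
    for previous, current in zip(records, records[1:]):
        lagged = {"p" + name: value for name, value in previous["predictors"].items()}
        rows.append({
            "period": current["period"],
            "predictors": lagged,
            "bloom": is_bloom(current["density"]),
        })
    return rows


# Hypothetical toy records, NOT data from the study.
toy_records = [
    {"period": period_label(date(2008, 2, 5)),
     "predictors": {"HRWL": 18.1, "YRWL": 15.9}, "density": 0.4e7},
    {"period": period_label(date(2008, 2, 15)),
     "predictors": {"HRWL": 17.9, "YRWL": 15.7}, "density": 2.1e7},
    {"period": period_label(date(2008, 2, 25)),
     "predictors": {"HRWL": 18.4, "YRWL": 16.2}, "density": 1.3e7},
]

if __name__ == "__main__":
    for row in pair_with_previous(toy_records):
        print(row)
```

Using the current period's own predictors instead of the lagged ones would give the GBMc-style input; the lagged version is what provides the 10-day lead time.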

Fig. 3. Performance evaluation of the GBMc and GBMp models from 10,000 runs. (a) Kappa statistic for the two models. (b) Percentage of models (% Models) with Kappa = 1. GBMc and GBMp denote the gradient boosting machine (GBM) model using input data from current 10-day period and previous 10-day period, respectively. The Kappa statistic quantitatively measures the magnitude of agreement on the presence or absence of algal blooms between model simulations and observations. Kappa values above 0.6 demonstrate substantial strength of agreement. A Kappa of 1 indicates complete agreement. The performance of GBMp was much better than GBMc as indicated by the Kruskal-Wallis test (‘∗∗∗’ in (a) denotes p-value < 0.001).
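To make the Kappa statistic used in Fig. 3 concrete, the following Python sketch implements the formula Kappa = (observed accuracy − expected accuracy)/(1 − expected accuracy) for binary bloom labels. It is an illustration under stated assumptions rather than the authors' evaluation code: the function name and the two prediction scenarios are hypothetical, and only the counts of 102 periods and 12 bloom events are taken from the text.

```python
def cohen_kappa(observed, predicted):
    """Cohen's Kappa for two equally long label sequences.

    observed accuracy = share of matching labels;
    expected accuracy = agreement expected by chance from the label frequencies.
    Returns NaN in the degenerate case where chance agreement is exactly 1.
    """
    n = len(observed)
    if n == 0 or n != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    observed_acc = sum(o == p for o, p in zip(observed, predicted)) / n
    labels = set(observed) | set(predicted)
    expected_acc = sum(
        (observed.count(label) / n) * (predicted.count(label) / n)
        for label in labels
    )
    if expected_acc == 1.0:
        return float("nan")
    return (observed_acc - expected_acc) / (1.0 - expected_acc)


# 102 periods with 12 bloom events, as reported in the text.
observations = [True] * 12 + [False] * 90

# Hypothetical scenario A: 10 blooms caught, 2 missed, 1 false alarm.
scenario_a = [True] * 10 + [False] * 2 + [True] * 1 + [False] * 89

# Hypothetical scenario B: a trivial model that never predicts a bloom.
scenario_b = [False] * 102

if __name__ == "__main__":
    for name, preds in (("A", scenario_a), ("B", scenario_b)):
        accuracy = sum(o == p for o, p in zip(observations, preds)) / len(preds)
        print(name, round(accuracy, 3), round(cohen_kappa(observations, preds), 3))
```

Scenario B shows why Kappa is reported alongside accuracy for a rare event: never predicting a bloom is right for about 88% of the periods, yet its Kappa is zero because that agreement is no better than chance.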

Han River (Fig. 4b). Regarding the GBMp model, we found only the water levels (pHRWL and pYRWL) were significantly lower (p-value < 0.05) for the presence than the absence of algal blooms (SI Fig. S4). The variation in water level was generally small (ranged 0.001–0.08) during the spring and not significantly different between the absence and occurrence of algal blooms (SI Fig. S4 and Fig. S5). For example, the 95% CI of pHRWLcv ranged 0.002–0.03 and 0.001–0.02, and the 95% CI of pYRWLcv ranged 0.003–0.06 and 0.003–0.05 for the absence and presence of algal blooms, respectively. The outbreak of algal blooms was also characterized by slightly (statistically insignificant) higher TN and TP concentrations as well as lower water temperature and flow velocity (SI Fig. S4).

Using the two-predictor partial dependence analysis (Greenwell, 2017), we could further quantify the interactions between pHRWL and pYRWL and their impacts on the occurrence probability of algal blooms in the lower Han River. There existed two regions with very low probability (≤ 0.05) of algal blooms in the spring (Fig. 5): one region required relatively higher pHRWL (19.64–21.06 m) and a wider range of pYRWL (14.92–18.37 m), the other corresponded to lower pHRWL (17.51–18.58 m) and a much narrower range of pYRWL (15.59–16.45 m).

4. Discussion

4.1. Important factors regulating river algal blooms in the Han River

The occurrence of river algal blooms is attributed to a variety of abiotic and biotic factors, such as hydrodynamic conditions, water temperature, nutrient concentrations, light intensity, and zooplankton densities (Ha et al., 2003; Lee and Lee, 2018; Xia et al., 2019). High light intensity and nutrient concentrations are required for accrual of high algal biomass in lakes and rivers (Stevenson, 2009). While light intensity is thought to be a major inducing factor for algal blooms in lakes and reservoirs (Imteaz and Asaeda, 2000), frequently varying hydrological conditions in river ecosystems have a great influence on the light climate of the algae (Yang et al., 2012). In addition, previous studies have shown that diatoms are better adapted to low light conditions than green algae (Lionard et al., 2005; Stevenson, 2009; Yang et al., 2012). Therefore, the dearth of light intensity data in our study might not have significant effects on the results. We acknowledge that biological controls (e.g., zooplankton) also play an important role in regulating diatom blooms (Ha et al., 2003; Stevenson, 2009). However, the lack of long-term biological measurements has impeded the inclusion of such variables in machine-learning-based modeling of algal blooms in large rivers (Recknagel et al., 1997; Coppola et al., 2006; Yang et al., 2012; Lee and Lee, 2018).

Fig. 4. Relative importance of explanatory variables from the Gradient Boosting Machine (GBM) model using data from (a) current 10-day period (GBMc) and (b) previous 10-day period (GBMp). Explanatory variables include HRWL (Han River (HR) Water Level), HRWLcv (coefficient of variation for HRWL), HRV (HR flow Velocity), YRWL (Yangtze River (YR) Water Level), YRWLcv (coefficient of variation for YRWL), YRV (YR flow Velocity), WTMP (HR Water TEMperature), TN (HR Total Nitrogen) and TP (HR Total Phosphorus). The prefix “p” in each variable in (b) denotes the observations from “previous 10-day period”.

Fig. 5. Partial dependence of river algal blooms on pHRWL and pYRWL (previous 10-day water level in the Han River and the Yangtze River, units: m) plotted on the probability scale.
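The idea behind the two-predictor partial dependence in Fig. 5 can be sketched in plain Python. The study itself used the R package ‘pdp’; the code below is only a generic illustration. The function names, the stand-in probability model, its coefficients, and the toy rows are all hypothetical. For each grid point, the two chosen predictors are overwritten in every row, the model is evaluated, and the predictions are averaged. The grid is limited to the observed range, in line with the paper's choice to avoid extrapolation.

```python
import math


def grid_within_range(values, n_points):
    """Evenly spaced grid between the observed minimum and maximum (no extrapolation)."""
    low, high = min(values), max(values)
    if n_points < 2 or low == high:
        return [low]
    step = (high - low) / (n_points - 1)
    return [low + i * step for i in range(n_points)]


def partial_dependence_2d(predict, rows, feat_a, grid_a, feat_b, grid_b):
    """Average prediction over all rows with feat_a and feat_b fixed at each grid pair."""
    surface = {}
    for a in grid_a:
        for b in grid_b:
            total = 0.0
            for row in rows:
                modified = dict(row)       # copy so the input rows stay untouched
                modified[feat_a] = a
                modified[feat_b] = b
                total += predict(modified)
            surface[(a, b)] = total / len(rows)
    return surface


def toy_bloom_probability(row):
    """Hypothetical stand-in for a fitted classifier; coefficients are invented."""
    score = 40.0 - 1.5 * row["pHRWL"] - 0.8 * row["pYRWL"] - 0.05 * row["pWTMP"]
    return 1.0 / (1.0 + math.exp(-score))


# Hypothetical toy rows, NOT data from the study.
toy_rows = [
    {"pHRWL": 17.6, "pYRWL": 15.2, "pWTMP": 9.0},
    {"pHRWL": 18.9, "pYRWL": 16.4, "pWTMP": 11.5},
    {"pHRWL": 20.7, "pYRWL": 18.1, "pWTMP": 13.0},
]

if __name__ == "__main__":
    grid_hr = grid_within_range([r["pHRWL"] for r in toy_rows], 4)
    grid_yr = grid_within_range([r["pYRWL"] for r in toy_rows], 4)
    surface = partial_dependence_2d(
        toy_bloom_probability, toy_rows, "pHRWL", grid_hr, "pYRWL", grid_yr
    )
    for (hr, yr), prob in sorted(surface.items()):
        print(f"pHRWL={hr:.2f} m, pYRWL={yr:.2f} m -> mean probability {prob:.3f}")
```

Averaging over the rows marginalizes the remaining predictors (here only the hypothetical pWTMP), so the resulting surface reflects the joint effect of the two water levels on the predicted probability.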

Based on the available data, our analysis of variable importance using the predictive GBMp model quantitatively shows that the occurrence of spring algal blooms in the downstream Han River was mainly controlled by hydrologic conditions, i.e., water levels in the Han River and the Yangtze River and water level variation of the Han River, rather than other factors including water temperature, flow velocity, and nutrient (N and P) concentrations. Variable importance represents the contribution of each predictor variable to the constructed model (Kuhn et al., 2020). Different from the eutrophication in lakes and reservoirs likely occurring in summer and autumn with relatively higher temperature, river algal blooms usually occur in low temperature seasons, i.e., late winter or early spring (Yang et al., 2012; Xia et al., 2016). Suitable water temperatures for the growth of blue algae, green algae, and blue–green algae are approximately 30–35 °C, 20–25 °C, and 15–20 °C, respectively (Cheng et al., 2019). In contrast, diatom blooms in rivers usually require lower temperature, and their growth may be favored by the water environment in spring and fall (Potapova and Charles, 2007; Yin et al., 2012). This is supported by our long-term analysis that the algal blooms often occurred in the springtime (February–April) in the Han River. In addition, our results show no significant difference (p-value = 0.1) in temperature between the algal-bloom and non-bloom events, although lower temperature was generally found for the presence of algal blooms (SI Fig. S4).

While excess nutrients are an essential prerequisite for algal blooms, previous studies have affirmed that nutrients are not necessarily the limiting factor for most of the river algal bloom events (Mitrovic et al., 2007; Yang et al., 2017a). A few studies have confirmed that the concentrations of nutrients (e.g., TN and TP) in the Han River already satisfied the basic conditions for diatom growth and were not the major limiting factors of algal blooms (Liang et al., 2012; Yin et al., 2012). Our data show that the nutrient concentrations (TN = 1.6–2.6 mg N L⁻¹, TP = 0.06–0.30 mg P L⁻¹) in the spring of the Han River were far above the eutrophication thresholds (TN = 0.2 mg N L⁻¹, TP = 0.02 mg P L⁻¹) (Liang et al., 2012). Moreover, Zeng et al. (2006) concluded that phytoplankton and Chl-a concentrations in the Dam Region had no significant positive correlations with the main nutrient contents. Similar studies have claimed that nutrients would not be the main limiting factors when TP concentration in the river exceeds 0.03 mg L⁻¹, but other physical factors could exert evident and direct gain effects on river algal blooms (Hilton et al., 2006; Sivapragasam et al., 2010). Our results show that the TP concentrations (0.06–0.30 mg L⁻¹) during the spring in the HR were notably higher than 0.03 mg L⁻¹. However, TP concentrations were not significantly different between the absence (95% CI of TP = 0.07–

(pYRV and pHRV) with respect to their effects on algal blooms (Fig. 4b), though water velocities were moderately or weakly correlated with water levels in the two rivers (SI Fig. S6). Recent laboratory and field experiments have shown that hydrologic conditions affected algal absorption and growth kinetics (Lucas et al., 2009; McKiver and Neufeld, 2009; Ji et al., 2017; Yang et al., 2017a). For example, it was found that steady water level, less water flow, and lower velocity could reduce water level fluctuation, and lengthen the retention of algae in the river, thereby influencing algal migration, diffusion, and accumulation (Soballe and Kimmel, 1987; Neal et al., 2006). In addition, changes in water level and flow velocity might reduce the algal reproduction rate and prevent algal blooms in rivers by accelerating pollutant dispersion and degradation (Chung et al., 2008; Ji et al., 2017; Yin et al., 2017).

Our results imply that the effects of water level fluctuation on algal blooms in the Han River could be more complicated than expected in actual circumstances. The variation in Han River water level (pHRWLcv) was the most important factor identified by the GBMp model, which means that the overall errors in the GBM decision trees were greatly reduced when split by pHRWLcv. However, pHRWLcv was not significantly different between the presence and absence of algal blooms (SI Fig. S4). This implies that the river algal blooms in the Han River could not be merely regulated by water level fluctuation. The importance of pHRWLcv in this study might represent the interaction between water level variation and other hydrodynamic characteristics including the turbulent eddy viscosity, the buoyancy frequency, and the mixing depth (Ji et al., 2017). These complex hydrodynamic factors altogether could result in effective or ineffective water circulation patterns in preventing algal blooms in rivers (Ji et al., 2017). Nevertheless, water levels and their variation are critical in developing the predictive GBMp model, not to mention they are much easier to measure compared to the aforementioned hydrodynamic variables.

4.2. Prediction of river algal blooms by machine learning methods

Given the multitude, interplay, and complexity of diverse biotic and abiotic factors in regulating algal blooms, developing a robust model for providing accurate prediction capability can be a daunting challenge (Coppola et al., 2006). Machine learning methods, alternative to linear/nonlinear regression and process-based models, have been widely used in ecological and environmental modeling studies (De’ath and Fabricius, 2000; Xiao et al., 2017; Friedel et al., 2020). Data-driven statistical or machine learning models are feasible approaches for simulating algal blooms in complex river environments, due to their relatively lower data requirement compared to mechanistically process-based models (Park et al., 2015; Lee and

0.17 mg L –1) and the presence (95% CI of TP = 0.07–0.19 mg L –1) of Lee, 2018 ; Shen et al., 2019 ). Machine learning models may work algal blooms (p-value = 0.18) ( SI Fig. S4). Such insignificant differ- in a case where statistical models fail due to little or weak cor- ence was also found for the TN concentrations ( SI Fig. S4). All these relations between response and explanatory variables. For exam- studies and analyses further support our findings that TN and TP ple, in this study, the algal densities were not correlated with any concentrations could not be solely used as indicators for river algal explanatory variables except for a low correlation coefficient (– blooms in the spring for the Han River. 0.22) with pWTMP (previous 10-day water temperature) ( SI Fig. Our analysis indicates that the algae blooms in the lower Han S6). However, we successfully built a predictive GBM model (i.e., River was closely related to the hydrologic conditions in both the GBMp) to investigate algal blooms in the Han River by convert- Yangtze River and the Han River, which is evident by notably ing the problem to a binary (presence or absence of algal bloom) lower water levels during the presence than the absence of algal classification issue, which suggests machine-learning-based model blooms ( SI Fig. S4). This means that low water levels in both the could be an effective early-warning tool to predict the outbreak of Han River and the Yangtze River in the previous 10 days might algal blooms in large rivers. be favorable for algal growth. Moreover, the water levels in the Successful model validation could increase confidence in model Yangtze River play a vital role in the occurrence of algal blooms predictions. 
If enough data are available, we can split the data into in the Han River, as the hydrodynamic conditions in the Yangtze two subsets: one for model training (calibration) and the other for River would exert significant influence on the stratification and testing (validation). However, it may be insufficient to build a GBM stabilization of the water body and the growth and death of al- model using only a subset of the data ( Natekin and Knoll, 2013 ). gae in the lower Han River. In addition, we found that water lev- Because there were only 12 algal bloom events out of the 102 pe- els (pYRWL and pHRWL) were more useful than flow velocities riods in this study, our split-sample validation results suggest us- 8 R. Xia, G. Wang and Y. Zhang et al. / Water Research 185 (2020) 116221 ing the full dataset to train the GBM model with the k -fold cross- by the water conservancy and environmental protection depart- validation method ( De’ath and Fabricius, 20 0 0 ). Such k -fold cross- ments. Generally, the water level in a section of a river channel validation is one of the best approaches to build a less biased is closely related to flow velocity and streamflow rate. The water model based on limited available data ( Kuhn et al., 2020 ). level (pHRWL) at the Hanchuan station in the lower Han River re- The GBM model developed here offers a straightforward and flects the comprehensive hydrologic conditions of the Han River. quantitative approach to developing an initial appraisal of how Similarly, the water level (pYRWL) at the Hankou station near the important potentially manageable (e.g., water levels) versus in- outlet of the Han River represents the comprehensive hydrologic tractable (e.g., nutrient loads) and non-manageable (e.g., water conditions of the Yangtze River. Based on our machine learning temperature) system features are. 
By applying the machine learn- model, we need to coordinate the water levels in the two rivers ing based method, we have the capability to identify important (the Han River and the Yangtze River) for an effective control of driving factors that could not be discerned by traditional ap- algal blooms in the Han River ( Fig. 5 ). These water level ranges proaches such as correlation or regression analyses. may provide a reference for water level regulation in the Han River More importantly, we found that better model performance and the Yangtze River to reduce the risk of algae blooms, though could be achieved when the previous, rather than the current, 10- large uncertainty remains due to interactive effects among multi- day data were used as explanatory variables for the modeling of ple factors including water levels, water temperature and nutrient river algal blooms. Compared to the GBMc model without a lead availability. time, the GBMp model can be potentially used for early-warning On the basis of our findings it appears critical to further detection of the occurrence of algal blooms using 10-day time-lag strengthen the monitoring of pollution discharges and adopt com- input data. This 10-day period represents the natural time lags be- prehensive regulation of water levels in the Han River (Hanchuan tween river system conditions and algal population responses in Station) and the Yangtze River (Hankou Station) during the dry the Han River ( Xia et al., 2016 ; Cheng et al., 2019 ). Our predictive season by making use of water conservancy measures ( Zhou et al., GBMp model indicates that the trajectory of the algal population 2020 ), such as the Danjiangkou Reservoir in the middle-upstream over a 10-day period can be forecasted on the basis of real-time Han River, the Three Gorges Reservoir in the mainstream Yangtze measurements ( Coppola et al., 2006 ). 
To manage and prevent se- River, and the Yangtze-Han River Water Diversion (YHWD) project vere occurrences, it is of great importance to make reliable early- ( Liu et al., 2016 ). By quantifying the critical importance of hydro- warning predictions of algal blooms by accounting for key environ- logical conditions for river algal blooms, our results highlight the mental variables ( Park et al., 2015 ). The lack of understanding of necessity of implementing water-hydropower joint operation and the complex physicochemical-biological processes and limited data predictive early warning to reduce the risk of river algal blooms availability hinder the development and application of process- influenced by mega water transfer projects. based modeling, particularly the early-warning, of algal blooms ( Park et al., 2015 ; Lee and Lee, 2018 ). One promising alternative 5. Conclusions approach could be the data-driven machine learning methodology such as the GBM model, though it is sensitive to extreme values To improve the security of freshwater resources for the protec- and noises as well as there is currently no fast and efficient base- tion of public health, it is vital to guide policy decision by mak- leaner to capture interactions ( Natekin and Knoll, 2013 ). Therefore, ing predictions of river algal blooms ( Schmale et al., 2019 ). Here we strongly advocate the use of the GBMp model due to its higher we used the GBM methods to investigate the algal blooms dur- accuracy and practical predictability compared to the GBMc model. ing a decade (2003–2014) in the Han River, the largest tributary In addition to the identification of key driving factors, we intend to of the Yangtze River of China. We found that better model perfor- predict the chance of the outbreak of river algal blooms, instead of mance could be achieved when the previous, rather than the cur- exact algal densities, in the next 10-day period. 
By this means, our rent, 10-day data were used as explanatory variables for the mod- GBMp model is useful as an early-warning decision support system eling of river algal blooms. Therefore, we strongly suggest using for the risk management of river algae blooms. the GBMp model (with a 10-day lead time) due to its higher accu- racy and practical predictability compared to the GBMc model. In 4.3. Implications for mitigation and prevention of river algal blooms addition, our analysis show that TN and TP concentrations could not be solely used as indicators for river algal blooms, whereas the Algal blooms are generally prone to occur in relatively static occurrence of spring algal blooms in the downstream Han River freshwaters (e.g., lakes and reservoirs), but they also happen in was majorly controlled by hydrologic conditions, i.e., water levels large rivers (e.g., the Han River) in recent decades ( Kim et al., 2014 ; in the Han River and the Yangtze River and water level variation of Oliver et al., 2014 ; Xia et al., 2016 ; Schmale et al., 2019 ). Although the Han River. We highlight the necessity to coordinate the water much is yet to be understood about algal blooms in rivers, com- levels in the two rivers for an effective control of algal blooms in pared to lakes, different modeling approaches have been developed the Han River, which could not be merely regulated by water level to analyze the interactions between algae and their environment fluctuation in the Han River. Our study takes an important step for- in rivers based on large-scale field observations ( Stevenson, 2009 ). ward to developing robust predictive models for river algal blooms Our machine learning based prediction of algal blooms using an- by accounting for multiple influencing factors from multiple data tecedent observations might provide possible solutions to the mit- sources. This study was limited by the available data from the igation and prevention of river algal blooms. 
Wuhan City is located long-term monitoring data program, opportunities exist to expand in the outlet of the Han River flowing into the Yangtze River. It is and improve this analysis by incorporating additional environmen- challenging to reduce the pollution loads to the Han River in the tal and biological variables (e.g., light intensity and macrophyte- near future due to large populations and rapid economic devel- phytoplankton-zooplankton community data) ( Nelson et al., 2018 ). opment. The regulation of water level in rivers is often the main In addition, algal blooms may become more frequent and intense countermeasure for flood control, drought relief, water supply, and in many rivers worldwide that are subject to both eutrophica- improvement of water environment ( Yang et al., 2017a ; Xia et al., tion and hydrological modification with increased pollution and 2019 ). Hydrological regulation is feasible and efficient to imple- the construction of water conservancy projects ( Ha et al., 2003 ; ment to reduce the risk of river algal blooms ( Jeong et al., 2007 ; Mitrovic et al., 2007 ). Insights gained from this study can be also Mitrovic et al., 2007 ; Yang et al., 2012 ; Xin et al., 2020 ). There- used to improve the interpretation, prediction and prevention of fore, there is an urgent need for integrated water management algal blooms in other large rivers. R. Xia, G. Wang and Y. Zhang et al. / Water Research 185 (2020) 116221 9
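The difference between the GBMc and GBMp inputs discussed in Section 4.2 comes down to whether each period's bloom label is paired with predictors from the same 10-day period or from the preceding one. The following Python sketch illustrates that pairing only; it is not the authors' implementation. The function name `make_lagged_rows`, the unprefixed variable names, and all numeric values are hypothetical, and the sketch assumes the records are consecutive 10-day periods with no gaps.

```python
from typing import Dict, List, Tuple

Row = Tuple[Dict[str, float], int]


def make_lagged_rows(predictors: List[Dict[str, float]],
                     bloom: List[int], lag: int = 1) -> List[Row]:
    """Pair the bloom label of period t with predictors from period t - lag.

    lag=0 mimics GBMc-style inputs (current period); lag=1 mimics
    GBMp-style inputs (previous period, names prefixed with "p").
    """
    if len(predictors) != len(bloom):
        raise ValueError("predictors and bloom must have the same length")
    if lag < 0:
        raise ValueError("lag must be non-negative")
    prefix = "p" if lag > 0 else ""
    rows: List[Row] = []
    for t in range(lag, len(bloom)):
        # Rename each predictor so lagged inputs are clearly marked.
        features = {prefix + name: value
                    for name, value in predictors[t - lag].items()}
        rows.append((features, bloom[t]))
    return rows


# Hypothetical values for three consecutive 10-day periods (not study data).
periods = [
    {"HRWL": 25.1, "YRWL": 15.0, "WTMP": 9.5},
    {"HRWL": 24.3, "YRWL": 14.2, "WTMP": 10.1},
    {"HRWL": 24.0, "YRWL": 13.9, "WTMP": 11.0},
]
bloom = [0, 0, 1]  # 1 = bloom present, 0 = bloom absent

lagged = make_lagged_rows(periods, bloom, lag=1)   # GBMp-style rows
current = make_lagged_rows(periods, bloom, lag=0)  # GBMc-style rows
print(len(lagged), len(current))
print(lagged[-1])
```

With a lag of one period the first record has no antecedent predictors and is dropped, so the lagged dataset is one row shorter than the unlagged one. In a real dataset covering several bloom seasons, rows would also need to be dropped wherever the preceding record is not the immediately previous 10-day period.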

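The validation argument in Section 4.2 rests on the class imbalance of 12 bloom events in 102 periods: a single train/test split leaves very few events on either side, whereas k-fold cross-validation lets every period serve for both training and testing. The Python sketch below shows one simple way to build folds that each hold a similar share of bloom events. It is an illustration under stated assumptions, not the procedure used in the study: the helper `stratified_k_folds`, the choice of k = 4, the seed, and the ordering of the labels are all hypothetical.

```python
import random
from typing import List


def stratified_k_folds(labels: List[int], k: int, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with similar class proportions."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        # Deal the shuffled members of this class round-robin across folds.
        for pos, i in enumerate(members):
            folds[pos % k].append(i)
    return folds


# 12 bloom periods and 90 non-bloom periods, as reported in the text.
# The ordering here is arbitrary and does not reflect the real time series.
labels = [1] * 12 + [0] * 90
folds = stratified_k_folds(labels, k=4)

for held_out, test_idx in enumerate(folds):
    # Train on every fold except the held-out one.
    train_idx = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
    n_events = sum(labels[i] for i in test_idx)
    print(held_out, len(train_idx), len(test_idx), n_events)
```

Because the 12 events are dealt out evenly, each of the four held-out folds in this sketch contains three bloom periods, so every fold can contribute to an agreement or accuracy estimate. An unstratified random split could by chance produce a fold with no events at all.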
Data and code availability

The model code used in this study is accessible at https://github.com/wanggangsheng/GBM-Algae-Bloom.git

Declaration of Competing Interest

The authors declare no competing interests.

CRediT authorship contribution statement

Rui Xia: Resources, Data curation, Formal analysis, Methodology, Writing - original draft. Gangsheng Wang: Resources, Methodology, Formal analysis, Writing - original draft. Yuan Zhang: Data curation, Formal analysis. Peng Yang: Data curation, Formal analysis. Zhongwen Yang: Data curation, Formal analysis. Sen Ding: Data curation, Formal analysis. Xiaobo Jia: Data curation, Formal analysis. Chen Yang: Data curation, Formal analysis. Chengjian Liu: Data curation, Formal analysis. Shuqin Ma: Data curation, Formal analysis. Jianing Lin: Data curation, Formal analysis. Xiao Wang: Data curation, Formal analysis. Xikang Hou: Data curation, Formal analysis. Kai Zhang: Data curation, Formal analysis. Xin Gao: Data curation, Formal analysis. Pingzhou Duan: Data curation, Formal analysis. Chang Qian: Data curation, Formal analysis.

Acknowledgements

This work was supported by the National Natural Science Foundation of China [grant numbers E090303, 51879252], the Strategic Priority Research Program of Chinese Academy of Sciences [grant number XDA23040500], National Key R&D Program of China [grant number 2019YFC0408900], and Joint Research Project on Yangtze River Conservation and Restoration [grant number 2019-LHYJ-01-0103]. We would like to thank Dr. Guoyi Zhou for his suggestions on an earlier draft of this paper.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.watres.2020.116221.

References

Asuero, A.G., Sayago, A., González, A.G., 2007. The correlation coefficient: an overview. Crit. Rev. Anal. Chem. 36 (1), 41–59.

Bouwman, A., Savchuk, A., Abbaspourghomi, A., Visser, B., 2020. Automated step detection in inertial measurement unit data from Turkeys. Front. Genet. 11, 207.

Bowling, L., Baker, P., 1996. Major cyanobacterial bloom in the Barwon-Darling River, Australia, in 1991, and underlying limnological conditions. Marine Freshwater Res. 47 (4), 643–657.

Cheng, B., Xia, R., Zhang, Y., Yang, Z., Hu, S., Guo, F., Ma, S., 2019. Characterization and causes analysis for algae blooms in large river system. Sustain. Cities Soc. 51, 101707.

Chung, S.-W., Lee, H., Jung, Y., 2008. The effect of hydrodynamic flow regimes on the algal bloom in a monomictic reservoir. Water Sci. Technol. 58 (6), 1291–1298.

Coppola Jr, E.A., Jacinto, A.B., Lohbauer, S., Poulton, M., Szidarvoszky, F., Atherholt, T., 2006. Forecasting Algal Blooms in Surface Water Systems with Artificial Neural Networks. NOAH, Lawrenceville, NJ, p. 509.

De'ath, G., Fabricius, K.E., 2000. Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81 (11), 3178–3192.

Dou, M., Xie, P., Xia, J., Shen, X., Fang, F., 2002. Study on algal blooms in the Han River. Adv. Water Sci. (in Chinese) 13 (5), 557–561.

Evtimova, V.V., Donohue, I., Frid, C., 2014. Quantifying ecological responses to amplified water level fluctuations in standing waters: an experimental approach. J. Appl. Ecol. 51 (5), 1282–1291.

Ferris, J.A., Lehman, J.T., 2007. Interannual variation in diatom bloom dynamics: roles of hydrology, nutrient limitation, sinking, and whole lake manipulation. Water Res. 41 (12), 2551–2562.

Giraudoux, P., 2013. R Package 'pgirmess': data analysis in ecology. http://cran.r-project.org/web/packages/pgirmess/index.html.

Friedel, M.J., Wilson, S.R., Close, M.E., Buscema, M., Abraham, P., Banasiak, L., 2020. Comparison of four learning-based methods for predicting groundwater redox status. J. Hydrol. (Amst.) 580, 124200.

Greenwell, 2017. pdp: an R package for constructing partial dependence plots. R. J. 9 (1), 421–436.

Ha, K., Jang, M.-H., Joo, G.-J., 2003. Winter Stephanodiscus bloom development in the Nakdong River regulated by an estuary dam and . Hydrobiologia 506 (1–3), 221–227.

Hart, P.D., 2015. Receiver operating characteristic (ROC) curve analysis: a tutorial using body mass index (BMI) as a measure of obesity. J. Phys. Activity Res. 1 (1), 5–8.

Hilton, J., O'Hare, M., Bowes, M.J., Jones, J.I., 2006. How green is my river? A new paradigm of eutrophication in rivers. Sci. Total Environ. 365 (1–3), 66–83.

Imteaz, M.A., Asaeda, T., 2000. Artificial mixing of lake water by bubble plume and effects of bubbling operations on algal bloom. Water Res. 34 (6), 1919–1929.

Istvánovics, V., Honti, M., 2012. Efficiency of nutrient management in controlling eutrophication of running waters in the Middle Danube Basin. Hydrobiologia 686 (1), 55–71.

Jeong, K.-S., Kim, D.-K., Joo, G.-J., 2007. Delayed influence of dam storage and discharge on the determination of seasonal proliferations of Microcystis aeruginosa and Stephanodiscus hantzschii in a regulated river system of the lower Nakdong River (South Korea). Water Res. 41 (6), 1269–1279.

Ji, D., Wells, S.A., Yang, Z., Liu, D., Huang, Y., Ma, J., Berger, C.J., 2017. Impacts of water level rise on algal bloom prevention in the tributary of Three Gorges Reservoir, China. Ecol. Eng. 98, 70–81.

Kuhn, M., Wing, J., Weston, S., Williams, A., Keefer, C., Engelhardt, A., Cooper, T., Mayer, Z., Kenkel, B., Benesty, M., 2020. R Package 'caret': classification and Regression Training. https://github.com/topepo/caret/.

Kim, K., Park, M., Min, J.-H., Ryu, I., Kang, M.-R., Park, L.J., 2014. Simulation of algal bloom dynamics in a river with the ensemble Kalman filter. J. Hydrol. (Amst.) 519, 2810–2821.

Landis, J.R., Koch, G.G., 1977. The measurement of observer agreement for categorical data. Biometrics 33 (1), 159–174.

Lee, S., Lee, D., 2018. Improved prediction of harmful algal blooms in four major South Korea's rivers using deep learning models. Int J Environ Res Public Health 15 (7), 1322.

Liang, K., Wang, X., Zhang, D., Zhou, Y., 2012. Ecological conditions of diatom water blooms formulation in the middle and lower reach of the Hanjiang River and strategy for water bloom control. Environ. Sci. Technol. (in Chinese) 35 (12J), 113–116.

Lionard, M., Muylaert, K., Van Gansbeke, D., Vyverman, W., 2005. Influence of changes in salinity and light intensity on growth of phytoplankton communities from the Schelde river and estuary (Belgium/The Netherlands). Hydrobiologia 540 (1–3), 105–115.

Liu, D.F., Yang, Z.J., Ji, D.B., Ma, J., Cui, Y.J., Song, L.X., 2016. A review on the mechanism and its controlling methods of the algal blooms in the tributaries of Three Gorges Reservoir. J. Hydraul. Eng. (in Chinese) 47 (3), 443–454.

Liu, L., Liu, D., Johnson, D.M., Yi, Z., Huang, Y., 2012. Effects of vertical mixing on phytoplankton blooms in Xiangxi Bay of Three Gorges Reservoir: implications for management. Water Res. 46 (7), 2121–2130.

Long, T.-y., Wu, L., Meng, G.-h., Guo, W.-h., 2011. Numerical simulation for impacts of hydrodynamic conditions on algae growth in Chongqing Section of Jialing River, China. Ecol. Modell. 222 (1), 112–119.

Lu, D., Liu, P., Fan, T., Peng, H., Zhang, Z., 2000. The investigation of water bloom in the downstream of the Han River. Res. Environ. Sci. (in Chinese) 13 (2), 28–31.

Lucas, L.V., Thompson, J.K., Brown, L.R., 2009. Why are diverse relationships observed between phytoplankton biomass and transport time? Limnol. Oceanogr. 54 (1), 381–390.

Mac Donagh, M.E., Casco, M.A., Claps, M.C., 2008. Plankton relationships under small water level fluctuations in a subtropical reservoir. Aquatic Ecol. 43 (2), 371–381.

Mansanarez, V., Renard, B., Coz, J.L., Lang, M., Darienzo, M., 2019. Shift Happens! Adjusting stage-discharge rating curves to morphological changes at known times. Water Resour. Res. 55 (4), 2876–2899.

McKiver, W.J., Neufeld, Z., 2009. Influence of turbulent advection on a phytoplankton ecosystem with nonuniform carrying capacity. Phys. Rev. E Stat. Nonlin. Soft Matter Phys. 79 (6), 061902.

Mischke, U., Venohr, M., Behrendt, H., 2011. Using phytoplankton to assess the trophic status of German rivers. Int. Rev. Hydrobiol. 96 (5), 578–598.

Mitrovic, S.M., Chessman, B.C., Davie, A., Avery, E.L., Ryan, N., 2007. Development of blooms of Cyclotella meneghiniana and Nitzschia spp. (Bacillariophyceae) in a shallow river and estimation of effective suppression flows. Hydrobiologia 596 (1), 173–185.

Natekin, A., Knoll, A., 2013. Gradient boosting machines, a tutorial. Front. Neurorobot. 7, 21–21.

Neal, C., Hilton, J., Wade, A.J., Neal, M., Wickham, H., 2006. Chlorophyll-a in the rivers of eastern England. Sci. Total Environ. 365 (1–3), 84–104.

Nelson, N.G., Muñoz-Carpena, R., Phlips, E.J., Kaplan, D., Sucsy, P., Hendrickson, J., 2018. Revealing biotic and abiotic controls of harmful algal blooms in a shallow subtropical lake through statistical machine learning. Environ. Sci. Technol. 52 (6), 3527–3535.

Oliver, A.A., Dahlgren, R.A., Deas, M.L., 2014. The upside-down river: reservoirs, algal blooms, and tributaries affect temporal and spatial patterns in nitrogen and phosphorus in the Klamath River, USA. J. Hydrol. (Amst.) 519, 164–176.

Park, Y., Cho, K.H., Park, J., Cha, S.M., Kim, J.H., 2015. Development of early-warning protocol for predicting chlorophyll-a concentration using machine learning models in freshwater and estuarine reservoirs, Korea. Sci. Total Environ. 502, 31–41.

Potapova, M., Charles, D.F., 2007. Diatom metrics for monitoring eutrophication in rivers of the United States. Ecol. Indic. 7 (1), 48–70.

R Core Team, 2018. R: A language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.

Recknagel, F., French, M., Harkonen, P., Yabunaka, K.-I., 1997. Artificial neural network approach for modelling and prediction of algal blooms. Ecol. Modell. 96 (1–3), 11–28.

Reichwaldt, E.S., Ghadouani, A., 2012. Effects of rainfall patterns on toxic cyanobacterial blooms in a changing climate: between simplistic scenarios and complex dynamics. Water Res. 46 (5), 1372–1393.

Schmale, D.G., Ault, A.P., Saad, W., Scott, D.T., Westrick, J.A., 2019. Perspectives on Harmful Algal Blooms (HABs) and the Cyberbiosecurity of Freshwater Systems. Front. Bioeng. Biotechnol. 7, 128.

Schütz, N., Leichtle, A.B., Riesen, K., 2019. A comparative study of pattern recognition algorithms for predicting the inpatient mortality risk using routine laboratory measurements. Artif. Intell. Rev. 52 (4), 2559–2573.

Shen, J., Qin, Q., Wang, Y., Sisson, M., 2019. A data-driven modeling approach for simulating algal blooms in the tidal freshwater of James River in response to riverine nutrient loading. Ecol. Modell. 398, 44–54.

Sivapragasam, C., Muttil, N., Muthukumar, S., Arun, V.M., 2010. Prediction of algal blooms using genetic programming. Mar. Pollut. Bull. 60 (10), 1849–1855.

Soballe, D.M., Kimmel, B.L., 1987. A large-scale comparison of factors influencing phytoplankton abundance in rivers, lakes, and impoundments. Ecology 68 (6), 1943–1954.

Stevenson, R.J., 2009. Algae of river ecosystems. In: Likens, G.E. (Ed.), Encyclopedia of Inland Waters. Academic Press, Oxford, pp. 114–122.

Viera, A.J., Garrett, J.M., 2005. Understanding interobserver agreement: the kappa statistic. Fam. Med. 37 (5), 360–363.

Whitehead, P.G., Bussi, G., Bowes, M.J., Read, D.S., Hutchins, M.G., Elliott, J.A., Dadson, S.J., 2015. Dynamic modelling of multiple phytoplankton groups in rivers with an application to the Thames river system in the UK. Environ. Modell. Softw. 74, 75–91.

Xia, J., Dou, M., Zhang, H., 2001. Dynamic model of eutrophication in Hanjiang River. Chongqing Environ. Sci. (in Chinese) 19 (7), 20–23.

Xia, R., Chen, Z., Zhou, Y., 2012. Impact assessment of climate change on algal blooms by a parametric modeling study in Han river. J. Resour. Ecol. 3 (3), 209–219.

Xia, R., Zhang, Y., Critto, A., Wu, J., Fan, J., Zheng, Z., Zhang, Y., 2016. The potential impacts of climate change factors on freshwater eutrophication: implications for research and countermeasures of water management in China. Sustainability 8 (3), 229–245.

Xia, R., Zhang, Y., Wang, G., Zhang, Y., Dou, M., Hou, X., Qiao, Y., Wang, Q., Yang, Z., 2019. Multi-factor identification and modelling analyses for managing large river algal blooms. Environ. Pollut. 254, 113056.

Xiao, X., He, J., Huang, H., Miller, T.R., Christakos, G., Reichwaldt, E.S., Ghadouani, A., Lin, S., Xu, X., Shi, J., 2017. A novel single-parameter approach for forecasting algal blooms. Water Res. 108, 222–231.

Xin, X., Zhang, H., Lei, P., Tang, W., Yin, W., Li, J., Zhong, H., Li, K., 2020. Algal blooms in the middle and lower Han River: characteristics, early warning and prevention. Sci. Total Environ. 706, 135293.

Xu, J., 1996. Wandering braided river channel pattern developed under quasi-equilibrium: an example from the Hanjiang River, China. J. Hydrol. (Amst.) 181 (1–4), 85–103.

Xu, J., 1997. Study of sedimentation zones in a large sand-bed braided river: an example from the Hanjiang River of China. Geomorphology 21 (2), 153–165.

Xu, X., Wu, Z., Yu, D., Liu, C., Li, Z., Xiao, K., 2002. Diversity of aquatic plants in the Hanjiang River Basin and the effects of the North-to-South Water Transfer Project. Acta Ecologica Sinica (in Chinese) 22 (11), 1933–1938.

Yang, B., Dou, M., Xia, R., Kuo, Y.-M., Li, G., Shen, L., 2020. Effects of hydrological alteration on fish population structure and habitat in river system: a case study in the mid-downstream of the Hanjiang River in China. Glob. Ecol. Conserv. 23, e01090.

Yang, J.R., Lv, H., Isabwe, A., Liu, L., Yu, X., Chen, H., Yang, J., 2017a. Disturbance-induced phytoplankton regime shifts and recovery of cyanobacteria dominance in two subtropical reservoirs. Water Res. 120, 52–63.

Yang, N., Zhang, Y., Duan, K., 2017b. Effect of hydrologic alteration on the community succession of macrophytes at Site, Hanjiang River, China. Scientifica (Cairo) 2017, 4083696.

Yang, Q., Xie, P., Shen, H., Xu, J., Wang, P., Zhang, B., 2012. A novel flushing strategy for diatom bloom prevention in the lower-middle Hanjiang River. Water Res. 46 (8), 2525–2534.

Yin, D.C., Huang, W., Wu, X.H., Song, L.R., 2012. Preliminary study on biological characteristics of spring diatom bloom in the Hanjiang River. J. Yangtze River Sci. Res. Inst. (in Chinese) 29 (2), 6–10.

Yin, D.C., Yin, Z.J., Yang, C.H., Wang, D., 2017. Key hydrological thresholds related to algae bloom in middle and lower reaches of Hanjiang River and studies on mitigation measures. China Water Resour. (in Chinese) (9), 31–34.

Zeng, H., Song, L., Yu, Z., Chen, H., 2006. Distribution of phytoplankton in the Three-Gorge Reservoir during rainy and dry seasons. Sci. Total Environ. 367 (2–3), 999–1009.

Zhang, Y., Shi, K., Liu, J., Deng, J., Qin, B., Zhu, G., Zhou, Y., 2016. Meteorological and hydrological conditions driving the formation and disappearance of black blooms, an ecological disaster phenomena of eutrophication and algal blooms. Sci. Total Environ. 569–570, 1517–1529.

Zhang, Y., Xia, R., Zhang, M.H., Jing, Z.X., Zhao, Q., Fan, J.T., 2017. Research progress on cause analysis and modeling of river algal blooms under background of mega water projects. Res. Environ. Sci. (in Chinese) 30 (8), 1163–1173.

Zhou, F., Bo, Y., Ciais, P., Dumas, P., Tang, Q., Wang, X., Liu, J., Zheng, C., Polcher, J., Yin, Z., 2020. Deceleration of China's human water use and its key drivers. Proc. Natl. Acad. Sci. 117 (14), 7702–7711.

Zhu, Y.P., Zhang, H.P., Chen, L., Zhao, J.F., 2008. Influence of the South-North Water Diversion Project and the mitigation projects on the water quality of Han River. Sci. Total Environ. 406 (1–2), 57–68.