Machine-Learning Methods for Stream Water Temperature Prediction

Hydrol. Earth Syst. Sci., 25, 2951–2977, 2021 https://doi.org/10.5194/hess-25-2951-2021 © Author(s) 2021. This work is distributed under the Creative Commons Attribution 4.0 License. Machine-learning methods for stream water temperature prediction Moritz Feigl1;, Katharina Lebiedzinski1;, Mathew Herrnegger1, and Karsten Schulz1 1Institute for Hydrology and Water Management, University of Natural Resources and Life Sciences, Vienna, Austria These authors contributed equally to this work. Correspondence: Moritz Feigl ([email protected]) Received: 21 December 2020 – Discussion started: 14 January 2021 Revised: 15 April 2021 – Accepted: 27 April 2021 – Published: 31 May 2021 Abstract. Water temperature in rivers is a crucial environ- parameter sets for the tested models, showing the importance mental factor with the ability to alter hydro-ecological as of hyperparameter optimization. Especially the FNN model well as socio-economic conditions within a catchment. The results showed an extremely large RMSE standard deviation development of modelling concepts for predicting river wa- of 1.60 ◦C due to the chosen hyperparameters. ter temperature is and will be essential for effective inte- This study evaluates different sets of input variables, grated water management and the development of adapta- machine-learning models and training characteristics for tion strategies to future global changes (e.g. climate change). daily stream water temperature prediction, acting as a ba- This study tests the performance of six different machine- sis for future development of regional multi-catchment wa- learning models: step-wise linear regression, random forest, ter temperature prediction models. All preprocessing steps eXtreme Gradient Boosting (XGBoost), feed-forward neu- and models are implemented in the open-source R package ral networks (FNNs), and two types of recurrent neural net- wateRtemp to provide easy access to these modelling ap- works (RNNs). All models are applied using different data proaches and facilitate further research. inputs for daily water temperature prediction in 10 Austrian catchments ranging from 200 to 96 000 km2 and exhibiting a wide range of physiographic characteristics. The evalu- 1 Introduction ated input data sets include combinations of daily means of air temperature, runoff, precipitation and global radiation. Water temperature in rivers should not be considered only a Bayesian optimization is applied to optimize the hyperpa- physical property, since it is a crucial environmental factor rameters of all applied machine-learning models. To make and a substantial key element for water quality and aquatic the results comparable to previous studies, two widely used habitats. In particular, it influences riverine species by gov- benchmark models are applied additionally: linear regression erning e.g. metabolism (Álvarez and Nicieza, 2005), distri- and air2stream. ◦ bution (Boisneau et al., 2008), abundance (Wenger et al., With a mean root mean squared error (RMSE) of 0.55 C, 2011), community composition (Dallas, 2008) and growth the tested models could significantly improve water temper- (Imholt et al., 2010); thus, aquatic organisms have a specific ature prediction compared to linear regression (1.55 ◦C) and ◦ range of river temperature they are able to tolerate (Caissie, air2stream (0.98 C). In general, the results show a very sim- 2006). Due to the impact of water temperature on chemical ilar performance of the tested machine-learning models, with ◦ processes (Hannah et al., 2008) and other physical properties a median RMSE difference of 0.08 C between the models. such as density, vapour pressure and viscosity (Stevens et al., From the six tested machine-learning models both FNNs and 1975), stream temperature indirectly influences key ecosys- XGBoost performed best in 4 of the 10 catchments. RNNs tem processes such as primary production, decomposition are the best-performing models in the largest catchment, in- and nutrient cycling within rivers (Friberg et al., 2009). These dicating that RNNs mainly perform well when processes parameters and processes affect the level of dissolved oxygen with long-term dependencies are important. Furthermore, a (Sand-Jensen and Pedersen, 2005) and, of course, have a ma- wide range of performance was observed for different hyper- jor influence on water quality (Beaufort et al., 2016). Published by Copernicus Publications on behalf of the European Geosciences Union. 2952 M. Feigl et al.: Machine-learning methods for stream water temperature prediction Besides its ecological importance, river temperature is also arguments are also the reasons why data-intensive process- of socio-economic interest for electric power and industry based models are widely used despite their high complexity. (cooling), drinking water production (hygiene, bacterial pol- Statistical and machine-learning models are grouped into lution) and fisheries (fish growth, survival and demographic parametric approaches, including regression (e.g. Mohseni characteristics) (Hannah and Garner, 2015). Hence, a chang- and Stefan, 1999) and stochastic models (e.g. Ahmadi- ing river temperature can strongly alter the hydro-ecological Nedushan et al., 2007) and non-parametric approaches based and socio-economic conditions within the river and its neigh- on computational algorithms like neural networks or k- bouring region. Assessing alterations of this sensitive vari- nearest neighbours (Benyahya et al., 2007). In contrast to able and its drivers is essential for managing impacts and en- process-based models, statistical models cannot inform about abling prevention measurements. energy transfer mechanisms within a river (Dugdale et al., Direct temperature measurements are often scarce and 2017). However, unlike process-based models, they do not rarely available. For successful integrated water manage- require a large number of input variables, which are unavail- ment, it will be essential to derive how river temperature will able in many cases. Non-parametric statistical models have be developing in the future, in particular when considering gained attention in the past few years. Especially machine- relevant global change processes (e.g. climate change), but learning techniques have been proofed to be useful tools in also on shorter timescales. The forecast, for example, of river river temperature modelling already (Zhu and Piotrowski, temperature with a lead time of a few days can substantially 2020). improve or even allow the operation of thermal power plants. For this study we chose a set of state-of-the-art machine- Two aspects are important: the efficiency of cooling depends learning models that showed promising results for water tem- on the actual water temperature. On the other hand, legal con- perature prediction or in similar time-series prediction tasks. straints regarding maximum allowed river temperatures due The six chosen models are step-wise linear regression, ran- to ecological reasons can be exceeded when warmed-up wa- dom forest, eXtreme Gradient Boosting (XGBoost), feed- ter is directed into the river after the power plant. This is es- forward neural networks (FNNs) and two types of recurrent pecially relevant during low-flow conditions in hot summers. neural networks (RNNs). Step-wise linear regression models Knowledge of the expected water temperature in the next few combine an iterative variable selection procedure with lin- days is therefore an advantage. An important step in this con- ear regression models. The main advantage of step-wise lin- text is the development of appropriate modelling concepts to ear regression is the possibility of a variable selection proce- predict river water temperature to describe thermal regimes dure that also includes all variable interaction terms, which and to investigate the thermal development of a river. is only possible due to the short run times when fitting the In the past, various models were developed to investi- model. The main disadvantages are the linear regression spe- gate thermal heterogeneity at different temporal and spatial cific assumptions (e.g. linearity, independence of regressors, scales, the nature of past availability and likely future trends normality, homoscedasticity) that might not hold for a given (Laizé et al., 2014; Webb et al., 2008). In general, water problem, which consequently could lead to a reduced model temperature in rivers is modelled by process-based models, performance. To our knowledge only one previous study by statistical/machine-learning models or a combination of both Neumann et al.(2003) already applied this method for pre- approaches. Process-based models represent physical pro- dicting daily maximum river water temperature. cesses controlling river temperature. According to Dugdale The random forest model (RF) (Breiman, 2001) is an et al.(2017), these models are based on two key steps: first, ensemble-learning model that averages the results of multiple calculating energy fluxes to or from the river and then de- regression trees. Since they consist of a ensemble of regres- termining the temperature change in a second step. Calcu- sion trees that are trained on random subsamples of the data, lating the energy fluxes means solving the energy balance RF models are able to model linear and non-linear dependen- equation for a river reach by considering the heat fluxes at cies and are robust to outliers. RF models are fast and easy the air–water and riverbed–water interfaces (Beaufort et al., to use, as they do not need extensive

Load more