
M.Sc. THESIS

Identification of a Drinking Water Softening Model using Machine Learning

J.N. Jenden January 2020

Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS)

Graduation committee:
    Prof.dr. H.J. Zwart (UT)
    Dr. C. Brune (UT)
    Prof.dr. A.A. Stoorvogel (UT)
    Ir. E.H.B. Visser (Witteveen+Bos)

Company:
    Witteveen+Bos
    Deventer, the Netherlands

Control Theory
Department of Systems and Control
Faculty of Electrical Engineering, Mathematics and Computer Science
University of Twente
P.O. Box 217
7500 AE Enschede
The Netherlands

Contents

Table of contents
List of figures
Abstract
Acknowledgements
Acronyms and Term Dictionary
Chapter 1: Introduction
    Softening Treatment Process
    Hard Water
    What is Machine Learning?
    Train, Validation and Test Data
    Previous Research
    Aim of the Report
    Report Layout
Chapter 2: Background Information
    Softening Process Configuration
    Pellet Softening Reactor
    Control Actions in the Softening Treatment Step
    pH as a Control Variable
Chapter 3: Data Pre-Processing and Data Analysis
    Time Series
    Normalising the Data
    Removing Corrupted Data
    Pearson's Correlation Coefficient
    Autocorrelation
Chapter 4: Machine Learning
    Supervised Learning
    Train-Validation-Test Data Splitting Method
    Walk-forward Data Splitting Method
    Hyperparameters and Hyperparameter Grid Searches
    Overfitting and Underfitting
    Evaluation Metrics
Chapter 5: Neural Networks and XGBoost
    Neural Networks
    Recurrent Neural Networks (RNNs)
    Memory Cells
    Standard Time Series NN Structure
    LSTM Cells
    Regularisation using a Dropout Layer
    Gradient Descent
    Introduction to Decision Trees
    Difference between Classification and Regression Trees
    Introduction to XGBoost
    Feature Importance
    XGBoost and RNN Model Prediction Horizons
Chapter 6: Methods
    Identification of Inputs and Outputs
    Data Collection
    Data Pre-Processing and Data Analysis
    Prediction
Chapter 7: Machine Learning Results
    RNN Train-Validation-Test Model
    RNN Walk-Forward Models
    XGBoost Train-Validation-Test Model
    XGBoost Walk-Forward Models
Chapter 8: Discussion and Conclusions
Chapter 9: Recommendations
Bibliography
Appendix A: Drinking WTP Example and Softening Process Background Information
    Example WTP
    Water
    Water Hardness Chemistry
    Bypass
    Calcium Carbonate Crystallisation Reaction
Appendix B: Data Analysis
    Pearson's Correlation Coefficient Matrix
    Box Plot
Appendix C: Machine Learning
    Python vs Matlab®: Machine Learning and Control Theory Implementation
    RMSProp Optimisation
    Derivation of Backpropagation Equations
    The Backpropagation Algorithm
    Logistic Activation Function
Appendix D: eXtreme Gradient Boost (XGBoost)
    Regularisation Learning Objective
    Gradient Tree Boosting
Appendix E: XGBoost and RNN Implementation
    XGBoost Results
    RNN Results (F=24 [hr])
    XGBoost Hyperparameter Selection
    RNN Hyperparameter Selection
    RNN Walk-Forward Training (F=1 [min])
    RNN Train-Validation-Test Training (F=1 [min])
    RNN Walk-Forward Training (F=24 [hr])
    RNN Walk-Forward Training (F=4 [hr])
    XGBoost Train-Validation-Test Training
    XGBoost Walk-Forward Training

List of Figures

1  Softening treatment process diagram
2  Water Treatment Plant (WTP) standard set-up
3  Typical pellet softening fluidised bed reactor
4  Supervised learning example
5  Train-validation-test data split
6  Walk-forward
7  Simple network
8  Recurrent Neural Network (RNN) over time
9  Memory cell
10 Example RNN structure
11 LSTM
12 Dropout layer
13 Gradient descent
14 Regression tree example
15 Gradient boosting
16 XGBoost and RNN model features and targets
17 Data interpolation method
18 Caustic soda dosage flow rate and flow rate
19 pH autocorrelation
20 Mean pH over one day
21 pH box plot over hours
22 A month of data from a drinking water reactor
24 RNN train-validation-test model prediction
25 The first two RNN walk-forward models
26 The last four RNN walk-forward models
28 Walk-forward forecasting RNN model predictions
29 XGBoost train-validation-test split model prediction
30 Feature importance of the train-validation-test XGBoost model
31 Weesperkarspel Water Treatment Plant (WTP)
32 Water flux
33 Boxplot
34 Hourly mean of data features and the seeding and draining per hour
35 Logistic activation function
36 Tree ensemble model example
37 First four XGBoost walk-forward split model predictions
38 Last three XGBoost walk-forward model predictions
39 Feature importance of XGBoost models 0 to 3
40 Feature importance of XGBoost models 4, 5 and 6
41 Walk-forward forecasting RNN model predictions

Abstract

This report identifies Machine Learning (ML) models of the water softening treatment process in a Water Treatment Plant (WTP), using two different ML algorithms applied to time series data: eXtreme Gradient Boost (XGBoost) and Recurrent Neural Networks (RNNs). In addition, a control method for the draining of pellets in the softening reactor is explored, based on the collected softening treatment data and the resulting ML models. In particular, the pH is identified as a potential variable for the control of pellet draining within a softening reactor. The pH forecasts produced by the ML models predict the future behaviour of the pH and can potentially anticipate when the pellets should be drained.

For implementation of the ML algorithms, the inputs and outputs of the ML models are first identified; the pH within the softening reactor is selected as the output, due to its potential control properties. Subsequently, water softening treatment data is collected from a water company based in the Netherlands. After collection, the data is pre-processed and analysed, both to better interpret the ML results and to improve the performance of the trained ML models. During pre-processing, two ML data splitting methods, walk-forward and train-validation-test, are implemented. The performance of the models is gauged using two different evaluation metrics: Mean Squared Error (MSE) and R-squared. Lastly, predictions are carried out using the trained ML models for a set of forecast horizon lengths.

Comparing the XGBoost and RNN pH predictions, the RNN in general performs better than the XGBoost method: the RNN model with a train-validation-test split has an MSE value of 0.0004 (4 d.p.) and an R-squared value of 0.9007 (4 d.p.). Extending the forecast horizon to four hours for the RNN walk-forward model yielded MSE values below 0.01, but only negative R-squared values. This suggests that the predictions are relatively close to the actual data points, but do not follow the shape of the actual data points well.

The evaluation metric results suggest that it is possible to create a good performing model using the RNN method for a forecast horizon length equal to one minute. However, this model is heavily dependent on the current pH value and is therefore deemed to be a poor predictor of the pH. Increasing the horizon length still yields relatively low MSE values, but the R-squared values are in general negative, indicating a poor fit.

Keywords: Machine Learning (ML), water softening treatment, Water Treatment Plant (WTP), time series, eXtreme Gradient Boost (XGBoost), Recurrent Neural Network (RNN), pH, control, pellet draining, softening reactor, forecast, inputs, outputs, pre-process, data splitting method, walk-forward, train-validation-test, evaluation metric, Mean Squared Error (MSE), R-squared, prediction, forecast horizon

Acknowledgements

This thesis report is the final product of a team effort. I would like to express my gratitude to a number of people that helped me reach this final stage of publication.

Firstly and most importantly, I would like to thank ir. Erwin Visser, my supervisor at Witteveen+Bos (W+B), for formulating such an interesting topic for my thesis and allowing me to carry out my research at W+B. Furthermore, his technical insights on the water softening treatment system were incredibly useful and helped me better understand the system during our weekly meetings. Additionally, I would like to thank the whole of the Process Automation team at W+B for their support during my entire thesis. Particularly Ko Roebers and Eddy Janssen, for bringing me in contact with a water company.

Secondly, I would like to send a special thanks to Prof. Hans Zwart, my supervisor from the University of Twente, who gave me technical nuggets of information during my research and helped me shape my final report.

Next, I would like to thank my fellow Systems and Control classmate Anne Steenbeek for his wisdom and advice about the machine learning implementation part of my research. In addition, I want to thank Akhyar Sadad for explaining his machine learning implementation in Python.

Finally, I would like to thank my colleagues Eleftheria Chiou and Shanti Bruyn for their technical input.

Acronyms and Term Dictionary

Acronyms

CPU      Central Processing Unit
IQR      Interquartile Range
LSTM     Long Short-Term Memory
ML       Machine Learning
MPC      Model Predictive Control
MSE      Mean Squared Error
NN       Neural Network
PHREEQC  pH-Redox-Equilibrium Calculations
RAM      Random Access Memory
RMSProp  Root Mean Square Propagation
RNN      Recurrent Neural Network
WTP      Water Treatment Plant
WWTP     Wastewater Treatment Plant
XGBoost  eXtreme Gradient Boost

Terms

Activation function: a function that transforms the summed weighted input from the neuron into the output.
Backend: the computing task that is performed in the background; the user is not able to observe this task being carried out.
Backpropagation: an algorithm used during the training of a Recurrent Neural Network (RNN) model.
Break through: the act of the grains in the pellet softening reactor being flushed out of the softening reactor to the following stage of the Water Treatment Plant (WTP).
Corrupted data: data containing blanks, NaNs, null-values or other placeholders.
Crystallisation rate: the rate at which a solid forms, where the atoms are organised into a crystal structure.
Dissolution: the action of becoming incorporated into a liquid, resulting in a solution.
Effluent: the water exiting a softening treatment reactor.
Ensemble Learning: building a model based on an array of other models [6].
Eutrophication: the phenomenon of an excessive richness of nutrients in a body of water, causing a surge of plant growth.
Feature: a measurable property of a system; the target of the machine learning algorithm is not considered a feature.
Frontend: the graphical user interface provided to the user when operating software.
Horizon span: the array of forecast time steps of a particular model.
Hyperparameter: a parameter of a learning algorithm and external to the model.
Influent: the water entering a softening treatment reactor.
Ion: an atom or a molecule that possesses a positive or negative charge.
Learner variable: a variable that is used during the training of a machine learning model.
Linear regression: the calculation of a function that minimises the distance between the fitted line and all of the data points; the line is often referred to as the regression line.
Lookback: the number of past time steps used to make a model prediction.
Misclassify: the act of a result being wrongly classified.
Model performance indicator: a statistical measure of the performance of the model against the test set; also often referred to as an evaluation metric.
Model Predictive Control (MPC): a method of system control which seeks to control a process while satisfying a set of constraints.
Overfitting: learning the detail and noise of the training data to the extent that it negatively impacts the performance of the model on new data [12].
Predictor: a variable employed as an input in order to determine the target numeric value in the data.
Redox reaction: a reaction where a transfer of electrons is involved.
Regularisation: the process of adding information in order to prevent overfitting.
Response variable: the target (output) of a decision tree.
Supervised learning: the process of feeding a machine learning algorithm with example input-output training data pairs.
Target: the output for a machine learning model.
Water treatment: the act or process of making water more useful or potable [20].
Window of data: a number of rows extracted from the original dataset.

Chapter 1: Introduction

1.1. Water Treatment Plant (WTP)
The purpose of a Water Treatment Plant (WTP) is to remove particles and organisms that cause diseases, to protect the public's welfare and to provide clean drinkable water for the environment, people and living organisms [20]. As of 2016, there are ten different drinking water companies in the Netherlands, employing 4,856 workers. A total of 1,095 million m³ of drinking water was produced in the Netherlands in 2016 [23]. An overview of an example WTP can be found in Appendix A.

1.1.1. Softening Treatment Process
Once the water has been pre-treated, it undergoes softening in the WTP. A popular softening process set-up is displayed in Figure 1. The raw water enters the process and flows through to the pellet-softening reactor. The softened water then exits the reactor at the top and is subsequently mixed with the bypassed water. Finally, the water is dosed with a form of acid to ensure that the pH reduces. A large pH value kills the bacteria in the downstream biofilters. In addition, the acid counteracts the chemical reactions involving caustic soda. These reactions can potentially negatively impact the downstream WTP equipment. The bypass (described in Appendix A) and raw water flow are controlled using valves.

Figure 1: Softening treatment process diagram.

1.1.2. Hard Water
Magnesium and calcium ions are dissolved when water comes into contact with rocks and minerals. The hardness of the water is the total amount of dissolved metal ions in the water. In practice, the hardness is determined by adding the concentrations of calcium and magnesium ions in the water, since they are generally the most abundant in the influent. Hard water can cause the following problems [1]:

• Decreasing the calcium concentration in water gives rise to a higher pH value in the distributed water, leading to a decrease in the dissolution of copper and lead in the distribution system. Ingestion of large quantities of dissolved copper and lead has negative effects on the public's health.

• A higher detergent dosage for washing is required for harder water. This increases the concentration of phosphate in wastewater and contributes to the eutrophication effect. Furthermore, a greater usage of detergent increases the average household costs.

• Hard water causes scale buildup in heating equipment and appliances, causing an increase in energy consumption and equipment defects.

• Hard water tastes worse than soft water.

• The damaging or staining of clothing during a wash is often caused by hard water.

1.2. Machine Learning

1.2.1. What is Machine Learning?
Machine Learning (ML) is the science of programming computers using algorithms, so they can learn from data [6]. The algorithms develop models, which are able to perform a specific task, relying only on inference and patterns. A more formal definition is as follows:

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

ML has been frequently used to solve various real-life problems in recent years. One example is predicting the stock market, where stakeholders frequently want to predict future trends of the stock prices. Implementing a ML algorithm using past stock data allows you to generate a model that can predict the future trajectory of the stock price.

1.2.2. Train, Validation and Test Data
For ML, a dataset is often partitioned into train, validation and test datasets. The train partition is used for training during the implementation of the ML algorithm. The validation data is used to evaluate the model during training and allows you to effectively tune the hyperparameters. A hyperparameter is a parameter of a learning algorithm and external to the model. Therefore, hyperparameters remain constant during the training of a ML model. Checking the model against the validation data allows you to identify if the model is overfitting (explained in Section 4.4) on the training dataset. The test partition consists of the data used to determine the final performance of the created model.

1.3. Previous Research
In K.M. van Schagen's research paper, Model-Based Control of Drinking-Water Treatment Plants (published in 2009) [4], the softening process in the Weesperkarspel WTP is evaluated. K.M. van Schagen proposes controlling the pellet-bed height using Model Predictive Control (MPC). The MPC determines the seeding material dosage and pellet discharge, to maintain the optimal pellet diameter and maximum bed height under variable flow in the reactor and corresponding temperature.

Stimela is a modelling environment for drinking water treatment processes (including the water softening process) in Matlab®/Simulink® [24]. Stimela was developed by DHV Water BV and Delft University of Technology. The Stimela models calculate changes of water quality parameters, such as pH, pellet diameter and pellet-bed height.

PHREEQC (pH-Redox-Equilibrium Calculations) is a model that was developed by the United States Geological Survey (USGS) to calculate groundwater chemistry [10]. PHREEQC comprises all the relevant chemical balance equations for water chemistry, such as acid-base and redox (reduction/oxidation) reactions. PHREEQC (in combination with a model of the calcium carbonate crystallisation rate) simulates a pellet softening reactor [5].

A. Sadad published a research paper in 2019 [9] about a general step-by-step method of applying ML analysis on a time-series dataset. One example case featured in the research is a Wastewater Treatment Plant (WWTP) system, where an accurate ML model was developed using Recurrent Neural Networks (RNNs) and the XGBoost algorithm (see Chapter 5). Both of the ML algorithms were implemented in Python. In this research, an adapted version of A. Sadad's implementation is employed. This research firstly seeks to verify the step-by-step method proposed by A. Sadad and secondly to adapt the implementation to be able to forecast further into the future.

There are no research publications about applying ML to the drinking water treatment softening process. One purpose of this research is to investigate whether it is possible to apply ML to a drinking water treatment softening process and generate an accurate model of the process.

1.4. Aim of the Report
The aim of the research is as follows: Develop a control strategy that efficiently controls the seeding and draining of the softening reactor based on the pH, using a model developed through ML. Moreover, this research seeks to answer the following questions:

1. Is it possible to develop a model of the softening treatment process of a WTP using ML?

2. Is the data provided for this research sufficient to develop a model using ML?

3. Can the seeding and draining control be improved by developing a control strategy using the produced ML model?

1.5. Report Layout
The report is organised as follows:

• Chapter 1. Introduction: In this chapter, an introduction to the problem and background information about ML and WTPs are given. In particular, this chapter briefly describes: the softening treatment process within a WTP, the problems associated with hard water, a definition of ML and the different partitions of the dataset used for ML implementation. Finally, the relevant previous research is identified and an aim of the report is defined.

• Chapter 2. Background Information: In this chapter, further background information about the softening process in a WTP is presented. Beginning with a description of a typical softening process configuration, explaining the role of reserve softening reactors in the softening process. Next, the main features of a pellet softening reactor are described, including the standard WTP pellet discharge action. Furthermore, the key control actions that take place in the softening process are described, including caustic soda dosing in the softening reactor. Lastly, the properties of the pH are described and its potential for use as a draining control variable in the softening process is explained.

• Chapter 3. Data Pre-Processing and Data Analysis: Firstly, this chapter describes time series data, the Z-score normalisation method and the applicability of normalising the data used for ML. Afterwards, methods of removing erroneous values from the dataset are explored, where erroneous rows and columns can be removed or interpolation applied. Finally, Pearson's correlation coefficient and the autocorrelation are mathematically described.

• Chapter 4. Machine Learning: In this chapter, ML principles are described, where the concept of supervised learning is introduced along with two techniques used to split data before applying ML algorithms: walk-forward and train-validation-test data splitting. Next, the role of hyperparameters is explained, along with examples of hyperparameters featuring in a Neural Network (NN) and the use of a hyperparameter grid search. Lastly, two so-called evaluation metrics are introduced with an explanation of how they can be interpreted.

• Chapter 5. Neural Networks and XGBoost: In this chapter, the ML theory is studied in more depth by exploring two prominent ML algorithms used for time series problems: NNs and eXtreme Gradient Boost (XGBoost). Firstly, the NNs section introduces the main components of a NN and how the NN parameters update during training, using samples from a dataset via the gradient descent algorithm. Subsequently, the notion of a RNN is introduced, where past outputs are incorporated into the NN. This concept is expanded on by describing the function of a Long Short-Term Memory (LSTM) cell, where past outputs are selectively retained based on a mathematical algorithm. Thereafter, the purpose of regularisation is explained and a pertinent NN regularisation technique (the dropout layer) is described. Then, a typical RNN structure is described. In the beginning of the XGBoost section: decision trees are introduced, a distinction between classification and regression trees is shown and a relevant example is depicted. Afterwards, the XGBoost algorithm is summarised with its associated feature importance score. The chapter is brought to a conclusion by comparing the prediction horizons of the XGBoost and RNN models, giving an indication of their predictive qualities.

• Chapter 6. Methods: In this chapter, the methods used to apply the ML algorithms featured in Chapter 5 are explained, which leads to the results shown in Chapter 7. Firstly, the inputs and outputs of the proposed model are identified, using knowledge of the softening process introduced in Chapters 1 and 2. Thereafter, the data is collected from the given water company, taking into account data interpolation and deciding upon a suitable data time interval. Next, the delivered data is pre-processed and analysed using techniques described in Chapters 3 and 4, as preparation for the ML algorithms. Finally, the ML prediction phase methods are explained, making use of the theory of the ML algorithms introduced in Chapter 5, including an explanation of the hyperparameter selection for the ML algorithms.

• Chapter 7. Machine Learning Results: In this chapter, the ML results generated using the methods from Chapter 6 are analysed. In particular, the evaluation metrics described in Chapter 4 are used to assess the performance of the different generated models.

• Chapter 8. Discussion and Conclusions: In this chapter, the results of Chapter 7 are discussed and conclusions are drawn based on the results. The conclusion seeks to answer the questions posed in the Aim of the Report Section (Section 1.4).

• Chapter 9. Recommendations: In this chapter, recommendations are given for further analysis, including tips for improving the performance of a model generated using the ML algorithms and the practical implications associated.

• Appendices: The appendices provide supplementary information to the reader. Appendix A describes an example WTP, as well as the pre-treatment process. In addition, the description outlines where the softening process is positioned in the softening treatment process. In the remainder of Appendix A, the dynamics of water flux in the softening reactor, water hardness chemistry, the bypass component in the softening treatment process and the calcium carbonate crystallisation reaction are explained. In Appendix B, the Pearson's correlation coefficient matrix for the dataset used in this research is shown and a brief description of the main features of a box plot is given. Moreover, a figure of the hourly mean of the variables in the dataset is displayed. In Appendix C, the pros and cons of using Python and Matlab® for ML and control theory implementation are considered. Furthermore, a modified version of the gradient descent algorithm (RMSProp optimisation) is considered, along with a detailed derivation of the backpropagation algorithm, a summary of the steps taken in the backpropagation algorithm and the logistic activation function. In Appendix D, the XGBoost ML algorithm is described. In Appendix E, additional results of the XGBoost and RNN algorithms are depicted. In addition, the hyperparameter choices for each ML algorithm are described and the Python training logs are given.

Chapter 2: Background Information

2.1. Introduction
In this chapter, the softening process in a Water Treatment Plant (WTP) is described in greater detail. An example softening process configuration is introduced, demonstrating the function of reserve softening reactors. A typical pellet softening reactor and the general softening processes are described, such as the draining of the pellets. Information is provided about the control actions and behaviour present in the process, which seeks to aid the analysis of the dataset and provide more insight into potential control strategy improvements. Finally, the properties of the pH are explained, along with reasoning as to why the pH could be used as a control variable within the system.

2.2. Softening Process Configuration
A typical WTP reactor configuration is shown in Figure 2. In this example the reactors are split into groups of three, consisting of two active reactors (shown in green) and one reserve reactor (shown in orange). The active reactors are consistently used in the process, unless the reactor needs to be switched off. For instance, a reactor may need to be unclogged or components within the reactor replaced. Once an active reactor is switched off, the influent water is redirected to the reserve reactor, thereby giving continuity to the process. The reserve reactors also give flexibility to changes in effluent demand. If the effluent demand increases, the reserve reactors are switched on, thus increasing the softening capacity. Simultaneously, the influent flow is increased by pumping more water from the raw water collection points. Having multiple groups of reactors allows maintenance to be carried out on one group, while the other groups can continue softening the influent.

Figure 2: Water Treatment Plant (WTP) standard configuration. The green circles represent the active pellet-softening reactors and the orange circles the reserve reactors.

2.3. Pellet Softening Reactor
The cylindrical pellet softening reactors used at one of the WTPs of Waternet have a diameter of 2.6 meters and a height of 6 meters. These reactors have a capacity of approximately 4800 m³/h [10]. Figure 3 displays an image of a typical pellet softening reactor.

During the pellet softening process, water is pumped in an upward direction in the reactor. The hard water is supplied to the reactor via the pipe labeled A in Figure 3 and the reactor is filled with seeding material. Calcium carbonate crystallisation takes place on the surface of the seeding material, leading to a variation of pellet sizes being deposited in layers on the circular plate. More specifically, the heavier larger pellets form the bottom layer of the bed and the smaller pellets accumulate on top. The flow of water through the reactor causes the majority of the pellets to swirl around above the circular plate in their associated layers. Dosing heads span the width of the circular plate, allowing the supplied water at the bottom of the reactor to pass through. Caustic soda is fed into the reactor via the pipe labeled B. The caustic soda is required for the calcium carbonate crystallisation process that takes place on the surface of the seeding material. The outgoing water from the reactor (through pipe E) is called the effluent.

The presence of the pellets in the reactor results in a pressure difference across the reactor. Pressure difference measurements across the length of the reactor are used to control the automatic pellet discharge [4] (draining is facilitated by tap C shown in Figure 3). When the pellet diameters grow, the pressure difference increases. Once this pressure difference exceeds a certain value set by the operators of the given WTP, the pellets are automatically discharged.

Figure 3: Typical pellet softening fluidised bed reactor [3].

2.4. Control Actions in the Softening Treatment Step
The main control actions in the pellet softening reactor are as follows [4]:

• Water flow through the reactor. This is controlled using a series of pumps upstream from the softening treatment step in the WTP.

• Base dosing (caustic soda is often used). This impacts the pH of the water in the reactor and consequently the rate of crystallisation. A higher base dosing generally leads to a lower hardness and a greater pH.

• Seeding material dosage. One example of a frequently used seeding material is sand. Adding a greater mass of seeding material to the reactor leads to a greater surface area for crystallisation to take place.

• Seeding material diameter. Selecting a seeding material with a smaller diameter grain gives rise to a larger surface area (per kilogram of seeding material). At the same time, the grains need to be heavy enough to prevent them from breaking through to the next stage of the WTP.

• Pellet discharge. The pellet discharge action is controlled by the pressure difference across the reactor. An accumulation of large pellets causes the pressure difference to increase. The pressure difference threshold value can be adjusted by the operator at a certain WTP. Increasing the threshold value leads to a lower discharge rate, thereby leading to an increase in the size of the pellets in the pellet-bed and consequently less surface area for crystallisation to occur. Moreover, it can cause blockages in the reactor, due to a decrease in the porosity of the pellet-bed. Conversely, decreasing the threshold value increases the frequency of discharges, generally leading to pellets with a lower diameter in the bed. An increase in discharges requires more seeding material to be added to the reactor, therefore leading to higher softening treatment costs.

2.5. pH as a Control Variable
The pH describes the acidity or alkalinity of a solution and common values range from 0 to 14, where 7 indicates neutrality of the solution (at 25°C). Values less than 7 (at 25°C and a certain salinity) imply an acidic solution and values greater than 7 (at 25°C and a certain salinity) an alkaline solution. More formally, the pH is the decadic logarithm (logarithm with base 10) of the reciprocal of the hydrogen ion activity in a solution, where the hydrogen ion activity is denoted as a_{H+} and the pH is described by the following mathematical formula:

\mathrm{pH} = -\log_{10}\left(a_{\mathrm{H}^+}\right) = \log_{10}\left(\frac{1}{a_{\mathrm{H}^+}}\right).

Once the pellets tend towards saturation, less calcium carbonate is able to crystallise onto the pellets, leading to surplus caustic soda in the reactor and an increase in pH. Consequently, the effluent becomes harder, due to fewer metal ions being removed from the water. Therefore, an increase in pH gives a good indication of when the pellets should have been drained.
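As a small numerical illustration of the definition above (the activity value is chosen arbitrarily for illustration): a hydrogen ion activity of a_{H+} = 10^{-8.2} gives

\mathrm{pH} = -\log_{10}\left(10^{-8.2}\right) = 8.2,

which corresponds to a mildly alkaline solution.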

A high pH in the reactor could kill the bacteria in the downstream biofilters, if the acid dosing downstream from the softening reactor is not able to lower the pH sufficiently. The biofilters are required to eliminate dissolved organic compounds in the water. Moreover, reactions involving caustic soda downstream from the reactor could occur, if the pH is too high after the acid dosing step.

Chapter 3: Data Pre-Processing and Data Analysis

3.1. Introduction
In this chapter, techniques of data pre-processing and data analysis are described. This chapter describes in particular: time series data, normalising data, removing corrupted data, Pearson's correlation coefficient and autocorrelation. Pre-processing and data analysis are necessary for datasets generated in the water softening process (described in Chapters 1 and 2). After a dataset is pre-processed, Machine Learning (ML) (Chapters 4 and 5) can be more effectively applied.

3.2. Time Series
A time series is a sequence of discrete time data. The stock market prices are an example of a time series, since the numerical prices are recorded for a given time interval. This research focuses solely on time series data.

3.3. Normalising the Data
Normalising the train data before training your model ensures that the input data satisfies the scale of the activation functions used during ML. An activation function is a function that transforms the summed weighted input from the neuron into the output. For example, let w be the weight vector, x the corresponding input vector, σ the activation function and y the output. The activation function makes the following transformation: y = σ(wᵀx). When the generated model provides a prediction, the data is transformed back to the original scale. Normalisation of the data is sometimes not required, depending on the ML algorithm and the scale of the original data [9]. If the variance of the dataset is relatively large, then it is recommended to normalise the data for ML [9].

The Z-score normalisation is a popular method to normalise the data. This method entails transforming the data to have zero mean and unit variance (equal to one). The transformation is mathematically described as follows:

Y_{\text{new}} = \frac{Y_{\text{old}} - E[Y_{\text{old}}]}{\sigma},

where Y_old denotes the original data vector and E[Y] = \frac{1}{N}\sum_{i=1}^{N} y_i is the mean of the original data. N denotes the number of samples and y_i is sample i of the original data. Using the same notation, the standard deviation σ is described as follows:

\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - E[Y_{\text{old}}]\right)^{2}}. \qquad (1)
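A minimal sketch of the Z-score normalisation using NumPy, assuming the data is held in a one-dimensional array (the array name and the sample values are purely illustrative):

import numpy as np

# Illustrative pH samples (hypothetical values)
y_old = np.array([8.1, 8.3, 8.2, 8.5, 8.4])

mean = y_old.mean()            # E[Y_old]
std = y_old.std()              # population standard deviation, equation (1)
y_new = (y_old - mean) / std   # zero mean, unit variance

# After a prediction, transform back to the original scale
y_restored = y_new * std + mean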

3.4. Removing Corrupted Data
Real life datasets often contain erroneous values such as duplicated or missing values, which are frequently encoded as blanks, NaNs, null-values or other placeholders [9]. The erroneous values reduce the performance of ML algorithms. This is often the case for time series data, since the sensors are required to measure regularly for a given time interval. Thus, if a sensor is, for instance, temporarily switched off, damaged or blocked, missing values arise in the dataset.

To minimise the impact of the erroneous values in the dataset, the samples (rows) holding the erroneous value(s) can be deleted. However, removing time series samples can negatively impact ML, since the ML models then learn from past data steps that contain gaps. For instance, let a time series be described by

y_t = \alpha y_{t-1} + \beta u_t,

where y is the output, u the input and t the present time step. If time step t-1 is removed from the dataset, then the previous time step t-2 is used instead (if it is not also removed from the dataset), i.e. the equation becomes

y_t = \alpha y_{t-2} + \beta u_t.

Therefore, the value y_{t-1} is skipped, leading to a gap in the information fed into the ML algorithms. In addition, if the number of corrupted samples is relatively large, it is generally better to find a way to save as many samples as possible, because the data could hold important information about the system. Data samples can be saved by replacing the erroneous values with interpolated values. Interpolation means estimating data values based on known sequential data. Another technique is to set the missing values to a constant number, such as 0 or a large negative value, depending on the dataset. The idea is that the ML algorithm will ignore the irregular values (outliers) when training the model. All of the methods explained should be taken into consideration when cleaning a particular dataset. One method is likely to perform better than the rest for a given dataset and this can often be deduced from knowledge about the system.
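A short sketch of these cleaning options using pandas, assuming the time series sits in a DataFrame with a datetime index (the column name, dates and values are illustrative):

import numpy as np
import pandas as pd

# Illustrative time series with missing (NaN) pH samples
idx = pd.date_range("2019-01-01", periods=5, freq="min")
df = pd.DataFrame({"pH": [8.1, np.nan, 8.3, np.nan, 8.5]}, index=idx)

# Option 1: drop rows containing corrupted values (creates gaps in time)
df_dropped = df.dropna()

# Option 2: linear interpolation based on neighbouring samples
df_interp = df.interpolate(method="linear")

# Option 3: replace missing values with a constant placeholder
df_const = df.fillna(0)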

3.5. Pearson's Correlation Coefficient
The Pearson's correlation coefficient measures the degree of correlation between two variables. The coefficient (denoted ρ) satisfies -1 ≤ ρ ≤ 1, where 1 indicates strong positive correlation and -1 strong negative correlation. If the magnitude of the coefficient is relatively low, then the correlation between the two variables is considered weak. A value of 0 indicates that there is no correlation whatsoever. Pearson's correlation coefficient is described by the following formula:

\rho_{X,Y} = \frac{\mathrm{cov}(X,Y)}{\sigma_X \sigma_Y},

where ρ_{X,Y} signifies the Pearson's correlation coefficient between vectors X and Y, and cov(X,Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i - E[X])(y_i - E[Y]) is the covariance between X and Y. N symbolises the number of samples, the vector pair (X,Y) takes on values (x_i, y_i) and E[X] (E[Y]) is the expected value (mean) of X (Y). Furthermore, σ_X and σ_Y represent the standard deviations of X and Y respectively and are calculated as described in equation (1).
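A minimal NumPy sketch of this formula (the two signals and their values are illustrative, not data from this research):

import numpy as np

def pearson(x, y):
    """Pearson's correlation coefficient between two equally long 1-D arrays."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))   # cov(X, Y)
    return cov / (x.std() * y.std())                 # divide by sigma_X * sigma_Y

flow = np.array([100.0, 110.0, 120.0, 115.0, 130.0])  # illustrative flow values
ph   = np.array([8.1, 8.2, 8.4, 8.3, 8.5])            # illustrative pH values
print(pearson(flow, ph))                              # close to +1 for these values
print(np.corrcoef(flow, ph)[0, 1])                    # cross-check with NumPy's built-in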

3.6. Autocorrelation
Autocorrelation is the degree of similarity between a given time series and a lagged version of itself over successive time intervals [18]. It can be likened to calculating the correlation between two different time series, except autocorrelation employs the same time series twice, i.e. a lagged version and the original. The autocorrelation is defined as follows:

\rho_\tau = \frac{\sum_{t=1}^{N-\tau}\left(x_{t+\tau} - E[X]\right)\left(x_t - E[X]\right)}{\sum_{t=1}^{N}\left(x_t - E[X]\right)^2},

where τ (∈ ℕ\{0}) is the lag and x_t is sample t of vector X. E[X] is the mean of vector X and N the number of samples of the variable.
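A small NumPy sketch of this definition (the example signal is illustrative; a slowly varying signal is strongly autocorrelated at small lags):

import numpy as np

def autocorrelation(x, tau):
    """Autocorrelation of a 1-D series x at lag tau (tau >= 1)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    num = np.sum((x[tau:] - mean) * (x[:-tau] - mean))   # sum over t = 1 .. N - tau
    den = np.sum((x - mean) ** 2)
    return num / den

x = np.sin(np.linspace(0, 10, 200))
print(autocorrelation(x, 1))    # close to 1
print(autocorrelation(x, 50))   # noticeably lower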

Chapter 4: Machine Learning

4.1. Introduction
In this chapter, key basic Machine Learning (ML) principles are explained. These explanations encompass: overfitting and underfitting, hyperparameters, supervised learning, two data splitting techniques and ML evaluation metrics. The resulting pre-processed data (described in Chapter 3) needs to be partitioned before applying ML algorithms. Afterwards, so-called hyperparameters can be tactically selected for the given ML algorithm. Finally, the resulting ML model requires a performance evaluation using evaluation metrics. In the ML research domain, a feature refers to an input of the ML model and the target is the output.

4.2. Supervised Learning
Supervised learning is when you feed your ML algorithm with example input-output training data pairs. The ML algorithm then generates a function (model) that is able to map the input to the output, i.e. Y = f(X), where Y is the output, f a function and X the input. On the other hand, unsupervised learning is when the example data fed into the algorithm does not include a corresponding output (only input training data) that it can learn from. Therefore, unsupervised learning learns only from the input X and the corresponding output Y is unknown. In this research, only supervised learning algorithms are implemented, because time series forecasting ML models use supervised learning.

Figure 4: Supervised learning spam identification example. Taken from [26].

An example of supervised learning is illustrated in Figure 4. In this example, the aim is to differentiate between the spam emails and the emails that do not contain spam. The computer is able to learn from previous emails with a corresponding email label, not spam or spam. The spam labels are considered to be the output Y. Implementing ML via the computer generates the function f, which can subsequently be used to classify new emails, where the input X consists of the email features and the corresponding outputs consist of a prediction of the spam label.

Two possible data splitting methods for supervised ML are: train-validation-test and walk-forward. The train-validation-test is the most commonly used data splitting method amongst data scientists. The walk-forward method was originally designed by the financial trading industry and is these days frequently applied to a variety of time series datasets.

4.2.1. Train-Validation-Test Data Splitting Method
For the train-validation-test data splitting method, the dataset is partitioned into a train-validation-test data split (as illustrated in Figure 5). The train partition is used for training during the implementation of the ML algorithm. The validation data is used to evaluate the model during training and allows you to effectively tune the hyperparameters (explained in Section 4.3). Checking the model against the validation data allows you to identify if the model is overfitting on the training dataset. The test partition consists of the data used to determine the final performance of the created ML model.

A data split of 80% train data and 20% test data is often selected as a starting point, where a partition of validation data is not considered a necessity. An adjustment of the data split could be deemed necessary based on the amount of data available. For instance, if there is a large quantity of data available, then a higher percentage can be allocated to the training dataset, since there is considered to be enough test data.

Figure 5: Train-validation-test data split.
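A minimal sketch of a chronological train-validation-test split for time-ordered samples (the 80/10/10 proportions and the stand-in data are illustrative assumptions, not the split used in this research):

import numpy as np

def train_val_test_split(data, train_frac=0.80, val_frac=0.10):
    """Split time-ordered samples chronologically (no shuffling)."""
    n = len(data)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]
    return train, val, test

samples = np.arange(100)                 # stand-in for pre-processed time series rows
train, val, test = train_val_test_split(samples)
print(len(train), len(val), len(test))   # 80 10 10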

4.2.2. Walk-forward Data Splitting Method
The walk-forward validation strategy is used exclusively for time series data analysis. For this strategy, the data is split into windows. Each window has the same train-test data split. The train data contains the features (inputs) and target (output) for a given time period. The test data holds the target data (outputs) for a time period following the respective train data time period. The following window is the same length and shifted in time by the length of the test set. This data splitting technique is illustrated in Figure 6. A model is generated for each set of windowed data. The respective model gives a prediction based on the training data and this can be compared against the test data to measure the performance of the model.

Figure 6: Walk-forward.

Applying the walk-forward validation strategy is useful to validate whether the hyperparameters need to be adjusted to improve the performance of the ML algorithm, since the computation times to generate a model can be considerably lower than the computation times of the train-validation-test method, depending on the length of the window selected. Moreover, the water softening treatment methods can vary in a Water Treatment Plant (WTP) over the course of time, thus the walk-forward validation strategy is able to cope better with these changes by training the given model on only the most recent window of data. For example, the caustic soda dosing method could be altered for a certain WTP.
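A minimal sketch of the walk-forward windowing described above, where each window shifts forward by the length of the test set (the window lengths and the stand-in data are illustrative assumptions):

import numpy as np

def walk_forward_windows(data, train_len, test_len):
    """Yield (train, test) windows; each window shifts forward by test_len samples."""
    start = 0
    while start + train_len + test_len <= len(data):
        train = data[start:start + train_len]
        test = data[start + train_len:start + train_len + test_len]
        yield train, test
        start += test_len   # next window is shifted by the length of the test set

samples = np.arange(20)
for train, test in walk_forward_windows(samples, train_len=8, test_len=4):
    print("train:", train[0], "...", train[-1], "| test:", test[0], "...", test[-1])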

4.3. Hyperparameters and Hyperparameter Grid Searches
A hyperparameter is a parameter of a learning algorithm and external to the model [6]. The hyperparameters are fixed during training of the ML model. A few examples of hyperparameters are: the number of layers in a NN, the number of neurons in a layer, the type of activation function used in each layer and the learning rate.

The enormous number of potential hyperparameter combinations to train your NN can become overwhelming. To mitigate this problem, it is helpful to use a hyperparameter grid search. This method iterates through a set of hyperparameter combinations and determines the optimum combination based on evaluation metrics (explained in Section 4.5), sparing the user from having to manually input new hyperparameters and note the evaluation metrics at the end of each ML training run. Naturally, there are an infinite number of combinations and the search can only analyse a small portion, due to the computation times.
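A minimal sketch of such a grid search loop, assuming a hypothetical train_and_evaluate function that trains a model with the given hyperparameters and returns a validation MSE (the grid values and the dummy scoring function are illustrative placeholders):

import random
from itertools import product

# Hypothetical hyperparameter grid (values are illustrative)
grid = {
    "n_neurons": [32, 64],
    "dropout_rate": [0.2, 0.5],
    "learning_rate": [1e-3, 1e-4],
}

def train_and_evaluate(**hyperparams):
    # Placeholder for an actual training run; returns a dummy validation MSE here
    # so that the loop structure can be executed as-is.
    return random.random()

best_score, best_params = float("inf"), None
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    score = train_and_evaluate(**params)   # lower MSE is better
    if score < best_score:
        best_score, best_params = score, params

print(best_params, best_score)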

4.4. Overfitting and Underfitting
Overfitting occurs when a model is generated via Machine Learning (ML) and the resulting model fits the training data too closely. In other words, "overfitting happens when a model learns the detail and noise of the training data to the extent that it negatively impacts the performance of the model on new data" (J. Brownlee, 2016) [12]. Thus, the noise or random fluctuations of the training set are learnt as concepts by the model. The resulting model is then not able to generalise as well and therefore is not as effective at dealing with new data.

Overfitting can be reduced by increasing the amount of training data, applying regularisation techniques to the ML algorithms or by reducing the size of the neural network.

Underfitting occurs when a model is unable to model the training data or to generalise to new data. A model is said to generalise well to new data when it is able to make a relatively accurate prediction based on the new data as input. In terms of the two evaluation metrics introduced in Section 4.5, the MSE would then be relatively low and the R-squared value close to a positive value of one.

4.5. Evaluation Metrics
Once a model has been created by implementing ML on the training data, a prediction is made using the model. This prediction is then compared against the test data for an indication of model performance. To be able to effectively determine the performance, it is helpful to use a model evaluation metric. Two popular evaluation metrics, R-squared and Mean Squared Error (MSE), are described in Subsections 4.5.1 and 4.5.2 respectively.

It is more effective to use multiple indicators in conjunction, since a single indicator is unable to give a full explanation of the model performance, due to each individual indicator having its pros and cons (Krause et al., 2015) [16].

4.5.1. R-squared
R-squared, also known as the coefficient of determination, is a statistical measure of the distance between the data and the regression predictions. In other words, the R-squared metric measures the proportion of variance of the actual data points that is described by a model prediction. The mathematical definition is [9]:

R^2 \equiv 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}},

where SS_tot = \sum_i (y_i - E[Y])^2 is the total sum of squares (the variance multiplied by the number of data points in the dataset) and Y is the vector of data points y_i (with i ∈ ℕ\{0}). E[Y] = \frac{1}{N}\sum_{i=1}^{N} y_i denotes the mean of the dataset, where N (∈ ℕ\{0}) is the number of data points in the dataset. SS_res = \sum_i (y_i - f_i)^2 (f_i is a given predicted value) represents the residual sum of squares.

If R² = 1, the regression prediction fits the actual data points perfectly. On the other hand, R² = 0 implies that none of the variability of the data points around their mean is explained by the prediction. Thus, the ultimate aim is to minimise SS_res. A value outside the range 0 to 1 occurs when the model fits the data worse than the mean horizontal hyperplane (the mean for each dimension). This could indicate that the model is not an appropriate fit for the data.

4.5.2. Mean Squared Error (MSE)
The MSE measures the average of the squared errors, where the error is the difference between the actual data point and the data point generated by the model. As a mathematical function, the MSE is represented as follows:

\text{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - f_i\right)^2,

where N is the number of predictions, y_i the actual data point at index i (∈ {1,...,N}) and f_i the predicted data point at index i.

One criticism of the MSE is that outliers are heavily weighted. On the other hand, the MSE is widely recognised as one of the best error functions.
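A short sketch computing both metrics for a prediction against test data (the arrays are illustrative, not results from this research):

import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

y_test = np.array([8.1, 8.2, 8.4, 8.3, 8.5])   # illustrative actual pH values
y_hat  = np.array([8.0, 8.2, 8.3, 8.4, 8.5])   # illustrative model predictions
print(mse(y_test, y_hat), r_squared(y_test, y_hat))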

Chapter 5: Neural Networks and XGBoost

5.1. Introduction
In this chapter, Neural Networks (NNs) and the eXtreme Gradient Boost (XGBoost) algorithm are described. NNs and XGBoost algorithms are often used for Machine Learning (ML) using time series data, since they are able to incorporate relations between past and current time steps in the resulting model. For improved results, the dataset should be pre-processed using techniques described in Chapter 3 and split employing the two splitting techniques in Chapter 4. A resulting NN (or XGBoost) model can be evaluated using the evaluation metrics described in Section 4.5. If the evaluation metric results are not satisfactory, adjusting the NN (or XGBoost) hyperparameters (described in Chapter 4) can lead to an improved NN (or XGBoost) model. Subsections 5.2.1, 5.2.2, 5.2.3.1 and 5.2.3.2 are largely based on Chapters 11 and 14 from Hands-On Machine Learning with Scikit-Learn & TensorFlow, published by A. Géron in 2017 [6].

5.2. Neural Networks
NNs are comprised of layers of neurons with weights interlinking them. A simple NN structure can be observed in Figure 7. The orange circles symbolise the bias neurons, the blue circles represent the hidden neurons, the green circles the input neurons and the purple circles denote the output neurons. The symbol a_1^3 denotes the activation function of the first neuron in the third layer. There is one hidden layer in this example. The NN in Figure 7 is a feedforward network, since the connections are in a forward direction, i.e. in the direction of the output. The bias neurons (orange circles in Figure 7) are not dependent on previous layers. The purpose of the bias is to create a desired shift in the activation function of a given layer and ultimately generate a better performing model. In this research, a more sophisticated NN, the Recurrent Neural Network (RNN), is employed.

Figure 7: Simple network.

5.2.1. Recurrent Neural Networks (RNNs)
A recurrent network is almost identical to a feedforward network, except that it also has connections in a backwards direction. The diagram of a RNN mapped against a time axis can be seen in Figure 8.

For the example feedforward case, for a single neuron and a single instance, the output is described as

y_{(t)} = \phi\left(\mathbf{x}_{(t)}^{T} \cdot \mathbf{w}_{x} + b\right),

where φ represents the given activation function, w_x the weight for the input x and b is the bias constant.

Figure 8: RNN over time.

In comparison, the RNN output for a single neuron and a single time instance is given by

y_{(t)} = \phi\left(\mathbf{x}_{(t)}^{T} \cdot \mathbf{w}_{x} + \mathbf{y}_{(t-1)}^{T} \cdot \mathbf{w}_{y} + b\right),

where y_(t-1) symbolises the output of the previous time step and w_y is the corresponding weight. The training data is split into batches and each batch is referred to as a mini-batch. This formula can be extended to accommodate multiple recurrent neurons for all instances in a mini-batch, using a vectorised form of the previous equation [6]:

\mathbf{Y}_{(t)} = \phi\left(\mathbf{X}_{(t)} \cdot \mathbf{W}_{x} + \mathbf{Y}_{(t-1)} \cdot \mathbf{W}_{y} + \mathbf{b}\right).

• Y_(t) is an m × n_neurons matrix containing the layer's outputs at time step t for each instance in the mini-batch, where m is the number of instances in the mini-batch and n_neurons the number of neurons in the layer.

• X_(t) is an m × n_inputs matrix containing the inputs for all instances, where n_inputs denotes the number of inputs.

• W_x is an n_inputs × n_neurons matrix containing the connection weights for the inputs of the current time step.

• W_y is an n_neurons × n_neurons matrix containing the connection weights for the outputs of the previous time step.

• b is a vector of size n_neurons containing each neuron's bias term.

Notice that Y_(t) depends on X_(t) and Y_(t-1), which in turn is dependent on X_(t-1) and Y_(t-2), which is dependent on X_(t-2) and Y_(t-3), and so forth. Therefore, Y_(t) is a function of all inputs from time t = 0, i.e. X_(0), X_(1), X_(2), X_(3), ..., X_(t). At t = 0 it is assumed that there are no previous outputs and they are taken to be zeros.
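A small NumPy sketch of the vectorised recurrence above for a single layer, looping over the time steps of one mini-batch (the dimensions and the use of tanh as the activation φ are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
m, n_inputs, n_neurons, n_steps = 4, 3, 5, 10   # mini-batch size, inputs, neurons, time steps

X = rng.normal(size=(n_steps, m, n_inputs))     # inputs X_(t) for each time step
W_x = rng.normal(size=(n_inputs, n_neurons))    # input weights
W_y = rng.normal(size=(n_neurons, n_neurons))   # recurrent weights
b = np.zeros(n_neurons)                         # bias vector

Y = np.zeros((m, n_neurons))                    # no previous outputs at t = 0: zeros
for t in range(n_steps):
    Y = np.tanh(X[t] @ W_x + Y @ W_y + b)       # Y_(t) = phi(X_(t) W_x + Y_(t-1) W_y + b)

print(Y.shape)                                  # (m, n_neurons)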

5.2.2. Memory Cells
The accumulation of outputs at a recurrent neuron from the previous time steps can be likened to storing memories. A component of a NN that preserves some state across time steps is called a memory cell.

Mathematically, a cell's state is represented as h_(t) = f(h_(t-1), x_(t)) [6]; it thus depends on the input vector of the current time step and the memory state of the previous time step. The vector h stands for "hidden". As a result, the output at time step t, denoted by y_(t) = z(h_(t-1), x_(t)), is a function of the previous memory state and the current inputs. A diagram representation of a memory cell is shown in Figure 9. The left hand side image displays a memory cell. The right hand side shows the pattern of a memory cell over time.

Figure 9: Memory cell.

5.2.3. Standard Time Series RNN Structure
A typical RNN structure for time series data can be viewed in Figure 10, where the general direction is from left to right. The example construction contains two LSTM layers (described in Subsection 5.2.3.1), which are able to draw upon past data steps. Furthermore, it has a dropout layer (described in Subsection 5.2.3.2), which acts as regularisation in the model. It is possible that there exists a better RNN structure for a given time series problem. An improved structure could be found by adjusting the original structure and comparing the evaluation metrics, to determine whether the new model structure performs better. In general, the best time series NNs contain regularisation and LSTM layers.

Figure 10: Example RNN structure.
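A minimal Keras sketch of a structure like the one in Figure 10, with two LSTM layers, a dropout layer and a dense output layer (the layer sizes, lookback length and number of features are illustrative assumptions, not the values used in this research):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

lookback, n_features = 60, 5           # illustrative: 60 past time steps, 5 input features

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(lookback, n_features)),
    LSTM(32),                          # second LSTM layer, returns only the last output
    Dropout(0.5),                      # dropout layer for regularisation
    Dense(1),                          # single output: the predicted target value
])

model.compile(optimizer="rmsprop", loss="mse")
model.summary()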

5.2.3.1. LSTM Cells
The Long Short-Term Memory (LSTM) cell was introduced in 1997 by Sepp Hochreiter and Jürgen Schmidhuber [21]. The LSTM cell is similar to the memory cell, except that it performs better; training converges faster and it is able to detect long-term dependencies in the data. Training is said to converge faster when the error plateaus faster during training. A. Géron provides an explanation of the LSTM algorithm, which is summarised in the remainder of this subsection [6].

The architecture of a LSTM cell is displayed in Figure 11. If you ignore the contents of the box, thereby treating it as a black box, the LSTM cell appears to be a regular memory cell, except that the state is split into two vectors: h_(t) and c_(t) (c denotes "cell"). The h_(t) vector can be considered a short-term (memory) state and c_(t) a long-term state.

Now drawing attention to the contents of the box: the network structure is based on determining what to store in the long-term state, what can be removed and what to read from it. As the long-term state c_(t-1) travels through the network from left to right, it can be observed that it initially goes through a forget gate, dropping some information, and subsequently adds some new information via the addition operation (which adds information that was selected by an input gate). Finally, the resulting c_(t) exits the box without any further transformations. Furthermore, after the addition operation, the long-term state is replicated and then passes through the tanh function, and the output is filtered by the output gate. This produces the short-term state h_(t), which is equivalent to the cell's output at the present time step, y_(t).

Figure 11: LSTM cell. Note that FC stands for Fully Connected and the definition of the logistic activation function is in Appendix C.

The next step is explaining the origin of the new memories and how the gates function. Firstly, the current input vector x(t) and the previous short-term state h(t−1) are fed to four different fully connected layers. Each fully connected layer has its own function:

• The main layer outputs vector g(t). It analyses the current inputs x(t) and the previous short-term state h(t−1). In a standard memory cell (as described in Subsection 5.2.2), there are no other neuron layers and its output goes straight out to y(t) and h(t). In contrast, in an LSTM cell this layer's output is instead partly stored in the long-term state.

• The remaining three neuron layers are so-called gate controllers. Their outputs range from 0 to 1, since they make use of the logistic activation function (see Appendix C for the definition). Notice that their outputs are fed to element-wise multiplication operations: an output of 0s closes the gate, while an output of 1s opens it. In more detail:

– The forget gate (controlled by f(t)) controls which time steps of the long-term state should be removed.

– The input gate (controlled by i(t)) controls which time steps of g(t) should be added to the long-term state.

– The output gate (controlled by o(t)) controls which time steps of the long-term state should be read and added to the output at the current time step (for both y(t) and h(t)).

In summary, an LSTM cell is able to learn to recognise an important input (that is the role of the input gate), store it in the long-term state, learn to preserve it for as long as it is necessary (that is the role of the forget gate) and learn to extract it whenever it is required. This explains why LSTM cells are very successful at capturing long-term patterns in time series.
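As a reference, the sketch below spells out one LSTM time step with the gates described above; it follows the standard formulation summarised in [6], and all weight shapes and numerical values are placeholders rather than values from this thesis.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step; W, U, b are dicts with keys 'f', 'i', 'o', 'g'."""
    f_t = sigmoid(x_t @ W['f'] + h_prev @ U['f'] + b['f'])   # forget gate
    i_t = sigmoid(x_t @ W['i'] + h_prev @ U['i'] + b['i'])   # input gate
    o_t = sigmoid(x_t @ W['o'] + h_prev @ U['o'] + b['o'])   # output gate
    g_t = np.tanh(x_t @ W['g'] + h_prev @ U['g'] + b['g'])   # main (candidate) layer
    c_t = f_t * c_prev + i_t * g_t            # update of the long-term state
    h_t = o_t * np.tanh(c_t)                  # short-term state = cell output y(t)
    return h_t, c_t

# Example with 3 inputs and 4 hidden units, randomly initialised
rng = np.random.default_rng(1)
W = {k: rng.normal(scale=0.1, size=(3, 4)) for k in 'fiog'}
U = {k: rng.normal(scale=0.1, size=(4, 4)) for k in 'fiog'}
b = {k: np.zeros(4) for k in 'fiog'}
h, c = lstm_step(np.ones(3), np.zeros(4), np.zeros(4), W, U, b)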

5.2.3.2. Regularisation using a Dropout Layer
Regularisation is implemented to reduce overfitting. A frequently used regularisation technique for deep (many-layered) NNs is dropout. It was proposed by G. E. Hinton in 2012, and a paper giving greater detail was subsequently published by Nitish Srivastava et al. in 2014 [22].

At every training step, every neuron (including the input neurons, but excluding the output neurons) has probability p of being briefly "dropped out". In other words, it will be completely ignored during the current training step, but has the potential to be active during the next step. This algorithm is shown in Figure 12. Note that the green circles represent the input neurons, the blue circles represent the hidden layer neurons and the orange circles symbolise the bias neurons. A cross in a neuron indicates that the neuron is not active for that training step. The hyperparameter p is called the dropout rate and is usually set to 50%. The neurons are no longer dropped once training is finished. The purpose of this method is to reduce the co-dependencies in the network, thus reducing overfitting.

Figure 12: Dropout layer.
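A minimal sketch of the dropout mechanism during a single training step is given below. It uses the common "inverted dropout" scaling so that no rescaling is needed at test time; this scaling is an implementation detail not discussed in the text above.

import numpy as np

def dropout_forward(activations, p=0.5, training=True, seed=0):
    """Randomly silence each neuron with probability p during training only."""
    if not training:
        return activations                 # neurons are never dropped after training
    rng = np.random.default_rng(seed)
    keep_mask = rng.random(activations.shape) >= p   # True = neuron stays active
    return activations * keep_mask / (1.0 - p)       # inverted-dropout scaling

layer_output = np.array([0.2, 1.5, -0.7, 0.9])
print(dropout_forward(layer_output, p=0.5))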

5.2.4. Gradient Descent
The gradient descent algorithm is a frequently used method to update the weights of the network. Generally, it is an iterative optimisation algorithm for finding the minimum of a function. The algorithm adjusts the weights with a step proportional to the negative gradient of the cost function at a given iteration. A visual representation of gradient descent is depicted in Figure 13. The vector of the network weights is updated using the following formula:

w(i) = w(i−1) − η ∇C(w(i−1)),   (2)

where i is the current iteration and η is the learning rate hyperparameter. The ∇C(w(i−1)) term can be determined using backpropagation (see Appendix Section C.3 for more information). A faster gradient optimiser is RMSProp, which is explained in Appendix Section C.2.

Figure 13: Gradient descent.

There are two different algorithms that can be used to apply gradient descent to our training data: Stochastic Gradient Descent and Batch Gradient Descent. The Stochastic Gradient Descent algorithm uses equation (2) to update the weights of the network, evaluating the gradient for every individual training sample. On the other hand, the Batch Gradient Descent algorithm updates the weights only when all the training samples have been fed into the network, therefore using formula (2) only once per pass over the training data.
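The difference between the two variants can be illustrated by applying the update rule of equation (2) to a simple least-squares cost; the data, learning rate and iteration counts below are arbitrary choices for the sketch.

import numpy as np

def grad(w, X, y):
    """Gradient of the MSE cost C(w) = mean((Xw - y)^2) with respect to w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.standard_normal(100)
eta = 0.1

# Batch gradient descent: one update of equation (2) per pass over all samples.
w_batch = np.zeros(3)
for _ in range(200):
    w_batch -= eta * grad(w_batch, X, y)

# Stochastic gradient descent: equation (2) is applied for every training sample.
w_sgd = np.zeros(3)
for _ in range(20):
    for j in rng.permutation(len(y)):
        w_sgd -= eta * grad(w_sgd, X[j:j+1], y[j:j+1])

print(w_batch, w_sgd)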

5.3. XGBoost

5.3.1. Introduction to Decision Trees
A decision tree is defined by R. S. Brid (2018) as follows [15]:

A decision tree is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. It is one way to display an algorithm that only contains conditional control statements.

5.3.2. Difference between Classification and Regression Trees
The classification tree splits the response variable (target variable) into discrete values (e.g. 0 or 1) or verbal classes (e.g. Yes or No). These response variables are frequently referred to as categorical. Conversely, the regression tree response variable is continuous or numeric and not categorical [13]. We focus only on constructing regression trees during this research, since we wish to forecast numerical data. An example of a simple regression tree based on typical Water Treatment Plant (WTP) softening data variables can be found in Figure 14. The root node in this case is the flow variable and is the beginning point of the algorithm. An example of one sample being processed by the tree is as follows: the flow is higher than Z1, so the sample is directed to the right hand side branch of the tree. The sample's pellet-bed value is larger than Z2, thus the pH is equal to x3.

Figure 14: Regression tree example based on WTP softening variables. Note that x1, x2, x3, x4, x5, Z1, Z2 ∈ ℝ.
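The worked example from Figure 14 can be read as nested conditional statements, as in the sketch below; the thresholds Z1, Z2 and the leaf values are hypothetical placeholders, and the splits on the left branch are omitted.

# Hypothetical thresholds and leaf values (Z1, Z2, x1..x3 are placeholders).
Z1, Z2 = 350.0, 20.0
x1, x2, x3 = 8.1, 8.6, 9.0

def predict_ph(flow, pellet_bed_pressure):
    """Walk the regression tree of Figure 14 for one sample (simplified sketch)."""
    if flow <= Z1:                      # root node splits on the flow
        return x1                       # left branch (further splits omitted here)
    # right branch: split on the pellet-bed pressure
    return x3 if pellet_bed_pressure > Z2 else x2

print(predict_ph(flow=400.0, pellet_bed_pressure=25.0))   # -> x3, as in the text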

5.3.3. Introduction to XGBoost
In recent years XGBoost has been implemented widely and has won many ML competitions on Kaggle (an online community of data scientists and machine learners), due to its computational speed and model performance. T. Chen introduced the XGBoost technique in 2016 [14]. XGBoost stands for eXtreme Gradient Boosting. The XGBoost algorithm builds on the additive optimisation technique called gradient boosting by adding a regularisation term to the algorithm to combat overfitting. It is argued that XGBoost is computationally ten times less expensive than the original gradient boosting algorithm on a given machine [9].

The idea of gradient boosting is to convert weak learners into strong learners. This is carried out using the concept of ensemble learning, where a decision tree model is constructed on the basis of many other existing models. We call a combined model consisting of the best performing models an ensemble. An ensemble typically performs better than the best individual model. By continually incorporating new models into the ensemble, the error on the training set usually reduces. A visual example of the algorithm iterations can be seen in Figure 15. The points not lying on the plot are considered to be misclassified, i.e. the wrong value has been assigned by the model. The XGBoost method is

described in more detail in Appendix D.
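To illustrate the ensemble idea, the sketch below grows a small boosted ensemble by repeatedly fitting a shallow regression tree to the residuals of the current ensemble, which is the core of gradient boosting with a squared-error loss. It uses scikit-learn trees and synthetic data as stand-ins, not the XGBoost implementation or the dataset of this thesis.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.random((200, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.standard_normal(200)

learning_rate, trees = 0.3, []
prediction = np.zeros_like(y)            # start from a zero model
for _ in range(50):
    residuals = y - prediction           # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residuals)
    trees.append(tree)
    prediction += learning_rate * tree.predict(X)
    # the training error usually shrinks as more trees join the ensemble
print("train MSE:", np.mean((y - prediction) ** 2))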

5.3.4. Feature Importance
During implementation of the XGBoost algorithm, it is relatively easy to derive a feature importance score for the features involved in the construction of a given model. In essence, the importance score measures how valuable each feature was during the generation of the boosted tree model [9]. The more a feature is used to make decisions, the higher its relative importance. The importance score is determined for each feature based on the amount by which that feature's split points improve the performance measure, weighted by the number of samples each node is responsible for.

Figure 15: Gradient boosting. This figure is taken from [6]. Note that the points not lying on the plot are bold blue.
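In practice the importance scores can be read directly from a fitted XGBoost model, as sketched below; the feature names, synthetic data and hyperparameters are illustrative and do not reproduce the model of Chapter 7.

import numpy as np
import pandas as pd
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
features = pd.DataFrame(
    rng.random((500, 3)),
    columns=["caustic_soda_flow", "pellet_bed_press", "bed_height"],  # assumed names
)
target = 8.8 + 0.5 * features["pellet_bed_press"] + 0.1 * rng.standard_normal(500)

model = XGBRegressor(n_estimators=100, max_depth=3, learning_rate=0.1)
model.fit(features, target)

# One importance score per feature, reflecting how valuable it was for the splits.
for name, score in zip(features.columns, model.feature_importances_):
    print(f"{name}: {score:.3f}")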

5.4. XGBoost and RNN Model Prediction Horizons
The number of prediction time steps from the present time step to a certain future time step is called the horizon. Figure 16 shows the difference between the RNN model and the XGBoost model horizons. Note that the XGBoost model does not make a prediction for future time steps, but only for the present time step. Conversely, the RNN model has a horizon with future time steps. The number of past time steps used to make a model prediction is called the lookback and is denoted by L.

Figure 16: XGBoost and RNN model features and targets. L stands for the so-called lookback, which is the number of past data steps that are used to make a prediction of the target. F denotes the number of forecast steps.

Chapter 6: Methods
6.1. Introduction
In this chapter, the methods of implementing the two ML algorithms introduced in Chapter 5 are explained as preparation for the interpretation of results featured in the following chapter. Firstly, the inputs and output of the desired model are identified, based on the importance of the variables to the water softening treatment process. Next, the data collection method is described, where a data time interval is decided upon. Subsequently, the data is pre-processed using theory specified in Chapters 3 and 4. Finally, the methods used in the ML prediction phase are described, where the algorithms described in Chapter 5 are implemented.

6.2. Identification of Inputs and Outputs Before requesting drinking water softening data from the water company, the potential input(s) and output(s) are identified.

The selection of inputs and outputs is limited, because many pertinent variables in the given Water Treatment Plant's (WTP's) softening process are not measured. For instance, the hardness is not measured in the softening process, since no hardness sensors are installed there. The hardness is instead constantly measured by a sensor at the end of the WTP treatment process.

In Machine Learning, the output is commonly referred to as the target. The target was identified as the pH in the softening process, since it has promising properties for control (as seen in Chapter 2). Based on availability of data and relevance to the pH (output), the inputs are chosen as: caustic soda flow, bottom pressure, pellet-bed pressure, bed height, draining and seeding. The relevance of the variables was assessed by discussing the pertinent data with a process engineer from the given WTP. The inputs are called features in the ML domain. A description of the features and target is as follows:

• Flow refers to the water flow rate travelling through the softening reactor [m3/h].

• Caustic Soda Flow (feature) is the flow rate of the caustic soda dosing into the pellet-bed reactor [m3/h].

• Bottom Pressure (feature) refers to the pressure measured at the bottom of the reactor [kPa].

• Pellet-bed Pressure (feature) refers to the pressure difference measured across the pellet-bed [kPa].

• pH (target and RNN feature) indicates the pH within the reactor.

• Bed Height (feature) represents the height of the pellet-bed [cm].

• Draining (feature) is the time at which the reactor drains an amount of pellets, i.e. when the pressure difference threshold across the reactor is exceeded.

• Seeding (feature) is the time at which the reactor is seeded.

The flow variable is removed from the dataset for ML after the analysis shown in Subsection 6.4.1 is carried out, which indicates that the caustic soda flow is strongly dependent on the flow.

6.3. Data Collection

6.3.1. Data Storage
The data is recorded live in the water company's system, using an interpolation method. It is possible to extract data from the system by specifying a time interval. This data is then transferred to an Excel document. Naturally, a smaller time interval for a requested period of time leads to more data points, thus costing more computation time to upload the data.

In order to reduce the amount of saved data on the water company's storage drives, interpolation is applied to the live data readings before they are saved. The system adopted by the water company records a value when it falls outside boundaries set by the WTP operator. For example, an operator may set the boundaries to 2% of the previously recorded value. Once a value falls outside the boundaries and is therefore recorded, interpolation is applied between the current recorded value and the previous one. This method of interpolation using boundaries can be viewed in Figure 17.

Figure 17: Data interpolation method. The red points denote the recorded data points and the green points the disregarded data readings.

6.3.2. Time Interval Selection
Firstly, a day's worth of data with a time interval of one second is analysed, to ascertain which time interval would be appropriate to use for ML implementation. The time interval of one second seems to be unnecessarily low, because the data is interpolated (based on the method described in Subsection 6.3.1). Therefore, there is not a large loss of data information if a slightly larger time interval is selected. Deciding upon a one minute time interval provides satisfactory retention of data information and low computation times.

Once the time interval, inputs and output are selected, a year's data is extracted from the system for ML implementation. A year's data provides enough information about the patterns of the softening process to train a ML model, since the seasons are not a large contributor to the dynamics of the water softening treatment process. For example, one of the most influential seasonal variables is the water temperature. The water temperature remains relatively constant throughout the whole of the WTP, due to there always being a large volume of water within the WTP at a given moment. Moreover, the WTP is insulated, so that the outside temperature does not have a substantial impact on the temperature inside the WTP. Typically, the temperature of the water in the WTP is largely determined by the temperature of the water at the source (collection point). On average the temperature is around 14°C throughout the year for the given WTP and is often two degrees higher or lower if the average is taken in the summer or winter respectively.

6.4. Data Pre-Processing and Data Analysis
The pre-processing and analysis in this section is conducted using the Python packages NumPy, Pandas and StatsModels.

6.4.1. Erroneous Values and Variable Dependencies
Looking at Figure 22 on page 28, it is clear that the sensors are not recording sensible values (or are turned off) for irregular time intervals. For instance, on 2 February all the readings displayed in Figure 22 drop to zero, apart from the pH, which instead falls in value considerably. The drop in the pH could be explained by no caustic soda being dosed, which typically increases the pH. During these periods, it is also notable that no draining and seeding actions are recorded. This could be due to the reactor being switched off. For our machine learning algorithms, the corrupted data readings are removed from the dataset, since they do not describe the operational behaviour within the reactor. Removal of this uninteresting data leads to retention of approximately two thirds of the data. The number of retained samples is considered sufficient to feed into the ML algorithms. The irregular time periods of corrupted readings span days; thus, interpolating within these periods would not give an accurate representation of the actual values.
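A minimal Pandas sketch of this clean-up step is given below; the file name, column names and the zero-reading criterion are assumptions used for illustration, not the exact rule applied to the thesis dataset.

import pandas as pd

df = pd.read_excel("softening_reactor_2018.xlsx", parse_dates=["timestamp"])  # hypothetical file

# Drop periods where the sensors report non-physical values (here: zero readings),
# since they do not describe the operational behaviour of the reactor.
mask_corrupted = (df["flow"] <= 0) | (df["bottom_pressure"] <= 0)
df_clean = df.loc[~mask_corrupted].reset_index(drop=True)

print(f"retained {len(df_clean) / len(df):.0%} of the samples")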

After removing the data with the corrupted sensor readings, we calculated the Pearson's correlation coefficient matrix for 2018 (seen in Table 7 in Appendix B) to understand the relationships between the variables. It can be observed in Table 7 that there is a strong correlation between the caustic soda dosing and the flow. This is because the caustic soda flow is controlled by the flow in the given WTP. Moreover, Figure 18 displays the caustic soda flow on the same plot as the flow over a few days in February. By observing Figure 18, it is evident that the caustic soda flow is strongly dependent on the flow. Hence, the water flow in the softening reactor is not included as a feature during ML training. Furthermore, the caustic soda flow is retained, because this variable has a greater impact on the pH in the softening reactor.

Figure 18: Caustic soda dosage flow rate and flow rate. The red plot is the flow and the blue is the caustic soda flow. The time axis increases in increments of 12 hours.
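The correlation matrix of Table 7 can be reproduced with Pandas as sketched below; the file and column names are the same hypothetical ones as in the previous sketch.

import pandas as pd

df_clean = pd.read_csv("softening_reactor_2018_clean.csv")   # hypothetical cleaned dataset

# Pearson's correlation coefficient matrix of all numeric variables (cf. Table 7).
corr_matrix = df_clean.select_dtypes("number").corr(method="pearson")
print(corr_matrix.round(3))

# The flow / caustic soda flow dependence shows up as a coefficient close to 1.
print(corr_matrix.loc["flow", "caustic_soda_flow"])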

6.4.2. Feature Ranges
The minimums and maximums of the dataset after the erroneous values are removed can be seen in Table 1. This suggests that the resulting ML model may only be able to give accurate outputs for inputs (features) within these boundaries, since the model is trained using a dataset that stays within these values.

Feature                 Minimum   Maximum
Flow                    200.6     502.9
Caustic Soda Flow       30        95.7
Bottom Pressure         5.1       25.5
Pellet-bed Pressure     11.3      35.6
pH                      7.6       9.4
Bed Height              298.9     401.6

Table 1: Minimums and maximums of features to one decimal place.

6.4.3. Normalisation
Normalisation is applied to the train dataset using the method explained in Section 3.3, and is only applied to the dataset used for the RNN ML algorithm, since that algorithm makes use of activation functions. Conversely, the XGBoost algorithm does not use activation functions and is based on decision trees. Therefore, normalisation of the dataset for the XGBoost algorithm is generally considered unnecessary [9].
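A sketch of this step is given below, assuming min-max normalisation fitted on the training split only (Section 3.3 defines the exact method used in this thesis); the scaling statistics must come from the training data so that no information leaks from the test set, and the arrays here are synthetic placeholders.

import numpy as np

rng = np.random.default_rng(0)
train_features = rng.random((1000, 6))    # placeholder for the real train split
test_features = rng.random((200, 6))      # placeholder for the real test split

def fit_minmax(train):
    """Compute per-feature minimum and maximum on the training data only."""
    return train.min(axis=0), train.max(axis=0)

def apply_minmax(data, lo, hi):
    return (data - lo) / (hi - lo + 1e-12)   # scale every feature into [0, 1]

lo, hi = fit_minmax(train_features)
train_scaled = apply_minmax(train_features, lo, hi)
test_scaled = apply_minmax(test_features, lo, hi)   # reuse the training statistics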

6.4.4. Data Splitting
The data splits for the ML algorithms are shown in Table 2. Note that all the algorithm methods consist of a 90% train and a 10% test data split, except for the train-validation-test RNN method. Based on the reasonably high number of samples available from the collected data, only 10% test data is required to evaluate the resulting models. However, for the RNN train-validation-test method, a 65% train, 30% validation and 5% test data split is chosen, so that the hyperparameters can be altered during training to improve the model, based on the evaluation metrics of the validation data. This prevents overfitting on the test dataset, since the hyperparameters are adjusted on the basis of the validation dataset evaluation metric results, instead of on the basis of the test dataset evaluation metric results.

Data Splitting Method    XGBoost               RNN
Walk-Forward             90% train, 10% test   90% train, 10% test
Train-Validation-Test    90% train, 10% test   65% train, 30% validation, 5% test

Table 2: Data splits for both algorithms: XGBoost and RNN.
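One way to generate the walk-forward splits of Table 2 is sketched below, using an expanding training window followed by a test block that keeps roughly a 90/10 ratio per model; the number of splits and this particular scheme are assumptions consistent with the table, not necessarily identical to the implementation of [9].

import numpy as np

def walk_forward_splits(n_samples, n_models=6, test_frac=0.10):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    block = n_samples // (n_models + 1)
    for k in range(1, n_models + 1):
        split = k * block
        test_size = int(test_frac / (1 - test_frac) * split)   # keep ~90/10 per model
        yield np.arange(split), np.arange(split, min(split + test_size, n_samples))

for train_idx, test_idx in walk_forward_splits(100_000):
    print(len(train_idx), len(test_idx))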

6.4.5. Evaluation Metric Selection
The two evaluation metrics described in Section 4.6 (MSE and R-squared) are used in conjunction. Analysing the MSE evaluation metric gives insight into the accumulated error between the actual data points and the predicted data points. The R-squared provides information about the fit of the prediction relative to the actual data points.
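Both metrics, together with the relative MSE used in Chapter 7, can be computed as sketched below; y_true and y_pred stand for the actual and predicted pH series.

import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)            # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def relative_mse(y_true, y_pred):
    # As defined in Chapter 7: the MSE divided by the mean pH, times 100.
    return 100.0 * mse(y_true, y_pred) / np.mean(y_true)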

6.4.6. Output Variable Analysis
Before the prediction, the pH within the softening reactor is analysed. An analysis of the pH is necessary to understand the prediction outputs generated in the prediction phase (discussed in the following section).

Draining of the pellets in the reactor is often shortly followed by seeding of the reactor (as seen in Figure 22 on page 28). Sometimes during these actions the pressure difference across the pellet-bed increases drastically, since the pellet-bed has been disturbed. Furthermore, draining and seeding actions

appear to lead to an immediate decrease in pH, because the saturated pellets are drained and replenished with new sand. This increases the surface area on which crystallisation of calcium carbonate takes place, thereby using up more caustic soda (which typically has a pH of about 12).

The autocorrelation plot of the pH in February 2018 (Figure 19) shows that there is a relatively strong correlation with past time steps of the data. Looking at the plot of three days' worth of lags, it is evident that the autocorrelation has a daily periodicity, where the x-axis consists of the number of minute lags. The daily periodicity can be explained by the draining actions generally occurring at approximately the same time of day on most days. This suggests that including past pH readings as a feature would improve the model produced through ML.

(a) Autocorrelation of the pH in February 2018. (b) Autocorrelation of the pH in February 2018, displaying three days' worth of lags.

Figure 19: The light blue area represents the area bounded by the 95% confidence intervals, where the standard deviation is calculated using Bartlett’s formula. The x-axis consists of the number of minute lags.
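A plot like Figure 19 can be produced with StatsModels as sketched below; the pH series here is a synthetic stand-in with a daily period, since the real minute-resolution readings are not reproduced in this document.

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

# Stand-in for the minute-resolution pH series of February 2018.
t = np.arange(28 * 24 * 60)
ph_series = (8.8 + 0.2 * np.sin(2 * np.pi * t / (24 * 60))
             + 0.05 * np.random.default_rng(0).standard_normal(t.size))

# Three days of minute lags; the shaded band is the 95% confidence interval
# (Bartlett's formula), as in Figure 19.
plot_acf(ph_series, lags=3 * 24 * 60, alpha=0.05)
plt.xlabel("lag [min]")
plt.show()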

In Figure 20a the mean pH is taken over the period of a day, where time zero is the first time step after the draining action. This is calculated by compiling the instances of 24 hours of data after draining and taking the mean over all the samples for each time step in the 24 hours of compiled instances. In the first couple of hours after the draining action, the mean pH plummets, largely due to the sand seeding being added to the reactor. The plot then gradually continues to decrease until approximately 18 hours after the draining action. At about 22 hours, the pH increases significantly, since the pellets become saturated. Figure 20b depicts the mean pH bounded by the maximum and the minimum over the period of a day. The maximum and minimum are calculated for each time step, using once again the compiled 24 hours of data after a draining action. Furthermore, Figure 20b demonstrates the expected boundaries for a realistic pH prediction, i.e. the prediction should stay within the minimum and maximum boundaries over the course of the 24 hours following a draining action.

(a) Mean pH over one day. Time zero is the first time step after the draining action. (b) Mean pH over one day bounded by the maximum and minimum pH. Time zero is the first time step after a draining action.

Figure 20

The box plot of the pH over the length of 24 hours is shown in Figure 21. Information about the key characteristics of a box plot can be found in Appendix B. It is clear that the pH peaks at about 9:00, since the majority of the draining actions occur at this time (see Figure 34 in Appendix B). Draining and seeding are commonly carried out during working hours, so that operators can oversee the actions and intervene if necessary. The IQR (Interquartile Range) is relatively large at 9:00, indicating that the pH readings are more spread out at this time. This could be the result of differing draining amounts: the draining action generally drains less than one percent of the pellet-bed, in variable amounts. The outliers are the points below or above the bars at a corresponding hour in the box plot. There are many outliers in the hours following 9:00, which is possibly due to draining actions sometimes occurring at other moments in the day, or an increase in the flow rate leading to a higher caustic soda dosage.

Figure 21: pH box plot over hours.

The maximum pH value is 9.4 and the minimum is 7.6, as observed in Table 1. This implies that a sensible prediction should stay within these minimum and maximum boundaries.

Figure 22: A month of data graphed. The vertical yellow spotted-dashed lines represent a draining action and the vertical black dashed lines represent a seeding action.

6.5. Prediction

6.5.1. ML Implementation
A. Sadad published his thesis about a general ML framework for time series data [9]. For implementation of the RNN algorithm, the Python code (using TensorFlow as backend and Keras as frontend) from A. Sadad has been adapted to give a greater horizon span (forecast length) and to use past data steps of the target (pH) as a feature.

Google Colaboratory was used to execute the Python code, due to its free-of-charge GPU with 12 GB of RAM, which is significantly faster than running the training on the CPU of a standard PC.

6.5.2. Hyperparameter Selection
The RNN structure adopted is as seen in Figure 10 on page 21, with seven features (inputs). The hyperparameter lookback refers to the number of past time steps that are used in a ML algorithm (see Section 5.4). For the RNN implementation, a lookback of 180 minutes was set, based on a balance between computation time and evaluation metric performance. The other hyperparameter choices can be found in Appendix E for the walk-forward and train-validation-test model training.

Training is applied for the forecast lengths F = 1 [min], 4 [hr] and 24 [hr]. This set of forecast lengths is selected to sufficiently test the limits of the RNN algorithm, i.e. the amount by which the evaluation metrics decline in performance as the forecast length is increased.

XGBoost is only able to make a prediction of the present target value based on present and past feature values (as seen in Figure 16). A lookback of 60 minutes (60 past data steps) was selected, based on a compromise between accuracy and computation time. The remaining value choices can be seen in Appendix E and were selected based on the Wastewater Treatment Plant (WWTP) XGBoost implementation by A. Sadad.

6.5.3. Supervised Learning
The predictions are made using the supervised learning training technique (described in Section 4.2) for both ML algorithms.

Chapter 7: Machine Learning Results
7.1. Introduction
In this chapter, the results of the implemented Machine Learning (ML) algorithms, Recurrent Neural Networks (RNNs) and eXtreme Gradient Boost (XGBoost), are analysed using the evaluation metrics (defined in Chapter 4). The ML algorithms make use of the pre-processed data from Chapter 6. Based on the results, comparisons are drawn between the models derived from both of the data splitting methods: walk-forward and train-validation-test. The pH is selected as the target (output), therefore all the results in this chapter feature the pH as the output.

7.2. RNN Results

7.2.1. RNN Train-Validation-Test Model
Figure 24 depicts the prediction of the train-validation-test data split model, where the forecast horizon consists of one time step (i.e. F=1[min], where F is defined in Section 5.4), and Figure 23 displays the last three hours of the prediction. Clearly, the prediction lags marginally behind the fluctuations of the actual data points. The prediction follows the pattern of the test data closely and is also able to follow sharp peaks. Training the model gives a corresponding MSE equal to 0.0004 (4 d.p.) and an R-squared value of 0.9007 (4 d.p.). The relative MSE is equal to 0.0045% and is calculated by dividing the MSE value by the mean pH across the whole dataset used for ML and multiplying by 100.

Figure 23: The last three hours of the RNN train-validation-test model prediction.

Figure 24: RNN train-validation-test model (with F=1[min]) prediction.

7.2.2. RNN Walk-Forward Models
Setting the prediction one time step into the future (i.e. F=1[min], where F is defined in Section 5.4) gives the results in Figures 25 and 26. For more detail, the first three hours of models 0 and 4 are plotted in Figure 27. The prediction lags slightly behind the pattern of the actual data points. The gaps in the data plots are a result of removing the corrupted data values. It can be deduced that the predictions follow the real data well, although model 4 appears to be slightly more shifted above the real data and consequently has a lower R-squared value (as seen in Table 3). In general, the predictions follow the pattern of the test data relatively closely, including the sharp peaks. For instance, model 0 predicts a sharp peak on 25 October in the morning, which exceeds a pH of 9.

Model   MSE      Relative MSE   R-squared
0       0.0013   0.0148%        0.6373
1       0.0009   0.0102%        0.8801
2       0.0016   0.0182%        0.7269
3       0.0005   0.0057%        0.8638
4       0.0017   0.0193%        0.3000
5       0.0011   0.0125%        0.7678

Table 3: Evaluation metrics for the walk-forward RNN models to 4 decimal places.

Figure 25: The first two RNN walk-forward models (with F=1[min]).

Figure 26: The last four RNN walk-forward models.

Figure 27: First three hours of the forecasts from the RNN walk-forward (with F=1[min]) models 0 and 4.

In Figure 28, the results of the walk-forward RNN models are shown, where the forecasting steps hyperparameter (F, as seen in Section 5.4) is set to 4 [hr]. In general, the models shown do not follow the pattern of the real data fluctuations closely. Furthermore, the predictions fluctuate more frequently than the interpolated test data. This is reflected in the R-squared scores displayed in Table 4. In particular, the predictions seem to be unable to follow the largest peaks in the pH, and instead remain relatively flat. For example, model 1 does not predict the peak exceeding a pH of 9 at 10:00 on the 10th of November 2018. On the other hand, the errors between the prediction points and the real data points are generally relatively low. The small MSE values between the prediction and test data are accentuated by looking at the range of outputs on the y-axes in Figure 28. The corresponding evaluation metric results can be viewed in Table 4.

Model   MSE      Relative MSE   R-squared
0       0.0026   0.0295%        -2.3583
1       0.0017   0.0193%        -801.8891
2       0.0032   0.0363%        -4.2461
3       0.0011   0.0125%        -0.5360
4       0.0023   0.0261%        -1.6060
5       0.0081   0.0919%        -1.5569
6       0.0008   0.0091%        -0.5401
7       0.0015   0.0170%        -0.7771
8       0.0022   0.0250%        -2.6563

Table 4: Evaluation metrics for the forecasting walk-forward RNN models to four decimal places. The forecast horizon is equal to 4 hours.

Figure 28: Walk-forward forecasting RNN model predictions, where F=4 [hr].

7.3. XGBoost Results

7.3.1. XGBoost Train-Validation-Test Model
The trained train-validation-test XGBoost model is displayed in Figure 29 and has an MSE value equal to 0.0057 (4 d.p.) and an R-squared value of -0.1871 (4 d.p.). In addition, the relative MSE is equal to 0.0647%, which is calculated by dividing the MSE by the mean pH of the dataset (≈ 8.8097) and multiplying by a hundred. The predicted data as a whole does not follow the test data closely.

Figure 30 depicts the feature importance of the XGBoost train-validation-test model. The features for the present time step, pellet-bed pressure, draining and seeding, have the greatest impact on the resulting model. The most prominent past time steps from the most influential features appear to be one, two, 58 and 59. For example, 'bottom_press_59' represents the pressure at the bottom of the softening reactor 59 minutes ago and scores as the fourth most influential feature.

Figure 29: XGBoost train-validation-test split model prediction.

Figure 30: The feature importance of the train-validation-test XGBoost model. The ’pellet_bed_press’ refers to the pellet-bed pressure difference feature for the present time step. An appended underscore followed by a number, denotes the past time step in minutes for that particular feature.

7.3.2. XGBoost Walk-Forward Models
The XGBoost walk-forward evaluation metric results are displayed in Table 5. All the R-squared values are negative, suggesting that the models do not fit the actual data closely. On the other hand, the MSE values are relatively low with respect to the pH scale, indicating a low error. This is confirmed by observing the relative MSE metric. The remaining walk-forward XGBoost results can be found in Appendix E.

Model   MSE      Relative MSE   R-squared
0       0.0176   0.1998%        -3.8138
1       0.0040   0.0454%        -0.2805
2       0.0049   0.0556%        -0.2389
3       0.0086   0.0976%        -0.5057
4       0.0276   0.3133%        -4.7629
5       0.0127   0.1442%        -7.4727
6       0.0043   0.0488%        -0.3530

Table 5: Evaluation metrics for the walk-forward XGBoost models to 4 decimal places.

Chapter 8: Discussion and Conclusions
8.1. Discussion
Comparing all the models displayed in Chapter 7 based on their evaluation metrics and output plots, it is evident that the Recurrent Neural Network (RNN) train-validation-test model (with F=1[min]) performs the best, since its R-squared value is the largest (positive value) and its MSE value is the lowest (joint lowest with model 0 of the RNN walk-forward method). This could be the result of the model learning from a larger train dataset than the models produced using the walk-forward data split. Furthermore, looking at Figure 24, it is clear that the predictions closely follow the sharp peaks. This implies that the model can accurately model the behaviour of the pH when a draining action takes place. Conversely, a model with a forecasting horizon length of one minute is extremely limiting in practice, since a process engineer at a Water Treatment Plant (WTP) has only one minute to respond to changes based on the prediction. In addition, observing the walk-forward RNN and train-validation-test RNN results (with F=1[min]) more closely (see Figure 23 and Figure 27), it can be noted that the prediction lags slightly behind the real data points. This may be caused by the model relying heavily on the present pH value to predict the next time step. Thus, the lag implies that the model is unable to predict the fluctuations at all, since it responds too late.

Looking at Figure 28 and the R-squared results in Table 4, it is clear that the walk-forward forecasting RNN models (with F=4[hr]) are unable to fit the real data points closely. This could be due to the real data being interpolated, thereby giving it a flat profile, whereas the predictions exhibit frequent fluctuations. Conversely, the relative MSE remains under 0.1% for all of the walk-forward models produced (see Table 4), indicating that the relative error between the predictions and the real data points is low. Increasing the forecasting horizon from a length of four hours to a length of 24 hours gives in general slightly worse MSE results (see Table 8) and similarly negative R-squared values. These results are confirmed by looking at Figure 41 on page 62. Therefore, an increase of the forecast horizon from four to 24 hours does not seem to drastically worsen the results.

In Table 4, the extremely negative R-squared value for the RNN walk-forward model 1 can be explained by referring to Subsection 4.5.1. Extremely negative R-squared values are caused by SSres values that are significantly larger than SStot values, i.e. the actual prediction performs significantly worse than the mean used as a prediction. This is reinforced by inspecting the prediction displayed in Figure 28.

The poorer results of the XGBoost models could be attributed to the model not incorporating past pH data as a feature, or to the model having a lookback of only 60 minutes (compared to an RNN lookback of 180 minutes). In particular, the R-squared values for the train-validation-test and walk-forward models are all negative (see Section 7.3), implying that the prediction values do not fit the real data points well. Moreover, although the relative MSE values for all the XGBoost models are low (less than 1%), they are in general not lower than the values derived from the RNN models.

8.2. Conclusions
In this section, the research questions posed in Section 1.4 are answered, making use of the results presented in Chapter 7.

8.2.1. Is it possible to develop a model of the softening treatment process of a WTP using Machine Learning (ML)?
The results featured in Chapter 7 suggest that an accurate model of the softening treatment process of a WTP can be constructed using the RNN ML algorithm (with F=1[min]), based on the evaluation metrics.

Moreover, all of the model predictions are considered feasible for the water softening reactor, since they stay within the maximum pH (9.4) and minimum pH (7.6) of the dataset used to train the model. On the other hand, the model is considerably limited, since it is only capable of predicting one minute into the future and is heavily influenced by the pH value at the present time step. Thus, the model does not display predictive qualities, since the fluctuations lag behind the actual data points. In addition, when the forecast horizon length is further increased (e.g. F=10[min] or 10 time steps), the resulting model R-squared values are substantially worse, due to more fluctuations in the prediction than in the actual data, although the MSE values remain relatively low. This may be caused by the actual data being interpolated according to the method presented in Subsection 6.3.1.

8.2.2. Is the data provided for this research sufficient to develop a model using ML?
It can be concluded that the data provided is insufficient to generate a well-performing RNN model. In spite of the RNN models (with F=1[min]) producing excellent evaluation metric results, the models are heavily reliant on the present pH value. Thus, the model outputs a lagged version of the actual data. Increasing the horizon length beyond one time step generally leads to a worse R-squared value. For instance, the four hour walk-forward RNN forecasts have negative R-squared values (see Table 4) and generally do not fit the trends of the actual data (see Figure 28). This suggests that the interpolated data does not provide the amount of detail required to generate a model that fits the actual data closely for a large forecast horizon. On the other hand, the MSE values remain relatively low with respect to the pH scale.

In general, the XGBoost models have low MSE values but negative R-squared values, thereby indicating that the predictions do not fit the test data well. This could be due to a lack of data to learn from, interpolation of the actual data, or the absence of past pH values as a feature in the model.

The data does not show the behaviour for instances when no draining occurs during the day; therefore the model is conditioned to expect a daily draining action. For instance, regardless of the input, the model might output a peak in the pH, which would correspond to a draining action, even if no draining occurs on that particular day. In contrast, when there is no draining action, you would expect the pH to continue to increase, since the pellets in the reactor are saturated, causing a decrease in the crystallisation rate in the softening reactor.

The models are generated based on feature data contained within the maximum and minimum boundaries seen in Table 1. Therefore, the model may not be able to predict effectively for feature data outside these boundaries. This could be limiting when extreme feature values (outside the boundaries) appear in the system, since the model might not be able to predict a realistic value for the pH in the softening reactor.

8.2.3. Can the seeding and draining control be improved by developing a control strategy using the produced ML model?
Based on the RNN model's predictive capabilities, it may be possible to predict when the pH is going to increase rapidly and therefore carry out a draining action in advance. The XGBoost model is incapable of forecasting, thus this control method can only be implemented using an RNN model. Furthermore, it is advisable to set the forecasting horizon to a couple of hours or greater (e.g. 4 hours), so that the peak in pH can be anticipated in time and a draining action carried out in advance. Implementing this control method would have a stabilising effect on the pH in the softening treatment, i.e. the method should remove sharp peaks in the pH. In addition, the control implementation should lead to a more efficient

process, since the pellets are drained before the crystallisation rate declines significantly, thereby increasing the average crystallisation rate within the softening reactor.

The RNN walk-forward model predictions with a prediction horizon equal to four hours (results seen in Figure 28) do not fit the test data particularly well. Moreover, the predictions do not appear to capture the sharp peaks in the pH value. Therefore, the model is regarded as unsatisfactory for the suggested control strategy implementation.

Chapter 9: Recommendations

In this chapter, potential further work is outlined which could improve the performance of the Machine Learning (ML) model (based on the evaluation metrics). In addition, suggestions for implementing the draining and seeding control strategy (described in Subsection 8.2.3) are explored.

9.1. Machine Learning (ML) Recommendations
It might yield better results to split the data into subsets that span from slightly after a draining action until slightly before the next draining action. These subsets will differ in size, because the draining actions do not occur at a set time every day. The ML algorithms should be adapted to learn from subsets of data of differing lengths. The implementation using subsets could have the effect of removing the impact of the daily draining control action on the resulting ML model, since the dataset would not contain data around the time of draining.

Using a softening reactor experiment setup allows data to be collected for periods when no draining action occurs, or for irregular draining times. For instance, a draining action could have a probability of 75% of occurring on a given day, at a random time within WTP operating hours. A draining action occurring during operating hours ensures that the draining action is supervised: if there is a defect during or after the draining action, the operators are able to oversee and intervene if necessary. Experimentation could be carried out in a reserve softening reactor. Collecting data with irregular and infrequent draining prevents the model from being conditioned to expect a draining action and allows the ML model to learn from data with more information, i.e. data that includes the behaviour when a draining action does not take place on a certain day. Furthermore, using the experiment setup gives the opportunity to collect data for extreme influents. For example, an extreme influent could exhibit extremely low or high pH values, where the influent is the water entering the softening reactor.

Due to computation times, it is not feasible to update the model every time step (one minute), since it takes longer than one minute to train the model. However, it is possible to approximate the computation time of training an XGBoost or RNN model (depending on the algorithm selected) and therefore decide upon a regular model update schedule that does not conflict with the computation time. The computation time can be reduced by using a GPU to run the ML implementation. Regular updating of the model allows the new model to incorporate more recent data trends, since there are often changes in the operations carried out at a WTP that lead to different trends in the data. For instance, the water collection point (source) could be switched to another.

It is recommended to implement the best performing model from this research on another softening reactor, using data with the same time interval. The model will work only if the features and the target are kept the same, with the same corresponding units as the dataset used to train the ML model. Another reactor from the same WTP, where the softening reactor has the same specifications and the influent comes from the same collection point (source), should perform accurately. If the model does indeed perform well, then it is possible to implement the model at other WTPs, thereby testing the generality of the model. If this is unsuccessful, another ML model can be trained with the appropriate data for the respective softening reactor. Each WTP has a unique set of sensors installed, resulting in a unique set of variables featuring in the dataset for each WTP. Therefore, the features and target(s) should be modified accordingly.

A hyperparameter grid search could lead to a model of greater accuracy, where multiple possible hyperparameter permutations are tested and compared using the ML evaluation metrics. Naturally,

the more permutations that are tested, the higher the computation time. This was not applied in this research, since it was deemed too time-consuming to program the grid search in Python given the research deadline.
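If such a grid search were implemented, a compact version could look like the sketch below, here for the XGBoost model using scikit-learn's GridSearchCV with a time-ordered cross-validation; the parameter grid, data and scoring choice are hypothetical and were not used in this research.

import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
train_features = rng.random((2000, 6))          # placeholder for the real feature matrix
train_target = 8.8 + 0.3 * train_features[:, 0] + 0.05 * rng.standard_normal(2000)

param_grid = {
    "n_estimators": [100, 300],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.05, 0.1],
}

search = GridSearchCV(
    estimator=XGBRegressor(),
    param_grid=param_grid,
    scoring="neg_mean_squared_error",     # compare permutations on the MSE
    cv=TimeSeriesSplit(n_splits=3),       # respect the time ordering of the data
)
search.fit(train_features, train_target)
print(search.best_params_, -search.best_score_)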

Multiple output variables (targets) may be desired from the ML model. For example, a control method within the WTP could require multiple inputs (outputs from the ML model). It is possible to train a multivariate output model by adapting the Python implementation.

It may be possible to obtain data that is not interpolated, thus providing more information about the dynamics of the system. WTPs often have access to systems that take live readings from certain sensors. Training on data that is not interpolated should lead to a more accurate ML model, since it learns from more ‘informative’ data.

ML could possibly produce a better performing softening treatment model if the algorithm uses a larger train dataset. More data can be obtained by retrieving data further into the past from the same softening reactor, or by making the time interval of the extracted data smaller. For example, some WTPs have access to three years (backtracking from the present time) of stored data. Furthermore, a better performing model could result from data that contains fewer corrupted values, because less data would be removed (or interpolated) and the resulting dataset would have greater continuity with respect to the data time steps. Data with fewer corrupted values could be obtained from a different softening reactor.

A sensor measuring hardness could be installed in the softening reactor. Subsequently, these readings could be used as a feature for training the ML model. This would probably yield a more accurate ML model, due to the algorithm having a larger dataset to learn from. Conversely, after the installation of the sensor, it takes time to accumulate data readings, thus initially the feature would not provide much information. Moreover, a sensor can be costly and the WTP would probably be reluctant to pay for a new sensor. Furthermore, there are often many softening reactors in a WTP, thus many sensors would need to be purchased to standardise all the softening reactors in the whole WTP.

9.2. Draining and Seeding Control Recommendations
A softening reactor experiment setup offers the chance to implement the control strategy proposed in Subsection 8.2.3. The running costs can then be compared for the following scenarios: the softening process with the implementation of the proposed control strategy, and the softening process with the original control strategy. Note that this control method can only be acceptably implemented if a forecasting model with a forecast horizon of at least two hours and satisfactory evaluation metric values can be generated.

It is hard to implement MPC (Model Predictive Control) to control the draining and seeding actions within a softening reactor, since an MPC cost function cannot easily be formulated. This is because the pellet softening reactor requires a draining action each day within working hours; thus the number of draining and seeding control actions cannot be reduced without potential consequences. For instance, if a draining action is skipped on a certain day, then the reactor could become clogged, bringing the process to a halt and most probably costing more than simply draining the pellets.

Bibliography

[1] J. C. Van Dijk, J. Q. J. C. Verberk and P.J. De Moel (2018). Drinking Water: Principles and Practices. World Scientific Publishing Co Pte Ltd.

[2] A. W. C. van der Helm, O. J. I. Kramer, J. F. M. Hooft and P. J. Moel (2015). Plant wide chemical wa- ter stability modeling with PHREEQC for drinking water treatment. International Water Conference: New Developments in IT & Water, Rotterdam, The Netherlands.

[3] L. Rietveld (2015). Softening. Drinking water treatment, Delft University of Technology.

[4] K.M. van Schagen (2009). Model-Based Control of Drinking-water Treatment Plants. Drinking water treatment, Delft University of Technology.

[5] E. Chiou (2018). An improved model of CaCO3 crystallization. Drinking water treatment, Delft Uni- versity of Technology.

[6] A. Géron (2017). Hands-On Machine Learning with Scikit-Learn and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems. O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

[7] C. M. Bishop (2006). Pattern Recognition and Machine Learning. Springer Science+Business Media, LLC, 233 Spring Street, New York, NY 10013, USA.

[8] M. Nielsen (2013). Neural Networks and Deep Learning. Ebook available (06/09/2019) via: https://github.com/antonvladyka/neuralnetworksanddeeplearning.com.pdf.

[9] A. Sadad (2019). Reusable Machine Learning Framework for Predicting Future System Performance: A Comparison Study of Validation Strategy on WWTP Time Series Data. Department of Electrical Engi- neering, Mathematics and Computer Science (EEMCS), University of Twente.

[10] L. van den Hout (2013). Modellering van de hardheidsreductie. Hogeschool Utrecht, Institute for Life Sciences and Chemistry and Waternet, Sector Drinkwater, Afdeling Productie Team Waterkwaliteit en Procesondersteuning.

[11] M. A. Jobse (2013). Modellering van de fluïdisatie bij korrelreactoren. Thesis, Hogeschool Utrecht, Institute for Life Science and Chemistry.

[12] J. Brownlee (2016). Overfitting and Underfitting With Machine Learning Algorithms. Article is available (12/09/2019) via the following link: https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/.

[13] G. P. Pulipaka (2016). An essential guide to classification and regression trees in R Language. Article is available (12/09/2019) via the following link: https://medium.com/@gp_pulipaka/an-essential-guide-to-classification-and-regression-trees-in-r-language-4ced657d176b.

[14] T. Chen and C. Guestrin (2016). XGBoost: A Scalable Tree Boosting System. University of Washington.

[15] R. S. Brid (2018). Introduction to Decision Trees. GreyAtom School. Article is available (16/09/2019) via the following link: https://medium.com/greyatom/decision-trees-a-simple-way-to-visualize-a-decision-dc506a403aeb.

[16] P. Krause, D. P. Boyle and F. Bäse (2005). Comparison of different efficiency criteria for hydrological model assessment. Adv. Geosci. 5, 89-97. University of Washington.

[17] X. Yuan (2009). Model Validation and New Water Control Strategies in Drinking Water Treatment Plant Wim Mensink. Department of Water Management, Sanitary Engineering Section, Faculty of Civil Engineering and Geosciences, Delft University of Technology.

[18] T. Smith (2019). Autocorrelation. Article is available (30/09/2019) via the following link: https://www.investopedia.com/terms/a/autocorrelation.asp. Investopedia.

[19] M. A. Jobse (2013). Modellering van de fluïdisatie bij korrelreactoren. Thesis, Hogeschool Utrecht, Institute for Life Science and Chemistry.

[20] E. Dizdar, A. Dizdar, C. Dizdar and A. Downey (2014). PURE-H2O Implementation of ECVET for Qualification Design in Drinking Water Treatment Plants and Sanitation for Pure Drinkable Water. Orkon International Engineering Training Consulting Co. Inc.

[21] S. Hochreiter and J. Schmidhuber (1997). Long Short-Term Memory, Neural Computation vol. 9: 1735-1780. Faculty of Computer Science, Technical University of Munich and IDSIA.

[22] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever and R. Salakhutdinov (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting, Journal of Machine Learning Research vol. 15: 1929-1958. Department of Computer Science, University of Toronto.

[23] Dutch Drinking Water Statistics 2017: From source to tap. Vewin, Association of Dutch water com- panies.

[24] A. W. C. van der Helm and L. C. Rietveld (2002). Modelling of drinking water treatment processes within the Stimela environment, Water Science & Technology Water Supply. DHV Water BV and Delft University of Technology.

[25] M. Galarnyk (2018). Understanding Boxplots. Article is available (18/10/2019) via the following link: https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51.

[26] A. Sivalingam (2019). What is machine learning ?. Article is available (29/11/2019) via the following link: https://medium.com/swlh/what-is-machine-learning-9b569ff7858a.

[27] S. Sharma (2017). Activation Functions in Neural Networks. Article is available (03/12/2019) via the following link: https://towardsdatascience.com/activation-functions-neural-networks-1cbd9f8d91d6.

Appendix A: Drinking WTP Example and Softening Process Background Information
A.1. Example WTP
In Figure 31, a diagram of a drinking water treatment process is presented. The water is pre-treated at Loenderveen before reaching the Water Treatment Plant (WTP). The raw water is predominantly seepage water from the Bethune polder, sometimes blended with Amsterdam-Rhine Canal water. The raw water is coagulated with ferric chloride (FeCl3) and flocs are removed in horizontal settling tanks, resulting in the removal of phosphate, suspended solids, heavy metals and Natural Organic Matter (NOM) [4]. Afterwards, sedimentation, nitrification of ammonium and biodegradation take place in a lake-water reservoir of 130 hectares. The retention time of the reservoir is approximately 100 days. The remaining ammonium, suspended solids and algae are removed using rapid sand filtration. The resulting water is then transported more than 10 kilometers to the Weesperkarspel WTP.

At the drinking WTP Weesperkarspel, the first process applied is ozonation, for disinfection and oxidation of micropollutants and NOM, which increases the biodegradability of the organic matter. Subsequently, pellet softening reactors are used to reduce the hardness of the water, and then biological activated carbon (BAC) filters are used to remove organic matter and organic micropollutants. Finally, the water passes through a slow sand filter for extra nutrient and suspended solid removal. The water is then stored in a clear water reservoir ready for distribution.

Figure 31: Diagram of the pre-treatment processes at Loenderveen and drinking water treatment at Weesperkarspel Water Treatment Plant (WTP) of Waternet [2]

A.2. Water Flux
The distribution of the particles in the reactor is dependent on the flow rate. A higher flow rate corresponds to a greater water flux in the reactor. We refer to the speed of the water travelling through the pellet-bed as the water flux.

Figure 32 shows the impact of various water fluxes. At the lowest speed (scenario 1, on the left), a compact layer of pellets accumulates on the circular plate in the reactor, because the speed is not high enough to push the pellets into suspension. This scenario can cause clogging in the reactor. Scenario 2 in Figure 32 shows a situation where the water flux is large enough to displace the pellets, leading to the suspension of the pellets. Scenario 3 displays a greater distribution of pellets in the reactor, in reaction to an increase in the water flux. Finally, in scenario 4 the water flux is too high and leads to a large proportion of the pellets being flushed out of the reactor with the effluent.

Figure 32: Water flux scenarios in the reactor. This figure is taken from [19].

A.3. Water Hardness Chemistry
The hardness of the water is the total amount of dissolved metal ions in the water. In practice, the hardness is determined by measuring the concentration of calcium and magnesium ions in the water, since they are generally the most abundant in the influent.

Hard water arises in places where, among other things, calcium and magnesium compounds dissolve under the influence of the carbon dioxide present in the water [10]:

CaCO₃(s) + CO₂(aq) + H₂O ⇌ Ca²⁺(aq) + 2HCO₃⁻(aq)
Mg(OH)₂(s) + 2CO₂(aq) ⇌ Mg²⁺(aq) + 2HCO₃⁻(aq),

where (aq) indicates that the substance is dissolved in water and (s) signifies that the given substance is a solid. Conversely, when CO2 is removed from the water, for example by heating, lime will precipitate. The hardness that can be removed by boiling is called temporary hardness.

Classification        [mmol/L]
Very soft water       < 0.7
Soft water            0.7 - 1.4
Average water         1.4 - 2.1
Quite hard water      2.1 - 3.2
Hard water            > 3.2

Table 6: Hardness classification in the Netherlands [11].

The classification of hardness levels in the Netherlands can be observed in Table 6. Note that the total hardness is measured predominantly in millimoles per liter.

A.4. Bypass
A pellet softening reactor is usually coupled with a bypass pipe. This bypass redirects a proportion of the raw water and reconnects downstream from the pellet softening reactor (and before the acid dosing), causing the softened water to mix with the raw water. The bypass contributes to preventing crystallisation in the following stages of the WTP, i.e. the bypass lowers the pH of the effluent. Furthermore, a bypass reduces the costs of softening, since the pellet softening reactor softens less water. The bypass mechanism can be viewed in Figure 1. Note that, for the softening reactor analysed in this paper, the bypass was set to zero percent (turned off).

A.5. Calcium Carbonate Crystallisation Reaction
Calcium carbonate is one of the most broadly studied minerals and is investigated in studies involving geological, chemical and biological processes. The calcium carbonate crystallisation reaction is described by the following chemical equation:

$$\mathrm{Ca^{2+} + CO_3^{2-} \rightleftharpoons CaCO_3(s)}.$$

This reaction often takes place on the softening pellets in a softening reactor of a WTP.

Appendix B: Data Analysis
B.1. Pearson's Correlation Coefficient Matrix
The Pearson's correlation coefficient matrix of the relevant variables of the drinking Water Treatment Plant (WTP) softening reactor used in this research is displayed in Table 7. The second highest correlation coefficient magnitude is between the bed height and the caustic soda flow, since caustic soda is dosed to facilitate crystallisation and crystallisation leads to thicker pellets in the reactor. Thicker pellets weigh more and sink lower in the reactor, which results in a higher bed-height reading.
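As a small illustration (not part of the thesis code), a correlation matrix of this kind can be computed directly from the measurement data with pandas. The data below is randomly generated and the column names are placeholders mirroring the variables of Table 7, not the actual reactor dataset.

import numpy as np
import pandas as pd

# Illustrative stand-in for the reactor measurements (random numbers here);
# the column names are placeholders mirroring Table 7.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(1000, 8)),
                  columns=["Flow", "Caustic Soda Flow", "Bottom Pressure",
                           "Pellet-bed Pressure", "pH", "Bed Height",
                           "Seeding", "Draining"])

# Pearson's correlation coefficient matrix, rounded to three decimals as in Table 7.
corr_matrix = df.corr(method="pearson").round(3)
print(corr_matrix)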

B.2. Box Plot
A boxplot is a visual representation of how the data is skewed. The graphic consists of (from left to right): the "minimum", the first quartile (Q1), the median, the third quartile (Q3) and the "maximum". Outliers are also shown in a boxplot. An example boxplot can be viewed in Figure 33.

Figure 33: Boxplot. This image is taken from [25].
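As a self-contained illustration (not taken from the thesis code), the quartiles and a boxplot of a one-dimensional series can be produced with numpy and matplotlib; the pH-like series below is generated at random.

import numpy as np
import matplotlib.pyplot as plt

# Example data; in this research the series would be one of the reactor features.
rng = np.random.default_rng(0)
data = rng.normal(loc=8.5, scale=0.1, size=1000)  # a pH-like series, for illustration

q1, median, q3 = np.percentile(data, [25, 50, 75])
print(f"Q1={q1:.3f}, median={median:.3f}, Q3={q3:.3f}")

# Whiskers extend to the last points within 1.5*IQR by default;
# points beyond them are drawn as outliers.
plt.boxplot(data, vert=False)
plt.show()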

Variable              Flow    Caustic Soda Flow  Bottom Pressure  Pellet-bed Pressure  pH      Bed Height  Seeding  Draining
Flow                  1       0.983              0.745            -0.119               -0.245  0.730       -0.104   -0.108
Caustic Soda Flow     0.983   1                  0.724            -0.116               -0.220  0.737       -0.131   -0.134
Bottom Pressure       0.745   0.724              1                -0.094               -0.488  0.530       0.292    0.287
Pellet-bed Pressure   -0.119  -0.116             -0.094           1                    0.003   -0.013      0.042    0.041
pH                    -0.245  -0.220             -0.488           0.003                1       -0.066      -0.395   -0.392
Bed Height            0.730   0.737              0.530            -0.013               -0.066  1           -0.167   -0.168
Seeding               -0.104  -0.131             0.292            0.042                -0.395  -0.167      1        1.000
Draining              -0.108  -0.134             0.287            0.041                -0.392  -0.168      1.000    1

Table 7: Pearson's correlation coefficient matrix approximated to three decimal places.

Figure 34: Hourly mean of data features and the sum of the data features, seeding and draining actions, per hour.

Appendix C: Machine Learning
C.1. Python vs Matlab®: Machine Learning and Control Theory Implementation
Before carrying out the analysis on the dataset used in this research, a software package had to be chosen to conduct the analysis. The choice was narrowed down to two candidates: Matlab® and Python. Eventually, Python was selected on the basis of the following comparisons between the two software packages:

• It is possible to obtain free extra RAM (memory) by running Python in Google Colaboratory. Using Matlab®, you are restricted to the graphics card in your PC or your CPU, which can be considerably slower.

• There are more help forums for machine learning in Python. For instance, Kaggle has many examples of machine learning algorithms implemented in Python. Matlab®, on the other hand, has less information available via online forums, because it is less widely used.

• Matlab® does not react as quickly to new machine learning developments as Python, and thus the functionality of its machine learning toolboxes can be limiting. For instance, there is no built-in Matlab® function for the XGBoost algorithm.

• Colaboratory (provided by Google) is free to use, although you need to upload your dataset to Google Drive in order to use it. Some businesses may find this problematic due to data confidentiality. Conversely, Matlab® requires a paid subscription licence.

• Matlab® has a useful control theory toolbox called Simulink®, which allows you to effectively simulate and control theoretical systems using visual block diagrams. Although Python recently (in 2018) gained a package called SimuPy that is relatively similar to Simulink®, Simulink® is generally still regarded as the best option amongst Systems and Control engineers. However, for this research Python is considered satisfactory to carry out the control methods, since the methods are not particularly complicated.

C.2. RMSProp Optimisation
RMSProp optimisation is often chosen over the standard gradient descent optimisation method, since it takes a more direct route to the optimum [6]. Standard gradient descent begins by moving quickly down the steepest slope and then progresses slowly towards the bottom of the valley. RMSProp, on the other hand, prevents this by scaling down the step along the steepest slope and taking a more direct route to the optimum point (the bottom of the valley). The RMSProp optimisation algorithm can be described by the following two equations:

$$s^{(i)} \leftarrow \beta s^{(i-1)} + (1-\beta)\, \nabla_w C(w^{(i-1)}) \otimes \nabla_w C(w^{(i-1)}) \qquad (3)$$
$$w^{(i)} \leftarrow w^{(i-1)} - \eta\, \nabla_w C(w^{(i-1)}) \oslash \sqrt{s^{(i)} + \epsilon}, \qquad (4)$$

where $\beta$ represents the decay rate (often set to 0.9), $w$ the weight vector, $\otimes$ symbolises the element-wise multiplication operation, $\oslash$ the element-wise division operation, $\eta$ is the learning rate, $\epsilon$ is a smoothing term to avoid division by zero (often set to $10^{-10}$) and $i$ denotes the time step. In addition, $\nabla_w C(w^{(i)})$ is the partial derivative of $C$ with respect to $w$ for time step $i$.
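A minimal numpy sketch of equations (3) and (4) is given below; it is not the thesis code. The gradient function grad_C is supplied by the user, and the toy quadratic cost in the usage example is an assumption for illustration only.

import numpy as np

def rmsprop_step(w, s, grad_C, eta=0.001, beta=0.9, eps=1e-10):
    """One RMSProp update of the weight vector w.

    w      : current weight vector w^(i-1)
    s      : running average of squared gradients s^(i-1)
    grad_C : function returning the gradient of the cost at w
    """
    g = grad_C(w)
    s = beta * s + (1.0 - beta) * g * g      # equation (3), element-wise
    w = w - eta * g / np.sqrt(s + eps)       # equation (4), element-wise
    return w, s

# Usage on a toy quadratic cost C(w) = ||w||^2, whose gradient is 2w:
w = np.array([1.0, -2.0])
s = np.zeros_like(w)
for _ in range(500):
    w, s = rmsprop_step(w, s, lambda w: 2.0 * w, eta=0.01)
print(w)  # approaches the optimum at the origin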

C.3. Derivation of Backpropagation Equations
We denote the weights of the Neural Network (NN) as $w^l_{jk}$, using the notation proposed by M. Nielsen [8], where the indices indicate that the weight connects the k-th neuron in the (l-1)-th layer of the NN to the j-th neuron in the l-th layer [8]. Similar notation is employed for the bias $b^l_j$ of the j-th neuron in the l-th layer. Likewise, the activation of the j-th neuron in the l-th layer is denoted $a^l_j$ and is described as follows:

$$a^l_j = h\Big(\sum_k w^l_{jk} a^{l-1}_k + b^l_j\Big), \qquad (5)$$

where the sum is over all neurons $k$ in the (l-1)-th layer and $h$ represents the activation function, which acts as a hyperparameter.

For simplicity, equation (5) can be rewritten in matrix form as follows:

$$a^l = h\big(w^l a^{l-1} + b^l\big).$$

Let $z^l \equiv w^l a^{l-1} + b^l$; we refer to $z^l$ as the weighted input to the neurons in layer $l$. In order to update the weights during training, we need to define a cost function $C$. A popular choice is the Mean Squared Error (MSE) cost function:

$$C = \frac{1}{2N}\sum_x \|y(x) - a^L(x)\|^2,$$

where $N$ is the total number of training samples, the sum is over the training samples $x$, $y(x)$ signifies the target output, $L$ is the number of layers in the network and $a^L(x)$ represents the vector of activation outputs from the network, which depends on the input $x$. The aim is to minimise the cost function output (error) when evaluating the model using the test data.

We have assumed that $C = \frac{1}{N}\sum_x C_x$, with cost function $C_x$ for an individual training sample. Therefore, for the MSE cost function, $C_x = \frac{1}{2}\|y(x) - a^L(x)\|^2$. Let us now suppose that the training sample is fixed and drop the $x$ subscript, writing $C_x$ as $C$.

Backpropagation, in simple terms, is about how changing the weights and biases in a network changes the cost function. Mathematically, this means that the partial derivatives $\frac{\partial C}{\partial w^l_{jk}}$ and $\frac{\partial C}{\partial b^l_j}$ need to be computed. M. Nielsen [8] describes a four-step technique for calculating these derivatives, which is explained in the remainder of this subsection.

Firstly, we define the error $\delta^l_j$ of neuron $j$ in layer $l$ by

$$\delta^l_j \equiv \frac{\partial C}{\partial z^l_j}, \qquad (6)$$

where $z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$. Using the chain rule, equation (6) for the output layer can be written as

$$\delta^L_j = \sum_k \frac{\partial C}{\partial a^L_k} \frac{\partial a^L_k}{\partial z^L_j}, \qquad (7)$$

where the sum is taken over all $k$ neurons in the output layer. The output activation $a^L_k$ of the k-th neuron depends on the weighted input $z^L_j$ of the j-th neuron only when $k = j$. Thus, the terms in equation (7) vanish when $k \neq j$, which results in the following simplified expression:

$$\delta^L_j = \frac{\partial C}{\partial a^L_j} \frac{\partial a^L_j}{\partial z^L_j}. \qquad (8)$$

Now, using equation (5), the second term in the product can be written as $h'(z^L_j)$. This gives

$$\delta^L_j = \frac{\partial C}{\partial a^L_j}\, h'(z^L_j). \qquad (9)$$

For our MSE cost function $C = \frac{1}{2}\sum_j (y_j - a^L_j)^2$, the partial derivative is $\frac{\partial C}{\partial a^L_j} = (a^L_j - y_j)$. As before, we denote the vector of errors associated with layer $l$ as $\delta^l$. Writing equation (9) in matrix form using the MSE cost function gives

$$\delta^L = (a^L - y) \odot h'(z^L), \qquad (10)$$

where $\odot$ symbolises the element-wise (Hadamard) product.

Secondly, $\delta^l_j = \frac{\partial C}{\partial z^l_j}$ is rewritten in terms of $\delta^{l+1}_k = \frac{\partial C}{\partial z^{l+1}_k}$. This is achieved using the chain rule,

$$\delta^l_j = \frac{\partial C}{\partial z^l_j} = \sum_k \frac{\partial C}{\partial z^{l+1}_k} \frac{\partial z^{l+1}_k}{\partial z^l_j} = \sum_k \frac{\partial z^{l+1}_k}{\partial z^l_j}\, \delta^{l+1}_k, \qquad (11)$$

where the terms are interchanged in the last step. The first factor in the resulting sum can be manipulated further as follows:

$$z^{l+1}_k = \sum_j w^{l+1}_{kj} a^l_j + b^{l+1}_k = \sum_j w^{l+1}_{kj} h(z^l_j) + b^{l+1}_k.$$

Taking the partial derivative leads to

$$\frac{\partial z^{l+1}_k}{\partial z^l_j} = w^{l+1}_{kj}\, h'(z^l_j).$$

Substituting back into equation (11) yields the vector equation for the error $\delta^l$ in terms of the error in the next layer, $\delta^{l+1}$:

$$\delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot h'(z^l). \qquad (12)$$

Here, $(w^{l+1})^T$ is the transpose of the weight matrix $w^{l+1}$ for the (l+1)-th layer. Combining the two equations (10) and (12), it is possible to determine the error $\delta^l$ for any layer in the network, i.e. beginning with $\delta^L$ and subsequently calculating $\delta^{L-1}$, $\delta^{L-2}$, etc., thus backpropagating through the network.

Thirdly, $\frac{\partial C}{\partial b^l_j} = \delta^l_j$ is derived using, once again, the chain rule:

$$\frac{\partial C}{\partial b^l_j} = \sum_k \frac{\partial z^l_k}{\partial b^l_j} \frac{\partial C}{\partial z^l_k} = \frac{\partial C}{\partial z^l_j} = \delta^l_j, \qquad (13)$$

where $z^l_j = \sum_k w^l_{jk} a^{l-1}_k + b^l_j$. Using vector notation, equation (13) can be written as

$$\frac{\partial C}{\partial b^l} = \delta^l.$$

Lastly, the rate of change of the cost with respect to any weight in the network is calculated via the following formula:

$$\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j. \qquad (14)$$

The corresponding matrix form of equation (14) is:

$$\frac{\partial C}{\partial w^l} = \delta^l \big(a^{l-1}\big)^T, \qquad (15)$$

where the derivation proceeds in a similar way to the equations above, i.e. using the chain rule. An interesting consequence of equation (15) is that when $a^{l-1}$ is small ($a^{l-1} \approx 0$), the gradient term $\frac{\partial C}{\partial w^l}$ will also be small. This causes the corresponding weights to learn slowly, i.e. they do not change significantly during gradient descent.

C.4. The Backpropagation Algorithm
The backpropagation equations derived in the previous section make it possible to compute the gradient of the cost function, which is used in gradient descent (described in Subsection 5.2.2) for the training of a NN. The steps of the algorithm used to compute the gradient of the cost function are as follows (a small numerical sketch is given after the list):

1. Input $x$: Compute the activation $a^1$ for the input layer.

2. Feedforward: For each $l = 2, 3, \ldots, L$ calculate $z^l = w^l a^{l-1} + b^l$ and $a^l = h(z^l)$.

3. Output error $\delta^L$: Calculate the vector $\delta^L = \frac{\partial C}{\partial a^L} \odot h'(z^L)$.

4. Backpropagation of the error: For each $l = L-1, L-2, \ldots, 2$ calculate $\delta^l = \big((w^{l+1})^T \delta^{l+1}\big) \odot h'(z^l)$.

5. Output: Finally, the gradient of the cost function is obtained from $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$ and $\frac{\partial C}{\partial b^l_j} = \delta^l_j$.
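The following numpy sketch (not taken from the thesis code) illustrates the five steps for a small fully connected network, using the logistic activation of Section C.5 and the per-sample MSE cost; the layer sizes and data are arbitrary assumptions for illustration.

import numpy as np

def h(z):                       # logistic activation (Section C.5)
    return 1.0 / (1.0 + np.exp(-z))

def h_prime(z):
    s = h(z)
    return s * (1.0 - s)

def backprop(x, y, weights, biases):
    """Gradient of C_x = 0.5*||y - a^L||^2 for a single sample x."""
    # Steps 1-2: feedforward, storing weighted inputs z^l and activations a^l.
    a, zs, activations = x, [], [x]
    for w, b in zip(weights, biases):
        z = w @ a + b
        zs.append(z)
        a = h(z)
        activations.append(a)
    # Step 3: output error, equation (10).
    delta = (activations[-1] - y) * h_prime(zs[-1])
    grad_w = [None] * len(weights)
    grad_b = [None] * len(biases)
    grad_w[-1] = np.outer(delta, activations[-2])
    grad_b[-1] = delta
    # Step 4: backpropagate the error, equation (12).
    for l in range(2, len(weights) + 1):
        delta = (weights[-l + 1].T @ delta) * h_prime(zs[-l])
        grad_w[-l] = np.outer(delta, activations[-l - 1])
        grad_b[-l] = delta
    # Step 5: return the gradients (equations (13) and (14)).
    return grad_w, grad_b

# Toy 2-3-1 network with random parameters.
rng = np.random.default_rng(1)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=3), rng.normal(size=1)]
gw, gb = backprop(np.array([0.5, -0.2]), np.array([0.8]), weights, biases)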

C.5. Logistic Activation Function
The logistic activation function is as follows:

$$\sigma(z) = \frac{1}{1 + e^{-z}},$$

where $z$ symbolises the input.

Figure 35: Logistic Activation Function. This image is taken from [27].

Appendix D: eXtreme Gradient Boost (XGBoost)
D.1. Regularisation Learning Objective
For a given dataset with $n$ samples and $m$ features (inputs), $\mathcal{D} = \{(x_i, y_i)\}$ ($|\mathcal{D}| = n$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$), a tree ensemble model uses $K$ additive functions to predict the output. The prediction is described as follows:

$$\hat{y}_i = \phi(x_i) = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F},$$

where $\mathcal{F} = \{f(x) = w_{q(x)}\}$ ($q : \mathbb{R}^m \rightarrow T$, $w \in \mathbb{R}^T$) is the space of regression trees, $q$ denotes the structure of each tree, which maps a sample to the corresponding leaf index (and includes the tree decision rules), $T$ is the number of leaves in the tree and each $f_k$ corresponds to an independent tree structure $q$ and leaf weights $w$. Unlike decision (classification) trees, each regression tree contains a continuous score on each of its leaves, represented as $w_i$ for the i-th leaf. An example tree ensemble model is featured in Figure 36. In this example, the function $\phi$ for an example input $x_1$ is equal to:

$$\phi(x_1) = f_1(x_1) + f_2(x_1) = 0.3 + 0.9 = 1.2,$$

where $f_1$ corresponds to the first decision tree and $f_2$ to the second.

Figure 36: Example of a tree ensemble model, consisting of two decision trees. The red weights are the weights corresponding to a particular input $x_1$.

In order to learn the set of functions used in the model, we minimise the following regularised objective function:

$$\mathcal{L}(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k), \qquad (16)$$

where $\Omega(f) = \gamma T + \frac{1}{2}\lambda \|w\|^2$ and $l$ is a differentiable convex loss function (i.e. $\forall \hat{y}_i, y_i \in X$, $\forall t \in [0,1]$: $l(t\hat{y}_i + (1-t)y_i) \le t\, l(\hat{y}_i) + (1-t)\, l(y_i)$, where $X$ is a convex set in a real vector space, and a convex set is a set of points such that every point on a line segment joining any two points within the set lies within the set) that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$. The second term $\Omega$ penalises the complexity of the model (the regression tree functions) and acts as a regularisation term that smooths the final learnt weights to avoid overfitting. A desirable side-effect is that the regularised objective tends to select a model that employs simple functions, which predict the targets of new data more accurately. If the regularisation parameter $\lambda$ is set to zero, the objective reduces to the standard gradient tree boosting loss function.

D.2. Gradient Tree Boosting
The tree ensemble model described by equation (16) cannot be optimised using traditional methods in Euclidean space, since the model has functions as parameters. It is instead trained in an additive manner. Let $\hat{y}_i^{(t)}$ be the prediction of the i-th instance at the t-th iteration. We add $f_t$ and minimise the resulting objective function:

$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\big(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\big) + \Omega(f_t).$$

Therefore, the optimal $f_t$ that most improves the model (in the sense of equation (16)) is added to the prediction in the loss function. The function $f_t$ has $T$ leaves. To speed up the optimisation, the objective function can be approximated using a second-order Taylor expansion:

$$\mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big) + g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t),$$

where $g_i = \frac{\partial l(y_i, \hat{y}_i^{(t-1)})}{\partial \hat{y}_i^{(t-1)}}$ and $h_i = \frac{\partial^2 l(y_i, \hat{y}_i^{(t-1)})}{\partial (\hat{y}_i^{(t-1)})^2}$ are respectively the first and second order partial derivatives of the loss function with respect to $\hat{y}_i^{(t-1)}$. We proceed by removing the constant term $l(y_i, \hat{y}_i^{(t-1)})$, resulting in:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \Big[ g_i f_t(x_i) + \frac{1}{2} h_i f_t^2(x_i) \Big] + \Omega(f_t). \qquad (17)$$

We define $I_j = \{i \mid q(x_i) = j\}$ as the instance set of leaf $j$ and rewrite equation (17), using the fact that $f_t(x) = w_{q(x)}$ and by expanding $\Omega$, as follows:

$$\tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \Big[ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \Big] + \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2 = \sum_{j=1}^{T} \Big[ \Big(\sum_{i \in I_j} g_i\Big) w_j + \frac{1}{2}\Big(\sum_{i \in I_j} h_i + \lambda\Big) w_j^2 \Big] + \gamma T.$$

For a fixed structure $q(x)$, the optimal weight $w_j^*$ can be derived by taking the derivative with respect to the weight and rearranging, giving:

$$w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}.$$

The corresponding optimal objective value is then:

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2} \sum_{j=1}^{T} \frac{\big(\sum_{i \in I_j} g_i\big)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T. \qquad (18)$$

Subsequently, equation (18) can be used as a scoring function to measure the quality of a tree structure $q$. The score is similar to the impurity scores used to evaluate a decision tree (such as the impurity score calculated using the so-called entropy function), but applies to a wider range of objective functions. In most cases, it is computationally too expensive to evaluate every possible tree structure $q$. To avoid this, a greedy algorithm is implemented that, briefly, starts from a single leaf and iteratively adds branches to the tree [14].
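To make the formulas concrete, the sketch below evaluates the optimal leaf weights $w_j^*$ and the structure score of equation (18) for a given assignment of samples to leaves. The gradients, Hessians and leaf assignment are invented for illustration; this is not the xgboost library code.

import numpy as np

def leaf_weights_and_score(g, h, leaf_index, n_leaves, lam=1.0, gamma=0.0):
    """Optimal leaf weights w*_j and the structure score of equation (18).

    g, h       : first and second order gradients g_i, h_i per sample
    leaf_index : q(x_i), the leaf each sample falls into (0 .. n_leaves-1)
    """
    weights = np.zeros(n_leaves)
    score = 0.0
    for j in range(n_leaves):
        mask = leaf_index == j                 # instance set I_j
        G, H = g[mask].sum(), h[mask].sum()
        weights[j] = -G / (H + lam)            # optimal weight w*_j
        score += G**2 / (H + lam)
    score = -0.5 * score + gamma * n_leaves    # equation (18)
    return weights, score

# Toy example: 6 samples distributed over 2 leaves.
g = np.array([0.2, -0.1, 0.4, -0.3, 0.5, 0.1])
h = np.ones(6)
leaf_index = np.array([0, 0, 0, 1, 1, 1])
w_opt, structure_score = leaf_weights_and_score(g, h, leaf_index, n_leaves=2)
print(w_opt, structure_score)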

Appendix E: XGBoost and RNN Implementation
E.1. XGBoost Results
Figures 37 and 38 display the resulting model predictions using the walk-forward data split method. Clearly, the predictions do not follow the pattern of the real data closely. This is confirmed in Table 5, where the R-squared evaluation metric values are negative. It is also evident that the models are not able to predict the pH values around a draining action, since the predictions generally do not spike significantly in value. On the other hand, the MSE evaluation metric values are relatively low.

The feature importance for model 4 is depicted in Figure 40. Clearly, the feature 'pellet_bed_press', denoting the pellet-bed pressure at the present time step, is by far the most important, followed by the present-time-step seeding, bed height, bottom pressure and draining features. The most influential past time steps appear to be one, two, 58 and 59, where, for instance, a past time step of one indicates a measurement taken one minute ago.
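Feature-importance plots such as Figures 39 and 40 can be produced with the xgboost plotting utilities. The sketch below uses randomly generated stand-in data and illustrative feature names, so it only demonstrates the mechanism and does not reproduce the thesis results.

import numpy as np
import pandas as pd
import xgboost as xgb
import matplotlib.pyplot as plt

feature_names = ["flow", "caustic_soda_flow", "bottom_pressure",
                 "pellet_bed_press", "bed_height", "seeding", "draining"]

# Illustrative random data only; not the reactor dataset.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(500, 7)), columns=feature_names)
y = 0.8 * X["pellet_bed_press"] + rng.normal(scale=0.1, size=500)

model = xgb.XGBRegressor(n_estimators=100, max_depth=3)
model.fit(X, y)

# Importance ranked by gain; 'weight' (number of splits) is another common choice.
xgb.plot_importance(model, importance_type="gain")
plt.show()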

Figure 37: First four XGBoost walk-forward split model predictions on the same plot as the test data. The test data spans three days in total.

Figure 38: Last three XGBoost walk-forward model predictions on the same plot as the test data. The test data spans three days in total.

Figure 39: Feature importance of XGBoost models 0 to 3.

Figure 40: Feature importance of XGBoost models 4, 5 and 6.

E.2. RNN Results (F=24 [hr])

Model   MSE      Relative MSE   R-squared
0       0.0120   0.1362%        -6.2375
1       0.0100   0.1135%        -0.1860
2       0.0035   0.0397%        -0.4968
3       0.0020   0.0227%        -0.1708
4       0.0030   0.0341%        -0.0843
5       0.0019   0.0216%        -0.1693
6       0.0031   0.0352%        -0.2907
7       0.0026   0.0295%        -0.1187
8       0.0038   0.0431%        -0.7773

Table 8: Evaluation metrics for the forecasting walk-forward RNN models to four decimal places. The forecast horizon is equal to 24 hours.

Figure 41: Walk-forward forecasting RNN model predictions, where F=24 [hr].

E.3. Hyperparameter Selection

E.3.1. XGBoost Hyperparameter Selection
For the XGBoost algorithm implementation, the function XGBRegressor from the Python library xgboost was used. The following options are available within the function (according to the Python XGBoost documentation):

• colsample_bytree- Subsample ratio of columns when constructing each tree, i.e. the fraction of features randomly sampled for each tree [9].

• gamma- Minimum loss reduction required to make a further partition on a leaf node of the tree.

• learning_rate- XGBoost learning rate.

• max_depth- Maximum tree depth. Used to control overfitting.

• min_child_weight- Minimum sum of instance weight needed in a child. Used to combat overfit- ting, since higher values hinder a model from learning relations.

• n_estimators- Number of trees to fit.

• reg_alpha- L1 regularisation term on weights.

• reg_lambda- L2 regularisation term on weights.

• subsample- The fraction of observations to be randomly sampled for each tree.

• seed- The random number seed.

• tree_method- Specifies which tree method to use.

Option             Input
colsample_bytree   0.4
gamma              0
learning_rate      0.07
max_depth          3
min_child_weight   1.5
n_estimators       1000
reg_alpha          0.75
reg_lambda         0.45
subsample          0.6
seed               42
tree_method        'gpu_hist'

Table 9: XGBRegressor function hyperparameters used for implementation.
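For reference, the hyperparameters of Table 9 would be passed to XGBRegressor roughly as follows; X_train and y_train are placeholders, and 'gpu_hist' assumes a GPU-enabled build of xgboost (otherwise 'hist' can be substituted). This is a sketch, not the exact thesis code.

from xgboost import XGBRegressor

model = XGBRegressor(
    colsample_bytree=0.4,
    gamma=0,
    learning_rate=0.07,
    max_depth=3,
    min_child_weight=1.5,
    n_estimators=1000,
    reg_alpha=0.75,
    reg_lambda=0.45,
    subsample=0.6,
    seed=42,
    tree_method="gpu_hist",   # requires a GPU build; 'hist' runs on CPU
)

# model.fit(X_train, y_train)       # X_train, y_train: placeholder training data
# y_pred = model.predict(X_test)    # X_test: placeholder test data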

E.3.2. RNN Hyperparameter Selection
The RNN structure adopted is as shown in Figure 10, with seven features (inputs). The first LSTM layer (from left to right) comprises 64 neurons and the second 32 neurons. The dropout probability is set to 0.2. The number of neurons and the dropout probability are taken from A. Sadad's thesis for the Wastewater Treatment Plant (WWTP); varying these hyperparameters did not appear to improve the results. The regression output is the pH in the softening reactor, and the Mean Squared Error (MSE) function was chosen as the loss for model training (see the backpropagation algorithm in appendix C for more insight). Furthermore, the RMSProp optimiser (described in appendix C) was used during training.
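A minimal Keras sketch consistent with this description (two stacked LSTM layers of 64 and 32 neurons, dropout probability 0.2, a single pH regression output, MSE loss and the RMSProp optimiser) is given below. The lookback window length is an assumption for illustration, and the exact layer arrangement of Figure 10 may differ.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense

lookback, n_features = 60, 7   # assumed window length; 7 input features

model = Sequential([
    LSTM(64, return_sequences=True, input_shape=(lookback, n_features)),
    Dropout(0.2),
    LSTM(32),
    Dropout(0.2),
    Dense(1),                  # regression output: pH in the softening reactor
])
model.compile(optimizer="rmsprop", loss="mean_squared_error",
              metrics=["mean_squared_error"])
model.summary()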

E.4. Model Training Logs

E.4.1. RNN Walk-Forward Training (F=1 [min])
Three epochs and a batch size of 256 were used to train the models, using the Batch Gradient Descent method (explained in chapter 5).
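The index ranges printed in the log below follow a sliding-window pattern: a 60,000-sample training window followed by a 4,320-sample (three-day) test window, shifted by the test-window length, with the last window ending at the end of the dataset. The sketch below is inferred from these printed indices; the actual splitting code of the thesis may differ.

def walk_forward_windows(n_samples, n_models, train_size=60000, test_size=4320):
    """Index triples (train_start, train_end, test_end) for walk-forward
    validation, with the final test window ending at the end of the data."""
    for k in range(n_models):
        test_end = n_samples - (n_models - 1 - k) * test_size
        train_end = test_end - test_size
        yield train_end - train_size, train_end, test_end

# Reproduces the index ranges of the F=1 [min] log below (6 models,
# assuming the data ends at index 275489).
for tr_start, tr_end, te_end in walk_forward_windows(275489, n_models=6):
    print(f"Training on index {tr_start}:{tr_end}, Testing on index {tr_end}:{te_end}")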

The Python log during training of the RNN walk-forward models (with F=1 [min]) is as follows:

Number of models for walkforward validation: 6 Training on index 189569:249569, Testing on index 249569:253889 Epoch 1/3 - 7s - loss: 2.5074 - mean_squared_error: 2.5074 Epoch 2/3 - 5s - loss: 0.0034 - mean_squared_error: 0.0034 Epoch 3/3 - 5s - loss: 0.0018 - mean_squared_error: 0.0018 MSE 0.0013159741401491626 R2 0.6372750930392224

Training on index 193889:253889, Testing on index 253889:258209 Epoch 1/3 - 7s - loss: 2.6694 - mean_squared_error: 2.6694 Epoch 2/3 - 5s - loss: 0.0031 - mean_squared_error: 0.0031 Epoch 3/3 - 5s - loss: 0.0016 - mean_squared_error: 0.0016 MSE 0.0009467924775900951 R2 0.8800728418928929

Training on index 198209:258209, Testing on index 258209:262529 Epoch 1/3 - 7s - loss: 2.1012 - mean_squared_error: 2.1012 Epoch 2/3 - 5s - loss: 0.0032 - mean_squared_error: 0.0032 Epoch 3/3 - 5s - loss: 0.0017 - mean_squared_error: 0.0017 MSE 0.0015642625093958199 R2 0.7268537541438583

Training on index 202529:262529, Testing on index 262529:266849 Epoch 1/3 - 7s - loss: 2.1511 - mean_squared_error: 2.1511 Epoch 2/3 - 5s - loss: 0.0036 - mean_squared_error: 0.0036 Epoch 3/3 - 5s - loss: 0.0022 - mean_squared_error: 0.0022 MSE 0.00049519045882465 R2 0.8638080469281029

Training on index 206849:266849, Testing on index 266849:271169 Epoch 1/3 - 7s - loss: 3.0528 - mean_squared_error: 3.0528 Epoch 2/3 - 5s - loss: 0.0038 - mean_squared_error: 0.0038 Epoch 3/3 - 5s - loss: 0.0022 - mean_squared_error: 0.0022 MSE 0.0017305357657509948 R2 0.29995530964421924

Training on index 211169:271169, Testing on index 271169:275489 Epoch 1/3 - 7s - loss: 2.6670 - mean_squared_error: 2.6670 Epoch 2/3 - 5s - loss: 0.0035 - mean_squared_error: 0.0035 Epoch 3/3 - 5s - loss: 0.0021 - mean_squared_error: 0.0021 MSE 0.0011098926523554079 R2 0.767833364861183

E.4.2. RNN Train-Validation-Test Training (F=1 [min])
The Python log during training of the RNN train-validation-test model (with F=1 [min]) is as follows:

Train on 200224 samples, validate on 92314 samples Epoch 1/1 - 16s - loss: 1.4920 - mean_squared_error: 1.4920 - val_loss: 0.0024 - val_mean_squared_error: 0.0024

Train on 200224 samples, validate on 92314 samples Epoch 1/1 - 14s - loss: 0.0032 - mean_squared_error: 0.0032 - val_loss: 0.0015 - val_mean_squared_error: 0.0015

Train on 200224 samples, validate on 92314 samples Epoch 1/1 - 14s - loss: 0.0018 - mean_squared_error: 0.0018 - val_loss: 9.4053e-04 - val_mean_squared_error: 9.4053e-04

MSE 0.00044540128569551184 R2 0.9006795730436753

E.4.3. RNN Walk-Forward Training (F=24 [hr])
The Python log during training of the RNN walk-forward models (with F=24 [hr]) is as follows:

Number of models for walkforward validation: 9 Training on index 200699:260699, Testing on index 260699:262139 Epoch 1/3 - 8s - loss: 25.4802 - mean_squared_error: 25.4802 Epoch 2/3 - 6s - loss: 0.0743 - mean_squared_error: 0.0743 Epoch 3/3 - 6s - loss: 0.0084 - mean_squared_error: 0.0084 MSE 0.01203079266193375 R2 -6.237542567644389

Training on index 202139:262139, Testing on index 262139:263579 Epoch 1/3 - 8s - loss: 25.7666 - mean_squared_error: 25.7666 Epoch 2/3 - 6s - loss: 0.0786 - mean_squared_error: 0.0786 Epoch 3/3 - 6s - loss: 0.0084 - mean_squared_error: 0.0084 MSE 0.009982301908486433 R2 -0.18603822796927627

Training on index 203579:263579, Testing on index 263579:265019 Epoch 1/3 - 8s - loss: 25.6982 - mean_squared_error: 25.6982 Epoch 2/3 - 6s - loss: 0.0777 - mean_squared_error: 0.0777 Epoch 3/3 - 6s - loss: 0.0083 - mean_squared_error: 0.0083 MSE 0.003514358208861444 R2 -0.496832759297005

Training on index 205019:265019, Testing on index 265019:266459 Epoch 1/3 - 9s - loss: 25.5791 - mean_squared_error: 25.5791 Epoch 2/3 - 6s - loss: 0.0761 - mean_squared_error: 0.0761 Epoch 3/3 - 6s - loss: 0.0081 - mean_squared_error: 0.0081 MSE 0.0019810804050362347 R2 -0.1708376011249737

Training on index 206459:266459, Testing on index 266459:267899

Epoch 1/3 - 9s - loss: 25.6876 - mean_squared_error: 25.6876 Epoch 2/3 - 6s - loss: 0.0764 - mean_squared_error: 0.0764 Epoch 3/3 - 6s - loss: 0.0079 - mean_squared_error: 0.0079 MSE 0.003047756802827608 R2 -0.08428679237062675

Training on index 207899:267899, Testing on index 267899:269339 Epoch 1/3 - 9s - loss: 25.6174 - mean_squared_error: 25.6174 Epoch 2/3 - 6s - loss: 0.0761 - mean_squared_error: 0.0761 Epoch 3/3 - 6s - loss: 0.0078 - mean_squared_error: 0.0078 MSE 0.0018545439072492323 R2 -0.16927476280236786

Training on index 209339:269339, Testing on index 269339:270779 Epoch 1/3 - 9s - loss: 25.5759 - mean_squared_error: 25.5759 Epoch 2/3 - 6s - loss: 0.0752 - mean_squared_error: 0.0752 Epoch 3/3 - 6s - loss: 0.0078 - mean_squared_error: 0.0078 MSE 0.0031274762687713872 R2 -0.2907473882620255

Training on index 210779:270779, Testing on index 270779:272219 Epoch 1/3 - 9s - loss: 25.5906 - mean_squared_error: 25.5906 Epoch 2/3 - 6s - loss: 0.0761 - mean_squared_error: 0.0761 Epoch 3/3 - 6s - loss: 0.0077 - mean_squared_error: 0.0077 MSE 0.0026181155149427268 R2 -0.11874536027543758

Training on index 212219:272219, Testing on index 272219:273659 Epoch 1/3 - 9s - loss: 25.7260 - mean_squared_error: 25.7260 Epoch 2/3 - 6s - loss: 0.0786 - mean_squared_error: 0.0786 Epoch 3/3 - 6s - loss: 0.0076 - mean_squared_error: 0.0076 MSE 0.00377561454611042 R2 -0.7773242802171785.

E.4.4. RNN Walk-Forward Training (F=4 [hr])
The Python log during training of the RNN walk-forward models (with F=4 [hr]) for the first nine models is as follows:

Training on index 202979:262979, Testing on index 262979:263219 Epoch 1/3 - 6s - loss: 25.2531 - mean_squared_error: 25.2531 Epoch 2/3 - 5s - loss: 0.0879 - mean_squared_error: 0.0879 Epoch 3/3 - 5s - loss: 0.0085 - mean_squared_error: 0.0085 MSE 0.0025936636032497518 R2 -2.358257007307579

Training on index 203219:263219, Testing on index 263219:263459 Epoch 1/3 - 6s - loss: 25.5175 - mean_squared_error: 25.5175 Epoch 2/3 - 5s - loss: 0.0967 - mean_squared_error: 0.0967 Epoch 3/3 - 5s - loss: 0.0084 - mean_squared_error: 0.0084 MSE 0.0016546123607099616 R2 -801.8891442934232

Training on index 203459:263459, Testing on index 263459:263699 Epoch 1/3 - 6s - loss: 25.3872 - mean_squared_error: 25.3872 Epoch 2/3 - 5s - loss: 0.0890 - mean_squared_error: 0.0890 Epoch 3/3 - 5s - loss: 0.0084 - mean_squared_error: 0.0084 MSE 0.003222422027738503 R2 -4.24612932990162

Training on index 203699:263699, Testing on index 263699:263939 Epoch 1/3 - 6s - loss: 25.5995 - mean_squared_error: 25.5995 Epoch 2/3 - 5s - loss: 0.0926 - mean_squared_error: 0.0926 Epoch 3/3 - 5s - loss: 0.0084 - mean_squared_error: 0.0084 MSE 0.0011201554964107648 R2 -0.5359855698656777

Training on index 203939:263939, Testing on index 263939:264179 Epoch 1/3 - 6s - loss: 25.3692 - mean_squared_error: 25.3692 Epoch 2/3

- 5s - loss: 0.0880 - mean_squared_error: 0.0880 Epoch 3/3 - 5s - loss: 0.0084 - mean_squared_error: 0.0084 MSE 0.0022737514387699775 R2 -1.6060138628962237

Training on index 204179:264179, Testing on index 264179:264419 Epoch 1/3 - 6s - loss: 25.4679 - mean_squared_error: 25.4679 Epoch 2/3 - 5s - loss: 0.0916 - mean_squared_error: 0.0916 Epoch 3/3 - 5s - loss: 0.0083 - mean_squared_error: 0.0083 MSE 0.008136325113143054 R2 -1.5568766810367651

Training on index 204419:264419, Testing on index 264419:264659 Epoch 1/3 - 7s - loss: 25.6191 - mean_squared_error: 25.6191 Epoch 2/3 - 5s - loss: 0.0929 - mean_squared_error: 0.0929 Epoch 3/3 - 5s - loss: 0.0083 - mean_squared_error: 0.0083 MSE 0.0007989311898692601 R2 -0.5401342870579247

Training on index 204659:264659, Testing on index 264659:264899 Epoch 1/3 - 7s - loss: 25.2936 - mean_squared_error: 25.2936 Epoch 2/3 - 5s - loss: 0.0885 - mean_squared_error: 0.0885 Epoch 3/3 - 5s - loss: 0.0083 - mean_squared_error: 0.0083 MSE 0.001480038286005462 R2 -0.7771419745709061

Training on index 204899:264899, Testing on index 264899:265139 Epoch 1/3 - 7s - loss: 25.2056 - mean_squared_error: 25.2056 Epoch 2/3 - 5s - loss: 0.0867 - mean_squared_error: 0.0867 Epoch 3/3 - 5s - loss: 0.0083 - mean_squared_error: 0.0083 MSE 0.0022336121114487164 R2 -2.65633232488729

E.4.5. XGBoost Train-Validation-Test Training
The Python log during training of the XGBoost train-validation-test model is as follows:

Training on index 0:292898, Testing on index 292898:308314 MSE 0.005656752010293507 R2 -0.18710613171327273.

E.4.6. XGBoost Walk-Forward Training
The Python log during training of the XGBoost walk-forward models is as follows:

Number of models for walkforward validation: 7 Training on index 217482:277482, Testing on index 277482:281802 MSE 0.017627705529895576 R2 -3.8138070276619054

Training on index 221802:281802, Testing on index 281802:286122 MSE 0.0040437185770498575 R2 -0.2805481716503393

Training on index 226122:286122, Testing on index 286122:290442 MSE 0.004901782404743926 R2 -0.23885435514421127

Training on index 230442:290442, Testing on index 290442:294762 MSE 0.008562428893266845 R2 -0.5057377146450821

Training on index 234762:294762, Testing on index 294762:299082 MSE 0.02763251097375469 R2 -4.7628726770061425

Training on index 239082:299082, Testing on index 299082:303402 MSE 0.012681836268440192 R2 -7.472719909357579

Training on index 243402:303402, Testing on index 303402:307722 MSE 0.004293264937667505 R2 -0.35304925162849266.
