
DISSERTATION

OPTIMAL RESERVOIR OPERATIONS FOR RIVERINE WATER QUALITY

IMPROVEMENT: A REINFORCEMENT LEARNING STRATEGY

Submitted by

Jeffrey Donald Rieker

Department of Civil and Environmental Engineering

In partial fulfillment of the requirements

For the Degree of Doctor of Philosophy

Colorado State University

Fort Collins, Colorado

Spring 2011

Doctoral Committee:

Advisor: John W. Labadie

Darrell G. Fontane

Donald K. Frevert

Charles W. Anderson

Copyright by Jeffrey Donald Rieker 2011

All Rights Reserved

ABSTRACT

OPTIMAL RESERVOIR OPERATIONS FOR RIVERINE WATER QUALITY

IMPROVEMENT: A REINFORCEMENT LEARNING STRATEGY

Complex water resources systems often involve a wide variety of competing objectives and purposes, including the improvement of water quality downstream of reservoirs. An increased focus on downstream water quality considerations in the operating strategies for reservoirs has given impetus to the need for tools to assist water resource managers in developing strategies for release of water for downstream water quality improvement, while considering other important project purposes. This study applies a methodology known as reinforcement learning to the operation of reservoir systems for water quality enhancement through augmentation of instream flow. Reinforcement learning employs the concepts of agent control and evaluative feedback to develop improved reservoir operating strategies through direct interaction with a simulated river and reservoir environment driven by stochastic hydrology. Reinforcement learning methods have an advantage over more traditional stochastic optimization methods in that they learn the underlying stochastic structure implicitly, through interaction with the simulated environment, rather than requiring a priori specification of probabilistic models. Reinforcement learning can also be coupled with various computing efficiency techniques, as well as other machine learning methods such as artificial neural networks, to mitigate the “curse of dimensionality” that is common to many optimization methodologies for solving sequential decision problems.

A generalized mechanism is developed, tested, and evaluated for providing near real-time operational support to suggest releases of water from upstream reservoirs to improve water quality within a river using releases specifically designated for that purpose. The algorithm is designed to address a variable number of water quality constituents, with additional flexibility for adding new water quality requirements and learning updated operating strategies in a non-stationary environment. The generalized reinforcement learning algorithm is applied to the Truckee River in California and Nevada as a case study, where the federal and local governments are purchasing water rights for the purpose of augmenting Truckee River flows to improve water quality. Water associated with those acquired rights can be stored in upstream reservoirs on the Truckee River until needed for prevention of water quality standard violations in the lower reaches of the river.

This study shows that in order for the water acquired for flow augmentation to be fully utilized as a part of a longer-term strategy for water quality management, increased flexibility is required as to how those waters are stored and how well the storage is protected from displacement through reservoir spill during times of high runoff. The results show that with those flexibilities, the reinforcement learning mechanism has the ability to produce both short-term and long-term strategies for the use of the water, with the long-term strategies capable of significantly improving water quality during times of drought over current and historic operating practices. The study also evaluates a number of variations and options for the application of reinforcement learning methods, as well as use of artificial neural networks for function generalization and approximation.


ACKNOWLEDGEMENTS

Credit for the completion of this dissertation lies with the Lord Jesus Christ, through whom all things are possible.

I would like to thank my advisor Dr. John Labadie for his help and guidance through the completion of this project. It has been an honor and privilege to work with him through the lengthy duration of the project. I would also like to thank my entire doctoral committee for their support throughout, including Dr. Darrell Fontane, Dr. Donald Frevert, and Dr. Charles Anderson. I appreciate Don Frevert’s role in connecting me with the Bureau of Reclamation, creating opportunities for me like this one and many others. I would additionally like to acknowledge the Bureau of Reclamation for their support of this project.

I thank my wife, Cathy Rieker, for her immeasurable love and support through the completion of this project. It would not have been a reality without her, and the countless hours she spent caring for our family and household which provided me the chance to see the project to completion. Thanks to our children Ryan, Megan, and Erin, for allowing me to sneak away for a portion of their early years to accomplish this. I also thank all those of our family who supported both Cathy and me through this effort: Donald and Marilyn Rieker, Tom and Linda McDavid, Greg Rieker and Julie Steinbrenner, Geoff and Susan Barker and their children Alex and Elizabeth, Grier and Karen Laughlin and their children Addison, Grier, and Caroline, as well as the numerous members of our extended families.

My appreciation also goes to my younger brother Dr. Greg Rieker, who completed his doctoral work in advance of me. This was a special opportunity for me to follow him in this achievement, and his help and guidance were critical to my project’s completion.

Thanks to all those who assisted with the review and completion of this dissertation, including my parents Don and Marilyn Rieker, Kim Mitchell, Tom Strekal, Tom Scott, Jaime Trammell, and several others. Many thanks to our model development team, including Shane Coors, Heather Gacek, Mike Mann, Tom Scott, Jeff Boyer, Pat Fritchel, Nola Mitchell, and many others. I also appreciate all of those who provided tremendous support to both me and my wife Cathy during this work, including Kenneth and Callie Parr, Betsy Rieke, Joe and Jill Andrews, Jay and Paula Frey, Marc and Amber Walling, Karen Lamb, AnneMarie McCann, and countless others who are too numerous to list. Thanks to all!


TABLE OF CONTENTS

1 – INTRODUCTION ...... 1

1.1 – Background and Motivation ...... 1

1.2 – Objectives of Study ...... 4

1.3 – Contribution of This Research ...... 5

1.4 – Organization of This Dissertation ...... 7

2 – LITERATURE REVIEW ...... 8

2.1 – Decision Support and Optimization for Water Quality Management ...... 8

2.1.1 – Traditional Methods ...... 9

2.1.2 – Artificial Intelligence and Reinforcement Learning Methods ...... 10

3 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING ...... 16

3.1 – Artificial Intelligence and Machine Learning ...... 17

3.1.1 – Artificial Intelligence ...... 17

3.1.2 – Machine Learning ...... 24

3.1.3 – Development of an Intelligent Agent based on Machine Learning ...... 33

3.2 – Development of an Intelligent Agent for Water Quality Management ...... 49

3.2.1 – Background and Assumptions ...... 49

3.2.2 – Agent Specification ...... 50

3.2.3 – Learning Methods ...... 54

3.2.4 – Exploration Methods ...... 57

3.2.5 – Physical Implementation ...... 65

4 – TRUCKEE RIVER CASE STUDY...... 86

4.1 – Truckee River Study Area ...... 87

4.1.1 – Water Quality Issues ...... 92

4.1.2 – Water Quality Settlement Agreement ...... 99

4.2 – Truckee River Water Quality Modeling ...... 102

4.3 – Truckee River Water Quality Monitoring ...... 107

4.4 – Truckee River Hydrologic, Policy, and Decision Support Modeling ...... 108

4.5 – Development of the Truckee River Task Environment ...... 112

4.5.1 – Development of Truckee Hydrologic and Reservoir Operations Model .... 114

4.5.2 – Development of Truckee Water Quality Model ...... 119

4.5.3 – Development of Reward Function ...... 123

4.5.4 – Testing/Validation of Environment Model ...... 128

4.6 – Development of the Truckee River Rational Agent ...... 128

4.6.1 – State and Action Representation ...... 129

4.6.2 – Agent Specification and Design ...... 135

5 – RESULTS AND ANALYSIS ...... 138

5.1 – Development of Water Quality Release Policies ...... 138

5.1.1 – Development of Basic Water Quality Release Policies ...... 138

5.1.2 – Development of Improved Water Quality Release Policies ...... 146

5.2 – Evaluation of Function Approximation ...... 159

5.2.1 – Function Approximation for Short-Term Immediate Reward Agent ...... 159

5.2.2 – Function Approximation for Long-Term Agent ...... 168

5.3 – Evaluations of SARSA, Q-Learning, Eligibility Traces, and Discount Rates ... 170

6 – SUMMARY AND CONCLUSIONS ...... 177

6.1 – Summary ...... 177

6.2 – Conclusions ...... 178

6.2.1 – Conclusions on the Truckee River Case Study ...... 179

6.2.2 – Conclusions on Generalized Reinforcement Learning Strategies...... 180

6.3 – Implementation Findings ...... 182

6.4 – Future Work ...... 182

6.4.1 – Future Work on Truckee River and WQSA Issues ...... 182

6.4.2 – Future Work on Reinforcement Learning System ...... 186

REFERENCES ...... 190

APPENDIX A – TRUCKEE RIVER GOVERNING DOCUMENTS ...... 200

APPENDIX B – RESERVOIR OPERATIONS ON THE TRUCKEE RIVER ...... 210

APPENDIX C – BASIC WQSA RELEASE POLICIES ...... 214

APPENDIX D – LONG-TERM AGENT PERFORMANCE ...... 218

APPENDIX E – IMPLEMENTATION FINDINGS ...... 224

E.1 – Elimination of Unnecessary Evaluation/Seasonality Issues ...... 224

E.2 – Neural Network Issues...... 225

E.3 – Reducing Step Size Parameters ...... 227

E.4 – Discussion on Partitioning of the State-Action Space ...... 228

E.5 – Tuning of Rewards and Partition Bins ...... 230

E.6 – Issues with Distributed Computing and Parallel Processing ...... 231


LIST OF FIGURES

Figure 3.1 – Typical neural network activation functions ...... 27

Figure 3.2 – Basic neural network structure ...... 28

Figure 3.3 – Basic genetic algorithm processes, illustrating the evolution of two potential

solution strings ...... 30

Figure 3.4 – Basic Bellman backup diagrams ...... 37

Figure 3.5 – Bellman optimality backup diagrams ...... 39

Figure 3.6 – Monte Carlo backup diagram ...... 42

Figure 3.7 – TD(0) backup diagram ...... 43

Figure 3.8 – Q-learning backup diagram ...... 45

Figure 3.9 – Diagram of the water quality intelligent agent ...... 54

Figure 3.10 – Reducing step size parameter options ...... 57

Figure 3.11 – Reduction in the probability of exploration associated with various

reduction parameter values, beginning at an exploration rate ε = 10% ...... 59

Figure 3.12 – Reduction in the temperature parameter τ associated with various reduction

parameter values, beginning at a temperature τ = 75 ...... 61

Figure 3.13 – Example of the progression of action selection probabilities with

decreasing temperature parameter values ...... 62

Figure 3.14 – Pseudocode for the “interface program” ...... 67

Figure 3.15 – Pseudocode for the “Q update program” ...... 68

Figure 3.16 – Pseudocode for “policy update program” ...... 69

Figure 3.17 – Pseudocode for “neural network programs” ...... 70

Figure 3.18 – Diagram of implementation ...... 71

Figure 3.19 – Process controller user interface, showing common controls for all

processes ...... 83

Figure 3.20 – Process controller interface, showing primary reinforcement learning agent

controls, and graphical representation of policy function table data ...... 83

Figure 3.21 – Process controller interface, showing eight concurrent environment model

instances during active simulation, with simulation status and estimated time of

completion...... 84

Figure 3.22 – Process controller interface, showing neural network training controls, and

graphical network performance display ...... 84

Figure 3.23 – Process controller interface, showing neural network training controls, and

graphical representation of agent policy function approximation ...... 85

Figure 3.24 – Process controller interface, showing sensitivity analysis controls...... 85

Figure 4.1 – Map of the Truckee River Basin (Rieker 2006) ...... 88

Figure 4.2 – Diagram of the Truckee River agent structure ...... 136

Figure 5.1 – Short-term agent training, convergence to final policy ...... 143

Figure 5.2 – Short-term agent training, convergence to analytical policy ...... 144

Figure 5.3 – Agent performance on 30-year historic hydrology, using SARSA with

eligibility traces, λ = 0.5, γ = 0.8 ...... 148

Figure 5.4 – Distribution of water temperature violations ...... 151

Figure 5.5 – Training results, various levels of firm storage ...... 152

Figure 5.6 – Agent policy evaluations, various levels of firm storage ...... 153

Figure 5.7 – Total WQSA storage, 30-year historic hydrology, SARSA, eligibility traces,

λ = 0.5 γ = 0.8 ...... 155

Figure 5.8 – Water temperature predictions, 30-year historic hydrology, SARSA,

eligibility traces, λ = 0.5 γ = 0.8 ...... 156

Figure 5.9 - Agent performance on synthetic hydrology, using SARSA with eligibility

traces, λ = 0.5, γ = 0.8. Note that baseline reward result for this case is approx.

572,000...... 157

Figure 5.10 - Total WQSA storage, synthetic hydrology, SARSA, eligibility traces, λ =

0.5 γ = 0.8 ...... 158

Figure 5.11 – Action-value neural network training; 1 hidden layer, min/max weights =

+-30, genetic population = 20, mutation rate = 10% ...... 161

Figure 5.12 – Policy neural network training using different numbers of hidden units; 1

hidden layer, min/max weights = +-30, genetic population = 20, mutation rate =

10% ...... 161

Figure 5.13 – Action-value neural network training using different numbers of hidden

layers; 10 hidden units, min/max weights = +-30, genetic population = 20,

mutation rate = 10% ...... 162

Figure 5.14 – Policy neural network training using different numbers of hidden layers; 10

hidden units, min/max weights = +-30, genetic population = 20, mutation rate =

10% ...... 162

Figure 5.15 – Action-value neural network training using different numbers of genetic

populations; 1 hidden layer, 10 hidden units, min/max weights = +-30, mutation

rate = 30% ...... 163

Figure 5.16 – Policy neural network training using different numbers of genetic

populations; 1 hidden layer, 10 hidden units, min/max weights = +-30, mutation

rate = 40% ...... 163

Figure 5.17 – Action-value neural network training using different bounds on the

connection weights; 1 hidden layer, 10 hidden units, genetic population = 10,

mutation rate = 30% ...... 164

Figure 5.18 – Policy neural network training using different bounds on the connection

weights; 1 hidden layer, 10 hidden units, genetic population = 10, mutation rate =

40% ...... 164

Figure 5.19 – Action-value neural network training using different bounds on the

connection weights; 1 hidden layer, 10 hidden units, min/max weights = +/-30,

genetic population = 10, mutation rate = 30% ...... 165

Figure 5.20 – Policy neural network training using different bounds on the connection

weights; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic

population = 10, mutation rate = 40% ...... 165

Figure 5.21 – Action-value neural network training using different genetic algorithm

mutation rates; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic

population = 10 ...... 166

Figure 5.22 – Policy neural network training using different genetic algorithm mutation

rates; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic

population = 10 ...... 166

Figure 5.23 – Agent performance on 30-year historic hydrology, using SARSA with

eligibility traces and neural network function approximation, λ = 0.5, γ = 0.8 .. 168

Figure D.1 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000

acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.0 ..... 219

Figure D.2 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.1 ..... 219

Figure D.3 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.2 ..... 220

Figure D.4 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.5 ..... 220

Figure D.5 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.8 ..... 221

Figure D.6 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.95 ... 221

Figure D.7 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using Q-Learning with eligibility traces, λ = 0.5, γ = 0.8

...... 222

Figure D.8 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using SARSA with eligibility traces, λ = 0.5, γ = 0.8 ... 222

Figure D.9 – Agent performance on 30-year historic hydrology with 97.8 MCM (80,000

acre-feet) firm storage, using SARSA with no eligibility traces, γ = 0.8 ...... 223


TABLE OF ACRONYMS

acre-foot The volume of water that covers one acre to a depth of one foot

ADP Active dynamic programming

ANN Artificial neural network

cfs Cubic foot/feet per second

cms Cubic meters per second

COE US Army Corps of Engineers

CPU Central processing unit

DOI Department of the Interior

DOJ Department of Justice

DP Dynamic programming

DRI Desert Research Institute

DSS Decision support system

DSSAM/DSSAMt Dynamic Stream Simulation and Assessment Model (with temperature)

EIS Environmental Impact Statement

EPA US Environmental Protection Agency

FWM Federal Water Master

GPI Generalized policy iteration

HEC-PRM Hydrologic Engineering Center - Prescriptive Reservoir Model

HSPF Hydrologic Simulation Program - Fortran

LCT Lahontan cutthroat trout

Local agencies City of Reno, City of Sparks, and Washoe County

MC Monte Carlo

MCM Million cubic meters

MDP Markov Decision Process

NCDC National Climatic Data Center

NDEP Nevada Department of Environmental Protection

NOAA National Oceanic and Atmospheric Administration

OCAP Operating Criteria and Procedures for the Newlands Project, Nevada

OFU Optimism in the face of uncertainty

PSA Preliminary Settlement Agreement

Reclamation Bureau of Reclamation

TCID Truckee-Carson Irrigation District

TD Temporal difference

TDS Total dissolved solids

TMDL Total maximum daily load

TMWA Truckee Meadows Water Authority

TMWRF Truckee Meadows Water Reclamation Facility

TrHSPF Truckee River HSPF Model

Tribe Pyramid Lake Paiute Tribe

TROA Truckee River Operating Agreement

TROM Truckee River Operations Model

Truckee Meadows Reno-Sparks metropolitan area

TRWQ Truckee River Water Quality Model

USGS United States Geological Survey

WARMF Watershed Analysis Risk Management Framework

WASP Water Quality Analysis Simulation Program

WBO Weather Bureau Office

WQSA Water Quality Settlement Agreement


1 – INTRODUCTION

1.1 – Background and Motivation

Complex water resources systems often involve a wide variety of competing objectives and purposes. Improvement of water quality downstream of reservoirs is one objective that sometimes exists on regulated river systems. Reservoir operations and other demands on river systems alter the timing and amount of flow in a river, often with impacts to the natural environment such as degraded quality of water in the river.

Pollutant loadings to the river also tend to degrade water quality, introducing a need for mitigation through alternative approaches to reservoir operations. In recent decades, there has been an increased focus on downstream water quality considerations in the operating strategies for reservoirs, and in many locations reservoir operations are being reviewed to evaluate methods to improve downstream water quality while still meeting the other objectives of the reservoir system (Kerachian and Karamouz 2007; Shirangi, Kerachian et al. 2008). Specific amounts of water are even being allocated for the purpose of improving downstream water quality through instream flow augmentation (Reno, Sparks et al. 1996). For this reason, there is an increasing need for tools to assist water resource managers in making decisions on how and when to release water specifically allocated for downstream water quality improvement, while taking into consideration water released for other objectives and the interrelated benefits and impacts caused by releases for the various objectives.

The Truckee River in California and Nevada serves as a case study for the need to develop strategies for the use of flow augmentation for downstream water quality improvement. The federal government, in association with the local government agencies in and around Reno, NV, is purchasing water rights within the Truckee River basin. The water rights purchasing program is a settlement of litigation that resulted from the approval and operation of the Reno-Sparks wastewater treatment facility. The purchased water is to be used solely for the purpose of augmenting Truckee River flows to improve water quality. The Bureau of Reclamation (Reclamation), which is involved with the settlement, is analyzing options for the management of the newly purchased water, so that it may suggest an improved long-term strategy for use of the purchased water to the group of agencies that will be coordinating its storage and release from basin reservoirs. Due to the high variability in daily flow and water quality on the Truckee River, a daily operational decision support system (DSS) will be required to carry out the strategy.

A previous effort provided an initial review of water quality issues in the Truckee River, and suggested potential operating strategies for the purchased water (Neumann 2001; Neumann, Rajagopalan et al. 2003). This resulted in an initial problem solution based on an operating policy designed to focus on short-term improvements to a single water quality variable. Recommendations for future research were provided, including the development of a more generalized system capable of adapting to other water quality variables and models, as well as a system capable of focusing on longer-term improvements to water quality rather than more immediate improvements (Neumann 2001).

The challenges in developing a generalized system to assist with decisions for management of water quality in any river basin include those associated with any multi-objective environment. Many of the objectives can be competitive in nature, and complex operating policies developed through years of litigation, regulation, court order, and institutional experience can define a river and reservoir system which has large variability within each day’s basic operation. In addition, many demands on the river, including the demand for improved water quality, cannot always be fully met due to the scarcity of the water resource. The result is a system which is difficult to model using traditional statistical methods, and thus an optimization problem that taxes the application of more traditional solution methods. The development of a solution that focuses on water quality issues in a complex river and reservoir environment requires a system capable of seeking optimal overall strategies, but also capable of reacting to observed conditions in a real-time situation. It must also be capable of learning new strategies as conditions change within the environment, thus having the ability to deal with non-stationary surroundings.

A wide variety of optimization techniques have been applied in a theoretical sense to multi-objective issues within the field of water resources, and many have been applied to water quality problems more specifically. However, most optimization techniques applied in past studies have encountered significant hindrances, and practical application of these techniques has been less common (Labadie 2004). Development of a system that is useful to water managers and will actually be applied by them in practice is an imposing task. Important considerations include the amount of time and computing resources required to operate complex simulation and optimization models on current desktop computing systems, as well as ease of use by water resource managers.

1.2 – Objectives of Study

The objectives of this study include the development, testing, and evaluation of a generalized mechanism based on artificial intelligence methods known as reinforcement learning that can suggest releases of water from reservoirs for the improvement of downstream water quality within a river using water specifically dedicated and stored in reservoirs for that purpose. The reinforcement learning methodology is applied to the case study on the Truckee River as a demonstration of its efficacy. The methodology provides an overall strategy for the management of water quality using the dedicated water supply, but also affords near real-time operational decision support based on that strategy. The methodology allows adaptive and flexible modification of strategies based on changing conditions within the environment of the river and reservoir system. The reinforcement learning system is generalized to interface with any set of preexisting hydrologic and water quality models, making it applicable to basins where physically-based models have already been developed. The methodology is capable of accommodating a variable number of water quality constituents, and sufficiently flexible for addressing the addition of new water quality requirements or goals in the future.

1.3 – Contribution of This Research

This research demonstrates the application of an emerging technology within the water resources field known as reinforcement learning, and how it can be used in the development of reservoir operating strategies to address downstream water quality issues.

The research develops a generalized set of tools capable of application to any river system with similar issues, and which can be applied to systems with different preexisting hydrologic and water quality models. New methods of implementing reinforcement learning technology are developed for improving performance and sustaining viable results. The research shows how judicious application of reinforcement learning can take full advantage of parallel processing capabilities currently available on desktop computing systems, thereby greatly enhancing efficiency and ease of use.

The research illustrates the ability of reinforcement learning to provide an effective linkage between simulation and optimization. Modeling efforts often focus on the development and application of a simulation model capable of accurately replicating a river and reservoir system, but without the ability to develop optimal solutions. Optimization efforts often focus on development of the best solutions, while sacrificing accuracy in modeling the physical environment. This research and application of reinforcement learning shows the linkage of both simulation and optimization in a way that takes advantage of the abilities of both approaches.

The reinforcement learning methodology is applied to a case study on the Truckee River, and since it is constructed within the framework of other models and DSS’s being developed by the parties that will be involved in the release scheduling for the purchased water on the Truckee River, it is expected that it will be implemented in practice as one of the primary tools for deciding how to use the purchased water. Though artificial intelligence methods have been applied to water resources issues in the past, the science is still new to the water resources field and real-world application of the theories is currently rare. This application will help bridge the gap between the academic theories presented herein and the real-world operation of a reservoir system. In addition, it provides a written synthesis of the history and summary of the current state of knowledge with respect to water quality issues on the Truckee River, as well as hydrologic and water quality modeling on the river.

The research develops strategies for more efficient use of the tabular methods that form the traditional basis for reinforcement learning algorithms. Further, it evaluates generalization strategies for reinforcement learning algorithms. The primary generalization technique evaluated is the use of artificial neural networks, another artificial intelligence technology gaining popularity in the water resources field.

Additionally, the research addresses difficult challenges with the application of reinforcement learning to the problems presented, including issues related to the limited water supply available for meeting water quality objectives, and therefore the limited ability for the reinforcement learning mechanism to fully experience and explore the state and action space. These challenges are addressed, in part, through application of a variety of different action exploration methods, including some which are not as extensively utilized in association with reinforcement learning.

1.4 – Organization of This Dissertation

This dissertation is organized into six chapters. The current chapter provides an introduction to the study problem. The second chapter gives a review of past studies which provide support for the current effort, while ensuring that this study provides a unique and beneficial contribution to the profession. The third chapter develops the methodology that is used for the research by providing background into the fields of artificial intelligence and reinforcement learning, and detailing the development of a reinforcement learning agent for the improvement of water quality. The fourth chapter describes the application of the methodology to the Truckee River case study, including an introduction to the study area and background on pertinent water quality issues and legal agreements within the study area. The fifth chapter discusses and analyzes the results of that application. The sixth chapter provides a summary, conclusions, and recommendations for future work.


2 – LITERATURE REVIEW

A literature review was completed providing background into previous integration of water quality concerns with operational decision support. This review included both “traditional” and artificial intelligence/machine learning methods for decision support and optimization.

2.1 – Decision Support and Optimization for Water Quality Management

Computer models have been used extensively over recent decades to simulate most aspects of hydraulics, hydrology, and river systems management. Additionally, optimization algorithms have been added to these models for the purpose of improving the management of river basins through improved operation of reservoirs and other water control structures. These simulation and optimization models are finding increasing operational use through the implementation of decision support systems, which are used to aid water managers in short, middle, and long term planning of water control operations (Labadie 2004). This section reviews methods currently in use for the management of water quality within river basins around the world. The methods are separated into two distinct categories: traditional methods and methods based on artificial intelligence technologies. For the purposes of this review, traditional methods are considered to be methods that have been utilized in the water resources field for many years. These are the methods that generally use a variety of equations to produce optimal or near-optimal results based on some combination of inputs, constraints, and goals. In contrast, this review considers artificial intelligence methods to be those based on theories taken from the computer science study of intelligence and some forms of computer learning. These are the methods that generally have not been used or have only very recently been tested in the water resources field, and which focus on the ability of a computer to learn optimal or preferred actions from experience with a simulated environment. Often these methods have been inspired or derived from methods used by living organisms to evolve or learn. It should be noted that methods considered “statistical learning” will be classified into the category of traditional methods for the purposes of this review. These are the methods that allow a computer to learn the characteristics of a problem primarily by focusing on statistics of the inputs provided.

2.1.1 – Traditional Methods

Traditional decision support and optimization methods have been studied and applied to water resources management for several decades. Often these investigations and applications focus on multi-objective optimization and decision support, and water quality improvement is often one of the many objectives. One of the more common applications has been the use of these techniques for the determination of optimal pollutant loadings to a river, often from waste water treatment plants.

A comprehensive review of previous studies involving multi-objective river basin optimization and decision support is outside the scope of this research; however, previous efforts provide extensive reviews of this subject (Yeh 1985; Wurbs 1993; Labadie 1998; Labadie 2004; Labadie 2005; Wurbs 2005; Rani and Moreira 2010). Labadie (Labadie 2004) summarizes traditional optimization schemes that have been applied to river basin management as follows:

- Implicit Stochastic Optimization
  o Linear programming
  o Network flow optimization
  o Nonlinear programming
  o Discrete dynamic programming
  o Differential dynamic programming
  o Discrete-Time Optimal Control Theory
- Explicit Stochastic Optimization
  o Chance-constrained programming
  o Stochastic linear programming
  o Stochastic dynamic programming
  o Stochastic optimal control
  o Multiobjective optimization methods

These methods have been applied in various decision support systems, as discussed in Labadie and Wurbs (Labadie 2004; Wurbs 2005). Some of the current generalized decision support systems using these methods include WRIMS (CalSim), MODSIM, HEC-ResSim, and RiverWare, among others.

2.1.2 – Artificial Intelligence and Reinforcement Learning Methods

During the past two decades, computing methods from the field of artificial intelligence have gained popularity in water resources modeling and optimization. A survey of recent literature reveals a cross-section of efforts that apply artificial intelligence and machine learning techniques to various water resources issues. Two of the more prevalent methods in use are genetic algorithms and artificial neural networks.

Genetic algorithms are essentially search methods designed to find optimal solutions to problems based very generally on evolutionary theory, as discussed in greater detail in Section 3.1.2.1 of this document. Artificial neural networks, also referred to as ANNs or neural networks, are data classification and regression methods based generally on the functionality of the human brain, as discussed in greater detail in that same section of this document.

Labadie provides an overview of a variety of applications of genetic algorithms for the optimization of river and reservoir operations. Labadie also discusses several applications of neural networks to river and reservoir operations optimization (Labadie 2004). Rani et al. also provide an overview of applications of neural networks and genetic algorithms (Rani and Moreira 2010). Solomatine discusses the use of neural networks, among other computational intelligence techniques, in water resources control applications (Mohammadian, Sarker et al. 2003). Specific efforts discussed in the literature provide details on various uses of neural networks, most commonly as applied to the issue of reservoir operation and hydraulic network or sewer system operation (Raman 1996; Solomatine 1996; Savic and Walters 1999; Lobbrecht and Solomatine 2002; Chaves and Kojiri 2007; Darsono 2007). As a result of the rise in popularity of neural networks in the field of water resources, the American Society of Civil Engineers formed a task committee which reported on the technology and its applications in the field (ASCE 2000; ASCE 2000).

Genetic algorithms and neural networks have also gained popularity in the specific field of water quality management. Otero et al. and Wan et al. applied genetic algorithms to the optimal management of salinity in estuaries (Otero 1995; Wan, Labadie et al. 2006). Wen et al. used a neural network in association with multi-objective optimization to attempt to optimize the management of water quality in a river basin (Wen 1998). Chen et al. describe a genetic algorithm coupled with fuzzy programming that develops water quality management strategies for a river basin (Chen 1998). Roehl et al. utilized neural networks in the control of salinity issues in a tidally affected river basin (Roehl, Conrads et al. 2000). Chaves coupled neural networking with fuzzy stochastic dynamic programming and genetic algorithms to review optimal operation of a reservoir for water quality purposes (Chaves 2004; Chaves and Kojiri 2007). In several of these applications, neural networking technology was applied to the modeling of water quality, which is an area where much focus has been applied in recent decades. The usefulness of neural networks in dealing with the complexities of water quality modeling is revealed through a review of a small segment of the wide variety of past applications (Maier 1996; Maier 2000; Rounds 2002; Risley, Roehl et al. 2003; Suen 2003).

More recently, research and application of reinforcement learning technology have begun to take place in the field of water resources planning and management. In 1996, Wilson introduced reinforcement learning as a potential technique for real-time optimal control of hydraulic networks (Wilson 1996). Wilson noted that one of the significant issues facing reinforcement learning at the time was the ability to generalize and approximate the functions that form a basis for the technology. Wilson also indicated that neural networks, coarse-coded look-up tables, memory-based approximators, decision trees, and classifier systems were available to address the issue, recommending the coarse-coded look-up table option as the most appropriate. He also noted that though the technology could be extended to deal with multi-objective issues, no results had yet been reported on that type of application. Bouchart et al. presented a reinforcement learning model for the control of multiple reservoir systems (Bouchart 1998). However, Bouchart et al. noted that the performance of the reinforcement learning model remained sub-optimal due to the lack of techniques for generating the reservoir inflow sequences needed to train the model to perform optimally (Savic and Walters 1999). That paper described a genetic algorithm approach to develop training inflow sequences to improve performance of the reinforcement learning mechanism.

Castelletti et al. proposed a new approach which they referred to as “Q-learning planning” or “Qlp”, and applied the approach to the operational management of a reservoir (Castelletti 2002). The approach essentially integrated a Q-learning mechanism from reinforcement learning technology with traditional stochastic dynamic programming mechanisms to produce a hybrid system for reservoir management. Bhattacharya et al. developed a real-time control system using reinforcement learning coupled with a neural network for the control of water drainage in low-lying areas in The Netherlands (Bhattacharya 2003). In this application, several neural networks were utilized, including one to address the function approximation issue of the reinforcement learning mechanism.

Lee and Labadie created a multi-reservoir operations control mechanism based on the Q-learning approach of reinforcement learning (Yi 2005; Lee and Labadie 2007). That application focused on multi-objective optimization of a two-reservoir system, and compared the results of the operating rules developed through reinforcement learning to those developed with implicit stochastic dynamic programming and sampling stochastic dynamic programming. Generalization and function approximation were accomplished by a technique known as K-means clustering, and applied to a tabular implementation of the reinforcement learning mechanism with improved success over simple percentile partitioning of the definition of the current state of the hydrologic and reservoir system (Lee and Labadie 2007). Castelletti et al. applied reinforcement learning to the issue of controlling a selective withdrawal structure on a reservoir to optimize water quality targets in the reservoir and immediately downstream (Castelletti, Garbarini et al. 2009). Though this was an application of reinforcement learning to a water quality issue, that study differs significantly from this dissertation research since it was a focused application of reinforcement learning on the selective withdrawal process from a reservoir rather than flow augmentation for river water quality.

More recent applications of reinforcement learning technology to water resources issues have also coupled reinforcement learning mechanisms with other emerging technologies to attempt to produce alternative solutions. Mariano-Romero et al. developed a hydraulic network optimization scheme based on multi-objective distributed Q-learning, which essentially utilized a multi-agent approach to Q-learning (Mariano-Romero 2007). Abolpour et al. combined reinforcement learning with fuzzy inference in an application to improve river basin water allocation (Abolpour 2007). Mahootchi et al. initially developed optimal operational policies for a single reservoir system using Q-learning (Mahootchi, Tizhoosh et al. 2007), and then went on to combine Q-learning with opposition-based learning in the development of operational policies for a reservoir system (Mahootchi 2007; Tizhoosh and Ventresca 2008).

Thus far, most applications of reinforcement learning technologies to reservoir operations and multi-reservoir operations have been focused on optimization of complete reservoir operations in scenarios where most normal operating policies are able to be adequately tested and evaluated for their usefulness. These scenarios ensure that the reinforcement learning agent has the ability to sufficiently learn the value of its actions from reasonably long sequences of simulated experience, assisting with the development of an optimal or near-optimal operating policy. No studies were found in the literature that utilize reinforcement learning to develop an agent that focuses on specific reservoir operations for downstream water quality management by using flow augmentation.


3 – ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING

This study applies artificial intelligence and machine learning methods to the scheduling of reservoir releases for the improvement of water quality. The end result is a DSS which is constructed using selected methods from the artificial intelligence field of study. This chapter summarizes the development of those methods and the DSS. The system is tested on a case study focused on the release of WQSA-stored water on the Truckee River, which is discussed in the following chapters.

The first section of this chapter presents a review of the general fields of artificial intelligence and machine learning, detailing the decision support methods available from within these fields. These methods generally come in the form of “rational agents” that have the ability to learn over time. From this review, a particular “rational agent” method is selected for application to the study problem. The section goes on to discuss a learning mechanism that is particularly well-suited to the selected method known as “reinforcement learning”. The second section of this chapter details the development of the generalized DSS using the selected rational agent and learning method.

3.1 – Artificial Intelligence and Machine Learning

3.1.1 – Artificial Intelligence

3.1.1.1 – The “Rational Agent”

The field of “artificial intelligence” encompasses the study and development of computer systems that either think or act in a human-like manner, or alternatively in a rational way. Rationality is generally defined as operating in an optimal fashion based on knowledge and experience. One of the main focuses within the field of artificial intelligence is the creation of systems that have the ability to perceive and act on the surrounding environment in a rational way. Such a system is known as a “rational agent”, and often these systems have other attributes such as the abilities to adapt to change or work under autonomous control. The field of artificial intelligence is closely linked to related fields such as statistics, control theory, operations research, and decision theory. However, artificial intelligence has been set apart as a unique field due to its focus on certain human characteristics and the application to computers (Russell and Norvig 2003).

The development of a rational agent is the primary focus of this research. The rational agent should be able to perceive the surrounding environment using sensors, and act upon it using actuators. The agent behaves rationally by maximizing its performance while acting on its environment, using an internal mapping between the perceptual inputs (state and performance information from the environment) it receives and the actions it takes. This mapping is known as the “agent function,” which represents the agent’s understanding of the environment upon which it is acting and the effects of its actions.

This agent function can be information that was supplied to the agent at the time of its creation, but in order to operate at a higher level, a rational agent needs to have the ability to learn from its actions and alter the function based on new knowledge and experience.

This ability to learn also grants a level of autonomy to the rational agent. The agent function (which is actually a theoretical abstraction) is implemented in an “agent program”. The program operates on the agent’s “architecture”, which consists of the physical computing device and potentially (depending on the nature of the agent) the physical actuators and sensors (Russell and Norvig 2003).
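
As an illustration of these abstractions (not code from this dissertation's implementation), the following minimal Python sketch treats the agent function as a mapping from percepts to actions that the agent program consults and revises; the class and method names are hypothetical.

    class SimpleLearningAgent:
        """Minimal sketch of an agent program: the agent function maps
        percepts to actions, and learning revises that mapping based on
        new knowledge and experience."""

        def __init__(self, default_action):
            self.policy = {}                    # agent function: percept -> action
            self.default_action = default_action

        def act(self, percept):
            # Consult the current agent function for the perceived state.
            return self.policy.get(percept, self.default_action)

        def learn(self, percept, better_action):
            # Revise the mapping when experience suggests a better action.
            self.policy[percept] = better_action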

The rational agent is primarily designed to maximize its performance. For this reason, the agent must be provided with a performance measure from which it may evaluate the degree to which it succeeds in its attempts to act upon its environment. The development of the performance measure is a very important aspect of the design of the rational agent, as this component of the agent will generally determine the agent’s overall behavior and mode of operation. A “greedy” agent will tend to always act in a way that maximizes its immediate performance by selecting actions that return the highest immediate value from the performance measure, essentially exploiting its knowledge of the performance measure. In order to learn from its actions and gain new knowledge, it is important for an agent to occasionally explore other actions that do not have the highest immediate reward. By acting in a non-greedy fashion a certain percentage of the time, the agent may learn actions that improve the ability to achieve its goals in the longer term, while not necessarily maximizing immediate reward. The ability to balance actions between exploitation of immediate rewards and exploration of non-greedy actions is key to the development of a successful agent (Russell and Norvig 2003).
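
This balance is commonly implemented with a simple randomized rule; the fragment below is a generic epsilon-greedy sketch in Python (the 10% exploration rate is an arbitrary illustration, not a value taken from this study).

    import random

    def epsilon_greedy(value, state, actions, epsilon=0.1):
        """Return the greedy action most of the time, but explore a random
        action with probability epsilon so that new knowledge can be gained."""
        if random.random() < epsilon:
            return random.choice(actions)       # explore a non-greedy action
        # exploit: choose the action with the highest estimated value
        return max(actions, key=lambda a: value.get((state, a), 0.0))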

There are four general categories of rational agents currently in use. These are simple reflex agents, model-based reflex agents, goal-based agents, and utility-based agents. The simple reflex agent basically has a simple internal mapping of the actions that should be taken by the agent depending on the state of the environment encountered by the agent. This could be compared to the concept of basic stimulus-response rules.

This type of agent is usually not well suited to complex environments, as it must have knowledge of the exact action that is taken under any perceived condition. Additionally, this agent must be able to fully observe the state of its environment at all times in order to act according to the rules that govern its operation. The model-based reflex agent is an extension of the simple reflex agent that attempts to overcome this limitation by maintaining an internal representation of the most likely state of the agent’s environment.

The internal representation of the environment serves as a model of what the environment may be like when the agent is unable to fully observe the actual environment, and essentially serves as a way for the agent to attempt to continue to act on the environment around it even in the event that it cannot sense all parts of the environment at all times (Russell and Norvig 2003).

The goal-based agent extends beyond the simple mapping between perceived states and actions by allowing the agent to pursue a goal. By orienting the agent around a goal, the agent no longer simply reacts to the state of the environment, but is forced to evaluate possible actions and choose the action that will achieve the goal. In more complex environments, this may involve taking more than one action. An agent designed for this type of environment generally uses techniques that search for a series of actions that achieve the goal. In complex environments where a simple search through all possible actions is not possible, the agent can be designed to use planning techniques in order to achieve the goal. Another type of goal-based agent is the knowledge-based agent, which develops knowledge about the environment and uses logic and reasoning to select actions that will achieve the desired goal (Russell and Norvig 2003).

The utility-based agent is a more generalized form of the goal-based agent that is better suited to environments with multiple or conflicting goals, or environments where goals cannot always be fully achieved. This type of agent attempts to maximize its performance given that perfect performance is not likely or even possible. The performance of the agent is continuously measured by the degree of goal achievement, which is a value placed on the state of the environment the agent is in. These values are also known as the “utility” of each state. The “utility function” is a mapping of states to utility values, and with complete knowledge of the utility function of the states of the environment the agent may achieve, the agent can attempt to reach states that are more desirable or which maximize the degree to which it has achieved its goals. Alternatively, the utility function can map actions to utility values, allowing the agent to maximize the use of actions from particular states that will likely result in achievement of the agent’s goals. In an uncertain environment, if the agent also has knowledge of the probabilities with which its actions can achieve certain states, the agent can attempt to maximize its expected utility. This is known as the “principle of Maximum Expected Utility”, and the attempt to make rational decisions using a combination of utilities with probability is called “decision theory”. An agent operating under these theories is known as a “decision-theoretic agent” (Russell and Norvig 2003).
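
Stated as a formula (a standard textbook expression rather than notation reproduced from this document), the principle of Maximum Expected Utility selects the action whose probability-weighted utility over the resulting states is largest:

    % Maximum Expected Utility: choose the action a that maximizes the expected
    % utility of the resulting states s', given the available evidence e.
    a^{*} = \arg\max_{a} EU(a \mid e) = \arg\max_{a} \sum_{s'} P(s' \mid a, e)\, U(s')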

Each of the different types of agents has the ability to become a “learning agent”.

The ability to learn from experience is a key component to most agents. It is a method for the agent to improve its performance. Because of this, a learning agent generally does not need any initial knowledge of the environment or the actions it should take, since it has the ability to completely learn its agent function from scratch. Alternatively, prior knowledge of the environment can help reduce the amount of time required for an agent to perform optimally, or can prevent the agent from entering extremely poor situations that have significant consequences. Agents generally learn from experience by improving their knowledge of the environment or the effects of their actions. This may include the knowledge of the results of actions for simple reflex agents, the knowledge of how the environment changes over time for model-based reflex agents, the knowledge of which actions achieve goals for a goal-based agent, and the knowledge of the utility of states and/or the ability for actions to find the highest utility (best action policy) for utility-based agents (Russell and Norvig 2003).

The concept of a rational agent is central to the study of artificial intelligence. A rational agent should have the ability to act on its environment to maximize its performance measure, and advanced agents have the ability to learn from the environment in order to improve performance. The different categories of agents and some specific types of agents are summarized in Table 3.1 as follows:

Table 3.1 – Summary of agent categories, agent functions, and learning mechanisms

Agent Type: Reflex-based Agents (Simple reflex; Model-based reflex)
  Agent Function: If-Then or Stimulus-Response rules
  Learning Mechanism: Methods that improve the agent’s response rules (or model of the environment in the case of a model-based agent)

Agent Type: Goal-based Agents (Search and planning; Knowledge-based)
  Agent Function: Knowledge base of solutions or reasoning mechanisms to find solutions
  Learning Mechanism: Methods that improve the agent’s knowledge base or reasoning capabilities

Agent Type: Utility-based Agents (Decision-theoretic)
  Agent Function: Policy function based on a utility function
  Learning Mechanism: Methods that improve the agent’s utility function

3.1.1.2 – The “Task Environment”

The environment occupied by the rational agent is known as the “task environment.” The full specification of the task environment includes not only the physical (or virtual) surroundings of the agent, but also the performance measure by which the agent is judged, the agent’s actuators, and its sensors. Specification of the task environment is an important aspect of the creation of a system based on artificial intelligence methods, because this specification will have a large impact on the degree to which the rational agent can achieve the desired goals or behavior (Russell and Norvig 2003).

The specification of the agent’s physical or virtual surroundings within the task environment can also be categorized by a number of attributes, as outlined here (Russell and Norvig 2003):

- Fully or partially observable as seen by the agent
- Deterministic or stochastic with respect to transitions between states
- Episodic or sequential with respect to the agent reaching “terminal” states
- Static or dynamic as to whether the agent has time to respond before change occurs
- Discrete or continuous with respect to the nature of the states experienced by the agent
- Single or multiple agents operating in the environment
  o Competitive or cooperative agents if a multiagent environment

The most complex environment is one that combines the more difficult attributes listed; essentially one that is partially observable, stochastic, sequential, dynamic, continuous, and where multiple agents will be acting. It is important to provide the agent with the simplest specification of the given environment, while allowing the agent’s sensors to take in all pertinent information that has an impact on the desired agent behavior.

The specification of the performance measure is one of the most important parts in the development of a system based on artificial intelligence methods. The performance measure needs to identify the degree of success for an agent’s actions, but care needs to be taken to prevent the agent from seeking actions that provide immediate reward from the performance measure without seeking the overall desired outcome. For this reason, it is generally best to design a performance measure that focuses on the overall goal of the agent rather than immediate sub-goals. Additionally, the performance measure should be one component of the agent’s task environment that it does not have control over. This prevents the agent from altering the performance measure to meet the outcome of its own actions (Sutton and Barto 1998; Russell and Norvig 2003).
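
Purely to illustrate these design principles (reward the overall goal rather than intermediate sub-goals, and keep the measure outside the agent's control), a generic performance measure might be computed by the environment as in the Python sketch below; the constituent names, thresholds, and values are invented placeholders, not the reward function developed later in this dissertation.

    def performance_measure(outcome, standards):
        """Generic evaluative feedback computed by the environment, not the
        agent. It penalizes the overall outcome (exceedances of downstream
        standards) rather than rewarding intermediate sub-goals."""
        penalty = 0.0
        for constituent, observed in outcome.items():
            limit = standards[constituent]
            if observed > limit:
                penalty += observed - limit   # penalty grows with the exceedance
        return -penalty                       # higher (less negative) is better

    # Example with invented numbers: one constituent exceeds its standard.
    print(performance_measure({"temperature": 23.0, "tds": 400.0},
                              {"temperature": 21.0, "tds": 500.0}))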

3.1.2 – Machine Learning

The field of machine learning is focused on the ability to understand and learn about input information using computational methods. Common to most types of machine learning is the concept of input data: the information that is to be better understood and learned about. The differences between types of machine learning arise from the different types of feedback available to the learner. The next three sections outline the three primary types of learning: supervised learning, unsupervised learning, and reinforcement learning (Russell and Norvig 2003).

3.1.2.1 – Supervised Learning

Supervised learning encompasses the set of problems that have output values which are directly dependent on the input data. The goal of a supervised learning problem is to learn a function that predicts the dependent output values based on the independent input values. The function is learned by observing input and output values from a training dataset. The output values serve as a "teacher" or "supervisor," providing feedback to the learner about the usefulness of the learned function. This type of learning is also known as inductive learning, where the function represents a hypothesis that is tested using the example data. The supervised learning problem is generally broken down into two categories: regression, which deals with quantitative and usually continuous data, and classification, which deals with qualitative information (Hastie, Tibshirani et al. 2001; Russell and Norvig 2003).

Supervised learning is commonly, but not always, accomplished using statistically-based methods. Some supervised learning methods include but are not limited to:

- Linear regression
- Linear classification
- Kernel methods
- Decision tree methods
- Nearest neighbor methods
- Neural networks

3.1.2.1.1 – Neural Networks and Genetic Algorithms

One form of supervised learning of particular interest to this study is that of artificial neural networks. The creation of neural networks was one of the earliest topics of artificial intelligence research, focusing on the creation of a computing structure that would mimic the basic functioning of brain cells. The neuron is a brain cell that generally collects and processes electric signals. In computing, neurons are

“nodes” or “units” that have one or more input signals and an output signal. The input signals generally carry numeric information known as an “activation”, which is generally information from a particular sector of the environment or the output from another unit.

The neuron or unit also contains an "activation function". Each input to the unit has a numeric weight applied to it to determine the strength and sign of that particular input, and the sum of the weights multiplied by the input activations is provided to the activation function to generate an output value. The functionality of a unit is shown in Equation 3.1:

$a_{out} = g\left(\sum_{i} w_i\, a_i\right)$   (Eqn 3.1)

where:
  $a_{out}$ = the output value of the unit
  $g$ = the activation function
  $a_i$ = the activation carried by input $i$
  $w_i$ = the weight applied to input $i$

The activation function is designed to be “active” if the sum of the inputs surpasses some threshold, and “inactive” otherwise. It also should be non-linear, to prevent the entire algorithm from turning into a simple linear function. This generally imitates the mechanism produced by an actual neuron in the brain, which “fires” an electronic signal once the inputs to it surpass some threshold. Generally, either a threshold (step) function or sigmoid (logistic) function is used in computing. With these types of functions, in addition to the inputs from the environment, each unit is provided an additional fixed input equal to negative one. This represents the “bias” applied to the function, and that bias is given its own weight like each of the other inputs. The bias sets the activation threshold value within the activation function, because if the sum of the inputs multiplied by their respective weights exceeds the bias multiplied by its weight, the overall activation provided to the activation function will be positive and therefore the function will generate an “active” signal. If the sum of the inputs multiplied by their weights is less than the bias multiplied by its weight, the overall activation will be negative and the function will generate an “inactive” signal. For the threshold (step) function, the active signal is equal to one, and the inactive signal is equal to zero. For the

sigmoid (logistic) function, the active signal is above 0.5 but less than 1, and the inactive signal is less than 0.5 but greater than 0. The threshold and sigmoid functions are shown in Figure 3.1, and the sigmoid function is shown in Equation 3.2.

Figure 3.1 – Typical neural network activation functions (threshold/step function and sigmoid function, each plotted as unit output versus function input, with the threshold value marked)

$g(x) = \dfrac{1}{1 + e^{-x}}$   (Eqn 3.2)

where:
  $g(x)$ = the output of the sigmoid activation function
  $x$ = the sum of the weighted inputs (including the bias) provided to the function

Neural networks are generally composed of many units. These units are organized in “layers”, with a layer of input units, a layer of one or more output units, and any number of “hidden layers” containing “hidden units”. The networks are generally categorized as “feed-forward” or “recurrent” networks, depending on whether the network is designed to feed inputs directly into outputs, or feed some of its own outputs

back into its own inputs or hidden layers. A basic feed-forward neural network is shown in Figure 3.2 (Russell and Norvig 2003; Buckland 2010).

Figure 3.2 – Basic neural network structure (two input units and a bias feeding a hidden layer of two units plus a bias through weighted connections, with a single output unit)

Training of a neural network is a supervised learning procedure, accomplished by adjusting the weights of the network until the outputs most closely match those provided to the network as example data. The training mechanism can be accomplished through a variety of algorithms. One popular mechanism is known as back-propagation, where the network’s output error is back-propagated through the network to adjust weights (Bishop

2006). Another popular mechanism is the use of genetic algorithms. Genetic algorithms are generally defined as search mechanisms inspired by the splicing effect of chromosomes in the reproduction of higher level life forms, following the general theory of evolution. For that reason they are sometimes referred to as “evolutionary methods”

(Russell and Norvig 2003; Buckland 2010).

Genetic algorithms begin with a "population" of randomly generated solutions to a problem, which are generally symbolized or encoded as strings of numeric symbols.

These strings can be binary (0’s and 1’s) or potentially vectors of numbers representing a solution, such as a vector of weights that define a complete neural network. The strings are first evaluated to identify how well they solve the problem at hand. This defines that particular individual solution’s “fitness”. Then pairs of solutions are randomly selected, based on their overall fitness. Individual solutions with a higher fitness are selected with a higher probability than solutions with lower fitness. Each individual solution’s selection probability is directly related to their fitness value. The pairs are then “mated” by randomly selecting a “crossover” point in the string of the solution. The portion of the string that occurs before the crossover point of the first individual in the pair is matched with the portion of the string after the crossover point in the second individual, and vice versa, producing two new solutions or “offspring”. The offspring then replace the original pair in the population. Once the entire population has been subjected to the crossover mechanism, the process is iterated until the best solutions converge to an optimal solution. Additionally, individual solutions are selected at random with a specified frequency for a process known as “mutation”. An individual solution selected for mutation has one or more segments of its solution string replaced by a randomly generated new segment. The effect of this action is to occasionally discover new solutions which may be superior to the existing solutions in the population. Figure 3.3 illustrates the crossover and mutation operations of a genetic algorithm (Russell and

Norvig 2003; Buckland 2010).

Figure 3.3 – Basic genetic algorithm processes, illustrating the evolution of two potential solution strings (parent, child, and mutant strings, with the crossover and mutation points marked)

3.1.2.2 – Unsupervised Learning

Unsupervised learning focuses on the set of learning problems for which there is no output data related to the input data. In this category of problems, the primary objective is to characterize properties of the input data. Because there is no output data, the problem no longer involves a “teacher” or “supervisor,” and the properties of the data must be determined without being able to quantify the error in estimates made by the learner (Hastie, Tibshirani et al. 2001). Primary applications of unsupervised learning include discovery of groupings of particular data points within a dataset, estimation of the distribution of a dataset, and reduction of the dimension level of a dataset. These

applications are known as clustering, density estimation, and visualization respectively

(Bishop 2006).

Similar to the supervised case, unsupervised learning methods are also generally derived from methods of statistical analysis. Some unsupervised learning methods include but are not limited to:

- Association rules
- Cluster analysis
- Self-organizing maps

3.1.2.3 – Reinforcement Learning

Reinforcement learning shares some properties of supervised learning in the sense that a dependent output is generated using input data, but also has similarities with unsupervised learning in the sense that the correct answer is not provided to the reinforcement learner. The basis for reinforcement learning is the process of learning by interaction, or trial and error, in order to discover the optimal output given the input data.

The primary application of reinforcement learning is to find the best actions to take in any given situation. In this sense, reinforcement learning helps an agent associate an appropriate action to take with the situation it is facing. Reinforcement learning focuses on evaluating actions, rather than instructing an agent on the correct action. In this way, the evaluation illustrates whether one action is better and should be preferred over another. The discovery of optimal actions is based on rewards provided for taking good actions, and likewise, disincentives for taking poor actions. This primary application of reinforcement learning causes it to be more directly linked to the concepts of an artificial intelligence and rational agents interacting with their environment as presented in section

3.1.1 (Sutton and Barto 1998; Bishop 2006). Reinforcement learning has even been suggested to "encompass all of [artificial intelligence]: an agent is placed in an environment and must learn to behave successfully therein" (Russell and Norvig 2003).

Reinforcement learning has a basis in the psychology of animal learning, as well as optimal control (using value functions and dynamic programming) and temporal-difference methods. It focuses on the problem of improving the actions of a goal-based agent, and more specifically a utility-based (decision-theoretic) agent because of inherent connections to decision theory. Reinforcement learning has the ability to deal with uncertain environments, and can be used for both real-time control and planning. The ability to carry out both functions revolves around its focus on optimizing overall value or utility, rather than immediate reward. The discovery of actions that achieve this goal is based on developing a balance between attempting actions that exploit knowledge of immediate rewards, and exploration of non-greedy actions that may lead to higher overall utility (Sutton and Barto 1998; Russell and Norvig 2003).

Unlike supervised and unsupervised learning, reinforcement learning methods are not based strictly around statistical analysis, but rather are methods specific to the field of artificial intelligence that focus primarily on the estimation of utility. These methods include, and are sometimes a blend of:

- Dynamic Programming – a technique to determine the value of being in a state based on complete knowledge of the environment
- Monte Carlo Analysis – a technique to determine the value of being in a state by averaging the overall return learned from experience
- Temporal-Difference – a technique of determining the value of being in a state by leveraging the knowledge of the value of other states and learning from experience

(Sutton and Barto 1998; Russell and Norvig 2003)


3.1.3 – Development of an Intelligent Agent based on Machine Learning

Artificial intelligence is primarily concerned with the development of a rational agent as outlined in Section 3.1.1.1. One of the keys to successful agent development is the creation of the agent function, mapping every situation to a desirable action. Some of the machine learning methods outlined in section 3.1.2 are methods well suited to the development of the agent function, allowing a rational agent to learn the most desirable actions to take in any state by using information and data provided to the agent. In the case of supervised learning, the agent may be provided with optimal actions that should be taken in given states. For reinforcement learning, the agent discovers optimal actions by interaction with the environment. The result of either of these learning methods is a complete specification of actions that should be taken in any state, which is known as a policy. A policy resulting in the best overall outcome for the agent is known as the optimal policy. A policy resulting in the best immediate reward at each action interval is known as the greedy policy. The greedy policy is not necessarily the optimal policy for an agent, since optimal immediate rewards may not always optimize the overall outcome for the agent. Agents sometimes follow a policy that is greedy with respect to the rewards they expect to receive from the environment, but occasionally introduce a non-greedy exploratory action to test alternative policies that might result in a better immediate reward or overall outcome. Selection of those exploratory actions occurs at random, with a frequency related to some small probability usually symbolized as ε. A policy that follows exploratory actions with a frequency of ε is generally called an "ε-greedy policy". Two slightly more structured exploratory action policies are known as

“softmax action selection” (also known as “Boltzmann exploration”) and the “optimism in the face of uncertainty” method, which both attempt to prevent the agent from exploring an action that might have significantly low reward or poor consequences

(Sutton and Barto 1998; Russell and Norvig 2003; Szepesvári 2010).
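
As a simple illustration of an ε-greedy policy, the sketch below (Python, illustrative only; the state name, actions, and action-values are invented, and higher values are assumed to be better in this generic example) selects a random exploratory action with probability ε and otherwise the action with the best estimated value.

    import random

    def epsilon_greedy(q_values, state, actions, epsilon=0.1):
        # With probability epsilon take a random exploratory action;
        # otherwise take the greedy action with the best estimated value.
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q_values.get((state, a), 0.0))

    q_values = {("dry", "release_low"): 4.2, ("dry", "release_high"): 6.8}
    print(epsilon_greedy(q_values, "dry", ["release_low", "release_high"]))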

The environment normally encountered by a rational agent is generally continuous in the sense that the agent’s current state is dependent upon its previous state and the action it selected. If the environment is stochastic, the agent’s current state is related to the previous state through the probability that it would reach this current state given the action it selected. This type of transition from one state to the next is known as a “Markovian transition”. Related to this concept is that of the “Markov property”. The

Markov property essentially states that everything about the history of an environment is captured in the current state of the environment, or alternatively, that the response of the environment in the next timestep depends only on the state of the environment currently.

The problem of sequentially deciding on actions in this environment, combined with the concepts of utility theory, is known as a “Markov Decision Process” (MDP). The basic components of an MDP are the agent’s initial state, a model of the probabilities of reaching future states given the range of possible actions, and a specification of the reward that will be achieved by reaching a particular state (also known as the reward function). The overall utility of each state is a function of all rewards that are expected to be achieved from the current point forward based on the agent’s policy, which will determine which states are most likely to be reached in the future (Sutton and Barto 1998;

Russell and Norvig 2003).

The basic concepts of artificial intelligence, utility theory, and Markov decision processes combine to define all basic elements of a rational decision-theoretic agent based on reinforcement learning, which is the focus of this research. These elements and their symbolic representation are outlined as follows (Sutton and Barto 1998; Russell and

Norvig 2003):

- Decision-theoretic agent: agent
- Task environment: environment
- Agent's state: $s_t$
- Agent's action: $a_t$
- Policy: $\pi(s)$
- Reward function: $R(s)$
- Value or utility function: $V(s)$ or $Q(s,a)$
- Model of the environment**: $T(s_t, a, s_{t+1})$

**Also known as the Markovian transition model. In certain types of "model-free" reinforcement learning, the Markovian transition model is not necessary.
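
To connect these elements to an implementation, the following sketch (Python, purely illustrative; the container and type names are invented) groups them into a single structure, leaving the Markovian transition model optional for model-free methods.

    from dataclasses import dataclass, field
    from typing import Callable, Dict, List, Optional, Tuple

    State = str
    Action = str

    @dataclass
    class AgentElements:
        states: List[State]                      # possible states s_t
        actions: List[Action]                    # possible actions a_t
        reward: Callable[[State], float]         # reward function R(s)
        gamma: float                             # discount rate
        policy: Dict[State, Action] = field(default_factory=dict)               # pi(s)
        values: Dict[Tuple[State, Action], float] = field(default_factory=dict) # Q(s, a)
        # Markovian transition model T(s, a, s'); None for model-free learning
        transition: Optional[Callable[[State, Action, State], float]] = None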

One of the elements shown above, the reward function, is of particular interest in studies involving reinforcement learning. The central concept of reinforcement learning is embodied in the reward function, and the development of a successful agent is generally attributed to formulation of an appropriate reward function. The reward function is the primary feedback provided to the agent that indicates the success or failure of a particular action taken in a particular state. The observation and learning from these rewards is the central task of reinforcement learning. The reward function is used to define the goal of the agent, and therefore is not something the agent is usually allowed to modify. Development of a good reward function should focus on what the agent is meant to achieve, rather than the method or path to achievement that is desired. In this way, the agent will more likely seek the final goal rather than finding rewards in repeated accomplishment of subtasks (Sutton and Barto 1998; Russell and Norvig 2003).

The value function (also known as utility function) represents the amount of rewards received by the agent over longer periods of time (also known as the overall return), and allows the agent to attempt to achieve overall goals by maximizing its total reward, rather than focusing on short term immediate rewards. Specifically, the value of a state and/or action is defined by the expected rewards that will be earned in the future as a result of the agent being in that state or taking a particular action from that state. The value for each state is generally determined using the Bellman Equation, calculated as:

$V^{\pi}(s) = R(s) + \gamma \sum_{s'} T\big(s, \pi(s), s'\big)\, V^{\pi}(s')$   (Eqn 3.3)

where:
  $V^{\pi}(s)$ = the value of state $s$ under policy $\pi$
  $R(s)$ = the reward received in state $s$
  $\gamma$ = the discount rate ($0 \le \gamma \le 1$)
  $T(s, \pi(s), s')$ = the probability of transitioning from state $s$ to state $s'$ when following policy $\pi$

Equation 3.3 above is generally known as a state-value equation, and the function

V is known as the state-value function. Similarly, the equation can be defined for the combination of a state and action pair, as:

$Q^{\pi}(s,a) = R(s) + \gamma \sum_{s'} T(s, a, s')\, Q^{\pi}\big(s', \pi(s')\big)$   (Eqn 3.4)

where:
  $Q^{\pi}(s,a)$ = the value of taking action $a$ in state $s$ and following policy $\pi$ thereafter

This form of the equation is generally known as an action-value equation, and the function Q is therefore called an action-value function (Sutton and Barto 1998; Russell and Norvig 2003).

Value functions are actively improved through a process known as an update.

This process involves an update to the value for each state or state-action pair using the

Bellman update equation, calculated as:

$V_{k+1}^{\pi}(s) \leftarrow R(s) + \gamma \sum_{s'} T\big(s, \pi(s), s'\big)\, V_{k}^{\pi}(s')$   (Eqn 3.5)

or:

$Q_{k+1}^{\pi}(s,a) \leftarrow R(s) + \gamma \sum_{s'} T(s, a, s')\, Q_{k}^{\pi}\big(s', \pi(s')\big)$   (Eqn 3.6)

The updating process is also known as a backup, and can be represented through a backup diagram as shown in Figure 3.4. The backup diagrams illustrate the relationship of a state to its successor states (or state-action pairs).

Figure 3.4 – Basic Bellman backup diagrams (left: the state-value backup for $V^{\pi}$; right: the action-value backup for $Q^{\pi}$)

Equations 3.5, 3.6, and Figure 3.4 illustrate the Bellman equation and backup diagram for an agent following a single policy. It is possible to discover the optimal policy using those equations through a process known as policy iteration. Policy iteration involves two sub-processes, policy evaluation and policy improvement. The policy

evaluation process solves Equation 3.5 or 3.6 through a series of iterations in order to find a value function for all states under the current policy. Once the iterations have converged to a reasonable limit, the policy improvement process then calculates a new policy based on the principle of Maximum Expected Utility by selecting an optimal action for each state that will arrive at the successor state with the maximum value. This is also called a one-step look-ahead. These two processes work iteratively to converge to an overall optimal policy (Sutton and Barto 1998; Russell and Norvig 2003).

An alternative solution for finding the optimal policy is possible by rewriting the

Bellman equation to optimize over all policies. This is known as the Bellman optimality equation, written as:

$V^{*}(s) = R(s) + \gamma \max_{a} \sum_{s'} T(s, a, s')\, V^{*}(s')$   (Eqn 3.7)

where:
  $V^{*}(s)$ = the value of state $s$ under the optimal policy

or:

$Q^{*}(s,a) = R(s) + \gamma \sum_{s'} T(s, a, s') \max_{a'} Q^{*}(s', a')$   (Eqn 3.8)

where:
  $Q^{*}(s,a)$ = the value of taking action $a$ in state $s$ under the optimal policy

This equation can be turned into an update equation in order to iteratively solve the equation by relating the value of each state or state-action pair to that of its neighbors, calculated as:

$V_{k+1}(s) \leftarrow R(s) + \gamma \max_{a} \sum_{s'} T(s, a, s')\, V_{k}(s')$   (Eqn 3.9)

or:

$Q_{k+1}(s,a) \leftarrow R(s) + \gamma \sum_{s'} T(s, a, s') \max_{a'} Q_{k}(s', a')$   (Eqn 3.10)

The corresponding backup diagrams for these optimal policies are shown in Figure 3.5.

Figure 3.5 – Bellman optimality backup diagrams (state-value and action-value forms, with an arc indicating the "max" operation over actions)

It is necessary for the reinforcement learning agent to use iteration to solve the overall value function due to the nonlinear nature of the “max” operator in the second half of the Bellman equations shown in Eqn 3.9 and 3.10. Because of the iterative nature of the process, the correct values for each state or state-action pair are discovered only within a user-specified limit of convergence. This approach is known as value iteration.

Each sweep of value iteration effectively combines a single iteration of policy evaluation with a step of policy improvement, in contrast to the full policy evaluation used in the policy iteration approach (Sutton and Barto 1998; Russell and Norvig 2003).
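
To make the value iteration procedure concrete, the sketch below (Python, for illustration only) applies Equation 3.9 to a small invented two-state, two-action MDP and then extracts a greedy policy with a one-step look-ahead. Policy iteration differs in that it alternates full policy evaluation sweeps (Equation 3.5) with greedy policy improvement. The states, actions, rewards, and transition probabilities are hypothetical and are not taken from the case study.

    # Hypothetical transition model T[(s, a)] = {s': probability}
    T = {
        ("low", "hold"):     {"low": 0.8, "high": 0.2},
        ("low", "release"):  {"low": 0.9, "high": 0.1},
        ("high", "hold"):    {"low": 0.1, "high": 0.9},
        ("high", "release"): {"low": 0.5, "high": 0.5},
    }
    R = {"low": -1.0, "high": 0.0}      # reward for being in each state
    gamma, theta = 0.9, 1e-6            # discount rate, convergence limit

    V = {s: 0.0 for s in R}
    while True:
        delta = 0.0
        for s in V:
            # Eqn 3.9: back up each state from its successors, maximizing over actions
            best = max(R[s] + gamma * sum(p * V[s2] for s2, p in T[(s, a)].items())
                       for a in ("hold", "release"))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:               # user-specified limit of convergence
            break

    # Optimal policy extracted by a one-step look-ahead on the converged values
    policy = {s: max(("hold", "release"),
                     key=lambda a: R[s] + gamma * sum(p * V[s2]
                                                      for s2, p in T[(s, a)].items()))
              for s in V}
    print(V, policy)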

It is possible to use hybrid methods to achieve faster convergence to an optimal policy. Some methods use a number of policy evaluation steps between each policy improvement step. This type of method is known as modified policy iteration. It is also possible to achieve optimality by conducting policy evaluation and policy improvement as independent processes, with each gaining information from the other but not necessarily in a strict sequential process as with policy iteration or value iteration. This type of method is known as asynchronous policy iteration. The overall notion of policy

evaluation and policy improvement processes interacting in some fashion to find an optimal solution to an MDP is known as generalized policy iteration (GPI) (Sutton and

Barto 1998). The various policy iteration approaches are sometimes implemented using methods known as “actor-critic methods”. These methods essentially store the policy and value functions separately. The policy function is used for action selection, forming the

“actor” of the pair. Meanwhile, the value function is used as a “critic” to evaluate the actions taken. A variety of actor-critic methods exist, and there can be advantages to implementing GPI using an actor-critic system, including minimizing the computational effort necessary for action selection (Sutton and Barto 1998; Szepesvári 2010).

The methods described thus far are generally known as dynamic programming

(DP) solution methods, but they form the basis for solving the reinforcement learning problem. The DP methods described all rely on developing estimates of the value of each state or state-action pair based on estimates of neighboring states, a property known as

“bootstrapping.” They suffer from a problem known as the “curse of dimensionality,” whereby the difficulty in solving the equations grows exponentially with the number of possible states, but with the aid of modern computing technology they are often feasible and sometimes one of the best ways to solve an MDP (Sutton and Barto 1998; Russell and Norvig 2003).

In reinforcement learning, it is not necessary for an agent to have perfect knowledge of the environment, or in other words, for the Markovian transition model T to be fully specified in equations 3.3-3.10. This deviation from the classical DP methods described above is generally made by allowing the agent to learn the state or state-action value functions by interactive experience. In one form of this solution method, the agent

learns the value function for a state or state-action pair by averaging the future returns experienced each time it encounters the state or takes a certain action from a state. The methods that provide this type of solution are generally called "direct utility estimation" or Monte Carlo (MC) methods, and provide distinct advantages over classical DP methods. As previously noted, this solution does not require a transition model

(completely specified model of the environment). The methods can be used with either real environmental experience, or alternatively with simulation or sample models, which are sometimes more readily available than complete transition models. These methods can be focused on a narrower group of states within the entire domain of possible states

(state space), providing computational advantages by eliminating the need to completely solve an MDP for the entire state space. Finally, these methods calculate state or state-action values by averaging actual future returns, and therefore do not rely on bootstrapping as with classical DP methods. This provides the methods with some flexibility in dealing with problems where adherence to the Markov property is not necessarily strictly enforced (Sutton and Barto 1998; Russell and Norvig 2003).

Generally, MC methods require that there be some sort of end state that is reached, making the learning environment episodic. In this way, an agent may follow a policy π, and once the end of an episode is reached, the observed returns can be credited to each state (or state-action pair) the agent passed through. By doing this many times, the average returns will eventually converge to the expected value for the state (or state-action pair), carrying out the policy evaluation step of GPI. By maintaining a certain amount of exploration of non-greedy actions, MC methods can also carry out the policy improvement step, completing the requirements for GPI and eventually converging to an

optimal policy. It is also possible to use MC methods to gradually evaluate the usefulness of a policy that is not being followed by an agent, thereby actively searching for potentially optimal policies while following a reasonable policy. This is known as an

"off-policy" method, whereby the agent keeps track of the values of state-action pairs where an occasional non-greedy action was taken, while generally following a greedy policy. A simple MC update equation is calculated as:

$V(s_t) \leftarrow V(s_t) + \alpha \big[ R_t - V(s_t) \big]$   (Eqn 3.11)

where:
  $s_t$ = the state visited at time $t$
  $R_t$ = the actual return (total discounted reward) observed following time $t$
  $\alpha$ = the step size parameter

The backup diagram for on-policy MC methods is shown in Figure 3.6.

Figure 3.6 – Monte Carlo backup diagram (the backup extends from a state through the full trajectory to the end of the episode)
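
The sketch below illustrates the constant-α Monte Carlo update of Equation 3.11 applied at the end of a single episode (Python, illustrative only; the episode data are invented).

    def mc_episode_update(V, episode, alpha=0.1, gamma=0.95):
        # Every-visit constant-alpha Monte Carlo update (Eqn 3.11).
        # episode: list of (state, reward) pairs, where reward follows the state.
        G = 0.0
        for state, reward in reversed(episode):
            G = reward + gamma * G          # return observed from this state onward
            v = V.get(state, 0.0)
            V[state] = v + alpha * (G - v)  # Eqn 3.11
        return V

    V = {}
    episode = [("s1", 0.0), ("s2", 0.0), ("s3", 10.0)]   # hypothetical episode
    print(mc_episode_update(V, episode))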

Another set of reinforcement learning methods that employ interactive experience with the environment is known as Temporal Difference (TD) learning methods. TD learning is similar to MC learning because examples from the environment or a simulation model are used to estimate the value of states or state-action pairs, but the requirement to reach the end of an episode before updating the value is relaxed by allowing the update to occur based on the value of the following state. In this sense, TD learning uses bootstrapping in a manner similar to DP methods, but maintains the benefits of learning from experience as with MC methods. The update equation for a simple one-step TD method is calculated as:

$V(s_t) \leftarrow V(s_t) + \alpha \big[ r_{t+1} + \gamma V(s_{t+1}) - V(s_t) \big]$   (Eqn 3.12)

where:
  $r_{t+1}$ = the reward received on the transition from state $s_t$ to state $s_{t+1}$
  $\gamma$ = the discount rate
  $\alpha$ = the step size parameter

The TD equation shown above can be converted to an action-value function by simply replacing the state-value terms with action-value terms, as:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma Q(s_{t+1}, a_{t+1}) - Q(s_t, a_t) \big]$   (Eqn 3.13)

The basic backup diagram for a one-step TD algorithm is shown in Figure 3.7.

Figure 3.7 – TD(0) backup diagram

Like MC learning, TD methods have an advantage over classical DP methods because they do not require a complete transition model of the environment, can work with the actual environment or simulation models, and do not require a full solution for the entire state space. On the other hand, the fact that TD methods use bootstrapping and therefore do not require completion of a full episodic task sometimes gives these methods an advantage over MC methods, particularly on tasks with longer episodes. Both TD and

MC methods have been shown to converge to an optimal solution, and thus depending on the learning problem being addressed, one or the other can prove to be the more efficient solution (Sutton and Barto 1998; Russell and Norvig 2003).

Like MC and DP methods, TD methods can also be used to improve agent control with the GPI paradigm, first evaluating the state-value or action-value functions, then improving the agent’s policy by making it greedy with respect to those value functions.

As with MC methods, TD learning can be used while following a specific policy (on-policy) or, alternatively, based on an off-policy algorithm. A widely used on-policy method is known as "SARSA." This is essentially a one-step TD control method. The name stems from the sequence of events in each of the agent's transitions from one state-action pair to another (state, action, reward, state, action). The basic update equation for this method is provided in Equation 3.13. Alternatively, a popular off-policy method is known as "Q-learning," in which the agent learns the optimal action-value function or "Q function" as it progresses through the state space, regardless of the policy the agent is following. This process is effectively carried out by evaluating each state-action pair's value (through bootstrapping) based on the value of the succeeding state-action pair that would be encountered if an optimal action was chosen, even if the agent's current policy dictates that it will not currently follow that action. The one-step Q-learning update equation is calculated as:

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \big[ r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \big]$   (Eqn 3.14)

The Q-learning backup diagram is shown in Figure 3.8. The arc across the three possible actions represents the “max” function, selecting the action that results in reaching the next state-action pair with the highest value.

Figure 3.8 – Q-learning backup diagram (the backup runs from a state-action pair through the reward to the next state, with an arc across the possible next actions indicating the "max" operation)
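
The two one-step updates of Equations 3.13 and 3.14 can be written side by side as follows (Python, illustrative only; Q is a dictionary keyed by state-action pairs, unvisited pairs default to zero, and actions(s) is a hypothetical helper returning the actions available in state s).

    def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.9):
        # On-policy update (Eqn 3.13): bootstraps from the action actually selected next
        target = r + gamma * Q.get((s_next, a_next), 0.0)
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (target - Q.get((s, a), 0.0))

    def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
        # Off-policy update (Eqn 3.14): bootstraps from the best next action,
        # regardless of the action the current policy will actually take
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions(s_next))
        Q[(s, a)] = Q.get((s, a), 0.0) + alpha * (r + gamma * best_next - Q.get((s, a), 0.0))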

MC and TD methods share the ability to learn from interactive experience, but differ in their use of sample returns and bootstrapping. As presented thus far, MC methods must wait to process an update until the end of an episode based on all future returns, whereas TD methods update values on each timestep using bootstrapping. It is possible to utilize a hybrid approach by using actual returns from several future timesteps as provided for in MC methods, but then applying bootstrapping to the subsequent timestep in order to gain information on returns from that point onward. This approach is known as an n-Step TD method, with the update equation calculated as:

$V(s_t) \leftarrow V(s_t) + \alpha \big[ R_t^{(n)} - V(s_t) \big]$   (Eqn 3.15)

where:
  $R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{n-1} r_{t+n} + \gamma^{n} V(s_{t+n})$ = the n-step return

It is further possible to hybridize the n-Step TD methods by obtaining an update that is a weighted average of several different n-Step TD equations. As long as all of the weights sum to 1, any different combination of n-Step TD equations can be used to update a value. One particular algorithm for achieving this hybridization is known as the

TD(λ) algorithm. This method weights each n-Step return by $\lambda^{n-1}$ (with $0 \le \lambda \le 1$), and normalizes the entire return with a factor of $1 - \lambda$ (to ensure all weights sum to 1) in order to obtain an average of each n-Step return with a decaying weight for increasing n

(Sutton and Barto 1998). The resulting update equation is calculated as:

$V(s_t) \leftarrow V(s_t) + \alpha \big[ R_t^{\lambda} - V(s_t) \big], \qquad R_t^{\lambda} = (1 - \lambda) \sum_{n=1}^{\infty} \lambda^{n-1} R_t^{(n)}$   (Eqn 3.16)

where:
  $R_t^{\lambda}$ = the λ-return, the weighted average of the n-step returns
  $\lambda$ = the trace-decay parameter ($0 \le \lambda \le 1$)

The TD(λ) algorithm provides a simple mechanism to bridge the entire spectrum of possibilities from a decaying MC method by using λ = 1, to the classic one-step TD algorithm described previously by using λ = 0. The algorithm also gives rise to an implementation mechanism known as the “eligibility trace”. An eligibility trace is a parameter that defines which recently visited states are eligible to have their values updated, and the weight of the update. The eligibility trace is calculated as:

$e_t(s) = \begin{cases} \gamma \lambda\, e_{t-1}(s) & \text{if } s \ne s_t \\ \gamma \lambda\, e_{t-1}(s) + 1 & \text{if } s = s_t \end{cases}$   (Eqn 3.17)

The eligibility trace is then used in the update equation, calculated as:

$V(s) \leftarrow V(s) + \alpha\, \delta_t\, e_t(s) \quad \text{for all } s, \qquad \text{where } \delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)$   (Eqn 3.18)

As calculated above, each time a state is visited, its eligibility trace is increased by a value of 1 and subsequently fades away with the amount of time that the state is not visited. A slight variation of this algorithm resets the trace to a value of 1 rather than increasing the value, thereby preventing the trace from becoming too strong. This variation is known as a “replacing trace” method, whereas the former is known as an

“accumulating trace” method. The use of eligibility traces provides a more efficient mechanism for implementing the TD(λ) algorithm, and thus both the classical one-step

TD method as well as the complete MC method. Due to the fact that eligibility traces create a sort of “blend” between TD and MC methods, they provide many of the benefits from both methods. They are most beneficial in tasks that violate the Markov property or have long delays before a substantial reward is earned, and can often provide a faster learning mechanism (Sutton and Barto 1998).
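
A single TD(λ) backup combining Equations 3.17 and 3.18 can be sketched as follows (Python, illustrative only), with a flag selecting between the accumulating and replacing trace variants described above.

    def td_lambda_step(V, e, s, r, s_next, alpha=0.1, gamma=0.9, lam=0.8,
                       replacing=False):
        # Temporal-difference error for the current transition
        delta = r + gamma * V.get(s_next, 0.0) - V.get(s, 0.0)

        # Eqn 3.17: decay all traces, then bump (or reset) the visited state's trace
        for state in e:
            e[state] *= gamma * lam
        e[s] = 1.0 if replacing else e.get(s, 0.0) + 1.0

        # Eqn 3.18: update every state in proportion to its eligibility
        for state, trace in e.items():
            V[state] = V.get(state, 0.0) + alpha * delta * trace
        return V, e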

The eligibility trace concept can also be extended to include agent control under

GPI. In the most straightforward case, eligibility traces can be applied to the SARSA algorithm by simply switching Equation 3.18 to apply to state-action pairs instead of just the state-value function. The rest of the GPI algorithm is then applied as previously described. Eligibility traces can also be applied to Q-learning through several mechanisms. These mechanisms generally focus on resetting all eligibility traces to 0 whenever a non-greedy action is selected, or alternatively, using the non-greedy selection as the final “nth-step” return (Sutton and Barto 1998).

All of the reinforcement learning methods and algorithms discussed to this point focus on generating values for discrete states or state-action pairs. This can become computationally inefficient in the case of large state spaces, potentially making the solution for these cases intractable. In these cases it is useful to generalize the values for all states or state-action pairs by approximating with a continuous function, creating an actual value function rather than a table of values. The creation of a value function is a supervised learning task, using each update to a value as an added point to the training set supplied to the supervised learning algorithm. Most of the common supervised learning methods will work for this task, with gradient-descent methods applied to artificial neural networks or linear regression being two popular alternatives. Agent control under GPI has been shown to work reasonably well with function approximation methods, using similar techniques to those already developed. There are some known issues with the use of this type of system, and research in this arena is ongoing (Sutton and Barto 1998).
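
One simple instance of this function approximation idea is semi-gradient TD(0) with a linear approximator, sketched below (Python, illustrative only; the feature function and the state variables it reads are hypothetical placeholders). A neural network approximator follows the same pattern, treating each updated value as a new training example.

    def features(state):
        # Hypothetical feature vector for a state (e.g., normalized storage,
        # predicted water quality, a season indicator); replace as appropriate.
        return [1.0] + [float(state.get(k, 0.0)) for k in ("storage", "quality", "season")]

    def v_hat(weights, state):
        # Linear value estimate: dot product of weights and features
        return sum(w * x for w, x in zip(weights, features(state)))

    def semi_gradient_td0(weights, s, r, s_next, alpha=0.01, gamma=0.9):
        # Move the weights toward the bootstrapped TD target
        delta = r + gamma * v_hat(weights, s_next) - v_hat(weights, s)
        return [w + alpha * delta * x for w, x in zip(weights, features(s))]

    weights = [0.0, 0.0, 0.0, 0.0]
    s = {"storage": 0.4, "quality": 0.7, "season": 1.0}
    s_next = {"storage": 0.35, "quality": 0.6, "season": 1.0}
    weights = semi_gradient_td0(weights, s, -2.0, s_next)
    print(weights)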

As outlined previously, DP methods generally require a model of the environment, whereas MC and TD methods learn more directly from experience. Model- based methods like DP are generally referred to as planning methods, or indirect reinforcement learning methods, due to the fact that they are able to plan independently of gaining new experience and do not directly learn the utility of states from experience.

In these methods, learning can be applied directly to the model of the environment which is then used to plan or update utility information. One example of this using classic DP methods is known as adaptive dynamic programming (ADP). Methods not based on models like MC and TD are more commonly known as learning methods, or direct reinforcement learning methods. They learn utility information and policy directly from

experience, but do not independently plan in the absence of new experience. It is possible to combine the use of both learning and planning methods into a single agent, directly using experience for update and action, while learning a model of the environment and using that model for planning purposes simultaneously to improve agent performance. There also exist different planning strategies and backup strategies to improve performance of both the direct and indirect reinforcement learning methods used in such a hybridized agent. Generally speaking, combinations and hybridizations are possible with most of the different methods and algorithms presented thus far, providing the ability to improve agent performance for a particular task environment (Sutton and

Barto 1998; Russell and Norvig 2003).

3.2 – Development of an Intelligent Agent for Water Quality Management

3.2.1 – Background and Assumptions

The concepts discussed in Section 3.1 of this document were used to develop an intelligent agent structure to address issues related to water quality management. The development of that agent is detailed in this section, and application of that agent structure to the case study on the Truckee River is covered in the following chapter. For the development of this intelligent agent, it is assumed that the agent will have the ability to store and release some limited amount of water that can be dedicated to instream flow.

The agent’s primary focus will be on the management of this specific instream flow water within the context of the broader operation of the river and reservoir system, with the

agent taking into consideration the effects of other reservoir operations for other objectives (water supply, hydropower, etc.) on water quality as well. Because the water controlled by the agent will be used to augment the water released for other objectives, it is assumed that flow augmentation does have an effect on the water quality issues of interest to the agent. The agent is designed to consider one or more water quality objectives at one or more points of interest within the river downstream of the reservoir or reservoirs being considered.

3.2.2 – Agent Specification

As previously discussed, the agent receives state and reward information from its environment, and acts on that information based on an agent function that is implemented through an agent program on the agent’s architecture. For the development of the water quality rational agent, the state information from the environment is limited to inputs on any given timestep which define the state of the environment that the agent should be concerned with, depending on the overall goals of the agent. For an agent concerned with the immediate improvement of water quality in general, or improving water quality once the water quality has exceeded a preferred threshold for the river, the state inputs include the amount of dedicated water in storage in reservoirs that the agent currently has available to utilize for improvement of water quality, and the predicted water quality at some downstream control point for the current timestep if no additional releases are made by the agent (in other words, the downstream water quality based on releases made only for other objectives such as water supply, hydropower, etc.). For an agent concerned

with not only the general improvement of water quality, but also avoiding exceeding a certain threshold for a certain period of time, the state inputs would include those outlined above (storage of dedicated water and predicted downstream water quality), and also a signal to indicate that the threshold will have been exceeded for more than the desired length of time unless some action is taken. For an agent concerned with longer-term planning for the general improvement of water quality over a number of years, the state inputs would not necessarily include information about the immediate state of downstream water quality, but rather would include more general information about the overall state of the river and reservoir system, and would generally only be provided to the agent on an annual or seasonal basis. That information would include the amount of dedicated water the agent has to use for a season, and some indicator of the likely flow augmentation needs for the season.

The agent is also provided the reward calculation from the action taken by the agent in the previous timestep, so that it may update its knowledge of the usefulness of that previous action. In the case of an agent focused on the longer-term outcome, the reward would be calculated and provided to the agent on a seasonal or annual basis. The action taken by the agent on the environment is a direction to the environment regarding the amount of additional water to release into the river for water quality augmentation on top of the water already released for other purposes. Alternatively, for an agent with a longer-term focus, the action would be the specification of the overall daily release policy to be used for the entire upcoming season or year. For the water quality agent, the most straightforward implementation is a reward function that is inversely proportional to the quality of the action taken by the agent. In other words, this is a reward function that

focuses on punishment values for not meeting the goals, and one which should be minimized to achieve improved actions. This type of implementation provides for the development of a reward function that increases with increasing levels of violation of water quality standards or goals.

The agent’s policy function (agent function or agent program) is a mapping of the current state of the system to its next action, where the current state of the system from the point of view of the agent is defined as the current amount of dedicated water for water quality improvement available in storage, and depending on the agent’s focus, some combination of the predicted water quality in the river downstream assuming no additional water released by the agent, the threshold exceedence signals, and/or some indication of the flow augmentation needs in the near future. Two types of agent policy functions were developed as part of the system implementation. One agent policy function is a tabular implementation, providing a direct lookup mechanism for the agent to select a release or policy selection action based on current dedicated water available in storage, predicted water quality at some downstream point, threshold exceedence signals, and/or flow augmentation requirement indicators. The second is a neural network which allows the agent to calculate the approximate action that should be taken given the current state of the environment. The user has the ability to select which type of agent policy function is to be used for a particular run.

The tabular case can be used to verify proper operation of the environment simulation model, and proper data connections between the agent and the environment simulation model. It is also used to evaluate operation of the reward signal being specified from the environment model, and to evaluate the usefulness of the reward

function in providing information to the agent that will aid in the learning and discovery of improved action policies. The tabular case provides an excellent tool for testing and debugging of the system, since exact values and calculations are used for each computational operation of the agent. In addition, its use is efficient and appropriate when the state and action space are small to moderately sized, and can be reasonably discretized into meaningful partitions, also called "bins" or "tiles".
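
The tabular agent function described above amounts to discretizing each state variable into bins and looking up an action; a minimal sketch of that lookup follows (Python, illustrative only; the bin edges, state variables, and release values are invented and do not reflect the case study).

    import bisect

    # Hypothetical bin edges for two state variables
    STORAGE_BINS = [5000.0, 10000.0, 20000.0]      # dedicated storage, acre-feet
    QUALITY_BINS = [15.0, 20.0, 25.0]              # predicted water quality metric

    def discretize(value, edges):
        # Index of the bin ("tile") the value falls into
        return bisect.bisect_left(edges, value)

    def tabular_action(policy_table, storage, predicted_quality, default=0.0):
        key = (discretize(storage, STORAGE_BINS),
               discretize(predicted_quality, QUALITY_BINS))
        return policy_table.get(key, default)      # release action, cfs

    policy_table = {(2, 3): 175.0, (1, 3): 75.0}   # example entries only
    print(tabular_action(policy_table, 12500.0, 26.0))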

The neural network case can be used to aid in the generalization of the agent policy function to deal with all possible specifications of the environment. It allows the state and action signals to essentially be represented in a continuous manner, rather than requiring discretization of the state variables. In addition, it can be a tool to mitigate against the “curse of dimensionality” (Sutton and Barto 1998) associated with the use of the tabular case by serving as an interpolator of the complex agent policy function. This dimensionality problem arises from the fact that as the number of possible states or actions available to the agent increase, the memory requirements and computing intensity rapidly increase in a multiplicative manner, creating performance issues for the agent. As the number of state variables increases, the problem becomes exponential and sometimes causes problems to become intractable and practically unsolvable using conventional computing technology. The neural network case provides the flexibility to deal with the potential for an increased number of state variables and potential actions without the full impact of the dimensionality problem. This allows the system to adapt to a larger number of potential water quality and reservoir system state variables, making the system flexible to deal with different water quality and river/reservoir simulation models or real-time

water quality and river/reservoir system inputs. A diagram of the overall water quality agent is shown in Figure 3.9.

Figure 3.9 – Diagram of the water quality intelligent agent (the agent function and action-value function, each stored as a policy table or neural network, receive dedicated storage, water quality, system, and reward information from the environment or environment model, and return actions in the form of a reservoir release or a seasonal release policy)

3.2.3 – Learning Methods

The water quality agent was designed as a decision-theoretic agent as previously defined. It operates based on the principle of Maximum Expected Utility, attempting to make rational decisions based on its knowledge of the expected rewards and overall utility of the state it is in and the actions it takes. This is due to the addition of a learning element based on the concept of reinforcement learning. Using this concept and methodology, the agent takes into account the tradeoff between immediate reward and future reward, and attempts to take actions that will produce the best overall results

considering the impacts on long-term reward rather than simply maximizing immediate reward.

The water quality agent was designed to provide several learning method options to a user. The agent can utilize the one-step TD method known as SARSA, as discussed in detail in Section 3.1.3, working with an action-value function as calculated by

Equation 3.13. The agent was constructed for the control case using GPI as described in

Section 3.1.3. Alternatively, the agent can utilize the one-step TD method known as Q-learning, as discussed in detail in Section 3.1.3, working with an action-value function as calculated in Equation 3.14, using GPI as well. The agent was also designed to optionally expand to the n-Step versions of both SARSA and Q-learning through the use of eligibility traces as shown in Equations 3.17 and 3.18, substituting the values for state-action pairs instead of the state-value. Options were implemented to allow for either the

“replacing trace” or the “accumulating trace” methods of the eligibility trace concept.

For the Q-learning version of the eligibility trace mechanism, an algorithm known as

"Watkins's Q(λ)" was utilized (Sutton and Barto 1998). Using this algorithm, the eligibility traces are calculated as previously discussed, but are set to zero whenever a non-greedy action is taken.

The policy improvement process of GPI for the agent was designed to provide improvements at every policy evaluation step, as followed in a value iteration approach.

Alternatively, the policy improvement process may occur at user-specified intervals, allowing the user to adjust the interval, as generally followed in a modified policy iteration scheme (Russell and Norvig 2003). Additionally, the user may elect to complete policy improvement after the action-value function has stabilized using the current

policy. To do this, the user may specify that policy improvement will occur only when all recent policy evaluation calculations have changed the action-value function by less than a certain threshold percentage. In general, the agent's design is based on the concept of asynchronous policy iteration, allowing for the selection of any specific type of desired policy iteration.

The system is designed to allow adjustment of the step size parameter and discount rate for the update equation as presented in Equation 3.13. In addition, options were introduced to allow either a constant step size parameter, or one that reduces with increased experience. A variety of reduction algorithms have been evaluated in the literature, although none were identified as the best for implementation (Sutton and Barto

1998; Szepesvári 2010). For the water quality agent, an equation similar to the generalized form noted by Szepesvári (2010) was used to provide a reducing step size parameter. That equation reduces the step size as follows:

(Eqn 3.18)

where:

For any choice of reduction parameter greater than zero, Equation 3.18 results in a step size parameter that asymptotically approaches zero. A higher choice of reduction parameter results in a more rapid decrease in step size, effectively placing a higher weight on earlier training experiences in the development of the action-value function.

The reduced step size parameter is specific to each state and action, to ensure that significant updates are appropriately applied to states and actions that are not frequently

experienced. The effects of the use of different reduction parameter values are shown in

Figure 3.10.

Figure 3.10 – Reducing step size parameter options (step size parameter versus number of updates, for reduction parameter values of 1, 0.75, 0.5, and 0.25)
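
As a hedged illustration of a per-state-and-action step size that decays with the number of updates, the sketch below (Python) uses a generic inverse-power decay. The specific functional form and the reduction parameter value shown here are assumptions for illustration only and are not necessarily the exact equation used by the agent.

    def reduced_step_size(update_counts, s, a, alpha0=1.0, kappa=0.5):
        # Count how many times this particular state-action pair has been updated,
        # so rarely visited pairs keep receiving relatively large updates
        n = update_counts.get((s, a), 0) + 1
        update_counts[(s, a)] = n
        # Assumed generic decay: the step size asymptotically approaches zero as n
        # grows; a larger kappa decays faster, weighting early experience more heavily
        return alpha0 / (n ** kappa)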

3.2.4 – Exploration Methods

The water quality agent was designed to take advantage of several different methods to balance exploration against exploitation of learned knowledge during the development of the action-value function and the agent's policy function. These methods include the ε-greedy, softmax, and optimism in the face of uncertainty (OFU) methods, as discussed in Section 3.1.3 (Sutton and Barto 1998; Russell and Norvig 2003; Szepesvári 2010).

The ε-greedy method generally follows the current policy of the agent, selecting an exploratory action with probability ε, where 0 < ε < 1. For the case of a relatively stationary environment, the water quality agent design also allows for a gradually reducing value of ε, which provides for more stable agent behavior after the agent has sufficiently explored the state and action space (Sutton and Barto 1998;

Szepesvári 2010). The reduction of the probability of an exploratory action is calculated as:

(Eqn 3.19)

where:

For higher choices of the reduction parameter, the likelihood of exploration decays more rapidly, asymptotically approaching zero. The reduced likelihood of exploration is specific to each state, ensuring adequate exploration actions are taken in states that are rarely experienced. The reduction in the probability of exploration associated with various selections of the reduction parameter ζ is shown in Figure 3.11.

Figure 3.11 – Reduction in the probability of exploration associated with various reduction parameter values (1, 0.5, and 0.25), beginning at an exploration rate ε = 10%

The softmax (or Boltzmann) method of exploration is actually an action selection method, technically superseding the need for an explicit agent function or policy during training. However, the water quality agent still produces an agent policy, which is used for testing agent behavior, and for agent operation in the actual environment. The softmax method selects actions based on their action-value estimates. In this manner, less desirable actions are generally selected less often than superior actions. Using the softmax method, actions are selected with the probability:

(Eqn 3.20)

where:


The degree to which less desirable actions are selected is determined through the value selected for the “temperature parameter”. As τ approaches zero, less desirable actions are selected less often until only the most desirable action is selected, representing the final agent policy function. Because of this, τ can be reduced with increased agent experience to prevent the agent from taking actions that are found to be undesirable through experience. The water quality agent provides for the reduction of τ in a similar manner to the reduction of ε in the ε-greedy method, as:

(Eqn 3.21)

where:

The reduction in the temperature parameter τ associated with various selections of the reduction parameter ζ is shown in Figure 3.12.

Figure 3.12 – Reduction in the temperature parameter τ associated with various reduction parameter values (1, 0.5, and 0.25), beginning at a temperature τ = 75

Through experimentation with the case study summarized in the following chapter, it was found that a reasonable initial value for the temperature parameter τ is approximately one half of the maximum expected action-value. An estimate for the maximum expected action-value is developed later in this section, as shown below in

Equation 3.28. Based on that selection mechanism for an initial value, the progression of action selection under a softmax scheme is shown in Figure 3.13 for actions taken in a particular state with a maximum expected action-value of 150. The figure shows that as the temperature parameter τ is reduced with increased experience, the probability of selection of the action with the best action-value is increased, reaching 100 percent in the limit, while the probabilities of selecting actions with poorer action-values decreases with

increased experience. For this example, the release action associated with the best action-value is 175 cfs.

Figure 3.13 – Example of the progression of action selection probabilities with decreasing temperature parameter values (probability of selection versus water release action in cfs, for τ values of 75, 50, 25, 10, 5, and 1)
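
A sketch of softmax (Boltzmann) action selection with a decaying temperature follows (Python, illustrative only). Because the water quality agent treats lower action-values as better (they accumulate penalties), the sketch negates the action-values inside the exponential; that sign convention, the candidate release actions, the state name, and the example values are assumptions for illustration and do not reproduce the agent's exact equation.

    import math
    import random

    def softmax_select(q_values, state, actions, tau):
        # Lower Q is better for this (minimizing) agent, so exponentiate -Q/tau;
        # subtracting the maximum preference improves numerical stability.
        prefs = [-q_values.get((state, a), 0.0) / tau for a in actions]
        m = max(prefs)
        weights = [math.exp(p - m) for p in prefs]
        total = sum(weights)
        probs = [w / total for w in weights]
        return random.choices(actions, weights=probs, k=1)[0]

    actions = [0.0, 70.0, 140.0, 175.0, 280.0]        # hypothetical releases (cfs)
    q = {("july_low_flow", 175.0): 40.0, ("july_low_flow", 0.0): 140.0}
    for tau in (75.0, 25.0, 5.0, 1.0):                # decreasing temperature
        print(tau, softmax_select(q, "july_low_flow", actions, tau))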

Similar to the softmax method, the OFU method represents an action selection policy, and it likewise attempts to avoid undesirable actions. Some of the primary differences between the two methods are that OFU does not require a user-specified temperature parameter, and OFU focuses its attention on the mean experienced reward of an action rather than the estimated action-value. In addition, OFU takes into consideration the amount of experience the agent has with particular actions, providing a higher likelihood of action selection for actions that have not been tested as much as other actions. OFU generally selects actions based on the best upper confidence bound in the reward estimate. Since the water quality agent is designed to prefer actions with

lower rewards, OFU implementation for this agent focuses on the best lower confidence bound. The action with the lowest confidence bound on any given timestep becomes the selected action. The confidence bound for each action is calculated as (Szepesvári 2010):

(Eqn 3.22)

where:

For the water quality agent, the domain of possible rewards lies in the bounds from zero to the maximum expected reward, which can be specified by the user a priori.

Using this approach, Equation 3.22 becomes:

(Eqn 3.23)

where:

In order to prevent the need to estimate the maximum expected reward a priori and to provide an added level of protection against the selection of highly undesirable actions, an alternative option provided by the water quality agent is to replace range Rs in

Equation 3.22 with the actual difference between the maximum and minimum experienced reward for each state. This is similar in concept to methods discussed in

Szepesvári (2010). For the water quality agent, this can be beneficial by providing added protection against the agent attempting to release large quantities of

stored water through exploratory actions when it is known that those releases will likely be of minimal value to the health of the river downstream. Using this approach, Equation

3.22 becomes:

(Eqn 3.24)

where:

Through experimentation on the case study described in the following chapter, it was noted that the OFU methods worked well when the discount rate in the Bellman update equations (Equations 3.13 and 3.14) was low and the action-value estimate for any particular state-action pair was similar to the mean experienced rewards as used in the confidence bound equations used in OFU action selection (Equations 3.22 through 3.24).

However, the OFU action selection method produced less desirable action selections at higher discount rates, due to the relative disconnect between the mean immediate experienced reward and the overall action-value estimate. For this reason, the water quality agent provides the option to use the action-value estimate in place of the mean experienced rewards in Equations 3.22 through 3.24, resulting in the following confidence bound equations:

(Eqn 3.25)

(Eqn 3.26)

(Eqn 3.27)

where:
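As an illustration of this style of confidence bound, the following Python sketch assumes a Hoeffding-type bound in the spirit of Szepesvári (2010), with the confidence width scaled by the reward range and shrinking as an action accumulates visits; the exact constants may differ from Equations 3.22 through 3.27, and the function and variable names are illustrative rather than those of the actual programs.

import math

def ofu_select(action_stats, reward_range, total_visits):
    # action_stats maps each action to (mean experienced reward, visit count).
    # The agent prefers low rewards, so the action with the LOWEST lower
    # confidence bound is selected; untested actions are selected first.
    best_action, best_bound = None, float("inf")
    for action, (mean_reward, n_sa) in action_stats.items():
        if n_sa == 0:
            return action
        width = reward_range * math.sqrt(math.log(total_visits) / (2.0 * n_sa))
        lower_bound = mean_reward - width
        if lower_bound < best_bound:
            best_action, best_bound = action, lower_bound
    return best_action

Substituting the maximum expected reward, the experienced reward range for the state, or the action-value estimate for reward_range and mean_reward reproduces the spirit of the alternatives described for Equations 3.23 through 3.27.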

The maximum expected action-value found in the above equations can be estimated fairly easily with basic knowledge of the reward structure for the agent and an estimate of the maximum expected reward. If the maximum expected reward can be estimated, then the maximum expected action-value can be calculated using the discount rate as:

(Eqn 3.28)

This calculation is derived by taking the limit of the Bellman update equation as t approaches infinity, as:

(Eqn 3.29)

In Equation 3.29, the Q values for the current and successive timesteps become equal, and Equation 3.28 is derived through algebraic solution of the update equation, assuming the maximum expected reward in place of r_t+1. It should be noted that the estimate of the maximum action-value in Equation 3.28 does not apply if eligibility traces are in use.
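As a worked sketch of this derivation, assuming the standard one-step update with learning rate α and discount rate γ, setting the current and successive action-values equal gives

Q_{\max} = Q_{\max} + \alpha\left[r_{\max} + \gamma\,Q_{\max} - Q_{\max}\right]
\;\Longrightarrow\;
r_{\max} + \gamma\,Q_{\max} - Q_{\max} = 0
\;\Longrightarrow\;
Q_{\max} = \frac{r_{\max}}{1-\gamma}.

For example (with hypothetical values), a maximum expected reward of 15 and a discount rate of 0.9 give Q_max = 15/(1 − 0.9) = 150, consistent with the example action-value scale used in Figure 3.13, and one half of that value (75) would be the suggested initial temperature parameter.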

3.2.5 – Physical Implementation

The basic functionality of the water quality agent is carried out by a number of different standalone computer applications designed to independently use a centralized

set of data files. These various applications include environment model interface programs, database update programs, and neural network training programs. The programs were created within the Microsoft Visual Studio 2010 integrated development environment, using the Microsoft Visual Basic .NET programming language, and operate in a Microsoft Windows-based personal computing environment. The system of programs generally forms an actor-critic type implementation as described in Section

3.1.3.

The primary application that represents the bulk of the functionality of the water quality agent is a program called the “environment model interface” or “interface program”. The interface program acts on the model of the environment by providing the model with an action selection at each timestep based on the current state of the environment, and then conducting the policy evaluation step of GPI based on the reward signal received from the environment for the previous timestep’s action. For the action selection step, the interface program obtains the current state information from the environment model, selects an action based on either the ε-greedy, softmax, or OFU exploration methods, and provides the action back to the environment model. For the purposes of keeping track of the number of times that a particular action has been selected in a particular state and the mean experienced rewards for OFU calculations, the interface program then updates a tracking file located in the centralized file space. The interface program then conducts the policy evaluation step of GPI by calculating the action-value update in either Equation 3.13 or Equation 3.14, depending on whether the

SARSA or Q-Learning methodology is utilized. The update value is then written to an update file in the centralized data file space, along with pertinent data required for the use

of eligibility traces, and any desired debugging information. The basic functionality of the interface program is shown in Figure 3.14.

Observe state information s from environment

If exploration method is ε-Greedy:
    If reducing ε: calculate current ε
    If exploration action selected with probability ε:
        Select action a with lowest number of explorations for state s
    Else:
        Select action a from agent policy function
Else if exploration method is softmax:
    Calculate τ and generate action selection probability distribution
    Randomly select action a from action selection probability distribution
Else if exploration method is OFU:
    Calculate lower confidence bounds on mean reward/value for all actions
    Select action a with lowest confidence bound

Return action a to environment and update tracking file for state s

If reinforcement learning method is SARSA:
    Calculate update q(s,a) using Equation 3.13
Else if reinforcement learning method is Q-Learning:
    Calculate update q(s,a) using Equation 3.14

Update eligibility trace e(s,a)
Write update q(s,a) to update file
Write eligibility and debugging information to update files

Figure 3.14 – Pseudocode for the “interface program”
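Assuming Equations 3.13 and 3.14 take the standard one-step SARSA and Q-Learning forms, and recalling that the water quality agent treats the lowest action-value as best, the policy evaluation calculation performed by the interface program can be sketched in Python as follows (variable names are illustrative, not those of the Visual Basic .NET code):

def td_update(q, s, a, r, s_next, a_next, action_space, alpha, gamma, method="SARSA"):
    # q is a dictionary of action-value estimates keyed by (state, action).
    if method == "SARSA":
        target = r + gamma * q[(s_next, a_next)]                        # on-policy target
    else:  # Q-Learning
        target = r + gamma * min(q[(s_next, b)] for b in action_space)  # off-policy, best = lowest
    return q[(s, a)] + alpha * (target - q[(s, a)])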

After the interface program has completed its functions for each timestep, an action-value training data update program or “Q update program” completes the remainder of the processes associated with the policy evaluation step of GPI. The Q update program actively monitors the centralized data file space for action-value update files generated by the interface program. Its primary function is to collect those files, and assimilate the data into the action-value dataset, which represents the action-value function. For the tabular case, this dataset is the action-value function itself. For the

neural network case, this dataset represents the training data for the neural network. In addition to providing direct updates to the action-value dataset, if n-Step learning methods are being used by the agent, the Q update program conducts updates to those action-values which have an active eligibility trace. The basic functionality of the Q update program is shown in Figure 3.15.

Check file space for value update files; when one or more are present:
    Read all q(s,a) updates, and action-value table
    For each q(s,a) update:
        If multiple updates for same state s and action a: average update values
        If action-value estimate exists or is w/i threshold of s,a pair in action-value table:
            Replace action-value estimate with q(s,a) update
            Calculate change in action-value estimate
            Update tracking file with relative changes in action-value estimates
        Else:
            Insert action-value estimate into table
        If eligibility traces used:
            Calculate and update action-value estimates with active eligibility trace
            Update eligibility traces
    Write new action-value table
    Update number iterations since policy update tracking file

Figure 3.15 – Pseudocode for the “Q update program”

As the agent is actively updating the action-value function, an agent function update program or “policy update program” carries out the policy improvement step of the GPI process. The policy update program keeps track of how many action-value updates have occurred since the last policy improvement iteration, and when a user- specified threshold is reached, conducts a complete sweep of the state space, updating the agent’s policy function with the current estimate of the optimal policy. Alternatively, the policy update program has the ability to keep track of the maximum amount of relative change in any of the action-value updates within a user-specified threshold number of

recent updates, and once the maximum relative change in recent updates falls below a threshold value (percent change), the policy update is conducted. For the tabular case, this policy update action is carried out by sweeping through the entire action-value table, identifying the best (lowest) action-value for each state and using the associated action as the action policy for that state. For the neural network case, the program evaluates actions at user-specified intervals for each state that has been experienced by the agent, and selects the action at each state with the best action-value based on the action-value function computed by the action-value neural network. Those actions are then provided to the agent’s policy function neural network as training data. The basic functionality of the policy update program is shown in Figure 3.16.

If number of policy evaluations exceeds threshold, or action-value function is stable:
    If neural networks in use:
        Read action-value training data table
        For each state-action pair s,a in action-value training data:
            For each action a at user-specified intervals in user-specified range:
                Calculate action-value estimates from neural network
            Find action a associated with minimum action-value estimate
    Else:
        Read action-value data table
        For each state s in user-specified range of values:
            For each action a at user-specified intervals in user-specified range:
                Find action-value estimates from action-value data table
            Find action a associated with minimum action-value estimate
    Write new agent policy table based on actions associated with minimum action-value est.
    Reset number iterations since policy update tracking file

Figure 3.16 – Pseudocode for “policy update program”

For the neural network case, two neural network programs conduct the training necessary to estimate the action-value function and the agent function. Each of the neural network training programs uses the training data generated by the Q update program and

the policy update program to produce estimates of their respective functions. The neural networks are trained with genetic algorithms, as discussed in Section 3.1.2.1.1.

The basic functionality of each neural network program is shown in Figure 3.17.

Construct overall neural network object with vectors of layers and neurons for each population member, each with initial connection weights
Assemble array of weight vectors (one vector per population member) from neural network object
Read existing superior weight vectors (if any) into weight array
Disassemble weight vectors back into neural network object
Do while training data updates occurring or until function stabilizes:
    Read training data
    Scale input and output training data
    For each data point:
        For each genetic population member:
            For each hidden layer in neural network object:
                For each neuron:
                    Calculate activation from Equation 3.1
    Assemble array of weight vectors, calculating sum squared output error
    Calculate fitness for each weight vector
    Rank population by fitness, and assign selection probability based on fitness
    If elite segment is desired: hold elite segment out of crossover operation
    For remainder of population:
        Randomly select 2 weight vectors based on selection probability
        Perform crossover of weight vectors
        Apply mutation of weights with desired probability
    Write new file of superior weight vectors
    Disassemble weight vectors back into neural network object

Figure 3.17 – Pseudocode for “neural network programs”

Each of the different programs discussed above can operate independently and concurrently, allowing the system to take advantage of the parallel processing capabilities of many newer computing systems. This also allows the system to operate on more than one machine in a networked system, further expanding the capability of the system to allow for faster processing times. Additionally, because the basic reinforcement learning

mechanism is based on an MDP and the agents learn from incremental experience, the system can be operated with multiple instances of the interface program, each working concurrently with their own version of the task environment model on different machines or on the same machine with multiple central processing units (CPUs) to accelerate the learning process. Each of the interface programs utilizes the shared knowledge of the overall “system of agents”, and contributes to the combined learning of the system of agents. The entire process is shown in Figure 3.18.

Figure 3.18 – Diagram of implementation (environment model and interface program instances on CPUs 1 through N exchange state, reward, and action information with a networked file storage space holding the training data update files, action-value table, agent policy table, and neural network action-value and policy function weights; the training data updaters and neural network trainers run on CPU N+1)


3.2.5.1 – Tabular Implementation

For tabular implementation, the basic steps outlined above are carried out. All action-value estimates are stored in a single text file database. Similarly, all policy values are stored in a different text file database. For the action-value database, each line in the database contains the current action-value estimate for a state-action pair. The state-action pairs are specified with the action variable at the beginning of each line, followed by each state variable, all separated by commas. The action-value is found at the end of each line, separated from the action and state variables by a semicolon. The policy value database is similar, with the differences being that there is no action variable at the beginning of the line, and the final value on each line is the policy value for the state identified by the state variables in that line. The files are sorted in ascending order by each action and state variable for rapid computational searching, as well as human review and evaluation.

The various programs in the system generally obtain and replace data from the databases using a recursive binary search method, whereby the programs incrementally bisect the database to find the value associated with a particular state or state-action pair.

Using this mechanism, a particular value can be found in the database rapidly, with the worst-case number of search iterations being approximately log2(N) (rounded up), where N is the number of lines in the database. The mechanism generally allows for a value to be found or replaced in a database of 100,000 entries in 17 search iterations or fewer, and 1,000,000 entries in 20 search iterations or fewer. For particular processes within the Q-Learning mechanism, it is necessary to search

the action-value database for the optimal action and action-value. This is carried out by conducting a binary search to determine the database location (line number) of the first action for a given state, and then obtaining and evaluating the action-values for the remaining actions in that state by simply incrementing the location (database line number) by the number of possible state combinations until all actions have been evaluated.
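A minimal Python sketch of the lookup described above is shown below; it assumes the database lines are already sorted so that lexicographic comparison of the key fields matches the file's sort order, and the names are illustrative rather than those of the actual programs.

def find_action_value(lines, key):
    # Each line has the form "action,state1,state2,...;action-value".
    lo, hi = 0, len(lines) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        line_key, _, line_value = lines[mid].partition(";")
        if line_key == key:
            return float(line_value)
        elif line_key < key:
            lo = mid + 1
        else:
            hi = mid - 1
    return None  # state-action pair not yet present in the database

# A 100,000-line database is resolved in at most about 17 probes and a
# 1,000,000-line database in at most about 20 probes (log2 of N, rounded up).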

For SARSA implementation, the values used in the action-value update equation are taken directly from the centralized data file representing the action-value function.

For Q-Learning implementation, the interface program uses the mechanism described above to complete a sweep of the action-values on each timestep for the current state in order to calculate the off-policy optimal action as used in Equation 3.14. Depending on the rate of generation of the update files from the interface program, the Q update program sometimes obtains multiple update files with updates for the same state-action pair. In those cases, the mean of the updates is used to update the action-value dataset.

Since the Q update program also completes eligibility trace updates, action-value updates completed by the program are immediately returned to the action-value table so that they may be updated by successive eligibility trace updates, even if those updates are contained in the collection of update files currently being processed by the updater. Eligibility trace updates are completed by utilizing a series of eligibility trace tracking files. One file is maintained for each instance of the environment model being acted upon. In this fashion, eligibility trace updates are only completed using the action-value updates from that particular environment model instance, to ensure that reward credit is only given to state-action pairs that are in the direct sequence of events preceding them. In other words, if an action is taken in a particular state in a particular

instance of the simulated environment, its eligibility trace only becomes active for that instance of the simulated environment and it is only credited with a portion of the action-value estimates of actions taken in successive states in that particular simulated environment, not with the unrelated succession of events in any other instance of the simulated environment being used by the system concurrently. In order to maintain reasonable eligibility trace information for a limited number of state-action pairs, eligibility traces are removed from the tracking files once a particular trace value drops below a user-specified threshold.
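The per-instance bookkeeping can be sketched as follows; this is an illustrative Python fragment assuming a standard decaying trace with decay factor γλ, not the actual tracking-file implementation.

def apply_eligibility_updates(traces, q_table, delta, gamma, lam, cutoff=1e-3):
    # traces holds (state, action) -> trace value for ONE environment model
    # instance only, so credit never leaks between concurrent simulations.
    for sa in list(traces):
        q_table[sa] = q_table.get(sa, 0.0) + traces[sa] * delta  # credit earlier pairs
        traces[sa] *= gamma * lam                                # decay the trace
        if traces[sa] < cutoff:
            del traces[sa]  # drop traces below the user-specified threshold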

In order to properly carry out each of the various exploration algorithms, a system of tracking the exploration of the state and action space was developed. This system maintains a database of the number of times each action is taken in each possible state, as well as the mean rewards for an action taken in any particular state. The system provides for proper calculation of the decay of ε-greedy and softmax exploration using Equations

3.19 through 3.21, as well as the calculation of the confidence bounds required by OFU exploration in Equations 3.23 through 3.27. The system utilizes separate tracking files for each possible state, with each file containing a line for each possible action, storing the number of times the action has been explored, and the mean rewards experienced. By using separate files for each possible state, a very large number of files are generated.

However, the amount of time and computing memory required to store and retrieve the data is greatly reduced compared to that required if the entire database were stored in a single file.

Additionally, file storage space has become increasingly inexpensive in recent years, allowing large numbers of small text files to be stored at minimal cost and facilitating this type of mitigation against the “curse of dimensionality”.

Through testing described in the following chapter, it was found that allowing action-values to be updated when an exploratory action is taken in the successive timestep can be undesirable, particularly for the ε-greedy exploration algorithm. For this reason, the water quality agent was designed to allow for these updates to be excluded from the system. However, for exploration methods such as softmax and OFU, almost every action can be considered exploratory in early training, thus exclusion of updates is not necessarily desired and that feature may be turned off.

As previously described, the system conducts policy improvement either after a user-specified number of policy evaluation updates, or when policy evaluation updates result in little change to the action-value function. In order to track the number of policy evaluation updates, a single file is updated by the Q update program, incrementing the number of policy evaluation updates since the last policy improvement sweep. When the policy improvement process takes place, the value in this file is reset. The value iteration process described in Section 3.1.3 can be accomplished by setting the number of policy evaluation updates between policy improvement cycles to a value of one. Alternatively, for the case where a stable action-value function is desired before policy improvement takes place, a file is maintained by the Q update program showing the percent change that occurred in each recent action-value update. The file also tracks the number of updates since the change occurred. If a subsequent update to a particular action-value is made, that record in the file is updated. After a user-specified number of updates have occurred since a particular action-value was updated, that action-value’s tracking ends and it is removed from the tracking file to prevent rarely visited states and actions from unnecessarily delaying a policy improvement cycle. This removal process also keeps the

file size manageable. Once all tracked values in the file fall below a user-specified threshold for the percent of change in recent policy evaluation updates, the policy improvement process is carried out.

The system was designed to allow for the specification of the initial values to be used for the action-value function as well as the agent policy. For the tabular case, if a single value is specified, the action-value and agent policy database files are generated with that value for each state or state-action pair. Alternatively, the user may specify files containing an initial database of action-values and agent policy values to create a more complex initial action-value or agent policy function.

For the tabular implementation, discretization of the state and action space is required. The water quality agent completes this in the interface program, where continuous values provided by the environment model are rounded to discrete values at user-specified intervals between minimum and maximum values. Any values below or above the predetermined minimum and maximum values are set to those minimum or maximum values. In this manner, basic partitioning of the state and action space is accomplished, placing each state and action variable into a “bin” or “tile” that has some meaningful value with respect to the problem at hand.
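The clamping and rounding step can be summarized by a short Python sketch (illustrative names; the interval, minimum, and maximum are the user-specified values described above):

def discretize(value, minimum, maximum, interval):
    # Clamp a continuous state or action variable to its permissible range and
    # round it to the nearest user-specified interval ("bin").
    clamped = min(max(value, minimum), maximum)
    return minimum + round((clamped - minimum) / interval) * interval

# e.g., discretize(483.2, 0.0, 560.0, 70.0) returns 490.0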

To improve on the basic partitioning of the state and action space, a technique known as “feature extraction” can be employed (Bertsekas 1996). Using this method, information from the state and action space can be condensed into a vector of features that better represent important and relevant areas of the state and action space. This could also be considered a system of categorization, where each state variable is placed into a category with similar variables. In certain cases, this could also be considered

another method of partitioning, with irregularly sized partitions to better capture the important areas of a particular state variable. Those important areas might include portions of the state space where the system state is more likely to commonly exist, and where decision breakpoints are more likely to occur. The water quality agent allows for this type of feature extraction, whereby the variables are placed into integer-coded categories by the environment model or the interface program and then processed as if they were state or action variables by the system, partitioned into integer values.
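As a simple illustration of this integer coding, the following sketch places a continuous state variable into irregularly sized categories; the breakpoints shown are hypothetical rather than those used by the case study.

def extract_feature(value, breakpoints):
    # Return an integer category code; breakpoints define irregular partitions
    # chosen to emphasize decision-relevant ranges of the state variable.
    for code, upper in enumerate(breakpoints):
        if value <= upper:
            return code
    return len(breakpoints)

# e.g., coding reservoir storage (MCM) into coarse operational bins:
extract_feature(12.4, breakpoints=[5.0, 10.0, 20.0, 40.0])  # returns 2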

3.2.5.2 – Neural Network Implementation

The neural network implementation of the water quality agent essentially utilizes many of the techniques discussed in the previous section on tabular implementation. The primary differences are that neural networks are used to estimate the action-value and policy functions. Action-value updates as shown in Equations 3.13 and 3.14 are calculated by the interface program using the action-values for Q(s_t, a_t) and Q(s_t+1, a_t+1) or max_a Q(s_t+1, a) as computed by the neural network, depending on whether SARSA or

Q-Learning methodologies are being used. The action-value updates resulting from

Equation 3.13 or 3.14 are then provided to the neural network training program by the Q update program as an additional piece of training data. The neural network weight vector is then adjusted by the neural network training program to account for this update to the action-value function training data. This presents several challenges because previous updates would also be found in the neural network training dataset, creating a bias away from the ideal approximation of the function and more towards samples from previous

approximations. This problem is magnified if the underlying problem being solved results in a non-stationary action-value function. Another issue with this methodology is that the training dataset for the neural network has a much higher density of training points at states and actions encountered most often when following a particular policy. Because neural network training, like most function approximation methods, usually focuses on the minimization of the squared error in its predictions, this is likely to create better approximations of the action-value function in these commonly visited states, and worse approximations in rarely visited states (Sutton and Barto 1998).

To attempt to overcome these issues, the water quality agent is designed to replace previous points in the neural network training dataset with the updated training points. This is implemented by a search and replace algorithm within the Q update program that replaces any training point within a user-specified threshold of all state and/or action variables of the update point, generally using the same mechanisms outlined in the tabular implementation section above. If no point is found within the threshold of each of the new point’s state and action variables, the point is appended to the dataset without removing any other training data. This method attempts to solve the issues of point density, non-stationarity, and bias due to older data.
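The search-and-replace idea can be sketched as follows (a simplified Python fragment; the dataset structure, thresholds, and names are illustrative):

def upsert_training_point(dataset, new_vars, new_target, thresholds):
    # dataset is a list of (variables, target) pairs; replace the first point
    # whose state/action variables all lie within the thresholds of the new
    # point, otherwise append the new point as additional training data.
    for i, (vars_, _) in enumerate(dataset):
        if all(abs(v - n) <= t for v, n, t in zip(vars_, new_vars, thresholds)):
            dataset[i] = (vars_, new_target)
            return
    dataset.append((new_vars, new_target))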

In the neural network implementation, policy improvement is conducted by computing the minimum action-value from the action-value neural network for each potential action in the states that have been visited in the task environment. The range of potential actions is determined a priori by user-specified minimum and maximum permissible actions and a typical action interval (e.g., likely reservoir release intervals such as 0 cms, 1 cms, 2 cms, etc.). The action associated with the minimum action-value

function for each of the experienced states is provided as an additional piece of training data to the training dataset used by the neural network for the agent’s policy function. By only generating training data based on states that have been visited, the policy function creates better approximations of the desired function in the state space most often visited.

To avoid excessive training point density in these regions of the state space, as well as the non-stationarity issues and bias issues discussed previously, a similar search and replace algorithm is employed by the policy update program as is used by the Q update program in the development of training data for the action-value neural network.
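Policy improvement under the neural network implementation amounts to a sweep over candidate release actions for each visited state; a minimal sketch, assuming the action-value network is available as a callable that maps a combined state-action vector to a scalar estimate, is:

def extract_policy_action(q_network, state, a_min, a_max, a_step):
    # state is a list of state-variable values; sweep candidate actions at the
    # user-specified interval and return the action with the lowest predicted
    # action-value for this state.
    best_action, best_value = None, float("inf")
    action = a_min
    while action <= a_max:
        value = q_network(state + [action])
        if value < best_value:
            best_action, best_value = action, value
        action += a_step
    return best_action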

In the water quality agent, both the action-value neural network and the agent policy function neural network are trained using genetic algorithms. The genetic algorithms generate a population of network weights, and continuously perform crossover and mutation operations to improve the optimal weight sets, as discussed in

Section 3.1.2.1.1.

In the neural network implementation, the interface program utilizes the policy neural network to calculate the agent’s current policy, rather than obtaining the policy from a table, depending on the exploration or action selection method in use. All exploration and action selection methods are implemented in the same manner as discussed in the section on tabular implementation above. In addition, eligibility trace methods are implemented in a similar manner, with the eligibility updates calculated using values from the neural network in a manner similar to normal updates.

Initial values for the action-value neural network and policy function neural network can be provided by the user, or a file of initial values may be specified, in a manner similar to the tabular implementation. The primary difference is that in the case

where a single initial value is specified for the action-value and agent policy functions, initial values are only generated for the corner points (essentially, the outer boundaries) of the state or state-action space, rather than throughout the entire state or state-action domain. These initial values can then optionally be ignored after actual updates begin to occur, to prevent the initial values from having a lasting impact on the estimated function if they are not replaced by updated values. In the absence of user-defined initial values, the system will generate a single initial value of zero at the origin of the state or state-action space.

Partitioning of the state or state-action space and feature extraction can be utilized by the water quality agent under neural network function approximation as well as the tabular case, and there can be benefits to their use in certain processes under the neural network implementation. In particular, partitioning is included as a mechanism to sample the actions for a particular state to identify the optimal action for that state, both for use in the Q-learning update equation (Equation 3.14) and for generation of training data by the policy update program. Feature extraction can be used as a means of reducing the state or state-action space to a more meaningful set of partitions for use with the neural network implementation, in a similar manner to the tabular implementation.

3.2.5.3 – Process Control

For the purposes of controlling the system of applications discussed above, an overall process controller was developed. This process controller helps ensure that processes are properly initialized, operated, and terminated. In addition, the process

controller facilitates the setting of various options, variables, and parameters associated with each of the programs in the system. Also, the process controller allows the user to monitor each of the programs, and ensure proper functionality. Finally, the process controller provides for automated, independent, and iterative operation of the system.

This allows the user to initiate a series of system runs to evaluate a variety of different options and parameter settings, and then allow the system to complete those runs without supervision. It also provides for an automated sensitivity analysis of the various parameters and settings for each of the learning mechanisms implemented in the programs.

The process controller implements a graphical user interface, allowing the user to manipulate various controls on the interface to specify the options, variables, and parameters associated with the different programs. The process controller utilizes a consistent naming convention for each of its controls, providing for a systematic mechanism to link the user-specified values of the controls to the variable names within each of the system programs. When buttons are pushed on the interface to invoke the various programs in the system, the control values are transferred to a global variable file in the centralized data file space, where they can be obtained by the initialization routines in each of the system’s programs.

The process controller allows the user to monitor the various system programs to ensure proper operation. As the programs are running, the process controller constantly provides the user with feedback information such as model run progress, estimated time of run completion, CPU usage information, and iteration speed among others. The process controller contains visualization tools to examine cross sections of the action-

value function and agent’s policy function. In addition, it has the ability to visualize cross sections of the state and action space to examine the number of times a particular state or state-action pair has been explored. This visualization can be used to evaluate the various exploration algorithms and their functionality. For the neural network programs, the process controller allows the user to examine the neural network function, review error statistics, and track key information regarding the genetic algorithm population of solutions such as the overall population fitness or population homogeneity. A portion of this output generation is facilitated through the use of a neural network post-processing program that was created to obtain sample outputs from the neural network function at regular intervals across the entire state or state-action space.

For completion of sensitivity analyses and iterative operation of the system, the process controller allows the user to specify the options, variables, and parameters that should be included in the iterative runs, as well as a minimum bound, maximum bound, and interval for the variables and parameters as appropriate. The controller then iteratively completes training and testing runs of the reinforcement learning system, constantly monitoring the system, and initiating a new run when the previous one is complete. The graphical user interface of the process controller is shown in Figures 3.19 through 3.24.


Figure 3.19 – Process controller user interface, showing common controls for all processes

Figure 3.20 – Process controller interface, showing primary reinforcement learning agent controls, and graphical representation of policy function table data


Figure 3.21 – Process controller interface, showing eight concurrent environment model instances during active simulation, with simulation status and estimated time of completion

Figure 3.22 – Process controller interface, showing neural network training controls, and graphical network performance display


Figure 3.23 – Process controller interface, showing neural network training controls, and graphical representation of agent policy function approximation

Figure 3.24 – Process controller interface, showing sensitivity analysis controls


4 – TRUCKEE RIVER CASE STUDY

For the purposes of testing and evaluating the system developed in Section 3.2, the reinforcement learning methodology was applied to the case study on the Truckee

River. Application of the reinforcement learning system to the case study is covered in this chapter. Results from the application are discussed in the following chapter. The first section of this chapter provides an overview of the study area, including reservoir operating procedures and water quality issues. The second section discusses previous and current water quality modeling on the Truckee River. The third section summarizes water quality monitoring and data collection efforts on the river, including past efforts producing data that can be used in models, as well as current data that may be available for real-time decision making. The fourth section reviews the development of reservoir operations models and decision support systems for the Truckee River. The fifth section of this chapter details the development of simulation models and performance functions of the Truckee River environment. The sixth and seventh sections discuss the development of the rational agent for the case study and its physical implementation.

4.1 – Truckee River Study Area

The Truckee River flows from the outlet of Lake Tahoe high in the Sierra Nevada mountain range of California to its terminus at Pyramid Lake in the desert plains of

Nevada. The river is approximately 170 kilometers (105 miles) long, and contains two distinct regions with differing physical characteristics (Nevada Division of Water

Planning 1997). The upper half of the basin is characterized by cold, rapid flowing water in mountainous valleys and canyons. In the lower portion of the basin, the river slows as it progresses towards its terminus, passing through the metropolitan areas of Reno-Sparks and the smaller cities and towns of Fernley, Wadsworth, and Nixon before terminating in

Pyramid Lake (United States Department of the Interior 2002). The study area is shown in Figure 4.1.

The Truckee River Basin is a hydrographically closed basin, and encompasses approximately 7,925 square kilometers (3,060 square miles) (USDOI 2008). Altitudes within the basin range from approximately 3,320 meters (10,900 feet) in the headwater areas to approximately 1,160 meters (3,800 feet) at Pyramid Lake. Precipitation amounts vary widely from the mountainous areas to the desert. Much of the precipitation within the basin is in the form of mountain snow, with annual precipitation measurements reaching more than 76 centimeters (30 inches) of liquid or “snow water equivalent”. The desert areas of the basin average less than 13 centimeters (5 inches) of precipitation, due largely to the fact that the basin lies within the rainshadow of the Sierra Nevada mountain range (U.S. Geological Survey 1996; U.S. Geological Survey 1997).


Figure 4.1 – Map of the Truckee River Basin (Rieker 2006).

There are seven large storage reservoirs located in the upper reaches of the river in California. These reservoirs control approximately 70% of the flow in the Truckee

River (USDOI 2008); however, they have a maximum effective total capacity that is less than two years of the total average natural river flow within the basin (Berris, Hess et al.

2001). Donner and Independence Lakes are natural mountain lakes which have small dams controlling water storage above the natural rim of the lakes. Independence Lake is

owned by the Truckee Meadows Water Authority (TMWA) and Donner Lake is owned by both TMWA and the Truckee-Carson Irrigation District (TCID). The lakes provide water storage for the cities of Reno and Sparks as well as irrigation for the Newlands irrigation project near Fallon, Nevada. Martis Reservoir is a small flood control reservoir owned by the US Army Corps of Engineers (COE). Lake Tahoe, Prosser Creek

Reservoir, Stampede Reservoir, and Boca Reservoir are all Reclamation facilities used to provide storage for agricultural irrigation, municipal use, and industrial use as well as water for the preservation of endangered fish in Pyramid Lake. Like Donner and

Independence Lakes, Lake Tahoe was originally a natural lake, and now has a small dam controlling storage above the natural rim of the lake (Nevada Division of Water Planning

1997). Lake Tahoe is the tenth deepest lake in the world, at approximately 500 meters

(1650 feet) deep. Lake Tahoe is also known for the clarity of its water. Earlier in the 20th century, it was possible to see objects at over 30 meters (100 feet) of depth (Resources

1991).

There are two large structures on the middle and lower reaches of the main stem of the river. These are Derby Diversion Dam and Marble Bluff Dam. Derby Diversion

Dam is located downstream of Reno, and diverts water out of the Truckee River into the

Truckee Canal to provide irrigation water for a portion of Reclamation’s Newlands

Project along the Truckee Canal near the city of Fernley, Nevada. Additionally, water taken from the Truckee at Derby Dam flows through the Truckee Canal out of the basin into Lahontan Reservoir on the Carson River, and is used to irrigate agricultural lands in

Reclamation’s Newlands Project within the Carson basin, as well as supply water to a national wildlife refuge and the Fallon Paiute Shoshone Indian Reservation. Marble

Bluff Dam was constructed to check the downcutting and erosion of the river channel upstream from Pyramid Lake (Nevada Division of Water Planning 1997).

Hydroelectric generation on the river is limited to four run-of-the-river power plants located along the river (Berris 1996), and a small generation plant at Stampede

Dam. The run-of-the-river power plants have a maximum combined capacity of approximately 10 megawatts, and the power plant at Stampede has a capacity of 3.6 megawatts (Reclamation 2010).

Pyramid Lake represents the terminus of the Truckee River, located on the

Pyramid Lake Indian Reservation. Water generally only leaves the lake by evaporation.

Due to increased consumptive use of the river’s water supply over the last one hundred years, as well as the transbasin diversion of water away from the Truckee River through the Truckee Canal to the Newlands Project, Pyramid Lake’s water surface elevation has declined greatly at times. The lowest recorded lake elevation occurred in 1967, when the lake was almost 29 meters (95 feet) lower than its highest recorded elevation in 1891.

The lake is home to two fish that are on the federal threatened and endangered species list, the cui-ui and the Lahontan cutthroat trout (LCT) (Nevada Division of Water

Planning 1997). Additionally, seven other fish species occur naturally within the Truckee basin, and approximately fourteen non-native species are known to reside in the Truckee River and its lakes and reservoirs (United States Department of the Interior 2002).

The average annual discharge of the Truckee River from California into Nevada is approximately 693 million cubic meters (MCM) (561,800 acre-feet). The average diversion from the river at Derby Dam is approximately 199 MCM (161,500 acre-feet) per year (USDOI 2008). Active water rights in Nevada in 2002 indicated a demand of

271 MCM (219,691 acre-feet) per year for agricultural, municipal, and industrial uses within the basin (United States Department of the Interior 2002). Agriculture generally accounts for approximately 70% of the consumptive use within the basin (Warwick,

Cockrum et al. 1999), and reports show that the demand on the river can be greater than the supply (Berris 1996). On average, approximately 525 MCM (425,700 acre-feet) of water flows into Pyramid Lake per year (United States Department of the Interior 2002).

Average and maximum flows at various streamgages along the river are shown in Table

4.1.

Table 4.1 – Streamgage statistics for the Truckee River system (USGS 2010)

USGS Streamgage                           Mean Daily Streamflow    Maximum Streamflow of Record
Truckee River @ Tahoe City (10337500)     6.4 cms (227 cfs)        76.2 cms (2,690 cfs)
Truckee River @ Farad (10346000)          21.5 cms (758 cfs)       495.5 cms (17,500 cfs)
Truckee River @ Reno (10348000)           19.2 cms (679 cfs)       589.0 cms (20,800 cfs)
Truckee Canal nr Wadsworth (10351300)     6.3 cms (222 cfs)        26.1 cms (921 cfs)
Truckee River nr Nixon (10351700)         15.4 cms (544 cfs)       600.3 cms (21,200 cfs)

Water storage and flow in the Truckee River are governed by a large number of court decrees, court decisions, laws, and regulations. These include the Truckee River

General Electric Decree (1915), Truckee River Agreement (States, District et al. 1935),

Orr Ditch Decree (1944), Tahoe-Prosser Exchange Agreement, Newlands Project

Operating Criteria and Procedures (OCAP) (Reclamation 1997), Interim Storage

Agreement, and the Water Quality Settlement Agreement (Reno, Sparks et al. 1996).

These governing documents are well described in current literature (United States

Department of the Interior 2002; Boyer 2006; USDOI 2008). The documents are summarized in Appendix A.

A detailed overview of reservoir storage and release operations on the Truckee

River is provided in Appendix B. Generally, the Truckee River Agreement is the primary controlling instruction for the current operation of the river and reservoir system. The

Truckee-Carson-Pyramid Lake Water Rights Settlement Act of 1990 (Public Law 101-

618) has resulted in a new agreement, known as the Truckee River Operating Agreement

(TROA) (States, California et al. 2008). The agreement was signed on September 6,

2008, and made effective in 2009; however, it has not yet been implemented due to a number of legal challenges against the agreement. The agreement, once implemented, will alter the way water is stored and released in the Truckee Basin. The agreement is designed to increase the flexibility and efficiency of water operations within the basin, while conforming to current operating procedures, laws, and water rights (USDOI 2008).

4.1.1 – Water Quality Issues

Water quality issues in the Truckee River have been studied extensively throughout the last century, primarily because water quality in the river has been heavily impacted by human use almost since the arrival of the first settlers. The first permanent settlement along the Truckee River was built in 1852, and by 1859, logging operations in the basin had already caused severe degradation of river water quality, resulting in the first man-made impediments to native fish spawning

(Nevada Division of Water Planning 1997). Many anecdotal reports during the early and

mid-twentieth century illustrate the wide variety of water quality problems in the river caused by human activities in the basin (Nevada Division of Water Planning 1997).

Reports and studies completed in the past three decades have provided a more detailed understanding of the exact nature and causes of the degradation of water quality (Brown,

Nowlin et al. 1986; U.S. Geological Survey 1996; U.S. Geological Survey 1997).

The primary water quality constituents of interest on the Truckee River include temperature, dissolved oxygen, nitrogen, phosphorus, and total dissolved solids (Brown,

Nowlin et al. 1986; U.S. Geological Survey 1996; USDOI 2004). Temperature in the river has impacts on the levels of dissolved oxygen, and high water temperatures can harm or even kill the threatened and endangered native species in the river (U.S. Army

Corps of Engineers 1995; U.S. Geological Survey 1996). Dissolved oxygen is required for the growth and survival of fish and other aquatic life (United States Department of the

Interior 2002). Low levels of dissolved oxygen can harm or kill the threatened and endangered species of fish in the river (U.S. Geological Survey 1997), as well as inhibit their ability to successfully spawn (Hoffman and Scoppettone 1989; U.S. Geological

Survey 1997). Nitrogen and phosphorus are two naturally occurring nutrients that stimulate the growth of algae and other aquatic plants. When excess amounts of these nutrients are introduced to the river, an overabundance of aquatic growth can occur, causing low dissolved oxygen levels due to both nighttime respiration and decay of dead algae and vegetation (U.S. Environmental Protection Agency Office of Water 1994; U.S.

Geological Survey 1997). Measurements of total dissolved solids in the water are an indication of the mineral content within the river. This is of particular importance on the

Truckee, since high levels of dissolved solids are harmful to most beneficial uses of the

river, including municipal, industrial, agricultural, and instream uses (Brown, Nowlin et al. 1986). Additionally, since Pyramid Lake is a terminus lake, the total mass of minerals reaching the lake will be permanently stored there, increasing the mineral content of the lake as a whole.

Standards for many water quality constituents have been developed by the State of Nevada and the Pyramid Lake Paiute Tribe for selected locations on the river. A summary of key standards is shown in Table 4.2.

Table 4.2 - Summary of selected water quality standards for the Truckee River (Nevada 2010)

Constituent                 Water Quality Standard
Maximum Temperature         ≤7°C Nov-Mar, upstream of Lockwood Bridge
                            ≤13°C Nov-Mar, downstream of Lockwood Bridge
                            ≤13°C Apr-May, upstream of Lockwood Bridge
                            ≤21/22°C Apr/May, downstream of Lockwood Bridge
                            ≤14°C Apr-Jun, downstream of Derby Dam
                            ≤17/21/22°C Jun/Jul/Aug, upstream of Lockwood Bridge
                            ≤23°C Jul-Oct, downstream Lockwood Bridge
                            ≤25°C Jul-Oct, downstream Wadsworth
                            ≤23°C Sept-Oct, upstream of Lockwood Bridge
Minimum Dissolved Oxygen    ≥6.0 mg/l S.V. Nov-Mar (Nov-June downstream Wadsworth)
                            ≥5.0 mg/l S.V. Apr-Oct (Jul-Oct downstream Wadsworth)
Total Phosphates            ≤0.10 mg/l A-Avg. upstream of Lockwood Bridge
                            ≤0.05 mg/l A-Avg. downstream of Lockwood Bridge
Nitrogen                    ≤2.0 mg/l S.V. Nitrate
                            ≤0.4 mg/l S.V. Nitrite
                            ≤0.75 mg/l A-Avg. TN downstream of Lockwood Bridge
                            ≤1.2 mg/l S.V. TN downstream of Lockwood Bridge
Total Dissolved Solids      ≤500 mg/l A-Avg.
Suspended Solids            ≤25 mg/l S.V. upstream of Lockwood Bridge
                            ≤50 mg/l S.V. downstream of Lockwood Bridge

Generally, the quality in the river degrades with increasing distance downstream of Lake Tahoe with respect to most of the water quality constituents of interest (Brown,

Nowlin et al. 1986; U.S. Geological Survey 1997). The reach of the Truckee River from its headwaters at Lake Tahoe to the United States Geological Survey (USGS) flow gaging station at Farad (Farad gage) located on the California-Nevada state line is characterized by low temperatures and generally high water quality. The only water quality issues noted in this reach of the river are occasional temperature problems. The issues with temperature are primarily associated with release of water from upper basin reservoirs when reservoir elevations are low, particularly late in the summer. The outflow from the reservoirs can occasionally be warm under these conditions, degrading the water temperatures in the river (Neumann 2001; USDOI 2004). None of the reservoirs have selective withdrawal capabilities that might be able to mitigate these issues.

Water quality degradation generally begins to occur in the reach of river extending from the Farad gage through the Reno-Sparks metropolitan area (also known as the Truckee Meadows). The bulk of the degradation in this section is due to decreased velocity caused by decreased stream gradient, decreased streamflow due to diversions, and increased pollutant loadings from agricultural return flows and urban runoff from the

Reno-Sparks metropolitan area (United States Department of the Interior 2002). The agricultural return flows and urban runoff cause an increase in both temperature and dissolved solids (Brown, Nowlin et al. 1986). Decreased streamflow and velocity in association with warmer air temperatures result in increased temperature (Neumann,

Rajagopalan et al. 2003).

Upon exiting the Truckee Meadows area, the river’s quality becomes increasingly degraded due to a number of issues. There are three point-source loads of water quality

constituents that enter the river near the Vista USGS streamgage at the downstream end of the Truckee Meadows. These are Steamboat Creek, the North Truckee Drain, and the outfall from a wastewater treatment plant known as the Truckee Meadows Water

Reclamation Facility (TMWRF). Waters from Steamboat Creek contain an increased loading of total dissolved solids and nutrients from agricultural returns, urban runoff, and geothermal springs (Brown, Nowlin et al. 1986). The North Truckee Drain is primarily an agricultural return flow containing increased loads of dissolved solids and nutrients

(Brown, Nowlin et al. 1986). The TMWRF primarily provides the river with increased nutrients and some dissolved solids (Brown, Nowlin et al. 1986; U.S. Geological Survey

1997). All three of these inflows to the river generally increase river temperature

(Brown, Nowlin et al. 1986). The reach of river extending from the Vista gage to Derby

Dam is generally characterized by low quality water due to these loadings (Warwick,

Cockrum et al. 1999), and the river at Lockwood has been placed on the Nevada 303 (d) list as not supporting its designated uses due to increased concentrations of total nitrogen, total phosphorus, and total dissolved solids (U.S. Environmental Protection Agency

Office of Water 1994). The 303 (d) listing is a provision of the Clean Water Act which requires that states develop total daily allowable loadings to the river for dischargers, to ensure that rivers are capable of meeting their designated uses (Bhimani,

McDonald et al. 2002).

Upon reaching Derby Dam, a large portion of the river’s flow is sometimes diverted from the river through the Truckee Canal. This removes many of the constituent loadings from the river; however, diversions are highly dependent on hydrologic conditions in the neighboring Carson River basin. A summary of the number of years

when full diversion through the Truckee Canal to Lahontan Reservoir on the Carson

River was required between 1987 and 2006, as well as average daily flow into the canal in those years, is shown in Table 4.3. A summary of the number of years of diversion only to the local irrigators on the Truckee Canal between 1987 and 2006, as well as average daily flow into the canal in those years is shown in Table 4.4.

Table 4.3 - Number of years between 1987 and 2006 when full transbasin diversion was required through the Truckee Canal (USGS 2010)

Month    Number of Years (1987-2006)    Average Monthly Volume, MCM (AF)    Average Daily Flow, cms (cfs)
Jan      14                             17.52 (14,200)                      6.53 (231)
Feb      13                             21.22 (17,200)                      8.49 (300)
Mar      13                             34.66 (28,100)                      12.93 (457)
Apr      11                             35.03 (28,400)                      13.51 (477)
May      10                             26.89 (21,800)                      10.02 (354)
Jun      9                              22.33 (18,100)                      8.6 (304)
Jul      9                              16.78 (13,600)                      6.28 (222)
Aug      5                              18.38 (14,900)                      6.85 (242)
Sept     5                              18.87 (15,300)                      7.3 (258)
Oct      7                              16.65 (13,500)                      6.2 (219)
Nov      14                             14.19 (11,500)                      5.67 (200)
Dec      15                             16.9 (13,700)                       6.33 (223)

Table 4.4 - Number of years between 1987 and 2006 when no transbasin diversion was required through the Truckee Canal, based on USGS streamflow data (USGS 2010)

Month    Number of Years (1987-2006)    Average Monthly Volume, MCM (AF)    Average Daily Flow, cms (cfs)
Jan      6                              1.11 (900)                          0.4 (14)
Feb      7                              0.62 (500)                          0.27 (9)
Mar      7                              0.99 (800)                          0.36 (13)
Apr      9                              3.21 (2,600)                        1.24 (44)
May      10                             6.41 (5,200)                        2.41 (85)
Jun      11                             6.04 (4,900)                        2.31 (82)
Jul      11                             7.03 (5,700)                        2.63 (93)
Aug      15                             6.66 (5,400)                        2.48 (88)
Sept     15                             5.18 (4,200)                        2.01 (71)
Oct      13                             3.95 (3,200)                        1.47 (52)
Nov      6                              1.73 (1,400)                        0.68 (24)
Dec      5                              0.62 (500)                          0.25 (9)

The water quality in the reach of river between Derby Dam and Pyramid Lake continues to be impacted from many of the pollutants encountered upstream, although some reduction in nutrients occurs due to biological assimilation (Brown, Nowlin et al.

1986; U.S. Geological Survey 1997), particularly in late summer (Galat 1990).

Constituent loadings to the river and increased temperature in this reach come from agricultural return flows (Brown, Nowlin et al. 1986). Groundwater inflows to the river generally result in a decrease in temperature, but also contribute dissolved solids (Brown,

Nowlin et al. 1986; Pohll, McGraw et al. 2001). Increased water temperature may also be due to reduced flows (Wagner and Lebo 1996; Neumann, Rajagopalan et al. 2003) and loss of shade from riparian vegetation (U.S. Army Corps of Engineers 1995).

Pyramid Lake itself suffers as a result of upstream water quality issues, primarily because it is a desert terminus lake and water only leaves through evapotranspiration (Nevada Division of Water Planning 1997). Dissolved solids are permanently retained in the lake, increasing salinity. This problem is compounded by the fact that flows to the lake have been reduced due to upstream consumptive uses, reducing total lake volume (USDOI 2004).

It has been suggested that increased streamflow in the river would improve instream water quality by lowering summertime water temperatures and diluting the concentrations of various pollutants such as total dissolved solids and nutrients, which would in turn decrease the likelihood of low dissolved oxygen concentrations (Chiatovich and Fordham 1979; Protection 1994; United States Department of the Interior 2002;

USDOI 2004). Reports have attempted to show relationships between streamflow and measures for the various water quality constituents (Galat 1990; U.S. Geological Survey

1997). These relationships indicate that there may be an improvement in water quality for some constituents at some locations; however, in other locations the water quality can degrade as a result of increased flows. Total dissolved solids concentrations in lower portions of the river tend to improve with increased streamflow due to dilution (U.S.

Geological Survey 1996). However, nitrate (a measure of nitrogen) shows an increase and then a decrease with increased flow (U.S. Geological Survey 1997). This is attributed to a

“flushing” response in the lower reaches of the river (Hoffman 1990; U.S. Geological

Survey 1997). Another problematic issue with increased flows is that algae attached to the streambed, also known as periphyton, can become dislodged during higher flows and decay as it flows towards Pyramid Lake (Warwick, Cockrum et al. 1999). Reports have cited the nighttime respiration of periphyton as being the primary cause of low dissolved oxygen in the river, and have stated that the limiting nutrient for periphyton growth is nitrogen rather than phosphorus (Chen 2002). These same reports note that increased streamflow will not improve water quality through dilution; rather, the improvement to water quality would only come through the reduction of nutrients (Chen 2002).

4.1.2 – Water Quality Settlement Agreement

In response to the ongoing degradation of water quality in the Truckee River entering the Pyramid Lake Indian Reservation, the Pyramid Lake Paiute Tribe (Tribe) threatened and filed litigation against Reno, Sparks, the Environmental Protection

Agency (EPA), and the Nevada Division of Environmental Protection (NDEP). To settle the litigation, an agreement was reached by Reno, Sparks, Washoe County, the US

99 Department of the Interior (DOI), the US Department of Justice (DOJ), EPA, NDEP, and the Tribe. The agreement is known as the Truckee River Water Quality Settlement

Agreement (WQSA), and was signed by all involved parties in 1996 (Reno, Sparks et al.

1996). The agreement states that a total of twenty-four million dollars ($24,000,000) is to be expended by several of the involved parties to purchase water rights on the Truckee

River. The United States federal government was obligated to purchase twelve million dollars worth of water rights, while the remaining twelve million dollars worth of purchases were to be pursued by Reno, Sparks, and Washoe County. The water purchased under the agreement is to be stored in the federally owned reservoirs in the upper Truckee basin, and used as necessary to augment instream flows in the Truckee

River to improve water quality in the river and increase flows to Pyramid Lake.

Specifically, the water will be used to:

- Assist in compliance with water quality standards
- Improve water quality
- Maintain and preserve the lower Truckee River and Pyramid Lake for the following purposes:
  o Fish and wildlife
  o Threatened and endangered species
  o Recreation

The agreement stipulates that the water will be stored in the Truckee River reservoirs, and will be released from the reservoirs according to a release schedule cooperatively developed by Reno, Sparks, Washoe Co., and DOI. The releases will be scheduled to meet the purposes of the agreement, and will use the following priority order of goals to gain the most benefit from the available water:

- Meet water quality standards in the reach from the USGS gage at Vista to Pyramid Lake
- Improve water quality in that reach when sufficient water is not available to meet the standards
- Maintain habitat for fish and riparian species downstream of Derby Dam
- Add to aesthetic and recreational value of the river from Reno-Sparks to Pyramid Lake

(Reno, Sparks et al. 1996). Though improvement of conditions at Pyramid Lake itself is not a specific goal of the WQSA, a study has shown that, in addition to improving water quality in the river, the augmented flows would result in an increase in the surface elevation of Pyramid Lake (Pratt 1997).

Previous studies have shown that using the WQSA purchased water rights to maintain target flows at the USGS streamgages at Sparks and Nixon during June through

September would address the WQSA flow enhancement goals. These target flows are

7.8 cubic meters per second (cms) (275 cubic feet per second (cfs)) and 3.8 cms (135 cfs) at the respective gages (USDOI 2002). These preliminary targets were developed with the understanding that real-time water management operations may provide more beneficial flows by reacting to physical conditions within the basin (United States

Department of the Interior 2002). Preliminary studies indicated that the most likely acquisition of water rights would provide 10.5 MCM (8,500 acre-feet) to 16.5 MCM

(13,350 acre-feet) of water to help meet the goals of the WQSA; however, the final volume acquired may be less (United States Department of the Interior 2002), because market prices for water rights have sharply increased in recent decades (Colby,

McGinnis et al. 1991). Some reports show that the potential acquisition volume may be as high as 29.6 MCM (24,000 acre-feet) (Nevada Division of Water Planning 1997;

Lovell 2000), or slightly less at 21.0 MCM (17,000 acre-feet) (Neumann 2001). As of

January, 2011, the total amount of water rights acquired under the WQSA was 6.7 MCM

(5,422.2 acre-feet), of which 6.2 MCM (4,993.2 acre-feet) were eligible for transfer for the purposes of the WQSA, with the remainder representing water rights that were previously inactive or otherwise not eligible to be put to use under the WQSA (Great

Basin Land and Water 2011). The total amount that was transferred for in-stream flow purposes in the 2010 season was 4.9 MCM (3,973.02 acre-feet) (Federal Water Master's

Office 2010).

Only one previous study has evaluated an improved strategy or policy for the release of this water (Neumann 2001; Neumann, Rajagopalan et al. 2003; Neumann

2006). The details of that study are summarized in Section 4.2 of this document.

4.2 – Truckee River Water Quality Modeling

Water quality has been modeled in various forms on the Truckee River for well over three decades. Many models are simple regression relations; however, some previous studies have produced complex water quality simulations. Several previous reviews outline the history of water quality modeling on the Truckee (Taylor 1998;

Neumann 2001).

One of the earliest water quality models on the Truckee was a simulation model for total dissolved solids (TDS), developed in the late sixties and early seventies (Sharp,

Bateman et al. 1970). This model used elements of mass balance and relationships between flow and TDS concentrations, and was later modified for use with a reservoir operating model to show that flow augmentation would help meet TDS standards except

during drought conditions (Chiatovich and Fordham 1979). A daily water temperature prediction model was completed in 1975 using the concepts of heat transfer. The model was used to show that a minimum flow in the river below Derby Dam should be kept above 2.8 cms (100 cfs) during summer months when diversions at the dam were low to ensure survival of LCT in the river. The reason that the restriction occurred at low diversion rates was that lower diversion rates required less flow through the Truckee

Meadows above Derby Dam, resulting in higher stream temperatures by the time the water reached the downstream reaches (Rowell 1975). A model was created in 1978 as the primary product of a series of workshops involving stakeholders within the basin, led by the USGS (Andrews, Ellison et al. 1979). The model included a water supply submodel and a water quality submodel. The series of workshops appear to have been a portion of the first phases of a larger river quality assessment by the USGS. The USGS also developed a simplistic daily stream temperature model in 1986 based purely on harmonic functions (Brown, Nowlin et al. 1986). A daily timestep FORTRAN model called the Truckee River Water Quality Model (TRWQ) was created by Nowlin (Nowlin

1987) in 1987 which simulated concentrations of TDS, DO, and nutrients (among other constituents) from Reno to Marble Bluff Dam. The USGS created a nutrient loading model in 1997 that estimated nutrient loads based on a regression with instantaneous streamflow (U.S. Geological Survey 1997). Warwick et al. created a water quality model of the downstream reaches of the Truckee River in 1999 using the Water Quality

Analysis Program (WASP). The program was altered to consider the effects of periphyton growth, and was used to reveal the fact that groundwater influx in the downstream river reaches was likely to have a significant impact on water quality,

particularly with regard to the concentration of nutrients and their effect on periphyton growth and dissolved oxygen problems (Warwick, Cockrum et al. 1999).

In 1987, Caupp et al. created the Dynamic Stream Simulation and Assessment

Model (DSSAM) based on a periphyton model developed by Runke (Caupp, Brock et al.

1991; Warwick, Cockrum et al. 1999) as well as the Nowlin model (U.S. Environmental

Protection Agency Office of Water 1994; Berris 1996). The DSSAM model provided the ability to model a wide variety of water quality constituents including dissolved oxygen,

TDS, and nutrients, and also had the ability to model benthic algae (periphyton) and its effects on dissolved oxygen levels (Protection 1994). The model was later extended to include temperature, and was then called DSSAMt. Currently, there are 26 total water quality parameters modeled (USDOI 2004). The DSSAM model was utilized in a variety of studies throughout the 1990’s and 2000’s. The model was used to determine the Total

Maximum Daily Loadings (TMDLs) for the Truckee River at Lockwood for the EPA in

1994 (Protection 1994; U.S. Environmental Protection Agency Office of Water 1994).

The model was also used in a 1995 study by the US Army Corps of Engineers (COE) to evaluate the potential benefits of restoration measures applied to the downstream reaches of the river (U.S. Army Corps of Engineers 1995). The model was subsequently used to evaluate the target flows required to attain beneficial water quality levels during WQSA negotiations, and was later used to evaluate various scenarios presented in the WQSA

Environmental Impact Statement (EIS) (United States Department of the Interior 2002).

The DSSAMt model was further used to compare the various alternatives provided in the

TROA EIS, and model the overall effect of TROA on water quality in the Truckee River

(USDOI 2008).

In 1998 the USGS completed a daily flow routing model that had the capability to model temperature and TDS concentrations from Lake Tahoe to Marble Bluff Dam. The model was built using the USGS’s Hydrologic Simulation Program Fortran (HSPF) framework (Berris, Hess et al. 1996; Taylor 1998). The HSPF framework was later used by the Cities of Reno and Sparks and Washoe County (local agencies) to create a water quality model of the Truckee River with a similar configuration to that of the USGS. The later implementation of the HSPF framework is referred to as the “Truckee River

HSPF Model” or TrHSPF. The effort by the local agencies included the TrHSPF model, the DSSAMt model created by Brock et al., and a model known as the Watershed

Analysis Risk Management Framework (WARMF). The effort was geared towards managing and revising the waste load allocations placed on the TMWRF, which is managed by the local agencies (Bhimani, McDonald et al. 2002; Engineers 2003). The effort utilized the DSSAMt model to research and refine water quality modeling methods for the Truckee River, particularly the modeling of periphyton. The refined modeling algorithms were then placed into the TrHSPF model, which was used to evaluate the potential for revising TMDLs for the TMWRF, as well as evaluate impacts of structural and non-structural alternatives to waste management for improvement of water quality in the river. The WARMF was designed to provide inputs to the DSSAMt and TrHSPF models. The inputs from the WARMF include pollutant loadings to the river from the surrounding watershed, given inputs of meteorology and land use data (McDonald,

Bhimani et al. 2000; Bhimani, McDonald et al. 2002). The WARMF can also be used to develop TMDLs for various point and non-point sources of pollutants along the entire

Truckee River (McDonald, Bhimani et al. 2000). The suite of models (DSSAMt,

TrHSPF, WARMF) is also intended to provide assessments of the benefits of WQSA water rights purchases (Reno, Sparks et al. 2003).

Neumann developed a predictive model for Truckee River temperature at Reno, based on outflows from the basin’s reservoirs and daily air temperature in Reno

(Neumann 2001; Neumann, Rajagopalan et al. 2003; Neumann 2006). The model was simplified into a regression table, and implemented within the RiverWare decision support framework for management of WQSA stored water (Neumann, Rajagopalan et al. 2003; Neumann 2006). This model was the first attempt at developing a decision support mechanism capable of managing the reservoirs for optimized operation of water stored under provisions of the WQSA. The model was used as a demonstration to show that flow augmentation would help reduce stream temperatures. However, the study showed that extreme temperatures were still likely to occur (Neumann 2006). The study pointed out that the wide variety of factors affecting water temperature and quality downstream of Reno would complicate the development of a water quality model in the downstream reaches. The study suggested that future development of more generalized, complex water quality models would be necessary for the appropriate management of the

WQSA stored water. The study pointed out that those models would have to include dissolved oxygen, nutrient loadings, and other constituents, and would have to take long-term climate predictions into consideration. The Neumann study serves as the predecessor to the research conducted for this dissertation.

4.3 – Truckee River Water Quality Monitoring

Water quality has been monitored on the Truckee River for several decades by a variety of agencies. These agencies include the USGS, Desert Research Institute,

University of Nevada-Reno, TMWRF, NDEP, EPA, US Fish and Wildlife Service,

Federal Water Master, Nevada Division of Wildlife, Reclamation, TMWA, Washoe

County, Reno, Sparks, Pyramid Lake Paiute Tribe, Truckee River Yacht Club, Nature

Conservancy, Nevada State Health Division, and Truckee-Tahoe Sanitation Agency among others (Protection 1994; Brock 2000; Reno, Sparks et al. 2003). In general, most of the longer-term monitoring efforts on the river have consisted of non-continuous monitoring. This type of monitoring usually consists of samples taken at weekly, monthly, bimonthly, or quarterly intervals (Brock 2000). A monitoring network operated by the TMWRF is the only long-term semi-continuous effort currently in place. There are currently nine monitoring stations providing hourly data for water temperature, pH, specific conductance (an indicator of TDS), and dissolved oxygen concentration. These monitoring stations typically obtain data from April 1st through November 30th of each year, although many of the stations operate continuously throughout the year. The stations sometimes do not operate during periods of extreme high flow. Most of the stations have a period of record extending back several years, in some cases as far back in time as 1993 (Facility 2004). The TMWRF network has been used in conjunction with other monitoring networks for several coordinated monitoring activities throughout the past two decades (Association 1987). Two of the coordinated efforts were used to generate the data necessary to calibrate the DSSAMt and TrHSPF models. A coordinated

monitoring effort during September of 1989 produced water quality information during a fairly stable period in river flows and constituent loadings that was used to calibrate the

DSSAM model for the purposes of setting TMDLs for the river (Protection 1994).

Another coordinated effort organized fourteen agencies to monitor river quality for development of the TrHSPF model as well as developing revised TMDLs and assessing compliance with water quality objectives within the river (McDonald, Bhimani et al.

2000; Bhimani, McDonald et al. 2002; Reno, Sparks et al. 2003).

4.4 – Truckee River Hydrologic, Policy, and Decision Support Modeling

A wide variety of hydrologic and policy simulation models have been constructed for the Truckee River. Several of these models have been designed as decision support systems. There are several existing reviews outlining the history of this type of modeling on the Truckee River (Cobb, Olson et al. 1990; Berris 1996; Berris, Hess et al. 2001).

One of the earliest computer models simulating the Truckee River was a monthly flow simulation model constructed by the Desert Research Institute (DRI) based on simple mass balance (Sharp, Bateman et al. 1970; Berris, Hess et al. 2001). This model was operated using a deterministic dynamic programming mechanism to propose an optimal operating policy for the river (Fordham 1972). The model was also used in conjunction with a water quality model described previously (Sharp, Bateman et al. 1970;

Chiatovich and Fordham 1979). When used in conjunction with the water quality model, the model optimized the total release from all reservoirs. However, it did not focus on individual reservoirs due to the computational complexity of the problem (Chiatovich and

Fordham 1979). The model was later altered into a network flow model which used the out-of-kilter linear programming algorithm to determine operating policies for the combined Truckee-Carson system. The optimal policies suggested by the model were reported to potentially save 70.3 MCM (57,000 acre-feet) of water that could be delivered to Pyramid Lake under operating conditions that existed in the early 1970’s (Fordham

1972). Others from DRI later constructed an hourly flow model of the Truckee River capable of simulating flow peaks and floods; however, the model did not include reservoir operations (Berris 1996).

Another network flow model was developed many years later by Israel (Israel

1996). The model was developed using the COE’s Hydrologic Engineering Center

Prescriptive Reservoir Model (HEC-PRM). The model used a linear programming algorithm to optimize the use of Truckee River water. The model operated the Truckee system based on a set of prioritized penalties, and the study suggested a unique approach to developing penalty values (Israel and Lund 1999). The approach was later criticized due to its computational intensity (Labadie and Baldo 2001).

A monthly mass-balance model was created by Reclamation in 1975. The

FORTRAN-coded model was initially developed to help provide modeling support for reservoir and canal operations related to the Newlands Project Operating Criteria and

Procedures (OCAP), a set of procedures promulgated by Reclamation as the federal regulation for the operation of the Newlands Project, including Derby Dam, the Truckee

Canal and Lahontan Reservoir. This model became known as the “BOR Model”. The model was later modified during negotiations that resulted in the Preliminary Settlement

Agreement (PSA), which was a precursor to the TROA. The modified model was

developed to simulate various scenarios for use in the negotiations, and became known as the “Negotiation Model”. Both the Negotiation Model and the BOR Model were operations models that were capable of simulating reservoir operations as well as water accounting. The models did not have optimization capabilities. The Negotiation Model was later developed into the Truckee River Operations Model (TROM) for use on the

WQSA Environmental Impact Statement (EIS) (United States Department of the Interior

2002) and the TROA EIS (USDOI 2008). Technical representatives to the PSA negotiations generally agreed that a daily or hourly physical simulation model would be necessary to fully implement the provisions set forth in the PSA (Cobb, Olson et al. 1990;

Berris 1996). Others have asserted that any new model would need to be better documented to ensure its integrity (Pratt 1997).

The USGS developed a daily flow-routing model using the well-documented

HSPF framework as a first step towards implementing the TROA (Berris 1996). The model was later modified to include reservoir operations using conditional “if-then” logic statements (Berris, Hess et al. 1996). The model was developed to simulate existing conditions at the time of development, as well as potential conditions under the provisions of both the TROA and the WQSA (Bohman 1996; Berris, Hess et al. 2001).

The model was designed to interact with a scenario generation program known as

GENSCN to test and compare a variety of operating scenarios (Bohman, Berris et al.

1995; Berris, Hess et al. 2001), as well as with a precipitation-runoff model for the generation of input data (Jeton 2000).

Following the completion of the studies by the USGS, Reclamation, in conjunction with the TROA Implementation Coordinator’s Office, began development of

a hydrologic model and decision support system to fully implement the TROA. The system was also developed to assist with operational forecasting and administration of current basin operating procedures prior to the implementation of TROA. The system was designed to provide operational forecasts to basin stakeholders during the spring runoff period. The system was also intended to provide current water accounting calculations for the Federal Watermaster’s Office (FWM). The system is a combination of three distinct daily models, each developed with the RiverWare modeling framework

(Zagona 2001), working in conjunction with a near real-time data collection utility developed by the TROA Implementation Coordinator’s Office. The three models were designed to provide current accounting calculations, basin naturalized streamflow forecasts, and regulated streamflow simulations, with each model carrying out one portion of the operation. The system is currently being reconfigured for the full implementation of the TROA (Rieker, Coors et al. 2005; Boyer 2006; Boyer 2006; Boyer

2006; Coors 2006; Coors 2006; Coors 2006; Mann 2006; Mann 2006; Rieker 2006;

Rieker 2006; Scott 2006). The system was utilized in the previously described water quality study by Neumann to model the use of WQSA purchased water (Neumann 2001;

Neumann 2006). The system is important to this study due to the fact that the system is intended to implement the TROA, which means that it will also likely be used to implement the WQSA.

4.5 – Development of the Truckee River Task Environment

In order for an agent to be able to learn how to act from experience, a task environment must be specified and provided to the agent. The task environment may be either a real environment that the agent may act upon, using actuators based on state and reward information received from sensors of the environment, or it may be a simulation of a real environment. For the case study that is the focus of this research, the real environment that the agent is expected to act upon in the future is the Truckee River reservoir system. Because gaining experience in the real environment would likely cause highly undesirable outcomes during the training and exploratory phases, a simulated version of the task environment was used for training and exploration by the agent.

The task environment (real or simulated) that the Truckee River agent interacts with is one of the most complex as defined by the classifications for a task environment outlined in the previous chapter. The environment is partially observable rather than fully observable; only key variables are provided to the agent, and they may not give it all the information necessary to know how the state of the river system will evolve from one day to the next. A full specification of the environment would potentially involve hundreds of state variables, including, but not limited to, reservoir storage values, individual water owner account values within each reservoir, streamflows, forecasted streamflows, temperatures, forecasted temperatures, current operating policies, and actions by other water users of the system. The only information provided to the agent includes the status of its own water account in the system’s reservoirs, as well as key state variables pertaining to the quality of water in the river.

The environment is stochastic rather than deterministic; the values of a variety of variables such as reservoir inflow, streamflow, temperature, and water quality for the system do not change in a deterministic manner. The environment is sequential rather than episodic; i.e., there are no “stopping points” or episodes that signal termination of the task being conducted by the agent. The agent attempts to improve water quality in the river system continuously by maximizing the immediate as well as longer-term performance. The environment is dynamic rather than static. The agent must determine the action continuously at each timestep. In addition, water availability and streamflow in the system change from year to year, sometimes passing through longer periods of abundant supply or drought. There is the potential for long-term non-stationarity due to the effects of climate change. The environment is continuous, rather than discrete.

Essentially all variables in the environment, both those observed by the agent and those hidden from the agent, have an infinitely broad spectrum of potential values.

Finally, the environment contains multiple agents, rather than a single agent. The focus of this research is on the agent that will be operating a certain portion of the stored water in the Truckee River reservoir system that is dedicated to improvement of water quality. Many other agents exist which control other segments of the stored Truckee

River water, as well as components of the river system, with each agent assigned a unique objective. Some of the agents are cooperative, such as the agent controlling water for the recovery of threatened and endangered fish species in the river which may take actions that provide an ancillary benefit to water quality. Some of the agents are competitive, such as the case of the agent controlling transbasin diversions out of the river at Derby Dam, which will cause an increased need to augment the flow of the lower

river flowing to Pyramid Lake for both habitat and water quality. Some of the agents have the ability to damage the interests of other agents, even when those agents are often cooperative. An example of this would be if the water quality agent (the focus of this research) attempts to increase flows for the benefit of water quality at a time when the agent controlling flows for fish species recovery is attempting to suppress flows which would initiate fish spawning behaviors. If the water quality augmentation flows result in a fish spawning run, the fish recovery agent is forced to use its stored water to complete the undesired spawning run over a much longer time frame than that which the water quality agent had planned to augment flows, thereby potentially reducing the stored water available to conduct a fish spawning run at a more desirable time. In total, it is estimated that half a dozen or more agents may be acting on the system at any given time.

4.5.1 – Development of Truckee Hydrologic and Reservoir Operations Model

4.5.1.1 – Model Structure

For the training of the agent, a simulated version of the Truckee River hydrologic and reservoir operating environment was used. The model is a river and reservoir operations model that simulates the Truckee River system as a series of nodes and links.

The model was created within the RiverWare modeling and decision support framework

(Zagona 2001). The modeling system operates on a daily timestep, using a combination of basic hydrologic methods for river and reservoir routing, and a set of operating “rules” to simulate reservoir operations. The system was originally developed as a single year

accounting and operational forecasting model by a team of engineers and hydrologists from Reclamation as well as the TROA Implementation Coordinator’s Office (Rieker,

Coors et al. 2005; Boyer 2006; Boyer 2006; Boyer 2006; Coors 2006; Coors 2006; Coors

2006; Mann 2006; Mann 2006; Rieker 2006; Rieker 2006; Scott 2006). The Truckee

River model has been converted to also function as a long-term planning model, with the capability of simulating several decades of reservoir operations in a single run. The model rules simulate current operating policy, and all major regulations, decrees, and policies currently governing the river system are included in the model ruleset. The functions of the various agents currently operating the river system are carried out through the various rules and guidelines embedded in the model, with the exception of the functions of the water quality agent that are the focus of this research. The model has been programmed with the ability at each timestep to output any number of state variables to an external process, and wait for a response from that external process before continuing with the simulation. In this manner, the water quality agent can serve as the external process, receiving information on the state variables, and returning an action to the simulated environment.
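For illustration, an external agent process coupled to the simulation in this manner could be structured as sketched below, assuming a simple line-based exchange of state and action values; the data exchange format, field order, and decision logic shown are placeholders rather than the actual interface between the RiverWare model and the agent.

```python
import sys

def choose_release(wqsa_storage, predicted_temp_c):
    # Placeholder decision logic; the real agent would consult its learned
    # action-value function here.
    return 0.0 if predicted_temp_c <= 22.0 else min(wqsa_storage, 1.0)

def main():
    # Each simulation timestep, the environment model is assumed to write one
    # line of state variables and then block until an action is written back.
    for line in sys.stdin:
        fields = line.split()
        if not fields or fields[0] == "END":
            break
        wqsa_storage = float(fields[0])    # stored WQSA water (units assumed)
        predicted_temp = float(fields[1])  # predicted river temperature, deg C (assumed)
        action = choose_release(wqsa_storage, predicted_temp)
        sys.stdout.write(f"{action}\n")
        sys.stdout.flush()  # the simulation waits on this response before continuing

if __name__ == "__main__":
    main()
```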

For this study, the capability to store WQSA water was introduced to the model using storage mechanisms originally proposed by Neumann (Neumann 2001). These mechanisms had historically been included in the model, but were removed after completion of that study since WQSA water was not as yet stored in the actual environment. The mechanisms proposed by Neumann were generally reintroduced to the model, with minor changes to address structural changes made to the hydrologic and reservoir operations model since the time the Neumann mechanisms existed in the model,

and also to address a more current understanding of how the storage of WQSA water will be implemented in actual practice. The basic mechanism for storage begins with the transfer of water from the general water supply in the river to a specific type of water that is required to remain in the river all of the way to Pyramid Lake as instream flow. The amount of water not released from upstream reservoirs for the purpose of augmenting flows for fish recovery is then converted to stored WQSA water in the upstream reservoirs for later use by the water quality flow augmentation agent. This mechanism forms a type of exchange between the fish recovery water and the WQSA water. As an alternative, the model allows for the optional conversion of fish recovery water in the reservoirs to stored WQSA water without the requirement of an exchange, providing for the storage of the entire amount of purchased water each year.

Neumann proposed an exchange of water between Boca and Stampede reservoirs to account for the potential release of unacceptably warm water from Boca Reservoir when the reservoir is at particularly low levels in the late summer (Neumann 2001).

Though a valuable finding from that study, this exchange mechanism was not reintroduced to the model primarily because of the increased complexity, as well as the limited impact on optimal release policies of the WQSA stored water.

4.5.1.2 – Model Data

Input data to the hydrologic and reservoir operations environment simulation model generally consists of initial values for each of the reservoirs and water accounts within the reservoirs, as well as hydrologic inflows to the reservoirs and uncontrolled

river reaches. Approximately thirty years of daily historic hydrologic data was originally developed (Rieker 2005), and later extended by others to encompass over fifty years of historic hydrology on the river system (Fritchel 2009). For operation of the planning model, a spreadsheet system was developed by Coors (Coors 2010) to allow user input of hydrologic exceedence probabilities for each year in a multiple year run, and generate synthetic hydrologic sequences as input to the model. The sequences are generated by averaging three similar-year hydrographs for each year in the sequence based on April to

July runoff volume for the river, and then fitting the overall hydrograph to match the total runoff volume associated with the user-entered hydrologic exceedence probability.

Alternatively, the spreadsheet system allows users to enter actual historic year values and generate hydrologic sequences based on historic data measurements. These sequences may either replicate an actual historic period by entering successive year values, or be used to create a synthetic sequence of data based on a selection of non-successive year values.
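For illustration, the sequence-generation step can be sketched as follows: three similar-year daily hydrographs are averaged and the result is rescaled so that its total volume matches the volume implied by the user-entered exceedence probability. The function and variable names are hypothetical, and the actual spreadsheet tool (Coors 2010) may differ in detail.

```python
from typing import List

def build_synthetic_year(similar_years: List[List[float]],
                         target_volume: float) -> List[float]:
    """Average similar-year daily hydrographs and scale to a target annual
    runoff volume (illustrative; units assumed consistent, e.g. MCM/day)."""
    days = len(similar_years[0])
    # Day-by-day average of the selected similar years
    averaged = [sum(year[d] for year in similar_years) / len(similar_years)
                for d in range(days)]
    # Rescale so the total volume matches the volume associated with the
    # chosen hydrologic exceedence probability
    scale = target_volume / sum(averaged)
    return [flow * scale for flow in averaged]
```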

For agent training and testing in the Truckee River case study, two primary datasets were developed and utilized. The first dataset consisted of the approximately 30-year period of historic data from October 1, 1970 through December 31, 1999. These data are referred to as the “30-year historic dataset.” This dataset represents the best available historic hydrologic data for a timespan found to be appropriate for use within the limits of the physical memory of the computers used for training and testing of the Truckee River case study, when utilizing multiple instances of the simulation model as discussed in

Section 3.2.5. The dataset covers the period of time beginning at the completion of the last major reservoir on the Truckee River (Stampede Reservoir), and includes a wide

variety of extreme events experienced in the Truckee River basin. It includes a series of significant drought years (early 1990’s), many significantly wet years (early 1980’s and late

1990’s), and one of the largest floods on the river in the past century (1997).

The second dataset constructed is a synthetic dataset composed of approximately

30 years of hydrologic data, generally consisting of a repeating 4 year cycle. The cycle includes one significantly wet year associated with a hydrologic exceedence probability of approximately 10 percent, followed by two moderately dry years and a final increasingly dry year. The first two dry years represented years with a 70 percent exceedence probability, and the final dry year represented an 80 percent exceedence probability. The intent of the dataset was to provide cycles of four year periods where the first year would provide the ability to fill the Truckee River reservoirs and store water for water quality purposes during a time when few or no downstream water quality issues would arise due to naturally wet conditions throughout the river system. The second through fourth years were designed to introduce conditions with increasingly poor downstream water quality on the river, whereby the water quality agent would be required to develop longer-term strategies to ensure the conservation of enough stored water to prevent more serious water quality issues over the course of many years of simulation. A limited number of other datasets were developed for various testing purposes, generally with a similar intent and design as the primary synthetic hydrology dataset.

4.5.2 – Development of Truckee Water Quality Model

To simulate water quality in the Truckee River task environment, several water quality models were considered. An initial effort was made to link the TrHSPF water quality model under development by the Cities of Reno and Sparks to the Truckee River hydrologic and reservoir operations model using data connections between the two models on a daily timestep. This was necessary due to the dependence of the simulated water quality on the flow rate in the river. The TrHSPF model was initially selected to serve as the water quality simulation model for the system because it was the newest and most current water quality model of the river, had ongoing developmental support by the cities, and had the ability to model water quality for the entire reach of river from the

California-Nevada state line to Pyramid Lake, which is the area of concern for this study.

Difficulties were encountered linking the TrHSPF model to the Truckee River hydrologic and reservoir operations model due to several issues. The first major issue was that the model timesteps are different, with the TrHSPF model operating on an hourly timestep and the reservoir operations model operating on a daily timestep. This was overcome by introducing an intermediate data processing step into the model linkage to provide the water quality model with the average daily flow on an hourly timestep.

The second major issue involved the fact that the TrHSPF model utilized an older, generally unsupported database format known as “ANNIE” (Lumb, Kittle et al. 1990) as its external data storage mechanism, making it incompatible with most current data transfer mechanisms. To overcome this issue, the model was modified to interface with a newer data storage mechanism. All existing input data was converted into the newer

mechanism, and the models were successfully linked together. The final major issue with the TrHSPF model was its need for a large number of water quality data inputs, many of which were dependent on flow, weather, and a variety of other hydrometeorological conditions.

The database of inputs had been developed for a historic period of time significantly shorter than the available historic input data for the hydrologic and reservoir operations model. Attempts were made to explore ways to extend the database to a longer period of time, but the effort proved intractable. Additionally, for the water quality agent to be able to perform in a near-real time control situation as intended, this large number of water quality inputs would need to be available to the agent on a near-real time and ongoing basis. Many of the inputs were of a type that would not support that need. For these reasons, the effort to link the TrHSPF model to the hydrologic and reservoir operations model was terminated. In addition, the effort as a whole became a large undertaking that diverted energy away from the true goal of the research, which was the development of the agent itself.

4.5.2.1 – Model Structure

In light of the difficulties encountered in the development of a complex water quality environment model, a more straightforward approach to water quality modeling was taken. This involved the use of the temperature prediction water quality model employed by Neumann in the predecessor study to this effort (Neumann, Rajagopalan et al. 2003). The temperature model was developed to predict the water temperature of the

Truckee River near Reno, Nevada. This was done to address one key component of the

water quality issues facing the Truckee River that was intended to be solved through the use of the WQSA stored water. The model was based on a detailed regression relating river temperature to both river flow and maximum daily air temperature in Reno, and was found to produce reasonable results (Neumann, Rajagopalan et al. 2003). The model was integrated directly into the Truckee River hydrologic and reservoir operations model as a set of rules that utilized the developed regression parameters to predict the water temperature in the river near Reno on a daily basis.
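For illustration, a rule of this regression-based form can be sketched as follows; the coefficients shown are placeholders and are not the calibrated parameters from the Neumann study.

```python
import math

def predict_river_temp_c(flow_cms: float, max_air_temp_c: float,
                         a: float = 0.6, b: float = -1.5, c: float = 8.0) -> float:
    """Illustrative estimate of daily river temperature (deg C) near Reno from
    simulated flow and maximum daily air temperature. Coefficients a, b, and c
    are hypothetical placeholders, not values from the calibrated model."""
    # The log of flow gives each additional unit of flow a diminishing cooling
    # effect, a common choice in flow-temperature relations.
    return c + a * max_air_temp_c + b * math.log(max(flow_cms, 0.1))
```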

4.5.2.2 – Model Data

Model input data utilized by the temperature prediction water quality model includes only two variables: the estimated maximum daily air temperature in Reno, and the flow passing the California-Nevada state line. The flow data are generated by the hydrologic and reservoir operations model as simulated flow for each timestep, with the estimated maximum daily temperatures in Reno obtained from the historic record. For this analysis, a static dataset of daily air temperatures was utilized. The data were obtained from the National Oceanic and Atmospheric Administration’s (NOAA) National

Climatic Data Center (NCDC). Temperature data obtained from the station at the Reno

Weather Bureau Office (WBO) were utilized for dates prior to 1973. Temperature data obtained from the station at the Reno Airport were utilized from 1973 to present. The dataset was made up of approximately 30 years of data, beginning in 1970 and ending in

1999 to match the time period of the 30-year historic hydrologic dataset used in the hydrologic and reservoir operations environment simulation model. For ease of model

operation, any data points in the dataset representing temperatures colder than 0° C were replaced with a data point at 0° C since these data points had little bearing on the model simulation and results. Though it is recognized that temperature and hydrology data may be related, for this study the two variables were assumed to be independent, and the temperature data was not rearranged or otherwise reprocessed in a manner similar to the processes used to create synthetic or non-historic hydrologic datasets as discussed in

Section 4.5.1.2.

4.5.2.3 – Temperature Targets and Water Quality Violations

As discussed above, water temperature was selected as the primary focus of water quality for the Truckee River case study. For the purposes of determining the desired water quality in the river, the needs of the two fish species listed as threatened or endangered under the Endangered Species Act were considered. These fish species were the cui-ui and Lahontan cutthroat trout (LCT). Of the two, the cui-ui generally only use the river for spawning during spring runoff, when temperature is not a major concern, and then return to Pyramid Lake after spawning (Scoppettone, Coleman et al. 1983).

For this reason, primary consideration was given to the needs of the LCT.

A system of target temperatures and water quality “violations” in the Truckee

River was utilized by Neumann in the earlier study of water quality issues on the

Truckee River (Neumann 2001). Those targets were based on earlier studies and interactions with water quality specialists. Under that system, the primary temperature target for the Truckee River flowing through Reno was considered to be 22° C. This was

also referred to as the “preferred maximum”, and though too high for spawning or juvenile fish rearing, was considered to be the upper bound of temperatures suitable for extended habitation by adult fish. A secondary target was established, which allowed for temperatures to temporarily exceed the primary target for up to four days, but not exceed

23° C. This secondary target was considered to be stressful but not fatal to the fish, as long as the duration in that temperature range did not exceed four days. The temperature of 23° C was thus considered to be a “chronic maximum”. A tertiary target was also established, which provided for temperature fluctuations of up to 24° C for one day or less. This was considered the “absolute maximum”. Temperatures exceeding any of these thresholds or durations were considered to be “violations”, not necessarily in a legal sense, but for the purposes of performance measurement. For the purposes of this study, the target temperature and violation system developed by Neumann as described above was adopted and utilized.
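For illustration, the target and violation system can be expressed in code as follows. This is a simplified interpretation of the duration rules used for performance measurement; the thresholds are those given above, but the exact counting logic used in this study may differ.

```python
PREFERRED_MAX_C = 22.0  # preferred maximum for extended adult habitation
CHRONIC_MAX_C = 23.0    # chronic maximum, tolerable for up to four days
ABSOLUTE_MAX_C = 24.0   # absolute maximum, tolerable for one day or less

def count_violation_days(daily_temps_c):
    """Count days that violate the target system described above, under a
    simplified interpretation of the duration rules."""
    violations = 0
    days_above_preferred = 0  # consecutive days above 22 C
    days_above_chronic = 0    # consecutive days above 23 C
    for temp in daily_temps_c:
        days_above_preferred = days_above_preferred + 1 if temp > PREFERRED_MAX_C else 0
        days_above_chronic = days_above_chronic + 1 if temp > CHRONIC_MAX_C else 0
        if (temp > ABSOLUTE_MAX_C              # above the absolute maximum
                or days_above_chronic > 1      # above the chronic maximum for more than one day
                or days_above_preferred > 4):  # above the preferred maximum for more than four days
            violations += 1
    return violations
```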

4.5.3 – Development of Reward Function

The reward function has been referred to as one of the most critical elements of the environment for the development of a successful agent. The development of the reward function provided to the water quality agent focused on the operation of the river for the improvement of water quality. Two reward functions were developed for the

Truckee River case study. The first reward function was designed to focus on the immediate issue of selecting daily release actions to improve downstream water quality on individual days when water quality issues are predicted to arise. The second reward

function was designed to focus on the longer-term issue of preventing water quality issues over the course of a number of years. This reward function was intended to be used in association with the selection of seasonal or annual release policies that would potentially allow certain lower-level water quality issues to occur, but ensure that sufficient storage was maintained to be able to mitigate more significant water quality issues during a drought period.

The first reward function, focusing on the immediate release action selection, incorporates three distinct goals for the agent, including 1) improvement of river temperature when the temperature is above the preferred maximum temperature for the threatened and endangered species of fish in the river, 2) conservation of stored WQSA water when the water is not needed for improvement of water temperature in the river, and 3) acting on the environment within the domain of admissible actions. To fully encapsulate these goals into a reward function, the reward function was developed to be inversely proportional to the quality of the action taken by the agent, focusing on punishment values for not meeting the goals. The reward function was designed to be minimized to achieve improved actions.

To evaluate each of the agent’s actions against the first and primary goal of the agent (maintenance of river temperature below the preferred maximum), Equation 4.1 was developed:

(Eqn 4.1)

where:


The vector of state values contains two primary values: the amount of water stored for the purpose of augmenting flow for water quality improvements, and the downstream water quality of the river. The action value is the amount of water released from storage for the specific purpose of augmenting flow for water quality improvement, in addition to water already scheduled for release for other purposes. The user is permitted to adjust the multiplier and exponent factors in the equation to provide better feedback to the agent regarding the value of its actions.

To evaluate each of the agent’s actions against the second goal of the agent

(conservation of stored water), Equation 4.2 was developed:

(Eqn 4.2)

The user is permitted to adjust the multiplier and exponent factors to provide better feedback to the agent regarding the value of its actions. By maintaining a multiplier and/or exponent factor less than that used in Equation 4.1, achievement of this goal can be given a lower priority than the first.

To evaluate each of the agent’s actions against the third and final goal of the agent

(actions within the admissible domain), Equation 4.3 was developed:

(Eqn 4.3)

where:

This equation focuses only on taking actions that do not attempt to release water that is not currently held in storage. In other words, it attempts to prevent the agent from directing the release of water that it does not have available. The equation does not focus on preventing the agent from releasing more water than the reservoir is physically capable of releasing, because that amount will be highly dynamic and depend on a multitude of other factors beyond the control of the agent. The user is permitted to adjust the multiplier, exponent, and maximum reward limiter factors to provide better feedback to the agent regarding the value of its actions. The multiplier serves as a scaling factor to make this evaluation commensurate to that obtained from equations 4.1 and 4.2. The maximum reward limiter acts as a limitation on the upper value of this particular reward equation, to prevent the result of the equation from growing to a point where it becomes a higher priority than Equations 4.1 and 4.2 for the agent, or negatively impacts the agent’s learning ability.

The final reward function is the sum of all three equations presented above, implemented as Equation 4.4:

(Eqn 4.4)
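The structure described above for Equations 4.1 through 4.4 can be illustrated with the following sketch: a temperature-excess penalty, a storage-use penalty, an inadmissible-action penalty with a maximum reward limiter, and their sum, each with user-adjustable multiplier and exponent factors. The functional forms and default coefficients shown are assumptions for illustration only, not the equations developed for this study.

```python
def temperature_penalty(temp_c, preferred_max_c=22.0, m1=10.0, e1=2.0):
    # Goal 1 (cf. Eqn 4.1): penalize predicted temperature above the
    # preferred maximum; zero penalty when the target is met.
    excess = max(temp_c - preferred_max_c, 0.0)
    return m1 * excess ** e1

def storage_use_penalty(release_mcm, m2=1.0, e2=1.0):
    # Goal 2 (cf. Eqn 4.2): penalize use of stored WQSA water so that water is
    # conserved when not needed; weighted lower than Goal 1.
    return m2 * max(release_mcm, 0.0) ** e2

def inadmissible_action_penalty(release_mcm, storage_mcm, m3=100.0, e3=1.0, cap=1000.0):
    # Goal 3 (cf. Eqn 4.3): penalize attempts to release more water than is
    # held in storage, limited by a maximum reward limiter (cap).
    over_release = max(release_mcm - storage_mcm, 0.0)
    return min(m3 * over_release ** e3, cap)

def short_term_reward(temp_after_action_c, release_mcm, storage_mcm):
    # Cf. Eqn 4.4: the composite reward is the sum of the three penalties and
    # is minimized by the agent to find improved actions.
    return (temperature_penalty(temp_after_action_c)
            + storage_use_penalty(release_mcm)
            + inadmissible_action_penalty(release_mcm, storage_mcm))
```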

The second overall reward function, focusing on the seasonal or annual policy selection action, incorporates only one goal for the agent, which is the improvement of river temperature when the temperature is above the preferred maximum temperature for

the survival of the threatened and endangered species of fish in the river, as viewed across an entire season or year. Because the agent is selecting an overall policy for the season or year, the agent is no longer concerned with the day-to-day release actions, and it is assumed that the range of potential policies presented to the agent will be crafted in a way that ensures that the water will not be used on days when it is not needed to carry out the seasonal policy, nor in a way that attempts to release more water than the agent has available. Similar to the first overall reward function shown in Equation 4.4, the seasonal reward function was developed to be inversely proportional to the quality of the action taken by the agent, requiring minimization to achieve improved actions.

To evaluate each of the agent’s policy selection actions against the goal of the agent (overall seasonal/annual maintenance of river temperature below the preferred maximum), Equation 4.5 was developed:

(Eqn 4.5)

where:

The remaining variables and coefficients are the same as those shown for Equation 4.1.

Equation 4.5 essentially multiplies the number of days during a season or year at a particular temperature by a penalty function for that temperature, and sums the result. The penalty function increases with increasing temperature. For this reason, the reward function is designed to motivate the agent to select policies that eliminate downstream temperature issues, but in the event that all issues cannot be eliminated, the agent is more motivated to select a policy action that reduces higher temperature extremes by allowing more temperature violations to occur in the lower temperature ranges.
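The structure described for Equation 4.5 can be illustrated as follows; the penalty form and coefficients are placeholders chosen only to show the increasing shape, not the equation developed for this study.

```python
def temperature_day_penalty(temp_c, preferred_max_c=22.0, m=1.0, e=2.0):
    # Hypothetical increasing penalty for one day spent at the given
    # temperature; zero if the preferred maximum is not exceeded.
    return m * max(temp_c - preferred_max_c, 0.0) ** e

def seasonal_reward(daily_temps_c):
    """Illustrative seasonal reward (cf. Eqn 4.5). Summing the per-day penalty
    over the season is equivalent to multiplying the number of days spent at
    each temperature by the penalty for that temperature and summing."""
    return sum(temperature_day_penalty(t) for t in daily_temps_c)  # minimized by the long-term agent
```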

4.5.4 – Testing/Validation of Environment Model

Initial completion of the Truckee River hydrologic and reservoir operations model occurred in 2004, and though ongoing development has continued to occur, the results of simulation modeling have been presented at monthly forecasting meetings to basin stakeholders since that time. Testing and calibration of the model has occurred iteratively through feedback from the members of that group. The State of California’s Department of Water Resources conducted several annual reviews of the results of the model output, showing that it performed well as compared to actual conditions, even when used in the forecasting mode (Nelson 2006; Rieker 2006). The Truckee River temperature water quality model was validated through the work of Neumann (Neumann, Rajagopalan et al.

2003).

4.6 – Development of the Truckee River Rational Agent

For the Truckee River case study, two rational agents were developed, each with similar but distinct goals. The first agent was designed to provide daily release action decisions for the WQSA water held in storage, based on the current amount of storage available and the predicted downstream water temperature for that day. This agent is

referred to as the “short-term agent.” The second agent was designed to provide decisions with respect to the overall release policy to be used during the course of a year based on the amount of storage available and an indicator variable to help predict the overall requirements for flow augmentation for the year. This agent is referred to as the

“long-term agent.” The first agent was designed to be capable of generating daily release policies based on immediate reward, while the second agent was designed to address longer-term issues by developing an overall policy for the use of the water that would ensure proper storage to mitigate broader issues over a multi-year period.

4.6.1 – State and Action Representation

For the short-term Truckee River rational agent, the state information received from the environment includes the total storage of WQSA water that the agent currently has available to utilize, and the predicted temperature of the Truckee River water in Reno for the current timestep (day) if no additional releases are made beyond what was released for other purposes. The agent is also provided the reward calculation from the action taken by the agent in the previous timestep. The action taken by the agent on the environment is a direction to the environment regarding the amount of WQSA water to release into the river. Though the Truckee River agent operates in a multi-reservoir environment, it does not require specific information on the WQSA water stored in each reservoir, nor does it require actions that determine which reservoir to release the WQSA water from, since the WQSA water is stored and released from the multiple reservoirs in a simple priority order that is automatically determined by the environment model. This

mimics the real-world priority system in place for storage and release from most water accounts in the Truckee reservoir system.

Tests were conducted using various combinations of the state representation provided to the agent. One set of tests involved just the actual values for WQSA storage and predicted river temperature. Another set of tests used those same state dimensions, but applied the “feature extraction” concepts discussed in Section 3.2.5.1.

The feature extraction applied to those state variables included partitioning the WQSA storage value into 16 possible categories, as summarized in Table 4.5 below. Actual values of the WQSA storage volume were associated with the category corresponding to the largest storage value in the table that does not exceed the actual value, thus producing a “round down” operation on the actual values. The categories were designed to provide a higher level of discretization of the storage variable in the lower range of potential storage values, where the agent’s actions were likely to have the potential to deplete storage or attempt an over-release. The goal behind that partitioning was to allow the agent to learn exactly what releases can be made if remaining storage is limited. With higher storage values, each category spans a broader range of potential storages, since the agent is less likely to be concerned with over-release or immediate depletion of storage in this range.

In order to keep the overall process computationally manageable, particularly for the tabular implementation, the agent’s possible actions were limited to the actions that would most likely change the predicted downstream temperature by increments of 0.5° C.

The volume of water necessary to facilitate those incremental releases is reflected in the categories used for the lower range of the storage discretization.

Table 4.5 – Short-term agent WQSA storage discretization

WQSA Storage        Category    Maximum Possible Release
MCM (AF)                        cms (cfs)
0 (0)               0           0 (0)
0.09 (70)           1           1 (35)
0.17 (140)          2           2 (71)
0.26 (210)          3           3 (106)
0.35 (280)          4           4 (141)
0.43 (350)          5           5 (176)
0.52 (420)          6           6 (212)
0.6 (490)           7           7 (247)
0.69 (560)          8           8 (282)
1.38 (1,120)        9           16 (565)
2.76 (2,240)        10          32 (1,129)
5.53 (4,480)        11          64 (2,259)
8.29 (6,720)        12          96 (3,388)
11.05 (8,960)       13          128 (4,517)
13.81 (11,200)      14          160 (5,647)
16.58 (13,440)      15          192 (6,776)

The feature extraction concept was also applied to the state variable representing the predicted downstream temperature in the Truckee River. The predicted temperature was discretized into 13 categories, representing the temperature range of highest concern to the agent. That discretization is shown in Table 4.6. Temperature values were associated with the category corresponding to the smallest temperature value in the table greater than or equal to the actual value, thus producing a “round up” operation on the actual values to find the appropriate category.

Table 4.6 – Temperature discretization

Temperature (C)     Category
<= 21               0
21.5                1
22                  2
22.5                3
23                  4
23.5                5
24                  6
24.5                7
25                  8
25.5                9
26                  10
27                  11
> 27                12
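For illustration, the two category lookups can be implemented as follows, using the boundary values from Tables 4.5 and 4.6; the helper names are hypothetical and the actual implementation within the agent may differ.

```python
import bisect

# Lower bounds (MCM) of the short-term agent's WQSA storage categories (Table 4.5)
STORAGE_BOUNDS_MCM = [0.0, 0.09, 0.17, 0.26, 0.35, 0.43, 0.52, 0.6, 0.69,
                      1.38, 2.76, 5.53, 8.29, 11.05, 13.81, 16.58]

# Upper bounds (deg C) of the temperature categories (Table 4.6); values above
# 27 C fall into category 12
TEMP_BOUNDS_C = [21.0, 21.5, 22.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0, 27.0]

def storage_category(storage_mcm: float) -> int:
    # "Round down": largest tabulated storage value not exceeding the actual value
    return max(bisect.bisect_right(STORAGE_BOUNDS_MCM, storage_mcm) - 1, 0)

def temperature_category(temp_c: float) -> int:
    # "Round up": smallest tabulated temperature value at or above the actual value
    return bisect.bisect_left(TEMP_BOUNDS_C, temp_c)
```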

For the long-term Truckee River rational agent, the state information received from the environment includes the total amount of dedicated WQSA water in storage as of June 1 of each year, as well as the predicted peak storage for Lake Tahoe for the year.

Those two variables generally provide the agent with sufficient information to make a decision on the manner in which to use the WQSA water for the year. The reason for the inclusion of the predicted peak storage for Lake Tahoe is that Lake Tahoe represents the vast majority of the stored water in the overall Truckee reservoir system. Most temperature violations predicted by the models occur in years when Lake Tahoe’s storage is depleted, resulting in extremely little flow in the lower Truckee River mostly attributable to groundwater seepage or return flows. The inclusion of this variable was intended to provide the agent with a representation of the likelihood of upcoming drought and thus a need for large amounts of stored water to prevent water quality issues. The peak storage of Lake Tahoe is fairly easily predicted by June 1 of each year, when

WQSA stored water releases would generally begin to occur due to increasing summertime water temperatures. The variable associated with the predicted peak storage in Lake Tahoe was discretized into broad ranges of possible storage values, as shown in

Table 4.7. As with the WQSA storage discretization, actual values of Lake Tahoe storage volume were associated with the category corresponding to the largest storage value in the table that does not exceed the actual value, thus producing a “round down” operation on the actual values.

Table 4.7 – Lake Tahoe storage discretization

Lake Tahoe Storage      Category
MCM (AF)
0 (0)                   0
123.35 (100,000)        1
246.7 (200,000)         2
> 246.7 (200,000)       3

For the long-term agent, the WQSA storage variable was discretized differently than for the short-term agent, because the long-term agent is only interested in the overall amount of storage available to it. By providing larger categories of discretization, the agent is able to better determine the effects of its actions, whether the action is to select a policy resulting in large volumes being released during the course of the year, or in additional storage being gained. The discretization of the WQSA storage variable for the long-term agent is shown in Table 4.8. Similar to the short-term agent, actual values of the WQSA storage volume were associated with the category corresponding to the largest storage value in the table that does not exceed the actual value, thus producing a “round down” operation on the actual values.

Table 4.8 – Long-term agent WQSA storage discretization

WQSA Storage MCM (AF)    Category
0 (0)                    0
6.17 (5,000)             1
12.33 (10,000)           2
18.5 (15,000)            3
24.67 (20,000)           4
30.84 (25,000)           5
37 (30,000)              6
43.17 (35,000)           7
49.34 (40,000)           8
61.67 (50,000)           9
74.01 (60,000)           10
86.34 (70,000)           11
> 98.68 (80,000)         12
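The corresponding round-down operation on the long-term agent's storage variable can be sketched in the same way, using the thresholds of Table 4.8. The function is a hypothetical illustration, not the study's implementation.

    # Thresholds from Table 4.8; the function is an illustrative sketch, not the study's code.
    STORAGE_THRESHOLDS_MCM = [0.0, 6.17, 12.33, 18.5, 24.67, 30.84, 37.0,
                              43.17, 49.34, 61.67, 74.01, 86.34, 98.68]

    def wqsa_storage_category(storage_mcm):
        """Return the category of the largest tabulated storage not exceeding the actual value."""
        category = 0
        for i, threshold in enumerate(STORAGE_THRESHOLDS_MCM):
            if storage_mcm >= threshold:
                category = i
        return category

    print(wqsa_storage_category(10.0))     # 1, i.e. "rounded down" to the 6.17 MCM (5,000 AF) row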

The action provided to the environment by the long-term agent is the selection of a policy to be followed for the entire year in determining the amount of WQSA water to be released. That policy determines the day-to-day releases of the water, depending solely on the predicted downstream water temperature for that day absent any releases of WQSA water. The range of policies provided to the agent is essentially the same as the set developed by the short-term agent based on immediate reward, as will be shown in the following chapter. Three release policies are available; those policies are designed to result in daily actions that reduce the downstream water temperature to the level associated with the preferred maximum, chronic maximum, or acute maximum as defined in Section 4.5.2.3. A fourth policy is available to the agent, which is a “no-action” policy. Under this policy, no WQSA water is released for the year, and all water is stored for future years. The four policies available to the agent are shown in Table 4.9.

Table 4.9 – Long-term agent policy selection options

Temperature      Preferred Policy      Chronic Policy        Acute Policy          Zero Release Policy
° C (category)   Release, cms (cfs)    Release, cms (cfs)    Release, cms (cfs)    Release, cms (cfs)
< 21 (0)         0 (0)                 0 (0)                 0 (0)                 0 (0)
21.5 (1)         0 (0)                 0 (0)                 0 (0)                 0 (0)
22 (2)           0 (0)                 0 (0)                 0 (0)                 0 (0)
22.5 (3)         1 (35)                0 (0)                 0 (0)                 0 (0)
23 (4)           2 (70)                0 (0)                 0 (0)                 0 (0)
23.5 (5)         3 (105)               1 (35)                0 (0)                 0 (0)
24 (6)           4 (140)               2 (70)                0 (0)                 0 (0)
24.5 (7)         5 (175)               3 (105)               1 (35)                0 (0)
25 (8)           6 (210)               4 (140)               2 (70)                0 (0)
25.5 (9)         7 (245)               5 (175)               3 (105)               0 (0)
26 (10)          8 (280)               6 (210)               4 (140)               0 (0)
27 (11)          10 (350)              7 (245)               5 (175)               0 (0)
> 27 (12)        12 (420)              8 (280)               6 (210)               0 (0)
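For illustration, the four annual policies of Table 4.9 can be expressed as simple lookups from temperature category to daily release. The release values are taken directly from the table; the data structure and function names are hypothetical.

    # Release values (cms) per temperature category 0-12, taken from Table 4.9;
    # the lookup structure itself is only an illustration.
    POLICIES = {
        "preferred": [0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12],
        "chronic":   [0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8],
        "acute":     [0, 0, 0, 0, 0, 0, 0, 1, 2, 3, 4, 5, 6],
        "zero":      [0] * 13,
    }

    def daily_release_cms(policy_name, temp_category):
        """Daily WQSA release implied by the annual policy selected by the long-term agent."""
        return POLICIES[policy_name][temp_category]

    print(daily_release_cms("chronic", 8))    # 4 cms (140 cfs) at temperature category 8 (25 C)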

4.6.2 – Agent Specification and Design

For the purposes of the Truckee River case study, the generalized agent specification outlined in Section 3.2.2 was utilized. The agent program for the development of the Truckee River agent was constructed for both the tabular implementation and neural network implementation (with genetic algorithm training mechanisms) as previously discussed. The learning methods applied to the Truckee

River agent case are the same as those previously detailed in Section 3.2.3. For any particular simulation test, either the short-term or long-term agent is used; the two are not used simultaneously. A diagram of the Truckee River agent is shown in Figure 4.2.

[Diagram: the Truckee River environment model passes state information (WQSA storage, temperature, Lake Tahoe storage) and the reward signal to the Truckee River agent; the agent's action-value function (table or neural net) and policy function (policy table or neural net) return actions (a WQSA release or a policy selection) to the environment.]

Figure 4.2 – Diagram of the Truckee River agent structure

The short-term agent for the Truckee River is designed to act on a daily basis between June 1 and December 1 of each year in a simulated model run, representing the season when essentially all water temperature issues occur. At each daily timestep, the agent receives the state information regarding the current WQSA storage available and predicted downstream water temperature absent any action, and the agent returns an action selection representing that day’s intended release of WQSA water. The agent receives the reward information for the previous day for use in its action-value update calculations.
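A schematic of this daily state-action-reward cycle is sketched below. The environment and agent objects stand in for the simulation model and the reinforcement learning agent, and all method names are placeholders rather than the study's actual interfaces.

    def run_season(env, agent):
        """One June 1 to December 1 season of daily state/action/reward exchanges."""
        state = env.reset_to_june_first()            # (WQSA storage category, temperature category)
        previous = None                              # last (state, action) pair awaiting its reward
        while not env.season_finished():
            action = agent.select_release(state)     # today's intended WQSA release
            reward = env.simulate_day(action)        # reward reflects the previous day's outcome
            if previous is not None:
                agent.update(*previous, reward, state)
            previous = (state, action)
            state = env.observe_state()
        return agent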

The long-term agent is designed to act on an annual basis on June 1 of each year in a simulated model run. On that timestep of the run, the agent receives the state information regarding the WQSA storage and predicted peak storage in Lake Tahoe, and returns an action selection representing the WQSA release policy to be used for the

remainder of that year. The agent receives the reward information on the overall performance of the policy used in the previous year for use in its action-value update calculations.


5 – RESULTS AND ANALYSIS

The models and processes developed in Chapter 3 were utilized on the Truckee

River test case discussed in Chapter 4. This chapter summarizes and analyzes the results of that application. The first section of this chapter discusses the development of release policies for the improvement of water quality on the Truckee River using both the short- term and long-term agents developed in Section 4.6. The following two sections discuss the effectiveness of many of the different technological options for applying reinforcement learning to the Truckee River case study.

5.1 – Development of Water Quality Release Policies

5.1.1 – Development of Basic Water Quality Release Policies

Testing on the Truckee River case study using the short-term agent focused on the development of daily policies for the release of the stored water specific to water quality purposes. Basic release policies were developed by applying the reward functions developed in Equations 4.1 through 4.4, and utilizing the system of discretization and feature extraction discussed in Section 4.6.1. For the development of the basic daily release policies, only the WQSA storage and predicted temperature state variables were

provided to the agent, and the focus was on policies that would attempt to keep the water temperature under the preferred maximum of 22° C. It was found that these basic release policies could generally be discovered by the agent through training over many repeated runs of the simulation model driven by the 30-year historic hydrology. For the discovery of basic release policies that focus on the immediate improvement of water quality, a discount rate of zero was used for the update Equation 3.13. This focused the agent entirely on immediate reward and therefore the issue of immediate temperature problems.
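The effect of the zero discount rate can be seen in a minimal sketch of a tabular one-step update. The sketch assumes Equation 3.13 takes the standard temporal-difference form; the exact equation is defined in Chapter 3, and the state, action, and reward values below are hypothetical.

    def td_update(q, state, action, reward, next_value=0.0, alpha=0.1, gamma=0.0):
        """Update Q(s, a) toward reward + gamma * next_value; gamma = 0 keeps only immediate reward."""
        old = q.get((state, action), 0.0)
        q[(state, action)] = old + alpha * (reward + gamma * next_value - old)

    q_table = {}
    td_update(q_table, state=(3, 5), action=2, reward=-40.0)   # hypothetical state, action, reward
    print(q_table)                                             # {((3, 5), 2): -4.0}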

It was determined that the amount of water eligible for conversion into storage using the exchange method described in Section 4.5.1.1 resulted in extremely little

WQSA water actually being stored, due to the fact that the conditions for this type of exchange are very rare. Based on the 30-year hydrologic dataset, it was found that storage only occurred in 12 out of 30 years, and a maximum of only approximately 4.3

MCM (3,520 acre-feet) could be stored while following a policy that addresses immediate downstream water quality needs. For this reason, all subsequent modeling was conducted with the model transferring the full amount of purchased water to storage each year through conversion of that water from the water in storage dedicated to the recovery of the downstream fish species.

It was further discovered through testing based on the 30-year historic hydrology that even when WQSA water is allowed to be converted as described above, the WQSA water is not able to accrue in storage to any significant value due to the fact that it holds essentially the lowest priority for the storage rights in Stampede, Boca, and Prosser

Reservoirs, where it would generally be stored. Because of this low storage priority, in

years when the water is not generally needed in any significant amount to improve downstream water quality, the reservoirs where the water is stored would commonly fill to capacity, causing the WQSA water to be displaced or “spilled” by water stored for other purposes holding a higher priority storage right. Then in years when the water is needed, generally there is not enough water in storage to resolve all downstream water quality issues. Although this latter finding is similar to that made in previous studies

(Neumann 2001), the period of time examined by those studies was not as extensive, and the magnitude of the spill issue and its impacts on the amount of water able to be stored for WQSA purposes were not fully known or explored in those studies. During the 30-year hydrologic test runs, it was found that generally, the WQSA water would be completely displaced in 11 out of 30 years.

As a result of the limited ability to store WQSA water, the water quality agent was generally only able to discover policies associated with lower storage volumes. This issue was magnified by the fact that the agent generally attempts to follow a greedy policy based on its current knowledge and experience while sometimes following exploratory actions according to the particular exploration algorithm in use. Both the use of a greedy policy and the occasional exploratory actions serve to reduce the agent’s ability to store water, preventing it from learning policies in higher storage categories. In testing based on the 30-year historic hydrology, it was found that when following a greedy policy, storage did not exceed 25.9 MCM (21,000 acre-feet), and even if the agent attempted to constantly store all currently available WQSA water, the most it could ever store was approximately 63.0 MCM (51,100 acre-feet).
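One common exploration scheme of the kind referred to here is the epsilon-greedy rule sketched below, which acts greedily on the current action-value estimates most of the time and takes a random action otherwise. The study's specific exploration algorithms are those described in Section 3.2.5; this sketch and its values are illustrative only.

    import random

    def choose_action(q, state, actions, epsilon=0.1):
        """Pick the greedy action most of the time, a random action with probability epsilon."""
        if random.random() < epsilon:
            return random.choice(actions)
        return max(actions, key=lambda a: q.get((state, a), 0.0))

    # Hypothetical action values: explores 10% of the time, otherwise selects action 1.
    q = {((2, 4), 0): -50.0, ((2, 4), 1): -20.0, ((2, 4), 2): -35.0}
    print(choose_action(q, state=(2, 4), actions=[0, 1, 2], epsilon=0.1))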

In order to more fully develop the scope of the policies learned by the agent and test the learning capabilities of the agent, the simulation models were adjusted to provide a firm amount of storage capacity dedicated to the WQSA water. The agent was able to store up to the firm amount without danger of the water being displaced by other water in the reservoir. Testing was conducted with this adjustment, and then the models were further adjusted to allow the agent to have access to any amount of water available in

Stampede Reservoir that it desired, whether the water was dedicated to WQSA purposes or not. This was done to test the ability to learn policies for all conditions based on ample water supply. Under this “idealized training” condition, model runs were made which locked the storage available to the agent into each of the different storage categories identified in Table 4.5, with multiple runs per storage category, ensuring that adequate exploration could occur at all possible storage values. Access to additional water was limited to the water in Stampede reservoir because it is the primary storage reservoir for WQSA water, it is the largest of the reservoirs available to the WQSA water, and the other water in the reservoir is generally the water used for downstream fishery purposes which has similar goals to the WQSA.

The policies developed by the tests discussed are shown in Tables C.1 and C.2 in

Appendix C. The policies presented are limited to the tests conducted with the full amount of WQSA water being stored through the more flexible exchange process, and with 49.3 MCM (40,000 acre-feet) of storage reserved as “firm” and not subject to spill.

States that were experienced fewer than 15 times during a test were determined not to have had sufficient opportunity to form a policy, and are denoted by an asterisk (*) in the tables.

An analytically derived table, showing the immediate reward policy that was expected based strictly on the amount of water required to bring predicted water temperatures below 22° C, is shown in Table C.3. This table was generated using the expected rewards that would be provided to the agent for taking actions in the various state categories listed. This overall policy is what the agent would be expected to learn through training. The similarity between the policies developed by the agent in Tables

C.1 and C.2 and the analytically derived policy shown in Table C.3 illustrates that the agent was successful at learning a short-term strategy for water quality improvement, and that this strategy was similar to what could have been expected based on the release required to bring water temperatures down to the 22° C preferred target.

The rate at which basic water quality release policies were able to stabilize to the values shown in Tables C.1 and C.2 varied depending primarily on the amount of water available to the agent. “Idealized training” runs where the agent was provided access to alternate water supplies and the storage was locked into the various storage categories generally converged very nearly to the final solution by the end of the first set of runs of a

30-year simulation for each storage category. Simulation model runs made without the benefit of the additional water generally did not converge as quickly, and even though additional testing would possibly introduce enough exploration opportunities at some portions of the state and action space to add a policy data point, the runs were terminated once it was clear that the states visited most often had converged to a reasonable degree.

However, that form of training generally achieved better final convergence to the analytically derived policy than the idealized training. Convergence results for the runs discussed in Section 5.1.1 are summarized in Figures 5.1 and 5.2. Figure 5.1 shows the

agent’s behavior under both normal and idealized training, and illustrates the speed at which the agent’s policy approaches the final policy in Tables C.1 and C.2. Figure 5.2 shows how the agent’s policy approaches the analytical policy in Table C.3 for both types of training. By reviewing both figures, it is noted that under normal training conditions, the agent’s policy continues to shift at a steady rate even after stabilizing near the analytically derived policy.
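The convergence measure plotted in Figures 5.1 and 5.2 is simply a count of the policy table entries that differ from a reference policy. A sketch of that count is shown below, assuming policies are stored as dictionaries keyed by (storage category, temperature category); the example values are hypothetical.

    def policy_difference(policy, reference):
        """Count the state categories where the two policies select different release actions."""
        return sum(1 for state, action in reference.items() if policy.get(state) != action)

    reference = {(0, 3): 1, (0, 4): 2, (1, 5): 3}     # hypothetical reference (final or analytical) policy
    current   = {(0, 3): 1, (0, 4): 0, (1, 5): 3}     # hypothetical agent policy during training
    print(policy_difference(current, reference))      # 1 policy point differs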

[Figure: 30-Year Dataset, Convergence to Final Policy; y-axis: number of policy points different from the final policy; x-axis: number of runs; series: Normal Training, Idealized Training.]

Figure 5.1 – Short-term agent training, convergence to final policy

[Figure: 30-Year Dataset, Convergence to Analytical Policy; y-axis: number of policy points different from the analytical policy; x-axis: number of runs; series: Normal Training, Idealized Training.]

Figure 5.2 – Short-term agent training, convergence to analytical policy

Simulation model evaluations were conducted using the policies shown in Tables

C.1 through C.3 using the same 30-year historic hydrology. The policies were evaluated based on the number of days resulting in predicted river temperatures at the various temperature categories identified in Section 4.6.1, and the number of violations occurring during the run as defined in Section 4.5.2.3. The results of the evaluation of those policies are shown in Table 5.1. For the evaluations, the agent was provided with 98.7

MCM (80,000 acre-feet) of firm storage, and all WQSA water was made available each year through the flexible exchange previously described. For comparison, the results for

“baseline” runs with no WQSA water used are shown, as well as runs made with the strict exchange requirements in place, and runs without the benefit of any firm storage.

The latter runs were made using the analytically derived policy. Values for the

analytically derived policy were used in place of the missing values in Table C.1 where data was not developed by the agent due to low experience and exploration opportunities.

Table 5.1 – Evaluations of short-term agent policies over 30-year historic hydrology

                                 Analytically Derived Policy                              Agent Policy
Water Quality                    No firm storage,    No firm storage,    Firm storage,    Normal      Idealized
Variable             Baseline    strict exchange     full exchange       full exchange    Training    Training
Chronic Violations   10          10                  3                   4                4           4
Acute Violations     39          38                  16                  6                6           6
Absolute Violations  416         415                 329                 216              214         212
22-22.5° C (# days)  120         110                 99                  163              159         162
22.5-23° C (# days)  92          90                  29                  8                7           7
23-23.5° C (# days)  67          67                  42                  11               12          12
23.5-24° C (# days)  85          84                  39                  18               21          23
24-24.5° C (# days)  74          70                  51                  33               33          30
24.5-25° C (# days)  53          56                  46                  34               31          33
25-25.5° C (# days)  73          68                  52                  34               35          33
25.5-26° C (# days)  82          81                  55                  35               35          37
26-27° C (# days)    105         111                 98                  54               54          53
>27° C (# days)      29          29                  27                  26               26          26

Table 5.1 shows that the policies learned by the agent produced results highly similar to those produced by the analytically derived policy, indicating successful agent learning. The table shows the value of the use of WQSA water through large reductions in the number of days of elevated water temperature on the river. The table also shows the importance of added flexibility in the way that water is exchanged into storage, as well as the value of having firm storage that is not subject to spill. Based on the testing conducted, without these added flexibilities in the management of the water, the ability to store and later release WQSA water has very little value, as shown in the table.

It was found that the majority of the downstream temperature issues during the

30-year historical hydrologic simulations occur during the time in the historic record associated with the drought experienced in the early 1990’s. That drought causes complete depletion of all reservoirs in the Truckee reservoir system, and results in a large number of days with high water temperature and water quality violations. The results from the runs based on the 30-year historical hydrologic data illustrate the large number of days with unavoidably high temperatures and water quality violations when following an immediate reward policy, as seen in Table 5.1.

5.1.2 – Development of Improved Water Quality Release Policies

Testing was conducted to identify whether the water quality agent would be able to develop release policies for the WQSA water that produced better long-term results based on the use of non-greedy actions, or actions that did not necessarily produce the highest immediate reward. These actions would generally involve storing more WQSA water at times when it should have been released for downstream water quality improvement, so that more would be available at times of even greater need. Testing was initially conducted using the short-term agent. In addition to utilizing both the SARSA and Q-Learning methodologies and the various exploratory action selection methods described in Section 3.2.5, testing included increasing the discount rate, using eligibility traces, initializing the learning process with the basic policy and action-value functions developed in Section 5.1.1, using various configurations of state representation, dramatically increasing the earned reward, and providing for extended simulation.

Several of the tests produced policies that tended to conserve more water when the total volume of WQSA water was lower, but none of those tests produced significantly improved results over those shown by the basic release policy. In general, the tests did not result in an overall policy that was significantly different from the immediate reward policy. One of the primary reasons for this result was the extremely delayed nature of the rewards for storing additional water for use during drought periods, particularly when simulating at a daily timescale.

Additional testing was conducted using the long-term agent specification to develop improved policies over the span of many years. This testing utilized both the 30- year and synthetic hydrologic datasets, and assumed both the flexible WQSA storage exchange that allowed full storage of all WQSA water each year, and a firm storage of

WQSA water that was protected from spill of 98.7 MCM (80,000 acre-feet). OFU was the primary exploration mechanism utilized. The testing showed that the agent was able to generate overall policies that were better in the long-term than the immediate reward policy at mitigating more significant temperature issues, particularly through drought periods. The performance of the agent on the 30-year historic hydrology using different agent configurations is illustrated in Figures D.1 through D.9 in Appendix D. Figure D.8 represents one of the best performing agent configurations, and is also shown as Figure

5.3 below.

[Figure: 30-Year Historic Hydrology, SARSA Elig. Traces, Discount = 0.8; y-axis: total reward; x-axis: number of runs; series: Training Run Results, Agent Policy Test Results, No WQSA Release, Immediate Reward Policy.]

Figure 5.3 – Agent performance on 30-year historic hydrology, using SARSA with eligibility traces, λ = 0.5, γ = 0.8

Figure 5.3 above shows the sum of all rewards accrued by the agent for each 30-year historic hydrology run during agent training. The rewards are those calculated by

Equation 4.5, which essentially applies a penalty for each day when water temperatures are above the preferred 22° C, using penalties that increase exponentially with increased temperature. The total reward is therefore an indicator of overall agent performance over the entire simulation, with improved performance illustrated by an agent capable of mitigating the more severe temperature violations. The reward totals for the training results shown in Figure 5.3 take into account the agent’s exploratory actions as training is taking place. Also shown in Figure 5.3 is a time series of periodic evaluations of agent policy generated by the policy update program. This essentially shows the agent’s performance if the agent were not taking exploratory actions, but rather simply following the agent’s policy function at that point in the training cycle. For comparison, two horizontal lines show the values that represent the sum of all rewards for a 30-year run that would accrue in the absence of WQSA water releases, and the sum of all rewards that would be earned if the agent followed the immediate reward policy (which is the preferred policy shown in Table 4.9; essentially identical to the daily policy developed by the short-term agent in Tables C.1 and C.2).

Figure 5.3 shows that the agent improves beyond the results of the immediate reward policy at an initially high rate, and then the improvement slows, with much oscillation in results. This effect is noted in the literature when using actor-critic and GPI methods similar to the one utilized by the water quality agent (Bertsekas 1996;

Szepesvári 2010). Szepesvári notes that oscillation and possible divergence can be a common occurrence with applied GPI, and the primary tool used in practice for overcoming this issue is to store the complete sequence of policies generated by the agent and then select the one illustrating the best empirical performance as measured by post-training tests (Szepesvári 2010). Based on that concept, the best policy would be that followed by the agent at training run number 54, producing the results shown in Table 5.2 below. Also shown below are the results of the best evaluation of the agent’s policy function during the course of the training run. In addition, Table 5.2 shows the results for a “baseline” run without the use of WQSA water, and the results of an agent strictly following the preferred (immediate reward), chronic, or acute policy options developed in

Section 4.6.1.

Table 5.2 – Evaluations of long-term agent policies over 30-year historic hydrology

                                 Basic Policy Options                            Best Agent Policy
Water Quality                    Preferred Policy       Chronic     Acute
Variable             Baseline    (Immediate Reward)     Policy      Policy        Training    Evaluations
Chronic Violations   10          4                      32          9             14          14
Acute Violations     39          6                      44          77            47          32
Absolute Violations  416         230                    97          200           116         155
22-22.5° C (# days)  120         163                    133         131           131         193
22.5-23° C (# days)  92          8                      339         89            63          106
23-23.5° C (# days)  67          11                     167         68            87          66
23.5-24° C (# days)  85          18                     45          290           155         78
24-24.5° C (# days)  74          32                     16          130           69          48
24.5-25° C (# days)  53          36                     10          52            33          25
25-25.5° C (# days)  73          32                     16          15            8           21
25.5-26° C (# days)  82          37                     18          3             6           25
26-27° C (# days)    105         53                     30          0             0           29
>27° C (# days)      29          26                     7           0             0           7
Total Reward         892,113     443,341                391,797     502,618       342,071     386,644
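The checkpoint-and-select procedure described above, in which the full sequence of training policies is retained and the one with the best post-training evaluation is kept, can be sketched as follows. Lower total penalty is better under the reward convention used in this chapter; evaluate_policy is a placeholder for a full simulation run, and the example scores are illustrative.

    def select_best_policy(policy_checkpoints, evaluate_policy):
        """Return (index, policy) of the checkpoint with the lowest evaluated total penalty."""
        scored = [(evaluate_policy(p), i, p) for i, p in enumerate(policy_checkpoints)]
        best_score, best_index, best_policy = min(scored)
        return best_index, best_policy

    # Hypothetical example: three checkpointed policies scored by a stand-in evaluator.
    checkpoints = ["policy_a", "policy_b", "policy_c"]
    scores = {"policy_a": 502618, "policy_b": 342071, "policy_c": 386644}
    print(select_best_policy(checkpoints, scores.get))     # (1, 'policy_b')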

The improved agent performance shown above is evaluated through the reward calculation as computed by Equation 4.5, using a reward function multiplier of 20 and a reward function exponent of 1.5. This associates rapidly increasing penalties with days of higher water temperature, as previously discussed. For this reason, the agent is motivated to allow additional “minor violations” at lower temperatures in order to store more water so that it will be available to address “major violations” in the long run. The agent’s general strategy is illustrated in Figure 5.4, showing the number of days in a 30-year run at water temperatures between 22° C and 24° C, and the number of days above that range. The improved strategy shown in Figure 5.4 represents the best agent performance in training, as also shown in Table 5.2 above.
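As an illustration of this penalty structure, the sketch below sums a daily penalty for every day above the preferred 22° C maximum, using the reported multiplier of 20 and exponent of 1.5. The functional form shown is an assumption for illustration only; the exact form of Equation 4.5 is given in Chapter 4 and may differ.

    def seasonal_penalty(daily_temps_c, preferred_max=22.0, multiplier=20.0, exponent=1.5):
        """Sum of daily penalties for predicted temperatures above the preferred maximum (assumed form)."""
        return sum(multiplier * (t - preferred_max) ** exponent
                   for t in daily_temps_c if t > preferred_max)

    print(seasonal_penalty([21.0, 23.0, 26.5]))     # only the 23.0 and 26.5 C days are penalized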

[Figure: Violation Distribution; y-axis: number of days in 30-year run; categories: 22-24° C and > 24° C; series: Baseline, Immediate Reward, SARSA with eligibility traces (discount = 0.8).]

Figure 5.4 – Distribution of water temperature violations

Figure 5.4 shows that an agent following the short-term immediate reward policy does substantially better than the “baseline” case without any WQSA releases, but still tends to allow a high number of days when the water temperature would be excessively high based on the criteria discussed in Section 4.5.2.3. The long-term agent strategy allows the number of days when the temperature is in the lower end of the spectrum to actually exceed the baseline condition, but significantly reduces the number of days when higher temperatures occur.

Tests were conducted to evaluate the impacts of the level of firm storage provided to the agent. This firm storage was the maximum storage that would be protected from displacement and spill in higher water years when the reservoirs fill. The performance of the agent under different levels of firm storage is shown in Figures 5.5 and 5.6. Figure

5.5 shows performance during training, and Figure 5.6 shows evaluations of the agent policy function without exploration throughout the training cycle.

[Figure: 30-Year Historic Hydrology, SARSA Elig. Traces, Discount = 0.8; y-axis: total reward; x-axis: number of runs; series: No WQSA Release and training results with 40, 70, and 80 kAF of firm storage.]

Figure 5.5 – Training results, various levels of firm storage

[Figure: 30-Year Historic Hydrology, SARSA Elig. Traces, Discount = 0.8; y-axis: total reward; x-axis: number of runs; series: No WQSA Release and agent policy evaluations with 40, 70, and 80 kAF of firm storage.]

Figure 5.6 – Agent policy evaluations, various levels of firm storage

Figure 5.5 shows that agent performance during training is strongly impacted by the amount of storage that is not subject to spill. Figure 5.6 further illustrates this based on evaluations of the agent policy function during the training cycle without exploration.

Test results for the best policies for each different firm storage level tested are shown in

Table 5.3.

Table 5.3 – Test results for varying firm storage levels, using 30-year historic hydrology, SARSA, eligibility traces, λ = 0.5, γ = 0.8

                                 Firm Storage Tests
Water Quality                    49.3 MCM              86.3 MCM              98.7 MCM
Variable             Baseline    (40,000 acre-feet)    (70,000 acre-feet)    (80,000 acre-feet)
Chronic Violations   10          16                    14                    14
Acute Violations     39          42                    61                    47
Absolute Violations  416         189                   138                   116
22-22.5° C (# days)  120         131                   148                   131
22.5-23° C (# days)  92          157                   130                   63
23-23.5° C (# days)  67          101                   80                    87
23.5-24° C (# days)  85          87                    196                   155
24-24.5° C (# days)  74          50                    85                    69
24.5-25° C (# days)  53          28                    39                    33
25-25.5° C (# days)  73          27                    12                    8
25.5-26° C (# days)  82          32                    2                     6
26-27° C (# days)    105         37                    0                     0
>27° C (# days)      29          15                    0                     0
Total Reward         892,113     488,704               383,745               342,071

Overall, it was found during testing that the agent’s performance was maximized when it learned policies that utilized the WQSA water when necessary, but also stored just enough WQSA water to address the most significant issues during periods of prolonged drought. This is intuitively reasonable, since the goal of the agent would be to store only exactly that which is necessary to make it through drought events, without any carryover storage at the end of each event. This concept is illustrated by one of the agent’s best performances on the 30-year hydrologic dataset, as shown in Figure 5.7.

Figure 5.7 shows the total amount of storage of WQSA water during the 30-year model run.

[Figure: 30-Year Historic Hydrology, SARSA Elig. Traces, Discount = 0.8; y-axis: WQSA storage (acre-feet); x-axis: date, 10/1/1970 through 10/2/2000.]

Figure 5.7 – Total WQSA storage, 30-year historic hydrology, SARSA, eligibility traces, λ = 0.5 γ = 0.8

Figure 5.7 shows that the agent was able to store enough WQSA water to just barely provide drought protection during the early 1990’s, without carrying over any additional storage. The water temperatures predicted during the later part of the drought in this run are shown in Figure 5.8.

[Figure: 30-Year Historic Hydrology, SARSA Elig. Traces, Discount = 0.8; left y-axis: WQSA storage (acre-feet); right y-axis: predicted water temperature at Reno (° C); x-axis: date, 1/1/1990 through 12/31/1995; series: WQSA Storage, Temp without added flow, Temp with added flow.]

Figure 5.8 – Water temperature predictions, 30-year historic hydrology, SARSA, eligibility traces, λ = 0.5 γ = 0.8

Figure 5.8 shows the predicted water temperatures if no agent action were taken, and the predicted temperatures based on the agent’s action selection. It illustrates that the agent was successfully able to mitigate most of the highest potential temperature issues in the river during the drought of the early 1990’s, with essentially no water carried over at the end of the drought event.

Tests were conducted using the synthetic hydrology described in Section 4.5.1.2.

These tests showed that the agent was able to adapt its strategies to different environmental conditions. The agent’s performance using the synthetic hydrology is shown in Figure 5.9.

[Figure: Synthetic Hydrology, SARSA Elig. Traces, Discount = 0.8; y-axis: total reward; x-axis: number of runs; series: Training Run Results, Agent Policy Test Results, Immediate Reward Policy.]

Figure 5.9 - Agent performance on synthetic hydrology, using SARSA with eligibility traces, λ = 0.5, γ = 0.8. Note that baseline reward result for this case is approx. 572,000

Figure 5.9 illustrates that the agent is able to produce improved policies when faced with different hydrologic scenarios. Similar to the case of the 30-year historic hydrology shown in Figure 5.7, the agent also seeks to maximize its performance by storing just enough water to get through drought periods in the synthetic dataset. An example of this is shown in Figure 5.10, which represents the agent’s storage strategy in one of the better performing runs using the synthetic hydrology.

[Figure: Synthetic Hydrology, SARSA Elig. Traces, Discount = 0.8; y-axis: WQSA storage (acre-feet); x-axis: date, 10/1/2009 through 9/30/2039.]

Figure 5.10 - Total WQSA storage, synthetic hydrology, SARSA, eligibility traces, λ = 0.5 γ = 0.8

Figure 5.10 illustrates the agent’s ability to maximize the use of the WQSA water, without storing more than is necessary to address periodic drought conditions. The figure shows that the agent seeks and discovers improved longer-term strategies, even when fairly significant drought periods come at more frequent intervals than with the 30-year historic hydrology. This serves to show that the agent has the ability to adapt to changing environments, and can evaluate potential future conditions using long-term projected hydrology.

5.2 – Evaluation of Function Approximation

The tests and results shown thus far in this chapter represent the agent’s abilities when using the tabular implementation described in Section 3.2.5.1. The use of neural networks as a function approximation tool for the action-value and agent policy functions was also evaluated and tested. Tests and evaluations were conducted for the short-term immediate reward case, as well as the long-term case, both using the 30-year historic hydrology.

5.2.1 – Function Approximation for Short-Term Immediate Reward Agent

An evaluation was conducted on the neural network structure and training parameters to determine the best configuration for use in function approximation for the short-term immediate reward agent. As a part of this evaluation, the neural network function approximation was tested against its ability to replicate the analytically derived action-value and policy functions as discussed in Section 5.1. The sum of the squared error in the overall function approximation for the complete state and state-action spaces was used as the performance measure for the approximation. The results of neural network training through the first 5,000 iterations of the genetic algorithm are shown in

Figures 5.11 through 5.22. The various tests conducted, and their associated graphical results, are summarized as follows:

- Neural network structure: number of hidden units in one hidden layer
  o Figure 5.11 – Action-value neural network
  o Figure 5.12 – Policy neural network
- Neural network structure: number of hidden layers
  o Figure 5.13 – Action-value neural network
  o Figure 5.14 – Policy neural network
- Genetic algorithm training: number of solutions in the genetic population
  o Figure 5.15 – Action-value neural network
  o Figure 5.16 – Policy neural network
- Neural network configuration: minimum and maximum bounds on connection weights
  o Figure 5.17 – Action-value neural network
  o Figure 5.18 – Policy neural network
- Neural network configuration: sigmoid scale factors
  o Figure 5.19 – Action-value neural network
  o Figure 5.20 – Policy neural network
- Genetic algorithm training: mutation rate
  o Figure 5.21 – Action-value neural network
  o Figure 5.22 – Policy neural network

[Figure: Q ANN Training, Hidden Unit Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: 3, 5, 10, 15, 20, and 30 hidden units.]

Figure 5.11 – Action-value neural network training; 1 hidden layer, min/max weights = +-30, genetic population = 20, mutation rate = 10%

[Figure: Policy ANN Training, Hidden Unit Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: 3, 5, 10, 15, and 20 hidden units.]

Figure 5.12 – Policy neural network training using different numbers of hidden units; 1 hidden layer, min/max weights = +-30, genetic population = 20, mutation rate = 10%

[Figure: Q ANN Training, Hidden Layer Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: 1, 2, and 3 hidden layers.]

Figure 5.13 – Action-value neural network training using different numbers of hidden layers; 10 hidden units, min/max weights = +-30, genetic population = 20, mutation rate = 10%

[Figure: Policy ANN Training, Hidden Layer Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: 1, 2, and 3 hidden layers.]

Figure 5.14 – Policy neural network training using different numbers of hidden layers; 10 hidden units, min/max weights = +-30, genetic population = 20, mutation rate = 10%

[Figure: Q ANN Training, GA Population Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: populations of 5, 10, 15, 20, and 30.]

Figure 5.15 – Action-value neural network training using different numbers of genetic populations; 1 hidden layer, 10 hidden units, min/max weights = +-30, mutation rate = 30%

[Figure: Policy ANN Training, GA Population Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: populations of 5, 10, 15, 20, and 30.]

Figure 5.16 – Policy neural network training using different numbers of genetic populations; 1 hidden layer, 10 hidden units, min/max weights = +-30, mutation rate = 40%

[Figure: Q ANN Training, Weight Bounds Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: weight bounds of +/-1, +/-10, +/-20, +/-30, +/-40, and +/-50.]

Figure 5.17 – Action-value neural network training using different bounds on the connection weights; 1 hidden layer, 10 hidden units, genetic population = 10, mutation rate = 30%

[Figure: Policy ANN Training, Weight Bounds Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: weight bounds of +/-1, +/-10, +/-20, +/-30, +/-40, and +/-50.]

Figure 5.18 – Policy neural network training using different bounds on the connection weights; 1 hidden layer, 10 hidden units, genetic population = 10, mutation rate = 40%

[Figure: Q ANN Training, Sigmoid Scale Factor Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: scale factors of 0.1, 0.25, 0.5, 1, 2, and 5.]

Figure 5.19 – Action-value neural network training using different sigmoid scale factors; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic population = 10, mutation rate = 30%

[Figure: Policy ANN Training, Sigmoid Scale Factor Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: scale factors of 0.1, 0.5, 1, 2, and 5.]

Figure 5.20 – Policy neural network training using different sigmoid scale factors; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic population = 10, mutation rate = 40%

[Figure: Q ANN Training, Mutation Rate Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: mutation rates of 20%, 30%, 40%, and 50%.]

Figure 5.21 – Action-value neural network training using different genetic algorithm mutation rates; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic population = 10

[Figure: Policy ANN Training, Mutation Rate Evaluation; y-axis: sum of squared error; x-axis: number of iterations; series: mutation rates of 20%, 30%, 40%, and 50%.]

Figure 5.22 – Policy neural network training using different genetic algorithm mutation rates; 1 hidden layer, 10 hidden units, min/max weights = +/-30, genetic population = 10

Based on the results shown in the figures above, it was generally found that the best structure and configuration for the neural networks can be summarized as follows:

- Number hidden layers: 1
- Number hidden units: 10
- Minimum and maximum connection weights: +/-30
- Sigmoid scale factor: 1
- Genetic solution population: 10-15
- Genetic mutation rate: 30-40%
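A generic sketch of this configuration is given below: a single-hidden-layer sigmoid network with connection weights bounded at +/-30, trained by a simple genetic algorithm with a population of 10 and a 30% mutation rate. The network, the fitness function, the input layout, and the selection scheme are illustrative stand-ins, not the study's implementation.

    import numpy as np

    N_INPUTS, N_HIDDEN, N_OUTPUTS = 2, 10, 1         # e.g. (storage, temperature) -> action value
    WEIGHT_BOUND, SIGMOID_SCALE = 30.0, 1.0
    POP_SIZE, MUTATION_RATE = 10, 0.3

    def n_weights():
        return (N_INPUTS + 1) * N_HIDDEN + (N_HIDDEN + 1) * N_OUTPUTS   # +1 terms are bias units

    def forward(weights, x):
        """Evaluate the single-hidden-layer sigmoid network for input vector x."""
        split = (N_INPUTS + 1) * N_HIDDEN
        w1 = weights[:split].reshape(N_INPUTS + 1, N_HIDDEN)
        w2 = weights[split:].reshape(N_HIDDEN + 1, N_OUTPUTS)
        hidden = 1.0 / (1.0 + np.exp(-SIGMOID_SCALE * (np.append(x, 1.0) @ w1)))
        return np.append(hidden, 1.0) @ w2

    def evolve(error, generations=200, rng=np.random.default_rng(0)):
        """Keep the best half of the population each generation, refill with mutated copies."""
        pop = rng.uniform(-WEIGHT_BOUND, WEIGHT_BOUND, size=(POP_SIZE, n_weights()))
        for _ in range(generations):
            pop = pop[np.argsort([error(w) for w in pop])]        # lower error is better
            for i in range(POP_SIZE // 2, POP_SIZE):
                child = pop[i - POP_SIZE // 2].copy()
                mutate = rng.random(child.size) < MUTATION_RATE
                child[mutate] += rng.normal(0.0, 3.0, mutate.sum())
                pop[i] = np.clip(child, -WEIGHT_BOUND, WEIGHT_BOUND)
        return pop[0]

    # Toy usage: drive the network output toward a single hypothetical target value.
    target_error = lambda w: float((forward(w, np.array([0.4, 0.7]))[0] - 5.0) ** 2)
    print(target_error(evolve(target_error)))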

Testing was conducted on the short-term agent using those basic structure and configuration settings for the action-value and policy neural networks. It was generally found that in actual training, the agent encountered difficulties replicating the action-value function, and thus the overall policy function produced by the tabular implementation. One of the primary issues causing the difficulty involved the less precise nature of the neural network approximation as compared to the tabular representation of the action-value function. An evaluation was conducted on the ability of the policy update program to generate an agent policy similar to that of the tabular implementation in an idealized setting, where the analytically derived tabular action-value function was provided to the action-value neural network as training data. Based on the neural network approximation of the action-value function, the associated policy was significantly different from that created by the tabular implementation, with the agent's release actions generally only occurring when water temperatures were predicted to be above 24.5° C. Based on the results of the testing on the short-term agent using neural network approximation, additional research into the nature of the issues encountered and potential solutions to those issues is recommended as future work for this study, as discussed in Section 6.4 in the following chapter.


5.2.2 – Function Approximation for Long-Term Agent

Testing and evaluation were conducted using neural network function approximation with the long-term agent on the 30-year historic hydrologic dataset.

The same neural network structure and configuration were used for the long-term agent as for the short-term agent. This testing provided improved results over those produced by the short-term agent using function approximation, attributable to a simpler action-value function. The long-term agent was able to produce strategies that outperformed the immediate reward case, similar to the long-term agent using the tabular implementation. The agent's performance using neural network function approximation is shown in Figure 5.23 below.

[Figure: 30-Year Historic Hydrology, SARSA, Neural Network Function Approximation, Elig. Traces, Discount = 0.8; y-axis: total reward; x-axis: number of runs; series: Training Run Results, Agent Policy Test Results, No WQSA Release, Immediate Reward Policy.]

Figure 5.23 – Agent performance on 30-year historic hydrology, using SARSA with eligibility traces and neural network function approximation, λ = 0.5, γ = 0.8


A comparison of the performance of the agent using the neural network implementation against the tabular implementation is shown in Table 5.4. Table 5.4 shows agent performance during training, with the agent taking exploratory actions.

The results of the evaluation of the agent policy throughout the training cycle based on the agent policy function without exploratory actions are not shown, but exhibited similar behavior. Table 5.4 shows that the agent performance is generally similar using either the tabular or neural network implementation, with the exception of the overall average of the training results using the neural network implementation, which was better than the average training result for the tabular approach. This is largely due to the fact that the agent using neural network implementation does not perform as poorly during the initial learning phase.


Table 5.4 – Comparison of the training results of agents using tabular and neural network implementations, SARSA, no eligibility traces, γ = 0.8

                          Avg. Training Run                  Best Training Run
Water Quality             Tabular     Neural Network         Tabular     Neural Network
Variable                  Agent       Agent                  Agent       Agent
Chronic Violations        15          14                     17          15
Acute Violations          29          29                     55          39
Absolute Violations       169         152                    100         100
22-22.5° C (# days)       188         165                    154         144
22.5-23° C (# days)       116         136                    189         202
23-23.5° C (# days)       66          78                     115         143
23.5-24° C (# days)       68          61                     133         43
24-24.5° C (# days)       40          34                     48          15
24.5-25° C (# days)       25          21                     22          14
25-25.5° C (# days)       25          23                     13          15
25.5-26° C (# days)       28          26                     8           17
26-27° C (# days)         38          35                     9           32
>27° C (# days)           14          13                     0           7
Total Reward              426,352     403,199                346,582     350,945

5.3 – Evaluations of SARSA, Q-Learning, Eligibility Traces, and Discount Rates

Both SARSA and Q-Learning were evaluated in tests of the long-term agent. The evaluation involved the comparison of the two methods using the 30-year historic hydrology. In addition, the use of eligibility traces was evaluated for each method on the 30-year historic hydrology. The performance of the agent using Q-Learning and SARSA without eligibility traces is shown in Table 5.7. The performance of the agent using Q-Learning and SARSA with

eligibility traces is shown in Table 5.8. The results are reconfigured in Tables 5.9 and

5.10 to provide side-by-side comparison of the use of eligibility traces for Q-Learning and SARSA respectively. All results shown are for agent performance during training.

Results for the evaluation of agent policy functions throughout the training cycle are not shown, but illustrated similar behavior.

Table 5.7 – Comparison of Q-Learning and SARSA, no eligibility traces, γ = 0.8

                          Avg. Training Run                  Best Training Run
Water Quality             Q-Learning,   SARSA,               Q-Learning,   SARSA,
Variable                  No E.T.       No E.T.              No E.T.       No E.T.
Chronic Violations        13            15                   17            17
Acute Violations          30            29                   45            55
Absolute Violations       170           169                  97            100
22-22.5° C (# days)       177           188                  143           154
22.5-23° C (# days)       105           116                  187           189
23-23.5° C (# days)       65            66                   136           115
23.5-24° C (# days)       73            68                   66            133
24-24.5° C (# days)       41            40                   18            48
24.5-25° C (# days)       26            25                   12            22
25-25.5° C (# days)       24            25                   14            13
25.5-26° C (# days)       27            28                   20            8
26-27° C (# days)         38            38                   26            9
>27° C (# days)           13            14                   7             0
Total Reward              425,923       426,352              350,561       346,582

Table 5.8 – Comparison of Q-Learning and SARSA, with eligibility traces, γ = 0.8, λ = 0.5

                          Avg. Training Run                  Best Training Run
Water Quality             Q-Learning,   SARSA,               Q-Learning,   SARSA,
Variable                  E.T.          E.T.                 E.T.          E.T.
Chronic Violations        14            14                   14            14
Acute Violations          30            32                   52            47
Absolute Violations       170           169                  105           116
22-22.5° C (# days)       179           186                  137           131
22.5-23° C (# days)       110           113                  176           163
23-23.5° C (# days)       63            66                   113           87
23.5-24° C (# days)       74            76                   124           155
24-24.5° C (# days)       42            42                   47            69
24.5-25° C (# days)       26            27                   21            33
25-25.5° C (# days)       24            24                   13            8
25.5-26° C (# days)       27            26                   11            6
26-27° C (# days)         38            37                   10            0
>27° C (# days)           13            12                   3             0
Total Reward              425,614       425,375              349,060       342,071


Table 5.9 – Comparison of the use of eligibility traces with Q-Learning, γ = 0.8 λ = 0.5

                          Avg. Training Run                  Best Training Run
Water Quality             Q-Learning,   Q-Learning,          Q-Learning,   Q-Learning,
Variable                  No E.T.       E.T.                 No E.T.       E.T.
Chronic Violations        13            14                   17            14
Acute Violations          30            30                   45            52
Absolute Violations       170           170                  97            105
22-22.5° C (# days)       177           179                  143           137
22.5-23° C (# days)       105           110                  187           176
23-23.5° C (# days)       65            63                   136           113
23.5-24° C (# days)       73            74                   66            124
24-24.5° C (# days)       41            42                   18            47
24.5-25° C (# days)       26            26                   12            21
25-25.5° C (# days)       24            24                   14            13
25.5-26° C (# days)       27            27                   20            11
26-27° C (# days)         38            38                   26            10
>27° C (# days)           13            13                   7             3
Total Reward              425,923       425,614              350,561       349,060

Table 5.10 – Comparison of the use of eligibility traces with SARSA, γ = 0.8, λ = 0.5

                          Avg. Training Run                  Best Training Run
Water Quality             SARSA,        SARSA,               SARSA,        SARSA,
Variable                  No E.T.       E.T.                 No E.T.       E.T.
Chronic Violations        15            14                   17            14
Acute Violations          29            32                   55            47
Absolute Violations       169           169                  100           116
22-22.5° C (# days)       188           186                  154           131
22.5-23° C (# days)       116           113                  189           163
23-23.5° C (# days)       66            66                   115           87
23.5-24° C (# days)       68            76                   133           155
24-24.5° C (# days)       40            42                   48            69
24.5-25° C (# days)       25            27                   22            33
25-25.5° C (# days)       25            24                   13            8
25.5-26° C (# days)       28            26                   8             6
26-27° C (# days)         38            37                   9             0
>27° C (# days)           14            12                   0             0
Total Reward              426,352       425,375              346,582       342,071

Tables 5.7 and 5.8 show that average agent performance during training was highly similar under both SARSA and Q-Learning for the tests conducted. The best policy discovered by the agent, both with and without eligibility traces, was found using SARSA. The best overall policy was found using SARSA with eligibility traces.

Tables 5.9 and 5.10 show that the use of eligibility traces had little effect on agent performance under Q-Learning for the tests conducted. The use of eligibility traces improved the performance of the agent when utilizing SARSA.
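For reference, a tabular SARSA(λ) backup with accumulating eligibility traces, of the standard form assumed here, is sketched below; the study's exact formulation is given in Chapter 3. The settings γ = 0.8 and λ = 0.5 match those reported in the tables above, while the states, actions, and rewards are hypothetical.

    def sarsa_lambda_step(q, e, s, a, reward, s_next, a_next, alpha=0.1, gamma=0.8, lam=0.5):
        """Apply one SARSA(lambda) backup, crediting every recently visited state-action pair."""
        delta = reward + gamma * q.get((s_next, a_next), 0.0) - q.get((s, a), 0.0)
        e[(s, a)] = e.get((s, a), 0.0) + 1.0          # accumulate the trace for the visited pair
        for key, trace in list(e.items()):
            q[key] = q.get(key, 0.0) + alpha * delta * trace
            e[key] = gamma * lam * trace              # decay all traces toward zero

    q, e = {}, {}
    sarsa_lambda_step(q, e, s=(8, 2), a=1, reward=-120.0, s_next=(7, 1), a_next=1)
    sarsa_lambda_step(q, e, s=(7, 1), a=1, reward=-60.0, s_next=(6, 0), a_next=0)
    print(q)      # both visited pairs receive credit, the earlier one with a decayed trace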

An evaluation was also conducted of the reinforcement learning discount rate, as used with Q-Learning on the long-term agent using the 30-year hydrology. The results of the analysis are shown in Tables 5.11 and 5.12. Table 5.11 represents the average results across all training model simulations, and Table 5.12 shows the results of the best single

model simulation. All results shown are for agent performance during training. Results for the evaluation of agent policy functions throughout the training cycle are not shown, but generally followed similar behavior.

Table 5.11 – Comparison of average training results with different discount rates, Q-Learning, no eligibility traces

                                          Average Training Run
Water Quality Variable    γ = 0.0    γ = 0.1    γ = 0.2    γ = 0.5    γ = 0.8    γ = 0.95
Chronic Violations        8          8          8          10         17         14
Acute Violations          17         18         19         24         45         31
Absolute Violations       196        192        189        180        97         174
22-22.5° C (# days)       153        154        158        169        143        183
22.5-23° C (# days)       51         58         61         81         187        113
23-23.5° C (# days)       39         42         43         53         136        65
23.5-24° C (# days)       41         46         47         58         66         76
24-24.5° C (# days)       33         34         34         38         18         42
24.5-25° C (# days)       27         27         26         26         12         27
25-25.5° C (# days)       30         29         28         27         14         25
25.5-26° C (# days)       36         36         34         30         20         28
26-27° C (# days)         49         47         47         42         26         38
>27° C (# days)           21         20         20         16         7          14
Total Reward              442,035    440,031    435,854    428,147    425,614    436,371

Table 5.12 – Comparison of best single run training results with different discount rates, Q-Learning, no eligibility traces

                                          Best Training Run
Water Quality Variable    γ = 0.0    γ = 0.1    γ = 0.2    γ = 0.5    γ = 0.8    γ = 0.95
Chronic Violations        14         13         11         14         17         11
Acute Violations          39         41         54         52         45         20
Absolute Violations       137        130        133        114        97         174
22-22.5° C (# days)       118        129        111        116        143        209
22.5-23° C (# days)       133        115        128        162        187        55
23-23.5° C (# days)       88         84         102        118        136        37
23.5-24° C (# days)       91         134        135        123        66         47
24-24.5° C (# days)       44         54         63         50         18         26
24.5-25° C (# days)       29         33         44         29         12         19
25-25.5° C (# days)       20         15         18         17         14         27
25.5-26° C (# days)       20         14         8          10         20         36
26-27° C (# days)         19         13         0          8          26         45
>27° C (# days)           5          1          0          0          7          21
Total Reward              365,969    360,337    353,161    350,052    350,561    420,065

Tables 5.11 and 5.12 show that agent performance, as measured by the total reward for a model simulation, is generally better with a discount rate in the range of γ =

0.5 and γ = 0.8. A discount rate of γ = 0.8 performs better on the average during training, but the agent operating with a discount rate of γ = 0.5 performed slightly better on the best simulation run during testing.


6 – SUMMARY AND CONCLUSIONS

6.1 – Summary

Water quality issues often exist in complex and highly controlled water resources systems. Water quality downstream of reservoirs can be a particular concern and challenge for water managers, and increasing focus has been placed on the improvement of downstream water quality through alternative operating strategies for reservoir systems. Past optimization efforts have largely focused on the use of more traditional methods and technologies for the development of improved operational strategies. These methods are often encumbered by high computational intensity, and the need for development of complex statistical models of the dynamics for a particular river and reservoir environment. More recently, reinforcement learning methodologies have emerged in the field of water resources for the development of reservoir operations strategies. One significant advantage of reinforcement learning is the ability to learn the dynamics of a particular environment through simulated experience, rather than requiring up-front knowledge or development of complex transition models of the environment.

A system was developed based on reinforcement learning methodologies for the purpose of developing strategies for the improvement of water quality downstream of reservoirs using water stored for that specific goal. The system was applied to a case

study on the Truckee River in California and Nevada. The system was used to produce basic release policies to improve downstream temperatures to make the river more habitable by threatened and endangered fish populations. It was further utilized to develop long-term strategies for the use of the stored water that mitigate water quality issues over a number of years, focusing largely on the mitigation of water quality issues during drought periods. Several different variations and options for the application of reinforcement learning were evaluated. Efficiency and generalization techniques to reduce computational intensity were developed and tested.

6.2 – Conclusions

Based on the system developed and the results obtained from the Truckee River case study, the primary objectives of the study were met. The objective of creating a generalized reinforcement learning-based mechanism for the development of water release strategies for the improvement of downstream water quality was met through the development of the system detailed in Section 3.2. The usefulness of the system was illustrated through its application to the Truckee River case study, as discussed in Chapter

4. The system’s capabilities shown in Chapter 5 met the other study objectives of having flexibility to deal with changing conditions, and working with preexisting hydrologic and water quality models.

6.2.1 – Conclusions on the Truckee River Case Study

The results in Chapter 5 show that in order to be useful as part of a long-term water quality improvement strategy, the water acquired under the WQSA will need to be able to be stored through some other mechanism than a strict exchange for fish recovery water held back as a result of the WQSA instream flows. A different mechanism may become available under future conditions if TROA is implemented, but until that time, it may be necessary for the WQSA parties to work with the team managing the fish recovery water to devise an alternative mechanism that allows the water to be more fully stored as was the case with the more successful long-term test runs illustrated in the results of this study. In addition, the water will need to have more protection from displacement and spill in the reservoirs, as shown by the results of this study. Because the water for fishery purposes is managed by several of the same parties that will be managing the WQSA water, and the purposes of that water are somewhat connected to those of the WQSA water, it is theoretically possible that some of the concepts of WQSA storage and protection utilized in the more successful long-term strategies shown in this study might be feasible in actual practice, at least in some form. This study shows that with those concepts and flexibilities in place, it may be possible to achieve highly improved water quality conditions during times of drought, providing significant improvements over current operating strategies. The study also shows that the ability to mitigate conditions during a drought is heavily dependent not only on the ability for the water to avoid displacement or spill, but also on the amount of water that is protected.

This study concludes that, in the Truckee River case, not enough water is available to always address the immediate water quality needs of the river. For this reason, longer-term strategies require that some of the immediate downstream needs be forgone in order to store water to deal with longer-term drought requirements. When that is permitted to take place, reinforcement learning was shown to be a useful tool for generating effective long-term strategies. The study shows that the total amount of WQSA water available in storage and the projected peak storage in Lake

Tahoe are useful indicators in deciding how to best manage the WQSA water from year to year as part of a long-term strategy.

6.2.2 – Conclusions on Generalized Reinforcement Learning Strategies

The results in Chapter 5 show that the reinforcement learning agent system can be used to develop both short-term immediate reward strategies for the improvement of water quality downstream of reservoirs and long-term strategies that provide added protection from the more severe water quality issues that occur during a drought. This study also shows that the two ways of using the system can be combined beneficially, by first generating one or more short-term strategies and then using those in the development of an overall long-term strategy. In this manner, the short-term and long-term policies generated by the reinforcement learning system can be integrated to achieve the best overall result.

The use of both historic and synthetic hydrologic datasets in this study shows that the reinforcement learning approach and the agent design presented in Chapter 3 have the ability to easily adapt to changing conditions. The adjustment of reward functions

between the short-term and long-term agents illustrates how the agent's behavior can be adjusted depending on the overall desired goals for the agent.

Function approximation through the use of neural networks was shown to be a viable alternative to the tabular approach in the case of the long-term agent. However, issues that arose in the use of neural networks with the short-term agent indicate that additional research and development will be needed before the agent design presented here can fully utilize neural network approximation for more complex action-value functions.

For the tests conducted, SARSA and Q-Learning were shown to be nearly equally effective at producing improved long-term strategies. Eligibility traces were more useful with SARSA, and in all cases middle to higher discount rates were found to be more effective.
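For reference, the two update rules compared here differ only in the value used for the successor state. The following minimal tabular sketch (generic state and action keys, illustrative step size and discount rate, not the system's actual implementation) shows that difference.

```python
# Minimal sketch of the tabular SARSA and Q-Learning updates compared in this study.
# Q maps (state, action) pairs to action-value estimates; values here are illustrative.
from collections import defaultdict

Q = defaultdict(float)
alpha, gamma = 0.1, 0.9  # illustrative step size and discount rate

def sarsa_update(s, a, r, s_next, a_next):
    """On-policy update: uses the action actually selected in the next state."""
    target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])

def q_learning_update(s, a, r, s_next, actions_next):
    """Off-policy update: uses the greedy value over actions in the next state."""
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions_next)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```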

This study provides a successful example of the use of reinforcement learning in the field of water resources, specifically for water quality. Because of the generalized nature of the reinforcement learning mechanism and its ability to learn the underlying stochastic behavior of a river and reservoir system through interaction, without requiring a complex probabilistic model, the system presents a viable alternative to more traditional optimization methods for developing improved reservoir operating strategies for water quality purposes.

6.3 – Implementation Findings

During development of the water quality reinforcement learning agent, and during the testing and evaluation of the agent on the Truckee River case study, a number of practical implementation issues were identified and addressed. A discussion of those issues and the solution approaches used is contained in Appendix E. The information presented in Appendix E may aid researchers and practitioners who encounter similar issues in future studies and applications.

6.4 – Future Work

6.4.1 – Future Work on Truckee River and WQSA Issues

A large number of options for reinforcement learning techniques were included in the reinforcement learning system described in Section 3.2.5, along with a variety of options for the use of neural networks as function approximators. The number of possible combinations of these options, together with their possible settings and ranges of values, is immense. Future work should include an evaluation of the combinations of the different options and an extensive sensitivity analysis of the settings for each available variable and parameter to determine the optimal combinations and settings.
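One simple way to organize such an evaluation is as an enumerated sweep over option combinations. The sketch below is purely illustrative; the option names, value ranges, and run_experiment placeholder are assumptions, not the configuration interface of the actual system.

```python
# Hypothetical sketch of a sensitivity sweep over reinforcement learning options.
# Option names and ranges are illustrative only, not those of the actual system.
from itertools import product

options = {
    "algorithm": ["SARSA", "Q-Learning"],
    "eligibility_traces": [None, "accumulating", "replacing"],
    "discount_rate": [0.0, 0.5, 0.9, 0.99],
    "step_size": [0.05, 0.1, 0.2],
    "exploration": ["epsilon_greedy", "OFU"],
}

def run_experiment(config):
    """Placeholder for launching one training run and returning a performance score."""
    # In practice this would drive the simulated river and reservoir environment.
    return 0.0

results = []
for values in product(*options.values()):
    config = dict(zip(options.keys(), values))
    results.append((run_experiment(config), config))

# Rank configurations by the performance measure returned from the runs.
for score, config in sorted(results, key=lambda r: r[0], reverse=True)[:5]:
    print(f"{score:8.3f}  {config}")
```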

Because complex water quality modeling was outside the scope of this study, and because of the difficulties encountered in linking the TrHSPF water quality model to the hydrologic and reservoir operations model, this study focused on the single water quality constituent of temperature. Future studies should expand the development of policies for the use of the WQSA water to consider other water quality variables as well, including, at a minimum, dissolved oxygen and total dissolved solids.

Future work should include the evaluation of alternative state representations provided to the reinforcement learning agent. Improved agent performance may stem from providing the agent with different variables and indicators of the state of the Truckee River and reservoir system. In addition, different reward structures should be researched and tested. The current study utilized reward structures based on daily water quality improvement and overall annual performance, and other reward mechanisms might motivate the agent toward improved performance. One such reward mechanism might include additional state variables indicating an imminent violation in the absence of any action by the water quality agent.

The agent would receive a “0” variable value if no violations are predicted, and a “1” value if the temperature is either considered too high, or has been high for enough days to trigger a violation. The reward mechanism would then only provide a reward signal if an actual violation had occurred. For this “violation avoidance” case, Equations 4.1 and 4.3 in Section 4.5.3 would be replaced with something similar to Equations 6.1 and 6.2 presented below:

(Eqn 6.1)

where the remaining coefficients in the equation are the same as those shown for Equation 4.1.

(Eqn 6.2)
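A minimal sketch of this "violation avoidance" idea is given below, assuming an illustrative temperature limit, duration trigger, and fixed penalty; these placeholders are not the coefficients of Equations 4.1 and 4.3 or of the proposed Equations 6.1 and 6.2.

```python
# Sketch of a possible "violation avoidance" indicator and reward signal; the
# temperature limit, duration trigger, and penalty are illustrative placeholders.

def violation_indicator(predicted_temp_c, days_above_threshold,
                        temp_limit_c=24.0, duration_limit_days=3):
    """Return 1 if a violation is imminent absent agent action, else 0."""
    too_hot = predicted_temp_c > temp_limit_c
    too_long = days_above_threshold >= duration_limit_days
    return 1 if (too_hot or too_long) else 0

def violation_avoidance_reward(violation_occurred, penalty=-1.0):
    """Provide a reward signal only when an actual violation occurs; zero otherwise."""
    return penalty if violation_occurred else 0.0
```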

The current study was based on the operating policies for the Truckee River basin as they exist today. The likely future policy under the TROA will change the system dynamics and require alternative operating strategies. In addition, the current study illustrated that for the WQSA water to be useful in a longer-term strategy, a certain amount of the storage will need to be more secure from spill. Future work should focus on evaluating strategies for the use of the WQSA water under the TROA policy, as well as strategies for providing more secure storage through trades and exchanges within the context of the TROA.

The reinforcement learning system developed in Section 3.2.5 is generalized to the point that it can be applied to any optimization task, and is not limited to use for water quality purposes. Under the future TROA operating policy, there will be a wide variety of different accounts of water managed by many entities. Future work should focus on utilizing the system to generate operational strategies for those other types of water to meet their intended purposes. In addition, since operation under TROA will represent a significant multi-purpose and multi-agent operation, research should focus on utilizing the system in a multi-agent capacity. This would shed new light on the ability to conduct multi-purpose optimization on a river system through a multi-agent reinforcement learning structure, perhaps similar to the approach of Mariano-Romero et al. (Mariano-Romero 2007).

The hydrologic data provided to the agent in this study consisted primarily of a 30-year historic dataset, along with a limited number of synthetic datasets. Future research should incorporate different hydrologic scenarios, including those that simulate climate change.

In addition, the use of stochastically generated hydrology in the training of the agent should be evaluated, to identify the training hydrology that best promotes optimal agent behavior. Such an evaluation might be similar to that developed by Bouchart et al. (Savic and Walters 1999).

6.4.2 – Future Work on Reinforcement Learning System

The water quality reinforcement learning agent system described in Section 3.2.5 was implemented in a highly generalized fashion, to the point that it is almost immediately applicable to any other MDP for which a model of the environment is available. The system is not limited to water quality applications, or even to water resources applications in general. The only component of the system that requires adjustment to apply it to another decision process is the "interface program" described in Section 3.2.5, which would need to be modified to interface with a different environment simulation model for the given decision process. Future work on the system should initially include the creation of interface programs that allow the system to connect to a wider variety of hydrologic and reservoir operations models, such as MODSIM, HEC-RESSIM, WRIMS (CalSim), and other widely used generalized modeling systems. The system should also be applied to different issues on other river systems, extended to applications in other water resources related fields of study, and applied outside the field of water resources.
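One way such interface programs could be organized is behind a common abstraction, so that only the model-specific adapter changes for each simulation package. The sketch below is a hypothetical illustration, not the actual interface program described in Section 3.2.5.

```python
# Hypothetical sketch of a common interface abstraction for connecting the
# reinforcement learning system to different environment simulation models.
from abc import ABC, abstractmethod

class EnvironmentInterface(ABC):
    """Model-specific adapters implement these calls for a given simulator."""

    @abstractmethod
    def set_decision(self, state_id, action):
        """Write the agent's decision into the simulation model inputs."""

    @abstractmethod
    def run_period(self):
        """Advance the simulation model one decision period."""

    @abstractmethod
    def read_state_and_reward(self):
        """Extract the resulting state variables and reward from model output."""

class TextFileModelInterface(EnvironmentInterface):
    """Toy adapter that exchanges decisions and results through text files."""

    def set_decision(self, state_id, action):
        with open("decision.txt", "w") as f:
            f.write(f"{state_id},{action}\n")

    def run_period(self):
        pass  # a real adapter would invoke the external model executable here

    def read_state_and_reward(self):
        return {"storage": 0.0, "temperature": 0.0}, 0.0
```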

Generalization through function approximation presents certain advantages, as discussed in Section 3.1.3. However, issues were encountered during this study with the use of neural networks for function approximation. Continued development is needed to address the use of the neural network function approximation within the GPI structure developed for the agent in Section 3.2.5, particularly with respect to improving agent behavior when working with less precise estimates of the action-value function. This development should initially involve improvements to the neural network approximations, focusing in particular on the scaling of input and output training data when that data is constantly in flux as it is updated by the agent. Additional research should also address the interactions between the function approximation and the use of gradually reducing step sizes and eligibility traces, which were observed in this study to degrade the quality of the neural network training data.

For the improvement of the neural network function approximations, future development of the system described in Section 3.2.5 should include implementation of cross-validation for the neural network training functions. In addition, traditional error back-propagation training should be implemented as an alternative to the genetic algorithm training mechanism currently employed. Tests should be conducted to identify the best balance between the number of members in the genetic algorithm solution population and the speed and efficiency of network training. Other training and configuration options, such as recurrent network structures, might also be added to the system. Finally, the ability to run sensitivity analyses on just the neural network training component, rather than the entire reinforcement learning agent system, should be added.
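As an illustration of the kind of alternative training path suggested here, the generic sketch below standardizes the (continually changing) training inputs and targets and fits a small feed-forward network by error back-propagation; it is not the genetic algorithm trainer currently used by the system, and the network size and learning parameters are arbitrary.

```python
# Generic sketch of error back-propagation training with standardized inputs and
# targets, as an alternative to genetic-algorithm training; illustrative only.
import numpy as np

def standardize(a):
    mean, std = a.mean(axis=0), a.std(axis=0) + 1e-8
    return (a - mean) / std, mean, std

def train_backprop(X, y, hidden=10, lr=0.05, epochs=2000, seed=0):
    """Fit a one-hidden-layer network to (X, y) and return a prediction function."""
    rng = np.random.default_rng(seed)
    Xs, x_mean, x_std = standardize(X)
    ys, y_mean, y_std = standardize(y.reshape(-1, 1))
    W1 = rng.normal(scale=0.1, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.1, size=(hidden, 1)); b2 = np.zeros(1)
    for _ in range(epochs):
        h = np.tanh(Xs @ W1 + b1)            # forward pass through hidden layer
        err = (h @ W2 + b2) - ys             # gradient of squared error wrt output
        gW2 = h.T @ err / len(Xs); gb2 = err.mean(axis=0)
        gh = (err @ W2.T) * (1.0 - h ** 2)   # back-propagate through tanh layer
        gW1 = Xs.T @ gh / len(Xs); gb1 = gh.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1; W2 -= lr * gW2; b2 -= lr * gb2
    def predict(Xq):
        hq = np.tanh(((Xq - x_mean) / x_std) @ W1 + b1)
        return (hq @ W2 + b2) * y_std + y_mean   # undo the target scaling
    return predict

# Illustrative use: fit noisy samples of a simple target function.
X = np.random.rand(200, 3)
y = X.sum(axis=1) + 0.01 * np.random.randn(200)
print(train_backprop(X, y)(X[:3]).ravel())
```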

For the agent design discussed in Section 3.2.5, efficiency mechanisms were used to improve the performance of the tabular implementation. These included a binary search algorithm for finding table values in files, as well as the use of single files for each state, rather than a complete database, for tracking action exploration and average immediate rewards. Some testing was done to identify the efficiency gains from using binary files rather than text files; the gains were found to be marginal, but future work should include the full addition of that option to the system to take advantage of even small efficiency improvements. Future work should also include the use of improved database systems for further efficiency gains.

These systems could be as simple as single files for each state containing the action-value and policy functions rather than complete database files, or as complex as higher-level database storage systems. Where single text or binary files are used, the possibility of other improved search algorithms should be explored.
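For reference, the sketch below shows the kind of binary search used to locate a state's record among entries sorted by state key; it is an in-memory illustration and does not reproduce the system's actual file and record formats.

```python
# Minimal sketch of binary search over records sorted by state key; the actual
# system searches sorted records within files rather than in memory.
from bisect import bisect_left

def find_record(records, state_key):
    """records: list of (state_key, values) tuples sorted by state_key."""
    keys = [k for k, _ in records]
    i = bisect_left(keys, state_key)
    if i < len(records) and records[i][0] == state_key:
        return records[i][1]
    return None

records = [("S001", [0.0, 1.2]), ("S002", [0.4, 0.9]), ("S005", [2.1, 0.3])]
print(find_record(records, "S002"))  # -> [0.4, 0.9]
print(find_record(records, "S003"))  # -> None
```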

Future work on the system should also include expansion of the distributed computing capabilities already implemented. These include the capability to initiate more environment simulation models on machines with additional memory and CPU capacity, the ability to initiate simulation model runs on remote computers, and the ability to incrementally add simulation model runs to a run sequence already underway. Multithreading capability should be added to the process controller, and the other programs should be built into the process controller so that a user can run each program either as a standalone executable or directly from within the process controller. The ability to examine more complex problems through more massively distributed and parallel computing should also be explored using the system.
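As a simple illustration of initiating several environment simulation runs concurrently on one machine, a generic multiprocessing sketch follows; the run function and run identifiers are placeholders, not the system's process controller.

```python
# Generic sketch of launching several environment simulation runs in parallel;
# the run function and identifiers are placeholders, not the process controller.
from multiprocessing import Pool

def run_simulation(run_id):
    """Placeholder for invoking one environment simulation model run."""
    return run_id, f"run {run_id} complete"

if __name__ == "__main__":
    run_ids = list(range(8))
    with Pool(processes=4) as pool:          # bounded by available CPU and memory
        for run_id, status in pool.imap_unordered(run_simulation, run_ids):
            print(run_id, status)
```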

The policy update program described in Section 3.2.5 currently allows the user either to conduct policy updates at a given interval of policy evaluation steps or to conduct them when the action-value function stabilizes. Future studies should develop improved criteria for determining when the policy update should be conducted.
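One simple form such a criterion could take is to trigger the update when the largest change in any action-value estimate over a window of evaluation steps falls below a tolerance; the window length and tolerance in the sketch below are illustrative assumptions.

```python
# Illustrative criterion for triggering a policy update when the action-value
# function has stabilized; window length and tolerance are placeholders.

def action_values_stabilized(history, window=50, tolerance=1e-3):
    """history: list of dicts mapping (state, action) -> value, one per step."""
    if len(history) < window + 1:
        return False
    old, new = history[-window - 1], history[-1]
    keys = set(old) | set(new)
    max_change = max((abs(new.get(k, 0.0) - old.get(k, 0.0)) for k in keys),
                     default=0.0)
    return max_change < tolerance
```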

Future work should utilize the system to conduct more extensive comparisons of the various technological options provided to the user. These comparisons would more fully illustrate the differences in learning capability among the options, and would include SARSA against Q-Learning, the use of eligibility traces, accumulating traces against replacing traces, and a full comparison of the effectiveness of the various exploration and action-selection alternatives provided by the system.


REFERENCES

(1915). United States of America v. Truckee River General Electric Company, District Court of the United States, Northern District of California, Second Division.

(1944). United States v. Orr Water Ditch Company, et al., United States District Court, District of Nevada.

Abolpour, B. (2007). "Water allocation improvement in river basin using Adaptive Neural Fuzzy Reinforcement Learning approach." APPLIED SOFT COMPUTING 7(1): 265-285.

Andrews, A. K., R. A. Ellison, et al. (1979). Application of the Adaptive Environmental Assessment Methodology to the Truckee-Carson River Quality Assessment. Fort Collins, CO, U.S. Fish and Wildlife Service.

ASCE (2000). "Artificial neural networks in hydrology. I: Preliminary concepts." Journal of Hydrologic Engineering [J. Hydrol. Eng.]. Vol. 5 5(2): 115-123.

ASCE (2000). "Artificial neural networks in hydrology. II: Hydrologic applications." Journal of Hydrologic Engineering [J. Hydrol. Eng.]. Vol. 5 5(2): 124-137.

American Water Works Association (1987). Truckee River Water Quality Monitoring. The 1987 AWWA Water Quality Technology Conference. Baltimore, MD.

Berris, S. N. (1996). Daily Flow-Routing Simulations for the Truckee River, California and Nevada; Water-Resources Investigations Report 96-4097. Carson City, NV, U.S. Geological Survey.

Berris, S. N., G. W. Hess, et al. (2001). River and Reservoir Operations Model, Truckee River Basin, California and Nevada, 1998; Water-Resources Investigations Report 01-4017. Carson City, NV.

Berris, S. N., G. W. Hess, et al. (1996). Simulation of Selected Reservoir Operations in the Upper Truckee River Basin, California; Fact Sheet 082-96. Carson City, NV, U.S. Geological Survey.

Bertsekas, D. P. (1996). Neuro-dynamic programming. Belmont, Mass., Athena Scientific.

Bhattacharya, B. (2003). "Neural networks and reinforcement learning in control of water systems." JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT-ASCE 129(6): 458-465.

Bhimani, S., S. McDonald, et al. (2002). Beyond TMDLs on the Truckee River: Implementation of a Water Quality Trading Program Through a Phased Permitting Approach. National TMDL Science and Policy 2002 Specialty Conference, Water Environment Federation.

Bishop, C. M. (2006). Pattern recognition and machine learning. New York, Springer.

Bohman, L. R. (1996). The Truckee-Carson Program; USGS Surface-water quality and flow Modeling Interest Group Feature Article. Accessed 2005 - smig.usgs.gov.

Bohman, L. R., S. N. Berris, et al. (1995). Interactive Computer Program to Simulate and Analyze Streamflow, Truckee and Carson River Basins, Nevada and California; Fact Sheet 165-95. Carson City, NV, U.S. Geological Survey.

Bouchart, F. J. C. (1998). "A reinforcement learning model for the operation of conjunctive use schemes." HYDRAULIC ENGINEERING SOFTWARE VII 4: 319-329.

Boyer, J. T. (2006). Interagency Coordination for Truckee and Carson River Operations. Proceedings of the 2006 Operations Management Conference, ASCE. Sacramento, CA.

Boyer, J. T. (2006). Water Accounting in the Truckee Basin RiverWare Model. Proceedings of the 2006 Federal Interagency Hydrologic Modeling Conference, Reno, NV, Federal Interagency Committee on Hydrology.

Boyer, J. T. (2006). "Water Supply Accounting and Categorization in the Truckee Basin Using the RiverWare Model." Journal of the Nevada Water Resources Association 3(1).

Brock, J. T. (2000). Data Availability for Modeling Water Quality in the Truckee River Basin; Draft Technical Memorandum prepared for Carollo Engineers. Boise, ID.

Brown, W. M., J. O. Nowlin, et al. (1986). River-Quality Assessment of the Truckee and Carson River System, California and Nevada -- Hydrologic Characteristics. Open-File Report 84-576. Sacramento, CA, U.S. Geological Survey.

Buckland, M. (2010). Genetic Algorithms in Plain English.

Buckland, M. (2010). Neural Networks in Plain English.

Castelletti, A. (2002). "Reinforcement learning in the operational management of a water system." MODELING AND CONTROL IN ENVIRONMENTAL ISSUES 2001: 303-308.

Castelletti, A., G. Garbarini, et al. (2009). Reinforcement Learning Control of Selective Withdrawl Reservoirs Accounting for Both Quality and Quantity Targets. 18th World IMACS/MODSIM Congress. Cairns, Australia.

Caupp, C. L., J. T. Brock, et al. (1991). Application of the Dynamic Stream Simulation and Assessment Model (DSSAM III) to the Truckee River Below Reno, Nevada: Model Formulation and Program Description. Submitted to Nevada Division of Environmental Protection, Carson City and Washoe County Department of Comprehensive Planning, Reno. Boise, ID, Rapid Creek Water Works: 129.

Chaves, P. (2004). "Operation of storage reservoir for water quality by using optimization and artificial intelligence techniques." MATHEMATICS AND COMPUTERS IN SIMULATION 67(4-5): 419-432.

Chaves, P. and T. Kojiri (2007). "Deriving reservoir operational strategies considering water quantity and quality objectives by stochastic fuzzy neural networks." Advances in Water Resources [Adv. Water Resour.]. Vol. 30 30(5): 1329-1341.

Chaves, P. and T. Kojiri (2007). "Stochastic Fuzzy Neural Network: Case Study of Optimal Reservoir Operation." Journal of Water Resources Planning and Management [J. Water Resour. Plann. Manage.]. Vol. 133 133(6): 509-518.

Chen, C. (2002). Development of TMDL Implementation Plan with Consensus Module of WARMF. National TMDL Science and Policy 2002 Specialty Conference, Water Environment Federation.

Chen, H. W. (1998). "Water pollution control in the river basin by fuzzy genetic algorithm-based multiobjective programming modeling." WATER SCIENCE AND TECHNOLOGY 37(8): 55-63.

Chiatovich, T. D. and J. W. Fordham (1979). "A Reservoir Operating Model with Inorganic Quality Constraints for the Truckee River." Water Resources Bulletin 15(2): 301-315.

Cobb, E. D., A. F. Olson, et al. (1990). Review of Selected Water-Management Models and Results of Simulations for the Truckee-Carson Rivers System, California and Nevada; Open-File Report 90-393. Reston, VA, U.S. Geological Survey.

Colby, B. G., M. A. McGinnis, et al. (1991). "Mitigating Environmental Externalities Through Voluntary and Involuntary Water Reallocation: Nevada's Truckee- Carson River Basin." Natural Resources Journal 31(Fall, 1991): 757-783.

Coors, S. (2006). An Overview of the Truckee-Carson RiverWare Modeling System. Proceedings of the 2006 Operations Management Conference, ASCE. Sacramento, CA.

Coors, S. (2006). "Simulating Operations in the Truckee-Carson RiverWare Modeling System." Journal of the Nevada Water Resources Association 3(1).

Coors, S. (2006). Truckee-Carson Basin RiverWare Operations Model. Proceedings of the 2006 Federal Interagency Hydrologic Modeling Conference, Reno, NV.

Coors, S. (2010). Planning Hydrology Worksheet, Precision Water Resources Engineering, LLC.

Darsono, S. (2007). "Neural-optimal control algorithm for real-time regulation of in-line storage in combined sewer systems." ENVIRONMENTAL MODELLING & SOFTWARE 22(9): 1349-1361.

Carollo Engineers (2003). TMDLs and Watershed Modeling: Pollutant Trading on the Truckee River Basin. www.carollo.com.

Truckee Meadows Water Reclamation Facility (2004). Truckee Meadows Water Reclamation Facility River Monitoring Data, TMWRF.

Federal Water Master's Office, U. D. C. (2010). Instream Flow Deliveries over Derby Dam - Permit Summaries. Reno, NV: 1.

Fordham, J. W. (1972). Simulation Theory Applied to Water Resources Management Phase III; Development of Optimal Operating Rules. Reno, NV, Desert Research Institute.

Fritchel, P. F. (2009). Documentation for Hydrologic Data. Carson City, NV, Bureau of Reclamation.

Galat, D. L. (1990). "Seasonal and Long-Term Trends in Truckee River Nutrient Concentrations and Loadings to Pyramid Lake, Nevada: A Terminal Saline Lake." Water Research 24(8): 1031-1040.

Great Basin Land and Water, N.-p. W. R. P. O. (2011). Records of Water Rights Purchased under the WQSA (personal communication). Carson City, NV.

Hastie, T., R. Tibshirani, et al. (2001). The elements of statistical learning: data mining, inference, and prediction. New York, Springer.

Hoffman, R. J. (1990). Phosphorus in the Truckee River Between Vista and Patrick, Storey and Washoe Counties, Nevada, August 1984. Water-Resources Investigation Report 89-4175. Carson City, NV, U.S. Geological Survey.

Hoffman, R. J. and G. G. Scoppettone (1989). Effect of Water Quality on Survival of Lahontan Cutthroat Trout Eggs in the Truckee River, West-Central Nevada and Eastern California. Water Supply Paper 2319. Carson City, NV, U.S. Geological Survey and U.S. Fish and Wildlife Service.

Israel, M. S. (1996). Modeling Water Resources Management Institutions: An Application to the Truckee-Carson River System. Civil and Environmental Engineering. Davis, CA, University of California, Davis.

Israel, M. S. and J. R. Lund (1999). "Priority Preserving Unit Penalties in Network Flow Modeling." Journal of Water Resources Planning and Management 125(4): 205-214.

Jeton, A. E. (2000). Precipitation-Runoff Simulations for the Upper Part of the Truckee River Basin, California and Nevada; Water-Resources Investigations Report 99- 4282. Carson City, NV, U.S. Geological Survey.

Kerachian, R. and M. Karamouz (2007). "A stochastic conflict resolution model for water quality management in reservoir-river systems." Advances in Water Resources 30(4): 866-882.

Labadie, J. W. (1998). "Reservoir Systems Optimization Models." Water Resources Update Journal(107).

Labadie, J. W. (2004). "Optimal Operation of Multireservoir Systems: State-of-the-Art Review." Journal of Water Resources Planning and Management 130(2): 93-111.

Labadie, J. W. (2005). "Closure to "Optimal operation of multireservoir systems: State- of-the-art review" by John W. Labadie." JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT-ASCE 131(5): 407-408.

Labadie, J. W. and M. L. Baldo (2001). "Priority Preserving Unit Penalties in Network Flow Modeling; Discussion." Journal of Water Resources Planning and Management 125(5): 67-68.

Lee, J. H. and J. W. Labadie (2007). "Stochastic optimization of multireservoir systems via reinforcement learning." WATER RESOURCES RESEARCH 43(11).

Lobbrecht, A. H. and D. P. Solomatine (2002). "Machine learning in real-time control of water systems." Urban Water 4(3): 283-289.

Lovell, S. (2000). "Using water markets to improve environmental quality: Two innovative programs in Nevada." Journal of Soil and Water Conservation 55(1): 19-26.

Lumb, A. M., J. L. Kittle, et al. (1990). Users manual for ANNIE, a computer program for interactive hydrologic analyses and data management. Reston, Va., Dept. of the Interior, U.S. Geological Survey Books and Open-File Reports Section [distributor].

Mahootchi, M. (2007). "Opposition-based reinforcement learning in the management of water resources." 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning: 217-224.

Mahootchi, M., H. R. Tizhoosh, et al. (2007). Reservoir Operation Optimization by Reinforcement Learning. Contemporary Modeling of Urban Water Systems, Monograph 15. W. James. Guelph, ON Canada, CHI.

Maier, H. R. (1996). "The use of artificial neural networks for the prediction of water quality parameters." WATER RESOURCES RESEARCH 32(4): 1013-1022.

Maier, H. R. (2000). "Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications." ENVIRONMENTAL MODELLING & SOFTWARE 15(1): 101-124.

Mann, M. (2006). Hydrologic Forecasting in the Truckee-Carson RiverWare System. Proceedings of the 2006 Federal Interagency Hydrologic Modeling Conference, Reno, NV.

Mann, M. (2006). "Hydrologic Forecasting in the Truckee-Carson RiverWare System." Journal of the Nevada Water Resources Association 3(1).

Mariano-Romero, C. E. (2007). "Multi-objective optimization of water-using systems." EUROPEAN JOURNAL OF OPERATIONAL RESEARCH 181(3): 1691-1707.

McDonald, S., S. Bhimani, et al. (2000). A Case Study of Pollutant Load Trading on the Truckee River. Proceedings of Watershed Management 2000, Ft. Collins, CO, ASCE/EWRI.

McDonald, S., S. Bhimani, et al. (2000). Decision Support Process for Truckee River Watershed Management. Proceedings of Watershed Management 2000, Ft. Collins, CO.

Mohammadian, M., R. A. Sarker, et al. (2003). Computational intelligence in control. Hershey, Idea Group Pub.

Nelson, T. (2006). Comparison of RiverWare Model Results to Actual Flows, Department of Water Resources, State of California.

Neumann, D. (2001). An Operations Model for Temperature Management of the Truckee River Above Reno, Nevada (M.S. Thesis). Civil Engineering. Boulder, University of Colorado: 121.

Neumann, D., B. Rajagopalan, et al. (2003). "Regression Model for Daily Maximum Stream Temperature." Journal of Environmental Engineering 129(7): 667-674.

Neumann, D. W. (2006). "A decision support system to manage summer stream temperatures." JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION 42(5): 1275-1284.

Nevada Division of Water Planning, Department of Conservation and Natural Resources (1997). Truckee River Chronology. Carson City, NV, Nevada Division of Water Planning.

State of Nevada (2010). Nevada Administrative Code, NAC 445A.184-190.

Nowlin, J. O. (1987). Documentation for a Digital Computer Model of Nutrient and Dissolved-Oxygen Transport in the Truckee River and Truckee Canal Downstream from Reno, NV. Open-File Report 87-554. Carson City, NV, U.S. Geological Survey.

Otero, J. M. (1995). "Optimization of managed runoff to the St Lucie estuary." WATER RESOURCES ENGINEERING, VOLS 1 AND 2: 1506-1510.

Pohll, G., D. McGraw, et al. (2001). Evaluation of Groundwater and Solute Transport in the Fernley-Wadsworth Area. Publication No. 41173. Reno, NV, Desert Research Institute.

Pratt, J. (1997). Truckee-Carson River Basin Study Final Report. Report to the Western Water Policy Review Advisory Commission. Seattle, WA, Clearwater Consulting Corporation.

Nevada Division of Environmental Protection (1994). Truckee River Final Total Maximum Daily Loads and Waste Load Allocations. Carson City, NV, Bureau of Water Quality Planning.

Raman, H. (1996). "Deriving a general operating policy for reservoirs using neural network." JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT-ASCE 122(5): 342-347.

Rani, D. and M. Moreira (2010). "Simulation–Optimization Modeling: A Survey and Potential Application in Reservoir Systems Operation." Water Resources Management 24(6): 1107-1138.

Bureau of Reclamation (1997). Operating Criteria and Procedures for the Newlands Project, Nevada. 43 CFR 418.

Bureau of Reclamation (2010). Projects and Facilities Database.

City of Reno, City of Sparks, et al. (2003). Regional Wastewater Facilities Master Plan. Reno, NV.

City of Reno, City of Sparks, et al. (1996). Truckee River Water Quality Settlement Agreement.

California Department of Water Resources (1991). Truckee River Atlas.

Rieker, J. D. (2005). Documentation for Hydrologic Data. Carson City, NV, Bureau of Reclamation.

Rieker, J. D. (2006). Decision Support for Water Quality Releases on the Truckee River. Proceedings of the 2006 Federal Interagency Hydrologic Modeling Conference, Reno, NV.

Rieker, J. D. (2006). Stakeholder Interaction Methods to Present Results of the Truckee-Carson Models. Proceedings of the 2006 Operations Management Conference. Sacramento, CA.

Rieker, J. D., S. Coors, et al. (2005). Modeling in Support of Water Operations in the Truckee River Basin. Proceedings of Watershed Management 2005, Williamsburg, VA, EWRI and ASCE.

Risley, J. C., E. A. Roehl, et al. (2003). "Estimating Water Temperature in Small Streams in Western Oregon Using Neural Network Models." Water-Resources Investigations Report No. 4218, U.S. Geological Survey.

Roehl, E. A., P. A. Conrads, et al. (2000). Real-Time Control of the Salt Front in a Complex, Tidally Affected River Basin. ANNIE 2000. St. Louis, MO.

Rounds, S. (2002). Development of a Neural Network Model for Dissolved Oxygen in the Tualatin River, Oregon. Second Federal Interagency Hydrologic Modeling Conference. Las Vegas, NV.

Rowell, J. H. (1975). Truckee River Temperature Prediction Study. Sacramento, CA, Bureau of Reclamation.

Russell, S. J. and P. Norvig (2003). Artificial intelligence a modern approach. Upper Saddle River, N.J., Prentice Hall/Pearson Education.

Savic, D. and G. Walters (1999). Water industry systems modelling and optimization applications. Baldock, Hertfordshire, England, Research Studies Press.

Scoppettone, G. G., M. Coleman, et al. (1983). "Reproduction by the Endangered Cui-ui in the Lower Truckee River." Transactions of the American Fisheries Society 112(6): 788-793.

Scott, T. (2006). Review of Truckee River Operating Agreement Negotiations and Related Modeling Efforts. Proceedings of the 2006 Operations Management Conference, ASCE. Sacramento, CA.

Sharp, J. V., R. L. Bateman, et al. (1970). Digital Simulation Model of Inorganic Water Quality of Tahoe-Truckee System, Nevada-California. Reno, NV, Desert Research Institute.

Shirangi, E., R. Kerachian, et al. (2008). "A simplified model for reservoir operation considering the water quality issues: Application of the Young conflict resolution theory." Environmental Monitoring and Assessment 146(1): 77-89.

Solomatine, D. P. (1996). "Neural network approximation of a hydrodynamic model in optimizing reservoir operation." HYDROINFORMATICS '96, VOLS 1 AND 2: 201-206.

United States, State of California, et al. (2008). Truckee River Operating Agreement. California/Nevada.

United States, Truckee-Carson Irrigation District, et al. (1935). Truckee River Agreement. California/Nevada.

Suen, J. P. (2003). "Evaluation of neural networks for modeling nitrate concentrations in rivers." JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT-ASCE 129(6): 505-510.

Sutton, R. S. and A. G. Barto (1998). Reinforcement learning an introduction. Cambridge, Mass., MIT Press.

Szepesvári, C. (2010). Algorithms for reinforcement learning. San Rafael, CA, Morgan & Claypool.

Taylor, R. L. (1998). Simulation of hourly stream temperature and daily dissolved solids for the Truckee River, California and Nevada, U.S. Geological Survey Water- Resources Investigations Report 98-4064: 70.

Tizhoosh, H. R. and M. Ventresca (2008). Oppositional concepts in computational intelligence. Berlin, Springer-Verlag.

U.S. Army Corps of Engineers (1995). Lower Truckee River, Nevada; Reconnaissance Report.

U.S. Environmental Protection Agency Office of Water (1994). TMDL Case Study: Truckee River, Nevada.

U.S. Geological Survey (1996). Environmental and Hydrologic Settings of the Las Vegas Valley Area and the Carson and Truckee River Basins, Nevada and California, U.S. Geological Survey.

U.S. Geological Survey (1997). Water-Quality Assessment of the Las Vegas Valley Area and the Carson and Truckee River Basins, Nevada and California - Nutrients, Pesticides, and Suspended Sediment, October 1969-April 1990; Water-Resources Investigations Report 97-4106, U.S. Geological Survey.

United States Department of the Interior, Bureau of Indian Affairs (2002). Final Environmental Impact Statement - Truckee River Water Quality Settlement Agreement - Federal Water Rights Acquisition Program. Phoenix, AZ.

USDOI (2002). Record of decision Truckee River Water Quality Settlement Agreement - Federal Water Rights Acquisition program (California and Nevada). [Phoenix, AZ], U.S. Dept. of the Interior, Bureau of Indian Affairs, Western Regional Office.

USDOI (2004). Truckee River operating agreement, California and Nevada: revised draft environmental impact statement/environmental impact report. Washington, D.C., U.S. Dept. of the Interior and State of California, Dept. of Water Resources.

USDOI (2008). Final environmental impact statement/environmental impact report: Truckee River operating agreement. Washington, D.C., U.S. Dept. of the Interior and State of California, Dept. of Water Resources.

USGS (2010). USGS National Water Information System: Web Interface.

Wagner, P. and M. Lebo (1996). "Managing the resources of Pyramid Lake, Nevada, amidst competing interests." Journal of Soil and Water Conservation.

Wan, Y., J. W. Labadie, et al. (2006). "Optimization of Frequency Distribution of Storm- Water Discharges for Coastal Ecosystem Restoration." Journal of Water Resources Planning and Management 132(5): 320-329.

Warwick, J. J., D. Cockrum, et al. (1999). "Modeling the Impact of Subsurface Nutrient Flux on Water Quality in the Lower Truckee River, Nevada." Journal of American Water Resources Association 35(4): 837-851.

Wen, C. G. (1998). "A neural network approach to multiobjective optimization for water quality management in a river basin." WATER RESOURCES RESEARCH 34(3): 427-436.

Wilson, G. (1996). "Reinforcement learning: A new technique for the real-time optimal control of hydraulic networks." HYDROINFORMATICS '96, VOLS 1 AND 2: 893-900.

Wurbs, R. A. (1993). "RESERVOIR-SYSTEM SIMULATION AND OPTIMIZATION MODELS." JOURNAL OF WATER RESOURCES PLANNING AND MANAGEMENT-ASCE 119(4): 455-472.

Wurbs, R. A. (2005). Comparative Evaluation of Generalized River/Reservoir Systems Models. College Station, Texas, Texas Water Resources Institute.

Yeh, W.-G. (1985). "Reservoir Management and Operations Models: A State-of-the-Art Review." Water Resources Research 21.

Yi, C.-h. (2005). Basin-wide multi-reservoir operation using reinforcement learning.

Zagona, E. A. (2001). "Riverware: A generalized tool for complex reservoir system modeling." JOURNAL OF THE AMERICAN WATER RESOURCES ASSOCIATION 37(4): 913-929.


APPENDIX A – TRUCKEE RIVER GOVERNING DOCUMENTS

Water storage and flow in the Truckee River have historically been, and are currently, governed by a number of court decrees, court decisions, agreements, laws, and regulations. These include the Truckee River General Electric Decree, Truckee River Agreement, Orr Ditch Decree, Tahoe-Prosser Exchange Agreement, Carson-Truckee Water Conservancy District v. Watt et al., Memorandum of Agreement-Truckee River Water Management, Pyramid Lake Tribe of Indians v. Morton, Newlands Project Operating Criteria and Procedures (OCAP), and Interim Storage Agreement. In addition, most of the reservoirs are subject to flood operating guidelines set forth by the United States Army Corps of Engineers or the California Department of Water Resources' Division of Safety of Dams. Because the above referenced documents directly refer to exact quantities and flow rates of water in imperial units, that system of units is utilized as the primary system of units in this appendix rather than metric units.

Truckee River General Electric Decree

The Truckee River General Electric Decree is a final judgment and decree issued in United States of America vs. The Truckee River General Electric Company in 1915. The case involved the condemnation of the lands, dam, and controlling works at Lake Tahoe. The decree permitted the United States to control the dam at Lake Tahoe, but required that the flow in the Truckee River be maintained at a rate of 500 cubic feet per second (cfs) from March 1 through September 30 of each year. From October 1 through the last day in February, the flow requirement was reduced to 400 cfs. These rates of flow became known as the "Floriston Rates" and were measured by a streamgage at Iceland, California, near the present-day USGS streamgage at Farad, California (USGS gage number 10346000). The rates of flow were based on the demands of a paper mill that no longer exists. The decree provided for special flow requirements during winter months to remove ice interfering with the power plants on the river, and for reduced flow requirements when specifically requested by the Truckee River General Electric Company. The decree allowed the United States to use any water stored more than four feet above the natural rim of Lake Tahoe for its own purposes, and provided for limited use of the water below that storage level by the United States under certain conditions.

Orr Ditch Decree and Truckee River Agreement

The Orr Ditch Decree is an order, adjudgment, and decree entered in United States of America vs. Orr Ditch Water Company, et al. The decree was sought by the United States to adjudicate the Truckee River water rights in the state of Nevada. The decree adjudicated Truckee River water rights and adopted the Truckee River Agreement as the governing procedure for operation of the Truckee River and its reservoirs. The decree appointed a Water Master to carry out and enforce its provisions. The decree provided for use of water during any time of the year, subject to the limitation that the total amount used could not exceed applicable quantities allowed to the land; it also limited the amount of water to be used in any single month. The decree established two of the most senior rights on the Truckee River as belonging to the Pyramid Lake Paiute Tribe, known as Claims 1 and 2, with a priority date of December 8, 1859. It established one of the more junior rights as belonging to the United States for diversion of water to the Truckee Canal, known as Claim 3, with a priority date of July 2, 1902. The United States' right under Claim 3 provided for diversion of up to 1,500 cfs into the Truckee Canal for irrigation of 232,800 acres of land and other purposes (the maximum acreage of the project was modified by the United States through contract with the Truckee-Carson Irrigation District in 1926 to 87,500 acres). The decree stated that the amount of diversion was subject to the control, disposal, and regulation of the United States. Irrigation was limited to 3.5 acre-feet per acre on bottom land and 4.5 acre-feet per acre of bench land for the Newlands Project.

The Truckee River Agreement is an agreement signed in 1935 between the United States, Truckee-Carson Irrigation District, Washoe County Water Conservation District, Sierra Pacific Power Company (successor to the Truckee River General Electric Company), and other users of the river known as the "parties of the fifth part." The agreement was approved and adopted by the Orr Ditch Court in the Orr Ditch Decree. The agreement detailed the operating rules for the Truckee River and its reservoirs. Central to the agreement was the concept of the "Floriston Rates" as developed in the Truckee River General Electric Decree. Further reductions to the Floriston Rates were permitted, allowing flow to be reduced to 350 cfs between November 1 and March 31 when Lake Tahoe's elevation is more than 2.25 feet above the natural rim but below 3 feet above the natural rim, and to 300 cfs if Lake Tahoe's elevation is below 2.25 feet above the natural rim. The elevation of the natural rim of the lake was determined in the agreement to be 6,223.0 feet above sea level. The agreement also provided for the construction of Boca Dam on the Little Truckee River and set forth the rules and priorities for storage of water in Boca Reservoir. The agreement also recognized and outlined the rules for privately owned stored water in the Truckee River system.

Tahoe-Prosser Exchange Agreement

The Tahoe-Prosser Exchange Agreement is an agreement signed in 1959 by the United States, Truckee-Carson Irrigation District, Washoe County Water Conservation District, and Sierra Pacific Power Company. The agreement provided for the construction of Prosser Creek Dam and Reservoir and described the operating procedures for the dam and reservoir. The primary operation set forth by the agreement is an exchange of water between Lake Tahoe and Prosser Creek Reservoir, whereby water may be released from Lake Tahoe when it would not otherwise be released under the provisions of the Truckee River Agreement, in order to ensure minimum flows downstream of Lake Tahoe Dam of 50 or 70 cfs depending on the time of year. An equivalent amount of water is then stored in Prosser Creek Reservoir and becomes "Tahoe Exchange Water" for later use under the Truckee River Agreement provisions relating to Lake Tahoe, as if it were water stored in Lake Tahoe. In the event that water is not available to be held back in Prosser Reservoir as the exchange is occurring, water previously stored in Prosser Reservoir is converted to "Tahoe Exchange Water" to later be used as if it were Lake Tahoe water.

Carson-Truckee Water Conservancy District v. Watt et al. and Memorandum of Agreement-Truckee River Water Management

The basic operation of Stampede Dam and Reservoir set forth by the Department of the Interior was affirmed by the judgment and opinion issued in 1983 in Carson-Truckee Water Conservancy District v. Watt et al. The 1983 judgment and opinion stated that the Department of the Interior must use all waters stored in Stampede Reservoir for the benefit of the Pyramid Lake Fishery until both the cui-ui and Lahontan cutthroat trout had been removed from threatened or endangered status or other sources of water are made available to conserve the species. Based on that judgment, current operation of Stampede Reservoir is guided by the Short-Term Action Plan for Lahontan Cutthroat Trout developed by the Truckee River Basin Recovery Implementation Team for the United States Fish and Wildlife Service in 2003. That document developed a set of operational plans and decision processes to guide reservoir operations based on various monthly flow targets for the Lower Truckee River, to be applied depending on the storage level in Stampede Reservoir and the projected spring runoff into the reservoir. The Memorandum of Agreement-Truckee River Water Management is an agreement signed by the United States Fish and Wildlife Service, Pyramid Lake Paiute Tribe, Bureau of Reclamation, and Bureau of Indian Affairs in 1999 that delineated the roles and responsibilities for development of decisions and operating plans for management of Truckee River water for the protection and conservation of the two listed species. Under this agreement, the Pyramid Lake Paiute Tribe takes the lead role in those decisions and plans, with the other signatory agencies providing technical support. The primary responsibilities of the team include management of the waters of Stampede Reservoir and the waters of Prosser Creek Reservoir not categorized as "Tahoe Exchange Water."

Interim Storage Agreement

The Interim Storage Agreement is an agreement signed by the United States, Pyramid Lake Paiute Tribe, Sierra Pacific Power Company, and Washoe County Water Conservation District in 1994. The agreement permits the storage of privately owned water in Stampede and Boca Reservoirs by the Sierra Pacific Power Company (predecessor to the Truckee Meadows Water Authority). The agreement sets forth rules for establishment, storage, and exchange of privately owned (non-project) water within the reservoirs, and sets an upper limit of 5,000 acre-feet on the water that may be carried over each year as of September 1.

Flood Control

Flood control limitations have been placed on Martis Creek Reservoir, Prosser Creek Reservoir, Stampede Reservoir, and Boca Reservoir by the United States Army Corps of Engineers. These limitations generally consist of a date in the fall by which the reservoir must be drawn down to a certain water surface elevation. They also restrict the timing by which the reservoir may be brought back to its full capacity in the spring, based on the status of the winter snowpack. The season for the maximum flood control limits on these dams begins on November 1 and ends on April 10 of each year, after which storage in the reservoirs may gradually increase toward full capacity depending on the snowpack. The winter flood control limitation is 800 acre-feet for Martis Creek Reservoir, 9,800 acre-feet for Prosser Creek Reservoir, 204,500 acre-feet for Stampede Reservoir, and 32,900 acre-feet for Boca Reservoir.

Certain flood control and dam safety limitations are followed on Donner and Independence Lakes. For Donner Lake, the discharge gates on the dam are held open from November 15 through April 15. For Independence Lake, the flashboards are removed from two bays in the spillway structure from November 1 through May 15, generally resulting in a maximum storage between 13,000 and 15,000 acre-feet.

Lake Tahoe is not subject to flood control limitations other than those placed by the Truckee River Agreement. Provisions in the Truckee River Agreement require that the dam be operated to the maximum extent practicable to prevent the lake from exceeding an elevation of 6,229.1 feet. A set of procedures is outlined in the Truckee River Agreement in order to facilitate this goal.

OCAP and Pyramid Lake Paiute Tribe of Indians v. Morton

The Operating Criteria and Procedures for the Newlands Reclamation Project, Nevada (OCAP) is a federal regulation found in the Code of Federal Regulations, Volume 43, Part 418. The OCAP were originally promulgated as a federal rule in 1967 to address the water supply issues of the Truckee River below Derby Dam as they related to the operation of the Truckee Canal to divert water to the Newlands Project. The 1967 OCAP placed an upper limit on the amount of water used by the Newlands Project, and ended previous use of water for single-purpose power generation by limiting power generation from Lahontan Reservoir and the V-Canal in the Carson Division to only that generation which is incidental to consumptive uses, spills, or precautionary reservoir drawdowns. In 1973, United States District Judge Gerhard Gesell issued a judgment and order in Pyramid Lake Paiute Tribe of Indians v. Morton which included a new set of OCAP. The associated opinion provided clear direction to the Secretary of the Interior that "all water not obligated by court decree or contract with the [Truckee-Carson Irrigation District] goes to Pyramid Lake." Subsequent OCAP developed periodically from 1975 through 1997 were designed to maximize the use of the Carson River water supply to serve the Newlands Project, and to minimize the use of the Truckee River as a supplementary supply. The Newlands Project is currently operated under the OCAP as revised in 1997.

The 1997 OCAP set forth a procedure for annually determining the maximum amount of water allowed to be diverted out of the Truckee Canal and Lahontan Reservoir to meet Newlands Project demand, based on actual land irrigated in previous years and land anticipated to be irrigated in the upcoming year. The OCAP direct the amount of water that can be diverted from the Truckee River to Lahontan Reservoir each month through a system of storage targets on Lahontan Reservoir. Storage targets are calculated based on recent-year demand; in addition, targets from January through May take into account the forecasted spring runoff from the Carson River basin. Each month, the storage targets are calculated and the amount of supplementary water necessary from the Truckee River, if any, is estimated. This amount is permitted to be diverted through the Truckee Canal to the extent that water supply from the Truckee River and the capacity of the canal allow. The canal's capacity is currently limited by Reclamation due to safety concerns, as well as by an interim temporary restraining order issued in 2008, both as a result of the January 5, 2008 breach of the canal in Fernley, Nevada.

The OCAP also set forth a system of incentives for efficient operation of the Newlands Project distribution facilities. The incentive system is based on targets for efficiency in the delivery of water from the Truckee Canal to the headgates of irrigators in the Truckee Division, and from Lahontan Reservoir to the headgates of irrigators in the Carson Division. The Truckee-Carson Irrigation District is permitted to retain for its own use a portion of the water saved above and beyond the efficiency targets, and is required to make up for extra water used in delivery if the efficiency targets are not met.

The OCAP additionally set forth procedures for storing water in Stampede Reservoir that would have typically been diverted to the Carson Division of the Newlands Project during the winter and early spring. This water becomes known as Newlands Project Credit Storage within Stampede Reservoir. If the water is later needed to meet storage targets on Lahontan Reservoir during the subsequent irrigation season, it is released and diverted through the Truckee Canal at that time. If it is not needed during that irrigation season, the water is converted to water stored for the benefit of the Pyramid Lake fishery. To date, this portion of the OCAP has never been exercised.

The OCAP further enforce the maximum limitations on the quantity of water applied to Newlands Project lands set forth by the Orr Ditch Decree as well as the Alpine Decree. The Alpine Decree is the final decree in United States v. Alpine Land & Reservoir Company et al., issued in 1980. It adjudicated the water rights on the Carson River and set forth limitations on the amount of water to be applied to the land. For lands in the Newlands Project, the Alpine Decree applied the same limitations as found in the Orr Ditch Decree: 3.5 acre-feet per acre on bottom lands and 4.5 acre-feet per acre on bench lands. The net consumptive use for the Newlands Project was set at 2.99 acre-feet per acre. Exact delineation of the bench and bottom lands within the Newlands Project was accomplished through a later court case. In the opinion issued by Judge Thompson in the Alpine case, the water rights in the Newlands Project were found to be appurtenant to the land, and the opinion further found that individual land owners, rather than the United States government, owned the water rights.


APPENDIX B – RESERVOIR OPERATIONS ON THE TRUCKEE RIVER

Because the governing documents for the Truckee River system referenced in this appendix directly refer to exact quantities and flow rates of water in imperial units, that system of units is utilized as the primary system of units in this appendix rather than metric units.

Overview of Reservoir and River Release Operations

The general operation of the Truckee River reservoir system centers on the concepts of the "Floriston Rates" and "Reduced Floriston Rates" outlined in the Truckee River Agreement. The Truckee River Agreement requires that the rates of flow in the Truckee River be kept at 500 cfs between March 1 and September 30 of each year. From October 1 to the end of February they are 400 cfs. They are reduced to 350 cfs between November 1 and March 31 if the level of Lake Tahoe is below 6,226.0 feet but above 6,225.25 feet, and further reduced to 300 cfs during that timeframe if the level of Lake Tahoe is below 6,225.25 feet.
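The seasonal and Lake Tahoe-dependent targets described above can be restated procedurally. The sketch below is only an illustrative summary of those thresholds, not an operational rule set, and its handling of the overlap between the March reduced-rate window and the March through September base rate is an assumption.

```python
# Illustrative restatement of the Floriston Rate targets described above; not an
# operational rule set. Treating March reductions as governing is an assumption.
from datetime import date

def floriston_rate_cfs(day: date, tahoe_elevation_ft: float) -> int:
    """Return the target Truckee River flow (cfs) for a date and Lake Tahoe level."""
    in_reduction_window = (day.month >= 11) or (day.month <= 3)  # Nov 1 - Mar 31
    if in_reduction_window and tahoe_elevation_ft < 6225.25:
        return 300
    if in_reduction_window and tahoe_elevation_ft < 6226.0:
        return 350
    if 3 <= day.month <= 9:
        return 500                      # March 1 through September 30
    return 400                          # October 1 through end of February

print(floriston_rate_cfs(date(2010, 12, 15), 6224.8))  # -> 300
```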

In order to meet the Floriston Rates, natural flow into the river is first used. This natural flow is essentially all water flowing in the basin except water permitted to be stored as privately owned stored water adverse to the Floriston Rate, or water released as privately owned stored water. If the natural flow does not meet the Floriston Rate, water is released from Lake Tahoe, Boca Reservoir, and Prosser Reservoir in a priority order.

Boca Reservoir is used first to meet the Floriston Rate requirements from November through March, or if the level of Lake Tahoe is above 6,225.5 feet. Lake Tahoe is used first to meet the requirement from April through October if its elevation is below 6,225.5 feet, or at any time of the year if release from Boca Reservoir does not meet the requirement. Any time that release from Lake Tahoe is not needed to meet Floriston Rates, Lake Tahoe is operated to maintain minimum flows below the dam of 50 cfs between October 1 and March 31 and 70 cfs between April 1 and September 30, while exchanging water with Prosser Reservoir as described in the Tahoe-Prosser Exchange Agreement. If water is not available for exchange, Lake Tahoe releases are terminated. In addition to releases from Lake Tahoe and Boca Reservoir, Tahoe Exchange Water from Prosser Reservoir is also used to meet the Floriston Rate requirements, generally between June and October, in order to ensure that all Tahoe Exchange Water has been released from Prosser Reservoir prior to fully drawing the reservoir down to its winter flood control level.

Independence Lake is operated by the Truckee Meadows Water Authority, and Donner Lake is operated by the Truckee Meadows Water Authority in association with the Truckee-Carson Irrigation District. Releases from these reservoirs are generally considered releases of privately owned stored water, which do not make up part of the water released for Floriston Rate requirements unless requested by the owner of the water. In this sense, the water "floats" on top of the water making up the Floriston Rate requirements, and is not available for diversion or use by any party other than the owner or their designee. Additionally, under the Interim Storage Agreement, the Truckee Meadows Water Authority has the ability to release its water in exchange for water to be stored in Stampede or Boca Reservoirs.

Water is generally released from Stampede and Prosser Reservoirs for the benefit of the Pyramid Lake fishery according to the flow regime criteria set forth in the Short-Term Action Plan for Lahontan Cutthroat Trout. The water released from Stampede Reservoir is referred to as "Fish Water" and does not make up part of the water released for the Floriston Rate requirements. Similar to privately owned stored water, this water "floats" on top of the water making up the Floriston Rate requirement, and is not available for diversion from the river at Derby Dam or any other diversion facility unless permitted by the various parties to the Memorandum of Agreement-Truckee River Water Management. The stored water released from Prosser Creek Reservoir for the purposes of the Pyramid Lake fishery is any water not classified as "Tahoe Exchange Water."

Martis Creek Reservoir is generally operated to limit storage in the reservoir due to concerns over the stability of the dam. To the extent practicable, all inflow is immediately released.

All reservoirs are operated to ensure that they meet flood control requirements during winter and spring months. During floods, water is temporarily stored in Prosser Creek, Stampede, Boca, and Martis Creek Reservoirs in an attempt to limit flow in Reno to no more than 6,000 cfs. After the flood event, the water is released to bring the reservoirs back to the flood control limits.

Derby Dam and the Truckee Canal are operated subject to the OCAP. Derby Dam is permitted to divert water to meet the needs of the Truckee Division of the Newlands Project using all available water arriving at Derby Dam that is not privately owned stored water, subject to prior appropriations, or designated as water for the benefit of the Pyramid Lake fishery. If needed to meet storage targets at Lahontan Reservoir, additional water may be diverted and conveyed to Lahontan Reservoir subject to those same limitations.

Overview of Reservoir Storage Operations

Subject to flood control limitations, Donner Lake may store water adverse to Floriston Rate requirements. Independence Lake may also store up to 3,000 acre-feet of water adverse to Floriston Rates. Once Floriston Rate requirements are met, Lake Tahoe has the first priority to store all additional water flowing into it, subject to its maximum lake surface elevation requirements. If Floriston Rate requirements continue to be met, up to 25,000 acre-feet may be stored in Boca Reservoir.

Once Boca Reservoir has stored 25,000 acre-feet of water, all additional inflow must be passed to the Truckee Canal to meet the greater of its diversion demand under the OCAP or its capacity. If those demands are met, Boca Reservoir may be filled by an additional 15,850 acre-feet to its capacity of 40,850 acre-feet. If all requirements and demands continue to be met, Independence Reservoir may fill up to an additional 14,500 acre-feet, subject to water availability. Finally, if all requirements, demands, and storage priorities have been met, Stampede Reservoir and Prosser Creek Reservoir may store all additional available water. Prosser Creek Reservoir may also begin to store water earlier in the priority system under the Tahoe Prosser Exchange if conditions allow.


APPENDIX C – BASIC WQSA RELEASE POLICIES

Table C.1 – Basic release policies from agent learning; using 30-year historic dataset, SARSA, γ = 0.0, OFU exploration, after 58 model runs

Rows: WQSA storage, MCM (acre-feet). Columns: temperature, °C. Table values are WQSA release in cms (cfs); * indicates not enough experience with a state to form a policy.

Storage MCM (ac-ft) | <22 | 22-22.5 | 22.5-23 | 23-23.5 | 23.5-24 | 24-24.5 | 24.5-25 | 25-25.5 | 25.5-26 | 26-27 | >27
0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0)
0.09 (70) | 0 (0) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35)
0.17 (140) | 0 (0) | 1 (35) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70)
0.26 (210) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 3 (105) | * | * | 3 (105) | 3 (105) | 3 (105) | 3 (105)
0.35 (280) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | * | * | * | * | * | * | *
0.43 (350) | 0 (0) | 1 (35) | 2 (70) | * | * | * | * | * | * | * | *
0.52 (420) | 0 (0) | * | 2 (70) | * | * | * | * | * | * | * | *
0.6 (490) | * | * | * | * | * | * | * | * | * | * | *
0.69 (560) | 0 (0) | 1 (35) | * | * | * | * | 5.9 (210) | 6.9 (245) | * | 7.9 (280) | *
1.38 (1,120) | 0 (0) | 1 (35) | 2 (70) | * | 4 (140) | * | 5.9 (210) | 7.9 (280) | * | 9.9 (350) | 11.9 (420)
2.76 (2,240) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5.9 (210) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
5.53 (4,480) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 8.9 (315) | *
8.29 (6,720) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | * | * | * | *
11.05 (8,960) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | * | * | * | *
13.81 (11,200) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5.9 (210) | * | * | * | 9.9 (350) | *
16.58 (13,440) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)

Table C.2 – Basic release policies from idealized agent learning with alternate water supplies; using 30-year historic dataset, SARSA, γ = 0.0, OFU exploration, after 64 model runs

Rows: WQSA storage, MCM (acre-feet). Columns: temperature, °C. Table values are WQSA release in cms (cfs); * indicates not enough experience with a state to form a policy.

Storage MCM (ac-ft) | <22 | 22-22.5 | 22.5-23 | 23-23.5 | 23.5-24 | 24-24.5 | 24.5-25 | 25-25.5 | 25.5-26 | 26-27 | >27
0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0)
0.09 (70) | 0 (0) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35)
0.17 (140) | 0 (0) | 1 (35) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70)
0.26 (210) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105)
0.35 (280) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 4 (140) | 4 (140) | 4 (140) | 4 (140) | 4 (140) | 4 (140)
0.43 (350) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5 (175) | 5 (175) | 5 (175) | 5 (175) | 5 (175)
0.52 (420) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 5.9 (210) | 5.9 (210) | 5.9 (210) | 5.9 (210)
0.6 (490) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 6.9 (245) | 6.9 (245) | 6.9 (245)
0.69 (560) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 7.9 (280) | 7.9 (280)
1.38 (1,120) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
2.76 (2,240) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 7.9 (280) | 7.9 (280) | 9.9 (350) | 11.9 (420)
5.53 (4,480) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 6.9 (245) | 7.9 (280) | 8.9 (315) | 9.9 (350) | 11.9 (420)
8.29 (6,720) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 7.9 (280) | 7.9 (280) | 9.9 (350) | 11.9 (420)
11.05 (8,960) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 7.9 (280) | 7.9 (280) | 9.9 (350) | 11.9 (420)
13.81 (11,200) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 8.9 (315) | 9.9 (350) | 11.9 (420)
16.58 (13,440) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)

Table C.3 – Analytically derived table of basic release policies

Rows: WQSA storage, MCM (acre-feet). Columns: temperature, °C. Table values are WQSA release in cms (cfs).

Storage MCM (ac-ft) | <22 | 22-22.5 | 22.5-23 | 23-23.5 | 23.5-24 | 24-24.5 | 24.5-25 | 25-25.5 | 25.5-26 | 26-27 | >27
0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0) | 0 (0)
0.09 (70) | 0 (0) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35) | 1 (35)
0.17 (140) | 0 (0) | 1 (35) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70) | 2 (70)
0.26 (210) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105) | 3 (105)
0.35 (280) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 4 (140) | 4 (140) | 4 (140) | 4 (140) | 4 (140) | 4 (140)
0.43 (350) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5 (175) | 5 (175) | 5 (175) | 5 (175) | 5 (175)
0.52 (420) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 5.9 (210) | 5.9 (210) | 5.9 (210) | 5.9 (210)
0.6 (490) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 6.9 (245) | 6.9 (245) | 6.9 (245)
0.69 (560) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 7.9 (280) | 7.9 (280)
1.38 (1,120) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
2.76 (2,240) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
5.53 (4,480) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
8.29 (6,720) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
11.05 (8,960) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
13.81 (11,200) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
16.58 (13,440) | 0 (0) | 1 (35) | 2 (70) | 3 (105) | 4 (140) | 5 (175) | 5.9 (210) | 6.9 (245) | 7.9 (280) | 9.9 (350) | 11.9 (420)
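Because the policies in Tables C.1 through C.3 are simple look-up tables keyed on WQSA storage and predicted river temperature, applying one of them reduces to binning the current state and reading the tabulated release. The sketch below is illustrative only: the bin-edge handling is an assumption, and only a small excerpt of Table C.3 is included.

import bisect

# Hypothetical illustration of applying a tabular release policy such as Table C.3.
# Bin edges follow the table headings; only a few rows are excerpted here.
STORAGE_EDGES_MCM = [0.09, 0.17, 0.26, 0.35, 0.43, 0.52, 0.60, 0.69,
                     1.38, 2.76, 5.53, 8.29, 11.05, 13.81, 16.58]
TEMP_EDGES_C = [22.0, 22.5, 23.0, 23.5, 24.0, 24.5, 25.0, 25.5, 26.0, 27.0]

# policy[storage_bin][temperature_bin] = WQSA release in cms (excerpt only)
POLICY_EXCERPT = {
    0: [0.0] * 11,                 # no WQSA storage -> no release
    1: [0.0] + [1.0] * 10,         # 0.09 MCM (70 ac-ft) row
    2: [0.0, 1.0] + [2.0] * 9,     # 0.17 MCM (140 ac-ft) row
}

def lookup_release(storage_mcm, temperature_c):
    """Bin the state and return the tabulated WQSA release (cms), if the row is excerpted."""
    s_bin = bisect.bisect_right(STORAGE_EDGES_MCM, storage_mcm)
    t_bin = bisect.bisect_right(TEMP_EDGES_C, temperature_c)
    row = POLICY_EXCERPT.get(s_bin)
    return None if row is None else row[t_bin]

print(lookup_release(0.12, 23.2))  # -> 1.0 cms, matching the 0.09 MCM row, 23-23.5 °C column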

APPENDIX D – LONG-TERM AGENT PERFORMANCE

This appendix illustrates the performance of the long-term reinforcement learning agent using the 30-year historic hydrology. The agent is provided the full flexible exchange of WQSA water as described in Section 5.1.1, and up to 98.7 MCM (80,000 acre-feet) of storage is protected from displacement or spill. Various agent configurations are shown in Figures D.1 through D.9.

[Figures D.1 through D.9 present charts of total reward versus number of training runs. Each chart shows training run results, agent policy test results, the no-WQSA-release baseline, and the immediate reward policy.]

Figure D.1 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.0

Figure D.2 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.1

Figure D.3 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.2

Figure D.4 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.5

Figure D.5 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.8

Figure D.6 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with no eligibility traces, γ = 0.95

Figure D.7 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using Q-Learning with eligibility traces, λ = 0.5, γ = 0.8

Figure D.8 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using SARSA with eligibility traces, λ = 0.5, γ = 0.8

Figure D.9 – Agent performance on 30-year historic hydrology with 98.7 MCM (80,000 acre-feet) firm storage, using SARSA with no eligibility traces, γ = 0.8


APPENDIX E – IMPLEMENTATION FINDINGS

During the development of the water quality reinforcement learning agent, as well as its testing and evaluation on the Truckee River case study, a number of practical implementation issues were identified and addressed. The following sections discuss these issues and how they were addressed, and are intended to aid researchers and practitioners who encounter similar issues in future studies and applications.

E.1 – Elimination of Unnecessary Evaluation/Seasonality Issues

During early testing on the Truckee River case study it became apparent that large portions of each simulated year did not require agent action because predicted river temperatures were relatively low. Approximately 93 percent of the timesteps in the 30-year historical hydrologic model runs required no agent action because temperatures were below the preferred maximum; this generally occurred during the winter and early spring months, and also in years with high runoff. Depending on the agent’s exploration or action selection mechanism, the agent would test actions during these timeframes, which was detrimental both to the goal of storing additional water and to testing actions at times when releases were actually necessary. The use of softmax and OFU action selection methods, as well as gradually reducing ε values in the ε-greedy exploration method, helped mitigate the issue by causing the agent to greatly reduce its release exploration during these periods once limited exploration had been completed.

However, exploratory actions were still taken, reducing the agent’s learning rate with respect to the more meaningful states it encountered. Using state-specific reductions to the step size parameter and to the ε or τ terms of the exploration methods mitigated the detrimental impacts of these less meaningful actions, but they remained a hindrance to the agent for the reasons stated.

It was found that the most effective method to deal with this issue was to completely eliminate agent interaction during the times of the simulated year when agent action was highly likely to be unnecessary. Rules were introduced to the simulation model that produced a zero action selection from December 1 through May 31 of each simulated year, without connecting to the agent interface program at all. This had the added benefit of greatly accelerating run times for half of the modeled time period.

Though this was a somewhat obvious step to mitigate seasonality issues for the agent, its use was valuable in improving agent performance as well as computational efficiency.
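A minimal sketch of the gating rule described above is shown below, assuming a hypothetical agent interface object; in this study the rule was implemented inside the simulation model itself rather than in agent code.

import datetime

def wqsa_release_for_timestep(date, state, agent_interface):
    """Return the WQSA release for a timestep, skipping the agent entirely in the off-season.

    December 1 through May 31 is treated as a period when water quality releases are
    highly unlikely to be needed, so a zero action is returned without ever contacting
    the agent interface program (no exploration, and faster run times).
    """
    off_season = date.month in (12, 1, 2, 3, 4, 5)
    if off_season:
        return 0.0
    # In season, defer to the (hypothetical) agent interface for action selection.
    return agent_interface.select_action(state)

# e.g. wqsa_release_for_timestep(datetime.date(1994, 3, 15), state, agent) -> 0.0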

E.2 – Neural Network Issues

One of the more significant practical issues encountered in the use of neural networks for function approximation was input and output scaling. Because the activation functions on the network units produce outputs in the range between zero and one, training targets and calculated outputs must be scaled. In addition, scaling the inputs can aid network training by making the possible ranges of all inputs comparable, rather than having inputs whose magnitudes of change from one value to the next vary widely.

In the context of the reinforcement learning system discussed in Section 3.2.5, scaling becomes difficult because the range of possible values for each input and output is not known to the agent a priori. As the agent learns the possible range of each variable with experience, any scaling applied to the input and output values changes with time, creating a highly non-stationary problem for the neural network to solve. When coupled with an agent that is actively using the networks for action-value updates and policy decisions, the situation was observed during testing to be unstable. An example of this problem occurs when an update to the action-value function computed by the neural network produces an action-value at or beyond the extreme maximum or minimum of the range of the action-value training data. That update changes the scaling function used by the agent, so the value the network computes for the agent’s next update calculation may not be an accurate estimate of the function until further training occurs. Because the agent operates much faster than the function can be retrained, instability can ensue.

Extensive testing was not conducted to determine whether a dynamic scaling mechanism would eventually stabilize, or how long that might take. Instead, scaling for the neural network was accomplished through user specification of extreme values for each input variable, as well as for the action-values and policy values themselves. For the input and policy variables, determining those extremes should be fairly straightforward from basic prior knowledge of the system and problem. For the action-value range, doing so requires basic knowledge of the reward functions and the use of Equation 3.28; it is much more difficult to determine when eligibility traces are used, and assumptions may be necessary in that case.
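The fixed, user-specified scaling described above amounts to a pair of min-max transforms with constant extremes. The sketch below illustrates the idea; the variable names and example extreme values are assumptions, with the true extremes coming from prior knowledge of the system, the reward function, and Equation 3.28.

def make_scaler(v_min, v_max):
    """Build forward/inverse min-max scalers onto [0, 1] from user-specified extremes.

    Fixed extremes avoid the non-stationary, dynamically rescaled targets that were
    observed to destabilize the coupled agent/network training.
    """
    span = float(v_max - v_min)

    def to_unit(v):
        return (v - v_min) / span

    def from_unit(u):
        return u * span + v_min

    return to_unit, from_unit

# Example with assumed extremes for one input (river temperature, °C)
# and for the action-value range derived from the reward function.
temp_to_unit, _ = make_scaler(0.0, 35.0)
q_to_unit, q_from_unit = make_scaler(-1000.0, 0.0)

print(temp_to_unit(24.5))   # scaled network input
print(q_from_unit(0.75))    # action-value recovered from a network output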

Another difficult practical issue that arose in testing was the speed of training the neural network with genetic algorithms. As the number of training data points grows with agent experience, the number of network calculations required for each training cycle grows as well. When genetic algorithms are used, a network calculation must be completed for each member of the solution population for every training data point. It was generally found that the problem could be reasonably mitigated by reducing the number of members of the solution population.
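The training cost noted above grows as the product of the population size and the number of accumulated training points, which is why shrinking the population was an effective mitigation. The numbers below are purely illustrative.

def ga_evaluations_per_generation(population_size, num_training_points):
    # One network forward pass is needed for each candidate weight vector
    # on each training data point, every generation.
    return population_size * num_training_points

# Purely illustrative: halving the population halves the per-generation cost.
print(ga_evaluations_per_generation(100, 5_000))  # 500,000 evaluations
print(ga_evaluations_per_generation(50, 5_000))   # 250,000 evaluations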

E.3 – Reducing Step Size Parameters

One significant issue identified in testing on the Truckee River case study was the agent’s ability to act in the highly variable Truckee River environment. Early testing showed strong detrimental effects on the stability of the agent’s action-value function stemming from the immediate, step-like changes in flow produced by normal Truckee River reservoir release operations for purposes other than water quality. For example, in a very dry year the flow in the river will continue at a typical rate until the very day that the general water rights storage supply has been depleted, known locally to water managers as the loss of the “Floriston Rate.” On that day, flow in the river drops remarkably, causing the water temperature to spike and producing a very strong negative influence on the action-value update for the state and action encountered on the previous timestep. Depending on the reinforcement learning step size parameter and discount rate, it was found to take considerable time for the action-value of that state and action to re-stabilize. This and other issues with variability on the Truckee River case study illustrated the importance of a reducing step size parameter. With a gradually reducing step size parameter the problem was generally mitigated: because these events were the exception rather than the norm, the action-values for the affected states generally stabilized on the more typical conditions and were eventually not greatly influenced by these rarer events.
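One common way to implement a state-specific reducing step size of the kind described above is a visit-count-based decay; the decay form and parameter values below are assumptions for illustration, not necessarily those used in this study.

from collections import defaultdict

class DecayingStepSize:
    """State-action-specific step size alpha that decays with visit count.

    Early visits adjust the action-value quickly; later visits are damped, so a rare
    event such as the loss of the Floriston Rate perturbs an already well-visited
    state-action pair far less than it would with a constant alpha.
    """

    def __init__(self, alpha0=0.5, decay=0.05):
        self.alpha0 = alpha0
        self.decay = decay
        self.visits = defaultdict(int)

    def alpha(self, state, action):
        self.visits[(state, action)] += 1
        return self.alpha0 / (1.0 + self.decay * (self.visits[(state, action)] - 1))

# Usage inside a (hypothetical) tabular update:
# q[s][a] += step.alpha(s, a) * (r + gamma * max(q[s_next].values()) - q[s][a])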

E.4 – Discussion on Partitioning of the State-Action Space

One of the primary issues faced by the agent in the Truckee River case study was the difficulty of gaining sufficient opportunity to explore the entire action domain. Storage of WQSA water occurs only in years when hydrologic conditions cause diversions out of the Truckee River and through the Truckee Canal. Additionally, the total amount of water that can be stored is limited by the amount of water rights purchased, and the time of year when WQSA water can be stored generally coincides with the time of year when the water is most likely needed for use. Even when additional water supplies are secured, the water is subject to displacement from the reservoirs through spill, as previously discussed. In general, stability and convergence of the action-value function and the agent’s policy are only guaranteed if the entire state and action space can be visited a sufficient number of times. For the Truckee River case study, this presented a significant challenge in developing a good representation of the action-value function and policy.

The partitioning of the state-action space discussed in Sections 3.2.5.1 and 4.6.1 was found to be an important mechanism for ensuring higher levels of experience and increased opportunity for exploration across the state space. Though the challenge remained present, partitioning provided for shared learning across all experiences falling within the broader categories developed for each state variable and the action variable, increasing the experience accumulated in those categories and serving as a form of generalization.

In addition to helping overcome the experience and exploration issues described above, partitioning the state-action space had the added benefit of simplifying the various processes associated with the tabular implementation. The feature extraction concepts discussed in Section 3.2.5.1 allowed partition categories to be placed at regular intervals. This simplified the generation of initial function values, simplified the calculation of the max_a′ Q(s′, a′) term in the action-value update Equation 3.14 under Q-Learning, and provided more efficient mechanisms for conducting the policy improvement procedure.

The benefits associated with partitioning also extended to the neural network function approximation implementation. Partitioning and feature extraction simplified the process of generating training data for the neural networks at regular intervals, theoretically improving the ability to approximate the function throughout the state and state-action space rather than only at locations where higher densities of training data would be generated. Similar to the tabular implementation, partitioning simplified the process of finding the max_a′ Q(s′, a′) term in the action-value update Equation 3.14 under Q-Learning. It also facilitated the policy improvement procedure by providing a limited number of locations at which the action-value function had to be evaluated in order to determine the agent’s policy.
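A minimal sketch of the partitioning idea follows: continuous state variables are mapped to regular-interval bins, the action-value table is keyed on those bins, and the max_a′ Q(s′, a′) term of the Q-Learning update is found by scanning a small, fixed set of discrete actions. The bin widths, action set, and parameter values are illustrative only and do not reflect the actual discretization beyond its general form.

import math
from collections import defaultdict

# Regular-interval partitions (illustrative widths): 0.5 °C temperature bins and
# storage bins roughly sized to the volumes needed to change river temperature.
def extract_features(temperature_c, storage_mcm, temp_width=0.5, storage_width=0.69):
    return (math.floor(temperature_c / temp_width),
            math.floor(storage_mcm / storage_width))

ACTIONS = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 5.9, 6.9, 7.9, 9.9, 11.9]  # releases, cms
Q = defaultdict(float)  # tabular action-value function keyed on (state_bins, action)

def greedy_value(state_bins):
    """max over a' of Q(s', a'), the successor-state term in the Q-Learning update."""
    return max(Q[(state_bins, a)] for a in ACTIONS)

def q_learning_update(s_bins, action, reward, s_next_bins, alpha=0.1, gamma=0.8):
    target = reward + gamma * greedy_value(s_next_bins)
    Q[(s_bins, action)] += alpha * (target - Q[(s_bins, action)])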

Finally, the use of partitioning and feature extraction allowed the development of policies that are more likely to be used by water managers in a real-world setting. The final policies were generated in the form of simple look-up tables, which are similar to other types of policies generally used by water managers, particularly in the Truckee River basin.

E.5 – Tuning of Rewards and Partition Bins

Reinforcement learning texts note the importance of developing good reward functions. In testing on the Truckee River case study, one of the more important factors in the agent’s learning and in the development of stable action-value and policy functions was the consistency of rewards. Though this may seem obvious, it does not always occur in practice, particularly when partitioning or feature extraction is used.

Depending on the size and design of the partitions or categories used for state representation, the agent can take actions that produce different results, and therefore different rewards, from what appear to the agent to be the exact same state. Depending on the design of the reward function, this can make it difficult for the agent to determine which action has the best reward signal, causing it to oscillate between policies. Testing showed that the problem can largely be mitigated by ensuring that the reward function takes the state and action partitioning or feature extraction into account and is likely to generate consistent rewards. Likewise, it is important to ensure that the state and action partitioning is designed so that any action taken in a given state produces similar results.

As previously discussed, for the final state and action discretization on the Truckee River case study, the state and action partitioning was generally designed around the actions and storage volumes required to change downstream river temperature by 0.5°C. This discretization provided a means to meet the various temperature targets discussed in Section 4.5.2.3 while not allowing the agent to over-release by an excessive amount, thus conserving as much water as could reasonably be expected. Even with careful tuning of the discretization and reward function, the inconsistencies described above are the reason that the basic policies shown in Tables C.1 and C.2 differ slightly from the analytically derived policy shown in Table C.3.
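One way to make rewards consistent with the partitioning described above is to evaluate the reward on the binned temperature rather than the raw simulated temperature, so that any action taken from what the agent perceives as the same state yields the same reward signal. The penalty form, thresholds, and weights below are assumptions for illustration only, not the study’s actual reward function.

def binned_temperature(temperature_c, width=0.5):
    # Snap the simulated temperature to the same 0.5 °C partition the agent observes.
    return width * int(temperature_c / width)

def reward(temperature_c, release_cms, target_c=22.0,
           exceedance_penalty=-100.0, release_penalty=-1.0):
    """Illustrative reward evaluated on the *partitioned* state.

    Penalizes temperature-target exceedance (computed on the binned value) and, more
    mildly, the volume of WQSA water released, encouraging water conservation.
    """
    t = binned_temperature(temperature_c)
    penalty = exceedance_penalty * max(0.0, t - target_c)
    return penalty + release_penalty * release_cms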

E.6 – Issues with Distributed Computing and Parallel Processing

The system of multiple environment models and agent interface programs discussed in Section 3.2.5 provides for more rapid agent learning through the shared experience of the agent’s actions in each of the environment models running concurrently. The system also presented a variety of practical challenges. The most notable challenge concerned the use of common files, such as the various data and tracking files, as well as log and other system files. With many different programs attempting to read and write the same files simultaneously, the system had to be designed so that no program could update a file at the same time as another, or read a file that was actively being updated. This implementation mechanism ensured system integrity.
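A minimal sketch of this kind of coordination is shown below, using an atomically created lock file so that concurrent environment models and interface programs never write the same shared file at the same time. The actual mechanism used in the study is not specified here, and the file names are hypothetical.

import os
import time
from contextlib import contextmanager

@contextmanager
def exclusive_file_lock(lock_path, poll_s=0.05):
    """Cooperative lock: only one process may hold the lock file at a time.

    os.O_CREAT | os.O_EXCL makes creation atomic, so concurrent programs cannot
    update the same shared file simultaneously or read it mid-update.
    """
    while True:
        try:
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            time.sleep(poll_s)  # another process holds the lock; wait and retry
    try:
        yield
    finally:
        os.close(fd)
        os.remove(lock_path)

# Hypothetical usage around a shared tracking file:
# with exclusive_file_lock("tracking.dat.lock"):
#     append_update("tracking.dat", update_record)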

Other practical issues included the challenges associated with processing action-value updates, particularly when updates were generated so quickly that many had to be processed in a single “batch.” These issues were magnified when eligibility traces were introduced. Those challenges led to the procedures described in Section 3.2.5.1, which successfully mitigated the issues.
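Batch processing of action-value updates might look like the following sketch: experiences generated by the parallel environment models are drained from a queue and applied in arrival order. Eligibility traces would require additional per-episode bookkeeping not shown here, and all names and values are illustrative rather than the study’s actual procedure.

from collections import deque

def process_update_batch(pending, Q, actions, alpha=0.1, gamma=0.8):
    """Drain queued (state, action, reward, next_state) experiences and apply tabular
    Q-Learning updates in the order the parallel environment models generated them."""
    while pending:
        s, a, r, s_next = pending.popleft()
        best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
        old = Q.get((s, a), 0.0)
        Q[(s, a)] = old + alpha * (r + gamma * best_next - old)

# Illustrative use: experiences accumulated from several concurrent model runs.
pending = deque([((46, 2), 2.0, -5.0, (46, 1)), ((46, 1), 1.0, 0.0, (45, 1))])
Q = {}
process_update_batch(pending, Q, actions=[0.0, 1.0, 2.0, 3.0])
print(Q)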
