
LEARNING REPRESENTATIONS USING REINFORCEMENT LEARNING

by

SOURABH BOSE

Presented to the Faculty of the Graduate School of
The University of Texas at Arlington
in Partial Fulfillment of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY

THE UNIVERSITY OF TEXAS AT ARLINGTON
May 2019

Copyright © by SOURABH BOSE 2019
All Rights Reserved

To my Mom & Dad

ACKNOWLEDGEMENTS

First and foremost, I would like to express my sincere gratitude to my supervisor, Dr. Manfred Huber. Without his patience and continuous effort, this would not have been possible. I have learned a lot from him, and I am extremely lucky to have him as my mentor. Furthermore, I would like to thank my thesis committee members, Dr. Gergely Záruba, Dr. Farhad Kamangar, Dr. Vassilis Athitsos, and Dr. Heng Huang for their insightful discussions and valuable guidance over the years. I would also like to thank Dr. Bob Weems for the wonderful years during which I worked as his teaching assistant. Finally, I would like to thank my family members for their love and support, and my labmates in the Learning and Adaptive Robotics (LEARN) lab, who have become a part of my extended family over the years. Thank you.

April 22, 2019

ABSTRACT

LEARNING REPRESENTATIONS USING REINFORCEMENT LEARNING

SOURABH BOSE, Ph.D.
The University of Texas at Arlington, 2019

Supervising Professor: Manfred Huber

The framework of reinforcement learning provides a powerful suite of algorithms that can learn generalized solutions to complex decision making problems. However, the application of reinforcement learning algorithms to traditional machine learning problems such as clustering, classification, and representation learning has rarely been explored. With the advent of large amounts of data, robust models are required that can extract meaningful representations from the data, representations that can potentially be applied to new, unseen tasks. The presented work investigates reinforcement learning from the perspective of transfer learning, applying algorithms from the reinforcement learning framework to a variety of machine learning problems in order to learn concise abstractions useful for transfer.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
LIST OF ILLUSTRATIONS
LIST OF TABLES

Chapter
1. Introduction
   1.1 Previous Work
       1.1.1 Neural Networks
       1.1.2 Reinforcement Learning
       1.1.3 Combining Reinforcement Learning Algorithms and Neural Networks
       1.1.4 Summary and Contributions
   1.2 Document Overview
2. Learning in Neural Networks Using Reinforcement Learning to Address Uncertain Data
   2.1 Types of Uncertainty in Data
       2.1.1 Interdependent and Non-Identically Distributed Input Data
       2.1.2 Imperfect Supervision
       2.1.3 Underlying Structural Correlations
3. Incremental Learning of Neural Network Classifiers Using Reinforcement Learning
   3.1 Introduction
   3.2 Existing Methodologies
   3.3 Approach
       3.3.1 Ensemble Learning MDP
       3.3.2 Type Selection MDP
       3.3.3 Network Creation MDP
   3.4 Experiments
   3.5 Conclusion
4. Semi Unsupervised Clustering Using Reinforcement Learning
   4.1 Introduction
   4.2 Existing Methodologies
   4.3 Approach
   4.4 Learning the Similarity Function
   4.5 Experimental Results
   4.6 Conclusion
5. Training Neural Networks with Policy Gradient
   5.1 Introduction
   5.2 Existing Methodologies
   5.3 Overview of Reinforcement Learning, Actor-Critic, and Policy Gradient
   5.4 Approach
   5.5 Classification with Incomplete Target Data
       5.5.1 Experiments
   5.6 Proposed "Lateral" Autoencoder Sparsity
       5.6.1 Experiments
   5.7 Conclusion
   5.8 Future Work
6. MDP Auto-encoder
   6.1 Introduction
   6.2 Related Work
   6.3 Markov Decision Processes
       6.3.1 MDP Homomorphisms
   6.4 Latent Variable MDP Models
       6.4.1 Architecture
       6.4.2 Supervised Learning
       6.4.3 Adversarial Learning
       6.4.4 Cost Functions
   6.5 Experiments
       6.5.1 Training Procedure
   6.6 Conclusions
7. Conclusions
   7.1 Contributions
Appendix
REFERENCES

LIST OF ILLUSTRATIONS

Figure
2.1 (a) Outline of the learning techniques used in this work; (b) Architecture used by deep reinforcement learning algorithms for decision making problems
3.1 Ensemble Learning MDP workflow
3.2 Flowchart for the Type Selection MDP
3.3 Flowchart for the Network Creation MDP
3.4 Modifying the network: bold lines indicate weights which are fixed and not relearned, fine lines denote weights which are initialized to previous values and then relearned, while dashed lines denote randomly initialized weights to be relearned
3.5 Policy learned by the Ensemble Learning MDP from the synthetic datasets
3.6 Policy learned by the Type Selection MDP from the synthetic datasets
3.7 Policy learned by the Network Creation MDP from the synthetic datasets
3.8 Accuracies achieved on the test set with the generic policy, each point representing the test accuracy achieved for a single problem in the set of test problems
3.9 Adaptation rate for the concept drift dataset SEA
4.1 K-means clusters. Solution without dimension scaling (left); solution with dimension scaling (right)
4.2 Distance along a pair of constraint points
4.3 Using reinforcement learning to scale dimensions
4.4 K-means clusters for a range of generated problems and constraint sets. Left figures show the initial clusters and the right figures show the final clusters after scaling with the learned policy
5.1 General network architecture. (a) Actor network: shaded grey nodes indicate input nodes, while black nodes specify the nodes over which the constraints are to be applied. (b) Critic network: checkered black nodes indicate values from constrained nodes with added noise. Shaded grey inputs and activations of affected nodes are used as input to the critic network, the output being the utility for the activations produced
5.2 Target-less classification network. (a) Actor network: shaded grey nodes indicate input nodes, while constraints are applied on the output layer nodes. (b) Critic network, with the input and noisy actor output node activations as input
5.3 Proposed approach with incomplete target data. (a) Mean-squared error for the iris dataset, where the green line shows the validation error while the blue line shows the training error. (b) Reward accrued over epochs. (c) Comparison of the approximated gradient vs. the true gradient for a single sample and a single node over epochs, where the green line shows the approximated gradient from the critic network while the blue line shows the true gradient
5.4 Sparse autoencoder networks. (a) Actor network: shaded grey nodes indicate input nodes, while the constraint is applied on the hidden nodes. (b) Critic network, with the input and noisy hidden node activations as input
5.5 Proposed form of sparsity. (a) Reconstruction mean-squared error for the thyroid dataset. (b) Penalty accrued over epochs
5.6 Thyroid dataset features: the x-axis represents sample activations, the y-axis represents individual nodes, and intensity represents the value of a node's activation for the given sample, with white being almost one while black denotes non-firing nodes. (a) Proposed lateral inhibition sparsity constraint approach. (b) Traditional KL-divergence sparsity constraint approach
6.1 Mapping MDP M to the reduced MDP model M′, where blocks shown as circles denote the aggregated states
6.2 Proposed model architecture. (a) Set of encoding networks for state and action, and a generative model of the MDP in latent space. (b) Discriminator network differentiating between the true and learned model mapping
6.3 Rewards achieved by a policy defined over the latent space. (a) Loss for reducing the CartPole problem to 40 discrete latent states. (b) Rewards over the latent policy for solving the CartPole environment using 40 discrete latent states. (c) Rewards over the latent policy for solving the Acrobot environment using 40 discrete latent states
6.4 Mapping a continuous observed space to discrete graphical models. (a) Sample frame for the CartPole environment, where the state space is defined with four continuous variables and two discrete actions. (b) Transition dynamics of the learned latent space with five discrete blocks and two actions. Transitions for a single action are shown

LIST OF TABLES

Table
3.1 Accuracies achieved vs. other approaches
4.1 Performance analysis of individual policies
5.1 Results from incomplete target data vs. a traditional network with complete target data
5.2 Comparison of the two forms of sparsity
6.1 Performance for different block numbers in the latent MDP model

CHAPTER 1
Introduction

Reinforcement learning allows suitable priors to be extracted from many general optimization problems. The presented work focuses on traditional machine learning problems such as classification, clustering, and representation learning, and applies algorithms from the framework of reinforcement learning in order to extract structured information suitable for transfer in such problems.

Artificial neural networks have been used extensively to solve many machine learning problems. Recent breakthroughs in deep learning techniques [1] allow neural networks to be extended to various new application domains. However, many real-world problems do not provide the complete information required by the standard framework for training such networks. For example, in the case of lifelong learning systems, many algorithms act in dynamic environments where data flows continuously as streaming data. Such learning problems exhibit persistent changes in task requirements. Many classification problems also face such issues, often termed a concept drift [2][3] of the task distribution. Concept drift refers to a change over time in the relationship between the input and output data of the underlying problem.
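To make this concrete, the following minimal sketch (not part of the dissertation's experiments) generates an SEA-style synthetic stream, in the spirit of the SEA concept drift dataset referenced in Figure 3.9: labels depend on whether the sum of two features exceeds a threshold, and that threshold changes partway through the stream. The specific threshold values and sample counts are illustrative assumptions; the point is only that a fixed decision rule fit to the earlier concept degrades once the drift occurs.

import numpy as np

rng = np.random.default_rng(0)


def sea_block(n_samples, threshold):
    """Generate one block of an SEA-like stream.

    Each sample has three features in [0, 10); the label is 1 when
    f1 + f2 <= threshold (the third feature is irrelevant noise).
    """
    x = rng.uniform(0.0, 10.0, size=(n_samples, 3))
    y = (x[:, 0] + x[:, 1] <= threshold).astype(int)
    return x, y


# Two consecutive blocks of the stream governed by different concepts.
x_before, y_before = sea_block(5000, threshold=8.0)
x_after, y_after = sea_block(5000, threshold=9.5)   # concept drift occurs here

# A fixed rule matching the first concept (the old threshold) stays accurate
# on the pre-drift data but loses accuracy on the post-drift data.
def predict_old(x):
    return (x[:, 0] + x[:, 1] <= 8.0).astype(int)

acc_before = (predict_old(x_before) == y_before).mean()
acc_after = (predict_old(x_after) == y_after).mean()
print(f"accuracy before drift: {acc_before:.3f}")
print(f"accuracy after drift:  {acc_after:.3f}")

A learner that never revisits its decision boundary is therefore systematically wrong on the post-drift portion of the stream, which is why incremental and adaptive methods such as those developed in Chapter 3 are needed for streaming settings.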