Unsupervised Feature Extraction for Reinforcement Learning
Faculty of Science and Bio-Engineering Sciences
Department of Computer Science

Unsupervised Feature Extraction for Reinforcement Learning

Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in de Ingenieurswetenschappen: Computerwetenschappen

Yoni Pervolarakis

Promotors: Prof. Dr. Peter Vrancx, Prof. Dr. Ann Nowé

June 2016

Abstract

When working with high dimensional features, chances are that most of the features are not important for a specific problem. Different approaches exist to eliminate those features and potentially find better ones: for example, feature extraction, which transforms the original input features into a new, lower dimensional feature set, or feature selection, which keeps only the features that are more important than the others. Both can be applied in a supervised or an unsupervised manner. In this thesis, we investigate whether autoencoders can be used as an unsupervised feature extraction method on data that is not necessarily interpretable. The extracted features are then tested in a Reinforcement Learning environment. The data is represented as RAM states, which are a black box in the sense that we cannot interpret them ourselves. The autoencoders receive a high dimensional feature set and transform it into a lower dimensional one; these new features are given to an agent, which uses them to learn. The results are compared to a manual feature selection method and to using no feature selection at all.

Acknowledgements

First and foremost I would like to thank Prof. Dr. Peter Vrancx for helping me find a subject I am passionate about, for taking the time for weekly updates, and for all his suggestions and our numerous conversations on how this subject could be tackled. Secondly, I would like to thank Prof. Dr. Ann Nowé for piquing my interest in the Artificial Intelligence master when I took her course during my first year at the Vrije Universiteit Brussel. And finally I would like to thank my mother for supporting me in pursuing my studies at university level, and my girlfriend for her endless support.

Contents

1 Introduction
  1.1 Research Question
2 Machine Learning
  2.1 Supervised learning
    2.1.1 Classification
    2.1.2 Regression
  2.2 Unsupervised learning
  2.3 Underfitting and overfitting
  2.4 Bias-Variance
  2.5 Ensemble methods
    2.5.1 Bagging
    2.5.2 Boosting
  2.6 Curse of dimensionality
  2.7 Evaluating models
    2.7.1 Cross validation
3 Artificial Neural Networks
  3.1 Perceptrons
  3.2 Training perceptrons
  3.3 Multilayer perceptron
  3.4 Activation functions
    3.4.1 Sigmoid
    3.4.2 Hyperbolic tangent
    3.4.3 Rectified Linear Unit
    3.4.4 Which is better?
  3.5 Tips and tricks
  3.6 Backpropagation
  3.7 Autoencoders
  3.8 Conclusion
4 Reinforcement Learning
  4.1 The setting
  4.2 Rewards
  4.3 Markov Decision Process
  4.4 Value functions
  4.5 Action Selection
  4.6 Incrementing Q-values
  4.7 Monte Carlo & Dynamic Programming
  4.8 Temporal Difference
    4.8.1 Q-Learning
    4.8.2 SARSA
  4.9 Eligibility traces
  4.10 Function approximation
5 Experiments and results
  5.1 ALE
  5.2 Space Invaders
  5.3 Reconstruction
  5.4 Flow of experiments
  5.5 Manual features and basic RAM
  5.6 Difference between bits and bytes
  5.7 Comparing different activation functions
  5.8 Initializing Q-values
  5.9 Pretraining and extracting other layers
  5.10 Combination of RAM and layer
  5.11 Visualizing high dimensional data
6 Conclusions
  6.1 Future work
Appendices
  A Extended graphs and tables
7 Bibliography

List of Figures

1 Architecture of data processing
2 Example of a decision tree
3 Classification
4 Regression
5 Data of two features
6 k-means clustering
7 Unsupervised learning: reduction of dimensions
  7a MNIST example of the number 2
  7b MNIST reduction of dimensions
8 Difference between under- and overfitting
9 Dartboard analogy from (Sammut & Webb, 2011)
10 Bias-Variance trade-off
11 Random Forest
12 Searching in different dimensions
  12a 1D space
  12b 2D space
  12c 3D space
13 Example of a perceptron
14 Bitwise operations
  14a AND operator
  14b OR operator
  14c XOR operator
15 XOR with decision boundaries learnt by an MLP
16 Multilayer perceptron
17 Other activation functions: linear and step function
18 Sigmoid activation function
19 Hyperbolic tangent activation function
20 ReLU activation function
21 Example of an autoencoder
22 A Skinner box from (Skinner, 1938)
23 Agent-environment setting
24 Another view of the agent-environment setting
25 Mountain car; image from (RL-Library, n.d.)
26 Pole balancing; image from (Anji, n.d.)
27 Maze world
28 Eligibility trace; image from (Sutton & Barto, 1998)
29 Replacing traces; image from (Sutton & Barto, 1998)
30 Coarse coding; image from (Sutton & Barto, 1998)
31 The difference between RAM and frames
  31a RAM
  31b Frames
32 Space Invaders screen
33 MSE of autoencoder with 128 bits input
34 MSE of autoencoder with 1024 bits input
35 Difference between RAM and RAM with AND
36 Autoencoders on 128 bytes
37 Autoencoders on 1024 bytes
38 Q = -1
39 Q = 1
40 Extraction of a layer other than the bottleneck
41 Pretraining with extraction of layer 512
42 Pretraining with extraction to a hidden layer of 4 nodes
43 Pretraining with extraction of layer 512 with dropout
44 Pretraining with extraction of layer 512 with dropout
45 Combining the original layer with the encoded version
46 t-SNE
47 Linear activation function on an autoencoder
48 Sigmoid activation function on an autoencoder
49 ReLU activation function on an autoencoder
50 Pretraining with extraction of layer 512
51 Combining the original layer with the encoded version

List of Tables

1 Classification of animals
2 Predicting the price of a house
3 V*(s)
4 π*(s)
5 Gridworld example
6 Comparing different activation functions
7 P-values of the Mann-Whitney U test
8 The difference in setting different Q-values
9 Training to a specific layer and extracting a chosen layer

List of Algorithms

1 Q-Learning
2 SARSA
3 SARSA(λ)
4 Q-Learning(λ)

Chapter 1
Introduction

Artificial Intelligence is a field of computer science that studies a wide range of topics such as Machine Learning, Reinforcement Learning and a newer, rising topic, Deep Learning. Artificial Intelligence is now more a part of daily life than it was two decades ago. Take for example robotic vacuum cleaners: the robot knows when to clean the house, exactly when it must return to the charging station for a full battery, and even how to pick up where it left off after recharging. More than ten years ago, vacuum cleaner robots were not regarded as AI because they simply performed random walks; a random walk through a house, if continued long enough, will eventually cover the whole house. With new algorithms available, the robot can map the house in order to vacuum efficiently, and it can make a detour if an object suddenly appears in its way. The only way to gather all this information is to perceive as many features as possible.

Another example is the new generation of smart thermostats, such as the Nest thermostat developed by Google or the ATAG ONE thermostat. These smart thermostats know when the house is empty and when the owners go to work and come back. By learning the behaviour of the owners, the thermostat automatically adapts, so that the heating is turned up just before the owners come home and turned down after they go to work.
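To make the pipeline described in the abstract more concrete, the sketch below shows how an autoencoder could compress high dimensional RAM states into a small feature vector that a Reinforcement Learning agent then uses as its observation. It is a minimal illustration only: the use of Keras, the layer sizes (1024 binary inputs, a 512-unit hidden layer, a 128-unit bottleneck) and the placeholder array ram_states are assumptions made for this example, not necessarily the exact architectures evaluated in Chapter 5.

```python
# Minimal sketch: unsupervised feature extraction with an autoencoder,
# assuming RAM states have already been collected as 1024-bit vectors.
import numpy as np
from keras.layers import Input, Dense
from keras.models import Model

# Placeholder data: N binary RAM states of 1024 bits each (assumed shape).
ram_states = np.random.randint(0, 2, size=(10000, 1024)).astype("float32")

# Encoder 1024 -> 512 -> 128 (bottleneck); the decoder mirrors it back to 1024.
inputs = Input(shape=(1024,))
hidden = Dense(512, activation="relu")(inputs)
bottleneck = Dense(128, activation="relu")(hidden)
hidden_dec = Dense(512, activation="relu")(bottleneck)
outputs = Dense(1024, activation="sigmoid")(hidden_dec)

autoencoder = Model(inputs, outputs)   # trained to reconstruct its input
encoder = Model(inputs, bottleneck)    # used afterwards to extract features

# Unsupervised training: the target is the input itself, with MSE as the loss.
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(ram_states, ram_states, epochs=10, batch_size=128, verbose=0)

# Low dimensional features handed to the RL agent instead of the raw RAM state.
features = encoder.predict(ram_states)
print(features.shape)  # (10000, 128)
```

In the experiments of Chapter 5, the encoder output replaces the raw RAM state as the agent's input; whether the RAM is fed to the network as 128 bytes or as 1024 bits, which activation functions work best, and which layer to extract are exactly the questions investigated there.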