Online Optimization with Energy Based Models

by Yilun Du

B.S., Massachusetts Institute of Technology (2019)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering at the Massachusetts Institute of Technology, May 2020.

© Massachusetts Institute of Technology 2020. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 15, 2020

Certified by: Leslie P. Kaelbling, Professor, Department of Electrical Engineering and Computer Science, Thesis Supervisor

Certified by: Tomas Lozano-Perez, Professor, Department of Electrical Engineering and Computer Science, Thesis Supervisor

Certified by: Joshua B. Tenenbaum, Professor, Department of Brain and Cognitive Science, Thesis Supervisor

Accepted by: Leslie A. Kolodziejski, Professor of Electrical Engineering and Computer Science, Chair, Department Committee on Graduate Students

Online Optimization with Energy Based Models

by Yilun Du

Submitted to the Department of Electrical Engineering and Computer Science on May 15, 2020, in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering

Abstract

This thesis examines the power of applying optimization on learned neural networks, referred to as Energy Based Models (EBMs). We first present methods that enable scalable training of EBMs, allowing an optimization procedure to generate high resolution images. We simultaneously show that the resultant models are robust, compositional, and easy to learn online. Next we showcase how this optimization procedure can also be used to formulate plans in interactive environments. We further showcase how a similar procedure can be used to learn neural energy functions for proteins, enabling structural recovery through optimization. Finally, we show that by defining generation as an optimization procedure, we can combine generative models from different domains together and apply optimization on the joint model. We show that this allows us to apply various logical operations on image generation, as well as learn to generate new concepts in a continual manner.

Thesis Supervisor: Leslie P. Kaelbling
Title: Professor, Department of Electrical Engineering and Computer Science

Thesis Supervisor: Tomas Lozano-Perez
Title: Professor, Department of Electrical Engineering and Computer Science

Thesis Supervisor: Joshua B. Tenenbaum
Title: Professor, Department of Brain and Cognitive Science

Acknowledgments

I would like to thank Leslie, Tomas, and Josh for guiding me through the master's degree process, giving insightful advice and feedback on my ideas, and helping me mature as a researcher. Much of the work in this thesis was done with Igor, whom I am also thankful to, both for invaluable advice and for guiding my development as a researcher. I am also greatly thankful for all the collaborators who helped me through this work. Finally, I would like to thank my lab-mates for providing great conversations and insights on each of the research topics I have worked on.

Contents

1 Introduction

2 Related Work

3 Learning Energy Models
  3.1 Energy Based Models
    3.1.1 Sample Replay Buffer
    3.1.2 Regularization
  3.2 Image Generation
    3.2.1 Mode Evaluation
  3.3 Generalization
    3.3.1 Adversarial Robustness
    3.3.2 Out-of-Distribution Generalization
  3.4 Online Learning

4 Energy Based Models for Planning
  4.1 Planning through Online Optimization
    4.1.1 Energy-Based Models and Terminology
    4.1.2 Planning with Energy-Based Models
    4.1.3 Online Learning with Energy-Based Models
    4.1.4 Related Work
  4.2 Experiments
    4.2.1 Setup
    4.2.2 Online Model Learning
    4.2.3 Maximum Entropy Inference
    4.2.4 Exploration

5 Energy Models for Protein Structure
  5.1 Background
  5.2 Method
    5.2.1 Parameterization of protein conformations
    5.2.2 Usage as an energy function
    5.2.3 Training and loss functions
    5.2.4 Recovery of Rotamers
  5.3 Evaluation
    5.3.1 Datasets
    5.3.2 Baselines
    5.3.3 Evaluation
    5.3.4 Rotamer recovery results
    5.3.5 Visualizing Energies

6 Compositionality with Energy Based Models
  6.1 Method
    6.1.1 Energy Based Models
    6.1.2 Logical Operators through Online Optimization
  6.2 Experiments
    6.2.1 Setup
    6.2.2 Compositional Generation
    6.2.3 Continual Learning

7 Conclusion
  7.1 Future Directions

List of Figures

1-1 A 2D example of combining EBMs through summation and the resulting sampling trajectories.

3-1 Table of Inception and FID scores for ImageNet 32x32 and CIFAR-10. Quantitative numbers for ImageNet 32x32 from [Ostrovski et al., 2018]. (*) We use Inception Score (from the original OpenAI repo) to compare with legacy models, but strongly encourage future work to compare solely with FID score, since Langevin dynamics converges to minima that artificially inflate Inception Score. (**) Conditional EBM models for 128x128 are smaller than those in SNGAN.

3-2 EBM image restoration on images in the test set via MCMC (online optimization). The right column shows failures (approx. 10% of objects change with ground-truth initialization and 30% of objects change under salt-and-pepper corruption or inpainting; the bottom two rows show worst-case changes).

3-3 Comparison of image generation techniques on the unconditional CIFAR-10 dataset.

3-4 Illustration of cross-class implicit sampling on a conditional EBM. The EBM is conditioned on a particular class but is initialized with an image from a separate class.

3-5 Illustration of image completions on the conditional ImageNet model. Our models exhibit diversity in inpainting.

3-6 $\epsilon$ plots under $L_\infty$ and $L_2$ attacks of conditional EBMs as compared to PGD-trained models in [Madry et al., 2017] and a baseline Wide ResNet18.

3-7 Histogram of relative likelihoods for various datasets for Glow, PixelCNN++ and EBM models.

4-1 Overview of the online training procedure with an EBM, where grey areas represent inadmissible regions. Plans from the current observation to the goal state are inferred from the EBM (left). A particular plan is chosen and executed until a planned state deviates significantly from an actual state (middle). The EBM is then trained on all real transitions $\tau_{\text{real}}$ and all planned transitions before the deviation (in green) $\tau_\theta$, while transitions afterwards (red) are ignored. A new plan is then generated from the new location of the agent (right).

4-2 Illustrations of the 4 evaluated environments. Both the particle and maze environments have 2 degrees of freedom for x, y movement. The Reacher environment has 2 degrees of freedom corresponding to torques on two motors. The Sawyer Arm environment has 7 degrees of freedom corresponding to torques.

4-3 Navigation path with a central obstacle the model was not trained with.

4-4 Performance on Particle, Maze and Reacher environments where dynamics models are either pretrained on random transitions or learned via online interaction with the environment. Action FF: Action Feed-Forward Network.

4-5 Qualitative image showing an EBM successfully navigating the finger end effector to the goal position.

4-6 Effects of varying the number of planning steps to reach a goal state. As the number of planning steps increases, there is a larger envelope of explored states.

4-7 Illustrations of two different planned trajectories from start to goal in the Reacher environment.

4-8 Illustration of energy values of states (computed by taking the energy of a transition centered at the location) and corresponding visitation maps. While an EBM learns a probabilistic model of transitions in areas already explored, energies of unexplored regions fluctuate throughout training, leading to a natural exploration incentive. Early on in training (left), the EBM puts low energy on the upper corner, incentivizing agent exploration towards the top corner. Later on in training (right), the EBM puts low energy on the lower right corner, incentivizing agent exploration towards the bottom corner.

4-9 Illustration of exploration in a maze under random actions (left) as opposed to following an EBM (middle). Areas in blue in the maze environment (right) are admissible, while areas in white are not.

4-10 Comparison of 3D spatial occupancy of the finger end-effector of the Sawyer Arm, using random exploration versus using an EBM without a goal, on a log scale across 4 different seeds. An EBM allows more directed exploration and explores more states. For the random policy to reach maximum occupancy, more than 200,000 transitions are required.

5-1 Overview of the model. The model takes as input a set of atoms, $A$, consisting of the rotamer to be predicted (shown in green) and surrounding atoms (shown in dark grey). The Cartesian coordinates and attributes of each atom are embedded. The set of embeddings is processed by Transformer blocks, and the final hidden representations are pooled over the atoms to produce a vector. The vector is passed through a two-layer MLP to output a scalar energy value, $f_\theta(A)$.

5-2 The energy function models distinct behavior between core and surface residues. Core residues are more sensitive to perturbations away from the native state in the $\chi_1$ torsion angle. On average, residues closer to the core have a steeper energy well.

5-3 There is a relation between residue size and the depth of the energy well, with larger amino acids (e.g. Trp, Phe, Thr, Lys) having steeper wells.

5-4 Note the periodicity for the amino acids Tyr, Asp, and Phe with terminal symmetry about $\chi_2$.

5-5 Left: 3-dimensional representation of CcmG reducing oxidoreductase [PDB ID 1KNG; Edeling et al., 2002], a protein from the test set. Atoms are colored dark blue (buried), orange (exposed), or neither (not colored). Right: t-SNE [Maaten and Hinton, 2008] projection of the EBM hidden representation when focused on the alpha carbon atom of each residue. In the embedding space, buried and surface residues are distinguished.

5-6 The model's saliency map applied to the test protein, serine proteinase inhibitor (PDB ID: 2CI2; McPhalen and James [1987]). The 64-atom context is centered on the carbonyl oxygen of residue 39 (isoleucine). Atoms in the context are labeled red with color saturation proportional to gradient magnitude (interaction strength). Hydrogen bonds with the carbonyl oxygen are shown by dotted lines.

6-1 Illustration of logical composition operators over energy functions $E_1$ and $E_2$ (drawn as level sets).

6-2 Combinations of different attributes on CelebA via concept conjunction. Each row adds an additional energy function. Images in the first row are conditioned only on young, while images in the last row are conditioned on young, female, smiling, and wavy hair.

6-3 Combinations of different attributes on MuJoCo via concept conjunction. Each row adds an additional energy function. Images in the first row are conditioned only on shape, while images in the last row are conditioned on shape, position, size, and color. The left part is the generation of a sphere shape and the right is a cylinder.

6-4 Examples of recursive compositions of disjunction, conjunction, and negation on the CelebA dataset.

6-5 Multi-object compositionality with EBMs. An EBM is trained to generate a green cube of specified size and shape in a scene alongside other objects. At test time, we sample from the conjunction of two EBMs conditioned on different positions and sizes (cube 1 and cube 2), generating cubes at both locations. Two cubes are merged into one if they are too close (last column).

6-6 Continual learning of concepts. A position EBM is first trained on one shape (cube) of one color (purple) at different positions. A shape EBM is then trained on different shapes of one fixed color (purple). Finally, a color EBM is trained on shapes of many colors. EBMs can continually learn to generate many shapes (cube, sphere) with different colors at different positions.

List of Tables

3.1 AUROC scores of out-of-distribution classification on different datasets. Only our model achieves better-than-chance classification.

3.2 Comparison of EBMs with various other continual learning benchmarks. Values averaged across 10 seeds, reported as mean (standard deviation).

4.1 Comparison of performance on the Sawyer Arm environment between Action FF and EBM. In the pretraining setting, we compare models trained using random transitions, directed transitions from an EBM, and directed transitions with correlated data. In the online setting, models are trained on 50,000 simulations of the environment. We find that EBMs perform well online.

5.1 Rotamer recovery of energy functions under the discrete rotamer sampling method detailed in Section 5.3.3. Parentheses denote values reported by Leaver-Fay et al. [2013].

5.2 Rotamer recovery of energy functions under continuous optimization schemes. Rosetta continuous optimization is performed with the rtmin protocol. Parentheses denote values reported by Leaver-Fay et al. [2013].

5.3 Comparison of rotamer recovery rates by amino acid between Rosetta and the ensembled energy-based model under discrete rotamer sampling. The model appears to perform well on the polar amino acids glutamine, serine, asparagine, and threonine, while Rosetta performs better on the larger amino acids phenylalanine, tyrosine, and tryptophan, and the common amino acid leucine. The numbers reported for Rosetta are from Leaver-Fay et al. [2013].

6.1 Quantitative evaluation of conjunction, disjunction and negation generations on the MuJoCo Scenes dataset using an EBM. Each individual attribute (Color or Position) generation is an individual EBM. (Acc: accuracy)

6.2 Quantitative evaluation of continual learning. A position EBM is first trained on "purple" "cubes" at different positions. A shape EBM is then trained on different "purple" shapes. Finally, a color EBM is trained on shapes of many colors; earlier EBMs are fixed and combined with new EBMs. We compare with a GAN model [Radford et al., 2015] which is also trained on the same position, shape and color dataset. EBMs are better at continually learning new concepts and remembering old concepts. (Acc: accuracy)

Chapter 1

Introduction

Deep learning (DL) has achieved astonishing levels of success in recent years. In applied domains such as computer vision and natural language processing, DL approaches have revolutionized tasks such as object detection [Ren et al., 2015], segmentation [He et al., 2017], and translation [Sutskever et al., 2014]. In interactive settings, DL approaches have had success in board games [Silver et al., 2016], video games [openai, 2018] and robotic control [OpenAI, 2018].

Many DL approaches are often limited by a dearth of prior information, requiring a disproportionate amount of computational power to train. Efforts to learn such generic prior information are difficult, as DL models are difficult to compose with each other. In contrast, humans have rich compositional ability, and are able to combine concepts such as red and smiling to generate red smiling faces, or to compose skills such as fetch, search, and cut into executing a task such as making a cup of tea, allowing people to learn new tasks in a rapid manner.

Figure 1-1: A 2D example of combining EBMs through summation (energy A, energy B, energy A + B) and the resulting sampling trajectories.

Furthermore, many DL approaches are typically defined through fully connected or convolutional layers, which rely on a fixed feed-forward computation that is applied

to inputs. However, humans are able to adapt their computation to the scenario at hand. When given a complicated math expression, humans are able to recursively think and apply a set of fixed operations to determine the right answer. When given a puzzle of high complexity, humans are able to think over and ponder a solution.

Here, we instead examine learning a deep network that outputs an energy for a given input, representing a residual error of the input. Execution in such a model is then defined in terms of iteratively finding an input that minimizes the corresponding residual error, which we refer to as online optimization. Defining models in such a way gives a limited sense of compositionality: for example, we can generate inputs satisfying multiple distinct models by minimizing the joint sum of the energies of several different models (see Figure 1-1). We further show how such manipulations can be utilized to realize conjunction, disjunction, and negation of separate models.

The optimization procedure itself allows the model to adapt its computation to the task at hand. Harder problems can be solved by running optimization on the energy of the model for extended periods of time. Similarly, higher quality solutions may be obtained.

In this work, we present how online optimization on energy functions can scale to challenging, diverse, high resolution image datasets, and show unique properties such models have towards both robustness and online learning. We showcase how online optimization can also be used in the robotics domain to solve more complex and novel tasks. Further, we show that online optimization can be used to recover protein structural configurations, outperforming classical protein energy functions, and we correspondingly show unique structural information learned by the energy functions. Finally, we showcase that optimization can be utilized to compose separately trained models through operations such as conjunction, disjunction and negation, and highlight the unique properties this enables.

Chapter 2

Related Work

Energy functions, also known as energy based models (EBMs), have a long history in machine learning and other domains as generative models. These models represent the likelihood of data using the Boltzmann distribution and include the Ising model [Brush, 1967], the Potts model [Montroll et al., 1963], the Helmholtz machine [Dayan et al., 1995], and the Boltzmann machine [Ackley et al., 1985]. Ackley et al. [1985], Hinton [2006], and Salakhutdinov and Hinton [2009] proposed latent-variable EBMs where energy is represented as a composition of latent and observable variables. In contrast, Mnih and Hinton [2004] and Hinton et al. [2006a] proposed EBMs where inputs are directly mapped to outputs, a structure we follow. We refer readers to [LeCun et al., 2006] for a tutorial on energy models.

In the image generation domain, the FRAME model [Zhu et al., 1998] learns an energy model to represent the texture within images. This work is extended to sparse codes in [Xie et al., 2015], and applied to convolutional neural networks in [Xie et al., 2016]. However, due to the computational difficulty of training, EBMs have become less popular recently. The primary difficulty in training EBMs comes from effectively estimating and sampling the partition function. One approach to training energy based models is to sample the partition function through amortized generation. Kim and Bengio [2016], Zhao et al. [2016], and Haarnoja et al. [2017] propose learning a separate network to generate samples, which makes these methods closely connected to GANs [Finn et al., 2016], but these methods do not take advantage of the optimization procedure noted in the introduction. Furthermore, amortized generation is prone to mode collapse, especially when training the sampling network without an entropy term, which is often approximated or ignored. An alternative approach is to use MCMC sampling to estimate the partition function. This has the advantage of provable mode exploration and allows the benefits of implicit generation listed in the introduction. Hinton [2006] proposed Contrastive Divergence, which uses gradient-free MCMC chains initialized from training data to estimate the partition function. Similarly, Salakhutdinov and Hinton [2009] apply contrastive divergence, while Tieleman [2008] proposes PCD, which propagates MCMC chains throughout training.

Task inference through energy minimization was also popular in the past. LeCun et al. [2006] provide an overview of using such minimization to solve tasks, while Bai et al. [2019] further apply a similar idea to language modeling. Through products of experts, Hinton [1999] has also previously shown that EBMs allow the composition of different concepts. In contrast to past work, we show that such composition enables compositionality in the natural image domain, through a series of logical operators [Du et al., 2020a].

EBMs have received an increasing amount of attention in recent years and have been applied to a variety of different domains. EBMs have been studied as a generative model over images [Du and Mordatch, 2019], and have been shown to exhibit adversarial robustness [Lee et al., 2018, Du and Mordatch, 2019, Grathwohl et al., 2019]. EBMs are applied towards protein modeling in [Ingraham et al., Du et al., 2020b]. EBMs are further applied as a memory model in [Bartunov et al., 2019], in NLP in [Bakhtin et al., 2019, Deng et al., 2020], and in the planning domain in [Du et al., 2019].

Chapter 3

Learning Energy Models

In this chapter, we first introduce energy based models and then discuss a stable training algorithm for such models. We show that this training algorithm enables us to use online optimization to generate high resolution images on ImageNet 128x128. We then showcase additional properties of online optimization and of energy based models.

3.1 Energy Based Models

Given a datapoint $\mathbf{x}$, let $E_\theta(\mathbf{x}) \in \mathbb{R}$ be the energy function. In our work, we parameterize this function as a deep neural network with weights $\theta$. The energy function defines a probability distribution via the Boltzmann distribution $p_\theta(\mathbf{x}) = \frac{\exp(-E_\theta(\mathbf{x}))}{Z(\theta)}$, where $Z(\theta) = \int \exp(-E_\theta(\mathbf{x}))\,d\mathbf{x}$ denotes the partition function. In our work, we wish to maximize this likelihood on our training data.

One difficulty with maximum likelihood training is that generating samples from this distribution is challenging, with previous work relying on MCMC methods such as random walk or Gibbs sampling [Hinton, 2006]. These methods have long mixing times, especially for high-dimensional complex data such as images. To improve the mixing time of the sampling procedure, we use Langevin dynamics, which makes use of the gradient of the energy function to undergo sampling:

$$\tilde{\mathbf{x}}^k = \tilde{\mathbf{x}}^{k-1} - \frac{\lambda}{2} \nabla_{\mathbf{x}} E_\theta(\tilde{\mathbf{x}}^{k-1}) + \omega^k, \qquad \omega^k \sim \mathcal{N}(0, \lambda) \tag{3.1}$$

We want the distribution defined by $E_\theta$ to model the data distribution $p_D$, which we do by minimizing the negative log likelihood of the data, $\mathcal{L}_{\mathrm{ML}}(\theta) = \mathbb{E}_{\mathbf{x} \sim p_D}[-\log p_\theta(\mathbf{x})]$, where $-\log p_\theta(\mathbf{x}) = E_\theta(\mathbf{x}) + \log Z(\theta)$. This objective is known to have the gradient (see [Turner, 2005] for a derivation)

$$\nabla_\theta \mathcal{L}_{\mathrm{ML}} = \mathbb{E}_{\mathbf{x}^+ \sim p_D}\left[\nabla_\theta E_\theta(\mathbf{x}^+)\right] - \mathbb{E}_{\mathbf{x}^- \sim p_\theta}\left[\nabla_\theta E_\theta(\mathbf{x}^-)\right].$$

Intuitively, this gradient decreases the energy of the positive data samples $\mathbf{x}^+$, while increasing the energy of the negative samples $\mathbf{x}^-$ from the model $p_\theta$. We rely on Langevin dynamics in (3.1) to generate $q_\theta$ as an approximation of $p_\theta$:

$$\nabla_\theta \mathcal{L}_{\mathrm{ML}} \approx \mathbb{E}_{\mathbf{x}^+ \sim p_D}\left[\nabla_\theta E_\theta(\mathbf{x}^+)\right] - \mathbb{E}_{\mathbf{x}^- \sim q_\theta}\left[\nabla_\theta E_\theta(\mathbf{x}^-)\right]. \tag{3.2}$$

Equation 3.1 further defines a way to generate samples from the model: initialize a sample from uniform noise and iteratively run Langevin dynamics for a large number of steps. We refer to this generation process as online optimization.
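To make the sampling procedure concrete, a minimal PyTorch sketch of the Langevin update in Equation 3.1 is shown below; the network interface, step size, and step count are illustrative placeholders rather than the exact settings used in this thesis.

```python
import torch

def langevin_sample(energy_fn, x_init, num_steps=60, step_size=10.0):
    """Run Langevin dynamics (Eq. 3.1) to draw approximate samples
    from p(x) proportional to exp(-E(x))."""
    x = x_init.clone().detach()
    for _ in range(num_steps):
        x.requires_grad_(True)
        energy = energy_fn(x).sum()
        # Gradient of the energy with respect to the input.
        grad, = torch.autograd.grad(energy, x)
        noise = torch.randn_like(x)
        # Gradient descent on the energy plus Gaussian noise.
        x = (x - 0.5 * step_size * grad
             + noise * step_size ** 0.5).detach()
        x = x.clamp(0.0, 1.0)  # keep images in a valid pixel range
    return x

# Usage: initialize from uniform noise, as described above.
# energy_fn = SomeEnergyNetwork(); x0 = torch.rand(16, 3, 32, 32)
# samples = langevin_sample(energy_fn, x0)
```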

3.1.1 Sample Replay Buffer

Langevin dynamics does not place restrictions on the sample initialization $\tilde{\mathbf{x}}^0$ given sufficient sampling steps. However, initialization plays a crucial role in mixing time. Persistent Contrastive Divergence (PCD) [Tieleman, 2008] maintains a single persistent chain to improve mixing and sample quality. We use a sample replay buffer $\mathcal{B}$ in which we store past generated samples $\tilde{\mathbf{x}}$ and use either these samples or uniform noise to initialize the Langevin dynamics procedure. This has the benefit of continuing to refine past samples, increasing the effective number of sampling steps $K$ as well as sample diversity. In all our experiments, we sample from $\mathcal{B}$ 95% of the time and from uniform noise otherwise.
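A minimal sketch of such a replay buffer is shown below; the 95% reuse probability follows the text, while the class name, buffer capacity, and method interfaces are illustrative.

```python
import torch

class SampleReplayBuffer:
    """Store past Langevin samples and reuse them as chain initializations."""
    def __init__(self, capacity=10000):
        self.capacity = capacity
        self.samples = []

    def add(self, batch):
        self.samples.extend(batch.detach().cpu().unbind(0))
        self.samples = self.samples[-self.capacity:]  # drop the oldest samples

    def init_batch(self, batch_shape, reuse_prob=0.95):
        """Initialize each chain from the buffer with probability 0.95,
        and from uniform noise otherwise."""
        batch = torch.rand(batch_shape)  # uniform-noise fallback
        if self.samples:
            for i in range(batch_shape[0]):
                if torch.rand(()) < reuse_prob:
                    idx = torch.randint(len(self.samples), (1,)).item()
                    batch[i] = self.samples[idx]
        return batch
```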

22 3.1.2 Regularization

Arbitrary energy models can have sharp changes in gradients that make sampling with Langevin dynamics unstable. We found that constraining the Lipschitz constant of the energy network ameliorates these issues. To constrain the Lipschitz constant, we follow the method of [Miyato et al., 2018] and add spectral normalization to all layers of the model. Additionally, we found it useful to weakly L2 regularize the energy magnitudes of both positive and negative samples during training; otherwise, while the difference between positive and negative energies was preserved, the actual values would drift to numerically unstable magnitudes. Both forms of regularization also serve to ensure that the partition function is integrable over the domain of the input, with spectral normalization ensuring smoothness and the L2 coefficient bounding the magnitude of the unnormalized distribution.
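For concreteness, the following sketch shows one way to implement both regularizers in PyTorch; the architecture is a stand-in (assuming 32x32 RGB inputs), while `spectral_norm` is PyTorch's packaged implementation of [Miyato et al., 2018].

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

# Spectral normalization on every layer constrains the Lipschitz constant.
energy_net = nn.Sequential(
    spectral_norm(nn.Conv2d(3, 64, 3, padding=1)), nn.SiLU(),
    spectral_norm(nn.Conv2d(64, 64, 3, padding=1)), nn.SiLU(),
    nn.Flatten(),
    spectral_norm(nn.Linear(64 * 32 * 32, 1)),  # scalar energy output
)

def training_loss(e_pos, e_neg, alpha=1.0):
    """Contrastive loss plus weak L2 regularization of energy magnitudes."""
    l_ml = e_pos.mean() - e_neg.mean()           # maximum-likelihood term
    l_reg = (e_pos ** 2 + e_neg ** 2).mean()     # keeps energies bounded
    return l_ml + alpha * l_reg
```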

We present the algorithm below, where $\Omega(\cdot)$ indicates the stop-gradient operator.

Algorithm 1 Energy training algorithm
Input: data distribution $p_D(\mathbf{x})$, step size $\lambda$, number of steps $K$
  $\mathcal{B} \leftarrow \emptyset$
  while not converged do
    $\mathbf{x}_i^+ \sim p_D$
    $\mathbf{x}_i^0 \sim \mathcal{B}$ with 95% probability and $\mathcal{U}$ otherwise
    ◁ Generate sample from $q_\theta$ via Langevin dynamics:
    for sample step $k = 1$ to $K$ do
      $\tilde{\mathbf{x}}^k \leftarrow \tilde{\mathbf{x}}^{k-1} - \frac{\lambda}{2}\nabla_{\mathbf{x}} E_\theta(\tilde{\mathbf{x}}^{k-1}) + \omega, \quad \omega \sim \mathcal{N}(0, \sigma)$
    end for
    $\mathbf{x}_i^- = \Omega(\tilde{\mathbf{x}}_i^K)$
    ◁ Optimize objective $\alpha \mathcal{L}_2 + \mathcal{L}_{\mathrm{ML}}$ wrt $\theta$:
    $\Delta\theta \leftarrow \nabla_\theta \frac{1}{N} \sum_i \alpha\big(E_\theta(\mathbf{x}_i^+)^2 + E_\theta(\mathbf{x}_i^-)^2\big) + E_\theta(\mathbf{x}_i^+) - E_\theta(\mathbf{x}_i^-)$
    Update $\theta$ based on $\Delta\theta$ using the Adam optimizer
    $\mathcal{B} \leftarrow \mathcal{B} \cup \tilde{\mathbf{x}}_i$
  end while
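A minimal PyTorch rendering of one step of Algorithm 1 might look as follows, reusing the `langevin_sample` and `SampleReplayBuffer` sketches above; the hyperparameters are placeholders.

```python
import torch

def train_step(energy_net, optimizer, buffer, x_pos, alpha=1.0, k_steps=60):
    """One step of Algorithm 1: sample negatives, then optimize alpha*L2 + LML."""
    # Initialize chains from the buffer (95%) or uniform noise (5%).
    x_init = buffer.init_batch(x_pos.shape)
    # Langevin sampling; detaching implements the stop-gradient operator.
    x_neg = langevin_sample(energy_net, x_init, num_steps=k_steps).detach()

    e_pos, e_neg = energy_net(x_pos), energy_net(x_neg)
    loss = (e_pos.mean() - e_neg.mean()
            + alpha * (e_pos ** 2 + e_neg ** 2).mean())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # the thesis uses Adam

    buffer.add(x_neg)
    return loss.item()
```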

Figure 3-1: Table of Inception and FID scores for ImageNet 32x32 and CIFAR-10. Quantitative numbers for ImageNet 32x32 from [Ostrovski et al., 2018]. (*) We use Inception Score (from the original OpenAI repo) to compare with legacy models, but strongly encourage future work to compare solely with FID score, since Langevin dynamics converges to minima that artificially inflate Inception Score. (**) Conditional EBM models for 128x128 are smaller than those in SNGAN.

Model                               | Inception* | FID
CIFAR-10 Unconditional              |            |
PixelCNN [Van Oord et al., 2016]    | 4.60       | 65.93
PixelIQN [Ostrovski et al., 2018]   | 5.29       | 49.46
EBM (single)                        | 6.02       | 40.58
DCGAN [Radford et al., 2016]        | 6.40       | 37.11
WGAN + GP [Gulrajani et al., 2017]  | 6.50       | 36.4
EBM (10 historical ensemble)        | 6.78       | 38.2
SNGAN [Miyato et al., 2018]         | 8.22       | 21.7
CIFAR-10 Conditional                |            |
Improved GAN                        | 8.09       | -
EBM (single)                        | 8.30       | 37.9
Spectral Normalization GAN          | 8.59       | 25.5
ImageNet 32x32 Conditional          |            |
PixelCNN                            | 8.33       | 33.27
PixelIQN                            | 10.18      | 22.99
EBM (single)                        | 18.22      | 14.31
ImageNet 128x128 Conditional        |            |
ACGAN [Odena et al., 2017]          | 28.5       | -
EBM** (single)                      | 28.6       | 43.7
SNGAN                               | 36.8       | 27.62

Figure 3-2: EBM image restoration on images in the test set via MCMC (online optimization). Columns: salt and pepper (0.1), ground truth initialization, inpainting. The right column shows failures (approx. 10% of objects change with ground-truth initialization and 30% of objects change under salt-and-pepper corruption or inpainting; the bottom two rows show worst-case changes).

Figure 3-3: Comparison of image generation techniques on the unconditional CIFAR-10 dataset: (a) GLOW model, (b) EBM, (c) EBM (10 historical), (d) EBM sample buffer.

3.2 Image Generation

In this section, we show that online optimization with EBMs yields effective generative models for images. We show EBMs are able to generate high fidelity images and exhibit mode coverage on CIFAR-10 and ImageNet. We further show EBMs exhibit adversarial robustness and better out-of-distribution behavior than other likelihood models. Our model is based on the ResNet architecture (using conditional gains and biases per class [Dumoulin et al.] for conditional models).

We quantitatively evaluate the image quality of EBMs with Inception score [Salimans et al., 2016] and FID score [Heusel et al., 2017] in Figure 3-1. Overall we obtain significantly better scores than the likelihood models PixelCNN and PixelIQN, but worse than SNGAN [Miyato et al., 2018]. We found that in the unconditional case, mode exploration with Langevin dynamics took a very long time, so in EBM (10 historical ensemble) we also experimented with sampling jointly from the last 10 snapshots of the model. At training time, extensive exploration is ensured with the replay buffer (Figure 3-3d). Our models have a similar number of parameters to SNGAN, but we believe that significantly more parameters may be necessary to generate high fidelity images with mode coverage. On ImageNet 128x128, due to computational constraints, we train a smaller network than SNGAN and do not train to convergence.

Figure 3-4: Illustration of cross-class implicit sampling on a conditional EBM. The EBM is conditioned on a particular class but is initialized with an image from a separate class.

Figure 3-5: Illustration of image completions on the conditional ImageNet model (columns: original, corruption, completions; rows: train and test images). Our models exhibit diversity in inpainting.

3.2.1 Mode Evaluation

We evaluate over-fitting and mode coverage in EBMs. To test mode coverage, we investigate MCMC sampling on corrupted CIFAR-10 test images. Since Langevin dynamics is known to mix slowly [Neal, 2011] and reach local minima, we believe that good denoising after a limited number of sampling steps indicates probability modes at the respective test images. Similarly, lack of movement from a ground-truth test image initialization after the same number of steps likely indicates a probability mode at the test image. In Figure 3-2, we find that if we initialize sampling with images from the test set, images do not move significantly. However, under the same number of steps, Figure 3-2 shows that we are able to reliably decorrupt masked and salt-and-pepper corrupted images, indicating good mode coverage. We note that a large number of sampling steps leads to more saturated images, which are due to sampling low temperature modes; such saturation occurs across likelihood models (see appendix). In comparison, GANs have been shown to miss many modes of data and cannot reliably reconstruct many different test images [Yeh et al.]. We note that such decorruption behavior is a nice property of implicit generation, without need of explicit knowledge of the corrupted pixels.

Another common test for mode coverage and overfitting is masked inpainting [Van Oord et al., 2016]. In Figure 3-5, we mask out the bottom half of ImageNet images and test the ability to sample the masked pixels while fixing the values of the unmasked pixels. Running Langevin dynamics on the images, we find diversity of completions on train/test images, indicating low overfitting on the training set and the diversity characteristic of likelihood models. Furthermore, by initializing sampling of a class conditional EBM with images from another class, we can test for the presence of probability modes at images far away from those seen in training. We find in Figure 3-4 that sampling on such images with an EBM is able to generate images of the target class, indicating semantically meaningful modes of probability even far away from the training distribution.
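Masked inpainting requires only a small change to the Langevin sampler: gradient updates are applied everywhere, but the known pixels are reset after every step. A sketch under these assumptions (the mask convention is ours):

```python
import torch

def inpaint(energy_fn, x_corrupt, mask, num_steps=60, step_size=10.0):
    """Sample masked pixels with Langevin dynamics while clamping
    unmasked pixels to their observed values (mask == 1 means unknown)."""
    x = x_corrupt.clone().detach()
    for _ in range(num_steps):
        x.requires_grad_(True)
        grad, = torch.autograd.grad(energy_fn(x).sum(), x)
        noise = torch.randn_like(x) * step_size ** 0.5
        x = (x - 0.5 * step_size * grad + noise).detach()
        # Re-impose the known pixels after every update.
        x = mask * x + (1 - mask) * x_corrupt
    return x
```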

Figure 3-6: $\epsilon$ plots under $L_\infty$ and $L_2$ attacks of conditional EBMs as compared to PGD-trained models in [Madry et al., 2017] and a baseline Wide ResNet18. (a) $L_\infty$ robustness; (b) $L_2$ robustness.

3.3 Generalization

In this section, we show that EBMs trained with online optimization are robust in both the adversarial and out-of-distribution settings.

3.3.1 Adversarial Robustness

We show conditional EBMs exhibit adversarial robustness on CIFAR-10 classification, without explicit adversarial training. To compute logits for classification, we compute the negative energy of the image under each class. Our model, without fine-tuning, achieves an accuracy of 49.6%. Figure 3-6 shows adversarial robustness curves. We ran 20 steps of PGD as in [Madry et al., 2017] on the above logits. To undergo classification, we then ran 10 steps of sampling initialized from the starting image (with a bounded deviation of 0.03) from each conditional model, and then classified using the lowest energy conditional class. We found that running PGD incorporating sampling was less successful than without. Overall we find in Figure 3-6 that EBMs are very robust to adversarial perturbations and outperform the SOTA $L_\infty$ model in [Madry et al., 2017] on $L_\infty$ attacks with $\epsilon > 13$.
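The classification procedure described above can be sketched as follows; the conditional energy signature and the hyperparameters are illustrative assumptions rather than the exact protocol:

```python
import torch

def classify(energy_fn, x, num_classes=10, refine_steps=10,
             step_size=10.0, max_dev=0.03):
    """Classify x as argmin_y E(x, y), refining x per class with Langevin."""
    x = x.detach()
    energies = []
    for y in range(num_classes):
        x_y = x.clone()
        for _ in range(refine_steps):  # short per-class Langevin refinement
            x_y.requires_grad_(True)
            grad, = torch.autograd.grad(energy_fn(x_y, y).sum(), x_y)
            x_y = (x_y - 0.5 * step_size * grad
                   + torch.randn_like(x_y) * step_size ** 0.5).detach()
            # Stay within a bounded deviation of the original image.
            x_y = x + (x_y - x).clamp(-max_dev, max_dev)
        energies.append(energy_fn(x_y, y))
    # Logits are negative energies; predict the lowest-energy class.
    return torch.stack(energies, dim=-1).argmin(dim=-1)
```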

Table 3.1: AUROC scores of out-of-distribution classification on different datasets. Only our model achieves better-than-chance classification.

Dataset                | PixelCNN++ | Glow | EBM (ours)
SVHN                   | 0.32       | 0.24 | 0.63
Textures               | 0.33       | 0.27 | 0.48
Constant Uniform       | 0.0        | 0.0  | 0.30
Uniform                | 1.0        | 1.0  | 1.0
CIFAR-10 Interpolation | 0.71       | 0.59 | 0.70
Average                | 0.47       | 0.42 | 0.62

Figure 3-7: Histograms of relative likelihoods for various datasets for Glow, PixelCNN++ and EBM models. Panels: SVHN/CIFAR-10 test on Glow, on PixelCNN++, and on EBM; CIFAR-10 train/test on EBM.

3.3.2 Out-of-Distribution Generalization

We show EBMs exhibit better out-of-distribution (OOD) detection than other likelihood models. Such a task requires models to assign high likelihood on the data manifold and low likelihood at all other locations, and can be viewed as a proxy for the quality of the log likelihood. Surprisingly, Nalisnick et al. [2019] found that likelihood models such as VAEs, PixelCNN, and Glow are unable to distinguish OOD data, assigning higher likelihood to many OOD images than to in-distribution ones. We construct our OOD metric following [Hendrycks and Gimpel, 2016], using Area Under the ROC Curve (AUROC) scores computed by classifying CIFAR-10 test images against other OOD images using relative log likelihoods. We use SVHN, Textures [Cimpoi et al., 2014], monochrome images, uniform noise, and interpolations of separate CIFAR-10 images as OOD distributions. We found that this OOD metric correlates well with training progress in EBMs.
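Concretely, the AUROC metric can be computed from per-image likelihood scores (for an EBM, negative energies serve as unnormalized log-likelihoods); a minimal sketch using scikit-learn, with placeholder score arrays:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(scores_in, scores_out):
    """AUROC for separating in-distribution (CIFAR-10 test) from OOD images,
    where a higher score means more in-distribution (e.g., -E(x) for an EBM)."""
    labels = np.concatenate([np.ones_like(scores_in),     # in-distribution
                             np.zeros_like(scores_out)])  # OOD
    scores = np.concatenate([scores_in, scores_out])
    return roc_auc_score(labels, scores)

# e.g. ood_auroc(-energies_cifar_test, -energies_svhn)
```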

In Table 3.1, unconditional EBMs perform significantly better out-of-distribution than other auto-regressive and flow generative models, with an average OOD score of 0.62, while the closest, PixelCNN++, has an OOD score of 0.47. We provide histograms of relative likelihoods for SVHN in Figure 3-7, which are also discussed in [Nalisnick et al., 2019, Hendrycks et al., 2018]. We believe that the reason for better generalization is two-fold. First, we believe that the negative sampling procedure in EBMs helps eliminate spurious minima. Second, we believe EBMs have a flexible structure that allows global context when estimating probability without imposing constraints on latent variable structure. In contrast, auto-regressive models estimate likelihood sequentially, which makes global coherence difficult. In a different vein, flow based models must apply continuous transformations onto a continuous connected probability distribution, which makes it very difficult to model disconnected modes and thus assigns spurious density to connections between modes.

Table 3.2: Comparison of EBMs with various other continual learning benchmarks. Values averaged across 10 seeds, reported as mean (standard deviation).

Method                         | Accuracy
EWC [Kirkpatrick et al., 2017] | 19.80 (0.05)
SI [Zenke et al., 2017]        | 19.67 (0.09)
NAS [Schwarz et al., 2018]     | 19.52 (0.29)
LwF [Li and Snavely, 2018]     | 24.17 (0.33)
VAE                            | 40.04 (1.31)
EBM (ours)                     | 64.99 (4.27)

3.4 Online Learning

In this section, we show that EBMs trained through online optimization are further able to learn well online. We evaluate incremental class learning on the Split MNIST task proposed in [Farquhar and Gal, 2018]. The task evaluates overall MNIST digit classification accuracy given 5 sequential training tasks of disjoint pairs of digits. We train a conditional EBM with 2 layers of 400 hidden units and compare with a generative conditional VAE baseline, with both encoder and decoder having 2 layers of 400 hidden units. Additional training details are covered in the appendix. We train the generative models to represent the joint distribution of images and labels, and classify based on the lowest energy label. Hsu et al. [2018] analyzed common continual learning algorithms such as EWC [Kirkpatrick et al., 2017], SI [Zenke et al., 2017] and NAS [Schwarz et al., 2018] and found they obtain performance around 20%. LwF [Li and Snavely, 2018] performed best, with a performance of 24.17 ± 0.33, where all architectures use 2 layers of 400 hidden units. However, since each new task introduces two new MNIST digits, a test accuracy of around 20% indicates complete forgetting of previous tasks. In contrast, we found continual EBM training obtains a significantly higher performance of 64.99 ± 4.27 (Table 3.2). All experiments were run with 10 seeds.

A crucial difference is that negative training in EBMs only locally "forgets" information corresponding to negative samples. Thus, when new classes are seen, negative samples are conditioned on the new class, and the EBM only forgets unlikely data from the new class. In contrast, the cross entropy objective used to train common continual learning algorithms down-weights the likelihood of all classes not seen. We can apply this insight to other generative models by maximizing the likelihood of a class conditional model at train time and then using the highest likelihood class as the classification result. We ran such a baseline using a VAE and obtained a performance of 40.04 ± 1.31, which is higher than other continual learning algorithms but less than that of an EBM.

Chapter 4

Energy Based Models for Planning

In this chapter, we describe a framework that uses online optimization on energy models to generate trajectory-level plans to reach goals in robotics. We showcase how this enables effective online learning, inference of diverse plans, and active exploration.

4.1 Planning through Online Optimization

4.1.1 Energy-Based Models and Terminology

Consider a standard Markov Decision Process (MDP) represented by the tuple $\langle S, A, T, R \rangle$, where $S$ is the set of all possible state configurations, $A$ is the set of actions available to the agent, $T$ is the transition distribution, and $R$ is the reward function. Under this setup, define a state transition pair $(s_t, s_{t+1})$ between the states at timesteps $t$ and $t+1$. Define $E_\theta(s_t, s_{t+1}) \in \mathbb{R}$ as the energy function, which we parameterize with a deep neural network. We interpret the energy function as an unnormalized probability distribution over state transitions by defining the distribution as $p_\theta(s_t, s_{t+1}) \propto e^{-E_\theta(s_t, s_{t+1})}$.

To sample from the defined probability distribution, we use the Model Predictive Path Integral (MPPI) algorithm [Williams et al., 2017a], which is shown to converge to the full posterior distribution in [Williams et al., 2017b]. The mathematical formulation is shown below, where $x := (s_t, s_{t+1})$:

$$\tilde{x}^k = \sum_i w_i x_i^k, \qquad x_i^k \sim \mathcal{N}(\tilde{x}^{k-1}, \sigma), \qquad w_i = \frac{e^{-E_\theta(x_i^k)}}{\sum_j e^{-E_\theta(x_j^k)}} \tag{4.1}$$

Other valid inference algorithms that sample from the posterior are also applicable to EBMs, such as Langevin dynamics (previous chapter) or Hamiltonian Monte Carlo (HMC). We note that this form of sampling makes it easy to add additional constraints to a probability distribution by simply adding the constraint as an energy. We train models by minimizing

$$\mathbb{E}_{x^+ \sim p_D}\left[E_\theta(x^+)\right] - \mathbb{E}_{x^- \sim p_\theta}\left[E_\theta(x^-)\right]. \tag{4.2}$$

Intuitively, doing so decreases the energies of observed transitions (i.e. more likely transitions), and increases the energies of transitions sampled from the model's distribution (i.e. less likely transitions).
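A minimal PyTorch sketch of the MPPI update in Equation 4.1 is shown below; the sample count, noise scale, and iteration count are illustrative rather than the thesis's exact settings.

```python
import torch

def mppi_sample(energy_fn, x_init, num_samples=100, sigma=0.1, num_iters=10):
    """Iteratively refine x toward low energy via the MPPI update (Eq. 4.1)."""
    x = x_init.clone()
    for _ in range(num_iters):
        # Perturb the current estimate with Gaussian noise.
        candidates = x + sigma * torch.randn(num_samples, *x.shape)
        energies = torch.stack([energy_fn(c) for c in candidates])
        # Softmax weights favor low-energy candidates.
        weights = torch.softmax(-energies, dim=0)
        # The weighted average of candidates becomes the new estimate.
        x = (weights.view(-1, *[1] * x.dim()) * candidates).sum(dim=0)
    return x
```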

4.1.2 Planning with Energy-Based Models

In the previous subsection, we described a way to learn state transition models $p_\theta(s_t, s_{t+1})$. We now discuss how to use such models to do inference over trajectories. Given a learned model $p_\theta(s_t, s_{t+1})$, we can model the likelihood of a trajectory $\tau$ under the model as a product of factors

$$p_\theta(\tau) = p_\theta(s_1, s_2, \ldots, s_T) = \prod_{t=1}^{T-1} p_\theta(s_t, s_{t+1}) \tag{4.3}$$
$$\propto \exp\Big(-\sum_{t=1}^{T-1} E_\theta(s_t, s_{t+1})\Big) \tag{4.4}$$

We can likewise do inference across this product using MPPI. Note that we directly sample states rather than actions in our formulation. We generate temporally smooth trajectory perturbations following the approach of [Kalakrishnan et al., 2011] (where the last two rows of the finite difference matrix $A$ are removed to allow the end states of trajectories to be unconstrained).

Given a particular fixed goal state $g$ and start state $s_1$, we can do inference over the intermediate states $s_2, \ldots, s_T$ by sampling from the conditional distribution over state transitions

$$p_\theta(s_2, \ldots, s_T \mid s_1, g) \propto \exp\Big(-\sum_{t=1}^{T-1} E_\theta(s_t, s_{t+1}) - E_\theta(s_T, g)\Big) \tag{4.5}$$

to get a plan. Alternatively, instead of using a fixed goal state $g$, we can represent the goal state with a Gaussian distribution around it, $P(g)$, and similarly perform inference over

$$p_\theta(s_2, \ldots, s_T \mid s_1, g) \propto \exp\Big(-\sum_{t=1}^{T-1} E_\theta(s_t, s_{t+1}) - (s_T - g)^2\Big) \tag{4.6}$$

to get a plan. We found that inference over both distributions worked well using MPPI. Throughout our experiments, we follow Equation 4.6 and specify a Gaussian distribution around all goal states, to accommodate additional constraints more flexibly, and refer to the resulting state distribution as $p_\theta(\tau \mid s_1, G)$. Actions are not inferred in this sampling process, but can be found either using a ground truth inverse dynamics model or an online approximation.
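The goal-conditioned distribution of Equation 4.6 composes directly with the MPPI sketch above: a trajectory's energy is the sum of its transition energies plus a squared goal term. An illustrative helper (the transition-energy signature is an assumption):

```python
import torch

def trajectory_energy(energy_fn, states, s1, goal):
    """Negative log-probability (up to a constant) of a plan under Eq. 4.6.
    `states` holds s_2 ... s_T; `energy_fn` scores a single transition."""
    full = torch.cat([s1.unsqueeze(0), states], dim=0)  # s_1 ... s_T
    transition_energy = sum(energy_fn(full[t], full[t + 1])
                            for t in range(full.shape[0] - 1))
    goal_energy = ((full[-1] - goal) ** 2).sum()  # Gaussian goal term
    return transition_energy + goal_energy

# e.g. plan = mppi_sample(lambda s: trajectory_energy(E, s, s1, g), init_plan)
```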

We can also frame planning as finding the trajectory that has the highest conditional probability of achieving optimal reward, where the event of achieving optimal reward at time $t$ is denoted $O_t$ and the probability $P(O_t \mid s_t)$ is defined as $e^{R(s_t)}$ for a reward function $R(s)$. Inference can then be done on

$$p_\theta(\tau \mid O_{1:T}) \propto \exp\Big(R(s_1) - \sum_{t=1}^{T-1} \big(E_\theta(s_t, s_{t+1}) - R(s_{t+1})\big)\Big),$$

which Levine [2018] interprets as maximum entropy reinforcement learning on the model. However, while the form proposed in [Levine, 2018] considers maximum entropy over actions given a state, we consider maximum entropy of the next state given the current state.

4.1.3 Online Learning with Energy-Based Models

The previous sections described inference with EBMs pre-trained on generated data. We now turn to the question of how to generate training data in an on-going manner, simultaneously learning the EBM as the robot operates in the environment. We discuss online training methods for EBMs, i.e. how to effectively obtain data on state transitions and learn an energy function given an MDP environment represented as a tuple $\langle S, A, T, R \rangle$ and a goal $G$.

In this setup, we first generate a $T$-step trajectory $\tau_\theta$ from the model $p_\theta(\tau \mid s_1, G)$ and use an inverse dynamics model to compute the corresponding action $a_t$ at each time step. We then execute each action $a_t$ in the real environment to generate another $T$-step trajectory $\tau_{\text{real}}$, stopping prematurely if the real observations deviate significantly from the model predictions. After that, we train the EBM to increase the energy of each attempted transition in $\tau_\theta$ (imagined transitions) while decreasing the energy of real transitions in $\tau_{\text{real}}$ (real transitions). We note that perfect planning will have no effect on the model training, since then $\tau_\theta = \tau_{\text{real}}$. For stability, we maintain a replay buffer of past experiences and simulated trajectories. Figure 4-1 provides an overview of the process.

Intuitively, our training procedure allows the model to learn a good likelihood distribution over states that it has observed. However, since we terminate the model's planning after significant deviation between real observations and the enacted plan, the model is not trained to minimize the probability of transitions among faraway states. These transitions are thus free to vary over time throughout training, which eventually provides incentive for the model to explore the whole state space. In our experiments section, we illustrate this effect and show that EBMs lead to good exploration. For completeness, we include pseudo-code for online training of EBMs, where $\Omega(\cdot)$ denotes a collation operator that converts a trajectory to pairs of state transitions.

Algorithm 2 Online training of a trajectory level EBM
Input: goal state $G$, step size $\lambda$, number of steps $K$, number of plan steps $T$, inverse dynamics model $ID$, replay buffers $\mathcal{B}_{pos}$, $\mathcal{B}_{neg}$
  $\mathcal{B}_{pos} \leftarrow \emptyset$; $\mathcal{B}_{neg} \leftarrow \emptyset$
  for environment timestep $i$ do
    ◁ Initialize trajectory as a smooth trajectory at the start state:
    $\tilde{\tau}^0 = s_0$
    ◁ Generate sample from $p_\theta(\tau \mid s_i, G)$ via MPPI:
    for sample step $k = 1$ to $K$ do
      $\tilde{\tau}^k = \sum_j w_j \tilde{\tau}_j^k, \quad \tilde{\tau}_j^k \sim \mathcal{N}(\tilde{\tau}^{k-1}, \Sigma), \quad w_j = \frac{\exp\!\big(-\big(\sum_{i=0}^{T-1} E_\theta(s_i^k, s_{i+1}^k)\big) - (s_T^k - G)^2\big)}{\sum_j \exp\!\big(-\big(\sum_{i=0}^{T-1} E_\theta(s_i^k, s_{i+1}^k)\big) - (s_T^k - G)^2\big)}$
    end for
    $a \leftarrow ID(\tilde{\tau}^K)$
    $\tau^+ \sim$ simulate environment with actions $a$
    $\mathbf{x}^+ = \Omega(\tau^+) \cup \text{sample}(\mathcal{B}_{pos})$
    $\mathbf{x}^- = \Omega(\tilde{\tau}^K) \cup \text{sample}(\mathcal{B}_{neg})$
    ◁ Optimize objective $\mathcal{L}_2 + \mathcal{L}_{\mathrm{ML}}$ wrt $\theta$:
    $\Delta\theta \leftarrow \nabla_\theta \frac{1}{N} \sum_i E_\theta(\mathbf{x}_i^+)^2 + E_\theta(\mathbf{x}_i^-)^2 + E_\theta(\mathbf{x}_i^+) - E_\theta(\mathbf{x}_i^-)$
    Update $\theta$ based on $\Delta\theta$ using the Adam optimizer
    $\mathcal{B}_{pos} \leftarrow \mathcal{B}_{pos} \cup \mathbf{x}^+$; $\mathcal{B}_{neg} \leftarrow \mathcal{B}_{neg} \cup \mathbf{x}^-$
  end for
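The "execute until deviation" step of Algorithm 2 can be made concrete with a short helper; the environment interface and the deviation threshold below are illustrative assumptions:

```python
import torch

def execute_plan(env, plan_states, actions, max_dev=0.5):
    """Execute a plan, stopping once reality deviates from the plan.
    Returns real transitions and the planned transitions kept for training."""
    real_states = [env.state()]
    for t, a in enumerate(actions):
        s_next = env.step(a)
        real_states.append(s_next)
        # Stop early when the planned state no longer matches reality.
        if torch.norm(s_next - plan_states[t + 1]) > max_dev:
            break
    kept = t + 1  # planned transitions before the deviation (green in Fig. 4-1)
    real_pairs = list(zip(real_states[:-1], real_states[1:]))
    plan_pairs = list(zip(plan_states[:kept], plan_states[1:kept + 1]))
    return real_pairs, plan_pairs
```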

Figure 4-1: Overview of the online training procedure with an EBM, where grey areas represent inadmissible regions. Plans from the current observation to the goal state are inferred from the EBM (left). A particular plan is chosen and executed until a planned state deviates significantly from an actual state (middle). The EBM is then trained on all real transitions $\tau_{\text{real}}$ and all planned transitions before the deviation (in green) $\tau_\theta$, while transitions afterwards (red) are ignored. A new plan is then generated from the new location of the agent (right).

4.1.4 Related Work

A number of model classes have been explored for model-based planning in the robotics and artificial intelligence literature, from feed-forward, recurrent [Nagabandi et al., 2018], temporal segment [Mishra et al., 2017] and Bayesian [Gal et al.] neural networks, to locally-linear models [Yip and Camarillo, 2014, Mordatch et al., 2016] and Gaussian processes [Deisenroth and Rasmussen, 2011]. By contrast, energy-based models have been under-explored for model-based planning, and it is our aim to showcase the favorable properties this model class exhibits. While the work of [Haarnoja et al., 2017] explored energy-based models of policies for model-free reinforcement learning, we instead use them to model environment dynamics in a model-based setting.

Energy-based models have seen success in other applications, such as natural image modeling [Du and Mordatch, 2019, Dai et al., 2019]. These works noted the favorable compositionality and online learning properties of EBMs, which we take advantage of in this work. A number of methods have been used for inference and sampling in EBMs (or equivalently, planning), from Gibbs sampling [Hinton et al., 2006b], to Langevin dynamics [Du and Mordatch, 2019], and learned samplers [Kim and Bengio, 2016]. We have not focused on the choice of sampler/planner in this work, and found the method of [Williams et al., 2017a] to work well. Other planning methods, such as those based on direct collocation [Mordatch et al., 2012, Erez and Todorov, 2012], can potentially be used instead.

A common approach to achieving exploration behavior in reinforcement learning has been to use explicit rewards, known as intrinsic motivation [Oudeyer and Kaplan, 2009, Schmidhuber, 2010]. Examples include rewarding empowerment, information gain about the model of the dynamics [Pathak et al., 2017], or state space coverage [Houthooft et al., 2016, Tang et al., 2017]. Maximum entropy models are another approach to induce exploratory behavior [Haarnoja et al., 2018], and are what we rely on in this work as well. We show that contrastive training of EBMs is particularly conducive to exploration.


Figure 4-2: Illustrations of the 4 evaluated environments (Particle, Maze, Reacher, and Sawyer Arm). Both the particle and maze environments have 2 degrees of freedom for x, y movement. The Reacher environment has 2 degrees of freedom corresponding to torques on two motors. The Sawyer Arm environment has 7 degrees of freedom corresponding to torques.

4.2 Experiments

4.2.1 Setup

We perform experiments on four different environments listed below, with corresponding visualizations in Figure 4-2:

1. Particle: An environment in which a particle is spawned at a start position and must navigate to a goal position; each position is represented by an (x, y) tuple. The observation is the current position of the particle, and there are two degrees of freedom that correspond to x-displacement and y-displacement. The reward at each timestep corresponds to the negative distance from the current position to the goal position. Agents are able to move within a 0.05 uniform ball around their current location, with the size of the map being 2 by 2.

2. Maze: Same setup as the particle environment, but certain areas contain walls that prevent movement of the particle.

3. Reacher: The Reacher environment in [Brockman et al., 2016]. The system has two degrees of freedom for the angles of the joints. The observation is the current rotations and angular velocities of the joints. The reward at each timestep corresponds to the negative distance from the current joint rotations to the target joint rotations.

4. Sawyer Arm: A simulation of the Sawyer Arm in MuJoCo [Todorov et al., 2012]. The system is second order and contains 7 degrees of freedom. The observation is the positions and velocities of each of the joints, as well as the current end-effector finger position. The reward at each timestep is the negative distance between the current and target end-effector finger positions. The target end-effector position is either fixed or randomized.

For each task, we compare our model's performance with a learned deterministic feedforward network (Action FF) that predicts the next state from the current state and action (with the same architecture as the EBM). We generate plans by sampling over states using MPPI, with scores calculated from the L2 distance between the final and goal states. On the Sawyer Arm task, we further compare our performance with a model-free baseline, PPO [Schulman et al., 2017], using the implementation provided in [Dhariwal et al., 2017]. We investigate differences in performance of models trained using two different methods, where the sources and availability of data are varied. In the case where data is available in advance, models are trained on 100,000 action-state transitions pre-generated from random sampling in each environment. In the case where only online data is available, models are trained on samples generated by interacting with the environment from the start state; the training algorithm is outlined in Algorithm 2, with replay buffers used for both models.

4.2.2 Online Model Learning

Figure 4-4 shows the performance of an EBM compared to Action FF on the Particle, Maze and Reacher tasks. First, we compare both methods given a large pre-generated dataset of random interactions; we find that the EBM performs slightly better than Action FF. However, when we compare both methods in the online setting, we find that the EBM performs significantly better than Action FF. For example, an EBM only experiences a drop of 15.24 in score when switched to the online setting, compared to the score drop of 844.56 experienced by an Action FF model in the online setting.

Figure 4-4: Performance on Particle, Maze and Reacher environments where dynamics models are either pretrained on random transitions or learned via online interaction with the environment. Action FF: Action Feed-Forward Network.

Data       | Model     | Particle | Maze    | Reacher
Pretrained | EBM       | -5.14    | -72.07  | -19.38
Pretrained | Action FF | -6.11    | -65.06  | -25.54
Online     | EBM       | -20.38   | -162.97 | -29.87
Online     | Action FF | -850.67  | -949.99 | -42.37

Figure 4-3: Navigation path with a central obstacle the model was not trained with.

Figure 4-5: Qualitative image showing an EBM successfully navigating the finger end effector to the goal position.

Table 4.1: Comparison of performance on the Sawyer Arm environment between Action FF and EBM. In the pretraining setting, we compare models trained using random transitions, directed transitions from an EBM, and directed transitions with correlated data. In the online setting, models are trained on 50,000 simulations of the environment. We find that EBMs perform well online.

Model     | Pretrained (random) | Pretrained (directed) | Pretrained (directed + sequential) | Online (Fixed) | Online (Variable)
EBM       | -9569               | -4438                 | -5114                              | -3782          | -3907
Action FF | -10326              | -5041                 | -12838                             | -9360          | -11942

Table 4.1 shows the performance of an EBM compared to Action FF in the Sawyer Arm scenario. We find that in this setting, using a large pre-generated dataset of random interactions led to insufficient state coverage. To mitigate this, we construct a directed dataset of 100,000 frames from an EBM trained on the Sawyer Arm task. With directed data, we find that a pretrained EBM performs slightly better than Action FF, obtaining scores of -4438 and -5041 respectively. In the online training scenario (with either fixed or varied goals), however, we find that the EBM performs significantly better (with a score of -3782) than Action FF (with a score of -9360). We show images of execution in Figure 4-5.

On this task, the model-free algorithm PPO obtains a performance of -9300 with the same amount of experience, and requires 250,000 to 500,000 frames (5 to 10 times more than used in online training of the models) to achieve performance comparable to online training of an EBM. With 50 to 100 times more experience, PPO is able to obtain a better score of -1000 (note that since PPO does not have the acceleration priors we do, it is allowed to reach the goal faster, thus producing a higher reward; our method moves slower, but both methods successfully reach the goal). Furthermore, PPO is not able to operate in an online manner and does not exhibit the zero-shot generalization of our model, both of which are important in real-world robot learning regimes.

Table 4.1 also considers another scenario in which goals are varied across the table. In this setting we find that EBMs still perform better, with a score of -4547, while Action FF obtains a score of -11942. Furthermore, we can apply a model trained on a fixed goal, generalize to variable goals, and still obtain a score of -4547.

We also ablate dependence on the ground truth inverse dynamics. Using recursive least squares [Mordatch et al., 2016] to infer inverse dynamics also leads to a good performance of -4694. We find that our learned state distribution is not significantly impacted by inverse dynamics inference, as long as action inference does not suffer from mode collapse (which can occur with neural network based approaches).

To ablate the effect of exploration and the ability of EBMs to learn models online, in Table 4.1 we train both Action FF and EBM models on the directed dataset, but with batches sampled sequentially from the dataset, with each datapoint repeated 100 times without shuffling to mimic the correlated experiences seen during online training. Under this setting, the performance of the EBM drops slightly, by 676, but the performance of Action FF drops catastrophically, by 7797, and the model fails to train. When training on the static dataset, we do not use a replay buffer of past transitions.

Our results indicate that EBMs are able to learn well online, which is an important and necessary characteristic for models to learn in the real world.

Figure 4-6: Effects of varying the number of planning steps to reach a goal state. As the number of planning steps increases, there is a larger envelope of explored states.

Figure 4-7: Illustrations of two different planned trajectories (Trajectory 1 and Trajectory 2) from start to goal in the Reacher environment.

4.2.3 Maximum Entropy Inference

While maximum entropy reinforcement learning has focused on maximizing the entropy of actions given a state, sampling from an EBM corresponds to directly maximizing entropy over the next state. In Figure 4-6, we find EBM sampling is capable of generating diverse plans that go from a given start state to a goal state. In Figure 4-6, we show that given a fixed start state and end goal, an increased number of planned steps leads to a larger envelope of possible trajectories. The same diagram also shows that our method is able to sample a wide range of trajectories that are different from each other. In Figure 4-7, we show that in the Reacher environment, we are able to make valid plans with both clockwise and counter-clockwise motions given a start and goal state. We illustrate the power of diverse plans by comparing the generalization performance between planning conditioned on only state space (EBM planning) and planning conditioned on both state and action space (action-conditional planning).

In the particle environment, at test time we add a large obstacle not seen during training as the particle attempts to navigate from start state to goal state, as shown in Figure 4-3; an EBM generalizes better, obtaining a reward of -61.94, while Action FF obtains a reward of -81.24.

4.2.4 Exploration


Figure 4-8: Illustration of energy values of states (computed by taking the energy of a transition centered at the location) and corresponding visitation maps. While an EBM learns a probabilistic model of transitions in areas already explored, energies of unexplored regions fluctuate throughout training, leading to a natural exploration incentive. Early on in training (left), the EBM puts low energy on the upper corner, incentivizing agent exploration towards the top corner. Later in training (right), the EBM puts low energy on the lower right corner, incentivizing agent exploration towards the bottom corner.

We show that EBMs naturally incentivize exploration. In Figure 4-9 we compare the exploration behavior of a goal-free EBM and a random action agent in the Maze environment. In the time it takes a random policy to explore a hallway of the maze, an EBM is able to explore the entirety of the maze. Similarly, we consider 3D occupancy of the finger end-effector in the Sawyer arm; we define 3D occupancy by partitioning space into 3D voxels and measuring the number of voxels that the finger ends up in. We empirically found the maximum system occupancy to be 116 voxels. We find in Figure 4-10 that EBMs reach maximum system occupancy significantly faster than random exploration across 4 different seeds. Without a goal, an EBM is able to navigate the arm freely, while a random policy struggles and takes over 100 times more environment transitions (i.e. over 200,000) to reach maximum occupancy.

Figure 4-9: Illustration of exploration in a maze under random actions (left) as opposed to following an EBM (middle). Areas in blue in the maze environment (right) are admissible, while areas in white are not.
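A minimal sketch of the 3D occupancy metric above, assuming an axis-aligned workspace bounding box and a uniform voxel grid (the resolution here is a placeholder; the partition used in the thesis yields a maximum occupancy of 116 voxels):

    import numpy as np

    def voxel_occupancy(positions, low, high, voxels_per_axis=10):
        # positions: (T, 3) array of finger end-effector positions.
        scaled = (positions - low) / (high - low)       # normalize into [0, 1)
        scaled = np.clip(scaled, 0.0, 1.0 - 1e-9)
        idx = np.floor(scaled * voxels_per_axis).astype(int)
        return len({tuple(i) for i in idx})             # distinct voxels visited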

We reason that the exploration behavior in EBMs comes from the fact that they learn local dynamics of the world only in the regions that have been explored. This allows the EBMs to assign arbitrary energies to transitions among unexplored states. Values of these energies vary over the course of training, and lead the EBMs to generate plans to reach different unseen states until more of the environment is explored. We illustrate this result in Figure 4-8, where we show that an EBM puts low energy on a swath of states that are unexplored but reachable, at two different stages of training, incentivizing exploration of those states while maintaining correct energies for states that have already been explored.

Figure 4-10: Comparison of 3D spatial occupancy of the finger end-effector of the Sawyer arm, using random exploration versus using an EBM without a goal, shown on a log scale across 4 different seeds. An EBM allows more directed exploration and explores more states. For the random policy to reach maximum occupancy, more than 200,000 transitions are required.

EBMs learn local dynamics models since they are trained on real data transitions and transitions from planning; a plan is followed until it deviates significantly from real transitions. Thus both sets of transitions consist of states that are locally close to states the EBM has already learned. In contrast, traditional likelihood models of trajectories lower the likelihood of all unseen trajectories, including at unseen

states; as a result, planning with such models is unable to explore as effectively.

Chapter 5

Energy Models for Protein Structure

In this chapter, we present a framework for learning energy models for protein structures. We show that optimization on the learned energy function can recover protein structure on the Rotamer Recovery benchmark to a degree comparable to the classical Rosetta energy function. We first provide background on proteins.

5.1 Background

Proteins are linear polymers composed of an alphabet of twenty canonical amino acids (residues), each of which shares a common backbone moiety responsible for formation of the linear polymeric backbone chain, and a differing side chain moiety with biochemical properties that vary from amino acid to amino acid. The energetic interplay of tight packing of side chains within the core of the protein and exposure of polar residues at the surface drives folding of proteins into stable molecular conformations [Richardson and Richardson, 1989, Dill, 1990]. The conformation of a protein can be described through two interchangeable coordinate systems. Each atom has a set of spatial coordinates, which up to an arbitrary rotation and translation of all coordinates describes a unique conformation. In the internal coordinate system, the conformation is described by a sequence of rigid- body motions from each atom to the next, structured as a kinematic tree. The major degrees of freedom in protein conformation are the dihedral rotations [Richardson and

Richardson, 1989]: the rotations about the backbone bonds, termed phi (휑) and psi (휓) angles, and the dihedral rotations about the side chain bonds, termed chi (휒) angles.

Figure 5-1: Overview of the model. The model takes as input a set of atoms, 퐴, consisting of the rotamer to be predicted and the surrounding context atoms. The Cartesian coordinates and categorical attributes of each atom are embedded. The set of embeddings is processed by Transformer blocks, and the final hidden representations are pooled over the atoms to produce a vector. The vector is passed through a two-layer MLP to output a scalar energy value, 푓휃(퐴).

Within folded proteins, the side chains of amino acids preferentially adopt configu- rations that are determined by their molecular structure. A relatively small number of configurations separated by high energetic barriers are accessible to each side chain [Janin et al., 1978]. These configurations are called rotamers. In Rosetta and other protein design methods, rotamers are commonly represented by libraries that estimate a probability distribution over side chain configurations, conditioned on the backbone 휑 and 휓 torsion angles. We use the Dunbrack library [Shapovalov and Dunbrack Jr, 2011] for rotamer configurations.

5.2 Method

Our energy model computes a scalar energy, 푓휃(퐴), for size-푘 subsets, 퐴, of atoms within a protein.

Selection of atom subsets In our experiments, we choose 퐴 to be nearest-neighbor sets around the residues of the protein and set 푘 = 64. For a given residue, we construct 퐴 to be the 푘 atoms that are nearest to the residue's beta carbon.

Atom input representations Each atom in 퐴 is described by its 3D Cartesian coordinates and categorical features: (i) the identity of the atom (N, C, O, S); (ii) an ordinal label of the atom in the side chain (i.e. which specific carbon, nitrogen, etc. atom it is in the side chain); and (iii) the amino acid type (which of the 20 types of amino acids the atom belongs to). The coordinates are normalized to have zero mean across the 푘 atoms. Each categorical feature is embedded into 28 dimensions, and the spatial coordinates are projected into 172 dimensions; these are then concatenated into a 256-dimensional atom representation. The parameters for the input embeddings and projections of spatial information are learned via training. During training, a random rotation is applied to the coordinates in order to encourage rotational invariance of the model. For visualizations, a fixed number (100) of random rotations is applied and the results are averaged.
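The following sketch illustrates this input representation in PyTorch; the module names and the placeholder vocabulary size for ordinal labels are assumptions. The dimensions follow the text: three 28-dimensional categorical embeddings plus a 172-dimensional coordinate projection concatenate to a 256-dimensional atom representation (a random rotation of the coordinates would be applied before this module during training):

    import torch
    import torch.nn as nn

    class AtomEmbedding(nn.Module):
        # Sketch of the atom input representation described above.
        def __init__(self, n_elements=4, n_labels=40, n_amino_acids=20):
            super().__init__()
            self.coord_proj = nn.Linear(3, 172)               # coords -> 172d
            self.element_emb = nn.Embedding(n_elements, 28)   # N, C, O, S
            self.label_emb = nn.Embedding(n_labels, 28)       # ordinal side chain label
            self.amino_emb = nn.Embedding(n_amino_acids, 28)  # 20 amino acid types

        def forward(self, coords, element, label, amino):
            # coords: (B, k, 3); categorical inputs: (B, k) integer tensors.
            coords = coords - coords.mean(dim=1, keepdim=True)  # zero mean over k atoms
            feats = [self.coord_proj(coords), self.element_emb(element),
                     self.label_emb(label), self.amino_emb(amino)]
            return torch.cat(feats, dim=-1)   # (B, k, 172 + 3*28) = (B, k, 256)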

Architecture In our proposed approach, 푓휃(퐴) takes the form of a Transformer model [Vaswani et al., 2017] that processes a set of atom representations. The self-attention layers allow each atom to attend to the representations of other atoms in the set, modeling the energy of the molecular configuration as a non-linear combination of single, pairwise, and higher-order interactions between the atoms. The final hidden representations of the Transformer are pooled across the atoms to produce a single vector, which is finally passed to a two-layer multilayer perceptron (MLP) that produces the scalar output of the model. Figure 5-1 illustrates the model.

For all experiments, we use a 6-layer Transformer with embedding dimension of 256 (split over 8 attention heads) and feed-forward dimension of 1024. The final MLP contains 256 hidden units. The models are trained without dropout. Layer normalization [Ba et al., 2016] is applied before the attention blocks.
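A sketch of this architecture using PyTorch's built-in Transformer encoder; the hyperparameters come from the text, while the pooling (max over atoms, as indicated in Figure 5-1) and other implementation details are assumptions:

    import torch
    import torch.nn as nn

    class AtomTransformerEnergy(nn.Module):
        # Sketch: 6 pre-layer-norm Transformer layers, 256d embeddings,
        # 8 heads, 1024 feed-forward units, no dropout.
        def __init__(self, dim=256, heads=8, ff=1024, layers=6):
            super().__init__()
            block = nn.TransformerEncoderLayer(
                d_model=dim, nhead=heads, dim_feedforward=ff,
                dropout=0.0, norm_first=True, batch_first=True)
            self.encoder = nn.TransformerEncoder(block, num_layers=layers)
            self.head = nn.Sequential(nn.Linear(dim, 256), nn.ReLU(),
                                      nn.Linear(256, 1))  # two-layer MLP -> scalar

        def forward(self, atom_feats):            # (B, k, 256) atom embeddings
            h = self.encoder(atom_feats)          # self-attention over the atom set
            pooled = h.max(dim=1).values          # pool hidden states over atoms
            return self.head(pooled).squeeze(-1)  # one energy value per atom set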

5.2.1 Parameterization of protein conformations

The structure of a protein can be represented by two parameterizations: (1) absolute Cartesian coordinates of the set of atoms, and (2) internal coordinates of the atoms encoded as a set of in-plane/out-of-plane rotations and displacements relative to each atom’s reference frame. Out-of-plane rotations are parameterized by 휒 angles which are the primary degrees of freedom in the rotamer configurations. The coordinate systems are interchangeable.

5.2.2 Usage as an energy function

We specify our energy function 퐸휃(푥, 푐) to take an input set composed of two parts: (1) the atoms belonging to a rotamer to be predicted, 푥, and (2) the atoms of the surrounding molecular context, 푐. The energy function is defined as follows:

$$E_\theta(x, c) = f_\theta(A(x, c)),$$

where 퐴(푥, 푐) is the set of embeddings of the 푘 atoms nearest to the rotamer's beta carbon.

5.2.3 Training and loss functions

In all experiments, the energy function is trained to learn the conditional distribution of the rotamer given its context by approximately maximizing the log likelihood of the data.

$$\mathcal{L}(\theta) = -E_\theta(x, c) - \log Z_\theta(c)$$

To estimate the partition function, we note that:

$$\log Z_\theta(c) = \log \int e^{-E_\theta(x,c)}\, dx = \log\left( \mathbb{E}_{q(x|c)}\!\left[ \frac{e^{-E_\theta(x,c)}}{q(x|c)} \right] \right)$$

for some importance sampler 푞(푥|푐). Furthermore, if we assume 푞(푥|푐) is uniformly distributed on supported configurations, we obtain a simplified maximum likelihood

objective given by

$$\mathcal{L}(\theta) = -E_\theta(x, c) - \log\left( \mathbb{E}_{q(x^i|c)}\!\left[ e^{-E_\theta(x^i, c)} \right] \right)$$

for some context dependent importance sampler 푞(푥|푐). We choose our sampler 푞(푥|푐) to be an empirically collected rotamer library [Shapovalov and Dunbrack Jr, 2011] conditioned on the amino acid identity and the backbone 휑 and 휓 angles. We write the importance sampler as a function of atomic coordinates, which are interchangeable with the angular coordinates in the rotamer library. The library consists of lists of means and standard deviations of possible 휒 angles for each 10 degree interval of both 휑 and 휓. Given continuous 휑 and 휓 values, we sample rotamers from this library by sampling from a weighted mixture of Gaussians of 휒 angles at each of the four surrounding bins, with weights given by distance to the bins via bilinear interpolation. Every candidate rotamer at each bin is assigned uniform probability. To ensure our context dependent importance sampler effectively samples high likelihood areas of the model, we further add the real rotamer as a sample from 푞(푥|푐).
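A sketch of the resulting objective, with the rotamer library sampling (the mixture of Gaussians with bilinearly interpolated weights) abstracted behind a sampler; the function and interface names here are illustrative assumptions:

    import torch

    def rotamer_nll_loss(energy_fn, rotamer, context, sampler, n_samples=64):
        # E(x, c) for the ground-truth rotamer in its context.
        e_real = energy_fn(rotamer, context)
        candidates = sampler(context, n_samples)      # draws from q(x | c)
        candidates.append(rotamer)                    # include the real rotamer in q
        e_cand = torch.stack([energy_fn(x, context) for x in candidates])
        # log E_q[exp(-E)] estimated with logsumexp for numerical stability.
        log_z = torch.logsumexp(-e_cand, dim=0) - torch.log(
            torch.tensor(float(len(candidates))))
        # Minimizing E(x, c) plus the log partition estimate maximizes the
        # approximate log likelihood -E(x, c) - log Z(c).
        return e_real + log_z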

5.2.4 Recovery of Rotamers

To recover rotamers given a candidate backbone of a protein, we sample from our context dependent importance sampler 푞(푥|푐) as part of the online optimization procedure on our energy model. We utilize several different online optimization procedures in order to compare fairly with the Rosetta energy function.

5.3 Evaluation

5.3.1 Datasets

We constructed a curated dataset of high-resolution PDB structures using the CullPDB database, with the following criteria: resolution finer than 1.8 Å; sequence identity less than 90%; and R value less than 0.25 as defined in Wang and R. L. Dunbrack

[2003]. To test the model on rotamer recovery, we use the test set of structures from Leaver-Fay et al. [2013]. To prevent training on structures that are similar to those in the test set, we ran BLAST on sequences derived from the PDB structures and removed all train structures with more than 25% sequence identity to sequences in the test dataset. Ultimately, our train dataset consisted of 12,473 structures and our test dataset consisted of 129 structures.

5.3.2 Baselines

We compare to three baseline neural network architectures: a fully-connected network, the set embedding architecture from the set2set paper [Vinyals et al., 2015], and a graph neural network [Veličković et al., 2017]. Results are also compared to Rosetta. We ran Rosetta using the score12 and ref2015 energy functions with the rotamer trials and rtmin protocols under default settings.

5.3.3 Evaluation

For the comparison of the model to Rosetta in Table 5.1, we reimplement the sampling scheme that Rosetta uses for rotamer trials evaluation. We take discrete samples from the rotamer library, with bilinear interpolation of the means and standard deviations using the four grid points surrounding the backbone 휑 and 휓 angles for the residue. We take discrete samples of the rotamers at 휇, except that for buried residues we

sample 휒1 and 휒2 at 휇 and 휇 ± 휎, as was done in Leaver-Fay et al. [2013]. We define

buried residues to have ≥24 퐶훽 neighbors within 10 Å of the residue's 퐶훽 (퐶훼 for glycine residues). For buried positions we accumulate rotamers up to 98% of the distribution, and for other positions the accumulation is to 95%. We score a rotamer as recovered correctly if all 휒 angles are within 20° of the ground-truth residue. We also use a continuous sampling scheme which approximates the empirical conditional distribution of the rotamers using a mixture of Gaussians with means and standard deviations computed by bilinear interpolation as above. Instead of sampling discretely, the component rotamers are sampled with the probabilities given by the library, and a sample is generated with the corresponding mean and standard deviation. This is the same sampling scheme used to train models, but with component rotamers now weighted by probability as opposed to uniform sampling.

Model                              Avg          Buried  Surface
Rosetta score12 (rotamer-trials)   72.2 (72.6)  -       -
Rosetta ref2015 (rotamer-trials)   73.6         -       -
Atom Transformer                   70.4         87.0    58.3
Atom Transformer (ensemble)        71.5         89.2    59.9

Table 5.1: Rotamer recovery of energy functions under the discrete rotamer sampling method detailed in Section 5.3.3. Parentheses denote the value reported by Leaver-Fay et al. [2013].

Model                        Avg          Buried  Surface
Fully-connected              39.1         54.4    30.0
Set2set                      43.2         60.3    31.7
GraphNet                     69.0         94.3    54.2
Atom Transformer             73.1         91.1    58.3
Atom Transformer (ensemble)  74.1         91.2    59.5
Rosetta score12 (rt-min)     75.4 (74.2)  -       -
Rosetta ref2015 (rt-min)     76.4         -       -

Table 5.2: Rotamer recovery of energy functions under continuous optimization schemes. Rosetta continuous optimization is performed with the rtmin protocol. Parentheses denote the value reported by Leaver-Fay et al. [2013].

5.3.4 Rotamer recovery results

Table 5.1 directly compares our EBM (which we refer to as the Atom Transformer) with two versions of the Rosetta energy function. We run Rosetta on the set of 152 proteins from the benchmark of Leaver-Fay et al. [2013]. We also include the published performance on the same test set from Leaver-Fay et al. [2013]. As discussed above, comparable sampling strategies are used to evaluate the models, enabling a fair comparison of the energy functions. We find that a single model evaluated on the benchmark performs slightly worse than both versions of the Rosetta energy function.

An ensemble of 10 models improves the results.

Table 5.2 evaluates the performance of the energy function under alternative sampling strategies with the goal of optimizing recovery rates. We report the performance of the Rosetta energy function using the rtmin protocol for continuous minimization. We evaluate the learned energy function with continuous sampling from a mixture of Gaussians conditioned on the 휑/휓 settings of the backbone angles, as detailed above. We find that with ensembling the model performance is close to that of the Rosetta energy functions. We also compare to three baselines for embedding sets with similar numbers of parameters to the Atom Transformer model and find that they have weaker performance.

Buried residues are more constrained in their configurations by tight packing of the side chains within the core of the protein. In comparison, surface residues are more free to vary. Therefore we also report performance separately on both categories. We find that the ensembled Atom Transformer has a 91.2% rotamer recovery rate for buried residues, compared to 59.5% for surface residues.

Table 5.3 reports recovery rates by residue comparing the Rosetta score12 results reported in Leaver-Fay et al.[2013] to the Atom Transformer model using the Rosetta discrete sampling method. The Atom Transformer model appears to perform well on smaller rotameric amino acids as well as polar amino acids such as glutamate/aspartate while Rosetta performs better on larger amino acids like phenylalanine and tryptophan and more common ones like leucine.

5.3.5 Visualizing Energies

In this section, we visualize and understand how the Atom Transformer models the energy of rotamers in their native contexts. We explore the response of the model to perturbations in the configuration of side chains away from their native state. We retrieve all protein structures in the test set and individually perturb rotameric 휒 angles across the unit circle, plotting results in Figures 5-2, 5-3, and 5-4.

Amino Acid         R     K     M     I     L     S     T     V
Atom Transformer   37.2  31.7  53.0  93.3  82.6  79.0  96.5  94.0
Rosetta score12    26.7  31.7  49.6  85.4  87.5  72.5  92.6  94.3

Amino Acid         N     D     Q     E     H     W     F     Y
Atom Transformer   67.4  76.0  40.8  49.8  65.5  83.5  80.3  77.6
Rosetta score12    56.8  60.4  30.7  33.6  55.0  85.0  85.4  82.9

Table 5.3: Comparison of rotamer recovery rates by amino acid between Rosetta and the ensembled energy-based model under discrete rotamer sampling. The model appears to perform well on the polar amino acids glutamine, serine, asparagine, and threonine, while Rosetta performs better on the larger amino acids phenylalanine, tyrosine, and tryptophan and the common amino acid leucine. The numbers reported for Rosetta are from Leaver-Fay et al. [2013].

Core/Surface Energies Figure 5-2 shows that steeper response to variations away from the native state is observed for residues in the core of the protein (having ≥24 contacting side chains) than for residues on the surface (≤16), consistent with the observation that buried side chains are tightly packed [Richardson and Richardson, 1989].

Rotameric Energies Figure 5-3 shows a relation between residue size and the depth of the energy well, with larger amino acids having steeper wells (more sensitive to perturbations). Furthermore, Figure 5-4 shows that the model learns the symmetries of amino acids. We find that responses to perturbations of the 휒2 angle for the residues Tyr, Asp, and Phe are symmetric about 휒2: a 180° periodicity is observed, in contrast to the non-symmetric residues.

Embeddings of Atom Sets Building on the observation of a relation between the depth of the residue and its response to perturbation from the native state, we ask whether core and surface residues are clustered within the representations of the model. To visualize the final hidden representation of the molecular contexts within a protein, we compute the final vector embedding for the 64 atom context around the carbon-훽 atom (or for glycine, the carbon-훼 atom) for each residue. We find that a projection of these representations by t-SNE [Maaten and Hinton, 2008] into 2 dimensions shows a clear clustering between representations of core residues and surface residues. A representative example is shown in Figure 5-5.

Figure 5-2: The energy function models distinct behavior between core and surface residues. Core residues are more sensitive to perturbations away from the native state in the 휒1 torsion angle. On average, residues closer to the core have a steeper energy well.

Figure 5-3: There is a relation between the residue size and the depth of the energy well, with larger amino acids (e.g. Trp, Phe, Thr, Lys) having steeper wells.

Saliency Map The 10-residue protease-binding loop in a chymotrypsin inhibitor from barley seeds is highly structured due to the presence of backbone-backbone and backbone-sidechain hydrogen bonds in the same residue [Das, 2011]. To visualize the dependence of the energy function on individual atoms, we compute the energy of the 64 atom context centered around the backbone carbonyl oxygen of residue 39 (isoleucine) in PDB: 2CI2 [McPhalen and James, 1987] and derive the gradients with respect to the input atoms. Figure 5-6 overlays the gradient magnitudes on the structure, indicating the model attends to both sidechain and backbone atoms, which participate in hydrogen bonds.

Figure 5-4: Note the periodicity for the amino acids Tyr, Asp, and Phe with terminal symmetry about 휒2.
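A sketch of this saliency computation (tensor layout and names are assumptions): the gradient magnitude of the scalar energy with respect to each atom's coordinates serves as a per-atom interaction strength:

    import torch

    def atom_saliency(energy_fn, coords, feats):
        # coords: (k, 3) atom coordinates; feats: fixed categorical features.
        coords = coords.clone().requires_grad_(True)
        energy = energy_fn(coords, feats)   # scalar energy of the atom context
        grad, = torch.autograd.grad(energy, coords)
        return grad.norm(dim=-1)            # (k,) gradient magnitude per atom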


Figure 5-5: Left: 3-dimensional representation of CcmG reducing oxidoreductase [PDB ID 1KNG; Edeling et al., 2002], a protein from the test set. Atoms are colored dark blue (buried), orange (exposed), or left uncolored (neither). Right: t-SNE [Maaten and Hinton, 2008] projection of the EBM hidden representation of the context centered on the alpha carbon atom of each residue. In the embedding space, buried and surface residues are distinguished.

Figure 5-6: The model's saliency map applied to the test protein, serine proteinase inhibitor (PDB ID: 2CI2; McPhalen and James [1987]). The 64 atom context is centered on the carbonyl oxygen of residue 39 (isoleucine). Atoms in the context are labeled red with color saturation proportional to gradient magnitude (interaction strength). Hydrogen bonds with the carbonyl oxygen are shown by dotted lines.

Chapter 6

Compositionality with Energy Based Models

In this chapter, we present a framework for combining independently trained energy models over different concepts to generate novel compositions of concepts through online optimization (Langevin dynamics). We first reintroduce energy models under this formulation, and show how operators for disjunction, conjunction, and negation can be realized. We further show how this enables novel applications to continual learning.

6.1 Method

6.1.1 Energy Based Models

EBMs represent data by learning an unnormalized probability distribution across the data. For each data point x, an energy function 퐸휃(x), parameterized by a neural network, outputs a scalar real energy such that

$$p_\theta(x) \propto e^{-E_\theta(x)}. \qquad (6.1)$$

Figure 6-1: Illustration of logical composition operators over energy functions 퐸1 and 퐸2 (drawn as level sets).

To train an EBM on a data distribution 푝퐷, we follow the methodology defined in [Du and Mordatch, 2019], where a Monte Carlo estimate (Equation 6.2) of maximum likelihood ℒ is minimized with the following gradient

$$\nabla_\theta \mathcal{L} = \mathbb{E}_{x^+ \sim p_D}\!\left[ E_\theta(x^+) \right] - \mathbb{E}_{x^- \sim p_\theta}\!\left[ E_\theta(x^-) \right]. \qquad (6.2)$$

To sample 푥− from 푝휃 for both training and generation, we use MCMC based on Langevin dynamics (online optimization). Samples are initialized from uniform random noise and are iteratively refined following Equation 6.3:

$$\tilde{\mathbf{x}}^k = \tilde{\mathbf{x}}^{k-1} - \frac{\lambda}{2} \nabla_{\mathbf{x}} E_\theta(\tilde{\mathbf{x}}^{k-1}) + \omega^k, \qquad \omega^k \sim \mathcal{N}(0, \lambda), \qquad (6.3)$$

where 푘 is the 푘th iteration step and 휆 is the step size. We refer to each iteration of Langevin dynamics as a negative sampling step. We note that this form of sampling allows us to use the gradient of the combined distribution to generate samples from

distributions composed of 푝휃 and the other distributions. We use this ability to generate from multiple different compositions of distributions.
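A minimal sketch of this negative sampling procedure; the step count and step size below are placeholders rather than the settings used in the thesis:

    import torch

    def langevin_sample(energy_fn, shape, steps=60, step_size=10.0):
        x = torch.rand(shape)                   # initialize from uniform noise
        for _ in range(steps):
            x.requires_grad_(True)
            grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
            noise = torch.randn_like(x) * (step_size ** 0.5)   # omega ~ N(0, lambda)
            # Equation 6.3: x_k = x_{k-1} - (lambda / 2) grad E(x_{k-1}) + omega
            x = (x - 0.5 * step_size * grad + noise).detach()
        return x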

6.1.2 Logical Operators through Online Optimization

We next present different ways that EBMs can compose through online optimization.

We consider a set of independently trained EBMs, 퐸(x|푐1), 퐸(x|푐2), . . . , 퐸(x|푐푛), which

are learned conditional distributions on underlying latent codes 푐푖. Latent codes we consider include position, size, color, gender, hair style, and age, which we also refer

to as concepts. We formulate a set of compositional operators on EBMs at test time using a series of symbolic operators illustrated in Figure 6-1.

Concept Conjunction In concept conjunction, given separate independent con- cepts (such as a particular gender, hair style, or facial expression), we wish to construct an output with the specified gender, hair style, and facial expression – the combination of each concept. Since the likelihood of an output given a set of specific concepts is equal to the product of the likelihood of each individual concept, we have Equation 6.4, which is also known as the product of experts [Hinton, 2002]:

$$p(x \mid c_1 \text{ and } c_2, \ldots, \text{and } c_i) = \prod_i p(x|c_i) \propto e^{-\sum_i E(x|c_i)}. \qquad (6.4)$$

We can thus apply Equation 6.3 to the distribution given by the sum of the energies of each concept, obtaining Equation 6.5, to sample from the joint concept space with 휔푘 ∼ 풩(0, 휆):

$$\tilde{\mathbf{x}}^k = \tilde{\mathbf{x}}^{k-1} - \frac{\lambda}{2} \nabla_{\mathbf{x}} \sum_i E_\theta(\tilde{\mathbf{x}}^{k-1} \mid c_i) + \omega^k. \qquad (6.5)$$

Concept Disjunction In concept disjunction, given separate concepts such as the colors red and blue, we wish to construct an output that is either red or blue. We wish to construct a new distribution that has probability mass when any chosen concept is true. A natural choice of such a distribution is the sum of the likelihood of each concept:

$$p(x \mid c_1 \text{ or } c_2, \ldots, \text{or } c_i) \propto \sum_i p(x|c_i)/Z(c_i), \qquad (6.6)$$

where 푍(푐푖) denotes the partition function for each concept. If we assume all partition

functions 푍(푐푖) to be equal, this simplifies to

$$\sum_i p(x|c_i) \propto \sum_i e^{-E(x|c_i)} = e^{\operatorname{logsumexp}_i(-E(x|c_i))}, \qquad (6.7)$$

where $\operatorname{logsumexp}(f_1, \ldots, f_N) = \log \sum_i \exp(f_i)$. We can thus apply Equation 6.3 to the distribution given by the negative smooth minimum of the energies of each concept, obtaining Equation 6.8, to sample from the disjunction concept space:

$$\tilde{\mathbf{x}}^k = \tilde{\mathbf{x}}^{k-1} - \frac{\lambda}{2} \nabla_{\mathbf{x}} \operatorname{logsumexp}_i(-E(x|c_i)) + \omega^k, \qquad (6.8)$$

where 휔푘 ∼ 풩(0, 휆). In our experiments, we empirically found the partition function 푍(푐푖) estimates to be similar across concepts (see Appendix), justifying the simplification of Equation 6.7.

Concept Negation In concept negation, we wish to generate an output that does not contain a given concept. Given the color red, we want an output of a different color, such as blue. Thus, we want to construct a distribution that places high likelihood on data outside the given concept. One choice is a distribution inversely proportional to the concept's likelihood. Importantly, negation must be defined with respect to another concept to be useful: the opposite of alive may be dead, but not inanimate. Negation without a data distribution is not integrable and leads to generation of chaotic textures which, while satisfying the absence of a concept, are not desirable. Thus, in our experiments with negation we combine it with another concept to ground the negation and obtain an integrable distribution:

$$p(x \mid \text{not}(c_1), c_2) \propto \frac{p(x|c_2)}{p(x|c_1)^\alpha} \propto e^{\alpha E(x|c_1) - E(x|c_2)}. \qquad (6.9)$$

We found the relative smoothing parameter 훼 to be a useful regularizer (setting 훼 = 0 in Equation 6.9 recovers the plain conditional 푝(푥|푐2)), and we use 훼 = 0.01 in our experiments. The above equation allows us to apply Langevin dynamics, obtaining Equation 6.10, to sample concept negations:

$$\tilde{\mathbf{x}}^k = \tilde{\mathbf{x}}^{k-1} - \frac{\lambda}{2} \nabla_{\mathbf{x}} \left( \alpha E(x|c_1) - E(x|c_2) \right) + \omega^k, \qquad (6.10)$$

where 휔푘 ∼ 풩(0, 휆).
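All three operators reduce to simple transformations of energy functions that can be handed to the same Langevin sampler. A sketch under the assumptions of the langevin_sample snippet above, with the conditioning latent folded into each energy function for brevity:

    import torch

    def conjunction(energies):
        # Equation 6.5: sum the energies (product of experts).
        return lambda x: sum(E(x) for E in energies)

    def disjunction(energies):
        # Equation 6.8: negative smooth minimum (logsumexp) of the energies.
        return lambda x: -torch.logsumexp(
            torch.stack([-E(x) for E in energies]), dim=0)

    def negation(e_not, e_ground, alpha=0.01):
        # Equation 6.10: negation grounded by a second concept.
        return lambda x: alpha * e_not(x) - e_ground(x)

    # e.g., sampling images that are young AND female (shape is illustrative):
    # x = langevin_sample(conjunction([E_young, E_female]), (16, 3, 128, 128))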


Figure 6-2: Combinations of different attributes on CelebA via concept conjunction. Each row adds an additional energy function. Images on the first row are only conditioned on young, while images on the last row are conditioned on young, female, smiling, and wavy hair.


Figure 6-3: Combinations of different attributes on MuJoCo via concept conjunction. Each row adds an additional energy function. Images on the first row are only conditioned on shape, while images on the last row are conditioned on shape, position, size, and color. The left half shows generations of a sphere and the right half a cylinder.

6.2 Experiments

We perform empirical studies to answer the following questions: (1) Can EBMs exhibit concept compositionality (such as concept negation, conjunction, and disjunction) in generating images? (2) Can we take advantage of concept combinations to learn new concepts in a continual manner? (3) Does explicit factor decomposition enable generalization to novel combinations of factors? (4) Can we perform concept inference across multiple inputs?

6.2.1 Setup

We perform experiments on 64x64 object scenes rendered in MuJoCo [Todorov et al., 2012] (MuJoCo Scenes) and the 128x128 CelebA dataset. For MuJoCo Scenes images, we generate a central object of shape either sphere, cylinder, or box of varying size and color at different positions, with some number of (specified) additional background objects. Images are generated with varying lighting and objects. We use the ImageNet32x32 and ImageNet128x128 architectures from [Du and Mordatch, 2019] with the Swish activation [Ramachandran et al., 2017] on the MuJoCo and CelebA datasets, respectively. Models are trained for up to 1 day on 1 GPU for MuJoCo and for 1 day on 8 GPUs for CelebA.

Model                                    Position Acc  Color Acc
Color                                    0.128         0.997
Position                                 0.984         0.201
Conjunction(Position, Color)             0.801         0.8125
Conjunction(Position, Negation(Color))   0.872         0.096
Conjunction(Negation(Position), Color)   0.033         0.971

Model                                    Position 1 Acc  Position 2 Acc
Position 1                               0.875           0.0
Position 2                               0.0             0.817
Disjunction(Position 1, Position 2)      0.432           0.413

Table 6.1: Quantitative evaluation of conjunction, disjunction, and negation generations on the MuJoCo Scenes dataset using an EBM. Each individual attribute (Color or Position) generation is an individual EBM. (Acc: accuracy)

6.2.2 Compositional Generation

Quantitative evaluation. We quantitatively evaluate the compositional operations of EBMs defined previously. To quantitatively evaluate generation, we use the MuJoCo Scenes dataset. We train a supervised classifier to predict position and color on the MuJoCo Scenes dataset, obtaining 99.3% accuracy for position and 99.9% for color on the test set. We also train separate conditional EBMs on the concepts of position and color. A positional generation is considered correct if the distance between the position predicted by the supervised classifier on the generated image and the position the generation was conditioned on is smaller than 0.4. A color generation is correct if the predicted color is the same as the color the generation was conditioned on.
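A sketch of this scoring rule, with the supervised classifier abstracted away (names illustrative):

    import numpy as np

    def position_correct(pred_xy, cond_xy, threshold=0.4):
        # Correct when the classifier's predicted position lies within 0.4
        # of the position the generation was conditioned on.
        return np.linalg.norm(np.asarray(pred_xy) - np.asarray(cond_xy)) < threshold

    def color_correct(pred_color, cond_color):
        # Correct when the predicted color label matches the conditioned color.
        return pred_color == cond_color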

Figure 6-4: Examples of recursive compositions of disjunction, conjunction, and negation on the CelebA dataset. Rows show generations conditioned on: smiling; male; smiling AND female; smiling AND NOT male; (smiling AND female) OR (NOT smiling AND male).

In Table 6.1, we quantitatively evaluate the quality of generated images given combinations of conjunction, disjunction, and negation on the color and position concepts. When using either the Color or the Position EBM, the respective accuracy is high. Conjunction(Position, Color) has high position and color accuracies, which demonstrates that an EBM can combine different concepts. Under Conjunction(Position, Negation(Color)), the color accuracy drops below that of the Color EBM, meaning that negating a concept reduces the likelihood of the concept. The same conclusion follows for Conjunction(Negation(Position), Color). To evaluate disjunction, we set Position 1 to be a random point in the bottom left corner of a grid and Position 2 to be a random point in the top right corner of the grid. Averages over 1000 generated images are reported in Table 6.1. The Position 1 and Position 2 EBMs each obtain high accuracy in predicting their own positions. The Disjunction(Position 1, Position 2) EBM generates images that are roughly evenly distributed between Position 1 and Position 2, indicating that disjunction combines concepts additively (generating images that are either concept A or concept B).

Qualitative evaluation. We further provide qualitative visualizations of the conjunction, disjunction, and negation operations on both the MuJoCo Scenes and CelebA datasets. Concept Conjunction: In Figure 6-2, we show that conjunctions of EBMs are able

to combine multiple independent concepts, such as age, gender, smile, and wavy hair, and produce more precise generations when combining more energy models of different concepts. Similarly, EBMs can combine the independent concepts of shape, position, size, and color to obtain more precise generations in Figure 6-3. We also show results of conjunction with other logical operators in Figure 6-4.

Concept Negation: In Figure 6-4, row 4 shows images that are opposite to the trained concept via the negation operation. Since the negation operation must be accompanied by another concept, we use "smiling" as the second concept. The images in row 4 show that the negation of male, combined with smiling, yields smiling females. This can further be combined with disjunction in row 5 to make either "non-smiling male" or "smiling female".

Concept Disjunction: The last row of Figure 6-4 shows EBMs can combine concepts additively (generate images that are concept A or concept B). By constructing sampling using logsumexp, EBMs can sample an image that is “not smiling male” or “smiling female”, where both “not smiling male” and “smiling female” are specified through the conjunction of energy models of the two different concepts.

Multiple object combination: We show that our composition operations can combine not only concepts or attributes of a single object, but can also compose at the object level. To verify this, we constructed a dataset with one green cube and a large number of background clutter objects (which are not green) in the scene. We train a conditional EBM (conditioned on position) on this dataset. In Figure 6-5, "cube 1" and "cube 2" show generated images conditioned on different positions. We perform the conjunction operation on the "cube 1" and "cube 2" EBMs and use the combined energy model to generate images (row 3). We find that adding the two conditional EBMs allows us to selectively generate two different cubes. Furthermore, such generation satisfies the constraints of the dataset: for example, when the two conditioned cubes are too close, the EBMs default to generating a single cube, as in the last image of row 3.

Figure 6-5: Multi-object compositionality with EBMs. An EBM is trained to generate a green cube of specified size and shape in a scene alongside other objects. At test time, we sample from the conjunction of two EBMs conditioned on different positions and sizes (cube 1 and cube 2) and generate cubes at both locations. Two cubes are merged into one if they are too close (last column).

Figure 6-6: Continual learning of concepts. A position EBM is first trained on one shape (cube) of one color (purple) at different positions. A shape EBM is then trained on different shapes of one fixed color (purple). Finally, a color EBM is trained on shapes of many colors. EBMs can continually learn to generate many shapes (cube, sphere) with different colors at different positions.

6.2.3 Continual Learning

We evaluate to what extent compositionality in EBMs enables continual learning of new concepts and their combination with previously learned concepts. If we create an EBM for a novel concept, can it be combined with previous EBMs that have never observed this concept in their training data? And can we continually repeat this process? To evaluate this, we use the following methodology on the MuJoCo dataset:

1. We first train a position EBM on a dataset of varying positions, but a fixed color and a fixed shape. In this experiment, we use the shape "cube" and the color "purple". The position EBM allows us to generate a purple cube at various positions (Figure 6-6, row 1).

2. Next, we train a shape EBM in combination with the position EBM to generate images of different shapes at different positions, without training the position EBM. As shown in Figure 6-6, row 2, after combining the position and shape EBMs, the spheres are placed in the same positions as the cubes in row 1, even though these sphere positions were never seen during training.

3. Finally, we train a color EBM in combination with both the position and shape EBMs to generate images of different shapes at different positions and colors. Again, we fix both the position and shape EBMs and only train the color model. In Figure 6-6, row 3, the objects with different colors have the same positions as row 1 and the same shapes as row 2, which shows that the EBM can continually learn different concepts and combine newly learned concepts with previously learned ones to generate new images.

Table 6.2: Quantitative evaluation of continual learning. A position EBM is first trained on "purple" "cubes" at different positions. A shape EBM is then trained on different "purple" shapes. Finally, a color EBM is trained on shapes of many colors. Earlier EBMs are fixed and combined with the new EBMs. We compare with a GAN model [Radford et al., 2015] trained on the same position, shape, and color datasets. EBMs are better at continually learning new concepts while remembering old ones. (Acc: accuracy)

Model                           Position Acc  Shape Acc  Color Acc
EBM (Position)                  0.901         -          -
EBM (Position + Shape)          0.813         0.743      -
EBM (Position + Shape + Color)  0.781         0.703      0.521
GAN (Position)                  0.941         -          -
GAN (Position + Shape)          0.111         0.977      -
GAN (Position + Shape + Color)  0.117         0.476      0.984

In Table 6.2, we quantitatively evaluate the continual learning ability of our EBM against a GAN [Radford et al., 2015]. Similar to the quantitative evaluation above, we train three classifiers for position, shape, and color respectively. For a fair comparison, the GAN model is also trained sequentially on the position, shape, and color datasets (with the corresponding position, shape, or color attribute available and other attributes set randomly, matching the training of the EBMs). The position accuracy of the EBM does not drop significantly when continually learning new concepts (shape and color), which shows that our EBM is able to extrapolate earlier learned concepts by combining them with newly learned concepts. In contrast, while the GAN model is able to learn the attributes of position, shape, and color given the corresponding datasets, we find that the accuracies of position and shape drop significantly after learning color. The poor performance on previously learned concepts upon learning a new concept shows that GANs cannot combine newly learned attributes with previous attributes.
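A sketch of this continual training recipe: earlier concept EBMs are frozen, and only the newly added EBM receives gradients, with negatives drawn from the joint (summed) energy. All names and the loop interface are assumptions:

    def train_new_concept(new_ebm, frozen_ebms, loader, sampler, optimizer):
        for ebm in frozen_ebms:                  # earlier concepts stay fixed
            for p in ebm.parameters():
                p.requires_grad_(False)
        joint = lambda x: new_ebm(x) + sum(E(x) for E in frozen_ebms)
        for x_real in loader:
            x_neg = sampler(joint, x_real.shape)     # e.g. langevin_sample above
            # Monte Carlo maximum likelihood gradient of Equation 6.2.
            loss = joint(x_real).mean() - joint(x_neg).mean()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()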

Chapter 7

Conclusion

We present an approach that enables scalable training of EBMs so that online optimization on the learned models can be used to generate images, trajectories of plans, and protein structures. We show benefits of the online optimization procedure, enabling image in-painting, cross-mapping, and adaptive planning to new goals, as well as the ability to compose with other separately trained models.

7.1 Future Directions

This work presents an initial foray into applications of EBMs. We believe this is a fruitful area of research where additional directions of exploration include:

∙ Scaling EBMs to even higher resolution datasets through techniques such as improved MCMC sampling (e.g., Hamiltonian Monte Carlo).

∙ Planning with EBMs in more complex regimes, such as manipulation with a robotic hand or potentially tool use.

∙ Building more complex compositional operators, such as implication, from nesting of the defined operators for EBMs.

∙ Applying the compositionality of EBMs to tasks such as continual learning of visual concepts.

∙ Taking advantage of the iterative optimization procedure in EBMs to adaptively learn to reason on more complex datasets.

Bibliography

David H Ackley, Geoffrey E Hinton, and Terrence J Sejnowski. A learning algorithm for boltzmann machines. Cognit. Sci., 9(1):147–169, 1985. 19

Jimmy Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer Normalization. arXiv e-prints, art. arXiv:1607.06450, Jul 2016. 47

Shaojie Bai, J Zico Kolter, and Vladlen Koltun. Deep equilibrium models. In Advances in Neural Information Processing Systems, pages 688–699, 2019. 20

Anton Bakhtin, Sam Gross, Myle Ott, Yuntian Deng, Marc’Aurelio Ranzato, and Arthur Szlam. Real or fake? learning to discriminate machine from human generated text. arXiv preprint arXiv:1906.03351, 2019. 20

Sergey Bartunov, Jack W Rae, Simon Osindero, and Timothy P Lillicrap. Meta-learning deep energy-based memory models. arXiv preprint arXiv:1910.02720, 2019. 20

Greg Brockman, Vicki Cheung, Ludwig Pettersson, Jonas Schneider, John Schulman, Jie Tang, and Wojciech Zaremba. Openai gym. arXiv:1606.01540, 2016. 37

Stephen G Brush. History of the lenz-ising model. Reviews of modern physics, 39(4):883, 1967. 19

M. Cimpoi, S. Maji, I. Kokkinos, S. Mohamed, and A. Vedaldi. Describing textures in the wild. In Proceedings of the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), 2014. 28

Bo Dai, Zhen Liu, Hanjun Dai, Niao He, Arthur Gretton, Le Song, and Dale Schuurmans. Exponential family estimation via adversarial dynamics embedding. arXiv preprint arXiv:1904.12083, 2019. 36

R. Das. Four small puzzles that Rosetta doesn't solve. PLoS ONE, 6(5):e20044, 2011. 54

Peter Dayan, Geoffrey E Hinton, Radford M Neal, and Richard S Zemel. The helmholtz machine. Neural Comput., 7(5):889–904, 1995. 19

Marc Deisenroth and Carl E Rasmussen. Pilco: A model-based and data-efficient approach to policy search. In Proceedings of the 28th International Conference on machine learning (ICML-11), pages 465–472, 2011. 36

Yuntian Deng, Anton Bakhtin, Myle Ott, Arthur Szlam, and Marc'Aurelio Ranzato. Residual energy-based models for text generation. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=B1l4SgHKDH. 20

Prafulla Dhariwal, Christopher Hesse, Oleg Klimov, Alex Nichol, Matthias Plappert, Alec Radford, John Schulman, Szymon Sidor, Yuhuai Wu, and Peter Zhokhov. Openai baselines. https://github.com/openai/baselines, 2017. 38

Ken A Dill. Dominant forces in protein folding. Biochemistry, 29(31):7133–7155, 1990. 45

Yilun Du and Igor Mordatch. Implicit generation and generalization in energy-based models. arXiv preprint arXiv:1903.08689, 2019. 20, 36, 58, 62

Yilun Du, Toru Lin, and Igor Mordatch. Model based planning with energy based models. CoRL, 2019. 20

Yilun Du, Shuang Li, and Igor Mordatch. Compositional visual generation and inference with energy based models, April 2020a. 20

Yilun Du, Joshua Meier, Jerry Ma, Rob Fergus, and Alexander Rives. Energy-based models for atomic-resolution protein conformations. In International Conference on Learning Representations, 2020b. URL https://openreview.net/forum?id=S1e_9xrFvS. 20

Vincent Dumoulin, Jonathon Shlens, and Manjunath Kudlur. A learned representation for artistic style. 25

Melissa A Edeling, Luke W Guddat, Renata A Fabianek, Linda Thöny-Meyer, and Jennifer L Martin. Structure of ccmg/dsbe at 1.14 å resolution: high-fidelity reducing activity in an indiscriminately oxidizing environment. Structure, 10(7):973–979, 2002. 12, 55

Tom Erez and Emanuel Todorov. Trajectory optimization for domains with contacts using inverse dynamics. In 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 4914–4919. IEEE, 2012. 36

Sebastian Farquhar and Yarin Gal. Towards robust evaluations of continual learning. arXiv preprint arXiv:1805.09733, 2018. 29

Chelsea Finn, Paul Christiano, Pieter Abbeel, and Sergey Levine. A connection between generative adversarial networks, inverse reinforcement learning, and energy-based models. In NIPS Workshop, 2016. 19

Yarin Gal, Rowan McAllister, and Carl Edward Rasmussen. Improving pilco with bayesian neural network dynamics models. 36

Will Grathwohl, Kuan-Chieh Wang, Jörn-Henrik Jacobsen, David Duvenaud, Mohammad Norouzi, and Kevin Swersky. Your classifier is secretly an energy based model and you should treat it like one. arXiv preprint arXiv:1912.03263, 2019. 20

Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. Improved training of wasserstein gans. In NIPS, 2017. 24

Tuomas Haarnoja, Haoran Tang, Pieter Abbeel, and Sergey Levine. Reinforcement learning with deep energy-based policies. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 1352–1361. JMLR. org, 2017. 19, 36

Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, and Sergey Levine. Soft actor-critic: Off- policy maximum entropy deep reinforcement learning with a stochastic actor. arXiv preprint arXiv:1801.01290, 2018. 36

Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In ICCV, 2017. 17

Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of- distribution examples in neural networks. arXiv preprint arXiv:1610.02136, 2016. 28

Dan Hendrycks, Mantas Mazeika, and Thomas Dietterich. Deep anomaly detection with outlier exposure. arXiv preprint, 2018. 29

Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In Advances in Neural Information Processing Systems, pages 6626–6637, 2017. 25

Geoffrey Hinton, Simon Osindero, Max Welling, and Yee-Whye Teh. Unsupervised discovery of nonlinear structure using contrastive backpropagation. Cognitive science, 30(4):725–731, 2006a. 19

Geoffrey E Hinton. Products of experts. International Conference on Artificial Neural Networks, 1999. 20

Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Neural Comput., 14(8):1771–1800, 2002. 59

Geoffrey E Hinton. Training products of experts by minimizing contrastive divergence. Training, 14(8), 2006. 19, 20, 21

Geoffrey E Hinton, Simon Osindero, and Yee-Whye Teh. A fast learning algorithm for deep belief nets. Neural Comput., 18(7):1527–1554, 2006b. 36

Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, and Pieter Abbeel. Vime: Variational information maximizing exploration. In Advances in Neural Information Processing Systems, pages 1109–1117, 2016. 36

Yen-Chang Hsu, Yen-Cheng Liu, and Zsolt Kira. Re-evaluating continual learning scenarios: A categorization and case for strong baselines. arXiv preprint arXiv:1810.12488, 2018. 30

John Ingraham, Adam Riesselman, Chris Sander, and Debora Marks. Learning protein structure with a differentiable simulator. 20

Joel Janin, Shoshanna Wodak, Michael Levitt, and Bernard Maigret. Conformation of amino acid side-chains in proteins. Journal of molecular biology, 125(3):357–386, 1978. 46

Mrinal Kalakrishnan, Sachin Chitta, Evangelos Theodorou, Peter Pastor, and Stefan Schaal. Stomp: Stochastic trajectory optimization for motion planning. In 2011 IEEE international conference on robotics and automation, pages 4569–4574. IEEE, 2011. 32

Taesup Kim and Yoshua Bengio. Deep directed generative models with energy-based probability estimation. arXiv preprint arXiv:1606.03439, 2016. 19, 36

James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, et al. Overcoming catastrophic forgetting in neural networks. Proceedings of the national academy of sciences, 114(13):3521–3526, 2017. 29, 30

Andrew Leaver-Fay, Matthew J O’Meara, Mike Tyka, Ron Jacak, Yifan Song, Elizabeth H Kellogg, James Thompson, Ian W Davis, Roland A Pache, Sergey Lyskov, et al. Scientific benchmarks for guiding macromolecular energy function improvement. In Methods in enzymology, volume 523, pages 109–143. Elsevier, 2013. 15, 50, 51, 52, 53

Yann LeCun, Sumit Chopra, and Raia Hadsell. A tutorial on energy-based learning. 2006. 19, 20

Kwonjoon Lee, Weijian Xu, Fan Fan, and Zhuowen Tu. Wasserstein introspective neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3702–3711, 2018. 20

Sergey Levine. Reinforcement learning and control as probabilistic inference: Tutorial and review. arXiv:1805.00909, 2018. 33

Zhengqi Li and Noah Snavely. Learning intrinsic image decomposition from watching the world. In CVPR, 2018. 29, 30

Laurens van der Maaten and Geoffrey Hinton. Visualizing data using t-sne. Journal of machine learning research, 9(Nov):2579–2605, 2008. 12, 53, 55

Aleksander Madry, Aleksandar Makelov, Ludwig Schmidt, Dimitris Tsipras, and Adrian Vladu. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083, 2017. 9, 27

CA McPhalen and MNG James. Crystal and molecular structure of the serine proteinase inhibitor ci-2 from barley seeds. Biochemistry, 26(1):261–269, 1987. 12, 54, 55

Nikhil Mishra, Pieter Abbeel, and Igor Mordatch. Prediction and control with temporal segment models. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2459–2468. JMLR.org, 2017. 36

Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks. arXiv preprint arXiv:1802.05957, 2018. 23, 24, 25

Andriy Mnih and Geoffrey Hinton. Learning nonlinear constraints with contrastive backpropagation. Citeseer, 2004. 19

Elliott W Montroll, Renfrey B Potts, and John C Ward. Correlations and spontaneous magnetization of the two-dimensional ising model. Journal of Mathematical Physics, 4(2): 308–322, 1963. 19

Igor Mordatch, Emanuel Todorov, and Zoran Popović. Discovery of complex behaviors through contact-invariant optimization. ACM Transactions on Graphics (TOG), 31(4):43, 2012. 36

Igor Mordatch, Nikhil Mishra, Clemens Eppner, and Pieter Abbeel. Combining model-based policy search with online model learning for control of physical humanoids. In 2016 IEEE International Conference on Robotics and Automation (ICRA), pages 242–248. IEEE, 2016. 36, 40

Anusha Nagabandi, Gregory Kahn, Ronald S Fearing, and Sergey Levine. Neural network dynamics for model-based deep reinforcement learning with model-free fine-tuning. In ICRA, 2018. 36

Eric Nalisnick, Akihiro Matsukawa, Yee Whye Teh, Dilan Gorur, and Balaji Lakshminarayanan. Do deep generative models know what they don't know? In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=H1xwNhCcYm. 28, 29

Radford M Neal. Mcmc using hamiltonian dynamics. Handbook of Markov Chain Monte Carlo, 2(11), 2011. 26

Augustus Odena, Christopher Olah, and Jonathon Shlens. Conditional image synthesis with auxiliary classifier gans. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pages 2642–2651. JMLR.org, 2017. 24

OpenAI. Openai five, 2018. 17

OpenAI. Learning dexterous in-hand manipulation. In arXiv preprint arXiv:1808.00177, 2018. 17

Georg Ostrovski, Will Dabney, and Rémi Munos. Autoregressive quantile networks for generative modeling. arXiv preprint arXiv:1806.05575, 2018. 9, 24

Pierre-Yves Oudeyer and Frederic Kaplan. What is intrinsic motivation? a typology of computational approaches. Frontiers in neurorobotics, 1:6, 2009. 36

Deepak Pathak, Pulkit Agrawal, Alexei A Efros, and Trevor Darrell. Curiosity-driven exploration by self-supervised prediction. In ICML, 2017. 36

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434, 2015. 16, 66

Alec Radford, Luke Metz, and Soumith Chintala. Unsupervised representation learning with deep convolutional generative adversarial networks. In ICLR, 2016. 24

Prajit Ramachandran, Barret Zoph, and Quoc V Le. Searching for activation functions. arXiv preprint arXiv:1710.05941, 2017. 62

Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In NIPS, 2015. 17

Jane S Richardson and David C Richardson. Principles and patterns of protein conformation. In Prediction of protein structure and the principles of protein conformation, pages 1–98. Springer, 1989. 45, 53

Ruslan Salakhutdinov and Geoffrey E. Hinton. Deep boltzmann machines. In David A. Van Dyk and Max Welling, editors, AISTATS, volume 5 of JMLR Proceedings, pages 448–455. JMLR.org, 2009. URL http://www.jmlr.org/proceedings/papers/v5/salakhutdinov09a.html. 19, 20

Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In NIPS, 2016. 25

Jürgen Schmidhuber. Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Transactions on Autonomous Mental Development, 2(3):230–247, 2010. 36

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv:1707.06347, 2017. 38

Jonathan Schwarz, Jelena Luketina, Wojciech M Czarnecki, Agnieszka Grabska-Barwinska, Yee Whye Teh, Razvan Pascanu, and Raia Hadsell. Progress & compress: A scalable framework for continual learning. arXiv preprint arXiv:1805.06370, 2018. 29, 30

Maxim V Shapovalov and Roland L Dunbrack Jr. A smoothed backbone-dependent ro- tamer library for proteins derived from adaptive kernel density estimates and regressions. Structure, 19(6):844–858, 2011. 46, 49

David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel, and Demis Hassabis. Mastering the game of go with deep neural networks and tree search. Nat., 529(7587): 484–489, 2016. 17

Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In NIPS, 2014. 17

Haoran Tang, Rein Houthooft, Davis Foote, Adam Stooke, OpenAI Xi Chen, Yan Duan, John Schulman, Filip DeTurck, and Pieter Abbeel. # exploration: A study of count-based exploration for deep reinforcement learning. In Advances in neural information processing systems, pages 2753–2762, 2017. 36

Tijmen Tieleman. Training restricted boltzmann machines using approximations to the likelihood gradient. In Proceedings of the 25th international conference on Machine learning, pages 1064–1071. ACM, 2008. 20, 22

Emanuel Todorov, Tom Erez, and Yuval Tassa. Mujoco: A physics engine for model-based control. In IROS, pages 5026–5033. IEEE, 2012. 38, 61

Richard Turner. Cd notes. 2005. 22

Aaron Van Oord, Nal Kalchbrenner, and Koray Kavukcuoglu. Pixel recurrent neural networks. In ICML, 2016. 24, 26

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, 2017. 47

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017. 50

Oriol Vinyals, Samy Bengio, and Manjunath Kudlur. Order matters: Sequence to sequence for sets. arXiv preprint arXiv:1511.06391, 2015. 50

G. Wang and Jr. R. L. Dunbrack. Pisces: a protein sequence culling server. Bioinformatics, 19:1589–1591, 2003. 49

Grady Williams, Andrew Aldrich, and Evangelos A Theodorou. Model predictive path integral control: From theory to parallel computation. Journal of Guidance, Control, and Dynamics, 40(2):344–357, 2017a. 31, 36

Grady Williams, Nolan Wagener, Brian Goldfain, Paul Drews, James M Rehg, Byron Boots, and Evangelos A Theodorou. Information theoretic mpc for model-based reinforcement learning. In 2017 IEEE International Conference on Robotics and Automation (ICRA), pages 1714–1721. IEEE, 2017b. 31

Jianwen Xie, Wenze Hu, Song-Chun Zhu, and Ying Nian Wu. Learning sparse frame models for natural image patterns. International Journal of Computer Vision, 114(2-3):91–112, 2015. 19

Jianwen Xie, Yang Lu, Song-Chun Zhu, and Yingnian Wu. A theory of generative convnet. In International Conference on Machine Learning, pages 2635–2644, 2016. 19

Raymond A Yeh, Chen Chen, Teck-Yian Lim, Alexander G Schwing, Mark Hasegawa-Johnson, and Minh N Do. Semantic image inpainting with deep generative models. 26

Michael C Yip and David B Camarillo. Model-less feedback control of continuum manipulators in constrained environments. IEEE Transactions on Robotics, 30(4):880–889, 2014. 36

Friedemann Zenke, Ben Poole, and Surya Ganguli. Continual learning through synaptic intelligence. In Proceedings of the 34th International Conference on Machine Learning- Volume 70, pages 3987–3995. JMLR. org, 2017. 29, 30

Junbo Zhao, Michael Mathieu, and Yann LeCun. Energy-based generative adversarial network. arXiv preprint arXiv:1609.03126, 2016. 19

Song Chun Zhu, Ying Nian Wu, and David Mumford. Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. IJCV, 27(2):107–126, 1998. doi: 10.1023/A:1007925832420. 19
