Variational Autoencoders + Deep Generative Models


10-418 / 10-618 Machine Learning for Structured Data
Machine Learning Department, School of Computer Science, Carnegie Mellon University
Matt Gormley, Lecture 27, Dec. 4, 2019

Reminders
• Final Exam
  – Evening Exam
  – Thu, Dec. 5 at 6:30pm – 9:00pm
• 618 Final Poster:
  – Submission: Tue, Dec. 10 at 11:59pm
  – Presentation: Wed, Dec. 11 (time will be announced on Piazza)

FINAL EXAM LOGISTICS

Final Exam
• Time / Location
  – Time: Evening Exam, Thu, Dec. 5 at 6:30pm – 9:00pm
  – Room: Doherty Hall A302
  – Seats: There will be assigned seats. Please arrive early to find yours.
  – Please watch Piazza carefully for announcements
• Logistics
  – Covered material: Lecture 1 – Lecture 26 (not the new material in Lecture 27)
  – Format of questions: multiple choice; true / false (with justification); derivations; short answers; interpreting figures; implementing algorithms on paper
  – No electronic devices
  – You are allowed to bring one 8½ x 11 sheet of notes (front and back)

Final Exam
• Advice (for during the exam)
  – Solve the easy problems first (e.g. multiple choice before derivations); if a problem seems extremely complicated, you're likely missing something
  – Don't leave any answer blank!
  – If you make an assumption, write it down
  – If you look at a question and don't know the answer: we probably haven't told you the answer, but we've told you enough to work it out; imagine arguing for some answer and see if you like it

Final Exam
• Exam Contents
  – ~30% of the material comes from topics covered before the Midterm Exam
  – ~70% of the material comes from topics covered after the Midterm Exam

Topics from before the Midterm Exam
• Search-Based Structured Prediction
  – Reductions to Binary Classification
  – Learning to Search
  – RNN-LMs
  – seq2seq models
• Graphical Model Representation
  – Directed GMs vs. Undirected GMs vs. Factor Graphs
  – Bayesian Networks vs. Markov Random Fields vs. Conditional Random Fields
• Graphical Model Learning
  – Fully observed Bayesian Network learning
  – Fully observed MRF learning
  – Fully observed CRF learning
  – Parameterization of a GM
  – Neural potential functions
• Exact Inference
  – Three inference problems: (1) marginals, (2) partition function, (3) most probable assignment
  – Variable Elimination
  – Belief Propagation (sum-product and max-product)
  – MAP Inference via MILP

Topics from after the Midterm Exam
• Learning for Structured Prediction
  – Structured Perceptron
  – Structured SVM
  – Neural network potentials
• Approximate MAP Inference
  – MAP Inference via MILP
  – MAP Inference via LP relaxation
• Approximate Inference by Sampling
  – Monte Carlo Methods
  – Gibbs Sampling
  – Metropolis-Hastings
  – Markov Chains and MCMC
• Approximate Inference by Optimization
  – Variational Inference
  – Mean Field Variational Inference
  – Coordinate Ascent V.I. (CAVI)
  – Variational EM
  – Variational Bayes
• Bayesian Nonparametrics
  – Dirichlet Process
  – DP Mixture Model
• Deep Generative Models
  – Variational Autoencoders

VARIATIONAL EM

Variational EM
Whiteboard:
– Example: Unsupervised POS Tagging
– Variational Bayes
– Variational EM

Unsupervised POS Tagging: Bayesian Inference for HMMs
• Task: unsupervised POS tagging
• Data: 1 million words (i.e. unlabeled sentences) of WSJ text
• Dictionary: defines legal part-of-speech (POS) tags for each word type
• Models:
  – EM: standard HMM
  – VB: uncollapsed variational Bayesian HMM
  – Algo 1 (CVB): collapsed variational Bayesian HMM (strong indep. assumption)
  – Algo 2 (CVB): collapsed variational Bayesian HMM (weaker indep. assumption)
  – CGS: collapsed Gibbs sampler for Bayesian HMM

The slide reproduces the key updates from Wang & Blunsom (2013). Here $C^{\neg t}$ denotes counts that exclude position $t$, $w_t$ is the observation at time step $t$, $W$ is the size of the observation space, $K$ is the size of the hidden state space, and $\delta$ is the standard indicator function.

CGS full conditional for a single hidden state $z_t$, conditioned on all other hidden states $z^{\neg t}$:
$$p(z_t = k \mid x, z^{\neg t}, \alpha, \beta) \propto \frac{C^{\neg t}_{k,w_t} + \beta}{C^{\neg t}_{k,\cdot} + W\beta} \cdot \frac{C^{\neg t}_{z_{t-1},k} + \alpha}{C^{\neg t}_{z_{t-1},\cdot} + K\alpha} \cdot \frac{C^{\neg t}_{k,z_{t+1}} + \alpha + \delta(z_{t-1} = k = z_{t+1})}{C^{\neg t}_{k,\cdot} + K\alpha + \delta(z_{t-1} = k)}$$

Algo 1 exact mean field update (expensive because of the non-Markov delta terms inside the expectation):
$$q(z_t = k) \propto \frac{\exp\big(\mathbb{E}_{q(z^{\neg t})}[\log(C^{\neg t}_{k,w_t} + \beta) + \log(C^{\neg t}_{z_{t-1},k} + \alpha) + \log(C^{\neg t}_{k,z_{t+1}} + \alpha + \delta(z_{t-1} = k = z_{t+1}))]\big)}{\exp\big(\mathbb{E}_{q(z^{\neg t})}[\log(C^{\neg t}_{k,\cdot} + W\beta) + \log(C^{\neg t}_{z_{t-1},\cdot} + K\alpha) + \log(C^{\neg t}_{k,\cdot} + K\alpha + \delta(z_{t-1} = k))]\big)}$$

Algo 1 mean field update with a first-order Taylor series approximation:
$$q(z_t = k) \propto \frac{\mathbb{E}_{q(z^{\neg t})}[C^{\neg t}_{k,w_t}] + \beta}{\mathbb{E}_{q(z^{\neg t})}[C^{\neg t}_{k,\cdot}] + W\beta} \cdot \frac{\mathbb{E}_{q(z^{\neg t})}[C^{\neg t}_{z_{t-1},k}] + \alpha}{\mathbb{E}_{q(z^{\neg t})}[C^{\neg t}_{z_{t-1},\cdot}] + K\alpha} \cdot \frac{\mathbb{E}_{q(z^{\neg t})}[C^{\neg t}_{k,z_{t+1}}] + \alpha + \mathbb{E}_{q(z^{\neg t})}[\delta(z_{t-1} = k = z_{t+1})]}{\mathbb{E}_{q(z^{\neg t})}[C^{\neg t}_{k,\cdot}] + K\alpha + \mathbb{E}_{q(z^{\neg t})}[\delta(z_{t-1} = k)]}$$

where the expected delta terms factorize as $\mathbb{E}_{q(z^{\neg t})}[\delta(z_{t-1} = k = z_{t+1})] = q(z_{t-1} = k)\, q(z_{t+1} = k)$.

Figure from Wang & Blunsom (2013)
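To make the collapsed Gibbs conditional above concrete, here is a minimal NumPy sketch. It is my own illustration, not code from the lecture or from Wang & Blunsom; all function and variable names are made up.

```python
import numpy as np

def cgs_conditional(t, z, x, C_em, C_tr, alpha, beta):
    """Collapsed Gibbs full conditional p(z_t = k | x, z^{not t}) for a Bayesian HMM
    with symmetric Dirichlet priors (alpha on transitions, beta on emissions).

    C_em[k, w]: emission counts, C_tr[j, k]: transition counts, both with
    position t's contributions already subtracted (the C^{not t} counts).
    Assumes 0 < t < len(x) - 1 so both neighbours exist."""
    K, W = C_em.shape
    w, zp, zn = x[t], z[t - 1], z[t + 1]
    k = np.arange(K)
    same = ((zp == k) & (zn == k)).astype(float)   # delta(z_{t-1} = k = z_{t+1})
    emit = (C_em[:, w] + beta) / (C_em.sum(axis=1) + W * beta)
    t_in = (C_tr[zp, :] + alpha) / (C_tr[zp, :].sum() + K * alpha)
    t_out = (C_tr[:, zn] + alpha + same) / (C_tr.sum(axis=1) + K * alpha + (zp == k))
    p = emit * t_in * t_out
    return p / p.sum()

# One Gibbs sweep: for t = 1..T-2, subtract position t's counts, then
#   z[t] = np.random.choice(K, p=cgs_conditional(t, z, x, C_em, C_tr, alpha, beta))
# and add the counts back in. The Algo 1 CVB update has the same form, but with
# expected counts under q in place of C_em / C_tr, and with q(z_{t-1}=k)*q(z_{t+1}=k)
# and q(z_{t-1}=k) in place of the two delta terms.
```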
Recommended publications
  • Disentangled Variational Auto-Encoder for Semi-Supervised Learning
    Disentangled Variational Auto-Encoder for semi-supervised learning. Yang Li, Quan Pan, Suhang Wang, Haiyun Peng, Tao Yang, Erik Cambria. Information Sciences 482 (2019) 73–85.
    Affiliations: School of Automation, Northwestern Polytechnical University, China; School of Computer Science and Engineering, Nanyang Technological University, Singapore; College of Information Sciences and Technology, Pennsylvania State University, USA.
    Article history: Received 5 February 2018; Revised 23 December 2018; Accepted 24 December 2018; Available online 3 January 2019.
    Abstract: Semi-supervised learning is attracting increasing attention due to the fact that datasets of many domains lack enough labeled data. The Variational Auto-Encoder (VAE), in particular, has demonstrated the benefits of semi-supervised learning. The majority of existing semi-supervised VAEs utilize a classifier to exploit label information, where the parameters of the classifier are introduced to the VAE. Given the limited labeled data, learning the parameters for the classifiers may not be an optimal solution for exploiting label information. Therefore, in this paper, we develop a novel approach for semi-supervised VAE without a classifier. Specifically, we propose a new model called Semi-supervised Disentangled VAE (SDVAE), which encodes the input data into a disentangled representation and a non-interpretable representation; the category information is then directly utilized to regularize the disentangled representation via an equality constraint.
    Keywords: semi-supervised learning; Variational Auto-encoder; disentangled representation; neural networks.
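As a rough illustration of the idea, not code from the paper, a semi-supervised VAE of this kind can split the encoder output into a categorical "disentangled" part and a Gaussian "non-interpretable" part, and tie the categorical part to the label on labeled examples. All names are hypothetical, and the label term below is a plain cross-entropy rather than the paper's exact equality constraint.

```python
import torch
import torch.nn.functional as F
from torch import nn

class SDVAESketch(nn.Module):
    """Minimal sketch: the encoder emits class logits (disentangled part) and a
    Gaussian code (non-interpretable part); the decoder reconstructs from both."""
    def __init__(self, x_dim=784, z_dim=16, n_classes=10, h=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())
        self.to_logits = nn.Linear(h, n_classes)              # disentangled representation
        self.to_mu, self.to_logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim + n_classes, h), nn.ReLU(),
                                 nn.Linear(h, x_dim))

    def forward(self, x, y=None):
        hidden = self.enc(x)                                  # x assumed in [0, 1]
        logits = self.to_logits(hidden)
        mu, logvar = self.to_mu(hidden), self.to_logvar(hidden)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization trick
        x_hat = self.dec(torch.cat([z, F.softmax(logits, dim=-1)], dim=-1))
        recon = F.binary_cross_entropy_with_logits(x_hat, x, reduction='none').sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        loss = (recon + kl).mean()
        if y is not None:   # labeled data: push the disentangled part toward the label
            loss = loss + F.cross_entropy(logits, y)
        return loss
```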
  • A Deep Hierarchical Variational Autoencoder
    NVAE: A Deep Hierarchical Variational Autoencoder. Arash Vahdat, Jan Kautz. NVIDIA. {avahdat, jkautz}@nvidia.com
    Abstract: Normalizing flows, autoregressive models, variational autoencoders (VAEs), and deep energy-based models are among competing likelihood-based frameworks for deep generative learning. Among them, VAEs have the advantage of fast and tractable sampling and easy-to-access encoding networks. However, they are currently outperformed by other models such as normalizing flows and autoregressive models. While the majority of the research in VAEs is focused on the statistical challenges, we explore the orthogonal direction of carefully designing neural architectures for hierarchical VAEs. We propose Nouveau VAE (NVAE), a deep hierarchical VAE built for image generation using depth-wise separable convolutions and batch normalization. NVAE is equipped with a residual parameterization of Normal distributions and its training is stabilized by spectral regularization. We show that NVAE achieves state-of-the-art results among non-autoregressive likelihood-based models on the MNIST, CIFAR-10, CelebA 64, and CelebA HQ datasets, and it provides a strong baseline on FFHQ. For example, on CIFAR-10, NVAE pushes the state of the art from 2.98 to 2.91 bits per dimension, and it produces high-quality images on CelebA HQ, as shown in Fig. 1. To the best of our knowledge, NVAE is the first successful VAE applied to natural images as large as 256×256 pixels. The source code is available at https://github.com/NVlabs/NVAE.
    1 Introduction: The majority of the research efforts on improving VAEs [1, 2] is dedicated to the statistical challenges, such as reducing the gap between approximate and true posterior distributions [3, 4, 5, 6, 7, 8, 9, 10], formulating tighter bounds [11, 12, 13, 14], reducing the gradient noise [15, 16], extending VAEs to discrete variables [17, 18, 19, 20, 21, 22, 23], or tackling posterior collapse [24, 25, 26, 27].
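For intuition, the residual parameterization of Normal distributions mentioned in the abstract can be sketched as follows. This is my notation, not the authors' code: each level's approximate posterior is expressed as offsets (Δμ, Δσ) relative to the prior at that level, which makes the per-dimension KL term depend only on the residuals.

```python
import torch

def residual_normal_kl(delta_mu, delta_logsigma, prior_sigma):
    """Per-dimension KL( N(mu + dmu, (sigma * dsigma)^2) || N(mu, sigma^2) ) when the
    encoder predicts only the residuals (dmu, log dsigma) relative to the prior."""
    delta_sigma = delta_logsigma.exp()
    return 0.5 * ((delta_mu / prior_sigma) ** 2 + delta_sigma ** 2 - 1.0) - delta_logsigma

def sample_residual_posterior(prior_mu, prior_sigma, delta_mu, delta_logsigma):
    """Reparameterized sample z ~ N(mu + dmu, (sigma * dsigma)^2)."""
    eps = torch.randn_like(prior_mu)
    return prior_mu + delta_mu + prior_sigma * delta_logsigma.exp() * eps
```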
  • Explainable Deep Learning Models in Medical Image Analysis
    Explainable Deep Learning Models in Medical Image Analysis (Review, Journal of Imaging). Amitojdeep Singh, Sourya Sengupta, Vasudevan Lakshminarayanan. Theoretical and Experimental Epistemology Laboratory, School of Optometry and Vision Science, and Department of Systems Design Engineering, University of Waterloo, Waterloo, ON N2L 3G1, Canada. Received: 28 May 2020; Accepted: 17 June 2020; Published: 20 June 2020.
    Abstract: Deep learning methods have been very effective for a variety of medical diagnostic tasks and have even outperformed human experts on some of them. However, the black-box nature of the algorithms has restricted their clinical use. Recent explainability studies aim to show the features that influence the decision of a model the most. The majority of literature reviews of this area have focused on taxonomy, ethics, and the need for explanations. A review of the current applications of explainable deep learning for different medical imaging tasks is presented here. The various approaches, challenges for clinical deployment, and the areas requiring further research are discussed from the practical standpoint of a deep learning researcher designing a system for the clinical end-users.
    Keywords: explainability; explainable AI; XAI; deep learning; medical imaging; diagnosis
    1. Introduction: Computer-aided diagnostics (CAD) using artificial intelligence (AI) provides a promising way to make the diagnosis process more efficient and available to the masses. Deep learning is the leading AI method for a wide range of tasks, including medical imaging problems.
  • Double Backpropagation for Training Autoencoders Against Adversarial Attack
    Double Backpropagation for Training Autoencoders against Adversarial Attack. Chengjin Sun, Sizhe Chen, and Xiaolin Huang, Senior Member, IEEE.
    Abstract: Deep learning, as is widely known, is vulnerable to adversarial samples. This paper focuses on adversarial attacks on autoencoders. Safety of autoencoders (AEs) is important because they are widely used as a compression scheme for data storage and transmission; however, current autoencoders are easily attacked, i.e., one can slightly modify an input yet obtain totally different codes. The vulnerability is rooted in the sensitivity of the autoencoders, and to enhance robustness we propose to adopt double backpropagation (DBP) to secure autoencoders such as VAE and DRAW. We restrict the gradient from the reconstructed image to the original one so that the autoencoder is not sensitive to trivial perturbations produced by an adversarial attack. After smoothing the gradient by DBP, we further smooth the labels with a Gaussian Mixture Model (GMM), aiming for accurate and robust classification. We demonstrate on MNIST, CelebA, and SVHN that our method leads to a robust autoencoder resistant to attack and, when combined with the GMM, a robust classifier able to perform image transition and immune to adversarial attack.
    Index Terms: double backpropagation, autoencoder, network robustness, GMM.
    1 Introduction: In the past few years, deep neural networks have been greatly developed and successfully used in a vast range of fields, such as pattern recognition, intelligent robots, automatic control, and medicine [1]. Despite the great success, researchers have found the vulnerability of deep neural networks to attacks on the feature [9], [10], [11], [12], [13] or the network structure [3], [14], [15]. Adversarial attack and its defense revolve around a small perturbation ∆x and a large resulting difference between f(x + ∆x) and f(x).
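The core of double backpropagation, penalizing the gradient of the reconstruction error with respect to the input so that tiny input perturbations cannot change the output much, can be sketched as follows. This is a generic autoencoder sketch, not the paper's exact architecture; `lam` is a hypothetical penalty weight.

```python
import torch

def dbp_loss(autoencoder, x, lam=1.0):
    """Reconstruction loss plus a double-backpropagation penalty that discourages
    the reconstruction error from changing under small input perturbations."""
    x = x.clone().requires_grad_(True)
    x_hat = autoencoder(x)
    recon = ((x_hat - x) ** 2).sum()
    # gradient of the reconstruction error w.r.t. the input, kept in the graph
    # (create_graph=True) so the penalty itself can be backpropagated through
    (grad_x,) = torch.autograd.grad(recon, x, create_graph=True)
    return recon + lam * (grad_x ** 2).sum()
```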
  • Generative Models
    Lecture 11: Generative Models. Fei-Fei Li, Justin Johnson & Serena Yeung. May 9, 2019.
    Administrative
    ● A3 is out. Due May 22.
    ● Milestone is due next Wednesday.
      ○ Read the Piazza post for milestone requirements.
      ○ Need to finish data preprocessing and initial results by then.
    ● Don't discuss the exam yet, since people are still taking it.
    Overview
    ● Unsupervised Learning
    ● Generative Models
      ○ PixelRNN and PixelCNN
      ○ Variational Autoencoders (VAE)
      ○ Generative Adversarial Networks (GAN)
    Supervised vs Unsupervised Learning. Supervised Learning: data is (x, y), where x is data and y is a label; the goal is to learn a function to map x -> y. Examples: classification (e.g. "Cat"), regression, object detection (e.g. "DOG, DOG, CAT"), semantic segmentation (e.g. "GRASS, CAT, TREE, SKY"), image captioning, etc. (Example images are CC0 public domain.)
  • Unsupervised Recurrent Neural Network Grammars
    Unsupervised Recurrent Neural Network Grammars. Yoon Kim, Alexander Rush, Lei Yu, Adhiguna Kuncoro, Chris Dyer, Gábor Melis. Code: https://github.com/harvardnlp/urnng
    Language Modeling & Grammar Induction. Goal of language modeling: assign high likelihood to held-out data. Goal of grammar induction: learn linguistically meaningful tree structures without supervision. Incompatible? For good language modeling performance, one needs few independence assumptions and flexible models (e.g. deep networks); for grammar induction, one needs strong independence assumptions for tractable training and to imbue inductive bias (e.g. context-free grammars).
    This Work: Unsupervised Recurrent Neural Network Grammars. Use a flexible generative model without any explicit independence assumptions (RNNG), which gives good LM performance, together with variational inference using a structured inference network (a CRF parser) to regularize the posterior, which yields linguistically meaningful trees.
    Background: Recurrent Neural Network Grammars [Dyer et al. 2016]. A structured joint generative model of sentence x and tree z, p_θ(x, z): generate the next word conditioned on the partially completed syntax tree, a hierarchical generative process (cf. the flat generative process of an RNN).
    Background: Recurrent Neural Network Language Models. Standard RNNLMs use flat left-to-right generation: x_t ∼ p_θ(x | x_1, …, x_{t−1}) = softmax(W h_{t−1} + b).
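The flat RNN language model in the last line above can be written down directly; here is a minimal PyTorch sketch of x_t ∼ softmax(W h_{t−1} + b). Dimensions and names are illustrative, not from the paper.

```python
import torch
from torch import nn

class TinyRNNLM(nn.Module):
    """Flat left-to-right language model: h_t = RNN(h_{t-1}, x_t),
    p(x_{t+1} | x_{<=t}) = softmax(W h_t + b)."""
    def __init__(self, vocab_size=10000, d_emb=128, d_hid=256):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_emb)
        self.rnn = nn.GRU(d_emb, d_hid, batch_first=True)
        self.proj = nn.Linear(d_hid, vocab_size)   # W, b

    def forward(self, tokens):                      # tokens: (batch, T) int64
        h, _ = self.rnn(self.emb(tokens))           # (batch, T, d_hid)
        logits = self.proj(h)                       # next-token logits at each position
        return torch.log_softmax(logits, dim=-1)
```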
  • Infinite Variational Autoencoder for Semi-Supervised Learning
    Infinite Variational Autoencoder for Semi-Supervised Learning. M. Ehsan Abbasnejad, Anthony Dick, Anton van den Hengel. The University of Adelaide. {ehsan.abbasnejad, anthony.dick, anton.vandenhengel}@adelaide.edu.au
    Abstract: This paper presents an infinite variational autoencoder (VAE) whose capacity adapts to suit the input data. This is achieved using a mixture model where the mixing coefficients are modeled by a Dirichlet process, allowing us to integrate over the coefficients when performing inference. Critically, this then allows us to automatically vary the number of autoencoders in the mixture based on the data. Experiments show the flexibility of our method, particularly for semi-supervised learning, where only a small number of training samples are available.
    1. Introduction: The Variational Autoencoder (VAE) [18] is a newly introduced tool for unsupervised learning of a distribution over data x, used here together with whatever labelled data is available to train a discriminative model for classification. We demonstrate that our infinite VAE outperforms both the classical VAE and standard classification methods, particularly when the number of available labelled samples is small. This is because the infinite VAE is able to more accurately capture the distribution of the unlabelled data. It therefore provides a generative model that allows the discriminative model, which is trained based on its output, to be learnt more effectively from a small number of samples. The main contribution of this paper is twofold: (1) we provide a Bayesian non-parametric model for combining autoencoders, in particular variational autoencoders, bridging the gap between non-parametric Bayesian methods and deep neural networks; (2) we provide a semi-supervised learning approach that utilizes the infinite mixture of autoencoders learned by our model for prediction from a small number of labeled examples.
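A minimal sketch of the non-parametric ingredient, my own illustration rather than the authors' code: a truncated stick-breaking construction of Dirichlet-process mixing weights, which could be used to weight a collection of autoencoders.

```python
import numpy as np

def stick_breaking_weights(alpha, truncation, rng=None):
    """Truncated stick-breaking construction of Dirichlet process mixing weights:
    v_k ~ Beta(1, alpha),  pi_k = v_k * prod_{j<k} (1 - v_j)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(1.0, alpha, size=truncation)
    leftover = np.concatenate(([1.0], np.cumprod(1.0 - v)[:-1]))
    pi = v * leftover
    pi[-1] += 1.0 - pi.sum()   # absorb the remaining stick mass into the last component
    return pi

# e.g. mixing weights over a (truncated) infinite mixture of variational autoencoders
weights = stick_breaking_weights(alpha=2.0, truncation=10)
```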
  • Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges
    Deep Learning for Source Code Modeling and Generation: Models, Applications and Challenges. Triet H. M. Le, Hao Chen, Muhammad Ali Babar. The University of Adelaide.
    Abstract: Deep Learning (DL) techniques for Natural Language Processing have been evolving remarkably fast. Recently, the DL advances in language modeling, machine translation and paragraph understanding are so prominent that the potential of DL in Software Engineering cannot be overlooked, especially in the field of program learning. To facilitate further research and applications of DL in this field, we provide a comprehensive review to categorize and investigate existing DL methods for source code modeling and generation. To address the limitations of the traditional source code models, we formulate common program learning tasks under an encoder-decoder framework. After that, we introduce recent DL mechanisms suitable to solve such problems. Then, we present the state-of-the-art practices and discuss their challenges, with some recommendations for practitioners and researchers as well.
    CCS Concepts: General and reference → Surveys and overviews; Computing methodologies → Neural networks; Natural language processing; Software and its engineering → Software notations and tools; Source code generation.
    Additional Key Words and Phrases: Deep learning, Big Code, Source code modeling, Source code generation
    1 Introduction: Deep Learning (DL) has recently emerged as an important branch of Machine Learning (ML) because of its incredible performance in Computer Vision and Natural Language Processing (NLP) [75]. In the field of NLP, it has been shown that DL models can greatly improve the performance of many classic NLP tasks, such as semantic role labeling [96], named entity recognition [209], machine translation [256], and question answering [174].
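The encoder-decoder framing mentioned in the abstract can be illustrated with a minimal sequence-to-sequence sketch. This is a generic example of the framework, not a model from the survey; all names and dimensions are hypothetical.

```python
import torch
from torch import nn

class TinySeq2Seq(nn.Module):
    """Minimal encoder-decoder: encode a source token sequence into a context
    vector, then decode a target sequence token by token (teacher forcing)."""
    def __init__(self, src_vocab=5000, tgt_vocab=5000, d=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, d)
        self.tgt_emb = nn.Embedding(tgt_vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.decoder = nn.GRU(d, d, batch_first=True)
        self.out = nn.Linear(d, tgt_vocab)

    def forward(self, src, tgt_in):
        _, h = self.encoder(self.src_emb(src))           # h: (1, batch, d) context
        dec, _ = self.decoder(self.tgt_emb(tgt_in), h)   # condition decoding on h
        return self.out(dec)                             # logits over the target vocab
```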
  • Variational Autoencoder for End-To-End Control of Autonomous Driving with Novelty Detection and Training De-Biasing
    Variational Autoencoder for End-to-End Control of Autonomous Driving with Novelty Detection and Training De-biasing. Alexander Amini, Wilko Schwarting, Guy Rosman, Brandon Araki, Sertac Karaman, Daniela Rus.
    Abstract: This paper introduces a new method for end-to-end training of deep neural networks (DNNs) and evaluates it in the context of autonomous driving. DNN training has been shown to result in high accuracy for perception-to-action learning given sufficient training data. However, the trained models may fail without warning in situations with insufficient or biased training data. In this paper, we propose and evaluate a novel architecture for self-supervised learning of latent variables to detect the insufficiently trained situations. Our method also addresses training data imbalance by learning a set of underlying latent variables that characterize the training data and evaluate potential biases. We show how these latent distributions can be leveraged to adapt and accelerate the training pipeline by training on only a fraction of the total dataset. We evaluate our approach on a challenging dataset for driving, collected from a full-scale autonomous vehicle. Our method provides qualitative explanations for the latent variables learned in the model. Finally, we show how our model can additionally be trained as an end-to-end controller, directly outputting a steering control command for an autonomous vehicle.
    Fig. 1: Semi-supervised end-to-end control (panel labels: uncertainty estimation & novelty detection; dataset debiasing across weather (snow), adjacent vehicles, and road surface; steering control command; resampled data distribution; unsupervised latent variables; sample-efficient accelerated training).
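As a hypothetical illustration of latent-variable debiasing (my own sketch, not the paper's procedure), one can estimate how densely populated each region of latent space is and resample training examples inversely to that density, so rare situations are seen more often.

```python
import numpy as np

def debias_sampling_weights(latents, bins=10, smoothing=1e-3):
    """Hypothetical debiasing scheme: per-dimension histogram density estimate over
    the latent codes, then sampling weights inversely proportional to the density."""
    n, d = latents.shape
    density = np.ones(n)
    for j in range(d):
        hist, edges = np.histogram(latents[:, j], bins=bins, density=True)
        idx = np.clip(np.digitize(latents[:, j], edges[1:-1]), 0, bins - 1)
        density *= hist[idx] + smoothing
    w = 1.0 / density
    return w / w.sum()

# usage: probs = debias_sampling_weights(z_codes)
#        batch_idx = np.random.choice(len(probs), size=64, p=probs)
```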
  • GRAINS: Generative Recursive Autoencoders for Indoor Scenes
    GRAINS: Generative Recursive Autoencoders for INdoor Scenes. Manyi Li (Shandong University and Simon Fraser University), Akshay Gadi Patil (Simon Fraser University), Kai Xu (National University of Defense Technology), Siddhartha Chaudhuri (Adobe Research and IIT Bombay), Owais Khan (IIT Bombay), Ariel Shamir (The Interdisciplinary Center), Changhe Tu (Shandong University), Baoquan Chen (Peking University), Daniel Cohen-Or (Tel Aviv University), Hao Zhang (Simon Fraser University).
    Fig. 1 (panels: bedrooms, living rooms, kitchen): We present a generative recursive neural network (RvNN) based on a variational autoencoder (VAE) to learn hierarchical scene structures, enabling us to easily generate plausible 3D indoor scenes in large quantities and varieties (see generated scenes of kitchens, bedrooms, offices, and living rooms). Using the trained RvNN-VAE, a novel 3D scene can be generated from a random vector drawn from a Gaussian distribution in a fraction of a second.
    Abstract: We present a generative neural network which enables us to generate plausible 3D indoor scenes in large quantities and varieties, easily and highly efficiently. Our key observation is that indoor scene structures are inherently hierarchical. Hence, our network is not convolutional; it is a recursive neural network, or RvNN. Using a dataset of annotated scene hierarchies, we train a variational recursive autoencoder, or RvNN-VAE, which performs scene object grouping during its encoding phase and scene generation during decoding, following the learned distribution. We coin our method GRAINS, for Generative Recursive Autoencoders for INdoor Scenes. We demonstrate the capability of GRAINS to generate plausible and diverse 3D indoor scenes and compare with existing methods for 3D scene synthesis. We show applications of GRAINS including 3D scene modeling from 2D layouts, scene editing, and semantic scene segmentation via PointNet, whose performance is boosted by the large quantity and variety of 3D scenes generated by our method.
  • An Introduction to Variational Autoencoders
    An Introduction to Variational Autoencoders. Diederik P. Kingma (Google) and Max Welling (University of Amsterdam, Qualcomm). Full text available at: http://dx.doi.org/10.1561/2200000056
    Preferred citation: D. P. Kingma and M. Welling. An Introduction to Variational Autoencoders. Foundations and Trends in Machine Learning, vol. 12, no. 4, pp. 307–392, 2019. ISBN: 978-1-68083-623-3. © 2019 D. P. Kingma and M. Welling.
    Other titles in Foundations and Trends in Machine Learning: Computational Optimal Transport (Gabriel Peyré and Marco Cuturi, ISBN: 978-1-68083-550-2); An Introduction to Deep Reinforcement Learning (Vincent François-Lavet, Peter Henderson, Riashat Islam, Marc G. Bellemare, and Joelle Pineau, ISBN: 978-1-68083-538-0); An Introduction to Wishart Matrix Moments (Adrian N. Bishop, Pierre Del Moral, and Angèle Niclas, ISBN: 978-1-68083-506-9); A Tutorial on Thompson Sampling (Daniel J. Russo, Benjamin Van Roy, Abbas Kazerouni, Ian Osband, and Zheng Wen, ISBN: 978-1-68083-470-3).
  • Unsupervised Recurrent Neural Network Grammars
    Unsupervised Recurrent Neural Network Grammars. Yoon Kim (Harvard University), Alexander M. Rush (Harvard University), Lei Yu (DeepMind), Adhiguna Kuncoro (University of Oxford and DeepMind), Chris Dyer (DeepMind), Gábor Melis (DeepMind).
    Abstract: Recurrent neural network grammars (RNNG) are generative models of language which jointly model syntax and surface structure by incrementally generating a syntax tree and sentence in a top-down, left-to-right order. Supervised RNNGs achieve strong language modeling and parsing performance, but require an annotated corpus of parse trees. In this work, we experiment with unsupervised learning of RNNGs. Since directly marginalizing over the space of latent trees is intractable, we instead apply amortized variational inference. To maximize the evidence lower bound, we develop an inference network parameterized as a neural CRF constituency parser. On language modeling, unsupervised RNNGs perform as well as their supervised counterparts on benchmarks in English and Chinese.
    1 Introduction: The standard setup for unsupervised structure learning is to define a generative model p_θ(x, z) over observed data x (e.g. a sentence) and unobserved structure z (e.g. a parse tree or part-of-speech sequence), and to maximize the log marginal likelihood log p_θ(x) = log Σ_z p_θ(x, z). Successful approaches to unsupervised parsing have made strong conditional independence assumptions (e.g. context-freeness) and employed auxiliary objectives (Klein and Manning, 2002) or priors (Johnson et al., 2007). These strategies imbue the learning process with inductive biases that guide the model to discover meaningful structures while allowing tractable algorithms for marginalization; however, they come at the expense of language modeling performance, particularly compared to sequential neural models that make no independence assumptions.
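Since the abstract turns on maximizing an evidence lower bound with an amortized inference network, here is a generic single-sample ELBO estimator as a sketch, using a small categorical latent for simplicity rather than the paper's CRF parser; names are illustrative only.

```python
import torch

def elbo_single_sample(log_joint, q_logits):
    """One-sample Monte Carlo estimate of the ELBO
        E_{q(z|x)}[ log p(x, z) - log q(z|x) ] <= log p(x)
    for a categorical latent z: log_joint[k] = log p(x, z=k),
    q_logits[k] parameterizes the amortized posterior q(z=k | x)."""
    q = torch.distributions.Categorical(logits=q_logits)
    z = q.sample()
    return log_joint[z] - q.log_prob(z)

# With a small latent space the bound can also be computed exactly:
# q_probs = q_logits.softmax(-1)
# elbo = (q_probs * (log_joint - q_probs.log())).sum()
```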