Deep Learning Part Three - Deep Generative Models

Total Page:16

File Type:pdf, Size:1020Kb

Deep Learning Part Three - Deep Generative Models DEEP LEARNING PART THREE - DEEP GENERATIVE MODELS CS/CNS/EE 155 - MACHINE LEARNING & DATA MINING GENERATIVE MODELS number of features number of data examples DATA !3 feature 3 example 1 feature 2 example 3 example 2 feature 1 DATA DISTRIBUTION !4 feature 3 feature 2 feature 1 EMPIRICAL DATA DISTRIBUTION !5 feature 3 feature 2 feature 1 DENSITY ESTIMATION estimating the density of the empirical data distribution !6 GENERATIVE MODEL a model of the density of the data distribution !7 why learn a generative model? !8 generative models can generate new data examples feature 3 feature 2 feature 1 generated examples !9 BigGan, Brock et al., 2019 Glow, Kingma & Dhariwal, 2018 WaveNet, van den Oord et al., 2016 Learning Particle Physics by Example, de Oliveira et al., 2017 MidiNet, Yang et al., 2017 !10 GQN, Eslami et al., 2018 PlaNet, Hafner et al., 2018 !11 generative models can extract structure from data feature 2 feature 1 !12 generative models can extract structure from data feature 2 labeled examples feature 1 !13 generative models can extract structure from data feature 2 labeled examples feature 1 can make it easier to learn and generalize on new tasks !14 VLAE, Zhao et al., 2017 beta-VAE, Higgins et al., 2016 Disentangled Sequential Autoencoder, Li & Mandt, 2018 InfoGAN, Chen et al., 2016 !15 modeling the data distribution pdata(x) p p✓(x) data: pdata(x) x p (x) ⇠ data model: p✓(x) parameters: ✓ x maximum likelihood estimation find the model that assigns the maximum likelihood to the data ✓⇤ = arg min DKL(pdata(x) p✓(x)) ✓ || = DKL(pdata(x) argp✓(xmin)) = Ep (x) [log pdata(x) log p✓(x)] || ✓ data − N 1 (i) = arg max Ep (x) [log p✓(x)] log p✓(x ) ✓ data ⇡ N i=1 X !16 bias-variance trade-off pdata(x) p✓(x) x p (x) ⇠ data p p p x x x large bias large variance model complexity !17 deep generative model a generative model that uses deep neural networks to model the data distribution !18 autoregressive explicit models latent variable models invertible explicit implicit latent variable models latent variable models !19 autoregressive models conditional probability distributions This morning I woke up at x1 x2 x3 x4 x5 x6 x7 What is p(x7 x1:6) ? | p(x x ) 7| 1:6 1 0 eight seven nine once dawn home encyclopedia x7 !21 a data example x1 x2 x3 xM number of features p(x)=p(x1,x2,...,xM ) !22 chain rule of probability split the joint distribution into a product of conditional distributions x1 x2 x3 xM p(x)=p(x1,x2,...,xM ) p(a, b) p(a b)= p(a, b)=p(a b)p(b) definition of | p(b) | conditional probability recursively apply to p(x1,x2,...,xM )=: p(x1 x2,...,xM )p(x2,...,xM ) | p(x ,x ,...,x )=p(x )p(x ,...,x x ) 1 2 M 1 2 M | 1 p(x1,x2,...,xM )=p(x1)p(x2 x1) ...p(xM x1,...,xM 1) | | − M p(x1,...,xM )= p(xj x1,...,xj 1) | − j=1 Y note: conditioning order is arbitrary !23 model the conditional distributions of the data learn to auto-regress each value x1 x2 x3 xM !24 model the conditional distributions of the data learn to auto-regress each value p✓(x1) x1 x2 x3 xM !25 model the conditional distributions of the data learn to auto-regress each value p (x x ) ✓ 2| 1 MODEL x1 x2 x3 xM !26 model the conditional distributions of the data learn to auto-regress each value p (x x ,x ) ✓ 3| 1 2 MODEL x1 x2 x3 xM !27 model the conditional distributions of the data learn to auto-regress each value p (x x ,x ,x ) ✓ 4| 1 2 3 MODEL x1 x2 x3 xM !28 model the conditional distributions of the data learn to auto-regress each value p (x x ) ✓ M | <M MODEL x1 x2 x3 xM !29 maximum likelihood estimation maximize the log-likelihood (under the model) of the true data examples N 1 (i) ✓⇤ == arg max Ep (x) [log p✓(x)] log p✓(x ) ✓ data ⇡ N i=1 X for auto-regressive models: M log p (x) = log p (x x ) ✓ 0 ✓ j| <j 1 j=1 Y @ A M = log p (x x ) ✓ j| <j j=1 X N M 1 (i) (i) ✓⇤ = arg max log p✓(xj x<j) ✓ N | i=1 j=1 X X !30 models can parameterize conditional distributions using a recurrent neural network p✓(x1) p (x x ) p (x x ) p✓(x4 x<4) p (x x ) p (x x ) p (x x ) ✓ 2| 1 ✓ 3| <3 | ✓ 5| <5 ✓ 6| <6 ✓ 7| <7 x1 x2 x3 x4 x5 x6 x7 see Deep Learning (Chapter 10), Goodfellow et al., 2016 !31 models can parameterize conditional distributions using a recurrent neural network p✓(x1) p (x x ) p (x x ) p✓(x4 x<4) p (x x ) p (x x ) p (x x ) ✓ 2| 1 ✓ 3| <3 | ✓ 5| <5 ✓ 6| <6 ✓ 7| <7 x1 x2 x3 x4 x5 x6 x7 see Deep Learning (Chapter 10), Goodfellow et al., 2016 !32 models can parameterize conditional distributions using a recurrent neural network The Unreasonable Effectiveness of Recurrent Neural Networks, Karpathy, 2015 Pixel Recurrent Neural Networks, van den Oord et al., 2016 !33 models can condition on a local window using convolutional neural networks p✓(x1) p (x x ) p✓(x3 x1:2) p✓(x4 x1:3) p✓(x5 x2:4) p✓(x6 x3:5) p✓(x7 x4:6) ✓ 2| 1 | | | | | x1 x2 x3 x4 x5 x6 x7 !34 models can condition on a local window using convolutional neural networks p✓(x1) p (x x ) p✓(x3 x1:2) p✓(x4 x1:3) p✓(x5 x2:4) p✓(x6 x3:5) p✓(x7 x4:6) ✓ 2| 1 | | | | | x1 x2 x3 x4 x5 x6 x7 !35 models can condition on a local window using convolutional neural networks Pixel Recurrent Neural Networks, Conditional Image Generation with PixelCNN Decoders, van den Oord et al., 2016 van den Oord et al., 2016 WaveNet: A Generative Model for Raw Audio, van den Oord et al., 2016 !36 output distributions need to choose a form for the conditional output distribution, i.e. how do we express p ( x j x 1 ,...,x j 1 ) ? | − model the data as discrete variables categorical output model the data as continuous variables Gaussian, logistic, etc. output !37 sampling sample from the model by drawing from the output distribution p✓(x1) p (x x ) p (x x ) p✓(x4 x<4) p (x x ) p (x x ) p (x x ) ✓ 2| 1 ✓ 3| <3 | ✓ 5| <5 ✓ 6| <6 ✓ 7| <7 !38 question what issues might arise with sampling from the model? training sampling p✓(x1) p (x x ) p (x x ) p✓(x4 x<4) p (x x ) p (x x ) p (x x ) p✓(x1) p (x x ) p✓(x3 x1:2) p✓(x4 x1:3) p✓(x5 x2:4) p✓(x6 x3:5) p✓(x7 x4:6) ✓ 2 1 ✓ 3 <3 ✓ 5 <5 ✓ 6 <6 ✓ 7 <7 ✓ 2| 1 | | | | | | | | | | | x1 x2 x3 x4 x5 x6 x7 errors in the model distribution can accumulate, leading to poor samples see teacher forcing Deep Learning (Chapter 10), Goodfellow et al., 2016 !39 example applications text images Pixel Recurrent Neural Networks, van den Oord et al., 2016 speech WaveNet: A Generative Model for Raw Audio, van den Oord et al., 2016 !40 Attention is All You Need, Vaswani et al., 2017 Improving Language Understanding by Generative Pre-Training, Radford et al., 2018 Language Models as Unsupervised Multi-task Learners, Radford et al., 2019 !41 explicit latent variable models latent variables result in mixtures of distributions p x approach 1 directly fit a distribution to the data p (x)= (x; µ, σ2) ✓ N approach 2 use a latent variable to model the data 2 p✓(x, z)=p✓(xpz)(px,✓( zz)=) (x; µ (z), σ (z)) (z; µ ) | ✓ N x x B z p✓(x)= p✓(x, z) z X = µ (x; µ (1), σ2(1))+(1 µ ) (x; µ (0), σ2(0)) z ·N x x − z ·N x x ⇢ mixture component ⇢ mixture component !43 probabilistic graphical models provide a framework for modeling relationships between random variables PLATE NOTATION observed variable directed set of variables x y unobserved (latent) x variable undirected x y N !44 question represent an auto-regressive model of 3 random variables with plate notation ✓ p (x ) p (x x ) p (x x ,x ) ✓ 1 ✓ 2| 1 ✓ 3| 1 2 x1 x2 x3 N !45 comparing auto-regressive models and latent variable models ✓ ✓ z p✓(z) p✓(x1) p✓(x2 x1) p✓(x3 x1,x2) p (x z) p (x z) p (x z) | | ✓ 1| ✓ 2| ✓ 3| x1 x2 x3 x1 x2 x3 N N auto-regressive model latent variable model !46 directed latent variable model Generation GENERATIVE MODEL p(x, z)=p(x z)p(z) z ✓ | joint prior conditional likelihood 1. sample z from p(z) x 2. use z samples to sample x from p(x z) | N intuitive example: graphics engine object ~ p(objects) RENDER lighting ~ p(lighting) background ~ p(bg) !47 directed latent variable model Posterior Inference INFERENCE p(x, z) joint p(z x)= | p(x) z marginal ✓ posterior likelihood use Bayes’ rule provides conditional distribution x over latent variables N intuitive example what is the probability that I am observing a cat given these pixel observations? p( |cat) p(cat) ______________ p(cat | ) = p( ) observation !48 directed latent variable model Model Evaluation MARGINALIZATION marginal p(x)= p(x, z)dz z likelihood ✓ Z joint to evaluate the likelihood of an observation, we need to marginalize over all latent variables x i.e.
Recommended publications
  • Automatic Feature Learning Using Recurrent Neural Networks
    Predicting purchasing intent: Automatic Feature Learning using Recurrent Neural Networks Humphrey Sheil Omer Rana Ronan Reilly Cardiff University Cardiff University Maynooth University Cardiff, Wales Cardiff, Wales Maynooth, Ireland [email protected] [email protected] [email protected] ABSTRACT influence three out of four major variables that affect profit. Inaddi- We present a neural network for predicting purchasing intent in an tion, merchants increasingly rely on (and pay advertising to) much Ecommerce setting. Our main contribution is to address the signifi- larger third-party portals (for example eBay, Google, Bing, Taobao, cant investment in feature engineering that is usually associated Amazon) to achieve their distribution, so any direct measures the with state-of-the-art methods such as Gradient Boosted Machines. merchant group can use to increase their profit is sorely needed. We use trainable vector spaces to model varied, semi-structured input data comprising categoricals, quantities and unique instances. McKinsey A.T. Kearney Affected by Multi-layer recurrent neural networks capture both session-local shopping intent and dataset-global event dependencies and relationships for user Price management 11.1% 8.2% Yes sessions of any length. An exploration of model design decisions Variable cost 7.8% 5.1% Yes including parameter sharing and skip connections further increase Sales volume 3.3% 3.0% Yes model accuracy. Results on benchmark datasets deliver classifica- Fixed cost 2.3% 2.0% No tion accuracy within 98% of state-of-the-art on one and exceed state-of-the-art on the second without the need for any domain / Table 1: Effect of improving different variables on operating dataset-specific feature engineering on both short and long event profit, from [22].
    [Show full text]
  • Exploring the Potential of Sparse Coding for Machine Learning
    Portland State University PDXScholar Dissertations and Theses Dissertations and Theses 10-19-2020 Exploring the Potential of Sparse Coding for Machine Learning Sheng Yang Lundquist Portland State University Follow this and additional works at: https://pdxscholar.library.pdx.edu/open_access_etds Part of the Applied Mathematics Commons, and the Computer Sciences Commons Let us know how access to this document benefits ou.y Recommended Citation Lundquist, Sheng Yang, "Exploring the Potential of Sparse Coding for Machine Learning" (2020). Dissertations and Theses. Paper 5612. https://doi.org/10.15760/etd.7484 This Dissertation is brought to you for free and open access. It has been accepted for inclusion in Dissertations and Theses by an authorized administrator of PDXScholar. Please contact us if we can make this document more accessible: [email protected]. Exploring the Potential of Sparse Coding for Machine Learning by Sheng Y. Lundquist A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science Dissertation Committee: Melanie Mitchell, Chair Feng Liu Bart Massey Garrett Kenyon Bruno Jedynak Portland State University 2020 © 2020 Sheng Y. Lundquist Abstract While deep learning has proven to be successful for various tasks in the field of computer vision, there are several limitations of deep-learning models when com- pared to human performance. Specifically, human vision is largely robust to noise and distortions, whereas deep learning performance tends to be brittle to modifi- cations of test images, including being susceptible to adversarial examples. Addi- tionally, deep-learning methods typically require very large collections of training examples for good performance on a task, whereas humans can learn to perform the same task with a much smaller number of training examples.
    [Show full text]
  • Text-To-Speech Synthesis
    Generative Model-Based Text-to-Speech Synthesis Heiga Zen (Google London) February rd, @MIT Outline Generative TTS Generative acoustic models for parametric TTS Hidden Markov models (HMMs) Neural networks Beyond parametric TTS Learned features WaveNet End-to-end Conclusion & future topics Outline Generative TTS Generative acoustic models for parametric TTS Hidden Markov models (HMMs) Neural networks Beyond parametric TTS Learned features WaveNet End-to-end Conclusion & future topics Text-to-speech as sequence-to-sequence mapping Automatic speech recognition (ASR) “Hello my name is Heiga Zen” ! Machine translation (MT) “Hello my name is Heiga Zen” “Ich heiße Heiga Zen” ! Text-to-speech synthesis (TTS) “Hello my name is Heiga Zen” ! Heiga Zen Generative Model-Based Text-to-Speech Synthesis February rd, of Speech production process text (concept) fundamental freq voiced/unvoiced char freq transfer frequency speech transfer characteristics magnitude start--end Sound source fundamental voiced: pulse frequency unvoiced: noise modulation of carrier wave by speech information air flow Heiga Zen Generative Model-Based Text-to-Speech Synthesis February rd, of Typical ow of TTS system TEXT Sentence segmentation Word segmentation Text normalization Text analysis Part-of-speech tagging Pronunciation Speech synthesis Prosody prediction discrete discrete Waveform generation ) NLP Frontend discrete continuous SYNTHESIZED ) SEECH Speech Backend Heiga Zen Generative Model-Based Text-to-Speech Synthesis February rd, of Rule-based, formant synthesis []
    [Show full text]
  • Unsupervised Speech Representation Learning Using Wavenet Autoencoders Jan Chorowski, Ron J
    1 Unsupervised speech representation learning using WaveNet autoencoders Jan Chorowski, Ron J. Weiss, Samy Bengio, Aaron¨ van den Oord Abstract—We consider the task of unsupervised extraction speaker gender and identity, from phonetic content, properties of meaningful latent representations of speech by applying which are consistent with internal representations learned autoencoding neural networks to speech waveforms. The goal by speech recognizers [13], [14]. Such representations are is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being desired in several tasks, such as low resource automatic speech invariant to confounding low level details in the signal such as recognition (ASR), where only a small amount of labeled the underlying pitch contour or background noise. Since the training data is available. In such scenario, limited amounts learned representation is tuned to contain only phonetic content, of data may be sufficient to learn an acoustic model on the we resort to using a high capacity WaveNet decoder to infer representation discovered without supervision, but insufficient information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the to learn the acoustic model and a data representation in a fully kind of constraint that is applied to the latent representation. supervised manner [15], [16]. We compare three variants: a simple dimensionality reduction We focus on representations learned with autoencoders bottleneck, a Gaussian Variational Autoencoder (VAE), and a applied to raw waveforms and spectrogram features and discrete Vector Quantized VAE (VQ-VAE). We analyze the quality investigate the quality of learned representations on LibriSpeech of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately re- [17].
    [Show full text]
  • Sparse Feature Learning for Deep Belief Networks
    Sparse Feature Learning for Deep Belief Networks Marc'Aurelio Ranzato1 Y-Lan Boureau2,1 Yann LeCun1 1 Courant Institute of Mathematical Sciences, New York University 2 INRIA Rocquencourt {ranzato,ylan,[email protected]} Abstract Unsupervised learning algorithms aim to discover the structure hidden in the data, and to learn representations that are more suitable as input to a supervised machine than the raw input. Many unsupervised methods are based on reconstructing the input from the representation, while constraining the representation to have cer- tain desirable properties (e.g. low dimension, sparsity, etc). Others are based on approximating density by stochastically reconstructing the input from the repre- sentation. We describe a novel and efficient algorithm to learn sparse represen- tations, and compare it theoretically and experimentally with a similar machine trained probabilistically, namely a Restricted Boltzmann Machine. We propose a simple criterion to compare and select different unsupervised machines based on the trade-off between the reconstruction error and the information content of the representation. We demonstrate this method by extracting features from a dataset of handwritten numerals, and from a dataset of natural image patches. We show that by stacking multiple levels of such machines and by training sequentially, high-order dependencies between the input observed variables can be captured. 1 Introduction One of the main purposes of unsupervised learning is to produce good representations for data, that can be used for detection, recognition, prediction, or visualization. Good representations eliminate irrelevant variabilities of the input data, while preserving the information that is useful for the ul- timate task. One cause for the recent resurgence of interest in unsupervised learning is the ability to produce deep feature hierarchies by stacking unsupervised modules on top of each other, as pro- posed by Hinton et al.
    [Show full text]
  • Comparison of Feature Learning Methods for Human Activity Recognition Using Wearable Sensors
    sensors Article Comparison of Feature Learning Methods for Human Activity Recognition Using Wearable Sensors Frédéric Li 1,*, Kimiaki Shirahama 1 ID , Muhammad Adeel Nisar 1, Lukas Köping 1 and Marcin Grzegorzek 1,2 1 Research Group for Pattern Recognition, University of Siegen, Hölderlinstr 3, 57076 Siegen, Germany; [email protected] (K.S.); [email protected] (M.A.N.); [email protected] (L.K.); [email protected] (M.G.) 2 Department of Knowledge Engineering, University of Economics in Katowice, Bogucicka 3, 40-226 Katowice, Poland * Correspondence: [email protected]; Tel.: +49-271-740-3973 Received: 16 January 2018; Accepted: 22 February 2018; Published: 24 February 2018 Abstract: Getting a good feature representation of data is paramount for Human Activity Recognition (HAR) using wearable sensors. An increasing number of feature learning approaches—in particular deep-learning based—have been proposed to extract an effective feature representation by analyzing large amounts of data. However, getting an objective interpretation of their performances faces two problems: the lack of a baseline evaluation setup, which makes a strict comparison between them impossible, and the insufficiency of implementation details, which can hinder their use. In this paper, we attempt to address both issues: we firstly propose an evaluation framework allowing a rigorous comparison of features extracted by different methods, and use it to carry out extensive experiments with state-of-the-art feature learning approaches. We then provide all the codes and implementation details to make both the reproduction of the results reported in this paper and the re-use of our framework easier for other researchers.
    [Show full text]
  • Convex Multi-Task Feature Learning
    Convex Multi-Task Feature Learning Andreas Argyriou1, Theodoros Evgeniou2, and Massimiliano Pontil1 1 Department of Computer Science University College London Gower Street London WC1E 6BT UK [email protected] [email protected] 2 Technology Management and Decision Sciences INSEAD 77300 Fontainebleau France [email protected] Summary. We present a method for learning sparse representations shared across multiple tasks. This method is a generalization of the well-known single- task 1-norm regularization. It is based on a novel non-convex regularizer which controls the number of learned features common across the tasks. We prove that the method is equivalent to solving a convex optimization problem for which there is an iterative algorithm which converges to an optimal solution. The algorithm has a simple interpretation: it alternately performs a super- vised and an unsupervised step, where in the former step it learns task-speci¯c functions and in the latter step it learns common-across-tasks sparse repre- sentations for these functions. We also provide an extension of the algorithm which learns sparse nonlinear representations using kernels. We report exper- iments on simulated and real data sets which demonstrate that the proposed method can both improve the performance relative to learning each task in- dependently and lead to a few learned features common across related tasks. Our algorithm can also be used, as a special case, to simply select { not learn { a few common variables across the tasks3. Key words: Collaborative Filtering, Inductive Transfer, Kernels, Multi- Task Learning, Regularization, Transfer Learning, Vector-Valued Func- tions.
    [Show full text]
  • Visual Word2vec (Vis-W2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes
    Visual Word2Vec (vis-w2v): Learning Visually Grounded Word Embeddings Using Abstract Scenes Satwik Kottur1∗ Ramakrishna Vedantam2∗ Jose´ M. F. Moura1 Devi Parikh2 1Carnegie Mellon University 2Virginia Tech 1 2 [email protected],[email protected] {vrama91,parikh}@vt.edu Abstract w2v : farther We propose a model to learn visually grounded word em- eats beddings (vis-w2v) to capture visual notions of semantic stares at relatedness. While word embeddings trained using text have vis-w2v : closer been extremely successful, they cannot uncover notions of eats semantic relatedness implicit in our visual world. For in- stance, although “eats” and “stares at” seem unrelated in stares at text, they share semantics visually. When people are eating Word Embedding something, they also tend to stare at the food. Grounding diverse relations like “eats” and “stares at” into vision re- mains challenging, despite recent progress in vision. We girl girl note that the visual grounding of words depends on seman- eats stares at tics, and not the literal pixels. We thus use abstract scenes ice cream ice cream created from clipart to provide the visual grounding. We find that the embeddings we learn capture fine-grained, vi- sually grounded notions of semantic relatedness. We show Figure 1: We ground text-based word2vec (w2v) embed- improvements over text-only word embeddings (word2vec) dings into vision to capture a complimentary notion of vi- on three tasks: common-sense assertion classification, vi- sual relatedness. Our method (vis-w2v) learns to predict sual paraphrasing and text-based image retrieval. Our code the visual grounding as context for a given word.
    [Show full text]
  • Deep Learning with a Recurrent Network Structure in the Sequence Modeling of Imbalanced Data for ECG-Rhythm Classifier
    algorithms Article Deep Learning with a Recurrent Network Structure in the Sequence Modeling of Imbalanced Data for ECG-Rhythm Classifier Annisa Darmawahyuni 1, Siti Nurmaini 1,* , Sukemi 2, Wahyu Caesarendra 3,4 , Vicko Bhayyu 1, M Naufal Rachmatullah 1 and Firdaus 1 1 Intelligent System Research Group, Universitas Sriwijaya, Palembang 30137, Indonesia; [email protected] (A.D.); [email protected] (V.B.); [email protected] (M.N.R.); [email protected] (F.) 2 Faculty of Computer Science, Universitas Sriwijaya, Palembang 30137, Indonesia; [email protected] 3 Faculty of Integrated Technologies, Universiti Brunei Darussalam, Jalan Tungku Link, Gadong, BE 1410, Brunei; [email protected] 4 Mechanical Engineering Department, Faculty of Engineering, Diponegoro University, Jl. Prof. Soedharto SH, Tembalang, Semarang 50275, Indonesia * Correspondence: [email protected]; Tel.: +62-852-6804-8092 Received: 10 May 2019; Accepted: 3 June 2019; Published: 7 June 2019 Abstract: The interpretation of Myocardial Infarction (MI) via electrocardiogram (ECG) signal is a challenging task. ECG signals’ morphological view show significant variation in different patients under different physical conditions. Several learning algorithms have been studied to interpret MI. However, the drawback of machine learning is the use of heuristic features with shallow feature learning architectures. To overcome this problem, a deep learning approach is used for learning features automatically, without conventional handcrafted features. This paper presents sequence modeling based on deep learning with recurrent network for ECG-rhythm signal classification. The recurrent network architecture such as a Recurrent Neural Network (RNN) is proposed to automatically interpret MI via ECG signal. The performance of the proposed method is compared to the other recurrent network classifiers such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU).
    [Show full text]
  • Deep Graphical Feature Learning for the Feature Matching Problem
    Deep Graphical Feature Learning for the Feature Matching Problem Zhen Zhang∗ Wee Sun Lee Australian Institute for Machine Learning Department of Computer Science School of Computer Science National University of Singapore The University of Adelaide [email protected] [email protected] Feature Point Coordinate Local Abstract Geometric Feature The feature matching problem is a fundamental problem in various areas of computer vision including image regis- Geometric Feature Net Feature tration, tracking and motion analysis. Rich local represen- Similarity tation is a key part of efficient feature matching methods. However, when the local features are limited to the coor- dinate of key points, it becomes challenging to extract rich Local Feature Point Geometric local representations. Traditional approaches use pairwise Coordinate Feature or higher order handcrafted geometric features to get ro- Figure 1: The graph neural netowrk transforms the coordinates bust matching; this requires solving NP-hard assignment into features of the points, so that a simple inference algorithm problems. In this paper, we address this problem by propos- can successfully do feature matching. ing a graph neural network model to transform coordi- nates of feature points into local features. With our local ference problem in geometric feature matching, previous features, the traditional NP-hard assignment problems are works mainly focus on developing efficient relaxation meth- replaced with a simple assignment problem which can be ods [28, 13, 12, 4, 31, 14, 15, 26, 30, 25, 29]. solved efficiently. Promising results on both synthetic and In this paper, we attack the geometric feature match- real datasets demonstrate the effectiveness of the proposed ing problem from a different direction.
    [Show full text]
  • Unsupervised, Backpropagation-Free Convolutional Neural Networks for Representation Learning
    CSNNs: Unsupervised, Backpropagation-free Convolutional Neural Networks for Representation Learning Bonifaz Stuhr Jurgen¨ Brauer University of Applied Sciences Kempten University of Applied Sciences Kempten [email protected] [email protected] Abstract—This work combines Convolutional Neural Networks (CNNs), clustering via Self-Organizing Maps (SOMs) and Heb- Learner Problem bian Learning to propose the building blocks of Convolutional Feedback Value Self-Organizing Neural Networks (CSNNs), which learn repre- sentations in an unsupervised and Backpropagation-free manner. Fig. 1. The Scalar Bow Tie Problem: With this term we describe the current Our approach replaces the learning of traditional convolutional problematic situation in Deep Learning in which most training approaches for layers from CNNs with the competitive learning procedure of neural networks and agents depend only on a single scalar loss or feedback SOMs and simultaneously learns local masks between those value. This single value is used to subsume the performance of an entire layers with separate Hebbian-like learning rules to overcome the neural network or agent even for very complex problems. problem of disentangling factors of variation when filters are learned through clustering. We investigate the learned represen- tation by designing two simple models with our building blocks, parameters. In contrast, humans use multiple feedback types to achieving comparable performance to many methods which use Backpropagation, while we reach comparable performance learn, e.g., when they learn representations of objects, different on Cifar10 and give baseline performances on Cifar100, Tiny sensor modalities are used. With the Scalar Bow Tie Problem ImageNet and a small subset of ImageNet for Backpropagation- we refer to the current situation for training neural networks free methods.
    [Show full text]
  • Sparse Estimation and Dictionary Learning (For Biostatistics?)
    Sparse Estimation and Dictionary Learning (for Biostatistics?) Julien Mairal Biostatistics Seminar, UC Berkeley Julien Mairal Sparse Estimation and Dictionary Learning Methods 1/69 What this talk is about? Why sparsity, what for and how? Feature learning / clustering / sparse PCA; Machine learning: selecting relevant features; Signal and image processing: restoration, reconstruction; Biostatistics: you tell me. Julien Mairal Sparse Estimation and Dictionary Learning Methods 2/69 Part I: Sparse Estimation Julien Mairal Sparse Estimation and Dictionary Learning Methods 3/69 Sparse Linear Model: Machine Learning Point of View i i n i Rp Let (y , x )i=1 be a training set, where the vectors x are in and are called features. The scalars y i are in 1, +1 for binary classification problems. {− } R for regression problems. We assume there is a relation y w⊤x, and solve ≈ n 1 min ℓ(y i , w⊤xi ) + λψ(w) . w∈Rp n Xi=1 regularization empirical risk | {z } | {z } Julien Mairal Sparse Estimation and Dictionary Learning Methods 4/69 Sparse Linear Models: Machine Learning Point of View A few examples: n 1 1 i ⊤ i 2 2 Ridge regression: min (y w x ) + λ w 2. w∈Rp n 2 − k k Xi=1 n 1 i ⊤ i 2 Linear SVM: min max(0, 1 y w x ) + λ w 2. w∈Rp n − k k Xi=1 n 1 i i −y w⊤x 2 Logistic regression: min log 1+ e + λ w 2. w∈Rp n k k Xi=1 Julien Mairal Sparse Estimation and Dictionary Learning Methods 5/69 Sparse Linear Models: Machine Learning Point of View A few examples: n 1 1 i ⊤ i 2 2 Ridge regression: min (y w x ) + λ w 2.
    [Show full text]