Lecture 6: Multilayer Neural Networks & Autoencoder


Topics of this lecture

Multilayer perceptron (MLP)
◦ MLP-based inference (decision making)
◦ Back propagation algorithm

Autoencoder
◦ Training of autoencoder
◦ Autoencoder for "internal representation"
◦ Training with l2-norm regularization
◦ Training with sparsity regularization
◦ Training a deep neural network

MACHINE LEARNING: PRODUCED BY QIANGFU ZHAO (SINCE 2018), ALL RIGHTS RESERVED (C)

Multilayer perceptron

One of the most popular neural network models is the multilayer feedforward neural network, commonly known as the multilayer perceptron (MLP). In an MLP, neurons are arranged in layers: one input layer, one output layer, and several (or many) hidden layers. A three-layer MLP is known as a universal approximator (it can approximate any continuous input-output mapping), but more layers can solve the problem more efficiently (using fewer neurons) and more effectively (with higher accuracy).

(Figure: a three-layer MLP.)

MLP-based inference

Suppose that x^{(i)} is the output of the i-th layer, i = 0, 1, ..., K, and that N_i is the dimension of the i-th layer:
◦ x^{(0)} = x ∈ R^{N_0}: the external input (N_0 is the number of inputs);
◦ x^{(K)} = y ∈ R^{N_K}: the final output (N_K is the number of classes).

The output of the MLP can be found layer by layer (starting from i = 1) as follows:

u^{(i)} = W^{(i)} x^{(i-1)} - \theta^{(i)}    (4)
x^{(i)} = g(u^{(i)})    (5)

where u_j^{(i)} is called the effective input of the j-th neuron in the i-th layer.

(Figure: the layers 0, 1, ..., K of the MLP, from the input x^{(0)} = x ∈ R^{N_0} at the bottom to the output y = x^{(K)} ∈ R^{N_K} at the top.)

Physical meaning of the outputs

The activation functions of the hidden layers are usually the same. For the output layer, the following softmax function is often used for pattern classification:

y_n = x_n^{(K)} = \frac{\exp(u_n^{(K)})}{\sum_{m=1}^{N_K} \exp(u_m^{(K)})}, \quad n = 1, 2, ..., N_K    (6)

This output can be considered the conditional probability p(C_n | x). That is, through training, the MLP can approximate the posterior probability, so the MLP can be used as a set of discriminant functions for making decisions.

Objective function for training

Suppose that we have a training set given by Ω = {(x_1, d_1), ..., (x_N, d_N)}. Neural network training is an optimization problem, and the variables to optimize are the weights (including the biases). For regression, the following squared error function is often used as the objective function:

E(W) = \frac{1}{2} \sum_{n=1}^{N} \| d_n - y(x_n; W) \|^2    (7)

For classification, the following cross-entropy is often used:

E(W) = - \sum_{n=1}^{N} \sum_{m=1}^{N_K} d_{nm} \log y_{nm}(x_n, W)    (8)

where d_{nm} ∈ {0, 1} is the desired output and y_{nm} is the m-th output for the n-th input.

The gradient descent algorithm

To find the weights based on the given training data, we usually use the following learning rule to update the weights iteratively:

W_{new} = W_{old} - \epsilon \nabla E    (9)

where ∇E is the gradient of E with respect to each element of W, and ε is the learning rate. Finding the gradient of E is not easy because the weights of the hidden neurons are "embedded" in a complex function. To solve this problem, we usually use the well-known back propagation (BP) algorithm.
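Before turning to BP, here is a minimal MATLAB sketch of the layer-by-layer inference of Eqs. (4)-(6). It is only an illustration of the lecture's notation, not part of the toolbox example later in the lecture: the function name mlp_forward, the cell arrays W and theta, and the activation handle g are assumptions made for this sketch (save it, e.g., as mlp_forward.m).

% Minimal sketch of MLP inference, Eqs. (4)-(6).
% W{i} and theta{i} are assumed to hold the weight matrix and bias vector
% of layer i; g is the hidden-layer activation function handle.
function y = mlp_forward(W, theta, x, g)
    K = numel(W);
    a = x;                            % x^(0): external input
    for i = 1:K-1
        u = W{i}*a - theta{i};        % effective input, Eq. (4)
        a = g(u);                     % hidden-layer output, Eq. (5)
    end
    u = W{K}*a - theta{K};            % effective input of the output layer
    y = exp(u) ./ sum(exp(u));        % softmax output, Eq. (6)
end

For example, with g = @(u) 1./(1+exp(-u)) and any consistently sized W and theta, y is a probability vector over the N_K classes, matching the interpretation of the softmax outputs above.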
BP for squared error optimization

For the weights in the output layer, we have

\partial E / \partial w_{ji}^{(K)} = \frac{\partial E}{\partial y_j} \frac{\partial y_j}{\partial u_j} \frac{\partial u_j}{\partial w_{ji}^{(K)}} = (y_j(x) - d_j) \, g'(u_j^{(K)}) \, x_i^{(K-1)} = \delta_j^{(K)} x_i^{(K-1)}    (10)

For the weights in the k-th hidden layer, we have

\partial E / \partial w_{ji}^{(k)} = \frac{\partial E}{\partial u_j^{(k)}} \frac{\partial u_j^{(k)}}{\partial w_{ji}^{(k)}} = \delta_j^{(k)} x_i^{(k-1)}    (11)

\delta_j^{(k)} = \partial E / \partial u_j^{(k)} = g'(u_j^{(k)}) \sum_{n=1}^{N_{k+1}} \delta_n^{(k+1)} w_{nj}^{(k+1)}    (12)

Using the gradients found above, we can update the weights layer by layer using Eq. (9).

BP for cross-entropy optimization

If the outputs are calculated using the softmax function, the cross-entropy for any input x is

E = - \sum_{n=1}^{N_K} d_n \log y_n = - \sum_{n=1}^{N_K} d_n \log \frac{\exp(u_n^{(K)})}{\sum_{m=1}^{N_K} \exp(u_m^{(K)})}    (13)

For the output weights, we have

\delta_j^{(K)} = \partial E / \partial u_j^{(K)} = - \sum_{n=1}^{N_K} \frac{d_n}{y_n} \frac{\partial y_n}{\partial u_j^{(K)}} = -d_j (1 - y_j) + y_j \sum_{n \neq j} d_n = -d_j + y_j \sum_{n=1}^{N_K} d_n = y_j - d_j    (14)

In the last step, we have used the fact that the desired outputs sum to one (the teacher signal is one-hot).

Based on Eq. (14) we can find the error signal when the loss function is defined as the cross-entropy, and the error signals of the hidden layers can be obtained in the same way using Eqs. (11) and (12). Therefore, an MLP for classification can be trained in the same way with the BP algorithm. For a classification problem, the desired outputs (the teacher signal) are usually given as follows:
◦ The number of outputs equals the number of classes.
◦ Only the output corresponding to the correct class is one; all others are zero.

Summary of MLP learning

Step 1: Initialize the weights.
Step 2: Reset the total error.
Step 3: Forward propagation
◦ Get a training example from the training set;
◦ Calculate the outputs of all layers; and
◦ Update the total error (for all data).
Step 4: Back propagation
◦ Calculate the "delta" of each layer, starting from the output layer;
◦ Calculate the gradient using Eq. (10) and Eq. (11);
◦ Update the weights using Eq. (9).
Step 5: See if all data have been used. If NOT, return to Step 3.
Step 6: See if the total error is smaller than a desired value. If NOT, reset the total error and return to Step 2; otherwise, terminate.

Meaning of error back propagation

Starting from the output layer, we can find the delta directly from the teacher signal and the network output. We can then find the delta of each hidden layer, from top to bottom. This delta is also called the error signal in the literature. This is why the BP algorithm is also called the extended delta learning rule.

(Figure: the error signals δ^{(K)}, ..., δ^{(2)}, δ^{(1)} propagate from layer K down through the hidden layers to layer 1.)
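As an illustration of Eqs. (9)-(12) and (14), the following MATLAB sketch performs one BP update for a three-layer MLP with sigmoid hidden units, softmax outputs and cross-entropy loss. The function name bp_step and all variable names are assumptions made for this sketch; it is not part of the toolbox example that follows.

% One BP update for a three-layer MLP (sigmoid hidden layer, softmax output,
% cross-entropy loss). x: one input column; d: its one-hot teacher vector;
% eps: learning rate; W1, th1 / W2, th2: weights and biases of the two layers.
function [W1, th1, W2, th2] = bp_step(W1, th1, W2, th2, x, d, eps)
    % Forward propagation, Eqs. (4)-(6)
    u1 = W1*x  - th1;  x1 = 1 ./ (1 + exp(-u1));     % hidden layer (sigmoid)
    u2 = W2*x1 - th2;  y  = exp(u2) / sum(exp(u2));  % output layer (softmax)
    % Back propagation of the error signals ("deltas")
    d2 = y - d;                                      % output delta, Eq. (14)
    d1 = (W2' * d2) .* x1 .* (1 - x1);               % hidden delta, Eq. (12), g'(u) = g(u)(1-g(u))
    % Gradient descent update, Eq. (9), with the gradients of Eqs. (10)-(11);
    % the minus sign of theta in u = Wx - theta flips the sign of the bias update.
    W2 = W2 - eps * (d2 * x1');   th2 = th2 + eps * d2;
    W1 = W1 - eps * (d1 * x');    th1 = th1 + eps * d1;
end

Looping this update over the whole training set and monitoring the total error reproduces Steps 2-6 of the summary above.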
Example - 1

In this example, we try the "Classify Patterns with a Shallow Neural Network" example in Matlab. The database used is the cancer dataset:
◦ There are 699 data, and each datum has 9 inputs and 2 outputs (1 or 0).

We train a three-layer MLP with 10 hidden neurons. To train the MLP, we simply use the following lines:
◦ load cancer_dataset
◦ net = feedforwardnet(10);
◦ net = train(net, cancerInputs, cancerTargets);
◦ view(net) % to show the structure of the network

The trained network can be evaluated by using it as a function:
◦ outputs = net(cancerInputs);
◦ errors = gsubtract(cancerTargets, outputs);
◦ performance = perform(net, cancerTargets, outputs)
◦ performance = 0.0249 % value displayed for the trained network

To find a proper network size (number of hidden neurons), we can use grid search and cross validation:
1) Grid search: n_hidden = 1, 2, ..., 100.
2) 10-fold cross validation to ensure the reliability of the selected value.

(Figures: the learning curves; the error histogram and the confusion matrix; the trained network and the training status.)

Autoencoder

What is an autoencoder?

An autoencoder is a special MLP. There is one input layer, one output layer, and one or more hidden layers. The teacher signal equals the input. The main purpose of using an autoencoder is to find a new (internal, or latent) representation of the given feature space, with the hope of obtaining the true factors that control the distribution of the input data.

(Figure: an autoencoder mapping the input x through the hidden layer(s) to the output x̂.)

Using principal component analysis (PCA) we can obtain a linear autoencoder: each hidden unit corresponds to an eigenvector of the covariance matrix, and each datum is a linear combination of these eigenvectors. Using a non-linear activation function in the hidden layer, we can obtain a non-linear autoencoder; that is, each datum can be represented using a set of non-linear basis functions.

Training of autoencoder

Implementing an autoencoder using an MLP, we can find a more compact representation of the given problem space.
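As a hedged illustration of how such training looks in Matlab, the sketch below uses the Deep Learning Toolbox function trainAutoencoder. The hidden size and the regularization values are arbitrary choices for this example, and the digit images are simply the demo data shipped with the toolbox; none of these choices come from the lecture itself.

% Sketch: training a sparse autoencoder with l2-norm and sparsity
% regularization (parameter values chosen only for illustration).
[X, ~] = digitTrainCellArrayData;            % demo images shipped with the toolbox
autoenc = trainAutoencoder(X, 25, ...        % 25 hidden units
    'L2WeightRegularization', 0.004, ...     % l2-norm regularization of the weights
    'SparsityRegularization', 4, ...         % weight of the sparsity penalty
    'SparsityProportion', 0.15);             % desired average activation of hidden units
features = encode(autoenc, X);               % internal (latent) representation
Xrec     = predict(autoenc, X);              % reconstruction of the input

The output of encode is the compact internal representation discussed above; stacking several such autoencoders and fine-tuning the whole network with BP is one common way to train a deep neural network.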