A Memristor-Based Cascaded Neural Networks for Specific Target

Total Page:16

File Type:pdf, Size:1020Kb

A Memristor-Based Cascaded Neural Networks for Specific Target Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2019 doi:10.20944/preprints201901.0319.v1 Article A Memristor-based Cascaded Neural Networks for Specific Target Recognition Sheng-Yang Sun, Hui Xu, Jiwei Li, Yi Sun, Qingjiang Li, Zhiwei Li * and Haijun Liu * College of Electronic Science, National University of Defense Technology, Changsha 410073, China; [email protected] (S.-Y. S.); [email protected] (H. X.); [email protected],cn (J. L.); [email protected] (Y. S.); [email protected] (Q. L.) * Correspondence: [email protected] (Z. L.); [email protected] (H. L.) 1 Abstract: Multiply-accumulate calculations using a memristor crossbar array is an important method 2 to realize neuromorphic computing. However, the memristor array fabrication technology is 3 still immature, and it is difficult t o f abricate l arge-scale a rrays w ith h igh-yield, w hich restricts 4 the development of memristor-based neuromorphic computing technology. Therefore, cascading 5 small-scale arrays to achieve the neuromorphic computational ability that can be achieved by 6 large-scale arrays, which is of great significance for promoting the application of memristor-based 7 neuromorphic computing. To address this issue, we present a memristor-based cascaded framework 8 with some basic computation units, several neural network processing units can be cascaded by this 9 means to improve the processing capability of the dataset. Besides, we introduce a split method to 10 reduce pressure of input terminal. Compared with VGGNet and GoogLeNet, the proposed cascaded 11 framework can achieve 93.54% Fashion-MNIST accuracy under the 4.15M parameters. Extensive 12 experiments with Ti/AlOx/TaOx/Pt we fabricated are conducted to show that the circuit simulation 13 results can still provide a high recognition accuracy, and the recognition accuracy loss after circuit 14 simulation can be controlled at around 0.26%. 15 Keywords: cascaded neural networks; memristor crossbar; convolutional neural networks 16 1. Introduction 17 Convolutional neural networks (CNNs) have been widely used in computer vision tasks such 18 as image classification [1][2][3], they are widely popular in industry for their superior accuracy on 19 datasets. Some concepts about CNNs were proposed by Fukushima in 1980 [4]. In 1998, LeCun et 20 al. proposed the LeNet-5 CNNs structure based on a gradient-based learning algorithm [5]. With 21 the development of deep learning algorithms, the dimension and complexity of CNNs layers have 22 grown significantly. However, state-of-the-art CNNs require too many parameters and compute at 23 billions of FLOPs, which prevents them from being utilized in embedded applications. For instance, 24 the AlexNet [1] proposed by Krizhevsky et al. in 2012, requires 60M parameters and 650K neurons; 25 ResNet [6], which is broadly used in detection task [7], has a complexity of 7.8 GFLOPs and fails to 26 achieve real-time applications even with a powerful GPU. 27 For the past few years, embedded neuromorphic processing systems have acquired significant 28 advantages, such as the ability to solve the image classification problems while consuming very 29 little area and power consumption [8]. In addition to these advantages, neuromorphic computing 30 can also be used to simulate the functions of human brains [9]. With the rapid development of the 31 VLSI industry, increasingly more devices are beginning to be miniaturized and integrated [10]. As 32 mentioned above, CNNs lead to an exploding number of network synapses and neurons in which the 33 relevant hardware costs will be huge [11], these complex computations and high memory requirements 34 make it impractical for the application of embedded systems, robotics and real-time or mobile scenarios 35 in general. 36 In recent years, the memristor [12] has received significant attention as synapses for neuromorphic 37 systems. The memristor is a non-volatile device that can store the synaptic weights of neural networks. © 2019 by the author(s). Distributed under a Creative Commons CC BY license. Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2019 doi:10.20944/preprints201901.0319.v1 2 of 11 38 Voltage pulses can be applied to memristors to change their conductivity and tune them to a specific 39 resistance [13]. A memristor crossbar array can naturally carry out vector-matrix multiplication which 40 is a computationally expensive operation for neural networks. It has been recently demonstrated that 41 analog vector-matrix multiplication can be of orders of magnitude that are more efficient than ASIC, 42 GPUs or FPGA based implementations [14]. 43 Some neural network architectures based on the memristor crossbar have been proposed [15–17], 44 these network architectures utilize memristor crossbar to perform convolution computation iterations 45 which also cost too much time and require high memory. On the other hand, the fabrication technology 46 of large-scale memristor crossbar array is still immature [18,19], however useful cascaded framework 47 in real applications with memristor-based architectures have seldom been reported. To make full 48 use of limited memristor resources and make the system work at high speed in real-time processing 49 applications, we present a memristor-based cascaded framework with some neuromorphic processing 50 chip. That means several neural network processing chips can be cascaded to improve the processing 51 capability of the dataset. The basic computation unit in this work builds on our prior work devoloping 52 memristor-based CNNs architecture [20], which has validated that the three-layer CNNs with Abs 53 activation function can get desired recognition accuracy. 54 The rest of this paper is organized as follows, Section II presents the cascaded method based on 55 the basic computation unit and a split method, including the circuits implemented based on memristor 56 crossbar array. Section III exhibits the experimental results. The final Section IV concludes the paper. 57 2. Proposed Cascaded Method 58 To have a better understanding of cascaded method, we first introduce basic units that make up 59 the cascaded network, and then a detailed description of our cascaded CNN framework is presented. 60 2.1. Basic Computation Unit 61 To take full advantage of limited memristor array, a three-layer simplified CNNs is used as the 62 basic computation unit (BCU). The structure of this network is shown in Figure1. Max Selected … Spike Flow Classification Input Image Convolution Layer Pooling Layer Fully Connected Layer Figure 1. The basic computation unit architecture. It consists of three layers, behind the input layer is convolution layer which consists of k kernels, followed by a average-pooling layer and a fully connected layer. 63 The simplified CNNs includes three layers. The convolution layer includes k kernels with kernel 64 size Ks × Ks followed by absolute nonlinearity function (Abs), which extracts the features from the 65 input images and produce the feature maps. The average-pooling obtains spatial invariance while 66 scaling feature maps with pooling size Ps × Ps from their preceding layers. A sub-sample is taken from 67 each feature map, and it reduces the data redundancy. The fully connected layer (FCL) performs the 68 final classification or image reconstruction, it takes the extracted feature maps and multiplies a weight 69 matrix following a dense matrix vector multiplication pattern. G = H(I∗) = s(W · G∗ + b) = s(W · (G∗∗ ∗ W∗) + b) (1) = s(W · (d(W∗∗ ∗ I∗ + b∗) ∗ W∗) + b) Preprints (www.preprints.org) | NOT PEER-REVIEWED | Posted: 31 January 2019 doi:10.20944/preprints201901.0319.v1 3 of 11 M-BCUs (a) W × H BCU #1_1 1 1 N-BCUs W2 × H2 BCU #2_1 P-BCUs W H F F 1 × 1 1 … 2 output input image BCU #1_2 BCU #3_1 … BCU #3_P … BCU #2_N W × H BCU #1_M W2 × H2 W1 × H1 Part #1 Part #2 Part #3 BCU #1_1 (b) W × H W1 × H1 BCU #1_2 f 1_1 (I * ) BCU #2_1 W × H image BCU #3_1 W × H W1 × H1 # BCU #1_3 1_ 2 * input image (!∗) f (I ) 2_1 3_1 2_1 image fimage (F) fimage ( fimage ) output W × H W1 × H1 1_ 3 * fimage (I ) Figure 2. The diagram of the proposed cascaded framework. (a): Standard cascaded network framework "M-N-P". (b): The typical "3-1-1" cascaded type. C×W×H C∗×W∗×H∗ 70 where H : R ! R is the transformation in the BCU (in other words, BCU 71 performs the image transformation), C is the number of channels of input image, s indicates the ∗ 72 hyperbolic tangent function, d indicates the Abs function, ∗ represents the convolution operator, b, b ∗ ∗∗ 73 are bias value, and W, W , W represent the weight matrix of each layer, respectively (refer to Figure 74 1). 75 2.2. The Proposed Cascaded Framework 76 Aim to combine several monolithic networks (BCUs) to obtain better performance, we propose a 77 cascaded CNN network, whose specific design can be seen in Figure2a. C×W×H 78 The cascaded framework includes three parts. Given the output G 2 R generated from a C×W×H C×W×H 79 part, a reconstruction transformation f : R ! R is applied to aggregate outputs over 80 all BCUs of this part, where C is the number of channels of input image, W and H are the spatial th 81 dimensions. The output of k part is described as Fk = Gk_1 ⊕ Gk_2 ⊕ ... ⊕ Gk_n (2) Fk+1 = Gk+1_1 ⊕ Gk+1_2 ⊕ ... ⊕ Gk+1_m (3) = Hk+1_1(Fk) ⊕ ... ⊕ Hk+1_m(Fk) th th 82 where Gk_n is the output of the n BCU of the k part, and ⊕ represents the reconstruction operator 83 which combined with several original outputs using an element-wise addition to produce a new 84 output. The new output Fk is treated as input to feed into the k + 1 part.
Recommended publications
  • Synthesizing Images of Humans in Unseen Poses
    Synthesizing Images of Humans in Unseen Poses Guha Balakrishnan Amy Zhao Adrian V. Dalca Fredo Durand MIT MIT MIT and MGH MIT [email protected] [email protected] [email protected] [email protected] John Guttag MIT [email protected] Abstract We address the computational problem of novel human pose synthesis. Given an image of a person and a desired pose, we produce a depiction of that person in that pose, re- taining the appearance of both the person and background. We present a modular generative neural network that syn- Source Image Target Pose Synthesized Image thesizes unseen poses using training pairs of images and poses taken from human action videos. Our network sepa- Figure 1. Our method takes an input image along with a desired target pose, and automatically synthesizes a new image depicting rates a scene into different body part and background lay- the person in that pose. We retain the person’s appearance as well ers, moves body parts to new locations and refines their as filling in appropriate background textures. appearances, and composites the new foreground with a hole-filled background. These subtasks, implemented with separate modules, are trained jointly using only a single part details consistent with the new pose. Differences in target image as a supervised label. We use an adversarial poses can cause complex changes in the image space, in- discriminator to force our network to synthesize realistic volving several moving parts and self-occlusions. Subtle details conditioned on pose. We demonstrate image syn- details such as shading and edges should perceptually agree thesis results on three action classes: golf, yoga/workouts with the body’s configuration.
    [Show full text]
  • Memristor-Based Approximated Computation
    Memristor-based Approximated Computation Boxun Li1, Yi Shan1, Miao Hu2, Yu Wang1, Yiran Chen2, Huazhong Yang1 1Dept. of E.E., TNList, Tsinghua University, Beijing, China 2Dept. of E.C.E., University of Pittsburgh, Pittsburgh, USA 1 Email: [email protected] Abstract—The cessation of Moore’s Law has limited further architectures, which not only provide a promising hardware improvements in power efficiency. In recent years, the physical solution to neuromorphic system but also help drastically close realization of the memristor has demonstrated a promising the gap of power efficiency between computing systems and solution to ultra-integrated hardware realization of neural net- works, which can be leveraged for better performance and the brain. The memristor is one of those promising devices. power efficiency gains. In this work, we introduce a power The memristor is able to support a large number of signal efficient framework for approximated computations by taking connections within a small footprint by taking the advantage advantage of the memristor-based multilayer neural networks. of the ultra-integration density [7]. And most importantly, A programmable memristor approximated computation unit the nonvolatile feature that the state of the memristor could (Memristor ACU) is introduced first to accelerate approximated computation and a memristor-based approximated computation be tuned by the current passing through itself makes the framework with scalability is proposed on top of the Memristor memristor a potential, perhaps even the best, device to realize ACU. We also introduce a parameter configuration algorithm of neuromorphic computing systems with picojoule level energy the Memristor ACU and a feedback state tuning circuit to pro- consumption [8], [9].
    [Show full text]
  • A Survey of Autonomous Driving: Common Practices and Emerging Technologies
    Accepted March 22, 2020 Digital Object Identifier 10.1109/ACCESS.2020.2983149 A Survey of Autonomous Driving: Common Practices and Emerging Technologies EKIM YURTSEVER1, (Member, IEEE), JACOB LAMBERT 1, ALEXANDER CARBALLO 1, (Member, IEEE), AND KAZUYA TAKEDA 1, 2, (Senior Member, IEEE) 1Nagoya University, Furo-cho, Nagoya, 464-8603, Japan 2Tier4 Inc. Nagoya, Japan Corresponding author: Ekim Yurtsever (e-mail: [email protected]). ABSTRACT Automated driving systems (ADSs) promise a safe, comfortable and efficient driving experience. However, fatalities involving vehicles equipped with ADSs are on the rise. The full potential of ADSs cannot be realized unless the robustness of state-of-the-art is improved further. This paper discusses unsolved problems and surveys the technical aspect of automated driving. Studies regarding present challenges, high- level system architectures, emerging methodologies and core functions including localization, mapping, perception, planning, and human machine interfaces, were thoroughly reviewed. Furthermore, many state- of-the-art algorithms were implemented and compared on our own platform in a real-world driving setting. The paper concludes with an overview of available datasets and tools for ADS development. INDEX TERMS Autonomous Vehicles, Control, Robotics, Automation, Intelligent Vehicles, Intelligent Transportation Systems I. INTRODUCTION necessary here. CCORDING to a recent technical report by the Eureka Project PROMETHEUS [11] was carried out in A National Highway Traffic Safety Administration Europe between 1987-1995, and it was one of the earliest (NHTSA), 94% of road accidents are caused by human major automated driving studies. The project led to the errors [1]. Against this backdrop, Automated Driving Sys- development of VITA II by Daimler-Benz, which succeeded tems (ADSs) are being developed with the promise of in automatically driving on highways [12].
    [Show full text]
  • CSE 152: Computer Vision Manmohan Chandraker
    CSE 152: Computer Vision Manmohan Chandraker Lecture 15: Optimization in CNNs Recap Engineered against learned features Label Convolutional filters are trained in a Dense supervised manner by back-propagating classification error Dense Dense Convolution + pool Label Convolution + pool Classifier Convolution + pool Pooling Convolution + pool Feature extraction Convolution + pool Image Image Jia-Bin Huang and Derek Hoiem, UIUC Two-layer perceptron network Slide credit: Pieter Abeel and Dan Klein Neural networks Non-linearity Activation functions Multi-layer neural network From fully connected to convolutional networks next layer image Convolutional layer Slide: Lazebnik Spatial filtering is convolution Convolutional Neural Networks [Slides credit: Efstratios Gavves] 2D spatial filters Filters over the whole image Weight sharing Insight: Images have similar features at various spatial locations! Key operations in a CNN Feature maps Spatial pooling Non-linearity Convolution (Learned) . Input Image Input Feature Map Source: R. Fergus, Y. LeCun Slide: Lazebnik Convolution as a feature extractor Key operations in a CNN Feature maps Rectified Linear Unit (ReLU) Spatial pooling Non-linearity Convolution (Learned) Input Image Source: R. Fergus, Y. LeCun Slide: Lazebnik Key operations in a CNN Feature maps Spatial pooling Max Non-linearity Convolution (Learned) Input Image Source: R. Fergus, Y. LeCun Slide: Lazebnik Pooling operations • Aggregate multiple values into a single value • Invariance to small transformations • Keep only most important information for next layer • Reduces the size of the next layer • Fewer parameters, faster computations • Observe larger receptive field in next layer • Hierarchically extract more abstract features Key operations in a CNN Feature maps Spatial pooling Non-linearity Convolution (Learned) . Input Image Input Feature Map Source: R.
    [Show full text]
  • 1 Convolution
    CS1114 Section 6: Convolution February 27th, 2013 1 Convolution Convolution is an important operation in signal and image processing. Convolution op- erates on two signals (in 1D) or two images (in 2D): you can think of one as the \input" signal (or image), and the other (called the kernel) as a “filter” on the input image, pro- ducing an output image (so convolution takes two images as input and produces a third as output). Convolution is an incredibly important concept in many areas of math and engineering (including computer vision, as we'll see later). Definition. Let's start with 1D convolution (a 1D \image," is also known as a signal, and can be represented by a regular 1D vector in Matlab). Let's call our input vector f and our kernel g, and say that f has length n, and g has length m. The convolution f ∗ g of f and g is defined as: m X (f ∗ g)(i) = g(j) · f(i − j + m=2) j=1 One way to think of this operation is that we're sliding the kernel over the input image. For each position of the kernel, we multiply the overlapping values of the kernel and image together, and add up the results. This sum of products will be the value of the output image at the point in the input image where the kernel is centered. Let's look at a simple example. Suppose our input 1D image is: f = 10 50 60 10 20 40 30 and our kernel is: g = 1=3 1=3 1=3 Let's call the output image h.
    [Show full text]
  • Deep Clustering with Convolutional Autoencoders
    Deep Clustering with Convolutional Autoencoders Xifeng Guo1, Xinwang Liu1, En Zhu1, and Jianping Yin2 1 College of Computer, National University of Defense Technology, Changsha, 410073, China [email protected] 2 State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, 410073, China Abstract. Deep clustering utilizes deep neural networks to learn fea- ture representation that is suitable for clustering tasks. Though demon- strating promising performance in various applications, we observe that existing deep clustering algorithms either do not well take advantage of convolutional neural networks or do not considerably preserve the local structure of data generating distribution in the learned feature space. To address this issue, we propose a deep convolutional embedded clus- tering algorithm in this paper. Specifically, we develop a convolutional autoencoders structure to learn embedded features in an end-to-end way. Then, a clustering oriented loss is directly built on embedded features to jointly perform feature refinement and cluster assignment. To avoid feature space being distorted by the clustering loss, we keep the decoder remained which can preserve local structure of data in feature space. In sum, we simultaneously minimize the reconstruction loss of convolutional autoencoders and the clustering loss. The resultant optimization prob- lem can be effectively solved by mini-batch stochastic gradient descent and back-propagation. Experiments on benchmark datasets empirically validate
    [Show full text]
  • Tensorizing Neural Networks
    Tensorizing Neural Networks Alexander Novikov1;4 Dmitry Podoprikhin1 Anton Osokin2 Dmitry Vetrov1;3 1Skolkovo Institute of Science and Technology, Moscow, Russia 2INRIA, SIERRA project-team, Paris, France 3National Research University Higher School of Economics, Moscow, Russia 4Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow, Russia [email protected] [email protected] [email protected] [email protected] Abstract Deep neural networks currently demonstrate state-of-the-art performance in sev- eral domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping the further increase of the model size. In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train [17] format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks [21] we report the compression factor of the dense weight matrix of a fully-connected layer up to 200000 times leading to the compression factor of the whole network up to 7 times. 1 Introduction Deep neural networks currently demonstrate state-of-the-art performance in many domains of large- scale machine learning, such as computer vision, speech recognition, text processing, etc. These advances have become possible because of algorithmic advances, large amounts of available data, and modern hardware.
    [Show full text]
  • Fully Convolutional Mesh Autoencoder Using Efficient Spatially Varying Kernels
    Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels Yi Zhou∗ Chenglei Wu Adobe Research Facebook Reality Labs Zimo Li Chen Cao Yuting Ye University of Southern California Facebook Reality Labs Facebook Reality Labs Jason Saragih Hao Li Yaser Sheikh Facebook Reality Labs Pinscreen Facebook Reality Labs Abstract Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh, and is not applicable to more general meshes, like tetrahedrons and non-manifold meshes. While more general graph convolution methods can be employed, they lack performance in reconstruction precision and require higher memory usage. In this paper, we propose a non-template-specific fully convolutional mesh autoencoder for arbitrary registered mesh data. It is enabled by our novel convolution and (un)pooling operators learned with globally shared weights and locally varying coefficients which can efficiently capture the spatially varying contents presented by irregular mesh connections. Our model outperforms state-of-the-art methods on reconstruction accuracy. In addition, the latent codes of our network are fully localized thanks to the fully convolutional structure, and thus have much higher interpolation capability than many traditional 3D mesh generation models. 1 Introduction arXiv:2006.04325v2 [cs.CV] 21 Oct 2020 Learning latent representations for registered meshes 2, either from performance capture or physical simulation, is a core component for many 3D tasks, ranging from compressing and reconstruction to animation and simulation.
    [Show full text]
  • Self-Driving Cars: Evaluation of Deep Learning Techniques for Object Detection in Different Driving Conditions Ramesh Simhambhatla SMU, [email protected]
    SMU Data Science Review Volume 2 | Number 1 Article 23 2019 Self-Driving Cars: Evaluation of Deep Learning Techniques for Object Detection in Different Driving Conditions Ramesh Simhambhatla SMU, [email protected] Kevin Okiah SMU, [email protected] Shravan Kuchkula SMU, [email protected] Robert Slater Southern Methodist University, [email protected] Follow this and additional works at: https://scholar.smu.edu/datasciencereview Part of the Artificial Intelligence and Robotics Commons, Other Computer Engineering Commons, and the Statistical Models Commons Recommended Citation Simhambhatla, Ramesh; Okiah, Kevin; Kuchkula, Shravan; and Slater, Robert (2019) "Self-Driving Cars: Evaluation of Deep Learning Techniques for Object Detection in Different Driving Conditions," SMU Data Science Review: Vol. 2 : No. 1 , Article 23. Available at: https://scholar.smu.edu/datasciencereview/vol2/iss1/23 This Article is brought to you for free and open access by SMU Scholar. It has been accepted for inclusion in SMU Data Science Review by an authorized administrator of SMU Scholar. For more information, please visit http://digitalrepository.smu.edu. Simhambhatla et al.: Self-Driving Cars: Evaluation of Deep Learning Techniques for Object Detection in Different Driving Conditions Self-Driving Cars: Evaluation of Deep Learning Techniques for Object Detection in Different Driving Conditions Kevin Okiah1, Shravan Kuchkula1, Ramesh Simhambhatla1, Robert Slater1 1 Master of Science in Data Science, Southern Methodist University, Dallas, TX 75275 USA {KOkiah, SKuchkula, RSimhambhatla, RSlater}@smu.edu Abstract. Deep Learning has revolutionized Computer Vision, and it is the core technology behind capabilities of a self-driving car. Convolutional Neural Networks (CNNs) are at the heart of this deep learning revolution for improving the task of object detection.
    [Show full text]
  • Geometrical Aspects of Statistical Learning Theory
    Geometrical Aspects of Statistical Learning Theory Vom Fachbereich Informatik der Technischen Universit¨at Darmstadt genehmigte Dissertation zur Erlangung des akademischen Grades Doctor rerum naturalium (Dr. rer. nat.) vorgelegt von Dipl.-Phys. Matthias Hein aus Esslingen am Neckar Prufungskommission:¨ Vorsitzender: Prof. Dr. B. Schiele Erstreferent: Prof. Dr. T. Hofmann Korreferent : Prof. Dr. B. Sch¨olkopf Tag der Einreichung: 30.9.2005 Tag der Disputation: 9.11.2005 Darmstadt, 2005 Hochschulkennziffer: D17 Abstract Geometry plays an important role in modern statistical learning theory, and many different aspects of geometry can be found in this fast developing field. This thesis addresses some of these aspects. A large part of this work will be concerned with so called manifold methods, which have recently attracted a lot of interest. The key point is that for a lot of real-world data sets it is natural to assume that the data lies on a low-dimensional submanifold of a potentially high-dimensional Euclidean space. We develop a rigorous and quite general framework for the estimation and ap- proximation of some geometric structures and other quantities of this submanifold, using certain corresponding structures on neighborhood graphs built from random samples of that submanifold. Another part of this thesis deals with the generalizati- on of the maximal margin principle to arbitrary metric spaces. This generalization follows quite naturally by changing the viewpoint on the well-known support vector machines (SVM). It can be shown that the SVM can be seen as an algorithm which applies the maximum margin principle to a subclass of metric spaces. The motivati- on to consider the generalization to arbitrary metric spaces arose by the observation that in practice the condition for the applicability of the SVM is rather difficult to check for a given metric.
    [Show full text]
  • Pre-Training Cnns Using Convolutional Autoencoders
    Pre-Training CNNs Using Convolutional Autoencoders Maximilian Kohlbrenner Russell Hofmann TU Berlin TU Berlin [email protected] [email protected] Sabbir Ahmmed Youssef Kashef TU Berlin TU Berlin [email protected] [email protected] Abstract Despite convolutional neural networks being the state of the art in almost all computer vision tasks, their training remains a difficult task. Unsupervised rep- resentation learning using a convolutional autoencoder can be used to initialize network weights and has been shown to improve test accuracy after training. We reproduce previous results using this approach and successfully apply it to the difficult Extended Cohn-Kanade dataset for which labels are extremely sparse but additional unlabeled data is available for unsupervised use. 1 Introduction A lot of progress has been made in the field of artificial neural networks in recent years and as a result most computer vision tasks today are best solved using this approach. However, the training of deep neural networks still remains a difficult problem and results are highly dependent on the model initialization (local minima). During a classification task, a Convolutional Neural Network (CNN) first learns a new data representation using its convolution layers as feature extractors and then uses several fully-connected layers for decision-making. While the representation after the convolutional layers is usually optizimed for classification, some learned features might be more general and also useful outside of this specific task. Instead of directly optimizing for a good class prediction, one can therefore start by focussing on the intermediate goal of learning a good data representation before beginning to work on the classification problem.
    [Show full text]
  • Lecture 10: Recurrent Neural Networks
    Lecture 10: Recurrent Neural Networks Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 1 May 4, 2017 Administrative A1 grades will go out soon A2 is due today (11:59pm) Midterm is in-class on Tuesday! We will send out details on where to go soon Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 2 May 4, 2017 Extra Credit: Train Game More details on Piazza by early next week Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 3 May 4, 2017 Last Time: CNN Architectures AlexNet Figure copyright Kaiming He, 2016. Reproduced with permission. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 4 May 4, 2017 Last Time: CNN Architectures Softmax FC 1000 Softmax FC 4096 FC 1000 FC 4096 FC 4096 Pool FC 4096 3x3 conv, 512 Pool 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 Pool Pool 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 3x3 conv, 512 Pool Pool 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 3x3 conv, 256 Pool Pool 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 3x3 conv, 128 Pool Pool 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 Input Input VGG16 VGG19 GoogLeNet Figure copyright Kaiming He, 2016. Reproduced with permission. Fei-Fei Li & Justin Johnson & Serena Yeung Lecture 10 - 5 May 4, 2017 Last Time: CNN Architectures Softmax FC 1000 Pool 3x3 conv, 64 3x3 conv, 64 3x3 conv, 64 relu 3x3 conv, 64 3x3 conv, 64 F(x) + x 3x3 conv, 64 ..
    [Show full text]