Recent Advances in Convolutional Neural Networks
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Multiscale Computation and Dynamic Attention in Biological and Artificial
brain sciences Review Multiscale Computation and Dynamic Attention in Biological and Artificial Intelligence Ryan Paul Badman 1,* , Thomas Trenholm Hills 2 and Rei Akaishi 1,* 1 Center for Brain Science, RIKEN, Saitama 351-0198, Japan 2 Department of Psychology, University of Warwick, Coventry CV4 7AL, UK; [email protected] * Correspondence: [email protected] (R.P.B.); [email protected] (R.A.) Received: 4 March 2020; Accepted: 17 June 2020; Published: 20 June 2020 Abstract: Biological and artificial intelligence (AI) are often defined by their capacity to achieve a hierarchy of short-term and long-term goals that require incorporating information over time and space at both local and global scales. More advanced forms of this capacity involve the adaptive modulation of integration across scales, which resolve computational inefficiency and explore-exploit dilemmas at the same time. Research in neuroscience and AI have both made progress towards understanding architectures that achieve this. Insight into biological computations come from phenomena such as decision inertia, habit formation, information search, risky choices and foraging. Across these domains, the brain is equipped with mechanisms (such as the dorsal anterior cingulate and dorsolateral prefrontal cortex) that can represent and modulate across scales, both with top-down control processes and by local to global consolidation as information progresses from sensory to prefrontal areas. Paralleling these biological architectures, progress in AI is marked by innovations in dynamic multiscale modulation, moving from recurrent and convolutional neural networks—with fixed scalings—to attention, transformers, dynamic convolutions, and consciousness priors—which modulate scale to input and increase scale breadth. -
Deep Neural Network Models for Sequence Labeling and Coreference Tasks
Federal state autonomous educational institution for higher education ¾Moscow institute of physics and technology (national research university)¿ On the rights of a manuscript Le The Anh DEEP NEURAL NETWORK MODELS FOR SEQUENCE LABELING AND COREFERENCE TASKS Specialty 05.13.01 - ¾System analysis, control theory, and information processing (information and technical systems)¿ A dissertation submitted in requirements for the degree of candidate of technical sciences Supervisor: PhD of physical and mathematical sciences Burtsev Mikhail Sergeevich Dolgoprudny - 2020 Федеральное государственное автономное образовательное учреждение высшего образования ¾Московский физико-технический институт (национальный исследовательский университет)¿ На правах рукописи Ле Тхе Ань ГЛУБОКИЕ НЕЙРОСЕТЕВЫЕ МОДЕЛИ ДЛЯ ЗАДАЧ РАЗМЕТКИ ПОСЛЕДОВАТЕЛЬНОСТИ И РАЗРЕШЕНИЯ КОРЕФЕРЕНЦИИ Специальность 05.13.01 – ¾Системный анализ, управление и обработка информации (информационные и технические системы)¿ Диссертация на соискание учёной степени кандидата технических наук Научный руководитель: кандидат физико-математических наук Бурцев Михаил Сергеевич Долгопрудный - 2020 Contents Abstract 4 Acknowledgments 6 Abbreviations 7 List of Figures 11 List of Tables 13 1 Introduction 14 1.1 Overview of Deep Learning . 14 1.1.1 Artificial Intelligence, Machine Learning, and Deep Learning . 14 1.1.2 Milestones in Deep Learning History . 16 1.1.3 Types of Machine Learning Models . 16 1.2 Brief Overview of Natural Language Processing . 18 1.3 Dissertation Overview . 20 1.3.1 Scientific Actuality of the Research . 20 1.3.2 The Goal and Task of the Dissertation . 20 1.3.3 Scientific Novelty . 21 1.3.4 Theoretical and Practical Value of the Work in the Dissertation . 21 1.3.5 Statements to be Defended . 22 1.3.6 Presentations and Validation of the Research Results . -
CSE 152: Computer Vision Manmohan Chandraker
CSE 152: Computer Vision Manmohan Chandraker Lecture 15: Optimization in CNNs Recap Engineered against learned features Label Convolutional filters are trained in a Dense supervised manner by back-propagating classification error Dense Dense Convolution + pool Label Convolution + pool Classifier Convolution + pool Pooling Convolution + pool Feature extraction Convolution + pool Image Image Jia-Bin Huang and Derek Hoiem, UIUC Two-layer perceptron network Slide credit: Pieter Abeel and Dan Klein Neural networks Non-linearity Activation functions Multi-layer neural network From fully connected to convolutional networks next layer image Convolutional layer Slide: Lazebnik Spatial filtering is convolution Convolutional Neural Networks [Slides credit: Efstratios Gavves] 2D spatial filters Filters over the whole image Weight sharing Insight: Images have similar features at various spatial locations! Key operations in a CNN Feature maps Spatial pooling Non-linearity Convolution (Learned) . Input Image Input Feature Map Source: R. Fergus, Y. LeCun Slide: Lazebnik Convolution as a feature extractor Key operations in a CNN Feature maps Rectified Linear Unit (ReLU) Spatial pooling Non-linearity Convolution (Learned) Input Image Source: R. Fergus, Y. LeCun Slide: Lazebnik Key operations in a CNN Feature maps Spatial pooling Max Non-linearity Convolution (Learned) Input Image Source: R. Fergus, Y. LeCun Slide: Lazebnik Pooling operations • Aggregate multiple values into a single value • Invariance to small transformations • Keep only most important information for next layer • Reduces the size of the next layer • Fewer parameters, faster computations • Observe larger receptive field in next layer • Hierarchically extract more abstract features Key operations in a CNN Feature maps Spatial pooling Non-linearity Convolution (Learned) . Input Image Input Feature Map Source: R. -
1 Convolution
CS1114 Section 6: Convolution February 27th, 2013 1 Convolution Convolution is an important operation in signal and image processing. Convolution op- erates on two signals (in 1D) or two images (in 2D): you can think of one as the \input" signal (or image), and the other (called the kernel) as a “filter” on the input image, pro- ducing an output image (so convolution takes two images as input and produces a third as output). Convolution is an incredibly important concept in many areas of math and engineering (including computer vision, as we'll see later). Definition. Let's start with 1D convolution (a 1D \image," is also known as a signal, and can be represented by a regular 1D vector in Matlab). Let's call our input vector f and our kernel g, and say that f has length n, and g has length m. The convolution f ∗ g of f and g is defined as: m X (f ∗ g)(i) = g(j) · f(i − j + m=2) j=1 One way to think of this operation is that we're sliding the kernel over the input image. For each position of the kernel, we multiply the overlapping values of the kernel and image together, and add up the results. This sum of products will be the value of the output image at the point in the input image where the kernel is centered. Let's look at a simple example. Suppose our input 1D image is: f = 10 50 60 10 20 40 30 and our kernel is: g = 1=3 1=3 1=3 Let's call the output image h. -
Deep Clustering with Convolutional Autoencoders
Deep Clustering with Convolutional Autoencoders Xifeng Guo1, Xinwang Liu1, En Zhu1, and Jianping Yin2 1 College of Computer, National University of Defense Technology, Changsha, 410073, China [email protected] 2 State Key Laboratory of High Performance Computing, National University of Defense Technology, Changsha, 410073, China Abstract. Deep clustering utilizes deep neural networks to learn fea- ture representation that is suitable for clustering tasks. Though demon- strating promising performance in various applications, we observe that existing deep clustering algorithms either do not well take advantage of convolutional neural networks or do not considerably preserve the local structure of data generating distribution in the learned feature space. To address this issue, we propose a deep convolutional embedded clus- tering algorithm in this paper. Specifically, we develop a convolutional autoencoders structure to learn embedded features in an end-to-end way. Then, a clustering oriented loss is directly built on embedded features to jointly perform feature refinement and cluster assignment. To avoid feature space being distorted by the clustering loss, we keep the decoder remained which can preserve local structure of data in feature space. In sum, we simultaneously minimize the reconstruction loss of convolutional autoencoders and the clustering loss. The resultant optimization prob- lem can be effectively solved by mini-batch stochastic gradient descent and back-propagation. Experiments on benchmark datasets empirically validate -
Tensorizing Neural Networks
Tensorizing Neural Networks Alexander Novikov1;4 Dmitry Podoprikhin1 Anton Osokin2 Dmitry Vetrov1;3 1Skolkovo Institute of Science and Technology, Moscow, Russia 2INRIA, SIERRA project-team, Paris, France 3National Research University Higher School of Economics, Moscow, Russia 4Institute of Numerical Mathematics of the Russian Academy of Sciences, Moscow, Russia [email protected] [email protected] [email protected] [email protected] Abstract Deep neural networks currently demonstrate state-of-the-art performance in sev- eral domains. At the same time, models of this class are very demanding in terms of computational resources. In particular, a large amount of memory is required by commonly used fully-connected layers, making it hard to use the models on low-end devices and stopping the further increase of the model size. In this paper we convert the dense weight matrices of the fully-connected layers to the Tensor Train [17] format such that the number of parameters is reduced by a huge factor and at the same time the expressive power of the layer is preserved. In particular, for the Very Deep VGG networks [21] we report the compression factor of the dense weight matrix of a fully-connected layer up to 200000 times leading to the compression factor of the whole network up to 7 times. 1 Introduction Deep neural networks currently demonstrate state-of-the-art performance in many domains of large- scale machine learning, such as computer vision, speech recognition, text processing, etc. These advances have become possible because of algorithmic advances, large amounts of available data, and modern hardware. -
Fully Convolutional Mesh Autoencoder Using Efficient Spatially Varying Kernels
Fully Convolutional Mesh Autoencoder using Efficient Spatially Varying Kernels Yi Zhou∗ Chenglei Wu Adobe Research Facebook Reality Labs Zimo Li Chen Cao Yuting Ye University of Southern California Facebook Reality Labs Facebook Reality Labs Jason Saragih Hao Li Yaser Sheikh Facebook Reality Labs Pinscreen Facebook Reality Labs Abstract Learning latent representations of registered meshes is useful for many 3D tasks. Techniques have recently shifted to neural mesh autoencoders. Although they demonstrate higher precision than traditional methods, they remain unable to capture fine-grained deformations. Furthermore, these methods can only be applied to a template-specific surface mesh, and is not applicable to more general meshes, like tetrahedrons and non-manifold meshes. While more general graph convolution methods can be employed, they lack performance in reconstruction precision and require higher memory usage. In this paper, we propose a non-template-specific fully convolutional mesh autoencoder for arbitrary registered mesh data. It is enabled by our novel convolution and (un)pooling operators learned with globally shared weights and locally varying coefficients which can efficiently capture the spatially varying contents presented by irregular mesh connections. Our model outperforms state-of-the-art methods on reconstruction accuracy. In addition, the latent codes of our network are fully localized thanks to the fully convolutional structure, and thus have much higher interpolation capability than many traditional 3D mesh generation models. 1 Introduction arXiv:2006.04325v2 [cs.CV] 21 Oct 2020 Learning latent representations for registered meshes 2, either from performance capture or physical simulation, is a core component for many 3D tasks, ranging from compressing and reconstruction to animation and simulation. -
Audio Event Classification Using Deep Learning in an End-To-End Approach
Audio Event Classification using Deep Learning in an End-to-End Approach Master thesis Jose Luis Diez Antich Aalborg University Copenhagen A. C. Meyers Vænge 15 2450 Copenhagen SV Denmark Title: Abstract: Audio Event Classification using Deep Learning in an End-to-End Approach The goal of the master thesis is to study the task of Sound Event Classification Participant(s): using Deep Neural Networks in an end- Jose Luis Diez Antich to-end approach. Sound Event Classifi- cation it is a multi-label classification problem of sound sources originated Supervisor(s): from everyday environments. An auto- Hendrik Purwins matic system for it would many applica- tions, for example, it could help users of hearing devices to understand their sur- Page Numbers: 38 roundings or enhance robot navigation systems. The end-to-end approach con- Date of Completion: sists in systems that learn directly from June 16, 2017 data, not from features, and it has been recently applied to audio and its results are remarkable. Even though the re- sults do not show an improvement over standard approaches, the contribution of this thesis is an exploration of deep learning architectures which can be use- ful to understand how networks process audio. The content of this report is freely available, but publication (with reference) may only be pursued due to agreement with the author. Contents 1 Introduction1 1.1 Scope of this work.............................2 2 Deep Learning3 2.1 Overview..................................3 2.2 Multilayer Perceptron...........................4 -
Convolutional Neural Network Model Layers Improvement for Segmentation and Classification on Kidney Stone Images Using Keras and Tensorflow
Journal of Multidisciplinary Engineering Science and Technology (JMEST) ISSN: 2458-9403 Vol. 8 Issue 6, June - 2021 Convolutional Neural Network Model Layers Improvement For Segmentation And Classification On Kidney Stone Images Using Keras And Tensorflow Orobosa Libert Joseph Waliu Olalekan Apena Department of Computer / Electrical and (1) Department of Computer / Electrical and Electronics Engineering, The Federal University of Electronics Engineering, The Federal University of Technology, Akure. Nigeria. Technology, Akure. Nigeria. [email protected] (2) Biomedical Computing and Engineering Technology, Applied Research Group, Coventry University, Coventry, United Kingdom [email protected], [email protected] Abstract—Convolutional neural network (CNN) and analysis, by automatic or semiautomatic means, of models are beneficial to image classification large quantities of data in order to discover meaningful algorithms training for highly abstract features patterns [1,2]. The domains of data mining include: and work with less parameter. Over-fitting, image mining, opinion mining, web mining, text mining, exploding gradient, and class imbalance are CNN and graph mining and so on. Some of its applications major challenges during training; with appropriate include anomaly detection, financial data analysis, management training, these issues can be medical data analysis, social network analysis, market diminished and enhance model performance. The analysis [3]. Recent progress in deep learning using models are 128 by 128 CNN-ML and 256 by 256 CNN machine learning (CNN-ML) has been helpful in CNN-ML training (or learning) and classification. decision support and contributed to positive outcome, The results were compared for each model significantly. The application of CNN-ML to diverse classifier. The study of 128 by 128 CNN-ML model areas of soft computing is adapted diagnosis has the following evaluation results consideration procedure to enhance time and accuracy [4]. -
Geometrical Aspects of Statistical Learning Theory
Geometrical Aspects of Statistical Learning Theory Vom Fachbereich Informatik der Technischen Universit¨at Darmstadt genehmigte Dissertation zur Erlangung des akademischen Grades Doctor rerum naturalium (Dr. rer. nat.) vorgelegt von Dipl.-Phys. Matthias Hein aus Esslingen am Neckar Prufungskommission:¨ Vorsitzender: Prof. Dr. B. Schiele Erstreferent: Prof. Dr. T. Hofmann Korreferent : Prof. Dr. B. Sch¨olkopf Tag der Einreichung: 30.9.2005 Tag der Disputation: 9.11.2005 Darmstadt, 2005 Hochschulkennziffer: D17 Abstract Geometry plays an important role in modern statistical learning theory, and many different aspects of geometry can be found in this fast developing field. This thesis addresses some of these aspects. A large part of this work will be concerned with so called manifold methods, which have recently attracted a lot of interest. The key point is that for a lot of real-world data sets it is natural to assume that the data lies on a low-dimensional submanifold of a potentially high-dimensional Euclidean space. We develop a rigorous and quite general framework for the estimation and ap- proximation of some geometric structures and other quantities of this submanifold, using certain corresponding structures on neighborhood graphs built from random samples of that submanifold. Another part of this thesis deals with the generalizati- on of the maximal margin principle to arbitrary metric spaces. This generalization follows quite naturally by changing the viewpoint on the well-known support vector machines (SVM). It can be shown that the SVM can be seen as an algorithm which applies the maximum margin principle to a subclass of metric spaces. The motivati- on to consider the generalization to arbitrary metric spaces arose by the observation that in practice the condition for the applicability of the SVM is rather difficult to check for a given metric. -
4 Nonlinear Regression and Multilayer Perceptron
4 Nonlinear Regression and Multilayer Perceptron In nonlinear regression the output variable y is no longer a linear function of the regression parameters plus additive noise. This means that estimation of the parameters is harder. It does not reduce to minimizing a convex energy functions { unlike the methods we described earlier. The perceptron is an analogy to the neural networks in the brain (over-simplified). It Pd receives a set of inputs y = j=1 !jxj + !0, see Figure (3). Figure 3: Idealized neuron implementing a perceptron. It has a threshold function which can be hard or soft. The hard one is ζ(a) = 1; if a > 0, ζ(a) = 0; otherwise. The soft one is y = σ(~!T ~x) = 1=(1 + e~!T ~x), where σ(·) is the sigmoid function. There are a variety of different algorithms to train a perceptron from labeled examples. Example: The quadratic error: t t 1 t t 2 E(~!j~x ; y ) = 2 (y − ~! · ~x ) ; for which the update rule is ∆!t = −∆ @E = +∆(yt~! · ~xt)~xt. Introducing the sigmoid j @!j function rt = sigmoid(~!T ~xt), we have t t P t t t t E(~!j~x ; y ) = − i ri log yi + (1 − ri) log(1 − yi ) , and the update rule is t t t t ∆!j = −η(r − y )xj, where η is the learning factor. I.e, the update rule is the learning factor × (desired output { actual output)× input. 10 4.1 Multilayer Perceptrons Multilayer perceptrons were developed to address the limitations of perceptrons (introduced in subsection 2.1) { i.e. -
Pre-Training Cnns Using Convolutional Autoencoders
Pre-Training CNNs Using Convolutional Autoencoders Maximilian Kohlbrenner Russell Hofmann TU Berlin TU Berlin [email protected] [email protected] Sabbir Ahmmed Youssef Kashef TU Berlin TU Berlin [email protected] [email protected] Abstract Despite convolutional neural networks being the state of the art in almost all computer vision tasks, their training remains a difficult task. Unsupervised rep- resentation learning using a convolutional autoencoder can be used to initialize network weights and has been shown to improve test accuracy after training. We reproduce previous results using this approach and successfully apply it to the difficult Extended Cohn-Kanade dataset for which labels are extremely sparse but additional unlabeled data is available for unsupervised use. 1 Introduction A lot of progress has been made in the field of artificial neural networks in recent years and as a result most computer vision tasks today are best solved using this approach. However, the training of deep neural networks still remains a difficult problem and results are highly dependent on the model initialization (local minima). During a classification task, a Convolutional Neural Network (CNN) first learns a new data representation using its convolution layers as feature extractors and then uses several fully-connected layers for decision-making. While the representation after the convolutional layers is usually optizimed for classification, some learned features might be more general and also useful outside of this specific task. Instead of directly optimizing for a good class prediction, one can therefore start by focussing on the intermediate goal of learning a good data representation before beginning to work on the classification problem.