A Comparative Study of Routing Methods in Capsule Networks


Master of Science Thesis in Electrical Engineering
Department of Electrical Engineering, Linköping University, 2019

Christoffer Malmgren

LiTH-ISY-EX--19/5188--SE

Supervisor: M.Sc. Gustav Häger, isy, Linköpings universitet
            Ph.D. Erik Ringaby, SICK Linköping
Examiner:   Prof. Michael Felsberg, isy, Linköpings universitet

Computer Vision Laboratory
Department of Electrical Engineering
Linköping University
SE-581 83 Linköping, Sweden

Copyright © 2019 Christoffer Malmgren

Abstract

Recently, the deep neural network structure caps-net was proposed by Sabour et al. [11]. Capsule networks are designed to learn the relative geometry between the features of one layer and the features of the next layer. The main building blocks of a capsule network are capsules, which are represented by vectors. The idea is that each capsule represents a feature as well as traits or subfeatures of that feature. This allows for smart information routing. The traits of a capsule are used to predict the traits of the capsules in the next layer, and information is sent to the next-layer capsules on which the predictions agree. This is called routing by agreement.

This thesis investigates the theoretical support of new and existing routing algorithms and evaluates their performance on the MNIST [16] and CIFAR-10 [8] datasets. A variation of the dynamic routing algorithm presented in the original paper [11] achieved the highest accuracy and the fastest execution time.

Acknowledgments

I want to thank SICK Linköping for the opportunity to write this thesis. Thanks to my examiner Michael Felsberg and my supervisors Erik Ringaby and Gustav Häger for their support and helpful reviews.
I also want to thank all colleagues at SICK Linköping for showing interest in my thesis and for providing insight as well as interesting discussions. Finally, I want to thank my family and friends.

Linköping, May 2019
Christoffer Malmgren

Contents

Notation
1 Introduction
  1.1 Background
  1.2 Problem formulation
  1.3 Previous work
  1.4 Limitations
2 Theoretical background
  2.1 Neural networks
  2.2 Capsule networks
  2.3 Clustering
    2.3.1 K-means
    2.3.2 Expectation maximization
    2.3.3 Kernel density estimation
    2.3.4 Variational Bayesian methods
  2.4 Routing in current capsule networks
    2.4.1 Dynamic routing
    2.4.2 Optimized dynamic routing
    2.4.3 EM-routing
    2.4.4 Fast dynamic routing based on weighted kernel density estimation
3 Model design
  3.1 Novel routing algorithms
    3.1.1 Mean field Gaussian mixture model
    3.1.2 Mean field soft k-means
    3.1.3 Soft k-means routing
  3.2 Practical modifications
  3.3 Existing algorithms
4 Experiments
  4.1 Implementation
  4.2 Data preprocessing
    4.2.1 MNIST
    4.2.2 CIFAR-10
  4.3 Results
5 Conclusions
  5.1 Method analysis
  5.2 Routing method comparison
  5.3 Contributions
  5.4 Future work
A Appendix
  A.1 Fixed weights Gaussian mixture model expectation maximization
  A.2 DR optimization
  A.3 Mean field Gaussian mixture model
  A.4 Mean field soft k-means
  A.5 Soft k-means
  A.6 Hyper parameters for test accuracy evaluation
  A.7 Validation
Bibliography

Notation

Type setting

Notation                          Meaning
$x$                               A scalar
$\mathbf{x}$                      A 2D vector
$\mathbf{X}$                      A multidimensional matrix or tensor
$\mathcal{X}$                     A set
$x_{i,\dots,j}$                   Indexing or extraction of a matrix value
$x^{(t)}$                         The value of x at time or iteration t
$\forall$                         For all

Operators and functions

Notation                          Meaning
$\Sigma$                          Sum
$\Pi$                             Product
$\int$                            Integral
$\langle \cdot, \cdot \rangle$    Scalar product
$\| \cdot \|_p$                   The p-norm.
                                  If p is not given, then p = 2 is assumed.
$\leftarrow$                      Assignment
$\mathrm{KL}(\cdot, \cdot)$       Kullback–Leibler divergence
$\mathcal{N}(\mu, \sigma)$        Normal distribution with mean µ and standard deviation σ
$\Gamma(\alpha, \beta)$           Gamma distribution with shape α and rate β
$p(x, y)$                         The joint probability of x and y
$p(x \mid y)$                     The conditional probability of x given y
$\log$                            The natural logarithm

Default variable usage

Notation                          Meaning
$a$                               Activation of a capsule or node
$b$                               Bias
$d$                               Distance function/measurement or dimension index
$f$                               Function
$i, j, k, l, n, t$                Indices
$m$                               Mean
$p$                               Probability function
$r$                               Responsibility, variable describing association
$u$                               Capsule traits
$v$                               Predicted capsule traits
$w$                               Weights
$x$                               Data points. In the context of routing the same as v
$\alpha$                          Scale
$\beta$                           Rate
$\epsilon$                        Small positive number
$\theta$                          Parameters
$\lambda$                         Precision scaling in the normal gamma distribution
$\mu$                             Mean
$\pi$                             Probability of event or Archimedes' constant
$\rho$                            Concentration parameters
$\sigma$                          Standard deviation
$\tau$                            Precision
$\mathrm{const}_x$                Constant value w.r.t. x

Abbreviations

Abbreviation    Meaning
gmm             Gaussian mixture model
em              Expectation maximization or EM-routing. Context dependent
gmm-em          Expectation maximization of a Gaussian mixture model
cnn             Convolutional neural network
caps-net        Capsule neural network
sgd             Stochastic gradient descent
frms            Fast routing with mean shift
frem            Fast routing with expectation maximization
mfgmm           Mean field Gaussian mixture model
mfskm           Mean field soft k-means
dr              Dynamic routing
skm             Soft k-means

1 Introduction

1.1 Background

The problem considered in this thesis is image classification, a computer vision task in which deep learning has become an increasingly important tool. Two recent papers [5, 11] point out that convolutional neural networks (cnns), the cutting edge of visual deep learning, have large limitations when it comes to information routing. Routing describes how to redirect the information flow in a network, breaking the standard network structure.
Routing in cnns is usually solved by max-pooling, a method that essentially throws away information, since it only considers the node with the maximum activation. What is more, the position of a feature is encoded by node position until the final layers of a cnn. This means that cnns will have trouble learning the relative geometry between objects, which is useful when determining the whole from the parts [11].

Sabour et al. propose a new model: capsule networks (caps-net). In a capsule network, nodes are grouped into capsules. The goal is to have every capsule, rather than every node, represent a feature. Consequently, each feature is represented by a vector rather than a scalar, and is able to indicate not only the probability of the feature being present, but also the pose or traits of the feature. Here, traits refer to subfeatures or characteristics of the encoded feature. Although they are motivated by their ability to encode the position and orientation of the feature, the traits may encode any properties of the feature. This means that, already on a fairly low layer, feature position information may be moved from capsule position to capsule activation. Capsule networks may thus learn reference frames of objects and features [5], which is something humans do [6].

Representing capsules with vectors also opens the possibility for smarter routing. Information is sent by the capsules which agree on the traits of the capsule in the next layer, which is called routing by agreement [11]. This idea is quite old and can be seen in [19] and [17]. The idea of learning the relative geometry of parts is also shared with constellation models and deformable parts models [2, 3, 14].

1.2 Problem formulation

To implement routing by agreement one has to define the meaning of agreement. Similarity between capsule vectors is a good start, but what distance measure should be used?
The goal is for a capsule to spread its activation and trait information to those capsules in the next layer for which its trait predictions coincide with the predictions of other capsules. This can be seen as a soft clustering problem, where the lower layer capsules are data points, and the higher layer capsules are cluster prototypes.

Clustering problems can be solved in multiple ways, and when evaluating a solution one has to consider how well it fits a neural network structure. One has to consider, e.g., execution time, how to handle activations, energy propagation and gradient flow. The goal of this thesis is to:

– Analyze routing methods with respect to their theoretical motivation as a clustering method as well as how they need to be modified to fit a network structure.
– Evaluate routing methods by measuring their accuracy and execution time on common image classification datasets.

The analysis is presented together with the routing methods, that is, in chapters 2 and 3 (Theoretical background and Model design). The evaluation is presented in chapter 4 (Experiments). Both are further discussed in chapter 5 (Conclusions).

1.3 Previous work

In their first paper, Sabour et al. [11] used a basic routing algorithm with strong similarities to the soft spherical k-means algorithm. It uses the cosine of the angle as distance measure, which was later thought to be sub-optimal, since it makes the clustering insensitive when it comes to distinguishing between an acceptable and an excellent match [5]. Sabour et al. later released a second paper [5], with a routing algorithm based on expectation maximization (em) of a Gaussian mixture model (gmm). This routing algorithm is more complex, and introduces the cluster bias in a less intuitive way. Zhang et al. [21] managed to increase the speed of computation using weighted kernel density estimation. These methods are described more thoroughly in section 2.4.
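
To make the soft clustering view of routing by agreement concrete, the following is a minimal sketch in plain Python, loosely following the dynamic routing of [11]. It is not the implementation evaluated in this thesis. The function names `squash`, `softmax` and `route`, the toy `predictions` and the choice of three iterations are assumptions made for illustration. Agreement is measured here as the scalar product between a prediction and the squashed output of the higher layer capsule.

```python
import math


def squash(s):
    """Scale vector s so that its length lies in [0, 1), keeping its direction."""
    sq = sum(x * x for x in s)
    if sq == 0.0:
        return [0.0 for _ in s]
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in s]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    e = [math.exp(v - m) for v in logits]
    z = sum(e)
    return [v / z for v in e]


def route(predictions, iterations=3):
    """Routing by agreement (illustrative sketch).

    predictions[i][j] is the trait vector that lower capsule i predicts
    for higher capsule j. Returns (outputs, couplings), where couplings
    are the coefficients used in the last iteration.
    """
    n_in = len(predictions)
    n_out = len(predictions[0])
    dim = len(predictions[0][0])
    logits = [[0.0] * n_out for _ in range(n_in)]
    for _ in range(iterations):
        # Each lower capsule distributes its output over the higher capsules.
        couplings = [softmax(row) for row in logits]
        outputs = []
        for j in range(n_out):
            # Weighted sum of predictions: the "cluster prototype" of capsule j.
            s = [sum(couplings[i][j] * predictions[i][j][d] for i in range(n_in))
                 for d in range(dim)]
            outputs.append(squash(s))
        # Predictions that agree with the prototype get a larger routing logit.
        for i in range(n_in):
            for j in range(n_out):
                logits[i][j] += sum(p * v for p, v in zip(predictions[i][j], outputs[j]))
    return outputs, couplings


# Toy data (hypothetical): lower capsules 0 and 1 agree on higher capsule 0,
# while lower capsule 2 predicts the opposite direction for it.
predictions = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [0.0, -1.0]],
    [[-1.0, 0.0], [1.0, 0.0]],
]
outputs, couplings = route(predictions)
print(couplings)
```

In this toy example the two agreeing capsules end up routing most of their output to higher capsule 0, while the disagreeing capsule routes most of its output elsewhere. This is the behaviour that the distance measure and clustering method are meant to control.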