A Neural Network Approach to Arbitrary Symbol Recognition on Modern Smartphones
SAMUEL WEJÉUS
Master’s Thesis at CSC, KTH
Supervisor: Jens Lagergren
Examiner: Anders Lansner
Project Sponsor: Bontouch AB
TRITA xxx yyyy-nn

Abstract

Making a computer understand handwritten text and symbols has numerous applications, ranging from reading bank checks and mail addresses to digitizing arbitrary notes. Benefits include automation of processes, efficient electronic storage, and possible augmented use of the parsed content.

This report will provide an overview of how off-line handwriting recognition systems can be constructed. We will show how such systems can be split into isolated modules that can be constructed individually. The focus will be on recognition of single handwritten symbols, and we will present how this can be accomplished using convolutional neural networks on a modern smartphone. A symbol recognition prototype application for the Apple iOS operating system will be constructed and evaluated as a proof of concept.

The results obtained during this project show that it is possible to train a classifier to understand arbitrary symbols without manually crafting class-separating features, relying instead on deep learning for automatic structure discovery.

Referat (Swedish abstract)

Identification of Symbols Using Neural Networks on Modern Mobile Phones

Making a computer understand handwritten text and symbols has many applications, ranging from reading bank checks and postal addresses to digitizing arbitrary notes. Benefits of such a process include automation of procedures, electronic storage, and possible extended use of the captured material.

This report aims to give an overview of how systems for off-line recognition of handwritten text can be built. We intend to show how such systems can be divided into smaller isolated parts that can be realized individually. The focus will be on recognition of individual handwritten symbols, and we will present how this can be done using convolutional neural networks on a modern mobile phone. An application that recognizes such symbols will be built for Apple's iOS operating system as a proof-of-concept application.

The results obtained in this project show that it is possible to train a classifier to recognize arbitrary symbols without having to manually create class-separating features, instead using deep learning for automatic recognition of structure.

Contents

List of Figures
List of Tables
Acronyms
1 Introduction
  1.1 History
  1.2 The Client
  1.3 Applications
  1.4 Problem Statement
  1.5 Challenges
  1.6 Limitations of Scope
2 Related Work
  2.1 The MNIST Database
  2.2 Current State of Field
  2.3 Handwriting Recognition
    2.3.1 Writing Styles and Related Issues
    2.3.2 Writer-dependence vs. Writer-independence
    2.3.3 On-line vs. Off-line handwriting
    2.3.4 Segmentation
    2.3.5 Features and Feature Selection
  2.4 Problem Reduction Techniques
  2.5 Description of a Complete HWR System
3 Theory
  3.1 Choice of Symbol Classifier
  3.2 Artificial Neural Networks
    3.2.1 Network Structure and Network Training
    3.2.2 Deep Learning of Representations
    3.2.3 Convolutional Neural Networks
4 Results
  4.1 Comparison of Neural Network Libraries
    4.1.1 Investigated Libraries
    4.1.2 Evaluation
  4.2 Lua
  4.3 Torch
  4.4 Prototype
    4.4.1 Network Training
    4.4.2 iPhone Application
    4.4.3 Testing
5 Discussion
  5.1 Conclusions
6 Future Work
References

List of Figures

2.1 Samples from the MNIST dataset
2.2 Hard samples from the MNIST set
2.3 Example of recognition using the Evernote OCR system
2.4 Variation in writing style
2.5 Example of a captured word image before explicit segmentation
2.6 Example of different results after performing explicit segmentation
2.7 Example of sliding-window technique
2.8 Hypothetical HWR System Pipeline
3.1 Visual model of the McCulloch and Pitts neuron
3.2 Overview of a simple neural network model
3.3 Plot of network performance over time
3.4 A typical two stage ConvNet
3.5 Edge detection using convolution
4.1 Training using different learning rates
4.2 Input views of application prototype
4.3 Example of user drawing custom shapes
4.4 Drawing classification pipeline
4.5 Classification using camera capture
4.6 Pre-processing stages of sample captured with camera
4.7 Number of correct classifications using prototype (individual numbers)
4.8 Number of correct classifications using prototype (total)
5.1 30 filters trained on the MNIST set

List of Tables

2.1 Best results reported on the MNIST set for various ML techniques
3.1 Mathematical notation used when describing neural network algorithms
4.1 Freehand drawing speed performance of prototype on various devices
4.2 Camera capture speed performance of prototype on various devices

Chapter 1
Introduction

The process of parsing samples of handwritten text into symbols is usually referred to as recognition or classification. The purpose of the resulting symbolic representation is that it can be interpreted by a machine. A distinction is usually made between two types of recognition: if the characters are printed in a typewritten font, the process is referred to as Optical Character Recognition (OCR); if they are written by hand, it is called Handwriting Recognition (HWR). Furthermore, handwriting recognition is either on-line or off-line, depending on when the text is captured. If the text is captured while the author is writing, the mode is on-line; otherwise it is off-line.
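To make the on-line/off-line distinction concrete, the following minimal Python sketch shows one possible way to represent the two kinds of input. The names `OnlineSample`, `OfflineSample`, and `rasterize` are hypothetical and chosen only for illustration; they are not taken from the prototype described in this report. The sketch assumes integer pixel coordinates and a binary image.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical types for illustration only (not from the thesis prototype).

@dataclass
class OnlineSample:
    """Captured while the author writes: ordered (x, y, t) points per stroke."""
    strokes: List[List[Tuple[int, int, float]]]

@dataclass
class OfflineSample:
    """Captured after writing: rows of pixel values (0 = blank, 1 = ink)."""
    pixels: List[List[int]]

def rasterize(sample: OnlineSample, width: int, height: int) -> OfflineSample:
    """Turn a pen trajectory into a static image, discarding time and stroke order."""
    pixels = [[0] * width for _ in range(height)]
    for stroke in sample.strokes:
        for x, y, _t in stroke:
            # Ignore points that fall outside the image bounds.
            if 0 <= x < width and 0 <= y < height:
                pixels[y][x] = 1
    return OfflineSample(pixels)

# A vertical bar drawn top-to-bottom and the same bar drawn bottom-to-top.
down = OnlineSample(strokes=[[(1, 0, 0.00), (1, 1, 0.05), (1, 2, 0.10)]])
up = OnlineSample(strokes=[[(1, 2, 0.00), (1, 1, 0.05), (1, 0, 0.10)]])

image_down = rasterize(down, width=3, height=3)
image_up = rasterize(up, width=3, height=3)
```

Both trajectories produce the same image, which illustrates that an off-line sample no longer contains the timing and stroke-order information that is available in on-line mode.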
A complete recognition system consists of three parts, commonly corresponding to three separate problems: localization, segmentation, and recognition. The goals of these parts are, respectively, to find and isolate the contours of individual words (localization), to separate a word into individual characters (segmentation), and finally to map each segmented chunk to its correct interpretation (recognition).

Today the most popular approaches to character recognition involve some form of Machine Learning (ML) technique. A classifier is the set of techniques used and can be seen as a black box that, given some input, produces output in the form of a classification. Classification is regarded as an instance of supervised learning; that is, learning is performed using a set of correctly identified observations. Machine learning can be used for a complete recognition system or for specific parts of one.

A recognition system involves complex tasks that need large computational resources. Building recognition systems for smartphones has not yet been widely investigated [24]; consequently, it is of interest to determine a set of suitable technologies for this type of device.

This report will focus on the off-line case of handwriting recognition on a smartphone using neural networks. We will give an introduction to the various problems faced when building HWR systems. No assumptions or clear goal were given at the start of this project other than the questions of what the state of the field is today, what could be accomplished, and how it could be done. The report is therefore investigative in nature. To validate the findings, a prototype for character recognition on a smartphone will be presented as a proof of concept for how neural networks can be used efficiently on smartphones.

1.1 History

Character Recognition (CR) was first studied in the early 1900s, taking a mechanical approach using photocells [1].
Common techniques investigated included simple template matching and structural analysis. Initial development came to a halt when researchers realized the huge diversity and variability of handwritten input [48]. Research since then has not progressed linearly. CR was at first a very popular subject in the field of pattern recognition, since it was regarded as an easy problem to solve. As in many other fields of science and technology, the process is usually tangled, and progress is often made when lines of research diverge and their results are later cross-bred [48]. Modern state-of-the-art recognition systems use techniques from various fields of pattern recognition, machine learning, and artificial intelligence. Today on-line HWR is considered close to a solved problem [54]. Off-line handwriting recognition is, however, much harder and is considered an open question in the research community [55].

1.2 The Client

Bontouch AB is an IT consulting company whose aim is to partner in long-term collaborations with its customers. The company focuses on mobile solutions for platforms such as Android (Google) and iOS (Apple). Bontouch is located in Stockholm but has a global market. Projects developed for clients include, among others, Sweden’s first banking app for Skandinaviska Enskilda Banken AB (SEB), which makes use of off-line OCR scanning of a predefined printed OCR font on invoices.

For future projects, Bontouch is very interested in how current recognition systems can be extended or replaced to recognize arbitrary input. The main interest for Bontouch AB is to get an overview of the state of the field: what can be accomplished today, and how could a recognition system be implemented on a mobile platform?

1.3 Applications

HWR software simplifies the process of extracting data from handwritten documents and storing it in electronic formats.