Character Recognition in Natural Images Utilising Tensorflow
DEGREE PROJECT IN TECHNOLOGY, FIRST CYCLE, 15 CREDITS
STOCKHOLM, SWEDEN 2017

Character Recognition in Natural Images Utilising TensorFlow

ALEXANDER VIKLUND
EMMA NIMSTAD

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Degree project in Computer Science, DD143X
Date: June 12, 2017
Supervisor: Kevin Smith
Examiner: Örjan Ekeberg
Swedish title: Teckenigenkänning i naturliga bilder med TensorFlow
School of Computer Science and Communication

Abstract

Convolutional Neural Networks (CNNs) are commonly used for character recognition. They achieve the lowest error rates on popular datasets such as SVHN and MNIST. However, CNNs have seen little use in research on classifying characters in natural images covering the whole English alphabet. This thesis conducts an experiment in which TensorFlow is used to construct a CNN that is trained and tested on the Chars74K dataset, with 15 images per class for training and 15 images per class for testing. The aim is to achieve a higher accuracy than the non-CNN approach by de Campos et al. [1], which achieved 55.26%. The thesis explores data augmentation techniques for expanding the small training set and evaluates the effect of applying rotation, stretching, translation and noise-adding. All of these methods except noise-adding have a positive effect on the accuracy of the network. Furthermore, the experiment shows that a three-layer convolutional neural network can be used to create a character classifier that is as good as de Campos et al.'s. It is believed that even better results could be achieved if more experiments were conducted on the parameters of the network and the augmentation.
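The four augmentation operations named in the abstract (rotation, stretching, translation and noise-adding) can be sketched in a few lines of pure Python. This is not the implementation used in the thesis; it is a minimal sketch assuming greyscale images stored as lists of rows, nearest-neighbour resampling about the image centre, zero fill for pixels that would come from outside the image, and Gaussian noise clipped to the range 0 to 255. All function names and parameters are hypothetical.

```python
import math
import random


def _sample(image, x, y, fill):
    """Nearest-neighbour lookup; positions outside the image return `fill`."""
    r, c = round(y), round(x)
    if 0 <= r < len(image) and 0 <= c < len(image[0]):
        return image[r][c]
    return fill


def transform(image, inverse_map, fill=0):
    """Build an output image of the same size by mapping every output pixel
    (x, y) back to a source position in `image` (inverse mapping)."""
    h, w = len(image), len(image[0])
    return [[_sample(image, *inverse_map(x, y), fill) for x in range(w)]
            for y in range(h)]


def translate(image, dx, dy, fill=0):
    """Shift the content dx pixels right and dy pixels down."""
    return transform(image, lambda x, y: (x - dx, y - dy), fill)


def stretch(image, sx, sy=1.0, fill=0):
    """Scale the content about the image centre by sx (horizontal), sy (vertical)."""
    cx, cy = (len(image[0]) - 1) / 2, (len(image) - 1) / 2
    return transform(image,
                     lambda x, y: (cx + (x - cx) / sx, cy + (y - cy) / sy), fill)


def rotate(image, degrees, fill=0):
    """Rotate about the image centre; since rows grow downwards, positive
    angles turn the content clockwise as displayed."""
    t = math.radians(degrees)
    cos_t, sin_t = math.cos(t), math.sin(t)
    cx, cy = (len(image[0]) - 1) / 2, (len(image) - 1) / 2

    def inverse_map(x, y):
        dx, dy = x - cx, y - cy
        return cx + cos_t * dx + sin_t * dy, cy - sin_t * dx + cos_t * dy

    return transform(image, inverse_map, fill)


def add_noise(image, sigma, rng, lo=0, hi=255):
    """Add zero-mean Gaussian noise and clip to the valid grey-level range."""
    return [[min(hi, max(lo, round(p + rng.gauss(0, sigma)))) for p in row]
            for row in image]


img = [[0, 0, 0], [0, 0, 9], [0, 0, 0]]
print(rotate(img, 90))  # [[0, 0, 0], [0, 0, 0], [0, 9, 0]]
```

Each call yields one extra training image, so drawing a few random parameter values per original image multiplies the 15 training images per class. Inverse mapping (looking up a source pixel for every output pixel) is used so that rotated or stretched images contain no holes.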
Summary (Sammanfattning)

It is common to use convolutional artificial neural networks (CNNs) for image recognition, as they give the smallest error margins on well-known datasets such as SVHN and MNIST. However, research is lacking on the use of CNNs for classifying letters in natural images covering the whole English alphabet. This work describes an experiment in which TensorFlow is used to build a CNN that is trained and tested with images from Chars74K. 15 images per class are used for training and 15 per class for testing. The aim is to achieve a higher accuracy than 55.26%, which is what de Campos et al. [1] achieved with a method that does not use artificial neural networks.

The report explores different techniques for artificially expanding the small dataset and evaluates the result of applying rotation, stretching, translation and noise-adding. All of these methods except noise-adding have a positive effect on the network's accuracy. Furthermore, the experiment shows that a CNN with three layers can be used to create a letter classifier that is as good as de Campos et al.'s classification. If more experiments were conducted on the parameters of the network and the augmentation, even better results could likely be achieved.

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Scope and constraints
  1.3 Thesis outline
2 Background
  2.1 Text recognition
    2.1.1 Pre-processing
    2.1.2 Character recognition
  2.2 Neural networks
    2.2.1 Feedforward networks
    2.2.2 Convolutional neural networks
    2.2.3 Training
    2.2.4 Training neural networks with small datasets
3 Method
  3.1 Overview
  3.2 Chars74K dataset
  3.3 Image processing
    3.3.1 Pre-processing
    3.3.2 Dataset augmentation
  3.4 Neural network
    3.4.1 TensorFlow
    3.4.2 Input
    3.4.3 Convolutional layers
    3.4.4 Fully connected layers
    3.4.5 Cost function
  3.5 Training
4 Result
  4.1 Processed images
    4.1.1 Pre-processing
    4.1.2 Data augmentation
  4.2 Neural network accuracy
5 Discussion
  5.1 Discussion on the dataset
  5.2 Discussion on image processing methods
  5.3 Discussion on neural network approach
6 Conclusion
Bibliography

Chapter 1 Introduction

The need to recognise text has arisen from the massive data collection performed with modern technology. One aspect of this is the use of text recognition for document digitisation, automated data entry from scanned forms and separation of text from graphics, that is, tasks that can be performed with classic OCR (Optical Character Recognition). OCR can perform these tasks only under certain limitations: the images must be fronto-parallel and skew-compensated [2].

Another aspect of text recognition is its use for analysing images. The amount of data generated by the abundance of camera-equipped devices, such as smartphones and wearables, is increasing [3], and computers need to become better at analysing and indexing those images. The images may contain text that gives important contextual information, so indexing would improve if that text could be found and read. There is also a need for accurate text recognition in natural scenes within the field of automation, such as automatic reading of signs in driverless cars.

This thesis treats the problem of identifying individual characters that have already been extracted from natural images. There are many approaches to this problem, ranging from traditional methods to modern and more advanced techniques. In general, the methods make assumptions about the letters, for example that they are monochrome and contrast strongly with their background.
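The contrast assumption can be made concrete with a global threshold, a classic binarisation step that labels every pixel darker than the image's mean grey level as ink. The sketch below is illustrative only and is not taken from any method cited here: the 5x5 glyph, the grey levels and the multiplicative per-column lighting model are invented for the example.

```python
# Illustrative sketch: a global threshold that relies on the
# "dark, monochrome letter on a lighter background" assumption.

GLYPH_T = [  # 1 = ink, 0 = background (a toy 5x5 letter "T")
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
]


def render(glyph, ink=40, paper=200, light=None):
    """Render a binary glyph as integer grey levels.

    `light` gives one illumination level per column in percent; None means
    even lighting. The multiplicative shading model is a simplification.
    """
    light = light or [100] * len(glyph[0])
    return [[(ink if g else paper) * light[c] // 100 for c, g in enumerate(row)]
            for row in glyph]


def binarize_global(image):
    """Label a pixel as ink (1) if it is darker than the mean grey level."""
    pixels = [p for row in image for p in row]
    threshold = sum(pixels) / len(pixels)
    return [[1 if p < threshold else 0 for p in row] for row in image]


even = binarize_global(render(GLYPH_T))
shaded = binarize_global(render(GLYPH_T, light=[100, 80, 60, 40, 20]))
print(even == GLYPH_T)    # True: the assumption holds under even lighting
print(shaded == GLYPH_T)  # False
print(shaded[1])          # [0, 0, 1, 1, 1]: shadowed background labelled as ink
```

Under even lighting the letter is recovered exactly; once the right-hand side is in shadow, background pixels there fall below the single global threshold and are mistaken for ink.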
In a picture of a natural scene, however, the lighting conditions and the texture of materials affect colours and contrasts. Even letters that are monochrome in reality will therefore not look monochrome in a digital picture, and that is one of the challenges this problem poses.

Character recognition is well suited to an artificial neural network approach, an approach that has had great success in recent research. For example, neural networks have achieved the lowest error rates on the SVHN (Street View House Numbers) dataset, 1.64% [4], and on the MNIST dataset, 0.21% [5].

A tool that has simplified working with neural networks is TensorFlow, which was released by Google in 2015. The TensorFlow API includes functions for building and training neural networks, as well as for drawing diagrams and visualising the data.

1.1 Problem Statement

One of the biggest datasets of characters in natural images is the Chars74K dataset [6]. It was created by de Campos et al., who describe its creation in the report Character Recognition in Natural Images [1].

The report also provides a few baseline experiments for character classification, none of which uses neural networks. The most accurate method is a Multiple Kernel Learning (MKL) algorithm that evaluates six histograms of different features. Their classification scheme is trained with 15 images per character class and then tested on 15 images per character class, and it achieves an accuracy of 55.26%.

This thesis aims to evaluate how a neural network classification scheme compares with the MKL algorithm used by de Campos et al. To make the comparison fair, it uses the same training and test data, even though this is a small amount of data for training a neural network.
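The evaluation protocol in the problem statement (a fixed number of training and test images per class, scored by classification accuracy) can be sketched as follows. The helpers `split_per_class` and `accuracy` are hypothetical, and the random per-class selection with a fixed seed is an assumption; it is not necessarily how the particular Chars74K images were chosen by de Campos et al. or in this thesis.

```python
import random
from collections import defaultdict


def split_per_class(samples, n_train=15, n_test=15, seed=0):
    """Split (image, label) pairs into disjoint training and test sets holding
    exactly n_train and n_test images of every class.

    Hypothetical helper: random selection with a fixed seed is an assumption.
    """
    by_class = defaultdict(list)
    for image, label in samples:
        by_class[label].append(image)
    rng = random.Random(seed)
    train, test = [], []
    for label, images in by_class.items():
        if len(images) < n_train + n_test:
            raise ValueError(f"class {label!r} has only {len(images)} images")
        chosen = rng.sample(images, n_train + n_test)
        train += [(img, label) for img in chosen[:n_train]]
        test += [(img, label) for img in chosen[n_train:]]
    return train, test


def accuracy(predictions, labels):
    """Fraction of test images whose predicted class equals the true class."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == t for p, t in zip(predictions, labels)) / len(labels)


# Toy data: 3 classes with 40 placeholder "images" each.
samples = [(f"{c}_{i}", c) for c in "abc" for i in range(40)]
train, test = split_per_class(samples)
print(len(train), len(test))  # 45 45
acc = accuracy(["a", "b", "b", "c"], ["a", "b", "c", "c"])
print(acc, 1 - acc)           # 0.75 0.25 (accuracy, error rate)
```

Accuracy and error rate are complementary, so the 55.26% accuracy of de Campos et al. corresponds to a 44.74% error rate, whereas the SVHN and MNIST figures for neural networks are stated as error rates.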
The resulting problem statement is:

Is an artificial neural network method for classifying characters in natural images more accurate than the method tested by de Campos et al. [1]?

1.2 Scope and constraints

Identifying text in an image is a complex problem with many subproblems. Full-fledged text recognition software would consist of algorithms that detect and extract the characters, categorise the individual characters, and then use lexical and grammatical analysis to correct characters that may have been incorrectly categorised.

This thesis is concerned only with the most central task: categorising characters that have already been detected and extracted from images. No further lexical or grammatical analysis is performed to improve the results.

1.3 Thesis outline

The thesis begins by presenting the background of character recognition and neural networks, focusing on how convolutions work and how neural networks are trained. The method chapter presents the Chars74K dataset and describes in detail the algorithms for processing the images and training the neural network. The achieved results are then presented and compared with de Campos et al.'s results to determine how our algorithm performs on the given dataset. The thesis ends with an analysis of the tested algorithm, indicating its strengths and how it could be improved.

Chapter 2 Background

This chapter describes the techniques used in this thesis. First, the area of text recognition is outlined. Second, an introduction to neural networks is given, covering both their structure and how they are trained. Training optimisations and data augmentation methods are also described.

2.1 Text recognition

According to Jung et al. [7], the text recognition problem can be divided into four subproblems:
1. detection,
2. localisation,
3. extraction and enhancement,
4. recognition.
The first three steps can be grouped together as pre-processing, that is, the steps performed before the main task of recognising the characters. Jung et al. [7] note that many reports use the terms for these steps interchangeably.
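The four-stage decomposition can be illustrated with a toy pipeline in which the first three functions form the pre-processing and the last one classifies the extracted character. Everything here is a hypothetical stand-in: a fixed grey-level threshold (`INK_THRESHOLD`) replaces real text detection, the bounding box of dark pixels replaces localisation, and nearest-template matching over three made-up 3x3 glyphs replaces the CNN classifier studied in this thesis. It also assumes a single dark character on a light background, exactly the kind of assumption that breaks down in natural scenes.

```python
# Toy four-stage text recognition pipeline, following the decomposition of
# Jung et al. All thresholds, templates and function names are hypothetical.

INK_THRESHOLD = 128  # assumed: grey levels below this count as text pixels

TEMPLATES = {  # made-up 3x3 reference glyphs, 1 = ink
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "L": [[1, 0, 0], [1, 0, 0], [1, 1, 1]],
    "H": [[1, 0, 1], [1, 1, 1], [1, 0, 1]],
}


def detect(image):
    """Stage 1, detection: does the image contain any text-like pixels?"""
    return any(p < INK_THRESHOLD for row in image for p in row)


def localise(image):
    """Stage 2, localisation: bounding box (top, left, bottom, right) of the ink."""
    coords = [(r, c) for r, row in enumerate(image)
              for c, p in enumerate(row) if p < INK_THRESHOLD]
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


def extract(image, box):
    """Stage 3, extraction and enhancement: crop to the box and binarise."""
    top, left, bottom, right = box
    return [[1 if p < INK_THRESHOLD else 0 for p in row[left:right + 1]]
            for row in image[top:bottom + 1]]


def recognise(glyph):
    """Stage 4, recognition: return the template that agrees on most pixels."""
    def agreement(template):
        return sum(g == t for g_row, t_row in zip(glyph, template)
                   for g, t in zip(g_row, t_row))
    return max(TEMPLATES, key=lambda label: agreement(TEMPLATES[label]))


def read_character(image):
    # Stages 1-3 are the pre-processing steps.
    if not detect(image):
        return None
    glyph = extract(image, localise(image))
    # Stage 4 is the only stage this thesis addresses.
    return recognise(glyph)


scene = [[220] * 6 for _ in range(5)]               # light background
for r, c in [(1, 2), (2, 2), (3, 2), (3, 3), (3, 4)]:
    scene[r][c] = 30                                # dark "L" strokes
print(read_character(scene))                        # L
```

A real system would also resize the extracted glyph to the classifier's input size between stages 3 and 4; the sketch skips this because its templates already match the size of the crop.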