Automatic Number Plate Recognition for Android
Computer Science

Stefan Larsson, Filip Mellqvist

Automatic Number Plate Recognition for Android

Bachelor's Project

© 2019 The author(s) and Karlstad University

This report is submitted in partial fulfillment of the requirements for the Bachelor's degree in Computer Science. All material in this report which is not our own work has been identified and no material is included for which a degree has previously been conferred.

Stefan Larsson
Filip Mellqvist

Approved, 04-06-2019
Advisor: Tobias Pulls
Examiner: Stefan Alfredssoon

Abstract

This thesis describes how we utilize machine learning and image preprocessing to create a system that can extract a license plate number from a picture of a car taken with an Android smartphone. This project was provided by ÅF on behalf of one of their customers, who wanted to make the workflow of their employees more efficient. The two main techniques of this project are object detection, to detect license plates, and optical character recognition, to then read them. In between are several image preprocessing techniques that make the images as readable as possible. These techniques mainly include skewing and color distorting the image. The object detection consists of a convolutional neural network using the You Only Look Once technique, trained by us using Darkflow. When using our final product to read license plates of expected quality in our evaluation phase, we found that 94.8% of them were read correctly. Without our image preprocessing, this was reduced to only 7.95%.

Contents

1 Introduction
1.1 Purpose of our project
1.2 Dissertation layout
1.3 Project outcome
2 Background
2.1 ÅF
2.2 Machine Learning
2.2.1 Neural Networks
2.2.2 Convolutional Neural Networks
2.2.3 K-means clustering
2.3 Computer Vision
2.3.1 What is Computer Vision?
2.3.2 Low- and High-level Vision
2.3.3 Binary Image and Adaptive Threshold
2.3.4 The HSV model
2.3.5 Image Classification and Object Detection
2.3.6 Optical Character Recognition
2.4 Tools
2.4.1 OpenCV
2.4.2 Tesseract-OCR
2.4.3 Pytesseract
2.4.4 You Only Look Once
2.4.5 Anaconda
2.5 OpenALPR
2.6 Summary
3 Project Design
3.1 Android application
3.2 OCR Service
3.3 Database
3.4 Summary
4 Project Implementation
4.1 Building an object detector
4.1.1 Setting up an environment
4.1.2 Training
4.1.3 Porting to an Android device
4.2 OCR service
4.2.1 The two colors of the plates
4.2.2 Prepare the image
4.2.3 Identify the contour
4.2.4 Identify the corners
4.2.5 Skew the plate
4.2.6 Refining the image
4.2.7 Read the image
4.3 Summary
5 Evaluation
5.1 Android performance
5.2 Evaluation of object detection
5.3 OCR service
5.3.1 OCR performance
5.3.2 Precision vs. time
5.3.3 Evaluating the impact of preprocessing
5.4 Summary
6 Conclusion
6.1 Project summary
6.2 Future work
6.3 Concluding remarks
References

List of Figures

1.1 A simple model showing our whole system.
2.1 An image with red and green channel (image: CC BY-SA 3.0 [9]). The colors are represented on a plot grouped into segments using k-means (image: public domain [11]).
2.2 Performing canny edge detection on a pair of headphones.
2.3 Local and global threshold applied on an image with both bright and dark areas. Local adaptive thresholding on the image in the middle and global fixed thresholding on the image to the right.
2.4 The HSV cylinder showing the connection of the values: Hue, Saturation and Value (image: CC BY-SA 3.0 [10]).
2.5 How an Image Classifier and an Object Detector would see a cup.
3.1 The planned flow of our system.
4.1 Annotating a picture of a license plate.
4.2 A screenshot of the final application where the object detector is more than 90% confident that a license plate is found.
4.3 The result of k-means color quantization where k=3.
4.4 The desired HSV mask applied. The quadrilateral is highlighted and the characters are easy to identify.
4.5 HSV Masks with six different ranges (values in upper left corner). Left three lower values and right three higher values.
4.6 The final accepted mask, acquired when the saturation value ranges from 0-60 (second value in each array) and the value (third value) is ranging from 195-255.
4.7 Desired contour drawn on the source image.
4.8 The corners of the contour drawn on the source image.
4.9 A quadrilateral with the edges A, B, C and D and the corners E, F, G and H.
4.10 The result of grabbing the corners of the source image and skewing them to the desired corners.
5.1 A flowchart of the image being put into the OCR service pipeline, first being preprocessed and then read by the OCR software, later to be matched or not.
5.2 A simplified flowchart of the binarization process where the image is put into the adaptive threshold, and if not matched by the OCR software, will go into the manual threshold together with a calculated threshold value.

List of Tables

5.1 Comparing the running time of our app to the specifications of our devices.
5.2 Comparison of size and time between two generic images of minimum and maximum potential dimensions with no need for iteration.
5.3 A table showing the outcomes of the varying multipliers with the four most essential numbers for our evaluation.
5.4 A table showing x, which is the number the attempt will get raised to every iteration, together with the four most important numbers for the evaluation.
5.5 A table comparing the two methods for tweaking the threshold in the binarization of the image.
5.6 A table comparing the accuracy and speed of the OCR software with and without preprocessing.
5.7 A table comparing the accuracy and speed of the OCR software with and without preprocessing on a subset of optimized images.

Listings

4.1 Create and prepare an Anaconda virtual environment called tgpu2.
4.2 Clone Darkflow repository.
4.3 Download Darkflow dependencies with Anaconda.
4.4 Installing Darkflow using pip.
4.5 Initiating a Darkflow training session.
4.6 OpenCV k-means on image.
4.7 Reading the image with OpenCV.
4.8 Creating HSV ranges. That is the upper and lower limit.
4.9 Creating the mask with the input of the image, the lower values in HSV, as well as the upper ones.
4.10 Creating dynamic HSV ranges. The upper and lower limit respectively will part progressively.
4.11 Getting coordinates of all points that will make up the contours of the mask.
4.12 Confirms quadrilateral with arcLength() and approxPolyDP().
4.13 Locating the corners, iterating through every point in the contour.
4.14 The function that utilizes OpenCV's functions getPerspectiveTransform() and warpPerspective() to skew the image.
4.15 The function where the binarized image is returned together with the calculated optimal threshold.
4.16 The implementation of Pytesseract, configured to read the license plate.

List of Abbreviations

ML - Machine Learning
OCR - Optical Character Recognition
CNN - Convolutional Neural Network
OpenCV - Open Source Computer Vision
PyTesseract - Python-Tesseract
YOLO - You Only Look Once
HSV - Hue, Saturation, Value
RGB - Red, Green, Blue

1 Introduction

Machine learning, neural networks and artificial intelligence are all concepts that have exploded in popularity over the past few years. They allow computers to automatically analyze immense amounts of data and make decisions based on its patterns.
This can be invaluable because the amounts of data used are often much too large for any human to analyze, comprehend and draw a conclusion from. Machine learning can be used almost anywhere, from self-driving cars to brewing the perfect pint of beer¹.

Computer vision is a field in computer science that has had great success due to the increasing popularity of machine learning. Instead of having a human look at images and decide what they depict, we are able to teach computers to recognize patterns in previous images and see the resemblance in new images. Computer vision can also be used to read alphanumeric characters in images and turn them into text.

1.1 Purpose of our project

The purpose of this project is to develop a system for our employer ÅF that will change the workflow for one of their customers. This customer has employees who often file damage reports on cars they own using an application created by ÅF. In its current state, the workflow consists of taking several pictures of the car in question and then opening a text editor to manually enter the license plate number so that the application can download information about the car.