Character Recognition with Minimum Edit Distance Method

International Journal of Science and Research (IJSR), India Online ISSN: 2319‐7064 Character Recognition with Minimum Edit Distance Method Parul Vashist1, K. Hema2 1I.T ,U.P.T.U, GNIOT, Greater Noida Greater Noida,201308, North East Zone, India [email protected] 2E.I,U.P.T.U, GNIOT, Greater Noida Greater Noida,201308, North East Zone, India [email protected] Abstract: Character Recognition is performed using Multilayer preceptor neural network. Number of hidden layer nodes is determined to achieve high performance back propagations network in the recognition of Characters. ICR software has a self-learning system referred to as a neural network, which automatically updates the recognition database for new handwriting patterns. It extends the usefulness of scanning devices for the purpose of document processing, from printed character recognition (a function of OCR) to hand-written matter recognition. We also compared ICR with OCR and OMR with the Edit distance method refers to the distance in which insertions and deletions have equal cost and replacements have twice the cost of an insertion. Keywords: Character Recognition, ICR, Feature Extraction, Edit Distance Method 1. Introduction projections given new situations of interest and answer "what if" questions. 1.1 Neural Network Other advantages include: An Artificial Neural Network (ANN) is an information processing paradigm that is inspired by the way biological 1. Adaptive learning: An ability to learn how to do nervous systems, such as the brain, process information. tasks based on the data given for training or The key element of this paradigm is the novel structure of initial experience. the information processing system. It is composed of a 2. Self-Organization: An ANN can create its own large number of highly interconnected processing organization or representation of the information elements (neurons) working in unison to solve specific it receives during learning time. problems. ANNs, like people, learn by example. An ANN 3. Real Time Operation: ANN computations may be is configured for a specific application, such as pattern carried out in parallel, and special hardware recognition or data classification, through a learning devices are being designed and manufactured process. Learning in biological systems involves which take advantage of this capability. adjustments to the synaptic connections that exist between 4. Fault Tolerance via Redundant Information the neurons. This is true of ANNs as well. Coding: Partial destruction of a network leads to the corresponding degradation of performance. However, some network capabilities may be retained even with major network damage. 1.3 Back Propagation Back propagation, an abbreviation for "backward propagation of errors", is a common method of training artificial neural networks. From a desired output, the network learns from many inputs, similar to the way a child learns to identify a dog from examples of dogs. Figure 1: Neural Network It is a supervised learning method, and is a generalization 1.2 Why we use neural networks? of the delta rule. It requires a dataset of the desired output for many inputs, making up the training set. It is most Neural networks, with their remarkable ability to derive useful for feed-forward networks (networks that have no meaning from complicated or imprecise data, can be used feedback, or simply, that have no connections that loop). to extract patterns and detect trends that are too complex Back propagation requires that the activation function to be noticed by either humans or other computer used by the artificial neurons (or "nodes") be techniques. A trained neural network can be thought of as differentiable. Back-propagation neural network is only an "expert" in the category of information it has been practical in certain situations. Following are some given to analyses. This expert can then be used to provide guidelines on when you should use another approach: Volume 2 Issue 4, April 2013 www.ijsr.net 457 International Journal of Science and Research (IJSR), India Online ISSN: 2319‐7064 Can you write down a flow chart or a formula backbone of the character has been deleted and the broad that accurately describes the problem? If so, then strokes has been reduced to thin lines .Skeletonization is stick with a traditional programming method. illustrated in Figure 3. Is there a simple piece of hardware or software that already does what you want? If so, then the development time for a NN might not be worth it. Do you want the functionality to "evolve" in a Figure 3: Skeletonization of an English Character direction that is not pre-defined? If so, then consider using a Genetic Algorithm (that's 2.3 Normalization another topic!). Do you have an easy way to generate a There are lots of variations in handwritings of different significant number of input/output examples of persons. Therefore, after skeletonization process, the desired behavior? If not, then you won't be normalization of characters is performed so that all able to train your NN to do anything. characters could become in equal dimensions of matrix. Is the problem is very "discrete"? Correct answer can be found in a look-up table of reasonable 3. Character Recognition Types size? A look-up table is much simpler and more accurate. 3.1 OCR Is precise numeric output values required? NN's OCR (Optical Character Recognition) also called Optical are not good at giving precise numeric answers. Character Reader is a system that provides a full alphanumeric recognition of printed or handwritten 2. Character Recognition characters at electronic speed by simply scanning the form. More recently, the term Intelligent Character The procedure of handwritten English character Recognition (ICR) has been used to describe the process recognition is as follows: of interpreting image data, in particular alphanumeric text. • Acquire the sample by scanning. Function of OCR Forms containing characters images can • Skeletionization and Normalization operations be scanned through scanner and then recognition engine are performed. of the OCR system interpret the images and turn images • Apply Boundary Detection Feature Extraction of handwritten or printed characters into ASCII data technique. (machine-readable characters). Therefore, OCR allows us • Neural network Classification.\ to quickly automate data capture from forms, eliminate • Recognized Character keystrokes to reduce data entry costs and still maintain the high level of accuracy required in forms processing 2.1 English Character applications. The English language consists of 26 characters (5 vowels, Features of OCR 21 consonants) and is written from left to right. A set of hand written English characters as shown in figure 2 The technology provides a complete form processing and +documents capture solution. Usually, OCR uses a modular architecture that is open, scalable and or flow controlled. It includes forms definition, scanning, image pre-processing, and recognition capabilities. There are two basic types of core OCR algorithm, which may produce a ranked list of candidate characters. Matrix matching involves comparing an image to a stored glyph on a pixel-by-pixel basis; it is also known as "pattern matching" or "pattern recognition". This relies on the input glyph being correctly isolated from the rest of the image, and on the stored glyph being in a similar font and at the same scale. This technique works best with typewritten text and does not work well when new fonts are encountered. This is the technique the early physical Figure 2: A Set of Handwritten English Characters photocell-based OCR implemented, rather directly. 2.2 Scanning and Skeletonization Feature extraction decomposes glyphs into "features" like lines, closed loops, line direction, and line intersections. Handwritten characters are scanned and it has been These are compared with an abstract vector-like converted into 1024 (32X32) binary pixels. The representation of a character, which might reduce to one skeletonization process will be used to binary pixel image or more glyph prototypes. General techniques of feature and the extra pixels which are not belonging to the detection in computer vision are applicable to this type of Volume 2 Issue 4, April 2013 www.ijsr.net 458 International Journal of Science and Research (IJSR), India Online ISSN: 2319‐7064 OCR, which is commonly seen in "intelligent" 3.3 ICR handwriting recognition and indeed most modern OCR Intelligent Character Recognition (ICR) is the module of software. Nearest neighbor classifiers such as the k- OCR that has the ability to turn images of hand written or nearest neighbors algorithm are used to compare image printed characters into ASCII data. Sometimes OCR is features with stored glyph features and choose the nearest known as ICR. Most ICR software has a self-learning match. system referred to as a neural network, which automatically updates the recognition database for new 3.2 OMR handwriting patterns. It extends the usefulness of scanning Optical mark recognition (also called optical mark reading devices for the purpose of document processing, from and OMR) is the process of capturing human-marked data printed character recognition (a function of OCR) to hand- from document forms such as surveys and tests. written matter recognition. Because this process is involved in recognizing hand writing, accuracy levels Many traditional OMR devices work with a dedicated may, in some circumstances, not be very good but can scanner device that

Character Recognition with Minimum Edit Distance Method

Intelligent Chat Bot

An Ensemble Regression Approach for Ocr Error Correction

Practice with Python

NLP - Assignment 2

3 Dictionaries and Tolerant Retrieval

Use of Word Embedding to Generate Similar Words and Misspellings for Training Purpose in Chatbot Development

Feature Combination for Measuring Sentence Similarity

The Research of Weighted Community Partition Based on Simhash

Hybrid Algorithm for Approximate String Matching to Be Used for Information Retrieval Surbhi Arora, Ira Pandey

Levenshtein Distance Based Information Retrieval Veena G, Jalaja G BNM Institute of Technology, Visvesvaraya Technological University

Thesis Submitted in Partial Fulﬁlment for the Degree of Master of Computing (Advanced) at Research School of Computer Science the Australian National University

Effective Search Space Reduction for Spell Correction Using Character Neural Embeddings