International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

An Enhanced Template Matching Technique for Recognition of Telugu

N. SHOBHA RANI*, VASUDEV T*, PRADEEP C. H.** *Maharaja Research Foundation, Maharaja Institute of Technology University of Mysore, Mysore **Department of Computer Science, Amrita Vishwa Vidyapeetham Mysore Campus, Mysore [email protected]

Abstract: - Feature extraction and classification processes while developing Optical Character Recognition (OCR) systems involve massive computations and quite expensive especially for South Indian scripts. Multiple combinations of and along with its modifiers led to generation of huge number of classes with respect to character recognition systems. The feature extraction and classification of characters from such huge number of classes in south Indian language OCRs remains as a non-trivial problem. This paper proposes a technique for feature extraction and classification of Telugu handwritten script based on customized template matching approach with the support of caching technique for better performance. The technique of caching is implemented using main database with a cache database maintaining the frequently used character templates for set of all character templates. The XML database is used for defining the classes for various character templates and the class representations are provided using a novel class structure designed based on XML tags. The proposed system exhibits the recognition efficiency on our own test dataset with an accuracy of 93.55% for handwritten characters.

Key-Words: - Telugu characters, Telugu OCR, Customization, template matching, XML schema, cache database.

1 Introduction

The offline OCRs imbibe a scanned document input documents or forms [2], consisting of both printed and converts it into machine editable document and handwritten scripts. format necessarily into of corresponding character images. The input documents are pre- The process of automatic reading of documents is composed with text of either printed or handwritten composed of a sequence of stages like image script pertaining to a particular language. The acquisition, pre-processing, object extraction, documents used in character recognition systems are normalization or windowing, feature extraction, classified as variety of types [1]. The standard classification and post-processing. The image classifications are printed and handwritten acquisition is the process of imbibing a document as documents. The printed documents are composed an input to the character recognition system. Pre- with a particular font style and size of the language. processing stage subjects to removal of noises from The handwritten documents are written by the the image that occurs due to variety of external author in a particular script, handwriting style of the factors like improper image scan settings, quality or individual, with the freedom of writing. The resolution of image, quality of image capturing freedom and flexibility taken by the author while device and lack of illumination etc. The resulting writing the script adds more complexities in images require additional pre-processing techniques recognition of the characters, since each individual like elimination of page layout [3] and graphical have their own handwriting style and in addition the components in the image [4], skew detection and mistakes while writing appends conflicts in the correction [5] etc followed by transforming the process of recognition. The combination of printed document to a suitable form for further processing. and handwritten document had resulted into another The object extraction [6] generally called as type of documents called as application/pre-printed segmentation; the process of identifying boundaries for region of interest (ROI).

ISSN: 2367-8984 154 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

The normalization or windowing is process of representation of the image and standard classifiers converting all the character objects extracted into are used to classify the images. However these equal sized blocks [7]. The windowed character features are very much sensitive to clutter and noise objects are further sent for feature extraction [8] in in the images and require only the presence of a which the features of character object are computed single object. On the other hand characters in south and sent for classification [9]. The classification is Indian scripts like Telugu are composed of more the process of determining a match of features than one object making it more complex to apply computed for particular character extracted with the high level feature extraction techniques. Therefore set of trained character features in the database. The the present work aims at performing the feature classified character object is considered as the extraction and classification of an object using a recognized character class which is sent for post customized template matching technique. In processing where it transforms the character class addition the template matching performance is with the corresponding Unicode of the character. further improved caching technique and parameter Feature extraction and classification stages being the that measures the similarity between trained and test important concern in the present work, a survey of templates. various techniques of feature extraction is discussed subsequently. 2 Related Work

Feature extraction is broadly categorized into low Numerous works are reported in literature using level and high level feature extraction. The low template matching for classification or recognition level feature extraction represents the spatial of characters for different languages. Some of the relationships between the pixels which are termed as approaches referred are discussed in brief. intensity discontinuities and these techniques do not retrieve the information related to shapes or position Majid Ziaratban et. al. [21] proposed a language or scale etc. in an image. Further low level features based template matching method for recognition of represent the edge details in the image which are Farsi/ using neural networks and implemented using first order derivative operators multi-layer perceptron model claimed an accuracy and the second order derivative operators [10]. of 97.65%. However the experimentation has not Some of the popular edge detection operators are considered the complex script for recognition. Canny, Sobel, Kirsch, Prewitt and Laplacian, Marr- Sunny Kumar et. al [22] had analyzed the Hildreth operators etc. However, the edge details of performance of template matching algorithm to particular image are not sufficient enough for English handwritten and type written characters recognition of characters in optical systems. using parameters like accuracy rate and time taken for execution. The accuracy attained is found to be In general high level feature extraction is used for about 83% for the both, however the experiment detecting and recognizing the shape of a particular was carried for a small data set, .., on about 360 object in the image. Shape description indicates images, which is relatively less for testing template computing the position, orientation and size of a matching algorithm. Danish et. al. [23] employed particular object in image. The description of a the template matching technique for recognition of particular shape depends upon the application handwritten, machine printed and type written requirements. The invariant features of a shape are English characters and numerals obtained an usually considered to establish the reliability and accuracy of about 94.50% for standard typewritten robustness of the system. The objects in image that fonts, 88% for unknown type written fonts, 98% for seek the shape information requires windowing it to numerals and 75% in case of unknown type written a common size, scale and proper illumination to fonts. Nikhil et. al. [24] applied the template for differentiate the shape from its back ground multi font styles and multi font sizes of English intensity. Many alternative ways also existing to script and attained an accuracy of around 90%. Mo perform the matching between the test template and Wenying et. al. [25] applied the template matching trained templates in the images [11-14]. algorithm by customizing with respect to weighted matching degree. This algorithm provides a higher Other sub category with in high level feature matching rate and overcome the fallacious extraction techniques includes structural features recognition produced by traditional calculation [15], statistical features [16] and global features method with accuracy of around 100%. Jatin et. [17]. Most shape and texture descriptors fall into al.[26] employed the template matching technique this category. These features provide the compact

ISSN: 2367-8984 155 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

for type written English characters and classified using neural network classifiers.

In most of the reported works the template matching algorithm is implemented directly for various language scripts and the maximum approaches are focused on English character recognitions. Thus the Fig. 2(a) Binary templates 2(b) Edge templates proposed methodology makes an attempt to analyze the performance of the template matching with In the usual template matching approaches, the required enhancement particular to Telugu character trained template of maximum match among all the recognition system. templates in the database is considered as the template of best correlation with the test template. 3 Proposed Methodology The matched template will be further sent for The proposed work aims at providing a customized recognition and then for post-processing. The template matching approach with the improved significance of proposed system lies in performing performance for feature extraction and classification of the Telugu character set. To improve the the template matching efficiently. In the perspective performance of the template matching algorithm a of improving the performance of the template threshold measure called SIMILARITY MEASURE matching model, an additional parameter called as is imposed along with the shape matching analysis SIMILARITY MEASURE is applied to determine between test and trained templates. Further, to the percentage of shape matched between the test simplify the process of classification an XML template and trained template. Whenever the database with technique of caching is incorporated. percentage of correlation between the test and The architecture of the proposed system for feature trained template is found to exceed more than a extraction and classification is represented in fig. 1. specified threshold then the test template has attained the required amount of SIMILARITY

Words extracted Characters Features MEASURE and considered as recognized. Thus the from lines extracted extracted implication of SIMILARITY MEASURE reduces the time complexity of traditional template matching Cache DB process.

Correlation Recognition Process P analysis with Further, the trained templates in the proposed trained templates architecture are accessed through a technique of caching by categorizing the entire database into Post main database and cache database. Caching Main DB Processing enhances the performance of template matching algorithm by classifying the complete set of trained Process P Searches for best correlation between the test and trained template templates between the main database and cache Fig. 1 Architecture of feature extraction and database. The main database is nothing but the classification XML database of the proposed system consisting of complete set of trained template models. The cache The proposed system proceeds with the input as database consists of only references to the recently words extracted from lines and in turn character accessed set of templates in the main database with extraction from each word. The characters extracted respect to the particular session of OCR. Whenever are analyzed for its shape by performing the a reference is made for analyzing the correlation template matching using correlation analysis [25, between the test template and trained template, the 26]. The correlation analysis is performed between initial access is made to cache database if references the trained set of templates and shape extracted from are available in cache database, then the correlation test template for searching the best correlation that is computed between the test template and set of satisfies the SIMILARITY MEASURE. Fig. 2(a) recently referenced trained templates available in and fig. 2(b) depicts samples of the binary templates the cache database. If correlation adhering to and edge templates used for shape matching SIMILARITY MEASURE threshold is identified respectively. from the cache database then it will eliminate the need of performing template matching with the

ISSN: 2367-8984 156 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

remaining set of templates in the main data base. In case there is no best match identified for test LPxypq,,  pq(, ) template in the cache database then the correlation (,)xy S (3.2) analysis is directed towards the trained templates By substituting equation (3.1) in (3.2), we have available in the main database. The process of finding a match in main database excludes the set of n 1 TSxpyq,, xy 2 templates cached in cache database.    1 2(,)xy S Lepq,   3.1 Feature Extraction by shape matching 2 (3.3)

Feature extraction is the process of representing the Where ‘n’ is the number of pixels in the template. images in its compact form so that the features will The function in equation 3.3 is called the likelihood uniquely identify a particular image. In the proposed function. Generally, it is expressed in logarithmic work, feature extraction by shape matching is form to simplify the analysis. The logarithmic implemented via template matching. Formally the operation scales the function, but it does not change template matching is defined as the method of the position of maximum. Thus, by applying parameter estimation. The parameters are generally logarithm, the likelihood function is redefined as considered as the set of coordinates that represent equation (3.4). the positions of the shapes that are extracted from the character templates which is normally the 2 brightness details in the image. 11TSxpyq,, xy In() Lpq,  nIn  2 2(,)xy S  Let F is function of parameter estimation, whose  (3.4) input is the respective coordinate positions of the shape extracted from test template or character. The In maximum likelihood estimation, choose the set of coordinate positions that represent the shape parameter that maximizes the likelihood function. extracted from test template are denoted by The trained templates that minimize the rate of change of objective function in equation (3.4) is Sxyxyxyxy ( , ),( , ),( , )...( , ) given by equation (3.5). 11 2 2 33 nn 2 and T represents the set of coordinate positions with min eTS  xpyq,, xy respect to the shape of trained template and is given (,)xy S (3.5) by Hence maximum likelihood estimation is equivalent Tpqpqpqpq (11 , ),( 2 , 2 ),( 3 , 3 )...(nn , ) to choose trained template model in the database that minimizes the squared error. The trained The function F evaluates whether the set S Є T or template model for which the test template best not. The correlation analysis considers some other matches is the recognized template in the database. parameters like noise in the image. Let the pixel P in Thus the result achieved by template matching is the set S is corrupted by Gaussian noise. The noise optimal for the test templates that are corrupted by has a mean of zero and standard deviation is ߪ. Gaussian noise. The function that maximize the Thus the probability that a point P, in the set S cross correlation between the test template and the matching with set T i.e., S(x, y) Є T(p, q) is given trained template through the best match can be by the normal distribution [10] as in equation (3.1). computed using equation (3.6).

1 TSpxqy,, pq 2    max eTS xpy,, q xy 1 2    Ppqxy, (,) e (,)xy S (3.6) 2 (3.1) The squared error while matching may vary from Since the noise affecting each pixel is independent, one trained template to another, the match defined the probability that the template is at position (p, q) can be poor in case of unmatched templates. is the combined probability of each pixel that the Admittedly the range of correlation is dependent template covers is given by equation (3.2). upon the size of the templates and noticeably it is

ISSN: 2367-8984 157 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

variant to changes in template lighting conditions. In the proposed methodology the template matching is performed with respect to the bit map images of test templates and a set of trained templates. The bitmap image contains the edges of characters inherent in the template. Thus the bitmap image reduces the computation time.

3.2 Design of XML schema for the creation of character classes Fig. 4 XML database populated with classes XML provides a handy format for the managing the data of any format efficiently and also it is The tag in class representation of comfortable to integrate with various application XML will be able to accept ‘n’ number of Unicode programs [19, 20]. This approach provides the in format separated by semicolon (;), advantage of creating a new class structure for called as cascading of Unicodes. The feature of representation of huge number of classes and also cascading multiple Unicode combinations will facilitates the template matching based classification enable the XML class structure to extent of word of various characters. We propose a novel XML level class creation and recognition. The XML class schema for representation of various classes structure is very flexible, generic and suitable to irrespective of the script processed by OCR. The extend for the creation of database of any other XML schema includes an index to each class, path script. specifying the binary edge templates and its corresponding Unicode combination in hexadecimal 3.2.1 Algorithmic design for Caching format. Fig. 3 provides the syntax of a typical XML of class. An algorithm has been designed for caching activity during template matching and is presented for comprehension. [Index position in integer format] [Name of the letter] Caching(TEST_TEMPLATE)  [Unicode in UTF-8-Hex format] 1. Initialize TEMP 0, REQUIRED_SIMILARITY_MEASURE  0.8 [To access the features of binary edge template]  2. IDENTIFIED_EQ NULL 3. Resize TEST_TEMPLATE to DIM(55,55) 4. for n=1tosize(CACHEDB) Fig. 3 XML schema for class representation in the TRAINED_TEMPLATE  CACHEDB{n}. MAINDB data base The character tag and is ACTUAL_SIMILARITY_MEASURE  CORRELATION(TRAINE used to enclose the class information like name, D_TEMPLATE, TEST_TEMPLATE) If(ACTUAL_SIMILARITY_MEASURE > path, Unicode required for class creation. The index REQUIRED_SIMILARITY_MEASURE) tag and provides the unique index TEMP  ACTUAL_SIMILARITY_MEASURE position to each class, the letter tag and IDENTIFIED_EQ  CACHEDB{n} assigns a name to each class which PRINT(“ TEST TEMPLATE RECOGNIZED IN CACHE DB”) generally represents the image file with .bmp break; end extension of the character template. Equivalent tag end and encloses the 5. If (TEMP == 0) Unicode combination in UTF-8 hexadecimal format for n=1 to size(MAINDB) and path tag and specifies the path TRAINED_TEMPLATE  MAINDB{n} to the respective binary edge template, which  accepts the edge template format in .bmp. Fig. 4 ACTUAL_SIMILARITY_MEASURE CORRELATION(TRAINE D_TEMPLATE, TEST_TEMPLATE) explicates instances of the classes created for the If(ACTUAL_SIMILARITY_MEASURE > Telugu handwritten and printed OCR in the REQUIRED_SIMILARITY_MEASURE) proposed system. TEMP  ACTUAL_SIMILARITY_MEASURE CACHEDB{SIZE (CACHEDB) + 1}  MAINDB{n}  IDENTIFIED_EQ MAINDB{n}. CHARACTER 1 - ta e0b085 ISSN: @\telugu\a\- 2367-8984 ta.bmp 158 Volume 2, 2017 2 / International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

EQUIVALENT consonants and modifiers. Telugu  MAINDB{n} [ ] // Marking for restricting repeated language has 16 vowels, 36 consonants and 432 access PRINT(“ TEST TEMPLATE RECOGNIZED IN MAINDB single conjunct composite letters (Guninthalu) of AND ENTRY ADDED TO CACHE DB”); 504 after excluding the vowels with ardaanuswara break; ( ) and anuswara ( ) along with the mere end ◌ం ◌ః end consonant ( ). The handwritten datasets for the end ◌్ 6. If (TEMP > REQUIRED_SIMILARITY_MEASURE) experimentation had been collected from 25 authors //Proceed the test template for post-processing end each comprising of vowels, consonants, single 7. Stop. conjunct and double conjunct consonant The REQUIRED_SIMILARITY_MEASURE specifies the clusters. percentage of correlation required for assessing the The training to the database is conducted with 2730 test template to be recognized and handwritten character templates. The datasets used ACTUAL_SIMILARITY_MEASURE specifies the for training includes 455 frequently used character percentage of correlation of trained template set of Telugu script of 6 different users. The 455 computed. The attribute IDENTIFIED_EQ stores the character templates include vowels, consonants and recognized test template for carrying forward to post single as well as multi-conjunct vowel consonant processing stage. The trained templates that are clusters. The frequently used character set is recognized in MAINDB will be created as a reference composed with reference to the data used in filling in CACHEDB, further the referenced trained Telugu application form documents. The documents templates will be marked in MAINDB to avoid considered belong to the geographical locations of repeated access in both databases. The scope of the Anantapur district, Andhra . The selection of references that are linked in the CACHEDB will be frequently used character set for OCR results in an for one particular session of OCR and also will be optimal speed and accuracy. refreshed again when the MAINDB is reinitialized by the user. The caching and SIMILARITY_MEASURE Some of the frequently used single conjunct and parameter will improve the performance of the double conjunct vowel-consonant clusters are template matching approach effectively, especially considered as individual classes in the present work. when the CACHEDB is very voluminous. Also, The creation of separate classes for double conjunct employing the SIMILARITY_MEASURE as a parameter vowel-consonant clusters eliminate the need of to determine the target template also increases the maintaining symbolic ordering during segmentation chances of determining occluded or noisy test and post-processing stages, this avoids the number templates. The figs. 5(a) and 5(b) depicts the of iterations required for classification of each obtained similarity measures and recognition results conjunct, vowels and consonants separately. And of two images with noise and occlusion also post-processing of double conjunct vowel- respectively. consonant clusters may lead to erroneous recognition results if proper symbolic ordering is The ACTUAL_SIMILARITY_MEASURE of fig. 5.(a) is not followed while classification. However the estimated as 0.9607243 and the increase in number of classes may increase the ACTUAL_SIMILARITY_MEASURE of fig. 5.(b) is search time involved in recognition of characters. estimated as 0.8113057. But in the proposed system a caching technique is incorporated to suffice the performance of the system. The fig. 6 represents the multi conjunct vowel-consonant clusters and the fig. 7 represents some of the double conjunct consonant clusters of handwritten datasets.

(a) (b) Fig. 5 Images with and without occlusion Fig. 6 Multi conjunct vowel consonant clusters 4. Experimental Analysis

Apparently south Indian languages like Telugu consist of a huge alphabetical set include vowels,

ISSN: 2367-8984 159 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

9.352383e-001 కే 8.420255e‐ కే 00 Fig. 7. Double conjunct vowel-consonant clusters 9.472471e-001 కై 8.489905e‐ కై

An user friendly interface has been designed using 00 9.415977e-001 MATLAB for performing the character recognition. కొ 8.484739e‐ కొ The GUI includes the functionalities to load the 001 input documents, to refresh the cache database 9.472471e-001 which updates the total number of classes present at కో 8.367062e‐కో current instance of execution and the new button to 9.578262e-001 reload another document input. కౌ 8.549644e‐ కొ 001 The results of experimentation are tabulated in table The accuracy obtained with test templates is around 1 and the REQUIRED_SIMILARITY_MEASURE 93.55% for the sample size of around 2730 different considered for the best correlation is 80%. characters from which 2554 characters are recognized correctly. Table 1. Experimental results of trained and test Templates The fig. 8 graphically represents the performance of caching technique for the data statistics represented Bitmap Required Recog Obtained Recog in Table 2. image of similarity nized similarity nized character measure Chara measure Chara cter cter

9.647845e-001 8.343535e-001 అ అ

9.598691e-001 8.297401e-001 ఆ ఆ

9.566974e-001 8.479653e-001 ఇ ఇ

9.467432e-001 8.234562e-001 ఈ ఈ

9.746453e-001 8.325270e-001 ఉ ఉ

9.736098e-001 8.620125e-001 ఊ ఊ Fig. 8 Size of CACHEDB versus time taken with/without caching 9.713786e-001 7.523467e-001 ఋ ◌ా The size of CACHEDB represents the number of 9.609760e-001 8.873520e-001 ఎ ఎ templates cached in cache database.

9.553150e-001 8.642541e-001 ఏ ఎ Table 2. Experimental statistics considered for evaluation of caching technique 9.584122e-001 8.693013e-001 ఒ ఒ No. of handwritten 9.592538e-001 8.481125e-001 No. of characters ఓ ఒ Size of Size of documents MAINDB CACHEDB recognized 9.288411e-001 కూ 8.440577e‐కూ 1 2730 126 126 2 2730 226 248 9.367023e-001 కృ 8.338047e‐ కృ 3 2730 298 369 001 4 2730 366 492 9.325272e-001 కౄ 8.443321e‐ కౄ 5 2730 398 535

9.464063e-001 కె 8.456852e‐కె From the fig. 8 and table 2, it is evident that the increase in number of documents is directly

ISSN: 2367-8984 160 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

proportional to increase in size of CACHEDB. The performance of recognition depends on the size of increase in size of cache represents the new MAINDB. character templates that are referenced from MAINDB and linked to cache, whenever a new 5. Conclusions document is considered for processing during a session. The feature extraction by shape matching in conjunction with correlation based classification has It is observed that recognition accuracy remains provided satisfactory results. The inclusion of same with or without caching technique in the REQUIRED_SIMILARITY_MEASURE to find the proposed system and only characteristic that varies suitable match between the test template and trained is performance of the system. It is recommended to template greatly reduces the worst case time process similar set of documents in one particular complexity of conventional template matching session of OCR to attain optimal performance of the technique. However there are few cases of system. The user needs to refresh the CACHEDB misrecognition especially for some of the confusing whenever the subsequent document to be processed character pairs [ , ], [ , ] [ , ] [ , ] etc. The are found to be completely different from the one so కొ కో ఎ ఏ బ ఒ ఒ ఓ far processed. Refreshing the CACHEDB flushes all confusing characters possess very minor difference the entries in CACHEDB or sets the size of CACHEDB in its structural orientation, but the use of to zero. REQUIRED_SIMILARITY_MEASURE improves the performance of template matching algorithm. In the proposed system the experimentation is also However the design of efficient post processing extended to one set of references of printed methodology can correct the recognition errors with characters and its performance is indicated in fig. 9. prime regard to confusing character pairs. This suffices the reliability of the system, which is currently under investigation. In addition to this the technique of caching improves the overall performance of the classification approach. The experimental results of the proposed system can be still improved by replacing template matching with an efficient feature extraction technique so as to reach high recognition accuracies.

References: [1] Rinki Singh, Manideep Kaur, OCR for Telugu Script Using Back-Propagation Based Fig. 9 Performance of recognition – Printed Classifier, International Journal of Information Characters Technology and Knowledge Management, Vol. 2, No. 2, 2010, pp. 639-643. The testing of caching technique is performed on [2] Suman V Patgar, Vasudev T, Murali S, A windows platform with core i5 processor and 4 GB system for detection of fabrication in RAM. The higher configuration in machine may photocopy document, Journal of Computer definitely result a better performance. Science & Information Technology, Vol. 5, No. 14, 2015, pp. 29–35. The performance of recognition through caching can [3] B. Verma, M. Blumenstein, S. Kulkarni, be improved by testing with more number of similar Recent achievements in off-line handwriting documents. This is due to the fact that the size of recognition systems, School of Information cache database will be increased in par with the Technology, Griffith University, Gold Coast increase in number of input documents used for Campus. recognition. In the same way number of documents [4] C. V. Jawahar, M. N. S. S. K. Pavan Kumar, S. used for recognition in a particular session S. Ravi Kiran, A Bilingual OCR for Hindi- automatically increases the frequently used Telugu Documents and its Applications, Centre templates in cache database. In the cases of for Visual Information Technology, documents consisting of non-frequently or non- International Institute of Information repeatedly occurring textual patterns the Technology, Hyderabad.

ISSN: 2367-8984 161 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

[5] N. Shobha Rani, T. Vasudev, “A Generic Line Survey, International Journal of Innovative Elimination Methodology using Circular Masks Science, Engineering & Technology, Vol. 1 for Printed and Handwritten Document Images No. 2, 2014, pp. 131-138. “, Emerging research in computing, [16] E.Kavallieratou, N.Fakotakis and information, communication and applications G.Kokkinakis, Handwritten Character ELSEVIER science and technology, Vol. 3, Recognition based on Structural No. 1, 2014, pp. 589-594. Characteristics, Proceedings of the 16th [6] Rangachar kasturi, Lawrence ’gorman, Venu International Conference of the International govindaraju, Document image analysis: A Association of Pattern Recognition, Vol. 3, primer, Sadhana, Vol. 27, No. 1, 2002, pp. 3– 2002, pp. 139-142. 22. [17] S. Arora, D. Bhattacharjee, M. Nasipuri, D. K. [7] Nallapareddy Priyanka, Srikanta Pal, Ranju Basu, M. Kundu, Application of statistical Mandal, Line and word segmentation approach features in handwritten character for printed documents, IJCA Special Issue on recognition, International Journal of Recent Recent Trends in Image Processing and Pattern Trends in Engineering, vol. 2, 2009, pp. 40–42. Recognition, Vol. 1 2010, pp. 30-36. [18] Dimitri A. Lisin, Marwan A. Mattar, Matthew [8] R. Zhang, T. T. Tjhung, Senior Member, IEEE, B. Blaschko, Mark C. Benfield, Erik G. H. J. Hu, and P. He, Window Function and Learned-Miller, Combining Local and Global Interpolation Algorithm for OFDM Frequency- Image Features for Object Class Recognition, Offset Correction, IEEE transactions on Proceedings of the IEEE Computer Society vehicular technology, Vol. 52, No. 3, 2003, pp. Conference on Computer Vision and Pattern 654-670. Recognition, Vol. 3, 2005, pp. 47-55. [9] Ryszard S. Choras, Image Feature extraction [19] John D. Evans, XML for Image and Map techniques and their applications for CBIR and Annotations (XIMA) Draft Candidate Interface Biometrics Systems, International journal of Specification, Version 0.4, NASA Digital Earth biology and biomedical engineering, Vol. 1, Office, Goddard Space Flight Center, No. 1, 2007, pp. 6-16. Greenbelt, 2001. [10] Pooja Kamavisdar, Sonam Saluja, Sonu [20] Swapnil Behle, Development of OHWR system Agrawal, A Survey on Image Classification for Hindi, C-DAC, Pune, 2013. Approaches and Techniques, International [21] Majid Ziaratban, Karim Faez, Farhad Faradji, Journal of Advanced Research in Computer Language based feature exraction using and Communication Engineering Vol. 2, No. 1, template matching in Farsi/Arabic handwritten 2013, pp. 2278-1021. numeral recognition, Ninth International [11] Mark Nixon, Alberto Aguado, Feature Conference on Document Analysis and extraction and Image processing”, Second Recognition, vol. 1, 2007, pp. 297–301. Edition, Academic Press, Elsevier, London, [22] Sunny Kumar, Pratibha Sharma, Offline UK, 2002. Handwritten & Typewritten Character [12] Yafang Xue, Optical Character Recognition, Recognition using Template Matching, Department of Biomedical Engineering, International Journal of Computer Science & University of Michigan, 2014. Engineering Technology, Vol. 4 No. 06, 2013, [13] Avneet Kaur, Lakhwinder Kaur, Savita Gupta, pp. 818-825. Image Recognition using Coefficient of [23] Danish Nadeem, Saleha Rizvi, “Character Correlation and Structural SIMilarity Index in recognition using template matching”, Project Uncontrolled Environment, International report, Department of Computer Science, Jamia Journal of Computer), Vol. 59, No.5, 2012, Millia Islamia, New Delhi, 2015. Applications pp. 0975 – 8887. [24] Nikhil Rajiv Pai, Vijaykumar S. Kolkure, [14] Jangala. Sasi Kiran, N. Vijaya Kumar, N. Sashi Design and implementation of optical character Prabha, M. Kavya, A Literature Survey on recognition using template matching for multi Digital Image Processing techniques in fonts /size, International Journal of Research in character recognition of Indian languages, Engineering and Technology, Vol. 4, No. 2, International Journal of Computer Science and 2015, pp. 398-400. Information Technologies, Vol. 6, No. 3, 2015, [25] Mo Wenying, Ding Zuchun, A Digital pp. 2065-2069. Character Recognition Algorithm Based on the [15] M. Shalini, B. Indira, Automatic Character Template Weighted Match Degree, Recognition of Indian Languages – A brief

ISSN: 2367-8984 162 Volume 2, 2017 International Journal of Signal Processing N. Shobha Rani et al. http://iaras.org/iaras/journals/ijsp

International Journal of Smart Home, Vol. 7, No. 3, 2013, pp. 53-60. [26] Jatin M Patil, Ashok P. Mane, Multi Font And Size Optical Character Recognition Using Template Matching, International Journal of Emerging Technology and Advanced Engineering, Vol. 3, No. 1, 2013, pp. 504-506.

ISSN: 2367-8984 163 Volume 2, 2017