Enhancement of South Indian Inscriptions Images Using De-Noising and Character Spotting Techniques

INTERNATIONAL JOURNAL OF PROFESSIONAL ENGINEERING STUDIES Volume IV/Issue2/OCT2014

Naveen Kumar Talla* and E.V Ramana#
*M.Tech Student, #Head, Computer Science, Vidya Vikas Engineering College

ABSTRACT—South Indian inscriptions have a variety of culture, style and history associated with them. The inscriptions found in various parts of South India reflect the artistic styles of living, administration methods, artistic tools, etc. They also depict the hierarchy of the various South Indian languages. The various rulers who ruled the respective areas developed the regional diacritics of the language. In this paper, we study the South Indian inscriptions in terms of development, hierarchy and writing styles. Methods to enhance the inscriptions by removing noise are also explained.

Index Terms — Inscription, de-noising, diacritics, enhancement, evolution

This work was supported by Shri E.V Ramana, Vidya Vikas Engineering College, Hyderabad. Naveen Kumar Talla is an M.Tech student at Vidya Vikas Engineering College, Hyderabad (e-mail: naveen.ns@gmail.com). Ananth Nath Talla holds an M.Tech from IIT Kharagpur and is now a Scientist at DRDO, Bangalore ([email protected]).

INTRODUCTION

SOUTH Indian inscriptions are found in various temples, world heritage sites and other historically important places. Some of the inscriptions are lost, and some are in the process of extinction due to climatic conditions, wear and tear, and improper management. In a rude shock to the Telugu community, the famous Kalamalla inscription, considered to be the first in Telugu and the basis on which Official Language status was accorded to Telugu, has disappeared. A study team on inscriptions from the then Madras found this inscription, by Chola King Dhanunjaya Varma and belonging to the period 575 AD, on the premises of the Sri Chennakesava Swamy temple in Kalamalla village of Kadapa district in 1904 AD, and it was preserved at the Egmore Museum.

Telugu is a syllabic language. Similar to most languages of India, each symbol in the Telugu script represents a complete syllable, so there is very little scope for confusion and spelling problems. Similar types of cases are reported in Karnataka and Tamil inscriptions.

Professor Hultzsch started a systematic collection of the inscriptions of Southern India from the latter part of 1886, when he was appointed Epigraphist to the Government of Madras. The publication of these documents with texts and translations was taken up simultaneously, and fascicules of South Indian Inscriptions were issued between 1886 and 1903. Hundreds of inscriptions in Dravidian languages like Tamil, Kannada and Telugu were copied from different parts of South India. These series were brought out to give scholars a full account of the inscriptions in the different South Indian languages and to facilitate their research; the first in the series, edited by Dr. Hultzsch, was brought out in 1890. These volumes contain the texts of the inscriptions with a summary in English and a general introduction about the importance of the inscriptions. The texts are published in their original scripts, such as Tamil, Telugu, Kannada, Grantha and Malayalam. So far, 27 such volumes have been brought out, and most of them contain Tamil inscriptions. They are very useful for reconstructing regional histories.

This paper describes the techniques of enhancement and interpretation of text from inscriptions in various South Indian languages. Section II describes the various de-noising schemes. Section III describes the character spotting process and the developmental implementation.

LITERATURE SURVEY

In any digital image, the measurement of the three observed color values at each pixel is subject to some perturbations. These perturbations are due to the random nature of the photon counting process in each sensor. The noise can be amplified by digital corrections of the camera

IJPRES


or by any image processing software. For example, tools that remove blur from images or increase the contrast also enhance the noise.

The principle of the first de-noising methods was quite simple: replace the color of a pixel with an average of the colors of nearby pixels. The variance law in probability theory ensures that if nine pixels are averaged, the noise standard deviation of the average is divided by three (in general, averaging n independent noisy values divides the standard deviation by √n). Thus, if we can find for each pixel nine pixels in the image with the same color (up to the fluctuations due to noise), we can divide the noise by three (and by four with 16 similar pixels, and so on). This looks promising, but where can these similar pixels be found? The most similar pixels to a given pixel have no reason to be close at all; think of the periodic patterns, or the elongated edges which appear in most images. It is therefore legitimate to scan a vast portion of the image in search of all the pixels that really resemble the pixel one wants to denoise. Denoising is then done by computing the average color of these most resembling pixels. The resemblance is evaluated by comparing a whole window around each pixel, and not just the color. This filter is called non-local means, and it is given by

$$NL[u](p) = \frac{1}{C(p)} \int f\big(d(B(p), B(q))\big)\, u(q)\, dq, \qquad C(p) = \int f\big(d(B(p), B(q))\big)\, dq,$$

where d(B(p), B(q)) is the Euclidean distance between the image patches centered respectively at p and q, f is a decreasing function and C(p) is the normalizing factor. Antoni Buades named this filter non-local means denoising. Since the search for similar pixels is made in a larger neighborhood, but still locally, the name "non-local" is somewhat misleading.

To extract text from the inscriptions, a character spotting process is developed; the technique used to spot the characters in the inscriptions is described in the following sections. Algorithms such as HOG (histograms of oriented gradients) and template matching are applied to the inscriptions to extract text from them.

TELUGU INSCRIPTIONS

Telugu script has the capability to represent almost the entire phonetic spectrum of all Indian (and most world) languages. For example, the only sound of the English language not represented fully in Telugu (in a theoretical sense) is the ‘a’ sound as in ‘apple.’ Since all Indic scripts have descended from braahmee (Brahmi), the similarities between the modern Devanagari and Telugu scripts become apparent upon close observation. For example, the Devanagari (Hindi) letter ‘ka’, if turned by 90 degrees on its side, resembles ‘ka’ in Kannada.

Fig. 1. A Telugu inscription and its corresponding interpretation of the text by various historians of the state.

KANNADA INSCRIPTIONS

Kannada inscriptions (Old Kannada, Kadamba script) are found on historical hero stones, coins, temple walls, pillars, tablets and rock edicts. The inscriptions found are in Proto Kannada, Pre Old Kannada, Old Kannada, Middle Kannada and New Kannada. The first written record in Kannada is traced to Ashoka's Brahmagiri edict, and the Tagarthi inscription dates back to 350 AD. The Nishadi inscription of 400 AD on Chandragiri hill (Shravanabelagola), the Halmidi inscription of 450 AD and the Aihole inscriptions are very important in the history of Kannada and Karnataka, as are the 5th-century Tamatekallu inscription of Chitradurga and the 500 CE Chikkamagaluru inscriptions. A few Kannada words are found in edicts and inscriptions prior to the Christian era, in places as far as Egypt.

Fig. 2. Old Kannada inscription of 983 CE on the Tyagada Brahmadeva Pillar at Shravanabelagola.

OTHER SOUTH INDIAN INSCRIPTIONS

The original name of the king was Rajarajakesarivarman or Rajakesarivarman Mummadisoradeva, which occurs in his earliest Tamil inscriptions. The Tiruvalangadu plates [1] call him Arunmorivarman. This name, in the slightly altered form Arumorideva, also occurs in some of the Tamil records of his reign.

The use of the word kadara as a principal verb is common in monumental Tamil and also occurs in the Tanjore inscriptions. In modern Tamil it is only an auxiliary verb. The history of this word is analogous to that of the English ought.

Non-Local Bayes (NL-Bayes) De-noising

Marc Lebrun presents a detailed implementation of the Non-Local Bayes (NL-Bayes) image de-noising algorithm. In a nutshell, NL-Bayes is an improved variant of NL-means. In the NL-means algorithm, each patch is replaced by a weighted mean of the most similar patches present in a neighborhood. Since images are mostly self-similar, such similar patches are generally found, and averaging them increases the SNR. The NL-Bayes strategy improves on NL-means by estimating, for each group of similar patches, a Gaussian vector model.

The implementation proceeds in two identical iterations, but the second iteration uses the denoised image of the first iteration to better estimate the mean and covariance of the patch Gaussian models. A discussion of the algorithm shows that it is close in spirit to several state-of-the-art algorithms (TSID, BM3D, BM3D-SAPCA), and that its structure is actually close to BM3D. The thorough experimental comparison made in Lebrun's paper also shows that the algorithm achieves the best state-of-the-art results on color images in terms of PSNR and image quality. On grey-level images, it reaches a performance similar to the more complex BM3D-SAPCA (no color version is available for this last algorithm).
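The two estimators discussed above can be made concrete with a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' code or Lebrun's reference implementation. `nl_means_pixel` computes the pixelwise non-local means average for a grayscale image (the text describes color images), assuming the common kernel f(d) = exp(-d²/h²) and a bounded search window, in line with the remark that the search remains local. `nl_bayes_group` applies the first-step NL-Bayes estimate to one group of similar patches, assuming white Gaussian noise with known standard deviation `sigma`; the Gaussian model's mean and covariance are estimated empirically from the group. Function names and parameter values are hypothetical.

```python
import numpy as np


def nl_means_pixel(img, p, half_patch=1, half_search=5, h=10.0):
    """Pixelwise non-local means estimate at p = (row, col), grayscale only.

    Computes (1/C(p)) * sum_q f(d(B(p), B(q))) * u(q) over a bounded
    (2*half_search+1)^2 search window. The kernel f(d) = exp(-d^2 / h^2)
    and all default parameter values are illustrative assumptions.
    """
    img = np.asarray(img, dtype=float)
    k = half_patch
    padded = np.pad(img, k, mode="reflect")  # gives border pixels full patches
    r, c = p
    ref = padded[r:r + 2 * k + 1, c:c + 2 * k + 1]  # patch B(p)
    rows, cols = img.shape
    num, den = 0.0, 0.0
    for qr in range(max(0, r - half_search), min(rows, r + half_search + 1)):
        for qc in range(max(0, c - half_search), min(cols, c + half_search + 1)):
            cand = padded[qr:qr + 2 * k + 1, qc:qc + 2 * k + 1]  # patch B(q)
            d2 = np.mean((ref - cand) ** 2)  # squared distance, normalized by patch size
            w = np.exp(-d2 / h ** 2)          # decreasing function f of the distance
            num += w * img[qr, qc]
            den += w                          # accumulates the normalizing factor C(p)
    return num / den


def nl_bayes_group(noisy_patches, sigma):
    """First-step NL-Bayes estimate for one group of similar noisy patches.

    noisy_patches has shape (n, m): n flattened patches of m pixels each.
    With group mean P_bar and empirical covariance C, every patch P is
    replaced by P_bar + (C - sigma^2 I) C^{-1} (P - P_bar). Assumes white
    Gaussian noise of known sigma and n > m so that C is invertible.
    """
    P = np.asarray(noisy_patches, dtype=float)
    n, m = P.shape
    mean = P.mean(axis=0)
    centered = P - mean
    cov = centered.T @ centered / (n - 1)    # empirical covariance of the group
    x = np.linalg.solve(cov, centered.T)     # C^{-1} (P - P_bar), one column per patch
    return mean + ((cov - sigma ** 2 * np.eye(m)) @ x).T
```

On a step edge, `nl_means_pixel` gives near-zero weight to patches that straddle the edge, which is how the filter removes noise while preserving edges. Grouping the patches closest to a reference patch by the same squared distance, and running the estimate twice as described, gives only a rough NL-Bayes outline; details of the published implementation, such as the aggregation of overlapping patch estimates, are omitted.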

DENOISING METHODS

Image denoising is an important image processing task, both as a process in itself and as a component in other processes. Very many ways to denoise an image or a set of data exist. The main property of a good image denoising model is that it removes noise while preserving edges.

Non Local Means De-noising

The denoising of a color image u = (u1, u2, u3) and a certain patch B = B(p,f) (centered at p and of size (2f+1) × (2f+1)) is given as

$$\hat{B}_i = \frac{1}{C} \sum_{Q = Q(q,f) \in B(p,r)} u_i(Q)\, w\big(B(p,f), B(q,f)\big), \qquad C = \sum_{Q = Q(q,f) \in B(p,r)} w\big(B(p,f), B(q,f)\big),$$

where i = 1, 2, 3, B(p, r) indicates a neighborhood centered at p of size (2r+1) × (2r+1) pixels, and w(B(p,f), B(q,f)) has the same formulation as in the pixelwise implementation. By applying this procedure to all patches in the image, we obtain N² = (2f+1)² possible estimates for each pixel. These estimates are finally averaged at each pixel location to build the final denoised image.

CHARACTER SPOTTING

Template matching is a technique in digital image processing for finding small parts of an image which match a template image. It can be used in manufacturing as a part of quality control, as a way to navigate a mobile robot, or as a way to detect edges in images. Template matching finds areas of an image that match (are similar to) a template image (patch). Two images are involved:

a. Source image (I): the image in which we expect to find a match to the template image.
b. Template image (T): the patch image which will be compared to the source image.

Our goal is to detect the highest matching area:

Fig 4. Template (T) taken from Source image (I)

Fig 3. Original and denoised image using non-local means denoising.

To identify the matching area, we compare the template image against the source image by sliding it:


The implementation uses the matchTemplate function of the OpenCV library. The Open Source Computer Vision Library (OpenCV) is a library of functions used in computer vision and image processing. It is cross-platform, so it can be compiled for ARM and executed on the CARMA (CUDA on ARM) development board; as of version 2.4.5, the makefiles needed to compile for CARMA are included in the library. It has about 500 algorithms and 5000 functions, many of which have specialized GPU implementations along with their (multi-core) CPU counterparts, including functions for template matching and image filtering.

Fig 5. The order of matching the template

By sliding, we mean moving the patch one pixel at a time (left to right, top to bottom). At each location, a metric is calculated that represents how “good” or “bad” the match at that location is, i.e. how similar the patch is to that particular area of the source image. For each location of T over I, the metric is stored in the result matrix R, so each location (x, y) in R contains the match metric. If I is W × H pixels and T is w × h pixels, R has (W − w + 1) × (H − h + 1) entries. The image in figure 6 is the result R of sliding the patch with the metric TM_CCORR_NORMED.

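The template matching score for the metric TM_CCORR_NORMED is given by

$$R(x,y) = \frac{\sum_{x',y'} T(x',y') \cdot I(x+x',\, y+y')}{\sqrt{\sum_{x',y'} T(x',y')^2 \cdot \sum_{x',y'} I(x+x',\, y+y')^2}},$$

where (x', y') ranges over the pixels of the template.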
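The sliding procedure and the TM_CCORR_NORMED score can be sketched directly in NumPy. This is an unoptimized illustration, not the paper's implementation: it assumes single-channel images, stores the result matrix row-major as R[y, x], and the function names are hypothetical. In practice, OpenCV's `cv2.matchTemplate(image, templ, cv2.TM_CCORR_NORMED)` computes the same surface far faster, and `cv2.minMaxLoc` returns the best location.

```python
import numpy as np


def match_template_ccorr_normed(image, template):
    """Slide `template` over `image` and return the result matrix R.

    R[y, x] = sum(T * window) / (||T|| * ||window||), the TM_CCORR_NORMED
    score of the image window whose top-left corner is (x, y).
    """
    I = np.asarray(image, dtype=float)
    T = np.asarray(template, dtype=float)
    H, W = I.shape
    h, w = T.shape
    R = np.zeros((H - h + 1, W - w + 1))
    t_norm = np.sqrt(np.sum(T ** 2))
    for y in range(H - h + 1):          # top to bottom
        for x in range(W - w + 1):      # left to right, one pixel at a time
            window = I[y:y + h, x:x + w]
            denom = t_norm * np.sqrt(np.sum(window ** 2))
            R[y, x] = np.sum(T * window) / denom if denom > 0 else 0.0
    return R


def best_match(R):
    """Return the (x, y) location with the highest score in R."""
    y, x = np.unravel_index(np.argmax(R), R.shape)
    return int(x), int(y)
```

Because this score does not subtract the window mean, bright uniform regions also score fairly high; TM_CCOEFF_NORMED, the mean-subtracted variant whose prepared function appears in Fig 7, is less sensitive to this.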

Fig 7: Data flow of the matchTemplatePrepared_CCOFF_NORMED_8U function. The downward arrows show function calls. The horizontal arrow shows the flow of data. Parallelism is indicated by several boxes on top of each other.

Fig 6: Original and denoised image using non-local means denoising.

Results

The character spotting is done by a user who knows the language of the script in the stone inscription. Initially, the inscription image is preprocessed with contrast adjustment and denoising. The user starts by selecting a character in the inscription with a rectangular selection around it (the template) and entering the Unicode of the character as input. The template and the original image are then correlated (by normalized cross-correlation) in parts: the template is divided into 8 equally sized, overlapping parts, whose normalized cross-correlation scores are accumulated at the center region of the character in the original image. The resulting correlation surface is thresholded to obtain the possible regions of interest (ROI) of the original image.
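The last step of this procedure, thresholding a correlation surface into candidate regions of interest, can be sketched as follows. This is a simplified illustration, not the authors' tool: it accepts any correlation surface R (for instance a TM_CCORR_NORMED surface), it does not reproduce the accumulation over 8 overlapping template parts, and the 0.9 threshold and the 3×3 local-maximum test are assumptions added so that one character yields one box rather than a cluster of boxes.

```python
import numpy as np


def rois_from_surface(R, template_shape, threshold=0.9):
    """Threshold a correlation surface and return candidate ROIs.

    Returns (x, y, w, h) boxes for locations whose score reaches
    `threshold` and that are maxima of their 3x3 neighborhood.
    template_shape is (h, w), as given by a NumPy array's .shape.
    """
    R = np.asarray(R, dtype=float)
    h, w = template_shape
    # Pad with -inf so that border locations have a full 3x3 neighborhood.
    padded = np.pad(R, 1, mode="constant", constant_values=-np.inf)
    boxes = []
    for y, x in zip(*np.nonzero(R >= threshold)):
        neighborhood = padded[y:y + 3, x:x + 3]  # 3x3 window centred on (y, x)
        if R[y, x] >= neighborhood.max():
            boxes.append((int(x), int(y), w, h))
    return boxes
```

Plateaus of equal scores still yield one box per tied location; a real tool would merge such boxes or let the user review them.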

IMPLEMENTATION AND RESULTS

Implementation


Fig 10: Inscription and character spotting results-4

Fig 7: Inscription and character spotting results -1

Fig 8: Inscription and character spotting results -2

Conclusion

Different epigraphers often interpret the same inscriptions in different ways, and these differing interpretations cause confusion among recent interpreters. There was no readily available way of visually examining the inscriptions themselves to resolve disagreements in readings. This work will help epigraphers minimize the confusion created by manually interpreting the inscriptions. The denoising methods help remove the noise from the inscriptions, and the denoising method chosen for South Indian inscriptions is based on the characteristics of the script. The character spotting software tool can be used by epigraphers to interpret the ancient inscriptions.

ACKNOWLEDGMENT

The work carried out in this paper was supported by my guide, Shri E.V Ramana, HOD of Computer Science. Through his support and constant encouragement, the obstacles encountered during the implementation were resolved and the resulting tool was developed.

Fig 9: Inscription and character spotting results -3


REFERENCES [1] G. O. Young, “Synthetic structure of industrial plastics (Book style with paper title and editor),” in Plastics, 2nd ed. vol. 3, J. Peters, Ed. New York: McGraw-Hill, 1964, pp. 15–64.

[2] Indu Shridevi. Enhancement of inscription images. National Conference on Communications (NCC), 1(12):1--5, February 2013. URL http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6488017.

[3] Panayiotis Rousopoulos. A new approach for ancient inscription's writer identification. DSP-2011, 69(3):1--6, July 2011. URL http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6004966.

[4] S Rajakumar. Eighth century Tamil consonants recognition from stone inscriptions. ICRTIT-2012, 62(1):40--43, April 2012. URL http://ieeexplore.ieee.org/xpls/icp.jsp?arnumber=6206766.

[5] Seon-Kyu Kim. Character spotting using image-based stochastic models. ICDAR-2001, 62(1):60--63, September 2001. URL http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=953755.

[6] Lewis JP. Fast normalized cross-correlation. Vision Interface, 10(1):120--123, May 1995. URL http://scribblethink.org/Work/nvisionInterface/nip.html.

[7] Navneet Dalal and Bill Triggs. Histograms of oriented gradients for human detection. CVPR-2005, 1(1):886--893, June 2005. URL http://vision.stanford.edu/teaching/cs231b_spring1213/papers/CVPR05_DalalTriggs.pdf.

Marilyn Lundberg, Leta Hunt and Bruce Zuckerman. Eyewitnesses to the past. Biblos-50, 1(1):886--893, June 2001. URL
