Line Segmentation of Javanese Image of Manuscripts in Javanese Scripts
Total Page:16
File Type:pdf, Size:1020Kb
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Repository Universitas Sanata Dharma International Journal of Engineering Innovation & Research Volume 2, Issue 3, ISSN: 2277 – 5668 Line Segmentation of Javanese Image of Manuscripts in Javanese Scripts Anastasia Rita Widiarti Agus Harjoko Marsono Sri Hartati Email: [email protected] Email: [email protected] Email: [email protected] Email: [email protected] Abstract – Segmentation is an important stage in automatic II. RELEVANT WORKS transliteration process of a manuscript image. One of the segmentation approaches generally used to get an image of Palakollu, et al. [2] investigate line segmentation the scripts of a scrip image is performing line segmentation and then performing segmentation of script images on the method on Hindi manuscript image using projection-based result of the line segmentation. approach. They build an algorithm to detect header line Line segmentation of image of manuscripts in Javanese and base line, based on several initial assumptions such as scripts is often difficult because there are lines of images average line height of 30 pixel, to make an estimation of which shouldn’t be on the same line, but are in one line area real average line height. From 500 documents of the image, and there are even images in different but investigated, the average accuracy of line segmentation is overlapping lines. This paper offers an idea to use moving 93.6%. Lehal and Singh [3] study segmentation on average to smooth the curve of vertical projection result of Gurmukhi text using a combination of statistical analysis image of manuscripts in Javanese scripts as an initial guide from the text, projection profile, and analysis on connected of line separation. Next, to separate parts of scripts in different lines but in the same line or in overlapping lines, components. By using horizontal projection, they discover average height data and standard deviation of average height segmentation failures which are separated into two of every objects are use to get information on estimation of failures, which are over segmentation and under the height of line image. Image connectivity concept in an segmentation. Over segmentation happens because there is object is also used to separate script images in different but a gap between lines of text, and under segmentation overlapping lines. happens because there are parts of a text which overlap From the test result of 4 images of manuscripts in Javanese between lines. Lehal and Singh then determines the value scripts from different writers with different writing styles, of certain unit based in characteristics of placement of average percentage of correctness of line segmentation scripts, i.e. height of an area, determination of strips obtained was 93.19% with standard deviation of 4.55%. Mistakes of line segmentation were mostly caused by classes, which are script groups in a line, and sandangan of a script in the previous line which joined the determination of the height of the first line. From the test next line, and cutting imperfect scripts. However, from the on 40 documents of Gurmukhi printed text, they produce percentage of correctness of line segmentation, it could be average percentage of correctness of overlapping script concluded that the combination of various image processing segmentation of over 87%. Tripathy and Pal [4] provide a techniques used in line segmentation was relatively good. sample of main principle for line segmentation on Oriya manuscript, which is dividing the scrip into several groups Keywords – Connected Component, Javanese Manuscript vertically the applying horizontal projection on the parts. Segmentation, Moving Average, Projection Profile. Relation between the peaks and slopes on histograms are information for performing line segmentation. Structural I. INTRODUCTION and topological approaches and characteristics of the curves formed into some kind of reservoir of script image Segmentation of script image has a very significant and histogram used to detect isolated and overlapping scripts. major role in the efforts to introduce a document image. With the approaches, success percentage in separating The success of a segmentation process will influence the overlapping scripts was 96.7%. Nicolaou and Gatos [1] cut next processes, especially script introduction process. lines of manuscripts based on the assumption that there is Moreover if the manuscript which will be recognized is a a connectivity line in a line of manuscript. Weliwitage, et hand-written manuscript. This is a challenge, because al. [5] uses projection modification named cut text manuscripts are written by many different writers, and the minimization (CMT) to perform line segmentation of writing style may have different gagrak. constant slope characteristics. The main principle of CMT Segmentation of script image can be done by first is finding a line or cutting line by cutting words not on the performing line segmentation of manuscript image, and line as minimally as possible. Yin and Liu [6] perform line then after obtaining lines of manuscript image, it’s segmentation of image of manuscripts in Chinese scripts, continued by segmentation of script images from based on grouping of connected components using corresponding line of manuscript image. However, line minimal spanning tree (MST), and then performing segmentation of manuscript image in often difficult distance matrix-based learning. During the test of 803 because there are fluctuation lines, overlapping different manuscripts in Chinese characters with a total of 8169 components, and irregularity in geometry of the lines, such lines of line image, the algorithm they proposed has very as the height and width of the lines [1]. high percentage of correctness in detecting line, which is 98.02%. Surinta, and Chamchong [7] successfully perform Copyright © 2013 IJEIR, All right reserved 239 International Journal of Engineering Innovation & Research Volume 2, Issue 3, ISSN: 2277 – 5668 segmentation of historic image handwritten on palm Area 1 is upper area or upper zone. In this zone there are leaves. It start with binarization stage with Otsu method to sandangan swara i.e. wulu and pepet, and sandangan separate object and background, then continued with line panyigeg i.e. layar and cecak. Area 2 is main area or main segmentation stage by applying profile projection method, zone. It’s called main zone because this is where basic and finally segmentation stage to get scripts using scripts of Javanese scripts which are legena scripts. Area 3 histogram of segmented images. They get an average of is lower are or lower zone. Lower zone is a place for suku, percentage of correctness of scripts of 82.5%. Jindal et al. lower part of taling, cakra mandaswara script, cakra [8] on horizontal segmentation for lines of overlapping keret, as well as pasangan placed below. Fig. 3. shows printed Indian scripts successfully get accuracy lever of examples of Javanese scripts and places to put the scripts. segmentation of 96.45% to 99.79%. Widiarti [9] has studied the use of profile projection to get images of Javanese scripts from printed manuscripts in Javanese scripts, with success rate of segmentation of 86.78%. Profile projection in this case is very possible to be used because the characteristic of printed manuscripts in Javanese scripts is having clear and even uniform distance between lines and scripts. In this paper, we have proposed new strategies to line segmentation from Fig.3. Writing zones of Javanese scripts Javanese manuscript. We use of vertical projection for line image segmentation on manuscripts in Javanese scripts Fig.4 shows the practice of Javanese script writing, combined with popular technique to smooth curves, which Javanese scripts are written from left to right, and one was moving average method, and combined again with Javanese script can take the form of: statistical information from data on the height of objects in 1) Legena script only, i.e. sa script in Fig. 4(a). an image and using pixel connectivity concept on the same 2) Legena script with sandangan placed on the upper object. zone, i.e se script in Fig. 4(b). 3) Legena script with sandangan placed on the lower III. STUDY OF THE CHARACTERISTICS AND zone, i.e. su script in Fig. 4(c). RULES OF WRITING JAVANESE SCRIPT 4) Legena script with sandangan in upper and lower zones at once, i.e. sur script in Fig. 4(d). Javanese scripts consist of two major scrip groups, which are basic Javanese scripts and derivative Javanese scripts. Basic Javanese scripts are main Javanese scripts Fig.4. Samples of script forms with combination of the which haven’t been added with various punctuation marks placement of the additions. or sandangan, therefore these main Javanese scripts are called legena or wuda scripts which means bare scripts. From results of a study on methods of writing Javanese Fig. 1 shows 20 legena Javanese scripts. scripts on manuscripts, problems related of line segmentation of manuscripts in Javanese scripts are discovered as shown in Fig. 5, because: 1) There are scripts on different lines which overlap. Fig. 5(a). Shows that part of script image in the upper line touch part of script image in the second line. Dashed Fig.1. Nglegena Javanese scripts [9] line is the sign for line change. 2) Distance between scripts isn’t clear, so it’s impossible Generally, most Javanese scripts used don’t only consist to only use vertical projection. Fig. 5(b). shows that of legena scripts, bust use various additions sandangan. part of script image in the upper line is in the same There are many kinds of sandangan, i.e. sandangan swara position as script image in the next line. for i consonant called wulu and is written above corresponding legena script. Or sandangan swara of u consonant called suku which is written beneath legena script.