<<

Pattern Recognition, Vol. 30, No. 9, pp. 1505-1519, 1997. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd. Pergamon. Printed in Great Britain. All rights reserved. 0031-3203/97 $17.00+.00

PII: S0031-3203(96)00157-4

SKEW DETECTION AND TEXT LINE POSITION DETERMINATION IN DIGITIZED DOCUMENTS

B. GATOS,†‡ N. PAPAMARKOS†* and C. CHAMZAS†
†Electric Circuits Analysis Laboratory, Department of Electrical and Computer Engineering, Democritus University of Thrace, 67100 Xanthi, Greece
‡Institute of Informatics and Telecommunications, National Research Center "Democritos", 15310 Athens, Greece

(Received 8 June 1995; in revised form 30 July 1996; received for publication 15 October 1996)

Abstract--This paper proposes a computationally efficient procedure for skew detection and text line position determination in digitized documents, which is based on the cross-correlation between the pixels of vertical lines in a document. The determination of the skew angle in documents is essential in optical character recognition systems. Due to the text skew, each horizontal text line intersects a predefined set of vertical lines at non-horizontal positions. Using only the pixels on these vertical lines we construct a correlation matrix and evaluate the skew angle of the document with high accuracy. In addition, using the same matrix, we compute the positions of text lines in the document. The proposed method is tested on a variety of mixed-type documents and it provides accurate results while requiring only a short computational time. We illustrate the effectiveness of the algorithm by presenting four characteristic examples. © 1997 Pattern Recognition Society. Published by Elsevier Science Ltd.

Skew detection Hough transform Character recognition Segmentation

* Author to whom correspondence should be addressed. Tel.: +30-541-26478; fax: +30-541-26947; e-mail: papamark@voreas.ee.duth.gr.
This work was partially supported by the Greek GSRT grant PENED.

1. INTRODUCTION

Today, most information is saved, used and distributed by electronic means. Scanners can convert documents stored on paper into a format suitable for computers. The ever increasing application of digital document analysis systems has promoted the development of digital text processing units in order to obtain the information which appears on the digital documents. The main purpose of these systems is the transformation of text images into recognized ASCII characters, which is mainly performed with Optical Character Recognition (OCR) systems. An OCR system often consists of a preprocessing stage,(1-3) a document layout understanding and segmentation stage,(4-7) a feature extraction stage(8-14) and a classification stage.(15-17) Many of these stages are facilitated if the document has not been skewed during the scanning process and its text lines are strictly horizontal. Although there are some techniques for character segmentation that can work on skewed documents too, they are ineffective and involve great computational cost.(18,19) It is therefore preferable, in the preprocessing stage, to determine the skew angle of the digitized documents.

There are several methods for skew detection. These methods are based on the Hough transform,(20-22) projection-based approaches(23,24) and the Fourier transform.(25) Hough transform methods are the most popular, but they are computationally expensive. To this end, different fast Hough transform based techniques have been proposed,(26) including the creation of the gray scale "burst image"(20) and the use of only a selected square of the document where only bottom pixels of candidate objects are preserved.(21) A special case of the Hough transform for skew detection is proposed by Yan.(22) This method is based on the correlation between two vertical lines in a document, which is generated if one vertical line is shifted relative to the other. This technique gives good results but, due to the shift process, it is computationally expensive.

In projection-based methods, the projections of the document onto specific directions are first calculated. The skew angle corresponds to a rotation angle where some projection characteristics are satisfied. Ciardiello et al.,(23) using horizontal projection, determine that the skew angle corresponds to a rotation for which the mean square deviation of a projection histogram is maximized. The method proposed by Baird in reference (24) belongs in this category as well. According to this method, for an orientation angle $\theta$, we project the locations of characters onto an "accumulator" line perpendicular to the projection direction. By this procedure, we can construct a function $A(\theta)$, whose global maximum corresponds to the skew angle. This method gives accurate results but is computationally expensive due to its preprocessing stage and global maximum finding stage.

The method proposed by Postl in reference (25) belongs to the Fourier transform approaches. According to this method, the skew angle


corresponds to the direction for which the density of the Fourier space becomes the largest. This method requires computation of the Fourier transform of the document and, for this reason, it is computationally expensive.

In this paper, we propose a new skew detection method based on the information existing on a set of equidistant vertical lines. We accept that a text document consists mainly of horizontal text lines. We can further choose from all the image pixels those lying on a set of equidistant vertical lines. For a document, these pixels correspond to pixels of text lines. By using only these pixels we construct a correlation matrix between the vertical lines. We need not relate every pixel of a line to all pixels of the other lines, but only to those pixels that lie in a specific region defined by the expected maximum skew. In this way, we reduce the computation time significantly without sacrificing accuracy. Finally, we form the vertical projection of the matrix, and the skew angle corresponds to the projection's global maximum. As we demonstrate in Example 3, the proposed method also works well with documents that, in addition to the usual horizontal text lines, contain images, line drawings and tables.

By using the correlation matrix, once the skew angle has been evaluated, we can also find the positions of the text lines. Defining a line detection function, we form a horizontal projection of the matrix. The local maxima of this projection give the positions of the text lines. The detection of the text lines is essential for many applications such as segmentation and character extraction procedures.

The innovations introduced by the new method are the following:

• Efficiency. Instead of using all the image pixels, we use only those lying on certain vertical lines defined in the image. This results in a drastic decrease of the calculation time for skew detection when our method is compared with the Hough transform and the cross-correlation method of Yan. The basic matrix used for data storage is of much smaller dimension compared to other methods, which results in a faster algorithm implementation and minimum storage requirements.
• Accuracy. The new method extracts the document skew with high accuracy. This can be further improved by using more than two vertical lines, in contrast with the Yan method which uses only two vertical lines. The use of more than two vertical lines improves the accuracy, reduces the possibility of a wrong result due to noise and diminishes the possibility of missing a text line of short length. This happens because the skew detection accuracy depends on the distance between the first and the last vertical lines.
• Robustness. The results of the proposed method are robust to the presence of graphics in the document, which is not true for methods based on the Hough transform.

Experimental results, using two or more vertical lines, illustrate the applicability of the proposed method for skew correction of documents rotated at several angles. Apart from the skew detection, the information given by the pixels on the vertical lines is also used to determine the positions of text lines. Thus, we facilitate and accelerate the character segmentation stage.

2. IMAGE SMOOTHING AND VERTICAL LINE DATA ACQUISITION

We define the binary text image $B(x,y) \in \{0, 1\}$ with the integers $x, y$ taking values in the ranges $1 \le x \le X_{win}$ and $1 \le y \le Y_{win}$, and assuming that text pixels are assigned the value 1 and background pixels the value 0. All the distances in this paper are expressed in units of pixel distance, i.e. the horizontal distances in terms of horizontal pixel distance and the vertical distances in terms of vertical pixel distance.

Image smoothing. Before applying the method for skew detection to $B(x,y)$, we pre-process the image using the horizontal run-length smoothing algorithm (RLSA),(27) so that text lines are transformed to thick solid lines. According to it, if the number of background pixels [$B(x,y) = 0$] lying between two adjacent horizontal text pixels is less than or equal to a certain threshold $T$, then these background pixels are converted into text pixels [$B(x,y) = 1$].

The proper value of $T$ depends on the text characteristics and primarily on the character width. Therefore, the proper threshold value is selected according to the user's experience. We found that a suitable value for $T$ is $T = 0.1 X_{win}$. Figure 1 shows the results of the above procedure applied to a document.

Line data acquisition. We now define a set of two or more vertical lines in the document. Embodying the previous preprocessing tool, we define that a pixel belongs to a vertical line if its distance from it is less than or equal to $T/2$.

Then, we define the pixels of every vertical line $K$, lying at horizontal distance $D_K$ from the left margin, through the following line smoothing binary function:

$$\mathrm{line}_K(y) = \begin{cases} 1 & \text{if } \sum_{x = D_K - T/2}^{D_K + T/2} B(x,y) \neq 0, \\ 0 & \text{otherwise,} \end{cases} \qquad y = 1, \ldots, Y_{win}. \qquad (1)$$

We can say that the function $\mathrm{line}_K(y)$ depicts text pixel existence at the vertical line $K$ after the line smoothing transformation. In contrast with the Hough transform approach where all the image pixels are used, we will use only the pixels belonging to these vertical lines. Thus, we need less memory and we speed up our algorithm significantly.

3. SKEW DETECTION USING TWO VERTICAL LINES

In this section, we will describe the new method using only two vertical lines.

3.1. Selection of pixels for skew detection

A common characteristic of a text document is the repetition of the horizontal text lines along the vertical

Fig. 1. The image before and after horizontal run-length smoothing.
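The smoothing step shown in Fig. 1 and the line function of equation (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the image is assumed to be a list of rows holding 0/1 values, columns are indexed from 0 rather than 1, and the names `rlsa_horizontal` and `line_function` are hypothetical.

```python
def rlsa_horizontal(image, T):
    """Horizontal RLSA on a binary image (list of rows, 1 = text pixel).

    Background runs of length <= T lying between two text pixels of the
    same row are converted to text. The input image is left untouched.
    """
    smoothed = [row[:] for row in image]
    for row in smoothed:
        # Column indices of the original text pixels in this row.
        text_cols = [x for x, v in enumerate(row) if v == 1]
        for a, b in zip(text_cols, text_cols[1:]):
            gap = b - a - 1            # background pixels strictly between
            if 0 < gap <= T:
                for x in range(a + 1, b):
                    row[x] = 1
    return smoothed


def line_function(image, D, T):
    """Sketch of eq. (1): 1 where any text pixel lies within T/2 of column D.

    D is a 0-based column index (an assumption of this sketch); the
    window is clipped at the image borders.
    """
    half = T // 2
    width = len(image[0])
    lo, hi = max(0, D - half), min(width, D + half + 1)
    return [1 if any(row[lo:hi]) else 0 for row in image]
```

With the threshold suggested in the text, one would call these with `T = int(0.1 * X_win)`, where `X_win` is the image width in pixels.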

direction. This is obvious (see Fig. 1) from the repetition of the pixel blocks along the vertical columns. These blocks correspond mainly to horizontal text lines. Note that, although in most cases the repetition of text lines is approximately periodic, this is not a prerequisite for our approach. Examination of the blocks between two different vertical lines can give the necessary [...] and $d_2$ in two points having vertical distance $l = (D_2 - D_1)\tan\theta$. Assuming that a document can be rotated by up to ±5° due to a scanning misplacement, i.e. $\theta_{max} = 5°$, the vertical distance $l$ must satisfy the constraint:

$$-L \le l \le L, \quad \text{where } L = (D_2 - D_1)\tan\left(\frac{2\pi\theta_{max}}{360}\right). \qquad (2)$$

Fig. 2. Image window of size $X_{win} \times Y_{win}$ and region $[-L, L]$.

[...] pixel at line $d_1$ for $y_k = 1$ and there is also a text pixel at line $d_2$ for $y_l = 1 + 3 = 4$.

Fig. 3. Correlation matrix of the vertical line $d_1$ towards $d_2$.

If the image skew angle is $\theta$, then the intersections of every text line with the two vertical lines $d_1$ and $d_2$ should have a vertical distance $(D_2 - D_1)\tan\theta$. So, the correlation matrix $C$ will have maximum accumulation of points along the y-axis for $\lambda = \mathrm{int}[0.5 + (D_2 - D_1)\tan\theta]$ (where $\mathrm{int}[\cdot]$ is the integer part of its argument). Reversing this reasoning, the image skew is obtained if we detect the global maximum of the vertical projection of the correlation matrix $C$.

The vertical projection of the correlation matrix is given by the formula:

$$P(\lambda) = \sum_{k=1}^{Y_{win}} C(k, \lambda), \quad \forall \lambda \in [-L, L]. \qquad (4)$$

According to the above, if the global maximum of $P(\lambda)$ is at $\lambda = \lambda_{max}$, then the document skew is given by the following relation:

$$\theta = \tan^{-1}\left(\frac{\lambda_{max}}{D_2 - D_1}\right). \qquad (5)$$

As we can see in Fig. 4, we have a global maximum of the projection for $\lambda = 3$, which means that the document skew angle is $\tan^{-1}[3/(D_2 - D_1)]$.

3.3. Text line detection using the correlation matrix

With the skew angle of the document determined to be at $\lambda = \lambda_{max}$, we can now use the correlation matrix $C$ to compute the position of text lines in the document. The concentration of points around the column of $C(y, \lambda)$ for $\lambda = \lambda_{max}$ can give us the positions of the text lines. Giving

suitable weights to the elements of $C$, starting with 1 for $\lambda = \lambda_{max}$ and descending by half every time we move one step away from $\lambda = \lambda_{max}$, we form the line position detection function $Ll(y)$:

$$Ll(y) = \sum_{k=-L}^{\lambda_{max}} \frac{1}{2^{\lambda_{max} - k}} C(y, k) + \sum_{k=\lambda_{max}+1}^{L} \frac{1}{2^{k - \lambda_{max}}} C(y, k), \qquad y = 1, \ldots, Y_{win}. \qquad (6)$$

We use the weights $1/2^{|\lambda_{max} - k|}$ to emphasize the points close to $\lambda_{max}$. We could also use other kernel values, i.e. $1/a^{|\lambda_{max} - k|}$, $a \neq 2$. However, our experimental results indicated that $a = 2$ is a suitable choice. Taking $a = 2$ has the further advantage of a fast implementation of equation (6), since multiplication by $1/2^{|\lambda_{max} - k|}$ is a single register shift.

The local maxima of the function $Ll(y)$ give us the positions of the text lines. That is, if $y_1, y_2, \ldots, y_n$ are the local maxima of the function $Ll(y)$, then the text lines lie on the image lines with coordinates:

$$((D_1, y_1), (D_2, y_1 + \lambda_{max})), \; ((D_1, y_2), (D_2, y_2 + \lambda_{max})), \; \ldots, \; ((D_1, y_n), (D_2, y_n + \lambda_{max})). \qquad (7)$$

An example of this procedure is given in Fig. 5, where we have local maxima at $y = 3$, 11 and 19. Using these values and the skew angle, we can determine the positions and the slopes of the text lines. As we can see in Fig. 6, we have detected three lines passing through the $y = 3$, 11 and 19 pixels of line 1 and having approximately 3° slope.

Fig. 4. Skew detection using the correlation matrix of Fig. 3.

Fig. 6. Lines detected from the three local maxima of Fig. 5 correspond to lines with coordinates: $((D_1, 3), (D_2, 6))$, $((D_1, 11), (D_2, 14))$, $((D_1, 19), (D_2, 22))$.
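The two-line procedure of Section 3 can be sketched as follows. This is an illustrative sketch, not the authors' code: the profiles are assumed to be 0/1 lists indexed from 0, the correlation matrix is assumed to take the two-line form of equation (9), i.e. C(y, λ) = line1(y)·line2(y + λ), the rounding of L to an integer is an assumption, and all function names are hypothetical.

```python
import math


def correlation_matrix(line1, line2, L):
    """C[y][lam + L] = line1[y] * line2[y + lam] for -L <= lam <= L."""
    Y = len(line1)
    C = [[0] * (2 * L + 1) for _ in range(Y)]
    for y in range(Y):
        if line1[y]:
            for lam in range(-L, L + 1):
                if 0 <= y + lam < Y:
                    C[y][lam + L] = line2[y + lam]
    return C


def detect_skew(line1, line2, D1, D2, theta_max_deg=5.0):
    """Return (skew angle in degrees, lam_max, correlation matrix)."""
    # Half-width of the search region; rounding is an assumption here.
    L = round((D2 - D1) * math.tan(math.radians(theta_max_deg)))
    C = correlation_matrix(line1, line2, L)
    # Vertical projection P(lam): column sums of C, as in eq. (4).
    P = [sum(row[k] for row in C) for k in range(2 * L + 1)]
    lam_max = P.index(max(P)) - L
    # Skew angle from the position of the global maximum, as in eq. (5).
    theta = math.degrees(math.atan(lam_max / (D2 - D1)))
    return theta, lam_max, C


def line_detection(C, lam_max):
    """Eq. (6): row sums of C weighted by 1 / 2**|lam_max - k|."""
    L = (len(C[0]) - 1) // 2
    weights = [0.5 ** abs(lam_max - k) for k in range(-L, L + 1)]
    return [sum(w * c for w, c in zip(weights, row)) for row in C]


def local_maxima(values):
    """Indices of interior local maxima (0-based row indices)."""
    return [y for y in range(1, len(values) - 1)
            if values[y - 1] < values[y] >= values[y + 1]]
```

On a toy pair of profiles modelled on Fig. 3 (three five-pixel blocks on the first line, the same blocks shifted down by three pixels on the second), `detect_skew` returns `lam_max = 3`, and the local maxima of `line_detection` fall on the 0-based rows 2, 10 and 18, which correspond to y = 3, 11 and 19 in the 1-based numbering of the text.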

4. SKEW DETECTION BY USING MULTIPLE VERTICAL LINES

Now, we will extend the skew detection method to more than two vertical lines. Using multiple vertical lines, we can increase the accuracy and robustness of the method.

4.1. Determination of the correlation matrices of multiple vertical lines

In order to increase the method's accuracy and robustness, we use the information of the image pixels that lie on more than two vertical lines. Suppose we have defined $M$ vertical lines $d_i$, $i = 1, \ldots, M$. Distances $D_i$ from the left margin are defined so that the image is divided into

Fig. 5. The line detection function $Ll(y)$ of the correlation matrix of Fig. 3. Text lines are detected at the local maxima $y = 3$, 11 and 19.

Fig. 7. Image of size $X_{win} \times Y_{win}$ and vertical lines at distances $D_1, D_2, \ldots, D_M$.

equal parts: $D_i = (i X_{win})/(M + 1)$, $i = 1, \ldots, M$ (see Fig. 7).

For every pair of vertical lines $d_i$ and $d_j$, we search for all point pairs with vertical distances $\lambda$ such that

$$-L_{ij} \le \lambda \le L_{ij}, \quad \text{where } L_{ij} = (D_j - D_i)\tan\left(\frac{2\pi\theta_{max}}{360}\right). \qquad (8)$$

Thus for every pixel of a vertical line $d_i$ [$\mathrm{line}_i(y_k) = 1$], we examine the pixels at all vertical lines $d_j$ in a region $[-L_{ij}, L_{ij}]$ centered at $y_k$. Afterwards, these values are stored in the correlation matrices $C_{ij}$. Clearly, we have $M(M - 1)/2$ correlation matrices given by the following relation:

$$C_{ij}(y, \lambda) = \mathrm{line}_i(y)\,\mathrm{line}_j(y + \lambda), \quad 1 \le y \le Y_{win} \text{ and } -L_{ij} \le \lambda \le L_{ij}. \qquad (9)$$

4.2. Skew detection from the correlation matrices of multiple lines

In order to calculate the skew angle of the document, we first construct a global correlation matrix $CG$ from all the correlation matrices $C_{ij}$ determined according to equation (9). Then, as in the case of two lines, the skew angle corresponds to the global maximum of the vertical projection of $CG$. In order to calculate the matrix $CG$, we first transform the correlation matrices $C_{ij}$ by suitable scaling, considering the lines $d_i$, $d_j$ as lying not at distances $D_i$, $D_j$ but at distances $D_1$, $D_M$, respectively (see Fig. 8). To perform this scaling procedure, every correlation array $C_{ij}$ is transformed into a new matrix $CS_{ij}$ as follows:

$$CS_{ij}(y, a) = CS_{ij}(y, a) + C_{ij}(y, \lambda), \quad \left(\lambda - \tfrac{1}{2}\right)\frac{D_M - D_1}{D_j - D_i} \le a \le \left(\lambda + \tfrac{1}{2}\right)\frac{D_M - D_1}{D_j - D_i}, \qquad (10)$$

for every $y = 1, \ldots, Y_{win}$ and $-L_{ij} \le \lambda \le L_{ij}$. Now, the global correlation matrix $CG$ is equal to

$$CG(y, \lambda) = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} CS_{ij}(y, \lambda), \qquad (11)$$

and its vertical projection is

$$P(\lambda) = \sum_{k=1}^{Y_{win}} CG(k, \lambda), \quad \forall \lambda \in [-L_{1M}, L_{1M}], \qquad (12)$$

where $L_{1M}$ is calculated from equation (8).

The document skew angle is calculated from the global projection maximum. If the global maximum of the projection is for $\lambda = \lambda_{max}$, then the skew angle

Fig. 8. Normalization of correlation matrices $C_{ij}$, considering the distance between lines $d_i$, $d_j$ as being not $D_j - D_i$ but $D_M - D_1$.

is equal to

$$\theta = \tan^{-1}\left(\frac{\lambda_{max}}{D_M - D_1}\right). \qquad (13)$$

4.3. Accuracy of the skew detection algorithm

The accuracy of the skew detection algorithm depends on image resolution and on the distance between the first and the last vertical lines. Specifically, the estimated skew angle is $\theta \pm e$, where

$$e = \frac{1}{2}\tan^{-1}\left(\frac{1}{D_M - D_1}\right). \qquad (14)$$

Since $1/(D_M - D_1) \ll 1$, we can approximate $e$ (in degrees) as

$$e \approx \frac{1}{2}\,\frac{M + 1}{M - 1}\,\frac{1}{S\,X_{win}}\,\frac{360}{2\pi}, \qquad (15)$$

where $M$ is the number of vertical lines, $S$ is the scanner's resolution and $X_{win}$ is the horizontal dimension of the document.

As an example, for a digitized image of A4 size ($X_{win} = 8.5$ in.) and with resolution $S = 300$ dpi, we will approximately have:

• For two vertical lines ($M = 2$): $e = 0.0337°$.
• For five vertical lines ($M = 5$): $e = 0.0169°$.

Table 1. Accuracy of the skew method (e in degrees)

No. of            Scanner resolution (dpi)
vertical lines    200       300       400       800
2                 0.0506    0.0337    0.0253    0.0126
3                 0.0337    0.0225    0.0169    0.0084
4                 0.0281    0.0187    0.0140    0.0070
5                 0.0253    0.0169    0.0126    0.0063
6                 0.0236    0.0157    0.0118    0.0059
7                 0.0225    0.0150    0.0112    0.0056
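The accuracy figures above follow directly from equations (14) and (15). The short sketch below evaluates both forms; the function names are hypothetical, and the only assumption beyond the text is that the image width in pixels is S·X_win, so that D_M − D_1 = (M − 1)·S·X_win/(M + 1).

```python
import math


def skew_resolution_deg(M, S, X_win):
    """Approximate accuracy e in degrees, eq. (15).

    M: number of vertical lines, S: resolution in dpi, X_win: width in inches.
    """
    return 0.5 * (M + 1) / (M - 1) / (S * X_win) * 360 / (2 * math.pi)


def skew_resolution_exact_deg(M, S, X_win):
    """Accuracy e in degrees from eq. (14), without the small-angle step."""
    span = (M - 1) * S * X_win / (M + 1)   # D_M - D_1 in pixels
    return 0.5 * math.degrees(math.atan(1 / span))
```

For example, `skew_resolution_deg(2, 300, 8.5)` evaluates to about 0.0337 and `skew_resolution_deg(5, 300, 8.5)` to about 0.0169, matching the two worked values in the text; the exact and approximate forms differ only far beyond the fourth decimal place for these settings.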

Fig. 9. Application of the algorithm in a simple text with five vertical lines and after line smoothing preprocessing: (a) document; (b)-(f) intersection points of the text with five vertical lines; (g) function $Ll(y)$ and (h) $P(\lambda)$.
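The multi-line variant used for Fig. 9 can be sketched along the lines of Section 4. This is a hedged illustration only: the way each shift of a pair of lines is spread over the rescaled columns (a half-open band, so that neighbouring shifts do not overlap) is an assumption of this sketch, as are the rounding of the region half-widths, the 0-based indexing and the function names.

```python
import math


def global_correlation(lines, D, theta_max_deg=5.0):
    """Accumulate all pairwise matrices C_ij into CG on the D_M - D_1 scale.

    lines[i] is the 0/1 profile of vertical line i and D[i] its column.
    Returns (CG, L1M), where CG[y][a + L1M] holds column a in [-L1M, L1M].
    """
    M, Y = len(lines), len(lines[0])
    tan_max = math.tan(math.radians(theta_max_deg))
    span = D[-1] - D[0]
    L1M = round(span * tan_max)                    # eq. (8) for the pair (1, M)
    CG = [[0] * (2 * L1M + 1) for _ in range(Y)]
    for i in range(M - 1):
        for j in range(i + 1, M):
            Lij = round((D[j] - D[i]) * tan_max)   # eq. (8)
            scale = span / (D[j] - D[i])
            for lam in range(-Lij, Lij + 1):
                # Assumed discretisation of the scaled band of eq. (10).
                a_lo = max(-L1M, math.ceil((lam - 0.5) * scale))
                a_hi = min(L1M, math.ceil((lam + 0.5) * scale) - 1)
                for y in range(max(0, -lam), min(Y, Y - lam)):
                    if lines[i][y] and lines[j][y + lam]:   # C_ij(y, lam) = 1
                        for a in range(a_lo, a_hi + 1):
                            CG[y][a + L1M] += 1
    return CG, L1M


def detect_skew_multi(lines, D, theta_max_deg=5.0):
    """Return (skew angle in degrees, position of the projection maximum)."""
    CG, L1M = global_correlation(lines, D, theta_max_deg)
    P = [sum(row[k] for row in CG) for k in range(2 * L1M + 1)]   # eq. (12)
    a_max = P.index(max(P)) - L1M
    theta = math.degrees(math.atan(a_max / (D[-1] - D[0])))       # eq. (13)
    return theta, a_max
```

The matrix `CG` returned by `global_correlation` can then be passed to the same weighted row-sum used in the two-line case to locate the text lines.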

Fig. 10. Application of the algorithm in a simple text with five vertical lines without smoothing preprocessing: (a) document; (b)-(f) intersection points of the text with five vertical lines; (g) function $Ll(y)$ and (h) $P(\lambda)$.

Fig. 11. Example with two vertical lines and smoothing preprocessing: (a) document; (b)-(c) intersection points of the text with two vertical lines; (d) function $Ll(y)$; (e) $P(\lambda)$.

Fig. 12. Comparative skew detection results.

Table 2. Experimental comparative results (all values in degrees)

Skew angle    Hough transform    New method    HT absolute error    Absolute error using the new method
0.0           -0.1               0.0           0.1                  0.0
0.2           0.0                0.13          0.2                  0.07
0.4           0.2                0.4           0.2                  0.0
0.6           0.5                0.51          0.1                  0.09
0.8           0.7                0.83          0.1                  0.03
1.0           0.8                0.96          0.2                  0.04
1.2           1.0                1.11          0.2                  0.09
1.4           1.3                1.36          0.1                  0.04
1.6           1.5                1.5           0.1                  0.1
1.8           1.7                1.77          0.1                  0.03
2.0           1.9                1.97          0.1                  0.03
2.2           2.1                2.11          0.1                  0.09
2.4           2.2                2.4           0.2                  0.0
2.6           2.5                2.66          0.1                  0.06
2.8           2.6                2.73          0.2                  0.07
3.0           2.9                3.04          0.1                  0.04

Maximum error                                  0.2                  0.1
Minimum error                                  0.1                  0.0
Mean value                                     0.1375               0.04875
Standard deviation                             0.0484               0.032763


Table 3. Experimental comparative results (results and errors in degrees, times in seconds)

Skew angle   Quantity      Hough      Hough+RLSA   ICC (Yan)   New method
-4           Result        [...]      [...]        [...]       [...]
             Time          334        470          205         5
             Abs. error    0.00       0.00         0.17        0.04
-3           Result        -2.9       -3.1         -3.11       -3.00
             Time          340        470          206         5
             Abs. error    0.10       0.10         0.11        0.00
-2           Result        -2.1       -2.2         -1.97       -2.08
             Time          336        476          206         5
             Abs. error    0.10       0.20         0.03        0.08
-1           Result        -1.1       -1.2         -1.20       -1.04
             Time          339        471          206         5
             Abs. error    0.10       0.20         0.20        0.04
0            Result        0.1        0.1          0           0.12
             Time          339        473          206         5
             Abs. error    0.10       0.10         0.00        0.12
1            Result        0.8        1            0.81        0.92
             Time          330        473          207         4
             Abs. error    0.20       0.00         0.19        0.08
2            Result        2          2            2.06        1.90
             Time          330        476          206         5
             Abs. error    0.00       0.00         0.06        0.10
3            Result        2.9        2.9          2.90        2.93
             Time          327        470          201         4
             Abs. error    0.10       0.10         0.10        0.07
4            Result        3.9        3.9          3.75        3.91
             Time          321        472          202         5
             Abs. error    0.10       0.10         0.25        0.09

Maximum error              0.20       0.20         0.25        0.12
Minimum error              0.00       0.00         0.00        0.00
Mean error value           0.088      0.088        0.123       0.068
Mean required time         332.88     472.33       205.00      4.77

Fig. 14. Document of Fig. 13 after skew correction.

[...] lines by calculating the pixel concentration around the column of matrix $CG(y, \lambda_{max})$. The methodology for text line position determination is the same as in the two lines case. We form the text line position function $Ll$ and detect its local maxima.

5. EXPERIMENTAL RESULTS

Our method was tested extensively on a variety of document images rotated at various angles. The results were very promising concerning skew angle accuracy and line position determination. In this section, we demonstrate the effectiveness of the method by presenting four characteristic examples implemented on a 486DX/33 MHz computer.

Example 1. In the first example, we apply the method using multiple vertical lines on a document with a skew angle of 2°. The scanning resolution and $X_{win}$ are equal to 96 dpi and 7.398 in., respectively. As we can see in Fig. 9, we use five vertical lines in a pure text area. According to the proposed algorithm, we determine the correlation matrix $CG$ and the line detection function $Ll(y)$. In Fig. 9, we can see the intersections of the text with the five vertical lines, the line detection function $Ll(y)$ and the projection $P(\lambda)$ of the correlation matrix. The skew angle for this image is calculated to be equal to 2.0166°, which is close enough to the real 2°.

If we apply the skew detection algorithm without the line-smoothing transformation to the same image, then (see Fig. 10) we observe that the skew angle is estimated as -2.0213°, which is approximately the same as the one determined in the first case. However, the line detection function takes an unsuitable form and its local maxima cannot be found with great accuracy.

Finally, we apply the proposed skew detection algorithm with two vertical lines, and we now estimate a skew angle of 1.8822° (see Fig. 11).

Fig. 15. Application of the algorithm to a document with text into tables: (a) document; (b)-(f) intersection points of the text with five vertical lines; (g) function $Ll(y)$; (h) $P(\lambda)$.

Example 2. In order to have comparative results with other skew detection techniques, we applied the Hough transform as well as the proposed method to the same document of Fig. 9, which was skewed from 0 to 3° with a step of 0.2°. Table 2 summarizes the results of this procedure and Fig. 12 shows their graphical representation. As we can see, the maximum absolute error for the Hough transform approach is equal to 0.27°, in contrast with 0.1° for the new method. Additionally, the average computation time was 92 s for the Hough transform and only 7 s for the new method.

Similar results are obtained if we compare our method with the use of the Hough transform after smoothing preprocessing and with the cross-correlation method of Yan.(22) The results of this comparison are demonstrated in Table 3 and were obtained from images rotated from -4° to 4° with a step of 1°. The new method gives the smallest maximum error in significantly shorter time.

Example 3. The proposed method works well even when we have mixed texts with graphics or texts in tables. In contrast with the Hough transform method, the new algorithm can be applied to the entire document and not only to pure text areas. We apply the skew detection algorithm, using five vertical lines, to a document of 96 dpi resolution with Xwin = 11.458 in. The real skew angle for this document is -3.5°. As we can see in the result window of Fig. 13, the skew angle was detected properly from the projection of the correlation matrix and equals -3.546°. This slope corresponds to the line appearing in the middle of the image in Fig. 13, while in Fig. 14 we can see the document after skew correction. The Hough transform fails when applied to the same document and results in an unacceptable value of -1.4°.

Example 4. Our skew detection method gives satisfactory results even in cases of documents with tables. In Fig. 15, we have a document with a text-table that has 300 dpi resolution, Xwin = 6.423 and a real skew angle of -0.5°. We apply the skew detection algorithm with five vertical lines. In the result window of Fig. 15, we can see that we have a clear global peak in the projection corresponding to an angle of 0.5169°.

6. CONCLUSION

In this paper, we proposed an efficient algorithm for skew correction and text line position determination of document images. The method uses only a limited number of pixels lying on vertical lines located at specified distances. Based only on these pixels, a correlation matrix is constructed. The skew angle is determined from the global maximum of a projection derived from this matrix. After the skew angle is determined, a text line detection function is also defined and its local maxima give the positions of the text lines.
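The idea of estimating skew from a few vertical lines can be illustrated with a minimal sketch. This is not the correlation matrix or projection defined in this paper. It is a simplified stand-in that scores each candidate angle by how well the pixels of every pair of sampled columns coincide after the vertical shift that angle would produce, and takes the best-scoring angle. The function names, the synthetic page, the band positions and the candidate angle grid are all assumptions made for illustration.

```python
import math

# Hypothetical synthetic page: a few text-like bands, irregularly spaced
# on purpose so that the estimate does not rely on line periodicity.
BASELINES = [25, 48, 80, 101, 140, 166]


def make_skewed_page(width, height, angle_deg, thickness=6):
    """Binary image (list of rows) whose bands are skewed by angle_deg."""
    slope = math.tan(math.radians(angle_deg))
    page = []
    for y in range(height):
        row = []
        for x in range(width):
            rel = y - slope * x  # vertical position measured along the skew
            black = any(0 <= rel - b < thickness for b in BASELINES)
            row.append(1 if black else 0)
        page.append(row)
    return page


def estimate_skew(page, xs, max_angle=5.0, step=0.1):
    """Return the candidate angle (degrees) with the highest overlap score.

    Only the pixels on the vertical lines at positions xs are read.
    """
    height = len(page)
    columns = [[page[y][x] for y in range(height)] for x in xs]
    best_angle, best_score = 0.0, -1
    for k in range(int(round(2 * max_angle / step)) + 1):
        angle = -max_angle + k * step
        slope = math.tan(math.radians(angle))
        score = 0
        for i in range(len(xs)):
            for j in range(i + 1, len(xs)):
                # Vertical displacement this angle implies between the columns.
                shift = int(round(slope * (xs[j] - xs[i])))
                a, b = columns[i], columns[j]
                lo, hi = max(0, -shift), min(height, height - shift)
                # Count rows where both columns are black after shifting.
                score += sum(a[y] & b[y + shift] for y in range(lo, hi))
        if score > best_score:
            best_angle, best_score = angle, score
    return best_angle


page = make_skewed_page(400, 200, 2.0)
estimate = estimate_skew(page, xs=[40, 120, 200, 280, 360])
print(f"estimated skew: {estimate:.1f} degrees")
```

In this sketch the angular resolution is limited by the rounding of shifts to whole pixels, so widening the spacing between the sampled columns or adding more columns would be the natural way to tighten the estimate.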

In summary, the proposed method

• is faster and more accurate than the Hough transform and Yan's cross-correlation approach and it requires less memory because it uses only a small subset of the document pixels;
• is robust since it works equally well with mixed text/graphics documents;
• does not depend on the periodicity of the text lines;
• determines the text line positions;
• offers accuracy that can be adjusted by increasing the number of vertical lines.

REFERENCES

1. W. H. Abdulla, A. O. M. Saleh and A. H. Morad, A preprocessing algorithm for handwritten character recognition, Pattern Recognition Lett. 7, 13-18 (1988).
2. R. Kasturi, S. T. Bow, W. El-Masri, J. Shah, J. R. Gattiker and U. B. Mokate, A system for interpretation of line drawings, IEEE Trans. Pattern Analysis Mach. Intell. 12(10), 978-992 (October 1990).
3. N. Papamarkos and B. Gatos, A new approach for multilevel threshold selection, Comput. Vision Graphics Image Process.: Graphical Models and Image Process. 56(5), 357-370 (September 1994).
4. F. M. Wahl, K. Y. Wong and R. G. Casey, Block segmentation and text extraction in mixed text/image documents, Comput. Graphics Image Process. 20, 375-390 (1982).
5. K. Y. Wong, R. G. Casey and F. M. Wahl, Document analysis system, IBM J. Res. Devel. 26(6), 647-656 (1982).
6. D. Wang and S. N. Srihari, Classification of newspaper image blocks using texture analysis, Comput. Vision Graphics Image Process. 47, 327-352 (1989).
7. P. Chauvet, J. Lopez-Krahe, E. Taflin and H. Maitre, A system for an intelligent office document analysis, recognition and description, Signal Process. 32, 161-190 (1993).
8. F. W. M. Stentiford, Automatic features design for optical character recognition using an evolutionary search, IEEE Trans. Pattern Analysis Mach. Intell. 7, 349-355 (1985).
9. C. Lettera, M. Majer, L. Maser and C. Paoli, Character recognition in office automation, Advances in Image Processing and Pattern Recognition, V. Cappellini and R. Marconi, eds, pp. 191-197. Elsevier, North-Holland, Amsterdam (1986).
10. S. Kahan, T. Pavlidis and H. S. Baird, On the recognition of printed characters of any font and size, IEEE Trans. Pattern Analysis Mach. Intell. 9, 274-287 (1987).
11. E. Persoon and K. S. Fu, Shape discrimination using Fourier descriptors, IEEE Trans. Systems Man Cybernet. 7(2), 170-179 (1977).
12. H. Y. Abdelazim and M. H. Hashish, Automatic reading of bilingual typewritten text, Proc. VLSI and Microelectronic Applications in Intelligent Peripherals and their Application Networks, pp. 140-144 (1989).
13. B. Taconet and M. H. Hamza, Recognition and restoration of globally degraded printed characters, Proc. IASTED Int. Symp. Applied Informatics-AI'89, pp. 102-105 (1989).
14. G. L. Cash and M. Hatamian, Optical character recognition by the method of moments, Comput. Vision Graphics Image Process. 39, 291-310 (1987).
15. B. Gatos and N. Papamarkos, A novel method for character recognition, Proc. Fourth Int. Conf. on Advances in Communication and Control, COMCON 93, pp. 493-503, Rhodos, Greece (1993).
16. N. Papamarkos, I. Spiliotis and A. Zoumadakis, Character recognition by signature approximation, Int. J. Pattern Recognition Artificial Intell. 8(5), 1171-1187 (October 1994).
17. B. Jähne, Digital Image Processing, 2nd edn. Springer, Berlin (1993).
18. J. P. Bixler, Tracking text in mixed-mode documents, Proc. ACM Conf. on Document Processing Systems, pp. 177-185 (1988).
19. L. A. Fletcher and R. Kasturi, A robust algorithm for text string separation from mixed text/graphics images, IEEE Trans. Pattern Analysis Mach. Intell. 10, 910-918 (November 1988).
20. S. C. Hinds, J. L. Fisher and D. P. D'Amato, A document skew detection method using run-length encoding and the Hough transform, in Proc. Tenth Int. Conf. on Pattern Recognition, pp. 464-468 (1990).
21. D. S. Le, G. R. Thoma and H. Wechsler, Automated page orientation and skew angle detection for binary document image, Pattern Recognition 27(10), 1325-1344 (1994).
22. H. Yan, Skew correction of document images using interline cross-correlation, Comput. Vision Graphics Image Process.: Graphical Models and Image Process. 55(6), 538-543 (November 1993).
23. G. Ciardiello, G. Scafur, M. T. Degrandi, M. R. Spada and M. P. Roccoteli, An experimental system for office document handling and text recognition, in Proc. Ninth Conf. on Pattern Recognition, pp. 739-743 (1988).
24. H. S. Baird, The skew angle of printed documents, in Proc. SPSE 40th Conf. Symp. Hybrid Imaging Systems, Rochester, New York, pp. 21-24 (1987).
25. W. Postl, Detection of linear oblique structures and skew scan in digitized documents, in Proc. Eighth Int. Conf. on Pattern Recognition, pp. 687-689 (1986).
26. B. Gatos, S. J. Perantonis and N. Papamarkos, Accelerated Hough transform using rectangular image decomposition, Electronics Lett. 32(8), 730-732 (1996).
27. K. Y. Wong, R. G. Casey and F. M. Wahl, Document analysis system, IBM J. Res. Devel. 26(6), 647-656 (1982).

About the Author--BASILIOS GATOS was born in Athens, Greece, in 1967. He received his Diploma degree in Electrical Engineering from the Democritus University of Thrace in 1992. At present, he is working toward his Ph.D. degree at the Electrical and Computer Engineering Department of Democritus University of Thrace, in the field of Optical Character Recognition. Since January 1993, he has been a Postgraduate Scholarship holder of the Institute of Informatics and Telecommunications, National Research Center "Demokritos". His current interests include document analysis and image processing. He is a member of the Greek Technical Chamber.

About the Author--NIKOS PAPAMARKOS was born in Alexandroupoli, Greece, in 1956. He received his Diploma degree in Electrical and Mechanical Engineering from the University of Thessaloniki, Thessaloniki, Greece, in 1979 and the Ph.D. degree in Electrical Engineering in 1986, from the Democritus University of Thrace, Greece. From 1987 to 1990, Dr Papamarkos was a Lecturer and, from 1990 to 1996, an Assistant Professor at the Democritus University of Thrace, where he has been an Associate Professor since 1996. During 1987 and 1992 he also served as a Visiting Research Associate at the Georgia Institute of Technology, U.S.A. His current research interests are in digital signal processing, filter design, optimization algorithms, image processing, pattern recognition and computer vision. Dr Papamarkos is a member of IEEE and a member of the Greek Technical Chamber.

About the Author--CHRISTODOULOS CHAMZAS was born in Komotini, Greece. He received the Diploma degree in Electrical and Mechanical Engineering from the National Technical University of Athens, Athens, Greece, in 1974 and the M.S. and Ph.D. degrees in Electrical Engineering in 1975 and 1979 from the Polytechnic Institute of New York, Farmingdale. From 1979 to 1982, Dr Chamzas was an Assistant Professor with the Department of Electrical Engineering at the Polytechnic Institute of New York. In September 1982 he joined AT&T Bell Laboratories, Holmdel, NJ, where he was a member of the Visual Communication Research Department until 1990. Since September 1990, he has been a member of the Faculty of the Electrical Engineering Department at Democritus University of Thrace, where he is a Director of the Electric Circuits Analysis Laboratory. He has been a major player in the definition, design and implementation of the CCITT/ISO standards (JBIG, JPEG, etc.) for coding, storage and retrieval of images (color and bilevel), an area where he holds six international patents. In 1985-1986, he was a Visiting Professor at the Department of Computer Science, University of Crete, Iraklion, Greece. He has held summer positions in Greece, England, Portugal, as well as at Bell Laboratories. His primary interests are in digital signal processing, image coding, multimedia and communications systems. He is currently interested in the implementation of multimedia image database algorithms either with fast software or with VLSI design. Dr Chamzas is a member of the Technical Chamber of Greece, Sigma Xi, an Editor of the IEEE Transactions on Communications and a Senior Member of IEEE.