Dartmouth Digital Commons citation: Yanikoglu, Berrin A. and Sandon, Peter A., "Off-line Cursive Handwriting Recognition Using Style Parameters" (1993). Computer Science Technical Report PCS-TR93-192. https://digitalcommons.dartmouth.edu/cs_tr/82
Technical Report PCS-TR93-192
Off-line Cursive Handwriting Recognition Using Style Parameters
Berrin A. Yanikoglu
Peter A. Sandon
Department of Mathematics and Computer Science
Dartmouth College
Hanover, NH 03755
June 7, 1993
Abstract

We present a system for recognizing off-line cursive English text, guided in part by global characteristics of the handwriting. A new method for finding the letter boundaries, based on minimizing a heuristic cost function, is introduced. The function is evaluated at each point along the baseline of the word to find the best possible segmentation points. The algorithm tries to find all the actual letter boundaries and as few additional ones as possible. After size and slant normalizations, the segments are classified by a one-hidden-layer feedforward neural network. The word recognition algorithm finds the segmentation points that are likely to be extraneous and generates all possible final segmentations of the word, by either keeping or removing them. Interpreting the output of the neural network as posterior probabilities of letters, it then finds the word that maximizes the probability of having produced the image, over a set of 30,000 words and over all the possible final segmentations. We compared two hypotheses for finding the likelihood of words that are in the lexicon and found that using a Hidden Markov Model of English is significantly less successful than assuming independence among the letters of a word. In our initial test with multiple writers, 61% of the words were recognized correctly.
1 Introduction
Handwriting recognition is the task of interpreting an image of handwritten text. We use the terms handwriting and cursive handwriting interchangeably, in their most general sense, i.e. successive letters of a word may or may not be joined. Strictly connected cursive handwriting and discretely written handwriting are referred to as pure cursive handwriting and handprinting, respectively.

The image of the handwriting can be obtained by flat-bed scanning after the text has been written (off-line), or by means of digitizing tablets or stylus pens as the text is being written (on-line). On-line devices can capture dynamic characteristics of the handwriting, such as the number and order of strokes, and velocity and pressure changes, but the information must be processed in real time. Off-line approaches have less information available for recognition, but have no real-time constraints [TSW90].
The system we are developing is for off-line recognition of cursive, handwritten text. The input to the system is the scanned image of a page of cursive handwriting, written roughly along the rulings. Currently, the system is trained to recognize lowercase letters only, but the writing style is not constrained in any other way.
Section 2 of this paper describes the process of finding the text line boundaries. For each text line, we extract parameters that characterize the style of the writing (style parameters), as explained in Sections 3 and 4. Sections 5 and 6 describe the algorithms that isolate words on a line and letters in a word. This latter step, called word segmentation, or simply segmentation, involves the use of style parameters to accurately locate letter boundaries and is key to the success of the overall system. Section 7 describes the process of recognizing segmented letters. Two different methods for word recognition are described in Section 8. Experimental results are described in Section 9.
2 Finding text line boundaries
To recognize the page, we first find the text line boundaries and then recognize each text line in turn. These boundaries are not always straight lines, due to overlap as shown in Figure 1.

To find the exact boundary between two text lines, we compute the histogram of pixel densities along the horizontal scan lines. Using the smoothed histogram, we roughly locate the baseline (the line where the letters sit) of each text line and apply a contour-following algorithm to find the exact boundaries. The exact boundary between two text lines can be thought of as the path of a bug trying to reach the right-hand side of the page, starting from the left-hand side of the top baseline, while staying below the text of the top line. The contour-following algorithm is based on the one that finds the outline of an object in [DH73]. Since the algorithm employs 8-connectedness, the image is first smoothed using a 3x3 Gaussian mask to avoid problems caused by thin, slanted strokes.
Figure 1: Segmentation of the page into text lines.
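The scan-line histogram computation just described can be sketched as follows. This is a minimal Python illustration, not the authors' implementation: it assumes the page is a binarized image given as rows of 0/1 pixels, the smoothing radius and density threshold are arbitrary choices, and the contour-following step itself is omitted.

```python
def horizontal_density(image):
    """Number of text (dark) pixels on each horizontal scan line."""
    return [sum(row) for row in image]

def smooth(hist, radius=2):
    """Simple box smoothing: average each bin with its neighbors."""
    n = len(hist)
    out = []
    for x in range(n):
        window = hist[max(0, x - radius):min(n, x + radius + 1)]
        out.append(sum(window) / len(window))
    return out

def rough_baselines(hist, threshold):
    """Scan lines where the (smoothed) density falls back below a
    threshold after a dense band: rough baseline candidates, one
    per text line, from which contour following would start."""
    baselines = []
    in_band = False
    for y, value in enumerate(hist):
        if value >= threshold:
            in_band = True
        elif in_band:
            baselines.append(y)
            in_band = False
    return baselines
```

Each rough baseline would then seed the bug-path search for the exact boundary between two text lines.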
3 Finding the reference lines
The reference lines of a text line [SR87] are the four horizontal lines that mark the top of the ascenders, the top of the main bodies of letters, the baseline and the bottom of the descenders. They will be referred to as l1, l2, l3 and l4, from top to bottom. Figure 2 and Figure 3 show the reference lines of two text lines as found by the system. Note that l1 and l4 are the points where the ascenders and descenders would approximately reach, if there were any. The algorithm that finds the reference lines is as follows:

1. Compute the horizontal pixel density histogram, h0, of the text line.

2. Compute the smoothed histogram, s0, as

   s0(x) = sum of h0(x + i) for i = -2, ..., 2

3. Compute the "derivative" of the smoothed histogram, d0, as

   d0(x) = s0(x) - s0(x - 5)

   This represents the pixel density difference of successive blocks of 5 scan lines.

4. Starting from the top of the line, let l1 be the first point where s0 is non-zero.

5. Starting from the bottom of the line, let l4 be the first point where s0 is non-zero.

6. From the midpoint of l1 and l4 up to l1, redefine l1 to be the first point where s0 becomes zero.

7. From the midpoint of l1 and l4 down to l4, redefine l4 to be the first point where s0 becomes zero.

8. Let the peak p be the point where s0 has its maximum.

9. Let l3 be the point of local minimum of d0 between p and l4.

10. Let l2 be the point of local maximum of d0 between l1 and p.

11. If l1 and l2 are very close, set l1 to (l2 - max((l4 - l3), (l3 - l2))).

12. If l3 and l4 are very close, set l4 to (l3 + max((l2 - l1), (l3 - l2))).
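For illustration, the twelve steps above can be transcribed almost directly into code. This is a Python sketch, not the authors' implementation; the closeness threshold used in steps 11 and 12 is an assumed parameter that the text does not specify.

```python
def reference_lines(h0, closeness=3):
    """Sketch of the reference-line algorithm.  h0 is the horizontal
    pixel-density histogram of one text line (assumed to contain some
    text); returns (l1, l2, l3, l4) as indices into h0."""
    n = len(h0)
    # Steps 2-3: smoothed histogram and its "derivative".
    s0 = [sum(h0[max(0, x - 2):x + 3]) for x in range(n)]
    d0 = [s0[x] - s0[x - 5] if x >= 5 else 0 for x in range(n)]
    # Steps 4-5: outermost non-zero points of s0.
    l1 = next(x for x in range(n) if s0[x] > 0)
    l4 = next(x for x in range(n - 1, -1, -1) if s0[x] > 0)
    # Steps 6-7: tighten l1 and l4, scanning from the midpoint outward.
    mid = (l1 + l4) // 2
    for x in range(mid, l1 - 1, -1):
        if s0[x] == 0:
            l1 = x
            break
    for x in range(mid, l4 + 1):
        if s0[x] == 0:
            l4 = x
            break
    # Step 8: peak of the smoothed histogram.
    p = max(range(n), key=lambda x: s0[x])
    # Steps 9-10: extrema of the derivative on either side of the peak.
    l3 = min(range(p, l4 + 1), key=lambda x: d0[x])
    l2 = max(range(l1, p + 1), key=lambda x: d0[x])
    # Steps 11-12: fall back when there are no ascenders/descenders.
    if l2 - l1 < closeness:
        l1 = l2 - max(l4 - l3, l3 - l2)
    if l4 - l3 < closeness:
        l4 = l3 + max(l2 - l1, l3 - l2)
    return l1, l2, l3, l4
```

On a synthetic histogram with a dense body band flanked by sparse ascender and descender bands, the four returned indices bracket the body as expected.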
Figure 2: Reference lines l1, l2, l3 and l4, from top to bottom.
Figure 3: l4 is adjusted since there were no descenders.
If the page was not properly positioned on the scanner, or if the writing does not follow the rulings, the horizontal pixel density histogram does not show a significant peak. In that case, we compute histograms angled at -10 and 10 degrees from the horizontal and interpret the angle corresponding to the histogram that shows the most significant peak as the line skew. We then correct the skew by shearing, before applying the above algorithm.
4 Extracting style parameters
We extract parameters that characterize the style of the writing and use them in decisions throughout the system. Parameterized decisions are essential for writer-independent recognition. Some of these parameters are the dominant slant of the writing, the thickness of the pen, and the average values of inter-word spaces and letter widths.

Most of the style information, such as the average heights of ascenders, descenders and letter bodies, is obtained from the reference lines. These are used, in particular, in the size normalization step to decide whether a letter has an ascender, a descender or neither, and to scale it accordingly.
We estimate the thickness of the pen and the average inter-word space by analyzing the run-length histogram of the text. Then, using the vertical pixel density histogram, the average body height as an initial estimate, and the estimated pen thickness, we estimate the average letter width. Finally, we find the dominant slant of the writing by computing the slant histogram using edge operators and finding the most common slant within -30 and +30 degrees from the vertical. The dominant slant and the average letter width are particularly useful in finding the letter boundaries.
For most of these parameters, it would also be useful to know not only the average values, but also the variances, in order to assess the reliability of our decisions. For example, in handwriting such as that in Figure 4, it is important to know that there is a large variance in letter slants, so as not to rely on the dominant slant, which is not well defined.
Figure 4: Writing without a consistent slant.
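The dominant-slant computation can be sketched as follows. This is a hypothetical Python illustration: it assumes the local stroke angles have already been extracted by edge operators, and the 5-degree bin width is an arbitrary choice not taken from the paper.

```python
def dominant_slant(stroke_angles, bin_width=5, window=30):
    """Most common stroke angle (degrees clockwise from the vertical)
    among samples within +-window degrees, using a histogram with
    bins bin_width degrees wide.  Returns 0 if no sample qualifies."""
    counts = {}
    for angle in stroke_angles:
        if -window <= angle <= window:
            binned = round(angle / bin_width) * bin_width
            counts[binned] = counts.get(binned, 0) + 1
    if not counts:
        return 0
    return max(counts, key=counts.get)
```

The spread of the angles that fall inside the window could serve as the variance measure discussed above, flagging writing like that of Figure 4 where the dominant slant is not well defined.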
5 Finding word boundaries
Before applying the segmentation algorithm, we find the connected components of the text line, using a region growing algorithm. A distance greater than the average letter width between two components is considered to be a word boundary. Finding word boundaries in this way is not completely reliable but works quite well.

In general, any algorithm that uses a threshold distance is prone to failure, since two words may be arbitrarily close to each other, while the letters of a single word may be significantly apart. This problem implies that the task of finding the word boundaries should be extended to the word recognition stage where, for example, matching the image to prefixes and suffixes in the dictionary would solve the second part of the problem.

Connected component analysis also segments consecutive, disconnected letters, as a first step in the letter segmentation process.
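The threshold-based grouping of connected components into words might look like the following Python sketch, under the stated assumption that components are given as (left, right) x-extents sorted left to right; the representation is ours, not the paper's.

```python
def group_into_words(components, avg_letter_width):
    """Group connected components into words: a horizontal gap wider
    than the average letter width is taken as a word boundary."""
    words = []
    current = []
    prev_right = None
    for left, right in components:
        if prev_right is not None and left - prev_right > avg_letter_width:
            words.append(current)   # gap too wide: start a new word
            current = []
        current.append((left, right))
        # Track the rightmost extent seen so far in this word.
        prev_right = right if prev_right is None else max(prev_right, right)
    if current:
        words.append(current)
    return words
```

As the text notes, any such fixed threshold can fail; the grouping is only a first pass to be arbitrated later.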
6 Finding letter boundaries
We use separator lines at one of six fixed angles (-20, -10, 0, 10, 20 and 30 degrees clockwise from the vertical [1]) as letter boundaries, as shown in Figure 5. The gaps and ligatures between letters are detected using pixel density histograms of the text line along these six angles.
Figure 5: Segmentation of the word television using separator lines.
Note that although this sample word is oversegmented, the segmentation is potentially correct in the sense that all the actual letter boundaries are found. The oversegmentation is partly due to the ambiguity of cursive script segmentation: some letter pairs (digraphs) or triples (trigraphs) are indistinguishable from a single letter at the image level, as with the letter w and the digraphs ui and iu. Since we are using a strictly bottom-up approach, our algorithm tries to find all possible letter boundaries and thus oversegments these letters. Section 6.2 describes how we handle oversegmentation.
[1] All angles are clockwise from the vertical, unless otherwise specified.
Depending on the dominant slant of the writing, we choose a subset of four of the separator line angles to use in segmenting the word. For example, if the writing has a 20 degree slant, we use 0, 10, 20 and 30 degree angles. Using each of the corresponding angled histograms, we segment the word using separator lines at the angle of that histogram, as explained below. Three segmentations of the above sample, as computed by the segmentation algorithm, are shown in Figure 6.
Figure 6: Three segmentations of the word television using separator lines with 0, 10 and 20
degree angles.
The beginning of the first word in the text line constitutes the first segmentation point for all four angles. Given one segmentation point, we find the next one by evaluating a cost function, described in Section 6.1, at each point along the baseline to the right of that point and choosing the point with the minimum cost. The cost at a point depends on the distance of that point to the previous segmentation point, the average letter width, the number of text pixels cut by the separator line and the height of the cut. This last feature helps to locate the ligatures between letters, since they are usually connected at their bases. We make use of the style parameters in this process. For example, we normalize the amount of text cut by dividing it by the average pen thickness.

To determine the final letter boundaries, we first take the union of these four sets of segmentation points. This is necessary since a letter boundary may be detected in only one of the segmentations. On the other hand, the union of the segmentation points usually contains more than one segmentation point corresponding to the same letter boundary. We use a threshold of half the average letter width to identify segmentation points that are too close and might correspond to the same letter boundary. From among these, we identify the ones that do correspond to the same boundary and choose the one with the lowest cost as the final segmentation point. The details of the algorithm are described below.
1. Find the next segmentation point p.

2. Let the threshold be half the average character width.

3. Find the set of points P1 that are within the threshold of p.

4. If there are two points in P1 with the same segmentation angle, exclude all the points after the second one.

5. From the remaining points in P1, find the set of points P2 that do not span much text. (The number of text pixels should be less than the size of a small i.)

6. From among the points in P2, choose the one with the lowest cost as the final segmentation point.
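One possible reading of these steps in code is sketched below. The representation of a candidate as an (x, angle, cost, text_pixels) tuple is ours, and step 4 is interpreted here as ending a cluster when an angle repeats, so the repeated point can seed the next cluster; the paper does not pin down these details.

```python
def merge_candidates(points, avg_letter_width, max_text):
    """points: the union of segmentation points from the four angles,
    as (x, angle, cost, text_pixels) tuples sorted by x.  Merge points
    closer than half the average letter width, keeping the cheapest
    one that does not cut through much text."""
    eps = avg_letter_width / 2
    final = []
    i = 0
    while i < len(points):
        # Step 3: candidates within eps of the current point.
        cluster = [points[i]]
        seen_angles = {points[i][1]}
        j = i + 1
        while j < len(points) and points[j][0] - points[i][0] <= eps:
            # Step 4 (one reading): a repeated angle suggests a
            # genuinely distinct boundary, so the cluster ends here.
            if points[j][1] in seen_angles:
                break
            seen_angles.add(points[j][1])
            cluster.append(points[j])
            j += 1
        # Step 5: prefer points that do not span much text.
        light = [p for p in cluster if p[3] <= max_text] or cluster
        # Step 6: keep the candidate with the lowest cost.
        final.append(min(light, key=lambda p: p[2]))
        i += len(cluster)
    return final
```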
The output of the algorithm for a sample word is shown in Figure 7. Note that although some of the letter boundaries are not found in some of the segmentations, the final segmentation is potentially correct and that there are only 3 extraneous segmentation points.

Using this algorithm, we were able to segment all but one of the words in our test set of 111 words, as explained in the results section.
6.1 Derivation of the cost function
We use a cost function to find the segmentation points in the text. Given a segmentation point, we find the next one by assigning a cost to all the points to the right of it and choosing the point that has the minimum cost. The first segmentation point is the left end of the text, which can easily be found.
Figure 7: Segments found using 0, 10 and 20 degree angles are shown on the second, third and fourth lines, respectively. The output of the segmentation algorithm is shown at the top.
The cost function is defined for the pair (A, P), where A is the angle of the separator line (segmentation angle) passing through the point P on the baseline. For a fixed angle A, it is computed as follows:

   cost = w1 * ((P - PP) / EW) + w2 * ((P - PP) / EW)^2 + w3 * (TC) + w4 * (HC),

where PP is the previous segmentation point, EW is the average letter width, TC is the number of text pixels cut by the separator line normalized by the average pen thickness, and HC is the height of the highest point cut normalized by the total height of the line. The wi are the relative weights of the cost terms.
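For illustration, the cost evaluation and the choice of the next segmentation point might be sketched as follows. The unit weights are placeholders only, since the actual weights are fitted by linear programming as described next; the candidate representation is ours.

```python
def segmentation_cost(x, prev_x, avg_width, text_cut, height_cut,
                      w=(1.0, 1.0, 1.0, 1.0)):
    """Cost of placing a separator at x given the previous
    segmentation point prev_x (weights w are illustrative only)."""
    d = (x - prev_x) / avg_width          # normalized distance term
    return w[0] * d + w[1] * d * d + w[2] * text_cut + w[3] * height_cut

def next_segmentation_point(candidates, prev_x, avg_width):
    """candidates: (x, text_cut, height_cut) for points along the
    baseline to the right of prev_x; pick the minimum-cost point."""
    return min(candidates,
               key=lambda c: segmentation_cost(c[0], prev_x, avg_width,
                                               c[1], c[2]))[0]
```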
We used linear programming to find a set of weights that would correctly segment a set of digraphs, representing all types of connections between letters. Since digraphs are the image primitives with respect to segmentation, an algorithm that correctly segments all digraphs would also correctly segment all words. For each digraph, we chose a correct segmentation point and a wrong one that is likely to be chosen, and constrained that the cost of the correct segmentation point be smaller than that of the wrong one. For example, we used the digraph bi, where the correct segmentation point was chosen between the letters b and i and the incorrect one was after the letter i. Note that once these two points are chosen, the only unknowns are the weights. Satisfying all the constraints simultaneously is achieved using linear programming, while minimizing the sum of the weights.
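The weight-fitting step can be posed as a small linear program, sketched below in Python using SciPy's linprog, which is our choice of solver, not necessarily the authors'; the margin value and the feature representation of each digraph are assumptions for illustration.

```python
from scipy.optimize import linprog

def fit_weights(pairs, margin=0.1):
    """pairs: list of (f_good, f_bad), each a 4-tuple holding the four
    cost terms evaluated at the correct and the wrong segmentation
    point of one training digraph.  We require
        cost(f_good) <= cost(f_bad) - margin
    for every pair and minimize the sum of the (non-negative) weights."""
    # (f_good - f_bad) . w <= -margin, one row per digraph.
    A_ub = [[g - b for g, b in zip(f_good, f_bad)]
            for f_good, f_bad in pairs]
    b_ub = [-margin] * len(pairs)
    res = linprog(c=[1.0] * 4, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(0, None)] * 4)
    return list(res.x)
```

With such weights, every training digraph's correct segmentation point costs strictly less than its chosen wrong one.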
6.2 Handling oversegmentation
At this stage, some words are oversegmented, as shown in Figure 8. To find the correct segmentation of the word, the extraneous segmentation points, the second and third ones in Figure 8, need to be identified and removed. Since we have no way of knowing with certainty which of those points are extraneous, we evaluate all possible final segmentations, generated by either keeping or removing the points that are likely to be extraneous. Figures 9 and 10 show two possible final segmentations for Figure 8. In the word recognition stage, we arbitrate the segmentation by rating each word in the lexicon for its fit to each final segmentation and choosing the word that has the best fit.
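Enumerating the final segmentations by keeping or dropping the doubtful points amounts to a product over binary choices, as in the Python sketch below; the split of the points into "sure" and "doubtful" sets is assumed to be given by the earlier stages.

```python
from itertools import product

def final_segmentations(sure, doubtful):
    """Yield every final segmentation: each sure boundary is always
    kept, each doubtful (possibly extraneous) one is independently
    kept or dropped, giving 2**len(doubtful) candidates."""
    for keep in product([True, False], repeat=len(doubtful)):
        yield sorted(sure + [p for p, k in zip(doubtful, keep) if k])
```

Each candidate segmentation would then be rated against the lexicon in the word recognition stage.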