OPTICAL CHARACTER RECOGNITION

Màster de Visió per Computador Curs 2006 - 2007

Outline

• Introduction
• Pre-processing (document level)
  – Binarization
  – Skew correction
• Segmentation
  – Layout analysis
  – Character segmentation
• Pre-processing (character level)
• Feature extraction
  – Image-based features
  – Statistical features
  – Transform-based features
  – Structural features
• Classification
• Post-processing
  – Classifier combination
  – Exploitation of context information
• Examples of OCR systems
• Bibliography

Optical Character Recognition

Pattern Recognition

[Diagram: Document Analysis and Optical Character Recognition as applications at the intersection of Pattern Recognition (statistical and structural methods) and Image Processing.]

Some examples

• Books, journals, reports
• Old documents
• Drawings, maps
• Cheques, bills
• Postal addresses
• License plates
• Identity cards
• PDAs
• Quality control

Document Image Analysis

G. Nagy: Twenty years of document image analysis in PAMI. IEEE Trans. on PAMI, vol. 22, nº 1, pp. 38-62 January 2000.

Document image analysis is the subfield of digital image processing that aims at converting document images to symbolic form for modification, storage, retrieval, reuse and transmission. Document image analysis is the theory and practice of recovering the symbol structure of digital images scanned from paper or produced by computer.

What is a document? Objects created expressly to convey information encoded as iconic symbols:
– Scanned images from paper documents
– Electronic documents
– Multimedia documents (video with text)
– …

Applications of DIA

• Document imaging: digitization, storage, compression, re-printing
• Document understanding: recognition, interpretation, indexing, retrieval

DIA tasks

Mostly text documents:
• Document imaging: acquisition, binarization, filtering, skew correction, segmentation, layout analysis
• Document understanding: OCR, interpretation

Mostly graphics documents:
• Document imaging: acquisition, binarization, filtering, vectorization, text-graphics separation
• Document understanding: symbol recognition, interpretation

Outline of the course

Focus: Document understanding of mostly text documents

1. Acquisition
2. Pre-processing
   – Binarization
   – Skew correction
3. Layout analysis
4. Character segmentation
5. OCR
   – Feature extraction
   – Classification
   – Post-processing

Categorization of Character Recognition

Optical Character Recognition

According to the type of writing:
• Machine-printed character recognition
• Hand-written character recognition

According to the type of acquisition:
• Off-line character recognition
• On-line character recognition

Machine-printed character recognition

• Characters are totally defined by the font type:
  – Dimensions (segmentation)
    • Character width
    • Inter-character separation
    • Character height
  – Shape (recognition)
    • Typographic effects (boldface, italics, underline)
• Challenges:
  – Similar shapes among characters
  – Multiple fonts
  – Joined characters
  – Digitization noise: broken lines, random noise, heavy characters, etc.
  – Document degradation: old documents, photocopies, etc.

Machine-printed character recognition

• Classification of machine-printed OCR systems:
  – Monofont: one single type of font
  – Multifont: recognition of a fixed and known set of fonts; it is necessary to identify and learn the differences between characters of all the fonts in the set
  – Omnifont: recognition of any arbitrary font, even if it has not been previously learned

Off-line hand-written character recognition

• Hand-written text
• Off-line: acquisition by a scanner or a camera
• Challenges:
  – Shape variability among images of the same character
  – Character segmentation
• Subproblems:
  – Hand-written numeral recognition: digit recognition

– Hand-printed character recognition: well-separated characters

– Cursive character recognition: non-separated characters

On-line hand-written character recognition

• On-line acquisition:
  – Digitizer tablets
  – Digital pen
  – Tablet PC
• Advantages with respect to off-line acquisition:
  – The image is acquired while the text is written
  – We can take advantage of dynamic information:
    • Temporal information: writing order, stroke segmentation, etc.
    • Writing speed
    • Pen pressure
• Subproblems:
  – Cursive script recognition
  – Signature verification/recognition

Levels of difficulty in character recognition

• S. Mori, H. Nishida, H. Yamada: Optical Character Recognition. John Wiley and Sons, 1999.
• S.V. Rice, G. Nagy, T.A. Nartker: Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, 1999.

Level 0 (little shape variability, small number of characters, little noise):
  0.0. Printed characters. Specific font. Constant size. Roman alphabets.
  0.1. Constrained hand-printed characters. Arabic numerals.

Level 1 (medium variation in shape, medium noise):
  1.0. Printed characters. Multiple fonts. Number of characters < 100.
  1.1. Loosely constrained hand-printed characters. Number of characters < 100.
  1.2. Chinese characters of a few fonts.
  1.3. Loosely constrained hand-printed characters. Number of characters ≈ 1000.

Level 2 (much variation in shape, heavy noise):
  2.0. Printed characters of multiple fonts.
  2.1. Unconstrained hand-printed characters.
  2.2. Affine transformed characters.

Level 3 (non-segmented strings of characters):
  3.0. Touching/broken characters.
  3.1. Cursive handwriting characters.
  3.2. Characters on a textured background.

Levels of difficulty in character recognition

Level 0

0.0. Printed characters of a specific font with a constant size:
  • Constant size
  • Connectivity of characters
  • Variation in the stroke thickness
  • Little noise
0.1. Constrained hand-printed characters:
  • Characters are written according to some instructions or box guidelines

Solved problem.

Levels of difficulty in character recognition

Level 1

1.0. Printed characters of multiple fonts

1.1. Loosely constrained hand-printed characters

1.2. Chinese characters of few fonts.

1.3. Loosely constrained hand-printed characters. Number of characters ≈ 1000

Solved problem.

Levels of difficulty in character recognition

Level 2

2.0. Printed characters of multiple fonts

2.1. Unconstrained hand-printed characters

2.2. Affine transformed characters

Levels of difficulty in character recognition

Level 3

3.0. Touching or broken characters

3.1. Cursive handwriting characters

3.2. Characters on a textured background

Databases for OCR

Off-line, hand-written characters:
• CEDAR: 50,000 segmented numerals from zip codes; 5,000 zip codes; 5,000 city names; 9,000 state names; 91,500 words and sentences with a dictionary.
• CENPARMI: 17,000 manually segmented numerals from zip codes; several learning and test sets (more variability).
• NIST: more than 1,000,000 characters from forms; 4,500,000 characters in total; segmented characters, words and sentences.

Off-line, machine-printed characters:
• Univ. Washington: more than 1,500 pages of articles in English; more than 500 pages of articles in Japanese; originals, photocopies and pages with artificially generated noise; page segmentation into labeled zones.

On-line characters:
• UNIPEN: definition of a format to represent on-line data.

Performance evaluation of OCR systems

• Hand-printed character recognition
  – Institute for Posts and Telecommunications Policy (IPTP), Japan, 1996
  – 5,000 hand-written numerals from Japanese zip codes
  – Performance of the best system: 97.94% (human performance: 99.84%)
• Machine-printed character recognition
  – "The Fifth Annual Test of OCR Accuracy". Information Science Research Institute. TR-96-01. April 1996. http://www.isri.unlv.edu
  – 5,000,000 characters from 2,000 pages of journals, newspapers, letters and technical reports
  – Performance on good quality documents: 99.77% - 99.13%
  – Performance on medium quality documents: 99.27% - 98.21%
  – Performance on low quality documents: 97.01% - 89.34%
• Note that, at 3,000 characters/page, a performance of 99% still means 30 errors per page.

Performance evaluation of OCR systems

C.Y. Suen, J. Tan: Analysis of errors of handwritten digits made by a multitude of classifiers. PRL 26, pp. 369-379. 2005

Classifier     N. of test samples   Recognition (%)   Error (%)
MNIST database
  GPR          10,000               99.06             0.94
  VSVMb        10,000               99.62             0.38
  VSV2         10,000               99.44             0.56
  LeNet-5      10,000               99.18             0.82
  POE          10,000               98.32             1.68
CENPARMI database
  VSVM         2,000                98.7              1.3
USPS database
  VSVM         2,007                97.66             2.34
NIST SD19 database
  MLP          30,000               99.16             0.84

Performance evaluation of OCR systems

Classifier       N. of errors   Category 1   Category 2   Category 3
MNIST database
  GPR            94             24           11           59
  VSVMb          38             15           6            17
  VSV2           56             15           9            32
  LeNet-5        82             17           14           51
  POE            168            41           9            118
CENPARMI database
  VSVM           26             6            4            16
USPS database
  VSVM           47             13           13           21
NIST SD19 database
  MLP            119            30           8            81
Sum              630            161          74           395
Percentage (%)   100            25.56        11.75        62.70

Components of an OCR system

ACQUISITION
↓
DOCUMENT PRE-PROCESSING: filtering, binarization, skew correction
↓
SEGMENTATION: layout analysis, text/graphics separation, character segmentation
↓
CHARACTER PRE-PROCESSING: filtering, normalization
↓
FEATURE EXTRACTION: image-based, statistical, transform-based and structural features
↓
CLASSIFICATION (using models obtained in a LEARNING stage)
↓
POST-PROCESSING (using context information)

Acquisition. Scanners

A scanner is a linear camera with a lighting and a displacement system.

[Diagram: lights illuminate the document; a lens focuses each scan line onto the CCD; the video circuit converts the scanned lines into a digital image.]

Acquisition. Scanners

Types of scanners:
• Flatbed scanners: the CCD line is displaced along the paper.
• Traction scanners: the paper is displaced through the CCD line.
• Others: specific scanners for negative films, cards, passports, etc.

Important features of a scanner:
• Optical resolution / interpolated resolution
• Bits/pixel (depth)
• Speed (acquisition and calibration)
• Connection (parallel, USB, SCSI)
• Programming tools (TWAIN protocol, specific programming languages such as HP's SCL, etc.)
• Automatic feeding

Acquisition. Scanners vs cameras

• Resolution determines:
  – Quality of the image
  – Size of the image
  – Speed of acquisition
• Minimal resolution for OCR: 200 dpi.
• A 12-point character (approx. 2×3 mm) at 200 dpi generates a 16×24 pixel image.
• A4 page (297×210 mm):
  – Scanner at 200 dpi: 2376×1680 image.
  – Camera of 1024×1024 pixels: resolution of 90 dpi.
• DNI identity card (85×55 mm):
  – Scanner at 200 dpi: 670×435 image.
  – Camera of 1024×1024 pixels: resolution of 300 dpi.

Acquisition. Scanners vs cameras

• Advantages of scanners:
  – Cost
  – Resolution
  – Lighting is under control
  – Control of optical distortions
• Advantages of cameras:
  – Acquisition speed
  – More flexibility to adapt to the environment and to the material to read

On-line acquisition

• Input device: simulates pen and paper
  – The image is acquired while it is generated
  – A special input device provides x and y coordinates over time
• Components of the device: pen, paper and support; at least one of these components must be special
  – Digitizer tablet without display
  – Digitizer tablet with display
  – Tablet PC
  – Digital pen and paper
• Technical specifications:
  – Resolution
  – Sample frequency
  – Precision

Components of an OCR system

(Recap of the pipeline diagram above; the next slides cover document pre-processing.)

Binarization

O.D. Trier, T. Taxt: Evaluation of binarization methods for document images. IEEE Trans. on PAMI, vol. 17, nº 3, pp. 312-315, 1995.
• Global methods:
  – Apply the same global threshold to all the pixels of the image
• Locally adaptive methods:
  – Apply a different threshold to every pixel, depending on the local distribution of gray values
• Special case: binarization of textured backgrounds

Binarization: Otsu

N. Otsu: A threshold selection method from gray-level histograms. IEEE Trans. on Systems, Man and Cybernetics, vol. 9, nº 1, pp. 62-66, 1979.
• Two classes of gray-scale levels: black pixels (foreground) and white pixels (background).
• Probabilistic criterion to select the threshold:
  – Maximize inter-class variability
  – Minimize intra-class variability

Otsu

• The probability distribution of gray levels is defined as

    p_i = n_i / N

  where n_i is the number of pixels with gray level i and N is the total number of pixels.
• We want to define two classes of gray levels, where k is the threshold:
  – C1: [1..k]
  – C2: [k+1..L]
• The class probabilities are

    ω1 = Σ_{i=1..k} p_i        ω2 = Σ_{i=k+1..L} p_i

  with P(i|C1) = p_i/ω1 and P(i|C2) = p_i/ω2.
• The mean and variance of each class, and of the whole image, are defined as

    µ1 = Σ_{i=1..k} i·P(i|C1)        µ2 = Σ_{i=k+1..L} i·P(i|C2)        µT = Σ_{i=1..L} i·p_i
    σ1² = Σ_{i=1..k} (i − µ1)²·P(i|C1)        σ2² = Σ_{i=k+1..L} (i − µ2)²·P(i|C2)        σT² = Σ_{i=1..L} (i − µT)²·p_i

Otsu

• We define the within-class and between-class variances:

    σW² = ω1·σ1² + ω2·σ2²
    σB² = ω1·(µ1 − µT)² + ω2·(µ2 − µT)²

• The following criteria of class separability are equivalent:

    λ = σB²/σW²        κ = σT²/σW²        η = σB²/σT²

  since κ = λ + 1 and η = λ/(λ + 1).
• The optimal threshold is then

    k* = arg max_{1≤k≤L} σB²(k)

Binarization: locally adaptive binarization

• The threshold at every pixel depends on the local distribution of gray levels in the neighborhood of the pixel.
• Niblack's method:
  W. Niblack: An Introduction to Digital Image Processing, pp. 115-116, Prentice Hall, 1986.

    T(x, y) = m(x, y) + k·s(x, y)

  – m(x,y) and s(x,y) are the mean and standard deviation in a local neighborhood of the pixel.
  – Window size: 15 × 15
  – k = −0.2
• Eikvil et al.'s method:
  L. Eikvil, T. Taxt, K. Moen: A fast adaptive method for binarization of document images. International Conference on Document Analysis and Recognition, pp. 435-443, 1991.
  – For every pixel, define a small window S (3 × 3) and a large window L (15 × 15).
  – The threshold value is selected by applying Otsu to the large window L.
  – The pixels in the small window S are thresholded using this value.

Binarization: evaluation

[Example binarizations of the same documents: Otsu vs. Eikvil, and Otsu vs. Niblack.]

Binarization: evaluation

• Visual criteria of evaluation:
  – Broken line structures: gaps in lines
  – Broken symbols and text: symbols and text with gaps
  – Blurring of lines, symbols and text
  – Loss of complete objects
  – Noise in homogeneous areas: noisy spots and false objects in both background and print
• Scale of 1-5 for each criterion

Binarization: Textured Backgrounds

Y. Liu, S. Srihari: Document image binarization based on texture features. IEEE Trans. on PAMI, vol. 19, nº 5, pp. 540-544, 1997.
1. Selection of candidate thresholds:
   • Iterative application of Otsu
   • At each iteration, Otsu is applied to the part of the histogram with the lowest mean
   • The number of iterations depends on the number of peaks in the histogram
   • The result is a set of possible threshold values
2. Computation of a set of texture features for each possible binarization:
   • Based on the run-length histogram of the binarized image, R(i), i ∈ {1,..,L}
3. Selection of the optimal binarization

Binarization: Textured Backgrounds

• Texture features:
  – Stroke width: the run length with the highest frequency

      SW = arg max_{i≠1} R(i)

  – Stroke-like pattern noise: relevance of stroke-like patterns in the background between consecutive threshold selections. If it is high, it denotes stroke-like patterns due to texture.

      SPN = max_{i≠1} R_j(i) / max_{i≠1} R_{j+1}(i)

  – Unit run noise: relevance of unit run lengths. Ideally, it should be low.

      URN = R(1) / max_{i≠1} R(i)

  – Long run noise: relevance of long run lengths. Ideally, it should be low.

      LRN = Σ_{i>L} R(i) / max_{i≠1} R(i)

  – Broken character: it will be high if characters are broken.

      BC = min_{i∈I'} R(i) / max_{i≠1} R(i),      I' = [0, arg max_{i≠1} R(i)]

Binarization: Textured Backgrounds

• Experimental results show that 2 iterations of Otsu are enough.
• Decision tree to select the optimal threshold (between T1 and T2, with T1 > T2):
  1. Select the threshold with the larger stroke-width feature.
  2. If T1 is the selected value:
     – If the background noise features are low, T1 is the selected threshold.
     – Otherwise, if T2 does not result in many broken characters, select T2.
  3. If T2 is the selected value:
     – If the broken-character feature is low, select T2.
     – Otherwise, if the noise features are low with T1, select T1.
  4. If neither threshold is good enough, select the average of the two.

Binarization of textured backgrounds

[Example binarization results on documents with textured backgrounds.]

Skew Correction: Projection profiles

W. Postl: Detection of linear oblique structures and skew scan in digitized documents. 8th International Conference on Pattern Recognition, pp. 687-689, 1986

1. Compute the horizontal projection at several angles (depending on the desired resolution).
2. For every projection, compute a directional criterion that estimates the difference between maxima and minima in the projection:
   • Sum of squared differences of adjacent rows
   • Variance of the number of black pixels per scan line
3. Select the angle that maximizes the directional criterion.
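A brute-force sketch of this procedure using the variance criterion (the angle range and step are arbitrary choices, not from the slides):

```python
import numpy as np
from scipy.ndimage import rotate

def estimate_skew(binary, angles=np.arange(-5.0, 5.1, 0.1)):
    best_angle, best_score = 0.0, -np.inf
    for a in angles:
        rot = rotate(binary.astype(float), a, reshape=False, order=0)
        profile = rot.sum(axis=1)     # black pixels per scan line
        score = profile.var()         # directional criterion (variance)
        if score > best_score:
            best_angle, best_score = a, score
    return best_angle
```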

Skew Correction: Projection profiles

H.S. Baird: The skew angle of printed documents. Proc. of the Society of Photographic Scientists and Engineers, pp. 14-21, 1987.
• Modification of the previous algorithm:
  – Use only the bottom-center points of the connected components to compute the projection profiles
  – Reduces the computation cost

Analysis of neighboring connected components

L. O'Gorman: The document spectrum for page layout analysis. IEEE Trans. on PAMI, vol. 15, nº 11, 1993.
1. Compute connected components.
2. For every connected component, search for the k nearest neighbours (k = 5).
3. For every pair of neighbouring connected components, compute the angle between their centroids.
4. Compute the histogram of angles.
5. Estimate the skew angle as the maximum of the histogram.
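A sketch of steps 2-5 on precomputed centroids (brute-force nearest-neighbour search; the function and parameter names are ours):

```python
import numpy as np

def docstrum_skew(centroids, k=5, bins=180):
    # centroids: (n, 2) array of (x, y) connected-component centroids
    d = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    hist = np.zeros(bins)
    for i in range(len(centroids)):
        for j in np.argsort(d[i])[:k]:               # k nearest neighbours
            dx, dy = centroids[j] - centroids[i]
            angle = np.degrees(np.arctan2(dy, dx)) % 180.0
            hist[int(angle * bins / 180.0) % bins] += 1
    return (np.argmax(hist) + 0.5) * 180.0 / bins    # peak of the angle histogram
```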

Analysis of neighboring connected components

• Accurate angle estimation:
  1. Find the connected components in the same line (clusters of pairs of connected components with an angle near the rough estimate).
  2. Fit a straight line (using regression) to the centroids of the components in each line.
  3. Make the final estimate from these text lines.

Components of an OCR system

(Recap of the pipeline diagram above; the next slides cover segmentation.)

Page Layout Analysis

• Layout analysis: segmentation of the image into blocks containing the same type of information: text, graphics, table, image, etc.
• Methods:
  – Run-length smearing
  – Analysis of connected components

Page Layout Analysis: Run-length smearing

J.L. Fisher, S.C. Hinds, D.P. D'Amato: A rule-based system for document image segmentation. 10th International Conference on Pattern Recognition, pp. 567-572, 1990.
1. Horizontal run-length smearing.
   • Threshold: the inter-character separation, estimated as the maximum of the histogram of widths of white runs.
   • Example: 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 0 1 0 0 1 0 0 0 1 becomes 1 1 1 0 0 0 0 1 1 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1.
2. Vertical smearing.
   • Threshold: the inter-line separation, estimated during skew correction.
3. Logical AND of both images.
4. Additional horizontal smearing.
5. The connected components of the result are the blocks.
6. Computation of features for each connected component: aspect ratio, black pixel density, Euler number, perimeter length, perimeter-to-width ratio, perimeter-squared-to-area ratio.
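A sketch of the smearing and AND steps (the thresholds are passed in; estimating them from the histograms, as described above, is left out):

```python
import numpy as np

def smear_rows(binary, threshold):
    # Fill white gaps shorter than `threshold` between two black pixels.
    out = binary.astype(np.uint8).copy()
    for row in out:
        black = np.flatnonzero(row)
        for a, b in zip(black[:-1], black[1:]):
            if 1 < b - a <= threshold:
                row[a:b] = 1
    return out

def rlsa(binary, h_thresh, v_thresh):
    horiz = smear_rows(binary, h_thresh)          # step 1: horizontal smearing
    vert = smear_rows(binary.T, v_thresh).T       # step 2: vertical smearing
    return horiz & vert                           # step 3: logical AND
```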


Page Layout Analysis: Run-length smearing

• A set of rules classifies each block as text or non-text, according to the features computed for each connected component.

Page Layout Analysis: Analysis of connected components

A.K. Jain, B. Yu: Document representation and its application to page decomposition. IEEE Trans. on PAMI, vol. 20, nº 3, pp. 294-308, 1998.
1. Detection of connected components.
2. Definition of distance and overlap between components o_i and o_j, with extents [X_l, X_u] × [Y_l, Y_u], width W and height H:

    Dx(o_i, o_j) = max[X_l(o_i), X_l(o_j)] − min[X_u(o_i), X_u(o_j)]
    Dy(o_i, o_j) = max[Y_l(o_i), Y_l(o_j)] − min[Y_u(o_i), Y_u(o_j)]
    Vx(o_i, o_j) = −Dx(o_i, o_j) / min[W(o_i), W(o_j)]
    Vy(o_i, o_j) = −Dy(o_i, o_j) / min[H(o_i), H(o_j)]

3. Grouping of connected components in the same line:
   • Dx below a distance threshold
   • Vy above an overlap threshold

Page Layout Analysis: Analysis of connected components

4. Classification of lines into text and non-text:
   • Text lines:
     – Height below a threshold and horizontally aligned (standard deviation of the bottom edges of the CCs below a threshold)
     – Width over a threshold and all CCs with similar height (ratio between mean and standard deviation less than a threshold)
5. Grouping of text lines into text regions:
   • Vertically close and horizontally overlapped
6. Grouping of non-text lines into non-text regions:
   • Vertically close and horizontally overlapped
   • Horizontally close and vertically overlapped
7. Identification of image regions among the non-text regions:
   • Large regions
   • Large ratio of black pixels
8. Identification of table regions:
   • Detection of horizontal and vertical lines with similar orientations
   • Similar height of the CCs
9. Remaining non-text regions: drawing regions.

Page Layout Analysis: Multiscale analysis

1. Application of wavelets to obtain a multiscale representation.
2. Computation of local features (local moments) from the wavelet representation:

    µn(Wi) = ( (1/|Wi|) · Σ_{x∈Wi} ( f(x) − f̄_{Wi} )ⁿ )^(1/n)

3. Training of a neural network to classify each block as text, image or graphics according to these local features.
4. Propagation of the classification through adjacent blocks and between different scales of the wavelet representation.

Page Layout Analysis: Performance Evaluation

• Goals:
  – Evaluate the performance of several commercial OCR engines on a set of journal pages
  – Use the results of the evaluation to define methods for combining these engines, in order to improve the overall performance
• Therefore, the evaluation must make it possible to determine the strengths and weaknesses of each method.

Page Layout Analysis: Performance Evaluation

[Diagram: the output of segmentation is compared against the ground truth; regions are marked as correctly recognized, misrecognized or unrecognized error zones.]

• 5 types of zones:
  – Text
  – Graphics
  – Table
  – Background
  – Text over image
• Comparison between the ground truth and the output of the engines: the overlap between output zones and ground-truth zones is measured.

Page Layout Analysis: Performance Evaluation

• Evaluation measures: more than 100 measures grouped into six categories:
  – Good recognition measures: percentage of ground-truth area (grouped by zone type) recognized as zones of the same type in the output, i.e. text zones in the ground truth recognized as text zones, graphics zones recognized as graphics zones, etc.
  – Unrecognition measures: relative area of zones of type text, graphics, text-over-image or table in the ground truth recognized as background.
  – Misrecognition measures: zones in the ground truth recognized as a different type, for example a text zone recognized as graphics, text-over-image or table.
  – Overlap measures: relative area (grouped by type) recognized twice by a zoning engine.
  – Split and merge measures: how many zones are recognized and assigned in terms of splitting and merging errors.

Page Layout Analysis: Performance Evaluation

• Experiments:
  – Creation of the ground truth for 100 journal pages
  – Evaluation of six OCR engines
  – Tests with two image formats (TIFF and JPEG) and 4 image resolutions (100 dpi, 200 dpi, 300 dpi and 400 dpi)
  – Evaluation of a simple combination scheme

Page Layout Analysis: Performance Evaluation

[Result slides omitted: example page segmentations, and bar charts of text, graphics and text-over-image correct recognition and misrecognition rates for the engines ABBYY 6, ABBYY 7, PODCORE, PODCORE2, SCANSOFT, XVISION and their combination (COMB), for TIFF and JPEG images at 400, 300, 200 and 100 dpi.]

Page Layout Analysis: Performance Evaluation

• Conclusions:
  – The image format makes no difference in the final results.
  – There is no ideal resolution:
    • Sometimes the results are better at 300 dpi, sometimes at 400 dpi.
    • The results are slightly lower at 200 dpi or 100 dpi.
  – Combination can improve the results, but more advanced combination schemes should be defined.

Text/Graphics Separation

L.A. Fletcher, R. Kasturi: A robust algorithm for text string separation from mixed text/graphics images. IEEE Trans. on PAMI, vol. 10, nº 5, pp. 910-918, 1988.
• Analysis of connected components:
  – Size: characters are smaller than graphic components
  – Aspect ratio (x-y): characters are more "squared" than graphic components
  – Pixel density: characters are denser than graphic components
• Grouping of characters: Hough transform and component proximity
• Difficulties:
  – Joined characters
  – Characters touching lines

Text/Graphics Separation: specific methods

Z. Lu: Detection of text regions from digital engineering drawings. IEEE Trans. on PAMI, vol. 20, nº 4, pp. 431-439, April 1998.
• Detection and removal of lines at given orientations: horizontal, vertical, ±22.5°, ±45°, ±67.5°
  – Detection of long consecutive runs of black pixels after rotating the image to the given orientations
• Analysis of connected components to separate text and graphics

Text/Graphics Separation

• Vertical and horizontal run-length smearing to join components.
• Classification of the final components as text or graphics, based on their density and size.
• Recovery of the original image from the enclosing rectangles of the text components.

Character segmentation

• R.G. Casey, E. Lecolinet: A survey of methods and strategies in character segmentation. IEEE Trans. on PAMI, vol. 18, nº 7, pp. 690-706, 1996.
• Y. Lu: Machine printed character segmentation – an overview. Pattern Recognition, vol. 28, nº 1, pp. 67-80, 1995.
• Y. Lu, M. Shridhar: Character segmentation in handwritten words – an overview. Pattern Recognition, vol. 29, nº 1, pp. 77-96, 1996.

• Segmentation of characters in blocks of text
• Levels of difficulty:
  – Characters with uniform separation and fixed width
  – Well-separated characters with proportional width
  – Broken characters
  – Touching characters
  – Broken and touching characters
  – Cursive script
  – Hand-printed words
  – Handwritten cursive words

Character segmentation

• Relevant features for segmentation:
  – Character width and height
  – Distance between characters
  – Inter-character interval (distance between character centres)
  – Aspect ratio
  – Baseline and top baseline
  – Ascenders and descenders

Character segmentation: classification of methods

• External segmentation:
  – Segmentation before recognition; independent processes
  – The goal is to find the exact location of the character separations
  – Low performance with cursive script, touching characters or handwriting
• Internal segmentation:
  – Based on Sayre's paradox: a letter cannot be segmented without being recognized, and cannot be recognized without being segmented
  – Segmentation and recognition are done at the same time
  – Recognition generates or validates segmentation hypotheses
• Holistic methods:
  – No character segmentation
  – Recognition tries to recognize words without recognizing individual characters

Character segmentation: classification of methods

[Taxonomy of segmentation methods:]
• Analytical methods:
  – External segmentation (based on the image): dissection, post-processing, oversegmentation into graphemes
  – Internal segmentation (based on recognition): windowing; feature-based methods (Hidden Markov Models and non-Markov methods)
  – Hybrid methods
• Holistic methods: dynamic programming, Markov models

External segmentation

• Image decomposition into sub-images using general features
• Each sub-image corresponds to a possible character
• Combination of several methods:
  – Connected component labelling
  – Run-length smearing
  – Projections and X-Y tree decomposition
  – Analysis of contours
  – Analysis of profiles

External segmentation: Connected component labelling

[Figure: two-pass connected component labelling. Pixels of the original image are labelled according to their already-visited neighbours; at the end of the labelling, the equivalent label classes are unified.]

External segmentation: Projections and X-Y trees

If text lines are perfectly separated, only one vertical projection is required. Otherwise, it is necessary to apply several vertical and horizontal projections

[Figure: X-Y tree. A horizontal projection splits the page into blocks 1 and 2; vertical projections split them into sub-blocks 1.1 ... 1.10 and 2.1 ... 2.4; a further horizontal projection splits 2.4 into 2.4.1 and 2.4.2.]

External segmentation: projections

It is more robust to use the second derivative of the projection, normalized by the value at each point of the histogram (it enhances the projection minima):

    F(x) = ( V(x−1) − 2·V(x) + V(x+1) ) / V(x)


External segmentation: Run-length smearing

• Run-lengths: sequences of consecutive pixels with the same colour in a row or column.
• Smearing: inversion of the runs whose length is below a certain threshold.

[Figure: horizontal smearing and vertical smearing of the same image are combined with a logical AND.]

External segmentation: analysis of profiles

[Figure: upper and lower contour profiles of a word.]

• Determination of the point of separation between characters:
  – Follow the profile from a minimum up to the following maximum
  – Distance between the upper and lower profiles

External segmentation: post-process

• Problem: broken and touching characters
  – Analysis of the bounding boxes: definition of several rules that allow components to be joined or broken properly, based on:
    • Estimated character size (width and height)
    • Number of estimated characters
    • Component aspect ratio
    • Proximity/overlap of bounding boxes

External segmentation: post-process

M. Shridhar, A. Badreldin: Recognition of isolated and simply connected handwritten numerals. Pattern Recognition, vol. 19, nº 1, pp. 1-12, 1986.
• "Hit and deflect" strategy:
  – Starting point: maximum of the lower profile / minimum of the upper profile
  – Contour following:
    • Vertical scan to find a contour point
    • Move the scan point according to the value of the neighbouring pixels:
      – To the right/left, if one pixel belongs to the character and the other does not
      – Up, if both neighbouring pixels belong to the character

External segmentation: oversegmentation

• Segmentation of the image into sub-images that do not necessarily correspond to individual characters.
• Analysis of the contour minima and maxima:
  R. Bozinovic, S. Srihari: Off-line cursive script word recognition. IEEE Trans. on PAMI, vol. 11, nº 1, 1989.
  – Detection of significant contour minima:
    • Cut points
    • Lower extreme of a character
  – Generation of possible cut points:
    • For each contour minimum, search to the left and to the right for points that correspond to a single vertical run with low density
  – Compaction of nearby cut points

Character segmentation: classification of methods

(See the taxonomy of segmentation methods above.)

Internal segmentation

• Segmentation and recognition at the same time
• Two approaches:
  – Windowing:
    • Sequential scan of the image from left to right
    • Generation of segmentation hypotheses
    • Detection of the cut points with the best recognition performance (verification step)
  – Based on image features:
    • Feature detection
    • Generation of possible correspondences between features and letters
    • Search for the best possible combination among all correspondences
    • Two types of methods: Markov-based and non-Markov-based

Internal segmentation: windowing

R.G. Casey, G. Nagy: Recursive segmentation and classification of composite patterns. 6th International Conference on Pattern Recognition, pp. 349-451, 1986.
• A mobile window is used to generate possible segmentation sequences.
• Each possible segmentation is validated by the recognition process.
• Search for a segmentation sequence that yields a valid final result.

Internal segmentation: windowing

C.J.C. Burges, J.I. Be, C.R. Nohl: Recognition of handwritten cursive postal words using neural networks. Proc. USPS 5th Advanced Technology Conference, p. A-117, 1992.
• Shortest-path segmentation:
  – Representation of all segmentation possibilities with a graph:
    • Nodes: all possible combinations of pre-segmented zones
    • Edges: neighbour compatibility between pre-segmented zones
  – Using a neural network, each node is assigned a recognized character, with a measure of confidence (distance)
  – Finding the shortest path in the graph is equivalent to finding the best possible segmentation

Internal segmentation: Feature-based

J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995.
• Graph-based:
  – Representation of the image skeleton with a graph
  – Subgraph matching to find possible correspondences with the character prototypes
  – Creation of a network:
    • Nodes: recognized prototypes, labelled with the matching cost
    • Edges: adjacency relationships among nodes
  – Recognition: search for the optimal path in the network

[Figure: network of candidate characters for a word image.]

Components of an OCR system

(Recap of the pipeline diagram above; the next slides cover character pre-processing.)

Character pre-processing

Some usual pre-processing operations in OCR:

• Filtering: noise reduction
• Thinning
• Binarization
• Normalization:
  – Reduce character variability
  – Convert the character to a normal shape:
    • Orientation
    • Slant
    • Size
    • Stroke thickness

Normalization

Normalization applies inverse transforms to reduce the intra-class variance. The most usual normalization transforms are:

• Rotation: rotated scans, text in graphic documents
• Slant: cursive fonts or handwriting
• Stroke thickness: bold fonts or very thin strokes, handwriting with different pen thickness
• Size: titles, footnotes, handwriting

Normalization

Normalization can be expressed as an affine transform:

    (x')   (a11  a12)   (x)   (tx)
    (y') = (a21  a22) · (y) + (ty)

Type          Parameters                                    Meaning
Translation   a11 = a22 = 1, a12 = a21 = 0                  tx, ty: translation factors
Scaling       a11 = sx, a22 = sy, a12 = a21 = 0             sx, sy: scaling factors
Rotation      a11 = cos α, a12 = −sin α,                    α: rotation angle
              a21 = sin α, a22 = cos α
Slant         a11 = 1, a12 = tan β, a21 = 0, a22 = 1        β: slant angle

Normalization

• Rotation:
  – To determine the baseline of a set of aligned characters: projection analysis, Hough transform
  – To determine the orientation of a single character: inertia axes (second-order moments)
• Slant:
  – Approximation of the slant angle from the regression line of the character pixels
  – Approximation from the orientation of the "vertical" segments in a word

Normalization

• Size:
  – Normalization to a standard size
  – Normalization of the relation between the x size and the y size
  – Usually done from the bounding box of the character
  – Some pixel resampling is required: interpolation to avoid aliasing
• Stroke thickness:
  – Not straightforward to determine, because of the variability of the character itself
  – Approximation of the stroke thickness:
    • It is assumed that the character stroke has length l and width w
    • The width is estimated from the area and perimeter of the character
  – Then, morphological operations (dilation and erosion) are used to normalize the character to a standard width

Non-linear normalization

• Optical distortion:
  – Parameter estimation with a least-squares criterion, using a calibration image:

      (x')        (x)        ( (x² + y²)·x )
      (y') = Cm · (y) + Cd · ( (x² + y²)·y )

  – Cm: magnification coefficient
  – Cd: distortion coefficient

Components of an OCR system

(Recap of the pipeline diagram above; the next slides cover feature extraction.)

Feature-based Recognition

• The selection of the feature extraction method is probably the most important factor in achieving good recognition performance.

Which are the best features to discriminate between the characters?

Feature Extraction

To extract from the image the most relevant information for classification, i.e., to minimize intra-class variability while maximizing inter-class variability

• Selection of appropriate features:
  – It is a critical decision
  – It depends on the specific application
  – Features must be invariant to the character variations (depending on the application): rotation, degradation, noise, shape distortion
  – Low dimensionality, to avoid large learning sets
  – Features determine the type of information to work with: gray-level image, binary image, character contour, vectorization of the skeleton, etc.
  – Features also determine the type of classifier

Feature Extraction

O.D. Trier, A.K. Jain, T. Taxt: Feature extraction methods for character recognition – a survey. Pattern Recognition, vol. 29, nº 4, pp. 641-662, 1996.
• Image-based features:
  – Projections
  – Profiles
  – Crossings
• Statistical features:
  – Moments
  – Zoning
• Global transforms and series expansions:
  – Karhunen-Loeve
  – Fourier descriptors
• Topological and geometric features; structural analysis:
  – Contour analysis
  – Skeleton analysis
  – Topological and geometric features

Image-based Features

• The whole image as feature vector:
  – Classification by correlation
  – Very sensitive to noise, character distortion and similarity between classes
• x and/or y projections:
  – The accumulated projection can be used too
  – Sensitive to rotation, distortion and a large number of characters
• Peephole:
  – Coding some pre-selected pixels of the image as a binary number (e.g. A → 010111011)
  – The pre-selected pixels can vary depending on the character to be recognized

Image-based Features

F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. Pattern Recognition, vol. 24, nº 10, pp. 969-983, 1991.
• Contour profiles:
  – Left (right) profile: minimum (maximum) contour x at every y value
  – Lower (upper) profile: minimum (maximum) contour y at every x value
  – Features:
    • Profile values
    • Differences between consecutive profile values
    • Maxima and minima of the profile
    • Maxima of the differences between profile values

Image-based Features

• Crossing method:
  – Features are computed from the number of times the character is crossed by vectors along some orientations, for example 0°, 45°, 90°, 135°
  – Used in commercial systems because of its speed and low complexity
  – Robust to some distortions and noise
  – Sensitive to size variations
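A sketch of the axis-aligned crossing counts (the diagonal scans at 45° and 135° are omitted; the function name is ours):

```python
import numpy as np

def crossing_counts(binary):
    b = binary.astype(np.int8)
    rows = (np.diff(b, axis=1) == 1).sum(axis=1)   # 0 -> 1 transitions per row
    cols = (np.diff(b, axis=0) == 1).sum(axis=0)   # 0 -> 1 transitions per column
    return rows, cols
```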


Statistical Features

• Methods based on the statistical distribution of pixels in the image:
  – Geometric moments
  – Zoning
• Features are robust to distortion and, up to a certain extent, to some style variations
• Low computation time and easy to implement
• A learning step is needed to infer the character models

Statistical Features: Zoning

• The image is divided into n × m cells.
• For each cell, the mean of the gray levels is computed; all these values are joined in a feature vector of length n·m.
• Information from the contour, or any other feature computed in every zone, can be used as well; for example, a histogram of contour orientations per zone:

    orientation   count
    0°            1
    45°           5
    90°           2
    135°          0

F. Kimura, M. Shridhar: Handwritten numeral recognition based on multiple algorithms. Pattern Recognition, vol. 24, nº 10, pp. 969-983, 1991.

Statistical Features: Geometric Moments

M. Hu: Visual pattern recognition by moment invariants. IRE Trans. on Information Theory, vol. 8, pp. 179-187, 1962.
• Moments of order (p+q) of image f:

    m_pq = Σ_{x=1..N} Σ_{y=1..N} f(x, y)·xᵖ·yᑫ

  – m00: the character area (in binary images)
  – Center of gravity of the character: x̄ = m10/m00, ȳ = m01/m00
• Central moments (centering the character at the center of gravity):

    µ_pq = Σ_{x=1..N} Σ_{y=1..N} f(x, y)·(x − x̄)ᵖ·(y − ȳ)ᑫ

  – The central moments of order 2 (µ20, µ02, µ11) permit the computation of:
    • The main inertia axes
    • The character length
    • The character orientation:  θ = (1/2)·atan( 2µ11 / (µ20 − µ02) )

Statistical Features: Geometric Moments

• Invariant moments:
  – The central moments µ_pq are translation-invariant
  – Scale invariants:

      ν_pq = µ_pq / µ00^(1+(p+q)/2),   p + q ≥ 2

  – Rotation invariants (order 2):

      φ1 = ν20 + ν02
      φ2 = (ν20 − ν02)² + 4·ν11²

  – Invariants to general linear transforms:

      I1 = µ20·µ02 − µ11²
      I2 = (µ30·µ03 − µ21·µ12)² − 4·(µ30·µ12 − µ21²)·(µ21·µ03 − µ12²)
      ψ1 = I1 / µ00⁴

  – A set of moment invariants of different orders can be defined in a similar way.
T.H. Reiss: The revised fundamental theorem of moment invariants. IEEE Trans. on PAMI, vol. 13, nº 8, pp. 830-834, 1991.

Statistical Features: Zernike Moments

• Geometric moments:
  – Projection of the function f(x,y) over the monomials xᵖ·yᑫ (no orthogonality => information redundancy)
• Zernike moments:
  – Change to polar coordinates to achieve orthogonality and rotation invariance
  – Projection of the image over the Zernike polynomials V_nm, which are orthogonal inside the unit circle x² + y² = 1
A. Khotanzad, Y.H. Hong: Invariant image recognition by Zernike moments. IEEE Trans. on PAMI, vol. 12, nº 5, pp. 489-497, 1990.

    V_nm(x, y) = V_nm(ρ, θ) = R_nm(ρ)·e^(jmθ)

  where j = √−1, θ = atan(y/x), ρ = √(x² + y²), n ≥ 0, |m| ≤ n, n − |m| is even, and

    R_nm(ρ) = Σ_{s=0..(n−|m|)/2} (−1)ˢ·(n − s)! / [ s!·((n+|m|)/2 − s)!·((n−|m|)/2 − s)! ]·ρ^(n−2s)

Statistical Features: Zernike Moments

• The image is projected over the Zernike polynomials:

    f(x, y) = Σ_n Σ_m A_nm·V_nm(ρ, θ)

• The A_nm coefficients are the Zernike moments of order n and repetition m:

    A_nm = (n+1)/π · Σ_x Σ_y f(x, y)·[V_nm(ρ, θ)]*

  where x² + y² ≤ 1 (the image must be re-scaled to the unit circle) and * is the complex conjugate.
• |A_nm| is rotation-invariant.
• Relation between Zernike moments and geometric moments:

    A00 = µ00/π
    A11 = A1,−1 = 0
    A22 = 3·(µ02 − µ20 − 2jµ11)/π
    A20 = 3·[2·(µ20 + µ02) − 1]/π

Statistical Features: Zernike Moments

[Figure. Rows 1 and 2: Zernike moments of order 1-13, displayed using

    I_n(x, y) = Σ_{m, |m|≤n, n−|m| even} A_nm·V_nm(x, y),   on x² + y² ≤ 1.

Rows 3 and 4: image reconstruction from the Zernike moments of order 1-13.]
• To reconstruct the image from the Zernike moments:

    f(x, y) = lim_{N→∞} Σ_{n=0..N} Σ_{m, |m|≤n, n−|m| even} A_nm·V_nm(x, y)

• Orders 1 and 2, for example, represent orientation, height and width.

Statistical Features: Zernike Moments

• Image reconstruction using moments up to order 10 (66 moments).


Statistical Features: Zernike Pseudo-Moments

• Less noise-sensitive than Zernike moments
• Better recognition results

    R_nm(ρ) = Σ_{s=0..n−|m|} (−1)ˢ·(2n + 1 − s)! / [ s!·(n + |m| + 1 − s)!·(n − |m| − s)! ]·ρ^(n−s)

  – The constraint that n − |m| be even is removed

Statistical Features: Invariant Moments

• Experiments show that a robust OCR system needs at least 10-15 features, i.e., we need to define between 10 and 15 geometric invariants.
• Handwritten digit recognition:
  – Moments up to order 6:
    • Regular moments (24 moments): 94%
    • Zernike moments (23 moments): 95%
    • Zernike pseudo-moments (44 moments): 91.5%
  – Moments of higher orders: decrease in recognition performance
S.O. Belkasim, M. Shridhar, M. Ahmadi: Pattern recognition with moment invariants: a comparative study and new results. Pattern Recognition, vol. 24, nº 12, pp. 1117-1138, 1991.

Transform-based features

• Instead of computing the feature vector directly from the character image, a linear transform is applied to compute the features:

    g = T·f

  where T is a matrix of constant values.
• These transforms help to reduce the dimensionality of the feature vector, preserving the most relevant information about the shape of the character.
• The original image can be reconstructed from the feature vector.
• Features are invariant to some global deformations, such as translation and rotation.
• High computational cost.
• Some examples:
  – Karhunen-Loeve expansion
  – Fourier series

Transform-based features: Karhunen-Loeve Expansion

• The KL transform is defined as:

    g = Tᵗ·(f − f̄)

  where f is the feature vector of the image and f̄ is the mean of all the samples representing the character:

    f̄ = (1/N)·Σ_{i=1..N} f_i

• Each column of T is an eigenvector of the covariance matrix:

    C = (1/N)·Σ_{i=1..N} (f_i − f̄)·(f_i − f̄)ᵗ

• Usually, only M eigenvectors (fewer than the dimension of f), those with the largest eigenvalues, are used.

Transform-based features: Karhunen-Loeve Expansion

• For each image, we get a feature vector x of dimension d.
• The learning set is composed of n samples per class.
• The set of samples is represented by the matrix X, of dimension n×d.
• For each class, we can compute the covariance matrix R:

    R = (1/(n−1))·Xᵀ·X = (1/(n−1))·Σ_i (x_i − x̄)·(x_i − x̄)ᵀ

• Then, the transform matrix T is defined as:

    T = [v1 ... vd]

  where v1 ... vd are the eigenvectors of the covariance matrix R, with eigenvalues λ1 ≥ λ2 ≥ ... ≥ λd.
• The transformation of an image x is:

    y = Tᵀ·x

Transform-based features: Karhunen-Loeve Expansion

• Usually, the dimensionality is reduced: only the m eigenvectors of R with the greatest weight are kept.
• The transform matrix then becomes:

    P = [v1 ... vm]

• For each image x_i, the feature vector y_i is:

    y_i = Pᵀ·x_i

• Usually, m is selected so that the eigenvectors explain some pre-specified percentage of the total variance (typically 0.9 or 0.95).
• The fraction of the variance explained by the m eigenvectors is:

    ( Σ_{i=1..m} λ_i ) / ( Σ_{i=1..d} λ_i )

Transform-based features: Karhunen-Loeve Expansion

M. D. Garris, J.L. Blue, G. T. Candela, P. J. Grother, S. A. Janet, C.L. Wilson: NIST Form-based handprint recognition system (release 2.0). Technical report NISTIR 5959, National Institute of Standards and Technology, USA, 1994.

• Application of the KL transform to the NIST database:
  – Digit recognition: 96% - 97%
  – Uppercase recognition: 89% - 90%
  – Lowercase recognition: 77% - 82%

Transform-based features: Fourier Descriptors

• Decomposition as a Fourier series of a periodic function of period T:

    f(t) = Σ_{n=−∞..∞} C_n·e^(i2πnt/T),      C_n = (1/T)·∫₀ᵀ f(t)·e^(−i2πnt/T) dt

  or, in trigonometric form,

    f(t) = A0 + Σ_{n=1..∞} ( a_n·cos(2πnt/T) + b_n·sin(2πnt/T) )

    A0 = (1/T)·∫₀ᵀ f(t) dt
    a_n = (2/T)·∫₀ᵀ f(t)·cos(2πnt/T) dt
    b_n = (2/T)·∫₀ᵀ f(t)·sin(2πnt/T) dt

Transform-based features: Fourier Descriptors

C.T. Zahn, R.Z. Roskies: Fourier descriptors for plane closed curves. IEEE Trans. on Computers, vol. C-21, nº 3, pp. 269-281, 1972.
The shape contour can be described in complex coordinates:

    z(s) = x(s) + i·y(s)

The contour can be described as a function of the tangent angle:

    x(s) = x(0) + ∫₀ˢ cos[θ(α)] dα
    y(s) = y(0) + ∫₀ˢ sin[θ(α)] dα

Defining the accumulated tangent angle

    Φ(l) = θ(l) − θ(0)

this function is normalized to the range [0, 2π]:

    Φ*(t) = Φ(L·t/2π) + t

Finally, this function can be decomposed into Fourier descriptors as:

    Φ*(t) = a0 + Σ_{k=1..∞} ( a_k·cos kt + b_k·sin kt )

• In the discrete case:

    z(s_j) = x(s_j) + i·y(s_j)
    ∆Φ(s_j) = tan⁻¹[ (y(s_j) − y(s_{j−1})) / (x(s_j) − x(s_{j−1})) ]
    Φ(s_j) = Σ_{k=1..j} ∆Φ(s_k)

• Then:

    a0 = −π − (1/L)·Σ_{k=1..m} l_k·∆Φ(k)
    a_n = (−1/nπ)·Σ_{k=1..m} ∆Φ(k)·sin(2πn·l_k/L)
    b_n = (1/nπ)·Σ_{k=1..m} ∆Φ(k)·cos(2πn·l_k/L)

• Fourier descriptors depend on the starting point.
• For 64×64 images, it has been shown that 5 coefficients are enough to discriminate between "2" and "Z".

Transform-based features: Fourier Descriptors

F.P. Kuhl, C.R. Giardina: Elliptic Fourier features of a closed contour. Computer Vision, Graphics and Image Processing, vol. 18, pp. 236-258, 1982.
Elliptic Fourier descriptors:

    x̂(t) = A0 + Σ_{n=1..N} ( a_n·cos(2nπt/T) + b_n·sin(2nπt/T) )
    ŷ(t) = C0 + Σ_{n=1..N} ( c_n·cos(2nπt/T) + d_n·sin(2nπt/T) )

where T is the contour length and x̂(t) ≡ x(t), ŷ(t) ≡ y(t) when N → ∞. In the discrete case:

    a_n = T/(2n²π²)·Σ_{i=1..m} (∆x_i/∆t_i)·[cos φ_i − cos φ_{i−1}]
    b_n = T/(2n²π²)·Σ_{i=1..m} (∆x_i/∆t_i)·[sin φ_i − sin φ_{i−1}]
    c_n = T/(2n²π²)·Σ_{i=1..m} (∆y_i/∆t_i)·[cos φ_i − cos φ_{i−1}]
    d_n = T/(2n²π²)·Σ_{i=1..m} (∆y_i/∆t_i)·[sin φ_i − sin φ_{i−1}]

with

    φ_i = 2nπ·t_i/T,   ∆x_i = x_i − x_{i−1},   ∆y_i = y_i − y_{i−1},
    ∆t_i = √(∆x_i² + ∆y_i²),   t_i = Σ_{j=1..i} ∆t_j,   T = t_m,   m = number of contour pixels.

Invariance to the starting point: the phase shift θ1 with respect to the main axis is computed, and the coefficients are rotated according to this angle:

    θ1 = (1/2)·tan⁻¹[ 2(a1·b1 + c1·d1) / (a1² − b1² + c1² − d1²) ]

    [a_n*  b_n*]   [a_n  b_n]   [cos nθ1  −sin nθ1]
    [c_n*  d_n*] = [c_n  d_n] · [sin nθ1   cos nθ1]

Transform-based features: Fourier Descriptors

Rotation invariance: the orientation ψ1 of the main semi-axis is computed and the coefficients are rotated:

    ψ1 = arctan(c1*/a1*)

    [a_n**  b_n**]   [ cos ψ1   sin ψ1]   [a_n*  b_n*]
    [c_n**  d_n**] = [−sin ψ1   cos ψ1] · [c_n*  d_n*]

Scale invariance: the coefficients are divided by the magnitude of the main semi-axis:

    E = √(a1*² + c1*²) = a1**

[Figure: the character 5 reconstructed using elliptic Fourier descriptors of order 1, 2, ..., 10, 15, 20, 30, 40, 50 and 100 respectively.]

Transform-based features: Fourier Descriptors

T. Taxt, J.B. Olafsdottir, M. Daehlen: Recognition of handwritten symbols. Pattern Recognition, vol. 23, nº 11, pp. 1155-1166, 1990.
Evaluation:
• Experiments with handwritten digits (100 images per digit):
  – 12 elliptic descriptors: 99.7%
  – 12 non-elliptic descriptors: 99.5%
• Experiments with digits + lowercase letters:
  – 12 elliptic descriptors: 98.6%
  – 12 non-elliptic descriptors: 90.1%

Structural Analysis

Methods based on the analysis of the character structure from the detection of some features and their relationships (the basic idea is to divide the character into its basic parts) • Contour analysis • Skeleton analysis • Analysis of topological and geometric features


Structural Analysis: Run-length encoding

• It is the simplest structural representation • run-length encoding represents each image row as a sequence of pairs (l,g), where each pair represents a sequence of l consecutive píxels with gray level g.

0 0 4 4 4 2 2 2 2 2 4 4 4 1 1 1 1 1 1 1 1 0 0 0 0 (2,0), (3,4), (5,2), (3,4), (8,1), (4,0)

• For binary images, only the sequence length is required

2, 4, 5, 1, 4, 3, 1

Feature 128 Extraction

Structural Analysis: Run-length encoding

A graph is built on the run-length encoding, where:
• Nodes: run-lengths
• Edges: overlap between runs in consecutive rows

[Figure: two image regions and the graphs built from the column intervals of their runs.]

Structural Analysis: Chain-code

Chain codes (Freeman codes) are the simplest angular approximation. They code each vector d_i between two consecutive points with a code between 0 and 7. The codification of a string S is composed of 3 fields:

• Starting point of the segment: p_o(S) = (x_o, y_o)
• Segment length: l(S)
• A table of directions: Dir(S) = [d_0, d_1, ..., d_{l(S)−1}], where d_i ∈ [0, 7] for all i ∈ [0, l(S)−1], according to the following codification (8-neighbour directions, counterclockwise from east):

    3 2 1
    4 . 0
    5 6 7
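A sketch of chain-coding an 8-connected pixel path (the point list is assumed to be ordered, with y growing upwards; names are ours):

```python
# Freeman code for each unit step (dx, dy)
STEP_TO_CODE = {(1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
                (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7}

def chain_code(points):
    dirs = [STEP_TO_CODE[(x1 - x0, y1 - y0)]
            for (x0, y0), (x1, y1) in zip(points[:-1], points[1:])]
    return {'start': points[0], 'length': len(dirs), 'dirs': dirs}
```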

Structural Analysis: Contour Analysis

• Contour pixels are codified according to their orientations, using the chain-code representation:

    3 2 1
    4 . 0
    5 6 7

  Example chain codes of two contour fragments: 2222122020 and 1000070070.
• Classification can be done with structural recognition methods based on the string-edit distance algorithm by Wagner and Fischer.

Structural Analysis: Skeleton Analysis

• Image thinning and representation of the skeleton with some encoding method that allows the comparison of two shapes (sometimes it is necessary to vectorize the skeleton):
  – Chain codes
  – Graphs
  – Grammars
  – Zoning
  – Discrete features
• Skeleton problems:
  – Noise sensitivity
  – Variability of the representation

Structural Analysis: Skeleton Analysis

• Representation with graphs or grammars:
  – Based on the detection of characteristic skeleton points and a polygonal approximation of the skeleton
  – Two possibilities to represent the skeleton with a graph:
    • Nodes are the characteristic points, while edges are the segments joining the points
    • Nodes are the segments of the polygonal approximation, while edges represent the adjacency relations between the segments

[Figure: the two graph representations of the same character skeleton.]

Structural Analysis: Skeleton Analysis

J. Rocha, T. Pavlidis: Character recognition without segmentation. IEEE Trans. on PAMI, vol. 17, nº 9, pp. 903-909, 1995.

[Figure: example graph representations of word skeletons.]

Structural Analysis: Skeleton Analysis

• Zoning:
  – The skeleton is placed on a 3×3 grid of zones labelled A to I.
  – Option 1: stroke length within each zone.
  – Option 2: coding from the arcs: ArC, ArD, CcF, DrF, DrG, FcI, GrI, where r = straight line and c = curve.
• Discrete features:
  – Number of loops
  – Number of T-joints and X-joints
  – Number of terminal points, corner points and isolated points
  – Crossing points with the horizontal and vertical axes

Structural Analysis: Topological and geometric features

• Aspect ratio x-y. • Perimeter, area, center of gravity • Minimal and maximal distance of the contour to the center of gravity • Number of holes • Euler number = (n. of connected components) - (n. of holes) • Compacity = (perimeter)2 / (4π ·area) • Information about contour curvature • Ascenders and descenders • Concavities and holes • Loops • Unions, terminal points, crossings with horizontal and vertical axes • Angular information: histogram of segment angles


Components of an OCR system

ACQUISITION
↓
DOCUMENT PRE-PROCESSING: filtering, binarization, skew correction
↓
SEGMENTATION: layout analysis, text/graphics separation, character segmentation
↓
CHARACTER PRE-PROCESSING: filtering, normalization
↓
FEATURE EXTRACTION: image-based, statistical, transform-based and structural features
↓
CLASSIFICATION (using models obtained in a learning stage)
↓
POST-PROCESSING (using context information)

Classification

• Different methods, depending on the model of feature representation
• Classification using feature vectors:
– Correlation
– Euclidean distance
– Mahalanobis distance
– k nearest neighbours
– Bayes' classifier
– Neural networks
• Classification with structural features:
– Dichotomic search
– String edit
– Graph matching
– Grammars

Classification using feature vectors

• Correlation
– The image is assigned to the class with the largest correlation value:

$S_i(f) = \frac{\iint_R f(x,y)\, g_i(x,y)\, dx\, dy}{\sqrt{\iint_R f(x,y)^2\, dx\, dy \cdot \iint_R g_i(x,y)^2\, dx\, dy}}$

• Minimal Euclidean distance
– Distance to the mean m_i of the class
– It does not take into account differences in variance between classes:

$D_i(x) = (x - m_i)^T (x - m_i)$

Classification using feature vectors

• Minimal quadratic distance (Mahalanobis).

– For each class i, the mean m_i and covariance matrix S_i are computed from the set of samples
– The covariance matrix is taken into account when computing the distance from an image to class i:

$D_i(x) = (x - m_i)^T S_i^{-1} (x - m_i) = z^T z$, with $z = \Lambda_i^{-1/2} \Psi_i^T (x - m_i)$

where $\Lambda_i$ holds the eigenvalues of $S_i$ and $\Psi_i$ its eigenvectors
– The feature vector of the image x is projected onto the eigenvectors of the class
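A minimal numpy sketch of this classifier (the small ridge added to the covariance is an assumption of this example; in practice S_i may be near-singular and need regularization or a pseudo-inverse):

import numpy as np

def fit_mahalanobis(samples_per_class):
    # samples_per_class: list of (n_i, d) arrays, one per class.
    stats = []
    for X in samples_per_class:
        m = X.mean(axis=0)
        S = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])  # regularized S_i
        stats.append((m, np.linalg.inv(S)))
    return stats

def classify_mahalanobis(x, stats):
    # Class with the minimal quadratic distance D_i(x).
    d = [(x - m) @ S_inv @ (x - m) for m, S_inv in stats]
    return int(np.argmin(d))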


Classification using feature vectors

• k-Nearest Neighbours
– Several sample models for each class
– Given an image, we take the k models nearest to it
– The image is classified into the class with the most elements in the set of k nearest neighbours:

$D_i(x) = |V_x^{(i)}|$

$V_x$: set of the k models closest to x
$V_x^{(i)}$: models in $V_x$ that belong to class i
• Weighted nearest neighbours
– k depends on the image: for each image x, the set $V_x$ contains all the models with a distance lower than α times the distance to the nearest model:

$D_i(x) = \frac{|V_x^{(i)}|}{\sum_{x_j \in V_x^{(i)}} d(x, x_j)^2}$
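A compact numpy sketch of the basic k-NN rule (the prototype matrix `models` and label array `labels` are assumptions of this example):

import numpy as np

def knn_classify(x, models, labels, k=3):
    # models: (n, d) array of stored prototypes; labels: (n,) class ids.
    d2 = ((models - x) ** 2).sum(axis=1)     # squared Euclidean distances
    nearest = np.argsort(d2)[:k]
    classes, votes = np.unique(labels[nearest], return_counts=True)
    return classes[np.argmax(votes)]         # majority class among the k models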

Classification using feature vectors

• Bayes' classifier
$p(x)$: probability of observing the image vector x
$p(x \mid w_i)$: likelihood; probability of observing the image vector x given that it belongs to class i
$p(w_i \mid x)$: posterior probability that x belongs to class i
– An image x is classified into the class i that maximizes the posterior probability $p(w_i \mid x)$
– Applying Bayes' theorem:

$p(w_i \mid x) = \frac{p(x \mid w_i)\, p(w_i)}{p(x)}$

– $p(x)$ is constant and independent of the class i; therefore, it has no influence on classification
– If all classes have the same prior probability $p(w_i)$, we can discard it too. Then:

$\arg\max_i p(w_i \mid x) = \arg\max_i p(x \mid w_i)$

Classification using feature vectors

• If we assume a normal distribution for each class, with mean m_i and covariance S_i estimated from the set of samples of each class:

$p(x \mid w_i) = (2\pi)^{-n/2}\, |S_i|^{-1/2}\, e^{-\frac{1}{2}(x - m_i)^T S_i^{-1} (x - m_i)}$

• Discarding constants and taking logarithms, we obtain the following discriminant function:

$D_i(x) = -\log |S_i| - (x - m_i)^T S_i^{-1} (x - m_i)$
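The discriminant translates directly into code; a minimal numpy sketch (classify into the class with the largest value):

import numpy as np

def gaussian_discriminant(x, m, S):
    # D_i(x) = -log|S_i| - (x - m_i)^T S_i^{-1} (x - m_i)
    diff = x - m
    _, logdet = np.linalg.slogdet(S)          # stable log-determinant
    return -logdet - diff @ np.linalg.solve(S, diff)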


Classification using feature vectors

• Neural networks – They can be applied directly to the image or to a feature vector, previously computed from the image – The most used neural networks are multi-layer feed-forward networks

Y. LeCun et al.: Backpropagation applied to handwritten zip code recognition. Neural Computation, vol. 1, pp. 541-551, 1989. Each layer represents a feature subvector of a higher level.

Classification using feature vectors

• A neural network is organized into several layers. Each layer has a fixed number of nodes • In the first layer, the nodes correspond to the values in the feature vector • In the last layer, the nodes correspond to each of the classes • Intermediate (hidden) layers represent feature subvectors of a higher level • The value at each node in a given layer is computed through the application of a propagation function to the values of the nodes in the previous layer, weighted by a vector of weights

$x_j^k = \sigma\left(b_j^k + \sum_{i=1}^{N(k-1)} w_{ji}^k\, x_i^{k-1}\right)$

$x_i^k$: value of node i at layer k
$N(k)$: number of nodes of layer k
$b_j^k$: bias of node j at layer k
$w_{ji}^k$: weight of the connection between node j of layer k and node i of layer k-1

Classification using feature vectors

• To design a neural network, we have to decide:
– The number of layers
– The number of nodes at each layer
• Learning step, using a set of samples:
– The weights of each connection are automatically determined so as to minimize the classification error
• Example of neural network:
– Three-layer perceptron (the input accounts for one layer). The final discriminant function for each class is:

$D_i(x) = f\left(b_i^{(2)} + \sum_{j=1}^{N(1)} w_{ij}^{(2)}\, f\left(b_j^{(1)} + \sum_{k=1}^{N(0)} w_{jk}^{(1)} x_k\right)\right)$, with the sigmoid $f(x) = 1/(1 + e^{-x})$
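A minimal numpy sketch of the forward pass of such a network (the random weights are a stand-in for learned ones; the layer sizes are invented for the example):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, layers):
    # layers: list of (W, b) pairs; W has shape (n_out, n_in).
    for W, b in layers:
        x = sigmoid(b + W @ x)
    return x

rng = np.random.default_rng(0)
layers = [(rng.normal(size=(3, 4)), np.zeros(3)),   # hidden layer
          (rng.normal(size=(2, 3)), np.zeros(2))]   # output: one node per class
print(mlp_forward(rng.normal(size=4), layers))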


Classification using structural features

• Dichotomic search. The presence or absence of certain primitives is tested in several steps. Character models can be organized with decision trees
• String or graph matching. Application of string edit algorithms or matching in attributed graphs
• Grammars. We test whether the character belongs to the language generated by the grammar that represents the model

Classification using Deformable models

• Deformable models
– We start with an ideal representation of the shape of the object
– This ideal shape is deformed using a set of rules or pre-defined operations, in such a way that all possible valid distortions of the object can be generated
– Given an image, we look for the deformation of the object that yields the best result for an energy function defined as the combination of two measures:
• Internal energy: it measures the degree of deformation with respect to the model of the object
• External energy: it measures the degree of similarity between the deformation and the image
– The image is classified into the model with the lowest global energy

Classification using Deformable models

• Based on a character prototype:
– We define a prototype of the character that can be deformed by applying a series of trigonometric transforms over the image space
– Basis of the transform:

$e_{mn}^x = (2\sin(\pi n x)\cos(\pi m y),\ 0)$
$e_{mn}^y = (0,\ 2\cos(\pi m x)\sin(\pi n y))$

– External energy based on the distance of the deformation to the image contour
– Bayesian combination of internal and external energy

Classification using Deformable models

• Based on a set of point generators located on the surface of a spline: – Internal energy: • The character is represented by a spline. We can modify the control points of the spline • A probability is generated based on the modification of these control points – External energy: • Image points are generated from generators located along the spline • A probability is defined based on the distance from image pixels to point generators – Minimization: • Probabilistic combination • EM algorithm


Classification using Deformable models

• Point distribution model:
– The model is represented as the mean of a set of points obtained from the skeletons of the learning samples
– PCA is applied to obtain the set of valid character deformations from the learning set:

$x = \bar{x} + P b$

– For each image, we get the nearest deformation according to the space defined by PCA:

$b_x \approx P^T (x - \bar{x})$

– Internal energy: $\|b_x\|$
– External energy: distance between the image and the image obtained using the nearest deformation
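An illustrative numpy sketch of building the PCA deformation space and projecting a shape onto it (assumption: shapes are flattened, pre-aligned point sets of equal length):

import numpy as np

def pdm_fit(samples):
    # samples: (n, 2m) array of flattened skeleton point sets.
    mean = samples.mean(axis=0)
    # Rows of Vt are the eigenvectors of the sample covariance matrix.
    _, _, Vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, Vt.T                     # columns of P are deformation modes

def pdm_project(x, mean, P, n_modes=5):
    # Deformation parameters b_x = P^T (x - mean), truncated to n_modes.
    return P[:, :n_modes].T @ (x - mean)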


Holistic methods

• Used for recognition of handwritten script
• Each word is a recognition unit: we try to recognize each word using its global features. Each word is a different class
• Based on psychological evidence
• Applications:
– Constrained domains with only a few words (bank applications, personal agendas, etc.)
– Filtering of large domains to reduce the set of possible words
• Usually based on the application of HMMs or dynamic programming (string edit) to whole words (not to letters)

Holistic methods

• Global word features:
– Distribution of segment orientations (horizontal, vertical, diagonal)
– Terminal points
– Concavities
– Holes
– Loops
– Word length
– Ascenders and descenders
– Crossing points with the central line
– Fourier coefficients
• Feature representation:
– Feature vectors or matrices: the word is divided into zones; each zone corresponds to a component of the feature vector, where the presence or absence of the features is tested
– Graphs: adjacency or neighbouring relations between the detected features

Hidden Markov Models

• Hidden Markov Model
– It represents a double stochastic process where a Markov chain is not directly observable: it can only be inferred from the observation of another stochastic process
– In an HMM, a sequence of observations $O = (o_1, \ldots, o_T)$ is produced by a sequence of states $Q = (q_1, \ldots, q_T)$
– An HMM is modelled by:
$V = \{v_k;\ 1 \le k \le M\}$: set of observable symbols
$S = \{s_i;\ 1 \le i \le N\}$: set of states
$\Pi = \{\pi_i\},\ \pi_i = P(q_1 = s_i)$: initial probability of each state
$A = \{a_{ij}\},\ a_{ij} = P(q_{t+1} = s_j \mid q_t = s_i)$: transition probabilities between states
$\Gamma = \{\gamma_j\},\ \gamma_j = P(q_T = s_j)$: probability of each final state
$B = \{b_j(k)\},\ b_j(k) = P(o_t = v_k \mid q_t = s_j)$: probability of observing a symbol in a given state

Hidden Markov Models

Example: a Markov model of the weather

States: S = {rainy, sunny}
Set of observations (possible values of humidity): V = {0%, 25%, 50%, 75%, 100%}
Sequence of observations: O = {0%, 25%, 25%, 50%, 25%, 75%, 50%}
(The original slide shows the two-state diagram with the transition probabilities between sunny and rainy and, for each state, the emission probabilities of the five humidity values.)
With the model we can:
• Compute the probability that the sequence of observations is generated by the weather model
• Given a set of humidity observations, estimate the model of the weather
• Decide whether the weather was rainy or sunny on each of the observed days

Hidden Markov Models

• Three problems related to an HMM:
– Evaluation problem: given a sequence of observations O and a model λ = (V, S, Π, A, B, Γ), find P(O | λ), the probability that the sequence of observations is generated by the model
• Computed by summing the probabilities of the sequence of observations over all possible sequences of states
• Forward/backward propagation methods
– Learning problem: given a learning set O, find the parameters of the model λ that maximize P(O | λ)
• Baum-Welch algorithm
– Recognition problem: given a sequence of observations O and a model λ = (V, S, Π, A, B, Γ), find the optimal sequence of states Q
• Viterbi algorithm (see the sketch after this list)
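As an illustrative sketch of the recognition problem, a log-domain Viterbi decoder in numpy (assumes strictly positive probabilities; otherwise add a small epsilon before taking logarithms):

import numpy as np

def viterbi(obs, pi, A, B):
    # Most likely state sequence for a list of observation indices `obs`.
    # pi: (N,) initial probabilities; A: (N, N) transitions; B: (N, M) emissions.
    N, T = len(pi), len(obs)
    delta = np.log(pi) + np.log(B[:, obs[0]])   # best log-prob ending in each state
    psi = np.zeros((T, N), dtype=int)           # best-predecessor pointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(A)     # scores[i, j]: best path i -> j
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + np.log(B[:, obs[t]])
    path = [int(delta.argmax())]                # backtrack from the best final state
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]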


Hidden Markov Models. OCR applications

1. External segmentation with context post-processing:
• The Markov model represents grapheme variations, based on bigram or trigram frequencies: the probabilities of finding sequences of consecutive letters in the dictionary of the language
2. Internal segmentation:
• The Markov chain represents character features extracted from left to right along the text. These features are compared with the model and, with the recognition, it can be decided where a character ends and the next one begins
3. Holistic methods:
• The Markov chain represents variations within a given word, belonging to a lexicon of the valid words

Hidden Markov Models. OCR applications

External segmentation with context post-processing
(The original slide shows a state diagram with one state per letter — a, m, n, ... — and transition probabilities such as p_am, p_mn, p_ma between them.)
• One state per letter
• Transition probabilities are the probabilities of finding two consecutive letters in the language
• The observations are the pre-segmented zones of the image
• The probability of an observation given a state is the probability that the pre-segmented zone corresponds to the letter associated with the state

Hidden Markov Models. OCR applications

Internal segmentation

• Model-discriminant HMM:
– An HMM for each letter
– Recognition is done letter by letter
– States within an HMM represent zones within the letter with different feature values
– Word recognition is done by finding the best combination of individual HMMs over all segmentation possibilities

Hidden Markov Models. OCR applications

Internal segmentation
• Path-discriminant HMM
– One single HMM
– Each state corresponds to a letter
– Transitions between states correspond to the probability of changing from one letter to another within a word
– Over-segmentation of the image: each observation corresponds to a set of features extracted from each segment
– The probability of each observation for a given state is the probability of generating the set of features from the given letter
– Recognition looks for the sequence of states (letters) that best generates the set of features extracted from the image
(The original slide shows the single HMM whose states a, b, c, ... are the letters.)

Hidden Markov Models. OCR applications

Holistic methods
• Example: HMM for word recognition
– One HMM for each word
– Each letter is represented with four states; each state is a possible result of a previous over-segmentation
– Recognition looks for the HMM (word) that maximizes the probability of generating the sequence of observations (features extracted from the image)
(The original slide shows a word HMM as a chain of letter models, e.g. c → a → ..., with four states per letter.)

Hidden Markov Models. OCR applications

S. Günter, H. Bunke: HMM-based handwritten word recognition: on the optimization of the number of states, training iterations and Gaussian components. Pattern Recognition, vol. 37, pp. 2069-2079, 2004.
• Feature extraction: computation of 9 features inside a sliding window:
– Number of pixels
– Center of gravity
– Second-order moments
– Location of the upper and lower contours
– Orientation of the upper and lower contours
– Number of black-white transitions in the vertical direction
– Number of black pixels between the upper and lower contours
• One HMM for each character
• HMMs are concatenated to compose words; recognition finds the combination of HMMs with the highest probability
• Results:
– Vocabulary: 2296 words
– Test set: 3850 words, 80 writers
– Recognition: 82.05%

Components of an OCR system

ACQUISITION
↓
DOCUMENT PRE-PROCESSING: filtering, binarization, skew correction
↓
SEGMENTATION: layout analysis, text/graphics separation, character segmentation
↓
CHARACTER PRE-PROCESSING: filtering, normalization
↓
FEATURE EXTRACTION: image-based, statistical, transform-based and structural features
↓
CLASSIFICATION (using models obtained in a learning stage)
↓
POST-PROCESSING (using context information)

Post-process

• Post-processing to improve OCR results:
– Voting: combination of several classifiers
– Utilization of context information: analysis of the classification of individual characters in the context of the adjacent characters

Post-process. Classifier Combination

Combination of specific classifiers with good performance for some characters, fonts, etc.
• Integrity: how the voting algorithm controls the activation and configuration of the individual classifiers. High integrity means the voting algorithm decides the best classifiers in each situation
• Representation of the classification results:
– Abstract: each classifier simply gives the label of the class
– Ranked: each classifier gives several ranked labels
– Ranked with a degree of confidence: for each label, the classifier gives a level of confidence
• Combination of the classifier results: how to combine the individual outputs
(The original slide shows the architecture: the text image is fed to classifiers 1..k, whose individual classifications are combined by a knowledge-driven voting stage into the final output.)

Post-process. Classifier Combination

(Example from the original slide: for an ambiguous character image, the individual classifiers return c17 = 's', c21 = 'S', c4 = 'g', c9 = '8', and the voting stage must decide the final label.)

Post-process. Context information

(Example from the original slide: the same shape can be read as the number "13" or the letter "B"; only context can decide.)
• Context analysis tries to correct errors produced by decisions taken on the basis of local features
• In the presence of uncertainty, the hypotheses generated by the local classifier are complemented with the hypotheses of neighbouring characters
• Two points of view:
– Geometric context (typographic)
– Linguistic context

Post-process. Context information

• Methods based on n-grams (combinations of n letters; see the sketch after this list):
– Probability that an n-gram appears in the words of the dictionary
– When there are characters with uncertainty, the final decision is taken according to the n-gram with the highest probability
– Bottom-up techniques
– Viterbi algorithm and Markov methods
• Methods based on grammars:
– A grammar is used to validate the results of the OCR
– Similar to n-grams, but they permit variable-length strings and recursion
(Example from the original slide, with Catalan number words: the ambiguous reading "vint-cents" / "vuit-cents" is resolved to "vuit-cents" because only the latter is generated by the rule Xifra → Desena '-' Unitat | Unitat '-' Centena.)
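Referring back to the n-gram item above, a toy sketch of a bigram-based decision between candidate letters (the bigram table and its values are invented for this example):

import math

# Toy bigram probabilities, assumed to be learned from a dictionary.
BIGRAM = {('t', 'h'): 0.05, ('t', 'n'): 0.001, ('h', 'e'): 0.04, ('n', 'e'): 0.006}

def best_candidate(prev_char, candidates, next_char):
    # Pick the candidate whose bigrams with its neighbours are most probable.
    def score(c):
        left = BIGRAM.get((prev_char, c), 1e-6)
        right = BIGRAM.get((c, next_char), 1e-6)
        return math.log(left) + math.log(right)
    return max(candidates, key=score)

# The OCR hesitates between 'h' and 'n' in 't?e':
print(best_candidate('t', ['h', 'n'], 'e'))  # 'h'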

Post-process. Context information

• Methods based on a dictionary:
– Creation of a dictionary with the set of correct words
– It permits orthographic correction of the text
– Utilization of string-edit algorithms
– Problem: words not included in the dictionary
– Requires structures that represent the dictionary and provide quick access, e.g.:
• An automaton for the dictionary (the original slide shows an automaton accepting {LLIBRE, LLIURE, CAURE, COURE, COST})
• A hash function h mapping each word to a key in the dictionary table

Examples of OCR systems

• Printed characters: – Readstar (Innovatic). – Cuneiform (Cognitive Technology). – Word Scan (Calera). – OmniPage (Caere). – Text Bridge (Xerox). – Neuro Talker OCR (Int. Neural Machines). – OCR Master. – Recognita Plus. – TypeReader Professional. –Etc. • Hand-written characters: – Mitek.


Examples of OCR systems

• Inspection of surgical sachets
– Digit recognition: reference and date
– System requirements:
• Irregular surface: shadows and reflections
• Resolution: 175 dpi
• Acquisition with a B/W camera
• Diffuse lighting
• Detection of 16 defects in 400 ms
• Verification system

Examples of OCR systems

• Preprocess – Skew correction: combination of the angle of the upper contour and the segments of the box surrounding the word LOT.

– Binarization: Otsu's method
– Thinning

Examples of OCR systems

• Segmentation
– Connected components from the skeleton
– Application of domain knowledge (character size and separation) to segment touching or broken characters:
• Divide wide components
• Join thin and nearby components

Examples of OCR systems

• Feature extraction: zoning (see the sketch after this list)
– The size of each zone is not constant: it is adapted to the image size
– Two versions:
• Version 1 — the value at each zone measures the importance of the zone with respect to the whole character:
– If the number of pixels is greater than a percentage of the total number in the image → 1; otherwise → 0
– The central region is more important: its value is multiplied by 2
– Three values are added to combine the values of the more discriminant zones
• Version 2 — the value at each zone is the percentage of white pixels in the zone
(The original slide shows a digit over a zone grid with the resulting values for both versions.)
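A minimal numpy sketch of the zoning idea of Version 2, with zone boundaries adapted to the image size (polarity is an assumption here: the code measures the fraction of foreground pixels; use 1 - mean for the fraction of white pixels):

import numpy as np

def zoning_features(img, rows=3, cols=3):
    # img: 2-D binary array, foreground = 1. One value per zone.
    h, w = img.shape
    ys = np.linspace(0, h, rows + 1, dtype=int)   # zone boundaries adapted
    xs = np.linspace(0, w, cols + 1, dtype=int)   # to the image size
    feats = []
    for i in range(rows):
        for j in range(cols):
            zone = img[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            feats.append(zone.mean())             # fraction of foreground pixels
    return np.array(feats)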

Examples of OCR systems

• Classification
– Version 1:
• Model with the minimal distance:

$d = \sum_{j=1}^{n} |i_j - m_j|$

• If several models have the same minimal distance and the digit to verify is among the candidates, it is verified with an ambiguity signal
– Version 2:
• Mahalanobis distance
• Learning step to compute the mean and covariance for each digit
• Verification of the digit string:
– The string is rejected if there is more than one misrecognized digit or more than two ambiguous digits

Bibliography

• S. Mori, H. Nishida, H. Yamada. Optical Character Recognition. John Wiley and Sons, 1999.
• H. Bunke, P.S.P. Wang. Handbook of Character Recognition and Document Image Analysis. World Scientific Publishing Company, 1997.
• S.V. Rice, G. Nagy, T.A. Nartker. Optical Character Recognition: An Illustrated Guide to the Frontier. Kluwer Academic Publishers, 1999.
• S. Impedovo. Fundamentals in Handwriting Recognition. Springer-Verlag, 1994.
• A. Belaïd, Y. Belaïd. Reconnaissance des formes. Méthodes et Applications. Inter Editions, Paris, 1992.
• A.C. Downton, S. Impedovo. Progress in Handwriting Recognition. World Scientific Publishing Company, 1997.
• P.S.P. Wang. Character and Handwriting Recognition: Expanding Frontiers. Special Issue of IJPRAI, vol. 5, nos. 1-2, 1991.
• T. Pavlidis, S. Mori. Optical Character Recognition. Special Issue of Proceedings of the IEEE, vol. 80, no. 7, 1992.
• V.K. Govindan, A.P. Shivaprasad. Character Recognition - A Review. Pattern Recognition, vol. 23, no. 7, pp. 671-683, 1990.


• O.D. Trier, A.K. Jain, T. Taxt. Feature Extraction Methods for Character Recognition - A Survey. Pattern Recognition, vol. 29, no. 4, pp. 641-662, 1996.
• R.G. Casey, E. Lecolinet. A Survey of Methods and Strategies in Character Segmentation. IEEE Transactions on PAMI, vol. 18, no. 7, pp. 690-706, 1996.
• Y. Lu. Machine Printed Character Segmentation - An Overview. Pattern Recognition, vol. 28, no. 1, pp. 67-80, 1995.
• Y. Lu, M. Shridar. Character Segmentation in Handwritten Words - An Overview. Pattern Recognition, vol. 29, no. 1, pp. 77-96, 1996.
• S.W. Lee. Advances in Handwriting Recognition. World Scientific, 1999.
• C.H. Chen, L.F. Pau, P.S.P. Wang. Handbook of Pattern Recognition and Computer Vision. World Scientific, 1993.
• J.L. Blue et al. Evaluation of Pattern Classifiers for Fingerprint and OCR Applications. Pattern Recognition, vol. 27, no. 4, pp. 485-501, 1994.
• R. Plamondon, S. Srihari. On-line and Off-line Handwriting Recognition: A Comprehensive Survey. IEEE Transactions on PAMI, vol. 22, no. 1, pp. 63-84, 2000.
• G. Nagy. Twenty Years of Document Image Analysis in PAMI. IEEE Transactions on PAMI, vol. 22, no. 1, pp. 38-62, 2000.
• H. Bunke, T. Caelli. Hidden Markov Models: Applications in Computer Vision. World Scientific, 2001.
• R. Duda, P. Hart, D. Stork. Pattern Classification. 2nd ed., Wiley Interscience, 2000.

Practical work

Document image → Binarization (local adaptive) → Binary image → Layout Analysis → N text images → Character Segmentation → N binary character images → Feature Extraction → Feature vector → Classification (multiple classifiers) → Character label

Groups of 2 people for each task

