bachelor’s thesis

Font Family/Style Recognition

Tereza Soukupová

May 2014

Ing. Michal Bušta

Czech Technical University in Prague

Faculty of Electrical Engineering, Department of Cybernetics


Czech Technical University in Prague

Faculty of Electrical Engineering

Department of Cybernetics

BACHELOR PROJECT ASSIGNMENT

Student: Tereza S o u k u p o v á

Study programme: Open Informatics

Specialisation: Computer and Information Science

Title of Bachelor Project: Font Family/Style Recognition

Guidelines:

1. Familiarize yourself with the TextSpotter [2] for “text in the wild” detection and recognition developed at the Centre for Machine Perception at the Department of Cybernetics FEE CTU. Focus on the OCR module. 2. Familiarize yourself with the state-of-the-art in estimation of the Font Family/Style Recognition. 3. Define the “Class of the Font” (particularly useful for the OCR task). 4. Suggest an algorithm for the estimation of the class of the Font. 5. Implement it and test its quality. 6. Try to use the information about the class of the Font to improve the OCR quality.

Bibliography/Sources: [1] Neumann L., Matas J.: Scene Text Localization and Recognition with Oriented Stroke Detection. ICCV 2013 (Sydney, Australia). [2] Neumann L.: Vyhledání a rozpoznání textu v obrazech reálných scén. Master thesis, ČVUT, 2010. [3] Al-Khaffaf H. S. M., Shafait F., Cutter M. P., Breuel T. M.: On the Performance of Decapod’s Digital Font Reconstruction. International Conference on Pattern Recognition November 2012, pp.649 – 652 (2012).

Bachelor Project Supervisor: Ing. Michal Bušta

Valid until: the end of the summer semester of academic year 2014/2015

L.S.

doc. Dr. Ing. Jan Kybic prof. Ing. Pavel Ripka, CSc. Head of Department Dean

Prague, January 10, 2014



Acknowledgement

I would like to thank Ing. Michal Bušta for his guidance, patience, willingness and assistance during the writing of my thesis. I also thank prof. Ing. Jiří Matas, Ph.D. for his advice.

Declaration

I declare that I have completed this thesis independently and that I have listed all used information sources in accordance with Methodical instruction about ethical principles in the preparation of university theses.


Abstract

This work presents an algorithm for optical font recognition of text in real-scene images. It is based on the OCR system TextSpotter developed at the Czech Technical University in Prague. This system locates areas with text, detects connected components of characters, binarizes them, extracts features and tries to recognize the characters. The OCR system trains its classifier on images of letters of the Latin alphabet in all 162 initial fonts. Some characters are not well recognized or stay unrecognized, because mismatching characters may look similar in different fonts; e.g. a 'g' in one font may look like an '8' in another. The goal of this project is to recognize the font of the text and to train the classifier only on this font. The unrecognized characters are then classified again by the classifier with the reduced training set. It is thus more likely that they will be classified correctly, and the OCR quality increases.

Keywords

Optical Font Recognition; OFR; text detection; Optical Character Recognition; OCR

Contents

1 Introduction 1
  1.1 Problem formulation 1
  1.2 Definitions 3

2 State-of-the-art 4
  2.1 Global feature approaches 4
  2.2 Local feature approaches 7
  Our approach 8

3 OCR system TextSpotter 9
  3.1 Input 9
  3.2 Processing 9
  3.3 Extracting features 10
  3.4 OCR classifier 10
    3.4.1 K-nearest neighbor 10
    3.4.2 Training 11
    3.4.3 Classification 11

4 Clustering 12
  Hierarchical clustering 13
  Implementation 13
  4.1 Clustering through all characters 14
    4.1.1 Distances between fonts 14
    The class of the font 14
    4.1.2 Clusters and their representatives 15
    Example with the character a 15
  4.2 Clustering per character 18
    4.2.1 Distances 18
    4.2.2 Clusters and their representatives 18

5 Recognition algorithms 19
  5.1 Font recognition 19
    5.1.1 Nearest neighbour font voting 19
    5.1.2 Finding the shortest path in a multistage graph 20
    Dynamic programming approach 23
  5.2 Clustering through all characters 26
    5.2.1 Nearest neighbour cluster voting 26
    5.2.2 Finding the shortest path in a multistage graph 26
  5.3 Clustering per character 28
    The algorithm 28
  5.4 An utilization of font knowledge to improve the OCR 29
    Recognition pipeline 30

6 Experiments 31
  6.1 Font or cluster recognition 31
    6.1.1 DATASET 1 – computer-generated images 31
    6.1.2 DATASET 2 – real-scene images 32
    Example 1 32
    Example 2 33
    Example 3 33
    Example 4 34
  6.2 An utilization of font knowledge to improve the OCR quality 35
    6.2.1 DATASET 1 – computer-generated images 35
    6.2.2 DATASET 2 – all 132 real-scene images 36
    6.2.3 DATASET 2a – method FontNN 37
    6.2.4 DATASET 2b – method FontDynamic 37
    6.2.5 DATASET 2c – method ClusterNN 37
    6.2.6 DATASET 2d – method ClusterDynamic 38
    6.2.7 DATASET 2e – method ClusterPerCharacter 38
    6.2.8 Examples 39
      Examples with an improvement 39
      Examples with no improvement 40
      Examples where the methods have errors 42

7 Implementation 44
  7.1 Programming language 44
  7.2 Used libraries 44
  7.3 The code 44

8 Conclusion 45
  8.1 Font or cluster recognition 45
  8.2 An utilization of font knowledge to improve the OCR quality 45

Appendices

A Appendix 47
  A.1 The maximum distance between the letters within the cluster is 6.5 48
  A.2 The maximum distance between the letters within the cluster is 7.5 49
  A.3 The maximum distance between the letters within the cluster is 8 50
  A.4 The maximum distance between the letters within the cluster is 8.5 51

B Appendix 52
  B.1 The maximum distance between the letters within the cluster is 5 53
  B.2 The maximum distance between the letters within the cluster is 6 54
  B.3 The maximum distance between the letters within the cluster is 8 55
  B.4 The maximum distance between the letters within the cluster is 8 56

C Appendix 57
  C.1 Enclosed CD 57

Bibliography 58

Abbreviations

OFR    Optical Font Recognition
OCR    Optical Character Recognition
k-NN   k-nearest neighbors


1 Introduction

Optical character recognition (OCR) automatically converts photographed text in real-scene images to a computer-readable form. The output of the OCR is a sequence of ASCII character codes. It is more difficult to recognize text in real-scene images than in printed documents: worse readability is caused by geometric distortion, bad lighting conditions and other factors influencing legibility. Automatic processing of text in images is useful in many areas, e.g. automatic number plate recognition, converting books into electronic form, or automatic reading of administrative documents such as passports, bank statements or receipts. An OCR system is also very helpful for blind and visually impaired people, allowing them to read labels on food or medicaments, timetables or bills. Nowadays almost everyone has a mobile phone with a camera, so taking photos is ever more popular. OCR is a complex problem in the field of computer vision. Optical font recognition (OFR) is also important, although it is a much less researched problem than OCR. OFR does not concern the labeling of characters; instead it tries to recognize the font of a typed text. Knowing the font is necessary e.g. for document reproduction. It can also be useful for graphic designers to detect the font of labels written on products.

1.1 Problem formulation

This work focuses on text detection and recognition in real-scene images, e.g. traffic signs, labels on buildings, labels on food or medicaments. It is built on the basis of the TextSpotter system [1], where text localization is discussed; text localization is not the focus of this thesis. The OCR system recognizes each character individually without any a priori knowledge of the font of the whole text. The main contributions of this work are the font recognition of the text and the application of font knowledge to improve the OCR system.


The input of the algorithm is a sequence of blobs, each of which we try to label with a character. For each blob the admissible representatives are determined by a character c_i ∈ Alphabet and a font f_i ∈ Fonts. In the ideal case the algorithm finds a classification of the blobs {(c_1, f_1), ..., (c_n, f_n)} such that the font is invariant: f_1 = f_2 = ... = f_n. The knowledge of the font can be used to improve the OCR quality. The problem is that mismatching characters in two different fonts may look the same (e.g. 'g' and '9' or 'g' and '8', see Figure 1). The training set can then be reduced and the classifier trained only on the recognized fonts, which may lead to better classification. There are a great many fonts, despite the fact that we only focus on the Latin alphabet. It would be very difficult to train the classifier on all existing fonts, and new fonts are still being made. However, many fonts are very similar, so getting a similar font is sufficient for our purposes. For initial experiments I have decided to use a set of the most common fonts: the Windows system fonts.

a) ’8’ - Bold, ’g’ - Bold. b) ’g’ - Vrinda Bold, ’9’ - Calibri Bold.

Figure 1 Mismatching letters which look similar in different fonts.

The algorithm has been tested on 2 datasets. The first dataset is a set of the 20 most common English words in all 162 initial fonts, synthetically generated in a computer, 3240 tested words in total. The second dataset consists of 132 real-scene images with 940 typed words.

This work is structured as follows. Chapter 2 reviews the state of the art in OFR. The TextSpotter system and the preprocessing of the images are described in Chapter 3. Chapter 4 presents different ways of clustering fonts. The font recognition algorithms are proposed in Chapter 5. Chapter 6 deals with experiments and testing of the proposed algorithms. The implementation details and used libraries are described in Chapter 7. Chapter 8 evaluates the work.


1.2 Definitions

Let me define terms that I will use throughout the thesis.

Term             Explanation
Character c      a representation of the ASCII code
Alphabet Ω       the set of all characters
Font f           a set of all characters that share design features
Letter           a character written in one font
Blob x           a bitmap that resembles a letter
Visual mistake   an unmeasurable variable, decided subjectively by visual inspection

2 State-of-the-art

There are several reasons for font recognition. The most common one is document reproduction: you have a scan or an image of a document (a book, an article) and you want to make the most similar digitized copy of it, for which you need to know the font of the text. Optical font recognition is also useful for graphic and font designers. Another reason for OFR is to improve OCR systems: we believe that if we know the font of the given text, we can reach a higher OCR quality. This is our main goal for the font recognition. Most of the methods for OFR are intended for document reproduction. They mostly need several lines of text for the recognition, which is unusable for us, because we usually have only a few characters or words in a line. We can only borrow some approaches and feature extractors from these methods. There are two main approaches: global and local feature approaches. In the global feature approach, features are computed from a whole block of text (lines, paragraphs, pages). Such approaches are good for document reproduction without any knowledge of the content and letters; they usually work with texture recognition. The local feature approach deals with individual letters or even parts of letters. These features are based on specific details of each letter.

2.1 Global feature approaches

Zramdini and Ingold [2, 3] present an a priori font recognition method based on global features extracted from vertical and horizontal projection profiles of text lines. They state that these features are independent. A multivariate Bayesian classifier is used for classification. They recognize only 10 typefaces in 7 sizes and 4 styles (280 font models). Each font model has been estimated from the features of about 100 text lines scanned at 300 dpi. The classifier achieved a recognition rate of nearly 97%. The font recognition rate is sensitive to the length of the text; however, they consider the method applicable to short texts of about ten characters.

Figure 2 Horizontal and vertical projection profiles. This picture is taken from the paper[3].

Satkhozhina, Ahmadullin and Allebach [4] deal with the reproduction of printed documents in their article and try to generate a new, visually similar document with new content. They use 8 features from Zramdini and Ingold [2] based on projection profiles (width and height of the word, densities of projections, normalized height and width, spaces between characters). These features can be used only on some characters, so they were expanded by 5 new ones for better usage on short texts. They used a conditional random field, which is a probabilistic graphical model. The advantage of this method is that the typographical features need not be independent of each other. They scanned 168 pages at 300 dpi of 11 typefaces, 2 weights, 2 slopes and 14 font sizes. The recognition quality was about 64%. However, they consider their method sufficient for the automated generation of a new document that looks visually similar to the original.

Figure 3 Horizontal and vertical projection profiles. This picture is reproduced from the paper [3].

In the papers by Cutter et al. [5, 6] a method for font reconstruction is proposed which makes scanned documents searchable and similar to the original. They form the reconstructed font from letters created from the approximated shapes of tokens of the letters. They scanned pages with multiple fonts at 300 dpi, and the method has 98.4% accuracy in assigning letters to their candidate fonts.

Zhang, Lu and Tan [7] describe a method for italic font recognition using stroke pattern analysis. They apply wavelet decomposition to each word image from scanned text documents. The recognition accuracy is about 96%.

Khoubyari and Hull [8, 9] focus on font recognition using frequent words in a document such as the, of, and, a and to. They make clusters of equivalent words from word images segmented from a document. Then they find clusters with the given function words and compare them with a database of fonts. In their experiments, 34 out of 40 test cases were correctly identified.

Zhu, Tan and Wang [10] deal with a font recognition method based on global feature extraction and texture analysis of a document. The recognition rate is 99.1%.

Carlos, Juan and Hidalgo [11] proposed a global method based on the analysis of the texture of a text block. They use invariant moments of a random variable computed from the image block. They use 8 basic fonts with their varieties: regular, italic, bold and italic bold, 32 combinations in total. They scanned text blocks at 300 dpi and reached 95% accuracy.

Figure 4 A uniform text block after preprocessing. This picture is taken from the paper [10].

In the paper [12] a method is presented for reconstructing fonts from printed documents and generating a PDF document with a font faithful to the original.

Figure 5 Sample line of Times font. Top - Original, Middle - Decapod. This picture is taken from the paper [12].


2.2 Local feature approaches

Solli and Lenz [13] use a local approach based on eigenimages. Principal component analysis describes the images in a lower-dimensional subspace. The disadvantage of this method is that it cannot be used without a working OCR system. It is evaluated on printed and scanned text lines with 2763 different fonts. The method finds the correct font name for 99 percent of the queries within the 5 best matches.

Figure 6 The first five eigenimages for character a, reshaped to 2-D images. This picture is taken from the paper [13].

Lidke [14] applied a Bag-of-Visual-Words approach, which takes letter snippets and translates them into visual words. It was tested on 9809 different fonts and the best-in-five rate is 94%.

Figure 7 Examples of visual words. This picture is taken from the paper [14].

Ozturk, Sankur and Abak [15] deal with font clustering. Their goal is not to get the exact font but a similar font in order to improve the OCR. They try to find a minimal set of clusters that provides adequate OCR performance across all fonts and to choose one representative from each cluster. They use 28 typefaces in weight and slope varieties (65 in total). The paper shows that six to eight clusters prove to be adequate for all the fonts.


Our approach

Most of the global feature approaches are not useful for us because they use features from a whole block of text, while we have only several characters. These approaches often use the ratio between uppercase and lowercase letters; we cannot use this because our words often contain only uppercase or only lowercase letters. We also have to deal with perspective distortion and rotation of the text, which makes the task more difficult. The local feature approaches working with single letters are more useful for us, because we often have only a few words (sometimes even just one) at the input and we process them per letter.

3 OCR system TextSpotter

The algorithm follows the work of Neumann and Matas [16], [17]. The OCR task is a complex computer vision problem with many applications. This chapter describes the preprocessing of characters.

Figure 8 The output of the TextSpotter system.

3.1 Input

The input for this algorithm is any picture with captured text. The text needs to have at least 3 characters.

3.2 Processing

The preprocessing consists of text localization, segmentation and binarization of characters. The characters in the real-scene image are detected as connected components (described in more detail in [16]). The detection is based on the idea that the text in the image has to be in contrast with its background. The algorithm then determines whether each connected component is a character or a non-character, and the whole text is binarized. The connected components are merged into lines. Each character is then cropped by its bounding box and normalized to a fixed size of 20×20 pixels.
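The cropping and normalization step can be sketched in a few lines of Python. This is only a minimal illustration assuming a binarized connected component as input; cv2.resize is an illustrative choice, not necessarily what TextSpotter itself uses.

# A sketch: crop a binarized connected component by its bounding box
# and normalize it to 20x20 pixels.
import cv2
import numpy as np

def normalize_blob(binary_img: np.ndarray) -> np.ndarray:
    ys, xs = np.nonzero(binary_img)                  # foreground pixel coordinates
    crop = binary_img[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    return cv2.resize(crop.astype(np.uint8), (20, 20),
                      interpolation=cv2.INTER_AREA)  # fixed 20x20 output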

Figure 9 The character a and its bounding box.

3.3 Extracting features

We use 200-dimensional features based on the chain code. The image is divided into 25 (5×5) small squares of the same size. For each of 8 directions (down, up, left, right, down-left, down-right, up-left, up-right) we check each small square for a contour in that direction. Finally, the eight 25-dimensional features are concatenated into one 200-dimensional feature vector. The images are blurred to be more robust to distortion.
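To make the feature construction concrete, the following is a minimal sketch of such a 200-dimensional descriptor, assuming a 20×20 binarized glyph. The exact contour tracing and blur used by TextSpotter are not specified in the text, so the contour test, the Gaussian sigma and the pooling below are illustrative choices, not the system's actual implementation.

import numpy as np
from scipy.ndimage import binary_erosion, gaussian_filter

# The 8 chain-code directions: down, up, left, right and the four diagonals.
DIRECTIONS = [(1, 0), (-1, 0), (0, -1), (0, 1),
              (1, -1), (1, 1), (-1, -1), (-1, 1)]

def chain_code_features(glyph: np.ndarray, blur_sigma: float = 1.0) -> np.ndarray:
    """glyph: 20x20 binary array (1 = foreground). Returns a 200-dim vector."""
    glyph = glyph.astype(bool)
    contour = glyph & ~binary_erosion(glyph)             # border pixels only
    maps = np.zeros((8, 20, 20))
    for y, x in zip(*np.nonzero(contour)):
        for d, (dy, dx) in enumerate(DIRECTIONS):
            ny, nx = y + dy, x + dx
            # mark direction d where the contour continues that way
            if 0 <= ny < 20 and 0 <= nx < 20 and contour[ny, nx]:
                maps[d, y, x] = 1.0
    features = []
    for d in range(8):
        blurred = gaussian_filter(maps[d], blur_sigma)   # robustness to distortion
        # pool into a 5x5 grid of 4x4 cells -> 25 values per direction
        features.append(blurred.reshape(5, 4, 5, 4).sum(axis=(1, 3)).ravel())
    return np.concatenate(features)                      # 8 * 25 = 200 dimensions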

Figure 10 Heat maps from chain code features with Gaussian distortion of character A, the font is Bold.

3.4 OCR classifier

3.4.1 K-nearest neighbor

A sufficient classifier for the moment is the k-nearest neighbor. It finds the k most similar results to the input data. The training set is a set of multidimensional feature vectors with class labels. Computing exact nearest neighbors in higher dimensions is a very computationally expensive task, so the Fast Library for Approximate Nearest Neighbors (FLANN) [18] is used and only approximate neighbors that are close enough are found.

3.4.2 Training

The classifier is trained on synthetic data: a set of images of letters generated in a computer. All 62 characters of the alphabet are generated in 162 fonts, in a pure form and also with blur distortion, 40,176 samples in total.

3.4.3 Classification

The OCR system extracts features from the image of the character to be recognized and searches for the k nearest neighbours. It searches through all characters of all fonts. The most frequent result is considered the recognized character. This step does not use any knowledge of the font of the word; in the following chapters we will assume that each word is written in one font.
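A minimal sketch of this classification step is shown below. FLANN is what the system uses; scipy's cKDTree stands in here as an exact-search substitute, and the value of k is an illustrative assumption.

import numpy as np
from collections import Counter
from scipy.spatial import cKDTree

class OCRClassifier:
    def __init__(self, train_features: np.ndarray, train_labels):
        self.tree = cKDTree(train_features)   # train_features: (n, 200)
        self.labels = list(train_labels)      # character label per sample

    def classify(self, feature: np.ndarray, k: int = 11):
        _, idx = self.tree.query(feature, k=k)
        votes = Counter(self.labels[i] for i in np.atleast_1d(idx))
        char, count = votes.most_common(1)[0]
        return char, count / k                # most frequent char and its quality

The returned quality count / k is the same ratio that reappears as Q_c in Equation (8) of Chapter 6.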

Figure 11 Example of k-NN classification. The test sample (green circle) should be classified either to the first class of blue squares or to the second class of red triangles. If k = 3 (solid line circle) it is assigned to the second class because there are 2 triangles and only 1 square inside the inner circle. If k = 5 (dashed line circle) it is assigned to the first class (3 squares vs. 2 triangles inside the outer circle). This image and caption are taken from Wikipedia [19].

4 Clustering

There are many fonts all over the world, and it is complex and inefficient to train the k-nearest neighbors classifier on all existing fonts. The task is to observe the feature space of letters in different fonts and explore whether it is possible to divide the fonts into classes and to find one representative from each class.

Figure 12 Minimum spanning tree of all fonts. It is a difficult task to cluster fonts.

Hierarchical clustering

Hierarchical clustering is a clustering method which groups data according to a criterion and creates a hierarchy tree. There are two main approaches: agglomerative and divisive. In the agglomerative approach each observation begins in its own separate cluster and clusters are then merged. In the divisive approach all observations start in one cluster and are then divided according to the criterion. I have decided to use the agglomerative approach because it is a less complex method and easier to implement. The result of hierarchical clustering can be shown in a dendrogram.

Figure 13 Dendrogram of all fonts with coloured clusters.

Implementation

For clustering, the library SciPy [20] is used, especially its module scipy.cluster.hierarchy. It is very important to choose the linkage method properly; it determines how the distance between 2 clusters is computed in hierarchical agglomerative clustering. The main linkage methods are single, complete and average. The single linkage method merges the two clusters with the minimum distance over all points in both clusters. The complete linkage method merges two clusters in the same way as the single method but uses the maximum of the distances, and the average method computes the average distance over all points in a cluster. I have decided to use the complete criterion because it guarantees that no two fonts in the same cluster are too far apart.

$$d(u, v) = \max_{i,j} \operatorname{dist}(u[i], v[j]) \qquad (1)$$

for all points $i$ in cluster $u$ and $j$ in cluster $v$. This is also known as the Farthest Point Algorithm.
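A minimal sketch of this step with scipy.cluster.hierarchy is shown below, assuming the font-to-font dissimilarity matrix of Section 4.1.1 has already been computed; the function and variable names are illustrative.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def cluster_fonts(dist_matrix: np.ndarray, max_dist: float = 8.0) -> np.ndarray:
    """dist_matrix: symmetric (n, n) font-to-font dissimilarities D(f1, f2)."""
    condensed = squareform(dist_matrix, checks=False)  # scipy expects condensed form
    Z = linkage(condensed, method='complete')          # Farthest Point Algorithm
    # cut the dendrogram so that no two fonts in one cluster exceed max_dist
    return fcluster(Z, t=max_dist, criterion='distance')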

4.1 Clustering through all characters

As the first experiment I have tried to cluster fonts that are similar in all characters.

4.1.1 Distances between fonts

There are many ways to compute a distance between two fonts, e.g. using the minimal, average or maximal distance between matching characters from both fonts. For the computation we use our 200-dimensional features based on the chain code. As the dissimilarity measure D between two fonts I have chosen the largest Euclidean distance between the features of the same character from both fonts. The dissimilarity measure is thus determined by the two least similar matching characters.

$$D(f_1, f_2) = \max_{c \in C} \| c_{f_1} - c_{f_2} \|_2 \qquad (2)$$

where $C$ is the set of all possible characters:

$$C = \{1, 2, \ldots, A, B, \ldots, Z, a, b, \ldots, z\} \qquad (3)$$

$f_1$ and $f_2$ are two given fonts and $c_{f_i}$ are the extracted chain code features of character $c$ in font $f_i$.
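A direct transcription of Equation (2) in Python, assuming features[font][char] holds the 200-dimensional chain-code vector of a character in a font (the data layout is an assumption):

import numpy as np

def font_distance(features: dict, f1: str, f2: str) -> float:
    # D(f1, f2) = maximum over all matching characters of the Euclidean distance
    return max(np.linalg.norm(features[f1][c] - features[f2][c])
               for c in features[f1])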

The class of the font

Here we can define the class of the font as all fonts belonging to the same cluster. This means that the dissimilarity measure between any two fonts within one cluster is bounded by a maximum threshold.


4.1.2 Clusters and their representatives

I attach some images of clusters made with hierarchical clustering using the maximal Euclidean distance and the complete linkage criterion. I have experimented with changing the maximum distance within each cluster. The representative font is chosen as the font having the smallest sum of feature distances to all other fonts. More examples can be seen in Appendix A.
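Selecting the representatives can be sketched as follows; whether the distance sums run over cluster members only or over all fonts is read from the text above, so treat that detail as an assumption.

import numpy as np

def pick_representatives(dist_matrix: np.ndarray, cluster_ids: np.ndarray) -> dict:
    """Return, per cluster id, the member with the smallest summed distance."""
    reps = {}
    for cid in np.unique(cluster_ids):
        members = np.flatnonzero(cluster_ids == cid)
        sums = dist_matrix[np.ix_(members, members)].sum(axis=1)
        reps[cid] = int(members[np.argmin(sums)])   # index of the representative font
    return reps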

Example with the character a

A satisfying number of clusters is obtained with the maximal distance between letters within a cluster set to 8, see Figure 14. However, there are some visual mistakes, coloured red, which do not look good. We can decrease the distance to 7.5, see Figure 15; there are still some mistakes. If we decrease the distance to 6.5, we can see well separated clusters, see Figure 16. But 52 clusters is too many, and many fonts end up in a cluster by themselves.


Figure 14 12 clusters of the character a with the maximal distance set to 8.

Figure 15 25 clusters of the character a with the maximal distance set to 7.5.

Figure 16 52 clusters of the character a with the maximal distance set to 6.5.


The table below shows the 25 representative fonts and their clusters, which I will use in the following parts of this thesis.

REPRESENTATIVE      FONTS IN CLUSTER
SegoeScript-Bold    SegoeScript-Bold, SegoeScript-R.
IskoolaPota-R.      AngsanaNew-Bold, AngsanaNew-R., AngsanaUPC-R., Aparajita-Bold, Aparajita-R., David-Bold, David-R., FrankRuehl-R., FreeSerif-Medium, IskoolaPota-Bold, IskoolaPota-R., Kokila-Bold, Kokila-R., MicrosoftHimalaya-R., MongolianBaiti-R., ShonarBangla-Bold, ShonarBangla-R., TimesNewRoman-Bold, TimesNewRoman-R., TraditionalArabic-Bold, TraditionalArabic-R.
[?]-R.              Andalus-R., ArabicTypesetting-R., Constantia-R., DaunPenh-R., MicrosoftUighur-R., Narkisim-R., [?]-R., PalatinoLinotype-Bold, PalatinoLinotype-R., PlantagenetCherokee-R.
IrisUPC-Bold        IrisUPC-Bold, IrisUPC-R.
Cambria-Bold        [?]-Bold, Constantia-Bold, EucrosiaUPC-Bold, EucrosiaUPC-R., Georgia-Bold, Georgia-R., KodchiangUPC-Bold, KodchiangUPC-R., Vani-Bold, Vani-R.
[?]-R.              Gabriola-R.
SegoePrint-Bold     SegoePrint-Bold, SegoePrint-R.
MVBoli-R.           MVBoli-R.
LilyUPC-Bold        LilyUPC-Bold, LilyUPC-R.
FranklinG.M.-R.     FranklinGothicMedium-R., FreesiaUPC-Bold, FreesiaUPC-R.
Arial-Black         Arial-Black, CourierNew-Bold, DilleniaUPC-Bold, JasmineUPC-Bold
Calibri-Bold        Aharoni-Bold, Calibri-Bold, SimplifiedArabic-Bold
KaiTi-R.            [?]-Bold, Consolas-R., KaiTi-R., SimHei-R.
[?]-R.              Corbel-Bold, Corbel-R., LevenimMT-Bold, LevenimMT-R.
Kalinga-Bold        DilleniaUPC-R., JasmineUPC-R., Kalinga-Bold, Kalinga-R.
SimplifiedA.F.-R.   CourierNew-R., MiriamFixed-R., Rod-R., SimplifiedArabicFixed-R.
KhmerUI-Bold        Arial-Bold, BrowalliaNew-Bold, BrowalliaUPC-Bold, ComicSansMS-Bold, ComicSansMS-R., [?]-Bold, Gisha-Bold, Kartika-Bold, KhmerUI-Bold, LaoUI-Bold, Latha-Bold, Leelawadee-Bold, Mangal-Bold, MicrosoftNewTaiLue-Bold, MicrosoftPhagsPa-Bold, MicrosoftTaiLe-Bold, MicrosoftYaHei-Bold, Raavi-Bold, SegoeUI-Bold, [?]-Bold, Tunga-Bold, Utsaah-Bold, [?]-Bold, Vrinda-Bold
[?]-Bold            Calibri-Light, Calibri-R., Candara-Bold, Candara-R., EstrangeloEdessa-R., [?]-R., MoolBoran-R., SakkalMajalla-Bold, SakkalMajalla-R., TrebuchetMS-Bold, TrebuchetMS-R.
CordiaNew-R.        Arial-R., BrowalliaNew-R., CordiaNew-R., CordiaUPC-Bold, CordiaUPC-R., DokChampa-R., Gautami-R., Kartika-R., Latha-R., Mangal-R., MicrosoftSansSerif-R., MicrosoftYiBaiti-R., Miriam-R., Raavi-R., Shruti-Bold, Shruti-R., SimplifiedArabic-R., Tunga-R., Utsaah-R., Vrinda-R.
KhmerUI-R.          Gisha-R., KhmerUI-R., LaoUI-R., Leelawadee-R., LucidaConsole-R., LucidaSansUnicode-R., MalgunGothic-Bold, MalgunGothic-R., MicrosoftJhengHei-Bold, MicrosoftJhengHei-R., MicrosoftNewTaiLue-R., MicrosoftPhagsPa-R., MicrosoftTaiLe-R., MicrosoftYaHei-R., SegoeUI-Light, SegoeUI-R., SegoeUI-Semibold, SegoeUISymbol-R., Tahoma-R., Verdana-R.
FangSong-R.         DFKai-SB-R., FangSong-R., SimSun-ExtB-R.
Vijaya-Bold         Vijaya-Bold, Vijaya-R.
FISHfingers-Light   FISHfingers-Light, FISHfingers-R.
Impact-R.           Impact-R., TheMightyAvengers-TheMightyAvengers
ASweetM.M.L.-R.     ASweetMelodyMyLady-R.

Figure 17 The table shows the clusters of the fonts and their selected representative fonts ([?] marks font names that are not recoverable from the source).


4.2 Clustering per character

It is difficult to cluster fonts in a way that satisfies all the characters without producing a lot of clusters, because the distances between the same character in different fonts are sometimes larger than the distances between different characters. The second experiment for font clustering is therefore to cluster per character: for each character its own clusters of fonts are created. Then we define the classes of the font for each character. One class of the font for character c contains all fonts whose characters c have a smaller distance from one another than a maximum threshold.

4.2.1 Distances

The same clustering procedure as described above is used. The only difference is in computing the distances. Each character is taken separately, and the distances are computed between the letters of all fonts for that character.

$$Cdistance(f_1, f_2) = \| c_{f_1} - c_{f_2} \|_2 \qquad (4)$$

for all characters:

$$C = \{1, 2, \ldots, A, B, \ldots, Z, a, b, \ldots, z\} \qquad (5)$$

$f_1$ and $f_2$ are two given fonts and $c_{f_i}$ are the extracted chain code features of character $c$ in font $f_i$.

4.2.2 Clusters and their representatives

I attach an example of clustering of the character a. For a satisfying clustering with only 1 visual mistake, 8 clusters are enough. More examples are shown in Appendix B.

Figure 18 Clusters of the character a with maximal distance between letters set to 7.

5 Recognition algorithms

The main idea of this work is to recognize a font and to use this knowledge to improve the OCR quality. The current OCR system searches for letters similar to the connected component across all fonts, and the nearest neighbor classifier returns several hypotheses. The problem is that a character in one font can look like a different character in another font. The idea is to introduce an additional latent variable (the class of the font) to improve the OCR. We can thus reduce the training set and try to find a classification of the whole word with an invariant font.

5.1 Font recognition

This section proposes methods for the recognition of a specific font, i.e. determining one of the 162 training fonts.

5.1.1 Nearest neighbour font voting

This is the simplest method of determining a font and serves as a baseline for font recognition. For each connected component in a detected text the 1-nearest neighbour is found; it is assumed to be the right character. The most frequent font is chosen as the correct one.

In this example the word Lectures has 8 letters, and the font Kokila Bold was recognized at 2 letters. This means that the font was recognized with 25% certainty. Certainty is the ratio between the number of letters with the recognized font and the number of all letters. If each font occurs only once, the font of the recognized character with the lowest distance is chosen.
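The voting itself is a few lines; the sketch below assumes each blob's 1-NN result carries the font label of the matched training letter, and the names are illustrative rather than taken from the thesis code.

from collections import Counter

def recognize_font_nn(blob_matches):
    """blob_matches: one (font, distance) 1-NN result per blob."""
    votes = Counter(font for font, _ in blob_matches)
    best_font, count = votes.most_common(1)[0]
    if count == 1:   # every font occurs only once: take the closest match
        best_font = min(blob_matches, key=lambda m: m[1])[0]
    certainty = votes[best_font] / len(blob_matches)
    return best_font, certainty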


     1st
L    Shonar Bangla Bold
e    Kokila Bold
c    David Bold
t    Traditional Arabic Bold
u    David Regular
r    Times New Roman Bold
e    Kokila Bold
s    Microsoft Himalaya Regular

Figure 19 The table shows the nearest neighbour fonts for each character from the word Lectures.

5.1.2 Finding the shortest path in a multistage graph

A more robust way to detect a font is finding the shortest path in a weighted directed multistage graph. More than one nearest neighbour is found for each character; in our example we have found 5 nearest neighbours, as you can see in the table below. A font can be repeated in one column because there are 4 samples generated in the training set for each letter: one normal and three with some distortion.

     1st                         2nd                   3rd                    4th                       5th
L    Shonar Bangla Bold          Aparajita Bold        Shonar Bangla Bold     Shonar Bangla Bold        David Bold
e    Kokila Bold                 Vani Bold             Cambria Bold           Cambria Bold              Georgia Regular
c    David Bold                  Kokila Bold           Jasmine UPC Bold       Kodchiang UPC Bold        Franklin Gothic Regular
t    Traditional Arabic Bold     Iskoola Pota Bold     Iskoola Pota Bold      Traditional Arabic Bold   Frank Ruehl Regular
u    David Regular               Frank Ruehl Regular   Iskoola Pota Regular   Times New Roman Regular   Mongolian Baiti Regular
r    Times New Roman Bold        Shonar Bangla Bold    Aparajita Bold         Shonar Bangla Bold        David Bold
e    Kokila Bold                 Eucrosia UPC Bold     DFKai Regular          Shonar Bangla Bold        Georgia Bold
s    Microsoft Himalaya Regular  Aparajita Regular     Shonar Bangla Bold     Times New Roman Bold      Eucrosia UPC Regular

Figure 20 The table shows 5 nearest neighbour fonts for each character from the word Lectures.

The cardinalities of the fonts are counted from the table above as the number of columns in which the given font appears.

Shonar Bangla Bold 4     Eucrosia UPC Bold 1          Microsoft Himalaya Regular 1
Kokila Bold 3            Georgia Regular 1            David Regular 1
David Bold 3             Franklin Gothic Regular 1    Traditional Arabic Bold 1
Aparajita Bold 2         Aparajita Regular 1          Mongolian Baiti Regular 1
Times New Roman Bold 2   Vani Bold 1                  Times New Roman Regular 1
Frank Ruehl Regular 2    Kodchiang UPC Bold 1         Jasmine UPC Bold 1
Georgia Bold 1           Cambria Bold 1               Iskoola Pota Regular 1
Eucrosia UPC Regular 1   DFKai Regular 1              Iskoola Pota Bold 1

Figure 21 The cardinalities of the fonts found as the nearest neighbours.


The font of the evaluated text is not known, so an image of the word Lectures in Kokila Bold is attached only for visual comparison.

Figure 22 The original image with the comparison of the cropped word and the word generated in the detected font Kokila Bold and in the second best font Shonar Bangla Bold.

Let us have a directed multistage graph G = (V, E). The vertices of this graph are partitioned into disjoint sets V_i, 0 ≤ i ≤ L, where L = 1 + len(word). The cardinalities of the sets V_0 and V_L are |V_0| = |V_L| = 1, and the cardinality of V_i, 1 ≤ i ≤ L − 1, is the number of nearest neighbours found, |V_i| = K. The set V_0 contains the start node (S) and V_L the final node (F). Each set V_i is called a stage. Each node in a stage V_i, 1 ≤ i ≤ L − 1, contains information about the font, the character and the distance.

From each node in stage V_i, 0 ≤ i ≤ L − 1, directed edges lead to all nodes in stage V_{i+1}. An edge has a cost c(i, j). Our problem is to find a minimum cost path from the start node to the final node. The cost of an edge between nodes A, B is computed dynamically as the sum of the rank of the nearest neighbour B (the row in which node B appears), a penalty of the font for low cardinality, and a penalty if the font differs from the font of the previous node on the path. The algorithm finds the shortest path between nodes S and F. The most frequent font on the shortest path is returned as the recognized font.


Let us define:

c(A, B)             the cost of the edge between nodes A and B
penNN(A)            the rank of node A in the nearest-neighbours queue (the row of node A in Figure 20)
penDiffFont(A, B)   the penalty if the font name of node B differs from the font name of the previous node A
card(A)             the number of columns in which the font of A appears
penCard(A)          the penalty for the low cardinality of the font of node A
K                   the number of nearest neighbours found

$$penDiffFont(A, B) = \begin{cases} K, & \text{if } font(B) \neq font(A) \\ 0, & \text{otherwise} \end{cases}$$

$$penCard(F_i) = \left[\max_{j \in F} card(F_j) - card(F_i)\right] \cdot 2 \qquad (6)$$

where F is the set of all fonts found. The cost c(A, B) is computed as:

$$c(A, B) = penNN(B) + penDiffFont(A, B) + penCard(B) \qquad (7)$$


[Figure: the multistage graph for the word Lectures, with start node S, final node F, and a column of 3 nearest-neighbour nodes per character.]

Figure 23 Multistage graph of the 3 nearest neighbours. Each column (except the first and the last one) holds the nearest neighbours for one character of the word Lectures. The abbreviations mean: SB - Shonar Bangla Bold, KB - Kokila Bold, DB - David Bold, TA - Traditional Arabic Bold, DR - David Regular, TN - Times New Roman Bold, MH - Microsoft Himalaya Regular, AB - Aparajita Bold, VB - Vani Bold, IP - Iskoola Pota Bold, FR - Frank Ruehl Regular, EU - Eucrosia UPC Bold, AR - Aparajita Regular, CB - Cambria Bold, JU - Jasmine UPC Bold, IR - Iskoola Pota Regular, DF - DFKai Regular.

Because of the 'global' penalties penDiffFont and penCard, the found path is not guaranteed to be optimal. This could be solved e.g. by finding the K shortest paths or by increasing the number of nearest neighbours. Designing an optimal algorithm for this difficult task was beyond the time available for the thesis.

Dynamic programming approach

The shortest path in the multistage graph is found by dynamic programming; the forward approach [21] is used. Let c(A, B) be the cost of the edge between nodes A and B and d(S, A) be the cost of a path from the start node S to node A.

c(S, A) = 2    c(A, D) = 4    c(B, D) = 3    c(C, D) = 4    c(D, F) = 1
c(S, B) = 2    c(A, E) = 5    c(B, E) = 2    c(C, E) = 3    c(E, F) = 3
c(S, C) = 4    c(A, G) = 4    c(B, G) = 4    c(C, G) = 1    c(G, F) = 2

The distances are computed from left (start node) to right (final node):

d(S, A) = c(S, A) = 2
d(S, B) = c(S, B) = 2
d(S, C) = c(S, C) = 4

d(S, D) = min[d(S, A) + c(A, D), d(S, B) + c(B, D), d(S, C) + c(C, D)]
        = min[2 + 4, 2 + 3, 4 + 4] = d(S, B) + c(B, D) = 5

d(S, E) = min[d(S, A) + c(A, E), d(S, B) + c(B, E), d(S, C) + c(C, E)]
        = min[2 + 5, 2 + 2, 4 + 3] = d(S, B) + c(B, E) = 4

d(S, G) = min[d(S, A) + c(A, G), d(S, B) + c(B, G), d(S, C) + c(C, G)]
        = min[2 + 4, 2 + 4, 4 + 1] = d(S, C) + c(C, G) = 5

d(S, F) = min[d(S, D) + c(D, F), d(S, E) + c(E, F), d(S, G) + c(G, F)]
        = min[5 + 1, 4 + 3, 5 + 2] = d(S, D) + c(D, F) = 6

The minimum cost of the path is 6 and the path is S-B-D-F.


[Figure: the example multistage weighted directed graph with nodes S, A, B, C, D, E, G, F and the edge costs listed above.]

Figure 24 An example of a multistage weighted directed graph.

input : a matrix of nearest neighbours F of size (number of NN = N) × (length of word)
output: the recognized font

for c ← 1 to len(word) do
    for r ← 1 to N do
        BestPredecessor ← NULL;
        BestCost ← MAXINT;
        for node in nodes of previous stage do
            cost(c, r) = node.cost + r + penaltyDiffFont + penaltyCard;
            if cost(c, r) < BestCost then
                BestCost ← cost(c, r);
                BestPredecessor ← node;
            end
        end
        nodes[r][c].path = BestPredecessor.path + F[r][c];
        nodes[r][c].cost = BestCost;
    end
end
[BestCost, BestNode] ← findMinCost(nodes[:, -1]);
RecognizedFont ← findMostOccuredFont(BestNode.path);

Algorithm 1: Finding the shortest path
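A runnable Python rendering of Algorithm 1 under the cost model of Equations (6) and (7) may look as follows; the interface (a list of rank-ordered (font, character) hypotheses per blob) is an illustrative assumption, not the thesis code.

from collections import Counter

def recognize_font_dp(nn):
    """nn[i] = list of K (font, char) hypotheses for blob i, rank-ordered."""
    # cardinality of a font = number of blobs (columns) in which it appears
    card = Counter()
    for hyps in nn:
        for font in {f for f, _ in hyps}:
            card[font] += 1
    max_card = max(card.values())
    K = len(nn[0])
    stage = [(0.0, [])]                                    # virtual start node: (cost, path)
    for hyps in nn:
        new_stage = []
        for rank, (font, _) in enumerate(hyps, start=1):   # penNN = rank
            pen_card = (max_card - card[font]) * 2         # Eq. (6)
            best = None
            for cost, path in stage:
                pen_diff = K if path and path[-1] != font else 0
                total = cost + rank + pen_diff + pen_card  # Eq. (7)
                if best is None or total < best[0]:
                    best = (total, path)
            new_stage.append((best[0], best[1] + [font]))
        stage = new_stage
    _, best_path = min(stage)
    return Counter(best_path).most_common(1)[0][0]         # most frequent font on the path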


5.2 Clustering through all characters

Several clusters of similar fonts have been made, see Section 4.1. If the maximal distance between letters is set to 8, then 12 clusters are created, and from each cluster one representative font is chosen. The algorithm for recognition is the same as in the previous section, with the difference that only the cluster representative fonts are used.

5.2.1 Nearest neighbour cluster voting

The method is similar to the method in Subsection 5.1.1: the 1-nearest neighbour is found and the most frequent cluster is selected. The clustering is described in Section 4.1. In the example with the word Lectures, the cluster Iskoola Pota Regular was recognized with 50% certainty.

     1st
L    Iskoola Pota Regular
e    KhmerUI Regular
c    Iskoola Pota Regular
t    Iskoola Pota Regular
u    Fang Song Regular
r    Vijaya Bold
e    Constantia Regular
s    Iskoola Pota Regular

Figure 25 The table shows the nearest neighbour clusters for each character from the word Lectures.

Figure 26 The original image with the comparison of the cropped word and the word generated in the detected cluster Iskoola Pota Regular.

5.2.2 Finding the shortest path in a multistage graph

The method is described above in Section 5.1.2. The difference is that only the representative fonts of the clusters are used. Because the training set consists of a smaller number of fonts, only 3 nearest neighbours are found for each blob.


     1st                    2nd                    3rd
L    Iskoola Pota Regular   Kaiti Regular          Corbel Regular
e    KhmerUI Regular        Fang Song Regular      Constantia Regular
c    Iskoola Pota Regular   Vijaya Bold            KhmerUI Bold
t    Iskoola Pota Regular   KhmerUI Regular        Corbel Regular
u    Fang Song Regular      Iskoola Pota Regular   KhmerUI Regular
r    Vijaya Bold            KhmerUI Bold           Kaiti Regular
e    Constantia Regular     Kaiti Regular          KhmerUI Regular
s    Iskoola Pota Regular   Constantia Regular     Fang Song Regular

Figure 27 The table shows 3 nearest neighbour clusters for each character from the word Lectures.

The cardinalities of the clusters are computed. The cardinality of a cluster is the number of columns in which the cluster appears.

Iskoola Pota Regular 5   Fang Song Regular 3   KhmerUI Bold 2
Khmer UI Regular 4       KaiTi Regular 3       Corbel Regular 2
Constantia Regular 3     Vijaya Bold 2

Figure 28 The cardinalities of the clusters found as the nearest neighbours.

The path in the multistage graph found by dynamic programming is shown below.

[Figure: the multistage graph of the 3 nearest-neighbour clusters for the word Lectures with the found shortest path.]

Figure 29 Multistage graph of the 3 nearest neighbours. Each column (except the first and the last one) holds the nearest neighbours for one character of the word Lectures. The abbreviations mean: IP - Iskoola Pota Regular, KR - Kaiti Regular, CR - Corbel Regular, KU - KhmerUI Regular, FS - Fang Song Regular, CO - Constantia Regular, VB - Vijaya Bold, KB - KhmerUI Bold.


5.3 Clustering per character

This type of clustering is described in more detail in Section 4.2. Each character has its own clusters of fonts.

The algorithm

The blobs are first recognized as characters. If the quality of the recognition is greater than 0.5, the blob is marked as well recognized and classified as that character. For each such blob, the first well recognized result is taken from the nearest-neighbours queue and its font is saved together with the character label. Then, for each saved pair (font and character), the cluster of that character which contains the font is found. For the experiments, 25 clusters were made for each character. All fonts from the chosen cluster are appended to the original font as hypotheses of other possible fonts, because the recognized character differs only a little between these fonts and it is very easy to confuse the font. The most frequent fonts among all saved ones are chosen as the correct fonts.
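A sketch of this procedure, assuming a per-character cluster table clusters[char][font] that maps a font to the set of fonts in that character's cluster; the data layout and the threshold handling are assumptions.

from collections import Counter

def recognize_fonts_per_char(blobs, clusters, quality_threshold=0.5):
    """blobs: (char, quality, font) triples, one per recognized blob."""
    hypotheses = Counter()
    for char, quality, font in blobs:
        if quality > quality_threshold:        # keep well recognized blobs only
            # the saved font plus all fonts from that character's cluster
            hypotheses.update(clusters[char].get(font, set()) | {font})
    if not hypotheses:
        return []
    top = max(hypotheses.values())
    return [f for f, n in hypotheses.items() if n == top]   # most frequent fonts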

Character   OCR quality   Recognized char   Recognized font
L           1.0           L                 Aparajita Bold
e           1.0           e                 Microsoft Uighur Regular
c           1.0           c                 David Bold
t           1.0           t                 Iskoola Pota Bold
u           1.0           u                 Free Serif Medium
r           1.0           r                 Aparajita Bold
e           0.9           e                 Eucrosia UPC Bold
s           0.2           -                 -

Figure 30 Characters with their OCR qualities and fonts.

Figure 31 The letters in their recognized fonts described in Figure 30.


David Bold 4          Angsana New Bold 4          Cambria Bold 3
Eucrosia UPC Bold 4   Times New Roman Bold 4      Vani Bold 3
Aparajita Bold 4      Traditional Arabic Bold 4   ...

Figure 32 The computed cardinalities of fonts from the saved font list.

There are 6 fonts with the highest cardinality, and these are considered the correct fonts. For 4 of the 6 recognized characters it was determined that they can be written in one of these 6 fonts.

Figure 33 The word Lectures in its recognized fonts. The first line: the original image. The second line: David Bold, Eucrosia UPC Bold, Aparajita Bold. The third line: Angsana New Bold, Times New Roman Bold, Traditional Arabic Bold.

5.4 An utilization of font knowledge to improve the OCR

The shortest path found in the multistage graph with the nearest neighbours, described in 5.2.2, was used as the first attempt to improve the OCR quality. The characters belonging to the nodes of the shortest path were taken as the classified characters and compared with the standard OCR method [1]. The results were worse in most cases, although there were some examples with an improvement. We have therefore suggested an algorithm that solves this problem better.


Recognition pipeline

Firstly, a classifier is trained on the 162 initial fonts. If a blob is not classified with a certainty bigger than a threshold, the class of the font is recognized and the classifier is retrained only on letters of this class. Then the blob is reclassified.

[Figure: recognition pipeline - blobs are classified by the OCR trained on all fonts; when the OCR certainty is below the threshold, the OFR recognizes the font, the classifier is retrained on the recognized font class, and the letters are reclassified.]

Figure 34 Recognition pipeline for improving the OCR quality.
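In code, the pipeline of Figure 34 amounts to a retrain-and-retry loop. The sketch below assumes the OCRClassifier sketched in Section 3.4 plus some font-recognition function from this chapter; all names and the threshold value are illustrative.

def recognize_with_font(blobs, full_clf, recognize_font, train_on_font, theta=0.9):
    """blobs: feature vectors of one word; returns one character per blob."""
    results = [full_clf.classify(b) for b in blobs]        # (char, quality) pairs
    if all(q >= theta for _, q in results):
        return [c for c, _ in results]                     # OCR certain everywhere
    font = recognize_font(blobs)                           # OFR step
    small_clf = train_on_font(font)                        # classifier on one font class
    return [c if q >= theta else small_clf.classify(b, k=1)[0]
            for b, (c, q) in zip(blobs, results)]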

6 Experiments

We have made several experiments, described in this chapter. The font or cluster recognition is evaluated in the first part. In the second part we have tried to use this information to improve the OCR quality. We have 2 datasets. The first, DATASET 1, is synthetically generated in a computer: the 20 most common English words rendered in 162 fonts (3240 words). This dataset is important because it provides ground truth, so we can verify the results of the font recognition. The second, DATASET 2, contains 132 real-scene images with written text. In the real-scene images we do not have any information about the correct font.

6.1 Font or cluster recognition

Experiments were performed to estimate a font or a similar font of a written text. We have tested 5 methods for the font recognition.

Method FontNN            font recognition by 1-nearest-neighbour voting, see 5.1.1
Method FontDynamic       font recognition by the dynamic programming approach, see 5.1.2
Method ClusterNN         cluster recognition by 1-nearest-neighbour voting, see 5.2.1
Method ClusterDynamic    cluster recognition by the dynamic programming approach, see 5.2.2
Method ClusterPerChar    font recognition with the help of per-character clusters, see 5.3

6.1.1 DATASET 1 – computer-generated images

The table shows the results of the font or cluster recognition rate. For the methods FontNN and FontDynamic it is tested whether exactly the right font is recognized. The methods ClusterNN and ClusterDynamic indicate whether the cluster which includes the font of the written text is detected. The output of ClusterPerCharacter is not only one font or cluster but a set of possible fonts; it is tested whether the original font is in this set of most likely fonts. It is therefore obvious that its rate will be higher, and thus these methods cannot be directly compared.


              FontNN   FontDynamic   ClusterNN   ClusterDynamic   ClusterPerChar
Quality [%]   79.2     78.1          44.6        47.1             93.6

Figure 35 Font or cluster recognition qualities for each method.

method - number of clusters   N-25   D-25   N-50   D-50   N-100   D-100
F_OCR [%]                     85.8   86.1   89.7   89.4   94.2    94.5
Cluster rec. [%]              44.6   47.1   53.0   54.5   73.3    76.8

Figure 36 Cluster recognition rate and its quality measured by F_OCR with θ = 1.0. The abbreviations mean: N - the ClusterNN method, D - the ClusterDynamic method.

If the number of clusters is increased in the ClusterNN and ClusterDynamic methods, the cluster recognition rate also increases. For 100 clusters the rate is closer to the FontNN and FontDynamic rates, but it means that one cluster contains only 1.6 fonts on average. The ClusterNN and ClusterDynamic methods are not very successful for cluster recognition.

6.1.2 DATASET 2 – real-scene images

It is impossible to evaluate the methods for font recognition on real-scene images because we do not have ground truth information about the font. I can only attach some examples of images for visual comparison.

Example 1

In this image there is only one word kores.


Original image   [image of the word kores]
FontNN           Candara Bold
FontDynamic      Arial Black
ClusterNN        Khmer UI Bold
ClusterDynamic   Khmer UI Bold
ClusterPerChar   Aharoni Bold, Arial Black, Tahoma Bold, Microsoft PhagsPa Bold, Verdana Bold, Candara Bold

Figure 37 The results of the suggested methods for Image 1.

Example 2

In this image we have chosen the first word Rauchen.

Original image

FontNN           Impact Regular
FontDynamic      Impact Regular
ClusterNN        Khmer UI Bold
ClusterDynamic   Khmer UI Bold

Figure 38 The results of the suggested methods for Image 2. The method ClusterPerChar returned 20 fonts; they are not shown.

Example 3

In this image there is only one word springer.


Original image

FontNN           Iskoola Pota Bold
FontDynamic      Iskoola Pota Bold
ClusterNN        Constantia Regular
ClusterDynamic   Khmer UI Bold
ClusterPerChar   Vani Bold, Kodchiang UPC Bold, Georgia Bold, Linotype Bold, Constantia Bold, Iskoola Pota Bold

Figure 39 The results of the suggested methods for Image 3.

Example 4

In this image we have chosen the word GALAXY.

Original image   [image of the word GALAXY]
FontNN           Cordia New Regular
FontDynamic      Tunga Regular
ClusterNN        Khmer UI Regular
ClusterDynamic   Khmer UI Regular
ClusterPerChar   Lily UPC Regular, Candara Bold

Figure 40 The results of the suggested methods for Image 4.

It is very difficult to evaluate the results when the ground truth about the font is unknown. I think that in Example 1 all detected fonts are admissible. In Example 2 the methods FontNN and FontDynamic are good. In Example 3 I do not like the result of the method ClusterDynamic because of the different letter 'g'. I think all results are admissible in Example 4, but the best is FontNN.


6.2 An utilization of font knowledge to improve the OCR quality

The success of the algorithm is tested as follows. The standard OCR (S_OCR) quality is computed first. This means that a classifier is trained on a set of letters generated in all fonts, and the K nearest neighbours from this set are found for each blob. The OCR quality for each character c is computed as

$$Q_c = \frac{n_c}{K} \qquad (8)$$

where $n_c$ is the count of character $c$ among the $K$ nearest neighbours. The maximal quality $M_q$ and the most frequent character $M_c$ are found ($\Omega$ is the alphabet):

$$M_q = \max_{c \in \Omega} Q_c, \qquad M_c = \operatorname*{arg\,max}_{c \in \Omega} Q_c \qquad (9)$$

The standard OCR (S_OCR) returns $M_c$ as the final classification.

A standard OCR with unrecognized letters (U_OCR) classifies almost in the same way as S_OCR, but it rejects classifications of low quality. If M_q is greater than or equal to a threshold θ, the blob is classified as M_c; otherwise it is marked as an unrecognized letter.

The improved OCR (F_OCR) method takes the classified blobs from the S_OCR method. For each blob whose M_q is less than the threshold θ, the classification is done in a different way: the font of the whole word is detected and the nearest neighbour classifier is trained only on this font (the class of the font). For each blob with M_q < θ, one nearest neighbour is found and the blob is reclassified as the character found among the nearest neighbours.

The final OCR qualities are computed as the number of correctly recognized blobs divided by the number of all blobs.

6.2.1 DATASET 1 – computer-generated images

In this dataset 3240 words were computer-generated. There are 20 different English words in 162 fonts.


θ                                        0.5    0.6    0.7    0.8    0.9    1.0    F_OCR
All methods, S_OCR quality [%]           84.4   84.4   84.4   84.4   84.4   84.4   84.4
All methods, U_OCR quality [%]           80.3   76.1   73.3   69.6   65.4   57.4   0.0
FontNN, F_OCR quality [%]                88.9   92.3   93.6   94.5   95.8   96.0   95.4
FontDynamic, F_OCR quality [%]           88.2   91.0   92.1   93.0   94.4   94.5   94.0
ClusterNN, F_OCR quality [%]             85.1   88.4   88.1   87.8   87.3   85.8   82.0
ClusterDynamic, F_OCR quality [%]        85.9   87.2   87.1   86.8   86.4   85.1   82.1
ClusterPerCharacter, F_OCR quality [%]   89.0   92.3   93.6   94.5   95.8   95.9   95.9
The best improvement [%]                 4.6    7.9    9.2    10.1   11.4   11.6   11.5

We have observed an improvement for all methods when testing the computer-generated images with typed words. The most successful method is ClusterPerCharacter, followed by FontNN. The best improvement of 11.6% is achieved with θ = 1.0. This means that only classifications with quality 1.0 are taken from S_OCR and the rest is classified by F_OCR. The influence of F_OCR increases with θ, because more blobs are classified by F_OCR. It is a success for F_OCR that the improvement increases with θ.

6.2.2 DATASET 2 – all 132 real-scene images

The methods successful on the computer-generated data have been tested on the real-scene images.

θ                                        0.1    0.2    0.3    0.4
All methods, S_OCR quality [%]           65.5   65.5   65.5   65.5
All methods, U_OCR quality [%]           65.5   65.2   64.6   62.5
FontNN, F_OCR quality [%]                65.5   65.5   65.5   65.2
FontDynamic, F_OCR quality [%]           65.5   65.5   65.5   65.5
ClusterNN, F_OCR quality [%]             65.5   65.6   65.7   65.1
ClusterDynamic, F_OCR quality [%]        65.5   65.5   65.6   65.1
ClusterPerCharacter, F_OCR quality [%]   65.5   65.5   65.7   65.6
Best improvement, F_OCR quality [%]      0.0    0.1    0.2    0.1

As can be seen, the best improvement across all methods is zero or very small. The methods are unusable as an improvement of S_OCR on the whole dataset. They can only be used as a confirmation of the classifications with low quality (lower than θ) which the U_OCR method rejects. However, it has been found that the selection of images can greatly affect the final results. We were able to find a set of images for each method where the method is very successful; in some images the quality has been improved by F_OCR by up to about 40%.

A new subset of images from the whole DATASET 2 has been made for each method. Each subset includes the images whose improvement by that method was at least 10% for some θ.

6.2.3 DATASET 2a – method FontNN

This dataset contains 17 images.

θ                                        0.5    0.6    0.7    0.8    0.9
All methods, S_OCR quality [%]           50.0   50.0   50.0   50.0   50.0
FontNN, F_OCR quality [%]                56.7   59.7   61.0   62.1   61.5
FontDynamic, F_OCR quality [%]           56.7   58.5   59.7   61.0   60.3
ClusterNN, F_OCR quality [%]             55.6   57.2   58.2   58.2   58.5
ClusterDynamic, F_OCR quality [%]        55.4   56.9   58.2   58.7   59.2
ClusterPerCharacter, F_OCR quality [%]   53.3   55.4   56.2   55.9   56.4
The best improvement [%]                 6.7    9.7    11.0   12.1   11.5

6.2.4 DATASET 2b – method FontDynamic

This dataset contains 21 images.

θ                                        0.5    0.6    0.7    0.8    0.9
All methods, S_OCR quality [%]           45.9   45.9   45.9   45.9   45.9
FontNN, F_OCR quality [%]                51.6   53.4   54.4   54.9   53.1
FontDynamic, F_OCR quality [%]           52.9   55.4   57.9   59.4   58.4
ClusterNN, F_OCR quality [%]             51.6   52.9   53.6   53.4   52.6
ClusterDynamic, F_OCR quality [%]        51.4   52.6   52.9   53.1   52.6
ClusterPerCharacter, F_OCR quality [%]   48.4   49.9   50.6   50.9   50.6
The best improvement [%]                 3.0    5.5    8.0    9.5    8.5

6.2.5 DATASET 2c – method ClusterNN

This dataset contains 10 images.


θ                                        0.5    0.6    0.7    0.8    0.9
S_OCR quality (all methods) [%]         25.4   25.4   25.4   25.4   25.4
FontNN F_OCR quality [%]                32.6   33.5   35.2   36.9   36.9
FontDynamic F_OCR quality [%]           33.5   34.3   36.0   37.7   37.7
ClusterNN F_OCR quality [%]             34.7   36.0   38.6   40.3   42.4
ClusterDynamic F_OCR quality [%]        33.9   35.2   37.3   40.0   39.8
ClusterPerCharacter F_OCR quality [%]   29.7   30.5   30.9   30.9   31.8
Best improvement [%]                     9.3   10.6   13.2   14.9   17.0

6.2.6 DATASET 2d – method ClusterDynamic

This dataset contains 12 images.

θ                                        0.5    0.6    0.7    0.8    0.9
S_OCR quality (all methods) [%]         45.0   45.0   45.0   45.0   45.0
FontNN F_OCR quality [%]                49.6   52.2   53.7   54.0   53.7
FontDynamic F_OCR quality [%]           50.1   51.9   53.5   54.0   53.5
ClusterNN F_OCR quality [%]             50.1   52.7   55.0   55.3   56.3
ClusterDynamic F_OCR quality [%]        51.4   54.2   56.8   57.8   59.4
ClusterPerCharacter F_OCR quality [%]   48.3   49.1   49.9   50.1   50.9
Best improvement [%]                     6.4    9.2   11.8   12.8   14.4

6.2.7 DATASET 2e – method ClusterPerCharacter

This dataset contains 13 images.

θ                                        0.5    0.6    0.7    0.8    0.9
S_OCR quality (all methods) [%]         59.6   59.6   59.6   59.6   59.6
FontNN F_OCR quality [%]                61.9   64.2   66.2   66.5   65.4
FontDynamic F_OCR quality [%]           61.9   62.7   63.8   64.2   63.5
ClusterNN F_OCR quality [%]             56.9   51.2   49.2   44.2   41.5
ClusterDynamic F_OCR quality [%]        56.9   51.2   49.2   44.2   41.5
ClusterPerCharacter F_OCR quality [%]   63.5   66.5   69.6   69.6   70.8
Best improvement [%]                     3.9    6.9   10.0   10.0   11.2

As can be seen, for some subsets of images the improvement by F_OCR reaches about 10 %.


6.2.8 Examples

Examples with an improvement

Figure 41 Examples of images where the suggested FontDynamic method helped: a) ALKOHOL, b) ZAPAD.

The word ALKOHOL in figure 41a was improved by the FontDynamic method by 43 %. Only 1 character was recognized by the S_OCR method, whereas the F_OCR method recognized 4 characters out of 7. The font was detected as Impact Regular.

True label                A  L  K  O  H  O  L
Classification by S_OCR   1  i  K  0  N  0  i
Classification by F_OCR   A  L  K  o  u  o  L

Figure 42 Classification by S_OCR and its improvement by F_OCR.

Figure 43 Example shown in figure 41a with the word ALKOHOL: a) original image, b) detected font Impact Regular.

The word ZAPAD in figure 41b was improved by the FontDynamic method by 20 %. Three characters were correctly recognized by the S_OCR method; the F_OCR method recognized 1 more character out of the total of 5. The font was detected as Fish Fingers Regular.

True label                Z  A  P  A  D
Classification by S_OCR   Z  A  F  A  J
Classification by F_OCR   Z  A  n  A  D

Figure 44 Classification by S_OCR and its improvement by F_OCR.

Figure 45 Example shown in figure 41b with the word ZAPAD: a) original image, b) detected font Fish Fingers Light.

Examples with no improvement

Figure 46 a) Original image, b) detected font Impact Regular.

True label                R  E  S  T  A  U  R  A  C  E
Classification by S_OCR   8  i  C  1  A  L  i  t  r  t
Classification by F_OCR   B  L  9  1  b  L  B  j  C  D

Figure 47 Classification by the S_OCR and F_OCR methods.

40 6.2 An utilization of font knowledge to improve the OCR quality

The text in the figure is curved and, moreover, is written in a specific font which we do not have in the training set, nor anything similar to it. Training the classifier with rotated samples could perhaps help. Only 1 of the 10 characters was recognized by S_OCR, and likewise only 1 character was recognized by F_OCR.

Figure 48 Examples of words well recognized by the S_OCR method: a) MODERNI, b) Jecmen.

In the two examples 48a and 48b there was nothing to improve, because all characters of the words MODERNI and Jecmen were recognized correctly by the S_OCR method. In image 48a we can even see a deterioration caused by the classification of the F_OCR method.

True label                J  e  c  m  e  n
Classification by S_OCR   J  e  c  m  e  n
Classification by F_OCR   J  e  c  m  e  n

Figure 49 Classification by the S_OCR and F_OCR methods.

True label                M  O  D  E  R  N  I
Classification by S_OCR   M  O  D  E  R  N  I
Classification by F_OCR   V  O  O  E  R  N  I

Figure 50 Classification by the S_OCR and F_OCR methods.


Examples where the methods make errors

Figure 51 Logitech written in the detected fonts: a) original image, b) recognized fonts. From top to bottom: original image, FontNN – Fish Fingers Regular, FontDynamic – Print Regular, ClusterNN = ClusterDynamic – Calibri Bold, ClusterPerCharacter – Candara Regular.

In this example all methods are unsuccessful. The FontNN and FontDynamic methods determined an entirely different font and correctly classified only 2 characters out of 8. The fonts found by the remaining methods differ mainly in the ‘g’ character; these methods correctly classified all characters except ‘g’ and ‘i’.

Figure 52 Samsung written in the detected fonts: a) original image, b) recognized fonts. From top to bottom: original image, FontNN = FontDynamic – Euphemia Regular, ClusterNN – KaiTi Regular, ClusterDynamic – Cordia New Regular, ClusterPerCharacter (last 2 images) – Kalinga Bold, Console Regular.


In this example the detected fonts are very similar and differ only in the ‘g’ character. As expected, the character ‘g’ was correctly classified only by the ClusterDynamic and ClusterPerCharacter methods.

7 Implementation

7.1 Programming language

This work is programmed in Python 2.7. Spyder was used as the integrated development environment.

7.2 Used libraries

TextSpotter – real-time scene text localization and recognition [1]
SciPy – fundamental library for scientific computing [20]
Matplotlib – Python 2D plotting library [22]
NumPy – computing with multi-dimensional arrays and matrices [23]
OpenCV – computer vision functions [24]
FLANN – Fast Library for Approximate Nearest Neighbors [18]

7.3 The code

The class FontCluster contains methods for both types of clustering: the first clustering, described in section 4.1, is performed by the method make_clusters, and the second type, described in section 4.2, by the method make_clusters_of_char. For simple font or cluster recognition there is the class FontsDetector with the method detect_font. The dynamic approach is implemented in the class FontsDynDetector and is invoked through the methods find_shortest_path and detect_font_dyn. The algorithm described in section 5.3 is invoked through the methods classify_images, add_fonts_within_clusters and most_occ_font.
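The following sketch illustrates how these classes might be combined; the argument lists are assumptions, and only the class and method names come from the implementation:

    # Hypothetical driver combining the classes described above; the
    # argument lists and return values are assumptions, only the class
    # and method names come from the implementation.

    def recognize_word_font(word_image, detector, dyn_detector, dynamic=False):
        """Detect the font of one word image.

        detector is assumed to be a FontsDetector instance and
        dyn_detector a FontsDynDetector instance.
        """
        if dynamic:
            dyn_detector.find_shortest_path(word_image)     # dynamic approach
            return dyn_detector.detect_font_dyn(word_image)
        return detector.detect_font(word_image)             # simple recognition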

8 Conclusion

This thesis presents several algorithms for font recognition and for improving character recognition by using information about the detected font.

8.1 Font or cluster recognition

Two datasets of images have been used. The first, DATASET 1, consists of computer-generated images of the 20 most common English words typed in 162 fonts. The font recognition rate is 79.2 % for the FontNN method and 78.2 % for the FontDynamic method. In cluster recognition the ClusterNN and ClusterDynamic methods have not been very successful: for 25 clusters created from 162 fonts (6.48 fonts per cluster on average), the recognition accuracy has been only 44.6 % for ClusterNN and 47.1 % for ClusterDynamic. With an increasing number of clusters the recognition rate also moves upwards, but with a large number of clusters the methods coincide with FontNN and FontDynamic. The recognition rate of the ClusterPerCharacter method cannot be directly compared with the other methods, because it sometimes returns more than one font; the font is considered correctly determined if it is included in the set of returned fonts (a small sketch of this evaluation follows below). Its recognition rate is 93.6 %. The second dataset, DATASET 2, consists of 132 real-scene images (940 words). The font recognition rate cannot be evaluated for this dataset, because we have no information about the font in which the text is written.
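As an illustration, the recognition rate under this set-valued convention could be computed as follows; this is a sketch under assumed data types, not the thesis code:

    # Illustrative computation of the recognition rate; a prediction may be
    # a single font or, for ClusterPerCharacter, a set of candidate fonts.

    def recognition_rate(predictions, true_fonts):
        """Percentage of words whose true font matches the prediction."""
        correct = 0
        for pred, truth in zip(predictions, true_fonts):
            if isinstance(pred, (list, set, frozenset)):
                correct += truth in pred       # set-valued: membership counts
            else:
                correct += pred == truth       # single font: exact match
        return 100.0 * correct / len(true_fonts)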

8.2 An utilization of font knowledge to improve the OCR quality

We have also tested the suggested methods for improving the OCR quality using knowledge of the detected font. On the computer-generated DATASET 1 an improvement could be seen for all methods. The most successful have been ClusterPerCharacter and FontNN: compared to the standard OCR method they have reached an improvement of 11.6 %, raising the OCR quality from 84.4 % to 96.0 %. However, we have discovered that these methods are not generally applicable to the second dataset, DATASET 2, consisting of real-scene images; on the whole dataset there was no improvement, or only a very small one. Still, we have found that the methods are very successful for some images: for each method we are able to find a subset of 10 to 21 images where the improvement is about 10 %, and in some images even about 40 %. On the whole dataset the methods have failed either because the font was determined completely wrong, or because the letters were not recognized due to distortion and blurring. More features for font recognition, additional information about the blobs, and a bigger training set including rotation and distortion could perhaps help.

Appendix A

I attach examples of clustering for different within-cluster distances. The cluster creation is described in section 4.1.


A.1 The maximum distance between the letters within the cluster is 6.5

Figure 53 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.


A.2 The maximum distance between the letters within the cluster is 7.5

Figure 54 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.


A.3 The maximum distance between the letters within the cluster is 8

Figure 55 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.


A.4 The maximum distance between the letters within the cluster is 8.5

Figure 56 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.

Appendix B

I attach examples of clustering for different within-cluster distances. The cluster creation is described in section 4.2.


B.1 The maximum distance between the letters within the cluster is 5

Figure 57 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.


B.2 The maximum distance between the letters within the cluster is 6

Figure 58 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.


B.3 The maximum distance between the letters within the cluster is 8

Figure 59 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.


B.4 The maximum distance between the letters within the cluster is 8

Figure 60 Clusters of letters ‘a’, ‘Q’, ‘g’ and ‘E’ with their representatives.

Appendix C

C.1 Enclosed CD

I enclose a CD with my program, the datasets and this thesis in PDF format.

Bibliography

[1] Lukáš Neumann and Jiří Matas. TextSpotter. Real-Time Scene Text Localization and Recognition. 2012. url: http://www.textspotter.org/.

[2] A. Zramdini and R. Ingold. “Optical font recognition using typographical features”. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 20.8 (Aug. 1998), pp. 877–882. issn: 0162-8828. doi: 10.1109/34.709616.

[3] A. Zramdini and R. Ingold. “Optical font recognition from projection profiles”. In: Electronic Publishing 6.3 (Sept. 1993), pp. 249–260.

[4] A. Satkhozhina, I. Ahmadullin, and J. P. Allebach. “Optical Font Recognition using Conditional Random Field”. In: (Sept. 2013).

[5] Michael P. Cutter et al. Font group identification using reconstructed fonts. 2011. doi: 10.1117/12.873398. url: http://dx.doi.org/10.1117/12.873398.

[6] Michael Patrick Cutter et al. “Unsupervised Font Reconstruction Based on Token Co-occurrence”. In: Proceedings of the 10th ACM Symposium on Document Engineering. DocEng ’10. Manchester, United Kingdom: ACM, 2010, pp. 143–150. isbn: 978-1-4503-0231-9. doi: 10.1145/1860559.1860589. url: http://doi.acm.org/10.1145/1860559.1860589.

[7] L. Zhang, Yue Lu, and C.L. Tan. “Italic font recognition using stroke pattern analysis on wavelet decomposed word images”. In: Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on. Vol. 4. Aug. 2004, 835–838 Vol.4. doi: 10.1109/ICPR.2004.1333902.

[8] Siamak Khoubyari and Jonathan J. Hull. “Font and Function Word Identification in Document Recognition”. In: Computer Vision and Image Understanding 63.1 (1996), pp. 66–74. issn: 1077-3142. doi: 10.1006/cviu.1996.0005.


[9] Siamak Khoubyari and Jonathan J. Hull. “Keyword Location in Noisy Document Images”. In: Symp. on Document Analysis and Information Retrieval, Las Vegas, NV (Apr. 1993).

[10] Yong Zhu, Tieniu Tan, and Yunhong Wang. “Font recognition based on global texture analysis”. In: Pattern Analysis and Machine Intelligence, IEEE Transactions on 23.10 (Oct. 2001), pp. 1192–1200. issn: 0162-8828. doi: 10.1109/34.954608.

[11] Avilé-Cruz Carlos, Villegas-Cortes Juan, and J. Ocampo-Hidalgo. “A Robust Font Recognition Using Invariant Moments”. In: Proceedings of the 5th WSEAS International Conference on Applied Computer Science. ACOS’06. Hangzhou, China: World Scientific, Engineering Academy, and Society (WSEAS), 2006, pp. 114–117. isbn: 960-8457-43-2. url: http://dl.acm.org/citation.cfm?id=1973598.1973622.

[12] Hasan S. M. Al-Khaffaf et al. “On the performance of Decapod’s digital font reconstruction”. In: ICPR. IEEE, 2012, pp. 649–652. isbn: 978-1-4673-2216-4. url: http://dblp.uni-trier.de/db/conf/icpr/icpr2012.html#Al-KhaffafSCB12.

[13] M. Solli and R. Lenz. “A Font Search Engine for Large Font Databases”. In: Electronic Letters on Computer Vision and Image Analysis 10.1 (2011), pp. 24–41.

[14] J. T. Lidke. “Hierarchical Font Recognition. Letter Snippets - Visual Words in Font Recognition”. Diploma Thesis. Philipps University of Marburg, 2010.

[15] Serdar Ozturk, A. Toygar Abak, and Bulent Sankur. “Font clustering and cluster identification in document images”. In: Journal of Electronic Imaging 10.2 (2001), pp. 418–430. doi: 10.1117/1.1351820. url: http://dx.doi.org/10.1117/1.1351820.

[16] Lukáš Neumann. “Vyhledávání a rozpoznávání textu v obrazech reálných scén”. Master's thesis. ČVUT, 2010. url: http://cmp.felk.cvut.cz/~neumalu1/Neumann-thesis-2010.pdf.

[17] Lukáš Neumann and Jiří Matas. Scene Text Localization and Recognition with Oriented Stroke Detection. ICCV 2013, IEEE International Conference on Computer Vision, Sydney, Australia, 2013. url: http://cmp.felk.cvut.cz/~neumalu1/neumann-iccv2013.pdf.


[18] FLANN - Fast Library for Approximate Nearest Neighbors. url: http://www.cs.ubc.ca/research/flann/.

[19] Wikipedia. K-nearest neighbors algorithm — Wikipedia, The Free Encyclopedia. [Online; accessed 14-May-2014]. 2014. url: http://en.wikipedia.org/w/index.php?title=K-nearest_neighbors_algorithm&oldid=604493028.

[20] The SciPy community, ed. SciPy. 2008-2009. url: http://docs.scipy.org/doc/scipy/reference/.

[21] A. A. Puntambekar. Analysis of Algorithm and Design. Technical Publications Pune, 2009.

[22] The matplotlib development team, ed. Matplotlib. 2013. url: http://matplotlib.org/.

[23] Numpy developers, ed. NumPy. 2013. url: http://www.numpy.org/.

[24] Itseez, ed. OpenCV. 2014. url: http://opencv.org/.

60