A New Computational Method for Arabic Calligraphy Style Representation and Classiﬁcation

Home , Arabic calligraphy, Diwani, Naskh (script), Thuluth

applied sciences

Article A New Computational Method for Arabic Calligraphy Style Representation and Classiﬁcation

Zineb Kaoudja 1,*, Mohammed Lamine Kherﬁ 1,2 and Belal Khaldi 3

1 Lab or Artiﬁcial Intelligence and Data Science, Kasdi Merbah Ouargla University, Ouargla BP. 511, Algeria; kherﬁ[email protected] 2 LAMIA Laboratory, University du Québec à Trois-Rivières, Trois-Rivières, QC G8Z 4M3, Canada 3 LINATI Laboratory, University of Ouargla-Kasdi Merbah, Ouargla BP. 511, Algeria; [email protected] * Correspondence: [email protected]

Abstract: Despite the importance of recognizing Arabic calligraphy styles and their potential useful- ness for many applications, a very limited number of Arabic calligraphy style recognition works have been established. Thus, we propose a new computational tool for Arabic calligraphy style recognition (ACSR). The present work aims to identify Arabic calligraphy style (ACS) from images where text images are captured by different tools from different resources. To this end, we were inspired by the indices used by human experts to distinguish different calligraphy styles. These indices were transformed into a descriptor that defines, for each calligraphy style, a set of specific features. Three scenarios have been considered in the experimental part to prove the effectiveness of the proposed tool. The results confirmed the outperformance of both individual and combine features coded by our descriptor. The proposed work demonstrated outstanding performance, even with few training samples, compared to other related works for Arabic calligraphy recognition.

Citation: Kaoudja, Z.; Kherﬁ, M.L.; Keywords: document image analyzing; Arabic calligraphy; feature extraction; machine learning Khaldi, B. A New Computational Method for Arabic Calligraphy Style Representation and Classiﬁcation. Appl. Sci. 2021, 11, 4852. https:// 1. Introduction doi.org/10.3390/app11114852 Document Analysis (DA) is one of the computer vision applications. It is a discipline that combines image processing and machine learning techniques to process and extract Academic Editor: information from document images of different text types such as digits or alphabet. DA Antonio Fernández-Caballero attempts to extract the layout from a scanned document page and then reuse it to generate a new visually similar document with other content. Therefore, it must be capable of Received: 16 April 2021 identifying writing characteristics, such as the style used in writing, and employ it to Accepted: 21 May 2021 Published: 25 May 2021 reproduce similar documents with the same style. DA techniques have been employed in many applications such as document layout

Publisher’s Note: MDPI stays neutral analysis [1], signature verification [2], writer identification [3], optical character recognition with regard to jurisdictional claims in (OCR), optical Font recognition (OFR), etc. For instance, OFR aims to define the text writing published maps and institutional afﬁl- style from a document image. Based on the purpose of the writing and the source of the iations. written text, text writing can be classified into two categories, as shown in Figure1. The ﬁrst one is texts written by humans, which in turn comprises two sub-categories namely ordinary handwriting and calligraphy. The second one is machine-printed texts that may be written using common styles. Thus, text produced by machine or handwritten calligraphy uses well-known styles. Writing styles used by a machine are referred to as Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. fonts, whereas calligraphy refers to the set of styles that might be used in handwriting This article is an open access article texts. Fonts and calligraphy also differ in the purpose of use, calligraphy is mostly used for distributed under the terms and decoration such as in mosque walls, or in artists’ paintings, whereas fonts are mostly used conditions of the Creative Commons for documentation and communication. Fonts are far less complicated than calligraphy. Attribution (CC BY) license (https:// They may provide alternatives, ligatures, and some stylized versions, but fundamentally, creativecommons.org/licenses/by/ they can only be as creative as type designers. When writing letters by hand, there are no 4.0/). limitations to how to write them. Every version of a letter can be written differently. Letters

Appl. Sci. 2021, 11, 4852. https://doi.org/10.3390/app11114852 https://www.mdpi.com/journal/applsci Appl. Sci. 2021, 11, x FOR PEER REVIEW 2 of 17

Appl. Sci. 2021, 11, 4852 2 of 17 sions, but fundamentally, they can only be as creative as type designers. When writing letters by hand, there are no limitations to how to write them. Every version of a letter can be written differently. Letters may be merged or connected even though they are not may be merged or connected even though they are not adjacent. Consequently, fonts do adjacent. Consequently, fonts do not offer flexibility compared to handwriting. Such not offer flexibility compared to handwriting. Such flexibility makes the recognition of flexibility makes the recognition of calligraphy or reproduction of text using similar cal- calligraphy or reproduction of text using similar calligraphy styles an extremely difficult ligraphy styles an extremely difficult task for a DA system. In this work, we aim to pro- task for a DA system. In this work, we aim to propose a tool that is able to classify Arabic pose a tool that is able to classify Arabic calligraphy styles. calligraphy styles.

FigureFigure 1. 1. TheThe different different categories of text writingwriting styles.styles. OurOur workwork fallsfallsin inthe thehighlighted highlighted category category of ofcalligraphy calligraphy text. text.

CalligraphyCalligraphy style style recognition recognition (CSR) (CSR) is is considered an important research area for the followingfollowing reasons: reasons: (1) (1) It It helps helps to to read read the te textxt content. In other words, knowing the used stylestyle reflects reflects the rules used in the writing of different characters, which in turn eases the readingreading task. task. (2) (2) It It helps helps to to recognize the di differentfferent parts of a document.document. Some styles are defineddefined to to represent represent specific specific parts in in the the docume documentnt such such as as titles, titles, footers, footers, paragraphs, paragraphs, etc. etc. (3)(3) It It also also helps helps to to grasp grasp the the history history of of a a docu document.ment. In In paleography, paleography, CSR is used to define define thethe era era in in which which the the document document was was written written becaus becausee styles appeared in different eras. (4) It alsoalso helps helps to to define define the the origins origins (i.e., (i.e., geographical geographical area) area) in in which which a a document document was was written. written. (5)(5) It can be used for tutortutor purposes.purposes. Callig Calligraphyraphy learners use CSR systems to judge the quality of their writings in cases of experts’ absence. ArabicArabic is is the the fourth fourth most most spoken spoken language language in in the the world, world, and and it is it isthe the first first language language of moreof more than than 200 200 million million people people across across the the world. world. Arabic Arabic script script is is the the third third most most widely usedused script script after Latin script [4]. [4]. Similar Similar to to other other languages, languages, it it has has grammar, grammar, spelling, punctuation rules, pronunciation, slang, and idioms. Several characteristics beyond the meremere differences between languages make Arab Arabicic distinctive, including the number of variations and and the the written form. Arabic Arabic handwritten handwritten text text is is divided divided into into two two main main parts: parts: artisticartistic writing called calligraphycalligraphy andand non-artisticnon-artistic (handwritten (handwritten or or printed) printed) scripts. scripts. Arabic Ara- biccalligraphy calligraphy (AC) (AC) is one is one of the of mostthe most significant signific artsant inarts the in world. the world. It is written It is written only byonly hand. by hand.The first The style first of style calligraphy of calligraphy was developed was developed at the end at the of the end 7th of century. the 7th Overcentury. time, Over the Muslim and Arab world developed many other styles of calligraphy. Due to the complex time, the Muslim and Arab world developed many other styles of calligraphy. Due to the shape of the Arabic text, one must hold a certain level of expertise to be able to write AC complex shape of the Arabic text, one must hold a certain level of expertise to be able to texts or recognize a writing style. write AC texts or recognize a writing style. Before delving into Arabic calligraphy systems, we must shed some light on Arabic Before delving into Arabic calligraphy systems, we must shed some light on Arabic font recognition (AOFR) and how it differs from calligraphy recognition. On the one hand, font recognition (AOFR) and how it differs from calligraphy recognition. On the one the AOFR domain has received a lot of attention for several reasons: (a) the availability hand, the AOFR domain has received a lot of attention for several reasons: (a) the avail- of public big datasets, (b) the ease of dealing with Arabic fonts due to their fixed shapes, ability of public big datasets, (b) the ease of dealing with Arabic fonts due to their fixed (c) and the considerable amount of related works [5,6] on the subject. On other hand, far shapes, (c) and the considerable amount of related works [5,6] on the subject. On other less attention has been given to Arabic calligraphy style recognition (ACSR). However, hand, far less attention has been given to Arabic calligraphy style recognition (ACSR). for the other languages, several works have been made to this end such as the work [7] However, for the other languages, several works have been made to this end such as the in which topological features have been proposed to recognize Hebrew handwriting workstyles. [7] Nonetheless, in which topological the complexity features of calligraphy have been stylesproposed makes to itrecognize possible toHebrew use only hand- two writingcharacters, styles. the characterNonetheless, Aleph the and complexity the character of calligraphy Lamed. Other styles attempts makes haveit possible been carriedto use onlyout for two Chinese characters, style the recognition character inAleph [8], proposing and the character a feature Lamed. extraction Other method attempts based have on the regional guided filter (RGF) with reference images, which is generated by K-Nearest Appl.Appl.Appl. Sci.Sci. Sci. 20212021 2021,, 11,11 11,, x,x x FORFOR FOR PEERPEER PEER REVIEWREVIEW REVIEW 33 3 of ofof 17 1717 Appl. Sci. 2021, 11, x FOR PEER REVIEW 3 of 17

beenbeenbeen carriedcarriedcarried outoutout forforfor ChineseChineseChinese stylestylestyle recognitionrecognitionrecognition ininin [8],[8],[8], proposingproposingproposing aaa featurefeaturefeature extractionextractionextraction been carried out for Chinese style recognition in [8], proposing a feature extraction Appl. Sci. 2021, 11, 4852 methodmethodmethod based basedbased on onon the thethe regional regionalregional guided guidedguided filter filterfilter (RGF) (RGF)(RGF) with withwith reference referencereference images, images,images, which whichwhich is isis3 gen- gen-ofgen- 17 method based on the regional guided filter (RGF) with reference images, which is gen- eratederatederated byby by K-NearestK-Nearest K-Nearest NeighborNeighbor Neighbor (KNN)(KNN) (KNN) mattingmatting matting andand and usedused used asas as thethe the inputinput input imageimage image forfor for RGF,RGF, RGF, andand and erated by K-Nearest Neighbor (KNN) matting and used as the input image for RGF, and ininin [9] [9][9] using usingusing deep deepdeep features featuresfeatures with withwith Support SupportSupport Vector VectorVector Machine MachineMachine (SVM) (SVM)(SVM) and andand Neural NeuralNeural Network NetworkNetwork in [9] using deep features with Support Vector Machine (SVM) and Neural Network (NN)(NN)(NN) classifiers.classifiers. classifiers. Neighbor(NN) classifiers. (KNN) matting and used as the input image for RGF, and in [9] using deep DespiteDespiteDespite thethe the laborlabor labor inin in otherother other languageslanguages languages toto to findfind find aa a commoncommon common solutionsolution solution forfor for stylestyle style recognition,recognition, recognition, featuresDespite with the Support labor Vectorin other Machine languages (SVM) to find and a Neural common Network solution (NN) for style classifiers. recognition, thesethesethese featuresfeatures features areare are notnot not efficientefficient efficient andand and applicableapplicable applicable forfor for ArabicArabic Arabic duedue due toto to theirtheir their nature.nature. nature. ArabicArabic Arabic cal-cal- cal- theseDespite features the are labor not inefficient other languagesand applicable to find for a commonArabic due solution to their for nature. style recognition, Arabic cal- ligraphyligraphyligraphy textstexts texts areare are extremelyextremely extremely smooth,smooth, smooth, interconneinterconne interconnected,cted,cted, andand and havehave have aa a uniqueunique unique characteristiccharacteristic characteristic inin in theseligraphy features texts areare not extremely efficient smooth, and applicable interconne forcted, Arabic and due have to their a unique nature. characteristic Arabic callig- in theirtheirtheir form. form.form. Arabic ArabicArabic has hashas 28 2828 letters; letters;letters; each eacheach one oneone has hashas isolated isolatedisolated and andand connected connectedconnected forms formsforms (at (at(at the thethe bebe-be- raphytheir form. texts areArabic extremely has 28 smooth, letters; interconnected,each one has isolated and have and a uniqueconnected characteristic forms (at inthe their beginning,ginning,ginning, center, center,center, and andand end), end),end), as asas illustrated illustratedillustrated in inin Figure FigureFigure 2. 2.2. Another AnotherAnother difficulty difficultydifficulty in inin Arabic ArabicArabic callig- callig-callig- form.ginning, Arabic center, has and 28 letters; end), as each illustrated one has isolatedin Figure and 2. Another connected difficulty forms (at in the Arabic beginning, callig- raphyraphyraphy isis is thethe the possibilitypossibility possibility ofof of lettersletters letters overlappinoverlappin overlappinggg oneone one aboveabove above thethe the other,other, other, whichwhich which isis is notnot not thethe the mattermatter matter center,raphy is and the end), possibility as illustrated of letters in overlappin Figure2. Anotherg one above difficulty the other, in Arabic which calligraphy is not the matter is the withwithwith otherother other languages.languages. languages. AdditionAddition Additionally,ally,ally, thethe the shapesshapes shapes ofof of lettersletters letters areare are affectedaffected affected byby by bothboth both thethe the writingwriting writing possibilitywith other of languages. letters overlapping Addition oneally, above the shapes the other, of letters which are is notaffected the matter by both with the other writing lan- andandand the thethe writer’s writer’swriter’s style, style,style, and andand alsoalso also by byby the thethe purpose purposepurpose of ofof the thethe writing writingwriting itselfitself itself (e.g., (e.g.,(e.g., for forfor decoration, decoration,decoration, guages.and the Additionally,writer’s style, the and shapes also ofby lettersthe purpose are affected of the by writing both the itself writing (e.g., and for thedecoration, writer’s documentation,documentation,documentation, oror or otherother other purposes).purposes). purposes). style,documentation, and also by or the other purpose purposes). of the writing itself (e.g., for decoration, documentation, or other purposes).

((a(aa)) ) ((b(bb)) ) (a) (b) FigureFigureFigure 2.2. 2. ComponentsComponents Components ofof of thethe the ArabicArabic Arabic language.language. language. (( aa(a)) )TheThe The alphabetalphabet alphabet consistingconsisting consisting ofof of 2828 28 characterscharacters characters andand and (( bb(b)) ) Figure 2. Components of the Arabic language. (a) The alphabet consisting of 28 characters and (b) Figurethethethe differentdifferent different 2. Components formsforms forms ofof of thethe the of samesame thesameArabic charactecharacte characte language.rr r basedbased based onon on (a thethe )the The positionposition position alphabet inin in thethe the consisting word.word. word. of 28 characters and (theb) the different different forms forms of the of the same same characte characterr based based on onthe the position position in the in the word. word. InInIn this thisthis paper, paper,paper, we wewe propose proposepropose a aa robust robustrobust solution solutionsolution for forfor AC ACAC style stylestyle recognition recognitionrecognition from fromfrom images. images.images. InIn thisthis paper,paper, wewe proposepropose aa robustrobust solutionsolution forfor ACAC stylestyle recognitionrecognition fromfrom images.images. OurOurOur proposalproposal proposal consistsconsists consists ofof of twotwo two mainmain main phases:phases: phases: (1(1 (1)) ) featurefeature feature extractionextraction extraction fromfrom from ACAC AC imagesimages images andand and (2)(2) (2) OurOur proposalproposal consistsconsists ofof twotwo mainmain phases:phases: (1)(1) featurefeature extractionextraction fromfrom ACAC imagesimages andand (2)(2) classificationclassificationclassification through throughthrough decision decisiondecision fusion. fusion.fusion. The TheThe extracted extractedextracted features, features,features, which whichwhich represent representrepresent the thethe dif- dif-dif- classificationclassification through through decision decision fusion. fusion. The The extracted extracted features, features, which which represent represent the different the dif- ferentferentferent morphologiesmorphologies morphologies ofof of AC,AC, AC, havehave have beenbeen been usedused used forfor for thethe the firstfirst first timetime time inin in thisthis this workwork work andand and havehave have beenbeen been morphologiesferent morphologies of AC, haveof AC, been have used been for used the first for the time first in this time work in this and work have and been have inspired been inspiredinspiredinspired byby by thethe the characteristicscharacteristics characteristics expertsexperts experts (calligraphers)(calligraphers) (calligraphers) useuse use forfor for recognizingrecognizing recognizing ACAC AC styles.styles. styles. SinceSince Since byinspired the characteristics by the characteristics experts (calligraphers)experts (calligraphers) use for use recognizing for recognizing AC styles. AC styles. Since eachSince eacheacheach feature featurefeature is isis dedicated dedicateddedicated to toto distinguishing distinguishingdistinguishing on ononeee or oror more moremore styles, styles,styles, multiple multiplemultiple classifiers classifiersclassifiers have havehave featureeach feature is dedicated is dedicated to distinguishing to distinguishing one or on moree or styles,more styles, multiple multiple classifiers classifiers have beenhave beenbeenbeen engaged engagedengaged in inin the thethe process processprocess of ofof recognition recognitionrecognition in inin which whichwhich the thethe decisions decisionsdecisions are areare fusedfused fused to toto obtain obtainobtain a aa engagedbeen engaged in the in process the process of recognition of recognition in which in which the decisions the decisions are fused are fused to obtain to obtain a final a finalfinalfinal decision.decision. decision. ItIt It shouldshould should bebe be mentionedmentioned mentioned thatthat that nonenone none ofof of thethe the formerformer former worksworks works onon on ACAC AC recognitionrecognition recognition decision.final decision. It should It should be mentioned be mentioned that nonethat none of the of former the former works works on AC onrecognition AC recognition has hashashas implicatedimplicated implicated suchsuch such aa a widewide wide varietyvariety variety ofof of stylesstyles styles (nine(nine (nine styles)styles) styles) asas as wewe we diddid did inin in thisthis this workwork work (Table(Table (Table 2).2). 2). implicatedhas implicated such such a wide a wide variety variety of styles of styles (nine (nine styles) styles) as we as did we in did this in workthis work (Table (Table 2). Our 2). OurOurOur proposalproposal proposal achievedachieved achieved stst state-of-the-artate-of-the-artate-of-the-art andand and alsoalso also providedprovided provided stabilitystability stability againstagainst against scalescale scale variationvariation variation proposalOur proposal achieved achieved state-of-the-art state-of-the-art and alsoand providedalso provided stability stability against against scale scale variation variation and andandand learninglearning learning fromfrom from aa a fewfew few samples.samples. samples. learningand learning from from a few a samples.few samples. TheTheThe remainderremainder remainder ofof of thisthis this paperpaper paper isis is organizedorganized organized asas as follows.follows. follows. InIn In SectionSection Section 2,2, 2, wewe we clarifyclarify clarify ArabicArabic Arabic TheThe remainderremainder ofof thisthis paperpaper isis organizedorganized asas follows.follows. InIn SectionSection2 ,2, we we clarify clarify Arabic Arabic writingwritingwriting characteristics, characteristics,characteristics, styles, styles,styles, the thethe writing writingwriting complexity complexitycomplexity of ofof each eacheach style, style,style, and andand the thethe reasons reasonsreasons we wewe writingwriting characteristics,characteristics, styles,styles, thethe writingwriting complexitycomplexity ofof eacheach style,style, andand thethe reasonsreasons wewe considerconsiderconsider ninenine nine styles.styles. styles. SectionSection Section 33 3 listslists lists andand and discussesdiscusses discusses highlyhighly highly relatedrelated related worksworks works inin in thethe the fieldfield field ofof of ACAC AC considerconsider ninenine styles.styles. SectionSection3 3 lists lists and and discusses discusses highly highly related related works works in in the the field field of of AC AC stylestylestyle Recognition.Recognition. Recognition. Thereafter,Thereafter, Thereafter, wewe we introduceintroduce introduce ourour our proposalproposal proposal inin in SectionSection Section 44 4 andand and experimentallyexperimentally experimentally stylestyle Recognition.Recognition. Thereafter,Thereafter, wewe introduceintroduce ourour proposalproposal inin SectionSection4 4and and experimentally experimentally evaluateevaluateevaluate itit it inin in SectionSection Section 5.5. 5. Finally,Finally, Finally, wewe we inferinfer infer aa a generalgeneral general conclusionconclusion conclusion andand and somesome some perspectives.perspectives. perspectives. evaluateevaluate itit inin SectionSection5 5.. Finally, Finally, we we infer infer a a general general conclusion conclusion and and some some perspectives. perspectives. 2.2.2. Characteristics Characteristics Characteristics ofof of ArabicArabic Arabic HandwrittenHandwritten Handwritten CalligraphyCalligraphy Calligraphy andand and ItsIts Its ComplexityComplexity Complexity 2. Characteristics of Arabic Handwritten Calligraphy and Its Complexity ArabicArabicArabic consistsconsists consists ofof of 1717 17 mainmainmain main charactercharactercharacter character forms.formform forms.s.s. WithWith With thethethe the additionadditionaddition addition ofofof of dotsdotsdots dots placedplacedplaced placed aboveaboveabove above Arabic consists of 17 main character forms. With the addition of dots placed above ororor belowbelow below certaincertain certain ofof of them,them, them, itit it providesprovides provides aa a totaltotaltotal total ofofof of 282828 28 letters.letters.letters. letters. AsAs As shownshown shown inin in (Figure(Figure (Figure 2 2)2) )2) above, above,above, above, or below certain of them, it provides a total of 28 letters. As shown in (Figure 2) above, aa a,( ,(,(,(ﺏﺏ)ﺏ )) thethethe samesame same letterletter letter shapeshape shape can can can form form form a a a “b” “b” “b” or or or “baa” “baa” “baa” sound sound sound when when when one one one dot dot dot is is is placed placed placed below belowbelow a ,(ﺏ) the same letter shape can form a “b” or “baa” sound when one dot is placed below oror or aa aa “th”“th” “th”“th” oror oror “thaa”“thaa” “thaa” sound sound sound whenwhenwhen when ,(,(,(ﺕﺕ)ﺕ )) ) a“t”“t”“t” “t” oror or or “taa”“taa” “taa” “taa” soundsound sound whenwhen when twotwo twotwo dotsdots dots are areare placed placed placed aboveabove above or a “th” or “thaa” sound when ,(ﺕ) t” or “taa” sound when two dots are placed above“ ShortShort Short vowelsvowels vowelsvowels areare areare notnot notnot includedincluded includedincluded in in in the the the alphabet alphabet alphabet butbut but insteadinstead instead .(.(.(.(ﺙﺙ)ﺙ )) ) threethreethree dotsdotsdots dots areareare are added addedadded added above aboveabove above Short vowels are not included in the alphabet but instead .(ﺙ) three dots are added above indicated by signs (diacritics) placed above or below the consonant or long vowel that they follow. The Arabic writing system is a Right to Left (RTL), which means words are written and read in RTL way as in Hebrew, Persian, and Urdu. Texts are written in a cursive way where letters must be interconnected unlike other languages, such as English or French, Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 17

Appl. Sci. 2021, 11, x FOR PEER REVIEW 4 of 17 indicated by signs (diacritics) placed above or below the consonant or long vowel that

Appl. Sci. 2021, 11, x FOR PEER REVIEWthey follow. The Arabic writing system is a Right to Left (RTL), which means words4 ofare 17 written and read in RTL way as in Hebrew, Persian, and Urdu. Texts are written in a Appl. Sci. 2021, 11, x FOR PEER REVIEWindicated by signs (diacritics) placed above or below the consonant or long vowel4 of that 17 cursive way where letters must be interconnected unlike other languages, such as English orthey French, follow. where The connectingArabic writing letters system is a matter is a Right of choice to Left rather (RTL), than which obligation. means Onewords must are alsowrittenindicated know and thatby read signs a letter in (diacritics) RTL in Arabic way placedas can in beHebrew, abovewritten or Persian, in below three andthedifferent consonantUrdu. forms Texts or based arelong written on vowel its loca- inthat a they follow. The Arabic writing system is a Right to Left (RTL), which means words are indicatedtioncursive or syllable way by wheresigns in the (diacritics) letters word must (beginning, placed be interconne above middl orctede, belowor unlikeat the the end)other consonant as languages, shown or in long suchFigure vowel as 2. English that written and read in RTL way as in Hebrew, Persian, and Urdu. Texts are written in a theyor French, follow. where The Arabic connecting writing letters system is a matteris a Right of choiceto Left rather (RTL), than which obligation. means words One must are cursive way where letters must be interconnected unlike other languages, such as English writtenArabicalso know Calligraphy and that read a letterStylesin RTL in wayArabic as canin Hebrew,be written Persian, in three and different Urdu. formsTexts basedare written on its loca-in a or French, where connecting letters is a matter of choice rather than obligation. One must Appl. Sci. 2021, 11, 4852 cursivetion ACor waysyllable consists where in of the letters two word main must (beginning, styles be interconne (Kufi middl andcted e,Naskh), or unlike at the each other end) style languages,as shownhas several in such Figure variants, as English 2. 4 of as 17 also know that a letter in Arabic can be written in three different forms based on its loca- orwell French, as region-specific where connecting styles. letters In the is historya matter of of Arabic choice calligraphy,rather than obligation.there were One six mustmain Arabiction or Calligraphy syllable in Stylesthe word (beginning, middle, or at the end) as shown in Figure 2. alsostyles know namely that Kufi,a letter Diwani, in Arabic Farisi, can beNaskh, written Rekaa, in three and different Thuluth forms each basedof which on its has loca- its tioncharacteristics orAC syllable consists and in theof writing two word main rules.(beginning, styles Over (Kufi time, middl and overe, Naskh),or 100 at the different each end) style as styles shown has appeared several in Figure variants, and 2. iden- as whereArabic connectingCalligraphy lettersStyles is a matter of choice rather than obligation. One must also know well as region-specific styles. In the history of Arabic calligraphy, there were six main tifiedthat a driven letter in from Arabic the can former be written main in styles. three differentNine distinctive forms based categories on its location of cursive or syllable styles stylesAC namely consists Kufi, of twoDiwani, main Farisi, styles (KufiNaskh, and Rekaa, Naskh), and each Thuluth style haseach several of which variants, has itsas Arabicevolved,in the wordCalligraphy becoming (beginning, Styles ever middle,more refined or at theand end) elegant, as shown some inof Figurewhich2 are. still used and some well as region-specific styles. In the history of Arabic calligraphy, there were six main otherscharacteristicsAC that consists have and ofdisappeared. writingtwo main rules. styles The Over (Kuficommonly time, and over Naskh), used 100 stylesdifferent each stylenowadays styles has severalappeared are the variants, andsix basiciden- as styles namely Kufi, Diwani, Farisi, Naskh, Rekaa, and Thuluth each of which has its wellstylesArabictified as (namely:driven Calligraphyregion-specific from Kufi, Styles the Diwani, styles.former Farisi,In main the Naskh,historystyles. NineRekaa,of Arabic distinctive and calligraphy, Thuluth) categories and there three of were cursive other six stylesmainstyles characteristics and writing rules. Over time, over 100 different styles appeared and iden- stylesthatevolved, areAC namely driven consistsbecoming Kufi, from of twoever Diwani,them mainmore which Farisi, stylesrefined are (KufiNaskh, Maghri and andelegant, Rekaa,bi, Naskh), Mohakik, some and each ofThuluth and which style Square-Kufi. has eachare severalstill of usedwhich Unlike variants, and has other some its as tified driven from the former main styles. Nine distinctive categories of cursive styles characteristicsrelatedwellothers as region-specificworksthat have thatand disappeared. dealtwriting styles. with rules. Ina few theThe Over historyof commonlythese time, of styles over Arabic used ,100 incalligraphy, this different styles work, nowadays styleswe there consider appeared were are six thethe main andfull six list stylesiden-basic of evolved, becoming ever more refined and elegant, some of which are still used and some tifiedthenamelystyles nine driven (namely:Kufi, styles. Diwani,from Table Kufi, the Farisi,1Diwani, resumesformer Naskh, Farisi,main the Rekaa,different styles. Naskh, and Nine characteristicsRekaa, Thuluth distinctive and each Thuluth) ofand categories which the and cues has threeof itsthat cursive characteristics expertsother stylesstyles use others that have disappeared. The commonly used styles nowadays are the six basic evolved,inandthat identifying writingare drivenbecoming rules. each from Over oneever them of time,more these which overrefined styles. 100are and differentMaghriIt also elegant, providesbi, styles Mohakik, some appeared a representativeof andwhich andSquare-Kufi. are identified still sample used Unlike driven fromand some fromothereach styles (namely: Kufi, Diwani, Farisi, Naskh, Rekaa, and Thuluth) and three other styles othersstyletherelated former to that showworks mainhave how that styles.disappeared. it dealt differs Ninewith from a distinctive fewThe others. ofcommonly these categories styles used, in of thisstyles cursive work, nowadays styleswe consider evolved, are thethe becoming sixfull basiclist of that are driven from them which are Maghribi, Mohakik, and Square-Kufi. Unlike other styleseverthe nine more (namely: styles. refined Kufi,Table and Diwani, 1 elegant,resumes Farisi, somethe differentNaskh, of which Rekaa, characteristics are stilland usedThuluth) and and the someand cues three others that other experts that styles have use related works that dealt with a few of these styles, in this work, we consider the full list of Table 1. The nine most commonthatdisappeared.in identifying are AC driven styles The each withfrom commonly theirone them ofcharacteristic thesewhich used styles. are styless Maghriand It alsonowadayscuesbi, providesthat Mohakik, are are used a the representative andby six experts Square-Kufi. basic to styles identify sample (namely: Unlike them. from otherKufi, each stylethe nine to show styles. how Table it differs 1 resumes from the others. different characteristics and the cues that experts use relatedDiwani, works Farisi, that Naskh, dealt Rekaa,with a few and of Thuluth) these styles and, threein this other work, styles we consider that are the driven fullStyle list from of AC Style Examplethemin Image identifying which are each Maghribi, one General of Mohakik, these Characteristics styles. and It Square-Kufi. also provides Unlike Cues a representative Used other by related Experts sample works thatfrom dealt each the nine styles. Table 1 resumes the different characteristics and the cues that expertsSource use Table 1. The nine most commonwithstyle a toAC few show styles of these how with styles,ittheir differs characteristic in from this work, others.s and we cues consider that are the used full by list experts of the to nine identify styles. them. Table 1 in identifying eachComposed one of these of styles.geometrical It also forms provides - it a is representative angular sample from each styleresumes to show the different how it differs characteristics from others. and the cues that experts use in identifying each one Table 1. The nine most common AC styles withsuch their as characteristicstraight verti-s and cues that are- thick used strokesby experts to identify them.Style ACKufi Style Exampleof Image these styles. It also provides General aCharacteristics representative sample Cues from Used each by style Experts to show how it cals/horizontals lines and distin- - long vertical extended Source Table 1. The nine most commondiffers AC from styles others. with their characteristics and cues that are used by experts to identify them.Style AC Style Example Image guishableComposed General angles. of geometricalCharacteristics forms strokes- it Cues is angular Used by Experts Source Table 1. The nine most common AC styles withsuch their as characteristics straight verti- and cues that are-- usedslant thick by words strokes experts to identify them.Style ACKufi Style Example Image Composed General of Characteristics geometrical forms - Cues it is angularUsed by Experts Acals/horizontals cursive script, beautiful,lines and distin-and - long vertical extended Source AC Style Example Image General Characteristics- written Cues Used with by Experts a rounded Style Source harmonious.such as straight It is verti-characterized by - thick strokes Kufi Composedguishable angles.of geometrical forms -shapestrokes it is angular - it is angular thecals/horizontalsComposed rounded of geometrical shape lines of forms letters. and such distin- as - long vertical extended such as straight verti- --- thick slant strokes strokeswords DiwaniKufi Astraight cursive verticals/horizontals script, beautiful, lines and and - does not follow a Kufi Wordsguishable are writtenangles. skewed de- -strokes long vertical cals/horizontalsdistinguishable angles. lines and distin- -straight- long written vertical line with extended a rounded harmonious. It is characterized by extended- slant strokeswords guishablescend from angles. right to left. Lines startstrokesshape theA cursive rounded script, shape beautiful, of letters. and -- the slant style words width changes Appl.Diwani Sci. 2021 , 11, x FOR PEER REVIEW thick and then attenuate. -- doeswritten not with follow a rounded a 5 of 17 Wordsharmonious.A cursive are script, written beautiful, It is characterized skewed and harmonious. de- by-from- writtenslant thick words with a to thin AIt cursive is characterized script, by thebeautiful, rounded shape and of roundedstraightshape shape line Diwani Characterizedscendtheletters. rounded Wordsfrom are right shape writtenby theto skewed left.of letter’s letters. Lines descend ac- start-- doeswritten not follow with a a rounded Diwani harmonious.from right to left. It Lines is characterized start thick and by straight- does line not follow a curacyWords and are extension,written skewed and by de- its shape-- simple the style orientation width changes thethickthen rounded attenuate. and then shape attenuate. of letters. -straight the style width line changes from DiwaniFarisi Mostly,easescend and from used clarity rightwithout and to lackleft. diacritics. Linesof com- start -thickfrom does to thinthicknot follow to thin a Appl. Sci. 2021, 11, x FOR PEER REVIEW Words are written skewed de- - the style width changes 5 of 17 knownCharacterizedthick and with then its by clippedattenuate. the letter’s letters. ac- straight line plexity.Characterized Farisi by the(alt., letter’s Nasta’liq) accuracy and is a straight line scend from right to left. Lines start-- fromsimple simple orientationthick orientation to thin Composedcuracyextension, and and of byextension, short, its ease andstraight clarityand andby lines its - simple orientation Farisi cursive text. -- doesthe notstyle follow width a changes thickCharacterizedlack of and complexity. then Farisiattenuate.by the (alt., letter’s Nasta’liq) ac- is a - written above the base- Farisi ease and clarity and lack of com- straight- does line not follow a Rekaa andOnecursive simple of text.the simplestcurves, as writing well as types its from thick to thin curacy and extension, and by its line- simple orientation Mostly,straightCharacterizedplexity. usedand Farisi evenwithout by(alt., thelines. Nasta’liq) letter’sdiacritics. It is writ- ac- is a straight line Appl. Sci. 2021, 11, x FOR PEER REVIEW of calligraphy. Its name means - thick stroke 5 of 17 Farisi knowncursiveeaseOne of and thewith text. simplest clarity its clipped writing and typeslack letters. of of com- -simple- does notorientation follow a tencuracy“the above copy and thestyle” extension, baseline because andexcept it byhas forits -- simplesimple orientationorientation plexity.calligraphy. Farisi Its name (alt., means Nasta’liq) “the copy style” is a --simple straightwritten orientation online the baseline Farisi ComposedsomeeaseOnebecause and ofspecific itthe hasclarity of beensimplest short,letters. used and for straight lackwriting copying of books.com-lines types -- writtendoes not on the follow baseline a NaskhNaskh been used for copying books. It is - written above the base- Rekaa andofcursiveIt iscalligraphy. simple recognizable text. curves, by Its its balance nameas well and means as its plainits -- slightslight change change in the in the text Itplexity.recognizable clearis an forms. elegant, Farisi Nowadays, by (alt., cursive its this Nasta’liq)balance script script is usedand thatis aits straighttext thickness line One of the simplest writing typesline -simple orientation Mostly,straighthas“theprimarily large, copy usedand in elongated, print. style” even without becauselines. and diacritics. It elegantisit writ-has thickness cursiveplain clear text. forms. Nowadays, this - -thick written stroke on the baseline Naskh knowntenbofeen abovecalligraphy. used with the for its baseline copyingclipped Its name except letters.books. means for It is letters.OnescriptMostly, of is usedItthe used has withoutsimplest certain primarily diacritics. writingroles, knownin print.but types it -- -simple-simpleoverlapping slight changeorientation orientation letters in the text Composedsome“thewith its specificcopy clipped ofstyle” letters. letters.short, because Composed straight ofit short,lineshas couldrecognizable be changed by its according balance andto its-- big simple rounded orientation letters ofstraight calligraphy. lines and simple Its name curves, means as well as - written- written above on the the baseline base- RekaaNaskh been used for copying books. It is-simple -thickness written above orientation the baseline Rekaa andItplain itsis straightansimple clearelegant, and curves,forms. even cursive lines. Nowadays,as It iswell script written as itsthat this calligraphers’“the copy style” needs. because It has it bighas line-- -high thick slight stroke number change of in dia- the text Thuluth straighthasrecognizableabove large, the and baseline elongated, even by except itslines. for balanceand some It elegantis writ- and its- written on the baseline Naskh letters,bscripteenspecific used is letters.which used for takecopyingprimarily a big books. spacein print. It is criticsthickness plain clear forms. Nowadays, this-- thickslight stroke change in the text tenletters.aboverecognizable above theIt has thebaseline; certain bybaseline its balancethe roles, except big butover- and for it its -- overlappingit does not consider letters the somecouldscriptIt is an specificbe elegant, is changed used cursive letters. primarily according script that in has print. large,to -thickness big rounded letters lappedplainelongated, clear curves and forms. elegant result letters.Nowadays, in Itbig has certainholes this b-aseline overlapping letters Itcalligraphers’that roles,is ancalligraphers but elegant, it could needs. be cursive changed fill Itwith according hasscript artisticbig that to -- high big rounded number letters of dia- ThuluthThuluth scriptcalligraphers’ is used needs. primarily It has big letters,in print. - high number of diacritics hasletters,diacritics.which large, take which aelongated, big spacetake above a big and the space baseline;elegant critics- it does not consider the big overlapped curves result in big holes the baseline letters.aboveIt thathas calligraphers specialthe It has baseline; certainletterforms fill with the artisticroles, big which diacritics. butover- it - itoverlapping does not consider letters the couldlapped be curves changed result according in big holes to -b aselinebig rounded letters provideIt has special it with letterforms a unique which provide beauty it thatwith calligraphers a unique beauty and fill make with it easy artistic to calligraphers’and make it easy needs. to read, It has even big in -- - hightext text orientation orientation number of dia- ThuluthMaghribi read, even in long texts. It is marked by -Kufi Maghribi - written on the baseline -Kufi letters,diacritics.longdescending texts. which lines It is writtentake marked a with bigvery by space de- critics- written on the baseline large bowls. aboveItscending has specialthe linesbaseline; letterforms written the withbig which over- very - it does not consider the lappedprovidelarge bowls. curves it with result a unique in big beauty holes baseline thatand makecalligraphers it easy to fill read, with even artistic in -- textspecial orientation orientation Maghribi Mohakik and Thuluth have al- -Kufi diacritics.long texts. It is marked by de- -under written the on baseline the baseline most the same characteristics. Mohakik Itscending has special lines letterforms written with which very - same diacritics as the -Thuluth providelargeHowever, bowls. it withletters a uniquein Mohakik beauty are thuluth style andlonger make under it easy the tobaseline. read, even in - textspecial orientation orientation Maghribi Mohakik and Thuluth have al- - written on the baseline -Kufi long texts. It is marked by de- -under written the on baseline the baseline mostA unique the same style characteristics.developed for Mohakik scending lines written with very - same diacritics as the -Thuluth However,decorating letters walls inrather Mohakik than pa-are large bowls. thuluth style longerpers. usually under thedesigned baseline. with mo- - specialwritten orientationon the baseline Mohakiksaic faience, and decorative Thuluth have glazed al- - strokes with equal A unique style developed for under the baseline Square-Kufi mostfacing the tiles, same or simplecharacteristics. bricks, ra- thickness -Kufi Mohakik decorating walls rather than pa- - same diacritics as the -Thuluth However,ther than reedletters pens in Mohakikand ink. Itare has - square angles pers. usually designed with mo- thuluth style longertwo strict under rules: the (a) baseline. evenness of saicfull andfaience, empty decorative spaces, andglazed (b) - strokeswritten withon the equal baseline Square-Kufi Afacingsquare unique tiles, angles style or andsimple developed strict bricks, lines. for ra- thickness -Kufi decoratingther than reed walls pens rather and than ink. Itpa- has - square angles Due to the complexitypers.two strict usually ofrules: AC designed (a)and evenness the with shared mo- of characteristics among its styles, none of the former related saicfull works andfaience, hasempty considereddecorative spaces, and glazedthe (b)full list- ofstrokes nine styles.with equal However, there were Square-Kufi some serious effortsfacingsquare to come tiles,angles upor andsimplewith strict some bricks, lines. specific ra- techniquesthickness for recognizing-Kufi AC styles. In the following section,ther than we reed list andpens discuss and ink. works It has that - square aimed angles at recognizing Arabic cal- ligraphyDue stylesto the usingcomplexitytwo computer strict ofrules: AC vision (a)and evenness thetechniques. shared of characteristics among its styles, none of the former related full works and hasempty considered spaces, andthe full(b) list of nine styles. However, there were some3. Related serious Work efforts square to come angles up andwith strict some lines. specific techniques for recognizing AC styles. In theArabic following text fontsection, recognition we list and has discussbeen extensively works that studied aimed owing at recognizing to the availability Arabic cal- of ligraphya hugeDue amount stylesto the using complexityof resources computer of that AC vision helpand techniques.theresear sharedchers characteristics build new solutions. among itsWorks styles, on none Arabic of thefont former recognition related can works be groupedhas considered into two the categories,full list of nine namely styles. handcrafted- However, there and weredeep some3.learning-based Related serious Work efforts categories. to come On up the with one some hand, sp ecificstarting techniques with handcrafted-based for recognizing ACsolutions, styles. InFaten theArabic followingKellal text [10] fontsection, uses recognition discrete we list and hascurvelet discussbeen extensivelytransform works that (DCT)studied aimed to owing atconvert recognizing to theimages availability Arabic into cal-de- of ligraphyascriptors, huge amount styles and then, usingof resources a computerbackpropagation that vision help techniques.resear neuralchers network build has new been solutions. employed Works to predict on Arabic the font recognition can be grouped into two categories, namely handcrafted- and deep 3.learning-based Related Work categories. On the one hand, starting with handcrafted-based solutions, FatenArabic Kellal text [10] font uses recognition discrete hascurvelet been extensivelytransform (DCT) studied to owing convert to theimages availability into de- of ascriptors, huge amount and then, of resources a backpropagation that help resear neuralchers network build has new been solutions. employed Works to predict on Arabic the font recognition can be grouped into two categories, namely handcrafted- and deep learning-based categories. On the one hand, starting with handcrafted-based solutions, Faten Kellal [10] uses discrete curvelet transform (DCT) to convert images into descriptors, and then, a backpropagation neural network has been employed to predict the Appl. Sci. 2021, 11, x FOR PEER REVIEW 5 of 17

Appl. Sci. 2021, 11, x FOR PEER REVIEW 5 of 17

Mostly, used without diacritics. known with its clipped letters. - simple orientation Mostly,Composed used of without short, straight diacritics. lines known with its clipped letters. - written above the base- Rekaa and simple curves, as well as its - simple orientation Composed of short, straight lines line straight and even lines. It is writ- - written above the base- Rekaa and simple curves, as well as its - thick stroke ten above the baseline except for line straightsome specific and even letters. lines. It is writ- - thick stroke tenIt is above an elegant, the baseline cursive except script forthat somehas large, specific elongated, letters. and elegant Itletters. is an elegant,It has certain cursive roles, script but that it - overlapping letters hascould large, be changed elongated, according and elegant to - big rounded letters letters.calligraphers’ It has certain needs. roles, It has but big it -- overlappinghigh number letters of dia- Thuluth couldletters, be which changed take according a big space to -critics big rounded letters calligraphers’ needs. It has big - high number of dia- Thuluth above the baseline; the big over- - it does not consider the letters, which take a big space critics lapped curves result in big holes baseline abovethat calligraphers the baseline; fill the with big artisticover- - it does not consider the lappeddiacritics. curves result in big holes baseline thatIt has calligraphers special letterforms fill with which artistic diacritics.provide it with a unique beauty Itand has make special it easy letterforms to read, which even in - text orientation Maghribi -Kufi Appl. Sci. 2021, 11, 4852 providelong texts. it with It is markeda unique by beauty de- - written on the baseline 5 of 17 and make it easy to read, even in - text orientation Maghribi scending lines written with very -Kufi longlarge texts. bowls. It is marked by de- - written on the baseline

scending lines written with very - special orientation MohakikTable and 1. Cont.Thuluth have al- large bowls. under the baseline AC Style Example Imagemost the Generalsame Characteristicscharacteristics. - special Cues Used orientation by Experts Style Source Mohakik Mohakik and Thuluth have al- - same diacritics as the -Thuluth However, letters in Mohakik are under- special the orientation baseline under mostMohakik the andsame Thuluth characteristics. have almost the same thuluththe baseline style MohakikMohakik longercharacteristics. under However, the baseline. letters in Mohakik - -same same diacritics diacritics as the as the -Thuluth-Thuluth However, letters in Mohakik are - written on the baseline are longer under the baseline. thuluththuluth style style longerA unique under style the developed baseline. for - written on the baseline decorating walls rather than pa- - written on the baseline A unique style developed for pers.A unique usually style developed designed for decoratingwith mo- decoratingwalls rather thanwalls papers. rather usually than designed pa- saicwith faience, mosaic faience, decorative decorative glazed glazed - strokes with equal - strokes with equal thickness Square-Kufi pers.facing usually tiles, or simple designed bricks, rather with than mo- -Kufi Square-Kufi facing tiles, or simple bricks, ra- thickness- square angles -Kufi saicreed faience, pens and ink.decorative It has two strict glazed rules: (a) - strokes with equal therevenness than of reed full and pens empty and spaces, ink. and It (b) has - square angles Square-Kufi facingtwosquare strict tiles, angles rules: andor simple strict (a) lines. evenness bricks, ra-of thickness -Kufi therfull andthan empty reed pens spaces, and and ink. (b) It has - square angles twosquare strict angles rules: and (a) strictevenness lines. of full and empty spaces, and (b) Due to the complexity of AC and the shared characteristics among its styles, none of theDue former to the related complexitysquare works angles of hasAC and consideredand strict the lines.shared the characteristics full list of nine among styles. its However, styles, none there of thewere former some related serious works efforts has to comeconsidered up with the some full list specific of nine techniques styles. However, for recognizing there were AC somestyles.Due serious In to the the followingefforts complexity to come section, of up AC we with and list somethe and shared discussspecific characteristics works techniques that aimed for among recognizing at recognizingits styles, AC none Arabicstyles. of theIncalligraphy theformer following related styles section, works using computerwehas list considered and vision discuss the techniques. worksfull list that of nine aimed styles. at recognizing However, thereArabic were cal- someligraphy serious styles efforts using to computer come up visionwith some techniques. specific techniques for recognizing AC styles. In3. the Related following Work section, we list and discuss works that aimed at recognizing Arabic cal- ligraphy3. RelatedArabic styles Work text using font computer recognition vision has been techniques. extensively studied owing to the availability of a hugeArabic amount text of font resources recognition that help has researchers been extensively build new studied solutions. owing Works to the onavailability Arabic font of 3. Related Work arecognition huge amount can of be resources grouped intothat twohelp categories,researchers namely build new handcrafted- solutions. and Works deep on learning- Arabic fontbased Arabicrecognition categories. text font can On recognition be the grouped one hand,has into beenstarting two extensively categories, with handcrafted-basedstudied namely owing handcrafted- to the solutions, availability and Fatendeep of alearning-basedKellal huge [amount10] uses ofcategories. discrete resources curvelet On that the help transform one resear hand, (DCT)chers starting build to convert with new handcrafted-based solutions. images into Works descriptors, onsolutions, Arabic and fontFatenthen, recognition aKellal backpropagation [10] can uses be discretegrouped neural curvelet network into two transform has categories, been employed(DCT) namely to to converthandcrafted- predict images the styleand into ofdeep de- the learning-basedscriptors,image text and based then, categories. on a backpropagation the extractedOn the one descriptor. hand,neural starting network In [11 with, 12has], handcrafted-basedbeen the former employed method to predictsolutions, has been the Fatenreinforced Kellal against [10] uses scale discrete changes curvelet by using transform a steerable (DCT) pyramid to convert texture. images Hussein. into A.K de- [5] scriptors,has experimented and then, anda backpropagation compared several neural texture network descriptors has been for employed Arabic font to predict recognition the using extreme learning machine (ELM) and fast learning machine (FLN). On the other hand, deep convolutional neural networks (DCNN) [13,14] have been utilized in several occasions to perform Arabic font and font size recognition. While there has been much research on Arabic font, few researchers have dealt with Arabic calligraphy. It could be attributed to the lack of resources such as public datasets and insufficiently related works. However, the reader should know that dealing with calligraphy is much harder than fonts due to the absence of creativity in the latter. In other words, calligraphy texts are affected by the calligrapher’s impression, which is not the case with texts generated using machine fonts. Therefore, we limit ourselves to works that tackle the issue of recognizing calligraphy styles rather than fonts. Table2 lists works conducted on AC style recognition with their respective datasets and the number of styles taken into account. In [15], the authors argued that Arabic calligraphy images should be dealt with as textures, and therefore, they evaluated a set of commonly used texture descriptors for the task of AC identification. Results indicated that Binarized Statistical Image Features (BSIF) is the best texture descriptor among all for such a task. Due to the lack of big datasets of AC and the fine-grained styles (e.g., Mohakik and Thuluth), most works in the literature have utilized classical machine learning tools instead of deep learning. Moreover, they attempted to propose their descriptors for this issue instead of using other common descriptors such as Sift, LBP, etc. Appl. Sci. 2021, 11, 4852 6 of 17

Table 2. List of related works on ACSR.

Author (Year) Language Dataset No Styles Z.Kaoudja et al. (2020) [15] Arabic 1685 line image 9 Batainah et al. (2012) [16] Arabic jawi 700 blocks image (privet) 6 Batainah et al. (2013) [17] Arabic-jawi 700 block image (privet) 7 Batainah et al. (2011) [18] Arabic 14 documents images (privet) Unknown Batainah et al. (2011) [19] Arabic 100 line image (privet) 6 Talab et al.(2011) [20] Arabic 700 line images 7 Azmi et al. (2011) [21] Arabic-jawi 100 character image (privet) 5 Azmi et al. (2011) [22] Arabic-jawi 1019 character images (privet) 4 Adam et al. (2017) [23] Arabic 330 character images (privet) 6 Allaf et al.(2016) [24] Arabic 267 line/word images (privet) 3 Elhmouz et al. (2020) [25] Arabic 421 line/word images (privet) 3 M. Khayyat (2020) [26] Arabic 2653 documents images (privet) 6 Z.Kaoudja et al. (2019) [27] Arabic 1685 line image 9 R.F. Moghaddam et al. (2012) [28] Arabic 27709 word image Unknown

In all of their works, Bilel Bataineh [16–19] and Ahmed Talab [20] attempt to classify AC styles with a proposed statistical descriptor, named Edge Direction Matrix (EDMs). They firstly pass through two steps called EDM1 and EDM2, which produce 3 × 3 matrices counting edges (i.e., pixels’ adjacency), and then extract 22 statistical moments to constitute the final image descriptor. Unlike the aforementioned global approach, the works in [21–23] focused on extracting features using local shapes of letters. For instance, Allaf [24] uses the text geometrical shape, the density of text above and under the baseline, number of diacritics up and down the text, text orientation, the position of the text baseline, and the ratio of black and white for the whole image. However, the validity of this solution has been proven using a small dataset (260 images) comprising only three styles, which is highly insufficient. To sum up, the aforementioned works can be classified into two categories, holistic and local approaches. The holistic one is concerned with recognizing styles using global descriptors. However, these approaches are not dedicated to calligraphy, but they can be applied to heterogeneous images ignoring the very specific characteristics of AC. On the other hand, the local approaches look into the characteristics of AC, exaggeratedly, via letter segmentation which makes it inapplicable especially for complicated styles or for big datasets. In this paper, we propose a solution that combines the advantages of both, describing the specific features of each style without resorting to letter segmentation. Although deep learning has proven itself as the best solution for many issues, it has not been widely explored for ACSR due to the fine-grained styles and absence of big datasets. We list the only two works that utilize deep learning nets for ACSR, which are [25,26]. In the former, a common deep model (stacked-auto encoder) has been trained and utilized for feature extraction, and a SoftMax has been employed to perform the recognition process. In the latter, the full MobileNetV1 model has been fine-tuned and used to classify six AC styles.

this solution has been proven using a small dataset (260 images) comprising only three styles, which is highly insufficient. To sum up, the aforementioned works can be classified into two categories, holistic and local approaches. The holistic one is concerned with recognizing styles using global descriptors. However, these approaches are not dedicated to calligraphy, but they can be applied to heterogeneous images ignoring the very specific characteristics of AC. On the other hand, the local approaches look into the characteristics of AC, exaggeratedly, via letter segmentation which makes it inapplicable especially for complicated styles or for big datasets. In this paper, we propose a solution that combines the advantages of both, describing the specific features of each style without resorting to letter segmentation. Although deep learning has proven itself as the best solution for many issues, it has not been widely explored for ACSR due to the fine-grained styles and absence of big datasets. We list the only two works that utilize deep learning nets for ACSR, which are [25,26]. In the former, a common deep model (stacked-auto encoder) has been trained and utilized for feature extraction, and a Soft- Max has been employed to perform the recognition process. In the latter, the full Mo- bileNetV1 model has been fine-tuned and used to classify six AC styles.

4. Proposed Method In this paper, a new tool is proposed for the task of ACSR. The proposal is capable of recognizing nine different AC styles and might be extended to other styles. After analyzing Arabic calligraphy, we found out that the styles are geometrically different. Some styles have a unique shape, some share similar letter shapes, and some others have special diacritics. For the visually different styles, it is easy to separate them, whilst for the Appl. Sci. 2021, 11, 4852 other fine-grained styles, we needed to extract one or more distinctive characteristics7 of17 to separate them. The proposed solution was inspired by the calligraphy expert. A calligraphy expert uses tools such as brushes, pens, or markers to create a special writing style that is artistic and expressive. After an in-depth study of each style, looking for the visual andcharacteristics expressive. Afterused anby in-depthcalligraphers study to of distinguish each style, lookingit, we did for thecome visual up characteristicswith the main useddistinctive by calligraphers features that to distinguishneed to be it,extracted we did comefor each up style with or the set main of styles. distinctive In the features feature thatextraction need to section, be extracted we explain for each these style visual or set offeatures styles. and In the the feature relationship extraction with section, their wecor- explainresponding these styles. visual features and the relationship with their corresponding styles. OurOur proposedproposed tooltool consists consists of of three three main main steps: steps: (a) (a) image image preprocessing: preprocessing: perform perform severalseveral transformations transformations to to prepare prepare the the image image for for feature feature extraction, extraction, (b) (b) feature feature extraction: extraction: extractextract the the proposed proposed descriptor descriptor from from each each image, image, and and (c) (c) classiﬁcation: classification: use use the the extracted extracted featuresfeatures to to predict predict the the style style of of the the input input image. image. Figure Figure3 3illustrates illustrates the the general general scheme scheme of of ourour proposed proposed tool. tool.

FigureFigure 3. 3.A A generalgeneral schemaschema forfor ourour proposedproposed tool.tool. The dotted line separates the the ( (AA)) training training from from (B(B)) test test phases. phases. The The images images pass pass through through a processinga processing sequence sequence starting starting from from preprocessing, preprocessing, features fea- extraction, and finally classification. The final descriptor contains eight features, as shown, which will be explained in Section 4.2.

4.1. Preprocessing Since we are interested only in the morphology of text letters, all images are ﬁrst converted into binary (i.e., black text on a white background) after having the texts separated from backgrounds [29]. It should be known that most images contain either meaningful texture, which is a part of the decoration or a meaningless one that resulted from the noise while capturing. In either case, we illuminate the background so it does not affect the results. Thereafter, we apply the following micro-operations to extract, for each image, a set of images namely edge image, skeleton image, diacritics image, and no-diacritics image. Each of these generated images will thereafter be used to deﬁne the characteristics of one or more styles of AC. Figure4 shows some of the images that resulted from these operations.

4.2. Feature Extraction In most fields, it is a common practice to ask experts about cues they use to decide for a certain inquiry. Calligraphers use some specific features, which we will discuss hereafter, to find the appropriate style for a given text. The main aim of this step is to transform these features into values that can be fed to a machine so it can decide on the style of a given text image. It should be mentioned that a feature might specify one or more styles. Features designated for each style or set of styles might be used in a sequential manner (sequential decision) or parallel manner. However, in our work, we considered all the features to be equally important and, therefore, adopt a parallel manner. That is to say, each feature descriptor is fed to a classification machine, SVM in our case, and the final decision is the combination of all the decisions of these machines. Appl. Sci. 2021, 11, x FOR PEER REVIEW 8 of 17

tures extraction, and finally classification. The final descriptor contains eight features, as shown, which will be explained in Section 4.2.

4.1. Preprocessing Since we are interested only in the morphology of text letters, all images are first converted into binary (i.e., black text on a white background) after having the texts separated from backgrounds [29]. It should be known that most images contain either meaningful texture, which is a part of the decoration or a meaningless one that resulted from the noise while capturing. In either case, we illuminate the background so it does not affect the results. Thereafter, we apply the following micro-operations to extract, for each image, a set of images namely edge image, skeleton image, diacritics image, and Appl. Sci. 2021, 11, 4852 no-diacritics image. Each of these generated images will thereafter be used to define8 of the 17 characteristics of one or more styles of AC. Figure 4 shows some of the images that resulted from these operations.

FigureFigure 4.4. Operations executed in the preprocessing step to generate a set of accompanying images images forfor each image image in in the the dataset. dataset. Each Each generated generated image image will will thereafter thereafter be used be used to extract to extract one or one a set or of a setfeatures. of features. These Theseoperations operations are (a) areedge (a detection) edge detection using a using3 × 3 Laplacian a 3 × 3 Laplacian filter, (b) filter,skeleton (b) detection skeleton using ‘Zhang-Suen Thinning Algorithm’ [30], and (c) separating diacritics from text using the detection using ‘Zhang-Suen Thinning Algorithm’ [30], and (c) separating diacritics from text using flood-fill algorithm [31]. the flood-fill algorithm [31].

4.2.1.4.2. Feature Horizontal Extraction and Vertical Straight Lines (HVSL) ThisIn most descriptor fields, hasit is beena common designed practice to separate to ask Square-Kuﬁ experts about from cues other they styles. use to It mainlydecide describesfor a certain how inquiry. frequently Calligraphers horizontal and use vertical some specific lines appear features, in a textwhich image. we Square-Kuﬁwill discuss Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 17 differshereafter, from to otherfind the styles appropriate by the high style number for a ofgiven vertical text. and The horizontal main aim lines of this as shownstep is into (Figuretransform5). these features into values that can be fed to a machine so it can decide on the style of a given text image. It should be mentioned that a feature might specify one or more styles. Features designated for each style or set of styles might be used in a sequential manner (sequential decision) or parallel manner. However, in our work, we considered all the features to be equally important and, therefore, adopt a parallel manner. That is to say, each feature descriptor is fed to a classification machine, SVM in our case, and the final decision is the combination of all the decisions of these machines.

4.2.1. Horizontal and Vertical Straight Lines (HVSL) This descriptor has been designed to separate Square-Kufi from other styles. It mainly describes how frequently horizontal and vertical lines appear in a text image. Square-Kufi differs from other styles by the high number of vertical and horizontal lines as shown in (Figure 5).

Figure 5. Vertical and horizontal lines as basic features ofof Square-Kuﬁ.Square-Kufi.

HVSL is not extracted from the input image, but rather from the edge image resulted from the the preprocessing preprocessing step. step. The The final final HVSL HVSL descriptor descriptor holds holds the the following following features: features: (a) (a)The The appearance appearance frequency frequency of ofvertical vertical and and horizontal horizontal (i.e., (i.e., V/H) V/H) lines, lines, which which is is the the main cue used to distinguish Square-Kufi, Square-Kufi, (b) The difference ratio between the number of the pixels that constitute the texts edge and the sum of V/H lines lines a appearanceppearance frequency. frequency. 4.2.2. Text Orientation from Edge/Skeleton (ToE/ToS) 4.2.2. Text Orientation from Edge/Skeleton (ToE/ToS) One of the main characteristics of all the Arabic calligraphy styles is text orientation. One of the main characteristics of all the Arabic calligraphy styles is text orientation. Each style is usually written in a specific direction that defines its unique shape. By Eachdirection, style we is usually mean the written slope ofin thea specific pen while direction writing that words. defines For its instance, unique the shape. pen inBy Kufi direction,is used towe write mean on the the slope baseline of the without pen while any slope,writing whereas, words. inFor Rekaa, instance, the writingthe pen isin slightly Kufi is used to write on the baseline without any slope, whereas, in Rekaa, the writing is slightly sloping. The pen itself might be held by the calligrapher vertically or in a sloppily way based on the style intended to be adopted. A vertical grasp of the pen results in flat tails whereas a sloping grasp results in pointed ones, as Figure 6 shows.

(a) (b)

Figure 6. The effect of pen hold manner on the writing. In (a), the pen is held straight vertical whereas, in (b), the pen is held in a sloping manner.

To capture the orientation of words, we utilize the skeleton image, whereas to capture the effect of pen sloping, we use the text edge image. From each image, the orientation at each pixel level is extracted and then codified into a histogram (i.e., ToE for edge orientations and ToS for skeleton orientations). Figure 7 shows a representative example of the process. Appl. Sci. 2021, 11, x FOR PEER REVIEW 9 of 17

Figure 5. Vertical and horizontal lines as basic features of Square-Kufi.

HVSL is not extracted from the input image, but rather from the edge image resulted from the preprocessing step. The final HVSL descriptor holds the following features: (a) The appearance frequency of vertical and horizontal (i.e., V/H) lines, which is the main cue used to distinguish Square-Kufi, (b) The difference ratio between the number of the pixels that constitute the texts edge and the sum of V/H lines appearance frequency.

4.2.2. Text Orientation from Edge/Skeleton (ToE/ToS)

Appl. Sci. 2021, 11, 4852 One of the main characteristics of all the Arabic calligraphy styles is text orientation.9 of 17 Each style is usually written in a specific direction that defines its unique shape. By direction, we mean the slope of the pen while writing words. For instance, the pen in Kufi is used to write on the baseline without any slope, whereas, in Rekaa, the writing is slightly sloping. The pen itself might be held by thethe calligraphercalligrapher vertically or inin aa sloppilysloppily wayway based on thethe stylestyle intendedintended toto bebe adopted.adopted. A vertical grasp of thethe penpen resultsresults inin ﬂatflat tailstails whereaswhereas aa slopingsloping graspgrasp resultsresults inin pointedpointed ones,ones, asas FigureFigure6 6shows. shows.

(a) (b)

Figure 6. TheThe effect effect of of pen pen hold hold manner manner on on the the writing. writing. In (a In), (thea), pen the penis held is heldstraight straight vertical vertical whereas,whereas, inin ((bb),), thethe penpen isis heldheld inin aa slopingsloping manner.manner.

To capturecapture thethe orientation orientation of of words, words, we we utilize utilize the the skeleton skeleton image, image, whereas whereas tocapture to cap- theture effect the effect of pen of sloping,pen sloping, we use we theuse textthe edgetext edge image. image. From From each each image, image, the the orientation orienta- attion each at each pixel pixel level level is extracted is extracted and and then then codiﬁed codified into into a a histogram histogram (i.e., (i.e.,ToE ToE forfor edgeedge Appl. Sci. 2021, 11, x FOR PEER REVIEW 10 of 17 orientations andand ToSToS for for skeleton skeleton orientations). orientations). Figure Figure7 shows 7 shows a representative a representative example example of theof the process. process.

(a) (b)

FigureFigure 7. ExtractingExtracting orientations orientations to to be be used used to togenerate generate ToE ToE and and ToS ToS histograms. histograms. Orientations Orientations ex- extractedtracted from from (a ()a the) the skeleton skeleton image, image, and and (b ()b the) the edge edge image. image.

4.2.3.4.2.3. Long Vertical LinesLines (LVL)(LVL) SinceSince KufiKufi andand Square-KufiSquare-Kufi shareshare commoncommon featuresfeatures suchsuch asas verticalvertical andand horizontalhorizontal lines,lines, HVSLHVSL will will consider consider Kufi Kufi images images as Square-Kufias Square-Kufi and separateand separate them fromthem otherfrom styles.other However,styles. However, Kufi has Kufi a distinctive has a distinctive characteristic, characteristic, compared compared to Square-Kufi, to Square-Kufi, which is thewhich long is verticalthe long straight vertical lines. straight The lines. Long The Vertical Long Lines Vertical (LVL) Lines descriptor (LVL) descriptor has been designatedhas been des- to eliminateignated to the eliminate conflict between the conflict texts writtenbetween in kufitexts and written the ones in ofkufi Square-Kufi. and the Itones mainly of describesSquare-Kufi. how It verticalmainly linesdescribes in a texthow vary. vertical lines in a text vary. After having the vertical straight lines extracted from the skeleton image as Figure8 shows,After the having following the five vertical measurements straight lines are extr calculated:acted from (a) the the text skeleton height image from theas Figure bottom 8 toshows, top, (b)the thefollowing number five of measurements detected vertical are lines, calculated: (c) the (a) length the text of theheight highest from detected the bot- verticaltom to top, line, (b) (d) the the number difference of detected ratio between vertical the lines, text (c) height the length and the of highestthe highest vertical detected line, andvertical (e) theline, variance (d) the amongdifference the ratio vertical between lines. the text height and the highest vertical line, and (e) the variance among the vertical lines.

Figure 8. Vertical straight lines detection.

4.2.4. Text Thickness (Tth) Stroke thickness plays an important role in defining the style. Some styles use a flat pen, whereas some others use a pointed one. In some styles, calligraphers alter the thickness while writing (via pushing down the pen or the opposite), whereas in others, the thickness is always preserved. Modeling such a feature in form of a descriptor will help the machine to understand more specificities of each style. Text thickness (Tth) descriptor codifies the appearance frequency of different line thicknesses in a text image. To extract this descriptor, we employ both the skeleton and edge image. To find the thickness level at a pixel of the skeleton, we calculate the distance between the two points on a perpendicular line that passes through p.

4.2.5. Special Diacritics (SDs) Thuluth and Mohakik have a similar writing style that is decorated with diacritics having special shapes, as shown in Figure 9. An SDs descriptor will be used to inspect the Appl. Sci. 2021, 11, x FOR PEER REVIEW 10 of 17

(a) (b)

Figure 7. Extracting orientations to be used to generate ToE and ToS histograms. Orientations extracted from (a) the skeleton image, and (b) the edge image.

4.2.3. Long Vertical Lines (LVL) Since Kufi and Square-Kufi share common features such as vertical and horizontal lines, HVSL will consider Kufi images as Square-Kufi and separate them from other styles. However, Kufi has a distinctive characteristic, compared to Square-Kufi, which is the long vertical straight lines. The Long Vertical Lines (LVL) descriptor has been designated to eliminate the conflict between texts written in kufi and the ones of Square-Kufi. It mainly describes how vertical lines in a text vary.

After having the vertical straight lines extracted from the skeleton image as Figure 8 shows, the following five measurements are calculated: (a) the text height from the bot- Appl. Sci. 2021, 11, 4852 tom to top, (b) the number of detected vertical lines, (c) the length of the highest detected10 of 17 vertical line, (d) the difference ratio between the text height and the highest vertical line, and (e) the variance among the vertical lines.

Figure 8. Vertical straight lines detection.detection. 4.2.4. Text Thickness (Tth) 4.2.4. Text Thickness (Tth) Stroke thickness plays an important role in defining the style. Some styles use a flat pen,Stroke whereas thickness some plays others an important use a pointed role one.in defining In some the styles, style. calligraphers Some styles use alter a theflat pen,thickness whereas while some writing others (via pushinguse a pointed down theone. pen In or some the opposite), styles, calligraphers whereas in others, alter the thickness while is always writing preserved. (via pushing Modeling down such the a pen feature or the in formopposite), of a descriptor whereas in will others, help the machinethickness to is understand always preserved. more specificities Modeling ofsu eachch a style.feature Text in thicknessform of a (Tth)descriptor descriptor will helpcodifies the themachine appearance to understand frequency more of different specificities line thicknessesof each style. in Text a text thickness image. To (Tth) extract de- scriptorthis descriptor, codifies we the employ appearance both thefrequency skeleton of and different edge image. line thicknesses To find the in thickness a text image. level To at extracta pixel ofthis the descriptor, skeleton, we we calculate employ theboth distance the skeleton between and the edge two pointsimage. onTo a find perpendicular the thick- nessline thatlevel passes at a pixel through of the p. skeleton, we calculate the distance between the two points on a perpendicular line that passes through p. Appl. Sci. 2021, 11, x FOR PEER REVIEW 11 of 17 4.2.5. Special Diacritics (SDs) 4.2.5.Thuluth Special Diacritics and Mohakik (SDs) have a similar writing style that is decorated with diacritics havingThuluth special and shapes, Mohakik as shown have a in similar Figure 9writing. An SDs style descriptor that is decorated will be used with to diacritics inspect havingtheexistence existence special of such of shapes, such diacritics diacritics as shown in a in givenin aFigure given text 9. textim Anage. image.SDs To descriptor this To thisend, end, willeach be each of used the of diacriticsto the inspect diacritics rep- the representedresented in inFigure Figure 9 9is is explicitly explicitly segmented segmented and and then then represented represented with aa vectorvector ofof HuHu momentsmoments [[32].32]. Thereafter,Thereafter, the the distance distance among among these these moments moments and and those those extracted extracted from from the inputthe input image’s image’s diacritics diacritics are calculated are calculat to decideed to whether decide the whether image isthe Thuluth/Mohakik image is Thu- orluth/Mohakik not. or not.

FigureFigure 9.9. Thuluth and Mohakik diacritics.

ItIt isis worthworth mentioningmentioning thatthat SDsSDs hashas notnot beenbeen designeddesigned toto identifyidentify ThuluthThuluth fromfrom MohakikMohakik butbut ratherrather toto separateseparate bothboth ofof themthem fromfrom otherother styles.styles.

4.2.6.4.2.6. Word’s OrientationOrientation (WOr)(WOr) OneOne ofof thethe salientsalient features features of of the the Diwani Diwani style style is theis the words words written written in ain slanted a slanted format, for- asmat, Figure as Figure 10 shows. 10 shows. WOr descriptorWOr descriptor mainly mainly specifies specifies how, on how, average, on average, words inwords text images in text areimages oriented. are oriented. The flood-fill The flood-fill algorithm algorithm is used tois used detect to words detect from words no from diacritics no diacritics images thatimages are that generated are generated in the processing in the processing step. After step. calculating After calculating each word’s each orientation,word’s orienta- the meantion, the orientation mean orientation along with along the number with the of detectednumber wordsof detected are combined words are toform combined the final to WOr descriptor. form the final WOr descriptor. WOr algorithm was used to distinguish Diwani from other styles. Diwani style yield an orientation average of about 45 degrees compared to 0 degrees by other styles.

Figure 10. A slanted word from Diwani style, with its orientation measured, detected using the flood-fill algorithm.

WOr algorithm was used to distinguish Diwani from other styles. Diwani style yield an orientation average of about 45 degrees compared to 0 degrees by other styles.

4.2.7. Horizontal Profile Projection (HPP) In some AC styles, all words are written on a baseline, whereas in some other styles, words are not subjected to a baseline. Horizontal profile projection is a descriptor that specifies how words are vertically spread within a text image. First, a sub-image that exactly fits the text is cropped from the text image, and then, all pixels are projected to a vertical histogram, using the following Equation (1), to find how pixels are vertically spread.

𝐻𝑃𝑃 ∑ 𝐼𝑀, 𝑁 max∑ 𝐼𝑀, 𝑁 (1) In an image in which words are written on a baseline, HPP will have only one bin with a high value, whilst, in an image in which words are vertically spread, HPP will have more than one bin with high value.

5. Experimental Results and Discussion This section is dedicated to evaluating the performance of the proposed solution. Three scenarios have been carried out; the first one is meant to evaluate each descriptor individually to validate the role it has been proposed for in the first place. In the second scenario, the decisions yielded by all classifiers are combined to produce a final decision. Appl. Sci. 2021, 11, x FOR PEER REVIEW 11 of 17

existence of such diacritics in a given text image. To this end, each of the diacritics represented in Figure 9 is explicitly segmented and then represented with a vector of Hu moments [32]. Thereafter, the distance among these moments and those extracted from the input image’s diacritics are calculated to decide whether the image is Thu- luth/Mohakik or not.

Figure 9. Thuluth and Mohakik diacritics.

It is worth mentioning that SDs has not been designed to identify Thuluth from Mohakik but rather to separate both of them from other styles.

4.2.6. Word’s Orientation (WOr) One of the salient features of the Diwani style is the words written in a slanted format, as Figure 10 shows. WOr descriptor mainly specifies how, on average, words in text images are oriented. The flood-fill algorithm is used to detect words from no diacritics Appl. Sci. 2021, 11, 4852 images that are generated in the processing step. After calculating each word’s orienta-11 of 17 tion, the mean orientation along with the number of detected words are combined to form the final WOr descriptor.

Figure 10. AA slanted slanted word word from from Diwani Diwani style, style, with with its its orientation orientation measured, measured, detected detected using using the the ﬂood-ﬁllflood-fill algorithm.algorithm.

4.2.7.WOr Horizontal algorithm Profile was Projection used to distinguish (HPP) Diwani from other styles. Diwani style yield an orientationIn some AC average styles, of all about words 45 are degrees written compared on a baseline, to 0 degrees whereas by in other some styles. other styles, words are not subjected to a baseline. Horizontal profile projection is a descriptor that specifies4.2.7. Horizontal how words Profile are vertically Projection spread (HPP) within a text image. First, a sub-image that exactly fits theIn textsome is AC cropped styles, from all words the text are image, written and on then,a baseline, all pixels whereas are projected in some toother a vertical styles, histogram,words are usingnot subjected the following to a baseline. Equation Horizont (1), to findal profile how pixels projection are vertically is a descriptor spread. that specifies how words are vertically spread within a text image. First, a sub-image that HPP = I(M, N)/max( I(M, N)) (1) exactly fits the text is cropped from∑N the text image,M ∑ andN then, all pixels are projected to a vertical histogram, using the following Equation (1), to find how pixels are vertically spread.In an image in which words are written on a baseline, HPP will have only one bin with a high value, whilst, in an image in which words are vertically spread, HPP will have 𝐻𝑃𝑃 ∑ 𝐼𝑀, 𝑁 max∑ 𝐼𝑀, 𝑁 more than one bin with high value. (1)

5. ExperimentalIn an image Results in which and words Discussion are written on a baseline, HPP will have only one bin with a high value, whilst, in an image in which words are vertically spread, HPP will This section is dedicated to evaluating the performance of the proposed solution. have more than one bin with high value. Three scenarios have been carried out; the first one is meant to evaluate each descriptor individually to validate the role it has been proposed for in the first place. In the second 5. Experimental Results and Discussion scenario, the decisions yielded by all classifiers are combined to produce a final decision. We devoteThis section this final is scenariodedicated to to examine evaluating the effectthe performance of using few of samples the proposed for training. solution. All theThree experiments scenarios have been carriedcarried outout; under the firs thet one following is meant configuration: to evaluate each descriptor individually to validate the role it has been proposed for in the first place. In the second • Dataset: We use the same dataset adopted in our previous works [15,27]. It consists scenario, the decisions yielded by all classifiers are combined to produce a final decision. of nine (9) classes each of which represent an Arabic style. The total number of the dataset’s images is 1685. Each style has several images that vary from 180 to 195. • Classification: For image classification, we use the well-known classifier Support Vector Machine (SVM). Support Vector Machine classifier is a supervised machine learning technique. The main goal of this supervised method is to find a function in a multidimensional space that is able to separate training data with known class labels. SVM can be used for linear as well as nonlinear data classification. In our case, we use the Polynomial Kernel. This choice is a result of the outperformance SVM showed compared to other classifiers. To avoid overfitting, 3-fold cross-validation has been adopted. • Metrics: To validate our model, we use four metrics, namely: Recall (R), Precision (P), F1-score, and Accuracy given by the following formulas.

treue positif R = true positif + false negatif true positif P = true positif + false positif R·P F1 − score = 2· R + P true positif + true negatif Accuracy = true positif + false positive + true negatif + false negatif Scenario 1. Experimenting with each descriptor individually: Appl. Sci. 2021, 11, 4852 12 of 17

As we discussed beforehand, some descriptors have been mainly proposed to identify one single style, whereas some others are general and can be used to separate multiple styles. This scenario is dedicated to analyzing the results for each descriptor individually. It should be mentioned that each descriptor has been used to train a separated SVM. Table3 presents the accuracies yielded by each feature descriptor for each style.

Table 3. The accuracies yielded by each descriptor. The ﬁrst column holds descriptors with the respective style they have been proposed for. Horizontal and Vertical Straight Lines (HVSL), Text orientation from Edges (ToE), Text orientation from Skeleton (ToS), Long Vertical Lines (LVL), Text thickness (Tth), Special Diacritics (SDs), Word’s Orientation (WOr), Horizontal Proﬁle Projection (HPP).

Diwani Naskh Farisi Rekaa Thuluth Maghribi Kufi Mohakik Square-Kufi HVSL (Square-Kufi) 63% 62% 34% 37% 83% 62% 87% 74% 100% ToE (General) 93% 88% 94% 87% 88% 81% 94% 86% 99% ToS (General) 96% 88% 91% 83% 90% 80% 91% 86% 98% LVL (Kufi) 54% 77% 28% 44% 49% 34% 89% 25% 91% Tth (General) 47% 61% 51% 52% 66% 51% 76% 43% 82% SDs (Thuluth + Mohakik) 13% 2% 0% 10% 86% 11% 17% 62% 18% WOr (Diwani) 85% 24% 9% 21% 9% 6% 8% 18% 62% HPP (General) 76% 97% 58% 76% 83% 92% 97% 78% 100%

From Table3, we can see that the descriptors that have been proposed for one or two specific styles have performed adequately. HVSL and WOr as instances generated accuracies of 100% and 85% respectively, which are the highest ones compared to others. However, LVL yielded a similar performance for both Kufi and Square-Kufi, although it has been proposed for the former. This could be attributed to the similarity between these two styles. The Tth descriptor works well for several styles that are characterized by their thickness. Those styles are Naskh, Thuluth, Kufi, and Square-Kufi. The highest accuracy was for Square-Kufi with a correct rate of 82%. In contrast, the lowest accuracy was for Mohakik style with 43%. HPP descriptor has worked well with most of the styles because it is a representation of a general characteristic. The lowest accuracy, which is 58%, produced with the style Farisi because it can be written in different manners (some texts are written on the baseline; in some others, words are written one above another). We can conclude that general descriptors have the highest accuracies. Nevertheless, by combining these general descriptors with the specific ones, even better results can be achieved. In the following scenario, we evaluate the impact of combining all the descriptors and compare the results to other proposed methods including deep learning. Scenario 2: combine the descriptors After having proved the efficiency of the individual descriptors, we combine them to get a powerful descriptor that can be used to distinguish the nine AC styles effectively. To this end, each descriptor is used to train a separated classifier (SVM in our case) and the overall decision will be the sum of the decisions generated by all the classifiers. In other words, the final decision will be the one with the highest votes. Figure 11 represents the precision and recall yielded by our combined descriptor compared to other methods from state of the art [16,17,20,24,25]. From Figure 11, it seems that our proposed method outperforms all other related works with a precision of 97%. This remarkable performance of ours is a result of combining all AC features used by calligraphers to recognize different styles in one descriptor. The method proposed in [20] (EDM1+LBP) has scored the second-best performance. This is because the latter deals with AC images as texture that is an approach; we prove its effectiveness in [22]. Although the work [23] has opted for morphological descriptors as we did, the results were too weak. This is because the authors have taken into account the general features only and did not consider the specificity of each AC style. On the other Appl. Sci. 2021, 11, x FOR PEER REVIEW 13 of 17

has been proposed for the former. This could be attributed to the similarity between these two styles. The Tth descriptor works well for several styles that are characterized by their thickness. Those styles are Naskh, Thuluth, Kufi, and Square-Kufi. The highest accuracy was for Square-Kufi with a correct rate of 82%. In contrast, the lowest accuracy was for Mohakik style with 43%. HPP descriptor has worked well with most of the styles because it is a representation of a general characteristic. The lowest accuracy, which is 58%, produced with the style Farisi because it can be written in different manners (some texts are written on the baseline; in some others, words are written one above another). We can conclude that general descriptors have the highest accuracies. Nevertheless, by combining these general descriptors with the specific ones, even better results can be achieved. In the following scenario, we evaluate the impact of combining all the de- Appl. Sci. 2021, 11, 4852 scriptors and compare the results to other proposed methods including deep learning.13 of 17 Scenario 2: combine the descriptors After having proved the efficiency of the individual descriptors, we combine them tohand, get a powerful stacked auto-encoder descriptor that has can scored be used the worstto distinguish precision the (resp. nine recall) AC styles among effectively. all. This is Tobecause this end, using each stacked descriptor auto-encoders is used to needstrain a tens separated of thousands classifier of images (SVM in to tuneour case) thousands and theof overall parameters decision that constitutewill be the the sum network, of the especiallydecisions generated since the input by all of the the classifiers. network is In the otherpixels words, of the the image final itselfdecision rather will than be the a descriptor. one with the Yet, highest no AC votes. dataset Figure contains 11 represents this huge thenumber precision of images.and recall To yielded further evaluateby our combined the performance descriptor of thesecompared methods, to other F1-score methods at the fromstyle-level state of hasthe beenart [16,17,20,24,25]. estimated and listed in Table4.

100% Recall Precision 90%

80%

70%

60%

50%

40%

30% F Allaf/NN EDMS/DT EDM1/NN Our Method Our EDM1+LBP/R Auto-encoder

FigureFigure 11. 11. RecallRecall and and precision precision generated generated by by the the proposed proposed method method compared compared to to other other AC-related AC-related works. works.

Table 4. F1-scoreFrom Figure at style-level 11, it yieldedseems bythat our our method propos anded other method related outperform works. s all other related works with a precision of 97%. This remarkable performance of ours is a result of com- EDMS/Decision EDM1+LBP/Random Auto-Encoder EDM1/NN [17] Allaf/NN [24] Our Method Tree [16] bining all AC features usedForest by [calligraphers20] to recognize different[25] styles in one de- Diwani 76%scriptor. The 75% method proposed 90% in [20] (EDM1+LBP) 30% has scored the 35% second-best perfor- 98% mance. This is because the latter deals with AC images as texture that is an approach; we Naskh 92% 87% 98% 81% 48% 99% prove its effectiveness in [22]. Although the work [23] has opted for morphological de- Farisi 56% 58% 80% 23% 2% 97% scriptors as we did, the results were too weak. This is because the authors have taken into Rekaa 54%account the 53% general features only 77% and did not consider 45% the specificity 12% of each AC 98% style. Thuluth 63%On the other 56% hand, stacked auto-encoder 78% has scored 73% the worst precision 43% (resp. 94%recall) Maghribi 66%among all. 77%This is because using 75% stacked auto-encoders 46% needs tens 15% of thousands 97%of im- Kuﬁ 64%ages to tune 72% thousands of parameters 93% that constitute 73% the network, 14% especially since 98% the Mohakik 41%input of the 60% network is the pixels 79% of the image itself 45% rather than a descriptor. 33% Yet, 94%no AC S-Kuﬁ 96% 94% 97% 94% 80% 99%

At first glance, Table4 shows that our proposed method outperforms other methods in all styles. S-kufi seems to be the most distinguishable style among the others due to its unique features. In contrast, Thuluth, Mohakik, and Rakaa seem to be harder to distinguish. This can be attributed to the common way of writing of Rakaa and the similarity between Mohakik and Thuluth. Nevertheless, our proposed method has yielded satisfying results because we took advantage of slight changes among styles such as the unique way of writing diacritics. With most of the styles, our proposed method has produced more than 97% accuracy. However, with Thuluth and Mohakik, our method yields the lowest accuracy, which is 94%. To get an idea about how the styles Thuluth and Mohakik are misclassified, we generate and present the confusion matrix in Table5. Appl. Sci. 2021, 11, 4852 14 of 17

Table 5. The confusion matrix, of the nine AC styles, generated by our proposed method.

Style Diwani Naskh Farisi Rekaa Thuluth Maghribi Kufi Mohakik S-Kufi Diwani 100% 0% 0% 0% 0% 0% 0% 0% 0% Naskh 0% 100% 0% 0% 0% 0% 0% 0% 0% Farisi 2% 0% 96% 2% 0% 0% 0% 0% 0% Rekaa 0% 0% 1% 98% 0% 1% 0% 0% 0% Thuluth 0% 0% 1% 0% 95% 0% 0% 3% 1% Maghribi 0% 0% 1% 0% 1% 99% 0% 0% 0% Kufi 0% 0% 0% 0% 0% 2% 98% 0% 0% Mohakik 0% 0% 0% 0% 5% 0% 0% 94% 1% S-Kufi 0% 0% 0% 1% 0% 0% 1% 0% 99%

From Table5, we can see that our proposed method produces satisfying results with most of the styles. However, Mohakik and Thuluth have misclassiﬁed one as the other in some cases, which decreases the accuracy. This is because both styles Mohakik and Thuluth have similar shapes with less unique features that can be used to distinguish one from the other. Scenario 3: Compare to other texture and deep learning methods In a previous work [15], we proved that AC is better to be dealt with as texture. Appl. Sci. 2021, 11, x FOR PEER REVIEW 15 of 17 Since AC text is relatively a homogeneous (i.e., stationary) texture with repeated patterns (characters, diacritics, etc.), using statistical feature descriptors yields better results than learning-based including deep learning [33]. Furthermore, descriptors that are designed to besigned used to with be heterogenousused with heterogenous images are harderimages to are train harder (i.e., moreto train images (i.e., aremore needed) images than are descriptorsneeded) than dedicated descriptors to a speciﬁcdedicated family to a ofspecific images. family That of is toimages. say, all That the descriptorsis to say, all that the wedescriptors compare that to our we method compare will to poorly our method perform will in cases poorly of aperform small training in cases set. of To a provesmall thesetraining claims, set. anTo evaluation prove these of statisticalclaims, an against evaluation learning-based of statistical (deep against learning) learning-based methods in both(deep normal learning) and methods small training in both set normal size has and been sm carriedall training out. Figure set size 12 has represents been carried precision out. obtainedFigure 12 in represents both cases. precision obtained in both cases.

100% 10% Train 33% Train 95%

90%

85%

80%

75%

70%

65%

60%

55%

50% BSIF LPQ Resnet-50 Rest net 101 Alexnet Our method

FigureFigure 12.12. RelatedRelated worksworks evaluatedevaluated andand comparedcompared againstagainst ourour methodmethod usingusing 33%33% andand 10%10% imagesimages subset respectively for training. subset respectively for training.

FromFrom Figure Figure 12 12,, it it appears appears that that our our method method outperforms outperforms all otherall other methods methods in both in 33%both and33% 10% and training 10% training cases. Ascases. we haveAs we stated have above, stated statistical above, statistical descriptors descriptors (e.g., BSIF (e.g., and LPQ) BSIF thatand areLPQ) designed that are to designed be used to with be used heterogeneous with heterogeneous textures have textures performed have performed well with well AC stylewith recognition.AC style recognition. However, we However, have claimed we ha thatve suchclaimed descriptors that such lose descriptors performance lose in per- the formance in the case of using a few samples for training. The proof of this claim can easily be noted in the accuracy drop of all methods in the case of using a small image set for training. Our method, on the other hand, seems not to be greatly affected by reducing the size of the training subset to 10%. By this last scenario, we confirm that using dedicated descriptors for some specific datasets, which is the case with AC, is far more effective than general descriptors

6. Conclusions The main aim of this research was to investigate employing cues used by the calligrapher for AC style recognition. To this end, the features used by calligraphers have been codified into descriptors, some of which are dedicated to distinguishing specific styles and some others being for general features and some others are for general features. We did experiment our proposed descriptor in two different ways as individual features, as some descriptors have been made to identify one single style, whereas some others are general and can be used to separate multiple styles; in combined form, we combined them to get a powerful descriptor that can be used to distinguish the nine AC styles effectively using a rich dataset containing 1685 images categorized into nine styles. The results indicated the outperformance of the proposed method compared to other related works including deep learning. By these results, we have confirmed that exploiting the calligrapher’s expertise rather than using general image features highly improves the performance of the AC recognition. Furthermore, the experiment outcomes proved that our descriptor was not highly affected by the small size of the training data, unlike other general-purpose descriptors. In this work, we have considered the most Appl. Sci. 2021, 11, 4852 15 of 17

case of using a few samples for training. The proof of this claim can easily be noted in the accuracy drop of all methods in the case of using a small image set for training. Our method, on the other hand, seems not to be greatly affected by reducing the size of the training subset to 10%. By this last scenario, we conﬁrm that using dedicated descriptors for some speciﬁc datasets, which is the case with AC, is far more effective than general descriptors.

6. Conclusions The main aim of this research was to investigate employing cues used by the calligrapher for AC style recognition. To this end, the features used by calligraphers have been codified into descriptors, some of which are dedicated to distinguishing specific styles and some others being for general features and some others are for general features. We did experiment our proposed descriptor in two different ways as individual features, as some descriptors have been made to identify one single style, whereas some others are general and can be used to separate multiple styles; in combined form, we combined them to get a powerful descriptor that can be used to distinguish the nine AC styles effectively using a rich dataset containing 1685 images categorized into nine styles. The results indicated the outperformance of the proposed method compared to other related works including deep learning. By these results, we have confirmed that exploiting the calligrapher’s expertise rather than using general image features highly improves the performance of the AC recognition. Furthermore, the experiment outcomes proved that our descriptor was not highly affected by the small size of the training data, unlike other general-purpose descriptors. In this work, we have considered the most common features used by calligraphers in distinguishing styles. However, additional efforts should be spent to explore other less common features. Such features will surely further improve the results and extend the system’s capability to other styles. Another aspect that needs to be tackled is text transformation, such as rotation, which is the case with most decorative texts.

Author Contributions: Conceptualization, Z.K., M.L.K. and B.K.; methodology, Z.K.; software, Z.K.; validation, Z.K., M.L.K. and B.K.; formal analysis, Z.K. and B.K.; investigation, Z.K.; resources, Z.K.; data curation, Z.K.; writing—original draft preparation, Z.K.; writing—review and editing, M.L.K. and B.K.; visualization, Z.K.; supervision, M.L.K.; project administration, M.L.K. All authors have read and agreed to the published version of the manuscript. Funding: This research received no external funding. Institutional Review Board Statement: Not applicable. Informed Consent Statement: Not applicable. Data Availability Statement: The data that support the findings of this study are available from the corresponding author upon reasonable request. Conflicts of Interest: The authors declare no potential conflict of interests.

References 1. Binmakhashen, G.M.; Mahmoud, S.A. Document Layout Analysis: A Comprehensive Survey. ACM Comput. Surv. 2019, 52, 109. [CrossRef] 2. Stauffer, M.; Maergner, P.; Fischer, A.; Riesen, K. A Survey of State of the Art Methods Employed in the Offline Signature Verifica- tion Process. In New Trends in Business Information Systems and Technology: Digital Innovation and Digital Business Transformation; Dornberger, R., Ed.; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2020; pp. 17–30. 3. Rehman, A.; Naz, S.; Razzak, M.I. Writer identification using machine learning approaches: A comprehensive review. Multimed. Tools Appl. 2019, 78, 10889–10931. [CrossRef] 4. Available online: https://www.worldatlas.com/articles/the-world-s-most-popular-writing-scripts.html (accessed on 10 May 2021). 5. Hussein, A.K. Fast learning neural network based on texture for Arabic calligraphy identification. Indones. J. Electr. Eng. Comput. Sci. 2021, 21, 1794–1799. [CrossRef] 6. Luqman, H.; Mahmoud, S.A.; Awaida, S. Arabic and Farsi Font Recognition: Survey. Int. J. Pattern Recognit. Artif. Intell. 2015, 29, 1553002. [CrossRef] Appl. Sci. 2021, 11, 4852 16 of 17

7. Bar Yosef, I.; Kedem, K.; Dinstein, I.; Beit-Arie, M.; Engel, E. Classification of hebrew calligraphic handwriting styles: Preliminary results. In Proceedings of the First International Workshop on Document Image Analysis for Libraries, Palo Alto, CA, USA, 23–24 January 2004; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2004; pp. 299–305. 8. Wang, L.; Gong, X.; Zhang, Y.; Xu, P.; Chen, X.; Fang, D.; Zheng, X.; Guo, J. Artistic features extraction from chinese calligraphy works via regional guided filter with reference image. Multimed. Tools Appl. 2018, 77, 2973–2990. [CrossRef] 9. Pengcheng, G.; Gang, G.; Jiangqin, W.; Baogang, W. Chinese calligraphic style representation for recognition. Int. J. Doc. Anal. Recognit. 2017, 20, 59–68. [CrossRef] 10. Kallel, F.; Mezghani, A.; Kanoun, S.; Kherallah, M. Arabic Font Recognition Based on Discret Curvelet Transform. In Proceedings of the Third International Afro-European Conference for Industrial Advancement—AECIA 2016; Advances in Intelligent Systems and Computing Series; Springer Science and Business Media LLC: Berlin/Heidelberg, Germany, 2018; Volume 565, pp. 360–369. 11. Jaiem, F.K.; Kherallah, M. A novel Arabic font recognition system based on texture feature and dynamic training. Int. J. Intell. Syst. Technol. Appl. 2017, 16, 289. 12. Jaiem, F.K.; Slimane, F.; Kherallah, M. Arabic font recognition system applied to different text entity level analysis. In Proceedings of the 2017 International Conference on Smart, Monitored and Controlled Cities (SM2C) 2017, Sfax, Tunisia, 17–19 February 2017; pp. 36–40. 13. Sakr, G.; Mhanna, A.; Demerjian, R. Convolution Neural Networks for Arabic Font Recognition. In Proceedings of the 2019 15th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Sorento, Italy, 26–29 November 2019; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2019; pp. 128–133. 14. Amer, I.M.; Mostafa, M.G. Deep Arabic Font Family and Font Size Recognition. Int. J. Comput. Appl. 2017, 176, 1–6. [CrossRef] 15. Kaoudja, Z.; Khaldi, B.; Kherfi, M.L. Arabic Artistic Script Style Identification Using Texture Descriptors. In Proceedings of the 2020 1st International Conference on Communications, Control Systems and Signal Processing (CCSSP), El Oued, Algeria, 16–17 May 2020; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2020; pp. 113–118. 16. Bataineh, B.; Abdullah, S.N.H.S.; Omar, K. A novel statistical feature extraction method for textual images: Optical font recognition. Expert Syst. Appl. 2012, 39, 5470–5477. [CrossRef] 17. Bataineh, B.; Abdullah, S.N.H.S.; Omar, K.; Batayneh, A. Arabic-Jawi Scripts Font Recognition Using First-Order Edge Direction Matrix. In Proceedings of the International Multi-Conference on Artificial Intelligence Technology, Shah Alam, Malaysia, 28–29 August 2013; pp. 27–38. 18. Bataineh, B.; Abdullah, S.N.H.S.; Omar, K. Arabic calligraphy recognition based on binarization methods and de-graded images. In Proceedings of the 2011 International Conference on Pattern Analysis and Intelligent Robotics, ICPAIR 2011, Putrajaya, Malaysia, 28–29 June 2011; pp. 65–70. 19. Bataineh, B.; Abdullah, S.N.H.S.; Omar, K. Generating an Arabic Calligraphy Text Blocks for Global Texture Analysis. Int. J. Adv. Sci. Eng. Inf. Technol. 2011, 1, 150–155. [CrossRef] 20. Talab, M.A.; Abdullah, S.N.H.S.; Razalan, M.H.A. Edge direction matrixes-based local binary patterns descriptor for invariant pattern recognition. In Proceedings of the 2013 International Conference on Soft Computing and Pattern Recognition (SoCPaR), Hanoi, Vietnam, 15–18 December 2013; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2013; pp. 13–18. 21. Azmi, M.S.; Omar, K.; Nasrudin, M.F.; Ghazali, K.W.M.; Abdullah, A.; Abdullah, A. Arabic calligraphy identification for Digital Jawi Paleography using triangle blocks. In Proceedings of the 2011 International Conference on Electrical Engineering and Informatics, Bandung, Indonesia, 17–19 July 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 1–5. 22. Azmi, M.S.; Omar, K.; Nasrudin, M.F.; Muda, A.K.; Abdullah, A. Arabic calligraphy classification using triangle model for Digital Jawi Paleography analysis. In Proceedings of the 2011 11th International Conference on Hybrid Intelligent Systems (HIS), Malacca, Malaysia, 5–8 December 2011; pp. 704–708. 23. Adam, K.; Al-Maadeed, S.; Bouridane, A. Letter-based classification of Arabic scripts style in ancient Arabic manuscripts: Preliminary results. In Proceedings of the 2017 1st International Workshop on Arabic Script Analysis and Recognition (ASAR), Nancy, France, 3–5 April 2017; pp. 95–98. [CrossRef] 24. Allaf, S.R.; Al-Hmouz, R. Automatic Recognition of Artistic Arabic Calligraphy Types. J. King Abdulaziz Univ. 2016, 27, 3–17. [CrossRef] 25. Al-Hmouz, R. Deep learning autoencoder approach: Automatic recognition of artistic Arabic calligraphy types. Kuwait J. Sci. 2020, 47, 3. 26. Khayyat, M.; Elrefaei, L. A Deep Learning Based Prediction of Arabic Manuscripts Handwriting Style. Int. Arab. J. Inf. Technol. 2020, 17, 702–712. [CrossRef] 27. Kaoudja, Z.; Kherfi, M.L.; Khaldi, B. An efficient multiple-classifier system for Arabic calligraphy style recognition. In Proceedings of the 2019 International Conference on Networking and Advanced Systems (ICNAS), Annaba, Algeria, 26–27 June 2019; Institute of Electrical and Electronics Engineers (IEEE): Piscataway, NJ, USA, 2019; pp. 1–5. 28. Moghaddam, R.F.; Cheriet, M.; Milo, T.; Wisnovsky, R. A prototype system for handwritten sub-word recognition: Toward Arabic-manuscript transliteration. In Proceedings of the 2012 11th International Conference on Information Science, Signal Processing and their Applications (ISSPA), Montreal, QC, Canada, 2–5 July 2012; pp. 1198–1204. 29. Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [CrossRef] Appl. Sci. 2021, 11, 4852 17 of 17

30. Cheriet, M.; Kharma, N.; Liu, C.-L.; Suen, C. Character Recognition Systems: A Guide for Students and Practitioners; John Wiley & Sons: Hoboken, NJ, USA, 2007. 31. Lutf, M.; You, X.; Cheung, Y.-M.; Chen, C.P. Arabic font recognition based on diacritics features. Pattern Recognit. 2014, 47, 672–684. [CrossRef] 32. Hu, M.-K. Visual pattern recognition by moment invariants. IEEE Trans. Inf. Theory 1962, 8, 179–187. [CrossRef] 33. Khaldi, B.; Aiadi, O.; Kherﬁ, M.L. Combining colour and grey-level co-occurrence matrix features: A comparative study. IET Image Process. 2019, 13, 1401–1410. [CrossRef]