Engineering Science and Technology, an International Journal xxx (2016) xxx–xxx

Contents lists available at ScienceDirect Engineering Science and Technology, an International Journal

journal homepage: www.elsevier.com/locate/jestch

Full Length Article

Recognition-based online Kurdish character recognition using hidden Markov model and harmony search

Rina D. Zarro ⇑, Mardin A. Anwer

Software Engineering Department, Salahaddin University-Erbil, Erbil, Kurdistan

Article info

Article history:
Received 1 August 2016
Revised 4 November 2016
Accepted 18 November 2016
Available online xxxx

Keywords:
Character recognition
Evolutionary computation
Kurdish character recognition
Hidden Markov model
Harmony search

Abstract

In this paper, hidden Markov model (HMM) and harmony search (HS) algorithms are combined for writer-independent online Kurdish character recognition. The HMM is integrated as an intermediate group classifier rather than as the main character classifier/recognizer, as in most previous works. It classifies each group of characters, according to their forms, into smaller subgroups based on a common directional feature vector. This reduces the processing time of the later recognition stage. The small number of candidate characters is then processed by the HS recognizer, which uses a dominant and common movement pattern as its fitness function. The objective function minimizes the matching score according to the fitness criteria, selecting the lowest score for each segmented group of characters. The system then displays the generated word with the lowest score among the generated character combinations. The system was tested on a dataset of 4500 words comprising 21,234 characters in different positions or forms (isolated, start, middle and end). It achieved a 93.52% recognition rate with an average time of 500 ms. Compared with similar systems that use HMM as the main recognizer, it showed a considerable improvement in recognition rate.

© 2016 Karabuk University. Publishing services by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

1. Introduction

The growth of portable and mobile technologies has created a need for character and text recognition applications, as most current devices do not integrate a keyboard. However, character recognition is a complex and wide research area. Developing a successful system requires analyzing a very large dataset, which takes a long time. The recognition success rate depends largely on the methods used for character preprocessing, feature extraction and recognition. The challenges of developing an efficient system are not limited to recognition rate and accuracy; they also include recognition time, especially in online recognition [23]. It is therefore very common for recognition systems to sacrifice some success rate in order to achieve an acceptable recognition time.

Kurdish, like many languages that use Urdu-, Arabic- and Persian-based characters, has attracted wide research interest in character recognition over the past few years. Research on Kurdish character recognition is motivated by its special and complicated writing styles, which it shares with these related languages. One complication is that a single word can be written with one stroke or with many strokes, depending on the user's writing style. Moreover, a single character can be written in various styles. In addition, a word may change when diacritics (dots, hamza, etc.) are written.

In this paper, a combination of hidden Markov model (HMM) and harmony search (HS) is used to recognize Kurdish characters produced by a word segmentation process. Each character is preprocessed to create a more consistent movement pattern and to remove effects such as incomplete chains and hooks. HMM is used as a first-stage recognition step that extracts the group of characters to which the input character may belong. HS is then applied to identify the best-fitting character in that group by comparison with a reference dataset. A special matching criterion is used in the HS fitness function.

The rest of this paper is organized as follows. Section 2 reviews similar works in the field of online Kurdish character recognition. Section 3 introduces the Kurdish character system, explaining the characters' structure and the problems and challenges it presents. Section 4 explains the structure of the online Kurdish character recognition system.

⇑ Corresponding author.
E-mail addresses: [email protected] (R.D. Zarro), mardinsherwany77@gmail.com (M.A. Anwer).
Peer review under responsibility of Karabuk University.
http://dx.doi.org/10.1016/j.jestch.2016.11.016
2215-0986/© 2016 Karabuk University. Publishing services by Elsevier B.V.

Please cite this article in press as: R.D. Zarro, M.A. Anwer, Recognition-based online Kurdish character recognition using hidden Markov model and harmony search, Eng. Sci. Tech., Int. J. (2016), http://dx.doi.org/10.1016/j.jestch.2016.11.016

It explains the preprocessing phase, in which characters are enhanced before features are extracted for the recognition phase. It also explains the recognition phase, which comprises the HMM classifier and the HS recognizer. The results are presented in Section 5, and Section 6 draws the conclusions.

2. Related works

Kurdish character recognition systems can be categorized into two groups: offline and online [23]. Offline systems deal with machine-printed or handwritten text or characters. These data are processed as images and pixels in order to extract as many features and as much information as possible. Processing time may be a minor factor compared with text recognition accuracy. Online systems, on the other hand, deal with coordinate data obtained directly from the strokes written on a tablet device. Preprocessing and information extraction in online systems must occur in real time, so a fast algorithm must be integrated into the system. As a result, online systems may achieve lower recognition accuracy than offline systems because less processing time is available.

There is little research on Kurdish handwriting [33]. However, various studies address languages with a similar set of characters, such as Arabic, Urdu and Persian. These languages have many characters in common with Kurdish. Therefore, this paper draws on these languages for problem presentation, literature review and comparison. Research on Kurdish-like characters began at the end of the 1980s. The earliest systems handled only optical recognition of scanned printed or handwritten documents. The first online systems dealt mainly with isolated Arabic characters. To keep the system as simple as possible, they did not take character forms in cursive writing into consideration [15]. Later, more advanced online systems for Kurdish-like characters were introduced, most of them recognizing isolated characters [1,16]. By the end of the 1990s, these systems had been modified and enhanced to handle segmented or segmentation-free text recognition [5].

Hidden Markov model (HMM) is a statistical tool used to model a sequence of events that can be characterized by a Markov process [34]. It has been used effectively in speech and character recognition systems for various languages, such as Latin, Korean, Kurdish and many others [14,4,10,25]. In languages whose character set is similar to Kurdish, HMM has been integrated either as a standalone recognizer or as a stage in the recognition process. These works differ mainly in the extracted character features or in the character encoding used to train the HMM. Daifallah et al. [12] used an HMM for recognition in which the segmented characters are treated as images. These images are preprocessed and scaled to extract seven image moments for each character. These moments are used to train the HMM recognizer and to classify the result. Rashwan et al. [35] used a feature vector for HMM training and recognition. Their system slides a window across the word to generate a set of features for each word. The vector consists of features based on lossless differential luminosity coding. Additionally, a dynamic range normalization parameter estimator detects the effective dynamic range of the features. It then calculates the normalization parameters from the population of feature vectors in the training data. Biadsy et al. [8] used a feature vector and HMM to recognize Arabic words. The feature vector consists of three main features: local angle, super segment and loop features. These features were quantized to create a dataset for HMM training. Naz et al. [30] presented a full review of HMM-based handwriting recognition for various languages that share the same alphabet. The review covered the use of HMM for recognizing Arabic, Persian, Urdu and Pashto handwriting, and explored various feature extraction and classification techniques.

HMM has also been combined with other algorithms to enhance the feature extraction results. Amor et al. [6] used a Hough transform and HMM for multi-font Kurdish character recognition. The Hough transform extracted unique and meaningful features from the character in the form of a set of lines. These lines were used to train the HMM. The HMM for each character had 4–7 left-to-right states, in which each state either returns to itself or moves to the next state. The system was intended to recognize printed Kurdish characters in different fonts. The same authors later used this approach to develop more robust features based on the wavelet transform [7]. That system uses both Hough and wavelet features to train the HMM for better results. The basic results obtained from these HMM-based recognition systems were acceptable. However, the results were always limited, and there was little opportunity to modify the systems to obtain better results. Razzak et al. [36] used fuzzy set theory with HMM for Urdu characters. The HMM input consists of fuzzy rules created to identify characters within the script. This approach improved the recognition rate from 81% to 87%.

Evolutionary and swarm-based algorithms are generic population-based metaheuristic optimization algorithms. They generate a new population from the existing one according to a fitness function. These algorithms were initially used successfully in offline rather than online character recognition, because they take a long time to find a feasible solution, whereas short recognition time is crucial in online systems. Genetic algorithm (GA) was first used to optimize the feature selection problem. The GA-generated population was used to find the smallest feature subset, drawn from a wider feature range, that optimizes the separation between classes. The algorithm was tested successfully on digit [11] and Persian [40] datasets. Beyond feature selection, GA has also been used in offline character recognition systems and was successfully applied to Latin [27] and Arabic [2,28]. In addition to GA, other heuristic methods based on swarm optimization have been employed in character recognition. Nebti et al. [31] applied two methods to digit recognition based on image moment calculations: particle swarm optimization (PSO), and a combination of a back-propagation neural network and bee colony. In the first method, particle swarm was used as a statistical classifier that compares the generated features with the digit dataset features to obtain the optimal class. The second method assigned the class using back propagation and, when no classification was obtained, used the bee colony to assign the class. Sarfraz et al. [37] used PSO with moment invariants for Arabic character recognition. PSO was applied to optimize the weight given to each feature in the feature vector so as to maximize the chance of finding the right class. Singh and Shrivastava [39] evaluated the performance of a feed-forward neural network with three different soft computing techniques on handwritten English alphabets. The study concluded that a neural network combined with an evolutionary algorithm gave better recognition accuracy than a standalone neural network. It also observed that, for every training set, more than one converged weight matrix exists in character recognition. Finally, Singh et al. [38] integrated GA and a feed-forward NN to evaluate the recognition of curved Hindi scripts. GA is used to make the search for the optimal weight vectors in the generated population more efficient. The proposed method combines gradient descent of the distributed error with GA, and is known as the hybrid distributed evolutionary technique for multilayer feed-forward neural networks. According to the study, it performs better in terms of accuracy, epochs and the number of optimal solutions for the given training and test pattern sets.

Among metaheuristic search techniques, the HS algorithm is well known as an efficient and fast search method [21]. It was introduced as a new heuristic search technique in 2001 [22]. Since then, it has been widely tested on well-known optimization problems such as task assignment [42], flow shop scheduling [19,20] and routing [9]. It has also been applied in many fields, such as engineering optimization [17], structural design [13], communication [18] and many others. However, its use in character recognition has not yet been explored. Because HS is a fast search method, it suits problems in which time is an important factor, such as online applications.

In this research, HS is combined with HMM to improve the accuracy and speed of online Kurdish character recognition. HMM as a standalone classifier (recognizer) has several drawbacks. A well-known drawback is that the probability of each observation depends only on the current state, which makes contextual effects difficult to model. Therefore, in this study HMM classifies characters into multiple subsets based on similarity instead of classifying each character into a single class. This avoids having to decide between similar characters at the HMM stage. Finally, HS matches the direction vector of the target character against combinations of directions generated from a stored dataset of direction vectors for each Kurdish letter. The system uses a recognition-based approach built on the best character score per selected ligature. This is intended to ensure that the isolated characters produced by the segmentation phase are correct.

3. Kurdish characters review

Kurdish is spoken across four countries in the Middle East [41] by more than 30 million people. Its character set is largely shared with other languages such as Arabic, Persian, Urdu and Jawi [30]. The alphabet consists of 33 letters in their basic isolated form. The script is cursive and is written from right to left. In cursive writing, a character's shape changes with its position in the word. There are 4 forms: isolated, initial, medial and final, as shown in Fig. 1. Not every character appears in every form: there are 33 characters in isolated form, 27 in initial (start) form, 23 in medial (middle) form and 33 in final (end) form. Moreover, some letters share the same main shape and differ only in their diacritics (dots), e.g., (Paa, Baa, Taa). Ignoring diacritics reduces the characters to their base structures. For example, the letters (Paa, Baa, Taa) share one base structure (Baa), and the letters (Geem, Haa, Chaa) share one base structure (Haa). The total number of characters in each form is therefore reduced to 15 in isolated form, 9 in initial form, 7 in medial form and 15 in final form.

In some cases, handwriting merges characters. This is very common in the combination of the letters Alef and Lam, which form a single pattern, La (ﻻ). Moreover, some writing styles make characters very difficult to separate, as with the letter Lam in its initial form. When Haa or Mem follows Lam in handwriting, the two letters blend, which changes or removes features of the first letter. The recognition system may therefore treat such merged combinations as one character and assign a class to each case. Fig. 2 shows the two cases.

The Kurdish alphabet contains many different structures. In many cases, characters are similar in a specific part. For example, the starts of the characters (Alf, Lam and Tah) have the same vertical pattern. Some characters include a specific geometric shape, such as loops (Mem and Waw), acute angles (Dal and Haa) or curvature (Ain and Raa). These features can be used to differentiate between characters, to divide the character dataset into smaller subsets, or even to identify a single character.

Kurdish character recognition is a good example of how complex a recognition system can be. Many characters share common features, which may cause recognition errors. In addition, the cursive nature of Kurdish writing causes letter shapes and writing styles to vary from writer to writer. Moreover, the standard writing movement for any character can change depending on a person's educational background or preferences. The instability of the writing device (e.g., pen and tablet) can also affect the quality or clarity of writing. It may cause inconsistent writing patterns, discontinuous connected patterns, and hooks at the start and end of strokes. Fig. 3 illustrates some of the common characteristics and difficulties of the Kurdish character system.

4. Online Kurdish character recognition system

The proposed system for online Kurdish character recognition is shown in Fig. 4. After segmentation, the system starts with a preprocessing step that prepares the character for recognition. Segmentation uses the technique proposed in [32]. Character segmentation uses dominant point detection to transform the handwritten script into a set of straight-line movements. Afterwards, any horizontal line moving from right to left is considered a candidate for segmentation. The final segment points are formed from different combinations of the candidate points. For example, if 3 candidate segment points are found, there are 7 segment combination sets, one for each non-empty subset of the three candidates, and these are tested as the actual segment points. The preprocessing stage includes smoothing, point interpolation and hook removal. The second step classifies the character into a specific group according to its features. For this purpose, the HMM assigns the character to its corresponding subgroup. The final step is HS recognition, which selects the best-matching character from the subgroup according to a special optimization criterion. The word with the minimal total score over all its characters (the total word score) is returned as the recognized word.

4.1. Preprocessing

Most character recognition systems (especially for handwriting) include a preprocessing phase in which the character is normalized and prepared for information or feature extraction. Handwritten words or characters may include stroke errors or jitter caused by instability of the writing device or by hand tremor during writing. Writing speed may cause further problems, such as an incomplete chain of points along the characters. Such a chain causes gaps and ambiguity in the written characters (at the start or end of a character), or hooks (as illustrated in Fig. 3(a) and (b)). To solve the problems of jitter and incomplete chains, cubic Bézier curve interpolation is used to enhance the shape of the movement and to fill the gaps between points [24]. This interpolation divides the character's points into groups of four. Each group serves as the control points of a curve that fills the gaps between them, according to

$$B_z(t) = (1-t)^3 P_1 + 3t(1-t)^2 P_2 + 3t^2(1-t) P_3 + t^3 P_4 \tag{1}$$

where $t$ varies from zero to one. This interpolation will generate duplicated points while trying to fill


Fig. 1. Kurdish alphabet with character writing style at different position (isolated, initial, medial and final).

Fig. 2. Merged Kurdish characters that are difficult to separate without losing features.

the gaps. These redundant points are removed from the character representation by checking adjacent points for identical (x, y) coordinates. The interpolation does not solve the hooking problem. Hooks are removed from the start and end of the character by an iterative method based on Huang et al. [26]. It checks for a sign change in the slopes of two adjacent lines within the first and last 5% of the character's points.

Fig. 3. Kurdish character characteristics. (a) Incomplete writing chain. (b) Dehooking at the start and end of writing. (c) Change in the shape of letter ﻉ at different positions (isolated, start, middle and end). (d) Different writing styles for letter ﺡ. (e) Different movement patterns for Kurdish letter ﻩ in the middle position.

4.2. Kurdish set classification with hidden Markov model

This section explains how HMM is used as a group classifier. The classification process consists of two phases: directional feature extraction and HMM classification.

4.2.1. Feature extraction

The feature extraction phase determines the set of significant features or properties that can separate each character class from the others. Previous attempts to enhance feature extraction used either physical properties, such as moments and speed modeling [11], or mathematical descriptions, such as curves and lines [16,29]. This work focuses on two feature categories: directional and structural features. Directional features divide the full set of characters (for each character form) into smaller subsets with similar features. Such a feature can represent the initial movement of the character (e.g., (Ain-Yaa-Gaa)) or its end movement (e.g., (Baa, Faa, Lam)). Writing direction is another example of a directional feature. Some characters initially follow the writing direction (right to left), such as (Waw-Ain-Faa), others move in the opposite direction, such as (Haa-Dal), and others do neither, such as (Raa-Lam-Alf). Structural features, on the other hand, represent shape-related properties such as loops, sharp angles or the height-to-width ratio.

The directional feature is extracted from the character by transforming its representation into a movement vector. The character


representation is changed from a sequence of coordinate points (x, y) into a chain of angular directions using the eight-direction model shown in Fig. 5(a). The movement vector contains both major movements and less influential movements. This vector is clustered to obtain a significant movement vector containing only the dominant directions. To cluster the character, consider a character $C_D$ described as a sequence of $N$ directions $\{v_i : v_i \in [1, 8]\}$. This representation can be written as a run-length encoding:

$$C_D = \{\, v_1(N_1)\; v_2(N_2)\; \dots\; v_m(N_m) \,\} \tag{2}$$

where $\sum_{i=1}^{m} N_i = N$.

A non-dominant direction $v_i$ has a run length $N_i$ at or below a threshold value $k$. To cluster it, it is compared with the previous and next directions $v_{i-1}$ and $v_{i+1}$:

$$v_i = \begin{cases} v_{i-1}, & N_{i-1} \ge N_{i+1} \\ v_{i+1}, & N_{i-1} < N_{i+1} \end{cases} \tag{3}$$

Eq. (3) implies that if the run length of $v_i$ is at or below the threshold $k$, the run is reassigned to the more dominant adjacent direction. The threshold used in this paper is $k = 3$, obtained empirically from the character dataset. Fig. 5(b) shows the result of applying this clustering technique to the Kurdish character ﻉ. The clustered vector consists only of dominant movements; the less influential movements have been removed. Moreover, the new character shape consists of straight line segments instead of a smooth curve. This increases the similarity between certain groups of characters at specific positions while increasing their difference from others. Fig. 5(c) shows the initial movement vector (starting part) of the characters (Ain-Yaa-Gaa) and illustrates their close movement similarity. This approach is used to group characters by movement-pattern similarity in specific parts, for each character form (isolated, initial, medial and final). In the isolated and initial forms, many characters share the same writing pattern at the start of the character. Medial characters can share the same writing pattern in their middle parts, because their start and end parts are ligatures or connectors to the previous and next characters. Finally, many final-form characters share the same ending movement (e.g., Waw and Raa), which makes it simple to place them in the same group.

Fig. 4. Proposed system for online Kurdish character recognition.

4.2.2. Hidden Markov model classifier

HMM is a statistical model based on a stochastic process called a Markov chain. The model is represented by a number of states, which are visited according to their transition probabilities. When a state is visited, it emits an observation symbol with a specific probability. The model is denoted by the symbol $\lambda$ [34]:

$$\lambda = (A, B, \pi) \tag{4}$$

where $\lambda$ is the HMM, $A$ is the transition probability, $B$ is the observation probability and $\pi$ is the initial state probability. Training estimates the parameters $A$ and $B$ from observation sequences. These observations are sequences that share a common pattern, which the HMM is intended to capture. Training creates relations between the states, so that an input sequence will follow

Fig. 5. Encoding and sampling technique for Kurdish characters. (a) Eight directional model for character encoding. (b) Character A in encoding and sampling. (c) Similarity in initial direction between letters (Ain-Yaa-Gaa).

a state path according to its transition probabilities. Eventually, the final HMM contains a full set of initial, transition and observation probabilities, together with the other entities that define the model. In general, the following parameters must be defined before an HMM is used for classification:

$\pi = \{\pi_i = P(s_i \text{ at } t = 1)\}$
$A = \{a_{ij} = P(s_j \text{ at } t+1 \mid s_i \text{ at } t)\}$
$B = \{b_i(k) = P(o_k \text{ at } t \mid s_i \text{ at } t)\}$
$T$ = length of the observation sequence.
$N$ = number of states in the model.
$M$ = number of observation symbols.
$S = \{s_i\},\ 1 \le i \le N$, the states.
$O = \{o_j\},\ 1 \le j \le M$, the discrete set of possible observation symbols.

In this paper, a left-to-right HMM is used to classify Kurdish characters into small subsets, as shown in Fig. 6. The model is subject to the following conditions:

a. There is no transition to a previous state, i.e., $a_{ij} = 0$ for $j < i$.
b. A transition must not skip more than a specific number of states $\Delta$, i.e., $a_{ij} = 0$ for $j > i + \Delta$.

Fig. 6. Left to right Hidden Markov Model.

Many systems that use HMM as the classifier-recognizer have to deal with its limited capabilities [3,6,29]. The HMM configuration depends strongly on the training set and on the number of states chosen to describe the system. Moreover, HMM is vulnerable to close similarities between characters and to errors in the writing process, because each observation depends only on the current state. Hence, this system uses HMM as a subset classifier that extracts, from a more general set, a subset of characters sharing the same feature. The set of all Kurdish characters is divided into small subsets of 3–6 characters at most. Characters that share a common movement pattern are grouped together according to their directional vectors. The characters in an extracted subset may share the same initial pattern (e.g., (Ain-Yaa-Gaa)), medial pattern (e.g., medial Baa and Lam) or end pattern (e.g., Faa-Baa or Waw-Raa). The division follows these criteria for each of the four character forms (isolated, initial, medial and final):

a. Isolated and initial characters are divided according to the initial movement vector, with a vector length of 10 samples per character.
b. Medial characters are divided according to the medial part of the movement vector, with a vector length of 20.
c. Final characters are divided according to the ending part of the movement vector, with a vector length of 10.

Fig. 7 shows the subsets of each character form according to the similarity of their movement vectors under the above criteria. The HMM assigns an input movement vector to the corresponding subset of characters. Under this similarity, the 15 isolated characters fall into four subsets, the 9 initial characters into four subsets, the 7 medial characters into three subsets and the 15 final characters into five subsets. The characters within a single subset can be clearly recognized and separated by their differences in shape or structure.

4.3. Character recognition using harmony search

The final step of Kurdish character recognition resolves the classified subset into a single character class. This step consists of two stages: geometric feature extraction and HS recognition.

4.3.1. Geometric feature extraction

Geometric features relate to the character's shape and distinguish characters from each other, e.g., characters with loops (Mem-Waw). These features are difficult to extract because writing style differs between writers, which can introduce errors in defining them. The main features considered in this research are the loop, the degree of reverse movement and the width-to-height ratio.

4.3.1.1. Loop feature detection. The loop feature exists in approximately half of the non-punctuated character set. It is not limited to characters that contain a loop in standard typed form. In handwritten scripts, some characters without a loop can be written in a looped form, such as the letter ﺡ shown in Fig. 8(a). The basic idea of loop detection is to find two matching coordinate points $(x_s, y_s)$ and $(x_e, y_e)$ within the character's coordinate representation. For an incomplete loop, a line is drawn from the first point of the character (isolated and initial forms) or from its end point (final form). Its slope is calculated from the two consecutive points, as shown in Fig. 8(b). The line length does not exceed half of the dimension along the slope direction. Then, the maximum and minimum x and y coordinates of the points lying between the two matched points are found. These values form a rectangle with corners $(x_{min}, y_{min})$, $(x_{min}, y_{max})$, $(x_{max}, y_{min})$ and $(x_{max}, y_{max})$, as shown in Fig. 8(c). Now, let $P_c = (x_c, y_c)$ be the center point of the rectangle, and let $A_R$ and $A_{LP}$ be the areas inside the rectangle and inside the loop, respectively. They are calculated as the area of the rectangle and the approximate area of an ellipse:

$$A_R = \left|(x_{max} - x_{min})(y_{max} - y_{min})\right| \tag{5}$$

$$A_{LP} = \frac{\pi}{4}\left|(x_{max} - x_{min})(y_{max} - y_{min})\right|$$

A loop $LP$ with $M$ points is detected if the following conditions are satisfied:

1. No two points are the same except at the intersection: $\forall P \in LP,\ \nexists\, (x,y)_i = (x,y)_j,\ i \ne j,\ i, j = 2, \dots, M$.
2. The center point lies within the loop area: $P_c \in A_R$ and $P_c \in A_{LP}$.
3. The loop area is at least half of the rectangle area: $A_{LP} \ge \tfrac{1}{2} A_R$.
4. The rectangle width is greater than half of its height.

The first condition means that the loop path contains no redundant point apart from the first (intersection) point ($i = 1$). The second condition tests whether the center point of the rectangle lies inside the loop; if it does, the path could be a loop. The third condition tests whether the loop area is at least half of the area of the enclosing rectangle. For a loop, the (elliptical) area is expected to exceed half of the rectangle area, because the ratio of the ellipse area to the rectangle area is $\pi/4 \approx 0.785$. The ellipse area is therefore about 1.57 times half of the rectangle area. The final condition compares the rectangle's width with its height to separate real looped characters from accidental loop patterns. Fig. 7(d) shows the result of applying this algorithm to two characters with real and accidental loop

Please cite this article in press as: R.D. Zarro, M.A. Anwer, Recognition-based online Kurdish character recognition using hidden Markov model and har- mony search, Eng. Sci. Tech., Int. J. (2016), http://dx.doi.org/10.1016/j.jestch.2016.11.016 R.D. Zarro, M.A. Anwer / Engineering Science and Technology, an International Journal xxx (2016) xxx–xxx 7

Fig. 7. Direction-based subgrouping of the different Kurdish character forms.
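The following plain-Python sketch illustrates the two left-to-right HMM constraints, (a) no backward transitions and (b) at most Δ states skipped, together with the assignment of a movement vector to a subset. It is not the authors' implementation. Several parts are assumptions:

- the jump limit `max_jump`;
- the uniform starting transition probabilities;
- the toy emission tables from the hypothetical `peaked_emissions` helper;
- the maximum-likelihood decision rule.

A real system would estimate the transition and emission tables from the training vectors, for example with Baum–Welch.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]
Model = Tuple[List[float], Matrix, Matrix]  # (initial probs, transitions, emissions)

def left_to_right_transitions(n_states: int, max_jump: int) -> Matrix:
    """a[i][j] = 0 when j < i (constraint a) or j - i > max_jump (constraint b).
    Allowed transitions start with equal probability (assumed initialisation)."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        last = min(n_states - 1, i + max_jump)
        for j in range(i, last + 1):
            A[i][j] = 1.0 / (last - i + 1)
    return A

def peaked_emissions(peaks: Sequence[int], p: float = 0.9) -> Matrix:
    """Hypothetical emission table: state s emits direction peaks[s] with
    probability p and each of the other 7 directions with (1 - p) / 7."""
    return [[p if d == peak else (1 - p) / 7 for d in range(1, 9)] for peak in peaks]

def forward_log_likelihood(obs: Sequence[int], pi: List[float], A: Matrix, B: Matrix) -> float:
    """Scaled forward algorithm: log P(obs | model) for direction codes 1..8."""
    n = len(pi)
    alpha = [pi[s] * B[s][obs[0] - 1] for s in range(n)]
    log_p = 0.0
    for o in obs[1:]:
        c = sum(alpha)  # scaling factor keeps alpha from underflowing
        log_p += math.log(c)
        alpha = [sum(alpha[i] / c * A[i][j] for i in range(n)) * B[j][o - 1]
                 for j in range(n)]
    return log_p + math.log(sum(alpha))

def classify_subset(obs: Sequence[int], models: Dict[str, Model]) -> str:
    """Pick the character subset whose HMM explains the movement vector best."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))

# Usage with two hypothetical two-state subsets
A2 = left_to_right_transitions(2, max_jump=1)
models = {
    "subset-1": ([1.0, 0.0], A2, peaked_emissions([1, 2])),
    "subset-2": ([1.0, 0.0], A2, peaked_emissions([5, 5])),
}
print(classify_subset([1, 1, 1, 2, 2], models))  # subset-1
```

One HMM per subset keeps each model small. The subset labels returned here would then restrict which characters the harmony search recognizer has to compare.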

Fig. 8. Loop detection algorithm in Kurdish characters. (a) Multiple writing styles with and without a loop for the letter Haa. (b) Loop completion algorithm (red lines) with slope along the x-axis and y-axis. (c) Loop boxing with dimension calculations. (d) Area and center calculation for real and accidental loops, with the real loop area greater than half of the rectangle area.
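The next sketch implements the loop test (conditions 1–4) in Python. It assumes the stroke between the two matched points is already available as a list of (x, y) tuples, with the matched intersection point as both the first and last entry. The function names are illustrative, not the authors' code, and the sketch departs from Eq. (5) in two deliberate ways:

- It measures the loop area $A_{LP}$ with the shoelace formula on the loop's own points instead of the ellipse estimate. The ellipse estimate equals π/4 · $A_R$ ≈ 0.785 $A_R$, so it would always pass condition 3.
- It checks condition 2 with a ray-casting point-in-polygon test.

```python
from typing import List, Tuple

Point = Tuple[float, float]

def shoelace_area(poly: List[Point]) -> float:
    """Area enclosed by the closed polygon through `poly` (shoelace formula)."""
    twice = 0.0
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        twice += x1 * y2 - x2 * y1
    return abs(twice) / 2.0

def inside(p: Point, poly: List[Point]) -> bool:
    """Ray casting: toggle on each edge crossed by a horizontal ray to the right of p."""
    x, y = p
    result = False
    for (x1, y1), (x2, y2) in zip(poly, poly[1:] + poly[:1]):
        if (y1 > y) != (y2 > y):
            if x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
                result = not result
    return result

def is_loop(lp: List[Point]) -> bool:
    """Conditions 1-4 for the points between two matched points (lp[0] == lp[-1])."""
    ring = lp[1:]                                  # points 2..M in the paper's indexing
    if len(ring) < 3 or len(set(ring)) != len(ring):
        return False                               # condition 1: no repeated point
    xs = [p[0] for p in ring]
    ys = [p[1] for p in ring]
    width, height = max(xs) - min(xs), max(ys) - min(ys)
    a_r = width * height                           # rectangle area A_R, Eq. (5)
    if a_r == 0:
        return False
    centre = ((max(xs) + min(xs)) / 2, (max(ys) + min(ys)) / 2)
    if not inside(centre, ring):                   # condition 2: P_c inside the loop
        return False
    if shoelace_area(ring) < 0.5 * a_r:            # condition 3: A_LP >= A_R / 2
        return False
    return width > 0.5 * height                    # condition 4
```

A sampled circle passes all four conditions. A thin "out and back" stroke, typical of an accidental loop, fails condition 3 because its enclosed area is far below half of its bounding rectangle.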

The accidental loop mostly occurs with characters that include a reverse return along the same path of writing, such as the final form of the character Raa. The area of an accidental loop is always smaller than half of the rectangle area, in contrast to a real loop. In addition, the center point may fall outside the accidental loop area in some cases.

4.3.1.2. Reverse movement feature. Many Kurdish characters include a reverse movement opposite to the writing direction. These reverse movements occur in directions 2, 3 and 4 of the direction vector shown in Fig. 4. The character set is divided into two subsets according to its reverse movements: low reverse, like some writing patterns of the letters (Alf-Baa-Lam), or largely reversed, like the patterns found in the letters (Ain-Haa-Dal). A character is assigned to one of these two groups according to the ratio of reverse movements to the movement vector length,

$$Rev = \frac{\sum_{i=0}^{M} b_i}{M}, \qquad b_i = \begin{cases} 1 & v_i \in \{2, 3, 4\} \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

This ratio is determined empirically for each category. The value ranges were found to be less than 0.075 for low reverse and greater than 0.1 for largely reversed.

4.3.1.3. Dimensional shape feature. Kurdish characters have different shape characteristics. They may be categorized as vertically shaped, such as (Haa-Lam-Alf), horizontally shaped, such as (Baa-Faa), or neutral, such as (Raa-Dal). The category is determined by the width-to-height ratio:

- a ratio greater than 2 means the character is horizontal;
- a ratio less than 0.5 means it is vertical;
- any other ratio means it is neutral.

All ratios were obtained empirically from the dataset collected and used during this study.

4.3.2. Harmony search recognizer

The harmony search algorithm uses the same concept as all evolutionary algorithms based on random population generation, such as the genetic algorithm [22]. Generating a new solution in HS depends on all individuals of the population (solutions) in the solution space. This differs from GA, which depends on two parents to generate a new solution or next generation. HS therefore allows more diversity in generating new instances of the same character class by combining movement patterns from different writings.

The HS solution space consists of a harmony memory of size HMS, which contains the generated population and their corresponding solutions,

$$HM = \begin{bmatrix} x_1^1 & \cdots & x_n^1 & f(x^1) \\ \vdots & \ddots & \vdots & \vdots \\ x_1^{HMS} & \cdots & x_n^{HMS} & f(x^{HMS}) \end{bmatrix} \tag{7}$$

where n is the number of variables used to find the function f. For Kurdish character recognition, the HM entries consist of the directional vectors pre-stored in the character database. Hence, for each character obtained from the HMM classification, HS is required to obtain the best matched direction sequence for the target character. Each solution value is selected at random either from HM or from the range bounded by $x_i^L$ and $x_i^U$ (the bounds of the character direction vector). The choice is governed by a random variable known as the harmony memory consideration rate ($0 \leq HMCR \leq 1$):

$$x_i^{new} \in \begin{cases} HM & \text{with probability } HMCR \\ [x_i^L, x_i^U] & \text{with probability } 1 - HMCR \end{cases}, \qquad i = 1, \dots, n \tag{8}$$

When a harmony memory value is selected, another parameter decides whether this value is kept or tweaked. The tweaking operation is known in HS as pitch adjustment and is controlled by a parameter known as the pitch adjustment rate (PAR). In discrete harmony search, as for the Kurdish direction vector, PAR selects the neighboring value according to the shift parameter:

$$x_i^{new}(k) = \begin{cases} x_i^{new}(k) & \text{with probability } 1 - PAR \\ x_i^{new}(k + m) & \text{with probability } PAR \end{cases} \tag{9}$$

where $m \in [-1, 1]$ is the integer shifting value. In this work, PAR is fixed at 0.2 to ensure minimal shifting from positions. To specify the shift direction, a shifting parameter (SP) is used, defined by:

$$x_i^{new}(k) = \begin{cases} x_i^{new}(k + 1) & \text{with probability } SP \\ x_i^{new}(k - 1) & \text{with probability } 1 - SP \end{cases} \tag{10}$$
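The improvisation step of Eqs. (8)–(10) can be sketched as follows for direction vectors coded 1–8. The text fixes only PAR = 0.2, so the remaining choices below are assumptions:

- HMCR and SP are free parameters, with SP = 0.5 as a neutral default.
- The fitness column of (7) is not stored.
- The random fallback draws uniformly from [1, 8].
- Following standard discrete HS, k is read as the index of a value among the ordered direction codes, so a pitch adjustment moves to the neighbouring code. Codes wrap around (8 → 1) because directions are circular.

```python
import random
from typing import List, Optional, Sequence

N_DIRECTIONS = 8  # direction codes 1..8

def neighbour(direction: int, step: int) -> int:
    """Shift a direction code by step (+1 or -1), wrapping around (assumption)."""
    return (direction - 1 + step) % N_DIRECTIONS + 1

def improvise(hm: Sequence[Sequence[int]], hmcr: float, par: float = 0.2,
              sp: float = 0.5, rng: Optional[random.Random] = None) -> List[int]:
    """Build one new harmony (direction vector) from the harmony memory hm."""
    rng = rng or random.Random()
    new: List[int] = []
    for i in range(len(hm[0])):
        if rng.random() < hmcr:
            # Eq. (8), memory consideration: reuse position i of a stored vector
            value = rng.choice(hm)[i]
            if rng.random() < par:
                # Eqs. (9)-(10): pitch adjustment to a neighbour; SP picks the side
                value = neighbour(value, 1 if rng.random() < sp else -1)
        else:
            # Eq. (8), random consideration within [x_L, x_U] = [1, 8]
            value = rng.randint(1, N_DIRECTIONS)
        new.append(value)
    return new

memory = [[8, 7, 6, 6, 7, 7, 7, 6, 6, 4], [8, 7, 4, 6, 7, 7, 6, 6, 6, 4]]
print(improvise(memory, hmcr=0.9, rng=random.Random(1)))
```

Each position is drawn independently from any stored vector. This is how HS can combine movement fragments from different writings of the same class into a new candidate.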

The most important part of an optimization problem is defining the objective function to be optimized. In this paper, the objective is to find the character whose direction vector best matches the input character vector [32]. The matching is based on the direction vector and on the probability of each direction at its relative position in the vector. The objective function is defined as

$$F = \frac{M}{P}\, W \tag{11}$$

where M is the matching function, P is the probability of each matched movement in its position and W is the penalty function. To define the terms in (11), consider the characters ﺡ, ﻉ and ﮒ: the pair (ﺡ, ﻉ) has a similar structure in the end part, and the pair (ﻉ, ﮒ) has a similar starting structure. The matching function can be defined from these characters' movement structure by partitioning each character into the same number of parts. In general, a Kurdish character can be separated into 3 parts (start, middle and end movement), which can be compared separately. The number of parts was chosen from the similarities and differences in movement patterns within the groups shown in Fig. 7, and from the lengths of the movement sequences in these groups. A weight is assigned to each part, so that the matching function differs according to the character form. Thus, the matching function can be written as

$$f_{match} = \begin{cases} w_s C_s + w_m C_m + w_e C_e & \text{Isolated} \\ w_s C_s + w_m C_m & \text{Initial} \\ w_s C_s + w_m C_m + w_e C_e & \text{Medial} \\ w_s C_s + w_m C_m + w_e C_e & \text{Final} \end{cases} \tag{12}$$

with

$$\begin{cases} w_s = w_e > w_m & \text{Isolated} \\ w_s > w_m,\ w_e = 0 & \text{Initial} \\ w_m > w_s > w_e & \text{Medial} \\ w_e > w_s > w_m & \text{Final} \end{cases}$$

and C is the matching function defined as

$$C = \sum_{i=1}^{N} c\left(V_i^t, V_i^s\right), \qquad c = \begin{cases} 0 & v_t = v_s \\ 1 & \text{otherwise} \end{cases} \tag{13}$$

where $v_t$ and $v_s$ are the target and source directions and w is the weight assigned to each character part. It can be noted from (12) that the weights are chosen differently for each form. The initial form considers only the start and middle parts of the movement vector, because the end part is similar in all characters and belongs to the connector between characters. For the medial and final forms, the start and middle parts of the movement vector are mostly considered, since most characters are similar in the end part. Moreover, the matching term C increases with the number of mismatched directions between the target vector and the HS vector.

As shown in (11), the HS objective function does not rely on the matching function alone. In some cases a direct one-to-one match fails to select the right character. To enhance the scoring function, a parameter $p_i^d$ scores the probability of direction d at position i for each character in the training dataset. This reduces the chance that an incorrect or non-existent direction appears in the newly improvised harmony vector. For the chosen direction model, each character in the training dataset has an 8 × N array of probability values (one row per direction, one column per position), given as the matrix

$$\begin{bmatrix} p_1^1 & \cdots & p_N^1 \\ \vdots & \ddots & \vdots \\ p_1^8 & \cdots & p_N^8 \end{bmatrix}$$

For each vector obtained from the HS search, the probabilities of the directions at each position i are summed. The total direction vector probability is found as

$$P = \sum_{i=1}^{N} p_i^d \tag{14}$$

The parameter in (14) is used in (11) to determine how likely it is that the movement vector obtained from HS belongs to the character's movement in the dataset. In other words, this factor measures how much a character generated by HS differs from the original characters stored in the dataset, so it prevents falsely generated matches to the target vector from residing in the HS memory.

For example, let the target character be (ﻭ) and the classified set be the characters (ﻭ - ﻑ). Both have similar start and middle sections but differ in the end section. HS can generate the character (ﻑ) from (ﻭ) by randomly generating and replacing the movement in the end with the middle part. However, such a vector receives a smaller probability score from (14), which increases its objective function value compared to the original matching class. Therefore, the score in (14) filters out vectors whose movement patterns are dissimilar from those originally obtained from the dataset.

The last factor in the objective function is the penalty for non-matched geometric features (loop, reverse and dimension). It is given as

$$W = 1 + \sum_{i=0}^{G} g_f, \qquad g_f = \begin{cases} 0 & \text{match} \\ 1 & \text{mismatch} \end{cases} \tag{15}$$

where G represents the number of features to be matched (3 in this research). The geometric features act as a penalty function that keeps the selected vector within the right class of the target vector. For example, if the target character contains a loop, the penalty over-scores the objective function of candidates without a loop, increasing the chance that they are rejected. Thus, the final matching objective function to be minimized is defined as

$$\min F_{score} = \min \begin{cases} \dfrac{W}{P} & f_{match} = 0 \\[2ex] \dfrac{f_{match}\, W}{P} & f_{match} \neq 0 \end{cases} \tag{16}$$

From Eq. (16), it is clear that the $f_{match}$ function increases the score whenever the HS vector and the target vector mismatch according to Eq. (13). The factor 1/P decreases the score whenever the improvised directions fall in the right positions, as recorded in the source dataset vectors. The penalty increases the score whenever the geometric features of the two characters mismatch. The basic idea of HS optimization in this system is to decrease the matching function in (12) while increasing the probability of each direction being at the right location. The HS search algorithm tries to construct, from the pre-stored vectors of a class, a directional feature vector that minimizes the matching function while maximizing its probability.

Fig. 9 shows the HS recognition process applied to the isolated character subset (Ain-Yaa-Gaa). The HS recognition starts by fetching the dataset for all three characters and storing it in a separate memory along with their feature vectors. Then, for each of the three characters, the direction probability matrix is calculated. This gives the overall probability of each direction at each location of an N-length vector (for this example N = 10 and the matrix is 8 × 10). To calculate the match function, each vector is divided into three parts, so that each character generates three directional sub-vectors with 4, 3 and 3 directions.
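The remaining terms of Eq. (16) can be sketched as plain Python functions:

- the class direction-probability matrix and P from Eq. (14);
- the reverse and dimensional categories of Sections 4.3.1.2 and 4.3.1.3;
- the penalty W of Eq. (15);
- the final score.

This is a hedged sketch, and these details are assumptions:

- Probabilities are relative frequencies over one class's training vectors.
- Eq. (6) is normalized by the vector length.
- Ratios between the two reverse thresholds (0.075 and 0.1) are left unassigned by the text and are reported here as "undetermined".
- Geometric features are compared element by element as (loop, reverse, dimension) tuples.

```python
from typing import List, Sequence, Tuple

def direction_probability_matrix(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """8 x N matrix for one class: entry [d-1][i] is the share of the class's
    training vectors that have direction d at position i."""
    n = len(samples[0])
    return [[sum(1 for s in samples if s[i] == d) / len(samples) for i in range(n)]
            for d in range(1, 9)]

def vector_probability(vec: Sequence[int], probs: List[List[float]]) -> float:
    """Eq. (14): P = sum over positions i of p_i^d for the direction d found at i."""
    return sum(probs[d - 1][i] for i, d in enumerate(vec))

def reverse_category(vec: Sequence[int]) -> str:
    """Eq. (6): share of reverse moves (directions 2, 3, 4) vs. empirical thresholds."""
    rev = sum(1 for d in vec if d in (2, 3, 4)) / len(vec)
    if rev < 0.075:
        return "low"
    if rev > 0.1:
        return "high"
    return "undetermined"  # the 0.075..0.1 band is not assigned in the text

def dimension_category(width: float, height: float) -> str:
    """Width-to-height ratio: > 2 horizontal, < 0.5 vertical, otherwise neutral."""
    ratio = width / height
    if ratio > 2:
        return "horizontal"
    if ratio < 0.5:
        return "vertical"
    return "neutral"

def penalty(target: Tuple, candidate: Tuple) -> int:
    """Eq. (15): W = 1 + number of mismatched geometric features."""
    return 1 + sum(1 for t, c in zip(target, candidate) if t != c)

def f_score(f_match: float, w: int, p: float) -> float:
    """Eq. (16): W/P if f_match == 0, otherwise f_match * W / P (lower is better)."""
    return w / p if f_match == 0 else f_match * w / p

probs = direction_probability_matrix([[1, 2, 3], [1, 2, 4], [1, 5, 3], [1, 2, 3]])
print(vector_probability([1, 2, 3], probs))  # 2.5
```

Using W/P when $f_{match} = 0$ keeps an exact match from collapsing the score to zero. Candidates that match the target exactly are therefore still ranked by their probability and penalty terms.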


Fig. 9. Sampled data for the calculation of HS applied to the character subset (Gaa-Ain-Yaa) obtained from HMM classification.

Then, the matching counts of these sub-vectors according to (13) are multiplied by the associated weights (1–0.5–1) to give the score from (12). Finally, this product is multiplied by the corresponding feature penalty and divided by the probability score obtained from the direction probability matrix. The calculation is shown in Table 1.

As can be seen from the table, the best matched vector is found in the right class (Ain), with a minimum score of 0.27. The direction vector associated with this score, (8746776664), is the closest match to the vector (8766777664), with eight out of ten matched directions. This result is the calculation at the initial trial of HS.

Now consider the case where, after a long search, HS produces an exact match of the target vector in all three character classes. The matching function $f_{match}$ is then 0 for all characters according to (12) and (13), and the probabilities are (3.4, 4.2, 3.1) for (Gaa-Ain-Yaa), respectively. The final scores for the three characters are (0.88, 0.23, 0.96) for (Gaa-Ain-Yaa), respectively. Therefore, the minimum score is 0.23, which corresponds to the right class character (Ain).

These calculations illustrate the effectiveness of the score function used in HS. Although the matching function was equal for all three characters, the probability and penalty scores changed the final result, minimizing the value for the character with the best feature and direction probabilities. The three characters share a common writing pattern at the beginning, but their middle and end parts differ considerably, which is reflected in the low probability scores. Thus, the HS objective function can locate the right class even when a single sample of another character closely matches the target (the second Gaa sample in Table 1).

This property is very important in writer-independent systems, where recognition depends on the general common features of a class instead of the best match among multiple samples. Moreover, the HS search operation can generate new patterns from samples of the same class to match noisy samples or irregular writing patterns. It does so by taking single directions from different database samples and fitting the closest match to the target, considering the probability and feature scores (see Table 2).

5. Results and discussion

The results section is divided into three parts: system local feature test, comparison with Kurdish character systems, and comparison with Kurdish word recognition on a common dataset. In the first phase, a dataset is created by collecting handwritten word samples from 10 users. The characters are extracted using the segmentation technique described in [32]. The segmentation process was not completely error-free and led to some recognition errors. However, the amount of error is less than 2%, so it is considered acceptable for the current study.

The dataset consists of 24,960 characters, divided into four categories according to their form. This dataset is split into training data and testing data. The training data consists of 7500 characters. These are used for HMM training, and as the harmony memory for determining the HS matching function and the character class directional probability matrix. The training data contains 150 samples per character, to guarantee diversity in writing style in support of the writer-independent feature. The testing data consists of 14,460 characters used to test the accuracy and time of the recognition process.

Table 1 Simple calculation example for three Kurdish characters (Gaa, Ain and Yaa) to illustrate the use of Eq. (16).
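The matching part of the Table 1 calculation can be reproduced for the two vectors quoted in the text. This minimal sketch assumes the isolated-form setting described there: a 4-3-3 split into start, middle and end parts, and weights (1, 0.5, 1) in Eq. (12). C from Eq. (13) counts mismatched directions within each part. The final Table 1 score also needs W and P, which are not reproduced here.

```python
from typing import Sequence

def mismatches(target: Sequence[int], source: Sequence[int]) -> int:
    """Eq. (13): number of positions whose directions differ."""
    return sum(1 for t, s in zip(target, source) if t != s)

def f_match(target: Sequence[int], source: Sequence[int],
            sizes=(4, 3, 3), weights=(1.0, 0.5, 1.0)) -> float:
    """Eq. (12): weighted sum of the part-wise mismatch counts C_s, C_m, C_e."""
    total, start = 0.0, 0
    for size, w in zip(sizes, weights):
        end = start + size
        total += w * mismatches(target[start:end], source[start:end])
        start = end
    return total

target = [int(c) for c in "8766777664"]     # target vector quoted in the text
candidate = [int(c) for c in "8746776664"]  # best HS vector (class Ain)
print(len(target) - mismatches(target, candidate), f_match(target, candidate))  # 8 1.5
```

The two mismatches fall at positions 3 (start part) and 7 (middle part), giving 1 × 1 + 0.5 × 1 + 1 × 0 = 1.5. Setting the end weight to 0 gives the initial-form case of Eq. (12).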


Table 2 Results of HMM classification for the trained and test data.

| Form | No. of samples | Correctly classified | Errors | Accuracy (%) |
|---|---|---|---|---|
| Isolated | 6120 | 5800 | 320 | 94.77 |
| Initial | 9240 | 9130 | 110 | 98.81 |
| Medial | 3680 | 3490 | 190 | 94.84 |
| Final | 5650 | 5370 | 280 | 95.04 |
| Overall accuracy | | | | 95.87 |
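The percentages in Table 2 can be recomputed from the sample counts. The short check below uses only the counts in the table. It shows that each error count is the number of samples minus the correctly classified ones, and that the overall 95.87% is the unweighted mean of the four per-form accuracies rather than the rate pooled over all samples.

```python
table2 = {  # form: (number of samples, correctly classified), from Table 2
    "Isolated": (6120, 5800),
    "Initial": (9240, 9130),
    "Medial": (3680, 3490),
    "Final": (5650, 5370),
}
rates = {}
for form, (n, ok) in table2.items():
    rates[form] = 100.0 * ok / n
    print(f"{form:8s} errors={n - ok:4d} accuracy={rates[form]:.2f}%")

mean_rate = sum(rates.values()) / len(rates)  # unweighted mean over the four forms
pooled = 100.0 * sum(ok for _, ok in table2.values()) / sum(n for n, _ in table2.values())
print(f"mean of forms = {mean_rate:.2f}%, pooled = {pooled:.2f}%")  # 95.87%, 96.35%
```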

Table 3 Comparison between HS, HMM-HS without penalty and HMM-HS with penalty for recognition rate and time (in ms).

| Character form | Time HMM (ms) | Time HS (ms) | Time HMM-HS without penalty (ms) | Time HMM-HS with penalty (ms) | Recognition rate HMM (%) | Recognition rate HS (%) | Recognition rate HMM-HS without penalty (%) | Recognition rate HMM-HS with penalty (%) |
|---|---|---|---|---|---|---|---|---|
| Isolated | 300 | 8400 | 863 | 500 | 76 | 78 | 85.23 | 92.70 |
| Initial | 300 | 8400 | 863 | 500 | 78.4 | 82 | 87.75 | 94.88 |
| Medial | 300 | 8400 | 863 | 500 | 80.2 | 82.15 | 88.18 | 93.81 |
| Final | 300 | 8400 | 863 | 500 | 77.38 | 81.15 | 86.44 | 92.70 |
| Average | 300 | 8400 | 863 | 500 | 77.995 | 80.825 | 86.9 | 93.52 |
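The Average row of Table 3 and the time reductions quoted in the discussion can be reproduced with simple arithmetic. The values below are copied from Table 3, and nothing else is assumed.

```python
rates = {  # recognition rate (%) per form: isolated, initial, medial, final (Table 3)
    "HMM": [76, 78.4, 80.2, 77.38],
    "HS": [78, 82, 82.15, 81.15],
    "HMM-HS without penalty": [85.23, 87.75, 88.18, 86.44],
    "HMM-HS with penalty": [92.70, 94.88, 93.81, 92.70],
}
times_ms = {"HMM": 300, "HS": 8400, "HMM-HS without penalty": 863, "HMM-HS with penalty": 500}

for system, values in rates.items():
    print(f"{system:24s} mean rate = {sum(values) / len(values):.4f}%")
for system in ("HMM-HS without penalty", "HMM-HS with penalty"):
    print(f"HS time / {system} time = {times_ms['HS'] / times_ms[system]:.1f}")
```

The ratios come out as 8400/863 ≈ 9.7 and 8400/500 = 16.8, which the discussion rounds to reductions of about 1/10 and 1/16.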

Fig. 10. Experimental results between HMM, HS and HMM-HS (without and with penalty function). (a) Recognition time. (b) Recognition rate.

The testing step started with training and testing the HMM for the subset classification success rate. Table 2 shows the results of this test. From the table, it is clear that the initial and final forms scored the highest subset classification rates, with 98.81% and 95.04% respectively, compared to 94.77% and 94.84% for the isolated and medial forms. Most of the errors are caused by highly noisy samples obtained from the users. These errors show how the input devices can affect the recognition rate in most systems.

The output of the HMM classifier is passed to the HS recognizer. To evaluate the overall recognition algorithm, the system is compared with HMM and HS as standalone recognizers, and with an HMM-HS variant whose HS matching function omits the probability function and the direction vector penalty. This comparison determines the contribution of these two functions to the recognition success rate. It also measures the time improvement that makes the meta-heuristic algorithm suitable for online application.

To guarantee a stable result, the experiment was run on the testing data 20 times, and the execution time and recognition success rate were averaged, with regard to the standard deviation of these runs. This approach is used because HS is a stochastic process that may produce different results in different runs. The number of improvisations in HS is set at 100,000. The stopping condition for an unimproved result is set at 5000, i.e., the search stops when 5000 consecutive improvisations bring no improvement over the most recent improved result. Table 3 and Fig. 10 show the comparison of time and recognition rate for all these systems.

The results show that the recognition time is tremendously reduced with the HMM-HS recognizers compared to the standalone HS recognizer. The time is reduced to approximately 1/10 and 1/16 when using HMM-HS without and with the penalty function, respectively. HMM has the shortest execution time owing to its fast classification. In addition, the recognition accuracy rises from 76%–80% for HMM and 78%–82% for HS to 85%–88% for HMM-HS without the penalty function, and to 92%–94% for HMM-HS with the penalty function. These figures show how much the proposed HS objective function improves the recognition result compared to the standalone HMM system, despite its slower response.

The second part of the test compares the final results with related works. Table 4 compares this method with previous works on Arabic, Farsi and Urdu characters, which use feature classification, fuzzy logic, ANN or HMM. The proposed system performs well compared to some of these techniques. In particular, it covers all forms of character writing, while these systems cover only the isolated form. However, there is no common basis between these systems, since the datasets differ (Arabic or Persian) and each technique is used as either a form-based or a font-based recognizer.


Table 4 Comparison of recognition time and rate with previous systems.

| System name | Language | Recognition method | Training set size | Recognition accuracy (%) |
|---|---|---|---|---|
| [6] | Arabic | Hough transform, HMM and ANN | 80,000 | 96.84 |
| [7] | Arabic | Modified Hough transform, HMM and ANN | 80,000 | 97.36 |
| [30] | Farsi | Structural features and HMM | 4000 | 94–99 |
| [30] | Farsi | Discrete HMM and Kohonen self-organizing vector quantization | 17,820 | 62 |
| [30] | Urdu | Discrete cosine transform with HMM | 1259 | 92 |
| [30] | Urdu | Hybrid fuzzy-HMM | 1800 | 89.4 |
| HMM-HS | Kurdish | Hidden Markov model and harmony search | 7500 characters | 93.52 |

Fig. 11. Examples of errors occurring during HMM-HS recognition due to the HMM classification or HS recognition phase.

Some of the errors reflected in the tests were caused by characters misclassified by the HMM classifier. Others were HS errors in matching characters with similar writing styles, such as writing (ـﻌـ) close to (ـﻔـ). Although such a character exists in the dataset and creates a 100% one-to-one match, its probability score is relatively low, because the sample was written in a style different from the normal writing style of the character. These errors are consistent with the writer-independent design of the system, which focuses on the probability of common direction patterns instead of one-to-one character pattern matching. Fig. 11 illustrates some of the misclassified and misrecognized characters produced during the test.

6. Conclusion

Online Kurdish character recognition systems face many problems and challenges compared to offline recognition systems. They must not only maintain a good recognition rate but also achieve a good processing time, which means trading off recognition rate against a reasonable execution time. In Kurdish, the complexity of the character system and the variety of writing styles pose a real research challenge in terms of both time and accuracy.

In this paper, an evolutionary method based on the harmony search algorithm combined with a hidden Markov model is presented. The system processes the characters in two phases: subset classification using HMM and final recognition with HS. HMM classifies the characters into small subsets using direction features that the characters of a set have in common. Within these subsets, the differences between characters in shape or direction pattern are easier to detect.

For each HMM subset, HS determines the closest match between the candidate character direction vectors and the input target direction vector using a novel matching function. The matching function is a minimization problem that determines the directional vector best matching the target vector. It consists of three parts: scoring function, probability and feature penalty. The scoring function compares the direction vector of the target with the vectors generated by HS. In addition, a penalty factor built from geometrical features (loop, reverse and orientation) enhances the final match score by discarding fake or missed feature matches.

The system was tested in two phases. The first tests covered different scenarios to measure the effects of HMM, HS and the penalty. The results demonstrated that combining HMM with HS improves the recognition rate and reduces the execution time. The recognition time was reduced to as little as 1/16 of the time of HS alone. Moreover, the recognition rate for HMM-HS with penalty is 93.52%, compared to 80.83% for HS and 86.9% for HMM-HS without penalty. In the final test, the system was compared with similar systems that use HMM. The comparison showed that the proposed method achieved a better recognition rate than systems using HMM as their main recognition process.

References

[1] S. Al-Emami, M. Usher, On-line recognition of handwritten Arabic characters, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1990) 704–710.
[2] H. Aljuaid, D. Mohamad, M. Sarfraz, Evaluation approach of Arabic character recognition, Int. J. Comput. Vision Image Process. 1 (2) (2011) 58–77.
[3] J. AlKhateeb, J. Ren, J. Jiang, H. Al-Muhtaseb, Offline handwritten Arabic cursive text recognition using Hidden Markov Models and re-ranking, Pattern Recogn. Lett. 32 (8) (2011) 1081–1088.
[4] S. Alma'adeed, C. Higgens, D. Elliman, Recognition of off-line handwritten Arabic words using hidden Markov model approach, in: Proceedings of the 16th International Conference on Pattern Recognition, vol. 3, 2002.
[5] A. Amin, Recognition of printed and handwritten Arabic characters, in: First Brazilian Symposium, BSDIA'97, Curitiba, Brazil, November 2–5, 1997, pp. 40–59.
[6] N. Ben Amor, N.E. Ben Amra, Multifont Arabic character recognition using Hough transform and hidden Markov models, in: 4th International Symposium on Image and Signal Processing and Analysis, 2005, pp. 285–288.
[7] N. Ben Amor, N. Essoukri, Combining a hybrid approach for features selection and hidden Markov models in multifont Kurdish characters recognition, in: Proc. IEEE Conf. Document Image Analysis for Libraries, 2006, pp. 103–110.
[8] F. Biadsy, J. El-Sana, N. Habash, Online Arabic handwriting recognition using hidden Markov models, in: Proc. Int'l Workshop Frontiers of Handwriting Recognition, 2006.
[9] G.H. Bo, M. Huang, The Harmony search for the routing optimization in fourth party logistics with time windows, in: 2009 IEEE Congress on Evolutionary Computation, vols. 1–5, 2009, pp. 962–967.
[10] C. Oh, W.S. Kim, Off-line recognition of handwritten Korean and alphanumeric characters using hidden Markov models, in: Proceedings of the Third International Conference on Document Analysis and Recognition, vol. 2, 1995, pp. 815–818.
[11] L. Cordella, C. De Stefano, F. Fontanella, C. Marrocco, A feature selection algorithm for handwritten character recognition, in: 19th International Conference on Pattern Recognition, ICPR 2008, 2008, pp. 1–4.
[12] K. Daifallah, N. Zarka, H. Jamous, Recognition-based segmentation algorithm for on-line Arabic handwriting, in: 10th International Conference on Document Analysis and Recognition, ICDAR '09, 2009, pp. 886–890.


[13] S.O. Degertekin, Harmony search algorithm for optimum design of steel frame structures: a comparative study with other optimization methods, Struct. Eng. Mech. 29 (4) (2008) 391–410.
[14] I. El-Feghi, F. Elmahjoub, B. Alswady, A. Baiou, Offline handwritten Arabic words recognition using Zernike moments and Hidden Markov Models, in: 2010 International Conference on Computer Applications and Industrial Electronics (ICCAIE), 2010, pp. 165–168.
[15] T. El-Sheikh, S.G. El-Taweel, Real-time Arabic handwritten character recognition, in: Proc. 3rd Inter. Conf. on Image Proc. and its Appl., Warwick, IEE, London, UK, 1989, pp. 212–216.
[16] T. El-Sheikh, S.G. El-Taweel, Real-time Arabic handwritten character recognition, Pattern Recogn. 23 (1990) 1323–1332.
[17] M. Fesanghary, M. Mahdavi, Hybridizing harmony search algorithm with sequential quadratic programming for engineering optimization problems, Comput. Methods Appl. Mech. Eng. 197 (33–40) (2008) 3080–3091.
[18] R. Forsati, A.T. Haghighat, Harmony search based algorithms for bandwidth-delay-constrained least-cost multicast routing, Comput. Commun. 31 (10) (2008) 2505–2519.
[19] M. Frosolini, M. Braglia, A modified harmony search algorithm for the multi-objective flowshop scheduling problem with due dates, Int. J. Prod. Res. 49 (20) (2011) 5957–5985.
[20] K.Z. Gao, Q.K. Pan, Discrete harmony search algorithm for the no-wait flow shop scheduling problem with total flow time criterion, Int. J. Adv. Manuf. Technol. 56 (5–8) (2011) 683–692.
[21] Z.W. Geem, Music-Inspired Harmony Search Algorithm: Theory and Applications, Springer Publishing Company, 2009.
[22] Z.W. Geem, J.H. Kim, A new heuristic optimization algorithm: harmony search, Simulation 76 (2) (2001) 60–68.
[23] V.K. Govindan, A.P. Shivaprasad, Character recognition: a review, Pattern Recogn. 23 (7) (1990) 671–683.
[24] T.F. Hain, S.V.R. Racherla, D.D. Langan, Fast, precise flattening of cubic Bezier segment offset curves, in: Proceedings of the 17th Brazilian Symposium on Computer Graphics and Image Processing, 2004, pp. 244–249.
[25] J. Hu, K. Brown, W. Turin, HMM based on-line handwriting recognition, IEEE Trans. Pattern Anal. Mach. Intell. 18 (10) (1996) 1039–1045.
[26] B. Huang, Y.B. Zhang, M. Kechadi, Preprocessing techniques for online handwriting recognition, Intell. Text Categorization Clustering (2009) 25–45.
[27] R. Kala, H. Vazirani, A. Shukla, R. Tiwari, Offline handwriting recognition using genetic algorithm, Int. J. Comput. Sci. 7 (2(1)) (2010) 16–25.
[28] S. Kherallah, F. Bouri, A.M. Alimi, On-line Arabic handwriting recognition system based on visual encoding and genetic algorithm, Eng. Appl. Artif. Intell. 22 (1) (2009) 153–170.
[29] Z. Narima, R. Messaoud, B. Mouldi, Neuro-Markovian hybrid system for handwritten Arabic word recognition, in: Proceedings of the 2003 10th IEEE International Conference on Electronics, Circuits and Systems, ICECS 2003, vol. 2, 2003, pp. 878–881.
[30] S. Naz, A.I. Umar, S.H. Shirazi, M.M. Ajmal, The optical character recognition for cursive script using HMM: a review, Res. J. Appl. Sci. Eng. Technol. 8 (19) (2014) 2016–2025.
[31] S. Nebti, A. Boukerram, F. Zavoral, J. Yaghob, P. Pichappan, E. El-Qawasmeh, Handwritten digits recognition based on swarm optimization methods, in: Networked Digital Technologies, Springer, Communications in Computer and Information Science, vol. 87, Part 1, 2010, pp. 45–54.
[32] M.Y. Potrus, U.K. Ngah, B.A. Salahaddin, An evolutionary harmony search algorithm with dominant point detection for recognition-based segmentation of online Arabic text recognition, Ain Shams Eng. J. 5 (4) (2014) 1129–1139.
[33] B. Omar, Handwritten Kurdish character recognition using geometric discretization feature, Int. J. Comput. Sci. Commun. 4 (1) (2013) 51–55.
[34] L.R. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (2) (1989) 257–286.
[35] M.A. Rashwan, M.W. Fakhr, M. Attia, M.S. El-Mahallawy, Arabic OCR system analogous to HMM-based ASR systems; implementation and evaluation, J. Eng. Appl. Sci. 54 (2007) 653–672, Faculty of Engineering, Cairo University.
[36] M.I. Razzak, F. Anwar, S.A. Husain, A. Belaid, M. Sher, HMM and fuzzy logic: a hybrid approach for online Urdu script-based languages' character recognition, Knowl.-Based Syst. 23 (8) (2010) 914–923.
[37] M. Sarfraz, A.T.A. Al-Awami, Arabic character recognition using particle swarm optimization with selected and weighted moment invariants, in: International Symposium on Signal Processing and its Applications in conjunction with the International Conference on Information Sciences, Signal Processing and its Applications (ISSPA 2007), 2007, pp. 12–15.
[38] M.P. Singh, S. Kumar, J. Goel, R. Lavania, Hybrid evolutionary techniques in feed forward neural network with distributed error for classification of handwritten Hindi 'SWARS', Connection Sci. 25 (4) (2013) 197–215.
[39] M.P. Singh, S. Shrivastava, Performance evaluation of feed-forward neural network with soft computing techniques for hand written English alphabets, Appl. Soft Comput. 11 (1) (2011) 1156–1182.
[40] M. Soryani, N. Rafat, Application of genetic algorithms to feature subset selection in a Farsi OCR, World Acad. Sci. Eng. Technol. 18 (2006) 113–116.
[41] Wikipedia, Full Kurdish Alphabet, http://en.wikipedia.org/wiki/Kurdish_alphabet, [online].
[42] D.X. Zou, L.Q. Gao, A novel global harmony search algorithm for task assignment problem, J. Syst. Softw. 83 (10) (2010) 1678–1688.
