ANIMATING MOVES RECORDED ON CHESS INFORMANT

Süleyman Eken, Ahmet Sayar
Department of Computer Engineering, Kocaeli University, Kocaeli, Turkey
{[email protected], ahmet.sayar@kocaeli.edu.tr}

ABSTRACT

In this study, we focus on animating chess games recorded in the Chess Informant. This involves recognizing chess characters and moves and playing them on a chessboard. The proposed technique eliminates false recognitions by checking candidate moves against the rules of chess (semantics). The paper provides a solution for figurine algebraic notation (FAN). For character and figure recognition, we form a feature vector comprising area, center of area, perimeter, thinness ratio, aspect ratio, compactness, Euler number, and projection. In the recognition stage, a multi-layer feed-forward (MLF) neural network trained with the back-propagation learning algorithm recognizes chess characters and figures from this feature vector. The results show that the proposed system can contribute to the generation of robust game databases by digitizing chess games recorded in the Chess Informant.

KEYWORDS

Chess readings; chess character recognition; chess notations; document image analysis.

1 INTRODUCTION

Pattern recognition has been studied in many fields, including psychology, psychiatry, biometrics, bioinformatics and gene expression analysis, cognitive science, traffic flow, handwriting recognition in criminology and banking, optical character recognition (OCR), and computational finance and the stock market. OCR systems are divided into two categories: task-specific and general-purpose systems. In task-specific systems, certain portions of a document are digitized by means of specific equipment. Bank readers, account processing systems, and airline ticketing readers are examples of task-specific OCR systems. In general-purpose OCR systems, the document is separated into text and non-text blocks, and text blocks are further separated into lines, words, and characters. Such systems are known as document image analysis systems [1]. In the Chess Informant, a series of volumes that is a leading source of games and analysis for serious chess players, the portions containing chess moves are darker than the rest of the image. We therefore separate these portions from the others and then recognize the characters of the chess moves. In this sense, the problem of interest is a kind of document image analysis problem.

Although chess is very old, the game continues to attract interest and is a very popular mind game in many countries [2], [3]. There are various notations for recording chess moves, and each country has its own markings for the figures. With the development of the internet, it has become critically important to transfer magazines and books containing chess games to the electronic environment, because the game descriptions they contain are not available in computer-readable form.

The text of the Chess Informant series is set using 200 different characters. These comprise alphanumeric fonts in both roman and boldface, chess characters, and a number of special symbols used for interpretation. Lines of text are set solid; in other words, lines are spaced vertically as tightly as possible for the point size. In addition, the print quality of the Informant is mostly worse than that of most modern books [4].

Some of the existing studies on the recognition of chess characters are as follows. Nabiyev [5] first converted chess readings written in several notations to a common notation, FAN. After the analysis of the chess readings, the recognized chess characters were transformed accordingly. Chess moves were then animated on a chessboard, so that mistakes made in the recognition stage were eliminated by checking candidate moves against the rules of chess. For character recognition, he used a heuristic solution together with figural information relating to the notations. Baird and Thompson [4] proposed an empirical page reader system that performs top-down layout analysis (dividing a document image into smaller regions) to identify columns, lines, and characters, using a skew-estimation technique. By analyzing the formal rules of chess, the error rate was reduced considerably. Zhu et al. [6] presented a Chinese chessman pattern recognition system based on rate-connectivity and concentric circle algorithms. According to their experimental results, the concentric

circle algorithm is more suitable than the rate-connectivity algorithm in terms of complexity and accuracy.

In this study, we propose a system that extracts chess moves from the Chess Informant and recognizes the chess characters. In the Chess Informant, each move of a game consists of a move number and two plies (half-moves), and each ply is described in three characters on average (see Fig. 1). By applying knowledge of the rules of chess (semantics), each move is checked for legality.

Figure 1: An example of text from the Chess Informant

This study consists of the following steps: (i) preprocessing the chess text, (ii) extracting chess games from the Chess Informant, (iii) recognizing chess characters, and (iv) applying knowledge of the rules of chess and checking moves for legality. The remainder of this paper is organized as follows. In Section 2, chess notations are presented for a better understanding of the subject. In Section 3, the extraction of the chess game portions from the rest of the page is explained; the methodology of the study and its findings are presented in the same section. Section 4 presents conclusions and future work.

2 CHESS NOTATIONS

Chess notation was developed to record the moves made during a game of chess or the positions of the pieces. It allows chess players to review their games and to read chess books. Many systems of chess notation are available today; three of them are common: descriptive (English notation), coordinate, and algebraic. Algebraic notation is the most commonly used in tournaments and in all modern books. It is also a kind of hybrid between descriptive and coordinate notation. Algebraic notations are described in more detail below.

2.1 Algebraic Notations

There are a number of different types of algebraic chess notation; the popular types in use today are Short Algebraic Notation (SAN), Long Algebraic Notation (LAN), and Figurine Algebraic Notation (FAN). In algebraic notation, the chessboard is divided into ranks and files. The ranks are the horizontal rows of squares, labelled 1 through 8. The files are the vertical columns of squares, labelled a through h. With this labelling, each square on the chessboard is described by a unique combination of file and rank. Except for the knight, every piece is identified by the capital first letter of its name; because the letter K is used for the king, the knight is identified by N (K-King, Q-Queen, R-Rook, B-Bishop, N-Knight). There is no piece identifier for a pawn. In this notation, White's move is written first and then Black's move. Each full move is numbered sequentially. Many books and programs use FAN, which is the same as normal algebraic notation except that the piece identifiers are replaced with graphic symbols of the pieces. Table 1 shows the same moves written in several of the notations.

Table 1: Notation examples

#    Algebraic  FAN     LAN       Descriptive  Coordinate
1.   e4         e4      e2-e4     P-K4         E2-E4
     e5         e5      e7-e5     P-K4         E7-E5
2.   Nf3        ♘f3     Ng1-f3    N-KB3        G1-F3
     Nc6        ♞c6     Nb8-c6    N-QB3        B8-C6
3.   Bb5        ♗b5     Bf1-b5    B-N5         F1-B5
     a6         a6      a7-a6     P-QR3        A7-A6
4.   Bxc6       ♗xc6    Bb5xc6    BxN          B5-C6
     dxc6       dxc6    d7xc6     QPxB         D7-C6
5.   d3         d3      d2-d3     P-Q3         D2-D3
     Bb4+       ♝b4+    Bf8-b4+   B-N5ch       F8-B4
6.   Nc3        ♘c3     Nb1-c3    N-B3         B1-C3
     Nf6        ♞f6     Ng8-f6    N-B3         G8-F6
7.   O-O        O-O     O-O       O-O          E1-G1
     Bxc3       ♝xc3    Bb4xc3    BxN          B4-C3

There are also the following notations: International Correspondence Chess Federation (ICCF) numeric notation, Portable Game Notation (PGN), Forsyth-Edwards Notation (FEN), and Extended Position Description (EPD) [7]. Different languages have different names for the pieces, whereas FAN is independent of the language.

2.2 Chess Symbols

Various symbols are commonly found in chess books. The following markings are usually used by commentators to give an evaluative comment on a move:

• ! (a particularly good move)
• !! (an excellent move)
• ? (a bad move)
• ?? or ??? (a blunder)
• !? or * (an interesting move that may not be best)
• ?! (a dubious move, one which may turn out to be bad)
• TN or N (a theoretical novelty)

These symbols indicate the strategic balance of the game position:

• ∞ (Unclear: it is unclear who, if anyone, has an advantage)
• =/∞ (Whoever is down in material has compensation for the material)
• = (Even position)
• +/= (White has slightly better chances)
• =/+ (Black has slightly better chances)
• +/− or ± (White has much better chances)
• −/+ or ∓ (Black has much better chances)
• +− (White has a clear advantage)
• −+ (Black has a clear advantage)

There are some other symbols used in multilingual publications:

• ○ (Space: indicates more space owned by one player)
• ↑ (Initiative: indicates an advantage in initiative)
• ↑↑ or ↻ (Development: indicates a lead in development)
• ⇄ (Counterplay: indicates that the player has counterplay)
• ∇ (Countering: indicates the opponent's plan this defends against)
• Δ (Idea: indicates the future plan this move supports) [8]

The recognition of chess text involves characters, symbols, letters, numbers, and figures. In addition, the figures can vary in type (see Fig. 2).

Figure 2: Example of different types of symbols for the chess game

3 EXTRACTION OF CHESS GAMES AND MOVES

The characters in the Chess Informant must first be recognized in order to interpret the printed text. Character recognition usually requires pre-treatment, which consists of several steps such as conversion to a gray-level image, binarization, detection of character frames, and scaling/normalization. In this study, we focus on images with little noise, and the proposed technique provides a solution for FAN.

3.1 Image Pre-treatment

The raw image is converted to grayscale to simplify the subsequent steps of the algorithm. After the conversion, each pixel is represented by 1 byte.

Median filtering is a smoothing technique. Most smoothing techniques remove noise but also blur edges. We use median filtering to reduce the noise in the image. Applying the median filter several times lessens both noise and brightness. To limit the computational expense, the median filter was used with a small (3x3) window [9]; that is, the filter considers the 3x3 image window around each pixel.

Because the performance of thresholding affects the performance of the subsequent image processing methods, the threshold value must be chosen automatically so as to reveal as much scene information as possible. Since image features vary from region to region, the thresholding technique must take the location of the pixels and the relationships between them into account. Consequently, different threshold values are needed according to the variance of the pixels. The Otsu thresholding technique [10], which takes the distribution of gray-level values in the image as well as the local characteristics of the pixels into consideration, was used (see Fig. 3).
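The pre-treatment chain of Section 3.1 (grayscale conversion, 3x3 median filtering, Otsu thresholding) can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the image is assumed to be a list of rows of (R, G, B) tuples, the luma weights are an assumed choice, and the threshold is the standard global Otsu threshold computed over the whole image, which is simpler than a region-dependent variant.

```python
def to_grayscale(rgb):
    """Convert rows of (R, G, B) tuples to rows of gray levels in 0..255 (1 byte per pixel)."""
    # ITU-R BT.601 luma weights: an assumption, the paper does not state the weights used.
    return [[int(round(0.299 * r + 0.587 * g + 0.114 * b)) for (r, g, b) in row]
            for row in rgb]


def median_filter_3x3(gray):
    """Replace each interior pixel by the median of its 3x3 window."""
    height, width = len(gray), len(gray[0])
    out = [row[:] for row in gray]  # border pixels are copied unchanged
    for r in range(1, height - 1):
        for c in range(1, width - 1):
            window = [gray[r + dr][c + dc] for dr in (-1, 0, 1) for dc in (-1, 0, 1)]
            out[r][c] = sorted(window)[4]  # 5th of 9 sorted values is the median
    return out


def otsu_threshold(gray):
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w0 = 0          # number of pixels with value <= t
    sum0 = 0        # sum of gray levels of those pixels
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(gray, t):
    """Map dark (ink) pixels to 1 and light (paper) pixels to 0."""
    return [[1 if v <= t else 0 for v in row] for row in gray]
```

A typical call order would be `gray = median_filter_3x3(to_grayscale(rgb))` followed by `binary = binarize(gray, otsu_threshold(gray))`; mapping ink to 1 matches the binary object images used later for feature extraction.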

Figure 3: (a) Raw Chess Informant image and (b) its binary image

3.2 Extraction of Chess Moves

Chess characters and symbols must be extracted before they can be recognized. This requires determining the rows and columns of the text. A page may contain one or two columns of text. The columns are obtained by calculating the vertical projection histogram of the binary Chess Informant image. Only the parts of the binary image that contain text are kept, and the parts that do not contain text are eliminated. This step saves computation cost in the later stages. Similarly, the rows of text are obtained by means of the horizontal projection histogram. In chess books such as the Chess Informant and the Encyclopedia of Chess Openings, the parts of the text containing chess games are written in boldface. With the help of this information, the remaining processing is carried out on these parts only, by evaluating the histogram values. The characters in each row of bold text (containing a chess game) are delimited by the vertical projection histogram of that row. Figure 4 shows the vertical and horizontal projection histograms for each row of a chess game.

Figure 4: Extraction of chess moves with vertical and horizontal projection histograms

The extracted characters and symbols are normalized so that they can be matched against the characters and symbols recorded in the database.

3.3 Recognition of Chess Characters and Symbols

This paper provides a solution for figurine algebraic notation (FAN). The FAN character set consists of 5 capital letters (K, Q, R, B, N), 8 lowercase letters (a, b, c, d, e, f, g, h), digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), symbols (!, ?, +, -, x, .), and a set of piece figurines for White and Black. For character and symbol recognition (we refer to both as objects), we form a feature vector that includes position and orientation properties such as area, center of area (mass), and perimeter, and shape properties such as thinness ratio, aspect ratio, compactness, Euler number, and projection.

To obtain these features for individual objects, we need to create a separate binary image for each of them. This is achieved by assigning 1 to the pixels representing the object and 0 elsewhere (see Fig. 5).

Figure 5: The binary image for the 'b' character

The object's area is computed as follows:

$$ A = \sum_{r=0}^{height-1} \sum_{c=0}^{width-1} I(r,c) \qquad (1) $$

where I(r,c) is the pixel value (0 or 1) at row r and column c of image I. The area is the total number of pixels that belong to the object. The center of area of an object is computed by means of Equation 2:

$$ \bar{r} = \frac{1}{A} \sum_{r=0}^{height-1} \sum_{c=0}^{width-1} r \, I(r,c), \qquad \bar{c} = \frac{1}{A} \sum_{r=0}^{height-1} \sum_{c=0}^{width-1} c \, I(r,c) \qquad (2) $$
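Equations 1 and 2 can be illustrated with a minimal sketch. It assumes the object is given as a binary image stored as a list of rows of 0/1 values; the function names `area` and `center_of_area` are illustrative and do not come from the paper.

```python
def area(img):
    """Equation 1: number of object pixels (value 1) in the binary image."""
    return sum(sum(row) for row in img)


def center_of_area(img):
    """Equation 2: mean row and mean column of the object pixels."""
    a = area(img)
    if a == 0:
        raise ValueError("the binary image contains no object pixels")
    # Each pixel contributes its row (or column) index weighted by I(r, c).
    r_bar = sum(r * v for r, row in enumerate(img) for v in row) / a
    c_bar = sum(c * v for row in img for c, v in enumerate(row)) / a
    return r_bar, c_bar
```

For a 2x2 block of ones occupying rows 1 to 2 and columns 1 to 2 of a 4x4 image, this gives an area of 4 and a center of (1.5, 1.5), that is, the midpoint of the block.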

The center of area gives the coordinates of the center of the object, similar to a center of gravity. The perimeter is defined as the total number of pixels forming the edge of the object. The perimeter is required to calculate the thinness ratio, a feature that is very useful for describing the shape (roundness) of an object. The thinness ratio is defined as follows:

$$ T = 4\pi \frac{A}{P^{2}} \qquad (3) $$

where A is the area and P is the perimeter of the object. This ratio equals 1 when the object is a circle; the closer the ratio is to 1, the more circular the object. As the object becomes thinner, the perimeter becomes larger relative to the area and the ratio decreases. The inverse of the thinness ratio, 1/T, is the compactness or irregularity ratio.

The aspect ratio is the ratio of the dimensions of the rectangle circumscribing the object. It can be found by scanning the image and keeping the maximum and minimum row and column values where the object lies. The aspect ratio is given by:

$$ \frac{c_{max} - c_{min} + 1}{r_{max} - r_{min} + 1} \qquad (4) $$

The Euler number is defined as the difference between the number of objects and the number of holes:

Euler number = number of objects - number of holes

For a single object, the Euler number therefore reflects the number of closed curves (holes) within the object. Among the FAN characters (K, Q, R, B, N, a, b, c, d, e, f, g, h, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, !, ?, +, -, x, .), it can be used to differentiate characters with holes from those without. For example, the letter Q has an Euler number of 0, B and 8 have -1, and a character without a hole, such as f, has 1.

We also make use of projections. The projection of a binary object provides useful information about the object's shape. It is found by summing the pixels along the rows and columns: summing along each row gives the horizontal projection, and summing along each column gives the vertical projection. The horizontal projection h_i(r) and the vertical projection v_i(c) are defined as follows:

$$ h_i(r) = \sum_{c=0}^{width-1} I(r,c), \qquad v_i(c) = \sum_{r=0}^{height-1} I(r,c) \qquad (5) $$

An MLF neural network trained with the back-propagation learning algorithm is used to recognize chess characters and figures from this feature vector. Our database is not fixed: the data are obtained from different printed Informant books, and we also create our own dataset with software such as Paint. The input patterns therefore come from different datasets, and different datasets are used in the training and prediction stages of the neural network. The recognition success rate is 91% for characters but 75% for chess symbols, because the symbols are more complex than the characters.

4 CONCLUSION AND FUTURE WORK

Chess is a very popular mind game that has existed for a long time, and it still attracts interest from many scientific domains. In this study, we have focused on extracting chess games, which are recorded as text, from the Chess Informant and animating the games on a chessboard. If an error occurs during the game extraction phase, false recognitions are eliminated in accordance with the rules of chess (semantics). We have considered only single-error cases. The study presented in this paper uses figurine algebraic notation (FAN). Chess objects are recognized by forming feature vectors, and an MLF neural network trained with the back-propagation learning algorithm recognizes the chess objects from these feature vectors. The results show that the proposed system can contribute to the generation of robust game databases by digitizing chess games from the Chess Informant. In future work, we aim to recognize chess objects in other notations and to deal with cases involving more than one error.

REFERENCES

[1] V.V. Nabiyev, Yapay Zeka: Problemler-Yöntemler-Algoritmalar, 2nd ed. Ankara: Seçkin Yayınevi, 2005, pp. 522-575.
[2] http://en.wikipedia.org/wiki/Chess
[3] www.chesscorner.com

[4] H.S. Baird, K. Thompson, "Reading Chess," IEEE Trans. on Pattern Analysis and Machine Intelligence, vol. 12, no. 6, pp. 697-708, 1990.
[5] V.V. Nabiyev, "Providing Harmony among Different Notations Through the Chess Readings," IEEE 19th Signal Processing and Communications Applications Conference, pp. 29-33, 2011.
[6] H. Zhu, J. Lei, X. Tian, "A Pattern Recognition System Based on Computer Vision: The method of Chinese chess recognition," IEEE International Conference on Granular Computing, pp. 865-868, 2008.
[7] http://chessnotation.com/
[8] http://en.wikipedia.org/wiki/Chess_annotation_symbols
[9] R.J. Schalkoff, Digital Image Processing and Computer Vision, 1st ed. New York: John Wiley & Sons Inc, 1992.
[10] N. Otsu, "A threshold selection method from gray level histograms," IEEE Transactions on Systems, Man and Cybernetics, vol. 9, no. 1, pp. 62-66, January 1979.
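As a supplementary sketch, the shape features of Section 3.3 (perimeter, thinness ratio of Equation 3, aspect ratio of Equation 4, Euler number, and the projections of Equation 5) could be computed as below. Several choices are assumptions rather than details taken from the paper: the binary image is a list of rows of 0/1 values, the perimeter counts object pixels that touch the background through a 4-neighbour, and holes are counted with 8-connectivity for the object and 4-connectivity for the background. All function names are hypothetical.

```python
import math

N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
N8 = N4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def projections(img):
    """Equation 5: row sums h(r) and column sums v(c)."""
    return [sum(row) for row in img], [sum(col) for col in zip(*img)]


def aspect_ratio(img):
    """Equation 4: bounding-box width over bounding-box height."""
    h, v = projections(img)
    rows = [r for r, s in enumerate(h) if s > 0]
    cols = [c for c, s in enumerate(v) if s > 0]
    return (cols[-1] - cols[0] + 1) / (rows[-1] - rows[0] + 1)


def perimeter(img):
    """Count object pixels having a 4-neighbour that is background or off-image."""
    height, width = len(img), len(img[0])
    count = 0
    for r in range(height):
        for c in range(width):
            if img[r][c] != 1:
                continue
            for dr, dc in N4:
                rr, cc = r + dr, c + dc
                if not (0 <= rr < height and 0 <= cc < width) or img[rr][cc] == 0:
                    count += 1
                    break
    return count


def thinness_ratio(img):
    """Equation 3: T = 4*pi*A / P^2 (compactness is 1 / T)."""
    a = sum(sum(row) for row in img)
    return 4 * math.pi * a / perimeter(img) ** 2


def count_components(grid, value, neighbours):
    """Count connected regions of `value` using an explicit-stack flood fill."""
    height, width = len(grid), len(grid[0])
    seen = [[False] * width for _ in range(height)]
    n = 0
    for r in range(height):
        for c in range(width):
            if grid[r][c] != value or seen[r][c]:
                continue
            n += 1
            seen[r][c] = True
            stack = [(r, c)]
            while stack:
                y, x = stack.pop()
                for dy, dx in neighbours:
                    yy, xx = y + dy, x + dx
                    if (0 <= yy < height and 0 <= xx < width
                            and grid[yy][xx] == value and not seen[yy][xx]):
                        seen[yy][xx] = True
                        stack.append((yy, xx))
    return n


def euler_number(img):
    """Number of objects minus number of holes."""
    width = len(img[0])
    # Pad with background so that the outer background forms a single region.
    padded = ([[0] * (width + 2)]
              + [[0] + list(row) + [0] for row in img]
              + [[0] * (width + 2)])
    objects = count_components(padded, 1, N8)
    holes = count_components(padded, 0, N4) - 1  # subtract the outer background
    return objects - holes
```

Because the perimeter is a pixel count, the thinness ratio is only an approximation of the continuous definition and becomes unreliable for strokes a single pixel wide; it still orders shapes sensibly, for example a 10x10 square scores higher than a 3x30 bar.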