

TDT4500 Intelligent Systems, Specialization Project

Autumn 2010

Evolving Neural Networks

Classifying Glyphs: Comparing Evolution and Learning

Tiril Anette Langfeldt Rødland

Supervisor: Keith L. Downing

Abstract

This dissertation investigates the classification capabilities of artificial neural networks. The goal is to generalize over the features of a writing system, and thus classify the writing system of a previously unseen glyph. Evolution and learning were both used to tune the weights of a network, in order to investigate which of the methods is best suited for creating this kind of generalization and classification network. The difference was remarkable, as the evolved network displayed absolutely no ability to classify the glyphs. However, the learned network managed to clearly separate between the writing systems, resulting in correct classifications. It can thus be concluded that artificial neural networks are able to correctly classify the writing system of a glyph, and that learning by backpropagation is the superior method for creating such networks.

Preface

This dissertation is the pre-study project of the master dissertation. It is written at the Department of Computer and Information Science, NTNU, during the autumn semester of 2010.

The original plan was to find an Egyptological dissertation subject, due to my long-lasting interest in the field. I appreciate the assistance from Vidar Edland, Norwegian Egyptology Society, for helping me in the search for a suitable subject. However, the search came up empty. Inspired by Dehaene (2009), Egyptology turned to hieroglyphs, and then other writing systems joined in. Classification seemed to be a natural choice of task, with regard to different writing systems and neural networks. This revealed an interesting subject on which there seemingly had been done no previous research. I like to think it still is a relevant problem. At least it can be used to show off the generalization abilities of artificial neural networks.

I wish to thank my supervisor Professor Keith L. Downing at the Department of Computer and Information Science, Norwegian University of Science and Technology, for his invaluable assistance. Special thanks are given to John A. Bullinaria, for his immense help in debugging, and for sharing his infinite knowledge regarding neural networks. Gratitude is given to Arne Dag Fidjestøl and Aslak Raanes in the System Administration Group, Department of Computer and Information Science, for giving me access to hardware on which I could run the evolution. I am also very grateful to Cyrus Harmon, Edi Weitz, David Lichteblau, and all the other Common Lisp programmers out there, whose libraries saved me from months of extra work.

Trondheim, December 2010

Tiril Anette Langfeldt Rødland

Contents

Chapter 1: Introduction 1

Chapter 2: Background 3
  2.1 Writing Systems 3
  2.2 Classification of Glyphs 5
  2.3 Artificial Neural Network 6
    2.3.1 The XOR Example 6
  2.4 Training ANNs 9
    2.4.1 Training ANNs Using Supervised Learning 9
    2.4.2 Backpropagation 10
    2.4.3 Fast Backpropagation 12
    2.4.4 Tricks for Multi-Class Classification Problems 13
  2.5 Evolving ANNs 14
    2.5.1 Evolving Neural Networks 15

Chapter 3: Methodology 17
  3.1 Glyphs and Writing Systems 17
  3.2 Neural Network Design 20
  3.3 Learning 21
  3.4 Evolution 22

Chapter 4: Results and Analysis 25
  4.1 Learning 25
    4.1.1 Pair Classification 25
    4.1.2 Full Classification 27
    4.1.3 Writing System Classification Quality 27
  4.2 Evolution 30

Chapter 5: Discussion 35
  5.1 Learning or Evolution? 35
  5.2 Writing Systems 35
  5.3 Future Work 36

References 39

Software Library 43

Appendix A: Charts A-1

Appendix B: Classification B-1
  B.1 Pair Classification B-1
    B.1.1 Training B-1
    B.1.2 Testing B-6
  B.2 Full Classification B-29
    B.2.1 Learning B-29
    B.2.2 Evolution B-31

Appendix C: Data Distribution C-1

Appendix D: Pixel Comparison D-1

List of Figures

2.1 A simple ANN 7
2.2 Activation function 7
2.3 XOR ANN 8
2.4 Backpropagation ANN 11
2.5 The evolution cycle 15

3.1 ANN: From Unicode to output values 19

4.1 Learning: full classification 28
4.2 Correlation 30
4.3 Evolution: full classification 32

A.1 Coptic A-2
A.2 Runic A-3
A.3 Hebrew A-4
A.4 Arabic A-5
A.5 Hangul jamo A-6
A.6 Hiragana A-7
A.7 Thai A-8
A.8 Canadian Aboriginal syllabics A-9
A.9 Han unification A-10
A.10 Egyptian hieroglyphs A-11

C.1 Approach 1 C-3
C.2 Approach 2 C-4
C.3 Approach 3 C-4

List of Tables

2.1 XOR truth table 8
2.2 Evolutionary methods 14

3.1 Writing systems 17
3.2 Evolution parameters 22
3.3 The gene 23

4.1 Learning: pair classification 26
4.2 Learning: pair classification probabilities 26

C.1 Number of glyphs C-1
C.2 Comparison of combinations of relative set size C-2

D.1 Glyph similarity D-2
D.2 Glyph similarity D-2
D.3 Pixel comparison using Richard's curve (128) D-3
D.4 Pixel comparison using Richard's curve (200) D-3
D.5 Pixel comparison using Richard's curve (250) D-3
D.6 Pixel comparison using difference ≤ 40 D-4
D.7 Pixel comparison using difference ≤ 20 D-4
D.8 Pixel comparison using difference ≤ 10 D-4
D.9 Pixel comparison using difference ≤ 5 D-5

Abbreviations and Acronyms

ANN  Artificial neural networks
BP   Backpropagation
CJK  Chinese, Japanese, Korean
EA   Evolutionary algorithm
EEC  Evolving efficient connections
EP   Evolutionary programming
ES   Evolutionary strategies
GA   Genetic algorithm
GP   Genetic programming
TSS  Total sum squared error
XOR  Exclusive or

CHAPTER 1

Introduction

33 000 years ago, the first Homo sapiens discovered that they could draw recognizable pictures on the wall of the Chauvet cave, in southern France (Dehaene, 2009). There were drawings of animals, as well as nonfigurative shapes such as dots, lines and curves. Pictography appeared in ancient Sumeria and Egypt, creating easy-to-read pictograms. The art of writing mutated as it spread throughout the cultures. Even though pictograms are relatively easy to read, they are — due to their many details — very slow to write. They were thus simplified in different ways, and the symbols were tied to different sounds. For example, the Chinese ⾺ is a simplification of a horse, and the letter A was originally an ox (Dehaene, 2009, Figure 4.3). This is visible in the Greek letter α (alpha), as aleph means just that. Similarly, m symbolizes waves (mem), k symbolizes a hand with outstretched fingers (kaf), and r symbolizes a head (res) (Dehaene, 2009). Even though both ⾺ and A originally came from large, four-legged animals, the resulting glyphs are very different. For each step of simplification, some features were extracted for use in the simplified glyph. Different civilizations focused on different features. The simplification also depended on the writing tools available; pencils and paper enable glyphs of many different shapes, while glyphs carved in stone ought to consist of straight lines.

Today, 33 000 years after the first drawings in a cave, we have numerous writing systems, in different stages of evolution. The human brain seems to be rather good at separating between these different writing systems, depending on their relative placement on the family tree. It might be difficult to separate between the Latin, Greek and Cyrillic alphabets, if one is familiar with none of them. However, it is trivial to separate these from Egyptian hieroglyphs. After having learned that m, l and other such signs are hieroglyphs, it is easy to see that R also is of this writing system. Even though the glyphs within one writing system differ from each other, they are also very similar. One tends to reuse the same basic shapes for different glyphs within the writing system (Dehaene, 2009, Figure 4.1), thus creating a subconscious expectation of how glyphs from that writing system shall look.

When humans separate between writing systems like that, it is due to the neural network of our brain. We are able to generalize common features over a set of glyphs, and classify unseen glyphs based on that. Since the natural neural network is able to do this separation, an artificial neural network ought to be as well. We already know that artificial neural networks are able to recognize letters (Bishop, 1996). This dissertation will investigate whether an artificial neural network can generalize over the patterns and shapes common to the different writing systems, and thus find the writing system of an unseen glyph.

It was evolution that gave humans a brain capable of reading. However, no one can read without having learnt it first. The main question of the dissertation is thus whether learning or evolution of the neural net will give the best result. That is, the ultimate solution with respect to both success rate and execution time, or rather how easy it is to attain said solution.

The dissertation is structured as follows. This introductory chapter introduces the problem, and writing systems in general. Chapter 2 demonstrates the concepts of classification, evolution, and artificial neural networks, including common methods and algorithms. Writing systems are also discussed in more detail. Chapter 3 presents the methodology used, focusing on differences from the standard approach given in Chapter 2. Chapter 4 reveals the experiments and their results, including a low-level analysis. A more general discussion can be found in Chapter 5, in addition to future work. Additional information can be found in the appendices. Appendix A includes Unicode charts from the writing systems used, and Appendix B contains the full results from the experiments. The optimal data set distribution is researched in Appendix C, while a comparison of the writing systems based on pixel values can be found in Appendix D.

CHAPTER 2

Background

This chapter introduces the concepts used in this dissertation. First there is a brief overview of writing systems, followed by an introduction to artificial neural networks.

2.1 Writing Systems

A writing system is the set of glyphs used to represent the writing of a specific human language (FOLDOC, 1998b). In this case, a glyph is an image used to visually represent the characters of the language (FOLDOC, 1998a). There are many different classes of writing systems, depending mainly on what each glyph represents. Below is the classification used by The Unicode Consortium (2009, Ch. 6), as well as a few examples.

Alphabets are writing systems consisting of letters, where each letter can be either a consonant or a vowel. The two have the same status as letters, giving a homogeneous collection of glyphs. The Latin alphabet is the most well-known example, as it is used to write thousands of different languages. Another example is the old runic alphabet.

Abjads are writing systems which have only the consonants and the long vowels. All other sounds are either left out entirely from the writing, or simply indicated by specific marks on the consonants. Abjads are typically used for Semitic languages, with Hebrew and Arabic as well-known examples.

Syllabaries are writing systems where each glyph represents both a consonant and a vowel, or sometimes more than one. The Japanese systems — hiragana and katakana — are both syllabaries. The glyphs there describe syllables like for example ka (か), ki (き), ku (く), ke (け), ko (こ), ra (ら), and ri (り). Here, each glyph is unique in shape, such that か and き look nothing alike, even though they both start with the sound k.


Other syllabaries, like the Korean Hangul jamo, have glyphs functioning more like conglomerates. That is, they consist of one part for the initial consonant or consonants, one part for the vowel, and an optional consonant part in case of a final consonant.

Abugidas are writing systems which mix the syllabic and alphabetic characteristics. Each consonant has an inherent vowel connected to it (usually a). There are also independent vowels, and when one of these follows a consonant, it overrides the inherent vowel. For example, when i follows ka, it results in ki. Often, there is also a symbol that, when it follows a consonant, kills the inherent vowel without adding another sound, thus resulting in a bare consonant sound. The Devanagari script, which is used to write for example Hindi and Sanskrit, is an abugida.

Logosyllabaries are writing systems representing words or parts of words, instead of only the sounds. The Han script — which is used to write Chinese — is a logosyllabary. These glyphs are borrowed by other East Asian languages, e.g. Japanese and Korean. As a consequence, it is possible to understand written Chinese when you know Japanese, even though the spoken languages are nothing alike.

Japanese is special, in that it uses four different writing systems in the same composite system (Banno et al., 1999). That is, kanji (logosyllabary) for things and meanings, hiragana (syllabary) for e.g. grammatical postfixes and particles, katakana (syllabary) for e.g. foreign words, and Latin (alphabet) for rōmaji. Rōmaji is a transcription of the sound, for use when the other systems will not be understood, e.g. when working with computers and foreigners. Because of this separation, the kanji is used only for meaning, not sound. Each kanji can have many different sounds connected to it. As a result, the same sequence of kanjis can be translated to different strings of kana, depending on the context. ⽇曜⽇, meaning Sunday, is pronounced nichiyoubi, にちようび. Note that the first and the second ⽇ have different sounds. This is because they have different meanings; the first ⽇ means Sun, while the second means day.¹ The basic meaning is the same, however, as the day can be seen as a function of the Sun. Similarly, ⽉ means both Moon and month. When it comes to logosyllabaries, context is everything. A hiragana て, on the other hand, will always be a te.

Egyptian hieroglyphs, however, can double as both kanji and kana. While the majority of the hieroglyphs are logographs, displaying the meaning, there is also a subset of these that behaves like an alphabet. Other glyphs represent sequences of consonants. In other words, hieroglyphs are complicated. One can take j (pr) as an example (Collier and Manley, 2003). This looks like the floor plan of a simple one-room house, and is thus the symbol used for house. j| means an actual house, as | is a sign used to indicate that this indeed is an ideogram. When used in rL (pr), on the other hand, j has nothing to do with houses. Similarly, r (r) no longer means mouth. j is now used for the sound pr, which means to go out, leave. The r is there to clarify the reading of j, not to add an extra r. The walking legs at the end, L, are used as a determinative, an unspoken character placed at the end to give the reader a general idea of the meaning, which in this case is movement.

¹曜 means weekday.

2.2 Classification of Glyphs

Glyph classification is an important field of research, with many application areas. Systems that can recognize and classify glyphs can be used to e.g. sort mail automatically based on the hand-written zip codes (or similar area-specific numbers), and convert printed documents into digital text. A good example is Detexify, created by Kirsch. It is a LaTeX symbol recognizer, driven by an artificial neural network. The user draws a symbol on the screen, and the system displays a list of possible symbols and their LaTeX command. All of this necessitates the recognition of each glyph, i.e. to separate an a from a b, and to recognize a's of different fonts and quality, printed or handwritten.

Character recognition is thus a much researched area, and an important part of general pattern recognition (Bishop, 1996). Mantas (1986) and Govindan and Shivaprasad (1990) present an overview of the methods at the time. However, there is still research going on in the field. For example, Ramos-Garijo et al. (2007) have created an engine to recognize handwritten characters, and Rocha and Pavlidis (1994) propose a method of recognizing characters written with different fonts.

This dissertation, however, is about recognizing neither a handwritten nor a printed a. It is about recognizing the a as a Latin character. That is a problem with few application areas, as the writing system tends to be known based on context. As a result, there is not much research going on in this part of the field. Alas, there is no available research material on actually classifying the writing system of a glyph. The most relevant research might be by Knight and Yamada (1999), who propose an approach to decipher unknown scripts. That is, historically forgotten scripts where either the written or the spoken language is unknown. Unfortunately, that is of no use for this problem.

The main difference between recognizing a character and classifying the character is the data set. When trying to recognize an a, the system has already seen many a's. They might be different, with different fonts or noise. Nonetheless, the system has seen an a before. The task is then to generalize over all the a's and extract the important features, to also recognize a's which look a bit different. This luxury is not allowed when classifying the writing system. The point is not to recognize already seen glyphs, but to generalize important aspects of a writing system and recognize other glyphs based on these features. That is, to recognize a か as a hiragana after learning that は, て and る are hiragana.

However, both character recognition and glyph classification are pattern recognition problems. As artificial neural networks are an important tool in the field of pattern recognition (Bishop, 1996), there is reason to believe they will solve the glyph classification problem.

2.3 Artificial Neural Network

An artificial neural network (ANN) is an artificially constructed brain, designed to benefit from its parallelism (Callan, 1999). An ANN is more or less a collection of many interconnected processors, similar to the neurons in a brain. Each processor — or neuron — is very simple. It receives an input signal, and sends an output signal, which is a function of the input signal. Even though one neuron is simple, an ANN can solve quite complex problems due to the large number of these simple neurons.

These nodes or neurons can be structured in many different ways. The ANN must have at least one input node, and at least one output node. There may also be nodes in other layers, so-called hidden nodes. These are the nodes that are neither input nor output, and are thus hidden in the black box of the ANN. All these nodes can be connected to each other in multiple ways. One common approach is the layered model, with one layer of input nodes, one layer of output nodes and zero, one or more layers of hidden nodes. Here, each layer is fully connected to all the nodes in the layer above. There can also be connections internally in the layers. Another approach is to allow any node to be connected to any other node, without any layered structure.

The ANN in Figure 2.1 on the facing page is an example of a three-layered ANN, fully connected between the layers. There are two blue input nodes, two red output nodes and one layer of four green hidden nodes. A link is the collection of connections between two layers. In the ANN in Figure 2.1, there are two links; one between the input layer and the hidden layer, and one between the hidden layer and the output layer.

For all nodes but the input nodes, the input signal is given by Equation 2.1, where x_i is the input from node i in the layer below, and w_{ij} is the weight on the arrow from node i to this current node, j.

    net_j = \sum_{i} x_i w_{ij}    (2.1)

So the input is the sum of the output from all the incoming nodes, times their weight. The output is calculated by sending the input signal through an activation function. This function takes in the input, and returns a number between either 0 and 1, or -1 and 1. A commonly used activation function is the sigmoid function (Equation 2.2 and Figure 2.2 on the next page).

    f(net) = \frac{1}{1 + \exp(-net)}    (2.2)

The output of an ANN is thus a function of the input signals, but also of the weights. These can be manually set on small networks, but on larger networks they ought to be automatically configured.

2.3.1 The XOR Example

The XOR ANN in Figure 2.3 on page 8 is a good example of how an ANN works. The XOR function (exclusive or, Table 2.1) (WorldLingo, 2010n) is a binary operation that is true only if one input is true and the other false.



Figure 2.1: A simple ANN with one hidden layer.


Figure 2.2: The sigmoid activation function.

Table 2.1: Truth table for the XOR function.

Input     Output
A    B    A XOR B
0    0    0
0    1    1
1    0    1
1    1    0


Figure 2.3: An ANN that solves XOR. The input-to-hidden weights are all 1, the hidden activation values are H1act = 1, H2act = 2 and H3act = 1, the hidden-to-output weights are 1, −2 and 1, and the output activation value is H4act = 1.

Say I1 = 1 and I2 = 0. The input signals for the three nodes in the first hidden layer of Figure 2.3 are then

    H1in = I1 × 1 = 1 × 1 = 1
    H2in = I1 × 1 + I2 × 1 = 1 × 1 + 0 × 1 = 1
    H3in = I2 × 1 = 0 × 1 = 0

The activation values are 1, 2 and 1, respectively. As a result, only H1 is activated:

    H1in ≥ H1act ⇒ H1out = 1
    H2in < H2act ⇒ H2out = 0
    H3in < H3act ⇒ H3out = 0

The input signal to H4 is thus

    H1out × 1 + H2out × (−2) + H3out × 1 = 1,

meaning the output signal is 1. The output node will then be 1, as True XOR False is True.

It is thanks to the weight values that this ANN actually manages to solve the XOR function. In this case, the weights can be set by hand, as it is a rather small network. For larger networks, however, we must find a way to tune in the correct weight values automatically. This can be done by for example learning or evolution.
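Worked through in code, the forward pass above looks as follows. This is only an illustrative sketch in Python (the thesis software itself was written in Common Lisp), using hard threshold activations and the weights and activation values from Figure 2.3:

```python
# Forward pass through the XOR network of Figure 2.3, using hard thresholds.
# Weights and activation values follow the worked example above.

def step(value, threshold):
    """Fire (1) if the incoming signal reaches the node's activation value."""
    return 1 if value >= threshold else 0

def xor_net(i1, i2):
    # Input-to-hidden weights are all 1; hidden activation values are 1, 2, 1.
    h1 = step(i1 * 1,          1)   # fires if i1 is set
    h2 = step(i1 * 1 + i2 * 1, 2)   # fires only if both inputs are set
    h3 = step(i2 * 1,          1)   # fires if i2 is set
    # Hidden-to-output weights are 1, -2 and 1; output activation value is 1.
    return step(h1 * 1 + h2 * (-2) + h3 * 1, 1)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))   # reproduces the truth table in Table 2.1
```

Running the loop prints exactly the rows of Table 2.1, which is the point of the example: the behaviour of the network is entirely determined by its weights and thresholds.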

2.4 Training ANNs

There are three main types of learning in an ANN:

Supervised learning is learning with a teacher that for each step informs the system if it was right or wrong, and gives the correct answer (Downing, 2009c; Floreano and Mattiussi, 2008). This is often used in classification, where each new training object has a correct (known) class.

Unsupervised learning is learning with no external feedback, neither from a teacher nor the environment. The system itself ought to realize what was a good or a bad move (Downing, 2009d).

Reinforcement learning is learning with only infrequent feedback and supervision (Downing, 2009b). The system is told the overall result, but not what it did wrong and what it should have done.

This section will focus on supervised learning, as that is the main method for use in classification. All the glyphs do have a correct and known answer, and this should be used to guide the learning.

2.4.1 Training ANNs Using Supervised Learning

Using supervised learning, the weights of the network are learned by exploiting the difference between the output of the network and the known solution.

For each pattern µ of the M total training patterns, there is a pair of an input vector \vec{x}^\mu and a desired output vector \vec{t}^\mu. The goal is to find the set of synaptic weights such that the actual output \vec{y}^\mu of a two-layered network is as close as possible to the desired output \vec{t}^\mu, for all of the M patterns. The weights of the network should change gradually, until it is performing acceptably. The performance can be described using an error function, e.g. the mean quadratic error between the desired and the actual output (Equation 2.3). The error function is, just as the output \vec{y}^\mu, depending on the synaptic weights. The error can thus be reduced by changing the weights accordingly.

    E_W = \frac{1}{2} \sum_\mu \sum_i (t_i^\mu - y_i^\mu)^2 = \frac{1}{2} \sum_\mu \sum_i \Big( t_i^\mu - \sum_j w_{ij} x_j^\mu \Big)^2    (2.3)

The weight change is given by a learning rule. Differentiating (2.3) with respect to the weights gives the learning rule in Equation 2.4, where η is the learning rate (Wagstaff, 2008; Floreano and Mattiussi, 2008).

    \Delta w_{ij} = \eta (t_i - y_i) x_j    (2.4)

This algorithm is known both as the Widrow-Hoff rule — after its authors Widrow and Hoff (1960) — and as the delta rule, because of the difference δ_i = t_i − y_i. It is often written as simply \Delta w_{ij} = \eta \delta_i x_j. This \Delta w_{ij} is then added to the weight w_{ij}, either online, i.e. after each and every pattern, or in batch mode, i.e. adding the accumulated weight change after all the M training patterns have been run through the net.

While the delta rule works fine for training patterns that are linearly separable, it does not work for more complex patterns, e.g. the XOR example in Section 2.3.1. As this function needs hidden layers (Figure 2.3 on page 8), a more complex version of the delta rule is necessary. The solution — the generalized delta rule — was presented by Rumelhart et al. (1986). This algorithm, best known as backpropagation of error, propagates the error back through the network, from the output layer back through all the hidden layers, and to the input layer. As a result, it can handle a neural network with an arbitrary number of nodes and layers.
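Before moving on to backpropagation, the plain online delta-rule update can be sketched in a few lines. The Python below is purely illustrative (the thesis implementation was written in Common Lisp), and the names are my own:

```python
# Online delta-rule (Widrow-Hoff) update for a single layer of weights.
# weights[i][j] connects input j to output i; eta is the learning rate.

def delta_rule_step(weights, x, t, y, eta=0.01):
    """Apply delta_w_ij = eta * (t_i - y_i) * x_j for one training pattern."""
    for i in range(len(weights)):          # loop over output nodes
        delta_i = t[i] - y[i]              # the "delta" the rule is named after
        for j in range(len(x)):            # loop over input nodes
            weights[i][j] += eta * delta_i * x[j]
    return weights
```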

2.4.2 Backpropagation

Backpropagation can be explained using the three-layered ANN in Figure 2.4 on the facing page. The backpropagation algorithm given in Floreano and Mattiussi (2008), for online weight updating, is as follows:

1. Initialize all weights, vjk and wij, to random values close to 0.

2. Set the values of input nodes to the current training pattern s:

    x_k^\mu = s_k^\mu    (2.5)



Figure 2.4: An ANN with the backpropagation syntax. xk is the input layer, with k nodes. Similarly, hj and yi are the hidden layer with j nodes and the output layer with i nodes, respectively. vjk is the link between the input layer and the hidden layer. Each node in the input layer is connected to every node in the hidden layer, so the link consists of k × j connections. Similarly, wij is the link between the hidden layer and the output layer.

3. Compute the values of the hidden nodes, based on the input nodes and the link from the input to the hidden layer:

    h_j^\mu = \Phi\Big( \sum_k v_{jk} x_k^\mu \Big)    (2.6)

   where

    \Phi(a_i) = \frac{1}{1 + e^{-k a_i}}    (2.7)

4. Compute the values of the output nodes, based on the hidden nodes and the link from the hidden to the output layer:

    y_i^\mu = \Phi\Big( \sum_j w_{ij} h_j^\mu \Big)    (2.8)

5. Compute the delta error for each output node, based on the hidden nodes and the link from the hidden to the output layer, as well as the difference between the expected and the actual output:

    \delta_i^\mu = \dot{\Phi}\Big( \sum_j w_{ij} h_j^\mu \Big) (t_i^\mu - y_i^\mu)    (2.9)

   It can be noted that the derivative of the sigmoid function can be expressed in terms of the output of the sigmoid function (Wagstaff, 2008):

    \dot{\Phi}\Big( \sum_j w_{ij} h_j^\mu \Big) = y_i^\mu (1 - y_i^\mu)    (2.10)

   As a result, the delta error can be written solely depending on the actual and the expected output:

    \delta_i^\mu = y_i^\mu (1 - y_i^\mu)(t_i^\mu - y_i^\mu)    (2.11)

6. Compute the delta error for each hidden node, based on the input nodes, both links, and the delta errors of the output nodes:

    \delta_j^\mu = \dot{\Phi}\Big( \sum_k v_{jk} x_k^\mu \Big) \sum_i w_{ij} \delta_i^\mu    (2.12)

   Thanks to the same property of the differentiated sigmoid function, however, this error too can be written independently of the input:

    \delta_j^\mu = h_j^\mu (1 - h_j^\mu) \sum_i w_{ij} \delta_i^\mu    (2.13)

7. Compute the changes in synaptic weights, based on the delta errors and the output values of the first two layers:

    \Delta w_{ij}^\mu = \delta_i^\mu h_j^\mu    (2.14)
    \Delta v_{jk}^\mu = \delta_j^\mu x_k^\mu    (2.15)

8. Update the weights by adding the changes to each weight, multiplied by the learning rate η:

    w_{ij}^t = w_{ij}^{t-1} + \eta \Delta w_{ij}^\mu    (2.16)
    v_{jk}^t = v_{jk}^{t-1} + \eta \Delta v_{jk}^\mu    (2.17)

The algorithm goes through this procedure multiple times for every training case, until the total sum squared error is sufficiently small. This error (Equation 2.18) is computed over all the N output nodes (i), for all the M training patterns (µ).

    TSS = \frac{1}{M} \sum_\mu^M \frac{1}{N} \sum_i^N (t_i^\mu - y_i^\mu)^2    (2.18)
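The eight steps can be condensed into a compact online-training sketch. This is an illustrative Python rendering of the algorithm as given above (logistic units in both layers, one hidden layer); it is not the thesis code, and the names and the stopping policy (a fixed number of epochs rather than a TSS threshold) are assumptions:

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def backprop_online(patterns, n_in, n_hid, n_out, eta=0.01, epochs=1000):
    """patterns is a list of (x, t) pairs; returns the two weight links (v, w)."""
    # Step 1: initialise both links to small random values around zero.
    v = [[random.uniform(-0.05, 0.05) for _ in range(n_in)] for _ in range(n_hid)]
    w = [[random.uniform(-0.05, 0.05) for _ in range(n_hid)] for _ in range(n_out)]
    for _ in range(epochs):
        for x, t in patterns:                                   # step 2
            h = [sigmoid(sum(v[j][k] * x[k] for k in range(n_in)))
                 for j in range(n_hid)]                         # step 3, Eq. 2.6
            y = [sigmoid(sum(w[i][j] * h[j] for j in range(n_hid)))
                 for i in range(n_out)]                         # step 4, Eq. 2.8
            d_out = [y[i] * (1 - y[i]) * (t[i] - y[i])
                     for i in range(n_out)]                     # step 5, Eq. 2.11
            d_hid = [h[j] * (1 - h[j]) *
                     sum(w[i][j] * d_out[i] for i in range(n_out))
                     for j in range(n_hid)]                     # step 6, Eq. 2.13
            for i in range(n_out):                              # steps 7-8
                for j in range(n_hid):
                    w[i][j] += eta * d_out[i] * h[j]            # Eq. 2.14, 2.16
            for j in range(n_hid):
                for k in range(n_in):
                    v[j][k] += eta * d_hid[j] * x[k]            # Eq. 2.15, 2.17
    return v, w
```

In practice one would stop once the TSS error of Equation 2.18 drops below a chosen threshold, rather than after a fixed number of epochs.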

2.4.3 Fast Backpropagation

The backpropagation algorithm fulfils its task, but is not specifically designed to be as efficient as possible. Ngolediage et al. (1993) thus proposed a more efficient variant of the algorithm, the fast backpropagation algorithm.

The fast backpropagation algorithm is similar to the standard algorithm, but continuously adapts the learning parameter η for individual synapses. This is done using the Fermi-Dirac distribution (Ngolediage et al., 1993), which again is based upon quantum principles.

The probability P(w) that a specific electron has an energy between w and w + δw is given by the Fermi-Dirac distribution function (Equation 2.19).

    P(w) = \frac{1}{1 + \exp\big( \frac{w - w_F}{KT} \big)}    (2.19)

K is the Boltzmann constant (Encyclopædia Britannica, 2010b), T is the absolute temperature (Encyclopædia Britannica, 2010a), and w_F is the Fermi level of the material (Encyclopædia Britannica, 2010c). Inspired by this, the fast backpropagation updates the learning rate by Equation 2.20.

    \eta = \lambda \exp\Big( -\frac{|\beta(t)|}{\gamma T} \Big)    (2.20)

    \beta(t) = \frac{1}{2} \Big( \big| |\Delta w(t)| - |\Delta w(t-1)| \big| + \beta(t-1) \Big)    (2.21)

Here, T is a parameter pretending to be the system temperature, and β(t) (Equation 2.21) is the mean difference between the step sizes at two consecutive epochs (t − 1 and t). λ is the step size parameter, and γ ≥ 0 is the scale factor for T. The final weight update rule for fast backpropagation is thus given as Equation 2.22, where E is the error between the desired and the actual output, and α is the momentum of the change.

    \Delta w(t) = -\lambda \exp\Big( -\frac{|\beta(t)|}{\gamma T} \Big) \frac{\delta E}{\delta w(t)} + \alpha \Delta w(t-1)    (2.22)
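How the adaptive rate could be computed for a single synapse is sketched below. This is my reading of Equations 2.20-2.22, not the authors' code; in particular, β(t) is here evaluated from the two most recently completed weight changes (since the current change is not yet known when the rate is needed), and all parameter names are placeholders:

```python
import math

def fast_bp_delta(grad, prev_delta, prev_prev_delta, prev_beta,
                  lam=0.1, gamma=1.0, temp=1.0, alpha=0.0):
    """One fast-backpropagation step for a single synapse (Eqs. 2.20-2.22).

    grad            : dE/dw for this weight at the current epoch
    prev_delta      : weight change applied at the previous epoch
    prev_prev_delta : weight change applied the epoch before that
    prev_beta       : beta from the previous epoch
    Returns (delta_w, beta), so beta can be carried to the next epoch.
    """
    # Eq. 2.21, evaluated on the two most recent completed steps (assumption).
    beta = 0.5 * (abs(abs(prev_delta) - abs(prev_prev_delta)) + prev_beta)
    # Eq. 2.20: the adapted, synapse-specific learning rate.
    eta = lam * math.exp(-abs(beta) / (gamma * temp))
    # Eq. 2.22: gradient step with the adapted rate, plus momentum.
    delta_w = -eta * grad + alpha * prev_delta
    return delta_w, beta
```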

2.4.4 Tricks for Multi-Class Classification Problems

Bullinaria (2009) proposed the use of the cross-entropy error function and the softmax activation function for multi-class classification problems. According to Dunne and Campbell (1997), there is a natural pairing between these two functions, and they should be used together.

Softmax The softmax activation function is only for use in the output layer (Wikibooks, 2010). It is specifically designed for the output nodes, where each node represents one class. Softmax gives each of these classes (or nodes) a value, which is the probability that the input pattern belongs to that class (Bishop, 1996). It thus ensures that all the output values are in the range [0, 1], and that all the output nodes sum up to 1.

    p_i = \frac{e^{q_i}}{\sum_{j=1}^{n} e^{q_j}}    (2.23)

The softmax function is represented by Equation 2.23, where p_i is the value of output node i, q_i is the net input to output node i, and n is the number of output nodes (Bishop, 1996).

Cross-entropy error function The cross-entropy error function (Equation 2.24) is the output error function. It is used to specify if the error is low enough to stop learning (Bullinaria, 2009; Bishop, 1996). It is used together with the delta function (2.25), instead of the more common delta function (2.11) (Bullinaria, 2009).

    Error = -\big( t_i^\mu \log(y_i^\mu) + (1 - t_i^\mu) \log(1 - y_i^\mu) \big)    (2.24)

    \delta_i^\mu = t_i^\mu - y_i^\mu    (2.25)
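A small sketch of the softmax/cross-entropy pairing (Equations 2.23-2.25), in illustrative Python rather than the thesis' Common Lisp; the shift by the maximum net input is a standard numerical-stability trick and an addition of mine, not part of the formulas above:

```python
import math

def softmax(net_inputs):
    """Eq. 2.23: turn the output nodes' net inputs q_i into probabilities p_i."""
    shift = max(net_inputs)                       # for numerical stability only
    exps = [math.exp(q - shift) for q in net_inputs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(targets, outputs):
    """Eq. 2.24, summed over the output nodes for one pattern."""
    return -sum(t * math.log(y) + (1 - t) * math.log(1 - y)
                for t, y in zip(targets, outputs))

def output_deltas(targets, outputs):
    """Eq. 2.25: the simplified delta used with the softmax/cross-entropy pair."""
    return [t - y for t, y in zip(targets, outputs)]
```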

2.5 Evolving ANNs

Evolutionary computation is a search method, or family thereof, deeply inspired by the concept of evolution (Darwin, 1859). One starts off with a bunch of randomly generated solutions — or individuals — and uses these to evolve a good solution. As long as one has a fitness function — a way to grade how well the individual solves the problem — the fitness can be used to kill off the least fit individuals, and enhance the best. As in real world evolution, the surviving individuals mate and create new individuals, with aspects from both parents. Mutations can also occur, changing one individual slightly. This might be a bad move, in which case the individual is likely to die. It is also possible that the mutation increased the fitness. In that case, the individual is allowed to mate, and the mutation will spread down through the generations.

Evolutionary computation can be divided into four subfields: a) evolutionary strategies (ES), b) evolutionary programming (EP), c) genetic algorithms (GA), and d) genetic programming (GP). They all follow the same basic principle, but differ in their traditional characteristics (Downing, 2009a).

According to Downing (2009a, Table 1), the characteristics of the four evolutionary methods can be summarized as in Table 2.2. The main difference between the four is the choice of genotype. ES use a list of real values, EP a list of integers, and GA use a bit vector. GP is used to evolve programs, and the genotype is thus a code tree.

Table 2.2: Brief overview of the four evolutionary methods (Downing, 2009a, Table 1). OverProd is selection among an overproduction of children, GenMix is selection among both current and last generation, GenRep is full generational replacement, and FitPro is fitness proportionate.

Method  Genotype     Mutation  Crossover  Adult selection    Parent selection  Mate selection
ES      Real vector  Yes       Sometimes  OverProd, GenMix   Truncate          Uniform
EP      Int vector   Yes       No         OverProd, GenMix   Truncate          Uniform
GA      Bit vector   Yes       Yes        GenRep             Uniform           FitPro
GP      Code tree    Yes       Yes        GenRep             Uniform           FitPro




Figure 2.5: The evolution cycle can be described as these stars. The green four-tagged stars are the representation of the genotypes, and the blue eight-tagged stars are the phenotypes.

Even though the four methods differ in the implementation details, the main algorithm is the same (Figure 2.5). Simple representations are created (the genotypes), and these develop into phenotypes. Phenotypes are more complex data structures that can be tested for fitness. Each phenotype will thus have a fitness value. Based on this fitness, some (or all) of the individuals in the population are selected to become adults. Of these, some (or all) are selected to become parents. From this pool of parents, pairs of individuals are selected to mate, creating new individuals with genotype parts from both parents (or as a copy of one). Some of these will also be mutated, with a small change in genotype. Now we have a new population of genotype-children, which develops into phenotypes, and so it continues.
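The cycle can be written as a short loop where every stage is a pluggable function. The sketch below only wires together the stages of Figure 2.5; the representation and the selection operators are deliberately left abstract, and none of the names come from the thesis implementation:

```python
def evolve(random_genotype, develop, fitness, select_adults, select_parents,
           reproduce, population_size, generations):
    """Generic evolutionary cycle: develop -> evaluate -> select -> reproduce.

    All arguments are caller-supplied functions; the loop itself only wires
    together the stages of Figure 2.5.
    """
    genotypes = [random_genotype() for _ in range(population_size)]
    best = None
    for _ in range(generations):
        # Development and fitness testing of the phenotypes.
        scored = [(g, fitness(develop(g))) for g in genotypes]
        best = max(scored + ([best] if best else []), key=lambda gf: gf[1])
        # Adult selection, parent selection, and reproduction (crossover and
        # mutation are hidden inside the supplied reproduce function).
        adults = select_adults(scored)
        parents = select_parents(adults)
        genotypes = [reproduce(parents) for _ in range(population_size)]
    return best
```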

2.5.1 Evolving Neural Networks

When evolving a neural network, one must find a genotype to represent the network (the phenotype). The fitness function will depend on the output of the neural network. A bit vector genotype gives the most freedom when it comes to translation of the genotype to phenotype. A bit vector can represent almost anything, and is thus a good choice when evolving a neural network (Whitley et al., 1990). That points to the use of a genetic algorithm (Table 2.2), and thus also the use of both mutation and crossover, full generational replacement as adult selection, a uniform parent selection and a fitness proportionate mate selection.

When it comes to evolution of neural networks, there are many ways to do it. One can use a predefined topology and only evolve the weights, or one can evolve the topology as well (Jones, 2009; Dewri, 2003). Since this network shall be changed by both the learning and the evolution, the topology ought to be predefined. That would make the implementation less complicated, and more modular. However, it is still possible to slightly adapt the network. Shi and Wu (2008) designed EEC (Evolving Efficient Connections), a method to evolve both connection weights and a switch to turn the weight on or off. Otherwise, one must wait for all the bits to become 0 for the connection to be removed; with EEC just one bit needs to be turned off.

2.5.1.1 Genetic Algorithm

A genetic algorithm used to evolve a neural network has the same main approach as in Figure 2.5 (De Jong, 2006):

1. Randomly generate a population of m parents.
2. Repeat:
3.   Compute and save the fitness u(i) for each individual i in the current parent population.
4.   Define selection probabilities p(i) for each parent i so that p(i) is proportional to u(i).
5.   Generate m offspring by probabilistically selecting parents to produce offspring.
6.   Select only the offspring to survive.
7. End repeat

The uniform parent selection from Table 2.2 can be seen in line 3, where all the adults are seen as potential parents. No individuals are excluded from the possibility of reproduction; their probability of reproduction is, however, depending on their fitness (lines 4 and 5). There is also full generational replacement, as all of the offspring — and nothing but the offspring — is allowed to survive (line 6). The most important, however, is the use of a bit vector as genotype.

CHAPTER 3

Methodology

3.1 Glyphs and Writing Systems

The writing systems to be classified are chosen based on the classification in Section 2.1. There are two systems from each type, as seen in Table 3.1.

Table 3.1: The writing systems to be classified, and their respective type and Unicode values (The Unicode Consortium, 2009). Example glyphs from the writing systems can be found in Appendix A.

Type           Script                         Unicode range
Alphabet       Coptic                         U2C80-U2CFF
Alphabet       Runic                          U16A0-U16FF
Abjad          Hebrew                         U0590-U05FF
Abjad          Arabic                         U0600-U06FF
Syllabary      Hangul                         U1100-U11FF
Syllabary      Hiragana                       U3040-U309F
Abugida        Thai                           U0E00-U0E7F
Abugida        Canadian Aboriginal syllabics  U1400-U167F
Logosyllabary  Han                            U4E00-U9FCF
Logosyllabary  Egyptian hieroglyphs           U13000-U1342F

The Egyptian hieroglyphs were chosen because of a strong interest in Egyptology, and because — unlike in most other writing systems — the original meaning of the glyph is still visible within it (WorldLingo, 2010f). This is an interesting feature that adds to the complexity of the glyph.

Coptic was chosen because of the same Egyptian interest, as Coptic was used to write the Coptic language (WorldLingo, 2010b). Also, it is quite close to Greek, which was also under consideration.

As the other alphabet, runic was chosen. Runes are also historically interesting, as they were used here in Scandinavia (WorldLingo, 2010j).

Abjads are mostly used in the Middle East (WorldLingo, 2010m). Hebrew and Arabic were chosen mainly because they are both well known and not too similar (WorldLingo, 2010e,a).

Han — also known as the CJK ideographs, the unified logosyllabaries of Chinese, Japanese and Korean — were chosen as a modern logosyllabary. This was because of their differences from the Egyptian hieroglyphs, their widespread use today, an interest in Japanese, and because it is a writing system used for three very different spoken languages (WorldLingo, 2010c). While this might be common for other writing systems as well (e.g. Latin (WorldLingo, 2010i)), the fact that Han is a logosyllabary makes the same text more or less readable for Chinese, Japanese and Korean readers alike. The most important reason for choosing Han/CJK and not just kanji, however, is that the system relies on Unicode, and CJK is the Unicode way of saving storage on three very similar writing systems (The Unicode Consortium, 2009, CJK Unified Ideographs).

Since the Japanese kanji were added as part of Han, hiragana was chosen. Partly because of the interest in Japanese, but mainly to see if the system would have less trouble separating the kanji from the hiragana than from the other scripts. After all, hiragana is a simplification of kanji, for use by the uneducated women (WorldLingo, 2010g). Katakana — another kanji simplification (WorldLingo, 2010h) — was also under consideration, but hiragana was chosen; partly because of its softer fonts (so as not to interfere with the runes), and partly because hiragana is the most common of the two. Katakana is mostly used for loan words, transcribed from foreign languages (WorldLingo, 2010h).

The other syllabary was chosen for similar reasons. Hangul jamo is the native writing system of the Korean language (WorldLingo, 2010d), in contrast to Han. It is also not too different from Han, even though Hangul is not a logosyllabary.

Thai was chosen to have another complex set of glyphs from Asia, of yet another type (WorldLingo, 2010l). Then the abugida type was topped off by the Canadian Aboriginal syllabics, mainly because they look a bit alien and have fascinating vowels. That is, the independent vowel following a consonant is represented by a rotation of the consonant (WorldLingo, 2010k).


Figure 3.1: The path from glyph to output values is long and complex. The starting point is the Unicode value, i.e. #13001, which is converted to a glyph image. This is resized to a 20 × 20 image. Each of the 400 pixels is given to an input node in the ANN, after going through a sigmoid function (Equation 3.2 on page 21). The output of the ANN is a list of normalized numbers, one for each output node; thus one for each writing system. The value is the probability that the glyph in question is of that specific writing system.

3.2 Neural Network Design

The neural networks used have the same three-layered architecture. There are 400 input nodes plus one bias node, 50 hidden nodes plus one bias, and up to ten output nodes.

Nodes The number of input nodes is the number of pixels in the glyph images. This must be balanced between enough pixels to separate the glyphs and few enough to have an acceptably low execution time. The trick is thus to go as low as possible without compromising the image quality too much. By creating glyphs of different resolutions, a size of around 20 × 20 pixels was found to be a good compromise.

The number of hidden nodes was chosen a bit more ad hoc. 50 was chosen because it gave an acceptable result in acceptably short time. 100 hidden nodes added more to the execution time than to the quality of the results.

The number of output nodes, on the other hand, is not open to tuning: there is one output node for each writing system to be classified in that run. Each output node thus has its own writing system, and the value of the node describes the probability that the glyph in question is of that specific writing system. This output design is to help analyze the relationship between the writing systems. By using the probability for each of the writing systems, it is easier to pick up on common misclassifications and other patterns, e.g. if it really is harder to separate between Han and hiragana than, say, Han and Thai.

The two bias nodes were chosen to add a threshold to the nodes (Jones, 2009; Floreano and Mattiussi, 2008; Callan, 1999). The bias was originally 1, but experimentation on the values 0, 0.5 and 1 indicated that 0.5 was the best value for both bias nodes.

Architecture The final architecture, as well as the whole path from Unicode point to output values, can be seen in Figure 3.1 on the previous page. The starting point is the Unicode value, e.g. #13001. This is converted to a glyph image, which is then resized to a 20 × 20 image. This results in a list of 400 pixels, each with an integer value between 0 and 255, inclusive. Each of these pixels is run through a sigmoid function (Equation 3.2 on the facing page), such that the final value is a floating point number between 0 and 1, inclusive. In addition to reducing the numbers to a more ANN-friendly value, the sigmoid function widens the gap between the light and the dark pixels, creating more of a binary situation.

The 400 floating point values are then given to one input node each, and the values propagate through the network in the normal ANN manner (Section 2.3 on page 6). The final output is a normalized list of the values of the output nodes. That is, they sum to 1.0, or 100%. Each of these values is the probability that the glyph in question is of the writing system coupled with that specific output node. A perfect classification of #13001 would thus be (0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000), as the glyph¹ is an Egyptian hieroglyph, which is coupled to the last output node.

The writing systems are coupled with nodes in the order given in Table 3.1 on page 17.

¹The glyph used in Figure 3.1 is I. This is a determinative, or meaning-sign, used to illustrate that the word it follows has something to do with the mouth, either literally or metaphorically. That is, it follows words for what can be taken in or expelled through the mouth. As it is a determinative, the glyph has no sound of its own; it is simply a written construct to give the reader more information and context (Collier and Manley, 2003).

3.3 Learning

The learning algorithm is based on the standard backpropagation algorithm on page 10, but using the tricks in Section 2.4.4.

Weight Initialization All the weights are initialized according to Equation 3.1, where n is the number of nodes in the link's input layer. This gives a number between -0.3 and 0.4, before it is divided by the number of input nodes n. The weights are divided by n to make sure that the input values for the nodes in the next layer will be at an acceptably low level, even given the large number of incoming weights.

    w = \frac{random(70) - 30}{100n}    (3.1)

One might think that using 35 instead of 30 — thus initializing to a number between -0.35 and 0.35, with an average of 0 — would be a better solution. However, experimentation showed that this led to a slowly increasing error, instead of a faster decreasing error. Initialization between -0.4 and 0.3 led to a faster increasing error. A number between -0.3 and 0.4 thus seemed like the best option.
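A sketch of this initialization, assuming random(70) denotes a uniform integer in [0, 70) as in Common Lisp's (random 70); illustrative Python, not the thesis code:

```python
import random

def initial_weight(n_inputs):
    """Eq. 3.1: a value in [-0.3, 0.4), scaled down by the fan-in n."""
    return (random.randrange(70) - 30) / (100.0 * n_inputs)

# One full link, e.g. the 400-to-50 link of the glyph network:
link = [[initial_weight(400) for _ in range(400)] for _ in range(50)]
```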

From Glyph to Input Node For each of the training patterns, the pixels are mapped onto the input nodes. However, the pixel values are not directly copied. Direct copying, a few different sigmoid functions, and bucket classification were all tried. The best result came when sending the pixels through Equation 3.2 (Table D.3 on page D-3). The input values are thus in the range [0, 1].

    \Phi(n) = \frac{1}{1 + e^{-0.1(n-127)}}    (3.2)
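The preprocessing of one glyph can thus be sketched as below (illustrative Python; the pixel values are assumed to be a flat list of 400 grey-scale integers in [0, 255]):

```python
import math

def squash_pixel(p):
    """Eq. 3.2: map a grey-scale value 0-255 into (0, 1), centred on 127."""
    return 1.0 / (1.0 + math.exp(-0.1 * (p - 127)))

def glyph_to_inputs(pixels):
    """Turn the 400 raw pixel values of a 20x20 glyph into ANN input values."""
    return [squash_pixel(p) for p in pixels]
```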

From Input Node to Output Node The values go from the input layer to the hidden layer through Equation 2.6, and further to the output layer by using Equation 2.8. The sigmoid function used, however, is not Equation 2.7. For the last link, into the output nodes, the softmax activation function (2.23) is used. As the softmax function is specifically designed for multi-class classification problems (which the glyph classification problem is), it is the natural choice. However, as it is only for use in the output layer, Equation 3.3 was used for the hidden layer. (3.3) is a specialized version of (2.7), with k = 1.

    \Phi(a_i) = \frac{1}{1 + e^{-a_i}}    (3.3)

Table 3.2: Parameter values for the evolution.

Parameter                       Value
Crossover rate                  0.30
Mutation rate                   0.02
Elitists                        2
Number of children              60
Culling rate                    0.20
Maximum number of generations   200
Error threshold                 0.1

Error and Weight Change The errors for the output layer and the hidden layer are calculated using Equations 2.25 and 2.13, respectively. (2.25) is used because of the softmax function and the cross-entropy error function. The weight change is based on (2.14) and (2.15). In addition, there is a momentum factor, adding a momentum term m that determines the effect the last weight change shall have on this weight change (Cho et al., 1991).

    \Delta w_{ij}^{\mu,t} = \delta_i^\mu h_j^\mu + m \Delta w_{ij}^{\mu,t-1}    (3.4)
    \Delta v_{jk}^{\mu,t} = \delta_j^\mu x_k^\mu + m \Delta v_{jk}^{\mu,t-1}    (3.5)

In reality, however, it is the same as in (2.14) and (2.15). Experimentation has shown that m = 0 gives the best results, so the whole momentum term is in practice removed. The final weight is then updated as in (2.16) and (2.17), by increasing the weight by the product of the weight change and the learning rate η = 0.01.

Error and Stopping Criteria The final error is the cross-entropy error function (2.24), as proposed by Bullinaria (2009). The learning continues until the error (3.6) is less than 0.01. When the error is this low, the accuracy is acceptably high, and it does not take too long to run.

    Error = \big| -(t_i^\mu \log(y_i^\mu) + (1 - t_i^\mu) \log(1 - y_i^\mu)) \big| < 0.01    (3.6)

3.4 Evolution

As a genetic algorithm (GA) is a good choice of evolutionary computation method when it comes to evolving neural networks (Downing, 2006), that is the method used here. That includes a bit vector genotype, both mutation and crossover, full generational replacement, uniform parent selection and fitness proportionate mate selection (Table 2.2 on page 14). The parameters used in the evolution can be found in Table 3.2.

Development As this is a GA, the genotype is a bit vector. Each chromosome consists of one gene for each weight, where each gene consists of 8 bits (Table 3.3). The first bit is the connection bit, inspired by EEC (Shi and Wu, 2008). If it is 0, there is no connection, i.e. the weight is automatically set to 0. If it is 1, the weight is as specified by the later bits.

Table 3.3: The gene consists of 8 bits, and specifies whether there is a connection (bit 0), the sign of the weight (bit 1) and the weight value (bit 2-7).

Bit      0           1     2     3      4      5      6      7
Value    Connection  Sign  10^0  10^-1  10^-2  10^-3  10^-4  10^-5

The second bit is the sign. If it is 0, the weight is positive; if 1, the weight is negative. The rest of the bits specify the weight value. The first of these bits specifies the ones, then follow the tenths, the hundredths and so on. This results in a weight range from −1.96875 (11111111) to 1.96875 (10111111).
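A sketch of how such a gene could be decoded into a weight. The place values follow Table 3.3 as printed; note, however, that the quoted range of ±1.96875 would instead correspond to binary place values (1, 0.5, 0.25, ...), so the constants below are an assumption and may need adjusting:

```python
# Place values for bits 2-7, as listed in Table 3.3. Swap in
# (1, 0.5, 0.25, 0.125, 0.0625, 0.03125) if binary fractions were intended.
PLACE_VALUES = (1, 0.1, 0.01, 0.001, 0.0001, 0.00001)

def decode_gene(bits):
    """Decode an 8-bit gene (sequence of 0/1) into a connection weight."""
    if bits[0] == 0:               # connection bit: no connection at all
        return 0.0
    magnitude = sum(b * v for b, v in zip(bits[2:], PLACE_VALUES))
    return -magnitude if bits[1] == 1 else magnitude

# Example: connection on, positive sign, all value bits set.
print(decode_gene([1, 0, 1, 1, 1, 1, 1, 1]))
```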

Fitness Testing The fitness of an individual depends on the classification results. The weights created during development are loaded onto an ANN, and the ANN is then used to classify the glyphs in the training set. The error is calculated as the average error over all the glyphs. The error function used is not the same as during learning, but simply the total sum squared error (Equation 2.18).

The fitness of an individual is calculated based on the actual error and the maximal error. As the error is the total sum squared error (2.18), and the possible values for each node are between 0 and 1, the error must necessarily be between 0 and 1. The fitness function is thus:

    fitness(error) = \frac{Error_{max} - error}{Error_{max}} = 1 - error    (3.7)
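A sketch of this fitness evaluation (Equations 2.18 and 3.7), again in illustrative Python rather than the thesis' Common Lisp:

```python
def tss_error(targets, outputs):
    """Eq. 2.18: mean squared error over all patterns and all output nodes."""
    total = 0.0
    for t_vec, y_vec in zip(targets, outputs):
        total += sum((t - y) ** 2 for t, y in zip(t_vec, y_vec)) / len(t_vec)
    return total / len(targets)

def fitness(error):
    """Eq. 3.7: with a maximal error of 1, fitness is simply 1 - error."""
    return 1.0 - error
```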

Adult Selection As specified by the algorithm (Section 2.5.1.1), GAs shall have full generational replacement, where all of the children — and only the children — survive into adulthood. As a result, there is virtually no selection pressure. This gives all the individuals the possibility to stay and maybe develop into a very good individual in some generations. However, this is a slow process and success is in no way guaranteed. To speed up the process, culling was introduced (Baum et al., 2001; McQuesten and Miikkulainen, 1997). Real-life culling is the process of removing animals from a group based on specific criteria, either to reinforce good characteristics or to remove undesired characteristics (Merriam-Webster, 2010). The same process is implemented here; the individuals are sorted based on fitness, and a certain percentage of the best children are kept, while the rest are killed off. The adult population thus consists of only the best 20% of the children.

Parent Selection There is a uniform parent selection, as specified by the algorithm (Section 2.5.1.1). It is not, however, as uniform as it was meant to be, as the suboptimal individuals were killed off in the adult selection. Instead of a uniform selection between all the individuals, it is thus a uniform selection between the best individuals.

Reproduction The reproduction follows the algorithmic guidelines given in Section 2.5.1.1. The parents are chosen from the adult pool with a probability proportional to their fitness. There is a 30% chance the parents are mating using crossover; otherwise they are just directly copied into the next generation. In case of crossover, their chromosomes are split on the border between two genes, to prevent the destruction of two potentially good genes. Then, for each gene in each of the new chromosomes, there is a 2% chance of a mutation, i.e. a bit flip of a random bit.

The newborn children are not the only ones competing for a place in the next generation. There are also two elitists; the very best from the last generation. In addition to having a high probability of giving their genes to the next generation through mating, they are allowed to join in themselves. This is to prevent good individuals from dying off without contributing to even better children; if the next generation is not superior to the last, at least the best individual is not any worse.
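The crossover and mutation operators described here could be sketched as follows. The gene length, the rates, and the gene-aligned split point are taken from the description above, but the code itself and its names are illustrative only:

```python
import random

GENE_LENGTH = 8

def crossover(mother, father, crossover_rate=0.30):
    """With 30% probability, split both chromosomes on a gene boundary."""
    if random.random() >= crossover_rate:
        return mother[:], father[:]            # otherwise copy the parents
    genes = len(mother) // GENE_LENGTH
    cut = random.randrange(1, genes) * GENE_LENGTH
    return mother[:cut] + father[cut:], father[:cut] + mother[cut:]

def mutate(chromosome, mutation_rate=0.02):
    """For each gene, flip one random bit with 2% probability."""
    child = chromosome[:]
    for start in range(0, len(child), GENE_LENGTH):
        if random.random() < mutation_rate:
            bit = start + random.randrange(GENE_LENGTH)
            child[bit] = 1 - child[bit]
    return child
```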

Stopping Criteria The evolution was meant to continue until one of the individuals had reached the error threshold of 0.01. However, the evolution was a slow process, with a very slowly decreasing error. The available memory, on the other hand, tended to disappear very quickly. Alas, the whole process always died of a memory fault, long before this error was reached. To solve this problem, the evolution continues only until the error has reached 0.1, and for no more than 200 generations. Then, a neural network is created based on the best individual, and that ANN is used for testing.

CHAPTER 4

Results and Analysis

For both learning and evolution, full comparisons using all ten writing systems were performed. In addition, the writing systems were compared in pairs using learning. This was not retested using evolution, as it turned out that learning worked while evolution did not.

4.1 Learning

Learning gave generally good results, and was used for both pair classification and full classification.

4.1.1 Pair Classification

All combinations of writing systems were compared in pairs. The graphs can be found in Appendix B.1. For the training, all the runs that contain a specific writing system are drawn in the plot for that writing system. As a result, all runs are plotted twice; once for each writing system. The test graphs, on the other hand, contain only two writing systems at once.

These classifications were performed using learning, with a learning rate η = 0.05 and a momentum m = 0. There were 30 glyphs from each writing system, and a training ratio of 80%. There was a maximum of 500 iterations. However, the graphs end at 300 iterations, as there was no notable change in error for the last 200 iterations. It is worth noting that there was only one run per combination, so this data is not scientifically significant. It might, however, work as a proof of concept.

Each comparison had a test set with 12 glyphs, 6 from each writing system. Table 4.1 displays, for each of the comparisons, how many of these 12 were correctly classified. The test plots in Appendix B.1.2 display the accumulated output from the runs, separated on the actual writing system. This does not say anything about the final classification, as that is depending on the difference in output (probability) on each glyph. As a result, even if the accumulated values are close, the classification might still be correct. The test plots do, however, give a good picture of the quality of the classification. The ultimate classification has 100% probability for the correct writing system.


Table 4.1: The number of correctly classified glyphs, of a total of 12 test glyphs.

             Coptic  Runic  Hebrew  Arabic  Hangul  Hiragana  Thai  Syllabics  Han  Hieroglyphs  Average
Coptic          -     11     10       7      12      11         9     11        10    12          86.1%
Runic          11      -     12       9      12      12        12     10        12     6          88.9%
Hebrew         10     12      -      11      10      11        12     10        11    12          91.7%
Arabic          7      9     11       -      10      10        10      9        11    12          82.4%
Hangul         12     12     10      10       -      10        11      9        10    12          88.9%
Hiragana       11     12     11      10      10       -        12      9         9    12          88.9%
Thai            9     12     12      10      11      12         -     11        12    12          93.5%
Syllabics      11     10     10       9       9       9        11      -        10    12          84.3%
Han            10     12     11      11      10       9        12     10         -    12          89.8%
Hieroglyphs    12      6     12      12      12      12        12     12        12     -          94.4%

Table 4.2: The maximal number of accumulated misclassification probabilities (Figures in Appendix B.1.2). At 3, there is an equal probability for both writing systems, i.e. below 3 there is a very good chance of correct classification — perfect classification at 0 — while above 3 the classification is not good. At 6 there is a perfect misclassification, the worst possible result.

             Coptic  Runic  Hebrew  Arabic  Hangul  Hiragana  Thai  Syllabics  Han  Hieroglyphs
Coptic          -      1      1       1       3       1         2     1         2    0
Runic           1      -      0       3       0       0         0     2         0    4
Hebrew          1      0      -       1       1       1         0     2         1    0
Arabic          1      3      1       -       1       3         2     2         2    0
Hangul          3      0      1       1       -       2         1     3         2    0
Hiragana        1      0      1       3       2       -         1     2         2    0
Thai            2      0      0       2       1       1         -     1         1    0
Syllabics       1      2      2       2       3       2         1     -         2    0
Han             2      0      1       2       2       2         1     2         -    0
Hieroglyphs     0      4      0       0       0       0         0     0         0    -

Table 4.2 displays the highest misclassification values from these plots (Appendix B.1.2). It shows that Egyptian hieroglyphs are not just correctly classified by accident; there is actually a very clear separation between the hieroglyphs and the other writing systems. The exception is with runic, but that was a run which never got an error below 0.6 (Appendix B.1.1, Runic). Otherwise, runic does very well, with multiple 0-runs. Also Thai and Hebrew do well, with mainly 1's. The fact that both hieroglyphs and runic do very well in general but are a disastrous combination points to the possibility of their glyphs being too similar. The example images in Table 3.1 might help explain this; both the hieroglyphs and the runic glyph have very light colours. While the other images are mainly black and white, hieroglyphs and runic are on the grey-scale. The theory is supported by the pixel comparison in Appendix D. While the two writing systems are very similar when the acceptable difference is larger (Tables D.3 and D.6), they are not so similar when the acceptable difference is smaller (Tables D.2 and D.9). With larger spans, their grey-scale pixels are seen as identical. The other writing systems are not caught in this similarity, as their white and black pixels are too far off in value.

4.1.2 Full Classification

The main classification consists of all ten writing systems, using 30 glyphs from each. This was, according to the tests in Appendix C, the fairest data set distribution: as many glyphs as possible while using the same number from each writing system, and without duplicates. As the system is highly random, for example when selecting the glyphs to use, the classification of all ten writing systems was run twenty times. All twenty error graphs can be seen in Figure 4.1a. The runs that start their flattening phase first are among those that need the most iterations to reach the error threshold. This is not unnatural, assuming the error decreases at roughly the same speed in all of their flattening phases.

Figure 4.1b displays the percentage of correctly classified glyphs for all the runs. The success rate ranges from 58.33% to 86.67%, with an average of 73.17%; that is, almost 44 of 60 test glyphs correctly classified. A success rate of 73% means that 27% of the glyphs are still incorrectly classified, which is not a very impressive figure. However, these figures represent the situation in black and white. The grey-scale can be seen in Figure 4.1c, which shows the average of the accumulated probability that each of the test glyphs is a member of each of the writing systems. Even though none of the writing systems are perfectly classified, all of them have a much higher probability of being classified as the correct one than as any of the others.

It is worth noting that the similarity between runic and Egyptian hieroglyphs continues. It is not as visible in the full classification as in the pair classification, where they never reached a low error (Appendix B.1.1). In this case (Figure 4.1c), both runic and hieroglyphs have a higher success rate than average. However, while the other writing systems are misclassified as seemingly random other writing systems, runic and hieroglyphs are (almost) always misclassified as each other. That does make sense, as all the other writing systems share the same black-and-white pixel distribution; if they are misclassified, it ought to be as a writing system that shares their pixel value distribution.

4.1.3 Writing System Classification Quality

Figure 4.2 points to a certain correlation between the pair classification and the full classification. The five writing systems with the lowest accumulated probability of correct classification during the full classification (Hangul, hiragana, Coptic, syllabics and Arabic) also had the lowest average of correct classification in the pair classification. The most notable exception is runic, which had precisely the same average in the pair classification as Hangul and hiragana, but did much better in the full classification. Otherwise, there is a clear separation between the five best writing systems in the top right corner and the five worst in the bottom left corner. It seems learning is better suited to separate Egyptian hieroglyphs, Thai, Hebrew, Han and runic than Hangul, hiragana, Coptic, Canadian Aboriginal syllabics and Arabic.

[Figure 4.1(a): training error per iteration. Figure 4.1(b): success rate per run.]

Figure 4.1: Classification of all ten writing systems. 20 runs with learning rate η = 0.01, momentum m = 0, error threshold of 0.01, 30 glyphs from each writing system, and a training ratio of 80%. The glyphs are selected anew for each of the twenty runs. Figure 4.1a contains the 20 error curves for the runs, smoothed using a Bézier function and drawn on top of each other. Figure 4.1b displays the percentage of correctly classified glyphs for each of the runs. This is a matter of correct or not correct, while Figure 4.1c contains the accumulated probability that each of the glyphs is a member of each of the writing systems. Here, the data is averaged over the 20 runs.

[Figure 4.1(c): accumulated classification probability ("Classified as") for each writing system.]

Figure 4.1: (cont.) The test results.

There is no clear correlation between these results and the pixel similarity in Appendix D. According to Table D.3 (Richards' curve with maximum growth at 128 was used during learning), runic, Thai and Egyptian hieroglyphs all have higher similarity with themselves than with most other writing systems. Han and Hebrew, on the other hand, do not. It is thus difficult to say whether there is any correlation between pixel similarity and ease of classification.


Figure 4.2: Correlation between the full classification and the pair classification. The x axis is the accumulated probability that each of the writing systems is correctly classified in the full classification (Figure 4.1c), while the y axis is the average of correctly classified glyphs in the pair classification (Table 4.1).

4.2 Evolution

As the evolution was slow and memory consuming (Section 3.4), it was difficult to get good results. The evolution had to stop after 200 generations to avoid memory problems, but by then the fitness had not yet reached an acceptable level (Figure 4.3). Because of the time consumed and the overall bad results, the evolution was only run four times. This is not enough to make sure the results are not just random fluctuations. However, the results are significantly worse than for learning, and there is no reason to believe the next sixteen runs would be any different in that matter.

The fitness for the four runs can be seen in Figure 4.3a. This is an interesting graph with respect to the minimal and average fitness. In all four cases, the average fitness is just above the minimal fitness. This points to a homogeneous population, where the average is pulled up by a few individuals that are much more fit than the rest. These individuals would be the elitists, making sure that the best individual in the population is as good as or better than the best individual in the last generation. One theory was that the mutation was destructive, causing potentially good individuals to change too much. The mutation rate was thus reduced from 2% to 0.2%, and then further down to 0.02%. There are no official data for any of these runs, but there was no positive change in the first hundred generations. The crossover rate was then changed from 30% to 0.3%, but still with no change.

As the error at the end of training was this high (more than ten times higher than during learning), the testing results come as no surprise (Figure 4.3b). The test results range from 3.33% to 15% correct, with an average of 9.58%. Compared to the 73% average of learning (Figure 4.1b), this is clearly suboptimal. The problem can be seen in Figure 4.3c. While the test results from the learning (Figure 4.1c) clearly separate the writing systems, the evolution does not. The probability of being classified as one specific writing system is almost the same for every writing system. All of the writing systems have a high probability of being classified as hiragana or hieroglyphs, with Hebrew and Canadian Aboriginal syllabics not far behind. It seems the few glyphs that were correctly classified were either hiragana or hieroglyphs, or just lucky. These results clearly show that the evolved network is not able to distinguish between the writing systems.
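The destructive-mutation theory above concerns how individual weights are perturbed. The sketch below is a minimal illustration of such a per-weight mutation operator; it is an assumption for illustration, not the project's actual operator, and the uniform perturbation and SIGMA value are invented for the example.

    ;; A minimal sketch, not the project's actual operator: each weight is
    ;; perturbed with probability MUTATION-RATE by a uniform value in
    ;; [-SIGMA, SIGMA]. Lowering MUTATION-RATE (0.02 -> 0.002 -> 0.0002,
    ;; i.e. 2% -> 0.2% -> 0.02%) makes the operator correspondingly less
    ;; destructive.
    (defun mutate-weights (weights &key (mutation-rate 0.02) (sigma 0.1))
      (map 'vector
           (lambda (w)
             (if (< (random 1.0) mutation-rate)
                 (+ w (- (random (* 2.0 sigma)) sigma))
                 w))
           weights))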

[Figure 4.3(a): minimal, maximal and average fitness, and average standard deviation, per generation. Figure 4.3(b): success rate per run.]

Figure 4.3: Classification of all ten writing systems. 4 runs with crossover rate of 30%, mutation rate of 5%, 60 children, 2 elitists, 20% culling rate, an error threshold of 0.1, and a maximum of 200 generations. The glyphs are selected anew for each of the four runs. Figure 4.3a contains the fitness for the four runs, smoothed using a Bézier function. The graph contains the minimal, maximal and average fitness, as well as the average standard deviation. Figure 4.3b displays the percentage of correctly classified glyphs for each of the runs. This is a matter of correct or not correct, while Figure 4.3c contains the accumulated probability that each of the glyphs is a member of each of the writing systems. Here, the data is averaged over the 4 runs.

[Figure 4.3(c): accumulated classification probability ("Classified as") for each writing system.]

Figure 4.3: (cont.) The test results.

CHAPTER 5

Discussion

The main goal of this dissertation is to see if a neural network can generalize enough to classify the writing system of an unseen glyph. The results from the learning (Figure 4.1) point to a neural network that is capable of this. The results are far from perfect. However, with more than 50% correct out of ten potential answers, it is clear that the network is able to separate between the writing systems.

5.1 Learning or Evolution?

The results indicate that learning is far better suited for this task than evolution. The best network created by learning had an average success rate of 73% (Figure 4.1), while the best evolved network achieved less than 10% on average (Figure 4.3). As Figure 4.1c reveals, learning had a success rate far above what is expected of random classification. Evolution, however, was unsuccessful; there is nothing in Figure 4.3c that indicates any form of intelligent classification.

It is only natural that the learned network performs better than the evolved network. Learning stopped when the error was below 0.01, while evolution stopped already at 0.1, or after a maximum of 200 generations. This is the main problem with evolution: it uses too much memory, too much processing power, and too much time. None of the available hardware was able to keep the evolution going for longer than this, so the results are a natural consequence. However, the fitness graphs in Figure 4.3a do not look promising. The maximal fitness is rising because of the elitists; there is no clear improvement in the average fitness. Parameter tweaking might be able to solve this. The evolutionary algorithm in its current state, however, will probably never reach an error of less than 0.01, except by a stroke of pure luck in the random number generator.

5.2 Writing Systems

It seems there is no specific correlation between any of the writing systems. This includes the closely related Han and hiragana (Figure 4.1c, Table 4.2, Section 3.1). Egyptian hieroglyphs and runic are tightly coupled (Figure 4.1c, Tables 4.1 and 4.2), but that is probably just a result of the grey-scale of their images (Section 4.1.2).

It is worth noting that hieroglyphs, Thai, Hebrew, Han and runic are easier to classify than Hangul, hiragana, Coptic, syllabics and Arabic (Figure 4.2). However, there is no obvious reason why these specific systems are easier. It does not seem to have anything to do with pixel similarity (Section 4.1.3).

5.3 Future Work

There are many ways to go from here. There is still fine-tuning needed on the learning algorithm. On the other hand, it is too early to dismiss evolution just yet. There are still untested tweaks and algorithms that might improve the results.

Perfecting the Learning While the results from the learning are good, there is still room for improvement. Since learning is neither especially memory consuming nor particularly slow, it might be a good idea to create more input nodes, that is, a higher resolution on the glyphs. It might be that the current resolution generalizes just enough, but it is something one ought to investigate. More layers might also help the generalization. More than two hidden layers are probably not necessary, but one extra layer ought to be tested. Fast backpropagation or other versions of the algorithm might also improve the results. Either way, the parameter values ought to be better configured. These can for example be found using evolution, or by manual trial and error.
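As a hedged sketch of the manual parameter search mentioned above, the loop below tries a handful of candidate settings and keeps the best. The candidate values and the TRAIN-AND-TEST function are assumptions made for illustration, not values or code from the project.

    ;; A hedged sketch of a manual/grid parameter search. TRAIN-AND-TEST is a
    ;; placeholder for a routine that trains a network with the given settings
    ;; and returns its test accuracy; the candidate values are illustrative only.
    (defun grid-search (train-and-test)
      (let ((best nil)
            (best-score -1.0))
        (dolist (eta '(0.005 0.01 0.05))
          (dolist (momentum '(0.0 0.5 0.9))
            (dolist (hidden '((64) (128) (64 64)))
              (let ((score (funcall train-and-test
                                    :learning-rate eta
                                    :momentum momentum
                                    :hidden-layers hidden)))
                (when (> score best-score)
                  (setf best (list :learning-rate eta
                                   :momentum momentum
                                   :hidden-layers hidden)
                        best-score score))))))
        (values best best-score)))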

Giving the Evolution a Second Chance Another possibility is to focus on the evolution and try to achieve better results. As this would be done with no regard for learning, the topology can also be evolved. Hopefully this would create a network more suited for the task. Here too, some extensive parameter tweaking ought to be performed. Neuroevolution of augmenting topologies (NEAT) is a method for evolving neural networks, including the topology. Stanley and Miikkulainen (2002) claim it is very efficient, which is a very good property given the execution time problems of the standard evolution. The specialization HyperNEAT (Gauci and Stanley, 2010) might be an even better solution, as HyperNEAT introduces geometry. The glyph classification can be seen as a geometrical problem, as the pixels' relative locations are crucial for the representation of the glyph. The possibilities of using NEAT or HyperNEAT to evolve the network ought to be explored. However, extensive evolution necessitates a cluster to run on, or corresponding hardware.

General Improvements It might be advantageous to create the network cumulatively. For example, one can create a subnet for each of the writing systems, and then merge them into one larger network. This method should work for both learning and evolution, but is probably most worthwhile during evolution. It is during evolution that the execution time is most problematic, and thus it is there that this division of labour matters most.
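One way such a merge could work is sketched below. This is an assumption for illustration, not a design from the project: each subnet scores how strongly a glyph resembles its own writing system, and the merged classifier simply picks the writing system whose subnet scores highest.

    ;; A minimal sketch, assuming each subnet exposes a scoring function that
    ;; returns how strongly a glyph resembles that subnet's writing system.
    ;; SUBNETS is a list of (NAME . SCORE-FN) pairs; the names and calling
    ;; convention are assumptions, not the project's design.
    (defun classify-with-subnets (glyph subnets)
      (let ((best-name nil)
            (best-score nil))
        (dolist (entry subnets best-name)
          (destructuring-bind (name . score-fn) entry
            (let ((score (funcall score-fn glyph)))
              (when (or (null best-score) (> score best-score))
                (setf best-name name
                      best-score score)))))))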

The input might need some redesign. The pixel values used in this version are probably not the best solution when it comes to capturing the lines and circles of the glyphs. It is worth investigating more advanced image processing techniques to create another means of representation.
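To make the suggestion concrete, the sketch below computes one possible alternative representation, the ink density of each row and each column of the glyph image. This particular feature set is an assumption chosen for illustration, not something proposed or used in the project.

    ;; A hedged sketch of an alternative input representation: row and column
    ;; ink-density profiles. IMAGE is assumed to be a 2D array of pixel values
    ;; in [0, 255], where 0 is black ink and 255 is white background.
    (defun density-profile (image)
      (let* ((rows (array-dimension image 0))
             (cols (array-dimension image 1))
             (features (make-array (+ rows cols) :initial-element 0.0)))
        (dotimes (r rows)
          (dotimes (c cols)
            (let ((ink (- 1.0 (/ (aref image r c) 255.0)))) ; 1.0 = black, 0.0 = white
              (incf (aref features r) (/ ink cols))             ; row profile
              (incf (aref features (+ rows c)) (/ ink rows))))) ; column profile
        features))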

References

Eri Banno, Yutaka Ohno, Yoko Sakane, and Chikako Shinagawa. Genki 1: An Integrated Course in Elementary Japanese. 1999. ISBN 4789009637.

E.B. Baum, D. Boneh, and C. Garrett. Where Genetic Algorithms Excel. Evolutionary Computation, 9(1):93–124, 2001. ISSN 1063-6560. URL http://www.mitpressjournals.org/doi/abs/10.1162/10636560151075130.

BioSS. The generalised logistic curve. URL http://www.bioss.ac.uk/smart/unix/mgrow/slides/slide02.htm.

Christopher M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, England, 1996. ISBN 0198538642.

John A. Bullinaria. John Bullinaria's Step by Step Guide to Implementing a Neural Network in C, 2009. URL http://www.cs.bham.ac.uk/~jxb/INC/nn.html.

Robert Callan. The Essence of Neural Networks. Prentice Hall Europe, Harlow, Essex, England, 1999. ISBN 0-13-908732-X.

T.-. Cho, R.W. Conners, and P.A. Araman. Fast backpropagation learning using steep activation functions and automatic weight reinitialization. Conference Proceedings 1991 IEEE International Conference on Systems, Man, and Cybernetics, 3:1587–1592, 1991. doi: 10.1109/ICSMC.1991.169915. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=169915.

Mark Collier and Bill Manley. How to read Egyptian hieroglyphs: a step-by-step guide to teach yourself. University of California Press, Los Angeles, 2003. ISBN 0520239490. URL http://books.google.com/books?id=aroQ0Zrl8J4C&pgis=1.

Charles Darwin. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. Murray, London, England, 1859. URL http://www.literature.org/authors/darwin-charles/the-origin-of-species/.

Kenneth A. De Jong. Evolutionary Computation: A Unified Approach. The MIT Press, Cambridge, MA, USA, 2006. ISBN 0-262-04194-4.

Stanislas Dehaene. Reading in the Brain: The Science and Evolution of a Human Invention. Penguin Group, London, England, 2009. ISBN 978-0-670-02110-9.


Rinku Dewri. Evolutionary Neural Networks: Design Methodologies - AI Depot, 2003. URL http://ai-depot.com/articles/evolutionary-neural-networks-design-methodologies/.

Keith L. Downing. Combining Evolutionary Algorithms and Neural Networks. Science And Technology, 2006. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.5913&rep=rep1&type=pdf.

Keith L. Downing. Introduction to evolutionary algorithms. 2009a. URL http://www.idi.ntnu.no/emner/it3708/lectures/evolalgs.pdf.

Keith L. Downing. Reinforcement Learning in Neural Networks. 2009b. URL http://www.idi.ntnu.no/emner/it3708/lectures/learn-rl.pdf.

Keith L. Downing. Supervised Learning in Neural Networks. 2009c. URL http://www.idi.ntnu.no/emner/it3708/lectures/learn-sup.pdf.

Keith L. Downing. Unsupervised Learning in Neural Networks. 2009d. URL http://www.idi.ntnu.no/emner/it3708/lectures/learn-unsup.pdf.

R. A. Dunne and N. A. Campbell. On The Pairing Of The Softmax Activation And Cross-Entropy Penalty Functions And The Derivation Of The Softmax Activation Function. In 8th Australian Conference on Neural Networks, pages 181–185, Melbourne, Australia, 1997.

Encyclopædia Britannica. Absolute temperature scale, 2010a. URL http://www.britannica.com/EBchecked/topic/1801/absolute-temperature-scale.

Encyclopædia Britannica. Boltzmann constant, 2010b. URL http://www.britannica.com/EBchecked/topic/72417/Boltzmann-constant.

Encyclopædia Britannica. Fermi level, 2010c. URL http://www.britannica.com/EBchecked/topic/204779/Fermi-level.

Dario Floreano and Claudio Mattiussi. Bio-Inspired Artificial Intelligence: Theories, Methods, and Technologies. The MIT Press, Cambridge, MA, USA, 2008. ISBN 978-0-262-06271-8.

FOLDOC. Glyph, 1998a. URL http://foldoc.org/glyphs.

FOLDOC. Writing System, 1998b. URL http://foldoc.org/writing+system.

Jason Gauci and Kenneth O. Stanley. Autonomous evolution of topographic regularities in artificial neural networks. July 2010. ISSN 1530-888X. URL http://www.ncbi.nlm.nih.gov/pubmed/20235822.

V.K. Govindan and A.P. Shivaprasad. Character Recognition - A Review. Pattern Recognition, 23(7):671–683, 1990.

M. Tim Jones. Artificial Intelligence: A Systems Approach. Jones and Bartlett Publishers, Sudbury, MA, USA, 2009. ISBN 978-0-7637-7337-3.

Daniel Kirsch. Detexify LaTeX handwritten symbol recognition. URL http://detexify.kirelabs.org/classify.html.

Kevin Knight and Kenji Yamada. A computational approach to deciphering unknown scripts. In ACL Workshop on Unsupervised Learning in Natural Language Processing, number 1, pages 37–44. Citeseer, 1999. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.7982&rep=rep1&type=pdf.

J. Mantas. An Overview of Character Recognition Methodologies. Pattern Recognition, 19(6):3–8, 1986.

Paul McQuesten and Risto Miikkulainen. Culling and teaching in neuro-evolution. In Proceedings of the 7th International Conference on Genetic Algorithms, pages 1–8, 1997. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49.8014&rep=rep1&type=pdf.

Merriam-Webster. Cull, 2010. URL http://www.merriam-webster.com/dictionary/culling.

J.E. Ngolediage, R.N.G. Naguib, and S.S. Dlay. Fast backpropagation for supervised learning. Proceedings of 1993 International Conference on Neural Networks (IJCNN-93-Nagoya, Japan), 2(4):2591–2594, 1993. doi: 10.1109/IJCNN.1993.714254. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=714254.

Rafael Ramos-Garijo, Sergio Martín, Andrés Marzal, Frederico Prat, Juan Miguel Vilar, and David Llorens. An input panel and recognition engine for on-line handwritten text recognition, pages 223–232. IOS Press, 2007.

Jairo Rocha and Theo Pavlidis. A shape analysis model with applications to a character recognition system. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(4):393–404, April 1994. ISSN 01628828. doi: 10.1109/34.277592. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=277592.

D.E. Rumelhart, G.E. Hinton, and R.J. Williams. Learning Representations by Back-Propagation of Errors. Nature, 1986.

Shi and Haifeng Wu. Evolving Efficient Connection for the Design of Artificial Neural Networks. In Proceedings of the 18th International Conference on Artificial Neural Networks, pages 909–918, Trondheim, Norway, 2008. URL http://www.springerlink.com/content/au45qn6753k42360/fulltext.pdf.

Kenneth O. Stanley and Risto Miikkulainen. Evolving Neural Networks through Augmenting Topologies. Evolutionary Computation, 10(2):99–127, 2002.

The Unicode Consortium. The Unicode Standard, Version 5.2.0. The Unicode Consortium, Mountain View, CA, USA, 2009. ISBN 978-1-936213-00-9. URL http://www.unicode.org/versions/Unicode5.2.0/.

Kiri Wagstaff. ANN Backpropagation: Weight updates for hidden nodes. pages 1–3, 2008. URL http://www.wkiri.com/cs461-w08/Lectures/Lec4/ANN-deriv.pdf.

D. Whitley, T. Starkweather, and C. Bogart. Genetic algorithms and neural networks: optimizing connections and connectivity. Parallel Computing, 14(3):347–361, August 1990. ISSN 01678191. doi: 10.1016/0167-8191(90)90086-O. URL http://linkinghub.elsevier.com/retrieve/pii/016781919090086O.

B. Widrow and M.E. Hoff. Adaptive Switching Circuits. In Proceedings of the 1960 IRE WESCON Convention, pages 96–104, New York City, NY, USA, 1960. IRE.

Wikibooks. Artificial Neural Networks/Activation Functions - Wikibooks, collection of open-content textbooks, 2010. URL http://en.wikibooks.org/w/index.php?title=Artificial_Neural_Networks/Activation_Functions&oldid=1942655.

WorldLingo. Arabic alphabet, 2010a. URL http://www.worldlingo.com/ma/enwiki/en/Arabic_alphabet.

WorldLingo. Coptic alphabet, 2010b. URL http://www.worldlingo.com/ma/enwiki/en/Coptic_alphabet.

WorldLingo. Han unification, 2010c. URL http://www.worldlingo.com/ma/enwiki/en/Han_unification.

WorldLingo. Hangul, 2010d. URL http://www.worldlingo.com/ma/enwiki/en/Hangul.

WorldLingo. Hebrew alphabet, 2010e. URL http://www.worldlingo.com/ma/enwiki/en/Hebrew_alphabet.

WorldLingo. Egyptian hieroglyphs, 2010f. URL http://www.worldlingo.com/ma/enwiki/en/Egyptian_hieroglyphs.

WorldLingo. Hiragana, 2010g. URL http://www.worldlingo.com/ma/enwiki/en/Hiragana.

WorldLingo. Katakana, 2010h. URL http://www.worldlingo.com/ma/enwiki/en/Katakana.

WorldLingo. Latin alphabet, 2010i. URL http://www.worldlingo.com/ma/enwiki/en/Latin_alphabet.

WorldLingo. Runic alphabet, 2010j. URL http://www.worldlingo.com/ma/enwiki/en/Runic_alphabet.

WorldLingo. Canadian Aboriginal syllabics, 2010k. URL http://www.worldlingo.com/ma/enwiki/en/Canadian_Aboriginal_syllabics.

WorldLingo. Thai alphabet, 2010l. URL http://www.worldlingo.com/ma/enwiki/en/Thai_alphabet.

WorldLingo. Writing system, 2010m. URL http://www.worldlingo.com/ma/enwiki/en/Writing_system.

WorldLingo. Exclusive or, 2010n. URL http://www.worldlingo.com/ma/enwiki/en/Exclusive_or.


APPENDIX A

Unicode Charts

Below are examples of glyphs from the ten writing systems used. Not all the glyphs are present, as there are over 600 pages in total. This should, however, be enough to convey the characteristics of each writing system. The Unicode tables are copied from the Unicode charts (The Unicode Consortium, 2009), with the blessing of the Unicode Terms of Use (The Unicode Consortium, 2009, http://www.unicode.org/copyright.html):

    Any person is hereby authorized, without fee, to view, use, reproduce, and distribute all documents and files solely for informational purposes in the creation of products supporting the Unicode Standard, subject to the Terms and Conditions herein.

Figure A.1: Coptic

Figure A.2: Runic

Figure A.3: Hebrew

Figure A.4: Arabic

Figure A.5: Hangul jamo

Figure A.6: Hiragana

Figure A.7: Thai

Figure A.8: Canadian Aboriginal syllabics

Figure A.9: Han unification

Figure A.10: Egyptian hieroglyphs

APPENDIX B

Classification

This appendix contains the training and testing graphs for all the runs. The pair classification is followed by the full classification, with both learning and evolution.

B.1 Pair Classification

The pair classification was only performed with learning. The training plots are collected per writing system, while the test results are plotted per pair.

B.1.1 Training

[Training error plots, one per writing system (Coptic, Runic, Hebrew, Arabic, Hangul jamo, Hiragana, Thai, Canadian Aboriginal syllabics, Han unification, Egyptian hieroglyphs). Each plot shows the error per iteration for every pair run involving that writing system.]

B.1.2 Testing

[Test plots, one per pair of writing systems (45 pairs, from Coptic and Runic to Han unification and Egyptian hieroglyphs). Each plot shows the accumulated classification output ("Classified as") for the two writing systems in the pair.]

B.2 Full Classification

Here follow the graphs from the full classification with all ten writing systems. This was done with both learning and evolution. There are three graphs for each method: first the training graph displaying the error (for evolution, the fitness), then a statistical graph displaying the success rate, and finally the classification results.

B.2.1 Learning

[Learning: training error per iteration, success rate per run, and accumulated classification ("Classified as") per writing system.]

B.2.2 Evolution

[Evolution: minimal, maximal and average fitness (and average standard deviation) per generation, success rate per run, and accumulated classification ("Classified as") per writing system.]

APPENDIX C

Data Set Distribution

The chosen writing systems have different numbers of glyphs. As can be seen in Table 3.1, their Unicode code point ranges vary from 95 (Hebrew and Thai) to 20943 (Han). It can be noted that the actual numbers are not this high. Many code points do not yet have a glyph, and are thus removed. The vowel marks in the abjads are also removed, as they cover only a few pixels in the final image; they would give the abjads an unfair advantage, as they all look very much the same. The available glyphs thus range from 32 (Hebrew) to 20940 (Han) (Table C.1).

Table C.1: The actual number of glyphs in the writing systems, after removal of the empty code points.

    Script                          Number of glyphs
    Coptic                          111
    Runic                           90
    Hebrew                          32
    Arabic                          190
    Hangul                          254
    Hiragana                        93
    Thai                            87
    Canadian Aboriginal syllabics   363
    Han                             20940
    Egyptian hieroglyphs            1071

There are, however, not this many glyphs that actually can be used. For some glyphs, something has gone wrong either when drawing the glyph or when reading it, resulting in no usable image. These must be removed at runtime, so it is hard to predict precisely how many glyphs are available in each writing system. Either way, the difference is so large that some writing systems will have an advantage over others. Which writing systems get the advantage depends on how the glyphs are selected. Three different approaches were tested.

Approach 1 One solution is to use more glyphs from one writing system than from another. This is problematic, as the ANN will be better trained on the larger writing systems than on the smaller, which gives the smaller writing systems a disadvantage.


Approach 2 One could also use the same number of glyphs for all writing systems, and allow duplicates. Then the smaller writing systems would have multiple instances of the same glyphs. Those systems will have a huge advantage, as they will probably have the same glyphs in both the training set and the test set.

Approach 3 One can use the same number of glyphs from every writing system, but without duplicates (a sketch of this selection follows below). Then one cannot use more glyphs than the smallest writing system has available, and many important aspects of the larger systems are probably lost in the unchosen glyphs. This approach will thus give the larger writing systems a disadvantage.
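The sketch below illustrates Approach 3. The function names, the Fisher-Yates shuffle and the 80% training ratio are assumptions for illustration, not the project's actual code.

    ;; A minimal sketch of Approach 3: draw the same number of glyphs from every
    ;; writing system, without duplicates, and split them into training and test
    ;; sets. WRITING-SYSTEMS is assumed to be a list of (NAME . GLYPHS) pairs.
    (defun shuffle (sequence)
      "Fisher-Yates shuffle of a fresh copy of SEQUENCE, returned as a list."
      (let ((vec (coerce (copy-seq sequence) 'vector)))
        (loop for i from (1- (length vec)) downto 1
              do (rotatef (aref vec i) (aref vec (random (1+ i)))))
        (coerce vec 'list)))

    (defun select-glyphs (writing-systems glyphs-per-system &key (training-ratio 0.8))
      (loop for (name . glyphs) in writing-systems
            for chosen = (subseq (shuffle glyphs) 0 glyphs-per-system)
            for split = (round (* training-ratio glyphs-per-system))
            collect (list name
                          :training (subseq chosen 0 split)
                          :testing  (subseq chosen split))))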

Table C.2: Comparison of Hebrew, Hangul, Thai and Han with different combinations of number of glyphs. All runs used a learning rate η = 0.02 and used 80% of the glyphs for training, the rest for testing.

    Approach  Writing system  Number of glyphs  Correctly classified  Total correctly classified  Number of iterations
    1         Hebrew          30                0/6                   88.24%                       31
              Hangul          250               41/50
              Thai            80                0/16
              Han             1000              199/200
    2         Hebrew          300               53/60                 64.17%                       16
              Hangul          300               11/60
              Thai            300               60/60
              Han             300               30/60
    3         Hebrew          30                5/6                   58.33%                       158
              Hangul          30                1/6
              Thai            30                3/6
              Han             30                5/6

The outputs of the runs are summarized in Figures C.1 to C.3. This test was run with an earlier version of the learning algorithm than the one used for the real classification (Appendix B), so the graphs work a bit differently: the output in this case is a number between -1 and 1, not between 0 and 1. The optimal result will thus have the correct writing system far above zero, and the three others far below zero.

At first glance, the first approach seems to be the best, with 88% correct. However, the predicted advantage for the larger writing systems has occurred. While Han, with its 1000 glyphs, has 199 of 200 correct, Hebrew and Thai, which had far fewer glyphs available, had none of their glyphs correctly classified (Table C.2, Figure C.1). This approach still did well in total because Han was almost completely correctly classified, and there were more test glyphs of Han than of the other three combined. Adding the rather good Hangul gives 250 glyphs with a high probability of being correct, out of a total of 272. Since Hebrew and Thai have so few glyphs involved, their disastrous classification is not noticeable in the grand scheme. This is clearly an approach that favours the larger writing systems. It is also a good solution if one only cares about the percentage of correctly classified glyphs. If the system is actually to be used, however, this is not the way to go.


Figure C.1: Comparison of what the different writing systems were classified as, using Approach 1; as many glyphs from each writing system as that writing system has.

The second approach also does a rather good job, with 64% (Table C.2). As can be seen in Figure C.2, the result is not as biased as that of Approach 1. However, it is clear that the small writing systems (Hebrew and Thai) are far better classified than the larger ones. This is quite natural; with 300 glyphs, each Hebrew glyph is used roughly 10 times. As a result, the ANN can train again and again on the same Hebraic glyphs. The main problem is that the test glyphs of Hebrew must necessarily also be present among the training glyphs, to avoid only being able to test on one or two different glyphs. The ANN is then not generalizing over the Hebraic glyphs as it does over the Han glyphs; it has actually seen the Hebraic glyphs before, and naturally does a better job classifying them. Even though Hebrew and Thai get an unfair advantage, the solution does seem to work. It is far from perfect, but almost all of the writing systems are classified more often as themselves than as any of the others. Hangul is classified slightly more often as Thai than as Hangul, and this is an even greater problem in the last approach (Figure C.3).

Approach 3 is good at classifying both Hebrew and Han, and is thus not biased towards smaller or larger writing systems (Figure C.3). It is, however, not very good at separating Hangul and Thai. The percentage is not that impressive (Table C.2), but the approach is the most unbiased, as it performs well on both small and large writing systems. Approach 3, using only 30 glyphs from each of the writing systems, is the overall best solution. This is the most unbiased approach, and thus the fairest.


Figure C.2: Comparison of what the different writing systems were classified as, using Approach 2; as many glyphs as possible, with repeating glyphs if necessary.


Figure C.3: Comparison of what the different writing systems were classified as, using Approach 3; as many glyphs as possible without repetition and with the same number from all writing systems.

APPENDIX D

Pixel Comparison

To get a better grasp of how the system interprets the writing systems, a pixel comparison test was performed. The writing systems were compared to each other, glyph by glyph, based on their pixel values, that is, based on their input values to the neural network. Two glyphs are compared by going through the image pixel by pixel, testing each pair of pixels for similarity. The total similarity is then the fraction of pixel positions that are judged similar. Precisely what it means to be similar is what differs between the comparisons below.
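A minimal sketch of this comparison is given below. It is an illustration, not the project's actual code, and the predicate names are assumptions; the same-value predicate corresponds to Tables D.1 and D.2, and the thresholded predicate to Tables D.6 to D.9.

    ;; A minimal sketch, not the project's code: two glyph images, given as
    ;; equal-length vectors of pixel values, are compared position by position
    ;; with a pluggable SIMILAR-P predicate. The result is the fraction of
    ;; pixel positions judged similar.
    (defun glyph-similarity (glyph-a glyph-b similar-p)
      (let ((similar 0))
        (map nil (lambda (a b)
                   (when (funcall similar-p a b)
                     (incf similar)))
             glyph-a glyph-b)
        (/ similar (length glyph-a))))

    ;; Example predicates: exact equality, and "difference less than a limit".
    (defun same-value-p (a b)
      (= a b))

    (defun within-difference-p (max-difference)
      (lambda (a b) (< (abs (- a b)) max-difference)))

With these definitions, (glyph-similarity a b (within-difference-p 40)) would correspond to the comparison behind Table D.6.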

The colours in the comparison tables are the grey-scale of LaTeX, as a function of the comparison values. Hence, a lighter colour means a higher value, and thus higher similarity. As can be seen in Table D.1, the similarity is not that high, not even within the same writing system. Even though a glyph has a similarity of 1.0 compared to itself, each writing system gets a lower value because of the differences between its glyphs. There are some interesting similarities (Table D.2). For example, the Egyptian hieroglyphs have a very low similarity with all the writing systems, themselves included. Thai, on the other hand, has a clear similarity with itself. The same goes for Hebrew, Hangul, hiragana, Canadian Aboriginal syllabics and Han, though not as strongly. It is also clear that Hangul, hiragana, Canadian Aboriginal syllabics and Han are quite similar to each other; there are four lighter areas where these writing systems meet. Thai is not far behind, creating a darker cross in the middle that is still far lighter than the other writing systems. Hebrew is not far off either, but is more similar to Thai than to the others. Coptic, runic, Arabic and hieroglyphs can be seen as darker lines, with low similarity to the rest. However, while Coptic and runic have acceptably high similarity with themselves, Arabic and hieroglyphs have very low internal similarity.

Tables D.1 and D.2 do not, however, show the whole truth. As the pixels are compared on a same-value basis, the pixel values 253 and 254 are counted as just as different as 0 and 255. More tests were thus executed. Tables D.3 to D.5 compare the pixels by first sending them through Richards' curve, or the generalized sigmoid function (BioSS); the time of maximal growth is 128, 200 and 250, respectively. Tables D.6 to D.9 compare the pixels by the difference in their values; the pixels are considered similar if the difference is less than 40, 20, 10 and 5, respectively.

Considering all the tables, we see some patterns. An expected pattern is that the general similarity is higher with a more forgiving similarity function. Another pattern is how different Arabic is from all the other writing systems; this is the case in all of the tables. A second dark line can be seen in all the tables, but it moves from Han to the hieroglyphs: in the cases with strict similarity the hieroglyphs differ from the rest, while Han differs in the cases with a more forgiving similarity function. The diagonal, where each writing system is compared with itself, is also more or less lighter than the rest.
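For reference, the generalised logistic (Richards') curve described in the BioSS reference can be written as below. The parameter names are the conventional ones and are not taken from the project, which only states the point of maximal growth (128, 200 or 250).

\[
  f(x) \;=\; A + \frac{K - A}{\bigl(1 + Q\,e^{-B\,(x - M)}\bigr)^{1/\nu}}
\]

In the simple logistic case (A = 0 and K = Q = ν = 1), this reduces to f(x) = 1 / (1 + e^{-B(x - M)}), whose point of maximal growth lies at x = M.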

Table D.1: Similarity between the glyphs, compared by pixel values.

                 Coptic  Runic  Hebrew  Arabic  Hangul  Hiragana  Thai   Syllabics  Han    Hieroglyphs
    Coptic       0.221   0.180  0.212   0.182   0.230   0.228     0.212  0.221      0.210  0.154
    Runic        0.180   0.217  0.198   0.180   0.215   0.209     0.194  0.198      0.192  0.154
    Hebrew       0.212   0.198  0.285   0.209   0.248   0.247     0.261  0.223      0.210  0.177
    Arabic       0.182   0.180  0.209   0.195   0.216   0.212     0.210  0.199      0.188  0.156
    Hangul       0.230   0.215  0.248   0.216   0.313   0.291     0.251  0.268      0.262  0.172
    Hiragana     0.228   0.209  0.247   0.212   0.291   0.313     0.260  0.262      0.261  0.176
    Thai         0.212   0.194  0.261   0.210   0.251   0.260     0.368  0.230      0.214  0.180
    Syllabics    0.221   0.198  0.223   0.199   0.268   0.262     0.230  0.259      0.241  0.162
    Han          0.210   0.192  0.210   0.188   0.262   0.261     0.214  0.241      0.259  0.155
    Hieroglyphs  0.154   0.154  0.177   0.156   0.172   0.176     0.180  0.162      0.155  0.155

Table D.2: Similarity between the glyphs, compared by pixel values. Light cells have a higher level of similarity than darker cells.

Table D.3: Comparison of Richards' curve (BioSS) as a function of pixel value, with 128 as time of maximum growth. This was the function used during learning (Section 3.3).

Table D.4: Comparison of Richards' curve (BioSS) as a function of pixel value, with 200 as time of maximum growth.

Table D.5: Comparison of Richards' curve (BioSS) as a function of pixel value, with 250 as time of maximum growth.

Table D.6: Comparison where the pixels are considered similar if the difference between them is less than 40.

Table D.7: Comparison where the pixels are considered similar if the difference between them is less than 20.

Table D.8: Comparison where the pixels are considered similar if the difference between them is less than 10.

Table D.9: Comparison where the pixels are considered similar if the difference between them is less than 5.