
Analysis of Notation Systems for Machine Translation of Sign Languages

Submitted in partial fulfilment of the requirements of the degree of

Bachelor of Science (Honours)

of Rhodes University

Jessica Jeanne Hutchinson

Grahamstown, South Africa
November 2012

Abstract

Machine translation of sign languages is complicated by the fact that there are few standards for sign languages, both in terms of the actual languages used by signers within regions and dialogue groups, and also in terms of the notations with which sign languages are represented in written form. A standard textual representation of sign languages would aid in optimising the translation process.

This area of research still needs to determine the best, most efficient and scalable techniques for translation of sign languages. Being a young field of research, there is still great scope for introducing new techniques, or greatly improving on previous ones, which makes comparing and evaluating the techniques difficult. The methods used are factors contributing to the translation process, and need to be considered when evaluating and optimising translation systems.

This project analyses notation systems: what systems exist, what data is currently available, and which of them might be best suited for machine translation purposes. The question being asked is how using a textual representation of signs could aid machine translation, and which notation would best suit the task. A small corpus of SignWriting data was built, and this notation was shown to be the most accessible. The data was cleaned and run through a statistical machine translation system. The results had limitations, but overall were comparable to other translation systems, showing that translation using a notation is possible, but can be greatly improved upon.

ACM Computing Classification System

Thesis classification under the ACM Computing Classification System (1998 version, valid through 2012)

I.2.7 [Natural Language Processing]: Language parsing and understanding, Machine translation, Text analysis

J.5 [Arts and Humanities] Language translation, Linguistics

General Terms: Sign Language, Machine Translation

Acknowledgements

I would like to acknowledge the financial and technical support of Telkom, Tellabs, Stortech, Genband, Easttel, Bright Ideas 39 and THRIP through the Telkom Centre of Excellence in the Department of Computer Science at Rhodes University.

Contents

1 Introduction

2 Background
2.1 Sign Language
2.1.1 3D Signing Space
2.1.2 Use of Time
2.1.3 Classifier Predicates
2.1.4 Phonetics and Phonology
2.1.5 Variation
2.1.6 Notations
2.2 Machine Translation
2.2.1 Rule-based Systems
2.2.2 Empirical Systems
2.3 Machine Translation of Sign Languages
2.3.1 Text-To-Sign Systems
2.3.2 Sign-To-Text Translation

3 Notations
3.1 Stokoe
3.2 Gloss
3.3 Hamburg Notation System
3.4 SignWriting
3.5 Other Representation Systems

4 Corpus Construction and Translation
4.1 Data Requirements
4.2 Acquiring Data
4.3 Data Cleaning and Preparation
4.3.1 Language model
4.4 Machine Translation

5 Conclusion
5.1 Future work

List of Figures

2.1 The ASL sign for “dog” [1]
2.2 The ASL sign for “mother” on the left, and for “father” on the right
2.3 Machine Translation Architectures [14]
3.1 An Example of the Stokoe system - an extract from Goldilocks and the Three Bears [21]
3.2 An Example of Gloss Notation [22]
3.3 An Example of HamNoSys using the same passage as in Figure 3.1
3.4 An Example of SignWriting, with the same passage as in Figure 3.1
3.5 An example of a sign in the MSW encoding [26]
3.6 A passage of the Bible in ASLSJ
4.1 Luke 3:8 in American Sign Language

List of Tables

Chapter 1

Introduction

Machine translation is one of the oldest areas of research in Computer Science[16]. Sign Linguistics, on the other hand, is a relatively new field of study in the area of Linguistics, and the combination - machine translation of sign languages - is less than twenty years old[23].

Sign languages as languages have not been studied as extensively as spoken languages, and there is still much to be learned about them. There are few standards for sign languages, both in terms of the actual languages used by signers within regions and dialogue groups, and also in terms of the notations with which sign languages are represented in written form. The second factor affects the first: sign languages remain localised to small regions, as there is little technology that allows widespread use of a language or dialect. Visual communication tools such as Skype or Google Hangouts offer a partial solution to this problem, but there are still network-related and hardware-related difficulties that make such communication more difficult than text communication, especially in regions where there is less access to good technology. A standard textual representation of sign languages would provide an easier communication medium for sign language users, and if more communication happens across regions, language use and dialects will also become more standardised [4].

Standardisation of language aids in machine translation of that language, as there are fewer variations, contributing to a smaller error margin. Correspondingly, machine translation can assist with the standardisation of language, as it can provide greater accessibility for communication, even between language groups (such as between spoken languages and signed languages, or between two signed languages).


A number of systems for translation between signed and spoken languages have been built or prototyped, with a range of focuses and using a variety of techniques, some successful and some not yet successful, all with their advantages and disadvantages. Many research projects are currently in progress, researching better techniques for data processing, translation models, gesture recognition, and sign synthesis. Projects in this area have not yet reached a point where they can start improving on existing systems, optimising them, and deploying them; the general research question is still “can this be done?”

The current goal for this area of research is to reach the point where the question of whether translation can be achieved no longer needs to be asked, and where the best, most efficient and scalable techniques for translation are known. To get to that point, however, a lot more research needs to be done on which options are best. Being a young field of research, there is still great scope for introducing new techniques, or greatly improving on previous ones, which makes comparing and evaluating the techniques difficult. A technique previously considered inadequate may have been improved upon outside of its use in translation, and might no longer be inadequate. The methods used are factors contributing to the translation process, and need to be considered when evaluating and optimising translation systems.

This project aims to look into the field of sign language translation in terms of notation systems: what systems exist, what data is currently available, and which of them might be best suited for machine translation purposes. The question being asked is how using a textual representation of signs could aid machine translation, and which notation would best suit the task. To do this, the research will provide an analysis of the major notation systems and some minor ones, with a focus on their suitability for translation by machine. This research will be both theoretical and practical, involving the building of a basic translation system from available data in order to practically test the limitations and strengths of that notation. The result of this research will hopefully provide knowledge that can be built upon to improve future translation systems.

To begin with, a background to the three main fields will be provided: sign language linguistics, machine translation, and the combination, machine translation of sign languages. A brief overview of previous translation systems that have been built or prototyped will be provided. The research will then provide an analysis of notation systems and go on to describe the process of constructing a corpus for translation, with reference to practical experience translating ASL data with the Moses translation system.

Chapter 2

Background

2.1 Sign Language

Sign languages have not been studied as extensively as spoken languages, and the field of research is still fairly young [13]. A good understanding of how sign languages are used is necessary for creating a good translation system. Sign languages are vastly different from spoken languages; some of the methods used in spoken language translation/recognition systems can be applied, but not all of them are suitable. Sign languages are not all the same; each one will have differences in syntax, lexicon, etc. However, there are concepts referring to the nature of the production of signs that are unique to sign languages and that apply to all sign languages. What is known about sign language universals must be considered and used to adapt translation systems.

2.1.1 3D Signing Space

Sign languages exist in a visual medium and are three dimensional; spoken languages exist in an audio medium and are two dimensional. Sign languages function within both space and time, therefore requiring specific techniques for processing, and posing interesting challenges not encountered in the translation/recognition of spoken languages. A signer will use their whole body and the space around them in communication. The hands of a signer are the main production units of signs, but anywhere from above the head to about waist height on the body is used as place of articulation[8]. Some signs, such as the ASL (American Sign Language) sign for dog, may even exist below the waistline (see Figure 2.1 below). Signs can also be accompanied by a slight forward or backward lean of the body, which can add emphasis to words or indicate topic [24].

Figure 2.1: The ASL sign for “dog” [1]

2.1.2 Use of Time

Sign languages use the medium of time differently to spoken languages; signed utterances make use of both simultaneous and sequential processes to distinguish signs. The Movement-Hold model developed by Scott Liddell and Robert Johnson describes signs according to their sequential nature, as a series of movements and holds[19]. A sign may be described by a series of movements ending in a hold (MMMH), or moving from a hold to a hold (HMH), etc.
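Purely as an illustration (this representation is an assumption for exposition, not Liddell and Johnson's own formalism), a sign's segmental structure under the Movement-Hold model can be encoded as a string over {M, H}:

```python
# Each sign is represented by its Movement-Hold segment sequence:
# 'M' for a movement segment, 'H' for a hold segment.
# The sign names are invented placeholders.
SIGN_SEGMENTS = {
    "sign-a": "MMMH",  # three movements ending in a hold
    "sign-b": "HMH",   # hold, movement, hold
}

def segment_sequence(sign):
    """Return the list of Movement-Hold segments for a sign."""
    return list(SIGN_SEGMENTS[sign])
```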

Speech processes occur sequentially and can be separated within a time sequence, e.g. the word “bad” can be separated into its phonetic components “b-a-d”, which occur after each other in the course of time (i.e. sequentially). The phonetic components of signs, however, can occur simultaneously as well as sequentially. An example is the pair of ASL signs for mother and father, which are the same except for their location; mother is signed on the chin and father is signed on the forehead (Figure 2.2)[31]. This difference occurs simultaneously with the other components of the sign - those of movement and handshape. Sequentiality is also an important process, though, and should not be disregarded. An example of sequentially contrasting phonological processes is shown by the SASL signs for please and thank you; these two signs are the same, except that the sign for thank you has an extra hold after the other movements have occurred.

Figure 2.2: The ASL sign for “mother” on the left, and for “father” on the right

There are no clear boundaries between signs, i.e. where one sign ends and the next begins [32]. Boundaries cannot be determined by length, because signs vary in length. A change in movement, position, or hand shape does not indicate a change in sign, as such a change could still be part of one sign.

2.1.3 Classifier Predicates

Classifiers are morphological units which indicate the class to which an item belongs, i.e. they classify that item. Classifier predicates in sign languages function pronominally, i.e. the classified item will function as a pronoun [6]. Pronouns in Sign Languages are positioned as objects within the signing space that the signer can refer to; the term “classifier predicate” refers to this pronominal construct which a signer can manipulate to indicate various meanings such as size, location, movement, and other properties [15]. With classifier predicates, the signer can make use of the space around them to construct a 3D scene, manipulating referents within that scene in order to convey specific meaning. For example, an object can be depicted as travelling around another object, by making the sign for the first travel around the sign for the second. This adds greater complexity to the translation process.

2.1.4 Phonetics and Phonology

Sign Language Phonology is not well defined and has not been widely researched, compared with the phonology of spoken languages[31]. In spoken languages, a phoneme is the smallest contrastive unit of sound, distinguishing between minimal pairs. For example, the words bat and cat are distinguished by /b/ and /k/. Phonemes in sign languages have visual rather than auditory contrastive components. The previously mentioned example of the ASL signs for mother and father exemplifies this, distinguished by the location of the sign (chin or forehead). Phonemes are distinguished by features [12]; in sign languages these features are handshapes, locations, orientations and movements [31].
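To make the feature view concrete, here is a small sketch of a sign as a bundle of the four feature types; the specific feature values below are invented for illustration, with the mother/father pair differing only in location:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SignPhoneme:
    handshape: str
    location: str
    orientation: str
    movement: str

# Illustrative values only: the pair contrasts solely in location.
MOTHER = SignPhoneme("open-5", "chin", "palm-left", "tap")
FATHER = SignPhoneme("open-5", "forehead", "palm-left", "tap")

def contrasting_features(a, b):
    """Return the names of the features on which two signs differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]

print(contrasting_features(MOTHER, FATHER))  # ['location']
```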

Phonetics is related to phonology, but describes the physical structure of a sign, not only the features which contrast meaning. For spoken languages, phonology describes how sounds are perceived, while phonetics describes how sounds are physically created. Both phonetics and phonology use similar features to describe the discussed sound. An example of phonetic differences of sounds in spoken English is the difference between aspirated and unaspirated /p/. The aspirated sound ([pʰ]) occurs at the beginning of words such as pit; the unaspirated sound ([p]) occurs when the sound is made in the middle of a word, as in spit. The two sounds are perceived to be the same sound by the English speaker, but they are produced slightly differently; [pʰ] is made with a slight puff of air, while [p] is not.

Phonology is language specific, as it refers to how sounds (or signs) are perceived with reference to how they distinguish meaning between signs. If two different, but similar, handshapes are not used to contrast meaning in one sign language, they may not be seen as fundamentally different handshapes to signers of that language, and can be transcribed using the same symbol. However, a different language may use this difference in handshape to distinguish meaning, and therefore would require them to be distinguished in written form as well. For a sign language translation system to be universally applicable, a phonetic description of signs would be better than a phonological system.

Phonological Processes

When signs are produced continuously (as opposed to being produced in isolation), they undergo phonological processes which affect their production [19]. These processes include the following (a small illustrative sketch follows the list):

• movement epenthesis: adding a movement between signs to get from the final position of one sign to the initial position of the next

• hold deletion: final hold of first sign and initial hold of the next sign are deleted

• metathesis: units of a sign swap places

• assimilation: sign takes on characteristics of the sign before or after it
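As a minimal sketch of how two of these processes could be modelled over Movement-Hold strings like those shown earlier (the representation is again an assumption for exposition, not taken from [19]):

```python
def movement_epenthesis(first, second):
    """Insert a transitional movement segment between two adjacent signs."""
    return first + "M" + second

def hold_deletion(first, second):
    """Delete the final hold of the first sign and the initial hold of the next."""
    if first.endswith("H"):
        first = first[:-1]
    if second.startswith("H"):
        second = second[1:]
    return first, second

# hold_deletion("MMMH", "HMH") -> ("MMM", "MH")
```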

2.1.5 Variation

As was said before, sign languages are not standardised, and within one language different word forms may show dialectal differences between users. Even if the signs are the same, however, there will always be some variation in the way they are performed, whether by a different signer or the same signer. Sometimes these differences have meaningful components, such as making a sign smaller or bigger to indicate the size of the object being spoken of, or to emphasise the word; but even between signs with exactly the same meaning performed by the same signer, there are still statistical variations in the motion that is made [32].

Signs can be performed by the left hand or the right hand, depending on whether the signer is left handed or right handed. If the sign is a two-handed sign, the dominant and weak hands could be switched between left and right for a number of reasons, besides the handedness of the speaker. It could indicate a difference in meaning (such as indicating which side of the speaker the signed object was on), or could be due to the hand positions of the previous sign or the following sign, or could indicate topic and subject, etc.

2.1.6 Notations

Sign languages are very far behind spoken languages in becoming written languages. There is as yet no standard notation for sign languages, though various notations and forms of representation exist. Standards emerge with use, however, and this takes time. An analysis of the notation systems that exist is carried out in Chapter 3, with a focus on how useful each notation system is for machine translation.

2.2 Machine Translation

Machine translation is one of the oldest fields in computing, but it has advanced slowly due to the challenges posed by the complex nature of natural languages, and due to scarcity of funding at stages of its history when external interest was lacking[16]. Machine translation of spoken language has made significant progress by this point in time, but translation of sign languages remains in the early stages of development.

Figure 2.3: Machine Translation Architectures [14]

There are two main approaches to building machine translation systems: the rule-based approach, and the empirical approach. Early systems for spoken language translation were all rule-based; advancing research in artificial intelligence led to the incorporation of empirical methods[16]. The hybrid approach, combining empirical methods with a few grammatical rules, has proved to be the most successful[10].

2.2.1 Rule-based Systems

Rule-based machine translation can occur on three different levels, depicted in Figure 2.3.

Translation can occur on any of the three levels, the depth of linguistic information increasing with each level. The three different levels - direct, transfer and interlingua - require different levels of analysis of the source language, as shown by the labels on the left of the figure (i.e. morphological, syntactic, and semantic analysis), as well as different levels in the process of synthesizing the target language [14].
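As a toy illustration of the lowest (direct) level, translation is little more than a word-for-word dictionary lookup that keeps the source word order; the lexicon entries here are invented, and real direct systems also add morphological processing:

```python
# A toy direct-architecture translator: dictionary lookup, source word order kept.
LEXICON = {"the": "", "dog": "DOG-SIGN", "sleeps": "SLEEP-SIGN"}  # invented entries

def direct_translate(sentence):
    tokens = [LEXICON.get(word, word) for word in sentence.lower().split()]
    return " ".join(t for t in tokens if t)

print(direct_translate("The dog sleeps"))  # DOG-SIGN SLEEP-SIGN
```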

Grammars can be constructed in different ways. Some of those used by other sign language translation projects include tree adjoining grammars, head-driven phrase structure grammars and lexical functional grammars[14][25].

Rule-based systems are limited by the need for grammars to describe the language. Constructing grammars is a difficult process, as natural languages are infinite in nature and the rules that describe them often interact in complex ways[2]. Newly encountered processes require new rules to be added to the grammar. Furthermore, languages are inherently ambiguous; it is often very difficult for a computer to accurately parse a complex sentence[3].

There are many factors contributing to the difficulty of constructing grammars for sign languages. A grammar would need to be constructed by the researcher for each sign language that is to be translated. This requires an extensive knowledge of the linguistic structure of the sign language in question, something which is yet to be formalised for each sign language. Most sign linguistic studies have been conducted on ASL; there are some linguistic universals common to all or most sign languages (such as those mentioned in Section 2.1), but grammatical rules do not transfer across languages, therefore what is known about ASL grammar is not always directly usable in translation systems for other languages.

Furthermore, this approach is not easily scalable. It can and has worked well within narrow language domains that have limited variation in sentence structure[14]. However, in order to deal with more complex sentence structures, new rules need to be added, and rules for the interaction of the new rules and existing rules need to be added, which quickly becomes complex.

2.2.2 Empirical Systems

Empirical systems can be either example-based or statistical[10]. The empirical approach uses data (examples) rather than predefined rules; the system will construct its own rules using a knowledge base of previously encountered translations. Example-based translation compares the sentence/phrase to be translated with a knowledge base of previously translated sentences/phrases, building the translation from what it has encountered before. Statistical translation works with probabilities, using a corpus of parallel sentences to assign probability weights to words and phrases, and then using those weights to translate sentences/phrases[22].
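The underlying formulation is the standard noisy-channel model (a textbook statement, not specific to [22]): given a source sentence f, the system selects the target sentence e that maximises the product of the language model probability P(e) and the translation model probability P(f|e):

\[
\hat{e} \;=\; \operatorname*{arg\,max}_{e} \, P(e \mid f) \;=\; \operatorname*{arg\,max}_{e} \, P(e)\,P(f \mid e)
\]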

An empirical system is more easily transferable from one language pair to another than a rule-based system, as there is no need to create new rules; the system constructs a translation model from data, so to use the same system with another language pair one need only find a parallel corpus for that pair and allow the system to reconstruct translation models for it. This is complicated, however, by the fact that large corpora for sign languages do not exist. TAUS Labs recommends 800 000 sentence pairs as a minimum for a good translation system, and 1.2 million or more aligned sentences for difficult language pairs [10].

2.3 Machine Translation of Sign Languages

Machine translation of sign languages is complicated by the fact that there are essentially two steps to the process: the translation component, and either recognition or synthesis depending on the direction of translation. Speech recognition software functions within the same language but between two media (voice and text); language translation software, while creating a mapping from one language to another, usually functions within the same medium (text to text). Sign language translation would need to include both a natural language processing component and a computer vision component for recognition/synthesis of visual signs, functioning between two media (video and text) as well as two languages (the signed language and the spoken language).

Translation can occur in two directions, from spoken to signed language or vice versa, and each direction has its own challenges. Translation into sign language lets native signers access information from the world around them, which is mostly geared towards spoken language users. However, there can be no interaction with that world if signers have no way of communicating in the opposite direction, which requires a translation system that also functions in the opposite direction. Most of the systems that exist focus on TTS (Text-To-Sign), and are still mostly small-scale, functioning on limited datasets and focusing on the image processing component rather than the natural language processing component.

2.3.1 Text-To-Sign Systems

TEAM This system makes use of a synchronous lexicalised Tree Adjoining Grammar (TAG), storing pairs of words and signs in a dictionary. TEAM translates using a transfer architecture on a syntactic level[33].

ZARDOZ This system's approach to translation is to use a language-independent event schema for translation, which is essentially an interlingua, the highest translation level in the pyramid (see Figure 2.3)[29]. The higher the translation level, however, the more restricted the domain of the translation system.

ViSiCAST The ViSiCAST system describes two major stages of TTS: semantic analysis of English text, producing some semantic-based representation, and translation from this to a representation that can be used to render an animation [25]. The tools used in this process are: a CMU Link Parser, Discourse Representation Theory, and a Head-Driven Phrase Structure Grammar.

ASL Workbench This system uses a lexical-functional grammar and a sophisticated phonological model, but requires user input at most stages of translation[14].

Morrissey Sara Morrissey outlines a prototype example-based machine translation (EBMT) system for the translation of sign languages [22]. The system translates between the following language pairs:

• Irish Sign Language (ISL) and German (DE)
• German Sign Language/Deutsche Gebärdensprache (DGS) and English (EN)
• DGS and DE
• ISL and EN

Morrissey’s system focuses on the domain of travel dialogue, and transcription is done using the gloss transcription method. The system used for translation is Ma- TreX, which is a wrapper around the Moses translation system [22].

2.3.2 Sign-To-Text Translation

STT (Sign-To-Text) systems face the challenge of computer vision processes: extracting features from video data and representing them in some way before using the representation in a translation model.

STT systems focus on the recognition component of translation. This component involves feature analysis of video images[31]. Features extracted from the video are matched to phonetic or phonemic features of signs. The recognised features are mapped to a representation, which is then processed into a spoken language using translation techniques.

The method commonly used for recognition is hidden Markov models. This technique has been improved upon by Christian Vogler, who suggested that parallel hidden Markov models be used, aiding in the description of classifier predicates[31]. Vogler also recommends a phonetic rather than phonemic approach to recognising signs, because phonemes do not always fully describe a sign.
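For readers unfamiliar with the technique, the core of HMM-based recognition is Viterbi decoding, sketched below over plain dictionaries; parallel HMMs run several such chains over independent feature channels (e.g. each hand) and combine the probabilities. The state and observation values would come from the sign model and the vision front end, and are not specified here:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            back[-1][s] = prev
    # Trace the best path backwards from the most probable final state.
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.insert(0, state)
    return path
```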

After the computer vision component, there ought to be a natural language processing component to the translation system. Current systems are often rule-based after the initial recognition step; rule-based approaches, however, are still somewhat limited to small datasets or specific domains. Statistical methods are harder to apply due to the lack of a written form for the signs: video data is incompatible with existing translation systems without a suitable transcription. A corpus-based approach using statistical methods could optimise the translation process, but a written form of signs is necessary.

Chapter 3

Notations

As previously stated, no standard written form for sign languages exists. Some of the representation systems that have been created and are currently being used in various ways are described below. Using a notation is useful in sign language translation to enable easy distribution of data and to make the translation system scalable. Current systems only focus on limited domains, such as one topic domain (e.g. weather news broadcasts[30], travel dialogue[22]) or a subset of sentence types (e.g. simple statements, wh-questions). Use of a notation in an intermediary step may aid in making systems more scalable, although this may depend on the notation used.

Each new project conducted by different researchers starts the entire process from the first step, gathering data for translation. It is difficult to share data between research projects due to the fact that each project builds its corpus with a particular research focus, using a particular language pair. This is further complicated by the lack of standards within the field. No standard notation means that each research project must decide on a notation system for that project. Gloss is the most common system used, but it has its own limitations. If a standard notation system emerges, it can aid in data sharing, and new projects can build on older projects, thereby enabling the projects and the field to progress more rapidly.

3.1 Stokoe

The Stokoe system was developed by William Stokoe in 1960[19]. It is phonetically based, and was the first notation to be produced for sign languages; most other notations are based on Stokoe's work[21]. Stokoe introduced the concept of segmenting signs into phones or phonemes.

Stokoe describes signs with the following aspects:

• Hand configuration: determined by the active hand, denoted designator (dez)

• Place of articulation: denoted tabula (tab)

• Movement: the action of the sign, denoted signation (sig)

Figure 3.1: An Example of the Stokoe system - an extract from Goldilocks and the Three Bears[21]

The development of the Stokoe system advanced the field of sign linguistics significantly, but it is now outdated. The Stokoe system is seen as inadequate to represent sign language, as it has no way of indicating nonmanual features such as facial expressions. The system was created as a tool for linguists to describe signs, but is not necessarily useful for anything else (and indeed is no longer used for its primary purpose, due to its lack of nonmanual feature representation). It is primarily a hand-written system, although an ASCII form has been developed [20]. This ASCII system is described in the literature, but no electronic data created with it could be found. Stokoe's system remains largely a hand-written system.

3.2 Gloss

Gloss notation represents a sign using a word stem from a spoken language[7].

Using gloss notation, one can describe signs with any degree of detail that one requires. This can be particularly useful for describing nonmanual components, emphasis, classifier predicates, etc. However, this also means that data described using gloss notation is variable. There are no standards for transcribing data using gloss, and without guidelines each dataset will be transcribed differently if transcribed by different people. This has implications for statistical translation; greater variability will affect the weightings assigned by the translation system.

Figure 3.2: An Example of Gloss Notation [22]

One advantage of using gloss for machine translation is that the transcribed signs will already be in words from the target language, making it easier to map between the two languages. However, this is only true for that particular language pair. To have this close match using the same sign language and a different spoken language would require the data to be transcribed again. It is possible to translate between sign and spoken language when the sign language data is transcribed with a different spoken language gloss, but a close match provides a more accurate and efficient translation[22].

3.3 Hamburg Notation System

The Hamburg Notation System (HamNoSys) is a phonetically based notation system that was developed by Siegmund Prillwitz in 1984[11]. This system, like most representation systems, was initially handwritten, but a machine readable font is available from the University of Hamburg (http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/hamnosys-97.html). An XML encoding of HamNoSys called Signing Gesture Markup Language (SiGML) is also available; it was developed for the ViSiCast project by Richard Kennaway[17].

Some advantages of this system are that it is international (can be used for any sign language), it is iconic, it is adaptable, has a formal syntax and can be stored in a computer database[11]. However, it does not provide any easy way to describe non-manual features, such as facial expressions. This notation was developed for a linguistic description of signs, not to be used in any form of communication[11].

Figure 3.3: An Example of HamNoSys using the same passage as in Figure 3.1

3.4 SignWriting

SignWriting (SW) is a system developed by Valerie Sutton, a dancer who first developed Sutton DanceWriting to annotate movement[27], and then afterwards developed SW to annotate gestures in sign languages. SignWriting was developed for communication purposes rather than linguistic purposes; the representations of signs are deliberately intuitive, with the intention being that the script will be used by signers themselves, not only the linguists studying sign language. Furthermore, the notation was not developed for only one language, but was built to be appropriate for any sign language. A vast number of sign languages are already making use of the script. The goal of SignWriting is to enable signers to be literate in their first language, not requiring them to learn another language in order to read and write[28].

SW is a pictorial notation system and can describe non-manual features. It makes use of a set of symbols that can be combined to describe any sign. Though standardisation efforts are being put into place, the system is still flexible; if a language cannot describe a sign with the available symbols, it is possible to add to the set. Additions are not unlimited; if two signs can be described with the same symbol without losing understanding, there is no need for a new symbol. For example, in the English language written using the Latin alphabet, the same letter ’c’ can be used to describe the sound made in the word script and the sound made in the word cite.

Figure 3.4: An Example of SignWriting, with the same passage as in Figure 3.1

The SignWriting community exists both online and offline. This means that the script is not solely handwritten, but is used extensively both on computer and on paper. A markup language exists to describe texts written in SignWriting (SignWriting Markup Language, or SWML). There are also a number of ways to represent individual signs that are machine readable. Stephen Slevinski has developed the SignWriting Image Server (SWIS), which encodes the SignWriting script using a mathematical encoding known as Modern SignWriting (MSW)[26].

Figure 3.5: An example of a sign in the MSW encoding [26]

3.5 Other Representation Systems

The following representation systems are less widely used than those previously men- tioned.

Berkeley Transcription System The Berkeley Transcription System (BTS) was developed to aid linguistic analysis of sign languages at the level of meaning components [13]. The system was built by Nini Hoiting and Dan Slobin because other transcription systems at the time (2001) were found to be inadequate for the research they wished to conduct on sign languages. They aimed to capture the linguistic and communicative elements used by signing deaf children and their parents in order to be able to research sign language acquisition. The system was inspired by CHILDES (CHIld Language Data Exchange System, http://childes.psy.cmu.edu/), which provides a format for child language researchers to share data. BTS is encoded in ASCII. Data that has been transcribed using BTS is a collection of conversations between signing children and their deaf or hearing parents.

Si5s Si5s is a writing script for American Sign Language. It is newly developed (2003), and has not (yet) been extended to any other sign language. It was developed specifically for ASL, but that does not restrict it to ASL; it may be possible to extend the script beyond ASL. This script is still quite new.

ASLSJ ASLSJ (American Sign Language - Sign Jotting) was developed by Thomas Stone as a written aid for his own learning of ASL. The system does not follow a phonetic model as most other systems do, although there is an element of phonetics to it. The aim of the system is not to describe signs fully, but rather to provide a way to quickly write down signs on paper (“jotting” them down, as it were) in a way that only requires one sign to be distinguishable from another. Stone states that he “felt this word, jotting, more accurately described [his] way of - for lack of a better simile - sounding out sign words”. An example passage is shown in Figure 3.6.

Figure 3.6: A passage of the Bible in ASLSJ

SLIPA This system was created to provide an International Phonetic Alphabet (IPA) for sign languages (hence Sign Language IPA, or SLIPA). It is not widely used.

SignFont This is a transcription system that is similar to SignWriting in that it is iconic, but does not have much acceptance and it is not widely used [9]. It has a small symbol set.

SignPhon SignPhon is a database that describes signs phonologically and in isolation. The database uses an encoding to describe signs, rather than developing a new script, in order to keep it notation-neutral[5].

These are only a sample of the notation systems that have been and are being developed to describe sign languages. Each script was developed with a particular purpose in mind. The glossing system and the earlier scripts, Stokoe and HamNoSys, were developed to describe signs linguistically. Some of the minor representations were built for this same use, e.g. BTS and SignPhon. SignWriting, on the other hand, was developed as a script that signers themselves could use as a writing system. This is also the focus of Si5s and ASLSJ. The scripts that focus on describing signs linguistically can aid machine translation by the fact that they include relevant information. The scripts that focus on providing a written form for sign languages are also useful for machine translation, as they will describe signs enough to distinguish meaning. The major benefit of these representations is that if they are being used by signers then signs are already written down, reducing the enormity of the task of getting data into a transcribed form before using it to build translation systems.

A summary of these notation systems is shown in Table 3.5.

With this analysis it can be concluded that the best notation system for translation is one that signers themselves are using. This rules out the Stokoe system completely. The glossing system is used by bilingual signers; most signers are bilingual by necessity, learning to read and write in the closest geographical spoken language. Glossing is problematic, though, due to its lack of standards and its restriction of language pairing. The minor transcription systems are mostly fairly new, and not widely known. BTS is a linguistic description of signs, not a written representation. Si5s is deliberately a written representation, but it is a new system and is not yet widely used. The remainder of the systems are project specific or developed by individuals, and not widely used, or at least not yet.

It is not the aim of this research to decide on which notation system should be used by all signers and sign language research. Each script has its advantages and disadvantages, and most are being continuously changed and improved upon. The lack of existing data in certain scripts is a disadvantage for machine translation, and it is the aim of this research to provide a recommendation on scripts to use for machine translation. However, the result will need to be continuously re-evaluated. Currently the representation most widely used by signers themselves is SignWriting. However, this is subject to change. Si5s, for example, has the potential to be adopted as a script due to its ease of use. If reanalysis of transcription systems results in a different system being recommended, then it is always possible to create data using that system, so that existing data becomes available in it.

Gloss notated data seems to be research specific. Stokoe is outdated and no longer used. HamNoSys is possibly the most widely used at this stage (besides gloss notation), but data is not easily attainable. SignWriting has proved to be the most accessible notation format for this research. There are a number of reasons to use SignWriting, one of them being that corpora are not only being added to by many people for various research reasons, but a corpus already exists. It is still in developmental stages, and there are a number of errors in the corpus data[26], but these are being improved on for the simple reason that SignWriting is being used. Moreover, SignWriting is not only being used by linguists or computer scientists conducting research on sign languages; it is being used by signers themselves. SignWriting may have its limitations, but it is the most advanced and widely-used system that exists.

Chapter 4

Corpus Construction and Translation

In order to assess usability of notations with experiential as well as theoretical reference, a basic machine translation system was implemented. The task involved gathering textual data, building a parallel corpus, cleaning and preparing the data, and running it through the translation system. Complete success was not expected; the goal of this process was to gain experiential knowledge of the limitations of different notations.

4.1 Data Requirements

All data is time-consuming to create. The 595 sentences used in [22] took one person three months to transcribe using gloss notation. Most notations require familiarity with the system in order to use it. This means that building a translation system from scratch is no easy task and requires a great deal of work. The researcher needs to familiarise him or herself with the script as well as with the sign language, gather data, take the time to transcribe that data, build the data into a corpus, clean and prepare the data for translation, build the system to be used for translation (which will involve another fair number of tasks), run the corpus through the system, and finally analyse the results before tuning the system and reiterating over the process.

It would be more time-efficient if data from previous projects were accessible and usable by other researchers so that new research does not need to start from the ground up each time. More time to focus on other areas of the process means that more progress can be made. New systems need to build on what exists; the fact that the data has to be produced each time for specific research projects by the researchers themselves means that

it very often restricts the focus of the project. This is a particular limitation of systems that use gloss notation, as the same data would need to be re-transcribed into the relevant spoken language in order to be reusable.

If there is more than one reason to build a corpus, it is more likely to be built. A corpus that is constructed for one research project with a particular research focus might not have any relevance to other research projects, and therefore getting more people involved in the corpus construction is difficult. However, if the corpus is reusable, and constructed with a broader aim, the corpus can grow much more quickly with contributions from other people’s research.

4.2 Acquiring Data

The original aim of this research was to gather data in a number of notations or representations, and run them through a rudimentary translation system in order to compare the results between each of them. However, obtaining sufficient data for anything other than SignWriting proved difficult. Any other data that does exist is not easily accessible, or is not yet available. The database of SignWriting data, however, is being added to every day, and is freely available on the Internet. From the SignWriting website it was possible to find a large portion of the Bible translated into ASL and transcribed in SignWriting, easily downloadable from the ASL Bible SignPuddle (http://www.signbank.org/signpuddle2.0/export.php?ui=1&sgn=28). This was the largest available dataset, and provides verse-aligned parallel data. This dataset will be discussed in more detail in the next section.

Other datasets that are available include a few children's stories and some poetry; these, however, were not used, as there was not a large amount of such data, and the language structure is not appropriate for training a translation system.

A project is currently in progress to translate Wikipedia pages into ASL, transcribed using SignWriting (http://ase.wikipedia.wmflabs.org/wiki/Main_Page). At the time of this research there were not many pages available on the ASL Wikipedia; initially only one page (the title page) had been translated, but by the time of writing, five other articles had been translated into ASL. This is likely to grow, and may prove to be a useful resource for gathering data for future projects. Another resource for SW data is The Frost Village Blog (http://frostvillage.com/blog/lang/ase/), which is written in both English and ASL using SignWriting, displaying the signs using the SignWriting Image Server (SWIS) rather than pictures, i.e. the signs are machine readable. This blog also allows readers to write comments using SignWriting. The advantage of this resource is that it contains conversational data. One disadvantage is that the data is paragraph aligned, not sentence aligned.

Another intended aim of this project was to focus on translating from SASL. The Linguistics department at Rhodes University is conducting research into SASL, and a group of students videoed 63 sentences signed by four different signers. The sentences were chosen according to specific research questions that each student expected to investigate linguistically. They were to transcribe these sentences using ELAN software and gloss notation. However, transcription is a time-consuming process, and only a small portion of the sentences have been transcribed - not enough for a translation system. This data was therefore not used for comparison within this project.

A survey of sign language corpora has been conducted, available from the University of Hamburg (http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/sl-corpora.html). It includes a list of corpus projects and what is available. Many of the projects are still in progress, and will only be available in the future. The corpora are all video corpora, and most of them are transcribed with glossing systems, if they are transcribed.

4.3 Data Cleaning and Preparation

The ASL Bible corpus was easy to obtain, and consists of an XML file with verses from the Bible encoded in MSW, aligned with an English translation. The data needed to be extracted from the XML, aligned on a sentence level, and placed into two plain-text files in UTF-8 encoding. Sentence-aligned data was required by the machine translation system used for this research (discussed in the next section). The ASL Bible is aligned by Bible verses, which may be complete sentences, but may also be fragments, or more than one sentence on one line. This is not ideal for translation. Scripts have been written to automatically align parallel data by sentence using statistical methods, but these scripts are language specific; an ASL/English script would need to be written. Such a script was not found, and it was not possible to write one because the source language (ASL) was unfamiliar to the researcher. The data was sentence aligned as far as possible, by removing any lines that contained sentence fragments or too many sentences in one line.
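As an illustration of the extraction step, the sketch below pulls verse pairs out of the XML and writes the two UTF-8 plain-text files. The tag names entry, asl and english are hypothetical stand-ins; the actual SignPuddle export schema is not reproduced here:

```python
import xml.etree.ElementTree as ET

def extract_parallel(xml_path, asl_path, en_path):
    """Write verse-aligned ASL (MSW strings) and English files, one verse per line."""
    root = ET.parse(xml_path).getroot()
    with open(asl_path, "w", encoding="utf-8") as asl_file, \
         open(en_path, "w", encoding="utf-8") as en_file:
        for entry in root.iter("entry"):          # hypothetical tag name
            asl = entry.findtext("asl")           # hypothetical tag name
            english = entry.findtext("english")   # hypothetical tag name
            if asl and english:
                asl_file.write(asl.strip() + "\n")
                en_file.write(" ".join(english.split()) + "\n")
```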

Because of the unfamiliarity of the source language, the data was likewise not easy to clean. It was not possible to judge the adequacy of the ASL data, or whether any lines ought to be removed or cleaned, based on what was found in the ASL sentences. If any lines were unsuitable for reasons on the ASL side of the data, they were not found; the data had to be cleaned by reading through the English sentences only and marking any uncertain lines. These lines were then simply removed from the corpus.

Some examples of lines that were removed:

Old English: “He that walketh uprightly walketh surely: but he that perverteth his ways shall be known.”

Song lyrics - some song lyrics found in the raw data often contained repetitions, sentence fragments, or whole paragraphs of sentences: “In my plans I was the first to leave - precious child, precious child. But in this world I was left here to grieve - precious child, precious child.”

Genealogies - long lists of names, not relevant to machine translation, as they are often only seen once, and for sign language they are fingerspelled: “Er was the son of Joshua. Joshua was the son of Eliezer. Eliezer was the son of Jorim...”

Descriptions - some verses were aligned with word-for-word translations or descriptions of the signs rather than an English translation: “His (looking up) bloodshed-over-me save me (2 hands), you-all (2 hands).”

Lines that began with “Verse x” were not removed, but the verse marker was stripped from both the English and the ASL data. This was done with a script that searched the English data for lines that started with the word “Verse”, or that had a colon within the second word (e.g. if the line started with “James 1:2”), and then stripped the marker from the corresponding lines in both files. This cleaned most occurrences of verse markers, but some were not found and stripped. An example would be a verse beginning with a book and chapter marker rather than a verse marker, e.g. “Psalm 91”.
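A sketch of the stripping rule as just described, for the English side (the corresponding handling of the ASL lines is omitted, since the marker format inside the MSW data is not shown here):

```python
def strip_verse_marker(line):
    """Drop a leading marker such as 'Verse 12' or 'James 1:2' from a line."""
    words = line.split()
    starts_with_verse = len(words) > 1 and words[0] == "Verse"
    has_chapter_verse = len(words) > 1 and ":" in words[1]
    if starts_with_verse or has_chapter_verse:
        return " ".join(words[2:])
    return line

print(strip_verse_marker("James 1:2 Count it all joy"))  # "Count it all joy"
```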

Ideally, the corpus should be cleaned more thoroughly by going through it line by line, looking at both the English and the ASL and removing any unwanted data. If the corpus is very large, this is not possible. The corpus used in this research is not large, though it is not small either, but there were further limitations due to time constraints and the fact that the researcher was not familiar with the source language. Errors will occur in corpus data; the best that can be done is to minimise them as far as possible.

The original XML file was 87950 lines long; after extraction and cleaning, the resulting corpus has 4275 parallel lines in English and ASL. Though this is more than the number of sentences used in previous translation systems (Morrissey used fewer than 600 [22]), it falls far short of the recommended number: 800 000 aligned sentences for a good translation system, and upwards of 1.2 million aligned sentences for difficult language pairs[10]. There is therefore scope for improvement in the corpus collection.

4.3.1 Language model

For the language model, a monolingual corpus of English sentences was built. These sentences were also taken from the Bible, so that the language domain was consistent. This means that the monolingual corpus is also verse- rather than sentence-aligned. The data was acquired from the World English Bible (http://ebible.org/web/).
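Moses builds the n-gram language model itself; purely to illustrate what a language model estimates, here is a minimal bigram model with unsmoothed maximum-likelihood probabilities (real systems use higher orders and smoothing):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count context unigrams and bigrams over boundary-padded token streams."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # contexts only
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(w1, w2, unigrams, bigrams):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
```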

4.4 Machine Translation

The statistical machine translation system used in this research is Moses[18], which uses statistical methods to create a translation model from a sentence-aligned parallel corpus of two different languages. The Do Moses Yourself (DoMY, http://www.precisiontranslationtools.com) program was used, which automates the Moses installation process on 64-bit Ubuntu and provides scripts which are wrappers around the Moses scripts. DoMY does no processing of its own; all the processing is done by the Moses system.

The steps in the translation process are as follows:

clean-lm, clean-tm Cleaning involves word segmentation and removal of unrecognised characters. The clean-lm command takes as input a monolingual corpus in the target language; the clean-tm command takes as input parallel corpus files in the source and target languages.

build-lm, build-tm These commands create build sets from the corpora cleaned in the previous step.

train This command builds a language model and a translation model from the build sets from the previous step.

translate At this step, the system runs through the translation process.

The system divides the data into training and testing sets according to a given ratio, trains the models on the first set, and then runs the models over the testing set. The results are evaluated with a BLEU and a NIST score.
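DoMY performs this split internally; the idea is simply a ratio-based random partition of the parallel corpus, sketched here with an assumed 90/10 ratio:

```python
import random

def split_corpus(sentence_pairs, train_ratio=0.9, seed=42):
    """Shuffle the parallel corpus and split it into training and testing sets."""
    pairs = list(sentence_pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]
```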

The resulting BLEU score was fairly low, at 0.0845. On a second run the BLEU score was 0.0924, which is still fairly low. The translation output includes a number of words that are not translated, possibly words that only occurred once in the corpus and therefore did not have a mapping assigned to them. This would affect the BLEU score. Apart from the words that were not translated, however, the resulting sentences were readable. For example, Figure 4.1 shows the sign language for Luke 3:8, whose reference translation is

That means nothing, for I tell you, God can create children of Abraham from these very stones.

The translation produced

that means nothing, for I tell you, God can the children of Abraham from these very stones.

Figure 4.1: Luke 3:8 in American Sign Language

Chapter 5

Conclusion

Translation of sign languages is possible, but the steps in translating them still have scope for much improvement. This research has focused on the intermediary step between sign recognition and spoken language synthesis, analysing sign language notation systems and how they can be used to optimise the translation process. It was found that SignWriting provides a good notation for translation, though the other systems have not been completely ruled out and may still be used successfully in translation systems. SignWriting has its disadvantages, but its advantages make it stand out from the rest of the systems.

The advantages of the SignWriting system are, in summary: it describes all the relevant features of signs, it is accessible to native signers, it is universal and is widely used, it has a machine readable format which is continuously being improved upon, and data in SignWriting is easily available.

A corpus of SignWriting data was constructed, cleaned, and translated using a statistical translation tool (Moses). The results obtained were comparable to previous translation systems, showing that this is indeed an adequate method to use in translating sign languages.

5.1 Future work

There is great scope for improvement within every step of the translation of sign languages. A large corpus of data is essential, and future work can look at building such a corpus, and standardising the format of the data that is gathered.


This research has not included sign recognition to, or sign synthesis from, notation systems. Further research could build a system which maps features extracted from video to SignWriting or another notation.

Statistical translation involves a number of other processes that can be automated with scripts, for example word segmenters and sentence aligners.

Bibliography

[1] American sign language university.

[2] Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. A maximum entropy approach to natural language processing. Comput. Linguist. 22, 1 (Mar. 1996), 39–71.

[3] Bird, S., Klein, E., and Loper, E. Natural language processing with Python. O’Reilly Media, Incorporated, 2009.

[4] Crasborn, O., Mesch, J., Waters, D., Nonhebel, A., van der Kooij, E., Woll, B., and Bergman, B. Sharing sign language data online: Experiences from the echo project. International journal of corpus linguistics 12, 4 (2007), 535–562.

[5] Crasborn, O., van der Hulst, H., and van der Kooij, E. Signphon: A phonological database for sign languages. Sign language & linguistics 4, 1-2 (2001), 1–2.

[6] Crystal, D. Dictionary of Linguistics and Phonetics. The Language Library. John Wiley & Sons, 2011.

[7] Dreuw, P., Rybach, D., Deselaers, T., Zahedi, M., and Ney, H. Speech recognition techniques for a sign language recognition system. In Eighth Annual Conference of the International Speech Communication Association (2007).

[8] Friedman, L. A. Space, time, and person reference in american sign language. Language 51, 4 (1975), pp. 940–961.

[9] Grieve-Smith, A. Signsynth: A sign language synthesis application using web3d and perl. Gesture and Sign Language in Human-Computer Interaction (2002), 37–53.

[10] Haddow, B. Principles of machine translation. Online, July 2012.


[11] Hanke, T. HamNoSys - representing sign language data in language resources and language processing contexts. In Workshop on the Representation and Processing of Sign Languages on the occasion of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal (2004).

[12] Harms, R. Introduction to phonological theory. Prentice-Hall, 1968.

[13] Hoiting, N., and Slobin, D. Transcription as a tool for understanding: The berkeley transcription system for sign language research (bts). Directions in sign language acquisition (2002), 55–75.

[14] Huenerfauth, M. A survey and critique of american sign language natural language generation and machine translation systems. Tech. rep., Technical Report MS-CIS-03-32, Computer and Information Science, University of Pennsylvania, 2003.

[15] Huenerfauth, M. Generating american sign language classifier predicates for english-to-asl machine translation. PhD thesis, University of Pennsylvania, 2006.

[16] Hutchins, W. J. Machine translation. In Encyclopedia of Computer Science. John Wiley and Sons Ltd., Chichester, UK, pp. 1059–1066.

[17] Kennaway, R. Synthetic animation of deaf signing gestures. Gesture and Sign Language in Human-Computer Interaction (2002), 149–174.

[18] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. Moses: Open source toolkit for statistical machine translation. In Annual meeting-association for computational linguistics (2007), vol. 45, p. 2.

[19] Liddell, S., and Johnson, R. American sign language: The phonological base. Gallaudet University Press, Washington, DC, 1989.

[20] Mandel, M. Ascii-Stokoe: A computer-writeable transliteration system for stokoe notation of american sign language, 1983.

[21] Martin, J. About the stokoe system.

[22] Morrissey, S. Data-driven machine translation for sign languages. PhD thesis, Dublin City University, 2008.

[23] Morrissey, S., and Way, A. An example-based approach to translating sign language.

[24] Ong, S., and Ranganath, S. Automatic sign language analysis: A survey and the future beyond lexical meaning. Pattern Analysis and Machine Intelligence, IEEE Transactions on 27, 6 (2005), 873–891.

[25] Sáfár, É., and Marshall, I. The architecture of an english-text-to-sign-languages translation system.

[26] Slevinski, S. E. Modern SignWriting. Online, February 2012. Available from: https://github.com/Slevinski/msw.

[27] Sutton, V. About DanceWriting.

[28] Sutton, V. Why write sign language?

[29] Veale, T., and Conway, A. Cross modal comprehension in zardoz, an english to sign-language translation system. In Proceedings of the Seventh International Workshop on Natural Language Generation (Stroudsburg, PA, USA, 1994), INLG ’94, Association for Computational Linguistics, pp. 249–252.

[30] Verlinden, M., Tijsseling, C., and Frowein, H. A signing avatar on the www. Gesture and Sign Language in Human-Computer Interaction (2002), 175–199.

[31] Vogler, C. American sign language recognition: reducing the complexity of the task with phoneme-based modeling and parallel hidden markov models.

[32] Vogler, C., and Metaxas, D. Asl recognition based on a coupling between hmms and 3d motion analysis. In Computer Vision, 1998. Sixth International Conference on (1998), IEEE, pp. 363–369.

[33] Zhao, L., Kipper, K., Schuler, W., Vogler, C., Badler, N., and Palmer, M. A machine translation system from english to american sign language. Envisioning Machine Translation in the Information Future (2000), 191–193.