
Analysis of Notation Systems for Machine Translation of Sign Languages

Submitted in partial fulfilment of the requirements of the degree of

Bachelor of Science (Honours)

of Rhodes University

Jessica Jeanne Hutchinson

Grahamstown, South Africa
November 2012

Abstract

Machine translation of sign languages is complicated by the fact that there are few standards for sign languages, both in terms of the actual languages used by signers within regions and dialogue groups, and also in terms of the notations with which sign languages are represented in written form. A standard textual representation of sign languages would aid in optimising the translation process.

This area of research still needs to determine the best, most efficient and scalable techniques for translation of sign languages. Being a young field of research, there is still great scope for introducing new techniques, or greatly improving on previous ones, which makes comparing and evaluating the techniques difficult. The methods used are factors contributing to the translation process, and need to be considered when evaluating and optimising translation systems.

This project analyses notation systems: what systems exist, what data is currently available, and which of them might be best suited for machine translation purposes. The question being asked is how using a textual representation of signs could aid machine translation, and which notation would best suit the task. A small corpus of SignWriting data was built, and this notation was shown to be the most accessible. The data was cleaned and run through a statistical machine translation system. The results had limitations, but overall were comparable to other translation systems, showing that translation using a notation is possible, but can be greatly improved upon.

ACM Computing Classification System

Thesis classification under the ACM Computing Classification System (1998 version, valid through 2012)

I.2.7 [Natural Language Processing]: Language parsing and understanding, Machine translation, Text analysis

J.5 [Arts and Humanities] Language translation, Linguistics

General Terms: Sign Language, Machine Translation

Acknowledgements

I would like to acknowledge the financial and technical support of Telkom, Tellabs, Stortech, Genband, Easttel, Bright Ideas 39 and THRIP through the Telkom Centre of Excellence in the Department of Computer Science at Rhodes University.

Contents

1 Introduction

2 Background
2.1 Sign Language
2.1.1 3D Signing Space
2.1.2 Use of Time
2.1.3 Classifier Predicates
2.1.4 Phonetics and Phonology
2.1.5 Variation
2.1.6 Notations
2.2 Machine Translation
2.2.1 Rule-based Systems
2.2.2 Empirical Systems
2.3 Machine Translation of Sign Languages
2.3.1 Text-To-Sign Systems
2.3.2 Sign-To-Text Translation

3 Notations
3.1 Stokoe
3.2 Gloss
3.3 Hamburg Notation System
3.4 SignWriting
3.5 Other Representation Systems

4 Corpus Construction and Translation
4.1 Data Requirements
4.2 Acquiring Data
4.3 Data Cleaning and Preparation
4.3.1 Language model
4.4 Machine Translation

5 Conclusion
5.1 Future work

List of Figures

2.1 The ASL sign for “dog” [1]
2.2 The ASL sign for “mother” on the left, and for “father” on the right
2.3 Machine Translation Architectures [14]
3.1 An Example of the Stokoe system - an extract from Goldilocks and the Three Bears [21]
3.2 An Example of Gloss Notation [22]
3.3 An Example of HamNoSys using the same passage as in Figure 3.1
3.4 An Example of SignWriting, with the same passage as in Figure 3.1
3.5 An example of a sign in the MSW encoding [26]
3.6 A passage of the Bible in ASLSJ
4.1 Luke 3:8 in American Sign Language

List of Tables

Chapter 1

Introduction

Machine translation is one of the oldest areas of research in Computer Science[16]. Sign Linguistics, on the other hand, is a relatively new field of study in the area of Linguistics, and the combination - machine translation of sign languages - is less than twenty years old[23].

Sign languages as languages have not been studied as extensively as spoken languages, and there is still much to be learned about them. There are few standards for sign languages, both in terms of the actual languages used by signers within regions and dialogue groups, and also in terms of the notations with which sign languages are represented in written form. The second factor affects the first: sign languages remain localised to small regions, as there is little technology that allows widespread use of a language or dialect. Visual communication tools such as Skype or Google Hangouts offer a partial solution to this problem, but there are still network-related and hardware-related difficulties that make such communication more difficult than text communication, especially in regions where there is less access to good technology. A standard textual representation of sign languages would provide an easier communication medium for sign language users, and if more communication happens across regions, language use and dialects will also become more standardised [4].

Standardisation of language aids in machine translation of that language, as there are fewer variations, contributing to a smaller error margin. Correspondingly, machine translation can assist with the standardisation of language, as it can provide greater accessibility for communication, even between language groups (such as between spoken languages and signed languages, or between two signed languages).


A number of systems for translation between signed and spoken languages have been built or prototyped, with a range of focuses and using a variety of techniques, some successful and some not yet successful, all with their advantages and disadvantages. Many research projects are currently in progress, researching better techniques for data processing, translation models, gesture recognition, and sign synthesis. Projects in this area have not yet reached a point where they can start improving on existing systems, optimising them, and deploying them; the general research question is still “can this be done?”

The current goal for this area of research is to reach the point where the question of whether translation can be achieved no longer needs to be asked, and where the best, most efficient and scalable techniques for translation are known. To get to that point, however, a lot more research needs to be done on which options are best. Being a young field of research, there is still great scope for introducing new techniques, or greatly improving on previous ones, which makes comparing and evaluating the techniques difficult. A technique previously considered inadequate may have been improved upon outside of its use in translation, and might no longer be inadequate. The methods used are factors contributing to the translation process, and need to be considered when evaluating and optimising translation systems.

This project aims to look into the field of sign language translation in terms of notation systems: what systems exist, what data is currently available, and which of them might be best suited for machine translation purposes. The question being asked is how using a textual representation of signs could aid machine translation, and which notation would best suit the task. To do this, the research will provide an analysis of the major notation systems and some minor ones, with a focus on their suitability for translation by machine. This research will be both theoretical and practical, involving the building of a basic translation system from available data in order to practically test the limitations and strengths of that notation. The result of this research will hopefully provide knowledge that can be built upon to improve future translation systems.

To begin with, a background to the three main fields will be provided: sign language linguistics, machine translation, and the combination, machine translation of sign languages. A brief overview of previous translation systems that have been built or prototyped will be provided. The research will then provide an analysis of notation systems and go on to describe the process of constructing a corpus for translation, with reference to practical experience translating ASL data with the Moses translation system.

Chapter 2

Background

2.1 Sign Language

Sign languages have not been studied as extensively as spoken languages, and the field of research is still fairly young [13]. A good understanding of how sign languages are used is necessary for creating a good translation system. Sign languages are vastly different from spoken languages; some of the methods used in spoken language translation/recognition systems can be applied, but not all of them are suitable. Sign languages are not all the same; each one will have differences in syntax, lexicon, etc. However, there are concepts referring to the nature of the production of signs that are unique to sign languages and that apply to all sign languages. What is known about sign language universals must be considered and used to adapt translation systems.

2.1.1 3D Signing Space

Sign languages exist in a visual medium and are three dimensional; spoken languages exist in an audio medium and are two dimensional. Sign languages function within both space and time, therefore requiring specific techniques for processing, and posing interesting challenges not encountered in the translation/recognition of spoken languages. A signer will use their whole body and the space around them in communication. The hands of a signer are the main production units of signs, but anywhere from above the head to about waist height on the body is used as place of articulation[8]. Some signs, such as the ASL (American Sign Language) sign for dog, may even exist below the waistline (see Figure 2.1 below). Signs can also be accompanied by a slight forward or backward lean of the body, which can add emphasis to words or indicate topic [24].

Figure 2.1: The ASL sign for “dog” [1]

2.1.2 Use of Time

Sign languages use the medium of time differently to spoken languages; signed utterances make use of both simultaneous and sequential processes to distinguish signs. The Movement-Hold model developed by Scott Liddell and Robert Johnson describes signs according to their sequential nature, as a series of movements and holds[19]. A sign may be described by a series of movements ending in a hold (MMMH), or moving from a hold to a hold (HMH), etc.
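Purely as an illustration (this representation is an assumption for exposition, not Liddell and Johnson's own formalism), a sign's segmental structure under the Movement-Hold model can be encoded as a string over {M, H}:

```python
# Each sign is represented by its Movement-Hold segment sequence:
# 'M' for a movement segment, 'H' for a hold segment.
# The sign names are invented placeholders.
SIGN_SEGMENTS = {
    "sign-a": "MMMH",  # three movements ending in a hold
    "sign-b": "HMH",   # hold, movement, hold
}

def segment_sequence(sign):
    """Return the list of Movement-Hold segments for a sign."""
    return list(SIGN_SEGMENTS[sign])
```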

Speech processes occur sequentially and can be separated within a time sequence, e.g. the word “bad” can be separated into its phonetic components “b-a-d”, which occur after each other in the course of time (i.e. sequentially). The phonetic components of signs, however, can occur simultaneously as well as sequentially. An example is the pair of ASL signs for mother and father, which are the same except for their location; mother is signed on the chin and father is signed on the forehead (Figure 2.2)[31]. This difference occurs simultaneously with the other components of the sign - those of movement and handshape. Sequentiality is also an important process, though, and should not be disregarded. An example of sequentially contrasting phonological processes is shown by the SASL signs for please and thank you; these two signs are the same, except that the sign for thank you has an extra hold after the other movements have occurred.

Figure 2.2: The ASL sign for “mother” on the left, and for “father” on the right

There are no clear boundaries between signs, i.e. where one sign ends and the next begins [32]. Boundaries cannot be determined by length, because signs vary in length. A change in movement, position, or hand shape does not indicate a change in sign, as such a change could still be part of one sign.

2.1.3 Classifier Predicates

Classifiers are morphological units which indicate the class to which an item belongs, i.e. they classify that item. Classifier predicates in sign languages function pronominally, i.e. the classified item will function as a pronoun [6]. Pronouns in Sign Languages are positioned as objects within the signing space that the signer can refer to; the term “classifier predicate” refers to this pronominal construct which a signer can manipulate to indicate various meanings such as size, location, movement, and other properties [15]. With classifier predicates, the signer can make use of the space around them to construct a 3D scene, manipulating referents within that scene in order to convey specific meaning. For example, an object can be depicted as travelling around another object, by making the sign for the first travel around the sign for the second. This adds greater complexity to the translation process.

2.1.4 Phonetics and Phonology

Sign Language Phonology is not well defined and has not been widely researched, compared with the phonology of spoken languages[31]. In spoken languages, a phoneme is the smallest contrastive unit of sound, distinguishing between minimal pairs. For example, the words bat and cat are distinguished by /b/ and /k/. Phonemes in sign languages have visual rather than auditory contrastive components. The previously mentioned example of the ASL signs for mother and father exemplifies this, distinguished by the location of the sign (chin or forehead). Phonemes are distinguished by features [12]; in sign languages these features are handshapes, locations, orientations and movements [31].
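To make the feature view concrete, here is a small sketch of a sign as a bundle of the four feature types; the specific feature values below are invented for illustration, with the mother/father pair differing only in location:

```python
from dataclasses import dataclass, fields

@dataclass(frozen=True)
class SignPhoneme:
    handshape: str
    location: str
    orientation: str
    movement: str

# Illustrative values only: the pair contrasts solely in location.
MOTHER = SignPhoneme("open-5", "chin", "palm-left", "tap")
FATHER = SignPhoneme("open-5", "forehead", "palm-left", "tap")

def contrasting_features(a, b):
    """Return the names of the features on which two signs differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]

print(contrasting_features(MOTHER, FATHER))  # ['location']
```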

Phonetics is related to phonology, but describes the physical structure of a sign, not only the features which contrast meaning. For spoken languages, phonology describes how sounds are perceived, while phonetics describes how sounds are physically created. Both phonetics and phonology use similar features to describe the discussed sound. An example of phonetic differences of sounds in spoken English is the difference between aspirated and unaspirated /p/. The aspirated sound ([pʰ]) occurs at the beginning of words such as pit; the unaspirated sound ([p]) occurs when the sound is made in the middle of a word, as in spit. The two sounds are perceived to be the same sound by the English speaker, but they are produced slightly differently; [pʰ] is made with a slight puff of air, while [p] is not.

Phonology is language specific, as it refers to how sounds (or signs) are perceived with reference to how they distinguish meaning between signs. If two different, but similar, handshapes are not used to contrast meaning in one sign language, they may not be seen as fundamentally different handshapes to signers of that language, and can be transcribed using the same symbol. However, a different language may use this difference in handshape to distinguish meaning, and therefore would require them to be distinguished in written form as well. For a sign language translation system to be universally applicable, a phonetic description of signs would be better than a phonological system.

Phonological Processes

When signs are produced continuously (as opposed to being produced in isolation), they undergo phonological processes which affect their production [19]. These processes include the following (a small illustrative sketch follows the list):

• movement epenthesis: adding a movement between signs to get from the final position of one sign to the initial position of the next

• hold deletion: final hold of first sign and initial hold of the next sign are deleted

• metathesis: units of a sign swap places

• assimilation: sign takes on characteristics of the sign before or after it
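As a minimal sketch of how two of these processes could be modelled over Movement-Hold strings like those shown earlier (the representation is again an assumption for exposition, not taken from [19]):

```python
def movement_epenthesis(first, second):
    """Insert a transitional movement segment between two adjacent signs."""
    return first + "M" + second

def hold_deletion(first, second):
    """Delete the final hold of the first sign and the initial hold of the next."""
    if first.endswith("H"):
        first = first[:-1]
    if second.startswith("H"):
        second = second[1:]
    return first, second

# hold_deletion("MMMH", "HMH") -> ("MMM", "MH")
```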

2.1.5 Variation

As was said before, sign languages are not standardised, and within one language different word forms may show dialectal differences between users. Even if the signs are the same, however, there will always be some variation in the way they are performed, whether by a different signer or the same signer. Sometimes these differences have meaningful components, such as making a sign smaller or bigger to indicate the size of the object being spoken of, or to emphasise the word; but even between signs with exactly the same meaning performed by the same signer, there are still statistical variations in the motion that is made [32].

Signs can be performed by the left hand or the right hand, depending on whether the signer is left handed or right handed. If the sign is a two-handed sign, the dominant and weak hands could be switched between left and right for a number of reasons, besides the handedness of the speaker. It could indicate a difference in meaning (such as indicating which side of the speaker the signed object was on), or could be due to the hand positions of the previous sign or the following sign, or could indicate topic and subject, etc.

2.1.6 Notations

Sign languages are very far behind spoken languages in becoming written languages. There is as yet no standard notation for sign languages, though various notations and forms of representation exist. Standards emerge with use, however, and this takes time. An analysis of the notation systems that exist is carried out in Chapter 3, with a focus on how useful each notation system is for machine translation.

2.2 Machine Translation

Machine translation is one of the oldest fields in computing, but it has advanced slowly due to the challenges posed by the complex nature of natural languages, and due to scarcity of funding at stages of its history when external interest was lacking[16]. Machine translation of spoken language has made significant progress by this point in time, but translation of sign languages remains in the early stages of development.

Figure 2.3: Machine Translation Architectures [14]

There are two main approaches to building machine translation systems: the rule-based approach, and the empirical approach. Early systems for spoken language translation were all rule-based; advancing research in artificial intelligence led to the incorporation of empirical methods[16]. The hybrid approach, combining empirical methods with a few grammatical rules, has proved to be the most successful[10].

2.2.1 Rule-based Systems

Rule-based machine translation can occur on three different levels, depicted in Figure 2.3.

Translation can occur on any of the three levels, the depth of linguistic information increasing with each level. The three different levels - direct, transfer and interlingua - require different levels of analysis of the source language, as shown by the labels on the left of the figure (i.e. morphological, syntactic, and semantic analysis), as well as different levels in the process of synthesizing the target language [14].
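As a toy illustration of the lowest (direct) level, translation is little more than a word-for-word dictionary lookup that keeps the source word order; the lexicon entries here are invented, and real direct systems also add morphological processing:

```python
# A toy direct-architecture translator: dictionary lookup, source word order kept.
LEXICON = {"the": "", "dog": "DOG-SIGN", "sleeps": "SLEEP-SIGN"}  # invented entries

def direct_translate(sentence):
    tokens = [LEXICON.get(word, word) for word in sentence.lower().split()]
    return " ".join(t for t in tokens if t)

print(direct_translate("The dog sleeps"))  # DOG-SIGN SLEEP-SIGN
```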

Grammars can be constructed in different ways. Some of those used by other sign language translation projects include tree adjoining grammars, head-driven phrase structure grammars and lexical functional grammars[14][25].

Rule-based systems are limited by the need for grammars to describe the language. Constructing grammars is a difficult process, as natural languages are infinite in nature and the rules that describe them often interact in complex ways[2]. Newly encountered processes require new rules to be added to the grammar. Furthermore, languages are inherently ambiguous; it is often very difficult for a computer to accurately parse a complex sentence[3].

There are many factors contributing to the difficulty of constructing grammars for sign languages. A grammar would need to be constructed by the researcher for each sign language that is to be translated. This requires an extensive knowledge of the linguistic structure of the sign language in question, something which is yet to be formalised for each sign language. Most sign linguistic studies have been conducted on ASL; there are some linguistic universals common to all or most sign languages (such as those mentioned in Section 2.1), but grammatical rules do not transfer across languages, therefore what is known about ASL grammar is not always directly usable in translation systems for other languages.

Furthermore, this approach is not easily scalable. It can and has worked well within narrow language domains that have limited variation in sentence structure[14]. However, in order to deal with more complex sentence structures, new rules need to be added, and rules for the interaction of the new rules and existing rules need to be added, which quickly becomes complex.

2.2.2 Empirical Systems

Empirical systems can be either example-based or statistical[10]. The empirical approach uses data (examples) rather than predefined rules; the system will construct its own rules using a knowledge base of previously encountered translations. Example-based translation compares the sentence/phrase to be translated with a knowledge base of previously translated sentences/phrases, building the translation from what it has encountered before. Statistical translation works with probabilities, using a corpus of parallel sentences to assign probability weights to words and phrases, and then using those weights to translate sentences/phrases[22].
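The underlying formulation is the standard noisy-channel model (a textbook statement, not specific to [22]): given a source sentence f, the system selects the target sentence e that maximises the product of the language model probability P(e) and the translation model probability P(f|e):

\[
\hat{e} \;=\; \operatorname*{arg\,max}_{e} \, P(e \mid f) \;=\; \operatorname*{arg\,max}_{e} \, P(e)\,P(f \mid e)
\]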

An empirical system is more easily transferable from one language pair to another than a rule-based system, as there is no need to create new rules; the system constructs a translation model from data, so to use the same system with another language pair one need only find a parallel corpus for that pair and allow the system to reconstruct translation models for it. This is complicated, however, by the fact that large corpora for sign languages do not exist. TAUS Labs recommends 800 000 sentence pairs as a minimum for a good translation system, and 1.2 million or more aligned sentences for difficult language pairs [10].

2.3 Machine Translation of Sign Languages

Machine translation of sign languages is complicated by the fact that there are essentially two steps to the process: the translation component, and either recognition or synthesis depending on the direction of translation. Speech recognition software functions within the same language but between two media (voice and text); language translation software, while creating a mapping from one language to another, usually functions within the same medium (text to text). Sign language translation would need to include both a natural language processing component and a computer vision component for recognition/synthesis of visual signs, functioning between two media (video and text) as well as two languages (the signed language and the spoken language).

Translation can occur in two directions, from spoken to signed language or vice versa, and each direction has its own challenges. Translation into sign language lets native signers access information from the world around them, which is mostly geared towards spoken language users. However, there can be no interaction with that world if signers have no way of communicating in the opposite direction, which requires a translation system that also functions in the opposite direction. Most of the systems that exist focus on TTS (Text-To-Sign), and are still mostly small-scale, functioning on limited datasets and focusing on the image processing component rather than the natural language processing component.

2.3.1 Text-To-Sign Systems

TEAM This system makes use of a synchronous lexicalised Tree Adjoining Grammar (TAG), storing pairs of words and signs in a dictionary. TEAM translates using a transfer architecture on a syntactic level[33].

ZARDOZ This system's approach to translation is to use a language-independent event schema for translation, which is essentially an interlingua, the highest translation level in the pyramid (see Figure 2.3)[29]. The higher the translation level, however, the more restricted the domain of the translation system.

ViSiCAST The ViSiCAST system describes two major stages of TTS: semantic analysis of English text, producing some semantic-based representation, and translation from this to a representation that can be used to render an animation [25]. The tools used in this process are: a CMU Link Parser, Discourse Representation Theory, and a Head-Driven Phrase Structure Grammar.

ASL Workbench This system uses a lexical-functional grammar and a sophisticated phonological model, but requires user input at most stages of translation[14].

Morrissey Sara Morrissey outlines a prototype example-based machine translation (EBMT) system for the translation of sign languages [22]. The system translates between the following language pairs:

• Irish Sign Language (ISL) and German (DE)
• German Sign Language/Deutsche Gebärdensprache (DGS) and English (EN)
• DGS and DE
• ISL and EN

Morrissey’s system focuses on the domain of travel dialogue, and transcription is done using the gloss transcription method. The system used for translation is Ma- TreX, which is a wrapper around the Moses translation system [22].

2.3.2 Sign-To-Text Translation

STT (Sign-To-Text) systems face the challenge of computer vision processes: extracting features from video data and representing them in some way before using the representation in a translation model.

STT systems focus on the recognition component of translation. This component involves feature analysis of video images[31]. Features extracted from the video are matched to phonetic or phonemic features of signs. The recognised features are mapped to a representation, which is then processed into a spoken language using translation techniques.

The method commonly used for recognition is hidden Markov models. This technique has been improved upon by Christian Vogler, who suggested that parallel hidden Markov models be used, aiding in the description of classifier predicates[31]. Vogler also recommends a phonetic rather than phonemic approach to recognising signs, because phonemes do not always fully describe a sign.
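For readers unfamiliar with the technique, the core of HMM-based recognition is Viterbi decoding, sketched below over plain dictionaries; parallel HMMs run several such chains over independent feature channels (e.g. each hand) and combine the probabilities. The state and observation values would come from the sign model and the vision front end, and are not specified here:

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for an observation sequence."""
    V = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[-2][p] * trans_p[p][s] * emit_p[s][obs], p) for p in states
            )
            V[-1][s] = prob
            back[-1][s] = prev
    # Trace the best path backwards from the most probable final state.
    state = max(V[-1], key=V[-1].get)
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.insert(0, state)
    return path
```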

After the computer vision component, there ought to be a natural language processing component to the translation system. Current systems are often rule-based after the initial recognition step; rule-based approaches, however, are still somewhat limited to small datasets or specific domains. Statistical methods are harder to apply due to the lack of a written form for the signs: video data is incompatible with existing translation systems without a suitable transcription. A corpus-based approach using statistical methods could optimise the translation process, but a written form of signs is necessary.

Chapter 3

Notations

As previously stated, no standard written form for sign languages exists. Some of the representation systems that have been created and are currently being used in various ways are described below. Using a notation is useful in sign language translation to enable easy distribution of data and to make the translation system scalable. Current systems only focus on limited domains, such as one topic domain (e.g. weather news broadcasts[30], travel dialogue[22]) or a subset of sentence types (e.g. simple statements, wh-questions). Use of a notation in an intermediary step may aid in making systems more scalable, although this may depend on the notation used.

Each new project conducted by different researchers starts the entire process from the first step, gathering data for translation. It is difficult to share data between research projects due to the fact that each project builds its corpus with a particular research focus, using a particular language pair. This is further complicated by the lack of standards within the field. No standard notation means that each research project must decide on a notation system for that project. Gloss is the most common system used, but it has its own limitations. If a standard notation system emerges, it can aid in data sharing, and new projects can build on older projects, thereby enabling the projects and the field to progress more rapidly.

3.1 Stokoe

The Stokoe system was developed by William Stokoe in 1960[19]. It is phonetically based, and was the first notation to be produced for sign languages; most other notations are based on Stokoe's work[21]. Stokoe introduced the concept of segmenting signs into phones or phonemes.

Stokoe describes signs with the following aspects:

• Hand configuration: determined by the active hand, denoted designator (dez)

• Place of articulation: denoted tabula (tab)

• Movement: the action of the sign, denoted signation (sig)

Figure 3.1: An Example of the Stokoe system - an extract from Goldilocks and the Three Bears[21]

The development of the Stokoe system advanced the field of sign linguistics significantly, but it is now outdated. The Stokoe system is seen as inadequate to represent sign language, as it has no way of indicating nonmanual features such as facial expressions. The system was created as a tool for linguists to describe signs, but is not necessarily useful for anything else (and indeed is no longer used for its primary purpose, due to its lack of nonmanual feature representation). It is primarily a hand-written system, although an ASCII form has been developed [20]. This ASCII system is described in the literature, but no electronic data created with it could be found. Stokoe's system remains largely a hand-written system.

3.2 Gloss

Gloss notation represents a sign using a word stem from a spoken language[7].

Using gloss notation, one can describe signs with any degree of detail that one requires. This can be particularly useful for describing nonmanual components, emphasis, classifier predicates, etc. However, this also means that data described using gloss notation is variable. There are no standards for transcribing data using gloss, and without guidelines each dataset will be transcribed differently if transcribed by different people. This has implications for statistical translation; greater variability will affect the weightings assigned by the translation system.

Figure 3.2: An Example of Gloss Notation [22]

One advantage of using gloss for machine translation is that the transcribed signs will already be in words from the target language, making it easier to map between the two languages. However, this is only true for that particular language pair. To have this close match using the same sign language and a different spoken language would require the data to be transcribed again. It is possible to translate between sign and spoken language when the sign language data is transcribed with a different spoken language gloss, but a close match provides a more accurate and efficient translation[22].

3.3 Hamburg Notation System

The Hamburg Notation System (HamNoSys) is a phonetically based notation system that was developed by Siegmund Prillwitz in 1984[11]. This system, like most representation systems, was initially handwritten, but a machine readable font is available from the University of Hamburg (http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/hamnosys-97.html). An XML encoding of HamNoSys called Signing Gesture Markup Language (SiGML) is also available; it was developed for the ViSiCast project by Richard Kennaway[17].

Some advantages of this system are that it is international (can be used for any sign language), it is iconic, it is adaptable, has a formal syntax and can be stored in a computer database[11]. However, it does not provide any easy way to describe non-manual features, such as facial expressions. This notation was developed for a linguistic description of signs, not to be used in any form of communication[11].

Figure 3.3: An Example of HamNoSys using the same passage as in Figure 3.1

3.4 SignWriting

SignWriting (SW) is a system developed by Valerie Sutton, a dancer who first developed Sutton DanceWriting to annotate movement[27], and then afterwards developed SW to annotate gestures in sign languages. SignWriting was developed for communication purposes rather than linguistic purposes; the representations of signs are deliberately intuitive, with the intention being that the script will be used by signers themselves, not only the linguists studying sign language. Furthermore, the notation was not developed for only one language, but was built to be appropriate for any sign language. A vast number of sign languages are already making use of the script. The goal of SignWriting is to enable signers to be literate in their first language, not requiring them to learn another language in order to read and write[28].

SW is a pictorial notation system and can describe non-manual features. It makes use of a set of symbols that can be combined to describe any sign. Though standardisation efforts are being put into place, the system is still flexible; if a language cannot describe a sign with the available symbols, it is possible to add to the set. Additions are not unlimited; if two signs can be described with the same symbol without losing understanding, there is no need for a new symbol. For example, in the English language written using the Latin alphabet, the same letter ’c’ can be used to describe the sound made in the word script and the sound made in the word cite.

Figure 3.4: An Example of SignWriting, with the same passage as in Figure 3.1

The SignWriting community exists both online and offline. This means that the script is not solely handwritten, but is used extensively both on computer and on paper. A markup language exists to describe texts written in SignWriting (SignWriting Markup Language, or SWML). There are also a number of ways to represent individual signs that are machine readable. Stephen Slevinski has developed the SignWriting Image Server (SWIS), which encodes the SignWriting script using a mathematical encoding known as Modern SignWriting (MSW)[26].

Figure 3.5: An example of a sign in the MSW encoding [26]

3.5 Other Representation Systems

The following representation systems are less widely used than those previously men- tioned.

Berkeley Transcription System The Berkeley Transcription System (BTS) was developed to aid linguistic analysis of sign languages at the level of meaning components [13]. The system was built by Nini Hoiting and Dan Slobin because other transcription systems at the time (2001) were found to be inadequate for the research they wished to conduct on sign languages. They aimed to capture the linguistic and communicative elements used by signing deaf children and their parents in order to be able to research sign language acquisition. The system was inspired by CHILDES (CHIld Language Data Exchange System, http://childes.psy.cmu.edu/), which provides a format for child language researchers to share data. BTS is encoded in ASCII. Data that has been transcribed using BTS is a collection of conversations between signing children and their deaf or hearing parents.

Si5s Si5s is a writing script for American Sign Language. It is newly developed (2003), and has not (yet) been extended to any other sign language. It was developed specifically for ASL, but that does not restrict it to ASL; it may be possible to extend the script beyond ASL. This script is still quite new.

ASLSJ ASLSJ (American Sign Language - Sign Jotting) was developed by Thomas Stone as a written aid for his own learning of ASL. The system does not follow a phonetic model as most other systems do, although there is an element of phonetics to it. The aim of the system is not to describe signs fully, but rather to provide a way to quickly write down signs on paper (“jotting” them down, as it were) in a way that only requires one sign to be distinguishable from another. Stone states that he “felt this word, jotting, more accurately described [his] way of - for lack of a better simile - sounding out sign words”. An example passage is shown in Figure 3.6.

Figure 3.6: A passage of the Bible in ASLSJ

SLIPA This system was created to provide an International Phonetic Alphabet (IPA) for sign languages (hence Sign Language IPA, or SLIPA). It is not widely used.

SignFont This is a transcription system that is similar to SignWriting in that it is iconic, but does not have much acceptance and it is not widely used [9]. It has a small symbol set.

SignPhon SignPhon is a database that describes signs phonologically and in isolation. The database uses an encoding to describe signs, rather than developing a new script, in order to keep it notation-neutral[5].

These are only a sample of the notation systems that have been and are being developed to describe sign languages. Each script was developed with a particular purpose in mind. The glossing system and the earlier scripts, Stokoe and HamNoSys, were developed to describe signs linguistically. Some of the minor representations were built for this same use, e.g. BTS and SignPhon. SignWriting, on the other hand, was developed as a script that signers themselves could use as a writing system. This is also the focus of Si5s and ASLSJ. The scripts that focus on describing signs linguistically can aid machine translation by the fact that they include relevant information. The scripts that focus on providing a written form for sign languages are also useful for machine translation, as they will describe signs enough to distinguish meaning. The major benefit of these representations is that if they are being used by signers then signs are already written down, reducing the enormity of the task of getting data into a transcribed form before using it to build translation systems.

A summary of these notation systems is shown in Table 3.5.

With this analysis it can be concluded that the best notation system for translation is one that signers themselves are using. This rules out the Stokoe system completely. The glossing system is used by bilingual signers; most signers are bilingual by necessity, learning to read and write in the closest geographical spoken language. Glossing is problematic, though, due to its lack of standards and its restriction of language pairing. The minor transcription systems are mostly fairly new, and not widely known. BTS is a linguistic description of signs, not a written representation. Si5s is deliberately a written representation, but it is a new system and is not yet widely used. The remainder of the systems are project specific or developed by individuals, and not widely used, or at least not yet.

It is not the aim of this research to decide on which notation system should be used by all signers and sign language research. Each script has its advantages and disadvantages, and most are being continuously changed and improved upon. The lack of existing data in certain scripts is a disadvantage for machine translation, and it is the aim of this research to provide a recommendation on scripts to use for machine translation. However, the result will need to be continuously re-evaluated. Currently the representation most widely used by signers themselves is SignWriting. However, this is subject to change. Si5s, for example, has the potential to be adopted as a script due to its ease of use. If reanalysis of transcription systems results in a different system being recommended, then it is always possible to create data using that system, so that existing data becomes available in it.

Gloss notated data seems to be research specific. Stokoe is outdated and no longer used. HamNoSys is possibly the most widely used at this stage (besides gloss notation), but data is not easily attainable. SignWriting has proved to be the most accessible notation format for this research. There are a number of reasons to use SignWriting, one of them being that corpora are not only being added to by many people for various research reasons, but a corpus already exists. It is still in developmental stages, and there are a number of errors in the corpus data[26], but these are being improved on for the simple reason that SignWriting is being used. Moreover, SignWriting is not only being used by linguists or computer scientists conducting research on sign languages; it is being used by signers themselves. SignWriting may have its limitations, but it is the most advanced and widely-used system that exists.

Chapter 4

Corpus Construction and Translation

In order to assess usability of notations with experiential as well as theoretical reference, a basic machine translation system was implemented. The task involved gathering textual data, building a parallel corpus, cleaning and preparing the data, and running it through the translation system. Complete success was not expected; the goal of this process was to gain experiential knowledge of the limitations of different notations.

4.1 Data Requirements

All data is time-consuming to create. The 595 sentences used in [22] took one person three months to transcribe using gloss notation. Most notations require familiarity with the system in order to use it. This means that building a translation system from scratch is no easy task and requires a great deal of work. The researcher needs to familiarise him or herself with the script as well as with the sign language, gather data, take the time to transcribe that data, build the data into a corpus, clean and prepare the data for translation, build the system to be used for translation (which will involve another fair number of tasks), run the corpus through the system, and finally analyse the results before tuning the system and reiterating over the process.

It would be more time-efficient if data from previous projects were accessible and usable by other researchers so that new research does not need to start from the ground up each time. More time to focus on other areas of the process means that more progress can be made. New systems need to build on what exists; the fact that the data has to be produced each time for specific research projects by the researchers themselves means that

it very often restricts the focus of the project. This is a particular limitation of systems that use gloss notation, as the same data would need to be re-transcribed into the relevant spoken language in order to be reusable.

If there is more than one reason to build a corpus, it is more likely to be built. A corpus that is constructed for one research project with a particular research focus might not have any relevance to other research projects, and therefore getting more people involved in the corpus construction is difficult. However, if the corpus is reusable, and constructed with a broader aim, the corpus can grow much more quickly with contributions from other people’s research.

4.2 Acquiring Data

The original aim of this research was to gather data in a number of notations or representations, and run them through a rudimentary translation system in order to compare the results between each of them. However, obtaining sufficient data for anything other than SignWriting proved difficult. Any other data that does exist is not easily accessible, or is not yet available. The database of SignWriting data, however, is being added to every day, and is freely available on the Internet. From the SignWriting website it was possible to find a large portion of the Bible translated into ASL and transcribed in SignWriting, easily downloadable from the ASL Bible SignPuddle (http://www.signbank.org/signpuddle2.0/export.php?ui=1&sgn=28). This was the largest available dataset, and provides verse-aligned parallel data. This dataset will be discussed in more detail in the next section.

Other datasets that are available include a few children's stories and some poetry; these, however, were not used, as there was not a large amount of such data, and the language structure is not appropriate for training a translation system.

A project is currently in progress to translate Wikipedia pages into ASL, transcribed using SignWriting (http://ase.wikipedia.wmflabs.org/wiki/Main_Page). At the time of this research there were not many pages available on the ASL Wikipedia; initially only one page (the title page) had been translated, but by the time of writing, five other articles had been translated into ASL. This is likely to grow, and may prove to be a useful resource for gathering data for future projects. Another resource for SW data is The Frost Village Blog (http://frostvillage.com/blog/lang/ase/), which is written in both English and ASL using SignWriting, displaying the signs using the SignWriting Image Server (SWIS) rather than pictures, i.e. the signs are machine readable. This blog also allows readers to write comments using SignWriting. The advantage of this resource is that it contains conversational data. One disadvantage is that the data is paragraph aligned, not sentence aligned.

Another intended aim of this project was to focus on translating from SASL. The Linguistics department at Rhodes University is conducting research into SASL, and a group of students videoed 63 sentences signed by four different signers. The sentences were chosen according to specific research questions that each student expected to investigate linguistically. They were to transcribe these sentences using ELAN software and gloss notation. However, transcription is a time-consuming process, and only a small portion of the sentences have been transcribed - not enough for a translation system. This data was therefore not used for comparison within this project.

A survey of sign language corpora has been conducted, available from the University of Hamburg (http://www.sign-lang.uni-hamburg.de/dgs-korpus/index.php/sl-corpora.html). It includes a list of corpus projects and what is available. Many of the projects are still in progress, and will only be available in the future. The corpora are all video corpora, and most of them are transcribed with glossing systems, if they are transcribed.

4.3 Data Cleaning and Preparation

The ASL Bible corpus was easy to obtain, and consists of an XML file with verses from the Bible encoded in MSW, aligned with an English translation. The data needed to be extracted from the XML, aligned on a sentence level, and placed into two plain-text files in UTF-8 encoding. Sentence-aligned data was required by the machine translation system used for this research (discussed in the next section). The ASL Bible is aligned by Bible verses, which may be complete sentences, but may also be fragments, or more than one sentence on one line. This is not ideal for translation. Scripts have been written to automatically align parallel data by sentence using statistical methods, but these scripts are language specific; an ASL/English script would need to be written. Such a script was not found, and it was not possible to write one because the source language (ASL) was unfamiliar to the researcher. The data was sentence aligned as far as possible, by removing any lines that contained sentence fragments or too many sentences in one line.
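As an illustration of the extraction step, the sketch below pulls verse pairs out of the XML and writes the two UTF-8 plain-text files. The tag names entry, asl and english are hypothetical stand-ins; the actual SignPuddle export schema is not reproduced here:

```python
import xml.etree.ElementTree as ET

def extract_parallel(xml_path, asl_path, en_path):
    """Write verse-aligned ASL (MSW strings) and English files, one verse per line."""
    root = ET.parse(xml_path).getroot()
    with open(asl_path, "w", encoding="utf-8") as asl_file, \
         open(en_path, "w", encoding="utf-8") as en_file:
        for entry in root.iter("entry"):          # hypothetical tag name
            asl = entry.findtext("asl")           # hypothetical tag name
            english = entry.findtext("english")   # hypothetical tag name
            if asl and english:
                asl_file.write(asl.strip() + "\n")
                en_file.write(" ".join(english.split()) + "\n")
```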

Because of the unfamiliarity of the source language, the data was likewise not easy to clean. It was not possible to judge the adequacy of the ASL data, or whether any lines ought to be removed or cleaned, based on what was found in the ASL sentences. If any lines were unsuitable for reasons on the ASL side of the data, they were not found; the data had to be cleaned by reading through the English sentences only and marking any uncertain lines. These lines were then simply removed from the corpus.

Some examples of lines that were removed:

Old English: “He that walketh uprightly walketh surely: but he that perverteth his ways shall be known.”

Song lyrics - some song lyrics found in the raw data often contained repetitions, sentence fragments, or whole paragraphs of sentences: “In my plans I was the first to leave - precious child, precious child. But in this world I was left here to grieve - precious child, precious child.”

Genealogies - long lists of names, not relevant to machine translation, as they are often only seen once, and for sign language they are fingerspelled: “Er was the son of Joshua. Joshua was the son of Eliezer. Eliezer was the son of Jorim...”

Descriptions - some verses were aligned with word-for-word translations or descriptions of the signs rather than an English translation: “His (looking up) bloodshed-over-me save me (2 hands), you-all (2 hands).”

Lines that began with “Verse x” were not removed, but the verse marker was stripped from both the English and the ASL data. This was done with a script that searched the English data for lines that started with the word “Verse”, or that had a colon within the second word (e.g. if the line started with “James 1:2”), and then stripped the marker from the corresponding lines in both files. This cleaned most occurrences of verse markers, but some were not found and stripped. An example would be a verse beginning with a book and chapter marker rather than a verse marker, e.g. “Psalm 91”.
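A sketch of the stripping rule as just described, for the English side (the corresponding handling of the ASL lines is omitted, since the marker format inside the MSW data is not shown here):

```python
def strip_verse_marker(line):
    """Drop a leading marker such as 'Verse 12' or 'James 1:2' from a line."""
    words = line.split()
    starts_with_verse = len(words) > 1 and words[0] == "Verse"
    has_chapter_verse = len(words) > 1 and ":" in words[1]
    if starts_with_verse or has_chapter_verse:
        return " ".join(words[2:])
    return line

print(strip_verse_marker("James 1:2 Count it all joy"))  # "Count it all joy"
```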

Ideally, the corpus should be cleaned more thoroughly by going through it line by line, looking at both the English and the ASL and removing any unwanted data. If the corpus is very large, this is not possible. The corpus used in this research is not large, though it is not small either, but there were further limitations due to time constraints and the fact that the researcher was not familiar with the source language. Errors will occur in corpus data; the best that can be done is to minimise them as far as possible.

The original XML file was 87950 lines long; after extraction and cleaning, the resulting corpus has 4275 parallel lines in English and ASL. Though this is more than the number of sentences used in previous translation systems (Morrissey used fewer than 600 [22]), it falls far short of the recommended number: 800 000 aligned sentences for a good translation system, and upwards of 1.2 million aligned sentences for difficult language pairs[10]. There is therefore scope for improvement in the corpus collection.

4.3.1 Language model

For the language model, a monolingual corpus of English sentences was built. These sentences were also taken from the Bible, so that the language domain was consistent. This means that the monolingual corpus is also verse- rather than sentence-aligned. The data was acquired from the World English Bible (http://ebible.org/web/).
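Moses builds the n-gram language model itself; purely to illustrate what a language model estimates, here is a minimal bigram model with unsmoothed maximum-likelihood probabilities (real systems use higher orders and smoothing):

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Count context unigrams and bigrams over boundary-padded token streams."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens[:-1])              # contexts only
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def bigram_probability(w1, w2, unigrams, bigrams):
    """Unsmoothed maximum-likelihood estimate of P(w2 | w1)."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0
```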

4.4 Machine Translation

The statistical machine translation system used in this research is Moses[18], which uses statistical methods to create a translation model from a sentence-aligned parallel corpus of two different languages. The Do Moses Yourself (DoMY, http://www.precisiontranslationtools.com) program was used, which automates the Moses installation process on 64-bit Ubuntu and provides scripts which are wrappers around the Moses scripts. DoMY does no processing of its own; all the processing is done by the Moses system.

The steps in the translation process are as follows:

clean-lm, clean-tm Cleaning involves word segmentation and removal of unrecognised characters. The clean-lm command takes as input a monolingual corpus in the target language; the clean-tm command takes as input parallel corpus files in the source and target languages.

build-lm, build-tm These commands create build sets from the corpora cleaned in the previous step.

train This command builds a language model and a translation model from the build sets from the previous step.

translate At this step, the system runs through the translation process.

The system divides the data into training and testing sets according to a given ratio, trains the models on the first set, and then runs the models over the testing set. The results are evaluated with a BLEU and a NIST score.
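DoMY performs this split internally; the idea is simply a ratio-based random partition of the parallel corpus, sketched here with an assumed 90/10 ratio:

```python
import random

def split_corpus(sentence_pairs, train_ratio=0.9, seed=42):
    """Shuffle the parallel corpus and split it into training and testing sets."""
    pairs = list(sentence_pairs)
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_ratio)
    return pairs[:cut], pairs[cut:]
```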

The resulting BLEU score was fairly low, at 0.0845. On a second run the BLEU score was 0.0924, which is still fairly low. The translation output includes a number of words that are not translated, possibly words that only occurred once in the corpus and therefore did not have a mapping assigned to them. This would affect the BLEU score. Apart from the words that were not translated, however, the resulting sentences were readable. For example, Figure 4.1 shows the sign language for Luke 3:8, whose reference translation is

That means nothing, for I tell you, God can create children of Abraham from these very stones.

The translation produced

that means nothing, for I tell you, God can the children of Abraham from these very stones.

Figure 4.1: Luke 3:8 in American Sign Language

Chapter 5

Conclusion

Translation of sign languages is possible, but the steps in translating them still have scope for much improvement. This research has focused on the intermediary step between sign recognition and spoken language synthesis, analysing sign language notation systems and how they can be used to optimise the translation process. It was found that SignWriting provides a good notation for translation, though the other systems have not been completely ruled out and may still be used successfully in translation systems. SignWriting has its disadvantages, but its advantages make it stand out from the rest of the systems.

The advantages of the SignWriting system are, in summary: it describes all the relevant features of signs, it is accessible to native signers, it is universal and is widely used, it has a machine readable format which is continuously being improved upon, and data in SignWriting is easily available.

A corpus of SignWriting data was constructed, cleaned, and translated using a statistical translation tool (Moses). The results obtained were comparable to previous translation systems, showing that this is indeed an adequate method to use in translating sign languages.

5.1 Future work

There is great scope for improvement within every step of the translation of sign languages. A large corpus of data is essential, and future work can look at building such a corpus, and standardising the format of the data that is gathered.


This research has not included sign recognition to, or sign synthesis from, notation systems. Further research could build a system which maps features extracted from video to SignWriting or another notation.

Statistical translation involves a number of other processes that can be automated with scripts, for example word segmenters and sentence aligners.

Bibliography

[1] American sign language university.

[2] Berger, A. L., Pietra, V. J. D., and Pietra, S. A. D. A maximum entropy approach to natural language processing. Comput. Linguist. 22, 1 (Mar. 1996), 39–71.

[3] Bird, S., Klein, E., and Loper, E. Natural language processing with Python. O’Reilly Media, Incorporated, 2009.

[4] Crasborn, O., Mesch, J., Waters, D., Nonhebel, A., van der Kooij, E., Woll, B., and Bergman, B. Sharing sign language data online: Experiences from the echo project. International journal of corpus linguistics 12, 4 (2007), 535–562.

[5] Crasborn, O., van der Hulst, H., and van der Kooij, E. Signphon: A phonological database for sign languages. Sign language & linguistics 4, 1-2 (2001), 1–2.

[6] Crystal, D. Dictionary of Linguistics and Phonetics. The Language Library. John Wiley & Sons, 2011.

[7] Dreuw, P., Rybach, D., Deselaers, T., Zahedi, M., and Ney, H. Speech recognition techniques for a sign language recognition system. In Eighth Annual Conference of the International Speech Communication Association (2007).

[8] Friedman, L. A. Space, time, and person reference in american sign language. Language 51, 4 (1975), pp. 940–961.

[9] Grieve-Smith, A. Signsynth: A sign language synthesis application using web3d and perl. Gesture and Sign Language in Human-Computer Interaction (2002), 37–53.

[10] Haddow, B. Principles of machine translation. Online, July 2012.


[11] Hanke, T. HamNoSys - representing sign language data in language resources and language processing contexts. In Workshop on the Representation and Processing of Sign Languages on the occasion of the Fourth International Conference on Language Resources and Evaluation, Lisbon, Portugal (2004).

[12] Harms, R. Introduction to phonological theory. Prentice-Hall, 1968.

[13] Hoiting, N., and Slobin, D. Transcription as a tool for understanding: The berkeley transcription system for sign language research (bts). Directions in sign language acquisition (2002), 55–75.

[14] Huenerfauth, M. A survey and critique of american sign language natural language generation and machine translation systems. Tech. rep., Technical Report MS-CIS-03-32, Computer and Information Science, University of Pennsylvania, 2003.

[15] Huenerfauth, M. Generating american sign language classifier predicates for english-to-asl machine translation. PhD thesis, University of Pennsylvania, 2006.

[16] Hutchins, W. J. Machine translation. In Encyclopedia of Computer Science. John Wiley and Sons Ltd., Chichester, UK, pp. 1059–1066.

[17] Kennaway, R. Synthetic animation of deaf signing gestures. Gesture and Sign Language in Human-Computer Interaction (2002), 149–174.

[18] Koehn, P., Hoang, H., Birch, A., Callison-Burch, C., Federico, M., Bertoldi, N., Cowan, B., Shen, W., Moran, C., Zens, R., et al. Moses: Open source toolkit for statistical machine translation. In Annual meeting-association for computational linguistics (2007), vol. 45, p. 2.

[19] Liddell, S., and Johnson, R. American sign language: The phonological base. Gallaudet University Press, Washington, DC, 1989.

[20] Mandel, M. Ascii-Stokoe: A computer-writeable transliteration system for stokoe notation of american sign language, 1983.

[21] Martin, J. About the stokoe system.

[22] Morrissey, S. Data-driven machine translation for sign languages. PhD thesis, Dublin City University, 2008.

[23] Morrissey, S., and Way, A. An example-based approach to translating sign language.

[24] Ong, S., and Ranganath, S. Automatic sign language analysis: A survey and the future beyond lexical meaning. Pattern Analysis and Machine Intelligence, IEEE Transactions on 27, 6 (2005), 873–891.

[25] Sáfár, É., and Marshall, I. The architecture of an english-text-to-sign-languages translation system.

[26] Slevinski, S. E. Modern SignWriting. Online, February 2012. Available from: https://github.com/Slevinski/msw.

[27] Sutton, V. About DanceWriting.

[28] Sutton, V. Why write sign language?

[29] Veale, T., and Conway, A. Cross modal comprehension in zardoz, an english to sign-language translation system. In Proceedings of the Seventh International Workshop on Natural Language Generation (Stroudsburg, PA, USA, 1994), INLG ’94, Association for Computational Linguistics, pp. 249–252.

[30] Verlinden, M., Tijsseling, C., and Frowein, H. A signing avatar on the www. Gesture and Sign Language in Human-Computer Interaction (2002), 175–199.

[31] Vogler, C. American sign language recognition: reducing the complexity of the task with phoneme-based modeling and parallel hidden markov models.

[32] Vogler, C., and Metaxas, D. Asl recognition based on a coupling between hmms and 3d motion analysis. In Computer Vision, 1998. Sixth International Conference on (1998), IEEE, pp. 363–369.

[33] Zhao, L., Kipper, K., Schuler, W., Vogler, C., Badler, N., and Palmer, M. A machine translation system from english to american sign language. Envisioning Machine Translation in the Information Future (2000), 191–193.