Repositorio Institucional De La Universidad Autónoma De Madrid
Total Page:16
File Type:pdf, Size:1020Kb
Repositorio Institucional de la Universidad Autónoma de Madrid https://repositorio.uam.es Esta es la versión de autor de la comunicación de congreso publicada en: This is an author produced version of a paper published in: Computer Speech and Language 28.3 (2014): 788-811 DOI: http://dx.doi.org/10.1016/j.csl.2013.10.003 Copyright: © 2014 Elsevier B.V. All rights reserved El acceso a la versión del editor puede requerir la suscripción del recurso Access to the published version may require subscription Accepted Manuscript Title: A rule-based translation from written Spanish to Spanish Sign Language glosses Author: Jordi Porta Fernando Lopez-Colino´ Javier Tejedor Jose´ Colas´ PII: S0885-2308(13)00086-7 DOI: http://dx.doi.org/doi:10.1016/j.csl.2013.10.003 Reference: YCSLA 626 To appear in: Received date: 1-2-2012 Revised date: 26-8-2013 Accepted date: 11-10-2013 Please cite this article as: Porta, J., Lopez-Colino,´ F.,A rule-based translation from written Spanish to Spanish Sign Language glosses, Computer Speech & Language (2013), http://dx.doi.org/10.1016/j.csl.2013.10.003 This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain. *Manuscript 1 2 3 A rule-based translation from written Spanish to Spanish Sign Language glosses 4 5 6 Jordi Porta1, Fernando L´opez-Colino, Javier Tejedor, Jos´eCol´as 7 Human Computer Technology Laboratory 8 Escuela Polit´ecnica Superior – Universidad Aut´onoma de Madrid 9 Av. Francisco Tom´as y Valiente, 11. 28049 Madrid, Spain 10 11 12 13 14 15 Abstract 16 One of the aims of Assistive Technologiesis to help people with disabilities to communicate with others and to provide 17 means of access to information. As an aid to Deaf people, we present in this work a production-quality rule-based 18 machine system for translating from Spanish to Spanish Sign Language (LSE) glosses, which is a necessary precursor 19 to building a full machine translation system that eventually produces animation output. The system implements a 20 21 transfer-based architecture from the syntactic functions of dependency analyses. A sketch of LSE is also presented. 22 Several topics regarding translation to sign languages are addressed: the lexical gap, the bootstrapping of a bilingual 23 lexicon, the generation of word order for topic-oriented languages, and the treatment of classifier predicates and 24 classifier names. The system has been evaluated with an open-domain testbed, reporting a 0.30 BLEU (BiLingual 25 Evaluation Understudy) and 42% TER (Translation Error Rate). These results show consistent improvements over a 26 statistical machine translation baseline, and some improvements over the same system preserving the word order in 27 the source sentence. Finally, the linguistic analysis of errors has identified some differences due to a certain degree of 28 structural variation in LSE. 29 Keywords: Machine Translation, Spanish, Spanish Sign Language, LSE, Deaf People Communication 30 31 32 33 34 1. Introduction 35 36 Translation helps people to communicate across linguistic and cultural barriers. However, according to Isabelle 37 and Foster (2005), translation is too expensive, and its cost is unlikely to fall substantially enough, to constitute it as a 38 practical solution to the everyday needs of ordinary people. Although it remains to be seen if machines will ultimately 39 compete seriously with humans in translation, machine translation can help break linguistic barriers and can make 40 translation affordable to many people. This situation is especially important for Deaf people, since translation helps 41 Deaf and Hearing communities to communicatewith each other, and provides Deaf people with the same opportunities 42 to access information as everyone else. 43 44 1.1. Sign Languages 45 Sign Languages (SLs) exploit a different physical medium from the oral-aural system of spoken languages. SLs 46 are gestural-visual languages, and this difference in modality causes SLs to constitute another branch within the 47 typology of languages. However, there are still many myths around SLs. One of the most common and enduring 48 myths is that sign language is universal; however, every Deaf community has its own SL, even within the same 49 country. For example, in Spain, apart from Spanish Sign Language (LSE), there exists another recognised SL, known 50 Accepted Manuscript 51 as Catalan Sign Language (LSC) and used within the Catalan Deaf community. Another common myth is that there 52 is a correlation between spoken and SL families. American Sign Language (ASL) and British Sign Language (BSL), 53 however, despite the fact that both are SLs used in English-speaking countries, are mutually unintelligible. Sign 54 55 Email addresses: [email protected] (Jordi Porta), [email protected] (Fernando L´opez-Colino), [email protected] (Javier 56 Tejedor), [email protected] (Jos´eCol´as) 57 1Corresponding Author. Tel.: +34 91 497 33 64 / Fax: +34 91 497 22 35 58 59 Preprint submitted to Computer Speech and Language August 26, 2013 60 61 62 63 64 Page 1 of 27 65 1 2 3 languages do not derive from spoken languages, but, as any other languages, can be influenced by contact with other 4 languages. As with spoken languages, when the use of an SL is extended, dialects and varieties are developed. 5 In other words, SLs are natural languages that arise spontaneously in Deaf communities to fulfil the function of 6 communication. 7 Phonocentrism is a view in which speech is considered to be superior to, or more natural than, written language. 8 ff 9 This attitude, which is still dominant in western culture, has negatively a ected the consideration of sign languages, 10 adding to their status of minority languages the status of minorised languages, i.e., languages whose value is not 11 recognized on the interactional scene by speakers of a sociolinguistically dominant language. This also encourages 12 the assumption that speakers of the minorised language conform to the usage and interactive norms set by their 13 interlocutors. Nevertheless, scientific claims regarding the status of SLs as “real” human languages have been made 14 since the work of W. Stokoe on the ASL (Stokoe, 1960). SLs in developed countries, especially SLs in Europe 15 and North America, dominated research during the first decades of study. Currently, language typologists still have 16 some difficulties accessing research on a number of regions like Central and South America, Africa, or Asia because 17 most publications are written in national languages not accessible to a wider international audience. According to 18 Zesman (2007), the state of knowledge regarding SLs has developed like a mosaic with many untiled gaps, but 19 this is increasingly giving typologists a clearer picture of the range of diversity in SLs. Some cross-linguistic and 20 typological studies of SLs, such as those by Sandler and Lillo-Martin (2006) or Brentari (2010), have shed light on 21 both the universals and the diversity of SLs, contributing to the understanding of human languages in general. 22 The lack of a writing system is characteristic of SLs, and one shared with two-thirds of the spoken languages of 23 the world. Strictly speaking, the only way of representing SLs is to use motion pictures. However, several notational 24 systems exist. The most important today are SignWriting (Sutton, 1974) and HamNoSys (Prillwitz et al., 1989). 25 SignWriting was conceived primarily as a writing system, and has its roots in DanceWriting (Sutton, 1973), a notation 26 for reading and writing dance movements. HamNoSys was conceived as a phonological transcription system for 27 SLs, with the same objective as the International Phonetic Alphabet (IPA) for spoken languages. There is another 28 29 alphabetic writing system, designed specifically for LSE, and called SEA (Sistema de escritura alfab´etica) (Herrero 30 et al., 2001); this uses the Latin alphabet, and has LSE’s phonology as its basis. As this paper focuses primarily 31 on syntax, glosses will be used here instead of a phonological notation. Glossing is a commonly used system for 32 explaining or representing the meaning of signs and the grammatical structure of signed phrases and sentences in a 33 text written in another language. However, glossing is not a writing system that could be understood by sign language 34 users. A machine translation (MT) system needs to produce an animation to be considered a complete and useful 35 system. 36 Most contemporary works on SLs have adopted language theories created for spoken language instead of devel- 37 oping new theories. This adoption leads naturally not only to the study of the phonology, morphology, and morpho- 38 syntax of SLs, but also to the study of all other descriptive levels found in spoken languages. However, from the point 39 of view of natural language processing, SLs are still under-resourced or low-density languages – that is to say, little 40 or no specific technology is available for these languages, and computerised linguistic resources, such as corpora or 41 lexicons, are very scarce. This situation, of course, is not exclusive to SLs, since it in fact applies to most of the 42 languages of the world. 43 44 1.2. Motivation and organisation of this paper 45 2 3 46 According to INE and MEC , there are 1,064,000 Deaf people in Spain.