Coupling Natural Language Processing and Animation Synthesis in Portuguese Translation

Inês Almeida, Luísa Coheur
INESC-ID, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal

Sara Candeias
Microsoft Language Development Center, Lisbon, Portugal

[email protected] [email protected]

Abstract

In this paper we present a free, open-source platform that translates, in real time, (written) European Portuguese into Portuguese Sign Language, the signs being produced by an avatar. We discuss the basic needs of such a system in terms of Natural Language Processing and Animation Synthesis, and propose an architecture for it. Moreover, we have selected a set of existing tools that fit our free, open-source philosophy, and implemented a prototype with them. Several case studies were conducted. A preliminary evaluation was done and, although the translation possibilities are still scarce and some adjustments still need to be made, our platform was already much welcomed by the deaf community.

1 Introduction

Several computational works dealing with the translation of sign languages from and into their spoken counterparts have been developed in the last years. For instance, (Barberis et al., 2011) describes a study targeting the Italian Sign Language, (Lima et al., 2012) targets LIBRAS, the Brazilian Sign Language, and (Zafrulla et al., 2011) the American Sign Language. Some of the current research focuses on sign language recognition (as the latter), some on translating text (or speech) into a sign language (like the previously mentioned work dedicated to Italian). Some works aim at recognising words (again, like the latter), others only letters (such as the work about LIBRAS). Only a few systems perform the two-sided translation, which is the case of the platform implemented by the Microsoft Asia group (Chai et al., 2013), and of the Virtual Sign Translator (Escudeiro et al., 2013).

Unfortunately, sign languages are not universal, nor a mere mimic of their country's spoken counterpart. For instance, the Brazilian Sign Language is not related with the Portuguese one. Therefore, few or no resources can be re-used when one moves from one (sign) language to another.

There is no official number of deaf persons in Portugal, but the 2011 census (Instituto Nacional de Estatística (INE), 2012) mentions 27,659 deaf persons, making, however, no distinction in the level of deafness, nor in the respective level of Portuguese and Portuguese Sign Language (LGP) literacy. The aforementioned Virtual Sign Translator targets LGP, as do the works described in (Bento, 2013) and (Gameiro et al., 2014). However, to the best of our knowledge, none of these works explored how current Natural Language Processing (NLP) tasks can be applied to help the translation process of written Portuguese into LGP, which is one of the focuses of this paper. In addition, we also study the needs of such a translator in terms of Animation Synthesis, and propose a free, open-source platform integrating state-of-the-art technology from NLP and 3D animation/modelling. Our study was based on LGP videos from different sources, such as the Spread the Sign initiative [1], and on static images of hand configurations presented in an LGP dictionary (Baltazar, 2010). The (only) LGP grammar (Amaral et al., 1994) was also widely consulted. Nevertheless, we often had to resort to the help of an interpreter.

Based on this study we have implemented a prototype and examined several case studies. Finally, we performed a preliminary evaluation of our prototype. Although much work still needs to be done, the feedback from deaf associations was very positive. Extra details about this work can be found in (Almeida, 2014) and (Almeida et al., 2015). The whole system is freely available [2].

[1] http://www.spreadthesign.com

This paper is organised as follows: Section 2 describes the proposed architecture, and Section 3 its implementation. In Section 4 we present our prototype and, in Section 5, a preliminary evaluation. Section 6 surveys related work and Section 7 concludes, pointing directions for future work.

2 Proposed architecture

Figure 1 presents the envisaged general architecture.

Figure 1: Proposed architecture

We followed a gloss-based approach, where words are associated to their 'meaning' through a dictionary. The order of the glosses is calculated according to the LGP grammar (structure transfer). Then, glosses are converted into gestures by retrieving the individual actions that compose them. In the last and final stage, the animation is synthesised by placing each action in time and space in a non-linear combination. The current platform is based on hand-crafted entries/rules, as there is no large-scale parallel corpus available that would allow us to follow recent tendencies in Machine Translation.

In the next sections we detail the three main components of this platform, namely the NLP, the Lookup and the Animate components, by focusing on the needs of the translation system and on how these components contribute to it.

2.1 The Natural Language Processing component

As usual, the first step consists in splitting the input text into sentences. These are tokenised into words and punctuation. Then, possible orthographic errors are corrected. After this step, a basic approach could directly consult the dictionaries, find the words that are translated into sign language, and return the correspondent actions, without further processing. However, other NLP tools can still contribute to the translation process.

Some words in European Portuguese are signed in LGP as a sequence of signs, related with the stem and affixes of the word. Therefore, a stemmer can be used to identify the stem and the relevant suffixes (and prefixes), which allows us to infer, for instance, the gender and number of a given word. Thus, we still might be able to properly translate a word that was not previously translated into LGP (or, at least, produce something understandable), if we are able to find its stem and affixes. To illustrate this, take as example the word 'coelhinha' ('little female rabbit'). If we are able to identify its stem, 'coelho' (rabbit), and the suffix 'inha' (meaning, roughly, female (the 'a') and small (the 'inho')), we can translate that word into LGP by signing the words 'female' + 'rabbit' + 'small', in this order (which, in fact, is how it should be signed). A minimal sketch of this idea is given after the following list.

A Part-of-Speech (POS) tagger can also contribute to the translation process:

• It can couple with the stemmer in the identification of the different types of affixes (for instance, in Portuguese, a common noun that ends in 'ões' is probably a plural).

• As there are some morphosyntactic categories that have a special treatment in LGP, it is important to find the correspondent words. For instance, according to (Bento, 2013), articles are omitted in LGP ((Amaral et al., 1994) reports doubts in what respects their existence), and thus they could be ignored when identified. Also, the Portuguese grammar (Amaral et al., 1994) refers a temporal line in the gesturing space with which verbs should agree in the past, present and future tenses. Thus, being able to identify the tense of a verb can be very important.

• A POS tagger usually feeds further processing, as for instance named entity recognisers and syntactic analysers.
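As promised above, here is a minimal Python sketch of a suffix-driven gloss expansion; the suffix table and the stem lexicon are illustrative assumptions, not the platform's actual rule set:

    SUFFIX_GLOSSES = {
        'inha': ['MULHER', '{STEM}', 'PEQUENO'],  # feminine + diminutive
        'inho': ['{STEM}', 'PEQUENO'],            # diminutive
        'a':    ['MULHER', '{STEM}'],             # feminine
    }

    def expand(word, stem_glosses):
        # stem_glosses maps a stem to its base gloss, e.g. {'coelh': 'COELHO'}
        for suffix, pattern in SUFFIX_GLOSSES.items():
            stem = word[:len(word) - len(suffix)]
            if word.endswith(suffix) and stem in stem_glosses:
                return [stem_glosses[stem] if g == '{STEM}' else g for g in pattern]
        return [word.upper()]  # no known derivation: keep the word as a single gloss

    print(expand('coelhinha', {'coelh': 'COELHO'}))  # ['MULHER', 'COELHO', 'PEQUENO']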

[2] http://web.ist.utl.pt/~ist163556/pt2lgp

A Named Entity Recognizer allows identifying names of persons. It is usual, among the deaf, to name a person with a sign (his/her gestural name), often with a meaning in accordance with his/her characteristics. For instance, names of public personalities, such as the current Portuguese prime minister, usually have a gestural name. However, if this name is unknown, fingerspelling the letters of his/her name is what should be done.

A Syntactic Analyser is fundamental to identify the syntactic components of the sentence, such as subject and object, as LGP is usually Object–Subject–Verb (OSV), while spoken Portuguese is predominantly Subject–Verb–Object (SVO). It does not matter if it is a dependency parser or a constituents-based one. The only requirement is that, at the end, it allows structure transfer rules to be applied to the glosses. Finally, a sentiment analyser would allow inferring subjective information towards the entities and the sentence in general, so that emotional animation layers and facial expression reinforcement can be added to the result.

After all this processing, a bilingual dictionary (glosses) is consulted, so that meaningful sequences of words (glosses) are identified (lexical transfer), and a set of syntactic rules is applied, so that the final order of the set of glosses is determined.

2.2 Lookup stage

Being given a sequence of glosses, the goal of the Lookup stage is to obtain a set of actions' identifiers for the animation. The difficulty in designing this step derives from the fact that many Portuguese words and concepts do not have a one-to-one match in LGP. Also, gestures may be composed of several actions, which, in turn, may be compounds of several actions (the gesture subunits). Finally, some contexts need to be added to the database in order to help this step.

2.3 Animate

This stage receives a sequence of actions to be composed into a fluid animation, along with a set of hints on how best to do so, for example, whether the gestures are to be fingerspelled or not. The animation stage is responsible for the procedural synthesis of the animation by blending gestures and gesture subunits together.

We propose an approach where gestures are procedurally built and defined from a high-level description, based on the following parameters, identified in other works (Liddell and Johnson, 1989; Liddell, 2003) as gesture subunits: a) hand configuration, orientation, placement, and movement; and b) non-manual features (facial expressions and body posture).

The base hand configurations are Sign Language (SL) dependent. The parameter definition for orientation, placement and movement is often of a relative nature. For example, gestures can be signed 'fast', 'near', 'at chest level', 'touching the cheek' and so on. The definition of speed is dependent on the overall speed of the animation, and the definition of locations is dependent on the avatar and its proportions.
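As an illustration only (the field names are ours, not a normative format), such a high-level gesture description could be captured in Python as:

    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class GestureAction:
        # One gesture subunit, described in avatar-relative terms
        configuration: str                 # keyed hand shape, e.g. 'B' or '5'
        orientation: str                   # e.g. 'palm_up'
        placement: str                     # marker name, e.g. 'chest' or 'nose'
        movement: Optional[str] = None     # e.g. 'arc_to:ear_left', relative speed
        non_manual: List[str] = field(default_factory=list)  # face and posture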
2.3.1 Rig

To set up the character, a humanoid mesh with appropriate topology for animation and real-time playback is needed. Then, we need to associate it with the mechanism that makes it move, the rig. We suggest a regular approach, with a mesh skinned to a skeleton and bones.

Bones should be named according to a convention, for symmetry and easy identification in the code. For the arms and hands, the skeleton can approximately follow the structure of a human skeleton. The rig ideally should have an Inverse Kinematics (IK) tree chain defined for both arms, rooting in the spine and ending in the hands. All fingers should also be separate IK chains, allowing for precise posing of contacts. Ideally, the IK chains should consider weight influence, so that bones closer to the end-effector (hands and fingertips) are more affected, and the bones in the spine and shoulder nearly not so. The rig should also provide a hook to control the placing of the shoulder, and should make use of angle and other constraints for the joints, so as to be easier to pose and harder to place in an inconsistent position.

Finally, the rig should have markers for placement of the hands in the signing space and in common contact areas in the mesh. These markers ensure that gestures can be defined with avatar-dependent terms (e.g. 'near', 'touching the nose'). The markers in the signing space can be inferred automatically using the character's skeleton measures (Kennaway, 2002; Hanke, 2004), forming a virtual 3D grid in front of the character (Figure 2). The markers in the mesh need to be defined manually and skinned to the skeleton in a manner consistent with the nearby vertices. Figure 3 shows a sample rig, with key areas in the face and body identified by a bone with a position and orientation in space.
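Returning to the signing-space grid of Figure 2, a possible way to infer it automatically from the skeleton measures is sketched below; the lattice size and the scaling factors are illustrative assumptions:

    def signing_space_markers(chest, arm_length, n=3):
        # Build an n x n x n lattice of positions in front of the chest,
        # scaled by the arm length so it adapts to the avatar's proportions.
        markers = {}
        step = arm_length / (n - 1)
        for ix in range(n):          # left .. right
            for iy in range(n):      # near .. far (in front of the character)
                for iz in range(n):  # low .. high
                    markers[(ix, iy, iz)] = (
                        chest[0] + (ix - (n - 1) / 2) * step,
                        chest[1] + 0.3 * arm_length + iy * 0.5 * step,
                        chest[2] + (iz - (n - 1) / 2) * step,
                    )
        return markers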

Figure 2: Virtual 3D marker grid defining the signing space in front of the character

Figure 3: Definition of key contact areas in the rig

2.3.2 Building the gestures

It is now necessary to record (key) the poses with a good timing to build a gesture. Whichever the keying methodology, all basic hand poses and facial expressions should be recorded, and can then be combined given the high-level description of the gesture. The description should specify the gesture using the mentioned parameters: keyed hand configurations, placement and orientation using the spatial marks, and movement also using the marks and the overall speed of the animation.

The intersections defined by the grid from Figure 2, in conjunction with the marks from Figure 3, define the set of avatar-relative locations where the hands can be placed. Knowing the location where the hand should be, it can be procedurally placed with IK, guaranteeing physiologically possible animation with the help of other constraints. Figure 4 shows the result of hand placement in a key area using two distinct avatars with significantly different proportions.

While this approach works well for static gestures, several problems appear when introducing movement. Gestures can change any of their parameters during the realisation, requiring a blending from the first definition (of location, orientation, configuration...) to the second. The type of blending is very important for the realism of the animation. Linear blending between two keys would result in robotic movements. Linear movement in space from one key to another will also result in non-realistic motions and even in serious collision problems (Elliott et al., 2007), for example, when making a movement from one ear to the other: this is a problem of arcs. Additionally, more movements need to be defined in order to accommodate other phenomena, such as finger wiggling and several types of hand waving.

Figure 4: Avatars using the key areas

2.3.3 Blending the gestures

Moving to the sentence level, synthesising the final fluid animation is now a matter of arranging the individual gestures in space, of realistic interpolation of keys in time, and of blending actions with each other in a non-linear way. A reasoning module is needed, capable of placing gestures grammatically in the signing space, and making use of the temporal line, entity allocation in space and other phenomena typically observed in SLs (Liddell, 2003).

The interpolation between animation keys is given by a curve that can be modeled to express different types of motion (a minimal sketch is given at the end of this section). The individual actions for each gesture should be concatenated with each other and with a 'rest pose' at the beginning and end of the utterance. The animation curves should then be tweaked, following the principles of animation.

Counter-animation and secondary movement are also very important for believability and perceptibility. For example, when one hand contacts the other or some part of the head, it is natural to react to that contact, by tilting the head (or hand) against the contact and physically receiving the impact. Besides the acceleration of the dominant hand, the contact is mainly perceived in how it is received, being very different in a case of gentle brushing, slapping or grasping. This may be the only detail that allows distinguishing gestures that otherwise may convey the same meaning.

Finally, actions need to be layered for expressing parallel and overlapping actions. This is the case of facial animation at the same time as manual signing, and of secondary animation, such as blinking or breathing, to convey believability. The channels used by some action may be affected by another action at the same time. Thus, actions need to be prioritised, taking precedence in the blending over less important, or ending, actions.
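As the minimal sketch promised above: a smoothstep ease-in/ease-out curve instead of a linear ramp (the per-channel pose representation is an assumption, not the platform's internal format):

    def ease_in_out(t):
        # Smoothstep easing: zero velocity at both keys, avoiding robotic blends
        return t * t * (3.0 - 2.0 * t)

    def blend_poses(pose_a, pose_b, t):
        # Per-channel interpolation between two keyed poses (dicts of channels)
        w = ease_in_out(t)
        return {ch: (1.0 - w) * a + w * pose_b[ch] for ch, a in pose_a.items()}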

3 Implementation

We have chosen to use the Natural Language ToolKit (NLTK) [3] for NLP tasks and Blender [4] as the 3D package for animation.

The NLTK is widely used by the NLP community and offers taggers, parsers, and other tools in several languages, including Portuguese. Thus, it was chosen for all the tasks concerning NLP. Blender is an open-source project, which allows accessing and operating on all the data (such as animation and mesh) via scripting. It offers a Python API for scripts to interact with the internal data structures, with operators on said data, and with the interface. Moreover, Blender also offers the infrastructure to easily share and install addons. Therefore, the prototype was implemented as an addon, with all the logic, NLP and access to the rig and animation data done in Python. The interface is a part of Blender using the pre-existing widgets, and the avatar is rendered in real time using the viewport renderer.

[3] http://www.nltk.org
[4] http://www.blender.org

3.1 The Natural Language Processing step

The modules implemented in our system can be seen in Figure 5.

Figure 5: NLP pipeline

We also use the concept of "hint", that is, a tag that suggests if a word should be signed or spelled. Three different types of hints are possible: GLOSS (for words that are not numeric quantities and have a specific gesture associated), FGSPELL (for words that should be fingerspelled), and NUMERAL (for numeric quantities). The NLP module tries to attribute a label to each word (or sequence of words), and these labels are then used when consulting the dictionary ('Lexical Transfer').

In what concerns the NLP pipeline, we start with an 'Error correcting and normalization' step, which enforces lowercase and the use of latin characters. Common spelling mistakes should be corrected at this step. Then, the input string is split into sentences and then into words (tokenization). As an example, the sentence 'o joão come a sopa' ('João eats the soup') becomes ['o', 'joão', 'come', 'a', 'sopa']. A stemmer identifies suffixes and prefixes. Thus, the word 'coelhinha' (as previously said, 'little female rabbit') is understood, by its suffix ('inha'), to be a female and small derivation of the root coelh(o). Therefore, 'coelhinha' is converted into [MULHER, COELHO, PEQUENO], hinted to be all part of the same gloss.

We have used the treebank 'floresta sintática' (Afonso et al., 2002) for training our POS tagger. The output of the POS tagger for the sentence 'o joão come a sopa' is now [('o', 'art'), ('joão', 'prop'), ('come', 'v-fin'), ('a', 'prp'), ('sopa', 'n')].
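For reference, a tagger of this kind can be trained with NLTK roughly as follows; this is a sketch, assuming the 'floresta' corpus has been downloaded, and the backoff chain is an illustrative choice rather than the prototype's exact configuration:

    import nltk
    from nltk.corpus import floresta  # requires a prior nltk.download('floresta')

    def simplify_tag(t):
        # Floresta tags look like 'H+art' or '>N+prop'; keep the part after '+'
        return t[t.index('+') + 1:] if '+' in t else t

    train = [[(w.lower(), simplify_tag(t)) for (w, t) in sent]
             for sent in floresta.tagged_sents()]

    # Back off from bigram to unigram statistics, and to 'n' for unseen words
    tagger = nltk.BigramTagger(
        train, backoff=nltk.UnigramTagger(train, backoff=nltk.DefaultTagger('n')))

    print(tagger.tag('o joão come a sopa'.split()))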
We have used a Named Entity Recognizer to find proper names of persons. Our system further supports a list of Portuguese names and public personalities' names with their matching gestural name. For these specific entities, the system uses the known gesture instead of fingerspelling the name.

The POS tags and recognised entities also contribute with hints. These hints are then confirmed (or not) in the next step, the 'Lexical Transfer', where we convert all the words to their corresponding gloss, using the dictionary, where the word conversions are stored. As an example, the word 'sopa' would lead to ['GLOSS', ['SOPA']], 'joao' to ['FGSPELL', ['J', 'O', 'A', 'O']] and 'dois' ('two') to ['NUMERAL', ['2']] (notice that articles were discarded). Also, we provide the option of fingerspelling all the unrecognised words.

Finally, in what respects Structure Transfer, the current implementation only supports a basic re-ordering of sequences of 'noun - verb - noun', in an attempt to convert the SVO ordering used in Portuguese into the more common structure of OSV used in LGP.

We have also implemented another type of re-ordering, which regards the switching of adjectives and quantities to the end of the affected noun. Following this process, [['GLOSS', ['SOPA']], ['FGSPELL', ['J','O','A','O']], ['GLOSS', ['COMER-SOPA']]] is the final output for the sentence 'O João come a sopa', and the input 'dois coelhos' ('two rabbits') results in [['GLOSS', ['COELHO']], ['NUMERAL', ['2']]].
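The two re-orderings can be pictured with the following sketch; the helper functions are hypothetical simplifications, and the real rules also use the POS hints described above:

    def svo_to_osv(glosses, pos):
        # Swap 'noun verb noun' (SVO) into 'object subject verb' (OSV)
        out = list(glosses)
        for i in range(len(pos) - 2):
            if pos[i] == 'n' and pos[i + 1].startswith('v') and pos[i + 2] == 'n':
                out[i], out[i + 1], out[i + 2] = out[i + 2], out[i], out[i + 1]
        return out

    def quantities_after_noun(glosses):
        # Move a NUMERAL to just after the noun it modifies
        out, pending = [], []
        for item in glosses:
            if item[0] == 'NUMERAL':
                pending.append(item)
            else:
                out.append(item)
                out.extend(pending)
                pending = []
        return out + pending

    # 'dois coelhos' -> [['GLOSS', ['COELHO']], ['NUMERAL', ['2']]]
    print(quantities_after_noun([['NUMERAL', ['2']], ['GLOSS', ['COELHO']]]))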

3.2 The Lookup step

The Lookup step, given a gloss, is done via a JSON file mimicking a database, constituted of a set of glosses and a set of actions. Action ids are mapped to Blender actions, which are, in turn, referenced by the glosses. One gloss may link to more than one action, in which case the actions are assumed to be played sequentially. Figure 6 shows that coelho ('rabbit') has a one-to-one mapping, that casa ('house') corresponds to one action, and that cidade ('city') is a composed word, formed by casa and a morpheme with no isolated meaning.

Figure 6: Database design

Knowing that gestures in LGP can be heavily contextualised, we added to the gloss structure an array of contexts with associated actions. Figure 7 shows the case of the verb comer ('to eat'), which is classified with what is being eaten. When no context is given by the NLP module, the default is considered to be the sequence in 'actions'.

Figure 7: Supporting gloss contextualisation
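In Python terms, the database of Figures 6 and 7 can be pictured roughly as follows; the entry and action names are illustrative, not the actual file contents:

    GLOSSES = {
        'COELHO': {'actions': ['act_coelho']},
        'CASA':   {'actions': ['act_casa']},
        'CIDADE': {'actions': ['act_casa', 'act_cidade_morpheme']},  # composed word
        'COMER':  {'actions': ['act_comer_default'],                 # default sequence
                   'contexts': {'SOPA': ['act_comer_sopa']}},        # contextualised form
    }

    def lookup(gloss, context=None):
        entry = GLOSSES[gloss]
        if context and context in entry.get('contexts', {}):
            return entry['contexts'][context]
        return entry['actions']  # fall back to the default action sequence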

3.3 The animation step

We start by setting up the avatar, by rigging and skinning. We chose rigify as a base for the rig, which needs to be extended with the spatial marks to be used when synthesising the animation. The animation is synthesised by directly accessing and modifying the action and f-curve data. We always start and end a sentence with the rest pose and, for concatenating the actions, we blend from one to the other over a given amount of frames, using Blender's Non Linear Action (NLA) tools, which allow action layering. Channels that are not used in the next gesture are blended with the rest pose instead. Figure 8 illustrates the result for the gloss sentence 'SOPA J-O-A-O COME'.

Figure 8: Action layering resulting from a translation

We adjust the number of frames for blending according to the hints received. For fingerspelling mode, we expand the duration of the hand configuration (which is originally just one frame) and blend it with the next fingerspelling in fewer frames than when blending between normal gloss actions. We also expand this duration when entering and leaving the fingerspell.
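The corresponding Blender calls look roughly like this (a sketch to be run inside Blender; the object and action names are placeholders, and the blend lengths are illustrative):

    import bpy

    avatar = bpy.data.objects['Avatar']          # placeholder object name
    avatar.animation_data_create()
    track = avatar.animation_data.nla_tracks.new()
    track.name = 'signing'

    frame = 1
    for name in ['rest', 'SOPA', 'J', 'O', 'A', 'O', 'COME', 'rest']:
        action = bpy.data.actions[name]          # placeholder action names
        strip = track.strips.new(name, start=frame, action=action)
        strip.blend_in = 5    # ramp the strip's influence in ...
        strip.blend_out = 5   # ... and out, instead of cutting between gestures
        frame = int(strip.frame_end) + 1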

3.4 The interface

The interface consists of an input text box, a button to translate, and a 3D view with the signing avatar, which can be rotated and zoomed, allowing to see the avatar from different perspectives. Figure 9 shows the main translation interface (blue). Additionally, we provide an interface for exporting a video of the signing (orange) and a short description of the project (green).

Figure 9: User Interface for the prototype

In what concerns the choice of the avatar, the character only needs to be imported into Blender and skinned to a rigify armature. Several characters were tested with success, with examples in Figure 10. Creating a set of base hand configurations first proved to be a good choice, as it allowed testing the hand rig and the basic methodology. The ten (0 to 9) hand configurations are shown in Figure 11.

Figure 10: Example of some of the supported avatars

Figure 11: Hand configurations for numbers (0-9)

4 Case studies

In parallel with the development of the prototype, we devised a series of case studies to test the flexibility of the architecture and technology choices. We started by posing base hand configurations in a limited context, passing then to full words, their derivations and the blending between them. Finally, we tested the prototype with full sentences.

4.1 Basic gestures

All the 57 different hand configurations for LGP were manually posed and keyed from references gathered from (Baltazar, 2010; Amaral et al., 1994; Ferreira, 1997), and also from the Spread the Sign project videos. These hand configurations are composed of 26 hand configurations for letters, 10 for numbers, 13 for named configurations and 8 extra ones matching greek letters. This task posed no major problem.

4.2 Numbers

Numbers can be used as a quantitative qualifier, as an isolated (cardinal) number, as an ordinal number, and as a number that is composed of others (e.g. 147). Gestures associated with each number also vary their forms if we are expressing a quantity, a repetition or a duration, and if we are using them as an adjective or as a complement to a noun or verb.

Reducing the test case to ordinal numbers, the main difficulty is to express numbers in the order of the tens and up. Most cases seem to be "fingerspelt": for example, '147' is signed as '1', followed by '4' and '7', with a slight offset in space as the number grows. Numbers from '11' to '19' can be signed with a blinking movement of the units' number. Some numbers, in addition to this system, have a totally different gesture as an abbreviation, as is the example of the number '11'.

4.3 Common nouns and adjectives

A couple of words were chosen, such as 'coelho' ('rabbit'), with no serious criteria. Several words deriving from the stem 'coelho' were implemented, such as 'coelha' ('female rabbit') and 'coelhinho' ('little rabbit'). In the former, the gesture for 'female' is performed before the gesture for 'rabbit'. In the latter, the gesture for the noun is followed by the gesture for the adjective (thus, 'coelho pequeno' ('little rabbit') and 'coelhinho' result in the same translation). Figure 12 illustrates both cases.

Figure 12: Gestures for 'coelha' and 'coelhinho'

4.4 Proper Nouns

As previously said, if the person does not have a gestural name that is known by the system, the letters of his/her name should be fingerspelled. This morpho-syntactic category posed no major problem.

4.5 Verbs

When the use of the verb is plain, with no past or future participles, the infinitive form is used in LGP. For instance, for the regular use of the verb 'to eat', the hand goes twice to the mouth, closing from a relaxed form, with the palm up. However, this verb in LGP is highly contextualised with what is being eaten. The verb should be signed resorting to different hand configurations and expressiveness, describing how the thing is being eaten.

4.6 Sentences

After testing isolated words, we proceeded to a full sentence: 'O João come a sopa', an already seen example, often used as a toy example in Portuguese studies. The verb gesture had to be extended, as for eating soup it is done as if handling a spoon (for instance, for eating apples, the verb is signed as if holding the fruit) [5]. Considering the previously mentioned re-ordering from SVO (spoken Portuguese) to OSV (LGP), Figure 13 shows the resulting alignments.

[5] These contextualisations are not evident in the most recent and complete LGP dictionary (Baltazar, 2010).

Figure 13: Alignment for European Portuguese and LGP of the sentence 'John eats the soup'

5 Evaluation

A preliminary evaluation was conducted by collecting informal feedback from the deaf communities of two Portuguese deaf associations.

5.1 Usefulness

Both associations were asked for comments on the whole idea behind this work, and on if and how such an application would be useful. Both were skeptical towards the possibility of achieving accurate translations, or of animating enough vocabulary for a final product, but the feedback was positive for the idea of an application that would translate to LGP, even if just isolated words were considered.

5.2 Translation Quality

The correctness and perceptibility were evaluated by six adult deaf persons and interpreters. The avatar was set to play the translations for coelha ('female rabbit'), casa ('house') and coelhinho ('small rabbit'). The viewers were asked, individually, to say or write in Portuguese what was being signed, with no previous information about the possibilities. In the second iteration of the system, a full sentence was added, with limited variability, of the form 'A eats B', where the verb 'to eat' is signed differently according to 'B'. All the gestures were recognised, as well as the sentence's meaning, except for the inflection of the verb with a 'soup' object, which is signed as if handling a spoon. All of the testers recognised the results correctly, without hesitations, saying that the signs were all very clear and only lacking facial reinforcement to be more realistic.

5.3 Adequacy of the Avatar

The feedback from the deaf testers regarding the avatar's looks was also very positive. There were no negative comments besides the observation that there is no facial animation. All hearing testers were also highly engaged with the system, testing multiple words and combinations, frequently mimicking the avatar. The interest and attention observed indicate that users had no difficulty in engaging with the avatar, and found it either neutral or appealing. When asked about it, the answers were positive, and the gesture blending and transitions, when noticed, were commented to be very smooth. However, sometimes the animation was deemed too slow or too fast. The animation generation should take play speed into consideration, according to the expertise of the user.

6 Related Work

Like ours, several systems also target the mapping of text (or speech) in one language into the correspondent signed language. Some of these systems resulted from local efforts of research groups or from local projects, and are focused on one single pair of languages (the spoken one and the correspondent sign language); others aggregate the efforts of researchers and companies from different countries and, thus, aim at translating different language pairs (some using an interlingua approach). For instance, Virtual Sign (Escudeiro et al., 2013) is a Portuguese funded project that focuses on the translation between European Portuguese and LGP.

eSIGN [6], in turn, was an EU-funded project built on a previous project, ViSiCAST [7], whose aim was to provide information in sign language, using avatar technology, in the German and British sign languages, as well as in the Sign Language of the Netherlands.

Our proposal follows the traditional transfer machine translation paradigm of text-to-gloss/avatar. Due to the lack of parallel corpora between European Portuguese and LGP, data-driven methods, that is, example-based and statistical approaches, were not an option (see (Morrissey, 2008) for a study on this topic). Approaches such as the one of ViSiCAST (and eSIGN) (Elliott et al., 2008), which rely on formalisms such as Discourse Representation Structures (DRS), used as intermediate semantic representations, were also not a solution since, to the best of our knowledge, there are no free, open-source tools to compute these structures for the Portuguese language. Thus, we focused on a simpler approach, one that could profit from existing open-source tools, which can be easily used for Portuguese (and for many other languages).

We should also refer recent work concerning LGP, namely the works described in (Bento, 2013), (Gameiro et al., 2014) and (Escudeiro et al., 2013). The first focuses on the mapping of human gestures into the ones of an avatar. The second targets the teaching of LGP, which the previously mentioned Virtual Sign also does (Escudeiro et al., 2014). The third contributes with a bidirectional sign language translator between written Portuguese and LGP, although their approach is not clear in what respects text to sign language translation.

[6] http://www.sign-lang.uni-hamburg.de/esign/
[7] See, for instance, http://www.visicast.cmp.uea.ac.uk/Visicast_index.html

7 Conclusions and future work

We have presented a prototype that couples different NLP modules and animation techniques to generate a fluid animation of LGP utterances, given a text input in European Portuguese. We have further conducted a preliminary evaluation with the deaf community, which gave us positive feedback. Although a working product would be highly desirable and would improve the lives of many, there is still much to be done before we can reach that stage.

As future work we intend to perform a formal evaluation of our system, so that we can properly assess its impact. Also, we intend to extend the existing databases. Particularly inspiring is ProDeaf [8], a translation software for LIBRAS, the Brazilian Sign Language, which, besides several other features, allows the crowd to contribute by adding new word/sign pairs. In our opinion, this is an excellent way of augmenting the system's vocabulary although, obviously, filters are needed in this type of scenario. In the current version of the system, words that are not in the dictionary are simply ignored; it could be interesting to have the avatar fingerspell them. Nevertheless, the system will probably have to be extended in other dimensions, as a broader coverage will lead to finer semantic distinctions, and a more sophisticated NLP representation will be necessary. We will also need to explore a way of simplifying the information concerning the contextualisation of a verb, for example, by storing categories of objects rather than the objects themselves. Moreover, we intend to move to the translation from LGP to European Portuguese. Here, we will follow the current approaches that take advantage of the Kinect in the gesture recognition step.

[8] http://web.prodeaf.net/

Acknowledgements

We would like to thank Associação Portuguesa de Surdos and Associação Cultural de Surdos da Amadora for all their help. However, the responsibility for any imprecision lies with the authors alone. This work was partially supported by national funds through FCT – Fundação para a Ciência e a Tecnologia, under project PEst-OE/EEI/LA0021/2013. Microsoft Language Development Center is carrying out this work in the scope of a Marie Curie Action IRIS (ref. 610986, FP7-PEOPLE-2013-IAPP).
References

S. Afonso, E. Bick, R. Haber, and D. Santos. 2002. Floresta Sintáctica: A treebank for Portuguese. In LREC, pages 1698–1703.

Inês Almeida, Luísa Coheur, and Sara Candeias. 2015. From European Portuguese to Portuguese Sign Language. In 6th Workshop on Speech and Language Processing for Assistive Technologies (accepted for publication – demo paper), Dresden, Germany.

Inês Rodrigues Almeida. 2014. Exploring challenges in avatar-based translation from European Portuguese to Portuguese Sign Language. Master's thesis, Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal.

M. A. Amaral, A. Coutinho, and M. R. D. Martins. 1994. Para uma gramática da Língua Gestual Portuguesa. Colecção universitária. Caminho.

Ana Bela Baltazar. 2010. Dicionário de Língua Gestual Portuguesa. Porto Editora.

Davide Barberis, Nicola Garazzino, Paolo Prinetto, and Gabriele Tiotto. 2011. Improving accessibility for deaf people: An editor for computer assisted translation through virtual avatars. In Proceedings of the 13th International ACM SIGACCESS Conference on Computers and Accessibility, ASSETS '11, pages 253–254, New York, NY, USA. ACM.

José Bento. 2013. Avatares em língua gestual portuguesa. Master's thesis, Faculdade de Ciências, Universidade de Lisboa, Lisbon, Portugal.

Xiujuan Chai, Guang Li, Xilin Chen, Ming Zhou, Guobin Wu, and Hanjing Li. 2013. VisualComm: A tool to support communication between deaf and hearing persons with the Kinect. In ASSETS '13: Proceedings of the 15th International ACM SIGACCESS Conference on Computers and Accessibility, New York, NY, USA. ACM.

R. Elliott, John Glauert, J. R. Kennaway, I. Marshall, and E. Safar. 2007. Linguistic modelling and language-processing technologies for Avatar-based sign language presentation. Universal Access in the Information Society, 6(4):375–391, October.

R. Elliott, J. R. W. Glauert, J. R. Kennaway, I. Marshall, and E. Safar. 2008. Linguistic modelling and language-processing technologies for avatar-based sign language presentation. Universal Access in the Information Society, 6(4):375–391.

Paula Escudeiro, Nuno Escudeiro, Rosa Reis, Maciel Barbosa, José Bidarra, Ana Bela Baltazar, and Bruno Gouveia. 2013. Virtual Sign Translator. In International Conference on Computer, Networks and Communication Engineering (ICCNCE), China. Atlantis Press.

Paula Escudeiro, Nuno Escudeiro, Rosa Reis, Maciel Barbosa, José Bidarra, Ana Bela Baltasar, Pedro Rodrigues, Jorge Lopes, and Marcelo Norberto. 2014. Virtual Sign game learning sign language. In Computers and Technology in Modern Education, Proceedings of the 5th International Conference on Education and Educational Technologies, Malaysia.
A. V. Ferreira. 1997. Gestuário: língua gestual portuguesa. SNR.

João Gameiro, Tiago Cardoso, and Yves Rybarczyk. 2014. Kinect-Sign: teaching sign language to listeners through a game. Procedia Technology, 17:384–391.

Thomas Hanke. 2004. HamNoSys: representing sign language data in language resources and language processing contexts. In LREC.

Instituto Nacional de Estatística (INE). 2012. Census 2011, XV Recenseamento Geral da População, V Recenseamento Geral da Habitação, Resultados Definitivos – Portugal. Technical report, INE.

J. R. Kennaway. 2002. Synthetic animation of deaf signing gestures. In Gesture and Sign Language in Human-Computer Interaction, pages 146–157. Springer.

Scott K. Liddell and Robert E. Johnson. 1989. American Sign Language: The phonological base. Sign Language Studies, 64:195–277.

Scott K. Liddell. 2003. Grammar, gesture, and meaning in American Sign Language. Cambridge University Press, Cambridge.

M. A. S. Lima, P. F. Ribeiro Neto, R. R. Vidal, G. H. E. L. Lima, and J. F. Santos. 2012. LIBRAS translator via web for mobile devices. In Proceedings of the 6th Euro American Conference on Telematics and Information Systems, EATIS '12, pages 399–402, New York, NY, USA. ACM.

Sara Morrissey. 2008. Data-driven machine translation for sign languages. Ph.D. thesis, Dublin City University.

Zahoor Zafrulla, Helene Brashear, Thad Starner, Harley Hamilton, and Peter Presti. 2011. American Sign Language recognition with the Kinect. In Proceedings of the 13th International Conference on Multimodal Interfaces, ICMI '11, pages 279–286, New York, NY, USA. ACM.
