arXiv:2105.02877v1 [cs.CV] 6 May 2021

Aligning Subtitles in Sign Language Videos

Hannah Bull¹*, Triantafyllos Afouras²*, Gül Varol²,³, Samuel Albanie², Liliane Momeni², Andrew Zisserman²

¹ LISN, Univ Paris-Saclay, CNRS, France
² Visual Geometry Group, University of Oxford, UK
³ LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, France

* Equal contribution

https://www.robots.ox.ac.uk/~vgg/research/bslalign/

[Figure 1 image: the example subtitles “Overlooked by a small hill known as Leopard Rock.”, “To keep her cubs alive in this dangerous neighbourhood,” and “the mother must make sure they stay hidden.” placed on a time axis from 14:12 to 14:30, once at their audio-aligned positions (S_audio, top) and once at their ground-truth signing positions (S_gt, bottom).]

Figure 1: Subtitle alignment: We study the task of aligning subtitles to continuous signing in sign language interpreted TV broadcast data. The subtitles in such settings usually correspond to and are aligned with the audio content (top: audio subtitles, S_audio) but are unaligned with the accompanying signing (bottom: Ground Truth annotation of the signing corresponding to the subtitle, S_gt). This is a very challenging task as (i) the order of subtitles varies between spoken and sign languages, (ii) the duration of a subtitle differs considerably between signing and speech, and (iii) the signing corresponds to a translation of the speech as opposed to a transcription.

Abstract

The goal of this work is to temporally align asynchronous subtitles in sign language videos. In particular, we focus on sign-language interpreted TV broadcast data comprising (i) a video of continuous signing, and (ii) subtitles corresponding to the audio content. Previous work exploiting such weakly-aligned data only considered finding keyword-sign correspondences, whereas we aim to localise a complete subtitle text in continuous signing. We propose a Transformer architecture tailored for this task, which we train on manually annotated alignments covering over 15K subtitles that span 17.7 hours of video. We use BERT subtitle embeddings and CNN video representations learned for sign recognition to encode the two signals, which interact through a series of attention layers. Our model outputs frame-level predictions, i.e., for each video frame, whether it belongs to the queried subtitle or not. Through extensive evaluations, we show substantial improvements over existing alignment baselines that do not make use of subtitle text embeddings for learning. Our automatic alignment model opens up possibilities for advancing machine translation of sign languages via providing continuously synchronized video-text data.

1. Introduction

Sign languages constitute a key form of communication for Deaf communities [53]. Our goal in this paper is to temporally localise subtitles in continuous signing video. Automatic alignment of subtitle text to signing content has great potential for a wide range of applications including assistive tools for education and translation, indexing of sign language video corpora, efficient subtitling technology for signing vloggers¹, and automatic construction of large-scale sign language datasets that support computer vision and linguistic research.

¹ Unlike spoken vlogs that benefit from automatic closed captioning on sites such as YouTube, signing vlog creators who wish to provide written subtitles must both translate and align their subtitles manually.

Despite recent advances in computer vision, machine translation between continuous signing and written language remains largely unsolved [5]. Recent works [10, 11] have shown promising translation results, but to date these have been achieved only in constrained settings where continuous signing is manually pre-segmented into clips, with each clip associated to a written sentence from a limited vocabulary. Two key bottlenecks for scaling up translation to continuous signing depicting unconstrained vocabularies are (i) the segmentation of signing into sentence-like units, and (ii) the availability of large-scale sign language training data.

Manual alignment of subtitles to sign language video is tedious – an expert fluent in sign language takes approximately 10-15 hours to align subtitles to 1 hour of continuous sign language video. In this work, we focus on the task of aligning a particular known subtitle within a given temporal signing window. We explore this task in the context of sign language interpreted TV broadcast footage – a readily available and large-scale source of data – where the subtitles are synchronised with the audio, but the corresponding sign language translations are largely unaligned due to differences between spoken and sign languages as well as lags from the live interpretation.

Subtitle alignment to continuous signing remains a very challenging task. First, sign languages have grammatical structures that vary considerably from those of spoken languages [53], and as a result the ordering of words within a subtitle as well as the subtitles themselves is often not maintained in the signing (see Fig. 1). Second, the duration of a subtitle varies considerably between signing and speech due to differences in speed and grammar. Third, the signing corresponds to a translation of the speech that appears in the subtitles as opposed to a transcription: there is no direct one-to-one mapping between subtitle words and signs produced by interpreters, and entire subtitles may not be signed.

Previous work exploiting such weakly-aligned data has mainly focused on finding sparse correspondences between keywords in the subtitle and individual signs [2, 42, 56], as opposed to localising the start and end times of a complete subtitle text in continuous signing. Though, as we show, localising isolated signs identified by keyword spotting nevertheless forms a useful pretraining task for full subtitle alignment. Most closely related to our work, Bull et al. [8] consider the task of segmenting a continuous signing video into subtitle units purely based on body keypoints. In fact, similarly to speech which can be segmented based on prosodic cues such as pauses, sign sentence boundaries can to an extent be detected through visual cues such as lowering the hands, head movement, pauses, and facial expressions [24]. However, as shown in our evaluations in Sec. 4, such approaches based on prosody-only perform poorly in our setting, where subtitles do not necessarily correspond to complete sign sentences with clear visual boundaries.

In this paper, we instead propose to use the subtitle text as an additional signal for better alignment. We make the following three contributions: (1) we show that encoding the subtitle text as input to the alignment model significantly improves the temporal localisation quality as opposed to only relying on visual cues to segment continuous sign language videos into subtitle units; (2) we design a novel formulation for the subtitle alignment task based on Transformers; and (3) we present a comprehensive study ablating our design choices and provide promising results for this new task when evaluating on unseen signers and content.

2. Related Work

For a recent comprehensive survey about sign language recognition and translation, see [33]. Here, we review relevant works on temporal localisation at the levels of individual signs and sequences, in addition to more general temporal alignment methods from the literature.

Temporal localisation of individual signs. A rich body of work has considered the task of localising sparse sign instances in continuous signing, often referred to as “sign spotting”. Early efforts using signing gloves [38] were followed by methods employing hand-crafted visual features to represent the hands, face and motion that were integrated with CRFs [61, 62], HMMs [49] and HSP Trees [45]. Several studies have sought to employ subtitles as weak supervision for learning to localise and classify signs, using apriori mining [17] and multiple-instance learning [6, 7, 46]. More recent work has leveraged cues such as mouthings [2] and visual dictionaries [42] and by making use of deep neural network features with sliding window classifiers [37] and attention learned via a proxy translation task [56]. In deviation from these works, our objective is to localise complete subtitle units, rather than individual signs.

Temporal localisation of sign sequences. The alignment of subtitles to continuous signing was considered in creative early work by combining cues from multiple sparse correspondences [23], but under the assumption that ordering of words in subtitles are preserved in the signing (which does not hold in our problem setting). Other sequence-level sign language temporal localisation tasks that have received attention in the literature include category-agnostic sign segmentation [22, 47], active signer detection [4, 16, 43, 52] and diarisation [1, 26, 27]; each considers a temporal granularity that differs from subtitle units. Most closely related to our work, Bull et al. [8] employ a keypoint-based model to segment continuous signing into sentence-like units without knowledge of the written subtitles during inference. Our

[Figure 2 diagram: block labels BERT, I3D, Linear, PE, Transformer Encoder, Transformer Decoder and Sigmoid; signals S_prior, S_pred and S_gt, each spanning T frames; example subtitle text “The souffle is just a little bit of work but it's really worth it.”; legend: frame in subtitle / frame not in subtitle.]

Figure 2: SAT model overview: We input to our model (i) token embeddings of the subtitle text we wish to align, (ii) a sequence of video features extracted from a continuous sign language video segment and (iii) the shifted temporal boundaries of the audio-aligned subtitle, S_prior. Using these inputs, the model outputs a vector of values between 0 and 1 of length T.