D4.1.4 - Annex II Training Methodology for Machine Translation Post-Editing

Celia Rico Pérez, Pedro L. Díez Orzas, et al.

Distribution: Public
MultilingualWeb-LT (LT-Web) Language Technology in the Web
FP7-ICT-2011-7
Project no: 287815

Document Information

Deliverable number: 4.1.4 Annex II
Deliverable title: EDI-TA: Training Methodology for Machine Translation Post-Editing
Dissemination level: PU
Contractual date of delivery: 31st December 2012
Actual date of delivery: 15th February 2013
Author(s): Celia Rico Pérez, Pedro L. Díez Orzas, Lidia Cámara, Igone Regidor, Martin Ariano, Felix Fernández, Johanna Blasco
Participants: LINGUASERVE, Universidad Europea de Madrid
Internal Reviewer: LINGUASERVE
Workpackage: WP4
Task Responsible: Celia Rico Pérez
Workpackage Leader: Pedro L. Díez Orzas

Revision History

Revision | Date | Author | Organization | Description
1 | 18/10/2012 | Celia Rico Pérez | Universidad Europea de Madrid (Linguaserve) | First final draft
2 | 23/10/2012 | Pedro L. Díez Orzas | Linguaserve | Revision final draft
3 | 23/11/2012 | Lidia Cámara, Igone Regidor, Martin Ariano, Felix Fernández, Johanna Blasco | Linguaserve | Revision final draft
4 | 22/12/2012 | Pedro L. Díez Orzas | Linguaserve | Final Version
5 | 13/02/2013 | Pedro L. Díez Orzas | Linguaserve | Revision Final Version

CONTENTS

Document Information
Revision History
Contents
1. Executive Summary
2. Introduction
3. EDI-TA: training methodology
  3.1. The training kit
  3.2. Competences expected in a post-editor
  3.3. Post-editing tasks
  3.4. The role of the post-editor in the translation workflow
4. Post-editing guidelines specification and ITS 2.0
5. PE Rules
  5.1. BASIC Language independent rules
  5.2. BASIC Language dependent rules
  5.3. PE Samples
    Language independent rules
      a) Language independent rules implemented in the combination EN-ES
      b) Language independent rules implemented in the combination ES-EN
      c) Language independent rules implemented in the combination ES-FR
      d) Language independent rules implemented in the combination ES-EU
    Language dependent rules
      a) Language dependent rules into Spanish (from English)
      b) Language dependent rules into English (from Spanish)
      c) Language dependent rules into French (from Spanish)
      d) Language dependent rules into Euskera/Basque (from Spanish)
  5.4. PE exercises
    Post-edited sample text in EN-ES
    Post-edited sample text in ES-EN
    Post-edited sample text in ES-EU
    Post-edited sample text in ES-FR
6. References

1. EXECUTIVE SUMMARY

EDI-TA’s key objectives are as follows:

1. Contribute to defining metadata suitable for post-editing purposes.
2. Test the contribution of metadata in order to improve post-editing processes.
3. Define a practical methodology for post-editing between distant language pairs, namely, Spanish into English, French and Basque, and from English into Spanish.
4. Suggest improvements in the MT system so as to optimize the output specifically for post-editing purposes.
5. Show the feasibility and cost reduction of implementing post-editing in a real scenario.
6. Identify functions to improve post-editing tools.
7. Define a methodology for training post-editors in the following language pairs: ES, EN, FR and EU.

The present document reports work carried out towards the methodology for training translators in post-editing tasks for achieving objectives 1 to 6 and is complementary to a first report defining post-editing methodology. The MT system from Lucy Software has been used, as well as public content from the Spanish Tax Agency or the City Hall of Barcelona web sites.

The main elements in the post-editing training kit are the following, and they are described in the most general way possible:

A first approach to using ITS 2.0 for post-editing is also included in this report, organized into two groups:

• Contextual information can be used from the following ITS 2.0 data categories: Translate, about text that is not translatable but is necessary for segment comprehension; Language Information, about a shift of language in a specific part of the content; Domain, about the subject area; and Provenance, about the MT system and who has post-edited.
• Activation rules can be contained in ITS 2.0 data categories such as Localization Note (from the content creator) or Localization Quality Issue (from the content creator or the post-editor).
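The two groups of ITS 2.0 information described above can be illustrated with a minimal sketch. It is not part of the EDI-TA tooling: it only assumes that the content is HTML5 carrying local ITS 2.0 attributes (translate, lang, its-tool, its-rev-person, its-loc-note, its-loc-quality-issue-type, its-loc-quality-issue-comment) and sorts what it finds into "context" and "activation" entries. The names ItsCollector, collect_its and SAMPLE, as well as the sample markup itself, are invented for illustration. Domain is left out because ITS 2.0 expresses it through global rules rather than a local attribute.

```python
from html.parser import HTMLParser

# Local ITS 2.0 attributes in HTML5, grouped as in the text above.
# The grouping into "context" and "activation" is this sketch's assumption.
CONTEXT_ATTRS = {"translate", "lang", "its-tool", "its-rev-person"}
ACTIVATION_ATTRS = {
    "its-loc-note",
    "its-loc-quality-issue-type",
    "its-loc-quality-issue-comment",
}


class ItsCollector(HTMLParser):
    """Hypothetical helper: records ITS-related attributes per start tag."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (tag, group, attribute, value)

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in CONTEXT_ATTRS:
                self.found.append((tag, "context", name, value))
            elif name in ACTIVATION_ATTRS:
                self.found.append((tag, "activation", name, value))


def collect_its(html):
    """Return every ITS-related attribute found in an HTML fragment."""
    parser = ItsCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# Invented sample segment; the attribute values are placeholders.
SAMPLE = (
    '<p its-tool="MT engine" its-rev-person="post-editor-1">'
    'Open the <span translate="no">Agencia Tributaria</span> site. '
    '<span its-loc-note="Keep the form number unchanged">Form 100</span> '
    '<span its-loc-quality-issue-type="terminology" '
    'its-loc-quality-issue-comment="Check glossary term">tax return</span>'
    "</p>"
)

if __name__ == "__main__":
    for row in collect_its(SAMPLE):
        print(row)
```

A post-editing tool could show the "context" entries next to the segment and use the "activation" entries to flag which post-editing rules deserve attention; how that is done in practice depends on the tool.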
More practical information about training in post-editing MT or revising translation will be included in Deliverable 4.2.2 and Deliverable 3.2.3 (in the online MT system, CMS-TMS reports) of the MLW-LT project.

2. INTRODUCTION

This report is based on work carried out in the framework of EDI-TA, an R&D project conducted by Linguaserve and Universidad Europea de Madrid, as part of the tasks that Linguaserve is developing within the MultilingualWeb-LT (Language Technologies) Working Group, which belongs to the W3C Internationalization Activity and the MultilingualWeb community. The MultilingualWeb-LT Working Group receives funding from the European Commission (project name LT-Web) through the Seventh Framework Programme (FP7), Grant Agreement No. 287815.

EDI-TA had a duration of 6 months, from March 2012 to October 2012, and work was organized into three phases:

1. Phase 1. Post-editing pilot project start-up. First stage of the pilot test: test design, implementation, data collection and partial evaluation of results.
2. Phase 2. MT post-editing experimentation. Implementation, in two subsequent stages, of a test bed that serves as a showcase.
3. Phase 3. Training and methodology.

Figure 1 summarizes work on the basis of each of the objectives:

Figure 1: EDI-TA’s workplan

Although EDI-TA was finished in September 2012, when the ITS 2.0 data categories were still at the definition stage, we included a general specification of post-editing guidelines for ITS 2.0 usage. More practical information about training in post-editing MT or revising translation will be included in Deliverable 4.2.2 and Deliverable 3.2.3 (in the online MT system, CMS-TMS reports) of the MLW-LT project.

3. EDI-TA: TRAINING METHODOLOGY

Training post-editors is key to the successful completion of any PE project. In this respect, the following aspects should be considered:

• Competences development.
