How to Link Productivity and Quality Andrzej Zydron


How to link productivity and quality
Andrzej Zydroń, CTO XTM Intl
Translang Europe, Warsaw 2016

Language is difficult
Language is organic
Language is diverse
Language is human: 30 billion cells, 100 trillion synapses
UG
Morphology spectrum: from primitive morphology to extremely rich

Information Technology Evolution
Mainframe (1945-), Mini (1975-), Workstation/PC (1985-), Laptop (2000-), Tablet (2010-), Cloud (2006-)

Information Technology Evolution: The Cloud
A Connected World
Unimaginable Scales
Algorithmic advances

Turing/von Neumann architecture
John von Neumann (1903-1957), Alan Turing (1912-1954)
von Neumann architecture
von Neumann architecture limitations
= The right tool for the job
The Human Brain: 30 billion cells, 100 trillion synapses
von Neumann architecture does not scale: DARPA cat brain project

Putting Things Into Perspective

Innovation in Translation Technology
• Standards
  ✴ Unicode
  ✴ XML
  ✴ L10N Interoperability (TMX, XLIFF, TBX, TIPP etc.)
  ✴ Quality: QT Launchpad, TAUS DQF
• Internet
  ✴ Resources (sharing, accessing)
  ✴ Communication
• Automation
  ✴ Translation Management Systems (TMS)
  ✴ Automated file processing
  ✴ Collaborative workflows
  ✴ Connected real-time resource sharing
• Advanced algorithmic technology
  ✴ Web Services
  ✴ Voice Recognition
  ✴ NLP
  ✴ SMT, NMT
  ✴ POS analysers
  ✴ Stemmers
  ✴ Terminology extraction (monolingual, bilingual)
  ✴ High quality dictionary based bilingual text alignment
  ✴ Linked data lexicons

Why Standards?
Why have Standards?
ISO Standard
Standards = Efficiency
Standards = Lower Costs
Standards = Safe to Implement
Standards = Greater Interoperability
Standards: Unforeseen Benefits
Standards: Misuse
Standards: Abuse
Standards: Sabotage

L10N Standards
• Encoding
  – Unicode
    • 16 and 32 bit encoding
    • TR 29 - Word Boundaries
  – ISO 639, ISO 3166
  – IETF BCP 47 - Locale
• Descriptive
  – W3C ITS - Internationalization Tag Set

L10N Standards
• Exchange Standards
  – TMX - LISA OSCAR
    • Translation Memory Exchange
  – TBX, TBX Link, TBX Basic - LISA OSCAR
    • Terminology Exchange
  – SRX - LISA OSCAR
    • Segmentation Rules Exchange
  – GMX - LISA OSCAR
    • Metrics Exchange (Volume, Complexity, Quality)
  – XLIFF - OASIS
    • XML Localization Interchange File Format

L10N Standards
• Interoperability
  – Translation Web Services - OASIS
  – Interoperability Now! - XLIFF:doc, TIPP
  – Linport - Language Interoperability Portfolio
• Reuse
  – DITA - OASIS - Darwin Information Typing Architecture
    • Topic level document granularity (Reference, Concept, Task)
  – xml:tm - LISA OSCAR
    • Sentence granularity

L10N Standards
• Architectural
  • OAXAL - OASIS
    ✴ Open Architecture for XML Authoring and Localization
    ✴ Brings all of the L10N standards together in one architectural framework
    ✴ OASIS Reference Architecture Standard

L10N Standards
• Quality Measurement: MQM - QT Launchpad - TAUS DQF

Core L10N Standards 2016
• W3C ITS Document Rules
• GALA SRX
• ETSI LIS xml:tm
• ETSI LIS TMX
• ETSI LIS TBX
• ETSI LIS GMX-V
• OASIS XLIFF
• W3C/OASIS DITA (XHTML, DocBook, or any XML Vocabulary)
• Linport Interoperability: TIPP, XLIFF:doc
• OASIS OAXAL
• Unicode
• QT Launchpad
• TAUS DQF

OAXAL 2.0
• Open Architecture for XML Authoring and Localization (OAXAL)
  – http://wiki.oasis-open.org/oaxal/FrontPage

Putting Things Into Perspective

Process Automation: Translation Management Systems

TMS: Raises Quality
• Process automation
• Significantly reduced costs
• Reduced turnaround times
• Elimination of repetitive administrative tasks
• All data is immediately available
  ✴ Terminology
  ✴ TM
• More secure
• JIT and ‘never ending projects’
• Built-in quality assessment

Improving Quality
• Standards for MQM
  ✴ QT Launchpad
  ✴ TAUS DQF
• Integration of MQM with workflow
• Process automation
• Interactively shared data - consistency
  ✴ Terminology
  ✴ TM
• Terminology extraction

QT Launchpad

TAUS DQF
Normalised with QT Launchpad
• Content Profiling and Knowledge Base
• DQF Tools
• Quality Dashboard
• API

Translation Tool Improvements
• Predictive Typing
• Voice input
• Fuzzy matching completion
• Concordance
• Automated alignment
• Terminology extraction
• Automatic terminology insertion
• QA tools
  ✴ Spelling
  ✴ Grammar
  ✴ Omissions etc.

Automated alignment

MT Development
– Rule based: 1950+
– Statistical Word Based: 2000
– Statistical Phrase Based: 2008
– Statistical Hybrid Word/Phrase + Grammar: 2012
– Statistical Deep Learning: 2015
  • Neural Network
  • Powerful Language Models
  • Dictionary, disambiguation (BabelNet)

Statistical MT vs Neural MT
SMT: No problem can be solved from the same consciousness that they have arisen
NMT: Problems can never be solved with the same way of thinking that caused them.
Neural MT
NMT predicts a target word based on the context associated with the source and the previously generated target words. An attention mechanism is used to analyse the context for every source word.

NMT Assessment
Joss Moorkens (DCU, ASLING TC38):
• Improved translation quality for morphologically rich languages
• Fluency is improved, word order errors are fewer
• Fewer segments require editing
• Fewer morphological errors
• No clear improvement for omission or mistranslation
• Mistakes can be harder to spot
• NMT for production: no great improvement in post-editing throughput

Limits of MT technology
The limitations of computational linguistics:
• Syntax
• Morphology
• Grammar
• Language model size
• Training set quality and size: data dilution
• Domain similarity
• Homographs, Polysemy
• OOV: Out of Vocabulary words
• Word and phrase alignment

SMT Limitations:
– More data != better performance
– Diminishing returns

John Searle’s Chinese Room

The Ultimate MT Limitation
In order to translate you need to UNDERSTAND

How can we assess the potential productivity savings by using MT?
Current MT providers' answer: "Well, it depends…"

[Figure: Theoretical limits of MT. Quality plotted against Language Similarity, with values from 0 to 1, for HT and for the pairs en-US > en-GB, en-US > fr-FR, en-US > de-DE and en-US > ru-RU; annotations: Morphology, SMT, Delta = Language Closeness]

Training Set Size Factor (TSSF)
TSSF is computed from Size and Size', where Size is the actual training data size and Size' is an empirical value which makes TSSF equal 0.5.

Estimating Percentage Reduction in Translator Effort (PRTE)
PRTE = (LC x TSSF x DMS) x 100%

PRTE Calculation Examples
• Translating from en-US to en-GB we can assume an LC value of 1. If we have an ideal reference TSSF of 1 and an ideal DMS of 1, we arrive at a PRTE of: 1 x 1 x 1 x 100 = 100%
• Translating from en-US to fr-FR we can assume an LC value of 0.8. If we have a slightly less than ideal TSSF of 0.75 but an ideal DMS of 1, we arrive at a PRTE of: 0.8 x 0.75 x 1 x 100 = 60%
• Translating from en-US to ja-JP we can assume an LC value of 0.2. If we have an ideal TSSF value of 1 and an ideal DMS of 1, we arrive at a PRTE of: 0.2 x 1 x 1 x 100 = 20%

Question and Answer session

Better Translation Technology
Contact Details: XTM International, www.xtm-intl.com
Register for future Webinar sessions: www.xtm-intl.com/demos
Contact: [email protected], +44 (0) 1753 480 479
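The PRTE estimate in the slides is a plain product of three factors, so it is easy to reproduce. The sketch below recomputes the three calculation examples. It takes TSSF as an input, because the slides' TSSF formula is not reproduced here; it also assumes (this is not stated in the slides) that LC, TSSF and DMS are each ratios in [0, 1], with 1 meaning "ideal", which matches every worked example.

```python
def prte(lc: float, tssf: float, dms: float) -> float:
    """Percentage Reduction in Translator Effort: PRTE = (LC x TSSF x DMS) x 100%.

    Assumption (not stated in the slides): each factor is a ratio in [0, 1],
    with 1 meaning "ideal", as in the worked examples.
    """
    for name, value in (("LC", lc), ("TSSF", tssf), ("DMS", dms)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return lc * tssf * dms * 100.0


# The three calculation examples from the slides: (LC, TSSF, DMS)
EXAMPLES = {
    "en-US > en-GB": (1.0, 1.0, 1.0),
    "en-US > fr-FR": (0.8, 0.75, 1.0),
    "en-US > ja-JP": (0.2, 1.0, 1.0),
}

if __name__ == "__main__":
    for pair, (lc, tssf, dms) in EXAMPLES.items():
        # Rounded for display: 0.8 * 0.75 is not exact in binary floating point
        print(f"{pair}: PRTE = {prte(lc, tssf, dms):.0f}%")
```

Because the factors multiply, a weak value in any single factor caps the estimate. In the ja-JP example, an ideal training set and domain match still leave a PRTE of only 20%.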
Recommended publications
  • A Semantic Model for Integrated Content Management, Localisation and Language Technology Processing
    A Semantic Model for Integrated Content Management, Localisation and Language Technology Processing. Dominic Jones¹, Alexander O’Connor¹, Yalemisew M. Abgaz², David Lewis¹ (¹ ² Centre for Next Generation Localisation; ¹ Knowledge and Data Engineering Group, School of Computer Science and Statistics, Trinity College Dublin, Ireland, {Dominic.Jones, Alex.OConnor, Dave.Lewis}@scss.tcd.ie; ² School of Computing, Dublin City University, Dublin, Ireland, [email protected]). Abstract. Providers of products and services are faced with the dual challenge of supporting the languages and individual needs of the global customer while also accommodating the increasing relevance of user-generated content. As a result, the content and localisation industries must now evolve rapidly from manually processing predictable content which arrives in large jobs to the highly automated processing of streams of fast moving, heterogeneous and unpredictable content. This requires a new generation of digital content management technologies that combine the agile flow of content from developers to localisers and consumers with the data-driven language technologies needed to handle the volume of content required to feed the demands of global markets. Data-driven technologies such as statistical machine translation, cross-lingual information retrieval, sentiment analysis and automatic speech recognition all rely on high quality training content, which in turn must be continually harvested based on the human quality judgments made across the end-to-end content processing flow. This paper presents the motivation, approach and initial semantic models of a collection of research demonstrators where they represent a part of, or a step towards, documenting in a semantic model the multi-lingual semantic web.
  • Euractiv Proposal
    EurActiv Proposal. Andrzej Zydroń MBCS, CTO XTM International; Balázs Benedek, CTO Easyling. Andrzej Zydroń, CTO XTM-Intl • 37 years in IT, 25 of those related to Localization • Member of British Computer Society • Chief Technical Architect @ Xerox, Ford, XTM International • 100% track record in the design and delivery of complex systems for European Patent Office, Xerox, Oxford University, Ford, XTM International • Expert on computational aspects of L10N and related Open Standards • Co-founder XTM International • Technical Architect of XTM Cloud • Open Standard Technical Committees: LISA OSCAR GMX, LISA OSCAR xml:tm, LISA OSCAR TBX, W3C ITS, OASIS XLIFF, OASIS Translation Web Services, OASIS DITA Translation, OASIS OAXAL, ETSI LIS, Interoperability Now! TIPP and XLIFF:doc. XTM International – company background • XTM International was formed in 2002 • Independent TMS & CAT tool developer • Software development & support teams based in Poland • Sales & Marketing in UK and Ireland • XTM Cloud was launched in 2010 & is available as: – Public cloud – Accessed on XTM International’s servers – Private cloud – Installed on your servers – 100% Open Standards based: OAXAL – Open APIs, infinitely scalable modular SOA architecture – Next Generation SaaS TMS/CAT solution. XTM International – The Team • 50 people • 30 man software development team • All software engineers have a Computer Science MSc • Efficient and effective Software Support Team • Extensive experience in 3rd party systems integration, including: XTM Cloud design principles • XML - XTM is built
  • Metadata-Group Report
    Open and Flexible Metadata in Localization. How often have you tried to open a Translation Memory (TM) created with one computer-aided translation (CAT) tool in another CAT tool? I assume pretty often. In the worst case, you cannot open the TM. In the best case, you can open it, but data and metadata are lost. You aren’t able to tell which strings have been locked, which are under review and so on. The standard Translation Memory eXchange (TMX), developed by the Localization Industry Standards Association (LISA)’s standards committee called OSCAR (Open Standards for Container/content Allowing Reuse), undoubtedly makes the exchange of TM data easier and does not lock translators into a specific tool or tool provider. Also, the standard XML Localisation Interchange File Format (XLIFF), developed under the auspices of the Organization for the Advancement of Structured Information Standards (OASIS), is an interchange file format for localization data and can be used to exchange data between companies, such as a software publisher and a localization vendor, or even between localization tools. Both TMX and XLIFF are important standards for the localization process. These standards have their own formats, though the synergy is there: XLIFF’s current version 1.2 borrows from the TMX 1.2 specification, and support for XLIFF inline markup in TMX 2.0 is currently in progress. There is a range of standard data formats apart from TMX and XLIFF, such as the Darwin Information Typing Architecture (DITA) from OASIS, the Internationalization Tag Set (ITS) published by the W3C, and Segmentation Rules eXchange (SRX), affiliated with LISA/OSCAR along with Global Information Management Metrics eXchange-Volume (GMX-V), and so on.
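To make "exchange of TM data" concrete, the sketch below reads translation units from a small TMX 1.4-style document with Python's standard `xml.etree.ElementTree`. The sample file, its header values and the `read_tmx` helper are invented for illustration. Real TMX files carry richer headers and inline codes, and this sketch handles neither carefully.

```python
import xml.etree.ElementTree as ET

# xml:lang is in the reserved XML namespace, which ElementTree expands like this
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Invented sample: one translation unit (<tu>) with an English and a French variant (<tuv>)
SAMPLE_TMX = """<tmx version="1.4">
  <header creationtool="example" creationtoolversion="1.0" segtype="sentence"
          o-tmf="example" adminlang="en-US" srclang="en-US" datatype="plaintext"/>
  <body>
    <tu>
      <tuv xml:lang="en-US"><seg>Save the file.</seg></tuv>
      <tuv xml:lang="fr-FR"><seg>Enregistrez le fichier.</seg></tuv>
    </tu>
  </body>
</tmx>"""


def read_tmx(tmx_text: str) -> list[dict[str, str]]:
    """Return one {language: segment text} dict per translation unit (hypothetical helper)."""
    root = ET.fromstring(tmx_text)
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            seg = tuv.find("seg")
            # itertext() concatenates all text inside <seg>, including the content of
            # inline elements such as <bpt>/<ept>; a real tool would treat those codes separately
            variants[tuv.get(XML_LANG)] = "".join(seg.itertext()) if seg is not None else ""
        units.append(variants)
    return units


if __name__ == "__main__":
    print(read_tmx(SAMPLE_TMX))
```

Keying each unit by language tag is what lets a TM created in one tool be matched in another. The segment pair travels with its language metadata instead of relying on a tool-specific database layout.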
  • OAXAL Open Architecture for XML Authoring and Localization
    OAXAL Open Architecture for XML Authoring and Localization. June 2010. Andrzej Zydroń: [email protected] Why OAXAL? Globalization Standards Interoperability Globalization Standards • Can we imagine world trade without the shipping container? • World trade would be significantly hampered • World GDP would be significantly reduced • Billions of people would be condemned to a life of constant poverty Why Open Standards? • Usability • Interoperability • Exchange • Risk reduction • Investment protection • Reduced implementation costs Why Open Standards • Free – no fees • Input is from accredited volunteers • Democratic process • Extensive peer review • Extensive public review • Well documented Localization Standards • Parent organization: LISA OSCAR, ISO, W3C, OASIS • Constitution: IP policy, membership and committee rules • Membership: company, academic, individual • Technical committee: OASIS XLIFF, LISA OSCAR GMX, W3C ITS • Extensive peer review and discussions • Public review process Localization Standards • Standards matter • Reduce costs • Improve interoperability • Improve quality • Create more competition True Cost of Translation Localization without Standards Too Many Standards? • W3C ITS Document Rules • Unicode TR29 • LISA OSCAR SRX • LISA OSCAR xml:tm • LISA OSCAR TMX • LISA OSCAR GMX • OASIS XLIFF • W3C/OASIS DITA, XHTML, DocBook, or any component based XML Vocabulary [Diagram: DITA, XLIFF, SRX, TMX, GMX, Unicode TR29, xml:tm, W3C ITS, ?] OAXAL Open Architecture for XML Authoring and Localization
  • App Localization SDL LSP Partner Program Our Partners
    Language | Technology | Business January/February 2015 Focus: Cloud Technology Technology: App localization SDL LSP Partner Program Our Partners 1 to 1 Translations Etymax Ltd Language Connect RWS Group 1-800-translate Eurhode Traduction Language Line Translation RWS Group Deutschland GmbH 3ic International Euris Consult Ltd Solutions Sandberg Translation Partners Ltd À Propos eurocomTranslation Services Language Link Uk Santranslate Ltd A.C.T. Fachübersetzungen GmbH GmbH Language Translation Inc. sbv anderetaal a+a boundless communication Eurotext AG Languages for Life Ltd Semantix AA Translations Limited Eurotext Translations LATN Inc. SH3 Inc. Absolute Translations Ltd Exiga Solutions GmbH Lemoine International GmbH Soget s.r.l. Accurate Translations Ltd Ferrari Studio Lexilab Sprachendienst Dr. Herrlinger Accu Translation Services Ltd Five Office Ltd Lingsoft Inc GmbH Advanced Language Translation Fokus Translatørerne Traductores jurados LinguaVox S.L. SprachUnion Inc Foreign Translations, Inc Linguistic Systems Inc Supertext AG AKAB SRL Fry & Bonthrone Partnerschaft Link Translation Bureau Synergium.eu Alaya Inc Gedev LIT - Lost in Translation srl Synergy Language Services Alexika Ltd Geneva Worldwide Inc Local Concept Tech-Lingua Bt. ALINEA Financial Translations Geotext Translations LocalEyes Ltd Techtrans GmbH All Translations Company GFT GmbH Locasoft GmbH Techworld Language Solutions Alpha CRC Ltd GIB consult SPRL Logos Group - Multilingual TECKNOTRAD ALTA Language Services Global LT LTD Translation Services TecTrad Altica Traduction Global Translations GmbH Louise Killeen Translations Limited Ten-Nine Communications Inc. AmeriClic Global Translations Solutions Ltconsult Texo SRL Apostroph AG Global Words aps Mastervoice Texpertec GmbH Arc Communications Gradus Multilingual Services Inc MasterWord Services Inc. 
TextMinded Arkadia Translations Srl HCR Mc Lehm Translation Services textocreativ AG Arvato Technical Information Sl hCtrans Company Limited MCL Corporation Textualis Babylon Translations Ltd Hedapen Global Services Media Research Inc.
  • Dita + Xliff + Cms
    www.oasis-open.org Automating Content Localization: DITA + XLIFF + CMS. Tony Jewtushenko, Co-Chair, OASIS XLIFF TC; Director, Product Innovator Ltd, Dublin, Ireland [email protected] What is a CMS? Creates or acquires content; Organizes and structures the content; Stores content and metadata in a repository; Business and workflow rules that customize and retrieve content; Publishes format & output. Why use a CMS? Benefits any organization producing content with: • Lots of contributors • Lots of revisions • Lots of content • Lots of different publication formats • Need for fast turnaround • Need for high quality. Content Management System (CMS) Workflow [Diagram: workflow lanes <Authoring>, <Repository>, <Publishing> and <Translate>; steps: Document Creation, Review / Edit, Review / Approve, Store / update, Archive, Deploy; outputs: HTML, PDF, XHTML, Help] What is a GMS? A Globalization Management System consists of: • A CMS to connect with • CMS Connectors • Workflow Engine • Translation Repository • CAT Software: TM, TermBase, MT. In a Nutshell: • A CMS + Workflow for Pre-translation. CMS Workflow with GMS [Diagram: the same workflow with a GMS added to the <Translate> lane] Elements of a GMS. CMS’s / GMS’s that support DITA and XLIFF. An XLIFF document is a container for all localization data 1.
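As an illustration of XLIFF as "a container for all localization data", the sketch below lists the translation units of a small XLIFF 1.2 document and flags those that still lack a `<target>`. This is the kind of check a GMS workflow step might run. The sample file and the `trans_units` helper are invented for illustration. Only the XLIFF 1.2 namespace and element names (`file`, `trans-unit`, `source`, `target`) come from the standard.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 elements live in this namespace
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Invented sample: one translated unit and one still awaiting translation
SAMPLE_XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="topic.dita" source-language="en-US" target-language="de-DE" datatype="xml">
    <body>
      <trans-unit id="t1">
        <source>Click OK.</source>
        <target state="translated">Klicken Sie auf OK.</target>
      </trans-unit>
      <trans-unit id="t2">
        <source>Close the window.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def trans_units(xliff_text: str):
    """Yield (id, source text, target text or None) for every <trans-unit> (hypothetical helper)."""
    root = ET.fromstring(xliff_text)
    for unit in root.iterfind(".//x:trans-unit", NS):
        source = unit.find("x:source", NS)
        target = unit.find("x:target", NS)
        yield (
            unit.get("id"),
            "".join(source.itertext()),
            "".join(target.itertext()) if target is not None else None,
        )


if __name__ == "__main__":
    # A typical workflow check: which units still need translating?
    pending = [uid for uid, _src, tgt in trans_units(SAMPLE_XLIFF) if tgt is None]
    print(pending)
```

The source, the target, its `state` and the file-level language pair all sit in one document, so a CMS connector can hand the file to a GMS and get it back without either side needing the other's internal format.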
  • Limitations of Localisation Data Exchange Standards and Their Effect on Interoperability
    Lossless Exchange of Data: Limitations of Localisation Data Exchange Standards and their Effect on Interoperability A Thesis Submitted for the Degree of Doctor of Philosophy By Ruwan Asanka Wasala Department of Computer Science and Information Systems, University of Limerick Supervisors: Reinhard Schäler, Dr. Chris Exton Submitted to the University of Limerick, May, 2013 Abstract Localisation is a relatively new field, one that is constantly changing with the latest technology to offer services that are cheaper, faster and of higher quality. As the complexity of localisation projects increases, interoperability between different localisation tools and technologies becomes ever more important. However, lack of interoperability is a significant problem in localisation. One important aspect of interoperability in localisation is the lossless exchange of data between different technologies and different stages of the overall process. Standards play a key role in enabling interoperability. Several standards exist that address data-exchange interoperability. The XML Localisation Interchange File Format (XLIFF) has been developed by a Technical Committee of Organization for the Advancement of Structured Information Standards (OASIS) and is an important standard for enabling interoperability in the localisation domain. It aims to enable the lossless exchange of localisation data and metadata between different technologies. With increased adoption, XLIFF is maturing. Solutions to significant issues relating to the current version of the XLIFF standard and its adoption are being proposed in the context of the development of XLIFF version 2. Important matters for a successful adoption of the standard, such as standard-compliance, conformance and interoperability of technologies claiming to support the standard have not yet been adequately addressed.
  • Translations in Libre Software
    UNIVERSIDAD REY JUAN CARLOS. Máster Universitario en Software Libre (Master's Degree in Free Software), Academic Year 2011/2012. Master's Thesis: Translations in Libre Software. Author: Laura Arjona Reina. Supervisor: Dr. Gregorio Robles. Acknowledgements: To Gregorio Robles and the Libresoft team at Universidad Rey Juan Carlos, for their advice, supervision and support in this work, and for the enriching experience that studying this Master's has been. To my family, for their support and patience. Dedication: For my nephew Darío. (C) 2012 Laura Arjona Reina. Some rights reserved. This document is distributed under the Creative Commons Attribution-ShareAlike 3.0 license, available at http://creativecommons.org/licenses/by-sa/3.0/ Source files for this document are available at http://gitorious.org/mswl-larjona-docs Contents: 1 Introduction; 1.1 Terminology; 1.1.1 Internationalization, localization, translation; 1.1.2 Free, libre, open source software; 1.1.3 Free culture, freedom definition, creative commons; 1.2 About this document; 1.2.1 Structure of the document; 1.2.2 Scope; 1.2.3 Methodology; 2 Goals and objectives; 2.1 Goals; 2.2 Objectives; 2.2.1 Explain the phases of localization process; 2.2.2 Analyze the benefits and counterparts; 2.2.3 Provide case studies of libre software projects and tools; 2.2.4 Present personal experiences; 3 Localization in libre software projects; 3.1 Localization workflow; 3.2 Prepare: Defining objects to be localized; 3.2.1 Some examples
  • IJCNLP 2011 Proceedings of the Workshop on Language Resources,Technology and Services in the Sharing Paradigm
    IJCNLP 2011 Proceedings of the Workshop on Language Resources, Technology and Services in the Sharing Paradigm. November 12, 2011, Shangri-La Hotel, Chiang Mai, Thailand. We wish to thank our sponsors. Gold Sponsors: www.google.com; www.baidu.com; The Office of Naval Research (ONR); The Asian Office of Aerospace Research and Development (AOARD); Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong. Silver Sponsors: Microsoft Corporation. Bronze Sponsors: Chinese and Oriental Languages Information Processing Society (COLIPS). Supporter: Thailand Convention and Exhibition Bureau (TCEB). Organizers: Asian Federation of Natural Language Processing (AFNLP); National Electronics and Computer Technology Center (NECTEC), Thailand; Sirindhorn International Institute of Technology (SIIT), Thailand; Rajamangala University of Technology Lanna (RMUTL), Thailand; Chiang Mai University (CMU), Thailand; Maejo University, Thailand. © 2011 Asian Federation of Natural Language Processing. Introduction. The Context. Some of the current major initiatives in the area of language resources – FLaReNet (http://www.flarenet.eu/), Language Grid (http://langrid.nict.go.jp/en/index.html) and META-SHARE (www.meta-share.org, www.meta-net.eu) – have agreed to organise a joint workshop on infrastructural issues that are critical in the age of data sharing and open data, to discuss the state of the art, international cooperation, future strategies and priorities, as well as the road-map of the area. It is an achievement, and an opportunity for our field, that recently a number of strategic-infrastructural initiatives have started all over the world.
  • Language Industry Standards and Guidelines
    CELAN WP 2 – DELIVERABLE D2.1 ANNOTATED CATALOGUE OF BUSINESS-RELEVANT SERVICES, TOOLS, RESOURCES, POLICIES AND STRATEGIES AND THEIR CURRENT UPTAKE IN THE BUSINESS COMMUNITY ANNEX 2 INVESTIGATION OF BUSINESS-RELEVANT STANDARDS AND GUIDELINES IN THE FIELDS OF THE LANGUAGE INDUSTRY Project Title: CELAN Project Type: Network Project Programme: LLP – KA2 Project No: 196466-LLP-1-2010-1-BE-KA2-KA2PLA Version: 1.2 Date: 2013-01-30 Author: Blanca Stella Giraldo Pérez (sub-contract for standards investigation and analysis) Contributors: Infoterm (supervision), other CELAN partners (comments) The CELAN network project has been funded with support from the European Commission, LLP programme, KA2. This communication reflects the views only of the authors, and the Commission cannot be held responsible for any use which may be made of the information contained therein. CELAN D2.1 ANNEX 2_fv1.2 Executive Summary The investigation of industry- and business-relevant standards and guidelines in the fields of the language industry (LI) was subdivided into four parts: general standardization framework relevant to CELAN; basic standards related to the ICT infrastructure with particular impact on the LI; specific standards pertaining to language technologies, resources, services and LI-related competences and skills; and latest developments with respect to the complementarity of LI standards and assistive technologies standards. At the end of the investigation a summary and recommendations are given. Standards can play an important role in support of developing LI policies/strategies, educational schemes, language technology tools (LTT) and language and other content resources (LCR), language services or using the offers of language service providers (LSP).
They can definitely contribute to the design of better products and save financial resources from the outset, thus avoiding the need to retrofit products at later stages of their life cycle at significantly higher cost.
  • Multilingualweb – Language Technology a New W3C Working Group
    Why Localisation Standardisation Activities Matter to You. MultilingualWeb – Language Technology: A New W3C Working Group. Pedro L. Díez Orzas. A few words about who is talking: Pedro L. Díez-Orzas, CEO at Linguaserve I.S. S.A, Professor at Univ. Complutense de Madrid, PhD in Computational Linguistics, Member of MultilingualWeb-LT W3C, Member CTN 191 – AENOR – Terminology [email protected] Standards: They are great. Everyone should have their own. New Standards for Old Needs http://xkcd.com/927/ New Standards for New Needs: Viability of Multilinguality on the Web depends on the level of mechanisation of methods and processes. A certain level of standard metadata can decisively help to make this happen. But it cannot always take that long… The Space Shuttle and the Horse's Rear End http://www.astrodigital.org/space/stshorse.html MultilingualWeb‐LT • New W3C Working Group under I18n Activity – http://www.w3.org/International/multilingualweb/lt/ • Aims: define meta‐data for web content that facilitates its interaction with language technologies and localization processes. • Already have 28 participants from 20 organisations – Chairs: Felix Sasaki, David Filip, Dave Lewis • Timeline: – Feature Freeze Nov 2012 – Recommendation complete Dec 2013 EU Project FP7‐ICT MLW‐LT Approach • Standardise Data Categories – ITS (1.0) has: Translate, Loc note, Terminology, Directionality, Ruby, Language Info, Element Within Text – MLW‐LT could add: MT‐specific instructions, workflow, quality‐related provenance, legal? • Map to formats – ITS focussed on XML • useful for XHTML, DITA, DocBook – MLW‐LT also targets HTML5 and CMS‐based ‘deep web’ – Use of microdata and RDFa • Use Cases MLW‐LT Main Tasks • Develop this metadata standard through broad consensus across communities, involving content producers, localisation workers, language technology experts, browser vendors and users.
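The ITS "Translate" data category mentioned above marks content that must not be translated. The sketch below collects the translatable text of a small XML document while honouring local `its:translate` attributes: a local value overrides the inherited one, and element content is translatable by default. The sample document and the `translatable_text` helper are invented for illustration. The sketch ignores ITS global rules such as `its:translateRule`, and a real extractor would keep "Press … to save." together as one segment with a placeholder, rather than splitting it into runs.

```python
import xml.etree.ElementTree as ET

# Qualified name of the local its:translate attribute (W3C ITS namespace)
ITS_TRANSLATE = "{http://www.w3.org/2005/11/its}translate"

# Invented sample: a key name and a code listing are marked "do not translate"
SAMPLE = """<doc xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <p>Press <cmd its:translate="no">Ctrl+S</cmd> to save.</p>
  <code its:translate="no">print("hello")</code>
</doc>"""


def translatable_text(elem: ET.Element, inherited: bool = True) -> list[str]:
    """Collect text runs that are translatable under local its:translate markup (hypothetical helper)."""
    flag = elem.get(ITS_TRANSLATE)
    # A local its:translate overrides the value inherited from the parent;
    # element content is translatable by default
    translate = inherited if flag is None else flag == "yes"
    runs = []
    if translate and elem.text and elem.text.strip():
        runs.append(elem.text.strip())
    for child in elem:
        runs.extend(translatable_text(child, translate))
        # Text after a child element (its "tail") belongs to this element,
        # so it follows this element's flag, not the child's
        if translate and child.tail and child.tail.strip():
            runs.append(child.tail.strip())
    return runs


if __name__ == "__main__":
    print(translatable_text(ET.fromstring(SAMPLE)))
```

Inheritance is the main design point here. Marking `<code>` once protects everything inside it, which is why a small set of standard data categories can steer a whole localization pipeline.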
  • Translating and the Computer 36
    Editions Tradulex, Geneva © AsLing, The International Association for Advancement in Language Technology, 2014. Distribution without the authorisation of ASLING is not allowed. These proceedings are downloadable from www.tradulex.com Conference Chairs and Editors of the Proceedings: João Esteves-Ferreira, Tradulex – International Association for Quality Translation. Juliet Macan, Arancho Doc srl. Ruslan Mitkov, University of Wolverhampton. Olaf-Michael Stefanov, JIAMCATT - International Annual Meeting on Computer-Assisted Translation and Terminology, United Nations (ret.). Programme Committee: Alain Désilets, National Research Council of Canada (NRC). David Chambers, World Intellectual Property Organisation (ret.). Gloria Corpas Pastor, University of Malaga. Estelle Delpech, Nomao. David Filip, LRC - Localisation Research Centre (Ireland), CNGL - Centre for Global Intelligent Content (Ireland). Web, University of Limerick. Pamela Mayorcas, Fellow of the Institute of Translation and Interpreting (FITI). Paola Valli. Nelson Verástegui, International Telecommunication Union (ITU). Conference Manager: Nicole Adamides. Technical Advisor: Jean-Marie Vande Walle. Editorial Assistants: Míriam Urbano Mendaña, Petya Petkova. Acknowledgements: AsLing wishes to thank and acknowledge the support of the sponsors of TC36. Preface: 2014 has been a year of profound changes for the Translating and the Computer conference, which over 36 years has managed to maintain its character as a unique forum for researchers, developers and users.
It brings together academics involved in language technology research and in teaching translation and terminology with those who develop and market tools for language transformation, and both of these groups with practitioners: translators, terminologists, interpreters and voice-over specialists, whether freelancers or working in the translation departments of large organisations, international companies or Language Services Providers (LSPs), large and small.