Segmentation Rules SRX – Segmentation Rules Exchange

A Guide to Open Standards and Open Source
A Conceptual Case Study

Angelika Zerfass
David Filip, Ph.D.
© 2009 Moravia IT a.s. and Angelika Zerfass

Agenda
1. Polling Questions
2. Definitions
------------------------
3. Architecture considerations
4. Strategy
5. Open Standards
6. Talking Legalese
7. Open Tools
8. Usage Cases

1. Polling Questions
• (A) What is your level of experience with localization industry open standards (such as the XML-based TMX, TBX, SRX, and XLIFF standards)?
  * I know these standards well and see them regularly in the work done at my organization.
  * I have a basic understanding of localization industry open standards.
  * I'm new to localization industry open standards and want to learn more.

1. Polling Questions
• (B) How familiar are you with open source applications used in the localization industry (such as OmegaT, Okapi Framework, Sun XLIFF Translation Editor)?
  * I'm familiar with these tools and use them (or tools like them) regularly.
  * I have a basic understanding of these applications, but don't really use them.
  * I'm new to the idea of open source tools for the localization industry and want to learn more.

1. Polling Questions
• (C) Which are important to you?
  * Learning about the differences between open standards and open source.
  * Learning about actual use of open standards and commonly used tools.
  * Learning about licensing and patent issues.
  * Learning about the open Translation Management Systems in use or development today.

Definitions
The magic quadrant

                    Closed Source                  Open Source
Open Standards      Q1: Open-Closed (Good)         Q2: Open-Open (Good)
Proprietary ways    Q3: Proprietary-Closed (Bad)   Q4: Proprietary-Open (Wild)

2. Definitions
• TMS, GMS
  * ETMS – Enterprise TMS, “from cradle to the grave”
  * Computer Aided L10N Project Management System (CALPMS)
• Open Standards: XLIFF, TMX, TBX, SRX etc.
• OSS, Open Source, Free Software vs. Freeware
• Open Source (Copy-left) Licensing vs. Permissive Licensing

3. Architecture

4. Strategy
[Diagram: Win (Translator), Win (LSP of any size), Win (Enterprise)]
• Change and new business needs
  * Exponential growth of content
  * Changing balance between published and user generated content
  * Need for Continuous Translation
  * Community Translation
  * Shared language data
  * Massive online collaboration
  * Translation automation
• Enabler
  * TinyTM
  * OmegaT
  * Open ACS
  * OKAPI framework
  * Etc.

What is an open standard?
World Wide Web Consortium's definition
• Transparency (design/due process is public, and all technical discussions, meeting minutes, are archived and referencable in decision making)
• Relevance (new standardization is started upon due analysis of the market needs, including requirements phase, e.g. accessibility, multi-linguism)
• Openness (anyone can participate: industry, individual, public, government bodies, academia, on a worldwide scale)
• Impartiality and consensus (neutral org leading it, with equal weight for each participant)
• Availability (free access to the standard text, both during development and at final stage, translations, and clear IPR rules for implementation, allowing open source development in the case of Web technologies)
• Support (multiple implementations, ongoing process for testing, errata, revision, permanent access)
Wikipedia, 2009

Goal of open standards
• Interoperability of tools
• Vendors can concentrate on innovation in fields other than their proprietary formats
• Standardization of processes (translation of just one file format like XLIFF instead of DOC, HTML, InDesign, FM…)

Success of open standards
• Depends on the commercial usability
• TMX – widespread, XLIFF – coming on strong, SRX – not widely used, TBX – slow, others – in the making (TBX Basic, GMX…)

5. Open Standards
• Why Open Standards in Open Source?
• Implementing open standards seems an obvious success scenario for OSS development
• XLIFF and TMX are open standards co-developed by our clients
• Minimalist open standards implementation ensures desired functionality and is also legally safe
• LISA OSCAR TMX 1.4b, 1.5?, 2.0?
• OASIS XLIFF 1.1, 1.2, 1.2.1?, 2.0?

Open Standards
[Diagram: OAXAL. © Andrzej Zydron, OASIS OAXAL TC]

TMX Translation Memory Exchange
• From the TMX specification:
• …The purpose of the TMX format is to provide a standard method to describe translation memory data that is being exchanged among tools and/or translation vendors, while introducing little or no loss of critical data during the process…

What is TMX
• It is an XML representation of translation memory data
• Header
• Body

  <header
    creationtool="Déjà Vu"        Déjà Vu, Transit, Trados, MemoQ
    creationtoolversion="4"       Version / build number of the tool
    datatype="PlainText"          HTML, SGML, RTF, Interleaf, Java…
    segtype="sentence"            Basic segmentation
    adminlang="en-us"             Default language for elements like <note>
    srclang="en-us"               Source text language
    o-tmf="DVMDB" >               Original translation memory format (DVMDB – Déjà Vu database…)

What is TMX
• Body

  <body>
    <tu creationdate="20030915T153704Z" creationid="USER">
      <tuv lang="EN-US">
        <seg>This is the first sentence.</seg>
      </tuv>
      <tuv lang="DE-DE">
        <seg>Dies ist der erste Satz</seg>
      </tuv>
    </tu>
  </body>

  tu = translation unit, tuv = translation unit variant (lang = language), seg = segment

What is TMX
• Depending on the tool that created the TMX file, it can be bilingual or multilingual.
• Importing a multilingual TMX file into a bilingual project will only import the relevant languages.

Levels of TMX
• Level 1:
  * Plain text only (sufficient for data coming from software localization tools)
• Level 2:
  * Text plus formatting (data coming from translation memory tools used for translation of documentation)
To move formatting and text from one tool to the other, both tools need to be Level 2 compliant!
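The header and body fragments from the What is TMX slides can be read with a few lines of code. The following is a minimal sketch in Python using only the standard library, not the way any particular TM tool does it. The sample string wraps the slide fragments in an assumed `<tmx>` root element, and the names `read_tmx`, `plain_text` and `INLINE_CODES` are hypothetical helpers introduced here for illustration. The reader accepts both the `lang` attribute used on the slide and `xml:lang`, and it reduces each segment to plain text by skipping inline code elements, which roughly corresponds to Level 1 content.

```python
import xml.etree.ElementTree as ET

# Sample modelled on the slide fragments; the <tmx> wrapper is an assumption.
TMX_SAMPLE = """<tmx version="1.4">
  <header creationtool="Déjà Vu" creationtoolversion="4"
          datatype="PlainText" segtype="sentence"
          adminlang="en-us" srclang="en-us" o-tmf="DVMDB"/>
  <body>
    <tu creationdate="20030915T153704Z" creationid="USER">
      <tuv lang="EN-US"><seg>This is the first sentence.</seg></tuv>
      <tuv lang="DE-DE"><seg>Dies ist der erste Satz</seg></tuv>
    </tu>
  </body>
</tmx>"""

# ElementTree exposes xml:lang under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Inline elements whose content is native formatting code, not translatable
# text (assumed set; check it against the TMX version you work with).
INLINE_CODES = {"bpt", "ept", "it", "ph", "ut"}


def plain_text(elem):
    """Return the text of a <seg>, leaving out the content of inline codes."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in INLINE_CODES:
            parts.append(plain_text(child))  # e.g. <hi> wraps real text
        parts.append(child.tail or "")       # text following the child
    return "".join(parts)


def read_tmx(tmx_text):
    """Return (header attributes, list of {language: plain text} per <tu>)."""
    root = ET.fromstring(tmx_text)
    header = dict(root.find("header").attrib)
    units = []
    for tu in root.iter("tu"):
        variants = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
            seg = tuv.find("seg")
            if seg is not None:
                variants[lang.lower()] = plain_text(seg)
        units.append(variants)
    return header, units


header, units = read_tmx(TMX_SAMPLE)
print(header["srclang"], header["o-tmf"])
for unit in units:
    print(unit)
```

Keying each translation unit by language code keeps the sketch usable for bilingual and multilingual files alike: a bilingual project would simply pick the two languages it needs from each dictionary.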
Level 1
• Formatting that is applied to the source and target text of a translation unit is not exported to the TMX file, only pure text.
• Original
  * This sentence has some formatting.
• In TMX
  * This sentence has some formatting.

Level 2
• Formatting that is applied to the source and target text of a translation unit is exported to the TMX file.
• Different tools use different ways of encoding that information (placeholders or actual formatting information).

Level 2
MemoQ – Word DOC with formatting
  <seg> This is the <bpt i='1' type='bold'>{}</bpt>first<ept i='1'>{}</ept> sentence; this is <bpt i='2' type='ulined'>{}</bpt>another<ept i='2'>{}</ept> sentence. </seg>
Trados 2007 8.2 / 8.3 – Word DOC with formatting
  <seg> This is the <bpt i="1">&lt;cf bold=&quot;on&quot;&gt;</bpt>first<ept i="1">&lt;/cf&gt;</ept> sentence; this is <bpt i="2">&lt;cf underlinestyle=&quot;single&quot;&gt;</bpt>another<ept i="2">&lt;/cf&gt;</ept> sentence. </seg>
Trados 2009 – Word DOC with formatting
  <seg> This is the <bpt i="1" type="Bold" />first<ept i="1" /> sentence; this is<bpt i="2" type="Underline" />another<ept i="2" /> sentence. </seg>

Level 2
MemoQ – HTML file with link
  <seg> Text with a link to <bpt i='1'>&lt;a href=&quot;http://www.samplehtml.com/page1.htm&quot;&gt;</bpt>another page<ept i='1'>&lt;/a&gt;</ept>. </seg>
Trados 2007 8.2 / 8.3 – HTML file with link
  <seg> Text with a link to <bpt i="1" type="link">&lt;a href = &quot;http://www.samplehtml.com/page1.htm&quot;&gt;</bpt>another page<ept i="1">&lt;/a&gt;</ept>. </seg>
Trados 2009 – HTML file with link
  <seg> Text with a link to <bpt i="1" type="19" x="1" />another page<ept i="1" />. </seg>
OmegaT – HTML file with link
  OmegaT internal format:
  <seg>Text with a link to &lt;a0&gt;another page&lt;/a0&gt;.</seg>
  TMX Level 2 format:
  <seg>Text with a link to <bpt i='0' x='0'>&lt;a0&gt;</bpt>another page<ept i='0'>&lt;/a0&gt;</ept>.</seg>

Level 2
MemoQ – InDesign
  <seg> InDesign Text with <bpt i='1' type='bold'>{}</bpt>formatting in bold<ept i='1'>{}</ept>. </seg>
Trados 2007 8.2 / 8.3 – InDesign
  <seg> InDesign text with <bpt i="1">&lt;cf ptfs=&quot;c_Bold&quot;&gt;</bpt>formatting in bold<ept i="1">&lt;/cf&gt;</ept>. </seg>
Trados 2009 – InDesign
  <seg> InDesign text with <bpt i="1" type="pt16" x="1" />formatting in bold<ept i="1" />. </seg>

Implications of different tags for formatting
• Tools that use placeholder tags do not include the actual formatting information in the TMX file
• Other tools might only be able to re-use the text, especially if the formatting is only applied in the target segment, but not in the source
• The result of the exchange would then be the same as with TMX level 1 (text only)
• TMX files which carry the actual formatting information will yield better matches in other tools that can read this information

Where do you use TMX?
• Transferring data between different translation memory tools
• Checking tools / QA tools
• TM maintenance tools
• Basis for bilingual term extraction

Reusing TMX data
• Although Translation Memory Tools have the same basic idea (storing source-target language pairs and recycling translations), this has been realized in different ways.
• Exchange with TMX works, but there is an issue that can lower the match rates nonetheless… the segmentation rules

SRX – Segmentation Rules Exchange
• From the SRX specification
• …The purpose of the SRX format is to provide a standard method to describe segmentation rules that are being exchanged among tools and/or translation vendors...
• …is intended to enhance the TMX standard…

Why SRX?
• Tool A
  * Semicolon is end of segment
  * This is a sentence; this is another sentence.
  * TM system sees two separate segments
• Tool B
  * Semicolon is NOT end of segment
  * This is a sentence; this is another sentence.
  * TM system sees one segment
• No match
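The Tool A / Tool B mismatch can be reproduced in a few lines. This is a simplified sketch in Python, not SRX rule syntax and not the segmenter of any real tool: the hypothetical `segment` function breaks after a configurable set of characters when whitespace follows, and the translation memory is modelled as a plain set of source segments with exact matching only.

```python
import re


def segment(text, break_chars):
    """Split text after any character in break_chars that is followed by whitespace."""
    pattern = r"(?<=[" + re.escape(break_chars) + r"])\s+"
    return [part for part in re.split(pattern, text) if part]


TEXT = "This is a sentence; this is another sentence."

# Hypothetical Tool A: a semicolon ends a segment.
tool_a = segment(TEXT, ".!?;")
# Hypothetical Tool B: a semicolon does NOT end a segment.
tool_b = segment(TEXT, ".!?")

# A translation memory filled by Tool A, reduced to its source segments.
tm_from_tool_a = set(tool_a)

# Tool B looks up its own segments in that memory (exact matches only).
exact_matches = [seg for seg in tool_b if seg in tm_from_tool_a]

print("Tool A segments:", tool_a)
print("Tool B segments:", tool_b)
print("Exact matches for Tool B:", exact_matches)
```

Tool A stores two segments and Tool B looks up one longer segment, so the lookup finds nothing even though every word is already translated. Exchanging the segmentation rules together with the TMX data is what lets the receiving tool cut the text at the same places.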