Part 2: Data Category Selection (DCS) for Electronic Terminological Resources (ETR)
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Proceedings of the Workshop on Multilingual Language Resources and Interoperability, Pages 1–8, Sydney, July 2006
COLING •ACL 2006 MLRI’06 Multilingual Language Resources and Interoperability Proceedings of the Workshop Chairs: Andreas Witt, Gilles Sérasset, Susan Armstrong, Jim Breen, Ulrich Heid and Felix Sasaki 23 July 2006 Sydney, Australia Production and Manufacturing by BPA Digital 11 Evans St Burwood VIC 3125 AUSTRALIA c 2006 The Association for Computational Linguistics Order copies of this and other ACL proceedings from: Association for Computational Linguistics (ACL) 209 N. Eighth Street Stroudsburg, PA 18360 USA Tel: +1-570-476-8006 Fax: +1-570-476-0860 [email protected] ISBN 1-932432-82-5 ii Table of Contents Preface .....................................................................................v Organizers . vii Workshop Program . ix Lexical Markup Framework (LMF) for NLP Multilingual Resources Gil Francopoulo, Nuria Bel, Monte George, Nicoletta Calzolari, Monica Monachini, Mandy Pet and Claudia Soria . 1 The Role of Lexical Resources in CJK Natural Language Processing Jack Halpern . 9 Towards Agent-based Cross-Lingual Interoperability of Distributed Lexical Resources Claudia Soria, Maurizio Tesconi, Andrea Marchetti, Francesca Bertagna, Monica Monachini, Chu-Ren Huang and Nicoletta Calzolari. .17 The LexALP Information System: Term Bank and Corpus for Multilingual Legal Terminology Consolidated Verena Lyding, Elena Chiocchetti, Gilles Sérasset and Francis Brunet-Manquat . 25 The Development of a Multilingual Collocation Dictionary Sylviane Cardey, Rosita Chan and Peter Greenfield. .32 Multilingual Collocation Extraction: Issues and Solutions Violeta Seretan and Eric Wehrli . 40 Structural Properties of Lexical Systems: Monolingual and Multilingual Perspectives Alain Polguère . 50 A Fast and Accurate Method for Detecting English-Japanese Parallel Texts Ken’ichi Fukushima, Kenjiro Taura and Takashi Chikayama . 60 Evaluation of the Bible as a Resource for Cross-Language Information Retrieval Peter A. -
Ffimffi Llmub Liks Kord Kuus Alates Lggg
EESTI STAtrDARNIAMET ffiMffi llmub liks kord kuus alates lggg. aastast tssN 1406-0698 Tdnases numbris : r) EESTI UUDISED 1 tr) Standardikomisjonis 2 + Akred iteeritud katselaborid 3 tr) Tur.ibikinnitused 5 tr) Kval iteed ij u hti mi ne toid uainetoostuses 10 tr) Koolitus DIN-is 11 E> Harmoneeritud standardid 13 tr) IFAN 13 r) CEN UUDISED 14 tr) ISO UUDISED ISO bulletddni lehekulgedelt .... 15 tr+ Juulis saadud ISO standardid ja ISO/DIS 17 IEC standardid 22 CEN standardid 25 r) Eesti standardite koostamisettepanekud . 2a E) Eesti standardite kavandid 28 r) Eesti standardite muugi TOp 10 . 29 r+ Mtiugile saabunr-ld 30 tr) Leiva- ja saiatootjatele 30 tr) Reglstrisse kantud 30 EESTI AUDISBD 3-7. juunil toimus Eesti P6llumajandustilikoolis P6hjarnaade Metsanduse, Veterinaaria ja P6llumajanduse Ulikooli poolt Baltimaadele organiseeritud kursus teemal "Kvaliteedijuhtimise stisteem toiduainetiitistuses", millest v6ite lugeda lk. 10 25. juunil toimus Kaitseministeeriumis standardiseerimisalane seminar. Standardiseerimist NATO raames ja sellealast tegevust Taanis tutvustasid Taani spetsialistid. Standardiameti Standardikomisjoni koosolek toimus 20 A6 96. ,l?, l ''-{a Standardiameti, kui katselaborite ning sertifitseerimis- ja inspektsioonorganite akrediteerimisorgani poolt on seisuga 01 08 96 akditeeritud 2 esimest laboric RAS ARETO toiduainete analiiiisi ning Riigi Veterinaarlaboratoorium toiduainete analiltlsi ja loomhaiguste diagnoosi valdkonnas. Liihemalt lugege lk. 3 STANDARDIKOMISJOMS Standardiameti Standardikomisjoni 20 M 96 koosolekul oli piievakonas: 1. Eestistandarditekomtamisettepanekutearutelu l.l Keevituse koordineerimine. thesanded ja vastutus Ettepaneku esitaja: AS Sele Aluseksvdetavad dokumendid: EN 7 19 i,ilev6tt. OTSUSTATI Ettepanek heaks kiita. Paluda U.Vainul tiipsustada, milles seisneb keevituse koordineerimise olemus. Vdib-olla peaks nimetuses s6na "koordineerimine" asemel olema muu s6na. Kas keevitus v6i keevitusttitid? 1.2 NDT operaatorite kvalifitseerimine ja sertifitseerimine. -
ISO/IEC JTC 1 Information Technology
ISO/IEC JTC 1 Information technology Big data Preliminary Report 2014 Our vision Our process To be the world’s leading provider of high Our standards are developed by experts quality, globally relevant International all over the world who work on a Standards through its members and volunteer or part-time basis. We sell stakeholders. International Standards to recover the costs of organizing this process and Our mission making standards widely available. ISO develops high quality voluntary Please respect our licensing terms and International Standards that facilitate copyright to ensure this system remains international exchange of goods and independent. services, support sustainable and equitable economic growth, promote If you would like to contribute to the innovation and protect health, safety development of ISO standards, please and the environment. contact the ISO Member Body in your country: www.iso.org/iso/home/about/iso_ members.htm This document has been prepared by: Copyright protected document ISO/IEC JTC 1, Information technology All rights reserved. Unless otherwise Cover photo credit: ISO/CS, 2015 be reproduced or utilized otherwise in specified,any form noor partby anyof this means, publication electronic may or mechanical, including photocopy, or posting on the internet or intranet, without prior permission. Permission can be requested from either ISO at the address below or ISO’s member body in the country of the requester: © ISO 2015, Published in Switzerland Case postale 56 • CH-1211 Geneva 20 Tel.ISO copyright+41 22 -
A Multi-View Hyperlexicon Resource for Speech and Language System Development
A multi-view hyperlexicon resource for speech and language system development Dafydd Gibbon, Thorsten Trippel Fakult¨at f¨ur Linguistik und Literaturwissenschaft Universit¨at Bielefeld Postfach 100 131, D–33501 Bielefeld, Germany gibbon, ttrippel @spectrum.uni-bielefeld.de Abstract New generations of integrated multimodal speech and language systems with dictation, readback or talking face facilities require multiple sources of lexical information for development and evaluation. Recent developments in hyperlexicon development offer new perspectives for the development of such resources which are at the same time practically useful, computationally feasible, and theoretically well– founded. We describe the specification, three–level lexical document design principles, and implementation of a MARTIF document structure and several presentation structures for a terminological lexicon, including both on demand access and full hypertext lexicon compilation. The underlying resource is a relational lexical database with SQL querying and access via a CGI internet interface. This resource is mapped on to the hypergraph structure which defines the macrostructure of the hyperlexicon. 1. Overview in the same sense in which it is used for smaller linguistic New generations of integrated multimodal speech and units such as word and sentence in current linguistic theory. language systems with dictation, readback or talking face Advances in computational techniques have led to a rap- facilities require multiple sources of lexical information for prochement between lexicon theory in computational lin- development and evaluation. Recent developments in hy- guistics and large–scale corpus–based computational lexi- perlexicon development offer new perspectives for the de- cography, minimising possible theory–application conflicts velopment of such resources which are at the same time in the present case. -
Iso 24613:2008(E)
This preview is downloaded from www.sis.se. Buy the entire standard via https://www.sis.se/std-910504 INTERNATIONAL ISO STANDARD 24613 First edition 2008-11-15 Language resource management — Lexical markup framework (LMF) Gestion de ressources langagières — Cadre de balisage lexical (LMF) Reference number ISO 24613:2008(E) © ISO 2008 This preview is downloaded from www.sis.se. Buy the entire standard via https://www.sis.se/std-910504 ISO 24613:2008(E) PDF disclaimer This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area. Adobe is a trademark of Adobe Systems Incorporated. Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below. COPYRIGHT PROTECTED DOCUMENT © ISO 2008 All rights reserved. Unless otherwise specified, no part of this publication may be reproduced or utilized in any form or by any means, electronic or mechanical, including photocopying and microfilm, without permission in writing from either ISO at the address below or ISO's member body in the country of the requester. -
ESFRI / E-IRG Report on Data Management, January 2010
e-IRG Report on Data Management Data Management Task Force November 2009 1 1 EXECUTIVE SUMMARY A fundamental paradigm shift known as Data Intensive Science is quietly changing the way science and research in most disciplines is being conducted. While the unprecedented capacities of new research instruments and the massive computing capacities needed to handle their outputs occupy the headlines, the growing importance and changing role of data is rarely noticed. Indeed it seems the only hints to this ever burgeoning issue are mentions of heights of hypothetical stacks of DVDs when illustrating massive amounts of ”raw, passive fuel” for science. However, a shift from a more traditional methodology to Data Intensive Science – also sometimes recognized as the 4th Research Paradigm – is happening in most scientific areas and making data an active component in the process. This shift is also subtly changing how most research is planned, conducted, communicated and evaluated. This new paradigm is based on access and analysis of large amounts of new and existing data. This data can be the result of work of multiple groups of researchers, working concurrently or independently without any partnership to the researchers that originally gathered the information. Use of data by unknown parties for purposes that were not initially anticipated creates a number of new chal- lenges related to overall data management. Long-term storage, curation and certification of the data are just the tip of the iceberg. So called Digital Data Deluge, for example, caused by the ease with which large quantities of new data can be created, becomes much more difficult to deal with in this new environment. -
Tomaž Erjavec: Platforms and Standards for Data Sharing
7/21/15 Platforms and standards for data sharing Tomaž Erjavec Dept. of Knowledge Technologies Jožef Stefan Institute Ljubljana How to make data reusable? UFSP Sprache und Raum May 2015 2 Overview 1. Introduction 2. CLARIN repositories 3. Standards for encoding language data 4. Conclusions 1 7/21/15 Open data 3 Open source/free software • A very successful hippy attitude to program development and distribution: Users have the freedom to run, copy, distribute, study, change and improve the software. • Success stories: emacs, Linux, Perl, Apache, … • Licences to go with OS software: GPL, LGPL, Apache license, … → not only should the software be open, but any upgrade should also be made open Open data 4 Closed data • The basis of science is that experiments should be reproducible • Yet without the data, they cannot be. • But research data is typically unavailable to other researchers • Data is produced by researchers in (mostly) non-profit public institutions • Data is developed with public money So, why is it closed? 2 7/21/15 Open data 5 Reasons for locking (linguistic) data • Fear: „I could be sued for copyright or privacy violation“ • Perfectionism: „It still contains mistakes“ • Stinginess: „I worked too hard on it to just give it away“ • Work: „I would have to document/format it first“ • Money: „Maybe I can sell it at some point“ • Monopoly: „I am protecting my scientific position“ Open data 6 Results • Waste of public funds and of researchers time (duplication of effort) • Impossible to improve previous results & to collaborate (smaller efficiency) • Impossible to involve citizens and society (non-transparency of the scientific process) 3 7/21/15 Open data 7 Changing times Open text repositories: • MediaWiki, Google Books, OLAC, … H2020: • Open data and publications are a requirement • This policy is being adopted by EU member states Research infrastructures: • EU instrument for establishing long term facilities, resources and related services in order to support research • Humanities and social sciences: DARIAH, CLARIN Open data 8 II. -
Accessibility of Multilingual Terminological Resources - Current Problems and Prospects for the Future
Accessibility of Multilingual Terminological Resources - Current Problems and Prospects for the Future Gerhard Budin*, Alan K. Melby# * University of Vienna Department of Translation and Interpretation, 1190 Vienna, Austria <[email protected]> # Brigham Young University Department of Linguistics, Provo, Utah, USA <[email protected]> Abstract In this paper we analyse the various problems in making multilingual terminological resources available to users. Different levels of diversity and incongruence among such resources are discussed. Previous standardization efforts are reviewed. As a solution to the lack of co-ordination and compatibility among an increasing number of ‘standard’ interchange formats, a higher level of integration is proposed for the purpose of terminology-enabled knowledge sharing. The family of formats currently being developed in the SALT project is presented as a contribution to this solution. On all these levels, quite a number of standards have 1. Current Problems emerged: ISO standards such as ISO 12200 and 12620, Multilingual Terminological Resources (MTRs) have been industry standards such as LISA's TMX and TBX, created for decades for a variety of purposes. The tradition project- or organization-specific standards on the of specialized lexicography or terminography for creating international level such as the TEI, and on the EU level technical dictionaries has developed concomitantly with such as ELRA, EAGLES, Geneter (from Inesterm and the rise of modern science and technology; multilingual other projects), IIF (from the Interval project), OLIF (from terminology databases have been in existence since the the Otelo project), and many others. 1950s. But the integrative effect of standardization has been Being an integral part of the language industries and the limited by lack of co-ordination among such initiatives, information economy, MTRs have been integrated more resulting in the absence of interfaces among application- recently into machine translation systems, technical specific formats and models. -
Terminology for Translators — an Implementation of ISO 12620 Robert Bononno
Document generated on 09/27/2021 7:17 p.m. Meta Journal des traducteurs Translators' Journal Terminology for Translators — an Implementation of ISO 12620 Robert Bononno Volume 45, Number 4, décembre 2000 Article abstract As far as translators are concerned, terminology is primarily an ad hoc affair, URI: https://id.erudit.org/iderudit/002101ar more a matter of filling in the blanks in their knowledge than systematically DOI: https://doi.org/10.7202/002101ar studying a constellation of terms in a given universe of discourse. This article sketches the historical background of terminology and discusses the See table of contents importance of terminology management for translators. More specifically, it outlines a method by which translators can make use of the data categories discussed in ISO 12620 in their daily work. A specific implementation of these Publisher(s) data categories, using off-the-shelf software, is presented. Les Presses de l'Université de Montréal ISSN 0026-0452 (print) 1492-1421 (digital) Explore this journal Cite this article Bononno, R. (2000). Terminology for Translators — an Implementation of ISO 12620. Meta, 45(4), 646–669. https://doi.org/10.7202/002101ar Tous droits réservés © Les Presses de l'Université de Montréal, 2000 This document is protected by copyright law. Use of the services of Érudit (including reproduction) is subject to its terms and conditions, which can be viewed online. https://apropos.erudit.org/en/users/policy-on-use/ This article is disseminated and preserved by Érudit. Érudit is a non-profit inter-university consortium of the Université de Montréal, Université Laval, and the Université du Québec à Montréal. -
Iso/Ts 19104:2008(E)
This preview is downloaded from www.sis.se. Buy the entire standard via https://www.sis.se/std-910473 TECHNICAL ISO/TS SPECIFICATION 19104 First edition 2008-11-15 Geographic information — Terminology Information géographique — Terminologie Reference number ISO/TS 19104:2008(E) © ISO 2008 This preview is downloaded from www.sis.se. Buy the entire standard via https://www.sis.se/std-910473 ISO/TS 19104:2008(E) PDF disclaimer This PDF file may contain embedded typefaces. In accordance with Adobe's licensing policy, this file may be printed or viewed but shall not be edited unless the typefaces which are embedded are licensed to and installed on the computer performing the editing. In downloading this file, parties accept therein the responsibility of not infringing Adobe's licensing policy. The ISO Central Secretariat accepts no liability in this area. Adobe is a trademark of Adobe Systems Incorporated. Details of the software products used to create this PDF file can be found in the General Info relative to the file; the PDF-creation parameters were optimized for printing. Every care has been taken to ensure that the file is suitable for use by ISO member bodies. In the unlikely event that a problem relating to it is found, please inform the Central Secretariat at the address given below. COPYRIGHT PROTECTED DOCUMENT © ISO 2008 The reproduction of the terms and definitions contained in this International Standard is permitted in teaching manuals, instruction booklets, technical publications and journals for strictly educational or implementation purposes. The conditions for such reproduction are: that no modifications are made to the terms and definitions; that such reproduction is not permitted for dictionaries or similar publications offered for sale; and that this International Standard is referenced as the source document. -
05 Administrative and Preservation Metadata
Administrative and preservation 05 metadata Introduction During the course of its lifecycle, a single digital archival object requires an extensive amount of associated metadata, so that it can be managed and preserved effectively by the repository, and understood and accessed by the researcher. There are three broad categories of metadata: • Descriptive metadata: information about the intellectual content of a digital object, which is used to aid identification and discovery of the object by the researcher. • Structural metadata: information about the relationships between digital objects, which can be very complex in a large hybrid personal archive. Structural metadata also supports the display and navigation of digital objects by users. • Administrative metadata: information needed by the repository for the long-term management of a digital object, including information about an object’s creation, technical information such as file formats, provenance information and information about intellectual property rights (see p. 252). This chapter of the Workbook is primarly concerned with the metadata that must be recorded for administrative and preservation purposes, though it touches on descriptive metadata where this is relevant, and a single piece of metadata may, of course, fulfil several functions. The following metadata areas are introduced and their application to the context of hybrid or digital personal archives is explored: • Persistent identifiers (see below). • Preservation metadata for personal digital archives (see p. 73). • Using METS for preservation and dissemination of personal digital archives (see p. 117). • Rights metadata for personal digital archives (see p. 141). • Metadata for authenticity: hash functions and digital signatures (see p. 152). Persistent identifiers Persistent identifiers, often referred to as PIDs, provide a means of connecting and distinguishing between an identifier for an object (which should be permanent) and an object’s location (which may change). -
ISO Standards and Working Processes at the United Nations Language Services: a Comparison
ISO standards and working processes at the United Nations language services: a comparison María Barros Senior Reviser, STS/DGACM, UNHQ United Nations Sabbatical Leave Programme 2017 Abstract International standards, in particular those developed by the International Organization for Standardization (ISO), are a universally recognized means of guaranteeing service quality. In the translation and language industries, the demand for ISO-certified services has been growing steadily since the publication of ISO 17100, on requirements for translation service providers, which addresses translation-specific processes that are essential for ensuring quality. ISO 17100 covers a wide range of aspects, from the competences of translators, revisers and project managers, to technology and quality management requirements, or the need to assess client satisfaction and take corrective action if necessary. All these tasks are performed by the language services of the 1 | Page United Nations, which therefore can benefit from an in-depth analysis of both the standard requirements and the changes implemented by private sector providers that have obtained certification. The present study covers all the tasks included in ISO 17100, with a focus on improving and harmonizing existing working processes and suggesting their establishment where they are lacking. ACKNOWLEDGEMENTS: First of all, I would like to thank Professor Sue Ellen Wright, from Kent State University, for her collaboration, support and expert guidance in this study. I am also grateful to my colleagues from the Documentation Division of DGACM who kindly agreed to be interviewed and provided me with insights into the daily work of their units: Martine Azubuike (French Translation Service), Kieran Burns (English Translation Service), Luke Croll (Editing Section), José Carlos Fernández-Gancedo and Carmen Peris (Spanish Translation Service), Mario Gatti and Frank Scharm (German Translation Section), Pyotr Knyazev (Russian Translation Service) and Dexin Yuan (Chinese Translation Service).