A Registry of Linguistic Data Categories


The Workshop Programme

9:30 Welcome and Introduction
9:45 Daan Broeder et al., INTERA: A Distributed Metadata Domain of Language Resources
10:15 Laurent Romary et al., An on-line data category registry for linguistic resource management
10:45 Break
11:00 Gil Francopoulo et al., Data Categories in Lexical Markup Framework or how to lighten a Model
11:30 Francesca Bertagna et al., The MILE Lexical Classes: Data Categories for Content Interoperability among Lexicons
12:00 David Dalby and Lee Gillam, “Weaving the Linguasphere”: LS 639, ISO 639 and ISO 12620
12:30 Lunch
14:00 Robert Kelly et al., Annotating Syllable Corpora with Linguistic Data Categories in XML
14:30 Thorsten Trippel, Metadata for Time Aligned Corpora
15:00 Ricardo Ribeiro, How to integrate data from different sources
15:30 Kiyong Lee et al., Towards an international standard on feature structure representation (2)
16:00 Ewan Klein and Stephen Potter, An Ontology for NLP Services
16:30 Discussion and Panel: Evaluation of Metadata and Data Categories

Workshop Organisers

Thierry Declerck, Saarland University and DFKI GmbH
Nancy Ide, Vassar College
Key-Sun Choi, KAIST
Laurent Romary, LORIA

Workshop Programme Committee

Maria Gavrilidou, ILSP
Stelios Piperidis, ILSP
Daan Broeder, Max Planck Institute
Peter Wittenburg, Max Planck Institute
Nicoletta Calzolari, ILC
Monica Monachini, ILC
Claudia Soria, ILC
Khalid Choukri, ELRA/ELDA
Mahtab Nikkhou, ELDA
Kiyong Lee, Korea University
Paul Buitelaar, DFKI
Andreas Witt, University of Bielefeld
Scott Farrar, University of Bremen
William Lewis, University of Arizona
Terry Langendoen, University of Arizona
Gary Simons, SIL International
Eric de la Clergerie, INRIA

Table of Contents

INTERA: A Distributed Metadata Domain of Language Resources. Daan Broeder, Maria Nava and Thierry Declerck
An on-line data category registry for linguistic resource management. Laurent Romary, Nancy Ide, Peter Wittenburg and Thierry Declerck
Data Categories in Lexical Markup Framework or how to lighten a Model. Gil Francopoulo, Monte George and Mandy Pet
The MILE Lexical Classes: Data Categories for Content Interoperability among Lexicons. Francesca Bertagna, Alessandro Lenci, Monica Monachini and Nicoletta Calzolari
“Weaving the Linguasphere”: LS 639, ISO 639 and ISO 12620. David Dalby and Lee Gillam
Annotating Syllable Corpora with Linguistic Data Categories in XML. Robert Kelly, Moritz Neugebauer, Michael Walsh and Stephen Wilson
Metadata for Time Aligned Corpora. Thorsten Trippel
How to integrate data from different sources. Ricardo Ribeiro, David M. de Matos and Nuno J. Mamede
Towards an international standard on feature structure representation (2). Kiyong Lee, Lou Burnard, Laurent Romary, Eric de la Clergerie, Ulrich Schaefer, Thierry Declerck, Syd Bauman, Harry Bunt, Lionel Clément, Tomaz Erjavec, Azim Roussanaly and Claude Roux
An Ontology for NLP Services. Ewan Klein and Stephen Potter

Author Index

Syd Bauman, Francesca Bertagna, Daan Broeder, Harry Bunt, Lou Burnard, Nicoletta Calzolari, Lionel Clément, David M. de Matos, Eric de la Clergerie, David Dalby, Thierry Declerck, Tomaz Erjavec, Gil Francopoulo, Monte George, Lee Gillam, Nancy Ide, Robert Kelly, Ewan Klein, Kiyong Lee, Alessandro Lenci, Nuno J.
Mamede, Monica Monachini, Maria Nava, Moritz Neugebauer, Mandy Pet, Stephen Potter, Ricardo Ribeiro, Laurent Romary, Azim Roussanaly, Claude Roux, Ulrich Schaefer, Thorsten Trippel, Peter Wittenburg

INTERA: A Distributed Metadata Domain of Language Resources

Daan Broeder, Maria Nava+, Thierry Declerck++
Max-Planck-Institute for Psycholinguistics [email protected], +Evaluation and Language Resources Distribution Agency, ++University of Saarland

Abstract

The INTERA and ECHO projects were partly intended to create a critical mass of open and linked metadata descriptions of language resources, helping researchers to understand the benefits of increased visibility of language resources on the Internet and motivating them to participate. The work was based on the new IMDI version 3.0.3, which is the result of experience with the earlier version and of new requirements coming from the partners involved. Language resource distribution centers such as ELDA have the opportunity to use and add to this metadata infrastructure, and to use it to enhance their catalogues and offer more services to their customers, such as data samples and downloads of partial corpora. This document mainly summarizes experience gained in the INTERA project.

Introduction

At LREC 2000 in Athens, the first workshop [1] on metadata concepts for making language resources visible in and discoverable via the Internet was organized by some of the authors. At LREC 2002, two groups demonstrated operational frameworks for creating metadata for language resources and for working with them for management and discovery purposes. While OLAC [2] (Open Language Archives Community) started from a Dublin Core [3] point of view, with the goal of creating a set that allows for the description of all types of language resources, software tools, and advice, the IMDI [4] (ISLE Metadata Initiative) activities started with a slightly different approach. The focus was primarily on multimedia/multimodal corpora, and a more detailed set was worked out that can be used not only for resource discovery but also for exploiting and managing large corpora. Most importantly, IMDI allows its metadata descriptions to be organized into linked hierarchies, which support browsing and enable data managers to carry out a variety of management tasks.

The two years since 2002 have been used to improve the metadata sets based on the experience of the communities. They have also been used to create an interoperable domain: a mapping schema was worked out between the IMDI and OLAC sets, and the IMDI domain acts as an OLAC data provider. IMDI records can be searched for from the OLAC domain.

IMDI Metadata Set 3.0.3

Based on these experiences and on a broad discussion process including field linguists, corpus linguists and language engineers, the IMDI set 3.0.3 was designed as part of the INTERA project [5] and is available as an XML schema. It was adapted to simplify the content description, and the artificial distinction between collectors and other participants, probably influenced by Dublin Core, was removed. Three major extensions were applied.

First, it is now possible to describe written resources that are not annotations or descriptions. This was necessary, since most language collections contain written resources in the form of field notes, sketch grammars, phoneme descriptions etc.

Second, as a consequence of long discussions with participants of the MILE lexicon initiative [6], it is now possible to describe lexicons with a specialized set of descriptor elements.

Third, it is now possible to define and add project-specific profiles. The earlier version of IMDI already supported extensions at various levels in the form of user-defined category–value pairs, i.e., the user was able to define a private category and associate values with it. This feature was used by individuals and also by projects to include special descriptors; however, these descriptors were not fully supported by the IMDI tools. In the new version, projects or sub-domains such as the Dutch Spoken Corpus or the Sign Language community can define a set of important categories, and these are supported while editing or searching.

Therefore, IMDI consists of its core definitions, which have to be stable to assure users that their work will be exploitable even after many years, and of sub-community-specific extensions, which nevertheless are the result of discussion processes.

A new direction is also given to IMDI in INTERA, which foresees the linking (or merging) of metadata for language data with descriptors in use in catalogues for language technology tools, like the ACL Natural Language Registry [7], hosted at DFKI.

[1] http://www.mpi.nl/ISLE
[2] OLAC: http://language-archives.org
[3] See http://dublincore.org/ for more information on the Dublin Core Metadata Initiative
[4] IMDI: http://www.mpi.nl/IMDI
[5] Integrated European language Resource Area: http://www.elda.fr/index.html
[6] See http://www.ilc.cnr.it/EAGLES96/isle/complex/clwg_home_page.htm for more details
[7] See http://regsistry.dfki.de

IMDI Catalogue Metadata

The design and development of the IMDI metadata set was directed at offering adequate descriptions at the level of resources. However, it was recognized at an early stage that there is a need to describe whole collections of language resources at the level of a finished or published corpus. During an IMDI workshop in 2001, a proposal for the IMDI Metadata Elements for Catalogue descriptions [8] was presented, based on information from the Evaluation and Language Resources Distribution Agency (ELDA) or the Linguistic Data Consortium (LDC). The description [...] descriptors.

The introduction of metadata elements supplying information on feature distribution is currently under study. This class of metadata would be specific to describing parameters that vary across a corpus, where their overall distribution is important in order to determine whether a corpus may be suitable for a certain purpose. Among these feature distribution metadata are:

• Age/Gender distribution of participants (age classes, number of age classes, etc.);
• Language distribution (number of languages,
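
To make the age/gender item in the list above more concrete, the following Python sketch shows how such a distribution could be derived from per-participant metadata. It is an illustration only and not part of IMDI: the record fields (`age`, `sex`), the age class boundaries and the function names are all assumptions introduced here.

```python
from collections import Counter

# Hypothetical age classes as (label, lower bound, upper bound);
# None marks an open upper bound. The source does not define these.
AGE_CLASSES = [
    ("0-17", 0, 17),
    ("18-39", 18, 39),
    ("40-64", 40, 64),
    ("65+", 65, None),
]


def age_class(age):
    """Return the label of the age class containing `age`, or "unknown"."""
    if age is None:
        return "unknown"
    for label, low, high in AGE_CLASSES:
        if age >= low and (high is None or age <= high):
            return label
    return "unknown"


def age_gender_distribution(participants):
    """Count participants per (age class, sex) pair.

    `participants` is a list of dicts with the assumed keys "age"
    (an int or None) and "sex" (a free-form string).
    """
    counts = Counter((age_class(p["age"]), p["sex"]) for p in participants)
    return dict(counts)


if __name__ == "__main__":
    sample = [
        {"age": 34, "sex": "female"},
        {"age": 36, "sex": "female"},
        {"age": 70, "sex": "male"},
        {"age": None, "sex": "male"},
    ]
    for key, n in sorted(age_gender_distribution(sample).items()):
        print(key, n)
```

A catalogue record could then carry the resulting counts, along with the number of age classes, so that a user can judge whether a corpus suits a given purpose without inspecting every session description.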