ISO/WD 21829 Terminology for Language Resource Management
Reference number of working document: ISO/TC 37/SC 4 N 179
Date: 2004-08-12
Committee identification: ISO/TC 37/SC 4
Secretariat: KATS

ISO/WD 21829
TERMINOLOGY FOR LANGUAGE RESOURCE MANAGEMENT

Warning
This document is not an ISO International Standard. It is distributed for review and comment. It is subject to change without notice and may not be referred to as an International Standard.
Recipients of this document are invited to submit, with their comments, notification of any relevant patent rights of which they are aware and to provide supporting documentation.

Document type: International Standard
Document subtype: if applicable
Document stage: 00.20
Document language: en

© ISO 2004 – All rights reserved

FOREWORD
INTRODUCTION
1 SCOPE
2 NORMATIVE REFERENCES
3 GENERAL CONCEPTS
4 PHONETICS AND PHONOLOGY
5 MORPHOLOGY
6 SYNTAX
7 SEMANTICS
8 LEXICON
9 LANGUAGE ENGINEERING
10 PRAGMATICS
11 FEATURE STRUCTURE
BIBLIOGRAPHY
ALPHABETICAL INDEX

Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies (ISO member bodies). The work of preparing International Standards is normally carried out through ISO technical committees. Each member body interested in a subject for which a technical committee has been established has the right to be represented on that committee. International organizations, governmental and non-governmental, in liaison with ISO, also take part in the work.
ISO collaborates closely with the International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.
International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 3.
Draft International Standards adopted by the technical committees are circulated to the member bodies for voting. Publication as an International Standard requires approval by at least 75 % of the member bodies casting a vote.
Attention is drawn to the possibility that some of the elements of this International Standard may be the subject of patent rights. ISO shall not be held responsible for identifying any or all such patent rights.
International Standard ISO 21829 was prepared by Technical Committee ISO/TC 37, Terminology and other language resources.

Introduction
This draft standard was prepared to define reference terms for all activities in language resource management, as well as reference concepts for the natural language processing community.
The model for preparing this standard is based on ISO/TC 37/SC 4 N027, 'Basic Requirements for Terminology Management', proposed by Klaus-D. Schmitz:
- Term Autonomy
- History/Backtracking
- Design of the TMS
The procedure for developing this standard is as follows:
- Step 1: Collection of documents or texts relevant to language resource management
  - LREC 2002 Proceedings (354 articles)
  - ISO/TC 37/SC 4 documents (10 documents)
- Step 2: Term extraction from texts (automatic/manual)
- Step 3: Determining the term list (manual)
  - Selecting terms from the candidate terms extracted from texts
  - Extending the term list based on existing term banks/dictionaries
- Step 4: Specification of data categories
  - Automatic extraction of term information (e.g. usages)
  - Manual input by referring to existing resources such as dictionaries, term banks, etc.
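Step 2 allows automatic term extraction but does not say which method is used. As an illustration only, the Python sketch below shows one simple frequency-based way to propose candidate terms for the manual selection in Step 3. The function name `candidate_terms`, the stopword list, the n-gram limit and the frequency threshold are all assumptions, not the procedure actually used for this draft.

```python
import re
from collections import Counter

# Hypothetical, minimal stopword list (an assumption); a real extractor
# would use a fuller, language-specific list.
STOPWORDS = {"a", "an", "the", "of", "in", "on", "for", "and", "or",
             "to", "is", "are", "by", "with"}


def candidate_terms(texts, max_n=3, min_freq=2):
    """Return (candidate, frequency) pairs, most frequent first.

    A candidate is a word n-gram (1 <= n <= max_n) that neither starts nor
    ends with a stopword and occurs at least min_freq times across texts.
    """
    counts = Counter()
    for text in texts:
        # Lower-case word tokens; hyphenated compounds stay one token.
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    kept = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(kept, key=lambda item: (-item[1], item[0]))
```

For example, `candidate_terms(["The feature structure encodes the annotation.", "Each feature structure has a type."])` keeps "feature", "feature structure" and "structure", each with frequency 2. Such a list is only a starting point: Step 3 still selects terms manually and extends the list from existing term banks and dictionaries.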
The terms of this International Standard are given in a clustered order under a few general headings. The layout follows ISO 10241. Thus, the elements of an entry appear in the following order:
- Entry number (bold face)
- Preferred term (bold face)
- Definition
- Note
- Example
- Usage
Temporarily, the sources for definitions and usages are given in brackets.

1 Scope
This International Standard specifies terms for language resource management.

2 Normative references
ISO 704:2000, Terminology work – Principles and methods.
ISO 1087-1:2000, Terminology work – Vocabulary – Part 1: Theory and application.
ISO 1087-2:2000, Terminology work – Vocabulary – Part 2: Computer applications.
ISO 10241:1992, International terminology standards – Preparation and layout.
ISO/IEC Guide 2:1991, General terms and their definitions concerning standardization and related activities.

3 General concepts

3.1
adequacy
evaluation of success in the writing of a grammar according to various criteria [ASHER]

3.2
annotation
description, reference or explanation, added to or interspersed among the statements of the source language, that has no effect in the object language [TERMIUM]
USAGE The annotation contains information about the abbreviation full stop, but it can not be deleted from the token list because the same full stop can also be the end of a sentence. [LREC 2002, 70.txt]

3.3
aspect
way of looking at the action
NOTE There are two aspects of the verb in Bulgarian and in the other Slavic languages.
EXAMPLE
I'll phone my mother tomorrow. (The event is planned as a single completed action.)
From now on every week I'll phone my mother. (The action is intended to be completed successfully and repeated more than once.)
Tomorrow afternoon I'll be preparing myself for the English language test. (The action will be in progress by tomorrow afternoon.)
3.4
base document
document containing data to be captured in order to be processed by a data processing system [TERMIUM]

3.5
bibliographical entry
note in a catalog or bibliography relating to the bibliographical history or description of a book [TERMIUM]

3.6
capital letter
large form of a letter, e.g. Z, I, A [TERMIUM]

3.7
cardinal numeral
numeral of the class whose members:
- are considered basic in form,
- are used in counting, and
- are used in expressing how many objects are referred to
[www.sil.org/linguistics/GlossaryOfLinguisticTerms/WhatIsACardinalNumeral.htm]

3.8
character set
finite set of different characters that is complete for a given purpose [TERMIUM]
USAGE To extract information about the graphical typology of the character set, it is necessary to transform the low bitmap image information into a higher level representation. [LREC 2002, 25.txt]

3.9
communication
transfer of data among functional units according to sets of rules governing data transmission and the coordination of the exchange [TERMIUM]
USAGE Thirdly, a major characteristic of human communication is behavioural coordination. [LREC 2002, 214.txt]

3.10
constraint
property or relation that restricts the space of possible solutions to a problem [TERMIUM]
USAGE Ensuring the validity of feature structures may require much more than simply specifying the range of allowed values for each feature. There may be constraints on the co-occurrence of one feature value with the value of another feature in the same feature structure or in an embedded feature structure. [N040.txt]

3.11
construction
ordered arrangement of grammatical units forming a larger unit [http://www.sil.org/linguistics/GlossaryOfLinguisticTerms/]

3.12
context
text which illustrates a concept or the use of a designation [ISO 12620]
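The usage quoted under 3.10 notes that validating a feature structure may require co-occurrence constraints, checked both in the structure itself and in embedded feature structures. As an illustration only, the Python sketch below assumes that feature structures are nested dictionaries and that each constraint is a simple implication ("if feature F has value v, feature G must have value w"). The function name `violations` and the tense/mood constraint are hypothetical and are not defined by this standard.

```python
from typing import Any, Dict, List, Tuple

FeatureStructure = Dict[str, Any]
# (if_feature, if_value, then_feature, then_value): whenever if_feature has
# if_value, then_feature must be present with then_value.
Constraint = Tuple[str, Any, str, Any]


def violations(fs: FeatureStructure, constraints: List[Constraint],
               path: str = "") -> List[str]:
    """Describe every violated constraint in fs and in each feature
    structure embedded directly as a feature value (a nested dict)."""
    found = []
    for if_feat, if_val, then_feat, then_val in constraints:
        # A missing then_feature also counts as a violation.
        if fs.get(if_feat) == if_val and fs.get(then_feat) != then_val:
            found.append(f"{path or '/'}: {if_feat}={if_val!r} "
                         f"requires {then_feat}={then_val!r}")
    for feat, value in fs.items():
        if isinstance(value, dict):  # embedded feature structure
            found.extend(violations(value, constraints, f"{path}/{feat}"))
    return found
```

With the hypothetical constraint `("tense", "future", "mood", "indicative")`, a structure that satisfies it at the top level but embeds `{"tense": "future", "mood": "subjunctive"}` under the feature `head` yields one violation, reported at the path `/head`. Checking a range of allowed values per feature would not catch this, which is the point made in the usage note.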