Iso/Iec 30112 Wd12 Standard
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Computational Development of Lesser Resourced Languages
Computational Development of Lesser Resourced Languages Martin Hosken WSTech, SIL International © 2019, SIL International Modern Technical Capability l Grammar checking l Wikipedia l OCR l Localisation l Text to speech l Speech to text l Machine Translation © 2019, SIL International Digital Language Vitality l 0.2% doing well − 43% world population l 78% score nothing! − ~10% population © 2019, SIL International Simons and Thomas, 2019 Climbing from the Bottom l Language Tag l Linebreaking l Unicode encoding l Locale Information l Font − Character Lists − Sort order l Keyboard − physical l Content − phone © 2019, SIL International Language Tag l Unique orthography l lng – ISO639 identifier l Scrp – ISO 15924 l Structure: l RE – ISO 3166-1 − lng-Scrp-RE-variants − ahk = ahk-Latn-MM − https://ldml.api.sil.org/langtags.json BCP 47 © 2019, SIL International Language Tags l Variants l Policy Issues − dialect/language − ISO 639 is linguistic − orthography/script − Language tags are sociolinguistic − registration/private use © 2019, SIL International Unicode Encoding l Engineering detail l Policy Issues l Almost all scripts in − Use Unicode − Publish Orthography l Find a char Descriptions − Sequences are good l Implies an orthography © 2019, SIL International Fonts l Lots of fonts! l Policy Issues l SIL Fonts − Ensure industry support − Full script coverage − Encourage free fonts l Problems − adding fonts to phones © 2019, SIL− InternationalNoto styling Keyboards l Keyman l Wider industry − All platforms − More capable standard − Predictive text − More industry interest − Open Source − IDE © 2019, SIL International Keyboards l Policy Issues − Agreed layout l Per language l Physical & Mobile © 2019, SIL International Linebreaking l Unsolved problem l Word frequencies − Integration − open access − Description − same as for predictive text l Resources © 2019, SIL International Locale Information l A deep well! l Key terms l Unicode CLDR l Sorting − Industry base data l Dates, Times, etc. -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
1 Assessment of Options for Handling Full Unicode Character Encodings in MARC21 A Study for the Library of Congress Part 1: New Scripts Jack Cain Senior Consultant Trylus Computing, Toronto 1 Purpose This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format. 1.1 “Encoding Scheme” vs. “Repertoire” An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding schemes. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, & “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42 & 43 in ASCII (expressed here in hexadecimal). 1.2 MARC8 "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode which is a multi-byte per character code set, the MARC8 encoding scheme is principally made up of multiple one byte tables in which each character is encoded using a single 8 bit byte. (It also includes the EACC set which actually uses fixed length 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only. -
Multilingual Content Management and Standards with a View on AI Developments Laurent Romary
Multilingual content management and standards with a view on AI developments Laurent Romary To cite this version: Laurent Romary. Multilingual content management and standards with a view on AI developments. AI4EI - Conference Artificial Intelligence for European Integration, Oct 2020, Turin / Virtual, Italy. hal-02961857 HAL Id: hal-02961857 https://hal.inria.fr/hal-02961857 Submitted on 8 Oct 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License Multilingual content management and standards with a view on AI developments Laurent Romary Directeur de Recherche, Inria, team ALMAnaCH ISO TC 37, chair Language and AI • Central role of language in the revival of AI (machine-learning based models) • Applications: document management and understanding, chatbots, machine translation • Information sources: public (web, cultural heritage repositories) and private (Siri, Amazon Alexa) linguistic information • European context: cf. Europe's Languages in the Digital Age, META-NET White Paper Series • Variety of linguistic forms • Spoken, written, chats and forums • Multilingualism, accents, dialects, technical domains, registers, language learners • General notion of language variety • Classifying and referencing the relevant features • Role of standards and standards developing organization (SDO) A concrete example for a start Large scale corpus Language model BERT Devlin, J., Chang, M. -
Agricultural Soil Carbon Credits: Making Sense of Protocols for Carbon Sequestration and Net Greenhouse Gas Removals
Agricultural Soil Carbon Credits: Making sense of protocols for carbon sequestration and net greenhouse gas removals NATURAL CLIMATE SOLUTIONS About this report This synthesis is for federal and state We contacted each carbon registry and policymakers looking to shape public marketplace to ensure that details investments in climate mitigation presented in this report and through agricultural soil carbon credits, accompanying appendix are accurate. protocol developers, project developers This report does not address carbon and aggregators, buyers of credits and accounting outside of published others interested in learning about the protocols meant to generate verified landscape of soil carbon and net carbon credits. greenhouse gas measurement, reporting While not a focus of the report, we and verification protocols. We use the remain concerned that any end-use of term MRV broadly to encompass the carbon credits as an offset, without range of quantification activities, robust local pollution regulations, will structural considerations and perpetuate the historic and ongoing requirements intended to ensure the negative impacts of carbon trading on integrity of quantified credits. disadvantaged communities and Black, This report is based on careful review Indigenous and other communities of and synthesis of publicly available soil color. Carbon markets have enormous organic carbon MRV protocols published potential to incentivize and reward by nonprofit carbon registries and by climate progress, but markets must be private carbon crediting marketplaces. paired with a strong regulatory backing. Acknowledgements This report was supported through a gift Conservation Cropping Protocol; Miguel to Environmental Defense Fund from the Taboada who provided feedback on the High Meadows Foundation for post- FAO GSOC protocol; Radhika Moolgavkar doctoral fellowships and through the at Nori; Robin Rather, Jim Blackburn, Bezos Earth Fund. -
Database Globalization Support Guide
Oracle® Database Database Globalization Support Guide 19c E96349-05 May 2021 Oracle Database Database Globalization Support Guide, 19c E96349-05 Copyright © 2007, 2021, Oracle and/or its affiliates. Primary Author: Rajesh Bhatiya Contributors: Dan Chiba, Winson Chu, Claire Ho, Gary Hua, Simon Law, Geoff Lee, Peter Linsley, Qianrong Ma, Keni Matsuda, Meghna Mehta, Valarie Moore, Cathy Shea, Shige Takeda, Linus Tanaka, Makoto Tozawa, Barry Trute, Ying Wu, Peter Wallack, Chao Wang, Huaqing Wang, Sergiusz Wolicki, Simon Wong, Michael Yau, Jianping Yang, Qin Yu, Tim Yu, Weiran Zhang, Yan Zhu This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. -
Bopomofo Extended Range: 31A0–31BF
Bopomofo Extended Range: 31A0–31BF This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0 This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata. See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation. -
5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. -
Czech and Slovak Typesetting Rules
Czech and Slovak Typesetting Rules Tomáš Hála Mendel University in Brno, CZ BachoTEX 2019 Czech and Slovak Typesetting Rules Selected sources − ON 88 2503:1974 − Pop, Flégr and Pop: Sazba I [Typesetting I], 1989 (textbook) − ČSN 01 6910:2007 and older − ČSN 01 6910:2011 − STN 01 6910:2011 − Pravidla českého pravopisu [Rules of Czech Ortography], 1987, − Pravidla českého pravopisu [Rules of Czech Ortography], 1993 − Pravidlá slovenského pravopisu [Rules of Slovak Ortography], 1993, 2000 2 Czech and Slovak Typesetting Rules Spaces intersentence spacing interword space non-breaking interword space thin space 3 Czech and Slovak Typesetting Rules Spaces between sentences intersentence spacing % Czech, Slovak \frenchspacing % English (American) \nonfrenchspacing 4 Czech and Slovak Typesetting Rules Spaces between sentences intersentence spacing % ConTeXt \installlanguage [\s!en] [\c!spacing=\v!broad, ... \installlanguage [\s!cs] [\c!spacing=\v!packed, ... 5 Czech and Slovak Typesetting Rules Dashes: punctuation usage en-dash XOR em-dash en-dash v em-dash: designer’s opinion dashes v spaces: semanticising usage 6 Czech and Slovak Typesetting Rules Dashes: punctuation usage dashes must not open the new line \def\ip{\pdash} % Czech, Slovak \def\pdash{~-- } 7 Czech and Slovak Typesetting Rules Dashes: interval usage Czech and Slovak ∘ 35–45 %, 5–8 C English ∘ ∘ 35%–45%, 5 C–15 C, 70–72 percent %Czech and Slovak \def\idash{\discretionary{\char32až}{}{--}} \def\az{\idash} %English \def\idash{\discretionary{\char32to}{}{--}} 8 Czech and Slovak Typesetting -
Kyrillische Schrift Für Den Computer
Hanna-Chris Gast Kyrillische Schrift für den Computer Benennung der Buchstaben, Vergleich der Transkriptionen in Bibliotheken und Standesämtern, Auflistung der Unicodes sowie Tastaturbelegung für Windows XP Inhalt Seite Vorwort ................................................................................................................................................ 2 1 Kyrillische Schriftzeichen mit Benennung................................................................................... 3 1.1 Die Buchstaben im Russischen mit Schreibschrift und Aussprache.................................. 3 1.2 Kyrillische Schriftzeichen anderer slawischer Sprachen.................................................... 9 1.3 Veraltete kyrillische Schriftzeichen .................................................................................... 10 1.4 Die gebräuchlichen Sonderzeichen ..................................................................................... 11 2 Transliterationen und Transkriptionen (Umschriften) .......................................................... 13 2.1 Begriffe zum Thema Transkription/Transliteration/Umschrift ...................................... 13 2.2 Normen und Vorschriften für Bibliotheken und Standesämter....................................... 15 2.3 Tabellarische Übersicht der Umschriften aus dem Russischen ....................................... 21 2.4 Transliterationen veralteter kyrillischer Buchstaben ....................................................... 25 2.5 Transliterationen bei anderen slawischen -
AGU Grammar and Style Guide
AGU Grammar and Style Guide 1. Hyphenation . 1 1.1. Attributive Adjectives . 1 1.2. Nouns . 5 1.3. Words Formed With Prefixes . 6 1.4. Words of Equal Weight . 7 2. Commas . 8 2.1. Examples of Correct Usage. 8 2.2. AGU Style . 9 2.3. Comma Usage at Beginning of Sentence . 9 2.4. Some Parts of Speech and Common Examples . 10 3. Additional Grammar/Punctuation Rules . 11 3.1. Adjective/Adverbial Phrases . 11 3.2. Comprise Versus Compose . 11 3.3. Singular Versus Plural With Certain Nouns. 11 3.4. Other Rules . 12 4. Spelling . 14 4.1. Alternate Spellings . 14 4.2. Commonly Used Proper Names . 14 4.3. Countries . 15 5. Capitalization . 16 5.1. Geographical Terms . 16 5.2. Text Capitalization . 17 5.3. Stratigraphic Divisions . 18 6. Numbers . 19 6.1. Cardinal Numbers/Arabic Numerals . 19 6.2. Ordinal Numbers . 19 6.3. Miscellaneous Style for Numbers . 19 7. Miscellaneous Style Rules . 20 8. Special Notations. 22 8.1. Astronomical Notation for Dates and Time. 22 8.2. Degrees, Minutes, and Seconds of Arc. 22 8.3. Units of Measure . 22 8.4. Dimensions. 25 8.5. Seismology. .. 25 8.6. Mineralogy. .. 26 8.7. Ranges. 26 8.8. Ships and Spacecraft. 26 8.9. Comets. .. 27 8.10. Temperature. .. 27 8.11. Times. .. 27 8.12. Storms. 27 8.13. Biology. 27 9. Word List . 28 GRAMMAR/STYLE GUIDE 2/03 ATTRIBUTIVE ADJECTIVES 1 1. Hyphenation The main reason for hyphenation is increased clarity. 1.1. Attributive Adjectives Always hyphen. The following should always be hyphened as attributive adjectives: 1. -
Alphabets, Letters and Diacritics in European Languages (As They Appear in Geography)
1 Vigleik Leira (Norway): [email protected] Alphabets, Letters and Diacritics in European Languages (as they appear in Geography) To the best of my knowledge English seems to be the only language which makes use of a "clean" Latin alphabet, i.d. there is no use of diacritics or special letters of any kind. All the other languages based on Latin letters employ, to a larger or lesser degree, some diacritics and/or some special letters. The survey below is purely literal. It has nothing to say on the pronunciation of the different letters. Information on the phonetic/phonemic values of the graphic entities must be sought elsewhere, in language specific descriptions. The 26 letters a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z may be considered the standard European alphabet. In this article the word diacritic is used with this meaning: any sign placed above, through or below a standard letter (among the 26 given above); disregarding the cases where the resulting letter (e.g. å in Norwegian) is considered an ordinary letter in the alphabet of the language where it is used. Albanian The alphabet (36 letters): a, b, c, ç, d, dh, e, ë, f, g, gj, h, i, j, k, l, ll, m, n, nj, o, p, q, r, rr, s, sh, t, th, u, v, x, xh, y, z, zh. Missing standard letter: w. Letters with diacritics: ç, ë. Sequences treated as one letter: dh, gj, ll, rr, sh, th, xh, zh.