Iso/Iec Jtc1/Sc2/Wg2 N4178r L2/12-002
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
-
Iso/Iec Jtc1/Sc22/Wg20 N809r
ISO/IEC JTC1/SC22/WG20 N809R 2001-01-09 Internationalization International Organization for Standardization Organisation internationale de normalisation Международная организация по стандартизации Doc Type: Working Group Document Title: Ordering the Runic script Source: Michael Everson Status: Expert Contribution Date: 2001-01-09 On 2000-12-24 Olle Järnefors published on behalf of the ISORUNES Project in Sweden a proposal for ordering the Runes in the Common Tailorable Template (CTT) of ISO/IEC 14651. In my view this ordering is unsuitable for the CTT for a number of reasons. Runic ordering in ISO/IEC 10646. The Runes are encoded at U+16A0–U+16FF, in a unified set of characters encompassing the four major traditions of Runic use: Germanic, Anglo-Frisian, Danish, and Swedish-Norwegian, and Medieval. The Runes are arranged in the code table agreed by ISO/IEC JTC1/SC2/WG2 in an order based on the traditional positions of the Runes in abecedaries, namely, the fuþark order. This order is known from hundreds of primary sources which list the Runes in sequence, often with no other text. Most scholarly texts refer to the fuþark in one way or another. Nearly all secondary texts, whether popular introductions to the Runes or New-Age esoterica, give primacy to the traditional fuþark sequence. Runic names in ISO/IEC 10646. The names given to the Runes in the UCS may be a bit clumsy, but they are intended to serve the needs of scholars and amateurs alike; not everyone is familiar with Runic transliteration practices, and not everyone is conversant with the traditional names in Germanic, English, and Scandinavian usage. -
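As a minimal sketch of the point about code-table order (an illustration, not part of the N809R contribution): because the Runic block is laid out in fuþark order, a plain code-point sort of Runic letters follows the traditional sequence without any tailoring. The example uses only Python's standard `unicodedata` module. The helper names `runic_letters` and `codepoint_sort` are hypothetical, and the set of assigned characters depends on the Unicode version bundled with the interpreter.

```python
import unicodedata

# Block range as cited in the excerpt above.
RUNIC_START, RUNIC_END = 0x16A0, 0x16FF


def runic_letters():
    """Return (code point, name) pairs for assigned characters in the Runic block."""
    pairs = []
    for cp in range(RUNIC_START, RUNIC_END + 1):
        # unicodedata.name returns the default (None) for unassigned code points.
        name = unicodedata.name(chr(cp), None)
        if name is not None:
            pairs.append((cp, name))
    return pairs


def codepoint_sort(text):
    """Sort characters by code point (Python's default ordering for str)."""
    return "".join(sorted(text))


if __name__ == "__main__":
    # Print the first few assigned code points with their UCS names.
    for cp, name in runic_letters()[:6]:
        print(f"U+{cp:04X} {name}")
```

A code-point sort is only a baseline: ISO/IEC 14651 and similar collation schemes assign weights that may differ from code-table order, which is exactly what the contribution above argues about.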
Latin Spelling and Pronunciation 1 Latin Spelling and Pronunciation
Latin spelling and pronunciation Latin spelling or orthography refers to the spelling of Latin words written in the scripts of all historical phases of Latin, from Old Latin to the present. All scripts use the same alphabet, but conventional spellings may vary from phase to phase. The Roman alphabet, or Latin alphabet, was adapted from the Old Italic alphabet to represent the phonemes of the Latin language. The Old Italic alphabet had in turn been borrowed from the Greek alphabet, itself adapted from the Phoenician alphabet. Latin pronunciation continually evolved over the centuries, making it difficult for speakers in one era to know how Latin was spoken in prior eras. A given phoneme may be represented by different letters in different periods. This article deals primarily with modern scholarship's best reconstruction of Classical Latin's phonemes (phonology) and the pronunciation and spelling used by educated people in the late Republic, and then touches upon later changes and other variants. [Figure caption: Ancient Roman inscription in Roman square capitals. The words are separated by engraved dots, a common but by no means universal practice, and long vowels are marked by apices.] Letters and phonemes In Latin spelling, individual letters mostly corresponded to individual phonemes, with three main exceptions: 1. Each vowel letter—⟨a⟩, ⟨e⟩, ⟨i⟩, ⟨o⟩, ⟨v⟩, ⟨y⟩—represented both long and short vocalic phonemes. For instance, mons /ˈmoːns/ has long /oː/, pontem /ˈpontem/ short /o/. The long vowels were distinguished by apices in many Classical texts (móns), but are not always reproduced in modern copy. -
UAX #44: Unicode Character Database File:///D:/Uniweb-L2/Incoming/08249-Tr44-3D1.Html
UAX #44: Unicode Character Database file:///D:/Uniweb-L2/Incoming/08249-tr44-3d1.html Technical Reports L2/08-249 Working Draft for Proposed Update Unicode Standard Annex #44 UNICODE CHARACTER DATABASE Version Unicode 5.2 draft 1 Authors Mark Davis ([email protected]) and Ken Whistler ([email protected]) Date 2008-7-03 This Version http://www.unicode.org/reports/tr44/tr44-3.html Previous Version http://www.unicode.org/reports/tr44/tr44-2.html Latest Version http://www.unicode.org/reports/tr44/ Revision 3 Summary This annex consolidates information documenting the Unicode Character Database. Status This is a draft document which may be updated, replaced, or superseded by other documents at any time. Publication does not imply endorsement by the Unicode Consortium. This is not a stable document; it is inappropriate to cite this document as other than a work in progress. A Unicode Standard Annex (UAX) forms an integral part of the Unicode Standard, but is published online as a separate document. The Unicode Standard may require conformance to normative content in a Unicode Standard Annex, if so specified in the Conformance chapter of that version of the Unicode Standard. The version number of a UAX document corresponds to the version of the Unicode Standard of which it forms a part. Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this annex is found in Unicode Standard Annex #41, “Common References for Unicode Standard Annexes.” For the latest version of the Unicode Standard, see [Unicode]. For a list of current Unicode Technical Reports, see [Reports]. -
5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. -
Europe-II 8 Ancient and Other Scripts
The Unicode® Standard Version 12.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2019 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 12.0. Includes index. ISBN 978-1-936213-22-1 (http://www.unicode.org/versions/Unicode12.0.0/) 1. -
ISO/IEC JTC1/SC2/WG2 N2664 L2/03-393 A. Administrative B. Technical -- General
ISO/IEC JTC1/SC2/WG2 N2664 L2/03-393 2003-11-02 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation internationale de normalisation Международная организация по стандартизации Doc Type: Working Group Document Title: Preliminary proposal to encode the Cuneiform script in the SMP of the UCS Source: Michael Everson, Karljürgen Feuerherm, Steve Tinney Status: Individual Contribution Date: 2003-11-02 A. Administrative 1. Title Preliminary proposal to encode the Cuneiform script in the SMP of the UCS. 2. Requester’s name Michael Everson, Karljürgen Feuerherm, Steve Tinney 3. Requester type (Member body/Liaison/Individual contribution) Individual contribution. 4. Submission date 2003-11-02 5. Requester’s reference (if applicable) 6. Choose one of the following: 6a. This is a complete proposal No. This is a preliminary proposal 6b. More information will be provided later Yes. B. Technical -- General 1. Choose one of the following: 1a. This proposal is for a new script (set of characters) Yes. Proposed name of script Cuneiform and Cuneiform Numbers. 1b. The proposal is for addition of character(s) to an existing block No. 1b. Name of the existing block 2. Number of characters in proposal 952. 3. Proposed category (see section II, Character Categories) Category B. 4a. Proposed Level of Implementation (1, 2 or 3) (see clause 14, ISO/IEC 10646-1: 2000) Level 1. 4b. Is a rationale provided for the choice? Yes. 4c. If YES, reference Characters are ordinary spacing characters. 5a. Is a repertoire including character names provided? Yes. 5b. If YES, are the names in accordance with the character naming guidelines in Annex L of ISO/IEC 10646-1: 2000? Yes. -
Jtc1/Sc2/Wg2 N3427 L2/08-132
JTC1/SC2/WG2 N3427 L2/08-132 2008-04-08 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация по стандартизации Doc Type: Working Group Document Title: Proposal to encode 39 Unified Canadian Aboriginal Syllabics in the UCS Source: Michael Everson and Chris Harvey Status: Individual Contribution Action: For consideration by JTC1/SC2/WG2 and UTC Date: 2008-04-08 1. Summary. This document requests 39 additional characters to be added to the UCS and contains the proposal summary form. 1. Syllabics hyphen (U+1400). Many Aboriginal Canadian languages use the character U+1428 CANADIAN SYLLABICS FINAL SHORT HORIZONTAL STROKE, which looks like the Latin script hyphen. Algonquian languages like western dialects of Cree, Oji-Cree, western and northern dialects of Ojibway employ this character to represent /tʃ/, /c/, or /j/, as in Plains Cree ᐊᓄᐦᐨ /anohc/ ‘today’. In Athabaskan languages, like Chipewyan, the sound is /d/ or an alveolar onset, as in Sayisi Dene ᐨᕦᐣᐨᕤ /t’ąt’ú/ ‘how’. To avoid ambiguity between this character and a line-breaking hyphen, a SYLLABICS HYPHEN was developed which resembles an equals sign. Depending on the typeface, the width of the syllabics hyphen can range from a short ᐀ to a much longer ᐀. This hyphen is line-breaking punctuation, and should not be confused with the Blackfoot syllable internal-w final proposed for U+167F. See Figures 1 and 2. 2. DHW- additions for Woods Cree (U+1677..U+167D). ᙷᙸᙹᙺᙻᙼᙽ/ðwē/ /ðwi/ /ðwī/ /ðwo/ /ðwō/ /ðwa/ /ðwā/. The basic syllable structure in Cree is (C)(w)V(C)(C). -
Comments Received for ISO 639-3 Change Request 2015-046 Outcome
Comments received for ISO 639-3 Change Request 2015-046 Outcome: Accepted after appeal Effective date: May 27, 2016 SIL International ISO 639-3 Registration Authority 7500 W. Camp Wisdom Rd., Dallas, TX 75236 PHONE: (972) 708-7400 FAX: (972) 708-7380 (GMT-6) E-MAIL: [email protected] INTERNET: http://www.sil.org/iso639-3/ Registration Authority decision on Change Request no. 2015-046: to create the code element [ovd] Ӧvdalian. The request to create the code [ovd] Ӧvdalian has been reevaluated, based on additional information from the original requesters and extensive discussion from outside parties on the IETF list. The additional information has strengthened the case and changed the decision of the Registration Authority to accept the code request. In particular, the long bibliography submitted shows that Ӧvdalian has undergone significant language development, and now has close to 50 publications. In addition, it has been studied extensively, and the academic works should have a distinct code to distinguish them from publications on Swedish. One revision being added by the Registration Authority is the added English name “Elfdalian” which was used in most of the extensive discussion on the IETF list. Michael Everson [email protected] May 4, 2016 This is an appeal by the group responsible for the IETF language subtags to the ISO 639 RA to reconsider and revert their earlier decision and to assign an ISO 639-3 language code to Elfdalian. The undersigned members of the group responsible for the IETF language subtag are concerned about the rejection of the Elfdalian language. There is no doubt that its linguistic features are unique in the continuum of North Germanic languages. -
Issues in the Representation of Pointed Hebrew in Unicode Third Draft, Peter Kirk, August 2003
Issues in the Representation of Pointed Hebrew in Unicode Third draft, Peter Kirk, August 2003 1. Introduction The Hebrew block of the Unicode Standard (http://www.unicode.org/charts/PDF/U0590.pdf) is intended to include all of the characters needed for proper representation of Hebrew texts from all periods of the Hebrew language, including fully pointed and cantillated ancient texts such as that of the Hebrew Bible. It is also intended to cover other languages written in Hebrew script, including Aramaic as used in biblical and other religious texts1 as well as Yiddish and a few other modern languages. In practice there are a number of issues and minor deficiencies in the Hebrew block as currently defined, in version 4.0 of the Unicode Standard (http://www.unicode.org/versions/Unicode4.0.0/), which affect its usefulness for representation of pointed Hebrew texts and of Hebrew script texts in some other languages. Some of these simply require clarification and agreed guidelines for implementers. Others require further discussion and decision, and possibly additions to the Unicode standard or other action by the Unicode Technical Committee. The conclusion reached in this paper is that two new Unicode characters should be proposed; other issues can be resolved by use of suitable sequences of existing characters, provided that such use is generally agreed by content providers and rendering systems. Several of these issues relate to different typographical conventions for publishing of Hebrew texts. It seems that a particular set of conventions is used for general publications in Hebrew, especially in Israel, but various other conventions, in which more fine distinctions are made, are used mainly for quality editions of biblical and other religious texts. -
Encoding of Tengwar Telcontar Version 0.08
Tengwar Telcontar This document discusses the encoding of Tengwar Telcontar version 0.08. The latest version of the font and this document can be downloaded from the Free Tengwar Font Project: http://freetengwar.sourceforge.net/. Changes from earlier proposals By creating a fully functional Unicode font for Tengwar, my intention is to promote the proposal to encode Tengwar in Unicode, and to spur the discussion on the best way to design such an encoding. What then constitutes the best way to encode Tengwar? This I can hardly decide on my own; indeed, it is of the highest importance that an encoding proposal is approved by a majority of the Tengwar user community. You are therefore cordially invited to discuss and to propose changes to Tengwar Telcontar. If you want to familiarize yourself with the Unicode standard in general, you can find much information at http://www.unicode.org/. In particular, I recommend that you read chapter 2 of The Unicode Standard, which defines many important concepts (e.g. terms like character, glyph, etc.): http://www.unicode.org/versions/Unicode5.2.0/ch02.pdf. The current encoding of the characters in Tengwar Telcontar (shown in a table at the end of this document) is based on the encoding in Michael Everson’s latest discussion paper at the Conscript Unicode Registry: http://www.evertype.com/standards/iso10646/pdf/tengwar.pdf. However, I have on more than one occasion diverged from Everson’s table, adding some characters that I felt were missing and removing others that, in my opinion, either do not merit inclusion at all, or which possibly might be better represented in other ways. -
Manuscript Submission: Use of the Coptic Script Version 1.1, April 19, 2021, by Pim Rietbroek and Maaike Langerak
Manuscript Submission: Use of the Coptic Script Version 1.1, April 19, 2021, by Pim Rietbroek and Maaike Langerak Instructions for Authors These instructions cover the Sahidic dialect of the Coptic language; for any questions about glyph variants or diacritics found in Bohairic Coptic or other Coptic dialects, please contact your associate editor at Brill. 1 Word Processing Windows users should use MS Office Word 2016 or later, or 365. Documents should be saved in .docx format. Mac users should use either MS Office for Mac, Mellel, Nisus Writer Pro, Nisus Writer Express, or Pages. Save (or export) in .doc or .docx format, but also submit the files in their original format (.mellel, .pages, etc.). 2 Input Fonts Make sure you use a Unicode font, preferably Antinoou. Brill uses Antinoou as default font for typesetting Coptic. Antinoou has been developed by Michael Everson under the auspices of the International Association for Coptic Studies. The Institut français d’archéologie Orientale (IFAO) provides a couple of Coptic fonts: IFAO N Copte and IFAO-Grec Unicode. Please do not use these fonts: some of the rarer characters and glyphs in them are encoded in the Private Use Area, and these will be lost when the text is placed in the layout template by the typesetters, meaning they will not transfer to either print or online publications. Antinoou will render most of them using standard Unicode characters; should you need Coptic characters or glyphs that are not yet encoded in the Unicode Standard, please contact your associate editor at Brill. 3 Keying Coptic 3.1 Keyboards Out of the box Windows and macOS do not provide ‘keyboards’ (‘IMEs’ or ‘Input Methods’ or ‘Input Sources’) for Coptic, but fortunately there are a few available as free downloads. -
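The warning about Private Use Area characters can be checked mechanically before submission. The following is a minimal sketch, not a tool described in the Brill instructions: it scans text for characters whose Unicode general category is `Co` (private use), using Python's standard `unicodedata` module. The function name `find_private_use` and the sample string are hypothetical.

```python
import unicodedata


def find_private_use(text):
    """Return (index, code point) pairs for private-use characters in text.

    General category "Co" covers the BMP Private Use Area (U+E000..U+F8FF)
    and the two supplementary private-use planes.
    """
    return [
        (i, ord(ch))
        for i, ch in enumerate(text)
        if unicodedata.category(ch) == "Co"
    ]


if __name__ == "__main__":
    # Hypothetical sample: two encoded Coptic letters followed by one
    # private-use code point of the kind a non-standard font might rely on.
    sample = "\u2c80\u2c81\ue001"
    for index, cp in find_private_use(sample):
        print(f"position {index}: U+{cp:04X} is a private-use character")
```

Any position reported by such a check marks a character that depends on a particular font's private assignments and may not survive a change of font.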
The Twenty-Nine Enclitics of Meskwaki
The Twenty-Nine Enclitics of Meskwaki IVES GODDARD Smithsonian Institution INTRODUCTION Enclitics in Algonquian languages have received some attention (e.g., Bloomfield 1957:7, 131–132; Bloomfield 1962:459–462; Jolley 1984; Valentine 2001:72–73, 150–152; Goddard 2008:262–270; Quinn 2010; LeSourd 2011), but they are often classed with other particles and not explicitly labeled (Szabó 1981).1 Meskwaki enclitics will be of interest because they are clearly identifiable as a formal class, and because it appears likely that Meskwaki has by far the largest repertoire of any language in the family. The present paper is perforce only a preliminary survey of the Meskwaki enclitics and their many interesting features. After the summary introduction there is a complete inventory followed by sections on idiomatic enclitic combinations, other idioms that include enclitics, multiple enclitics, and cognates and etymologies.2 The Meskwaki enclitics are particles (uninflected words) of no more than three syllables that always attach to a preceding word (the host); in phonemic transcription they are separated from the host by a double hyphen (or equals sign: =) and this is also used to mark an enclitic when it is cited as a word. The questions of definition and identification that dominate the recent general literature on enclitics thankfully do not arise. Meskwaki enclitics are 1. Of course, the terms enclitic and clitic have also sometimes been applied to affixes that are not enclitics, as in Szabó (1981). 2. The entries for the enclitics and other topical entries are numbered in parentheses and cross-referred to by non-italic numbers in parentheses (even within parentheses).