Comments on Review Draft UC3M0714 of Unicode 3.0
Source: Michael Everson
Status: Expert Contribution
Date: 1998-11-01

In the following pages, I have given items from a text dump of the Unicode 3.0 draft in 9-point Helvetica, with comments following the items in 12-point Times. I have not reviewed the Brahmic scripts in this document.

0021 EXCLAMATION MARK = factorial = bang → (inverted exclamation mark - 00A1) → (latin letter retroflex click - 01C3) → (double exclamation mark - 203C) → (heavy exclamation mark ornament - 2762)
Add x 203D ‽ interrobang.

0022 QUOTATION MARK = APL quote • neutral (vertical), used as opening or closing quotation mark • preferred characters for paired quotation marks are 201C & 201D
This is not true for Swedish or German. Please add “in the English language”.
→ (modifier letter double prime - 02BA) → (combining double acute accent - 030B) → (combining double vertical line above - 030E)
The leading is bad here. Please make sure the leading of all lines is the same throughout the document.

0028 LEFT PARENTHESIS = OPENING PARENTHESIS
0029 RIGHT PARENTHESIS = CLOSING PARENTHESIS • see discussion on semantics of paired bracketing characters
This note should appear under both 0028 and 0029, and under all other paired bracketing characters. It should also refer to the actual section where the discussion appears.

002C COMMA → (arabic comma - 060C) → (ideographic comma - 3001)
Add x 201A ‚ SINGLE LOW-9 QUOTATION MARK.

002D HYPHEN-MINUS = hyphen or minus sign • used for either hyphen or minus sign • other hyphen and dash characters: 2010-2015
In the printout, but not in this text dump, this reads “2010 - — 2015 —”, and the first em dash is confusing. Whenever ranges are given, the arrow → should be used. Cross references should not be given with the arrow; this is confusing.
I suggest that the cross references be shown with the commonly recognized and attractive printer’s fist ☞ 261E.

002F SOLIDUS = SLASH = virgule, shilling (British) → (latin letter dental click - 01C0) → (fraction slash - 2044) → (division slash - 2215)
The alignment between the glyphs and the names in the cross references differs from line to line and should be the same. A tab stop needs to be set for these.

0049 LATIN CAPITAL LETTER I • Turkish uses 0131 for lowercase
Correct to “Turkish and Azerbaijani use 0131 for lowercase”. Add cross references to vertical line 007C, cyrillic capital letter byelorussian-ukrainian i 0406, cyrillic palochka 04C0, and roman numeral one 2160.

004B LATIN CAPITAL LETTER K → (kelvin sign - 212A)
I have never understood why the glyphs at 2103 (°C) and 2109 (°F) include a degree sign but the glyph at 212A shows no degree sign before the K. One says “degrees Celsius”, “degrees Fahrenheit”, and “degrees Kelvin”.

0069 LATIN SMALL LETTER I • Turkish uses 0130 for uppercase
Correct to “Turkish and Azerbaijani use 0130 for uppercase”.

007C VERTICAL LINE
Add cross references to latin capital letter I 0049, cyrillic capital letter byelorussian-ukrainian i 0406, cyrillic palochka 04C0, and roman numeral one 2160.

00A1 INVERTED EXCLAMATION MARK • Spanish
Add “, Asturian, Galician”.

00A3 POUND SIGN = pound sterling
Add “, Irish punt”.

00A4 CURRENCY SIGN • other currency symbol characters: 20A0-20AD
Add 00A2 CENT SIGN, 00A3 POUND SIGN, 00A5 YEN SIGN, 0E3F THAI CURRENCY SYMBOL BAHT, and 17DB KHMER CURRENCY SYMBOL RIEL, all of which are also currency symbol characters.

00A7 SECTION SIGN • paragraph sign in some European usage
Possibly add the note “• derives from manuscript tradition”.

00AA FEMININE ORDINAL INDICATOR • Spanish
This is probably used in other languages as well, such as Galician, Italian, Portuguese, etc.

00B2 SUPERSCRIPT TWO = squared • other superscript digit characters: 2070-2079
With regard to the range mark, see the note on 002D above.
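
One way to test the 00A4 comment is to ask the Unicode Character Database for each character's general category: currency symbols carry the category Sc. The sketch below uses Python's standard unicodedata module, which reflects a much later Unicode version than the 3.0 draft under review, so it is a consistency check rather than evidence about the draft itself. The names PROPOSED, CITED_RANGE and currency_report are illustrative only.

```python
import unicodedata

# Characters the 00A4 comment proposes to list as "other currency symbols".
PROPOSED = [0x00A2, 0x00A3, 0x00A5, 0x0E3F, 0x17DB]
# The range already cited in the draft note: 20A0 through 20AD inclusive.
CITED_RANGE = range(0x20A0, 0x20AE)


def currency_report(code_points):
    """Return (hex code point, character name, general category) tuples."""
    rows = []
    for cp in code_points:
        ch = chr(cp)
        rows.append((f"{cp:04X}", unicodedata.name(ch), unicodedata.category(ch)))
    return rows


for row in currency_report([0x00A4, *PROPOSED]):
    print(*row)

# True if every character checked has general category Sc (Symbol, currency).
all_sc = all(
    unicodedata.category(chr(cp)) == "Sc"
    for cp in [0x00A4, *PROPOSED, *CITED_RANGE]
)
print("all Sc:", all_sc)
```

With a current Python, every row reports category Sc and all_sc is True, which is consistent with grouping the proposed characters with the 20A0-20AD block.
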

00B3 SUPERSCRIPT THREE = cubed → (superscript one - 00B9) ≈ <super> 0033
Note that ≈ 2248 is ALMOST EQUAL TO. What is ≈ intended to represent here? Is it not intended to represent ≍ 224D EQUIVALENT TO or ≣ 2263 STRICTLY EQUIVALENT TO? This is not clear to the reader.

00BA MASCULINE ORDINAL INDICATOR • Spanish
This is probably used in other languages, such as Galician, Italian, etc. In the absence of further information, the note should be deleted as noncomprehensive.

00D0 LATIN CAPITAL LETTER ETH (Icelandic) → (latin small letter eth - 00F0) → (latin capital letter d with stroke - 0110) → (latin capital letter african d - 0189)
The glyph for 0110 should not differ from the glyphs for 00D0 and 0189.

00DF LATIN SMALL LETTER SHARP S (German) = ess-zed • German
Note that the parenthetical has been retained in the character name (possibly an error in Asmus’ program). “Ess-zed” is incorrect; the correct German spelling is “Eszett” (note the capital E). Add “scharfes Es” or “scharfes s”.
• uppercase is "SS"
This is no longer entirely true; the rule is, apparently, suspended for personal names. Delete the note here. Or add LATIN CAPITAL LETTER SHARP S to the UCS. :-)

00E5 LATIN SMALL LETTER A WITH RING ABOVE • Danish, Norwegian, Swedish
Add “, Walloon”.

00E6 LATIN SMALL LETTER AE = LATIN SMALL LIGATURE AE • IPA → (latin small ligature oe - 0153)
Add “Danish, Norwegian, Icelandic, Faroese, Old English, French”. Add “Commonly called Ash (from Old English æsc)”.

00EC LATIN SMALL LETTER I WITH GRAVE • Italian, Malagash
It is Malagasy, not Malagash (which is an anglicization of the French term Malgache).

00F0 LATIN SMALL LETTER ETH (Icelandic) • Icelandic, Faroese, old English, IPA
Say “Old English”.
→ 00D0 Ð latin capital letter eth
Add cross references to 03B4 GREEK SMALL LETTER DELTA and 2202 PARTIAL DIFFERENTIAL.

00FD LATIN SMALL LETTER Y WITH ACUTE • Czech, Slovak, Icelandic, Faroese, Malagash
Add “, Welsh”.
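
For comparison with the 00B3 question: in later editions of the Unicode code charts, a line beginning with ≈ gives a compatibility decomposition (and ≡ a canonical one), so the ≈ sign there is a chart convention rather than a use of 2248 ALMOST EQUAL TO. A minimal sketch using Python's standard unicodedata module (which reflects a much later UCD than this draft) shows the data behind that line:

```python
import unicodedata

SUPERSCRIPT_THREE = "\u00B3"

# The UCD stores the mapping shown on the "≈" line: a <super> formatting
# tag followed by the code point of DIGIT THREE.
print(unicodedata.decomposition(SUPERSCRIPT_THREE))  # <super> 0033

# Compatibility normalization (NFKC) applies that mapping...
print(unicodedata.normalize("NFKC", SUPERSCRIPT_THREE))  # 3

# ...while canonical normalization (NFC) leaves the character alone,
# because the mapping is a compatibility one, not a canonical one.
print(unicodedata.normalize("NFC", SUPERSCRIPT_THREE) == SUPERSCRIPT_THREE)  # True
```
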
It is Malagasy, not Malagash (which is an anglicization of the French term Malgache).

00FE LATIN SMALL LETTER THORN (Icelandic) • Icelandic, old English, IPA • Runic letter borrowed into Latin script
Say “Old English”.

0101 LATIN SMALL LETTER A WITH MACRON • Latvian, ...
Change “Latvian, …” to “Latvian, Latin”. Do not use ellipses in these notes.

0103 LATIN SMALL LETTER A WITH BREVE • Romanian, Vietnamese, ...
Change “Romanian, Vietnamese, …” to “Romanian, Vietnamese, Latin”. Do not use ellipses in these notes.

0104 LATIN CAPITAL LETTER A WITH OGONEK
The position of the ogonek in the glyph is incorrect.

0105 LATIN SMALL LETTER A WITH OGONEK • Polish, Lithuanian, ...
Say “Lithuanian, Lakota, Polish”. Do not use ellipses in these notes.

0107 LATIN SMALL LETTER C WITH ACUTE • Polish, Croatian, ...
Say “Croatian, Polish”. Do not use ellipses in these notes.

0109 LATIN SMALL LETTER C WITH CIRCUMFLEX • Esperanto

010B LATIN SMALL LETTER C WITH DOT ABOVE • Maltese
Add “, Irish Gaelic (old orthography)”.

010D LATIN SMALL LETTER C WITH CARON • (many)
Say “Czech, Lakota, Slovak, Slovenian, and many other languages.” or delete the note.

010E LATIN CAPITAL LETTER D WITH CARON • the form using caron/hacek is preferred in all contexts
Do not use the term hacek here. If you do use the term, it must be spelled correctly, namely háček.

010F LATIN SMALL LETTER D WITH CARON • Czech, Slovak • the form using apostrophe is preferred in typesetting
The glyph is incorrect. It is not acceptable to display this character with the non-preferred glyph.

0110 LATIN CAPITAL LETTER D WITH STROKE
The glyph is incorrect. It should be identical to the glyph used at 00D0 and 0189.

0111 LATIN SMALL LETTER D WITH STROKE • Croatian, Vietnamese, Lappish
Do not use the term “Lappish”, as it is considered offensive by Sámi people. Use the term “Sámi”.

0113 LATIN SMALL LETTER E WITH MACRON • Latvian, ...
Change “Latvian, …” to “Latvian, Latin”. Do not use ellipses in these notes.
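
The request that 0110 share its glyph with 00D0 and 0189 does not merge the characters: each capital keeps a different lowercase partner, so the encoding still distinguishes them. A small check with Python's str.lower (default Unicode case mappings) and the standard unicodedata module makes this visible; note that this reflects a later Unicode version than the draft, and case_pairs is an illustrative helper, not part of any library.

```python
import unicodedata

# Three capitals which, per the comments on 00D0 and 0110, should share a glyph.
CAPITALS = ["\u00D0", "\u0110", "\u0189"]


def case_pairs(capitals):
    """Map each capital to its default lowercase, as (hex, name, hex, name)."""
    pairs = []
    for cap in capitals:
        low = cap.lower()  # default (locale-independent) Unicode lowercase
        pairs.append((f"{ord(cap):04X}", unicodedata.name(cap),
                      f"{ord(low):04X}", unicodedata.name(low)))
    return pairs


for cap_hex, cap_name, low_hex, low_name in case_pairs(CAPITALS):
    print(f"{cap_hex} {cap_name} -> {low_hex} {low_name}")
```

The three lowercase partners are 00F0 LATIN SMALL LETTER ETH, 0111 LATIN SMALL LETTER D WITH STROKE and 0256 LATIN SMALL LETTER D WITH TAIL, which is why identical capital glyphs need not cause ambiguity in the data.
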

0115 LATIN SMALL LETTER E WITH BREVE • Malay, ...
Change “Malay, …” to “Malay, Latin”. Do not use ellipses in these notes.

0118 LATIN CAPITAL LETTER E WITH OGONEK
The position of the ogonek in the glyph is incorrect.

0119 LATIN SMALL LETTER E WITH OGONEK • Polish, Lithuanian, ...
Change “Polish, Lithuanian, …” to “Polish, Lithuanian”. Do not use ellipses in these notes. The position of the ogonek in the glyph is incorrect.

011B LATIN SMALL LETTER E WITH CARON • Czech, ...
Change “Czech, …” to “Czech”. Do not use ellipses in these notes.

011F LATIN SMALL LETTER G WITH BREVE • Turkish
Change to “Turkish, Azerbaijani”.

0121 LATIN SMALL LETTER G WITH DOT ABOVE • Maltese, ...
Change “Maltese, …” to “Maltese, Irish Gaelic (old orthography)”. Do not use ellipses in these notes.

0122 LATIN CAPITAL LETTER G WITH CEDILLA
The glyph is incorrect.

0123 LATIN SMALL LETTER G WITH CEDILLA • Latvian, Lappish
This letter is not used in Sámi, so delete “Lappish”. The glyph is incorrect; the preferred form with a turned comma above must be used.
• there are three glyph variants
Delete this note. The other glyph variants are not preferred by Latvians.

0127 LATIN SMALL LETTER H WITH STROKE • Maltese, IPA, ...
Change “Maltese, IPA, …” to “Maltese, IPA”. Do not use ellipses in these notes.

0129 LATIN SMALL LETTER I WITH TILDE • Greenlandic
Change to “Greenlandic (old orthography)”.

012B LATIN SMALL LETTER I WITH MACRON • Latvian, ...
Change “Latvian, …” to “Latvian, Latin”. Do not use ellipses in these notes.

012D LATIN SMALL LETTER I WITH BREVE • Latin, ...
Change “Latin, …” to “Latin”. Do not use ellipses in these notes.

012E LATIN CAPITAL LETTER I WITH OGONEK
The position of the ogonek in the glyph is incorrect.

012F LATIN SMALL LETTER I WITH OGONEK • Lithuanian, ...
Change “Lithuanian, …” to “Lithuanian, Navajo”. Do not use ellipses in these notes. The position of the ogonek in the glyph is incorrect.

0130 LATIN CAPITAL LETTER I WITH DOT ABOVE = LATIN CAPITAL LETTER I DOT • Turkish
Use “Turkish, Azerbaijani”.

0131 LATIN SMALL LETTER DOTLESS I • Turkish
Use “Turkish, Azerbaijani”.

0136 LATIN CAPITAL LETTER K WITH CEDILLA
The glyph is incorrect. It must be a comma below.

0137 LATIN SMALL LETTER K WITH CEDILLA • Latvian, ...
Change “Latvian, …” to “Latvian”. Do not use ellipses in these notes. The glyph is incorrect. It must be a comma below.

0138 LATIN SMALL LETTER KRA (Greenlandic) • old Greenlandic
Change to “Greenlandic (old orthography)”.

013B LATIN CAPITAL LETTER L WITH CEDILLA
The glyph is incorrect.
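
The comments on 0049, 0069, 0130 and 0131 concern the case pairs shared by Turkish and Azerbaijani. The default Unicode case mappings, which Python's str.lower() and str.upper() apply regardless of locale, do not pair I with ı or İ with i; in the Unicode Character Database these pairs are language-conditional mappings (tr and az) in SpecialCasing.txt. The helpers turkic_lower and turkic_upper below are hypothetical and simplified: a sketch that tailors only those four letters and ignores the context-sensitive rules for U+0307 COMBINING DOT ABOVE.

```python
# Default Unicode case mappings (used by str.lower/str.upper regardless of
# locale) do not pair I with dotless ı or İ with dotted i. These helpers are
# hypothetical and simplified: they tailor only those four letters and
# ignore the context-sensitive SpecialCasing rules for U+0307.

def turkic_lower(text: str) -> str:
    """Lowercase text using the Turkish/Azerbaijani I/ı and İ/i pairs."""
    # Rewrite the two capitals first so the default mapping never sees them.
    return text.replace("I", "\u0131").replace("\u0130", "i").lower()


def turkic_upper(text: str) -> str:
    """Uppercase text using the Turkish/Azerbaijani I/ı and İ/i pairs."""
    # Rewrite the two small letters first for the same reason.
    return text.replace("i", "\u0130").replace("\u0131", "I").upper()


word = "D\u0130YARBAKIR"                    # DİYARBAKIR
print(word.lower() == "di\u0307yarbakir")  # True: default keeps a combining dot
print(turkic_lower(word))                  # diyarbakır
print(turkic_upper("\u0131l\u0131k"))      # ILIK
```

Replacing the special letters before calling the default mapping keeps the sketch short; a production implementation would instead apply the full conditional rules from SpecialCasing.txt.
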