Comments on Review Draft UC3M0714 of Unicode 3.0 Source: Michael Everson Status: Expert Contribution Date: 1998-11-01

In the pages following, I have given items from a text dump of the Unicode 3.0 draft in 9-point Helvetica with comments following the items in 12-point Times. I have not reviewed the Brahmic scripts in this document.

0021 EXCLAMATION MARK
= factorial
= bang
→ (inverted exclamation mark - 00A1)
→ (latin letter retroflex click - 01C3)
→ (double exclamation mark - 203C)
→ (heavy exclamation mark ornament - 2762)
Add x 203D ?! interrobang

0022 QUOTATION MARK
= APL quote
• neutral (vertical), used as opening or closing quotation mark
• preferred characters for paired quotation marks are 201C & 201D
This is not true for Swedish or German. Please add “in the English language”.
→ (modifier letter double prime - 02BA)
→ (combining double acute accent - 030B)
→ (combining double vertical line above - 030E)
The leading is bad here. Please make sure the leading of all lines in the document is the same everywhere.

0028 LEFT PARENTHESIS
= OPENING PARENTHESIS
0029 RIGHT PARENTHESIS
= CLOSING PARENTHESIS
• see discussion on semantics of paired bracketing characters
This note should appear under both 0028 and 0029, and under all other paired bracketing characters. Also, it should refer to where the discussion actually is.

002C COMMA
→ (arabic comma - 060C)
→ (ideographic comma - 3001)
Add x 201A , SINGLE LOW-9 QUOTATION MARK

002D HYPHEN-MINUS
= hyphen or minus sign
• used for either hyphen or minus sign
• other hyphen and dash characters: 2010-2015
In the printout, but not in this text dump, it says “2010 - — 2015 —”, but the first em-dash is confusing. Whenever ranges are given, the arrow → should be given. Cross references should not be given with the arrow. This is confusing. I suggest that the cross references be shown with the commonly recognized and attractive printer’s fist ☞ 261E.

002F SOLIDUS
= SLASH
= virgule, shilling (British)
→ (latin letter dental click - 01C0)
→ (fraction slash - 2044)
→ (division slash - 2215)
The alignment between the glyphs and the names for the cross references is not the same for each line and should be. A tab needs to be set for these things.

0049 LATIN CAPITAL LETTER I
• Turkish uses 0131 for lowercase
Correct to “Turkish and Azerbaijani use 0131 for lowercase”. Add cross references to vertical line 007C, cyrillic capital letter byelorussian-ukrainian i 0406, cyrillic palochka 04C0, and roman numeral one 2160.

004B LATIN CAPITAL LETTER K
→ (kelvin sign - 212A)
I have never understood why the glyphs at °C 2103 and °F 2109 have the degree sign but °K 212A does not. One says “degrees Celsius”, “degrees Fahrenheit”, and “degrees Kelvin”.

0069 LATIN SMALL LETTER I
• Turkish uses 0130 for uppercase
Correct to “Turkish and Azerbaijani use 0130 for uppercase”.

007C VERTICAL LINE
Add cross references to latin capital letter I 0049, cyrillic capital letter byelorussian-ukrainian i 0406, cyrillic palochka 04C0, and roman numeral one 2160.

00A1 INVERTED EXCLAMATION MARK
• Spanish
Add “, Asturian, Galician”.

00A3 POUND SIGN
= pound sterling
Add “, Irish punt”.

00A4 CURRENCY SIGN
• other currency symbol characters: 20A0-20AD
Add 00A2 CENT SIGN, 00A3 POUND SIGN, 00A5 YEN SIGN, 0E3F THAI BAHT, 17DB KHMER CURRENCY SYMBOL RIEL, all of which are other currency symbol characters.

00A7 SECTION SIGN
• paragraph sign in some European usage
Possibly add note “• derives from manuscript tradition”.

00AA FEMININE ORDINAL INDICATOR
• Spanish
This is probably used in other languages, such as Galician, Italian, Portuguese, etc.

00B2 SUPERSCRIPT TWO
= squared
• other superscript digit characters: 2070-2079
With regard to the range mark, see the note on 002D above.

00B3 SUPERSCRIPT THREE
= cubed
→ (superscript one - 00B9)
≈ <super> 0033
It should be noted that ≈ 2248 is ALMOST EQUAL TO. What is ≈ intended to represent here?
Is it not intended to represent ≍ 224D EQUIVALENT TO or ≣ 2263 STRICTLY EQUIVALENT TO? This is not clear to the reader.

00BA MASCULINE ORDINAL INDICATOR
• Spanish
This is probably used in other languages, such as Galician, Italian, etc. In the absence of further information, the note should be deleted as noncomprehensive.

00D0 LATIN CAPITAL LETTER ETH (Icelandic)
→ (latin small letter eth - 00F0)
→ (latin capital letter d with stroke - 0110)
→ (latin capital letter african d - 0189)
The glyph for 0110 should not differ from the glyph for 00D0 and 0189.

00DF LATIN SMALL LETTER SHARP S (German)
= ess-zed
• German
Note that the parenthetical has been retained (possibly an error in Asmus’ program) in the character name. “Ess-zed” is incorrect. The correct German spelling is “Eszett” (note capital E). Add “scharfes Es” or “scharfes s”.
• uppercase is "SS"
This is no longer entirely true; the rule is, apparently, suspended for personal names. Delete the note here. Or add LATIN CAPITAL LETTER SHARP S to the UCS. :-)

00E5 LATIN SMALL LETTER A WITH RING ABOVE
• Danish, Norwegian, Swedish
Add “, Walloon”.

00E6 LATIN SMALL LETTER AE
= LATIN SMALL LIGATURE AE
• IPA
→ (latin small ligature oe - 0153)
Add “Danish, Norwegian, Icelandic, Faroese, Old English, French”. Add “Commonly called Ash (from Old English æsc)”.

00EC LATIN SMALL LETTER I WITH GRAVE
• Italian, Malagash
It is Malagasy and not Malagash (which is an anglicization of the French term Malagache).

00F0 LATIN SMALL LETTER ETH (Icelandic)
• Icelandic, Faroese, old English, IPA
Say “Old English”.
→ 00D0 D latin capital letter eth
Add cross references to 03B4 GREEK SMALL LETTER DELTA and 2202 PARTIAL DIFFERENTIAL.

00FD LATIN SMALL LETTER Y WITH ACUTE
• Czech, Slovak, Icelandic, Faroese, Malagash
Add “, Welsh”. It is Malagasy and not Malagash (which is an anglicization of the French term Malagache).

00FE LATIN SMALL LETTER THORN (Icelandic)
• Icelandic, old English, IPA
• Runic letter borrowed into Latin script
Say “Old English”.

0101 LATIN SMALL LETTER A WITH MACRON
• Latvian, ...
Change “Latvian, …” to “Latvian, Latin”. Do not use ellipses in these notes.

0103 LATIN SMALL LETTER A WITH BREVE
• Romanian, Vietnamese, ...
Change “Romanian, Vietnamese, …” to “Romanian, Vietnamese, Latin”. Do not use ellipses in these notes.

0104 LATIN CAPITAL LETTER A WITH OGONEK
The position of the ogonek in the glyph is incorrect.

0105 LATIN SMALL LETTER A WITH OGONEK
• Polish, Lithuanian, ...
Say “Lithuanian, Lakota, Polish”. Do not use ellipses in these notes.

0107 LATIN SMALL LETTER C WITH ACUTE
• Polish, Croatian, ...
Say “Croatian, Polish”. Do not use ellipses in these notes.

0109 LATIN SMALL LETTER C WITH CIRCUMFLEX
• Esperanto

010B LATIN SMALL LETTER C WITH DOT ABOVE
• Maltese
Add “, Irish Gaelic (old orthography)”.

010D LATIN SMALL LETTER C WITH CARON
• (many)
Say “Czech, Lakota, Slovak, Slovenian, and many other languages.” or delete the note.

010E LATIN CAPITAL LETTER D WITH CARON
• the form using caron/hacek is preferred in all contexts
Do not use the term hacek here. If you do use the term it must be spelled correctly, namely háček.

010F LATIN SMALL LETTER D WITH CARON
• Czech, Slovak
• the form using apostrophe is preferred in typesetting
The glyph is incorrect. It is not acceptable to display this character with the non-preferred glyph.

0110 LATIN CAPITAL LETTER D WITH STROKE
The glyph is incorrect. The glyph should be identical to the glyph used at 00D0 and 0189.

0111 LATIN SMALL LETTER D WITH STROKE
• Croatian, Vietnamese, Lappish
Do not use the term “Lappish” as it is considered offensive by Sámi people. Use the term “Sámi”.

0113 LATIN SMALL LETTER E WITH MACRON
• Latvian, ...
Change “Latvian, …” to “Latvian, Latin”. Do not use ellipses in these notes.

0115 LATIN SMALL LETTER E WITH BREVE
• Malay, ...
Change “Malay, …” to “Malay, Latin”.
Do not use ellipses in these notes.

0118 LATIN CAPITAL LETTER E WITH OGONEK
The position of the ogonek in the glyph is incorrect.

0119 LATIN SMALL LETTER E WITH OGONEK
• Polish, Lithuanian, ...
Change “Polish, Lithuanian, …” to “Polish, Lithuanian”. Do not use ellipses in these notes. The position of the ogonek in the glyph is incorrect.

011B LATIN SMALL LETTER E WITH CARON
• Czech, ...
Change “Czech, …” to “Czech”. Do not use ellipses in these notes.

011F LATIN SMALL LETTER G WITH BREVE
• Turkish
Change to “Turkish, Azerbaijani”.

0121 LATIN SMALL LETTER G WITH DOT ABOVE
• Maltese, ...
Change “Maltese, …” to “Maltese, Irish Gaelic (old orthography)”. Do not use ellipses in these notes.

0122 LATIN CAPITAL LETTER G WITH CEDILLA
The glyph is incorrect.

0123 LATIN SMALL LETTER G WITH CEDILLA
• Latvian, Lappish
This letter is not used in Sámi, so delete “Lappish”. The glyph is incorrect and the preferred form with turned comma above must be used.
• there are three glyph variants
Delete this note. The other glyph variants are not preferred by Latvians.

0127 LATIN SMALL LETTER H WITH STROKE
• Maltese, IPA, ...
Change “Maltese, IPA, …” to “Maltese, IPA”. Do not use ellipses in these notes.

0129 LATIN SMALL LETTER I WITH TILDE
• Greenlandic
Change to “Greenlandic (old orthography)”.

012B LATIN SMALL LETTER I WITH MACRON
• Latvian, ...
Change “Latvian, …” to “Latvian, Latin”. Do not use ellipses in these notes.

012D LATIN SMALL LETTER I WITH BREVE
• Latin, ...
Change “Latin, …” to “Latin”. Do not use ellipses in these notes.

012E LATIN CAPITAL LETTER I WITH OGONEK
The position of the ogonek in the glyph is incorrect.

012F LATIN SMALL LETTER I WITH OGONEK
• Lithuanian, ...
Change “Lithuanian, …” to “Lithuanian, Navajo”. Do not use ellipses in these notes. The position of the ogonek in the glyph is incorrect.

0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
= LATIN CAPITAL LETTER I DOT
• Turkish
Use “Turkish, Azerbaijani”.

0131 LATIN SMALL LETTER DOTLESS I
• Turkish
Use “Turkish, Azerbaijani”.

0136 LATIN CAPITAL LETTER K WITH CEDILLA
The glyph is incorrect. It must be a comma below.

0137 LATIN SMALL LETTER K WITH CEDILLA
• Latvian, ...
Change “Latvian, …” to “Latvian”. Do not use ellipses in these notes. The glyph is incorrect. It must be a comma below.

0138 LATIN SMALL LETTER KRA (Greenlandic)
• old Greenlandic
Change to “Greenlandic (old orthography)”.

013B LATIN CAPITAL LETTER L WITH CEDILLA
The glyph is incorrect.
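
Several of the comments above concern properties that can be inspected mechanically: the ≈ mapping queried under 00B3, the note that the uppercase of 00DF is "SS", and the dotted and dotless i notes under 0049, 0069, 0130 and 0131. The following is a minimal sketch, assuming Python 3 and its standard unicodedata module. That module reflects the present-day Unicode Character Database, not the draft under review, and the helper names compatibility_mapping and code_points are hypothetical, chosen only for this illustration.

```python
import unicodedata


def compatibility_mapping(ch: str) -> str:
    """Return the raw decomposition field for one character.

    For a compatibility mapping this is a tag such as <super>
    followed by the code points the character maps to.
    """
    return unicodedata.decomposition(ch)


def code_points(text: str) -> str:
    """Render a string as space-separated four-digit hex code points."""
    return " ".join(f"{ord(c):04X}" for c in text)


if __name__ == "__main__":
    # 00B3 SUPERSCRIPT THREE: the mapping that the names list marks with ≈.
    print("00B3 maps to:", compatibility_mapping("\u00b3"))

    # 00DF LATIN SMALL LETTER SHARP S: default (language-independent) uppercase.
    print("upper(00DF):", code_points("\u00df".upper()))

    # Default case mappings do not apply the Turkish and Azerbaijani rule:
    # 0049 lowercases to 0069, not to 0131, unless the software is
    # told the language and tailors the mapping itself.
    print("lower(0049):", code_points("I".lower()))
    print("lower(0130):", code_points("\u0130".lower()))
    print("upper(0131):", code_points("\u0131".upper()))
```

Under these assumptions the default mappings are language-neutral, which is why the notes on 0049 and 0069 matter to implementers: the Turkish and Azerbaijani pairing of 0049 with 0131 and of 0069 with 0130 has to be supplied by language-aware code.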
