ISO/IEC JTC 1/SC 2 N 3194/WG2 N1905 Date: 1998-10-22

Replaces SC 2 N 3112

ISO/IEC JTC 1/SC 2
CODED CHARACTER SETS
SECRETARIAT: JAPAN (JISC)

DOC TYPE: Text for FDAM ballot
TITLE: Revised text of 10646-1/FPDAM 23, Universal Multiple-Octet Coded Character set (UCS) -- Part 1: Architecture and Basic Multilingual Plane -- AMENDMENT 23: Bopomofo Extended and other characters
SOURCE: Project Editor
PROJECT: JTC 1.02.18.01.23
STATUS: In accordance with Resolution M35.08 adopted at the 35th meeting of SC 2/WG 2 held in London, UK, 1998-09-21/25, this document has been prepared by project editor. It is submitted to ITTF for a two-month FDAM ballot.
ACTION ID: ITTF
DUE DATE: --
DISTRIBUTION: P, O and L Members of ISO/IEC JTC 1/SC 2
              WG Conveners and Secretariats
              Secretariat, ISO/IEC JTC 1
              ISO/IEC ITTF
NO. OF PAGES: 5
ACCESS LEVEL: Defined
WEB ISSUE #: 023

Contact: Secretariat ISO/IEC JTC 1/SC 2 - Toshiko KIMURA
IPSJ/ITSCJ (Information Processing Society of Japan/Information Technology Standards Commission of Japan)*
Room 308-3, Kikai-Shinko-Kaikan Bldg., 3-5-8, Shiba-Koen, Minato-ku, Tokyo 105-0011 JAPAN
Tel: +81 3 3431 2808; Fax: +81 3 3431 6493; E-mail: [email protected]; http://www.dkuug.dk/jtc1/sc2
*A Standard Organization accredited by JISC

© ISO/IEC    FDAM for ISO/IEC 10646-1: 1993/Amd. 23: 1999 (E)

Information technology -- Universal Multiple-Octet Coded Character Set (UCS) -- Part 1: Architecture and Basic Multilingual Plane

AMENDMENT 23: Bopomofo Extended and other characters

1. List of new character names

Insert the following character name entries at the indicated positions in the tables of character names identified below, replacing the existing entries which read “(This position shall not be used)”.

Page 25. Table 5 - Row 02: LATIN EXTENDED-B
hex  Name
1E   LATIN CAPITAL LETTER H WITH CARON
1F   LATIN SMALL LETTER H WITH CARON

Page 29. Table 7 - Row 02: MODIFIER LETTERS
hex  Name
EA   MODIFIER LETTER YIN DEPARTING TONE MARK
EB   MODIFIER LETTER YANG DEPARTING TONE MARK

Page 11 of Amendment 11. Table 221 - Row 16: UNIFIED CANADIAN ABORIGINAL SYLLABICS
hex  Name
6F   CANADIAN SYLLABICS QAI
70   CANADIAN SYLLABICS NGAI
71   CANADIAN SYLLABICS NNGI
72   CANADIAN SYLLABICS NNGII
73   CANADIAN SYLLABICS NNGO
74   CANADIAN SYLLABICS NNGOO
75   CANADIAN SYLLABICS NNGA
76   CANADIAN SYLLABICS NNGAA

Page 87. Table 36 - Row 20: SUPERSCRIPTS AND SUBSCRIPTS, CURRENCY SYMBOLS
hex  Name
AD   KIP SIGN

Page 89. Table 37 - Row 20: COMBINING MARKS FOR SYMBOLS
hex  Name
E2   ENCLOSING SCREEN
E3   ENCLOSING KEY CAP

Page 109. Table 47 - Row 26: MISCELLANEOUS SYMBOLS
hex  Name
70   WEST SYRIAC CROSS
71   EAST SYRIAC CROSS

Page 115. Table 50 - Row 30: SPECIALS
hex  Name
3E   IDEOGRAPHIC VARIATION INDICATOR

Page 121. Table 53 - Row 31
Amend title to: CJK MISCELLANEOUS, BOPOMOFO EXTENDED
and add the following characters:
hex  Name
A0   BOPOMOFO LETTER BU
A1   BOPOMOFO LETTER ZI
A2   BOPOMOFO LETTER JI
A3   BOPOMOFO LETTER GU
A4   BOPOMOFO LETTER EE
A5   BOPOMOFO LETTER ENN
A6   BOPOMOFO LETTER OO
A7   BOPOMOFO LETTER ONN
A8   BOPOMOFO LETTER IR
A9   BOPOMOFO LETTER ANN
AA   BOPOMOFO LETTER INN
AB   BOPOMOFO LETTER UNN
AC   BOPOMOFO LETTER IM
AD   BOPOMOFO LETTER NGG
AE   BOPOMOFO LETTER AINN
AF   BOPOMOFO LETTER AUNN
B0   BOPOMOFO LETTER AM
B1   BOPOMOFO LETTER OM
B2   BOPOMOFO LETTER ONG
B3   BOPOMOFO LETTER INNN
B4   BOPOMOFO FINAL LETTER P
B5   BOPOMOFO FINAL LETTER T
B6   BOPOMOFO FINAL LETTER K
B7   BOPOMOFO FINAL LETTER H

2. List of new graphic symbols

Insert the following graphic character symbols at the indicated positions in the tables of character glyphs identified below, replacing the existing entries which are indicated by a hatched fill.

[Glyph images did not survive text extraction; only the code positions are listed.]

Page 24. Table 5 - Row 02: LATIN EXTENDED-B
021E, 021F

Page 28. Table 7 - Row 02: MODIFIER LETTERS
02EA, 02EB

Page 10 of Amendment 11. Table 221 - Row 16: UNIFIED CANADIAN ABORIGINAL SYLLABICS
166F, 1670, 1671, 1672, 1673, 1674, 1675, 1676

Page 86. Table 36 - Row 20: SUPERSCRIPTS AND SUBSCRIPTS, CURRENCY SYMBOLS
20AD

Page 88. Table 37 - Row 20: COMBINING MARKS FOR SYMBOLS
20E2, 20E3

Page 108. Table 46 - Row 26: MISCELLANEOUS SYMBOLS
2670, 2671

Page 114. Table 50 - Row 30: SPECIALS
303E

Page 120. Table 53 - Row 31:
Amend title to: Table 53 - Row 31: CJK MISCELLANEOUS, BOPOMOFO EXTENDED
Add two new columns numbered 31A and 31B to the code table as shown on the right of this page.

[Code table: columns 31A and 31B, rows 0 to F; cells are numbered 176 to 191 and 192 to 207 in the source chart; glyph images not reproduced.]

3. Other changes

Page 11, clause 19
In the list of Block names, after the entry CJK MISCELLANEOUS insert a new entry as follows:
BOPOMOFO EXTENDED    31A0 - 31BF

Page 15, Clause 25, Figure 4 (see Amendment 5)
In Figure 4, Row 31, replace the left part of the cross-hatching with the words Bopomofo Extended centred on a white background.

Page 700, Annex A
In the list of collection numbers and names, amend 52 BOPOMOFO to read:
52 BOPOMOFO    3100-312F, 31A0-31BF

Pages 709 ff, Annex E Alphabetically sorted list of character names
Insert each of the character name entries from Item 1 above at the appropriate position, ordered alphabetically by the character name, in the list of character names in Annex E. [Editor's note: A list of the entries, showing their proper positions, will be provided in the Final Text.]

31B7  BOPOMOFO FINAL LETTER H
31B6  BOPOMOFO FINAL LETTER K
31B4  BOPOMOFO FINAL LETTER P
31B5  BOPOMOFO FINAL LETTER T
31AE  BOPOMOFO LETTER AINN
31B0  BOPOMOFO LETTER AM
31A9  BOPOMOFO LETTER ANN
31AF  BOPOMOFO LETTER AUNN
31A0  BOPOMOFO LETTER BU
31A4  BOPOMOFO LETTER EE
31A5  BOPOMOFO LETTER ENN
31A3  BOPOMOFO LETTER GU
31AC  BOPOMOFO LETTER IM
31AA  BOPOMOFO LETTER INN
31B3  BOPOMOFO LETTER INNN
31A8  BOPOMOFO LETTER IR
31A2  BOPOMOFO LETTER JI
31AD  BOPOMOFO LETTER NGG
31B1  BOPOMOFO LETTER OM
31B2  BOPOMOFO LETTER ONG
31A7  BOPOMOFO LETTER ONN
31A6  BOPOMOFO LETTER OO
31AB  BOPOMOFO LETTER UNN
31A1  BOPOMOFO LETTER ZI
1670  CANADIAN SYLLABICS NGAI
1675  CANADIAN SYLLABICS NNGA
1676  CANADIAN SYLLABICS NNGAA
1671  CANADIAN SYLLABICS NNGI
1672  CANADIAN SYLLABICS NNGII
1673  CANADIAN SYLLABICS NNGO
1674  CANADIAN SYLLABICS NNGOO
166F  CANADIAN SYLLABICS QAI
2671  EAST SYRIAC CROSS
20E3  ENCLOSING KEY CAP
20E2  ENCLOSING SCREEN
303E  IDEOGRAPHIC VARIATION INDICATOR
20AD  KIP SIGN
021E  LATIN CAPITAL LETTER H WITH CARON
021F  LATIN SMALL LETTER H WITH CARON
02EB  MODIFIER LETTER YANG DEPARTING TONE MARK
02EA  MODIFIER LETTER YIN DEPARTING TONE MARK
2670  WEST SYRIAC CROSS
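
Item 1 gives each new character as a row number plus a two-digit hex cell value, while the Annex E list gives the full four-digit code position and is ordered by character name. The following is a minimal Python sketch of that relationship, not part of the amendment itself. The function names (code_position, in_bopomofo_extended, annex_e_order) and the SAMPLE list are hypothetical and chosen for illustration; only the row numbers, cell values, names and the 31A0 - 31BF block range come from the text above. It assumes plain string comparison is an acceptable stand-in for "ordered alphabetically".

```python
# Illustrative sketch only: helper names are hypothetical, not from the standard.

# Block range from the clause 19 entry: BOPOMOFO EXTENDED 31A0 - 31BF.
BLOCK_START, BLOCK_END = 0x31A0, 0x31BF


def code_position(row: int, cell: int) -> int:
    """Combine a row octet and a cell octet into a four-digit code position."""
    return (row << 8) | cell


def in_bopomofo_extended(cp: int) -> bool:
    """Return True if the code position lies in the BOPOMOFO EXTENDED block."""
    return BLOCK_START <= cp <= BLOCK_END


# A small sample of (row, cell, name) entries taken from Item 1.
SAMPLE = [
    (0x02, 0x1E, "LATIN CAPITAL LETTER H WITH CARON"),
    (0x20, 0xAD, "KIP SIGN"),
    (0x31, 0xA0, "BOPOMOFO LETTER BU"),
    (0x31, 0xB7, "BOPOMOFO FINAL LETTER H"),
    (0x16, 0x6F, "CANADIAN SYLLABICS QAI"),
]


def annex_e_order(entries):
    """Return (code position, name) pairs sorted by character name."""
    pairs = [(code_position(row, cell), name) for row, cell, name in entries]
    return sorted(pairs, key=lambda pair: pair[1])


if __name__ == "__main__":
    for cp, name in annex_e_order(SAMPLE):
        marker = "*" if in_bopomofo_extended(cp) else " "
        print(f"{cp:04X} {marker} {name}")
```

Run as a script, this prints the sample entries in the same relative order as the Annex E list, with an asterisk marking positions that fall inside the new block.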