CJK Compatibility Ideographs Supplement Range: 2F800–2FA1F


This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0.

This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata.

See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0.

Disclaimer

These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0, but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Copying characters from the character code tables or list of character names is not recommended, because for production reasons the PDF files for the code charts cannot guarantee that the correct character codes will always be copied.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See https://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts.

The fonts and font data used in production of these code charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s).

The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site. See https://www.unicode.org/pending/pending.html and https://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2021 Unicode, Inc. All rights reserved.
CJK Compatibility Ideographs Supplement 2F800–2F823

Duplicate characters from CNS 11643-1992

2F800  ⼀ 1.6  T6-2936  ≡ 4E3D 丽  ~ 4E3D FE00
2F801  ⼂ 3.2  T6-2131  ≡ 4E38 丸  ~ 4E38 FE00
2F802  ⼃ 4.0  T6-2121  ≡ 4E41 乁  ~ 4E41 FE00
2F803  ⼆ 7.4  T6-2566  ≡ 20122  ~ 20122 FE00
2F804  ⼈ 9.5  T6-2572, KP1-34EE  ≡ 4F60 你  ~ 4F60 FE00
2F805  ⼈ 9.7  T4-253D, KP1-3534  ≡ 4FAE 侮  ~ 4FAE FE01
2F806  ⼈ 9.7  T6-2E61  ≡ 4FBB 侻  ~ 4FBB FE00
2F807  ⼈ 9.8  TF-2D68  ≡ 5002 倂  ~ 5002 FE00
2F808  ⼈ 9.9  T6-3D35  ≡ 507A 偺  ~ 507A FE00
2F809  ⼈ 9.10  T6-505B  ≡ 5099 備  ~ 5099 FE00
2F80A  ⼈ 9.12  T4-3C30  ≡ 50E7 僧  ~ 50E7 FE01
2F80B  ⼈ 9.12  T6-5A72  ≡ 50CF 像  ~ 50CF FE00
2F80C  ⼈ 9.15  TF-594D  ≡ 349E 㒞  ~ 349E FE00
2F80D  ⼏ 16.4  T6-2352  → 5145 充  ≡ 2063A  ~ 2063A FE00
2F80E  ⼉ 10.6  T3-2452  ≡ 514D 免  ~ 514D FE01
2F80F  ⼉ 10.6  T3-2753  ≡ 5154 兔  ~ 5154 FE00
2F810  ⼉ 10.19  TF-6740  ≡ 5164 兤  ~ 5164 FE00
2F811  ⼋ 12.6  T3-2754  ≡ 5177 具  ~ 5177 FE00
2F812  ⼋ 12.9  T6-3D3C  ≡ 2051C  ~ 2051C FE00
2F813  ⼋ 12.18  T7-4D3E  ≡ 34B9 㒹  ~ 34B9 FE00
2F814  ⼊ 11.2  T6-2150  → 5185 内  ≡ 5167 內  ~ 5167 FE00
2F815  ⼌ 13.4  T3-227B  ≡ 518D 再  ~ 518D FE00
2F816  ⼌ 13.4  T6-2359  ≡ 2054B  ~ 2054B FE00
2F817  ⼍ 14.2  T3-214F  ≡ 5197 冗  ~ 5197 FE00
2F818  ⼍ 14.8  T6-3544  ≡ 51A4 冤  ~ 51A4 FE00
2F819  ⼈ 9.2  T4-213F  ≡ 4ECC 仌  ~ 4ECC FE00
2F81A  ⼎ 15.3  T6-223C  ≡ 51AC 冬  ~ 51AC FE00
2F81B  ⼎ 15.5  T3-2441  ≡ 51B5 况  ~ 51B5 FE01
2F81C  ⾭ 174.9  T7-367A  ≡ 291DF  ~ 291DF FE00
2F81D  ⼐ 17.0  T5-2129  ≡ 51F5 凵  ~ 51F5 FE00
2F81E  ⼑ 18.1  T6-2138  ≡ 5203 刃  ~ 5203 FE00
2F81F  ⼑ 18.5  TF-2337  ≡ 34DF 㓟  ~ 34DF FE00
2F820  ⼑ 18.6  T6-2963  ≡ 523B 刻  ~ 523B FE00
2F821  ⼑ 18.7  T6-2E76, MD-2F821  ≡ 5246 剆  ~ 5246 FE00
2F822  ⼑ 18.10  T6-4667  ≡ 5272 割  ~ 5272 FE00
2F823  ⼑ 18.11  T3-4043  ≡ 5277 剷  ~ 5277 FE00

The Unicode Standard 14.0, Copyright © 1991-2021 Unicode, Inc. All rights reserved.
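Each ≡ line on the page above gives the canonical equivalent of a compatibility ideograph, which is what Unicode normalization replaces the character with. A minimal Python sketch of checking this, assuming the standard `unicodedata` module (whose data follows the Unicode version bundled with the Python build, not necessarily 14.0); the helper names `canonical_equivalent` and `format_code_points` are illustrative and not part of the chart:

```python
import unicodedata


def canonical_equivalent(code_point: int) -> str:
    """Return the NFC form of a single code point (illustrative helper)."""
    return unicodedata.normalize("NFC", chr(code_point))


def format_code_points(text: str) -> str:
    """Render a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# Chart entries "2F800 ≡ 4E3D" and "2F80D ≡ 2063A" from the page above.
for compat in (0x2F800, 0x2F80D):
    equivalent = canonical_equivalent(compat)
    print(f"{compat:04X} ≡ {format_code_points(equivalent)}")
```

Working from code points rather than pasted glyphs also sidesteps the copying problem the disclaimer warns about.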
CJK Compatibility Ideographs Supplement 2F824–2F847

2F824  ⼒ 19.4  TF-2229  ≡ 3515 㔕  ~ 3515 FE00
2F825  ⼒ 19.7  T6-2F25, H-9874  ≡ 52C7 勇  ~ 52C7 FE01
2F826  ⼒ 19.7  T6-3558  ≡ 52C9 勉  ~ 52C9 FE01
2F827  ⼒ 19.11  T4-364C  ≡ 52E4 勤  ~ 52E4 FE01
2F828  ⼓ 20.1  T4-212F  ≡ 52FA 勺  ~ 52FA FE01
2F829  ⼓ 20.3  T6-2246  ≡ 5305 包  ~ 5305 FE00
2F82A  ⼓ 20.3  T3-2225  ≡ 5306 匆  ~ 5306 FE00
2F82B  ⼔ 21.3  T6-2249  ≡ 5317 北  ~ 5317 FE01
2F82C  ⼗ 24.3  T3-2329  → 20984  ≡ 5349 卉  ~ 5349 FE00
2F82D  ⼗ 24.6  T6-2F38  ≡ 5351 卑  ~ 5351 FE01
2F82E  ⼗ 24.10  T6-4674  ≡ 535A 博  ~ 535A FE00
2F82F  ⼙ 26.5  T6-2A23  ≡ 5373 即  ~ 5373 FE00
2F830  ⼙ 26.7  T6-2F3D  ≡ 537D 卽  ~ 537D FE00
2F831  ⼙ 26.9  T6-3D59  ≡ 537F 卿  ~ 537F FE00
2F832  ⼙ 26.9  T6-3D5A  ≡ 537F 卿  ~ 537F FE01
2F833  ⼙ 26.9  T3-3A26, KP1-38CF  ≡ 537F 卿  ~ 537F FE02
2F834  ⼚ 27.2  TF-2133  ≡ 20A2C  ~ 20A2C FE00
2F835  ⼚ 27.4, ⽕ 86.2  T3-2429, KP1-54B7  ≡ 7070 灰  ~ 7070 FE00
2F836  ⼜ 29.2  T6-2161  ≡ 53CA 及  ~ 53CA FE00
2F837  ⼜ 29.8  T6-2643  ≡ 53DF 叟  ~ 53DF FE00
2F838  ⼜ 29.9  T5-3131  ≡ 20B63  ~ 20B63 FE00
2F839  ⼝ 30.2  T6-225B  ≡ 53EB 叫  ~ 53EB FE00
2F83A  ⼝ 30.2  TU-2F83A  ≡ 53F1 叱  ~ 53F1 FE00
2F83B  ⼝ 30.3  TU-2F83B, H-9AC8  → 4DB8 䶸  ≡ 5406 吆  ~ 5406 FE00
2F83C  ⼝ 30.6  T4-235C  ≡ 549E 咞  ~ 549E FE00
2F83D  ⼝ 30.4  T6-264E  ≡ 5438 吸  ~ 5438 FE00
2F83E  ⼝ 30.4  T4-235B  ≡ 5448 呈  ~ 5448 FE00
2F83F  ⼝ 30.5  T6-2A3C  ≡ 5468 周  ~ 5468 FE00
2F840  ⼝ 30.6  H-A047  ≡ 54A2 咢  ~ 54A2 FE00
2F841  ⼝ 30.7  T3-3023  ≡ 54F6 哶  ~ 54F6 FE00
2F842  ⼝ 30.7  T6-357E  ≡ 5510 唐  ~ 5510 FE00
2F843  ⼝ 30.8  T4-3076  ≡ 5553 啓  ~ 5553 FE00
2F844  ⼝ 30.8  T6-3D7C  ≡ 5563 啣  ~ 5563 FE00
2F845  ⼝ 30.9  T6-472A  ≡ 5584 善  ~ 5584 FE00
2F846  ⼝ 30.9  T6-472C  ≡ 5584 善  ~ 5584 FE01
2F847  ⼝ 30.9  T6-4730  ≡ 5599 喙  ~ 5599 FE01
CJK Compatibility Ideographs Supplement 2F848–2F86B

2F848  ⼝ 30.9  T6-4731  ≡ 55AB 喫  ~ 55AB FE00
2F849  ⼝ 30.9  T6-4733  ≡ 55B3 喳  ~ 55B3 FE00
2F84A  ⼝ 30.10  T4-3C50  ≡ 55C2 嗂  ~ 55C2 FE00
2F84B  ⼞ 31.11  T6-5B5B  ≡ 5716 圖  ~ 5716 FE00
2F84C  ⼝ 30.11  T6-5136, KP1-3A92  ≡ 5606 嘆  ~ 5606 FE01
2F84D  ⼞ 31.11  T6-5B59  ≡ 5717 圗  ~ 5717 FE00
2F84E  ⼝ 30.11  T7-2160  ≡ 5651 噑  ~ 5651 FE00
2F84F  ⼝ 30.12  T7-2C65, KP1-3AD2  ≡ 5674 噴  ~ 5674 FE00
2F850  ⼑ 18.2  T3-217C  ≡ 5207 切  ~ 5207 FE01
2F851  ⼠ 33.3  T6-2433  ≡ 58EE 壮  ~ 58EE FE00
2F852  ⼟ 32.6  T6-3635, KP1-3BAF  ≡ 57CE 城  ~ 57CE FE00
2F853  ⼟ 32.8  T6-3E2C  ≡ 57F4 埴  ~ 57F4 FE00
2F854  ⼟ 32.8  T6-3E2B  ≡ 580D 堍  ~ 580D FE00
2F855  ⼟ 32.6  T3-3470, KP1-3BD5  ≡ 578B 型  ~ 578B FE00
2F856  ⼟ 32.9  T4-3676  ≡ 5832 堲  ~ 5832 FE00
2F857  ⼟ 32.9  T6-514A  ≡ 5831 報  ~ 5831 FE00
2F858  ⼟ 32.12  T7-2176  ≡ 58AC 墬  ~ 58AC FE00
2F859  ⼟ 32.16  T7-463E  ≡ 214E4  ~ 214E4 FE00
2F85A  ⼠ 33.4  TF-235B  → 58F3 壳  ≡ 58F2 売  ~ 58F2 FE00
2F85B  ⼠ 33.8  T6-5157  → 21533  ≡ 58F7 壷  ~ 58F7 FE00
2F85C  ⼡ 34.4  T5-2362  ≡ 5906 夆  ~ 5906 FE00
2F85D  ⼣ 36.3  T6-243B  ≡ 591A 多  ~ 591A FE00
2F85E  ⼣ 36.11  T6-515E  ≡ 5922 夢  ~ 5922 FE00
2F85F  ⼤ 37.9  T6-4756  ≡ 5962 奢  ~ 5962 FE00
2F860  ⼥ 38.2  T6-2267  → 216A7  ≡ 216A8  ~ 216A8 FE00
2F861  ⼥ 38.5  TF-2A2B  ≡ 216EA  ~ 216EA FE00
2F862  ⼥ 38.7  T6-364C  ≡ 59EC 姬  ~ 59EC FE00
2F863  ⼥ 38.7  T6-364D  ≡ 5A1B 娛  ~ 5A1B FE00
2F864  ⼥ 38.7  TF-2E6D  ≡ 5A27 娧  ~ 5A27 FE00
2F865  ⼥ 38.6  T6-3E54  ≡ 59D8 姘  ~ 59D8 FE00
2F866  ⼥ 38.8  T6-3E50  ≡ 5A66 婦  ~ 5A66 FE00
2F867  ⼥ 38.9  T6-4761  ≡ 36EE 㛮  ~ 36EE FE00
2F868  ⼥ 38.9  T6-5169  ≡ 36FC 㛼  ~ 36FC FE00
2F869  ⼥ 38.12  TF-4746  ≡ 5B08 嬈  ~ 5B08 FE00
2F86A  ⼥ 38.16  T3-5A33  ≡ 5B3E 嬾  ~ 5B3E FE00
2F86B  ⼥ 38.16  T7-4651  ≡ 5B3E 嬾  ~ 5B3E FE01
CJK Compatibility Ideographs Supplement 2F86C–2F88F

2F86C  ⼧ 40.3  T6-2448  ≡ 219C8  ~ 219C8 FE00
2F86D  ⼧ 40.8  T4-3130  ≡ 5BC3 寃  ~ 5BC3 FE00
2F86E  ⼧ 40.10  TF-412B  ≡ 5BD8 寘  ~ 5BD8 FE00
2F86F  ⼧ 40.11  T6-5C22  ≡ 5BE7 寧  ~ 5BE7 FE02
2F870  ⼧ 40.16  T3-5A36  ≡ 5BF3 寳  ~ 5BF3 FE00
2F871  ⼧ 40.23  T7-606D  ≡ 21B18  ~ 21B18 FE00
2F872  ⼨ 41.4  T6-2721  ≡ 5BFF 寿  ~ 5BFF FE00
2F873  ⼨ 41.6  T6-3667  ≡ 5C06 将  ~ 5C06 FE00
2F874  ⼩ 42.3  T6-244B  → 22450  ≡ 5F53 当  ~ 5F53 FE00
2F875  ⼪ 43.0  T4-2134  ≡ 5C22 尢  ~ 5C22 FE00
2F876  ⼪ 43.6  T5-2873  ≡ 3781 㞁  ~ 3781 FE00
2F877  ⼫ 44.9  T6-477B  ≡ 5C60 屠  ~ 5C60 FE00
2F878  ⼬ 45.0  TU-2F878, H-8BC3  → 4DB9 䶹  ≡ 5C6E 屮  ~ 5C6E FE01
2F879  ⼭ 46.5  TF-2662  ≡ 5CC0 峀  ~ 5CC0 FE00
2F87A  ⼭ 46.6  T3-2C40  ≡ 5C8D 岍  ~ 5C8D FE00
2F87B  ⼭ 46.7  T6-304E  ≡ 21DE4  ~ 21DE4 FE00
2F87C  ⼭ 46.9  T6-482B  ≡ 5D43 嵃  ~ 5D43 FE00
2F87D  ⼭ 46.7  T6-4835  ≡ 21DE6  ~ 21DE6 FE00
2F87E  ⼭ 46.10  T3-407E  ≡ 5D6E 嵮  ~ 5D6E FE00
2F87F  ⼭ 46.10  T6-5233  ≡ 5D6B 嵫  ~ 5D6B FE00
2F880  ⼭ 46.11  T6-5C3D  ≡ 5D7C 嵼  ~ 5D7C FE00
2F881  ⼮ 47.4  T6-2736  ≡ 5DE1 巡  ~ 5DE1 FE00
2F882  ⼮ 47.8  T6-5C49, UTC-00136  ≡ 5DE2 巢  ~ 5DE2 FE00
2F883  ⼰ 49.2  TF-215F  ≡ 382F 㠯  ~ 382F FE00
2F884  ⼰ 49.9  T6-4837  ≡ 5DFD 巽  ~ 5DFD FE00
2F885  ⼱ 50.7  T6-372C  ≡ 5E28 帨  ~ 5E28 FE00
2F886  ⼱ 50.9  T6-483C  ≡ 5E3D 帽  ~ 5E3D FE00
2F887  ⼱ 50.13  T7-2D53, KP1-40D3  ≡ 5E69 幩  ~ 5E69 FE00
2F888  ⼱ 50.13  T7-2D55  ≡ 3862 㡢  ~ 3862 FE00
2F889  ⼱ 50.21  T7-606E  ≡ 22183  ~ 22183 FE00
2F88A  ⼴ 53.6  T5-2927  ≡ 387C 㡼  ~ 387C FE00
2F88B  ⼴ 53.8  T6-3F46, KP1-412C  ≡ 5EB0 庰  ~ 5EB0 FE00
2F88C  ⼴ 53.8  T6-3F45  ≡ 5EB3 庳  ~ 5EB3 FE00
2F88D  ⼴ 53.8  T3-355F  ≡ 5EB6 庶  ~ 5EB6 FE00
2F88E  ⼴ 53.9  T6-5240  ≡ 5ECA 廊  ~ 5ECA FE01
2F88F  ⿇ 200.3  T5-455D  ≡ 2A392  ~ 2A392 FE00
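Each ~ line pairs the unified ideograph with a variation selector (FE00, FE01 or FE02 on these pages), forming a standardized variation sequence. Variation selectors have no decomposition, so such a sequence passes through normalization unchanged, whereas the compatibility ideograph itself is replaced by its canonical equivalent. The sketch below illustrates that difference for entry 2F884; it assumes Python's standard `unicodedata` module, and the names `variation_sequence` and `survives_normalization` are hypothetical helpers, not anything defined by the chart:

```python
import unicodedata

FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def variation_sequence(base: int, selector: int) -> str:
    """Build the string for a chart entry such as "~ 5DFD FE00"."""
    # Assumption: only the selectors FE00..FE0F are accepted here.
    if not 0xFE00 <= selector <= 0xFE0F:
        raise ValueError(f"{selector:04X} is not in the range FE00..FE0F")
    return chr(base) + chr(selector)


def survives_normalization(text: str) -> bool:
    """True if the text is unchanged by all four normalization forms."""
    return all(unicodedata.normalize(form, text) == text for form in FORMS)


compat = chr(0x2F884)                         # chart entry 2F884
variant = variation_sequence(0x5DFD, 0xFE00)  # its "~ 5DFD FE00" line

print(survives_normalization(compat))   # False: 2F884 becomes 5DFD
print(survives_normalization(variant))  # True: the sequence is kept as-is
```

Whether a given font or renderer actually displays the variant glyph for the sequence is a separate question that this check does not address.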