L2/20-058
Title: Proposal to change the code chart font for 21 blocks in Unicode Version 14.0
Author: Ken Lunde
Date: 2020-01-22 (replaces L2/19-386)

This is a proposal to change the code chart font for 21 blocks, affecting 1,762 characters, that are CJK-related or otherwise include characters that are used primarily in CJK contexts and are therefore supported in fonts for typical CJK typefaces. The suggested target for this change is Unicode Version 14.0 (2021).

The purpose of this font change is four-fold:

• Reduce the number of fonts that are necessary for code chart production.
• Improve the consistency of the glyphs for similar or related characters by using a uniform typeface design and typeface style, at least for characters whose glyphs do not need to adhere to a particular specification.
• Simplify code chart font management by covering as many complete blocks as possible using a single font.
• Ensure that the font is accessible via open source.

The actual font, CJK Symbols, has been open-sourced in both TrueType and OpenType/CFF formats under the terms of the SIL Open Font License, Version 1.1, and is already available on GitHub. The font can therefore be updated to include glyphs for additional blocks, or to add glyphs for characters in blocks that are already supported. This is maintenance work that I plan to take on for the foreseeable future.

Code Chart Excerpts

The even-numbered pages of this proposal include the current (Unicode Version 13.0 BETA) code chart excerpts for the 21 affected blocks. These can be used to draw a comparison between the current representative glyphs and those provided in the CJK Symbols font that are shown on the facing odd-numbered pages.

Code Tables

Starting from page 3, the odd-numbered pages include left-to-right, top-to-bottom code tables that serve as a glyph synopsis for the CJK Symbols font.
Some of these code tables include a special note at the end that points out a peculiarity or specifies an action that needs to be taken for code chart production. Of the 21 blocks that are covered, 19 are covered in their entirety as of the Unicode Version 13.0 repertoire. The two blocks that include glyphs for only a subset of their characters have an appropriate note inked in red. Block names are inked in blue, and serve as links to the perpetually-current code charts.

Spacing Modifier Letters (02B0–02FF)

[Current code chart excerpt: Spacing Modifier Letters, 02B0–02FF]
DRAFT The Unicode Standard 13.0, Copyright © 1991-2019 Unicode, Inc. All rights reserved.

CJK Symbols code table (only U+02EA & U+02EB):
U+02E (columns A–B): ˪ ˫

Enclosed Alphanumerics (2460–24FF)

[Current code chart excerpt: Enclosed Alphanumerics, 2460–24FF]

CJK Symbols code table:
U+246: ① ② ③ ④ ⑤ ⑥ ⑦ ⑧ ⑨ ⑩ ⑪ ⑫ ⑬ ⑭ ⑮ ⑯
U+247: ⑰ ⑱ ⑲ ⑳ ⑴ ⑵ ⑶ ⑷ ⑸ ⑹ ⑺ ⑻ ⑼ ⑽ ⑾ ⑿
U+248: ⒀ ⒁ ⒂ ⒃ ⒄ ⒅ ⒆ ⒇ ⒈ ⒉ ⒊ ⒋ ⒌ ⒍ ⒎ ⒏
U+249: ⒐ ⒑ ⒒ ⒓ ⒔ ⒕ ⒖ ⒗ ⒘ ⒙ ⒚ ⒛ ⒜ ⒝ ⒞ ⒟
U+24A: ⒠ ⒡ ⒢ ⒣ ⒤ ⒥ ⒦ ⒧ ⒨ ⒩ ⒪ ⒫ ⒬ ⒭ ⒮ ⒯
U+24B: ⒰ ⒱ ⒲ ⒳ ⒴ ⒵ Ⓐ Ⓑ Ⓒ Ⓓ Ⓔ Ⓕ Ⓖ Ⓗ Ⓘ Ⓙ
U+24C: Ⓚ Ⓛ Ⓜ Ⓝ Ⓞ Ⓟ Ⓠ Ⓡ Ⓢ Ⓣ Ⓤ Ⓥ Ⓦ Ⓧ Ⓨ Ⓩ
U+24D: ⓐ ⓑ ⓒ ⓓ ⓔ ⓕ ⓖ ⓗ ⓘ ⓙ ⓚ ⓛ ⓜ ⓝ ⓞ ⓟ
U+24E: ⓠ ⓡ ⓢ ⓣ ⓤ ⓥ ⓦ ⓧ ⓨ ⓩ ⓪ ⓫ ⓬ ⓭ ⓮ ⓯
U+24F: ⓰ ⓱ ⓲ ⓳ ⓴ ⓵ ⓶ ⓷ ⓸ ⓹ ⓺ ⓻ ⓼ ⓽ ⓾ ⓿

Dingbats (2700–27BF)

[Current code chart excerpt: Dingbats, 2700–27BF]

CJK Symbols code table (only U+2776 through U+2793):
U+277 (columns 6–F): ❶ ❷ ❸ ❹ ❺ ❻ ❼ ❽ ❾ ❿
U+278: ➀ ➁ ➂ ➃ ➄ ➅ ➆ ➇ ➈ ➉ ➊ ➋ ➌ ➍ ➎ ➏
U+279 (columns 0–3): ➐ ➑ ➒ ➓

Ideographic Description Characters (2FF0–2FFF)

[Current code chart excerpt: Ideographic Description Characters, 2FF0–2FFF, with the following names list]

Ideographic description characters
These are visibly displayed graphic characters, not invisible composition controls.
2FF0 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO RIGHT
2FF1 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO BELOW
2FF2 IDEOGRAPHIC DESCRIPTION CHARACTER LEFT TO MIDDLE AND RIGHT
2FF3 IDEOGRAPHIC DESCRIPTION CHARACTER ABOVE TO MIDDLE AND BELOW
2FF4 IDEOGRAPHIC DESCRIPTION CHARACTER FULL SURROUND
2FF5 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM ABOVE
2FF6 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM BELOW
2FF7 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LEFT
2FF8 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER LEFT
2FF9 IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM UPPER RIGHT
2FFA IDEOGRAPHIC DESCRIPTION CHARACTER SURROUND FROM LOWER LEFT
2FFB IDEOGRAPHIC DESCRIPTION CHARACTER OVERLAID

CJK Symbols code table:
U+2FF (columns 0–B): ⿰ ⿱ ⿲ ⿳ ⿴ ⿵ ⿶ ⿷ ⿸ ⿹ ⿺ ⿻

CJK Symbols and Punctuation (3000–303F)

[Current code chart excerpt: CJK Symbols and Punctuation, 3000–303F]

CJK Symbols code table:
U+300 (columns 1–F; column 0 is U+3000 IDEOGRAPHIC SPACE): 、 。 〃 〄 々 〆 〇 〈 〉 《 》 「 」 『 』
U+301: 【 】 〒 〓 〔 〕 〖 〗 〘 〙 〚 〛 〜 〝 〞 〟
U+302: 〠 〡 〢 〣 〤 〥 〦 〧 〨 〩 〪 〫 〬 〭 〮 〯
U+303: 〰 〱 〲 〳 〴 〵 〶 〷 〸 〹 〺 〻 〼 〽 〾 〿

Special Note: The two-em tall glyphs for U+3031 VERTICAL KANA REPEAT MARK and U+3032 VERTICAL KANA REPEAT WITH VOICED SOUND MARK should fit in the code chart proper, but I recommend that they be given separate annotations, as shown below, so that their glyphs do not collide on the subsequent pages that show character names and other metadata:

    implemented as a glyph that is two-em tall

Hiragana (3040–309F)

[Current code chart excerpt: Hiragana, 3040–309F]

CJK Symbols code table:
U+304 (columns 1–F): ぁ あ ぃ い ぅ う ぇ え ぉ お か が き ぎ く
U+305: ぐ け げ こ ご さ ざ し じ す ず せ ぜ そ ぞ た
U+306: だ ち ぢ っ つ づ て で と ど な に ぬ ね の は
U+307: ば ぱ ひ び ぴ ふ ぶ ぷ へ べ ぺ ほ ぼ ぽ ま み
U+308: む め も ゃ や ゅ ゆ ょ よ ら り る れ ろ ゎ わ
U+309 (columns 0–6 and 9–F): ゐ ゑ を ん ゔ ゕ ゖ ゙ ゚ ゛ ゜ ゝ ゞ ゟ
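
The left-to-right, top-to-bottom layout of the code tables in this proposal can be approximated with a few lines of Python. This is only a minimal sketch and not the tooling used to produce the proposal: the function name `code_table`, the row-label format, and the use of "." for empty cells are all assumptions chosen to mirror the tables, and the glyphs that appear depend entirely on the font that renders the output.

```python
def code_table(first, last):
    """Return (label, cells) rows covering the code points first..last.

    Each row holds 16 cells, read left to right. Code points outside
    first..last become empty strings so partial rows keep their columns.
    (Hypothetical helper for illustration only.)
    """
    rows = []
    # Start at the 16-aligned row that contains `first`.
    for row_start in range(first & ~0xF, last + 1, 16):
        # Row label drops the final hex digit, e.g. 0x2770 -> "U+277".
        label = f"U+{row_start >> 4:03X}"
        cells = [
            chr(cp) if first <= cp <= last else ""
            for cp in range(row_start, row_start + 16)
        ]
        rows.append((label, cells))
    return rows


if __name__ == "__main__":
    # Dingbats subset named in the proposal: U+2776 through U+2793.
    for label, cells in code_table(0x2776, 0x2793):
        # "." marks a cell with no glyph (a convention of this sketch).
        print(label, " ".join(cell or "." for cell in cells))
```

Keeping empty cells in place, rather than dropping them, means a partially covered block such as Dingbats still lines up column by column with the full code chart it is being compared against.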