CJK Compatibility Ideographs
Range: F900–FAFF
The Unicode Standard, Version 3.2

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 3.2. Characters in this chart that are new for The Unicode Standard, Version 3.2 are shown in conjunction with any existing characters. For ease of reference, the new characters have been highlighted in the chart grid and in the names list. This file will not be updated with errata, or when additional characters are assigned to the Unicode Standard. See http://www.unicode.org/charts for access to a complete list of the latest character charts.

Disclaimer
These charts are provided as the on-line reference to the character contents of the Unicode Standard, Version 3.2 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this excerpt file, please consult the appropriate sections of The Unicode Standard, Version 3.0 (ISBN 0-201-61633-5), as well as Unicode Standard Annexes #28 and #27, the other Unicode Technical Reports and the Unicode Character Database, which are available on-line. See http://www.unicode.org/Public/UNIDATA/UnicodeCharacterDatabase.html and http://www.unicode.org/unicode/reports. A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Fonts
The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts. The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See http://www.unicode.org/unicode/uni2book/u2fonts.html for a list.
Terms of Use
You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you are welcome to provide links to these charts.

The fonts and font data used in production of these Code Charts may NOT be extracted or otherwise used in any commercial product without permission or license granted by the typeface owner(s).

The information in this file may be updated from time to time. The Unicode Consortium is not liable for errors or omissions in this excerpt file or the standard itself. Information on characters added to the Unicode Standard since the publication of Version 3.2 as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site. See http://www.unicode.org/pending/pending.html and http://www.unicode.org/unicode/alloc/Pipeline.html.

Copyright © 1991-2002 Unicode, Inc. All rights reserved.

[Code chart grids for F900–F9AF, F9B0–FA5F, and FA60–FAFF: the reference glyphs did not survive text extraction. The grids list code points F900–FA2D and FA30–FA6A.]

CJK Compatibility Ideographs: names list, F900–F94A

Pronunciation variants from KS C 5601-1987

F900  CJK COMPATIBILITY IDEOGRAPH-F900  ≡ 8C48
F901  CJK COMPATIBILITY IDEOGRAPH-F901  ≡ 66F4
F902  CJK COMPATIBILITY IDEOGRAPH-F902  ≡ 8ECA
F903  CJK COMPATIBILITY IDEOGRAPH-F903  ≡ 8CC8
F904  CJK COMPATIBILITY IDEOGRAPH-F904  ≡ 6ED1
F905  CJK COMPATIBILITY IDEOGRAPH-F905  ≡ 4E32
F906  CJK COMPATIBILITY IDEOGRAPH-F906  ≡ 53E5
F907  CJK COMPATIBILITY IDEOGRAPH-F907  ≡ 9F9C
F908  CJK COMPATIBILITY IDEOGRAPH-F908  ≡ 9F9C
F909  CJK COMPATIBILITY IDEOGRAPH-F909  ≡ 5951
F90A  CJK COMPATIBILITY IDEOGRAPH-F90A  ≡ 91D1
F90B  CJK COMPATIBILITY IDEOGRAPH-F90B  ≡ 5587
F90C  CJK COMPATIBILITY IDEOGRAPH-F90C  ≡ 5948
F90D  CJK COMPATIBILITY IDEOGRAPH-F90D  ≡ 61F6
F90E  CJK COMPATIBILITY IDEOGRAPH-F90E  ≡ 7669
F90F  CJK COMPATIBILITY IDEOGRAPH-F90F  ≡ 7F85
F910  CJK COMPATIBILITY IDEOGRAPH-F910  ≡ 863F
F911  CJK COMPATIBILITY IDEOGRAPH-F911  ≡ 87BA
F912  CJK COMPATIBILITY IDEOGRAPH-F912  ≡ 88F8
F913  CJK COMPATIBILITY IDEOGRAPH-F913  ≡ 908F
F914  CJK COMPATIBILITY IDEOGRAPH-F914  ≡ 6A02
F915  CJK COMPATIBILITY IDEOGRAPH-F915  ≡ 6D1B
F916  CJK COMPATIBILITY IDEOGRAPH-F916  ≡ 70D9
F917  CJK COMPATIBILITY IDEOGRAPH-F917  ≡ 73DE
F918  CJK COMPATIBILITY IDEOGRAPH-F918  ≡ 843D
F919  CJK COMPATIBILITY IDEOGRAPH-F919  ≡ 916A
F91A  CJK COMPATIBILITY IDEOGRAPH-F91A  ≡ 99F1
F91B  CJK COMPATIBILITY IDEOGRAPH-F91B  ≡ 4E82
F91C  CJK COMPATIBILITY IDEOGRAPH-F91C  ≡ 5375
F91D  CJK COMPATIBILITY IDEOGRAPH-F91D  ≡ 6B04
F91E  CJK COMPATIBILITY IDEOGRAPH-F91E  ≡ 721B
F91F  CJK COMPATIBILITY IDEOGRAPH-F91F  ≡ 862D
F920  CJK COMPATIBILITY IDEOGRAPH-F920  ≡ 9E1E
F921  CJK COMPATIBILITY IDEOGRAPH-F921  ≡ 5D50
F922  CJK COMPATIBILITY IDEOGRAPH-F922  ≡ 6FEB
F923  CJK COMPATIBILITY IDEOGRAPH-F923  ≡ 85CD
F924  CJK COMPATIBILITY IDEOGRAPH-F924  ≡ 8964
F925  CJK COMPATIBILITY IDEOGRAPH-F925  ≡ 62C9
F926  CJK COMPATIBILITY IDEOGRAPH-F926  ≡ 81D8
F927  CJK COMPATIBILITY IDEOGRAPH-F927  ≡ 881F
F928  CJK COMPATIBILITY IDEOGRAPH-F928  ≡ 5ECA
F929  CJK COMPATIBILITY IDEOGRAPH-F929  ≡ 6717
F92A  CJK COMPATIBILITY IDEOGRAPH-F92A  ≡ 6D6A
F92B  CJK COMPATIBILITY IDEOGRAPH-F92B  ≡ 72FC
F92C  CJK COMPATIBILITY IDEOGRAPH-F92C  ≡ 90CE
F92D  CJK COMPATIBILITY IDEOGRAPH-F92D  ≡ 4F86
F92E  CJK COMPATIBILITY IDEOGRAPH-F92E  ≡ 51B7
F92F  CJK COMPATIBILITY IDEOGRAPH-F92F  ≡ 52DE
F930  CJK COMPATIBILITY IDEOGRAPH-F930  ≡ 64C4
F931  CJK COMPATIBILITY IDEOGRAPH-F931  ≡ 6AD3
F932  CJK COMPATIBILITY IDEOGRAPH-F932  ≡ 7210
F933  CJK COMPATIBILITY IDEOGRAPH-F933  ≡ 76E7
F934  CJK COMPATIBILITY IDEOGRAPH-F934  ≡ 8001
F935  CJK COMPATIBILITY IDEOGRAPH-F935  ≡ 8606
F936  CJK COMPATIBILITY IDEOGRAPH-F936  ≡ 865C
F937  CJK COMPATIBILITY IDEOGRAPH-F937  ≡ 8DEF
F938  CJK COMPATIBILITY IDEOGRAPH-F938  ≡ 9732
F939  CJK COMPATIBILITY IDEOGRAPH-F939  ≡ 9B6F
F93A  CJK COMPATIBILITY IDEOGRAPH-F93A  ≡ 9DFA
F93B  CJK COMPATIBILITY IDEOGRAPH-F93B  ≡ 788C
F93C  CJK COMPATIBILITY IDEOGRAPH-F93C  ≡ 797F
F93D  CJK COMPATIBILITY IDEOGRAPH-F93D  ≡ 7DA0
F93E  CJK COMPATIBILITY IDEOGRAPH-F93E  ≡ 83C9
F93F  CJK COMPATIBILITY IDEOGRAPH-F93F  ≡ 9304
F940  CJK COMPATIBILITY IDEOGRAPH-F940  ≡ 9E7F
F941  CJK COMPATIBILITY IDEOGRAPH-F941  ≡ 8AD6
F942  CJK COMPATIBILITY IDEOGRAPH-F942  ≡ 58DF
F943  CJK COMPATIBILITY IDEOGRAPH-F943  ≡ 5F04
F944  CJK COMPATIBILITY IDEOGRAPH-F944  ≡ 7C60
F945  CJK COMPATIBILITY IDEOGRAPH-F945  ≡ 807E
F946  CJK COMPATIBILITY IDEOGRAPH-F946  ≡ 7262
F947  CJK COMPATIBILITY IDEOGRAPH-F947  ≡ 78CA
F948  CJK COMPATIBILITY IDEOGRAPH-F948  ≡ 8CC2
F949  CJK COMPATIBILITY IDEOGRAPH-F949  ≡ 96F7
F94A  CJK COMPATIBILITY IDEOGRAPH-F94A  ≡ 58D8
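
Each ≡ entry in the names list pairs a compatibility ideograph with the unified ideograph it is canonically equivalent to. As a minimal sketch (not part of the chart itself), the same mappings can be read from the character database bundled with Python's standard `unicodedata` module. The helper name `unified_equivalent` is hypothetical, chosen for illustration.

```python
import unicodedata

def unified_equivalent(code_point: int) -> str:
    """Return the canonical decomposition of a code point as a hex string.

    For a CJK compatibility ideograph this is the single unified ideograph
    shown after the identity sign in the names list; the string is empty
    when the character has no decomposition.
    """
    return unicodedata.decomposition(chr(code_point))

# Spot-check a few entries from the names list.
for cp in (0xF900, 0xF907, 0xF908, 0xF94A):
    print(f"{cp:04X} -> {unified_equivalent(cp)}")

# Any normalization form replaces the compatibility ideograph with its
# unified equivalent, so the distinction is lost after normalizing.
normalized = unicodedata.normalize("NFC", "\uF900")
print(f"NFC(U+F900) = U+{ord(normalized):04X}")
```

Because the mapping is canonical, text that must round-trip to KS C 5601-1987 byte for byte should not be normalized before conversion.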
  • Assessment of Options for Handling Full Unicode Character Encodings in MARC21
    Assessment of Options for Handling Full Unicode Character Encodings in MARC21. A Study for the Library of Congress. Part 1: New Scripts. Jack Cain, Senior Consultant, Trylus Computing, Toronto.
    1 Purpose
    This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format.
    1.1 “Encoding Scheme” vs. “Repertoire”
    An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding scheme. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, and “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42, and 43 in ASCII (expressed here in hexadecimal).
    1.2 MARC8
    “MARC8” is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode, which is a multi-byte-per-character code set, the MARC8 encoding scheme is principally made up of multiple one-byte tables in which each character is encoded using a single 8-bit byte. (It also includes the EACC set, which actually uses a fixed length of 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
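
The distinction between an encoding scheme and its repertoire can be illustrated with ASCII. The sketch below uses only Python's built-in codecs; the function names `hex_codes` and `in_ascii_repertoire` are hypothetical and chosen for illustration.

```python
def hex_codes(text: str) -> list:
    """Return the ASCII code of each character as two uppercase hex digits."""
    return [f"{byte:02X}" for byte in text.encode("ascii")]

def in_ascii_repertoire(char: str) -> bool:
    """Report whether a character belongs to the ASCII repertoire."""
    try:
        char.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False

print(hex_codes("ABC"))          # the encodings assigned to A, B, C
print(in_ascii_repertoire("A"))  # in the repertoire
print(in_ascii_repertoire("é"))  # outside the repertoire
```

The encoding scheme answers "which code does this character get?", while the repertoire answers "is this character covered at all?".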
  • CJK Compatibility Ideographs Range: F900–FAFF
    CJK Compatibility Ideographs
    Range: F900–FAFF
    This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0. This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata. See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0.
    Disclaimer
    These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/. A thorough understanding of the information contained in these additional sources is required for a successful implementation.
  • Hong Kong Supplementary Character Set – 2016 (Draft)
    Chinese Language Interface Advisory Committee Working Group Paper No. 2017/02 (B)
    Hong Kong Supplementary Character Set – 2016 (Draft)
    Office of the Government Chief Information Officer & Official Languages Division, Civil Service Bureau, The Government of the Hong Kong Special Administrative Region, April 2017
    Table of Contents
    Preface
    Section 1 Overview
    Section 2 Coding Scheme of the HKSCS–2016
    Section 3 HKSCS–2016 under the Architecture of the ISO/IEC 10646
    Table 1: Code Table of the HKSCS–2016
    Table 2: Newly Included Characters in the HKSCS–2016
    Table 3: Compatibility Characters in the HKSCS–2016
    Preface
    After the first release of the Hong Kong Supplementary Character Set (HKSCS) in 1999, there have been three updated versions. The HKSCS-2001, HKSCS-2004 and HKSCS-2008 were published with 116, 123 and 68 new characters added respectively. A total of 5 009 characters were included in the HKSCS-2008. These publications formed the foundation for promoting the adoption of the ISO/IEC 10646 international coding standard, and were widely supported and adopted by the IT sector and members of the public. The ISO/IEC 10646 international coding standard is developed by the International Organization for Standardization (ISO) to provide a common technical basis for the storage and exchange of electronic information.
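
The preface gives the number of characters added by each update and the HKSCS-2008 total, but not the size of the 1999 release. As a rough worked example, the sketch below derives the implied starting size under the assumption (mine, not stated in the preface) that no characters were removed between versions. The function names are hypothetical.

```python
# New characters added by each update, as stated in the preface.
ADDED = {"HKSCS-2001": 116, "HKSCS-2004": 123, "HKSCS-2008": 68}
TOTAL_2008 = 5009

def implied_initial_size(total: int, added: dict) -> int:
    """Subtract every addition from the final total.

    Assumes additions were the only changes between versions.
    """
    return total - sum(added.values())

def running_totals(initial: int, added: dict) -> dict:
    """Accumulate the repertoire size version by version."""
    totals = {}
    size = initial
    for version, count in added.items():
        size += count
        totals[version] = size
    return totals

initial = implied_initial_size(TOTAL_2008, ADDED)
print("Implied size of the 1999 release:", initial)
print(running_totals(initial, ADDED))
```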
  • CJK Compatibility Ideographs Supplement Range: 2F800–2FA1F
    CJK Compatibility Ideographs Supplement
    Range: 2F800–2FA1F
    This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0.
  • About the Code Charts 24
    The Unicode® Standard Version 13.0 – Core Specification
    To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/.
    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.
    The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided.
    © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html.
    The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/)
  • New Ideographs in Unicode 3.0 and Beyond
    New Ideographs in Unicode 3.0 and Beyond
    John H. Jenkins, International and Text Group, Apple Computer, Inc.
    15th International Unicode Conference, San Jose, CA, August/September 1999
    1) Background
    The Unicode Standard, version 2.1, contains a total of 21,204 East Asian ideographs. More than half (nearly 55%) of the encoded characters in the standard are ideographs. This ideographic repertoire, commonly referred to as “Unihan,” is already larger than the ideographic repertoires of most other major character set standards. The exceptions, however, use different unification rules than those used in Unihan, so although they provide more glyphic variants for characters than does Unihan, they actually encode about the same number of characters as Unihan. Nonetheless, Unihan is far from being an exhaustive set of ideographs; tens of thousands more remain unencoded. As a result, additions and extensions to Unihan will continue to be made as the Unicode Standard develops.
    The history of East Asian ideographs can be reliably traced back to the second millennium BCE, and all the major features of the current system were in place by the Zhou dynasty (ca. 1100 BCE). The shapes of the ideographs have altered over the centuries, and the Chinese language has continued to develop with new words coming into existence and old ones being dropped, but the writing system has endured. Chinese ideographs constitute the oldest writing system in the world still in common use.
    This long history is one of the major reasons why the collection of ideographs is so vast.
  • Outline of the Course
    Outline of the course
    Introduction to Digital Libraries (15%)
    Description of Information (30%)
    Access to Information (30%)
    User Services (10%)
    Additional topics (15%)
    Building of a (small) digital library
    Reference material:
    – Ian Witten, David Bainbridge, David Nichols, How to Build a Digital Library, Morgan Kaufmann, 2010, ISBN 978-0-12-374857-7 (second edition)
    – The Web
    FUB 2012-2013, Vittore Casarosa – Digital Libraries, Part 7

    Access to information
    Representation of characters within a computer
    Representation of documents within a computer
    – Text documents
    – Images
    – Audio
    – Video
    How to store efficiently large amounts of data
    – Compression
    How to retrieve efficiently the desired item(s) out of large amounts of data
    – Indexing
    – Query execution

    Representation of characters
    The “natural” way to represent (alphanumeric) characters (and symbols) within a computer is to associate a character with a number, defining a “coding table”.
    How many bits are needed to represent the Latin alphabet?

    The ASCII characters
    The 95 printable ASCII characters, numbered from 32 to 126 (decimal)
    33 control characters

    ASCII table (7 bits)

    ASCII 7-bits character set

    Representation standards
    ASCII (late fifties)
    – American
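
The counts on the ASCII slide, and the question about how many bits the Latin alphabet needs, can be checked with a few lines of arithmetic. This is a minimal sketch; the helper `bits_needed` is a hypothetical name, and taking "the Latin alphabet" to mean 26 letters (or 52 with both cases) is my assumption.

```python
def bits_needed(symbol_count: int) -> int:
    """Smallest number of bits that gives every symbol a distinct code."""
    if symbol_count < 1:
        raise ValueError("need at least one symbol")
    return max(1, (symbol_count - 1).bit_length())

# Printable ASCII characters are numbered 32 to 126 inclusive.
printable_count = len(range(32, 127))
# A 7-bit table has 2**7 entries; the rest are control characters.
control_count = 2 ** 7 - printable_count

print("printable:", printable_count)
print("control:", control_count)
print("bits for 26 letters:", bits_needed(26))
print("bits for 52 letters:", bits_needed(52))
print("bits for 128 codes:", bits_needed(128))
```

Five bits cover a single-case alphabet, six cover both cases, and seven leave room for digits, punctuation and control codes, which matches the 7-bit ASCII table.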
  • CJK Compatibility Range: 3300–33FF
    CJK Compatibility
    Range: 3300–33FF
    This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0.
  • Section 18.1, Han
    The Unicode® Standard Version 12.0 – Core Specification
    To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/.
    Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries.
    The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided.
    © 2019 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html.
    The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. Version 12.0. Includes index. ISBN 978-1-936213-22-1 (http://www.unicode.org/versions/Unicode12.0.0/)
  • Guide to Han Radical-Stroke Index
    Guide to Han Radical-Stroke Index
    To expedite locating specific Han ideographic characters in the code charts, radical-stroke indices are provided on the Unicode web site. An interactive radical-stroke index page enables queries by specific radical numbers and stroke counts. Two fully formatted traditional radical-stroke indices are also posted in PDF format. The larger of those provides a radical-stroke index for all of the Han ideographic characters in the Unicode Standard, including CJK compatibility ideographs. There is also a more compact radical-stroke index limited to the IICore set of 9,810 CJK unified ideographs in common usage.
    The following text describes how radical-stroke indices work for Han ideographic characters and explains the particular adaptations which have been made for the Unicode radical-stroke indices.
    Under the traditional radical-stroke system, each Han ideograph is considered to be written with one of a number of different character elements or radicals and a number of additional strokes. For example, the character 說 has the radical 言 and seven additional strokes. To find the character 說 within a dictionary, one would first locate the section for its radical, 言, and then find the subsection for characters with seven additional strokes.
    This method is complicated by the fact that there are occasional ambiguities in the counting of strokes. Even worse, some characters are considered by different authorities to be written with different radicals; there is not, in fact, universal agreement about which set of radicals to use for certain characters, particularly with the increased use of simplified characters.
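
The two-step lookup described above (first the radical section, then the subsection for the additional stroke count) maps naturally onto a dictionary keyed by the pair. The sketch below is illustrative only: the three sample characters and their radical-stroke values are hand-entered by me, and the function names are hypothetical. A real index would be built from the Unihan database rather than from a literal list.

```python
from collections import defaultdict

# Hand-entered sample: (character, Kangxi radical number, additional strokes).
# Radical 149 is the "speech" radical.
SAMPLE = [
    ("說", 149, 7),
    ("語", 149, 7),
    ("話", 149, 6),
]

def build_index(entries):
    """Group characters by (radical number, additional stroke count)."""
    index = defaultdict(list)
    for char, radical, strokes in entries:
        index[(radical, strokes)].append(char)
    for chars in index.values():
        chars.sort()  # code point order within each subsection
    return index

def lookup(index, radical, strokes):
    """Return the characters filed under a radical and stroke count."""
    return index.get((radical, strokes), [])

index = build_index(SAMPLE)
print(lookup(index, 149, 7))
print(lookup(index, 149, 6))
print(lookup(index, 149, 3))
```

Because authorities can disagree on the radical or the stroke count, a practical index may file one character under more than one key; this sketch files each character once.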
  • 10646-2CD US Comment
    WG2 N2807R
    INCITS/L2/04-161R2
    Date: June 21, 2004
    Title: HKSCS and GB 18030 PUA characters, background document
    Source: UTC/US
    Authors: Michel Suignard, Eric Muller, John Jenkins
    Action: For consideration by UTC and IRG
    Summary
    This document describes characters still encoded in the Private Use Area of ISO/IEC 10646/Unicode as commonly found in the mapping information for Chinese coded character sets such as HKSCS and GB 18030. It describes a new encoding proposal to eliminate these Private Use Area allocations, so that the PUA can really be used for its true purpose. Doing so would tremendously improve interoperability between the East Asian market platforms because support for government-related encoded repertoires would not interfere with local comprehensive usage of the PUA.
    Hong Kong Supplementary Character Set (HKSCS)
    According to http://www.info.gov.hk/digital21/eng/hkscs/download/big5-iso.txt there are a large number of HKSCS-2001 characters still encoded in the Private Use Area (PUA). A large majority of these characters look like CJK basic strokes that could be used to describe the appearance of CJK characters. Although there are already collections of various CJK fragments (such as CJK Radicals Supplement and Kangxi Radicals) and methods to describe their arrangement using the Ideographic Description Characters, these ‘stroke’ elements stand on their own merit as an interesting mechanism to describe CJK characters and corresponding glyphs. Most of these characters have been proposed for encoding in CJK Extension C. However, that extension is not yet mature, and removing characters from the PUA is urgent.
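
A mapping table or a piece of converted text can be scanned for characters that still land in the Private Use Area. The following is a minimal sketch, not taken from the background document: the range constants are the three private-use ranges defined by the Unicode Standard, while the function names are hypothetical.

```python
# Private-use ranges: one in the BMP and two supplementary planes.
PUA_RANGES = (
    (0xE000, 0xF8FF),
    (0xF0000, 0xFFFFD),
    (0x100000, 0x10FFFD),
)

def is_private_use(code_point: int) -> bool:
    """Report whether a code point falls in a private-use range."""
    return any(low <= code_point <= high for low, high in PUA_RANGES)

def find_private_use(text: str) -> list:
    """List (index, 'U+XXXX') for every private-use character in text."""
    return [
        (position, f"U+{ord(char):04X}")
        for position, char in enumerate(text)
        if is_private_use(ord(char))
    ]

print(find_private_use("a\uE000b"))
print(is_private_use(0xF900))  # compatibility ideograph, not private use
```

Running such a check over converted legacy data shows which characters would be misread by a platform that assigns the same PUA code points to something else.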
  • Chapter 22, Symbols
    The Unicode® Standard Version 13.0 – Core Specification
