Universal Multiple-Octet Coded Character Set (UCS) —
Final Proposed Draft Amendment (FPDAM) 2
ISO/IEC 10646-1:2000/Amd. 2:2002 (E)

Information technology — Universal Multiple-Octet Coded Character Set (UCS) —
Part 1: Architecture and Basic Multilingual Plane

AMENDMENT 2: Limbu, Tai Le, Yijing and other characters

Page v, Foreword

Add the following text after the last paragraph.

The standard contains material which may be only available to users who obtain their copy in a machine readable format. That material consists of the following printable files:

CJKC0SR.txt
CJKUA_SR.txt
[…]

(Editor’s note: more files are likely to be added for the next edition)

Page 1, clause 1 Scope

Replace the first sentence of the Note 1 as follows:

The Unicode Standard Version 4.0 includes a set of characters, names, and coded representations for the Basic Multilingual Plane that are identical with those in this Part of this International Standard.

Page 9, sub-clause 13.2 Four-octet canonical form

Change the sub-clause header to “13.2 Four-octet canonical form (UCS-4)” and add a new note after the existing note which is renamed NOTE 1:

NOTE 2 – When confined to the code positions in Planes 0 to 10 (U+0000 to U+10FFFF), UCS-4 is also referred to as UCS Transformation Format 32 (UTF-32). The Unicode Standard, Version 3.2, defines the following forms of UTF-32:

• UTF-32: the ordering of octets (specified in sub-clause 6.3) is not defined and the signatures (specified in Annex H) may appear;
• UTF-32BE: in the ordering of octets the more significant octets precede the less significant octets, as specified in sub-clause 6.2, and no signatures appear;
• UTF-32LE: in the ordering of octets the less significant octets precede the more significant octets, and no signatures appear.
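The octet orderings that NOTE 2 of sub-clause 13.2 distinguishes for UTF-32 can be illustrated with a minimal Python sketch. The function name utf32_octets and its parameters are hypothetical and are not part of the standard. The sketch assumes that the signature is the code position U+FEFF serialised in the same octet order as the data, and it checks only the upper bound U+10FFFF given in the note.

```python
import struct

UTF32_MAX = 0x10FFFF  # upper bound of the UTF-32 range stated in NOTE 2


def utf32_octets(code_position, big_endian=True, signature=False):
    """Serialise one code position as four octets (illustrative helper).

    big_endian=True  : more significant octets first (UTF-32BE ordering)
    big_endian=False : less significant octets first (UTF-32LE ordering)
    signature=True   : prefix U+FEFF in the same ordering (assumption:
                       this is the signature referred to in Annex H)
    """
    if not 0 <= code_position <= UTF32_MAX:
        raise ValueError("code position outside U+0000..U+10FFFF")
    fmt = ">I" if big_endian else "<I"
    octets = struct.pack(fmt, code_position)
    if signature:
        octets = struct.pack(fmt, 0xFEFF) + octets
    return octets


# U+1900 is the first code position of the LIMBU collection (1900-194F).
be = utf32_octets(0x1900)                    # 00 00 19 00
le = utf32_octets(0x1900, big_endian=False)  # 00 19 00 00
signed = utf32_octets(0x1900, signature=True)
```

For a single character the results agree with Python's built-in "utf-32-be" and "utf-32-le" codecs, which likewise emit no signature.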
Page 2, clause 3 Normative references

Add the following reference:

Unicode Standard Annex, UAX#15, Unicode Normalization Forms, Version 3.2.0, 2002-03-27.

(Editor’s note: The Version number is likely to be updated before publication)

Page 4, clause 5 General structure of the UCS

Insert the following note after the 6th paragraph (excluding the existing note which is renamed NOTE 1):

NOTE 2 – The use of the term “canonical” for this form does not imply any restriction or preference for this form over transformation formats that a conforming implementation may choose for the representation of UCS characters.

Page 13, clause 22 Compatibility characters

In the sub-clause 22.2 added by the first amendment insert the new KP sources after the U source as follows:

The Hanja KP source is:
KP1 KPS 10721-2000

In the next paragraph first sentence, change ‘8-line’ to ‘9-line’.

In the line ‘01-06 octet: BMP…’, replace ‘code value’ by ‘code position’.

After the line ‘29-36 octet: Unicode U sources..’, add:

37-44 octet: Hanja KP sources (KP1-hhhh)

Add the KP source entries in the CJK Compatibility reference file.

In the same reference file, for the entry 0F951, change the corresponding CJK Unified Ideograph from 096FB to 0964B.

Click on this highlighted text to access the updated reference file (CJKC0SR.txt).

Page 13, after clause 23 Order of characters

Insert a new clause 24 between the current clause 23 and 24 (the following clauses are incremented by one):

24 Normalization forms

Normalization forms are the mechanisms allowing the selection of a unique coded representation among alternative, but equivalent coded text representations of the same text. Normalization forms for use with ISO/IEC 10646 are specified in the Unicode Standard UAX#15.

NOTE 1 – By definition, the result of applying any of these normalization forms is stable over time. It means that a normalized representation of text remains normalized even when the standard is amended.

NOTE 2 – Some normalization forms favor composite sequences over shorter representations of text, others favor the shorter representations. The backward compatibility requirement is provided by establishing ISO/IEC 10646-1:2000 and ISO/IEC 10646-2:2001 as the reference versions for the definition of the shorter representation of text.

Page 14, before sub-clause 24.3

In the current clause 24 (which becomes 25), insert a new clause 25.3 (after the current clause 24.2):

25.3 Alternate coded representations

Alternate coded representations of text are generated by using multiple combining characters in different orders, or using various equivalent combinations of characters and composite sequences. These alternate coded representations result in multiple representations of the same text. Normalizing these coded representations creates a unique representation.

NOTE – For example, in implementation level 3 the French word “là” may be represented by the characters LATIN SMALL LETTER L followed by LATIN SMALL LETTER A WITH GRAVE, or may be represented by the characters LATIN SMALL LETTER L followed by LATIN SMALL LETTER A followed by COMBINING GRAVE ACCENT. When the normalization forms are applied on those alternate coded representations, only one representation remains. The form of the remaining representation depends on the normalization form used.

In the current clause 24.3 (new 25.4), remove the Note (replaced by note mentioned above)

Page 14, clause 24 Combining characters

Change the second paragraph of sub-clause 24.5 introduced by the first amendment (now sub-clause 25.5) to read as follows:

No sequences using characters from VARIATION SELECTOR-2 to VARIATION SELECTOR-16 from the Basic Multilingual Plane and VARIATION SELECTOR-17 to VARIATION SELECTOR-256 from the Supplementary Special-purpose Plane are defined at this time.

In the table showing the variant appearances change the entry <2269, FE00> as follows:

<2269, FE00> GREATER-THAN BUT NOT EQUAL TO with vertical stroke

Remove the entries <2278, FE00> and <2279, FE00> from the same table.

Page 304, clause 27 CJK Unified ideographs

After the clause title, add a new sub-clause title:

28.1 Code tables

Add the following Hanzi G sources after ‘GE GB16500-95’:

G_KX Kangxi Dictionary ideographs ﹙康熙字典﹚ including the addendum ﹙康熙字典﹚補遺.
G_HZ Han Yu Da Zi Dian ideographs ﹙漢語大字典﹚.

Insert the new Hanzi H source reference after the Hanzi GK sources as follows:

The Hanzi H source is:
H Hong Kong Supplementary Character Set

Add the following Hanja K source after ‘K3 PKS C 5700-2 1994’:

K4 PKS 5700-3:1998

Insert the new Hanja KP source references after the Hanja K sources as follows:

Hanja KP sources are
KP0 KPS 9566-97
KP1 KPS 10721-2000

Add the following ChuNom V sources after ‘V1 TCVN 6056:1995’:

V2 VHN 01:1998
V3 VHN 02:1998

After the last paragraph of the current clause 27, add the following new text:

28.2 Source references for CJK Unified Ideographs

The source references to CJK Unified Ideographs and CJK Unified Ideographs Extension A provide source information in a machine-readable format that is accessible as a link to this document. The content pointed to by these links is also normative.

The content linked to is a plain text file, using ISO/IEC 646-IRV characters with LINE FEED as end of line mark, that specifies, after an 11-line header, as many lines as CJK Unified Ideographs in the sum of the two blocks; each containing the following information organized in fixed width fields:

• 01-05 octet: Plane 0 code position (0hhhh)
• 06-12 octet: Hanzi G sources (G0-hhhh), (G1-hhhh), (G3-hhhh), (G5-hhhh), (G7-hhhh), (GS-hhhh), (G8-hhhh), (GE-hhhh), (G_KX ) or (G_HZ ).
• 13-19 octet: Hanzi T sources (T1-hhhh), (T2-hhhh), (T3-hhhh), (T4-hhhh), (T5-hhhh), (T6-hhhh), (T7-hhhh) or (TF-hhhh).
• 20-26 octet: Kanji J sources (J0-hhhh), (J1-hhhh) or (JA-hhhh).
• 27-33 octet: Hanja K source (K0-hhhh), (K1-hhhh), (K2-hhhh), (K3-hhhh) or (K4-dddd).
• 34-40 octet: ChuNom V sources (V0-hhhh), (V1-hhhh), (V2-hhhh) or (V3-hhhh).
• 41-47 octet: Hanzi H source (H-hhhh ).
• 48-55 octet: Hanja KP sources (KP0-hhhh) or (KP1-hhhh).

The format definition uses ‘d’ as a decimal unit and ‘h’ as a hexadecimal unit. Uppercase characters and all other symbols between parentheses including the space character appear as shown.

Click on this highlighted text to access the reference file.

NOTE 1 – The content is also available as a separate viewable file in the same file directory as this document.

Add a ‘*’ (for fixed collections) to the following collections:

6 SPACING MODIFIER LETTERS
15 ARABIC EXTENDED
43 ENCLOSED ALPHANUMERICS
56 CJK COMPATIBILITY
66 CJK COMPATIBILITY FORMS

In the list of collection numbers and names, after

103 VARIATION SELECTORS

insert new entries as follows:

104 LTR ALPHABETIC PRESENTATION FORMS FB00-FB1C
105 LTR ALPHABETIC PRESENTATION FORMS FB1D-FB4F
106 LIMBU 1900-194F
107 TAI LE 1950-197F
108 KHMER SYMBOLS 19E0-19FF *
109 PHONETIC EXTENSIONS 1D00-1D7F
110 MISCELLANEOUS SYMBOLS AND ARROWS 2B00-2B2F
111 YIJING HEXAGRAM SYMBOLS 4DC0-4DFF *

Move the collection 63:

63 ALPHABETIC PRESENTATION FORMS

within the list already describing collections which are unions of particular collections (collection 250 and 251) as follows:

63 ALPHABETIC PRESENTATION FORMS Collections 104-105

After

282 MES-1

add

283 MODERN EUROPEAN SCRIPTS See A.4.3 *

After

304 UNICODE 3.2

add

305 UNICODE 4.0 See A.5.3 *

Page 881, Annex A.1

Under Note 1, add ‘104’ to the list of numbers (after 96).

In the alphabetical list of keywords:

add collection “110” to the entry “Arrows”,

add collections “104” and “105” to the entry “Presen-
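The “là” example in the NOTE to sub-clause 25.3 can be reproduced with a short Python sketch. It uses the standard library's unicodedata module, which implements the UAX#15 normalization forms for the Unicode version bundled with the interpreter, not necessarily the version cited in this amendment. The variable and function names are illustrative only.

```python
import unicodedata

# The two alternate coded representations named in the NOTE.
composed = "l\u00e0"     # LATIN SMALL LETTER L, LATIN SMALL LETTER A WITH GRAVE
decomposed = "la\u0300"  # LATIN SMALL LETTER L, LATIN SMALL LETTER A,
                         # COMBINING GRAVE ACCENT


def unique_representation(texts, form):
    """Normalize every input and return the set of distinct results."""
    return {unicodedata.normalize(form, text) for text in texts}


# Before normalization the two strings differ code position by code position.
differs = composed != decomposed

# After normalization only one representation remains per form;
# which one remains depends on the form chosen.
nfc_result = unique_representation([composed, decomposed], "NFC")
nfd_result = unique_representation([composed, decomposed], "NFD")
```

Here nfc_result holds only the two-character composed string and nfd_result only the three-character decomposed one, which matches the statement that the remaining representation depends on the normalization form used.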
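The fixed-width layout listed in sub-clause 28.2 (an 11-line header, then one 55-octet record per ideograph) could be read along the following lines. This is a sketch under stated assumptions rather than a reference parser: the field names are invented here, a source that is absent for a given ideograph is assumed to be filled with spaces, and the sample record uses made-up values that do not come from the real reference file.

```python
# (name, first octet, last octet), 1-based and inclusive as in sub-clause 28.2.
# The field names are hypothetical labels chosen for this sketch.
FIELDS = [
    ("code_position", 1, 5),
    ("hanzi_g", 6, 12),
    ("hanzi_t", 13, 19),
    ("kanji_j", 20, 26),
    ("hanja_k", 27, 33),
    ("chunom_v", 34, 40),
    ("hanzi_h", 41, 47),
    ("hanja_kp", 48, 55),
]
HEADER_LINES = 11  # header length stated in sub-clause 28.2


def parse_record(line):
    """Split one record into named fields; blank fields become None."""
    record = {}
    for name, first, last in FIELDS:
        value = line[first - 1:last].strip()  # assumption: space padding
        record[name] = value or None
    return record


def parse_file(text):
    """Skip the header and parse every non-empty LINE FEED terminated line."""
    body = text.split("\n")[HEADER_LINES:]
    return [parse_record(line) for line in body if line.strip()]


# Synthetic record: the source values below are invented for illustration.
sample_line = (
    "04E00" + "G0-1111" + "T1-2222" + " " * 7
    + "K0-3333" + " " * 7 + "H-4444 " + "KP0-5555"
)
sample_file = "\n".join(["header"] * HEADER_LINES + [sample_line]) + "\n"
records = parse_file(sample_file)
```

Keeping the octet ranges in one table mirrors the bulleted list in the sub-clause, so a change to the layout touches a single place in the code.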