Universal Multiple-Octet Coded Character Set (UCS) -- AMENDMENT
JTC1/SC2/WG2 N2936
Final Proposed Draft Amendment (FPDAM) 2
ISO/IEC 10646:2003/Amd.2:2005 (E)
© ISO/IEC 2005 – All rights reserved

Information technology — Universal Multiple-Octet Coded Character Set (UCS) — AMENDMENT 2: NKo, Phags-pa, Phoenician and other characters

Page 1, Clause 1 Scope

In the note, update the Unicode Standard version for future standardization from 4.1 to 5.0.

Page 2, Clause 3 Normative references

Update the reference to the Unicode Bidirectional Algorithm and the Unicode Normalization Forms as follows:

Unicode Standard Annex, UAX#9, The Unicode Bidirectional Algorithm, Version 5.0.0, [date TBD].

Unicode Standard Annex, UAX#15, Unicode Normalization Forms, Version 5.0.0, [date TBD].

Page 6, Figure 1 - Entire coding space of the Universal Multiple-Octet Coded Character set

Remove the note.

Page 7, Figure 2 – Group 00 of the Universal Multiple-Octet Coded Character set

Remove the second note and rename the first note “NOTE” instead of “NOTE 1”.

Page 8, Sub-clause 6.4 Naming of characters

In the list item c), remove ‘unified’ in front of ‘ideographs’.

Replace last paragraph starting with “Guidelines to be used” with the following text:

Additional rules to be used for constructing the names of characters are given in clause 28.1.

Page 9, Clause 7 General requirement for the UCS

Make the second note normative by removing the “NOTE 2” term and by setting it as requirement b. The first note is renamed “NOTE” and the following requirements become c. and d.

Page 10, Sub-clause 9.2 Other Planes reserved

Replace text and note by the following:

Planes 11 to FF in Group 00 and all planes in any other groups (i.e. Planes 00 to FF in Groups 01 to 7F) are permanently reserved.

Code positions in these planes do not have a mapping to the UTF-16 form (see annex C).

Page 13, Clause 18 Block names

Replace title with “Block and Collection names”. Add a sub-clause “18.1 Block Names”. Add the following paragraph after last.

Rules to be used for constructing the names of blocks are given in clause 28.1.

Add the following sub-clause:

18.2 Collection Names

Collections are shown in Annex A.

Rules to be used for constructing the names of collections are given in clause 28.1.

Page 17, Sub-clause 20.4 Variation selectors

Insert the following text after the table describing the Mongolian variation sequences:

The following table provides a description of the variant appearances corresponding to the use of appropriate variation selectors with all allowed base Phags-pa characters. These variation selector sequences do not select fixed visual representation; rather, they select a representation that is reversed from the normal form predicted by the preceding character.

Sequence (UID notation)   Description of variant appearance
<A856, FE00>              PHAGS-PA LETTER reversed shaping SMALL A
<A85C, FE00>              PHAGS-PA LETTER reversed shaping SMALL HA
<A85E, FE00>              PHAGS-PA LETTER reversed shaping SMALL I
<A85F, FE00>              PHAGS-PA LETTER reversed shaping SMALL U
<A860, FE00>              PHAGS-PA LETTER reversed shaping SMALL E
<A868, FE00>              PHAGS-PA SUBJOINED LETTER reversed shaping YA

Page 23, Clause 28 Character names and annotations, sub-clause 28.1 General

Replace paragraph with following new sub-clauses 28.1.1 to 28.1.4

28.1.1 Entity names

This standard specifies names for the following entity types:

• characters
• named UCS sequence identifiers (clause 29)
• blocks (clause 18, Annex A.2)
• collections (clause A.1)

The names given by this standard to these entities shall follow the rules for name formation and name uniqueness specified in this clause. This specification applies to the entity names in the English language version of this standard.

NOTE 1 – In a version of such a standard in another language:
a) these rules may be amended to permit names to be generated using words and syntax that are considered appropriate within that language;
b) the entity names from this version of the standard may be replaced by equivalent unique names constructed according to the rules amended as in a) above.

NOTE 2 – Additional guidelines for constructing entity names are given in annex L for information.

28.1.2 Name formation

Entity names shall use only Latin capital letters A to Z, digits 0 to 9, SPACE, and HYPHEN-MINUS (002D).

Collection names only may additionally use the FULL STOP (002E).

The first character in an entity name shall consist only of the Latin capital letters A to Z.

The non-alphanumeric characters in entity names (SPACE, HYPHEN-MINUS, and for collection names, FULL STOP) have the following additional restrictions: They must not occur in sequences of more than one in a row, and they must not terminate an entity name.

Character names and named UCS sequence identifiers only may additionally allow a sequence of a SPACE directly followed by a HYPHEN-MINUS, or a HYPHEN-MINUS directly followed by a SPACE, as in the following examples:

0F60 TIBETAN LETTER -A
0F0A TIBETAN MARK BKA- SHOG YIG MGO

28.1.3 Name uniqueness

Each entity named in this standard shall be given only one name.

NOTE – This does not preclude the informative use of name aliases or acronyms for the sake of clarity. However, the normative entity name will be unique.

In addition, each entity name must also be unique within an appropriate name space, as specified here.

28.1.3.1 Block names

Block names constitute a name space. Each block name must be unique and distinct from all other block names specified in the standard.

28.1.3.2 Collection names

Collection names constitute a name space. Each collection name must be unique and distinct from all other collection names specified in the standard.

28.1.3.3 Character and named UCS sequence identifiers

Character names and named UCS sequence identifiers, taken together, constitute a name space. Each character name or named UCS sequence identifier must be unique and distinct from all other character names or UCS sequence identifiers.

28.1.3.4 Determining uniqueness

For block names and collection names, two names shall be considered unique and distinct if they are different even when SPACE and medial HYPHEN-MINUS characters are ignored in comparison of the names.

For example, the following hypothetical block names would be unique and distinct:

LATIN-A
LATIN-B

And the following hypothetical block names would not be unique and distinct:

LATIN-A
LATIN A
LATINA

For character names and named UCS sequence identifiers, two names shall be considered unique and distinct if they are different even when SPACE and medial HYPHEN-MINUS characters are ignored and when the strings "LETTER", "CHARACTER", and "DIGIT" are ignored in comparison of the names.

For example, the following hypothetical character names would not be unique and distinct:

MANICHAEAN CHARACTER A
MANICHAEAN LETTER A

But the following two actual character names are considered unique and distinct, because they differ by a *non*-medial HYPHEN-MINUS:

0F68 TIBETAN LETTER A
0F60 TIBETAN LETTER -A

The following two character names are the only exceptions to this specification, because they were created before this name uniqueness requirement was specified:

116C HANGUL JUNGSEONG OE
1180 HANGUL JUNGSEONG O-E

28.1.4 Annotations

A character name or a named UCS sequence identifier may be followed by an additional explanatory statement not part of the name, and separated by a single SPACE character. These statements are in parentheses and use the Latin lower case letters a-z, digits 0-9, SPACE and HYPHEN-MINUS. A capital Latin letter A-Z may be used for word initials where required.

Such parenthetical annotations are not part of the entity names themselves, and the characters used in the annotations are not subject to the name uniqueness requirements.

A character name may also be followed by a single ASTERISK separated from the name by a single SPACE.

29 Named UCS Sequence Identifiers

A named UCS Sequence Identifier (USI) is a USI associated to a name following the same construction rules as for character names. These rules are given in Clause 28.

NOTE – The purpose of these named USIs is to specify sequences of characters that may be treated as single units, either in particular types of processing, in reference by standards, in listing of repertoires (such as for fonts or keyboards).

The following list provides a description of these named UCS sequence identifiers.

USI                  USI name
<0100, 0300>         LATIN CAPITAL LETTER A WITH MACRON AND GRAVE
<0101, 0300>         LATIN SMALL LETTER A WITH MACRON AND GRAVE
<00E1, 0328>         LATIN SMALL LETTER A WITH ACUTE AND OGONEK
<0045, 0329>         LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW
<0065, 0329>         LATIN SMALL LETTER E WITH VERTICAL LINE BELOW
<00C8, 0329>         LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW AND GRAVE
<00E8, 0329>         LATIN SMALL LETTER E WITH VERTICAL LINE BELOW AND GRAVE
<00C9, 0329>         LATIN CAPITAL LETTER E WITH VERTICAL LINE BELOW AND ACUTE
<00E9, 0329>         LATIN SMALL LETTER E WITH VERTICAL LINE BELOW AND ACUTE
<00CA, 0304>         LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND MACRON
<00EA, 0304>         LATIN SMALL LETTER E WITH CIRCUMFLEX AND MACRON
<00CA, 030C>         LATIN CAPITAL LETTER E WITH CIRCUMFLEX AND CARON
<00EA, 030C>         LATIN SMALL LETTER E WITH CIRCUMFLEX AND CARON
<012A, 0300>         LATIN CAPITAL LETTER I WITH MACRON AND GRAVE
<012B, 0300>         LATIN SMALL LETTER I WITH MACRON AND GRAVE
<0069, 0307, 0301>   LATIN SMALL LETTER I WITH
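
As an illustration only, and not part of the amendment text, the name formation rules of 28.1.2 and the uniqueness comparison of 28.1.3.4 might be sketched in Python as follows. The function names `is_well_formed` and `uniqueness_key` and the `kind` labels are hypothetical. Two readings are assumptions of this sketch rather than statements of the standard: a "medial" HYPHEN-MINUS is taken to be one with a letter or digit directly on both sides, and a run of three or more non-alphanumeric characters is rejected.

```python
import re

# Hypothetical labels for the entity types whose names may contain
# SPACE + HYPHEN-MINUS or HYPHEN-MINUS + SPACE (28.1.2).
PER_CHARACTER_KINDS = ("character", "sequence")
# Strings ignored when comparing character and named-sequence names (28.1.3.4).
IGNORED_STRINGS = ("LETTER", "CHARACTER", "DIGIT")
# Assumption: "medial" means a letter or digit on both sides of the hyphen.
MEDIAL_HYPHEN = re.compile(r"(?<=[A-Z0-9])-(?=[A-Z0-9])")
SEPARATOR_RUN = re.compile(r"[^A-Z0-9]+")


def is_well_formed(name, kind):
    """Check a name against the formation rules of 28.1.2."""
    allowed = set("ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789 -")
    if kind == "collection":
        allowed.add(".")  # FULL STOP is permitted in collection names only
    if not name or not set(name) <= allowed:
        return False
    # First character must be A to Z; a separator must not end the name.
    if not name[0].isalpha() or not name[-1].isalnum():
        return False
    for run in SEPARATOR_RUN.findall(name):
        if len(run) == 1:
            continue
        if kind in PER_CHARACTER_KINDS and run in (" -", "- "):
            continue
        return False  # separators occurring more than one in a row
    return True


def uniqueness_key(name, kind):
    """Reduce a name to the form compared for uniqueness in 28.1.3.4."""
    key = MEDIAL_HYPHEN.sub("", name)
    if kind in PER_CHARACTER_KINDS:
        for ignored in IGNORED_STRINGS:
            key = key.replace(ignored, "")
    return key.replace(" ", "")


def are_distinct(first, second, kind):
    """Two names are unique and distinct when their keys differ."""
    return uniqueness_key(first, kind) != uniqueness_key(second, kind)
```

Under these assumptions, `are_distinct("LATIN-A", "LATIN A", "block")` is false and `are_distinct("TIBETAN LETTER A", "TIBETAN LETTER -A", "character")` is true, in line with the examples in 28.1.3.4. The names of 116C and 1180 reduce to the same key, which is consistent with their being listed as exceptions.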